* [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up

From: Roman Gushchin @ 2020-12-17 20:12 UTC
To: Andrew Morton, Mike Rapoport, linux-mm
Cc: Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel, kernel-team, Roman Gushchin

Currently cma areas without a fixed base are allocated close to the end
of the node. This placement is sub-optimal because of compaction:
compaction migrates pages towards the end of the node, so it keeps
bringing pages into the cma area. In particular, it can bring in hot
executable pages, even if there is plenty of free memory on the
machine. This results in cma allocation failures.

Instead, let's place cma areas close to the beginning of a node. In
this case compaction will help to free cma areas, resulting in better
cma allocation success rates.

If there is enough memory, let's try to allocate bottom-up starting at
4GB to exclude any possible interference with DMA32. On smaller
machines, or in case of a failure, stick with the old behavior.
16GB vm, 2GB cma area:

With this patch:
[ 0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
[ 0.002928] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
[ 0.002930] cma: Reserved 2048 MiB at 0x0000000100000000
[ 0.002931] hugetlb_cma: reserved 2048 MiB on node 0

Without this patch:
[ 0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
[ 0.002930] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
[ 0.002933] cma: Reserved 2048 MiB at 0x00000003c0000000
[ 0.002934] hugetlb_cma: reserved 2048 MiB on node 0

v2:
  - switched to memblock_set_bottom_up(true), by Mike
  - start with 4GB, by Mike

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/cma.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mm/cma.c b/mm/cma.c
index 7f415d7cda9f..21fd40c092f0 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -337,6 +337,22 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
 			limit = highmem_start;
 		}
 
+		/*
+		 * If there is enough memory, try a bottom-up allocation first.
+		 * It will place the new cma area close to the start of the node
+		 * and guarantee that the compaction is moving pages out of the
+		 * cma area and not into it.
+		 * Avoid using first 4GB to not interfere with constrained zones
+		 * like DMA/DMA32.
+		 */
+		if (!memblock_bottom_up() &&
+		    memblock_end >= SZ_4G + size) {
+			memblock_set_bottom_up(true);
+			addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
+							limit, nid, true);
+			memblock_set_bottom_up(false);
+		}
+
 		if (!addr) {
 			addr = memblock_alloc_range_nid(size, alignment, base,
 					limit, nid, true);
-- 
2.26.2
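The placement policy in this patch can be modeled in user space to see where a cma area lands. This is a sketch, not kernel code: `struct range`, `find_bottom_up()`, `find_top_down()` and `cma_place()` are hypothetical stand-ins for memblock's free-range search and for the new branch in `cma_declare_contiguous_nid()`. Free memory is a plain sorted array of ranges, memblock is assumed to be in its default top-down mode, and the power-of-two alignment is assumed. The layout and the 1GB alignment used in the checks are also assumptions, chosen so that the results line up with the two `cma: Reserved` addresses in the boot logs above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

#define SZ_1G (1ULL << 30)
#define SZ_4G (4ULL << 30)

/* Hypothetical free physical range [base, end); ranges sorted by address. */
struct range {
	phys_addr_t base;
	phys_addr_t end;
};

static phys_addr_t max_addr(phys_addr_t a, phys_addr_t b) { return a > b ? a : b; }
static phys_addr_t min_addr(phys_addr_t a, phys_addr_t b) { return a < b ? a : b; }

/* Lowest @align-aligned block of @size inside [start, end); 0 on failure. */
static phys_addr_t find_bottom_up(const struct range *r, int n,
				  phys_addr_t start, phys_addr_t end,
				  phys_addr_t size, phys_addr_t align)
{
	for (int i = 0; i < n; i++) {
		phys_addr_t lo = max_addr(r[i].base, start);
		phys_addr_t hi = min_addr(r[i].end, end);
		phys_addr_t cand = (lo + align - 1) & ~(align - 1);

		if (cand < hi && hi - cand >= size)
			return cand;
	}
	return 0;
}

/* Highest @align-aligned block of @size inside [start, end); 0 on failure. */
static phys_addr_t find_top_down(const struct range *r, int n,
				 phys_addr_t start, phys_addr_t end,
				 phys_addr_t size, phys_addr_t align)
{
	for (int i = n - 1; i >= 0; i--) {
		phys_addr_t lo = max_addr(r[i].base, start);
		phys_addr_t hi = min_addr(r[i].end, end);
		phys_addr_t cand;

		if (hi <= lo || hi - lo < size)
			continue;
		cand = (hi - size) & ~(align - 1);
		if (cand >= lo)
			return cand;
	}
	return 0;
}

/*
 * Model of the patched policy (memblock assumed top-down by default):
 * if memory extends at least @size past 4GB, try bottom-up from 4GB;
 * otherwise, or if that fails, use the old top-down search in [base, limit).
 */
static phys_addr_t cma_place(const struct range *r, int n,
			     phys_addr_t memblock_end, phys_addr_t base,
			     phys_addr_t limit, phys_addr_t size,
			     phys_addr_t align)
{
	phys_addr_t addr = 0;

	if (memblock_end >= SZ_4G + size)
		addr = find_bottom_up(r, n, SZ_4G, limit, size, align);
	if (!addr)
		addr = find_top_down(r, n, base, limit, size, align);
	return addr;
}
```

With a hypothetical layout of free memory at [1MB, 3GB) and [4GB, 17GB), a 2GB request is placed at 4GB by the new policy, while the old top-down search alone would pick 15GB. On a machine whose memory ends below 4GB + size, the bottom-up attempt is skipped and the sketch falls back to the old top-down placement, mirroring the "smaller machines" case in the commit message.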
* [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end

From: Roman Gushchin @ 2020-12-17 20:12 UTC
To: Andrew Morton, Mike Rapoport, linux-mm
Cc: Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel, kernel-team, Roman Gushchin

With kaslr the kernel image is placed at a random address, so starting
the bottom-up allocation at kernel_end can leave too little memory above
it, resulting in an allocation failure and a warning like this one:

[ 0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
[ 0.002921] ------------[ cut here ]------------
[ 0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
[ 0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
[ 0.002937] Modules linked in:
[ 0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
[ 0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
[ 0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a
[ 0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
[ 0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
[ 0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
[ 0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
[ 0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
[ 0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
[ 0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
[ 0.002952] FS: 0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
[ 0.002953] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
[ 0.002956] Call Trace:
[ 0.002961] ? memblock_alloc_range_nid+0x8d/0x11e
[ 0.002963] ? cma_declare_contiguous_nid+0x2c4/0x38c
[ 0.002964] ? hugetlb_cma_reserve+0xdc/0x128
[ 0.002968] ? flush_tlb_one_kernel+0xc/0x20
[ 0.002969] ? native_set_fixmap+0x82/0xd0
[ 0.002971] ? flat_get_apic_id+0x5/0x10
[ 0.002973] ? register_lapic_address+0x8e/0x97
[ 0.002975] ? setup_arch+0x8a5/0xc3f
[ 0.002978] ? start_kernel+0x66/0x547
[ 0.002980] ? load_ucode_bsp+0x4c/0xcd
[ 0.002982] ? secondary_startup_64_no_verify+0xb0/0xbb
[ 0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
[ 0.002988] ---[ end trace f151227d0b39be70 ]---

At the same time, the kernel image is protected with memblock_reserve(),
so neither allocation direction can return memory overlapping it, and we
can just start searching at PAGE_SIZE. In this case the bottom-up
allocation has the same chance to succeed as a top-down allocation, so
there is no reason to fall back in case of a failure. Altogether, this
simplifies the logic.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/memblock.c | 49 ++++++-------------------------------------------
 1 file changed, 6 insertions(+), 43 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index b68ee86788af..10bd7d1ef0f4 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -275,14 +275,6 @@ __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
  *
  * Find @size free area aligned to @align in the specified range and node.
  *
- * When allocation direction is bottom-up, the @start should be greater
- * than the end of the kernel image. Otherwise, it will be trimmed. The
- * reason is that we want the bottom-up allocation just near the kernel
- * image so it is highly likely that the allocated memory and the kernel
- * will reside in the same node.
- *
- * If bottom-up allocation failed, will try to allocate memory top-down.
- *
  * Return:
  * Found address on success, 0 on failure.
  */
@@ -291,8 +283,6 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
 					phys_addr_t end, int nid,
 					enum memblock_flags flags)
 {
-	phys_addr_t kernel_end, ret;
-
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE ||
 	    end == MEMBLOCK_ALLOC_KASAN)
@@ -301,40 +291,13 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
 	/* avoid allocating the first page */
 	start = max_t(phys_addr_t, start, PAGE_SIZE);
 	end = max(start, end);
-	kernel_end = __pa_symbol(_end);
-
-	/*
-	 * try bottom-up allocation only when bottom-up mode
-	 * is set and @end is above the kernel image.
-	 */
-	if (memblock_bottom_up() && end > kernel_end) {
-		phys_addr_t bottom_up_start;
-
-		/* make sure we will allocate above the kernel */
-		bottom_up_start = max(start, kernel_end);
 
-		/* ok, try bottom-up allocation first */
-		ret = __memblock_find_range_bottom_up(bottom_up_start, end,
-						      size, align, nid, flags);
-		if (ret)
-			return ret;
-
-		/*
-		 * we always limit bottom-up allocation above the kernel,
-		 * but top-down allocation doesn't have the limit, so
-		 * retrying top-down allocation may succeed when bottom-up
-		 * allocation failed.
-		 *
-		 * bottom-up allocation is expected to be fail very rarely,
-		 * so we use WARN_ONCE() here to see the stack trace if
-		 * fail happens.
-		 */
-		WARN_ONCE(IS_ENABLED(CONFIG_MEMORY_HOTREMOVE),
-			  "memblock: bottom-up allocation failed, memory hotremove may be affected\n");
-	}
-
-	return __memblock_find_range_top_down(start, end, size, align, nid,
-					      flags);
+	if (memblock_bottom_up())
+		return __memblock_find_range_bottom_up(start, end, size, align,
+						       nid, flags);
+	else
+		return __memblock_find_range_top_down(start, end, size, align,
+						      nid, flags);
 }
 
 /**
-- 
2.26.2
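The difference between the old and new search logic in memblock_find_in_range_node() can be illustrated with a small user-space model. Everything here is hypothetical and simplified: `find_old()` and `find_new()` are illustrative helpers, the free list is assumed to already exclude the kernel image (as memblock_reserve() would), `kernel_end` stands in for `__pa_symbol(_end)`, and a `warned` flag stands in for the WARN_ONCE(), which in the real code only fires with CONFIG_MEMORY_HOTREMOVE. The checks use an assumed layout in which memory ends at 9GB and a KASLR-placed kernel image sits just below 8GB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

#define PAGE_SIZE 4096ULL
#define SZ_1G (1ULL << 30)

/* Hypothetical free range [base, end); reserved memory is simply absent. */
struct range {
	phys_addr_t base;
	phys_addr_t end;
};

static phys_addr_t max_addr(phys_addr_t a, phys_addr_t b) { return a > b ? a : b; }
static phys_addr_t min_addr(phys_addr_t a, phys_addr_t b) { return a < b ? a : b; }

/* Lowest @align-aligned fit of @size inside [start, end); 0 on failure. */
static phys_addr_t find_bottom_up(const struct range *r, int n,
				  phys_addr_t start, phys_addr_t end,
				  phys_addr_t size, phys_addr_t align)
{
	for (int i = 0; i < n; i++) {
		phys_addr_t lo = max_addr(r[i].base, start);
		phys_addr_t hi = min_addr(r[i].end, end);
		phys_addr_t cand = (lo + align - 1) & ~(align - 1);

		if (cand < hi && hi - cand >= size)
			return cand;
	}
	return 0;
}

/* Highest @align-aligned fit of @size inside [start, end); 0 on failure. */
static phys_addr_t find_top_down(const struct range *r, int n,
				 phys_addr_t start, phys_addr_t end,
				 phys_addr_t size, phys_addr_t align)
{
	for (int i = n - 1; i >= 0; i--) {
		phys_addr_t lo = max_addr(r[i].base, start);
		phys_addr_t hi = min_addr(r[i].end, end);
		phys_addr_t cand;

		if (hi <= lo || hi - lo < size)
			continue;
		cand = (hi - size) & ~(align - 1);
		if (cand >= lo)
			return cand;
	}
	return 0;
}

/* Old logic: bottom-up search clamped to kernel_end, then a top-down retry. */
static phys_addr_t find_old(const struct range *r, int n,
			    phys_addr_t start, phys_addr_t end,
			    phys_addr_t size, phys_addr_t align,
			    bool bottom_up, phys_addr_t kernel_end, bool *warned)
{
	start = max_addr(start, PAGE_SIZE);
	end = max_addr(start, end);
	if (bottom_up && end > kernel_end) {
		phys_addr_t ret = find_bottom_up(r, n, max_addr(start, kernel_end),
						 end, size, align);
		if (ret)
			return ret;
		*warned = true; /* stands in for the WARN_ONCE() */
	}
	return find_top_down(r, n, start, end, size, align);
}

/* New logic: search in the requested direction; only the first page is skipped. */
static phys_addr_t find_new(const struct range *r, int n,
			    phys_addr_t start, phys_addr_t end,
			    phys_addr_t size, phys_addr_t align, bool bottom_up)
{
	start = max_addr(start, PAGE_SIZE);
	end = max_addr(start, end);
	if (bottom_up)
		return find_bottom_up(r, n, start, end, size, align);
	return find_top_down(r, n, start, end, size, align);
}
```

In this model, a 2GB bottom-up request starting at 4GB cannot fit above the kernel image, so the old logic warns and falls back to a top-down placement at 5GB. The new logic returns 4GB, which still ends below the reserved kernel image, because the image is simply absent from the free list.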
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end

From: Wonhyuk Yang @ 2020-12-19 14:52 UTC
To: Roman Gushchin
Cc: Andrew Morton, Mike Rapoport, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel, kernel-team

Hi Roman,

On Fri, Dec 18, 2020 at 5:12 AM Roman Gushchin <guro@fb.com> wrote:
>
> With kaslr the kernel image is placed at a random place, so starting
> the bottom-up allocation with the kernel_end can result in an
> allocation failure and a warning like this one:
>
> [...]
>
> At the same time, the kernel image is protected with memblock_reserve(),
> so we can just start searching at PAGE_SIZE. In this case the
> bottom-up allocation has the same chances to success as a top-down
> allocation, so there is no reason to fallback in the case of a
> failure. All together it simplifies the logic.

I found that it was introduced by
commit 79442ed189acb ("memblock.c: introduce bottom-up allocation mode").

According to this commit, the purpose of bottom-up allocation is to
allocate memory from the unhotpluggable node.
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end

From: Roman Gushchin @ 2020-12-19 17:05 UTC
To: Wonhyuk Yang
Cc: Andrew Morton, Mike Rapoport, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel, kernel-team

On Sat, Dec 19, 2020 at 11:52:19PM +0900, Wonhyuk Yang wrote:
> Hi Roman,
>
> On Fri, Dec 18, 2020 at 5:12 AM Roman Gushchin <guro@fb.com> wrote:
> >
> > [...]
>
> I figure out that it was introduced by
> commit 79442ed189acb ("memblock.c: introduce bottom-up allocation mode")
>
> According to this commit, The purpose of bottom up allocation is to
> allocate memory from the unhotpluggable node.

Hi Wonhyuk,

correct! And it remains this way; we just don't need to skip all the
memory below kernel_end.

Thanks!
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end

From: Mike Rapoport @ 2020-12-20 6:49 UTC
To: Roman Gushchin
Cc: Andrew Morton, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel, kernel-team

On Thu, Dec 17, 2020 at 12:12:14PM -0800, Roman Gushchin wrote:
> With kaslr the kernel image is placed at a random place, so starting
> the bottom-up allocation with the kernel_end can result in an
> allocation failure and a warning like this one:
>
> [...]
>
> Signed-off-by: Roman Gushchin <guro@fb.com>

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> [...]

--
Sincerely yours,
Mike.
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end

From: Thiago Jung Bauermann @ 2021-01-22 4:37 UTC
To: rppt
Cc: akpm, guro, iamjoonsoo.kim, Ram Pai, Konrad Rzeszutek Wilk, Satheesh Rajendran, kernel-team, linux-kernel, linux-mm, linuxppc-dev, mhocko, riel, Thiago Jung Bauermann

Mike Rapoport <rppt@kernel.org> writes:

> > Signed-off-by: Roman Gushchin <guro@fb.com>
>
> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
patch. This happens on some ppc64le bare metal (powernv) server machines with
CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I posted
to solve this issue in a different way:

https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauerman@linux.ibm.com/

Since this patch solves that problem, is it possible to include it in the next
feasible v5.11-rcX, with the following tag?

Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")

This is because reverting the commit above also solves the problem on the
machines where I've seen this issue.

--
Thiago Jung Bauermann
IBM Linux Technology Center
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end

From: Andrew Morton @ 2021-01-24 2:09 UTC
To: Thiago Jung Bauermann
Cc: rppt, guro, iamjoonsoo.kim, Ram Pai, Konrad Rzeszutek Wilk, Satheesh Rajendran, kernel-team, linux-kernel, linux-mm, linuxppc-dev, mhocko, riel

On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann <bauerman@linux.ibm.com> wrote:

> Mike Rapoport <rppt@kernel.org> writes:
>
> > > Signed-off-by: Roman Gushchin <guro@fb.com>
> >
> > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
>
> I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
> patch. This happens on some ppc64le bare metal (powernv) server machines with
> CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I posted
> to solve this issue in a different way:
>
> https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauerman@linux.ibm.com/
>
> Since this patch solves that problem, is it possible to include it in the next
> feasible v5.11-rcX, with the following tag?

We could do this, if we're confident that this patch doesn't depend on
[1/2] "mm: cma: allocate cma areas bottom-up"? I think it is...

> Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")

I added that.
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end

From: Mike Rapoport @ 2021-01-24 7:34 UTC
To: Andrew Morton
Cc: Thiago Jung Bauermann, guro, iamjoonsoo.kim, Ram Pai, Konrad Rzeszutek Wilk, Satheesh Rajendran, kernel-team, linux-kernel, linux-mm, linuxppc-dev, mhocko, riel

On Sat, Jan 23, 2021 at 06:09:11PM -0800, Andrew Morton wrote:
> On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann <bauerman@linux.ibm.com> wrote:
>
> > [...]
> >
> > Since this patch solves that problem, is it possible to include it in the next
> > feasible v5.11-rcX, with the following tag?
>
> We could do this, if we're confident that this patch doesn't depend on
> [1/2] "mm: cma: allocate cma areas bottom-up"? I think it is...

I think it does not depend on cma bottom-up allocation; it's rather the
other way around: without this patch, CMA bottom-up allocation could fail
with KASLR enabled.

Still, this patch may need updates to the way x86 does early reservations:

https://lore.kernel.org/lkml/20210115083255.12744-1-rppt@kernel.org

> > Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")
>
> I added that.

--
Sincerely yours,
Mike.
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end

From: Thiago Jung Bauermann @ 2021-01-26 0:30 UTC
To: Mike Rapoport
Cc: Andrew Morton, guro, iamjoonsoo.kim, Ram Pai, Konrad Rzeszutek Wilk, Satheesh Rajendran, kernel-team, linux-kernel, linux-mm, linuxppc-dev, mhocko, riel

Mike Rapoport <rppt@kernel.org> writes:

> On Sat, Jan 23, 2021 at 06:09:11PM -0800, Andrew Morton wrote:
>> On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann <bauerman@linux.ibm.com> wrote:
>>
>> > [...]
>>
>> We could do this,

Thanks!

>> if we're confident that this patch doesn't depend on
>> [1/2] "mm: cma: allocate cma areas bottom-up"? I think it is...
>
> A think it does not depend on cma bottom-up allocation, it's rather the other
> way around: without this CMA bottom-up allocation could fail with KASLR
> enabled.

I agree. Conceptually, this could have been patch 1 in this series.

> Still, this patch may need updates to the way x86 does early reservations:
>
> https://lore.kernel.org/lkml/20210115083255.12744-1-rppt@kernel.org

Ah, I wasn't aware of this. Thanks for fixing those issues. That series
seems to be well accepted.

>> > Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")
>>
>> I added that.

Thanks!

--
Thiago Jung Bauermann
IBM Linux Technology Center
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end

From: Thiago Jung Bauermann @ 2021-02-08 23:58 UTC
To: Mike Rapoport
Cc: Andrew Morton, riel, kernel-team, Ram Pai, linux-kernel, mhocko, linux-mm, Satheesh Rajendran, Konrad Rzeszutek Wilk, iamjoonsoo.kim, guro, linuxppc-dev

Mike Rapoport <rppt@kernel.org> writes:

> On Sat, Jan 23, 2021 at 06:09:11PM -0800, Andrew Morton wrote:
>> [...]
>
> [...]

I noticed that this patch is now upstream as:

2dcb39645441 memblock: do not start bottom-up allocations with kernel_end

> Still, this patch may need updates to the way x86 does early reservations:
>
> https://lore.kernel.org/lkml/20210115083255.12744-1-rppt@kernel.org

... but the patches from this link still aren't. Isn't this a potential
problem for x86?

The patch series on the link above is now superseded by v2:

https://lore.kernel.org/linux-mm/20210128105711.10428-1-rppt@kernel.org/

--
Thiago Jung Bauermann
IBM Linux Technology Center
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end 2020-12-17 20:12 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Roman Gushchin 2020-12-19 14:52 ` Wonhyuk Yang 2020-12-20 6:49 ` Mike Rapoport @ 2021-02-28 4:18 ` Florian Fainelli 2021-02-28 9:00 ` Mike Rapoport 2021-03-23 18:19 ` [tip: x86/boot] x86/setup: Consolidate early memory reservations tip-bot2 for Mike Rapoport 3 siblings, 1 reply; 38+ messages in thread From: Florian Fainelli @ 2021-02-28 4:18 UTC (permalink / raw) To: Roman Gushchin, Andrew Morton, Mike Rapoport, linux-mm, Kamal Dasu, linux-mips, Thomas Bogendoerfer, Paul Cercueil, Serge Semin, Jiaxun Yang, rppt, iamjoonsoo.kim, riel Cc: Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel, kernel-team On 12/17/2020 12:12 PM, Roman Gushchin wrote: > With kaslr the kernel image is placed at a random place, so starting > the bottom-up allocation with the kernel_end can result in an > allocation failure and a warning like this one: > > [ 0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node > [ 0.002921] ------------[ cut here ]------------ > [ 0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected > [ 0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a > [ 0.002937] Modules linked in: > [ 0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169 > [ 0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 > [ 0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a > [ 0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c > [ 0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000 > [ 0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff > [ 0.002948] RDX: 
00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046 > [ 0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb > [ 0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000 > [ 0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000 > [ 0.002952] FS: 0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000 > [ 0.002953] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0 > [ 0.002956] Call Trace: > [ 0.002961] ? memblock_alloc_range_nid+0x8d/0x11e > [ 0.002963] ? cma_declare_contiguous_nid+0x2c4/0x38c > [ 0.002964] ? hugetlb_cma_reserve+0xdc/0x128 > [ 0.002968] ? flush_tlb_one_kernel+0xc/0x20 > [ 0.002969] ? native_set_fixmap+0x82/0xd0 > [ 0.002971] ? flat_get_apic_id+0x5/0x10 > [ 0.002973] ? register_lapic_address+0x8e/0x97 > [ 0.002975] ? setup_arch+0x8a5/0xc3f > [ 0.002978] ? start_kernel+0x66/0x547 > [ 0.002980] ? load_ucode_bsp+0x4c/0xcd > [ 0.002982] ? secondary_startup_64_no_verify+0xb0/0xbb > [ 0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0 > [ 0.002988] ---[ end trace f151227d0b39be70 ]--- > > At the same time, the kernel image is protected with memblock_reserve(), > so we can just start searching at PAGE_SIZE. In this case the > bottom-up allocation has the same chances to success as a top-down > allocation, so there is no reason to fallback in the case of a > failure. All together it simplifies the logic. > > Signed-off-by: Roman Gushchin <guro@fb.com> Hi Roman, Thomas and other linux-mips folks, Kamal and myself have been unable to boot v5.11 on MIPS since this commit, reverting it makes our MIPS platforms boot successfully. 
We do not see a warning like the one in the commit message. Instead, what happens appears to be a corrupted Device Tree, which prevents the "rdb" node from being parsed, so the interrupt controllers are never registered and the system eventually fails to boot.

The Device Tree is built into the kernel image from arch/mips/boot/dts/brcm/bcm97435svmb.dts.

Do you have any idea what could be wrong with MIPS specifically here?

Thanks!
--
Florian
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end 2021-02-28 4:18 ` Florian Fainelli @ 2021-02-28 9:00 ` Mike Rapoport 2021-02-28 18:19 ` Florian Fainelli 0 siblings, 1 reply; 38+ messages in thread From: Mike Rapoport @ 2021-02-28 9:00 UTC (permalink / raw) To: Florian Fainelli Cc: Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu, linux-mips, Thomas Bogendoerfer, Paul Cercueil, Serge Semin, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team Hi Florian, On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote: > > On 12/17/2020 12:12 PM, Roman Gushchin wrote: > > With kaslr the kernel image is placed at a random place, so starting > > the bottom-up allocation with the kernel_end can result in an > > allocation failure and a warning like this one: > > > > [ 0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node > > [ 0.002921] ------------[ cut here ]------------ > > [ 0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected > > [ 0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a > > [ 0.002937] Modules linked in: > > [ 0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169 > > [ 0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 > > [ 0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a > > [ 0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c > > [ 0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000 > > [ 0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff > > [ 0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046 > > [ 0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb > > [ 0.002949] R10: 00000000ffffe000 
R11: 3fffffffffffffff R12: 0000000000000000 > > [ 0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000 > > [ 0.002952] FS: 0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000 > > [ 0.002953] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0 > > [ 0.002956] Call Trace: > > [ 0.002961] ? memblock_alloc_range_nid+0x8d/0x11e > > [ 0.002963] ? cma_declare_contiguous_nid+0x2c4/0x38c > > [ 0.002964] ? hugetlb_cma_reserve+0xdc/0x128 > > [ 0.002968] ? flush_tlb_one_kernel+0xc/0x20 > > [ 0.002969] ? native_set_fixmap+0x82/0xd0 > > [ 0.002971] ? flat_get_apic_id+0x5/0x10 > > [ 0.002973] ? register_lapic_address+0x8e/0x97 > > [ 0.002975] ? setup_arch+0x8a5/0xc3f > > [ 0.002978] ? start_kernel+0x66/0x547 > > [ 0.002980] ? load_ucode_bsp+0x4c/0xcd > > [ 0.002982] ? secondary_startup_64_no_verify+0xb0/0xbb > > [ 0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0 > > [ 0.002988] ---[ end trace f151227d0b39be70 ]--- > > > > At the same time, the kernel image is protected with memblock_reserve(), > > so we can just start searching at PAGE_SIZE. In this case the > > bottom-up allocation has the same chances to success as a top-down > > allocation, so there is no reason to fallback in the case of a > > failure. All together it simplifies the logic. > > > > Signed-off-by: Roman Gushchin <guro@fb.com> > > Hi Roman, Thomas and other linux-mips folks, > > Kamal and myself have been unable to boot v5.11 on MIPS since this > commit, reverting it makes our MIPS platforms boot successfully. We do > not see a warning like this one in the commit message, instead what > happens appear to be a corrupted Device Tree which prevents the parsing > of the "rdb" node and leading to the interrupt controllers not being > registered, and the system eventually not booting. 
>
> The Device Tree is built-into the kernel image and resides at
> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
>
> Do you have any idea what could be wrong with MIPS specifically here?

Apparently there is a memblock allocation in one of the functions called
from arch_mem_init() between plat_mem_setup() and
early_init_fdt_reserve_self().

If you have serial available that early, we can try to track it down by
forcing memblock_debug in mm/memblock.c to 1:

diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..83034245f8d5 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -151,7 +151,7 @@ static __refdata struct memblock_type *memblock_memory = &memblock.memory;
 			pr_info(fmt, ##__VA_ARGS__);			\
 	} while (0)
 
-static int memblock_debug __initdata_memblock;
+static int memblock_debug __initdata_memblock = 1;
 static bool system_has_some_mirror __initdata_memblock = false;
 static int memblock_can_resize __initdata_memblock;
 static int memblock_memory_in_slab __initdata_memblock = 0;

Regardless, I think that moving DT self reservation just after
plat_mem_setup() is safe and it'll make things more robust.

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 279be0153f8b..f476b99a7bcd 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -623,6 +623,8 @@ static void __init arch_mem_init(char **cmdline_p)
 {
 	/* call board setup routine */
 	plat_mem_setup();
+	early_init_fdt_reserve_self();
+	early_init_fdt_scan_reserved_mem();
 	memblock_set_bottom_up(true);
 
 	bootcmdline_init();
@@ -636,9 +638,6 @@ static void __init arch_mem_init(char **cmdline_p)
 
 	check_kernel_sections_mem();
 
-	early_init_fdt_reserve_self();
-	early_init_fdt_scan_reserved_mem();
-
 #ifndef CONFIG_NUMA
 	memblock_set_node(0, PHYS_ADDR_MAX, &memblock.memory, 0);
 #endif

--
Sincerely yours,
Mike.
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end 2021-02-28 9:00 ` Mike Rapoport @ 2021-02-28 18:19 ` Florian Fainelli 2021-02-28 23:08 ` Serge Semin 0 siblings, 1 reply; 38+ messages in thread From: Florian Fainelli @ 2021-02-28 18:19 UTC (permalink / raw) To: Mike Rapoport Cc: Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu, linux-mips, Thomas Bogendoerfer, Paul Cercueil, Serge Semin, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team Hi Mike, On 2/28/2021 1:00 AM, Mike Rapoport wrote: > Hi Florian, > > On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote: >> >> On 12/17/2020 12:12 PM, Roman Gushchin wrote: >>> With kaslr the kernel image is placed at a random place, so starting >>> the bottom-up allocation with the kernel_end can result in an >>> allocation failure and a warning like this one: >>> >>> [ 0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node >>> [ 0.002921] ------------[ cut here ]------------ >>> [ 0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected >>> [ 0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a >>> [ 0.002937] Modules linked in: >>> [ 0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169 >>> [ 0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 >>> [ 0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a >>> [ 0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c >>> [ 0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000 >>> [ 0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff >>> [ 0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046 >>> [ 0.002948] RBP: 0000000100000000 R08: ffffffff88922788 
R09: 0000000000009ffb >>> [ 0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000 >>> [ 0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000 >>> [ 0.002952] FS: 0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000 >>> [ 0.002953] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [ 0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0 >>> [ 0.002956] Call Trace: >>> [ 0.002961] ? memblock_alloc_range_nid+0x8d/0x11e >>> [ 0.002963] ? cma_declare_contiguous_nid+0x2c4/0x38c >>> [ 0.002964] ? hugetlb_cma_reserve+0xdc/0x128 >>> [ 0.002968] ? flush_tlb_one_kernel+0xc/0x20 >>> [ 0.002969] ? native_set_fixmap+0x82/0xd0 >>> [ 0.002971] ? flat_get_apic_id+0x5/0x10 >>> [ 0.002973] ? register_lapic_address+0x8e/0x97 >>> [ 0.002975] ? setup_arch+0x8a5/0xc3f >>> [ 0.002978] ? start_kernel+0x66/0x547 >>> [ 0.002980] ? load_ucode_bsp+0x4c/0xcd >>> [ 0.002982] ? secondary_startup_64_no_verify+0xb0/0xbb >>> [ 0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0 >>> [ 0.002988] ---[ end trace f151227d0b39be70 ]--- >>> >>> At the same time, the kernel image is protected with memblock_reserve(), >>> so we can just start searching at PAGE_SIZE. In this case the >>> bottom-up allocation has the same chances to success as a top-down >>> allocation, so there is no reason to fallback in the case of a >>> failure. All together it simplifies the logic. >>> >>> Signed-off-by: Roman Gushchin <guro@fb.com> >> >> Hi Roman, Thomas and other linux-mips folks, >> >> Kamal and myself have been unable to boot v5.11 on MIPS since this >> commit, reverting it makes our MIPS platforms boot successfully. We do >> not see a warning like this one in the commit message, instead what >> happens appear to be a corrupted Device Tree which prevents the parsing >> of the "rdb" node and leading to the interrupt controllers not being >> registered, and the system eventually not booting. 
>> >> The Device Tree is built-into the kernel image and resides at >> arch/mips/boot/dts/brcm/bcm97435svmb.dts. >> >> Do you have any idea what could be wrong with MIPS specifically here? > > Apparently there is a memblock allocation in one of the functions called > from arch_mem_init() between plat_mem_setup() and > early_init_fdt_reserve_self(). > > If you have serial available that early we can try to track it down with > forcing memblock_debug in mm/memblock.c to 1: > > diff --git a/mm/memblock.c b/mm/memblock.c > index afaefa8fc6ab..83034245f8d5 100644 > --- a/mm/memblock.c > +++ b/mm/memblock.c > @@ -151,7 +151,7 @@ static __refdata struct memblock_type *memblock_memory = &memblock.memory; > pr_info(fmt, ##__VA_ARGS__); \ > } while (0) > > -static int memblock_debug __initdata_memblock; > +static int memblock_debug __initdata_memblock = 1; > static bool system_has_some_mirror __initdata_memblock = false; > static int memblock_can_resize __initdata_memblock; > static int memblock_memory_in_slab __initdata_memblock = 0; > > > Regardless, I think that moving DT self reservation just after > plat_mem_setup() is safe and it'll make things more robust. > > diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c > index 279be0153f8b..f476b99a7bcd 100644 > --- a/arch/mips/kernel/setup.c > +++ b/arch/mips/kernel/setup.c > @@ -623,6 +623,8 @@ static void __init arch_mem_init(char **cmdline_p) > { > /* call board setup routine */ > plat_mem_setup(); > + early_init_fdt_reserve_self(); > + early_init_fdt_scan_reserved_mem(); > memblock_set_bottom_up(true); > > bootcmdline_init(); > @@ -636,9 +638,6 @@ static void __init arch_mem_init(char **cmdline_p) > > check_kernel_sections_mem(); > > - early_init_fdt_reserve_self(); > - early_init_fdt_scan_reserved_mem(); > - > #ifndef CONFIG_NUMA > memblock_set_node(0, PHYS_ADDR_MAX, &memblock.memory, 0); > #endif Thanks a lot for taking a look! 
The current/broken memblock=debug output looks like this:

[ 0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost) (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun Feb 28 10:01:50 PST 2021 [ 0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200) [ 0.000000] FPU revision is: 00130001 [ 0.000000] memblock_add: [0x00000000-0x0fffffff] early_init_dt_scan_memory+0x160/0x1e0 [ 0.000000] memblock_add: [0x20000000-0x4fffffff] early_init_dt_scan_memory+0x160/0x1e0 [ 0.000000] memblock_add: [0x90000000-0xcfffffff] early_init_dt_scan_memory+0x160/0x1e0 [ 0.000000] MIPS: machine is Broadcom BCM97435SVMB [ 0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '') [ 0.000000] printk: bootconsole [ns16550a0] enabled [ 0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0] setup_arch+0x128/0x69c [ 0.000000] memblock_reserve: [0x00010000-0x018313cf] setup_arch+0x1f8/0x69c [ 0.000000] Initrd not found or empty - disabling initrd [ 0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1 from=0x00000000 max_addr=0x00000000 early_init_dt_alloc_memory_arch+0x40/0x84 [ 0.000000] memblock_reserve: [0x00001000-0x00003aa0] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1 from=0x00000000 max_addr=0x00000000 early_init_dt_alloc_memory_arch+0x40/0x84 [ 0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1 from=0x00000000 max_addr=0x00000000 early_init_dt_alloc_memory_arch+0x40/0x84 [ 0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_reserve: [0x0096a000-0x00969fff] setup_arch+0x3fc/0x69c [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c [ 0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 32 bytes
align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c [ 0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c [ 0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64 bytes. [ 0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases, linesize 32 bytes [ 0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes. [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 [ 0.000000] memblock_reserve: [0x0000c000-0x0000cfff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 [ 0.000000] memblock_reserve: [0x0000d000-0x0000dfff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 [ 0.000000] memblock_reserve: [0x0000e000-0x0000efff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] Zone ranges: [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff] [ 0.000000] HighMem [mem 0x0000000010000000-0x00000000cfffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff] [ 0.000000] node 0: [mem 0x0000000020000000-0x000000004fffffff] [ 0.000000] node 0: [mem 0x0000000090000000-0x00000000cfffffff] [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x00000000cfffffff] [ 0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0 from=0x00000000 max_addr=0x00000000 alloc_node_mem_map.constprop.135+0x6c/0xc8 [ 0.000000] memblock_reserve: [0x01831400-0x032313ff] memblock_alloc_range_nid+0xf8/0x198 [ 
0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0 from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98 [ 0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0 from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98 [ 0.000000] memblock_reserve: [0x0000bc80-0x0000bdff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] MEMBLOCK configuration: [ 0.000000] memory size = 0x80000000 reserved size = 0x0322f032 [ 0.000000] memory.cnt = 0x3 [ 0.000000] memory[0x0] [0x00000000-0x0fffffff], 0x10000000 bytes flags: 0x0 [ 0.000000] memory[0x1] [0x20000000-0x4fffffff], 0x30000000 bytes flags: 0x0 [ 0.000000] memory[0x2] [0x90000000-0xcfffffff], 0x40000000 bytes flags: 0x0 [ 0.000000] reserved.cnt = 0xa [ 0.000000] reserved[0x0] [0x00001000-0x00003aa0], 0x00002aa1 bytes flags: 0x0 [ 0.000000] reserved[0x1] [0x00003aa4-0x0000ba64], 0x00007fc1 bytes flags: 0x0 [ 0.000000] reserved[0x2] [0x0000ba80-0x0000ba9f], 0x00000020 bytes flags: 0x0 [ 0.000000] reserved[0x3] [0x0000bb00-0x0000bb1f], 0x00000020 bytes flags: 0x0 [ 0.000000] reserved[0x4] [0x0000bb80-0x0000bb9f], 0x00000020 bytes flags: 0x0 [ 0.000000] reserved[0x5] [0x0000bc00-0x0000bc1f], 0x00000020 bytes flags: 0x0 [ 0.000000] reserved[0x6] [0x0000bc80-0x0000bdff], 0x00000180 bytes flags: 0x0 [ 0.000000] reserved[0x7] [0x0000c000-0x0000efff], 0x00003000 bytes flags: 0x0 [ 0.000000] reserved[0x8] [0x00010000-0x018313cf], 0x018213d0 bytes flags: 0x0 [ 0.000000] reserved[0x9] [0x01831400-0x032313ff], 0x01a00000 bytes flags: 0x0 [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654 [ 0.000000] memblock_reserve: [0x0000be00-0x0000be1d] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654 [ 0.000000] memblock_reserve: [0x0000be80-0x0000be9d] 
memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884 [ 0.000000] memblock_reserve: [0x0000f000-0x0000ffff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884 [ 0.000000] memblock_reserve: [0x03231400-0x032323ff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1 from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30 [ 0.000000] memblock_reserve: [0x03233000-0x0327afff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_free: [0x03245000-0x03244fff] pcpu_embed_first_chunk+0x7a0/0x884 [ 0.000000] memblock_free: [0x03257000-0x03256fff] pcpu_embed_first_chunk+0x7a0/0x884 [ 0.000000] memblock_free: [0x03269000-0x03268fff] pcpu_embed_first_chunk+0x7a0/0x884 [ 0.000000] memblock_free: [0x0327b000-0x0327afff] pcpu_embed_first_chunk+0x7a0/0x884 [ 0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728 [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec [ 0.000000] memblock_reserve: [0x0000bf00-0x0000bf03] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec [ 0.000000] memblock_reserve: [0x0000bf80-0x0000bf83] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec [ 0.000000] memblock_reserve: [0x03232400-0x0323240f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec [ 0.000000] memblock_reserve: [0x03232480-0x0323248f] 
memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec [ 0.000000] memblock_reserve: [0x03232500-0x0323257f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294 [ 0.000000] memblock_reserve: [0x03232580-0x032325db] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294 [ 0.000000] memblock_reserve: [0x03232600-0x032328ff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294 [ 0.000000] memblock_reserve: [0x03232900-0x03232c03] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294 [ 0.000000] memblock_reserve: [0x03232c80-0x03232d3f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_free: [0x0000f000-0x0000ffff] pcpu_embed_first_chunk+0x838/0x884 [ 0.000000] memblock_free: [0x03231400-0x032323ff] pcpu_embed_first_chunk+0x850/0x884 [ 0.000000] Built 1 zonelists, mobility grouping on. 
Total pages: 523776 [ 0.000000] Kernel command line: console=ttyS0,115200 earlycon [ 0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c [ 0.000000] memblock_reserve: [0x0327b000-0x0329afff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear) [ 0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c [ 0.000000] memblock_reserve: [0x0329b000-0x032aafff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear) [ 0.000000] memblock_reserve: [0x00000000-0x000003ff] trap_init+0x70/0x4e8 [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.000000] Memory: 2045268K/2097152K available (8226K kernel code, 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K cma-reserved, 1835008K highmem) [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 [ 0.000000] rcu: Hierarchical RCU implementation. [ 0.000000] rcu: RCU event tracing is enabled. [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. 
[ 0.000000] NR_IRQS: 256 [ 0.000000] OF: Bad cell count for /rdb [ 0.000000] irq_bcm7038_l1: failed to remap intc L1 registers [ 0.000000] OF: of_irq_init: children remain, but no parents [ 0.000000] random: get_random_bytes called from start_kernel+0x444/0x654 with crng_init=0 [ 0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns, wraps every 8589934590000000ns

And with your patch applied (which unfortunately did not fix the boot), we get the following:

[ 0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost) (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #86 SMP Sun Feb 28 10:04:54 PST 2021 [ 0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200) [ 0.000000] FPU revision is: 00130001 [ 0.000000] memblock_add: [0x00000000-0x0fffffff] early_init_dt_scan_memory+0x160/0x1e0 [ 0.000000] memblock_add: [0x20000000-0x4fffffff] early_init_dt_scan_memory+0x160/0x1e0 [ 0.000000] memblock_add: [0x90000000-0xcfffffff] early_init_dt_scan_memory+0x160/0x1e0 [ 0.000000] MIPS: machine is Broadcom BCM97435SVMB [ 0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0] setup_arch+0x60/0x6a4 [ 0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '') [ 0.000000] printk: bootconsole [ns16550a0] enabled [ 0.000000] memblock_reserve: [0x00010000-0x018313cf] setup_arch+0x200/0x6a4 [ 0.000000] Initrd not found or empty - disabling initrd [ 0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1 from=0x00000000 max_addr=0x00000000 early_init_dt_alloc_memory_arch+0x40/0x84 [ 0.000000] memblock_reserve: [0x00001000-0x00003aa0] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1 from=0x00000000 max_addr=0x00000000 early_init_dt_alloc_memory_arch+0x40/0x84 [ 0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1 from=0x00000000 max_addr=0x00000000 early_init_dt_alloc_memory_arch+0x40/0x84 [ 0.000000]
memblock_reserve: [0x0000ba4c-0x0000ba64] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_reserve: [0x0096a000-0x00969fff] setup_arch+0x404/0x6a4 [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x4e8/0x6a4 [ 0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x4e8/0x6a4 [ 0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x4e8/0x6a4 [ 0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64 bytes. [ 0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases, linesize 32 bytes [ 0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes. 
[ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 [ 0.000000] memblock_reserve: [0x0000c000-0x0000cfff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 [ 0.000000] memblock_reserve: [0x0000d000-0x0000dfff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 [ 0.000000] memblock_reserve: [0x0000e000-0x0000efff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] Zone ranges: [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff] [ 0.000000] HighMem [mem 0x0000000010000000-0x00000000cfffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff] [ 0.000000] node 0: [mem 0x0000000020000000-0x000000004fffffff] [ 0.000000] node 0: [mem 0x0000000090000000-0x00000000cfffffff] [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x00000000cfffffff] [ 0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0 from=0x00000000 max_addr=0x00000000 alloc_node_mem_map.constprop.135+0x6c/0xc8 [ 0.000000] memblock_reserve: [0x01831400-0x032313ff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0 from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98 [ 0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0 from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98 [ 0.000000] memblock_reserve: [0x0000bc80-0x0000bdff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] MEMBLOCK configuration: [ 0.000000] memory size = 0x80000000 reserved size = 0x0322f032 [ 0.000000] memory.cnt = 0x3 [ 0.000000] memory[0x0] [0x00000000-0x0fffffff], 
0x10000000 bytes flags: 0x0 [ 0.000000] memory[0x1] [0x20000000-0x4fffffff], 0x30000000 bytes flags: 0x0 [ 0.000000] memory[0x2] [0x90000000-0xcfffffff], 0x40000000 bytes flags: 0x0 [ 0.000000] reserved.cnt = 0xa [ 0.000000] reserved[0x0] [0x00001000-0x00003aa0], 0x00002aa1 bytes flags: 0x0 [ 0.000000] reserved[0x1] [0x00003aa4-0x0000ba64], 0x00007fc1 bytes flags: 0x0 [ 0.000000] reserved[0x2] [0x0000ba80-0x0000ba9f], 0x00000020 bytes flags: 0x0 [ 0.000000] reserved[0x3] [0x0000bb00-0x0000bb1f], 0x00000020 bytes flags: 0x0 [ 0.000000] reserved[0x4] [0x0000bb80-0x0000bb9f], 0x00000020 bytes flags: 0x0 [ 0.000000] reserved[0x5] [0x0000bc00-0x0000bc1f], 0x00000020 bytes flags: 0x0 [ 0.000000] reserved[0x6] [0x0000bc80-0x0000bdff], 0x00000180 bytes flags: 0x0 [ 0.000000] reserved[0x7] [0x0000c000-0x0000efff], 0x00003000 bytes flags: 0x0 [ 0.000000] reserved[0x8] [0x00010000-0x018313cf], 0x018213d0 bytes flags: 0x0 [ 0.000000] reserved[0x9] [0x01831400-0x032313ff], 0x01a00000 bytes flags: 0x0 [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654 [ 0.000000] memblock_reserve: [0x0000be00-0x0000be1d] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654 [ 0.000000] memblock_reserve: [0x0000be80-0x0000be9d] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884 [ 0.000000] memblock_reserve: [0x0000f000-0x0000ffff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884 [ 0.000000] memblock_reserve: [0x03231400-0x032323ff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1 from=0x01000000 max_addr=0x00000000 
pcpu_dfl_fc_alloc+0x24/0x30 [ 0.000000] memblock_reserve: [0x03233000-0x0327afff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_free: [0x03245000-0x03244fff] pcpu_embed_first_chunk+0x7a0/0x884 [ 0.000000] memblock_free: [0x03257000-0x03256fff] pcpu_embed_first_chunk+0x7a0/0x884 [ 0.000000] memblock_free: [0x03269000-0x03268fff] pcpu_embed_first_chunk+0x7a0/0x884 [ 0.000000] memblock_free: [0x0327b000-0x0327afff] pcpu_embed_first_chunk+0x7a0/0x884 [ 0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728 [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec [ 0.000000] memblock_reserve: [0x0000bf00-0x0000bf03] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec [ 0.000000] memblock_reserve: [0x0000bf80-0x0000bf83] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec [ 0.000000] memblock_reserve: [0x03232400-0x0323240f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec [ 0.000000] memblock_reserve: [0x03232480-0x0323248f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec [ 0.000000] memblock_reserve: [0x03232500-0x0323257f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294 [ 0.000000] memblock_reserve: [0x03232580-0x032325db] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 
pcpu_alloc_first_chunk+0xe0/0x294 [ 0.000000] memblock_reserve: [0x03232600-0x032328ff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294 [ 0.000000] memblock_reserve: [0x03232900-0x03232c03] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294 [ 0.000000] memblock_reserve: [0x03232c80-0x03232d3f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_free: [0x0000f000-0x0000ffff] pcpu_embed_first_chunk+0x838/0x884 [ 0.000000] memblock_free: [0x03231400-0x032323ff] pcpu_embed_first_chunk+0x850/0x884 [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 523776 [ 0.000000] Kernel command line: console=ttyS0,115200 earlycon [ 0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c [ 0.000000] memblock_reserve: [0x0327b000-0x0329afff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear) [ 0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c [ 0.000000] memblock_reserve: [0x0329b000-0x032aafff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear) [ 0.000000] memblock_reserve: [0x00000000-0x000003ff] trap_init+0x70/0x4e8 [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.000000] Memory: 2045268K/2097152K available (8226K kernel code, 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K cma-reserved, 1835008K highmem) [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 [ 0.000000] rcu: Hierarchical RCU implementation. [ 0.000000] rcu: RCU event tracing is enabled. 
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. [ 0.000000] NR_IRQS: 256 [ 0.000000] OF: Bad cell count for /rdb [ 0.000000] irq_bcm7038_l1: failed to remap intc L1 registers [ 0.000000] OF: of_irq_init: children remain, but no parents [ 0.000000] random: get_random_bytes called from start_kernel+0x444/0x654 with crng_init=0 [ 0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns, wraps every 8589934590000000ns With only the revert of f787b0b4502cde50c3583432d6cb9bd8306fc242 ("memblock: do not start bottom-up allocations with kernel_end") and an unmodified arch/mips/kernel/setup.c, this boots successfully: [ 0.000000] Linux version 5.11.0-gf787b0b4502c (florian@locahost) (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #88 SMP Sun Feb 28 10:13:21 PST 2021 [ 0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200) [ 0.000000] FPU revision is: 00130001 [ 0.000000] memblock_add: [0x00000000-0x0fffffff] early_init_dt_scan_memory+0x160/0x1e0 [ 0.000000] memblock_add: [0x20000000-0x4fffffff] early_init_dt_scan_memory+0x160/0x1e0 [ 0.000000] memblock_add: [0x90000000-0xcfffffff] early_init_dt_scan_memory+0x160/0x1e0 [ 0.000000] MIPS: machine is Broadcom BCM97435SVMB [ 0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '') [ 0.000000] printk: bootconsole [ns16550a0] enabled [ 0.000000] memblock_reserve: [0x00aa9600-0x00aac0a0] setup_arch+0x128/0x69c [ 0.000000] memblock_reserve: [0x00010000-0x018313cf] setup_arch+0x1f8/0x69c [ 0.000000] Initrd not found or empty - disabling initrd [ 0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1 from=0x00000000 max_addr=0x00000000 early_init_dt_alloc_memory_arch+0x40/0x84 [ 0.000000] memblock_reserve: [0x01831400-0x01833ea0] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1 from=0x00000000 max_addr=0x00000000 early_init_dt_alloc_memory_arch+0x40/0x84 [ 0.000000] memblock_reserve: [0x01833ea4-0x0183be4b] 
memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1 from=0x00000000 max_addr=0x00000000 early_init_dt_alloc_memory_arch+0x40/0x84 [ 0.000000] memblock_reserve: [0x018313d0-0x018313e8] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_reserve: [0x0096c000-0x0096bfff] setup_arch+0x3fc/0x69c [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c [ 0.000000] memblock_reserve: [0x0183be80-0x0183be9f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c [ 0.000000] memblock_reserve: [0x0183bf00-0x0183bf1f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c [ 0.000000] memblock_reserve: [0x0183bf80-0x0183bf9f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64 bytes. [ 0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases, linesize 32 bytes [ 0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes. 
[ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 [ 0.000000] memblock_reserve: [0x0183c000-0x0183cfff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 [ 0.000000] memblock_reserve: [0x0183d000-0x0183dfff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 [ 0.000000] memblock_reserve: [0x0183e000-0x0183efff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] Zone ranges: [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff] [ 0.000000] HighMem [mem 0x0000000010000000-0x00000000cfffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff] [ 0.000000] node 0: [mem 0x0000000020000000-0x000000004fffffff] [ 0.000000] node 0: [mem 0x0000000090000000-0x00000000cfffffff] [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x00000000cfffffff] [ 0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0 from=0x00000000 max_addr=0x00000000 alloc_node_mem_map.constprop.135+0x6c/0xc8 [ 0.000000] memblock_reserve: [0x0183f000-0x0323efff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0 from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98 [ 0.000000] memblock_reserve: [0x0323f000-0x0323f01f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0 from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98 [ 0.000000] memblock_reserve: [0x0323f080-0x0323f1ff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] MEMBLOCK configuration: [ 0.000000] memory size = 0x80000000 reserved size = 0x0322f032 [ 0.000000] memory.cnt = 0x3 [ 0.000000] memory[0x0] [0x00000000-0x0fffffff], 
0x10000000 bytes flags: 0x0 [ 0.000000] memory[0x1] [0x20000000-0x4fffffff], 0x30000000 bytes flags: 0x0 [ 0.000000] memory[0x2] [0x90000000-0xcfffffff], 0x40000000 bytes flags: 0x0 [ 0.000000] reserved.cnt = 0x8 [ 0.000000] reserved[0x0] [0x00010000-0x018313e8], 0x018213e9 bytes flags: 0x0 [ 0.000000] reserved[0x1] [0x01831400-0x01833ea0], 0x00002aa1 bytes flags: 0x0 [ 0.000000] reserved[0x2] [0x01833ea4-0x0183be4b], 0x00007fa8 bytes flags: 0x0 [ 0.000000] reserved[0x3] [0x0183be80-0x0183be9f], 0x00000020 bytes flags: 0x0 [ 0.000000] reserved[0x4] [0x0183bf00-0x0183bf1f], 0x00000020 bytes flags: 0x0 [ 0.000000] reserved[0x5] [0x0183bf80-0x0183bf9f], 0x00000020 bytes flags: 0x0 [ 0.000000] reserved[0x6] [0x0183c000-0x0323f01f], 0x01a03020 bytes flags: 0x0 [ 0.000000] reserved[0x7] [0x0323f080-0x0323f1ff], 0x00000180 bytes flags: 0x0 [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654 [ 0.000000] memblock_reserve: [0x0323f200-0x0323f21d] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654 [ 0.000000] memblock_reserve: [0x0323f280-0x0323f29d] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884 [ 0.000000] memblock_reserve: [0x03240000-0x03240fff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884 [ 0.000000] memblock_reserve: [0x03241000-0x03241fff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1 from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30 [ 0.000000] memblock_reserve: [0x03242000-0x03289fff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_free: 
[0x03254000-0x03253fff] pcpu_embed_first_chunk+0x7a0/0x884 [ 0.000000] memblock_free: [0x03266000-0x03265fff] pcpu_embed_first_chunk+0x7a0/0x884 [ 0.000000] memblock_free: [0x03278000-0x03277fff] pcpu_embed_first_chunk+0x7a0/0x884 [ 0.000000] memblock_free: [0x0328a000-0x03289fff] pcpu_embed_first_chunk+0x7a0/0x884 [ 0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728 [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec [ 0.000000] memblock_reserve: [0x0323f300-0x0323f303] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec [ 0.000000] memblock_reserve: [0x0323f380-0x0323f383] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec [ 0.000000] memblock_reserve: [0x0323f400-0x0323f40f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec [ 0.000000] memblock_reserve: [0x0323f480-0x0323f48f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec [ 0.000000] memblock_reserve: [0x0323f500-0x0323f57f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294 [ 0.000000] memblock_reserve: [0x0323f580-0x0323f5db] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294 [ 0.000000] memblock_reserve: [0x0323f600-0x0323f8ff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 772 bytes 
align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294 [ 0.000000] memblock_reserve: [0x0323f900-0x0323fc03] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294 [ 0.000000] memblock_reserve: [0x0323fc80-0x0323fd3f] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] memblock_free: [0x03240000-0x03240fff] pcpu_embed_first_chunk+0x838/0x884 [ 0.000000] memblock_free: [0x03241000-0x03241fff] pcpu_embed_first_chunk+0x850/0x884 [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 523776 [ 0.000000] Kernel command line: console=ttyS0,115200 earlycon [ 0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c [ 0.000000] memblock_reserve: [0x0328a000-0x032a9fff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear) [ 0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c [ 0.000000] memblock_reserve: [0x032aa000-0x032b9fff] memblock_alloc_range_nid+0xf8/0x198 [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear) [ 0.000000] memblock_reserve: [0x00000000-0x000003ff] trap_init+0x70/0x4e8 [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.000000] Memory: 2045272K/2097152K available (8226K kernel code, 1078K rwdata, 1336K rodata, 13800K init, 260K bss, 51880K reserved, 0K cma-reserved, 1835008K highmem) [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 [ 0.000000] rcu: Hierarchical RCU implementation. [ 0.000000] rcu: RCU event tracing is enabled. [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. 
[ 0.000000] NR_IRQS: 256
[ 0.000000] irq_bcm7038_l1: registered BCM7038 L1 intc (/rdb/interrupt-controller@41b500, IRQs: 128)
[ 0.000000] irq_brcmstb_l2: registered L2 intc (/rdb/interrupt-controller@403000, parent irq: 52)
[ 0.000000] irq_bcm7120_l2: registered BCM7120 L2 intc (/rdb/interrupt-controller@406780, parent IRQ(s): 2)
[ 0.000000] irq_bcm7120_l2: registered BCM7120 L2 intc (/rdb/interrupt-controller@409480, parent IRQ(s): 3)
[ 0.000000] irq_brcmstb_l2: registered L2 intc (/rdb/interrupt-controller@408440, parent irq: 54)
[ 0.000000] irq_brcmstb_l2: registered L2 intc (/rdb/interrupt-controller@41b000, parent irq: 24)
[ 0.000000] irq_brcmstb_l2: registered L2 intc (/rdb/interrupt-controller@41bd00, parent irq: 25)
[ 0.000000] random: get_random_bytes called from start_kernel+0x444/0x654 with crng_init=0
[ 0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 10882621761 ns
[ 0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns, wraps every 8589934590000000ns

The DTB is located at this offset within vmlinux:

37084: 80aac0a1 0 OBJECT GLOBAL DEFAULT 10 __dtb_bcm97435svmb_end
48909: 80aa9600 0 OBJECT GLOBAL DEFAULT 10 __dtb_bcm97435svmb_begin

0x8000_0000 maps to physical address 0x0 on these MIPS platforms.
-- 
Florian
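Florian's remark that 0x8000_0000 maps to physical 0x0 describes the 32-bit MIPS KSEG0 window. As a quick cross-check, here is a userspace sketch (not kernel code) that translates the two `__dtb_bcm97435svmb_*` symbol values above and reproduces the range reserved by `setup_arch+0x128` in the reverted boot log, [0x00aa9600-0x00aac0a0]. The helpers `kseg0_to_phys()` and `print_phys_range()` are hypothetical; the kernel has its own helpers for this in `asm/addrspace.h` (e.g. `CPHYSADDR()`).

```c
#include <stdint.h>
#include <stdio.h>

/* 32-bit MIPS KSEG0: the unmapped, cached window 0x80000000-0x9fffffff
 * that maps linearly onto physical 0x00000000-0x1fffffff. */
#define KSEG0_BASE 0x80000000u
#define KSEG0_SIZE 0x20000000u

/* Hypothetical helper: store the physical address and return 0 if vaddr
 * lies in KSEG0, return -1 otherwise. */
static int kseg0_to_phys(uint32_t vaddr, uint32_t *paddr)
{
    if (vaddr < KSEG0_BASE || vaddr - KSEG0_BASE >= KSEG0_SIZE)
        return -1;
    *paddr = vaddr - KSEG0_BASE;
    return 0;
}

/* Print the inclusive physical range of a [begin, end) symbol pair in the
 * same style memblock_reserve uses in the boot log. */
static int print_phys_range(uint32_t begin, uint32_t end)
{
    uint32_t pb, pe;

    if (end <= begin || kseg0_to_phys(begin, &pb) ||
        kseg0_to_phys(end - 1, &pe))
        return -1;
    printf("[0x%08x-0x%08x]\n", (unsigned)pb, (unsigned)pe);
    return 0;
}
```

Calling `print_phys_range(0x80aa9600, 0x80aac0a1)` prints `[0x00aa9600-0x00aac0a0]`; the `- 1` turns the exclusive `__dtb_*_end` symbol into the inclusive end that memblock prints.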
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-02-28 18:19 ` Florian Fainelli
@ 2021-02-28 23:08 ` Serge Semin
  2021-03-01 3:50 ` Florian Fainelli
  0 siblings, 1 reply; 38+ messages in thread
From: Serge Semin @ 2021-02-28 23:08 UTC (permalink / raw)
To: Florian Fainelli, Mike Rapoport, Thomas Bogendoerfer
Cc: Serge Semin, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu,
	linux-mips, Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel,
	Michal Hocko, linux-kernel, kernel-team

Hi folks,
What you've got here seems to be a more complicated problem than it
originally looked like. Please see my comments below.

(Note that I've trimmed the parts of the logs that are of no interest
for the discovered problem. Please also note that I don't have any
Broadcom hardware to test the solution suggested below.)

On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> Hi Mike,
>
> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> > Hi Florian,
> >
> > On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>

> >> [...]

> >>
> >> Hi Roman, Thomas and other linux-mips folks,
> >>
> >> Kamal and myself have been unable to boot v5.11 on MIPS since this
> >> commit, reverting it makes our MIPS platforms boot successfully. We do
> >> not see a warning like this one in the commit message, instead what
> >> happens appear to be a corrupted Device Tree which prevents the parsing
> >> of the "rdb" node and leading to the interrupt controllers not being
> >> registered, and the system eventually not booting.
> >>
> >> The Device Tree is built-into the kernel image and resides at
> >> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>
> >> Do you have any idea what could be wrong with MIPS specifically here?

Most likely the problem you've discovered has been there for quite
some time. The patch you are referring to just caused it to be
triggered by extending the early allocation range.
See, before that patch was accepted, the early memory allocations had
been performed in the range:
[kernel_end, RAM_END].
The patch changed that, so the early allocations are now done within
[RAM_START + PAGE_SIZE, RAM_END].

In normal situations it's safe to do that as long as all the critical
memory regions (including the memory residing in the space below the
kernel) have been reserved. But as soon as some memory holding critical
structures hasn't been reserved, the kernel may allocate it, for
instance for early initializations, with unpredictable and most of the
time unpleasant consequences.

> >
> > Apparently there is a memblock allocation in one of the functions called
> > from arch_mem_init() between plat_mem_setup() and
> > early_init_fdt_reserve_self().

Mike, alas, according to the log provided by Florian that's not the
cause of the problem. Please see my considerations below.

> [...]
>
> [ 0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> Feb 28 10:01:50 PST 2021
> [ 0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> [ 0.000000] FPU revision is: 00130001

> [ 0.000000] memblock_add: [0x00000000-0x0fffffff]
> early_init_dt_scan_memory+0x160/0x1e0
> [ 0.000000] memblock_add: [0x20000000-0x4fffffff]
> early_init_dt_scan_memory+0x160/0x1e0
> [ 0.000000] memblock_add: [0x90000000-0xcfffffff]
> early_init_dt_scan_memory+0x160/0x1e0

Here the memory has been added to the memblock allocator.

> [ 0.000000] MIPS: machine is Broadcom BCM97435SVMB
> [ 0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> [ 0.000000] printk: bootconsole [ns16550a0] enabled

> [ 0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> setup_arch+0x128/0x69c

Here the fdt memory has been reserved. (Note that it's built into the
kernel.)
> [ 0.000000] memblock_reserve: [0x00010000-0x018313cf]
> setup_arch+0x1f8/0x69c

Here the kernel itself, together with the built-in dtb, has been
reserved. So far so good.

> [ 0.000000] Initrd not found or empty - disabling initrd

> [ 0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> from=0x00000000 max_addr=0x00000000
> early_init_dt_alloc_memory_arch+0x40/0x84
> [ 0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> memblock_alloc_range_nid+0xf8/0x198
> [ 0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> from=0x00000000 max_addr=0x00000000
> early_init_dt_alloc_memory_arch+0x40/0x84
> [ 0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> memblock_alloc_range_nid+0xf8/0x198

The log above most likely belongs to this call-chain:
setup_arch()
+-> arch_mem_init()
    +-> device_tree_init() - BMIPS specific method
        +-> unflatten_and_copy_device_tree()

In other words, here we've copied the fdt from its original location
[0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
it into [0x00003aa4-0x0000ba4b].

The problem is that a bit later the following call-chain is executed:
setup_arch()
+-> plat_smp_setup()
    +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
        +-> if (!board_ebase_setup)
                board_ebase_setup = &bmips_ebase_setup;

So at the moment the CPU traps are initialized, the bmips_ebase_setup()
method is called. What trap_init() does isn't compatible with the
allocation performed by the unflatten_and_copy_device_tree() method.
See the next comment.
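To see why moving the lower search bound from kernel_end down to PAGE_SIZE is what lands the DT copy at 0x1000, here is a toy userspace model of a bottom-up first-fit search. It is not memblock's real algorithm (mm/memblock.c's `__memblock_find_range_bottom_up()` walks the free ranges and also deals with nodes, flags and mirroring); `struct toy_region` and `toy_alloc_bottom_up()` are hypothetical and only seeded with addresses from the log above: one bank at [0x0, 0x0fffffff] and the kernel reserved at [0x00010000-0x018313cf], taking kernel_end as 0x018313d0 (one past that range).

```c
#include <stddef.h>
#include <stdint.h>

struct toy_region {
    uint32_t base;
    uint32_t size;   /* region covers [base, base + size) */
};

/* align must be a power of two, as memblock alignments are */
static uint32_t toy_align_up(uint32_t x, uint32_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical bottom-up first-fit search inside [start, end): return the
 * lowest aligned address whose [addr, addr + size) avoids every reserved
 * region, or 0 on failure (like memblock, 0 doubles as "not found").
 * rsv[] must be sorted by base and non-overlapping. */
static uint32_t toy_alloc_bottom_up(uint32_t start, uint32_t end,
                                    uint32_t size, uint32_t align,
                                    const struct toy_region *rsv, size_t nr)
{
    uint32_t addr = toy_align_up(start, align);
    size_t i;

    for (i = 0; i < nr; i++) {
        uint32_t rs = rsv[i].base;
        uint32_t re = rsv[i].base + rsv[i].size;

        if (addr + size <= rs)
            break;                          /* fits in the gap before it */
        if (addr < re)
            addr = toy_align_up(re, align); /* overlaps: skip past it */
    }
    if (addr + size > end)
        return 0;
    return addr;
}
```

With start = 0x1000 the first request (10913 bytes, align 0x40) returns 0x1000, and once that block is reserved the second (32680 bytes, align 0x4) returns 0x3aa4, matching the failing log. With start = kernel_end the first request returns 0x01831400, which matches the first early allocation in the reverted log.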
> [ 0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1 > from=0x00000000 max_addr=0x00000000 > early_init_dt_alloc_memory_arch+0x40/0x84 > [ 0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_reserve: [0x0096a000-0x00969fff] > setup_arch+0x3fc/0x69c > [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c > [ 0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c > [ 0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c > [ 0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64 > bytes. > [ 0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases, > linesize 32 bytes > [ 0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes. 
> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 > from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 > [ 0.000000] memblock_reserve: [0x0000c000-0x0000cfff] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 > from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 > [ 0.000000] memblock_reserve: [0x0000d000-0x0000dfff] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 > from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 > [ 0.000000] memblock_reserve: [0x0000e000-0x0000efff] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] Zone ranges: > [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff] > [ 0.000000] HighMem [mem 0x0000000010000000-0x00000000cfffffff] > [ 0.000000] Movable zone start for each node > [ 0.000000] Early memory node ranges > [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff] > [ 0.000000] node 0: [mem 0x0000000020000000-0x000000004fffffff] > [ 0.000000] node 0: [mem 0x0000000090000000-0x00000000cfffffff] > [ 0.000000] Initmem setup node 0 [mem > 0x0000000000000000-0x00000000cfffffff] > [ 0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0 > from=0x00000000 max_addr=0x00000000 > alloc_node_mem_map.constprop.135+0x6c/0xc8 > [ 0.000000] memblock_reserve: [0x01831400-0x032313ff] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0 > from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98 > [ 0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0 > from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98 > [ 0.000000] memblock_reserve: [0x0000bc80-0x0000bdff] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] MEMBLOCK configuration: > [ 0.000000] memory size = 0x80000000 reserved size = 0x0322f032 > [ 
0.000000] memory.cnt = 0x3 > [ 0.000000] memory[0x0] [0x00000000-0x0fffffff], 0x10000000 > bytes flags: 0x0 > [ 0.000000] memory[0x1] [0x20000000-0x4fffffff], 0x30000000 > bytes flags: 0x0 > [ 0.000000] memory[0x2] [0x90000000-0xcfffffff], 0x40000000 > bytes flags: 0x0 > [ 0.000000] reserved.cnt = 0xa > [ 0.000000] reserved[0x0] [0x00001000-0x00003aa0], 0x00002aa1 > bytes flags: 0x0 > [ 0.000000] reserved[0x1] [0x00003aa4-0x0000ba64], 0x00007fc1 > bytes flags: 0x0 > [ 0.000000] reserved[0x2] [0x0000ba80-0x0000ba9f], 0x00000020 > bytes flags: 0x0 > [ 0.000000] reserved[0x3] [0x0000bb00-0x0000bb1f], 0x00000020 > bytes flags: 0x0 > [ 0.000000] reserved[0x4] [0x0000bb80-0x0000bb9f], 0x00000020 > bytes flags: 0x0 > [ 0.000000] reserved[0x5] [0x0000bc00-0x0000bc1f], 0x00000020 > bytes flags: 0x0 > [ 0.000000] reserved[0x6] [0x0000bc80-0x0000bdff], 0x00000180 > bytes flags: 0x0 > [ 0.000000] reserved[0x7] [0x0000c000-0x0000efff], 0x00003000 > bytes flags: 0x0 > [ 0.000000] reserved[0x8] [0x00010000-0x018313cf], 0x018213d0 > bytes flags: 0x0 > [ 0.000000] reserved[0x9] [0x01831400-0x032313ff], 0x01a00000 > bytes flags: 0x0 > [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654 > [ 0.000000] memblock_reserve: [0x0000be00-0x0000be1d] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654 > [ 0.000000] memblock_reserve: [0x0000be80-0x0000be9d] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 > from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884 > [ 0.000000] memblock_reserve: [0x0000f000-0x0000ffff] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884 > [ 0.000000] memblock_reserve: 
[0x03231400-0x032323ff] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1 > from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30 > [ 0.000000] memblock_reserve: [0x03233000-0x0327afff] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_free: [0x03245000-0x03244fff] > pcpu_embed_first_chunk+0x7a0/0x884 > [ 0.000000] memblock_free: [0x03257000-0x03256fff] > pcpu_embed_first_chunk+0x7a0/0x884 > [ 0.000000] memblock_free: [0x03269000-0x03268fff] > pcpu_embed_first_chunk+0x7a0/0x884 > [ 0.000000] memblock_free: [0x0327b000-0x0327afff] > pcpu_embed_first_chunk+0x7a0/0x884 > [ 0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728 > [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec > [ 0.000000] memblock_reserve: [0x0000bf00-0x0000bf03] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec > [ 0.000000] memblock_reserve: [0x0000bf80-0x0000bf83] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec > [ 0.000000] memblock_reserve: [0x03232400-0x0323240f] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec > [ 0.000000] memblock_reserve: [0x03232480-0x0323248f] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec > [ 0.000000] memblock_reserve: [0x03232500-0x0323257f] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 
pcpu_alloc_first_chunk+0x8c/0x294 > [ 0.000000] memblock_reserve: [0x03232580-0x032325db] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294 > [ 0.000000] memblock_reserve: [0x03232600-0x032328ff] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294 > [ 0.000000] memblock_reserve: [0x03232900-0x03232c03] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294 > [ 0.000000] memblock_reserve: [0x03232c80-0x03232d3f] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] memblock_free: [0x0000f000-0x0000ffff] > pcpu_embed_first_chunk+0x838/0x884 > [ 0.000000] memblock_free: [0x03231400-0x032323ff] > pcpu_embed_first_chunk+0x850/0x884 > [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 523776 > [ 0.000000] Kernel command line: console=ttyS0,115200 earlycon > [ 0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c > [ 0.000000] memblock_reserve: [0x0327b000-0x0329afff] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 > bytes, linear) > [ 0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1 > from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c > [ 0.000000] memblock_reserve: [0x0329b000-0x032aafff] > memblock_alloc_range_nid+0xf8/0x198 > [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 > bytes, linear) > [ 0.000000] memblock_reserve: [0x00000000-0x000003ff] > trap_init+0x70/0x4e8 Most likely someplace here the corruption has happened. 
The log above has just reserved the memory for the NMI/reset vectors:
arch/mips/kernel/traps.c: trap_init(void): Line 2373.
But then the board_ebase_setup() pointer is dereferenced and called. It
was initialized with bmips_ebase_setup() earlier, which overwrites the
ebase variable with 0x80001000, as this is a CPU_BMIPS5000 CPU.
So any further calls of functions like set_handler(),
set_except_vector(), set_vi_srs_handler(), etc. may corrupt the memory
above 0x80001000, which, as we have discovered, belongs to the fdt and
the unflattened device tree.

> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [ 0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> cma-reserved, 1835008K highmem)
> [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> [ 0.000000] rcu: Hierarchical RCU implementation.
> [ 0.000000] rcu: RCU event tracing is enabled.
> [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> is 25 jiffies.
> [ 0.000000] NR_IRQS: 256
> [ 0.000000] OF: Bad cell count for /rdb
> [ 0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> [ 0.000000] OF: of_irq_init: children remain, but no parents

This is the first place where a consequence of the corruption pops up.
Luckily it's just the "Bad cell count" error. We could have gotten a
much less obvious log here, or even a crash somewhere further on...

> [ 0.000000] random: get_random_bytes called from
> start_kernel+0x444/0x654 with crng_init=0
> [ 0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> wraps every 8589934590000000ns
>
> and with your patch applied which unfortunately did not work we have the
> following:
>
> [...]
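The conflict described here boils down to two address ranges overlapping. The sketch below assumes, as the workaround patch that follows does, that the BMIPS5000 vector area starts at physical 0x1000 (ebase 0x80001000 in KSEG0) and spans 0x100 * 64 bytes, and checks it against the two device-tree allocations from the failing log. All names are hypothetical and the ranges are written half-open (inclusive log end + 1).

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open physical range [start, end). */
struct phys_range {
    uint32_t start;
    uint32_t end;
};

/* Two half-open ranges overlap iff each one starts before the other ends. */
static bool ranges_overlap(struct phys_range a, struct phys_range b)
{
    return a.start < b.end && b.start < a.end;
}

/* Assumed BMIPS5000 vector area: phys 0x1000, 0x100 * 64 bytes long. */
static const struct phys_range bmips5000_vectors = { 0x1000u, 0x1000u + 0x100u * 64u };

/* Ranges taken from the failing boot log (inclusive end + 1). */
static const struct phys_range fdt_copy   = { 0x00001000u, 0x00003aa1u };
static const struct phys_range dt_unflat  = { 0x00003aa4u, 0x0000ba4cu };
static const struct phys_range kernel_img = { 0x00010000u, 0x018313d0u };
```

Both device-tree ranges intersect [0x1000, 0x5000), while the kernel image starting at 0x10000 does not, which fits the observation that only the device tree ends up corrupted.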
So a patch like this should work around the corruption:

--- a/arch/mips/bmips/setup.c
+++ b/arch/mips/bmips/setup.c
@@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
 
 	__dt_setup_arch(dtb);
 
+	memblock_reserve(0x0, 0x1000 + 0x100*64);
+
 	for (q = bmips_quirk_list; q->quirk_fn; q++) {
 		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
 					     q->compatible)) {

But the main question is how to fix the problem in general. At least for
Broadcom CPUs the reservation needs to be performed before
device_tree_init() is called, since the latter is the very first method
that starts allocating from memblock. So the best candidate is
plat_mem_setup(), with the reservation done right after the memory is
added to the memblock allocator by the __dt_setup_arch() call.

In addition, we need to take into account the amount of memory each
type of Broadcom CPU needs for the exception vectors. So a function like
this could be used to reserve the exception vectors memory (marked
__init, since it calls memblock_reserve() and would only be called from
plat_mem_setup()):

static void __init bmips_ebase_reserve(void)
{
	phys_addr_t base, size = VECTORSPACING*64;

	switch (current_cpu_type()) {
	case CPU_BMIPS4350:
		return;
	case CPU_BMIPS3300:
	case CPU_BMIPS4380:
		base = 0x0400;
		break;
	case CPU_BMIPS5000:
		base = 0x1000;
		break;
	default:
		return;
	}

	memblock_reserve(base, size);
}

Though I am not sure it's correct. At least on P5600 the vector spacing
is configurable.

Anyway, all of that concerns only the Broadcom CPUs. But we may run into
the same problem on other platforms whose developers weren't careful
enough to reserve all the critical memory sections in the platform code,
especially now that Roman's patch has been merged into the kernel.

-Sergey

>
> [...]
> -- 
> Florian
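Serge's `bmips_ebase_reserve()` sketch is essentially a lookup from CPU type to a vector-area base plus a size derived from the vector spacing. The userspace model below is a hypothetical illustration: the enum only mirrors the kernel's `CPU_BMIPS*` names, and `toy_ebase_area()`/`toy_covers()` are made up. It takes the spacing as a parameter, since he notes it can be configurable on some cores, so the computed area can be compared with the fixed `memblock_reserve(0x0, 0x1000 + 0x100*64)` workaround.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel's CPU_BMIPS* cpu types (hypothetical enum). */
enum toy_cpu { TOY_BMIPS3300, TOY_BMIPS4350, TOY_BMIPS4380, TOY_BMIPS5000, TOY_OTHER };

/* Fill [*base, *base + *size) with the area to reserve for the relocated
 * exception vectors, following Serge's switch; return false when his
 * sketch reserves nothing. */
static bool toy_ebase_area(enum toy_cpu cpu, uint32_t spacing,
                           uint32_t *base, uint32_t *size)
{
    switch (cpu) {
    case TOY_BMIPS3300:
    case TOY_BMIPS4380:
        *base = 0x0400;         /* ebase 0x80000400 */
        break;
    case TOY_BMIPS5000:
        *base = 0x1000;         /* ebase 0x80001000 */
        break;
    case TOY_BMIPS4350:         /* the sketch returns early here */
    default:
        return false;
    }
    *size = spacing * 64;
    return true;
}

/* Does the reservation [rbase, rbase + rsize) fully cover [base, base + size)? */
static bool toy_covers(uint32_t rbase, uint32_t rsize, uint32_t base, uint32_t size)
{
    return rbase <= base && base + size <= rbase + rsize;
}
```

With a spacing of 0x100 the BMIPS5000 area is [0x1000, 0x5000), which the hard-coded workaround covers; if the spacing were 0x200, the area would grow to [0x1000, 0x9000) and the fixed reservation would fall short, which is one argument for deriving the size per CPU as in the sketch.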
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end 2021-02-28 23:08 ` Serge Semin @ 2021-03-01 3:50 ` Florian Fainelli 2021-03-01 9:22 ` Serge Semin 2021-03-01 9:45 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Mike Rapoport 0 siblings, 2 replies; 38+ messages in thread From: Florian Fainelli @ 2021-03-01 3:50 UTC (permalink / raw) To: Serge Semin, Mike Rapoport, Thomas Bogendoerfer Cc: Serge Semin, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu, Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team, open list:BROADCOM BMIPS MIPS ARCHITECTURE Hi Serge, On 2/28/2021 3:08 PM, Serge Semin wrote: > Hi folks, > What you've got here seems a more complicated problem than it > could originally look like. Please, see my comments below. > > (Note I've discarded some of the email logs, which of no interest > to the discovered problem. Please also note that I haven't got any > Broadcom hardware to test out a solution suggested below.) > > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote: >> Hi Mike, >> >> On 2/28/2021 1:00 AM, Mike Rapoport wrote: >>> Hi Florian, >>> >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote: >>>> > >>>> [...] > >>>> >>>> Hi Roman, Thomas and other linux-mips folks, >>>> >>>> Kamal and myself have been unable to boot v5.11 on MIPS since this >>>> commit, reverting it makes our MIPS platforms boot successfully. We do >>>> not see a warning like this one in the commit message, instead what >>>> happens appear to be a corrupted Device Tree which prevents the parsing >>>> of the "rdb" node and leading to the interrupt controllers not being >>>> registered, and the system eventually not booting. >>>> >>>> The Device Tree is built-into the kernel image and resides at >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts. >>>> >>>> Do you have any idea what could be wrong with MIPS specifically here? 
> > Most likely the problem you've discovered has been there for quite > some time. The patch you are referring to just caused it to be > triggered by extending the early allocation range. See before that > patch was accepted the early memory allocations had been performed > in the range: > [kernel_end, RAM_END]. > The patch changed that, so the early allocations are done within > [RAM_START + PAGE_SIZE, RAM_END]. > > In normal situations it's safe to do that as long as all the critical > memory regions (including the memory residing a space below the > kernel) have been reserved. But as soon as a memory with some critical > structures haven't been reserved, the kernel may allocate it to be used > for instance for early initializations with obviously unpredictable but > most of the times unpleasant consequences. > >>> >>> Apparently there is a memblock allocation in one of the functions called >>> from arch_mem_init() between plat_mem_setup() and >>> early_init_fdt_reserve_self(). > > Mike, alas according to the log provided by Florian that's not the reason > of the problem. Please, see my considerations below. > >> [...] >> >> [ 0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost) >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun >> Feb 28 10:01:50 PST 2021 >> [ 0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200) >> [ 0.000000] FPU revision is: 00130001 > >> [ 0.000000] memblock_add: [0x00000000-0x0fffffff] >> early_init_dt_scan_memory+0x160/0x1e0 >> [ 0.000000] memblock_add: [0x20000000-0x4fffffff] >> early_init_dt_scan_memory+0x160/0x1e0 >> [ 0.000000] memblock_add: [0x90000000-0xcfffffff] >> early_init_dt_scan_memory+0x160/0x1e0 > > Here the memory has been added to the memblock allocator. 
> >> [ 0.000000] MIPS: machine is Broadcom BCM97435SVMB >> [ 0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '') >> [ 0.000000] printk: bootconsole [ns16550a0] enabled > >> [ 0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0] >> setup_arch+0x128/0x69c > > Here the fdt memory has been reserved. (Note it's built into the > kernel.) > >> [ 0.000000] memblock_reserve: [0x00010000-0x018313cf] >> setup_arch+0x1f8/0x69c > > Here the kernel itself together with built-in dtb have been reserved. > So far so good. > >> [ 0.000000] Initrd not found or empty - disabling initrd > >> [ 0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1 >> from=0x00000000 max_addr=0x00000000 >> early_init_dt_alloc_memory_arch+0x40/0x84 >> [ 0.000000] memblock_reserve: [0x00001000-0x00003aa0] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1 >> from=0x00000000 max_addr=0x00000000 >> early_init_dt_alloc_memory_arch+0x40/0x84 >> [ 0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b] >> memblock_alloc_range_nid+0xf8/0x198 > > The log above most likely belongs to the call-chain: > setup_arch() > +-> arch_mem_init() > +-> device_tree_init() - BMIPS specific method > +-> unflatten_and_copy_device_tree() > > So to speak here we've copied the fdt from the original space > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened > it to [0x00003aa4-0x0000ba4b]. > > The problem is that a bit later the next call-chain is performed: > setup_arch() > +-> plat_smp_setup() > +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops(); > +-> if (!board_ebase_setup) > board_ebase_setup = &bmips_ebase_setup; > > So at the moment of the CPU traps initialization the bmips_ebase_setup() > method is called. What trap_init() does isn't compatible with the > allocation performed by the unflatten_and_copy_device_tree() method. > See the next comment. 
> >> [ 0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1 >> from=0x00000000 max_addr=0x00000000 >> early_init_dt_alloc_memory_arch+0x40/0x84 >> [ 0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_reserve: [0x0096a000-0x00969fff] >> setup_arch+0x3fc/0x69c >> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c >> [ 0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c >> [ 0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c >> [ 0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64 >> bytes. >> [ 0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases, >> linesize 32 bytes >> [ 0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes. 
>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 >> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 >> [ 0.000000] memblock_reserve: [0x0000c000-0x0000cfff] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 >> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 >> [ 0.000000] memblock_reserve: [0x0000d000-0x0000dfff] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 >> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 >> [ 0.000000] memblock_reserve: [0x0000e000-0x0000efff] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] Zone ranges: >> [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff] >> [ 0.000000] HighMem [mem 0x0000000010000000-0x00000000cfffffff] >> [ 0.000000] Movable zone start for each node >> [ 0.000000] Early memory node ranges >> [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff] >> [ 0.000000] node 0: [mem 0x0000000020000000-0x000000004fffffff] >> [ 0.000000] node 0: [mem 0x0000000090000000-0x00000000cfffffff] >> [ 0.000000] Initmem setup node 0 [mem >> 0x0000000000000000-0x00000000cfffffff] >> [ 0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0 >> from=0x00000000 max_addr=0x00000000 >> alloc_node_mem_map.constprop.135+0x6c/0xc8 >> [ 0.000000] memblock_reserve: [0x01831400-0x032313ff] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0 >> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98 >> [ 0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0 >> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98 >> [ 0.000000] memblock_reserve: [0x0000bc80-0x0000bdff] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] MEMBLOCK configuration: >> [ 0.000000] memory size = 0x80000000 
reserved size = 0x0322f032 >> [ 0.000000] memory.cnt = 0x3 >> [ 0.000000] memory[0x0] [0x00000000-0x0fffffff], 0x10000000 >> bytes flags: 0x0 >> [ 0.000000] memory[0x1] [0x20000000-0x4fffffff], 0x30000000 >> bytes flags: 0x0 >> [ 0.000000] memory[0x2] [0x90000000-0xcfffffff], 0x40000000 >> bytes flags: 0x0 >> [ 0.000000] reserved.cnt = 0xa >> [ 0.000000] reserved[0x0] [0x00001000-0x00003aa0], 0x00002aa1 >> bytes flags: 0x0 >> [ 0.000000] reserved[0x1] [0x00003aa4-0x0000ba64], 0x00007fc1 >> bytes flags: 0x0 >> [ 0.000000] reserved[0x2] [0x0000ba80-0x0000ba9f], 0x00000020 >> bytes flags: 0x0 >> [ 0.000000] reserved[0x3] [0x0000bb00-0x0000bb1f], 0x00000020 >> bytes flags: 0x0 >> [ 0.000000] reserved[0x4] [0x0000bb80-0x0000bb9f], 0x00000020 >> bytes flags: 0x0 >> [ 0.000000] reserved[0x5] [0x0000bc00-0x0000bc1f], 0x00000020 >> bytes flags: 0x0 >> [ 0.000000] reserved[0x6] [0x0000bc80-0x0000bdff], 0x00000180 >> bytes flags: 0x0 >> [ 0.000000] reserved[0x7] [0x0000c000-0x0000efff], 0x00003000 >> bytes flags: 0x0 >> [ 0.000000] reserved[0x8] [0x00010000-0x018313cf], 0x018213d0 >> bytes flags: 0x0 >> [ 0.000000] reserved[0x9] [0x01831400-0x032313ff], 0x01a00000 >> bytes flags: 0x0 >> [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654 >> [ 0.000000] memblock_reserve: [0x0000be00-0x0000be1d] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654 >> [ 0.000000] memblock_reserve: [0x0000be80-0x0000be9d] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 >> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884 >> [ 0.000000] memblock_reserve: [0x0000f000-0x0000ffff] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 
pcpu_embed_first_chunk+0x5a4/0x884 >> [ 0.000000] memblock_reserve: [0x03231400-0x032323ff] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1 >> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30 >> [ 0.000000] memblock_reserve: [0x03233000-0x0327afff] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_free: [0x03245000-0x03244fff] >> pcpu_embed_first_chunk+0x7a0/0x884 >> [ 0.000000] memblock_free: [0x03257000-0x03256fff] >> pcpu_embed_first_chunk+0x7a0/0x884 >> [ 0.000000] memblock_free: [0x03269000-0x03268fff] >> pcpu_embed_first_chunk+0x7a0/0x884 >> [ 0.000000] memblock_free: [0x0327b000-0x0327afff] >> pcpu_embed_first_chunk+0x7a0/0x884 >> [ 0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728 >> [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec >> [ 0.000000] memblock_reserve: [0x0000bf00-0x0000bf03] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec >> [ 0.000000] memblock_reserve: [0x0000bf80-0x0000bf83] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec >> [ 0.000000] memblock_reserve: [0x03232400-0x0323240f] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec >> [ 0.000000] memblock_reserve: [0x03232480-0x0323248f] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec >> [ 0.000000] memblock_reserve: [0x03232500-0x0323257f] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] 
memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294 >> [ 0.000000] memblock_reserve: [0x03232580-0x032325db] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294 >> [ 0.000000] memblock_reserve: [0x03232600-0x032328ff] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294 >> [ 0.000000] memblock_reserve: [0x03232900-0x03232c03] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294 >> [ 0.000000] memblock_reserve: [0x03232c80-0x03232d3f] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] memblock_free: [0x0000f000-0x0000ffff] >> pcpu_embed_first_chunk+0x838/0x884 >> [ 0.000000] memblock_free: [0x03231400-0x032323ff] >> pcpu_embed_first_chunk+0x850/0x884 >> [ 0.000000] Built 1 zonelists, mobility grouping on. 
Total pages: 523776 >> [ 0.000000] Kernel command line: console=ttyS0,115200 earlycon >> [ 0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c >> [ 0.000000] memblock_reserve: [0x0327b000-0x0329afff] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 >> bytes, linear) >> [ 0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1 >> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c >> [ 0.000000] memblock_reserve: [0x0329b000-0x032aafff] >> memblock_alloc_range_nid+0xf8/0x198 >> [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 >> bytes, linear) > >> [ 0.000000] memblock_reserve: [0x00000000-0x000003ff] >> trap_init+0x70/0x4e8 > > Most likely someplace here the corruption has happened. The log above > has just reserved a memory for NMI/reset vectors: > arch/mips/kernel/traps.c: trap_init(void): Line 2373. > > But then the board_ebase_setup() pointer is dereferenced and called, > which has been initialized with bmips_ebase_setup() earlier and which > overwrites the ebase variable with: 0x80001000 as this is > CPU_BMIPS5000 CPU. So any further calls of the functions like > set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a > corruption of the memory above 0x80001000, which as we have discovered > belongs to fdt and unflattened device tree. > >> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off >> [ 0.000000] Memory: 2045268K/2097152K available (8226K kernel code, >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K >> cma-reserved, 1835008K highmem) >> [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 >> [ 0.000000] rcu: Hierarchical RCU implementation. >> [ 0.000000] rcu: RCU event tracing is enabled. >> [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay >> is 25 jiffies. 
>> [ 0.000000] NR_IRQS: 256 > >> [ 0.000000] OF: Bad cell count for /rdb >> [ 0.000000] irq_bcm7038_l1: failed to remap intc L1 registers >> [ 0.000000] OF: of_irq_init: children remain, but no parents > > So here is the first time we have got the consequence of the corruption > popped up. Luckily it's just the "Bad cells count" error. We could have > got much less obvious log here up to getting a crash at some place > further... > >> [ 0.000000] random: get_random_bytes called from >> start_kernel+0x444/0x654 with crng_init=0 >> [ 0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns, >> wraps every 8589934590000000ns > >> >> and with your patch applied which unfortunately did not work we have the >> following: >> >> [...] > > So a patch like this shall workaround the corruption: > > --- a/arch/mips/bmips/setup.c > +++ b/arch/mips/bmips/setup.c > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void) > > __dt_setup_arch(dtb); > > + memblock_reserve(0x0, 0x1000 + 0x100*64); > + > for (q = bmips_quirk_list; q->quirk_fn; q++) { > if (of_flat_dt_is_compatible(of_get_flat_dt_root(), > q->compatible)) { This patch works, thanks a lot for the troubleshooting and analysis! 
How about the following? It works as well and should be more generic, since it does not require each architecture to provide an appropriate call to memblock_reserve():

diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e0352958e2f7..b0a173b500e8 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2367,10 +2367,7 @@ void __init trap_init(void)
 
 	if (!cpu_has_mips_r2_r6) {
 		ebase = CAC_BASE;
-		ebase_pa = virt_to_phys((void *)ebase);
 		vec_size = 0x400;
-
-		memblock_reserve(ebase_pa, vec_size);
 	} else {
 		if (cpu_has_veic || cpu_has_vint)
 			vec_size = 0x200 + VECTORSPACING*64;
@@ -2410,6 +2407,14 @@ void __init trap_init(void)
 
 	if (board_ebase_setup)
 		board_ebase_setup();
+
+	/* board_ebase_setup() can change the exception base address
+	 * reserve it now after changes were made.
+	 */
+	if (!cpu_has_mips_r2_r6) {
+		ebase_pa = virt_to_phys((void *)ebase);
+		memblock_reserve(ebase_pa, vec_size);
+	}
 	per_cpu_trap_init(true);
 	memblock_set_bottom_up(false);
 
--
Florian
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end 2021-03-01 3:50 ` Florian Fainelli @ 2021-03-01 9:22 ` Serge Semin 2021-03-02 4:09 ` Florian Fainelli 2021-03-02 4:19 ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli 2021-03-01 9:45 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Mike Rapoport 1 sibling, 2 replies; 38+ messages in thread From: Serge Semin @ 2021-03-01 9:22 UTC (permalink / raw) To: Florian Fainelli, Mike Rapoport, Thomas Bogendoerfer Cc: Serge Semin, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu, Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team, open list:BROADCOM BMIPS MIPS ARCHITECTURE On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote: > Hi Serge, > > On 2/28/2021 3:08 PM, Serge Semin wrote: > > Hi folks, > > What you've got here seems a more complicated problem than it > > could originally look like. Please, see my comments below. > > > > (Note I've discarded some of the email logs, which of no interest > > to the discovered problem. Please also note that I haven't got any > > Broadcom hardware to test out a solution suggested below.) > > > > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote: > >> Hi Mike, > >> > >> On 2/28/2021 1:00 AM, Mike Rapoport wrote: > >>> Hi Florian, > >>> > >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote: > >>>> > > > >>>> [...] > > > >>>> > >>>> Hi Roman, Thomas and other linux-mips folks, > >>>> > >>>> Kamal and myself have been unable to boot v5.11 on MIPS since this > >>>> commit, reverting it makes our MIPS platforms boot successfully. 
> > [...]
> >> [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay > >> is 25 jiffies. > >> [ 0.000000] NR_IRQS: 256 > > > >> [ 0.000000] OF: Bad cell count for /rdb > >> [ 0.000000] irq_bcm7038_l1: failed to remap intc L1 registers > >> [ 0.000000] OF: of_irq_init: children remain, but no parents > > > > So here is the first time we have got the consequence of the corruption > > popped up. Luckily it's just the "Bad cells count" error. We could have > > got much less obvious log here up to getting a crash at some place > > further... > > > >> [ 0.000000] random: get_random_bytes called from > >> start_kernel+0x444/0x654 with crng_init=0 > >> [ 0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns, > >> wraps every 8589934590000000ns > > > >> > >> and with your patch applied which unfortunately did not work we have the > >> following: > >> > >> [...] > > > > So a patch like this shall workaround the corruption: > > > > --- a/arch/mips/bmips/setup.c > > +++ b/arch/mips/bmips/setup.c > > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void) > > > > __dt_setup_arch(dtb); > > > > + memblock_reserve(0x0, 0x1000 + 0x100*64); > > + > > for (q = bmips_quirk_list; q->quirk_fn; q++) { > > if (of_flat_dt_is_compatible(of_get_flat_dt_root(), > > q->compatible)) { > > This patch works, thanks a lot for the troubleshooting and analysis! How > about the following which would be more generic and works as well and > should be more universal since it does not require each architecture to > provide an appropriate call to memblock_reserve():

Hm, are you sure it's working? If so, my analysis wasn't quite correct.
My suggestion was based on the trace of memory initializations,
allocations and reservations. Here is the sequence of the most crucial
of them:
1) Memblock initialization:
   start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch()
   (This is where I suggested placing the exception memory reservation.)
2) Base FDT memory reservation:
   start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self()
3) Parsing of the FDT "reserved-memory" nodes and reservation of the
   corresponding memory ranges:
   start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem()
4) Reservation of the kernel itself and of some critical sections like
   initrd and crash-kernel:
   start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()...
5) Copying and unflattening of the device tree built into the kernel
   (BMIPS platform code):
   start_kernel()->setup_arch()->arch_mem_init()->device_tree_init()
   This is the very first time an allocation from the memblock pool is
   performed. Since we haven't reserved the memory for the exception
   vectors yet, the memblock allocator is free to return that range for
   any other use. Needless to say, if we later use that memory without
   consulting memblock, we may (and in our case do) get into trouble.
6) Many arbitrary early memblock allocations for kernel use, made before
   the buddy and sl*b allocators are up and running...
   Note that even if, by some lucky chance, the allocations made in 5)
   didn't overlap the exception memory, here the chances of an overlap
   are much higher, with obviously fatal consequences once two users
   access the same range independently.
7) Trap/exception vector initialization and, only now, the memory
   reservation for the vectors:
   start_kernel()->trap_init()
   Only at this point do we get to reserve the memory for the vectors.
8) Initialization of the buddy/sl*b allocators:
   start_kernel()->mm_init()->...mem_init()...

A lot of allocations are done in 5) and 6) before trap_init() is called
in 7). You can see them in your log. That's why I doubt that your patch
really works. Most likely you forgot to revert the workaround I
suggested in the previous message. Could you make sure you didn't, and
re-test your patch? If it still works, then I might have confused
something, and it's strange that my patch worked in the first place...
Some food for thought for everyone (Thomas, Mark, please join the
discussion). What we've got here is a somewhat bigger problem. AFAICS,
if bottom-up allocation is enabled (as it is in our case),
memblock_find_in_range_node() only allocates above the very first
PAGE_SIZE chunk of memory (see that function's code for details). So we
are currently on the safe side for some older MIPS platforms, whose
vectors sit within that first page. But platforms with VEIC/VINT may run
into the same trouble if they don't reserve the exception memory early
enough, before the kernel starts making arbitrary allocations from
memblock. So we either need to provide a generic workaround for that,
or make sure each platform reserves its vectors itself, for instance in
the plat_mem_setup() method.

-Sergey

>
> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> index e0352958e2f7..b0a173b500e8 100644
> --- a/arch/mips/kernel/traps.c
> +++ b/arch/mips/kernel/traps.c
> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
>
>  	if (!cpu_has_mips_r2_r6) {
>  		ebase = CAC_BASE;
> -		ebase_pa = virt_to_phys((void *)ebase);
>  		vec_size = 0x400;
> -
> -		memblock_reserve(ebase_pa, vec_size);
>  	} else {
>  		if (cpu_has_veic || cpu_has_vint)
>  			vec_size = 0x200 + VECTORSPACING*64;
> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
>
>  	if (board_ebase_setup)
>  		board_ebase_setup();
> +
> +	/* board_ebase_setup() can change the exception base address
> +	 * reserve it now after changes were made.
> +	 */
> +	if (!cpu_has_mips_r2_r6) {
> +		ebase_pa = virt_to_phys((void *)ebase);
> +		memblock_reserve(ebase_pa, vec_size);
> +	}
>  	per_cpu_trap_init(true);
>  	memblock_set_bottom_up(false);
> --
> Florian
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end 2021-03-01 9:22 ` Serge Semin @ 2021-03-02 4:09 ` Florian Fainelli 2021-03-02 13:26 ` Serge Semin 2021-03-02 4:19 ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli 1 sibling, 1 reply; 38+ messages in thread From: Florian Fainelli @ 2021-03-02 4:09 UTC (permalink / raw) To: Serge Semin, Mike Rapoport, Thomas Bogendoerfer Cc: Serge Semin, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu, Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team, open list:BROADCOM BMIPS MIPS ARCHITECTURE On 3/1/2021 1:22 AM, Serge Semin wrote: > On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote: >> Hi Serge, >> >> On 2/28/2021 3:08 PM, Serge Semin wrote: >>> Hi folks, >>> What you've got here seems a more complicated problem than it >>> could originally look like. Please, see my comments below. >>> >>> (Note I've discarded some of the email logs, which of no interest >>> to the discovered problem. Please also note that I haven't got any >>> Broadcom hardware to test out a solution suggested below.) >>> >>> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote: >>>> Hi Mike, >>>> >>>> On 2/28/2021 1:00 AM, Mike Rapoport wrote: >>>>> Hi Florian, >>>>> >>>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote: >>>>>> >>> >>>>>> [...] >>> >>>>>> >>>>>> Hi Roman, Thomas and other linux-mips folks, >>>>>> >>>>>> Kamal and myself have been unable to boot v5.11 on MIPS since this >>>>>> commit, reverting it makes our MIPS platforms boot successfully. We do >>>>>> not see a warning like this one in the commit message, instead what >>>>>> happens appear to be a corrupted Device Tree which prevents the parsing >>>>>> of the "rdb" node and leading to the interrupt controllers not being >>>>>> registered, and the system eventually not booting. 
>>>>>> >>>>>> The Device Tree is built-into the kernel image and resides at >>>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts. >>>>>> >>>>>> Do you have any idea what could be wrong with MIPS specifically here? >>> >>> Most likely the problem you've discovered has been there for quite >>> some time. The patch you are referring to just caused it to be >>> triggered by extending the early allocation range. See before that >>> patch was accepted the early memory allocations had been performed >>> in the range: >>> [kernel_end, RAM_END]. >>> The patch changed that, so the early allocations are done within >>> [RAM_START + PAGE_SIZE, RAM_END]. >>> >>> In normal situations it's safe to do that as long as all the critical >>> memory regions (including the memory residing a space below the >>> kernel) have been reserved. But as soon as a memory with some critical >>> structures haven't been reserved, the kernel may allocate it to be used >>> for instance for early initializations with obviously unpredictable but >>> most of the times unpleasant consequences. >>> >>>>> >>>>> Apparently there is a memblock allocation in one of the functions called >>>>> from arch_mem_init() between plat_mem_setup() and >>>>> early_init_fdt_reserve_self(). >>> >>> Mike, alas according to the log provided by Florian that's not the reason >>> of the problem. Please, see my considerations below. >>> >>>> [...] 
>>>> >>>> [ 0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost) >>>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun >>>> Feb 28 10:01:50 PST 2021 >>>> [ 0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200) >>>> [ 0.000000] FPU revision is: 00130001 >>> >>>> [ 0.000000] memblock_add: [0x00000000-0x0fffffff] >>>> early_init_dt_scan_memory+0x160/0x1e0 >>>> [ 0.000000] memblock_add: [0x20000000-0x4fffffff] >>>> early_init_dt_scan_memory+0x160/0x1e0 >>>> [ 0.000000] memblock_add: [0x90000000-0xcfffffff] >>>> early_init_dt_scan_memory+0x160/0x1e0 >>> >>> Here the memory has been added to the memblock allocator. >>> >>>> [ 0.000000] MIPS: machine is Broadcom BCM97435SVMB >>>> [ 0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '') >>>> [ 0.000000] printk: bootconsole [ns16550a0] enabled >>> >>>> [ 0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0] >>>> setup_arch+0x128/0x69c >>> >>> Here the fdt memory has been reserved. (Note it's built into the >>> kernel.) >>> >>>> [ 0.000000] memblock_reserve: [0x00010000-0x018313cf] >>>> setup_arch+0x1f8/0x69c >>> >>> Here the kernel itself together with built-in dtb have been reserved. >>> So far so good. 
>>> >>>> [ 0.000000] Initrd not found or empty - disabling initrd >>> >>>> [ 0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 >>>> early_init_dt_alloc_memory_arch+0x40/0x84 >>>> [ 0.000000] memblock_reserve: [0x00001000-0x00003aa0] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 >>>> early_init_dt_alloc_memory_arch+0x40/0x84 >>>> [ 0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b] >>>> memblock_alloc_range_nid+0xf8/0x198 >>> >>> The log above most likely belongs to the call-chain: >>> setup_arch() >>> +-> arch_mem_init() >>> +-> device_tree_init() - BMIPS specific method >>> +-> unflatten_and_copy_device_tree() >>> >>> So to speak here we've copied the fdt from the original space >>> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened >>> it to [0x00003aa4-0x0000ba4b]. >>> >>> The problem is that a bit later the next call-chain is performed: >>> setup_arch() >>> +-> plat_smp_setup() >>> +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops(); >>> +-> if (!board_ebase_setup) >>> board_ebase_setup = &bmips_ebase_setup; >>> >>> So at the moment of the CPU traps initialization the bmips_ebase_setup() >>> method is called. What trap_init() does isn't compatible with the >>> allocation performed by the unflatten_and_copy_device_tree() method. >>> See the next comment. 
>>> >>>> [ 0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 >>>> early_init_dt_alloc_memory_arch+0x40/0x84 >>>> [ 0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_reserve: [0x0096a000-0x00969fff] >>>> setup_arch+0x3fc/0x69c >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c >>>> [ 0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c >>>> [ 0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c >>>> [ 0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64 >>>> bytes. >>>> [ 0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases, >>>> linesize 32 bytes >>>> [ 0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes. 
>>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 >>>> [ 0.000000] memblock_reserve: [0x0000c000-0x0000cfff] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 >>>> [ 0.000000] memblock_reserve: [0x0000d000-0x0000dfff] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 >>>> [ 0.000000] memblock_reserve: [0x0000e000-0x0000efff] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] Zone ranges: >>>> [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff] >>>> [ 0.000000] HighMem [mem 0x0000000010000000-0x00000000cfffffff] >>>> [ 0.000000] Movable zone start for each node >>>> [ 0.000000] Early memory node ranges >>>> [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff] >>>> [ 0.000000] node 0: [mem 0x0000000020000000-0x000000004fffffff] >>>> [ 0.000000] node 0: [mem 0x0000000090000000-0x00000000cfffffff] >>>> [ 0.000000] Initmem setup node 0 [mem >>>> 0x0000000000000000-0x00000000cfffffff] >>>> [ 0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0 >>>> from=0x00000000 max_addr=0x00000000 >>>> alloc_node_mem_map.constprop.135+0x6c/0xc8 >>>> [ 0.000000] memblock_reserve: [0x01831400-0x032313ff] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0 >>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98 >>>> [ 0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0 >>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98 >>>> [ 0.000000] memblock_reserve: [0x0000bc80-0x0000bdff] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 
0.000000] MEMBLOCK configuration: >>>> [ 0.000000] memory size = 0x80000000 reserved size = 0x0322f032 >>>> [ 0.000000] memory.cnt = 0x3 >>>> [ 0.000000] memory[0x0] [0x00000000-0x0fffffff], 0x10000000 >>>> bytes flags: 0x0 >>>> [ 0.000000] memory[0x1] [0x20000000-0x4fffffff], 0x30000000 >>>> bytes flags: 0x0 >>>> [ 0.000000] memory[0x2] [0x90000000-0xcfffffff], 0x40000000 >>>> bytes flags: 0x0 >>>> [ 0.000000] reserved.cnt = 0xa >>>> [ 0.000000] reserved[0x0] [0x00001000-0x00003aa0], 0x00002aa1 >>>> bytes flags: 0x0 >>>> [ 0.000000] reserved[0x1] [0x00003aa4-0x0000ba64], 0x00007fc1 >>>> bytes flags: 0x0 >>>> [ 0.000000] reserved[0x2] [0x0000ba80-0x0000ba9f], 0x00000020 >>>> bytes flags: 0x0 >>>> [ 0.000000] reserved[0x3] [0x0000bb00-0x0000bb1f], 0x00000020 >>>> bytes flags: 0x0 >>>> [ 0.000000] reserved[0x4] [0x0000bb80-0x0000bb9f], 0x00000020 >>>> bytes flags: 0x0 >>>> [ 0.000000] reserved[0x5] [0x0000bc00-0x0000bc1f], 0x00000020 >>>> bytes flags: 0x0 >>>> [ 0.000000] reserved[0x6] [0x0000bc80-0x0000bdff], 0x00000180 >>>> bytes flags: 0x0 >>>> [ 0.000000] reserved[0x7] [0x0000c000-0x0000efff], 0x00003000 >>>> bytes flags: 0x0 >>>> [ 0.000000] reserved[0x8] [0x00010000-0x018313cf], 0x018213d0 >>>> bytes flags: 0x0 >>>> [ 0.000000] reserved[0x9] [0x01831400-0x032313ff], 0x01a00000 >>>> bytes flags: 0x0 >>>> [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654 >>>> [ 0.000000] memblock_reserve: [0x0000be00-0x0000be1d] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654 >>>> [ 0.000000] memblock_reserve: [0x0000be80-0x0000be9d] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884 >>>> [ 0.000000] memblock_reserve: [0x0000f000-0x0000ffff] 
>>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884 >>>> [ 0.000000] memblock_reserve: [0x03231400-0x032323ff] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1 >>>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30 >>>> [ 0.000000] memblock_reserve: [0x03233000-0x0327afff] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_free: [0x03245000-0x03244fff] >>>> pcpu_embed_first_chunk+0x7a0/0x884 >>>> [ 0.000000] memblock_free: [0x03257000-0x03256fff] >>>> pcpu_embed_first_chunk+0x7a0/0x884 >>>> [ 0.000000] memblock_free: [0x03269000-0x03268fff] >>>> pcpu_embed_first_chunk+0x7a0/0x884 >>>> [ 0.000000] memblock_free: [0x0327b000-0x0327afff] >>>> pcpu_embed_first_chunk+0x7a0/0x884 >>>> [ 0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728 >>>> [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec >>>> [ 0.000000] memblock_reserve: [0x0000bf00-0x0000bf03] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec >>>> [ 0.000000] memblock_reserve: [0x0000bf80-0x0000bf83] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec >>>> [ 0.000000] memblock_reserve: [0x03232400-0x0323240f] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec >>>> [ 0.000000] memblock_reserve: [0x03232480-0x0323248f] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 
128 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec >>>> [ 0.000000] memblock_reserve: [0x03232500-0x0323257f] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294 >>>> [ 0.000000] memblock_reserve: [0x03232580-0x032325db] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294 >>>> [ 0.000000] memblock_reserve: [0x03232600-0x032328ff] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294 >>>> [ 0.000000] memblock_reserve: [0x03232900-0x03232c03] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294 >>>> [ 0.000000] memblock_reserve: [0x03232c80-0x03232d3f] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] memblock_free: [0x0000f000-0x0000ffff] >>>> pcpu_embed_first_chunk+0x838/0x884 >>>> [ 0.000000] memblock_free: [0x03231400-0x032323ff] >>>> pcpu_embed_first_chunk+0x850/0x884 >>>> [ 0.000000] Built 1 zonelists, mobility grouping on. 
Total pages: 523776 >>>> [ 0.000000] Kernel command line: console=ttyS0,115200 earlycon >>>> [ 0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c >>>> [ 0.000000] memblock_reserve: [0x0327b000-0x0329afff] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 >>>> bytes, linear) >>>> [ 0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1 >>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c >>>> [ 0.000000] memblock_reserve: [0x0329b000-0x032aafff] >>>> memblock_alloc_range_nid+0xf8/0x198 >>>> [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 >>>> bytes, linear) >>> >>>> [ 0.000000] memblock_reserve: [0x00000000-0x000003ff] >>>> trap_init+0x70/0x4e8 >>> >>> Most likely someplace here the corruption has happened. The log above >>> has just reserved a memory for NMI/reset vectors: >>> arch/mips/kernel/traps.c: trap_init(void): Line 2373. >>> >>> But then the board_ebase_setup() pointer is dereferenced and called, >>> which has been initialized with bmips_ebase_setup() earlier and which >>> overwrites the ebase variable with: 0x80001000 as this is >>> CPU_BMIPS5000 CPU. So any further calls of the functions like >>> set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a >>> corruption of the memory above 0x80001000, which as we have discovered >>> belongs to fdt and unflattened device tree. >>> >>>> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off >>>> [ 0.000000] Memory: 2045268K/2097152K available (8226K kernel code, >>>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K >>>> cma-reserved, 1835008K highmem) >>>> [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 >>>> [ 0.000000] rcu: Hierarchical RCU implementation. >>>> [ 0.000000] rcu: RCU event tracing is enabled. 
>>>> [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay >>>> is 25 jiffies. >>>> [ 0.000000] NR_IRQS: 256 >>> >>>> [ 0.000000] OF: Bad cell count for /rdb >>>> [ 0.000000] irq_bcm7038_l1: failed to remap intc L1 registers >>>> [ 0.000000] OF: of_irq_init: children remain, but no parents >>> >>> So here is the first time we have got the consequence of the corruption >>> popped up. Luckily it's just the "Bad cells count" error. We could have >>> got much less obvious log here up to getting a crash at some place >>> further... >>> >>>> [ 0.000000] random: get_random_bytes called from >>>> start_kernel+0x444/0x654 with crng_init=0 >>>> [ 0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns, >>>> wraps every 8589934590000000ns >>> >>>> >>>> and with your patch applied which unfortunately did not work we have the >>>> following: >>>> >>>> [...] >>> >>> So a patch like this shall workaround the corruption: >>> >>> --- a/arch/mips/bmips/setup.c >>> +++ b/arch/mips/bmips/setup.c >>> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void) >>> >>> __dt_setup_arch(dtb); >>> >>> + memblock_reserve(0x0, 0x1000 + 0x100*64); >>> + >>> for (q = bmips_quirk_list; q->quirk_fn; q++) { >>> if (of_flat_dt_is_compatible(of_get_flat_dt_root(), >>> q->compatible)) { >> > >> This patch works, thanks a lot for the troubleshooting and analysis! How >> about the following which would be more generic and works as well and >> should be more universal since it does not require each architecture to >> provide an appropriate call to memblock_reserve(): > > Hm, are you sure it's working?

I was, until I noticed that I was working on top of a revert of Roman's
patch; sorry about the brain fart here.

> If so, my analysis hasn't been quite > correct. My suggestion was based on the memory initializations, > allocations and reservations trace.
So here is the sequence of most > crucial of them: > 1) Memblock initialization: > start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch() > (At this point I suggested to place the exceptions memory > reservation.) > 2) Base FDT memory reservation: > start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self() > 3) FDT "reserved-memory" nodes parsing and corresponding memory ranges > reservation: > start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem() > 4) Reserve kernel itself, some critical sections like initrd and > crash-kernel: > start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()... > 5) Copy and unflatten the built-into the kernel device tree > (BMIPS-platform code): > start_kernel()->setup_arch()->arch_mem_init()->device_tree_init() > This is the very first time an allocation from the memblock pool > is performed. Since we haven't reserved a memory for the exception > vectors yet, the memblock allocator is free to return that memory > range for any other use. Needless to say if we try to use that memory > later without consulting with memblock, we may and in our case > will get into troubles. > 6) Many random early memblock allocations for kernel use before > buddy and sl*b allocators are up and running... > Note if for some fortunate reason the allocations made in 5) didn't > overlap the exceptions memory, here we have much more chances to > do that with obviously fatal consequences of the ranges independent > usage. > 7) Trap/exception vectors initialization and !memory reservation! for > them: > start_kernel()->trap_init() > Only at this point we get to reserve the memory for the vectors. > 8) Init and run buddy/sl*b allocators: > start_kernel()->mm_init()->...mem_init()... > > There are a lot of allocations done in 5) and 6) before the > trap_init() is called in 7). You can see that in your log. That's why > I have doubts that your patch worked well. 
Most likely you've > forgotten to revert the workaround suggested by me in the previous > message. Could you make sure that you didn't and re-test your patch > again? If it still works then I might have confused something and it's > strange that my patch worked in the first place...

I would like to submit a fix for 5.12-rc1 and get it backported into
5.11 so that BMIPS machines boot again; that will essentially be your
earlier proposed fix. BMIPS is the only "legacy" MIPS platform that
defines an exception base, so while this problem may certainly exist on
other platforms, I do wonder how likely it is to be hit there.

> > A food for thoughts for everyone (Thomas, Mark, please join the > discussion). What we've got here is a bit bigger problem. AFAICS > if bottom-up allocation is enabled (it's our case) memblock_find_in_range_node() > performs the allocation above the very first PAGE_SIZE memory chunk > (see that method code for details). So we are currently on a safe side > for some older MIPS platforms. But the platform with VEIC/VINT may get > into the same troubles here if they didn't reserve exception memory > early enough before the kernel starts random allocations from > memblock. So we either need to provide a generic workaround for that > or make sure each platform gets to reserve vectors itself for instance > in the plat_mem_setup() method.
> > -Sergey > >> >> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c >> index e0352958e2f7..b0a173b500e8 100644 >> --- a/arch/mips/kernel/traps.c >> +++ b/arch/mips/kernel/traps.c >> @@ -2367,10 +2367,7 @@ void __init trap_init(void) >> >> if (!cpu_has_mips_r2_r6) { >> ebase = CAC_BASE; >> - ebase_pa = virt_to_phys((void *)ebase); >> vec_size = 0x400; >> - >> - memblock_reserve(ebase_pa, vec_size); >> } else { >> if (cpu_has_veic || cpu_has_vint) >> vec_size = 0x200 + VECTORSPACING*64; >> @@ -2410,6 +2407,14 @@ void __init trap_init(void) >> >> if (board_ebase_setup) >> board_ebase_setup(); >> + >> + /* board_ebase_setup() can change the exception base address >> + * reserve it now after changes were made. >> + */ >> + if (!cpu_has_mips_r2_r6) { >> + ebase_pa = virt_to_phys((void *)ebase); >> + memblock_reserve(ebase_pa, vec_size); >> + } >> per_cpu_trap_init(true); >> memblock_set_bottom_up(false); >> -- >> Florian -- Florian ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end 2021-03-02 4:09 ` Florian Fainelli @ 2021-03-02 13:26 ` Serge Semin 0 siblings, 0 replies; 38+ messages in thread From: Serge Semin @ 2021-03-02 13:26 UTC (permalink / raw) To: Florian Fainelli Cc: Serge Semin, Mike Rapoport, Thomas Bogendoerfer, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu, Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team, open list:BROADCOM BMIPS MIPS ARCHITECTURE On Mon, Mar 01, 2021 at 08:09:52PM -0800, Florian Fainelli wrote: > > > On 3/1/2021 1:22 AM, Serge Semin wrote: > > On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote: > >> Hi Serge, > >> > >> On 2/28/2021 3:08 PM, Serge Semin wrote: > >>> Hi folks, > >>> What you've got here seems a more complicated problem than it > >>> could originally look like. Please, see my comments below. > >>> > >>> (Note I've discarded some of the email logs, which of no interest > >>> to the discovered problem. Please also note that I haven't got any > >>> Broadcom hardware to test out a solution suggested below.) > >>> > >>> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote: > >>>> Hi Mike, > >>>> > >>>> On 2/28/2021 1:00 AM, Mike Rapoport wrote: > >>>>> Hi Florian, > >>>>> > >>>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote: > >>>>>> > >>> > >>>>>> [...] > >>> > >>>>>> > >>>>>> Hi Roman, Thomas and other linux-mips folks, > >>>>>> > >>>>>> Kamal and myself have been unable to boot v5.11 on MIPS since this > >>>>>> commit, reverting it makes our MIPS platforms boot successfully. We do > >>>>>> not see a warning like this one in the commit message, instead what > >>>>>> happens appear to be a corrupted Device Tree which prevents the parsing > >>>>>> of the "rdb" node and leading to the interrupt controllers not being > >>>>>> registered, and the system eventually not booting. 
> >>>>>> > >>>>>> The Device Tree is built-into the kernel image and resides at > >>>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts. > >>>>>> > >>>>>> Do you have any idea what could be wrong with MIPS specifically here? > >>> > >>> Most likely the problem you've discovered has been there for quite > >>> some time. The patch you are referring to just caused it to be > >>> triggered by extending the early allocation range. See before that > >>> patch was accepted the early memory allocations had been performed > >>> in the range: > >>> [kernel_end, RAM_END]. > >>> The patch changed that, so the early allocations are done within > >>> [RAM_START + PAGE_SIZE, RAM_END]. > >>> > >>> In normal situations it's safe to do that as long as all the critical > >>> memory regions (including the memory residing a space below the > >>> kernel) have been reserved. But as soon as a memory with some critical > >>> structures haven't been reserved, the kernel may allocate it to be used > >>> for instance for early initializations with obviously unpredictable but > >>> most of the times unpleasant consequences. > >>> > >>>>> > >>>>> Apparently there is a memblock allocation in one of the functions called > >>>>> from arch_mem_init() between plat_mem_setup() and > >>>>> early_init_fdt_reserve_self(). > >>> > >>> Mike, alas according to the log provided by Florian that's not the reason > >>> of the problem. Please, see my considerations below. > >>> > >>>> [...] 
> >>>> > >>>> [ 0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost) > >>>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun > >>>> Feb 28 10:01:50 PST 2021 > >>>> [ 0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200) > >>>> [ 0.000000] FPU revision is: 00130001 > >>> > >>>> [ 0.000000] memblock_add: [0x00000000-0x0fffffff] > >>>> early_init_dt_scan_memory+0x160/0x1e0 > >>>> [ 0.000000] memblock_add: [0x20000000-0x4fffffff] > >>>> early_init_dt_scan_memory+0x160/0x1e0 > >>>> [ 0.000000] memblock_add: [0x90000000-0xcfffffff] > >>>> early_init_dt_scan_memory+0x160/0x1e0 > >>> > >>> Here the memory has been added to the memblock allocator. > >>> > >>>> [ 0.000000] MIPS: machine is Broadcom BCM97435SVMB > >>>> [ 0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '') > >>>> [ 0.000000] printk: bootconsole [ns16550a0] enabled > >>> > >>>> [ 0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0] > >>>> setup_arch+0x128/0x69c > >>> > >>> Here the fdt memory has been reserved. (Note it's built into the > >>> kernel.) > >>> > >>>> [ 0.000000] memblock_reserve: [0x00010000-0x018313cf] > >>>> setup_arch+0x1f8/0x69c > >>> > >>> Here the kernel itself together with built-in dtb have been reserved. > >>> So far so good. 
> >>> > >>>> [ 0.000000] Initrd not found or empty - disabling initrd > >>> > >>>> [ 0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 > >>>> early_init_dt_alloc_memory_arch+0x40/0x84 > >>>> [ 0.000000] memblock_reserve: [0x00001000-0x00003aa0] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 > >>>> early_init_dt_alloc_memory_arch+0x40/0x84 > >>>> [ 0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>> > >>> The log above most likely belongs to the call-chain: > >>> setup_arch() > >>> +-> arch_mem_init() > >>> +-> device_tree_init() - BMIPS specific method > >>> +-> unflatten_and_copy_device_tree() > >>> > >>> So to speak here we've copied the fdt from the original space > >>> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened > >>> it to [0x00003aa4-0x0000ba4b]. > >>> > >>> The problem is that a bit later the next call-chain is performed: > >>> setup_arch() > >>> +-> plat_smp_setup() > >>> +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops(); > >>> +-> if (!board_ebase_setup) > >>> board_ebase_setup = &bmips_ebase_setup; > >>> > >>> So at the moment of the CPU traps initialization the bmips_ebase_setup() > >>> method is called. What trap_init() does isn't compatible with the > >>> allocation performed by the unflatten_and_copy_device_tree() method. > >>> See the next comment. 
> >>> > >>>> [ 0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 > >>>> early_init_dt_alloc_memory_arch+0x40/0x84 > >>>> [ 0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_reserve: [0x0096a000-0x00969fff] > >>>> setup_arch+0x3fc/0x69c > >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c > >>>> [ 0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c > >>>> [ 0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c > >>>> [ 0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64 > >>>> bytes. > >>>> [ 0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases, > >>>> linesize 32 bytes > >>>> [ 0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes. 
> >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 > >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 > >>>> [ 0.000000] memblock_reserve: [0x0000c000-0x0000cfff] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 > >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 > >>>> [ 0.000000] memblock_reserve: [0x0000d000-0x0000dfff] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 > >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4 > >>>> [ 0.000000] memblock_reserve: [0x0000e000-0x0000efff] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] Zone ranges: > >>>> [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff] > >>>> [ 0.000000] HighMem [mem 0x0000000010000000-0x00000000cfffffff] > >>>> [ 0.000000] Movable zone start for each node > >>>> [ 0.000000] Early memory node ranges > >>>> [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff] > >>>> [ 0.000000] node 0: [mem 0x0000000020000000-0x000000004fffffff] > >>>> [ 0.000000] node 0: [mem 0x0000000090000000-0x00000000cfffffff] > >>>> [ 0.000000] Initmem setup node 0 [mem > >>>> 0x0000000000000000-0x00000000cfffffff] > >>>> [ 0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0 > >>>> from=0x00000000 max_addr=0x00000000 > >>>> alloc_node_mem_map.constprop.135+0x6c/0xc8 > >>>> [ 0.000000] memblock_reserve: [0x01831400-0x032313ff] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0 > >>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98 > >>>> [ 0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0 > >>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98 > >>>> [ 0.000000] memblock_reserve: 
[0x0000bc80-0x0000bdff] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] MEMBLOCK configuration: > >>>> [ 0.000000] memory size = 0x80000000 reserved size = 0x0322f032 > >>>> [ 0.000000] memory.cnt = 0x3 > >>>> [ 0.000000] memory[0x0] [0x00000000-0x0fffffff], 0x10000000 > >>>> bytes flags: 0x0 > >>>> [ 0.000000] memory[0x1] [0x20000000-0x4fffffff], 0x30000000 > >>>> bytes flags: 0x0 > >>>> [ 0.000000] memory[0x2] [0x90000000-0xcfffffff], 0x40000000 > >>>> bytes flags: 0x0 > >>>> [ 0.000000] reserved.cnt = 0xa > >>>> [ 0.000000] reserved[0x0] [0x00001000-0x00003aa0], 0x00002aa1 > >>>> bytes flags: 0x0 > >>>> [ 0.000000] reserved[0x1] [0x00003aa4-0x0000ba64], 0x00007fc1 > >>>> bytes flags: 0x0 > >>>> [ 0.000000] reserved[0x2] [0x0000ba80-0x0000ba9f], 0x00000020 > >>>> bytes flags: 0x0 > >>>> [ 0.000000] reserved[0x3] [0x0000bb00-0x0000bb1f], 0x00000020 > >>>> bytes flags: 0x0 > >>>> [ 0.000000] reserved[0x4] [0x0000bb80-0x0000bb9f], 0x00000020 > >>>> bytes flags: 0x0 > >>>> [ 0.000000] reserved[0x5] [0x0000bc00-0x0000bc1f], 0x00000020 > >>>> bytes flags: 0x0 > >>>> [ 0.000000] reserved[0x6] [0x0000bc80-0x0000bdff], 0x00000180 > >>>> bytes flags: 0x0 > >>>> [ 0.000000] reserved[0x7] [0x0000c000-0x0000efff], 0x00003000 > >>>> bytes flags: 0x0 > >>>> [ 0.000000] reserved[0x8] [0x00010000-0x018313cf], 0x018213d0 > >>>> bytes flags: 0x0 > >>>> [ 0.000000] reserved[0x9] [0x01831400-0x032313ff], 0x01a00000 > >>>> bytes flags: 0x0 > >>>> [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654 > >>>> [ 0.000000] memblock_reserve: [0x0000be00-0x0000be1d] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654 > >>>> [ 0.000000] memblock_reserve: [0x0000be80-0x0000be9d] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes 
align=0x1000 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884 > >>>> [ 0.000000] memblock_reserve: [0x0000f000-0x0000ffff] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884 > >>>> [ 0.000000] memblock_reserve: [0x03231400-0x032323ff] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1 > >>>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30 > >>>> [ 0.000000] memblock_reserve: [0x03233000-0x0327afff] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_free: [0x03245000-0x03244fff] > >>>> pcpu_embed_first_chunk+0x7a0/0x884 > >>>> [ 0.000000] memblock_free: [0x03257000-0x03256fff] > >>>> pcpu_embed_first_chunk+0x7a0/0x884 > >>>> [ 0.000000] memblock_free: [0x03269000-0x03268fff] > >>>> pcpu_embed_first_chunk+0x7a0/0x884 > >>>> [ 0.000000] memblock_free: [0x0327b000-0x0327afff] > >>>> pcpu_embed_first_chunk+0x7a0/0x884 > >>>> [ 0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728 > >>>> [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec > >>>> [ 0.000000] memblock_reserve: [0x0000bf00-0x0000bf03] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec > >>>> [ 0.000000] memblock_reserve: [0x0000bf80-0x0000bf83] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec > >>>> [ 0.000000] memblock_reserve: [0x03232400-0x0323240f] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 
nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec > >>>> [ 0.000000] memblock_reserve: [0x03232480-0x0323248f] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec > >>>> [ 0.000000] memblock_reserve: [0x03232500-0x0323257f] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294 > >>>> [ 0.000000] memblock_reserve: [0x03232580-0x032325db] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294 > >>>> [ 0.000000] memblock_reserve: [0x03232600-0x032328ff] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294 > >>>> [ 0.000000] memblock_reserve: [0x03232900-0x03232c03] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294 > >>>> [ 0.000000] memblock_reserve: [0x03232c80-0x03232d3f] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] memblock_free: [0x0000f000-0x0000ffff] > >>>> pcpu_embed_first_chunk+0x838/0x884 > >>>> [ 0.000000] memblock_free: [0x03231400-0x032323ff] > >>>> pcpu_embed_first_chunk+0x850/0x884 > >>>> [ 0.000000] Built 1 zonelists, mobility grouping on. 
Total pages: 523776 > >>>> [ 0.000000] Kernel command line: console=ttyS0,115200 earlycon > >>>> [ 0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c > >>>> [ 0.000000] memblock_reserve: [0x0327b000-0x0329afff] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 > >>>> bytes, linear) > >>>> [ 0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1 > >>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c > >>>> [ 0.000000] memblock_reserve: [0x0329b000-0x032aafff] > >>>> memblock_alloc_range_nid+0xf8/0x198 > >>>> [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 > >>>> bytes, linear) > >>> > >>>> [ 0.000000] memblock_reserve: [0x00000000-0x000003ff] > >>>> trap_init+0x70/0x4e8 > >>> > >>> Most likely someplace here the corruption has happened. The log above > >>> has just reserved a memory for NMI/reset vectors: > >>> arch/mips/kernel/traps.c: trap_init(void): Line 2373. > >>> > >>> But then the board_ebase_setup() pointer is dereferenced and called, > >>> which has been initialized with bmips_ebase_setup() earlier and which > >>> overwrites the ebase variable with: 0x80001000 as this is > >>> CPU_BMIPS5000 CPU. So any further calls of the functions like > >>> set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a > >>> corruption of the memory above 0x80001000, which as we have discovered > >>> belongs to fdt and unflattened device tree. > >>> > >>>> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off > >>>> [ 0.000000] Memory: 2045268K/2097152K available (8226K kernel code, > >>>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K > >>>> cma-reserved, 1835008K highmem) > >>>> [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 > >>>> [ 0.000000] rcu: Hierarchical RCU implementation. 
> >>>> [ 0.000000] rcu: RCU event tracing is enabled. > >>>> [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay > >>>> is 25 jiffies. > >>>> [ 0.000000] NR_IRQS: 256 > >>> > >>>> [ 0.000000] OF: Bad cell count for /rdb > >>>> [ 0.000000] irq_bcm7038_l1: failed to remap intc L1 registers > >>>> [ 0.000000] OF: of_irq_init: children remain, but no parents > >>> > >>> So here is the first time we have got the consequence of the corruption > >>> popped up. Luckily it's just the "Bad cells count" error. We could have > >>> got much less obvious log here up to getting a crash at some place > >>> further... > >>> > >>>> [ 0.000000] random: get_random_bytes called from > >>>> start_kernel+0x444/0x654 with crng_init=0 > >>>> [ 0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns, > >>>> wraps every 8589934590000000ns > >>> > >>>> > >>>> and with your patch applied which unfortunately did not work we have the > >>>> following: > >>>> > >>>> [...] > >>> > >>> So a patch like this shall workaround the corruption: > >>> > >>> --- a/arch/mips/bmips/setup.c > >>> +++ b/arch/mips/bmips/setup.c > >>> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void) > >>> > >>> __dt_setup_arch(dtb); > >>> > >>> + memblock_reserve(0x0, 0x1000 + 0x100*64); > >>> + > >>> for (q = bmips_quirk_list; q->quirk_fn; q++) { > >>> if (of_flat_dt_is_compatible(of_get_flat_dt_root(), > >>> q->compatible)) { > >> > > > >> This patch works, thanks a lot for the troubleshooting and analysis! How > >> about the following which would be more generic and works as well and > >> should be more universal since it does not require each architecture to > >> provide an appropriate call to memblock_reserve(): > > > > Hm, are you sure it's working? > > I was until I noticed that I was working on top of a revert of Roman's > patch sorry about the brain fart here. > > > If so, my analysis hasn't been quite > > correct. 
My suggestion was based on the memory initializations, > > allocations and reservations trace. So here is the sequence of most > > crucial of them: > > 1) Memblock initialization: > > start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch() > > (At this point I suggested to place the exceptions memory > > reservation.) > > 2) Base FDT memory reservation: > > start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self() > > 3) FDT "reserved-memory" nodes parsing and corresponding memory ranges > > reservation: > > start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem() > > 4) Reserve kernel itself, some critical sections like initrd and > > crash-kernel: > > start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()... > > 5) Copy and unflatten the built-into the kernel device tree > > (BMIPS-platform code): > > start_kernel()->setup_arch()->arch_mem_init()->device_tree_init() > > This is the very first time an allocation from the memblock pool > > is performed. Since we haven't reserved a memory for the exception > > vectors yet, the memblock allocator is free to return that memory > > range for any other use. Needless to say if we try to use that memory > > later without consulting with memblock, we may and in our case > > will get into troubles. > > 6) Many random early memblock allocations for kernel use before > > buddy and sl*b allocators are up and running... > > Note if for some fortunate reason the allocations made in 5) didn't > > overlap the exceptions memory, here we have much more chances to > > do that with obviously fatal consequences of the ranges independent > > usage. > > 7) Trap/exception vectors initialization and !memory reservation! for > > them: > > start_kernel()->trap_init() > > Only at this point we get to reserve the memory for the vectors. > > 8) Init and run buddy/sl*b allocators: > > start_kernel()->mm_init()->...mem_init()... 
> > > > There are a lot of allocations done in 5) and 6) before the > > trap_init() is called in 7). You can see that in your log. That's why > > I have doubts that your patch worked well. Most likely you've > > forgotten to revert the workaround suggested by me in the previous > > message. Could you make sure that you didn't and re-test your patch > > again? If it still works then I might have confused something and it's > > strange that my patch worked in the first place... > > I would like to submit a fix for 5.12-rc1 and get it back ported into > 5.11 so we have BMIPS machines boot again, that will be essentially your > earlier proposed fix. > > BMIPS is the only "legacy" MIPS platform that defines an exception base, > so while this problem may certainly exist with other platforms, I do > wonder how likely it is there, though? Hm, at least we can be sure that the problem exists on every platform that conforms to the !cpu_has_mips_r2_r6 condition and has the VEIC/VINT capability. When initializing the exception table, those platforms may write beyond the first PAGE_SIZE of memory, thus corrupting memory that may have been allocated for something else. In my case the problem doesn't manifest itself because the CPU is MIPS32r5. -Sergey > > > > > A food for thoughts for everyone (Thomas, Mark, please join the > > discussion). What we've got here is a bit bigger problem. AFAICS > > if bottom-up allocation is enabled (it's our case) memblock_find_in_range_node() > > performs the allocation above the very first PAGE_SIZE memory chunk > > (see that method code for details). So we are currently on a safe side > > for some older MIPS platforms. But the platform with VEIC/VINT may get > > into the same troubles here if they didn't reserve exception memory > > early enough before the kernel starts random allocations from > > memblock.
So we either need to provide a generic workaround for that > > or make sure each platform gets to reserve vectors itself for instance > > in the plat_mem_setup() method. > > > > -Sergey > > > >> > >> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > >> index e0352958e2f7..b0a173b500e8 100644 > >> --- a/arch/mips/kernel/traps.c > >> +++ b/arch/mips/kernel/traps.c > >> @@ -2367,10 +2367,7 @@ void __init trap_init(void) > >> > >> if (!cpu_has_mips_r2_r6) { > >> ebase = CAC_BASE; > >> - ebase_pa = virt_to_phys((void *)ebase); > >> vec_size = 0x400; > >> - > >> - memblock_reserve(ebase_pa, vec_size); > >> } else { > >> if (cpu_has_veic || cpu_has_vint) > >> vec_size = 0x200 + VECTORSPACING*64; > >> @@ -2410,6 +2407,14 @@ void __init trap_init(void) > >> > >> if (board_ebase_setup) > >> board_ebase_setup(); > >> + > >> + /* board_ebase_setup() can change the exception base address > >> + * reserve it now after changes were made. > >> + */ > >> + if (!cpu_has_mips_r2_r6) { > >> + ebase_pa = virt_to_phys((void *)ebase); > >> + memblock_reserve(ebase_pa, vec_size); > >> + } > >> per_cpu_trap_init(true); > >> memblock_set_bottom_up(false); > >> -- > >> Florian > > -- > Florian ^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption 2021-03-01 9:22 ` Serge Semin 2021-03-02 4:09 ` Florian Fainelli @ 2021-03-02 4:19 ` Florian Fainelli 2021-03-02 8:09 ` Mike Rapoport ` (3 more replies) 1 sibling, 4 replies; 38+ messages in thread From: Florian Fainelli @ 2021-03-02 4:19 UTC (permalink / raw) To: linux-mips Cc: rppt, fancer.lancer, guro, akpm, paul, Florian Fainelli, Serge Semin, Kamal Dasu, Thomas Bogendoerfer, Yanteng Si, Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

BMIPS is one of the few platforms that change the exception base. After commit 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") we started seeing BMIPS boards fail to boot because the built-in FDT was corrupted.

Before the cited commit, early allocations were made in the [kernel_end, RAM_END] range; after that commit, they are made within [RAM_START + PAGE_SIZE, RAM_END].

The custom exception base handler installed by bmips_ebase_setup() for BMIPS5000 CPUs ends up trampling on the memory region allocated by unflatten_and_copy_device_tree(), thus corrupting the FDT used by the kernel.

To fix this, we need to reserve the custom exception base early, at plat_mem_setup() time, so that unflatten_and_copy_device_tree() finds a suitable space away from the reserved memory.

Huge thanks to Sergey for analysing and proposing a solution to this issue.

Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Thomas,

This is intended as a stop-gap solution for 5.12-rc1 and to be picked up by the stable team for 5.11. We should find a safer way to avoid these problems for 5.13 maybe.
 arch/mips/bmips/setup.c       | 22 ++++++++++++++++++++++
 arch/mips/include/asm/traps.h |  2 ++
 2 files changed, 24 insertions(+)

diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
index 31bcfa4e08b9..0088bd45b892 100644
--- a/arch/mips/bmips/setup.c
+++ b/arch/mips/bmips/setup.c
@@ -149,6 +149,26 @@ void __init plat_time_init(void)
 	mips_hpt_frequency = freq;
 }
 
+static void __init bmips_ebase_reserve(void)
+{
+	phys_addr_t base, size = VECTORSPACING * 64;
+
+	switch (current_cpu_type()) {
+	default:
+	case CPU_BMIPS4350:
+		return;
+	case CPU_BMIPS3300:
+	case CPU_BMIPS4380:
+		base = 0x0400;
+		break;
+	case CPU_BMIPS5000:
+		base = 0x1000;
+		break;
+	}
+
+	memblock_reserve(base, size);
+}
+
 void __init plat_mem_setup(void)
 {
 	void *dtb;
@@ -169,6 +189,8 @@ void __init plat_mem_setup(void)
 
 	__dt_setup_arch(dtb);
 
+	bmips_ebase_reserve();
+
 	for (q = bmips_quirk_list; q->quirk_fn; q++) {
 		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
 					     q->compatible)) {
diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
index 6aa8f126a43d..0ba6bb7f9618 100644
--- a/arch/mips/include/asm/traps.h
+++ b/arch/mips/include/asm/traps.h
@@ -14,6 +14,8 @@
 #define MIPS_BE_FIXUP	1		/* return to the fixup code */
 #define MIPS_BE_FATAL	2		/* treat as an unrecoverable error */
 
+#define VECTORSPACING 0x100	/* for EI/VI mode */
+
 extern void (*board_be_init)(void);
 extern int (*board_be_handler)(struct pt_regs *regs, int is_fixup);
 
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread
* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption 2021-03-02 4:19 ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli @ 2021-03-02 8:09 ` Mike Rapoport 2021-03-02 13:54 ` Serge Semin ` (2 subsequent siblings) 3 siblings, 0 replies; 38+ messages in thread From: Mike Rapoport @ 2021-03-02 8:09 UTC (permalink / raw) To: Florian Fainelli Cc: linux-mips, fancer.lancer, guro, akpm, paul, Serge Semin, Kamal Dasu, Thomas Bogendoerfer, Yanteng Si, Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote: > BMIPS is one of the few platforms that do change the exception base. > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > with kernel_end") we started seeing BMIPS boards fail to boot with the > built-in FDT being corrupted. > > Before the cited commit, early allocations would be in the [kernel_end, > RAM_END] range, but after commit they would be within [RAM_START + > PAGE_SIZE, RAM_END]. > > The custom exception base handler that is installed by > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > memory region allocated by unflatten_and_copy_device_tree() thus > corrupting the FDT used by the kernel. > > To fix this, we need to perform an early reservation of the custom > exception that is going to be installed and this needs to happen at > plat_mem_setup() time to ensure that unflatten_and_copy_device_tree() > finds a space that is suitable, away from reserved memory. > > Huge thanks to Serget for analysing and proposing a solution to this > issue. 
> > Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") > Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> > Reported-by: Kamal Dasu <kdasu.kdev@gmail.com> > Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> > --- > Thomas, > > This is intended as a stop-gap solution for 5.12-rc1 and to be picked up > by the stable team for 5.11. We should find a safer way to avoid these > problems for 5.13 maybe. > > arch/mips/bmips/setup.c | 22 ++++++++++++++++++++++ > arch/mips/include/asm/traps.h | 2 ++ > 2 files changed, 24 insertions(+) > > diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c > index 31bcfa4e08b9..0088bd45b892 100644 > --- a/arch/mips/bmips/setup.c > +++ b/arch/mips/bmips/setup.c > @@ -149,6 +149,26 @@ void __init plat_time_init(void) > mips_hpt_frequency = freq; > } > > +static void __init bmips_ebase_reserve(void) > +{ > + phys_addr_t base, size = VECTORSPACING * 64; > + > + switch (current_cpu_type()) { > + default: > + case CPU_BMIPS4350: > + return; > + case CPU_BMIPS3300: > + case CPU_BMIPS4380: > + base = 0x0400; > + break; > + case CPU_BMIPS5000: > + base = 0x1000; > + break; > + } > + > + memblock_reserve(base, size); > +} > + > void __init plat_mem_setup(void) > { > void *dtb; > @@ -169,6 +189,8 @@ void __init plat_mem_setup(void) > > __dt_setup_arch(dtb); > > + bmips_ebase_reserve(); > + > for (q = bmips_quirk_list; q->quirk_fn; q++) { > if (of_flat_dt_is_compatible(of_get_flat_dt_root(), > q->compatible)) { > diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h > index 6aa8f126a43d..0ba6bb7f9618 100644 > --- a/arch/mips/include/asm/traps.h > +++ b/arch/mips/include/asm/traps.h > @@ -14,6 +14,8 @@ > #define MIPS_BE_FIXUP 1 /* return to the fixup code */ > #define MIPS_BE_FATAL 2 /* treat as an unrecoverable error */ > > +#define VECTORSPACING 0x100 /* for EI/VI mode */ > + > extern void (*board_be_init)(void); > 
extern int (*board_be_handler)(struct pt_regs *regs, int is_fixup); > > -- > 2.25.1 > -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption 2021-03-02 4:19 ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli 2021-03-02 8:09 ` Mike Rapoport @ 2021-03-02 13:54 ` Serge Semin 2021-03-02 19:04 ` Roman Gushchin 2021-03-02 23:54 ` Thomas Bogendoerfer 3 siblings, 0 replies; 38+ messages in thread From: Serge Semin @ 2021-03-02 13:54 UTC (permalink / raw) To: Florian Fainelli, Thomas Bogendoerfer Cc: Serge Semin, Mike Rapoport, linux-mips, guro, akpm, paul, Kamal Dasu, Yanteng Si, Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote: > BMIPS is one of the few platforms that do change the exception base. > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > with kernel_end") we started seeing BMIPS boards fail to boot with the > built-in FDT being corrupted. > > Before the cited commit, early allocations would be in the [kernel_end, > RAM_END] range, but after commit they would be within [RAM_START + > PAGE_SIZE, RAM_END]. > > The custom exception base handler that is installed by > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > memory region allocated by unflatten_and_copy_device_tree() thus > corrupting the FDT used by the kernel. > > To fix this, we need to perform an early reservation of the custom > exception that is going to be installed and this needs to happen at > plat_mem_setup() time to ensure that unflatten_and_copy_device_tree() > finds a space that is suitable, away from reserved memory. > > Huge thanks to Serget for analysing and proposing a solution to this > issue. > > Fixes: Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end") > Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> > Reported-by: Kamal Dasu <kdasu.kdev@gmail.com> I'd change the order of these two tags... 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> > --- > Thomas, > > This is intended as a stop-gap solution for 5.12-rc1 and to be picked up > by the stable team for 5.11. We should find a safer way to avoid these > problems for 5.13 maybe. Thomas, could you join the discussion? If we had a more clever solution to reserve the exceptions table for each possibly affected platform this patch could have been omitted. > > arch/mips/bmips/setup.c | 22 ++++++++++++++++++++++ > arch/mips/include/asm/traps.h | 2 ++ > 2 files changed, 24 insertions(+) > > diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c > index 31bcfa4e08b9..0088bd45b892 100644 > --- a/arch/mips/bmips/setup.c > +++ b/arch/mips/bmips/setup.c > @@ -149,6 +149,26 @@ void __init plat_time_init(void) > mips_hpt_frequency = freq; > } > > +static void __init bmips_ebase_reserve(void) > +{ > + phys_addr_t base, size = VECTORSPACING * 64; > + > + switch (current_cpu_type()) { > + default: > + case CPU_BMIPS4350: > + return; > + case CPU_BMIPS3300: > + case CPU_BMIPS4380: > + base = 0x0400; > + break; > + case CPU_BMIPS5000: > + base = 0x1000; > + break; > + } > + > + memblock_reserve(base, size); > +} > + > void __init plat_mem_setup(void) > { > void *dtb; > @@ -169,6 +189,8 @@ void __init plat_mem_setup(void) > > __dt_setup_arch(dtb); > > + bmips_ebase_reserve(); > + > for (q = bmips_quirk_list; q->quirk_fn; q++) { > if (of_flat_dt_is_compatible(of_get_flat_dt_root(), > q->compatible)) { > diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h > index 6aa8f126a43d..0ba6bb7f9618 100644 > --- a/arch/mips/include/asm/traps.h > +++ b/arch/mips/include/asm/traps.h > @@ -14,6 +14,8 @@ > #define MIPS_BE_FIXUP 1 /* return to the fixup code */ > #define MIPS_BE_FATAL 2 /* treat as an unrecoverable error */ > > +#define VECTORSPACING 0x100 /* for EI/VI mode */ What about the same macro declared in arch/mips/kernel/traps.c? 
I'd suggest removing it from there and explicitly including this header file in arch/mips/bmips/setup.c. -Sergey > + > extern void (*board_be_init)(void); > extern int (*board_be_handler)(struct pt_regs *regs, int is_fixup); > > -- > 2.25.1 > ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption 2021-03-02 4:19 ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli 2021-03-02 8:09 ` Mike Rapoport 2021-03-02 13:54 ` Serge Semin @ 2021-03-02 19:04 ` Roman Gushchin 2021-03-02 23:54 ` Thomas Bogendoerfer 3 siblings, 0 replies; 38+ messages in thread From: Roman Gushchin @ 2021-03-02 19:04 UTC (permalink / raw) To: Florian Fainelli Cc: linux-mips, rppt, fancer.lancer, akpm, paul, Serge Semin, Kamal Dasu, Thomas Bogendoerfer, Yanteng Si, Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote: > BMIPS is one of the few platforms that do change the exception base. > After commit 2dcb39645441 ("memblock: do not start bottom-up allocations > with kernel_end") we started seeing BMIPS boards fail to boot with the > built-in FDT being corrupted. > > Before the cited commit, early allocations would be in the [kernel_end, > RAM_END] range, but after commit they would be within [RAM_START + > PAGE_SIZE, RAM_END]. > > The custom exception base handler that is installed by > bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the > memory region allocated by unflatten_and_copy_device_tree() thus > corrupting the FDT used by the kernel. > > To fix this, we need to perform an early reservation of the custom > exception that is going to be installed and this needs to happen at > plat_mem_setup() time to ensure that unflatten_and_copy_device_tree() > finds a space that is suitable, away from reserved memory. > > Huge thanks to Serget for analysing and proposing a solution to this > issue. 
>
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Acked-by: Roman Gushchin <guro@fb.com>

Thank you!

> ---
> Thomas,
>
> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
> by the stable team for 5.11. We should find a safer way to avoid these
> problems for 5.13 maybe.
>
>  arch/mips/bmips/setup.c       | 22 ++++++++++++++++++++++
>  arch/mips/include/asm/traps.h |  2 ++
>  2 files changed, 24 insertions(+)
>
> diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
> index 31bcfa4e08b9..0088bd45b892 100644
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -149,6 +149,26 @@ void __init plat_time_init(void)
>  	mips_hpt_frequency = freq;
>  }
>
> +static void __init bmips_ebase_reserve(void)
> +{
> +	phys_addr_t base, size = VECTORSPACING * 64;
> +
> +	switch (current_cpu_type()) {
> +	default:
> +	case CPU_BMIPS4350:
> +		return;
> +	case CPU_BMIPS3300:
> +	case CPU_BMIPS4380:
> +		base = 0x0400;
> +		break;
> +	case CPU_BMIPS5000:
> +		base = 0x1000;
> +		break;
> +	}
> +
> +	memblock_reserve(base, size);
> +}
> +
>  void __init plat_mem_setup(void)
>  {
>  	void *dtb;
> @@ -169,6 +189,8 @@ void __init plat_mem_setup(void)
>
>  	__dt_setup_arch(dtb);
>
> +	bmips_ebase_reserve();
> +
>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>  					     q->compatible)) {
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index 6aa8f126a43d..0ba6bb7f9618 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -14,6 +14,8 @@
>  #define MIPS_BE_FIXUP	1		/* return to the fixup code */
>  #define MIPS_BE_FATAL	2		/* treat as an unrecoverable error */
>
> +#define VECTORSPACING 0x100	/* for EI/VI mode */
> +
>  extern void (*board_be_init)(void);
>  extern int (*board_be_handler)(struct pt_regs *regs, int is_fixup);
>
> --
> 2.25.1
>
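The reservation that bmips_ebase_reserve() adds can be checked with plain arithmetic. The sketch below is a userspace model, not kernel code. The enum values stand in for the kernel's CPU_BMIPS* constants, and the helpers bmips_ebase_range() and ranges_overlap() are hypothetical names. The sketch mirrors the switch in the patch and compares the BMIPS5000 range with the FDT copy that the boot log quoted in the memblock thread reports at [0x00001000-0x00003aa0].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VECTORSPACING 0x100 /* for EI/VI mode, as in asm/traps.h */

/* Hypothetical stand-ins for the kernel's CPU_BMIPS* constants. */
enum bmips_cpu { BMIPS3300, BMIPS4350, BMIPS4380, BMIPS5000, OTHER_CPU };

/*
 * Mirrors the switch in bmips_ebase_reserve(): returns true and fills in the
 * physical range to reserve, or false when nothing is reserved (BMIPS4350
 * and any other CPU keep the default exception base).
 */
static bool bmips_ebase_range(enum bmips_cpu cpu, uint32_t *base, uint32_t *size)
{
	switch (cpu) {
	case BMIPS3300:
	case BMIPS4380:
		*base = 0x0400;
		break;
	case BMIPS5000:
		*base = 0x1000;
		break;
	default:
		return false;
	}
	*size = VECTORSPACING * 64; /* 64 vectors spaced 0x100 apart = 0x4000 */
	return true;
}

/* Half-open ranges [a, a_end) and [b, b_end) overlap iff each starts before the other ends. */
static bool ranges_overlap(uint32_t a, uint32_t a_end, uint32_t b, uint32_t b_end)
{
	return a < b_end && b < a_end;
}
```

For a BMIPS5000 the model reserves [0x1000, 0x5000). That range overlaps both the FDT copy and the start of the unflattened tree ([0x3aa4, 0xba4c) in half-open form), which is why the reservation has to happen before unflatten_and_copy_device_tree() runs.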
* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
From: Thomas Bogendoerfer @ 2021-03-02 23:54 UTC
To: Florian Fainelli
Cc: linux-mips, rppt, fancer.lancer, guro, akpm, paul, Serge Semin, Kamal Dasu, Yanteng Si, Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
>
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after the commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
>
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends up trampling on the
> memory region allocated by unflatten_and_copy_device_tree(), thus
> corrupting the FDT used by the kernel.
>
> To fix this, we need to perform an early reservation of the custom
> exception that is going to be installed and this needs to happen at
> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
> finds a space that is suitable, away from reserved memory.
>
> Huge thanks to Serge for analysing and proposing a solution to this
> issue.
>
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
> Thomas,
>
> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
> by the stable team for 5.11. We should find a safer way to avoid these
> problems for 5.13 maybe.

Let's try to make it in one go. How about reserving the vector space in
cpu_probe(), if it's known there, and leaving the rest to trap_init()?

The patch below got a quick test on IP22 (real hardware) and Malta (QEMU).
Not sure if I got all the BMIPS parts correct, so please check/test.

BTW, do we really need to EXPORT_SYMBOL ebase?

Thomas.

diff --git a/arch/mips/include/asm/setup.h b/arch/mips/include/asm/setup.h
index bb36a400203d..3ef62c23c34f 100644
--- a/arch/mips/include/asm/setup.h
+++ b/arch/mips/include/asm/setup.h
@@ -23,7 +23,7 @@ typedef void (*vi_handler_t)(void);
 extern void *set_vi_handler(int n, vi_handler_t addr);
 
 extern void *set_except_vector(int n, void *addr);
-extern unsigned long ebase;
+extern unsigned long ebase, ebase_size;
 extern unsigned int hwrena;
 extern void per_cpu_trap_init(bool);
 extern void cpu_cache_init(void);
diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
index 6aa8f126a43d..f7d59831aae3 100644
--- a/arch/mips/include/asm/traps.h
+++ b/arch/mips/include/asm/traps.h
@@ -26,6 +26,8 @@ extern void (*board_cache_error_setup)(void);
 extern int register_nmi_notifier(struct notifier_block *nb);
 extern char except_vec_nmi[];
 
+#define VECTORSPACING 0x100	/* for EI/VI mode */
+
 #define nmi_notifier(fn, pri)						\
 ({									\
 	static struct notifier_block fn##_nb = {			\
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 9a89637b4ecf..eef1a4e304da 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -13,6 +13,7 @@
 #include <linux/smp.h>
 #include <linux/stddef.h>
 #include <linux/export.h>
+#include <linux/memblock.h>
 
 #include <asm/bugs.h>
 #include <asm/cpu.h>
@@ -25,7 +26,9 @@
 #include <asm/watch.h>
 #include <asm/elf.h>
 #include <asm/pgtable-bits.h>
+#include <asm/setup.h>
 #include <asm/spram.h>
+#include <asm/traps.h>
 #include <linux/uaccess.h>
 
 #include "fpu-probe.h"
@@ -1628,6 +1631,8 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu)
 		c->cputype = CPU_BMIPS3300;
 		__cpu_name[cpu] = "Broadcom BMIPS3300";
 		set_elf_platform(cpu, "bmips3300");
+		ebase = 0x80000400;
+		ebase_size = VECTORSPACING * 64;
 		break;
 	case PRID_IMP_BMIPS43XX: {
 		int rev = c->processor_id & PRID_REV_MASK;
@@ -1638,6 +1643,8 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu)
 			__cpu_name[cpu] = "Broadcom BMIPS4380";
 			set_elf_platform(cpu, "bmips4380");
 			c->options |= MIPS_CPU_RIXI;
+			ebase = 0x80000400;
+			ebase_size = VECTORSPACING * 64;
 		} else {
 			c->cputype = CPU_BMIPS4350;
 			__cpu_name[cpu] = "Broadcom BMIPS4350";
@@ -1654,6 +1661,8 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu)
 		__cpu_name[cpu] = "Broadcom BMIPS5000";
 		set_elf_platform(cpu, "bmips5000");
 		c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
+		ebase = 0x80001000;
+		ebase_size = VECTORSPACING * 64;
 		break;
 	}
 }
@@ -2133,6 +2142,13 @@ void cpu_probe(void)
 	if (cpu == 0)
 		__ua_limit = ~((1ull << cpu_vmbits) - 1);
 #endif
+
+	if (ebase_size == 0 && !cpu_has_mips_r2_r6) {
+		ebase = CAC_BASE;
+		ebase_size = 0x400;
+	}
+	if (ebase_size)
+		memblock_reserve(__pa((void *)ebase), ebase_size);
 }
 
 void cpu_report(void)
diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c
index b6ef5f7312cf..ad3f2282a65a 100644
--- a/arch/mips/kernel/smp-bmips.c
+++ b/arch/mips/kernel/smp-bmips.c
@@ -528,10 +528,6 @@ static void bmips_set_reset_vec(int cpu, u32 val)
 
 void bmips_ebase_setup(void)
 {
-	unsigned long new_ebase = ebase;
-
-	BUG_ON(ebase != CKSEG0);
-
 	switch (current_cpu_type()) {
 	case CPU_BMIPS4350:
 		/*
@@ -554,7 +550,6 @@ void bmips_ebase_setup(void)
 		 * 0x8000_0000: reset/NMI (initially in kseg1)
 		 * 0x8000_0400: normal vectors
 		 */
-		new_ebase = 0x80000400;
 		bmips_set_reset_vec(0, RESET_FROM_KSEG0);
 		break;
 	case CPU_BMIPS5000:
@@ -562,16 +557,14 @@ void bmips_ebase_setup(void)
 		 * 0x8000_0000: reset/NMI (initially in kseg1)
 		 * 0x8000_1000: normal vectors
 		 */
-		new_ebase = 0x80001000;
 		bmips_set_reset_vec(0, RESET_FROM_KSEG0);
-		write_c0_ebase(new_ebase);
+		write_c0_ebase(ebase);
 		break;
 	default:
 		return;
 	}
 
 	board_nmi_handler_setup = &bmips_nmi_handler_setup;
-	ebase = new_ebase;
 }
 
 asmlinkage void __weak plat_wired_tlb_setup(void)
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e0352958e2f7..21ba9d04683e 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2009,10 +2009,10 @@ void __noreturn nmi_exception_handler(struct pt_regs *regs)
 	nmi_exit();
 }
 
-#define VECTORSPACING 0x100	/* for EI/VI mode */
-
 unsigned long ebase;
 EXPORT_SYMBOL_GPL(ebase);
+unsigned long ebase_size;
+EXPORT_SYMBOL_GPL(ebase_size);
 unsigned long exception_handlers[32];
 unsigned long vi_handlers[64];
 
@@ -2360,27 +2360,22 @@ void __init trap_init(void)
 	extern char except_vec3_generic;
 	extern char except_vec4;
 	extern char except_vec3_r4000;
-	unsigned long i, vec_size;
 	phys_addr_t ebase_pa;
+	unsigned long i;
 
 	check_wait();
 
-	if (!cpu_has_mips_r2_r6) {
-		ebase = CAC_BASE;
-		ebase_pa = virt_to_phys((void *)ebase);
-		vec_size = 0x400;
-
-		memblock_reserve(ebase_pa, vec_size);
-	} else {
+	if (cpu_has_mips_r2_r6) {
 		if (cpu_has_veic || cpu_has_vint)
-			vec_size = 0x200 + VECTORSPACING*64;
+			ebase_size = 0x200 + VECTORSPACING*64;
 		else
-			vec_size = PAGE_SIZE;
+			ebase_size = PAGE_SIZE;
 
-		ebase_pa = memblock_phys_alloc(vec_size, 1 << fls(vec_size));
+		ebase_pa = memblock_phys_alloc(ebase_size,
+					       1 << fls(ebase_size));
 		if (!ebase_pa)
 			panic("%s: Failed to allocate %lu bytes align=0x%x\n",
-			      __func__, vec_size, 1 << fls(vec_size));
+			      __func__, ebase_size,
+			      1 << fls(ebase_size));
 
 		/*
 		 * Try to ensure ebase resides in KSeg0 if possible.
@@ -2534,7 +2529,7 @@ void __init trap_init(void)
 	else
 		set_handler(0x080, &except_vec3_generic, 0x80);
 
-	local_flush_icache_range(ebase, ebase + vec_size);
+	local_flush_icache_range(ebase, ebase + ebase_size);
 
 	sort_extable(__start___dbe_table, __stop___dbe_table);
 
-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]
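The sizing logic that this patch splits between cpu_probe() and trap_init() is shown below as a userspace sketch. Several pieces are assumptions for illustration only. sketch_fls() is a portable stand-in for the kernel's fls(), 4 KiB pages are assumed, and probe_ebase_size(), r2_ebase_size() and r2_ebase_align() are hypothetical helpers that mirror the branches in the diff.

```c
#include <assert.h>
#include <stdbool.h>

#define VECTORSPACING    0x100   /* for EI/VI mode */
#define SKETCH_PAGE_SIZE 0x1000UL /* assumption: 4 KiB pages */

/* Portable stand-in for fls(): 1-based index of the highest set bit, 0 for x == 0. */
static int sketch_fls(unsigned long x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/*
 * Mirrors the block added at the end of cpu_probe(): a size already set by
 * cpu_probe_broadcom() wins, pre-R2 cores get the fixed 0x400-byte area at
 * CAC_BASE, and R2/R6 cores return 0 here so trap_init() sizes them later.
 */
static unsigned long probe_ebase_size(unsigned long size_from_probe, bool r2_r6)
{
	if (size_from_probe == 0 && !r2_r6)
		return 0x400;
	return size_from_probe;
}

/* Size chosen by trap_init() on R2/R6 cores. */
static unsigned long r2_ebase_size(bool veic_or_vint)
{
	return veic_or_vint ? 0x200 + VECTORSPACING * 64 : SKETCH_PAGE_SIZE;
}

/* Alignment passed to memblock_phys_alloc(): 1 << fls(size). */
static unsigned long r2_ebase_align(unsigned long size)
{
	return 1UL << sketch_fls(size);
}
```

With EIC/VI the area is 0x4200 bytes, aligned to 0x8000. For a power-of-two size such as a 4 KiB page, 1 << fls(size) gives twice the size (0x2000), so the alignment is generous rather than exact.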
* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
From: Florian Fainelli @ 2021-03-03 1:30 UTC
To: Thomas Bogendoerfer
Cc: linux-mips, rppt, fancer.lancer, guro, akpm, paul, Serge Semin, Kamal Dasu, Yanteng Si, Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On 3/2/2021 3:54 PM, Thomas Bogendoerfer wrote:
> On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
>> BMIPS is one of the few platforms that do change the exception base.
>> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
>> with kernel_end") we started seeing BMIPS boards fail to boot with the
>> built-in FDT being corrupted.
>>
>> Before the cited commit, early allocations would be in the [kernel_end,
>> RAM_END] range, but after the commit they would be within [RAM_START +
>> PAGE_SIZE, RAM_END].
>>
>> The custom exception base handler that is installed by
>> bmips_ebase_setup() done for BMIPS5000 CPUs ends up trampling on the
>> memory region allocated by unflatten_and_copy_device_tree(), thus
>> corrupting the FDT used by the kernel.
>>
>> To fix this, we need to perform an early reservation of the custom
>> exception that is going to be installed and this needs to happen at
>> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
>> finds a space that is suitable, away from reserved memory.
>>
>> Huge thanks to Serge for analysing and proposing a solution to this
>> issue.
>>
>> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
>> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
>> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>> Thomas,
>>
>> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
>> by the stable team for 5.11. We should find a safer way to avoid these
>> problems for 5.13 maybe.
>
> Let's try to make it in one go. How about reserving the vector space in
> cpu_probe(), if it's known there, and leaving the rest to trap_init()?
>
> The patch below got a quick test on IP22 (real hardware) and Malta (QEMU).
> Not sure if I got all the BMIPS parts correct, so please check/test.

Works for me here:

Tested-by: Florian Fainelli <f.fainelli@gmail.com>

Thanks!

> BTW, do we really need to EXPORT_SYMBOL ebase?

It seems like MIPS KVM support can be built as a module, which is why
ebase was exported to modules with 878edf014e29de38c49153aba20273fbc9ae31af
("MIPS: KVM: Restore host EBase from ebase variable")?
-- 
Florian
* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
From: Thomas Bogendoerfer @ 2021-03-03 9:41 UTC
To: Florian Fainelli
Cc: linux-mips, rppt, fancer.lancer, guro, akpm, paul, Serge Semin, Kamal Dasu, Yanteng Si, Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Tue, Mar 02, 2021 at 05:30:18PM -0800, Florian Fainelli wrote:
>
>
> On 3/2/2021 3:54 PM, Thomas Bogendoerfer wrote:
> > On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
> >> BMIPS is one of the few platforms that do change the exception base.
> >> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> >> with kernel_end") we started seeing BMIPS boards fail to boot with the
> >> built-in FDT being corrupted.
> >>
> >> Before the cited commit, early allocations would be in the [kernel_end,
> >> RAM_END] range, but after the commit they would be within [RAM_START +
> >> PAGE_SIZE, RAM_END].
> >>
> >> The custom exception base handler that is installed by
> >> bmips_ebase_setup() done for BMIPS5000 CPUs ends up trampling on the
> >> memory region allocated by unflatten_and_copy_device_tree(), thus
> >> corrupting the FDT used by the kernel.
> >>
> >> To fix this, we need to perform an early reservation of the custom
> >> exception that is going to be installed and this needs to happen at
> >> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
> >> finds a space that is suitable, away from reserved memory.
> >>
> >> Huge thanks to Serge for analysing and proposing a solution to this
> >> issue.
> >>
> >> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
> >> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> >> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
> >> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> >> ---
> >> Thomas,
> >>
> >> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
> >> by the stable team for 5.11. We should find a safer way to avoid these
> >> problems for 5.13 maybe.
> >
> > Let's try to make it in one go. How about reserving the vector space in
> > cpu_probe(), if it's known there, and leaving the rest to trap_init()?
> >
> > The patch below got a quick test on IP22 (real hardware) and Malta (QEMU).
> > Not sure if I got all the BMIPS parts correct, so please check/test.
>
> Works for me here:

Perfect, I only forgot about R3k... I'll submit a formal patch submission
later today.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]
* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
From: Maciej W. Rozycki @ 2021-03-03 17:45 UTC
To: Thomas Bogendoerfer
Cc: Florian Fainelli, linux-mips, rppt, fancer.lancer, guro, Andrew Morton, paul, Serge Semin, Kamal Dasu, Yanteng Si, Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Wed, 3 Mar 2021, Thomas Bogendoerfer wrote:

> Perfect, I only forgot about R3k... I'll submit a formal patch submission
> later today.

What's up with the R3k (the usual trigger for me) here?

Maciej
* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
From: Thomas Bogendoerfer @ 2021-03-03 18:15 UTC
To: Maciej W. Rozycki
Cc: Florian Fainelli, linux-mips, rppt, fancer.lancer, guro, Andrew Morton, paul, Serge Semin, Kamal Dasu, Yanteng Si, Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Wed, Mar 03, 2021 at 06:45:52PM +0100, Maciej W. Rozycki wrote:
> On Wed, 3 Mar 2021, Thomas Bogendoerfer wrote:
>
> > Perfect, I only forgot about R3k... I'll submit a formal patch submission
> > later today.
>
> What's up with the R3k (the usual trigger for me) here?

I've moved the R3k cpu_probe() to its own file, and when moving the ebase
reservation to cpu_probe(), I need to add it there as well. So just a
mechanical step I missed.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]
* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
From: Maciej W. Rozycki @ 2021-03-03 21:50 UTC
To: Thomas Bogendoerfer
Cc: Florian Fainelli, linux-mips, rppt, fancer.lancer, guro, Andrew Morton, paul, Serge Semin, Kamal Dasu, Yanteng Si, Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Wed, 3 Mar 2021, Thomas Bogendoerfer wrote:

> > What's up with the R3k (the usual trigger for me) here?
>
> I've moved the R3k cpu_probe() to its own file, and when moving the ebase
> reservation to cpu_probe(), I need to add it there as well. So just a
> mechanical step I missed.

Ah, right, I didn't notice the split. Thanks for taking care of it!

Maciej
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
From: Mike Rapoport @ 2021-03-01 9:45 UTC
To: Florian Fainelli
Cc: Serge Semin, Thomas Bogendoerfer, Serge Semin, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu, Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team, open list:BROADCOM BMIPS MIPS ARCHITECTURE

On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> Hi Serge,
>
> On 2/28/2021 3:08 PM, Serge Semin wrote:
> > Hi folks,
> > What you've got here seems to be a more complicated problem than it
> > could originally look like. Please see my comments below.
> >
> > (Note I've discarded some of the email logs, which are of no interest
> > to the discovered problem. Please also note that I haven't got any
> > Broadcom hardware to test out a solution suggested below.)
> >
> > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> >> Hi Mike,
> >>
> >> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> >>> Hi Florian,
> >>>
> >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>>>
> >
> >>>> [...]
> >
> >>>>
> >>>> Hi Roman, Thomas and other linux-mips folks,
> >>>>
> >>>> Kamal and I have been unable to boot v5.11 on MIPS since this
> >>>> commit; reverting it makes our MIPS platforms boot successfully. We do
> >>>> not see a warning like the one in the commit message; instead, what
> >>>> happens appears to be a corrupted Device Tree, which prevents the parsing
> >>>> of the "rdb" node, leading to the interrupt controllers not being
> >>>> registered and the system eventually not booting.
> >>>>
> >>>> The Device Tree is built into the kernel image and resides at
> >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>>>
> >>>> Do you have any idea what could be wrong with MIPS specifically here?
> >
> > Most likely the problem you've discovered has been there for quite
> > some time. The patch you are referring to just caused it to be
> > triggered by extending the early allocation range. Before that
> > patch was accepted, the early memory allocations had been performed
> > in the range:
> > [kernel_end, RAM_END].
> > The patch changed that, so the early allocations are done within
> > [RAM_START + PAGE_SIZE, RAM_END].
> >
> > In normal situations it's safe to do that as long as all the critical
> > memory regions (including the memory residing in the space below the
> > kernel) have been reserved. But as soon as memory holding some critical
> > structures hasn't been reserved, the kernel may allocate it to be used,
> > for instance, for early initializations, with obviously unpredictable but
> > most of the time unpleasant consequences.
> >
> >>>
> >>> Apparently there is a memblock allocation in one of the functions called
> >>> from arch_mem_init() between plat_mem_setup() and
> >>> early_init_fdt_reserve_self().
> >
> > Mike, alas, according to the log provided by Florian that's not the cause
> > of the problem. Please see my considerations below.
> >
> >> [...]
> >>
> >> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> >> Feb 28 10:01:50 PST 2021
> >> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> >> [    0.000000] FPU revision is: 00130001
> >
> >> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >
> > Here the memory has been added to the memblock allocator.
> >
> >> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> >> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> >> [    0.000000] printk: bootconsole [ns16550a0] enabled
> >
> >> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> >> setup_arch+0x128/0x69c
> >
> > Here the fdt memory has been reserved. (Note it's built into the
> > kernel.)
> >
> >> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> >> setup_arch+0x1f8/0x69c
> >
> > Here the kernel itself together with the built-in dtb has been reserved.
> > So far so good.
> >
> >> [    0.000000] Initrd not found or empty - disabling initrd
> >
> >> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> >> memblock_alloc_range_nid+0xf8/0x198
> >
> > The log above most likely belongs to the call-chain:
> > setup_arch()
> > +-> arch_mem_init()
> >     +-> device_tree_init() - BMIPS specific method
> >         +-> unflatten_and_copy_device_tree()
> >
> > So here we've copied the fdt from the original space
> > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> > it to [0x00003aa4-0x0000ba4b].
> >
> > The problem is that a bit later the next call-chain is performed:
> > setup_arch()
> > +-> plat_smp_setup()
> >     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> >         +-> if (!board_ebase_setup)
> >                 board_ebase_setup = &bmips_ebase_setup;
> >
> > So at the moment of the CPU traps initialization the bmips_ebase_setup()
> > method is called.
> > What trap_init() does isn't compatible with the
> > allocation performed by the unflatten_and_copy_device_tree() method.
> > See the next comment.
> >
> >> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
...
> >> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> >> bytes, linear)
> >
> >> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> >> trap_init+0x70/0x4e8
> >
> > Most likely someplace here the corruption has happened. The log above
> > has just reserved memory for the NMI/reset vectors:
> > arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> >
> > But then the board_ebase_setup() pointer is dereferenced and called,
> > which has been initialized with bmips_ebase_setup() earlier and which
> > overwrites the ebase variable with 0x80001000, as this is a
> > CPU_BMIPS5000 CPU. So any further calls to functions like
> > set_handler()/set_except_vector()/set_vi_srs_handler()/etc may
> > corrupt the memory above 0x80001000, which, as we have discovered,
> > belongs to the fdt and the unflattened device tree.
> >
> >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> >> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> >> cma-reserved, 1835008K highmem)
> >> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >> [    0.000000] rcu: Hierarchical RCU implementation.
> >> [    0.000000] rcu: RCU event tracing is enabled.
> >> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> >> is 25 jiffies.
> >> [    0.000000] NR_IRQS: 256
> >
> >> [    0.000000] OF: Bad cell count for /rdb
> >> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> >> [    0.000000] OF: of_irq_init: children remain, but no parents
> >
> > So here is the first time the consequence of the corruption has
> > popped up. Luckily it's just the "Bad cell count" error. We could have
> > got a much less obvious log here, up to a crash somewhere
> > further on...
> >
> >> [    0.000000] random: get_random_bytes called from
> >> start_kernel+0x444/0x654 with crng_init=0
> >> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> >> wraps every 8589934590000000ns
> >
> >>
> >> and with your patch applied, which unfortunately did not work, we have the
> >> following:
> >>
> >> [...]
> >
> > So a patch like this should work around the corruption:
> >
> > --- a/arch/mips/bmips/setup.c
> > +++ b/arch/mips/bmips/setup.c
> > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> >
> >  	__dt_setup_arch(dtb);
> >
> > +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> > +
> >  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> >  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> >  					     q->compatible)) {
>
> This patch works, thanks a lot for the troubleshooting and analysis!
> How about the following, which would be more generic, works as well, and
> should be more universal since it does not require each architecture to
> provide an appropriate call to memblock_reserve():
>
> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> index e0352958e2f7..b0a173b500e8 100644
> --- a/arch/mips/kernel/traps.c
> +++ b/arch/mips/kernel/traps.c
> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
>
>  	if (!cpu_has_mips_r2_r6) {
>  		ebase = CAC_BASE;
> -		ebase_pa = virt_to_phys((void *)ebase);
>  		vec_size = 0x400;
> -
> -		memblock_reserve(ebase_pa, vec_size);
>  	} else {
>  		if (cpu_has_veic || cpu_has_vint)
>  			vec_size = 0x200 + VECTORSPACING*64;
> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
>
>  	if (board_ebase_setup)
>  		board_ebase_setup();
> +
> +	/* board_ebase_setup() can change the exception base address
> +	 * reserve it now after changes were made.
> +	 */
> +	if (!cpu_has_mips_r2_r6) {
> +		ebase_pa = virt_to_phys((void *)ebase);
> +		memblock_reserve(ebase_pa, vec_size);
> +	}

With this it's still possible to have memblock allocations around
ebase_pa before it is reserved.

I think we have two options here to solve it in a more or less generic way:

* split the reservation of ebase from trap_init() and move it earlier to
  setup_arch(). I didn't check what the board_ebase_setup() implementations
  do; if they need to allocate memory, this would not work.

* add an API to memblock to set a lower limit for allocations, and then set
  the lower limit to, e.g., the kernel load address in arch_mem_init().
  This may add complexity for configurations with a relocatable kernel and
  KASLR.

>  	per_cpu_trap_init(true);
>  	memblock_set_bottom_up(false);

-- 
Sincerely yours,
Mike.
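The failure Serge traced, and the effect of Mike's second option, can be reproduced with a toy bottom-up first-fit allocator. This is a hypothetical model, not memblock's implementation. It keeps only a list of reserved regions and never hands out the first page, matching the [RAM_START + PAGE_SIZE, RAM_END] range described above. Its lower_limit field is an illustrative stand-in for the floor Mike proposes; it does not imply that any such memblock API exists. The struct, the functions and the 4 KiB page size are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE    0x1000u /* assumption: 4 KiB pages */
#define TOY_MAX_RESERVED 16

struct toy_region {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical model of early memory: a single RAM bank starting at 0. */
struct toy_memblock {
	uint64_t mem_end;     /* one past the last usable byte */
	uint64_t lower_limit; /* hypothetical allocation floor (Mike's option 2) */
	size_t nr_reserved;
	struct toy_region reserved[TOY_MAX_RESERVED];
};

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t toy_align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

static void toy_reserve(struct toy_memblock *mb, uint64_t base, uint64_t size)
{
	assert(mb->nr_reserved < TOY_MAX_RESERVED);
	mb->reserved[mb->nr_reserved].base = base;
	mb->reserved[mb->nr_reserved].size = size;
	mb->nr_reserved++;
}

/*
 * Bottom-up first fit: start at max(PAGE_SIZE, lower_limit) and, whenever the
 * candidate overlaps a reserved region, jump past that region and rescan.
 * Returns 0 on failure (the first page is never handed out).
 */
static uint64_t toy_alloc_bottom_up(struct toy_memblock *mb, uint64_t size,
				    uint64_t align)
{
	uint64_t start = mb->lower_limit > TOY_PAGE_SIZE ? mb->lower_limit
							 : TOY_PAGE_SIZE;
	uint64_t cand = toy_align_up(start, align);
	size_t i = 0;

	while (i < mb->nr_reserved) {
		const struct toy_region *r = &mb->reserved[i];

		if (cand + size > mb->mem_end)
			return 0;
		if (cand < r->base + r->size && r->base < cand + size) {
			cand = toy_align_up(r->base + r->size, align);
			i = 0; /* the moved candidate must be rechecked against every region */
			continue;
		}
		i++;
	}
	if (cand + size > mb->mem_end)
		return 0;
	toy_reserve(mb, cand, size);
	return cand;
}
```

The model is fed the sizes and the kernel reservation from Florian's boot log. With only the kernel reserved, it places the 10913-byte FDT copy at 0x1000 and the 32680-byte unflattened tree at 0x3aa4, the same addresses the log shows and right where the BMIPS5000 vectors later land. Reserving [0x1000, 0x5000) first moves the FDT copy to 0x5000. Setting a floor at the kernel load address (0x10000) instead pushes it past the kernel image, to 0x1831400.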
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
From: Roman Gushchin @ 2021-03-02 3:55 UTC
To: Mike Rapoport
Cc: Florian Fainelli, Serge Semin, Thomas Bogendoerfer, Serge Semin, Andrew Morton, linux-mm, Kamal Dasu, Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team, open list:BROADCOM BMIPS MIPS ARCHITECTURE

On Mon, Mar 01, 2021 at 11:45:42AM +0200, Mike Rapoport wrote:
> On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> > Hi Serge,
> >
> > On 2/28/2021 3:08 PM, Serge Semin wrote:
> > > Hi folks,
> > > What you've got here seems to be a more complicated problem than it
> > > could originally look like. Please see my comments below.
> > >
> > > (Note I've discarded some of the email logs, which are of no interest
> > > to the discovered problem. Please also note that I haven't got any
> > > Broadcom hardware to test out a solution suggested below.)
> > >
> > > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> > >> Hi Mike,
> > >>
> > >> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> > >>> Hi Florian,
> > >>>
> > >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> > >>>>
> > >
> > >>>> [...]
> > >
> > >>>>
> > >>>> Hi Roman, Thomas and other linux-mips folks,
> > >>>>
> > >>>> Kamal and I have been unable to boot v5.11 on MIPS since this
> > >>>> commit; reverting it makes our MIPS platforms boot successfully. We do
> > >>>> not see a warning like the one in the commit message; instead, what
> > >>>> happens appears to be a corrupted Device Tree, which prevents the parsing
> > >>>> of the "rdb" node, leading to the interrupt controllers not being
> > >>>> registered and the system eventually not booting.
> > >>>>
> > >>>> The Device Tree is built into the kernel image and resides at
> > >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> > >>>>
> > >>>> Do you have any idea what could be wrong with MIPS specifically here?
> > >
> > > Most likely the problem you've discovered has been there for quite
> > > some time. The patch you are referring to just caused it to be
> > > triggered by extending the early allocation range. Before that
> > > patch was accepted, the early memory allocations had been performed
> > > in the range:
> > > [kernel_end, RAM_END].
> > > The patch changed that, so the early allocations are done within
> > > [RAM_START + PAGE_SIZE, RAM_END].
> > >
> > > In normal situations it's safe to do that as long as all the critical
> > > memory regions (including the memory residing in the space below the
> > > kernel) have been reserved. But as soon as memory holding some critical
> > > structures hasn't been reserved, the kernel may allocate it to be used,
> > > for instance, for early initializations, with obviously unpredictable but
> > > most of the time unpleasant consequences.
> > >
> > >>>
> > >>> Apparently there is a memblock allocation in one of the functions called
> > >>> from arch_mem_init() between plat_mem_setup() and
> > >>> early_init_fdt_reserve_self().
> > >
> > > Mike, alas, according to the log provided by Florian that's not the cause
> > > of the problem. Please see my considerations below.
> > >
> > >> [...]
> > >> > > >> [ 0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost) > > >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun > > >> Feb 28 10:01:50 PST 2021 > > >> [ 0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200) > > >> [ 0.000000] FPU revision is: 00130001 > > > > > >> [ 0.000000] memblock_add: [0x00000000-0x0fffffff] > > >> early_init_dt_scan_memory+0x160/0x1e0 > > >> [ 0.000000] memblock_add: [0x20000000-0x4fffffff] > > >> early_init_dt_scan_memory+0x160/0x1e0 > > >> [ 0.000000] memblock_add: [0x90000000-0xcfffffff] > > >> early_init_dt_scan_memory+0x160/0x1e0 > > > > > > Here the memory has been added to the memblock allocator. > > > > > >> [ 0.000000] MIPS: machine is Broadcom BCM97435SVMB > > >> [ 0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '') > > >> [ 0.000000] printk: bootconsole [ns16550a0] enabled > > > > > >> [ 0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0] > > >> setup_arch+0x128/0x69c > > > > > > Here the fdt memory has been reserved. (Note it's built into the > > > kernel.) > > > > > >> [ 0.000000] memblock_reserve: [0x00010000-0x018313cf] > > >> setup_arch+0x1f8/0x69c > > > > > > Here the kernel itself together with built-in dtb have been reserved. > > > So far so good. 
> > > > > >> [ 0.000000] Initrd not found or empty - disabling initrd > > > > > >> [ 0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1 > > >> from=0x00000000 max_addr=0x00000000 > > >> early_init_dt_alloc_memory_arch+0x40/0x84 > > >> [ 0.000000] memblock_reserve: [0x00001000-0x00003aa0] > > >> memblock_alloc_range_nid+0xf8/0x198 > > >> [ 0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1 > > >> from=0x00000000 max_addr=0x00000000 > > >> early_init_dt_alloc_memory_arch+0x40/0x84 > > >> [ 0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b] > > >> memblock_alloc_range_nid+0xf8/0x198 > > > > > > The log above most likely belongs to the call-chain: > > > setup_arch() > > > +-> arch_mem_init() > > > +-> device_tree_init() - BMIPS specific method > > > +-> unflatten_and_copy_device_tree() > > > > > > So to speak here we've copied the fdt from the original space > > > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened > > > it to [0x00003aa4-0x0000ba4b]. > > > > > > The problem is that a bit later the next call-chain is performed: > > > setup_arch() > > > +-> plat_smp_setup() > > > +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops(); > > > +-> if (!board_ebase_setup) > > > board_ebase_setup = &bmips_ebase_setup; > > > > > > So at the moment of the CPU traps initialization the bmips_ebase_setup() > > > method is called. What trap_init() does isn't compatible with the > > > allocation performed by the unflatten_and_copy_device_tree() method. > > > See the next comment. > > > > > >> [ 0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1 > > >> from=0x00000000 max_addr=0x00000000 > > >> early_init_dt_alloc_memory_arch+0x40/0x84 > > ... > > > >> [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 > > >> bytes, linear) > > > > > >> [ 0.000000] memblock_reserve: [0x00000000-0x000003ff] > > >> trap_init+0x70/0x4e8 > > > > > > Most likely someplace here the corruption has happened. 
The log above > > > has just reserved a memory for NMI/reset vectors: > > > arch/mips/kernel/traps.c: trap_init(void): Line 2373. > > > > > > But then the board_ebase_setup() pointer is dereferenced and called, > > > which has been initialized with bmips_ebase_setup() earlier and which > > > overwrites the ebase variable with: 0x80001000 as this is > > > CPU_BMIPS5000 CPU. So any further calls of the functions like > > > set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a > > > corruption of the memory above 0x80001000, which as we have discovered > > > belongs to fdt and unflattened device tree. > > > > > >> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off > > >> [ 0.000000] Memory: 2045268K/2097152K available (8226K kernel code, > > >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K > > >> cma-reserved, 1835008K highmem) > > >> [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 > > >> [ 0.000000] rcu: Hierarchical RCU implementation. > > >> [ 0.000000] rcu: RCU event tracing is enabled. > > >> [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay > > >> is 25 jiffies. > > >> [ 0.000000] NR_IRQS: 256 > > > > > >> [ 0.000000] OF: Bad cell count for /rdb > > >> [ 0.000000] irq_bcm7038_l1: failed to remap intc L1 registers > > >> [ 0.000000] OF: of_irq_init: children remain, but no parents > > > > > > So here is the first time we have got the consequence of the corruption > > > popped up. Luckily it's just the "Bad cells count" error. We could have > > > got much less obvious log here up to getting a crash at some place > > > further... 
> > > > > >> [ 0.000000] random: get_random_bytes called from > > >> start_kernel+0x444/0x654 with crng_init=0 > > >> [ 0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns, > > >> wraps every 8589934590000000ns > > > > > >> > > >> and with your patch applied which unfortunately did not work we have the > > >> following: > > >> > > >> [...] > > > > > > So a patch like this shall workaround the corruption: > > > > > > --- a/arch/mips/bmips/setup.c > > > +++ b/arch/mips/bmips/setup.c > > > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void) > > > > > > __dt_setup_arch(dtb); > > > > > > + memblock_reserve(0x0, 0x1000 + 0x100*64); > > > + > > > for (q = bmips_quirk_list; q->quirk_fn; q++) { > > > if (of_flat_dt_is_compatible(of_get_flat_dt_root(), > > > q->compatible)) { > > > > This patch works, thanks a lot for the troubleshooting and analysis! How > > about the following which would be more generic and works as well and > > should be more universal since it does not require each architecture to > > provide an appropriate call to memblock_reserve(): > > > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > > index e0352958e2f7..b0a173b500e8 100644 > > --- a/arch/mips/kernel/traps.c > > +++ b/arch/mips/kernel/traps.c > > @@ -2367,10 +2367,7 @@ void __init trap_init(void) > > > > if (!cpu_has_mips_r2_r6) { > > ebase = CAC_BASE; > > - ebase_pa = virt_to_phys((void *)ebase); > > vec_size = 0x400; > > - > > - memblock_reserve(ebase_pa, vec_size); > > } else { > > if (cpu_has_veic || cpu_has_vint) > > vec_size = 0x200 + VECTORSPACING*64; > > @@ -2410,6 +2407,14 @@ void __init trap_init(void) > > > > if (board_ebase_setup) > > board_ebase_setup(); > > + > > + /* board_ebase_setup() can change the exception base address > > + * reserve it now after changes were made. > > + */ > > + if (!cpu_has_mips_r2_r6) { > > + ebase_pa = virt_to_phys((void *)ebase); > > + memblock_reserve(ebase_pa, vec_size); > > + } Hi folks! 
First, I'm really sorry for breaking things and also for being silent for the last couple of days: I was almost completely offline. Thank you for working on this!

>
> With this it's still possible to have memblock allocations around ebase_pa
> before it is reserved.
>
> I think we have two options here to solve it in a more or less generic way:
>
> * split the reservation of ebase from trap_init() and move it earlier to
> setup_arch(). I didn't check what board_ebase_setup() does; if it needs to
> allocate memory, this would not work.

It seems that it doesn't allocate any memory, so it sounds like a good option.
But doesn't the ebase initialization depend on the memblock allocator?

I see in trap_init():
	if (!cpu_has_mips_r2_r6) {
		...
	} else {
		...
		ebase_pa = memblock_phys_alloc(vec_size, 1 << fls(vec_size));
		...
		if (!IS_ENABLED(CONFIG_EVA) && !WARN_ON(ebase_pa >= 0x20000000))
			ebase = CKSEG0ADDR(ebase_pa);
		else
			ebase = (unsigned long)phys_to_virt(ebase_pa);

>
> * add an API to memblock to set a lower limit for allocations and then set
> the lower limit to, e.g., the kernel load address in arch_mem_init(). This may
> add complexity for configurations with a relocatable kernel and KASLR.

This option looks more like a workaround to me, but maybe it's ok too.

Thanks!
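To make Serge's analysis concrete, here is a small user-space C sketch (not kernel code) that checks the physical ranges printed in Florian's boot log against the low-memory window used by the exception vectors once bmips_ebase_setup() moves ebase to 0x80001000 (physical 0x1000). The window is taken from Serge's proposed workaround, memblock_reserve(0x0, 0x1000 + 0x100*64); the helper names (phys_range, ranges_overlap(), from_inclusive(), serge_reservation()) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct phys_range {
	uint32_t start;
	uint32_t end;
};

/* Two half-open ranges overlap iff each one starts before the other ends. */
static bool ranges_overlap(struct phys_range a, struct phys_range b)
{
	return a.start < b.end && b.start < a.end;
}

/* memblock debug output prints inclusive bounds, e.g. [0x00001000-0x00003aa0]. */
static struct phys_range from_inclusive(uint32_t first, uint32_t last)
{
	assert(first <= last);
	struct phys_range r = { first, last + 1 };
	return r;
}

/* Range covered by the workaround memblock_reserve(0x0, 0x1000 + 0x100*64). */
static struct phys_range serge_reservation(void)
{
	struct phys_range r = { 0x0, 0x1000 + 0x100 * 64 };	/* [0x0, 0x5000) */
	return r;
}
```

With the addresses from the log, both the FDT copy [0x1000-0x3aa0] and the unflattened tree [0x3aa4-0xba4b] intersect [0x0, 0x5000), the kernel image reservation does not, and trap_init()'s own [0x0-0x3ff] reservation covers neither of the device tree allocations.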
* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end 2021-03-02 3:55 ` Roman Gushchin @ 2021-03-02 13:08 ` Serge Semin 0 siblings, 0 replies; 38+ messages in thread From: Serge Semin @ 2021-03-02 13:08 UTC To: Roman Gushchin Cc: Serge Semin, Mike Rapoport, Florian Fainelli, Thomas Bogendoerfer, Andrew Morton, linux-mm, Kamal Dasu, Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team, open list:BROADCOM BMIPS MIPS ARCHITECTURE

On Mon, Mar 01, 2021 at 07:55:21PM -0800, Roman Gushchin wrote:
> On Mon, Mar 01, 2021 at 11:45:42AM +0200, Mike Rapoport wrote:
> > On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
[...]
> >
> > With this it's still possible to have memblock allocations around ebase_pa
> > before it is reserved.
> >
> > I think we have two options here to solve it in a more or less generic way:
> >
> > * split the reservation of ebase from trap_init() and move it earlier to
> > setup_arch(). I didn't check what board_ebase_setup() does; if it needs to
> > allocate memory, this would not work.
>
> It seems that it doesn't allocate any memory, so it sounds like a good option.
> But doesn't the ebase initialization depend on the memblock allocator?
>
> I see in trap_init():
> 	if (!cpu_has_mips_r2_r6) {
> 		...
> 	} else {
> 		...
> 		ebase_pa = memblock_phys_alloc(vec_size, 1 << fls(vec_size));
> 		...
> 		if (!IS_ENABLED(CONFIG_EVA) && !WARN_ON(ebase_pa >= 0x20000000))
> 			ebase = CKSEG0ADDR(ebase_pa);
> 		else
> 			ebase = (unsigned long)phys_to_virt(ebase_pa);

Yep, this seems like the best option for now. Of course, we need to reserve the memory only if the system needs it, as on non-R2-R6 MIPS cores (!cpu_has_mips_r2_r6). In addition, a custom ebase value must be taken into account. The latter is the hardest part to achieve: ebase is a global variable, so we need to thoroughly scan all the MIPS platforms that update it and make sure the update happens before the reservation is performed.

> >
> > * add an API to memblock to set a lower limit for allocations and then set
> > the lower limit to, e.g., the kernel load address in arch_mem_init(). This may
> > add complexity for configurations with a relocatable kernel and KASLR.
>
> This option looks more like a workaround to me, but maybe it's ok too.

Agreed. The first one is better.

-Sergey

>
> Thanks!
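Why the ordering matters (Mike's first option) can be shown with a deliberately tiny model of a bottom-up first-fit allocator. This is not memblock's implementation: toy_memblock, toy_reserve() and toy_alloc_bottom_up() are hypothetical helpers. The inputs come from Florian's log: RAM below 0x10000000, the kernel reserved at [0x00010000-0x018313cf], and two early allocations of 10913 bytes (align 0x40) and 32680 bytes (align 0x4), with the search starting at PAGE_SIZE (0x1000) as in the post-patch [RAM_START + PAGE_SIZE, RAM_END] range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RESERVED 16

/* Toy model: reserved half-open ranges [start, end), kept in insertion order. */
struct toy_memblock {
	uint32_t start[TOY_MAX_RESERVED];
	uint32_t end[TOY_MAX_RESERVED];
	size_t count;
	uint32_t ram_end;	/* allocations must end at or below this address */
};

static void toy_reserve(struct toy_memblock *mb, uint32_t base, uint32_t size)
{
	assert(mb->count < TOY_MAX_RESERVED);
	mb->start[mb->count] = base;
	mb->end[mb->count] = base + size;
	mb->count++;
}

/* Round x up to a multiple of align; align must be a power of two. */
static uint32_t align_up(uint32_t x, uint32_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Bottom-up first fit: starting at 'from', return the lowest aligned address
 * whose [addr, addr + size) avoids every reserved range, and reserve it.
 * Returns 0 on failure (callers here always pass from >= 0x1000).
 */
static uint32_t toy_alloc_bottom_up(struct toy_memblock *mb, uint32_t size,
				    uint32_t align, uint32_t from)
{
	uint32_t addr = align_up(from, align);

	while (addr + size <= mb->ram_end) {
		int moved = 0;

		for (size_t i = 0; i < mb->count; i++) {
			if (addr < mb->end[i] && mb->start[i] < addr + size) {
				/* Collision: jump past this reserved range and rescan. */
				addr = align_up(mb->end[i], align);
				moved = 1;
				break;
			}
		}
		if (!moved) {
			toy_reserve(mb, addr, size);
			return addr;
		}
	}
	return 0;
}
```

When the vector window is not reserved before the device tree is copied, the toy reproduces the addresses in the log (0x1000 and 0x3aa4). Reserving [0x0, 0x5000) first, as in Serge's workaround, moves the same two allocations to 0x5000 and 0x7aa4.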
* [tip: x86/boot] x86/setup: Consolidate early memory reservations 2020-12-17 20:12 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Roman Gushchin ` (2 preceding siblings ...) 2021-02-28 4:18 ` Florian Fainelli @ 2021-03-23 18:19 ` tip-bot2 for Mike Rapoport 3 siblings, 0 replies; 38+ messages in thread From: tip-bot2 for Mike Rapoport @ 2021-03-23 18:19 UTC To: linux-tip-commits Cc: Mike Rapoport, Borislav Petkov, Baoquan He, David Hildenbrand, x86, linux-kernel

The following commit has been merged into the x86/boot branch of tip:

Commit-ID:     a799c2bd29d19c565f37fa038b31a0a1d44d0e4d
Gitweb:        https://git.kernel.org/tip/a799c2bd29d19c565f37fa038b31a0a1d44d0e4d
Author:        Mike Rapoport <rppt@linux.ibm.com>
AuthorDate:    Tue, 02 Mar 2021 12:04:05 +02:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 23 Mar 2021 17:13:17 +01:00

x86/setup: Consolidate early memory reservations

The early reservations of memory areas used by the firmware, bootloader, kernel text and data are spread over setup_arch(). Moreover, some of them happen *after* memblock allocations, e.g. trim_platform_memory_ranges() and trim_low_memory_range() are called after reserve_real_mode(), which allocates memory.

There was no corruption of these memory regions because memblock always allocates memory either from the end of memory (in top-down mode) or above the kernel image (in bottom-up mode). However, the bottom-up mode is going to be updated to span the entire memory [1] to avoid limitations caused by KASLR.

Consolidate early memory reservations in a dedicated function to improve robustness against future changes. Having the early reservations in one place also makes it clearer what memory must be reserved before memblock allocations are allowed.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Link: [1] https://lore.kernel.org/lkml/20201217201214.3414100-2-guro@fb.com Link: https://lkml.kernel.org/r/20210302100406.22059-2-rppt@kernel.org --- arch/x86/kernel/setup.c | 92 +++++++++++++++++++--------------------- 1 file changed, 44 insertions(+), 48 deletions(-) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index d883176..3e3c603 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -645,18 +645,6 @@ static void __init trim_snb_memory(void) } } -/* - * Here we put platform-specific memory range workarounds, i.e. - * memory known to be corrupt or otherwise in need to be reserved on - * specific platforms. - * - * If this gets used more widely it could use a real dispatch mechanism. - */ -static void __init trim_platform_memory_ranges(void) -{ - trim_snb_memory(); -} - static void __init trim_bios_range(void) { /* @@ -729,7 +717,38 @@ static void __init trim_low_memory_range(void) { memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE)); } - + +static void __init early_reserve_memory(void) +{ + /* + * Reserve the memory occupied by the kernel between _text and + * __end_of_kernel_reserve symbols. Any kernel sections after the + * __end_of_kernel_reserve symbol must be explicitly reserved with a + * separate memblock_reserve() or they will be discarded. + */ + memblock_reserve(__pa_symbol(_text), + (unsigned long)__end_of_kernel_reserve - (unsigned long)_text); + + /* + * Make sure page 0 is always reserved because on systems with + * L1TF its contents can be leaked to user processes. 
+ */ + memblock_reserve(0, PAGE_SIZE); + + early_reserve_initrd(); + + if (efi_enabled(EFI_BOOT)) + efi_memblock_x86_reserve_range(); + + memblock_x86_reserve_range_setup_data(); + + reserve_ibft_region(); + reserve_bios_regions(); + + trim_snb_memory(); + trim_low_memory_range(); +} + /* * Dump out kernel offset information on panic. */ @@ -764,29 +783,6 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p) void __init setup_arch(char **cmdline_p) { - /* - * Reserve the memory occupied by the kernel between _text and - * __end_of_kernel_reserve symbols. Any kernel sections after the - * __end_of_kernel_reserve symbol must be explicitly reserved with a - * separate memblock_reserve() or they will be discarded. - */ - memblock_reserve(__pa_symbol(_text), - (unsigned long)__end_of_kernel_reserve - (unsigned long)_text); - - /* - * Make sure page 0 is always reserved because on systems with - * L1TF its contents can be leaked to user processes. - */ - memblock_reserve(0, PAGE_SIZE); - - early_reserve_initrd(); - - /* - * At this point everything still needed from the boot loader - * or BIOS or kernel text should be early reserved or marked not - * RAM in e820. All other memory is free game. - */ - #ifdef CONFIG_X86_32 memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data)); @@ -910,8 +906,18 @@ void __init setup_arch(char **cmdline_p) parse_early_param(); - if (efi_enabled(EFI_BOOT)) - efi_memblock_x86_reserve_range(); + /* + * Do some memory reservations *before* memory is added to + * memblock, so memblock allocations won't overwrite it. + * Do it after early param, so we could get (unlikely) panic from + * serial. + * + * After this point everything still needed from the boot loader or + * firmware or kernel text should be early reserved or marked not + * RAM in e820. All other memory is free game. 
+ */ + early_reserve_memory(); + #ifdef CONFIG_MEMORY_HOTPLUG /* * Memory used by the kernel cannot be hot-removed because Linux @@ -938,9 +944,6 @@ void __init setup_arch(char **cmdline_p) x86_report_nx(); - /* after early param, so could get panic from serial */ - memblock_x86_reserve_range_setup_data(); - if (acpi_mps_check()) { #ifdef CONFIG_X86_LOCAL_APIC disable_apic = 1; @@ -1032,8 +1035,6 @@ void __init setup_arch(char **cmdline_p) */ find_smp_config(); - reserve_ibft_region(); - early_alloc_pgt_buf(); /* @@ -1054,8 +1055,6 @@ void __init setup_arch(char **cmdline_p) */ sev_setup_arch(); - reserve_bios_regions(); - efi_fake_memmap(); efi_find_mirror(); efi_esrt_init(); @@ -1081,9 +1080,6 @@ void __init setup_arch(char **cmdline_p) reserve_real_mode(); - trim_platform_memory_ranges(); - trim_low_memory_range(); - init_mem_mapping(); idt_setup_early_pf(); ^ permalink raw reply related [flat|nested] 38+ messages in thread
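The commit message's argument (top-down allocations come from the end of memory and the old bottom-up mode started above the kernel image, so late reservations of low memory were never overwritten) can be checked with a simplified placement model. The sketch ignores alignment, holes and existing reservations; alloc_policy, first_fit_addr() and hits_low_region() are hypothetical names, and the layout values used with it are illustrative assumptions, not real x86 numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum alloc_policy {
	TOP_DOWN,		/* allocate from the end of memory downwards */
	BOTTOM_UP_KERNEL_END,	/* old bottom-up: search starts at kernel_end */
	BOTTOM_UP_FULL,		/* new bottom-up: search starts at PAGE_SIZE */
};

#define TOY_PAGE_SIZE 0x1000ULL

/*
 * Where the first allocation of 'size' bytes lands in a single free range
 * [0, mem_end), ignoring alignment and existing reservations.
 */
static uint64_t first_fit_addr(enum alloc_policy p, uint64_t size,
			       uint64_t kernel_end, uint64_t mem_end)
{
	switch (p) {
	case TOP_DOWN:
		return mem_end - size;
	case BOTTOM_UP_KERNEL_END:
		return kernel_end;
	case BOTTOM_UP_FULL:
	default:
		return TOY_PAGE_SIZE;
	}
}

/* An allocation starting at addr overlaps a not-yet-reserved [0, low_end) iff addr < low_end. */
static bool hits_low_region(uint64_t addr, uint64_t low_end)
{
	return addr < low_end;
}
```

Only the full-range bottom-up policy can place an allocation inside low memory that is reserved later, which is why early_reserve_memory() has to run before the first memblock allocation.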
* Re: [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up 2020-12-17 20:12 [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up Roman Gushchin 2020-12-17 20:12 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Roman Gushchin @ 2020-12-20 6:48 ` Mike Rapoport 2020-12-21 17:05 ` Roman Gushchin 1 sibling, 1 reply; 38+ messages in thread From: Mike Rapoport @ 2020-12-20 6:48 UTC (permalink / raw) To: Roman Gushchin Cc: Andrew Morton, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel, kernel-team On Thu, Dec 17, 2020 at 12:12:13PM -0800, Roman Gushchin wrote: > Currently cma areas without a fixed base are allocated close to the > end of the node. This placement is sub-optimal because of compaction: > it brings pages into the cma area. In particular, it can bring in hot > executable pages, even if there is a plenty of free memory on the > machine. This results in cma allocation failures. > > Instead let's place cma areas close to the beginning of a node. > In this case the compaction will help to free cma areas, resulting > in better cma allocation success rates. > > If there is enough memory let's try to allocate bottom-up starting > with 4GB to exclude any possible interference with DMA32. On smaller > machines or in a case of a failure, stick with the old behavior. 
> > 16GB vm, 2GB cma area: > With this patch: > [ 0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G > [ 0.002928] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node > [ 0.002930] cma: Reserved 2048 MiB at 0x0000000100000000 > [ 0.002931] hugetlb_cma: reserved 2048 MiB on node 0 > > Without this patch: > [ 0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G > [ 0.002930] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node > [ 0.002933] cma: Reserved 2048 MiB at 0x00000003c0000000 > [ 0.002934] hugetlb_cma: reserved 2048 MiB on node 0 > > v2: > - switched to memblock_set_bottom_up(true), by Mike > - start with 4GB, by Mike > > Signed-off-by: Roman Gushchin <guro@fb.com> With one nit below Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> > --- > mm/cma.c | 16 ++++++++++++++++ > 1 file changed, 16 insertions(+) > > diff --git a/mm/cma.c b/mm/cma.c > index 7f415d7cda9f..21fd40c092f0 100644 > --- a/mm/cma.c > +++ b/mm/cma.c > @@ -337,6 +337,22 @@ int __init cma_declare_contiguous_nid(phys_addr_t base, > limit = highmem_start; > } > > + /* > + * If there is enough memory, try a bottom-up allocation first. > + * It will place the new cma area close to the start of the node > + * and guarantee that the compaction is moving pages out of the > + * cma area and not into it. > + * Avoid using first 4GB to not interfere with constrained zones > + * like DMA/DMA32. > + */ > + if (!memblock_bottom_up() && > + memblock_end >= SZ_4G + size) { This seems short enough to fit a single line > + memblock_set_bottom_up(true); > + addr = memblock_alloc_range_nid(size, alignment, SZ_4G, > + limit, nid, true); > + memblock_set_bottom_up(false); > + } > + > if (!addr) { > addr = memblock_alloc_range_nid(size, alignment, base, > limit, nid, true); > -- > 2.26.2 > -- Sincerely yours, Mike. 
* Re: [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up 2020-12-20 6:48 ` [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up Mike Rapoport @ 2020-12-21 17:05 ` Roman Gushchin 2020-12-23 4:06 ` Andrew Morton 0 siblings, 1 reply; 38+ messages in thread From: Roman Gushchin @ 2020-12-21 17:05 UTC To: Mike Rapoport Cc: Andrew Morton, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel, kernel-team

On Sun, Dec 20, 2020 at 08:48:48AM +0200, Mike Rapoport wrote:
> On Thu, Dec 17, 2020 at 12:12:13PM -0800, Roman Gushchin wrote:
[...]
> With one nit below
>
> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
>
[...]
> > +	if (!memblock_bottom_up() &&
> > +	    memblock_end >= SZ_4G + size) {
>

Hi Mike!

> This seems short enough to fit a single line

Indeed. An updated version is below.

Thank you for reviewing the series! I assume it's simpler to route both patches through the mm tree. What do you think?

Thanks!
-- From f88bd0a425c7181bd26a4cf900e6924a7b521419 Mon Sep 17 00:00:00 2001 From: Roman Gushchin <guro@fb.com> Date: Mon, 14 Dec 2020 20:20:52 -0800 Subject: [PATCH v3 1/2] mm: cma: allocate cma areas bottom-up Currently cma areas without a fixed base are allocated close to the end of the node. This placement is sub-optimal because of compaction: it brings pages into the cma area. In particular, it can bring in hot executable pages, even if there is plenty of free memory on the machine. This results in cma allocation failures. Instead, let's place cma areas close to the beginning of a node. In this case the compaction will help to free cma areas, resulting in better cma allocation success rates. If there is enough memory, let's try to allocate bottom-up starting at 4GB to exclude any possible interference with DMA32. On smaller machines or in case of failure, stick with the old behavior. 16GB vm, 2GB cma area: With this patch: [ 0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G [ 0.002928] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node [ 0.002930] cma: Reserved 2048 MiB at 0x0000000100000000 [ 0.002931] hugetlb_cma: reserved 2048 MiB on node 0 Without this patch: [ 0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G [ 0.002930] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node [ 0.002933] cma: Reserved 2048 MiB at 0x00000003c0000000 [ 0.002934] hugetlb_cma: reserved 2048 MiB on node 0 v3: - code alignment fix, by Mike v2: - switched to memblock_set_bottom_up(true), by Mike - start with 4GB, by Mike Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> --- mm/cma.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/mm/cma.c b/mm/cma.c index 20c4f6f40037..4fe74c9d83b0 100644 --- a/mm/cma.c +++ b/mm/cma.c
@@ -336,6 +336,21 @@ int __init cma_declare_contiguous_nid(phys_addr_t base, limit = highmem_start; } + /* + * If there is enough memory, try a bottom-up allocation first. + * It will place the new cma area close to the start of the node + * and guarantee that the compaction is moving pages out of the + * cma area and not into it. + * Avoid using first 4GB to not interfere with constrained zones + * like DMA/DMA32. + */ + if (!memblock_bottom_up() && memblock_end >= SZ_4G + size) { + memblock_set_bottom_up(true); + addr = memblock_alloc_range_nid(size, alignment, SZ_4G, + limit, nid, true); + memblock_set_bottom_up(false); + } + if (!addr) { addr = memblock_alloc_range_nid(size, alignment, base, limit, nid, true); -- 2.26.2 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up 2020-12-21 17:05 ` Roman Gushchin @ 2020-12-23 4:06 ` Andrew Morton 2020-12-23 16:35 ` Roman Gushchin 0 siblings, 1 reply; 38+ messages in thread From: Andrew Morton @ 2020-12-23 4:06 UTC (permalink / raw) To: Roman Gushchin Cc: Mike Rapoport, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel, kernel-team On Mon, 21 Dec 2020 09:05:51 -0800 Roman Gushchin <guro@fb.com> wrote: > Subject: [PATCH v3 1/2] mm: cma: allocate cma areas bottom-up i386 allmodconfig: In file included from ./include/vdso/const.h:5, from ./include/linux/const.h:4, from ./include/linux/bits.h:5, from ./include/linux/bitops.h:6, from ./include/linux/kernel.h:11, from ./include/asm-generic/bug.h:20, from ./arch/x86/include/asm/bug.h:93, from ./include/linux/bug.h:5, from ./include/linux/mmdebug.h:5, from ./include/linux/mm.h:9, from ./include/linux/memblock.h:13, from mm/cma.c:24: mm/cma.c: In function ‘cma_declare_contiguous_nid’: ./include/uapi/linux/const.h:20:19: warning: conversion from ‘long long unsigned int’ to ‘phys_addr_t’ {aka ‘unsigned int’} changes value from ‘4294967296’ to ‘0’ [-Woverflow] #define __AC(X,Y) (X##Y) ^~~~~~ ./include/uapi/linux/const.h:21:18: note: in expansion of macro ‘__AC’ #define _AC(X,Y) __AC(X,Y) ^~~~ ./include/linux/sizes.h:46:18: note: in expansion of macro ‘_AC’ #define SZ_4G _AC(0x100000000, ULL) ^~~ mm/cma.c:349:53: note: in expansion of macro ‘SZ_4G’ addr = memblock_alloc_range_nid(size, alignment, SZ_4G, ^~~~~ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up 2020-12-23 4:06 ` Andrew Morton @ 2020-12-23 16:35 ` Roman Gushchin 2020-12-23 22:10 ` Mike Rapoport 0 siblings, 1 reply; 38+ messages in thread From: Roman Gushchin @ 2020-12-23 16:35 UTC (permalink / raw) To: Andrew Morton Cc: Mike Rapoport, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel, kernel-team On Tue, Dec 22, 2020 at 08:06:06PM -0800, Andrew Morton wrote: > On Mon, 21 Dec 2020 09:05:51 -0800 Roman Gushchin <guro@fb.com> wrote: > > > Subject: [PATCH v3 1/2] mm: cma: allocate cma areas bottom-up > > i386 allmodconfig: > > In file included from ./include/vdso/const.h:5, > from ./include/linux/const.h:4, > from ./include/linux/bits.h:5, > from ./include/linux/bitops.h:6, > from ./include/linux/kernel.h:11, > from ./include/asm-generic/bug.h:20, > from ./arch/x86/include/asm/bug.h:93, > from ./include/linux/bug.h:5, > from ./include/linux/mmdebug.h:5, > from ./include/linux/mm.h:9, > from ./include/linux/memblock.h:13, > from mm/cma.c:24: > mm/cma.c: In function ‘cma_declare_contiguous_nid’: > ./include/uapi/linux/const.h:20:19: warning: conversion from ‘long long unsigned int’ to ‘phys_addr_t’ {aka ‘unsigned int’} changes value from ‘4294967296’ to ‘0’ [-Woverflow] > #define __AC(X,Y) (X##Y) > ^~~~~~ > ./include/uapi/linux/const.h:21:18: note: in expansion of macro ‘__AC’ > #define _AC(X,Y) __AC(X,Y) > ^~~~ > ./include/linux/sizes.h:46:18: note: in expansion of macro ‘_AC’ > #define SZ_4G _AC(0x100000000, ULL) > ^~~ > mm/cma.c:349:53: note: in expansion of macro ‘SZ_4G’ > addr = memblock_alloc_range_nid(size, alignment, SZ_4G, > ^~~~~ > I thought that (!memblock_bottom_up() && memblock_end >= SZ_4G + size) can't be true on a 32-bit platform, so the whole if clause can be compiled out. Maybe it's because memblock_end can be equal to SZ_4G and if the size == 0... I have no better idea than wrapping everything into #if BITS_PER_LONG > 32 #endif. Thanks! 
-- diff --git a/mm/cma.c b/mm/cma.c index 4fe74c9d83b0..5d69b498603a 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -344,12 +344,14 @@ int __init cma_declare_contiguous_nid(phys_addr_t base, * Avoid using first 4GB to not interfere with constrained zones * like DMA/DMA32. */ +#if BITS_PER_LONG > 32 if (!memblock_bottom_up() && memblock_end >= SZ_4G + size) { memblock_set_bottom_up(true); addr = memblock_alloc_range_nid(size, alignment, SZ_4G, limit, nid, true); memblock_set_bottom_up(false); } +#endif if (!addr) { addr = memblock_alloc_range_nid(size, alignment, base, ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up 2020-12-23 16:35 ` Roman Gushchin @ 2020-12-23 22:10 ` Mike Rapoport 2020-12-28 19:36 ` Roman Gushchin 0 siblings, 1 reply; 38+ messages in thread From: Mike Rapoport @ 2020-12-23 22:10 UTC (permalink / raw) To: Roman Gushchin Cc: Andrew Morton, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel, kernel-team On Wed, Dec 23, 2020 at 08:35:37AM -0800, Roman Gushchin wrote: > On Tue, Dec 22, 2020 at 08:06:06PM -0800, Andrew Morton wrote: > > On Mon, 21 Dec 2020 09:05:51 -0800 Roman Gushchin <guro@fb.com> wrote: > > > > > Subject: [PATCH v3 1/2] mm: cma: allocate cma areas bottom-up > > > > i386 allmodconfig: > > > > In file included from ./include/vdso/const.h:5, > > from ./include/linux/const.h:4, > > from ./include/linux/bits.h:5, > > from ./include/linux/bitops.h:6, > > from ./include/linux/kernel.h:11, > > from ./include/asm-generic/bug.h:20, > > from ./arch/x86/include/asm/bug.h:93, > > from ./include/linux/bug.h:5, > > from ./include/linux/mmdebug.h:5, > > from ./include/linux/mm.h:9, > > from ./include/linux/memblock.h:13, > > from mm/cma.c:24: > > mm/cma.c: In function ‘cma_declare_contiguous_nid’: > > ./include/uapi/linux/const.h:20:19: warning: conversion from ‘long long unsigned int’ to ‘phys_addr_t’ {aka ‘unsigned int’} changes value from ‘4294967296’ to ‘0’ [-Woverflow] > > #define __AC(X,Y) (X##Y) > > ^~~~~~ > > ./include/uapi/linux/const.h:21:18: note: in expansion of macro ‘__AC’ > > #define _AC(X,Y) __AC(X,Y) > > ^~~~ > > ./include/linux/sizes.h:46:18: note: in expansion of macro ‘_AC’ > > #define SZ_4G _AC(0x100000000, ULL) > > ^~~ > > mm/cma.c:349:53: note: in expansion of macro ‘SZ_4G’ > > addr = memblock_alloc_range_nid(size, alignment, SZ_4G, > > ^~~~~ > > > > I thought that (!memblock_bottom_up() && memblock_end >= SZ_4G + size) > can't be true on a 32-bit platform, so the whole if clause can be compiled out. 
> Maybe it's because memblock_end can be equal to SZ_4G and if the size == 0... > > I have no better idea than wrapping everything into > #if BITS_PER_LONG > 32 > #endif. 32-bit systems can have physical addresses wider than 32 bits. I think a better option would be to use CONFIG_PHYS_ADDR_T_64BIT. > Thanks! > > -- > > diff --git a/mm/cma.c b/mm/cma.c > index 4fe74c9d83b0..5d69b498603a 100644 > --- a/mm/cma.c > +++ b/mm/cma.c > @@ -344,12 +344,14 @@ int __init cma_declare_contiguous_nid(phys_addr_t base, > * Avoid using first 4GB to not interfere with constrained zones > * like DMA/DMA32. > */ > +#if BITS_PER_LONG > 32 > if (!memblock_bottom_up() && memblock_end >= SZ_4G + size) { > memblock_set_bottom_up(true); > addr = memblock_alloc_range_nid(size, alignment, SZ_4G, > limit, nid, true); > memblock_set_bottom_up(false); > } > +#endif > > if (!addr) { > addr = memblock_alloc_range_nid(size, alignment, base, -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up 2020-12-23 22:10 ` Mike Rapoport @ 2020-12-28 19:36 ` Roman Gushchin 0 siblings, 0 replies; 38+ messages in thread From: Roman Gushchin @ 2020-12-28 19:36 UTC (permalink / raw) To: Mike Rapoport Cc: Andrew Morton, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel, kernel-team On Thu, Dec 24, 2020 at 12:10:39AM +0200, Mike Rapoport wrote: > On Wed, Dec 23, 2020 at 08:35:37AM -0800, Roman Gushchin wrote: > > On Tue, Dec 22, 2020 at 08:06:06PM -0800, Andrew Morton wrote: > > > On Mon, 21 Dec 2020 09:05:51 -0800 Roman Gushchin <guro@fb.com> wrote: > > > > > > > Subject: [PATCH v3 1/2] mm: cma: allocate cma areas bottom-up > > > > > > i386 allmodconfig: > > > > > > In file included from ./include/vdso/const.h:5, > > > from ./include/linux/const.h:4, > > > from ./include/linux/bits.h:5, > > > from ./include/linux/bitops.h:6, > > > from ./include/linux/kernel.h:11, > > > from ./include/asm-generic/bug.h:20, > > > from ./arch/x86/include/asm/bug.h:93, > > > from ./include/linux/bug.h:5, > > > from ./include/linux/mmdebug.h:5, > > > from ./include/linux/mm.h:9, > > > from ./include/linux/memblock.h:13, > > > from mm/cma.c:24: > > > mm/cma.c: In function ‘cma_declare_contiguous_nid’: > > > ./include/uapi/linux/const.h:20:19: warning: conversion from ‘long long unsigned int’ to ‘phys_addr_t’ {aka ‘unsigned int’} changes value from ‘4294967296’ to ‘0’ [-Woverflow] > > > #define __AC(X,Y) (X##Y) > > > ^~~~~~ > > > ./include/uapi/linux/const.h:21:18: note: in expansion of macro ‘__AC’ > > > #define _AC(X,Y) __AC(X,Y) > > > ^~~~ > > > ./include/linux/sizes.h:46:18: note: in expansion of macro ‘_AC’ > > > #define SZ_4G _AC(0x100000000, ULL) > > > ^~~ > > > mm/cma.c:349:53: note: in expansion of macro ‘SZ_4G’ > > > addr = memblock_alloc_range_nid(size, alignment, SZ_4G, > > > ^~~~~ > > > > > > > I thought that (!memblock_bottom_up() && memblock_end >= SZ_4G + size) > > can't be true on a 32-bit platform, so 
the whole if clause can be compiled out. > > Maybe it's because memblock_end can be equal to SZ_4G and if the size == 0... > > > > I have no better idea than wrapping everything into > > #if BITS_PER_LONG > 32 > > #endif. > > 32-bit systems can have more than 32 bit in the physical address. > I think a better option would be to use CONFIG_PHYS_ADDR_T_64BIT I agree. An updated fixup is below. Andrew, could you please replace the previous fixup with this one? Thanks! -- diff --git a/mm/cma.c b/mm/cma.c index 4fe74c9d83b0..0ba69cd16aeb 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -344,12 +344,14 @@ int __init cma_declare_contiguous_nid(phys_addr_t base, * Avoid using first 4GB to not interfere with constrained zones * like DMA/DMA32. */ +#ifdef CONFIG_PHYS_ADDR_T_64BIT if (!memblock_bottom_up() && memblock_end >= SZ_4G + size) { memblock_set_bottom_up(true); addr = memblock_alloc_range_nid(size, alignment, SZ_4G, limit, nid, true); memblock_set_bottom_up(false); } +#endif if (!addr) { addr = memblock_alloc_range_nid(size, alignment, base, ^ permalink raw reply related [flat|nested] 38+ messages in thread
end of thread. Thread overview (38+ messages): 2020-12-17 20:12 [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up Roman Gushchin 2020-12-17 20:12 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Roman Gushchin 2020-12-19 14:52 ` Wonhyuk Yang 2020-12-19 17:05 ` Roman Gushchin 2020-12-20 6:49 ` Mike Rapoport 2021-01-22 4:37 ` Thiago Jung Bauermann 2021-01-24 2:09 ` Andrew Morton 2021-01-24 7:34 ` Mike Rapoport 2021-01-26 0:30 ` Thiago Jung Bauermann 2021-02-08 23:58 ` Thiago Jung Bauermann 2021-02-28 4:18 ` Florian Fainelli 2021-02-28 9:00 ` Mike Rapoport 2021-02-28 18:19 ` Florian Fainelli 2021-02-28 23:08 ` Serge Semin 2021-03-01 3:50 ` Florian Fainelli 2021-03-01 9:22 ` Serge Semin 2021-03-02 4:09 ` Florian Fainelli 2021-03-02 13:26 ` Serge Semin 2021-03-02 4:19 ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli 2021-03-02 8:09 ` Mike Rapoport 2021-03-02 13:54 ` Serge Semin 2021-03-02 19:04 ` Roman Gushchin 2021-03-02 23:54 ` Thomas Bogendoerfer 2021-03-03 1:30 ` Florian Fainelli 2021-03-03 9:41 ` Thomas Bogendoerfer 2021-03-03 17:45 ` Maciej W. Rozycki 2021-03-03 18:15 ` Thomas Bogendoerfer 2021-03-03 21:50 ` Maciej W. Rozycki 2021-03-01 9:45 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Mike Rapoport 2021-03-02 3:55 ` Roman Gushchin 2021-03-02 13:08 ` Serge Semin 2021-03-23 18:19 ` [tip: x86/boot] x86/setup: Consolidate early memory reservations tip-bot2 for Mike Rapoport 2020-12-20 6:48 ` [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up Mike Rapoport 2020-12-21 17:05 ` Roman Gushchin 2020-12-23 4:06 ` Andrew Morton 2020-12-23 16:35 ` Roman Gushchin 2020-12-23 22:10 ` Mike Rapoport 2020-12-28 19:36 ` Roman Gushchin