Date: Tue, 2 Mar 2021 16:26:01 +0300
From: Serge Semin
To: Florian Fainelli
CC: Serge Semin, Mike Rapoport, Thomas Bogendoerfer, Roman Gushchin, Andrew Morton, Kamal Dasu, Paul Cercueil, Jiaxun Yang, Michal Hocko, "open list:BROADCOM BMIPS MIPS ARCHITECTURE"
Subject: Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
Message-ID: <20210302132601.c2bm6sbjnjzud3da@mobilestation>
References: <20201217201214.3414100-1-guro@fb.com> <20201217201214.3414100-2-guro@fb.com> <23fc1ef9-7342-8bc2-d184-d898107c52b2@gmail.com> <20210228090041.GO1447004@kernel.org> <8cbafe95-0f8c-a9b7-2dc9-cded846622fd@gmail.com> <20210228230811.wdae7oaaf3mbpgwl@mobilestation> <2e973fa8-5f2b-6840-0874-9c15fa0ebea0@gmail.com> <20210301092241.i7dxo7zbg3ar55d6@mobilestation> <97600bf8-06fd-3d76-8791-c2e3c4eae8a1@gmail.com>
In-Reply-To:
<97600bf8-06fd-3d76-8791-c2e3c4eae8a1@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 01, 2021 at 08:09:52PM -0800, Florian Fainelli wrote:
>
> On 3/1/2021 1:22 AM, Serge Semin wrote:
> > On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> >> Hi Serge,
> >>
> >> On 2/28/2021 3:08 PM, Serge Semin wrote:
> >>> Hi folks,
> >>> What you've got here seems to be a more complicated problem than it
> >>> might look at first. Please see my comments below.
> >>>
> >>> (Note that I've discarded some of the email logs, which are of no
> >>> interest for the discovered problem. Please also note that I don't
> >>> have any Broadcom hardware to test the solution suggested below.)
> >>>
> >>> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> >>>> Hi Mike,
> >>>>
> >>>> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> >>>>> Hi Florian,
> >>>>>
> >>>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>>>>>
> >>>
> >>>>>> [...]
> >>>
> >>>>>>
> >>>>>> Hi Roman, Thomas and other linux-mips folks,
> >>>>>>
> >>>>>> Kamal and I have been unable to boot v5.11 on MIPS since this
> >>>>>> commit; reverting it makes our MIPS platforms boot successfully. We
> >>>>>> do not see a warning like the one in the commit message; instead,
> >>>>>> what happens appears to be a corrupted Device Tree, which prevents
> >>>>>> the parsing of the "rdb" node, so the interrupt controllers are not
> >>>>>> registered and the system eventually fails to boot.
> >>>>>>
> >>>>>> The Device Tree is built into the kernel image and resides at
> >>>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>>>>>
> >>>>>> Do you have any idea what could be wrong with MIPS specifically here?
> >>>
> >>> Most likely the problem you've discovered has been there for quite
> >>> some time.
> >>> The patch you are referring to just triggered it by extending the
> >>> early allocation range. Before that patch was accepted, the early
> >>> memory allocations had been performed in the range:
> >>> [kernel_end, RAM_END].
> >>> The patch changed that, so the early allocations are now done within
> >>> [RAM_START + PAGE_SIZE, RAM_END].
> >>>
> >>> In normal situations it's safe to do that as long as all the critical
> >>> memory regions (including the memory residing below the kernel) have
> >>> been reserved. But as soon as some memory holding critical structures
> >>> is left unreserved, the kernel may allocate it, for instance for early
> >>> initializations, with obviously unpredictable but most of the time
> >>> unpleasant consequences.
> >>>
> >>>>>
> >>>>> Apparently there is a memblock allocation in one of the functions called
> >>>>> from arch_mem_init() between plat_mem_setup() and
> >>>>> early_init_fdt_reserve_self().
> >>>
> >>> Mike, alas, according to the log provided by Florian that's not the
> >>> cause of the problem. Please see my considerations below.
> >>>
> >>>> [...]
> >>>>
> >>>> [ 0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost) (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun Feb 28 10:01:50 PST 2021
> >>>> [ 0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> >>>> [ 0.000000] FPU revision is: 00130001
> >>>
> >>>> [ 0.000000] memblock_add: [0x00000000-0x0fffffff] early_init_dt_scan_memory+0x160/0x1e0
> >>>> [ 0.000000] memblock_add: [0x20000000-0x4fffffff] early_init_dt_scan_memory+0x160/0x1e0
> >>>> [ 0.000000] memblock_add: [0x90000000-0xcfffffff] early_init_dt_scan_memory+0x160/0x1e0
> >>>
> >>> Here the memory has been added to the memblock allocator.
> >>>
> >>>> [ 0.000000] MIPS: machine is Broadcom BCM97435SVMB
> >>>> [ 0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> >>>> [ 0.000000] printk: bootconsole [ns16550a0] enabled
> >>>
> >>>> [ 0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0] setup_arch+0x128/0x69c
> >>>
> >>> Here the fdt memory has been reserved. (Note that it's built into the
> >>> kernel.)
> >>>
> >>>> [ 0.000000] memblock_reserve: [0x00010000-0x018313cf] setup_arch+0x1f8/0x69c
> >>>
> >>> Here the kernel itself, together with the built-in dtb, has been
> >>> reserved. So far so good.
> >>>
> >>>> [ 0.000000] Initrd not found or empty - disabling initrd
> >>>
> >>>> [ 0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1 from=0x00000000 max_addr=0x00000000 early_init_dt_alloc_memory_arch+0x40/0x84
> >>>> [ 0.000000] memblock_reserve: [0x00001000-0x00003aa0] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1 from=0x00000000 max_addr=0x00000000 early_init_dt_alloc_memory_arch+0x40/0x84
> >>>> [ 0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b] memblock_alloc_range_nid+0xf8/0x198
> >>>
> >>> The log above most likely belongs to the call chain:
> >>> setup_arch()
> >>> +-> arch_mem_init()
> >>>     +-> device_tree_init() - BMIPS-specific method
> >>>         +-> unflatten_and_copy_device_tree()
> >>>
> >>> In other words, here we've copied the fdt from its original location
> >>> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> >>> it to [0x00003aa4-0x0000ba4b].
> >>>
> >>> The problem is that a bit later the following call chain is performed:
> >>> setup_arch()
> >>> +-> plat_smp_setup()
> >>>     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> >>>         +-> if (!board_ebase_setup)
> >>>                 board_ebase_setup = &bmips_ebase_setup;
> >>>
> >>> So at the moment of the CPU traps initialization the bmips_ebase_setup()
> >>> method is called.
> >>> What trap_init() does isn't compatible with the allocation performed
> >>> by the unflatten_and_copy_device_tree() method. See the next comment.
> >>>
> >>>> [ 0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1 from=0x00000000 max_addr=0x00000000 early_init_dt_alloc_memory_arch+0x40/0x84
> >>>> [ 0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_reserve: [0x0096a000-0x00969fff] setup_arch+0x3fc/0x69c
> >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >>>> [ 0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >>>> [ 0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >>>> [ 0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64 bytes.
> >>>> [ 0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases, linesize 32 bytes
> >>>> [ 0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
> >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >>>> [ 0.000000] memblock_reserve: [0x0000c000-0x0000cfff] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >>>> [ 0.000000] memblock_reserve: [0x0000d000-0x0000dfff] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >>>> [ 0.000000] memblock_reserve: [0x0000e000-0x0000efff] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] Zone ranges:
> >>>> [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
> >>>> [ 0.000000] HighMem [mem 0x0000000010000000-0x00000000cfffffff]
> >>>> [ 0.000000] Movable zone start for each node
> >>>> [ 0.000000] Early memory node ranges
> >>>> [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
> >>>> [ 0.000000] node 0: [mem 0x0000000020000000-0x000000004fffffff]
> >>>> [ 0.000000] node 0: [mem 0x0000000090000000-0x00000000cfffffff]
> >>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x00000000cfffffff]
> >>>> [ 0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0 from=0x00000000 max_addr=0x00000000 alloc_node_mem_map.constprop.135+0x6c/0xc8
> >>>> [ 0.000000] memblock_reserve: [0x01831400-0x032313ff] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0 from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> >>>> [ 0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0 from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> >>>> [ 0.000000] memblock_reserve:
> >>>> [0x0000bc80-0x0000bdff] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] MEMBLOCK configuration:
> >>>> [ 0.000000] memory size = 0x80000000 reserved size = 0x0322f032
> >>>> [ 0.000000] memory.cnt = 0x3
> >>>> [ 0.000000] memory[0x0] [0x00000000-0x0fffffff], 0x10000000 bytes flags: 0x0
> >>>> [ 0.000000] memory[0x1] [0x20000000-0x4fffffff], 0x30000000 bytes flags: 0x0
> >>>> [ 0.000000] memory[0x2] [0x90000000-0xcfffffff], 0x40000000 bytes flags: 0x0
> >>>> [ 0.000000] reserved.cnt = 0xa
> >>>> [ 0.000000] reserved[0x0] [0x00001000-0x00003aa0], 0x00002aa1 bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x1] [0x00003aa4-0x0000ba64], 0x00007fc1 bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x2] [0x0000ba80-0x0000ba9f], 0x00000020 bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x3] [0x0000bb00-0x0000bb1f], 0x00000020 bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x4] [0x0000bb80-0x0000bb9f], 0x00000020 bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x5] [0x0000bc00-0x0000bc1f], 0x00000020 bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x6] [0x0000bc80-0x0000bdff], 0x00000180 bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x7] [0x0000c000-0x0000efff], 0x00003000 bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x8] [0x00010000-0x018313cf], 0x018213d0 bytes flags: 0x0
> >>>> [ 0.000000] reserved[0x9] [0x01831400-0x032313ff], 0x01a00000 bytes flags: 0x0
> >>>> [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
> >>>> [ 0.000000] memblock_reserve: [0x0000be00-0x0000be1d] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
> >>>> [ 0.000000] memblock_reserve: [0x0000be80-0x0000be9d] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes
> >>>> align=0x1000 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
> >>>> [ 0.000000] memblock_reserve: [0x0000f000-0x0000ffff] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
> >>>> [ 0.000000] memblock_reserve: [0x03231400-0x032323ff] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1 from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
> >>>> [ 0.000000] memblock_reserve: [0x03233000-0x0327afff] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_free: [0x03245000-0x03244fff] pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [ 0.000000] memblock_free: [0x03257000-0x03256fff] pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [ 0.000000] memblock_free: [0x03269000-0x03268fff] pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [ 0.000000] memblock_free: [0x0327b000-0x0327afff] pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [ 0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
> >>>> [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
> >>>> [ 0.000000] memblock_reserve: [0x0000bf00-0x0000bf03] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
> >>>> [ 0.000000] memblock_reserve: [0x0000bf80-0x0000bf83] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
> >>>> [ 0.000000] memblock_reserve: [0x03232400-0x0323240f] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80
> >>>> nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
> >>>> [ 0.000000] memblock_reserve: [0x03232480-0x0323248f] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
> >>>> [ 0.000000] memblock_reserve: [0x03232500-0x0323257f] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
> >>>> [ 0.000000] memblock_reserve: [0x03232580-0x032325db] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
> >>>> [ 0.000000] memblock_reserve: [0x03232600-0x032328ff] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
> >>>> [ 0.000000] memblock_reserve: [0x03232900-0x03232c03] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
> >>>> [ 0.000000] memblock_reserve: [0x03232c80-0x03232d3f] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] memblock_free: [0x0000f000-0x0000ffff] pcpu_embed_first_chunk+0x838/0x884
> >>>> [ 0.000000] memblock_free: [0x03231400-0x032323ff] pcpu_embed_first_chunk+0x850/0x884
> >>>> [ 0.000000] Built 1 zonelists, mobility grouping on.
> >>>> Total pages: 523776
> >>>> [ 0.000000] Kernel command line: console=ttyS0,115200 earlycon
> >>>> [ 0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> >>>> [ 0.000000] memblock_reserve: [0x0327b000-0x0329afff] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
> >>>> [ 0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1 from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> >>>> [ 0.000000] memblock_reserve: [0x0329b000-0x032aafff] memblock_alloc_range_nid+0xf8/0x198
> >>>> [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
> >>>
> >>>> [ 0.000000] memblock_reserve: [0x00000000-0x000003ff] trap_init+0x70/0x4e8
> >>>
> >>> Most likely the corruption happened somewhere around here. The log
> >>> line above has just reserved memory for the NMI/reset vectors:
> >>> arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> >>>
> >>> But then the board_ebase_setup() pointer, which has been initialized
> >>> with bmips_ebase_setup() earlier, is dereferenced and called. It
> >>> overwrites the ebase variable with 0x80001000, since this is a
> >>> CPU_BMIPS5000 CPU. So any further calls of functions like
> >>> set_handler()/set_except_vector()/set_vi_srs_handler()/etc. may
> >>> corrupt the memory above 0x80001000, which, as we have discovered,
> >>> belongs to the fdt copy and the unflattened device tree.
> >>>
> >>>> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> >>>> [ 0.000000] Memory: 2045268K/2097152K available (8226K kernel code, 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K cma-reserved, 1835008K highmem)
> >>>> [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >>>> [ 0.000000] rcu: Hierarchical RCU implementation.
> >>>> [ 0.000000] rcu: RCU event tracing is enabled.
> >>>> [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> >>>> [ 0.000000] NR_IRQS: 256
> >>>
> >>>> [ 0.000000] OF: Bad cell count for /rdb
> >>>> [ 0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> >>>> [ 0.000000] OF: of_irq_init: children remain, but no parents
> >>>
> >>> So this is the first place where a consequence of the corruption
> >>> pops up. Luckily it's just the "Bad cell count" error. We could have
> >>> got a much less obvious log here, up to a crash somewhere further
> >>> on...
> >>>
> >>>> [ 0.000000] random: get_random_bytes called from start_kernel+0x444/0x654 with crng_init=0
> >>>> [ 0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns, wraps every 8589934590000000ns
> >>>
> >>>>
> >>>> and with your patch applied, which unfortunately did not work, we have
> >>>> the following:
> >>>>
> >>>> [...]
> >>>
> >>> So a patch like this should work around the corruption:
> >>>
> >>> --- a/arch/mips/bmips/setup.c
> >>> +++ b/arch/mips/bmips/setup.c
> >>> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> >>>  
> >>>  	__dt_setup_arch(dtb);
> >>>  
> >>> +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> >>> +
> >>>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> >>>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> >>>  					     q->compatible)) {
> >>
> >
> >> This patch works, thanks a lot for the troubleshooting and analysis! How
> >> about the following, which works as well and should be more universal,
> >> since it does not require each architecture to provide an appropriate
> >> call to memblock_reserve():
> >
> > Hm, are you sure it's working?
>
> I was, until I noticed that I was working on top of a revert of Roman's
> patch; sorry about the brain fart here.
>
> > If so, my analysis hasn't been quite
> > correct.
> > My suggestion was based on the trace of memory initializations,
> > allocations and reservations. So here is the sequence of the most
> > crucial of them:
> > 1) Memblock initialization:
> >    start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch()
> >    (This is where I suggested placing the exception memory
> >    reservation.)
> > 2) Base FDT memory reservation:
> >    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self()
> > 3) Parsing of the FDT "reserved-memory" nodes and reservation of the
> >    corresponding memory ranges:
> >    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem()
> > 4) Reservation of the kernel itself and of some critical sections like
> >    initrd and crash-kernel:
> >    start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()...
> > 5) Copying and unflattening of the device tree built into the kernel
> >    (BMIPS platform code):
> >    start_kernel()->setup_arch()->arch_mem_init()->device_tree_init()
> >    This is the very first time an allocation from the memblock pool
> >    is performed. Since we haven't reserved the memory for the exception
> >    vectors yet, the memblock allocator is free to return that memory
> >    range for any other use. Needless to say, if we try to use that
> >    memory later without consulting memblock, we may, and in our case
> >    will, get into trouble.
> > 6) Many random early memblock allocations for kernel use before the
> >    buddy and sl*b allocators are up and running...
> >    Note that if, by some fortunate chance, the allocations made in 5)
> >    didn't overlap the exception memory, the chances that these ones do
> >    are much higher, with obviously fatal consequences when both users
> >    access the same range independently.
> > 7) Trap/exception vector initialization and !memory reservation! for
> >    them:
> >    start_kernel()->trap_init()
> >    Only at this point do we get to reserve the memory for the vectors.
> > 8) Init and run the buddy/sl*b allocators:
> >    start_kernel()->mm_init()->...mem_init()...
> >
> > There are a lot of allocations done in 5) and 6) before
> > trap_init() is called in 7). You can see that in your log. That's why
> > I have doubts that your patch worked well. Most likely you've
> > forgotten to revert the workaround I suggested in the previous
> > message. Could you make sure that you didn't, and re-test your patch?
> > If it still works, then I might have confused something, and it's
> > strange that my patch worked in the first place...
>
> I would like to submit a fix for 5.12-rc1 and get it backported into
> 5.11 so that BMIPS machines boot again; that will essentially be your
> earlier proposed fix.
>
> BMIPS is the only "legacy" MIPS platform that defines an exception base,
> so while this problem may certainly exist on other platforms, I do
> wonder how likely it is there, though?

Hm, at least we can be sure that the problem exists on every platform
that satisfies the !cpu_has_mips_r2_r6 condition and has the VEIC/VINT
capability. When initializing the exception table, such platforms may
write beyond the first PAGE_SIZE of memory, thus corrupting memory that
may have been allocated for something else. In my case the problem
doesn't manifest itself because the CPU is MIPS32r5.

-Sergey

>
> >
> > Food for thought for everyone (Thomas, Mark, please join the
> > discussion). What we've got here is a somewhat bigger problem. AFAICS,
> > if bottom-up allocation is enabled (which is our case),
> > memblock_find_in_range_node() performs the allocation above the very
> > first PAGE_SIZE chunk of memory (see that method's code for details).
> > So we are currently on the safe side for some older MIPS platforms.
> > But platforms with VEIC/VINT may get into the same trouble if they
> > don't reserve the exception memory early enough, before the kernel
> > starts making random allocations from memblock.
> > So we either need to provide a generic workaround for that, or make
> > sure each platform reserves the vectors itself, for instance in the
> > plat_mem_setup() method.
> >
> > -Sergey
> >
> >>
> >> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> >> index e0352958e2f7..b0a173b500e8 100644
> >> --- a/arch/mips/kernel/traps.c
> >> +++ b/arch/mips/kernel/traps.c
> >> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> >>  
> >>  	if (!cpu_has_mips_r2_r6) {
> >>  		ebase = CAC_BASE;
> >> -		ebase_pa = virt_to_phys((void *)ebase);
> >>  		vec_size = 0x400;
> >> -
> >> -		memblock_reserve(ebase_pa, vec_size);
> >>  	} else {
> >>  		if (cpu_has_veic || cpu_has_vint)
> >>  			vec_size = 0x200 + VECTORSPACING*64;
> >> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> >>  
> >>  	if (board_ebase_setup)
> >>  		board_ebase_setup();
> >> +
> >> +	/* board_ebase_setup() can change the exception base address
> >> +	 * reserve it now after changes were made.
> >> +	 */
> >> +	if (!cpu_has_mips_r2_r6) {
> >> +		ebase_pa = virt_to_phys((void *)ebase);
> >> +		memblock_reserve(ebase_pa, vec_size);
> >> +	}
> >>  	per_cpu_trap_init(true);
> >>  	memblock_set_bottom_up(false);
> >> -- 
> >> Florian
>
> -- 
> Florian