Subject: Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
From: Florian Fainelli
Date: Mon, 1 Mar 2021 20:09:52 -0800
To: Serge Semin, Mike Rapoport, Thomas Bogendoerfer
Cc: Serge Semin, Roman Gushchin, Andrew Morton, linux-mm@kvack.org, Kamal Dasu, Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim@lge.com, riel@surriel.com, Michal Hocko, linux-kernel@vger.kernel.org, kernel-team@fb.com, "open list:BROADCOM BMIPS MIPS ARCHITECTURE"
Message-ID: <97600bf8-06fd-3d76-8791-c2e3c4eae8a1@gmail.com>
In-Reply-To: <20210301092241.i7dxo7zbg3ar55d6@mobilestation>
References: <20201217201214.3414100-1-guro@fb.com> <20201217201214.3414100-2-guro@fb.com> <23fc1ef9-7342-8bc2-d184-d898107c52b2@gmail.com> <20210228090041.GO1447004@kernel.org> <8cbafe95-0f8c-a9b7-2dc9-cded846622fd@gmail.com> <20210228230811.wdae7oaaf3mbpgwl@mobilestation> <2e973fa8-5f2b-6840-0874-9c15fa0ebea0@gmail.com> <20210301092241.i7dxo7zbg3ar55d6@mobilestation>

On 3/1/2021 1:22 AM, Serge Semin wrote:
> On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
>> Hi Serge,
>>
>> On 2/28/2021 3:08 PM, Serge Semin wrote:
>>> Hi folks,
>>> What you've got here seems a more complicated problem than it
>>> could originally look like. Please, see my comments below.
>>>
>>> (Note I've discarded some of the email logs, which of no interest
>>> to the discovered problem. Please also note that I haven't got any
>>> Broadcom hardware to test out a solution suggested below.)
>>>
>>> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
>>>> Hi Mike,
>>>>
>>>> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
>>>>> Hi Florian,
>>>>>
>>>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
>>>>>>
>>>
>>>>>> [...]
>>>
>>>>>>
>>>>>> Hi Roman, Thomas and other linux-mips folks,
>>>>>>
>>>>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
>>>>>> commit, reverting it makes our MIPS platforms boot successfully. We do
>>>>>> not see a warning like this one in the commit message, instead what
>>>>>> happens appear to be a corrupted Device Tree which prevents the parsing
>>>>>> of the "rdb" node and leading to the interrupt controllers not being
>>>>>> registered, and the system eventually not booting.
>>>>>>
>>>>>> The Device Tree is built-into the kernel image and resides at
>>>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
>>>>>>
>>>>>> Do you have any idea what could be wrong with MIPS specifically here?
>>>
>>> Most likely the problem you've discovered has been there for quite
>>> some time. The patch you are referring to just caused it to be
>>> triggered by extending the early allocation range. See before that
>>> patch was accepted the early memory allocations had been performed
>>> in the range:
>>> [kernel_end, RAM_END].
>>> The patch changed that, so the early allocations are done within
>>> [RAM_START + PAGE_SIZE, RAM_END].
>>>
>>> In normal situations it's safe to do that as long as all the critical
>>> memory regions (including the memory residing a space below the
>>> kernel) have been reserved. But as soon as a memory with some critical
>>> structures haven't been reserved, the kernel may allocate it to be used
>>> for instance for early initializations with obviously unpredictable but
>>> most of the times unpleasant consequences.
>>>
>>>>>
>>>>> Apparently there is a memblock allocation in one of the functions called
>>>>> from arch_mem_init() between plat_mem_setup() and
>>>>> early_init_fdt_reserve_self().
>>>
>>> Mike, alas according to the log provided by Florian that's not the reason
>>> of the problem. Please, see my considerations below.
>>>
>>>> [...]
>>>>
>>>> [ 0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
>>>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
>>>> Feb 28 10:01:50 PST 2021
>>>> [ 0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
>>>> [ 0.000000] FPU revision is: 00130001
>>>
>>>> [ 0.000000] memblock_add: [0x00000000-0x0fffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>> [ 0.000000] memblock_add: [0x20000000-0x4fffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>> [ 0.000000] memblock_add: [0x90000000-0xcfffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>
>>> Here the memory has been added to the memblock allocator.
>>>
>>>> [ 0.000000] MIPS: machine is Broadcom BCM97435SVMB
>>>> [ 0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
>>>> [ 0.000000] printk: bootconsole [ns16550a0] enabled
>>>
>>>> [ 0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
>>>> setup_arch+0x128/0x69c
>>>
>>> Here the fdt memory has been reserved. (Note it's built into the
>>> kernel.)
>>>
>>>> [ 0.000000] memblock_reserve: [0x00010000-0x018313cf]
>>>> setup_arch+0x1f8/0x69c
>>>
>>> Here the kernel itself together with built-in dtb have been reserved.
>>> So far so good.
>>>
>>>> [ 0.000000] Initrd not found or empty - disabling initrd
>>>
>>>> [ 0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [ 0.000000] memblock_reserve: [0x00001000-0x00003aa0]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [ 0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>
>>> The log above most likely belongs to the call-chain:
>>> setup_arch()
>>> +-> arch_mem_init()
>>>     +-> device_tree_init() - BMIPS specific method
>>>         +-> unflatten_and_copy_device_tree()
>>>
>>> So to speak here we've copied the fdt from the original space
>>> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
>>> it to [0x00003aa4-0x0000ba4b].
>>>
>>> The problem is that a bit later the next call-chain is performed:
>>> setup_arch()
>>> +-> plat_smp_setup()
>>>     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
>>>         +-> if (!board_ebase_setup)
>>>                 board_ebase_setup = &bmips_ebase_setup;
>>>
>>> So at the moment of the CPU traps initialization the bmips_ebase_setup()
>>> method is called.
>>> What trap_init() does isn't compatible with the
>>> allocation performed by the unflatten_and_copy_device_tree() method.
>>> See the next comment.
>>>
>>>> [ 0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [ 0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_reserve: [0x0096a000-0x00969fff]
>>>> setup_arch+0x3fc/0x69c
>>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [ 0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [ 0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [ 0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
>>>> bytes.
>>>> [ 0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
>>>> linesize 32 bytes
>>>> [ 0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
>>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [ 0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [ 0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [ 0.000000] memblock_reserve: [0x0000e000-0x0000efff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] Zone ranges:
>>>> [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
>>>> [ 0.000000] HighMem [mem 0x0000000010000000-0x00000000cfffffff]
>>>> [ 0.000000] Movable zone start for each node
>>>> [ 0.000000] Early memory node ranges
>>>> [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
>>>> [ 0.000000] node 0: [mem 0x0000000020000000-0x000000004fffffff]
>>>> [ 0.000000] node 0: [mem 0x0000000090000000-0x00000000cfffffff]
>>>> [ 0.000000] Initmem setup node 0 [mem
>>>> 0x0000000000000000-0x00000000cfffffff]
>>>> [ 0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000
>>>> alloc_node_mem_map.constprop.135+0x6c/0xc8
>>>> [ 0.000000] memblock_reserve: [0x01831400-0x032313ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>>>> [ 0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>>>> [ 0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] MEMBLOCK configuration:
>>>> [ 0.000000] memory size = 0x80000000 reserved size = 0x0322f032
>>>> [ 0.000000] memory.cnt = 0x3
>>>> [ 0.000000] memory[0x0] [0x00000000-0x0fffffff], 0x10000000
>>>> bytes flags: 0x0
>>>> [ 0.000000] memory[0x1] [0x20000000-0x4fffffff], 0x30000000
>>>> bytes flags: 0x0
>>>> [ 0.000000] memory[0x2] [0x90000000-0xcfffffff], 0x40000000
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved.cnt = 0xa
>>>> [ 0.000000] reserved[0x0] [0x00001000-0x00003aa0], 0x00002aa1
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x1] [0x00003aa4-0x0000ba64], 0x00007fc1
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x2] [0x0000ba80-0x0000ba9f], 0x00000020
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x3] [0x0000bb00-0x0000bb1f], 0x00000020
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x4] [0x0000bb80-0x0000bb9f], 0x00000020
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x5] [0x0000bc00-0x0000bc1f], 0x00000020
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x6] [0x0000bc80-0x0000bdff], 0x00000180
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x7] [0x0000c000-0x0000efff], 0x00003000
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x8] [0x00010000-0x018313cf], 0x018213d0
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x9] [0x01831400-0x032313ff], 0x01a00000
>>>> bytes flags: 0x0
>>>> [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
>>>> [ 0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
>>>> [ 0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
>>>> [ 0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
>>>> [ 0.000000] memblock_reserve: [0x03231400-0x032323ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
>>>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
>>>> [ 0.000000] memblock_reserve: [0x03233000-0x0327afff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_free: [0x03245000-0x03244fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [ 0.000000] memblock_free: [0x03257000-0x03256fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [ 0.000000] memblock_free: [0x03269000-0x03268fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [ 0.000000] memblock_free: [0x0327b000-0x0327afff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [ 0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
>>>> [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
>>>> [ 0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
>>>> [ 0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
>>>> [ 0.000000] memblock_reserve: [0x03232400-0x0323240f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
>>>> [ 0.000000] memblock_reserve: [0x03232480-0x0323248f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
>>>> [ 0.000000] memblock_reserve: [0x03232500-0x0323257f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
>>>> [ 0.000000] memblock_reserve: [0x03232580-0x032325db]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
>>>> [ 0.000000] memblock_reserve: [0x03232600-0x032328ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
>>>> [ 0.000000] memblock_reserve: [0x03232900-0x03232c03]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
>>>> [ 0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_free: [0x0000f000-0x0000ffff]
>>>> pcpu_embed_first_chunk+0x838/0x884
>>>> [ 0.000000] memblock_free: [0x03231400-0x032323ff]
>>>> pcpu_embed_first_chunk+0x850/0x884
>>>> [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 523776
>>>> [ 0.000000] Kernel command line: console=ttyS0,115200 earlycon
>>>> [ 0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>>>> [ 0.000000] memblock_reserve: [0x0327b000-0x0329afff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
>>>> bytes, linear)
>>>> [ 0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>>>> [ 0.000000] memblock_reserve: [0x0329b000-0x032aafff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
>>>> bytes, linear)
>>>
>>>> [ 0.000000] memblock_reserve: [0x00000000-0x000003ff]
>>>> trap_init+0x70/0x4e8
>>>
>>> Most likely someplace here the corruption has happened. The log above
>>> has just reserved a memory for NMI/reset vectors:
>>> arch/mips/kernel/traps.c: trap_init(void): Line 2373.
>>>
>>> But then the board_ebase_setup() pointer is dereferenced and called,
>>> which has been initialized with bmips_ebase_setup() earlier and which
>>> overwrites the ebase variable with: 0x80001000 as this is
>>> CPU_BMIPS5000 CPU. So any further calls of the functions like
>>> set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
>>> corruption of the memory above 0x80001000, which as we have discovered
>>> belongs to fdt and unflattened device tree.
>>>
>>>> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
>>>> [ 0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
>>>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
>>>> cma-reserved, 1835008K highmem)
>>>> [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>>>> [ 0.000000] rcu: Hierarchical RCU implementation.
>>>> [ 0.000000] rcu: RCU event tracing is enabled.
>>>> [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay
>>>> is 25 jiffies.
>>>> [ 0.000000] NR_IRQS: 256
>>>
>>>> [ 0.000000] OF: Bad cell count for /rdb
>>>> [ 0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
>>>> [ 0.000000] OF: of_irq_init: children remain, but no parents
>>>
>>> So here is the first time we have got the consequence of the corruption
>>> popped up. Luckily it's just the "Bad cells count" error. We could have
>>> got much less obvious log here up to getting a crash at some place
>>> further...
>>>
>>>> [ 0.000000] random: get_random_bytes called from
>>>> start_kernel+0x444/0x654 with crng_init=0
>>>> [ 0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
>>>> wraps every 8589934590000000ns
>>>
>>>>
>>>> and with your patch applied which unfortunately did not work we have the
>>>> following:
>>>>
>>>> [...]
>>>
>>> So a patch like this shall workaround the corruption:
>>>
>>> --- a/arch/mips/bmips/setup.c
>>> +++ b/arch/mips/bmips/setup.c
>>> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
>>>
>>>  	__dt_setup_arch(dtb);
>>>
>>> +	memblock_reserve(0x0, 0x1000 + 0x100*64);
>>> +
>>>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>>>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>>>  					     q->compatible)) {
>>
>
>> This patch works, thanks a lot for the troubleshooting and analysis! How
>> about the following which would be more generic and works as well and
>> should be more universal since it does not require each architecture to
>> provide an appropriate call to memblock_reserve():
>
> Hm, are you sure it's working?

I was, until I noticed that I was working on top of a revert of Roman's
patch. Sorry about the brain fart here.

> If so, my analysis hasn't been quite
> correct. My suggestion was based on the memory initializations,
> allocations and reservations trace.
> So here is the sequence of most
> crucial of them:
> 1) Memblock initialization:
>    start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch()
>    (At this point I suggested to place the exceptions memory
>    reservation.)
> 2) Base FDT memory reservation:
>    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self()
> 3) FDT "reserved-memory" nodes parsing and corresponding memory ranges
>    reservation:
>    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem()
> 4) Reserve kernel itself, some critical sections like initrd and
>    crash-kernel:
>    start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()...
> 5) Copy and unflatten the built-into the kernel device tree
>    (BMIPS-platform code):
>    start_kernel()->setup_arch()->arch_mem_init()->device_tree_init()
>    This is the very first time an allocation from the memblock pool
>    is performed. Since we haven't reserved a memory for the exception
>    vectors yet, the memblock allocator is free to return that memory
>    range for any other use. Needless to say if we try to use that memory
>    later without consulting with memblock, we may and in our case
>    will get into troubles.
> 6) Many random early memblock allocations for kernel use before
>    buddy and sl*b allocators are up and running...
>    Note if for some fortunate reason the allocations made in 5) didn't
>    overlap the exceptions memory, here we have much more chances to
>    do that with obviously fatal consequences of the ranges independent
>    usage.
> 7) Trap/exception vectors initialization and !memory reservation! for
>    them:
>    start_kernel()->trap_init()
>    Only at this point we get to reserve the memory for the vectors.
> 8) Init and run buddy/sl*b allocators:
>    start_kernel()->mm_init()->...mem_init()...
>
> There are a lot of allocations done in 5) and 6) before the
> trap_init() is called in 7). You can see that in your log. That's why
> I have doubts that your patch worked well.
> Most likely you've
> forgotten to revert the workaround suggested by me in the previous
> message. Could you make sure that you didn't and re-test your patch
> again? If it still works then I might have confused something and it's
> strange that my patch worked in the first place...

I would like to submit a fix for 5.12-rc1 and get it backported into 5.11
so that BMIPS machines boot again; it will essentially be your earlier
proposed fix. BMIPS is the only "legacy" MIPS platform that defines an
exception base, so while this problem may well exist on other platforms,
how likely is it to show up there?

>
> A food for thoughts for everyone (Thomas, Mark, please join the
> discussion). What we've got here is a bit bigger problem. AFAICS
> if bottom-up allocation is enabled (it's our case) memblock_find_in_range_node()
> performs the allocation above the very first PAGE_SIZE memory chunk
> (see that method code for details). So we are currently on a safe side
> for some older MIPS platforms. But the platform with VEIC/VINT may get
> into the same troubles here if they didn't reserve exception memory
> early enough before the kernel starts random allocations from
> memblock. So we either need to provide a generic workaround for that
> or make sure each platform gets to reserve vectors itself for instance
> in the plat_mem_setup() method.
>
> -Sergey
>
>>
>> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
>> index e0352958e2f7..b0a173b500e8 100644
>> --- a/arch/mips/kernel/traps.c
>> +++ b/arch/mips/kernel/traps.c
>> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
>>
>>  	if (!cpu_has_mips_r2_r6) {
>>  		ebase = CAC_BASE;
>> -		ebase_pa = virt_to_phys((void *)ebase);
>>  		vec_size = 0x400;
>> -
>> -		memblock_reserve(ebase_pa, vec_size);
>>  	} else {
>>  		if (cpu_has_veic || cpu_has_vint)
>>  			vec_size = 0x200 + VECTORSPACING*64;
>> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
>>
>>  	if (board_ebase_setup)
>>  		board_ebase_setup();
>> +
>> +	/* board_ebase_setup() can change the exception base address
>> +	 * reserve it now after changes were made.
>> +	 */
>> +	if (!cpu_has_mips_r2_r6) {
>> +		ebase_pa = virt_to_phys((void *)ebase);
>> +		memblock_reserve(ebase_pa, vec_size);
>> +	}
>>  	per_cpu_trap_init(true);
>>  	memblock_set_bottom_up(false);
>> -- 
>> Florian

-- 
Florian
smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 6A6C18D00D9; Mon, 1 Mar 2021 23:09:58 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6581D8D0063; Mon, 1 Mar 2021 23:09:58 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 51FD28D00D9; Mon, 1 Mar 2021 23:09:58 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0248.hostedemail.com [216.40.44.248]) by kanga.kvack.org (Postfix) with ESMTP id 3390E8D0063 for ; Mon, 1 Mar 2021 23:09:58 -0500 (EST) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id E69D18249980 for ; Tue, 2 Mar 2021 04:09:57 +0000 (UTC) X-FDA: 77873606034.12.591E9BA Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) by imf24.hostedemail.com (Postfix) with ESMTP id 909F7A0009C5 for ; Tue, 2 Mar 2021 04:09:56 +0000 (UTC) Received: by mail-pl1-f172.google.com with SMTP id z7so11251361plk.7 for ; Mon, 01 Mar 2021 20:09:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=1KJkvbOOudM1oxDYRZzEgcp/1QGBkgSxkkKaz88TmCI=; b=VOGoksLnOyaf4Fbk6JxAFjNeRv6dSQe08aCWtszgUntCI2V7M9Rta8HK0zf50awgbI N4xzTF/a4f/sT3QzNFdwr9oyskdFx4o6OMo/WEWMBBWbQcka8GmkteDT7K4e+YVHPLsE +tTjM/LJYhW1GoZ9JIqvsd3aAe54bXIzDjbfPijZ7pYFFmGfLIsnqIkHE4ixCBVPFbU/ Q59AdiKznnrCi75Vy5ddItEq95tJNb6Ex3wq5MHIR6PqqBGhWoWnhglX2b2qd/BZIY6+ WyhiC8xFuxSa7nE3gQMb/Se+lBG+15HhJgMMbU1TaNMm3kI/cxuHGx4m8vktqLYNp/n5 Sx1Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; 
bh=1KJkvbOOudM1oxDYRZzEgcp/1QGBkgSxkkKaz88TmCI=; b=kxGJJm74Ujs4oO7D/3BKVQz9x6gojmEdqatKcN7dvTEJ4RqYXXQFTTUZT01bRhAfj8 ZgjlvtCfzqGUC6517apit6J6Iw3S31mY2HhxqT/Pjc7pxm9DQm6pFWcF7h2x7UBpWipq K6tkUbvGR8Q/pEdZq6FB3jXoAorR4ckHpWa9bsegXDnP1JsK3YAGL64opJzW0f3eXMK1 Y9brxWrqniXwIimV7I0ZDxhih7nI06vsxZNKYlDXHi3Ac4CA+HZtOTeSUPP7hxgnEcDE nKBxJp2YRoAFw0J0eTpZzes2RHAmVZ6Yje9qxuRs5g761rk8tYg8ylHaWq8SJHH6cf1U mvUw== X-Gm-Message-State: AOAM531Gp60E84001HnKRZiqj0QviiomTT2Xll6/yG6xHfzyrKfSZFe/ O2UzHYoaFVjNWyBDbUp4V2c2cW85FU0= X-Google-Smtp-Source: ABdhPJw6HCrQT4PLkbPsoyAFiC9U9w6khMKNsepyF4oRxcM2hqrmrvf1O39SjIDs9kTNC9UUj+Z9Gg== X-Received: by 2002:a17:90a:9f4a:: with SMTP id q10mr2337168pjv.129.1614658195890; Mon, 01 Mar 2021 20:09:55 -0800 (PST) Received: from [10.230.29.30] ([192.19.223.252]) by smtp.gmail.com with ESMTPSA id l15sm1102768pjq.9.2021.03.01.20.09.53 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 01 Mar 2021 20:09:55 -0800 (PST) Subject: Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end To: Serge Semin , Mike Rapoport , Thomas Bogendoerfer Cc: Serge Semin , Roman Gushchin , Andrew Morton , linux-mm@kvack.org, Kamal Dasu , Paul Cercueil , Jiaxun Yang , iamjoonsoo.kim@lge.com, riel@surriel.com, Michal Hocko , linux-kernel@vger.kernel.org, kernel-team@fb.com, "open list:BROADCOM BMIPS MIPS ARCHITECTURE" References: <20201217201214.3414100-1-guro@fb.com> <20201217201214.3414100-2-guro@fb.com> <23fc1ef9-7342-8bc2-d184-d898107c52b2@gmail.com> <20210228090041.GO1447004@kernel.org> <8cbafe95-0f8c-a9b7-2dc9-cded846622fd@gmail.com> <20210228230811.wdae7oaaf3mbpgwl@mobilestation> <2e973fa8-5f2b-6840-0874-9c15fa0ebea0@gmail.com> <20210301092241.i7dxo7zbg3ar55d6@mobilestation> From: Florian Fainelli Message-ID: <97600bf8-06fd-3d76-8791-c2e3c4eae8a1@gmail.com> Date: Mon, 1 Mar 2021 20:09:52 -0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0 Thunderbird/78.8.0 MIME-Version: 1.0 In-Reply-To: 
<20210301092241.i7dxo7zbg3ar55d6@mobilestation> Content-Type: text/plain; charset=utf-8 Content-Language: en-US X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 909F7A0009C5 X-Stat-Signature: waqcqdejy4aqmgsnnfag5pcfcj3myjgc Received-SPF: none (gmail.com>: No applicable sender policy available) receiver=imf24; identity=mailfrom; envelope-from=""; helo=mail-pl1-f172.google.com; client-ip=209.85.214.172 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1614658196-803867 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 3/1/2021 1:22 AM, Serge Semin wrote: > On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote: >> Hi Serge, >> >> On 2/28/2021 3:08 PM, Serge Semin wrote: >>> Hi folks, >>> What you've got here seems a more complicated problem than it >>> could originally look like. Please, see my comments below. >>> >>> (Note I've discarded some of the email logs, which of no interest >>> to the discovered problem. Please also note that I haven't got any >>> Broadcom hardware to test out a solution suggested below.) >>> >>> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote: >>>> Hi Mike, >>>> >>>> On 2/28/2021 1:00 AM, Mike Rapoport wrote: >>>>> Hi Florian, >>>>> >>>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote: >>>>>> >>> >>>>>> [...] >>> >>>>>> >>>>>> Hi Roman, Thomas and other linux-mips folks, >>>>>> >>>>>> Kamal and myself have been unable to boot v5.11 on MIPS since this >>>>>> commit, reverting it makes our MIPS platforms boot successfully. W= e do >>>>>> not see a warning like this one in the commit message, instead wha= t >>>>>> happens appear to be a corrupted Device Tree which prevents the pa= rsing >>>>>> of the "rdb" node and leading to the interrupt controllers not bei= ng >>>>>> registered, and the system eventually not booting. 
>>>>>>
>>>>>> The Device Tree is built into the kernel image and resides at
>>>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
>>>>>>
>>>>>> Do you have any idea what could be wrong with MIPS specifically here?
>>>
>>> Most likely the problem you've discovered has been there for quite
>>> some time. The patch you are referring to just caused it to be
>>> triggered by extending the early allocation range. See, before that
>>> patch was accepted, the early memory allocations had been performed
>>> in the range:
>>> [kernel_end, RAM_END].
>>> The patch changed that, so the early allocations are done within
>>> [RAM_START + PAGE_SIZE, RAM_END].
>>>
>>> In normal situations it's safe to do that as long as all the critical
>>> memory regions (including the memory residing in the space below the
>>> kernel) have been reserved. But as soon as memory holding some critical
>>> structures hasn't been reserved, the kernel may allocate it, for
>>> instance for early initializations, with obviously unpredictable but
>>> most of the time unpleasant consequences.
>>>
>>>>>
>>>>> Apparently there is a memblock allocation in one of the functions called
>>>>> from arch_mem_init() between plat_mem_setup() and
>>>>> early_init_fdt_reserve_self().
>>>
>>> Mike, alas, according to the log provided by Florian that's not the cause
>>> of the problem. Please see my considerations below.
>>>
>>>> [...]
>>>>
>>>> [ 0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
>>>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
>>>> Feb 28 10:01:50 PST 2021
>>>> [ 0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
>>>> [ 0.000000] FPU revision is: 00130001
>>>
>>>> [ 0.000000] memblock_add: [0x00000000-0x0fffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>> [ 0.000000] memblock_add: [0x20000000-0x4fffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>> [ 0.000000] memblock_add: [0x90000000-0xcfffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>
>>> Here the memory has been added to the memblock allocator.
>>>
>>>> [ 0.000000] MIPS: machine is Broadcom BCM97435SVMB
>>>> [ 0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
>>>> [ 0.000000] printk: bootconsole [ns16550a0] enabled
>>>
>>>> [ 0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
>>>> setup_arch+0x128/0x69c
>>>
>>> Here the fdt memory has been reserved. (Note it's built into the
>>> kernel.)
>>>
>>>> [ 0.000000] memblock_reserve: [0x00010000-0x018313cf]
>>>> setup_arch+0x1f8/0x69c
>>>
>>> Here the kernel itself together with the built-in dtb has been reserved.
>>> So far so good.
>>>
>>>> [ 0.000000] Initrd not found or empty - disabling initrd
>>>
>>>> [ 0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [ 0.000000] memblock_reserve: [0x00001000-0x00003aa0]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [ 0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>
>>> The log above most likely belongs to the call-chain:
>>> setup_arch()
>>> +-> arch_mem_init()
>>>     +-> device_tree_init() - BMIPS specific method
>>>         +-> unflatten_and_copy_device_tree()
>>>
>>> So to speak, here we've copied the fdt from the original space
>>> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
>>> it to [0x00003aa4-0x0000ba4b].
>>>
>>> The problem is that a bit later the next call-chain is performed:
>>> setup_arch()
>>> +-> plat_smp_setup()
>>>     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
>>>         +-> if (!board_ebase_setup)
>>>                 board_ebase_setup = &bmips_ebase_setup;
>>>
>>> So at the moment of the CPU traps initialization the bmips_ebase_setup()
>>> method is called. What trap_init() does isn't compatible with the
>>> allocation performed by the unflatten_and_copy_device_tree() method.
>>> See the next comment.
>>>
>>>> [ 0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [ 0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_reserve: [0x0096a000-0x00969fff]
>>>> setup_arch+0x3fc/0x69c
>>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [ 0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [ 0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [ 0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
>>>> bytes.
>>>> [ 0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
>>>> linesize 32 bytes
>>>> [ 0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
>>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [ 0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [ 0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [ 0.000000] memblock_reserve: [0x0000e000-0x0000efff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] Zone ranges:
>>>> [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
>>>> [ 0.000000] HighMem [mem 0x0000000010000000-0x00000000cfffffff]
>>>> [ 0.000000] Movable zone start for each node
>>>> [ 0.000000] Early memory node ranges
>>>> [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
>>>> [ 0.000000] node 0: [mem 0x0000000020000000-0x000000004fffffff]
>>>> [ 0.000000] node 0: [mem 0x0000000090000000-0x00000000cfffffff]
>>>> [ 0.000000] Initmem setup node 0 [mem
>>>> 0x0000000000000000-0x00000000cfffffff]
>>>> [ 0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000
>>>> alloc_node_mem_map.constprop.135+0x6c/0xc8
>>>> [ 0.000000] memblock_reserve: [0x01831400-0x032313ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>>>> [ 0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>>>> [ 0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] MEMBLOCK configuration:
>>>> [ 0.000000] memory size = 0x80000000 reserved size = 0x0322f032
>>>> [ 0.000000] memory.cnt = 0x3
>>>> [ 0.000000] memory[0x0] [0x00000000-0x0fffffff], 0x10000000
>>>> bytes flags: 0x0
>>>> [ 0.000000] memory[0x1] [0x20000000-0x4fffffff], 0x30000000
>>>> bytes flags: 0x0
>>>> [ 0.000000] memory[0x2] [0x90000000-0xcfffffff], 0x40000000
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved.cnt = 0xa
>>>> [ 0.000000] reserved[0x0] [0x00001000-0x00003aa0], 0x00002aa1
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x1] [0x00003aa4-0x0000ba64], 0x00007fc1
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x2] [0x0000ba80-0x0000ba9f], 0x00000020
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x3] [0x0000bb00-0x0000bb1f], 0x00000020
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x4] [0x0000bb80-0x0000bb9f], 0x00000020
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x5] [0x0000bc00-0x0000bc1f], 0x00000020
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x6] [0x0000bc80-0x0000bdff], 0x00000180
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x7] [0x0000c000-0x0000efff], 0x00003000
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x8] [0x00010000-0x018313cf], 0x018213d0
>>>> bytes flags: 0x0
>>>> [ 0.000000] reserved[0x9] [0x01831400-0x032313ff], 0x01a00000
>>>> bytes flags: 0x0
>>>> [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
>>>> [ 0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
>>>> [ 0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
>>>> [ 0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
>>>> [ 0.000000] memblock_reserve: [0x03231400-0x032323ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
>>>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
>>>> [ 0.000000] memblock_reserve: [0x03233000-0x0327afff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_free: [0x03245000-0x03244fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [ 0.000000] memblock_free: [0x03257000-0x03256fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [ 0.000000] memblock_free: [0x03269000-0x03268fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [ 0.000000] memblock_free: [0x0327b000-0x0327afff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [ 0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
>>>> [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
>>>> [ 0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
>>>> [ 0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
>>>> [ 0.000000] memblock_reserve: [0x03232400-0x0323240f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
>>>> [ 0.000000] memblock_reserve: [0x03232480-0x0323248f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
>>>> [ 0.000000] memblock_reserve: [0x03232500-0x0323257f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
>>>> [ 0.000000] memblock_reserve: [0x03232580-0x032325db]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
>>>> [ 0.000000] memblock_reserve: [0x03232600-0x032328ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
>>>> [ 0.000000] memblock_reserve: [0x03232900-0x03232c03]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
>>>> [ 0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] memblock_free: [0x0000f000-0x0000ffff]
>>>> pcpu_embed_first_chunk+0x838/0x884
>>>> [ 0.000000] memblock_free: [0x03231400-0x032323ff]
>>>> pcpu_embed_first_chunk+0x850/0x884
>>>> [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 523776
>>>> [ 0.000000] Kernel command line: console=ttyS0,115200 earlycon
>>>> [ 0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>>>> [ 0.000000] memblock_reserve: [0x0327b000-0x0329afff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
>>>> bytes, linear)
>>>> [ 0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>>>> [ 0.000000] memblock_reserve: [0x0329b000-0x032aafff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
>>>> bytes, linear)
>>>
>>>> [ 0.000000] memblock_reserve: [0x00000000-0x000003ff]
>>>> trap_init+0x70/0x4e8
>>>
>>> Most likely the corruption has happened somewhere around here. The log
>>> above has just reserved memory for the NMI/reset vectors:
>>> arch/mips/kernel/traps.c: trap_init(void): Line 2373.
>>>
>>> But then the board_ebase_setup() pointer is dereferenced and called,
>>> which has been initialized with bmips_ebase_setup() earlier and which
>>> overwrites the ebase variable with 0x80001000, as this is a
>>> CPU_BMIPS5000 CPU. So any further calls of functions like
>>> set_handler()/set_except_vector()/set_vi_srs_handler()/etc may corrupt
>>> the memory above 0x80001000, which, as we have discovered, belongs to
>>> the fdt and the unflattened device tree.
>>>
>>>> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
>>>> [ 0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
>>>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
>>>> cma-reserved, 1835008K highmem)
>>>> [ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>>>> [ 0.000000] rcu: Hierarchical RCU implementation.
>>>> [ 0.000000] rcu: RCU event tracing is enabled.
>>>> [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay
>>>> is 25 jiffies.
>>>> [ 0.000000] NR_IRQS: 256
>>>
>>>> [ 0.000000] OF: Bad cell count for /rdb
>>>> [ 0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
>>>> [ 0.000000] OF: of_irq_init: children remain, but no parents
>>>
>>> So here is the first time the consequence of the corruption pops
>>> up. Luckily it's just the "Bad cell count" error. We could have got
>>> a much less obvious log here, up to a crash somewhere further
>>> on...
>>>
>>>> [ 0.000000] random: get_random_bytes called from
>>>> start_kernel+0x444/0x654 with crng_init=0
>>>> [ 0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
>>>> wraps every 8589934590000000ns
>>>
>>>>
>>>> and with your patch applied, which unfortunately did not work, we have the
>>>> following:
>>>>
>>>> [...]
>>>
>>> So a patch like this should work around the corruption:
>>>
>>> --- a/arch/mips/bmips/setup.c
>>> +++ b/arch/mips/bmips/setup.c
>>> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
>>>
>>>  	__dt_setup_arch(dtb);
>>>
>>> +	memblock_reserve(0x0, 0x1000 + 0x100*64);
>>> +
>>>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>>>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>>>  					     q->compatible)) {
>>
>
>> This patch works, thanks a lot for the troubleshooting and analysis! How
>> about the following, which would be more generic, works as well, and
>> does not require each architecture to provide an appropriate call to
>> memblock_reserve():
>
> Hm, are you sure it's working?

I was, until I noticed that I was working on top of a revert of Roman's
patch; sorry about the brain fart here.

> If so, my analysis hasn't been quite
> correct. My suggestion was based on the memory initialization,
> allocation and reservation trace. So here is the sequence of the most
> crucial of them:
> 1) Memblock initialization:
>    start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch()
>    (This is where I suggested placing the exception memory
>    reservation.)
> 2) Base FDT memory reservation:
>    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self()
> 3) FDT "reserved-memory" nodes parsing and corresponding memory ranges
>    reservation:
>    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem()
> 4) Reserve the kernel itself and some critical sections like initrd and
>    crash-kernel:
>    start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()...
> 5) Copy and unflatten the device tree built into the kernel
>    (BMIPS platform code):
>    start_kernel()->setup_arch()->arch_mem_init()->device_tree_init()
>    This is the very first time an allocation from the memblock pool
>    is performed. Since we haven't reserved memory for the exception
>    vectors yet, the memblock allocator is free to return that memory
>    range for any other use. Needless to say, if we try to use that memory
>    later without consulting memblock, we may, and in our case
>    will, get into trouble.
> 6) Many random early memblock allocations for kernel use before the
>    buddy and sl*b allocators are up and running...
>    Note that even if, for some fortunate reason, the allocations made
>    in 5) didn't overlap the exception memory, here the chances of that
>    are much higher, with obviously fatal consequences once two
>    independent users share the same range.
> 7) Trap/exception vectors initialization and !memory reservation! for
>    them:
>    start_kernel()->trap_init()
>    Only at this point do we get to reserve the memory for the vectors.
> 8) Init and run buddy/sl*b allocators:
>    start_kernel()->mm_init()->...mem_init()...
>
> There are a lot of allocations done in 5) and 6) before
> trap_init() is called in 7). You can see that in your log. That's why
> I have doubts that your patch worked well. Most likely you've
> forgotten to revert the workaround I suggested in the previous
> message. Could you make sure that you didn't and re-test your
> patch? If it still works, then I might have confused something and it's
> strange that my patch worked in the first place...

I would like to submit a fix for 5.12-rc1 and get it backported into
5.11 so that BMIPS machines boot again; it will essentially be your
earlier proposed fix. BMIPS is the only "legacy" MIPS platform that
defines an exception base, so while this problem may certainly exist
on other platforms, I do wonder how likely it is there?

>
> Food for thought for everyone (Thomas, Mark, please join the
> discussion). What we've got here is a somewhat bigger problem. AFAICS,
> if bottom-up allocation is enabled (which is our case), memblock_find_in_range_node()
> performs the allocation above the very first PAGE_SIZE memory chunk
> (see that method's code for details). So we are currently on the safe side
> for some older MIPS platforms. But platforms with VEIC/VINT may get
> into the same trouble if they don't reserve the exception memory
> early enough, before the kernel starts random allocations from
> memblock. So we either need to provide a generic workaround for that,
> or make sure each platform reserves its vectors itself, for instance
> in the plat_mem_setup() method.
>
> -Sergey
>
>>
>> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
>> index e0352958e2f7..b0a173b500e8 100644
>> --- a/arch/mips/kernel/traps.c
>> +++ b/arch/mips/kernel/traps.c
>> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
>>
>>  	if (!cpu_has_mips_r2_r6) {
>>  		ebase = CAC_BASE;
>> -		ebase_pa = virt_to_phys((void *)ebase);
>>  		vec_size = 0x400;
>> -
>> -		memblock_reserve(ebase_pa, vec_size);
>>  	} else {
>>  		if (cpu_has_veic || cpu_has_vint)
>>  			vec_size = 0x200 + VECTORSPACING*64;
>> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
>>
>>  	if (board_ebase_setup)
>>  		board_ebase_setup();
>> +
>> +	/* board_ebase_setup() can change the exception base address
>> +	 * reserve it now after changes were made.
>> +	 */
>> +	if (!cpu_has_mips_r2_r6) {
>> +		ebase_pa = virt_to_phys((void *)ebase);
>> +		memblock_reserve(ebase_pa, vec_size);
>> +	}
>>  	per_cpu_trap_init(true);
>>  	memblock_set_bottom_up(false);
>> -- 
>> Florian

-- 
Florian