All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@kernel.org>
To: Florian Fainelli <f.fainelli@gmail.com>
Cc: Serge Semin <Sergey.Semin@baikalelectronics.ru>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Serge Semin <fancer.lancer@gmail.com>,
	Roman Gushchin <guro@fb.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Kamal Dasu <kdasu.kdev@gmail.com>,
	Paul Cercueil <paul@crapouillou.net>,
	Jiaxun Yang <jiaxun.yang@flygoat.com>,
	iamjoonsoo.kim@lge.com, riel@surriel.com,
	Michal Hocko <mhocko@kernel.org>,
	linux-kernel@vger.kernel.org, kernel-team@fb.com,
	"open list:BROADCOM BMIPS MIPS ARCHITECTURE" 
	<linux-mips@vger.kernel.org>
Subject: Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
Date: Mon, 1 Mar 2021 11:45:42 +0200	[thread overview]
Message-ID: <YDy3xo3bMCqFtDhI@kernel.org> (raw)
In-Reply-To: <2e973fa8-5f2b-6840-0874-9c15fa0ebea0@gmail.com>

On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> Hi Serge,
> 
> On 2/28/2021 3:08 PM, Serge Semin wrote:
> > Hi folks,
> > What you've got here seems a more complicated problem than it
> > could originally look like. Please, see my comments below.
> > 
> > (Note I've discarded some of the email logs, which of no interest
> > to the discovered problem. Please also note that I haven't got any
> > Broadcom hardware to test out a solution suggested below.)
> > 
> > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> >> Hi Mike,
> >>
> >> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> >>> Hi Florian,
> >>>
> >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>>>
> > 
> >>>> [...]
> > 
> >>>>
> >>>> Hi Roman, Thomas and other linux-mips folks,
> >>>>
> >>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
> >>>> commit, reverting it makes our MIPS platforms boot successfully. We do
> >>>> not see a warning like this one in the commit message, instead what
> >>>> happens appear to be a corrupted Device Tree which prevents the parsing
> >>>> of the "rdb" node and leading to the interrupt controllers not being
> >>>> registered, and the system eventually not booting.
> >>>>
> >>>> The Device Tree is built-into the kernel image and resides at
> >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>>>
> >>>> Do you have any idea what could be wrong with MIPS specifically here?
> > 
> > Most likely the problem you've discovered has been there for quite
> > some time. The patch you are referring to just caused it to be
> > triggered by extending the early allocation range. See before that
> > patch was accepted the early memory allocations had been performed
> > in the range:
> > [kernel_end, RAM_END].
> > The patch changed that, so the early allocations are done within
> > [RAM_START + PAGE_SIZE, RAM_END].
> > 
> > In normal situations it's safe to do that as long as all the critical
> > memory regions (including the memory residing a space below the
> > kernel) have been reserved. But as soon as a memory with some critical
> > structures haven't been reserved, the kernel may allocate it to be used
> > for instance for early initializations with obviously unpredictable but
> > most of the times unpleasant consequences.
> > 
> >>>
> >>> Apparently there is a memblock allocation in one of the functions called
> >>> from arch_mem_init() between plat_mem_setup() and
> >>> early_init_fdt_reserve_self().
> > 
> > Mike, alas according to the log provided by Florian that's not the reason
> > of the problem. Please, see my considerations below.
> > 
> >> [...]
> >>
> >> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> >> Feb 28 10:01:50 PST 2021
> >> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> >> [    0.000000] FPU revision is: 00130001
> > 
> >> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> > 
> > Here the memory has been added to the memblock allocator.
> > 
> >> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> >> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> >> [    0.000000] printk: bootconsole [ns16550a0] enabled
> > 
> >> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> >> setup_arch+0x128/0x69c
> > 
> > Here the fdt memory has been reserved. (Note it's built into the
> > kernel.)
> > 
> >> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> >> setup_arch+0x1f8/0x69c
> > 
> > Here the kernel itself together with built-in dtb have been reserved.
> > So far so good.
> > 
> >> [    0.000000] Initrd not found or empty - disabling initrd
> > 
> >> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> >> memblock_alloc_range_nid+0xf8/0x198
> > 
> > The log above most likely belongs to the call-chain:
> > setup_arch()
> > +-> arch_mem_init()
> >     +-> device_tree_init() - BMIPS specific method
> >         +-> unflatten_and_copy_device_tree()
> > 
> > So to speak here we've copied the fdt from the original space
> > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> > it to [0x00003aa4-0x0000ba4b].
> > 
> > The problem is that a bit later the next call-chain is performed:
> > setup_arch()
> > +-> plat_smp_setup()
> >     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> >         +-> if (!board_ebase_setup)
> >                  board_ebase_setup = &bmips_ebase_setup;
> > 
> > So at the moment of the CPU traps initialization the bmips_ebase_setup()
> > method is called. What trap_init() does isn't compatible with the
> > allocation performed by the unflatten_and_copy_device_tree() method.
> > See the next comment.
> > 
> >> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84

...

> >> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> >> bytes, linear)
> > 
> >> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> >> trap_init+0x70/0x4e8
> > 
> > Most likely someplace here the corruption has happened. The log above
> > has just reserved a memory for NMI/reset vectors:
> > arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> > 
> > But then the board_ebase_setup() pointer is dereferenced and called,
> > which has been initialized with bmips_ebase_setup() earlier and which
> > overwrites the ebase variable with: 0x80001000 as this is
> > CPU_BMIPS5000 CPU. So any further calls of the functions like
> > set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> > corruption of the memory above 0x80001000, which as we have discovered
> > belongs to fdt and unflattened device tree.
> > 
> >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> >> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> >> cma-reserved, 1835008K highmem)
> >> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >> [    0.000000] rcu: Hierarchical RCU implementation.
> >> [    0.000000] rcu:     RCU event tracing is enabled.
> >> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> >> is 25 jiffies.
> >> [    0.000000] NR_IRQS: 256
> > 
> >> [    0.000000] OF: Bad cell count for /rdb
> >> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> >> [    0.000000] OF: of_irq_init: children remain, but no parents
> > 
> > So here is the first time we have got the consequence of the corruption
> > popped up. Luckily it's just the "Bad cells count" error. We could have
> > got much less obvious log here up to getting a crash at some place
> > further...
> > 
> >> [    0.000000] random: get_random_bytes called from
> >> start_kernel+0x444/0x654 with crng_init=0
> >> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> >> wraps every 8589934590000000ns
> > 
> >>
> >> and with your patch applied which unfortunately did not work we have the
> >> following:
> >>
> >> [...]
> > 
> > So a patch like this shall workaround the corruption:
> > 
> > --- a/arch/mips/bmips/setup.c
> > +++ b/arch/mips/bmips/setup.c
> > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> >  
> >  	__dt_setup_arch(dtb);
> >  
> > +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> > +
> >  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> >  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> >  					     q->compatible)) {
> 
> This patch works, thanks a lot for the troubleshooting and analysis! How
> about the following which would be more generic and works as well and
> should be more universal since it does not require each architecture to
> provide an appropriate call to memblock_reserve():
> 
> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> index e0352958e2f7..b0a173b500e8 100644
> --- a/arch/mips/kernel/traps.c
> +++ b/arch/mips/kernel/traps.c
> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> 
>         if (!cpu_has_mips_r2_r6) {
>                 ebase = CAC_BASE;
> -               ebase_pa = virt_to_phys((void *)ebase);
>                 vec_size = 0x400;
> -
> -               memblock_reserve(ebase_pa, vec_size);
>         } else {
>                 if (cpu_has_veic || cpu_has_vint)
>                         vec_size = 0x200 + VECTORSPACING*64;
> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> 
>         if (board_ebase_setup)
>                 board_ebase_setup();
> +
> +       /* board_ebase_setup() can change the exception base address
> +        * reserve it now after changes were made.
> +        */
> +       if (!cpu_has_mips_r2_r6) {
> +               ebase_pa = virt_to_phys((void *)ebase);
> +               memblock_reserve(ebase_pa, vec_size);
> +       }

With this it's still possible to have memblock allocations around ebase_pa
before it is reserved.

I think we have two options here to solve it in more or less generic way:

* split the reservation of ebase from traps_init() and move it earlier to
setup_arch(). I didn't check what board_ebase_setup() do, if they need to
allocate memory it would not work.

* add an API to memblock to set lower limit for allocations and then set
the lower limit, to e.g. kernel load address in arch_mem_init(). This may
add complexity for configurations with relocatable kernel and kaslr.

>         per_cpu_trap_init(true);
>         memblock_set_bottom_up(false);

-- 
Sincerely yours,
Mike.

  parent reply	other threads:[~2021-03-01  9:52 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-17 20:12 [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up Roman Gushchin
2020-12-17 20:12 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Roman Gushchin
2020-12-19 14:52   ` Wonhyuk Yang
2020-12-19 14:52     ` Wonhyuk Yang
2020-12-19 17:05     ` Roman Gushchin
2020-12-20  6:49   ` Mike Rapoport
2021-01-22  4:37     ` Thiago Jung Bauermann
2021-01-22  4:37       ` Thiago Jung Bauermann
2021-01-24  2:09       ` Andrew Morton
2021-01-24  2:09         ` Andrew Morton
2021-01-24  7:34         ` Mike Rapoport
2021-01-24  7:34           ` Mike Rapoport
2021-01-26  0:30           ` Thiago Jung Bauermann
2021-01-26  0:30             ` Thiago Jung Bauermann
2021-02-08 23:58           ` Thiago Jung Bauermann
2021-02-08 23:58             ` Thiago Jung Bauermann
2021-02-28  4:18   ` Florian Fainelli
2021-02-28  9:00     ` Mike Rapoport
2021-02-28 18:19       ` Florian Fainelli
2021-02-28 23:08         ` Serge Semin
2021-03-01  3:50           ` Florian Fainelli
2021-03-01  3:50             ` Florian Fainelli
2021-03-01  9:22             ` Serge Semin
2021-03-01  9:22               ` Serge Semin
2021-03-02  4:09               ` Florian Fainelli
2021-03-02  4:09                 ` Florian Fainelli
2021-03-02 13:26                 ` Serge Semin
2021-03-02  4:19               ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli
2021-03-02  8:09                 ` Mike Rapoport
2021-03-02 13:54                 ` Serge Semin
2021-03-02 19:04                 ` Roman Gushchin
2021-03-02 23:54                 ` Thomas Bogendoerfer
2021-03-03  1:30                   ` Florian Fainelli
2021-03-03  9:41                     ` Thomas Bogendoerfer
2021-03-03 17:45                       ` Maciej W. Rozycki
2021-03-03 18:15                         ` Thomas Bogendoerfer
2021-03-03 21:50                           ` Maciej W. Rozycki
2021-03-01  9:45             ` Mike Rapoport [this message]
2021-03-01  9:45               ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Mike Rapoport
2021-03-02  3:55               ` Roman Gushchin
2021-03-02  3:55                 ` Roman Gushchin
2021-03-02 13:08                 ` Serge Semin
2021-03-23 18:19   ` [tip: x86/boot] x86/setup: Consolidate early memory reservations tip-bot2 for Mike Rapoport
2020-12-20  6:48 ` [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up Mike Rapoport
2020-12-21 17:05   ` Roman Gushchin
2020-12-23  4:06     ` Andrew Morton
2020-12-23 16:35       ` Roman Gushchin
2020-12-23 22:10         ` Mike Rapoport
2020-12-28 19:36           ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YDy3xo3bMCqFtDhI@kernel.org \
    --to=rppt@kernel.org \
    --cc=Sergey.Semin@baikalelectronics.ru \
    --cc=akpm@linux-foundation.org \
    --cc=f.fainelli@gmail.com \
    --cc=fancer.lancer@gmail.com \
    --cc=guro@fb.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jiaxun.yang@flygoat.com \
    --cc=kdasu.kdev@gmail.com \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=paul@crapouillou.net \
    --cc=riel@surriel.com \
    --cc=tsbogend@alpha.franken.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.