linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
To: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: catalin.marinas@arm.com, labbott@redhat.com,
	linux-arm-kernel@lists.infradead.org, ard.biesheuvel@linaro.org
Subject: Re: Memory size and section boundary on armv7
Date: Fri, 12 Apr 2019 13:10:13 +0300	[thread overview]
Message-ID: <20190412101013.GA3772@apalos> (raw)
In-Reply-To: <20190412084049.k34n22poz2vz7kn7@shell.armlinux.org.uk>

Hi Russell,

> > > 
> > > arch/arm/include/asm/pgtable-2level.h documents how we fit the
> > > hardware's page table implementation to the kernel's page table
> > > requirements.  The key thing to note is that, because of the mismatch
> > > between the kernel and the hardware, a Linux page table occupies two
> > > physical level 1 entries, whereas a section occupies one level 1
> > > entry.
> > > 
> > Yes i read that. Maybe i'll get a better hang of it after more reads
> > 
> > > Hence, if there is a section mapping in the even-numbered level 1
> > > entry, and we then get a request via alloc_init_pte() to create a
> > > page table, we can't without destroying the mapping that is already
> > > present.
> > > 
> > This is exactly what's happening.
> > [    0.000000] __map_init_section addr: c0200000, next c0300000, phys c0200000 len 100000
> > [    0.000000] __map_init_section addr: c0300000, next c0400000, phys c0300000 len 100000 
> > 
> > These are 2 subsequent section mappings that work. If there's any memory marked
> > as MEMBLOCK_NOMAP in the second section though the code path is
> > alloc_init_pte(). Should't we just skip that section overall ?
> 
> We can't skip the section mapping - if we skip the first section,
> when the kerenl then tries to access memory in that section, the
> kernel will oops.  We also have no prior knowledge in the mapping
> functions that this situation will occur.
Thanks for the explanation. 

> >
> > I am not sure i am following here. Since we can detect that why allow
> > firmware implementations crash the kernel? (as long as the memory
> > mappings are not wrong or overlapping)
> 
> It evolution.  These firmware implementations have come along _way_
> after the ARM mapping code was written, and they construct various
> situations that are difficult to handle - and given the further
> evolution of the kernel with facilities such as the NX protections
> (which rely on eg, the kernel text, being aligned to section
> boundaries so that different permissions can be given to each section,
> it is impossible to see any way to solve this problem at the mapping
> code level.
Can we at least we can print a warning saying "What you are trying to do is not
good enough, please re-check the mappings" or something like that, 
to help people avoid that in the future.
The BUG_ON is supposed to do that, but for some reason i can't see it on my
console, any ideas why?

> > All my tests were with memblock and efi debug enabled. I just left them out, my
> > bad.
> > 
> > Here's the tests and working vs non working use-cases
> > On all cases FDT size = 0xc000
> > 
> > ## fdt memory marked as MEMBLOCK_NOMAP. FDT at c7f00000 ##
> > 
> > [    0.000000] MEMBLOCK configuration:
> > [    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
> > [    0.000000]  memory.cnt  = 0x1
> > [    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
> > [    0.000000]  reserved.cnt  = 0x1
> > [    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
> > [    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
> > [    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc2861000-0xc7cfffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7d00000-0xc7efffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7f00000-0xc7f0bfff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7f0c000-0xdc705fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
> > [    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
> > [    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
> > [    0.000000] memblock_reserve: [0xc7d00000-0xc7d0896c] arm_memblock_init+0x11c/0x168
> > [    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
> > [    0.000000] cma: Reserved 64 MiB at 0xd8000000
> > [    0.000000] MEMBLOCK configuration:
> > [    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
> > [    0.000000]  memory.cnt  = 0x7
> > [    0.000000]  memory[0x0]     [0xc0000000-0xc7efffff], 0x07f00000 bytes flags: 0x0
> > [    0.000000]  memory[0x1]     [0xc7f00000-0xc7f0bfff], 0x0000c000 bytes flags: 0x4
> > [    0.000000]  memory[0x2]     [0xc7f0c000-0xdcf6afff], 0x1505f000 bytes flags: 0x0
> > [    0.000000]  memory[0x3]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
> > [    0.000000]  memory[0x4]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
> > [    0.000000]  memory[0x5]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
> > [    0.000000]  memory[0x6]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
> > [    0.000000]  reserved.cnt  = 0x5
> > [    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
> > [    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
> > [    0.000000]  reserved[0x2]   [0xc7d00000-0xc7d0896c], 0x0000896d bytes flags: 0x0
> > [    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
> > [    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0
> > 
> > # Extra debug
> > [    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000 c7c1141e
> > [    0.000000] __map_init_section addr: c7e00000, next c7f00000, phys c7e00000 len 100000 c7e1141e
> > [    0.000000] alloc_init_pte addr: c7f0c000, next c8000000, len f4000 pmd: c7e1141e
> > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc3-29427-g769f1f8f9b56-dirty #152
> > [    0.000000] Hardware name: STM32 (Device Tree Support)
> > [    0.000000] [<c03123ec>] (unwind_backtrace) from [<c030ce0c>] (show_stack+0x10/0x14)
> > [    0.000000] [<c030ce0c>] (show_stack) from [<c0e27250>] (dump_stack+0x8c/0xa0)
> > [    0.000000] [<c0e27250>] (dump_stack) from [<c1507e80>] (arm_pte_alloc+0x58/0x80)
> > [    0.000000] [<c1507e80>] (arm_pte_alloc) from [<c150819c>] (__create_mapping+0x2f4/0x34c)
> > [    0.000000] [<c150819c>] (__create_mapping) from [<c15089a0>] (paging_init+0x234/0x648)
> > [    0.000000] [<c15089a0>] (paging_init) from [<c1504950>] (setup_arch+0x660/0xcac)
> > [    0.000000] [<c1504950>] (setup_arch) from [<c1500a4c>] (start_kernel+0x70/0x458)
> > [    0.000000] [<c1500a4c>] (start_kernel) from [<00000000>] (  (null))
> > 
> > So the kernel correctly skips the 0xc000 that u-boot marked as NOMAP,
> > but then crashes.
> 
> If that is the FDT, why is it being marked NOMAP?  The kernel needs it
> to be mapped (but marked reserved in memblock) so that it can access
> it.  When booting normally with u-boot, FDT is not marked NOMAP.
The u-boot code marks it as EFI_RUNTIME_SERVICES_DATA, so they get flagged as
NOMAP. There's a parallel u-boot discussion on this trying to figure out if
that's correct

> 
> > The pmd value here is the same for the 1mb of mapped by __map_init_section() and 
> > for the subsequent alloc_init_pte(). Should we just skip that?
> > 
> > ## fdt memory marked as MEMBLOCK_NOMAP. FDT at c7eff000 ##
> > [    0.000000] MEMBLOCK configuration:
> > [    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
> > [    0.000000]  memory.cnt  = 0x1
> > [    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
> > [    0.000000]  reserved.cnt  = 0x1
> > [    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
> > [    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
> > [    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc2861000-0xc7cfefff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7cff000-0xc7efefff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7eff000-0xc7f0afff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7f0b000-0xdc705fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
> > [    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
> > [    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
> > [    0.000000] memblock_reserve: [0xc7cff000-0xc7d0796c] arm_memblock_init+0x11c/0x168
> > [    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
> > [    0.000000] cma: Reserved 64 MiB at 0xd8000000
> > [    0.000000] MEMBLOCK configuration:
> > [    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
> > [    0.000000]  memory.cnt  = 0x7
> > [    0.000000]  memory[0x0]     [0xc0000000-0xc7efefff], 0x07eff000 bytes flags: 0x0
> > [    0.000000]  memory[0x1]     [0xc7eff000-0xc7f0afff], 0x0000c000 bytes flags: 0x4
> > [    0.000000]  memory[0x2]     [0xc7f0b000-0xdcf6afff], 0x15060000 bytes flags: 0x0
> > [    0.000000]  memory[0x3]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
> > [    0.000000]  memory[0x4]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
> > [    0.000000]  memory[0x5]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
> > [    0.000000]  memory[0x6]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
> > [    0.000000]  reserved.cnt  = 0x5
> > [    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
> > [    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
> > [    0.000000]  reserved[0x2]   [0xc7cff000-0xc7d0796c], 0x0000896d bytes flags: 0x0
> > [    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
> > [    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0
> > 
> > # Extra debug
> > [    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000 c7c1141e
> > [    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000
> > [    0.000000] alloc_init_pte addr: c7e00000, next c7eff000, len ff000
> > [    0.000000] alloc_init_pte addr: c7f0b000, next c8000000, len f5000
> > 
> > So moving the FDT out of section alignment seems to fix the problem 
> 
> Indeed, because both even-section and odd-sections can't be created,
> and so are both created as page tables, which avoids the overlapping
> mappings problem.
> 
Yes. U-Boot wise we could fix that, by placing the dtb on a non section aligned
boundary

> > ## Booting with MEMBLOCK_NOMAP removed from FDT area. FDT at c7f00000
> > 
> > [    0.000000] MEMBLOCK configuration:
> > [    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
> > [    0.000000]  memory.cnt  = 0x1
> > [    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
> > [    0.000000]  reserved.cnt  = 0x1
> > [    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
> > [    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
> > [    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc2861000-0xc7cfffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7d00000-0xc7efffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7f00000-0xc7f0bfff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7f0c000-0xdc705fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
> > [    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
> > [    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
> > [    0.000000] memblock_reserve: [0xc7d00000-0xc7d0896c] arm_memblock_init+0x11c/0x168
> > [    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
> > [    0.000000] cma: Reserved 64 MiB at 0xd8000000
> > [    0.000000] MEMBLOCK configuration:
> > [    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
> > [    0.000000]  memory.cnt  = 0x5
> > [    0.000000]  memory[0x0]     [0xc0000000-0xdcf6afff], 0x1cf6b000 bytes flags: 0x0
> > [    0.000000]  memory[0x1]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
> > [    0.000000]  memory[0x2]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
> > [    0.000000]  memory[0x3]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
> > [    0.000000]  memory[0x4]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
> > [    0.000000]  reserved.cnt  = 0x5
> > [    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
> > [    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
> > [    0.000000]  reserved[0x2]   [0xc7d00000-0xc7d0896c], 0x0000896d bytes flags: 0x0
> > [    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
> > [    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0
> 
> I don't see any sign of the FDT being reserved in memblock, which is
> dangerous - the FDT could be overwritten during the early kernel boot.
> 
This was more of a test to indicate that 'if we dont mark it as NOMAP, we dont
crash' and provide further info for the analysis. 
Thanks for the note though, we'll make sure we won't overwrite those

Thanks for taking the time with this.
I still have some reading to fully understand the implications.

The easiest way out of it is to place the fdt on !SECTION_SIZE right?


Thanks
/Ilias

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2019-04-12 10:10 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-11 15:13 Memory size and section boundary on armv7 Ilias Apalodimas
2019-04-11 15:41 ` Russell King - ARM Linux admin
2019-04-11 15:50   ` Ilias Apalodimas
2019-04-11 16:22     ` Russell King - ARM Linux admin
2019-04-12  5:23       ` Ilias Apalodimas
2019-04-12  8:40         ` Russell King - ARM Linux admin
2019-04-12 10:10           ` Ilias Apalodimas [this message]
2019-04-12 11:16             ` Russell King - ARM Linux admin
2019-04-12 11:26               ` Ilias Apalodimas
2019-04-12 11:43               ` Ilias Apalodimas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190412101013.GA3772@apalos \
    --to=ilias.apalodimas@linaro.org \
    --cc=ard.biesheuvel@linaro.org \
    --cc=catalin.marinas@arm.com \
    --cc=labbott@redhat.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux@armlinux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).