linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
To: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: catalin.marinas@arm.com, labbott@redhat.com,
	linux-arm-kernel@lists.infradead.org, ard.biesheuvel@linaro.org
Subject: Re: Memory size and section boundary on armv7
Date: Fri, 12 Apr 2019 08:23:44 +0300	[thread overview]
Message-ID: <20190412052344.GA13530@apalos> (raw)
In-Reply-To: <20190411162210.462hwmwlwrciq3re@shell.armlinux.org.uk>

Hi Russell,
> > > It seems that alloc_init_pte() is finding that a section mapping has
> > > already been created for the virtual address range, but it is then
> > > being asked to create a page table mapping over the top of it.  That
> > > is one BIG no no.  Replacing the existing section mapping with a page
> > > table means that the section mapping is destroyed.  If we allowed
> > > such replacement, when the section mapping is accessed, we will end
> > > up oopsing the kernel.
> > 
> > On the crashing case calling __map_init_section() maps 1mb should the rest be
> > skipped then?
> 
> Sorry, I don't understand the question.  What "rest" are you referring
> to?
I'll try to explain this better on my test-cases below

> > > It is also designed to allow hardware-section sized mappings (making
> > > it possible to map sections on 1MB granularity) but as a single Linux
> > > page table always occupies 2MB, it is not permitted for the unused
> > > half of an aligned 2MB slot to be used for a page table mapping -
> > > hence this BUG_ON().
> > > 
> > > The ARM early mapping routines are intentionally designed such that
> > > areas of memory that they are asked to map are non-overlapping - it
> > > is the caller's responsibility to ensure this.
> > 
> > And the case i described falls into 'overlapping'?
> 
> Yes, as I described above - note that I pointed out that a page table
> takes a naturally aligned 2MB of space, and sections are naturally
> aligned 1MB of space.
> 
> arch/arm/include/asm/pgtable-2level.h documents how we fit the
> hardware's page table implementation to the kernel's page table
> requirements.  The key thing to note is that, because of the mismatch
> between the kernel and the hardware, a Linux page table occupies two
> physical level 1 entries, whereas a section occupies one level 1
> entry.
> 
Yes i read that. Maybe i'll get a better hang of it after more reads

> Hence, if there is a section mapping in the even-numbered level 1
> entry, and we then get a request via alloc_init_pte() to create a
> page table, we can't without destroying the mapping that is already
> present.
> 
This is exactly what's happening.
[    0.000000] __map_init_section addr: c0200000, next c0300000, phys c0200000 len 100000
[    0.000000] __map_init_section addr: c0300000, next c0400000, phys c0300000 len 100000 

These are 2 subsequent section mappings that work. If there's any memory marked
as MEMBLOCK_NOMAP in the second section though the code path is
alloc_init_pte(). Should't we just skip that section overall ?

> So, yes, this absolutely is an overlapping mapping.  It may not be
> at page level, but it is in exactly the same way as asking for the
> first half of a page to be mapped to one physical address, and the
> second half to be mapped to a different physical address would be -
> that is beyond the capabilities of the system to achieve.
> 
> Having a section mapping in the first 1MB of a 2MB block and then
> trying to create a page table in the second half of a 2MB block is
> illegal as, as far as Linux is concerned, the level 1 page table
> is indexed in 2MB chunks and contains 2048 entries, not 4096 that
> the hardware actually has.
Thanks for the explanation. 
> 
> In fact, the software implementation completely ignores the odd-
> numbered entries in the L1 page table beyond writing them.
> 
> > > 
> > > So, this is not a problem with the mapping functions, but the way they
> > > are being used.
I am not sure i am following here. Since we can detect that why allow firmware
implementations crash the kernel? (as long as the memory mappings are not wrong
or overlapping)

> > > Beyond that, it is impossible to say with the above information since:
> > > 
> > > 1. You've obviously added some additional printk()s, but you haven't
> > >    said where they are or what they mean.  What is the difference
> > >    between "MAP addr" and "INI addr" ?
> > MAP comes from alloc_init_pmd() when calling ()
> > INI comes from the same function when calling alloc_init_pte()
> > > 
> > > 2. You haven't included the text of the crash, so there's no way to
> > >    know the call path to the BUG_ON(), and hence ascertain where the
> > >    duplicated mapping is coming from.
> > > 
> > > Please always provide the full unaltered text from any kernel oops, bug
> > > or warning.
> > That's because the BUG_ON never shows up in my console. I added printks before
> > and after the BUG_ON() though and i can only see the first print
> > I can provide a dump_stack() if that helps.
> 
> Yes please.
> 
> It may also help to enable memblock debugging (memblock=debug) so we can
> see what that looks like as well.
All my tests were with memblock and efi debug enabled. I just left them out, my
bad.

Here's the tests and working vs non working use-cases
On all cases FDT size = 0xc000

## fdt memory marked as MEMBLOCK_NOMAP. FDT at c7f00000 ##

[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
[    0.000000]  memory.cnt  = 0x1
[    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x1
[    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
[    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
[    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc2861000-0xc7cfffff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7d00000-0xc7efffff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7f00000-0xc7f0bfff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7f0c000-0xdc705fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
[    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
[    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
[    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
[    0.000000] memblock_reserve: [0xc7d00000-0xc7d0896c] arm_memblock_init+0x11c/0x168
[    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
[    0.000000] cma: Reserved 64 MiB at 0xd8000000
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
[    0.000000]  memory.cnt  = 0x7
[    0.000000]  memory[0x0]     [0xc0000000-0xc7efffff], 0x07f00000 bytes flags: 0x0
[    0.000000]  memory[0x1]     [0xc7f00000-0xc7f0bfff], 0x0000c000 bytes flags: 0x4
[    0.000000]  memory[0x2]     [0xc7f0c000-0xdcf6afff], 0x1505f000 bytes flags: 0x0
[    0.000000]  memory[0x3]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
[    0.000000]  memory[0x4]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
[    0.000000]  memory[0x5]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
[    0.000000]  memory[0x6]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x5
[    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
[    0.000000]  reserved[0x2]   [0xc7d00000-0xc7d0896c], 0x0000896d bytes flags: 0x0
[    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
[    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0

# Extra debug
[    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000 c7c1141e
[    0.000000] __map_init_section addr: c7e00000, next c7f00000, phys c7e00000 len 100000 c7e1141e
[    0.000000] alloc_init_pte addr: c7f0c000, next c8000000, len f4000 pmd: c7e1141e
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc3-29427-g769f1f8f9b56-dirty #152
[    0.000000] Hardware name: STM32 (Device Tree Support)
[    0.000000] [<c03123ec>] (unwind_backtrace) from [<c030ce0c>] (show_stack+0x10/0x14)
[    0.000000] [<c030ce0c>] (show_stack) from [<c0e27250>] (dump_stack+0x8c/0xa0)
[    0.000000] [<c0e27250>] (dump_stack) from [<c1507e80>] (arm_pte_alloc+0x58/0x80)
[    0.000000] [<c1507e80>] (arm_pte_alloc) from [<c150819c>] (__create_mapping+0x2f4/0x34c)
[    0.000000] [<c150819c>] (__create_mapping) from [<c15089a0>] (paging_init+0x234/0x648)
[    0.000000] [<c15089a0>] (paging_init) from [<c1504950>] (setup_arch+0x660/0xcac)
[    0.000000] [<c1504950>] (setup_arch) from [<c1500a4c>] (start_kernel+0x70/0x458)
[    0.000000] [<c1500a4c>] (start_kernel) from [<00000000>] (  (null))

So the kernel correctly skips the 0xc000 that u-boot marked as NOMAP, but then
crashes.
The pmd value here is the same for the 1mb of mapped by __map_init_section() and 
for the subsequent alloc_init_pte(). Should we just skip that?

## fdt memory marked as MEMBLOCK_NOMAP. FDT at c7eff000 ##
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
[    0.000000]  memory.cnt  = 0x1
[    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x1
[    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
[    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
[    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc2861000-0xc7cfefff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7cff000-0xc7efefff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7eff000-0xc7f0afff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7f0b000-0xdc705fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
[    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
[    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
[    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
[    0.000000] memblock_reserve: [0xc7cff000-0xc7d0796c] arm_memblock_init+0x11c/0x168
[    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
[    0.000000] cma: Reserved 64 MiB at 0xd8000000
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
[    0.000000]  memory.cnt  = 0x7
[    0.000000]  memory[0x0]     [0xc0000000-0xc7efefff], 0x07eff000 bytes flags: 0x0
[    0.000000]  memory[0x1]     [0xc7eff000-0xc7f0afff], 0x0000c000 bytes flags: 0x4
[    0.000000]  memory[0x2]     [0xc7f0b000-0xdcf6afff], 0x15060000 bytes flags: 0x0
[    0.000000]  memory[0x3]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
[    0.000000]  memory[0x4]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
[    0.000000]  memory[0x5]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
[    0.000000]  memory[0x6]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x5
[    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
[    0.000000]  reserved[0x2]   [0xc7cff000-0xc7d0796c], 0x0000896d bytes flags: 0x0
[    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
[    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0

# Extra debug
[    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000 c7c1141e
[    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000
[    0.000000] alloc_init_pte addr: c7e00000, next c7eff000, len ff000
[    0.000000] alloc_init_pte addr: c7f0b000, next c8000000, len f5000

So moving the FDT out of section alignment seems to fix the problem 

## Booting with MEMBLOCK_NOMAP removed from FDT area. FDT at c7f00000

[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
[    0.000000]  memory.cnt  = 0x1
[    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x1
[    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
[    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
[    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc2861000-0xc7cfffff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7d00000-0xc7efffff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7f00000-0xc7f0bfff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7f0c000-0xdc705fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
[    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
[    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
[    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
[    0.000000] memblock_reserve: [0xc7d00000-0xc7d0896c] arm_memblock_init+0x11c/0x168
[    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
[    0.000000] cma: Reserved 64 MiB at 0xd8000000
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
[    0.000000]  memory.cnt  = 0x5
[    0.000000]  memory[0x0]     [0xc0000000-0xdcf6afff], 0x1cf6b000 bytes flags: 0x0
[    0.000000]  memory[0x1]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
[    0.000000]  memory[0x2]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
[    0.000000]  memory[0x3]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
[    0.000000]  memory[0x4]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x5
[    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
[    0.000000]  reserved[0x2]   [0xc7d00000-0xc7d0896c], 0x0000896d bytes flags: 0x0
[    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
[    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0

# Extra debug(The FDT is mapped here instead of skipped)
[    0.000000] __map_init_section addr: c7e00000, next c8000000, phys c7e00000 len 200000

Also if i force the FDT size to  1MB i obviously skip the offending region
and boot correctly.


Thanks
/Ilias

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2019-04-12  5:23 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-11 15:13 Memory size and section boundary on armv7 Ilias Apalodimas
2019-04-11 15:41 ` Russell King - ARM Linux admin
2019-04-11 15:50   ` Ilias Apalodimas
2019-04-11 16:22     ` Russell King - ARM Linux admin
2019-04-12  5:23       ` Ilias Apalodimas [this message]
2019-04-12  8:40         ` Russell King - ARM Linux admin
2019-04-12 10:10           ` Ilias Apalodimas
2019-04-12 11:16             ` Russell King - ARM Linux admin
2019-04-12 11:26               ` Ilias Apalodimas
2019-04-12 11:43               ` Ilias Apalodimas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190412052344.GA13530@apalos \
    --to=ilias.apalodimas@linaro.org \
    --cc=ard.biesheuvel@linaro.org \
    --cc=catalin.marinas@arm.com \
    --cc=labbott@redhat.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux@armlinux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).