linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* Memory size and section boundary on armv7
@ 2019-04-11 15:13 Ilias Apalodimas
  2019-04-11 15:41 ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 10+ messages in thread
From: Ilias Apalodimas @ 2019-04-11 15:13 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: catalin.marinas, labbott, linux, mrutland, ard.biesheuvel

Hello, 

While experimenting with u-boot and booting the kernel as an EFI stub i seem to
hit an issue with memory mapping (map_lowmem).

I can only reproduce the crash when using u-boot's dtb (specified in
fdtcontroladdr) and pass that to the kernel, instead of loading an external one
with 'dtb=' on it's command line.

If CONFIG_ARM_LPAE=y, everything works fine. If the latter is not selected
though the kernel seems to crash only if dtb is placed on a section boundary.
The reason is crashes is the BUG_ON() on arm_pte_alloc() triggers.

fdt size is 0xc00 in both scenarios.
So something like this works fine:
In this case ff000 + c000 + f5000 = 200000
[    0.000000] MAP addr: c7c00000, next c7e00000, phys c7c00000 len 200000
[    0.000000] INI addr: c7e00000, next c7eff000, len ff000
[    0.000000] INI addr: c7f0b000, next c8000000, len f5000
[    0.000000] MAP addr: c8000000, next c8200000, phys c8000000 len 200000

In this working case kernel maps sizes ff000 and f5000 alloc_init_pte() and
skips the 0xc000 of the fdt correctly

The non-working case is: 
[    0.000000] MAP addr: c7e00000, next c7f00000, phys c7e00000 len 100000
[    0.000000] INI addr: c7f0c000, next c8000000, len f4000

Both entries end up using the same pmd

I am not sure what's the best way to fix that. 
Obviously skipping the alloc_init_pte() once this case is detected fixes things,
but  is there a better idea?


Thanks
/Ilias

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory size and section boundary on armv7
  2019-04-11 15:13 Memory size and section boundary on armv7 Ilias Apalodimas
@ 2019-04-11 15:41 ` Russell King - ARM Linux admin
  2019-04-11 15:50   ` Ilias Apalodimas
  0 siblings, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux admin @ 2019-04-11 15:41 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: catalin.marinas, labbott, mrutland, linux-arm-kernel, ard.biesheuvel

On Thu, Apr 11, 2019 at 06:13:20PM +0300, Ilias Apalodimas wrote:
> Hello, 
> 
> While experimenting with u-boot and booting the kernel as an EFI stub i seem to
> hit an issue with memory mapping (map_lowmem).
> 
> I can only reproduce the crash when using u-boot's dtb (specified in
> fdtcontroladdr) and pass that to the kernel, instead of loading an external one
> with 'dtb=' on it's command line.
> 
> If CONFIG_ARM_LPAE=y, everything works fine. If the latter is not selected
> though the kernel seems to crash only if dtb is placed on a section boundary.
> The reason is crashes is the BUG_ON() on arm_pte_alloc() triggers.
> 
> fdt size is 0xc00 in both scenarios.
> So something like this works fine:
> In this case ff000 + c000 + f5000 = 200000
> [    0.000000] MAP addr: c7c00000, next c7e00000, phys c7c00000 len 200000
> [    0.000000] INI addr: c7e00000, next c7eff000, len ff000
> [    0.000000] INI addr: c7f0b000, next c8000000, len f5000
> [    0.000000] MAP addr: c8000000, next c8200000, phys c8000000 len 200000
> 
> In this working case kernel maps sizes ff000 and f5000 alloc_init_pte() and
> skips the 0xc000 of the fdt correctly
> 
> The non-working case is: 
> [    0.000000] MAP addr: c7e00000, next c7f00000, phys c7e00000 len 100000
> [    0.000000] INI addr: c7f0c000, next c8000000, len f4000
> 
> Both entries end up using the same pmd
> 
> I am not sure what's the best way to fix that. 
> Obviously skipping the alloc_init_pte() once this case is detected fixes things,
> but  is there a better idea?

Well, with the above information, all that I can say is that the
mapping code is quite rightfully BUG_ON()ing - it's working as
designed.

It seems that alloc_init_pte() is finding that a section mapping has
already been created for the virtual address range, but it is then
being asked to create a page table mapping over the top of it.  That
is one BIG no no.  Replacing the existing section mapping with a page
table means that the section mapping is destroyed.  If we allowed
such replacement, when the section mapping is accessed, we will end
up oopsing the kernel.

It is also designed to allow hardware-section sized mappings (making
it possible to map sections on 1MB granularity) but as a single Linux
page table always occupies 2MB, it is not permitted for the unused
half of an aligned 2MB slot to be used for a page table mapping -
hence this BUG_ON().

The ARM early mapping routines are intentionally designed such that
areas of memory that they are asked to map are non-overlapping - it
is the caller's responsibility to ensure this.

So, this is not a problem with the mapping functions, but the way they
are being used.

Beyond that, it is impossible to say with the above information since:

1. You've obviously added some additional printk()s, but you haven't
   said where they are or what they mean.  What is the difference
   between "MAP addr" and "INI addr" ?

2. You haven't included the text of the crash, so there's no way to
   know the call path to the BUG_ON(), and hence ascertain where the
   duplicated mapping is coming from.

Please always provide the full unaltered text from any kernel oops, bug
or warning.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory size and section boundary on armv7
  2019-04-11 15:41 ` Russell King - ARM Linux admin
@ 2019-04-11 15:50   ` Ilias Apalodimas
  2019-04-11 16:22     ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 10+ messages in thread
From: Ilias Apalodimas @ 2019-04-11 15:50 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: catalin.marinas, labbott, mrutland, linux-arm-kernel, ard.biesheuvel

Hi Russell,

> On Thu, Apr 11, 2019 at 06:13:20PM +0300, Ilias Apalodimas wrote:
> > Hello, 
> > 
> > While experimenting with u-boot and booting the kernel as an EFI stub i seem to
> > hit an issue with memory mapping (map_lowmem).
> > 
> > I can only reproduce the crash when using u-boot's dtb (specified in
> > fdtcontroladdr) and pass that to the kernel, instead of loading an external one
> > with 'dtb=' on it's command line.
> > 
> > If CONFIG_ARM_LPAE=y, everything works fine. If the latter is not selected
> > though the kernel seems to crash only if dtb is placed on a section boundary.
> > The reason is crashes is the BUG_ON() on arm_pte_alloc() triggers.
> > 
> > fdt size is 0xc00 in both scenarios.
> > So something like this works fine:
> > In this case ff000 + c000 + f5000 = 200000
> > [    0.000000] MAP addr: c7c00000, next c7e00000, phys c7c00000 len 200000
> > [    0.000000] INI addr: c7e00000, next c7eff000, len ff000
> > [    0.000000] INI addr: c7f0b000, next c8000000, len f5000
> > [    0.000000] MAP addr: c8000000, next c8200000, phys c8000000 len 200000
> > 
> > In this working case kernel maps sizes ff000 and f5000 alloc_init_pte() and
> > skips the 0xc000 of the fdt correctly
> > 
> > The non-working case is: 
> > [    0.000000] MAP addr: c7e00000, next c7f00000, phys c7e00000 len 100000
> > [    0.000000] INI addr: c7f0c000, next c8000000, len f4000
> > 
> > Both entries end up using the same pmd
> > 
> > I am not sure what's the best way to fix that. 
> > Obviously skipping the alloc_init_pte() once this case is detected fixes things,
> > but  is there a better idea?
> 
> Well, with the above information, all that I can say is that the
> mapping code is quite rightfully BUG_ON()ing - it's working as
> designed.
> 
> It seems that alloc_init_pte() is finding that a section mapping has
> already been created for the virtual address range, but it is then
> being asked to create a page table mapping over the top of it.  That
> is one BIG no no.  Replacing the existing section mapping with a page
> table means that the section mapping is destroyed.  If we allowed
> such replacement, when the section mapping is accessed, we will end
> up oopsing the kernel.
On the crashing case calling __map_init_section() maps 1mb should the rest be
skipped then?

> 
> It is also designed to allow hardware-section sized mappings (making
> it possible to map sections on 1MB granularity) but as a single Linux
> page table always occupies 2MB, it is not permitted for the unused
> half of an aligned 2MB slot to be used for a page table mapping -
> hence this BUG_ON().
> 
> The ARM early mapping routines are intentionally designed such that
> areas of memory that they are asked to map are non-overlapping - it
> is the caller's responsibility to ensure this.

And the case i described falls into 'overlapping'?

> 
> So, this is not a problem with the mapping functions, but the way they
> are being used.
> Beyond that, it is impossible to say with the above information since:
> 
> 1. You've obviously added some additional printk()s, but you haven't
>    said where they are or what they mean.  What is the difference
>    between "MAP addr" and "INI addr" ?
MAP comes from alloc_init_pmd() when calling ()
INI comes from the same function when calling alloc_init_pte()
> 
> 2. You haven't included the text of the crash, so there's no way to
>    know the call path to the BUG_ON(), and hence ascertain where the
>    duplicated mapping is coming from.
> 
> Please always provide the full unaltered text from any kernel oops, bug
> or warning.
That's because the BUG_ON never shows up in my console. I added printks before
and after the BUG_ON() though and i can only see the first print
I can provide a dump_stack() if that helps.

Thanks
/Ilias

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory size and section boundary on armv7
  2019-04-11 15:50   ` Ilias Apalodimas
@ 2019-04-11 16:22     ` Russell King - ARM Linux admin
  2019-04-12  5:23       ` Ilias Apalodimas
  0 siblings, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux admin @ 2019-04-11 16:22 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: catalin.marinas, labbott, linux-arm-kernel, ard.biesheuvel

On Thu, Apr 11, 2019 at 06:50:00PM +0300, Ilias Apalodimas wrote:
> Hi Russell,
> 
> > On Thu, Apr 11, 2019 at 06:13:20PM +0300, Ilias Apalodimas wrote:
> > > Hello, 
> > > 
> > > While experimenting with u-boot and booting the kernel as an EFI stub i seem to
> > > hit an issue with memory mapping (map_lowmem).
> > > 
> > > I can only reproduce the crash when using u-boot's dtb (specified in
> > > fdtcontroladdr) and pass that to the kernel, instead of loading an external one
> > > with 'dtb=' on it's command line.
> > > 
> > > If CONFIG_ARM_LPAE=y, everything works fine. If the latter is not selected
> > > though the kernel seems to crash only if dtb is placed on a section boundary.
> > > The reason is crashes is the BUG_ON() on arm_pte_alloc() triggers.
> > > 
> > > fdt size is 0xc00 in both scenarios.
> > > So something like this works fine:
> > > In this case ff000 + c000 + f5000 = 200000
> > > [    0.000000] MAP addr: c7c00000, next c7e00000, phys c7c00000 len 200000
> > > [    0.000000] INI addr: c7e00000, next c7eff000, len ff000
> > > [    0.000000] INI addr: c7f0b000, next c8000000, len f5000
> > > [    0.000000] MAP addr: c8000000, next c8200000, phys c8000000 len 200000
> > > 
> > > In this working case kernel maps sizes ff000 and f5000 alloc_init_pte() and
> > > skips the 0xc000 of the fdt correctly
> > > 
> > > The non-working case is: 
> > > [    0.000000] MAP addr: c7e00000, next c7f00000, phys c7e00000 len 100000
> > > [    0.000000] INI addr: c7f0c000, next c8000000, len f4000
> > > 
> > > Both entries end up using the same pmd
> > > 
> > > I am not sure what's the best way to fix that. 
> > > Obviously skipping the alloc_init_pte() once this case is detected fixes things,
> > > but  is there a better idea?
> > 
> > Well, with the above information, all that I can say is that the
> > mapping code is quite rightfully BUG_ON()ing - it's working as
> > designed.
> > 
> > It seems that alloc_init_pte() is finding that a section mapping has
> > already been created for the virtual address range, but it is then
> > being asked to create a page table mapping over the top of it.  That
> > is one BIG no no.  Replacing the existing section mapping with a page
> > table means that the section mapping is destroyed.  If we allowed
> > such replacement, when the section mapping is accessed, we will end
> > up oopsing the kernel.
> 
> On the crashing case calling __map_init_section() maps 1mb should the rest be
> skipped then?

Sorry, I don't understand the question.  What "rest" are you referring
to?

> > It is also designed to allow hardware-section sized mappings (making
> > it possible to map sections on 1MB granularity) but as a single Linux
> > page table always occupies 2MB, it is not permitted for the unused
> > half of an aligned 2MB slot to be used for a page table mapping -
> > hence this BUG_ON().
> > 
> > The ARM early mapping routines are intentionally designed such that
> > areas of memory that they are asked to map are non-overlapping - it
> > is the caller's responsibility to ensure this.
> 
> And the case i described falls into 'overlapping'?

Yes, as I described above - note that I pointed out that a page table
takes a naturally aligned 2MB of space, and sections are naturally
aligned 1MB of space.

arch/arm/include/asm/pgtable-2level.h documents how we fit the
hardware's page table implementation to the kernel's page table
requirements.  The key thing to note is that, because of the mismatch
between the kernel and the hardware, a Linux page table occupies two
physical level 1 entries, whereas a section occupies one level 1
entry.

Hence, if there is a section mapping in the even-numbered level 1
entry, and we then get a request via alloc_init_pte() to create a
page table, we can't without destroying the mapping that is already
present.

So, yes, this absolutely is an overlapping mapping.  It may not be
at page level, but it is in exactly the same way as asking for the
first half of a page to be mapped to one physical address, and the
second half to be mapped to a different physical address would be -
that is beyond the capabilities of the system to achieve.

Having a section mapping in the first 1MB of a 2MB block and then
trying to create a page table in the second half of a 2MB block is
illegal as, as far as Linux is concerned, the level 1 page table
is indexed in 2MB chunks and contains 2048 entries, not 4096 that
the hardware actually has.

In fact, the software implementation completely ignores the odd-
numbered entries in the L1 page table beyond writing them.

> > 
> > So, this is not a problem with the mapping functions, but the way they
> > are being used.
> > Beyond that, it is impossible to say with the above information since:
> > 
> > 1. You've obviously added some additional printk()s, but you haven't
> >    said where they are or what they mean.  What is the difference
> >    between "MAP addr" and "INI addr" ?
> MAP comes from alloc_init_pmd() when calling ()
> INI comes from the same function when calling alloc_init_pte()
> > 
> > 2. You haven't included the text of the crash, so there's no way to
> >    know the call path to the BUG_ON(), and hence ascertain where the
> >    duplicated mapping is coming from.
> > 
> > Please always provide the full unaltered text from any kernel oops, bug
> > or warning.
> That's because the BUG_ON never shows up in my console. I added printks before
> and after the BUG_ON() though and i can only see the first print
> I can provide a dump_stack() if that helps.

Yes please.

It may also help to enable memblock debugging (memblock=debug) so we can
see what that looks like as well.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory size and section boundary on armv7
  2019-04-11 16:22     ` Russell King - ARM Linux admin
@ 2019-04-12  5:23       ` Ilias Apalodimas
  2019-04-12  8:40         ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 10+ messages in thread
From: Ilias Apalodimas @ 2019-04-12  5:23 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: catalin.marinas, labbott, linux-arm-kernel, ard.biesheuvel

Hi Russell,
> > > It seems that alloc_init_pte() is finding that a section mapping has
> > > already been created for the virtual address range, but it is then
> > > being asked to create a page table mapping over the top of it.  That
> > > is one BIG no no.  Replacing the existing section mapping with a page
> > > table means that the section mapping is destroyed.  If we allowed
> > > such replacement, when the section mapping is accessed, we will end
> > > up oopsing the kernel.
> > 
> > On the crashing case calling __map_init_section() maps 1mb should the rest be
> > skipped then?
> 
> Sorry, I don't understand the question.  What "rest" are you referring
> to?
I'll try to explain this better on my test-cases below

> > > It is also designed to allow hardware-section sized mappings (making
> > > it possible to map sections on 1MB granularity) but as a single Linux
> > > page table always occupies 2MB, it is not permitted for the unused
> > > half of an aligned 2MB slot to be used for a page table mapping -
> > > hence this BUG_ON().
> > > 
> > > The ARM early mapping routines are intentionally designed such that
> > > areas of memory that they are asked to map are non-overlapping - it
> > > is the caller's responsibility to ensure this.
> > 
> > And the case i described falls into 'overlapping'?
> 
> Yes, as I described above - note that I pointed out that a page table
> takes a naturally aligned 2MB of space, and sections are naturally
> aligned 1MB of space.
> 
> arch/arm/include/asm/pgtable-2level.h documents how we fit the
> hardware's page table implementation to the kernel's page table
> requirements.  The key thing to note is that, because of the mismatch
> between the kernel and the hardware, a Linux page table occupies two
> physical level 1 entries, whereas a section occupies one level 1
> entry.
> 
Yes i read that. Maybe i'll get a better hang of it after more reads

> Hence, if there is a section mapping in the even-numbered level 1
> entry, and we then get a request via alloc_init_pte() to create a
> page table, we can't without destroying the mapping that is already
> present.
> 
This is exactly what's happening.
[    0.000000] __map_init_section addr: c0200000, next c0300000, phys c0200000 len 100000
[    0.000000] __map_init_section addr: c0300000, next c0400000, phys c0300000 len 100000 

These are 2 subsequent section mappings that work. If there's any memory marked
as MEMBLOCK_NOMAP in the second section though the code path is
alloc_init_pte(). Should't we just skip that section overall ?

> So, yes, this absolutely is an overlapping mapping.  It may not be
> at page level, but it is in exactly the same way as asking for the
> first half of a page to be mapped to one physical address, and the
> second half to be mapped to a different physical address would be -
> that is beyond the capabilities of the system to achieve.
> 
> Having a section mapping in the first 1MB of a 2MB block and then
> trying to create a page table in the second half of a 2MB block is
> illegal as, as far as Linux is concerned, the level 1 page table
> is indexed in 2MB chunks and contains 2048 entries, not 4096 that
> the hardware actually has.
Thanks for the explanation. 
> 
> In fact, the software implementation completely ignores the odd-
> numbered entries in the L1 page table beyond writing them.
> 
> > > 
> > > So, this is not a problem with the mapping functions, but the way they
> > > are being used.
I am not sure i am following here. Since we can detect that why allow firmware
implementations crash the kernel? (as long as the memory mappings are not wrong
or overlapping)

> > > Beyond that, it is impossible to say with the above information since:
> > > 
> > > 1. You've obviously added some additional printk()s, but you haven't
> > >    said where they are or what they mean.  What is the difference
> > >    between "MAP addr" and "INI addr" ?
> > MAP comes from alloc_init_pmd() when calling ()
> > INI comes from the same function when calling alloc_init_pte()
> > > 
> > > 2. You haven't included the text of the crash, so there's no way to
> > >    know the call path to the BUG_ON(), and hence ascertain where the
> > >    duplicated mapping is coming from.
> > > 
> > > Please always provide the full unaltered text from any kernel oops, bug
> > > or warning.
> > That's because the BUG_ON never shows up in my console. I added printks before
> > and after the BUG_ON() though and i can only see the first print
> > I can provide a dump_stack() if that helps.
> 
> Yes please.
> 
> It may also help to enable memblock debugging (memblock=debug) so we can
> see what that looks like as well.
All my tests were with memblock and efi debug enabled. I just left them out, my
bad.

Here's the tests and working vs non working use-cases
On all cases FDT size = 0xc000

## fdt memory marked as MEMBLOCK_NOMAP. FDT at c7f00000 ##

[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
[    0.000000]  memory.cnt  = 0x1
[    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x1
[    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
[    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
[    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc2861000-0xc7cfffff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7d00000-0xc7efffff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7f00000-0xc7f0bfff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7f0c000-0xdc705fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
[    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
[    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
[    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
[    0.000000] memblock_reserve: [0xc7d00000-0xc7d0896c] arm_memblock_init+0x11c/0x168
[    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
[    0.000000] cma: Reserved 64 MiB at 0xd8000000
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
[    0.000000]  memory.cnt  = 0x7
[    0.000000]  memory[0x0]     [0xc0000000-0xc7efffff], 0x07f00000 bytes flags: 0x0
[    0.000000]  memory[0x1]     [0xc7f00000-0xc7f0bfff], 0x0000c000 bytes flags: 0x4
[    0.000000]  memory[0x2]     [0xc7f0c000-0xdcf6afff], 0x1505f000 bytes flags: 0x0
[    0.000000]  memory[0x3]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
[    0.000000]  memory[0x4]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
[    0.000000]  memory[0x5]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
[    0.000000]  memory[0x6]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x5
[    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
[    0.000000]  reserved[0x2]   [0xc7d00000-0xc7d0896c], 0x0000896d bytes flags: 0x0
[    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
[    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0

# Extra debug
[    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000 c7c1141e
[    0.000000] __map_init_section addr: c7e00000, next c7f00000, phys c7e00000 len 100000 c7e1141e
[    0.000000] alloc_init_pte addr: c7f0c000, next c8000000, len f4000 pmd: c7e1141e
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc3-29427-g769f1f8f9b56-dirty #152
[    0.000000] Hardware name: STM32 (Device Tree Support)
[    0.000000] [<c03123ec>] (unwind_backtrace) from [<c030ce0c>] (show_stack+0x10/0x14)
[    0.000000] [<c030ce0c>] (show_stack) from [<c0e27250>] (dump_stack+0x8c/0xa0)
[    0.000000] [<c0e27250>] (dump_stack) from [<c1507e80>] (arm_pte_alloc+0x58/0x80)
[    0.000000] [<c1507e80>] (arm_pte_alloc) from [<c150819c>] (__create_mapping+0x2f4/0x34c)
[    0.000000] [<c150819c>] (__create_mapping) from [<c15089a0>] (paging_init+0x234/0x648)
[    0.000000] [<c15089a0>] (paging_init) from [<c1504950>] (setup_arch+0x660/0xcac)
[    0.000000] [<c1504950>] (setup_arch) from [<c1500a4c>] (start_kernel+0x70/0x458)
[    0.000000] [<c1500a4c>] (start_kernel) from [<00000000>] (  (null))

So the kernel correctly skips the 0xc000 that u-boot marked as NOMAP, but then
crashes.
The pmd value here is the same for the 1mb of mapped by __map_init_section() and 
for the subsequent alloc_init_pte(). Should we just skip that?

## fdt memory marked as MEMBLOCK_NOMAP. FDT at c7eff000 ##
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
[    0.000000]  memory.cnt  = 0x1
[    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x1
[    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
[    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
[    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc2861000-0xc7cfefff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7cff000-0xc7efefff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7eff000-0xc7f0afff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7f0b000-0xdc705fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
[    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
[    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
[    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
[    0.000000] memblock_reserve: [0xc7cff000-0xc7d0796c] arm_memblock_init+0x11c/0x168
[    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
[    0.000000] cma: Reserved 64 MiB at 0xd8000000
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
[    0.000000]  memory.cnt  = 0x7
[    0.000000]  memory[0x0]     [0xc0000000-0xc7efefff], 0x07eff000 bytes flags: 0x0
[    0.000000]  memory[0x1]     [0xc7eff000-0xc7f0afff], 0x0000c000 bytes flags: 0x4
[    0.000000]  memory[0x2]     [0xc7f0b000-0xdcf6afff], 0x15060000 bytes flags: 0x0
[    0.000000]  memory[0x3]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
[    0.000000]  memory[0x4]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
[    0.000000]  memory[0x5]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
[    0.000000]  memory[0x6]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x5
[    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
[    0.000000]  reserved[0x2]   [0xc7cff000-0xc7d0796c], 0x0000896d bytes flags: 0x0
[    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
[    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0

# Extra debug
[    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000 c7c1141e
[    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000
[    0.000000] alloc_init_pte addr: c7e00000, next c7eff000, len ff000
[    0.000000] alloc_init_pte addr: c7f0b000, next c8000000, len f5000

So moving the FDT out of section alignment seems to fix the problem 

## Booting with MEMBLOCK_NOMAP removed from FDT area. FDT at c7f00000

[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
[    0.000000]  memory.cnt  = 0x1
[    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x1
[    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
[    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
[    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc2861000-0xc7cfffff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7d00000-0xc7efffff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7f00000-0xc7f0bfff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xc7f0c000-0xdc705fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
[    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
[    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
[    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
[    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
[    0.000000] memblock_reserve: [0xc7d00000-0xc7d0896c] arm_memblock_init+0x11c/0x168
[    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
[    0.000000] cma: Reserved 64 MiB at 0xd8000000
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
[    0.000000]  memory.cnt  = 0x5
[    0.000000]  memory[0x0]     [0xc0000000-0xdcf6afff], 0x1cf6b000 bytes flags: 0x0
[    0.000000]  memory[0x1]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
[    0.000000]  memory[0x2]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
[    0.000000]  memory[0x3]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
[    0.000000]  memory[0x4]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x5
[    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
[    0.000000]  reserved[0x2]   [0xc7d00000-0xc7d0896c], 0x0000896d bytes flags: 0x0
[    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
[    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0

# Extra debug(The FDT is mapped here instead of skipped)
[    0.000000] __map_init_section addr: c7e00000, next c8000000, phys c7e00000 len 200000

Also if i force the FDT size to  1MB i obviously skip the offending region
and boot correctly.


Thanks
/Ilias

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory size and section boundary on armv7
  2019-04-12  5:23       ` Ilias Apalodimas
@ 2019-04-12  8:40         ` Russell King - ARM Linux admin
  2019-04-12 10:10           ` Ilias Apalodimas
  0 siblings, 1 reply; 10+ messages in thread
From: Russell King - ARM Linux admin @ 2019-04-12  8:40 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: catalin.marinas, labbott, linux-arm-kernel, ard.biesheuvel

On Fri, Apr 12, 2019 at 08:23:44AM +0300, Ilias Apalodimas wrote:
> Hi Russell,
> > > > It seems that alloc_init_pte() is finding that a section mapping has
> > > > already been created for the virtual address range, but it is then
> > > > being asked to create a page table mapping over the top of it.  That
> > > > is one BIG no no.  Replacing the existing section mapping with a page
> > > > table means that the section mapping is destroyed.  If we allowed
> > > > such replacement, when the section mapping is accessed, we will end
> > > > up oopsing the kernel.
> > > 
> > > On the crashing case calling __map_init_section() maps 1mb should the rest be
> > > skipped then?
> > 
> > Sorry, I don't understand the question.  What "rest" are you referring
> > to?
> I'll try to explain this better on my test-cases below
> 
> > > > It is also designed to allow hardware-section sized mappings (making
> > > > it possible to map sections on 1MB granularity) but as a single Linux
> > > > page table always occupies 2MB, it is not permitted for the unused
> > > > half of an aligned 2MB slot to be used for a page table mapping -
> > > > hence this BUG_ON().
> > > > 
> > > > The ARM early mapping routines are intentionally designed such that
> > > > areas of memory that they are asked to map are non-overlapping - it
> > > > is the caller's responsibility to ensure this.
> > > 
> > > And the case i described falls into 'overlapping'?
> > 
> > Yes, as I described above - note that I pointed out that a page table
> > takes a naturally aligned 2MB of space, and sections are naturally
> > aligned 1MB of space.
> > 
> > arch/arm/include/asm/pgtable-2level.h documents how we fit the
> > hardware's page table implementation to the kernel's page table
> > requirements.  The key thing to note is that, because of the mismatch
> > between the kernel and the hardware, a Linux page table occupies two
> > physical level 1 entries, whereas a section occupies one level 1
> > entry.
> > 
> Yes i read that. Maybe i'll get a better hang of it after more reads
> 
> > Hence, if there is a section mapping in the even-numbered level 1
> > entry, and we then get a request via alloc_init_pte() to create a
> > page table, we can't without destroying the mapping that is already
> > present.
> > 
> This is exactly what's happening.
> [    0.000000] __map_init_section addr: c0200000, next c0300000, phys c0200000 len 100000
> [    0.000000] __map_init_section addr: c0300000, next c0400000, phys c0300000 len 100000 
> 
> These are 2 subsequent section mappings that work. If there's any memory marked
> as MEMBLOCK_NOMAP in the second section though the code path is
> alloc_init_pte(). Should't we just skip that section overall ?

We can't skip the section mapping - if we skip the first section,
when the kerenl then tries to access memory in that section, the
kernel will oops.  We also have no prior knowledge in the mapping
functions that this situation will occur.

> > So, yes, this absolutely is an overlapping mapping.  It may not be
> > at page level, but it is in exactly the same way as asking for the
> > first half of a page to be mapped to one physical address, and the
> > second half to be mapped to a different physical address would be -
> > that is beyond the capabilities of the system to achieve.
> > 
> > Having a section mapping in the first 1MB of a 2MB block and then
> > trying to create a page table in the second half of a 2MB block is
> > illegal as, as far as Linux is concerned, the level 1 page table
> > is indexed in 2MB chunks and contains 2048 entries, not 4096 that
> > the hardware actually has.
> Thanks for the explanation. 
> > 
> > In fact, the software implementation completely ignores the odd-
> > numbered entries in the L1 page table beyond writing them.
> > 
> > > > 
> > > > So, this is not a problem with the mapping functions, but the way they
> > > > are being used.
>
> I am not sure i am following here. Since we can detect that why allow
> firmware implementations crash the kernel? (as long as the memory
> mappings are not wrong or overlapping)

It evolution.  These firmware implementations have come along _way_
after the ARM mapping code was written, and they construct various
situations that are difficult to handle - and given the further
evolution of the kernel with facilities such as the NX protections
(which rely on eg, the kernel text, being aligned to section
boundaries so that different permissions can be given to each section,
it is impossible to see any way to solve this problem at the mapping
code level.

> 
> > > > Beyond that, it is impossible to say with the above information since:
> > > > 
> > > > 1. You've obviously added some additional printk()s, but you haven't
> > > >    said where they are or what they mean.  What is the difference
> > > >    between "MAP addr" and "INI addr" ?
> > > MAP comes from alloc_init_pmd() when calling ()
> > > INI comes from the same function when calling alloc_init_pte()
> > > > 
> > > > 2. You haven't included the text of the crash, so there's no way to
> > > >    know the call path to the BUG_ON(), and hence ascertain where the
> > > >    duplicated mapping is coming from.
> > > > 
> > > > Please always provide the full unaltered text from any kernel oops, bug
> > > > or warning.
> > > That's because the BUG_ON never shows up in my console. I added printks before
> > > and after the BUG_ON() though and i can only see the first print
> > > I can provide a dump_stack() if that helps.
> > 
> > Yes please.
> > 
> > It may also help to enable memblock debugging (memblock=debug) so we can
> > see what that looks like as well.
> All my tests were with memblock and efi debug enabled. I just left them out, my
> bad.
> 
> Here's the tests and working vs non working use-cases
> On all cases FDT size = 0xc000
> 
> ## fdt memory marked as MEMBLOCK_NOMAP. FDT at c7f00000 ##
> 
> [    0.000000] MEMBLOCK configuration:
> [    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
> [    0.000000]  memory.cnt  = 0x1
> [    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
> [    0.000000]  reserved.cnt  = 0x1
> [    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
> [    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
> [    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc2861000-0xc7cfffff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc7d00000-0xc7efffff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc7f00000-0xc7f0bfff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc7f0c000-0xdc705fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
> [    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
> [    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
> [    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
> [    0.000000] memblock_reserve: [0xc7d00000-0xc7d0896c] arm_memblock_init+0x11c/0x168
> [    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
> [    0.000000] cma: Reserved 64 MiB at 0xd8000000
> [    0.000000] MEMBLOCK configuration:
> [    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
> [    0.000000]  memory.cnt  = 0x7
> [    0.000000]  memory[0x0]     [0xc0000000-0xc7efffff], 0x07f00000 bytes flags: 0x0
> [    0.000000]  memory[0x1]     [0xc7f00000-0xc7f0bfff], 0x0000c000 bytes flags: 0x4
> [    0.000000]  memory[0x2]     [0xc7f0c000-0xdcf6afff], 0x1505f000 bytes flags: 0x0
> [    0.000000]  memory[0x3]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
> [    0.000000]  memory[0x4]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
> [    0.000000]  memory[0x5]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
> [    0.000000]  memory[0x6]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
> [    0.000000]  reserved.cnt  = 0x5
> [    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
> [    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
> [    0.000000]  reserved[0x2]   [0xc7d00000-0xc7d0896c], 0x0000896d bytes flags: 0x0
> [    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
> [    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0
> 
> # Extra debug
> [    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000 c7c1141e
> [    0.000000] __map_init_section addr: c7e00000, next c7f00000, phys c7e00000 len 100000 c7e1141e
> [    0.000000] alloc_init_pte addr: c7f0c000, next c8000000, len f4000 pmd: c7e1141e
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc3-29427-g769f1f8f9b56-dirty #152
> [    0.000000] Hardware name: STM32 (Device Tree Support)
> [    0.000000] [<c03123ec>] (unwind_backtrace) from [<c030ce0c>] (show_stack+0x10/0x14)
> [    0.000000] [<c030ce0c>] (show_stack) from [<c0e27250>] (dump_stack+0x8c/0xa0)
> [    0.000000] [<c0e27250>] (dump_stack) from [<c1507e80>] (arm_pte_alloc+0x58/0x80)
> [    0.000000] [<c1507e80>] (arm_pte_alloc) from [<c150819c>] (__create_mapping+0x2f4/0x34c)
> [    0.000000] [<c150819c>] (__create_mapping) from [<c15089a0>] (paging_init+0x234/0x648)
> [    0.000000] [<c15089a0>] (paging_init) from [<c1504950>] (setup_arch+0x660/0xcac)
> [    0.000000] [<c1504950>] (setup_arch) from [<c1500a4c>] (start_kernel+0x70/0x458)
> [    0.000000] [<c1500a4c>] (start_kernel) from [<00000000>] (  (null))
> 
> So the kernel correctly skips the 0xc000 that u-boot marked as NOMAP,
> but then crashes.

If that is the FDT, why is it being marked NOMAP?  The kernel needs it
to be mapped (but marked reserved in memblock) so that it can access
it.  When booting normally with u-boot, FDT is not marked NOMAP.

> The pmd value here is the same for the 1mb of mapped by __map_init_section() and 
> for the subsequent alloc_init_pte(). Should we just skip that?
> 
> ## fdt memory marked as MEMBLOCK_NOMAP. FDT at c7eff000 ##
> [    0.000000] MEMBLOCK configuration:
> [    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
> [    0.000000]  memory.cnt  = 0x1
> [    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
> [    0.000000]  reserved.cnt  = 0x1
> [    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
> [    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
> [    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc2861000-0xc7cfefff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc7cff000-0xc7efefff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc7eff000-0xc7f0afff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc7f0b000-0xdc705fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
> [    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
> [    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
> [    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
> [    0.000000] memblock_reserve: [0xc7cff000-0xc7d0796c] arm_memblock_init+0x11c/0x168
> [    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
> [    0.000000] cma: Reserved 64 MiB at 0xd8000000
> [    0.000000] MEMBLOCK configuration:
> [    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
> [    0.000000]  memory.cnt  = 0x7
> [    0.000000]  memory[0x0]     [0xc0000000-0xc7efefff], 0x07eff000 bytes flags: 0x0
> [    0.000000]  memory[0x1]     [0xc7eff000-0xc7f0afff], 0x0000c000 bytes flags: 0x4
> [    0.000000]  memory[0x2]     [0xc7f0b000-0xdcf6afff], 0x15060000 bytes flags: 0x0
> [    0.000000]  memory[0x3]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
> [    0.000000]  memory[0x4]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
> [    0.000000]  memory[0x5]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
> [    0.000000]  memory[0x6]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
> [    0.000000]  reserved.cnt  = 0x5
> [    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
> [    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
> [    0.000000]  reserved[0x2]   [0xc7cff000-0xc7d0796c], 0x0000896d bytes flags: 0x0
> [    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
> [    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0
> 
> # Extra debug
> [    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000 c7c1141e
> [    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000
> [    0.000000] alloc_init_pte addr: c7e00000, next c7eff000, len ff000
> [    0.000000] alloc_init_pte addr: c7f0b000, next c8000000, len f5000
> 
> So moving the FDT out of section alignment seems to fix the problem 

Indeed, because both even-section and odd-sections can't be created,
and so are both created as page tables, which avoids the overlapping
mappings problem.

> ## Booting with MEMBLOCK_NOMAP removed from FDT area. FDT at c7f00000
> 
> [    0.000000] MEMBLOCK configuration:
> [    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
> [    0.000000]  memory.cnt  = 0x1
> [    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
> [    0.000000]  reserved.cnt  = 0x1
> [    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
> [    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
> [    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc2861000-0xc7cfffff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc7d00000-0xc7efffff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc7f00000-0xc7f0bfff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xc7f0c000-0xdc705fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
> [    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
> [    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
> [    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
> [    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
> [    0.000000] memblock_reserve: [0xc7d00000-0xc7d0896c] arm_memblock_init+0x11c/0x168
> [    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
> [    0.000000] cma: Reserved 64 MiB at 0xd8000000
> [    0.000000] MEMBLOCK configuration:
> [    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
> [    0.000000]  memory.cnt  = 0x5
> [    0.000000]  memory[0x0]     [0xc0000000-0xdcf6afff], 0x1cf6b000 bytes flags: 0x0
> [    0.000000]  memory[0x1]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
> [    0.000000]  memory[0x2]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
> [    0.000000]  memory[0x3]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
> [    0.000000]  memory[0x4]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
> [    0.000000]  reserved.cnt  = 0x5
> [    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
> [    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
> [    0.000000]  reserved[0x2]   [0xc7d00000-0xc7d0896c], 0x0000896d bytes flags: 0x0
> [    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
> [    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0

I don't see any sign of the FDT being reserved in memblock, which is
dangerous - the FDT could be overwritten during the early kernel boot.

> 
> # Extra debug(The FDT is mapped here instead of skipped)
> [    0.000000] __map_init_section addr: c7e00000, next c8000000, phys c7e00000 len 200000
> 
> Also if i force the FDT size to  1MB i obviously skip the offending region
> and boot correctly.
> 
> 
> Thanks
> /Ilias
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory size and section boundary on armv7
  2019-04-12  8:40         ` Russell King - ARM Linux admin
@ 2019-04-12 10:10           ` Ilias Apalodimas
  2019-04-12 11:16             ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 10+ messages in thread
From: Ilias Apalodimas @ 2019-04-12 10:10 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: catalin.marinas, labbott, linux-arm-kernel, ard.biesheuvel

Hi Russell,

> > > 
> > > arch/arm/include/asm/pgtable-2level.h documents how we fit the
> > > hardware's page table implementation to the kernel's page table
> > > requirements.  The key thing to note is that, because of the mismatch
> > > between the kernel and the hardware, a Linux page table occupies two
> > > physical level 1 entries, whereas a section occupies one level 1
> > > entry.
> > > 
> > Yes i read that. Maybe i'll get a better hang of it after more reads
> > 
> > > Hence, if there is a section mapping in the even-numbered level 1
> > > entry, and we then get a request via alloc_init_pte() to create a
> > > page table, we can't without destroying the mapping that is already
> > > present.
> > > 
> > This is exactly what's happening.
> > [    0.000000] __map_init_section addr: c0200000, next c0300000, phys c0200000 len 100000
> > [    0.000000] __map_init_section addr: c0300000, next c0400000, phys c0300000 len 100000 
> > 
> > These are 2 subsequent section mappings that work. If there's any memory marked
> > as MEMBLOCK_NOMAP in the second section though the code path is
> > alloc_init_pte(). Should't we just skip that section overall ?
> 
> We can't skip the section mapping - if we skip the first section,
> when the kerenl then tries to access memory in that section, the
> kernel will oops.  We also have no prior knowledge in the mapping
> functions that this situation will occur.
Thanks for the explanation. 

> >
> > I am not sure i am following here. Since we can detect that why allow
> > firmware implementations crash the kernel? (as long as the memory
> > mappings are not wrong or overlapping)
> 
> It evolution.  These firmware implementations have come along _way_
> after the ARM mapping code was written, and they construct various
> situations that are difficult to handle - and given the further
> evolution of the kernel with facilities such as the NX protections
> (which rely on eg, the kernel text, being aligned to section
> boundaries so that different permissions can be given to each section,
> it is impossible to see any way to solve this problem at the mapping
> code level.
Can we at least we can print a warning saying "What you are trying to do is not
good enough, please re-check the mappings" or something like that, 
to help people avoid that in the future.
The BUG_ON is supposed to do that, but for some reason i can't see it on my
console, any ideas why?

> > All my tests were with memblock and efi debug enabled. I just left them out, my
> > bad.
> > 
> > Here's the tests and working vs non working use-cases
> > On all cases FDT size = 0xc000
> > 
> > ## fdt memory marked as MEMBLOCK_NOMAP. FDT at c7f00000 ##
> > 
> > [    0.000000] MEMBLOCK configuration:
> > [    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
> > [    0.000000]  memory.cnt  = 0x1
> > [    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
> > [    0.000000]  reserved.cnt  = 0x1
> > [    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
> > [    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
> > [    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc2861000-0xc7cfffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7d00000-0xc7efffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7f00000-0xc7f0bfff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7f0c000-0xdc705fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
> > [    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
> > [    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
> > [    0.000000] memblock_reserve: [0xc7d00000-0xc7d0896c] arm_memblock_init+0x11c/0x168
> > [    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
> > [    0.000000] cma: Reserved 64 MiB at 0xd8000000
> > [    0.000000] MEMBLOCK configuration:
> > [    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
> > [    0.000000]  memory.cnt  = 0x7
> > [    0.000000]  memory[0x0]     [0xc0000000-0xc7efffff], 0x07f00000 bytes flags: 0x0
> > [    0.000000]  memory[0x1]     [0xc7f00000-0xc7f0bfff], 0x0000c000 bytes flags: 0x4
> > [    0.000000]  memory[0x2]     [0xc7f0c000-0xdcf6afff], 0x1505f000 bytes flags: 0x0
> > [    0.000000]  memory[0x3]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
> > [    0.000000]  memory[0x4]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
> > [    0.000000]  memory[0x5]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
> > [    0.000000]  memory[0x6]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
> > [    0.000000]  reserved.cnt  = 0x5
> > [    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
> > [    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
> > [    0.000000]  reserved[0x2]   [0xc7d00000-0xc7d0896c], 0x0000896d bytes flags: 0x0
> > [    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
> > [    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0
> > 
> > # Extra debug
> > [    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000 c7c1141e
> > [    0.000000] __map_init_section addr: c7e00000, next c7f00000, phys c7e00000 len 100000 c7e1141e
> > [    0.000000] alloc_init_pte addr: c7f0c000, next c8000000, len f4000 pmd: c7e1141e
> > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc3-29427-g769f1f8f9b56-dirty #152
> > [    0.000000] Hardware name: STM32 (Device Tree Support)
> > [    0.000000] [<c03123ec>] (unwind_backtrace) from [<c030ce0c>] (show_stack+0x10/0x14)
> > [    0.000000] [<c030ce0c>] (show_stack) from [<c0e27250>] (dump_stack+0x8c/0xa0)
> > [    0.000000] [<c0e27250>] (dump_stack) from [<c1507e80>] (arm_pte_alloc+0x58/0x80)
> > [    0.000000] [<c1507e80>] (arm_pte_alloc) from [<c150819c>] (__create_mapping+0x2f4/0x34c)
> > [    0.000000] [<c150819c>] (__create_mapping) from [<c15089a0>] (paging_init+0x234/0x648)
> > [    0.000000] [<c15089a0>] (paging_init) from [<c1504950>] (setup_arch+0x660/0xcac)
> > [    0.000000] [<c1504950>] (setup_arch) from [<c1500a4c>] (start_kernel+0x70/0x458)
> > [    0.000000] [<c1500a4c>] (start_kernel) from [<00000000>] (  (null))
> > 
> > So the kernel correctly skips the 0xc000 that u-boot marked as NOMAP,
> > but then crashes.
> 
> If that is the FDT, why is it being marked NOMAP?  The kernel needs it
> to be mapped (but marked reserved in memblock) so that it can access
> it.  When booting normally with u-boot, FDT is not marked NOMAP.
The u-boot code marks it as EFI_RUNTIME_SERVICES_DATA, so they get flagged as
NOMAP. There's a parallel u-boot discussion on this trying to figure out if
that's correct

> 
> > The pmd value here is the same for the 1mb of mapped by __map_init_section() and 
> > for the subsequent alloc_init_pte(). Should we just skip that?
> > 
> > ## fdt memory marked as MEMBLOCK_NOMAP. FDT at c7eff000 ##
> > [    0.000000] MEMBLOCK configuration:
> > [    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
> > [    0.000000]  memory.cnt  = 0x1
> > [    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
> > [    0.000000]  reserved.cnt  = 0x1
> > [    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
> > [    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
> > [    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc2861000-0xc7cfefff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7cff000-0xc7efefff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7eff000-0xc7f0afff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7f0b000-0xdc705fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
> > [    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
> > [    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
> > [    0.000000] memblock_reserve: [0xc7cff000-0xc7d0796c] arm_memblock_init+0x11c/0x168
> > [    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
> > [    0.000000] cma: Reserved 64 MiB at 0xd8000000
> > [    0.000000] MEMBLOCK configuration:
> > [    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
> > [    0.000000]  memory.cnt  = 0x7
> > [    0.000000]  memory[0x0]     [0xc0000000-0xc7efefff], 0x07eff000 bytes flags: 0x0
> > [    0.000000]  memory[0x1]     [0xc7eff000-0xc7f0afff], 0x0000c000 bytes flags: 0x4
> > [    0.000000]  memory[0x2]     [0xc7f0b000-0xdcf6afff], 0x15060000 bytes flags: 0x0
> > [    0.000000]  memory[0x3]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
> > [    0.000000]  memory[0x4]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
> > [    0.000000]  memory[0x5]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
> > [    0.000000]  memory[0x6]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
> > [    0.000000]  reserved.cnt  = 0x5
> > [    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
> > [    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
> > [    0.000000]  reserved[0x2]   [0xc7cff000-0xc7d0796c], 0x0000896d bytes flags: 0x0
> > [    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
> > [    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0
> > 
> > # Extra debug
> > [    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000 c7c1141e
> > [    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000
> > [    0.000000] alloc_init_pte addr: c7e00000, next c7eff000, len ff000
> > [    0.000000] alloc_init_pte addr: c7f0b000, next c8000000, len f5000
> > 
> > So moving the FDT out of section alignment seems to fix the problem 
> 
> Indeed, because both even-section and odd-sections can't be created,
> and so are both created as page tables, which avoids the overlapping
> mappings problem.
> 
Yes. U-Boot wise we could fix that, by placing the dtb on a non section aligned
boundary

> > ## Booting with MEMBLOCK_NOMAP removed from FDT area. FDT at c7f00000
> > 
> > [    0.000000] MEMBLOCK configuration:
> > [    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
> > [    0.000000]  memory.cnt  = 0x1
> > [    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
> > [    0.000000]  reserved.cnt  = 0x1
> > [    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
> > [    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
> > [    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc2861000-0xc7cfffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7d00000-0xc7efffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7f00000-0xc7f0bfff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xc7f0c000-0xdc705fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
> > [    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
> > [    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
> > [    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
> > [    0.000000] memblock_reserve: [0xc7d00000-0xc7d0896c] arm_memblock_init+0x11c/0x168
> > [    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
> > [    0.000000] cma: Reserved 64 MiB at 0xd8000000
> > [    0.000000] MEMBLOCK configuration:
> > [    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
> > [    0.000000]  memory.cnt  = 0x5
> > [    0.000000]  memory[0x0]     [0xc0000000-0xdcf6afff], 0x1cf6b000 bytes flags: 0x0
> > [    0.000000]  memory[0x1]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
> > [    0.000000]  memory[0x2]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
> > [    0.000000]  memory[0x3]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
> > [    0.000000]  memory[0x4]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
> > [    0.000000]  reserved.cnt  = 0x5
> > [    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
> > [    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
> > [    0.000000]  reserved[0x2]   [0xc7d00000-0xc7d0896c], 0x0000896d bytes flags: 0x0
> > [    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
> > [    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0
> 
> I don't see any sign of the FDT being reserved in memblock, which is
> dangerous - the FDT could be overwritten during the early kernel boot.
> 
This was more of a test to indicate that 'if we dont mark it as NOMAP, we dont
crash' and provide further info for the analysis. 
Thanks for the note though, we'll make sure we won't overwrite those

Thanks for taking the time with this.
I still have some reading to fully understand the implications.

The easiest way out of it is to place the fdt on !SECTION_SIZE right?


Thanks
/Ilias

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory size and section boundary on armv7
  2019-04-12 10:10           ` Ilias Apalodimas
@ 2019-04-12 11:16             ` Russell King - ARM Linux admin
  2019-04-12 11:26               ` Ilias Apalodimas
  2019-04-12 11:43               ` Ilias Apalodimas
  0 siblings, 2 replies; 10+ messages in thread
From: Russell King - ARM Linux admin @ 2019-04-12 11:16 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: catalin.marinas, labbott, linux-arm-kernel, ard.biesheuvel

On Fri, Apr 12, 2019 at 01:10:13PM +0300, Ilias Apalodimas wrote:
> Hi Russell,
> 
> > > > 
> > > > arch/arm/include/asm/pgtable-2level.h documents how we fit the
> > > > hardware's page table implementation to the kernel's page table
> > > > requirements.  The key thing to note is that, because of the mismatch
> > > > between the kernel and the hardware, a Linux page table occupies two
> > > > physical level 1 entries, whereas a section occupies one level 1
> > > > entry.
> > > > 
> > > Yes i read that. Maybe i'll get a better hang of it after more reads
> > > 
> > > > Hence, if there is a section mapping in the even-numbered level 1
> > > > entry, and we then get a request via alloc_init_pte() to create a
> > > > page table, we can't without destroying the mapping that is already
> > > > present.
> > > > 
> > > This is exactly what's happening.
> > > [    0.000000] __map_init_section addr: c0200000, next c0300000, phys c0200000 len 100000
> > > [    0.000000] __map_init_section addr: c0300000, next c0400000, phys c0300000 len 100000 
> > > 
> > > These are 2 subsequent section mappings that work. If there's any memory marked
> > > as MEMBLOCK_NOMAP in the second section though the code path is
> > > alloc_init_pte(). Should't we just skip that section overall ?
> > 
> > We can't skip the section mapping - if we skip the first section,
> > when the kerenl then tries to access memory in that section, the
> > kernel will oops.  We also have no prior knowledge in the mapping
> > functions that this situation will occur.
> Thanks for the explanation. 
> 
> > >
> > > I am not sure i am following here. Since we can detect that why allow
> > > firmware implementations crash the kernel? (as long as the memory
> > > mappings are not wrong or overlapping)
> > 
> > It evolution.  These firmware implementations have come along _way_
> > after the ARM mapping code was written, and they construct various
> > situations that are difficult to handle - and given the further
> > evolution of the kernel with facilities such as the NX protections
> > (which rely on eg, the kernel text, being aligned to section
> > boundaries so that different permissions can be given to each section,
> > it is impossible to see any way to solve this problem at the mapping
> > code level.
> Can we at least we can print a warning saying "What you are trying to do is not
> good enough, please re-check the mappings" or something like that, 
> to help people avoid that in the future.
> The BUG_ON is supposed to do that, but for some reason i can't see it on my
> console, any ideas why?

That may be a possibility, we would have to ignore the request to setup
the mapping, which means we could end silently locking up shortly
there-after.

It may also be possible to slightly rearrange the code to map the
vectors page before we setup any of the memory mappings (so that
exceptions can work), but that may need careful handling.

> > > All my tests were with memblock and efi debug enabled. I just left them out, my
> > > bad.
> > > 
> > > Here's the tests and working vs non working use-cases
> > > On all cases FDT size = 0xc000
> > > 
> > > ## fdt memory marked as MEMBLOCK_NOMAP. FDT at c7f00000 ##
> > > 
> > > [    0.000000] MEMBLOCK configuration:
> > > [    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
> > > [    0.000000]  memory.cnt  = 0x1
> > > [    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
> > > [    0.000000]  reserved.cnt  = 0x1
> > > [    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
> > > [    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
> > > [    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc2861000-0xc7cfffff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc7d00000-0xc7efffff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc7f00000-0xc7f0bfff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc7f0c000-0xdc705fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
> > > [    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
> > > [    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
> > > [    0.000000] memblock_reserve: [0xc7d00000-0xc7d0896c] arm_memblock_init+0x11c/0x168
> > > [    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
> > > [    0.000000] cma: Reserved 64 MiB at 0xd8000000
> > > [    0.000000] MEMBLOCK configuration:
> > > [    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
> > > [    0.000000]  memory.cnt  = 0x7
> > > [    0.000000]  memory[0x0]     [0xc0000000-0xc7efffff], 0x07f00000 bytes flags: 0x0
> > > [    0.000000]  memory[0x1]     [0xc7f00000-0xc7f0bfff], 0x0000c000 bytes flags: 0x4
> > > [    0.000000]  memory[0x2]     [0xc7f0c000-0xdcf6afff], 0x1505f000 bytes flags: 0x0
> > > [    0.000000]  memory[0x3]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
> > > [    0.000000]  memory[0x4]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
> > > [    0.000000]  memory[0x5]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
> > > [    0.000000]  memory[0x6]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
> > > [    0.000000]  reserved.cnt  = 0x5
> > > [    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
> > > [    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
> > > [    0.000000]  reserved[0x2]   [0xc7d00000-0xc7d0896c], 0x0000896d bytes flags: 0x0
> > > [    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
> > > [    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0
> > > 
> > > # Extra debug
> > > [    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000 c7c1141e
> > > [    0.000000] __map_init_section addr: c7e00000, next c7f00000, phys c7e00000 len 100000 c7e1141e
> > > [    0.000000] alloc_init_pte addr: c7f0c000, next c8000000, len f4000 pmd: c7e1141e
> > > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc3-29427-g769f1f8f9b56-dirty #152
> > > [    0.000000] Hardware name: STM32 (Device Tree Support)
> > > [    0.000000] [<c03123ec>] (unwind_backtrace) from [<c030ce0c>] (show_stack+0x10/0x14)
> > > [    0.000000] [<c030ce0c>] (show_stack) from [<c0e27250>] (dump_stack+0x8c/0xa0)
> > > [    0.000000] [<c0e27250>] (dump_stack) from [<c1507e80>] (arm_pte_alloc+0x58/0x80)
> > > [    0.000000] [<c1507e80>] (arm_pte_alloc) from [<c150819c>] (__create_mapping+0x2f4/0x34c)
> > > [    0.000000] [<c150819c>] (__create_mapping) from [<c15089a0>] (paging_init+0x234/0x648)
> > > [    0.000000] [<c15089a0>] (paging_init) from [<c1504950>] (setup_arch+0x660/0xcac)
> > > [    0.000000] [<c1504950>] (setup_arch) from [<c1500a4c>] (start_kernel+0x70/0x458)
> > > [    0.000000] [<c1500a4c>] (start_kernel) from [<00000000>] (  (null))
> > > 
> > > So the kernel correctly skips the 0xc000 that u-boot marked as NOMAP,
> > > but then crashes.
> > 
> > If that is the FDT, why is it being marked NOMAP?  The kernel needs it
> > to be mapped (but marked reserved in memblock) so that it can access
> > it.  When booting normally with u-boot, FDT is not marked NOMAP.
> The u-boot code marks it as EFI_RUNTIME_SERVICES_DATA, so they get flagged as
> NOMAP. There's a parallel u-boot discussion on this trying to figure out if
> that's correct
> 
> > 
> > > The pmd value here is the same for the 1mb of mapped by __map_init_section() and 
> > > for the subsequent alloc_init_pte(). Should we just skip that?
> > > 
> > > ## fdt memory marked as MEMBLOCK_NOMAP. FDT at c7eff000 ##
> > > [    0.000000] MEMBLOCK configuration:
> > > [    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
> > > [    0.000000]  memory.cnt  = 0x1
> > > [    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
> > > [    0.000000]  reserved.cnt  = 0x1
> > > [    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
> > > [    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
> > > [    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc2861000-0xc7cfefff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc7cff000-0xc7efefff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc7eff000-0xc7f0afff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc7f0b000-0xdc705fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
> > > [    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
> > > [    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
> > > [    0.000000] memblock_reserve: [0xc7cff000-0xc7d0796c] arm_memblock_init+0x11c/0x168
> > > [    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
> > > [    0.000000] cma: Reserved 64 MiB at 0xd8000000
> > > [    0.000000] MEMBLOCK configuration:
> > > [    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
> > > [    0.000000]  memory.cnt  = 0x7
> > > [    0.000000]  memory[0x0]     [0xc0000000-0xc7efefff], 0x07eff000 bytes flags: 0x0
> > > [    0.000000]  memory[0x1]     [0xc7eff000-0xc7f0afff], 0x0000c000 bytes flags: 0x4
> > > [    0.000000]  memory[0x2]     [0xc7f0b000-0xdcf6afff], 0x15060000 bytes flags: 0x0
> > > [    0.000000]  memory[0x3]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
> > > [    0.000000]  memory[0x4]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
> > > [    0.000000]  memory[0x5]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
> > > [    0.000000]  memory[0x6]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
> > > [    0.000000]  reserved.cnt  = 0x5
> > > [    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
> > > [    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
> > > [    0.000000]  reserved[0x2]   [0xc7cff000-0xc7d0796c], 0x0000896d bytes flags: 0x0
> > > [    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
> > > [    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0
> > > 
> > > # Extra debug
> > > [    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000 c7c1141e
> > > [    0.000000] __map_init_section addr: c7c00000, next c7e00000, phys c7c00000 len 200000
> > > [    0.000000] alloc_init_pte addr: c7e00000, next c7eff000, len ff000
> > > [    0.000000] alloc_init_pte addr: c7f0b000, next c8000000, len f5000
> > > 
> > > So moving the FDT out of section alignment seems to fix the problem 
> > 
> > Indeed, because both even-section and odd-sections can't be created,
> > and so are both created as page tables, which avoids the overlapping
> > mappings problem.
> > 
> Yes. U-Boot wise we could fix that, by placing the dtb on a non section
> aligned boundary

That's actually misleading.

The problem happens when you have a small no-map section within a 2MB
region, and it doesn't cross the even-odd 1MB boundary.

To illustrate what I'm saying, if you arranged for it to be (eg) one
page higher (iow, 0xc7f01000) then you'll hit the same problem.

> > > ## Booting with MEMBLOCK_NOMAP removed from FDT area. FDT at c7f00000
> > > 
> > > [    0.000000] MEMBLOCK configuration:
> > > [    0.000000]  memory size = 0x20000000 reserved size = 0x00000000
> > > [    0.000000]  memory.cnt  = 0x1
> > > [    0.000000]  memory[0x0]     [0xc0000000-0xdfffffff], 0x20000000 bytes flags: 0x0
> > > [    0.000000]  reserved.cnt  = 0x1
> > > [    0.000000]  reserved[0x0]   [0x00000000-0xffffffff], 0x00000000 bytes flags: 0x0
> > > [    0.000000] memblock_remove: [0x00000000-0xfffffffe] reserve_regions+0x64/0x220
> > > [    0.000000] memblock_add: [0xc0000000-0xc1ffffff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc2000000-0xc2860fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc2861000-0xc7cfffff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc7d00000-0xc7efffff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc7f00000-0xc7f0bfff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xc7f0c000-0xdc705fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdc706000-0xdc709fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdc70a000-0xdcf6afff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf6b000-0xdcf6bfff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf6c000-0xdcf72fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf73000-0xdcf73fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf74000-0xdcf75fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdcf76000-0xdff80fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdff81000-0xdff81fff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_add: [0xdff82000-0xdfffffff] reserve_regions+0x178/0x220
> > > [    0.000000] memblock_reserve: [0xdc706000-0xdc706fff] efi_init+0xe4/0x1c4
> > > [    0.000000] memblock_reserve: [0xc0300000-0xc18f7ad7] arm_memblock_init+0x30/0x168
> > > [    0.000000] memblock_reserve: [0xc0204000-0xc0207fff] arm_memblock_init+0x108/0x168
> > > [    0.000000] memblock_reserve: [0xc7d00000-0xc7d0896c] arm_memblock_init+0x11c/0x168
> > > [    0.000000] memblock_reserve: [0xd8000000-0xdbffffff] memblock_alloc_range+0x54/0x6c
> > > [    0.000000] cma: Reserved 64 MiB at 0xd8000000
> > > [    0.000000] MEMBLOCK configuration:
> > > [    0.000000]  memory size = 0x20000000 reserved size = 0x05605445
> > > [    0.000000]  memory.cnt  = 0x5
> > > [    0.000000]  memory[0x0]     [0xc0000000-0xdcf6afff], 0x1cf6b000 bytes flags: 0x0
> > > [    0.000000]  memory[0x1]     [0xdcf6b000-0xdcf75fff], 0x0000b000 bytes flags: 0x4
> > > [    0.000000]  memory[0x2]     [0xdcf76000-0xdff80fff], 0x0300b000 bytes flags: 0x0
> > > [    0.000000]  memory[0x3]     [0xdff81000-0xdff81fff], 0x00001000 bytes flags: 0x4
> > > [    0.000000]  memory[0x4]     [0xdff82000-0xdfffffff], 0x0007e000 bytes flags: 0x0
> > > [    0.000000]  reserved.cnt  = 0x5
> > > [    0.000000]  reserved[0x0]   [0xc0204000-0xc0207fff], 0x00004000 bytes flags: 0x0
> > > [    0.000000]  reserved[0x1]   [0xc0300000-0xc18f7ad7], 0x015f7ad8 bytes flags: 0x0
> > > [    0.000000]  reserved[0x2]   [0xc7d00000-0xc7d0896c], 0x0000896d bytes flags: 0x0
> > > [    0.000000]  reserved[0x3]   [0xd8000000-0xdbffffff], 0x04000000 bytes flags: 0x0
> > > [    0.000000]  reserved[0x4]   [0xdc706000-0xdc706fff], 0x00001000 bytes flags: 0x0
> > 
> > I don't see any sign of the FDT being reserved in memblock, which is
> > dangerous - the FDT could be overwritten during the early kernel boot.
> > 
> This was more of a test to indicate that 'if we dont mark it as NOMAP, we dont
> crash' and provide further info for the analysis. 
> Thanks for the note though, we'll make sure we won't overwrite those
> 
> Thanks for taking the time with this.
> I still have some reading to fully understand the implications.
> 
> The easiest way out of it is to place the fdt on !SECTION_SIZE right?

See above, that's not entirely accurate.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory size and section boundary on armv7
  2019-04-12 11:16             ` Russell King - ARM Linux admin
@ 2019-04-12 11:26               ` Ilias Apalodimas
  2019-04-12 11:43               ` Ilias Apalodimas
  1 sibling, 0 replies; 10+ messages in thread
From: Ilias Apalodimas @ 2019-04-12 11:26 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: catalin.marinas, labbott, linux-arm-kernel, ard.biesheuvel

> > Can we at least we can print a warning saying "What you are trying to do is not
> > good enough, please re-check the mappings" or something like that, 
> > to help people avoid that in the future.
> > The BUG_ON is supposed to do that, but for some reason i can't see it on my
> > console, any ideas why?
> 
> That may be a possibility, we would have to ignore the request to setup
> the mapping, which means we could end silently locking up shortly
> there-after.
> 
> It may also be possible to slightly rearrange the code to map the
> vectors page before we setup any of the memory mappings (so that
> exceptions can work), but that may need careful handling.
> 
Ok, i'll try to have a look at that, since i can easily reproduce the splat

> > Yes. U-Boot wise we could fix that, by placing the dtb on a non section
> > aligned boundary
> 
> That's actually misleading.
> 
> The problem happens when you have a small no-map section within a 2MB
> region, and it doesn't cross the even-odd 1MB boundary.
> 
> To illustrate what I'm saying, if you arranged for it to be (eg) one
> page higher (iow, 0xc7f01000) then you'll hit the same problem.
Yes correct, i actually tried that as well in one of my tests and still had the
same effect

> > 
> > The easiest way out of it is to place the fdt on !SECTION_SIZE right?
Yes noted, 

Thanks
/Ilias

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Memory size and section boundary on armv7
  2019-04-12 11:16             ` Russell King - ARM Linux admin
  2019-04-12 11:26               ` Ilias Apalodimas
@ 2019-04-12 11:43               ` Ilias Apalodimas
  1 sibling, 0 replies; 10+ messages in thread
From: Ilias Apalodimas @ 2019-04-12 11:43 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: catalin.marinas, labbott, linux-arm-kernel, ard.biesheuvel

> The problem happens when you have a small no-map section within a 2MB
> region, and it doesn't cross the even-odd 1MB boundary.

> To illustrate what I'm saying, if you arranged for it to be (eg) one
> page higher (iow, 0xc7f01000) then you'll hit the same problem.
> > 
> > The easiest way out of it is to place the fdt on !SECTION_SIZE right?
> 
> See above, that's not entirely accurate.
I somehow removed this line in my previous response. 
Anyway, you are right my original explanation on avoiding the issue from u-boot
was not entirely accurate. Your description is correct

Cheers
/Ilias

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-04-12 11:43 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-11 15:13 Memory size and section boundary on armv7 Ilias Apalodimas
2019-04-11 15:41 ` Russell King - ARM Linux admin
2019-04-11 15:50   ` Ilias Apalodimas
2019-04-11 16:22     ` Russell King - ARM Linux admin
2019-04-12  5:23       ` Ilias Apalodimas
2019-04-12  8:40         ` Russell King - ARM Linux admin
2019-04-12 10:10           ` Ilias Apalodimas
2019-04-12 11:16             ` Russell King - ARM Linux admin
2019-04-12 11:26               ` Ilias Apalodimas
2019-04-12 11:43               ` Ilias Apalodimas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).