linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Linus Walleij <linus.walleij@linaro.org>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Abbott Liu <liuwenliang@huawei.com>,
	Russell King <linux@armlinux.org.uk>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH 1/6 v14] ARM: Handle a device tree in lowmem
Date: Sun, 4 Oct 2020 22:50:28 +0200	[thread overview]
Message-ID: <CACRpkdYhLLp_kjE3Gh=NuTjxXjcFn_iag3vg6P+50PMBiTm+tQ@mail.gmail.com> (raw)
In-Reply-To: <CAMj1kXF-rk-r6CGnUMFw8eYHL2nYcX3RJY9ZP7uQDzXGizocJA@mail.gmail.com>

On Fri, Oct 2, 2020 at 1:01 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> On Thu, 1 Oct 2020 at 17:22, Linus Walleij <linus.walleij@linaro.org> wrote:

> OK, so if I am understanding this correctly, the root problem is that
> the kernel unmaps the memory that the attached DTB resides in, right?

Yups, or, well in some places the kernel knows that the DT is there
so it sets up two 1MB sections over it, and then it assumes "no-one
is ever gonna touch those two section mappings, OK".

It's this code from head.S:

/*
 * Then map boot params address in r2 if specified.
 * We map 2 sections in case the ATAGs/DTB crosses a section boundary.
 */
mov r0, r2, lsr #SECTION_SHIFT
movs r0, r0, lsl #SECTION_SHIFT
subne r3, r0, r8
addne r3, r3, #PAGE_OFFSET
addne r3, r4, r3, lsr #(SECTION_SHIFT - PMD_ORDER)
orrne r6, r7, r0
strne r6, [r3], #1 << PMD_ORDER
addne r6, r6, #1 << SECTION_SHIFT
strne r6, [r3]

Then these two blocks end up in lowmem, and as the kernel
clears the PMD:s in prepare_page_table(), in mm/mmu.c
is clears the two PMDs covering the device tree under
some circumstances.

[Ard]
> So how is it possible that the kernel does not fit in the first
> memblock? Doesn't that mean that we are using memory that is not
> available to begin with?
>
> Do you have a dump of the memory layout on the platform in question?
>
> How do 0xc3000000/0xc3200000 map onto physical addresses, and how are
> those described?

The kernel per se actually fits inside the first memblock,
but not the attached DTB which ends up above the
*uncompressed* kernel, so there is always a gap between
the end of the kernel text and the DTB. See this illustration:
https://dflund.se/~triad/images/decompress-5.jpg

This is how it looks with all my funky debug code (that I
am in the process of merging by the way, so it is easier to
debug things like this):

This platform boots using fastboot:
# fastboot --base 40200000 --cmdline "console=ttyMSM0,115200,n8" boot zImage
So we load the zImage at 0x40200000
which is where the physical memory starts on this
platform because the modem is using 0x0-0x41ffffff.

Then I get these debug prints:

DTB:0x410A09C0 (0x000051B5)
C:0x402080C0-0x410A5C20->0x42217100-0x430B4C60
DTB:0x430AFA00 (0x00005242)

This means the DTB is first found in physical memory at 0x410A09C0
then we relocate the kernel from 0x402080C0 to 0x42217100
to make space for the uncompressed kernel above it
and after that the DTB is found located at 0x430AFA00.

This is using attached device tree so that is why the
DTB is right after the compressed kernel in memory.

Uncompressing Linux... done, booting the kernel.
MMU enable: 0x40300000->0x40300000
Kernel RAM: 0x40000000->0xc0000000
Kernel RAM: 0x40100000->0xc0100000
Kernel RAM: 0x40200000->0xc0200000
(...)
Kernel RAM: 0x42500000->0xc2500000
Kernel RAM: 0x42600000->0xc2600000
ATAG/DTB  : 0x43000000->0xc3000000
ATAG/DTB  : 0x43100000->0xc3100000

First is the 1:1 mapping for the MMU enable code, then all the
kernel text segment mappings. Then the DTB is mapped in
using the linear kernel map and as you can see that
ends up to be at 0xc30AFA00 so we map the 2 MB
at 0x43000000 and 0x43100000.

That is close to the kernel... so indeed:

Clear PMDs from 0x00000000 to 0xb6e00000
Clear PMDs from 0xbf000000 to 0xbf000000
Clear PMDs from 0xbf000000 to 0xc0000000
Memblock[0].base: 0x40200000 size: 0x02c00000, end: 0x42e00000
Clear PMDs from 0xc2e00000 to 0xe0800000 (lowmem)
ATAGs/DTB found in lowmem, skip clearing PMD @0xc3000000
ATAGs/DTB found in lowmem, skip clearing PMD @0xc3200000

Ooops.

(The last two messages comes from this patch, the rest is
debug prints I added.)

The first memblock is 44 MB and the linear map ends at
0xc2e00000 which will cover the whole uncompressed kernel,
but then we come to this code in prepare_page_table():

/*
 * Clear out all the kernel space mappings, except for the first
 * memory bank, up to the vmalloc region.
 */
for (addr = __phys_to_virt(end); addr < VMALLOC_START; addr += PMD_SIZE)
     pmd_clear(pmd_off_k(addr));

This clears out all mappings from the end of the first memblock
to VMALLOC_START. Including the PMD used by the DTB.

Ooops.

> If that memory is not actually available to begin with, this fix
> papers over the problem, and we should be fixing this in the
> decompressor instead.

The kernel actually fits in the first memblock, but then it clears
the lowmem right below itself and by that point the MMU
has figured out that there is another memblock below it
that it will happily use for lowmem and just goes ahead and
wipes that. The code has no awareness
that there might be a DTB there.

The decompressor seems to have always been blissfully ignorant
about what happens with the attached DTB after it just pushed
it a bit upwards in memory (if relocating) it just passes the location
in r2 in accordance with the boot specification, to me it seems
more like something the kernel proper should handle.

I don't think that loading the DTB separately to some high
address as advocated by many peope is much better.
It can create the same problem if loaded in the wrong
place and possibly be placed in other dangerous areas
like inside the VMALLOC area where it can get its
mappings destroyed at any instance. (We don't check for that
either.)

To me it seems like it was always like this and people have
just been trial-and-erroring by putting the DTB at
address until it works.

The way attached DTB works is that the entire compressed
kernel and the attached DTB usually ends up
inside the first memblock as well so the initialization of the
kernel lowmem will not wipe its two PMDs.

The problem with that approach is that if your kernel
suddenly gets bigger - like when enabling KASan - then
this can occur, especially with attached device trees.

I can mitigate the problem for example by just loading the
kernel at 0x50000000 instead:

DTB:0x50EA09C0 (0x000051B5)
C:0x500080C0-0x50EA5C20->0x52217100-0x530B4C60
DTB:0x530AFA00 (0x00005242)
Uncompressing Linux... done, booting the kernel.
MMU enable: 0x50300000->0x50300000
Kernel RAM: 0x50000000->0xc0000000
Kernel RAM: 0x50100000->0xc0100000
Kernel RAM: 0x50200000->0xc0200000
(...)
Kernel RAM: 0x52500000->0xc2500000
Kernel RAM: 0x52600000->0xc2600000
ATAG/DTB  : 0x53000000->0xc3000000
ATAG/DTB  : 0x53100000->0xc3100000
(...)
Clear PMDs from 0x00000000 to 0xb6e00000
Clear PMDs from 0xbf000000 to 0xbf000000
Clear PMDs from 0xbf000000 to 0xc0000000
Memblock end is above arm_lowmem_limit (0x60000000)
Memblock[0].base: 0x50000000 size: 0x10000000, end: 0x60000000
Clear PMDs from 0xd0000000 to 0xd0800000 (lowmem)

The DTB is at physical 0x530AFA00 which is
virtual 0xC30AFA00 and since the first memblock now
is bigger, the kernel and the DTB fits inside it
and the DTB mapping is between the kernel
and the lowmem mappings, so the PMDs will not
be cleared.

I think "most" cases work for people because their
first memblock is big like this (256 MB) and their
kernel is relatively small compared to that so the
uncompressed and compressed kernel, and the
attached DTB all fit inside it. In my case the first
memblock was "just" 44 MB and that was just small
enough to trigger this.

So to summarize:

- The kernel decompressor just moves the kernel and the
  attached DTB upward in memory so it will not be
  overwritten by the decompressor.

- This will sometimes make part of the compressed kernel and
  DTB end up above the first memblock.

- This works because there is more memory above
  the first memblock, in an adjacent memblock.
  (The decompressor has always assumed so much)

- The kernel then puts the lowmem mappings from
  the end of the first memblock to VMALLOC_START
  and clear the PMDs

- If the DTB has been pushed up to lowmem, the two
  PMDs over the DTB will be cleared, resulting in a crash.

Maybe I should just put all of this into the commit
message so people can see the mess :D

Yours,
Linus Walleij

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-10-04 20:52 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-01 15:22 [PATCH 0/6 v14] KASan for Arm Linus Walleij
2020-10-01 15:22 ` [PATCH 1/6 v14] ARM: Handle a device tree in lowmem Linus Walleij
2020-10-01 16:45   ` Florian Fainelli
2020-10-01 20:31     ` Linus Walleij
2020-10-02 11:01   ` Ard Biesheuvel
2020-10-04 20:50     ` Linus Walleij [this message]
2020-10-05  7:14       ` Ard Biesheuvel
2020-10-05  9:14         ` Ard Biesheuvel
2020-10-05 13:27           ` Linus Walleij
2020-10-05 13:30             ` Linus Walleij
2020-10-05 13:36             ` Ard Biesheuvel
2020-10-05 14:22               ` Ard Biesheuvel
2020-10-06  9:11                 ` Linus Walleij
2020-10-06  9:16                   ` Ard Biesheuvel
2020-10-06  9:19                     ` Linus Walleij
2020-10-06  8:47           ` Linus Walleij
2020-10-06  8:48             ` Ard Biesheuvel
2020-10-05 12:26         ` Linus Walleij
2020-10-01 15:22 ` [PATCH 2/6 v14] ARM: Disable KASan instrumentation for some code Linus Walleij
2020-10-01 15:22 ` [PATCH 3/6 v14] ARM: Replace string mem* functions for KASan Linus Walleij
2020-10-01 15:22 ` [PATCH 4/6 v14] ARM: Define the virtual space of KASan's shadow region Linus Walleij
2020-10-01 15:22 ` [PATCH 5/6 v14] ARM: Initialize the mapping of KASan shadow memory Linus Walleij
2020-10-01 15:22 ` [PATCH 6/6 v14] ARM: Enable KASan for ARM Linus Walleij
2020-10-01 19:19 ` [PATCH 0/6 v14] KASan for Arm Florian Fainelli
2020-10-01 20:34   ` Linus Walleij
2020-10-01 20:38     ` Florian Fainelli
2020-10-01 21:18   ` Linus Walleij
2020-10-01 21:29     ` Arnd Bergmann
2020-10-01 21:35     ` Florian Fainelli
2020-10-03 15:50   ` Ard Biesheuvel
2020-10-04  8:06     ` Ard Biesheuvel
2020-10-04  8:41       ` Ard Biesheuvel
2020-10-04  9:09         ` Ard Biesheuvel
2020-10-04 20:24           ` Florian Fainelli
2020-10-05  8:40           ` Linus Walleij
2020-10-06 13:21 ` Linus Walleij

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACRpkdYhLLp_kjE3Gh=NuTjxXjcFn_iag3vg6P+50PMBiTm+tQ@mail.gmail.com' \
    --to=linus.walleij@linaro.org \
    --cc=ardb@kernel.org \
    --cc=arnd@arndb.de \
    --cc=aryabinin@virtuozzo.com \
    --cc=f.fainelli@gmail.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux@armlinux.org.uk \
    --cc=liuwenliang@huawei.com \
    --cc=rppt@linux.ibm.com \
    --subject='Re: [PATCH 1/6 v14] ARM: Handle a device tree in lowmem' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox