From: Kairui Song <kasong@redhat.com>
To: Baoquan He <bhe@redhat.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Ard Biesheuvel <ard.biesheuvel@linaro.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Matthew Garrett <matthewgarrett@google.com>,
Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>,
Dave Young <dyoung@redhat.com>,
"the arch/x86 maintainers" <x86@kernel.org>
Subject: Re: [PATCH v2] x86, efi: never relocate kernel below lowest acceptable address
Date: Thu, 26 Sep 2019 01:35:46 +0800
Message-ID: <CACPcB9df97J2UP8xQEOkhABbeo9pZ56GOxMvFwrE6gPRkF2TQg@mail.gmail.com>
In-Reply-To: <20190925095527.GE31919@MiWiFi-R3L-srv>
On Wed, Sep 25, 2019 at 5:55 PM Baoquan He <bhe@redhat.com> wrote:
>
> On 09/20/19 at 12:05am, Kairui Song wrote:
> > Currently, the kernel fails to boot on some Hyper-V VMs when using EFI.
> > And it's a potential issue on all platforms.
> >
> > It's caused by a broken kernel relocation on EFI systems, when the three
> > conditions below are met:
> >
> > 1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR)
> > by the loader.
> > 2. There isn't enough room to contain the kernel, starting from the
> > default load address (e.g. something else occupies part of the region).
> > 3. In the memmap provided by EFI firmware, there is a memory region
> > that starts below LOAD_PHYSICAL_ADDR and is suitable for containing
> > the kernel.
>
> Thanks for the effort, Kairui.
>
> Let me summarize what I got from this issue, please correct me if
> I missed anything:
>
> ***
> Problem:
> This bug was reported on the Hyper-V platform. The kernel sometimes
> resets to firmware without any console output, in both the 1st kernel
> and the kdump kernel.
>
> ***
> Root cause:
> Debugging shows that the reset to firmware is triggered when executing
> the 'rep movsq' line of arch/x86/boot/compressed/head_64.S. The reason
> is that the EFI boot stub may put the kernel image below 16M, and
> head_64.S will then relocate the kernel to 16M directly. That relocation
> conflicts with some EFI reserved region, which causes the reset.
>
> A more detailed walkthrough, based on the problem as it occurred on that
> Hyper-V machine:
>
> - The kernel (INIT_SIZE: 56820K) got loaded at 0x3c881000 (not aligned,
> and not equal to pref_address 0x1000000), so it needs to be relocated.
>
> - efi_relocate_kernel is called and tries to allocate INIT_SIZE of memory
> at pref_address. This fails because something else occupies that region.
>
> - efi_relocate_kernel calls efi_low_alloc as a fallback and gets the
> address 0x800000 (below 0x1000000).
>
> - Later, in arch/x86/boot/compressed/head_64.S:108, LOAD_PHYSICAL_ADDR is
> forced as the new load address because the current address is lower than
> that. The kernel then tries to relocate to 0x1000000.
>
> - However, the memory starting from 0x1000000 was not allocated from the
> EFI firmware, and writing to this region caused the system to reset.
>
> ***
> Solution:
> Always search for an area above LOAD_PHYSICAL_ADDR, namely 16M, to put
> the kernel image, in arch/x86/boot/compressed/eboot.c. The EFI boot stub
> in eboot.c will then search for a suitable area in the EFI memmap, making
> sure no reserved region conflicts with the target area of the kernel
> image. Besides, the kernel won't be relocated in
> arch/x86/boot/compressed/head_64.S since it is already at or above 16M.
>
> #ifdef CONFIG_RELOCATABLE
> leaq startup_32(%rip) /* - $startup_32 */, %rbp
> movl BP_kernel_alignment(%rsi), %eax
> decl %eax
> addq %rax, %rbp
> notq %rax
> andq %rax, %rbp
> cmpq $LOAD_PHYSICAL_ADDR, %rbp
> jge 1f
> #endif
> movq $LOAD_PHYSICAL_ADDR, %rbp
> 1:
>
> /* Target address to relocate to for decompression */
> movl BP_init_size(%rsi), %ebx
> subl $_end, %ebx
> addq %rbp, %rbx
>
Hi Baoquan,
Yes, it's all correct. Thanks for adding these details.
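To make the address selection in the quoted head_64.S snippet easier to follow, here is a rough Python model of it. It is only a sketch: the 0x200000 value for BP_kernel_alignment is an assumption for illustration, and the names pick_load_address and align_up are made up for this mail, they are not kernel symbols.

```python
# Illustrative model only; the real logic is the assembly quoted above.
LOAD_PHYSICAL_ADDR = 0x1000000   # 16M, the pref_address from this thread
KERNEL_ALIGNMENT = 0x200000      # assumed value of BP_kernel_alignment


def align_up(addr, align):
    """Round addr up to a multiple of align (align must be a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def pick_load_address(current, align=KERNEL_ALIGNMENT):
    """Mirror the CONFIG_RELOCATABLE branch: keep the aligned current
    address unless it is below LOAD_PHYSICAL_ADDR."""
    aligned = align_up(current, align)
    if aligned >= LOAD_PHYSICAL_ADDR:
        return aligned
    return LOAD_PHYSICAL_ADDR


# Address handed back by efi_low_alloc on the Hyper-V machine:
print(hex(pick_load_address(0x800000)))    # 0x1000000
# An address that is already above 16M is kept after alignment:
print(hex(pick_load_address(0x8000000)))   # 0x8000000
```

The first call is the failing case: the stub allocated memory at 0x800000, but the decompression target silently becomes 0x1000000, which was never allocated from the firmware.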
>
> ***
> I have one concern about this patch:
>
> Why does this only happen on the Hyper-V platform? QEMU/KVM, bare metal
> and VMware ESXi don't have this issue? What's the difference?
Let me post part of the EFI memmap from that machine (and btw the kernel
size is 55M):
kernel: efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x0000000000080000) (0MB)
kernel: efi: mem01: type=4, attr=0xf, range=[0x0000000000080000-0x0000000000081000) (0MB)
kernel: efi: mem02: type=2, attr=0xf, range=[0x0000000000081000-0x0000000000082000) (0MB)
kernel: efi: mem03: type=7, attr=0xf, range=[0x0000000000082000-0x00000000000a0000) (0MB)
kernel: efi: mem04: type=4, attr=0xf, range=[0x0000000000100000-0x000000000062a000) (5MB)
kernel: efi: mem05: type=7, attr=0xf, range=[0x000000000062a000-0x0000000004200000) (59MB)
kernel: efi: mem06: type=4, attr=0xf, range=[0x0000000004200000-0x0000000004400000) (2MB)
kernel: efi: mem07: type=7, attr=0xf, range=[0x0000000004400000-0x00000000045c6000) (1MB)
kernel: efi: mem08: type=4, attr=0xf, range=[0x00000000045c6000-0x00000000045e6000) (0MB)
kernel: efi: mem09: type=3, attr=0xf, range=[0x00000000045e6000-0x000000000460b000) (0MB)
kernel: efi: mem10: type=4, attr=0xf, range=[0x000000000460b000-0x0000000004613000) (0MB)
kernel: efi: mem11: type=3, attr=0xf, range=[0x0000000004613000-0x000000000462b000) (0MB)
kernel: efi: mem12: type=7, attr=0xf, range=[0x000000000462b000-0x0000000004800000) (1MB)
kernel: efi: mem13: type=2, attr=0xf, range=[0x0000000004800000-0x0000000007f7d000) (55MB)
kernel: efi: mem14: type=7, attr=0xf, range=[0x0000000007f7d000-0x0000000039a39000) (794MB)
kernel: efi: mem15: type=2, attr=0xf, range=[0x0000000039a39000-0x0000000040000000) (101MB)
kernel: efi: mem16: type=7, attr=0xf, range=[0x0000000040000000-0x000000004263d000) (38MB)
kernel: efi: mem17: type=2, attr=0xf, range=[0x000000004263d000-0x000000007fff2000) (985MB)
kernel: efi: mem18: type=0, attr=0xf, range=[0x000000007fff2000-0x000000007fff3000) (0MB)
kernel: efi: mem19: type=7, attr=0xf, range=[0x000000007fff3000-0x00000000f6aaf000) (1898MB)
kernel: efi: mem20: type=2, attr=0xf, range=[0x00000000f6aaf000-0x00000000f6ab0000) (0MB)
kernel: efi: mem21: type=1, attr=0xf, range=[0x00000000f6ab0000-0x00000000f6bcd000) (1MB)
kernel: efi: mem22: type=2, attr=0xf, range=[0x00000000f6bcd000-0x00000000f6cec000) (1MB)
kernel: efi: mem23: type=1, attr=0xf, range=[0x00000000f6cec000-0x00000000f6dfb000) (1MB)
kernel: efi: mem24: type=6, attr=0x800000000000000f, range=[0x00000000f6dfb000-0x00000000f6e06000) (0MB)
kernel: efi: mem25: type=9, attr=0xf, range=[0x00000000f6e06000-0x00000000f6e07000) (0MB)
kernel: efi: mem26: type=3, attr=0xf, range=[0x00000000f6e07000-0x00000000f6eea000) (0MB)
kernel: efi: mem27: type=9, attr=0xf, range=[0x00000000f6eea000-0x00000000f6ef2000) (0MB)
kernel: efi: mem28: type=6, attr=0x800000000000000f, range=[0x00000000f6ef2000-0x00000000f6f1b000) (0MB)
kernel: efi: mem29: type=7, attr=0xf, range=[0x00000000f6f1b000-0x00000000f73c1000) (4MB)
kernel: efi: mem30: type=4, attr=0xf, range=[0x00000000f73c1000-0x00000000f7e1b000) (10MB)
kernel: efi: mem31: type=3, attr=0xf, range=[0x00000000f7e1b000-0x00000000f7f9b000) (1MB)
kernel: efi: mem32: type=5, attr=0x800000000000000f, range=[0x00000000f7f9b000-0x00000000f7fcb000) (0MB)
kernel: efi: mem33: type=6, attr=0x800000000000000f, range=[0x00000000f7fcb000-0x00000000f7fef000) (0MB)
kernel: efi: mem34: type=0, attr=0xf, range=[0x00000000f7fef000-0x00000000f7ff3000) (0MB)
kernel: efi: mem35: type=9, attr=0xf, range=[0x00000000f7ff3000-0x00000000f7ffb000) (0MB)
kernel: efi: mem36: type=10, attr=0xf, range=[0x00000000f7ffb000-0x00000000f7fff000) (0MB)
kernel: efi: mem37: type=4, attr=0xf, range=[0x00000000f7fff000-0x00000000f8000000) (0MB)
kernel: efi: mem38: type=7, attr=0xf, range=[0x0000000100000000-0x0000000108000000) (128MB)
kernel: efi: mem39: type=0, attr=0x1, range=[0x00000000000c0000-0x0000000000100000) (0MB)
You see, there is a region:
kernel: efi: mem05: type=7, attr=0xf, range=[0x000000000062a000-0x0000000004200000) (59MB)
It fits the kernel and starts below 0x1000000 (16M). Since the loader
didn't load the kernel at the preferred address (16M), the EFI stub will
relocate the kernel into that low region.
I haven't seen firmware on any other platform that both provides a region
starting below 16M that is large enough to contain the kernel, and loads
the kernel at a strange address at the same time.
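For illustration, the same reasoning can be replayed against the low entries of the memmap with a small Python model. This is only a sketch under assumptions, not the kernel code: the 0x200000 alignment is assumed (it happens to reproduce the 0x800000 seen on this machine), the function names low_alloc and conflicts are hypothetical, and details of the real efi_low_alloc are left out. Type 7 is EfiConventionalMemory, i.e. free memory.

```python
# Sketch only: a Python model of a lowest-fit search, not the kernel code.
EFI_CONVENTIONAL_MEMORY = 7
INIT_SIZE = 56820 * 1024         # INIT_SIZE reported earlier in this thread
ALIGN = 0x200000                 # assumed alignment
LOAD_PHYSICAL_ADDR = 0x1000000   # 16M

# (name, type, start, end) for the low entries of the memmap in this mail
MEMMAP = [
    ("mem00", 7, 0x00000000, 0x00080000),
    ("mem01", 4, 0x00080000, 0x00081000),
    ("mem02", 2, 0x00081000, 0x00082000),
    ("mem03", 7, 0x00082000, 0x000a0000),
    ("mem04", 4, 0x00100000, 0x0062a000),
    ("mem05", 7, 0x0062a000, 0x04200000),
    ("mem06", 4, 0x04200000, 0x04400000),
    ("mem07", 7, 0x04400000, 0x045c6000),
    ("mem08", 4, 0x045c6000, 0x045e6000),
    ("mem09", 3, 0x045e6000, 0x0460b000),
    ("mem10", 4, 0x0460b000, 0x04613000),
    ("mem11", 3, 0x04613000, 0x0462b000),
    ("mem12", 7, 0x0462b000, 0x04800000),
    ("mem13", 2, 0x04800000, 0x07f7d000),
    ("mem14", 7, 0x07f7d000, 0x39a39000),
]


def align_up(addr, align):
    """Round addr up to a multiple of align (align must be a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def low_alloc(size, align, min_addr=0):
    """Return the lowest aligned base >= min_addr inside a free region
    that can hold size bytes, or None if no region fits."""
    for _name, mtype, start, end in MEMMAP:
        if mtype != EFI_CONVENTIONAL_MEMORY:
            continue
        base = align_up(max(start, min_addr), align)
        if base + size <= end:
            return base
    return None


def conflicts(base, size):
    """Names of non-free regions overlapping [base, base + size)."""
    return [name for name, mtype, start, end in MEMMAP
            if mtype != EFI_CONVENTIONAL_MEMORY
            and start < base + size and base < end]


# No lower bound: lands in mem05, below 16M.
print(hex(low_alloc(INIT_SIZE, ALIGN)))                      # 0x800000
# With a 16M lower bound, mem05 no longer fits and mem14 is used.
print(hex(low_alloc(INIT_SIZE, ALIGN, LOAD_PHYSICAL_ADDR)))  # 0x8000000
# Regions in use that a blind relocation to 16M would run into.
print(conflicts(LOAD_PHYSICAL_ADDR, INIT_SIZE))
```

In this model, a kernel-sized range starting at 16M runs past the end of mem05 and into mem06 and mem08 to mem11, which are boot services regions already in use. That matches the reset seen when head_64.S relocates there without consulting the memmap.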
>
> By the way, I personally like this way better, because it fixes a
> potential issue. The EFI boot stub code may put the kernel below 16M,
> but the relocation code in boot/compressed/head_64.S doesn't consider
> the possible conflict, and head_64.S has no way to know the EFI memmap
> information. If this patch can't be accepted, working around it in
> Hyper-V may be a way.
>
> Thanks
> Baoquan
>
Thanks for the review!
--
Best Regards,
Kairui Song