initramfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
       [not found]       ` <c692eea9213172d8ef937322b02ff585b0dfea82.camel-fHUSW+XspFU@public.gmane.org>
@ 2020-04-06  3:51         ` Arvind Sankar
  2020-04-06  7:32           ` Ard Biesheuvel
  0 siblings, 1 reply; 26+ messages in thread
From: Arvind Sankar @ 2020-04-06  3:51 UTC (permalink / raw)
  To: Sergey Shatunov
  Cc: Arvind Sankar, bp-Gina5bIWoIWzQB+pC5nmwQ,
	hpa-YMNOUZJC4hwAvxtiuMwx3w, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, tglx-hfZtesqFncYOwBW4kG4KsQ,
	x86-DgEjT+Ai2ygdnm+yROfE0A, linux-efi-u79uwXL29TY76Z2rM5mHXA,
	Ard Biesheuvel, initramfs-u79uwXL29TY76Z2rM5mHXA,
	Donovan Tremura, Harald Hoyer

On Mon, Apr 06, 2020 at 07:00:39AM +0700, Sergey Shatunov wrote:
> On Sun, 2020-04-05 at 19:18 -0400, Arvind Sankar wrote:
> > I'm not familiar with systemd-boot: when you say systemd-boot stub,
> > is
> > that something different from the kernel's EFI_STUB option? Or is it
> > just a kernel with EFI_STUB enabled and with builtin initramfs +
> > builtin
> > cmdline?
> Basicaly systemd-boot stub is efi application with packed EFI_STUB-
> enabled kernel, initrd and cmdline into single file. Source can be
> found here: 
> https://github.com/systemd/systemd/blob/master/src/boot/efi/stub.c
> 
> It doesn't do anything unusual, just extracting data from sections and
> calling efi handover.
> 
> Final image created by objcopy'ing precompiled stub and adding sections with that stuff:
> 
>     objcopy \
>         --add-section .osrel=os_release --change-section-vma
> '.osrel=0x20000' \
>         --add-section .cmdline=cmdline --change-section-vma
> '.cmdline=0x30000' \
>         --add-section .linux=vmlinuz --change-section-vma
> '.linux=0x2000000' \
>         --add-section .initrd=initrd --change-section-vma
> '.initrd=0x3000000' \
>         /usr/lib/systemd/boot/efi/linuxx64.efi.stub output.efi

So this embeds the bzImage which is a PE executable inside another PE
executable. Before my patch, the bss section was explicitly part of the
bzImage and so would have been zero, now it isn't any more and the PE
loader is expected to zero it out before executing. systemd-boot's stub
loader doesn't do that prior to jumping to the EFI handover entry, so
the issue must be because bss contains garbage.  I'm not 100% sure why
that leads to a crash, as the only variables in bss in the EFI stub are
for some boolean EFI command line arguments, so it ought to still have
worked, just as though it was invoked with random arguments. Anyway we
need to handle an uninitialized bss to get this to work properly.

I also see from systemd [0] and dracut source [1] that these VMA's seem
to be hardcoded with no checking for how big the files actually are, and
objcopy doesn't seem to complain if sections end up overlapping.

So since [2] in dracut, the space available for the .linux section
containing the bzImage shrank from ~48MiB to 16MiB. This will hopefully
still fit the compressed kernel (although an allyesconfig bzImage is far
bigger than even 48MiB), but in-place decompression is unlikely to be
possible even for a normal config, which will break another patchset
that got merged into mainline for 5.7 [3,4], which tries to avoid
copying the kernel unless necessary, and has a good chance of triggering
in-place decompression if kaslr is disabled.

I'll get systemd-boot installed here so I can reproduce and implement
some workarounds for both issues. I should hopefully have a fix in a day
or two.

[0] https://github.com/systemd/systemd/blob/9fac14980df8dcce922e1fe8856a88b09590d2c3/test/test-efi-create-disk.sh#L30
[1] https://git.kernel.org/pub/scm/boot/dracut/dracut.git/tree/dracut.sh#n2039
[2] https://git.kernel.org/pub/scm/boot/dracut/dracut.git/commit/?id=4237aeb040c276722b528001bdea31e6eb994d06
[3] https://lore.kernel.org/linux-efi/20200303221205.4048668-1-nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org/
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d5cdf4cfeac914617ca22866bd4685fd7f876dec

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
  2020-04-06  3:51         ` [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage Arvind Sankar
@ 2020-04-06  7:32           ` Ard Biesheuvel
  2020-04-06  8:47             ` Borislav Petkov
  0 siblings, 1 reply; 26+ messages in thread
From: Ard Biesheuvel @ 2020-04-06  7:32 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Sergey Shatunov, bp, hpa, Linux Kernel Mailing List, mingo,
	Thomas Gleixner, x86, linux-efi, initramfs, Donovan Tremura,
	Harald Hoyer

On Mon, 6 Apr 2020 at 05:51, Arvind Sankar <nivedita@alum.mit.edu> wrote:
>
> On Mon, Apr 06, 2020 at 07:00:39AM +0700, Sergey Shatunov wrote:
> > On Sun, 2020-04-05 at 19:18 -0400, Arvind Sankar wrote:
> > > I'm not familiar with systemd-boot: when you say systemd-boot stub,
> > > is
> > > that something different from the kernel's EFI_STUB option? Or is it
> > > just a kernel with EFI_STUB enabled and with builtin initramfs +
> > > builtin
> > > cmdline?
> > Basicaly systemd-boot stub is efi application with packed EFI_STUB-
> > enabled kernel, initrd and cmdline into single file. Source can be
> > found here:
> > https://github.com/systemd/systemd/blob/master/src/boot/efi/stub.c
> >
> > It doesn't do anything unusual, just extracting data from sections and
> > calling efi handover.
> >
> > Final image created by objcopy'ing precompiled stub and adding sections with that stuff:
> >
> >     objcopy \
> >         --add-section .osrel=os_release --change-section-vma
> > '.osrel=0x20000' \
> >         --add-section .cmdline=cmdline --change-section-vma
> > '.cmdline=0x30000' \
> >         --add-section .linux=vmlinuz --change-section-vma
> > '.linux=0x2000000' \
> >         --add-section .initrd=initrd --change-section-vma
> > '.initrd=0x3000000' \
> >         /usr/lib/systemd/boot/efi/linuxx64.efi.stub output.efi
>
> So this embeds the bzImage which is a PE executable inside another PE
> executable. Before my patch, the bss section was explicitly part of the
> bzImage and so would have been zero, now it isn't any more and the PE
> loader is expected to zero it out before executing. systemd-boot's stub
> loader doesn't do that prior to jumping to the EFI handover entry, so
> the issue must be because bss contains garbage.  I'm not 100% sure why
> that leads to a crash, as the only variables in bss in the EFI stub are
> for some boolean EFI command line arguments, so it ought to still have
> worked, just as though it was invoked with random arguments. Anyway we
> need to handle an uninitialized bss to get this to work properly.
>

The EFI handover protocol strikes again :-(

It seems we did not include any guidance in the documentation in
Documentation/x86/boot.rst regarding zero-initializing BSS, and come
to think of it, we don't include any other requirements either, i.e.,
regarding placement wrt section alignment etc. This is a serious bug.
Even though EFI usually lays out PE/COFF images in files the exact way
they appear in memory, this is not actually required by the spec. Most
notably, the virtual size can be smaller than the file size, and the
loader is expected to zero-initialize the difference as well.

Since the EFI handover protocol should be considered deprecated at
this point (and is never going to be supported in upstream GRUB
either, for instance), I would recommend the systemd-boot developers
to start looking into deprecating this as well, and switch to the
ordinary PE/COFF entry point, and use the new initrd callback protocol
for initrd loading.

On the Linux/x86 side, we should at least add some code to the EFI
handover protocol entry point to zero initialize BSS, and ensure that
it is either not needed in other places, or add the code to deal with
those as well.

> I also see from systemd [0] and dracut source [1] that these VMA's seem
> to be hardcoded with no checking for how big the files actually are, and
> objcopy doesn't seem to complain if sections end up overlapping.
>
> So since [2] in dracut, the space available for the .linux section
> containing the bzImage shrank from ~48MiB to 16MiB. This will hopefully
> still fit the compressed kernel (although an allyesconfig bzImage is far
> bigger than even 48MiB), but in-place decompression is unlikely to be
> possible even for a normal config, which will break another patchset
> that got merged into mainline for 5.7 [3,4], which tries to avoid
> copying the kernel unless necessary, and has a good chance of triggering
> in-place decompression if kaslr is disabled.
>
> I'll get systemd-boot installed here so I can reproduce and implement
> some workarounds for both issues. I should hopefully have a fix in a day
> or two.
>

Thanks Arvind.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
  2020-04-06  7:32           ` Ard Biesheuvel
@ 2020-04-06  8:47             ` Borislav Petkov
       [not found]               ` <20200406084738.GA2520-Jj63ApZU6fQ@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2020-04-06  8:47 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Arvind Sankar, Sergey Shatunov, hpa, Linux Kernel Mailing List,
	mingo, Thomas Gleixner, x86, linux-efi, initramfs,
	Donovan Tremura, Harald Hoyer

On Mon, Apr 06, 2020 at 09:32:47AM +0200, Ard Biesheuvel wrote:
> The EFI handover protocol strikes again :-(
> 
> It seems we did not include any guidance in the documentation in
> Documentation/x86/boot.rst regarding zero-initializing BSS, and come
> to think of it, we don't include any other requirements either, i.e.,
> regarding placement wrt section alignment etc. This is a serious bug.
> Even though EFI usually lays out PE/COFF images in files the exact way
> they appear in memory, this is not actually required by the spec. Most
> notably, the virtual size can be smaller than the file size, and the
> loader is expected to zero-initialize the difference as well.

Is that expectation stated explicitly somewhere?

> Since the EFI handover protocol should be considered deprecated at
> this point (and is never going to be supported in upstream GRUB
> either, for instance), I would recommend the systemd-boot developers
> to start looking into deprecating this as well, and switch to the
> ordinary PE/COFF entry point, and use the new initrd callback protocol
> for initrd loading.

Any pointers to that new initrd callback protocol?

In any case, I'd really appreciate a patch to boot.rst formulating those
requirements so that they're written down and people can find them.

> On the Linux/x86 side, we should at least add some code to the EFI
> handover protocol entry point to zero initialize BSS, and ensure that
> it is either not needed in other places, or add the code to deal with
> those as well.

Sounds like a simple fix, if that would fix it.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
       [not found]               ` <20200406084738.GA2520-Jj63ApZU6fQ@public.gmane.org>
@ 2020-04-06  9:11                 ` Ard Biesheuvel
       [not found]                   ` <CAMj1kXHAieZDvPKfjF=J+G=QVS+=XS-b4RP_=mjCEFEB_E_+Qw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Ard Biesheuvel @ 2020-04-06  9:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Arvind Sankar, Sergey Shatunov, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	Linux Kernel Mailing List, mingo-H+wXaHxf7aLQT0dZR+AlfA,
	Thomas Gleixner, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-efi,
	initramfs-u79uwXL29TY76Z2rM5mHXA, Donovan Tremura, Harald Hoyer

On Mon, 6 Apr 2020 at 10:47, Borislav Petkov <bp-Gina5bIWoIWzQB+pC5nmwQ@public.gmane.org> wrote:
>
> On Mon, Apr 06, 2020 at 09:32:47AM +0200, Ard Biesheuvel wrote:
> > The EFI handover protocol strikes again :-(
> >
> > It seems we did not include any guidance in the documentation in
> > Documentation/x86/boot.rst regarding zero-initializing BSS, and come
> > to think of it, we don't include any other requirements either, i.e.,
> > regarding placement wrt section alignment etc. This is a serious bug.
> > Even though EFI usually lays out PE/COFF images in files the exact way
> > they appear in memory, this is not actually required by the spec. Most
> > notably, the virtual size can be smaller than the file size, and the
> > loader is expected to zero-initialize the difference as well.
>
> Is that expectation stated explicitly somewhere?
>

Yes, it is in the PE/COFF specification. [0]

The whole problem is that we are conflating 'loading a PE/COFF image'
with 'copying a PE/COFF image into memory', which are not the same
thing. It is not just the layout issue, we are running into other
problems with things like UEFI secure boot and TPM-based measured
boot, where the fact that omitting the standard LoadImage() boot
service (which takes care of these things under the hood) means that
you now have to do your own checks and measurements. These things are
literally all over the place at the moment, shim, GRUB, systemd-boot
etc, with no authoritative spec that describes which component should
be doing what.

> > Since the EFI handover protocol should be considered deprecated at
> > this point (and is never going to be supported in upstream GRUB
> > either, for instance), I would recommend the systemd-boot developers
> > to start looking into deprecating this as well, and switch to the
> > ordinary PE/COFF entry point, and use the new initrd callback protocol
> > for initrd loading.
>
> Any pointers to that new initrd callback protocol?
>

Commit ec93fc371f014a6fb483e3556061ecad4b40735c has the background, but ...

> In any case, I'd really appreciate a patch to boot.rst formulating those
> requirements so that they're written down and people can find them.
>

... I'll look into updating the documentation as well. Note that this
stuff is hot off the press, so there may be some issues lurking (like
this one) that we hadn't thought of yet.

OVMF and u-boot already have implementations of this initrd loading
approach. A GRUB version is under discussion.

> > On the Linux/x86 side, we should at least add some code to the EFI
> > handover protocol entry point to zero initialize BSS, and ensure that
> > it is either not needed in other places, or add the code to deal with
> > those as well.
>
> Sounds like a simple fix, if that would fix it.
>

Actually, it may be sufficient to #define __efistub_global to
__section(.data) like we already do for ARM, to ensure that these
global flags are always initialized correctly. (I'll wait for Sergey
to confirm that the spurious enabling of the PCI DMA protection
resulting from this BSS issue is causing the boot regression)



[0] https://docs.microsoft.com/en-us/windows/win32/debug/pe-format

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
       [not found]                   ` <CAMj1kXHAieZDvPKfjF=J+G=QVS+=XS-b4RP_=mjCEFEB_E_+Qw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-04-06 11:20                     ` Borislav Petkov
  2020-04-06 13:22                       ` Arvind Sankar
  0 siblings, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2020-04-06 11:20 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Arvind Sankar, Sergey Shatunov, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	Linux Kernel Mailing List, mingo-H+wXaHxf7aLQT0dZR+AlfA,
	Thomas Gleixner, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-efi,
	initramfs-u79uwXL29TY76Z2rM5mHXA, Donovan Tremura, Harald Hoyer

On Mon, Apr 06, 2020 at 11:11:21AM +0200, Ard Biesheuvel wrote:
> Yes, it is in the PE/COFF specification. [0]
> 
> The whole problem is that we are conflating 'loading a PE/COFF image'
> with 'copying a PE/COFF image into memory', which are not the same
> thing. It is not just the layout issue, we are running into other
> problems with things like UEFI secure boot and TPM-based measured
> boot, where the fact that omitting the standard LoadImage() boot
> service (which takes care of these things under the hood) means that
> you now have to do your own checks and measurements. These things are
> literally all over the place at the moment, shim, GRUB, systemd-boot
> etc, with no authoritative spec that describes which component should
> be doing what.

Sounds to me like what LoadImage() does is what the authoritative spec
should be. Perhaps we should write it down as "Do what LoadImage()
does... " and then enumerate the requirements.

> Commit ec93fc371f014a6fb483e3556061ecad4b40735c has the background, but ...

Nice, I like the aspect of letting firmware do only a minimum amount of
work. :)

> ... I'll look into updating the documentation as well.

Thanks!

> Note that this stuff is hot off the press, so there may be some issues
> lurking (like this one) that we hadn't thought of yet.

Right.

> Actually, it may be sufficient to #define __efistub_global to
> __section(.data) like we already do for ARM, to ensure that these
> global flags are always initialized correctly. (I'll wait for Sergey
> to confirm that the spurious enabling of the PCI DMA protection
> resulting from this BSS issue is causing the boot regression)

Cool, but let's not jinx it. :-)

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
  2020-04-06 11:20                     ` Borislav Petkov
@ 2020-04-06 13:22                       ` Arvind Sankar
       [not found]                         ` <20200406132215.GA113388-ZaHV2fEG+eCZQHugZx4kin66tl449arB@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Arvind Sankar @ 2020-04-06 13:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ard Biesheuvel, Arvind Sankar, Sergey Shatunov, hpa,
	Linux Kernel Mailing List, mingo, Thomas Gleixner, x86,
	linux-efi, initramfs, Donovan Tremura, Harald Hoyer

On Mon, Apr 06, 2020 at 01:20:42PM +0200, Borislav Petkov wrote:
> On Mon, Apr 06, 2020 at 11:11:21AM +0200, Ard Biesheuvel wrote:
> > Yes, it is in the PE/COFF specification. [0]
> > 
> > The whole problem is that we are conflating 'loading a PE/COFF image'
> > with 'copying a PE/COFF image into memory', which are not the same
> > thing. It is not just the layout issue, we are running into other
> > problems with things like UEFI secure boot and TPM-based measured
> > boot, where the fact that omitting the standard LoadImage() boot
> > service (which takes care of these things under the hood) means that
> > you now have to do your own checks and measurements. These things are
> > literally all over the place at the moment, shim, GRUB, systemd-boot
> > etc, with no authoritative spec that describes which component should
> > be doing what.
> 
> Sounds to me like what LoadImage() does is what the authoritative spec
> should be. Perhaps we should write it down as "Do what LoadImage()
> does... " and then enumerate the requirements.
> 
> > Commit ec93fc371f014a6fb483e3556061ecad4b40735c has the background, but ...
> 
> Nice, I like the aspect of letting firmware do only a minimum amount of
> work. :)
> 
> > ... I'll look into updating the documentation as well.
> 
> Thanks!
> 
> > Note that this stuff is hot off the press, so there may be some issues
> > lurking (like this one) that we hadn't thought of yet.
> 
> Right.
> 
> > Actually, it may be sufficient to #define __efistub_global to
> > __section(.data) like we already do for ARM, to ensure that these
> > global flags are always initialized correctly. (I'll wait for Sergey
> > to confirm that the spurious enabling of the PCI DMA protection
> > resulting from this BSS issue is causing the boot regression)

Yeah I thought of that as the easiest fix, but it might be safer to
explicitly zero-init in efi_main to avoid future problems in case
someone adds another variable in bss and isn't aware of this obscure
requirement. We actually already have sys_table in bss, but that one is
always initialized. There's also other globals that aren't annotated
(but not in bss by virtue of having initializers). What do you think?

What do you think of the other problem -- that's actually worse to fix,
as it won't just be when kaslr is disabled, the startup_64 code will do
relocation to the end of init_size and clobber the initrd before getting
to the kaslr code, so it will break as soon as the firmware loads the
"unified kernel image" at a 2Mb-aligned address. The only thing I can
think of is to just unconditionally call efi_relocate_kernel if we were
entered via handover_entry?

> 
> Cool, but let's not jinx it. :-)
> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
       [not found]                         ` <20200406132215.GA113388-ZaHV2fEG+eCZQHugZx4kin66tl449arB@public.gmane.org>
@ 2020-04-06 13:29                           ` Ard Biesheuvel
       [not found]                             ` <CAMj1kXG+34-bK1XuxX5VopkRt1SV1ewUAEmif+aQj5cJQ=9vbA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Ard Biesheuvel @ 2020-04-06 13:29 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Borislav Petkov, Sergey Shatunov, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	Linux Kernel Mailing List, mingo-H+wXaHxf7aLQT0dZR+AlfA,
	Thomas Gleixner, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-efi,
	initramfs-u79uwXL29TY76Z2rM5mHXA, Donovan Tremura, Harald Hoyer

On Mon, 6 Apr 2020 at 15:22, Arvind Sankar <nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org> wrote:
>
> On Mon, Apr 06, 2020 at 01:20:42PM +0200, Borislav Petkov wrote:
> > On Mon, Apr 06, 2020 at 11:11:21AM +0200, Ard Biesheuvel wrote:
> > > Yes, it is in the PE/COFF specification. [0]
> > >
> > > The whole problem is that we are conflating 'loading a PE/COFF image'
> > > with 'copying a PE/COFF image into memory', which are not the same
> > > thing. It is not just the layout issue, we are running into other
> > > problems with things like UEFI secure boot and TPM-based measured
> > > boot, where the fact that omitting the standard LoadImage() boot
> > > service (which takes care of these things under the hood) means that
> > > you now have to do your own checks and measurements. These things are
> > > literally all over the place at the moment, shim, GRUB, systemd-boot
> > > etc, with no authoritative spec that describes which component should
> > > be doing what.
> >
> > Sounds to me like what LoadImage() does is what the authoritative spec
> > should be. Perhaps we should write it down as "Do what LoadImage()
> > does... " and then enumerate the requirements.
> >
> > > Commit ec93fc371f014a6fb483e3556061ecad4b40735c has the background, but ...
> >
> > Nice, I like the aspect of letting firmware do only a minimum amount of
> > work. :)
> >
> > > ... I'll look into updating the documentation as well.
> >
> > Thanks!
> >
> > > Note that this stuff is hot off the press, so there may be some issues
> > > lurking (like this one) that we hadn't thought of yet.
> >
> > Right.
> >
> > > Actually, it may be sufficient to #define __efistub_global to
> > > __section(.data) like we already do for ARM, to ensure that these
> > > global flags are always initialized correctly. (I'll wait for Sergey
> > > to confirm that the spurious enabling of the PCI DMA protection
> > > resulting from this BSS issue is causing the boot regression)
>
> Yeah I thought of that as the easiest fix, but it might be safer to
> explicitly zero-init in efi_main to avoid future problems in case
> someone adds another variable in bss and isn't aware of this obscure
> requirement. We actually already have sys_table in bss, but that one is
> always initialized. There's also other globals that aren't annotated
> (but not in bss by virtue of having initializers). What do you think?
>

*If* we zero init BSS, I'd prefer for it to be done in the EFI
handover protocol entrypoint only. But does that fix the issue that
BSS lives outside of the memory footprint of the kernel image?


> What do you think of the other problem -- that's actually worse to fix,
> as it won't just be when kaslr is disabled, the startup_64 code will do
> relocation to the end of init_size and clobber the initrd before getting
> to the kaslr code, so it will break as soon as the firmware loads the
> "unified kernel image" at a 2Mb-aligned address. The only thing I can
> think of is to just unconditionally call efi_relocate_kernel if we were
> entered via handover_entry?
>

Yes, that seems to be the most robust approach.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
       [not found]                             ` <CAMj1kXG+34-bK1XuxX5VopkRt1SV1ewUAEmif+aQj5cJQ=9vbA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-04-06 16:01                               ` Arvind Sankar
       [not found]                                 ` <20200406160148.GB113388-ZaHV2fEG+eCZQHugZx4kin66tl449arB@public.gmane.org>
  2020-04-06 17:21                               ` [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage Borislav Petkov
  1 sibling, 1 reply; 26+ messages in thread
From: Arvind Sankar @ 2020-04-06 16:01 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Arvind Sankar, Borislav Petkov, Sergey Shatunov,
	hpa-YMNOUZJC4hwAvxtiuMwx3w, Linux Kernel Mailing List,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, Thomas Gleixner,
	x86-DgEjT+Ai2ygdnm+yROfE0A, linux-efi,
	initramfs-u79uwXL29TY76Z2rM5mHXA, Donovan Tremura, Harald Hoyer

On Mon, Apr 06, 2020 at 03:29:18PM +0200, Ard Biesheuvel wrote:
> > >
> > > > Actually, it may be sufficient to #define __efistub_global to
> > > > __section(.data) like we already do for ARM, to ensure that these
> > > > global flags are always initialized correctly. (I'll wait for Sergey
> > > > to confirm that the spurious enabling of the PCI DMA protection
> > > > resulting from this BSS issue is causing the boot regression)
> >
> > Yeah I thought of that as the easiest fix, but it might be safer to
> > explicitly zero-init in efi_main to avoid future problems in case
> > someone adds another variable in bss and isn't aware of this obscure
> > requirement. We actually already have sys_table in bss, but that one is
> > always initialized. There's also other globals that aren't annotated
> > (but not in bss by virtue of having initializers). What do you think?
> >
> 
> *If* we zero init BSS, I'd prefer for it to be done in the EFI
> handover protocol entrypoint only. But does that fix the issue that
> BSS lives outside of the memory footprint of the kernel image?
> 

Yes, I was thinking of only doing it if we didn't come through the
pe_entry. We could also avoid re-parsing the command line if we add a
global flag to indicate that.

Regarding the memory footprint, if there's no initrd that might be a
problem, since in that case ImageSize will only cover the actual data
from the bzImage, so it would be safer to move them to data (including
sys_table).

We could just pull the stub's bss section all into data with objcopy
similar to what ARM64 does [1]? i.e. rename .bss to .bss.efistub and
then pull that into .data in the linker script for the compressed
kernel?

There is also this scary looking comment in gnu-efi's linker script:
	/* the EFI loader doesn't seem to like a .bss section, so we stick
	   it all into .data: */
I don't know what the history of that is.

[1] As an aside, why doesn't ARM do this as well rather than using the
section(.data) annotation?

> 
> > What do you think of the other problem -- that's actually worse to fix,
> > as it won't just be when kaslr is disabled, the startup_64 code will do
> > relocation to the end of init_size and clobber the initrd before getting
> > to the kaslr code, so it will break as soon as the firmware loads the
> > "unified kernel image" at a 2Mb-aligned address. The only thing I can
> > think of is to just unconditionally call efi_relocate_kernel if we were
> > entered via handover_entry?
> >
> 
> Yes, that seems to be the most robust approach.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
       [not found]                                 ` <20200406160148.GB113388-ZaHV2fEG+eCZQHugZx4kin66tl449arB@public.gmane.org>
@ 2020-04-06 16:22                                   ` Ard Biesheuvel
       [not found]                                     ` <CAMj1kXFKDWB9n0kWhXHLH0XP0O1v_0b=bFJad=kBvx2qVxqDcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Ard Biesheuvel @ 2020-04-06 16:22 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Borislav Petkov, Sergey Shatunov, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	Linux Kernel Mailing List, mingo-H+wXaHxf7aLQT0dZR+AlfA,
	Thomas Gleixner, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-efi,
	initramfs-u79uwXL29TY76Z2rM5mHXA, Donovan Tremura, Harald Hoyer

On Mon, 6 Apr 2020 at 18:01, Arvind Sankar <nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org> wrote:
>
> On Mon, Apr 06, 2020 at 03:29:18PM +0200, Ard Biesheuvel wrote:
> > > >
> > > > > Actually, it may be sufficient to #define __efistub_global to
> > > > > __section(.data) like we already do for ARM, to ensure that these
> > > > > global flags are always initialized correctly. (I'll wait for Sergey
> > > > > to confirm that the spurious enabling of the PCI DMA protection
> > > > > resulting from this BSS issue is causing the boot regression)
> > >
> > > Yeah I thought of that as the easiest fix, but it might be safer to
> > > explicitly zero-init in efi_main to avoid future problems in case
> > > someone adds another variable in bss and isn't aware of this obscure
> > > requirement. We actually already have sys_table in bss, but that one is
> > > always initialized. There's also other globals that aren't annotated
> > > (but not in bss by virtue of having initializers). What do you think?
> > >
> >
> > *If* we zero init BSS, I'd prefer for it to be done in the EFI
> > handover protocol entrypoint only. But does that fix the issue that
> > BSS lives outside of the memory footprint of the kernel image?
> >
>
> Yes, I was thinking of only doing it if we didn't come through the
> pe_entry. We could also avoid re-parsing the command line if we add a
> global flag to indicate that.
>

Yeah. I was trying to avoid that, but if we end up needing to
distinguish between the two cases anyway, we might just as well
optimize that too.

> Regarding the memory footprint, if there's no initrd that might be a
> problem, since in that case ImageSize will only cover the actual data
> from the bzImage, so it would be safer to move them to data (including
> sys_table).
>


> We could just pull the stub's bss section all into data with objcopy
> similar to what ARM64 does [1]? i.e. rename .bss to .bss.efistub and
> then pull that into .data in the linker script for the compressed
> kernel?
>

I don't follow. I'm not aware of arm64 doing anything out of the
ordinary with .bss? Note that arm64 does not have a separate
decompressor, so the EFI stub is part of the kernel proper. This is
why sections are renamed to start with .init

> There is also this scary looking comment in gnu-efi's linker script:
>         /* the EFI loader doesn't seem to like a .bss section, so we stick
>            it all into .data: */
> I don't know what the history of that is.
>

I don't remember, to be honest, but I'm pretty sure I copy-pasted that
from elsewhere at the time.

> [1] As an aside, why doesn't ARM do this as well rather than using the
> section(.data) annotation?
>

The ARM decompressor has this hideous hack, where text [and rodata]
executes straight from flash, and BSS is relocated into DRAM. In order
for this to work, it actually *requires* GOT indirections for BSS
items, to ensure that all BSS references use absolute addresses, which
can be relocated independently from the rest of the kernel image. This
is the reason ARM does not permit a .data section in the decompressor.
However, the EFI stub does not support execute in place from flash,
and so we can permit a .data section there. At the same time, we don't
support GOT indirections in the EFI stub, so we cannot use the BSS
section. So instead, we just put the few BSS pieces we have into .data
instead.






> >
> > > What do you think of the other problem -- that's actually worse to fix,
> > > as it won't just be when kaslr is disabled, the startup_64 code will do
> > > relocation to the end of init_size and clobber the initrd before getting
> > > to the kaslr code, so it will break as soon as the firmware loads the
> > > "unified kernel image" at a 2Mb-aligned address. The only thing I can
> > > think of is to just unconditionally call efi_relocate_kernel if we were
> > > entered via handover_entry?
> > >
> >
> > Yes, that seems to be the most robust approach.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
       [not found]                                     ` <CAMj1kXFKDWB9n0kWhXHLH0XP0O1v_0b=bFJad=kBvx2qVxqDcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-04-06 16:52                                       ` Arvind Sankar
  2020-04-06 16:59                                         ` Ard Biesheuvel
  0 siblings, 1 reply; 26+ messages in thread
From: Arvind Sankar @ 2020-04-06 16:52 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Arvind Sankar, Borislav Petkov, Sergey Shatunov,
	hpa-YMNOUZJC4hwAvxtiuMwx3w, Linux Kernel Mailing List,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, Thomas Gleixner,
	x86-DgEjT+Ai2ygdnm+yROfE0A, linux-efi,
	initramfs-u79uwXL29TY76Z2rM5mHXA, Donovan Tremura, Harald Hoyer

On Mon, Apr 06, 2020 at 06:22:33PM +0200, Ard Biesheuvel wrote:
> 
> > We could just pull the stub's bss section all into data with objcopy
> > similar to what ARM64 does [1]? i.e. rename .bss to .bss.efistub and
> > then pull that into .data in the linker script for the compressed
> > kernel?
> >
> 
> I don't follow. I'm not aware of arm64 doing anything out of the
> ordinary with .bss? Note that arm64 does not have a separate
> decompressor, so the EFI stub is part of the kernel proper. This is
> why sections are renamed to start with .init

ARM64 renames the stub sections prefixing them with .init, and that
includes .bss. The ARM64 linker script then puts .init.bss into the
.init.data section. So I was suggesting doing something similar with the
x86 stub, renaming .bss to .bss.efistub, and then putting that into
.data via linker script.

Anyway, I'll do the section annotation for now as that will be less
churn and we can revisit this after that.

> 
> > There is also this scary looking comment in gnu-efi's linker script:
> >         /* the EFI loader doesn't seem to like a .bss section, so we stick
> >            it all into .data: */
> > I don't know what the history of that is.
> >
> 
> I don't remember, to be honest, but I'm pretty sure I copy-pasted that
> from elsewhere at the time.
> 
> > [1] As an aside, why doesn't ARM do this as well rather than using the
> > section(.data) annotation?
> >
> 
> The ARM decompressor has this hideous hack, where text [and rodata]
> executes straight from flash, and BSS is relocated into DRAM. In order
> for this to work, it actually *requires* GOT indirections for BSS
> items, to ensure that all BSS references use absolute addresses, which
> can be relocated independently from the rest of the kernel image. This
> is the reason ARM does not permit a .data section in the decompressor.
> However, the EFI stub does not support execute in place from flash,
> and so we can permit a .data section there. At the same time, we don't
> support GOT indirections in the EFI stub, so we cannot use the BSS
> section. So instead, we just put the few BSS pieces we have into .data
> instead.
> 

I see, thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
  2020-04-06 16:52                                       ` Arvind Sankar
@ 2020-04-06 16:59                                         ` Ard Biesheuvel
       [not found]                                           ` <CAMj1kXEUkJ1XJ9OTsijeq8tNNYC00bXqEV44OMtX5ugo9WoLKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Ard Biesheuvel @ 2020-04-06 16:59 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Borislav Petkov, Sergey Shatunov, hpa, Linux Kernel Mailing List,
	mingo, Thomas Gleixner, x86, linux-efi, initramfs,
	Donovan Tremura, Harald Hoyer

On Mon, 6 Apr 2020 at 18:52, Arvind Sankar <nivedita@alum.mit.edu> wrote:
>
> On Mon, Apr 06, 2020 at 06:22:33PM +0200, Ard Biesheuvel wrote:
> >
> > > We could just pull the stub's bss section all into data with objcopy
> > > similar to what ARM64 does [1]? i.e. rename .bss to .bss.efistub and
> > > then pull that into .data in the linker script for the compressed
> > > kernel?
> > >
> >
> > I don't follow. I'm not aware of arm64 doing anything out of the
> > ordinary with .bss? Note that arm64 does not have a separate
> > decompressor, so the EFI stub is part of the kernel proper. This is
> > why sections are renamed to start with .init
>
> ARM64 renames the stub sections prefixing them with .init, and that
> includes .bss. The ARM64 linker script then puts .init.bss into the
> .init.data section. So I was suggesting doing something similar with the
> x86 stub, renaming .bss to .bss.efistub, and then putting that into
> .data via linker script.
>

OK, I see what you mean now. IIRC, putting that into .init.data was
simply because there was no way to selectively prefix sections, and
leave .bss alone.

But I guess we can combine all these different histories and
rationales into one coherent way of managing the stub's .bss.


> Anyway, I'll do the section annotation for now as that will be less
> churn and we can revisit this after that.
>
> >
> > > There is also this scary looking comment in gnu-efi's linker script:
> > >         /* the EFI loader doesn't seem to like a .bss section, so we stick
> > >            it all into .data: */
> > > I don't know what the history of that is.
> > >
> >
> > I don't remember, to be honest, but I'm pretty sure I copy-pasted that
> > from elsewhere at the time.
> >
> > > [1] As an aside, why doesn't ARM do this as well rather than using the
> > > section(.data) annotation?
> > >
> >
> > The ARM decompressor has this hideous hack, where text [and rodata]
> > executes straight from flash, and BSS is relocated into DRAM. In order
> > for this to work, it actually *requires* GOT indirections for BSS
> > items, to ensure that all BSS references use absolute addresses, which
> > can be relocated independently from the rest of the kernel image. This
> > is the reason ARM does not permit a .data section in the decompressor.
> > However, the EFI stub does not support execute in place from flash,
> > and so we can permit a .data section there. At the same time, we don't
> > support GOT indirections in the EFI stub, so we cannot use the BSS
> > section. So instead, we just put the few BSS pieces we have into .data
> > instead.
> >
>
> I see, thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
       [not found]                             ` <CAMj1kXG+34-bK1XuxX5VopkRt1SV1ewUAEmif+aQj5cJQ=9vbA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2020-04-06 16:01                               ` Arvind Sankar
@ 2020-04-06 17:21                               ` Borislav Petkov
  1 sibling, 0 replies; 26+ messages in thread
From: Borislav Petkov @ 2020-04-06 17:21 UTC (permalink / raw)
  To: Ard Biesheuvel, Arvind Sankar
  Cc: Sergey Shatunov, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	Linux Kernel Mailing List, mingo-H+wXaHxf7aLQT0dZR+AlfA,
	Thomas Gleixner, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-efi,
	initramfs-u79uwXL29TY76Z2rM5mHXA, Donovan Tremura, Harald Hoyer

On Mon, Apr 06, 2020 at 03:29:18PM +0200, Ard Biesheuvel wrote:
> > What do you think of the other problem -- that's actually worse to fix,
> > as it won't just be when kaslr is disabled, the startup_64 code will do
> > relocation to the end of init_size and clobber the initrd before getting
> > to the kaslr code, so it will break as soon as the firmware loads the
> > "unified kernel image" at a 2Mb-aligned address. The only thing I can
> > think of is to just unconditionally call efi_relocate_kernel if we were
> > entered via handover_entry?
> >
> 
> Yes, that seems to be the most robust approach.

The commit in question is this one:

d5cdf4cfeac9 ("efi/x86: Don't relocate the kernel unless necessary")

I presume?

I'm guessing it can simply be reverted as it doesn't fix a bug but it is
just an optimization... provided I'm not missing something, of course.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data
       [not found]                                           ` <CAMj1kXEUkJ1XJ9OTsijeq8tNNYC00bXqEV44OMtX5ugo9WoLKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-04-06 18:06                                             ` Arvind Sankar
  2020-04-06 18:06                                               ` [PATCH 2/2] efi/x86: Always relocate the kernel for EFI handover entry Arvind Sankar
                                                                 ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Arvind Sankar @ 2020-04-06 18:06 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Arvind Sankar, Borislav Petkov, Sergey Shatunov,
	hpa-YMNOUZJC4hwAvxtiuMwx3w, Linux Kernel Mailing List,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, Thomas Gleixner,
	x86-DgEjT+Ai2ygdnm+yROfE0A, linux-efi,
	initramfs-u79uwXL29TY76Z2rM5mHXA, Donovan Tremura, Harald Hoyer

Commit

  3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
  bzImage")

removed the .bss section from the bzImage.

However, while a PE loader is required to zero-initialize the .bss
section before calling the PE entry point, the EFI handover protocol
does not currently document any requirement that .bss be initialized by
the bootloader prior to calling the handover entry.

When systemd-boot is used to boot a unified kernel image [1], the image
is constructed by embedding the bzImage as a .linux section in a PE
executable that contains a small stub loader from systemd together with
additional sections and potentially an initrd. As the .bss section
within the bzImage is no longer explicitly present as part of the file,
it is not initialized before calling the EFI handover entry.
Furthermore, as the size of the embedded .linux section is only the size
of the bzImage file itself, the .bss section's memory may not even have
been allocated.

In particular, this can result in efi_disable_pci_dma being true even
when it was not specified via the command line or configuration option,
which in turn causes crashes while booting on some systems.

To avoid issues, place all EFI stub global variables into the .data
section instead of .bss. As of this writing, only boolean flags for a
few command line arguments and the sys_table pointer were in .bss and
will now move into the .data section.

[1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images

Signed-off-by: Arvind Sankar <nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
Reported-by: Sergey Shatunov <me-fHUSW+XspFU@public.gmane.org>
Fixes: 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from bzImage")
---
 drivers/firmware/efi/libstub/efistub.h  | 2 +-
 drivers/firmware/efi/libstub/x86-stub.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index cc90a748bcf0..67d26949fd26 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -25,7 +25,7 @@
 #define EFI_ALLOC_ALIGN		EFI_PAGE_SIZE
 #endif
 
-#ifdef CONFIG_ARM
+#if defined(CONFIG_ARM) || defined(CONFIG_X86)
 #define __efistub_global	__section(.data)
 #else
 #define __efistub_global
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 8d3a707789de..e7af6d2eddbf 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -20,7 +20,7 @@
 /* Maximum physical address for 64-bit kernel with 4-level paging */
 #define MAXMEM_X86_64_4LEVEL (1ull << 46)
 
-static efi_system_table_t *sys_table;
+static efi_system_table_t *sys_table __efistub_global;
 extern const bool efi_is64;
 extern u32 image_offset;
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/2] efi/x86: Always relocate the kernel for EFI handover entry
  2020-04-06 18:06                                             ` [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data Arvind Sankar
@ 2020-04-06 18:06                                               ` Arvind Sankar
       [not found]                                               ` <20200406180614.429454-1-nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
  2020-04-10 11:26                                               ` Thomas Meyer
  2 siblings, 0 replies; 26+ messages in thread
From: Arvind Sankar @ 2020-04-06 18:06 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Arvind Sankar, Borislav Petkov, Sergey Shatunov, hpa,
	Linux Kernel Mailing List, mingo, Thomas Gleixner, x86,
	linux-efi, initramfs, Donovan Tremura, Harald Hoyer

Commit

  d5cdf4cfeac9 ("efi/x86: Don't relocate the kernel unless necessary")

tries to avoid relocating the kernel in the EFI stub as far as possible.

However, when systemd-boot is used to boot a unified kernel image [1],
the image is constructed by embedding the bzImage as a .linux section in
a PE executable that contains a small stub loader from systemd that will
call the EFI stub handover entry, together with additional sections and
potentially an initrd. When this image is constructed, by for example
dracut, the initrd is placed after the bzImage without ensuring that at
least init_size bytes are available for the bzImage. If the kernel is
not relocated by the EFI stub, this could result in the compressed
kernel's startup code in head_{32,64}.S overwriting the initrd.

To prevent this, unconditionally relocate the kernel if the EFI stub was
entered via the handover entry point.

[1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Reported-by: Sergey Shatunov <me@prok.pw>
Fixes: d5cdf4cfeac9 ("efi/x86: Don't relocate the kernel unless necessary")
---
 drivers/firmware/efi/libstub/x86-stub.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index e7af6d2eddbf..7583e908852f 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -742,8 +742,15 @@ unsigned long efi_main(efi_handle_t handle,
 	 * now use KERNEL_IMAGE_SIZE, which will be 512MiB, the same as what
 	 * KASLR uses.
 	 *
-	 * Also relocate it if image_offset is zero, i.e. we weren't loaded by
-	 * LoadImage, but we are not aligned correctly.
+	 * Also relocate it if image_offset is zero, i.e. the kernel wasn't
+	 * loaded by LoadImage, but rather by a bootloader that called the
+	 * handover entry. The reason we must always relocate in this case is
+	 * to handle the case of systemd-boot booting a unified kernel image,
+	 * which is a PE executable that contains the bzImage and an initrd as
+	 * COFF sections. The initrd section is placed after the bzImage
+	 * without ensuring that there are at least init_size bytes available
+	 * for the bzImage, and thus the compressed kernel's startup code may
+	 * overwrite the initrd unless it is moved out of the way.
 	 */
 
 	buffer_start = ALIGN(bzimage_addr - image_offset,
@@ -753,8 +760,7 @@ unsigned long efi_main(efi_handle_t handle,
 	if ((buffer_start < LOAD_PHYSICAL_ADDR)				     ||
 	    (IS_ENABLED(CONFIG_X86_32) && buffer_end > KERNEL_IMAGE_SIZE)    ||
 	    (IS_ENABLED(CONFIG_X86_64) && buffer_end > MAXMEM_X86_64_4LEVEL) ||
-	    (image_offset == 0 && !IS_ALIGNED(bzimage_addr,
-					      hdr->kernel_alignment))) {
+	    (image_offset == 0)) {
 		status = efi_relocate_kernel(&bzimage_addr,
 					     hdr->init_size, hdr->init_size,
 					     hdr->pref_address,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data
       [not found]                                               ` <20200406180614.429454-1-nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
@ 2020-04-06 18:29                                                 ` Ard Biesheuvel
  2020-04-08  7:43                                                 ` Dave Young
  1 sibling, 0 replies; 26+ messages in thread
From: Ard Biesheuvel @ 2020-04-06 18:29 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Borislav Petkov, Sergey Shatunov, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	Linux Kernel Mailing List, mingo-H+wXaHxf7aLQT0dZR+AlfA,
	Thomas Gleixner, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-efi,
	initramfs-u79uwXL29TY76Z2rM5mHXA, Donovan Tremura, Harald Hoyer

On Mon, 6 Apr 2020 at 20:06, Arvind Sankar <nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org> wrote:
>
> Commit
>
>   3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
>   bzImage")
>
> removed the .bss section from the bzImage.
>
> However, while a PE loader is required to zero-initialize the .bss
> section before calling the PE entry point, the EFI handover protocol
> does not currently document any requirement that .bss be initialized by
> the bootloader prior to calling the handover entry.
>
> When systemd-boot is used to boot a unified kernel image [1], the image
> is constructed by embedding the bzImage as a .linux section in a PE
> executable that contains a small stub loader from systemd together with
> additional sections and potentially an initrd. As the .bss section
> within the bzImage is no longer explicitly present as part of the file,
> it is not initialized before calling the EFI handover entry.
> Furthermore, as the size of the embedded .linux section is only the size
> of the bzImage file itself, the .bss section's memory may not even have
> been allocated.
>
> In particular, this can result in efi_disable_pci_dma being true even
> when it was not specified via the command line or configuration option,
> which in turn causes crashes while booting on some systems.
>
> To avoid issues, place all EFI stub global variables into the .data
> section instead of .bss. As of this writing, only boolean flags for a
> few command line arguments and the sys_table pointer were in .bss and
> will now move into the .data section.
>
> [1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images
>
> Signed-off-by: Arvind Sankar <nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
> Reported-by: Sergey Shatunov <me-fHUSW+XspFU@public.gmane.org>
> Fixes: 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from bzImage")

Thanks Arvind.

This should be fine for the time being, but going forward, it would
indeed be better [as you suggested] to consolidate the section
tweaking logic that exists on different architectures for entirely
different reasons, but with similar results. That will also ensure
that BSS gets pulled into .data (or .init.data for arm64)
automatically, rather than requiring these manual annotations.

I'll queue these up in efi/urgent, and send them on later this week.


> ---
>  drivers/firmware/efi/libstub/efistub.h  | 2 +-
>  drivers/firmware/efi/libstub/x86-stub.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
> index cc90a748bcf0..67d26949fd26 100644
> --- a/drivers/firmware/efi/libstub/efistub.h
> +++ b/drivers/firmware/efi/libstub/efistub.h
> @@ -25,7 +25,7 @@
>  #define EFI_ALLOC_ALIGN                EFI_PAGE_SIZE
>  #endif
>
> -#ifdef CONFIG_ARM
> +#if defined(CONFIG_ARM) || defined(CONFIG_X86)
>  #define __efistub_global       __section(.data)
>  #else
>  #define __efistub_global
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index 8d3a707789de..e7af6d2eddbf 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -20,7 +20,7 @@
>  /* Maximum physical address for 64-bit kernel with 4-level paging */
>  #define MAXMEM_X86_64_4LEVEL (1ull << 46)
>
> -static efi_system_table_t *sys_table;
> +static efi_system_table_t *sys_table __efistub_global;
>  extern const bool efi_is64;
>  extern u32 image_offset;
>
> --
> 2.24.1
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data
       [not found]                                               ` <20200406180614.429454-1-nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
  2020-04-06 18:29                                                 ` [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data Ard Biesheuvel
@ 2020-04-08  7:43                                                 ` Dave Young
       [not found]                                                   ` <20200408074334.GA21886-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Dave Young @ 2020-04-08  7:43 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Ard Biesheuvel, Borislav Petkov, Sergey Shatunov,
	hpa-YMNOUZJC4hwAvxtiuMwx3w, Linux Kernel Mailing List,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, Thomas Gleixner,
	x86-DgEjT+Ai2ygdnm+yROfE0A, linux-efi,
	initramfs-u79uwXL29TY76Z2rM5mHXA, Donovan Tremura, Harald Hoyer

On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> Commit
> 
>   3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
>   bzImage")
> 
> removed the .bss section from the bzImage.
> 
> However, while a PE loader is required to zero-initialize the .bss
> section before calling the PE entry point, the EFI handover protocol
> does not currently document any requirement that .bss be initialized by
> the bootloader prior to calling the handover entry.
> 
> When systemd-boot is used to boot a unified kernel image [1], the image
> is constructed by embedding the bzImage as a .linux section in a PE
> executable that contains a small stub loader from systemd together with
> additional sections and potentially an initrd. As the .bss section
> within the bzImage is no longer explicitly present as part of the file,
> it is not initialized before calling the EFI handover entry.
> Furthermore, as the size of the embedded .linux section is only the size
> of the bzImage file itself, the .bss section's memory may not even have
> been allocated.

I did not follow up the old report, maybe I missed something. But not
sure why only systemd-boot is mentioned here.  I also have similar issue
with early efi failure.  With these two patches applied, it works well
then.

BTW, I use Fedora 31 + Grub2

> 
> In particular, this can result in efi_disable_pci_dma being true even
> when it was not specified via the command line or configuration option,
> which in turn causes crashes while booting on some systems.
> 
> To avoid issues, place all EFI stub global variables into the .data
> section instead of .bss. As of this writing, only boolean flags for a
> few command line arguments and the sys_table pointer were in .bss and
> will now move into the .data section.
> 
> [1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images
> 
> Signed-off-by: Arvind Sankar <nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
> Reported-by: Sergey Shatunov <me-fHUSW+XspFU@public.gmane.org>
> Fixes: 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from bzImage")
> ---
>  drivers/firmware/efi/libstub/efistub.h  | 2 +-
>  drivers/firmware/efi/libstub/x86-stub.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
> index cc90a748bcf0..67d26949fd26 100644
> --- a/drivers/firmware/efi/libstub/efistub.h
> +++ b/drivers/firmware/efi/libstub/efistub.h
> @@ -25,7 +25,7 @@
>  #define EFI_ALLOC_ALIGN		EFI_PAGE_SIZE
>  #endif
>  
> -#ifdef CONFIG_ARM
> +#if defined(CONFIG_ARM) || defined(CONFIG_X86)
>  #define __efistub_global	__section(.data)
>  #else
>  #define __efistub_global
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index 8d3a707789de..e7af6d2eddbf 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -20,7 +20,7 @@
>  /* Maximum physical address for 64-bit kernel with 4-level paging */
>  #define MAXMEM_X86_64_4LEVEL (1ull << 46)
>  
> -static efi_system_table_t *sys_table;
> +static efi_system_table_t *sys_table __efistub_global;
>  extern const bool efi_is64;
>  extern u32 image_offset;
>  
> -- 
> 2.24.1
> 

Thanks
Dave


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data
       [not found]                                                   ` <20200408074334.GA21886-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
@ 2020-04-08  7:49                                                     ` Ard Biesheuvel
  2020-04-09 14:39                                                       ` Arvind Sankar
  0 siblings, 1 reply; 26+ messages in thread
From: Ard Biesheuvel @ 2020-04-08  7:49 UTC (permalink / raw)
  To: Dave Young, pjones-H+wXaHxf7aLQT0dZR+AlfA,
	daniel.kiper-QHcLZuEGTsvQT0dZR+AlfA, Leif Lindholm
  Cc: Arvind Sankar, Borislav Petkov, Sergey Shatunov,
	hpa-YMNOUZJC4hwAvxtiuMwx3w, Linux Kernel Mailing List,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, Thomas Gleixner,
	x86-DgEjT+Ai2ygdnm+yROfE0A, linux-efi,
	initramfs-u79uwXL29TY76Z2rM5mHXA, Donovan Tremura, Harald Hoyer

(add Peter, Leif and Daniel)

On Wed, 8 Apr 2020 at 09:43, Dave Young <dyoung-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>
> On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > Commit
> >
> >   3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> >   bzImage")
> >
> > removed the .bss section from the bzImage.
> >
> > However, while a PE loader is required to zero-initialize the .bss
> > section before calling the PE entry point, the EFI handover protocol
> > does not currently document any requirement that .bss be initialized by
> > the bootloader prior to calling the handover entry.
> >
> > When systemd-boot is used to boot a unified kernel image [1], the image
> > is constructed by embedding the bzImage as a .linux section in a PE
> > executable that contains a small stub loader from systemd together with
> > additional sections and potentially an initrd. As the .bss section
> > within the bzImage is no longer explicitly present as part of the file,
> > it is not initialized before calling the EFI handover entry.
> > Furthermore, as the size of the embedded .linux section is only the size
> > of the bzImage file itself, the .bss section's memory may not even have
> > been allocated.
>
> I did not follow up the old report, maybe I missed something. But not
> sure why only systemd-boot is mentioned here.  I also have similar issue
> with early efi failure.  With these two patches applied, it works well
> then.
>
> BTW, I use Fedora 31 + Grub2
>

OK, so I take it this means that GRUB's PE/COFF loader does not
zero-initialize BSS either? Does it honor the image size in memory if
it exceeds the file size?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data
  2020-04-08  7:49                                                     ` Ard Biesheuvel
@ 2020-04-09 14:39                                                       ` Arvind Sankar
  2020-04-09 14:47                                                         ` Ard Biesheuvel
  0 siblings, 1 reply; 26+ messages in thread
From: Arvind Sankar @ 2020-04-09 14:39 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Dave Young, pjones, daniel.kiper, Leif Lindholm, Arvind Sankar,
	Borislav Petkov, Sergey Shatunov, hpa, Linux Kernel Mailing List,
	mingo, Thomas Gleixner, x86, linux-efi, initramfs,
	Donovan Tremura, Harald Hoyer

On Wed, Apr 08, 2020 at 09:49:15AM +0200, Ard Biesheuvel wrote:
> (add Peter, Leif and Daniel)
> 
> On Wed, 8 Apr 2020 at 09:43, Dave Young <dyoung@redhat.com> wrote:
> >
> > On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > > Commit
> > >
> > >   3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> > >   bzImage")
> > >
> > > removed the .bss section from the bzImage.
> > >
> > > However, while a PE loader is required to zero-initialize the .bss
> > > section before calling the PE entry point, the EFI handover protocol
> > > does not currently document any requirement that .bss be initialized by
> > > the bootloader prior to calling the handover entry.
> > >
> > > When systemd-boot is used to boot a unified kernel image [1], the image
> > > is constructed by embedding the bzImage as a .linux section in a PE
> > > executable that contains a small stub loader from systemd together with
> > > additional sections and potentially an initrd. As the .bss section
> > > within the bzImage is no longer explicitly present as part of the file,
> > > it is not initialized before calling the EFI handover entry.
> > > Furthermore, as the size of the embedded .linux section is only the size
> > > of the bzImage file itself, the .bss section's memory may not even have
> > > been allocated.
> >
> > I did not follow up the old report, maybe I missed something. But not
> > sure why only systemd-boot is mentioned here.  I also have similar issue
> > with early efi failure.  With these two patches applied, it works well
> > then.
> >
> > BTW, I use Fedora 31 + Grub2
> >
> 
> OK, so I take it this means that GRUB's PE/COFF loader does not
> zero-initialize BSS either? Does it honor the image size in memory if
> it exceeds the file size?

Dave, that comment was because the previous report was for systemd-boot
stub.

Ard, should I revise the commit message to make it clear it's not
restricted to systemd-boot but anything using handover entry may be
affected? Maybe just a "for example, when systemd-boot..." and then a
line to say grub2 with the EFI stub patches is also impacted?

https://src.fedoraproject.org/rpms/grub2/blob/f31/f/0001-Add-support-for-Linux-EFI-stub-loading.patch#_743

+  kernel_mem = grub_efi_allocate_pages_max(lh.pref_address,
+					   BYTES_TO_PAGES(lh.init_size));

Looking at this, grub does allocate init_size for the image, but it
doesn't zero it out.

This call also looks wrong to me though. It allocates at max address of
pref_address, which, if it succeeds, will guarantee that the kernel gets
loaded entirely below pref_address == LOAD_PHYSICAL_ADDR. In native
mode, if it weren't for the EFI stub copying the kernel again, this
would cause the startup code to relocate the kernel into unallocated
memory. On a mixed-mode boot, this would cause the early page tables
setup prior to transitioning to 64-bit mode to be in unallocated memory
and potentially get clobbered by the EFI stub.

The first try to allocate pref_address should be calling
grub_efi_allocate_fixed instead.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data
  2020-04-09 14:39                                                       ` Arvind Sankar
@ 2020-04-09 14:47                                                         ` Ard Biesheuvel
       [not found]                                                           ` <CAMj1kXEm=E6B+kjZktG=sBPLQ=_HFfUz6KFLskNGzRnuMjn0gA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Ard Biesheuvel @ 2020-04-09 14:47 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Dave Young, pjones, daniel.kiper, Leif Lindholm, Borislav Petkov,
	Sergey Shatunov, hpa, Linux Kernel Mailing List, mingo,
	Thomas Gleixner, X86 ML, linux-efi, initramfs, Donovan Tremura,
	Harald Hoyer

On Thu, 9 Apr 2020 at 16:39, Arvind Sankar <nivedita@alum.mit.edu> wrote:
>
> On Wed, Apr 08, 2020 at 09:49:15AM +0200, Ard Biesheuvel wrote:
> > (add Peter, Leif and Daniel)
> >
> > On Wed, 8 Apr 2020 at 09:43, Dave Young <dyoung@redhat.com> wrote:
> > >
> > > On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > > > Commit
> > > >
> > > >   3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> > > >   bzImage")
> > > >
> > > > removed the .bss section from the bzImage.
> > > >
> > > > However, while a PE loader is required to zero-initialize the .bss
> > > > section before calling the PE entry point, the EFI handover protocol
> > > > does not currently document any requirement that .bss be initialized by
> > > > the bootloader prior to calling the handover entry.
> > > >
> > > > When systemd-boot is used to boot a unified kernel image [1], the image
> > > > is constructed by embedding the bzImage as a .linux section in a PE
> > > > executable that contains a small stub loader from systemd together with
> > > > additional sections and potentially an initrd. As the .bss section
> > > > within the bzImage is no longer explicitly present as part of the file,
> > > > it is not initialized before calling the EFI handover entry.
> > > > Furthermore, as the size of the embedded .linux section is only the size
> > > > of the bzImage file itself, the .bss section's memory may not even have
> > > > been allocated.
> > >
> > > I did not follow up the old report, maybe I missed something. But not
> > > sure why only systemd-boot is mentioned here.  I also have similar issue
> > > with early efi failure.  With these two patches applied, it works well
> > > then.
> > >
> > > BTW, I use Fedora 31 + Grub2
> > >
> >
> > OK, so I take it this means that GRUB's PE/COFF loader does not
> > zero-initialize BSS either? Does it honor the image size in memory if
> > it exceeds the file size?
>
> Dave, that comment was because the previous report was for systemd-boot
> stub.
>
> Ard, should I revise the commit message to make it clear it's not
> restricted to systemd-boot but anything using handover entry may be
> affected? Maybe just a "for example, when systemd-boot..." and then a
> line to say grub2 with the EFI stub patches is also impacted?
>

Well, the fact the /some/ piece of software is used in production that
relies on the ill-defined EFI handover protocol is sufficient
justification, so I don't think it is hugely important to update it.

> https://src.fedoraproject.org/rpms/grub2/blob/f31/f/0001-Add-support-for-Linux-EFI-stub-loading.patch#_743
>
> +  kernel_mem = grub_efi_allocate_pages_max(lh.pref_address,
> +                                          BYTES_TO_PAGES(lh.init_size));
>
> Looking at this, grub does allocate init_size for the image, but it
> doesn't zero it out.
>
> This call also looks wrong to me though. It allocates at max address of
> pref_address, which, if it succeeds, will guarantee that the kernel gets
> loaded entirely below pref_address == LOAD_PHYSICAL_ADDR. In native
> mode, if it weren't for the EFI stub copying the kernel again, this
> would cause the startup code to relocate the kernel into unallocated
> memory. On a mixed-mode boot, this would cause the early page tables
> setup prior to transitioning to 64-bit mode to be in unallocated memory
> and potentially get clobbered by the EFI stub.
>
> The first try to allocate pref_address should be calling
> grub_efi_allocate_fixed instead.

Thanks Arvind. I'm sure the Fedora/RedHat folks on cc should be able
to get these logged somewhere.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data
       [not found]                                                           ` <CAMj1kXEm=E6B+kjZktG=sBPLQ=_HFfUz6KFLskNGzRnuMjn0gA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-04-09 16:35                                                             ` Arvind Sankar
       [not found]                                                               ` <20200409163530.GA785575-ZaHV2fEG+eCZQHugZx4kin66tl449arB@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Arvind Sankar @ 2020-04-09 16:35 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Arvind Sankar, Dave Young, pjones-H+wXaHxf7aLQT0dZR+AlfA,
	daniel.kiper-QHcLZuEGTsvQT0dZR+AlfA, Leif Lindholm,
	Borislav Petkov, Sergey Shatunov, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	Linux Kernel Mailing List, mingo-H+wXaHxf7aLQT0dZR+AlfA,
	Thomas Gleixner, X86 ML, linux-efi,
	initramfs-u79uwXL29TY76Z2rM5mHXA, Donovan Tremura, Harald Hoyer

On Thu, Apr 09, 2020 at 04:47:55PM +0200, Ard Biesheuvel wrote:
> On Thu, 9 Apr 2020 at 16:39, Arvind Sankar <nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org> wrote:
> >
> > On Wed, Apr 08, 2020 at 09:49:15AM +0200, Ard Biesheuvel wrote:
> > > (add Peter, Leif and Daniel)
> > >
> > > On Wed, 8 Apr 2020 at 09:43, Dave Young <dyoung-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > >
> > > > On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > > > > Commit
> > > > >
> > > > >   3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> > > > >   bzImage")
> > > > >
> > > > > removed the .bss section from the bzImage.
> > > > >
> > > > > However, while a PE loader is required to zero-initialize the .bss
> > > > > section before calling the PE entry point, the EFI handover protocol
> > > > > does not currently document any requirement that .bss be initialized by
> > > > > the bootloader prior to calling the handover entry.
> > > > >
> > > > > When systemd-boot is used to boot a unified kernel image [1], the image
> > > > > is constructed by embedding the bzImage as a .linux section in a PE
> > > > > executable that contains a small stub loader from systemd together with
> > > > > additional sections and potentially an initrd. As the .bss section
> > > > > within the bzImage is no longer explicitly present as part of the file,
> > > > > it is not initialized before calling the EFI handover entry.
> > > > > Furthermore, as the size of the embedded .linux section is only the size
> > > > > of the bzImage file itself, the .bss section's memory may not even have
> > > > > been allocated.
> > > >
> > > > I did not follow up the old report, maybe I missed something. But not
> > > > sure why only systemd-boot is mentioned here.  I also have similar issue
> > > > with early efi failure.  With these two patches applied, it works well
> > > > then.
> > > >
> > > > BTW, I use Fedora 31 + Grub2
> > > >
> > >
> > > OK, so I take it this means that GRUB's PE/COFF loader does not
> > > zero-initialize BSS either? Does it honor the image size in memory if
> > > it exceeds the file size?
> >
> > Dave, that comment was because the previous report was for systemd-boot
> > stub.
> >
> > Ard, should I revise the commit message to make it clear it's not
> > restricted to systemd-boot but anything using handover entry may be
> > affected? Maybe just a "for example, when systemd-boot..." and then a
> > line to say grub2 with the EFI stub patches is also impacted?
> >
> 
> Well, the fact the /some/ piece of software is used in production that
> relies on the ill-defined EFI handover protocol is sufficient
> justification, so I don't think it is hugely important to update it.
> 
> > https://src.fedoraproject.org/rpms/grub2/blob/f31/f/0001-Add-support-for-Linux-EFI-stub-loading.patch#_743
> >
> > +  kernel_mem = grub_efi_allocate_pages_max(lh.pref_address,
> > +                                          BYTES_TO_PAGES(lh.init_size));
> >
> > Looking at this, grub does allocate init_size for the image, but it
> > doesn't zero it out.
> >
> > This call also looks wrong to me though. It allocates at max address of
> > pref_address, which, if it succeeds, will guarantee that the kernel gets
> > loaded entirely below pref_address == LOAD_PHYSICAL_ADDR. In native
> > mode, if it weren't for the EFI stub copying the kernel again, this
> > would cause the startup code to relocate the kernel into unallocated
> > memory. On a mixed-mode boot, this would cause the early page tables
> > setup prior to transitioning to 64-bit mode to be in unallocated memory
> > and potentially get clobbered by the EFI stub.
> >
> > The first try to allocate pref_address should be calling
> > grub_efi_allocate_fixed instead.
> 
> Thanks Arvind. I'm sure the Fedora/RedHat folks on cc should be able
> to get these logged somewhere.

Ok. For dracut, the process for building the unified kernel image needs
a check to make sure the kernel can fit in the space provided for it --
there is 16MiB of space and the distro bzImage's are up to 10-11MiB in
size, so there's some slack left at present.

Additionally, in mixed-mode, the unified kernel images are quite likely
to end up with early pgtables from startup_32 clobbering the initrd,
independently of the recent kernel changes. Hopefully no-one actually
uses these in mixed-mode.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data
  2020-04-06 18:06                                             ` [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data Arvind Sankar
  2020-04-06 18:06                                               ` [PATCH 2/2] efi/x86: Always relocate the kernel for EFI handover entry Arvind Sankar
       [not found]                                               ` <20200406180614.429454-1-nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
@ 2020-04-10 11:26                                               ` Thomas Meyer
  2020-04-10 14:38                                                 ` Arvind Sankar
  2 siblings, 1 reply; 26+ messages in thread
From: Thomas Meyer @ 2020-04-10 11:26 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Ard Biesheuvel, Borislav Petkov, Sergey Shatunov, hpa,
	Linux Kernel Mailing List, mingo, Thomas Gleixner, x86,
	linux-efi, initramfs, Donovan Tremura, Harald Hoyer

On Mon, Apr 06, 2020 at 02:06:13PM -0400, Arvind Sankar wrote:

Hi,

I did write an email to x86@kernel.org, which sadly seems to have no
mailing list archive, I wonder if my problem has anything to do with the
patches you are discussing here:

I found this reply, which contains my original email in my inbox:

Subject: Kernel v5.5 doesn't boot on my x86 laptop

Hi,

I'm using an old MacBookPro1,1 to run Fedora 30 (the last one to support
x86) and a upstream up-to-date kernel, currently 5.4.16.

I'm using sd-boot to boot into an EFI-enabled kernel which contains
an embedded initram cpio image (because loading the image from kernel's
EFI stub doesn't seem to work for some unknown reason, I tried to debug
this but my early debugging foo is too weak).

Kernel 5.4.x works correctly with this setup (but resuming from disk
seems to have broken in 5.4.x or maybe even earlier - when resuming from
disk I get all kind of funky OOPSes/errors, but that's another story, hopefully
5.5.x was fixed in this regards).

So I did have a look at the commits under arch/x86/boot and "x86/boot:
Introduce setup_indirect" (b3c72fc9a78e74161f9d05ef7191706060628f8c) did
talk about "bump setup_header version in arch/x86/boot/header.S", so I
did revert above commit and I was finally able to boot into v5.5 kernel!

So either sd-boot also needs an upgrade or this commit does break
something.
Any help is welcome, don't hesitate to get in contact with me if you
have any questions.

mfg
thomas
 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data
  2020-04-10 11:26                                               ` Thomas Meyer
@ 2020-04-10 14:38                                                 ` Arvind Sankar
  2020-04-11  8:50                                                   ` Thomas Meyer
  0 siblings, 1 reply; 26+ messages in thread
From: Arvind Sankar @ 2020-04-10 14:38 UTC (permalink / raw)
  To: Thomas Meyer
  Cc: Arvind Sankar, Ard Biesheuvel, Borislav Petkov, Sergey Shatunov,
	hpa, Linux Kernel Mailing List, mingo, Thomas Gleixner, x86,
	linux-efi, initramfs, Donovan Tremura, Harald Hoyer

On Fri, Apr 10, 2020 at 01:26:05PM +0200, Thomas Meyer wrote:
> On Mon, Apr 06, 2020 at 02:06:13PM -0400, Arvind Sankar wrote:
> 
> Hi,
> 
> I did write an email to x86@kernel.org, which sadly seems to have no
> mailing list archive, I wonder if my problem has anything to do with the
> patches you are discussing here:
> 
> I found this reply, which contains my original email in my inbox:
> 
> Subject: Kernel v5.5 doesn't boot on my x86 laptop
> 
> Hi,
> 
> I'm using an old MacBookPro1,1 to run Fedora 30 (the last one to support
> x86) and a upstream up-to-date kernel, currently 5.4.16.
> 
> I'm using sd-boot to boot into an EFI-enabled kernel which contains
> an embedded initram cpio image (because loading the image from kernel's
> EFI stub doesn't seem to work for some unknown reason, I tried to debug
> this but my early debugging foo is too weak).
> 
> Kernel 5.4.x works correctly with this setup (but resuming from disk
> seems to have broken in 5.4.x or maybe even earlier - when resuming from
> disk I get all kind of funky OOPSes/errors, but that's another story, hopefully
> 5.5.x was fixed in this regards).
> 
> So I did have a look at the commits under arch/x86/boot and "x86/boot:
> Introduce setup_indirect" (b3c72fc9a78e74161f9d05ef7191706060628f8c) did
> talk about "bump setup_header version in arch/x86/boot/header.S", so I
> did revert above commit and I was finally able to boot into v5.5 kernel!
> 
> So either sd-boot also needs an upgrade or this commit does break
> something.
> Any help is welcome, don't hesitate to get in contact with me if you
> have any questions.
> 
> mfg
> thomas
>  

If it is a problem with 5.5, it would be unrelated to this thread, which
is about problems introduced by patches for the upcoming 5.7.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data
       [not found]                                                               ` <20200409163530.GA785575-ZaHV2fEG+eCZQHugZx4kin66tl449arB@public.gmane.org>
@ 2020-04-10 14:47                                                                 ` Arvind Sankar
  2020-04-10 15:26                                                                   ` Ard Biesheuvel
  0 siblings, 1 reply; 26+ messages in thread
From: Arvind Sankar @ 2020-04-10 14:47 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Ard Biesheuvel, Dave Young, pjones-H+wXaHxf7aLQT0dZR+AlfA,
	daniel.kiper-QHcLZuEGTsvQT0dZR+AlfA, Leif Lindholm,
	Borislav Petkov, Sergey Shatunov, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	Linux Kernel Mailing List, mingo-H+wXaHxf7aLQT0dZR+AlfA,
	Thomas Gleixner, X86 ML, linux-efi,
	initramfs-u79uwXL29TY76Z2rM5mHXA, Donovan Tremura, Harald Hoyer

On Thu, Apr 09, 2020 at 12:35:30PM -0400, Arvind Sankar wrote:
> On Thu, Apr 09, 2020 at 04:47:55PM +0200, Ard Biesheuvel wrote:
> > On Thu, 9 Apr 2020 at 16:39, Arvind Sankar <nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org> wrote:
> > >
> > > On Wed, Apr 08, 2020 at 09:49:15AM +0200, Ard Biesheuvel wrote:
> > > > (add Peter, Leif and Daniel)
> > > >
> > > > On Wed, 8 Apr 2020 at 09:43, Dave Young <dyoung-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > > >
> > > > > On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > > > > > Commit
> > > > > >
> > > > > >   3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> > > > > >   bzImage")
> > > > > >
> > > > > > removed the .bss section from the bzImage.
> > > > > >
> > > > > > However, while a PE loader is required to zero-initialize the .bss
> > > > > > section before calling the PE entry point, the EFI handover protocol
> > > > > > does not currently document any requirement that .bss be initialized by
> > > > > > the bootloader prior to calling the handover entry.
> > > > > >
> > > > > > When systemd-boot is used to boot a unified kernel image [1], the image
> > > > > > is constructed by embedding the bzImage as a .linux section in a PE
> > > > > > executable that contains a small stub loader from systemd together with
> > > > > > additional sections and potentially an initrd. As the .bss section
> > > > > > within the bzImage is no longer explicitly present as part of the file,
> > > > > > it is not initialized before calling the EFI handover entry.
> > > > > > Furthermore, as the size of the embedded .linux section is only the size
> > > > > > of the bzImage file itself, the .bss section's memory may not even have
> > > > > > been allocated.
> > > > >
> > > > > I did not follow up the old report, maybe I missed something. But not
> > > > > sure why only systemd-boot is mentioned here.  I also have similar issue
> > > > > with early efi failure.  With these two patches applied, it works well
> > > > > then.
> > > > >
> > > > > BTW, I use Fedora 31 + Grub2
> > > > >
> > > >
> > > > OK, so I take it this means that GRUB's PE/COFF loader does not
> > > > zero-initialize BSS either? Does it honor the image size in memory if
> > > > it exceeds the file size?
> > >
> > > Dave, that comment was because the previous report was for systemd-boot
> > > stub.
> > >
> > > Ard, should I revise the commit message to make it clear it's not
> > > restricted to systemd-boot but anything using handover entry may be
> > > affected? Maybe just a "for example, when systemd-boot..." and then a
> > > line to say grub2 with the EFI stub patches is also impacted?
> > >
> > 
> > Well, the fact the /some/ piece of software is used in production that
> > relies on the ill-defined EFI handover protocol is sufficient
> > justification, so I don't think it is hugely important to update it.
> > 
> > > https://src.fedoraproject.org/rpms/grub2/blob/f31/f/0001-Add-support-for-Linux-EFI-stub-loading.patch#_743
> > >
> > > +  kernel_mem = grub_efi_allocate_pages_max(lh.pref_address,
> > > +                                          BYTES_TO_PAGES(lh.init_size));
> > >
> > > Looking at this, grub does allocate init_size for the image, but it
> > > doesn't zero it out.
> > >
> > > This call also looks wrong to me though. It allocates at max address of
> > > pref_address, which, if it succeeds, will guarantee that the kernel gets
> > > loaded entirely below pref_address == LOAD_PHYSICAL_ADDR. In native
> > > mode, if it weren't for the EFI stub copying the kernel again, this
> > > would cause the startup code to relocate the kernel into unallocated
> > > memory. On a mixed-mode boot, this would cause the early page tables
> > > setup prior to transitioning to 64-bit mode to be in unallocated memory
> > > and potentially get clobbered by the EFI stub.
> > >
> > > The first try to allocate pref_address should be calling
> > > grub_efi_allocate_fixed instead.
> > 
> > Thanks Arvind. I'm sure the Fedora/RedHat folks on cc should be able
> > to get these logged somewhere.
> 
> Ok. For dracut, the process for building the unified kernel image needs
> a check to make sure the kernel can fit in the space provided for it --
> there is 16MiB of space and the distro bzImage's are up to 10-11MiB in
> size, so there's some slack left at present.
> 
> Additionally, in mixed-mode, the unified kernel images are quite likely
> to end up with early pgtables from startup_32 clobbering the initrd,
> independently of the recent kernel changes. Hopefully no-one actually
> uses these in mixed-mode.

The grub EFI handover entry patch is busted in mixed-mode for another
reason -- while it allocates init_size, it doesn't use the correct
alignment. I tested on a Debian buster VM in mixed-mode (that was the
one I was able to get to install/boot with mixed-mode), and the early
pagetable from startup_32 ends up in unallocated memory due to the
rounding up of the bzImage address to account for kernel alignment. This
would be an existing problem prior to these patches.

Should we try to handle this in the kernel? At some point KASLR is going
to pick that memory for the kernel and overwrite the pagetables I would
think, resulting in sporadic crashes that are almost unreproducible.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data
  2020-04-10 14:47                                                                 ` Arvind Sankar
@ 2020-04-10 15:26                                                                   ` Ard Biesheuvel
       [not found]                                                                     ` <CAMj1kXHMQgnmdoA+qKLGa=gYg4J2p-DU3-K1LiM=AT61pi+Fvw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Ard Biesheuvel @ 2020-04-10 15:26 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Dave Young, pjones, daniel.kiper, Leif Lindholm, Borislav Petkov,
	Sergey Shatunov, hpa, Linux Kernel Mailing List, mingo,
	Thomas Gleixner, X86 ML, linux-efi, initramfs, Donovan Tremura,
	Harald Hoyer

On Fri, 10 Apr 2020 at 16:48, Arvind Sankar <nivedita@alum.mit.edu> wrote:
>
> On Thu, Apr 09, 2020 at 12:35:30PM -0400, Arvind Sankar wrote:
> > On Thu, Apr 09, 2020 at 04:47:55PM +0200, Ard Biesheuvel wrote:
> > > On Thu, 9 Apr 2020 at 16:39, Arvind Sankar <nivedita@alum.mit.edu> wrote:
> > > >
> > > > On Wed, Apr 08, 2020 at 09:49:15AM +0200, Ard Biesheuvel wrote:
> > > > > (add Peter, Leif and Daniel)
> > > > >
> > > > > On Wed, 8 Apr 2020 at 09:43, Dave Young <dyoung@redhat.com> wrote:
> > > > > >
> > > > > > On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > > > > > > Commit
> > > > > > >
> > > > > > >   3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> > > > > > >   bzImage")
> > > > > > >
> > > > > > > removed the .bss section from the bzImage.
> > > > > > >
> > > > > > > However, while a PE loader is required to zero-initialize the .bss
> > > > > > > section before calling the PE entry point, the EFI handover protocol
> > > > > > > does not currently document any requirement that .bss be initialized by
> > > > > > > the bootloader prior to calling the handover entry.
> > > > > > >
> > > > > > > When systemd-boot is used to boot a unified kernel image [1], the image
> > > > > > > is constructed by embedding the bzImage as a .linux section in a PE
> > > > > > > executable that contains a small stub loader from systemd together with
> > > > > > > additional sections and potentially an initrd. As the .bss section
> > > > > > > within the bzImage is no longer explicitly present as part of the file,
> > > > > > > it is not initialized before calling the EFI handover entry.
> > > > > > > Furthermore, as the size of the embedded .linux section is only the size
> > > > > > > of the bzImage file itself, the .bss section's memory may not even have
> > > > > > > been allocated.
> > > > > >
> > > > > > I did not follow up the old report, maybe I missed something. But not
> > > > > > sure why only systemd-boot is mentioned here.  I also have similar issue
> > > > > > with early efi failure.  With these two patches applied, it works well
> > > > > > then.
> > > > > >
> > > > > > BTW, I use Fedora 31 + Grub2
> > > > > >
> > > > >
> > > > > OK, so I take it this means that GRUB's PE/COFF loader does not
> > > > > zero-initialize BSS either? Does it honor the image size in memory if
> > > > > it exceeds the file size?
> > > >
> > > > Dave, that comment was because the previous report was for systemd-boot
> > > > stub.
> > > >
> > > > Ard, should I revise the commit message to make it clear it's not
> > > > restricted to systemd-boot but anything using handover entry may be
> > > > affected? Maybe just a "for example, when systemd-boot..." and then a
> > > > line to say grub2 with the EFI stub patches is also impacted?
> > > >
> > >
> > > Well, the fact the /some/ piece of software is used in production that
> > > relies on the ill-defined EFI handover protocol is sufficient
> > > justification, so I don't think it is hugely important to update it.
> > >
> > > > https://src.fedoraproject.org/rpms/grub2/blob/f31/f/0001-Add-support-for-Linux-EFI-stub-loading.patch#_743
> > > >
> > > > +  kernel_mem = grub_efi_allocate_pages_max(lh.pref_address,
> > > > +                                          BYTES_TO_PAGES(lh.init_size));
> > > >
> > > > Looking at this, grub does allocate init_size for the image, but it
> > > > doesn't zero it out.
> > > >
> > > > This call also looks wrong to me though. It allocates at max address of
> > > > pref_address, which, if it succeeds, will guarantee that the kernel gets
> > > > loaded entirely below pref_address == LOAD_PHYSICAL_ADDR. In native
> > > > mode, if it weren't for the EFI stub copying the kernel again, this
> > > > would cause the startup code to relocate the kernel into unallocated
> > > > memory. On a mixed-mode boot, this would cause the early page tables
> > > > setup prior to transitioning to 64-bit mode to be in unallocated memory
> > > > and potentially get clobbered by the EFI stub.
> > > >
> > > > The first try to allocate pref_address should be calling
> > > > grub_efi_allocate_fixed instead.
> > >
> > > Thanks Arvind. I'm sure the Fedora/RedHat folks on cc should be able
> > > to get these logged somewhere.
> >
> > Ok. For dracut, the process for building the unified kernel image needs
> > a check to make sure the kernel can fit in the space provided for it --
> > there is 16MiB of space and the distro bzImage's are up to 10-11MiB in
> > size, so there's some slack left at present.
> >
> > Additionally, in mixed-mode, the unified kernel images are quite likely
> > to end up with early pgtables from startup_32 clobbering the initrd,
> > independently of the recent kernel changes. Hopefully no-one actually
> > uses these in mixed-mode.
>
> The grub EFI handover entry patch is busted in mixed-mode for another
> reason -- while it allocates init_size, it doesn't use the correct
> alignment. I tested on a Debian buster VM in mixed-mode (that was the
> one I was able to get to install/boot with mixed-mode), and the early
> pagetable from startup_32 ends up in unallocated memory due to the
> rounding up of the bzImage address to account for kernel alignment. This
> would be an existing problem prior to these patches.
>
> Should we try to handle this in the kernel? At some point KASLR is going
> to pick that memory for the kernel and overwrite the pagetables I would
> think, resulting in sporadic crashes that are almost unreproducible.

Upstream GRUB does not implement the EFI handover protocol at all, and
the distros all have their own GRUB forks that implement this along
with mixed mode, secure boot, shim, measured boot etc.

What you are saying is that GRUB forks turn out to exist that violate
both the PE/COFF specification and the Linux/x86 boot protocol in a
way that might break mixed mode, and nobody noticed until you happened
to find it by code inspection. While I appreciate the effort, I think
this is where I would like to draw the line, and say that there is
only so much we can do to work around bugs in out-of-tree forks of
other projects. So unless it can be done cleanly and without losing
any of the benefits of the recent cleanup and optimization work, I'd
say don't bother.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data
  2020-04-10 14:38                                                 ` Arvind Sankar
@ 2020-04-11  8:50                                                   ` Thomas Meyer
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Meyer @ 2020-04-11  8:50 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Ard Biesheuvel, Borislav Petkov, Sergey Shatunov, hpa,
	Linux Kernel Mailing List, mingo, Thomas Gleixner, x86,
	linux-efi, initramfs, Donovan Tremura, Harald Hoyer

On Fri, Apr 10, 2020 at 10:38:21AM -0400, Arvind Sankar wrote:
> On Fri, Apr 10, 2020 at 01:26:05PM +0200, Thomas Meyer wrote:
> > On Mon, Apr 06, 2020 at 02:06:13PM -0400, Arvind Sankar wrote:
> > 
> > Hi,
> > 
> > I did write an email to x86@kernel.org, which sadly seems to have no
> > mailing list archive, I wonder if my problem has anything to do with the
> > patches you are discussing here:
> > 
> > I found this reply, which contains my original email in my inbox:
> > 
> > Subject: Kernel v5.5 doesn't boot on my x86 laptop
> > 
> > Hi,
> > 
> > I'm using an old MacBookPro1,1 to run Fedora 30 (the last one to support
> > x86) and a upstream up-to-date kernel, currently 5.4.16.
> > 
> > I'm using sd-boot to boot into an EFI-enabled kernel which contains
> > an embedded initram cpio image (because loading the image from kernel's
> > EFI stub doesn't seem to work for some unknown reason, I tried to debug
> > this but my early debugging foo is too weak).
> > 
> > Kernel 5.4.x works correctly with this setup (but resuming from disk
> > seems to have broken in 5.4.x or maybe even earlier - when resuming from
> > disk I get all kind of funky OOPSes/errors, but that's another story, hopefully
> > 5.5.x was fixed in this regards).
> > 
> > So I did have a look at the commits under arch/x86/boot and "x86/boot:
> > Introduce setup_indirect" (b3c72fc9a78e74161f9d05ef7191706060628f8c) did
> > talk about "bump setup_header version in arch/x86/boot/header.S", so I
> > did revert above commit and I was finally able to boot into v5.5 kernel!
> > 
> > So either sd-boot also needs an upgrade or this commit does break
> > something.
> > Any help is welcome, don't hesitate to get in contact with me if you
> > have any questions.
> > 
> > mfg
> > thomas
> >  
> 
> If it is a problem with 5.5, it would be unrelated to this thread, which
> is about problems introduced by patches for the upcoming 5.7.

okay, ping me if I should test something on real hardware.

> 
> Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data
       [not found]                                                                     ` <CAMj1kXHMQgnmdoA+qKLGa=gYg4J2p-DU3-K1LiM=AT61pi+Fvw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-04-14 14:57                                                                       ` Daniel Kiper
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Kiper @ 2020-04-14 14:57 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Arvind Sankar, Dave Young, pjones-H+wXaHxf7aLQT0dZR+AlfA,
	Leif Lindholm, Borislav Petkov, Sergey Shatunov,
	hpa-YMNOUZJC4hwAvxtiuMwx3w, Linux Kernel Mailing List,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, Thomas Gleixner, X86 ML, linux-efi,
	initramfs-u79uwXL29TY76Z2rM5mHXA, Donovan Tremura, Harald Hoyer

On Fri, Apr 10, 2020 at 05:26:34PM +0200, Ard Biesheuvel wrote:
> On Fri, 10 Apr 2020 at 16:48, Arvind Sankar <nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org> wrote:
> > On Thu, Apr 09, 2020 at 12:35:30PM -0400, Arvind Sankar wrote:
> > > On Thu, Apr 09, 2020 at 04:47:55PM +0200, Ard Biesheuvel wrote:
> > > > On Thu, 9 Apr 2020 at 16:39, Arvind Sankar <nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org> wrote:
> > > > >
> > > > > On Wed, Apr 08, 2020 at 09:49:15AM +0200, Ard Biesheuvel wrote:
> > > > > > (add Peter, Leif and Daniel)
> > > > > >
> > > > > > On Wed, 8 Apr 2020 at 09:43, Dave Young <dyoung-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > > > > >
> > > > > > > On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > > > > > > > Commit
> > > > > > > >
> > > > > > > >   3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> > > > > > > >   bzImage")
> > > > > > > >
> > > > > > > > removed the .bss section from the bzImage.
> > > > > > > >
> > > > > > > > However, while a PE loader is required to zero-initialize the .bss
> > > > > > > > section before calling the PE entry point, the EFI handover protocol
> > > > > > > > does not currently document any requirement that .bss be initialized by
> > > > > > > > the bootloader prior to calling the handover entry.
> > > > > > > >
> > > > > > > > When systemd-boot is used to boot a unified kernel image [1], the image
> > > > > > > > is constructed by embedding the bzImage as a .linux section in a PE
> > > > > > > > executable that contains a small stub loader from systemd together with
> > > > > > > > additional sections and potentially an initrd. As the .bss section
> > > > > > > > within the bzImage is no longer explicitly present as part of the file,
> > > > > > > > it is not initialized before calling the EFI handover entry.
> > > > > > > > Furthermore, as the size of the embedded .linux section is only the size
> > > > > > > > of the bzImage file itself, the .bss section's memory may not even have
> > > > > > > > been allocated.
> > > > > > >
> > > > > > > I did not follow up the old report, maybe I missed something. But not
> > > > > > > sure why only systemd-boot is mentioned here.  I also have similar issue
> > > > > > > with early efi failure.  With these two patches applied, it works well
> > > > > > > then.
> > > > > > >
> > > > > > > BTW, I use Fedora 31 + Grub2
> > > > > > >
> > > > > >
> > > > > > OK, so I take it this means that GRUB's PE/COFF loader does not
> > > > > > zero-initialize BSS either? Does it honor the image size in memory if
> > > > > > it exceeds the file size?
> > > > >
> > > > > Dave, that comment was because the previous report was for systemd-boot
> > > > > stub.
> > > > >
> > > > > Ard, should I revise the commit message to make it clear it's not
> > > > > restricted to systemd-boot but anything using handover entry may be
> > > > > affected? Maybe just a "for example, when systemd-boot..." and then a
> > > > > line to say grub2 with the EFI stub patches is also impacted?
> > > > >
> > > >
> > > > Well, the fact the /some/ piece of software is used in production that
> > > > relies on the ill-defined EFI handover protocol is sufficient
> > > > justification, so I don't think it is hugely important to update it.
> > > >
> > > > > https://src.fedoraproject.org/rpms/grub2/blob/f31/f/0001-Add-support-for-Linux-EFI-stub-loading.patch#_743
> > > > >
> > > > > +  kernel_mem = grub_efi_allocate_pages_max(lh.pref_address,
> > > > > +                                          BYTES_TO_PAGES(lh.init_size));
> > > > >
> > > > > Looking at this, grub does allocate init_size for the image, but it
> > > > > doesn't zero it out.
> > > > >
> > > > > This call also looks wrong to me though. It allocates at max address of
> > > > > pref_address, which, if it succeeds, will guarantee that the kernel gets
> > > > > loaded entirely below pref_address == LOAD_PHYSICAL_ADDR. In native
> > > > > mode, if it weren't for the EFI stub copying the kernel again, this
> > > > > would cause the startup code to relocate the kernel into unallocated
> > > > > memory. On a mixed-mode boot, this would cause the early page tables
> > > > > setup prior to transitioning to 64-bit mode to be in unallocated memory
> > > > > and potentially get clobbered by the EFI stub.
> > > > >
> > > > > The first try to allocate pref_address should be calling
> > > > > grub_efi_allocate_fixed instead.
> > > >
> > > > Thanks Arvind. I'm sure the Fedora/RedHat folks on cc should be able
> > > > to get these logged somewhere.
> > >
> > > Ok. For dracut, the process for building the unified kernel image needs
> > > a check to make sure the kernel can fit in the space provided for it --
> > > there is 16MiB of space and the distro bzImage's are up to 10-11MiB in
> > > size, so there's some slack left at present.
> > >
> > > Additionally, in mixed-mode, the unified kernel images are quite likely
> > > to end up with early pgtables from startup_32 clobbering the initrd,
> > > independently of the recent kernel changes. Hopefully no-one actually
> > > uses these in mixed-mode.
> >
> > The grub EFI handover entry patch is busted in mixed-mode for another
> > reason -- while it allocates init_size, it doesn't use the correct
> > alignment. I tested on a Debian buster VM in mixed-mode (that was the
> > one I was able to get to install/boot with mixed-mode), and the early
> > pagetable from startup_32 ends up in unallocated memory due to the
> > rounding up of the bzImage address to account for kernel alignment. This
> > would be an existing problem prior to these patches.
> >
> > Should we try to handle this in the kernel? At some point KASLR is going
> > to pick that memory for the kernel and overwrite the pagetables I would
> > think, resulting in sporadic crashes that are almost unreproducible.
>
> Upstream GRUB does not implement the EFI handover protocol at all, and
> the distros all have their own GRUB forks that implement this along
> with mixed mode, secure boot, shim, measured boot etc.

Exactly...

> What you are saying is that GRUB forks turn out to exist that violate
> both the PE/COFF specification and the Linux/x86 boot protocol in a
> way that might break mixed mode, and nobody noticed until you happened
> to find it by code inspection. While I appreciate the effort, I think
> this is where I would like to draw the line, and say that there is
> only so much we can do to work around bugs in out-of-tree forks of
> other projects. So unless it can be done cleanly and without losing
> any of the benefits of the recent cleanup and optimization work, I'd
> say don't bother.

I fully agree!

Daniel

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-04-14 14:57 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200109150218.16544-1-nivedita@alum.mit.edu>
     [not found] ` <20200405154245.11972-1-me@prok.pw>
     [not found]   ` <20200405231845.GA3095309@rani.riverdale.lan>
     [not found]     ` <c692eea9213172d8ef937322b02ff585b0dfea82.camel@prok.pw>
     [not found]       ` <c692eea9213172d8ef937322b02ff585b0dfea82.camel-fHUSW+XspFU@public.gmane.org>
2020-04-06  3:51         ` [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage Arvind Sankar
2020-04-06  7:32           ` Ard Biesheuvel
2020-04-06  8:47             ` Borislav Petkov
     [not found]               ` <20200406084738.GA2520-Jj63ApZU6fQ@public.gmane.org>
2020-04-06  9:11                 ` Ard Biesheuvel
     [not found]                   ` <CAMj1kXHAieZDvPKfjF=J+G=QVS+=XS-b4RP_=mjCEFEB_E_+Qw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-04-06 11:20                     ` Borislav Petkov
2020-04-06 13:22                       ` Arvind Sankar
     [not found]                         ` <20200406132215.GA113388-ZaHV2fEG+eCZQHugZx4kin66tl449arB@public.gmane.org>
2020-04-06 13:29                           ` Ard Biesheuvel
     [not found]                             ` <CAMj1kXG+34-bK1XuxX5VopkRt1SV1ewUAEmif+aQj5cJQ=9vbA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-04-06 16:01                               ` Arvind Sankar
     [not found]                                 ` <20200406160148.GB113388-ZaHV2fEG+eCZQHugZx4kin66tl449arB@public.gmane.org>
2020-04-06 16:22                                   ` Ard Biesheuvel
     [not found]                                     ` <CAMj1kXFKDWB9n0kWhXHLH0XP0O1v_0b=bFJad=kBvx2qVxqDcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-04-06 16:52                                       ` Arvind Sankar
2020-04-06 16:59                                         ` Ard Biesheuvel
     [not found]                                           ` <CAMj1kXEUkJ1XJ9OTsijeq8tNNYC00bXqEV44OMtX5ugo9WoLKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-04-06 18:06                                             ` [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data Arvind Sankar
2020-04-06 18:06                                               ` [PATCH 2/2] efi/x86: Always relocate the kernel for EFI handover entry Arvind Sankar
     [not found]                                               ` <20200406180614.429454-1-nivedita-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
2020-04-06 18:29                                                 ` [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data Ard Biesheuvel
2020-04-08  7:43                                                 ` Dave Young
     [not found]                                                   ` <20200408074334.GA21886-0VdLhd/A9Pl+NNSt+8eSiB/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
2020-04-08  7:49                                                     ` Ard Biesheuvel
2020-04-09 14:39                                                       ` Arvind Sankar
2020-04-09 14:47                                                         ` Ard Biesheuvel
     [not found]                                                           ` <CAMj1kXEm=E6B+kjZktG=sBPLQ=_HFfUz6KFLskNGzRnuMjn0gA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-04-09 16:35                                                             ` Arvind Sankar
     [not found]                                                               ` <20200409163530.GA785575-ZaHV2fEG+eCZQHugZx4kin66tl449arB@public.gmane.org>
2020-04-10 14:47                                                                 ` Arvind Sankar
2020-04-10 15:26                                                                   ` Ard Biesheuvel
     [not found]                                                                     ` <CAMj1kXHMQgnmdoA+qKLGa=gYg4J2p-DU3-K1LiM=AT61pi+Fvw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-04-14 14:57                                                                       ` Daniel Kiper
2020-04-10 11:26                                               ` Thomas Meyer
2020-04-10 14:38                                                 ` Arvind Sankar
2020-04-11  8:50                                                   ` Thomas Meyer
2020-04-06 17:21                               ` [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).