Re: [PATCH] xen/arm: skip first 32 bytes of zimage32

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] xen/arm: skip first 32 bytes of zimage32
       [not found] <20220320010509.3605525-1-sstabellini@kernel.org>
@ 2022-03-20  1:43 ` Stefano Stabellini
  2022-03-20  7:47 ` Julien Grall
  1 sibling, 0 replies; 7+ messages in thread
From: Stefano Stabellini @ 2022-03-20  1:43 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: julien, bertrand.marquis, Volodymyr_Babchuk, Stefano Stabellini,
	xen-devel

+xen-devel

On Sat, 19 Mar 2022, Stefano Stabellini wrote:
> From: Stefano Stabellini <stefano.stabellini@xilinx.com>
> 
> The first 32 bytes of zImage32 are NOPs, not useful just there for
> compatibility. The reason is that some bootloaders skip the first 32
> bytes when starting the kernel. See the comment in Linux
> arch/arm/boot/compressed/head.S.
> 
> Since the introduction of CONFIG_EFI in Linux arm32, those NOPs
> operations have changed implementation from:
> 
>     mov r0, r0
> 
> to:
>     .inst   MZ_MAGIC | (0x1310 << 16)   @ tstne r0, #0x4d000
> 
> See arch/arm/boot/compressed/efi-header.S.
> 
> The new implementation doesn't work on Xen (at least on all versions of
> QEMU I tried).
> 
> Since the first 32 bytes are made to be skipped anyway, skip them. This
> enables Xen to load and start successfully (on QEMU) aarch32 kernels
> with CONFIG_EFI.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> ---
>  xen/arch/arm/kernel.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
> index 8f43caa186..105a010bf4 100644
> --- a/xen/arch/arm/kernel.c
> +++ b/xen/arch/arm/kernel.c
> @@ -411,10 +411,10 @@ static int __init kernel_zimage32_probe(struct kernel_info *info,
>          }
>      }
>  
> -    info->zimage.kernel_addr = addr;
> +    info->zimage.kernel_addr = addr + 32;
>  
>      info->zimage.start = start;
> -    info->zimage.len = end - start;
> +    info->zimage.len = end - start - 32;
>  
>      info->load = kernel_zimage_load;
>  
> -- 
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen/arm: skip first 32 bytes of zimage32
       [not found] <20220320010509.3605525-1-sstabellini@kernel.org>
  2022-03-20  1:43 ` [PATCH] xen/arm: skip first 32 bytes of zimage32 Stefano Stabellini
@ 2022-03-20  7:47 ` Julien Grall
  2022-03-20 19:17   ` Julien Grall
  1 sibling, 1 reply; 7+ messages in thread
From: Julien Grall @ 2022-03-20  7:47 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: bertrand.marquis, Volodymyr_Babchuk, Stefano Stabellini, xen-devel

Hi Stefano,

On 20/03/2022 01:05, Stefano Stabellini wrote:
> From: Stefano Stabellini <stefano.stabellini@xilinx.com>
> 
> The first 32 bytes of zImage32 are NOPs, not useful just there for
> compatibility. The reason is that some bootloaders skip the first 32
> bytes when starting the kernel. See the comment in Linux
> arch/arm/boot/compressed/head.S.

Please mention the Linux verson.

> 
> Since the introduction of CONFIG_EFI in Linux arm32, those NOPs
> operations have changed implementation from:
> 
>      mov r0, r0
> 
> to:
>      .inst   MZ_MAGIC | (0x1310 << 16)   @ tstne r0, #0x4d000

I have duplicated the comment and the instructions below:

                 @ This is a two-instruction NOP, which happens to bear the
                 @ PE/COFF signature "MZ" in the first two bytes, so the 
kernel
                 @ is accepted as an EFI binary. Booting via the UEFI stub
                 @ will not execute those instructions, but the ARM/Linux
                 @ boot protocol does, so we need some NOPs here.
                 .inst   MZ_MAGIC | (0xe225 << 16)       @ eor r5, r5, 
0x4d000
                 eor     r5, r5, 0x4d000                 @ undo previous 
insn


I read this as they are NOPs and this change should not break the 
ARM/Linux boot protocol (we are using it in Xen).

BTW, the instruction decoding is different compare to me. Which version 
of Linux are you using?

> 
> See arch/arm/boot/compressed/efi-header.S.
> 
> The new implementation doesn't work on Xen (at least on all versions of
> QEMU I tried).

As I wrote above, they are NOPs. So why is this breaking?

> 
> Since the first 32 bytes are made to be skipped anyway, skip them. This
> enables Xen to load and start successfully (on QEMU) aarch32 kernels
> with CONFIG_EFI.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> ---
>   xen/arch/arm/kernel.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
> index 8f43caa186..105a010bf4 100644
> --- a/xen/arch/arm/kernel.c
> +++ b/xen/arch/arm/kernel.c
> @@ -411,10 +411,10 @@ static int __init kernel_zimage32_probe(struct kernel_info *info,
>           }
>       }
>   
> -    info->zimage.kernel_addr = addr;
> +    info->zimage.kernel_addr = addr + 32;

This will need some explanation in the code. The code in the tools will 
also need to be updated.

>   
>       info->zimage.start = start;
> -    info->zimage.len = end - start;
> +    info->zimage.len = end - start - 32;
>   
>       info->load = kernel_zimage_load;
>   

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen/arm: skip first 32 bytes of zimage32
  2022-03-20  7:47 ` Julien Grall
@ 2022-03-20 19:17   ` Julien Grall
  2022-03-21 19:29     ` Stefano Stabellini
  0 siblings, 1 reply; 7+ messages in thread
From: Julien Grall @ 2022-03-20 19:17 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: bertrand.marquis, Volodymyr_Babchuk, Stefano Stabellini, xen-devel

Hi,

On 20/03/2022 07:47, Julien Grall wrote:
> On 20/03/2022 01:05, Stefano Stabellini wrote:
>> From: Stefano Stabellini <stefano.stabellini@xilinx.com>
>>
>> The first 32 bytes of zImage32 are NOPs, not useful just there for
>> compatibility. The reason is that some bootloaders skip the first 32
>> bytes when starting the kernel. See the comment in Linux
>> arch/arm/boot/compressed/head.S.
> 
> Please mention the Linux verson.
> 
>>
>> Since the introduction of CONFIG_EFI in Linux arm32, those NOPs
>> operations have changed implementation from:
>>
>>      mov r0, r0
>>
>> to:
>>      .inst   MZ_MAGIC | (0x1310 << 16)   @ tstne r0, #0x4d000
> 
> I have duplicated the comment and the instructions below:
> 
>                  @ This is a two-instruction NOP, which happens to bear the
>                  @ PE/COFF signature "MZ" in the first two bytes, so the 
> kernel
>                  @ is accepted as an EFI binary. Booting via the UEFI stub
>                  @ will not execute those instructions, but the ARM/Linux
>                  @ boot protocol does, so we need some NOPs here.
>                  .inst   MZ_MAGIC | (0xe225 << 16)       @ eor r5, r5, 
> 0x4d000
>                  eor     r5, r5, 0x4d000                 @ undo previous 
> insn
> 
> 
> I read this as they are NOPs and this change should not break the 
> ARM/Linux boot protocol (we are using it in Xen).
> 
> BTW, the instruction decoding is different compare to me. Which version 
> of Linux are you using?
> 
>>
>> See arch/arm/boot/compressed/efi-header.S.
>>
>> The new implementation doesn't work on Xen (at least on all versions of
>> QEMU I tried).
> 
> As I wrote above, they are NOPs. So why is this breaking?

I have tried to boot the latest Linux (commit 14702b3b2438) with 
CONFIG_EFI=y on QEMU (commit fa435db8ce1d). This booted for me.

As I wrote earlier today, the instruction used as NOPs is slightly 
different. So I had a look at the git history and found the following 
commit:

commit a92882a4d270
Author: Andre Przywara <andre.przywara@arm.com>
Date:   Mon Nov 22 16:28:43 2021 +0100

     ARM: 9159/1: decompressor: Avoid UNPREDICTABLE NOP encoding

     In the decompressor's head.S we need to start with an instruction that
     is some kind of NOP, but also mimics as the PE/COFF header, when the
     kernel is linked as an UEFI application. The clever solution here is
     "tstne r0, #0x4d000", which in the worst case just clobbers the
     condition flags, and bears the magic "MZ" signature in the lowest 
16 bits.

     However the encoding used (0x13105a4d) is actually not valid, since 
bits
     [15:12] are supposed to be 0 (written as "(0)" in the ARM ARM).
     Violating this is UNPREDICTABLE, and *can* trigger an UNDEFINED
     exception. Common Cortex cores seem to ignore those bits, but QEMU
     chooses to trap, so the code goes fishing because of a missing 
exception
     handler at this point. We are just saved by the fact that commonly 
(with
     -kernel or when running from U-Boot) the "Z" bit is set, so the
     instruction is never executed. See [0] for more details.

     To make things more robust and avoid UNPREDICTABLE behaviour in the
     kernel code, lets replace this with a "two-instruction NOP":
     The first instruction is an exclusive OR, the effect of which the 
second
     instruction reverts. This does not leave any trace, neither in a
     register nor in the condition flags. Also it's a perfectly valid
     encoding. Kudos to Peter Maydell for coming up with this gem.

     [0] 
https://lore.kernel.org/qemu-devel/YTPIdbUCmwagL5%2FD@os.inf.tu-dresden.de/T/

     Link: 
https://lore.kernel.org/linux-arm-kernel/20210908162617.104962-1-andre.przywara@arm.com/T/

So this is a bug in the kernel that has nothing to do with Xen.

Therefore, I am not in favor to workaround it in Xen. Where did you get 
your kernel from? If this from a distro, then please work with them to 
ingest the above patch.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen/arm: skip first 32 bytes of zimage32
  2022-03-20 19:17   ` Julien Grall
@ 2022-03-21 19:29     ` Stefano Stabellini
  2022-03-21 20:27       ` Julien Grall
  0 siblings, 1 reply; 7+ messages in thread
From: Stefano Stabellini @ 2022-03-21 19:29 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, bertrand.marquis, Volodymyr_Babchuk,
	Stefano Stabellini, xen-devel

[-- Attachment #1: Type: text/plain, Size: 5761 bytes --]

On Sun, 20 Mar 2022, Julien Grall wrote:
> On 20/03/2022 07:47, Julien Grall wrote:
> > On 20/03/2022 01:05, Stefano Stabellini wrote:
> > > From: Stefano Stabellini <stefano.stabellini@xilinx.com>
> > > 
> > > The first 32 bytes of zImage32 are NOPs, not useful just there for
> > > compatibility. The reason is that some bootloaders skip the first 32
> > > bytes when starting the kernel. See the comment in Linux
> > > arch/arm/boot/compressed/head.S.
> > 
> > Please mention the Linux verson.
> >
> > > Since the introduction of CONFIG_EFI in Linux arm32, those NOPs
> > > operations have changed implementation from:
> > > 
> > >      mov r0, r0
> > > 
> > > to:
> > >      .inst   MZ_MAGIC | (0x1310 << 16)   @ tstne r0, #0x4d000
> > 
> > I have duplicated the comment and the instructions below:
> > 
> >                  @ This is a two-instruction NOP, which happens to bear the
> >                  @ PE/COFF signature "MZ" in the first two bytes, so the
> > kernel
> >                  @ is accepted as an EFI binary. Booting via the UEFI stub
> >                  @ will not execute those instructions, but the ARM/Linux
> >                  @ boot protocol does, so we need some NOPs here.
> >                  .inst   MZ_MAGIC | (0xe225 << 16)       @ eor r5, r5,
> > 0x4d000
> >                  eor     r5, r5, 0x4d000                 @ undo previous
> > insn
> > 
> > 
> > I read this as they are NOPs and this change should not break the ARM/Linux
> > boot protocol (we are using it in Xen).
> > 
> > BTW, the instruction decoding is different compare to me. Which version of
> > Linux are you using?
> > 
> > > 
> > > See arch/arm/boot/compressed/efi-header.S.
> > > 
> > > The new implementation doesn't work on Xen (at least on all versions of
> > > QEMU I tried).
> > 
> > As I wrote above, they are NOPs. So why is this breaking?
> 
> I have tried to boot the latest Linux (commit 14702b3b2438) with CONFIG_EFI=y
> on QEMU (commit fa435db8ce1d). This booted for me.
>
> As I wrote earlier today, the instruction used as NOPs is slightly different.
> So I had a look at the git history and found the following commit:
> 
> commit a92882a4d270
> Author: Andre Przywara <andre.przywara@arm.com>
> Date:   Mon Nov 22 16:28:43 2021 +0100
> 
>     ARM: 9159/1: decompressor: Avoid UNPREDICTABLE NOP encoding
> 
>     In the decompressor's head.S we need to start with an instruction that
>     is some kind of NOP, but also mimics as the PE/COFF header, when the
>     kernel is linked as an UEFI application. The clever solution here is
>     "tstne r0, #0x4d000", which in the worst case just clobbers the
>     condition flags, and bears the magic "MZ" signature in the lowest 16 bits.
> 
>     However the encoding used (0x13105a4d) is actually not valid, since bits
>     [15:12] are supposed to be 0 (written as "(0)" in the ARM ARM).
>     Violating this is UNPREDICTABLE, and *can* trigger an UNDEFINED
>     exception. Common Cortex cores seem to ignore those bits, but QEMU
>     chooses to trap, so the code goes fishing because of a missing exception
>     handler at this point. We are just saved by the fact that commonly (with
>     -kernel or when running from U-Boot) the "Z" bit is set, so the
>     instruction is never executed. See [0] for more details.
> 
>     To make things more robust and avoid UNPREDICTABLE behaviour in the
>     kernel code, lets replace this with a "two-instruction NOP":
>     The first instruction is an exclusive OR, the effect of which the second
>     instruction reverts. This does not leave any trace, neither in a
>     register nor in the condition flags. Also it's a perfectly valid
>     encoding. Kudos to Peter Maydell for coming up with this gem.
> 
>     [0]
> https://lore.kernel.org/qemu-devel/YTPIdbUCmwagL5%2FD@os.inf.tu-dresden.de/T/
> 
>     Link:
> https://lore.kernel.org/linux-arm-kernel/20210908162617.104962-1-andre.przywara@arm.com/T/
> 
> So this is a bug in the kernel that has nothing to do with Xen.
> 
> Therefore, I am not in favor to workaround it in Xen. Where did you get your
> kernel from? If this from a distro, then please work with them to ingest the
> above patch.

Unfortunately all the kernels I tried failed without the Xen fix.

This is the list of kernels that I tried and failed:

- Debian Buster
- Debian Bullseye
- vanilla 4.9
- vanilla 4.10

The latest Alpine Linux kernel also doesn't boot, but that one doesn't
boot even with the fix for other reason. (More in the other email.)


From a Xen gitlab-ci perspective, we just need a kernel that works.
Ideally, we wouldn't rebuild our own but reuse an existing kernel
because that is one less things to maintain in the gitlab-ci build.

We have a couple of options to make progress on the QEMU32 gitlab-ci
test:

1) use Debian Jessie in gitlab-ci and do not commit the Xen-side fix,
   file a Debian bug and revisit the situation in a couple of months
   (Debian might get the fix in the meantime)

2) commit the Xen fix and use Debian Bullseye right now

3) do not commit the Xen fix and build our own kernel now


All of these options work. My preference is 1) or 2).

Between 1) and 2) I have a slight preference for 2) for this reason: I
know that in Open Source we try to fix bugs wherever they are (kernel
project, QEMU project, Debian project) rather than working around them,
but in this case it looks like there is a significant amount of binaries
out there that require an update before they can boot on Xen. I think it
is one of those times where it is worth considering the Xen fix, or
should I say workaround, if it is considered harmless.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen/arm: skip first 32 bytes of zimage32
  2022-03-21 19:29     ` Stefano Stabellini
@ 2022-03-21 20:27       ` Julien Grall
  2022-03-21 22:07         ` Stefano Stabellini
  0 siblings, 1 reply; 7+ messages in thread
From: Julien Grall @ 2022-03-21 20:27 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: bertrand.marquis, Volodymyr_Babchuk, Stefano Stabellini, xen-devel

Hi,

On 21/03/2022 19:29, Stefano Stabellini wrote:
> On Sun, 20 Mar 2022, Julien Grall wrote:
>> On 20/03/2022 07:47, Julien Grall wrote:
>>> On 20/03/2022 01:05, Stefano Stabellini wrote:
>>>> From: Stefano Stabellini <stefano.stabellini@xilinx.com>
>>>>
>>>> The first 32 bytes of zImage32 are NOPs, not useful just there for
>>>> compatibility. The reason is that some bootloaders skip the first 32
>>>> bytes when starting the kernel. See the comment in Linux
>>>> arch/arm/boot/compressed/head.S.
>>>
>>> Please mention the Linux verson.
>>>
>>>> Since the introduction of CONFIG_EFI in Linux arm32, those NOPs
>>>> operations have changed implementation from:
>>>>
>>>>       mov r0, r0
>>>>
>>>> to:
>>>>       .inst   MZ_MAGIC | (0x1310 << 16)   @ tstne r0, #0x4d000
>>>
>>> I have duplicated the comment and the instructions below:
>>>
>>>                   @ This is a two-instruction NOP, which happens to bear the
>>>                   @ PE/COFF signature "MZ" in the first two bytes, so the
>>> kernel
>>>                   @ is accepted as an EFI binary. Booting via the UEFI stub
>>>                   @ will not execute those instructions, but the ARM/Linux
>>>                   @ boot protocol does, so we need some NOPs here.
>>>                   .inst   MZ_MAGIC | (0xe225 << 16)       @ eor r5, r5,
>>> 0x4d000
>>>                   eor     r5, r5, 0x4d000                 @ undo previous
>>> insn
>>>
>>>
>>> I read this as they are NOPs and this change should not break the ARM/Linux
>>> boot protocol (we are using it in Xen).
>>>
>>> BTW, the instruction decoding is different compare to me. Which version of
>>> Linux are you using?
>>>
>>>>
>>>> See arch/arm/boot/compressed/efi-header.S.
>>>>
>>>> The new implementation doesn't work on Xen (at least on all versions of
>>>> QEMU I tried).
>>>
>>> As I wrote above, they are NOPs. So why is this breaking?
>>
>> I have tried to boot the latest Linux (commit 14702b3b2438) with CONFIG_EFI=y
>> on QEMU (commit fa435db8ce1d). This booted for me.
>>
>> As I wrote earlier today, the instruction used as NOPs is slightly different.
>> So I had a look at the git history and found the following commit:
>>
>> commit a92882a4d270
>> Author: Andre Przywara <andre.przywara@arm.com>
>> Date:   Mon Nov 22 16:28:43 2021 +0100
>>
>>      ARM: 9159/1: decompressor: Avoid UNPREDICTABLE NOP encoding
>>
>>      In the decompressor's head.S we need to start with an instruction that
>>      is some kind of NOP, but also mimics as the PE/COFF header, when the
>>      kernel is linked as an UEFI application. The clever solution here is
>>      "tstne r0, #0x4d000", which in the worst case just clobbers the
>>      condition flags, and bears the magic "MZ" signature in the lowest 16 bits.
>>
>>      However the encoding used (0x13105a4d) is actually not valid, since bits
>>      [15:12] are supposed to be 0 (written as "(0)" in the ARM ARM).
>>      Violating this is UNPREDICTABLE, and *can* trigger an UNDEFINED
>>      exception. Common Cortex cores seem to ignore those bits, but QEMU
>>      chooses to trap, so the code goes fishing because of a missing exception
>>      handler at this point. We are just saved by the fact that commonly (with
>>      -kernel or when running from U-Boot) the "Z" bit is set, so the
>>      instruction is never executed. See [0] for more details.
>>
>>      To make things more robust and avoid UNPREDICTABLE behaviour in the
>>      kernel code, lets replace this with a "two-instruction NOP":
>>      The first instruction is an exclusive OR, the effect of which the second
>>      instruction reverts. This does not leave any trace, neither in a
>>      register nor in the condition flags. Also it's a perfectly valid
>>      encoding. Kudos to Peter Maydell for coming up with this gem.
>>
>>      [0]
>> https://lore.kernel.org/qemu-devel/YTPIdbUCmwagL5%2FD@os.inf.tu-dresden.de/T/
>>
>>      Link:
>> https://lore.kernel.org/linux-arm-kernel/20210908162617.104962-1-andre.przywara@arm.com/T/
>>
>> So this is a bug in the kernel that has nothing to do with Xen.
>>
>> Therefore, I am not in favor to workaround it in Xen. Where did you get your
>> kernel from? If this from a distro, then please work with them to ingest the
>> above patch.
> 
> Unfortunately all the kernels I tried failed without the Xen fix.
> 
> This is the list of kernels that I tried and failed:
> 
> - Debian Buster
> - Debian Bullseye
> - vanilla 4.9
> - vanilla 4.10

Does this mean you where using v4.{9,10}.0 rather than the stable branch?

Note that AFAICT, 4.10 is not even supported by the kernel communinty 
(see [1]).

> 
> The latest Alpine Linux kernel also doesn't boot, but that one doesn't
> boot even with the fix for other reason. (More in the other email.)
> 
> 
>  From a Xen gitlab-ci perspective, we just need a kernel that works.
> Ideally, we wouldn't rebuild our own but reuse an existing kernel
> because that is one less things to maintain in the gitlab-ci build.
> 
> We have a couple of options to make progress on the QEMU32 gitlab-ci
> test:
> 
> 1) use Debian Jessie in gitlab-ci and do not commit the Xen-side fix,
>     file a Debian bug and revisit the situation in a couple of months
>     (Debian might get the fix in the meantime)

Why do we need to use Debian here? Couldn't we use Ubuntu (or any 
distros that have a newer kernel)?

> 
> 2) commit the Xen fix and use Debian Bullseye right now
> 
> 3) do not commit the Xen fix and build our own kernel now
> 
> 
> All of these options work. My preference is 1) or 2).
> 
> Between 1) and 2) I have a slight preference for 2) for this reason: I
> know that in Open Source we try to fix bugs wherever they are (kernel
> project, QEMU project, Debian project) rather than working around them,
> but in this case it looks like there is a significant amount of binaries
> out there that require an update before they can boot on Xen. 

My problem here is the bug is not Xen specific. You would exactly have 
the same problem if you were booting on baremetal. But as Andre wrote in 
his commit message, this is only working by luck.

So we are setting another bad precendence by preserving the luck.

I appreciate we recently agreed to merge the code to emulate ldr/str. 
But this was based on the fact the Arm Arm doesn't clearly forbid such 
access. This is quite different here as the instructions are UNDEFINED.

So I am not willing to accept a lot of code in Xen just to workaround a 
software bug (see more below).

> I think it
> is one of those times where it is worth considering the Xen fix, or
> should I say workaround, if it is considered harmless.

The problem with your workaround is now the zImage will be loaded in a 
different aligned. For instance, if the symbol <foo> was 2MB aligned, 
now, it will be aligned to 2MB - 32. See kernel_zimage_load().

The booting protocol (see Documentation/arm/booting.rst). Doesn't look 
to impose an alignment. But I wouldn't be surprised if there is an 
assumption here.

Furthermore, there are some problem if the kernel is expected to be 
loaded a specific address.

I do expect the code to become more convoluted. This would also have to 
be duplicated in the tools side. Overall, this will likely be more than 
I am willing to accept for this issue.

Therefore I would like to suggest a more simple workaround. Per the 
commit message of the Linux bug fix, U-boot and direct loading are not 
affected because the bit "Z" is set.

So how about updating PSR_GUEST32_INIT to set Z?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen/arm: skip first 32 bytes of zimage32
  2022-03-21 20:27       ` Julien Grall
@ 2022-03-21 22:07         ` Stefano Stabellini
  2022-03-22 13:52           ` Julien Grall
  0 siblings, 1 reply; 7+ messages in thread
From: Stefano Stabellini @ 2022-03-21 22:07 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, bertrand.marquis, Volodymyr_Babchuk,
	Stefano Stabellini, xen-devel

On Mon, 21 Mar 2022, Julien Grall wrote:
> > This is the list of kernels that I tried and failed:
> > 
> > - Debian Buster
> > - Debian Bullseye
> > - vanilla 4.9
> > - vanilla 4.10
> 
> Does this mean you where using v4.{9,10}.0 rather than the stable branch?
> 
> Note that AFAICT, 4.10 is not even supported by the kernel communinty (see
> [1]).

Yeah... I was trying to bisect the Debian kernel failure. That is how I
discovered that CONFIG_EFI was causing it. So yes, I only tried the
vanilla, now unsupported, releases.


> > The latest Alpine Linux kernel also doesn't boot, but that one doesn't
> > boot even with the fix for other reason. (More in the other email.)
> > 
> > 
> >  From a Xen gitlab-ci perspective, we just need a kernel that works.
> > Ideally, we wouldn't rebuild our own but reuse an existing kernel
> > because that is one less things to maintain in the gitlab-ci build.
> > 
> > We have a couple of options to make progress on the QEMU32 gitlab-ci
> > test:
> > 
> > 1) use Debian Jessie in gitlab-ci and do not commit the Xen-side fix,
> >     file a Debian bug and revisit the situation in a couple of months
> >     (Debian might get the fix in the meantime)
> 
> Why do we need to use Debian here? Couldn't we use Ubuntu (or any distros that
> have a newer kernel)?

We could use something else but see below.


> > 2) commit the Xen fix and use Debian Bullseye right now
> > 
> > 3) do not commit the Xen fix and build our own kernel now
> > 
> > 
> > All of these options work. My preference is 1) or 2).
> > 
> > Between 1) and 2) I have a slight preference for 2) for this reason: I
> > know that in Open Source we try to fix bugs wherever they are (kernel
> > project, QEMU project, Debian project) rather than working around them,
> > but in this case it looks like there is a significant amount of binaries
> > out there that require an update before they can boot on Xen. 
> 
> My problem here is the bug is not Xen specific. You would exactly have the
> same problem if you were booting on baremetal. But as Andre wrote in his
> commit message, this is only working by luck.
> 
> So we are setting another bad precendence by preserving the luck.
> 
> I appreciate we recently agreed to merge the code to emulate ldr/str. But this
> was based on the fact the Arm Arm doesn't clearly forbid such access. This is
> quite different here as the instructions are UNDEFINED.

Yeah, I understand your point and I also kind of agree.


> So I am not willing to accept a lot of code in Xen just to workaround a
> software bug (see more below).
> 
> > I think it
> > is one of those times where it is worth considering the Xen fix, or
> > should I say workaround, if it is considered harmless.
> 
> The problem with your workaround is now the zImage will be loaded in a
> different aligned. For instance, if the symbol <foo> was 2MB aligned, now, it
> will be aligned to 2MB - 32. See kernel_zimage_load().
> 
> The booting protocol (see Documentation/arm/booting.rst). Doesn't look to
> impose an alignment. But I wouldn't be surprised if there is an assumption
> here.
> 
> Furthermore, there are some problem if the kernel is expected to be loaded a
> specific address.
> 
> I do expect the code to become more convoluted. This would also have to be
> duplicated in the tools side. Overall, this will likely be more than I am
> willing to accept for this issue.
> 
> Therefore I would like to suggest a more simple workaround. Per the commit
> message of the Linux bug fix, U-boot and direct loading are not affected
> because the bit "Z" is set.
> 
> So how about updating PSR_GUEST32_INIT to set Z?

That worked! Excellent suggestion and much safer than the 32 byte
workaround. I tested with the Debian Bullseye kernel.

I think you might have a better suggestion for the commit message.

---
xen/arm: set CPSR Z bit when creating aarch32 guests

The first 32 bytes of zImage32 are NOPs. When CONFIG_EFI is enabled in
the kernel, certain versions of Linux have a bug in the way the initial
NOP instructions gets encoded (invalid encoding), resulting in an
unbootable kernel. See commit a92882a4d270 in the Linux kernel for all
the details.

All kernel releases starting from Linux 4.9 without commit a92882a4d270
are affected.

Fortunately there is a simple workaround: setting the "Z" bit in CPSR
make it so those invalid  NOP instructions are never executed. Setting
the "Z" bit makes those kernel versions bootable again and it is
harmless in the other cases.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>

diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 94b31511dd..309684e946 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -361,6 +361,7 @@ typedef uint64_t xen_callback_t;
 #define PSR_DBG_MASK    (1<<9)        /* arm64: Debug Exception mask */
 #define PSR_IT_MASK     (0x0600fc00)  /* Thumb If-Then Mask */
 #define PSR_JAZELLE     (1<<24)       /* Jazelle Mode */
+#define PSR_Z           (1<<30)       /* Zero condition flag */
 
 /* 32 bit modes */
 #define PSR_MODE_USR 0x10
@@ -383,7 +384,7 @@ typedef uint64_t xen_callback_t;
 #define PSR_MODE_EL1t 0x04
 #define PSR_MODE_EL0t 0x00
 
-#define PSR_GUEST32_INIT  (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)
+#define PSR_GUEST32_INIT  (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC|PSR_Z)
 #define PSR_GUEST64_INIT (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_EL1h)
 
 #define SCTLR_GUEST_INIT    xen_mk_ullong(0x00c50078)


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] xen/arm: skip first 32 bytes of zimage32
  2022-03-21 22:07         ` Stefano Stabellini
@ 2022-03-22 13:52           ` Julien Grall
  0 siblings, 0 replies; 7+ messages in thread
From: Julien Grall @ 2022-03-22 13:52 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: bertrand.marquis, Volodymyr_Babchuk, Stefano Stabellini, xen-devel

Hi,

On 21/03/2022 22:07, Stefano Stabellini wrote:
> On Mon, 21 Mar 2022, Julien Grall wrote:
>>> This is the list of kernels that I tried and failed:
>>>
>>> - Debian Buster
>>> - Debian Bullseye
>>> - vanilla 4.9
>>> - vanilla 4.10
>>
>> Does this mean you where using v4.{9,10}.0 rather than the stable branch?
>>
>> Note that AFAICT, 4.10 is not even supported by the kernel communinty (see
>> [1]).
> 
> Yeah... I was trying to bisect the Debian kernel failure. That is how I
> discovered that CONFIG_EFI was causing it. So yes, I only tried the
> vanilla, now unsupported, releases.
> 
> 
>>> The latest Alpine Linux kernel also doesn't boot, but that one doesn't
>>> boot even with the fix for other reason. (More in the other email.)
>>>
>>>
>>>   From a Xen gitlab-ci perspective, we just need a kernel that works.
>>> Ideally, we wouldn't rebuild our own but reuse an existing kernel
>>> because that is one less things to maintain in the gitlab-ci build.
>>>
>>> We have a couple of options to make progress on the QEMU32 gitlab-ci
>>> test:
>>>
>>> 1) use Debian Jessie in gitlab-ci and do not commit the Xen-side fix,
>>>      file a Debian bug and revisit the situation in a couple of months
>>>      (Debian might get the fix in the meantime)
>>
>> Why do we need to use Debian here? Couldn't we use Ubuntu (or any distros that
>> have a newer kernel)?
> 
> We could use something else but see below.
> 
> 
>>> 2) commit the Xen fix and use Debian Bullseye right now
>>>
>>> 3) do not commit the Xen fix and build our own kernel now
>>>
>>>
>>> All of these options work. My preference is 1) or 2).
>>>
>>> Between 1) and 2) I have a slight preference for 2) for this reason: I
>>> know that in Open Source we try to fix bugs wherever they are (kernel
>>> project, QEMU project, Debian project) rather than working around them,
>>> but in this case it looks like there is a significant amount of binaries
>>> out there that require an update before they can boot on Xen.
>>
>> My problem here is the bug is not Xen specific. You would exactly have the
>> same problem if you were booting on baremetal. But as Andre wrote in his
>> commit message, this is only working by luck.
>>
>> So we are setting another bad precendence by preserving the luck.
>>
>> I appreciate we recently agreed to merge the code to emulate ldr/str. But this
>> was based on the fact the Arm Arm doesn't clearly forbid such access. This is
>> quite different here as the instructions are UNDEFINED.
> 
> Yeah, I understand your point and I also kind of agree.
> 
> 
>> So I am not willing to accept a lot of code in Xen just to workaround a
>> software bug (see more below).
>>
>>> I think it
>>> is one of those times where it is worth considering the Xen fix, or
>>> should I say workaround, if it is considered harmless.
>>
>> The problem with your workaround is now the zImage will be loaded in a
>> different aligned. For instance, if the symbol <foo> was 2MB aligned, now, it
>> will be aligned to 2MB - 32. See kernel_zimage_load().
>>
>> The booting protocol (see Documentation/arm/booting.rst). Doesn't look to
>> impose an alignment. But I wouldn't be surprised if there is an assumption
>> here.
>>
>> Furthermore, there are some problem if the kernel is expected to be loaded a
>> specific address.
>>
>> I do expect the code to become more convoluted. This would also have to be
>> duplicated in the tools side. Overall, this will likely be more than I am
>> willing to accept for this issue.
>>
>> Therefore I would like to suggest a more simple workaround. Per the commit
>> message of the Linux bug fix, U-boot and direct loading are not affected
>> because the bit "Z" is set.
>>
>> So how about updating PSR_GUEST32_INIT to set Z?
> 
> That worked! Excellent suggestion and much safer than the 32 byte
> workaround. I tested with the Debian Bullseye kernel.

I would not say it is safer because we will be relying on the processor 
(here QEMU) to check the condition and ignore the UNPREDICTABLE 
behavior. So, in theory, the issue could reappear on a new processor. 
Note that I haven't checked what the Arm Arm says here.

But the workaround is less intrusive. Hence, why I prefer this approach.

> 
> I think you might have a better suggestion for the commit message.
> 
> ---
> xen/arm: set CPSR Z bit when creating aarch32 guests
> 
> The first 32 bytes of zImage32 are NOPs. When CONFIG_EFI is enabled in

The format is zImage not zImage32.

> the kernel, certain versions of Linux have a bug in the way the initial
> NOP instructions gets encoded (invalid encoding), resulting in an
> unbootable kernel.

This paragraph is a bit difficult to understand. I would say "certain 
version of Linux will use a UNPREDICATABLE NOP encoding".

Also, I would suggest make clear that the issue depends on the processor.

  See commit a92882a4d270 in the Linux kernel for all
> the details.
> 
> All kernel releases starting from Linux 4.9 without commit a92882a4d270
> are affected.
> 
> Fortunately there is a simple workaround: setting the "Z" bit in CPSR
> make it so those invalid  NOP instructions are never executed.

I would say that the instruction is conditional (not equal). So, on QEMU 
at least, the instruction will end up to be ignored and not generate an 
UNDEFINED (XXX this needs to be checked).

> Setting
> the "Z" bit makes those kernel versions bootable again and it is
> harmless in the other cases.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
> 
> diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
> index 94b31511dd..309684e946 100644
> --- a/xen/include/public/arch-arm.h
> +++ b/xen/include/public/arch-arm.h
> @@ -361,6 +361,7 @@ typedef uint64_t xen_callback_t;
>   #define PSR_DBG_MASK    (1<<9)        /* arm64: Debug Exception mask */
>   #define PSR_IT_MASK     (0x0600fc00)  /* Thumb If-Then Mask */
>   #define PSR_JAZELLE     (1<<24)       /* Jazelle Mode */
> +#define PSR_Z           (1<<30)       /* Zero condition flag */
>   
>   /* 32 bit modes */
>   #define PSR_MODE_USR 0x10
> @@ -383,7 +384,7 @@ typedef uint64_t xen_callback_t;
>   #define PSR_MODE_EL1t 0x04
>   #define PSR_MODE_EL0t 0x00
>   
> -#define PSR_GUEST32_INIT  (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC)
> +#define PSR_GUEST32_INIT  (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC|PSR_Z)

The flags are set from the highest to the lowest bit. I would like to 
keep it like that as it helps to match with the Arm Arm. So can you move 
PSR_Z to the beginning?

Also, please add a comment in on top of PSR_GUEST32_INIT explaining why 
we set Z.

>   #define PSR_GUEST64_INIT (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_EL1h)
>   
>   #define SCTLR_GUEST_INIT    xen_mk_ullong(0x00c50078)

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-03-22 13:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20220320010509.3605525-1-sstabellini@kernel.org>
2022-03-20  1:43 ` [PATCH] xen/arm: skip first 32 bytes of zimage32 Stefano Stabellini
2022-03-20  7:47 ` Julien Grall
2022-03-20 19:17   ` Julien Grall
2022-03-21 19:29     ` Stefano Stabellini
2022-03-21 20:27       ` Julien Grall
2022-03-21 22:07         ` Stefano Stabellini
2022-03-22 13:52           ` Julien Grall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.