* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
       [not found]   ` <CAFEAcA_47wyar3WNhuBmNPdr02RHx-wn_TeyFaajpjzvzG8j5Q@mail.gmail.com>
@ 2018-11-09  1:47     ` Li Zhijian
  2018-11-09  7:20       ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Li Zhijian @ 2018-11-09  1:47 UTC (permalink / raw)
  To: Peter Maydell, x86, hpa, bp, mingo, tglx
  Cc: QEMU Developers, Philip Li, zhijianx.li, linux-kernel


On 11/08/2018 07:06 PM, Peter Maydell wrote:
> On 8 November 2018 at 10:59, Li Zhijian <lizhijian@cn.fujitsu.com> wrote:
>> x86/x86_64 already supports a 4G initrd.
>>
>> linux/arch/x86/boot/header.S:
>>   # (Header version 0x0203 or later) the highest safe address for the contents
>>   # of an initrd. The current kernel allows up to 4 GB, but leave it at 2 GB to
>>   # avoid possible bootloader bugs.
>>
>> CC: Philip Li <philip.li@intel.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   hw/i386/pc.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index cd5029c..e1b910f 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -913,6 +913,12 @@ static void load_linux(PCMachineState *pcms,
>>       /* highest address for loading the initrd */
>>       if (protocol >= 0x203) {
>>           initrd_max = ldl_p(header+0x22c);
>> +        if (initrd_max == 0x7fffffff) {
>> +            /* The kernel header hard-codes initrd_addr_max to 0x7fffffff;
>> +             * override it with 4G - 1 to allow an initrd of up to 4G.
>> +             */
>> +            initrd_max = UINT32_MAX;
>> +        }
> I don't understand this. If the header of the file we're using
> says "this is the maximum", then we should trust the header to
> in fact not be lying to us, shouldn't we ?
>
> If the kernel initrd creation process creates an initrd which
> is larger than 2GB and also claims that it can't be placed
> with any part of it above 2GB, then that sounds like a bug
> in the initrd creation process...

Exactly, it's a real problem.

Adding the x86 maintainers and LKML.

The background is that QEMU wants to support an initrd of up to 4G, but the
Linux header (initrd_addr_max field) only allows 2G-1.
Is one of the approaches below reasonable?
1) Simply change initrd_addr_max to 4G-1 directly (arch/x86/boot/header.S)?
2) Have the QEMU bootloader pretend initrd_addr_max is 4G-1 even though the header says 2G-1?
3) Anything else?
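
For reference, a minimal sketch of what a loader does with this field today,
assuming the kernel image header has been read into a byte buffer. The helper
names read_le32() and initrd_limit() are made up for illustration; the 0x22c
offset and the 0x37ffffff fallback are the ones in the QEMU hunk quoted in
this mail.

```c
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: highest address the initrd contents may occupy.
 * 'header' points at the start of the kernel image, 'protocol' is the
 * boot protocol version (e.g. 0x0203).
 */
static uint32_t initrd_limit(const uint8_t *header, uint16_t protocol)
{
    if (protocol >= 0x203) {
        return read_le32(header + 0x22c);   /* initrd_addr_max */
    }
    return 0x37ffffff;                      /* default for older protocols */
}
```

With the current header.S this returns 0x7fffffff for any recent kernel,
which is why a loader that trusts the header cannot place an initrd above 2G.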






>
>>       } else {
>>           initrd_max = 0x37ffffff;
>>       }
> This patch should come last in the series: only after we have fixed all
> of QEMU's internal plumbing to handle larger initrd sizes should we
> enable it.

Got it.

Thanks
Zhijian

>
> thanks
> -- PMM
>
>
>





* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-09  1:47     ` [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd Li Zhijian
@ 2018-11-09  7:20       ` Ingo Molnar
  2018-11-09  9:57         ` Li Zhijian
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2018-11-09  7:20 UTC (permalink / raw)
  To: Li Zhijian, Juergen Gross
  Cc: Peter Maydell, x86, hpa, bp, mingo, tglx, QEMU Developers,
	Philip Li, zhijianx.li, linux-kernel, Linus Torvalds,
	Peter Zijlstra, Kees Cook


* Li Zhijian <lizhijian@cn.fujitsu.com> wrote:

> > If the kernel initrd creation process creates an initrd which
> > is larger than 2GB and also claims that it can't be placed
> > with any part of it above 2GB, then that sounds like a bug
> > in the initrd creation process...
> 
> Exactly, it's a real problem.
> 
> Add x86 maintainers and LKML:
> 
> The background is that QEMU want to support up to 4G initrd. but linux header (
> initrd_addr_max field) only allow 2G-1.
> Is one of the below approaches reasonable:
> 1) change initrd_addr_max to 4G-1 directly simply(arch/x86/boot/header.S)?
> 2) lie QEMU bootloader the initrd_addr_max is 4G-1 even though header said 2G-1
> 3) any else

A 10-year-old comment from hpa says:

  initrd_addr_max: .long 0x7fffffff
                                        # (Header version 0x0203 or later)
                                        # The highest safe address for
                                        # the contents of an initrd
                                        # The current kernel allows up to 4 GB,
                                        # but leave it at 2 GB to avoid
                                        # possible bootloader bugs.

To avoid the potential of bugs lurking in dozens of major and hundreds of 
minor iterations of various Linux bootloaders I'd prefer a real solution 
and extend it - because if there's a 2GB initrd for some weird reason 
today there might be a 4GB one in two years.

The real solution would be to:

 - Extend the boot protocol with a 64-bit field, named initrd_addr64_max 
   or such.

 - We don't change the old field, but if the new field is set by new
   kernels then new bootloaders can use it as the initrd address limit
   (or refuse to load the kernel if the address is too high).

 - The kernel build should also emit a warning when building larger than 
   2GB initrds, with a list of bootloaders that support the new protocol.

Or something along those lines.
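
On the bootloader side that could look roughly like the sketch below.
Everything in it is hypothetical: initrd_addr64_max does not exist in the
boot protocol, and the function names and the convention that zero means
"not set" are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical: pick the initrd address limit.
 * addr_max   - the existing 32-bit initrd_addr_max field
 * addr64_max - the proposed 64-bit field; 0 is assumed to mean "not set"
 */
static uint64_t pick_initrd_max(uint32_t addr_max, uint64_t addr64_max)
{
    return addr64_max != 0 ? addr64_max : addr_max;
}

/*
 * True if an initrd of 'size' bytes loaded at 'load_addr' ends at or
 * below 'limit'. Written so that the arithmetic cannot overflow.
 */
static bool initrd_fits(uint64_t load_addr, uint64_t size, uint64_t limit)
{
    if (size == 0 || load_addr > limit) {
        return false;
    }
    return size - 1 <= limit - load_addr;
}
```

An old bootloader never looks at the new field and keeps honouring the 2 GB
limit, which is the point of leaving initrd_addr_max untouched.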

Thanks,

	Ingo


* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-09  7:20       ` Ingo Molnar
@ 2018-11-09  9:57         ` Li Zhijian
  2018-11-09 10:04           ` Juergen Gross
  0 siblings, 1 reply; 12+ messages in thread
From: Li Zhijian @ 2018-11-09  9:57 UTC (permalink / raw)
  To: Ingo Molnar, Li Zhijian, Juergen Gross
  Cc: Peter Maydell, x86, hpa, bp, mingo, tglx, QEMU Developers,
	Philip Li, linux-kernel, Linus Torvalds, Peter Zijlstra,
	Kees Cook

On 11/9/2018 3:20 PM, Ingo Molnar wrote:
> * Li Zhijian <lizhijian@cn.fujitsu.com> wrote:
>
>>> If the kernel initrd creation process creates an initrd which
>>> is larger than 2GB and also claims that it can't be placed
>>> with any part of it above 2GB, then that sounds like a bug
>>> in the initrd creation process...
>> Exactly, it's a real problem.
>>
>> Add x86 maintainers and LKML:
>>
>> The background is that QEMU want to support up to 4G initrd. but linux header (
>> initrd_addr_max field) only allow 2G-1.
>> Is one of the below approaches reasonable:
>> 1) change initrd_addr_max to 4G-1 directly simply(arch/x86/boot/header.S)?
>> 2) lie QEMU bootloader the initrd_addr_max is 4G-1 even though header said 2G-1
>> 3) any else
> A 10 years old comment from hpa says:
>
>    initrd_addr_max: .long 0x7fffffff
>                                          # (Header version 0x0203 or later)
>                                          # The highest safe address for
>                                          # the contents of an initrd
>                                          # The current kernel allows up to 4 GB,
>                                          # but leave it at 2 GB to avoid
>                                          # possible bootloader bugs.
>
> To avoid the potential of bugs lurking in dozens of major and hundreds of
> minor iterations of various Linux bootloaders I'd prefer a real solution
> and extend it - because if there's a 2GB initrd for some weird reason
> today there might be a 4GB one in two years.

Thanks a lot, that's great.


>
> The real solution would be to:
>
>   - Extend the boot protocol with a 64-bit field, named initrd_addr64_max
>     or such.
>   - We don't change the old field - but if the new field is set by new
>     kernels then new bootloaders can use that as a new initrd_addr64_max
>     value. (or reject to load the kernel if the address is too high.)
>
>   - The kernel build should also emit a warning when building larger than
>     2GB initrds, with a list of bootloaders that support the new protocol.

Actually, the only bootloader I know of so far that can support a ~4GB initrd
is QEMU (SeaBIOS + the linuxboot_dma.bin option ROM).

I have just drafted a patch to add this field; could you have a look?
Another patch that documents initrd_addr64_max is in progress.

commit db463ac9c1975f115d1ce2acb82d530c2b63b888
Author: Li Zhijian <lizhijian@cn.fujitsu.com>
Date:   Fri Nov 9 17:24:14 2018 +0800

     x86: Add header field initrd_addr64_max
     
     The kernel has been able to load a ~4GB initrd for years, but to avoid
     possible bootloader bugs the header only allows the initrd to be placed
     below 2GB (see the initrd_addr_max field in arch/x86/boot/header.S).
     
     As a result, bootloaders have had no way to load an initrd at or above 2GB.
     
     To avoid tripping over bugs lurking in dozens of major and hundreds of
     minor iterations of various Linux bootloaders, Ingo suggested adding a new
     field, initrd_addr64_max. A bootloader that knows it can load the initrd
     at or above 2GB can use initrd_addr64_max as the maximum load address
     instead of the old initrd_addr_max field.

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 4c881c8..5fc3ebe 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -300,7 +300,7 @@ _start:
         # Part 2 of the header, from the old setup.S
  
                 .ascii  "HdrS"          # header signature
-               .word   0x020e          # header version number (>= 0x0105)
+               .word   0x020f          # header version number (>= 0x0105)
                                         # or else old loadlin-1.5 will fail)
                 .globl realmode_swtch
  realmode_swtch:        .word   0, 0            # default_switch, SETUPSEG
@@ -562,6 +562,12 @@ acpi_rsdp_addr:            .quad 0                 # 64-bit physical pointer to the
                                                 # ACPI RSDP table, added with
                                                 # version 2.14
  
+#ifdef CONFIG_INITRD_SIZE_4GB
+initrd_addr64_max:     .quad 0xffffffff        # allow ~4G initrd since 2.15
+#else
+initrd_addr64_max:     .quad 0
+#endif
+
  # End of setup header #####################################################
  
         .section ".entrytext", "ax"
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 22f89d0..b86013d 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -90,6 +90,7 @@ struct setup_header {
         __u32   init_size;
         __u32   handover_offset;
         __u64   acpi_rsdp_addr;
+       __u64   initrd_addr64_max;
  } __attribute__((packed));
  
  struct sys_desc_table {
diff --git a/init/Kconfig b/init/Kconfig
index a4112e9..611d4af 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1080,6 +1080,14 @@ config BLK_DEV_INITRD
  
           If unsure say Y.
  
+config INITRD_SIZE_4GB
+       bool "4G size initrd support"
+       depends on (X86 || X86_64)
+       help
+         This option enables support for an initrd of up to ~4GB.
+
+         If unsure, say N.
+
  if BLK_DEV_INITRD
  
  source "usr/Kconfig"

Thanks
Zhijian




> Or something along those lines.
>
> Thanks,
>
> 	Ingo


* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-09  9:57         ` Li Zhijian
@ 2018-11-09 10:04           ` Juergen Gross
  2018-11-09 13:11             ` Li Zhijian
  2018-11-09 13:40             ` Li Zhijian
  0 siblings, 2 replies; 12+ messages in thread
From: Juergen Gross @ 2018-11-09 10:04 UTC (permalink / raw)
  To: Li Zhijian, Ingo Molnar, Li Zhijian
  Cc: Peter Maydell, x86, hpa, bp, mingo, tglx, QEMU Developers,
	Philip Li, linux-kernel, Linus Torvalds, Peter Zijlstra,
	Kees Cook

On 09/11/2018 10:57, Li Zhijian wrote:
> On 11/9/2018 3:20 PM, Ingo Molnar wrote:
>> * Li Zhijian <lizhijian@cn.fujitsu.com> wrote:
>>
>>>> If the kernel initrd creation process creates an initrd which
>>>> is larger than 2GB and also claims that it can't be placed
>>>> with any part of it above 2GB, then that sounds like a bug
>>>> in the initrd creation process...
>>> Exactly, it's a real problem.
>>>
>>> Add x86 maintainers and LKML:
>>>
>>> The background is that QEMU want to support up to 4G initrd. but
>>> linux header (
>>> initrd_addr_max field) only allow 2G-1.
>>> Is one of the below approaches reasonable:
>>> 1) change initrd_addr_max to 4G-1 directly
>>> simply(arch/x86/boot/header.S)?
>>> 2) lie QEMU bootloader the initrd_addr_max is 4G-1 even though header
>>> said 2G-1
>>> 3) any else
>> A 10 years old comment from hpa says:
>>
>>    initrd_addr_max: .long 0x7fffffff
>>                                          # (Header version 0x0203 or later)
>>                                          # The highest safe address for
>>                                          # the contents of an initrd
>>                                          # The current kernel allows up to 4 GB,
>>                                          # but leave it at 2 GB to avoid
>>                                          # possible bootloader bugs.
>>
>> To avoid the potential of bugs lurking in dozens of major and hundreds of
>> minor iterations of various Linux bootloaders I'd prefer a real solution
>> and extend it - because if there's a 2GB initrd for some weird reason
>> today there might be a 4GB one in two years.
> 
> thank a lots. that's amazing.
> 
> 
>>
>> The real solution would be to:
>>
>>   - Extend the boot protocol with a 64-bit field, named initrd_addr64_max
>>     or such.
>>   - We don't change the old field - but if the new field is set by new
>>     kernels then new bootloaders can use that as a new initrd_addr64_max
>>     value. (or reject to load the kernel if the address is too high.)
>>
>>   - The kernel build should also emit a warning when building larger than
>>     2GB initrds, with a list of bootloaders that support the new
>> protocol.
> 
> Actually i just knew QEMU(Seabios + optionrom(linuxboot_dma.bin)) can
> support ~4GB initrd so far.
> 
> i just drafted at patch to add this field. could you have a look.
> another patch which is to document initrd_addr64_max is ongoing.
> 
> commit db463ac9c1975f115d1ce2acb82d530c2b63b888
> Author: Li Zhijian <lizhijian@cn.fujitsu.com>
> Date:   Fri Nov 9 17:24:14 2018 +0800
> 
>     x86: Add header field initrd_addr64_max
>
>     The kernel has been able to load a ~4GB initrd for years, but to avoid
>     possible bootloader bugs the header only allows the initrd to be placed
>     below 2GB (see the initrd_addr_max field in arch/x86/boot/header.S).
>
>     As a result, bootloaders have had no way to load an initrd at or above 2GB.
>
>     To avoid tripping over bugs lurking in dozens of major and hundreds of
>     minor iterations of various Linux bootloaders, Ingo suggested adding a new
>     field, initrd_addr64_max. A bootloader that knows it can load the initrd
>     at or above 2GB can use initrd_addr64_max as the maximum load address
>     instead of the old initrd_addr_max field.
> 
> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
> index 4c881c8..5fc3ebe 100644
> --- a/arch/x86/boot/header.S
> +++ b/arch/x86/boot/header.S
> @@ -300,7 +300,7 @@ _start:
>         # Part 2 of the header, from the old setup.S
>  
>                 .ascii  "HdrS"          # header signature
> -               .word   0x020e          # header version number (>= 0x0105)
> +               .word   0x020f          # header version number (>= 0x0105)
>                                         # or else old loadlin-1.5 will fail)
>                 .globl realmode_swtch
>  realmode_swtch:        .word   0, 0            # default_switch, SETUPSEG
> @@ -562,6 +562,12 @@ acpi_rsdp_addr:            .quad 0                 # 64-bit physical pointer to the
>                                                 # ACPI RSDP table, added with
>                                                 # version 2.14
>  
> +#ifdef CONFIG_INITRD_SIZE_4GB
> +initrd_addr64_max:     .quad 0xffffffff        # allow ~4G initrd since 2.15
> +#else
> +initrd_addr64_max:     .quad 0

Shouldn't this be 0x7fffffff?

And please update Documentation/x86/boot.txt


Juergen


* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-09 10:04           ` Juergen Gross
@ 2018-11-09 13:11             ` Li Zhijian
  2018-11-09 13:40             ` Li Zhijian
  1 sibling, 0 replies; 12+ messages in thread
From: Li Zhijian @ 2018-11-09 13:11 UTC (permalink / raw)
  To: Juergen Gross, Ingo Molnar, Li Zhijian
  Cc: Peter Maydell, x86, hpa, bp, mingo, tglx, QEMU Developers,
	Philip Li, linux-kernel, Linus Torvalds, Peter Zijlstra,
	Kees Cook

I just noticed that recent protocol versions have an xloadflags field:

Protocol 2.12:  (Kernel 3.8) Added the xloadflags field and extension fields
                to struct boot_params for loading bzImage and ramdisk
                above 4G in 64bit.
[snip]
Field name:     xloadflags
Type:           read
Offset/size:    0x236/2
Protocol:       2.12+

  This field is a bitmask.

  Bit 0 (read): XLF_KERNEL_64
        - If 1, this kernel has the legacy 64-bit entry point at 0x200.

  Bit 1 (read): XLF_CAN_BE_LOADED_ABOVE_4G
        - If 1, kernel/boot_params/cmdline/ramdisk can be above 4G.

Maybe we can reuse this field and add a new bit 5, XLF_INITRD_SIZE_4G or such.
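
For illustration, a minimal sketch of how a loader could test the existing
bit, assuming the kernel image header is available as a byte buffer. The
helper name can_load_above_4g() is made up; the flag values, the 0x236
offset and the 2.12 version check follow the documentation excerpt in this
mail.

```c
#include <stdbool.h>
#include <stdint.h>

#define XLF_KERNEL_64               (1u << 0)
#define XLF_CAN_BE_LOADED_ABOVE_4G  (1u << 1)

/*
 * Hypothetical helper: true if the header advertises
 * XLF_CAN_BE_LOADED_ABOVE_4G. 'header' points at the start of the
 * kernel image, 'protocol' is the boot protocol version.
 */
static bool can_load_above_4g(const uint8_t *header, uint16_t protocol)
{
    uint16_t xloadflags;

    if (protocol < 0x20c) {
        return false;               /* xloadflags exists from 2.12 on */
    }
    /* 16-bit little-endian field at offset 0x236 */
    xloadflags = (uint16_t)(header[0x236] | (header[0x237] << 8));
    return (xloadflags & XLF_CAN_BE_LOADED_ABOVE_4G) != 0;
}
```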


thanks
Zhijian

  



* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-09 10:04           ` Juergen Gross
  2018-11-09 13:11             ` Li Zhijian
@ 2018-11-09 13:40             ` Li Zhijian
  2018-11-09 21:10               ` H. Peter Anvin
  1 sibling, 1 reply; 12+ messages in thread
From: Li Zhijian @ 2018-11-09 13:40 UTC (permalink / raw)
  To: Juergen Gross, Ingo Molnar, Li Zhijian
  Cc: Peter Maydell, x86, hpa, bp, mingo, tglx, QEMU Developers,
	Philip Li, linux-kernel, Linus Torvalds, Peter Zijlstra,
	Kees Cook

I just noticed that recent protocol versions have an xloadflags field:

Protocol 2.12:  (Kernel 3.8) Added the xloadflags field and extension fields
                to struct boot_params for loading bzImage and ramdisk
                above 4G in 64bit.
[snip]
Field name:     xloadflags
Type:           read
Offset/size:    0x236/2
Protocol:       2.12+

  This field is a bitmask.

  Bit 0 (read): XLF_KERNEL_64
        - If 1, this kernel has the legacy 64-bit entry point at 0x200.

  Bit 1 (read): XLF_CAN_BE_LOADED_ABOVE_4G
        - If 1, kernel/boot_params/cmdline/ramdisk can be above 4G.

Maybe we can reuse this field: add a new bit 5, XLF_INITRD_MAX_SIZE_4G, and
increase the header version.
For older protocol versions (2.12 to 2.14), if XLF_CAN_BE_LOADED_ABOVE_4G is
set we can also conclude that a ~4GB initrd is allowed.

Bootloader side:
if protocol >= 2.15
    if XLF_INITRD_MAX_SIZE_4G
        support ~4G initrd
    fi
else if protocol >= 2.12
    if XLF_CAN_BE_LOADED_ABOVE_4G
        support ~4G initrd
    fi
fi
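
The same decision written out as a C sketch. XLF_INITRD_MAX_SIZE_4G and
protocol 2.15 are only the proposal from this mail and exist in no kernel,
and the function name allows_4g_initrd() is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define XLF_CAN_BE_LOADED_ABOVE_4G  (1u << 1)
#define XLF_INITRD_MAX_SIZE_4G      (1u << 5)   /* proposed here, not in any kernel */

/* Hypothetical: may the bootloader load a ~4G initrd for this kernel? */
static bool allows_4g_initrd(uint16_t protocol, uint16_t xloadflags)
{
    if (protocol >= 0x20f) {        /* proposed protocol 2.15 */
        return (xloadflags & XLF_INITRD_MAX_SIZE_4G) != 0;
    }
    if (protocol >= 0x20c) {        /* protocol 2.12 to 2.14 */
        return (xloadflags & XLF_CAN_BE_LOADED_ABOVE_4G) != 0;
    }
    return false;                   /* no xloadflags before 2.12 */
}
```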

thanks
Zhijian

  



* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-09 13:40             ` Li Zhijian
@ 2018-11-09 21:10               ` H. Peter Anvin
  2018-11-12  4:56                 ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: H. Peter Anvin @ 2018-11-09 21:10 UTC (permalink / raw)
  To: Li Zhijian, Juergen Gross, Ingo Molnar, Li Zhijian
  Cc: Peter Maydell, x86, bp, mingo, tglx, QEMU Developers, Philip Li,
	linux-kernel, Linus Torvalds, Peter Zijlstra, Kees Cook

On 11/9/18 5:40 AM, Li Zhijian wrote:
> I just noticed that recent protocol versions have an xloadflags field:
>
> Protocol 2.12:  (Kernel 3.8) Added the xloadflags field and extension fields
>                 to struct boot_params for loading bzImage and ramdisk
>                 above 4G in 64bit.
> [snip]
> Field name:     xloadflags
> Type:           read
> Offset/size:    0x236/2
> Protocol:       2.12+
>
>   This field is a bitmask.
>
>   Bit 0 (read): XLF_KERNEL_64
>         - If 1, this kernel has the legacy 64-bit entry point at 0x200.
>
>   Bit 1 (read): XLF_CAN_BE_LOADED_ABOVE_4G
>         - If 1, kernel/boot_params/cmdline/ramdisk can be above 4G.
>
> Maybe we can reuse this field: add a new bit 5, XLF_INITRD_MAX_SIZE_4G, and
> increase the header version.
> For older protocol versions (2.12 to 2.14), if XLF_CAN_BE_LOADED_ABOVE_4G is
> set we can also conclude that a ~4GB initrd is allowed.
>
> Bootloader side:
> if protocol >= 2.15
>     if XLF_INITRD_MAX_SIZE_4G
>         support ~4G initrd
>     fi
> else if protocol >= 2.12
>     if XLF_CAN_BE_LOADED_ABOVE_4G
>         support ~4G initrd
>     fi
> fi
> 

The two are equivalent.  Obviously you have to load above 4 GB if you
have more than 4 GB of initrd.  If XLF_CAN_BE_LOADED_ABOVE_4G is not
set, then you most likely are on a 32-bit kernel and there are more
fundamental limits (even if you were to load it above the 2 GB mark, you
would be limited by the size of kernel memory.)

So, in case you are wondering: the bootloader that broke when setting
the initrd_max field above 2 GB was, of course, Grub.

So just use XLF_CAN_BE_LOADED_ABOVE_4G. There is no need for a new flag
or new field.

Also note that the ext_ramdisk_image and ext_ramdisk_size are part of
struct boot_params as opposed to struct setup_header, which means that
they are not supported when entering via the 16-bit BIOS entry point,
and I am willing to bet that there will be, ahem, "strangeness" if
entered via the 32-bit entry point if at least the command line is
loaded above the 4 GB mark; the initrd should be fine, though.
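
To make the split between the two structures concrete, here is a minimal
sketch of how a 64-bit initrd address and size are divided into low and high
halves. The struct below is a stand-in for illustration and is not the real
layout of struct setup_header or struct boot_params; only the four field
names are real, and the helper names are made up.

```c
#include <stdint.h>

/* Stand-in holding only the four fields discussed here. */
struct initrd_fields {
    uint32_t ramdisk_image;      /* setup_header: low 32 bits of the address */
    uint32_t ramdisk_size;       /* setup_header: low 32 bits of the size */
    uint32_t ext_ramdisk_image;  /* boot_params: high 32 bits of the address */
    uint32_t ext_ramdisk_size;   /* boot_params: high 32 bits of the size */
};

/* Hypothetical: record where a bootloader placed the initrd. */
static void set_initrd(struct initrd_fields *f, uint64_t addr, uint64_t size)
{
    f->ramdisk_image     = (uint32_t)(addr & 0xffffffffu);
    f->ext_ramdisk_image = (uint32_t)(addr >> 32);
    f->ramdisk_size      = (uint32_t)(size & 0xffffffffu);
    f->ext_ramdisk_size  = (uint32_t)(size >> 32);
}

/* Hypothetical: reassemble the full 64-bit address from both halves. */
static uint64_t get_initrd_addr(const struct initrd_fields *f)
{
    return ((uint64_t)f->ext_ramdisk_image << 32) | f->ramdisk_image;
}
```

An entry path that only sees struct setup_header gets the low halves alone,
which is why the ext_* fields do not help when entering via the 16-bit entry
point.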

This is obviously not an issue in EFI environments, where we enter
through the EFI handover entry point.

The main reason these were not added to struct setup_header is that
there are only 24 bytes left in that header and so space is highly
precious. One way to deal with that if we really, really need to would
be to add an initrd/initramfs type of setup_data.

	-hpa


* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-09 21:10               ` H. Peter Anvin
@ 2018-11-12  4:56                 ` Ingo Molnar
  2018-11-12  6:00                   ` H. Peter Anvin
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2018-11-12  4:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Li Zhijian, Juergen Gross, Li Zhijian, Peter Maydell, x86, bp,
	mingo, tglx, QEMU Developers, Philip Li, linux-kernel,
	Linus Torvalds, Peter Zijlstra, Kees Cook


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 11/9/18 5:40 AM, Li Zhijian wrote:
> > Just noticed that there is a field xloadflags at recent protocol
> >   60 Protocol 2.12:  (Kernel 3.8) Added the xloadflags field and
> > extension fields
> >   61                 to struct boot_params for loading bzImage and ramdisk
> >   62                 above 4G in 64bit.
> > [snip]
> >  617 Field name:     xloadflags
> >  618 Type:           read
> >  619 Offset/size:    0x236/2
> >  620 Protocol:       2.12+
> >  621
> >  622   This field is a bitmask.
> >  623
> >  624   Bit 0 (read): XLF_KERNEL_64
> >  625         - If 1, this kernel has the legacy 64-bit entry point at
> > 0x200.
> >  626
> >  627   Bit 1 (read): XLF_CAN_BE_LOADED_ABOVE_4G
> >  628         - If 1, kernel/boot_params/cmdline/ramdisk can be above 4G.
> >  629
> > 
> > Maybe we can reuse this field, add a new bit 5,
> > XLF_INITRD_MAX_SIZE_4G, and increase the header version.
> > For older protocol versions 2.12+, if XLF_CAN_BE_LOADED_ABOVE_4G is
> > set, we can also conclude that a ~4 GB initrd is allowed.
> > 
> > bootloader side:
> > if protocol >= 2.15
> >    if XLF_INITRD_LOAD_BELOW_4G
> >       support ~4G initrd
> >    fi
> > else if protocol >=2.12
> >    if XLF_CAN_BE_LOADED_ABOVE_4G
> >     support ~4G initrd
> >    fi
> > fi
> > 
> 
> The two are equivalent.  Obviously you have to load above 4 GB if you
> have more than 4 GB of initrd.  If XLF_CAN_BE_LOADED_ABOVE_4G is not
> set, then you most likely are on a 32-bit kernel and there are more
> fundamental limits (even if you were to load it above the 2 GB mark, you
> would be limited by the size of kernel memory.)
> 
> So, in case you are wondering: the bootloader that broke when setting
> the initrd_max field above 2 GB was, of course, Grub.
> 
> So just use XLF_CAN_BE_LOADED_ABOVE_4G. There is no need for a new flag
> or new field.

That's nice, and that's the best solution!

> Also note that the ext_ramdisk_image and ext_ramdisk_size are part of
> struct boot_params as opposed to struct setup_header, which means that
> they are not supported when entering via the 16-bit BIOS entry point,
> and I am willing to bet that there will be, ahem, "strangeness" if
> entered via the 32-bit entry point if at least the command line is
> loaded above the 4 GB mark; the initrd should be fine, though.
> 
> This is obviously not an issue in EFI environments, where we enter
> through the EFI handover entry point.
> 
> The main reason these were not added to struct setup_header is that
> there are only 24 bytes left in that header and so space is highly
> precious. One way to deal with that if we really, really need to would
> be to add an initrd/initramfs type of setup_data.

Is there no way to extend that header by making an extended header part 
of the payload?

IIRC that header is small and fixed size to be part of a single sector at 
the very beginning of boot images, but accessing any extended header bits 
from the payload section shouldn't really be an issue for a modern 
bootloader to handle, right?

Such an extended header could use a more modern (self-extending) ABI as 
well.

Thanks,

	Ingo


* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-12  4:56                 ` Ingo Molnar
@ 2018-11-12  6:00                   ` H. Peter Anvin
  2018-11-12  6:19                     ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: H. Peter Anvin @ 2018-11-12  6:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Li Zhijian, Juergen Gross, Li Zhijian, Peter Maydell, x86, bp,
	mingo, tglx, QEMU Developers, Philip Li, linux-kernel,
	Linus Torvalds, Peter Zijlstra, Kees Cook

On 11/11/18 8:56 PM, Ingo Molnar wrote:
> 
>> Also note that the ext_ramdisk_image and ext_ramdisk_size are part of
>> struct boot_params as opposed to struct setup_header, which means that
>> they are not supported when entering via the 16-bit BIOS entry point,
>> and I am willing to bet that there will be, ahem, "strangeness" if
>> entered via the 32-bit entry point if at least the command line is
>> loaded above the 4 GB mark; the initrd should be fine, though.
>>
>> This is obviously not an issue in EFI environments, where we enter
>> through the EFI handover entry point.
>>
>> The main reason these were not added to struct setup_header is that
>> there are only 24 bytes left in that header and so space is highly
>> precious. One way to deal with that if we really, really need to would
>> be to add an initrd/initramfs type of setup_data.
> 
> Is there no way to extend that header by making an extended header part 
> of the payload?
> 
> IIRC that header is small and fixed size to be part of a single sector at 
> the very beginning of boot images, but accessing any extended header bits 
> from the payload section shouldn't really be an issue for a modern 
> bootloader to handle, right?
> 
> Such an extended header could use a more modern (self-extending) ABI as 
> well.
> 

Yes, although I don't really think it is as much of an issue as it seems at
this point.

The limit comes from having used a one-byte jump instruction at the beginning;
however, these days that limit is functionally walled.
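
For reference, the boot protocol documentation gives the end of the setup header as 0x202 plus the byte at offset 0x201, i.e. the displacement of the short jump at 0x200. A small sketch of that calculation, assuming the image is available as a byte buffer; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Returns the offset just past the setup header, or 0 if the bytes at
 * 0x200 are not a forward short jump (opcode 0xEB, displacement 0..0x7f).
 */
static unsigned setup_header_end(const uint8_t *image)
{
    if (image[0x200] != 0xEB)   /* not a short jmp */
        return 0;
    if (image[0x201] > 0x7f)    /* negative displacement: backward jump */
        return 0;
    return 0x202u + image[0x201];
}
```

Since the displacement is a signed byte, the header can end no later than 0x202 + 0x7f = 0x281.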

It is of course possible to address this if it should become necessary;
however, the current protocol has lasted for 23 years so far and we haven't
run out yet, even with occasional missteps. As such, I don't think we are in a
huge hurry to address this particular aspect.

In part as a result of this exchange I have spent some time thinking about the
boot protocol and its dependencies, and there is, in fact, a much more serious
problem that needs to be addressed: it is not currently possible in a
forward-compatible way to map all data areas that may be occupied by
bootloader-provided data. The kernel proper has an advantage here, in that the
kernel will by definition always be the "owner of the protocol" (anything the
kernel doesn't know how to map won't be used by the kernel anyway), but it
really isn't a good situation. So I'm currently trying to think up a way to
make that possible.

	-hpa


* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-12  6:00                   ` H. Peter Anvin
@ 2018-11-12  6:19                     ` Ingo Molnar
  2018-11-12  6:36                       ` H. Peter Anvin
  2018-11-12 16:47                       ` H. Peter Anvin
  0 siblings, 2 replies; 12+ messages in thread
From: Ingo Molnar @ 2018-11-12  6:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Li Zhijian, Juergen Gross, Li Zhijian, Peter Maydell, x86, bp,
	mingo, tglx, QEMU Developers, Philip Li, linux-kernel,
	Linus Torvalds, Peter Zijlstra, Kees Cook


* H. Peter Anvin <hpa@zytor.com> wrote:

> > Such an extended header could use a more modern (self-extending) ABI as 
> > well.
> 
> Yes, although I don't really think it is as much of an issue as it seems at
> this point.
> 
> The limit comes from having used a one-byte jump instruction at the beginning;
> however, these days that limit is functionally walled.
> 
> It is of course possible to address this if it should become necessary,
> however, the current protocol has lasted for 23 years so far and we haven't
> run out yet, even with occasional missteps. As such, I don't think we are in a
> huge hurry to address this particular aspect.

Agreed, fair enough!

> In part as a result of this exchange I have spent some time thinking 
> about the boot protocol and its dependencies, and there is, in fact, a 
> much more serious problem that needs to be addressed: it is not 
> currently possible in a forward-compatible way to map all data areas 
> that may be occupied by bootloader-provided data. The kernel proper has 
> an advantage here, in that the kernel will by definition always be the 
> "owner of the protocol" (anything the kernel doesn't know how to map 
> won't be used by the kernel anyway), but it really isn't a good 
> situation. So I'm currently trying to think up a way to make that 
> possible.

I might be a bit dense early in the morning, but could you elaborate? 
What do you mean by mapping all data areas?

Thanks,

	Ingo


* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-12  6:19                     ` Ingo Molnar
@ 2018-11-12  6:36                       ` H. Peter Anvin
  2018-11-12 16:47                       ` H. Peter Anvin
  1 sibling, 0 replies; 12+ messages in thread
From: H. Peter Anvin @ 2018-11-12  6:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Li Zhijian, Juergen Gross, Li Zhijian, Peter Maydell, x86, bp,
	mingo, tglx, QEMU Developers, Philip Li, linux-kernel,
	Linus Torvalds, Peter Zijlstra, Kees Cook

On 11/11/18 10:19 PM, Ingo Molnar wrote:
> 
> I might be a bit dense early in the morning, but could you elaborate? 
> What do you mean by mapping all data areas?
> 

Heh. I need to pack for LPC and get some sleep before my flight lest I'll be
denser than depleted uranium; I'll write an explanation tomorrow.

	-hpa



* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-12  6:19                     ` Ingo Molnar
  2018-11-12  6:36                       ` H. Peter Anvin
@ 2018-11-12 16:47                       ` H. Peter Anvin
  1 sibling, 0 replies; 12+ messages in thread
From: H. Peter Anvin @ 2018-11-12 16:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Li Zhijian, Juergen Gross, Li Zhijian, Peter Maydell, x86, bp,
	mingo, tglx, QEMU Developers, Philip Li, linux-kernel,
	Linus Torvalds, Peter Zijlstra, Kees Cook

On 11/11/18 10:19 PM, Ingo Molnar wrote:
> 
>> In part as a result of this exchange I have spent some time thinking 
>> about the boot protocol and its dependencies, and there is, in fact, a 
>> much more serious problem that needs to be addressed: it is not 
>> currently possible in a forward-compatible way to map all data areas 
>> that may be occupied by bootloader-provided data. The kernel proper has 
>> an advantage here, in that the kernel will by definition always be the 
>> "owner of the protocol" (anything the kernel doesn't know how to map 
>> won't be used by the kernel anyway), but it really isn't a good 
>> situation. So I'm currently trying to think up a way to make that 
>> possible.
> 
> I might be a bit dense early in the morning, but could you elaborate? 
> What do you mean by mapping all data areas?

Alright, awake now...

As it sits right now, the protocol contains a number of data structures with
pointers, pointing to a variety of memory areas that can be set up by the
bootloader. Now, consider something like KASLR or a secondary boot loader
where we need to allocate memory in between the primary bootloader and the
kernel to be run. With the kernel proper, in the absence of KASLR, we have
solved this by marking out exactly how much memory the kernel may need before
it has its own memory manager up and running, but KASLR needs to move it
outside this range, and a secondary boot loader shim of some sort may need to
allocate additional data structures. In the particular case of a UEFI system
where we do the right thing (which Grub2 doesn't, by default) and enter via
the kernel UEFI stub we are okay, but for other boot scenarios we are in
trouble: even if we know where all the pointers are and how to determine the
size of various data structures, once the protocol is updated with new
information then that is no longer valid.

The setup_data linked list solves that under certain circumstances, but in
others it has turned out to not be adequate.
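
To make the limitation concrete, here is a sketch of the kind of generic walk the setup_data list allows. It assumes the list is identity-mapped, so that a physical address can be used directly as a pointer, and the range type and function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the kernel's struct setup_data. */
struct setup_data {
    uint64_t next;   /* address of the next node, 0 terminates the list */
    uint32_t type;
    uint32_t len;    /* length of data[] in bytes */
    uint8_t  data[];
};

struct range {
    uint64_t start;
    uint64_t end;    /* exclusive */
};

/*
 * Record the memory occupied by each node (header plus payload).
 * Returns the number of ranges written, at most max.
 */
static size_t collect_setup_data(uint64_t head, struct range *out, size_t max)
{
    size_t n = 0;

    while (head != 0 && n < max) {
        const struct setup_data *sd =
            (const struct setup_data *)(uintptr_t)head;

        out[n].start = head;
        out[n].end   = head + sizeof(*sd) + sd->len;
        n++;
        head = sd->next;
    }
    return n;
}
```

The walk needs no knowledge of individual types, but it only finds memory the nodes themselves occupy. Anything a payload or a boot_params field merely points to stays invisible to code that does not understand that particular field.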

There are a couple of options:

a) Not allow any new pointers to memory areas in what is considered system
   RAM. Such data structures *must* have a setup_data linked list header.
   Pointers into E820 table reserved areas are still acceptable.

b) Create a new E820 table memory type for "boot data", similar to what UEFI
   already has, and encourage boot loaders to mark any allocated memory
   structures that way.  The main problem with that is that the poor quality
   of boot loaders may mean that that fails to happen, and because it wouldn't
   "fail hard" it is likely that they will get it wrong.

   The difference from the RESERVED memory type is that the kernel can reclaim
   that memory after the data has been recovered.

c) This might be the preferred option:

   1. Just like (a), do not allow new pointers to memory areas in system RAM
      in struct boot_params.
   2. Create a subrange of struct setup_data (e.g. bit 30 = 1) explicitly
      containing pointers to other data structures, including sizes, in a
      way that can be parsed by generic code.
   3. Encourage boot loaders to make sure the setup_data list is in order of
      ascending address (and WARN if it is not.)
   4. Add (b) as an option, for responsible boot loaders ;) to provide an
      extra level of protection.
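
A rough sketch of how (c)2 and (c)3 might look to generic code. Everything here is hypothetical: the flag value follows the "bit 30 = 1" example above, and the entry layout and function names are invented for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: types with bit 30 set carry (address, size) pairs. */
#define SETUP_TYPE_HAS_POINTERS (1u << 30)

/* Hypothetical payload entry describing one referenced memory area. */
struct setup_ptr_entry {
    uint64_t addr;
    uint64_t size;
};

/* True if a node's payload should be parsed as setup_ptr_entry items. */
static bool type_has_pointers(uint32_t type)
{
    return (type & SETUP_TYPE_HAS_POINTERS) != 0;
}

/*
 * Check that node addresses appear in strictly ascending order, as (c)3
 * asks boot loaders to arrange; a kernel would WARN when this is false.
 */
static bool list_is_ascending(const uint64_t *node_addrs, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (node_addrs[i] <= node_addrs[i - 1])
            return false;
    }
    return true;
}
```

With such a subrange, code that does not recognise a given type can still reserve every area the node refers to, simply by testing the flag and reading the pairs.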


end of thread, other threads:[~2018-11-12 16:48 UTC | newest]

Thread overview: 12+ messages
     [not found] <1541674784-25936-1-git-send-email-lizhijian@cn.fujitsu.com>
     [not found] ` <1541674784-25936-2-git-send-email-lizhijian@cn.fujitsu.com>
     [not found]   ` <CAFEAcA_47wyar3WNhuBmNPdr02RHx-wn_TeyFaajpjzvzG8j5Q@mail.gmail.com>
2018-11-09  1:47     ` [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd Li Zhijian
2018-11-09  7:20       ` Ingo Molnar
2018-11-09  9:57         ` Li Zhijian
2018-11-09 10:04           ` Juergen Gross
2018-11-09 13:11             ` Li Zhijian
2018-11-09 13:40             ` Li Zhijian
2018-11-09 21:10               ` H. Peter Anvin
2018-11-12  4:56                 ` Ingo Molnar
2018-11-12  6:00                   ` H. Peter Anvin
2018-11-12  6:19                     ` Ingo Molnar
2018-11-12  6:36                       ` H. Peter Anvin
2018-11-12 16:47                       ` H. Peter Anvin
