[Qemu-devel] [RFC/PoC PATCH 0/3] support initrd size up to 4G

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [RFC/PoC PATCH 0/3] support initrd size up to 4G
@ 2018-11-08 10:59 Li Zhijian
  2018-11-08 10:59 ` [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd Li Zhijian
  2018-11-08 20:12 ` [Qemu-devel] [RFC/PoC PATCH 0/3] support initrd size up to 4G Wainer dos Santos Moschetta
  0 siblings, 2 replies; 22+ messages in thread
From: Li Zhijian @ 2018-11-08 10:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhijianx.li, Li Zhijian, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson

Long long ago, linux kernel have supported up to 4G initrd, but it's header still
hard code to allow 2G - 1 only.
 # (Header version 0x0203 or later) the highest safe address for the contents
 # of an initrd. The current kernel allows up to 4 GB, but leave it at 2 GB to
 # avoid possible bootloader bugs.

kexec is one of the known scenario which has supported up to 4G initrd.

This patches is to enable up to 4G initrd, Seabios + optionrom(linuxboot_dma)
works as expected so far.

3/3 has a huge change of address/memory APIs.
I replace 'int len' in a stupid way, but it looks not safety.
$ sed -i 's/int len/uint32_t len/' exec.c
$ make # and check compiling errors
$ sed -i 's/int len/uint32_t len/' cpu-all.h
$ make -i 's/int len/uint32_t len/' include/exec/cpu-common.h
$ make -i 's/int len/uint32_t len/' include/exec/memory.h
$ make # all errors gone

CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Peter Crosthwaite <crosthwaite.peter@gmail.com>
CC: Richard Henderson <rth@twiddle.net>

Li Zhijian (3):
  i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  change size type from int to int64_t on load_image()
  change int len to uin32_t len

 exec.c                    | 42 +++++++++++++++++++++---------------------
 hw/core/loader.c          |  3 ++-
 hw/i386/pc.c              |  6 ++++++
 include/exec/cpu-all.h    |  2 +-
 include/exec/cpu-common.h | 10 +++++-----
 include/exec/memory.h     | 20 ++++++++++----------
 6 files changed, 45 insertions(+), 38 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-08 10:59 [Qemu-devel] [RFC/PoC PATCH 0/3] support initrd size up to 4G Li Zhijian
@ 2018-11-08 10:59 ` Li Zhijian
  2018-11-08 10:59   ` [Qemu-devel] [RFC/PoC PATCH 2/3] change size type from int to int64_t on load_image() Li Zhijian
  2018-11-08 11:06   ` [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd Peter Maydell
  2018-11-08 20:12 ` [Qemu-devel] [RFC/PoC PATCH 0/3] support initrd size up to 4G Wainer dos Santos Moschetta
  1 sibling, 2 replies; 22+ messages in thread
From: Li Zhijian @ 2018-11-08 10:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: zhijianx.li, Li Zhijian, Philip Li

x86/x86_64 has alredy supported 4G initrd.

linux/arch/x86/boot/header.S:
 # (Header version 0x0203 or later) the highest safe address for the contents
 # of an initrd. The current kernel allows up to 4 GB, but leave it at 2 GB to
 # avoid possible bootloader bugs.

CC: Philip Li <philip.li@intel.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 hw/i386/pc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index cd5029c..e1b910f 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -913,6 +913,12 @@ static void load_linux(PCMachineState *pcms,
     /* highest address for loading the initrd */
     if (protocol >= 0x203) {
         initrd_max = ldl_p(header+0x22c);
+        if (initrd_max == 0x7fffffff) {
+            /* for some reasons, initrd_max is hard code with 0x7fffffff
+             * hard code to 4G - 1 to allow 4G initrd
+             */
+            initrd_max = UINT32_MAX - 1;
+        }
     } else {
         initrd_max = 0x37ffffff;
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC/PoC PATCH 2/3] change size type from int to int64_t on load_image()
  2018-11-08 10:59 ` [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd Li Zhijian
@ 2018-11-08 10:59   ` Li Zhijian
  2018-11-08 10:59     ` [Qemu-devel] [PATCH 3/3] change int len to uin32_t len Li Zhijian
  2018-11-08 11:04     ` [Qemu-devel] [RFC/PoC PATCH 2/3] change size type from int to int64_t on load_image() Peter Maydell
  2018-11-08 11:06   ` [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd Peter Maydell
  1 sibling, 2 replies; 22+ messages in thread
From: Li Zhijian @ 2018-11-08 10:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: zhijianx.li, Li Zhijian, Philip Li

allow load_image to load >= 2G file

CC: Philip Li <philip.li@intel.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 hw/core/loader.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/core/loader.c b/hw/core/loader.c
index aa0b3fc..8fbc4bd 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -77,7 +77,8 @@ int64_t get_image_size(const char *filename)
 /* deprecated, because caller does not specify buffer size! */
 int load_image(const char *filename, uint8_t *addr)
 {
-    int fd, size;
+    int fd;
+    int64_t size;
     fd = open(filename, O_RDONLY | O_BINARY);
     if (fd < 0)
         return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH 3/3] change int len to uin32_t len
  2018-11-08 10:59   ` [Qemu-devel] [RFC/PoC PATCH 2/3] change size type from int to int64_t on load_image() Li Zhijian
@ 2018-11-08 10:59     ` Li Zhijian
  2018-11-08 11:14       ` Peter Maydell
  2018-11-08 11:04     ` [Qemu-devel] [RFC/PoC PATCH 2/3] change size type from int to int64_t on load_image() Peter Maydell
  1 sibling, 1 reply; 22+ messages in thread
From: Li Zhijian @ 2018-11-08 10:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: zhijianx.li, Li Zhijian, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Philip Li

In order to support >= 2G initrd, we need to change len type from int to
uin32_t.

Below is the flow sample to show how qemu copy initrd from qemu
side to VM when using optionroms bootlinux_dma.bin:
dma_memory_read(uint32_t len)
  -> dma_memory_rw(uint32_t len)
    -> dma_memory_rw_relaxed(uint32_t len)
      -> address_space_rw(int len) # len overflow
        -> address_space_read_full(int len)

CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Peter Crosthwaite <crosthwaite.peter@gmail.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Philip Li <philip.li@intel.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 exec.c                    | 42 +++++++++++++++++++++---------------------
 include/exec/cpu-all.h    |  2 +-
 include/exec/cpu-common.h | 10 +++++-----
 include/exec/memory.h     | 20 ++++++++++----------
 4 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/exec.c b/exec.c
index bb6170d..0985f52 100644
--- a/exec.c
+++ b/exec.c
@@ -2719,7 +2719,7 @@ static const MemoryRegionOps notdirty_mem_ops = {
 };
 
 /* Generate a debug exception if a watchpoint has been hit.  */
-static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
+static void check_watchpoint(int offset, uint32_t len, MemTxAttrs attrs, int flags)
 {
     CPUState *cpu = current_cpu;
     CPUClass *cc = CPU_GET_CLASS(cpu);
@@ -2848,10 +2848,10 @@ static const MemoryRegionOps watch_mem_ops = {
 };
 
 static MemTxResult flatview_read(FlatView *fv, hwaddr addr,
-                                      MemTxAttrs attrs, uint8_t *buf, int len);
+                                      MemTxAttrs attrs, uint8_t *buf, uint32_t len);
 static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
-                                  const uint8_t *buf, int len);
-static bool flatview_access_valid(FlatView *fv, hwaddr addr, int len,
+                                  const uint8_t *buf, uint32_t len);
+static bool flatview_access_valid(FlatView *fv, hwaddr addr, uint32_t len,
                                   bool is_write, MemTxAttrs attrs);
 
 static MemTxResult subpage_read(void *opaque, hwaddr addr, uint64_t *data,
@@ -3099,7 +3099,7 @@ MemoryRegion *get_system_io(void)
 /* physical memory access (slow version, mainly for debug) */
 #if defined(CONFIG_USER_ONLY)
 int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
-                        uint8_t *buf, int len, int is_write)
+                        uint8_t *buf, uint32_t len, int is_write)
 {
     int l, flags;
     target_ulong page;
@@ -3215,7 +3215,7 @@ static bool prepare_mmio_access(MemoryRegion *mr)
 static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
                                            MemTxAttrs attrs,
                                            const uint8_t *buf,
-                                           int len, hwaddr addr1,
+                                           uint32_t len, hwaddr addr1,
                                            hwaddr l, MemoryRegion *mr)
 {
     uint8_t *ptr;
@@ -3260,7 +3260,7 @@ static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
 
 /* Called from RCU critical section.  */
 static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
-                                  const uint8_t *buf, int len)
+                                  const uint8_t *buf, uint32_t len)
 {
     hwaddr l;
     hwaddr addr1;
@@ -3278,7 +3278,7 @@ static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
 /* Called within RCU critical section.  */
 MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr,
                                    MemTxAttrs attrs, uint8_t *buf,
-                                   int len, hwaddr addr1, hwaddr l,
+                                   uint32_t len, hwaddr addr1, hwaddr l,
                                    MemoryRegion *mr)
 {
     uint8_t *ptr;
@@ -3321,7 +3321,7 @@ MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr,
 
 /* Called from RCU critical section.  */
 static MemTxResult flatview_read(FlatView *fv, hwaddr addr,
-                                 MemTxAttrs attrs, uint8_t *buf, int len)
+                                 MemTxAttrs attrs, uint8_t *buf, uint32_t len)
 {
     hwaddr l;
     hwaddr addr1;
@@ -3334,7 +3334,7 @@ static MemTxResult flatview_read(FlatView *fv, hwaddr addr,
 }
 
 MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
-                                    MemTxAttrs attrs, uint8_t *buf, int len)
+                                    MemTxAttrs attrs, uint8_t *buf, uint32_t len)
 {
     MemTxResult result = MEMTX_OK;
     FlatView *fv;
@@ -3351,7 +3351,7 @@ MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
 
 MemTxResult address_space_write(AddressSpace *as, hwaddr addr,
                                 MemTxAttrs attrs,
-                                const uint8_t *buf, int len)
+                                const uint8_t *buf, uint32_t len)
 {
     MemTxResult result = MEMTX_OK;
     FlatView *fv;
@@ -3367,7 +3367,7 @@ MemTxResult address_space_write(AddressSpace *as, hwaddr addr,
 }
 
 MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
-                             uint8_t *buf, int len, bool is_write)
+                             uint8_t *buf, uint32_t len, bool is_write)
 {
     if (is_write) {
         return address_space_write(as, addr, attrs, buf, len);
@@ -3377,7 +3377,7 @@ MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
 }
 
 void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
-                            int len, int is_write)
+                            uint32_t len, int is_write)
 {
     address_space_rw(&address_space_memory, addr, MEMTXATTRS_UNSPECIFIED,
                      buf, len, is_write);
@@ -3389,7 +3389,7 @@ enum write_rom_type {
 };
 
 static inline void cpu_physical_memory_write_rom_internal(AddressSpace *as,
-    hwaddr addr, const uint8_t *buf, int len, enum write_rom_type type)
+    hwaddr addr, const uint8_t *buf, uint32_t len, enum write_rom_type type)
 {
     hwaddr l;
     uint8_t *ptr;
@@ -3427,12 +3427,12 @@ static inline void cpu_physical_memory_write_rom_internal(AddressSpace *as,
 
 /* used for ROM loading : can write in RAM and ROM */
 void cpu_physical_memory_write_rom(AddressSpace *as, hwaddr addr,
-                                   const uint8_t *buf, int len)
+                                   const uint8_t *buf, uint32_t len)
 {
     cpu_physical_memory_write_rom_internal(as, addr, buf, len, WRITE_DATA);
 }
 
-void cpu_flush_icache_range(hwaddr start, int len)
+void cpu_flush_icache_range(hwaddr start, uint32_t len)
 {
     /*
      * This function should do the same thing as an icache flush that was
@@ -3534,7 +3534,7 @@ static void cpu_notify_map_clients(void)
     qemu_mutex_unlock(&map_client_list_lock);
 }
 
-static bool flatview_access_valid(FlatView *fv, hwaddr addr, int len,
+static bool flatview_access_valid(FlatView *fv, hwaddr addr, uint32_t len,
                                   bool is_write, MemTxAttrs attrs)
 {
     MemoryRegion *mr;
@@ -3557,7 +3557,7 @@ static bool flatview_access_valid(FlatView *fv, hwaddr addr, int len,
 }
 
 bool address_space_access_valid(AddressSpace *as, hwaddr addr,
-                                int len, bool is_write,
+                                uint32_t len, bool is_write,
                                 MemTxAttrs attrs)
 {
     FlatView *fv;
@@ -3810,7 +3810,7 @@ static inline MemoryRegion *address_space_translate_cached(
  */
 void
 address_space_read_cached_slow(MemoryRegionCache *cache, hwaddr addr,
-                                   void *buf, int len)
+                                   void *buf, uint32_t len)
 {
     hwaddr addr1, l;
     MemoryRegion *mr;
@@ -3828,7 +3828,7 @@ address_space_read_cached_slow(MemoryRegionCache *cache, hwaddr addr,
  */
 void
 address_space_write_cached_slow(MemoryRegionCache *cache, hwaddr addr,
-                                    const void *buf, int len)
+                                    const void *buf, uint32_t len)
 {
     hwaddr addr1, l;
     MemoryRegion *mr;
@@ -3851,7 +3851,7 @@ address_space_write_cached_slow(MemoryRegionCache *cache, hwaddr addr,
 
 /* virtual memory access for debug (includes writing to ROM) */
 int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
-                        uint8_t *buf, int len, int is_write)
+                        uint8_t *buf, uint32_t len, int is_write)
 {
     int l;
     hwaddr phys_addr;
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 117d2fb..635394c 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -367,7 +367,7 @@ void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf);
 #endif /* !CONFIG_USER_ONLY */
 
 int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
-                        uint8_t *buf, int len, int is_write);
+                        uint8_t *buf, uint32_t len, int is_write);
 
 int cpu_exec(CPUState *cpu);
 
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 18b40d6..bdbb620 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -83,14 +83,14 @@ size_t qemu_ram_pagesize(RAMBlock *block);
 size_t qemu_ram_pagesize_largest(void);
 
 void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
-                            int len, int is_write);
+                            uint32_t len, int is_write);
 static inline void cpu_physical_memory_read(hwaddr addr,
-                                            void *buf, int len)
+                                            void *buf, uint32_t len)
 {
     cpu_physical_memory_rw(addr, buf, len, 0);
 }
 static inline void cpu_physical_memory_write(hwaddr addr,
-                                             const void *buf, int len)
+                                             const void *buf, uint32_t len)
 {
     cpu_physical_memory_rw(addr, (void *)buf, len, 1);
 }
@@ -112,8 +112,8 @@ bool cpu_physical_memory_is_io(hwaddr phys_addr);
 void qemu_flush_coalesced_mmio_buffer(void);
 
 void cpu_physical_memory_write_rom(AddressSpace *as, hwaddr addr,
-                                   const uint8_t *buf, int len);
-void cpu_flush_icache_range(hwaddr start, int len);
+                                   const uint8_t *buf, uint32_t len);
+void cpu_flush_icache_range(hwaddr start, uint32_t len);
 
 extern struct MemoryRegion io_mem_rom;
 extern struct MemoryRegion io_mem_notdirty;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 667466b..6c1ec75 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1748,7 +1748,7 @@ void address_space_destroy(AddressSpace *as);
  */
 MemTxResult address_space_rw(AddressSpace *as, hwaddr addr,
                              MemTxAttrs attrs, uint8_t *buf,
-                             int len, bool is_write);
+                             uint32_t len, bool is_write);
 
 /**
  * address_space_write: write to address space.
@@ -1765,7 +1765,7 @@ MemTxResult address_space_rw(AddressSpace *as, hwaddr addr,
  */
 MemTxResult address_space_write(AddressSpace *as, hwaddr addr,
                                 MemTxAttrs attrs,
-                                const uint8_t *buf, int len);
+                                const uint8_t *buf, uint32_t len);
 
 /* address_space_ld*: load from an address space
  * address_space_st*: store to an address space
@@ -1966,7 +1966,7 @@ static inline MemoryRegion *address_space_translate(AddressSpace *as,
  * @is_write: indicates the transfer direction
  * @attrs: memory attributes
  */
-bool address_space_access_valid(AddressSpace *as, hwaddr addr, int len,
+bool address_space_access_valid(AddressSpace *as, hwaddr addr, uint32_t len,
                                 bool is_write, MemTxAttrs attrs);
 
 /* address_space_map: map a physical memory region into a host virtual address
@@ -2003,19 +2003,19 @@ void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
 
 /* Internal functions, part of the implementation of address_space_read.  */
 MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
-                                    MemTxAttrs attrs, uint8_t *buf, int len);
+                                    MemTxAttrs attrs, uint8_t *buf, uint32_t len);
 MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr,
                                    MemTxAttrs attrs, uint8_t *buf,
-                                   int len, hwaddr addr1, hwaddr l,
+                                   uint32_t len, hwaddr addr1, hwaddr l,
                                    MemoryRegion *mr);
 void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr);
 
 /* Internal functions, part of the implementation of address_space_read_cached
  * and address_space_write_cached.  */
 void address_space_read_cached_slow(MemoryRegionCache *cache,
-                                    hwaddr addr, void *buf, int len);
+                                    hwaddr addr, void *buf, uint32_t len);
 void address_space_write_cached_slow(MemoryRegionCache *cache,
-                                     hwaddr addr, const void *buf, int len);
+                                     hwaddr addr, const void *buf, uint32_t len);
 
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 {
@@ -2043,7 +2043,7 @@ static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 static inline __attribute__((__always_inline__))
 MemTxResult address_space_read(AddressSpace *as, hwaddr addr,
                                MemTxAttrs attrs, uint8_t *buf,
-                               int len)
+                               uint32_t len)
 {
     MemTxResult result = MEMTX_OK;
     hwaddr l, addr1;
@@ -2082,7 +2082,7 @@ MemTxResult address_space_read(AddressSpace *as, hwaddr addr,
  */
 static inline void
 address_space_read_cached(MemoryRegionCache *cache, hwaddr addr,
-                          void *buf, int len)
+                          void *buf, uint32_t len)
 {
     assert(addr < cache->len && len <= cache->len - addr);
     if (likely(cache->ptr)) {
@@ -2102,7 +2102,7 @@ address_space_read_cached(MemoryRegionCache *cache, hwaddr addr,
  */
 static inline void
 address_space_write_cached(MemoryRegionCache *cache, hwaddr addr,
-                           void *buf, int len)
+                           void *buf, uint32_t len)
 {
     assert(addr < cache->len && len <= cache->len - addr);
     if (likely(cache->ptr)) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 2/3] change size type from int to int64_t on load_image()
  2018-11-08 10:59   ` [Qemu-devel] [RFC/PoC PATCH 2/3] change size type from int to int64_t on load_image() Li Zhijian
  2018-11-08 10:59     ` [Qemu-devel] [PATCH 3/3] change int len to uin32_t len Li Zhijian
@ 2018-11-08 11:04     ` Peter Maydell
  2018-11-09  2:48       ` Li Zhijian
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2018-11-08 11:04 UTC (permalink / raw)
  To: Li Zhijian; +Cc: QEMU Developers, Philip Li, zhijianx.li

On 8 November 2018 at 10:59, Li Zhijian <lizhijian@cn.fujitsu.com> wrote:
> allow load_image to load >= 2G file
>
> CC: Philip Li <philip.li@intel.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  hw/core/loader.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/core/loader.c b/hw/core/loader.c
> index aa0b3fc..8fbc4bd 100644
> --- a/hw/core/loader.c
> +++ b/hw/core/loader.c
> @@ -77,7 +77,8 @@ int64_t get_image_size(const char *filename)
>  /* deprecated, because caller does not specify buffer size! */
>  int load_image(const char *filename, uint8_t *addr)
>  {
> -    int fd, size;
> +    int fd;
> +    int64_t size;
>      fd = open(filename, O_RDONLY | O_BINARY);
>      if (fd < 0)
>          return -1;

This function returns "size", so changing the type of "size"
without also changing its return type looks wrong.

load_image_size() uses ssize_t here, which may be what we want.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-08 10:59 ` [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd Li Zhijian
  2018-11-08 10:59   ` [Qemu-devel] [RFC/PoC PATCH 2/3] change size type from int to int64_t on load_image() Li Zhijian
@ 2018-11-08 11:06   ` Peter Maydell
  2018-11-09  1:47       ` Li Zhijian
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2018-11-08 11:06 UTC (permalink / raw)
  To: Li Zhijian; +Cc: QEMU Developers, Philip Li, zhijianx.li

On 8 November 2018 at 10:59, Li Zhijian <lizhijian@cn.fujitsu.com> wrote:
> x86/x86_64 has alredy supported 4G initrd.
>
> linux/arch/x86/boot/header.S:
>  # (Header version 0x0203 or later) the highest safe address for the contents
>  # of an initrd. The current kernel allows up to 4 GB, but leave it at 2 GB to
>  # avoid possible bootloader bugs.
>
> CC: Philip Li <philip.li@intel.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  hw/i386/pc.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index cd5029c..e1b910f 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -913,6 +913,12 @@ static void load_linux(PCMachineState *pcms,
>      /* highest address for loading the initrd */
>      if (protocol >= 0x203) {
>          initrd_max = ldl_p(header+0x22c);
> +        if (initrd_max == 0x7fffffff) {
> +            /* for some reasons, initrd_max is hard code with 0x7fffffff
> +             * hard code to 4G - 1 to allow 4G initrd
> +             */
> +            initrd_max = UINT32_MAX - 1;
> +        }

I don't understand this. If the header of the file we're using
says "this is the maximum", then we should trust the header to
in fact not be lying to us, shouldn't we ?

If the kernel initrd creation process creates an initrd which
is larger than 2GB and also claims that it can't be placed
with any part of it above 2GB, then that sounds like a bug
in the initrd creation process...

>      } else {
>          initrd_max = 0x37ffffff;
>      }

This patch should come last in the series: only after we have fixed all
of QEMU's internal plumbing to handle larger initrd sizes should we
enable it.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] change int len to uin32_t len
  2018-11-08 10:59     ` [Qemu-devel] [PATCH 3/3] change int len to uin32_t len Li Zhijian
@ 2018-11-08 11:14       ` Peter Maydell
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Maydell @ 2018-11-08 11:14 UTC (permalink / raw)
  To: Li Zhijian
  Cc: QEMU Developers, Peter Crosthwaite, Philip Li, zhijianx.li,
	Paolo Bonzini, Richard Henderson

On 8 November 2018 at 10:59, Li Zhijian <lizhijian@cn.fujitsu.com> wrote:
> In order to support >= 2G initrd, we need to change len type from int to
> uin32_t.
>
> Below is the flow sample to show how qemu copy initrd from qemu
> side to VM when using optionroms bootlinux_dma.bin:
> dma_memory_read(uint32_t len)
>   -> dma_memory_rw(uint32_t len)
>     -> dma_memory_rw_relaxed(uint32_t len)
>       -> address_space_rw(int len) # len overflow
>         -> address_space_read_full(int len)
>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Peter Crosthwaite <crosthwaite.peter@gmail.com>
> CC: Richard Henderson <rth@twiddle.net>
> CC: Philip Li <philip.li@intel.com>
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>

If we change this, then "uint32_t" is not the right type. If we
want to allow writes of large chunks of memory through this API
then the right type for the length of a chunk of guest memory
is "hwaddr".

The other way to approach this would be to say that devices
doing DMA mustn't try to do it in enormous chunks like this,
but need to split it up. If we had a coherent story for how
to avoid dma-engine device emulations from sitting in a loop
forever doing guest-requested DMA this would probably tie in
with that, as part of saying "don't do more than X amount of
work at a time without yielding control back to QEMU".

thanks
-- PMM

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 0/3] support initrd size up to 4G
  2018-11-08 10:59 [Qemu-devel] [RFC/PoC PATCH 0/3] support initrd size up to 4G Li Zhijian
  2018-11-08 10:59 ` [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd Li Zhijian
@ 2018-11-08 20:12 ` Wainer dos Santos Moschetta
  1 sibling, 0 replies; 22+ messages in thread
From: Wainer dos Santos Moschetta @ 2018-11-08 20:12 UTC (permalink / raw)
  To: Li Zhijian, qemu-devel
  Cc: Paolo Bonzini, zhijianx.li, Richard Henderson, Peter Crosthwaite

Under review, the following patch adds acceptance tests for the initrd 
option:

https://www.mail-archive.com/qemu-devel@nongnu.org/msg567776.html

The test should be updated in case this series get merged. Maybe the 
best would be to include those tests along with this series actually.

- Wainer


On 11/08/2018 08:59 AM, Li Zhijian wrote:
> Long long ago, linux kernel have supported up to 4G initrd, but it's header still
> hard code to allow 2G - 1 only.
>   # (Header version 0x0203 or later) the highest safe address for the contents
>   # of an initrd. The current kernel allows up to 4 GB, but leave it at 2 GB to
>   # avoid possible bootloader bugs.
>
> kexec is one of the known scenario which has supported up to 4G initrd.
>
> This patches is to enable up to 4G initrd, Seabios + optionrom(linuxboot_dma)
> works as expected so far.
>
> 3/3 has a huge change of address/memory APIs.
> I replace 'int len' in a stupid way, but it looks not safety.
> $ sed -i 's/int len/uint32_t len/' exec.c
> $ make # and check compiling errors
> $ sed -i 's/int len/uint32_t len/' cpu-all.h
> $ make -i 's/int len/uint32_t len/' include/exec/cpu-common.h
> $ make -i 's/int len/uint32_t len/' include/exec/memory.h
> $ make # all errors gone
>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Peter Crosthwaite <crosthwaite.peter@gmail.com>
> CC: Richard Henderson <rth@twiddle.net>
>
> Li Zhijian (3):
>    i386: set initrd_max to 4G - 1 to allow up to 4G initrd
>    change size type from int to int64_t on load_image()
>    change int len to uin32_t len
>
>   exec.c                    | 42 +++++++++++++++++++++---------------------
>   hw/core/loader.c          |  3 ++-
>   hw/i386/pc.c              |  6 ++++++
>   include/exec/cpu-all.h    |  2 +-
>   include/exec/cpu-common.h | 10 +++++-----
>   include/exec/memory.h     | 20 ++++++++++----------
>   6 files changed, 45 insertions(+), 38 deletions(-)
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-08 11:06   ` [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd Peter Maydell
@ 2018-11-09  1:47       ` Li Zhijian
  0 siblings, 0 replies; 22+ messages in thread
From: Li Zhijian @ 2018-11-09  1:47 UTC (permalink / raw)
  To: Peter Maydell, x86, hpa, bp, mingo, tglx
  Cc: QEMU Developers, Philip Li, zhijianx.li, linux-kernel


On 11/08/2018 07:06 PM, Peter Maydell wrote:
> On 8 November 2018 at 10:59, Li Zhijian <lizhijian@cn.fujitsu.com> wrote:
>> x86/x86_64 has alredy supported 4G initrd.
>>
>> linux/arch/x86/boot/header.S:
>>   # (Header version 0x0203 or later) the highest safe address for the contents
>>   # of an initrd. The current kernel allows up to 4 GB, but leave it at 2 GB to
>>   # avoid possible bootloader bugs.
>>
>> CC: Philip Li <philip.li@intel.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   hw/i386/pc.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index cd5029c..e1b910f 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -913,6 +913,12 @@ static void load_linux(PCMachineState *pcms,
>>       /* highest address for loading the initrd */
>>       if (protocol >= 0x203) {
>>           initrd_max = ldl_p(header+0x22c);
>> +        if (initrd_max == 0x7fffffff) {
>> +            /* for some reasons, initrd_max is hard code with 0x7fffffff
>> +             * hard code to 4G - 1 to allow 4G initrd
>> +             */
>> +            initrd_max = UINT32_MAX - 1;
>> +        }
> I don't understand this. If the header of the file we're using
> says "this is the maximum", then we should trust the header to
> in fact not be lying to us, shouldn't we ?
>
> If the kernel initrd creation process creates an initrd which
> is larger than 2GB and also claims that it can't be placed
> with any part of it above 2GB, then that sounds like a bug
> in the initrd creation process...

Exactly, it's a real problem.

Add x86 maintainers and LKML:

The background is that QEMU want to support up to 4G initrd. but linux header (
initrd_addr_max field) only allow 2G-1.
Is one of the below approaches reasonable:
1) change initrd_addr_max to 4G-1 directly simply(arch/x86/boot/header.S)?
2) lie QEMU bootloader the initrd_addr_max is 4G-1 even though header said 2G-1
3) any else






>
>>       } else {
>>           initrd_max = 0x37ffffff;
>>       }
> This patch should come last in the series: only after we have fixed all
> of QEMU's internal plumbing to handle larger initrd sizes should we
> enable it.

Got it.

Thanks
Zhijian

>
> thanks
> -- PMM
>
>
>




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
@ 2018-11-09  1:47       ` Li Zhijian
  0 siblings, 0 replies; 22+ messages in thread
From: Li Zhijian @ 2018-11-09  1:47 UTC (permalink / raw)
  To: Peter Maydell, x86, hpa, bp, mingo, tglx
  Cc: QEMU Developers, Philip Li, zhijianx.li, linux-kernel


On 11/08/2018 07:06 PM, Peter Maydell wrote:
> On 8 November 2018 at 10:59, Li Zhijian <lizhijian@cn.fujitsu.com> wrote:
>> x86/x86_64 has alredy supported 4G initrd.
>>
>> linux/arch/x86/boot/header.S:
>>   # (Header version 0x0203 or later) the highest safe address for the contents
>>   # of an initrd. The current kernel allows up to 4 GB, but leave it at 2 GB to
>>   # avoid possible bootloader bugs.
>>
>> CC: Philip Li <philip.li@intel.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   hw/i386/pc.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index cd5029c..e1b910f 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -913,6 +913,12 @@ static void load_linux(PCMachineState *pcms,
>>       /* highest address for loading the initrd */
>>       if (protocol >= 0x203) {
>>           initrd_max = ldl_p(header+0x22c);
>> +        if (initrd_max == 0x7fffffff) {
>> +            /* for some reasons, initrd_max is hard code with 0x7fffffff
>> +             * hard code to 4G - 1 to allow 4G initrd
>> +             */
>> +            initrd_max = UINT32_MAX - 1;
>> +        }
> I don't understand this. If the header of the file we're using
> says "this is the maximum", then we should trust the header to
> in fact not be lying to us, shouldn't we ?
>
> If the kernel initrd creation process creates an initrd which
> is larger than 2GB and also claims that it can't be placed
> with any part of it above 2GB, then that sounds like a bug
> in the initrd creation process...

Exactly, it's a real problem.

Add x86 maintainers and LKML:

The background is that QEMU want to support up to 4G initrd. but linux header (
initrd_addr_max field) only allow 2G-1.
Is one of the below approaches reasonable:
1) change initrd_addr_max to 4G-1 directly simply(arch/x86/boot/header.S)?
2) lie QEMU bootloader the initrd_addr_max is 4G-1 even though header said 2G-1
3) any else






>
>>       } else {
>>           initrd_max = 0x37ffffff;
>>       }
> This patch should come last in the series: only after we have fixed all
> of QEMU's internal plumbing to handle larger initrd sizes should we
> enable it.

Got it.

Thanks
Zhijian

>
> thanks
> -- PMM
>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 2/3] change size type from int to int64_t on load_image()
  2018-11-08 11:04     ` [Qemu-devel] [RFC/PoC PATCH 2/3] change size type from int to int64_t on load_image() Peter Maydell
@ 2018-11-09  2:48       ` Li Zhijian
  0 siblings, 0 replies; 22+ messages in thread
From: Li Zhijian @ 2018-11-09  2:48 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Philip Li, zhijianx.li


On 11/08/2018 07:04 PM, Peter Maydell wrote:
> On 8 November 2018 at 10:59, Li Zhijian <lizhijian@cn.fujitsu.com> wrote:
>> allow load_image to load >= 2G file
>>
>> CC: Philip Li <philip.li@intel.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   hw/core/loader.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/core/loader.c b/hw/core/loader.c
>> index aa0b3fc..8fbc4bd 100644
>> --- a/hw/core/loader.c
>> +++ b/hw/core/loader.c
>> @@ -77,7 +77,8 @@ int64_t get_image_size(const char *filename)
>>   /* deprecated, because caller does not specify buffer size! */
>>   int load_image(const char *filename, uint8_t *addr)
>>   {
>> -    int fd, size;
>> +    int fd;
>> +    int64_t size;
>>       fd = open(filename, O_RDONLY | O_BINARY);
>>       if (fd < 0)
>>           return -1;
> This function returns "size", so changing the type of "size"
> without also changing its return type looks wrong.
>
> load_image_size() uses ssize_t here, which may be what we want.

will change to ssize_t at next version

Thanks
Zhijian

>
> thanks
> -- PMM
>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-09  1:47       ` Li Zhijian
  (?)
@ 2018-11-09  7:20       ` Ingo Molnar
  2018-11-09  9:57         ` Li Zhijian
  -1 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2018-11-09  7:20 UTC (permalink / raw)
  To: Li Zhijian, Juergen Gross
  Cc: Peter Maydell, x86, hpa, bp, mingo, tglx, QEMU Developers,
	Philip Li, zhijianx.li, linux-kernel, Linus Torvalds,
	Peter Zijlstra, Kees Cook


* Li Zhijian <lizhijian@cn.fujitsu.com> wrote:

> > If the kernel initrd creation process creates an initrd which
> > is larger than 2GB and also claims that it can't be placed
> > with any part of it above 2GB, then that sounds like a bug
> > in the initrd creation process...
> 
> Exactly, it's a real problem.
> 
> Add x86 maintainers and LKML:
> 
> The background is that QEMU want to support up to 4G initrd. but linux header (
> initrd_addr_max field) only allow 2G-1.
> Is one of the below approaches reasonable:
> 1) change initrd_addr_max to 4G-1 directly simply(arch/x86/boot/header.S)?
> 2) lie QEMU bootloader the initrd_addr_max is 4G-1 even though header said 2G-1
> 3) any else

A 10 years old comment from hpa says:

  initrd_addr_max: .long 0x7fffffff
                                        # (Header version 0x0203 or later)
                                        # The highest safe address for
                                        # the contents of an initrd
                                        # The current kernel allows up to 4 GB,
                                        # but leave it at 2 GB to avoid
                                        # possible bootloader bugs.

To avoid the potential of bugs lurking in dozens of major and hundreds of 
minor iterations of various Linux bootloaders I'd prefer a real solution 
and extend it - because if there's a 2GB initrd for some weird reason 
today there might be a 4GB one in two years.

The real solution would be to:

 - Extend the boot protocol with a 64-bit field, named initrd_addr64_max 
   or such.

 - We don't change the old field - but if the new field is set by new
   kernels then new bootloaders can use that as a new initrd_addr64_max
   value. (or reject to load the kernel if the address is too high.)

 - The kernel build should also emit a warning when building larger than 
   2GB initrds, with a list of bootloaders that support the new protocol.

Or something along those lines.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-09  7:20       ` Ingo Molnar
@ 2018-11-09  9:57         ` Li Zhijian
  2018-11-09 10:04           ` Juergen Gross
  0 siblings, 1 reply; 22+ messages in thread
From: Li Zhijian @ 2018-11-09  9:57 UTC (permalink / raw)
  To: Ingo Molnar, Li Zhijian, Juergen Gross
  Cc: Peter Maydell, x86, hpa, bp, mingo, tglx, QEMU Developers,
	Philip Li, linux-kernel, Linus Torvalds, Peter Zijlstra,
	Kees Cook

On 11/9/2018 3:20 PM, Ingo Molnar wrote:
> * Li Zhijian <lizhijian@cn.fujitsu.com> wrote:
>
>>> If the kernel initrd creation process creates an initrd which
>>> is larger than 2GB and also claims that it can't be placed
>>> with any part of it above 2GB, then that sounds like a bug
>>> in the initrd creation process...
>> Exactly, it's a real problem.
>>
>> Add x86 maintainers and LKML:
>>
>> The background is that QEMU want to support up to 4G initrd. but linux header (
>> initrd_addr_max field) only allow 2G-1.
>> Is one of the below approaches reasonable:
>> 1) change initrd_addr_max to 4G-1 directly simply(arch/x86/boot/header.S)?
>> 2) lie QEMU bootloader the initrd_addr_max is 4G-1 even though header said 2G-1
>> 3) any else
> A 10 years old comment from hpa says:
>
>    initrd_addr_max: .long 0x7fffffff
>                                          # (Header version 0x0203 or later)
>                                          # The highest safe address for
>                                          # the contents of an initrd
>                                          # The current kernel allows up to 4 GB,
>                                          # but leave it at 2 GB to avoid
>                                          # possible bootloader bugs.
>
> To avoid the potential of bugs lurking in dozens of major and hundreds of
> minor iterations of various Linux bootloaders I'd prefer a real solution
> and extend it - because if there's a 2GB initrd for some weird reason
> today there might be a 4GB one in two years.

thank a lots. that's amazing.


>
> The real solution would be to:
>
>   - Extend the boot protocol with a 64-bit field, named initrd_addr64_max
>     or such.
>   - We don't change the old field - but if the new field is set by new
>     kernels then new bootloaders can use that as a new initrd_addr64_max
>     value. (or reject to load the kernel if the address is too high.)
>
>   - The kernel build should also emit a warning when building larger than
>     2GB initrds, with a list of bootloaders that support the new protocol.

Actually i just knew QEMU(Seabios + optionrom(linuxboot_dma.bin)) can support ~4GB initrd so far.

i just drafted at patch to add this field. could you have a look.
another patch which is to document initrd_addr64_max is ongoing.

commit db463ac9c1975f115d1ce2acb82d530c2b63b888
Author: Li Zhijian <lizhijian@cn.fujitsu.com>
Date:   Fri Nov 9 17:24:14 2018 +0800

     x86: Add header field initrd_addr64_max
     
     Years ago, kernel had support load ~4GB initrd. But for some weird reasons (
     avoid possible bootloader bugs), it only allow leave initrd under 2GB address
     space(see initrd_addr_max fild at arch/x86/boot/header.S).
     
     So modern bootloaders have not chance to load >=2G initrd previously.
     
     To avoid the potential of bugs lurking in dozens of major and hundreds of
     minor iterations of various Linux bootloaders. Ingo suggests to add a new field
     initrd_addr64_max. If bootloader believes that it can load initrd to >=2G
     address space, it can use initrd_addr64_max as the maximum loading address in
     stead of the old field initrd_addr_max.

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 4c881c8..5fc3ebe 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -300,7 +300,7 @@ _start:
         # Part 2 of the header, from the old setup.S
  
                 .ascii  "HdrS"          # header signature
-               .word   0x020e          # header version number (>= 0x0105)
+               .word   0x020f          # header version number (>= 0x0105)
                                         # or else old loadlin-1.5 will fail)
                 .globl realmode_swtch
  realmode_swtch:        .word   0, 0            # default_switch, SETUPSEG
@@ -562,6 +562,12 @@ acpi_rsdp_addr:            .quad 0                 # 64-bit physical pointer to the
                                                 # ACPI RSDP table, added with
                                                 # version 2.14
  
+#ifdef CONFIG_INITRD_SIZE_4GB
+initrd_addr64_max:     .quad 0xffffffff        # allow ~4G initrd since 2.15
+#else
+initrd_addr64_max:     .quad 0
+#endif
+
  # End of setup header #####################################################
  
         .section ".entrytext", "ax"
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 22f89d0..b86013d 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -90,6 +90,7 @@ struct setup_header {
         __u32   init_size;
         __u32   handover_offset;
         __u64   acpi_rsdp_addr;
+       __u64   initrd_addr64_max;
  } __attribute__((packed));
  
  struct sys_desc_table {
diff --git a/init/Kconfig b/init/Kconfig
index a4112e9..611d4af 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1080,6 +1080,14 @@ config BLK_DEV_INITRD
  
           If unsure say Y.
  
+config INITRD_SIZE_4GB
+       bool "4G size initrd support"
+       depends on (X86 || X86_64)
+       help
+         This option enables support ~4GB initrd.
+
+         if unsure say N.
+
  if BLK_DEV_INITRD
  
  source "usr/Kconfig"

Thanks
Zhijian




> Or something along those lines.
>
> Thanks,
>
> 	Ingo

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-09  9:57         ` Li Zhijian
@ 2018-11-09 10:04           ` Juergen Gross
  2018-11-09 13:11             ` Li Zhijian
  2018-11-09 13:40             ` Li Zhijian
  0 siblings, 2 replies; 22+ messages in thread
From: Juergen Gross @ 2018-11-09 10:04 UTC (permalink / raw)
  To: Li Zhijian, Ingo Molnar, Li Zhijian
  Cc: Peter Maydell, x86, hpa, bp, mingo, tglx, QEMU Developers,
	Philip Li, linux-kernel, Linus Torvalds, Peter Zijlstra,
	Kees Cook

On 09/11/2018 10:57, Li Zhijian wrote:
> On 11/9/2018 3:20 PM, Ingo Molnar wrote:
>> * Li Zhijian <lizhijian@cn.fujitsu.com> wrote:
>>
>>>> If the kernel initrd creation process creates an initrd which
>>>> is larger than 2GB and also claims that it can't be placed
>>>> with any part of it above 2GB, then that sounds like a bug
>>>> in the initrd creation process...
>>> Exactly, it's a real problem.
>>>
>>> Add x86 maintainers and LKML:
>>>
>>> The background is that QEMU want to support up to 4G initrd. but
>>> linux header (
>>> initrd_addr_max field) only allow 2G-1.
>>> Is one of the below approaches reasonable:
>>> 1) change initrd_addr_max to 4G-1 directly
>>> simply(arch/x86/boot/header.S)?
>>> 2) lie QEMU bootloader the initrd_addr_max is 4G-1 even though header
>>> said 2G-1
>>> 3) any else
>> A 10 years old comment from hpa says:
>>
>>    initrd_addr_max: .long 0x7fffffff
>>                                          # (Header version 0x0203 or
>> later)
>>                                          # The highest safe address for
>>                                          # the contents of an initrd
>>                                          # The current kernel allows
>> up to 4 GB,
>>                                          # but leave it at 2 GB to avoid
>>                                          # possible bootloader bugs.
>>
>> To avoid the potential of bugs lurking in dozens of major and hundreds of
>> minor iterations of various Linux bootloaders I'd prefer a real solution
>> and extend it - because if there's a 2GB initrd for some weird reason
>> today there might be a 4GB one in two years.
> 
> thank a lots. that's amazing.
> 
> 
>>
>> The real solution would be to:
>>
>>   - Extend the boot protocol with a 64-bit field, named initrd_addr64_max
>>     or such.
>>   - We don't change the old field - but if the new field is set by new
>>     kernels then new bootloaders can use that as a new initrd_addr64_max
>>     value. (or reject to load the kernel if the address is too high.)
>>
>>   - The kernel build should also emit a warning when building larger than
>>     2GB initrds, with a list of bootloaders that support the new
>> protocol.
> 
> Actually i just knew QEMU(Seabios + optionrom(linuxboot_dma.bin)) can
> support ~4GB initrd so far.
> 
> i just drafted at patch to add this field. could you have a look.
> another patch which is to document initrd_addr64_max is ongoing.
> 
> commit db463ac9c1975f115d1ce2acb82d530c2b63b888
> Author: Li Zhijian <lizhijian@cn.fujitsu.com>
> Date:   Fri Nov 9 17:24:14 2018 +0800
> 
>     x86: Add header field initrd_addr64_max
>         Years ago, kernel had support load ~4GB initrd. But for some
> weird reasons (
>     avoid possible bootloader bugs), it only allow leave initrd under
> 2GB address
>     space(see initrd_addr_max fild at arch/x86/boot/header.S).
>         So modern bootloaders have not chance to load >=2G initrd
> previously.
>         To avoid the potential of bugs lurking in dozens of major and
> hundreds of
>     minor iterations of various Linux bootloaders. Ingo suggests to add
> a new field
>     initrd_addr64_max. If bootloader believes that it can load initrd to
>>=2G
>     address space, it can use initrd_addr64_max as the maximum loading
> address in
>     stead of the old field initrd_addr_max.
> 
> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
> index 4c881c8..5fc3ebe 100644
> --- a/arch/x86/boot/header.S
> +++ b/arch/x86/boot/header.S
> @@ -300,7 +300,7 @@ _start:
>         # Part 2 of the header, from the old setup.S
>  
>                 .ascii  "HdrS"          # header signature
> -               .word   0x020e          # header version number (>= 0x0105)
> +               .word   0x020f          # header version number (>= 0x0105)
>                                         # or else old loadlin-1.5 will
> fail)
>                 .globl realmode_swtch
>  realmode_swtch:        .word   0, 0            # default_switch, SETUPSEG
> @@ -562,6 +562,12 @@ acpi_rsdp_addr:            .quad 0                
> # 64-bit physical pointer to the
>                                                 # ACPI RSDP table, added
> with
>                                                 # version 2.14
>  
> +#ifdef CONFIG_INITRD_SIZE_4GB
> +initrd_addr64_max:     .quad 0xffffffff        # allow ~4G initrd since
> 2.15
> +#else
> +initrd_addr64_max:     .quad 0

Shouldn't this be 0x7fffffff?

And please update Documentation/x86/boot.txt


Juergen

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-09 10:04           ` Juergen Gross
@ 2018-11-09 13:11             ` Li Zhijian
  2018-11-09 13:40             ` Li Zhijian
  1 sibling, 0 replies; 22+ messages in thread
From: Li Zhijian @ 2018-11-09 13:11 UTC (permalink / raw)
  To: Juergen Gross, Ingo Molnar, Li Zhijian
  Cc: Peter Maydell, x86, hpa, bp, mingo, tglx, QEMU Developers,
	Philip Li, linux-kernel, Linus Torvalds, Peter Zijlstra,
	Kees Cook

Just noticed that there is a field xloadflags at recent protocol
   60 Protocol 2.12:  (Kernel 3.8) Added the xloadflags field and extension fields
   61                 to struct boot_params for loading bzImage and ramdisk
   62                 above 4G in 64bit.
[snip]
  617 Field name:     xloadflags
  618 Type:           read
  619 Offset/size:    0x236/2
  620 Protocol:       2.12+
  621
  622   This field is a bitmask.
  623
  624   Bit 0 (read): XLF_KERNEL_64
  625         - If 1, this kernel has the legacy 64-bit entry point at 0x200.
  626
  627   Bit 1 (read): XLF_CAN_BE_LOADED_ABOVE_4G
  628         - If 1, kernel/boot_params/cmdline/ramdisk can be above 4G.
  629

maybe we can reuse this field and append a new Bit 5 XLF_INITRD_SIZE_4G or such


thanks
Zhijian

  

On 11/9/2018 6:04 PM, Juergen Gross wrote:
> On 09/11/2018 10:57, Li Zhijian wrote:
>> On 11/9/2018 3:20 PM, Ingo Molnar wrote:
>>> * Li Zhijian <lizhijian@cn.fujitsu.com> wrote:
>>>
>>>>> If the kernel initrd creation process creates an initrd which
>>>>> is larger than 2GB and also claims that it can't be placed
>>>>> with any part of it above 2GB, then that sounds like a bug
>>>>> in the initrd creation process...
>>>> Exactly, it's a real problem.
>>>>
>>>> Add x86 maintainers and LKML:
>>>>
>>>> The background is that QEMU want to support up to 4G initrd. but
>>>> linux header (
>>>> initrd_addr_max field) only allow 2G-1.
>>>> Is one of the below approaches reasonable:
>>>> 1) change initrd_addr_max to 4G-1 directly
>>>> simply(arch/x86/boot/header.S)?
>>>> 2) lie QEMU bootloader the initrd_addr_max is 4G-1 even though header
>>>> said 2G-1
>>>> 3) any else
>>> A 10 years old comment from hpa says:
>>>
>>>     initrd_addr_max: .long 0x7fffffff
>>>                                           # (Header version 0x0203 or
>>> later)
>>>                                           # The highest safe address for
>>>                                           # the contents of an initrd
>>>                                           # The current kernel allows
>>> up to 4 GB,
>>>                                           # but leave it at 2 GB to avoid
>>>                                           # possible bootloader bugs.
>>>
>>> To avoid the potential of bugs lurking in dozens of major and hundreds of
>>> minor iterations of various Linux bootloaders I'd prefer a real solution
>>> and extend it - because if there's a 2GB initrd for some weird reason
>>> today there might be a 4GB one in two years.
>> thank a lots. that's amazing.
>>
>>
>>> The real solution would be to:
>>>
>>>    - Extend the boot protocol with a 64-bit field, named initrd_addr64_max
>>>      or such.
>>>    - We don't change the old field - but if the new field is set by new
>>>      kernels then new bootloaders can use that as a new initrd_addr64_max
>>>      value. (or reject to load the kernel if the address is too high.)
>>>
>>>    - The kernel build should also emit a warning when building larger than
>>>      2GB initrds, with a list of bootloaders that support the new
>>> protocol.
>> Actually i just knew QEMU(Seabios + optionrom(linuxboot_dma.bin)) can
>> support ~4GB initrd so far.
>>
>> i just drafted at patch to add this field. could you have a look.
>> another patch which is to document initrd_addr64_max is ongoing.
>>
>> commit db463ac9c1975f115d1ce2acb82d530c2b63b888
>> Author: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Date:   Fri Nov 9 17:24:14 2018 +0800
>>
>>      x86: Add header field initrd_addr64_max
>>          Years ago, kernel had support load ~4GB initrd. But for some
>> weird reasons (
>>      avoid possible bootloader bugs), it only allow leave initrd under
>> 2GB address
>>      space(see initrd_addr_max fild at arch/x86/boot/header.S).
>>          So modern bootloaders have not chance to load >=2G initrd
>> previously.
>>          To avoid the potential of bugs lurking in dozens of major and
>> hundreds of
>>      minor iterations of various Linux bootloaders. Ingo suggests to add
>> a new field
>>      initrd_addr64_max. If bootloader believes that it can load initrd to
>>> =2G
>>      address space, it can use initrd_addr64_max as the maximum loading
>> address in
>>      stead of the old field initrd_addr_max.
>>
>> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
>> index 4c881c8..5fc3ebe 100644
>> --- a/arch/x86/boot/header.S
>> +++ b/arch/x86/boot/header.S
>> @@ -300,7 +300,7 @@ _start:
>>          # Part 2 of the header, from the old setup.S
>>   
>>                  .ascii  "HdrS"          # header signature
>> -               .word   0x020e          # header version number (>= 0x0105)
>> +               .word   0x020f          # header version number (>= 0x0105)
>>                                          # or else old loadlin-1.5 will
>> fail)
>>                  .globl realmode_swtch
>>   realmode_swtch:        .word   0, 0            # default_switch, SETUPSEG
>> @@ -562,6 +562,12 @@ acpi_rsdp_addr:            .quad 0
>> # 64-bit physical pointer to the
>>                                                  # ACPI RSDP table, added
>> with
>>                                                  # version 2.14
>>   
>> +#ifdef CONFIG_INITRD_SIZE_4GB
>> +initrd_addr64_max:     .quad 0xffffffff        # allow ~4G initrd since
>> 2.15
>> +#else
>> +initrd_addr64_max:     .quad 0
> Shouldn't this be 0x7fffffff?
>
> And please update Documentation/x86/boot.txt
>
>
> Juergen

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-09 10:04           ` Juergen Gross
  2018-11-09 13:11             ` Li Zhijian
@ 2018-11-09 13:40             ` Li Zhijian
  2018-11-09 21:10               ` H. Peter Anvin
  1 sibling, 1 reply; 22+ messages in thread
From: Li Zhijian @ 2018-11-09 13:40 UTC (permalink / raw)
  To: Juergen Gross, Ingo Molnar, Li Zhijian
  Cc: Peter Maydell, x86, hpa, bp, mingo, tglx, QEMU Developers,
	Philip Li, linux-kernel, Linus Torvalds, Peter Zijlstra,
	Kees Cook

Just noticed that there is a field xloadflags at recent protocol
   60 Protocol 2.12:  (Kernel 3.8) Added the xloadflags field and extension fields
   61                 to struct boot_params for loading bzImage and ramdisk
   62                 above 4G in 64bit.
[snip]
  617 Field name:     xloadflags
  618 Type:           read
  619 Offset/size:    0x236/2
  620 Protocol:       2.12+
  621
  622   This field is a bitmask.
  623
  624   Bit 0 (read): XLF_KERNEL_64
  625         - If 1, this kernel has the legacy 64-bit entry point at 0x200.
  626
  627   Bit 1 (read): XLF_CAN_BE_LOADED_ABOVE_4G
  628         - If 1, kernel/boot_params/cmdline/ramdisk can be above 4G.
  629

maybe we can reuse this field and append a new Bit 5 XLF_INITRD_MAX_SIZE_4G and increase header version.
For the old protocol version 2.12+, if  XLF_CAN_BE_LOADED_ABOVE_4G is set, we can also realize ~4GB initrd is allowed.

bootloader side:
if protocol >= 2.15
    if XLF_INITRD_LOAD_BELOW_4G
       support ~4G initrd
    fi
else if protocol >=2.12
    if XLF_CAN_BE_LOADED_ABOVE_4G
     support ~4G initrd
    fi
fi

thanks
Zhijian

  

On 11/9/2018 6:04 PM, Juergen Gross wrote:
> On 09/11/2018 10:57, Li Zhijian wrote:
>> On 11/9/2018 3:20 PM, Ingo Molnar wrote:
>>> * Li Zhijian <lizhijian@cn.fujitsu.com> wrote:
>>>
>>>>> If the kernel initrd creation process creates an initrd which
>>>>> is larger than 2GB and also claims that it can't be placed
>>>>> with any part of it above 2GB, then that sounds like a bug
>>>>> in the initrd creation process...
>>>> Exactly, it's a real problem.
>>>>
>>>> Add x86 maintainers and LKML:
>>>>
>>>> The background is that QEMU want to support up to 4G initrd. but
>>>> linux header (
>>>> initrd_addr_max field) only allow 2G-1.
>>>> Is one of the below approaches reasonable:
>>>> 1) change initrd_addr_max to 4G-1 directly
>>>> simply(arch/x86/boot/header.S)?
>>>> 2) lie QEMU bootloader the initrd_addr_max is 4G-1 even though header
>>>> said 2G-1
>>>> 3) any else
>>> A 10 years old comment from hpa says:
>>>
>>>     initrd_addr_max: .long 0x7fffffff
>>>                                           # (Header version 0x0203 or
>>> later)
>>>                                           # The highest safe address for
>>>                                           # the contents of an initrd
>>>                                           # The current kernel allows
>>> up to 4 GB,
>>>                                           # but leave it at 2 GB to avoid
>>>                                           # possible bootloader bugs.
>>>
>>> To avoid the potential of bugs lurking in dozens of major and hundreds of
>>> minor iterations of various Linux bootloaders I'd prefer a real solution
>>> and extend it - because if there's a 2GB initrd for some weird reason
>>> today there might be a 4GB one in two years.
>> thank a lots. that's amazing.
>>
>>
>>> The real solution would be to:
>>>
>>>    - Extend the boot protocol with a 64-bit field, named initrd_addr64_max
>>>      or such.
>>>    - We don't change the old field - but if the new field is set by new
>>>      kernels then new bootloaders can use that as a new initrd_addr64_max
>>>      value. (or reject to load the kernel if the address is too high.)
>>>
>>>    - The kernel build should also emit a warning when building larger than
>>>      2GB initrds, with a list of bootloaders that support the new
>>> protocol.
>> Actually i just knew QEMU(Seabios + optionrom(linuxboot_dma.bin)) can
>> support ~4GB initrd so far.
>>
>> i just drafted at patch to add this field. could you have a look.
>> another patch which is to document initrd_addr64_max is ongoing.
>>
>> commit db463ac9c1975f115d1ce2acb82d530c2b63b888
>> Author: Li Zhijian <lizhijian@cn.fujitsu.com>
>> Date:   Fri Nov 9 17:24:14 2018 +0800
>>
>>      x86: Add header field initrd_addr64_max
>>          Years ago, kernel had support load ~4GB initrd. But for some
>> weird reasons (
>>      avoid possible bootloader bugs), it only allow leave initrd under
>> 2GB address
>>      space(see initrd_addr_max fild at arch/x86/boot/header.S).
>>          So modern bootloaders have not chance to load >=2G initrd
>> previously.
>>          To avoid the potential of bugs lurking in dozens of major and
>> hundreds of
>>      minor iterations of various Linux bootloaders. Ingo suggests to add
>> a new field
>>      initrd_addr64_max. If bootloader believes that it can load initrd to
>>> =2G
>>      address space, it can use initrd_addr64_max as the maximum loading
>> address in
>>      stead of the old field initrd_addr_max.
>>
>> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
>> index 4c881c8..5fc3ebe 100644
>> --- a/arch/x86/boot/header.S
>> +++ b/arch/x86/boot/header.S
>> @@ -300,7 +300,7 @@ _start:
>>          # Part 2 of the header, from the old setup.S
>>   
>>                  .ascii  "HdrS"          # header signature
>> -               .word   0x020e          # header version number (>= 0x0105)
>> +               .word   0x020f          # header version number (>= 0x0105)
>>                                          # or else old loadlin-1.5 will
>> fail)
>>                  .globl realmode_swtch
>>   realmode_swtch:        .word   0, 0            # default_switch, SETUPSEG
>> @@ -562,6 +562,12 @@ acpi_rsdp_addr:            .quad 0
>> # 64-bit physical pointer to the
>>                                                  # ACPI RSDP table, added
>> with
>>                                                  # version 2.14
>>   
>> +#ifdef CONFIG_INITRD_SIZE_4GB
>> +initrd_addr64_max:     .quad 0xffffffff        # allow ~4G initrd since
>> 2.15
>> +#else
>> +initrd_addr64_max:     .quad 0
> Shouldn't this be 0x7fffffff?
>
> And please update Documentation/x86/boot.txt
>
>
> Juergen

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-09 13:40             ` Li Zhijian
@ 2018-11-09 21:10               ` H. Peter Anvin
  2018-11-12  4:56                 ` Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: H. Peter Anvin @ 2018-11-09 21:10 UTC (permalink / raw)
  To: Li Zhijian, Juergen Gross, Ingo Molnar, Li Zhijian
  Cc: Peter Maydell, x86, bp, mingo, tglx, QEMU Developers, Philip Li,
	linux-kernel, Linus Torvalds, Peter Zijlstra, Kees Cook

On 11/9/18 5:40 AM, Li Zhijian wrote:
> Just noticed that there is a field xloadflags at recent protocol
>   60 Protocol 2.12:  (Kernel 3.8) Added the xloadflags field and
> extension fields
>   61                 to struct boot_params for loading bzImage and ramdisk
>   62                 above 4G in 64bit.
> [snip]
>  617 Field name:     xloadflags
>  618 Type:           read
>  619 Offset/size:    0x236/2
>  620 Protocol:       2.12+
>  621
>  622   This field is a bitmask.
>  623
>  624   Bit 0 (read): XLF_KERNEL_64
>  625         - If 1, this kernel has the legacy 64-bit entry point at
> 0x200.
>  626
>  627   Bit 1 (read): XLF_CAN_BE_LOADED_ABOVE_4G
>  628         - If 1, kernel/boot_params/cmdline/ramdisk can be above 4G.
>  629
> 
> maybe we can reuse this field and append a new Bit 5
> XLF_INITRD_MAX_SIZE_4G and increase header version.
> For the old protocol version 2.12+, if  XLF_CAN_BE_LOADED_ABOVE_4G is
> set, we can also realize ~4GB initrd is allowed.
> 
> bootloader side:
> if protocol >= 2.15
>    if XLF_INITRD_LOAD_BELOW_4G
>       support ~4G initrd
>    fi
> else if protocol >=2.12
>    if XLF_CAN_BE_LOADED_ABOVE_4G
>     support ~4G initrd
>    fi
> fi
> 

The two are equivalent.  Obviously you have to load above 4 GB if you
have more than 4 GB of initrd.  If XLF_CAN_BE_LOADED_ABOVE_4G is not
set, then you most likely are on a 32-bit kernel and there are more
fundamental limits (even if you were to load it above the 2 GB mark, you
would be limited by the size of kernel memory.)

So, in case you are wondering: the bootloader that broke when setting
the initrd_max field above 2 GB was, of course, Grub.

So just use XLF_CAN_BE_LOADED_ABOVE_4G. There is no need for a new flag
or new field.

Also note that the ext_ramdisk_image and ext_ramdisk_size are part of
struct boot_params as opposed to struct setup_header, which means that
they are not supported when entering via the 16-bit BIOS entry point,
and I am willing to bet that there will be, ahem, "strangeness" if
entered via the 32-bit entry point if at least the command line is
loaded above the 4 GB mark; the initrd should be fine, though.

This is obviosly not an issue in EFI environments, where we enter
through the EFI handover entry point.

The main reason these were not added to struct setup_header is that
there are only 24 bytes left in that header and so space is highly
precious. One way to deal with that if we really, really need to would
be to add an initrd/initramfs type of setup_data.

	-hpa

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-09 21:10               ` H. Peter Anvin
@ 2018-11-12  4:56                 ` Ingo Molnar
  2018-11-12  6:00                   ` H. Peter Anvin
  0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2018-11-12  4:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Li Zhijian, Juergen Gross, Li Zhijian, Peter Maydell, x86, bp,
	mingo, tglx, QEMU Developers, Philip Li, linux-kernel,
	Linus Torvalds, Peter Zijlstra, Kees Cook


* H. Peter Anvin <hpa@zytor.com> wrote:

> On 11/9/18 5:40 AM, Li Zhijian wrote:
> > Just noticed that there is a field xloadflags at recent protocol
> >   60 Protocol 2.12:  (Kernel 3.8) Added the xloadflags field and
> > extension fields
> >   61                 to struct boot_params for loading bzImage and ramdisk
> >   62                 above 4G in 64bit.
> > [snip]
> >  617 Field name:     xloadflags
> >  618 Type:           read
> >  619 Offset/size:    0x236/2
> >  620 Protocol:       2.12+
> >  621
> >  622   This field is a bitmask.
> >  623
> >  624   Bit 0 (read): XLF_KERNEL_64
> >  625         - If 1, this kernel has the legacy 64-bit entry point at
> > 0x200.
> >  626
> >  627   Bit 1 (read): XLF_CAN_BE_LOADED_ABOVE_4G
> >  628         - If 1, kernel/boot_params/cmdline/ramdisk can be above 4G.
> >  629
> > 
> > maybe we can reuse this field and append a new Bit 5
> > XLF_INITRD_MAX_SIZE_4G and increase header version.
> > For the old protocol version 2.12+, if  XLF_CAN_BE_LOADED_ABOVE_4G is
> > set, we can also realize ~4GB initrd is allowed.
> > 
> > bootloader side:
> > if protocol >= 2.15
> >    if XLF_INITRD_LOAD_BELOW_4G
> >       support ~4G initrd
> >    fi
> > else if protocol >=2.12
> >    if XLF_CAN_BE_LOADED_ABOVE_4G
> >     support ~4G initrd
> >    fi
> > fi
> > 
> 
> The two are equivalent.  Obviously you have to load above 4 GB if you
> have more than 4 GB of initrd.  If XLF_CAN_BE_LOADED_ABOVE_4G is not
> set, then you most likely are on a 32-bit kernel and there are more
> fundamental limits (even if you were to load it above the 2 GB mark, you
> would be limited by the size of kernel memory.)
> 
> So, in case you are wondering: the bootloader that broke when setting
> the initrd_max field above 2 GB was, of course, Grub.
> 
> So just use XLF_CAN_BE_LOADED_ABOVE_4G. There is no need for a new flag
> or new field.

That's nice, and that's the best solution!

> Also note that the ext_ramdisk_image and ext_ramdisk_size are part of
> struct boot_params as opposed to struct setup_header, which means that
> they are not supported when entering via the 16-bit BIOS entry point,
> and I am willing to bet that there will be, ahem, "strangeness" if
> entered via the 32-bit entry point if at least the command line is
> loaded above the 4 GB mark; the initrd should be fine, though.
> 
> This is obviosly not an issue in EFI environments, where we enter
> through the EFI handover entry point.
> 
> The main reason these were not added to struct setup_header is that
> there are only 24 bytes left in that header and so space is highly
> precious. One way to deal with that if we really, really need to would
> be to add an initrd/initramfs type of setup_data.

Is there no way to extend that header by making an extended header part 
of the payload?

IIRC that header is small and fixed size to be part of a single sector at 
the very beginning of boot images, but accessing any extended header bits 
from the payload section shouldn't really be an issue for a modern 
bootloader to handle, right?

Such an extended header could use a more modern (self-extending) ABI as 
well.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-12  4:56                 ` Ingo Molnar
@ 2018-11-12  6:00                   ` H. Peter Anvin
  2018-11-12  6:19                     ` Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: H. Peter Anvin @ 2018-11-12  6:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Li Zhijian, Juergen Gross, Li Zhijian, Peter Maydell, x86, bp,
	mingo, tglx, QEMU Developers, Philip Li, linux-kernel,
	Linus Torvalds, Peter Zijlstra, Kees Cook

On 11/11/18 8:56 PM, Ingo Molnar wrote:
> 
>> Also note that the ext_ramdisk_image and ext_ramdisk_size are part of
>> struct boot_params as opposed to struct setup_header, which means that
>> they are not supported when entering via the 16-bit BIOS entry point,
>> and I am willing to bet that there will be, ahem, "strangeness" if
>> entered via the 32-bit entry point if at least the command line is
>> loaded above the 4 GB mark; the initrd should be fine, though.
>>
>> This is obviosly not an issue in EFI environments, where we enter
>> through the EFI handover entry point.
>>
>> The main reason these were not added to struct setup_header is that
>> there are only 24 bytes left in that header and so space is highly
>> precious. One way to deal with that if we really, really need to would
>> be to add an initrd/initramfs type of setup_data.
> 
> Is there no way to extend that header by making an extended header part 
> of the payload?
> 
> IIRC that header is small and fixed size to be part of a single sector at 
> the very beginning of boot images, but accessing any extended header bits 
> from the payload section shouldn't really be an issue for a modern 
> bootloader to handle, right?
> 
> Such an extended header could use a more modern (self-extending) ABI as 
> well.
> 

Yes, although I don't really think it is as much of an issue as it seems at
this point.

The limit comes from having used a one-byte jump instruction at the beginning;
however, these days that limit is functionally walled.

It is of course possible to address this if it should become necessary,
however, the current protocol has lasted for 23 years so far and we haven't
run out yet, even with occasional missteps. As such, I don't think we are in a
huge hurry to address this particular aspect.

In part as a result of this exchange I have spent some time thinking about the
boot protocol and its dependencies, and there is, in fact, a much more serious
problem that needs to be addressed: it is not currently possible in a
forward-compatible way to map all data areas that may be occupied by
bootloader-provided data. The kernel proper has an advantage here, in that the
kernel will by definition always be the "owner of the protocol" (anything the
kernel doesn't know how to map won't be used by the kernel anyway), but it
really isn't a good situation. So I'm currently trying to think up a way to
make that possible.

	-hpa

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-12  6:00                   ` H. Peter Anvin
@ 2018-11-12  6:19                     ` Ingo Molnar
  2018-11-12  6:36                       ` H. Peter Anvin
  2018-11-12 16:47                       ` H. Peter Anvin
  0 siblings, 2 replies; 22+ messages in thread
From: Ingo Molnar @ 2018-11-12  6:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Li Zhijian, Juergen Gross, Li Zhijian, Peter Maydell, x86, bp,
	mingo, tglx, QEMU Developers, Philip Li, linux-kernel,
	Linus Torvalds, Peter Zijlstra, Kees Cook


* H. Peter Anvin <hpa@zytor.com> wrote:

> > Such an extended header could use a more modern (self-extending) ABI as 
> > well.
> 
> Yes, although I don't really think it is as much of an issue as it seems at
> this point.
> 
> The limit comes from having used a one-byte jump instruction at the beginning;
> however, these days that limit is functionally walled.
> 
> It is of course possible to address this if it should become necessary,
> however, the current protocol has lasted for 23 years so far and we haven't
> run out yet, even with occasional missteps. As such, I don't think we are in a
> huge hurry to address this particular aspect.

Agreed, fair enough!

> In part as a result of this exchange I have spent some time thinking 
> about the boot protocol and its dependencies, and there is, in fact, a 
> much more serious problem that needs to be addressed: it is not 
> currently possible in a forward-compatible way to map all data areas 
> that may be occupied by bootloader-provided data. The kernel proper has 
> an advantage here, in that the kernel will by definition always be the 
> "owner of the protocol" (anything the kernel doesn't know how to map 
> won't be used by the kernel anyway), but it really isn't a good 
> situation. So I'm currently trying to think up a way to make that 
> possible.

I might be a bit dense early in the morning, but could you elaborate? 
What do you mean by mapping all data areas?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-12  6:19                     ` Ingo Molnar
@ 2018-11-12  6:36                       ` H. Peter Anvin
  2018-11-12 16:47                       ` H. Peter Anvin
  1 sibling, 0 replies; 22+ messages in thread
From: H. Peter Anvin @ 2018-11-12  6:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Li Zhijian, Juergen Gross, Li Zhijian, Peter Maydell, x86, bp,
	mingo, tglx, QEMU Developers, Philip Li, linux-kernel,
	Linus Torvalds, Peter Zijlstra, Kees Cook

On 11/11/18 10:19 PM, Ingo Molnar wrote:
> 
> I might be a bit dense early in the morning, but could you elaborate? 
> What do you mean by mapping all data areas?
> 

Heh. I need to pack for LPC and get some sleep before my flight lest I'll be
denser than depleted uranium; I'll write an explanation tomorrow.

	-hpa


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
  2018-11-12  6:19                     ` Ingo Molnar
  2018-11-12  6:36                       ` H. Peter Anvin
@ 2018-11-12 16:47                       ` H. Peter Anvin
  1 sibling, 0 replies; 22+ messages in thread
From: H. Peter Anvin @ 2018-11-12 16:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Li Zhijian, Juergen Gross, Li Zhijian, Peter Maydell, x86, bp,
	mingo, tglx, QEMU Developers, Philip Li, linux-kernel,
	Linus Torvalds, Peter Zijlstra, Kees Cook

On 11/11/18 10:19 PM, Ingo Molnar wrote:
> 
>> In part as a result of this exchange I have spent some time thinking 
>> about the boot protocol and its dependencies, and there is, in fact, a 
>> much more serious problem that needs to be addressed: it is not 
>> currently possible in a forward-compatible way to map all data areas 
>> that may be occupied by bootloader-provided data. The kernel proper has 
>> an advantage here, in that the kernel will by definition always be the 
>> "owner of the protocol" (anything the kernel doesn't know how to map 
>> won't be used by the kernel anyway), but it really isn't a good 
>> situation. So I'm currently trying to think up a way to make that 
>> possible.
> 
> I might be a bit dense early in the morning, but could you elaborate? 
> What do you mean by mapping all data areas?

Alright, awake now...

As it sits right now, the protocol contains a number of data structures with
pointers, pointing to a variety of memory areas that can be set up by the
bootloader. Now, consider something like KASLR or a secondary boot loader
where we need to allocate memory in between the primary bootloader and the
kernel to be run. With the kernel proper, in the absence of KASLR, we have
solved this by marking out exactly how much memory the kernel may need before
it has its own memory manager up and running, but KASLR needs to move it
outside this range, and a secondary boot loader shim of some sort may need to
allocate additional data structures. In the particular case of an UEFI system
where we do the right thing (which Grub2 doesn't, by default) and enter via
the kernel UEFI stub we are okay, but for other boot scenarios we are in
trouble: even if we know where all the pointers are and how to determine the
size of various data structures, once the protocol is updated with new
information then that is no longer valid.

The setup_data linked list solves that under certain circumstances, but in
others it has turned out to not be adequate.

There are a couple of options:

a) Not allow any new pointers to memory areas in what is considered system
   RAM. Such data structures *must* have a setup_data linked list header.
   Pointers into E820 table reserved areas are still acceptable.

b) Create a new E820 table memory type for "boot data", similar to what UEFI
   already has, and encourage boot loaders to mark any allocated memory
   structures that way.  The main problem with that is that the poor quality
   of boot loaders may mean that that fails to happen, and because it wouldn't
   "fail hard" it is likely that they will get it wrong.

   The difference from the RESERVED memory type is that the kernel can reclaim
   that memory after the data has been recovered.

c) This might be the preferred option:

   1. Just like (a), do not allow new pointers to memory areas in system RAM
      in struct boot_params.
   2. Create a subrange of struct setup_data (e.g. bit 30 = 1) explicitly
      containing pointers to other data structures, including sizes, in a
      way that can be parsed by generic code.
   3. Encourage boot loaders to make sure the setup_data list is in order of
      ascending address (and WARN if it is not.)
   4. Add (b) as an option, for responsible boot loaders ;) to provide an
      extra level of protection.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-11-12 16:48 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-08 10:59 [Qemu-devel] [RFC/PoC PATCH 0/3] support initrd size up to 4G Li Zhijian
2018-11-08 10:59 ` [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd Li Zhijian
2018-11-08 10:59   ` [Qemu-devel] [RFC/PoC PATCH 2/3] change size type from int to int64_t on load_image() Li Zhijian
2018-11-08 10:59     ` [Qemu-devel] [PATCH 3/3] change int len to uin32_t len Li Zhijian
2018-11-08 11:14       ` Peter Maydell
2018-11-08 11:04     ` [Qemu-devel] [RFC/PoC PATCH 2/3] change size type from int to int64_t on load_image() Peter Maydell
2018-11-09  2:48       ` Li Zhijian
2018-11-08 11:06   ` [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd Peter Maydell
2018-11-09  1:47     ` Li Zhijian
2018-11-09  1:47       ` Li Zhijian
2018-11-09  7:20       ` Ingo Molnar
2018-11-09  9:57         ` Li Zhijian
2018-11-09 10:04           ` Juergen Gross
2018-11-09 13:11             ` Li Zhijian
2018-11-09 13:40             ` Li Zhijian
2018-11-09 21:10               ` H. Peter Anvin
2018-11-12  4:56                 ` Ingo Molnar
2018-11-12  6:00                   ` H. Peter Anvin
2018-11-12  6:19                     ` Ingo Molnar
2018-11-12  6:36                       ` H. Peter Anvin
2018-11-12 16:47                       ` H. Peter Anvin
2018-11-08 20:12 ` [Qemu-devel] [RFC/PoC PATCH 0/3] support initrd size up to 4G Wainer dos Santos Moschetta

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.