All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC 0/3] QEMU changes to do PVH boot
@ 2018-12-05 22:37 ` Liam Merwick
  0 siblings, 0 replies; 32+ messages in thread
From: Liam Merwick @ 2018-12-05 22:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, ehabkost, rth, xen-devel, sgarzare, mst, stefanha,
	maran.wilson, liam.merwick

For certain applications it is desirable to rapidly boot a KVM virtual
machine. In cases where legacy hardware and software support within the
guest is not needed, QEMU should be able to boot directly into the
uncompressed Linux kernel binary with minimal firmware involvement.

There already exists an ABI to allow this for Xen PVH guests and the ABI
is supported by Linux and FreeBSD:

   https://xenbits.xen.org/docs/unstable/misc/pvh.html

Details on the Linux changes: https://lkml.org/lkml/2018/4/16/1002
qboot patches: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=80020

This patch series provides QEMU support to read the ELF header of an
uncompressed kernel binary and get the 32-bit PVH kernel entry point
from an ELF Note.  This is called when initialising the machine state
in pc_memory_init().  Later on in load_linux() if the kernel entry
address is present, the uncompressed kernel binary (ELF) is loaded
and qboot does futher initialisation of the guest (e820, etc.) and
jumps to the kernel entry address and boots the guest.


Usіng the method/scripts documented by the NEMU team at

   https://github.com/intel/nemu/wiki/Measuring-Boot-Latency
   https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html

below are some timings measured (vmlinux and bzImage from the same build)
Time to get to kernel start is almost halved (95ṁs -> 48ms)

QEMU + qboot + vmlinux (PVH + 4.20-rc4)
 qemu_init_end: 41.550521
 fw_start: 41.667139 (+0.116618)
 fw_do_boot: 47.448495 (+5.781356)
 linux_startup_64: 47.720785 (+0.27229)
 linux_start_kernel: 48.399541 (+0.678756)
 linux_start_user: 296.952056 (+248.552515)

QEMU + qboot + bzImage:
 qemu_init_end: 29.209276
 fw_start: 29.317342 (+0.108066)
 linux_start_boot: 36.679362 (+7.36202)
 linux_startup_64: 94.531349 (+57.851987)
 linux_start_kernel: 94.900913 (+0.369564)
 linux_start_user: 401.060971 (+306.160058)

QEMU + bzImage:
 qemu_init_end: 30.424430
 linux_startup_64: 893.770334 (+863.345904)
 linux_start_kernel: 894.17049 (+0.400156)
 linux_start_user: 1208.679768 (+314.509278)


Liam Merwick (3):
  pvh: Add x86/HVM direct boot ABI header file
  pc: Read PVH entry point from ELF note in kernel binary
  pvh: Boot uncompressed kernel using direct boot ABI

 hw/i386/pc.c                | 344 +++++++++++++++++++++++++++++++++++++++++++-
 include/elf.h               |  10 ++
 include/hw/xen/start_info.h | 146 +++++++++++++++++++
 3 files changed, 499 insertions(+), 1 deletion(-)
 create mode 100644 include/hw/xen/start_info.h

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC 0/3] QEMU changes to do PVH boot
@ 2018-12-05 22:37 ` Liam Merwick
  0 siblings, 0 replies; 32+ messages in thread
From: Liam Merwick @ 2018-12-05 22:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: liam.merwick, ehabkost, mst, maran.wilson, stefanha, xen-devel,
	pbonzini, rth, sgarzare

For certain applications it is desirable to rapidly boot a KVM virtual
machine. In cases where legacy hardware and software support within the
guest is not needed, QEMU should be able to boot directly into the
uncompressed Linux kernel binary with minimal firmware involvement.

There already exists an ABI to allow this for Xen PVH guests and the ABI
is supported by Linux and FreeBSD:

   https://xenbits.xen.org/docs/unstable/misc/pvh.html

Details on the Linux changes: https://lkml.org/lkml/2018/4/16/1002
qboot patches: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=80020

This patch series provides QEMU support to read the ELF header of an
uncompressed kernel binary and get the 32-bit PVH kernel entry point
from an ELF Note.  This is called when initialising the machine state
in pc_memory_init().  Later on in load_linux() if the kernel entry
address is present, the uncompressed kernel binary (ELF) is loaded
and qboot does futher initialisation of the guest (e820, etc.) and
jumps to the kernel entry address and boots the guest.


Usіng the method/scripts documented by the NEMU team at

   https://github.com/intel/nemu/wiki/Measuring-Boot-Latency
   https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html

below are some timings measured (vmlinux and bzImage from the same build)
Time to get to kernel start is almost halved (95ṁs -> 48ms)

QEMU + qboot + vmlinux (PVH + 4.20-rc4)
 qemu_init_end: 41.550521
 fw_start: 41.667139 (+0.116618)
 fw_do_boot: 47.448495 (+5.781356)
 linux_startup_64: 47.720785 (+0.27229)
 linux_start_kernel: 48.399541 (+0.678756)
 linux_start_user: 296.952056 (+248.552515)

QEMU + qboot + bzImage:
 qemu_init_end: 29.209276
 fw_start: 29.317342 (+0.108066)
 linux_start_boot: 36.679362 (+7.36202)
 linux_startup_64: 94.531349 (+57.851987)
 linux_start_kernel: 94.900913 (+0.369564)
 linux_start_user: 401.060971 (+306.160058)

QEMU + bzImage:
 qemu_init_end: 30.424430
 linux_startup_64: 893.770334 (+863.345904)
 linux_start_kernel: 894.17049 (+0.400156)
 linux_start_user: 1208.679768 (+314.509278)


Liam Merwick (3):
  pvh: Add x86/HVM direct boot ABI header file
  pc: Read PVH entry point from ELF note in kernel binary
  pvh: Boot uncompressed kernel using direct boot ABI

 hw/i386/pc.c                | 344 +++++++++++++++++++++++++++++++++++++++++++-
 include/elf.h               |  10 ++
 include/hw/xen/start_info.h | 146 +++++++++++++++++++
 3 files changed, 499 insertions(+), 1 deletion(-)
 create mode 100644 include/hw/xen/start_info.h

-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Qemu-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file
  2018-12-05 22:37 ` Liam Merwick
@ 2018-12-05 22:37   ` Liam Merwick
  -1 siblings, 0 replies; 32+ messages in thread
From: Liam Merwick @ 2018-12-05 22:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, ehabkost, rth, xen-devel, sgarzare, mst, stefanha,
	maran.wilson, liam.merwick

From: Liam Merwick <Liam.Merwick@oracle.com>

The x86/HVM direct boot ABI permits Qemu to be able to boot directly
into the uncompressed Linux kernel binary without the need to run firmware.

	https://xenbits.xen.org/docs/unstable/misc/pvh.html

This commit adds the header file that defines the start_info struct
that needs to be populated in order to use this ABI.

Signed-off-by: Maran Wilson <Maran.Wilson@oracle.com>
Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <Konrad.Wilk@oracle.com>
---
 include/hw/xen/start_info.h | 146 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)
 create mode 100644 include/hw/xen/start_info.h

diff --git a/include/hw/xen/start_info.h b/include/hw/xen/start_info.h
new file mode 100644
index 000000000000..348779eb10cd
--- /dev/null
+++ b/include/hw/xen/start_info.h
@@ -0,0 +1,146 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2016, Citrix Systems, Inc.
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+#define __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+
+/*
+ * Start of day structure passed to PVH guests and to HVM guests in %ebx.
+ *
+ * NOTE: nothing will be loaded at physical address 0, so a 0 value in any
+ * of the address fields should be treated as not present.
+ *
+ *  0 +----------------+
+ *    | magic          | Contains the magic value XEN_HVM_START_MAGIC_VALUE
+ *    |                | ("xEn3" with the 0x80 bit of the "E" set).
+ *  4 +----------------+
+ *    | version        | Version of this structure. Current version is 1. New
+ *    |                | versions are guaranteed to be backwards-compatible.
+ *  8 +----------------+
+ *    | flags          | SIF_xxx flags.
+ * 12 +----------------+
+ *    | nr_modules     | Number of modules passed to the kernel.
+ * 16 +----------------+
+ *    | modlist_paddr  | Physical address of an array of modules
+ *    |                | (layout of the structure below).
+ * 24 +----------------+
+ *    | cmdline_paddr  | Physical address of the command line,
+ *    |                | a zero-terminated ASCII string.
+ * 32 +----------------+
+ *    | rsdp_paddr     | Physical address of the RSDP ACPI data structure.
+ * 40 +----------------+
+ *    | memmap_paddr   | Physical address of the (optional) memory map. Only
+ *    |                | present in version 1 and newer of the structure.
+ * 48 +----------------+
+ *    | memmap_entries | Number of entries in the memory map table. Only
+ *    |                | present in version 1 and newer of the structure.
+ *    |                | Zero if there is no memory map being provided.
+ * 52 +----------------+
+ *    | reserved       | Version 1 and newer only.
+ * 56 +----------------+
+ *
+ * The layout of each entry in the module structure is the following:
+ *
+ *  0 +----------------+
+ *    | paddr          | Physical address of the module.
+ *  8 +----------------+
+ *    | size           | Size of the module in bytes.
+ * 16 +----------------+
+ *    | cmdline_paddr  | Physical address of the command line,
+ *    |                | a zero-terminated ASCII string.
+ * 24 +----------------+
+ *    | reserved       |
+ * 32 +----------------+
+ *
+ * The layout of each entry in the memory map table is as follows:
+ *
+ *  0 +----------------+
+ *    | addr           | Base address
+ *  8 +----------------+
+ *    | size           | Size of mapping in bytes
+ * 16 +----------------+
+ *    | type           | Type of mapping as defined between the hypervisor
+ *    |                | and guest it's starting. E820_TYPE_xxx, for example.
+ * 20 +----------------|
+ *    | reserved       |
+ * 24 +----------------+
+ *
+ * The address and sizes are always a 64bit little endian unsigned integer.
+ *
+ * NB: Xen on x86 will always try to place all the data below the 4GiB
+ * boundary.
+ *
+ * Version numbers of the hvm_start_info structure have evolved like this:
+ *
+ * Version 0:
+ *
+ * Version 1:   Added the memmap_paddr/memmap_entries fields (plus 4 bytes of
+ *              padding) to the end of the hvm_start_info struct. These new
+ *              fields can be used to pass a memory map to the guest. The
+ *              memory map is optional and so guests that understand version 1
+ *              of the structure must check that memmap_entries is non-zero
+ *              before trying to read the memory map.
+ */
+#define XEN_HVM_START_MAGIC_VALUE 0x336ec578
+
+/*
+ * C representation of the x86/HVM start info layout.
+ *
+ * The canonical definition of this layout is above, this is just a way to
+ * represent the layout described there using C types.
+ */
+struct hvm_start_info {
+    uint32_t magic;             /* Contains the magic value 0x336ec578       */
+                                /* ("xEn3" with the 0x80 bit of the "E" set).*/
+    uint32_t version;           /* Version of this structure.                */
+    uint32_t flags;             /* SIF_xxx flags.                            */
+    uint32_t nr_modules;        /* Number of modules passed to the kernel.   */
+    uint64_t modlist_paddr;     /* Physical address of an array of           */
+                                /* hvm_modlist_entry.                        */
+    uint64_t cmdline_paddr;     /* Physical address of the command line.     */
+    uint64_t rsdp_paddr;        /* Physical address of the RSDP ACPI data    */
+                                /* structure.                                */
+    uint64_t memmap_paddr;      /* Physical address of an array of           */
+                                /* hvm_memmap_table_entry. Only present in   */
+                                /* version 1 and newer of the structure      */
+    uint32_t memmap_entries;    /* Number of entries in the memmap table.    */
+                                /* Only present in version 1 and newer of    */
+                                /* the structure. Value will be zero if      */
+                                /* there is no memory map being provided.    */
+    uint32_t reserved;
+};
+
+struct hvm_modlist_entry {
+    uint64_t paddr;             /* Physical address of the module.           */
+    uint64_t size;              /* Size of the module in bytes.              */
+    uint64_t cmdline_paddr;     /* Physical address of the command line.     */
+    uint64_t reserved;
+};
+
+struct hvm_memmap_table_entry {
+    uint64_t addr;              /* Base address of the memory region         */
+    uint64_t size;              /* Size of the memory region in bytes        */
+    uint32_t type;              /* Mapping type                              */
+    uint32_t reserved;
+};
+
+#endif /* __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file
@ 2018-12-05 22:37   ` Liam Merwick
  0 siblings, 0 replies; 32+ messages in thread
From: Liam Merwick @ 2018-12-05 22:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: liam.merwick, ehabkost, mst, maran.wilson, stefanha, xen-devel,
	pbonzini, rth, sgarzare

From: Liam Merwick <Liam.Merwick@oracle.com>

The x86/HVM direct boot ABI permits Qemu to be able to boot directly
into the uncompressed Linux kernel binary without the need to run firmware.

	https://xenbits.xen.org/docs/unstable/misc/pvh.html

This commit adds the header file that defines the start_info struct
that needs to be populated in order to use this ABI.

Signed-off-by: Maran Wilson <Maran.Wilson@oracle.com>
Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <Konrad.Wilk@oracle.com>
---
 include/hw/xen/start_info.h | 146 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)
 create mode 100644 include/hw/xen/start_info.h

diff --git a/include/hw/xen/start_info.h b/include/hw/xen/start_info.h
new file mode 100644
index 000000000000..348779eb10cd
--- /dev/null
+++ b/include/hw/xen/start_info.h
@@ -0,0 +1,146 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2016, Citrix Systems, Inc.
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+#define __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+
+/*
+ * Start of day structure passed to PVH guests and to HVM guests in %ebx.
+ *
+ * NOTE: nothing will be loaded at physical address 0, so a 0 value in any
+ * of the address fields should be treated as not present.
+ *
+ *  0 +----------------+
+ *    | magic          | Contains the magic value XEN_HVM_START_MAGIC_VALUE
+ *    |                | ("xEn3" with the 0x80 bit of the "E" set).
+ *  4 +----------------+
+ *    | version        | Version of this structure. Current version is 1. New
+ *    |                | versions are guaranteed to be backwards-compatible.
+ *  8 +----------------+
+ *    | flags          | SIF_xxx flags.
+ * 12 +----------------+
+ *    | nr_modules     | Number of modules passed to the kernel.
+ * 16 +----------------+
+ *    | modlist_paddr  | Physical address of an array of modules
+ *    |                | (layout of the structure below).
+ * 24 +----------------+
+ *    | cmdline_paddr  | Physical address of the command line,
+ *    |                | a zero-terminated ASCII string.
+ * 32 +----------------+
+ *    | rsdp_paddr     | Physical address of the RSDP ACPI data structure.
+ * 40 +----------------+
+ *    | memmap_paddr   | Physical address of the (optional) memory map. Only
+ *    |                | present in version 1 and newer of the structure.
+ * 48 +----------------+
+ *    | memmap_entries | Number of entries in the memory map table. Only
+ *    |                | present in version 1 and newer of the structure.
+ *    |                | Zero if there is no memory map being provided.
+ * 52 +----------------+
+ *    | reserved       | Version 1 and newer only.
+ * 56 +----------------+
+ *
+ * The layout of each entry in the module structure is the following:
+ *
+ *  0 +----------------+
+ *    | paddr          | Physical address of the module.
+ *  8 +----------------+
+ *    | size           | Size of the module in bytes.
+ * 16 +----------------+
+ *    | cmdline_paddr  | Physical address of the command line,
+ *    |                | a zero-terminated ASCII string.
+ * 24 +----------------+
+ *    | reserved       |
+ * 32 +----------------+
+ *
+ * The layout of each entry in the memory map table is as follows:
+ *
+ *  0 +----------------+
+ *    | addr           | Base address
+ *  8 +----------------+
+ *    | size           | Size of mapping in bytes
+ * 16 +----------------+
+ *    | type           | Type of mapping as defined between the hypervisor
+ *    |                | and guest it's starting. E820_TYPE_xxx, for example.
+ * 20 +----------------|
+ *    | reserved       |
+ * 24 +----------------+
+ *
+ * The address and sizes are always a 64bit little endian unsigned integer.
+ *
+ * NB: Xen on x86 will always try to place all the data below the 4GiB
+ * boundary.
+ *
+ * Version numbers of the hvm_start_info structure have evolved like this:
+ *
+ * Version 0:
+ *
+ * Version 1:   Added the memmap_paddr/memmap_entries fields (plus 4 bytes of
+ *              padding) to the end of the hvm_start_info struct. These new
+ *              fields can be used to pass a memory map to the guest. The
+ *              memory map is optional and so guests that understand version 1
+ *              of the structure must check that memmap_entries is non-zero
+ *              before trying to read the memory map.
+ */
+#define XEN_HVM_START_MAGIC_VALUE 0x336ec578
+
+/*
+ * C representation of the x86/HVM start info layout.
+ *
+ * The canonical definition of this layout is above, this is just a way to
+ * represent the layout described there using C types.
+ */
+struct hvm_start_info {
+    uint32_t magic;             /* Contains the magic value 0x336ec578       */
+                                /* ("xEn3" with the 0x80 bit of the "E" set).*/
+    uint32_t version;           /* Version of this structure.                */
+    uint32_t flags;             /* SIF_xxx flags.                            */
+    uint32_t nr_modules;        /* Number of modules passed to the kernel.   */
+    uint64_t modlist_paddr;     /* Physical address of an array of           */
+                                /* hvm_modlist_entry.                        */
+    uint64_t cmdline_paddr;     /* Physical address of the command line.     */
+    uint64_t rsdp_paddr;        /* Physical address of the RSDP ACPI data    */
+                                /* structure.                                */
+    uint64_t memmap_paddr;      /* Physical address of an array of           */
+                                /* hvm_memmap_table_entry. Only present in   */
+                                /* version 1 and newer of the structure      */
+    uint32_t memmap_entries;    /* Number of entries in the memmap table.    */
+                                /* Only present in version 1 and newer of    */
+                                /* the structure. Value will be zero if      */
+                                /* there is no memory map being provided.    */
+    uint32_t reserved;
+};
+
+struct hvm_modlist_entry {
+    uint64_t paddr;             /* Physical address of the module.           */
+    uint64_t size;              /* Size of the module in bytes.              */
+    uint64_t cmdline_paddr;     /* Physical address of the command line.     */
+    uint64_t reserved;
+};
+
+struct hvm_memmap_table_entry {
+    uint64_t addr;              /* Base address of the memory region         */
+    uint64_t size;              /* Size of the memory region in bytes        */
+    uint32_t type;              /* Mapping type                              */
+    uint32_t reserved;
+};
+
+#endif /* __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ */
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [RFC 2/3] pc: Read PVH entry point from ELF note in kernel binary
  2018-12-05 22:37 ` Liam Merwick
@ 2018-12-05 22:37   ` Liam Merwick
  -1 siblings, 0 replies; 32+ messages in thread
From: Liam Merwick @ 2018-12-05 22:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, ehabkost, rth, xen-devel, sgarzare, mst, stefanha,
	maran.wilson, liam.merwick

From: Liam Merwick <Liam.Merwick@oracle.com>

Add support to read the PVH Entry address from an ELF note in the
uncompressed kernel binary (as defined by the x86/HVM direct boot ABI).
This 32-bit entry point will be used by QEMU to load the kernel in the
guest and jump into the kernel entry point.

For now, a call to this function is added in pc_memory_init() to read the
address - a future patch will use the entry point.

Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
---
 hw/i386/pc.c  | 272 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 include/elf.h |  10 +++
 2 files changed, 281 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f095725dbab2..056aa46d99b9 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -109,6 +109,9 @@ static struct e820_entry *e820_table;
 static unsigned e820_entries;
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
+/* Physical Address of PVH entry point read from kernel ELF NOTE */
+static size_t pvh_start_addr;
+
 void gsi_handler(void *opaque, int n, int level)
 {
     GSIState *s = opaque;
@@ -834,6 +837,267 @@ struct setup_data {
     uint8_t data[0];
 } __attribute__((packed));
 
+/*
+ * Search through the ELF Notes for an entry with the given
+ * ELF Note type
+ */
+static void *get_elf_note_type(void *ehdr, void *phdr, bool elf_is64,
+    size_t elf_note_type)
+{
+    void *nhdr = NULL;
+    size_t nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
+    size_t elf_note_entry_sz = 0;
+    size_t phdr_off;
+    size_t phdr_align;
+    size_t phdr_memsz;
+    size_t nhdr_namesz;
+    size_t nhdr_descsz;
+    size_t note_type;
+
+    phdr_off = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
+    phdr_align = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
+    phdr_memsz = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
+
+    nhdr = ehdr + phdr_off;
+    note_type = elf_is64 ?
+        ((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
+    nhdr_namesz = elf_is64 ?
+        ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
+    nhdr_descsz = elf_is64 ?
+        ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
+
+    while (note_type != elf_note_type) {
+        elf_note_entry_sz = nhdr_size +
+            QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
+            QEMU_ALIGN_UP(nhdr_descsz, phdr_align);
+
+        /*
+         * Verify that we haven't exceeded the end of the ELF Note section.
+         * If we have, then there is no note of the given type present
+         * in the ELF Notes.
+         */
+        if (phdr_off + phdr_memsz < ((nhdr - ehdr) + elf_note_entry_sz)) {
+            error_report("Note type (0x%lx) not found in ELF Note section",
+                elf_note_type);
+            return NULL;
+        }
+
+        /* skip to the next ELF Note entry */
+        nhdr += elf_note_entry_sz;
+        note_type = elf_is64 ?
+            ((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
+        nhdr_namesz = elf_is64 ?
+            ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
+        nhdr_descsz = elf_is64 ?
+            ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
+    }
+
+    return nhdr;
+}
+
+/*
+ * The entry point into the kernel for PVH boot is different from
+ * the native entry point.  The PVH entry is defined by the x86/HVM
+ * direct boot ABI and is available in an ELFNOTE in the kernel binary.
+ * This function reads the ELF headers of the binary specified on the
+ * command line by -kernel (path contained in 'filename') and discovers
+ * the PVH entry address from the appropriate ELF Note.
+ *
+ * The address of the PVH entry point is saved to the 'pvh_start_addr'
+ * global variable. The ELF class of the binary is returned via 'elfclass'
+ * (although the entry point is 32-bit, the kernel binary can be either
+ * 32-bit or 64-bit).
+ */
+static bool read_pvh_start_addr_elf_note(const char *filename,
+    unsigned char *elfclass)
+{
+    void *ehdr = NULL; /* Cast to Elf64_Ehdr or Elf32_Ehdr */
+    void *phdr = NULL; /* Cast to Elf64_Phdr or Elf32_Phdr */
+    void *nhdr = NULL; /* Cast to Elf64_Nhdr or Elf32_Nhdr */
+    struct stat statbuf;
+    size_t ehdr_size;
+    size_t phdr_size;
+    size_t nhdr_size;
+    size_t elf_note_data_addr;
+    /* Ehdr fields */
+    size_t ehdr_poff;
+    /* Phdr fields */
+    size_t phdr_off;
+    size_t phdr_align;
+    size_t phdr_memsz;
+    size_t phdr_type;
+    /* Nhdr fields */
+    size_t nhdr_namesz;
+    size_t nhdr_descsz;
+    bool elf_is64;
+    FILE *file;
+    union {
+        Elf32_Ehdr h32;
+        Elf64_Ehdr h64;
+    } elf_header;
+    Error *err = NULL;
+
+    pvh_start_addr = 0;
+
+    if (filename == NULL) {
+        return false;
+    }
+
+    file = fopen(filename, "rb");
+    if (file == NULL) {
+        error_report("fopen(%s) failed", filename);
+        return false;
+    }
+
+    if (fstat(fileno(file), &statbuf) < 0) {
+        error_report("fstat() failed on file (%s)", filename);
+        return false;
+    }
+
+    load_elf_hdr(filename, &elf_header, &elf_is64, &err);
+    if (err) {
+        error_free(err);
+        fclose(file);
+        return false;
+    }
+
+    *elfclass = elf_is64 ?
+        elf_header.h64.e_ident[EI_CLASS] : elf_header.h32.e_ident[EI_CLASS];
+    if (*elfclass == ELFCLASSNONE) {
+        error_report("kernel binary (%s) is ELFCLASSNONE", filename);
+        fclose(file);
+        return false;
+    }
+
+    ehdr_size = elf_is64 ? sizeof(Elf64_Ehdr) : sizeof(Elf32_Ehdr);
+    phdr_size = elf_is64 ? sizeof(Elf64_Phdr) : sizeof(Elf32_Phdr);
+    nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
+
+    /* We have already validated the ELF header when calling elf_load_hdr() */
+
+    ehdr = mmap(0, statbuf.st_size,
+        PROT_READ | PROT_WRITE, MAP_PRIVATE, fileno(file), 0);
+    if (ehdr == MAP_FAILED) {
+        error_report("Failed to mmap kernel binary (%s)", filename);
+        goto done;
+    }
+
+    /*
+     * Search through the program execution header for the
+     * ELF Note section.
+     */
+
+    ehdr_poff = elf_is64 ?
+        ((Elf64_Ehdr *)(ehdr))->e_phoff : ((Elf32_Ehdr *)(ehdr))->e_phoff;
+    if (statbuf.st_size < (ehdr_size + ehdr_poff)) {
+        error_report("ELF NOTE section exceeds file (%s) size",
+            filename);
+        goto done;
+    }
+
+    phdr = ehdr + ehdr_poff;
+    phdr_type = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_type : ((Elf32_Phdr *)phdr)->p_type;
+    while (phdr != NULL && phdr_type != PT_NOTE) {
+        if (statbuf.st_size < ((phdr - ehdr) + phdr_size)) {
+            error_report("ELF Program headers in file (%s) too short",
+                filename);
+            goto done;
+        }
+        phdr += phdr_size;
+        phdr_type = elf_is64 ?
+            ((Elf64_Phdr *)phdr)->p_type : ((Elf32_Phdr *)phdr)->p_type;
+    }
+
+    phdr_off = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
+    phdr_align = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
+    phdr_memsz = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
+
+    /*
+     * check that the start of the ELF Note section is within the bounds
+     * of the kernel ELF binary
+     */
+    if (statbuf.st_size < (ehdr_poff + phdr_size + phdr_off)) {
+        error_report("Start of ELF note section outside of file (%s) bounds",
+            filename);
+        goto done;
+    }
+    /*
+     * check that the end of the ELF Note section is within the bounds
+     * of the kernel ELF binary
+     */
+    if (statbuf.st_size < (phdr_off + phdr_memsz)) {
+        error_report("End of ELF note section outside of file (%s) bounds",
+            filename);
+        goto done;
+    }
+
+    /*
+     * Search through the ELF Notes for an entry with the
+     * Physical Address (PA) of the PVH entry point.
+     */
+    nhdr = get_elf_note_type(ehdr, phdr, elf_is64, XEN_ELFNOTE_PHYS32_ENTRY);
+    if (nhdr == NULL) {
+        error_report("No PVH Entry details in kernel (%s) ELF Note section",
+            filename);
+        goto done;
+    }
+
+    /*
+     * Verify that the returned ELF Note header doesn't exceed the
+     * end of the kernel file
+     */
+    if (statbuf.st_size < ((nhdr - ehdr))) {
+        error_report("ELF Nhdr offset (0x%lx) exceeds file (%s) bounds (%ld)",
+            (nhdr - ehdr), filename, statbuf.st_size);
+        goto done;
+    }
+
+    nhdr_namesz = elf_is64 ?
+        ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
+    nhdr_descsz = elf_is64 ?
+        ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
+
+    /*
+     * Verify that the ELF Note contents don't exceed the end of the
+     * kernel file
+     */
+    if (statbuf.st_size < ((nhdr - ehdr)) + nhdr_size +
+        QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
+        QEMU_ALIGN_UP(nhdr_descsz, phdr_align)) {
+        error_report("ELF Nhdr contents (0x%lx) exceeds file bounds (%ld)",
+            (nhdr - ehdr) + nhdr_size + QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
+            QEMU_ALIGN_UP(nhdr_descsz, phdr_align), statbuf.st_size);
+        goto done;
+    }
+
+    elf_note_data_addr =
+        (size_t)nhdr + nhdr_size + QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
+
+    pvh_start_addr = *(size_t *)elf_note_data_addr;
+
+    /*
+     * Verify that the PVH Entry point address does not exceed the
+     * bounds of the kernel file.
+     */
+    if (statbuf.st_size < pvh_start_addr) {
+        error_report("PVH ELF note addr (0x%lx) exceeds file (%s) bounds (%ld)",
+            (elf_note_data_addr - (size_t)ehdr), filename, statbuf.st_size);
+        pvh_start_addr = 0;
+        goto done;
+    }
+
+done:
+    (void) munmap(ehdr, statbuf.st_size);
+    return pvh_start_addr != 0;
+}
+
 static void load_linux(PCMachineState *pcms,
                        FWCfgState *fw_cfg)
 {
@@ -1334,9 +1598,11 @@ void pc_memory_init(PCMachineState *pcms,
     int linux_boot, i;
     MemoryRegion *ram, *option_rom_mr;
     MemoryRegion *ram_below_4g, *ram_above_4g;
-    FWCfgState *fw_cfg;
+    FWCfgState *fw_cfg = NULL;
+    unsigned char class = ELFCLASSNONE;
     MachineState *machine = MACHINE(pcms);
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+    const char *kernel_filename = machine->kernel_filename;
 
     assert(machine->ram_size == pcms->below_4g_mem_size +
                                 pcms->above_4g_mem_size);
@@ -1418,6 +1684,10 @@ void pc_memory_init(PCMachineState *pcms,
                                     &machine->device_memory->mr);
     }
 
+    if (linux_boot) {
+        read_pvh_start_addr_elf_note(kernel_filename, &class);
+    }
+
     /* Initialize PC system firmware */
     pc_system_firmware_init(rom_memory, !pcmc->pci_enabled);
 
diff --git a/include/elf.h b/include/elf.h
index c151164b63da..1f82c7a7124b 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -1585,6 +1585,16 @@ typedef struct elf64_shdr {
 #define NT_ARM_HW_WATCH 0x403           /* ARM hardware watchpoint registers */
 #define NT_ARM_SYSTEM_CALL      0x404   /* ARM system call number */
 
+/*
+ * Physical entry point into the kernel.
+ *
+ * 32bit entry point into the kernel. When requested to launch the
+ * guest kernel, use this entry point to launch the guest in 32-bit
+ * protected mode with paging disabled.
+ *
+ * [ Corresponding definition in Linux kernel: include/xen/interface/elfnote.h ]
+ */
+#define XEN_ELFNOTE_PHYS32_ENTRY    18  /* 0x12 */
 
 /* Note header in a PT_NOTE section */
 typedef struct elf32_note {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC 2/3] pc: Read PVH entry point from ELF note in kernel binary
@ 2018-12-05 22:37   ` Liam Merwick
  0 siblings, 0 replies; 32+ messages in thread
From: Liam Merwick @ 2018-12-05 22:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: liam.merwick, ehabkost, mst, maran.wilson, stefanha, xen-devel,
	pbonzini, rth, sgarzare

From: Liam Merwick <Liam.Merwick@oracle.com>

Add support to read the PVH Entry address from an ELF note in the
uncompressed kernel binary (as defined by the x86/HVM direct boot ABI).
This 32-bit entry point will be used by QEMU to load the kernel in the
guest and jump into the kernel entry point.

For now, a call to this function is added in pc_memory_init() to read the
address - a future patch will use the entry point.

Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
---
 hw/i386/pc.c  | 272 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 include/elf.h |  10 +++
 2 files changed, 281 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f095725dbab2..056aa46d99b9 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -109,6 +109,9 @@ static struct e820_entry *e820_table;
 static unsigned e820_entries;
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
+/* Physical Address of PVH entry point read from kernel ELF NOTE */
+static size_t pvh_start_addr;
+
 void gsi_handler(void *opaque, int n, int level)
 {
     GSIState *s = opaque;
@@ -834,6 +837,267 @@ struct setup_data {
     uint8_t data[0];
 } __attribute__((packed));
 
+/*
+ * Search through the ELF Notes for an entry with the given
+ * ELF Note type
+ */
+static void *get_elf_note_type(void *ehdr, void *phdr, bool elf_is64,
+    size_t elf_note_type)
+{
+    void *nhdr = NULL;
+    size_t nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
+    size_t elf_note_entry_sz = 0;
+    size_t phdr_off;
+    size_t phdr_align;
+    size_t phdr_memsz;
+    size_t nhdr_namesz;
+    size_t nhdr_descsz;
+    size_t note_type;
+
+    phdr_off = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
+    phdr_align = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
+    phdr_memsz = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
+
+    nhdr = ehdr + phdr_off;
+    note_type = elf_is64 ?
+        ((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
+    nhdr_namesz = elf_is64 ?
+        ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
+    nhdr_descsz = elf_is64 ?
+        ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
+
+    while (note_type != elf_note_type) {
+        elf_note_entry_sz = nhdr_size +
+            QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
+            QEMU_ALIGN_UP(nhdr_descsz, phdr_align);
+
+        /*
+         * Verify that we haven't exceeded the end of the ELF Note section.
+         * If we have, then there is no note of the given type present
+         * in the ELF Notes.
+         */
+        if (phdr_off + phdr_memsz < ((nhdr - ehdr) + elf_note_entry_sz)) {
+            error_report("Note type (0x%lx) not found in ELF Note section",
+                elf_note_type);
+            return NULL;
+        }
+
+        /* skip to the next ELF Note entry */
+        nhdr += elf_note_entry_sz;
+        note_type = elf_is64 ?
+            ((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
+        nhdr_namesz = elf_is64 ?
+            ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
+        nhdr_descsz = elf_is64 ?
+            ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
+    }
+
+    return nhdr;
+}
+
+/*
+ * The entry point into the kernel for PVH boot is different from
+ * the native entry point.  The PVH entry is defined by the x86/HVM
+ * direct boot ABI and is available in an ELFNOTE in the kernel binary.
+ * This function reads the ELF headers of the binary specified on the
+ * command line by -kernel (path contained in 'filename') and discovers
+ * the PVH entry address from the appropriate ELF Note.
+ *
+ * The address of the PVH entry point is saved to the 'pvh_start_addr'
+ * global variable. The ELF class of the binary is returned via 'elfclass'
+ * (although the entry point is 32-bit, the kernel binary can be either
+ * 32-bit or 64-bit).
+ */
+static bool read_pvh_start_addr_elf_note(const char *filename,
+    unsigned char *elfclass)
+{
+    void *ehdr = NULL; /* Cast to Elf64_Ehdr or Elf32_Ehdr */
+    void *phdr = NULL; /* Cast to Elf64_Phdr or Elf32_Phdr */
+    void *nhdr = NULL; /* Cast to Elf64_Nhdr or Elf32_Nhdr */
+    struct stat statbuf;
+    size_t ehdr_size;
+    size_t phdr_size;
+    size_t nhdr_size;
+    size_t elf_note_data_addr;
+    /* Ehdr fields */
+    size_t ehdr_poff;
+    /* Phdr fields */
+    size_t phdr_off;
+    size_t phdr_align;
+    size_t phdr_memsz;
+    size_t phdr_type;
+    /* Nhdr fields */
+    size_t nhdr_namesz;
+    size_t nhdr_descsz;
+    bool elf_is64;
+    FILE *file;
+    union {
+        Elf32_Ehdr h32;
+        Elf64_Ehdr h64;
+    } elf_header;
+    Error *err = NULL;
+
+    pvh_start_addr = 0;
+
+    if (filename == NULL) {
+        return false;
+    }
+
+    file = fopen(filename, "rb");
+    if (file == NULL) {
+        error_report("fopen(%s) failed", filename);
+        return false;
+    }
+
+    if (fstat(fileno(file), &statbuf) < 0) {
+        error_report("fstat() failed on file (%s)", filename);
+        return false;
+    }
+
+    load_elf_hdr(filename, &elf_header, &elf_is64, &err);
+    if (err) {
+        error_free(err);
+        fclose(file);
+        return false;
+    }
+
+    *elfclass = elf_is64 ?
+        elf_header.h64.e_ident[EI_CLASS] : elf_header.h32.e_ident[EI_CLASS];
+    if (*elfclass == ELFCLASSNONE) {
+        error_report("kernel binary (%s) is ELFCLASSNONE", filename);
+        fclose(file);
+        return false;
+    }
+
+    ehdr_size = elf_is64 ? sizeof(Elf64_Ehdr) : sizeof(Elf32_Ehdr);
+    phdr_size = elf_is64 ? sizeof(Elf64_Phdr) : sizeof(Elf32_Phdr);
+    nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
+
+    /* We have already validated the ELF header when calling elf_load_hdr() */
+
+    ehdr = mmap(0, statbuf.st_size,
+        PROT_READ | PROT_WRITE, MAP_PRIVATE, fileno(file), 0);
+    if (ehdr == MAP_FAILED) {
+        error_report("Failed to mmap kernel binary (%s)", filename);
+        goto done;
+    }
+
+    /*
+     * Search through the program execution header for the
+     * ELF Note section.
+     */
+
+    ehdr_poff = elf_is64 ?
+        ((Elf64_Ehdr *)(ehdr))->e_phoff : ((Elf32_Ehdr *)(ehdr))->e_phoff;
+    if (statbuf.st_size < (ehdr_size + ehdr_poff)) {
+        error_report("ELF NOTE section exceeds file (%s) size",
+            filename);
+        goto done;
+    }
+
+    phdr = ehdr + ehdr_poff;
+    phdr_type = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_type : ((Elf32_Phdr *)phdr)->p_type;
+    while (phdr != NULL && phdr_type != PT_NOTE) {
+        if (statbuf.st_size < ((phdr - ehdr) + phdr_size)) {
+            error_report("ELF Program headers in file (%s) too short",
+                filename);
+            goto done;
+        }
+        phdr += phdr_size;
+        phdr_type = elf_is64 ?
+            ((Elf64_Phdr *)phdr)->p_type : ((Elf32_Phdr *)phdr)->p_type;
+    }
+
+    phdr_off = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
+    phdr_align = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
+    phdr_memsz = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
+
+    /*
+     * check that the start of the ELF Note section is within the bounds
+     * of the kernel ELF binary
+     */
+    if (statbuf.st_size < (ehdr_poff + phdr_size + phdr_off)) {
+        error_report("Start of ELF note section outside of file (%s) bounds",
+            filename);
+        goto done;
+    }
+    /*
+     * check that the end of the ELF Note section is within the bounds
+     * of the kernel ELF binary
+     */
+    if (statbuf.st_size < (phdr_off + phdr_memsz)) {
+        error_report("End of ELF note section outside of file (%s) bounds",
+            filename);
+        goto done;
+    }
+
+    /*
+     * Search through the ELF Notes for an entry with the
+     * Physical Address (PA) of the PVH entry point.
+     */
+    nhdr = get_elf_note_type(ehdr, phdr, elf_is64, XEN_ELFNOTE_PHYS32_ENTRY);
+    if (nhdr == NULL) {
+        error_report("No PVH Entry details in kernel (%s) ELF Note section",
+            filename);
+        goto done;
+    }
+
+    /*
+     * Verify that the returned ELF Note header doesn't exceed the
+     * end of the kernel file
+     */
+    if (statbuf.st_size < ((nhdr - ehdr))) {
+        error_report("ELF Nhdr offset (0x%lx) exceeds file (%s) bounds (%ld)",
+            (nhdr - ehdr), filename, statbuf.st_size);
+        goto done;
+    }
+
+    nhdr_namesz = elf_is64 ?
+        ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
+    nhdr_descsz = elf_is64 ?
+        ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
+
+    /*
+     * Verify that the ELF Note contents don't exceed the end of the
+     * kernel file
+     */
+    if (statbuf.st_size < ((nhdr - ehdr)) + nhdr_size +
+        QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
+        QEMU_ALIGN_UP(nhdr_descsz, phdr_align)) {
+        error_report("ELF Nhdr contents (0x%lx) exceeds file bounds (%ld)",
+            (nhdr - ehdr) + nhdr_size + QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
+            QEMU_ALIGN_UP(nhdr_descsz, phdr_align), statbuf.st_size);
+        goto done;
+    }
+
+    elf_note_data_addr =
+        (size_t)nhdr + nhdr_size + QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
+
+    pvh_start_addr = *(size_t *)elf_note_data_addr;
+
+    /*
+     * Verify that the PVH Entry point address does not exceed the
+     * bounds of the kernel file.
+     */
+    if (statbuf.st_size < pvh_start_addr) {
+        error_report("PVH ELF note addr (0x%lx) exceeds file (%s) bounds (%ld)",
+            (elf_note_data_addr - (size_t)ehdr), filename, statbuf.st_size);
+        pvh_start_addr = 0;
+        goto done;
+    }
+
+done:
+    (void) munmap(ehdr, statbuf.st_size);
+    return pvh_start_addr != 0;
+}
+
 static void load_linux(PCMachineState *pcms,
                        FWCfgState *fw_cfg)
 {
@@ -1334,9 +1598,11 @@ void pc_memory_init(PCMachineState *pcms,
     int linux_boot, i;
     MemoryRegion *ram, *option_rom_mr;
     MemoryRegion *ram_below_4g, *ram_above_4g;
-    FWCfgState *fw_cfg;
+    FWCfgState *fw_cfg = NULL;
+    unsigned char class = ELFCLASSNONE;
     MachineState *machine = MACHINE(pcms);
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+    const char *kernel_filename = machine->kernel_filename;
 
     assert(machine->ram_size == pcms->below_4g_mem_size +
                                 pcms->above_4g_mem_size);
@@ -1418,6 +1684,10 @@ void pc_memory_init(PCMachineState *pcms,
                                     &machine->device_memory->mr);
     }
 
+    if (linux_boot) {
+        read_pvh_start_addr_elf_note(kernel_filename, &class);
+    }
+
     /* Initialize PC system firmware */
     pc_system_firmware_init(rom_memory, !pcmc->pci_enabled);
 
diff --git a/include/elf.h b/include/elf.h
index c151164b63da..1f82c7a7124b 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -1585,6 +1585,16 @@ typedef struct elf64_shdr {
 #define NT_ARM_HW_WATCH 0x403           /* ARM hardware watchpoint registers */
 #define NT_ARM_SYSTEM_CALL      0x404   /* ARM system call number */
 
+/*
+ * Physical entry point into the kernel.
+ *
+ * 32bit entry point into the kernel. When requested to launch the
+ * guest kernel, use this entry point to launch the guest in 32-bit
+ * protected mode with paging disabled.
+ *
+ * [ Corresponding definition in Linux kernel: include/xen/interface/elfnote.h ]
+ */
+#define XEN_ELFNOTE_PHYS32_ENTRY    18  /* 0x12 */
 
 /* Note header in a PT_NOTE section */
 typedef struct elf32_note {
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Qemu-devel] [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI
  2018-12-05 22:37 ` Liam Merwick
@ 2018-12-05 22:37   ` Liam Merwick
  -1 siblings, 0 replies; 32+ messages in thread
From: Liam Merwick @ 2018-12-05 22:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, ehabkost, rth, xen-devel, sgarzare, mst, stefanha,
	maran.wilson, liam.merwick

These changes (along with corresponding qboot and Linux kernel changes)
enable a guest to be booted using the x86/HVM direct boot ABI.

This commit adds a load_elfboot() routine to pass the size and
location of the kernel entry point to qboot (which will fill in
the start_info struct information needed to to boot the guest).
Having loaded the ELF binary, load_linux() will run qboot
which continues the boot.

The address for the kernel entry point has already been read
from an ELF Note in the uncompressed kernel binary earlier
in pc_memory_init().

Signed-off-by: George Kennedy <George.Kennedy@oracle.com>
Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
---
 hw/i386/pc.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 056aa46d99b9..d3012cbd8597 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -54,6 +54,7 @@
 #include "sysemu/qtest.h"
 #include "kvm_i386.h"
 #include "hw/xen/xen.h"
+#include "hw/xen/start_info.h"
 #include "ui/qemu-spice.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
@@ -1098,6 +1099,50 @@ done:
     return pvh_start_addr != 0;
 }
 
+static bool load_elfboot(const char *kernel_filename,
+                   int kernel_file_size,
+                   uint8_t *header,
+                   size_t pvh_xen_start_addr,
+                   FWCfgState *fw_cfg)
+{
+    uint32_t flags = 0;
+    uint32_t mh_load_addr = 0;
+    uint32_t elf_kernel_size = 0;
+    uint64_t elf_entry;
+    uint64_t elf_low, elf_high;
+    int kernel_size;
+
+    if (ldl_p(header) != 0x464c457f) {
+        return false; /* no elfboot */
+    }
+
+    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
+    flags = elf_is64 ?
+        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
+
+    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
+        error_report("elfboot unsupported flags = %x", flags);
+        exit(1);
+    }
+
+    kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
+                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
+                           0, 0);
+
+    if (kernel_size < 0) {
+        error_report("Error while loading elf kernel");
+        exit(1);
+    }
+    mh_load_addr = elf_low;
+    elf_kernel_size = elf_high - elf_low;
+
+    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
+
+    return true;
+}
+
 static void load_linux(PCMachineState *pcms,
                        FWCfgState *fw_cfg)
 {
@@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms,
     if (ldl_p(header+0x202) == 0x53726448) {
         protocol = lduw_p(header+0x206);
     } else {
+        /* If the kernel address for using the x86/HVM direct boot ABI has
+         * been saved then proceed with booting the uncompressed kernel */
+        if (pvh_start_addr) {
+            if (load_elfboot(kernel_filename, kernel_size,
+                             header, pvh_start_addr, fw_cfg)) {
+                struct hvm_modlist_entry ramdisk_mod = { 0 };
+
+                fclose(f);
+
+                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
+                    strlen(kernel_cmdline) + 1);
+                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
+
+                assert(machine->device_memory != NULL);
+                ramdisk_mod.paddr = machine->device_memory->base;
+                ramdisk_mod.size =
+                    memory_region_size(&machine->device_memory->mr);
+
+                fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod,
+                                 sizeof(ramdisk_mod));
+                fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
+                fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
+                                 header, sizeof(header));
+
+                return;
+            }
+        }
         /* This looks like a multiboot kernel. If it is, let's stop
            treating it like a Linux kernel. */
         if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI
@ 2018-12-05 22:37   ` Liam Merwick
  0 siblings, 0 replies; 32+ messages in thread
From: Liam Merwick @ 2018-12-05 22:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: liam.merwick, ehabkost, mst, maran.wilson, stefanha, xen-devel,
	pbonzini, rth, sgarzare

These changes (along with corresponding qboot and Linux kernel changes)
enable a guest to be booted using the x86/HVM direct boot ABI.

This commit adds a load_elfboot() routine to pass the size and
location of the kernel entry point to qboot (which will fill in
the start_info struct information needed to to boot the guest).
Having loaded the ELF binary, load_linux() will run qboot
which continues the boot.

The address for the kernel entry point has already been read
from an ELF Note in the uncompressed kernel binary earlier
in pc_memory_init().

Signed-off-by: George Kennedy <George.Kennedy@oracle.com>
Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
---
 hw/i386/pc.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 056aa46d99b9..d3012cbd8597 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -54,6 +54,7 @@
 #include "sysemu/qtest.h"
 #include "kvm_i386.h"
 #include "hw/xen/xen.h"
+#include "hw/xen/start_info.h"
 #include "ui/qemu-spice.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
@@ -1098,6 +1099,50 @@ done:
     return pvh_start_addr != 0;
 }
 
+static bool load_elfboot(const char *kernel_filename,
+                   int kernel_file_size,
+                   uint8_t *header,
+                   size_t pvh_xen_start_addr,
+                   FWCfgState *fw_cfg)
+{
+    uint32_t flags = 0;
+    uint32_t mh_load_addr = 0;
+    uint32_t elf_kernel_size = 0;
+    uint64_t elf_entry;
+    uint64_t elf_low, elf_high;
+    int kernel_size;
+
+    if (ldl_p(header) != 0x464c457f) {
+        return false; /* no elfboot */
+    }
+
+    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
+    flags = elf_is64 ?
+        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
+
+    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
+        error_report("elfboot unsupported flags = %x", flags);
+        exit(1);
+    }
+
+    kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
+                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
+                           0, 0);
+
+    if (kernel_size < 0) {
+        error_report("Error while loading elf kernel");
+        exit(1);
+    }
+    mh_load_addr = elf_low;
+    elf_kernel_size = elf_high - elf_low;
+
+    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
+
+    return true;
+}
+
 static void load_linux(PCMachineState *pcms,
                        FWCfgState *fw_cfg)
 {
@@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms,
     if (ldl_p(header+0x202) == 0x53726448) {
         protocol = lduw_p(header+0x206);
     } else {
+        /* If the kernel address for using the x86/HVM direct boot ABI has
+         * been saved then proceed with booting the uncompressed kernel */
+        if (pvh_start_addr) {
+            if (load_elfboot(kernel_filename, kernel_size,
+                             header, pvh_start_addr, fw_cfg)) {
+                struct hvm_modlist_entry ramdisk_mod = { 0 };
+
+                fclose(f);
+
+                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
+                    strlen(kernel_cmdline) + 1);
+                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
+
+                assert(machine->device_memory != NULL);
+                ramdisk_mod.paddr = machine->device_memory->base;
+                ramdisk_mod.size =
+                    memory_region_size(&machine->device_memory->mr);
+
+                fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod,
+                                 sizeof(ramdisk_mod));
+                fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
+                fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
+                                 header, sizeof(header));
+
+                return;
+            }
+        }
         /* This looks like a multiboot kernel. If it is, let's stop
            treating it like a Linux kernel. */
         if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/3] QEMU changes to do PVH boot
  2018-12-05 22:37 ` Liam Merwick
@ 2018-12-06  0:01   ` no-reply
  -1 siblings, 0 replies; 32+ messages in thread
From: no-reply @ 2018-12-06  0:01 UTC (permalink / raw)
  To: liam.merwick
  Cc: famz, qemu-devel, ehabkost, mst, maran.wilson, stefanha,
	xen-devel, pbonzini, rth, sgarzare

Patchew URL: https://patchew.org/QEMU/1544049446-6359-1-git-send-email-liam.merwick@oracle.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-mingw@fedora SHOW_ENV=1 J=8
=== TEST SCRIPT END ===

  CC      x86_64-softmmu/target/i386/hax-windows.o
  CC      x86_64-softmmu/target/i386/sev-stub.o
/tmp/qemu-test/src/hw/i386/pc.c: In function 'get_elf_note_type':
/tmp/qemu-test/src/hw/i386/pc.c:884:42: error: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'size_t {aka long long unsigned int}' [-Werror=format=]
             error_report("Note type (0x%lx) not found in ELF Note section",
                                        ~~^
                                        %llx
/tmp/qemu-test/src/hw/i386/pc.c: In function 'read_pvh_start_addr_elf_note':
/tmp/qemu-test/src/hw/i386/pc.c:982:12: error: implicit declaration of function 'mmap'; did you mean 'max'? [-Werror=implicit-function-declaration]
     ehdr = mmap(0, statbuf.st_size,
            ^~~~
            max
/tmp/qemu-test/src/hw/i386/pc.c:982:12: error: nested extern declaration of 'mmap' [-Werror=nested-externs]
/tmp/qemu-test/src/hw/i386/pc.c:983:9: error: 'PROT_READ' undeclared (first use in this function); did you mean 'OF_READ'?
         PROT_READ | PROT_WRITE, MAP_PRIVATE, fileno(file), 0);
         ^~~~~~~~~
         OF_READ
/tmp/qemu-test/src/hw/i386/pc.c:983:9: note: each undeclared identifier is reported only once for each function it appears in
/tmp/qemu-test/src/hw/i386/pc.c:983:21: error: 'PROT_WRITE' undeclared (first use in this function); did you mean 'OF_WRITE'?
         PROT_READ | PROT_WRITE, MAP_PRIVATE, fileno(file), 0);
                     ^~~~~~~~~~
                     OF_WRITE
/tmp/qemu-test/src/hw/i386/pc.c:983:33: error: 'MAP_PRIVATE' undeclared (first use in this function); did you mean 'MEM_PRIVATE'?
         PROT_READ | PROT_WRITE, MAP_PRIVATE, fileno(file), 0);
                                 ^~~~~~~~~~~
                                 MEM_PRIVATE
/tmp/qemu-test/src/hw/i386/pc.c:984:17: error: 'MAP_FAILED' undeclared (first use in this function); did you mean 'WAIT_FAILED'?
     if (ehdr == MAP_FAILED) {
                 ^~~~~~~~~~
                 WAIT_FAILED
/tmp/qemu-test/src/hw/i386/pc.c:1058:44: error: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'long long int' [-Werror=format=]
         error_report("ELF Nhdr offset (0x%lx) exceeds file (%s) bounds (%ld)",
                                          ~~^
                                          %llx
             (nhdr - ehdr), filename, statbuf.st_size);
             ~~~~~~~~~~~~~                   
/tmp/qemu-test/src/hw/i386/pc.c:1058:75: error: format '%ld' expects argument of type 'long int', but argument 4 has type 'long long int' [-Werror=format=]
         error_report("ELF Nhdr offset (0x%lx) exceeds file (%s) bounds (%ld)",
                                                                         ~~^
                                                                         %lld
             (nhdr - ehdr), filename, statbuf.st_size);
                                      ~~~~~~~~~~~~~~~                       
/tmp/qemu-test/src/hw/i386/pc.c:1075:46: error: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'long long unsigned int' [-Werror=format=]
         error_report("ELF Nhdr contents (0x%lx) exceeds file bounds (%ld)",
                                            ~~^
                                            %llx
/tmp/qemu-test/src/hw/i386/pc.c:1075:72: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format=]
         error_report("ELF Nhdr contents (0x%lx) exceeds file bounds (%ld)",
                                                                      ~~^
                                                                      %lld
/tmp/qemu-test/src/hw/i386/pc.c:1077:53:
             QEMU_ALIGN_UP(nhdr_descsz, phdr_align), statbuf.st_size);
                                                     ~~~~~~~~~~~~~~~     
/tmp/qemu-test/src/hw/i386/pc.c:1091:46: error: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'long long unsigned int' [-Werror=format=]
         error_report("PVH ELF note addr (0x%lx) exceeds file (%s) bounds (%ld)",
                                            ~~^
                                            %llx
             (elf_note_data_addr - (size_t)ehdr), filename, statbuf.st_size);
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/qemu-test/src/hw/i386/pc.c:1091:77: error: format '%ld' expects argument of type 'long int', but argument 4 has type 'long long int' [-Werror=format=]
         error_report("PVH ELF note addr (0x%lx) exceeds file (%s) bounds (%ld)",
                                                                           ~~^
                                                                           %lld
             (elf_note_data_addr - (size_t)ehdr), filename, statbuf.st_size);
                                                            ~~~~~~~~~~~~~~~   
/tmp/qemu-test/src/hw/i386/pc.c:1098:12: error: implicit declaration of function 'munmap'; did you mean 'gunzip'? [-Werror=implicit-function-declaration]
     (void) munmap(ehdr, statbuf.st_size);
            ^~~~~~
            gunzip
/tmp/qemu-test/src/hw/i386/pc.c:1098:12: error: nested extern declaration of 'munmap' [-Werror=nested-externs]
cc1: all warnings being treated as errors
  CC      aarch64-softmmu/hw/display/xlnx_dp.o
  GEN     trace/generated-helpers.c


The full log is available at
http://patchew.org/logs/1544049446-6359-1-git-send-email-liam.merwick@oracle.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/3] QEMU changes to do PVH boot
@ 2018-12-06  0:01   ` no-reply
  0 siblings, 0 replies; 32+ messages in thread
From: no-reply @ 2018-12-06  0:01 UTC (permalink / raw)
  To: liam.merwick
  Cc: famz, ehabkost, maran.wilson, mst, qemu-devel, stefanha,
	pbonzini, xen-devel, sgarzare, rth

Patchew URL: https://patchew.org/QEMU/1544049446-6359-1-git-send-email-liam.merwick@oracle.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-mingw@fedora SHOW_ENV=1 J=8
=== TEST SCRIPT END ===

  CC      x86_64-softmmu/target/i386/hax-windows.o
  CC      x86_64-softmmu/target/i386/sev-stub.o
/tmp/qemu-test/src/hw/i386/pc.c: In function 'get_elf_note_type':
/tmp/qemu-test/src/hw/i386/pc.c:884:42: error: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'size_t {aka long long unsigned int}' [-Werror=format=]
             error_report("Note type (0x%lx) not found in ELF Note section",
                                        ~~^
                                        %llx
/tmp/qemu-test/src/hw/i386/pc.c: In function 'read_pvh_start_addr_elf_note':
/tmp/qemu-test/src/hw/i386/pc.c:982:12: error: implicit declaration of function 'mmap'; did you mean 'max'? [-Werror=implicit-function-declaration]
     ehdr = mmap(0, statbuf.st_size,
            ^~~~
            max
/tmp/qemu-test/src/hw/i386/pc.c:982:12: error: nested extern declaration of 'mmap' [-Werror=nested-externs]
/tmp/qemu-test/src/hw/i386/pc.c:983:9: error: 'PROT_READ' undeclared (first use in this function); did you mean 'OF_READ'?
         PROT_READ | PROT_WRITE, MAP_PRIVATE, fileno(file), 0);
         ^~~~~~~~~
         OF_READ
/tmp/qemu-test/src/hw/i386/pc.c:983:9: note: each undeclared identifier is reported only once for each function it appears in
/tmp/qemu-test/src/hw/i386/pc.c:983:21: error: 'PROT_WRITE' undeclared (first use in this function); did you mean 'OF_WRITE'?
         PROT_READ | PROT_WRITE, MAP_PRIVATE, fileno(file), 0);
                     ^~~~~~~~~~
                     OF_WRITE
/tmp/qemu-test/src/hw/i386/pc.c:983:33: error: 'MAP_PRIVATE' undeclared (first use in this function); did you mean 'MEM_PRIVATE'?
         PROT_READ | PROT_WRITE, MAP_PRIVATE, fileno(file), 0);
                                 ^~~~~~~~~~~
                                 MEM_PRIVATE
/tmp/qemu-test/src/hw/i386/pc.c:984:17: error: 'MAP_FAILED' undeclared (first use in this function); did you mean 'WAIT_FAILED'?
     if (ehdr == MAP_FAILED) {
                 ^~~~~~~~~~
                 WAIT_FAILED
/tmp/qemu-test/src/hw/i386/pc.c:1058:44: error: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'long long int' [-Werror=format=]
         error_report("ELF Nhdr offset (0x%lx) exceeds file (%s) bounds (%ld)",
                                          ~~^
                                          %llx
             (nhdr - ehdr), filename, statbuf.st_size);
             ~~~~~~~~~~~~~                   
/tmp/qemu-test/src/hw/i386/pc.c:1058:75: error: format '%ld' expects argument of type 'long int', but argument 4 has type 'long long int' [-Werror=format=]
         error_report("ELF Nhdr offset (0x%lx) exceeds file (%s) bounds (%ld)",
                                                                         ~~^
                                                                         %lld
             (nhdr - ehdr), filename, statbuf.st_size);
                                      ~~~~~~~~~~~~~~~                       
/tmp/qemu-test/src/hw/i386/pc.c:1075:46: error: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'long long unsigned int' [-Werror=format=]
         error_report("ELF Nhdr contents (0x%lx) exceeds file bounds (%ld)",
                                            ~~^
                                            %llx
/tmp/qemu-test/src/hw/i386/pc.c:1075:72: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format=]
         error_report("ELF Nhdr contents (0x%lx) exceeds file bounds (%ld)",
                                                                      ~~^
                                                                      %lld
/tmp/qemu-test/src/hw/i386/pc.c:1077:53:
             QEMU_ALIGN_UP(nhdr_descsz, phdr_align), statbuf.st_size);
                                                     ~~~~~~~~~~~~~~~     
/tmp/qemu-test/src/hw/i386/pc.c:1091:46: error: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'long long unsigned int' [-Werror=format=]
         error_report("PVH ELF note addr (0x%lx) exceeds file (%s) bounds (%ld)",
                                            ~~^
                                            %llx
             (elf_note_data_addr - (size_t)ehdr), filename, statbuf.st_size);
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/qemu-test/src/hw/i386/pc.c:1091:77: error: format '%ld' expects argument of type 'long int', but argument 4 has type 'long long int' [-Werror=format=]
         error_report("PVH ELF note addr (0x%lx) exceeds file (%s) bounds (%ld)",
                                                                           ~~^
                                                                           %lld
             (elf_note_data_addr - (size_t)ehdr), filename, statbuf.st_size);
                                                            ~~~~~~~~~~~~~~~   
/tmp/qemu-test/src/hw/i386/pc.c:1098:12: error: implicit declaration of function 'munmap'; did you mean 'gunzip'? [-Werror=implicit-function-declaration]
     (void) munmap(ehdr, statbuf.st_size);
            ^~~~~~
            gunzip
/tmp/qemu-test/src/hw/i386/pc.c:1098:12: error: nested extern declaration of 'munmap' [-Werror=nested-externs]
cc1: all warnings being treated as errors
  CC      aarch64-softmmu/hw/display/xlnx_dp.o
  GEN     trace/generated-helpers.c


The full log is available at
http://patchew.org/logs/1544049446-6359-1-git-send-email-liam.merwick@oracle.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/3] QEMU changes to do PVH boot
  2018-12-05 22:37 ` Liam Merwick
@ 2018-12-06  6:18   ` Maran Wilson
  -1 siblings, 0 replies; 32+ messages in thread
From: Maran Wilson @ 2018-12-06  6:18 UTC (permalink / raw)
  To: Liam Merwick, qemu-devel
  Cc: pbonzini, ehabkost, rth, xen-devel, sgarzare, mst, stefanha

On 12/5/2018 2:37 PM, Liam Merwick wrote:
> For certain applications it is desirable to rapidly boot a KVM virtual
> machine. In cases where legacy hardware and software support within the
> guest is not needed, QEMU should be able to boot directly into the
> uncompressed Linux kernel binary with minimal firmware involvement.
>
> There already exists an ABI to allow this for Xen PVH guests and the ABI
> is supported by Linux and FreeBSD:
>
>     https://xenbits.xen.org/docs/unstable/misc/pvh.html
>
> Details on the Linux changes: https://lkml.org/lkml/2018/4/16/1002

In case anyone wants to grab the patches and give it a try, I've just 
posted an updated version of the Linux patches rebased to the latest 
mainline code:

https://lkml.org/lkml/2018/12/6/26

No functional changes from before, just some minor conflict resolution 
as part of the rebase.

Thanks,
-Maran

> qboot patches: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=80020
>
> This patch series provides QEMU support to read the ELF header of an
> uncompressed kernel binary and get the 32-bit PVH kernel entry point
> from an ELF Note.  This is called when initialising the machine state
> in pc_memory_init().  Later on in load_linux() if the kernel entry
> address is present, the uncompressed kernel binary (ELF) is loaded
> and qboot does futher initialisation of the guest (e820, etc.) and
> jumps to the kernel entry address and boots the guest.
>
>
> Usіng the method/scripts documented by the NEMU team at
>
>     https://github.com/intel/nemu/wiki/Measuring-Boot-Latency
>     https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html
>
> below are some timings measured (vmlinux and bzImage from the same build)
> Time to get to kernel start is almost halved (95ṁs -> 48ms)
>
> QEMU + qboot + vmlinux (PVH + 4.20-rc4)
>   qemu_init_end: 41.550521
>   fw_start: 41.667139 (+0.116618)
>   fw_do_boot: 47.448495 (+5.781356)
>   linux_startup_64: 47.720785 (+0.27229)
>   linux_start_kernel: 48.399541 (+0.678756)
>   linux_start_user: 296.952056 (+248.552515)
>
> QEMU + qboot + bzImage:
>   qemu_init_end: 29.209276
>   fw_start: 29.317342 (+0.108066)
>   linux_start_boot: 36.679362 (+7.36202)
>   linux_startup_64: 94.531349 (+57.851987)
>   linux_start_kernel: 94.900913 (+0.369564)
>   linux_start_user: 401.060971 (+306.160058)
>
> QEMU + bzImage:
>   qemu_init_end: 30.424430
>   linux_startup_64: 893.770334 (+863.345904)
>   linux_start_kernel: 894.17049 (+0.400156)
>   linux_start_user: 1208.679768 (+314.509278)
>
>
> Liam Merwick (3):
>    pvh: Add x86/HVM direct boot ABI header file
>    pc: Read PVH entry point from ELF note in kernel binary
>    pvh: Boot uncompressed kernel using direct boot ABI
>
>   hw/i386/pc.c                | 344 +++++++++++++++++++++++++++++++++++++++++++-
>   include/elf.h               |  10 ++
>   include/hw/xen/start_info.h | 146 +++++++++++++++++++
>   3 files changed, 499 insertions(+), 1 deletion(-)
>   create mode 100644 include/hw/xen/start_info.h
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 0/3] QEMU changes to do PVH boot
@ 2018-12-06  6:18   ` Maran Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Maran Wilson @ 2018-12-06  6:18 UTC (permalink / raw)
  To: Liam Merwick, qemu-devel
  Cc: ehabkost, mst, stefanha, xen-devel, pbonzini, rth, sgarzare

On 12/5/2018 2:37 PM, Liam Merwick wrote:
> For certain applications it is desirable to rapidly boot a KVM virtual
> machine. In cases where legacy hardware and software support within the
> guest is not needed, QEMU should be able to boot directly into the
> uncompressed Linux kernel binary with minimal firmware involvement.
>
> There already exists an ABI to allow this for Xen PVH guests and the ABI
> is supported by Linux and FreeBSD:
>
>     https://xenbits.xen.org/docs/unstable/misc/pvh.html
>
> Details on the Linux changes: https://lkml.org/lkml/2018/4/16/1002

In case anyone wants to grab the patches and give it a try, I've just 
posted an updated version of the Linux patches rebased to the latest 
mainline code:

https://lkml.org/lkml/2018/12/6/26

No functional changes from before, just some minor conflict resolution 
as part of the rebase.

Thanks,
-Maran

> qboot patches: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=80020
>
> This patch series provides QEMU support to read the ELF header of an
> uncompressed kernel binary and get the 32-bit PVH kernel entry point
> from an ELF Note.  This is called when initialising the machine state
> in pc_memory_init().  Later on in load_linux() if the kernel entry
> address is present, the uncompressed kernel binary (ELF) is loaded
> and qboot does futher initialisation of the guest (e820, etc.) and
> jumps to the kernel entry address and boots the guest.
>
>
> Usіng the method/scripts documented by the NEMU team at
>
>     https://github.com/intel/nemu/wiki/Measuring-Boot-Latency
>     https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html
>
> below are some timings measured (vmlinux and bzImage from the same build)
> Time to get to kernel start is almost halved (95ṁs -> 48ms)
>
> QEMU + qboot + vmlinux (PVH + 4.20-rc4)
>   qemu_init_end: 41.550521
>   fw_start: 41.667139 (+0.116618)
>   fw_do_boot: 47.448495 (+5.781356)
>   linux_startup_64: 47.720785 (+0.27229)
>   linux_start_kernel: 48.399541 (+0.678756)
>   linux_start_user: 296.952056 (+248.552515)
>
> QEMU + qboot + bzImage:
>   qemu_init_end: 29.209276
>   fw_start: 29.317342 (+0.108066)
>   linux_start_boot: 36.679362 (+7.36202)
>   linux_startup_64: 94.531349 (+57.851987)
>   linux_start_kernel: 94.900913 (+0.369564)
>   linux_start_user: 401.060971 (+306.160058)
>
> QEMU + bzImage:
>   qemu_init_end: 30.424430
>   linux_startup_64: 893.770334 (+863.345904)
>   linux_start_kernel: 894.17049 (+0.400156)
>   linux_start_user: 1208.679768 (+314.509278)
>
>
> Liam Merwick (3):
>    pvh: Add x86/HVM direct boot ABI header file
>    pc: Read PVH entry point from ELF note in kernel binary
>    pvh: Boot uncompressed kernel using direct boot ABI
>
>   hw/i386/pc.c                | 344 +++++++++++++++++++++++++++++++++++++++++++-
>   include/elf.h               |  10 ++
>   include/hw/xen/start_info.h | 146 +++++++++++++++++++
>   3 files changed, 499 insertions(+), 1 deletion(-)
>   create mode 100644 include/hw/xen/start_info.h
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file
  2018-12-05 22:37   ` Liam Merwick
@ 2018-12-11 14:01     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2018-12-11 14:01 UTC (permalink / raw)
  To: Liam Merwick
  Cc: qemu-devel, pbonzini, ehabkost, rth, xen-devel, sgarzare, mst,
	maran.wilson

[-- Attachment #1: Type: text/plain, Size: 1302 bytes --]

On Wed, Dec 05, 2018 at 10:37:24PM +0000, Liam Merwick wrote:
> From: Liam Merwick <Liam.Merwick@oracle.com>
> 
> The x86/HVM direct boot ABI permits Qemu to be able to boot directly
> into the uncompressed Linux kernel binary without the need to run firmware.
> 
> 	https://xenbits.xen.org/docs/unstable/misc/pvh.html
> 
> This commit adds the header file that defines the start_info struct
> that needs to be populated in order to use this ABI.
> 
> Signed-off-by: Maran Wilson <Maran.Wilson@oracle.com>
> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
> Reviewed-by: Konrad Rzeszutek Wilk <Konrad.Wilk@oracle.com>
> ---
>  include/hw/xen/start_info.h | 146 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 146 insertions(+)
>  create mode 100644 include/hw/xen/start_info.h

Does it make sense to bring in Linux
include/xen/interface/hvm/start_info.h via QEMU's
include/standard-headers/?

QEMU has a script in scripts/update-linux-header.sh for syncing Linux
headers into include/standard-headers/.  This makes it easy to keep
Linux header files up-to-date.  We basically treat files in
include/standard-headers/ as auto-generated.

If you define start_info.h yourself without using
include/standard-headers/, then it won't be synced with Linux.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file
@ 2018-12-11 14:01     ` Stefan Hajnoczi
  0 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2018-12-11 14:01 UTC (permalink / raw)
  To: Liam Merwick
  Cc: ehabkost, mst, maran.wilson, qemu-devel, xen-devel, pbonzini,
	rth, sgarzare


[-- Attachment #1.1: Type: text/plain, Size: 1302 bytes --]

On Wed, Dec 05, 2018 at 10:37:24PM +0000, Liam Merwick wrote:
> From: Liam Merwick <Liam.Merwick@oracle.com>
> 
> The x86/HVM direct boot ABI permits Qemu to be able to boot directly
> into the uncompressed Linux kernel binary without the need to run firmware.
> 
> 	https://xenbits.xen.org/docs/unstable/misc/pvh.html
> 
> This commit adds the header file that defines the start_info struct
> that needs to be populated in order to use this ABI.
> 
> Signed-off-by: Maran Wilson <Maran.Wilson@oracle.com>
> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
> Reviewed-by: Konrad Rzeszutek Wilk <Konrad.Wilk@oracle.com>
> ---
>  include/hw/xen/start_info.h | 146 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 146 insertions(+)
>  create mode 100644 include/hw/xen/start_info.h

Does it make sense to bring in Linux
include/xen/interface/hvm/start_info.h via QEMU's
include/standard-headers/?

QEMU has a script in scripts/update-linux-header.sh for syncing Linux
headers into include/standard-headers/.  This makes it easy to keep
Linux header files up-to-date.  We basically treat files in
include/standard-headers/ as auto-generated.

If you define start_info.h yourself without using
include/standard-headers/, then it won't be synced with Linux.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 2/3] pc: Read PVH entry point from ELF note in kernel binary
  2018-12-05 22:37   ` Liam Merwick
  (?)
@ 2018-12-11 14:17   ` Stefan Hajnoczi
  2018-12-21 20:03       ` Liam Merwick
  -1 siblings, 1 reply; 32+ messages in thread
From: Stefan Hajnoczi @ 2018-12-11 14:17 UTC (permalink / raw)
  To: Liam Merwick
  Cc: qemu-devel, pbonzini, ehabkost, rth, xen-devel, sgarzare, mst,
	maran.wilson

[-- Attachment #1: Type: text/plain, Size: 13547 bytes --]

On Wed, Dec 05, 2018 at 10:37:25PM +0000, Liam Merwick wrote:
> From: Liam Merwick <Liam.Merwick@oracle.com>
> 
> Add support to read the PVH Entry address from an ELF note in the
> uncompressed kernel binary (as defined by the x86/HVM direct boot ABI).
> This 32-bit entry point will be used by QEMU to load the kernel in the
> guest and jump into the kernel entry point.
> 
> For now, a call to this function is added in pc_memory_init() to read the
> address - a future patch will use the entry point.
> 
> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
> ---
>  hw/i386/pc.c  | 272 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  include/elf.h |  10 +++
>  2 files changed, 281 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index f095725dbab2..056aa46d99b9 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -109,6 +109,9 @@ static struct e820_entry *e820_table;
>  static unsigned e820_entries;
>  struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
>  
> +/* Physical Address of PVH entry point read from kernel ELF NOTE */
> +static size_t pvh_start_addr;
> +
>  void gsi_handler(void *opaque, int n, int level)
>  {
>      GSIState *s = opaque;
> @@ -834,6 +837,267 @@ struct setup_data {
>      uint8_t data[0];
>  } __attribute__((packed));
>  
> +/*
> + * Search through the ELF Notes for an entry with the given
> + * ELF Note type
> + */
> +static void *get_elf_note_type(void *ehdr, void *phdr, bool elf_is64,
> +    size_t elf_note_type)

Generic ELF code.  Can you put it in hw/core/loader.c?

> +{
> +    void *nhdr = NULL;
> +    size_t nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
> +    size_t elf_note_entry_sz = 0;
> +    size_t phdr_off;
> +    size_t phdr_align;
> +    size_t phdr_memsz;
> +    size_t nhdr_namesz;
> +    size_t nhdr_descsz;
> +    size_t note_type;

The macro tricks used by hw/core/loader.c are nasty, but I think they
get the types right.  Here the Elf64 on 32-bit host case is definitely
broken due to using size_t.  Perhaps 64-on-32 isn't supported, but
getting the types right is worth discussing.

> +
> +    phdr_off = elf_is64 ?
> +        ((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
> +    phdr_align = elf_is64 ?
> +        ((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
> +    phdr_memsz = elf_is64 ?
> +        ((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
> +
> +    nhdr = ehdr + phdr_off;

The ELF file is untrusted.  All inputs must be validated.  phdr_off
could be an bogus/malicious value.

> +    note_type = elf_is64 ?
> +        ((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
> +    nhdr_namesz = elf_is64 ?
> +        ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
> +    nhdr_descsz = elf_is64 ?
> +        ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
> +
> +    while (note_type != elf_note_type) {
> +        elf_note_entry_sz = nhdr_size +
> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
> +            QEMU_ALIGN_UP(nhdr_descsz, phdr_align);
> +
> +        /*
> +         * Verify that we haven't exceeded the end of the ELF Note section.
> +         * If we have, then there is no note of the given type present
> +         * in the ELF Notes.
> +         */
> +        if (phdr_off + phdr_memsz < ((nhdr - ehdr) + elf_note_entry_sz)) {
> +            error_report("Note type (0x%lx) not found in ELF Note section",
> +                elf_note_type);
> +            return NULL;
> +        }
> +
> +        /* skip to the next ELF Note entry */
> +        nhdr += elf_note_entry_sz;
> +        note_type = elf_is64 ?
> +            ((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
> +        nhdr_namesz = elf_is64 ?
> +            ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
> +        nhdr_descsz = elf_is64 ?
> +            ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
> +    }
> +
> +    return nhdr;
> +}
> +
> +/*
> + * The entry point into the kernel for PVH boot is different from
> + * the native entry point.  The PVH entry is defined by the x86/HVM
> + * direct boot ABI and is available in an ELFNOTE in the kernel binary.
> + * This function reads the ELF headers of the binary specified on the
> + * command line by -kernel (path contained in 'filename') and discovers
> + * the PVH entry address from the appropriate ELF Note.
> + *
> + * The address of the PVH entry point is saved to the 'pvh_start_addr'
> + * global variable. The ELF class of the binary is returned via 'elfclass'
> + * (although the entry point is 32-bit, the kernel binary can be either
> + * 32-bit or 64-bit).
> + */
> +static bool read_pvh_start_addr_elf_note(const char *filename,
> +    unsigned char *elfclass)
> +{

Can this be integrated into ELF loading?  For example, could the elf
loader take a function pointer to perform additional logic (e.g.
extracting the PVH entry point)?  That avoids reparsing the input file.

> +    void *ehdr = NULL; /* Cast to Elf64_Ehdr or Elf32_Ehdr */
> +    void *phdr = NULL; /* Cast to Elf64_Phdr or Elf32_Phdr */
> +    void *nhdr = NULL; /* Cast to Elf64_Nhdr or Elf32_Nhdr */
> +    struct stat statbuf;
> +    size_t ehdr_size;
> +    size_t phdr_size;
> +    size_t nhdr_size;
> +    size_t elf_note_data_addr;
> +    /* Ehdr fields */
> +    size_t ehdr_poff;
> +    /* Phdr fields */
> +    size_t phdr_off;
> +    size_t phdr_align;
> +    size_t phdr_memsz;
> +    size_t phdr_type;
> +    /* Nhdr fields */
> +    size_t nhdr_namesz;
> +    size_t nhdr_descsz;
> +    bool elf_is64;
> +    FILE *file;
> +    union {
> +        Elf32_Ehdr h32;
> +        Elf64_Ehdr h64;
> +    } elf_header;
> +    Error *err = NULL;
> +
> +    pvh_start_addr = 0;
> +
> +    if (filename == NULL) {
> +        return false;
> +    }
> +
> +    file = fopen(filename, "rb");
> +    if (file == NULL) {
> +        error_report("fopen(%s) failed", filename);
> +        return false;
> +    }
> +
> +    if (fstat(fileno(file), &statbuf) < 0) {
> +        error_report("fstat() failed on file (%s)", filename);
> +        return false;
> +    }
> +
> +    load_elf_hdr(filename, &elf_header, &elf_is64, &err);
> +    if (err) {
> +        error_free(err);
> +        fclose(file);
> +        return false;
> +    }
> +
> +    *elfclass = elf_is64 ?
> +        elf_header.h64.e_ident[EI_CLASS] : elf_header.h32.e_ident[EI_CLASS];
> +    if (*elfclass == ELFCLASSNONE) {
> +        error_report("kernel binary (%s) is ELFCLASSNONE", filename);
> +        fclose(file);
> +        return false;
> +    }
> +
> +    ehdr_size = elf_is64 ? sizeof(Elf64_Ehdr) : sizeof(Elf32_Ehdr);
> +    phdr_size = elf_is64 ? sizeof(Elf64_Phdr) : sizeof(Elf32_Phdr);
> +    nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
> +
> +    /* We have already validated the ELF header when calling elf_load_hdr() */
> +
> +    ehdr = mmap(0, statbuf.st_size,
> +        PROT_READ | PROT_WRITE, MAP_PRIVATE, fileno(file), 0);
> +    if (ehdr == MAP_FAILED) {
> +        error_report("Failed to mmap kernel binary (%s)", filename);
> +        goto done;
> +    }
> +
> +    /*
> +     * Search through the program execution header for the
> +     * ELF Note section.
> +     */
> +
> +    ehdr_poff = elf_is64 ?
> +        ((Elf64_Ehdr *)(ehdr))->e_phoff : ((Elf32_Ehdr *)(ehdr))->e_phoff;
> +    if (statbuf.st_size < (ehdr_size + ehdr_poff)) {
> +        error_report("ELF NOTE section exceeds file (%s) size",
> +            filename);
> +        goto done;
> +    }
> +
> +    phdr = ehdr + ehdr_poff;
> +    phdr_type = elf_is64 ?
> +        ((Elf64_Phdr *)phdr)->p_type : ((Elf32_Phdr *)phdr)->p_type;
> +    while (phdr != NULL && phdr_type != PT_NOTE) {
> +        if (statbuf.st_size < ((phdr - ehdr) + phdr_size)) {
> +            error_report("ELF Program headers in file (%s) too short",
> +                filename);
> +            goto done;
> +        }
> +        phdr += phdr_size;
> +        phdr_type = elf_is64 ?
> +            ((Elf64_Phdr *)phdr)->p_type : ((Elf32_Phdr *)phdr)->p_type;
> +    }
> +
> +    phdr_off = elf_is64 ?
> +        ((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
> +    phdr_align = elf_is64 ?
> +        ((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
> +    phdr_memsz = elf_is64 ?
> +        ((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
> +
> +    /*
> +     * check that the start of the ELF Note section is within the bounds
> +     * of the kernel ELF binary
> +     */
> +    if (statbuf.st_size < (ehdr_poff + phdr_size + phdr_off)) {
> +        error_report("Start of ELF note section outside of file (%s) bounds",
> +            filename);
> +        goto done;
> +    }
> +    /*
> +     * check that the end of the ELF Note section is within the bounds
> +     * of the kernel ELF binary
> +     */
> +    if (statbuf.st_size < (phdr_off + phdr_memsz)) {
> +        error_report("End of ELF note section outside of file (%s) bounds",
> +            filename);
> +        goto done;
> +    }
> +
> +    /*
> +     * Search through the ELF Notes for an entry with the
> +     * Physical Address (PA) of the PVH entry point.
> +     */
> +    nhdr = get_elf_note_type(ehdr, phdr, elf_is64, XEN_ELFNOTE_PHYS32_ENTRY);
> +    if (nhdr == NULL) {
> +        error_report("No PVH Entry details in kernel (%s) ELF Note section",
> +            filename);
> +        goto done;
> +    }
> +
> +    /*
> +     * Verify that the returned ELF Note header doesn't exceed the
> +     * end of the kernel file
> +     */
> +    if (statbuf.st_size < ((nhdr - ehdr))) {
> +        error_report("ELF Nhdr offset (0x%lx) exceeds file (%s) bounds (%ld)",
> +            (nhdr - ehdr), filename, statbuf.st_size);
> +        goto done;
> +    }
> +
> +    nhdr_namesz = elf_is64 ?
> +        ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
> +    nhdr_descsz = elf_is64 ?
> +        ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
> +
> +    /*
> +     * Verify that the ELF Note contents don't exceed the end of the
> +     * kernel file
> +     */
> +    if (statbuf.st_size < ((nhdr - ehdr)) + nhdr_size +
> +        QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
> +        QEMU_ALIGN_UP(nhdr_descsz, phdr_align)) {
> +        error_report("ELF Nhdr contents (0x%lx) exceeds file bounds (%ld)",
> +            (nhdr - ehdr) + nhdr_size + QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
> +            QEMU_ALIGN_UP(nhdr_descsz, phdr_align), statbuf.st_size);
> +        goto done;
> +    }
> +
> +    elf_note_data_addr =
> +        (size_t)nhdr + nhdr_size + QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> +
> +    pvh_start_addr = *(size_t *)elf_note_data_addr;
> +
> +    /*
> +     * Verify that the PVH Entry point address does not exceed the
> +     * bounds of the kernel file.
> +     */
> +    if (statbuf.st_size < pvh_start_addr) {
> +        error_report("PVH ELF note addr (0x%lx) exceeds file (%s) bounds (%ld)",
> +            (elf_note_data_addr - (size_t)ehdr), filename, statbuf.st_size);
> +        pvh_start_addr = 0;
> +        goto done;
> +    }
> +
> +done:
> +    (void) munmap(ehdr, statbuf.st_size);
> +    return pvh_start_addr != 0;
> +}
> +
>  static void load_linux(PCMachineState *pcms,
>                         FWCfgState *fw_cfg)
>  {
> @@ -1334,9 +1598,11 @@ void pc_memory_init(PCMachineState *pcms,
>      int linux_boot, i;
>      MemoryRegion *ram, *option_rom_mr;
>      MemoryRegion *ram_below_4g, *ram_above_4g;
> -    FWCfgState *fw_cfg;
> +    FWCfgState *fw_cfg = NULL;
> +    unsigned char class = ELFCLASSNONE;
>      MachineState *machine = MACHINE(pcms);
>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> +    const char *kernel_filename = machine->kernel_filename;
>  
>      assert(machine->ram_size == pcms->below_4g_mem_size +
>                                  pcms->above_4g_mem_size);
> @@ -1418,6 +1684,10 @@ void pc_memory_init(PCMachineState *pcms,
>                                      &machine->device_memory->mr);
>      }
>  
> +    if (linux_boot) {
> +        read_pvh_start_addr_elf_note(kernel_filename, &class);
> +    }
> +
>      /* Initialize PC system firmware */
>      pc_system_firmware_init(rom_memory, !pcmc->pci_enabled);
>  
> diff --git a/include/elf.h b/include/elf.h
> index c151164b63da..1f82c7a7124b 100644
> --- a/include/elf.h
> +++ b/include/elf.h
> @@ -1585,6 +1585,16 @@ typedef struct elf64_shdr {
>  #define NT_ARM_HW_WATCH 0x403           /* ARM hardware watchpoint registers */
>  #define NT_ARM_SYSTEM_CALL      0x404   /* ARM system call number */
>  
> +/*
> + * Physical entry point into the kernel.
> + *
> + * 32bit entry point into the kernel. When requested to launch the
> + * guest kernel, use this entry point to launch the guest in 32-bit
> + * protected mode with paging disabled.
> + *
> + * [ Corresponding definition in Linux kernel: include/xen/interface/elfnote.h ]
> + */
> +#define XEN_ELFNOTE_PHYS32_ENTRY    18  /* 0x12 */
>  
>  /* Note header in a PT_NOTE section */
>  typedef struct elf32_note {
> -- 
> 1.8.3.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 2/3] pc: Read PVH entry point from ELF note in kernel binary
  2018-12-05 22:37   ` Liam Merwick
  (?)
  (?)
@ 2018-12-11 14:17   ` Stefan Hajnoczi
  -1 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2018-12-11 14:17 UTC (permalink / raw)
  To: Liam Merwick
  Cc: ehabkost, mst, maran.wilson, qemu-devel, xen-devel, pbonzini,
	rth, sgarzare


[-- Attachment #1.1: Type: text/plain, Size: 13547 bytes --]

On Wed, Dec 05, 2018 at 10:37:25PM +0000, Liam Merwick wrote:
> From: Liam Merwick <Liam.Merwick@oracle.com>
> 
> Add support to read the PVH Entry address from an ELF note in the
> uncompressed kernel binary (as defined by the x86/HVM direct boot ABI).
> This 32-bit entry point will be used by QEMU to load the kernel in the
> guest and jump into the kernel entry point.
> 
> For now, a call to this function is added in pc_memory_init() to read the
> address - a future patch will use the entry point.
> 
> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
> ---
>  hw/i386/pc.c  | 272 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  include/elf.h |  10 +++
>  2 files changed, 281 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index f095725dbab2..056aa46d99b9 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -109,6 +109,9 @@ static struct e820_entry *e820_table;
>  static unsigned e820_entries;
>  struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
>  
> +/* Physical Address of PVH entry point read from kernel ELF NOTE */
> +static size_t pvh_start_addr;
> +
>  void gsi_handler(void *opaque, int n, int level)
>  {
>      GSIState *s = opaque;
> @@ -834,6 +837,267 @@ struct setup_data {
>      uint8_t data[0];
>  } __attribute__((packed));
>  
> +/*
> + * Search through the ELF Notes for an entry with the given
> + * ELF Note type
> + */
> +static void *get_elf_note_type(void *ehdr, void *phdr, bool elf_is64,
> +    size_t elf_note_type)

Generic ELF code.  Can you put it in hw/core/loader.c?

> +{
> +    void *nhdr = NULL;
> +    size_t nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
> +    size_t elf_note_entry_sz = 0;
> +    size_t phdr_off;
> +    size_t phdr_align;
> +    size_t phdr_memsz;
> +    size_t nhdr_namesz;
> +    size_t nhdr_descsz;
> +    size_t note_type;

The macro tricks used by hw/core/loader.c are nasty, but I think they
get the types right.  Here the Elf64 on 32-bit host case is definitely
broken due to using size_t.  Perhaps 64-on-32 isn't supported, but
getting the types right is worth discussing.

> +
> +    phdr_off = elf_is64 ?
> +        ((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
> +    phdr_align = elf_is64 ?
> +        ((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
> +    phdr_memsz = elf_is64 ?
> +        ((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
> +
> +    nhdr = ehdr + phdr_off;

The ELF file is untrusted.  All inputs must be validated.  phdr_off
could be an bogus/malicious value.

> +    note_type = elf_is64 ?
> +        ((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
> +    nhdr_namesz = elf_is64 ?
> +        ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
> +    nhdr_descsz = elf_is64 ?
> +        ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
> +
> +    while (note_type != elf_note_type) {
> +        elf_note_entry_sz = nhdr_size +
> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
> +            QEMU_ALIGN_UP(nhdr_descsz, phdr_align);
> +
> +        /*
> +         * Verify that we haven't exceeded the end of the ELF Note section.
> +         * If we have, then there is no note of the given type present
> +         * in the ELF Notes.
> +         */
> +        if (phdr_off + phdr_memsz < ((nhdr - ehdr) + elf_note_entry_sz)) {
> +            error_report("Note type (0x%lx) not found in ELF Note section",
> +                elf_note_type);
> +            return NULL;
> +        }
> +
> +        /* skip to the next ELF Note entry */
> +        nhdr += elf_note_entry_sz;
> +        note_type = elf_is64 ?
> +            ((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
> +        nhdr_namesz = elf_is64 ?
> +            ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
> +        nhdr_descsz = elf_is64 ?
> +            ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
> +    }
> +
> +    return nhdr;
> +}
> +
> +/*
> + * The entry point into the kernel for PVH boot is different from
> + * the native entry point.  The PVH entry is defined by the x86/HVM
> + * direct boot ABI and is available in an ELFNOTE in the kernel binary.
> + * This function reads the ELF headers of the binary specified on the
> + * command line by -kernel (path contained in 'filename') and discovers
> + * the PVH entry address from the appropriate ELF Note.
> + *
> + * The address of the PVH entry point is saved to the 'pvh_start_addr'
> + * global variable. The ELF class of the binary is returned via 'elfclass'
> + * (although the entry point is 32-bit, the kernel binary can be either
> + * 32-bit or 64-bit).
> + */
> +static bool read_pvh_start_addr_elf_note(const char *filename,
> +    unsigned char *elfclass)
> +{

Can this be integrated into ELF loading?  For example, could the elf
loader take a function pointer to perform additional logic (e.g.
extracting the PVH entry point)?  That avoids reparsing the input file.

> +    void *ehdr = NULL; /* Cast to Elf64_Ehdr or Elf32_Ehdr */
> +    void *phdr = NULL; /* Cast to Elf64_Phdr or Elf32_Phdr */
> +    void *nhdr = NULL; /* Cast to Elf64_Nhdr or Elf32_Nhdr */
> +    struct stat statbuf;
> +    size_t ehdr_size;
> +    size_t phdr_size;
> +    size_t nhdr_size;
> +    size_t elf_note_data_addr;
> +    /* Ehdr fields */
> +    size_t ehdr_poff;
> +    /* Phdr fields */
> +    size_t phdr_off;
> +    size_t phdr_align;
> +    size_t phdr_memsz;
> +    size_t phdr_type;
> +    /* Nhdr fields */
> +    size_t nhdr_namesz;
> +    size_t nhdr_descsz;
> +    bool elf_is64;
> +    FILE *file;
> +    union {
> +        Elf32_Ehdr h32;
> +        Elf64_Ehdr h64;
> +    } elf_header;
> +    Error *err = NULL;
> +
> +    pvh_start_addr = 0;
> +
> +    if (filename == NULL) {
> +        return false;
> +    }
> +
> +    file = fopen(filename, "rb");
> +    if (file == NULL) {
> +        error_report("fopen(%s) failed", filename);
> +        return false;
> +    }
> +
> +    if (fstat(fileno(file), &statbuf) < 0) {
> +        error_report("fstat() failed on file (%s)", filename);
> +        return false;
> +    }
> +
> +    load_elf_hdr(filename, &elf_header, &elf_is64, &err);
> +    if (err) {
> +        error_free(err);
> +        fclose(file);
> +        return false;
> +    }
> +
> +    *elfclass = elf_is64 ?
> +        elf_header.h64.e_ident[EI_CLASS] : elf_header.h32.e_ident[EI_CLASS];
> +    if (*elfclass == ELFCLASSNONE) {
> +        error_report("kernel binary (%s) is ELFCLASSNONE", filename);
> +        fclose(file);
> +        return false;
> +    }
> +
> +    ehdr_size = elf_is64 ? sizeof(Elf64_Ehdr) : sizeof(Elf32_Ehdr);
> +    phdr_size = elf_is64 ? sizeof(Elf64_Phdr) : sizeof(Elf32_Phdr);
> +    nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
> +
> +    /* We have already validated the ELF header when calling elf_load_hdr() */
> +
> +    ehdr = mmap(0, statbuf.st_size,
> +        PROT_READ | PROT_WRITE, MAP_PRIVATE, fileno(file), 0);
> +    if (ehdr == MAP_FAILED) {
> +        error_report("Failed to mmap kernel binary (%s)", filename);
> +        goto done;
> +    }
> +
> +    /*
> +     * Search through the program execution header for the
> +     * ELF Note section.
> +     */
> +
> +    ehdr_poff = elf_is64 ?
> +        ((Elf64_Ehdr *)(ehdr))->e_phoff : ((Elf32_Ehdr *)(ehdr))->e_phoff;
> +    if (statbuf.st_size < (ehdr_size + ehdr_poff)) {
> +        error_report("ELF NOTE section exceeds file (%s) size",
> +            filename);
> +        goto done;
> +    }
> +
> +    phdr = ehdr + ehdr_poff;
> +    phdr_type = elf_is64 ?
> +        ((Elf64_Phdr *)phdr)->p_type : ((Elf32_Phdr *)phdr)->p_type;
> +    while (phdr != NULL && phdr_type != PT_NOTE) {
> +        if (statbuf.st_size < ((phdr - ehdr) + phdr_size)) {
> +            error_report("ELF Program headers in file (%s) too short",
> +                filename);
> +            goto done;
> +        }
> +        phdr += phdr_size;
> +        phdr_type = elf_is64 ?
> +            ((Elf64_Phdr *)phdr)->p_type : ((Elf32_Phdr *)phdr)->p_type;
> +    }
> +
> +    phdr_off = elf_is64 ?
> +        ((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
> +    phdr_align = elf_is64 ?
> +        ((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
> +    phdr_memsz = elf_is64 ?
> +        ((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
> +
> +    /*
> +     * check that the start of the ELF Note section is within the bounds
> +     * of the kernel ELF binary
> +     */
> +    if (statbuf.st_size < (ehdr_poff + phdr_size + phdr_off)) {
> +        error_report("Start of ELF note section outside of file (%s) bounds",
> +            filename);
> +        goto done;
> +    }
> +    /*
> +     * check that the end of the ELF Note section is within the bounds
> +     * of the kernel ELF binary
> +     */
> +    if (statbuf.st_size < (phdr_off + phdr_memsz)) {
> +        error_report("End of ELF note section outside of file (%s) bounds",
> +            filename);
> +        goto done;
> +    }
> +
> +    /*
> +     * Search through the ELF Notes for an entry with the
> +     * Physical Address (PA) of the PVH entry point.
> +     */
> +    nhdr = get_elf_note_type(ehdr, phdr, elf_is64, XEN_ELFNOTE_PHYS32_ENTRY);
> +    if (nhdr == NULL) {
> +        error_report("No PVH Entry details in kernel (%s) ELF Note section",
> +            filename);
> +        goto done;
> +    }
> +
> +    /*
> +     * Verify that the returned ELF Note header doesn't exceed the
> +     * end of the kernel file
> +     */
> +    if (statbuf.st_size < ((nhdr - ehdr))) {
> +        error_report("ELF Nhdr offset (0x%lx) exceeds file (%s) bounds (%ld)",
> +            (nhdr - ehdr), filename, statbuf.st_size);
> +        goto done;
> +    }
> +
> +    nhdr_namesz = elf_is64 ?
> +        ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
> +    nhdr_descsz = elf_is64 ?
> +        ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
> +
> +    /*
> +     * Verify that the ELF Note contents don't exceed the end of the
> +     * kernel file
> +     */
> +    if (statbuf.st_size < ((nhdr - ehdr)) + nhdr_size +
> +        QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
> +        QEMU_ALIGN_UP(nhdr_descsz, phdr_align)) {
> +        error_report("ELF Nhdr contents (0x%lx) exceeds file bounds (%ld)",
> +            (nhdr - ehdr) + nhdr_size + QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
> +            QEMU_ALIGN_UP(nhdr_descsz, phdr_align), statbuf.st_size);
> +        goto done;
> +    }
> +
> +    elf_note_data_addr =
> +        (size_t)nhdr + nhdr_size + QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> +
> +    pvh_start_addr = *(size_t *)elf_note_data_addr;
> +
> +    /*
> +     * Verify that the PVH Entry point address does not exceed the
> +     * bounds of the kernel file.
> +     */
> +    if (statbuf.st_size < pvh_start_addr) {
> +        error_report("PVH ELF note addr (0x%lx) exceeds file (%s) bounds (%ld)",
> +            (elf_note_data_addr - (size_t)ehdr), filename, statbuf.st_size);
> +        pvh_start_addr = 0;
> +        goto done;
> +    }
> +
> +done:
> +    (void) munmap(ehdr, statbuf.st_size);
> +    return pvh_start_addr != 0;
> +}
> +
>  static void load_linux(PCMachineState *pcms,
>                         FWCfgState *fw_cfg)
>  {
> @@ -1334,9 +1598,11 @@ void pc_memory_init(PCMachineState *pcms,
>      int linux_boot, i;
>      MemoryRegion *ram, *option_rom_mr;
>      MemoryRegion *ram_below_4g, *ram_above_4g;
> -    FWCfgState *fw_cfg;
> +    FWCfgState *fw_cfg = NULL;
> +    unsigned char class = ELFCLASSNONE;
>      MachineState *machine = MACHINE(pcms);
>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> +    const char *kernel_filename = machine->kernel_filename;
>  
>      assert(machine->ram_size == pcms->below_4g_mem_size +
>                                  pcms->above_4g_mem_size);
> @@ -1418,6 +1684,10 @@ void pc_memory_init(PCMachineState *pcms,
>                                      &machine->device_memory->mr);
>      }
>  
> +    if (linux_boot) {
> +        read_pvh_start_addr_elf_note(kernel_filename, &class);
> +    }
> +
>      /* Initialize PC system firmware */
>      pc_system_firmware_init(rom_memory, !pcmc->pci_enabled);
>  
> diff --git a/include/elf.h b/include/elf.h
> index c151164b63da..1f82c7a7124b 100644
> --- a/include/elf.h
> +++ b/include/elf.h
> @@ -1585,6 +1585,16 @@ typedef struct elf64_shdr {
>  #define NT_ARM_HW_WATCH 0x403           /* ARM hardware watchpoint registers */
>  #define NT_ARM_SYSTEM_CALL      0x404   /* ARM system call number */
>  
> +/*
> + * Physical entry point into the kernel.
> + *
> + * 32bit entry point into the kernel. When requested to launch the
> + * guest kernel, use this entry point to launch the guest in 32-bit
> + * protected mode with paging disabled.
> + *
> + * [ Corresponding definition in Linux kernel: include/xen/interface/elfnote.h ]
> + */
> +#define XEN_ELFNOTE_PHYS32_ENTRY    18  /* 0x12 */
>  
>  /* Note header in a PT_NOTE section */
>  typedef struct elf32_note {
> -- 
> 1.8.3.1
> 

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file
  2018-12-11 14:01     ` Stefan Hajnoczi
@ 2018-12-11 14:57       ` Liam Merwick
  -1 siblings, 0 replies; 32+ messages in thread
From: Liam Merwick @ 2018-12-11 14:57 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, pbonzini, ehabkost, rth, xen-devel, sgarzare, mst,
	maran.wilson, George Kennedy, Boris Ostrovsky



On 11/12/2018 14:01, Stefan Hajnoczi wrote:
> On Wed, Dec 05, 2018 at 10:37:24PM +0000, Liam Merwick wrote:
>> From: Liam Merwick <Liam.Merwick@oracle.com>
>>
>> The x86/HVM direct boot ABI permits Qemu to be able to boot directly
>> into the uncompressed Linux kernel binary without the need to run firmware.
>>
>> 	https://xenbits.xen.org/docs/unstable/misc/pvh.html
>>
>> This commit adds the header file that defines the start_info struct
>> that needs to be populated in order to use this ABI.
>>
>> Signed-off-by: Maran Wilson <Maran.Wilson@oracle.com>
>> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
>> Reviewed-by: Konrad Rzeszutek Wilk <Konrad.Wilk@oracle.com>
>> ---
>>   include/hw/xen/start_info.h | 146 ++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 146 insertions(+)
>>   create mode 100644 include/hw/xen/start_info.h
> 
> Does it make sense to bring in Linux
> include/xen/interface/hvm/start_info.h via QEMU's
> include/standard-headers/?
> 
> QEMU has a script in scripts/update-linux-header.sh for syncing Linux
> headers into include/standard-headers/.  This makes it easy to keep
> Linux header files up-to-date.  We basically treat files in
> include/standard-headers/ as auto-generated.
> 
> If you define start_info.h yourself without using
> include/standard-headers/, then it won't be synced with Linux.
> 

That does seem better.  I will make that change.

One a related note, I'm trying to fix the mingw compilation errors [1] 
in this series also.  I can fix the format issues with PRIx64, etc but I 
can't seem to find an include file to provide a declaration of mmap() 
et. al. - has this been resolved before? A pointer to something similar 
to investigate would be very welcome.

Regards,
Liam

[1] 
http://patchew.org/logs/1544049446-6359-1-git-send-email-liam.merwick@oracle.com/testing.docker-mingw@fedora/?type=message

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file
@ 2018-12-11 14:57       ` Liam Merwick
  0 siblings, 0 replies; 32+ messages in thread
From: Liam Merwick @ 2018-12-11 14:57 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: ehabkost, mst, maran.wilson, qemu-devel, George Kennedy,
	xen-devel, pbonzini, Boris Ostrovsky, rth, sgarzare



On 11/12/2018 14:01, Stefan Hajnoczi wrote:
> On Wed, Dec 05, 2018 at 10:37:24PM +0000, Liam Merwick wrote:
>> From: Liam Merwick <Liam.Merwick@oracle.com>
>>
>> The x86/HVM direct boot ABI permits Qemu to be able to boot directly
>> into the uncompressed Linux kernel binary without the need to run firmware.
>>
>> 	https://xenbits.xen.org/docs/unstable/misc/pvh.html
>>
>> This commit adds the header file that defines the start_info struct
>> that needs to be populated in order to use this ABI.
>>
>> Signed-off-by: Maran Wilson <Maran.Wilson@oracle.com>
>> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
>> Reviewed-by: Konrad Rzeszutek Wilk <Konrad.Wilk@oracle.com>
>> ---
>>   include/hw/xen/start_info.h | 146 ++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 146 insertions(+)
>>   create mode 100644 include/hw/xen/start_info.h
> 
> Does it make sense to bring in Linux
> include/xen/interface/hvm/start_info.h via QEMU's
> include/standard-headers/?
> 
> QEMU has a script in scripts/update-linux-header.sh for syncing Linux
> headers into include/standard-headers/.  This makes it easy to keep
> Linux header files up-to-date.  We basically treat files in
> include/standard-headers/ as auto-generated.
> 
> If you define start_info.h yourself without using
> include/standard-headers/, then it won't be synced with Linux.
> 

That does seem better.  I will make that change.

One a related note, I'm trying to fix the mingw compilation errors [1] 
in this series also.  I can fix the format issues with PRIx64, etc but I 
can't seem to find an include file to provide a declaration of mmap() 
et. al. - has this been resolved before? A pointer to something similar 
to investigate would be very welcome.

Regards,
Liam

[1] 
http://patchew.org/logs/1544049446-6359-1-git-send-email-liam.merwick@oracle.com/testing.docker-mingw@fedora/?type=message

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file
  2018-12-11 14:57       ` Liam Merwick
@ 2018-12-11 15:03         ` Daniel P. Berrangé
  -1 siblings, 0 replies; 32+ messages in thread
From: Daniel P. Berrangé @ 2018-12-11 15:03 UTC (permalink / raw)
  To: Liam Merwick
  Cc: Stefan Hajnoczi, ehabkost, mst, maran.wilson, qemu-devel,
	George Kennedy, xen-devel, pbonzini, Boris Ostrovsky, rth,
	sgarzare

On Tue, Dec 11, 2018 at 02:57:29PM +0000, Liam Merwick wrote:
> 
> 
> On 11/12/2018 14:01, Stefan Hajnoczi wrote:
> > On Wed, Dec 05, 2018 at 10:37:24PM +0000, Liam Merwick wrote:
> > > From: Liam Merwick <Liam.Merwick@oracle.com>
> > > 
> > > The x86/HVM direct boot ABI permits Qemu to be able to boot directly
> > > into the uncompressed Linux kernel binary without the need to run firmware.
> > > 
> > > 	https://xenbits.xen.org/docs/unstable/misc/pvh.html
> > > 
> > > This commit adds the header file that defines the start_info struct
> > > that needs to be populated in order to use this ABI.
> > > 
> > > Signed-off-by: Maran Wilson <Maran.Wilson@oracle.com>
> > > Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
> > > Reviewed-by: Konrad Rzeszutek Wilk <Konrad.Wilk@oracle.com>
> > > ---
> > >   include/hw/xen/start_info.h | 146 ++++++++++++++++++++++++++++++++++++++++++++
> > >   1 file changed, 146 insertions(+)
> > >   create mode 100644 include/hw/xen/start_info.h
> > 
> > Does it make sense to bring in Linux
> > include/xen/interface/hvm/start_info.h via QEMU's
> > include/standard-headers/?
> > 
> > QEMU has a script in scripts/update-linux-header.sh for syncing Linux
> > headers into include/standard-headers/.  This makes it easy to keep
> > Linux header files up-to-date.  We basically treat files in
> > include/standard-headers/ as auto-generated.
> > 
> > If you define start_info.h yourself without using
> > include/standard-headers/, then it won't be synced with Linux.
> > 
> 
> That does seem better.  I will make that change.
> 
> One a related note, I'm trying to fix the mingw compilation errors [1] in
> this series also.  I can fix the format issues with PRIx64, etc but I can't
> seem to find an include file to provide a declaration of mmap() et. al. -
> has this been resolved before? A pointer to something similar to investigate
> would be very welcome.

There is no mmap() on mingw, so you'll have to make sure that code is
conditionally compiled with  #ifndef WIN32 where appropriate.

> [1] http://patchew.org/logs/1544049446-6359-1-git-send-email-liam.merwick@oracle.com/testing.docker-mingw@fedora/?type=message
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file
@ 2018-12-11 15:03         ` Daniel P. Berrangé
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel P. Berrangé @ 2018-12-11 15:03 UTC (permalink / raw)
  To: Liam Merwick
  Cc: ehabkost, maran.wilson, mst, qemu-devel, George Kennedy,
	Stefan Hajnoczi, pbonzini, xen-devel, Boris Ostrovsky, sgarzare,
	rth

On Tue, Dec 11, 2018 at 02:57:29PM +0000, Liam Merwick wrote:
> 
> 
> On 11/12/2018 14:01, Stefan Hajnoczi wrote:
> > On Wed, Dec 05, 2018 at 10:37:24PM +0000, Liam Merwick wrote:
> > > From: Liam Merwick <Liam.Merwick@oracle.com>
> > > 
> > > The x86/HVM direct boot ABI permits Qemu to be able to boot directly
> > > into the uncompressed Linux kernel binary without the need to run firmware.
> > > 
> > > 	https://xenbits.xen.org/docs/unstable/misc/pvh.html
> > > 
> > > This commit adds the header file that defines the start_info struct
> > > that needs to be populated in order to use this ABI.
> > > 
> > > Signed-off-by: Maran Wilson <Maran.Wilson@oracle.com>
> > > Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
> > > Reviewed-by: Konrad Rzeszutek Wilk <Konrad.Wilk@oracle.com>
> > > ---
> > >   include/hw/xen/start_info.h | 146 ++++++++++++++++++++++++++++++++++++++++++++
> > >   1 file changed, 146 insertions(+)
> > >   create mode 100644 include/hw/xen/start_info.h
> > 
> > Does it make sense to bring in Linux
> > include/xen/interface/hvm/start_info.h via QEMU's
> > include/standard-headers/?
> > 
> > QEMU has a script in scripts/update-linux-header.sh for syncing Linux
> > headers into include/standard-headers/.  This makes it easy to keep
> > Linux header files up-to-date.  We basically treat files in
> > include/standard-headers/ as auto-generated.
> > 
> > If you define start_info.h yourself without using
> > include/standard-headers/, then it won't be synced with Linux.
> > 
> 
> That does seem better.  I will make that change.
> 
> One a related note, I'm trying to fix the mingw compilation errors [1] in
> this series also.  I can fix the format issues with PRIx64, etc but I can't
> seem to find an include file to provide a declaration of mmap() et. al. -
> has this been resolved before? A pointer to something similar to investigate
> would be very welcome.

There is no mmap() on mingw, so you'll have to make sure that code is
conditionally compiled with  #ifndef WIN32 where appropriate.

> [1] http://patchew.org/logs/1544049446-6359-1-git-send-email-liam.merwick@oracle.com/testing.docker-mingw@fedora/?type=message
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI
  2018-12-05 22:37   ` Liam Merwick
  (?)
@ 2018-12-11 17:11   ` Stefano Garzarella
  2018-12-11 18:35       ` Maran Wilson
  -1 siblings, 1 reply; 32+ messages in thread
From: Stefano Garzarella @ 2018-12-11 17:11 UTC (permalink / raw)
  To: liam.merwick
  Cc: qemu-devel, Paolo Bonzini, Eduardo Habkost, rth, xen-devel, mst,
	Stefan Hajnoczi, maran.wilson

Hi Liam,
in order to support PVH also with SeaBIOS, I'm going to work on a new
option rom (like linuxboot/multiboot) that can be used in this case.

I'll keep you updated on it!

Cheers,
Stefano
On Wed, Dec 5, 2018 at 11:38 PM Liam Merwick <liam.merwick@oracle.com> wrote:
>
> These changes (along with corresponding qboot and Linux kernel changes)
> enable a guest to be booted using the x86/HVM direct boot ABI.
>
> This commit adds a load_elfboot() routine to pass the size and
> location of the kernel entry point to qboot (which will fill in
> the start_info struct information needed to to boot the guest).
> Having loaded the ELF binary, load_linux() will run qboot
> which continues the boot.
>
> The address for the kernel entry point has already been read
> from an ELF Note in the uncompressed kernel binary earlier
> in pc_memory_init().
>
> Signed-off-by: George Kennedy <George.Kennedy@oracle.com>
> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
> ---
>  hw/i386/pc.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 72 insertions(+)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 056aa46d99b9..d3012cbd8597 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -54,6 +54,7 @@
>  #include "sysemu/qtest.h"
>  #include "kvm_i386.h"
>  #include "hw/xen/xen.h"
> +#include "hw/xen/start_info.h"
>  #include "ui/qemu-spice.h"
>  #include "exec/memory.h"
>  #include "exec/address-spaces.h"
> @@ -1098,6 +1099,50 @@ done:
>      return pvh_start_addr != 0;
>  }
>
> +static bool load_elfboot(const char *kernel_filename,
> +                   int kernel_file_size,
> +                   uint8_t *header,
> +                   size_t pvh_xen_start_addr,
> +                   FWCfgState *fw_cfg)
> +{
> +    uint32_t flags = 0;
> +    uint32_t mh_load_addr = 0;
> +    uint32_t elf_kernel_size = 0;
> +    uint64_t elf_entry;
> +    uint64_t elf_low, elf_high;
> +    int kernel_size;
> +
> +    if (ldl_p(header) != 0x464c457f) {
> +        return false; /* no elfboot */
> +    }
> +
> +    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
> +    flags = elf_is64 ?
> +        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
> +
> +    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
> +        error_report("elfboot unsupported flags = %x", flags);
> +        exit(1);
> +    }
> +
> +    kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
> +                           0, 0);
> +
> +    if (kernel_size < 0) {
> +        error_report("Error while loading elf kernel");
> +        exit(1);
> +    }
> +    mh_load_addr = elf_low;
> +    elf_kernel_size = elf_high - elf_low;
> +
> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> +
> +    return true;
> +}
> +
>  static void load_linux(PCMachineState *pcms,
>                         FWCfgState *fw_cfg)
>  {
> @@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms,
>      if (ldl_p(header+0x202) == 0x53726448) {
>          protocol = lduw_p(header+0x206);
>      } else {
> +        /* If the kernel address for using the x86/HVM direct boot ABI has
> +         * been saved then proceed with booting the uncompressed kernel */
> +        if (pvh_start_addr) {
> +            if (load_elfboot(kernel_filename, kernel_size,
> +                             header, pvh_start_addr, fw_cfg)) {
> +                struct hvm_modlist_entry ramdisk_mod = { 0 };
> +
> +                fclose(f);
> +
> +                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
> +                    strlen(kernel_cmdline) + 1);
> +                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> +
> +                assert(machine->device_memory != NULL);
> +                ramdisk_mod.paddr = machine->device_memory->base;
> +                ramdisk_mod.size =
> +                    memory_region_size(&machine->device_memory->mr);
> +
> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod,
> +                                 sizeof(ramdisk_mod));
> +                fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
> +                                 header, sizeof(header));
> +
> +                return;
> +            }
> +        }
>          /* This looks like a multiboot kernel. If it is, let's stop
>             treating it like a Linux kernel. */
>          if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
> --
> 1.8.3.1
>


-- 
Stefano Garzarella
Red Hat

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI
  2018-12-05 22:37   ` Liam Merwick
  (?)
  (?)
@ 2018-12-11 17:11   ` Stefano Garzarella
  -1 siblings, 0 replies; 32+ messages in thread
From: Stefano Garzarella @ 2018-12-11 17:11 UTC (permalink / raw)
  To: liam.merwick
  Cc: Eduardo Habkost, mst, maran.wilson, qemu-devel, Stefan Hajnoczi,
	xen-devel, Paolo Bonzini, rth

Hi Liam,
in order to support PVH also with SeaBIOS, I'm going to work on a new
option rom (like linuxboot/multiboot) that can be used in this case.

I'll keep you updated on it!

Cheers,
Stefano
On Wed, Dec 5, 2018 at 11:38 PM Liam Merwick <liam.merwick@oracle.com> wrote:
>
> These changes (along with corresponding qboot and Linux kernel changes)
> enable a guest to be booted using the x86/HVM direct boot ABI.
>
> This commit adds a load_elfboot() routine to pass the size and
> location of the kernel entry point to qboot (which will fill in
> the start_info struct information needed to to boot the guest).
> Having loaded the ELF binary, load_linux() will run qboot
> which continues the boot.
>
> The address for the kernel entry point has already been read
> from an ELF Note in the uncompressed kernel binary earlier
> in pc_memory_init().
>
> Signed-off-by: George Kennedy <George.Kennedy@oracle.com>
> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
> ---
>  hw/i386/pc.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 72 insertions(+)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 056aa46d99b9..d3012cbd8597 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -54,6 +54,7 @@
>  #include "sysemu/qtest.h"
>  #include "kvm_i386.h"
>  #include "hw/xen/xen.h"
> +#include "hw/xen/start_info.h"
>  #include "ui/qemu-spice.h"
>  #include "exec/memory.h"
>  #include "exec/address-spaces.h"
> @@ -1098,6 +1099,50 @@ done:
>      return pvh_start_addr != 0;
>  }
>
> +static bool load_elfboot(const char *kernel_filename,
> +                   int kernel_file_size,
> +                   uint8_t *header,
> +                   size_t pvh_xen_start_addr,
> +                   FWCfgState *fw_cfg)
> +{
> +    uint32_t flags = 0;
> +    uint32_t mh_load_addr = 0;
> +    uint32_t elf_kernel_size = 0;
> +    uint64_t elf_entry;
> +    uint64_t elf_low, elf_high;
> +    int kernel_size;
> +
> +    if (ldl_p(header) != 0x464c457f) {
> +        return false; /* no elfboot */
> +    }
> +
> +    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
> +    flags = elf_is64 ?
> +        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
> +
> +    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
> +        error_report("elfboot unsupported flags = %x", flags);
> +        exit(1);
> +    }
> +
> +    kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
> +                           0, 0);
> +
> +    if (kernel_size < 0) {
> +        error_report("Error while loading elf kernel");
> +        exit(1);
> +    }
> +    mh_load_addr = elf_low;
> +    elf_kernel_size = elf_high - elf_low;
> +
> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> +
> +    return true;
> +}
> +
>  static void load_linux(PCMachineState *pcms,
>                         FWCfgState *fw_cfg)
>  {
> @@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms,
>      if (ldl_p(header+0x202) == 0x53726448) {
>          protocol = lduw_p(header+0x206);
>      } else {
> +        /* If the kernel address for using the x86/HVM direct boot ABI has
> +         * been saved then proceed with booting the uncompressed kernel */
> +        if (pvh_start_addr) {
> +            if (load_elfboot(kernel_filename, kernel_size,
> +                             header, pvh_start_addr, fw_cfg)) {
> +                struct hvm_modlist_entry ramdisk_mod = { 0 };
> +
> +                fclose(f);
> +
> +                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
> +                    strlen(kernel_cmdline) + 1);
> +                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> +
> +                assert(machine->device_memory != NULL);
> +                ramdisk_mod.paddr = machine->device_memory->base;
> +                ramdisk_mod.size =
> +                    memory_region_size(&machine->device_memory->mr);
> +
> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod,
> +                                 sizeof(ramdisk_mod));
> +                fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
> +                                 header, sizeof(header));
> +
> +                return;
> +            }
> +        }
>          /* This looks like a multiboot kernel. If it is, let's stop
>             treating it like a Linux kernel. */
>          if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
> --
> 1.8.3.1
>


-- 
Stefano Garzarella
Red Hat

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI
  2018-12-11 17:11   ` [Qemu-devel] " Stefano Garzarella
@ 2018-12-11 18:35       ` Maran Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Maran Wilson @ 2018-12-11 18:35 UTC (permalink / raw)
  To: Stefano Garzarella, liam.merwick
  Cc: qemu-devel, Paolo Bonzini, Eduardo Habkost, rth, xen-devel, mst,
	Stefan Hajnoczi

On 12/11/2018 9:11 AM, Stefano Garzarella wrote:
> Hi Liam,
> in order to support PVH also with SeaBIOS, I'm going to work on a new
> option rom (like linuxboot/multiboot) that can be used in this case.

That is awesome. Yes, please keep us posted when you have something working.

Just FYI, before switching over to using Qemu+qboot, we had been using a 
Qemu only solution (but not using an option rom) internally that worked 
very well using no FW at all. We had Qemu simply parse the ELF file and 
jump to the PVH entry point if one is found. The only gotcha was that we 
had to include a pair of patches that were originally written by folks 
at Intel as part of the clear containers work. Specifically, in order to 
be able to skip firmware entirely, we had to do 2 additional things: (1) 
ACPI tables generated by Qemu are usually patched up by FW. Since we 
were running no FW, we needed to do that patching up of the ACPI tables 
in Qemu when it was detected that we were going to enter the OS via the 
PVH entry point. (2) We also needed to add a patch to Qemu to enable a 
few PM registers -- something typically done by FW.

But if SeaBIOS is involved in the solution you are working on, I guess 
you won't really need those extra patches. Just figured I'd mention it 
so you have the full picture.

Thanks,
-Maran

> I'll keep you updated on it!
>
> Cheers,
> Stefano
> On Wed, Dec 5, 2018 at 11:38 PM Liam Merwick <liam.merwick@oracle.com> wrote:
>> These changes (along with corresponding qboot and Linux kernel changes)
>> enable a guest to be booted using the x86/HVM direct boot ABI.
>>
>> This commit adds a load_elfboot() routine to pass the size and
>> location of the kernel entry point to qboot (which will fill in
>> the start_info struct information needed to to boot the guest).
>> Having loaded the ELF binary, load_linux() will run qboot
>> which continues the boot.
>>
>> The address for the kernel entry point has already been read
>> from an ELF Note in the uncompressed kernel binary earlier
>> in pc_memory_init().
>>
>> Signed-off-by: George Kennedy <George.Kennedy@oracle.com>
>> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
>> ---
>>   hw/i386/pc.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 72 insertions(+)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 056aa46d99b9..d3012cbd8597 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -54,6 +54,7 @@
>>   #include "sysemu/qtest.h"
>>   #include "kvm_i386.h"
>>   #include "hw/xen/xen.h"
>> +#include "hw/xen/start_info.h"
>>   #include "ui/qemu-spice.h"
>>   #include "exec/memory.h"
>>   #include "exec/address-spaces.h"
>> @@ -1098,6 +1099,50 @@ done:
>>       return pvh_start_addr != 0;
>>   }
>>
>> +static bool load_elfboot(const char *kernel_filename,
>> +                   int kernel_file_size,
>> +                   uint8_t *header,
>> +                   size_t pvh_xen_start_addr,
>> +                   FWCfgState *fw_cfg)
>> +{
>> +    uint32_t flags = 0;
>> +    uint32_t mh_load_addr = 0;
>> +    uint32_t elf_kernel_size = 0;
>> +    uint64_t elf_entry;
>> +    uint64_t elf_low, elf_high;
>> +    int kernel_size;
>> +
>> +    if (ldl_p(header) != 0x464c457f) {
>> +        return false; /* no elfboot */
>> +    }
>> +
>> +    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
>> +    flags = elf_is64 ?
>> +        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
>> +
>> +    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
>> +        error_report("elfboot unsupported flags = %x", flags);
>> +        exit(1);
>> +    }
>> +
>> +    kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
>> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
>> +                           0, 0);
>> +
>> +    if (kernel_size < 0) {
>> +        error_report("Error while loading elf kernel");
>> +        exit(1);
>> +    }
>> +    mh_load_addr = elf_low;
>> +    elf_kernel_size = elf_high - elf_low;
>> +
>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
>> +
>> +    return true;
>> +}
>> +
>>   static void load_linux(PCMachineState *pcms,
>>                          FWCfgState *fw_cfg)
>>   {
>> @@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms,
>>       if (ldl_p(header+0x202) == 0x53726448) {
>>           protocol = lduw_p(header+0x206);
>>       } else {
>> +        /* If the kernel address for using the x86/HVM direct boot ABI has
>> +         * been saved then proceed with booting the uncompressed kernel */
>> +        if (pvh_start_addr) {
>> +            if (load_elfboot(kernel_filename, kernel_size,
>> +                             header, pvh_start_addr, fw_cfg)) {
>> +                struct hvm_modlist_entry ramdisk_mod = { 0 };
>> +
>> +                fclose(f);
>> +
>> +                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
>> +                    strlen(kernel_cmdline) + 1);
>> +                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
>> +
>> +                assert(machine->device_memory != NULL);
>> +                ramdisk_mod.paddr = machine->device_memory->base;
>> +                ramdisk_mod.size =
>> +                    memory_region_size(&machine->device_memory->mr);
>> +
>> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod,
>> +                                 sizeof(ramdisk_mod));
>> +                fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
>> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
>> +                                 header, sizeof(header));
>> +
>> +                return;
>> +            }
>> +        }
>>           /* This looks like a multiboot kernel. If it is, let's stop
>>              treating it like a Linux kernel. */
>>           if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
>> --
>> 1.8.3.1
>>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI
@ 2018-12-11 18:35       ` Maran Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Maran Wilson @ 2018-12-11 18:35 UTC (permalink / raw)
  To: Stefano Garzarella, liam.merwick
  Cc: Eduardo Habkost, mst, qemu-devel, Stefan Hajnoczi, xen-devel,
	Paolo Bonzini, rth

On 12/11/2018 9:11 AM, Stefano Garzarella wrote:
> Hi Liam,
> in order to support PVH also with SeaBIOS, I'm going to work on a new
> option rom (like linuxboot/multiboot) that can be used in this case.

That is awesome. Yes, please keep us posted when you have something working.

Just FYI, before switching over to using Qemu+qboot, we had been using a 
Qemu only solution (but not using an option rom) internally that worked 
very well using no FW at all. We had Qemu simply parse the ELF file and 
jump to the PVH entry point if one is found. The only gotcha was that we 
had to include a pair of patches that were originally written by folks 
at Intel as part of the clear containers work. Specifically, in order to 
be able to skip firmware entirely, we had to do 2 additional things: (1) 
ACPI tables generated by Qemu are usually patched up by FW. Since we 
were running no FW, we needed to do that patching up of the ACPI tables 
in Qemu when it was detected that we were going to enter the OS via the 
PVH entry point. (2) We also needed to add a patch to Qemu to enable a 
few PM registers -- something typically done by FW.

But if SeaBIOS is involved in the solution you are working on, I guess 
you won't really need those extra patches. Just figured I'd mention it 
so you have the full picture.

Thanks,
-Maran

> I'll keep you updated on it!
>
> Cheers,
> Stefano
> On Wed, Dec 5, 2018 at 11:38 PM Liam Merwick <liam.merwick@oracle.com> wrote:
>> These changes (along with corresponding qboot and Linux kernel changes)
>> enable a guest to be booted using the x86/HVM direct boot ABI.
>>
>> This commit adds a load_elfboot() routine to pass the size and
>> location of the kernel entry point to qboot (which will fill in
>> the start_info struct information needed to to boot the guest).
>> Having loaded the ELF binary, load_linux() will run qboot
>> which continues the boot.
>>
>> The address for the kernel entry point has already been read
>> from an ELF Note in the uncompressed kernel binary earlier
>> in pc_memory_init().
>>
>> Signed-off-by: George Kennedy <George.Kennedy@oracle.com>
>> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
>> ---
>>   hw/i386/pc.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 72 insertions(+)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 056aa46d99b9..d3012cbd8597 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -54,6 +54,7 @@
>>   #include "sysemu/qtest.h"
>>   #include "kvm_i386.h"
>>   #include "hw/xen/xen.h"
>> +#include "hw/xen/start_info.h"
>>   #include "ui/qemu-spice.h"
>>   #include "exec/memory.h"
>>   #include "exec/address-spaces.h"
>> @@ -1098,6 +1099,50 @@ done:
>>       return pvh_start_addr != 0;
>>   }
>>
>> +static bool load_elfboot(const char *kernel_filename,
>> +                   int kernel_file_size,
>> +                   uint8_t *header,
>> +                   size_t pvh_xen_start_addr,
>> +                   FWCfgState *fw_cfg)
>> +{
>> +    uint32_t flags = 0;
>> +    uint32_t mh_load_addr = 0;
>> +    uint32_t elf_kernel_size = 0;
>> +    uint64_t elf_entry;
>> +    uint64_t elf_low, elf_high;
>> +    int kernel_size;
>> +
>> +    if (ldl_p(header) != 0x464c457f) {
>> +        return false; /* no elfboot */
>> +    }
>> +
>> +    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
>> +    flags = elf_is64 ?
>> +        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
>> +
>> +    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
>> +        error_report("elfboot unsupported flags = %x", flags);
>> +        exit(1);
>> +    }
>> +
>> +    kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
>> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
>> +                           0, 0);
>> +
>> +    if (kernel_size < 0) {
>> +        error_report("Error while loading elf kernel");
>> +        exit(1);
>> +    }
>> +    mh_load_addr = elf_low;
>> +    elf_kernel_size = elf_high - elf_low;
>> +
>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
>> +
>> +    return true;
>> +}
>> +
>>   static void load_linux(PCMachineState *pcms,
>>                          FWCfgState *fw_cfg)
>>   {
>> @@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms,
>>       if (ldl_p(header+0x202) == 0x53726448) {
>>           protocol = lduw_p(header+0x206);
>>       } else {
>> +        /* If the kernel address for using the x86/HVM direct boot ABI has
>> +         * been saved then proceed with booting the uncompressed kernel */
>> +        if (pvh_start_addr) {
>> +            if (load_elfboot(kernel_filename, kernel_size,
>> +                             header, pvh_start_addr, fw_cfg)) {
>> +                struct hvm_modlist_entry ramdisk_mod = { 0 };
>> +
>> +                fclose(f);
>> +
>> +                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
>> +                    strlen(kernel_cmdline) + 1);
>> +                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
>> +
>> +                assert(machine->device_memory != NULL);
>> +                ramdisk_mod.paddr = machine->device_memory->base;
>> +                ramdisk_mod.size =
>> +                    memory_region_size(&machine->device_memory->mr);
>> +
>> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod,
>> +                                 sizeof(ramdisk_mod));
>> +                fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
>> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
>> +                                 header, sizeof(header));
>> +
>> +                return;
>> +            }
>> +        }
>>           /* This looks like a multiboot kernel. If it is, let's stop
>>              treating it like a Linux kernel. */
>>           if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
>> --
>> 1.8.3.1
>>
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI
  2018-12-11 18:35       ` Maran Wilson
  (?)
@ 2018-12-12 15:28       ` Stefano Garzarella
  2018-12-12 17:36           ` Maran Wilson
  -1 siblings, 1 reply; 32+ messages in thread
From: Stefano Garzarella @ 2018-12-12 15:28 UTC (permalink / raw)
  To: maran.wilson
  Cc: liam.merwick, qemu-devel, Paolo Bonzini, Eduardo Habkost, rth,
	xen-devel, Michael Tsirkin, Stefan Hajnoczi

On Tue, Dec 11, 2018 at 7:35 PM Maran Wilson <maran.wilson@oracle.com> wrote:
>
> On 12/11/2018 9:11 AM, Stefano Garzarella wrote:
> > Hi Liam,
> > in order to support PVH also with SeaBIOS, I'm going to work on a new
> > option rom (like linuxboot/multiboot) that can be used in this case.
>
> That is awesome. Yes, please keep us posted when you have something working.

Yes, I'll keep you updated!

>
> Just FYI, before switching over to using Qemu+qboot, we had been using a
> Qemu only solution (but not using an option rom) internally that worked
> very well using no FW at all. We had Qemu simply parse the ELF file and
> jump to the PVH entry point if one is found. The only gotcha was that we
> had to include a pair of patches that were originally written by folks
> at Intel as part of the clear containers work. Specifically, in order to
> be able to skip firmware entirely, we had to do 2 additional things: (1)
> ACPI tables generated by Qemu are usually patched up by FW. Since we
> were running no FW, we needed to do that patching up of the ACPI tables
> in Qemu when it was detected that we were going to enter the OS via the
> PVH entry point. (2) We also needed to add a patch to Qemu to enable a
> few PM registers -- something typically done by FW.

I had a look of qemu-lite, are you referring to this?

>
> But if SeaBIOS is involved in the solution you are working on, I guess
> you won't really need those extra patches. Just figured I'd mention it
> so you have the full picture.

Thank you very much to share with me these details!

Cheers,
Stefano

>
> Thanks,
> -Maran
>
> > I'll keep you updated on it!
> >
> > Cheers,
> > Stefano
> > On Wed, Dec 5, 2018 at 11:38 PM Liam Merwick <liam.merwick@oracle.com> wrote:
> >> These changes (along with corresponding qboot and Linux kernel changes)
> >> enable a guest to be booted using the x86/HVM direct boot ABI.
> >>
> >> This commit adds a load_elfboot() routine to pass the size and
> >> location of the kernel entry point to qboot (which will fill in
> >> the start_info struct information needed to to boot the guest).
> >> Having loaded the ELF binary, load_linux() will run qboot
> >> which continues the boot.
> >>
> >> The address for the kernel entry point has already been read
> >> from an ELF Note in the uncompressed kernel binary earlier
> >> in pc_memory_init().
> >>
> >> Signed-off-by: George Kennedy <George.Kennedy@oracle.com>
> >> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
> >> ---
> >>   hw/i386/pc.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>   1 file changed, 72 insertions(+)
> >>
> >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >> index 056aa46d99b9..d3012cbd8597 100644
> >> --- a/hw/i386/pc.c
> >> +++ b/hw/i386/pc.c
> >> @@ -54,6 +54,7 @@
> >>   #include "sysemu/qtest.h"
> >>   #include "kvm_i386.h"
> >>   #include "hw/xen/xen.h"
> >> +#include "hw/xen/start_info.h"
> >>   #include "ui/qemu-spice.h"
> >>   #include "exec/memory.h"
> >>   #include "exec/address-spaces.h"
> >> @@ -1098,6 +1099,50 @@ done:
> >>       return pvh_start_addr != 0;
> >>   }
> >>
> >> +static bool load_elfboot(const char *kernel_filename,
> >> +                   int kernel_file_size,
> >> +                   uint8_t *header,
> >> +                   size_t pvh_xen_start_addr,
> >> +                   FWCfgState *fw_cfg)
> >> +{
> >> +    uint32_t flags = 0;
> >> +    uint32_t mh_load_addr = 0;
> >> +    uint32_t elf_kernel_size = 0;
> >> +    uint64_t elf_entry;
> >> +    uint64_t elf_low, elf_high;
> >> +    int kernel_size;
> >> +
> >> +    if (ldl_p(header) != 0x464c457f) {
> >> +        return false; /* no elfboot */
> >> +    }
> >> +
> >> +    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
> >> +    flags = elf_is64 ?
> >> +        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
> >> +
> >> +    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
> >> +        error_report("elfboot unsupported flags = %x", flags);
> >> +        exit(1);
> >> +    }
> >> +
> >> +    kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
> >> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
> >> +                           0, 0);
> >> +
> >> +    if (kernel_size < 0) {
> >> +        error_report("Error while loading elf kernel");
> >> +        exit(1);
> >> +    }
> >> +    mh_load_addr = elf_low;
> >> +    elf_kernel_size = elf_high - elf_low;
> >> +
> >> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
> >> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> >> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> >> +
> >> +    return true;
> >> +}
> >> +
> >>   static void load_linux(PCMachineState *pcms,
> >>                          FWCfgState *fw_cfg)
> >>   {
> >> @@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms,
> >>       if (ldl_p(header+0x202) == 0x53726448) {
> >>           protocol = lduw_p(header+0x206);
> >>       } else {
> >> +        /* If the kernel address for using the x86/HVM direct boot ABI has
> >> +         * been saved then proceed with booting the uncompressed kernel */
> >> +        if (pvh_start_addr) {
> >> +            if (load_elfboot(kernel_filename, kernel_size,
> >> +                             header, pvh_start_addr, fw_cfg)) {
> >> +                struct hvm_modlist_entry ramdisk_mod = { 0 };
> >> +
> >> +                fclose(f);
> >> +
> >> +                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
> >> +                    strlen(kernel_cmdline) + 1);
> >> +                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> >> +
> >> +                assert(machine->device_memory != NULL);
> >> +                ramdisk_mod.paddr = machine->device_memory->base;
> >> +                ramdisk_mod.size =
> >> +                    memory_region_size(&machine->device_memory->mr);
> >> +
> >> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod,
> >> +                                 sizeof(ramdisk_mod));
> >> +                fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
> >> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
> >> +                                 header, sizeof(header));
> >> +
> >> +                return;
> >> +            }
> >> +        }
> >>           /* This looks like a multiboot kernel. If it is, let's stop
> >>              treating it like a Linux kernel. */
> >>           if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
> >> --
> >> 1.8.3.1
> >>
> >
>


-- 
Stefano Garzarella
Red Hat

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI
  2018-12-11 18:35       ` Maran Wilson
  (?)
  (?)
@ 2018-12-12 15:28       ` Stefano Garzarella
  -1 siblings, 0 replies; 32+ messages in thread
From: Stefano Garzarella @ 2018-12-12 15:28 UTC (permalink / raw)
  To: maran.wilson
  Cc: Eduardo Habkost, Michael Tsirkin, qemu-devel, Paolo Bonzini,
	Stefan Hajnoczi, xen-devel, liam.merwick, rth

On Tue, Dec 11, 2018 at 7:35 PM Maran Wilson <maran.wilson@oracle.com> wrote:
>
> On 12/11/2018 9:11 AM, Stefano Garzarella wrote:
> > Hi Liam,
> > in order to support PVH also with SeaBIOS, I'm going to work on a new
> > option rom (like linuxboot/multiboot) that can be used in this case.
>
> That is awesome. Yes, please keep us posted when you have something working.

Yes, I'll keep you updated!

>
> Just FYI, before switching over to using Qemu+qboot, we had been using a
> Qemu only solution (but not using an option rom) internally that worked
> very well using no FW at all. We had Qemu simply parse the ELF file and
> jump to the PVH entry point if one is found. The only gotcha was that we
> had to include a pair of patches that were originally written by folks
> at Intel as part of the clear containers work. Specifically, in order to
> be able to skip firmware entirely, we had to do 2 additional things: (1)
> ACPI tables generated by Qemu are usually patched up by FW. Since we
> were running no FW, we needed to do that patching up of the ACPI tables
> in Qemu when it was detected that we were going to enter the OS via the
> PVH entry point. (2) We also needed to add a patch to Qemu to enable a
> few PM registers -- something typically done by FW.

I had a look of qemu-lite, are you referring to this?

>
> But if SeaBIOS is involved in the solution you are working on, I guess
> you won't really need those extra patches. Just figured I'd mention it
> so you have the full picture.

Thank you very much to share with me these details!

Cheers,
Stefano

>
> Thanks,
> -Maran
>
> > I'll keep you updated on it!
> >
> > Cheers,
> > Stefano
> > On Wed, Dec 5, 2018 at 11:38 PM Liam Merwick <liam.merwick@oracle.com> wrote:
> >> These changes (along with corresponding qboot and Linux kernel changes)
> >> enable a guest to be booted using the x86/HVM direct boot ABI.
> >>
> >> This commit adds a load_elfboot() routine to pass the size and
> >> location of the kernel entry point to qboot (which will fill in
> >> the start_info struct information needed to to boot the guest).
> >> Having loaded the ELF binary, load_linux() will run qboot
> >> which continues the boot.
> >>
> >> The address for the kernel entry point has already been read
> >> from an ELF Note in the uncompressed kernel binary earlier
> >> in pc_memory_init().
> >>
> >> Signed-off-by: George Kennedy <George.Kennedy@oracle.com>
> >> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
> >> ---
> >>   hw/i386/pc.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>   1 file changed, 72 insertions(+)
> >>
> >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >> index 056aa46d99b9..d3012cbd8597 100644
> >> --- a/hw/i386/pc.c
> >> +++ b/hw/i386/pc.c
> >> @@ -54,6 +54,7 @@
> >>   #include "sysemu/qtest.h"
> >>   #include "kvm_i386.h"
> >>   #include "hw/xen/xen.h"
> >> +#include "hw/xen/start_info.h"
> >>   #include "ui/qemu-spice.h"
> >>   #include "exec/memory.h"
> >>   #include "exec/address-spaces.h"
> >> @@ -1098,6 +1099,50 @@ done:
> >>       return pvh_start_addr != 0;
> >>   }
> >>
> >> +static bool load_elfboot(const char *kernel_filename,
> >> +                   int kernel_file_size,
> >> +                   uint8_t *header,
> >> +                   size_t pvh_xen_start_addr,
> >> +                   FWCfgState *fw_cfg)
> >> +{
> >> +    uint32_t flags = 0;
> >> +    uint32_t mh_load_addr = 0;
> >> +    uint32_t elf_kernel_size = 0;
> >> +    uint64_t elf_entry;
> >> +    uint64_t elf_low, elf_high;
> >> +    int kernel_size;
> >> +
> >> +    if (ldl_p(header) != 0x464c457f) {
> >> +        return false; /* no elfboot */
> >> +    }
> >> +
> >> +    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
> >> +    flags = elf_is64 ?
> >> +        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
> >> +
> >> +    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
> >> +        error_report("elfboot unsupported flags = %x", flags);
> >> +        exit(1);
> >> +    }
> >> +
> >> +    kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
> >> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
> >> +                           0, 0);
> >> +
> >> +    if (kernel_size < 0) {
> >> +        error_report("Error while loading elf kernel");
> >> +        exit(1);
> >> +    }
> >> +    mh_load_addr = elf_low;
> >> +    elf_kernel_size = elf_high - elf_low;
> >> +
> >> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
> >> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> >> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> >> +
> >> +    return true;
> >> +}
> >> +
> >>   static void load_linux(PCMachineState *pcms,
> >>                          FWCfgState *fw_cfg)
> >>   {
> >> @@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms,
> >>       if (ldl_p(header+0x202) == 0x53726448) {
> >>           protocol = lduw_p(header+0x206);
> >>       } else {
> >> +        /* If the kernel address for using the x86/HVM direct boot ABI has
> >> +         * been saved then proceed with booting the uncompressed kernel */
> >> +        if (pvh_start_addr) {
> >> +            if (load_elfboot(kernel_filename, kernel_size,
> >> +                             header, pvh_start_addr, fw_cfg)) {
> >> +                struct hvm_modlist_entry ramdisk_mod = { 0 };
> >> +
> >> +                fclose(f);
> >> +
> >> +                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
> >> +                    strlen(kernel_cmdline) + 1);
> >> +                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> >> +
> >> +                assert(machine->device_memory != NULL);
> >> +                ramdisk_mod.paddr = machine->device_memory->base;
> >> +                ramdisk_mod.size =
> >> +                    memory_region_size(&machine->device_memory->mr);
> >> +
> >> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod,
> >> +                                 sizeof(ramdisk_mod));
> >> +                fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
> >> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
> >> +                                 header, sizeof(header));
> >> +
> >> +                return;
> >> +            }
> >> +        }
> >>           /* This looks like a multiboot kernel. If it is, let's stop
> >>              treating it like a Linux kernel. */
> >>           if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
> >> --
> >> 1.8.3.1
> >>
> >
>


-- 
Stefano Garzarella
Red Hat

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI
  2018-12-12 15:28       ` [Qemu-devel] " Stefano Garzarella
@ 2018-12-12 17:36           ` Maran Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Maran Wilson @ 2018-12-12 17:36 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: liam.merwick, qemu-devel, Paolo Bonzini, Eduardo Habkost, rth,
	xen-devel, Michael Tsirkin, Stefan Hajnoczi

On 12/12/2018 7:28 AM, Stefano Garzarella wrote:
> On Tue, Dec 11, 2018 at 7:35 PM Maran Wilson <maran.wilson@oracle.com> wrote:
>> On 12/11/2018 9:11 AM, Stefano Garzarella wrote:
>>> Hi Liam,
>>> in order to support PVH also with SeaBIOS, I'm going to work on a new
>>> option rom (like linuxboot/multiboot) that can be used in this case.
>> That is awesome. Yes, please keep us posted when you have something working.
> Yes, I'll keep you updated!
>
>> Just FYI, before switching over to using Qemu+qboot, we had been using a
>> Qemu only solution (but not using an option rom) internally that worked
>> very well using no FW at all. We had Qemu simply parse the ELF file and
>> jump to the PVH entry point if one is found. The only gotcha was that we
>> had to include a pair of patches that were originally written by folks
>> at Intel as part of the clear containers work. Specifically, in order to
>> be able to skip firmware entirely, we had to do 2 additional things: (1)
>> ACPI tables generated by Qemu are usually patched up by FW. Since we
>> were running no FW, we needed to do that patching up of the ACPI tables
>> in Qemu when it was detected that we were going to enter the OS via the
>> PVH entry point. (2) We also needed to add a patch to Qemu to enable a
>> few PM registers -- something typically done by FW.
> I had a look of qemu-lite, are you referring to this?

Yes. More specifically, we were using a modified version of this patch:
    acpi: patch guest ACPI when loading firmware is skipped
But unlike qemu-lite, we were not using a -nofw flag, instead, just 
choosing PVH vs legacy boot based on which -kernel binary was provided 
and whether it contained the PVH ELF note.

So apply the above patch, you also need to pick up:
    acpi: expose acpi_checksum()

For a while, we had also been using patch:
    ich9: enable pm registers when there is no firmware
But that last patch can be avoided by simply selecting Hardware-Reduced 
ACPI mode when building the FADT in Qemu, when PVH boot is selected.

But you probably wont need those patches at all if you are actually 
running some version of minimized SeaBIOS.

Thanks,
-Maran


>> But if SeaBIOS is involved in the solution you are working on, I guess
>> you won't really need those extra patches. Just figured I'd mention it
>> so you have the full picture.
> Thank you very much to share with me these details!
>
> Cheers,
> Stefano
>
>> Thanks,
>> -Maran
>>
>>> I'll keep you updated on it!
>>>
>>> Cheers,
>>> Stefano
>>> On Wed, Dec 5, 2018 at 11:38 PM Liam Merwick <liam.merwick@oracle.com> wrote:
>>>> These changes (along with corresponding qboot and Linux kernel changes)
>>>> enable a guest to be booted using the x86/HVM direct boot ABI.
>>>>
>>>> This commit adds a load_elfboot() routine to pass the size and
>>>> location of the kernel entry point to qboot (which will fill in
>>>> the start_info struct information needed to to boot the guest).
>>>> Having loaded the ELF binary, load_linux() will run qboot
>>>> which continues the boot.
>>>>
>>>> The address for the kernel entry point has already been read
>>>> from an ELF Note in the uncompressed kernel binary earlier
>>>> in pc_memory_init().
>>>>
>>>> Signed-off-by: George Kennedy <George.Kennedy@oracle.com>
>>>> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
>>>> ---
>>>>    hw/i386/pc.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 72 insertions(+)
>>>>
>>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>>> index 056aa46d99b9..d3012cbd8597 100644
>>>> --- a/hw/i386/pc.c
>>>> +++ b/hw/i386/pc.c
>>>> @@ -54,6 +54,7 @@
>>>>    #include "sysemu/qtest.h"
>>>>    #include "kvm_i386.h"
>>>>    #include "hw/xen/xen.h"
>>>> +#include "hw/xen/start_info.h"
>>>>    #include "ui/qemu-spice.h"
>>>>    #include "exec/memory.h"
>>>>    #include "exec/address-spaces.h"
>>>> @@ -1098,6 +1099,50 @@ done:
>>>>        return pvh_start_addr != 0;
>>>>    }
>>>>
>>>> +static bool load_elfboot(const char *kernel_filename,
>>>> +                   int kernel_file_size,
>>>> +                   uint8_t *header,
>>>> +                   size_t pvh_xen_start_addr,
>>>> +                   FWCfgState *fw_cfg)
>>>> +{
>>>> +    uint32_t flags = 0;
>>>> +    uint32_t mh_load_addr = 0;
>>>> +    uint32_t elf_kernel_size = 0;
>>>> +    uint64_t elf_entry;
>>>> +    uint64_t elf_low, elf_high;
>>>> +    int kernel_size;
>>>> +
>>>> +    if (ldl_p(header) != 0x464c457f) {
>>>> +        return false; /* no elfboot */
>>>> +    }
>>>> +
>>>> +    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
>>>> +    flags = elf_is64 ?
>>>> +        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
>>>> +
>>>> +    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
>>>> +        error_report("elfboot unsupported flags = %x", flags);
>>>> +        exit(1);
>>>> +    }
>>>> +
>>>> +    kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
>>>> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
>>>> +                           0, 0);
>>>> +
>>>> +    if (kernel_size < 0) {
>>>> +        error_report("Error while loading elf kernel");
>>>> +        exit(1);
>>>> +    }
>>>> +    mh_load_addr = elf_low;
>>>> +    elf_kernel_size = elf_high - elf_low;
>>>> +
>>>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
>>>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
>>>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
>>>> +
>>>> +    return true;
>>>> +}
>>>> +
>>>>    static void load_linux(PCMachineState *pcms,
>>>>                           FWCfgState *fw_cfg)
>>>>    {
>>>> @@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms,
>>>>        if (ldl_p(header+0x202) == 0x53726448) {
>>>>            protocol = lduw_p(header+0x206);
>>>>        } else {
>>>> +        /* If the kernel address for using the x86/HVM direct boot ABI has
>>>> +         * been saved then proceed with booting the uncompressed kernel */
>>>> +        if (pvh_start_addr) {
>>>> +            if (load_elfboot(kernel_filename, kernel_size,
>>>> +                             header, pvh_start_addr, fw_cfg)) {
>>>> +                struct hvm_modlist_entry ramdisk_mod = { 0 };
>>>> +
>>>> +                fclose(f);
>>>> +
>>>> +                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
>>>> +                    strlen(kernel_cmdline) + 1);
>>>> +                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
>>>> +
>>>> +                assert(machine->device_memory != NULL);
>>>> +                ramdisk_mod.paddr = machine->device_memory->base;
>>>> +                ramdisk_mod.size =
>>>> +                    memory_region_size(&machine->device_memory->mr);
>>>> +
>>>> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod,
>>>> +                                 sizeof(ramdisk_mod));
>>>> +                fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
>>>> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
>>>> +                                 header, sizeof(header));
>>>> +
>>>> +                return;
>>>> +            }
>>>> +        }
>>>>            /* This looks like a multiboot kernel. If it is, let's stop
>>>>               treating it like a Linux kernel. */
>>>>            if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
>>>> --
>>>> 1.8.3.1
>>>>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI
@ 2018-12-12 17:36           ` Maran Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Maran Wilson @ 2018-12-12 17:36 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Eduardo Habkost, Michael Tsirkin, qemu-devel, Paolo Bonzini,
	Stefan Hajnoczi, xen-devel, liam.merwick, rth

On 12/12/2018 7:28 AM, Stefano Garzarella wrote:
> On Tue, Dec 11, 2018 at 7:35 PM Maran Wilson <maran.wilson@oracle.com> wrote:
>> On 12/11/2018 9:11 AM, Stefano Garzarella wrote:
>>> Hi Liam,
>>> in order to support PVH also with SeaBIOS, I'm going to work on a new
>>> option rom (like linuxboot/multiboot) that can be used in this case.
>> That is awesome. Yes, please keep us posted when you have something working.
> Yes, I'll keep you updated!
>
>> Just FYI, before switching over to using Qemu+qboot, we had been using a
>> Qemu only solution (but not using an option rom) internally that worked
>> very well using no FW at all. We had Qemu simply parse the ELF file and
>> jump to the PVH entry point if one is found. The only gotcha was that we
>> had to include a pair of patches that were originally written by folks
>> at Intel as part of the clear containers work. Specifically, in order to
>> be able to skip firmware entirely, we had to do 2 additional things: (1)
>> ACPI tables generated by Qemu are usually patched up by FW. Since we
>> were running no FW, we needed to do that patching up of the ACPI tables
>> in Qemu when it was detected that we were going to enter the OS via the
>> PVH entry point. (2) We also needed to add a patch to Qemu to enable a
>> few PM registers -- something typically done by FW.
> I had a look of qemu-lite, are you referring to this?

Yes. More specifically, we were using a modified version of this patch:
    acpi: patch guest ACPI when loading firmware is skipped
But unlike qemu-lite, we were not using a -nofw flag, instead, just 
choosing PVH vs legacy boot based on which -kernel binary was provided 
and whether it contained the PVH ELF note.

So apply the above patch, you also need to pick up:
    acpi: expose acpi_checksum()

For a while, we had also been using patch:
    ich9: enable pm registers when there is no firmware
But that last patch can be avoided by simply selecting Hardware-Reduced 
ACPI mode when building the FADT in Qemu, when PVH boot is selected.

But you probably wont need those patches at all if you are actually 
running some version of minimized SeaBIOS.

Thanks,
-Maran


>> But if SeaBIOS is involved in the solution you are working on, I guess
>> you won't really need those extra patches. Just figured I'd mention it
>> so you have the full picture.
> Thank you very much to share with me these details!
>
> Cheers,
> Stefano
>
>> Thanks,
>> -Maran
>>
>>> I'll keep you updated on it!
>>>
>>> Cheers,
>>> Stefano
>>> On Wed, Dec 5, 2018 at 11:38 PM Liam Merwick <liam.merwick@oracle.com> wrote:
>>>> These changes (along with corresponding qboot and Linux kernel changes)
>>>> enable a guest to be booted using the x86/HVM direct boot ABI.
>>>>
>>>> This commit adds a load_elfboot() routine to pass the size and
>>>> location of the kernel entry point to qboot (which will fill in
>>>> the start_info struct information needed to to boot the guest).
>>>> Having loaded the ELF binary, load_linux() will run qboot
>>>> which continues the boot.
>>>>
>>>> The address for the kernel entry point has already been read
>>>> from an ELF Note in the uncompressed kernel binary earlier
>>>> in pc_memory_init().
>>>>
>>>> Signed-off-by: George Kennedy <George.Kennedy@oracle.com>
>>>> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
>>>> ---
>>>>    hw/i386/pc.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 72 insertions(+)
>>>>
>>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>>> index 056aa46d99b9..d3012cbd8597 100644
>>>> --- a/hw/i386/pc.c
>>>> +++ b/hw/i386/pc.c
>>>> @@ -54,6 +54,7 @@
>>>>    #include "sysemu/qtest.h"
>>>>    #include "kvm_i386.h"
>>>>    #include "hw/xen/xen.h"
>>>> +#include "hw/xen/start_info.h"
>>>>    #include "ui/qemu-spice.h"
>>>>    #include "exec/memory.h"
>>>>    #include "exec/address-spaces.h"
>>>> @@ -1098,6 +1099,50 @@ done:
>>>>        return pvh_start_addr != 0;
>>>>    }
>>>>
>>>> +static bool load_elfboot(const char *kernel_filename,
>>>> +                   int kernel_file_size,
>>>> +                   uint8_t *header,
>>>> +                   size_t pvh_xen_start_addr,
>>>> +                   FWCfgState *fw_cfg)
>>>> +{
>>>> +    uint32_t flags = 0;
>>>> +    uint32_t mh_load_addr = 0;
>>>> +    uint32_t elf_kernel_size = 0;
>>>> +    uint64_t elf_entry;
>>>> +    uint64_t elf_low, elf_high;
>>>> +    int kernel_size;
>>>> +
>>>> +    if (ldl_p(header) != 0x464c457f) {
>>>> +        return false; /* no elfboot */
>>>> +    }
>>>> +
>>>> +    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
>>>> +    flags = elf_is64 ?
>>>> +        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
>>>> +
>>>> +    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
>>>> +        error_report("elfboot unsupported flags = %x", flags);
>>>> +        exit(1);
>>>> +    }
>>>> +
>>>> +    kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
>>>> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
>>>> +                           0, 0);
>>>> +
>>>> +    if (kernel_size < 0) {
>>>> +        error_report("Error while loading elf kernel");
>>>> +        exit(1);
>>>> +    }
>>>> +    mh_load_addr = elf_low;
>>>> +    elf_kernel_size = elf_high - elf_low;
>>>> +
>>>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
>>>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
>>>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
>>>> +
>>>> +    return true;
>>>> +}
>>>> +
>>>>    static void load_linux(PCMachineState *pcms,
>>>>                           FWCfgState *fw_cfg)
>>>>    {
>>>> @@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms,
>>>>        if (ldl_p(header+0x202) == 0x53726448) {
>>>>            protocol = lduw_p(header+0x206);
>>>>        } else {
>>>> +        /* If the kernel address for using the x86/HVM direct boot ABI has
>>>> +         * been saved then proceed with booting the uncompressed kernel */
>>>> +        if (pvh_start_addr) {
>>>> +            if (load_elfboot(kernel_filename, kernel_size,
>>>> +                             header, pvh_start_addr, fw_cfg)) {
>>>> +                struct hvm_modlist_entry ramdisk_mod = { 0 };
>>>> +
>>>> +                fclose(f);
>>>> +
>>>> +                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
>>>> +                    strlen(kernel_cmdline) + 1);
>>>> +                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
>>>> +
>>>> +                assert(machine->device_memory != NULL);
>>>> +                ramdisk_mod.paddr = machine->device_memory->base;
>>>> +                ramdisk_mod.size =
>>>> +                    memory_region_size(&machine->device_memory->mr);
>>>> +
>>>> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod,
>>>> +                                 sizeof(ramdisk_mod));
>>>> +                fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
>>>> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
>>>> +                                 header, sizeof(header));
>>>> +
>>>> +                return;
>>>> +            }
>>>> +        }
>>>>            /* This looks like a multiboot kernel. If it is, let's stop
>>>>               treating it like a Linux kernel. */
>>>>            if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
>>>> --
>>>> 1.8.3.1
>>>>
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file
  2018-12-11 14:57       ` Liam Merwick
@ 2018-12-21 20:03         ` Liam Merwick
  -1 siblings, 0 replies; 32+ messages in thread
From: Liam Merwick @ 2018-12-21 20:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, ehabkost, rth, xen-devel, sgarzare, mst,
	Maran Wilson, Boris Ostrovsky, George Kennedy, liam.merwick

On 11/12/2018 14:57, Liam Merwick wrote:
> On 11/12/2018 14:01, Stefan Hajnoczi wrote:
>> On Wed, Dec 05, 2018 at 10:37:24PM +0000, Liam Merwick wrote:
>>> From: Liam Merwick <Liam.Merwick@oracle.com>
>>>
>>> The x86/HVM direct boot ABI permits Qemu to be able to boot directly
>>> into the uncompressed Linux kernel binary without the need to run 
>>> firmware.
>>>
>>>     https://xenbits.xen.org/docs/unstable/misc/pvh.html
>>>
>>> This commit adds the header file that defines the start_info struct
>>> that needs to be populated in order to use this ABI.
>>>
>>> Signed-off-by: Maran Wilson <Maran.Wilson@oracle.com>
>>> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
>>> Reviewed-by: Konrad Rzeszutek Wilk <Konrad.Wilk@oracle.com>
>>> ---
>>>   include/hw/xen/start_info.h | 146 
>>> ++++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 146 insertions(+)
>>>   create mode 100644 include/hw/xen/start_info.h
>>
>> Does it make sense to bring in Linux
>> include/xen/interface/hvm/start_info.h via QEMU's
>> include/standard-headers/?
>>
>> QEMU has a script in scripts/update-linux-header.sh for syncing Linux
>> headers into include/standard-headers/.  This makes it easy to keep
>> Linux header files up-to-date.  We basically treat files in
>> include/standard-headers/ as auto-generated.
>>
>> If you define start_info.h yourself without using
>> include/standard-headers/, then it won't be synced with Linux.
>>
> 
> That does seem better.  I will make that change.

When attempting to implement this, I found the canonical copy of this 
header file is actually in Xen and the Linux copy is kept in sync with 
that.  Also, 'make headers_install' doesn't install those Xen headers.

Instead I updated the commit comment to mention the canonical copy 
location.  This file isn't expected to change much so I think keeping it 
in sync in future shouldn't be onerous.

Regards,
Liam

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file
@ 2018-12-21 20:03         ` Liam Merwick
  0 siblings, 0 replies; 32+ messages in thread
From: Liam Merwick @ 2018-12-21 20:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: liam.merwick, ehabkost, mst, Maran Wilson, George Kennedy,
	xen-devel, Paolo Bonzini, Boris Ostrovsky, rth, sgarzare

On 11/12/2018 14:57, Liam Merwick wrote:
> On 11/12/2018 14:01, Stefan Hajnoczi wrote:
>> On Wed, Dec 05, 2018 at 10:37:24PM +0000, Liam Merwick wrote:
>>> From: Liam Merwick <Liam.Merwick@oracle.com>
>>>
>>> The x86/HVM direct boot ABI permits Qemu to be able to boot directly
>>> into the uncompressed Linux kernel binary without the need to run 
>>> firmware.
>>>
>>>     https://xenbits.xen.org/docs/unstable/misc/pvh.html
>>>
>>> This commit adds the header file that defines the start_info struct
>>> that needs to be populated in order to use this ABI.
>>>
>>> Signed-off-by: Maran Wilson <Maran.Wilson@oracle.com>
>>> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
>>> Reviewed-by: Konrad Rzeszutek Wilk <Konrad.Wilk@oracle.com>
>>> ---
>>>   include/hw/xen/start_info.h | 146 
>>> ++++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 146 insertions(+)
>>>   create mode 100644 include/hw/xen/start_info.h
>>
>> Does it make sense to bring in Linux
>> include/xen/interface/hvm/start_info.h via QEMU's
>> include/standard-headers/?
>>
>> QEMU has a script in scripts/update-linux-header.sh for syncing Linux
>> headers into include/standard-headers/.  This makes it easy to keep
>> Linux header files up-to-date.  We basically treat files in
>> include/standard-headers/ as auto-generated.
>>
>> If you define start_info.h yourself without using
>> include/standard-headers/, then it won't be synced with Linux.
>>
> 
> That does seem better.  I will make that change.

When attempting to implement this, I found the canonical copy of this 
header file is actually in Xen and the Linux copy is kept in sync with 
that.  Also, 'make headers_install' doesn't install those Xen headers.

Instead I updated the commit comment to mention the canonical copy 
location.  This file isn't expected to change much so I think keeping it 
in sync in future shouldn't be onerous.

Regards,
Liam

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 2/3] pc: Read PVH entry point from ELF note in kernel binary
  2018-12-11 14:17   ` [Qemu-devel] " Stefan Hajnoczi
@ 2018-12-21 20:03       ` Liam Merwick
  0 siblings, 0 replies; 32+ messages in thread
From: Liam Merwick @ 2018-12-21 20:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, pbonzini, ehabkost, rth, xen-devel, sgarzare, mst,
	maran.wilson, Boris Ostrovsky, George Kennedy, liam.merwick

Thanks Stefan for the review - comments inline.

On 11/12/2018 14:17, Stefan Hajnoczi wrote:
> On Wed, Dec 05, 2018 at 10:37:25PM +0000, Liam Merwick wrote:
>> From: Liam Merwick <Liam.Merwick@oracle.com>
>>
>> Add support to read the PVH Entry address from an ELF note in the
>> uncompressed kernel binary (as defined by the x86/HVM direct boot ABI).
>> This 32-bit entry point will be used by QEMU to load the kernel in the
>> guest and jump into the kernel entry point.
>>
>> For now, a call to this function is added in pc_memory_init() to read the
>> address - a future patch will use the entry point.
>>
>> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
>> ---
>>   hw/i386/pc.c  | 272 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   include/elf.h |  10 +++
>>   2 files changed, 281 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index f095725dbab2..056aa46d99b9 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -109,6 +109,9 @@ static struct e820_entry *e820_table;
>>   static unsigned e820_entries;
>>   struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
>>   
>> +/* Physical Address of PVH entry point read from kernel ELF NOTE */
>> +static size_t pvh_start_addr;
>> +
>>   void gsi_handler(void *opaque, int n, int level)
>>   {
>>       GSIState *s = opaque;
>> @@ -834,6 +837,267 @@ struct setup_data {
>>       uint8_t data[0];
>>   } __attribute__((packed));
>>   
>> +/*
>> + * Search through the ELF Notes for an entry with the given
>> + * ELF Note type
>> + */
>> +static void *get_elf_note_type(void *ehdr, void *phdr, bool elf_is64,
>> +    size_t elf_note_type)
> 
> Generic ELF code.  Can you put it in hw/core/loader.c?


I've added a modified/slimmed down version to include/hw/elf_ops.h 
(which now handles 32 and 64 bit as you mention below).  I've put this 
in a separate commit.


> 
>> +{
>> +    void *nhdr = NULL;
>> +    size_t nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
>> +    size_t elf_note_entry_sz = 0;
>> +    size_t phdr_off;
>> +    size_t phdr_align;
>> +    size_t phdr_memsz;
>> +    size_t nhdr_namesz;
>> +    size_t nhdr_descsz;
>> +    size_t note_type;
> 
> The macro tricks used by hw/core/loader.c are nasty, but I think they
> get the types right.  Here the Elf64 on 32-bit host case is definitely
> broken due to using size_t.  Perhaps 64-on-32 isn't supported, but
> getting the types right is worth discussing.
> 
>> +
>> +    phdr_off = elf_is64 ?
>> +        ((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
>> +    phdr_align = elf_is64 ?
>> +        ((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
>> +    phdr_memsz = elf_is64 ?
>> +        ((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
>> +
>> +    nhdr = ehdr + phdr_off;
> 
> The ELF file is untrusted.  All inputs must be validated.  phdr_off
> could be an bogus/malicious value.


Most of the parsing of the ELF binary goes away due to moving to parse 
during elf_load() - more info below.


> 
>> +    note_type = elf_is64 ?
>> +        ((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
>> +    nhdr_namesz = elf_is64 ?
>> +        ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
>> +    nhdr_descsz = elf_is64 ?
>> +        ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
>> +
>> +    while (note_type != elf_note_type) {
>> +        elf_note_entry_sz = nhdr_size +
>> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
>> +            QEMU_ALIGN_UP(nhdr_descsz, phdr_align);
>> +
>> +        /*
>> +         * Verify that we haven't exceeded the end of the ELF Note section.
>> +         * If we have, then there is no note of the given type present
>> +         * in the ELF Notes.
>> +         */
>> +        if (phdr_off + phdr_memsz < ((nhdr - ehdr) + elf_note_entry_sz)) {
>> +            error_report("Note type (0x%lx) not found in ELF Note section",
>> +                elf_note_type);
>> +            return NULL;
>> +        }
>> +
>> +        /* skip to the next ELF Note entry */
>> +        nhdr += elf_note_entry_sz;
>> +        note_type = elf_is64 ?
>> +            ((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
>> +        nhdr_namesz = elf_is64 ?
>> +            ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
>> +        nhdr_descsz = elf_is64 ?
>> +            ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
>> +    }
>> +
>> +    return nhdr;
>> +}
>> +
>> +/*
>> + * The entry point into the kernel for PVH boot is different from
>> + * the native entry point.  The PVH entry is defined by the x86/HVM
>> + * direct boot ABI and is available in an ELFNOTE in the kernel binary.
>> + * This function reads the ELF headers of the binary specified on the
>> + * command line by -kernel (path contained in 'filename') and discovers
>> + * the PVH entry address from the appropriate ELF Note.
>> + *
>> + * The address of the PVH entry point is saved to the 'pvh_start_addr'
>> + * global variable. The ELF class of the binary is returned via 'elfclass'
>> + * (although the entry point is 32-bit, the kernel binary can be either
>> + * 32-bit or 64-bit).
>> + */
>> +static bool read_pvh_start_addr_elf_note(const char *filename,
>> +    unsigned char *elfclass)
>> +{
> 
> Can this be integrated into ELF loading?  For example, could the elf
> loader take a function pointer to perform additional logic (e.g.
> extracting the PVH entry point)?  That avoids reparsing the input file.


I have rewritten this considerably based on that suggestion.  The 
reading of the PVH entry point is now done in a single pass during 
elf_load() - I added a commit that adds a new optional function pointer 
to parse the ELF note type (which is passed in via the existing 
translate_opaque arg - the function already had 11 args so I didn't want 
to add more than one new arg).  Another commit adds a function to 
elf_ops.h to find an ELF note matching a specific type and then the 4th 
patch to do the PVH boot is for the most part the same - just minor 
load_elfboot() changes and the addition of a read_pvh_start_addr() 
helper function for load_elf()

v2 will follow in a sec.

Regards,
Liam

> 
>> +    void *ehdr = NULL; /* Cast to Elf64_Ehdr or Elf32_Ehdr */
>> +    void *phdr = NULL; /* Cast to Elf64_Phdr or Elf32_Phdr */
>> +    void *nhdr = NULL; /* Cast to Elf64_Nhdr or Elf32_Nhdr */
>> +    struct stat statbuf;
>> +    size_t ehdr_size;
>> +    size_t phdr_size;
>> +    size_t nhdr_size;
>> +    size_t elf_note_data_addr;
>> +    /* Ehdr fields */
>> +    size_t ehdr_poff;
>> +    /* Phdr fields */
>> +    size_t phdr_off;
>> +    size_t phdr_align;
>> +    size_t phdr_memsz;
>> +    size_t phdr_type;
>> +    /* Nhdr fields */
>> +    size_t nhdr_namesz;
>> +    size_t nhdr_descsz;
>> +    bool elf_is64;
>> +    FILE *file;
>> +    union {
>> +        Elf32_Ehdr h32;
>> +        Elf64_Ehdr h64;
>> +    } elf_header;
>> +    Error *err = NULL;
>> +
>> +    pvh_start_addr = 0;
>> +
>> +    if (filename == NULL) {
>> +        return false;
>> +    }
>> +
>> +    file = fopen(filename, "rb");
>> +    if (file == NULL) {
>> +        error_report("fopen(%s) failed", filename);
>> +        return false;
>> +    }
>> +
>> +    if (fstat(fileno(file), &statbuf) < 0) {
>> +        error_report("fstat() failed on file (%s)", filename);
>> +        return false;
>> +    }
>> +
>> +    load_elf_hdr(filename, &elf_header, &elf_is64, &err);
>> +    if (err) {
>> +        error_free(err);
>> +        fclose(file);
>> +        return false;
>> +    }
>> +
>> +    *elfclass = elf_is64 ?
>> +        elf_header.h64.e_ident[EI_CLASS] : elf_header.h32.e_ident[EI_CLASS];
>> +    if (*elfclass == ELFCLASSNONE) {
>> +        error_report("kernel binary (%s) is ELFCLASSNONE", filename);
>> +        fclose(file);
>> +        return false;
>> +    }
>> +
>> +    ehdr_size = elf_is64 ? sizeof(Elf64_Ehdr) : sizeof(Elf32_Ehdr);
>> +    phdr_size = elf_is64 ? sizeof(Elf64_Phdr) : sizeof(Elf32_Phdr);
>> +    nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
>> +
>> +    /* We have already validated the ELF header when calling elf_load_hdr() */
>> +
>> +    ehdr = mmap(0, statbuf.st_size,
>> +        PROT_READ | PROT_WRITE, MAP_PRIVATE, fileno(file), 0);
>> +    if (ehdr == MAP_FAILED) {
>> +        error_report("Failed to mmap kernel binary (%s)", filename);
>> +        goto done;
>> +    }
>> +
>> +    /*
>> +     * Search through the program execution header for the
>> +     * ELF Note section.
>> +     */
>> +
>> +    ehdr_poff = elf_is64 ?
>> +        ((Elf64_Ehdr *)(ehdr))->e_phoff : ((Elf32_Ehdr *)(ehdr))->e_phoff;
>> +    if (statbuf.st_size < (ehdr_size + ehdr_poff)) {
>> +        error_report("ELF NOTE section exceeds file (%s) size",
>> +            filename);
>> +        goto done;
>> +    }
>> +
>> +    phdr = ehdr + ehdr_poff;
>> +    phdr_type = elf_is64 ?
>> +        ((Elf64_Phdr *)phdr)->p_type : ((Elf32_Phdr *)phdr)->p_type;
>> +    while (phdr != NULL && phdr_type != PT_NOTE) {
>> +        if (statbuf.st_size < ((phdr - ehdr) + phdr_size)) {
>> +            error_report("ELF Program headers in file (%s) too short",
>> +                filename);
>> +            goto done;
>> +        }
>> +        phdr += phdr_size;
>> +        phdr_type = elf_is64 ?
>> +            ((Elf64_Phdr *)phdr)->p_type : ((Elf32_Phdr *)phdr)->p_type;
>> +    }
>> +
>> +    phdr_off = elf_is64 ?
>> +        ((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
>> +    phdr_align = elf_is64 ?
>> +        ((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
>> +    phdr_memsz = elf_is64 ?
>> +        ((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
>> +
>> +    /*
>> +     * check that the start of the ELF Note section is within the bounds
>> +     * of the kernel ELF binary
>> +     */
>> +    if (statbuf.st_size < (ehdr_poff + phdr_size + phdr_off)) {
>> +        error_report("Start of ELF note section outside of file (%s) bounds",
>> +            filename);
>> +        goto done;
>> +    }
>> +    /*
>> +     * check that the end of the ELF Note section is within the bounds
>> +     * of the kernel ELF binary
>> +     */
>> +    if (statbuf.st_size < (phdr_off + phdr_memsz)) {
>> +        error_report("End of ELF note section outside of file (%s) bounds",
>> +            filename);
>> +        goto done;
>> +    }
>> +
>> +    /*
>> +     * Search through the ELF Notes for an entry with the
>> +     * Physical Address (PA) of the PVH entry point.
>> +     */
>> +    nhdr = get_elf_note_type(ehdr, phdr, elf_is64, XEN_ELFNOTE_PHYS32_ENTRY);
>> +    if (nhdr == NULL) {
>> +        error_report("No PVH Entry details in kernel (%s) ELF Note section",
>> +            filename);
>> +        goto done;
>> +    }
>> +
>> +    /*
>> +     * Verify that the returned ELF Note header doesn't exceed the
>> +     * end of the kernel file
>> +     */
>> +    if (statbuf.st_size < ((nhdr - ehdr))) {
>> +        error_report("ELF Nhdr offset (0x%lx) exceeds file (%s) bounds (%ld)",
>> +            (nhdr - ehdr), filename, statbuf.st_size);
>> +        goto done;
>> +    }
>> +
>> +    nhdr_namesz = elf_is64 ?
>> +        ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
>> +    nhdr_descsz = elf_is64 ?
>> +        ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
>> +
>> +    /*
>> +     * Verify that the ELF Note contents don't exceed the end of the
>> +     * kernel file
>> +     */
>> +    if (statbuf.st_size < ((nhdr - ehdr)) + nhdr_size +
>> +        QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
>> +        QEMU_ALIGN_UP(nhdr_descsz, phdr_align)) {
>> +        error_report("ELF Nhdr contents (0x%lx) exceeds file bounds (%ld)",
>> +            (nhdr - ehdr) + nhdr_size + QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
>> +            QEMU_ALIGN_UP(nhdr_descsz, phdr_align), statbuf.st_size);
>> +        goto done;
>> +    }
>> +
>> +    elf_note_data_addr =
>> +        (size_t)nhdr + nhdr_size + QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
>> +
>> +    pvh_start_addr = *(size_t *)elf_note_data_addr;
>> +
>> +    /*
>> +     * Verify that the PVH Entry point address does not exceed the
>> +     * bounds of the kernel file.
>> +     */
>> +    if (statbuf.st_size < pvh_start_addr) {
>> +        error_report("PVH ELF note addr (0x%lx) exceeds file (%s) bounds (%ld)",
>> +            (elf_note_data_addr - (size_t)ehdr), filename, statbuf.st_size);
>> +        pvh_start_addr = 0;
>> +        goto done;
>> +    }
>> +
>> +done:
>> +    (void) munmap(ehdr, statbuf.st_size);
>> +    return pvh_start_addr != 0;
>> +}
>> +
>>   static void load_linux(PCMachineState *pcms,
>>                          FWCfgState *fw_cfg)
>>   {
>> @@ -1334,9 +1598,11 @@ void pc_memory_init(PCMachineState *pcms,
>>       int linux_boot, i;
>>       MemoryRegion *ram, *option_rom_mr;
>>       MemoryRegion *ram_below_4g, *ram_above_4g;
>> -    FWCfgState *fw_cfg;
>> +    FWCfgState *fw_cfg = NULL;
>> +    unsigned char class = ELFCLASSNONE;
>>       MachineState *machine = MACHINE(pcms);
>>       PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>> +    const char *kernel_filename = machine->kernel_filename;
>>   
>>       assert(machine->ram_size == pcms->below_4g_mem_size +
>>                                   pcms->above_4g_mem_size);
>> @@ -1418,6 +1684,10 @@ void pc_memory_init(PCMachineState *pcms,
>>                                       &machine->device_memory->mr);
>>       }
>>   
>> +    if (linux_boot) {
>> +        read_pvh_start_addr_elf_note(kernel_filename, &class);
>> +    }
>> +
>>       /* Initialize PC system firmware */
>>       pc_system_firmware_init(rom_memory, !pcmc->pci_enabled);
>>   
>> diff --git a/include/elf.h b/include/elf.h
>> index c151164b63da..1f82c7a7124b 100644
>> --- a/include/elf.h
>> +++ b/include/elf.h
>> @@ -1585,6 +1585,16 @@ typedef struct elf64_shdr {
>>   #define NT_ARM_HW_WATCH 0x403           /* ARM hardware watchpoint registers */
>>   #define NT_ARM_SYSTEM_CALL      0x404   /* ARM system call number */
>>   
>> +/*
>> + * Physical entry point into the kernel.
>> + *
>> + * 32bit entry point into the kernel. When requested to launch the
>> + * guest kernel, use this entry point to launch the guest in 32-bit
>> + * protected mode with paging disabled.
>> + *
>> + * [ Corresponding definition in Linux kernel: include/xen/interface/elfnote.h ]
>> + */
>> +#define XEN_ELFNOTE_PHYS32_ENTRY    18  /* 0x12 */
>>   
>>   /* Note header in a PT_NOTE section */
>>   typedef struct elf32_note {
>> -- 
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC 2/3] pc: Read PVH entry point from ELF note in kernel binary
@ 2018-12-21 20:03       ` Liam Merwick
  0 siblings, 0 replies; 32+ messages in thread
From: Liam Merwick @ 2018-12-21 20:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: liam.merwick, ehabkost, mst, maran.wilson, qemu-devel,
	George Kennedy, xen-devel, pbonzini, Boris Ostrovsky, rth,
	sgarzare

Thanks Stefan for the review - comments inline.

On 11/12/2018 14:17, Stefan Hajnoczi wrote:
> On Wed, Dec 05, 2018 at 10:37:25PM +0000, Liam Merwick wrote:
>> From: Liam Merwick <Liam.Merwick@oracle.com>
>>
>> Add support to read the PVH Entry address from an ELF note in the
>> uncompressed kernel binary (as defined by the x86/HVM direct boot ABI).
>> This 32-bit entry point will be used by QEMU to load the kernel in the
>> guest and jump into the kernel entry point.
>>
>> For now, a call to this function is added in pc_memory_init() to read the
>> address - a future patch will use the entry point.
>>
>> Signed-off-by: Liam Merwick <Liam.Merwick@oracle.com>
>> ---
>>   hw/i386/pc.c  | 272 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   include/elf.h |  10 +++
>>   2 files changed, 281 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index f095725dbab2..056aa46d99b9 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -109,6 +109,9 @@ static struct e820_entry *e820_table;
>>   static unsigned e820_entries;
>>   struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
>>   
>> +/* Physical Address of PVH entry point read from kernel ELF NOTE */
>> +static size_t pvh_start_addr;
>> +
>>   void gsi_handler(void *opaque, int n, int level)
>>   {
>>       GSIState *s = opaque;
>> @@ -834,6 +837,267 @@ struct setup_data {
>>       uint8_t data[0];
>>   } __attribute__((packed));
>>   
>> +/*
>> + * Search through the ELF Notes for an entry with the given
>> + * ELF Note type
>> + */
>> +static void *get_elf_note_type(void *ehdr, void *phdr, bool elf_is64,
>> +    size_t elf_note_type)
> 
> Generic ELF code.  Can you put it in hw/core/loader.c?


I've added a modified/slimmed down version to include/hw/elf_ops.h 
(which now handles 32 and 64 bit as you mention below).  I've put this 
in a separate commit.


> 
>> +{
>> +    void *nhdr = NULL;
>> +    size_t nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
>> +    size_t elf_note_entry_sz = 0;
>> +    size_t phdr_off;
>> +    size_t phdr_align;
>> +    size_t phdr_memsz;
>> +    size_t nhdr_namesz;
>> +    size_t nhdr_descsz;
>> +    size_t note_type;
> 
> The macro tricks used by hw/core/loader.c are nasty, but I think they
> get the types right.  Here the Elf64 on 32-bit host case is definitely
> broken due to using size_t.  Perhaps 64-on-32 isn't supported, but
> getting the types right is worth discussing.
> 
>> +
>> +    phdr_off = elf_is64 ?
>> +        ((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
>> +    phdr_align = elf_is64 ?
>> +        ((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
>> +    phdr_memsz = elf_is64 ?
>> +        ((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
>> +
>> +    nhdr = ehdr + phdr_off;
> 
> The ELF file is untrusted.  All inputs must be validated.  phdr_off
> could be an bogus/malicious value.


Most of the parsing of the ELF binary goes away due to moving to parse 
during elf_load() - more info below.


> 
>> +    note_type = elf_is64 ?
>> +        ((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
>> +    nhdr_namesz = elf_is64 ?
>> +        ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
>> +    nhdr_descsz = elf_is64 ?
>> +        ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
>> +
>> +    while (note_type != elf_note_type) {
>> +        elf_note_entry_sz = nhdr_size +
>> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
>> +            QEMU_ALIGN_UP(nhdr_descsz, phdr_align);
>> +
>> +        /*
>> +         * Verify that we haven't exceeded the end of the ELF Note section.
>> +         * If we have, then there is no note of the given type present
>> +         * in the ELF Notes.
>> +         */
>> +        if (phdr_off + phdr_memsz < ((nhdr - ehdr) + elf_note_entry_sz)) {
>> +            error_report("Note type (0x%lx) not found in ELF Note section",
>> +                elf_note_type);
>> +            return NULL;
>> +        }
>> +
>> +        /* skip to the next ELF Note entry */
>> +        nhdr += elf_note_entry_sz;
>> +        note_type = elf_is64 ?
>> +            ((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
>> +        nhdr_namesz = elf_is64 ?
>> +            ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
>> +        nhdr_descsz = elf_is64 ?
>> +            ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
>> +    }
>> +
>> +    return nhdr;
>> +}
>> +
>> +/*
>> + * The entry point into the kernel for PVH boot is different from
>> + * the native entry point.  The PVH entry is defined by the x86/HVM
>> + * direct boot ABI and is available in an ELFNOTE in the kernel binary.
>> + * This function reads the ELF headers of the binary specified on the
>> + * command line by -kernel (path contained in 'filename') and discovers
>> + * the PVH entry address from the appropriate ELF Note.
>> + *
>> + * The address of the PVH entry point is saved to the 'pvh_start_addr'
>> + * global variable. The ELF class of the binary is returned via 'elfclass'
>> + * (although the entry point is 32-bit, the kernel binary can be either
>> + * 32-bit or 64-bit).
>> + */
>> +static bool read_pvh_start_addr_elf_note(const char *filename,
>> +    unsigned char *elfclass)
>> +{
> 
> Can this be integrated into ELF loading?  For example, could the elf
> loader take a function pointer to perform additional logic (e.g.
> extracting the PVH entry point)?  That avoids reparsing the input file.


I have rewritten this considerably based on that suggestion.  The 
reading of the PVH entry point is now done in a single pass during 
elf_load() - I added a commit that adds a new optional function pointer 
to parse the ELF note type (which is passed in via the existing 
translate_opaque arg - the function already had 11 args so I didn't want 
to add more than one new arg).  Another commit adds a function to 
elf_ops.h to find an ELF note matching a specific type and then the 4th 
patch to do the PVH boot is for the most part the same - just minor 
load_elfboot() changes and the addition of a read_pvh_start_addr() 
helper function for load_elf()

v2 will follow in a sec.

Regards,
Liam

> 
>> +    void *ehdr = NULL; /* Cast to Elf64_Ehdr or Elf32_Ehdr */
>> +    void *phdr = NULL; /* Cast to Elf64_Phdr or Elf32_Phdr */
>> +    void *nhdr = NULL; /* Cast to Elf64_Nhdr or Elf32_Nhdr */
>> +    struct stat statbuf;
>> +    size_t ehdr_size;
>> +    size_t phdr_size;
>> +    size_t nhdr_size;
>> +    size_t elf_note_data_addr;
>> +    /* Ehdr fields */
>> +    size_t ehdr_poff;
>> +    /* Phdr fields */
>> +    size_t phdr_off;
>> +    size_t phdr_align;
>> +    size_t phdr_memsz;
>> +    size_t phdr_type;
>> +    /* Nhdr fields */
>> +    size_t nhdr_namesz;
>> +    size_t nhdr_descsz;
>> +    bool elf_is64;
>> +    FILE *file;
>> +    union {
>> +        Elf32_Ehdr h32;
>> +        Elf64_Ehdr h64;
>> +    } elf_header;
>> +    Error *err = NULL;
>> +
>> +    pvh_start_addr = 0;
>> +
>> +    if (filename == NULL) {
>> +        return false;
>> +    }
>> +
>> +    file = fopen(filename, "rb");
>> +    if (file == NULL) {
>> +        error_report("fopen(%s) failed", filename);
>> +        return false;
>> +    }
>> +
>> +    if (fstat(fileno(file), &statbuf) < 0) {
>> +        error_report("fstat() failed on file (%s)", filename);
>> +        return false;
>> +    }
>> +
>> +    load_elf_hdr(filename, &elf_header, &elf_is64, &err);
>> +    if (err) {
>> +        error_free(err);
>> +        fclose(file);
>> +        return false;
>> +    }
>> +
>> +    *elfclass = elf_is64 ?
>> +        elf_header.h64.e_ident[EI_CLASS] : elf_header.h32.e_ident[EI_CLASS];
>> +    if (*elfclass == ELFCLASSNONE) {
>> +        error_report("kernel binary (%s) is ELFCLASSNONE", filename);
>> +        fclose(file);
>> +        return false;
>> +    }
>> +
>> +    ehdr_size = elf_is64 ? sizeof(Elf64_Ehdr) : sizeof(Elf32_Ehdr);
>> +    phdr_size = elf_is64 ? sizeof(Elf64_Phdr) : sizeof(Elf32_Phdr);
>> +    nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
>> +
>> +    /* We have already validated the ELF header when calling elf_load_hdr() */
>> +
>> +    ehdr = mmap(0, statbuf.st_size,
>> +        PROT_READ | PROT_WRITE, MAP_PRIVATE, fileno(file), 0);
>> +    if (ehdr == MAP_FAILED) {
>> +        error_report("Failed to mmap kernel binary (%s)", filename);
>> +        goto done;
>> +    }
>> +
>> +    /*
>> +     * Search through the program execution header for the
>> +     * ELF Note section.
>> +     */
>> +
>> +    ehdr_poff = elf_is64 ?
>> +        ((Elf64_Ehdr *)(ehdr))->e_phoff : ((Elf32_Ehdr *)(ehdr))->e_phoff;
>> +    if (statbuf.st_size < (ehdr_size + ehdr_poff)) {
>> +        error_report("ELF NOTE section exceeds file (%s) size",
>> +            filename);
>> +        goto done;
>> +    }
>> +
>> +    phdr = ehdr + ehdr_poff;
>> +    phdr_type = elf_is64 ?
>> +        ((Elf64_Phdr *)phdr)->p_type : ((Elf32_Phdr *)phdr)->p_type;
>> +    while (phdr != NULL && phdr_type != PT_NOTE) {
>> +        if (statbuf.st_size < ((phdr - ehdr) + phdr_size)) {
>> +            error_report("ELF Program headers in file (%s) too short",
>> +                filename);
>> +            goto done;
>> +        }
>> +        phdr += phdr_size;
>> +        phdr_type = elf_is64 ?
>> +            ((Elf64_Phdr *)phdr)->p_type : ((Elf32_Phdr *)phdr)->p_type;
>> +    }
>> +
>> +    phdr_off = elf_is64 ?
>> +        ((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
>> +    phdr_align = elf_is64 ?
>> +        ((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
>> +    phdr_memsz = elf_is64 ?
>> +        ((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
>> +
>> +    /*
>> +     * check that the start of the ELF Note section is within the bounds
>> +     * of the kernel ELF binary
>> +     */
>> +    if (statbuf.st_size < (ehdr_poff + phdr_size + phdr_off)) {
>> +        error_report("Start of ELF note section outside of file (%s) bounds",
>> +            filename);
>> +        goto done;
>> +    }
>> +    /*
>> +     * check that the end of the ELF Note section is within the bounds
>> +     * of the kernel ELF binary
>> +     */
>> +    if (statbuf.st_size < (phdr_off + phdr_memsz)) {
>> +        error_report("End of ELF note section outside of file (%s) bounds",
>> +            filename);
>> +        goto done;
>> +    }
>> +
>> +    /*
>> +     * Search through the ELF Notes for an entry with the
>> +     * Physical Address (PA) of the PVH entry point.
>> +     */
>> +    nhdr = get_elf_note_type(ehdr, phdr, elf_is64, XEN_ELFNOTE_PHYS32_ENTRY);
>> +    if (nhdr == NULL) {
>> +        error_report("No PVH Entry details in kernel (%s) ELF Note section",
>> +            filename);
>> +        goto done;
>> +    }
>> +
>> +    /*
>> +     * Verify that the returned ELF Note header doesn't exceed the
>> +     * end of the kernel file
>> +     */
>> +    if (statbuf.st_size < ((nhdr - ehdr))) {
>> +        error_report("ELF Nhdr offset (0x%lx) exceeds file (%s) bounds (%ld)",
>> +            (nhdr - ehdr), filename, statbuf.st_size);
>> +        goto done;
>> +    }
>> +
>> +    nhdr_namesz = elf_is64 ?
>> +        ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
>> +    nhdr_descsz = elf_is64 ?
>> +        ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
>> +
>> +    /*
>> +     * Verify that the ELF Note contents don't exceed the end of the
>> +     * kernel file
>> +     */
>> +    if (statbuf.st_size < ((nhdr - ehdr)) + nhdr_size +
>> +        QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
>> +        QEMU_ALIGN_UP(nhdr_descsz, phdr_align)) {
>> +        error_report("ELF Nhdr contents (0x%lx) exceeds file bounds (%ld)",
>> +            (nhdr - ehdr) + nhdr_size + QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
>> +            QEMU_ALIGN_UP(nhdr_descsz, phdr_align), statbuf.st_size);
>> +        goto done;
>> +    }
>> +
>> +    elf_note_data_addr =
>> +        (size_t)nhdr + nhdr_size + QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
>> +
>> +    pvh_start_addr = *(size_t *)elf_note_data_addr;
>> +
>> +    /*
>> +     * Verify that the PVH Entry point address does not exceed the
>> +     * bounds of the kernel file.
>> +     */
>> +    if (statbuf.st_size < pvh_start_addr) {
>> +        error_report("PVH ELF note addr (0x%lx) exceeds file (%s) bounds (%ld)",
>> +            (elf_note_data_addr - (size_t)ehdr), filename, statbuf.st_size);
>> +        pvh_start_addr = 0;
>> +        goto done;
>> +    }
>> +
>> +done:
>> +    (void) munmap(ehdr, statbuf.st_size);
>> +    return pvh_start_addr != 0;
>> +}
>> +
>>   static void load_linux(PCMachineState *pcms,
>>                          FWCfgState *fw_cfg)
>>   {
>> @@ -1334,9 +1598,11 @@ void pc_memory_init(PCMachineState *pcms,
>>       int linux_boot, i;
>>       MemoryRegion *ram, *option_rom_mr;
>>       MemoryRegion *ram_below_4g, *ram_above_4g;
>> -    FWCfgState *fw_cfg;
>> +    FWCfgState *fw_cfg = NULL;
>> +    unsigned char class = ELFCLASSNONE;
>>       MachineState *machine = MACHINE(pcms);
>>       PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>> +    const char *kernel_filename = machine->kernel_filename;
>>   
>>       assert(machine->ram_size == pcms->below_4g_mem_size +
>>                                   pcms->above_4g_mem_size);
>> @@ -1418,6 +1684,10 @@ void pc_memory_init(PCMachineState *pcms,
>>                                       &machine->device_memory->mr);
>>       }
>>   
>> +    if (linux_boot) {
>> +        read_pvh_start_addr_elf_note(kernel_filename, &class);
>> +    }
>> +
>>       /* Initialize PC system firmware */
>>       pc_system_firmware_init(rom_memory, !pcmc->pci_enabled);
>>   
>> diff --git a/include/elf.h b/include/elf.h
>> index c151164b63da..1f82c7a7124b 100644
>> --- a/include/elf.h
>> +++ b/include/elf.h
>> @@ -1585,6 +1585,16 @@ typedef struct elf64_shdr {
>>   #define NT_ARM_HW_WATCH 0x403           /* ARM hardware watchpoint registers */
>>   #define NT_ARM_SYSTEM_CALL      0x404   /* ARM system call number */
>>   
>> +/*
>> + * Physical entry point into the kernel.
>> + *
>> + * 32bit entry point into the kernel. When requested to launch the
>> + * guest kernel, use this entry point to launch the guest in 32-bit
>> + * protected mode with paging disabled.
>> + *
>> + * [ Corresponding definition in Linux kernel: include/xen/interface/elfnote.h ]
>> + */
>> +#define XEN_ELFNOTE_PHYS32_ENTRY    18  /* 0x12 */
>>   
>>   /* Note header in a PT_NOTE section */
>>   typedef struct elf32_note {
>> -- 
>> 1.8.3.1
>>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-12-21 21:35 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-05 22:37 [Qemu-devel] [RFC 0/3] QEMU changes to do PVH boot Liam Merwick
2018-12-05 22:37 ` Liam Merwick
2018-12-05 22:37 ` [Qemu-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file Liam Merwick
2018-12-05 22:37   ` Liam Merwick
2018-12-11 14:01   ` [Qemu-devel] " Stefan Hajnoczi
2018-12-11 14:01     ` Stefan Hajnoczi
2018-12-11 14:57     ` [Qemu-devel] " Liam Merwick
2018-12-11 14:57       ` Liam Merwick
2018-12-11 15:03       ` [Qemu-devel] " Daniel P. Berrangé
2018-12-11 15:03         ` Daniel P. Berrangé
2018-12-21 20:03       ` Liam Merwick
2018-12-21 20:03         ` Liam Merwick
2018-12-05 22:37 ` [Qemu-devel] [RFC 2/3] pc: Read PVH entry point from ELF note in kernel binary Liam Merwick
2018-12-05 22:37   ` Liam Merwick
2018-12-11 14:17   ` [Qemu-devel] " Stefan Hajnoczi
2018-12-21 20:03     ` Liam Merwick
2018-12-21 20:03       ` Liam Merwick
2018-12-11 14:17   ` Stefan Hajnoczi
2018-12-05 22:37 ` [Qemu-devel] [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI Liam Merwick
2018-12-05 22:37   ` Liam Merwick
2018-12-11 17:11   ` [Qemu-devel] " Stefano Garzarella
2018-12-11 18:35     ` Maran Wilson
2018-12-11 18:35       ` Maran Wilson
2018-12-12 15:28       ` [Qemu-devel] " Stefano Garzarella
2018-12-12 17:36         ` Maran Wilson
2018-12-12 17:36           ` Maran Wilson
2018-12-12 15:28       ` Stefano Garzarella
2018-12-11 17:11   ` Stefano Garzarella
2018-12-06  0:01 ` [Qemu-devel] [RFC 0/3] QEMU changes to do PVH boot no-reply
2018-12-06  0:01   ` no-reply
2018-12-06  6:18 ` Maran Wilson
2018-12-06  6:18   ` Maran Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.