* [PATCH v4 0/8] Introduce the microvm machine type
@ 2019-09-24 12:44 ` Sergio Lopez
From: Sergio Lopez @ 2019-09-24 12:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, philmd,
	lersek, kraxel, mtosatti, kvm, Sergio Lopez

Microvm is a machine type inspired by both NEMU and Firecracker, and
constructed after the machine model implemented by the latter.

Its main purpose is to provide users with a minimalist machine type
free from the burden of legacy compatibility, serving as a stepping
stone for future projects aimed at improving boot times, reducing the
attack surface and slimming down QEMU's footprint.

The microvm machine type supports the following devices:

 - ISA bus
 - i8259 PIC
 - LAPIC (implicit if using KVM)
 - IOAPIC (defaults to kernel_irqchip_split = true)
 - i8254 PIT
 - MC146818 RTC (optional)
 - kvmclock (if using KVM)
 - fw_cfg
 - One ISA serial port (optional)
 - Up to eight virtio-mmio devices (configured by the user)

It supports the following machine-specific options:

microvm.option-roms=bool (Set off to disable loading option ROMs)
microvm.isa-serial=bool (Set off to disable the instantiation of an ISA serial port)
microvm.rtc=bool (Set off to disable the instantiation of an MC146818 RTC)
microvm.kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
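The kernel-cmdline option controls whether microvm advertises its
virtio-mmio devices to the guest kernel on the command line. Linux
accepts such devices through its virtio_mmio.device=<size>@<baseaddr>:<irq>
parameter. Assuming microvm relies on that mechanism, the C sketch below
shows one way to append one entry per device. The base address, slot
size and first IRQ are placeholder values, and
append_virtio_mmio_args() is a hypothetical helper, not a function
from microvm.c.

```c
#include <stdio.h>

/* Placeholder layout: NOT taken from microvm.c, chosen for illustration. */
#define HYPOTHETICAL_MMIO_BASE  0xfeb00000u
#define HYPOTHETICAL_MMIO_SIZE  512u
#define HYPOTHETICAL_IRQ_BASE   5u
#define MAX_VIRTIO_MMIO         8u   /* "up to eight" devices, per the cover letter */

/*
 * Copy user_cmdline into out and append one
 * "virtio_mmio.device=<size>@<base>:<irq>" entry per device slot.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int append_virtio_mmio_args(char *out, size_t out_len,
                                   const char *user_cmdline,
                                   unsigned int nr_devices)
{
    size_t used;
    int n;

    if (nr_devices > MAX_VIRTIO_MMIO) {
        nr_devices = MAX_VIRTIO_MMIO;
    }

    n = snprintf(out, out_len, "%s", user_cmdline);
    if (n < 0 || (size_t)n >= out_len) {
        return -1;
    }
    used = (size_t)n;

    for (unsigned int i = 0; i < nr_devices; i++) {
        /* Each slot gets its own MMIO window and a consecutive IRQ. */
        n = snprintf(out + used, out_len - used,
                     " virtio_mmio.device=%u@0x%x:%u",
                     HYPOTHETICAL_MMIO_SIZE,
                     HYPOTHETICAL_MMIO_BASE + i * HYPOTHETICAL_MMIO_SIZE,
                     HYPOTHETICAL_IRQ_BASE + i);
        if (n < 0 || (size_t)n >= out_len - used) {
            return -1;
        }
        used += (size_t)n;
    }
    return 0;
}
```

Checking every snprintf() return value keeps the result bounded; a
real implementation would more likely size the buffer from the user
command line than use a fixed array.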

By default, microvm uses qboot as its BIOS to obtain better boot
times, but it is also compatible with SeaBIOS.

As no current firmware is able to boot from a block device using
virtio-mmio as its transport, a microvm-based VM needs to be booted
from a host-side kernel image and, optionally, an initrd image.
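Since the guest is started from a host-side kernel image, the loader
first has to decide which boot path applies. The header checks visible
in pc.c's load_linux() and the former load_elfboot() look for the
"HdrS" signature (0x53726448) at offset 0x202 for a Linux boot-protocol
image (bzImage), and for the ELF magic (0x464c457f) for a PVH
candidate. A minimal C sketch of that classification follows;
classify_kernel() is a hypothetical name, and the sketch skips the
multiboot check that pc.c also performs.

```c
#include <stddef.h>
#include <stdint.h>

enum kernel_image_kind {
    KERNEL_IMAGE_UNKNOWN,
    KERNEL_IMAGE_BZIMAGE,   /* Linux boot protocol: "HdrS" at offset 0x202 */
    KERNEL_IMAGE_ELF,       /* PVH candidate, if it carries the ELF note */
};

/* Little-endian 32-bit load, independent of host byte order. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classify the first len bytes of a kernel image. */
static enum kernel_image_kind classify_kernel(const uint8_t *hdr, size_t len)
{
    /* Bytes 0x202..0x205 must be present before reading the signature. */
    if (len >= 0x206 && load_le32(hdr + 0x202) == 0x53726448u) { /* "HdrS" */
        return KERNEL_IMAGE_BZIMAGE;
    }
    if (len >= 4 && load_le32(hdr) == 0x464c457fu) {             /* "\x7fELF" */
        return KERNEL_IMAGE_ELF;
    }
    return KERNEL_IMAGE_UNKNOWN;
}
```

Testing the boot-protocol signature first mirrors load_linux(), where
the multiboot and PVH paths are only tried when "HdrS" is absent.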

This is an example of instantiating a microvm VM with a virtio-mmio
based console:

qemu-system-x86_64 -M microvm \
 -enable-kvm -cpu host -m 512m -smp 2 \
 -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
 -nodefaults -no-user-config -nographic \
 -chardev stdio,id=virtiocon0,server \
 -device virtio-serial-device \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0

This is another example, this time using an ISA serial port, useful
for debugging purposes:

qemu-system-x86_64 -M microvm \
 -enable-kvm -cpu host -m 512m -smp 2 \
 -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/vda" \
 -nodefaults -no-user-config -nographic \
 -serial stdio \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0

Finally, in this example a microvm VM is instantiated without an RTC,
without an ISA serial port and without loading the option ROMs,
which yields the smallest configuration:

qemu-system-x86_64 -M microvm,rtc=off,isa-serial=off,option-roms=off \
 -enable-kvm -cpu host -m 512m -smp 2 \
 -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
 -nodefaults -no-user-config -nographic \
 -chardev stdio,id=virtiocon0,server \
 -device virtio-serial-device \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0

---

Changelog
v4:
 - This is a complete rewrite of the whole patchset, with a focus on
   reusing as much existing code as possible to ease the maintenance burden
   and making the machine type as compatible as possible by default. As
   a result, the number of lines dedicated specifically to microvm is
   383 (code lines measured by "cloc") and, with the default
   configuration, it's now able to boot both PVH ELF images and
   bzImages with either SeaBIOS or qboot.

v3:
  - Add initrd support (thanks Stefano).

v2:
  - Drop "[PATCH 1/4] hw/i386: Factorize CPU routine".
  - Simplify machine definition (thanks Eduardo).
  - Remove use of unneeded NUMA-related callbacks (thanks Eduardo).
  - Add a patch to factorize PVH-related functions.
  - Replace use of Linux's Zero Page with PVH (thanks Maran and Paolo).
  
---
Sergio Lopez (8):
  hw/i386: Factorize PVH related functions
  hw/i386: Factorize e820 related functions
  hw/virtio: Factorize virtio-mmio headers
  hw/i386: split PCMachineState deriving X86MachineState from it
  fw_cfg: add "modify" functions for all types
  roms: add microvm-bios (qboot) as binary and git submodule
  docs/microvm.txt: document the new microvm machine type
  hw/i386: Introduce the microvm machine type

 .gitmodules                      |   3 +
 default-configs/i386-softmmu.mak |   1 +
 docs/microvm.txt                 |  78 +++
 hw/acpi/cpu_hotplug.c            |  10 +-
 hw/i386/Kconfig                  |   4 +
 hw/i386/Makefile.objs            |   4 +
 hw/i386/acpi-build.c             |  31 +-
 hw/i386/amd_iommu.c              |   4 +-
 hw/i386/e820.c                   |  99 ++++
 hw/i386/e820.h                   |  11 +
 hw/i386/intel_iommu.c            |   4 +-
 hw/i386/microvm.c                | 512 +++++++++++++++++
 hw/i386/pc.c                     | 960 +++----------------------------
 hw/i386/pc_piix.c                |  48 +-
 hw/i386/pc_q35.c                 |  38 +-
 hw/i386/pc_sysfw.c               |  60 +-
 hw/i386/pvh.c                    | 113 ++++
 hw/i386/pvh.h                    |  10 +
 hw/i386/x86.c                    | 788 +++++++++++++++++++++++++
 hw/intc/ioapic.c                 |   3 +-
 hw/nvram/fw_cfg.c                |  29 +
 hw/virtio/virtio-mmio.c          |  35 +-
 include/hw/i386/microvm.h        |  80 +++
 include/hw/i386/pc.h             |  40 +-
 include/hw/i386/x86.h            |  97 ++++
 include/hw/nvram/fw_cfg.h        |  42 ++
 include/hw/virtio/virtio-mmio.h  |  60 ++
 pc-bios/bios-microvm.bin         | Bin 0 -> 65536 bytes
 roms/Makefile                    |   6 +
 roms/qboot                       |   1 +
 target/i386/kvm.c                |   1 +
 31 files changed, 2102 insertions(+), 1070 deletions(-)
 create mode 100644 docs/microvm.txt
 create mode 100644 hw/i386/e820.c
 create mode 100644 hw/i386/e820.h
 create mode 100644 hw/i386/microvm.c
 create mode 100644 hw/i386/pvh.c
 create mode 100644 hw/i386/pvh.h
 create mode 100644 hw/i386/x86.c
 create mode 100644 include/hw/i386/microvm.h
 create mode 100644 include/hw/i386/x86.h
 create mode 100644 include/hw/virtio/virtio-mmio.h
 create mode 100755 pc-bios/bios-microvm.bin
 create mode 160000 roms/qboot

-- 
2.21.0



* [PATCH v4 1/8] hw/i386: Factorize PVH related functions
  2019-09-24 12:44 ` Sergio Lopez
@ 2019-09-24 12:44   ` Sergio Lopez
From: Sergio Lopez @ 2019-09-24 12:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, philmd,
	lersek, kraxel, mtosatti, kvm, Sergio Lopez

Extract PVH-related functions from pc.c and move them to pvh.c so
they can be shared with other components.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/i386/Makefile.objs |   1 +
 hw/i386/pc.c          | 120 +++++-------------------------------------
 hw/i386/pvh.c         | 113 +++++++++++++++++++++++++++++++++++++++
 hw/i386/pvh.h         |  10 ++++
 4 files changed, 136 insertions(+), 108 deletions(-)
 create mode 100644 hw/i386/pvh.c
 create mode 100644 hw/i386/pvh.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 5d9c9efd5f..c5f20bbd72 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -1,5 +1,6 @@
 obj-$(CONFIG_KVM) += kvm/
 obj-y += multiboot.o
+obj-y += pvh.o
 obj-y += pc.o
 obj-$(CONFIG_I440FX) += pc_piix.o
 obj-$(CONFIG_Q35) += pc_q35.o
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index bad866fe44..10e4ced0c6 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -42,6 +42,7 @@
 #include "elf.h"
 #include "migration/vmstate.h"
 #include "multiboot.h"
+#include "pvh.h"
 #include "hw/timer/mc146818rtc.h"
 #include "hw/dma/i8257.h"
 #include "hw/timer/i8254.h"
@@ -116,9 +117,6 @@ static struct e820_entry *e820_table;
 static unsigned e820_entries;
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
-/* Physical Address of PVH entry point read from kernel ELF NOTE */
-static size_t pvh_start_addr;
-
 GlobalProperty pc_compat_4_1[] = {};
 const size_t pc_compat_4_1_len = G_N_ELEMENTS(pc_compat_4_1);
 
@@ -1076,109 +1074,6 @@ struct setup_data {
     uint8_t data[0];
 } __attribute__((packed));
 
-
-/*
- * The entry point into the kernel for PVH boot is different from
- * the native entry point.  The PVH entry is defined by the x86/HVM
- * direct boot ABI and is available in an ELFNOTE in the kernel binary.
- *
- * This function is passed to load_elf() when it is called from
- * load_elfboot() which then additionally checks for an ELF Note of
- * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
- * parse the PVH entry address from the ELF Note.
- *
- * Due to trickery in elf_opts.h, load_elf() is actually available as
- * load_elf32() or load_elf64() and this routine needs to be able
- * to deal with being called as 32 or 64 bit.
- *
- * The address of the PVH entry point is saved to the 'pvh_start_addr'
- * global variable.  (although the entry point is 32-bit, the kernel
- * binary can be either 32-bit or 64-bit).
- */
-static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
-{
-    size_t *elf_note_data_addr;
-
-    /* Check if ELF Note header passed in is valid */
-    if (arg1 == NULL) {
-        return 0;
-    }
-
-    if (is64) {
-        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
-        uint64_t nhdr_size64 = sizeof(struct elf64_note);
-        uint64_t phdr_align = *(uint64_t *)arg2;
-        uint64_t nhdr_namesz = nhdr64->n_namesz;
-
-        elf_note_data_addr =
-            ((void *)nhdr64) + nhdr_size64 +
-            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
-    } else {
-        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
-        uint32_t nhdr_size32 = sizeof(struct elf32_note);
-        uint32_t phdr_align = *(uint32_t *)arg2;
-        uint32_t nhdr_namesz = nhdr32->n_namesz;
-
-        elf_note_data_addr =
-            ((void *)nhdr32) + nhdr_size32 +
-            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
-    }
-
-    pvh_start_addr = *elf_note_data_addr;
-
-    return pvh_start_addr;
-}
-
-static bool load_elfboot(const char *kernel_filename,
-                   int kernel_file_size,
-                   uint8_t *header,
-                   size_t pvh_xen_start_addr,
-                   FWCfgState *fw_cfg)
-{
-    uint32_t flags = 0;
-    uint32_t mh_load_addr = 0;
-    uint32_t elf_kernel_size = 0;
-    uint64_t elf_entry;
-    uint64_t elf_low, elf_high;
-    int kernel_size;
-
-    if (ldl_p(header) != 0x464c457f) {
-        return false; /* no elfboot */
-    }
-
-    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
-    flags = elf_is64 ?
-        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
-
-    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
-        error_report("elfboot unsupported flags = %x", flags);
-        exit(1);
-    }
-
-    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
-    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
-                           NULL, &elf_note_type, &elf_entry,
-                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
-                           0, 0);
-
-    if (kernel_size < 0) {
-        error_report("Error while loading elf kernel");
-        exit(1);
-    }
-    mh_load_addr = elf_low;
-    elf_kernel_size = elf_high - elf_low;
-
-    if (pvh_start_addr == 0) {
-        error_report("Error loading uncompressed kernel without PVH ELF Note");
-        exit(1);
-    }
-    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
-    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
-    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
-
-    return true;
-}
-
 static void load_linux(PCMachineState *pcms,
                        FWCfgState *fw_cfg)
 {
@@ -1218,6 +1113,9 @@ static void load_linux(PCMachineState *pcms,
     if (ldl_p(header+0x202) == 0x53726448) {
         protocol = lduw_p(header+0x206);
     } else {
+        size_t pvh_start_addr;
+        uint32_t mh_load_addr = 0;
+        uint32_t elf_kernel_size = 0;
         /*
          * This could be a multiboot kernel. If it is, let's stop treating it
          * like a Linux kernel.
@@ -1235,10 +1133,16 @@ static void load_linux(PCMachineState *pcms,
          * If load_elfboot() is successful, populate the fw_cfg info.
          */
         if (pcmc->pvh_enabled &&
-            load_elfboot(kernel_filename, kernel_size,
-                         header, pvh_start_addr, fw_cfg)) {
+            pvh_load_elfboot(kernel_filename,
+                             &mh_load_addr, &elf_kernel_size)) {
             fclose(f);
 
+            pvh_start_addr = pvh_get_start_addr();
+
+            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
+            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
+            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
+
             fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
                 strlen(kernel_cmdline) + 1);
             fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
diff --git a/hw/i386/pvh.c b/hw/i386/pvh.c
new file mode 100644
index 0000000000..1c81727811
--- /dev/null
+++ b/hw/i386/pvh.c
@@ -0,0 +1,113 @@
+/*
+ * PVH Boot Helper
+ *
+ * Copyright (C) 2019 Oracle
+ * Copyright (C) 2019 Red Hat, Inc
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/error-report.h"
+#include "hw/loader.h"
+#include "cpu.h"
+#include "elf.h"
+#include "pvh.h"
+
+static size_t pvh_start_addr;
+
+size_t pvh_get_start_addr(void)
+{
+    return pvh_start_addr;
+}
+
+/*
+ * The entry point into the kernel for PVH boot is different from
+ * the native entry point.  The PVH entry is defined by the x86/HVM
+ * direct boot ABI and is available in an ELFNOTE in the kernel binary.
+ *
+ * This function is passed to load_elf() when it is called from
+ * load_elfboot() which then additionally checks for an ELF Note of
+ * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
+ * parse the PVH entry address from the ELF Note.
+ *
+ * Due to trickery in elf_opts.h, load_elf() is actually available as
+ * load_elf32() or load_elf64() and this routine needs to be able
+ * to deal with being called as 32 or 64 bit.
+ *
+ * The address of the PVH entry point is saved to the 'pvh_start_addr'
+ * global variable.  (although the entry point is 32-bit, the kernel
+ * binary can be either 32-bit or 64-bit).
+ */
+
+static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
+{
+    size_t *elf_note_data_addr;
+
+    /* Check if ELF Note header passed in is valid */
+    if (arg1 == NULL) {
+        return 0;
+    }
+
+    if (is64) {
+        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
+        uint64_t nhdr_size64 = sizeof(struct elf64_note);
+        uint64_t phdr_align = *(uint64_t *)arg2;
+        uint64_t nhdr_namesz = nhdr64->n_namesz;
+
+        elf_note_data_addr =
+            ((void *)nhdr64) + nhdr_size64 +
+            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
+    } else {
+        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
+        uint32_t nhdr_size32 = sizeof(struct elf32_note);
+        uint32_t phdr_align = *(uint32_t *)arg2;
+        uint32_t nhdr_namesz = nhdr32->n_namesz;
+
+        elf_note_data_addr =
+            ((void *)nhdr32) + nhdr_size32 +
+            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
+    }
+
+    pvh_start_addr = *elf_note_data_addr;
+
+    return pvh_start_addr;
+}
+
+bool pvh_load_elfboot(const char *kernel_filename,
+                      uint32_t *mh_load_addr,
+                      uint32_t *elf_kernel_size)
+{
+    uint64_t elf_entry;
+    uint64_t elf_low, elf_high;
+    int kernel_size;
+    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
+
+    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
+                           NULL, &elf_note_type, &elf_entry,
+                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
+                           0, 0);
+
+    if (kernel_size < 0) {
+        error_report("Error while loading elf kernel");
+        return false;
+    }
+
+    if (pvh_start_addr == 0) {
+        error_report("Error loading uncompressed kernel without PVH ELF Note");
+        return false;
+    }
+
+    if (mh_load_addr) {
+        *mh_load_addr = elf_low;
+    }
+
+    if (elf_kernel_size) {
+        *elf_kernel_size = elf_high - elf_low;
+    }
+
+    return true;
+}
diff --git a/hw/i386/pvh.h b/hw/i386/pvh.h
new file mode 100644
index 0000000000..ada67ff6e8
--- /dev/null
+++ b/hw/i386/pvh.h
@@ -0,0 +1,10 @@
+#ifndef HW_I386_PVH_H
+#define HW_I386_PVH_H
+
+size_t pvh_get_start_addr(void);
+
+bool pvh_load_elfboot(const char *kernel_filename,
+                      uint32_t *mh_load_addr,
+                      uint32_t *elf_kernel_size);
+
+#endif
-- 
2.21.0
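
The descriptor lookup in read_pvh_start_addr() skips the 12-byte ELF
note header and the note name rounded up to the segment alignment,
then reads the entry address. The standalone C sketch below
reimplements that arithmetic under slightly different assumptions: it
decodes little-endian bytes explicitly and takes the address width from
n_descsz (4 or 8 bytes) instead of reading through a size_t pointer as
pvh.c does. It is an illustration, not the code in this patch; the
value 18 for XEN_ELFNOTE_PHYS32_ENTRY comes from Xen's public
elfnote.h, and read_phys32_entry() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Note type carrying the PVH entry point (Xen public/elfnote.h). */
#define XEN_ELFNOTE_PHYS32_ENTRY 18u

/* ELF note header: n_namesz, n_descsz, n_type, 32 bits each. */
#define NOTE_HDR_SIZE 12u

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Round n up to a multiple of align; p_align values 0 and 1 mean none. */
static size_t align_up(size_t n, size_t align)
{
    return align <= 1 ? n : (n + align - 1) / align * align;
}

/*
 * Parse the single note at 'note' (len bytes). On a PHYS32_ENTRY note
 * with a 4- or 8-byte descriptor, store the address in *entry and
 * return true; otherwise return false.
 */
static bool read_phys32_entry(const uint8_t *note, size_t len,
                              size_t align, uint64_t *entry)
{
    uint32_t namesz, descsz, type;
    size_t desc_off;

    if (len < NOTE_HDR_SIZE) {
        return false;
    }
    namesz = get_le32(note);
    descsz = get_le32(note + 4);
    type = get_le32(note + 8);
    if (type != XEN_ELFNOTE_PHYS32_ENTRY) {
        return false;
    }

    /* The descriptor follows the header and the padded name ("Xen\0"). */
    desc_off = NOTE_HDR_SIZE + align_up(namesz, align);
    if (desc_off > len || descsz > len - desc_off) {
        return false;
    }

    if (descsz == 4) {
        *entry = get_le32(note + desc_off);
    } else if (descsz == 8) {
        *entry = (uint64_t)get_le32(note + desc_off) |
                 ((uint64_t)get_le32(note + desc_off + 4) << 32);
    } else {
        return false;
    }
    return true;
}
```

Like load_elf() in the patch, the sketch matches on the note type only
and does not compare the note name.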



* [PATCH v4 1/8] hw/i386: Factorize PVH related functions
@ 2019-09-24 12:44   ` Sergio Lopez
From: Sergio Lopez @ 2019-09-24 12:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: Sergio Lopez, ehabkost, kvm, mst, lersek, mtosatti, kraxel,
	pbonzini, imammedo, philmd, rth

Extract PVH related functions from pc.c, and put them in pvh.c, so
they can be shared with other components.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/i386/Makefile.objs |   1 +
 hw/i386/pc.c          | 120 +++++-------------------------------------
 hw/i386/pvh.c         | 113 +++++++++++++++++++++++++++++++++++++++
 hw/i386/pvh.h         |  10 ++++
 4 files changed, 136 insertions(+), 108 deletions(-)
 create mode 100644 hw/i386/pvh.c
 create mode 100644 hw/i386/pvh.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 5d9c9efd5f..c5f20bbd72 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -1,5 +1,6 @@
 obj-$(CONFIG_KVM) += kvm/
 obj-y += multiboot.o
+obj-y += pvh.o
 obj-y += pc.o
 obj-$(CONFIG_I440FX) += pc_piix.o
 obj-$(CONFIG_Q35) += pc_q35.o
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index bad866fe44..10e4ced0c6 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -42,6 +42,7 @@
 #include "elf.h"
 #include "migration/vmstate.h"
 #include "multiboot.h"
+#include "pvh.h"
 #include "hw/timer/mc146818rtc.h"
 #include "hw/dma/i8257.h"
 #include "hw/timer/i8254.h"
@@ -116,9 +117,6 @@ static struct e820_entry *e820_table;
 static unsigned e820_entries;
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
-/* Physical Address of PVH entry point read from kernel ELF NOTE */
-static size_t pvh_start_addr;
-
 GlobalProperty pc_compat_4_1[] = {};
 const size_t pc_compat_4_1_len = G_N_ELEMENTS(pc_compat_4_1);
 
@@ -1076,109 +1074,6 @@ struct setup_data {
     uint8_t data[0];
 } __attribute__((packed));
 
-
-/*
- * The entry point into the kernel for PVH boot is different from
- * the native entry point.  The PVH entry is defined by the x86/HVM
- * direct boot ABI and is available in an ELFNOTE in the kernel binary.
- *
- * This function is passed to load_elf() when it is called from
- * load_elfboot() which then additionally checks for an ELF Note of
- * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
- * parse the PVH entry address from the ELF Note.
- *
- * Due to trickery in elf_opts.h, load_elf() is actually available as
- * load_elf32() or load_elf64() and this routine needs to be able
- * to deal with being called as 32 or 64 bit.
- *
- * The address of the PVH entry point is saved to the 'pvh_start_addr'
- * global variable.  (although the entry point is 32-bit, the kernel
- * binary can be either 32-bit or 64-bit).
- */
-static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
-{
-    size_t *elf_note_data_addr;
-
-    /* Check if ELF Note header passed in is valid */
-    if (arg1 == NULL) {
-        return 0;
-    }
-
-    if (is64) {
-        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
-        uint64_t nhdr_size64 = sizeof(struct elf64_note);
-        uint64_t phdr_align = *(uint64_t *)arg2;
-        uint64_t nhdr_namesz = nhdr64->n_namesz;
-
-        elf_note_data_addr =
-            ((void *)nhdr64) + nhdr_size64 +
-            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
-    } else {
-        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
-        uint32_t nhdr_size32 = sizeof(struct elf32_note);
-        uint32_t phdr_align = *(uint32_t *)arg2;
-        uint32_t nhdr_namesz = nhdr32->n_namesz;
-
-        elf_note_data_addr =
-            ((void *)nhdr32) + nhdr_size32 +
-            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
-    }
-
-    pvh_start_addr = *elf_note_data_addr;
-
-    return pvh_start_addr;
-}
-
-static bool load_elfboot(const char *kernel_filename,
-                   int kernel_file_size,
-                   uint8_t *header,
-                   size_t pvh_xen_start_addr,
-                   FWCfgState *fw_cfg)
-{
-    uint32_t flags = 0;
-    uint32_t mh_load_addr = 0;
-    uint32_t elf_kernel_size = 0;
-    uint64_t elf_entry;
-    uint64_t elf_low, elf_high;
-    int kernel_size;
-
-    if (ldl_p(header) != 0x464c457f) {
-        return false; /* no elfboot */
-    }
-
-    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
-    flags = elf_is64 ?
-        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
-
-    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
-        error_report("elfboot unsupported flags = %x", flags);
-        exit(1);
-    }
-
-    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
-    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
-                           NULL, &elf_note_type, &elf_entry,
-                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
-                           0, 0);
-
-    if (kernel_size < 0) {
-        error_report("Error while loading elf kernel");
-        exit(1);
-    }
-    mh_load_addr = elf_low;
-    elf_kernel_size = elf_high - elf_low;
-
-    if (pvh_start_addr == 0) {
-        error_report("Error loading uncompressed kernel without PVH ELF Note");
-        exit(1);
-    }
-    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
-    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
-    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
-
-    return true;
-}
-
 static void load_linux(PCMachineState *pcms,
                        FWCfgState *fw_cfg)
 {
@@ -1218,6 +1113,9 @@ static void load_linux(PCMachineState *pcms,
     if (ldl_p(header+0x202) == 0x53726448) {
         protocol = lduw_p(header+0x206);
     } else {
+        size_t pvh_start_addr;
+        uint32_t mh_load_addr = 0;
+        uint32_t elf_kernel_size = 0;
         /*
          * This could be a multiboot kernel. If it is, let's stop treating it
          * like a Linux kernel.
@@ -1235,10 +1133,16 @@ static void load_linux(PCMachineState *pcms,
          * If load_elfboot() is successful, populate the fw_cfg info.
          */
         if (pcmc->pvh_enabled &&
-            load_elfboot(kernel_filename, kernel_size,
-                         header, pvh_start_addr, fw_cfg)) {
+            pvh_load_elfboot(kernel_filename,
+                             &mh_load_addr, &elf_kernel_size)) {
             fclose(f);
 
+            pvh_start_addr = pvh_get_start_addr();
+
+            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
+            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
+            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
+
             fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
                 strlen(kernel_cmdline) + 1);
             fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
diff --git a/hw/i386/pvh.c b/hw/i386/pvh.c
new file mode 100644
index 0000000000..1c81727811
--- /dev/null
+++ b/hw/i386/pvh.c
@@ -0,0 +1,113 @@
+/*
+ * PVH Boot Helper
+ *
+ * Copyright (C) 2019 Oracle
+ * Copyright (C) 2019 Red Hat, Inc
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/error-report.h"
+#include "hw/loader.h"
+#include "cpu.h"
+#include "elf.h"
+#include "pvh.h"
+
+static size_t pvh_start_addr;
+
+size_t pvh_get_start_addr(void)
+{
+    return pvh_start_addr;
+}
+
+/*
+ * The entry point into the kernel for PVH boot is different from
+ * the native entry point.  The PVH entry is defined by the x86/HVM
+ * direct boot ABI and is available in an ELFNOTE in the kernel binary.
+ *
+ * This function is passed to load_elf() when it is called from
+ * pvh_load_elfboot(); load_elf() then additionally looks for an ELF
+ * Note of type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function
+ * to parse the PVH entry address from the ELF Note.
+ *
+ * Due to trickery in elf_ops.h, load_elf() is actually available as
+ * load_elf32() or load_elf64() and this routine needs to be able
+ * to deal with being called as 32 or 64 bit.
+ *
+ * The address of the PVH entry point is saved to the file-scope
+ * 'pvh_start_addr' variable, returned by pvh_get_start_addr(). (Although
+ * the entry point is 32-bit, the kernel binary can be 32-bit or 64-bit.)
+ */
+
+static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
+{
+    size_t *elf_note_data_addr;
+
+    /* Check if ELF Note header passed in is valid */
+    if (arg1 == NULL) {
+        return 0;
+    }
+
+    if (is64) {
+        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
+        uint64_t nhdr_size64 = sizeof(struct elf64_note);
+        uint64_t phdr_align = *(uint64_t *)arg2;
+        uint64_t nhdr_namesz = nhdr64->n_namesz;
+
+        elf_note_data_addr =
+            ((void *)nhdr64) + nhdr_size64 +
+            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
+    } else {
+        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
+        uint32_t nhdr_size32 = sizeof(struct elf32_note);
+        uint32_t phdr_align = *(uint32_t *)arg2;
+        uint32_t nhdr_namesz = nhdr32->n_namesz;
+
+        elf_note_data_addr =
+            ((void *)nhdr32) + nhdr_size32 +
+            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
+    }
+
+    pvh_start_addr = *elf_note_data_addr;
+
+    return pvh_start_addr;
+}
+
+bool pvh_load_elfboot(const char *kernel_filename,
+                      uint32_t *mh_load_addr,
+                      uint32_t *elf_kernel_size)
+{
+    uint64_t elf_entry;
+    uint64_t elf_low, elf_high;
+    int kernel_size;
+    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
+
+    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
+                           NULL, &elf_note_type, &elf_entry,
+                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
+                           0, 0);
+
+    if (kernel_size < 0) {
+        error_report("Error while loading elf kernel");
+        return false;
+    }
+
+    if (pvh_start_addr == 0) {
+        error_report("Error loading uncompressed kernel without PVH ELF Note");
+        return false;
+    }
+
+    if (mh_load_addr) {
+        *mh_load_addr = elf_low;
+    }
+
+    if (elf_kernel_size) {
+        *elf_kernel_size = elf_high - elf_low;
+    }
+
+    return true;
+}
diff --git a/hw/i386/pvh.h b/hw/i386/pvh.h
new file mode 100644
index 0000000000..ada67ff6e8
--- /dev/null
+++ b/hw/i386/pvh.h
@@ -0,0 +1,10 @@
+#ifndef HW_I386_PVH_H
+#define HW_I386_PVH_H
+
+size_t pvh_get_start_addr(void);
+
+bool pvh_load_elfboot(const char *kernel_filename,
+                      uint32_t *mh_load_addr,
+                      uint32_t *elf_kernel_size);
+
+#endif
-- 
2.21.0
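
The pointer arithmetic in read_pvh_start_addr() can be tried outside QEMU with a small standalone sketch. It assumes the standard ELF note layout: three 32-bit header words (the same for ELF32 and ELF64), followed by the name padded to the segment alignment, then the descriptor. It builds a hypothetical "Xen" note of type XEN_ELFNOTE_PHYS32_ENTRY (18 in Xen's public elfnote.h) by hand. The helper names are illustrative, not QEMU APIs. Unlike the patch, which dereferences a size_t pointer, the sketch decodes the descriptor byte by byte as little-endian and uses n_descsz to decide how many bytes to read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XEN_ELFNOTE_PHYS32_ENTRY 18  /* value from Xen's public elfnote.h */

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Hypothetical helper: write a 4-byte-aligned PVH note, return its size. */
static size_t build_pvh_note(uint8_t *buf, uint32_t entry)
{
    put_le32(buf + 0, 4);                         /* n_namesz: "Xen\0" */
    put_le32(buf + 4, 4);                         /* n_descsz */
    put_le32(buf + 8, XEN_ELFNOTE_PHYS32_ENTRY);  /* n_type */
    memcpy(buf + 12, "Xen", 4);                   /* name incl. NUL */
    put_le32(buf + 16, entry);                    /* descriptor */
    return 20;
}

/* Decode the note descriptor (4 or 8 bytes) as a little-endian integer. */
static uint64_t note_desc_le(const uint8_t *note, uint64_t align)
{
    uint32_t namesz = get_le32(note);
    uint32_t descsz = get_le32(note + 4);
    const uint8_t *desc;
    uint64_t value = 0;

    assert(descsz == 4 || descsz == 8);
    /* Same offset as read_pvh_start_addr(): header + ALIGN_UP(namesz, align). */
    desc = note + 3 * sizeof(uint32_t) + align_up(namesz, align);
    for (uint32_t i = 0; i < descsz; i++) {
        value |= (uint64_t)desc[i] << (8 * i);
    }
    return value;
}
```

The alignment is taken as a parameter rather than hard-coded because, as in the patch, it comes from the note segment's program header.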



* [PATCH v4 2/8] hw/i386: Factorize e820 related functions
  2019-09-24 12:44 ` Sergio Lopez
@ 2019-09-24 12:44   ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-24 12:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, philmd,
	lersek, kraxel, mtosatti, kvm, Sergio Lopez

Extract the e820-related functions from pc.c and move them into e820.c
so they can be shared with other components.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/i386/Makefile.objs |  1 +
 hw/i386/e820.c        | 99 +++++++++++++++++++++++++++++++++++++++++++
 hw/i386/e820.h        | 11 +++++
 hw/i386/pc.c          | 66 +----------------------------
 include/hw/i386/pc.h  | 11 -----
 target/i386/kvm.c     |  1 +
 6 files changed, 114 insertions(+), 75 deletions(-)
 create mode 100644 hw/i386/e820.c
 create mode 100644 hw/i386/e820.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index c5f20bbd72..149712db07 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -2,6 +2,7 @@ obj-$(CONFIG_KVM) += kvm/
 obj-y += multiboot.o
 obj-y += pvh.o
 obj-y += pc.o
+obj-y += e820.o
 obj-$(CONFIG_I440FX) += pc_piix.o
 obj-$(CONFIG_Q35) += pc_q35.o
 obj-y += fw_cfg.o pc_sysfw.o
diff --git a/hw/i386/e820.c b/hw/i386/e820.c
new file mode 100644
index 0000000000..d5c5c0d528
--- /dev/null
+++ b/hw/i386/e820.c
@@ -0,0 +1,99 @@
+/*
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/cutils.h"
+#include "qemu/units.h"
+
+#include "hw/i386/e820.h"
+#include "hw/i386/fw_cfg.h"
+
+#define E820_NR_ENTRIES		16
+
+struct e820_entry {
+    uint64_t address;
+    uint64_t length;
+    uint32_t type;
+} QEMU_PACKED __attribute((__aligned__(4)));
+
+struct e820_table {
+    uint32_t count;
+    struct e820_entry entry[E820_NR_ENTRIES];
+} QEMU_PACKED __attribute((__aligned__(4)));
+
+static struct e820_table e820_reserve;
+static struct e820_entry *e820_table;
+static unsigned e820_entries;
+
+int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
+{
+    int index = le32_to_cpu(e820_reserve.count);
+    struct e820_entry *entry;
+
+    if (type != E820_RAM) {
+        /* old FW_CFG_E820_TABLE entry -- reservations only */
+        if (index >= E820_NR_ENTRIES) {
+            return -EBUSY;
+        }
+        entry = &e820_reserve.entry[index++];
+
+        entry->address = cpu_to_le64(address);
+        entry->length = cpu_to_le64(length);
+        entry->type = cpu_to_le32(type);
+
+        e820_reserve.count = cpu_to_le32(index);
+    }
+
+    /* new "etc/e820" file -- include ram too */
+    e820_table = g_renew(struct e820_entry, e820_table, e820_entries + 1);
+    e820_table[e820_entries].address = cpu_to_le64(address);
+    e820_table[e820_entries].length = cpu_to_le64(length);
+    e820_table[e820_entries].type = cpu_to_le32(type);
+    e820_entries++;
+
+    return e820_entries;
+}
+
+int e820_get_num_entries(void)
+{
+    return e820_entries;
+}
+
+bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length)
+{
+    if (idx < e820_entries && e820_table[idx].type == cpu_to_le32(type)) {
+        *address = le64_to_cpu(e820_table[idx].address);
+        *length = le64_to_cpu(e820_table[idx].length);
+        return true;
+    }
+    return false;
+}
+
+void e820_create_fw_entry(FWCfgState *fw_cfg)
+{
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_E820_TABLE,
+                     &e820_reserve, sizeof(e820_reserve));
+    fw_cfg_add_file(fw_cfg, "etc/e820", e820_table,
+                    sizeof(struct e820_entry) * e820_entries);
+}
diff --git a/hw/i386/e820.h b/hw/i386/e820.h
new file mode 100644
index 0000000000..569d1f0ab5
--- /dev/null
+++ b/hw/i386/e820.h
@@ -0,0 +1,11 @@
+/* e820 types */
+#define E820_RAM        1
+#define E820_RESERVED   2
+#define E820_ACPI       3
+#define E820_NVS        4
+#define E820_UNUSABLE   5
+
+int e820_add_entry(uint64_t address, uint64_t length, uint32_t type);
+int e820_get_num_entries(void);
+bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length);
+void e820_create_fw_entry(FWCfgState *fw_cfg);
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 10e4ced0c6..3920aa7e85 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -30,6 +30,7 @@
 #include "hw/i386/apic.h"
 #include "hw/i386/topology.h"
 #include "hw/i386/fw_cfg.h"
+#include "hw/i386/e820.h"
 #include "sysemu/cpus.h"
 #include "hw/block/fdc.h"
 #include "hw/ide.h"
@@ -99,22 +100,6 @@
 #define DPRINTF(fmt, ...)
 #endif
 
-#define E820_NR_ENTRIES		16
-
-struct e820_entry {
-    uint64_t address;
-    uint64_t length;
-    uint32_t type;
-} QEMU_PACKED __attribute((__aligned__(4)));
-
-struct e820_table {
-    uint32_t count;
-    struct e820_entry entry[E820_NR_ENTRIES];
-} QEMU_PACKED __attribute((__aligned__(4)));
-
-static struct e820_table e820_reserve;
-static struct e820_entry *e820_table;
-static unsigned e820_entries;
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
 GlobalProperty pc_compat_4_1[] = {};
@@ -878,50 +863,6 @@ static void handle_a20_line_change(void *opaque, int irq, int level)
     x86_cpu_set_a20(cpu, level);
 }
 
-int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
-{
-    int index = le32_to_cpu(e820_reserve.count);
-    struct e820_entry *entry;
-
-    if (type != E820_RAM) {
-        /* old FW_CFG_E820_TABLE entry -- reservations only */
-        if (index >= E820_NR_ENTRIES) {
-            return -EBUSY;
-        }
-        entry = &e820_reserve.entry[index++];
-
-        entry->address = cpu_to_le64(address);
-        entry->length = cpu_to_le64(length);
-        entry->type = cpu_to_le32(type);
-
-        e820_reserve.count = cpu_to_le32(index);
-    }
-
-    /* new "etc/e820" file -- include ram too */
-    e820_table = g_renew(struct e820_entry, e820_table, e820_entries + 1);
-    e820_table[e820_entries].address = cpu_to_le64(address);
-    e820_table[e820_entries].length = cpu_to_le64(length);
-    e820_table[e820_entries].type = cpu_to_le32(type);
-    e820_entries++;
-
-    return e820_entries;
-}
-
-int e820_get_num_entries(void)
-{
-    return e820_entries;
-}
-
-bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length)
-{
-    if (idx < e820_entries && e820_table[idx].type == cpu_to_le32(type)) {
-        *address = le64_to_cpu(e820_table[idx].address);
-        *length = le64_to_cpu(e820_table[idx].length);
-        return true;
-    }
-    return false;
-}
-
 /* Calculates initial APIC ID for a specific CPU index
  *
  * Currently we need to be able to calculate the APIC ID from the CPU index
@@ -1024,10 +965,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
                      acpi_tables, acpi_tables_len);
     fw_cfg_add_i32(fw_cfg, FW_CFG_IRQ0_OVERRIDE, kvm_allows_irq0_override());
 
-    fw_cfg_add_bytes(fw_cfg, FW_CFG_E820_TABLE,
-                     &e820_reserve, sizeof(e820_reserve));
-    fw_cfg_add_file(fw_cfg, "etc/e820", e820_table,
-                    sizeof(struct e820_entry) * e820_entries);
+    e820_create_fw_entry(fw_cfg);
 
     fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, &hpet_cfg, sizeof(hpet_cfg));
     /* allocate memory for the NUMA channel: one (64bit) word for the number
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 19a837889d..062feeb69e 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -291,17 +291,6 @@ void pc_system_firmware_init(PCMachineState *pcms, MemoryRegion *rom_memory);
 void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
                        const CPUArchIdList *apic_ids, GArray *entry);
 
-/* e820 types */
-#define E820_RAM        1
-#define E820_RESERVED   2
-#define E820_ACPI       3
-#define E820_NVS        4
-#define E820_UNUSABLE   5
-
-int e820_add_entry(uint64_t, uint64_t, uint32_t);
-int e820_get_num_entries(void);
-bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
-
 extern GlobalProperty pc_compat_4_1[];
 extern const size_t pc_compat_4_1_len;
 
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 8023c679ea..8ce56db7d4 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -41,6 +41,7 @@
 #include "hw/i386/apic-msidef.h"
 #include "hw/i386/intel_iommu.h"
 #include "hw/i386/x86-iommu.h"
+#include "hw/i386/e820.h"
 
 #include "hw/pci/pci.h"
 #include "hw/pci/msi.h"
-- 
2.21.0
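
The split between the two tables that e820_add_entry() maintains can be modelled in plain C. This is a simplified sketch under stated assumptions, not the QEMU code: it drops the little-endian conversions (cpu_to_le64() and friends), uses realloc() instead of g_renew() (which aborts on allocation failure rather than returning an error), and the names add_entry(), get_entry() and the addresses in the usage are hypothetical. Non-RAM entries go to both the fixed 16-slot legacy table (exported as FW_CFG_E820_TABLE) and the growable "etc/e820" table, while RAM entries go only to the latter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define E820_RAM        1
#define E820_RESERVED   2
#define E820_NR_ENTRIES 16   /* size of the legacy reservation table */

struct e820_entry {
    uint64_t address;
    uint64_t length;
    uint32_t type;
};

/* Model of the legacy FW_CFG_E820_TABLE: fixed size, reservations only. */
static struct {
    uint32_t count;
    struct e820_entry entry[E820_NR_ENTRIES];
} reserve;

/* Model of "etc/e820": grows by one entry per call, RAM included. */
static struct e820_entry *table;
static unsigned table_len;

/* Returns the new number of "etc/e820" entries, or a negative errno. */
static int add_entry(uint64_t address, uint64_t length, uint32_t type)
{
    struct e820_entry e = { address, length, type };
    struct e820_entry *grown;

    /* Check the legacy table first so a failure leaves both tables untouched. */
    if (type != E820_RAM && reserve.count >= E820_NR_ENTRIES) {
        return -EBUSY;
    }
    grown = realloc(table, (table_len + 1) * sizeof(*table));
    if (grown == NULL) {
        return -ENOMEM;
    }
    table = grown;

    if (type != E820_RAM) {
        reserve.entry[reserve.count++] = e;
    }
    table[table_len++] = e;
    return (int)table_len;
}

/* Succeeds only if entry idx exists and has the requested type. */
static bool get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length)
{
    assert(address != NULL && length != NULL);
    if (idx < 0 || (unsigned)idx >= table_len || table[idx].type != type) {
        return false;
    }
    *address = table[idx].address;
    *length = table[idx].length;
    return true;
}
```

Because get_entry() filters by type, a caller scanning for RAM walks indices 0 to the entry count and skips the reserved slots.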



* [PATCH v4 3/8] hw/virtio: Factorize virtio-mmio headers
  2019-09-24 12:44 ` Sergio Lopez
@ 2019-09-24 12:44   ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-24 12:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, philmd,
	lersek, kraxel, mtosatti, kvm, Sergio Lopez

Move the QOM macros and the main struct definition into a separate
header file so they can be accessed from other components.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/virtio/virtio-mmio.c         | 35 +------------------
 include/hw/virtio/virtio-mmio.h | 60 +++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+), 34 deletions(-)
 create mode 100644 include/hw/virtio/virtio-mmio.h

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index eccc795f28..6be6b298d5 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -29,44 +29,11 @@
 #include "qemu/host-utils.h"
 #include "qemu/module.h"
 #include "sysemu/kvm.h"
-#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/virtio-mmio.h"
 #include "qemu/error-report.h"
 #include "qemu/log.h"
 #include "trace.h"
 
-/* QOM macros */
-/* virtio-mmio-bus */
-#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
-#define VIRTIO_MMIO_BUS(obj) \
-        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
-#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
-        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
-#define VIRTIO_MMIO_BUS_CLASS(klass) \
-        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
-
-/* virtio-mmio */
-#define TYPE_VIRTIO_MMIO "virtio-mmio"
-#define VIRTIO_MMIO(obj) \
-        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
-
-#define VIRT_MAGIC 0x74726976 /* 'virt' */
-#define VIRT_VERSION 1
-#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
-
-typedef struct {
-    /* Generic */
-    SysBusDevice parent_obj;
-    MemoryRegion iomem;
-    qemu_irq irq;
-    /* Guest accessible state needing migration and reset */
-    uint32_t host_features_sel;
-    uint32_t guest_features_sel;
-    uint32_t guest_page_shift;
-    /* virtio-bus */
-    VirtioBusState bus;
-    bool format_transport_address;
-} VirtIOMMIOProxy;
-
 static bool virtio_mmio_ioeventfd_enabled(DeviceState *d)
 {
     return kvm_eventfds_enabled();
diff --git a/include/hw/virtio/virtio-mmio.h b/include/hw/virtio/virtio-mmio.h
new file mode 100644
index 0000000000..2f3973f8c7
--- /dev/null
+++ b/include/hw/virtio/virtio-mmio.h
@@ -0,0 +1,60 @@
+/*
+ * Virtio MMIO bindings
+ *
+ * Copyright (c) 2011 Linaro Limited
+ *
+ * Author:
+ *  Peter Maydell <peter.maydell@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_VIRTIO_MMIO_H
+#define QEMU_VIRTIO_MMIO_H
+
+#include "hw/virtio/virtio-bus.h"
+
+/* QOM macros */
+/* virtio-mmio-bus */
+#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
+#define VIRTIO_MMIO_BUS(obj) \
+        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
+#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
+        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
+#define VIRTIO_MMIO_BUS_CLASS(klass) \
+        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
+
+/* virtio-mmio */
+#define TYPE_VIRTIO_MMIO "virtio-mmio"
+#define VIRTIO_MMIO(obj) \
+        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
+
+#define VIRT_MAGIC 0x74726976 /* 'virt' */
+#define VIRT_VERSION 1
+#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
+
+typedef struct {
+    /* Generic */
+    SysBusDevice parent_obj;
+    MemoryRegion iomem;
+    qemu_irq irq;
+    /* Guest accessible state needing migration and reset */
+    uint32_t host_features_sel;
+    uint32_t guest_features_sel;
+    uint32_t guest_page_shift;
+    /* virtio-bus */
+    VirtioBusState bus;
+    bool format_transport_address;
+} VirtIOMMIOProxy;
+
+#endif
-- 
2.21.0


* [PATCH v4 4/8] hw/i386: split PCMachineState deriving X86MachineState from it
  2019-09-24 12:44 ` Sergio Lopez
@ 2019-09-24 12:44 ` Sergio Lopez
  2019-09-24 13:40   ` Philippe Mathieu-Daudé
  -1 siblings, 1 reply; 133+ messages in thread
From: Sergio Lopez @ 2019-09-24 12:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: Sergio Lopez, ehabkost, kvm, mst, lersek, mtosatti, kraxel,
	pbonzini, imammedo, philmd, rth

Split up PCMachineState and PCMachineClass, moving the parts that are
not PC-specific into the new X86MachineState and X86MachineClass, from
which the PC types now derive. This allows sharing code with non-PC
machine types.

Also, move shared functions from pc.c to x86.c.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/acpi/cpu_hotplug.c |  10 +-
 hw/i386/Makefile.objs |   1 +
 hw/i386/acpi-build.c  |  31 +-
 hw/i386/amd_iommu.c   |   4 +-
 hw/i386/intel_iommu.c |   4 +-
 hw/i386/pc.c          | 796 +++++-------------------------------------
 hw/i386/pc_piix.c     |  48 +--
 hw/i386/pc_q35.c      |  38 +-
 hw/i386/pc_sysfw.c    |  60 +---
 hw/i386/x86.c         | 788 +++++++++++++++++++++++++++++++++++++++++
 hw/intc/ioapic.c      |   3 +-
 include/hw/i386/pc.h  |  29 +-
 include/hw/i386/x86.h |  97 +++++
 13 files changed, 1045 insertions(+), 864 deletions(-)
 create mode 100644 hw/i386/x86.c
 create mode 100644 include/hw/i386/x86.h

diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c
index 6e8293aac9..3ac2045a95 100644
--- a/hw/acpi/cpu_hotplug.c
+++ b/hw/acpi/cpu_hotplug.c
@@ -128,7 +128,7 @@ void build_legacy_cpu_hotplug_aml(Aml *ctx, MachineState *machine,
     Aml *one = aml_int(1);
     MachineClass *mc = MACHINE_GET_CLASS(machine);
     const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
-    PCMachineState *pcms = PC_MACHINE(machine);
+    X86MachineState *x86ms = X86_MACHINE(machine);
 
     /*
      * _MAT method - creates an madt apic buffer
@@ -236,9 +236,9 @@ void build_legacy_cpu_hotplug_aml(Aml *ctx, MachineState *machine,
     /* The current AML generator can cover the APIC ID range [0..255],
      * inclusive, for VCPU hotplug. */
     QEMU_BUILD_BUG_ON(ACPI_CPU_HOTPLUG_ID_LIMIT > 256);
-    if (pcms->apic_id_limit > ACPI_CPU_HOTPLUG_ID_LIMIT) {
+    if (x86ms->apic_id_limit > ACPI_CPU_HOTPLUG_ID_LIMIT) {
         error_report("max_cpus is too large. APIC ID of last CPU is %u",
-                     pcms->apic_id_limit - 1);
+                     x86ms->apic_id_limit - 1);
         exit(1);
     }
 
@@ -315,8 +315,8 @@ void build_legacy_cpu_hotplug_aml(Aml *ctx, MachineState *machine,
      * ith up to 255 elements. Windows guests up to win2k8 fail when
      * VarPackageOp is used.
      */
-    pkg = pcms->apic_id_limit <= 255 ? aml_package(pcms->apic_id_limit) :
-                                       aml_varpackage(pcms->apic_id_limit);
+    pkg = x86ms->apic_id_limit <= 255 ? aml_package(x86ms->apic_id_limit) :
+                                        aml_varpackage(x86ms->apic_id_limit);
 
     for (i = 0, apic_idx = 0; i < apic_ids->len; i++) {
         int apic_id = apic_ids->cpus[i].arch_id;
diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 149712db07..5b4b3a672e 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -1,6 +1,7 @@
 obj-$(CONFIG_KVM) += kvm/
 obj-y += multiboot.o
 obj-y += pvh.o
+obj-y += x86.o
 obj-y += pc.o
 obj-y += e820.o
 obj-$(CONFIG_I440FX) += pc_piix.o
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index e54e571a75..76e18d3285 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -29,6 +29,7 @@
 #include "hw/pci/pci.h"
 #include "hw/core/cpu.h"
 #include "target/i386/cpu.h"
+#include "hw/i386/x86.h"
 #include "hw/misc/pvpanic.h"
 #include "hw/timer/hpet.h"
 #include "hw/acpi/acpi-defs.h"
@@ -361,6 +362,7 @@ static void
 build_madt(GArray *table_data, BIOSLinker *linker, PCMachineState *pcms)
 {
     MachineClass *mc = MACHINE_GET_CLASS(pcms);
+    X86MachineState *x86ms = X86_MACHINE(pcms);
     const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(MACHINE(pcms));
     int madt_start = table_data->len;
     AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(pcms->acpi_dev);
@@ -390,7 +392,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, PCMachineState *pcms)
     io_apic->address = cpu_to_le32(IO_APIC_DEFAULT_ADDRESS);
     io_apic->interrupt = cpu_to_le32(0);
 
-    if (pcms->apic_xrupt_override) {
+    if (x86ms->apic_xrupt_override) {
         intsrcovr = acpi_data_push(table_data, sizeof *intsrcovr);
         intsrcovr->type   = ACPI_APIC_XRUPT_OVERRIDE;
         intsrcovr->length = sizeof(*intsrcovr);
@@ -1817,8 +1819,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
     CrsRangeEntry *entry;
     Aml *dsdt, *sb_scope, *scope, *dev, *method, *field, *pkg, *crs;
     CrsRangeSet crs_range_set;
-    PCMachineState *pcms = PC_MACHINE(machine);
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
+    X86MachineState *x86ms = X86_MACHINE(machine);
     AcpiMcfgInfo mcfg;
     uint32_t nr_mem = machine->ram_slots;
     int root_bus_limit = 0xFF;
@@ -2083,7 +2085,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
          * with half of the 16-bit control register. Hence, the total size
          * of the i/o region used is FW_CFG_CTL_SIZE; when using DMA, the
          * DMA control register is located at FW_CFG_DMA_IO_BASE + 4 */
-        uint8_t io_size = object_property_get_bool(OBJECT(pcms->fw_cfg),
+        uint8_t io_size = object_property_get_bool(OBJECT(x86ms->fw_cfg),
                                                    "dma_enabled", NULL) ?
                           ROUND_UP(FW_CFG_CTL_SIZE, 4) + sizeof(dma_addr_t) :
                           FW_CFG_CTL_SIZE;
@@ -2318,6 +2320,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
     MachineClass *mc = MACHINE_GET_CLASS(machine);
     const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
     PCMachineState *pcms = PC_MACHINE(machine);
+    X86MachineState *x86ms = X86_MACHINE(machine);
     ram_addr_t hotplugabble_address_space_size =
         object_property_get_int(OBJECT(pcms), PC_MACHINE_DEVMEM_REGION_SIZE,
                                 NULL);
@@ -2386,16 +2389,16 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
         }
 
         /* Cut out the ACPI_PCI hole */
-        if (mem_base <= pcms->below_4g_mem_size &&
-            next_base > pcms->below_4g_mem_size) {
-            mem_len -= next_base - pcms->below_4g_mem_size;
+        if (mem_base <= x86ms->below_4g_mem_size &&
+            next_base > x86ms->below_4g_mem_size) {
+            mem_len -= next_base - x86ms->below_4g_mem_size;
             if (mem_len > 0) {
                 numamem = acpi_data_push(table_data, sizeof *numamem);
                 build_srat_memory(numamem, mem_base, mem_len, i - 1,
                                   MEM_AFFINITY_ENABLED);
             }
             mem_base = 1ULL << 32;
-            mem_len = next_base - pcms->below_4g_mem_size;
+            mem_len = next_base - x86ms->below_4g_mem_size;
             next_base = mem_base + mem_len;
         }
 
@@ -2614,6 +2617,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
 {
     PCMachineState *pcms = PC_MACHINE(machine);
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+    X86MachineState *x86ms = X86_MACHINE(machine);
     GArray *table_offsets;
     unsigned facs, dsdt, rsdt, fadt;
     AcpiPmInfo pm;
@@ -2775,7 +2779,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
          */
         int legacy_aml_len =
             pcmc->legacy_acpi_table_size +
-            ACPI_BUILD_LEGACY_CPU_AML_SIZE * pcms->apic_id_limit;
+            ACPI_BUILD_LEGACY_CPU_AML_SIZE * x86ms->apic_id_limit;
         int legacy_table_size =
             ROUND_UP(tables_blob->len - aml_len + legacy_aml_len,
                      ACPI_BUILD_ALIGN_SIZE);
@@ -2865,13 +2869,14 @@ void acpi_setup(void)
 {
     PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+    X86MachineState *x86ms = X86_MACHINE(pcms);
     AcpiBuildTables tables;
     AcpiBuildState *build_state;
     Object *vmgenid_dev;
     TPMIf *tpm;
     static FwCfgTPMConfig tpm_config;
 
-    if (!pcms->fw_cfg) {
+    if (!x86ms->fw_cfg) {
         ACPI_BUILD_DPRINTF("No fw cfg. Bailing out.\n");
         return;
     }
@@ -2902,7 +2907,7 @@ void acpi_setup(void)
         acpi_add_rom_blob(acpi_build_update, build_state,
                           tables.linker->cmd_blob, "etc/table-loader", 0);
 
-    fw_cfg_add_file(pcms->fw_cfg, ACPI_BUILD_TPMLOG_FILE,
+    fw_cfg_add_file(x86ms->fw_cfg, ACPI_BUILD_TPMLOG_FILE,
                     tables.tcpalog->data, acpi_data_len(tables.tcpalog));
 
     tpm = tpm_find();
@@ -2912,13 +2917,13 @@ void acpi_setup(void)
             .tpm_version = tpm_get_version(tpm),
             .tpmppi_version = TPM_PPI_VERSION_1_30
         };
-        fw_cfg_add_file(pcms->fw_cfg, "etc/tpm/config",
+        fw_cfg_add_file(x86ms->fw_cfg, "etc/tpm/config",
                         &tpm_config, sizeof tpm_config);
     }
 
     vmgenid_dev = find_vmgenid_dev();
     if (vmgenid_dev) {
-        vmgenid_add_fw_cfg(VMGENID(vmgenid_dev), pcms->fw_cfg,
+        vmgenid_add_fw_cfg(VMGENID(vmgenid_dev), x86ms->fw_cfg,
                            tables.vmgenid);
     }
 
@@ -2931,7 +2936,7 @@ void acpi_setup(void)
         uint32_t rsdp_size = acpi_data_len(tables.rsdp);
 
         build_state->rsdp = g_memdup(tables.rsdp->data, rsdp_size);
-        fw_cfg_add_file_callback(pcms->fw_cfg, ACPI_BUILD_RSDP_FILE,
+        fw_cfg_add_file_callback(x86ms->fw_cfg, ACPI_BUILD_RSDP_FILE,
                                  acpi_build_update, NULL, build_state,
                                  build_state->rsdp, rsdp_size, true);
         build_state->rsdp_mr = NULL;
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 08884523e2..bb3b5b4563 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -21,6 +21,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/i386/x86.h"
 #include "hw/i386/pc.h"
 #include "hw/pci/msi.h"
 #include "hw/pci/pci_bus.h"
@@ -1537,6 +1538,7 @@ static void amdvi_realize(DeviceState *dev, Error **err)
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
     MachineState *ms = MACHINE(qdev_get_machine());
     PCMachineState *pcms = PC_MACHINE(ms);
+    X86MachineState *x86ms = X86_MACHINE(ms);
     PCIBus *bus = pcms->bus;
 
     s->iotlb = g_hash_table_new_full(amdvi_uint64_hash,
@@ -1565,7 +1567,7 @@ static void amdvi_realize(DeviceState *dev, Error **err)
     }
 
     /* Pseudo address space under root PCI bus. */
-    pcms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_IOAPIC_SB_DEVID);
+    x86ms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_IOAPIC_SB_DEVID);
 
     /* set up MMIO */
     memory_region_init_io(&s->mmio, OBJECT(s), &mmio_mem_ops, s, "amdvi-mmio",
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 75ca6f9c70..21f091c654 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -29,6 +29,7 @@
 #include "hw/pci/pci.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/qdev-properties.h"
+#include "hw/i386/x86.h"
 #include "hw/i386/pc.h"
 #include "hw/i386/apic-msidef.h"
 #include "hw/boards.h"
@@ -3703,6 +3704,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
     PCMachineState *pcms = PC_MACHINE(ms);
+    X86MachineState *x86ms = X86_MACHINE(ms);
     PCIBus *bus = pcms->bus;
     IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
     X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
@@ -3743,7 +3745,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
     sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
     pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
     /* Pseudo address space under root PCI bus. */
-    pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
+    x86ms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
 }
 
 static void vtd_class_init(ObjectClass *klass, void *data)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3920aa7e85..d18b461f01 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -24,6 +24,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/units.h"
+#include "hw/i386/x86.h"
 #include "hw/i386/pc.h"
 #include "hw/char/serial.h"
 #include "hw/char/parallel.h"
@@ -676,6 +677,7 @@ void pc_cmos_init(PCMachineState *pcms,
                   BusState *idebus0, BusState *idebus1,
                   ISADevice *s)
 {
+    X86MachineState *x86ms = X86_MACHINE(pcms);
     int val;
     static pc_cmos_init_late_arg arg;
 
@@ -683,12 +685,12 @@ void pc_cmos_init(PCMachineState *pcms,
 
     /* memory size */
     /* base memory (first MiB) */
-    val = MIN(pcms->below_4g_mem_size / KiB, 640);
+    val = MIN(x86ms->below_4g_mem_size / KiB, 640);
     rtc_set_memory(s, 0x15, val);
     rtc_set_memory(s, 0x16, val >> 8);
     /* extended memory (next 64MiB) */
-    if (pcms->below_4g_mem_size > 1 * MiB) {
-        val = (pcms->below_4g_mem_size - 1 * MiB) / KiB;
+    if (x86ms->below_4g_mem_size > 1 * MiB) {
+        val = (x86ms->below_4g_mem_size - 1 * MiB) / KiB;
     } else {
         val = 0;
     }
@@ -699,8 +701,8 @@ void pc_cmos_init(PCMachineState *pcms,
     rtc_set_memory(s, 0x30, val);
     rtc_set_memory(s, 0x31, val >> 8);
     /* memory between 16MiB and 4GiB */
-    if (pcms->below_4g_mem_size > 16 * MiB) {
-        val = (pcms->below_4g_mem_size - 16 * MiB) / (64 * KiB);
+    if (x86ms->below_4g_mem_size > 16 * MiB) {
+        val = (x86ms->below_4g_mem_size - 16 * MiB) / (64 * KiB);
     } else {
         val = 0;
     }
@@ -709,20 +711,20 @@ void pc_cmos_init(PCMachineState *pcms,
     rtc_set_memory(s, 0x34, val);
     rtc_set_memory(s, 0x35, val >> 8);
     /* memory above 4GiB */
-    val = pcms->above_4g_mem_size / 65536;
+    val = x86ms->above_4g_mem_size / 65536;
     rtc_set_memory(s, 0x5b, val);
     rtc_set_memory(s, 0x5c, val >> 8);
     rtc_set_memory(s, 0x5d, val >> 16);
 
-    object_property_add_link(OBJECT(pcms), "rtc_state",
+    object_property_add_link(OBJECT(x86ms), "rtc_state",
                              TYPE_ISA_DEVICE,
-                             (Object **)&pcms->rtc,
+                             (Object **)&x86ms->rtc,
                              object_property_allow_set_link,
                              OBJ_PROP_LINK_STRONG, &error_abort);
-    object_property_set_link(OBJECT(pcms), OBJECT(s),
+    object_property_set_link(OBJECT(x86ms), OBJECT(s),
                              "rtc_state", &error_abort);
 
-    set_boot_dev(s, MACHINE(pcms)->boot_order, &error_fatal);
+    set_boot_dev(s, MACHINE(x86ms)->boot_order, &error_fatal);
 
     val = 0;
     val |= 0x02; /* FPU is there */
@@ -863,35 +865,6 @@ static void handle_a20_line_change(void *opaque, int irq, int level)
     x86_cpu_set_a20(cpu, level);
 }
 
-/* Calculates initial APIC ID for a specific CPU index
- *
- * Currently we need to be able to calculate the APIC ID from the CPU index
- * alone (without requiring a CPU object), as the QEMU<->Seabios interfaces have
- * no concept of "CPU index", and the NUMA tables on fw_cfg need the APIC ID of
- * all CPUs up to max_cpus.
- */
-static uint32_t x86_cpu_apic_id_from_index(PCMachineState *pcms,
-                                           unsigned int cpu_index)
-{
-    MachineState *ms = MACHINE(pcms);
-    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
-    uint32_t correct_id;
-    static bool warned;
-
-    correct_id = x86_apicid_from_cpu_idx(pcms->smp_dies, ms->smp.cores,
-                                         ms->smp.threads, cpu_index);
-    if (pcmc->compat_apic_id_mode) {
-        if (cpu_index != correct_id && !warned && !qtest_enabled()) {
-            error_report("APIC IDs set in compatibility mode, "
-                         "CPU topology won't match the configuration");
-            warned = true;
-        }
-        return cpu_index;
-    } else {
-        return correct_id;
-    }
-}
-
 static void pc_build_smbios(PCMachineState *pcms)
 {
     uint8_t *smbios_tables, *smbios_anchor;
@@ -899,6 +872,7 @@ static void pc_build_smbios(PCMachineState *pcms)
     struct smbios_phys_mem_area *mem_array;
     unsigned i, array_count;
     MachineState *ms = MACHINE(pcms);
+    X86MachineState *x86ms = X86_MACHINE(pcms);
     X86CPU *cpu = X86_CPU(ms->possible_cpus->cpus[0].cpu);
 
     /* tell smbios about cpuid version and features */
@@ -906,7 +880,7 @@ static void pc_build_smbios(PCMachineState *pcms)
 
     smbios_tables = smbios_get_table_legacy(ms, &smbios_tables_len);
     if (smbios_tables) {
-        fw_cfg_add_bytes(pcms->fw_cfg, FW_CFG_SMBIOS_ENTRIES,
+        fw_cfg_add_bytes(x86ms->fw_cfg, FW_CFG_SMBIOS_ENTRIES,
                          smbios_tables, smbios_tables_len);
     }
 
@@ -927,9 +901,9 @@ static void pc_build_smbios(PCMachineState *pcms)
     g_free(mem_array);
 
     if (smbios_anchor) {
-        fw_cfg_add_file(pcms->fw_cfg, "etc/smbios/smbios-tables",
+        fw_cfg_add_file(x86ms->fw_cfg, "etc/smbios/smbios-tables",
                         smbios_tables, smbios_tables_len);
-        fw_cfg_add_file(pcms->fw_cfg, "etc/smbios/smbios-anchor",
+        fw_cfg_add_file(x86ms->fw_cfg, "etc/smbios/smbios-anchor",
                         smbios_anchor, smbios_anchor_len);
     }
 }
@@ -942,10 +916,11 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
     const CPUArchIdList *cpus;
     MachineClass *mc = MACHINE_GET_CLASS(pcms);
     MachineState *ms = MACHINE(pcms);
+    X86MachineState *x86ms = X86_MACHINE(pcms);
     int nb_numa_nodes = ms->numa_state->num_nodes;
 
     fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as);
-    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
 
     /* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86:
      *
@@ -959,7 +934,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
      * So for compatibility reasons with old BIOSes we are stuck with
      * "etc/max-cpus" actually being apic_id_limit
      */
-    fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)pcms->apic_id_limit);
+    fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)x86ms->apic_id_limit);
     fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
     fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES,
                      acpi_tables, acpi_tables_len);
@@ -972,374 +947,25 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
      * of nodes, one word for each VCPU->node and one word for each node to
      * hold the amount of memory.
      */
-    numa_fw_cfg = g_new0(uint64_t, 1 + pcms->apic_id_limit + nb_numa_nodes);
+    numa_fw_cfg = g_new0(uint64_t, 1 + x86ms->apic_id_limit + nb_numa_nodes);
     numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
     cpus = mc->possible_cpu_arch_ids(MACHINE(pcms));
     for (i = 0; i < cpus->len; i++) {
         unsigned int apic_id = cpus->cpus[i].arch_id;
-        assert(apic_id < pcms->apic_id_limit);
+        assert(apic_id < x86ms->apic_id_limit);
         numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);
     }
     for (i = 0; i < nb_numa_nodes; i++) {
-        numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
+        numa_fw_cfg[x86ms->apic_id_limit + 1 + i] =
             cpu_to_le64(ms->numa_state->nodes[i].node_mem);
     }
     fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
-                     (1 + pcms->apic_id_limit + nb_numa_nodes) *
+                     (1 + x86ms->apic_id_limit + nb_numa_nodes) *
                      sizeof(*numa_fw_cfg));
 
     return fw_cfg;
 }
 
-static long get_file_size(FILE *f)
-{
-    long where, size;
-
-    /* XXX: on Unix systems, using fstat() probably makes more sense */
-
-    where = ftell(f);
-    fseek(f, 0, SEEK_END);
-    size = ftell(f);
-    fseek(f, where, SEEK_SET);
-
-    return size;
-}
-
-struct setup_data {
-    uint64_t next;
-    uint32_t type;
-    uint32_t len;
-    uint8_t data[0];
-} __attribute__((packed));
-
-static void load_linux(PCMachineState *pcms,
-                       FWCfgState *fw_cfg)
-{
-    uint16_t protocol;
-    int setup_size, kernel_size, cmdline_size;
-    int dtb_size, setup_data_offset;
-    uint32_t initrd_max;
-    uint8_t header[8192], *setup, *kernel;
-    hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0;
-    FILE *f;
-    char *vmode;
-    MachineState *machine = MACHINE(pcms);
-    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
-    struct setup_data *setup_data;
-    const char *kernel_filename = machine->kernel_filename;
-    const char *initrd_filename = machine->initrd_filename;
-    const char *dtb_filename = machine->dtb;
-    const char *kernel_cmdline = machine->kernel_cmdline;
-
-    /* Align to 16 bytes as a paranoia measure */
-    cmdline_size = (strlen(kernel_cmdline)+16) & ~15;
-
-    /* load the kernel header */
-    f = fopen(kernel_filename, "rb");
-    if (!f || !(kernel_size = get_file_size(f)) ||
-        fread(header, 1, MIN(ARRAY_SIZE(header), kernel_size), f) !=
-        MIN(ARRAY_SIZE(header), kernel_size)) {
-        fprintf(stderr, "qemu: could not load kernel '%s': %s\n",
-                kernel_filename, strerror(errno));
-        exit(1);
-    }
-
-    /* kernel protocol version */
-#if 0
-    fprintf(stderr, "header magic: %#x\n", ldl_p(header+0x202));
-#endif
-    if (ldl_p(header+0x202) == 0x53726448) {
-        protocol = lduw_p(header+0x206);
-    } else {
-        size_t pvh_start_addr;
-        uint32_t mh_load_addr = 0;
-        uint32_t elf_kernel_size = 0;
-        /*
-         * This could be a multiboot kernel. If it is, let's stop treating it
-         * like a Linux kernel.
-         * Note: some multiboot images could be in the ELF format (the same of
-         * PVH), so we try multiboot first since we check the multiboot magic
-         * header before to load it.
-         */
-        if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
-                           kernel_cmdline, kernel_size, header)) {
-            return;
-        }
-        /*
-         * Check if the file is an uncompressed kernel file (ELF) and load it,
-         * saving the PVH entry point used by the x86/HVM direct boot ABI.
-         * If load_elfboot() is successful, populate the fw_cfg info.
-         */
-        if (pcmc->pvh_enabled &&
-            pvh_load_elfboot(kernel_filename,
-                             &mh_load_addr, &elf_kernel_size)) {
-            fclose(f);
-
-            pvh_start_addr = pvh_get_start_addr();
-
-            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
-            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
-            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
-
-            fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
-                strlen(kernel_cmdline) + 1);
-            fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
-
-            fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
-            fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
-                             header, sizeof(header));
-
-            /* load initrd */
-            if (initrd_filename) {
-                GMappedFile *mapped_file;
-                gsize initrd_size;
-                gchar *initrd_data;
-                GError *gerr = NULL;
-
-                mapped_file = g_mapped_file_new(initrd_filename, false, &gerr);
-                if (!mapped_file) {
-                    fprintf(stderr, "qemu: error reading initrd %s: %s\n",
-                            initrd_filename, gerr->message);
-                    exit(1);
-                }
-                pcms->initrd_mapped_file = mapped_file;
-
-                initrd_data = g_mapped_file_get_contents(mapped_file);
-                initrd_size = g_mapped_file_get_length(mapped_file);
-                initrd_max = pcms->below_4g_mem_size - pcmc->acpi_data_size - 1;
-                if (initrd_size >= initrd_max) {
-                    fprintf(stderr, "qemu: initrd is too large, cannot support."
-                            "(max: %"PRIu32", need %"PRId64")\n",
-                            initrd_max, (uint64_t)initrd_size);
-                    exit(1);
-                }
-
-                initrd_addr = (initrd_max - initrd_size) & ~4095;
-
-                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
-                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
-                fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data,
-                                 initrd_size);
-            }
-
-            option_rom[nb_option_roms].bootindex = 0;
-            option_rom[nb_option_roms].name = "pvh.bin";
-            nb_option_roms++;
-
-            return;
-        }
-        protocol = 0;
-    }
-
-    if (protocol < 0x200 || !(header[0x211] & 0x01)) {
-        /* Low kernel */
-        real_addr    = 0x90000;
-        cmdline_addr = 0x9a000 - cmdline_size;
-        prot_addr    = 0x10000;
-    } else if (protocol < 0x202) {
-        /* High but ancient kernel */
-        real_addr    = 0x90000;
-        cmdline_addr = 0x9a000 - cmdline_size;
-        prot_addr    = 0x100000;
-    } else {
-        /* High and recent kernel */
-        real_addr    = 0x10000;
-        cmdline_addr = 0x20000;
-        prot_addr    = 0x100000;
-    }
-
-#if 0
-    fprintf(stderr,
-            "qemu: real_addr     = 0x" TARGET_FMT_plx "\n"
-            "qemu: cmdline_addr  = 0x" TARGET_FMT_plx "\n"
-            "qemu: prot_addr     = 0x" TARGET_FMT_plx "\n",
-            real_addr,
-            cmdline_addr,
-            prot_addr);
-#endif
-
-    /* highest address for loading the initrd */
-    if (protocol >= 0x20c &&
-        lduw_p(header+0x236) & XLF_CAN_BE_LOADED_ABOVE_4G) {
-        /*
-         * Linux has supported initrd up to 4 GB for a very long time (2007,
-         * long before XLF_CAN_BE_LOADED_ABOVE_4G which was added in 2013),
-         * though it only sets initrd_max to 2 GB to "work around bootloader
-         * bugs". Luckily, QEMU firmware(which does something like bootloader)
-         * has supported this.
-         *
-         * It's believed that if XLF_CAN_BE_LOADED_ABOVE_4G is set, initrd can
-         * be loaded into any address.
-         *
-         * In addition, initrd_max is uint32_t simply because QEMU doesn't
-         * support the 64-bit boot protocol (specifically the ext_ramdisk_image
-         * field).
-         *
-         * Therefore here just limit initrd_max to UINT32_MAX simply as well.
-         */
-        initrd_max = UINT32_MAX;
-    } else if (protocol >= 0x203) {
-        initrd_max = ldl_p(header+0x22c);
-    } else {
-        initrd_max = 0x37ffffff;
-    }
-
-    if (initrd_max >= pcms->below_4g_mem_size - pcmc->acpi_data_size) {
-        initrd_max = pcms->below_4g_mem_size - pcmc->acpi_data_size - 1;
-    }
-
-    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
-    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline)+1);
-    fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
-
-    if (protocol >= 0x202) {
-        stl_p(header+0x228, cmdline_addr);
-    } else {
-        stw_p(header+0x20, 0xA33F);
-        stw_p(header+0x22, cmdline_addr-real_addr);
-    }
-
-    /* handle vga= parameter */
-    vmode = strstr(kernel_cmdline, "vga=");
-    if (vmode) {
-        unsigned int video_mode;
-        /* skip "vga=" */
-        vmode += 4;
-        if (!strncmp(vmode, "normal", 6)) {
-            video_mode = 0xffff;
-        } else if (!strncmp(vmode, "ext", 3)) {
-            video_mode = 0xfffe;
-        } else if (!strncmp(vmode, "ask", 3)) {
-            video_mode = 0xfffd;
-        } else {
-            video_mode = strtol(vmode, NULL, 0);
-        }
-        stw_p(header+0x1fa, video_mode);
-    }
-
-    /* loader type */
-    /* High nybble = B reserved for QEMU; low nybble is revision number.
-       If this code is substantially changed, you may want to consider
-       incrementing the revision. */
-    if (protocol >= 0x200) {
-        header[0x210] = 0xB0;
-    }
-    /* heap */
-    if (protocol >= 0x201) {
-        header[0x211] |= 0x80;	/* CAN_USE_HEAP */
-        stw_p(header+0x224, cmdline_addr-real_addr-0x200);
-    }
-
-    /* load initrd */
-    if (initrd_filename) {
-        GMappedFile *mapped_file;
-        gsize initrd_size;
-        gchar *initrd_data;
-        GError *gerr = NULL;
-
-        if (protocol < 0x200) {
-            fprintf(stderr, "qemu: linux kernel too old to load a ram disk\n");
-            exit(1);
-        }
-
-        mapped_file = g_mapped_file_new(initrd_filename, false, &gerr);
-        if (!mapped_file) {
-            fprintf(stderr, "qemu: error reading initrd %s: %s\n",
-                    initrd_filename, gerr->message);
-            exit(1);
-        }
-        pcms->initrd_mapped_file = mapped_file;
-
-        initrd_data = g_mapped_file_get_contents(mapped_file);
-        initrd_size = g_mapped_file_get_length(mapped_file);
-        if (initrd_size >= initrd_max) {
-            fprintf(stderr, "qemu: initrd is too large, cannot support."
-                    "(max: %"PRIu32", need %"PRId64")\n",
-                    initrd_max, (uint64_t)initrd_size);
-            exit(1);
-        }
-
-        initrd_addr = (initrd_max-initrd_size) & ~4095;
-
-        fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
-        fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
-        fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data, initrd_size);
-
-        stl_p(header+0x218, initrd_addr);
-        stl_p(header+0x21c, initrd_size);
-    }
-
-    /* load kernel and setup */
-    setup_size = header[0x1f1];
-    if (setup_size == 0) {
-        setup_size = 4;
-    }
-    setup_size = (setup_size+1)*512;
-    if (setup_size > kernel_size) {
-        fprintf(stderr, "qemu: invalid kernel header\n");
-        exit(1);
-    }
-    kernel_size -= setup_size;
-
-    setup  = g_malloc(setup_size);
-    kernel = g_malloc(kernel_size);
-    fseek(f, 0, SEEK_SET);
-    if (fread(setup, 1, setup_size, f) != setup_size) {
-        fprintf(stderr, "fread() failed\n");
-        exit(1);
-    }
-    if (fread(kernel, 1, kernel_size, f) != kernel_size) {
-        fprintf(stderr, "fread() failed\n");
-        exit(1);
-    }
-    fclose(f);
-
-    /* append dtb to kernel */
-    if (dtb_filename) {
-        if (protocol < 0x209) {
-            fprintf(stderr, "qemu: Linux kernel too old to load a dtb\n");
-            exit(1);
-        }
-
-        dtb_size = get_image_size(dtb_filename);
-        if (dtb_size <= 0) {
-            fprintf(stderr, "qemu: error reading dtb %s: %s\n",
-                    dtb_filename, strerror(errno));
-            exit(1);
-        }
-
-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-        kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size;
-        kernel = g_realloc(kernel, kernel_size);
-
-        stq_p(header+0x250, prot_addr + setup_data_offset);
-
-        setup_data = (struct setup_data *)(kernel + setup_data_offset);
-        setup_data->next = 0;
-        setup_data->type = cpu_to_le32(SETUP_DTB);
-        setup_data->len = cpu_to_le32(dtb_size);
-
-        load_image_size(dtb_filename, setup_data->data, dtb_size);
-    }
-
-    memcpy(setup, header, MIN(sizeof(header), setup_size));
-
-    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
-    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size);
-    fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel, kernel_size);
-
-    fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_ADDR, real_addr);
-    fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, setup_size);
-    fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
-
-    option_rom[nb_option_roms].bootindex = 0;
-    option_rom[nb_option_roms].name = "linuxboot.bin";
-    if (pcmc->linuxboot_dma_enabled && fw_cfg_dma_enabled(fw_cfg)) {
-        option_rom[nb_option_roms].name = "linuxboot_dma.bin";
-    }
-    nb_option_roms++;
-}
-
 #define NE2000_NB_MAX 6
 
 static const int ne2000_io[NE2000_NB_MAX] = { 0x300, 0x320, 0x340, 0x360,
@@ -1376,157 +1002,10 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level)
     }
 }
 
-static void pc_new_cpu(PCMachineState *pcms, int64_t apic_id, Error **errp)
-{
-    Object *cpu = NULL;
-    Error *local_err = NULL;
-    CPUX86State *env = NULL;
-
-    cpu = object_new(MACHINE(pcms)->cpu_type);
-
-    env = &X86_CPU(cpu)->env;
-    env->nr_dies = pcms->smp_dies;
-
-    object_property_set_uint(cpu, apic_id, "apic-id", &local_err);
-    object_property_set_bool(cpu, true, "realized", &local_err);
-
-    object_unref(cpu);
-    error_propagate(errp, local_err);
-}
-
-/*
- * This function is very similar to smp_parse()
- * in hw/core/machine.c but includes CPU die support.
- */
-void pc_smp_parse(MachineState *ms, QemuOpts *opts)
-{
-    PCMachineState *pcms = PC_MACHINE(ms);
-
-    if (opts) {
-        unsigned cpus    = qemu_opt_get_number(opts, "cpus", 0);
-        unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
-        unsigned dies = qemu_opt_get_number(opts, "dies", 1);
-        unsigned cores   = qemu_opt_get_number(opts, "cores", 0);
-        unsigned threads = qemu_opt_get_number(opts, "threads", 0);
-
-        /* compute missing values, prefer sockets over cores over threads */
-        if (cpus == 0 || sockets == 0) {
-            cores = cores > 0 ? cores : 1;
-            threads = threads > 0 ? threads : 1;
-            if (cpus == 0) {
-                sockets = sockets > 0 ? sockets : 1;
-                cpus = cores * threads * dies * sockets;
-            } else {
-                ms->smp.max_cpus =
-                        qemu_opt_get_number(opts, "maxcpus", cpus);
-                sockets = ms->smp.max_cpus / (cores * threads * dies);
-            }
-        } else if (cores == 0) {
-            threads = threads > 0 ? threads : 1;
-            cores = cpus / (sockets * dies * threads);
-            cores = cores > 0 ? cores : 1;
-        } else if (threads == 0) {
-            threads = cpus / (cores * dies * sockets);
-            threads = threads > 0 ? threads : 1;
-        } else if (sockets * dies * cores * threads < cpus) {
-            error_report("cpu topology: "
-                         "sockets (%u) * dies (%u) * cores (%u) * threads (%u) < "
-                         "smp_cpus (%u)",
-                         sockets, dies, cores, threads, cpus);
-            exit(1);
-        }
-
-        ms->smp.max_cpus =
-                qemu_opt_get_number(opts, "maxcpus", cpus);
-
-        if (ms->smp.max_cpus < cpus) {
-            error_report("maxcpus must be equal to or greater than smp");
-            exit(1);
-        }
-
-        if (sockets * dies * cores * threads > ms->smp.max_cpus) {
-            error_report("cpu topology: "
-                         "sockets (%u) * dies (%u) * cores (%u) * threads (%u) > "
-                         "maxcpus (%u)",
-                         sockets, dies, cores, threads,
-                         ms->smp.max_cpus);
-            exit(1);
-        }
-
-        if (sockets * dies * cores * threads != ms->smp.max_cpus) {
-            warn_report("Invalid CPU topology deprecated: "
-                        "sockets (%u) * dies (%u) * cores (%u) * threads (%u) "
-                        "!= maxcpus (%u)",
-                        sockets, dies, cores, threads,
-                        ms->smp.max_cpus);
-        }
-
-        ms->smp.cpus = cpus;
-        ms->smp.cores = cores;
-        ms->smp.threads = threads;
-        pcms->smp_dies = dies;
-    }
-
-    if (ms->smp.cpus > 1) {
-        Error *blocker = NULL;
-        error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp");
-        replay_add_blocker(blocker);
-    }
-}
-
-void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp)
-{
-    PCMachineState *pcms = PC_MACHINE(ms);
-    int64_t apic_id = x86_cpu_apic_id_from_index(pcms, id);
-    Error *local_err = NULL;
-
-    if (id < 0) {
-        error_setg(errp, "Invalid CPU id: %" PRIi64, id);
-        return;
-    }
-
-    if (apic_id >= ACPI_CPU_HOTPLUG_ID_LIMIT) {
-        error_setg(errp, "Unable to add CPU: %" PRIi64
-                   ", resulting APIC ID (%" PRIi64 ") is too large",
-                   id, apic_id);
-        return;
-    }
-
-    pc_new_cpu(PC_MACHINE(ms), apic_id, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        return;
-    }
-}
-
-void pc_cpus_init(PCMachineState *pcms)
-{
-    int i;
-    const CPUArchIdList *possible_cpus;
-    MachineState *ms = MACHINE(pcms);
-    MachineClass *mc = MACHINE_GET_CLASS(pcms);
-    PCMachineClass *pcmc = PC_MACHINE_CLASS(mc);
-
-    x86_cpu_set_default_version(pcmc->default_cpu_version);
-
-    /* Calculates the limit to CPU APIC ID values
-     *
-     * Limit for the APIC ID value, so that all
-     * CPU APIC IDs are < pcms->apic_id_limit.
-     *
-     * This is used for FW_CFG_MAX_CPUS. See comments on bochs_bios_init().
-     */
-    pcms->apic_id_limit = x86_cpu_apic_id_from_index(pcms,
-                                                     ms->smp.max_cpus - 1) + 1;
-    possible_cpus = mc->possible_cpu_arch_ids(ms);
-    for (i = 0; i < ms->smp.cpus; i++) {
-        pc_new_cpu(pcms, possible_cpus->cpus[i].arch_id, &error_fatal);
-    }
-}
-
 static void pc_build_feature_control_file(PCMachineState *pcms)
 {
     MachineState *ms = MACHINE(pcms);
+    X86MachineState *x86ms = X86_MACHINE(pcms);
     X86CPU *cpu = X86_CPU(ms->possible_cpus->cpus[0].cpu);
     CPUX86State *env = &cpu->env;
     uint32_t unused, ecx, edx;
@@ -1550,7 +1029,7 @@ static void pc_build_feature_control_file(PCMachineState *pcms)
 
     val = g_malloc(sizeof(*val));
     *val = cpu_to_le64(feature_control_bits | FEATURE_CONTROL_LOCKED);
-    fw_cfg_add_file(pcms->fw_cfg, "etc/msr_feature_control", val, sizeof(*val));
+    fw_cfg_add_file(x86ms->fw_cfg, "etc/msr_feature_control", val, sizeof(*val));
 }
 
 static void rtc_set_cpus_count(ISADevice *rtc, uint16_t cpus_count)
@@ -1571,10 +1050,11 @@ void pc_machine_done(Notifier *notifier, void *data)
 {
     PCMachineState *pcms = container_of(notifier,
                                         PCMachineState, machine_done);
+    X86MachineState *x86ms = X86_MACHINE(pcms);
     PCIBus *bus = pcms->bus;
 
     /* set the number of CPUs */
-    rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
+    rtc_set_cpus_count(x86ms->rtc, x86ms->boot_cpus);
 
     if (bus) {
         int extra_hosts = 0;
@@ -1585,23 +1065,23 @@ void pc_machine_done(Notifier *notifier, void *data)
                 extra_hosts++;
             }
         }
-        if (extra_hosts && pcms->fw_cfg) {
+        if (extra_hosts && x86ms->fw_cfg) {
             uint64_t *val = g_malloc(sizeof(*val));
             *val = cpu_to_le64(extra_hosts);
-            fw_cfg_add_file(pcms->fw_cfg,
+            fw_cfg_add_file(x86ms->fw_cfg,
                     "etc/extra-pci-roots", val, sizeof(*val));
         }
     }
 
     acpi_setup();
-    if (pcms->fw_cfg) {
+    if (x86ms->fw_cfg) {
         pc_build_smbios(pcms);
         pc_build_feature_control_file(pcms);
         /* update FW_CFG_NB_CPUS to account for -device added CPUs */
-        fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+        fw_cfg_modify_i16(x86ms->fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
     }
 
-    if (pcms->apic_id_limit > 255 && !xen_enabled()) {
+    if (x86ms->apic_id_limit > 255 && !xen_enabled()) {
         IntelIOMMUState *iommu = INTEL_IOMMU_DEVICE(x86_iommu_get_default());
 
         if (!iommu || !x86_iommu_ir_supported(X86_IOMMU_DEVICE(iommu)) ||
@@ -1619,8 +1099,9 @@ void pc_guest_info_init(PCMachineState *pcms)
 {
     int i;
     MachineState *ms = MACHINE(pcms);
+    X86MachineState *x86ms = X86_MACHINE(pcms);
 
-    pcms->apic_xrupt_override = kvm_allows_irq0_override();
+    x86ms->apic_xrupt_override = kvm_allows_irq0_override();
     pcms->numa_nodes = ms->numa_state->num_nodes;
     pcms->node_mem = g_malloc0(pcms->numa_nodes *
                                     sizeof *pcms->node_mem);
@@ -1645,14 +1126,17 @@ void xen_load_linux(PCMachineState *pcms)
 {
     int i;
     FWCfgState *fw_cfg;
+    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+    X86MachineState *x86ms = X86_MACHINE(pcms);
 
     assert(MACHINE(pcms)->kernel_filename != NULL);
 
     fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
-    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
     rom_set_fw(fw_cfg);
 
-    load_linux(pcms, fw_cfg);
+    load_linux(x86ms, fw_cfg, pcmc->acpi_data_size,
+               pcmc->linuxboot_dma_enabled, pcmc->pvh_enabled);
     for (i = 0; i < nb_option_roms; i++) {
         assert(!strcmp(option_rom[i].name, "linuxboot.bin") ||
                !strcmp(option_rom[i].name, "linuxboot_dma.bin") ||
@@ -1660,7 +1144,7 @@ void xen_load_linux(PCMachineState *pcms)
                !strcmp(option_rom[i].name, "multiboot.bin"));
         rom_add_option(option_rom[i].name, option_rom[i].bootindex);
     }
-    pcms->fw_cfg = fw_cfg;
+    x86ms->fw_cfg = fw_cfg;
 }
 
 void pc_memory_init(PCMachineState *pcms,
@@ -1673,10 +1157,11 @@ void pc_memory_init(PCMachineState *pcms,
     MemoryRegion *ram_below_4g, *ram_above_4g;
     FWCfgState *fw_cfg;
     MachineState *machine = MACHINE(pcms);
+    X86MachineState *x86ms = X86_MACHINE(pcms);
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 
-    assert(machine->ram_size == pcms->below_4g_mem_size +
-                                pcms->above_4g_mem_size);
+    assert(machine->ram_size == x86ms->below_4g_mem_size +
+                                x86ms->above_4g_mem_size);
 
     linux_boot = (machine->kernel_filename != NULL);
 
@@ -1690,17 +1175,17 @@ void pc_memory_init(PCMachineState *pcms,
     *ram_memory = ram;
     ram_below_4g = g_malloc(sizeof(*ram_below_4g));
     memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
-                             0, pcms->below_4g_mem_size);
+                             0, x86ms->below_4g_mem_size);
     memory_region_add_subregion(system_memory, 0, ram_below_4g);
-    e820_add_entry(0, pcms->below_4g_mem_size, E820_RAM);
-    if (pcms->above_4g_mem_size > 0) {
+    e820_add_entry(0, x86ms->below_4g_mem_size, E820_RAM);
+    if (x86ms->above_4g_mem_size > 0) {
         ram_above_4g = g_malloc(sizeof(*ram_above_4g));
         memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
-                                 pcms->below_4g_mem_size,
-                                 pcms->above_4g_mem_size);
+                                 x86ms->below_4g_mem_size,
+                                 x86ms->above_4g_mem_size);
         memory_region_add_subregion(system_memory, 0x100000000ULL,
                                     ram_above_4g);
-        e820_add_entry(0x100000000ULL, pcms->above_4g_mem_size, E820_RAM);
+        e820_add_entry(0x100000000ULL, x86ms->above_4g_mem_size, E820_RAM);
     }
 
     if (!pcmc->has_reserved_memory &&
@@ -1735,7 +1220,7 @@ void pc_memory_init(PCMachineState *pcms,
         }
 
         machine->device_memory->base =
-            ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1 * GiB);
+            ROUND_UP(0x100000000ULL + x86ms->above_4g_mem_size, 1 * GiB);
 
         if (pcmc->enforce_aligned_dimm) {
             /* size device region assuming 1G page max alignment per slot */
@@ -1786,16 +1271,17 @@ void pc_memory_init(PCMachineState *pcms,
     }
 
     if (linux_boot) {
-        load_linux(pcms, fw_cfg);
+        load_linux(x86ms, fw_cfg, pcmc->acpi_data_size,
+                   pcmc->linuxboot_dma_enabled, pcmc->pvh_enabled);
     }
 
     for (i = 0; i < nb_option_roms; i++) {
         rom_add_option(option_rom[i].name, option_rom[i].bootindex);
     }
-    pcms->fw_cfg = fw_cfg;
+    x86ms->fw_cfg = fw_cfg;
 
     /* Init default IOAPIC address space */
-    pcms->ioapic_as = &address_space_memory;
+    x86ms->ioapic_as = &address_space_memory;
 }
 
 /*
@@ -1807,6 +1293,7 @@ uint64_t pc_pci_hole64_start(void)
     PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
     MachineState *ms = MACHINE(pcms);
+    X86MachineState *x86ms = X86_MACHINE(pcms);
     uint64_t hole64_start = 0;
 
     if (pcmc->has_reserved_memory && ms->device_memory->base) {
@@ -1815,7 +1302,7 @@ uint64_t pc_pci_hole64_start(void)
             hole64_start += memory_region_size(&ms->device_memory->mr);
         }
     } else {
-        hole64_start = 0x100000000ULL + pcms->above_4g_mem_size;
+        hole64_start = 0x100000000ULL + x86ms->above_4g_mem_size;
     }
 
     return ROUND_UP(hole64_start, 1 * GiB);
@@ -2154,6 +1641,7 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev,
     Error *local_err = NULL;
     X86CPU *cpu = X86_CPU(dev);
     PCMachineState *pcms = PC_MACHINE(hotplug_dev);
+    X86MachineState *x86ms = X86_MACHINE(pcms);
 
     if (pcms->acpi_dev) {
         hotplug_handler_plug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err);
@@ -2163,12 +1651,12 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev,
     }
 
     /* increment the number of CPUs */
-    pcms->boot_cpus++;
-    if (pcms->rtc) {
-        rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
+    x86ms->boot_cpus++;
+    if (x86ms->rtc) {
+        rtc_set_cpus_count(x86ms->rtc, x86ms->boot_cpus);
     }
-    if (pcms->fw_cfg) {
-        fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+    if (x86ms->fw_cfg) {
+        fw_cfg_modify_i16(x86ms->fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
     }
 
     found_cpu = pc_find_cpu_slot(MACHINE(pcms), cpu->apic_id, NULL);
@@ -2214,6 +1702,7 @@ static void pc_cpu_unplug_cb(HotplugHandler *hotplug_dev,
     Error *local_err = NULL;
     X86CPU *cpu = X86_CPU(dev);
     PCMachineState *pcms = PC_MACHINE(hotplug_dev);
+    X86MachineState *x86ms = X86_MACHINE(pcms);
 
     hotplug_handler_unplug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err);
     if (local_err) {
@@ -2225,10 +1714,10 @@ static void pc_cpu_unplug_cb(HotplugHandler *hotplug_dev,
     object_property_set_bool(OBJECT(dev), false, "realized", NULL);
 
     /* decrement the number of CPUs */
-    pcms->boot_cpus--;
+    x86ms->boot_cpus--;
     /* Update the number of CPUs in CMOS */
-    rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
-    fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
+    rtc_set_cpus_count(x86ms->rtc, x86ms->boot_cpus);
+    fw_cfg_modify_i16(x86ms->fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
  out:
     error_propagate(errp, local_err);
 }
@@ -2244,6 +1733,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
     CPUX86State *env = &cpu->env;
     MachineState *ms = MACHINE(hotplug_dev);
     PCMachineState *pcms = PC_MACHINE(hotplug_dev);
+    X86MachineState *x86ms = X86_MACHINE(hotplug_dev);
     unsigned int smp_cores = ms->smp.cores;
     unsigned int smp_threads = ms->smp.threads;
 
@@ -2253,7 +1743,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
         return;
     }
 
-    env->nr_dies = pcms->smp_dies;
+    env->nr_dies = x86ms->smp_dies;
 
     /*
      * If APIC ID is not set,
@@ -2261,13 +1751,13 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
      */
     if (cpu->apic_id == UNASSIGNED_APIC_ID) {
         int max_socket = (ms->smp.max_cpus - 1) /
-                                smp_threads / smp_cores / pcms->smp_dies;
+                                smp_threads / smp_cores / x86ms->smp_dies;
 
         /*
          * die-id was optional in QEMU 4.0 and older, so keep it optional
          * if there's only one die per socket.
          */
-        if (cpu->die_id < 0 && pcms->smp_dies == 1) {
+        if (cpu->die_id < 0 && x86ms->smp_dies == 1) {
             cpu->die_id = 0;
         }
 
@@ -2282,9 +1772,9 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
         if (cpu->die_id < 0) {
             error_setg(errp, "CPU die-id is not set");
             return;
-        } else if (cpu->die_id > pcms->smp_dies - 1) {
+        } else if (cpu->die_id > x86ms->smp_dies - 1) {
             error_setg(errp, "Invalid CPU die-id: %u must be in range 0:%u",
-                       cpu->die_id, pcms->smp_dies - 1);
+                       cpu->die_id, x86ms->smp_dies - 1);
             return;
         }
         if (cpu->core_id < 0) {
@@ -2308,7 +1798,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
         topo.die_id = cpu->die_id;
         topo.core_id = cpu->core_id;
         topo.smt_id = cpu->thread_id;
-        cpu->apic_id = apicid_from_topo_ids(pcms->smp_dies, smp_cores,
+        cpu->apic_id = apicid_from_topo_ids(x86ms->smp_dies, smp_cores,
                                             smp_threads, &topo);
     }
 
@@ -2316,7 +1806,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
     if (!cpu_slot) {
         MachineState *ms = MACHINE(pcms);
 
-        x86_topo_ids_from_apicid(cpu->apic_id, pcms->smp_dies,
+        x86_topo_ids_from_apicid(cpu->apic_id, x86ms->smp_dies,
                                  smp_cores, smp_threads, &topo);
         error_setg(errp,
             "Invalid CPU [socket: %u, die: %u, core: %u, thread: %u] with"
@@ -2338,7 +1828,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
     /* TODO: move socket_id/core_id/thread_id checks into x86_cpu_realizefn()
      * once -smp refactoring is complete and there will be CPU private
      * CPUState::nr_cores and CPUState::nr_threads fields instead of globals */
-    x86_topo_ids_from_apicid(cpu->apic_id, pcms->smp_dies,
+    x86_topo_ids_from_apicid(cpu->apic_id, x86ms->smp_dies,
                              smp_cores, smp_threads, &topo);
     if (cpu->socket_id != -1 && cpu->socket_id != topo.pkg_id) {
         error_setg(errp, "property socket-id: %u doesn't match set apic-id:"
@@ -2520,45 +2010,6 @@ pc_machine_get_device_memory_region_size(Object *obj, Visitor *v,
     visit_type_int(v, name, &value, errp);
 }
 
-static void pc_machine_get_max_ram_below_4g(Object *obj, Visitor *v,
-                                            const char *name, void *opaque,
-                                            Error **errp)
-{
-    PCMachineState *pcms = PC_MACHINE(obj);
-    uint64_t value = pcms->max_ram_below_4g;
-
-    visit_type_size(v, name, &value, errp);
-}
-
-static void pc_machine_set_max_ram_below_4g(Object *obj, Visitor *v,
-                                            const char *name, void *opaque,
-                                            Error **errp)
-{
-    PCMachineState *pcms = PC_MACHINE(obj);
-    Error *error = NULL;
-    uint64_t value;
-
-    visit_type_size(v, name, &value, &error);
-    if (error) {
-        error_propagate(errp, error);
-        return;
-    }
-    if (value > 4 * GiB) {
-        error_setg(&error,
-                   "Machine option 'max-ram-below-4g=%"PRIu64
-                   "' expects size less than or equal to 4G", value);
-        error_propagate(errp, error);
-        return;
-    }
-
-    if (value < 1 * MiB) {
-        warn_report("Only %" PRIu64 " bytes of RAM below the 4GiB boundary,"
-                    "BIOS may not work with less than 1MiB", value);
-    }
-
-    pcms->max_ram_below_4g = value;
-}
-
 static void pc_machine_get_vmport(Object *obj, Visitor *v, const char *name,
                                   void *opaque, Error **errp)
 {
@@ -2664,7 +2115,6 @@ static void pc_machine_initfn(Object *obj)
 {
     PCMachineState *pcms = PC_MACHINE(obj);
 
-    pcms->max_ram_below_4g = 0; /* use default */
     pcms->smm = ON_OFF_AUTO_AUTO;
 #ifdef CONFIG_VMPORT
     pcms->vmport = ON_OFF_AUTO_AUTO;
@@ -2676,7 +2126,6 @@ static void pc_machine_initfn(Object *obj)
     pcms->smbus_enabled = true;
     pcms->sata_enabled = true;
     pcms->pit_enabled = true;
-    pcms->smp_dies = 1;
 
     pc_system_flash_create(pcms);
 }
@@ -2707,85 +2156,6 @@ static void pc_machine_wakeup(MachineState *machine)
     cpu_synchronize_all_post_reset();
 }
 
-static CpuInstanceProperties
-pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
-{
-    MachineClass *mc = MACHINE_GET_CLASS(ms);
-    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
-
-    assert(cpu_index < possible_cpus->len);
-    return possible_cpus->cpus[cpu_index].props;
-}
-
-static int64_t pc_get_default_cpu_node_id(const MachineState *ms, int idx)
-{
-   X86CPUTopoInfo topo;
-   PCMachineState *pcms = PC_MACHINE(ms);
-
-   assert(idx < ms->possible_cpus->len);
-   x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id,
-                            pcms->smp_dies, ms->smp.cores,
-                            ms->smp.threads, &topo);
-   return topo.pkg_id % ms->numa_state->num_nodes;
-}
-
-static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
-{
-    PCMachineState *pcms = PC_MACHINE(ms);
-    int i;
-    unsigned int max_cpus = ms->smp.max_cpus;
-
-    if (ms->possible_cpus) {
-        /*
-         * make sure that max_cpus hasn't changed since the first use, i.e.
-         * -smp hasn't been parsed after it
-        */
-        assert(ms->possible_cpus->len == max_cpus);
-        return ms->possible_cpus;
-    }
-
-    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
-                                  sizeof(CPUArchId) * max_cpus);
-    ms->possible_cpus->len = max_cpus;
-    for (i = 0; i < ms->possible_cpus->len; i++) {
-        X86CPUTopoInfo topo;
-
-        ms->possible_cpus->cpus[i].type = ms->cpu_type;
-        ms->possible_cpus->cpus[i].vcpus_count = 1;
-        ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(pcms, i);
-        x86_topo_ids_from_apicid(ms->possible_cpus->cpus[i].arch_id,
-                                 pcms->smp_dies, ms->smp.cores,
-                                 ms->smp.threads, &topo);
-        ms->possible_cpus->cpus[i].props.has_socket_id = true;
-        ms->possible_cpus->cpus[i].props.socket_id = topo.pkg_id;
-        if (pcms->smp_dies > 1) {
-            ms->possible_cpus->cpus[i].props.has_die_id = true;
-            ms->possible_cpus->cpus[i].props.die_id = topo.die_id;
-        }
-        ms->possible_cpus->cpus[i].props.has_core_id = true;
-        ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
-        ms->possible_cpus->cpus[i].props.has_thread_id = true;
-        ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
-    }
-    return ms->possible_cpus;
-}
-
-static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
-{
-    /* cpu index isn't used */
-    CPUState *cs;
-
-    CPU_FOREACH(cs) {
-        X86CPU *cpu = X86_CPU(cs);
-
-        if (!cpu->apic_state) {
-            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
-        } else {
-            apic_deliver_nmi(cpu->apic_state);
-        }
-    }
-}
-
 static void pc_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -2810,14 +2180,11 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
     pcmc->pvh_enabled = true;
     assert(!mc->get_hotplug_handler);
     mc->get_hotplug_handler = pc_get_hotplug_handler;
-    mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
-    mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
-    mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
     mc->auto_enable_numa_with_memhp = true;
     mc->has_hotpluggable_cpus = true;
     mc->default_boot_order = "cad";
-    mc->hot_add_cpu = pc_hot_add_cpu;
-    mc->smp_parse = pc_smp_parse;
+    mc->hot_add_cpu = x86_hot_add_cpu;
+    mc->smp_parse = x86_smp_parse;
     mc->block_default_type = IF_IDE;
     mc->max_cpus = 255;
     mc->reset = pc_machine_reset;
@@ -2835,13 +2202,6 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
         pc_machine_get_device_memory_region_size, NULL,
         NULL, NULL, &error_abort);
 
-    object_class_property_add(oc, PC_MACHINE_MAX_RAM_BELOW_4G, "size",
-        pc_machine_get_max_ram_below_4g, pc_machine_set_max_ram_below_4g,
-        NULL, NULL, &error_abort);
-
-    object_class_property_set_description(oc, PC_MACHINE_MAX_RAM_BELOW_4G,
-        "Maximum ram below the 4G boundary (32bit boundary)", &error_abort);
-
     object_class_property_add(oc, PC_MACHINE_SMM, "OnOffAuto",
         pc_machine_get_smm, pc_machine_set_smm,
         NULL, NULL, &error_abort);
@@ -2866,7 +2226,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 
 static const TypeInfo pc_machine_info = {
     .name = TYPE_PC_MACHINE,
-    .parent = TYPE_MACHINE,
+    .parent = TYPE_X86_MACHINE,
     .abstract = true,
     .instance_size = sizeof(PCMachineState),
     .instance_init = pc_machine_initfn,
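The hunks above move fields such as `boot_cpus`, `fw_cfg`, `below_4g_mem_size` and `above_4g_mem_size` from `PCMachineState` into a new parent state, and reparent `TYPE_PC_MACHINE` onto `TYPE_X86_MACHINE`. PC code then reaches those fields through `X86_MACHINE(pcms)`. The sketch below is only a simplified illustration of that QOM-style single inheritance. All `*Sketch` types and `X86_MACHINE_SKETCH` are hypothetical stand-ins. It assumes the usual QOM convention that the parent struct is embedded as the first member. Real QEMU uses a checked cast (`OBJECT_CHECK`) rather than the plain pointer cast shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the QOM machine state structs. */
typedef struct {
    const char *cpu_type;
} MachineStateSketch;

typedef struct {
    MachineStateSketch parent_obj;    /* parent must be the first member */
    uint64_t below_4g_mem_size;
    uint64_t above_4g_mem_size;
    uint16_t boot_cpus;
} X86MachineStateSketch;

typedef struct {
    X86MachineStateSketch parent_obj; /* parent must be the first member */
    int smbus_enabled;
} PCMachineStateSketch;

/* The upcast below relies on the parent sitting at offset 0. */
_Static_assert(offsetof(PCMachineStateSketch, parent_obj) == 0,
               "parent state must be embedded first");

/* Unchecked upcast; QEMU's real X86_MACHINE() also verifies the type. */
#define X86_MACHINE_SKETCH(obj) ((X86MachineStateSketch *)(obj))

/* Same shape as pc_cpu_plug(): shared x86 fields are updated
 * through the parent view, not through the PC-specific struct. */
static void sketch_cpu_plug(PCMachineStateSketch *pcms)
{
    X86MachineStateSketch *x86ms = X86_MACHINE_SKETCH(pcms);

    x86ms->boot_cpus++;
}
```

Because the child only adds fields after the embedded parent, the PC and microvm machines can share code that takes an `X86MachineState *` without knowing which concrete machine they are running on.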
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 2362675149..f63c27bc74 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -27,6 +27,7 @@
 
 #include "qemu/units.h"
 #include "hw/loader.h"
+#include "hw/i386/x86.h"
 #include "hw/i386/pc.h"
 #include "hw/i386/apic.h"
 #include "hw/display/ramfb.h"
@@ -73,6 +74,7 @@ static void pc_init1(MachineState *machine,
 {
     PCMachineState *pcms = PC_MACHINE(machine);
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+    X86MachineState *x86ms = X86_MACHINE(pcms);
     MemoryRegion *system_memory = get_system_memory();
     MemoryRegion *system_io = get_system_io();
     int i;
@@ -125,11 +127,11 @@ static void pc_init1(MachineState *machine,
     if (xen_enabled()) {
         xen_hvm_init(pcms, &ram_memory);
     } else {
-        if (!pcms->max_ram_below_4g) {
-            pcms->max_ram_below_4g = 0xe0000000; /* default: 3.5G */
+        if (!x86ms->max_ram_below_4g) {
+            x86ms->max_ram_below_4g = 0xe0000000; /* default: 3.5G */
         }
-        lowmem = pcms->max_ram_below_4g;
-        if (machine->ram_size >= pcms->max_ram_below_4g) {
+        lowmem = x86ms->max_ram_below_4g;
+        if (machine->ram_size >= x86ms->max_ram_below_4g) {
             if (pcmc->gigabyte_align) {
                 if (lowmem > 0xc0000000) {
                     lowmem = 0xc0000000;
@@ -138,21 +140,21 @@ static void pc_init1(MachineState *machine,
                     warn_report("Large machine and max_ram_below_4g "
                                 "(%" PRIu64 ") not a multiple of 1G; "
                                 "possible bad performance.",
-                                pcms->max_ram_below_4g);
+                                x86ms->max_ram_below_4g);
                 }
             }
         }
 
         if (machine->ram_size >= lowmem) {
-            pcms->above_4g_mem_size = machine->ram_size - lowmem;
-            pcms->below_4g_mem_size = lowmem;
+            x86ms->above_4g_mem_size = machine->ram_size - lowmem;
+            x86ms->below_4g_mem_size = lowmem;
         } else {
-            pcms->above_4g_mem_size = 0;
-            pcms->below_4g_mem_size = machine->ram_size;
+            x86ms->above_4g_mem_size = 0;
+            x86ms->below_4g_mem_size = machine->ram_size;
         }
     }
 
-    pc_cpus_init(pcms);
+    x86_cpus_init(x86ms, pcmc->default_cpu_version);
 
     if (kvm_enabled() && pcmc->kvmclock_enabled) {
         kvmclock_create();
@@ -190,19 +192,19 @@ static void pc_init1(MachineState *machine,
     gsi_state = g_malloc0(sizeof(*gsi_state));
     if (kvm_ioapic_in_kernel()) {
         kvm_pc_setup_irq_routing(pcmc->pci_enabled);
-        pcms->gsi = qemu_allocate_irqs(kvm_pc_gsi_handler, gsi_state,
-                                       GSI_NUM_PINS);
+        x86ms->gsi = qemu_allocate_irqs(kvm_pc_gsi_handler, gsi_state,
+                                        GSI_NUM_PINS);
     } else {
-        pcms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
+        x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
     }
 
     if (pcmc->pci_enabled) {
         pci_bus = i440fx_init(host_type,
                               pci_type,
-                              &i440fx_state, &piix3_devfn, &isa_bus, pcms->gsi,
+                              &i440fx_state, &piix3_devfn, &isa_bus, x86ms->gsi,
                               system_memory, system_io, machine->ram_size,
-                              pcms->below_4g_mem_size,
-                              pcms->above_4g_mem_size,
+                              x86ms->below_4g_mem_size,
+                              x86ms->above_4g_mem_size,
                               pci_memory, ram_memory);
         pcms->bus = pci_bus;
     } else {
@@ -212,7 +214,7 @@ static void pc_init1(MachineState *machine,
                               &error_abort);
         no_hpet = 1;
     }
-    isa_bus_irqs(isa_bus, pcms->gsi);
+    isa_bus_irqs(isa_bus, x86ms->gsi);
 
     if (kvm_pic_in_kernel()) {
         i8259 = kvm_i8259_init(isa_bus);
@@ -230,7 +232,7 @@ static void pc_init1(MachineState *machine,
         ioapic_init_gsi(gsi_state, "i440fx");
     }
 
-    pc_register_ferr_irq(pcms->gsi[13]);
+    pc_register_ferr_irq(x86ms->gsi[13]);
 
     pc_vga_init(isa_bus, pcmc->pci_enabled ? pci_bus : NULL);
 
@@ -240,7 +242,7 @@ static void pc_init1(MachineState *machine,
     }
 
     /* init basic PC hardware */
-    pc_basic_device_init(isa_bus, pcms->gsi, &rtc_state, true,
+    pc_basic_device_init(isa_bus, x86ms->gsi, &rtc_state, true,
                          (pcms->vmport != ON_OFF_AUTO_ON), pcms->pit_enabled,
                          0x4);
 
@@ -288,7 +290,7 @@ else {
         smi_irq = qemu_allocate_irq(pc_acpi_smi_interrupt, first_cpu, 0);
         /* TODO: Populate SPD eeprom data.  */
         smbus = piix4_pm_init(pci_bus, piix3_devfn + 3, 0xb100,
-                              pcms->gsi[9], smi_irq,
+                              x86ms->gsi[9], smi_irq,
                               pc_machine_is_smm_enabled(pcms),
                               &piix4_pm);
         smbus_eeprom_init(smbus, 8, NULL, 0);
@@ -304,7 +306,7 @@ else {
 
     if (machine->nvdimms_state->is_enabled) {
         nvdimm_init_acpi_state(machine->nvdimms_state, system_io,
-                               pcms->fw_cfg, OBJECT(pcms));
+                               x86ms->fw_cfg, OBJECT(pcms));
     }
 }
 
@@ -728,7 +730,7 @@ DEFINE_I440FX_MACHINE(v1_4, "pc-i440fx-1.4", pc_compat_1_4_fn,
 
 static void pc_i440fx_1_3_machine_options(MachineClass *m)
 {
-    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    X86MachineClass *x86mc = X86_MACHINE_CLASS(m);
     static GlobalProperty compat[] = {
         PC_CPU_MODEL_IDS("1.3.0")
         { "usb-tablet", "usb_version", "1" },
@@ -739,7 +741,7 @@ static void pc_i440fx_1_3_machine_options(MachineClass *m)
 
     pc_i440fx_1_4_machine_options(m);
     m->hw_version = "1.3.0";
-    pcmc->compat_apic_id_mode = true;
+    x86mc->compat_apic_id_mode = true;
     compat_props_add(m->compat_props, compat, G_N_ELEMENTS(compat));
 }
 
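In `pc_init1()`, the low/high RAM split is still computed as before. Only its destination changes: the results now land in `x86ms->below_4g_mem_size` and `x86ms->above_4g_mem_size`. The helper below is a hypothetical, standalone sketch of the non-Xen branch. It assumes that the lines elided between the two hunks only close the 3G cap and test 1G alignment. The performance warning is left out, and `split_ram_piix` and the local `GiB` macro are illustrative names, not QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GiB (1ULL << 30) /* local stand-in for qemu/units.h */

/* Hypothetical helper mirroring the non-Xen branch of pc_init1():
 * decides how much guest RAM is mapped below and above 4 GiB. */
static void split_ram_piix(uint64_t ram_size, uint64_t max_ram_below_4g,
                           bool gigabyte_align,
                           uint64_t *below_4g, uint64_t *above_4g)
{
    uint64_t lowmem;

    if (!max_ram_below_4g) {
        max_ram_below_4g = 0xe0000000; /* default: 3.5G */
    }
    lowmem = max_ram_below_4g;

    /* Large guests on 1G-aligned machine types are capped at 3G low RAM. */
    if (ram_size >= max_ram_below_4g && gigabyte_align &&
        lowmem > 0xc0000000) {
        lowmem = 0xc0000000;
    }

    if (ram_size >= lowmem) {
        *above_4g = ram_size - lowmem;
        *below_4g = lowmem;
    } else {
        *above_4g = 0;
        *below_4g = ram_size;
    }
}
```

For example, a 4 GiB guest on a gigabyte-aligned machine type ends up with 3 GiB below and 1 GiB above the 4G boundary. A 2 GiB guest fits entirely below it.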
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index d4e8a1cb9f..71f71bc61d 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -41,6 +41,7 @@
 #include "hw/pci-host/q35.h"
 #include "hw/qdev-properties.h"
 #include "exec/address-spaces.h"
+#include "hw/i386/x86.h"
 #include "hw/i386/pc.h"
 #include "hw/i386/ich9.h"
 #include "hw/i386/amd_iommu.h"
@@ -115,6 +116,7 @@ static void pc_q35_init(MachineState *machine)
 {
     PCMachineState *pcms = PC_MACHINE(machine);
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+    X86MachineState *x86ms = X86_MACHINE(pcms);
     Q35PCIHost *q35_host;
     PCIHostState *phb;
     PCIBus *host_bus;
@@ -152,34 +154,34 @@ static void pc_q35_init(MachineState *machine)
     /* Handle the machine opt max-ram-below-4g.  It is basically doing
      * min(qemu limit, user limit).
      */
-    if (!pcms->max_ram_below_4g) {
-        pcms->max_ram_below_4g = 1ULL << 32; /* default: 4G */;
+    if (!x86ms->max_ram_below_4g) {
+        x86ms->max_ram_below_4g = 1ULL << 32; /* default: 4G */
     }
-    if (lowmem > pcms->max_ram_below_4g) {
-        lowmem = pcms->max_ram_below_4g;
+    if (lowmem > x86ms->max_ram_below_4g) {
+        lowmem = x86ms->max_ram_below_4g;
         if (machine->ram_size - lowmem > lowmem &&
             lowmem & (1 * GiB - 1)) {
             warn_report("There is possibly poor performance as the ram size "
-                        " (0x%" PRIx64 ") is more then twice the size of"
+                        "(0x%" PRIx64 ") is more than twice the size of"
                         " max-ram-below-4g (%"PRIu64") and"
                         " max-ram-below-4g is not a multiple of 1G.",
-                        (uint64_t)machine->ram_size, pcms->max_ram_below_4g);
+                        (uint64_t)machine->ram_size, x86ms->max_ram_below_4g);
         }
     }
 
     if (machine->ram_size >= lowmem) {
-        pcms->above_4g_mem_size = machine->ram_size - lowmem;
-        pcms->below_4g_mem_size = lowmem;
+        x86ms->above_4g_mem_size = machine->ram_size - lowmem;
+        x86ms->below_4g_mem_size = lowmem;
     } else {
-        pcms->above_4g_mem_size = 0;
-        pcms->below_4g_mem_size = machine->ram_size;
+        x86ms->above_4g_mem_size = 0;
+        x86ms->below_4g_mem_size = machine->ram_size;
     }
 
     if (xen_enabled()) {
         xen_hvm_init(pcms, &ram_memory);
     }
 
-    pc_cpus_init(pcms);
+    x86_cpus_init(x86ms, pcmc->default_cpu_version);
 
     kvmclock_create();
 
@@ -213,10 +215,10 @@ static void pc_q35_init(MachineState *machine)
     gsi_state = g_malloc0(sizeof(*gsi_state));
     if (kvm_ioapic_in_kernel()) {
         kvm_pc_setup_irq_routing(pcmc->pci_enabled);
-        pcms->gsi = qemu_allocate_irqs(kvm_pc_gsi_handler, gsi_state,
+        x86ms->gsi = qemu_allocate_irqs(kvm_pc_gsi_handler, gsi_state,
                                        GSI_NUM_PINS);
     } else {
-        pcms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
+        x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
     }
 
     /* create pci host bus */
@@ -231,9 +233,9 @@ static void pc_q35_init(MachineState *machine)
                              MCH_HOST_PROP_SYSTEM_MEM, NULL);
     object_property_set_link(OBJECT(q35_host), OBJECT(system_io),
                              MCH_HOST_PROP_IO_MEM, NULL);
-    object_property_set_int(OBJECT(q35_host), pcms->below_4g_mem_size,
+    object_property_set_int(OBJECT(q35_host), x86ms->below_4g_mem_size,
                             PCI_HOST_BELOW_4G_MEM_SIZE, NULL);
-    object_property_set_int(OBJECT(q35_host), pcms->above_4g_mem_size,
+    object_property_set_int(OBJECT(q35_host), x86ms->above_4g_mem_size,
                             PCI_HOST_ABOVE_4G_MEM_SIZE, NULL);
     /* pci */
     qdev_init_nofail(DEVICE(q35_host));
@@ -255,7 +257,7 @@ static void pc_q35_init(MachineState *machine)
     ich9_lpc = ICH9_LPC_DEVICE(lpc);
     lpc_dev = DEVICE(lpc);
     for (i = 0; i < GSI_NUM_PINS; i++) {
-        qdev_connect_gpio_out_named(lpc_dev, ICH9_GPIO_GSI, i, pcms->gsi[i]);
+        qdev_connect_gpio_out_named(lpc_dev, ICH9_GPIO_GSI, i, x86ms->gsi[i]);
     }
     pci_bus_irqs(host_bus, ich9_lpc_set_irq, ich9_lpc_map_irq, ich9_lpc,
                  ICH9_LPC_NB_PIRQS);
@@ -279,7 +281,7 @@ static void pc_q35_init(MachineState *machine)
         ioapic_init_gsi(gsi_state, "q35");
     }
 
-    pc_register_ferr_irq(pcms->gsi[13]);
+    pc_register_ferr_irq(x86ms->gsi[13]);
 
     assert(pcms->vmport != ON_OFF_AUTO__MAX);
     if (pcms->vmport == ON_OFF_AUTO_AUTO) {
@@ -287,7 +289,7 @@ static void pc_q35_init(MachineState *machine)
     }
 
     /* init basic PC hardware */
-    pc_basic_device_init(isa_bus, pcms->gsi, &rtc_state, !mc->no_floppy,
+    pc_basic_device_init(isa_bus, x86ms->gsi, &rtc_state, !mc->no_floppy,
                          (pcms->vmport != ON_OFF_AUTO_ON), pcms->pit_enabled,
                          0xff0104);
 
@@ -330,7 +332,7 @@ static void pc_q35_init(MachineState *machine)
 
     if (machine->nvdimms_state->is_enabled) {
         nvdimm_init_acpi_state(machine->nvdimms_state, system_io,
-                               pcms->fw_cfg, OBJECT(pcms));
+                               x86ms->fw_cfg, OBJECT(pcms));
     }
 }
 
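The max-ram-below-4g handling in pc_q35_init() first clamps the low RAM region to min(machine limit, user limit). It then splits guest RAM around the 4 GiB boundary. The sketch below reproduces only that arithmetic, using a hypothetical split_guest_ram() helper and RamSplit struct. The initial lowmem value is passed in as a parameter because pc_q35_init() chooses it outside the hunks shown here. The performance warning is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Result of splitting guest RAM around 4 GiB (hypothetical helper type). */
typedef struct RamSplit {
    uint64_t below_4g;
    uint64_t above_4g;
} RamSplit;

/*
 * Mirrors the clamping and split in pc_q35_init():
 *  - lowmem: the machine's preferred low RAM size (chosen elsewhere)
 *  - max_ram_below_4g: the user option, 0 meaning "not set"
 */
static RamSplit split_guest_ram(uint64_t ram_size, uint64_t lowmem,
                                uint64_t max_ram_below_4g)
{
    RamSplit s;

    if (!max_ram_below_4g) {
        max_ram_below_4g = 1ULL << 32; /* default: 4G */
    }
    /* min(qemu limit, user limit) */
    if (lowmem > max_ram_below_4g) {
        lowmem = max_ram_below_4g;
    }

    if (ram_size >= lowmem) {
        s.above_4g = ram_size - lowmem;
        s.below_4g = lowmem;
    } else {
        s.above_4g = 0;
        s.below_4g = ram_size;
    }
    return s;
}
```

For example, 6 GiB of RAM with a 2 GiB low region yields 2 GiB below the boundary and 4 GiB above it. Setting max-ram-below-4g=1G moves the split to 1 GiB below and 5 GiB above.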
diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index a9983f0bfb..97f38e0423 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -31,6 +31,7 @@
 #include "qemu/option.h"
 #include "qemu/units.h"
 #include "hw/sysbus.h"
+#include "hw/i386/x86.h"
 #include "hw/i386/pc.h"
 #include "hw/loader.h"
 #include "hw/qdev-properties.h"
@@ -38,8 +39,6 @@
 #include "hw/block/flash.h"
 #include "sysemu/kvm.h"
 
-#define BIOS_FILENAME "bios.bin"
-
 /*
  * We don't have a theoretically justifiable exact lower bound on the base
  * address of any flash mapping. In practice, the IO-APIC MMIO range is
@@ -211,59 +210,6 @@ static void pc_system_flash_map(PCMachineState *pcms,
     }
 }
 
-static void old_pc_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw)
-{
-    char *filename;
-    MemoryRegion *bios, *isa_bios;
-    int bios_size, isa_bios_size;
-    int ret;
-
-    /* BIOS load */
-    if (bios_name == NULL) {
-        bios_name = BIOS_FILENAME;
-    }
-    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
-    if (filename) {
-        bios_size = get_image_size(filename);
-    } else {
-        bios_size = -1;
-    }
-    if (bios_size <= 0 ||
-        (bios_size % 65536) != 0) {
-        goto bios_error;
-    }
-    bios = g_malloc(sizeof(*bios));
-    memory_region_init_ram(bios, NULL, "pc.bios", bios_size, &error_fatal);
-    if (!isapc_ram_fw) {
-        memory_region_set_readonly(bios, true);
-    }
-    ret = rom_add_file_fixed(bios_name, (uint32_t)(-bios_size), -1);
-    if (ret != 0) {
-    bios_error:
-        fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
-        exit(1);
-    }
-    g_free(filename);
-
-    /* map the last 128KB of the BIOS in ISA space */
-    isa_bios_size = MIN(bios_size, 128 * KiB);
-    isa_bios = g_malloc(sizeof(*isa_bios));
-    memory_region_init_alias(isa_bios, NULL, "isa-bios", bios,
-                             bios_size - isa_bios_size, isa_bios_size);
-    memory_region_add_subregion_overlap(rom_memory,
-                                        0x100000 - isa_bios_size,
-                                        isa_bios,
-                                        1);
-    if (!isapc_ram_fw) {
-        memory_region_set_readonly(isa_bios, true);
-    }
-
-    /* map all the bios at the top of memory */
-    memory_region_add_subregion(rom_memory,
-                                (uint32_t)(-bios_size),
-                                bios);
-}
-
 void pc_system_firmware_init(PCMachineState *pcms,
                              MemoryRegion *rom_memory)
 {
@@ -272,7 +218,7 @@ void pc_system_firmware_init(PCMachineState *pcms,
     BlockBackend *pflash_blk[ARRAY_SIZE(pcms->flash)];
 
     if (!pcmc->pci_enabled) {
-        old_pc_system_rom_init(rom_memory, true);
+        x86_system_rom_init(rom_memory, true);
         return;
     }
 
@@ -293,7 +239,7 @@ void pc_system_firmware_init(PCMachineState *pcms,
 
     if (!pflash_blk[0]) {
         /* Machine property pflash0 not set, use ROM mode */
-        old_pc_system_rom_init(rom_memory, false);
+        x86_system_rom_init(rom_memory, false);
     } else {
         if (kvm_enabled() && !kvm_readonly_mem_enabled()) {
             /*
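x86_system_rom_init() keeps the placement rules of the removed old_pc_system_rom_init():

- the image size must be a positive multiple of 64 KiB;
- the whole image is mapped so that it ends at the 4 GiB boundary;
- its last 128 KiB (or the whole image, if smaller) is aliased so that the alias ends at 1 MiB in ISA space.

The sketch below reproduces only that address arithmetic. It uses a hypothetical bios_layout() helper and BiosLayout struct rather than real memory regions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 128 * KiB cap used by the ROM init code */
#define ISA_BIOS_MAX_SIZE (128u * 1024u)

/* Guest-physical placement of the BIOS image (hypothetical helper type) */
typedef struct BiosLayout {
    uint32_t bios_base;        /* whole image, ends at the 4 GiB boundary */
    uint32_t isa_bios_size;    /* size of the alias in ISA space */
    uint32_t isa_alias_offset; /* where the alias starts inside the image */
    uint32_t isa_bios_base;    /* alias ends at 1 MiB (0x100000) */
} BiosLayout;

/* Returns false for image sizes that the ROM init code rejects. */
static bool bios_layout(int bios_size, BiosLayout *out)
{
    uint32_t size;

    if (bios_size <= 0 || (bios_size % 65536) != 0) {
        return false;
    }
    size = (uint32_t)bios_size;

    /* (uint32_t)(-bios_size) wraps around to 4 GiB - size */
    out->bios_base = (uint32_t)(-bios_size);
    /* alias only the last 128 KiB, or the whole image if it is smaller */
    out->isa_bios_size = size < ISA_BIOS_MAX_SIZE ? size : ISA_BIOS_MAX_SIZE;
    out->isa_alias_offset = size - out->isa_bios_size;
    out->isa_bios_base = 0x100000u - out->isa_bios_size;
    return true;
}
```

For a 256 KiB image, this yields a base of 0xfffc0000 and a 128 KiB ISA alias at 0xe0000, taken from offset 0x20000 of the image.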
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
new file mode 100644
index 0000000000..4de9dd100f
--- /dev/null
+++ b/hw/i386/x86.c
@@ -0,0 +1,788 @@
+/*
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/option.h"
+#include "qemu/cutils.h"
+#include "qemu/units.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "qapi/qapi-visit-common.h"
+#include "qapi/visitor.h"
+#include "sysemu/qtest.h"
+#include "sysemu/numa.h"
+#include "sysemu/replay.h"
+#include "sysemu/sysemu.h"
+
+#include "hw/i386/x86.h"
+#include "target/i386/cpu.h"
+#include "hw/i386/topology.h"
+#include "hw/i386/fw_cfg.h"
+#include "hw/acpi/cpu_hotplug.h"
+#include "hw/nmi.h"
+#include "hw/loader.h"
+#include "multiboot.h"
+#include "pvh.h"
+#include "standard-headers/asm-x86/bootparam.h"
+
+#define BIOS_FILENAME "bios.bin"
+
+/* Calculates initial APIC ID for a specific CPU index
+ *
+ * Currently we need to be able to calculate the APIC ID from the CPU index
+ * alone (without requiring a CPU object), as the QEMU<->Seabios interfaces have
+ * no concept of "CPU index", and the NUMA tables on fw_cfg need the APIC ID of
+ * all CPUs up to max_cpus.
+ */
+uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
+                                    unsigned int cpu_index)
+{
+    MachineState *ms = MACHINE(x86ms);
+    X86MachineClass *x86mc = X86_MACHINE_GET_CLASS(x86ms);
+    uint32_t correct_id;
+    static bool warned;
+
+    correct_id = x86_apicid_from_cpu_idx(x86ms->smp_dies, ms->smp.cores,
+                                         ms->smp.threads, cpu_index);
+    if (x86mc->compat_apic_id_mode) {
+        if (cpu_index != correct_id && !warned && !qtest_enabled()) {
+            error_report("APIC IDs set in compatibility mode, "
+                         "CPU topology won't match the configuration");
+            warned = true;
+        }
+        return cpu_index;
+    } else {
+        return correct_id;
+    }
+}
+
+
+static void x86_new_cpu(X86MachineState *x86ms, int64_t apic_id, Error **errp)
+{
+    Error *local_err = NULL;
+    Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
+    CPUX86State *env = &X86_CPU(cpu)->env;
+
+    env->nr_dies = x86ms->smp_dies;
+
+    /* Only realize the CPU if setting its APIC ID succeeded */
+    object_property_set_uint(cpu, apic_id, "apic-id", &local_err);
+    if (!local_err) {
+        object_property_set_bool(cpu, true, "realized", &local_err);
+    }
+
+    object_unref(cpu);
+    error_propagate(errp, local_err);
+}
+
+/*
+ * This function is very similar to smp_parse()
+ * in hw/core/machine.c but includes CPU die support.
+ */
+void x86_smp_parse(MachineState *ms, QemuOpts *opts)
+{
+    X86MachineState *x86ms = X86_MACHINE(ms);
+
+    if (opts) {
+        unsigned cpus    = qemu_opt_get_number(opts, "cpus", 0);
+        unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
+        unsigned dies    = qemu_opt_get_number(opts, "dies", 1);
+        unsigned cores   = qemu_opt_get_number(opts, "cores", 0);
+        unsigned threads = qemu_opt_get_number(opts, "threads", 0);
+
+        /* compute missing values, prefer sockets over cores over threads */
+        if (cpus == 0 || sockets == 0) {
+            cores = cores > 0 ? cores : 1;
+            threads = threads > 0 ? threads : 1;
+            if (cpus == 0) {
+                sockets = sockets > 0 ? sockets : 1;
+                cpus = cores * threads * dies * sockets;
+            } else {
+                ms->smp.max_cpus =
+                        qemu_opt_get_number(opts, "maxcpus", cpus);
+                sockets = ms->smp.max_cpus / (cores * threads * dies);
+            }
+        } else if (cores == 0) {
+            threads = threads > 0 ? threads : 1;
+            cores = cpus / (sockets * dies * threads);
+            cores = cores > 0 ? cores : 1;
+        } else if (threads == 0) {
+            threads = cpus / (cores * dies * sockets);
+            threads = threads > 0 ? threads : 1;
+        } else if (sockets * dies * cores * threads < cpus) {
+            error_report("cpu topology: "
+                         "sockets (%u) * dies (%u) * cores (%u) * threads (%u) < "
+                         "smp_cpus (%u)",
+                         sockets, dies, cores, threads, cpus);
+            exit(1);
+        }
+
+        ms->smp.max_cpus =
+                qemu_opt_get_number(opts, "maxcpus", cpus);
+
+        if (ms->smp.max_cpus < cpus) {
+            error_report("maxcpus must be equal to or greater than smp");
+            exit(1);
+        }
+
+        if (sockets * dies * cores * threads > ms->smp.max_cpus) {
+            error_report("cpu topology: "
+                         "sockets (%u) * dies (%u) * cores (%u) * threads (%u) > "
+                         "maxcpus (%u)",
+                         sockets, dies, cores, threads,
+                         ms->smp.max_cpus);
+            exit(1);
+        }
+
+        if (sockets * dies * cores * threads != ms->smp.max_cpus) {
+            warn_report("Invalid CPU topology deprecated: "
+                        "sockets (%u) * dies (%u) * cores (%u) * threads (%u) "
+                        "!= maxcpus (%u)",
+                        sockets, dies, cores, threads,
+                        ms->smp.max_cpus);
+        }
+
+        ms->smp.cpus = cpus;
+        ms->smp.cores = cores;
+        ms->smp.threads = threads;
+        x86ms->smp_dies = dies;
+    }
+
+    if (ms->smp.cpus > 1) {
+        Error *blocker = NULL;
+        error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp");
+        replay_add_blocker(blocker);
+    }
+}
+
+void x86_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(ms);
+    int64_t apic_id;
+    Error *local_err = NULL;
+
+    if (id < 0) {
+        error_setg(errp, "Invalid CPU id: %" PRIi64, id);
+        return;
+    }
+
+    apic_id = x86_cpu_apic_id_from_index(x86ms, id);
+    if (apic_id >= ACPI_CPU_HOTPLUG_ID_LIMIT) {
+        error_setg(errp, "Unable to add CPU: %" PRIi64
+                   ", resulting APIC ID (%" PRIi64 ") is too large",
+                   id, apic_id);
+        return;
+    }
+
+    x86_new_cpu(x86ms, apic_id, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+}
+
+void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
+{
+    int i;
+    const CPUArchIdList *possible_cpus;
+    MachineState *ms = MACHINE(x86ms);
+    MachineClass *mc = MACHINE_GET_CLASS(x86ms);
+
+    x86_cpu_set_default_version(default_cpu_version);
+
+    /* Calculates the limit to CPU APIC ID values
+     *
+     * Limit for the APIC ID value, so that all
+     * CPU APIC IDs are < x86ms->apic_id_limit.
+     *
+     * This is used for FW_CFG_MAX_CPUS. See comments on bochs_bios_init().
+     */
+    x86ms->apic_id_limit = x86_cpu_apic_id_from_index(x86ms,
+                                                      ms->smp.max_cpus - 1) + 1;
+    possible_cpus = mc->possible_cpu_arch_ids(ms);
+    for (i = 0; i < ms->smp.cpus; i++) {
+        x86_new_cpu(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal);
+    }
+}
+
+void x86_nmi(NMIState *n, int cpu_index, Error **errp)
+{
+    /* cpu index isn't used */
+    CPUState *cs;
+
+    CPU_FOREACH(cs) {
+        X86CPU *cpu = X86_CPU(cs);
+
+        if (!cpu->apic_state) {
+            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
+        } else {
+            apic_deliver_nmi(cpu->apic_state);
+        }
+    }
+}
+
+CpuInstanceProperties
+x86_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+
+    assert(cpu_index < possible_cpus->len);
+    return possible_cpus->cpus[cpu_index].props;
+}
+
+int64_t x86_get_default_cpu_node_id(const MachineState *ms, int idx)
+{
+    X86CPUTopoInfo topo;
+    X86MachineState *x86ms = X86_MACHINE(ms);
+
+    assert(idx < ms->possible_cpus->len);
+    x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id,
+                             x86ms->smp_dies, ms->smp.cores,
+                             ms->smp.threads, &topo);
+    return topo.pkg_id % ms->numa_state->num_nodes;
+}
+
+const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
+{
+    X86MachineState *x86ms = X86_MACHINE(ms);
+    int i;
+    unsigned int max_cpus = ms->smp.max_cpus;
+
+    if (ms->possible_cpus) {
+        /*
+         * make sure that max_cpus hasn't changed since the first use, i.e.
+         * -smp hasn't been parsed after it
+         */
+        assert(ms->possible_cpus->len == max_cpus);
+        return ms->possible_cpus;
+    }
+
+    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
+                                  sizeof(CPUArchId) * max_cpus);
+    ms->possible_cpus->len = max_cpus;
+    for (i = 0; i < ms->possible_cpus->len; i++) {
+        X86CPUTopoInfo topo;
+
+        ms->possible_cpus->cpus[i].type = ms->cpu_type;
+        ms->possible_cpus->cpus[i].vcpus_count = 1;
+        ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(x86ms, i);
+        x86_topo_ids_from_apicid(ms->possible_cpus->cpus[i].arch_id,
+                                 x86ms->smp_dies, ms->smp.cores,
+                                 ms->smp.threads, &topo);
+        ms->possible_cpus->cpus[i].props.has_socket_id = true;
+        ms->possible_cpus->cpus[i].props.socket_id = topo.pkg_id;
+        if (x86ms->smp_dies > 1) {
+            ms->possible_cpus->cpus[i].props.has_die_id = true;
+            ms->possible_cpus->cpus[i].props.die_id = topo.die_id;
+        }
+        ms->possible_cpus->cpus[i].props.has_core_id = true;
+        ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
+        ms->possible_cpus->cpus[i].props.has_thread_id = true;
+        ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
+    }
+    return ms->possible_cpus;
+}
+
+void x86_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw)
+{
+    char *filename;
+    MemoryRegion *bios, *isa_bios;
+    int bios_size, isa_bios_size;
+    int ret;
+
+    /* BIOS load */
+    if (bios_name == NULL) {
+        bios_name = BIOS_FILENAME;
+    }
+    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
+    if (filename) {
+        bios_size = get_image_size(filename);
+    } else {
+        bios_size = -1;
+    }
+    if (bios_size <= 0 ||
+        (bios_size % 65536) != 0) {
+        goto bios_error;
+    }
+    bios = g_malloc(sizeof(*bios));
+    memory_region_init_ram(bios, NULL, "pc.bios", bios_size, &error_fatal);
+    if (!isapc_ram_fw) {
+        memory_region_set_readonly(bios, true);
+    }
+    ret = rom_add_file_fixed(bios_name, (uint32_t)(-bios_size), -1);
+    if (ret != 0) {
+    bios_error:
+        fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
+        exit(1);
+    }
+    g_free(filename);
+
+    /* map the last 128KB of the BIOS in ISA space */
+    isa_bios_size = MIN(bios_size, 128 * KiB);
+    isa_bios = g_malloc(sizeof(*isa_bios));
+    memory_region_init_alias(isa_bios, NULL, "isa-bios", bios,
+                             bios_size - isa_bios_size, isa_bios_size);
+    memory_region_add_subregion_overlap(rom_memory,
+                                        0x100000 - isa_bios_size,
+                                        isa_bios,
+                                        1);
+    if (!isapc_ram_fw) {
+        memory_region_set_readonly(isa_bios, true);
+    }
+
+    /* map all the bios at the top of memory */
+    memory_region_add_subregion(rom_memory,
+                                (uint32_t)(-bios_size),
+                                bios);
+}
+
+static long get_file_size(FILE *f)
+{
+    long where, size;
+
+    /* XXX: on Unix systems, using fstat() probably makes more sense */
+
+    where = ftell(f);
+    fseek(f, 0, SEEK_END);
+    size = ftell(f);
+    fseek(f, where, SEEK_SET);
+
+    return size;
+}
+
+struct setup_data {
+    uint64_t next;
+    uint32_t type;
+    uint32_t len;
+    uint8_t data[0];
+} __attribute__((packed));
+
+void load_linux(X86MachineState *x86ms,
+                FWCfgState *fw_cfg,
+                unsigned acpi_data_size,
+                bool linuxboot_dma_enabled,
+                bool pvh_enabled)
+{
+    uint16_t protocol;
+    int setup_size, kernel_size, cmdline_size;
+    int dtb_size, setup_data_offset;
+    uint32_t initrd_max;
+    uint8_t header[8192], *setup, *kernel;
+    hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0;
+    FILE *f;
+    char *vmode;
+    MachineState *machine = MACHINE(x86ms);
+    struct setup_data *setup_data;
+    const char *kernel_filename = machine->kernel_filename;
+    const char *initrd_filename = machine->initrd_filename;
+    const char *dtb_filename = machine->dtb;
+    const char *kernel_cmdline = machine->kernel_cmdline;
+
+    /* Align to 16 bytes as a paranoia measure */
+    cmdline_size = (strlen(kernel_cmdline)+16) & ~15;
+
+    /* load the kernel header */
+    f = fopen(kernel_filename, "rb");
+    if (!f || !(kernel_size = get_file_size(f)) ||
+        fread(header, 1, MIN(ARRAY_SIZE(header), kernel_size), f) !=
+        MIN(ARRAY_SIZE(header), kernel_size)) {
+        fprintf(stderr, "qemu: could not load kernel '%s': %s\n",
+                kernel_filename, strerror(errno));
+        exit(1);
+    }
+
+    /* kernel protocol version */
+#if 0
+    fprintf(stderr, "header magic: %#x\n", ldl_p(header+0x202));
+#endif
+    if (ldl_p(header+0x202) == 0x53726448) {
+        protocol = lduw_p(header+0x206);
+    } else {
+        size_t pvh_start_addr;
+        uint32_t mh_load_addr = 0;
+        uint32_t elf_kernel_size = 0;
+        /*
+         * This could be a multiboot kernel. If it is, let's stop treating it
+         * like a Linux kernel.
+         * Note: some multiboot images could be in the ELF format (the same as
+         * PVH), so we try multiboot first since we check the multiboot magic
+         * header before loading it.
+         */
+        if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
+                           kernel_cmdline, kernel_size, header)) {
+            return;
+        }
+        /*
+         * Check if the file is an uncompressed kernel file (ELF) and load it,
+         * saving the PVH entry point used by the x86/HVM direct boot ABI.
+         * If pvh_load_elfboot() is successful, populate the fw_cfg info.
+         */
+        if (pvh_enabled &&
+            pvh_load_elfboot(kernel_filename,
+                             &mh_load_addr, &elf_kernel_size)) {
+            fclose(f);
+
+            pvh_start_addr = pvh_get_start_addr();
+
+            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
+            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
+            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
+
+            fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
+                strlen(kernel_cmdline) + 1);
+            fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
+
+            fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
+            fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
+                             header, sizeof(header));
+
+            /* load initrd */
+            if (initrd_filename) {
+                GMappedFile *mapped_file;
+                gsize initrd_size;
+                gchar *initrd_data;
+                GError *gerr = NULL;
+
+                mapped_file = g_mapped_file_new(initrd_filename, false, &gerr);
+                if (!mapped_file) {
+                    fprintf(stderr, "qemu: error reading initrd %s: %s\n",
+                            initrd_filename, gerr->message);
+                    exit(1);
+                }
+                x86ms->initrd_mapped_file = mapped_file;
+
+                initrd_data = g_mapped_file_get_contents(mapped_file);
+                initrd_size = g_mapped_file_get_length(mapped_file);
+                initrd_max = x86ms->below_4g_mem_size - acpi_data_size - 1;
+                if (initrd_size >= initrd_max) {
+                    fprintf(stderr, "qemu: initrd is too large, cannot support."
+                            " (max: %"PRIu32", need %"PRIu64")\n",
+                            initrd_max, (uint64_t)initrd_size);
+                    exit(1);
+                }
+
+                initrd_addr = (initrd_max - initrd_size) & ~4095;
+
+                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
+                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
+                fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data,
+                                 initrd_size);
+            }
+
+            option_rom[nb_option_roms].bootindex = 0;
+            option_rom[nb_option_roms].name = "pvh.bin";
+            nb_option_roms++;
+
+            return;
+        }
+        protocol = 0;
+    }
+
+    if (protocol < 0x200 || !(header[0x211] & 0x01)) {
+        /* Low kernel */
+        real_addr    = 0x90000;
+        cmdline_addr = 0x9a000 - cmdline_size;
+        prot_addr    = 0x10000;
+    } else if (protocol < 0x202) {
+        /* High but ancient kernel */
+        real_addr    = 0x90000;
+        cmdline_addr = 0x9a000 - cmdline_size;
+        prot_addr    = 0x100000;
+    } else {
+        /* High and recent kernel */
+        real_addr    = 0x10000;
+        cmdline_addr = 0x20000;
+        prot_addr    = 0x100000;
+    }
+
+#if 0
+    fprintf(stderr,
+            "qemu: real_addr     = 0x" TARGET_FMT_plx "\n"
+            "qemu: cmdline_addr  = 0x" TARGET_FMT_plx "\n"
+            "qemu: prot_addr     = 0x" TARGET_FMT_plx "\n",
+            real_addr,
+            cmdline_addr,
+            prot_addr);
+#endif
+
+    /* highest address for loading the initrd */
+    if (protocol >= 0x20c &&
+        lduw_p(header+0x236) & XLF_CAN_BE_LOADED_ABOVE_4G) {
+        /*
+         * Linux has supported initrd up to 4 GB for a very long time (2007,
+         * long before XLF_CAN_BE_LOADED_ABOVE_4G which was added in 2013),
+         * though it only sets initrd_max to 2 GB to "work around bootloader
+         * bugs". Luckily, QEMU firmware (which acts much like a bootloader)
+         * has supported this.
+         *
+         * It's believed that if XLF_CAN_BE_LOADED_ABOVE_4G is set, initrd can
+         * be loaded into any address.
+         *
+         * In addition, initrd_max is uint32_t simply because QEMU doesn't
+         * support the 64-bit boot protocol (specifically the ext_ramdisk_image
+         * field).
+         *
+         * Therefore, simply limit initrd_max to UINT32_MAX here as well.
+         */
+        initrd_max = UINT32_MAX;
+    } else if (protocol >= 0x203) {
+        initrd_max = ldl_p(header+0x22c);
+    } else {
+        initrd_max = 0x37ffffff;
+    }
+
+    if (initrd_max >= x86ms->below_4g_mem_size - acpi_data_size) {
+        initrd_max = x86ms->below_4g_mem_size - acpi_data_size - 1;
+    }
+
+    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline)+1);
+    fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
+
+    if (protocol >= 0x202) {
+        stl_p(header+0x228, cmdline_addr);
+    } else {
+        stw_p(header+0x20, 0xA33F);
+        stw_p(header+0x22, cmdline_addr-real_addr);
+    }
+
+    /* handle vga= parameter */
+    vmode = strstr(kernel_cmdline, "vga=");
+    if (vmode) {
+        unsigned int video_mode;
+        /* skip "vga=" */
+        vmode += 4;
+        if (!strncmp(vmode, "normal", 6)) {
+            video_mode = 0xffff;
+        } else if (!strncmp(vmode, "ext", 3)) {
+            video_mode = 0xfffe;
+        } else if (!strncmp(vmode, "ask", 3)) {
+            video_mode = 0xfffd;
+        } else {
+            video_mode = strtol(vmode, NULL, 0);
+        }
+        stw_p(header+0x1fa, video_mode);
+    }
+
+    /* loader type */
+    /* High nybble = B reserved for QEMU; low nybble is revision number.
+       If this code is substantially changed, you may want to consider
+       incrementing the revision. */
+    if (protocol >= 0x200) {
+        header[0x210] = 0xB0;
+    }
+    /* heap */
+    if (protocol >= 0x201) {
+        header[0x211] |= 0x80; /* CAN_USE_HEAP */
+        stw_p(header+0x224, cmdline_addr-real_addr-0x200);
+    }
+
+    /* load initrd */
+    if (initrd_filename) {
+        GMappedFile *mapped_file;
+        gsize initrd_size;
+        gchar *initrd_data;
+        GError *gerr = NULL;
+
+        if (protocol < 0x200) {
+            fprintf(stderr, "qemu: linux kernel too old to load a ram disk\n");
+            exit(1);
+        }
+
+        mapped_file = g_mapped_file_new(initrd_filename, false, &gerr);
+        if (!mapped_file) {
+            fprintf(stderr, "qemu: error reading initrd %s: %s\n",
+                    initrd_filename, gerr->message);
+            exit(1);
+        }
+        x86ms->initrd_mapped_file = mapped_file;
+
+        initrd_data = g_mapped_file_get_contents(mapped_file);
+        initrd_size = g_mapped_file_get_length(mapped_file);
+        if (initrd_size >= initrd_max) {
+            fprintf(stderr, "qemu: initrd is too large, cannot support."
+                    " (max: %"PRIu32", need %"PRIu64")\n",
+                    initrd_max, (uint64_t)initrd_size);
+            exit(1);
+        }
+
+        initrd_addr = (initrd_max-initrd_size) & ~4095;
+
+        fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
+        fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
+        fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data, initrd_size);
+
+        stl_p(header+0x218, initrd_addr);
+        stl_p(header+0x21c, initrd_size);
+    }
+
+    /* load kernel and setup */
+    setup_size = header[0x1f1];
+    if (setup_size == 0) {
+        setup_size = 4;
+    }
+    setup_size = (setup_size+1)*512;
+    if (setup_size > kernel_size) {
+        fprintf(stderr, "qemu: invalid kernel header\n");
+        exit(1);
+    }
+    kernel_size -= setup_size;
+
+    setup  = g_malloc(setup_size);
+    kernel = g_malloc(kernel_size);
+    fseek(f, 0, SEEK_SET);
+    if (fread(setup, 1, setup_size, f) != setup_size) {
+        fprintf(stderr, "fread() failed\n");
+        exit(1);
+    }
+    if (fread(kernel, 1, kernel_size, f) != kernel_size) {
+        fprintf(stderr, "fread() failed\n");
+        exit(1);
+    }
+    fclose(f);
+
+    /* append dtb to kernel */
+    if (dtb_filename) {
+        if (protocol < 0x209) {
+            fprintf(stderr, "qemu: Linux kernel too old to load a dtb\n");
+            exit(1);
+        }
+
+        dtb_size = get_image_size(dtb_filename);
+        if (dtb_size <= 0) {
+            fprintf(stderr, "qemu: error reading dtb %s: %s\n",
+                    dtb_filename, strerror(errno));
+            exit(1);
+        }
+
+        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
+        kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size;
+        kernel = g_realloc(kernel, kernel_size);
+
+        stq_p(header+0x250, prot_addr + setup_data_offset);
+
+        setup_data = (struct setup_data *)(kernel + setup_data_offset);
+        setup_data->next = 0;
+        setup_data->type = cpu_to_le32(SETUP_DTB);
+        setup_data->len = cpu_to_le32(dtb_size);
+
+        load_image_size(dtb_filename, setup_data->data, dtb_size);
+    }
+
+    memcpy(setup, header, MIN(sizeof(header), setup_size));
+
+    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel, kernel_size);
+
+    fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_ADDR, real_addr);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, setup_size);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
+
+    option_rom[nb_option_roms].bootindex = 0;
+    option_rom[nb_option_roms].name = "linuxboot.bin";
+    if (linuxboot_dma_enabled && fw_cfg_dma_enabled(fw_cfg)) {
+        option_rom[nb_option_roms].name = "linuxboot_dma.bin";
+    }
+    nb_option_roms++;
+}
+
+static void x86_machine_get_max_ram_below_4g(Object *obj, Visitor *v,
+                                             const char *name, void *opaque,
+                                             Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+    uint64_t value = x86ms->max_ram_below_4g;
+
+    visit_type_size(v, name, &value, errp);
+}
+
+static void x86_machine_set_max_ram_below_4g(Object *obj, Visitor *v,
+                                             const char *name, void *opaque,
+                                             Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+    Error *error = NULL;
+    uint64_t value;
+
+    visit_type_size(v, name, &value, &error);
+    if (error) {
+        error_propagate(errp, error);
+        return;
+    }
+    if (value > 4 * GiB) {
+        error_setg(&error,
+                   "Machine option 'max-ram-below-4g=%"PRIu64
+                   "' expects size less than or equal to 4G", value);
+        error_propagate(errp, error);
+        return;
+    }
+
+    if (value < 1 * MiB) {
+        warn_report("Only %" PRIu64 " bytes of RAM below the 4GiB boundary,"
+                    " BIOS may not work with less than 1MiB", value);
+    }
+
+    x86ms->max_ram_below_4g = value;
+}
+
+static void x86_machine_initfn(Object *obj)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    x86ms->max_ram_below_4g = 0; /* use default */
+    x86ms->smp_dies = 1;
+}
+
+static void x86_machine_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+
+    mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
+    mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
+    mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
+
+    object_class_property_add(oc, X86_MACHINE_MAX_RAM_BELOW_4G, "size",
+        x86_machine_get_max_ram_below_4g, x86_machine_set_max_ram_below_4g,
+        NULL, NULL, &error_abort);
+
+    object_class_property_set_description(oc, X86_MACHINE_MAX_RAM_BELOW_4G,
+        "Maximum ram below the 4G boundary (32bit boundary)", &error_abort);
+}
+
+static const TypeInfo x86_machine_info = {
+    .name = TYPE_X86_MACHINE,
+    .parent = TYPE_MACHINE,
+    .abstract = true,
+    .instance_size = sizeof(X86MachineState),
+    .instance_init = x86_machine_initfn,
+    .class_size = sizeof(X86MachineClass),
+    .class_init = x86_machine_class_init,
+};
+
+static void x86_machine_register_types(void)
+{
+    type_register_static(&x86_machine_info);
+}
+
+type_init(x86_machine_register_types)
diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 1ede055387..e621dde6c3 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -23,6 +23,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "monitor/monitor.h"
+#include "hw/i386/x86.h"
 #include "hw/i386/pc.h"
 #include "hw/i386/apic.h"
 #include "hw/i386/ioapic.h"
@@ -89,7 +90,7 @@ static void ioapic_entry_parse(uint64_t entry, struct ioapic_entry_info *info)
 
 static void ioapic_service(IOAPICCommonState *s)
 {
-    AddressSpace *ioapic_as = PC_MACHINE(qdev_get_machine())->ioapic_as;
+    AddressSpace *ioapic_as = X86_MACHINE(qdev_get_machine())->ioapic_as;
     struct ioapic_entry_info info;
     uint8_t i;
     uint32_t mask;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 062feeb69e..de28d55e5c 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -3,6 +3,7 @@
 
 #include "exec/memory.h"
 #include "hw/boards.h"
+#include "hw/i386/x86.h"
 #include "hw/isa/isa.h"
 #include "hw/block/fdc.h"
 #include "hw/block/flash.h"
@@ -27,7 +28,7 @@
  */
 struct PCMachineState {
     /*< private >*/
-    MachineState parent_obj;
+    X86MachineState parent_obj;
 
     /* <public> */
 
@@ -36,15 +37,10 @@ struct PCMachineState {
 
     /* Pointers to devices and objects: */
     HotplugHandler *acpi_dev;
-    ISADevice *rtc;
     PCIBus *bus;
-    FWCfgState *fw_cfg;
-    qemu_irq *gsi;
     PFlashCFI01 *flash[2];
-    GMappedFile *initrd_mapped_file;
 
     /* Configuration options: */
-    uint64_t max_ram_below_4g;
     OnOffAuto vmport;
     OnOffAuto smm;
 
@@ -53,27 +49,13 @@ struct PCMachineState {
     bool sata_enabled;
     bool pit_enabled;
 
-    /* RAM information (sizes, addresses, configuration): */
-    ram_addr_t below_4g_mem_size, above_4g_mem_size;
-
-    /* CPU and apic information: */
-    bool apic_xrupt_override;
-    unsigned apic_id_limit;
-    uint16_t boot_cpus;
-    unsigned smp_dies;
-
     /* NUMA information: */
     uint64_t numa_nodes;
     uint64_t *node_mem;
-
-    /* Address space used by IOAPIC device. All IOAPIC interrupts
-     * will be translated to MSI messages in the address space. */
-    AddressSpace *ioapic_as;
 };
 
 #define PC_MACHINE_ACPI_DEVICE_PROP "acpi-device"
 #define PC_MACHINE_DEVMEM_REGION_SIZE "device-memory-region-size"
-#define PC_MACHINE_MAX_RAM_BELOW_4G "max-ram-below-4g"
 #define PC_MACHINE_VMPORT           "vmport"
 #define PC_MACHINE_SMM              "smm"
 #define PC_MACHINE_SMBUS            "smbus"
@@ -139,9 +121,6 @@ typedef struct PCMachineClass {
 
     /* use PVH to load kernels that support this feature */
     bool pvh_enabled;
-
-    /* Enables contiguous-apic-ID mode */
-    bool compat_apic_id_mode;
 } PCMachineClass;
 
 #define TYPE_PC_MACHINE "generic-pc-machine"
@@ -193,10 +172,6 @@ bool pc_machine_is_smm_enabled(PCMachineState *pcms);
 void pc_register_ferr_irq(qemu_irq irq);
 void pc_acpi_smi_interrupt(void *opaque, int irq, int level);
 
-void pc_cpus_init(PCMachineState *pcms);
-void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp);
-void pc_smp_parse(MachineState *ms, QemuOpts *opts);
-
 void pc_guest_info_init(PCMachineState *pcms);
 
 #define PCI_HOST_PROP_PCI_HOLE_START   "pci-hole-start"
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
new file mode 100644
index 0000000000..5980090b29
--- /dev/null
+++ b/include/hw/i386/x86.h
@@ -0,0 +1,97 @@
+/*
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_I386_X86_H
+#define HW_I386_X86_H
+
+#include "qemu-common.h"
+#include "exec/hwaddr.h"
+#include "qemu/notify.h"
+
+#include "hw/boards.h"
+#include "hw/nmi.h"
+
+typedef struct {
+    /*< private >*/
+    MachineClass parent;
+
+    /*< public >*/
+
+    /* Enables contiguous-apic-ID mode */
+    bool compat_apic_id_mode;
+} X86MachineClass;
+
+typedef struct {
+    /*< private >*/
+    MachineState parent;
+
+    /*< public >*/
+
+    /* Pointers to devices and objects: */
+    ISADevice *rtc;
+    FWCfgState *fw_cfg;
+    qemu_irq *gsi;
+    GMappedFile *initrd_mapped_file;
+
+    /* Configuration options: */
+    uint64_t max_ram_below_4g;
+
+    /* RAM information (sizes, addresses, configuration): */
+    ram_addr_t below_4g_mem_size, above_4g_mem_size;
+
+    /* CPU and apic information: */
+    bool apic_xrupt_override;
+    unsigned apic_id_limit;
+    uint16_t boot_cpus;
+    unsigned smp_dies;
+
+    /* Address space used by IOAPIC device. All IOAPIC interrupts
+     * will be translated to MSI messages in the address space. */
+    AddressSpace *ioapic_as;
+} X86MachineState;
+
+#define X86_MACHINE_MAX_RAM_BELOW_4G "max-ram-below-4g"
+
+#define TYPE_X86_MACHINE   MACHINE_TYPE_NAME("x86")
+#define X86_MACHINE(obj) \
+    OBJECT_CHECK(X86MachineState, (obj), TYPE_X86_MACHINE)
+#define X86_MACHINE_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(X86MachineClass, obj, TYPE_X86_MACHINE)
+#define X86_MACHINE_CLASS(class) \
+    OBJECT_CLASS_CHECK(X86MachineClass, class, TYPE_X86_MACHINE)
+
+uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
+                                    unsigned int cpu_index);
+
+void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version);
+void x86_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp);
+void x86_smp_parse(MachineState *ms, QemuOpts *opts);
+void x86_nmi(NMIState *n, int cpu_index, Error **errp);
+
+CpuInstanceProperties x86_cpu_index_to_props(MachineState *ms,
+                                             unsigned cpu_index);
+int64_t x86_get_default_cpu_node_id(const MachineState *ms, int idx);
+const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms);
+
+void x86_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw);
+
+void load_linux(X86MachineState *x86ms,
+                FWCfgState *fw_cfg,
+                unsigned acpi_data_size,
+                bool linuxboot_dma_enabled,
+                bool pvh_enabled);
+
+#endif
-- 
2.21.0




* [PATCH v4 5/8] fw_cfg: add "modify" functions for all types
  2019-09-24 12:44 ` Sergio Lopez
@ 2019-09-24 12:44   ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-24 12:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, philmd,
	lersek, kraxel, mtosatti, kvm, Sergio Lopez

This allows altering the contents of an item that has already been added.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/nvram/fw_cfg.c         | 29 +++++++++++++++++++++++++++
 include/hw/nvram/fw_cfg.h | 42 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index 7dc3ac378e..aef1727250 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -690,6 +690,15 @@ void fw_cfg_add_string(FWCfgState *s, uint16_t key, const char *value)
     fw_cfg_add_bytes(s, key, g_memdup(value, sz), sz);
 }
 
+void fw_cfg_modify_string(FWCfgState *s, uint16_t key, const char *value)
+{
+    size_t sz = strlen(value) + 1;
+    char *old;
+
+    old = fw_cfg_modify_bytes_read(s, key, g_memdup(value, sz), sz);
+    g_free(old);
+}
+
 void fw_cfg_add_i16(FWCfgState *s, uint16_t key, uint16_t value)
 {
     uint16_t *copy;
@@ -720,6 +729,16 @@ void fw_cfg_add_i32(FWCfgState *s, uint16_t key, uint32_t value)
     fw_cfg_add_bytes(s, key, copy, sizeof(value));
 }
 
+void fw_cfg_modify_i32(FWCfgState *s, uint16_t key, uint32_t value)
+{
+    uint32_t *copy, *old;
+
+    copy = g_malloc(sizeof(value));
+    *copy = cpu_to_le32(value);
+    old = fw_cfg_modify_bytes_read(s, key, copy, sizeof(value));
+    g_free(old);
+}
+
 void fw_cfg_add_i64(FWCfgState *s, uint16_t key, uint64_t value)
 {
     uint64_t *copy;
@@ -730,6 +749,16 @@ void fw_cfg_add_i64(FWCfgState *s, uint16_t key, uint64_t value)
     fw_cfg_add_bytes(s, key, copy, sizeof(value));
 }
 
+void fw_cfg_modify_i64(FWCfgState *s, uint16_t key, uint64_t value)
+{
+    uint64_t *copy, *old;
+
+    copy = g_malloc(sizeof(value));
+    *copy = cpu_to_le64(value);
+    old = fw_cfg_modify_bytes_read(s, key, copy, sizeof(value));
+    g_free(old);
+}
+
 void fw_cfg_set_order_override(FWCfgState *s, int order)
 {
     assert(s->fw_cfg_order_override == 0);
diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
index 80e435d303..b5291eefad 100644
--- a/include/hw/nvram/fw_cfg.h
+++ b/include/hw/nvram/fw_cfg.h
@@ -98,6 +98,20 @@ void fw_cfg_add_bytes(FWCfgState *s, uint16_t key, void *data, size_t len);
  */
 void fw_cfg_add_string(FWCfgState *s, uint16_t key, const char *value);
 
+/**
+ * fw_cfg_modify_string:
+ * @s: fw_cfg device being modified
+ * @key: selector key value of the existing fw_cfg item to replace
+ * @value: NUL-terminated ASCII string
+ *
+ * Replace the fw_cfg item available by selecting the given key. The new
+ * data will consist of a dynamically allocated copy of the provided string,
+ * including its NUL terminator. The data being replaced, assumed to have
+ * been dynamically allocated during an earlier call to either
+ * fw_cfg_add_string() or fw_cfg_modify_string(), is freed before returning.
+ */
+void fw_cfg_modify_string(FWCfgState *s, uint16_t key, const char *value);
+
 /**
  * fw_cfg_add_i16:
  * @s: fw_cfg device being modified
@@ -136,6 +150,20 @@ void fw_cfg_modify_i16(FWCfgState *s, uint16_t key, uint16_t value);
  */
 void fw_cfg_add_i32(FWCfgState *s, uint16_t key, uint32_t value);
 
+/**
+ * fw_cfg_modify_i32:
+ * @s: fw_cfg device being modified
+ * @key: selector key value of the existing fw_cfg item to replace
+ * @value: 32-bit integer
+ *
+ * Replace the fw_cfg item available by selecting the given key. The new
+ * data will consist of a dynamically allocated copy of the given 32-bit
+ * value, converted to little-endian representation. The data being replaced,
+ * assumed to have been dynamically allocated during an earlier call to
+ * either fw_cfg_add_i32() or fw_cfg_modify_i32(), is freed before returning.
+ */
+void fw_cfg_modify_i32(FWCfgState *s, uint16_t key, uint32_t value);
+
 /**
  * fw_cfg_add_i64:
  * @s: fw_cfg device being modified
@@ -148,6 +176,20 @@ void fw_cfg_add_i32(FWCfgState *s, uint16_t key, uint32_t value);
  */
 void fw_cfg_add_i64(FWCfgState *s, uint16_t key, uint64_t value);
 
+/**
+ * fw_cfg_modify_i64:
+ * @s: fw_cfg device being modified
+ * @key: selector key value of the existing fw_cfg item to replace
+ * @value: 64-bit integer
+ *
+ * Replace the fw_cfg item available by selecting the given key. The new
+ * data will consist of a dynamically allocated copy of the given 64-bit
+ * value, converted to little-endian representation. The data being replaced,
+ * assumed to have been dynamically allocated during an earlier call to
+ * either fw_cfg_add_i64() or fw_cfg_modify_i64(), is freed before returning.
+ */
+void fw_cfg_modify_i64(FWCfgState *s, uint16_t key, uint64_t value);
+
 /**
  * fw_cfg_add_file:
  * @s: fw_cfg device being modified
-- 
2.21.0




* [PATCH v4 6/8] roms: add microvm-bios (qboot) as binary and git submodule
  2019-09-24 12:44 ` Sergio Lopez
@ 2019-09-24 12:44   ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-24 12:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, philmd,
	lersek, kraxel, mtosatti, kvm, Sergio Lopez

qboot is a minimalist x86 firmware for booting Linux kernels. It does
the minimum amount of work required for the task, and it is able to
boot both PVH images and bzImages without relying on option ROMs.

These characteristics make it an ideal companion for the microvm
machine type.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 .gitmodules              |   3 +++
 pc-bios/bios-microvm.bin | Bin 0 -> 65536 bytes
 roms/Makefile            |   6 ++++++
 roms/qboot               |   1 +
 4 files changed, 10 insertions(+)
 create mode 100755 pc-bios/bios-microvm.bin
 create mode 160000 roms/qboot

diff --git a/.gitmodules b/.gitmodules
index c5c474169d..19792c9a11 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -58,3 +58,6 @@
 [submodule "roms/opensbi"]
 	path = roms/opensbi
 	url = 	https://git.qemu.org/git/opensbi.git
+[submodule "roms/qboot"]
+	path = roms/qboot
+	url = https://github.com/bonzini/qboot
diff --git a/pc-bios/bios-microvm.bin b/pc-bios/bios-microvm.bin
new file mode 100755
index 0000000000000000000000000000000000000000..45eabc516692e2d134bbb630d133c7c2dcc9a9b6
GIT binary patch
literal 65536
zcmeI2eS8zwneS)hF_vth5y2#;brv;O^x_6mtAUMO%tn5}IEf)jY``G|XbMBqa_^-@
zBBUXSyplo3+VMx9Ci}VFwz~@rx!t?By&v!0UdNb=V<BQ0NE-8!l)yH1LPAE!VDnNi
zCSmUHnUQRVmuC0>GyaU`%sJ0_&U3!!Ij`fT?33lo8bTirhX$S6(dmK^Nu2qLAPZJO
zd+n=&@XLYwWFdS~Zh2c2gidFUEU<VyB{hF24C^}E=jl<HQ(<+MP>-}gkOUzx)FqJ6
z;W433mmmmAY+Noh;tibd?18@VxCH}v4GeX<kXL!tiyU!Hnn`6SuU6r$bB&SE_=SXJ
zc+)=1$Iq|sgvbt9s)=?%V2RL(E{A7Ar52#~B&%`TJKuimt+%dx%KD*M-I%M^1gEP~
z3rrTeoTTV}=y-L<H(`6Jx<%^VI8zpO=OW?aYwDJw?(oFd+1)>#`0DNc_4sRW0TC1A
zmJsrGlX|tHBmSvHZ7h?b^=>;mu1$tb(P>mvG{5}(=7p;C?X<Nv)KiF;vS^dFph*f0
z|KM_=_+GTu{^~BsC2OI`iH8;Xgy<4`bd|F?E`U$ysLqzy*(zuJjHIw>{{UgF+=c4I
zP{`<|^mTPhV|UNEdEHc{)HAxSYaiW>P|0;&C$X5aR9UVpQyP@Vl`hhv$gdx{Z+-NR
zwvW~;(eFuX@w<QuU$3wVw^IK3ra9{s)&CnoiJ!J0%~y=q9~UYm@2P&Y>tUs8?fTZ0
z-^y7ZnXZ(N2F|Wml7g>taRZ)SYa#R~3)d_2XS+8|g~IPiN|W-WvPxO4Jf$R*7zq(M
zVPZ7&u2-7NT$&*G^Vf&!N<}4sxYR-&zO@e~z~qj%l+aa&|1SJK8kh_<mPg?Ph8*ct
zx<+yY;hYhS;T3-|U%TtHtLYu}w_i63jI9sWyCrer`&zejU6FSvla#+O5G_?20qHTI
z@+oXU(Ov{h!_X&`70OD~fa)<r$y4N=@1QQhFU$Y+KbwE#7xIotf3bYo(#FRhYw)oF
z?fGIsXnOLA6)T@wwR%RLyz}o5Wo%bFs0P1iIT@I~!0oGks67~Pcuwvn$LZRE?$a*(
z{h{{eadET88UP82Cf}S~Nfy9}`gHKyg1<oBEO>(z79nsokwAD^1AC7pf@Oj~FIF9#
zF9b$C2U-gqk-~z?@R7iuo?L~z4ZVWkIiSQ^i}NGJ*2?fn#8cjeR;%3om9j(r$>}0*
zmHG01U~>3C;Jl}&<hUUtcgHCdOXn#uZ}@;euhm+1IPj;0rza63dsZ*SFvX5~bm(K(
z7X>R|!j%@?|5s#Espmjtu)+#k%inqSKe2u4Y^$dyZBy&S@?QTm*4Jv!DYIMrLsd*G
z=`T-i{>XEL^*_04^-~G9FB2d;TMqbVppb+**NV)Wh3cyEi~fAMBS-GYFX{6SqmnR3
zi7j8o-YehtBPZl-;$fEDRenBjZPn_8`k0QW%duAboe{h9;n1k=PmPBITKXgysm0cG
zff9plWlq19^_3qFT=eu943;%0F}d3r9Ci6~gQK=UjyF9V9G&A|6db+Rf1l@a_=w-<
zIgp;B+L?Gjt)J5Gg)}0q>Wcp0HQVM-TQ2)8SKeoRFckXbZl8$MyIG&-ayp6njK|qn
zUpF;;x*hu7L6uw4aN!=m!{{NE=USzLa8KY0G!Yj<9~vxX3HENZ(Onw#yXUrCeraCo
zVgU^`gK6GqgVb`wZ;#Gbmy2v_MAjdX^lEFX6)k+K<#J$AXn(Om8@eKmZkcJ?-+r#^
zCA}?|Tk-m02jZiTNML7==D<i+5OJC+mIOXEN(uZdy$8OjX-^@a*!$f7jzz1bmLC6P
z$oo)aui!E>)CNvx3sYjI-Lq-*XZ<Zl<Oq$)Z{QZ>a2UU;-+c$WsKoy6%2oD<>eine
z!#Ejf&|7)}XW?0sdUNKePg&pstJ#T?3+&za*%{)yhd+P##k*mz+)*)!?YnZKwSKA|
zu}@-GBM5lQwXLUnbA-^KRrkB=>1G$AHSS_<H^$k}{tAOa?8@;s2!3iSdOq3ETjRc?
zR9)$wnodlx*{W`JPX*Q&*ij0x>ujsr9%IKnN3NV_f2z%xyG&<)WQp>tn@_V;6awYf
z{q=0PWP~N+=^0|;@HO^_x)<+6Z;;S0r?eL5MGEsG)4dPD(&64oU$Ar(mKJI91WQp*
z*fzBkMi!<IXKbrL`}@Gu{V;2E&q~&~XA{xZDd8h>_#i^;Q2JgahPiGQ8gvStZuJR~
zt#rF1Q*=b?$N+zvO5*<=;%c=B9T?OE0Z$h_QD(6#I65=Xr9N+IZ4eRkS9#6`N53nF
zgGh%{`-7vUa`;ocs8#P&SmkZ6Ac(#qhoeSdgWU2t0t?xQx<zb7mzo*qiEe;NXZ?4e
znr`x%Mz_0Hm^o(CHN&Qs4b&*`M+jj!!(+BL+i)zAszGW@t(BrHq3fi7YSe)q;Z|3e
z70e$~*2}J?iVWFD_25hT9RCm4JD#4_8EQj6M$u8*vvNr?R^2MiP<{P)-F%7lF{}=>
ztJi2*MGFO0nhpzHsS48`Vp6I$SzypCIQ|r)66h+xi_VhantBb5r^GsquKknX=-R+M
zR5)+3-109(YL#R<<}5hotld~R3DG;%8uwi7=x5}ePWAz;ei|y+wczL$|D53HM*s5Q
z=%zrc`r^c_nOvX1R?2lfbszGepq#~lx-UxZrlnOz{2p{&Q(UKz(M1ePy8b6tOokmV
zLn9Qdcp}|RixRp+gLWhp<KZKhBHF0BOWSB@D^4^t4<64$-NNpV@i4^x#yyN+S3gD9
zxSiTUh{EF#MDZBprm6LA(HQ8^&gUvyD|KV6JKnsXG+_>+axvgJlad?eTee+@?{jXu
zfYy+<4a9q#+Xg%bFc#Lkt?+6)8^}a<6=7{Pl(xY4C3hw+G;lp|9;H5+Abs%xXQ(ef
zlf)pDipQPXQqUx2%7Egdw^J<MQg3E#-s42!M!T~U%24+dys-;|a>oenzQOp1R=J}G
z#oOYxtAp`@B3qnI9t8DHEcgY=VF>p@4)U7qZI&7f#y_FB9^0F2j)oj89X~b6STSQI
zJGB67=8qA3IhV9q{PWleZJ*%`FMcgLjZ#$m)L7J#t+PdmS4q~;K5zNqKq{1(6=GPR
zzt1jQ+&}|ddcP9G9Na5+7s=gK*0Mxk2BxTKvC!2A{4?b4l@75|?ykwFgh^OnXr7ZL
z33i+&JdDZR31b~%GO@GXorc9VZcG&~Hp$(SA~o)u=mIh;l(c%zVi2>ct4HK+8P1VY
z))%OjyZbRz-UVHukq7*D$=!{UL`<^vIo7g+DDMc$J5q8GgR;CZl=RNu;Fbd2d(lC4
zriJ#~jnN$*YE7SvVGH7y;{%g&a;dneK=#pEW<1I(by1@!Ly6_^4fgx76!~?plm%+n
z5{9EYS7Y6gk?*3`y|@8xwK@?azsl16k9t%NY`T@Nl08`i3bYI6;DEBwPKPHJZrYud
zoU9dPO@-cD*-GuwJh;JyN!V~#+6S;vWoVD#t|#DT!#BC>`HZ{PyBj;<ZH7GiGSQWt
zv{53}H;QWUPn@=t?ff8nGyX}D?d{IHZX=lKEx$90?-kE6zq?5e|APg+tSFNuQ*pWF
z6~tnSUb9<pQExIFxfnxFh1P|Z3irq@wAgBf#7hh7Yvu43$afnA3^OlGByigfD~F%Q
zo~RQe6nb{1V%i`)+74#uIPp)jeLQI!GOSK!4GjMya<b9RDafmB!We7f&*I!`;6DR3
z$8W;_zKLHBU&qd=lco%Vswtc)+>gRfXAAGOT^wY+@zX`N53<F#_{i`(vU#ttGbSPI
zYF`40owUQ9L)(<Nmd|Rf$y#GUk=Y@C1M^Bvw6-*~Pcqj2NrAj3njc*uu{w!0S)&hI
zqbuyJ&d!>g(TAO^u5eMPrzo_quzV<RM3o*Clkrb;*o(6}X)2oNm8k0k*vI3ioVNGb
zA=|aVNId>wk?s9XOZ#u`Wf%1iwZW^pj&5Bay_-h4&?!svD7EzF;^s5-#C%kdOM#Y?
za@Yk<+8fAV&^B6`nhQ&_8YH*uMTN7?oyxijCbmWhc*lVK22!2VV6WNICN4;=#ENPf
zB<oC?h8s-qwi<X_^M2}I<~p;BjDjDo(e8smsqdqE#*(=LaBq3@zuuqBl@Jn9N;21q
z5M9Y!`&ejhCS-c5(O3U{o@&kx=6dE2<auvqDn`p1Is6S&j_ot5r=1?b&^f0FB_(r<
zGv)0izcf6QX@7on=(F{j-P8Y^DQfg~?ItHqWqbP=;3KH^xJM4L^F~u_a2IEnpy=*y
zgl7>HT7SIiG;A!)*l2Phb~}x8987b92)D_ZSoaU%3bdVOzo&{{{O9OoL)L>`+Wn!p
z#QN3ZzZ5U3#V4ZW(Pt!9hI5eUbJplHcDBXJzGsg<=VV))p?D$OxjYd#!KTeJY?z&~
zfz2yL=s!A+^!Xn!dcxQkGNBI)$4@(L=h^RU*E4<K{2p=^qa&ajd~zB7!&2FS<QrGe
zD>9yoIC59*PBRXEDu<teyOFOVjrzVeitMRI^1%u72&Vkv)M3mpMt7O$?y2P%sIZnD
z>joIhO<zwPgc5ea)Vhnb{vTR5;#}Z7J7dPSQS59iL}EC8Y&+i8(>Bniwe>5W9{HXg
z<;*qm=Ha%JZHcy1ZRxh7>{uw_YVawZlM2o+JVz9du6WKKYVQ0xe6?=vj~e8b9^|0;
z>oFzzQ}W>GJQ}vt7j=f?u;y#JC~w~hkuG6}O}N)r6#8fwn`OJZ868i5U~}$ndO*%%
zk;7}@H#p&{dGJ+*`D(ib^ude<8&a@8wFyz53;WeD?Nv(-XAWBru?UW11Qzq+SMG>S
z!-IrAfROsI82`}5>YUNK;*!G{16h+D4z&?}B(;kxjNw3u(fZ?Za31oBflXhq)U<5d
zBe2r2$2211gXny_dvgwI!eaG!>kZDuZ#(1#i@a?q<bx4X7{pMBhr(iYc$nt5R3h1P
zz;iNHg#g+IxkSsem0u`%jM1|XeSwM71d(n-zTE;Qo-T|eLTWlR#SClx-`tC}xd3ww
zQ4#3znpy+_19rN7ye6p!5j$2=rZsMv{Z7)4W>z=4cjF9Ph?ZS^Eox<l;%O=BlS&ST
z&N$Buz<Y{}!&RIgMklYqsX<Ac^<6pfwf<M>CBfo5*W#(xICY8?vDl{f1nn!9n&<6<
zv)Q`-&R=6X(U$xP-rTDlqDLY$5G>h`BRqR_UB9~Sk~xlJb^H+<*4~9gJe}wNn3fjt
zQ+evv;tKY{TPzW+-X9vV_&3Sh$rrFY+}BzyMHU{zc}#VmcJ7j){|BT$9=eW`@gUrc
zjfQmRrrgYU6@1~Cg)KfNho>M(n*|6s7?b|57cL%Md<<njf!v{VaQ;(o$>Hx1Orywv
z+FD&f1wCJ;ZtBB^;IC0uw6KNpv88?ForP4^u=dP&0>xg3o#JS6;Q*dOa^c7HkUJJ0
zQ(qXsF=PDVu4tgoql<xoDKC7?ey;`wC@%Fp3J#6r$Su?_qK{bF;y#=_v8BhN_fti4
zsf-Gv@D|kaXAsf?h%4`vLn`H1daQh(y#22cEa{If?kkT!`yA^RM=+`W7qrphW9axX
zuiR0miwEZ~JQlb#tMGwT9?I12r9E+S0+4;EQY}>PrUhKrUNj>c(e|dX3&vva7^I^n
zyjNWyk<4}Dm#!2Y1EC)tN$=8zFo?l_XiTBCzjdad>-cnmo~(X9uX_Z?_c7Kt^E2eK
z9iNQfAiFRmcN}g$_!*Mg?>|>r{&wI4Hgbam8<9KuGcu*;GX0*)*i9Nc4Lt{k_K*&C
z;_QNL!3XY;!)~+@f;}aE8%`qWSX7C<v1SBCj<AD6?fLA)IU}_9b1FKYoiSg#uhVM*
zyI{V)p_{Lx=Q_Vfc{4dDyqp<Np<H7r1q^w-W}Z%Q5}7vCS(DKbMCrR@FnmW06mf17
znMQ$D<F~9|4xd9BDu&NB?T9alquWz!UEI|f)DcPJ10sXu(}--Ja@uuzN4xd(*V;Fp
z-q|kUX~9!G{Wbhv!<&A3r>Cc@AEV8=c6--h<0+c`uzQUa^%_Ra-Y{0eBLs$Cy9f*$
zF<;E}3$dX^Nk)L!Fb%aVFvyJ6GHAZHi%``W%O+xxPYC)mi8iR&{RFC+8ZrEb9rW}V
z`@)ojTG4rQ!=CZxJu$S~Xx@X=ykl)j`_Jw*<VM3#?a_?}?OaO{Cx(m{BY}<(+NtTN
z%5a11nO|@2-vO?k%zXt9y5x@BpCO1Xz-;;TdA~pqOD0$A<eySxy?YKk8r$nxxUtcp
zF#zXjSWp?~d^pScDjo7o*Y}dy3Uc0uMYIz#cF0H&7rLnqIAycPPgz-}1DwGz1#Q|3
zy-H|Al(+4|RD(1K2HBaRE<#QlqOTyXg5t2fr+8`{S`&s3$Sozvav36`{~nY`$!{1f
zHL9Ft@k|Ws3-wuyZO-w@=-;G!w;^;;IjbxXMgMwbfix*l(R>Iy!5}-1betc|rhY(S
zEcgzZA){m@O+SYpJ1m=y!_Q)>y%^exYRcjNftpg#Du;iCAd)5A_Ee71uBZn#EmVHz
zFd@7!gcwXdM$9EzT^ccl6_D9DCIiZbjNg<&y$azA&W-*&Jf{3G(A<2;n>Bs6fUbel
zN7_t~`36STrwo}Uwmbz>;5)&sZhRJwtiW3=gtoF$gO#IE<(5ivO04wmgUnZPw7v&5
zp<4fvTn2~5#cUpVP3ztNgx{9S=Ey8})VrtW;6#u0!Br+3b$GLj+#%CogsVDY<cZu(
zuEucM!ZcGPa@=5hF8jWh+NWK|$P`cIutT9!GsHhCYyr8Y_H7zyqei$S8)@2tDo9y<
z$ht&sQBb|?X)n(4Ob@}?813QTgdW7$<62H=U<OQgp}unj|3bCis)s(b$bTCT#q%?H
zA^>_W^%G<Py9~*x;B*7JNf1c&ARgyMMK5)U@icu4!I_TLn3#=~y7n6^s*}R%`@gjs
zpGEuWLpZ*n3ZmMSpY+l=g!gI;Q`l<Y+uJwtBX#*z<wpaTL;I~b%f!ei1T$QzOPOQ7
zzt|WX0<(K_sl(E_sdk5P&SOtE$IM8PB0tB<I<((zd@-iw3(n>q$AZJ!d^I0O%5r!e
zeD2vz%~!3)+jhJ)U$r3|crd!LIO>(gCbkRB9~pC5zcH7Y2C?DLfMv2VSnX5J%CGmZ
zYHR6PR?vh~yymN-p>hZ8SDUXdp_yOY{7~uH;M>itAib#hs@+KcHPV}}PNpK~Pbe?i
zP<)F5qt(_fDsCwKQ&O~z6@(s{Et<s@kcu7u9u;y2lpE9Vxzu9n4@q+u?UkGuU9^Ax
zNY~|#AUQkgldQJmTdXTukh%w=@To7y0xQ{t&_@=t3}q;#!9l*%Fr#8;QKGTPh<&NN
z+)-VeVUNL)(5HeZht>*ccpM3QW&>K|NMI}^Ahde1Z#9k%5OU&HOhHlFOKF}Se7{^O
zi2VzjKd02d`B}O7cy<-y$N1ntyWxxfE165FzUsxk-rPG8l9IVacsy41=%Y9XK&a8>
za&@{8=>8ZB=iL~_@LwDkUrXlBMVufET<^qD$Bm~s6Hc$jcCsG~GMxVu87JfNFKt(_
zTa()!L^E~!pvj}tbOTK^xL9$fKf>N+e~hWi29vcT6rHRkrxki@2ZczL*jr^O!~!qe
z2siJo$`RR*CH8%*y5y|qi05QF7j>UfDIP8VUEs2~(k*#Iy@lnHJMb^~?qrQhZnZcV
zKV?z(RG?t>Uu*CiO4KXKJW)t4vOLsK=~9x5Cb?Yrgo?vy!(beIOLDiRN{pBPK5!W$
zEdDW}cao)5a(g3^J5nnu$y!UHX#96b&vkW~80xilqHg$>o6f^HlF9pC;Ib;om<^>)
z3&#2z&cMtgSN4u^euyssPTN%+kJ2}SK9=+xQA5<Q2A<E80yRXhR}vz=+*Z3gQa35$
z%Ts?W;*8FQL-9vKqZa>M^<~khAvsVdWq>ur#;6Bu5%IZ0t+rzO;df!Z8#MHfrhlo#
zz%(cID>xx+@Ac+c(!V8i$q;FhpvkBxhJ%-FMginQoz!|0I6=S3DH)<<T1ptXbWweI
zijj*PEt_i+eQzJL>uM-rRoLC+t2igf`Y|<E;Jny3EB7vhFcunX!9DdTx0@RA4HC$@
zk#XjLl!+Jq^MZ!h8!zOd3kC{Az_<)oq-`u+G*HrIzL`l`Jw~of3jVRfyy$?_h%5Dc
z>;Ad({_=sc^6xzT-L>nK(g_#I(CG(V;*TE}#I08Gt9D6>K&20HSUL!oPU`wj5~y-m
zTP#%$`}UkFhjW`$<tPm0Rym3+oAVuoRSAc)s@Gw!N;<4nZ#tx!f=6zSq{X)&Y{Xsj
zZ@Nxr?Q~IG7<V1wJQRWJb~cecbl)zWbWwWUA9S6Lis*9TF2!AsxO?yvDjn8smP&^`
zl(Q<~uvYduBy8XXp)n(4<bzDe(vU!sk)LcPo%iBN`9qe~`ADoOs5L~4I=q4;A7rMI
z46Zud|Ad?3)=@+)@k6=Wb1I2nCGoQ?fgUGpuvdayCCF74EW9BGSzWj<;obmUU&m8e
zI97)zvvqJIx<~|H$b6e*VCQx!Z(B}delL_KNjV)>2!a=KVaU-3KeIl|jf#+tm6^0H
z2;Yt~=^)O>pyO^lO&u@=ynJ{q!+jiS#y=s!j+^RT?I?W#Zc}+fRci_?>u!!k+d<#o
z;I8;U*Z&nyD(@CLy_SHZuEtSc<M1uHSxuG08_I35Y?HR<kiLJzy#xAw{^#`l;Lqr*
z8Q15Bj@jTg2z`F)qPMosXSsBXN>>(G-gGq8I9Ap;)-*VLYqH@b4qsz7jQEH1voDW3
zR#rNis?e6y2D?zMHlXV+goSE{bCILC(&1{PrVwW1-k(Wtpmf+3veY_m*RI+AWB*Xs
zDL-8<+|++QLmAYzrjBZK{D<LGrvJMT-H3Y@@-^W(F7x=9Oa-FJr&dDdNJi!sZCGk~
z{T7T+S*lAX?qTHrt4#h(KmH*)k<pQB3Tm?5L7?(9lEVrtl##zhc`nNjZ&4mP3bB|4
zb~VU9ajdF9Tpi-F3}->kmx-I7DQCL{eAeWfovymzF*9JY+zQzy<R@O^oR*AnB7Hj`
z6M+ncNygCkSVmt}0STiw!#*Ux1=%oUzd2F1AtPHeUbYwMBN@G$aI8XlZ{ktw2v+8-
z$;sC3#yOz|*~sTEX}W#|x&+khCLNV<jo;qf{T4Zjvfa<nu@>2HS5DRs+s~MszfA78
zkjwfj`3d>!F2qe9>x)&iRW`00>oga!RHyKuu0Kr@x8h=1al=Suj!BIWZ%4ilh{dh)
zegDScy}BSr7H^ECu565%yYTd$({(!DG27i3zcF8gB${z<R|oQSs>2%O{UNQgZe>fg
z!<Tc;atmj#E^r~sO5CrU*Y!v6r2HZHu+xHxCez1Be-QWWY@jipWCo#QAj2cmKViS+
z>3qy_x650R$s4<<>(#fn-?h&F-EXcd`&Ow?x<#O{|2t1_ST|?GfBVkbbw3gwZ>Vwk
z8XtE-7r!_GPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(8
z6W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;Z
zH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULas
zfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O
z1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1U
zPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu
z-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m
z0Z!od1Ux-$$J=_^2HLc?{{JUzm0dl`OkLOi(JspO^xUV&;+-j7IrD2oSp}ujDU3^>
o5d@0NUb>FZ&)-2Do-dnE`R8)xT^9bc5QmajO4XIv_+RY*0|l%Fi~s-t

literal 0
HcmV?d00001

diff --git a/roms/Makefile b/roms/Makefile
index 775c963f9d..47eabc8633 100644
--- a/roms/Makefile
+++ b/roms/Makefile
@@ -67,6 +67,7 @@ default:
 	@echo "  opensbi32-virt     -- update OpenSBI for 32-bit virt machine"
 	@echo "  opensbi64-virt     -- update OpenSBI for 64-bit virt machine"
 	@echo "  opensbi64-sifive_u -- update OpenSBI for 64-bit sifive_u machine"
+	@echo "  bios-microvm       -- update bios-microvm.bin (qboot)"
 	@echo "  clean              -- delete the files generated by the previous" \
 	                              "build targets"
 
@@ -185,6 +186,10 @@ opensbi64-sifive_u:
 		PLATFORM="qemu/sifive_u"
 	cp opensbi/build/platform/qemu/sifive_u/firmware/fw_jump.bin ../pc-bios/opensbi-riscv64-sifive_u-fw_jump.bin
 
+bios-microvm:
+	$(MAKE) -C qboot
+	cp qboot/bios.bin ../pc-bios/bios-microvm.bin
+
 clean:
 	rm -rf seabios/.config seabios/out seabios/builds
 	$(MAKE) -C sgabios clean
@@ -197,3 +202,4 @@ clean:
 	$(MAKE) -C skiboot clean
 	$(MAKE) -f Makefile.edk2 clean
 	$(MAKE) -C opensbi clean
+	$(MAKE) -C qboot clean
diff --git a/roms/qboot b/roms/qboot
new file mode 160000
index 0000000000..cb1c49e0cf
--- /dev/null
+++ b/roms/qboot
@@ -0,0 +1 @@
+Subproject commit cb1c49e0cfac99b9961d136ac0194da62c28cf64
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 133+ messages in thread

* [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-24 12:44 ` Sergio Lopez
@ 2019-09-24 12:44   ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-24 12:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, philmd,
	lersek, kraxel, mtosatti, kvm, Sergio Lopez

Document the new microvm machine type.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 docs/microvm.txt | 78 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)
 create mode 100644 docs/microvm.txt

diff --git a/docs/microvm.txt b/docs/microvm.txt
new file mode 100644
index 0000000000..0241226b2a
--- /dev/null
+++ b/docs/microvm.txt
@@ -0,0 +1,78 @@
+Microvm is a machine type inspired by both NEMU and Firecracker, and
+constructed after the machine model implemented by the latter.
+
+Its main purpose is to provide users with a minimalist machine type free
+from the burden of legacy compatibility, serving as a stepping stone
+for future projects aiming at improving boot times, reducing the
+attack surface and slimming down QEMU's footprint.
+
+The microvm machine type supports the following devices:
+
+ - ISA bus
+ - i8259 PIC
+ - LAPIC (implicit if using KVM)
+ - IOAPIC (defaults to kernel_irqchip_split = true)
+ - i8254 PIT
+ - MC146818 RTC (optional)
+ - kvmclock (if using KVM)
+ - fw_cfg
+ - One ISA serial port (optional)
+ - Up to eight virtio-mmio devices (configured by the user)
+
+It supports the following machine-specific options:
+
+microvm.option-roms=bool (Set off to disable loading option ROMs)
+microvm.isa-serial=bool (Set off to disable the instantiation of an ISA serial port)
+microvm.rtc=bool (Set off to disable the instantiation of an MC146818 RTC)
+microvm.kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
+
+By default, microvm uses qboot as its BIOS, to obtain better boot
+times, but it's also compatible with SeaBIOS.
+
+As no current firmware is able to boot from a block device using virtio-mmio
+as its transport, a microvm-based VM needs to be run using a host-side
+kernel and, optionally, an initrd image.
+
+This is an example of instantiating a microvm VM with a virtio-mmio
+based console:
+
+qemu-system-x86_64 -M microvm \
+ -enable-kvm -cpu host -m 512m -smp 2 \
+ -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
+ -nodefaults -no-user-config -nographic \
+ -chardev stdio,id=virtiocon0,server \
+ -device virtio-serial-device \
+ -device virtconsole,chardev=virtiocon0 \
+ -drive id=test,file=test.img,format=raw,if=none \
+ -device virtio-blk-device,drive=test \
+ -netdev tap,id=tap0,script=no,downscript=no \
+ -device virtio-net-device,netdev=tap0
+
+This is another example, this time using an ISA serial port, useful
+for debugging purposes:
+
+qemu-system-x86_64 -M microvm \
+ -enable-kvm -cpu host -m 512m -smp 2 \
+ -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/vda" \
+ -nodefaults -no-user-config -nographic \
+ -serial stdio \
+ -drive id=test,file=test.img,format=raw,if=none \
+ -device virtio-blk-device,drive=test \
+ -netdev tap,id=tap0,script=no,downscript=no \
+ -device virtio-net-device,netdev=tap0
+
+Finally, in this example a microvm VM is instantiated without RTC,
+without an ISA serial port and without loading the option ROMs,
+obtaining the smallest configuration:
+
+qemu-system-x86_64 -M microvm,rtc=off,isa-serial=off,option-roms=off \
+ -enable-kvm -cpu host -m 512m -smp 2 \
+ -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
+ -nodefaults -no-user-config -nographic \
+ -chardev stdio,id=virtiocon0,server \
+ -device virtio-serial-device \
+ -device virtconsole,chardev=virtiocon0 \
+ -drive id=test,file=test.img,format=raw,if=none \
+ -device virtio-blk-device,drive=test \
+ -netdev tap,id=tap0,script=no,downscript=no \
+ -device virtio-net-device,netdev=tap0
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 133+ messages in thread

* [PATCH v4 8/8] hw/i386: Introduce the microvm machine type
  2019-09-24 12:44 ` Sergio Lopez
@ 2019-09-24 12:44   ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-24 12:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, philmd,
	lersek, kraxel, mtosatti, kvm, Sergio Lopez

Microvm is a machine type inspired by both NEMU and Firecracker, and
constructed after the machine model implemented by the latter.

Its main purpose is to provide users with a minimalist machine type free
from the burden of legacy compatibility, serving as a stepping stone
for future projects aiming at improving boot times, reducing the
attack surface and slimming down QEMU's footprint.

The microvm machine type supports the following devices:

 - ISA bus
 - i8259 PIC
 - LAPIC (implicit if using KVM)
 - IOAPIC (defaults to kernel_irqchip_split = true)
 - i8254 PIT
 - MC146818 RTC (optional)
 - kvmclock (if using KVM)
 - fw_cfg
 - One ISA serial port (optional)
 - Up to eight virtio-mmio devices (configured by the user)
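
When the optional MC146818 RTC is enabled, microvm_set_rtc() in this patch also stores the guest memory layout in CMOS registers, where legacy firmware expects to find it. The standalone sketch below reproduces that arithmetic outside QEMU so the encoding can be checked by hand. SketchRtc, sketch_set() and cmos_encode_memory() are hypothetical stand-ins for the RTC device and rtc_set_memory(), which stores one byte per register.

```c
#include <stdint.h>

#define SKETCH_KiB 1024ULL
#define SKETCH_MiB (1024ULL * SKETCH_KiB)

/* Hypothetical stand-in for the RTC's CMOS array: one byte per register. */
typedef struct {
    uint8_t cmos[0x80];
} SketchRtc;

static void sketch_set(SketchRtc *rtc, int addr, uint64_t val)
{
    rtc->cmos[addr] = (uint8_t)val;   /* only the low byte is stored */
}

static uint64_t cap16(uint64_t v)
{
    return v > 65535 ? 65535 : v;
}

/* Encode below/above-4G RAM sizes following microvm_set_rtc(). */
static void cmos_encode_memory(SketchRtc *rtc, uint64_t below_4g,
                               uint64_t above_4g)
{
    uint64_t val;

    /* 0x15/0x16: base memory in KiB, capped at 640 KiB */
    val = below_4g / SKETCH_KiB;
    if (val > 640) {
        val = 640;
    }
    sketch_set(rtc, 0x15, val);
    sketch_set(rtc, 0x16, val >> 8);

    /* 0x17/0x18 and 0x30/0x31: KiB above 1 MiB, capped at 65535 */
    val = below_4g > SKETCH_MiB ? cap16((below_4g - SKETCH_MiB) / SKETCH_KiB) : 0;
    sketch_set(rtc, 0x17, val);
    sketch_set(rtc, 0x18, val >> 8);
    sketch_set(rtc, 0x30, val);
    sketch_set(rtc, 0x31, val >> 8);

    /* 0x34/0x35: 64 KiB blocks between 16 MiB and 4 GiB, capped at 65535 */
    val = below_4g > 16 * SKETCH_MiB
          ? cap16((below_4g - 16 * SKETCH_MiB) / (64 * SKETCH_KiB)) : 0;
    sketch_set(rtc, 0x34, val);
    sketch_set(rtc, 0x35, val >> 8);

    /* 0x5b..0x5d: 64 KiB blocks above 4 GiB, 24 bits wide */
    val = above_4g / (64 * SKETCH_KiB);
    sketch_set(rtc, 0x5b, val);
    sketch_set(rtc, 0x5c, val >> 8);
    sketch_set(rtc, 0x5d, val >> 16);
}
```

Take a 512 MiB guest with all of its RAM below 4 GiB. Registers 0x15/0x16 hold 640 (0x0280) and 0x17/0x18 saturate at 0xffff. Registers 0x34/0x35 hold 0x1f00, which is 496 MiB counted in 64 KiB blocks.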

It supports the following machine-specific options:

microvm.option-roms=bool (Set off to disable loading option ROMs)
microvm.isa-serial=bool (Set off to disable the instantiation of an ISA serial port)
microvm.rtc=bool (Set off to disable the instantiation of an MC146818 RTC)
microvm.kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
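
virtio-mmio transports sit on no discoverable bus. With microvm.kernel-cmdline left on, the machine therefore describes them on the guest kernel command line, which Linux parses through its virtio_mmio.device=<size>@<baseaddr>:<irq> parameter.

The sketch below shows one way to generate such entries for a caller-supplied set of in-use transports. This is an assumption: the patch's own selection logic is not quoted here. SKETCH_MMIO_BASE and SKETCH_IRQ_BASE are assumed values, because the real constants live in include/hw/i386/microvm.h. The 512-byte window per transport and the consecutive IRQs follow the VIRTIO_MMIO_BASE + i * 512 and VIRTIO_IRQ_BASE + i pattern in microvm_devices_init().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed values for illustration only; not taken from microvm.h. */
#define SKETCH_MMIO_BASE      0xc0000000ULL
#define SKETCH_IRQ_BASE       5
#define SKETCH_NUM_TRANSPORTS 8
#define SKETCH_MMIO_SIZE      512   /* matches the i * 512 stride */

/*
 * Append " virtio_mmio.device=<size>@<base>:<irq>" for every transport
 * marked in_use. buf must already be NUL-terminated. Returns false if
 * buf is too small; buf then holds a truncated command line.
 */
static bool append_virtio_mmio_args(char *buf, size_t buflen,
                                    const bool in_use[SKETCH_NUM_TRANSPORTS])
{
    size_t len = strlen(buf);

    for (int i = 0; i < SKETCH_NUM_TRANSPORTS; i++) {
        if (!in_use[i]) {
            continue;
        }
        uint64_t base = SKETCH_MMIO_BASE + (uint64_t)i * SKETCH_MMIO_SIZE;
        int n = snprintf(buf + len, buflen - len,
                         " virtio_mmio.device=%d@0x%llx:%d",
                         SKETCH_MMIO_SIZE, (unsigned long long)base,
                         SKETCH_IRQ_BASE + i);
        if (n < 0 || (size_t)n >= buflen - len) {
            return false;   /* output did not fit */
        }
        len += (size_t)n;
    }
    return true;
}
```

Suppose transports 0 and 2 are in use and the base command line is "console=hvc0". The function then appends " virtio_mmio.device=512@0xc0000000:5 virtio_mmio.device=512@0xc0000400:7", given the assumed base address and IRQ.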

By default, microvm uses qboot as its BIOS, to obtain better boot
times, but it's also compatible with SeaBIOS.

As no current firmware is able to boot from a block device using virtio-mmio
as its transport, a microvm-based VM needs to be run using a host-side
kernel and, optionally, an initrd image.

This is an example of instantiating a microvm VM with a virtio-mmio
based console:

qemu-system-x86_64 -M microvm \
 -enable-kvm -cpu host -m 512m -smp 2 \
 -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
 -nodefaults -no-user-config -nographic \
 -chardev stdio,id=virtiocon0,server \
 -device virtio-serial-device \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0

This is another example, this time using an ISA serial port, useful
for debugging purposes:

qemu-system-x86_64 -M microvm \
 -enable-kvm -cpu host -m 512m -smp 2 \
 -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/vda" \
 -nodefaults -no-user-config -nographic \
 -serial stdio \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0

Finally, in this example a microvm VM is instantiated without RTC,
without an ISA serial port and without loading the option ROMs,
obtaining the smallest configuration:

qemu-system-x86_64 -M microvm,rtc=off,isa-serial=off,option-roms=off \
 -enable-kvm -cpu host -m 512m -smp 2 \
 -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
 -nodefaults -no-user-config -nographic \
 -chardev stdio,id=virtiocon0,server \
 -device virtio-serial-device \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 default-configs/i386-softmmu.mak |   1 +
 hw/i386/Kconfig                  |   4 +
 hw/i386/Makefile.objs            |   1 +
 hw/i386/microvm.c                | 512 +++++++++++++++++++++++++++++++
 include/hw/i386/microvm.h        |  80 +++++
 5 files changed, 598 insertions(+)
 create mode 100644 hw/i386/microvm.c
 create mode 100644 include/hw/i386/microvm.h
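
microvm_memory_init() in this patch reuses the PC-style split of guest RAM. If RAM reaches 0xb0000000 (2.75 GiB), only the first 2 GiB stay below 4 GiB. Otherwise everything up to 0xb0000000 stays below, and the user's max-ram-below-4g can lower that limit further. The below-4G part is mapped at guest physical address 0 and the remainder at 0x100000000.

The sketch below models only that arithmetic. It uses a hypothetical split_ram() helper and omits the poor-performance warning. It may help when picking -m values.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    uint64_t below_4g;   /* mapped at guest physical 0 */
    uint64_t above_4g;   /* mapped at guest physical 4 GiB */
} RamSplit;

/*
 * Mirrors the lowmem logic of microvm_memory_init().
 * max_ram_below_4g == 0 means the option was not set.
 */
static RamSplit split_ram(uint64_t ram_size, uint64_t max_ram_below_4g)
{
    uint64_t lowmem = ram_size >= 0xb0000000ULL ? 0x80000000ULL
                                                : 0xb0000000ULL;
    RamSplit s;

    if (max_ram_below_4g == 0) {
        max_ram_below_4g = 1ULL << 32;   /* default: 4 GiB */
    }
    if (lowmem > max_ram_below_4g) {
        lowmem = max_ram_below_4g;       /* min(QEMU limit, user limit) */
    }

    if (ram_size > lowmem) {
        s.below_4g = lowmem;
        s.above_4g = ram_size - lowmem;
    } else {
        s.below_4g = ram_size;
        s.above_4g = 0;
    }
    return s;
}
```

For example, -m 4G yields 2 GiB below and 2 GiB above 4 GiB, while -m 512m keeps all RAM below 4 GiB.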

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index cd5ea391e8..c27cdd98e9 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -26,3 +26,4 @@ CONFIG_ISAPC=y
 CONFIG_I440FX=y
 CONFIG_Q35=y
 CONFIG_ACPI_PCI=y
+CONFIG_MICROVM=y
\ No newline at end of file
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index 6350438036..324e193dd8 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -88,6 +88,10 @@ config Q35
     select SMBIOS
     select FW_CFG_DMA
 
+config MICROVM
+    bool
+    select VIRTIO_MMIO
+
 config VTD
     bool
 
diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 5b4b3a672e..bb17d54567 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -6,6 +6,7 @@ obj-y += pc.o
 obj-y += e820.o
 obj-$(CONFIG_I440FX) += pc_piix.o
 obj-$(CONFIG_Q35) += pc_q35.o
+obj-$(CONFIG_MICROVM) += microvm.o
 obj-y += fw_cfg.o pc_sysfw.o
 obj-y += x86-iommu.o
 obj-$(CONFIG_VTD) += intel_iommu.o
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
new file mode 100644
index 0000000000..4b494a1b27
--- /dev/null
+++ b/hw/i386/microvm.c
@@ -0,0 +1,512 @@
+/*
+ * Copyright (c) 2018 Intel Corporation
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/cutils.h"
+#include "qemu/units.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/cpus.h"
+#include "sysemu/numa.h"
+#include "sysemu/reset.h"
+
+#include "hw/loader.h"
+#include "hw/irq.h"
+#include "hw/nmi.h"
+#include "hw/kvm/clock.h"
+#include "hw/i386/microvm.h"
+#include "hw/i386/x86.h"
+#include "hw/i386/pc.h"
+#include "target/i386/cpu.h"
+#include "hw/timer/i8254.h"
+#include "hw/timer/mc146818rtc.h"
+#include "hw/char/serial.h"
+#include "hw/i386/topology.h"
+#include "hw/i386/e820.h"
+#include "hw/i386/fw_cfg.h"
+#include "hw/virtio/virtio-mmio.h"
+
+#include "cpu.h"
+#include "elf.h"
+#include "pvh.h"
+#include "kvm_i386.h"
+#include "hw/xen/start_info.h"
+
+#define MICROVM_BIOS_FILENAME "bios-microvm.bin"
+
+static void microvm_set_rtc(MicrovmMachineState *mms, ISADevice *s)
+{
+    X86MachineState *x86ms = X86_MACHINE(mms);
+    int val;
+
+    val = MIN(x86ms->below_4g_mem_size / KiB, 640);
+    rtc_set_memory(s, 0x15, val);
+    rtc_set_memory(s, 0x16, val >> 8);
+    /* extended memory (next 64MiB) */
+    if (x86ms->below_4g_mem_size > 1 * MiB) {
+        val = (x86ms->below_4g_mem_size - 1 * MiB) / KiB;
+    } else {
+        val = 0;
+    }
+    if (val > 65535) {
+        val = 65535;
+    }
+    rtc_set_memory(s, 0x17, val);
+    rtc_set_memory(s, 0x18, val >> 8);
+    rtc_set_memory(s, 0x30, val);
+    rtc_set_memory(s, 0x31, val >> 8);
+    /* memory between 16MiB and 4GiB */
+    if (x86ms->below_4g_mem_size > 16 * MiB) {
+        val = (x86ms->below_4g_mem_size - 16 * MiB) / (64 * KiB);
+    } else {
+        val = 0;
+    }
+    if (val > 65535) {
+        val = 65535;
+    }
+    rtc_set_memory(s, 0x34, val);
+    rtc_set_memory(s, 0x35, val >> 8);
+    /* memory above 4GiB */
+    val = x86ms->above_4g_mem_size / 65536;
+    rtc_set_memory(s, 0x5b, val);
+    rtc_set_memory(s, 0x5c, val >> 8);
+    rtc_set_memory(s, 0x5d, val >> 16);
+}
+
+static void microvm_devices_init(MicrovmMachineState *mms)
+{
+    X86MachineState *x86ms = X86_MACHINE(mms);
+    ISABus *isa_bus;
+    ISADevice *rtc_state;
+    GSIState *gsi_state;
+    qemu_irq *i8259;
+    int i;
+
+    gsi_state = g_malloc0(sizeof(*gsi_state));
+    x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
+
+    isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
+                          &error_abort);
+    isa_bus_irqs(isa_bus, x86ms->gsi);
+
+    i8259 = i8259_init(isa_bus, pc_allocate_cpu_irq());
+
+    for (i = 0; i < ISA_NUM_IRQS; i++) {
+        gsi_state->i8259_irq[i] = i8259[i];
+    }
+
+    ioapic_init_gsi(gsi_state, "machine");
+
+    if (mms->rtc_enabled) {
+        rtc_state = mc146818_rtc_init(isa_bus, 2000, NULL);
+        microvm_set_rtc(mms, rtc_state);
+    }
+
+    if (kvm_pit_in_kernel()) {
+        kvm_pit_init(isa_bus, 0x40);
+    } else {
+        i8254_pit_init(isa_bus, 0x40, 0, NULL);
+    }
+
+    kvmclock_create();
+
+    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
+        int nirq = VIRTIO_IRQ_BASE + i;
+        ISADevice *isadev = isa_create(isa_bus, TYPE_ISA_SERIAL);
+        qemu_irq mmio_irq;
+
+        isa_init_irq(isadev, &mmio_irq, nirq);
+        sysbus_create_simple("virtio-mmio",
+                             VIRTIO_MMIO_BASE + i * 512,
+                             x86ms->gsi[VIRTIO_IRQ_BASE + i]);
+    }
+
+    g_free(i8259);
+
+    if (mms->isa_serial_enabled) {
+        serial_hds_isa_init(isa_bus, 0, 1);
+    }
+
+    if (bios_name == NULL) {
+        bios_name = MICROVM_BIOS_FILENAME;
+    }
+    x86_system_rom_init(get_system_memory(), true);
+}
+
+static void microvm_memory_init(MicrovmMachineState *mms)
+{
+    MachineState *machine = MACHINE(mms);
+    X86MachineState *x86ms = X86_MACHINE(mms);
+    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
+    MemoryRegion *system_memory = get_system_memory();
+    FWCfgState *fw_cfg;
+    ram_addr_t lowmem;
+    int i;
+
+    /*
+     * Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
+     * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
+     * also known as MMCFG).
+     * If it doesn't, we need to split it in chunks below and above 4G.
+     * In any case, try to make sure that guest addresses aligned at
+     * 1G boundaries get mapped to host addresses aligned at 1G boundaries.
+     */
+    if (machine->ram_size >= 0xb0000000) {
+        lowmem = 0x80000000;
+    } else {
+        lowmem = 0xb0000000;
+    }
+
+    /*
+     * Handle the machine opt max-ram-below-4g.  It is basically doing
+     * min(qemu limit, user limit).
+     */
+    if (!x86ms->max_ram_below_4g) {
+        x86ms->max_ram_below_4g = 1ULL << 32; /* default: 4G */
+    }
+    if (lowmem > x86ms->max_ram_below_4g) {
+        lowmem = x86ms->max_ram_below_4g;
+        if (machine->ram_size - lowmem > lowmem &&
+            lowmem & (1 * GiB - 1)) {
+            warn_report("There is possibly poor performance as the ram size"
+                        " (0x%" PRIx64 ") is more than twice the size of"
+                        " max-ram-below-4g (%"PRIu64") and"
+                        " max-ram-below-4g is not a multiple of 1G.",
+                        (uint64_t)machine->ram_size, x86ms->max_ram_below_4g);
+        }
+    }
+
+    if (machine->ram_size > lowmem) {
+        x86ms->above_4g_mem_size = machine->ram_size - lowmem;
+        x86ms->below_4g_mem_size = lowmem;
+    } else {
+        x86ms->above_4g_mem_size = 0;
+        x86ms->below_4g_mem_size = machine->ram_size;
+    }
+
+    ram = g_malloc(sizeof(*ram));
+    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
+                                         machine->ram_size);
+
+    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
+    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
+                             0, x86ms->below_4g_mem_size);
+    memory_region_add_subregion(system_memory, 0, ram_below_4g);
+
+    e820_add_entry(0, x86ms->below_4g_mem_size, E820_RAM);
+
+    if (x86ms->above_4g_mem_size > 0) {
+        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
+        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
+                                 x86ms->below_4g_mem_size,
+                                 x86ms->above_4g_mem_size);
+        memory_region_add_subregion(system_memory, 0x100000000ULL,
+                                    ram_above_4g);
+        e820_add_entry(0x100000000ULL, x86ms->above_4g_mem_size, E820_RAM);
+    }
+
+    fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4,
+                                &address_space_memory);
+
+    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
+    fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)x86ms->apic_id_limit);
+    fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)machine->ram_size);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_IRQ0_OVERRIDE, kvm_allows_irq0_override());
+
+    rom_set_fw(fw_cfg);
+
+    e820_create_fw_entry(fw_cfg);
+
+    load_linux(x86ms, fw_cfg, 0, true, true);
+
+    if (mms->option_roms_enabled) {
+        for (i = 0; i < nb_option_roms; i++) {
+            rom_add_option(option_rom[i].name, option_rom[i].bootindex);
+        }
+    }
+
+    x86ms->fw_cfg = fw_cfg;
+    x86ms->ioapic_as = &address_space_memory;
+}
+
+static gchar *microvm_get_mmio_cmdline(gchar *name)
+{
+    gchar *cmdline;
+    gchar *separator;
+    long int index;
+    int ret;
+
+    separator = g_strrstr(name, ".");
+    if (!separator) {
+        return NULL;
+    }
+
+    if (qemu_strtol(separator + 1, NULL, 10, &index) != 0) {
+        return NULL;
+    }
+
+    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
+    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
+                     " virtio_mmio.device=512@0x%lx:%ld",
+                     VIRTIO_MMIO_BASE + index * 512,
+                     VIRTIO_IRQ_BASE + index);
+    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
+        g_free(cmdline);
+        return NULL;
+    }
+
+    return cmdline;
+}
+
+static void microvm_fix_kernel_cmdline(MachineState *machine)
+{
+    X86MachineState *x86ms = X86_MACHINE(machine);
+    BusState *bus;
+    BusChild *kid;
+    char *cmdline;
+
+    /*
+     * Find MMIO transports with attached devices, and add them to the kernel
+     * command line.
+     *
+     * Yes, this is a hack, but one that heavily improves the UX without
+     * introducing any significant issues.
+     */
+    cmdline = g_strdup(machine->kernel_cmdline);
+    bus = sysbus_get_default();
+    QTAILQ_FOREACH(kid, &bus->children, sibling) {
+        DeviceState *dev = kid->child;
+        ObjectClass *class = object_get_class(OBJECT(dev));
+
+        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
+            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
+            VirtioBusState *mmio_virtio_bus = &mmio->bus;
+            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
+
+            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
+                gchar *mmio_cmdline = microvm_get_mmio_cmdline(mmio_bus->name);
+                if (mmio_cmdline) {
+                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
+                    g_free(mmio_cmdline);
+                    g_free(cmdline);
+                    cmdline = newcmd;
+                }
+            }
+        }
+    }
+
+    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
+    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
+}
+
+static void microvm_machine_state_init(MachineState *machine)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
+    X86MachineState *x86ms = X86_MACHINE(machine);
+    Error *local_err = NULL;
+
+    if (machine->kernel_filename == NULL) {
+        error_report("missing kernel image file name, required by microvm");
+        exit(1);
+    }
+
+    microvm_memory_init(mms);
+
+    x86_cpus_init(x86ms, CPU_VERSION_LATEST);
+    if (local_err) {
+        error_report_err(local_err);
+        exit(1);
+    }
+
+    microvm_devices_init(mms);
+}
+
+static void microvm_machine_reset(MachineState *machine)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
+    CPUState *cs;
+    X86CPU *cpu;
+
+    if (mms->kernel_cmdline_enabled && !mms->kernel_cmdline_fixed) {
+        microvm_fix_kernel_cmdline(machine);
+        mms->kernel_cmdline_fixed = true;
+    }
+
+    qemu_devices_reset();
+
+    CPU_FOREACH(cs) {
+        cpu = X86_CPU(cs);
+
+        if (cpu->apic_state) {
+            device_reset(cpu->apic_state);
+        }
+    }
+}
+
+static bool microvm_machine_get_rtc(Object *obj, Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    return mms->rtc_enabled;
+}
+
+static void microvm_machine_set_rtc(Object *obj, bool value, Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    mms->rtc_enabled = value;
+}
+
+static bool microvm_machine_get_isa_serial(Object *obj, Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    return mms->isa_serial_enabled;
+}
+
+static void microvm_machine_set_isa_serial(Object *obj, bool value,
+                                           Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    mms->isa_serial_enabled = value;
+}
+
+static bool microvm_machine_get_option_roms(Object *obj, Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    return mms->option_roms_enabled;
+}
+
+static void microvm_machine_set_option_roms(Object *obj, bool value,
+                                            Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    mms->option_roms_enabled = value;
+}
+
+static bool microvm_machine_get_kernel_cmdline(Object *obj, Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    return mms->kernel_cmdline_enabled;
+}
+
+static void microvm_machine_set_kernel_cmdline(Object *obj, bool value,
+                                               Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    mms->kernel_cmdline_enabled = value;
+}
+
+static void microvm_machine_initfn(Object *obj)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    /* Configuration */
+    mms->rtc_enabled = true;
+    mms->isa_serial_enabled = true;
+    mms->option_roms_enabled = true;
+    mms->kernel_cmdline_enabled = true;
+
+    /* State */
+    mms->kernel_cmdline_fixed = false;
+}
+
+static void microvm_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+    NMIClass *nc = NMI_CLASS(oc);
+
+    mc->init = microvm_machine_state_init;
+
+    mc->family = "microvm_i386";
+    mc->desc = "Microvm (i386)";
+    mc->units_per_default_bus = 1;
+    mc->no_floppy = 1;
+    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
+    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
+    mc->max_cpus = 288;
+    mc->has_hotpluggable_cpus = false;
+    mc->auto_enable_numa_with_memhp = false;
+    mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
+    mc->nvdimm_supported = false;
+
+    /* Avoid relying too much on kernel components */
+    mc->default_kernel_irqchip_split = true;
+
+    /* Machine class handlers */
+    mc->reset = microvm_machine_reset;
+
+    /* NMI handler */
+    nc->nmi_monitor_handler = x86_nmi;
+
+    object_class_property_add_bool(oc, MICROVM_MACHINE_RTC,
+                                   microvm_machine_get_rtc,
+                                   microvm_machine_set_rtc,
+                                   &error_abort);
+    object_class_property_set_description(oc, MICROVM_MACHINE_RTC,
+        "Set off to disable the instantiation of an MC146818 RTC",
+        &error_abort);
+
+    object_class_property_add_bool(oc, MICROVM_MACHINE_ISA_SERIAL,
+                                   microvm_machine_get_isa_serial,
+                                   microvm_machine_set_isa_serial,
+                                   &error_abort);
+    object_class_property_set_description(oc, MICROVM_MACHINE_ISA_SERIAL,
+        "Set off to disable the instantiation of an ISA serial port",
+        &error_abort);
+
+    object_class_property_add_bool(oc, MICROVM_MACHINE_OPTION_ROMS,
+                                   microvm_machine_get_option_roms,
+                                   microvm_machine_set_option_roms,
+                                   &error_abort);
+    object_class_property_set_description(oc, MICROVM_MACHINE_OPTION_ROMS,
+        "Set off to disable loading option ROMs", &error_abort);
+
+    object_class_property_add_bool(oc, MICROVM_MACHINE_KERNEL_CMDLINE,
+                                   microvm_machine_get_kernel_cmdline,
+                                   microvm_machine_set_kernel_cmdline,
+                                   &error_abort);
+    object_class_property_set_description(oc, MICROVM_MACHINE_KERNEL_CMDLINE,
+        "Set off to disable adding virtio-mmio devices to the kernel cmdline",
+        &error_abort);
+}
+
+static const TypeInfo microvm_machine_info = {
+    .name          = TYPE_MICROVM_MACHINE,
+    .parent        = TYPE_X86_MACHINE,
+    .instance_size = sizeof(MicrovmMachineState),
+    .instance_init = microvm_machine_initfn,
+    .class_size    = sizeof(MicrovmMachineClass),
+    .class_init    = microvm_class_init,
+    .interfaces = (InterfaceInfo[]) {
+         { TYPE_NMI },
+         { }
+    },
+};
+
+static void microvm_machine_init(void)
+{
+    type_register_static(&microvm_machine_info);
+}
+type_init(microvm_machine_init);
diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
new file mode 100644
index 0000000000..04c8caf886
--- /dev/null
+++ b/include/hw/i386/microvm.h
@@ -0,0 +1,80 @@
+/*
+ * Copyright (c) 2018 Intel Corporation
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_I386_MICROVM_H
+#define HW_I386_MICROVM_H
+
+#include "qemu-common.h"
+#include "exec/hwaddr.h"
+#include "qemu/notify.h"
+
+#include "hw/boards.h"
+#include "hw/i386/x86.h"
+
+/* Microvm memory layout */
+#define PVH_START_INFO        0x6000
+#define MEMMAP_START          0x7000
+#define MODLIST_START         0x7800
+#define BOOT_STACK_POINTER    0x8ff0
+#define PML4_START            0x9000
+#define PDPTE_START           0xa000
+#define PDE_START             0xb000
+#define KERNEL_CMDLINE_START  0x20000
+#define EBDA_START            0x9fc00
+#define HIMEM_START           0x100000
+
+/* Platform virtio definitions */
+#define VIRTIO_MMIO_BASE      0xc0000000
+#define VIRTIO_IRQ_BASE       5
+#define VIRTIO_NUM_TRANSPORTS 8
+#define VIRTIO_CMDLINE_MAXLEN 64
+
+/* Machine type options */
+#define MICROVM_MACHINE_RTC            "rtc"
+#define MICROVM_MACHINE_ISA_SERIAL     "isa-serial"
+#define MICROVM_MACHINE_OPTION_ROMS    "option-roms"
+#define MICROVM_MACHINE_KERNEL_CMDLINE "kernel-cmdline"
+
+typedef struct {
+    X86MachineClass parent;
+    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
+                                           DeviceState *dev);
+} MicrovmMachineClass;
+
+typedef struct {
+    X86MachineState parent;
+
+    /* Machine type options */
+    bool rtc_enabled;
+    bool isa_serial_enabled;
+    bool option_roms_enabled;
+    bool kernel_cmdline_enabled;
+
+
+    /* Machine state */
+    bool kernel_cmdline_fixed;
+} MicrovmMachineState;
+
+#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
+#define MICROVM_MACHINE(obj) \
+    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
+#define MICROVM_MACHINE_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
+#define MICROVM_MACHINE_CLASS(class) \
+    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
+
+#endif
-- 
2.21.0



* [PATCH v4 8/8] hw/i386: Introduce the microvm machine type
@ 2019-09-24 12:44   ` Sergio Lopez
  0 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-24 12:44 UTC
  To: qemu-devel
  Cc: Sergio Lopez, ehabkost, kvm, mst, lersek, mtosatti, kraxel,
	pbonzini, imammedo, philmd, rth

Microvm is a machine type inspired by both NEMU and Firecracker, and
modeled after the machine implemented by the latter.

Its main purpose is to provide users with a minimalist machine type free
from the burden of legacy compatibility, serving as a stepping stone
for future projects aiming at improving boot times, reducing the
attack surface and slimming down QEMU's footprint.

The microvm machine type supports the following devices:

 - ISA bus
 - i8259 PIC
 - LAPIC (implicit if using KVM)
 - IOAPIC (defaults to kernel_irqchip_split = true)
 - i8254 PIT
 - MC146818 RTC (optional)
 - kvmclock (if using KVM)
 - fw_cfg
 - One ISA serial port (optional)
 - Up to eight virtio-mmio devices (configured by the user)
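When the optional MC146818 RTC is present, microvm_set_rtc() in this patch writes the guest memory size into the legacy CMOS registers. The sketch below shows that encoding as standalone C. It assumes a plain 128-byte array in place of the RTC device. `Cmos`, `cmos_set()`, `cap16()` and `encode_mem_size()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define KIB 1024ULL
#define MIB (1024ULL * KIB)

/* Hypothetical stand-in for the RTC's 128 bytes of CMOS memory. */
typedef struct {
    uint8_t mem[128];
} Cmos;

static void cmos_set(Cmos *c, int addr, uint64_t val)
{
    assert(addr >= 0 && addr < 128);
    c->mem[addr] = (uint8_t)val; /* only the low byte is stored */
}

/* Saturate a value to the 16-bit range used by the size fields. */
static uint64_t cap16(uint64_t val)
{
    return val > 65535 ? 65535 : val;
}

/* Same register layout as microvm_set_rtc() in the patch. */
static void encode_mem_size(Cmos *c, uint64_t below_4g, uint64_t above_4g)
{
    uint64_t val;

    /* 0x15/0x16: base memory in KiB, at most 640 KiB */
    val = below_4g / KIB;
    if (val > 640) {
        val = 640;
    }
    cmos_set(c, 0x15, val);
    cmos_set(c, 0x16, val >> 8);

    /* 0x17/0x18 and 0x30/0x31: memory above 1 MiB in KiB, capped */
    val = below_4g > 1 * MIB ? cap16((below_4g - 1 * MIB) / KIB) : 0;
    cmos_set(c, 0x17, val);
    cmos_set(c, 0x18, val >> 8);
    cmos_set(c, 0x30, val);
    cmos_set(c, 0x31, val >> 8);

    /* 0x34/0x35: memory between 16 MiB and 4 GiB in 64 KiB units, capped */
    val = below_4g > 16 * MIB ? cap16((below_4g - 16 * MIB) / (64 * KIB)) : 0;
    cmos_set(c, 0x34, val);
    cmos_set(c, 0x35, val >> 8);

    /* 0x5b..0x5d: memory above 4 GiB in 64 KiB units, spread over 3 bytes */
    val = above_4g / (64 * KIB);
    cmos_set(c, 0x5b, val);
    cmos_set(c, 0x5c, val >> 8);
    cmos_set(c, 0x5d, val >> 16);
}
```

Because the sketch keeps the same caps as the patch, the 16-bit extended-memory fields saturate at 65535 once more than about 64 MiB sits above the first MiB. The 0x34/0x35 and 0x5b..0x5d fields carry the larger sizes.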

It supports the following machine-specific options:

microvm.option-roms=bool (Set off to disable loading option ROMs)
microvm.isa-serial=bool (Set off to disable the instantiation of an ISA serial port)
microvm.rtc=bool (Set off to disable the instantiation of an MC146818 RTC)
microvm.kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
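With kernel-cmdline left on, microvm_fix_kernel_cmdline() in this patch appends one `virtio_mmio.device=<size>@<base>:<irq>` parameter for each transport that has a device attached. Transport N sits at VIRTIO_MMIO_BASE + N * 512 and uses IRQ VIRTIO_IRQ_BASE + N, which are 0xc0000000 and 5 in include/hw/i386/microvm.h.

The sketch below reproduces that formatting in plain C, using snprintf() instead of GLib. `append_virtio_mmio()` and `VIRTIO_MMIO_SIZE` are hypothetical names. Unlike microvm_get_mmio_cmdline(), this sketch also rejects indices outside 0..7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Constants copied from include/hw/i386/microvm.h in this patch. */
#define VIRTIO_MMIO_BASE      0xc0000000UL
#define VIRTIO_IRQ_BASE       5
#define VIRTIO_NUM_TRANSPORTS 8
#define VIRTIO_CMDLINE_MAXLEN 64

/* Hypothetical name for the 512-byte window given to each transport. */
#define VIRTIO_MMIO_SIZE      512

/*
 * Append " virtio_mmio.device=<size>@<base>:<irq>" for transport `index`
 * to the NUL-terminated string in `cmdline`, whose buffer holds `cap` bytes.
 * Returns 0 on success, or -1 (leaving `cmdline` untouched) if the index is
 * out of range or the result would not fit.
 */
static int append_virtio_mmio(char *cmdline, size_t cap, long index)
{
    char frag[VIRTIO_CMDLINE_MAXLEN];
    size_t used;
    int n;

    assert(cmdline != NULL);
    if (index < 0 || index >= VIRTIO_NUM_TRANSPORTS) {
        return -1;
    }

    n = snprintf(frag, sizeof(frag), " virtio_mmio.device=%d@0x%lx:%ld",
                 VIRTIO_MMIO_SIZE,
                 VIRTIO_MMIO_BASE + (unsigned long)index * VIRTIO_MMIO_SIZE,
                 VIRTIO_IRQ_BASE + index);
    used = strlen(cmdline);
    if (n < 0 || (size_t)n >= sizeof(frag) || used + (size_t)n >= cap) {
        return -1;
    }

    memcpy(cmdline + used, frag, (size_t)n + 1); /* copy including the NUL */
    return 0;
}
```

For example, appending transports 0 and 7 to `console=hvc0 root=/dev/vda` produces `console=hvc0 root=/dev/vda virtio_mmio.device=512@0xc0000000:5 virtio_mmio.device=512@0xc0000e00:12`.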

By default, microvm uses qboot as its BIOS, to obtain better boot
times, but it's also compatible with SeaBIOS.

As no current firmware is able to boot from a block device using
virtio-mmio as its transport, a microvm-based VM needs to be run using
a host-side kernel and, optionally, an initrd image.

This is an example of instantiating a microvm VM with a virtio-mmio
based console:

qemu-system-x86_64 -M microvm \
 -enable-kvm -cpu host -m 512m -smp 2 \
 -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
 -nodefaults -no-user-config -nographic \
 -chardev stdio,id=virtiocon0,server \
 -device virtio-serial-device \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0
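The example above gives the guest 512 MiB, which fits entirely below 4 GiB. For larger -m values, microvm_memory_init() in this patch splits guest RAM into a below-4G chunk and an above-4G chunk.

The sketch below isolates that arithmetic as standalone C. `split_ram()`, `RamSplit` and `GIB` are hypothetical names. The max-ram-below-4g handling is reduced to the clamp, and the performance warning is left out.

```c
#include <assert.h>
#include <stdint.h>

#define GIB (1ULL << 30)

/* Hypothetical result type: RAM mapped below and above 4 GiB. */
typedef struct {
    uint64_t below_4g;
    uint64_t above_4g;
} RamSplit;

/*
 * Sketch of the lowmem computation in microvm_memory_init().
 * max_ram_below_4g == 0 means "not set by the user" (default: 4 GiB).
 */
static RamSplit split_ram(uint64_t ram_size, uint64_t max_ram_below_4g)
{
    RamSplit s;
    /*
     * Leave room below 4 GiB for MMIO. Large guests use a 2 GiB low chunk,
     * so the above-4G part starts at a 1 GiB-aligned offset of the RAM block.
     */
    uint64_t lowmem = ram_size >= 0xb0000000ULL ? 0x80000000ULL
                                                : 0xb0000000ULL;

    if (max_ram_below_4g == 0) {
        max_ram_below_4g = 4 * GIB;
    }
    /* min(QEMU limit, user limit) */
    if (lowmem > max_ram_below_4g) {
        lowmem = max_ram_below_4g;
    }

    if (ram_size > lowmem) {
        s.below_4g = lowmem;
        s.above_4g = ram_size - lowmem;
    } else {
        s.below_4g = ram_size;
        s.above_4g = 0;
    }
    /* Every byte of guest RAM lands in exactly one of the two chunks. */
    assert(s.below_4g + s.above_4g == ram_size);
    return s;
}
```

For instance, `split_ram(4 * GIB, 0)` yields 2 GiB below and 2 GiB above 4 GiB. `split_ram(4 * GIB, GIB)`, corresponding to a 1 GiB max-ram-below-4g limit, yields 1 GiB below and 3 GiB above.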

This is another example, this time using an ISA serial port, useful
for debugging purposes:

qemu-system-x86_64 -M microvm \
 -enable-kvm -cpu host -m 512m -smp 2 \
 -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/vda" \
 -nodefaults -no-user-config -nographic \
 -serial stdio \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0

Finally, this example instantiates a microvm VM without an RTC,
without an ISA serial port and without loading option ROMs,
yielding the smallest configuration:

qemu-system-x86_64 -M microvm,rtc=off,isa-serial=off,option-roms=off \
 -enable-kvm -cpu host -m 512m -smp 2 \
 -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
 -nodefaults -no-user-config -nographic \
 -chardev stdio,id=virtiocon0,server \
 -device virtio-serial-device \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 default-configs/i386-softmmu.mak |   1 +
 hw/i386/Kconfig                  |   4 +
 hw/i386/Makefile.objs            |   1 +
 hw/i386/microvm.c                | 512 +++++++++++++++++++++++++++++++
 include/hw/i386/microvm.h        |  80 +++++
 5 files changed, 598 insertions(+)
 create mode 100644 hw/i386/microvm.c
 create mode 100644 include/hw/i386/microvm.h

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index cd5ea391e8..c27cdd98e9 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -26,3 +26,4 @@ CONFIG_ISAPC=y
 CONFIG_I440FX=y
 CONFIG_Q35=y
 CONFIG_ACPI_PCI=y
+CONFIG_MICROVM=y
\ No newline at end of file
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index 6350438036..324e193dd8 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -88,6 +88,10 @@ config Q35
     select SMBIOS
     select FW_CFG_DMA
 
+config MICROVM
+    bool
+    select VIRTIO_MMIO
+
 config VTD
     bool
 
diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 5b4b3a672e..bb17d54567 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -6,6 +6,7 @@ obj-y += pc.o
 obj-y += e820.o
 obj-$(CONFIG_I440FX) += pc_piix.o
 obj-$(CONFIG_Q35) += pc_q35.o
+obj-$(CONFIG_MICROVM) += microvm.o
 obj-y += fw_cfg.o pc_sysfw.o
 obj-y += x86-iommu.o
 obj-$(CONFIG_VTD) += intel_iommu.o
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
new file mode 100644
index 0000000000..4b494a1b27
--- /dev/null
+++ b/hw/i386/microvm.c
@@ -0,0 +1,512 @@
+/*
+ * Copyright (c) 2018 Intel Corporation
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/cutils.h"
+#include "qemu/units.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/cpus.h"
+#include "sysemu/numa.h"
+#include "sysemu/reset.h"
+
+#include "hw/loader.h"
+#include "hw/irq.h"
+#include "hw/nmi.h"
+#include "hw/kvm/clock.h"
+#include "hw/i386/microvm.h"
+#include "hw/i386/x86.h"
+#include "hw/i386/pc.h"
+#include "target/i386/cpu.h"
+#include "hw/timer/i8254.h"
+#include "hw/timer/mc146818rtc.h"
+#include "hw/char/serial.h"
+#include "hw/i386/topology.h"
+#include "hw/i386/e820.h"
+#include "hw/i386/fw_cfg.h"
+#include "hw/virtio/virtio-mmio.h"
+
+#include "cpu.h"
+#include "elf.h"
+#include "pvh.h"
+#include "kvm_i386.h"
+#include "hw/xen/start_info.h"
+
+#define MICROVM_BIOS_FILENAME "bios-microvm.bin"
+
+static void microvm_set_rtc(MicrovmMachineState *mms, ISADevice *s)
+{
+    X86MachineState *x86ms = X86_MACHINE(mms);
+    int val;
+
+    val = MIN(x86ms->below_4g_mem_size / KiB, 640);
+    rtc_set_memory(s, 0x15, val);
+    rtc_set_memory(s, 0x16, val >> 8);
+    /* extended memory (next 64MiB) */
+    if (x86ms->below_4g_mem_size > 1 * MiB) {
+        val = (x86ms->below_4g_mem_size - 1 * MiB) / KiB;
+    } else {
+        val = 0;
+    }
+    if (val > 65535) {
+        val = 65535;
+    }
+    rtc_set_memory(s, 0x17, val);
+    rtc_set_memory(s, 0x18, val >> 8);
+    rtc_set_memory(s, 0x30, val);
+    rtc_set_memory(s, 0x31, val >> 8);
+    /* memory between 16MiB and 4GiB */
+    if (x86ms->below_4g_mem_size > 16 * MiB) {
+        val = (x86ms->below_4g_mem_size - 16 * MiB) / (64 * KiB);
+    } else {
+        val = 0;
+    }
+    if (val > 65535) {
+        val = 65535;
+    }
+    rtc_set_memory(s, 0x34, val);
+    rtc_set_memory(s, 0x35, val >> 8);
+    /* memory above 4GiB */
+    val = x86ms->above_4g_mem_size / 65536;
+    rtc_set_memory(s, 0x5b, val);
+    rtc_set_memory(s, 0x5c, val >> 8);
+    rtc_set_memory(s, 0x5d, val >> 16);
+}
+
+static void microvm_devices_init(MicrovmMachineState *mms)
+{
+    X86MachineState *x86ms = X86_MACHINE(mms);
+    ISABus *isa_bus;
+    ISADevice *rtc_state;
+    GSIState *gsi_state;
+    qemu_irq *i8259;
+    int i;
+
+    gsi_state = g_malloc0(sizeof(*gsi_state));
+    x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
+
+    isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
+                          &error_abort);
+    isa_bus_irqs(isa_bus, x86ms->gsi);
+
+    i8259 = i8259_init(isa_bus, pc_allocate_cpu_irq());
+
+    for (i = 0; i < ISA_NUM_IRQS; i++) {
+        gsi_state->i8259_irq[i] = i8259[i];
+    }
+
+    ioapic_init_gsi(gsi_state, "machine");
+
+    if (mms->rtc_enabled) {
+        rtc_state = mc146818_rtc_init(isa_bus, 2000, NULL);
+        microvm_set_rtc(mms, rtc_state);
+    }
+
+    if (kvm_pit_in_kernel()) {
+        kvm_pit_init(isa_bus, 0x40);
+    } else {
+        i8254_pit_init(isa_bus, 0x40, 0, NULL);
+    }
+
+    kvmclock_create();
+
+    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
+        int nirq = VIRTIO_IRQ_BASE + i;
+        ISADevice *isadev = isa_create(isa_bus, TYPE_ISA_SERIAL);
+        qemu_irq mmio_irq;
+
+        isa_init_irq(isadev, &mmio_irq, nirq);
+        sysbus_create_simple("virtio-mmio",
+                             VIRTIO_MMIO_BASE + i * 512,
+                             x86ms->gsi[VIRTIO_IRQ_BASE + i]);
+    }
+
+    g_free(i8259);
+
+    if (mms->isa_serial_enabled) {
+        serial_hds_isa_init(isa_bus, 0, 1);
+    }
+
+    if (bios_name == NULL) {
+        bios_name = MICROVM_BIOS_FILENAME;
+    }
+    x86_system_rom_init(get_system_memory(), true);
+}
+
+static void microvm_memory_init(MicrovmMachineState *mms)
+{
+    MachineState *machine = MACHINE(mms);
+    X86MachineState *x86ms = X86_MACHINE(mms);
+    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
+    MemoryRegion *system_memory = get_system_memory();
+    FWCfgState *fw_cfg;
+    ram_addr_t lowmem;
+    int i;
+
+    /*
+     * Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
+     * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
+     * also known as MMCFG).
+     * If it doesn't, we need to split it in chunks below and above 4G.
+     * In any case, try to make sure that guest addresses aligned at
+     * 1G boundaries get mapped to host addresses aligned at 1G boundaries.
+     */
+    if (machine->ram_size >= 0xb0000000) {
+        lowmem = 0x80000000;
+    } else {
+        lowmem = 0xb0000000;
+    }
+
+    /*
+     * Handle the machine opt max-ram-below-4g.  It is basically doing
+     * min(qemu limit, user limit).
+     */
+    if (!x86ms->max_ram_below_4g) {
+        x86ms->max_ram_below_4g = 1ULL << 32; /* default: 4G */
+    }
+    if (lowmem > x86ms->max_ram_below_4g) {
+        lowmem = x86ms->max_ram_below_4g;
+        if (machine->ram_size - lowmem > lowmem &&
+            lowmem & (1 * GiB - 1)) {
+            warn_report("There is possibly poor performance as the ram size"
+                        " (0x%" PRIx64 ") is more than twice the size of"
+                        " max-ram-below-4g (%"PRIu64") and"
+                        " max-ram-below-4g is not a multiple of 1G.",
+                        (uint64_t)machine->ram_size, x86ms->max_ram_below_4g);
+        }
+    }
+
+    if (machine->ram_size > lowmem) {
+        x86ms->above_4g_mem_size = machine->ram_size - lowmem;
+        x86ms->below_4g_mem_size = lowmem;
+    } else {
+        x86ms->above_4g_mem_size = 0;
+        x86ms->below_4g_mem_size = machine->ram_size;
+    }
+
+    ram = g_malloc(sizeof(*ram));
+    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
+                                         machine->ram_size);
+
+    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
+    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
+                             0, x86ms->below_4g_mem_size);
+    memory_region_add_subregion(system_memory, 0, ram_below_4g);
+
+    e820_add_entry(0, x86ms->below_4g_mem_size, E820_RAM);
+
+    if (x86ms->above_4g_mem_size > 0) {
+        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
+        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
+                                 x86ms->below_4g_mem_size,
+                                 x86ms->above_4g_mem_size);
+        memory_region_add_subregion(system_memory, 0x100000000ULL,
+                                    ram_above_4g);
+        e820_add_entry(0x100000000ULL, x86ms->above_4g_mem_size, E820_RAM);
+    }
+
+    fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4,
+                                &address_space_memory);
+
+    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
+    fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)x86ms->apic_id_limit);
+    fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)machine->ram_size);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_IRQ0_OVERRIDE, kvm_allows_irq0_override());
+
+    rom_set_fw(fw_cfg);
+
+    e820_create_fw_entry(fw_cfg);
+
+    load_linux(x86ms, fw_cfg, 0, true, true);
+
+    if (mms->option_roms_enabled) {
+        for (i = 0; i < nb_option_roms; i++) {
+            rom_add_option(option_rom[i].name, option_rom[i].bootindex);
+        }
+    }
+
+    x86ms->fw_cfg = fw_cfg;
+    x86ms->ioapic_as = &address_space_memory;
+}
+
+static gchar *microvm_get_mmio_cmdline(gchar *name)
+{
+    gchar *cmdline;
+    gchar *separator;
+    long int index;
+    int ret;
+
+    separator = g_strrstr(name, ".");
+    if (!separator) {
+        return NULL;
+    }
+
+    if (qemu_strtol(separator + 1, NULL, 10, &index) != 0) {
+        return NULL;
+    }
+
+    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
+    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
+                     " virtio_mmio.device=512@0x%lx:%ld",
+                     VIRTIO_MMIO_BASE + index * 512,
+                     VIRTIO_IRQ_BASE + index);
+    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
+        g_free(cmdline);
+        return NULL;
+    }
+
+    return cmdline;
+}
+
+static void microvm_fix_kernel_cmdline(MachineState *machine)
+{
+    X86MachineState *x86ms = X86_MACHINE(machine);
+    BusState *bus;
+    BusChild *kid;
+    char *cmdline;
+
+    /*
+     * Find MMIO transports with attached devices, and add them to the kernel
+     * command line.
+     *
+     * Yes, this is a hack, but one that heavily improves the UX without
+     * introducing any significant issues.
+     */
+    cmdline = g_strdup(machine->kernel_cmdline);
+    bus = sysbus_get_default();
+    QTAILQ_FOREACH(kid, &bus->children, sibling) {
+        DeviceState *dev = kid->child;
+        ObjectClass *class = object_get_class(OBJECT(dev));
+
+        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
+            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
+            VirtioBusState *mmio_virtio_bus = &mmio->bus;
+            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
+
+            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
+                gchar *mmio_cmdline = microvm_get_mmio_cmdline(mmio_bus->name);
+                if (mmio_cmdline) {
+                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
+                    g_free(mmio_cmdline);
+                    g_free(cmdline);
+                    cmdline = newcmd;
+                }
+            }
+        }
+    }
+
+    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
+    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
+}
+
+static void microvm_machine_state_init(MachineState *machine)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
+    X86MachineState *x86ms = X86_MACHINE(machine);
+    Error *local_err = NULL;
+
+    if (machine->kernel_filename == NULL) {
+        error_report("missing kernel image file name, required by microvm");
+        exit(1);
+    }
+
+    microvm_memory_init(mms);
+
+    x86_cpus_init(x86ms, CPU_VERSION_LATEST);
+    if (local_err) {
+        error_report_err(local_err);
+        exit(1);
+    }
+
+    microvm_devices_init(mms);
+}
+
+static void microvm_machine_reset(MachineState *machine)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
+    CPUState *cs;
+    X86CPU *cpu;
+
+    if (mms->kernel_cmdline_enabled && !mms->kernel_cmdline_fixed) {
+        microvm_fix_kernel_cmdline(machine);
+        mms->kernel_cmdline_fixed = true;
+    }
+
+    qemu_devices_reset();
+
+    CPU_FOREACH(cs) {
+        cpu = X86_CPU(cs);
+
+        if (cpu->apic_state) {
+            device_reset(cpu->apic_state);
+        }
+    }
+}
+
+static bool microvm_machine_get_rtc(Object *obj, Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    return mms->rtc_enabled;
+}
+
+static void microvm_machine_set_rtc(Object *obj, bool value, Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    mms->rtc_enabled = value;
+}
+
+static bool microvm_machine_get_isa_serial(Object *obj, Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    return mms->isa_serial_enabled;
+}
+
+static void microvm_machine_set_isa_serial(Object *obj, bool value,
+                                           Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    mms->isa_serial_enabled = value;
+}
+
+static bool microvm_machine_get_option_roms(Object *obj, Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    return mms->option_roms_enabled;
+}
+
+static void microvm_machine_set_option_roms(Object *obj, bool value,
+                                            Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    mms->option_roms_enabled = value;
+}
+
+static bool microvm_machine_get_kernel_cmdline(Object *obj, Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    return mms->kernel_cmdline_enabled;
+}
+
+static void microvm_machine_set_kernel_cmdline(Object *obj, bool value,
+                                               Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    mms->kernel_cmdline_enabled = value;
+}
+
+static void microvm_machine_initfn(Object *obj)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    /* Configuration */
+    mms->rtc_enabled = true;
+    mms->isa_serial_enabled = true;
+    mms->option_roms_enabled = true;
+    mms->kernel_cmdline_enabled = true;
+
+    /* State */
+    mms->kernel_cmdline_fixed = false;
+}
+
+static void microvm_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+    NMIClass *nc = NMI_CLASS(oc);
+
+    mc->init = microvm_machine_state_init;
+
+    mc->family = "microvm_i386";
+    mc->desc = "Microvm (i386)";
+    mc->units_per_default_bus = 1;
+    mc->no_floppy = 1;
+    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
+    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
+    mc->max_cpus = 288;
+    mc->has_hotpluggable_cpus = false;
+    mc->auto_enable_numa_with_memhp = false;
+    mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
+    mc->nvdimm_supported = false;
+
+    /* Avoid relying too much on kernel components */
+    mc->default_kernel_irqchip_split = true;
+
+    /* Machine class handlers */
+    mc->reset = microvm_machine_reset;
+
+    /* NMI handler */
+    nc->nmi_monitor_handler = x86_nmi;
+
+    object_class_property_add_bool(oc, MICROVM_MACHINE_RTC,
+                                   microvm_machine_get_rtc,
+                                   microvm_machine_set_rtc,
+                                   &error_abort);
+    object_class_property_set_description(oc, MICROVM_MACHINE_RTC,
+        "Set off to disable the instantiation of an MC146818 RTC",
+        &error_abort);
+
+    object_class_property_add_bool(oc, MICROVM_MACHINE_ISA_SERIAL,
+                                   microvm_machine_get_isa_serial,
+                                   microvm_machine_set_isa_serial,
+                                   &error_abort);
+    object_class_property_set_description(oc, MICROVM_MACHINE_ISA_SERIAL,
+        "Set off to disable the instantiation of an ISA serial port",
+        &error_abort);
+
+    object_class_property_add_bool(oc, MICROVM_MACHINE_OPTION_ROMS,
+                                   microvm_machine_get_option_roms,
+                                   microvm_machine_set_option_roms,
+                                   &error_abort);
+    object_class_property_set_description(oc, MICROVM_MACHINE_OPTION_ROMS,
+        "Set off to disable loading option ROMs", &error_abort);
+
+    object_class_property_add_bool(oc, MICROVM_MACHINE_KERNEL_CMDLINE,
+                                   microvm_machine_get_kernel_cmdline,
+                                   microvm_machine_set_kernel_cmdline,
+                                   &error_abort);
+    object_class_property_set_description(oc, MICROVM_MACHINE_KERNEL_CMDLINE,
+        "Set off to disable adding virtio-mmio devices to the kernel cmdline",
+        &error_abort);
+}
+
+static const TypeInfo microvm_machine_info = {
+    .name          = TYPE_MICROVM_MACHINE,
+    .parent        = TYPE_X86_MACHINE,
+    .instance_size = sizeof(MicrovmMachineState),
+    .instance_init = microvm_machine_initfn,
+    .class_size    = sizeof(MicrovmMachineClass),
+    .class_init    = microvm_class_init,
+    .interfaces = (InterfaceInfo[]) {
+         { TYPE_NMI },
+         { }
+    },
+};
+
+static void microvm_machine_init(void)
+{
+    type_register_static(&microvm_machine_info);
+}
+type_init(microvm_machine_init);
diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
new file mode 100644
index 0000000000..04c8caf886
--- /dev/null
+++ b/include/hw/i386/microvm.h
@@ -0,0 +1,80 @@
+/*
+ * Copyright (c) 2018 Intel Corporation
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_I386_MICROVM_H
+#define HW_I386_MICROVM_H
+
+#include "qemu-common.h"
+#include "exec/hwaddr.h"
+#include "qemu/notify.h"
+
+#include "hw/boards.h"
+#include "hw/i386/x86.h"
+
+/* Microvm memory layout */
+#define PVH_START_INFO        0x6000
+#define MEMMAP_START          0x7000
+#define MODLIST_START         0x7800
+#define BOOT_STACK_POINTER    0x8ff0
+#define PML4_START            0x9000
+#define PDPTE_START           0xa000
+#define PDE_START             0xb000
+#define KERNEL_CMDLINE_START  0x20000
+#define EBDA_START            0x9fc00
+#define HIMEM_START           0x100000
+
+/* Platform virtio definitions */
+#define VIRTIO_MMIO_BASE      0xc0000000
+#define VIRTIO_IRQ_BASE       5
+#define VIRTIO_NUM_TRANSPORTS 8
+#define VIRTIO_CMDLINE_MAXLEN 64
+
+/* Machine type options */
+#define MICROVM_MACHINE_RTC            "rtc"
+#define MICROVM_MACHINE_ISA_SERIAL     "isa-serial"
+#define MICROVM_MACHINE_OPTION_ROMS    "option-roms"
+#define MICROVM_MACHINE_KERNEL_CMDLINE "kernel-cmdline"
+
+typedef struct {
+    X86MachineClass parent;
+    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
+                                           DeviceState *dev);
+} MicrovmMachineClass;
+
+typedef struct {
+    X86MachineState parent;
+
+    /* Machine type options */
+    bool rtc_enabled;
+    bool isa_serial_enabled;
+    bool option_roms_enabled;
+    bool kernel_cmdline_enabled;
+
+
+    /* Machine state */
+    bool kernel_cmdline_fixed;
+} MicrovmMachineState;
+
+#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
+#define MICROVM_MACHINE(obj) \
+    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
+#define MICROVM_MACHINE_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
+#define MICROVM_MACHINE_CLASS(class) \
+    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
+
+#endif
-- 
2.21.0



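The low-memory constants in microvm.h only work if the regions they name do not collide. The following is a minimal sketch and is not part of the patch. It copies the values from the header instead of including QEMU. It checks page-table alignment at compile time and checks at run time that the start addresses are strictly increasing. The array and the microvm_layout_is_ordered() helper are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Low-memory layout values copied from include/hw/i386/microvm.h. */
#define PVH_START_INFO        0x6000
#define MEMMAP_START          0x7000
#define MODLIST_START         0x7800
#define BOOT_STACK_POINTER    0x8ff0
#define PML4_START            0x9000
#define PDPTE_START           0xa000
#define PDE_START             0xb000
#define KERNEL_CMDLINE_START  0x20000
#define EBDA_START            0x9fc00
#define HIMEM_START           0x100000

/* x86 long-mode page-table structures must start on a 4 KiB boundary. */
_Static_assert((PML4_START & 0xfff) == 0, "PML4 not page aligned");
_Static_assert((PDPTE_START & 0xfff) == 0, "PDPT not page aligned");
_Static_assert((PDE_START & 0xfff) == 0, "PD not page aligned");

/* The boot stack pointer is 16-byte aligned and sits below the PML4. */
_Static_assert((BOOT_STACK_POINTER & 0xf) == 0, "stack not 16-byte aligned");
_Static_assert(BOOT_STACK_POINTER < PML4_START, "stack overlaps PML4");

/* Region start addresses, in the order they appear in the header. */
static const uint32_t microvm_layout[] = {
    PVH_START_INFO, MEMMAP_START, MODLIST_START, BOOT_STACK_POINTER,
    PML4_START, PDPTE_START, PDE_START, KERNEL_CMDLINE_START,
    EBDA_START, HIMEM_START,
};

/* Hypothetical helper: true if every start address is strictly increasing. */
static bool microvm_layout_is_ordered(void)
{
    size_t n = sizeof(microvm_layout) / sizeof(microvm_layout[0]);

    for (size_t i = 1; i < n; i++) {
        if (microvm_layout[i - 1] >= microvm_layout[i]) {
            return false;
        }
    }
    return true;
}
```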

* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-24 12:44   ` Sergio Lopez
@ 2019-09-24 13:10     ` Paolo Bonzini
From: Paolo Bonzini @ 2019-09-24 13:10 UTC
  To: Sergio Lopez, qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, rth, ehabkost, philmd, lersek,
	kraxel, mtosatti, kvm

On 24/09/19 14:44, Sergio Lopez wrote:
> +Microvm is a machine type inspired by both NEMU and Firecracker, and
> +constructed after the machine model implemented by the latter.

I would say it's inspired by Firecracker only.  The NEMU virt machine
had virtio-pci and ACPI.

> +Its main purpose is providing users a minimalist machine type free
> +from the burden of legacy compatibility,

I think this is too strong, especially if you keep the PIC and PIT. :)
Maybe just "It's a minimalist machine type without PCI support designed
for short-lived guests".

> +serving as a stepping stone
> +for future projects aiming at improving boot times, reducing the
> +attack surface and slimming down QEMU's footprint.

"Microvm also establishes a baseline for benchmarking QEMU and operating
systems, since it is optimized for both boot time and footprint".

> +The microvm machine type supports the following devices:
> +
> + - ISA bus
> + - i8259 PIC
> + - LAPIC (implicit if using KVM)
> + - IOAPIC (defaults to kernel_irqchip_split = true)
> + - i8254 PIT

Do we need the PIT?  And perhaps the PIC even?

Paolo

> + - MC146818 RTC (optional)
> + - kvmclock (if using KVM)
> + - fw_cfg
> + - One ISA serial port (optional)
> + - Up to eight virtio-mmio devices (configured by the user)
> +


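The device list caps virtio-mmio at eight transports, and microvm.h defines VIRTIO_MMIO_BASE, VIRTIO_IRQ_BASE, VIRTIO_NUM_TRANSPORTS and VIRTIO_CMDLINE_MAXLEN for them. The kernel-cmdline option controls whether microvm adds these virtio-mmio devices to the guest kernel command line. The following minimal sketch formats one Linux `virtio_mmio.device=<size>@<baseaddr>:<irq>` entry per transport. It assumes a 512-byte stride between transports and one IRQ per transport counting up from VIRTIO_IRQ_BASE. That stride and the format_virtio_mmio_arg() helper are assumptions and are not taken from the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values copied from include/hw/i386/microvm.h in the patch. */
#define VIRTIO_MMIO_BASE      0xc0000000
#define VIRTIO_IRQ_BASE       5
#define VIRTIO_NUM_TRANSPORTS 8
#define VIRTIO_CMDLINE_MAXLEN 64

/* Assumption: each transport occupies its own 512-byte MMIO window. */
#define VIRTIO_MMIO_STRIDE    512

/*
 * Hypothetical helper: write the kernel parameter describing transport
 * 'index' into 'buf' (at least VIRTIO_CMDLINE_MAXLEN bytes). Returns 0
 * on success, -1 for an out-of-range index, a short buffer, or a result
 * that would not fit in VIRTIO_CMDLINE_MAXLEN bytes.
 */
static int format_virtio_mmio_arg(unsigned index, char *buf, size_t len)
{
    if (index >= VIRTIO_NUM_TRANSPORTS || len < VIRTIO_CMDLINE_MAXLEN) {
        return -1;
    }

    uint64_t base = (uint64_t)VIRTIO_MMIO_BASE +
                    (uint64_t)index * VIRTIO_MMIO_STRIDE;
    unsigned irq = VIRTIO_IRQ_BASE + index;
    int n = snprintf(buf, VIRTIO_CMDLINE_MAXLEN,
                     "virtio_mmio.device=%d@0x%" PRIx64 ":%u",
                     VIRTIO_MMIO_STRIDE, base, irq);

    /* snprintf() returns the untruncated length; reject truncation. */
    if (n < 0 || n >= VIRTIO_CMDLINE_MAXLEN) {
        return -1;
    }
    return 0;
}
```

Each resulting string would be appended to the guest kernel command line, separated by spaces. Linux accepts the parameter more than once and creates one device per occurrence.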

* Re: [PATCH v4 8/8] hw/i386: Introduce the microvm machine type
  2019-09-24 12:44   ` Sergio Lopez
@ 2019-09-24 13:12     ` Paolo Bonzini
From: Paolo Bonzini @ 2019-09-24 13:12 UTC
  To: Sergio Lopez, qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, rth, ehabkost, philmd, lersek,
	kraxel, mtosatti, kvm

On 24/09/19 14:44, Sergio Lopez wrote:
> microvm.option-roms=bool (Set off to disable loading option ROMs)

Please make this x-option-roms

> microvm.isa-serial=bool (Set off to disable the instantiation of an ISA serial port)
> microvm.rtc=bool (Set off to disable the instantiation of an MC146818 RTC)
> microvm.kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)

Perhaps auto-kernel-cmdline?

Paolo

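The suggested renames only affect the MICROVM_MACHINE_* name strings in microvm.h. The getters, the setters and the defaults in microvm_machine_initfn() stay the same. The "x-" prefix is QEMU's convention for properties that are not a stable interface. The following standalone sketch models the pattern outside QOM: a table maps each option name to a bool field, all fields default to true as in microvm_machine_initfn(), and a setter accepts "on" or "off". The MicrovmOpts struct, the helpers, and the use of the proposed `x-option-roms` and `auto-kernel-cmdline` names are illustrative assumptions. This is not QEMU's object_class_property_add_bool() API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the option flags in MicrovmMachineState. */
typedef struct {
    bool rtc_enabled;
    bool isa_serial_enabled;
    bool option_roms_enabled;
    bool kernel_cmdline_enabled;
} MicrovmOpts;

/* Option name -> field offset, using the names proposed in review. */
static const struct {
    const char *name;
    size_t offset;
} microvm_opt_table[] = {
    { "rtc",                 offsetof(MicrovmOpts, rtc_enabled) },
    { "isa-serial",          offsetof(MicrovmOpts, isa_serial_enabled) },
    { "x-option-roms",       offsetof(MicrovmOpts, option_roms_enabled) },
    { "auto-kernel-cmdline", offsetof(MicrovmOpts, kernel_cmdline_enabled) },
};

#define MICROVM_NUM_OPTS \
    (sizeof(microvm_opt_table) / sizeof(microvm_opt_table[0]))

/* Same defaults as microvm_machine_initfn(): every option enabled. */
static void microvm_opts_init(MicrovmOpts *opts)
{
    for (size_t i = 0; i < MICROVM_NUM_OPTS; i++) {
        *(bool *)((char *)opts + microvm_opt_table[i].offset) = true;
    }
}

/*
 * Set option 'name' from "on" or "off". Returns 0 on success and -1 for
 * an unknown name or value, leaving 'opts' unchanged on failure.
 */
static int microvm_opts_set(MicrovmOpts *opts, const char *name,
                            const char *value)
{
    bool v;

    if (strcmp(value, "on") == 0) {
        v = true;
    } else if (strcmp(value, "off") == 0) {
        v = false;
    } else {
        return -1;
    }

    for (size_t i = 0; i < MICROVM_NUM_OPTS; i++) {
        if (strcmp(microvm_opt_table[i].name, name) == 0) {
            *(bool *)((char *)opts + microvm_opt_table[i].offset) = v;
            return 0;
        }
    }
    return -1;
}
```

In this model, renaming an option is a one-line table change. That matches how the rename in the patch would only touch the MICROVM_MACHINE_* defines.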

* Re: [PATCH v4 1/8] hw/i386: Factorize PVH related functions
  2019-09-24 12:44   ` Sergio Lopez
@ 2019-09-24 13:18     ` Philippe Mathieu-Daudé
From: Philippe Mathieu-Daudé @ 2019-09-24 13:18 UTC
  To: Sergio Lopez, qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, lersek,
	kraxel, mtosatti, kvm

Hi Sergio,

On 9/24/19 2:44 PM, Sergio Lopez wrote:
> Extract PVH related functions from pc.c, and put them in pvh.c, so
> they can be shared with other components.
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  hw/i386/Makefile.objs |   1 +
>  hw/i386/pc.c          | 120 +++++-------------------------------------
>  hw/i386/pvh.c         | 113 +++++++++++++++++++++++++++++++++++++++
>  hw/i386/pvh.h         |  10 ++++
>  4 files changed, 136 insertions(+), 108 deletions(-)
>  create mode 100644 hw/i386/pvh.c
>  create mode 100644 hw/i386/pvh.h
> 
> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> index 5d9c9efd5f..c5f20bbd72 100644
> --- a/hw/i386/Makefile.objs
> +++ b/hw/i386/Makefile.objs
> @@ -1,5 +1,6 @@
>  obj-$(CONFIG_KVM) += kvm/
>  obj-y += multiboot.o
> +obj-y += pvh.o
>  obj-y += pc.o
>  obj-$(CONFIG_I440FX) += pc_piix.o
>  obj-$(CONFIG_Q35) += pc_q35.o
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index bad866fe44..10e4ced0c6 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -42,6 +42,7 @@
>  #include "elf.h"
>  #include "migration/vmstate.h"
>  #include "multiboot.h"
> +#include "pvh.h"
>  #include "hw/timer/mc146818rtc.h"
>  #include "hw/dma/i8257.h"
>  #include "hw/timer/i8254.h"
> @@ -116,9 +117,6 @@ static struct e820_entry *e820_table;
>  static unsigned e820_entries;
>  struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
>  
> -/* Physical Address of PVH entry point read from kernel ELF NOTE */
> -static size_t pvh_start_addr;
> -
>  GlobalProperty pc_compat_4_1[] = {};
>  const size_t pc_compat_4_1_len = G_N_ELEMENTS(pc_compat_4_1);
>  
> @@ -1076,109 +1074,6 @@ struct setup_data {
>      uint8_t data[0];
>  } __attribute__((packed));
>  
> -
> -/*
> - * The entry point into the kernel for PVH boot is different from
> - * the native entry point.  The PVH entry is defined by the x86/HVM
> - * direct boot ABI and is available in an ELFNOTE in the kernel binary.
> - *
> - * This function is passed to load_elf() when it is called from
> - * load_elfboot() which then additionally checks for an ELF Note of
> - * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
> - * parse the PVH entry address from the ELF Note.
> - *
> - * Due to trickery in elf_opts.h, load_elf() is actually available as
> - * load_elf32() or load_elf64() and this routine needs to be able
> - * to deal with being called as 32 or 64 bit.
> - *
> - * The address of the PVH entry point is saved to the 'pvh_start_addr'
> - * global variable.  (although the entry point is 32-bit, the kernel
> - * binary can be either 32-bit or 64-bit).
> - */
> -static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
> -{
> -    size_t *elf_note_data_addr;
> -
> -    /* Check if ELF Note header passed in is valid */
> -    if (arg1 == NULL) {
> -        return 0;
> -    }
> -
> -    if (is64) {
> -        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
> -        uint64_t nhdr_size64 = sizeof(struct elf64_note);
> -        uint64_t phdr_align = *(uint64_t *)arg2;
> -        uint64_t nhdr_namesz = nhdr64->n_namesz;
> -
> -        elf_note_data_addr =
> -            ((void *)nhdr64) + nhdr_size64 +
> -            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> -    } else {
> -        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
> -        uint32_t nhdr_size32 = sizeof(struct elf32_note);
> -        uint32_t phdr_align = *(uint32_t *)arg2;
> -        uint32_t nhdr_namesz = nhdr32->n_namesz;
> -
> -        elf_note_data_addr =
> -            ((void *)nhdr32) + nhdr_size32 +
> -            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> -    }
> -
> -    pvh_start_addr = *elf_note_data_addr;
> -
> -    return pvh_start_addr;
> -}
> -
> -static bool load_elfboot(const char *kernel_filename,
> -                   int kernel_file_size,
> -                   uint8_t *header,
> -                   size_t pvh_xen_start_addr,
> -                   FWCfgState *fw_cfg)
> -{
> -    uint32_t flags = 0;
> -    uint32_t mh_load_addr = 0;
> -    uint32_t elf_kernel_size = 0;
> -    uint64_t elf_entry;
> -    uint64_t elf_low, elf_high;
> -    int kernel_size;
> -
> -    if (ldl_p(header) != 0x464c457f) {
> -        return false; /* no elfboot */
> -    }
> -
> -    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
> -    flags = elf_is64 ?
> -        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
> -
> -    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
> -        error_report("elfboot unsupported flags = %x", flags);
> -        exit(1);
> -    }
> -
> -    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
> -    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
> -                           NULL, &elf_note_type, &elf_entry,
> -                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
> -                           0, 0);
> -
> -    if (kernel_size < 0) {
> -        error_report("Error while loading elf kernel");
> -        exit(1);
> -    }
> -    mh_load_addr = elf_low;
> -    elf_kernel_size = elf_high - elf_low;
> -
> -    if (pvh_start_addr == 0) {
> -        error_report("Error loading uncompressed kernel without PVH ELF Note");
> -        exit(1);
> -    }
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> -
> -    return true;
> -}
> -
>  static void load_linux(PCMachineState *pcms,
>                         FWCfgState *fw_cfg)
>  {
> @@ -1218,6 +1113,9 @@ static void load_linux(PCMachineState *pcms,
>      if (ldl_p(header+0x202) == 0x53726448) {
>          protocol = lduw_p(header+0x206);
>      } else {
> +        size_t pvh_start_addr;
> +        uint32_t mh_load_addr = 0;
> +        uint32_t elf_kernel_size = 0;
>          /*
>           * This could be a multiboot kernel. If it is, let's stop treating it
>           * like a Linux kernel.
> @@ -1235,10 +1133,16 @@ static void load_linux(PCMachineState *pcms,
>           * If load_elfboot() is successful, populate the fw_cfg info.
>           */
>          if (pcmc->pvh_enabled &&
> -            load_elfboot(kernel_filename, kernel_size,
> -                         header, pvh_start_addr, fw_cfg)) {
> +            pvh_load_elfboot(kernel_filename,
> +                             &mh_load_addr, &elf_kernel_size)) {
>              fclose(f);
>  
> +            pvh_start_addr = pvh_get_start_addr();
> +
> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> +
>              fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
>                  strlen(kernel_cmdline) + 1);
>              fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> diff --git a/hw/i386/pvh.c b/hw/i386/pvh.c
> new file mode 100644
> index 0000000000..1c81727811
> --- /dev/null
> +++ b/hw/i386/pvh.c
> @@ -0,0 +1,113 @@
> +/*
> + * PVH Boot Helper
> + *
> + * Copyright (C) 2019 Oracle
> + * Copyright (C) 2019 Red Hat, Inc
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/units.h"
> +#include "qemu/error-report.h"
> +#include "hw/loader.h"
> +#include "cpu.h"
> +#include "elf.h"
> +#include "pvh.h"
> +
> +static size_t pvh_start_addr;
> +
> +size_t pvh_get_start_addr(void)
> +{
> +    return pvh_start_addr;
> +}
> +
> +/*
> + * The entry point into the kernel for PVH boot is different from
> + * the native entry point.  The PVH entry is defined by the x86/HVM
> + * direct boot ABI and is available in an ELFNOTE in the kernel binary.
> + *
> + * This function is passed to load_elf() when it is called from
> + * pvh_load_elfboot() which then additionally checks for an ELF Note of
> + * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
> + * parse the PVH entry address from the ELF Note.
> + *
> + * Due to trickery in elf_ops.h, load_elf() is actually available as
> + * load_elf32() or load_elf64() and this routine needs to be able
> + * to deal with being called as 32 or 64 bit.
> + *
> + * The address of the PVH entry point is saved to the 'pvh_start_addr'
> + * global variable.  (although the entry point is 32-bit, the kernel
> + * binary can be either 32-bit or 64-bit).
> + */
> +
> +static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
> +{
> +    size_t *elf_note_data_addr;
> +
> +    /* Check if ELF Note header passed in is valid */
> +    if (arg1 == NULL) {
> +        return 0;
> +    }
> +
> +    if (is64) {
> +        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
> +        uint64_t nhdr_size64 = sizeof(struct elf64_note);
> +        uint64_t phdr_align = *(uint64_t *)arg2;
> +        uint64_t nhdr_namesz = nhdr64->n_namesz;
> +
> +        elf_note_data_addr =
> +            ((void *)nhdr64) + nhdr_size64 +
> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> +    } else {
> +        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
> +        uint32_t nhdr_size32 = sizeof(struct elf32_note);
> +        uint32_t phdr_align = *(uint32_t *)arg2;
> +        uint32_t nhdr_namesz = nhdr32->n_namesz;
> +
> +        elf_note_data_addr =
> +            ((void *)nhdr32) + nhdr_size32 +
> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> +    }
> +
> +    pvh_start_addr = *elf_note_data_addr;
> +
> +    return pvh_start_addr;
> +}
> +
> +bool pvh_load_elfboot(const char *kernel_filename,
> +                      uint32_t *mh_load_addr,
> +                      uint32_t *elf_kernel_size)
> +{
> +    uint64_t elf_entry;
> +    uint64_t elf_low, elf_high;
> +    int kernel_size;
> +    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
> +
> +    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
> +                           NULL, &elf_note_type, &elf_entry,
> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
> +                           0, 0);
> +
> +    if (kernel_size < 0) {
> +        error_report("Error while loading elf kernel");
> +        return false;
> +    }
> +
> +    if (pvh_start_addr == 0) {
> +        error_report("Error loading uncompressed kernel without PVH ELF Note");
> +        return false;
> +    }
> +
> +    if (mh_load_addr) {
> +        *mh_load_addr = elf_low;
> +    }
> +
> +    if (elf_kernel_size) {
> +        *elf_kernel_size = elf_high - elf_low;
> +    }
> +
> +    return true;
> +}
> diff --git a/hw/i386/pvh.h b/hw/i386/pvh.h
> new file mode 100644
> index 0000000000..ada67ff6e8
> --- /dev/null
> +++ b/hw/i386/pvh.h
> @@ -0,0 +1,10 @@

License missing.

> +#ifndef HW_I386_PVH_H
> +#define HW_I386_PVH_H
> +
> +size_t pvh_get_start_addr(void);
> +
> +bool pvh_load_elfboot(const char *kernel_filename,
> +                      uint32_t *mh_load_addr,
> +                      uint32_t *elf_kernel_size);

Can you document these functions?

Thanks,

Phil.

> +
> +#endif
> 

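read_pvh_start_addr() finds the note descriptor by skipping the fixed-size note header and then the name field, padded up to the segment alignment. Before that, the removed load_elfboot() rejected non-ELF images by comparing the first four bytes against 0x464c457f, which is "\x7fELF" read as a little-endian 32-bit word. The following standalone sketch reproduces both steps on an in-memory buffer and includes the kind of function documentation requested above. It makes four assumptions. A note header is three 32-bit words, which holds for both ELF32 and ELF64 notes. XEN_ELFNOTE_PHYS32_ENTRY is 18, as in Xen's public elfnote.h. The descriptor is read as exactly 4 bytes, whereas the patch reads through a size_t pointer. Host and ELF byte order match. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF_MAGIC_LE              0x464c457fu /* "\x7fELF" read little-endian */
#define XEN_ELFNOTE_PHYS32_ENTRY  18          /* from Xen's public elfnote.h */

/* Note header: three 32-bit words for both ELF32 and ELF64 notes. */
typedef struct {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
} NoteHeader;

/* Round 'x' up to a multiple of 'align' (which must be non-zero). */
static size_t align_up(size_t x, size_t align)
{
    return (x + align - 1) / align * align;
}

/**
 * is_elf_image: check the first four bytes of a kernel image.
 * @header: at least 4 bytes read from the start of the file.
 *
 * Returns true if @header starts with the ELF magic "\x7fELF".
 */
static bool is_elf_image(const uint8_t *header)
{
    uint32_t magic = (uint32_t)header[0] | (uint32_t)header[1] << 8 |
                     (uint32_t)header[2] << 16 | (uint32_t)header[3] << 24;

    return magic == ELF_MAGIC_LE;
}

/**
 * note_phys32_entry: read the PVH entry point from one ELF note.
 * @note: start of a complete note (header, padded name, descriptor).
 * @align: alignment used to pad the name field (non-zero).
 * @entry: set to the 32-bit entry address on success.
 *
 * Returns false if the note is not XEN_ELFNOTE_PHYS32_ENTRY or its
 * descriptor is shorter than 4 bytes. Assumes host byte order.
 */
static bool note_phys32_entry(const uint8_t *note, size_t align,
                              uint32_t *entry)
{
    NoteHeader nhdr;

    memcpy(&nhdr, note, sizeof(nhdr)); /* copy to avoid unaligned access */
    if (nhdr.n_type != XEN_ELFNOTE_PHYS32_ENTRY || nhdr.n_descsz < 4) {
        return false;
    }

    /* Descriptor = note start + header + name padded up to 'align'. */
    const uint8_t *desc = note + sizeof(nhdr) + align_up(nhdr.n_namesz, align);
    memcpy(entry, desc, sizeof(*entry));
    return true;
}
```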

* Re: [PATCH v4 2/8] hw/i386: Factorize e820 related functions
  2019-09-24 12:44   ` Sergio Lopez
@ 2019-09-24 13:20     ` Philippe Mathieu-Daudé
  -1 siblings, 0 replies; 133+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-24 13:20 UTC (permalink / raw)
  To: Sergio Lopez, qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, lersek,
	kraxel, mtosatti, kvm

On 9/24/19 2:44 PM, Sergio Lopez wrote:
> Extract e820 related functions from pc.c, and put them in e820.c, so
> they can be shared with other components.
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  hw/i386/Makefile.objs |  1 +
>  hw/i386/e820.c        | 99 +++++++++++++++++++++++++++++++++++++++++++
>  hw/i386/e820.h        | 11 +++++
>  hw/i386/pc.c          | 66 +----------------------------
>  include/hw/i386/pc.h  | 11 -----
>  target/i386/kvm.c     |  1 +
>  6 files changed, 114 insertions(+), 75 deletions(-)
>  create mode 100644 hw/i386/e820.c
>  create mode 100644 hw/i386/e820.h
> 
> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> index c5f20bbd72..149712db07 100644
> --- a/hw/i386/Makefile.objs
> +++ b/hw/i386/Makefile.objs
> @@ -2,6 +2,7 @@ obj-$(CONFIG_KVM) += kvm/
>  obj-y += multiboot.o
>  obj-y += pvh.o
>  obj-y += pc.o
> +obj-y += e820.o

Isn't that commit d6d059ca07ae907b8945f88c382fb54d43f9f03a?
I'm confused now.

>  obj-$(CONFIG_I440FX) += pc_piix.o
>  obj-$(CONFIG_Q35) += pc_q35.o
>  obj-y += fw_cfg.o pc_sysfw.o
> diff --git a/hw/i386/e820.c b/hw/i386/e820.c
> new file mode 100644
> index 0000000000..d5c5c0d528
> --- /dev/null
> +++ b/hw/i386/e820.c
> @@ -0,0 +1,99 @@
> +/*
> + * Copyright (c) 2003-2004 Fabrice Bellard
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/error-report.h"
> +#include "qemu/cutils.h"
> +#include "qemu/units.h"
> +
> +#include "hw/i386/e820.h"
> +#include "hw/i386/fw_cfg.h"
> +
> +#define E820_NR_ENTRIES		16
> +
> +struct e820_entry {
> +    uint64_t address;
> +    uint64_t length;
> +    uint32_t type;
> +} QEMU_PACKED __attribute((__aligned__(4)));
> +
> +struct e820_table {
> +    uint32_t count;
> +    struct e820_entry entry[E820_NR_ENTRIES];
> +} QEMU_PACKED __attribute((__aligned__(4)));
> +
> +static struct e820_table e820_reserve;
> +static struct e820_entry *e820_table;
> +static unsigned e820_entries;
> +
> +int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
> +{
> +    int index = le32_to_cpu(e820_reserve.count);
> +    struct e820_entry *entry;
> +
> +    if (type != E820_RAM) {
> +        /* old FW_CFG_E820_TABLE entry -- reservations only */
> +        if (index >= E820_NR_ENTRIES) {
> +            return -EBUSY;
> +        }
> +        entry = &e820_reserve.entry[index++];
> +
> +        entry->address = cpu_to_le64(address);
> +        entry->length = cpu_to_le64(length);
> +        entry->type = cpu_to_le32(type);
> +
> +        e820_reserve.count = cpu_to_le32(index);
> +    }
> +
> +    /* new "etc/e820" file -- include ram too */
> +    e820_table = g_renew(struct e820_entry, e820_table, e820_entries + 1);
> +    e820_table[e820_entries].address = cpu_to_le64(address);
> +    e820_table[e820_entries].length = cpu_to_le64(length);
> +    e820_table[e820_entries].type = cpu_to_le32(type);
> +    e820_entries++;
> +
> +    return e820_entries;
> +}
> +
> +int e820_get_num_entries(void)
> +{
> +    return e820_entries;
> +}
> +
> +bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length)
> +{
> +    if (idx < e820_entries && e820_table[idx].type == cpu_to_le32(type)) {
> +        *address = le64_to_cpu(e820_table[idx].address);
> +        *length = le64_to_cpu(e820_table[idx].length);
> +        return true;
> +    }
> +    return false;
> +}
> +
> +void e820_create_fw_entry(FWCfgState *fw_cfg)
> +{
> +    fw_cfg_add_bytes(fw_cfg, FW_CFG_E820_TABLE,
> +                     &e820_reserve, sizeof(e820_reserve));
> +    fw_cfg_add_file(fw_cfg, "etc/e820", e820_table,
> +                    sizeof(struct e820_entry) * e820_entries);
> +}
> diff --git a/hw/i386/e820.h b/hw/i386/e820.h
> new file mode 100644
> index 0000000000..569d1f0ab5
> --- /dev/null
> +++ b/hw/i386/e820.h
> @@ -0,0 +1,11 @@
> +/* e820 types */
> +#define E820_RAM        1
> +#define E820_RESERVED   2
> +#define E820_ACPI       3
> +#define E820_NVS        4
> +#define E820_UNUSABLE   5
> +
> +int e820_add_entry(uint64_t address, uint64_t length, uint32_t type);
> +int e820_get_num_entries(void);
> +bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length);
> +void e820_create_fw_entry(FWCfgState *fw_cfg);
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 10e4ced0c6..3920aa7e85 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -30,6 +30,7 @@
>  #include "hw/i386/apic.h"
>  #include "hw/i386/topology.h"
>  #include "hw/i386/fw_cfg.h"
> +#include "hw/i386/e820.h"
>  #include "sysemu/cpus.h"
>  #include "hw/block/fdc.h"
>  #include "hw/ide.h"
> @@ -99,22 +100,6 @@
>  #define DPRINTF(fmt, ...)
>  #endif
>  
> -#define E820_NR_ENTRIES		16
> -
> -struct e820_entry {
> -    uint64_t address;
> -    uint64_t length;
> -    uint32_t type;
> -} QEMU_PACKED __attribute((__aligned__(4)));
> -
> -struct e820_table {
> -    uint32_t count;
> -    struct e820_entry entry[E820_NR_ENTRIES];
> -} QEMU_PACKED __attribute((__aligned__(4)));
> -
> -static struct e820_table e820_reserve;
> -static struct e820_entry *e820_table;
> -static unsigned e820_entries;
>  struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
>  
>  GlobalProperty pc_compat_4_1[] = {};
> @@ -878,50 +863,6 @@ static void handle_a20_line_change(void *opaque, int irq, int level)
>      x86_cpu_set_a20(cpu, level);
>  }
>  
> -int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
> -{
> -    int index = le32_to_cpu(e820_reserve.count);
> -    struct e820_entry *entry;
> -
> -    if (type != E820_RAM) {
> -        /* old FW_CFG_E820_TABLE entry -- reservations only */
> -        if (index >= E820_NR_ENTRIES) {
> -            return -EBUSY;
> -        }
> -        entry = &e820_reserve.entry[index++];
> -
> -        entry->address = cpu_to_le64(address);
> -        entry->length = cpu_to_le64(length);
> -        entry->type = cpu_to_le32(type);
> -
> -        e820_reserve.count = cpu_to_le32(index);
> -    }
> -
> -    /* new "etc/e820" file -- include ram too */
> -    e820_table = g_renew(struct e820_entry, e820_table, e820_entries + 1);
> -    e820_table[e820_entries].address = cpu_to_le64(address);
> -    e820_table[e820_entries].length = cpu_to_le64(length);
> -    e820_table[e820_entries].type = cpu_to_le32(type);
> -    e820_entries++;
> -
> -    return e820_entries;
> -}
> -
> -int e820_get_num_entries(void)
> -{
> -    return e820_entries;
> -}
> -
> -bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length)
> -{
> -    if (idx < e820_entries && e820_table[idx].type == cpu_to_le32(type)) {
> -        *address = le64_to_cpu(e820_table[idx].address);
> -        *length = le64_to_cpu(e820_table[idx].length);
> -        return true;
> -    }
> -    return false;
> -}
> -
>  /* Calculates initial APIC ID for a specific CPU index
>   *
>   * Currently we need to be able to calculate the APIC ID from the CPU index
> @@ -1024,10 +965,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
>                       acpi_tables, acpi_tables_len);
>      fw_cfg_add_i32(fw_cfg, FW_CFG_IRQ0_OVERRIDE, kvm_allows_irq0_override());
>  
> -    fw_cfg_add_bytes(fw_cfg, FW_CFG_E820_TABLE,
> -                     &e820_reserve, sizeof(e820_reserve));
> -    fw_cfg_add_file(fw_cfg, "etc/e820", e820_table,
> -                    sizeof(struct e820_entry) * e820_entries);
> +    e820_create_fw_entry(fw_cfg);
>  
>      fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, &hpet_cfg, sizeof(hpet_cfg));
>      /* allocate memory for the NUMA channel: one (64bit) word for the number
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 19a837889d..062feeb69e 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -291,17 +291,6 @@ void pc_system_firmware_init(PCMachineState *pcms, MemoryRegion *rom_memory);
>  void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
>                         const CPUArchIdList *apic_ids, GArray *entry);
>  
> -/* e820 types */
> -#define E820_RAM        1
> -#define E820_RESERVED   2
> -#define E820_ACPI       3
> -#define E820_NVS        4
> -#define E820_UNUSABLE   5
> -
> -int e820_add_entry(uint64_t, uint64_t, uint32_t);
> -int e820_get_num_entries(void);
> -bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
> -
>  extern GlobalProperty pc_compat_4_1[];
>  extern const size_t pc_compat_4_1_len;
>  
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index 8023c679ea..8ce56db7d4 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -41,6 +41,7 @@
>  #include "hw/i386/apic-msidef.h"
>  #include "hw/i386/intel_iommu.h"
>  #include "hw/i386/x86-iommu.h"
> +#include "hw/i386/e820.h"
>  
>  #include "hw/pci/pci.h"
>  #include "hw/pci/msi.h"
> 

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 8/8] hw/i386: Introduce the microvm machine type
  2019-09-24 13:12     ` Paolo Bonzini
@ 2019-09-24 13:24       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 133+ messages in thread
From: Michael S. Tsirkin @ 2019-09-24 13:24 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sergio Lopez, qemu-devel, imammedo, marcel.apfelbaum, rth,
	ehabkost, philmd, lersek, kraxel, mtosatti, kvm

On Tue, Sep 24, 2019 at 03:12:15PM +0200, Paolo Bonzini wrote:
> On 24/09/19 14:44, Sergio Lopez wrote:
> > microvm.option-roms=bool (Set off to disable loading option ROMs)
> 
> Please make this x-option-roms

Why? We don't plan to support this going forward?

> > microvm.isa-serial=bool (Set off to disable the instantiation of an ISA serial port)
> > microvm.rtc=bool (Set off to disable the instantiation of an MC146818 RTC)
> > microvm.kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
> 
> Perhaps auto-kernel-cmdline?
> 
> Paolo

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 8/8] hw/i386: Introduce the microvm machine type
  2019-09-24 12:44   ` Sergio Lopez
@ 2019-09-24 13:28     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 133+ messages in thread
From: Michael S. Tsirkin @ 2019-09-24 13:28 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: qemu-devel, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm

On Tue, Sep 24, 2019 at 02:44:33PM +0200, Sergio Lopez wrote:
> +static void microvm_fix_kernel_cmdline(MachineState *machine)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(machine);
> +    BusState *bus;
> +    BusChild *kid;
> +    char *cmdline;
> +
> +    /*
> +     * Find MMIO transports with attached devices, and add them to the kernel
> +     * command line.
> +     *
> +     * Yes, this is a hack, but one that heavily improves the UX without
> +     * introducing any significant issues.
> +     */
> +    cmdline = g_strdup(machine->kernel_cmdline);
> +    bus = sysbus_get_default();
> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
> +        DeviceState *dev = kid->child;
> +        ObjectClass *class = object_get_class(OBJECT(dev));
> +
> +        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
> +
> +            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
> +                gchar *mmio_cmdline = microvm_get_mmio_cmdline(mmio_bus->name);
> +                if (mmio_cmdline) {
> +                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
> +                    g_free(mmio_cmdline);
> +                    g_free(cmdline);
> +                    cmdline = newcmd;
> +                }
> +            }
> +        }
> +    }
> +
> +    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
> +    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
> +}

Can we rearrange this somewhat? Maybe the mmio constructor
could format the device description and add it to some list,
and then microvm would just get the entries from that list
and add them to the kernel command line?
This way it could also be controlled by a virtio-mmio property, so
e.g. you can disable it per device if you like.
In particular, this seems like a handy trick for any machine type
using mmio.

-- 
MST

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 6/8] roms: add microvm-bios (qboot) as binary and git submodule
  2019-09-24 12:44   ` Sergio Lopez
@ 2019-09-24 13:31     ` Philippe Mathieu-Daudé
  -1 siblings, 0 replies; 133+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-24 13:31 UTC (permalink / raw)
  To: Sergio Lopez, qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, lersek,
	kraxel, mtosatti, kvm

On 9/24/19 2:44 PM, Sergio Lopez wrote:
> qboot is a minimalist x86 firmware for booting Linux kernels. It does
> the minimum amount of work required for the task, and it's able to
> boot both PVH images and bzImages without relying on option ROMs.
> 
> These characteristics make it an ideal companion for the microvm
> machine type.
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  .gitmodules              |   3 +++
>  pc-bios/bios-microvm.bin | Bin 0 -> 65536 bytes
>  roms/Makefile            |   6 ++++++
>  roms/qboot               |   1 +
>  4 files changed, 10 insertions(+)
>  create mode 100755 pc-bios/bios-microvm.bin
>  create mode 160000 roms/qboot
> 
> diff --git a/.gitmodules b/.gitmodules
> index c5c474169d..19792c9a11 100644
> --- a/.gitmodules
> +++ b/.gitmodules
> @@ -58,3 +58,6 @@
>  [submodule "roms/opensbi"]
>  	path = roms/opensbi
>  	url = 	https://git.qemu.org/git/opensbi.git
> +[submodule "roms/qboot"]
> +	path = roms/qboot
> +	url = https://github.com/bonzini/qboot
> diff --git a/pc-bios/bios-microvm.bin b/pc-bios/bios-microvm.bin
> new file mode 100755
> index 0000000000000000000000000000000000000000..45eabc516692e2d134bbb630d133c7c2dcc9a9b6
> GIT binary patch
> literal 65536
> [base85-encoded binary data elided]
Now that using Docker is quite simple, I'd rather add a job building
this and commit the built binary, so we have reproducible builds.

> [base85-encoded binary data elided]
> z#oOYxtAp`@B3qnI9t8DHEcgY=VF>p@4)U7qZI&7f#y_FB9^0F2j)oj89X~b6STSQI
> zJGB67=8qA3IhV9q{PWleZJ*%`FMcgLjZ#$m)L7J#t+PdmS4q~;K5zNqKq{1(6=GPR
> zzt1jQ+&}|ddcP9G9Na5+7s=gK*0Mxk2BxTKvC!2A{4?b4l@75|?ykwFgh^OnXr7ZL
> z33i+&JdDZR31b~%GO@GXorc9VZcG&~Hp$(SA~o)u=mIh;l(c%zVi2>ct4HK+8P1VY
> z))%OjyZbRz-UVHukq7*D$=!{UL`<^vIo7g+DDMc$J5q8GgR;CZl=RNu;Fbd2d(lC4
> zriJ#~jnN$*YE7SvVGH7y;{%g&a;dneK=#pEW<1I(by1@!Ly6_^4fgx76!~?plm%+n
> z5{9EYS7Y6gk?*3`y|@8xwK@?azsl16k9t%NY`T@Nl08`i3bYI6;DEBwPKPHJZrYud
> zoU9dPO@-cD*-GuwJh;JyN!V~#+6S;vWoVD#t|#DT!#BC>`HZ{PyBj;<ZH7GiGSQWt
> zv{53}H;QWUPn@=t?ff8nGyX}D?d{IHZX=lKEx$90?-kE6zq?5e|APg+tSFNuQ*pWF
> z6~tnSUb9<pQExIFxfnxFh1P|Z3irq@wAgBf#7hh7Yvu43$afnA3^OlGByigfD~F%Q
> zo~RQe6nb{1V%i`)+74#uIPp)jeLQI!GOSK!4GjMya<b9RDafmB!We7f&*I!`;6DR3
> z$8W;_zKLHBU&qd=lco%Vswtc)+>gRfXAAGOT^wY+@zX`N53<F#_{i`(vU#ttGbSPI
> zYF`40owUQ9L)(<Nmd|Rf$y#GUk=Y@C1M^Bvw6-*~Pcqj2NrAj3njc*uu{w!0S)&hI
> zqbuyJ&d!>g(TAO^u5eMPrzo_quzV<RM3o*Clkrb;*o(6}X)2oNm8k0k*vI3ioVNGb
> zA=|aVNId>wk?s9XOZ#u`Wf%1iwZW^pj&5Bay_-h4&?!svD7EzF;^s5-#C%kdOM#Y?
> za@Yk<+8fAV&^B6`nhQ&_8YH*uMTN7?oyxijCbmWhc*lVK22!2VV6WNICN4;=#ENPf
> zB<oC?h8s-qwi<X_^M2}I<~p;BjDjDo(e8smsqdqE#*(=LaBq3@zuuqBl@Jn9N;21q
> z5M9Y!`&ejhCS-c5(O3U{o@&kx=6dE2<auvqDn`p1Is6S&j_ot5r=1?b&^f0FB_(r<
> zGv)0izcf6QX@7on=(F{j-P8Y^DQfg~?ItHqWqbP=;3KH^xJM4L^F~u_a2IEnpy=*y
> zgl7>HT7SIiG;A!)*l2Phb~}x8987b92)D_ZSoaU%3bdVOzo&{{{O9OoL)L>`+Wn!p
> z#QN3ZzZ5U3#V4ZW(Pt!9hI5eUbJplHcDBXJzGsg<=VV))p?D$OxjYd#!KTeJY?z&~
> zfz2yL=s!A+^!Xn!dcxQkGNBI)$4@(L=h^RU*E4<K{2p=^qa&ajd~zB7!&2FS<QrGe
> zD>9yoIC59*PBRXEDu<teyOFOVjrzVeitMRI^1%u72&Vkv)M3mpMt7O$?y2P%sIZnD
> z>joIhO<zwPgc5ea)Vhnb{vTR5;#}Z7J7dPSQS59iL}EC8Y&+i8(>Bniwe>5W9{HXg
> z<;*qm=Ha%JZHcy1ZRxh7>{uw_YVawZlM2o+JVz9du6WKKYVQ0xe6?=vj~e8b9^|0;
> z>oFzzQ}W>GJQ}vt7j=f?u;y#JC~w~hkuG6}O}N)r6#8fwn`OJZ868i5U~}$ndO*%%
> zk;7}@H#p&{dGJ+*`D(ib^ude<8&a@8wFyz53;WeD?Nv(-XAWBru?UW11Qzq+SMG>S
> z!-IrAfROsI82`}5>YUNK;*!G{16h+D4z&?}B(;kxjNw3u(fZ?Za31oBflXhq)U<5d
> zBe2r2$2211gXny_dvgwI!eaG!>kZDuZ#(1#i@a?q<bx4X7{pMBhr(iYc$nt5R3h1P
> zz;iNHg#g+IxkSsem0u`%jM1|XeSwM71d(n-zTE;Qo-T|eLTWlR#SClx-`tC}xd3ww
> zQ4#3znpy+_19rN7ye6p!5j$2=rZsMv{Z7)4W>z=4cjF9Ph?ZS^Eox<l;%O=BlS&ST
> z&N$Buz<Y{}!&RIgMklYqsX<Ac^<6pfwf<M>CBfo5*W#(xICY8?vDl{f1nn!9n&<6<
> zv)Q`-&R=6X(U$xP-rTDlqDLY$5G>h`BRqR_UB9~Sk~xlJb^H+<*4~9gJe}wNn3fjt
> zQ+evv;tKY{TPzW+-X9vV_&3Sh$rrFY+}BzyMHU{zc}#VmcJ7j){|BT$9=eW`@gUrc
> zjfQmRrrgYU6@1~Cg)KfNho>M(n*|6s7?b|57cL%Md<<njf!v{VaQ;(o$>Hx1Orywv
> z+FD&f1wCJ;ZtBB^;IC0uw6KNpv88?ForP4^u=dP&0>xg3o#JS6;Q*dOa^c7HkUJJ0
> zQ(qXsF=PDVu4tgoql<xoDKC7?ey;`wC@%Fp3J#6r$Su?_qK{bF;y#=_v8BhN_fti4
> zsf-Gv@D|kaXAsf?h%4`vLn`H1daQh(y#22cEa{If?kkT!`yA^RM=+`W7qrphW9axX
> zuiR0miwEZ~JQlb#tMGwT9?I12r9E+S0+4;EQY}>PrUhKrUNj>c(e|dX3&vva7^I^n
> zyjNWyk<4}Dm#!2Y1EC)tN$=8zFo?l_XiTBCzjdad>-cnmo~(X9uX_Z?_c7Kt^E2eK
> z9iNQfAiFRmcN}g$_!*Mg?>|>r{&wI4Hgbam8<9KuGcu*;GX0*)*i9Nc4Lt{k_K*&C
> z;_QNL!3XY;!)~+@f;}aE8%`qWSX7C<v1SBCj<AD6?fLA)IU}_9b1FKYoiSg#uhVM*
> zyI{V)p_{Lx=Q_Vfc{4dDyqp<Np<H7r1q^w-W}Z%Q5}7vCS(DKbMCrR@FnmW06mf17
> znMQ$D<F~9|4xd9BDu&NB?T9alquWz!UEI|f)DcPJ10sXu(}--Ja@uuzN4xd(*V;Fp
> z-q|kUX~9!G{Wbhv!<&A3r>Cc@AEV8=c6--h<0+c`uzQUa^%_Ra-Y{0eBLs$Cy9f*$
> zF<;E}3$dX^Nk)L!Fb%aVFvyJ6GHAZHi%``W%O+xxPYC)mi8iR&{RFC+8ZrEb9rW}V
> z`@)ojTG4rQ!=CZxJu$S~Xx@X=ykl)j`_Jw*<VM3#?a_?}?OaO{Cx(m{BY}<(+NtTN
> z%5a11nO|@2-vO?k%zXt9y5x@BpCO1Xz-;;TdA~pqOD0$A<eySxy?YKk8r$nxxUtcp
> zF#zXjSWp?~d^pScDjo7o*Y}dy3Uc0uMYIz#cF0H&7rLnqIAycPPgz-}1DwGz1#Q|3
> zy-H|Al(+4|RD(1K2HBaRE<#QlqOTyXg5t2fr+8`{S`&s3$Sozvav36`{~nY`$!{1f
> zHL9Ft@k|Ws3-wuyZO-w@=-;G!w;^;;IjbxXMgMwbfix*l(R>Iy!5}-1betc|rhY(S
> zEcgzZA){m@O+SYpJ1m=y!_Q)>y%^exYRcjNftpg#Du;iCAd)5A_Ee71uBZn#EmVHz
> zFd@7!gcwXdM$9EzT^ccl6_D9DCIiZbjNg<&y$azA&W-*&Jf{3G(A<2;n>Bs6fUbel
> zN7_t~`36STrwo}Uwmbz>;5)&sZhRJwtiW3=gtoF$gO#IE<(5ivO04wmgUnZPw7v&5
> zp<4fvTn2~5#cUpVP3ztNgx{9S=Ey8})VrtW;6#u0!Br+3b$GLj+#%CogsVDY<cZu(
> zuEucM!ZcGPa@=5hF8jWh+NWK|$P`cIutT9!GsHhCYyr8Y_H7zyqei$S8)@2tDo9y<
> z$ht&sQBb|?X)n(4Ob@}?813QTgdW7$<62H=U<OQgp}unj|3bCis)s(b$bTCT#q%?H
> zA^>_W^%G<Py9~*x;B*7JNf1c&ARgyMMK5)U@icu4!I_TLn3#=~y7n6^s*}R%`@gjs
> zpGEuWLpZ*n3ZmMSpY+l=g!gI;Q`l<Y+uJwtBX#*z<wpaTL;I~b%f!ei1T$QzOPOQ7
> zzt|WX0<(K_sl(E_sdk5P&SOtE$IM8PB0tB<I<((zd@-iw3(n>q$AZJ!d^I0O%5r!e
> zeD2vz%~!3)+jhJ)U$r3|crd!LIO>(gCbkRB9~pC5zcH7Y2C?DLfMv2VSnX5J%CGmZ
> zYHR6PR?vh~yymN-p>hZ8SDUXdp_yOY{7~uH;M>itAib#hs@+KcHPV}}PNpK~Pbe?i
> zP<)F5qt(_fDsCwKQ&O~z6@(s{Et<s@kcu7u9u;y2lpE9Vxzu9n4@q+u?UkGuU9^Ax
> zNY~|#AUQkgldQJmTdXTukh%w=@To7y0xQ{t&_@=t3}q;#!9l*%Fr#8;QKGTPh<&NN
> z+)-VeVUNL)(5HeZht>*ccpM3QW&>K|NMI}^Ahde1Z#9k%5OU&HOhHlFOKF}Se7{^O
> zi2VzjKd02d`B}O7cy<-y$N1ntyWxxfE165FzUsxk-rPG8l9IVacsy41=%Y9XK&a8>
> za&@{8=>8ZB=iL~_@LwDkUrXlBMVufET<^qD$Bm~s6Hc$jcCsG~GMxVu87JfNFKt(_
> zTa()!L^E~!pvj}tbOTK^xL9$fKf>N+e~hWi29vcT6rHRkrxki@2ZczL*jr^O!~!qe
> z2siJo$`RR*CH8%*y5y|qi05QF7j>UfDIP8VUEs2~(k*#Iy@lnHJMb^~?qrQhZnZcV
> zKV?z(RG?t>Uu*CiO4KXKJW)t4vOLsK=~9x5Cb?Yrgo?vy!(beIOLDiRN{pBPK5!W$
> zEdDW}cao)5a(g3^J5nnu$y!UHX#96b&vkW~80xilqHg$>o6f^HlF9pC;Ib;om<^>)
> z3&#2z&cMtgSN4u^euyssPTN%+kJ2}SK9=+xQA5<Q2A<E80yRXhR}vz=+*Z3gQa35$
> z%Ts?W;*8FQL-9vKqZa>M^<~khAvsVdWq>ur#;6Bu5%IZ0t+rzO;df!Z8#MHfrhlo#
> zz%(cID>xx+@Ac+c(!V8i$q;FhpvkBxhJ%-FMginQoz!|0I6=S3DH)<<T1ptXbWweI
> zijj*PEt_i+eQzJL>uM-rRoLC+t2igf`Y|<E;Jny3EB7vhFcunX!9DdTx0@RA4HC$@
> zk#XjLl!+Jq^MZ!h8!zOd3kC{Az_<)oq-`u+G*HrIzL`l`Jw~of3jVRfyy$?_h%5Dc
> z>;Ad({_=sc^6xzT-L>nK(g_#I(CG(V;*TE}#I08Gt9D6>K&20HSUL!oPU`wj5~y-m
> zTP#%$`}UkFhjW`$<tPm0Rym3+oAVuoRSAc)s@Gw!N;<4nZ#tx!f=6zSq{X)&Y{Xsj
> zZ@Nxr?Q~IG7<V1wJQRWJb~cecbl)zWbWwWUA9S6Lis*9TF2!AsxO?yvDjn8smP&^`
> zl(Q<~uvYduBy8XXp)n(4<bzDe(vU!sk)LcPo%iBN`9qe~`ADoOs5L~4I=q4;A7rMI
> z46Zud|Ad?3)=@+)@k6=Wb1I2nCGoQ?fgUGpuvdayCCF74EW9BGSzWj<;obmUU&m8e
> zI97)zvvqJIx<~|H$b6e*VCQx!Z(B}delL_KNjV)>2!a=KVaU-3KeIl|jf#+tm6^0H
> z2;Yt~=^)O>pyO^lO&u@=ynJ{q!+jiS#y=s!j+^RT?I?W#Zc}+fRci_?>u!!k+d<#o
> z;I8;U*Z&nyD(@CLy_SHZuEtSc<M1uHSxuG08_I35Y?HR<kiLJzy#xAw{^#`l;Lqr*
> z8Q15Bj@jTg2z`F)qPMosXSsBXN>>(G-gGq8I9Ap;)-*VLYqH@b4qsz7jQEH1voDW3
> zR#rNis?e6y2D?zMHlXV+goSE{bCILC(&1{PrVwW1-k(Wtpmf+3veY_m*RI+AWB*Xs
> zDL-8<+|++QLmAYzrjBZK{D<LGrvJMT-H3Y@@-^W(F7x=9Oa-FJr&dDdNJi!sZCGk~
> z{T7T+S*lAX?qTHrt4#h(KmH*)k<pQB3Tm?5L7?(9lEVrtl##zhc`nNjZ&4mP3bB|4
> zb~VU9ajdF9Tpi-F3}->kmx-I7DQCL{eAeWfovymzF*9JY+zQzy<R@O^oR*AnB7Hj`
> z6M+ncNygCkSVmt}0STiw!#*Ux1=%oUzd2F1AtPHeUbYwMBN@G$aI8XlZ{ktw2v+8-
> z$;sC3#yOz|*~sTEX}W#|x&+khCLNV<jo;qf{T4Zjvfa<nu@>2HS5DRs+s~MszfA78
> zkjwfj`3d>!F2qe9>x)&iRW`00>oga!RHyKuu0Kr@x8h=1al=Suj!BIWZ%4ilh{dh)
> zegDScy}BSr7H^ECu565%yYTd$({(!DG27i3zcF8gB${z<R|oQSs>2%O{UNQgZe>fg
> z!<Tc;atmj#E^r~sO5CrU*Y!v6r2HZHu+xHxCez1Be-QWWY@jipWCo#QAj2cmKViS+
> z>3qy_x650R$s4<<>(#fn-?h&F-EXcd`&Ow?x<#O{|2t1_ST|?GfBVkbbw3gwZ>Vwk
> z8XtE-7r!_GPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(8
> z6W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;Z
> zH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULas
> zfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O
> z1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1U
> zPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu
> z-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m
> z0Z!od1Ux-$$J=_^2HLc?{{JUzm0dl`OkLOi(JspO^xUV&;+-j7IrD2oSp}ujDU3^>
> o5d@0NUb>FZ&)-2Do-dnE`R8)xT^9bc5QmajO4XIv_+RY*0|l%Fi~s-t
> 
> literal 0
> HcmV?d00001
> 
> diff --git a/roms/Makefile b/roms/Makefile
> index 775c963f9d..47eabc8633 100644
> --- a/roms/Makefile
> +++ b/roms/Makefile
> @@ -67,6 +67,7 @@ default:
>  	@echo "  opensbi32-virt     -- update OpenSBI for 32-bit virt machine"
>  	@echo "  opensbi64-virt     -- update OpenSBI for 64-bit virt machine"
>  	@echo "  opensbi64-sifive_u -- update OpenSBI for 64-bit sifive_u machine"
> +	@echo "  bios-microvm       -- update bios-microvm.bin (qboot)"

I'd go the other way around:

        @echo "  qboot -- update qboot (BIOS used by microvm)"

>  	@echo "  clean              -- delete the files generated by the previous" \
>  	                              "build targets"
>  
> @@ -185,6 +186,10 @@ opensbi64-sifive_u:
>  		PLATFORM="qemu/sifive_u"
>  	cp opensbi/build/platform/qemu/sifive_u/firmware/fw_jump.bin ../pc-bios/opensbi-riscv64-sifive_u-fw_jump.bin
>  
> +bios-microvm:

   qboot:

or

   qboot bios-microvm:

> +	$(MAKE) -C qboot
> +	cp qboot/bios.bin ../pc-bios/bios-microvm.bin
> +
>  clean:
>  	rm -rf seabios/.config seabios/out seabios/builds
>  	$(MAKE) -C sgabios clean
> @@ -197,3 +202,4 @@ clean:
>  	$(MAKE) -C skiboot clean
>  	$(MAKE) -f Makefile.edk2 clean
>  	$(MAKE) -C opensbi clean
> +	$(MAKE) -C qboot clean
> diff --git a/roms/qboot b/roms/qboot
> new file mode 160000
> index 0000000000..cb1c49e0cf
> --- /dev/null
> +++ b/roms/qboot
> @@ -0,0 +1 @@
> +Subproject commit cb1c49e0cfac99b9961d136ac0194da62c28cf64
> 
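The new rule installs qboot's bios.bin as pc-bios/bios-microvm.bin, which is a 65536-byte image. Below is a rough sketch of why that size is convenient. It assumes the microvm board maps firmware the same way QEMU's legacy PC ROM path does: the whole image sits just below 4 GiB, at most its last 128 KiB is aliased just below 1 MiB, and an image must be a non-zero multiple of 64 KiB. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define KIB 1024ULL
#define MIB (1024 * KIB)
#define GIB (1024 * MIB)

/* Hypothetical description of where a BIOS image is mapped. */
typedef struct {
    uint64_t top_base;  /* start of the full image, which ends at 4 GiB */
    uint64_t isa_base;  /* start of the low alias, which ends at 1 MiB */
    uint64_t isa_size;  /* size of the low alias (at most 128 KiB) */
} BiosLayout;

/* Compute the assumed legacy layout; returns false if the image size is
 * not a non-zero multiple of 64 KiB. */
static bool bios_layout(uint64_t bios_size, BiosLayout *out)
{
    if (bios_size == 0 || bios_size % (64 * KIB) != 0) {
        return false;
    }
    out->top_base = 4 * GIB - bios_size;
    out->isa_size = bios_size < 128 * KIB ? bios_size : 128 * KIB;
    out->isa_base = 1 * MIB - out->isa_size;
    return true;
}
```

For a 64 KiB image, this places the full copy at 0xFFFF0000, which covers the x86 reset vector at 0xFFFFFFF0. It also places the low alias at 0xF0000, the traditional F segment.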

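Philippe's suggestion to build qboot in a Docker job implies a reproducibility check. The check would rebuild the firmware and confirm that the result is byte-for-byte identical to the committed blob. Below is a minimal sketch of just the comparison step. It assumes both the freshly built qboot/bios.bin and the committed pc-bios/bios-microvm.bin are on disk. The helper is hypothetical and is not an existing QEMU test.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical CI helper: true only if both files open and hold exactly
 * the same bytes. */
static bool files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    bool same = (a != NULL && b != NULL);

    while (same) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb) {
            same = false;   /* differing byte, or one file is shorter */
        } else if (ca == EOF) {
            break;          /* both files ended at the same point */
        }
    }
    if (a) {
        fclose(a);
    }
    if (b) {
        fclose(b);
    }
    return same;
}
```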
^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 8/8] hw/i386: Introduce the microvm machine type
  2019-09-24 13:24       ` Michael S. Tsirkin
@ 2019-09-24 13:34         ` Paolo Bonzini
  -1 siblings, 0 replies; 133+ messages in thread
From: Paolo Bonzini @ 2019-09-24 13:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sergio Lopez, qemu-devel, imammedo, marcel.apfelbaum, rth,
	ehabkost, philmd, lersek, kraxel, mtosatti, kvm

On 24/09/19 15:24, Michael S. Tsirkin wrote:
> On Tue, Sep 24, 2019 at 03:12:15PM +0200, Paolo Bonzini wrote:
>> On 24/09/19 14:44, Sergio Lopez wrote:
>>> microvm.option-roms=bool (Set off to disable loading option ROMs)
>>
>> Please make this x-option-roms
> 
> Why? We don't plan to support this going forward?

The option is only useful for SeaBIOS.  Since it doesn't have any effect
for the default firmware, I think it's fair to consider it experimental.

Paolo
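In QEMU, an "x-" prefix on a property name marks the property as experimental, with no stability promise. The sketch below is hypothetical and is not QEMU's QOM property code. It models the microvm boolean options as a table, using the names proposed in this subthread: x-option-roms, and the auto-kernel-cmdline suggestion quoted below. The defaults are "on", following the cover letter's "Set off to disable" wording.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of microvm boolean options, using the names proposed
 * in review; every option defaults to on. */
typedef struct {
    const char *name;
    bool value;
} MicrovmBoolOpt;

static MicrovmBoolOpt microvm_opts[] = {
    { "x-option-roms",       true },
    { "isa-serial",          true },
    { "rtc",                 true },
    { "auto-kernel-cmdline", true },
};

/* QEMU convention: an "x-" prefix marks an experimental property. */
static bool opt_is_experimental(const char *name)
{
    return strncmp(name, "x-", 2) == 0;
}

/* Parse "name=on" or "name=off"; returns false for unknown names or values. */
static bool microvm_set_opt(const char *arg)
{
    const char *eq = strchr(arg, '=');
    if (eq == NULL) {
        return false;
    }
    size_t len = (size_t)(eq - arg);
    for (size_t i = 0; i < sizeof(microvm_opts) / sizeof(microvm_opts[0]); i++) {
        if (strlen(microvm_opts[i].name) == len &&
            strncmp(microvm_opts[i].name, arg, len) == 0) {
            if (strcmp(eq + 1, "on") == 0) {
                microvm_opts[i].value = true;
            } else if (strcmp(eq + 1, "off") == 0) {
                microvm_opts[i].value = false;
            } else {
                return false;
            }
            return true;
        }
    }
    return false;
}
```

After the rename, the old spelling "option-roms=off" is rejected as an unknown option rather than silently accepted.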

>>> microvm.isa-serial=bool (Set off to disable the instantiation of an ISA serial port)
>>> microvm.rtc=bool (Set off to disable the instantiation of an MC146818 RTC)
>>> microvm.kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
>>
>> Perhaps auto-kernel-cmdline?
>>
>> Paolo

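The kernel-cmdline (or auto-kernel-cmdline) option controls whether QEMU appends virtio-mmio device descriptions to the guest kernel command line. It relies on the Linux "virtio_mmio.device=<size>@<baseaddr>:<irq>" parameter. Below is a sketch of that string building. It assumes eight transports (as in the cover letter) laid out contiguously in 512-byte windows with consecutive IRQs. The base address and first IRQ are parameters, because their actual microvm values are not shown here. A real implementation would likely list only transports that actually have a device plugged in.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_TRANSPORTS 8     /* cover letter: up to eight virtio-mmio devices */
#define TRANSPORT_SIZE 512u  /* assumed size of each virtio-mmio window */

/* Hypothetical: copy the user's command line into buf and append one
 * virtio_mmio.device=<size>@<base>:<irq> entry per transport.
 * Returns 0 on success, or -1 if buf is too small. */
static int append_virtio_mmio_cmdline(char *buf, size_t bufsize,
                                      const char *user_cmdline,
                                      uint64_t mmio_base, unsigned first_irq)
{
    int n = snprintf(buf, bufsize, "%s", user_cmdline);
    if (n < 0 || (size_t)n >= bufsize) {
        return -1;
    }
    size_t used = (size_t)n;
    for (unsigned i = 0; i < NUM_TRANSPORTS; i++) {
        uint64_t base = mmio_base + (uint64_t)i * TRANSPORT_SIZE;
        n = snprintf(buf + used, bufsize - used,
                     " virtio_mmio.device=%u@0x%llx:%u",
                     TRANSPORT_SIZE, (unsigned long long)base, first_irq + i);
        if (n < 0 || (size_t)n >= bufsize - used) {
            return -1;  /* truncated: report failure instead of a partial line */
        }
        used += (size_t)n;
    }
    return 0;
}
```

For example, with "console=hvc0", base 0xfeb00000 and first IRQ 5 (illustrative values only), the result starts with "console=hvc0 virtio_mmio.device=512@0xfeb00000:5".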

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 4/8] hw/i386: split PCMachineState deriving X86MachineState from it
  2019-09-24 12:44 ` [PATCH v4 4/8] hw/i386: split PCMachineState deriving X86MachineState from it Sergio Lopez
@ 2019-09-24 13:40   ` Philippe Mathieu-Daudé
  2019-09-25 15:39     ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 133+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-24 13:40 UTC (permalink / raw)
  To: Sergio Lopez, qemu-devel
  Cc: ehabkost, kvm, mst, mtosatti, kraxel, pbonzini, imammedo, lersek, rth

On 9/24/19 2:44 PM, Sergio Lopez wrote:
> Split up PCMachineState and PCMachineClass and derive X86MachineState
> and X86MachineClass from them. This allows sharing code with non-PC
> machine types.
> 
> Also, move shared functions from pc.c to x86.c.
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  hw/acpi/cpu_hotplug.c |  10 +-
>  hw/i386/Makefile.objs |   1 +
>  hw/i386/acpi-build.c  |  31 +-
>  hw/i386/amd_iommu.c   |   4 +-
>  hw/i386/intel_iommu.c |   4 +-
>  hw/i386/pc.c          | 796 +++++-------------------------------------
>  hw/i386/pc_piix.c     |  48 +--
>  hw/i386/pc_q35.c      |  38 +-
>  hw/i386/pc_sysfw.c    |  60 +---
>  hw/i386/x86.c         | 788 +++++++++++++++++++++++++++++++++++++++++
>  hw/intc/ioapic.c      |   3 +-
>  include/hw/i386/pc.h  |  29 +-
>  include/hw/i386/x86.h |  97 +++++
>  13 files changed, 1045 insertions(+), 864 deletions(-)
>  create mode 100644 hw/i386/x86.c
>  create mode 100644 include/hw/i386/x86.h
> 
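The split makes PCMachineState embed the new X86MachineState as its first member. This is why the hunks that follow can replace PC_MACHINE(machine) with X86_MACHINE(machine) for fields such as apic_id_limit and below_4g_mem_size. Below is a simplified sketch of that layering with plain C structs. It is an assumption-level model: QOM's real types carry much more state, and QOM's cast macros check the dynamic type, which the unchecked cast here does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the QOM types; field names follow the diff. */
typedef struct MachineState {
    const char *type_name;
} MachineState;

typedef struct X86MachineState {
    MachineState parent;        /* first member, so up-casts are valid */
    uint32_t apic_id_limit;
    uint64_t below_4g_mem_size;
    bool apic_xrupt_override;
} X86MachineState;

typedef struct PCMachineState {
    X86MachineState parent;     /* first member: a PC machine is an x86 machine */
    bool smbus_enabled;         /* illustrative PC-only field */
} PCMachineState;

/* Unchecked stand-in for X86_MACHINE(); QOM's version verifies the type. */
static X86MachineState *to_x86(MachineState *ms)
{
    return (X86MachineState *)ms;
}

/* A shared helper in the spirit of x86.c: works for any x86 board, PC or
 * not (assumes apic_id_limit >= 1). */
static uint32_t x86_last_apic_id(const X86MachineState *x86ms)
{
    return x86ms->apic_id_limit - 1;
}
```

Because each parent is the first member, a pointer to the outer struct and a pointer to its embedded parent refer to the same address. This is what lets code moved into x86.c ignore whether it is running for pc, q35 or microvm.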
> diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c
> index 6e8293aac9..3ac2045a95 100644
> --- a/hw/acpi/cpu_hotplug.c
> +++ b/hw/acpi/cpu_hotplug.c
> @@ -128,7 +128,7 @@ void build_legacy_cpu_hotplug_aml(Aml *ctx, MachineState *machine,
>      Aml *one = aml_int(1);
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
>      const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
> -    PCMachineState *pcms = PC_MACHINE(machine);
> +    X86MachineState *x86ms = X86_MACHINE(machine);
>  
>      /*
>       * _MAT method - creates an madt apic buffer
> @@ -236,9 +236,9 @@ void build_legacy_cpu_hotplug_aml(Aml *ctx, MachineState *machine,
>      /* The current AML generator can cover the APIC ID range [0..255],
>       * inclusive, for VCPU hotplug. */
>      QEMU_BUILD_BUG_ON(ACPI_CPU_HOTPLUG_ID_LIMIT > 256);
> -    if (pcms->apic_id_limit > ACPI_CPU_HOTPLUG_ID_LIMIT) {
> +    if (x86ms->apic_id_limit > ACPI_CPU_HOTPLUG_ID_LIMIT) {
>          error_report("max_cpus is too large. APIC ID of last CPU is %u",
> -                     pcms->apic_id_limit - 1);
> +                     x86ms->apic_id_limit - 1);
>          exit(1);
>      }
>  
> @@ -315,8 +315,8 @@ void build_legacy_cpu_hotplug_aml(Aml *ctx, MachineState *machine,
>       * ith up to 255 elements. Windows guests up to win2k8 fail when
>       * VarPackageOp is used.
>       */
> -    pkg = pcms->apic_id_limit <= 255 ? aml_package(pcms->apic_id_limit) :
> -                                       aml_varpackage(pcms->apic_id_limit);
> +    pkg = x86ms->apic_id_limit <= 255 ? aml_package(x86ms->apic_id_limit) :
> +                                        aml_varpackage(x86ms->apic_id_limit);
>  
>      for (i = 0, apic_idx = 0; i < apic_ids->len; i++) {
>          int apic_id = apic_ids->cpus[i].arch_id;
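The cpu_hotplug.c hunk above keeps two limits: the legacy AML generator only covers APIC IDs 0..255, and the CPON array uses a fixed-size Package unless it needs more than 255 elements. Below is a sketch of both decisions. The opcode values come from the ACPI specification: PackageOp 0x12 has a one-byte element count, and VarPackageOp 0x13 was introduced in ACPI 2.0. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ACPI_CPU_HOTPLUG_ID_LIMIT 256  /* APIC IDs [0..255], as in the hunk */

#define AML_PACKAGE_OP     0x12  /* DefPackage: NumElements is a single byte */
#define AML_VAR_PACKAGE_OP 0x13  /* DefVarPackage: ACPI 2.0+, size is a TermArg */

/* Mirrors the build-time check: the highest APIC ID must fit in the range
 * the legacy AML generator can describe. */
static bool apic_ids_fit(uint32_t apic_id_limit)
{
    return apic_id_limit <= ACPI_CPU_HOTPLUG_ID_LIMIT;
}

/* Fixed-size packages hold at most 255 elements, and Windows guests up to
 * win2k8 reject VarPackageOp, so the variable form is used only when needed. */
static uint8_t cpon_package_op(uint32_t apic_id_limit)
{
    return apic_id_limit <= 255 ? AML_PACKAGE_OP : AML_VAR_PACKAGE_OP;
}
```

An apic_id_limit of exactly 256 therefore passes the range check but still needs VarPackageOp.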
> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> index 149712db07..5b4b3a672e 100644
> --- a/hw/i386/Makefile.objs
> +++ b/hw/i386/Makefile.objs
> @@ -1,6 +1,7 @@
>  obj-$(CONFIG_KVM) += kvm/
>  obj-y += multiboot.o
>  obj-y += pvh.o
> +obj-y += x86.o
>  obj-y += pc.o
>  obj-y += e820.o
>  obj-$(CONFIG_I440FX) += pc_piix.o
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index e54e571a75..76e18d3285 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -29,6 +29,7 @@
>  #include "hw/pci/pci.h"
>  #include "hw/core/cpu.h"
>  #include "target/i386/cpu.h"
> +#include "hw/i386/x86.h"
>  #include "hw/misc/pvpanic.h"
>  #include "hw/timer/hpet.h"
>  #include "hw/acpi/acpi-defs.h"
> @@ -361,6 +362,7 @@ static void
>  build_madt(GArray *table_data, BIOSLinker *linker, PCMachineState *pcms)
>  {
>      MachineClass *mc = MACHINE_GET_CLASS(pcms);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>      const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(MACHINE(pcms));
>      int madt_start = table_data->len;
>      AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(pcms->acpi_dev);
> @@ -390,7 +392,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, PCMachineState *pcms)
>      io_apic->address = cpu_to_le32(IO_APIC_DEFAULT_ADDRESS);
>      io_apic->interrupt = cpu_to_le32(0);
>  
> -    if (pcms->apic_xrupt_override) {
> +    if (x86ms->apic_xrupt_override) {
>          intsrcovr = acpi_data_push(table_data, sizeof *intsrcovr);
>          intsrcovr->type   = ACPI_APIC_XRUPT_OVERRIDE;
>          intsrcovr->length = sizeof(*intsrcovr);
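When apic_xrupt_override, now read from X86MachineState, is set, build_madt() appends one MADT Interrupt Source Override entry. The remaining field assignments fall outside this hunk. The sketch below assumes the usual routing of the ISA timer (IRQ 0) to I/O APIC input 2 with flags 0, meaning the signal conforms to the bus specification. It serializes the 10-byte entry in the little-endian layout defined by the ACPI specification. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MADT_TYPE_XRUPT_OVERRIDE 2   /* ACPI MADT structure type */
#define MADT_XRUPT_OVERRIDE_LEN  10  /* fixed entry length in bytes */

/* Write type, length, bus, source IRQ, 32-bit GSI and 16-bit MPS INTI
 * flags in little-endian order; returns the number of bytes written. */
static size_t madt_put_xrupt_override(uint8_t *out, uint8_t source_irq,
                                      uint32_t gsi, uint16_t flags)
{
    out[0] = MADT_TYPE_XRUPT_OVERRIDE;
    out[1] = MADT_XRUPT_OVERRIDE_LEN;
    out[2] = 0;                          /* bus: 0 means ISA */
    out[3] = source_irq;
    for (int i = 0; i < 4; i++) {
        out[4 + i] = (uint8_t)(gsi >> (8 * i));
    }
    out[8] = (uint8_t)(flags & 0xff);
    out[9] = (uint8_t)(flags >> 8);
    return MADT_XRUPT_OVERRIDE_LEN;
}
```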
> @@ -1817,8 +1819,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>      CrsRangeEntry *entry;
>      Aml *dsdt, *sb_scope, *scope, *dev, *method, *field, *pkg, *crs;
>      CrsRangeSet crs_range_set;
> -    PCMachineState *pcms = PC_MACHINE(machine);
>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
> +    X86MachineState *x86ms = X86_MACHINE(machine);
>      AcpiMcfgInfo mcfg;
>      uint32_t nr_mem = machine->ram_slots;
>      int root_bus_limit = 0xFF;
> @@ -2083,7 +2085,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>           * with half of the 16-bit control register. Hence, the total size
>           * of the i/o region used is FW_CFG_CTL_SIZE; when using DMA, the
>           * DMA control register is located at FW_CFG_DMA_IO_BASE + 4 */
> -        uint8_t io_size = object_property_get_bool(OBJECT(pcms->fw_cfg),
> +        uint8_t io_size = object_property_get_bool(OBJECT(x86ms->fw_cfg),
>                                                     "dma_enabled", NULL) ?
>                            ROUND_UP(FW_CFG_CTL_SIZE, 4) + sizeof(dma_addr_t) :
>                            FW_CFG_CTL_SIZE;
> @@ -2318,6 +2320,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
>      const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
>      PCMachineState *pcms = PC_MACHINE(machine);
> +    X86MachineState *x86ms = X86_MACHINE(machine);
>      ram_addr_t hotplugabble_address_space_size =
>          object_property_get_int(OBJECT(pcms), PC_MACHINE_DEVMEM_REGION_SIZE,
>                                  NULL);
> @@ -2386,16 +2389,16 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>          }
>  
>          /* Cut out the ACPI_PCI hole */
> -        if (mem_base <= pcms->below_4g_mem_size &&
> -            next_base > pcms->below_4g_mem_size) {
> -            mem_len -= next_base - pcms->below_4g_mem_size;
> +        if (mem_base <= x86ms->below_4g_mem_size &&
> +            next_base > x86ms->below_4g_mem_size) {
> +            mem_len -= next_base - x86ms->below_4g_mem_size;
>              if (mem_len > 0) {
>                  numamem = acpi_data_push(table_data, sizeof *numamem);
>                  build_srat_memory(numamem, mem_base, mem_len, i - 1,
>                                    MEM_AFFINITY_ENABLED);
>              }
>              mem_base = 1ULL << 32;
> -            mem_len = next_base - pcms->below_4g_mem_size;
> +            mem_len = next_base - x86ms->below_4g_mem_size;
>              next_base = mem_base + mem_len;
>          }
>  
> @@ -2614,6 +2617,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>  {
>      PCMachineState *pcms = PC_MACHINE(machine);
>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> +    X86MachineState *x86ms = X86_MACHINE(machine);
>      GArray *table_offsets;
>      unsigned facs, dsdt, rsdt, fadt;
>      AcpiPmInfo pm;
> @@ -2775,7 +2779,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>           */
>          int legacy_aml_len =
>              pcmc->legacy_acpi_table_size +
> -            ACPI_BUILD_LEGACY_CPU_AML_SIZE * pcms->apic_id_limit;
> +            ACPI_BUILD_LEGACY_CPU_AML_SIZE * x86ms->apic_id_limit;
>          int legacy_table_size =
>              ROUND_UP(tables_blob->len - aml_len + legacy_aml_len,
>                       ACPI_BUILD_ALIGN_SIZE);
> @@ -2865,13 +2869,14 @@ void acpi_setup(void)
>  {
>      PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>      AcpiBuildTables tables;
>      AcpiBuildState *build_state;
>      Object *vmgenid_dev;
>      TPMIf *tpm;
>      static FwCfgTPMConfig tpm_config;
>  
> -    if (!pcms->fw_cfg) {
> +    if (!x86ms->fw_cfg) {
>          ACPI_BUILD_DPRINTF("No fw cfg. Bailing out.\n");
>          return;
>      }
> @@ -2902,7 +2907,7 @@ void acpi_setup(void)
>          acpi_add_rom_blob(acpi_build_update, build_state,
>                            tables.linker->cmd_blob, "etc/table-loader", 0);
>  
> -    fw_cfg_add_file(pcms->fw_cfg, ACPI_BUILD_TPMLOG_FILE,
> +    fw_cfg_add_file(x86ms->fw_cfg, ACPI_BUILD_TPMLOG_FILE,
>                      tables.tcpalog->data, acpi_data_len(tables.tcpalog));
>  
>      tpm = tpm_find();
> @@ -2912,13 +2917,13 @@ void acpi_setup(void)
>              .tpm_version = tpm_get_version(tpm),
>              .tpmppi_version = TPM_PPI_VERSION_1_30
>          };
> -        fw_cfg_add_file(pcms->fw_cfg, "etc/tpm/config",
> +        fw_cfg_add_file(x86ms->fw_cfg, "etc/tpm/config",
>                          &tpm_config, sizeof tpm_config);
>      }
>  
>      vmgenid_dev = find_vmgenid_dev();
>      if (vmgenid_dev) {
> -        vmgenid_add_fw_cfg(VMGENID(vmgenid_dev), pcms->fw_cfg,
> +        vmgenid_add_fw_cfg(VMGENID(vmgenid_dev), x86ms->fw_cfg,
>                             tables.vmgenid);
>      }
>  
> @@ -2931,7 +2936,7 @@ void acpi_setup(void)
>          uint32_t rsdp_size = acpi_data_len(tables.rsdp);
>  
>          build_state->rsdp = g_memdup(tables.rsdp->data, rsdp_size);
> -        fw_cfg_add_file_callback(pcms->fw_cfg, ACPI_BUILD_RSDP_FILE,
> +        fw_cfg_add_file_callback(x86ms->fw_cfg, ACPI_BUILD_RSDP_FILE,
>                                   acpi_build_update, NULL, build_state,
>                                   build_state->rsdp, rsdp_size, true);
>          build_state->rsdp_mr = NULL;
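Every acpi-build.c change above follows one pattern. Fields such as fw_cfg, below_4g_mem_size and apic_id_limit move from PCMachineState into the new X86MachineState, and callers reach them through X86_MACHINE(). That works only because the x86 type is embedded as the first member (the QOM parent) of the PC type. Below is a minimal sketch of the idea using simplified stand-in structs. The sketch_* casts are hypothetical and unchecked. QEMU's real casts go through QOM and also verify the object's type at runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; QEMU's real types carry many more fields. */
typedef struct MachineState {
    const char *boot_order;
} MachineState;

typedef struct X86MachineState {
    MachineState parent_obj;    /* parent must be the first member */
    uint64_t below_4g_mem_size;
    uint64_t above_4g_mem_size;
    unsigned apic_id_limit;
} X86MachineState;

typedef struct PCMachineState {
    X86MachineState parent_obj; /* a PC machine "is an" x86 machine */
    int acpi_build_enabled;
} PCMachineState;

/* The upcasts below rely on each parent living at offset 0 of its child. */
_Static_assert(offsetof(X86MachineState, parent_obj) == 0, "parent first");
_Static_assert(offsetof(PCMachineState, parent_obj) == 0, "parent first");

/* Unchecked stand-ins for X86_MACHINE() and MACHINE(). */
static X86MachineState *sketch_x86_machine(void *obj)
{
    return (X86MachineState *)obj;
}

static MachineState *sketch_machine(void *obj)
{
    return (MachineState *)obj;
}
```

Keeping the parent at offset 0 lets one object pointer be viewed as a MachineState, an X86MachineState or a PCMachineState. That is why moving a field up one level only requires switching the cast at each call site.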
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index 08884523e2..bb3b5b4563 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -21,6 +21,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "hw/i386/x86.h"
>  #include "hw/i386/pc.h"
>  #include "hw/pci/msi.h"
>  #include "hw/pci/pci_bus.h"
> @@ -1537,6 +1538,7 @@ static void amdvi_realize(DeviceState *dev, Error **err)
>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
>      MachineState *ms = MACHINE(qdev_get_machine());
>      PCMachineState *pcms = PC_MACHINE(ms);
> +    X86MachineState *x86ms = X86_MACHINE(ms);
>      PCIBus *bus = pcms->bus;
>  
>      s->iotlb = g_hash_table_new_full(amdvi_uint64_hash,
> @@ -1565,7 +1567,7 @@ static void amdvi_realize(DeviceState *dev, Error **err)
>      }
>  
>      /* Pseudo address space under root PCI bus. */
> -    pcms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_IOAPIC_SB_DEVID);
> +    x86ms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_IOAPIC_SB_DEVID);
>  
>      /* set up MMIO */
>      memory_region_init_io(&s->mmio, OBJECT(s), &mmio_mem_ops, s, "amdvi-mmio",
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 75ca6f9c70..21f091c654 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -29,6 +29,7 @@
>  #include "hw/pci/pci.h"
>  #include "hw/pci/pci_bus.h"
>  #include "hw/qdev-properties.h"
> +#include "hw/i386/x86.h"
>  #include "hw/i386/pc.h"
>  #include "hw/i386/apic-msidef.h"
>  #include "hw/boards.h"
> @@ -3703,6 +3704,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>  {
>      MachineState *ms = MACHINE(qdev_get_machine());
>      PCMachineState *pcms = PC_MACHINE(ms);
> +    X86MachineState *x86ms = X86_MACHINE(ms);
>      PCIBus *bus = pcms->bus;
>      IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
> @@ -3743,7 +3745,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>      sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
>      pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
>      /* Pseudo address space under root PCI bus. */
> -    pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
> +    x86ms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
>  }
>  
>  static void vtd_class_init(ObjectClass *klass, void *data)
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 3920aa7e85..d18b461f01 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -24,6 +24,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qemu/units.h"
> +#include "hw/i386/x86.h"
>  #include "hw/i386/pc.h"
>  #include "hw/char/serial.h"
>  #include "hw/char/parallel.h"
> @@ -676,6 +677,7 @@ void pc_cmos_init(PCMachineState *pcms,
>                    BusState *idebus0, BusState *idebus1,
>                    ISADevice *s)
>  {
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>      int val;
>      static pc_cmos_init_late_arg arg;
>  
> @@ -683,12 +685,12 @@ void pc_cmos_init(PCMachineState *pcms,
>  
>      /* memory size */
>      /* base memory (first MiB) */
> -    val = MIN(pcms->below_4g_mem_size / KiB, 640);
> +    val = MIN(x86ms->below_4g_mem_size / KiB, 640);
>      rtc_set_memory(s, 0x15, val);
>      rtc_set_memory(s, 0x16, val >> 8);
>      /* extended memory (next 64MiB) */
> -    if (pcms->below_4g_mem_size > 1 * MiB) {
> -        val = (pcms->below_4g_mem_size - 1 * MiB) / KiB;
> +    if (x86ms->below_4g_mem_size > 1 * MiB) {
> +        val = (x86ms->below_4g_mem_size - 1 * MiB) / KiB;
>      } else {
>          val = 0;
>      }
> @@ -699,8 +701,8 @@ void pc_cmos_init(PCMachineState *pcms,
>      rtc_set_memory(s, 0x30, val);
>      rtc_set_memory(s, 0x31, val >> 8);
>      /* memory between 16MiB and 4GiB */
> -    if (pcms->below_4g_mem_size > 16 * MiB) {
> -        val = (pcms->below_4g_mem_size - 16 * MiB) / (64 * KiB);
> +    if (x86ms->below_4g_mem_size > 16 * MiB) {
> +        val = (x86ms->below_4g_mem_size - 16 * MiB) / (64 * KiB);
>      } else {
>          val = 0;
>      }
> @@ -709,20 +711,20 @@ void pc_cmos_init(PCMachineState *pcms,
>      rtc_set_memory(s, 0x34, val);
>      rtc_set_memory(s, 0x35, val >> 8);
>      /* memory above 4GiB */
> -    val = pcms->above_4g_mem_size / 65536;
> +    val = x86ms->above_4g_mem_size / 65536;
>      rtc_set_memory(s, 0x5b, val);
>      rtc_set_memory(s, 0x5c, val >> 8);
>      rtc_set_memory(s, 0x5d, val >> 16);
>  
> -    object_property_add_link(OBJECT(pcms), "rtc_state",
> +    object_property_add_link(OBJECT(x86ms), "rtc_state",
>                               TYPE_ISA_DEVICE,
> -                             (Object **)&pcms->rtc,
> +                             (Object **)&x86ms->rtc,
>                               object_property_allow_set_link,
>                               OBJ_PROP_LINK_STRONG, &error_abort);
> -    object_property_set_link(OBJECT(pcms), OBJECT(s),
> +    object_property_set_link(OBJECT(x86ms), OBJECT(s),
>                               "rtc_state", &error_abort);
>  
> -    set_boot_dev(s, MACHINE(pcms)->boot_order, &error_fatal);
> +    set_boot_dev(s, MACHINE(x86ms)->boot_order, &error_fatal);
>  
>      val = 0;
>      val |= 0x02; /* FPU is there */
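The CMOS fields rewritten above encode the memory layout in fixed units. The sketch below reproduces that arithmetic on its own, under two assumptions. First, each two-byte value is clamped to 65535, because the lines that would do this are elided from these hunks. Second, the three bytes at 0x5b..0x5d keep the low 24 bits of the 64 KiB count. CmosMemFields and cmos_mem_fields() are hypothetical names, and KiB/MiB mirror the macros from qemu/units.h.

```c
#include <assert.h>
#include <stdint.h>

#define KiB (1ULL << 10)
#define MiB (1ULL << 20)

/* Hypothetical container for the values pc_cmos_init() derives. */
typedef struct CmosMemFields {
    unsigned base_kib;   /* regs 0x15/0x16: base memory in KiB, max 640 */
    unsigned ext_kib;    /* regs 0x17/0x18, 0x30/0x31: KiB above 1 MiB */
    unsigned ext16_64k;  /* regs 0x34/0x35: 64 KiB units above 16 MiB */
    uint32_t high_64k;   /* regs 0x5b..0x5d: 64 KiB units above 4 GiB */
} CmosMemFields;

/* Assumed clamp: each value has to fit a 16-bit register pair. */
static unsigned clamp16(uint64_t v)
{
    return v > 65535 ? 65535u : (unsigned)v;
}

static CmosMemFields cmos_mem_fields(uint64_t below_4g, uint64_t above_4g)
{
    CmosMemFields f;
    uint64_t base_kib = below_4g / KiB;

    f.base_kib = base_kib < 640 ? (unsigned)base_kib : 640;
    f.ext_kib = below_4g > 1 * MiB ?
                clamp16((below_4g - 1 * MiB) / KiB) : 0;
    f.ext16_64k = below_4g > 16 * MiB ?
                  clamp16((below_4g - 16 * MiB) / (64 * KiB)) : 0;
    /* Three byte-wide registers hold only the low 24 bits. */
    f.high_64k = (uint32_t)(above_4g / 65536) & 0xffffff;
    return f;
}
```

With 3 GiB below 4 GiB and 1 GiB above, this yields 640 KiB of base memory, a saturated 65535 for the extended-memory field, 48896 units between 16 MiB and 4 GiB, and 16384 units above 4 GiB.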
> @@ -863,35 +865,6 @@ static void handle_a20_line_change(void *opaque, int irq, int level)
>      x86_cpu_set_a20(cpu, level);
>  }
>  
> -/* Calculates initial APIC ID for a specific CPU index
> - *
> - * Currently we need to be able to calculate the APIC ID from the CPU index
> - * alone (without requiring a CPU object), as the QEMU<->Seabios interfaces have
> - * no concept of "CPU index", and the NUMA tables on fw_cfg need the APIC ID of
> - * all CPUs up to max_cpus.
> - */
> -static uint32_t x86_cpu_apic_id_from_index(PCMachineState *pcms,
> -                                           unsigned int cpu_index)
> -{
> -    MachineState *ms = MACHINE(pcms);
> -    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> -    uint32_t correct_id;
> -    static bool warned;
> -
> -    correct_id = x86_apicid_from_cpu_idx(pcms->smp_dies, ms->smp.cores,
> -                                         ms->smp.threads, cpu_index);
> -    if (pcmc->compat_apic_id_mode) {
> -        if (cpu_index != correct_id && !warned && !qtest_enabled()) {
> -            error_report("APIC IDs set in compatibility mode, "
> -                         "CPU topology won't match the configuration");
> -            warned = true;
> -        }
> -        return cpu_index;
> -    } else {
> -        return correct_id;
> -    }
> -}
> -
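The helper removed here maps a CPU index to an APIC ID. It packs the thread, core, die and package numbers into bit fields, and each field's width is rounded up to fit its count. The sketch below shows that packing. It is modeled on, not copied from, x86_apicid_from_cpu_idx() in QEMU's topology header, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits needed to encode IDs 0..count-1; zero when count <= 1. */
static unsigned bitwidth_for_count(unsigned count)
{
    unsigned width = 0;

    if (count <= 1) {
        return 0;
    }
    for (count -= 1; count; count >>= 1) {
        width++;
    }
    return width;
}

/* Packs (package, die, core, thread) into APIC ID bit fields. */
static uint32_t sketch_apicid_from_cpu_idx(unsigned nr_dies, unsigned nr_cores,
                                           unsigned nr_threads,
                                           unsigned cpu_index)
{
    unsigned core_offset = bitwidth_for_count(nr_threads);
    unsigned die_offset = core_offset + bitwidth_for_count(nr_cores);
    unsigned pkg_offset = die_offset + bitwidth_for_count(nr_dies);

    unsigned smt_id = cpu_index % nr_threads;
    unsigned core_id = (cpu_index / nr_threads) % nr_cores;
    unsigned die_id = (cpu_index / (nr_threads * nr_cores)) % nr_dies;
    unsigned pkg_id = cpu_index / (nr_threads * nr_cores * nr_dies);

    return (pkg_id << pkg_offset) | (die_id << die_offset) |
           (core_id << core_offset) | smt_id;
}

/* Mirrors the removed helper: compat machine types just reuse the index. */
static uint32_t sketch_apic_id_from_index(bool compat_apic_id_mode,
                                          unsigned nr_dies, unsigned nr_cores,
                                          unsigned nr_threads,
                                          unsigned cpu_index)
{
    if (compat_apic_id_mode) {
        return cpu_index;
    }
    return sketch_apicid_from_cpu_idx(nr_dies, nr_cores, nr_threads,
                                      cpu_index);
}
```

Take one die, three cores and one thread per core. CPU indices 0..5 then map to APIC IDs 0, 1, 2, 4, 5 and 6, because the core field needs two bits and ID 3 is never used. Sparse IDs like these are why apic_id_limit, rather than the CPU count, sizes the firmware tables.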
>  static void pc_build_smbios(PCMachineState *pcms)
>  {
>      uint8_t *smbios_tables, *smbios_anchor;
> @@ -899,6 +872,7 @@ static void pc_build_smbios(PCMachineState *pcms)
>      struct smbios_phys_mem_area *mem_array;
>      unsigned i, array_count;
>      MachineState *ms = MACHINE(pcms);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>      X86CPU *cpu = X86_CPU(ms->possible_cpus->cpus[0].cpu);
>  
>      /* tell smbios about cpuid version and features */
> @@ -906,7 +880,7 @@ static void pc_build_smbios(PCMachineState *pcms)
>  
>      smbios_tables = smbios_get_table_legacy(ms, &smbios_tables_len);
>      if (smbios_tables) {
> -        fw_cfg_add_bytes(pcms->fw_cfg, FW_CFG_SMBIOS_ENTRIES,
> +        fw_cfg_add_bytes(x86ms->fw_cfg, FW_CFG_SMBIOS_ENTRIES,
>                           smbios_tables, smbios_tables_len);
>      }
>  
> @@ -927,9 +901,9 @@ static void pc_build_smbios(PCMachineState *pcms)
>      g_free(mem_array);
>  
>      if (smbios_anchor) {
> -        fw_cfg_add_file(pcms->fw_cfg, "etc/smbios/smbios-tables",
> +        fw_cfg_add_file(x86ms->fw_cfg, "etc/smbios/smbios-tables",
>                          smbios_tables, smbios_tables_len);
> -        fw_cfg_add_file(pcms->fw_cfg, "etc/smbios/smbios-anchor",
> +        fw_cfg_add_file(x86ms->fw_cfg, "etc/smbios/smbios-anchor",
>                          smbios_anchor, smbios_anchor_len);
>      }
>  }
> @@ -942,10 +916,11 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
>      const CPUArchIdList *cpus;
>      MachineClass *mc = MACHINE_GET_CLASS(pcms);
>      MachineState *ms = MACHINE(pcms);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>      int nb_numa_nodes = ms->numa_state->num_nodes;
>  
>      fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as);
> -    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
> +    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
>  
>      /* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86:
>       *
> @@ -959,7 +934,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
>       * So for compatibility reasons with old BIOSes we are stuck with
>       * "etc/max-cpus" actually being apic_id_limit
>       */
> -    fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)pcms->apic_id_limit);
> +    fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)x86ms->apic_id_limit);
>      fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
>      fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES,
>                       acpi_tables, acpi_tables_len);
> @@ -972,374 +947,25 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
>       * of nodes, one word for each VCPU->node and one word for each node to
>       * hold the amount of memory.
>       */
> -    numa_fw_cfg = g_new0(uint64_t, 1 + pcms->apic_id_limit + nb_numa_nodes);
> +    numa_fw_cfg = g_new0(uint64_t, 1 + x86ms->apic_id_limit + nb_numa_nodes);
>      numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
>      cpus = mc->possible_cpu_arch_ids(MACHINE(pcms));
>      for (i = 0; i < cpus->len; i++) {
>          unsigned int apic_id = cpus->cpus[i].arch_id;
> -        assert(apic_id < pcms->apic_id_limit);
> +        assert(apic_id < x86ms->apic_id_limit);
>          numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);
>      }
>      for (i = 0; i < nb_numa_nodes; i++) {
> -        numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
> +        numa_fw_cfg[x86ms->apic_id_limit + 1 + i] =
>              cpu_to_le64(ms->numa_state->nodes[i].node_mem);
>      }
>      fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
> -                     (1 + pcms->apic_id_limit + nb_numa_nodes) *
> +                     (1 + x86ms->apic_id_limit + nb_numa_nodes) *
>                       sizeof(*numa_fw_cfg));
>  
>      return fw_cfg;
>  }
>  
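The FW_CFG_NUMA blob built in bochs_bios_init() is indexed by APIC ID, not by CPU index. It holds one word for the node count, then one word per possible APIC ID below apic_id_limit giving that ID's node, then one word of memory size per node. The sketch below builds that layout in host byte order for readability, whereas the patch converts every word with cpu_to_le64(). SketchCpu and sketch_numa_fw_cfg() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-CPU entry: APIC ID plus the NUMA node it belongs to. */
typedef struct SketchCpu {
    unsigned apic_id;
    uint64_t node_id;
} SketchCpu;

/*
 * Layout: [nb_nodes][node of APIC ID 0 .. apic_id_limit-1][mem of each node].
 * Returns a calloc'ed table (caller frees) and its length in *out_len.
 */
static uint64_t *sketch_numa_fw_cfg(const SketchCpu *cpus, size_t nr_cpus,
                                    unsigned apic_id_limit,
                                    const uint64_t *node_mem,
                                    unsigned nb_nodes, size_t *out_len)
{
    size_t len = 1 + (size_t)apic_id_limit + nb_nodes;
    uint64_t *table = calloc(len, sizeof(*table));
    size_t i;

    if (!table) {
        return NULL;
    }
    table[0] = nb_nodes;
    for (i = 0; i < nr_cpus; i++) {
        assert(cpus[i].apic_id < apic_id_limit);
        table[cpus[i].apic_id + 1] = cpus[i].node_id;
    }
    for (i = 0; i < nb_nodes; i++) {
        table[apic_id_limit + 1 + i] = node_mem[i];
    }
    *out_len = len;
    return table;
}
```

Slots for APIC IDs that no CPU uses stay zero. In this format they are indistinguishable from node 0.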
> -static long get_file_size(FILE *f)
> -{
> -    long where, size;
> -
> -    /* XXX: on Unix systems, using fstat() probably makes more sense */
> -
> -    where = ftell(f);
> -    fseek(f, 0, SEEK_END);
> -    size = ftell(f);
> -    fseek(f, where, SEEK_SET);
> -
> -    return size;
> -}
> -
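The XXX comment in get_file_size() suggests using fstat() instead. Below is a POSIX-only sketch of that alternative, assuming the stream wraps a regular file. Unlike the ftell()/fseek() version, it leaves the stream position untouched. It only sees data that has already reached the file descriptor, so unflushed writes are not counted. This does not matter for the read-only kernel file here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns the size of the file behind f, or -1 if fstat() fails. */
static long sketch_get_file_size(FILE *f)
{
    struct stat st;

    if (fstat(fileno(f), &st) < 0) {
        return -1;
    }
    return (long)st.st_size;
}
```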
> -struct setup_data {
> -    uint64_t next;
> -    uint32_t type;
> -    uint32_t len;
> -    uint8_t data[0];
> -} __attribute__((packed));
> -
> -static void load_linux(PCMachineState *pcms,
> -                       FWCfgState *fw_cfg)
> -{
> -    uint16_t protocol;
> -    int setup_size, kernel_size, cmdline_size;
> -    int dtb_size, setup_data_offset;
> -    uint32_t initrd_max;
> -    uint8_t header[8192], *setup, *kernel;
> -    hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0;
> -    FILE *f;
> -    char *vmode;
> -    MachineState *machine = MACHINE(pcms);
> -    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> -    struct setup_data *setup_data;
> -    const char *kernel_filename = machine->kernel_filename;
> -    const char *initrd_filename = machine->initrd_filename;
> -    const char *dtb_filename = machine->dtb;
> -    const char *kernel_cmdline = machine->kernel_cmdline;
> -
> -    /* Align to 16 bytes as a paranoia measure */
> -    cmdline_size = (strlen(kernel_cmdline)+16) & ~15;
> -
> -    /* load the kernel header */
> -    f = fopen(kernel_filename, "rb");
> -    if (!f || !(kernel_size = get_file_size(f)) ||
> -        fread(header, 1, MIN(ARRAY_SIZE(header), kernel_size), f) !=
> -        MIN(ARRAY_SIZE(header), kernel_size)) {
> -        fprintf(stderr, "qemu: could not load kernel '%s': %s\n",
> -                kernel_filename, strerror(errno));
> -        exit(1);
> -    }
> -
> -    /* kernel protocol version */
> -#if 0
> -    fprintf(stderr, "header magic: %#x\n", ldl_p(header+0x202));
> -#endif
> -    if (ldl_p(header+0x202) == 0x53726448) {
> -        protocol = lduw_p(header+0x206);
> -    } else {
> -        size_t pvh_start_addr;
> -        uint32_t mh_load_addr = 0;
> -        uint32_t elf_kernel_size = 0;
> -        /*
> -         * This could be a multiboot kernel. If it is, let's stop treating it
> -         * like a Linux kernel.
> -         * Note: some multiboot images could be in the ELF format (the same of
> -         * PVH), so we try multiboot first since we check the multiboot magic
> -         * header before to load it.
> -         */
> -        if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
> -                           kernel_cmdline, kernel_size, header)) {
> -            return;
> -        }
> -        /*
> -         * Check if the file is an uncompressed kernel file (ELF) and load it,
> -         * saving the PVH entry point used by the x86/HVM direct boot ABI.
> -         * If load_elfboot() is successful, populate the fw_cfg info.
> -         */
> -        if (pcmc->pvh_enabled &&
> -            pvh_load_elfboot(kernel_filename,
> -                             &mh_load_addr, &elf_kernel_size)) {
> -            fclose(f);
> -
> -            pvh_start_addr = pvh_get_start_addr();
> -
> -            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
> -            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> -            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> -
> -            fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
> -                strlen(kernel_cmdline) + 1);
> -            fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> -
> -            fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
> -            fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
> -                             header, sizeof(header));
> -
> -            /* load initrd */
> -            if (initrd_filename) {
> -                GMappedFile *mapped_file;
> -                gsize initrd_size;
> -                gchar *initrd_data;
> -                GError *gerr = NULL;
> -
> -                mapped_file = g_mapped_file_new(initrd_filename, false, &gerr);
> -                if (!mapped_file) {
> -                    fprintf(stderr, "qemu: error reading initrd %s: %s\n",
> -                            initrd_filename, gerr->message);
> -                    exit(1);
> -                }
> -                pcms->initrd_mapped_file = mapped_file;
> -
> -                initrd_data = g_mapped_file_get_contents(mapped_file);
> -                initrd_size = g_mapped_file_get_length(mapped_file);
> -                initrd_max = pcms->below_4g_mem_size - pcmc->acpi_data_size - 1;
> -                if (initrd_size >= initrd_max) {
> -                    fprintf(stderr, "qemu: initrd is too large, cannot support."
> -                            "(max: %"PRIu32", need %"PRId64")\n",
> -                            initrd_max, (uint64_t)initrd_size);
> -                    exit(1);
> -                }
> -
> -                initrd_addr = (initrd_max - initrd_size) & ~4095;
> -
> -                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
> -                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
> -                fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data,
> -                                 initrd_size);
> -            }
> -
> -            option_rom[nb_option_roms].bootindex = 0;
> -            option_rom[nb_option_roms].name = "pvh.bin";
> -            nb_option_roms++;
> -
> -            return;
> -        }
> -        protocol = 0;
> -    }
> -
> -    if (protocol < 0x200 || !(header[0x211] & 0x01)) {
> -        /* Low kernel */
> -        real_addr    = 0x90000;
> -        cmdline_addr = 0x9a000 - cmdline_size;
> -        prot_addr    = 0x10000;
> -    } else if (protocol < 0x202) {
> -        /* High but ancient kernel */
> -        real_addr    = 0x90000;
> -        cmdline_addr = 0x9a000 - cmdline_size;
> -        prot_addr    = 0x100000;
> -    } else {
> -        /* High and recent kernel */
> -        real_addr    = 0x10000;
> -        cmdline_addr = 0x20000;
> -        prot_addr    = 0x100000;
> -    }
> -
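The header checks and load-address selection above depend on only a few header bytes. These are the "HdrS" magic at 0x202, the protocol version at 0x206, and the LOADED_HIGH bit (bit 0) of the load flags at 0x211. The sketch below isolates that decision, assuming a little-endian header as on x86. The sketch_* helpers are hypothetical stand-ins for the ldl_p()/lduw_p() logic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three addresses load_linux() picks. */
typedef struct SketchBootAddrs {
    uint32_t real_addr;     /* real-mode setup code */
    uint32_t cmdline_addr;  /* kernel command line */
    uint32_t prot_addr;     /* protected-mode kernel */
} SketchBootAddrs;

/* Little-endian loads, standing in for lduw_p()/ldl_p() on x86. */
static uint16_t sketch_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t sketch_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Boot protocol version, or 0 when the "HdrS" magic (0x53726448) is absent. */
static uint16_t sketch_boot_protocol(const uint8_t *header)
{
    if (sketch_le32(header + 0x202) != 0x53726448) {
        return 0;
    }
    return sketch_le16(header + 0x206);
}

/* Padded to a multiple of 16 that always exceeds strlen(), leaving room for NUL. */
static uint32_t sketch_cmdline_size(const char *cmdline)
{
    return (uint32_t)((strlen(cmdline) + 16) & ~(size_t)15);
}

static SketchBootAddrs sketch_boot_addrs(const uint8_t *header,
                                         uint16_t protocol,
                                         uint32_t cmdline_size)
{
    SketchBootAddrs a;

    if (protocol < 0x200 || !(header[0x211] & 0x01)) {
        /* Low kernel */
        a.real_addr = 0x90000;
        a.cmdline_addr = 0x9a000 - cmdline_size;
        a.prot_addr = 0x10000;
    } else if (protocol < 0x202) {
        /* High but ancient kernel */
        a.real_addr = 0x90000;
        a.cmdline_addr = 0x9a000 - cmdline_size;
        a.prot_addr = 0x100000;
    } else {
        /* High and recent kernel */
        a.real_addr = 0x10000;
        a.cmdline_addr = 0x20000;
        a.prot_addr = 0x100000;
    }
    return a;
}
```

In the removed code, protocol 0 is reached only after the multiboot and PVH attempts have both declined the image.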
> -#if 0
> -    fprintf(stderr,
> -            "qemu: real_addr     = 0x" TARGET_FMT_plx "\n"
> -            "qemu: cmdline_addr  = 0x" TARGET_FMT_plx "\n"
> -            "qemu: prot_addr     = 0x" TARGET_FMT_plx "\n",
> -            real_addr,
> -            cmdline_addr,
> -            prot_addr);
> -#endif
> -
> -    /* highest address for loading the initrd */
> -    if (protocol >= 0x20c &&
> -        lduw_p(header+0x236) & XLF_CAN_BE_LOADED_ABOVE_4G) {
> -        /*
> -         * Linux has supported initrd up to 4 GB for a very long time (2007,
> -         * long before XLF_CAN_BE_LOADED_ABOVE_4G which was added in 2013),
> -         * though it only sets initrd_max to 2 GB to "work around bootloader
> -         * bugs". Luckily, QEMU firmware(which does something like bootloader)
> -         * has supported this.
> -         *
> -         * It's believed that if XLF_CAN_BE_LOADED_ABOVE_4G is set, initrd can
> -         * be loaded into any address.
> -         *
> -         * In addition, initrd_max is uint32_t simply because QEMU doesn't
> -         * support the 64-bit boot protocol (specifically the ext_ramdisk_image
> -         * field).
> -         *
> -         * Therefore here just limit initrd_max to UINT32_MAX simply as well.
> -         */
> -        initrd_max = UINT32_MAX;
> -    } else if (protocol >= 0x203) {
> -        initrd_max = ldl_p(header+0x22c);
> -    } else {
> -        initrd_max = 0x37ffffff;
> -    }
> -
> -    if (initrd_max >= pcms->below_4g_mem_size - pcmc->acpi_data_size) {
> -        initrd_max = pcms->below_4g_mem_size - pcmc->acpi_data_size - 1;
> -    }
> -
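The initrd ceiling is chosen from the boot protocol version and then clamped to stay below the ACPI data at the top of low memory. Later, the image is placed as high as possible under that ceiling, on a 4 KiB boundary. The sketch below covers both steps. It assumes the caller has already decoded the XLF_CAN_BE_LOADED_ABOVE_4G bit and the header's initrd limit (offset 0x22c) into plain arguments. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Highest usable initrd address, following the rules in load_linux(). */
static uint32_t sketch_initrd_max(uint16_t protocol, int can_load_above_4g,
                                  uint32_t hdr_initrd_addr_max,
                                  uint64_t below_4g_mem_size,
                                  uint64_t acpi_data_size)
{
    uint32_t initrd_max;

    if (protocol >= 0x20c && can_load_above_4g) {
        initrd_max = UINT32_MAX;          /* 32-bit boot protocol limit */
    } else if (protocol >= 0x203) {
        initrd_max = hdr_initrd_addr_max; /* header field at 0x22c */
    } else {
        initrd_max = 0x37ffffff;          /* fixed limit for older kernels */
    }

    /* Keep the initrd below the ACPI data at the top of low RAM. */
    if (initrd_max >= below_4g_mem_size - acpi_data_size) {
        initrd_max = (uint32_t)(below_4g_mem_size - acpi_data_size - 1);
    }
    return initrd_max;
}

/* Places the initrd as high as possible, 4 KiB aligned; -1 if it cannot fit. */
static int sketch_initrd_addr(uint32_t initrd_max, uint64_t initrd_size,
                              uint32_t *initrd_addr)
{
    if (initrd_size >= initrd_max) {
        return -1;
    }
    *initrd_addr = (uint32_t)((initrd_max - initrd_size) & ~(uint64_t)4095);
    return 0;
}
```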
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline)+1);
> -    fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> -
> -    if (protocol >= 0x202) {
> -        stl_p(header+0x228, cmdline_addr);
> -    } else {
> -        stw_p(header+0x20, 0xA33F);
> -        stw_p(header+0x22, cmdline_addr-real_addr);
> -    }
> -
> -    /* handle vga= parameter */
> -    vmode = strstr(kernel_cmdline, "vga=");
> -    if (vmode) {
> -        unsigned int video_mode;
> -        /* skip "vga=" */
> -        vmode += 4;
> -        if (!strncmp(vmode, "normal", 6)) {
> -            video_mode = 0xffff;
> -        } else if (!strncmp(vmode, "ext", 3)) {
> -            video_mode = 0xfffe;
> -        } else if (!strncmp(vmode, "ask", 3)) {
> -            video_mode = 0xfffd;
> -        } else {
> -            video_mode = strtol(vmode, NULL, 0);
> -        }
> -        stw_p(header+0x1fa, video_mode);
> -    }
> -
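The vga= handling maps three keywords to reserved mode numbers and parses anything else as a number. The result is later stored as a 16-bit value at header offset 0x1fa. Below is a standalone sketch of that mapping, where sketch_vga_mode() is a hypothetical name. Like the original, it matches the first "vga=" substring anywhere in the command line.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 and stores the mode in *mode when "vga=" is present, else 0. */
static int sketch_vga_mode(const char *cmdline, uint16_t *mode)
{
    const char *vmode = strstr(cmdline, "vga=");

    if (!vmode) {
        return 0;
    }
    vmode += 4;   /* skip "vga=" */
    if (!strncmp(vmode, "normal", 6)) {
        *mode = 0xffff;
    } else if (!strncmp(vmode, "ext", 3)) {
        *mode = 0xfffe;
    } else if (!strncmp(vmode, "ask", 3)) {
        *mode = 0xfffd;
    } else {
        /* Base 0 accepts decimal, 0x-prefixed hex and 0-prefixed octal. */
        *mode = (uint16_t)strtol(vmode, NULL, 0);
    }
    return 1;
}
```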
> -    /* loader type */
> -    /* High nybble = B reserved for QEMU; low nybble is revision number.
> -       If this code is substantially changed, you may want to consider
> -       incrementing the revision. */
> -    if (protocol >= 0x200) {
> -        header[0x210] = 0xB0;
> -    }
> -    /* heap */
> -    if (protocol >= 0x201) {
> -        header[0x211] |= 0x80;	/* CAN_USE_HEAP */
> -        stw_p(header+0x224, cmdline_addr-real_addr-0x200);
> -    }
> -
> -    /* load initrd */
> -    if (initrd_filename) {
> -        GMappedFile *mapped_file;
> -        gsize initrd_size;
> -        gchar *initrd_data;
> -        GError *gerr = NULL;
> -
> -        if (protocol < 0x200) {
> -            fprintf(stderr, "qemu: linux kernel too old to load a ram disk\n");
> -            exit(1);
> -        }
> -
> -        mapped_file = g_mapped_file_new(initrd_filename, false, &gerr);
> -        if (!mapped_file) {
> -            fprintf(stderr, "qemu: error reading initrd %s: %s\n",
> -                    initrd_filename, gerr->message);
> -            exit(1);
> -        }
> -        pcms->initrd_mapped_file = mapped_file;
> -
> -        initrd_data = g_mapped_file_get_contents(mapped_file);
> -        initrd_size = g_mapped_file_get_length(mapped_file);
> -        if (initrd_size >= initrd_max) {
> -            fprintf(stderr, "qemu: initrd is too large, cannot support."
> -                    "(max: %"PRIu32", need %"PRId64")\n",
> -                    initrd_max, (uint64_t)initrd_size);
> -            exit(1);
> -        }
> -
> -        initrd_addr = (initrd_max-initrd_size) & ~4095;
> -
> -        fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
> -        fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
> -        fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data, initrd_size);
> -
> -        stl_p(header+0x218, initrd_addr);
> -        stl_p(header+0x21c, initrd_size);
> -    }
> -
> -    /* load kernel and setup */
> -    setup_size = header[0x1f1];
> -    if (setup_size == 0) {
> -        setup_size = 4;
> -    }
> -    setup_size = (setup_size+1)*512;
> -    if (setup_size > kernel_size) {
> -        fprintf(stderr, "qemu: invalid kernel header\n");
> -        exit(1);
> -    }
> -    kernel_size -= setup_size;
> -
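The real-mode setup size comes from a single header byte at 0x1f1 (setup_sects). A value of 0 means 4, and one extra 512-byte sector is added for the boot sector itself. Everything after the setup code is the protected-mode kernel. The two helpers below are a hypothetical sketch of that split.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of real-mode setup code, from header byte 0x1f1 (setup_sects). */
static int sketch_setup_size(uint8_t setup_sects)
{
    if (setup_sects == 0) {
        setup_sects = 4;              /* 0 means 4 for very old kernels */
    }
    return (setup_sects + 1) * 512;   /* +1 for the boot sector itself */
}

/* Protected-mode kernel size, or -1 when the header claims too much setup. */
static long sketch_kernel_payload(long file_size, uint8_t setup_sects)
{
    int setup_size = sketch_setup_size(setup_sects);

    if (setup_size > file_size) {
        return -1;                    /* "invalid kernel header" */
    }
    return file_size - setup_size;
}
```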
> -    setup  = g_malloc(setup_size);
> -    kernel = g_malloc(kernel_size);
> -    fseek(f, 0, SEEK_SET);
> -    if (fread(setup, 1, setup_size, f) != setup_size) {
> -        fprintf(stderr, "fread() failed\n");
> -        exit(1);
> -    }
> -    if (fread(kernel, 1, kernel_size, f) != kernel_size) {
> -        fprintf(stderr, "fread() failed\n");
> -        exit(1);
> -    }
> -    fclose(f);
> -
> -    /* append dtb to kernel */
> -    if (dtb_filename) {
> -        if (protocol < 0x209) {
> -            fprintf(stderr, "qemu: Linux kernel too old to load a dtb\n");
> -            exit(1);
> -        }
> -
> -        dtb_size = get_image_size(dtb_filename);
> -        if (dtb_size <= 0) {
> -            fprintf(stderr, "qemu: error reading dtb %s: %s\n",
> -                    dtb_filename, strerror(errno));
> -            exit(1);
> -        }
> -
> -        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
> -        kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size;
> -        kernel = g_realloc(kernel, kernel_size);
> -
> -        stq_p(header+0x250, prot_addr + setup_data_offset);
> -
> -        setup_data = (struct setup_data *)(kernel + setup_data_offset);
> -        setup_data->next = 0;
> -        setup_data->type = cpu_to_le32(SETUP_DTB);
> -        setup_data->len = cpu_to_le32(dtb_size);
> -
> -        load_image_size(dtb_filename, setup_data->data, dtb_size);
> -    }
> -
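When a DTB is given, it is appended to the kernel blob behind a 16-byte setup_data header at a 16-byte-aligned offset. The guest-physical address of that header is written to header offset 0x250, and next = 0 terminates the list. The sketch below only computes that layout. It reuses the removed struct's field layout under a hypothetical name and writes the zero-length array as a C99 flexible array member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the removed struct setup_data: 16-byte header, then payload. */
struct sketch_setup_data {
    uint64_t next;
    uint32_t type;
    uint32_t len;
    uint8_t data[];
} __attribute__((packed));

_Static_assert(sizeof(struct sketch_setup_data) == 16, "16-byte header");

#define SKETCH_ALIGN_UP(n, m) (((n) + (m) - 1) / (m) * (m))

typedef struct SketchDtbLayout {
    size_t setup_data_offset;  /* where setup_data starts in the kernel blob */
    size_t kernel_size;        /* blob size after appending the DTB */
    uint64_t setup_data_gpa;   /* value stored at header offset 0x250 */
} SketchDtbLayout;

static SketchDtbLayout sketch_dtb_layout(size_t kernel_size, size_t dtb_size,
                                         uint64_t prot_addr)
{
    SketchDtbLayout l;

    l.setup_data_offset = SKETCH_ALIGN_UP(kernel_size, 16);
    l.kernel_size = l.setup_data_offset + sizeof(struct sketch_setup_data) +
                    dtb_size;
    l.setup_data_gpa = prot_addr + l.setup_data_offset;
    return l;
}
```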
> -    memcpy(setup, header, MIN(sizeof(header), setup_size));
> -
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size);
> -    fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel, kernel_size);
> -
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_ADDR, real_addr);
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, setup_size);
> -    fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
> -
> -    option_rom[nb_option_roms].bootindex = 0;
> -    option_rom[nb_option_roms].name = "linuxboot.bin";
> -    if (pcmc->linuxboot_dma_enabled && fw_cfg_dma_enabled(fw_cfg)) {
> -        option_rom[nb_option_roms].name = "linuxboot_dma.bin";
> -    }
> -    nb_option_roms++;
> -}
> -
>  #define NE2000_NB_MAX 6
>  
>  static const int ne2000_io[NE2000_NB_MAX] = { 0x300, 0x320, 0x340, 0x360,
> @@ -1376,157 +1002,10 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level)
>      }
>  }
>  
> -static void pc_new_cpu(PCMachineState *pcms, int64_t apic_id, Error **errp)
> -{
> -    Object *cpu = NULL;
> -    Error *local_err = NULL;
> -    CPUX86State *env = NULL;
> -
> -    cpu = object_new(MACHINE(pcms)->cpu_type);
> -
> -    env = &X86_CPU(cpu)->env;
> -    env->nr_dies = pcms->smp_dies;
> -
> -    object_property_set_uint(cpu, apic_id, "apic-id", &local_err);
> -    object_property_set_bool(cpu, true, "realized", &local_err);
> -
> -    object_unref(cpu);
> -    error_propagate(errp, local_err);
> -}
> -
> -/*
> - * This function is very similar to smp_parse()
> - * in hw/core/machine.c but includes CPU die support.
> - */
> -void pc_smp_parse(MachineState *ms, QemuOpts *opts)
> -{
> -    PCMachineState *pcms = PC_MACHINE(ms);
> -
> -    if (opts) {
> -        unsigned cpus    = qemu_opt_get_number(opts, "cpus", 0);
> -        unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
> -        unsigned dies = qemu_opt_get_number(opts, "dies", 1);
> -        unsigned cores   = qemu_opt_get_number(opts, "cores", 0);
> -        unsigned threads = qemu_opt_get_number(opts, "threads", 0);
> -
> -        /* compute missing values, prefer sockets over cores over threads */
> -        if (cpus == 0 || sockets == 0) {
> -            cores = cores > 0 ? cores : 1;
> -            threads = threads > 0 ? threads : 1;
> -            if (cpus == 0) {
> -                sockets = sockets > 0 ? sockets : 1;
> -                cpus = cores * threads * dies * sockets;
> -            } else {
> -                ms->smp.max_cpus =
> -                        qemu_opt_get_number(opts, "maxcpus", cpus);
> -                sockets = ms->smp.max_cpus / (cores * threads * dies);
> -            }
> -        } else if (cores == 0) {
> -            threads = threads > 0 ? threads : 1;
> -            cores = cpus / (sockets * dies * threads);
> -            cores = cores > 0 ? cores : 1;
> -        } else if (threads == 0) {
> -            threads = cpus / (cores * dies * sockets);
> -            threads = threads > 0 ? threads : 1;
> -        } else if (sockets * dies * cores * threads < cpus) {
> -            error_report("cpu topology: "
> -                         "sockets (%u) * dies (%u) * cores (%u) * threads (%u) < "
> -                         "smp_cpus (%u)",
> -                         sockets, dies, cores, threads, cpus);
> -            exit(1);
> -        }
> -
> -        ms->smp.max_cpus =
> -                qemu_opt_get_number(opts, "maxcpus", cpus);
> -
> -        if (ms->smp.max_cpus < cpus) {
> -            error_report("maxcpus must be equal to or greater than smp");
> -            exit(1);
> -        }
> -
> -        if (sockets * dies * cores * threads > ms->smp.max_cpus) {
> -            error_report("cpu topology: "
> -                         "sockets (%u) * dies (%u) * cores (%u) * threads (%u) > "
> -                         "maxcpus (%u)",
> -                         sockets, dies, cores, threads,
> -                         ms->smp.max_cpus);
> -            exit(1);
> -        }
> -
> -        if (sockets * dies * cores * threads != ms->smp.max_cpus) {
> -            warn_report("Invalid CPU topology deprecated: "
> -                        "sockets (%u) * dies (%u) * cores (%u) * threads (%u) "
> -                        "!= maxcpus (%u)",
> -                        sockets, dies, cores, threads,
> -                        ms->smp.max_cpus);
> -        }
> -
> -        ms->smp.cpus = cpus;
> -        ms->smp.cores = cores;
> -        ms->smp.threads = threads;
> -        pcms->smp_dies = dies;
> -    }
> -
> -    if (ms->smp.cpus > 1) {
> -        Error *blocker = NULL;
> -        error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp");
> -        replay_add_blocker(blocker);
> -    }
> -}
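The -smp fill-in rules above ("prefer sockets over cores over threads") can be tried in isolation. This is a simplified sketch, not QEMU code, and it rests on several assumptions:

- `struct smp_topo` and `smp_fill_topology()` are hypothetical names.
- `maxcpus_opt == 0` stands for "maxcpus not given".
- `dies` is assumed to be at least 1.
- The non-fatal "!= maxcpus" deprecation warning is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the -smp values; 0 means "not specified". */
struct smp_topo {
    unsigned cpus, sockets, dies, cores, threads, max_cpus;
};

/* maxcpus falls back to cpus when the option is absent (0 here). */
static unsigned maxcpus_or_default(unsigned maxcpus_opt, unsigned cpus)
{
    return maxcpus_opt ? maxcpus_opt : cpus;
}

/*
 * Fills in missing topology values, preferring sockets over cores over
 * threads. Returns false wherever the quoted code would report an error
 * and exit(1).
 */
static bool smp_fill_topology(struct smp_topo *t, unsigned maxcpus_opt)
{
    assert(t->dies >= 1);

    if (t->cpus == 0 || t->sockets == 0) {
        t->cores = t->cores > 0 ? t->cores : 1;
        t->threads = t->threads > 0 ? t->threads : 1;
        if (t->cpus == 0) {
            t->sockets = t->sockets > 0 ? t->sockets : 1;
            t->cpus = t->cores * t->threads * t->dies * t->sockets;
        } else {
            unsigned max = maxcpus_or_default(maxcpus_opt, t->cpus);
            t->sockets = max / (t->cores * t->threads * t->dies);
        }
    } else if (t->cores == 0) {
        t->threads = t->threads > 0 ? t->threads : 1;
        t->cores = t->cpus / (t->sockets * t->dies * t->threads);
        t->cores = t->cores > 0 ? t->cores : 1;
    } else if (t->threads == 0) {
        t->threads = t->cpus / (t->cores * t->dies * t->sockets);
        t->threads = t->threads > 0 ? t->threads : 1;
    } else if (t->sockets * t->dies * t->cores * t->threads < t->cpus) {
        return false; /* topology too small for cpus */
    }

    t->max_cpus = maxcpus_or_default(maxcpus_opt, t->cpus);
    if (t->max_cpus < t->cpus) {
        return false; /* maxcpus must be >= smp */
    }
    if (t->sockets * t->dies * t->cores * t->threads > t->max_cpus) {
        return false; /* topology larger than maxcpus */
    }
    return true;
}
```

For example, a plain "-smp 4" (cpus=4, dies=1, everything else unset) resolves to 4 sockets with 1 core and 1 thread each. With maxcpus=8 it resolves to 8 sockets, because sockets are sized from maxcpus rather than cpus.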
> -
> -void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp)
> -{
> -    PCMachineState *pcms = PC_MACHINE(ms);
> -    int64_t apic_id = x86_cpu_apic_id_from_index(pcms, id);
> -    Error *local_err = NULL;
> -
> -    if (id < 0) {
> -        error_setg(errp, "Invalid CPU id: %" PRIi64, id);
> -        return;
> -    }
> -
> -    if (apic_id >= ACPI_CPU_HOTPLUG_ID_LIMIT) {
> -        error_setg(errp, "Unable to add CPU: %" PRIi64
> -                   ", resulting APIC ID (%" PRIi64 ") is too large",
> -                   id, apic_id);
> -        return;
> -    }
> -
> -    pc_new_cpu(PC_MACHINE(ms), apic_id, &local_err);
> -    if (local_err) {
> -        error_propagate(errp, local_err);
> -        return;
> -    }
> -}
> -
> -void pc_cpus_init(PCMachineState *pcms)
> -{
> -    int i;
> -    const CPUArchIdList *possible_cpus;
> -    MachineState *ms = MACHINE(pcms);
> -    MachineClass *mc = MACHINE_GET_CLASS(pcms);
> -    PCMachineClass *pcmc = PC_MACHINE_CLASS(mc);
> -
> -    x86_cpu_set_default_version(pcmc->default_cpu_version);
> -
> -    /* Calculates the limit to CPU APIC ID values
> -     *
> -     * Limit for the APIC ID value, so that all
> -     * CPU APIC IDs are < pcms->apic_id_limit.
> -     *
> -     * This is used for FW_CFG_MAX_CPUS. See comments on bochs_bios_init().
> -     */
> -    pcms->apic_id_limit = x86_cpu_apic_id_from_index(pcms,
> -                                                     ms->smp.max_cpus - 1) + 1;
> -    possible_cpus = mc->possible_cpu_arch_ids(ms);
> -    for (i = 0; i < ms->smp.cpus; i++) {
> -        pc_new_cpu(pcms, possible_cpus->cpus[i].arch_id, &error_fatal);
> -    }
> -}
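pc_cpus_init() above sets apic_id_limit to the APIC ID of the last possible CPU plus one. Because each topology level gets a bit field just wide enough for its count, non-power-of-two topologies leave gaps, and the limit can exceed max_cpus.

The sketch below models that layout, with SMT bits lowest, then core, die and package. It assumes this matches the x86 topology helpers the patch uses. The function names are hypothetical, and the compat_apic_id_mode path (set for pc-i440fx-1.3 further down) is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Bits needed to encode values 0..count-1 (0 bits when count <= 1). */
static unsigned bitwidth_for_count(unsigned count)
{
    unsigned bits = 0;

    while ((1u << bits) < count) {
        bits++;
    }
    return bits;
}

/* Hypothetical: CPU index -> topology IDs -> packed APIC ID. */
static uint32_t apic_id_from_index(unsigned nr_dies, unsigned nr_cores,
                                   unsigned nr_threads, unsigned idx)
{
    assert(nr_dies && nr_cores && nr_threads);

    unsigned smt_w = bitwidth_for_count(nr_threads);
    unsigned core_w = bitwidth_for_count(nr_cores);
    unsigned die_w = bitwidth_for_count(nr_dies);

    unsigned smt_id = idx % nr_threads;
    unsigned core_id = (idx / nr_threads) % nr_cores;
    unsigned die_id = (idx / (nr_threads * nr_cores)) % nr_dies;
    unsigned pkg_id = idx / (nr_threads * nr_cores * nr_dies);

    return (pkg_id << (smt_w + core_w + die_w)) |
           (die_id << (smt_w + core_w)) |
           (core_id << smt_w) |
           smt_id;
}

/* Mirrors the apic_id_limit computation: last CPU's APIC ID plus one. */
static uint32_t apic_id_limit(unsigned nr_dies, unsigned nr_cores,
                              unsigned nr_threads, unsigned max_cpus)
{
    return apic_id_from_index(nr_dies, nr_cores, nr_threads, max_cpus - 1) + 1;
}
```

With 3 cores per socket and 6 CPUs, the APIC IDs come out as 0, 1, 2, 4, 5, 6, so the limit is 7 rather than 6.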
> -
>  static void pc_build_feature_control_file(PCMachineState *pcms)
>  {
>      MachineState *ms = MACHINE(pcms);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>      X86CPU *cpu = X86_CPU(ms->possible_cpus->cpus[0].cpu);
>      CPUX86State *env = &cpu->env;
>      uint32_t unused, ecx, edx;
> @@ -1550,7 +1029,7 @@ static void pc_build_feature_control_file(PCMachineState *pcms)
>  
>      val = g_malloc(sizeof(*val));
>      *val = cpu_to_le64(feature_control_bits | FEATURE_CONTROL_LOCKED);
> -    fw_cfg_add_file(pcms->fw_cfg, "etc/msr_feature_control", val, sizeof(*val));
> +    fw_cfg_add_file(x86ms->fw_cfg, "etc/msr_feature_control", val, sizeof(*val));
>  }
>  
>  static void rtc_set_cpus_count(ISADevice *rtc, uint16_t cpus_count)
> @@ -1571,10 +1050,11 @@ void pc_machine_done(Notifier *notifier, void *data)
>  {
>      PCMachineState *pcms = container_of(notifier,
>                                          PCMachineState, machine_done);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>      PCIBus *bus = pcms->bus;
>  
>      /* set the number of CPUs */
> -    rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
> +    rtc_set_cpus_count(x86ms->rtc, x86ms->boot_cpus);
>  
>      if (bus) {
>          int extra_hosts = 0;
> @@ -1585,23 +1065,23 @@ void pc_machine_done(Notifier *notifier, void *data)
>                  extra_hosts++;
>              }
>          }
> -        if (extra_hosts && pcms->fw_cfg) {
> +        if (extra_hosts && x86ms->fw_cfg) {
>              uint64_t *val = g_malloc(sizeof(*val));
>              *val = cpu_to_le64(extra_hosts);
> -            fw_cfg_add_file(pcms->fw_cfg,
> +            fw_cfg_add_file(x86ms->fw_cfg,
>                      "etc/extra-pci-roots", val, sizeof(*val));
>          }
>      }
>  
>      acpi_setup();
> -    if (pcms->fw_cfg) {
> +    if (x86ms->fw_cfg) {
>          pc_build_smbios(pcms);
>          pc_build_feature_control_file(pcms);
>          /* update FW_CFG_NB_CPUS to account for -device added CPUs */
> -        fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
> +        fw_cfg_modify_i16(x86ms->fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
>      }
>  
> -    if (pcms->apic_id_limit > 255 && !xen_enabled()) {
> +    if (x86ms->apic_id_limit > 255 && !xen_enabled()) {
>          IntelIOMMUState *iommu = INTEL_IOMMU_DEVICE(x86_iommu_get_default());
>  
>          if (!iommu || !x86_iommu_ir_supported(X86_IOMMU_DEVICE(iommu)) ||
> @@ -1619,8 +1099,9 @@ void pc_guest_info_init(PCMachineState *pcms)
>  {
>      int i;
>      MachineState *ms = MACHINE(pcms);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>  
> -    pcms->apic_xrupt_override = kvm_allows_irq0_override();
> +    x86ms->apic_xrupt_override = kvm_allows_irq0_override();
>      pcms->numa_nodes = ms->numa_state->num_nodes;
>      pcms->node_mem = g_malloc0(pcms->numa_nodes *
>                                      sizeof *pcms->node_mem);
> @@ -1645,14 +1126,17 @@ void xen_load_linux(PCMachineState *pcms)
>  {
>      int i;
>      FWCfgState *fw_cfg;
> +    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>  
>      assert(MACHINE(pcms)->kernel_filename != NULL);
>  
>      fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
> -    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
> +    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
>      rom_set_fw(fw_cfg);
>  
> -    load_linux(pcms, fw_cfg);
> +    load_linux(x86ms, fw_cfg, pcmc->acpi_data_size,
> +               pcmc->linuxboot_dma_enabled, pcmc->pvh_enabled);
>      for (i = 0; i < nb_option_roms; i++) {
>          assert(!strcmp(option_rom[i].name, "linuxboot.bin") ||
>                 !strcmp(option_rom[i].name, "linuxboot_dma.bin") ||
> @@ -1660,7 +1144,7 @@ void xen_load_linux(PCMachineState *pcms)
>                 !strcmp(option_rom[i].name, "multiboot.bin"));
>          rom_add_option(option_rom[i].name, option_rom[i].bootindex);
>      }
> -    pcms->fw_cfg = fw_cfg;
> +    x86ms->fw_cfg = fw_cfg;
>  }
>  
>  void pc_memory_init(PCMachineState *pcms,
> @@ -1673,10 +1157,11 @@ void pc_memory_init(PCMachineState *pcms,
>      MemoryRegion *ram_below_4g, *ram_above_4g;
>      FWCfgState *fw_cfg;
>      MachineState *machine = MACHINE(pcms);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>  
> -    assert(machine->ram_size == pcms->below_4g_mem_size +
> -                                pcms->above_4g_mem_size);
> +    assert(machine->ram_size == x86ms->below_4g_mem_size +
> +                                x86ms->above_4g_mem_size);
>  
>      linux_boot = (machine->kernel_filename != NULL);
>  
> @@ -1690,17 +1175,17 @@ void pc_memory_init(PCMachineState *pcms,
>      *ram_memory = ram;
>      ram_below_4g = g_malloc(sizeof(*ram_below_4g));
>      memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
> -                             0, pcms->below_4g_mem_size);
> +                             0, x86ms->below_4g_mem_size);
>      memory_region_add_subregion(system_memory, 0, ram_below_4g);
> -    e820_add_entry(0, pcms->below_4g_mem_size, E820_RAM);
> -    if (pcms->above_4g_mem_size > 0) {
> +    e820_add_entry(0, x86ms->below_4g_mem_size, E820_RAM);
> +    if (x86ms->above_4g_mem_size > 0) {
>          ram_above_4g = g_malloc(sizeof(*ram_above_4g));
>          memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
> -                                 pcms->below_4g_mem_size,
> -                                 pcms->above_4g_mem_size);
> +                                 x86ms->below_4g_mem_size,
> +                                 x86ms->above_4g_mem_size);
>          memory_region_add_subregion(system_memory, 0x100000000ULL,
>                                      ram_above_4g);
> -        e820_add_entry(0x100000000ULL, pcms->above_4g_mem_size, E820_RAM);
> +        e820_add_entry(0x100000000ULL, x86ms->above_4g_mem_size, E820_RAM);
>      }
>  
>      if (!pcmc->has_reserved_memory &&
> @@ -1735,7 +1220,7 @@ void pc_memory_init(PCMachineState *pcms,
>          }
>  
>          machine->device_memory->base =
> -            ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1 * GiB);
> +            ROUND_UP(0x100000000ULL + x86ms->above_4g_mem_size, 1 * GiB);
>  
>          if (pcmc->enforce_aligned_dimm) {
>              /* size device region assuming 1G page max alignment per slot */
> @@ -1786,16 +1271,17 @@ void pc_memory_init(PCMachineState *pcms,
>      }
>  
>      if (linux_boot) {
> -        load_linux(pcms, fw_cfg);
> +        load_linux(x86ms, fw_cfg, pcmc->acpi_data_size,
> +                   pcmc->linuxboot_dma_enabled, pcmc->pvh_enabled);
>      }
>  
>      for (i = 0; i < nb_option_roms; i++) {
>          rom_add_option(option_rom[i].name, option_rom[i].bootindex);
>      }
> -    pcms->fw_cfg = fw_cfg;
> +    x86ms->fw_cfg = fw_cfg;
>  
>      /* Init default IOAPIC address space */
> -    pcms->ioapic_as = &address_space_memory;
> +    x86ms->ioapic_as = &address_space_memory;
>  }
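pc_memory_init() maps guest RAM as two aliases of a single region. The first covers [0, below_4g) at guest address 0. The second maps the remainder at 4 GiB. Each alias gets its own e820 RAM entry.

When reserved device memory is configured, its window starts at the next 1 GiB boundary above high RAM. pc_pci_hole64_start() applies the same rounding in its fallback branch. Below is a minimal sketch of that arithmetic. The names (`struct ram_range`, `ram_layout()`, `device_memory_base()`) are hypothetical, and the reserved-memory sizing in the unquoted lines is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GIB (1ULL << 30)
#define FOUR_GIB 0x100000000ULL

/* Hypothetical e820-style RAM range: guest-physical base and length. */
struct ram_range {
    uint64_t base;
    uint64_t len;
};

/* Round n up to a multiple of align, which must be a power of two. */
static uint64_t round_up_pow2(uint64_t n, uint64_t align)
{
    assert(align && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Writes the RAM ranges into out[] and returns how many were written. */
static size_t ram_layout(uint64_t below_4g, uint64_t above_4g,
                         struct ram_range out[2])
{
    size_t n = 0;

    out[n++] = (struct ram_range){ .base = 0, .len = below_4g };
    if (above_4g > 0) {
        /* high alias: guest 4 GiB onwards, backed by RAM offset below_4g */
        out[n++] = (struct ram_range){ .base = FOUR_GIB, .len = above_4g };
    }
    return n;
}

/* Start of the device-memory window: first 1 GiB boundary above high RAM. */
static uint64_t device_memory_base(uint64_t above_4g)
{
    return round_up_pow2(FOUR_GIB + above_4g, 1 * GIB);
}
```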
>  
>  /*
> @@ -1807,6 +1293,7 @@ uint64_t pc_pci_hole64_start(void)
>      PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>      MachineState *ms = MACHINE(pcms);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>      uint64_t hole64_start = 0;
>  
>      if (pcmc->has_reserved_memory && ms->device_memory->base) {
> @@ -1815,7 +1302,7 @@ uint64_t pc_pci_hole64_start(void)
>              hole64_start += memory_region_size(&ms->device_memory->mr);
>          }
>      } else {
> -        hole64_start = 0x100000000ULL + pcms->above_4g_mem_size;
> +        hole64_start = 0x100000000ULL + x86ms->above_4g_mem_size;
>      }
>  
>      return ROUND_UP(hole64_start, 1 * GiB);
> @@ -2154,6 +1641,7 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev,
>      Error *local_err = NULL;
>      X86CPU *cpu = X86_CPU(dev);
>      PCMachineState *pcms = PC_MACHINE(hotplug_dev);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>  
>      if (pcms->acpi_dev) {
>          hotplug_handler_plug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err);
> @@ -2163,12 +1651,12 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev,
>      }
>  
>      /* increment the number of CPUs */
> -    pcms->boot_cpus++;
> -    if (pcms->rtc) {
> -        rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
> +    x86ms->boot_cpus++;
> +    if (x86ms->rtc) {
> +        rtc_set_cpus_count(x86ms->rtc, x86ms->boot_cpus);
>      }
> -    if (pcms->fw_cfg) {
> -        fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
> +    if (x86ms->fw_cfg) {
> +        fw_cfg_modify_i16(x86ms->fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
>      }
>  
>      found_cpu = pc_find_cpu_slot(MACHINE(pcms), cpu->apic_id, NULL);
> @@ -2214,6 +1702,7 @@ static void pc_cpu_unplug_cb(HotplugHandler *hotplug_dev,
>      Error *local_err = NULL;
>      X86CPU *cpu = X86_CPU(dev);
>      PCMachineState *pcms = PC_MACHINE(hotplug_dev);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>  
>      hotplug_handler_unplug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err);
>      if (local_err) {
> @@ -2225,10 +1714,10 @@ static void pc_cpu_unplug_cb(HotplugHandler *hotplug_dev,
>      object_property_set_bool(OBJECT(dev), false, "realized", NULL);
>  
>      /* decrement the number of CPUs */
> -    pcms->boot_cpus--;
> +    x86ms->boot_cpus--;
>      /* Update the number of CPUs in CMOS */
> -    rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
> -    fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
> +    rtc_set_cpus_count(x86ms->rtc, x86ms->boot_cpus);
> +    fw_cfg_modify_i16(x86ms->fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
>   out:
>      error_propagate(errp, local_err);
>  }
> @@ -2244,6 +1733,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>      CPUX86State *env = &cpu->env;
>      MachineState *ms = MACHINE(hotplug_dev);
>      PCMachineState *pcms = PC_MACHINE(hotplug_dev);
> +    X86MachineState *x86ms = X86_MACHINE(hotplug_dev);
>      unsigned int smp_cores = ms->smp.cores;
>      unsigned int smp_threads = ms->smp.threads;
>  
> @@ -2253,7 +1743,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>          return;
>      }
>  
> -    env->nr_dies = pcms->smp_dies;
> +    env->nr_dies = x86ms->smp_dies;
>  
>      /*
>       * If APIC ID is not set,
> @@ -2261,13 +1751,13 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>       */
>      if (cpu->apic_id == UNASSIGNED_APIC_ID) {
>          int max_socket = (ms->smp.max_cpus - 1) /
> -                                smp_threads / smp_cores / pcms->smp_dies;
> +                                smp_threads / smp_cores / x86ms->smp_dies;
>  
>          /*
>           * die-id was optional in QEMU 4.0 and older, so keep it optional
>           * if there's only one die per socket.
>           */
> -        if (cpu->die_id < 0 && pcms->smp_dies == 1) {
> +        if (cpu->die_id < 0 && x86ms->smp_dies == 1) {
>              cpu->die_id = 0;
>          }
>  
> @@ -2282,9 +1772,9 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>          if (cpu->die_id < 0) {
>              error_setg(errp, "CPU die-id is not set");
>              return;
> -        } else if (cpu->die_id > pcms->smp_dies - 1) {
> +        } else if (cpu->die_id > x86ms->smp_dies - 1) {
>              error_setg(errp, "Invalid CPU die-id: %u must be in range 0:%u",
> -                       cpu->die_id, pcms->smp_dies - 1);
> +                       cpu->die_id, x86ms->smp_dies - 1);
>              return;
>          }
>          if (cpu->core_id < 0) {
> @@ -2308,7 +1798,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>          topo.die_id = cpu->die_id;
>          topo.core_id = cpu->core_id;
>          topo.smt_id = cpu->thread_id;
> -        cpu->apic_id = apicid_from_topo_ids(pcms->smp_dies, smp_cores,
> +        cpu->apic_id = apicid_from_topo_ids(x86ms->smp_dies, smp_cores,
>                                              smp_threads, &topo);
>      }
>  
> @@ -2316,7 +1806,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>      if (!cpu_slot) {
>          MachineState *ms = MACHINE(pcms);
>  
> -        x86_topo_ids_from_apicid(cpu->apic_id, pcms->smp_dies,
> +        x86_topo_ids_from_apicid(cpu->apic_id, x86ms->smp_dies,
>                                   smp_cores, smp_threads, &topo);
>          error_setg(errp,
>              "Invalid CPU [socket: %u, die: %u, core: %u, thread: %u] with"
> @@ -2338,7 +1828,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>      /* TODO: move socket_id/core_id/thread_id checks into x86_cpu_realizefn()
>       * once -smp refactoring is complete and there will be CPU private
>       * CPUState::nr_cores and CPUState::nr_threads fields instead of globals */
> -    x86_topo_ids_from_apicid(cpu->apic_id, pcms->smp_dies,
> +    x86_topo_ids_from_apicid(cpu->apic_id, x86ms->smp_dies,
>                               smp_cores, smp_threads, &topo);
>      if (cpu->socket_id != -1 && cpu->socket_id != topo.pkg_id) {
>          error_setg(errp, "property socket-id: %u doesn't match set apic-id:"
> @@ -2520,45 +2010,6 @@ pc_machine_get_device_memory_region_size(Object *obj, Visitor *v,
>      visit_type_int(v, name, &value, errp);
>  }
>  
> -static void pc_machine_get_max_ram_below_4g(Object *obj, Visitor *v,
> -                                            const char *name, void *opaque,
> -                                            Error **errp)
> -{
> -    PCMachineState *pcms = PC_MACHINE(obj);
> -    uint64_t value = pcms->max_ram_below_4g;
> -
> -    visit_type_size(v, name, &value, errp);
> -}
> -
> -static void pc_machine_set_max_ram_below_4g(Object *obj, Visitor *v,
> -                                            const char *name, void *opaque,
> -                                            Error **errp)
> -{
> -    PCMachineState *pcms = PC_MACHINE(obj);
> -    Error *error = NULL;
> -    uint64_t value;
> -
> -    visit_type_size(v, name, &value, &error);
> -    if (error) {
> -        error_propagate(errp, error);
> -        return;
> -    }
> -    if (value > 4 * GiB) {
> -        error_setg(&error,
> -                   "Machine option 'max-ram-below-4g=%"PRIu64
> -                   "' expects size less than or equal to 4G", value);
> -        error_propagate(errp, error);
> -        return;
> -    }
> -
> -    if (value < 1 * MiB) {
> -        warn_report("Only %" PRIu64 " bytes of RAM below the 4GiB boundary,"
> -                    "BIOS may not work with less than 1MiB", value);
> -    }
> -
> -    pcms->max_ram_below_4g = value;
> -}
> -
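The removed setter rejects values above 4 GiB and only warns about values below 1 MiB. Separately, pc_machine_initfn() uses 0 internally to mean "use the default". Note that an explicit 0 passed to the setter still falls into the warning case.

This sketch models only that decision. The enum is a hypothetical stand-in for the error_setg() and warn_report() calls.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1ULL << 20)
#define GIB (1ULL << 30)

/* Hypothetical outcome codes replacing error_setg() and warn_report(). */
enum max_ram_check {
    MAX_RAM_OK,
    MAX_RAM_WARN_SMALL, /* accepted, but BIOS may not work below 1 MiB */
    MAX_RAM_REJECT,     /* more than 4 GiB: setting the property fails */
};

static enum max_ram_check check_max_ram_below_4g(uint64_t value)
{
    if (value > 4 * GIB) {
        return MAX_RAM_REJECT;
    }
    if (value < 1 * MIB) {
        return MAX_RAM_WARN_SMALL;
    }
    return MAX_RAM_OK;
}
```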
>  static void pc_machine_get_vmport(Object *obj, Visitor *v, const char *name,
>                                    void *opaque, Error **errp)
>  {
> @@ -2664,7 +2115,6 @@ static void pc_machine_initfn(Object *obj)
>  {
>      PCMachineState *pcms = PC_MACHINE(obj);
>  
> -    pcms->max_ram_below_4g = 0; /* use default */
>      pcms->smm = ON_OFF_AUTO_AUTO;
>  #ifdef CONFIG_VMPORT
>      pcms->vmport = ON_OFF_AUTO_AUTO;
> @@ -2676,7 +2126,6 @@ static void pc_machine_initfn(Object *obj)
>      pcms->smbus_enabled = true;
>      pcms->sata_enabled = true;
>      pcms->pit_enabled = true;
> -    pcms->smp_dies = 1;
>  
>      pc_system_flash_create(pcms);
>  }
> @@ -2707,85 +2156,6 @@ static void pc_machine_wakeup(MachineState *machine)
>      cpu_synchronize_all_post_reset();
>  }
>  
> -static CpuInstanceProperties
> -pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> -{
> -    MachineClass *mc = MACHINE_GET_CLASS(ms);
> -    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> -
> -    assert(cpu_index < possible_cpus->len);
> -    return possible_cpus->cpus[cpu_index].props;
> -}
> -
> -static int64_t pc_get_default_cpu_node_id(const MachineState *ms, int idx)
> -{
> -   X86CPUTopoInfo topo;
> -   PCMachineState *pcms = PC_MACHINE(ms);
> -
> -   assert(idx < ms->possible_cpus->len);
> -   x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id,
> -                            pcms->smp_dies, ms->smp.cores,
> -                            ms->smp.threads, &topo);
> -   return topo.pkg_id % ms->numa_state->num_nodes;
> -}
> -
> -static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
> -{
> -    PCMachineState *pcms = PC_MACHINE(ms);
> -    int i;
> -    unsigned int max_cpus = ms->smp.max_cpus;
> -
> -    if (ms->possible_cpus) {
> -        /*
> -         * make sure that max_cpus hasn't changed since the first use, i.e.
> -         * -smp hasn't been parsed after it
> -        */
> -        assert(ms->possible_cpus->len == max_cpus);
> -        return ms->possible_cpus;
> -    }
> -
> -    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
> -                                  sizeof(CPUArchId) * max_cpus);
> -    ms->possible_cpus->len = max_cpus;
> -    for (i = 0; i < ms->possible_cpus->len; i++) {
> -        X86CPUTopoInfo topo;
> -
> -        ms->possible_cpus->cpus[i].type = ms->cpu_type;
> -        ms->possible_cpus->cpus[i].vcpus_count = 1;
> -        ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(pcms, i);
> -        x86_topo_ids_from_apicid(ms->possible_cpus->cpus[i].arch_id,
> -                                 pcms->smp_dies, ms->smp.cores,
> -                                 ms->smp.threads, &topo);
> -        ms->possible_cpus->cpus[i].props.has_socket_id = true;
> -        ms->possible_cpus->cpus[i].props.socket_id = topo.pkg_id;
> -        if (pcms->smp_dies > 1) {
> -            ms->possible_cpus->cpus[i].props.has_die_id = true;
> -            ms->possible_cpus->cpus[i].props.die_id = topo.die_id;
> -        }
> -        ms->possible_cpus->cpus[i].props.has_core_id = true;
> -        ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
> -        ms->possible_cpus->cpus[i].props.has_thread_id = true;
> -        ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
> -    }
> -    return ms->possible_cpus;
> -}
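pc_possible_cpu_arch_ids() and pc_get_default_cpu_node_id() both decode an APIC ID back into package, die, core and thread IDs. The default NUMA node is then the package ID modulo the node count.

The sketch below performs that decoding. It assumes the same field layout as the encoder: SMT bits lowest, then core, die and package, each just wide enough for its count. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct topo_ids {
    unsigned pkg_id, die_id, core_id, smt_id;
};

/* Bits needed to encode values 0..count-1 (0 bits when count <= 1). */
static unsigned bitwidth_for_count(unsigned count)
{
    unsigned bits = 0;

    while ((1u << bits) < count) {
        bits++;
    }
    return bits;
}

/* Extract a width-bit field starting at bit shift (0 when width is 0). */
static unsigned field(uint32_t apic_id, unsigned shift, unsigned width)
{
    return (apic_id >> shift) & ((1u << width) - 1);
}

static struct topo_ids topo_ids_from_apic_id(uint32_t apic_id, unsigned nr_dies,
                                             unsigned nr_cores, unsigned nr_threads)
{
    unsigned smt_w = bitwidth_for_count(nr_threads);
    unsigned core_w = bitwidth_for_count(nr_cores);
    unsigned die_w = bitwidth_for_count(nr_dies);
    struct topo_ids t;

    t.smt_id = field(apic_id, 0, smt_w);
    t.core_id = field(apic_id, smt_w, core_w);
    t.die_id = field(apic_id, smt_w + core_w, die_w);
    t.pkg_id = apic_id >> (smt_w + core_w + die_w);
    return t;
}

/* Default placement: whole packages are spread round-robin over nodes. */
static unsigned default_node_id(uint32_t apic_id, unsigned nr_dies,
                                unsigned nr_cores, unsigned nr_threads,
                                unsigned num_nodes)
{
    assert(num_nodes > 0);
    return topo_ids_from_apic_id(apic_id, nr_dies, nr_cores,
                                 nr_threads).pkg_id % num_nodes;
}
```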
> -
> -static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
> -{
> -    /* cpu index isn't used */
> -    CPUState *cs;
> -
> -    CPU_FOREACH(cs) {
> -        X86CPU *cpu = X86_CPU(cs);
> -
> -        if (!cpu->apic_state) {
> -            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
> -        } else {
> -            apic_deliver_nmi(cpu->apic_state);
> -        }
> -    }
> -}
> -
>  static void pc_machine_class_init(ObjectClass *oc, void *data)
>  {
>      MachineClass *mc = MACHINE_CLASS(oc);
> @@ -2810,14 +2180,11 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
>      pcmc->pvh_enabled = true;
>      assert(!mc->get_hotplug_handler);
>      mc->get_hotplug_handler = pc_get_hotplug_handler;
> -    mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
> -    mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
> -    mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
>      mc->auto_enable_numa_with_memhp = true;
>      mc->has_hotpluggable_cpus = true;
>      mc->default_boot_order = "cad";
> -    mc->hot_add_cpu = pc_hot_add_cpu;
> -    mc->smp_parse = pc_smp_parse;
> +    mc->hot_add_cpu = x86_hot_add_cpu;
> +    mc->smp_parse = x86_smp_parse;
>      mc->block_default_type = IF_IDE;
>      mc->max_cpus = 255;
>      mc->reset = pc_machine_reset;
> @@ -2835,13 +2202,6 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
>          pc_machine_get_device_memory_region_size, NULL,
>          NULL, NULL, &error_abort);
>  
> -    object_class_property_add(oc, PC_MACHINE_MAX_RAM_BELOW_4G, "size",
> -        pc_machine_get_max_ram_below_4g, pc_machine_set_max_ram_below_4g,
> -        NULL, NULL, &error_abort);
> -
> -    object_class_property_set_description(oc, PC_MACHINE_MAX_RAM_BELOW_4G,
> -        "Maximum ram below the 4G boundary (32bit boundary)", &error_abort);
> -
>      object_class_property_add(oc, PC_MACHINE_SMM, "OnOffAuto",
>          pc_machine_get_smm, pc_machine_set_smm,
>          NULL, NULL, &error_abort);
> @@ -2866,7 +2226,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
>  
>  static const TypeInfo pc_machine_info = {
>      .name = TYPE_PC_MACHINE,
> -    .parent = TYPE_MACHINE,
> +    .parent = TYPE_X86_MACHINE,
>      .abstract = true,
>      .instance_size = sizeof(PCMachineState),
>      .instance_init = pc_machine_initfn,
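With TYPE_PC_MACHINE now derived from TYPE_X86_MACHINE, the PC code reaches the moved fields through X86_MACHINE(pcms). These include fw_cfg, rtc, boot_cpus, smp_dies and the *_4g_mem_size values.

QOM expects each state struct to embed its parent as the first member. The plain-C model below shows why the upcast is then just a pointer to that member. The struct names are hypothetical stand-ins, and the real cast macro may additionally verify the type at run time, depending on the build configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily trimmed stand-ins for the QOM state structs. */
typedef struct {
    const char *cpu_type;
} MachineModel;

typedef struct {
    MachineModel parent_obj; /* must be first */
    uint64_t below_4g_mem_size;
    uint64_t above_4g_mem_size;
    unsigned boot_cpus;
    unsigned smp_dies;
} X86MachineModel;

typedef struct {
    X86MachineModel parent_obj; /* must be first */
    int pit_enabled;
} PCMachineModel;

_Static_assert(offsetof(X86MachineModel, parent_obj) == 0, "parent must be first");
_Static_assert(offsetof(PCMachineModel, parent_obj) == 0, "parent must be first");

/* Upcasts: the parent lives at offset 0, so these never move the pointer. */
static X86MachineModel *x86_machine(PCMachineModel *pcms)
{
    assert(pcms != NULL);
    return &pcms->parent_obj;
}

static MachineModel *machine(X86MachineModel *x86ms)
{
    assert(x86ms != NULL);
    return &x86ms->parent_obj;
}
```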
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 2362675149..f63c27bc74 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -27,6 +27,7 @@
>  
>  #include "qemu/units.h"
>  #include "hw/loader.h"
> +#include "hw/i386/x86.h"
>  #include "hw/i386/pc.h"
>  #include "hw/i386/apic.h"
>  #include "hw/display/ramfb.h"
> @@ -73,6 +74,7 @@ static void pc_init1(MachineState *machine,
>  {
>      PCMachineState *pcms = PC_MACHINE(machine);
>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>      MemoryRegion *system_memory = get_system_memory();
>      MemoryRegion *system_io = get_system_io();
>      int i;
> @@ -125,11 +127,11 @@ static void pc_init1(MachineState *machine,
>      if (xen_enabled()) {
>          xen_hvm_init(pcms, &ram_memory);
>      } else {
> -        if (!pcms->max_ram_below_4g) {
> -            pcms->max_ram_below_4g = 0xe0000000; /* default: 3.5G */
> +        if (!x86ms->max_ram_below_4g) {
> +            x86ms->max_ram_below_4g = 0xe0000000; /* default: 3.5G */
>          }
> -        lowmem = pcms->max_ram_below_4g;
> -        if (machine->ram_size >= pcms->max_ram_below_4g) {
> +        lowmem = x86ms->max_ram_below_4g;
> +        if (machine->ram_size >= x86ms->max_ram_below_4g) {
>              if (pcmc->gigabyte_align) {
>                  if (lowmem > 0xc0000000) {
>                      lowmem = 0xc0000000;
> @@ -138,21 +140,21 @@ static void pc_init1(MachineState *machine,
>                      warn_report("Large machine and max_ram_below_4g "
>                                  "(%" PRIu64 ") not a multiple of 1G; "
>                                  "possible bad performance.",
> -                                pcms->max_ram_below_4g);
> +                                x86ms->max_ram_below_4g);
>                  }
>              }
>          }
>  
>          if (machine->ram_size >= lowmem) {
> -            pcms->above_4g_mem_size = machine->ram_size - lowmem;
> -            pcms->below_4g_mem_size = lowmem;
> +            x86ms->above_4g_mem_size = machine->ram_size - lowmem;
> +            x86ms->below_4g_mem_size = lowmem;
>          } else {
> -            pcms->above_4g_mem_size = 0;
> -            pcms->below_4g_mem_size = machine->ram_size;
> +            x86ms->above_4g_mem_size = 0;
> +            x86ms->below_4g_mem_size = machine->ram_size;
>          }
>      }
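On i440FX, the code above defaults max_ram_below_4g to 3.5 GiB (0xe0000000). When gigabyte alignment is enabled and the guest has at least that much RAM, it also caps low memory at 3 GiB.

This sketch covers only the resulting split. The non-1G-multiple warning is omitted, and `struct mem_split` and `piix_split_ram()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mem_split {
    uint64_t below_4g;
    uint64_t above_4g;
};

static struct mem_split piix_split_ram(uint64_t ram_size,
                                       uint64_t max_ram_below_4g,
                                       bool gigabyte_align)
{
    struct mem_split s;
    uint64_t lowmem;

    if (!max_ram_below_4g) {
        max_ram_below_4g = 0xe0000000ULL; /* default: 3.5G */
    }
    assert(max_ram_below_4g <= (1ULL << 32)); /* the setter rejects > 4G */

    lowmem = max_ram_below_4g;
    if (ram_size >= max_ram_below_4g && gigabyte_align &&
        lowmem > 0xc0000000ULL) {
        lowmem = 0xc0000000ULL; /* keep low RAM 1G-aligned on large guests */
    }

    if (ram_size >= lowmem) {
        s.above_4g = ram_size - lowmem;
        s.below_4g = lowmem;
    } else {
        s.above_4g = 0;
        s.below_4g = ram_size;
    }
    return s;
}
```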
>  
> -    pc_cpus_init(pcms);
> +    x86_cpus_init(x86ms, pcmc->default_cpu_version);
>  
>      if (kvm_enabled() && pcmc->kvmclock_enabled) {
>          kvmclock_create();
> @@ -190,19 +192,19 @@ static void pc_init1(MachineState *machine,
>      gsi_state = g_malloc0(sizeof(*gsi_state));
>      if (kvm_ioapic_in_kernel()) {
>          kvm_pc_setup_irq_routing(pcmc->pci_enabled);
> -        pcms->gsi = qemu_allocate_irqs(kvm_pc_gsi_handler, gsi_state,
> -                                       GSI_NUM_PINS);
> +        x86ms->gsi = qemu_allocate_irqs(kvm_pc_gsi_handler, gsi_state,
> +                                        GSI_NUM_PINS);
>      } else {
> -        pcms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
> +        x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
>      }
>  
>      if (pcmc->pci_enabled) {
>          pci_bus = i440fx_init(host_type,
>                                pci_type,
> -                              &i440fx_state, &piix3_devfn, &isa_bus, pcms->gsi,
> +                              &i440fx_state, &piix3_devfn, &isa_bus, x86ms->gsi,
>                                system_memory, system_io, machine->ram_size,
> -                              pcms->below_4g_mem_size,
> -                              pcms->above_4g_mem_size,
> +                              x86ms->below_4g_mem_size,
> +                              x86ms->above_4g_mem_size,
>                                pci_memory, ram_memory);
>          pcms->bus = pci_bus;
>      } else {
> @@ -212,7 +214,7 @@ static void pc_init1(MachineState *machine,
>                                &error_abort);
>          no_hpet = 1;
>      }
> -    isa_bus_irqs(isa_bus, pcms->gsi);
> +    isa_bus_irqs(isa_bus, x86ms->gsi);
>  
>      if (kvm_pic_in_kernel()) {
>          i8259 = kvm_i8259_init(isa_bus);
> @@ -230,7 +232,7 @@ static void pc_init1(MachineState *machine,
>          ioapic_init_gsi(gsi_state, "i440fx");
>      }
>  
> -    pc_register_ferr_irq(pcms->gsi[13]);
> +    pc_register_ferr_irq(x86ms->gsi[13]);
>  
>      pc_vga_init(isa_bus, pcmc->pci_enabled ? pci_bus : NULL);
>  
> @@ -240,7 +242,7 @@ static void pc_init1(MachineState *machine,
>      }
>  
>      /* init basic PC hardware */
> -    pc_basic_device_init(isa_bus, pcms->gsi, &rtc_state, true,
> +    pc_basic_device_init(isa_bus, x86ms->gsi, &rtc_state, true,
>                           (pcms->vmport != ON_OFF_AUTO_ON), pcms->pit_enabled,
>                           0x4);
>  
> @@ -288,7 +290,7 @@ else {
>          smi_irq = qemu_allocate_irq(pc_acpi_smi_interrupt, first_cpu, 0);
>          /* TODO: Populate SPD eeprom data.  */
>          smbus = piix4_pm_init(pci_bus, piix3_devfn + 3, 0xb100,
> -                              pcms->gsi[9], smi_irq,
> +                              x86ms->gsi[9], smi_irq,
>                                pc_machine_is_smm_enabled(pcms),
>                                &piix4_pm);
>          smbus_eeprom_init(smbus, 8, NULL, 0);
> @@ -304,7 +306,7 @@ else {
>  
>      if (machine->nvdimms_state->is_enabled) {
>          nvdimm_init_acpi_state(machine->nvdimms_state, system_io,
> -                               pcms->fw_cfg, OBJECT(pcms));
> +                               x86ms->fw_cfg, OBJECT(pcms));
>      }
>  }
>  
> @@ -728,7 +730,7 @@ DEFINE_I440FX_MACHINE(v1_4, "pc-i440fx-1.4", pc_compat_1_4_fn,
>  
>  static void pc_i440fx_1_3_machine_options(MachineClass *m)
>  {
> -    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
> +    X86MachineClass *x86mc = X86_MACHINE_CLASS(m);
>      static GlobalProperty compat[] = {
>          PC_CPU_MODEL_IDS("1.3.0")
>          { "usb-tablet", "usb_version", "1" },
> @@ -739,7 +741,7 @@ static void pc_i440fx_1_3_machine_options(MachineClass *m)
>  
>      pc_i440fx_1_4_machine_options(m);
>      m->hw_version = "1.3.0";
> -    pcmc->compat_apic_id_mode = true;
> +    x86mc->compat_apic_id_mode = true;
>      compat_props_add(m->compat_props, compat, G_N_ELEMENTS(compat));
>  }
>  
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index d4e8a1cb9f..71f71bc61d 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -41,6 +41,7 @@
>  #include "hw/pci-host/q35.h"
>  #include "hw/qdev-properties.h"
>  #include "exec/address-spaces.h"
> +#include "hw/i386/x86.h"
>  #include "hw/i386/pc.h"
>  #include "hw/i386/ich9.h"
>  #include "hw/i386/amd_iommu.h"
> @@ -115,6 +116,7 @@ static void pc_q35_init(MachineState *machine)
>  {
>      PCMachineState *pcms = PC_MACHINE(machine);
>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>      Q35PCIHost *q35_host;
>      PCIHostState *phb;
>      PCIBus *host_bus;
> @@ -152,34 +154,34 @@ static void pc_q35_init(MachineState *machine)
>      /* Handle the machine opt max-ram-below-4g.  It is basically doing
>       * min(qemu limit, user limit).
>       */
> -    if (!pcms->max_ram_below_4g) {
> -        pcms->max_ram_below_4g = 1ULL << 32; /* default: 4G */;
> +    if (!x86ms->max_ram_below_4g) {
> +        x86ms->max_ram_below_4g = 1ULL << 32; /* default: 4G */;
>      }
> -    if (lowmem > pcms->max_ram_below_4g) {
> -        lowmem = pcms->max_ram_below_4g;
> +    if (lowmem > x86ms->max_ram_below_4g) {
> +        lowmem = x86ms->max_ram_below_4g;
>          if (machine->ram_size - lowmem > lowmem &&
>              lowmem & (1 * GiB - 1)) {
>              warn_report("There is possibly poor performance as the ram size "
>                          " (0x%" PRIx64 ") is more then twice the size of"
>                          " max-ram-below-4g (%"PRIu64") and"
>                          " max-ram-below-4g is not a multiple of 1G.",
> -                        (uint64_t)machine->ram_size, pcms->max_ram_below_4g);
> +                        (uint64_t)machine->ram_size, x86ms->max_ram_below_4g);
>          }
>      }
>  
>      if (machine->ram_size >= lowmem) {
> -        pcms->above_4g_mem_size = machine->ram_size - lowmem;
> -        pcms->below_4g_mem_size = lowmem;
> +        x86ms->above_4g_mem_size = machine->ram_size - lowmem;
> +        x86ms->below_4g_mem_size = lowmem;
>      } else {
> -        pcms->above_4g_mem_size = 0;
> -        pcms->below_4g_mem_size = machine->ram_size;
> +        x86ms->above_4g_mem_size = 0;
> +        x86ms->below_4g_mem_size = machine->ram_size;
>      }
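Q35 treats max-ram-below-4g as a ceiling on the board's own low-memory limit, i.e. min(qemu limit, user limit), with 4 GiB as the default. The board limit (`lowmem`) is computed in lines not quoted here, so this sketch takes it as a parameter. `struct mem_split` and `q35_split_ram()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct mem_split {
    uint64_t below_4g;
    uint64_t above_4g;
};

static struct mem_split q35_split_ram(uint64_t ram_size, uint64_t lowmem,
                                      uint64_t max_ram_below_4g)
{
    struct mem_split s;

    if (!max_ram_below_4g) {
        max_ram_below_4g = 1ULL << 32; /* default: 4G */
    }
    assert(max_ram_below_4g <= (1ULL << 32)); /* the setter rejects > 4G */

    if (lowmem > max_ram_below_4g) {
        lowmem = max_ram_below_4g; /* min(qemu limit, user limit) */
    }

    if (ram_size >= lowmem) {
        s.above_4g = ram_size - lowmem;
        s.below_4g = lowmem;
    } else {
        s.above_4g = 0;
        s.below_4g = ram_size;
    }
    return s;
}
```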
>  
>      if (xen_enabled()) {
>          xen_hvm_init(pcms, &ram_memory);
>      }
>  
> -    pc_cpus_init(pcms);
> +    x86_cpus_init(x86ms, pcmc->default_cpu_version);
>  
>      kvmclock_create();
>  
> @@ -213,10 +215,10 @@ static void pc_q35_init(MachineState *machine)
>      gsi_state = g_malloc0(sizeof(*gsi_state));
>      if (kvm_ioapic_in_kernel()) {
>          kvm_pc_setup_irq_routing(pcmc->pci_enabled);
> -        pcms->gsi = qemu_allocate_irqs(kvm_pc_gsi_handler, gsi_state,
> +        x86ms->gsi = qemu_allocate_irqs(kvm_pc_gsi_handler, gsi_state,
>                                         GSI_NUM_PINS);
>      } else {
> -        pcms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
> +        x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
>      }
>  
>      /* create pci host bus */
> @@ -231,9 +233,9 @@ static void pc_q35_init(MachineState *machine)
>                               MCH_HOST_PROP_SYSTEM_MEM, NULL);
>      object_property_set_link(OBJECT(q35_host), OBJECT(system_io),
>                               MCH_HOST_PROP_IO_MEM, NULL);
> -    object_property_set_int(OBJECT(q35_host), pcms->below_4g_mem_size,
> +    object_property_set_int(OBJECT(q35_host), x86ms->below_4g_mem_size,
>                              PCI_HOST_BELOW_4G_MEM_SIZE, NULL);
> -    object_property_set_int(OBJECT(q35_host), pcms->above_4g_mem_size,
> +    object_property_set_int(OBJECT(q35_host), x86ms->above_4g_mem_size,
>                              PCI_HOST_ABOVE_4G_MEM_SIZE, NULL);
>      /* pci */
>      qdev_init_nofail(DEVICE(q35_host));
> @@ -255,7 +257,7 @@ static void pc_q35_init(MachineState *machine)
>      ich9_lpc = ICH9_LPC_DEVICE(lpc);
>      lpc_dev = DEVICE(lpc);
>      for (i = 0; i < GSI_NUM_PINS; i++) {
> -        qdev_connect_gpio_out_named(lpc_dev, ICH9_GPIO_GSI, i, pcms->gsi[i]);
> +        qdev_connect_gpio_out_named(lpc_dev, ICH9_GPIO_GSI, i, x86ms->gsi[i]);
>      }
>      pci_bus_irqs(host_bus, ich9_lpc_set_irq, ich9_lpc_map_irq, ich9_lpc,
>                   ICH9_LPC_NB_PIRQS);
> @@ -279,7 +281,7 @@ static void pc_q35_init(MachineState *machine)
>          ioapic_init_gsi(gsi_state, "q35");
>      }
>  
> -    pc_register_ferr_irq(pcms->gsi[13]);
> +    pc_register_ferr_irq(x86ms->gsi[13]);
>  
>      assert(pcms->vmport != ON_OFF_AUTO__MAX);
>      if (pcms->vmport == ON_OFF_AUTO_AUTO) {
> @@ -287,7 +289,7 @@ static void pc_q35_init(MachineState *machine)
>      }
>  
>      /* init basic PC hardware */
> -    pc_basic_device_init(isa_bus, pcms->gsi, &rtc_state, !mc->no_floppy,
> +    pc_basic_device_init(isa_bus, x86ms->gsi, &rtc_state, !mc->no_floppy,
>                           (pcms->vmport != ON_OFF_AUTO_ON), pcms->pit_enabled,
>                           0xff0104);
>  
> @@ -330,7 +332,7 @@ static void pc_q35_init(MachineState *machine)
>  
>      if (machine->nvdimms_state->is_enabled) {
>          nvdimm_init_acpi_state(machine->nvdimms_state, system_io,
> -                               pcms->fw_cfg, OBJECT(pcms));
> +                               x86ms->fw_cfg, OBJECT(pcms));
>      }
>  }
>  
> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
> index a9983f0bfb..97f38e0423 100644
> --- a/hw/i386/pc_sysfw.c
> +++ b/hw/i386/pc_sysfw.c
> @@ -31,6 +31,7 @@
>  #include "qemu/option.h"
>  #include "qemu/units.h"
>  #include "hw/sysbus.h"
> +#include "hw/i386/x86.h"
>  #include "hw/i386/pc.h"
>  #include "hw/loader.h"
>  #include "hw/qdev-properties.h"
> @@ -38,8 +39,6 @@
>  #include "hw/block/flash.h"
>  #include "sysemu/kvm.h"
>  
> -#define BIOS_FILENAME "bios.bin"
> -
>  /*
>   * We don't have a theoretically justifiable exact lower bound on the base
>   * address of any flash mapping. In practice, the IO-APIC MMIO range is
> @@ -211,59 +210,6 @@ static void pc_system_flash_map(PCMachineState *pcms,
>      }
>  }
>  
> -static void old_pc_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw)
> -{
> -    char *filename;
> -    MemoryRegion *bios, *isa_bios;
> -    int bios_size, isa_bios_size;
> -    int ret;
> -
> -    /* BIOS load */
> -    if (bios_name == NULL) {
> -        bios_name = BIOS_FILENAME;
> -    }
> -    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
> -    if (filename) {
> -        bios_size = get_image_size(filename);
> -    } else {
> -        bios_size = -1;
> -    }
> -    if (bios_size <= 0 ||
> -        (bios_size % 65536) != 0) {
> -        goto bios_error;
> -    }
> -    bios = g_malloc(sizeof(*bios));
> -    memory_region_init_ram(bios, NULL, "pc.bios", bios_size, &error_fatal);
> -    if (!isapc_ram_fw) {
> -        memory_region_set_readonly(bios, true);
> -    }
> -    ret = rom_add_file_fixed(bios_name, (uint32_t)(-bios_size), -1);
> -    if (ret != 0) {
> -    bios_error:
> -        fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
> -        exit(1);
> -    }
> -    g_free(filename);
> -
> -    /* map the last 128KB of the BIOS in ISA space */
> -    isa_bios_size = MIN(bios_size, 128 * KiB);
> -    isa_bios = g_malloc(sizeof(*isa_bios));
> -    memory_region_init_alias(isa_bios, NULL, "isa-bios", bios,
> -                             bios_size - isa_bios_size, isa_bios_size);
> -    memory_region_add_subregion_overlap(rom_memory,
> -                                        0x100000 - isa_bios_size,
> -                                        isa_bios,
> -                                        1);
> -    if (!isapc_ram_fw) {
> -        memory_region_set_readonly(isa_bios, true);
> -    }
> -
> -    /* map all the bios at the top of memory */
> -    memory_region_add_subregion(rom_memory,
> -                                (uint32_t)(-bios_size),
> -                                bios);
> -}
> -
>  void pc_system_firmware_init(PCMachineState *pcms,
>                               MemoryRegion *rom_memory)
>  {
> @@ -272,7 +218,7 @@ void pc_system_firmware_init(PCMachineState *pcms,
>      BlockBackend *pflash_blk[ARRAY_SIZE(pcms->flash)];
>  
>      if (!pcmc->pci_enabled) {
> -        old_pc_system_rom_init(rom_memory, true);
> +        x86_system_rom_init(rom_memory, true);
>          return;
>      }
>  
> @@ -293,7 +239,7 @@ void pc_system_firmware_init(PCMachineState *pcms,
>  
>      if (!pflash_blk[0]) {
>          /* Machine property pflash0 not set, use ROM mode */
> -        old_pc_system_rom_init(rom_memory, false);
> +        x86_system_rom_init(rom_memory, false);
>      } else {
>          if (kvm_enabled() && !kvm_readonly_mem_enabled()) {
>              /*
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> new file mode 100644
> index 0000000000..4de9dd100f
> --- /dev/null
> +++ b/hw/i386/x86.c
> @@ -0,0 +1,788 @@
> +/*
> + * Copyright (c) 2003-2004 Fabrice Bellard
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +#include "qemu/osdep.h"
> +#include "qemu/error-report.h"
> +#include "qemu/option.h"
> +#include "qemu/cutils.h"
> +#include "qemu/units.h"
> +#include "qapi/error.h"
> +#include "qapi/qmp/qerror.h"
> +#include "qapi/qapi-visit-common.h"
> +#include "qapi/visitor.h"
> +#include "sysemu/qtest.h"
> +#include "sysemu/numa.h"
> +#include "sysemu/replay.h"
> +#include "sysemu/sysemu.h"
> +
> +#include "hw/i386/x86.h"
> +#include "target/i386/cpu.h"
> +#include "hw/i386/topology.h"
> +#include "hw/i386/fw_cfg.h"
> +#include "hw/acpi/cpu_hotplug.h"
> +#include "hw/nmi.h"
> +#include "hw/loader.h"
> +#include "multiboot.h"
> +#include "pvh.h"
> +#include "standard-headers/asm-x86/bootparam.h"
> +
> +#define BIOS_FILENAME "bios.bin"
> +
> +/* Calculates initial APIC ID for a specific CPU index
> + *
> + * Currently we need to be able to calculate the APIC ID from the CPU index
> + * alone (without requiring a CPU object), as the QEMU<->Seabios interfaces have
> + * no concept of "CPU index", and the NUMA tables on fw_cfg need the APIC ID of
> + * all CPUs up to max_cpus.
> + */
> +uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
> +                                    unsigned int cpu_index)
> +{
> +    MachineState *ms = MACHINE(x86ms);
> +    X86MachineClass *x86mc = X86_MACHINE_GET_CLASS(x86ms);
> +    uint32_t correct_id;
> +    static bool warned;
> +
> +    correct_id = x86_apicid_from_cpu_idx(x86ms->smp_dies, ms->smp.cores,
> +                                         ms->smp.threads, cpu_index);
> +    if (x86mc->compat_apic_id_mode) {
> +        if (cpu_index != correct_id && !warned && !qtest_enabled()) {
> +            error_report("APIC IDs set in compatibility mode, "
> +                         "CPU topology won't match the configuration");
> +            warned = true;
> +        }
> +        return cpu_index;
> +    } else {
> +        return correct_id;
> +    }
> +}
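The difference between the "correct" and compat paths in `x86_cpu_apic_id_from_index()` is easier to see with a small model of how an APIC ID is packed. The sketch below is a simplified stand-in for `x86_apicid_from_cpu_idx()` in hw/i386/topology.h, not that code itself. It assumes each level (thread, core, die, package) gets a bit field just wide enough for its count, rounded up to a power of two. `id_bitwidth()` and `apicid_from_cpu_idx()` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Bits needed to encode IDs 0..count-1 (0 when count <= 1). */
static unsigned id_bitwidth(unsigned count)
{
    unsigned w = 0;

    while ((1u << w) < count) {
        w++;
    }
    return w;
}

/*
 * Split a linear CPU index into (pkg, die, core, smt) sub-IDs and pack
 * them into an APIC ID, each field padded to a power-of-two width.
 */
static uint32_t apicid_from_cpu_idx(unsigned dies, unsigned cores,
                                    unsigned threads, unsigned cpu_index)
{
    unsigned smt_w = id_bitwidth(threads);
    unsigned core_w = id_bitwidth(cores);
    unsigned die_w = id_bitwidth(dies);

    unsigned smt_id = cpu_index % threads;
    unsigned core_id = (cpu_index / threads) % cores;
    unsigned die_id = (cpu_index / (threads * cores)) % dies;
    unsigned pkg_id = cpu_index / (threads * cores * dies);

    return (pkg_id << (smt_w + core_w + die_w)) |
           (die_id << (smt_w + core_w)) |
           (core_id << smt_w) |
           smt_id;
}
```

With 3 threads per core the thread field is 2 bits wide, so CPU index 3 lands on APIC ID 4 and ID 3 is never used. That is exactly the case where compat mode returns the raw index instead and the "CPU topology won't match" warning fires.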
> +
> +
> +static void x86_new_cpu(X86MachineState *x86ms, int64_t apic_id, Error **errp)
> +{
> +    Object *cpu = NULL;
> +    Error *local_err = NULL;
> +    CPUX86State *env = NULL;
> +
> +    cpu = object_new(MACHINE(x86ms)->cpu_type);
> +
> +    env = &X86_CPU(cpu)->env;
> +    env->nr_dies = x86ms->smp_dies;
> +
> +    object_property_set_uint(cpu, apic_id, "apic-id", &local_err);
> +    if (local_err) {
> +        goto out;
> +    }
> +    object_property_set_bool(cpu, true, "realized", &local_err);
> +
> +out:
> +    object_unref(cpu);
> +    error_propagate(errp, local_err);
> +}
> +
> +/*
> + * This function is very similar to smp_parse()
> + * in hw/core/machine.c but includes CPU die support.
> + */
> +void x86_smp_parse(MachineState *ms, QemuOpts *opts)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(ms);
> +
> +    if (opts) {
> +        unsigned cpus    = qemu_opt_get_number(opts, "cpus", 0);
> +        unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
> +        unsigned dies = qemu_opt_get_number(opts, "dies", 1);
> +        unsigned cores   = qemu_opt_get_number(opts, "cores", 0);
> +        unsigned threads = qemu_opt_get_number(opts, "threads", 0);
> +
> +        /* compute missing values, prefer sockets over cores over threads */
> +        if (cpus == 0 || sockets == 0) {
> +            cores = cores > 0 ? cores : 1;
> +            threads = threads > 0 ? threads : 1;
> +            if (cpus == 0) {
> +                sockets = sockets > 0 ? sockets : 1;
> +                cpus = cores * threads * dies * sockets;
> +            } else {
> +                ms->smp.max_cpus =
> +                        qemu_opt_get_number(opts, "maxcpus", cpus);
> +                sockets = ms->smp.max_cpus / (cores * threads * dies);
> +            }
> +        } else if (cores == 0) {
> +            threads = threads > 0 ? threads : 1;
> +            cores = cpus / (sockets * dies * threads);
> +            cores = cores > 0 ? cores : 1;
> +        } else if (threads == 0) {
> +            threads = cpus / (cores * dies * sockets);
> +            threads = threads > 0 ? threads : 1;
> +        } else if (sockets * dies * cores * threads < cpus) {
> +            error_report("cpu topology: "
> +                         "sockets (%u) * dies (%u) * cores (%u) * threads (%u) < "
> +                         "smp_cpus (%u)",
> +                         sockets, dies, cores, threads, cpus);
> +            exit(1);
> +        }
> +
> +        ms->smp.max_cpus =
> +                qemu_opt_get_number(opts, "maxcpus", cpus);
> +
> +        if (ms->smp.max_cpus < cpus) {
> +            error_report("maxcpus must be equal to or greater than smp");
> +            exit(1);
> +        }
> +
> +        if (sockets * dies * cores * threads > ms->smp.max_cpus) {
> +            error_report("cpu topology: "
> +                         "sockets (%u) * dies (%u) * cores (%u) * threads (%u) > "
> +                         "maxcpus (%u)",
> +                         sockets, dies, cores, threads,
> +                         ms->smp.max_cpus);
> +            exit(1);
> +        }
> +
> +        if (sockets * dies * cores * threads != ms->smp.max_cpus) {
> +            warn_report("Invalid CPU topology deprecated: "
> +                        "sockets (%u) * dies (%u) * cores (%u) * threads (%u) "
> +                        "!= maxcpus (%u)",
> +                        sockets, dies, cores, threads,
> +                        ms->smp.max_cpus);
> +        }
> +
> +        ms->smp.cpus = cpus;
> +        ms->smp.cores = cores;
> +        ms->smp.threads = threads;
> +        x86ms->smp_dies = dies;
> +    }
> +
> +    if (ms->smp.cpus > 1) {
> +        Error *blocker = NULL;
> +        error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp");
> +        replay_add_blocker(blocker);
> +    }
> +}
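The missing-value rules in `x86_smp_parse()` can be exercised without `QemuOpts`. A minimal sketch, assuming 0 stands for an option that was not given on `-smp` (and that an unset `dies` means 1, as in the patch). The `Topo` struct and `fill_topology()` are hypothetical. Where QEMU calls `error_report()` and `exit(1)`, the sketch only leaves `valid` cleared:

```c
#include <assert.h>

/* Hypothetical container for the -smp values; 0 means "not given". */
typedef struct {
    unsigned cpus, sockets, dies, cores, threads, max_cpus;
    int valid;                  /* 0 where QEMU would exit(1) */
} Topo;

static Topo fill_topology(Topo t)
{
    t.valid = 0;
    if (t.dies == 0) {
        t.dies = 1;             /* the patch defaults "dies" to 1 */
    }

    /* compute missing values, prefer sockets over cores over threads */
    if (t.cpus == 0 || t.sockets == 0) {
        t.cores = t.cores > 0 ? t.cores : 1;
        t.threads = t.threads > 0 ? t.threads : 1;
        if (t.cpus == 0) {
            t.sockets = t.sockets > 0 ? t.sockets : 1;
            t.cpus = t.cores * t.threads * t.dies * t.sockets;
        } else {
            unsigned max = t.max_cpus ? t.max_cpus : t.cpus;
            t.sockets = max / (t.cores * t.threads * t.dies);
        }
    } else if (t.cores == 0) {
        t.threads = t.threads > 0 ? t.threads : 1;
        t.cores = t.cpus / (t.sockets * t.dies * t.threads);
        t.cores = t.cores > 0 ? t.cores : 1;
    } else if (t.threads == 0) {
        t.threads = t.cpus / (t.cores * t.dies * t.sockets);
        t.threads = t.threads > 0 ? t.threads : 1;
    } else if (t.sockets * t.dies * t.cores * t.threads < t.cpus) {
        return t;               /* topology too small for cpus */
    }

    if (t.max_cpus == 0) {
        t.max_cpus = t.cpus;    /* maxcpus defaults to cpus */
    }
    if (t.max_cpus < t.cpus ||
        t.sockets * t.dies * t.cores * t.threads > t.max_cpus) {
        return t;
    }
    /* product != max_cpus is only a deprecation warning in the patch */
    t.valid = 1;
    return t;
}
```

For example, `-smp 8,sockets=2` yields cores=4 and threads=1, while a bare `-smp 8` yields 8 single-core sockets.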
> +
> +void x86_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(ms);
> +    int64_t apic_id = x86_cpu_apic_id_from_index(x86ms, id);
> +    Error *local_err = NULL;
> +
> +    if (id < 0) {
> +        error_setg(errp, "Invalid CPU id: %" PRIi64, id);
> +        return;
> +    }
> +
> +    if (apic_id >= ACPI_CPU_HOTPLUG_ID_LIMIT) {
> +        error_setg(errp, "Unable to add CPU: %" PRIi64
> +                   ", resulting APIC ID (%" PRIi64 ") is too large",
> +                   id, apic_id);
> +        return;
> +    }
> +
> +    x86_new_cpu(X86_MACHINE(ms), apic_id, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +}
> +
> +void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
> +{
> +    int i;
> +    const CPUArchIdList *possible_cpus;
> +    MachineState *ms = MACHINE(x86ms);
> +    MachineClass *mc = MACHINE_GET_CLASS(x86ms);
> +
> +    x86_cpu_set_default_version(default_cpu_version);
> +
> +    /* Calculates the limit to CPU APIC ID values
> +     *
> +     * Limit for the APIC ID value, so that all
> +     * CPU APIC IDs are < x86ms->apic_id_limit.
> +     *
> +     * This is used for FW_CFG_MAX_CPUS. See comments on bochs_bios_init().
> +     */
> +    x86ms->apic_id_limit = x86_cpu_apic_id_from_index(x86ms,
> +                                                      ms->smp.max_cpus - 1) + 1;
> +    possible_cpus = mc->possible_cpu_arch_ids(ms);
> +    for (i = 0; i < ms->smp.cpus; i++) {
> +        x86_new_cpu(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal);
> +    }
> +}
> +
> +void x86_nmi(NMIState *n, int cpu_index, Error **errp)
> +{
> +    /* cpu index isn't used */
> +    CPUState *cs;
> +
> +    CPU_FOREACH(cs) {
> +        X86CPU *cpu = X86_CPU(cs);
> +
> +        if (!cpu->apic_state) {
> +            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
> +        } else {
> +            apic_deliver_nmi(cpu->apic_state);
> +        }
> +    }
> +}
> +
> +CpuInstanceProperties
> +x86_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> +{
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;
> +}
> +
> +int64_t x86_get_default_cpu_node_id(const MachineState *ms, int idx)
> +{
> +    X86CPUTopoInfo topo;
> +    X86MachineState *x86ms = X86_MACHINE(ms);
> +
> +    assert(idx < ms->possible_cpus->len);
> +    x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id,
> +                             x86ms->smp_dies, ms->smp.cores,
> +                             ms->smp.threads, &topo);
> +    return topo.pkg_id % ms->numa_state->num_nodes;
> +}
> +
> +const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(ms);
> +    int i;
> +    unsigned int max_cpus = ms->smp.max_cpus;
> +
> +    if (ms->possible_cpus) {
> +        /*
> +         * make sure that max_cpus hasn't changed since the first use, i.e.
> +         * -smp hasn't been parsed after it
> +        */
> +        assert(ms->possible_cpus->len == max_cpus);
> +        return ms->possible_cpus;
> +    }
> +
> +    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
> +                                  sizeof(CPUArchId) * max_cpus);
> +    ms->possible_cpus->len = max_cpus;
> +    for (i = 0; i < ms->possible_cpus->len; i++) {
> +        X86CPUTopoInfo topo;
> +
> +        ms->possible_cpus->cpus[i].type = ms->cpu_type;
> +        ms->possible_cpus->cpus[i].vcpus_count = 1;
> +        ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(x86ms, i);
> +        x86_topo_ids_from_apicid(ms->possible_cpus->cpus[i].arch_id,
> +                                 x86ms->smp_dies, ms->smp.cores,
> +                                 ms->smp.threads, &topo);
> +        ms->possible_cpus->cpus[i].props.has_socket_id = true;
> +        ms->possible_cpus->cpus[i].props.socket_id = topo.pkg_id;
> +        if (x86ms->smp_dies > 1) {
> +            ms->possible_cpus->cpus[i].props.has_die_id = true;
> +            ms->possible_cpus->cpus[i].props.die_id = topo.die_id;
> +        }
> +        ms->possible_cpus->cpus[i].props.has_core_id = true;
> +        ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
> +        ms->possible_cpus->cpus[i].props.has_thread_id = true;
> +        ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
> +    }
> +    return ms->possible_cpus;
> +}
> +
> +void x86_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw)
> +{
> +    char *filename;
> +    MemoryRegion *bios, *isa_bios;
> +    int bios_size, isa_bios_size;
> +    int ret;
> +
> +    /* BIOS load */
> +    if (bios_name == NULL) {
> +        bios_name = BIOS_FILENAME;
> +    }
> +    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
> +    if (filename) {
> +        bios_size = get_image_size(filename);
> +    } else {
> +        bios_size = -1;
> +    }
> +    if (bios_size <= 0 ||
> +        (bios_size % 65536) != 0) {
> +        goto bios_error;
> +    }
> +    bios = g_malloc(sizeof(*bios));
> +    memory_region_init_ram(bios, NULL, "pc.bios", bios_size, &error_fatal);
> +    if (!isapc_ram_fw) {
> +        memory_region_set_readonly(bios, true);
> +    }
> +    ret = rom_add_file_fixed(bios_name, (uint32_t)(-bios_size), -1);
> +    if (ret != 0) {
> +    bios_error:
> +        fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
> +        exit(1);
> +    }
> +    g_free(filename);
> +
> +    /* map the last 128KB of the BIOS in ISA space */
> +    isa_bios_size = MIN(bios_size, 128 * KiB);
> +    isa_bios = g_malloc(sizeof(*isa_bios));
> +    memory_region_init_alias(isa_bios, NULL, "isa-bios", bios,
> +                             bios_size - isa_bios_size, isa_bios_size);
> +    memory_region_add_subregion_overlap(rom_memory,
> +                                        0x100000 - isa_bios_size,
> +                                        isa_bios,
> +                                        1);
> +    if (!isapc_ram_fw) {
> +        memory_region_set_readonly(isa_bios, true);
> +    }
> +
> +    /* map all the bios at the top of memory */
> +    memory_region_add_subregion(rom_memory,
> +                                (uint32_t)(-bios_size),
> +                                bios);
> +}
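`x86_system_rom_init()` maps the BIOS image twice: the whole image so that it ends exactly at 4 GiB, and an alias of its last (at most) 128 KiB so that it ends exactly at 1 MiB in ISA space. A minimal sketch of just the address arithmetic, assuming the size checks shown above. `RomLayout` and `rom_layout()` are hypothetical and no memory regions are created:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ISA_BIOS_MAX (128u * 1024u)   /* at most 128 KiB aliased below 1 MiB */

typedef struct {
    bool valid;           /* false where x86_system_rom_init() would exit */
    uint32_t bios_base;   /* whole image, ending exactly at 4 GiB */
    uint32_t isa_size;    /* size of the ISA-space alias */
    uint32_t isa_base;    /* alias address, ending exactly at 1 MiB */
    uint32_t isa_offset;  /* where the alias starts inside the image */
} RomLayout;

static RomLayout rom_layout(int bios_size)
{
    RomLayout l = { 0 };

    if (bios_size <= 0 || bios_size % 65536 != 0) {
        return l;
    }
    l.valid = true;
    /* same value as (uint32_t)(-bios_size) in the patch */
    l.bios_base = (uint32_t)0 - (uint32_t)bios_size;
    l.isa_size = (uint32_t)bios_size < ISA_BIOS_MAX ? (uint32_t)bios_size
                                                    : ISA_BIOS_MAX;
    l.isa_base = 0x100000u - l.isa_size;
    l.isa_offset = (uint32_t)bios_size - l.isa_size;
    return l;
}
```

For a 256 KiB image this gives the full mapping at 0xFFFC0000 and an alias of the image's upper half at 0xE0000.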
> +
> +static long get_file_size(FILE *f)
> +{
> +    long where, size;
> +
> +    /* XXX: on Unix systems, using fstat() probably makes more sense */
> +
> +    where = ftell(f);
> +    fseek(f, 0, SEEK_END);
> +    size = ftell(f);
> +    fseek(f, where, SEEK_SET);
> +
> +    return size;
> +}
> +
> +struct setup_data {
> +    uint64_t next;
> +    uint32_t type;
> +    uint32_t len;
> +    uint8_t data[0];
> +} __attribute__((packed));
> +
> +void load_linux(X86MachineState *x86ms,
> +                FWCfgState *fw_cfg,
> +                unsigned acpi_data_size,
> +                bool linuxboot_dma_enabled,
> +                bool pvh_enabled)
> +{
> +    uint16_t protocol;
> +    int setup_size, kernel_size, cmdline_size;
> +    int dtb_size, setup_data_offset;
> +    uint32_t initrd_max;
> +    uint8_t header[8192], *setup, *kernel;
> +    hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0;
> +    FILE *f;
> +    char *vmode;
> +    MachineState *machine = MACHINE(x86ms);
> +    struct setup_data *setup_data;
> +    const char *kernel_filename = machine->kernel_filename;
> +    const char *initrd_filename = machine->initrd_filename;
> +    const char *dtb_filename = machine->dtb;
> +    const char *kernel_cmdline = machine->kernel_cmdline;
> +
> +    /* Align to 16 bytes as a paranoia measure */
> +    cmdline_size = (strlen(kernel_cmdline)+16) & ~15;
> +
> +    /* load the kernel header */
> +    f = fopen(kernel_filename, "rb");
> +    if (!f || !(kernel_size = get_file_size(f)) ||
> +        fread(header, 1, MIN(ARRAY_SIZE(header), kernel_size), f) !=
> +        MIN(ARRAY_SIZE(header), kernel_size)) {
> +        fprintf(stderr, "qemu: could not load kernel '%s': %s\n",
> +                kernel_filename, strerror(errno));
> +        exit(1);
> +    }
> +
> +    /* kernel protocol version */
> +#if 0
> +    fprintf(stderr, "header magic: %#x\n", ldl_p(header+0x202));
> +#endif
> +    if (ldl_p(header+0x202) == 0x53726448) {
> +        protocol = lduw_p(header+0x206);
> +    } else {
> +        size_t pvh_start_addr;
> +        uint32_t mh_load_addr = 0;
> +        uint32_t elf_kernel_size = 0;
> +        /*
> +         * This could be a multiboot kernel. If it is, let's stop treating it
> +         * like a Linux kernel.
> +         * Note: some multiboot images could be in the ELF format (the same as
> +         * PVH), so we try multiboot first, since load_multiboot() checks the
> +         * multiboot magic header before loading the image.
> +         */
> +        if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
> +                           kernel_cmdline, kernel_size, header)) {
> +            return;
> +        }
> +        /*
> +         * Check if the file is an uncompressed kernel file (ELF) and load it,
> +         * saving the PVH entry point used by the x86/HVM direct boot ABI.
> +         * If load_elfboot() is successful, populate the fw_cfg info.
> +         */
> +        if (pvh_enabled &&
> +            pvh_load_elfboot(kernel_filename,
> +                             &mh_load_addr, &elf_kernel_size)) {
> +            fclose(f);
> +
> +            pvh_start_addr = pvh_get_start_addr();
> +
> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> +
> +            fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
> +                strlen(kernel_cmdline) + 1);
> +            fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> +
> +            fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
> +            fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
> +                             header, sizeof(header));
> +
> +            /* load initrd */
> +            if (initrd_filename) {
> +                GMappedFile *mapped_file;
> +                gsize initrd_size;
> +                gchar *initrd_data;
> +                GError *gerr = NULL;
> +
> +                mapped_file = g_mapped_file_new(initrd_filename, false, &gerr);
> +                if (!mapped_file) {
> +                    fprintf(stderr, "qemu: error reading initrd %s: %s\n",
> +                            initrd_filename, gerr->message);
> +                    exit(1);
> +                }
> +                x86ms->initrd_mapped_file = mapped_file;
> +
> +                initrd_data = g_mapped_file_get_contents(mapped_file);
> +                initrd_size = g_mapped_file_get_length(mapped_file);
> +                initrd_max = x86ms->below_4g_mem_size - acpi_data_size - 1;
> +                if (initrd_size >= initrd_max) {
> +                    fprintf(stderr, "qemu: initrd is too large, cannot support "
> +                            "(max: %"PRIu32", need %"PRIu64")\n",
> +                            initrd_max, (uint64_t)initrd_size);
> +                    exit(1);
> +                }
> +
> +                initrd_addr = (initrd_max - initrd_size) & ~4095;
> +
> +                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
> +                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data,
> +                                 initrd_size);
> +            }
> +
> +            option_rom[nb_option_roms].bootindex = 0;
> +            option_rom[nb_option_roms].name = "pvh.bin";
> +            nb_option_roms++;
> +
> +            return;
> +        }
> +        protocol = 0;
> +    }
> +
> +    if (protocol < 0x200 || !(header[0x211] & 0x01)) {
> +        /* Low kernel */
> +        real_addr    = 0x90000;
> +        cmdline_addr = 0x9a000 - cmdline_size;
> +        prot_addr    = 0x10000;
> +    } else if (protocol < 0x202) {
> +        /* High but ancient kernel */
> +        real_addr    = 0x90000;
> +        cmdline_addr = 0x9a000 - cmdline_size;
> +        prot_addr    = 0x100000;
> +    } else {
> +        /* High and recent kernel */
> +        real_addr    = 0x10000;
> +        cmdline_addr = 0x20000;
> +        prot_addr    = 0x100000;
> +    }
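The three load addresses chosen above depend only on the boot protocol version, bit 0 of the loadflags byte at header offset 0x211 (set when the kernel loads high), and the command-line length. A minimal sketch, assuming those inputs are already extracted from the header. `BootLayout` and `boot_layout()` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t real_addr;     /* real-mode setup code */
    uint32_t cmdline_addr;  /* kernel command line */
    uint32_t prot_addr;     /* protected-mode kernel */
} BootLayout;

/*
 * protocol:  boot protocol version from header offset 0x206 (0 if absent)
 * loadflags: byte at header offset 0x211
 */
static BootLayout boot_layout(uint16_t protocol, uint8_t loadflags,
                              const char *cmdline)
{
    /* same rounding as load_linux(): round up to 16, with room for the NUL */
    uint32_t cmdline_size = (uint32_t)((strlen(cmdline) + 16) & ~(size_t)15);

    if (protocol < 0x200 || !(loadflags & 0x01)) {
        /* low kernel */
        return (BootLayout){ 0x90000, 0x9a000 - cmdline_size, 0x10000 };
    } else if (protocol < 0x202) {
        /* high but ancient kernel */
        return (BootLayout){ 0x90000, 0x9a000 - cmdline_size, 0x100000 };
    }
    /* high and recent kernel */
    return (BootLayout){ 0x10000, 0x20000, 0x100000 };
}
```

For protocol 2.01 and "console=ttyS0" (13 characters, rounded to 16 bytes) the command line sits at 0x99ff0, just below 0x9a000.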
> +
> +#if 0
> +    fprintf(stderr,
> +            "qemu: real_addr     = 0x" TARGET_FMT_plx "\n"
> +            "qemu: cmdline_addr  = 0x" TARGET_FMT_plx "\n"
> +            "qemu: prot_addr     = 0x" TARGET_FMT_plx "\n",
> +            real_addr,
> +            cmdline_addr,
> +            prot_addr);
> +#endif
> +
> +    /* highest address for loading the initrd */
> +    if (protocol >= 0x20c &&
> +        lduw_p(header+0x236) & XLF_CAN_BE_LOADED_ABOVE_4G) {
> +        /*
> +         * Linux has supported initrd up to 4 GB for a very long time (2007,
> +         * long before XLF_CAN_BE_LOADED_ABOVE_4G which was added in 2013),
> +         * though it only sets initrd_max to 2 GB to "work around bootloader
> +         * bugs". Luckily, QEMU firmware (which acts much like a bootloader)
> +         * has supported this.
> +         *
> +         * It's believed that if XLF_CAN_BE_LOADED_ABOVE_4G is set, initrd can
> +         * be loaded into any address.
> +         *
> +         * In addition, initrd_max is uint32_t simply because QEMU doesn't
> +         * support the 64-bit boot protocol (specifically the ext_ramdisk_image
> +         * field).
> +         *
> +         * Therefore here just limit initrd_max to UINT32_MAX simply as well.
> +         */
> +        initrd_max = UINT32_MAX;
> +    } else if (protocol >= 0x203) {
> +        initrd_max = ldl_p(header+0x22c);
> +    } else {
> +        initrd_max = 0x37ffffff;
> +    }
> +
> +    if (initrd_max >= x86ms->below_4g_mem_size - acpi_data_size) {
> +        initrd_max = x86ms->below_4g_mem_size - acpi_data_size - 1;
> +    }
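The `initrd_max` selection above can be read as a small pure function of a few header fields plus the amount of low RAM. A minimal sketch, assuming the caller has already read xloadflags (offset 0x236) and initrd_addr_max (offset 0x22c) from the header. `initrd_limit()` is a hypothetical name, and the `XLF_CAN_BE_LOADED_ABOVE_4G` value (bit 1) follows the Linux boot protocol:

```c
#include <assert.h>
#include <stdint.h>

#define XLF_CAN_BE_LOADED_ABOVE_4G (1 << 1)   /* xloadflags bit 1 */

/*
 * protocol:       boot protocol version (header offset 0x206)
 * xloadflags:     header offset 0x236, meaningful from protocol 2.12
 * hdr_initrd_max: initrd_addr_max, header offset 0x22c (protocol >= 2.03)
 * below_4g, acpi_data_size: as used by load_linux()
 */
static uint32_t initrd_limit(uint16_t protocol, uint16_t xloadflags,
                             uint32_t hdr_initrd_max,
                             uint64_t below_4g, uint32_t acpi_data_size)
{
    uint32_t initrd_max;

    if (protocol >= 0x20c && (xloadflags & XLF_CAN_BE_LOADED_ABOVE_4G)) {
        initrd_max = UINT32_MAX;          /* capped at 32 bits, see above */
    } else if (protocol >= 0x203) {
        initrd_max = hdr_initrd_max;
    } else {
        initrd_max = 0x37ffffff;          /* fixed limit for old protocols */
    }

    /* stay below the acpi_data_size bytes reserved at the top of low RAM */
    if (initrd_max >= below_4g - acpi_data_size) {
        initrd_max = (uint32_t)(below_4g - acpi_data_size - 1);
    }
    return initrd_max;
}
```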
> +
> +    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
> +    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline)+1);
> +    fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> +
> +    if (protocol >= 0x202) {
> +        stl_p(header+0x228, cmdline_addr);
> +    } else {
> +        stw_p(header+0x20, 0xA33F);
> +        stw_p(header+0x22, cmdline_addr-real_addr);
> +    }
> +
> +    /* handle vga= parameter */
> +    vmode = strstr(kernel_cmdline, "vga=");
> +    if (vmode) {
> +        unsigned int video_mode;
> +        /* skip "vga=" */
> +        vmode += 4;
> +        if (!strncmp(vmode, "normal", 6)) {
> +            video_mode = 0xffff;
> +        } else if (!strncmp(vmode, "ext", 3)) {
> +            video_mode = 0xfffe;
> +        } else if (!strncmp(vmode, "ask", 3)) {
> +            video_mode = 0xfffd;
> +        } else {
> +            video_mode = strtol(vmode, NULL, 0);
> +        }
> +        stw_p(header+0x1fa, video_mode);
> +    }
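The `vga=` handling above maps three keywords to fixed mode numbers and otherwise parses the value with base auto-detection. A minimal sketch of the same lookup, returning the value instead of writing it to header offset 0x1fa. `vga_mode_from_cmdline()` and its -1 "no vga= option" result are hypothetical conventions:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Video mode for header offset 0x1fa, or -1 if there is no "vga=" option. */
static long vga_mode_from_cmdline(const char *cmdline)
{
    const char *vmode = strstr(cmdline, "vga=");

    if (!vmode) {
        return -1;
    }
    vmode += 4;                           /* skip "vga=" */
    if (!strncmp(vmode, "normal", 6)) {
        return 0xffff;
    } else if (!strncmp(vmode, "ext", 3)) {
        return 0xfffe;
    } else if (!strncmp(vmode, "ask", 3)) {
        return 0xfffd;
    }
    return strtol(vmode, NULL, 0);        /* decimal, 0x hex or 0 octal */
}
```

As in the patch, parsing stops at the first character that is not part of the number, so "vga=0x317 quiet" yields 0x317.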
> +
> +    /* loader type */
> +    /* High nybble = B reserved for QEMU; low nybble is revision number.
> +       If this code is substantially changed, you may want to consider
> +       incrementing the revision. */
> +    if (protocol >= 0x200) {
> +        header[0x210] = 0xB0;
> +    }
> +    /* heap */
> +    if (protocol >= 0x201) {
> +        header[0x211] |= 0x80;	/* CAN_USE_HEAP */
> +        stw_p(header+0x224, cmdline_addr-real_addr-0x200);
> +    }
> +
> +    /* load initrd */
> +    if (initrd_filename) {
> +        GMappedFile *mapped_file;
> +        gsize initrd_size;
> +        gchar *initrd_data;
> +        GError *gerr = NULL;
> +
> +        if (protocol < 0x200) {
> +            fprintf(stderr, "qemu: linux kernel too old to load a ram disk\n");
> +            exit(1);
> +        }
> +
> +        mapped_file = g_mapped_file_new(initrd_filename, false, &gerr);
> +        if (!mapped_file) {
> +            fprintf(stderr, "qemu: error reading initrd %s: %s\n",
> +                    initrd_filename, gerr->message);
> +            exit(1);
> +        }
> +        x86ms->initrd_mapped_file = mapped_file;
> +
> +        initrd_data = g_mapped_file_get_contents(mapped_file);
> +        initrd_size = g_mapped_file_get_length(mapped_file);
> +        if (initrd_size >= initrd_max) {
> +            fprintf(stderr, "qemu: initrd is too large, cannot support "
> +                    "(max: %"PRIu32", need %"PRIu64")\n",
> +                    initrd_max, (uint64_t)initrd_size);
> +            exit(1);
> +        }
> +
> +        initrd_addr = (initrd_max-initrd_size) & ~4095;
> +
> +        fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
> +        fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
> +        fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data, initrd_size);
> +
> +        stl_p(header+0x218, initrd_addr);
> +        stl_p(header+0x21c, initrd_size);
> +    }
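The placement rule above puts the initrd as high as `initrd_max` allows and rounds its start address down to a 4 KiB boundary. A minimal sketch of that rule. `place_initrd()` is hypothetical and returns -1 where QEMU prints "initrd is too large" and exits:

```c
#include <assert.h>
#include <stdint.h>

static int64_t place_initrd(uint32_t initrd_max, uint64_t initrd_size)
{
    if (initrd_size >= initrd_max) {
        return -1;                     /* "initrd is too large" */
    }
    /* as high as possible, start rounded down to a 4 KiB page */
    return (int64_t)(((uint64_t)initrd_max - initrd_size) & ~(uint64_t)4095);
}
```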
> +
> +    /* load kernel and setup */
> +    setup_size = header[0x1f1];
> +    if (setup_size == 0) {
> +        setup_size = 4;
> +    }
> +    setup_size = (setup_size+1)*512;
> +    if (setup_size > kernel_size) {
> +        fprintf(stderr, "qemu: invalid kernel header\n");
> +        exit(1);
> +    }
> +    kernel_size -= setup_size;
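The setup size computed above follows the Linux boot protocol: the byte at header offset 0x1f1 counts 512-byte setup sectors, 0 means 4 for old kernels, and one more sector is added for the boot sector itself. A minimal sketch; `setup_bytes()` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* setup_sects is the byte at header offset 0x1f1. */
static int setup_bytes(uint8_t setup_sects)
{
    if (setup_sects == 0) {
        setup_sects = 4;               /* 0 means 4 for old kernels */
    }
    return (setup_sects + 1) * 512;    /* plus the 512-byte boot sector */
}
```

Everything after these bytes in the file is the protected-mode kernel, which is why `kernel_size` is reduced by `setup_size`.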
> +
> +    setup  = g_malloc(setup_size);
> +    kernel = g_malloc(kernel_size);
> +    fseek(f, 0, SEEK_SET);
> +    if (fread(setup, 1, setup_size, f) != setup_size) {
> +        fprintf(stderr, "fread() failed\n");
> +        exit(1);
> +    }
> +    if (fread(kernel, 1, kernel_size, f) != kernel_size) {
> +        fprintf(stderr, "fread() failed\n");
> +        exit(1);
> +    }
> +    fclose(f);
> +
> +    /* append dtb to kernel */
> +    if (dtb_filename) {
> +        if (protocol < 0x209) {
> +            fprintf(stderr, "qemu: Linux kernel too old to load a dtb\n");
> +            exit(1);
> +        }
> +
> +        dtb_size = get_image_size(dtb_filename);
> +        if (dtb_size <= 0) {
> +            fprintf(stderr, "qemu: error reading dtb %s: %s\n",
> +                    dtb_filename, strerror(errno));
> +            exit(1);
> +        }
> +
> +        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
> +        kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size;
> +        kernel = g_realloc(kernel, kernel_size);
> +
> +        stq_p(header+0x250, prot_addr + setup_data_offset);
> +
> +        setup_data = (struct setup_data *)(kernel + setup_data_offset);
> +        setup_data->next = 0;
> +        setup_data->type = cpu_to_le32(SETUP_DTB);
> +        setup_data->len = cpu_to_le32(dtb_size);
> +
> +        load_image_size(dtb_filename, setup_data->data, dtb_size);
> +    }
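The DTB path appends a struct setup_data node after the protected-mode kernel, aligned to 16 bytes. It then stores the node's guest-physical address in the boot protocol's setup_data field at offset 0x250. That field exists only in protocol 2.09 and later, which is why the version check is there. The sketch below covers just the layout arithmetic and assumes the 16-byte Linux setup_data header. dtb_append_layout(), struct dtb_layout and ALIGN_UP_16 are illustrative names; ALIGN_UP_16 stands in for QEMU_ALIGN_UP(x, 16):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the layout of Linux's struct setup_data header (16 bytes). */
struct setup_data_hdr {
    uint64_t next;   /* physical address of the next node, 0 terminates */
    uint32_t type;   /* e.g. SETUP_DTB */
    uint32_t len;    /* payload length in bytes */
};

/* Stand-in for QEMU_ALIGN_UP(x, 16). */
#define ALIGN_UP_16(x) (((x) + 15u) & ~(uint64_t)15u)

/* Hypothetical result type describing where the DTB node lands. */
struct dtb_layout {
    uint64_t node_offset;     /* offset of setup_data inside the kernel blob */
    uint64_t new_kernel_size; /* blob size after appending node + DTB */
    uint64_t node_gpa;        /* value written to header offset 0x250 */
};

static struct dtb_layout dtb_append_layout(uint64_t kernel_size,
                                           uint64_t dtb_size,
                                           uint64_t prot_addr)
{
    struct dtb_layout l;

    assert(sizeof(struct setup_data_hdr) == 16);
    l.node_offset = ALIGN_UP_16(kernel_size);
    l.new_kernel_size = l.node_offset + sizeof(struct setup_data_hdr) + dtb_size;
    l.node_gpa = prot_addr + l.node_offset;
    return l;
}
```

For a 1000-byte kernel and a 200-byte DTB, the node lands at offset 1008 and the blob grows to 1224 bytes.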
> +
> +    memcpy(setup, header, MIN(sizeof(header), setup_size));
> +
> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size);
> +    fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel, kernel_size);
> +
> +    fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_ADDR, real_addr);
> +    fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, setup_size);
> +    fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
> +
> +    option_rom[nb_option_roms].bootindex = 0;
> +    option_rom[nb_option_roms].name = "linuxboot.bin";
> +    if (linuxboot_dma_enabled && fw_cfg_dma_enabled(fw_cfg)) {
> +        option_rom[nb_option_roms].name = "linuxboot_dma.bin";
> +    }
> +    nb_option_roms++;
> +}
> +
> +static void x86_machine_get_max_ram_below_4g(Object *obj, Visitor *v,
> +                                             const char *name, void *opaque,
> +                                             Error **errp)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(obj);
> +    uint64_t value = x86ms->max_ram_below_4g;
> +
> +    visit_type_size(v, name, &value, errp);
> +}
> +
> +static void x86_machine_set_max_ram_below_4g(Object *obj, Visitor *v,
> +                                             const char *name, void *opaque,
> +                                             Error **errp)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(obj);
> +    Error *error = NULL;
> +    uint64_t value;
> +
> +    visit_type_size(v, name, &value, &error);
> +    if (error) {
> +        error_propagate(errp, error);
> +        return;
> +    }
> +    if (value > 4 * GiB) {
> +        error_setg(&error,
> +                   "Machine option 'max-ram-below-4g=%"PRIu64
> +                   "' expects size less than or equal to 4G", value);
> +        error_propagate(errp, error);
> +        return;
> +    }
> +
> +    if (value < 1 * MiB) {
> +        warn_report("Only %" PRIu64 " bytes of RAM below the 4GiB boundary, "
> +                    "BIOS may not work with less than 1MiB", value);
> +    }
> +
> +    x86ms->max_ram_below_4g = value;
> +}
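The setter accepts any value up to 4 GiB and only warns below 1 MiB. The sketch below pulls that decision table out of the visitor plumbing into a pure function. The function, enum and macro names are hypothetical, not QEMU's:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MIB (1ULL << 20)
#define SKETCH_GIB (1ULL << 30)

enum ram_below_4g_check {
    RAM4G_OK,       /* accepted silently */
    RAM4G_WARN,     /* accepted, but the BIOS may not cope */
    RAM4G_REJECT,   /* error: larger than the 32-bit boundary */
};

/* Hypothetical pure version of the checks in the property setter. */
static enum ram_below_4g_check check_max_ram_below_4g(uint64_t value)
{
    if (value > 4 * SKETCH_GIB) {
        return RAM4G_REJECT;
    }
    if (value < 1 * SKETCH_MIB) {
        return RAM4G_WARN;
    }
    return RAM4G_OK;
}
```

Exactly 4 GiB is still accepted, because the check is "greater than" rather than "greater than or equal".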
> +
> +static void x86_machine_initfn(Object *obj)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(obj);
> +
> +    x86ms->max_ram_below_4g = 0; /* use default */
> +    x86ms->smp_dies = 1;
> +}
> +
> +static void x86_machine_class_init(ObjectClass *oc, void *data)
> +{
> +    MachineClass *mc = MACHINE_CLASS(oc);
> +
> +    mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
> +    mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
> +    mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
> +
> +    object_class_property_add(oc, X86_MACHINE_MAX_RAM_BELOW_4G, "size",
> +        x86_machine_get_max_ram_below_4g, x86_machine_set_max_ram_below_4g,
> +        NULL, NULL, &error_abort);
> +
> +    object_class_property_set_description(oc, X86_MACHINE_MAX_RAM_BELOW_4G,
> +        "Maximum ram below the 4G boundary (32bit boundary)", &error_abort);
> +}
> +
> +static const TypeInfo x86_machine_info = {
> +    .name = TYPE_X86_MACHINE,
> +    .parent = TYPE_MACHINE,
> +    .abstract = true,
> +    .instance_size = sizeof(X86MachineState),
> +    .instance_init = x86_machine_initfn,
> +    .class_size = sizeof(X86MachineClass),
> +    .class_init = x86_machine_class_init,
> +};
> +
> +static void x86_machine_register_types(void)
> +{
> +    type_register_static(&x86_machine_info);
> +}
> +
> +type_init(x86_machine_register_types)
> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> index 1ede055387..e621dde6c3 100644
> --- a/hw/intc/ioapic.c
> +++ b/hw/intc/ioapic.c
> @@ -23,6 +23,7 @@
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
>  #include "monitor/monitor.h"
> +#include "hw/i386/x86.h"
>  #include "hw/i386/pc.h"
>  #include "hw/i386/apic.h"
>  #include "hw/i386/ioapic.h"
> @@ -89,7 +90,7 @@ static void ioapic_entry_parse(uint64_t entry, struct ioapic_entry_info *info)
>  
>  static void ioapic_service(IOAPICCommonState *s)
>  {
> -    AddressSpace *ioapic_as = PC_MACHINE(qdev_get_machine())->ioapic_as;
> +    AddressSpace *ioapic_as = X86_MACHINE(qdev_get_machine())->ioapic_as;
>      struct ioapic_entry_info info;
>      uint8_t i;
>      uint32_t mask;
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 062feeb69e..de28d55e5c 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -3,6 +3,7 @@
>  
>  #include "exec/memory.h"
>  #include "hw/boards.h"
> +#include "hw/i386/x86.h"
>  #include "hw/isa/isa.h"
>  #include "hw/block/fdc.h"
>  #include "hw/block/flash.h"
> @@ -27,7 +28,7 @@
>   */
>  struct PCMachineState {
>      /*< private >*/
> -    MachineState parent_obj;
> +    X86MachineState parent_obj;
>  
>      /* <public> */
>  
> @@ -36,15 +37,10 @@ struct PCMachineState {
>  
>      /* Pointers to devices and objects: */
>      HotplugHandler *acpi_dev;
> -    ISADevice *rtc;
>      PCIBus *bus;
> -    FWCfgState *fw_cfg;
> -    qemu_irq *gsi;
>      PFlashCFI01 *flash[2];
> -    GMappedFile *initrd_mapped_file;
>  
>      /* Configuration options: */
> -    uint64_t max_ram_below_4g;
>      OnOffAuto vmport;
>      OnOffAuto smm;
>  
> @@ -53,27 +49,13 @@ struct PCMachineState {
>      bool sata_enabled;
>      bool pit_enabled;
>  
> -    /* RAM information (sizes, addresses, configuration): */
> -    ram_addr_t below_4g_mem_size, above_4g_mem_size;
> -
> -    /* CPU and apic information: */
> -    bool apic_xrupt_override;
> -    unsigned apic_id_limit;
> -    uint16_t boot_cpus;
> -    unsigned smp_dies;
> -
>      /* NUMA information: */
>      uint64_t numa_nodes;
>      uint64_t *node_mem;
> -
> -    /* Address space used by IOAPIC device. All IOAPIC interrupts
> -     * will be translated to MSI messages in the address space. */
> -    AddressSpace *ioapic_as;
>  };
>  
>  #define PC_MACHINE_ACPI_DEVICE_PROP "acpi-device"
>  #define PC_MACHINE_DEVMEM_REGION_SIZE "device-memory-region-size"
> -#define PC_MACHINE_MAX_RAM_BELOW_4G "max-ram-below-4g"
>  #define PC_MACHINE_VMPORT           "vmport"
>  #define PC_MACHINE_SMM              "smm"
>  #define PC_MACHINE_SMBUS            "smbus"
> @@ -139,9 +121,6 @@ typedef struct PCMachineClass {
>  
>      /* use PVH to load kernels that support this feature */
>      bool pvh_enabled;
> -
> -    /* Enables contiguous-apic-ID mode */
> -    bool compat_apic_id_mode;
>  } PCMachineClass;
>  
>  #define TYPE_PC_MACHINE "generic-pc-machine"
> @@ -193,10 +172,6 @@ bool pc_machine_is_smm_enabled(PCMachineState *pcms);
>  void pc_register_ferr_irq(qemu_irq irq);
>  void pc_acpi_smi_interrupt(void *opaque, int irq, int level);
>  
> -void pc_cpus_init(PCMachineState *pcms);
> -void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp);
> -void pc_smp_parse(MachineState *ms, QemuOpts *opts);
> -
>  void pc_guest_info_init(PCMachineState *pcms);
>  
>  #define PCI_HOST_PROP_PCI_HOLE_START   "pci-hole-start"
> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
> new file mode 100644
> index 0000000000..5980090b29
> --- /dev/null
> +++ b/include/hw/i386/x86.h
> @@ -0,0 +1,97 @@
> +/*
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef HW_I386_X86_H
> +#define HW_I386_X86_H
> +
> +#include "qemu-common.h"
> +#include "exec/hwaddr.h"
> +#include "qemu/notify.h"
> +
> +#include "hw/boards.h"
> +#include "hw/nmi.h"
> +
> +typedef struct {
> +    /*< private >*/
> +    MachineClass parent;
> +
> +    /*< public >*/
> +
> +    /* Enables contiguous-apic-ID mode */
> +    bool compat_apic_id_mode;
> +} X86MachineClass;
> +
> +typedef struct {
> +    /*< private >*/
> +    MachineState parent;
> +
> +    /*< public >*/
> +
> +    /* Pointers to devices and objects: */
> +    ISADevice *rtc;
> +    FWCfgState *fw_cfg;
> +    qemu_irq *gsi;
> +    GMappedFile *initrd_mapped_file;
> +
> +    /* Configuration options: */
> +    uint64_t max_ram_below_4g;
> +
> +    /* RAM information (sizes, addresses, configuration): */
> +    ram_addr_t below_4g_mem_size, above_4g_mem_size;
> +
> +    /* CPU and apic information: */
> +    bool apic_xrupt_override;
> +    unsigned apic_id_limit;
> +    uint16_t boot_cpus;
> +    unsigned smp_dies;
> +
> +    /* Address space used by IOAPIC device. All IOAPIC interrupts
> +     * will be translated to MSI messages in the address space. */
> +    AddressSpace *ioapic_as;
> +} X86MachineState;
> +
> +#define X86_MACHINE_MAX_RAM_BELOW_4G "max-ram-below-4g"
> +
> +#define TYPE_X86_MACHINE   MACHINE_TYPE_NAME("x86")
> +#define X86_MACHINE(obj) \
> +    OBJECT_CHECK(X86MachineState, (obj), TYPE_X86_MACHINE)
> +#define X86_MACHINE_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(X86MachineClass, obj, TYPE_X86_MACHINE)
> +#define X86_MACHINE_CLASS(class) \
> +    OBJECT_CLASS_CHECK(X86MachineClass, class, TYPE_X86_MACHINE)
> +
> +uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
> +                                    unsigned int cpu_index);
> +
> +void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version);
> +void x86_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp);
> +void x86_smp_parse(MachineState *ms, QemuOpts *opts);
> +void x86_nmi(NMIState *n, int cpu_index, Error **errp);
> +
> +CpuInstanceProperties x86_cpu_index_to_props(MachineState *ms,
> +                                             unsigned cpu_index);
> +int64_t x86_get_default_cpu_node_id(const MachineState *ms, int idx);
> +const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms);
> +
> +void x86_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw);
> +
> +void load_linux(X86MachineState *x86ms,

Maybe rename it to x86_load_linux()?

> +                FWCfgState *fw_cfg,
> +                unsigned acpi_data_size,
> +                bool linuxboot_dma_enabled,
> +                bool pvh_enabled);
> +
> +#endif
> 

Patch looks good; however, I'd split it as follows:

1/ rename functions x86_*
2/ export functions, add "hw/i386/x86.h"
3/ move functions to hw/i386/x86.c
4/ add/use X86MachineState

Anyhow, if the maintainer is happy with it as is:
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
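The pc.h hunk changes PCMachineState's first member from MachineState to X86MachineState. After that change, the ioapic.c hunk can apply X86_MACHINE() to whatever qdev_get_machine() returns. This works because QOM models inheritance by embedding the parent state as the first member. A pointer to the derived struct is therefore also a valid pointer to each ancestor. The sketch below shows the idea in plain C with hypothetical types. QEMU's real OBJECT_CHECK() walks the registered QOM type hierarchy rather than comparing names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; QEMU's real types carry far more state. */
typedef struct {
    const char *type_name;       /* most-derived type name */
} MachineSketch;

typedef struct {
    MachineSketch parent_obj;    /* must be first: enables upcasting */
    unsigned smp_dies;           /* example field moved to the x86 level */
} X86MachineSketch;

typedef struct {
    X86MachineSketch parent_obj; /* was MachineSketch before the refactor */
    int pit_enabled;             /* example PC-only field */
} PCMachineSketch;

/*
 * Rough analogue of X86_MACHINE(): check the type, then cast.
 * The hard-coded list of x86 subtypes is purely illustrative.
 */
static X86MachineSketch *x86_machine_cast(MachineSketch *m)
{
    assert(strcmp(m->type_name, "pc") == 0 ||
           strcmp(m->type_name, "microvm") == 0);
    return (X86MachineSketch *)m;
}
```

Because each parent sits at offset 0, the cast leaves the pointer value unchanged. That is also why the ioapic_as field can move from PCMachineState to X86MachineState without touching the PC-specific callers.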


* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-24 12:44 ` Sergio Lopez
@ 2019-09-24 13:57   ` Peter Maydell
  -1 siblings, 0 replies; 133+ messages in thread
From: Peter Maydell @ 2019-09-24 13:57 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: QEMU Developers, Eduardo Habkost, kvm-devel, Michael S. Tsirkin,
	Laszlo Ersek, Marcelo Tosatti, Gerd Hoffmann, Paolo Bonzini,
	Igor Mammedov, Philippe Mathieu-Daudé,
	Richard Henderson

On Tue, 24 Sep 2019 at 14:25, Sergio Lopez <slp@redhat.com> wrote:
>
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> constructed after the machine model implemented by the latter.
>
> Its main purpose is providing users a minimalist machine type free
> from the burden of legacy compatibility, serving as a stepping stone
> for future projects aiming at improving boot times, reducing the
> attack surface and slimming down QEMU's footprint.


>  docs/microvm.txt                 |  78 +++

I'm not sure how close to acceptance this patchset is at the
moment, so not necessarily something you need to do now,
but could new documentation in docs/ be in rst format, not
plain text, please? (Ideally also they should be in the right
manual subdirectory, but documentation of system emulation
machines at the moment is still in texinfo format, so we
don't have a subdir for it yet.)

thanks
-- PMM

* Re: [PATCH v4 2/8] hw/i386: Factorize e820 related functions
  2019-09-24 13:20     ` Philippe Mathieu-Daudé
@ 2019-09-24 14:12       ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-24 14:12 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, pbonzini, rth,
	ehabkost, lersek, kraxel, mtosatti, kvm

[-- Attachment #1: Type: text/plain, Size: 10613 bytes --]


Philippe Mathieu-Daudé <philmd@redhat.com> writes:

> On 9/24/19 2:44 PM, Sergio Lopez wrote:
>> Extract e820 related functions from pc.c, and put them in e820.c, so
>> they can be shared with other components.
>> 
>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>> ---
>>  hw/i386/Makefile.objs |  1 +
>>  hw/i386/e820.c        | 99 +++++++++++++++++++++++++++++++++++++++++++
>>  hw/i386/e820.h        | 11 +++++
>>  hw/i386/pc.c          | 66 +----------------------------
>>  include/hw/i386/pc.h  | 11 -----
>>  target/i386/kvm.c     |  1 +
>>  6 files changed, 114 insertions(+), 75 deletions(-)
>>  create mode 100644 hw/i386/e820.c
>>  create mode 100644 hw/i386/e820.h
>> 
>> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
>> index c5f20bbd72..149712db07 100644
>> --- a/hw/i386/Makefile.objs
>> +++ b/hw/i386/Makefile.objs
>> @@ -2,6 +2,7 @@ obj-$(CONFIG_KVM) += kvm/
>>  obj-y += multiboot.o
>>  obj-y += pvh.o
>>  obj-y += pc.o
>> +obj-y += e820.o
>
> Isn't that commit d6d059ca07ae907b8945f88c382fb54d43f9f03a?
> I'm confused now.

Hm... this was pulled on 2019-09-17 and I totally missed it. I'll drop
this and rebase the patchset for v5.

Thanks!

>>  obj-$(CONFIG_I440FX) += pc_piix.o
>>  obj-$(CONFIG_Q35) += pc_q35.o
>>  obj-y += fw_cfg.o pc_sysfw.o
>> diff --git a/hw/i386/e820.c b/hw/i386/e820.c
>> new file mode 100644
>> index 0000000000..d5c5c0d528
>> --- /dev/null
>> +++ b/hw/i386/e820.c
>> @@ -0,0 +1,99 @@
>> +/*
>> + * Copyright (c) 2003-2004 Fabrice Bellard
>> + * Copyright (c) 2019 Red Hat, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/cutils.h"
>> +#include "qemu/units.h"
>> +
>> +#include "hw/i386/e820.h"
>> +#include "hw/i386/fw_cfg.h"
>> +
>> +#define E820_NR_ENTRIES		16
>> +
>> +struct e820_entry {
>> +    uint64_t address;
>> +    uint64_t length;
>> +    uint32_t type;
>> +} QEMU_PACKED __attribute((__aligned__(4)));
>> +
>> +struct e820_table {
>> +    uint32_t count;
>> +    struct e820_entry entry[E820_NR_ENTRIES];
>> +} QEMU_PACKED __attribute((__aligned__(4)));
>> +
>> +static struct e820_table e820_reserve;
>> +static struct e820_entry *e820_table;
>> +static unsigned e820_entries;
>> +
>> +int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
>> +{
>> +    int index = le32_to_cpu(e820_reserve.count);
>> +    struct e820_entry *entry;
>> +
>> +    if (type != E820_RAM) {
>> +        /* old FW_CFG_E820_TABLE entry -- reservations only */
>> +        if (index >= E820_NR_ENTRIES) {
>> +            return -EBUSY;
>> +        }
>> +        entry = &e820_reserve.entry[index++];
>> +
>> +        entry->address = cpu_to_le64(address);
>> +        entry->length = cpu_to_le64(length);
>> +        entry->type = cpu_to_le32(type);
>> +
>> +        e820_reserve.count = cpu_to_le32(index);
>> +    }
>> +
>> +    /* new "etc/e820" file -- include ram too */
>> +    e820_table = g_renew(struct e820_entry, e820_table, e820_entries + 1);
>> +    e820_table[e820_entries].address = cpu_to_le64(address);
>> +    e820_table[e820_entries].length = cpu_to_le64(length);
>> +    e820_table[e820_entries].type = cpu_to_le32(type);
>> +    e820_entries++;
>> +
>> +    return e820_entries;
>> +}
>> +
>> +int e820_get_num_entries(void)
>> +{
>> +    return e820_entries;
>> +}
>> +
>> +bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length)
>> +{
>> +    if (idx < e820_entries && e820_table[idx].type == cpu_to_le32(type)) {
>> +        *address = le64_to_cpu(e820_table[idx].address);
>> +        *length = le64_to_cpu(e820_table[idx].length);
>> +        return true;
>> +    }
>> +    return false;
>> +}
>> +
>> +void e820_create_fw_entry(FWCfgState *fw_cfg)
>> +{
>> +    fw_cfg_add_bytes(fw_cfg, FW_CFG_E820_TABLE,
>> +                     &e820_reserve, sizeof(e820_reserve));
>> +    fw_cfg_add_file(fw_cfg, "etc/e820", e820_table,
>> +                    sizeof(struct e820_entry) * e820_entries);
>> +}
>> diff --git a/hw/i386/e820.h b/hw/i386/e820.h
>> new file mode 100644
>> index 0000000000..569d1f0ab5
>> --- /dev/null
>> +++ b/hw/i386/e820.h
>> @@ -0,0 +1,11 @@
>> +/* e820 types */
>> +#define E820_RAM        1
>> +#define E820_RESERVED   2
>> +#define E820_ACPI       3
>> +#define E820_NVS        4
>> +#define E820_UNUSABLE   5
>> +
>> +int e820_add_entry(uint64_t address, uint64_t length, uint32_t type);
>> +int e820_get_num_entries(void);
>> +bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length);
>> +void e820_create_fw_entry(FWCfgState *fw_cfg);
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 10e4ced0c6..3920aa7e85 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -30,6 +30,7 @@
>>  #include "hw/i386/apic.h"
>>  #include "hw/i386/topology.h"
>>  #include "hw/i386/fw_cfg.h"
>> +#include "hw/i386/e820.h"
>>  #include "sysemu/cpus.h"
>>  #include "hw/block/fdc.h"
>>  #include "hw/ide.h"
>> @@ -99,22 +100,6 @@
>>  #define DPRINTF(fmt, ...)
>>  #endif
>>  
>> -#define E820_NR_ENTRIES		16
>> -
>> -struct e820_entry {
>> -    uint64_t address;
>> -    uint64_t length;
>> -    uint32_t type;
>> -} QEMU_PACKED __attribute((__aligned__(4)));
>> -
>> -struct e820_table {
>> -    uint32_t count;
>> -    struct e820_entry entry[E820_NR_ENTRIES];
>> -} QEMU_PACKED __attribute((__aligned__(4)));
>> -
>> -static struct e820_table e820_reserve;
>> -static struct e820_entry *e820_table;
>> -static unsigned e820_entries;
>>  struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
>>  
>>  GlobalProperty pc_compat_4_1[] = {};
>> @@ -878,50 +863,6 @@ static void handle_a20_line_change(void *opaque, int irq, int level)
>>      x86_cpu_set_a20(cpu, level);
>>  }
>>  
>> -int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
>> -{
>> -    int index = le32_to_cpu(e820_reserve.count);
>> -    struct e820_entry *entry;
>> -
>> -    if (type != E820_RAM) {
>> -        /* old FW_CFG_E820_TABLE entry -- reservations only */
>> -        if (index >= E820_NR_ENTRIES) {
>> -            return -EBUSY;
>> -        }
>> -        entry = &e820_reserve.entry[index++];
>> -
>> -        entry->address = cpu_to_le64(address);
>> -        entry->length = cpu_to_le64(length);
>> -        entry->type = cpu_to_le32(type);
>> -
>> -        e820_reserve.count = cpu_to_le32(index);
>> -    }
>> -
>> -    /* new "etc/e820" file -- include ram too */
>> -    e820_table = g_renew(struct e820_entry, e820_table, e820_entries + 1);
>> -    e820_table[e820_entries].address = cpu_to_le64(address);
>> -    e820_table[e820_entries].length = cpu_to_le64(length);
>> -    e820_table[e820_entries].type = cpu_to_le32(type);
>> -    e820_entries++;
>> -
>> -    return e820_entries;
>> -}
>> -
>> -int e820_get_num_entries(void)
>> -{
>> -    return e820_entries;
>> -}
>> -
>> -bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length)
>> -{
>> -    if (idx < e820_entries && e820_table[idx].type == cpu_to_le32(type)) {
>> -        *address = le64_to_cpu(e820_table[idx].address);
>> -        *length = le64_to_cpu(e820_table[idx].length);
>> -        return true;
>> -    }
>> -    return false;
>> -}
>> -
>>  /* Calculates initial APIC ID for a specific CPU index
>>   *
>>   * Currently we need to be able to calculate the APIC ID from the CPU index
>> @@ -1024,10 +965,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
>>                       acpi_tables, acpi_tables_len);
>>      fw_cfg_add_i32(fw_cfg, FW_CFG_IRQ0_OVERRIDE, kvm_allows_irq0_override());
>>  
>> -    fw_cfg_add_bytes(fw_cfg, FW_CFG_E820_TABLE,
>> -                     &e820_reserve, sizeof(e820_reserve));
>> -    fw_cfg_add_file(fw_cfg, "etc/e820", e820_table,
>> -                    sizeof(struct e820_entry) * e820_entries);
>> +    e820_create_fw_entry(fw_cfg);
>>  
>>      fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, &hpet_cfg, sizeof(hpet_cfg));
>>      /* allocate memory for the NUMA channel: one (64bit) word for the number
>> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
>> index 19a837889d..062feeb69e 100644
>> --- a/include/hw/i386/pc.h
>> +++ b/include/hw/i386/pc.h
>> @@ -291,17 +291,6 @@ void pc_system_firmware_init(PCMachineState *pcms, MemoryRegion *rom_memory);
>>  void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
>>                         const CPUArchIdList *apic_ids, GArray *entry);
>>  
>> -/* e820 types */
>> -#define E820_RAM        1
>> -#define E820_RESERVED   2
>> -#define E820_ACPI       3
>> -#define E820_NVS        4
>> -#define E820_UNUSABLE   5
>> -
>> -int e820_add_entry(uint64_t, uint64_t, uint32_t);
>> -int e820_get_num_entries(void);
>> -bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
>> -
>>  extern GlobalProperty pc_compat_4_1[];
>>  extern const size_t pc_compat_4_1_len;
>>  
>> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
>> index 8023c679ea..8ce56db7d4 100644
>> --- a/target/i386/kvm.c
>> +++ b/target/i386/kvm.c
>> @@ -41,6 +41,7 @@
>>  #include "hw/i386/apic-msidef.h"
>>  #include "hw/i386/intel_iommu.h"
>>  #include "hw/i386/x86-iommu.h"
>> +#include "hw/i386/e820.h"
>>  
>>  #include "hw/pci/pci.h"
>>  #include "hw/pci/msi.h"
>> 
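e820_add_entry() maintains two views of the memory map. The first is the legacy FW_CFG_E820_TABLE blob, which holds only non-RAM reservations in a fixed 16-entry array. The second is the growable "etc/e820" file, which holds every entry, RAM included. The sketch below is a simplified version of that scheme with hypothetical names. It stores host-endian values instead of converting with cpu_to_le*(). It also returns -ENOMEM on allocation failure, where QEMU's g_renew() would abort:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_E820_RAM       1
#define SKETCH_E820_RESERVED  2
#define SKETCH_LEGACY_MAX     16    /* mirrors E820_NR_ENTRIES */

struct sketch_e820_entry {
    uint64_t address;
    uint64_t length;
    uint32_t type;
};

struct sketch_e820 {
    /* legacy FW_CFG_E820_TABLE analogue: reservations only, fixed size */
    uint32_t legacy_count;
    struct sketch_e820_entry legacy[SKETCH_LEGACY_MAX];
    /* "etc/e820" analogue: every entry, RAM included, grows as needed */
    struct sketch_e820_entry *all;
    unsigned all_count;
};

/* Returns the new total entry count, or a negative errno. */
static int sketch_e820_add(struct sketch_e820 *t, uint64_t address,
                           uint64_t length, uint32_t type)
{
    struct sketch_e820_entry e = { address, length, type };
    struct sketch_e820_entry *grown;

    assert(t != NULL);
    if (type != SKETCH_E820_RAM && t->legacy_count >= SKETCH_LEGACY_MAX) {
        return -EBUSY;              /* legacy table full: nothing is added */
    }
    grown = realloc(t->all, (t->all_count + 1) * sizeof(*grown));
    if (!grown) {
        return -ENOMEM;
    }
    t->all = grown;
    t->all[t->all_count++] = e;
    if (type != SKETCH_E820_RAM) {
        t->legacy[t->legacy_count++] = e;
    }
    return (int)t->all_count;
}

/* Like e820_get_entry(): succeed only if entry idx has the given type. */
static bool sketch_e820_get(const struct sketch_e820 *t, unsigned idx,
                            uint32_t type, uint64_t *address, uint64_t *length)
{
    if (idx < t->all_count && t->all[idx].type == type) {
        *address = t->all[idx].address;
        *length = t->all[idx].length;
        return true;
    }
    return false;
}
```

Keeping RAM out of the fixed-size legacy table means that only reservations count against its 16-entry limit. The RAM entries still reach firmware that reads "etc/e820".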


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

>>      /* allocate memory for the NUMA channel: one (64bit) word for the number
>> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
>> index 19a837889d..062feeb69e 100644
>> --- a/include/hw/i386/pc.h
>> +++ b/include/hw/i386/pc.h
>> @@ -291,17 +291,6 @@ void pc_system_firmware_init(PCMachineState *pcms, MemoryRegion *rom_memory);
>>  void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
>>                         const CPUArchIdList *apic_ids, GArray *entry);
>>  
>> -/* e820 types */
>> -#define E820_RAM        1
>> -#define E820_RESERVED   2
>> -#define E820_ACPI       3
>> -#define E820_NVS        4
>> -#define E820_UNUSABLE   5
>> -
>> -int e820_add_entry(uint64_t, uint64_t, uint32_t);
>> -int e820_get_num_entries(void);
>> -bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
>> -
>>  extern GlobalProperty pc_compat_4_1[];
>>  extern const size_t pc_compat_4_1_len;
>>  
>> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
>> index 8023c679ea..8ce56db7d4 100644
>> --- a/target/i386/kvm.c
>> +++ b/target/i386/kvm.c
>> @@ -41,6 +41,7 @@
>>  #include "hw/i386/apic-msidef.h"
>>  #include "hw/i386/intel_iommu.h"
>>  #include "hw/i386/x86-iommu.h"
>> +#include "hw/i386/e820.h"
>>  
>>  #include "hw/pci/pci.h"
>>  #include "hw/pci/msi.h"
>> 
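For readers following the e820 factorization: the moved code keeps two tables. The legacy `FW_CFG_E820_TABLE` blob is fixed-size (`E820_NR_ENTRIES`, 16) and records only non-RAM reservations. The `etc/e820` file grows on demand and records every entry, RAM included. The standalone sketch below models that bookkeeping. It is not the patch's code. It assumes `realloc` in place of GLib's `g_renew`, drops the `cpu_to_le*` conversions (these are no-ops on a little-endian x86 host), returns `-ENOMEM` where `g_renew` would abort, and uses the GCC/Clang `packed` attribute to keep the 20-byte entry layout.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define E820_RAM        1
#define E820_RESERVED   2
#define E820_NR_ENTRIES 16

/* 20-byte entry layout, as in the patch (GCC/Clang packing extension). */
struct e820_entry {
    uint64_t address;
    uint64_t length;
    uint32_t type;
} __attribute__((packed));

/* Legacy table: reservations only, fixed capacity. */
static struct {
    uint32_t count;
    struct e820_entry entry[E820_NR_ENTRIES];
} e820_reserve;

/* "etc/e820"-style table: every entry, RAM included, grown on demand. */
static struct e820_entry *e820_table;
static unsigned e820_entries;

int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
{
    struct e820_entry e = { address, length, type };
    struct e820_entry *grown;

    /* Check the legacy capacity first so a failure leaves both tables untouched. */
    if (type != E820_RAM && e820_reserve.count >= E820_NR_ENTRIES) {
        return -EBUSY;
    }
    grown = realloc(e820_table, sizeof(*grown) * (e820_entries + 1));
    if (grown == NULL) {
        return -ENOMEM;
    }
    e820_table = grown;
    e820_table[e820_entries++] = e;

    if (type != E820_RAM) {
        e820_reserve.entry[e820_reserve.count++] = e;
    }
    return (int)e820_entries;
}

/* Fetch entry idx only if it has the requested type. */
bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length)
{
    if (idx < 0 || (unsigned)idx >= e820_entries || e820_table[idx].type != type) {
        return false;
    }
    *address = e820_table[idx].address;
    *length = e820_table[idx].length;
    return true;
}
```

The explicit `idx < 0` guard is an addition in this sketch. The original compares a signed `idx` directly against the unsigned entry count.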



* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-24 12:44   ` Sergio Lopez
@ 2019-09-25  5:06     ` Gerd Hoffmann
  -1 siblings, 0 replies; 133+ messages in thread
From: Gerd Hoffmann @ 2019-09-25  5:06 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, pbonzini, rth,
	ehabkost, philmd, lersek, mtosatti, kvm

  Hi,

> +microvm.kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)

Hmm, is that the long-term plan?  IMO the virtio-mmio devices should be
discoverable somehow.  ACPI, or device-tree, or fw_cfg, or ...

> +As no current FW is able to boot from a block device using virtio-mmio
> +as its transport,

To fix that, the firmware must be able to find the virtio-mmio devices.

cheers,
  Gerd



* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-24 13:10     ` Paolo Bonzini
@ 2019-09-25  5:49       ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-25  5:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm



Paolo Bonzini <pbonzini@redhat.com> writes:

> On 24/09/19 14:44, Sergio Lopez wrote:
>> +Microvm is a machine type inspired by both NEMU and Firecracker, and
>> +constructed after the machine model implemented by the latter.
>
> I would say it's inspired by Firecracker only.  The NEMU virt machine
> had virtio-pci and ACPI.

Actually, the NEMU reference comes from the fact that microvm.c was
originally based on virt.c, but in v4 all of that code is gone, so it
makes sense to remove the reference.

>> +Its main purpose is providing users a minimalist machine type free
>> +from the burden of legacy compatibility,
>
> I think this is too strong, especially if you keep the PIC and PIT. :)
> Maybe just "It's a minimalist machine type without PCI support designed
> for short-lived guests".

OK.

>> +serving as a stepping stone
>> +for future projects aiming at improving boot times, reducing the
>> +attack surface and slimming down QEMU's footprint.
>
> "Microvm also establishes a baseline for benchmarking QEMU and operating
> systems, since it is optimized for both boot time and footprint".

Well, I prefer my paragraph, but I'm good with either.

>> +The microvm machine type supports the following devices:
>> +
>> + - ISA bus
>> + - i8259 PIC
>> + - LAPIC (implicit if using KVM)
>> + - IOAPIC (defaults to kernel_irqchip_split = true)
>> + - i8254 PIT
>
> Do we need the PIT?  And perhaps the PIC even?

We need the PIT for non-KVM accelerators (if it is present with KVM and
kernel_irqchip_split = off, it basically becomes a placeholder), and the
PIC for both the PIT and the ISA serial port.

Thanks,
Sergio.

>> + - MC146818 RTC (optional)
>> + - kvmclock (if using KVM)
>> + - fw_cfg
>> + - One ISA serial port (optional)
>> + - Up to eight virtio-mmio devices (configured by the user)
>> +



* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-24 13:57   ` Peter Maydell
@ 2019-09-25  5:51     ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-25  5:51 UTC (permalink / raw)
  To: Peter Maydell
  Cc: QEMU Developers, Eduardo Habkost, kvm-devel, Michael S. Tsirkin,
	Laszlo Ersek, Marcelo Tosatti, Gerd Hoffmann, Paolo Bonzini,
	Igor Mammedov, Philippe Mathieu-Daudé,
	Richard Henderson



Peter Maydell <peter.maydell@linaro.org> writes:

> On Tue, 24 Sep 2019 at 14:25, Sergio Lopez <slp@redhat.com> wrote:
>>
>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> constructed after the machine model implemented by the latter.
>>
>> Its main purpose is providing users a minimalist machine type free
>> from the burden of legacy compatibility, serving as a stepping stone
>> for future projects aiming at improving boot times, reducing the
>> attack surface and slimming down QEMU's footprint.
>
>
>>  docs/microvm.txt                 |  78 +++
>
> I'm not sure how close to acceptance this patchset is at the
> moment, so not necessarily something you need to do now,
> but could new documentation in docs/ be in rst format, not
> plain text, please? (Ideally also they should be in the right
> manual subdirectory, but documentation of system emulation
> machines at the moment is still in texinfo format, so we
> don't have a subdir for it yet.)

Sure. What I didn't get is whether I should put it in "docs/microvm.rst"
or in some other subdirectory.

Thanks,
Sergio.


* Re: [PATCH v4 8/8] hw/i386: Introduce the microvm machine type
  2019-09-24 13:12     ` Paolo Bonzini
@ 2019-09-25  5:53       ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-25  5:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm



Paolo Bonzini <pbonzini@redhat.com> writes:

> On 24/09/19 14:44, Sergio Lopez wrote:
>> microvm.option-roms=bool (Set off to disable loading option ROMs)
>
> Please make this x-option-roms

OK.

>> microvm.isa-serial=bool (Set off to disable the instantiation of an ISA serial port)
>> microvm.rtc=bool (Set off to disable the instantiation of an MC146818 RTC)
>> microvm.kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
>
> Perhaps auto-kernel-cmdline?

Yeah, that sounds better.

Thanks,
Sergio.


* Re: [PATCH v4 8/8] hw/i386: Introduce the microvm machine type
  2019-09-24 13:28     ` Michael S. Tsirkin
@ 2019-09-25  5:59       ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-25  5:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm



Michael S. Tsirkin <mst@redhat.com> writes:

> On Tue, Sep 24, 2019 at 02:44:33PM +0200, Sergio Lopez wrote:
>> +static void microvm_fix_kernel_cmdline(MachineState *machine)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(machine);
>> +    BusState *bus;
>> +    BusChild *kid;
>> +    char *cmdline;
>> +
>> +    /*
>> +     * Find MMIO transports with attached devices, and add them to the kernel
>> +     * command line.
>> +     *
>> +     * Yes, this is a hack, but one that heavily improves the UX without
>> +     * introducing any significant issues.
>> +     */
>> +    cmdline = g_strdup(machine->kernel_cmdline);
>> +    bus = sysbus_get_default();
>> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
>> +        DeviceState *dev = kid->child;
>> +        ObjectClass *class = object_get_class(OBJECT(dev));
>> +
>> +        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
>> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
>> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
>> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
>> +
>> +            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
>> +                gchar *mmio_cmdline = microvm_get_mmio_cmdline(mmio_bus->name);
>> +                if (mmio_cmdline) {
>> +                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
>> +                    g_free(mmio_cmdline);
>> +                    g_free(cmdline);
>> +                    cmdline = newcmd;
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
>> +    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
>> +}
>
> Can we rearrange this somewhat? Maybe the mmio constructor
> would format the device description and add to some list,
> and then microvm would just get stuff from that list
> and add it to kernel command line?
> This way it can also be controlled by a virtio-mmio property, so
> e.g. you can disable it per device if you like.
> In particular, this seems like a handy trick for any machine type
> using mmio.

Disabling it per-device won't be easy, as transport options can't be
specified using the underlying device properties.

Otherwise, though, it sounds like a good idea, as it avoids having to
traverse the qtree. I'll give it a try.

Thanks,
Sergio.


* Re: [PATCH v4 1/8] hw/i386: Factorize PVH related functions
  2019-09-24 13:18     ` Philippe Mathieu-Daudé
@ 2019-09-25  6:03       ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-25  6:03 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, pbonzini, rth,
	ehabkost, lersek, kraxel, mtosatti, kvm



Philippe Mathieu-Daudé <philmd@redhat.com> writes:

> Hi Sergio,
>
> On 9/24/19 2:44 PM, Sergio Lopez wrote:
>> Extract PVH related functions from pc.c, and put them in pvh.c, so
>> they can be shared with other components.
>> 
>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>> ---
>>  hw/i386/Makefile.objs |   1 +
>>  hw/i386/pc.c          | 120 +++++-------------------------------------
>>  hw/i386/pvh.c         | 113 +++++++++++++++++++++++++++++++++++++++
>>  hw/i386/pvh.h         |  10 ++++
>>  4 files changed, 136 insertions(+), 108 deletions(-)
>>  create mode 100644 hw/i386/pvh.c
>>  create mode 100644 hw/i386/pvh.h
>> 
>> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
>> index 5d9c9efd5f..c5f20bbd72 100644
>> --- a/hw/i386/Makefile.objs
>> +++ b/hw/i386/Makefile.objs
>> @@ -1,5 +1,6 @@
>>  obj-$(CONFIG_KVM) += kvm/
>>  obj-y += multiboot.o
>> +obj-y += pvh.o
>>  obj-y += pc.o
>>  obj-$(CONFIG_I440FX) += pc_piix.o
>>  obj-$(CONFIG_Q35) += pc_q35.o
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index bad866fe44..10e4ced0c6 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -42,6 +42,7 @@
>>  #include "elf.h"
>>  #include "migration/vmstate.h"
>>  #include "multiboot.h"
>> +#include "pvh.h"
>>  #include "hw/timer/mc146818rtc.h"
>>  #include "hw/dma/i8257.h"
>>  #include "hw/timer/i8254.h"
>> @@ -116,9 +117,6 @@ static struct e820_entry *e820_table;
>>  static unsigned e820_entries;
>>  struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
>>  
>> -/* Physical Address of PVH entry point read from kernel ELF NOTE */
>> -static size_t pvh_start_addr;
>> -
>>  GlobalProperty pc_compat_4_1[] = {};
>>  const size_t pc_compat_4_1_len = G_N_ELEMENTS(pc_compat_4_1);
>>  
>> @@ -1076,109 +1074,6 @@ struct setup_data {
>>      uint8_t data[0];
>>  } __attribute__((packed));
>>  
>> -
>> -/*
>> - * The entry point into the kernel for PVH boot is different from
>> - * the native entry point.  The PVH entry is defined by the x86/HVM
>> - * direct boot ABI and is available in an ELFNOTE in the kernel binary.
>> - *
>> - * This function is passed to load_elf() when it is called from
>> - * load_elfboot() which then additionally checks for an ELF Note of
>> - * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
>> - * parse the PVH entry address from the ELF Note.
>> - *
>> - * Due to trickery in elf_opts.h, load_elf() is actually available as
>> - * load_elf32() or load_elf64() and this routine needs to be able
>> - * to deal with being called as 32 or 64 bit.
>> - *
>> - * The address of the PVH entry point is saved to the 'pvh_start_addr'
>> - * global variable.  (although the entry point is 32-bit, the kernel
>> - * binary can be either 32-bit or 64-bit).
>> - */
>> -static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
>> -{
>> -    size_t *elf_note_data_addr;
>> -
>> -    /* Check if ELF Note header passed in is valid */
>> -    if (arg1 == NULL) {
>> -        return 0;
>> -    }
>> -
>> -    if (is64) {
>> -        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
>> -        uint64_t nhdr_size64 = sizeof(struct elf64_note);
>> -        uint64_t phdr_align = *(uint64_t *)arg2;
>> -        uint64_t nhdr_namesz = nhdr64->n_namesz;
>> -
>> -        elf_note_data_addr =
>> -            ((void *)nhdr64) + nhdr_size64 +
>> -            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
>> -    } else {
>> -        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
>> -        uint32_t nhdr_size32 = sizeof(struct elf32_note);
>> -        uint32_t phdr_align = *(uint32_t *)arg2;
>> -        uint32_t nhdr_namesz = nhdr32->n_namesz;
>> -
>> -        elf_note_data_addr =
>> -            ((void *)nhdr32) + nhdr_size32 +
>> -            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
>> -    }
>> -
>> -    pvh_start_addr = *elf_note_data_addr;
>> -
>> -    return pvh_start_addr;
>> -}
>> -
>> -static bool load_elfboot(const char *kernel_filename,
>> -                   int kernel_file_size,
>> -                   uint8_t *header,
>> -                   size_t pvh_xen_start_addr,
>> -                   FWCfgState *fw_cfg)
>> -{
>> -    uint32_t flags = 0;
>> -    uint32_t mh_load_addr = 0;
>> -    uint32_t elf_kernel_size = 0;
>> -    uint64_t elf_entry;
>> -    uint64_t elf_low, elf_high;
>> -    int kernel_size;
>> -
>> -    if (ldl_p(header) != 0x464c457f) {
>> -        return false; /* no elfboot */
>> -    }
>> -
>> -    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
>> -    flags = elf_is64 ?
>> -        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
>> -
>> -    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
>> -        error_report("elfboot unsupported flags = %x", flags);
>> -        exit(1);
>> -    }
>> -
>> -    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
>> -    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
>> -                           NULL, &elf_note_type, &elf_entry,
>> -                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
>> -                           0, 0);
>> -
>> -    if (kernel_size < 0) {
>> -        error_report("Error while loading elf kernel");
>> -        exit(1);
>> -    }
>> -    mh_load_addr = elf_low;
>> -    elf_kernel_size = elf_high - elf_low;
>> -
>> -    if (pvh_start_addr == 0) {
>> -        error_report("Error loading uncompressed kernel without PVH ELF Note");
>> -        exit(1);
>> -    }
>> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
>> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
>> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
>> -
>> -    return true;
>> -}
>> -
>>  static void load_linux(PCMachineState *pcms,
>>                         FWCfgState *fw_cfg)
>>  {
>> @@ -1218,6 +1113,9 @@ static void load_linux(PCMachineState *pcms,
>>      if (ldl_p(header+0x202) == 0x53726448) {
>>          protocol = lduw_p(header+0x206);
>>      } else {
>> +        size_t pvh_start_addr;
>> +        uint32_t mh_load_addr = 0;
>> +        uint32_t elf_kernel_size = 0;
>>          /*
>>           * This could be a multiboot kernel. If it is, let's stop treating it
>>           * like a Linux kernel.
>> @@ -1235,10 +1133,16 @@ static void load_linux(PCMachineState *pcms,
>>           * If load_elfboot() is successful, populate the fw_cfg info.
>>           */
>>          if (pcmc->pvh_enabled &&
>> -            load_elfboot(kernel_filename, kernel_size,
>> -                         header, pvh_start_addr, fw_cfg)) {
>> +            pvh_load_elfboot(kernel_filename,
>> +                             &mh_load_addr, &elf_kernel_size)) {
>>              fclose(f);
>>  
>> +            pvh_start_addr = pvh_get_start_addr();
>> +
>> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
>> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
>> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
>> +
>>              fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
>>                  strlen(kernel_cmdline) + 1);
>>              fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
>> diff --git a/hw/i386/pvh.c b/hw/i386/pvh.c
>> new file mode 100644
>> index 0000000000..1c81727811
>> --- /dev/null
>> +++ b/hw/i386/pvh.c
>> @@ -0,0 +1,113 @@
>> +/*
>> + * PVH Boot Helper
>> + *
>> + * Copyright (C) 2019 Oracle
>> + * Copyright (C) 2019 Red Hat, Inc
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/units.h"
>> +#include "qemu/error-report.h"
>> +#include "hw/loader.h"
>> +#include "cpu.h"
>> +#include "elf.h"
>> +#include "pvh.h"
>> +
>> +static size_t pvh_start_addr;
>> +
>> +size_t pvh_get_start_addr(void)
>> +{
>> +    return pvh_start_addr;
>> +}
>> +
>> +/*
>> + * The entry point into the kernel for PVH boot is different from
>> + * the native entry point.  The PVH entry is defined by the x86/HVM
>> + * direct boot ABI and is available in an ELFNOTE in the kernel binary.
>> + *
>> + * This function is passed to load_elf() when it is called from
>> + * load_elfboot() which then additionally checks for an ELF Note of
>> + * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
>> + * parse the PVH entry address from the ELF Note.
>> + *
>> + * Due to trickery in elf_opts.h, load_elf() is actually available as
>> + * load_elf32() or load_elf64() and this routine needs to be able
>> + * to deal with being called as 32 or 64 bit.
>> + *
>> + * The address of the PVH entry point is saved to the 'pvh_start_addr'
>> + * global variable.  (although the entry point is 32-bit, the kernel
>> + * binary can be either 32-bit or 64-bit).
>> + */
>> +
>> +static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
>> +{
>> +    size_t *elf_note_data_addr;
>> +
>> +    /* Check if ELF Note header passed in is valid */
>> +    if (arg1 == NULL) {
>> +        return 0;
>> +    }
>> +
>> +    if (is64) {
>> +        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
>> +        uint64_t nhdr_size64 = sizeof(struct elf64_note);
>> +        uint64_t phdr_align = *(uint64_t *)arg2;
>> +        uint64_t nhdr_namesz = nhdr64->n_namesz;
>> +
>> +        elf_note_data_addr =
>> +            ((void *)nhdr64) + nhdr_size64 +
>> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
>> +    } else {
>> +        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
>> +        uint32_t nhdr_size32 = sizeof(struct elf32_note);
>> +        uint32_t phdr_align = *(uint32_t *)arg2;
>> +        uint32_t nhdr_namesz = nhdr32->n_namesz;
>> +
>> +        elf_note_data_addr =
>> +            ((void *)nhdr32) + nhdr_size32 +
>> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
>> +    }
>> +
>> +    pvh_start_addr = *elf_note_data_addr;
>> +
>> +    return pvh_start_addr;
>> +}
>> +
>> +bool pvh_load_elfboot(const char *kernel_filename,
>> +                      uint32_t *mh_load_addr,
>> +                      uint32_t *elf_kernel_size)
>> +{
>> +    uint64_t elf_entry;
>> +    uint64_t elf_low, elf_high;
>> +    int kernel_size;
>> +    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
>> +
>> +    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
>> +                           NULL, &elf_note_type, &elf_entry,
>> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
>> +                           0, 0);
>> +
>> +    if (kernel_size < 0) {
>> +        error_report("Error while loading elf kernel");
>> +        return false;
>> +    }
>> +
>> +    if (pvh_start_addr == 0) {
>> +        error_report("Error loading uncompressed kernel without PVH ELF Note");
>> +        return false;
>> +    }
>> +
>> +    if (mh_load_addr) {
>> +        *mh_load_addr = elf_low;
>> +    }
>> +
>> +    if (elf_kernel_size) {
>> +        *elf_kernel_size = elf_high - elf_low;
>> +    }
>> +
>> +    return true;
>> +}
>> diff --git a/hw/i386/pvh.h b/hw/i386/pvh.h
>> new file mode 100644
>> index 0000000000..ada67ff6e8
>> --- /dev/null
>> +++ b/hw/i386/pvh.h
>> @@ -0,0 +1,10 @@
>
> License missing.

I'm a bit confused about the policy for license blocks in headers, as
some have one while others don't (e.g. multiboot.h and acpi-build.h).

>> +#ifndef HW_I386_PVH_H
>> +#define HW_I386_PVH_H
>> +
>> +size_t pvh_get_start_addr(void);
>> +
>> +bool pvh_load_elfboot(const char *kernel_filename,
>> +                      uint32_t *mh_load_addr,
>> +                      uint32_t *elf_kernel_size);
>
> Can you document these functions?

Sure.

Thanks,
Sergio.

>> +
>> +#endif
>> 
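The note parsing in read_pvh_start_addr() boils down to skipping the note header and the name field, padded up to the segment alignment, to reach the descriptor. The following is a standalone sketch of that arithmetic, not QEMU code: struct note_hdr, align_up() and pvh_entry_from_note() are hypothetical names, and it assumes the host byte order matches the little-endian x86 image. It reads just 32 bits, since the comment in the patch notes that the PVH entry point is 32-bit, and it checks n_descsz before reading.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ELF note header: three 32-bit words in both ELFCLASS32 and ELFCLASS64. */
struct note_hdr {
    uint32_t n_namesz;  /* length of the name, including its NUL */
    uint32_t n_descsz;  /* length of the descriptor */
    uint32_t n_type;    /* note type, e.g. the PHYS32 entry type */
};

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: return the PVH entry address stored in the
 * descriptor of the note at 'note', or 0 if the descriptor is too short.
 * 'align' plays the role of phdr_align in read_pvh_start_addr().
 * Assumes the host byte order matches the (little-endian x86) image.
 */
static uint32_t pvh_entry_from_note(const uint8_t *note, uint64_t align)
{
    struct note_hdr hdr;
    uint32_t entry;

    /* memcpy avoids unaligned or type-punned loads from the buffer */
    memcpy(&hdr, note, sizeof(hdr));
    if (hdr.n_descsz < sizeof(entry)) {
        return 0;
    }

    /* descriptor = header + name padded up to the alignment */
    memcpy(&entry, note + sizeof(hdr) + align_up(hdr.n_namesz, align),
           sizeof(entry));
    return entry;
}
```

With a 4-byte name such as "Xen\0" and 4-byte alignment, the descriptor starts 12 + 4 = 16 bytes into the note; a 5-byte name would be padded to 8, moving the descriptor to offset 20.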


* Re: [PATCH v4 6/8] roms: add microvm-bios (qboot) as binary and git submodule
  2019-09-24 13:31     ` Philippe Mathieu-Daudé
@ 2019-09-25  6:09       ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-25  6:09 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, pbonzini, rth,
	ehabkost, lersek, kraxel, mtosatti, kvm

[-- Attachment #1: Type: text/plain, Size: 14041 bytes --]


Philippe Mathieu-Daudé <philmd@redhat.com> writes:

> On 9/24/19 2:44 PM, Sergio Lopez wrote:
>> qboot is a minimalist x86 firmware for booting Linux kernels. It does
>> the minimum amount of work required for the task, and it's able to
>> boot both PVH images and bzImages without relying on option ROMs.
>> 
>> These characteristics make it an ideal companion for the microvm
>> machine type.
>> 
>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>> ---
>>  .gitmodules              |   3 +++
>>  pc-bios/bios-microvm.bin | Bin 0 -> 65536 bytes
>>  roms/Makefile            |   6 ++++++
>>  roms/qboot               |   1 +
>>  4 files changed, 10 insertions(+)
>>  create mode 100755 pc-bios/bios-microvm.bin
>>  create mode 160000 roms/qboot
>> 
>> diff --git a/.gitmodules b/.gitmodules
>> index c5c474169d..19792c9a11 100644
>> --- a/.gitmodules
>> +++ b/.gitmodules
>> @@ -58,3 +58,6 @@
>>  [submodule "roms/opensbi"]
>>  	path = roms/opensbi
>>  	url = 	https://git.qemu.org/git/opensbi.git
>> +[submodule "roms/qboot"]
>> +	path = roms/qboot
>> +	url = https://github.com/bonzini/qboot
>> diff --git a/pc-bios/bios-microvm.bin b/pc-bios/bios-microvm.bin
>> new file mode 100755
>> index 0000000000000000000000000000000000000000..45eabc516692e2d134bbb630d133c7c2dcc9a9b6
>> GIT binary patch
>> literal 65536
>> zcmeI2eS8zwneS)hF_vth5y2#;brv;O^x_6mtAUMO%tn5}IEf)jY``G|XbMBqa_^-@
>> zBBUXSyplo3+VMx9Ci}VFwz~@rx!t?By&v!0UdNb=V<BQ0NE-8!l)yH1LPAE!VDnNi
>> zCSmUHnUQRVmuC0>GyaU`%sJ0_&U3!!Ij`fT?33lo8bTirhX$S6(dmK^Nu2qLAPZJO
>> zd+n=&@XLYwWFdS~Zh2c2gidFUEU<VyB{hF24C^}E=jl<HQ(<+MP>-}gkOUzx)FqJ6
>> z;W433mmmmAY+Noh;tibd?18@VxCH}v4GeX<kXL!tiyU!Hnn`6SuU6r$bB&SE_=SXJ
>> zc+)=1$Iq|sgvbt9s)=?%V2RL(E{A7Ar52#~B&%`TJKuimt+%dx%KD*M-I%M^1gEP~
>
> Now that using Docker is quite simple, I'd rather add a job building
> this and commit the built binary, so we have reproducible builds.

I'm not sure how we can achieve this. Are we already doing it for some
other binary? Could you please point me to an example?

>> z3rrTeoTTV}=y-L<H(`6Jx<%^VI8zpO=OW?aYwDJw?(oFd+1)>#`0DNc_4sRW0TC1A
>> zmJsrGlX|tHBmSvHZ7h?b^=>;mu1$tb(P>mvG{5}(=7p;C?X<Nv)KiF;vS^dFph*f0
>> z|KM_=_+GTu{^~BsC2OI`iH8;Xgy<4`bd|F?E`U$ysLqzy*(zuJjHIw>{{UgF+=c4I
>> zP{`<|^mTPhV|UNEdEHc{)HAxSYaiW>P|0;&C$X5aR9UVpQyP@Vl`hhv$gdx{Z+-NR
>> zwvW~;(eFuX@w<QuU$3wVw^IK3ra9{s)&CnoiJ!J0%~y=q9~UYm@2P&Y>tUs8?fTZ0
>> z-^y7ZnXZ(N2F|Wml7g>taRZ)SYa#R~3)d_2XS+8|g~IPiN|W-WvPxO4Jf$R*7zq(M
>> zVPZ7&u2-7NT$&*G^Vf&!N<}4sxYR-&zO@e~z~qj%l+aa&|1SJK8kh_<mPg?Ph8*ct
>> zx<+yY;hYhS;T3-|U%TtHtLYu}w_i63jI9sWyCrer`&zejU6FSvla#+O5G_?20qHTI
>> z@+oXU(Ov{h!_X&`70OD~fa)<r$y4N=@1QQhFU$Y+KbwE#7xIotf3bYo(#FRhYw)oF
>> z?fGIsXnOLA6)T@wwR%RLyz}o5Wo%bFs0P1iIT@I~!0oGks67~Pcuwvn$LZRE?$a*(
>> z{h{{eadET88UP82Cf}S~Nfy9}`gHKyg1<oBEO>(z79nsokwAD^1AC7pf@Oj~FIF9#
>> zF9b$C2U-gqk-~z?@R7iuo?L~z4ZVWkIiSQ^i}NGJ*2?fn#8cjeR;%3om9j(r$>}0*
>> zmHG01U~>3C;Jl}&<hUUtcgHCdOXn#uZ}@;euhm+1IPj;0rza63dsZ*SFvX5~bm(K(
>> z7X>R|!j%@?|5s#Espmjtu)+#k%inqSKe2u4Y^$dyZBy&S@?QTm*4Jv!DYIMrLsd*G
>> z=`T-i{>XEL^*_04^-~G9FB2d;TMqbVppb+**NV)Wh3cyEi~fAMBS-GYFX{6SqmnR3
>> zi7j8o-YehtBPZl-;$fEDRenBjZPn_8`k0QW%duAboe{h9;n1k=PmPBITKXgysm0cG
>> zff9plWlq19^_3qFT=eu943;%0F}d3r9Ci6~gQK=UjyF9V9G&A|6db+Rf1l@a_=w-<
>> zIgp;B+L?Gjt)J5Gg)}0q>Wcp0HQVM-TQ2)8SKeoRFckXbZl8$MyIG&-ayp6njK|qn
>> zUpF;;x*hu7L6uw4aN!=m!{{NE=USzLa8KY0G!Yj<9~vxX3HENZ(Onw#yXUrCeraCo
>> zVgU^`gK6GqgVb`wZ;#Gbmy2v_MAjdX^lEFX6)k+K<#J$AXn(Om8@eKmZkcJ?-+r#^
>> zCA}?|Tk-m02jZiTNML7==D<i+5OJC+mIOXEN(uZdy$8OjX-^@a*!$f7jzz1bmLC6P
>> z$oo)aui!E>)CNvx3sYjI-Lq-*XZ<Zl<Oq$)Z{QZ>a2UU;-+c$WsKoy6%2oD<>eine
>> z!#Ejf&|7)}XW?0sdUNKePg&pstJ#T?3+&za*%{)yhd+P##k*mz+)*)!?YnZKwSKA|
>> zu}@-GBM5lQwXLUnbA-^KRrkB=>1G$AHSS_<H^$k}{tAOa?8@;s2!3iSdOq3ETjRc?
>> zR9)$wnodlx*{W`JPX*Q&*ij0x>ujsr9%IKnN3NV_f2z%xyG&<)WQp>tn@_V;6awYf
>> z{q=0PWP~N+=^0|;@HO^_x)<+6Z;;S0r?eL5MGEsG)4dPD(&64oU$Ar(mKJI91WQp*
>> z*fzBkMi!<IXKbrL`}@Gu{V;2E&q~&~XA{xZDd8h>_#i^;Q2JgahPiGQ8gvStZuJR~
>> zt#rF1Q*=b?$N+zvO5*<=;%c=B9T?OE0Z$h_QD(6#I65=Xr9N+IZ4eRkS9#6`N53nF
>> zgGh%{`-7vUa`;ocs8#P&SmkZ6Ac(#qhoeSdgWU2t0t?xQx<zb7mzo*qiEe;NXZ?4e
>> znr`x%Mz_0Hm^o(CHN&Qs4b&*`M+jj!!(+BL+i)zAszGW@t(BrHq3fi7YSe)q;Z|3e
>> z70e$~*2}J?iVWFD_25hT9RCm4JD#4_8EQj6M$u8*vvNr?R^2MiP<{P)-F%7lF{}=>
>> ztJi2*MGFO0nhpzHsS48`Vp6I$SzypCIQ|r)66h+xi_VhantBb5r^GsquKknX=-R+M
>> zR5)+3-109(YL#R<<}5hotld~R3DG;%8uwi7=x5}ePWAz;ei|y+wczL$|D53HM*s5Q
>> z=%zrc`r^c_nOvX1R?2lfbszGepq#~lx-UxZrlnOz{2p{&Q(UKz(M1ePy8b6tOokmV
>> zLn9Qdcp}|RixRp+gLWhp<KZKhBHF0BOWSB@D^4^t4<64$-NNpV@i4^x#yyN+S3gD9
>> zxSiTUh{EF#MDZBprm6LA(HQ8^&gUvyD|KV6JKnsXG+_>+axvgJlad?eTee+@?{jXu
>> zfYy+<4a9q#+Xg%bFc#Lkt?+6)8^}a<6=7{Pl(xY4C3hw+G;lp|9;H5+Abs%xXQ(ef
>> zlf)pDipQPXQqUx2%7Egdw^J<MQg3E#-s42!M!T~U%24+dys-;|a>oenzQOp1R=J}G
>> z#oOYxtAp`@B3qnI9t8DHEcgY=VF>p@4)U7qZI&7f#y_FB9^0F2j)oj89X~b6STSQI
>> zJGB67=8qA3IhV9q{PWleZJ*%`FMcgLjZ#$m)L7J#t+PdmS4q~;K5zNqKq{1(6=GPR
>> zzt1jQ+&}|ddcP9G9Na5+7s=gK*0Mxk2BxTKvC!2A{4?b4l@75|?ykwFgh^OnXr7ZL
>> z33i+&JdDZR31b~%GO@GXorc9VZcG&~Hp$(SA~o)u=mIh;l(c%zVi2>ct4HK+8P1VY
>> z))%OjyZbRz-UVHukq7*D$=!{UL`<^vIo7g+DDMc$J5q8GgR;CZl=RNu;Fbd2d(lC4
>> zriJ#~jnN$*YE7SvVGH7y;{%g&a;dneK=#pEW<1I(by1@!Ly6_^4fgx76!~?plm%+n
>> z5{9EYS7Y6gk?*3`y|@8xwK@?azsl16k9t%NY`T@Nl08`i3bYI6;DEBwPKPHJZrYud
>> zoU9dPO@-cD*-GuwJh;JyN!V~#+6S;vWoVD#t|#DT!#BC>`HZ{PyBj;<ZH7GiGSQWt
>> zv{53}H;QWUPn@=t?ff8nGyX}D?d{IHZX=lKEx$90?-kE6zq?5e|APg+tSFNuQ*pWF
>> z6~tnSUb9<pQExIFxfnxFh1P|Z3irq@wAgBf#7hh7Yvu43$afnA3^OlGByigfD~F%Q
>> zo~RQe6nb{1V%i`)+74#uIPp)jeLQI!GOSK!4GjMya<b9RDafmB!We7f&*I!`;6DR3
>> z$8W;_zKLHBU&qd=lco%Vswtc)+>gRfXAAGOT^wY+@zX`N53<F#_{i`(vU#ttGbSPI
>> zYF`40owUQ9L)(<Nmd|Rf$y#GUk=Y@C1M^Bvw6-*~Pcqj2NrAj3njc*uu{w!0S)&hI
>> zqbuyJ&d!>g(TAO^u5eMPrzo_quzV<RM3o*Clkrb;*o(6}X)2oNm8k0k*vI3ioVNGb
>> zA=|aVNId>wk?s9XOZ#u`Wf%1iwZW^pj&5Bay_-h4&?!svD7EzF;^s5-#C%kdOM#Y?
>> za@Yk<+8fAV&^B6`nhQ&_8YH*uMTN7?oyxijCbmWhc*lVK22!2VV6WNICN4;=#ENPf
>> zB<oC?h8s-qwi<X_^M2}I<~p;BjDjDo(e8smsqdqE#*(=LaBq3@zuuqBl@Jn9N;21q
>> z5M9Y!`&ejhCS-c5(O3U{o@&kx=6dE2<auvqDn`p1Is6S&j_ot5r=1?b&^f0FB_(r<
>> zGv)0izcf6QX@7on=(F{j-P8Y^DQfg~?ItHqWqbP=;3KH^xJM4L^F~u_a2IEnpy=*y
>> zgl7>HT7SIiG;A!)*l2Phb~}x8987b92)D_ZSoaU%3bdVOzo&{{{O9OoL)L>`+Wn!p
>> z#QN3ZzZ5U3#V4ZW(Pt!9hI5eUbJplHcDBXJzGsg<=VV))p?D$OxjYd#!KTeJY?z&~
>> zfz2yL=s!A+^!Xn!dcxQkGNBI)$4@(L=h^RU*E4<K{2p=^qa&ajd~zB7!&2FS<QrGe
>> zD>9yoIC59*PBRXEDu<teyOFOVjrzVeitMRI^1%u72&Vkv)M3mpMt7O$?y2P%sIZnD
>> z>joIhO<zwPgc5ea)Vhnb{vTR5;#}Z7J7dPSQS59iL}EC8Y&+i8(>Bniwe>5W9{HXg
>> z<;*qm=Ha%JZHcy1ZRxh7>{uw_YVawZlM2o+JVz9du6WKKYVQ0xe6?=vj~e8b9^|0;
>> z>oFzzQ}W>GJQ}vt7j=f?u;y#JC~w~hkuG6}O}N)r6#8fwn`OJZ868i5U~}$ndO*%%
>> zk;7}@H#p&{dGJ+*`D(ib^ude<8&a@8wFyz53;WeD?Nv(-XAWBru?UW11Qzq+SMG>S
>> z!-IrAfROsI82`}5>YUNK;*!G{16h+D4z&?}B(;kxjNw3u(fZ?Za31oBflXhq)U<5d
>> zBe2r2$2211gXny_dvgwI!eaG!>kZDuZ#(1#i@a?q<bx4X7{pMBhr(iYc$nt5R3h1P
>> zz;iNHg#g+IxkSsem0u`%jM1|XeSwM71d(n-zTE;Qo-T|eLTWlR#SClx-`tC}xd3ww
>> zQ4#3znpy+_19rN7ye6p!5j$2=rZsMv{Z7)4W>z=4cjF9Ph?ZS^Eox<l;%O=BlS&ST
>> z&N$Buz<Y{}!&RIgMklYqsX<Ac^<6pfwf<M>CBfo5*W#(xICY8?vDl{f1nn!9n&<6<
>> zv)Q`-&R=6X(U$xP-rTDlqDLY$5G>h`BRqR_UB9~Sk~xlJb^H+<*4~9gJe}wNn3fjt
>> zQ+evv;tKY{TPzW+-X9vV_&3Sh$rrFY+}BzyMHU{zc}#VmcJ7j){|BT$9=eW`@gUrc
>> zjfQmRrrgYU6@1~Cg)KfNho>M(n*|6s7?b|57cL%Md<<njf!v{VaQ;(o$>Hx1Orywv
>> z+FD&f1wCJ;ZtBB^;IC0uw6KNpv88?ForP4^u=dP&0>xg3o#JS6;Q*dOa^c7HkUJJ0
>> zQ(qXsF=PDVu4tgoql<xoDKC7?ey;`wC@%Fp3J#6r$Su?_qK{bF;y#=_v8BhN_fti4
>> zsf-Gv@D|kaXAsf?h%4`vLn`H1daQh(y#22cEa{If?kkT!`yA^RM=+`W7qrphW9axX
>> zuiR0miwEZ~JQlb#tMGwT9?I12r9E+S0+4;EQY}>PrUhKrUNj>c(e|dX3&vva7^I^n
>> zyjNWyk<4}Dm#!2Y1EC)tN$=8zFo?l_XiTBCzjdad>-cnmo~(X9uX_Z?_c7Kt^E2eK
>> z9iNQfAiFRmcN}g$_!*Mg?>|>r{&wI4Hgbam8<9KuGcu*;GX0*)*i9Nc4Lt{k_K*&C
>> z;_QNL!3XY;!)~+@f;}aE8%`qWSX7C<v1SBCj<AD6?fLA)IU}_9b1FKYoiSg#uhVM*
>> zyI{V)p_{Lx=Q_Vfc{4dDyqp<Np<H7r1q^w-W}Z%Q5}7vCS(DKbMCrR@FnmW06mf17
>> znMQ$D<F~9|4xd9BDu&NB?T9alquWz!UEI|f)DcPJ10sXu(}--Ja@uuzN4xd(*V;Fp
>> z-q|kUX~9!G{Wbhv!<&A3r>Cc@AEV8=c6--h<0+c`uzQUa^%_Ra-Y{0eBLs$Cy9f*$
>> zF<;E}3$dX^Nk)L!Fb%aVFvyJ6GHAZHi%``W%O+xxPYC)mi8iR&{RFC+8ZrEb9rW}V
>> z`@)ojTG4rQ!=CZxJu$S~Xx@X=ykl)j`_Jw*<VM3#?a_?}?OaO{Cx(m{BY}<(+NtTN
>> z%5a11nO|@2-vO?k%zXt9y5x@BpCO1Xz-;;TdA~pqOD0$A<eySxy?YKk8r$nxxUtcp
>> zF#zXjSWp?~d^pScDjo7o*Y}dy3Uc0uMYIz#cF0H&7rLnqIAycPPgz-}1DwGz1#Q|3
>> zy-H|Al(+4|RD(1K2HBaRE<#QlqOTyXg5t2fr+8`{S`&s3$Sozvav36`{~nY`$!{1f
>> zHL9Ft@k|Ws3-wuyZO-w@=-;G!w;^;;IjbxXMgMwbfix*l(R>Iy!5}-1betc|rhY(S
>> zEcgzZA){m@O+SYpJ1m=y!_Q)>y%^exYRcjNftpg#Du;iCAd)5A_Ee71uBZn#EmVHz
>> zFd@7!gcwXdM$9EzT^ccl6_D9DCIiZbjNg<&y$azA&W-*&Jf{3G(A<2;n>Bs6fUbel
>> zN7_t~`36STrwo}Uwmbz>;5)&sZhRJwtiW3=gtoF$gO#IE<(5ivO04wmgUnZPw7v&5
>> zp<4fvTn2~5#cUpVP3ztNgx{9S=Ey8})VrtW;6#u0!Br+3b$GLj+#%CogsVDY<cZu(
>> zuEucM!ZcGPa@=5hF8jWh+NWK|$P`cIutT9!GsHhCYyr8Y_H7zyqei$S8)@2tDo9y<
>> z$ht&sQBb|?X)n(4Ob@}?813QTgdW7$<62H=U<OQgp}unj|3bCis)s(b$bTCT#q%?H
>> zA^>_W^%G<Py9~*x;B*7JNf1c&ARgyMMK5)U@icu4!I_TLn3#=~y7n6^s*}R%`@gjs
>> zpGEuWLpZ*n3ZmMSpY+l=g!gI;Q`l<Y+uJwtBX#*z<wpaTL;I~b%f!ei1T$QzOPOQ7
>> zzt|WX0<(K_sl(E_sdk5P&SOtE$IM8PB0tB<I<((zd@-iw3(n>q$AZJ!d^I0O%5r!e
>> zeD2vz%~!3)+jhJ)U$r3|crd!LIO>(gCbkRB9~pC5zcH7Y2C?DLfMv2VSnX5J%CGmZ
>> zYHR6PR?vh~yymN-p>hZ8SDUXdp_yOY{7~uH;M>itAib#hs@+KcHPV}}PNpK~Pbe?i
>> zP<)F5qt(_fDsCwKQ&O~z6@(s{Et<s@kcu7u9u;y2lpE9Vxzu9n4@q+u?UkGuU9^Ax
>> zNY~|#AUQkgldQJmTdXTukh%w=@To7y0xQ{t&_@=t3}q;#!9l*%Fr#8;QKGTPh<&NN
>> z+)-VeVUNL)(5HeZht>*ccpM3QW&>K|NMI}^Ahde1Z#9k%5OU&HOhHlFOKF}Se7{^O
>> zi2VzjKd02d`B}O7cy<-y$N1ntyWxxfE165FzUsxk-rPG8l9IVacsy41=%Y9XK&a8>
>> za&@{8=>8ZB=iL~_@LwDkUrXlBMVufET<^qD$Bm~s6Hc$jcCsG~GMxVu87JfNFKt(_
>> zTa()!L^E~!pvj}tbOTK^xL9$fKf>N+e~hWi29vcT6rHRkrxki@2ZczL*jr^O!~!qe
>> z2siJo$`RR*CH8%*y5y|qi05QF7j>UfDIP8VUEs2~(k*#Iy@lnHJMb^~?qrQhZnZcV
>> zKV?z(RG?t>Uu*CiO4KXKJW)t4vOLsK=~9x5Cb?Yrgo?vy!(beIOLDiRN{pBPK5!W$
>> zEdDW}cao)5a(g3^J5nnu$y!UHX#96b&vkW~80xilqHg$>o6f^HlF9pC;Ib;om<^>)
>> z3&#2z&cMtgSN4u^euyssPTN%+kJ2}SK9=+xQA5<Q2A<E80yRXhR}vz=+*Z3gQa35$
>> z%Ts?W;*8FQL-9vKqZa>M^<~khAvsVdWq>ur#;6Bu5%IZ0t+rzO;df!Z8#MHfrhlo#
>> zz%(cID>xx+@Ac+c(!V8i$q;FhpvkBxhJ%-FMginQoz!|0I6=S3DH)<<T1ptXbWweI
>> zijj*PEt_i+eQzJL>uM-rRoLC+t2igf`Y|<E;Jny3EB7vhFcunX!9DdTx0@RA4HC$@
>> zk#XjLl!+Jq^MZ!h8!zOd3kC{Az_<)oq-`u+G*HrIzL`l`Jw~of3jVRfyy$?_h%5Dc
>> z>;Ad({_=sc^6xzT-L>nK(g_#I(CG(V;*TE}#I08Gt9D6>K&20HSUL!oPU`wj5~y-m
>> zTP#%$`}UkFhjW`$<tPm0Rym3+oAVuoRSAc)s@Gw!N;<4nZ#tx!f=6zSq{X)&Y{Xsj
>> zZ@Nxr?Q~IG7<V1wJQRWJb~cecbl)zWbWwWUA9S6Lis*9TF2!AsxO?yvDjn8smP&^`
>> zl(Q<~uvYduBy8XXp)n(4<bzDe(vU!sk)LcPo%iBN`9qe~`ADoOs5L~4I=q4;A7rMI
>> z46Zud|Ad?3)=@+)@k6=Wb1I2nCGoQ?fgUGpuvdayCCF74EW9BGSzWj<;obmUU&m8e
>> zI97)zvvqJIx<~|H$b6e*VCQx!Z(B}delL_KNjV)>2!a=KVaU-3KeIl|jf#+tm6^0H
>> z2;Yt~=^)O>pyO^lO&u@=ynJ{q!+jiS#y=s!j+^RT?I?W#Zc}+fRci_?>u!!k+d<#o
>> z;I8;U*Z&nyD(@CLy_SHZuEtSc<M1uHSxuG08_I35Y?HR<kiLJzy#xAw{^#`l;Lqr*
>> z8Q15Bj@jTg2z`F)qPMosXSsBXN>>(G-gGq8I9Ap;)-*VLYqH@b4qsz7jQEH1voDW3
>> zR#rNis?e6y2D?zMHlXV+goSE{bCILC(&1{PrVwW1-k(Wtpmf+3veY_m*RI+AWB*Xs
>> zDL-8<+|++QLmAYzrjBZK{D<LGrvJMT-H3Y@@-^W(F7x=9Oa-FJr&dDdNJi!sZCGk~
>> z{T7T+S*lAX?qTHrt4#h(KmH*)k<pQB3Tm?5L7?(9lEVrtl##zhc`nNjZ&4mP3bB|4
>> zb~VU9ajdF9Tpi-F3}->kmx-I7DQCL{eAeWfovymzF*9JY+zQzy<R@O^oR*AnB7Hj`
>> z6M+ncNygCkSVmt}0STiw!#*Ux1=%oUzd2F1AtPHeUbYwMBN@G$aI8XlZ{ktw2v+8-
>> z$;sC3#yOz|*~sTEX}W#|x&+khCLNV<jo;qf{T4Zjvfa<nu@>2HS5DRs+s~MszfA78
>> zkjwfj`3d>!F2qe9>x)&iRW`00>oga!RHyKuu0Kr@x8h=1al=Suj!BIWZ%4ilh{dh)
>> zegDScy}BSr7H^ECu565%yYTd$({(!DG27i3zcF8gB${z<R|oQSs>2%O{UNQgZe>fg
>> z!<Tc;atmj#E^r~sO5CrU*Y!v6r2HZHu+xHxCez1Be-QWWY@jipWCo#QAj2cmKViS+
>> z>3qy_x650R$s4<<>(#fn-?h&F-EXcd`&Ow?x<#O{|2t1_ST|?GfBVkbbw3gwZ>Vwk
>> z8XtE-7r!_GPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(8
>> z6W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;Z
>> zH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULas
>> zfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O
>> z1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1U
>> zPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu
>> z-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m
>> z0Z!od1Ux-$$J=_^2HLc?{{JUzm0dl`OkLOi(JspO^xUV&;+-j7IrD2oSp}ujDU3^>
>> o5d@0NUb>FZ&)-2Do-dnE`R8)xT^9bc5QmajO4XIv_+RY*0|l%Fi~s-t
>> 
>> literal 0
>> HcmV?d00001
>> 
>> diff --git a/roms/Makefile b/roms/Makefile
>> index 775c963f9d..47eabc8633 100644
>> --- a/roms/Makefile
>> +++ b/roms/Makefile
>> @@ -67,6 +67,7 @@ default:
>>  	@echo "  opensbi32-virt     -- update OpenSBI for 32-bit virt machine"
>>  	@echo "  opensbi64-virt     -- update OpenSBI for 64-bit virt machine"
>>  	@echo "  opensbi64-sifive_u -- update OpenSBI for 64-bit sifive_u machine"
>> +	@echo "  bios-microvm       -- update bios-microvm.bin (qboot)"
>
> I'd go the other way around:
>
>         @echo "  qboot -- update qboot (BIOS used by microvm)"

I think it's better the way it is. In all cases, the target name specifies
the binary that will be built, not the name of the project that provides it
(the targets for SeaBIOS are bios.bin and vgabios.bin).

I also want to clarify that the name bios-microvm was suggested by
qboot's author (Paolo) ;-)

Thanks,
Sergio.

>>  	@echo "  clean              -- delete the files generated by the previous" \
>>  	                              "build targets"
>>  
>> @@ -185,6 +186,10 @@ opensbi64-sifive_u:
>>  		PLATFORM="qemu/sifive_u"
>>  	cp opensbi/build/platform/qemu/sifive_u/firmware/fw_jump.bin ../pc-bios/opensbi-riscv64-sifive_u-fw_jump.bin
>>  
>> +bios-microvm:
>
>    qboot:
>
> or
>
>    qboot bios-microvm:
>
>> +	$(MAKE) -C qboot
>> +	cp qboot/bios.bin ../pc-bios/bios-microvm.bin
>> +
>>  clean:
>>  	rm -rf seabios/.config seabios/out seabios/builds
>>  	$(MAKE) -C sgabios clean
>> @@ -197,3 +202,4 @@ clean:
>>  	$(MAKE) -C skiboot clean
>>  	$(MAKE) -f Makefile.edk2 clean
>>  	$(MAKE) -C opensbi clean
>> +	$(MAKE) -C qboot clean
>> diff --git a/roms/qboot b/roms/qboot
>> new file mode 160000
>> index 0000000000..cb1c49e0cf
>> --- /dev/null
>> +++ b/roms/qboot
>> @@ -0,0 +1 @@
>> +Subproject commit cb1c49e0cfac99b9961d136ac0194da62c28cf64
>> 
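The commit message says qboot can boot both PVH images and bzImages. On the QEMU side, load_linux() in pc.c (quoted earlier in the thread) tells the two apart by magic numbers: the "HdrS" signature at offset 0x202 of the Linux boot-protocol setup header (0x53726448 read as a little-endian word) and the ELF magic at offset 0 (0x464c457f). Below is a simplified sketch of that classification; enum kernel_kind, LINUX_HDRS_OFFSET and classify_kernel() are hypothetical names, and it omits the multiboot check that pc.c performs between the two tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum kernel_kind { KERNEL_UNKNOWN, KERNEL_BZIMAGE, KERNEL_ELF };

/* Offset of the "HdrS" magic in the Linux x86 boot protocol header. */
#define LINUX_HDRS_OFFSET 0x202

/* Hypothetical helper: classify a kernel image from its first len bytes. */
static enum kernel_kind classify_kernel(const uint8_t *hdr, size_t len)
{
    assert(hdr != NULL || len == 0);

    /* bzImage: "HdrS" at 0x202, checked first as in load_linux() */
    if (len >= LINUX_HDRS_OFFSET + 4 &&
        memcmp(hdr + LINUX_HDRS_OFFSET, "HdrS", 4) == 0) {
        return KERNEL_BZIMAGE;
    }

    /* ELF: 0x7f 'E' 'L' 'F'; the literal is split so \x7f stays one byte */
    if (len >= 4 && memcmp(hdr, "\x7f" "ELF", 4) == 0) {
        return KERNEL_ELF;
    }

    return KERNEL_UNKNOWN;
}
```

Comparing bytes with memcmp() sidesteps host byte order, whereas pc.c loads a 32-bit word with ldl_p() and compares it against a constant. An ELF result only means PVH boot can be attempted; pvh_load_elfboot() still fails if the PVH ELF Note is missing.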


* Re: [PATCH v4 6/8] roms: add microvm-bios (qboot) as binary and git submodule
@ 2019-09-25  6:09       ` Sergio Lopez
  0 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-25  6:09 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: ehabkost, kvm, mst, mtosatti, qemu-devel, kraxel, pbonzini,
	imammedo, lersek, rth

[-- Attachment #1: Type: text/plain, Size: 14041 bytes --]


Philippe Mathieu-Daudé <philmd@redhat.com> writes:

> On 9/24/19 2:44 PM, Sergio Lopez wrote:
>> qboot is a minimalist x86 firmware for booting Linux kernels. It does
>> the minimum amount of work required for the task, and it's able to
>> boot both PVH images and bzImages without relying on option ROMs.
>> 
>> These characteristics make it an ideal companion for the microvm
>> machine type.
>> 
>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>> ---
>>  .gitmodules              |   3 +++
>>  pc-bios/bios-microvm.bin | Bin 0 -> 65536 bytes
>>  roms/Makefile            |   6 ++++++
>>  roms/qboot               |   1 +
>>  4 files changed, 10 insertions(+)
>>  create mode 100755 pc-bios/bios-microvm.bin
>>  create mode 160000 roms/qboot
>> 
>> diff --git a/.gitmodules b/.gitmodules
>> index c5c474169d..19792c9a11 100644
>> --- a/.gitmodules
>> +++ b/.gitmodules
>> @@ -58,3 +58,6 @@
>>  [submodule "roms/opensbi"]
>>  	path = roms/opensbi
>>  	url = 	https://git.qemu.org/git/opensbi.git
>> +[submodule "roms/qboot"]
>> +	path = roms/qboot
>> +	url = https://github.com/bonzini/qboot
>> diff --git a/pc-bios/bios-microvm.bin b/pc-bios/bios-microvm.bin
>> new file mode 100755
>> index 0000000000000000000000000000000000000000..45eabc516692e2d134bbb630d133c7c2dcc9a9b6
>> GIT binary patch
>> literal 65536
>> zcmeI2eS8zwneS)hF_vth5y2#;brv;O^x_6mtAUMO%tn5}IEf)jY``G|XbMBqa_^-@
>> zBBUXSyplo3+VMx9Ci}VFwz~@rx!t?By&v!0UdNb=V<BQ0NE-8!l)yH1LPAE!VDnNi
>> zCSmUHnUQRVmuC0>GyaU`%sJ0_&U3!!Ij`fT?33lo8bTirhX$S6(dmK^Nu2qLAPZJO
>> zd+n=&@XLYwWFdS~Zh2c2gidFUEU<VyB{hF24C^}E=jl<HQ(<+MP>-}gkOUzx)FqJ6
>> z;W433mmmmAY+Noh;tibd?18@VxCH}v4GeX<kXL!tiyU!Hnn`6SuU6r$bB&SE_=SXJ
>> zc+)=1$Iq|sgvbt9s)=?%V2RL(E{A7Ar52#~B&%`TJKuimt+%dx%KD*M-I%M^1gEP~
>
> Now that using Docker is quite simple, I'd rather add a job building
> this and commit the built binary, so we have reproducible builds.

I'm not sure how we can achieve this. Are we already doing it for some
other binary? Could you please point me to an example?

>> z3rrTeoTTV}=y-L<H(`6Jx<%^VI8zpO=OW?aYwDJw?(oFd+1)>#`0DNc_4sRW0TC1A
>> zmJsrGlX|tHBmSvHZ7h?b^=>;mu1$tb(P>mvG{5}(=7p;C?X<Nv)KiF;vS^dFph*f0
>> z|KM_=_+GTu{^~BsC2OI`iH8;Xgy<4`bd|F?E`U$ysLqzy*(zuJjHIw>{{UgF+=c4I
>> zP{`<|^mTPhV|UNEdEHc{)HAxSYaiW>P|0;&C$X5aR9UVpQyP@Vl`hhv$gdx{Z+-NR
>> zwvW~;(eFuX@w<QuU$3wVw^IK3ra9{s)&CnoiJ!J0%~y=q9~UYm@2P&Y>tUs8?fTZ0
>> z-^y7ZnXZ(N2F|Wml7g>taRZ)SYa#R~3)d_2XS+8|g~IPiN|W-WvPxO4Jf$R*7zq(M
>> zVPZ7&u2-7NT$&*G^Vf&!N<}4sxYR-&zO@e~z~qj%l+aa&|1SJK8kh_<mPg?Ph8*ct
>> zx<+yY;hYhS;T3-|U%TtHtLYu}w_i63jI9sWyCrer`&zejU6FSvla#+O5G_?20qHTI
>> z@+oXU(Ov{h!_X&`70OD~fa)<r$y4N=@1QQhFU$Y+KbwE#7xIotf3bYo(#FRhYw)oF
>> z?fGIsXnOLA6)T@wwR%RLyz}o5Wo%bFs0P1iIT@I~!0oGks67~Pcuwvn$LZRE?$a*(
>> z{h{{eadET88UP82Cf}S~Nfy9}`gHKyg1<oBEO>(z79nsokwAD^1AC7pf@Oj~FIF9#
>> zF9b$C2U-gqk-~z?@R7iuo?L~z4ZVWkIiSQ^i}NGJ*2?fn#8cjeR;%3om9j(r$>}0*
>> zmHG01U~>3C;Jl}&<hUUtcgHCdOXn#uZ}@;euhm+1IPj;0rza63dsZ*SFvX5~bm(K(
>> z7X>R|!j%@?|5s#Espmjtu)+#k%inqSKe2u4Y^$dyZBy&S@?QTm*4Jv!DYIMrLsd*G
>> z=`T-i{>XEL^*_04^-~G9FB2d;TMqbVppb+**NV)Wh3cyEi~fAMBS-GYFX{6SqmnR3
>> zi7j8o-YehtBPZl-;$fEDRenBjZPn_8`k0QW%duAboe{h9;n1k=PmPBITKXgysm0cG
>> zff9plWlq19^_3qFT=eu943;%0F}d3r9Ci6~gQK=UjyF9V9G&A|6db+Rf1l@a_=w-<
>> zIgp;B+L?Gjt)J5Gg)}0q>Wcp0HQVM-TQ2)8SKeoRFckXbZl8$MyIG&-ayp6njK|qn
>> zUpF;;x*hu7L6uw4aN!=m!{{NE=USzLa8KY0G!Yj<9~vxX3HENZ(Onw#yXUrCeraCo
>> zVgU^`gK6GqgVb`wZ;#Gbmy2v_MAjdX^lEFX6)k+K<#J$AXn(Om8@eKmZkcJ?-+r#^
>> zCA}?|Tk-m02jZiTNML7==D<i+5OJC+mIOXEN(uZdy$8OjX-^@a*!$f7jzz1bmLC6P
>> z$oo)aui!E>)CNvx3sYjI-Lq-*XZ<Zl<Oq$)Z{QZ>a2UU;-+c$WsKoy6%2oD<>eine
>> z!#Ejf&|7)}XW?0sdUNKePg&pstJ#T?3+&za*%{)yhd+P##k*mz+)*)!?YnZKwSKA|
>> zu}@-GBM5lQwXLUnbA-^KRrkB=>1G$AHSS_<H^$k}{tAOa?8@;s2!3iSdOq3ETjRc?
>> zR9)$wnodlx*{W`JPX*Q&*ij0x>ujsr9%IKnN3NV_f2z%xyG&<)WQp>tn@_V;6awYf
>> z{q=0PWP~N+=^0|;@HO^_x)<+6Z;;S0r?eL5MGEsG)4dPD(&64oU$Ar(mKJI91WQp*
>> z*fzBkMi!<IXKbrL`}@Gu{V;2E&q~&~XA{xZDd8h>_#i^;Q2JgahPiGQ8gvStZuJR~
>> zt#rF1Q*=b?$N+zvO5*<=;%c=B9T?OE0Z$h_QD(6#I65=Xr9N+IZ4eRkS9#6`N53nF
>> zgGh%{`-7vUa`;ocs8#P&SmkZ6Ac(#qhoeSdgWU2t0t?xQx<zb7mzo*qiEe;NXZ?4e
>> znr`x%Mz_0Hm^o(CHN&Qs4b&*`M+jj!!(+BL+i)zAszGW@t(BrHq3fi7YSe)q;Z|3e
>> z70e$~*2}J?iVWFD_25hT9RCm4JD#4_8EQj6M$u8*vvNr?R^2MiP<{P)-F%7lF{}=>
>> ztJi2*MGFO0nhpzHsS48`Vp6I$SzypCIQ|r)66h+xi_VhantBb5r^GsquKknX=-R+M
>> zR5)+3-109(YL#R<<}5hotld~R3DG;%8uwi7=x5}ePWAz;ei|y+wczL$|D53HM*s5Q
>> z=%zrc`r^c_nOvX1R?2lfbszGepq#~lx-UxZrlnOz{2p{&Q(UKz(M1ePy8b6tOokmV
>> zLn9Qdcp}|RixRp+gLWhp<KZKhBHF0BOWSB@D^4^t4<64$-NNpV@i4^x#yyN+S3gD9
>> zxSiTUh{EF#MDZBprm6LA(HQ8^&gUvyD|KV6JKnsXG+_>+axvgJlad?eTee+@?{jXu
>> zfYy+<4a9q#+Xg%bFc#Lkt?+6)8^}a<6=7{Pl(xY4C3hw+G;lp|9;H5+Abs%xXQ(ef
>> zlf)pDipQPXQqUx2%7Egdw^J<MQg3E#-s42!M!T~U%24+dys-;|a>oenzQOp1R=J}G
>> z#oOYxtAp`@B3qnI9t8DHEcgY=VF>p@4)U7qZI&7f#y_FB9^0F2j)oj89X~b6STSQI
>> zJGB67=8qA3IhV9q{PWleZJ*%`FMcgLjZ#$m)L7J#t+PdmS4q~;K5zNqKq{1(6=GPR
>> zzt1jQ+&}|ddcP9G9Na5+7s=gK*0Mxk2BxTKvC!2A{4?b4l@75|?ykwFgh^OnXr7ZL
>> z33i+&JdDZR31b~%GO@GXorc9VZcG&~Hp$(SA~o)u=mIh;l(c%zVi2>ct4HK+8P1VY
>> z))%OjyZbRz-UVHukq7*D$=!{UL`<^vIo7g+DDMc$J5q8GgR;CZl=RNu;Fbd2d(lC4
>> zriJ#~jnN$*YE7SvVGH7y;{%g&a;dneK=#pEW<1I(by1@!Ly6_^4fgx76!~?plm%+n
>> z5{9EYS7Y6gk?*3`y|@8xwK@?azsl16k9t%NY`T@Nl08`i3bYI6;DEBwPKPHJZrYud
>> zoU9dPO@-cD*-GuwJh;JyN!V~#+6S;vWoVD#t|#DT!#BC>`HZ{PyBj;<ZH7GiGSQWt
>> zv{53}H;QWUPn@=t?ff8nGyX}D?d{IHZX=lKEx$90?-kE6zq?5e|APg+tSFNuQ*pWF
>> z6~tnSUb9<pQExIFxfnxFh1P|Z3irq@wAgBf#7hh7Yvu43$afnA3^OlGByigfD~F%Q
>> zo~RQe6nb{1V%i`)+74#uIPp)jeLQI!GOSK!4GjMya<b9RDafmB!We7f&*I!`;6DR3
>> z$8W;_zKLHBU&qd=lco%Vswtc)+>gRfXAAGOT^wY+@zX`N53<F#_{i`(vU#ttGbSPI
>> zYF`40owUQ9L)(<Nmd|Rf$y#GUk=Y@C1M^Bvw6-*~Pcqj2NrAj3njc*uu{w!0S)&hI
>> zqbuyJ&d!>g(TAO^u5eMPrzo_quzV<RM3o*Clkrb;*o(6}X)2oNm8k0k*vI3ioVNGb
>> zA=|aVNId>wk?s9XOZ#u`Wf%1iwZW^pj&5Bay_-h4&?!svD7EzF;^s5-#C%kdOM#Y?
>> za@Yk<+8fAV&^B6`nhQ&_8YH*uMTN7?oyxijCbmWhc*lVK22!2VV6WNICN4;=#ENPf
>> zB<oC?h8s-qwi<X_^M2}I<~p;BjDjDo(e8smsqdqE#*(=LaBq3@zuuqBl@Jn9N;21q
>> z5M9Y!`&ejhCS-c5(O3U{o@&kx=6dE2<auvqDn`p1Is6S&j_ot5r=1?b&^f0FB_(r<
>> zGv)0izcf6QX@7on=(F{j-P8Y^DQfg~?ItHqWqbP=;3KH^xJM4L^F~u_a2IEnpy=*y
>> zgl7>HT7SIiG;A!)*l2Phb~}x8987b92)D_ZSoaU%3bdVOzo&{{{O9OoL)L>`+Wn!p
>> z#QN3ZzZ5U3#V4ZW(Pt!9hI5eUbJplHcDBXJzGsg<=VV))p?D$OxjYd#!KTeJY?z&~
>> zfz2yL=s!A+^!Xn!dcxQkGNBI)$4@(L=h^RU*E4<K{2p=^qa&ajd~zB7!&2FS<QrGe
>> zD>9yoIC59*PBRXEDu<teyOFOVjrzVeitMRI^1%u72&Vkv)M3mpMt7O$?y2P%sIZnD
>> z>joIhO<zwPgc5ea)Vhnb{vTR5;#}Z7J7dPSQS59iL}EC8Y&+i8(>Bniwe>5W9{HXg
>> z<;*qm=Ha%JZHcy1ZRxh7>{uw_YVawZlM2o+JVz9du6WKKYVQ0xe6?=vj~e8b9^|0;
>> z>oFzzQ}W>GJQ}vt7j=f?u;y#JC~w~hkuG6}O}N)r6#8fwn`OJZ868i5U~}$ndO*%%
>> zk;7}@H#p&{dGJ+*`D(ib^ude<8&a@8wFyz53;WeD?Nv(-XAWBru?UW11Qzq+SMG>S
>> z!-IrAfROsI82`}5>YUNK;*!G{16h+D4z&?}B(;kxjNw3u(fZ?Za31oBflXhq)U<5d
>> zBe2r2$2211gXny_dvgwI!eaG!>kZDuZ#(1#i@a?q<bx4X7{pMBhr(iYc$nt5R3h1P
>> zz;iNHg#g+IxkSsem0u`%jM1|XeSwM71d(n-zTE;Qo-T|eLTWlR#SClx-`tC}xd3ww
>> zQ4#3znpy+_19rN7ye6p!5j$2=rZsMv{Z7)4W>z=4cjF9Ph?ZS^Eox<l;%O=BlS&ST
>> z&N$Buz<Y{}!&RIgMklYqsX<Ac^<6pfwf<M>CBfo5*W#(xICY8?vDl{f1nn!9n&<6<
>> zv)Q`-&R=6X(U$xP-rTDlqDLY$5G>h`BRqR_UB9~Sk~xlJb^H+<*4~9gJe}wNn3fjt
>> zQ+evv;tKY{TPzW+-X9vV_&3Sh$rrFY+}BzyMHU{zc}#VmcJ7j){|BT$9=eW`@gUrc
>> zjfQmRrrgYU6@1~Cg)KfNho>M(n*|6s7?b|57cL%Md<<njf!v{VaQ;(o$>Hx1Orywv
>> z+FD&f1wCJ;ZtBB^;IC0uw6KNpv88?ForP4^u=dP&0>xg3o#JS6;Q*dOa^c7HkUJJ0
>> zQ(qXsF=PDVu4tgoql<xoDKC7?ey;`wC@%Fp3J#6r$Su?_qK{bF;y#=_v8BhN_fti4
>> zsf-Gv@D|kaXAsf?h%4`vLn`H1daQh(y#22cEa{If?kkT!`yA^RM=+`W7qrphW9axX
>> zuiR0miwEZ~JQlb#tMGwT9?I12r9E+S0+4;EQY}>PrUhKrUNj>c(e|dX3&vva7^I^n
>> zyjNWyk<4}Dm#!2Y1EC)tN$=8zFo?l_XiTBCzjdad>-cnmo~(X9uX_Z?_c7Kt^E2eK
>> z9iNQfAiFRmcN}g$_!*Mg?>|>r{&wI4Hgbam8<9KuGcu*;GX0*)*i9Nc4Lt{k_K*&C
>> z;_QNL!3XY;!)~+@f;}aE8%`qWSX7C<v1SBCj<AD6?fLA)IU}_9b1FKYoiSg#uhVM*
>> zyI{V)p_{Lx=Q_Vfc{4dDyqp<Np<H7r1q^w-W}Z%Q5}7vCS(DKbMCrR@FnmW06mf17
>> znMQ$D<F~9|4xd9BDu&NB?T9alquWz!UEI|f)DcPJ10sXu(}--Ja@uuzN4xd(*V;Fp
>> z-q|kUX~9!G{Wbhv!<&A3r>Cc@AEV8=c6--h<0+c`uzQUa^%_Ra-Y{0eBLs$Cy9f*$
>> zF<;E}3$dX^Nk)L!Fb%aVFvyJ6GHAZHi%``W%O+xxPYC)mi8iR&{RFC+8ZrEb9rW}V
>> z`@)ojTG4rQ!=CZxJu$S~Xx@X=ykl)j`_Jw*<VM3#?a_?}?OaO{Cx(m{BY}<(+NtTN
>> z%5a11nO|@2-vO?k%zXt9y5x@BpCO1Xz-;;TdA~pqOD0$A<eySxy?YKk8r$nxxUtcp
>> zF#zXjSWp?~d^pScDjo7o*Y}dy3Uc0uMYIz#cF0H&7rLnqIAycPPgz-}1DwGz1#Q|3
>> zy-H|Al(+4|RD(1K2HBaRE<#QlqOTyXg5t2fr+8`{S`&s3$Sozvav36`{~nY`$!{1f
>> zHL9Ft@k|Ws3-wuyZO-w@=-;G!w;^;;IjbxXMgMwbfix*l(R>Iy!5}-1betc|rhY(S
>> zEcgzZA){m@O+SYpJ1m=y!_Q)>y%^exYRcjNftpg#Du;iCAd)5A_Ee71uBZn#EmVHz
>> zFd@7!gcwXdM$9EzT^ccl6_D9DCIiZbjNg<&y$azA&W-*&Jf{3G(A<2;n>Bs6fUbel
>> zN7_t~`36STrwo}Uwmbz>;5)&sZhRJwtiW3=gtoF$gO#IE<(5ivO04wmgUnZPw7v&5
>> zp<4fvTn2~5#cUpVP3ztNgx{9S=Ey8})VrtW;6#u0!Br+3b$GLj+#%CogsVDY<cZu(
>> zuEucM!ZcGPa@=5hF8jWh+NWK|$P`cIutT9!GsHhCYyr8Y_H7zyqei$S8)@2tDo9y<
>> z$ht&sQBb|?X)n(4Ob@}?813QTgdW7$<62H=U<OQgp}unj|3bCis)s(b$bTCT#q%?H
>> zA^>_W^%G<Py9~*x;B*7JNf1c&ARgyMMK5)U@icu4!I_TLn3#=~y7n6^s*}R%`@gjs
>> zpGEuWLpZ*n3ZmMSpY+l=g!gI;Q`l<Y+uJwtBX#*z<wpaTL;I~b%f!ei1T$QzOPOQ7
>> zzt|WX0<(K_sl(E_sdk5P&SOtE$IM8PB0tB<I<((zd@-iw3(n>q$AZJ!d^I0O%5r!e
>> zeD2vz%~!3)+jhJ)U$r3|crd!LIO>(gCbkRB9~pC5zcH7Y2C?DLfMv2VSnX5J%CGmZ
>> zYHR6PR?vh~yymN-p>hZ8SDUXdp_yOY{7~uH;M>itAib#hs@+KcHPV}}PNpK~Pbe?i
>> zP<)F5qt(_fDsCwKQ&O~z6@(s{Et<s@kcu7u9u;y2lpE9Vxzu9n4@q+u?UkGuU9^Ax
>> zNY~|#AUQkgldQJmTdXTukh%w=@To7y0xQ{t&_@=t3}q;#!9l*%Fr#8;QKGTPh<&NN
>> z+)-VeVUNL)(5HeZht>*ccpM3QW&>K|NMI}^Ahde1Z#9k%5OU&HOhHlFOKF}Se7{^O
>> zi2VzjKd02d`B}O7cy<-y$N1ntyWxxfE165FzUsxk-rPG8l9IVacsy41=%Y9XK&a8>
>> za&@{8=>8ZB=iL~_@LwDkUrXlBMVufET<^qD$Bm~s6Hc$jcCsG~GMxVu87JfNFKt(_
>> zTa()!L^E~!pvj}tbOTK^xL9$fKf>N+e~hWi29vcT6rHRkrxki@2ZczL*jr^O!~!qe
>> z2siJo$`RR*CH8%*y5y|qi05QF7j>UfDIP8VUEs2~(k*#Iy@lnHJMb^~?qrQhZnZcV
>> zKV?z(RG?t>Uu*CiO4KXKJW)t4vOLsK=~9x5Cb?Yrgo?vy!(beIOLDiRN{pBPK5!W$
>> zEdDW}cao)5a(g3^J5nnu$y!UHX#96b&vkW~80xilqHg$>o6f^HlF9pC;Ib;om<^>)
>> z3&#2z&cMtgSN4u^euyssPTN%+kJ2}SK9=+xQA5<Q2A<E80yRXhR}vz=+*Z3gQa35$
>> z%Ts?W;*8FQL-9vKqZa>M^<~khAvsVdWq>ur#;6Bu5%IZ0t+rzO;df!Z8#MHfrhlo#
>> zz%(cID>xx+@Ac+c(!V8i$q;FhpvkBxhJ%-FMginQoz!|0I6=S3DH)<<T1ptXbWweI
>> zijj*PEt_i+eQzJL>uM-rRoLC+t2igf`Y|<E;Jny3EB7vhFcunX!9DdTx0@RA4HC$@
>> zk#XjLl!+Jq^MZ!h8!zOd3kC{Az_<)oq-`u+G*HrIzL`l`Jw~of3jVRfyy$?_h%5Dc
>> z>;Ad({_=sc^6xzT-L>nK(g_#I(CG(V;*TE}#I08Gt9D6>K&20HSUL!oPU`wj5~y-m
>> zTP#%$`}UkFhjW`$<tPm0Rym3+oAVuoRSAc)s@Gw!N;<4nZ#tx!f=6zSq{X)&Y{Xsj
>> zZ@Nxr?Q~IG7<V1wJQRWJb~cecbl)zWbWwWUA9S6Lis*9TF2!AsxO?yvDjn8smP&^`
>> zl(Q<~uvYduBy8XXp)n(4<bzDe(vU!sk)LcPo%iBN`9qe~`ADoOs5L~4I=q4;A7rMI
>> z46Zud|Ad?3)=@+)@k6=Wb1I2nCGoQ?fgUGpuvdayCCF74EW9BGSzWj<;obmUU&m8e
>> zI97)zvvqJIx<~|H$b6e*VCQx!Z(B}delL_KNjV)>2!a=KVaU-3KeIl|jf#+tm6^0H
>> z2;Yt~=^)O>pyO^lO&u@=ynJ{q!+jiS#y=s!j+^RT?I?W#Zc}+fRci_?>u!!k+d<#o
>> z;I8;U*Z&nyD(@CLy_SHZuEtSc<M1uHSxuG08_I35Y?HR<kiLJzy#xAw{^#`l;Lqr*
>> z8Q15Bj@jTg2z`F)qPMosXSsBXN>>(G-gGq8I9Ap;)-*VLYqH@b4qsz7jQEH1voDW3
>> zR#rNis?e6y2D?zMHlXV+goSE{bCILC(&1{PrVwW1-k(Wtpmf+3veY_m*RI+AWB*Xs
>> zDL-8<+|++QLmAYzrjBZK{D<LGrvJMT-H3Y@@-^W(F7x=9Oa-FJr&dDdNJi!sZCGk~
>> z{T7T+S*lAX?qTHrt4#h(KmH*)k<pQB3Tm?5L7?(9lEVrtl##zhc`nNjZ&4mP3bB|4
>> zb~VU9ajdF9Tpi-F3}->kmx-I7DQCL{eAeWfovymzF*9JY+zQzy<R@O^oR*AnB7Hj`
>> z6M+ncNygCkSVmt}0STiw!#*Ux1=%oUzd2F1AtPHeUbYwMBN@G$aI8XlZ{ktw2v+8-
>> z$;sC3#yOz|*~sTEX}W#|x&+khCLNV<jo;qf{T4Zjvfa<nu@>2HS5DRs+s~MszfA78
>> zkjwfj`3d>!F2qe9>x)&iRW`00>oga!RHyKuu0Kr@x8h=1al=Suj!BIWZ%4ilh{dh)
>> zegDScy}BSr7H^ECu565%yYTd$({(!DG27i3zcF8gB${z<R|oQSs>2%O{UNQgZe>fg
>> z!<Tc;atmj#E^r~sO5CrU*Y!v6r2HZHu+xHxCez1Be-QWWY@jipWCo#QAj2cmKViS+
>> z>3qy_x650R$s4<<>(#fn-?h&F-EXcd`&Ow?x<#O{|2t1_ST|?GfBVkbbw3gwZ>Vwk
>> z8XtE-7r!_GPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(8
>> z6W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;Z
>> zH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULas
>> zfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O
>> z1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1U
>> zPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu
>> z-~>1UPJk2O1ULasfD_;ZH~~(86W|0m0ZxDu-~>1UPJk2O1ULasfD_;ZH~~(86W|0m
>> z0Z!od1Ux-$$J=_^2HLc?{{JUzm0dl`OkLOi(JspO^xUV&;+-j7IrD2oSp}ujDU3^>
>> o5d@0NUb>FZ&)-2Do-dnE`R8)xT^9bc5QmajO4XIv_+RY*0|l%Fi~s-t
>> 
>> literal 0
>> HcmV?d00001
>> 
>> diff --git a/roms/Makefile b/roms/Makefile
>> index 775c963f9d..47eabc8633 100644
>> --- a/roms/Makefile
>> +++ b/roms/Makefile
>> @@ -67,6 +67,7 @@ default:
>>  	@echo "  opensbi32-virt     -- update OpenSBI for 32-bit virt machine"
>>  	@echo "  opensbi64-virt     -- update OpenSBI for 64-bit virt machine"
>>  	@echo "  opensbi64-sifive_u -- update OpenSBI for 64-bit sifive_u machine"
>> +	@echo "  bios-microvm       -- update bios-microvm.bin (qboot)"
>
> I'd go the other way around:
>
>         @echo "  qboot -- update qboot (BIOS used by microvm)"

I think it's better the way it is. In all cases, the target specifies
the binary that will be built, not the name of the project holding it
(the targets for SeaBIOS are bios.bin and vgabios.bin).

I also want to clarify that the name bios-microvm was suggested by
qboot's author (Paolo) ;-)

Thanks,
Sergio.

>>  	@echo "  clean              -- delete the files generated by the previous" \
>>  	                              "build targets"
>>  
>> @@ -185,6 +186,10 @@ opensbi64-sifive_u:
>>  		PLATFORM="qemu/sifive_u"
>>  	cp opensbi/build/platform/qemu/sifive_u/firmware/fw_jump.bin ../pc-bios/opensbi-riscv64-sifive_u-fw_jump.bin
>>  
>> +bios-microvm:
>
>    qboot:
>
> or
>
>    qboot bios-microvm:
>
>> +	$(MAKE) -C qboot
>> +	cp qboot/bios.bin ../pc-bios/bios-microvm.bin
>> +
>>  clean:
>>  	rm -rf seabios/.config seabios/out seabios/builds
>>  	$(MAKE) -C sgabios clean
>> @@ -197,3 +202,4 @@ clean:
>>  	$(MAKE) -C skiboot clean
>>  	$(MAKE) -f Makefile.edk2 clean
>>  	$(MAKE) -C opensbi clean
>> +	$(MAKE) -C qboot clean
>> diff --git a/roms/qboot b/roms/qboot
>> new file mode 160000
>> index 0000000000..cb1c49e0cf
>> --- /dev/null
>> +++ b/roms/qboot
>> @@ -0,0 +1 @@
>> +Subproject commit cb1c49e0cfac99b9961d136ac0194da62c28cf64
>> 


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-25  5:06     ` Gerd Hoffmann
@ 2019-09-25  7:33       ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-25  7:33 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, pbonzini, rth,
	ehabkost, philmd, lersek, mtosatti, kvm

[-- Attachment #1: Type: text/plain, Size: 1058 bytes --]


Gerd Hoffmann <kraxel@redhat.com> writes:

>   Hi,
>
>> +microvm.kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
>
> Hmm, is that the long-term plan?  IMO the virtio-mmio devices should be
> discoverable somehow.  ACPI, or device-tree, or fw_cfg, or ...

I'd say that depends on the machine type. ARM's virt and vexpress do
support virtio-mmio devices, adding them to a generated DTB.

For microvm, that's simply not worth it. Fiddling with the kernel command
line achieves the same result without any significant drawbacks, with
less code and fewer cycles consumed on both sides.

>> +As no current FW is able to boot from a block device using virtio-mmio
>> +as its transport,
>
> To fix that the firmware must be able to find the virtio-mmio devices.

No FW supports modern virtio-mmio transports anyway. And, from microvm's
perspective, there's little incentive to change this situation, given
that its main use cases (serverless computing and VM-isolated
containers) will run with an external kernel.

Thanks,
Sergio.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]


* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-24 12:44 ` Sergio Lopez
@ 2019-09-25  7:41   ` David Hildenbrand
  -1 siblings, 0 replies; 133+ messages in thread
From: David Hildenbrand @ 2019-09-25  7:41 UTC (permalink / raw)
  To: Sergio Lopez, qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, philmd,
	lersek, kraxel, mtosatti, kvm, Pankaj Gupta

On 24.09.19 14:44, Sergio Lopez wrote:
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> constructed after the machine model implemented by the latter.
> 
> Its main purpose is providing users a minimalist machine type free
> from the burden of legacy compatibility, serving as a stepping stone
> for future projects aiming at improving boot times, reducing the
> attack surface and slimming down QEMU's footprint.
> 
> The microvm machine type supports the following devices:
> 
>  - ISA bus
>  - i8259 PIC
>  - LAPIC (implicit if using KVM)
>  - IOAPIC (defaults to kernel_irqchip_split = true)
>  - i8254 PIT
>  - MC146818 RTC (optional)
>  - kvmclock (if using KVM)
>  - fw_cfg
>  - One ISA serial port (optional)
>  - Up to eight virtio-mmio devices (configured by the user)

So I assume also no ACPI (CPU/memory hotplug), correct?

@Pankaj, I think it would make sense to make virtio-pmem play with
virtio-mmio/microvm.

-- 

Thanks,

David / dhildenb


* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-25  5:49       ` Sergio Lopez
@ 2019-09-25  7:57         ` Paolo Bonzini
  -1 siblings, 0 replies; 133+ messages in thread
From: Paolo Bonzini @ 2019-09-25  7:57 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm


[-- Attachment #1.1: Type: text/plain, Size: 1070 bytes --]

On 25/09/19 07:49, Sergio Lopez wrote:
>>> +serving as a stepping stone
>>> +for future projects aiming at improving boot times, reducing the
>>> +attack surface and slimming down QEMU's footprint.
>>
>> "Microvm also establishes a baseline for benchmarking QEMU and operating
>> systems, since it is optimized for both boot time and footprint".
> 
> Well, I prefer my paragraph, but I'm good with either.

You're right, my version sort of missed the point.  What about
s/benchmarking/benchmarking and optimizing/?

>>> +The microvm machine type supports the following devices:
>>> +
>>> + - ISA bus
>>> + - i8259 PIC
>>> + - LAPIC (implicit if using KVM)
>>> + - IOAPIC (defaults to kernel_irqchip_split = true)
>>> + - i8254 PIT
>>
>> Do we need the PIT?  And perhaps the PIC even?
> 
> We need the PIT for non-KVM accel (if present with KVM and
> kernel_irqchip_split = off, it basically becomes a placeholder)

Why?

> and the
> PIC for both the PIT and the ISA serial port.

Can't the ISA serial port work with the IOAPIC?

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-25  7:41   ` David Hildenbrand
@ 2019-09-25  7:58     ` Pankaj Gupta
  -1 siblings, 0 replies; 133+ messages in thread
From: Pankaj Gupta @ 2019-09-25  7:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Sergio Lopez, qemu-devel, mst, imammedo, marcel apfelbaum,
	pbonzini, rth, ehabkost, philmd, lersek, kraxel, mtosatti, kvm


> On 24.09.19 14:44, Sergio Lopez wrote:
> > Microvm is a machine type inspired by both NEMU and Firecracker, and
> > constructed after the machine model implemented by the latter.
> > 
> > Its main purpose is providing users a minimalist machine type free
> > from the burden of legacy compatibility, serving as a stepping stone
> > for future projects aiming at improving boot times, reducing the
> > attack surface and slimming down QEMU's footprint.
> > 
> > The microvm machine type supports the following devices:
> > 
> >  - ISA bus
> >  - i8259 PIC
> >  - LAPIC (implicit if using KVM)
> >  - IOAPIC (defaults to kernel_irqchip_split = true)
> >  - i8254 PIT
> >  - MC146818 RTC (optional)
> >  - kvmclock (if using KVM)
> >  - fw_cfg
> >  - One ISA serial port (optional)
> >  - Up to eight virtio-mmio devices (configured by the user)
> 
> So I assume also no ACPI (CPU/memory hotplug), correct?
> 
> @Pankaj, I think it would make sense to make virtio-pmem play with
> virtio-mmio/microvm.

I agree. It's using a virtio-blk device over a raw image.
Similarly, or alternatively (as an experiment), we can use virtio-pmem,
which could even reduce the guest memory footprint.

Best regards,
Pankaj

> 
> --
> 
> Thanks,
> 
> David / dhildenb
> 


* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-25  7:41   ` David Hildenbrand
@ 2019-09-25  8:10     ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-25  8:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, pbonzini, rth,
	ehabkost, philmd, lersek, kraxel, mtosatti, kvm, Pankaj Gupta

[-- Attachment #1: Type: text/plain, Size: 1238 bytes --]


David Hildenbrand <david@redhat.com> writes:

> On 24.09.19 14:44, Sergio Lopez wrote:
>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> constructed after the machine model implemented by the latter.
>> 
>> Its main purpose is providing users a minimalist machine type free
>> from the burden of legacy compatibility, serving as a stepping stone
>> for future projects aiming at improving boot times, reducing the
>> attack surface and slimming down QEMU's footprint.
>> 
>> The microvm machine type supports the following devices:
>> 
>>  - ISA bus
>>  - i8259 PIC
>>  - LAPIC (implicit if using KVM)
>>  - IOAPIC (defaults to kernel_irqchip_split = true)
>>  - i8254 PIT
>>  - MC146818 RTC (optional)
>>  - kvmclock (if using KVM)
>>  - fw_cfg
>>  - One ISA serial port (optional)
>>  - Up to eight virtio-mmio devices (configured by the user)
>
> So I assume also no ACPI (CPU/memory hotplug), correct?

Correct.

> @Pankaj, I think it would make sense to make virtio-pmem play with
> virtio-mmio/microvm.

That would be great. I'm also looking forward to virtio-mem (and a
hypothetical virtio-cpu) eventually gaining hotplug capabilities in
microvm.

Thanks,
Sergio.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]


* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-25  8:10     ` Sergio Lopez
@ 2019-09-25  8:16       ` David Hildenbrand
  -1 siblings, 0 replies; 133+ messages in thread
From: David Hildenbrand @ 2019-09-25  8:16 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, pbonzini, rth,
	ehabkost, philmd, lersek, kraxel, mtosatti, kvm, Pankaj Gupta

On 25.09.19 10:10, Sergio Lopez wrote:
> 
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 24.09.19 14:44, Sergio Lopez wrote:
>>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>>> constructed after the machine model implemented by the latter.
>>>
>>> Its main purpose is providing users a minimalist machine type free
>>> from the burden of legacy compatibility, serving as a stepping stone
>>> for future projects aiming at improving boot times, reducing the
>>> attack surface and slimming down QEMU's footprint.
>>>
>>> The microvm machine type supports the following devices:
>>>
>>>  - ISA bus
>>>  - i8259 PIC
>>>  - LAPIC (implicit if using KVM)
>>>  - IOAPIC (defaults to kernel_irqchip_split = true)
>>>  - i8254 PIT
>>>  - MC146818 RTC (optional)
>>>  - kvmclock (if using KVM)
>>>  - fw_cfg
>>>  - One ISA serial port (optional)
>>>  - Up to eight virtio-mmio devices (configured by the user)
>>
>> So I assume also no ACPI (CPU/memory hotplug), correct?
> 
> Correct.
> 
>> @Pankaj, I think it would make sense to make virtio-pmem play with
>> virtio-mmio/microvm.
> 
> That would be great. I'm also looking forward to virtio-mem (and a
> hypothetical virtio-cpu) eventually gaining hotplug capabilities in
> microvm.

@Pankaj, do you have time to look into the virtio-pmem thingy? I guess
the virtio-mmio wrapper shouldn't be too hard (very similar to the
virtio-pci wrapper - luckily I insisted on making it work independently
from PCI BARs and ACPI slots ;) ). The microvm bits would be properly
setting up device memory and wiring up the hotplug handlers, similar to
what is done in the other PC machine types (maybe that comes for free?).

virtio-pmem will allow placing the rootfs (in read-only mode) on a fake
NVDIMM, as done, e.g., in Kata Containers. We might have to include the
virtio-pmem kernel module in the initramfs, which shouldn't be too hard.
I'm not sure what else we'll need to use virtio-pmem as a rootfs.

> 
> Thanks,
> Sergio.
> 


-- 

Thanks,

David / dhildenb


* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-25  8:10     ` Sergio Lopez
@ 2019-09-25  8:26       ` Paolo Bonzini
  -1 siblings, 0 replies; 133+ messages in thread
From: Paolo Bonzini @ 2019-09-25  8:26 UTC (permalink / raw)
  To: Sergio Lopez, David Hildenbrand
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm, Pankaj Gupta

On 25/09/19 10:10, Sergio Lopez wrote:
> That would be great. I'm also looking forward to virtio-mem (and a
> hypothetical virtio-cpu) eventually gaining hotplug capabilities in
> microvm.

I disagree with this.  virtio is not a silver bullet (and in fact
perhaps it's just me but I've never understood the advantages of
virtio-mem over anything else).

If you want to add hotplug to microvm, you can reuse the existing code
for CPU and memory hotplug controllers, and write drivers for them in
Linux's drivers/platform.  The drivers would basically do what the ACPI
AML tells the interpreter to do.

There is no reason to add the complexity of virtio to something as
low-level and deadlock-prone as CPU hotplug.

Paolo

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 1/8] hw/i386: Factorize PVH related functions
  2019-09-24 12:44   ` Sergio Lopez
@ 2019-09-25  8:36     ` Stefano Garzarella
  -1 siblings, 0 replies; 133+ messages in thread
From: Stefano Garzarella @ 2019-09-25  8:36 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, pbonzini, rth,
	ehabkost, philmd, lersek, kraxel, mtosatti, kvm

Hi Sergio,

On Tue, Sep 24, 2019 at 02:44:26PM +0200, Sergio Lopez wrote:
> Extract PVH related functions from pc.c, and put them in pvh.c, so
> they can be shared with other components.
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  hw/i386/Makefile.objs |   1 +
>  hw/i386/pc.c          | 120 +++++-------------------------------------
>  hw/i386/pvh.c         | 113 +++++++++++++++++++++++++++++++++++++++
>  hw/i386/pvh.h         |  10 ++++
>  4 files changed, 136 insertions(+), 108 deletions(-)
>  create mode 100644 hw/i386/pvh.c
>  create mode 100644 hw/i386/pvh.h
> 
> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> index 5d9c9efd5f..c5f20bbd72 100644
> --- a/hw/i386/Makefile.objs
> +++ b/hw/i386/Makefile.objs
> @@ -1,5 +1,6 @@
>  obj-$(CONFIG_KVM) += kvm/
>  obj-y += multiboot.o
> +obj-y += pvh.o
>  obj-y += pc.o
>  obj-$(CONFIG_I440FX) += pc_piix.o
>  obj-$(CONFIG_Q35) += pc_q35.o
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index bad866fe44..10e4ced0c6 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -42,6 +42,7 @@
>  #include "elf.h"
>  #include "migration/vmstate.h"
>  #include "multiboot.h"
> +#include "pvh.h"
>  #include "hw/timer/mc146818rtc.h"
>  #include "hw/dma/i8257.h"
>  #include "hw/timer/i8254.h"
> @@ -116,9 +117,6 @@ static struct e820_entry *e820_table;
>  static unsigned e820_entries;
>  struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
>  
> -/* Physical Address of PVH entry point read from kernel ELF NOTE */
> -static size_t pvh_start_addr;
> -
>  GlobalProperty pc_compat_4_1[] = {};
>  const size_t pc_compat_4_1_len = G_N_ELEMENTS(pc_compat_4_1);
>  
> @@ -1076,109 +1074,6 @@ struct setup_data {
>      uint8_t data[0];
>  } __attribute__((packed));
>  
> -
> -/*
> - * The entry point into the kernel for PVH boot is different from
> - * the native entry point.  The PVH entry is defined by the x86/HVM
> - * direct boot ABI and is available in an ELFNOTE in the kernel binary.
> - *
> - * This function is passed to load_elf() when it is called from
> - * load_elfboot() which then additionally checks for an ELF Note of
> - * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
> - * parse the PVH entry address from the ELF Note.
> - *
> - * Due to trickery in elf_opts.h, load_elf() is actually available as
> - * load_elf32() or load_elf64() and this routine needs to be able
> - * to deal with being called as 32 or 64 bit.
> - *
> - * The address of the PVH entry point is saved to the 'pvh_start_addr'
> - * global variable.  (although the entry point is 32-bit, the kernel
> - * binary can be either 32-bit or 64-bit).
> - */
> -static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
> -{
> -    size_t *elf_note_data_addr;
> -
> -    /* Check if ELF Note header passed in is valid */
> -    if (arg1 == NULL) {
> -        return 0;
> -    }
> -
> -    if (is64) {
> -        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
> -        uint64_t nhdr_size64 = sizeof(struct elf64_note);
> -        uint64_t phdr_align = *(uint64_t *)arg2;
> -        uint64_t nhdr_namesz = nhdr64->n_namesz;
> -
> -        elf_note_data_addr =
> -            ((void *)nhdr64) + nhdr_size64 +
> -            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> -    } else {
> -        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
> -        uint32_t nhdr_size32 = sizeof(struct elf32_note);
> -        uint32_t phdr_align = *(uint32_t *)arg2;
> -        uint32_t nhdr_namesz = nhdr32->n_namesz;
> -
> -        elf_note_data_addr =
> -            ((void *)nhdr32) + nhdr_size32 +
> -            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> -    }
> -
> -    pvh_start_addr = *elf_note_data_addr;
> -
> -    return pvh_start_addr;
> -}
> -
> -static bool load_elfboot(const char *kernel_filename,
> -                   int kernel_file_size,
> -                   uint8_t *header,
> -                   size_t pvh_xen_start_addr,
> -                   FWCfgState *fw_cfg)
> -{
> -    uint32_t flags = 0;
> -    uint32_t mh_load_addr = 0;
> -    uint32_t elf_kernel_size = 0;
> -    uint64_t elf_entry;
> -    uint64_t elf_low, elf_high;
> -    int kernel_size;
> -

Are we removing the following checks (ELF magic, flags) because they
are superfluous?

Should we mention this in the commit message?
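For context, the dropped magic test loads the first 32-bit word of the header in target byte order (little-endian on x86) and compares it with 0x464c457f, which is just the byte sequence 0x7f 'E' 'L' 'F'. The standalone sketch below only illustrates that equivalence: load_le32() is a hypothetical stand-in for QEMU's ldl_p(), and whether load_elf() already rejects non-ELF files through its own e_ident check is an assumption worth confirming in hw/core/loader.c before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ldl_p() on a little-endian (x86) target. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The check removed from load_elfboot(): whole-word comparison. */
static bool is_elf_by_word(const uint8_t *header)
{
    return load_le32(header) == 0x464c457f;
}

/*
 * Equivalent byte-wise form. "\x7f" is split from "ELF" because 'E' is a
 * hex digit and would otherwise be swallowed by the \x escape.
 */
static bool is_elf_by_bytes(const uint8_t *header)
{
    return memcmp(header, "\x7f" "ELF", 4) == 0;
}
```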

> -    if (ldl_p(header) != 0x464c457f) {
> -        return false; /* no elfboot */
> -    }
> -
> -    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
> -    flags = elf_is64 ?
> -        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
> -
> -    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
> -        error_report("elfboot unsupported flags = %x", flags);
> -        exit(1);
> -    }
> -
> -    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
> -    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
> -                           NULL, &elf_note_type, &elf_entry,
> -                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
> -                           0, 0);
> -
> -    if (kernel_size < 0) {
> -        error_report("Error while loading elf kernel");
> -        exit(1);
> -    }
> -    mh_load_addr = elf_low;
> -    elf_kernel_size = elf_high - elf_low;
> -
> -    if (pvh_start_addr == 0) {
> -        error_report("Error loading uncompressed kernel without PVH ELF Note");
> -        exit(1);
> -    }
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> -
> -    return true;
> -}
> -
>  static void load_linux(PCMachineState *pcms,
>                         FWCfgState *fw_cfg)
>  {
> @@ -1218,6 +1113,9 @@ static void load_linux(PCMachineState *pcms,
>      if (ldl_p(header+0x202) == 0x53726448) {
>          protocol = lduw_p(header+0x206);
>      } else {
> +        size_t pvh_start_addr;
> +        uint32_t mh_load_addr = 0;
> +        uint32_t elf_kernel_size = 0;
>          /*
>           * This could be a multiboot kernel. If it is, let's stop treating it
>           * like a Linux kernel.
> @@ -1235,10 +1133,16 @@ static void load_linux(PCMachineState *pcms,
>           * If load_elfboot() is successful, populate the fw_cfg info.
>           */
>          if (pcmc->pvh_enabled &&
> -            load_elfboot(kernel_filename, kernel_size,
> -                         header, pvh_start_addr, fw_cfg)) {
> +            pvh_load_elfboot(kernel_filename,
> +                             &mh_load_addr, &elf_kernel_size)) {
>              fclose(f);
>  
> +            pvh_start_addr = pvh_get_start_addr();
> +
> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> +
>              fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
>                  strlen(kernel_cmdline) + 1);
>              fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> diff --git a/hw/i386/pvh.c b/hw/i386/pvh.c
> new file mode 100644
> index 0000000000..1c81727811
> --- /dev/null
> +++ b/hw/i386/pvh.c
> @@ -0,0 +1,113 @@
> +/*
> + * PVH Boot Helper
> + *
> + * Copyright (C) 2019 Oracle
> + * Copyright (C) 2019 Red Hat, Inc
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/units.h"
> +#include "qemu/error-report.h"
> +#include "hw/loader.h"
> +#include "cpu.h"
> +#include "elf.h"
> +#include "pvh.h"
> +
> +static size_t pvh_start_addr;
> +
> +size_t pvh_get_start_addr(void)
> +{
> +    return pvh_start_addr;
> +}
> +
> +/*
> + * The entry point into the kernel for PVH boot is different from
> + * the native entry point.  The PVH entry is defined by the x86/HVM
> + * direct boot ABI and is available in an ELFNOTE in the kernel binary.
> + *
> + * This function is passed to load_elf() when it is called from
> + * load_elfboot() which then additionally checks for an ELF Note of
> + * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
> + * parse the PVH entry address from the ELF Note.
> + *
> + * Due to trickery in elf_opts.h, load_elf() is actually available as
> + * load_elf32() or load_elf64() and this routine needs to be able
> + * to deal with being called as 32 or 64 bit.
> + *
> + * The address of the PVH entry point is saved to the 'pvh_start_addr'
> + * global variable.  (although the entry point is 32-bit, the kernel
> + * binary can be either 32-bit or 64-bit).
> + */
> +
> +static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
> +{
> +    size_t *elf_note_data_addr;
> +
> +    /* Check if ELF Note header passed in is valid */
> +    if (arg1 == NULL) {
> +        return 0;
> +    }
> +
> +    if (is64) {
> +        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
> +        uint64_t nhdr_size64 = sizeof(struct elf64_note);
> +        uint64_t phdr_align = *(uint64_t *)arg2;
> +        uint64_t nhdr_namesz = nhdr64->n_namesz;
> +
> +        elf_note_data_addr =
> +            ((void *)nhdr64) + nhdr_size64 +
> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> +    } else {
> +        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
> +        uint32_t nhdr_size32 = sizeof(struct elf32_note);
> +        uint32_t phdr_align = *(uint32_t *)arg2;
> +        uint32_t nhdr_namesz = nhdr32->n_namesz;
> +
> +        elf_note_data_addr =
> +            ((void *)nhdr32) + nhdr_size32 +
> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> +    }
> +
> +    pvh_start_addr = *elf_note_data_addr;
> +
> +    return pvh_start_addr;
> +}
> +
> +bool pvh_load_elfboot(const char *kernel_filename,
> +                      uint32_t *mh_load_addr,
> +                      uint32_t *elf_kernel_size)
> +{
> +    uint64_t elf_entry;
> +    uint64_t elf_low, elf_high;
> +    int kernel_size;
> +    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
> +
> +    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
> +                           NULL, &elf_note_type, &elf_entry,
> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
> +                           0, 0);
> +
> +    if (kernel_size < 0) {
> +        error_report("Error while loading elf kernel");
> +        return false;
> +    }
> +
> +    if (pvh_start_addr == 0) {
> +        error_report("Error loading uncompressed kernel without PVH ELF Note");
> +        return false;
> +    }
> +
> +    if (mh_load_addr) {
> +        *mh_load_addr = elf_low;
> +    }
> +
> +    if (elf_kernel_size) {
> +        *elf_kernel_size = elf_high - elf_low;
> +    }
> +
> +    return true;
> +}
> diff --git a/hw/i386/pvh.h b/hw/i386/pvh.h
> new file mode 100644
> index 0000000000..ada67ff6e8
> --- /dev/null
> +++ b/hw/i386/pvh.h
> @@ -0,0 +1,10 @@
> +#ifndef HW_I386_PVH_H
> +#define HW_I386_PVH_H
> +
> +size_t pvh_get_start_addr(void);

What about adding a "size_t *pvh_start_addr" output parameter to
pvh_load_elfboot()? Just an idea; I'm not sure if it is better...
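A minimal sketch of that idea, under stated assumptions: parse_pvh_note() is a hypothetical helper that mirrors the offset computation in read_pvh_start_addr() (note header, then the name padded to the segment alignment, then the descriptor) but hands the entry point back through a caller-supplied pointer instead of the file-scope pvh_start_addr. It assumes, as the comment in pvh.c says, that load_elf() has already matched the XEN_ELFNOTE_PHYS32_ENTRY note type, and it reads only the low 32 bits of the descriptor, assuming a little-endian host and guest. Since the note callback passed to load_elf() receives only the note and its alignment, pvh.c would probably still keep a static as scratch space, and pvh_load_elfboot() would just copy it into the new argument.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round n up to a multiple of a (a > 0), like QEMU_ALIGN_UP(). */
#define ALIGN_UP(n, a) ((((n) + (a) - 1) / (a)) * (a))

/* ELF note header: three 32-bit words for both ELF32 and ELF64 notes. */
struct note_hdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

/*
 * Hypothetical helper: locate the descriptor of a PVH note and return the
 * 32-bit entry point through pvh_start_addr. Returns false on bad input or
 * a zero entry, matching the "pvh_start_addr == 0" error check in the patch.
 */
static bool parse_pvh_note(const uint8_t *note, size_t align,
                           size_t *pvh_start_addr)
{
    struct note_hdr hdr;
    uint32_t entry;
    size_t desc_off;

    if (note == NULL || pvh_start_addr == NULL || align == 0) {
        return false;
    }

    memcpy(&hdr, note, sizeof(hdr));
    if (hdr.n_descsz < sizeof(entry)) {
        return false;
    }

    /* Same layout as read_pvh_start_addr(): header, padded name, descriptor. */
    desc_off = sizeof(hdr) + ALIGN_UP((size_t)hdr.n_namesz, align);
    memcpy(&entry, note + desc_off, sizeof(entry));

    *pvh_start_addr = entry;
    return entry != 0;
}
```

With this shape, load_linux() could pass &pvh_start_addr straight to pvh_load_elfboot() and the pvh_get_start_addr() accessor could go away; whether that reads better than the getter is the open question.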

> +
> +bool pvh_load_elfboot(const char *kernel_filename,
> +                      uint32_t *mh_load_addr,
> +                      uint32_t *elf_kernel_size);
> +
> +#endif

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-25  8:16       ` David Hildenbrand
@ 2019-09-25  8:37         ` Pankaj Gupta
  -1 siblings, 0 replies; 133+ messages in thread
From: Pankaj Gupta @ 2019-09-25  8:37 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Sergio Lopez, ehabkost, kvm, mst, lersek, mtosatti, qemu-devel,
	kraxel, pbonzini, imammedo, philmd, rth


> >>> Microvm is a machine type inspired by both NEMU and Firecracker, and
> >>> constructed after the machine model implemented by the latter.
> >>>
> >>> Its main purpose is providing users a minimalist machine type free
> >>> from the burden of legacy compatibility, serving as a stepping stone
> >>> for future projects aiming at improving boot times, reducing the
> >>> attack surface and slimming down QEMU's footprint.
> >>>
> >>> The microvm machine type supports the following devices:
> >>>
> >>>  - ISA bus
> >>>  - i8259 PIC
> >>>  - LAPIC (implicit if using KVM)
> >>>  - IOAPIC (defaults to kernel_irqchip_split = true)
> >>>  - i8254 PIT
> >>>  - MC146818 RTC (optional)
> >>>  - kvmclock (if using KVM)
> >>>  - fw_cfg
> >>>  - One ISA serial port (optional)
> >>>  - Up to eight virtio-mmio devices (configured by the user)
> >>
> >> So I assume also no ACPI (CPU/memory hotplug), correct?
> > 
> > Correct.
> > 
> >> @Pankaj, I think it would make sense to make virtio-pmem play with
> >> virtio-mmio/microvm.
> > 
> > That would be great. I'm also looking forward to virtio-mem (and a
> > hypothetical virtio-cpu) eventually gaining hotplug capabilities in
> > microvm.
> 
> @Pankaj, do you have time to look into the virtio-pmem thingy? I guess
> the virtio-mmio wrapper shouldn't be too hard (very similar to the
> virtio-pci wrapper - luckily I insisted on making it work independently
> from PCI BARs and ACPI slots ;) ). The microvm bits would consist of properly
> setting up device memory and wiring up the hotplug handlers, similar to
> what is done in the other PC machine types (maybe that comes for free?).

Yes, I can look at it.

> 
> virtio-pmem will allow placing the rootfs (in read-only mode) on a fake
> NVDIMM, as done e.g. in Kata Containers. We might have to include the
> virtio-pmem kernel module in the initramfs, which shouldn't be too hard. Not
> sure what else we'll need to make virtio-pmem usable as a rootfs.

Sure, will work on it.

Thanks,
Pankaj

> 
> > 
> > Thanks,
> > Sergio.
> > 
> 
> 
> --
> 
> Thanks,
> 
> David / dhildenb
> 
> 

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-25  7:57         ` Paolo Bonzini
@ 2019-09-25  8:40           ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-25  8:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 25/09/19 07:49, Sergio Lopez wrote:
>>>> +serving as a stepping stone
>>>> +for future projects aiming at improving boot times, reducing the
>>>> +attack surface and slimming down QEMU's footprint.
>>>
>>> "Microvm also establishes a baseline for benchmarking QEMU and operating
>>> systems, since it is optimized for both boot time and footprint".
>> 
>> Well, I prefer my paragraph, but I'm good with either.
>
> You're right my version sort of missed the point.  What about
> s/benchmarking/benchmarking and optimizing/?
>
>>>> +The microvm machine type supports the following devices:
>>>> +
>>>> + - ISA bus
>>>> + - i8259 PIC
>>>> + - LAPIC (implicit if using KVM)
>>>> + - IOAPIC (defaults to kernel_irqchip_split = true)
>>>> + - i8254 PIT
>>>
>>> Do we need the PIT?  And perhaps the PIC even?
>> 
>> We need the PIT for non-KVM accel (if present with KVM and
>> kernel_irqchip_split = off, it basically becomes a placeholder)
>
> Why?

Perhaps I'm missing something. Is some other device supposed to be
acting as a HW timer while running with TCG acceleration?

>> and the
>> PIC for both the PIT and the ISA serial port.
>
> Can't the ISA serial port work with the IOAPIC?

Hm... I'm not sure. I wanted to give it a try, but then noticed that
multiple places in the code (like hw/intc/apic.c:560) do expect to have
an ISA PIC present through the isa_pic global variable.

I guess we should be able to work around this, but I'm not sure if it's
really worth it. What do you think?

Thanks,
Sergio.


* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-25  8:26       ` Paolo Bonzini
@ 2019-09-25  8:42         ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-25  8:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: David Hildenbrand, qemu-devel, mst, imammedo, marcel.apfelbaum,
	rth, ehabkost, philmd, lersek, kraxel, mtosatti, kvm,
	Pankaj Gupta


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 25/09/19 10:10, Sergio Lopez wrote:
>> That would be great. I'm also looking forward to virtio-mem (and a
>> hypothetical virtio-cpu) eventually gaining hotplug capabilities in
>> microvm.
>
> I disagree with this.  virtio is not a silver bullet (and in fact
> perhaps it's just me but I've never understood the advantages of
> virtio-mem over anything else).
>
> If you want to add hotplug to microvm, you can reuse the existing code
> for CPU and memory hotplug controllers, and write drivers for them in
> Linux's drivers/platform.  The drivers would basically do what the ACPI
> AML tells the interpreter to do.
>
> There is no reason to add the complexity of virtio to something as
> low-level and deadlock-prone as CPU hotplug.

TBH, I haven't put much thought into this yet. I'll keep this in mind
for the future.

Thanks,
Sergio.



* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-25  8:26       ` Paolo Bonzini
@ 2019-09-25  8:44         ` David Hildenbrand
  -1 siblings, 0 replies; 133+ messages in thread
From: David Hildenbrand @ 2019-09-25  8:44 UTC (permalink / raw)
  To: Paolo Bonzini, Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm, Pankaj Gupta

On 25.09.19 10:26, Paolo Bonzini wrote:
> On 25/09/19 10:10, Sergio Lopez wrote:
>> That would be great. I'm also looking forward to virtio-mem (and a
>> hypothetical virtio-cpu) eventually gaining hotplug capabilities in
>> microvm.
> 
> I disagree with this.  virtio is not a silver bullet (and in fact
> perhaps it's just me but I've never understood the advantages of
> virtio-mem over anything else).

Sorry, I had to lol about "virtio-mem over anything else". No, not
starting a discussion.

> 
> If you want to add hotplug to microvm, you can reuse the existing code
> for CPU and memory hotplug controllers, and write drivers for them in
> Linux's drivers/platform.  The drivers would basically do what the ACPI
> AML tells the interpreter to do.
> 
> There is no reason to add the complexity of virtio to something as
> low-level and deadlock-prone as CPU hotplug.

I do agree with respect to CPU hotplug complexity (especially across
architectures), but thinking "outside of the wonderful x86 world", other
architectures impose limitations (e.g., no CPU unplug on s390x, at
least for now) that make something like this very interesting. But yeah,
I have already expressed my feelings about CPU hotplug elsewhere.

I consider virtio the silver bullet whenever we want a mature
paravirtualized interface across architectures. And you can tell that
I'm not the only one by the huge number of virtio devices people are
crafting right now.

-- 

Thanks,

David / dhildenb

* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-25  7:33       ` Sergio Lopez
@ 2019-09-25  8:51         ` Gerd Hoffmann
  -1 siblings, 0 replies; 133+ messages in thread
From: Gerd Hoffmann @ 2019-09-25  8:51 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, pbonzini, rth,
	ehabkost, philmd, lersek, mtosatti, kvm

  Hi,

> For microvm that's simply not worth it. Fiddling with the command line
> achieves the same result without any significant drawbacks,

Assuming you actually can fiddle with the command line, which is only
the case with direct kernel boot.
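With direct kernel boot, describing the virtio-mmio devices on the guest command line amounts to appending one `virtio_mmio.device=<size>@<baseaddr>:<irq>` entry per device, which is the syntax Linux's virtio_mmio driver accepts; the microvm.kernel-cmdline option automates this step. The helper below is a hypothetical sketch of only that string formatting: the function name, and the 512-byte window, 0xd0000000 base address and IRQ 5 used in the usage note, are illustrative assumptions rather than the exact values or code QEMU's microvm uses.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append " virtio_mmio.device=<size>@0x<base>:<irq>"
 * to a NUL-terminated kernel command line held in a buffer of 'cap' bytes.
 * Returns the number of characters appended, or -1 (leaving the command
 * line unchanged) if the entry does not fit.
 */
static int append_virtio_mmio_arg(char *cmdline, size_t cap,
                                  uint64_t size, uint64_t base,
                                  unsigned int irq)
{
    size_t used = strlen(cmdline);
    if (used >= cap) {
        return -1;
    }

    int n = snprintf(cmdline + used, cap - used,
                     " virtio_mmio.device=%llu@0x%llx:%u",
                     (unsigned long long)size,
                     (unsigned long long)base, irq);
    if (n < 0 || (size_t)n >= cap - used) {
        cmdline[used] = '\0'; /* drop the truncated fragment */
        return -1;
    }
    return n;
}
```

For example, appending a 512-byte window at 0xd0000000 on IRQ 5 to "console=hvc0 root=/dev/vda" yields "console=hvc0 root=/dev/vda virtio_mmio.device=512@0xd0000000:5". Firmware never sees this string, which is why the approach only works when QEMU loads the kernel itself.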

> > To fix that the firmware must be able to find the virtio-mmio devices.
> 
> No FW supports modern virtio-mmio transports anyway.

Well, we can change that if we want ...

> And, from microvm's perspective, there's little incentive to change
> this situation, given that its main use cases (serverless computing
> and VM-isolated containers) will run with an external kernel.

If direct kernel boot is the only use case microvm ever wants to support,
then there is little reason to go the extra mile for optional SeaBIOS
support.

cheers,
  Gerd


* Re: [PATCH v4 1/8] hw/i386: Factorize PVH related functions
  2019-09-25  8:36     ` Stefano Garzarella
@ 2019-09-25  9:00       ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-25  9:00 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, pbonzini, rth,
	ehabkost, philmd, lersek, kraxel, mtosatti, kvm


Stefano Garzarella <sgarzare@redhat.com> writes:

> Hi Sergio,
>
> On Tue, Sep 24, 2019 at 02:44:26PM +0200, Sergio Lopez wrote:
>> Extract PVH related functions from pc.c, and put them in pvh.c, so
>> they can be shared with other components.
>> 
>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>> ---
>>  hw/i386/Makefile.objs |   1 +
>>  hw/i386/pc.c          | 120 +++++-------------------------------------
>>  hw/i386/pvh.c         | 113 +++++++++++++++++++++++++++++++++++++++
>>  hw/i386/pvh.h         |  10 ++++
>>  4 files changed, 136 insertions(+), 108 deletions(-)
>>  create mode 100644 hw/i386/pvh.c
>>  create mode 100644 hw/i386/pvh.h
>> 
>> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
>> index 5d9c9efd5f..c5f20bbd72 100644
>> --- a/hw/i386/Makefile.objs
>> +++ b/hw/i386/Makefile.objs
>> @@ -1,5 +1,6 @@
>>  obj-$(CONFIG_KVM) += kvm/
>>  obj-y += multiboot.o
>> +obj-y += pvh.o
>>  obj-y += pc.o
>>  obj-$(CONFIG_I440FX) += pc_piix.o
>>  obj-$(CONFIG_Q35) += pc_q35.o
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index bad866fe44..10e4ced0c6 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -42,6 +42,7 @@
>>  #include "elf.h"
>>  #include "migration/vmstate.h"
>>  #include "multiboot.h"
>> +#include "pvh.h"
>>  #include "hw/timer/mc146818rtc.h"
>>  #include "hw/dma/i8257.h"
>>  #include "hw/timer/i8254.h"
>> @@ -116,9 +117,6 @@ static struct e820_entry *e820_table;
>>  static unsigned e820_entries;
>>  struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
>>  
>> -/* Physical Address of PVH entry point read from kernel ELF NOTE */
>> -static size_t pvh_start_addr;
>> -
>>  GlobalProperty pc_compat_4_1[] = {};
>>  const size_t pc_compat_4_1_len = G_N_ELEMENTS(pc_compat_4_1);
>>  
>> @@ -1076,109 +1074,6 @@ struct setup_data {
>>      uint8_t data[0];
>>  } __attribute__((packed));
>>  
>> -
>> -/*
>> - * The entry point into the kernel for PVH boot is different from
>> - * the native entry point.  The PVH entry is defined by the x86/HVM
>> - * direct boot ABI and is available in an ELFNOTE in the kernel binary.
>> - *
>> - * This function is passed to load_elf() when it is called from
>> - * load_elfboot() which then additionally checks for an ELF Note of
>> - * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
>> - * parse the PVH entry address from the ELF Note.
>> - *
>> - * Due to trickery in elf_opts.h, load_elf() is actually available as
>> - * load_elf32() or load_elf64() and this routine needs to be able
>> - * to deal with being called as 32 or 64 bit.
>> - *
>> - * The address of the PVH entry point is saved to the 'pvh_start_addr'
>> - * global variable.  (although the entry point is 32-bit, the kernel
>> - * binary can be either 32-bit or 64-bit).
>> - */
>> -static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
>> -{
>> -    size_t *elf_note_data_addr;
>> -
>> -    /* Check if ELF Note header passed in is valid */
>> -    if (arg1 == NULL) {
>> -        return 0;
>> -    }
>> -
>> -    if (is64) {
>> -        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
>> -        uint64_t nhdr_size64 = sizeof(struct elf64_note);
>> -        uint64_t phdr_align = *(uint64_t *)arg2;
>> -        uint64_t nhdr_namesz = nhdr64->n_namesz;
>> -
>> -        elf_note_data_addr =
>> -            ((void *)nhdr64) + nhdr_size64 +
>> -            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
>> -    } else {
>> -        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
>> -        uint32_t nhdr_size32 = sizeof(struct elf32_note);
>> -        uint32_t phdr_align = *(uint32_t *)arg2;
>> -        uint32_t nhdr_namesz = nhdr32->n_namesz;
>> -
>> -        elf_note_data_addr =
>> -            ((void *)nhdr32) + nhdr_size32 +
>> -            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
>> -    }
>> -
>> -    pvh_start_addr = *elf_note_data_addr;
>> -
>> -    return pvh_start_addr;
>> -}
>> -
>> -static bool load_elfboot(const char *kernel_filename,
>> -                   int kernel_file_size,
>> -                   uint8_t *header,
>> -                   size_t pvh_xen_start_addr,
>> -                   FWCfgState *fw_cfg)
>> -{
>> -    uint32_t flags = 0;
>> -    uint32_t mh_load_addr = 0;
>> -    uint32_t elf_kernel_size = 0;
>> -    uint64_t elf_entry;
>> -    uint64_t elf_low, elf_high;
>> -    int kernel_size;
>> -
>
> Are we removing the following checks (ELF magic, flags) because they
> are superfluous?
>
> Should we mention this in the commit message?

Damn, good catch, that's wrong.

The only patches coming from previous iterations are the one factorizing
the e820 functions and this one, and both are wrong. I'm going to ditch
them and write whatever is needed from scratch.
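For reference, the checks dropped from load_elfboot() (the 0x464c457f magic, which is "\x7fELF" read as a little-endian 32-bit word, and the 0x00010004 e_flags mask) can be sketched as a standalone pre-check that a reworked pvh_load_elfboot() might run before calling load_elf(). This is a hypothetical sketch, not QEMU code: it parses the header bytes explicitly as little-endian instead of casting to Elf32_Ehdr/Elf64_Ehdr, and the e_flags offsets (36 for ELF32, 48 for ELF64) are taken from the standard ELF header layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ELFBOOT_MAGIC_LE   0x464c457fu  /* "\x7fELF" as a little-endian word */
#define ELFBOOT_EI_CLASS   4            /* index of the class byte in e_ident */
#define ELFBOOT_CLASS64    2            /* ELFCLASS64 */
#define ELFBOOT_BAD_FLAGS  0x00010004u  /* mask tested by the removed code */

typedef enum {
    ELFBOOT_NOT_ELF,    /* not an ELF image: try the next loader */
    ELFBOOT_OK,         /* safe to hand to load_elf() */
    ELFBOOT_REJECTED,   /* ELF, but with unsupported e_flags */
} ElfbootCheck;

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static ElfbootCheck elfboot_precheck(const uint8_t *header, size_t len)
{
    if (len < 4 || le32(header) != ELFBOOT_MAGIC_LE) {
        return ELFBOOT_NOT_ELF;
    }

    /* e_flags follows e_entry/e_phoff/e_shoff, whose width depends on the class. */
    bool is64 = len > ELFBOOT_EI_CLASS &&
                header[ELFBOOT_EI_CLASS] == ELFBOOT_CLASS64;
    size_t flags_off = is64 ? 48 : 36;
    if (len < flags_off + 4) {
        return ELFBOOT_NOT_ELF; /* truncated header */
    }

    return (le32(header + flags_off) & ELFBOOT_BAD_FLAGS) ?
           ELFBOOT_REJECTED : ELFBOOT_OK;
}
```

In the removed code a non-ELF file simply returned false, so the multiboot and Linux boot protocol paths could still be tried, while unsupported flags were fatal; the three-valued result keeps that distinction for the caller.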

>> -    if (ldl_p(header) != 0x464c457f) {
>> -        return false; /* no elfboot */
>> -    }
>> -
>> -    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
>> -    flags = elf_is64 ?
>> -        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
>> -
>> -    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
>> -        error_report("elfboot unsupported flags = %x", flags);
>> -        exit(1);
>> -    }
>> -
>> -    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
>> -    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
>> -                           NULL, &elf_note_type, &elf_entry,
>> -                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
>> -                           0, 0);
>> -
>> -    if (kernel_size < 0) {
>> -        error_report("Error while loading elf kernel");
>> -        exit(1);
>> -    }
>> -    mh_load_addr = elf_low;
>> -    elf_kernel_size = elf_high - elf_low;
>> -
>> -    if (pvh_start_addr == 0) {
>> -        error_report("Error loading uncompressed kernel without PVH ELF Note");
>> -        exit(1);
>> -    }
>> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
>> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
>> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
>> -
>> -    return true;
>> -}
>> -
>>  static void load_linux(PCMachineState *pcms,
>>                         FWCfgState *fw_cfg)
>>  {
>> @@ -1218,6 +1113,9 @@ static void load_linux(PCMachineState *pcms,
>>      if (ldl_p(header+0x202) == 0x53726448) {
>>          protocol = lduw_p(header+0x206);
>>      } else {
>> +        size_t pvh_start_addr;
>> +        uint32_t mh_load_addr = 0;
>> +        uint32_t elf_kernel_size = 0;
>>          /*
>>           * This could be a multiboot kernel. If it is, let's stop treating it
>>           * like a Linux kernel.
>> @@ -1235,10 +1133,16 @@ static void load_linux(PCMachineState *pcms,
>>           * If load_elfboot() is successful, populate the fw_cfg info.
>>           */
>>          if (pcmc->pvh_enabled &&
>> -            load_elfboot(kernel_filename, kernel_size,
>> -                         header, pvh_start_addr, fw_cfg)) {
>> +            pvh_load_elfboot(kernel_filename,
>> +                             &mh_load_addr, &elf_kernel_size)) {
>>              fclose(f);
>>  
>> +            pvh_start_addr = pvh_get_start_addr();
>> +
>> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
>> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
>> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
>> +
>>              fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
>>                  strlen(kernel_cmdline) + 1);
>>              fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
>> diff --git a/hw/i386/pvh.c b/hw/i386/pvh.c
>> new file mode 100644
>> index 0000000000..1c81727811
>> --- /dev/null
>> +++ b/hw/i386/pvh.c
>> @@ -0,0 +1,113 @@
>> +/*
>> + * PVH Boot Helper
>> + *
>> + * Copyright (C) 2019 Oracle
>> + * Copyright (C) 2019 Red Hat, Inc
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/units.h"
>> +#include "qemu/error-report.h"
>> +#include "hw/loader.h"
>> +#include "cpu.h"
>> +#include "elf.h"
>> +#include "pvh.h"
>> +
>> +static size_t pvh_start_addr;
>> +
>> +size_t pvh_get_start_addr(void)
>> +{
>> +    return pvh_start_addr;
>> +}
>> +
>> +/*
>> + * The entry point into the kernel for PVH boot is different from
>> + * the native entry point.  The PVH entry is defined by the x86/HVM
>> + * direct boot ABI and is available in an ELFNOTE in the kernel binary.
>> + *
>> + * This function is passed to load_elf() when it is called from
>> + * load_elfboot() which then additionally checks for an ELF Note of
>> + * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
>> + * parse the PVH entry address from the ELF Note.
>> + *
>> + * Due to trickery in elf_opts.h, load_elf() is actually available as
>> + * load_elf32() or load_elf64() and this routine needs to be able
>> + * to deal with being called as 32 or 64 bit.
>> + *
>> + * The address of the PVH entry point is saved to the 'pvh_start_addr'
>> + * global variable.  (although the entry point is 32-bit, the kernel
>> + * binary can be either 32-bit or 64-bit).
>> + */
>> +
>> +static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
>> +{
>> +    size_t *elf_note_data_addr;
>> +
>> +    /* Check if ELF Note header passed in is valid */
>> +    if (arg1 == NULL) {
>> +        return 0;
>> +    }
>> +
>> +    if (is64) {
>> +        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
>> +        uint64_t nhdr_size64 = sizeof(struct elf64_note);
>> +        uint64_t phdr_align = *(uint64_t *)arg2;
>> +        uint64_t nhdr_namesz = nhdr64->n_namesz;
>> +
>> +        elf_note_data_addr =
>> +            ((void *)nhdr64) + nhdr_size64 +
>> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
>> +    } else {
>> +        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
>> +        uint32_t nhdr_size32 = sizeof(struct elf32_note);
>> +        uint32_t phdr_align = *(uint32_t *)arg2;
>> +        uint32_t nhdr_namesz = nhdr32->n_namesz;
>> +
>> +        elf_note_data_addr =
>> +            ((void *)nhdr32) + nhdr_size32 +
>> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
>> +    }
>> +
>> +    pvh_start_addr = *elf_note_data_addr;
>> +
>> +    return pvh_start_addr;
>> +}
>> +
>> +bool pvh_load_elfboot(const char *kernel_filename,
>> +                      uint32_t *mh_load_addr,
>> +                      uint32_t *elf_kernel_size)
>> +{
>> +    uint64_t elf_entry;
>> +    uint64_t elf_low, elf_high;
>> +    int kernel_size;
>> +    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
>> +
>> +    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
>> +                           NULL, &elf_note_type, &elf_entry,
>> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
>> +                           0, 0);
>> +
>> +    if (kernel_size < 0) {
>> +        error_report("Error while loading elf kernel");
>> +        return false;
>> +    }
>> +
>> +    if (pvh_start_addr == 0) {
>> +        error_report("Error loading uncompressed kernel without PVH ELF Note");
>> +        return false;
>> +    }
>> +
>> +    if (mh_load_addr) {
>> +        *mh_load_addr = elf_low;
>> +    }
>> +
>> +    if (elf_kernel_size) {
>> +        *elf_kernel_size = elf_high - elf_low;
>> +    }
>> +
>> +    return true;
>> +}
>> diff --git a/hw/i386/pvh.h b/hw/i386/pvh.h
>> new file mode 100644
>> index 0000000000..ada67ff6e8
>> --- /dev/null
>> +++ b/hw/i386/pvh.h
>> @@ -0,0 +1,10 @@
>> +#ifndef HW_I386_PVH_H
>> +#define HW_I386_PVH_H
>> +
>> +size_t pvh_get_start_addr(void);
>
> What about adding "size_t *pvh_start_addr" to the pvh_load_elfboot()?
> Just an idea, I'm not sure if it is better...

I agree. In fact, given that patch 4/8 extracts some common functions
from pc.c into x86.c, and load_linux is among these functions, perhaps
we can avoid creating an independent file and just put the PVH code
there.

What do you think?
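Stefano's suggestion of returning the entry point through an out parameter, instead of a file-scope global read back via pvh_get_start_addr(), could look roughly like the sketch below. It is a hypothetical, self-contained rewrite of the read_pvh_start_addr() offset computation (note header, then the name padded to the segment alignment, then the descriptor), not QEMU code. The function name, the explicit little-endian parsing, the fixed 12-byte note header (three 32-bit words, the same for ELF32 and ELF64) and the value 18 for XEN_ELFNOTE_PHYS32_ENTRY (as in Xen's public elfnote.h) are assumptions of the sketch. It reads the entry as a 32-bit value because, as the comment in pvh.c notes, the PVH entry point is 32-bit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NOTE_HDR_SIZE 12u /* n_namesz, n_descsz, n_type: three 32-bit words */
#define XEN_ELFNOTE_PHYS32_ENTRY_SKETCH 18u /* assumed, from Xen's elfnote.h */

static uint32_t note_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Hypothetical parser: if 'note' (len bytes) is a PHYS32_ENTRY note, store
 * the PVH entry point in *entry and return true. 'align' is the PT_NOTE
 * segment alignment, used to pad the name as read_pvh_start_addr() does.
 */
static bool pvh_parse_entry_note(const uint8_t *note, size_t len,
                                 size_t align, uint32_t *entry)
{
    if (note == NULL || entry == NULL || align == 0 || len < NOTE_HDR_SIZE) {
        return false;
    }

    uint32_t namesz = note_le32(note);
    uint32_t descsz = note_le32(note + 4);
    uint32_t type   = note_le32(note + 8);
    if (type != XEN_ELFNOTE_PHYS32_ENTRY_SKETCH || descsz < 4) {
        return false;
    }

    /* Descriptor offset: header + name rounded up to 'align'. */
    size_t desc_off = NOTE_HDR_SIZE +
                      ((size_t)namesz + align - 1) / align * align;
    if (desc_off > len || len - desc_off < 4) {
        return false;
    }

    *entry = note_le32(note + desc_off);
    /* A zero entry is treated as "no PVH note", as pvh_load_elfboot() does. */
    return *entry != 0;
}
```

Returning the value through *entry keeps the parser free of hidden state between calls; how the result would be threaded back through load_elf()'s note callback is left open here.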

Thanks a lot,
Sergio.

>> +
>> +bool pvh_load_elfboot(const char *kernel_filename,
>> +                      uint32_t *mh_load_addr,
>> +                      uint32_t *elf_kernel_size);
>> +
>> +#endif
>
> Thanks,
> Stefano



>> +#include "qemu/error-report.h"
>> +#include "hw/loader.h"
>> +#include "cpu.h"
>> +#include "elf.h"
>> +#include "pvh.h"
>> +
>> +static size_t pvh_start_addr;
>> +
>> +size_t pvh_get_start_addr(void)
>> +{
>> +    return pvh_start_addr;
>> +}
>> +
>> +/*
>> + * The entry point into the kernel for PVH boot is different from
>> + * the native entry point.  The PVH entry is defined by the x86/HVM
>> + * direct boot ABI and is available in an ELFNOTE in the kernel binary.
>> + *
>> + * This function is passed to load_elf() when it is called from
>> + * load_elfboot() which then additionally checks for an ELF Note of
>> + * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
>> + * parse the PVH entry address from the ELF Note.
>> + *
>> + * Due to trickery in elf_opts.h, load_elf() is actually available as
>> + * load_elf32() or load_elf64() and this routine needs to be able
>> + * to deal with being called as 32 or 64 bit.
>> + *
>> + * The address of the PVH entry point is saved to the 'pvh_start_addr'
>> + * global variable.  (although the entry point is 32-bit, the kernel
>> + * binary can be either 32-bit or 64-bit).
>> + */
>> +
>> +static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
>> +{
>> +    size_t *elf_note_data_addr;
>> +
>> +    /* Check if ELF Note header passed in is valid */
>> +    if (arg1 == NULL) {
>> +        return 0;
>> +    }
>> +
>> +    if (is64) {
>> +        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
>> +        uint64_t nhdr_size64 = sizeof(struct elf64_note);
>> +        uint64_t phdr_align = *(uint64_t *)arg2;
>> +        uint64_t nhdr_namesz = nhdr64->n_namesz;
>> +
>> +        elf_note_data_addr =
>> +            ((void *)nhdr64) + nhdr_size64 +
>> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
>> +    } else {
>> +        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
>> +        uint32_t nhdr_size32 = sizeof(struct elf32_note);
>> +        uint32_t phdr_align = *(uint32_t *)arg2;
>> +        uint32_t nhdr_namesz = nhdr32->n_namesz;
>> +
>> +        elf_note_data_addr =
>> +            ((void *)nhdr32) + nhdr_size32 +
>> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
>> +    }
>> +
>> +    pvh_start_addr = *elf_note_data_addr;
>> +
>> +    return pvh_start_addr;
>> +}
>> +
>> +bool pvh_load_elfboot(const char *kernel_filename,
>> +                      uint32_t *mh_load_addr,
>> +                      uint32_t *elf_kernel_size)
>> +{
>> +    uint64_t elf_entry;
>> +    uint64_t elf_low, elf_high;
>> +    int kernel_size;
>> +    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
>> +
>> +    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
>> +                           NULL, &elf_note_type, &elf_entry,
>> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
>> +                           0, 0);
>> +
>> +    if (kernel_size < 0) {
>> +        error_report("Error while loading elf kernel");
>> +        return false;
>> +    }
>> +
>> +    if (pvh_start_addr == 0) {
>> +        error_report("Error loading uncompressed kernel without PVH ELF Note");
>> +        return false;
>> +    }
>> +
>> +    if (mh_load_addr) {
>> +        *mh_load_addr = elf_low;
>> +    }
>> +
>> +    if (elf_kernel_size) {
>> +        *elf_kernel_size = elf_high - elf_low;
>> +    }
>> +
>> +    return true;
>> +}
>> diff --git a/hw/i386/pvh.h b/hw/i386/pvh.h
>> new file mode 100644
>> index 0000000000..ada67ff6e8
>> --- /dev/null
>> +++ b/hw/i386/pvh.h
>> @@ -0,0 +1,10 @@
>> +#ifndef HW_I386_PVH_H
>> +#define HW_I386_PVH_H
>> +
>> +size_t pvh_get_start_addr(void);
>
> What about adding "size_t *pvh_start_addr" to the pvh_load_elfboot()?
> Just an idea, I'm not sure if it is better...

I agree. In fact, given that patch 4/8 extracts some common functions
from pc.c into x86.c, and load_linux is among these functions, perhaps
we can avoid creating an independent file and just put the PVH code
there.

What do you think?
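
The out-parameter suggestion above can be sketched independently of QEMU. This sketch assumes the note layout that `read_pvh_start_addr()` walks: a header of three 32-bit words, then the name padded to the given alignment, then a descriptor holding the 32-bit entry. Under that assumption, a hypothetical `parse_pvh_note()` could return the PVH entry through a pointer instead of the file-scope `pvh_start_addr`. The names are illustrative, not QEMU API. The `memcpy()` loads assume a little-endian host reading an x86 ELF.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Xen ELF note type that carries the 32-bit PVH entry point. */
#define XEN_ELFNOTE_PHYS32_ENTRY 18

/* Note header: three 32-bit words in both ELF32 and ELF64 files. */
struct note_hdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

static size_t align_up(size_t v, size_t align)
{
    return (v + align - 1) / align * align;
}

/*
 * Hypothetical parser: reports the PVH entry through @entry rather than a
 * file-scope global. Returns false if @align is zero, the note has the
 * wrong type, or the buffer is too short for the header or descriptor.
 */
static bool parse_pvh_note(const uint8_t *note, size_t len, size_t align,
                           uint32_t *entry)
{
    struct note_hdr hdr;
    size_t desc_off;

    if (len < sizeof(hdr) || align == 0) {
        return false;
    }
    memcpy(&hdr, note, sizeof(hdr));
    if (hdr.n_type != XEN_ELFNOTE_PHYS32_ENTRY || hdr.n_descsz < 4) {
        return false;
    }
    /* The descriptor follows the header and the name padded to @align. */
    desc_off = sizeof(hdr) + align_up(hdr.n_namesz, align);
    if (desc_off > len || len - desc_off < sizeof(*entry)) {
        return false;
    }
    memcpy(entry, note + desc_off, sizeof(*entry));
    return true;
}
```

As quoted above, the `load_elf()` callback receives only the note and its alignment, with no caller-supplied context. For that reason the sketch parses a note buffer directly rather than plugging into `load_elf()`.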

Thanks a lot,
Sergio.

>> +
>> +bool pvh_load_elfboot(const char *kernel_filename,
>> +                      uint32_t *mh_load_addr,
>> +                      uint32_t *elf_kernel_size);
>> +
>> +#endif
>
> Thanks,
> Stefano

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-25  8:26       ` Paolo Bonzini
@ 2019-09-25  9:12         ` Gerd Hoffmann
  -1 siblings, 0 replies; 133+ messages in thread
From: Gerd Hoffmann @ 2019-09-25  9:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sergio Lopez, David Hildenbrand, qemu-devel, mst, imammedo,
	marcel.apfelbaum, rth, ehabkost, philmd, lersek, mtosatti, kvm,
	Pankaj Gupta

  Hi,

> If you want to add hotplug to microvm, you can reuse the existing code
> for CPU and memory hotplug controllers, and write drivers for them in
> Linux's drivers/platform.  The drivers would basically do what the ACPI
> AML tells the interpreter to do.

How would the Linux kernel detect those devices?

I guess that wouldn't be ACPI; everyone seems to want to avoid it[1].

So device tree on x86?  Something else?

cheers,
  Gerd

[1] It's not clear to me why; some minimal ACPI tables listing our
    devices (isa-serial, fw_cfg, ...) don't look unreasonable
    to me.  We could also make virtio-mmio discoverable that way.
    Also we could do ACPI CPU hotplug without having to write those
    Linux platform drivers.  We would need a sysbus-acpi device though,
    but given that most ACPI code is already separated out so that piix
    and q35 can share it, it should not be that hard to wire up.
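
For context on what "discoverable" would replace: without ACPI or a device tree, the cover letter's `microvm.kernel-cmdline` option adds virtio-mmio devices to the guest kernel command line. That presumably uses Linux's `virtio_mmio.device=<size>@<baseaddr>:<irq>` parameter, which is available with `CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES`. The rough sketch below appends one such entry. The helper name is hypothetical, and the size, base address and IRQ used in the test are illustrative values, not taken from the microvm patches.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: appends " virtio_mmio.device=<size>@<base>:<irq>"
 * to the NUL-terminated @cmdline held in a buffer of @cap bytes.
 * Returns false if the entry did not fit (the output is then truncated).
 */
static bool append_virtio_mmio_arg(char *cmdline, size_t cap, uint64_t size,
                                   uint64_t base, unsigned irq)
{
    size_t used = strlen(cmdline);
    int n = snprintf(cmdline + used, cap - used,
                     " virtio_mmio.device=%" PRIu64 "@0x%" PRIx64 ":%u",
                     size, base, irq);

    return n >= 0 && (size_t)n < cap - used;
}
```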

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-25  8:40           ` Sergio Lopez
@ 2019-09-25  9:22             ` Paolo Bonzini
  -1 siblings, 0 replies; 133+ messages in thread
From: Paolo Bonzini @ 2019-09-25  9:22 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm

On 25/09/19 10:40, Sergio Lopez wrote:
>>> We need the PIT for non-KVM accel (if present with KVM and
>>> kernel_irqchip_split = off, it basically becomes a placeholder)
>> Why?
> 
> Perhaps I'm missing something. Is some other device supposed to be
> acting as a HW timer while running with TCG acceleration?

Sure, the LAPIC timer.  I wonder if Linux, however, wants to use the PIT
in order to calibrate the LAPIC timer if TSC deadline mode is unavailable.
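
The calibration mentioned above is, at its core, a ratio. Measure how far the LAPIC timer advances while a reference of known frequency (the i8254 PIT, clocked at 1193182 Hz) counts a number of ticks, then scale. A minimal sketch of that arithmetic follows. The function name is hypothetical, and Linux's actual calibration code is considerably more involved.

```c
#include <stdint.h>

/* Input clock of the i8254 PIT, in Hz. */
#define PIT_HZ 1193182u

/*
 * Estimates the LAPIC timer frequency in Hz: the LAPIC counter advanced by
 * @lapic_ticks while the PIT counted @pit_ticks. Returns 0 if no reference
 * time elapsed.
 */
static uint64_t lapic_hz_from_pit(uint64_t lapic_ticks, uint64_t pit_ticks)
{
    if (pit_ticks == 0) {
        return 0;
    }
    return lapic_ticks * PIT_HZ / pit_ticks;
}
```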

>>> and the PIC for both the PIT and the ISA serial port.
>>
>> Can't the ISA serial port work with the IOAPIC?
> 
> Hm... I'm not sure. I wanted to give it a try, but then noticed that
> multiple places in the code (like hw/intc/apic.c:560) do expect to have
> an ISA PIC present through the isa_pic global variable.
> 
> I guess we should be able to work around this, but I'm not sure if it's
> really worth it. What do you think?

You can add a paragraph saying that in the future the list could be
reduced further.  I think that the direction we want to go is to only
leave the IOAPIC around (the ISA devices in this respect are no
different from the virtio-mmio devices).

But you're right about isa_pic.  I wonder if it's as easy as this:

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index bce89911dc..5d03e48a19 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -610,7 +610,7 @@ int apic_accept_pic_intr(DeviceState *dev)

     if ((s->apicbase & MSR_IA32_APICBASE_ENABLE) == 0 ||
         (lvt0 & APIC_LVT_MASKED) == 0)
-        return 1;
+        return isa_pic != NULL;

     return 0;
 }

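Stripped of QEMU's device state, the check in the diff above reduces to a small predicate. The sketch below is a standalone model, not QEMU code. `accept_pic_intr()` and its `have_pic` flag (standing in for `isa_pic != NULL`) are hypothetical. The bit positions are the architectural ones: the IA32_APIC_BASE global enable is bit 11, and the LVT mask is bit 16.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_APIC_BASE MSR bit 11: LAPIC globally enabled. */
#define APICBASE_ENABLE (1u << 11)
/* LVT entry bit 16: interrupt masked. */
#define LVT_MASKED (1u << 16)

/*
 * Model of the patched apic_accept_pic_intr(): PIC interrupts may reach
 * the CPU when the LAPIC is globally disabled or LINT0 is unmasked, but
 * with the change only if a PIC actually exists.
 */
static int accept_pic_intr(uint64_t apicbase, uint32_t lvt0, bool have_pic)
{
    if ((apicbase & APICBASE_ENABLE) == 0 || (lvt0 & LVT_MASKED) == 0) {
        return have_pic;
    }
    return 0;
}
```
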
Thanks,

Paolo

^ permalink raw reply related	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-25  9:12         ` Gerd Hoffmann
@ 2019-09-25  9:29           ` Paolo Bonzini
  -1 siblings, 0 replies; 133+ messages in thread
From: Paolo Bonzini @ 2019-09-25  9:29 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Sergio Lopez, David Hildenbrand, qemu-devel, mst, imammedo,
	marcel.apfelbaum, rth, ehabkost, philmd, lersek, mtosatti, kvm,
	Pankaj Gupta

On 25/09/19 11:12, Gerd Hoffmann wrote:
>   Hi,
> 
>> If you want to add hotplug to microvm, you can reuse the existing code
>> for CPU and memory hotplug controllers, and write drivers for them in
>> Linux's drivers/platform.  The drivers would basically do what the ACPI
>> AML tells the interpreter to do.
> 
> How would the Linux kernel detect those devices?
>
> I guess that wouldn't be ACPI; everyone seems to want to avoid it[1].
> 
> So device tree on x86?  Something else?

Yes, device tree would be great.

> [1] It's not clear to me why; some minimal ACPI tables listing our
>     devices (isa-serial, fw_cfg, ...) don't look unreasonable
>     to me.

It's not, but ACPI is dog slow and half of the boot time is cut if you
remove it.

> We could also make virtio-mmio discoverable that way.

True, but the simplest way to plumb virtio-mmio into ACPI would be
taking the device tree properties and representing them as _DSD[1].  So
at this point it's just as easy to use the device tree directly.

Paolo

[1] https://kernel-recipes.org/en/2015/talks/representing-device-tree-peripherals-in-acpi/

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 1/8] hw/i386: Factorize PVH related functions
  2019-09-25  9:00       ` Sergio Lopez
@ 2019-09-25  9:29         ` Stefano Garzarella
  -1 siblings, 0 replies; 133+ messages in thread
From: Stefano Garzarella @ 2019-09-25  9:29 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, pbonzini, rth,
	ehabkost, philmd, lersek, kraxel, mtosatti, kvm

On Wed, Sep 25, 2019 at 11:00:30AM +0200, Sergio Lopez wrote:
> Stefano Garzarella <sgarzare@redhat.com> writes:
> > Hi Sergio,
> >
> > On Tue, Sep 24, 2019 at 02:44:26PM +0200, Sergio Lopez wrote:
> >> Extract PVH related functions from pc.c, and put them in pvh.c, so
> >> they can be shared with other components.
> >> 
> >> Signed-off-by: Sergio Lopez <slp@redhat.com>
> >> ---
> >>  hw/i386/Makefile.objs |   1 +
> >>  hw/i386/pc.c          | 120 +++++-------------------------------------
> >>  hw/i386/pvh.c         | 113 +++++++++++++++++++++++++++++++++++++++
> >>  hw/i386/pvh.h         |  10 ++++
> >>  4 files changed, 136 insertions(+), 108 deletions(-)
> >>  create mode 100644 hw/i386/pvh.c
> >>  create mode 100644 hw/i386/pvh.h
> >> 
> >> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> >> index 5d9c9efd5f..c5f20bbd72 100644
> >> --- a/hw/i386/Makefile.objs
> >> +++ b/hw/i386/Makefile.objs
> >> @@ -1,5 +1,6 @@
> >>  obj-$(CONFIG_KVM) += kvm/
> >>  obj-y += multiboot.o
> >> +obj-y += pvh.o
> >>  obj-y += pc.o
> >>  obj-$(CONFIG_I440FX) += pc_piix.o
> >>  obj-$(CONFIG_Q35) += pc_q35.o
> >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >> index bad866fe44..10e4ced0c6 100644
> >> --- a/hw/i386/pc.c
> >> +++ b/hw/i386/pc.c
> >> @@ -42,6 +42,7 @@
> >>  #include "elf.h"
> >>  #include "migration/vmstate.h"
> >>  #include "multiboot.h"
> >> +#include "pvh.h"
> >>  #include "hw/timer/mc146818rtc.h"
> >>  #include "hw/dma/i8257.h"
> >>  #include "hw/timer/i8254.h"
> >> @@ -116,9 +117,6 @@ static struct e820_entry *e820_table;
> >>  static unsigned e820_entries;
> >>  struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
> >>  
> >> -/* Physical Address of PVH entry point read from kernel ELF NOTE */
> >> -static size_t pvh_start_addr;
> >> -
> >>  GlobalProperty pc_compat_4_1[] = {};
> >>  const size_t pc_compat_4_1_len = G_N_ELEMENTS(pc_compat_4_1);
> >>  
> >> @@ -1076,109 +1074,6 @@ struct setup_data {
> >>      uint8_t data[0];
> >>  } __attribute__((packed));
> >>  
> >> -
> >> -/*
> >> - * The entry point into the kernel for PVH boot is different from
> >> - * the native entry point.  The PVH entry is defined by the x86/HVM
> >> - * direct boot ABI and is available in an ELFNOTE in the kernel binary.
> >> - *
> >> - * This function is passed to load_elf() when it is called from
> >> - * load_elfboot() which then additionally checks for an ELF Note of
> >> - * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
> >> - * parse the PVH entry address from the ELF Note.
> >> - *
> >> - * Due to trickery in elf_opts.h, load_elf() is actually available as
> >> - * load_elf32() or load_elf64() and this routine needs to be able
> >> - * to deal with being called as 32 or 64 bit.
> >> - *
> >> - * The address of the PVH entry point is saved to the 'pvh_start_addr'
> >> - * global variable.  (although the entry point is 32-bit, the kernel
> >> - * binary can be either 32-bit or 64-bit).
> >> - */
> >> -static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
> >> -{
> >> -    size_t *elf_note_data_addr;
> >> -
> >> -    /* Check if ELF Note header passed in is valid */
> >> -    if (arg1 == NULL) {
> >> -        return 0;
> >> -    }
> >> -
> >> -    if (is64) {
> >> -        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
> >> -        uint64_t nhdr_size64 = sizeof(struct elf64_note);
> >> -        uint64_t phdr_align = *(uint64_t *)arg2;
> >> -        uint64_t nhdr_namesz = nhdr64->n_namesz;
> >> -
> >> -        elf_note_data_addr =
> >> -            ((void *)nhdr64) + nhdr_size64 +
> >> -            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> >> -    } else {
> >> -        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
> >> -        uint32_t nhdr_size32 = sizeof(struct elf32_note);
> >> -        uint32_t phdr_align = *(uint32_t *)arg2;
> >> -        uint32_t nhdr_namesz = nhdr32->n_namesz;
> >> -
> >> -        elf_note_data_addr =
> >> -            ((void *)nhdr32) + nhdr_size32 +
> >> -            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> >> -    }
> >> -
> >> -    pvh_start_addr = *elf_note_data_addr;
> >> -
> >> -    return pvh_start_addr;
> >> -}
> >> -
> >> -static bool load_elfboot(const char *kernel_filename,
> >> -                   int kernel_file_size,
> >> -                   uint8_t *header,
> >> -                   size_t pvh_xen_start_addr,
> >> -                   FWCfgState *fw_cfg)
> >> -{
> >> -    uint32_t flags = 0;
> >> -    uint32_t mh_load_addr = 0;
> >> -    uint32_t elf_kernel_size = 0;
> >> -    uint64_t elf_entry;
> >> -    uint64_t elf_low, elf_high;
> >> -    int kernel_size;
> >> -
> >
> > Are we removing the following checks (ELF magic, flags) because they
> > are superfluous?
> >
> > Should we mention this in the commit message?
> 
> Damn, good catch, that's wrong.
> 
> The only patches coming from previous iterations are the one factorizing
> the e820 functions and this one, and both are wrong. I'm going to ditch
> them and write whatever is needed from scratch.
> 
> >> -    if (ldl_p(header) != 0x464c457f) {
> >> -        return false; /* no elfboot */
> >> -    }
> >> -
> >> -    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
> >> -    flags = elf_is64 ?
> >> -        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
> >> -
> >> -    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
> >> -        error_report("elfboot unsupported flags = %x", flags);
> >> -        exit(1);
> >> -    }
> >> -
> >> -    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
> >> -    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
> >> -                           NULL, &elf_note_type, &elf_entry,
> >> -                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
> >> -                           0, 0);
> >> -
> >> -    if (kernel_size < 0) {
> >> -        error_report("Error while loading elf kernel");
> >> -        exit(1);
> >> -    }
> >> -    mh_load_addr = elf_low;
> >> -    elf_kernel_size = elf_high - elf_low;
> >> -
> >> -    if (pvh_start_addr == 0) {
> >> -        error_report("Error loading uncompressed kernel without PVH ELF Note");
> >> -        exit(1);
> >> -    }
> >> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
> >> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> >> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> >> -
> >> -    return true;
> >> -}
> >> -
> >>  static void load_linux(PCMachineState *pcms,
> >>                         FWCfgState *fw_cfg)
> >>  {
> >> @@ -1218,6 +1113,9 @@ static void load_linux(PCMachineState *pcms,
> >>      if (ldl_p(header+0x202) == 0x53726448) {
> >>          protocol = lduw_p(header+0x206);
> >>      } else {
> >> +        size_t pvh_start_addr;
> >> +        uint32_t mh_load_addr = 0;
> >> +        uint32_t elf_kernel_size = 0;
> >>          /*
> >>           * This could be a multiboot kernel. If it is, let's stop treating it
> >>           * like a Linux kernel.
> >> @@ -1235,10 +1133,16 @@ static void load_linux(PCMachineState *pcms,
> >>           * If load_elfboot() is successful, populate the fw_cfg info.
> >>           */
> >>          if (pcmc->pvh_enabled &&
> >> -            load_elfboot(kernel_filename, kernel_size,
> >> -                         header, pvh_start_addr, fw_cfg)) {
> >> +            pvh_load_elfboot(kernel_filename,
> >> +                             &mh_load_addr, &elf_kernel_size)) {
> >>              fclose(f);
> >>  
> >> +            pvh_start_addr = pvh_get_start_addr();
> >> +
> >> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
> >> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> >> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> >> +
> >>              fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
> >>                  strlen(kernel_cmdline) + 1);
> >>              fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> >> diff --git a/hw/i386/pvh.c b/hw/i386/pvh.c
> >> new file mode 100644
> >> index 0000000000..1c81727811
> >> --- /dev/null
> >> +++ b/hw/i386/pvh.c
> >> @@ -0,0 +1,113 @@
> >> +/*
> >> + * PVH Boot Helper
> >> + *
> >> + * Copyright (C) 2019 Oracle
> >> + * Copyright (C) 2019 Red Hat, Inc
> >> + *
> >> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >> + * See the COPYING file in the top-level directory.
> >> + *
> >> + */
> >> +
> >> +#include "qemu/osdep.h"
> >> +#include "qemu/units.h"
> >> +#include "qemu/error-report.h"
> >> +#include "hw/loader.h"
> >> +#include "cpu.h"
> >> +#include "elf.h"
> >> +#include "pvh.h"
> >> +
> >> +static size_t pvh_start_addr;
> >> +
> >> +size_t pvh_get_start_addr(void)
> >> +{
> >> +    return pvh_start_addr;
> >> +}
> >> +
> >> +/*
> >> + * The entry point into the kernel for PVH boot is different from
> >> + * the native entry point.  The PVH entry is defined by the x86/HVM
> >> + * direct boot ABI and is available in an ELFNOTE in the kernel binary.
> >> + *
> >> + * This function is passed to load_elf() when it is called from
> >> + * load_elfboot() which then additionally checks for an ELF Note of
> >> + * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
> >> + * parse the PVH entry address from the ELF Note.
> >> + *
> >> + * Due to trickery in elf_opts.h, load_elf() is actually available as
> >> + * load_elf32() or load_elf64() and this routine needs to be able
> >> + * to deal with being called as 32 or 64 bit.
> >> + *
> >> + * The address of the PVH entry point is saved to the 'pvh_start_addr'
> >> + * global variable.  (although the entry point is 32-bit, the kernel
> >> + * binary can be either 32-bit or 64-bit).
> >> + */
> >> +
> >> +static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
> >> +{
> >> +    size_t *elf_note_data_addr;
> >> +
> >> +    /* Check if ELF Note header passed in is valid */
> >> +    if (arg1 == NULL) {
> >> +        return 0;
> >> +    }
> >> +
> >> +    if (is64) {
> >> +        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
> >> +        uint64_t nhdr_size64 = sizeof(struct elf64_note);
> >> +        uint64_t phdr_align = *(uint64_t *)arg2;
> >> +        uint64_t nhdr_namesz = nhdr64->n_namesz;
> >> +
> >> +        elf_note_data_addr =
> >> +            ((void *)nhdr64) + nhdr_size64 +
> >> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> >> +    } else {
> >> +        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
> >> +        uint32_t nhdr_size32 = sizeof(struct elf32_note);
> >> +        uint32_t phdr_align = *(uint32_t *)arg2;
> >> +        uint32_t nhdr_namesz = nhdr32->n_namesz;
> >> +
> >> +        elf_note_data_addr =
> >> +            ((void *)nhdr32) + nhdr_size32 +
> >> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> >> +    }
> >> +
> >> +    pvh_start_addr = *elf_note_data_addr;
> >> +
> >> +    return pvh_start_addr;
> >> +}
> >> +
> >> +bool pvh_load_elfboot(const char *kernel_filename,
> >> +                      uint32_t *mh_load_addr,
> >> +                      uint32_t *elf_kernel_size)
> >> +{
> >> +    uint64_t elf_entry;
> >> +    uint64_t elf_low, elf_high;
> >> +    int kernel_size;
> >> +    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
> >> +
> >> +    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
> >> +                           NULL, &elf_note_type, &elf_entry,
> >> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
> >> +                           0, 0);
> >> +
> >> +    if (kernel_size < 0) {
> >> +        error_report("Error while loading elf kernel");
> >> +        return false;
> >> +    }
> >> +
> >> +    if (pvh_start_addr == 0) {
> >> +        error_report("Error loading uncompressed kernel without PVH ELF Note");
> >> +        return false;
> >> +    }
> >> +
> >> +    if (mh_load_addr) {
> >> +        *mh_load_addr = elf_low;
> >> +    }
> >> +
> >> +    if (elf_kernel_size) {
> >> +        *elf_kernel_size = elf_high - elf_low;
> >> +    }
> >> +
> >> +    return true;
> >> +}
> >> diff --git a/hw/i386/pvh.h b/hw/i386/pvh.h
> >> new file mode 100644
> >> index 0000000000..ada67ff6e8
> >> --- /dev/null
> >> +++ b/hw/i386/pvh.h
> >> @@ -0,0 +1,10 @@
> >> +#ifndef HW_I386_PVH_H
> >> +#define HW_I386_PVH_H
> >> +
> >> +size_t pvh_get_start_addr(void);
> >
> > What about adding "size_t *pvh_start_addr" to the pvh_load_elfboot()?
> > Just an idea, I'm not sure if it is better...
> 
> I agree. In fact, given that patch 4/8 extracts some common functions
> from pc.c into x86.c, and load_linux is among these functions, perhaps
> we can avoid creating an independent file and just put the PVH code
> there.
> 
> What do you think?

Makes sense to me, since it's going to be used by both pc and microvm.

Thanks,
Stefano

^ permalink raw reply	[flat|nested] 133+ messages in thread

> >> -        uint32_t nhdr_namesz = nhdr32->n_namesz;
> >> -
> >> -        elf_note_data_addr =
> >> -            ((void *)nhdr32) + nhdr_size32 +
> >> -            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> >> -    }
> >> -
> >> -    pvh_start_addr = *elf_note_data_addr;
> >> -
> >> -    return pvh_start_addr;
> >> -}
> >> -
> >> -static bool load_elfboot(const char *kernel_filename,
> >> -                   int kernel_file_size,
> >> -                   uint8_t *header,
> >> -                   size_t pvh_xen_start_addr,
> >> -                   FWCfgState *fw_cfg)
> >> -{
> >> -    uint32_t flags = 0;
> >> -    uint32_t mh_load_addr = 0;
> >> -    uint32_t elf_kernel_size = 0;
> >> -    uint64_t elf_entry;
> >> -    uint64_t elf_low, elf_high;
> >> -    int kernel_size;
> >> -
> >
> > Are we removing the following checks (ELF magic, flags) because they
> > are superfluous?
> >
> > Should we mention this in the commit message?
> 
> Damn, good catch, that's wrong.
> 
> The only patches coming from previous iterations are the one factoring out
> the e820 functions and this one, and both are wrong. I'm going to ditch
> them and write whatever is needed from scratch.
> 
> >> -    if (ldl_p(header) != 0x464c457f) {
> >> -        return false; /* no elfboot */
> >> -    }
> >> -
> >> -    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
> >> -    flags = elf_is64 ?
> >> -        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
> >> -
> >> -    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
> >> -        error_report("elfboot unsupported flags = %x", flags);
> >> -        exit(1);
> >> -    }
> >> -
> >> -    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
> >> -    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
> >> -                           NULL, &elf_note_type, &elf_entry,
> >> -                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
> >> -                           0, 0);
> >> -
> >> -    if (kernel_size < 0) {
> >> -        error_report("Error while loading elf kernel");
> >> -        exit(1);
> >> -    }
> >> -    mh_load_addr = elf_low;
> >> -    elf_kernel_size = elf_high - elf_low;
> >> -
> >> -    if (pvh_start_addr == 0) {
> >> -        error_report("Error loading uncompressed kernel without PVH ELF Note");
> >> -        exit(1);
> >> -    }
> >> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
> >> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> >> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> >> -
> >> -    return true;
> >> -}
> >> -
> >>  static void load_linux(PCMachineState *pcms,
> >>                         FWCfgState *fw_cfg)
> >>  {
> >> @@ -1218,6 +1113,9 @@ static void load_linux(PCMachineState *pcms,
> >>      if (ldl_p(header+0x202) == 0x53726448) {
> >>          protocol = lduw_p(header+0x206);
> >>      } else {
> >> +        size_t pvh_start_addr;
> >> +        uint32_t mh_load_addr = 0;
> >> +        uint32_t elf_kernel_size = 0;
> >>          /*
> >>           * This could be a multiboot kernel. If it is, let's stop treating it
> >>           * like a Linux kernel.
> >> @@ -1235,10 +1133,16 @@ static void load_linux(PCMachineState *pcms,
> >>           * If load_elfboot() is successful, populate the fw_cfg info.
> >>           */
> >>          if (pcmc->pvh_enabled &&
> >> -            load_elfboot(kernel_filename, kernel_size,
> >> -                         header, pvh_start_addr, fw_cfg)) {
> >> +            pvh_load_elfboot(kernel_filename,
> >> +                             &mh_load_addr, &elf_kernel_size)) {
> >>              fclose(f);
> >>  
> >> +            pvh_start_addr = pvh_get_start_addr();
> >> +
> >> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
> >> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> >> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> >> +
> >>              fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
> >>                  strlen(kernel_cmdline) + 1);
> >>              fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> >> diff --git a/hw/i386/pvh.c b/hw/i386/pvh.c
> >> new file mode 100644
> >> index 0000000000..1c81727811
> >> --- /dev/null
> >> +++ b/hw/i386/pvh.c
> >> @@ -0,0 +1,113 @@
> >> +/*
> >> + * PVH Boot Helper
> >> + *
> >> + * Copyright (C) 2019 Oracle
> >> + * Copyright (C) 2019 Red Hat, Inc
> >> + *
> >> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >> + * See the COPYING file in the top-level directory.
> >> + *
> >> + */
> >> +
> >> +#include "qemu/osdep.h"
> >> +#include "qemu/units.h"
> >> +#include "qemu/error-report.h"
> >> +#include "hw/loader.h"
> >> +#include "cpu.h"
> >> +#include "elf.h"
> >> +#include "pvh.h"
> >> +
> >> +static size_t pvh_start_addr;
> >> +
> >> +size_t pvh_get_start_addr(void)
> >> +{
> >> +    return pvh_start_addr;
> >> +}
> >> +
> >> +/*
> >> + * The entry point into the kernel for PVH boot is different from
> >> + * the native entry point.  The PVH entry is defined by the x86/HVM
> >> + * direct boot ABI and is available in an ELFNOTE in the kernel binary.
> >> + *
> >> + * This function is passed to load_elf() when it is called from
> >> + * load_elfboot() which then additionally checks for an ELF Note of
> >> + * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
> >> + * parse the PVH entry address from the ELF Note.
> >> + *
> >> + * Due to trickery in elf_opts.h, load_elf() is actually available as
> >> + * load_elf32() or load_elf64() and this routine needs to be able
> >> + * to deal with being called as 32 or 64 bit.
> >> + *
> >> + * The address of the PVH entry point is saved to the 'pvh_start_addr'
> >> + * global variable.  (although the entry point is 32-bit, the kernel
> >> + * binary can be either 32-bit or 64-bit).
> >> + */
> >> +
> >> +static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
> >> +{
> >> +    size_t *elf_note_data_addr;
> >> +
> >> +    /* Check if ELF Note header passed in is valid */
> >> +    if (arg1 == NULL) {
> >> +        return 0;
> >> +    }
> >> +
> >> +    if (is64) {
> >> +        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
> >> +        uint64_t nhdr_size64 = sizeof(struct elf64_note);
> >> +        uint64_t phdr_align = *(uint64_t *)arg2;
> >> +        uint64_t nhdr_namesz = nhdr64->n_namesz;
> >> +
> >> +        elf_note_data_addr =
> >> +            ((void *)nhdr64) + nhdr_size64 +
> >> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> >> +    } else {
> >> +        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
> >> +        uint32_t nhdr_size32 = sizeof(struct elf32_note);
> >> +        uint32_t phdr_align = *(uint32_t *)arg2;
> >> +        uint32_t nhdr_namesz = nhdr32->n_namesz;
> >> +
> >> +        elf_note_data_addr =
> >> +            ((void *)nhdr32) + nhdr_size32 +
> >> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> >> +    }
> >> +
> >> +    pvh_start_addr = *elf_note_data_addr;
> >> +
> >> +    return pvh_start_addr;
> >> +}
> >> +
> >> +bool pvh_load_elfboot(const char *kernel_filename,
> >> +                      uint32_t *mh_load_addr,
> >> +                      uint32_t *elf_kernel_size)
> >> +{
> >> +    uint64_t elf_entry;
> >> +    uint64_t elf_low, elf_high;
> >> +    int kernel_size;
> >> +    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
> >> +
> >> +    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
> >> +                           NULL, &elf_note_type, &elf_entry,
> >> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
> >> +                           0, 0);
> >> +
> >> +    if (kernel_size < 0) {
> >> +        error_report("Error while loading elf kernel");
> >> +        return false;
> >> +    }
> >> +
> >> +    if (pvh_start_addr == 0) {
> >> +        error_report("Error loading uncompressed kernel without PVH ELF Note");
> >> +        return false;
> >> +    }
> >> +
> >> +    if (mh_load_addr) {
> >> +        *mh_load_addr = elf_low;
> >> +    }
> >> +
> >> +    if (elf_kernel_size) {
> >> +        *elf_kernel_size = elf_high - elf_low;
> >> +    }
> >> +
> >> +    return true;
> >> +}
> >> diff --git a/hw/i386/pvh.h b/hw/i386/pvh.h
> >> new file mode 100644
> >> index 0000000000..ada67ff6e8
> >> --- /dev/null
> >> +++ b/hw/i386/pvh.h
> >> @@ -0,0 +1,10 @@
> >> +#ifndef HW_I386_PVH_H
> >> +#define HW_I386_PVH_H
> >> +
> >> +size_t pvh_get_start_addr(void);
> >
> > What about adding a "size_t *pvh_start_addr" parameter to pvh_load_elfboot()?
> > Just an idea, I'm not sure if it is better...
> 
> I agree. In fact, given that patch 4/8 extracts some common functions
> from pc.c into x86.c, and load_linux is among these functions, perhaps
> we can avoid creating an independent file and just put the PVH code
> there.
> 
> What do you think?

Makes sense to me, since it's going to be used by both pc and microvm.

Thanks,
Stefano
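For reference, the two checks Stefano points out (ELF magic and e_flags) can be written as a small standalone helper that both pc and microvm could call before handing the image to the ELF loader. This is a sketch, not QEMU code: it reads the header bytes directly instead of using QEMU's ldl_p(), relies on the Linux/glibc `<elf.h>` for the header layouts, assumes a little-endian (x86) image, and, unlike the removed code, rejects an unknown EI_CLASS instead of treating it as 32-bit. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <elf.h>      /* Linux/glibc: Elf32_Ehdr, Elf64_Ehdr, ELFMAG, EI_CLASS */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mask taken from the removed check in pc.c (LOAD_ELF_HEADER_HAS_ADDR). */
#define ELFBOOT_UNSUPPORTED_FLAGS 0x00010004u

typedef enum {
    ELFBOOT_NOT_ELF,    /* not an ELF image: caller falls back to other formats */
    ELFBOOT_BAD_FLAGS,  /* ELF image with e_flags we refuse to boot */
    ELFBOOT_OK,         /* safe to hand over to the ELF loader */
} ElfbootCheck;

/* Read a little-endian 32-bit value, independent of host byte order. */
static uint32_t elfboot_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Validate the first 'len' bytes of a kernel image before loading it. */
static ElfbootCheck elfboot_check_header(const uint8_t *header, size_t len)
{
    size_t flags_off;

    /* Same test as ldl_p(header) != 0x464c457f, i.e. "\x7fELF". */
    if (len < EI_NIDENT || memcmp(header, ELFMAG, SELFMAG) != 0) {
        return ELFBOOT_NOT_ELF;
    }

    /* e_flags lives at a different offset in 32- and 64-bit headers. */
    switch (header[EI_CLASS]) {
    case ELFCLASS64:
        flags_off = offsetof(Elf64_Ehdr, e_flags);
        break;
    case ELFCLASS32:
        flags_off = offsetof(Elf32_Ehdr, e_flags);
        break;
    default:
        return ELFBOOT_NOT_ELF;  /* stricter than the removed code */
    }

    if (len < flags_off + sizeof(uint32_t)) {
        return ELFBOOT_NOT_ELF;
    }
    if (elfboot_le32(header + flags_off) & ELFBOOT_UNSUPPORTED_FLAGS) {
        return ELFBOOT_BAD_FLAGS;
    }
    return ELFBOOT_OK;
}
```

In load_linux(), ELFBOOT_NOT_ELF would correspond to the old `return false; /* no elfboot */` and ELFBOOT_BAD_FLAGS to the error_report() followed by exit(1).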


^ permalink raw reply	[flat|nested] 133+ messages in thread
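Stefano's idea of returning the PVH entry point through an out-parameter, instead of a file-scope global plus pvh_get_start_addr(), could look roughly like this. It is a hypothetical, standalone sketch: rather than hooking into QEMU's load_elf() note callback, it scans a raw PT_NOTE buffer itself, pads the name and descriptor to 4 bytes following the usual ELF note convention (the patch instead pads the name using the alignment value load_elf() passes in), and accepts either a 4- or an 8-byte descriptor. XEN_ELFNOTE_PHYS32_ENTRY is 18 in Xen's public elfnote.h; the function and macro names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Note type for the 32-bit PVH entry point, from Xen's public/elfnote.h. */
#define XEN_ELFNOTE_PHYS32_ENTRY 18

/* Round a note field size up to the conventional 4-byte padding. */
#define NOTE_PAD(x) (((uint64_t)(x) + 3) & ~(uint64_t)3)

/* Read a little-endian 32-bit value from an unaligned buffer. */
static uint32_t note_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Hypothetical helper: scan the contents of a PT_NOTE segment for the
 * "Xen" note of type XEN_ELFNOTE_PHYS32_ENTRY. On success, store the
 * entry point in *entry and return true; no global state is touched.
 */
static bool pvh_find_entry(const uint8_t *notes, size_t len, uint64_t *entry)
{
    uint64_t off = 0;

    /* Each note starts with three 32-bit words: namesz, descsz, type. */
    while (len - off >= 12) {
        const uint8_t *nhdr = notes + off;
        uint32_t namesz = note_le32(nhdr);
        uint32_t descsz = note_le32(nhdr + 4);
        uint32_t type = note_le32(nhdr + 8);
        uint64_t desc_off = off + 12 + NOTE_PAD(namesz);
        uint64_t next = desc_off + NOTE_PAD(descsz);

        if (next > len) {
            return false;               /* truncated or malformed note */
        }

        /* namesz counts the terminating NUL, so compare all 4 bytes. */
        if (type == XEN_ELFNOTE_PHYS32_ENTRY && namesz == 4 &&
            memcmp(nhdr + 12, "Xen", 4) == 0) {
            const uint8_t *desc = notes + desc_off;

            if (descsz == 4) {
                *entry = note_le32(desc);
            } else if (descsz == 8) {
                *entry = note_le32(desc) |
                         (uint64_t)note_le32(desc + 4) << 32;
            } else {
                return false;
            }
            return true;
        }
        off = next;
    }
    return false;
}
```

With this shape a caller only needs something like `if (pvh_find_entry(notes, len, &entry)) fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, entry);`, and pc and microvm share no mutable state.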

* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-25  9:12         ` Gerd Hoffmann
@ 2019-09-25  9:47           ` David Hildenbrand
  -1 siblings, 0 replies; 133+ messages in thread
From: David Hildenbrand @ 2019-09-25  9:47 UTC (permalink / raw)
  To: Gerd Hoffmann, Paolo Bonzini
  Cc: Sergio Lopez, qemu-devel, mst, imammedo, marcel.apfelbaum, rth,
	ehabkost, philmd, lersek, mtosatti, kvm, Pankaj Gupta

On 25.09.19 11:12, Gerd Hoffmann wrote:
>   Hi,
> 
>> If you want to add hotplug to microvm, you can reuse the existing code
>> for CPU and memory hotplug controllers, and write drivers for them in
>> Linux's drivers/platform.  The drivers would basically do what the ACPI
>> AML tells the interpreter to do.
> 
> How would the linux kernel detect those devices?
> 
> I guess that wouldn't be ACPI; it seems everyone wants to avoid it[1].
> 
> So device tree on x86?  Something else?
> 
> cheers,
>   Gerd
> 
> [1] Not clear to me why; some minimal ACPI tables listing our
>     devices (isa-serial, fw_cfg, ...) don't look unreasonable
>     to me.  We could also make virtio-mmio discoverable that way.
>     Also we could do acpi cpu hotplug without having to write those
>     linux platform drivers.  We would need a sysbus-acpi device though,
>     but given that most acpi code is already separated out so piix and
>     q35 can share it, it should not be that hard to wire up.
> 

Just to make one thing clear, the same could be used for DIMM-based
memory hotplug, too. virtio-mem is not simply exposing DIMMs to a guest
using virtio. It's even designed to co-exist with DIMM-based memory hotplug.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 133+ messages in thread

* when to use virtio (was Re: [PATCH v4 0/8] Introduce the microvm machine type)
  2019-09-25  8:44         ` David Hildenbrand
@ 2019-09-25 10:19           ` Paolo Bonzini
  -1 siblings, 0 replies; 133+ messages in thread
From: Paolo Bonzini @ 2019-09-25 10:19 UTC (permalink / raw)
  To: David Hildenbrand, Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm, Pankaj Gupta

This is a tangent, but I was a bit too harsh in my previous message (at
least it made you laugh rather than get angry!) so I think I owe you an
explanation.

On 25/09/19 10:44, David Hildenbrand wrote:
> I consider virtio the silver bullet whenever we want a mature
> paravirtualized interface across architectures. And you can tell that
> I'm not the only one by the huge number of virtio devices people are
> crafting right now.

Given there are hardware implementations of virtio, I would refine that:
virtio is a silver bullet whenever we want a mature ring buffer
interface across architectures.  Being friendly to virtualization is by
now only a detail of virtio.  It is also not exclusive to virtio; for
example, NVMe 1.3 has incorporated some ideas from Xen and virtio and is
also virtualization-friendly.

In turn, the ring buffer interface is great if you want to have mostly
asynchronous operation---if not, the ring buffer is just adding
complexity.  Sure, we have the luxury of abstractions and powerful
computers that hide most of the complexity, but some of it still lurks
in the form of race conditions.

So the question for virtio-mem is what makes asynchronous operation
important for memory hotplug?  If I understand the virtio-mem driver,
all interaction with the virtio device happens through a work item,
meaning that it is strictly synchronous.  At this point, you do not need
a ring buffer, you only need:

- a command register where you write the address of a command buffer.
The device will do DMA from the command block, do whatever it has to do,
DMA back the results, and trigger an interrupt.

- an interrupt mechanism.  It could be MSI, or it could be an interrupt
pending/interrupt acknowledge register if all the hardware offers is
level-triggered interrupts.
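To make the comparison concrete, here is a minimal simulation of the synchronous interface described above: a command register that takes the guest-physical address of a command block, a device that "DMAs" the block in, processes it, writes the result back and raises a level-triggered interrupt, and an acknowledge register that clears it. Everything here (the register functions, the struct cmd_block layout, the plug policy and the flat guest_ram array) is hypothetical and only illustrates the shape of such a device; no ring buffer is involved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { CMD_PLUG = 1, CMD_UNPLUG = 2 };   /* hypothetical opcodes */
enum { ST_ACK = 0, ST_NACK = 1 };        /* hypothetical status codes */

/* Command block in guest memory, shared by driver and device. */
struct cmd_block {
    uint32_t opcode;
    uint32_t status;    /* written back by the device */
    uint64_t addr;      /* request argument */
};

/* Flat stand-in for guest RAM that the device reaches by "DMA". */
static uint8_t guest_ram[4096];

struct sync_dev {
    uint32_t irq_pending;  /* interrupt pending register (level-triggered) */
    uint64_t plug_limit;   /* device policy: only addresses below this plug */
};

/* Device: the driver wrote a command block address to the command register. */
static void dev_write_cmd_reg(struct sync_dev *d, uint64_t gpa)
{
    struct cmd_block c;

    assert(gpa <= sizeof(guest_ram) - sizeof(c));
    memcpy(&c, guest_ram + gpa, sizeof(c));          /* DMA in */
    if (c.opcode == CMD_PLUG) {
        c.status = c.addr < d->plug_limit ? ST_ACK : ST_NACK;
    } else {
        c.status = c.opcode == CMD_UNPLUG ? ST_ACK : ST_NACK;
    }
    memcpy(guest_ram + gpa, &c, sizeof(c));          /* DMA results back */
    d->irq_pending = 1;                              /* raise the interrupt */
}

/* Device: the driver wrote the interrupt acknowledge register. */
static void dev_write_irq_ack(struct sync_dev *d)
{
    d->irq_pending = 0;
}

/* Driver: one request, issued synchronously (e.g. from a work item). */
static uint32_t driver_submit(struct sync_dev *d, uint64_t gpa,
                              uint32_t opcode, uint64_t addr)
{
    struct cmd_block c = { .opcode = opcode, .status = ST_NACK, .addr = addr };

    assert(gpa <= sizeof(guest_ram) - sizeof(c));
    memcpy(guest_ram + gpa, &c, sizeof(c));
    dev_write_cmd_reg(d, gpa);
    /* A real driver would wait for the interrupt; this simulated device
     * completes inside the register write, so it is already pending. */
    assert(d->irq_pending);
    dev_write_irq_ack(d);
    memcpy(&c, guest_ram + gpa, sizeof(c));
    return c.status;
}
```

Because the driver never has more than one request outstanding, a single command block and one pending bit are enough; a ring would only matter once several requests can be in flight at the same time.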

I do agree that virtio-mem's command buffer/DMA architecture is better
than the more traditional "bunch of hardware registers" architecture
that QEMU uses for its ACPI-based CPU and memory hotplug controllers.
But that's because command buffer/DMA is what actually defines a good
paravirtualized interface; virtio is a superset of that, which may not
always be a good solution.

Paolo

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: when to use virtio (was Re: [PATCH v4 0/8] Introduce the microvm machine type)
  2019-09-25 10:19           ` Paolo Bonzini
@ 2019-09-25 10:50             ` David Hildenbrand
  -1 siblings, 0 replies; 133+ messages in thread
From: David Hildenbrand @ 2019-09-25 10:50 UTC (permalink / raw)
  To: Paolo Bonzini, Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm, Pankaj Gupta

On 25.09.19 12:19, Paolo Bonzini wrote:
> This is a tangent, but I was a bit too harsh in my previous message (at
> least it made you laugh rather than angry!) so I think I owe you an
> explanation.

It's hard to make me really angry; you have to try harder :) However,
after years of working on VMs, VM memory management and Linux MM, I
learned that things are horribly complicated. It's not obvious, so I
can't expect everyone to know what I learned.

> 
> On 25/09/19 10:44, David Hildenbrand wrote:
>> I consider virtio the silver bullet whenever we want a mature
>> paravirtualized interface across architectures. And you can tell that
>> I'm not the only one by the huge number of virtio devices people are
>> crafting right now.
> 
> Given there are hardware implementations of virtio, I would refine that:
> virtio is a silver bullet whenever we want a mature ring buffer
> interface across architectures.  Being friendly to virtualization is by
> now only a detail of virtio.  It is also not exclusive to virtio; for
> example, NVMe 1.3 has incorporated some ideas from Xen and virtio and is
> also virtualization-friendly.
> 
> In turn, the ring buffer interface is great if you want to have mostly
> asynchronous operation---if not, the ring buffer is just adding
> complexity.  Sure, we have the luxury of abstractions and powerful
> computers that hide most of the complexity, but some of it still lurks
> in the form of race conditions.
> 
> So the question for virtio-mem is what makes asynchronous operation
> important for memory hotplug?  If I understand the virtio-mem driver,
> all interaction with the virtio device happens through a work item,
> meaning that it is strictly synchronous.  At this point, you do not need
> a ring buffer, you only need:

So, the main building blocks of the virtio infrastructure that
virtio-mem currently uses are the config space and one virtqueue.

a) A way for the host to send requests to the guest. E.g., request a
certain amount of memory to be plugged/unplugged by the guest. Done via
config space updates (e.g., similar to virtio-balloon
inflation/deflation requests).
b) A way for the guest to communicate with the host. E.g., send
plug/unplug requests to plug/unplug separate memory blocks. Done via a
virtqueue. Similar to inflation/deflation of pages in virtio-balloon.

Requests by the host via the config space are processed asynchronously
by the guest (again, similar to - say - virtio-balloon). Guest requests
are currently processed synchronously by the host.

Guest: Can I plug this block?
Host: Sorry, no can do.

I can't tell whether there will be extensions (if virtio-mem ever comes
to life ;) ) that make use of asynchronous communication. In particular,
there might be asynchronous/multiple guest->host requests at some point
(e.g., "I'm nearly out of memory, please send help").

So yes, currently we could live without the ring buffer. But the config
space and the virtqueue are real life-savers for me right now :)
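The split described above (host requests arrive via the config space and are handled asynchronously by the guest, while each guest request is answered synchronously by the host) can be sketched as a guest-side reconcile loop. This is a hypothetical simplification, not the virtio-mem protocol or driver: the config layout, the 128 MiB block size and the callback standing in for a virtqueue round trip are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE (128ull * 1024 * 1024)  /* hypothetical 128 MiB block */

/* Host-written config space (hypothetical layout). */
struct mem_config {
    uint64_t requested_size;  /* host -> guest: desired plugged size */
};

/* Guest-side driver state. */
struct mem_guest {
    uint64_t plugged_size;    /* blocks [0, plugged_size) are plugged */
};

/* Stand-in for one guest->host request on the virtqueue; true means ACK. */
typedef bool (*host_request_fn)(bool plug, uint64_t offset, void *opaque);

/* Example host policy: accept plugs below a limit, always accept unplugs. */
static bool host_accept_up_to(bool plug, uint64_t offset, void *opaque)
{
    uint64_t limit = *(const uint64_t *)opaque;

    return !plug || offset + BLOCK_SIZE <= limit;
}

/*
 * Guest worker, run asynchronously after a config-change notification.
 * Each request is answered synchronously; a NACK ends this round, and
 * the guest retries on the next config change.
 */
static void guest_reconcile(struct mem_guest *g, const struct mem_config *cfg,
                            host_request_fn host, void *opaque)
{
    while (g->plugged_size + BLOCK_SIZE <= cfg->requested_size) {
        if (!host(true, g->plugged_size, opaque)) {
            return;   /* host refused this block */
        }
        g->plugged_size += BLOCK_SIZE;
    }
    while (g->plugged_size > cfg->requested_size) {
        if (!host(false, g->plugged_size - BLOCK_SIZE, opaque)) {
            return;
        }
        g->plugged_size -= BLOCK_SIZE;
    }
}
```

In this sketch the asynchronous part lives entirely in when guest_reconcile() runs; the request/response exchange itself is strictly one-at-a-time.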

> 
> - a command register where you write the address of a command buffer.
> The device will do DMA from the command block, do whatever it has to do,
> DMA back the results, and trigger an interrupt.
> 
> - an interrupt mechanism.  It could be MSI, or it could be an interrupt
> pending/interrupt acknowledge register if all the hardware offers is
> level-triggered interrupts.
> 
> I do agree that virtio-mem's command buffer/DMA architecture is better
> than the more traditional "bunch of hardware registers" architecture
> that QEMU uses for its ACPI-based CPU and memory hotplug controllers.
> But that's because command buffer/DMA is what actually defines a good
> paravirtualized interface; virtio is a superset of that, which may not
> always be a good solution.
> 

I completely agree with what you say here: virtio comes with complexity,
but also with features (e.g., config space, support for multiple queues,
abstraction of transports).

Say I only wanted to expose a DIMM to the guest, just like via ACPI:
then virtio would clearly not be the right choice.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-25  9:22             ` Paolo Bonzini
@ 2019-09-25 11:04               ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-25 11:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 25/09/19 10:40, Sergio Lopez wrote:
>>>> We need the PIT for non-KVM accel (if present with KVM and
>>>> kernel_irqchip_split = off, it basically becomes a placeholder)
>>> Why?
>> 
>> Perhaps I'm missing something. Is some other device supposed to be
>> acting as a HW timer while running with TCG acceleration?
>
> Sure, the LAPIC timer.  I wonder if Linux, however, wants to use the PIT
> in order to calibrate the LAPIC timer if TSC deadline mode is unavailable.

Ah, yes. I was so confused by the nomenclature that I assumed we didn't
have a userspace implementation of it.

On the other hand, as you suspect, without the PIT Linux does hang in
TSC calibration with TCG accel.

A simple option could be adding it only if we're running without KVM.

>>>> and the PIC for both the PIT and the ISA serial port.
>>>
>>> Can't the ISA serial port work with the IOAPIC?
>> 
>> Hm... I'm not sure. I wanted to give it a try, but then noticed that
>> multiple places in the code (like hw/intc/apic.c:560) do expect to have
>> an ISA PIC present through the isa_pic global variable.
>> 
>> I guess we should be able to work around this, but I'm not sure if it's
>> really worth it. What do you think?
>
> You can add a paragraph saying that in the future the list could be
> reduced further.  I think that the direction we want to go is to only
> leave the IOAPIC around (the ISA devices in this respect are no
> different from the virtio-mmio devices).
>
> But you're right about isa_pic.  I wonder if it's as easy as this:
>
> diff --git a/hw/intc/apic.c b/hw/intc/apic.c
> index bce89911dc..5d03e48a19 100644
> --- a/hw/intc/apic.c
> +++ b/hw/intc/apic.c
> @@ -610,7 +610,7 @@ int apic_accept_pic_intr(DeviceState *dev)
>
>      if ((s->apicbase & MSR_IA32_APICBASE_ENABLE) == 0 ||
>          (lvt0 & APIC_LVT_MASKED) == 0)
> -        return 1;
> +        return isa_pic != NULL;
>
>      return 0;
>  }

Yes, that would do the trick. There's another use of it at
hw/intc/ioapic.c:78, but we should be safe as, at least in the case of
Linux, DM_EXTINT is only used in check_timer(), which is only called if
it detects an i8259 PIC.

We should probably add an assertion with an informative message, just in
case.
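A standalone sketch of Paolo's apic_accept_pic_intr() change combined with the assertion suggested here for the IOAPIC ExtINT path. The names (isa_pic_stub, struct lapic_stub, ioapic_check_extint) are hypothetical stand-ins for QEMU's isa_pic global, the APIC state and the ioapic.c delivery code; the bit values are the architectural IA32_APIC_BASE global-enable bit (bit 11) and the LVT mask bit (bit 16).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define APICBASE_ENABLE (1u << 11)  /* IA32_APIC_BASE global enable */
#define LVT_MASKED      (1u << 16)  /* LVT entry mask bit */

/* Hypothetical stand-in for QEMU's isa_pic: NULL when there is no i8259. */
static void *isa_pic_stub;

struct lapic_stub {
    uint32_t apicbase;
    uint32_t lvt_lint0;
};

/*
 * Mirrors the proposed apic_accept_pic_intr(): a disabled LAPIC or an
 * unmasked LINT0 would route PIC interrupts, but only if a PIC exists.
 */
static int lapic_accept_pic_intr(const struct lapic_stub *s)
{
    if ((s->apicbase & APICBASE_ENABLE) == 0 ||
        (s->lvt_lint0 & LVT_MASKED) == 0) {
        return isa_pic_stub != NULL;
    }
    return 0;
}

/*
 * Hypothetical guard for the IOAPIC ExtINT (DM_EXTINT) path: fail with an
 * informative message instead of dereferencing a missing PIC.
 */
static void ioapic_check_extint(void)
{
    if (isa_pic_stub == NULL) {
        fprintf(stderr, "ioapic: ExtINT delivery requested, "
                "but this machine has no i8259 PIC\n");
        abort();
    }
}
```

Aborting with a message keeps a PIC-less machine like microvm safe for guests that never use ExtINT, while making a guest that does rely on it fail loudly rather than crash on a NULL pointer.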

Thanks,
Sergio.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-25 11:04               ` Sergio Lopez
@ 2019-09-25 11:20                 ` Paolo Bonzini
  -1 siblings, 0 replies; 133+ messages in thread
From: Paolo Bonzini @ 2019-09-25 11:20 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm


On 25/09/19 13:04, Sergio Lopez wrote:
> Yes, that would do the trick. There's another use of it at
> hw/intc/ioapic.c:78, but we should be safe as, at least in the case of
> Linux, DM_EXTINT is only used in check_timer(), which is only called if
> it detects an i8259 PIC.

Even there it is actually LVT0's DM_EXTINT, not the IOAPIC's.  I think
pic_read_irq would have returned 7 (spurious IRQ on master i8259) until
commit 29bb5317cb ("i8259: QOM cleanups", 2013-04-29), so we should fix it.

Paolo

> We should probably add an assertion with an informative message, just in
> case.



* Re: when to use virtio (was Re: [PATCH v4 0/8] Introduce the microvm machine type)
  2019-09-25 10:50             ` David Hildenbrand
@ 2019-09-25 11:24               ` Paolo Bonzini
  -1 siblings, 0 replies; 133+ messages in thread
From: Paolo Bonzini @ 2019-09-25 11:24 UTC (permalink / raw)
  To: David Hildenbrand, Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm, Pankaj Gupta

On 25/09/19 12:50, David Hildenbrand wrote:
> Can't tell if there might be extensions (if virtio-mem ever comes to
> life ;) ) that might make use of asynchronous communication. Especially,
> there might be asynchronous/multiple guest->host requests at some point
> (e.g., "I'm nearly out of memory, please send help").

Okay, this makes sense.  I'm almost sold on it. :)

Config space also makes sense, though what you really need is the config
space interrupt, rather than config space per se.
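To make the distinction concrete: a virtio-mmio transport has a single interrupt line, and the driver tells queue notifications apart from configuration-change notifications by reading the InterruptStatus register. There, bit 0 signals a used buffer notification and bit 1 a configuration change. The sketch below decodes that value. The type and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the virtio-mmio InterruptStatus register. */
#define VIRTIO_MMIO_INT_VRING  (1U << 0)  /* used buffer notification */
#define VIRTIO_MMIO_INT_CONFIG (1U << 1)  /* configuration change */

/* Illustrative decoded form of one interrupt. */
typedef struct {
    bool used_buffer;
    bool config_changed;
} VirtioMmioIrqCause;

/*
 * Decode a value read from InterruptStatus. A driver would then write
 * the same bits to InterruptACK and, if config_changed is set, re-read
 * the device configuration space.
 */
static VirtioMmioIrqCause virtio_mmio_decode_irq(uint32_t status)
{
    VirtioMmioIrqCause cause = {
        .used_buffer    = (status & VIRTIO_MMIO_INT_VRING) != 0,
        .config_changed = (status & VIRTIO_MMIO_INT_CONFIG) != 0,
    };
    return cause;
}
```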

Paolo

> So yes, currently we could live without the ring buffer. But the config
> space and the virtqueue are real life-savers for me right now :)


* Re: when to use virtio (was Re: [PATCH v4 0/8] Introduce the microvm machine type)
  2019-09-25 11:24               ` Paolo Bonzini
@ 2019-09-25 11:32                 ` David Hildenbrand
  -1 siblings, 0 replies; 133+ messages in thread
From: David Hildenbrand @ 2019-09-25 11:32 UTC (permalink / raw)
  To: Paolo Bonzini, Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm, Pankaj Gupta

On 25.09.19 13:24, Paolo Bonzini wrote:
> On 25/09/19 12:50, David Hildenbrand wrote:
>> Can't tell if there might be extensions (if virtio-mem ever comes to
>> life ;) ) that might make use of asynchronous communication. Especially,
>> there might be asynchronous/multiple guest->host requests at some point
>> (e.g., "I'm nearly out of memory, please send help").
> 
> Okay, this makes sense.  I'm almost sold on it. :)
> 
> Config space also makes sense, though what you really need is the config
> space interrupt, rather than config space per se.
> 

Right, and feature negotiation is yet another nice-to-have thingy in the
virtio world :)
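Here is a rough illustration of what feature negotiation provides. The device offers a feature bitmap, the driver accepts the subset it understands, and both sides then rely only on the intersection. This sketch deliberately ignores transport details: virtio-mmio exposes the 64-bit bitmap through 32-bit selector/value register pairs, and there is also a FEATURES_OK status handshake. Only VIRTIO_F_VERSION_1 (bit 32) is a real feature bit here; the other bit numbers used in the checks are arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit 32 in the virtio 1.x specification. */
#define VIRTIO_F_VERSION_1 32

/* The negotiated set is what the device offers AND the driver supports. */
static uint64_t virtio_negotiate_features(uint64_t device_features,
                                          uint64_t driver_features)
{
    return device_features & driver_features;
}

/* Test a single feature bit in a 64-bit feature set. */
static bool virtio_has_feature(uint64_t features, unsigned int bit)
{
    return bit < 64 && ((features >> bit) & 1U) != 0;
}
```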

> Paolo


-- 

Thanks,

David / dhildenb

* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-25  5:51     ` Sergio Lopez
@ 2019-09-25 11:33       ` Philippe Mathieu-Daudé
  -1 siblings, 0 replies; 133+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-25 11:33 UTC (permalink / raw)
  To: Sergio Lopez, Peter Maydell
  Cc: QEMU Developers, Eduardo Habkost, kvm-devel, Michael S. Tsirkin,
	Laszlo Ersek, Marcelo Tosatti, Gerd Hoffmann, Paolo Bonzini,
	Igor Mammedov, Richard Henderson

On 9/25/19 7:51 AM, Sergio Lopez wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:
> 
>> On Tue, 24 Sep 2019 at 14:25, Sergio Lopez <slp@redhat.com> wrote:
>>>
>>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>>> constructed after the machine model implemented by the latter.
>>>
>>> Its main purpose is providing users a minimalist machine type free
>>> from the burden of legacy compatibility, serving as a stepping stone
>>> for future projects aiming at improving boot times, reducing the
>>> attack surface and slimming down QEMU's footprint.
>>
>>
>>>  docs/microvm.txt                 |  78 +++
>>
>> I'm not sure how close to acceptance this patchset is at the
>> moment, so not necessarily something you need to do now,
>> but could new documentation in docs/ be in rst format, not
>> plain text, please? (Ideally also they should be in the right
>> manual subdirectory, but documentation of system emulation
>> machines at the moment is still in texinfo format, so we
>> don't have a subdir for it yet.)
> 
> Sure. What I didn't get is, should I put it in "docs/microvm.rst" or in
> some other subdirectory?

Should we introduce docs/machines/?

* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-25 11:33       ` Philippe Mathieu-Daudé
@ 2019-09-25 12:39         ` Peter Maydell
  -1 siblings, 0 replies; 133+ messages in thread
From: Peter Maydell @ 2019-09-25 12:39 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Sergio Lopez, QEMU Developers, Eduardo Habkost, kvm-devel,
	Michael S. Tsirkin, Laszlo Ersek, Marcelo Tosatti, Gerd Hoffmann,
	Paolo Bonzini, Igor Mammedov, Richard Henderson

On Wed, 25 Sep 2019 at 12:33, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>
> On 9/25/19 7:51 AM, Sergio Lopez wrote:
> > Peter Maydell <peter.maydell@linaro.org> writes:
> >
> >> On Tue, 24 Sep 2019 at 14:25, Sergio Lopez <slp@redhat.com> wrote:
> >>>
> >>> Microvm is a machine type inspired by both NEMU and Firecracker, and
> >>> constructed after the machine model implemented by the latter.
> >>>
> >>> Its main purpose is providing users a minimalist machine type free
> >>> from the burden of legacy compatibility, serving as a stepping stone
> >>> for future projects aiming at improving boot times, reducing the
> >>> attack surface and slimming down QEMU's footprint.
> >>
> >>
> >>>  docs/microvm.txt                 |  78 +++
> >>
> >> I'm not sure how close to acceptance this patchset is at the
> >> moment, so not necessarily something you need to do now,
> >> but could new documentation in docs/ be in rst format, not
> >> plain text, please? (Ideally also they should be in the right
> >> manual subdirectory, but documentation of system emulation
> >> machines at the moment is still in texinfo format, so we
> >> don't have a subdir for it yet.)
> >
> > Sure. What I didn't get is, should I put it in "docs/microvm.rst" or in
> > some other subdirectory?
>
> Should we introduce docs/machines/?

This should live in the not-yet-created docs/system (the "system emulation
user's guide"), along with much of the content currently still in
the texinfo docs. But we don't have that structure yet and won't
until we do the texinfo conversion, so I think for the moment we
have two reasonable choices:
 (1) put it in the texinfo, so it is at least shipped to
     users until we get around to doing our docs conversion
 (2) leave it in docs/microvm.rst for now (we have a bunch
     of other docs in docs/ which are basically there because
     they're also awaiting the texinfo conversion and creation
     of the docs/user and docs/system manuals)

My ideal vision of how to do documentation of individual
machines, incidentally, would be to do it via doc comments
or some other kind of structured markup in the .c files
that define the machine, so that we could automatically
collect up the docs for the machines we're building,
put them into per-architecture sections of the docs,
have autogenerated stub "this machine exists but isn't
documented yet" entries, etc. But that's not something that
we could easily do today so I don't want to block interim
improvements to our documentation just because I have some
nice theoretical idea for how it ought to work :-)
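Here is one possible shape for that idea. This is purely a hypothetical sketch: QEMU has no `MachineDoc` type or `machine_doc_text()` helper, and the table entries are made up, apart from reusing the microvm wording suggested earlier in this thread. Each machine definition carries an optional doc string. A generator walks the table one architecture at a time and emits a stub entry for machines that have no documentation.

```c
#include <stdio.h>
#include <string.h>

#define UNDOCUMENTED_STUB "This machine exists but isn't documented yet."

/* Hypothetical per-machine documentation record. */
typedef struct {
    const char *arch;  /* architecture section the entry belongs to */
    const char *name;  /* machine type name, as passed to -M */
    const char *doc;   /* NULL if the machine has no documentation */
} MachineDoc;

/* Made-up example entries. */
static const MachineDoc machine_docs[] = {
    { "x86", "microvm",
      "A minimalist machine type without PCI support, designed for "
      "short-lived guests." },
    { "x86", "example-board", NULL },
    { "arm", "example-arm-board", NULL },
};

/* Return the machine's doc text, or the autogenerated stub. */
static const char *machine_doc_text(const MachineDoc *m)
{
    return m->doc ? m->doc : UNDOCUMENTED_STUB;
}

/* Emit the section for one architecture; returns the number of entries. */
static size_t print_arch_section(FILE *out, const char *arch)
{
    size_t n = 0;

    fprintf(out, "%s machines\n", arch);
    for (size_t i = 0; i < sizeof(machine_docs) / sizeof(machine_docs[0]); i++) {
        if (strcmp(machine_docs[i].arch, arch) == 0) {
            fprintf(out, "  %s: %s\n", machine_docs[i].name,
                    machine_doc_text(&machine_docs[i]));
            n++;
        }
    }
    return n;
}
```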

thanks
-- PMM

* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-24 13:10     ` Paolo Bonzini
@ 2019-09-25 15:04       ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-25 15:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 24/09/19 14:44, Sergio Lopez wrote:
>> +Microvm is a machine type inspired by both NEMU and Firecracker, and
>> +constructed after the machine model implemented by the latter.
>
> I would say it's inspired by Firecracker only.  The NEMU virt machine
> had virtio-pci and ACPI.
>
>> +Its main purpose is providing users a minimalist machine type free
>> +from the burden of legacy compatibility,
>
> I think this is too strong, especially if you keep the PIC and PIT. :)
> Maybe just "It's a minimalist machine type without PCI support designed
> for short-lived guests".
>
>> +serving as a stepping stone
>> +for future projects aiming at improving boot times, reducing the
>> +attack surface and slimming down QEMU's footprint.
>
> "Microvm also establishes a baseline for benchmarking QEMU and operating
> systems, since it is optimized for both boot time and footprint".
>
>> +The microvm machine type supports the following devices:
>> +
>> + - ISA bus
>> + - i8259 PIC
>> + - LAPIC (implicit if using KVM)
>> + - IOAPIC (defaults to kernel_irqchip_split = true)
>> + - i8254 PIT
>
> Do we need the PIT?  And perhaps the PIC even?
>

I'm going back to this level of the thread, because after your
suggestion I took a deeper look at how things work around the PIC, and
discovered I was completely wrong about my assumptions.

For virtio-mmio devices, given that we don't have the ability to
configure vectors (as it's done in the PCI case) we're stuck with the
ones provided by the platform PIC, which in the x86 case is the i8259
(at least from Linux's perspective).
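A small illustration of the consequence follows. Each virtio-mmio transport ends up wired to a fixed i8259 input, and a Linux guest has to be told where it is. Presumably the microvm.kernel-cmdline option does this by appending entries in the `virtio_mmio.device=<size>@<baseaddr>:<irq>` format that Linux's virtio_mmio driver understands. The helper name and the base address, size and IRQ used in the check are illustrative assumptions, not values taken from the patches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format one "virtio_mmio.device=<size>@<baseaddr>:<irq>"
 * kernel parameter. Returns the snprintf() result, so a value >= len
 * means the output was truncated.
 */
static int virtio_mmio_cmdline_entry(char *buf, size_t len,
                                     unsigned int size, uint64_t base,
                                     unsigned int irq)
{
    /* A cascaded i8259 pair has 16 inputs, and IRQ 2 is the cascade. */
    assert(irq < 16 && irq != 2);
    return snprintf(buf, len, "virtio_mmio.device=%u@0x%llx:%u",
                    size, (unsigned long long)base, irq);
}
```

A real layout would also have to keep the transports off lines already used by other ISA devices, such as IRQ 0 (PIT), IRQ 4 (first serial port) and IRQ 8 (RTC).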

So we can get rid of the IOAPIC, but we need to keep the i8259 (we have
both a userspace and a kernel implementation too, so it should be fine).

As for the PIT, we can omit it if we're running with KVM acceleration,
as kvmclock will be used to calculate loops per jiffy and avoid the
calibration, leaving it enabled otherwise.
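The revised plan can be summarised as a small decision function. This is a hypothetical sketch; the struct and function names do not come from the patches. The i8259 is always created because the virtio-mmio lines are routed through it, and the IOAPIC is dropped. The PIT is added only when running without KVM: with KVM, kvmclock supplies loops-per-jiffy, while under TCG Linux hangs in TSC calibration if there is no PIT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the legacy interrupt/timer devices to create. */
typedef struct {
    bool i8259;   /* legacy PIC: virtio-mmio IRQ lines are routed here */
    bool ioapic;  /* dropped in the revised plan */
    bool pit;     /* i8254: only needed for calibration without kvmclock */
} MicrovmLegacyDevices;

static MicrovmLegacyDevices microvm_legacy_devices(bool use_kvm)
{
    MicrovmLegacyDevices d = {
        .i8259 = true,
        .ioapic = false,
        /* With KVM, kvmclock avoids PIT-based calibration. */
        .pit = !use_kvm,
    };
    /* virtio-mmio interrupts need some interrupt controller. */
    assert(d.i8259 || d.ioapic);
    return d;
}
```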

Thanks,
Sergio.




* Re: [PATCH v4 4/8] hw/i386: split PCMachineState deriving X86MachineState from it
  2019-09-24 13:40   ` Philippe Mathieu-Daudé
@ 2019-09-25 15:39     ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 133+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-25 15:39 UTC (permalink / raw)
  To: Sergio Lopez, qemu-devel, ehabkost
  Cc: kvm, mst, mtosatti, kraxel, pbonzini, imammedo, lersek, rth

On 9/24/19 3:40 PM, Philippe Mathieu-Daudé wrote:
> On 9/24/19 2:44 PM, Sergio Lopez wrote:
>> Split up PCMachineState and PCMachineClass and derive X86MachineState
>> and X86MachineClass from them. This allows sharing code with non-PC
>> machine types.
>>
>> Also, move shared functions from pc.c to x86.c.
>>
>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>> ---
>>  hw/acpi/cpu_hotplug.c |  10 +-
>>  hw/i386/Makefile.objs |   1 +
>>  hw/i386/acpi-build.c  |  31 +-
>>  hw/i386/amd_iommu.c   |   4 +-
>>  hw/i386/intel_iommu.c |   4 +-
>>  hw/i386/pc.c          | 796 +++++-------------------------------------
>>  hw/i386/pc_piix.c     |  48 +--
>>  hw/i386/pc_q35.c      |  38 +-
>>  hw/i386/pc_sysfw.c    |  60 +---
>>  hw/i386/x86.c         | 788 +++++++++++++++++++++++++++++++++++++++++
>>  hw/intc/ioapic.c      |   3 +-
>>  include/hw/i386/pc.h  |  29 +-
>>  include/hw/i386/x86.h |  97 +++++
>>  13 files changed, 1045 insertions(+), 864 deletions(-)
>>  create mode 100644 hw/i386/x86.c
>>  create mode 100644 include/hw/i386/x86.h
>>
>> diff --git a/hw/acpi/cpu_hotplug.c b/hw/acpi/cpu_hotplug.c
>> index 6e8293aac9..3ac2045a95 100644
>> --- a/hw/acpi/cpu_hotplug.c
>> +++ b/hw/acpi/cpu_hotplug.c
>> @@ -128,7 +128,7 @@ void build_legacy_cpu_hotplug_aml(Aml *ctx, MachineState *machine,
>>      Aml *one = aml_int(1);
>>      MachineClass *mc = MACHINE_GET_CLASS(machine);
>>      const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
>> -    PCMachineState *pcms = PC_MACHINE(machine);
>> +    X86MachineState *x86ms = X86_MACHINE(machine);
>>  
>>      /*
>>       * _MAT method - creates an madt apic buffer
>> @@ -236,9 +236,9 @@ void build_legacy_cpu_hotplug_aml(Aml *ctx, MachineState *machine,
>>      /* The current AML generator can cover the APIC ID range [0..255],
>>       * inclusive, for VCPU hotplug. */
>>      QEMU_BUILD_BUG_ON(ACPI_CPU_HOTPLUG_ID_LIMIT > 256);
>> -    if (pcms->apic_id_limit > ACPI_CPU_HOTPLUG_ID_LIMIT) {
>> +    if (x86ms->apic_id_limit > ACPI_CPU_HOTPLUG_ID_LIMIT) {
>>          error_report("max_cpus is too large. APIC ID of last CPU is %u",
>> -                     pcms->apic_id_limit - 1);
>> +                     x86ms->apic_id_limit - 1);
>>          exit(1);
>>      }
>>  
>> @@ -315,8 +315,8 @@ void build_legacy_cpu_hotplug_aml(Aml *ctx, MachineState *machine,
>>       * ith up to 255 elements. Windows guests up to win2k8 fail when
>>       * VarPackageOp is used.
>>       */
>> -    pkg = pcms->apic_id_limit <= 255 ? aml_package(pcms->apic_id_limit) :
>> -                                       aml_varpackage(pcms->apic_id_limit);
>> +    pkg = x86ms->apic_id_limit <= 255 ? aml_package(x86ms->apic_id_limit) :
>> +                                        aml_varpackage(x86ms->apic_id_limit);
>>  
>>      for (i = 0, apic_idx = 0; i < apic_ids->len; i++) {
>>          int apic_id = apic_ids->cpus[i].arch_id;
>> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
>> index 149712db07..5b4b3a672e 100644
>> --- a/hw/i386/Makefile.objs
>> +++ b/hw/i386/Makefile.objs
>> @@ -1,6 +1,7 @@
>>  obj-$(CONFIG_KVM) += kvm/
>>  obj-y += multiboot.o
>>  obj-y += pvh.o
>> +obj-y += x86.o
>>  obj-y += pc.o
>>  obj-y += e820.o
>>  obj-$(CONFIG_I440FX) += pc_piix.o
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index e54e571a75..76e18d3285 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -29,6 +29,7 @@
>>  #include "hw/pci/pci.h"
>>  #include "hw/core/cpu.h"
>>  #include "target/i386/cpu.h"
>> +#include "hw/i386/x86.h"
>>  #include "hw/misc/pvpanic.h"
>>  #include "hw/timer/hpet.h"
>>  #include "hw/acpi/acpi-defs.h"
>> @@ -361,6 +362,7 @@ static void
>>  build_madt(GArray *table_data, BIOSLinker *linker, PCMachineState *pcms)
>>  {
>>      MachineClass *mc = MACHINE_GET_CLASS(pcms);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>      const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(MACHINE(pcms));
>>      int madt_start = table_data->len;
>>      AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(pcms->acpi_dev);
>> @@ -390,7 +392,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, PCMachineState *pcms)
>>      io_apic->address = cpu_to_le32(IO_APIC_DEFAULT_ADDRESS);
>>      io_apic->interrupt = cpu_to_le32(0);
>>  
>> -    if (pcms->apic_xrupt_override) {
>> +    if (x86ms->apic_xrupt_override) {
>>          intsrcovr = acpi_data_push(table_data, sizeof *intsrcovr);
>>          intsrcovr->type   = ACPI_APIC_XRUPT_OVERRIDE;
>>          intsrcovr->length = sizeof(*intsrcovr);
>> @@ -1817,8 +1819,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>      CrsRangeEntry *entry;
>>      Aml *dsdt, *sb_scope, *scope, *dev, *method, *field, *pkg, *crs;
>>      CrsRangeSet crs_range_set;
>> -    PCMachineState *pcms = PC_MACHINE(machine);
>>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
>> +    X86MachineState *x86ms = X86_MACHINE(machine);
>>      AcpiMcfgInfo mcfg;
>>      uint32_t nr_mem = machine->ram_slots;
>>      int root_bus_limit = 0xFF;
>> @@ -2083,7 +2085,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>>           * with half of the 16-bit control register. Hence, the total size
>>           * of the i/o region used is FW_CFG_CTL_SIZE; when using DMA, the
>>           * DMA control register is located at FW_CFG_DMA_IO_BASE + 4 */
>> -        uint8_t io_size = object_property_get_bool(OBJECT(pcms->fw_cfg),
>> +        uint8_t io_size = object_property_get_bool(OBJECT(x86ms->fw_cfg),
>>                                                     "dma_enabled", NULL) ?
>>                            ROUND_UP(FW_CFG_CTL_SIZE, 4) + sizeof(dma_addr_t) :
>>                            FW_CFG_CTL_SIZE;
>> @@ -2318,6 +2320,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>>      MachineClass *mc = MACHINE_GET_CLASS(machine);
>>      const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
>>      PCMachineState *pcms = PC_MACHINE(machine);
>> +    X86MachineState *x86ms = X86_MACHINE(machine);
>>      ram_addr_t hotplugabble_address_space_size =
>>          object_property_get_int(OBJECT(pcms), PC_MACHINE_DEVMEM_REGION_SIZE,
>>                                  NULL);
>> @@ -2386,16 +2389,16 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>>          }
>>  
>>          /* Cut out the ACPI_PCI hole */
>> -        if (mem_base <= pcms->below_4g_mem_size &&
>> -            next_base > pcms->below_4g_mem_size) {
>> -            mem_len -= next_base - pcms->below_4g_mem_size;
>> +        if (mem_base <= x86ms->below_4g_mem_size &&
>> +            next_base > x86ms->below_4g_mem_size) {
>> +            mem_len -= next_base - x86ms->below_4g_mem_size;
>>              if (mem_len > 0) {
>>                  numamem = acpi_data_push(table_data, sizeof *numamem);
>>                  build_srat_memory(numamem, mem_base, mem_len, i - 1,
>>                                    MEM_AFFINITY_ENABLED);
>>              }
>>              mem_base = 1ULL << 32;
>> -            mem_len = next_base - pcms->below_4g_mem_size;
>> +            mem_len = next_base - x86ms->below_4g_mem_size;
>>              next_base = mem_base + mem_len;
>>          }
>>  
>> @@ -2614,6 +2617,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>>  {
>>      PCMachineState *pcms = PC_MACHINE(machine);
>>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>> +    X86MachineState *x86ms = X86_MACHINE(machine);
>>      GArray *table_offsets;
>>      unsigned facs, dsdt, rsdt, fadt;
>>      AcpiPmInfo pm;
>> @@ -2775,7 +2779,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>>           */
>>          int legacy_aml_len =
>>              pcmc->legacy_acpi_table_size +
>> -            ACPI_BUILD_LEGACY_CPU_AML_SIZE * pcms->apic_id_limit;
>> +            ACPI_BUILD_LEGACY_CPU_AML_SIZE * x86ms->apic_id_limit;
>>          int legacy_table_size =
>>              ROUND_UP(tables_blob->len - aml_len + legacy_aml_len,
>>                       ACPI_BUILD_ALIGN_SIZE);
>> @@ -2865,13 +2869,14 @@ void acpi_setup(void)
>>  {
>>      PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
>>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>      AcpiBuildTables tables;
>>      AcpiBuildState *build_state;
>>      Object *vmgenid_dev;
>>      TPMIf *tpm;
>>      static FwCfgTPMConfig tpm_config;
>>  
>> -    if (!pcms->fw_cfg) {
>> +    if (!x86ms->fw_cfg) {
>>          ACPI_BUILD_DPRINTF("No fw cfg. Bailing out.\n");
>>          return;
>>      }
>> @@ -2902,7 +2907,7 @@ void acpi_setup(void)
>>          acpi_add_rom_blob(acpi_build_update, build_state,
>>                            tables.linker->cmd_blob, "etc/table-loader", 0);
>>  
>> -    fw_cfg_add_file(pcms->fw_cfg, ACPI_BUILD_TPMLOG_FILE,
>> +    fw_cfg_add_file(x86ms->fw_cfg, ACPI_BUILD_TPMLOG_FILE,
>>                      tables.tcpalog->data, acpi_data_len(tables.tcpalog));
>>  
>>      tpm = tpm_find();
>> @@ -2912,13 +2917,13 @@ void acpi_setup(void)
>>              .tpm_version = tpm_get_version(tpm),
>>              .tpmppi_version = TPM_PPI_VERSION_1_30
>>          };
>> -        fw_cfg_add_file(pcms->fw_cfg, "etc/tpm/config",
>> +        fw_cfg_add_file(x86ms->fw_cfg, "etc/tpm/config",
>>                          &tpm_config, sizeof tpm_config);
>>      }
>>  
>>      vmgenid_dev = find_vmgenid_dev();
>>      if (vmgenid_dev) {
>> -        vmgenid_add_fw_cfg(VMGENID(vmgenid_dev), pcms->fw_cfg,
>> +        vmgenid_add_fw_cfg(VMGENID(vmgenid_dev), x86ms->fw_cfg,
>>                             tables.vmgenid);
>>      }
>>  
>> @@ -2931,7 +2936,7 @@ void acpi_setup(void)
>>          uint32_t rsdp_size = acpi_data_len(tables.rsdp);
>>  
>>          build_state->rsdp = g_memdup(tables.rsdp->data, rsdp_size);
>> -        fw_cfg_add_file_callback(pcms->fw_cfg, ACPI_BUILD_RSDP_FILE,
>> +        fw_cfg_add_file_callback(x86ms->fw_cfg, ACPI_BUILD_RSDP_FILE,
>>                                   acpi_build_update, NULL, build_state,
>>                                   build_state->rsdp, rsdp_size, true);
>>          build_state->rsdp_mr = NULL;
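The acpi-build.c hunks above all follow one pattern. Fields such as fw_cfg, below_4g_mem_size and apic_id_limit move from PCMachineState into X86MachineState, and callers reach them through X86_MACHINE(). The sketch below shows in plain C why that cast is valid on a PC machine. It assumes that PCMachineState embeds X86MachineState as its first member, QOM-style, and that X86MachineState in turn embeds MachineState.

The struct layouts, the acpi_build_enabled field and total_ram() are hypothetical simplifications. QEMU's real X86_MACHINE() also checks the object's dynamic type at runtime, which this sketch omits.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the QOM machine types. */
typedef struct MachineState {
    const char *boot_order;
} MachineState;

typedef struct X86MachineState {
    MachineState parent_obj;      /* parent must be the first member */
    uint64_t below_4g_mem_size;
    uint64_t above_4g_mem_size;
    unsigned apic_id_limit;
} X86MachineState;

typedef struct PCMachineState {
    X86MachineState parent_obj;   /* parent must be the first member */
    int acpi_build_enabled;       /* hypothetical PC-only field */
} PCMachineState;

/* Unchecked upcasts; the real QOM macros also verify the object's type. */
#define MACHINE(obj)     ((MachineState *)(obj))
#define X86_MACHINE(obj) ((X86MachineState *)(obj))

/* Code written against X86MachineState works for any x86 subclass. */
static uint64_t total_ram(const X86MachineState *x86ms)
{
    return x86ms->below_4g_mem_size + x86ms->above_4g_mem_size;
}
```

Each parent struct sits at offset 0, so a pointer to the child can be reinterpreted as a pointer to any ancestor. Helpers written against X86MachineState can therefore later be shared with a non-PC x86 machine type such as microvm.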
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index 08884523e2..bb3b5b4563 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -21,6 +21,7 @@
>>   */
>>  
>>  #include "qemu/osdep.h"
>> +#include "hw/i386/x86.h"
>>  #include "hw/i386/pc.h"
>>  #include "hw/pci/msi.h"
>>  #include "hw/pci/pci_bus.h"
>> @@ -1537,6 +1538,7 @@ static void amdvi_realize(DeviceState *dev, Error **err)
>>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
>>      MachineState *ms = MACHINE(qdev_get_machine());
>>      PCMachineState *pcms = PC_MACHINE(ms);
>> +    X86MachineState *x86ms = X86_MACHINE(ms);
>>      PCIBus *bus = pcms->bus;
>>  
>>      s->iotlb = g_hash_table_new_full(amdvi_uint64_hash,
>> @@ -1565,7 +1567,7 @@ static void amdvi_realize(DeviceState *dev, Error **err)
>>      }
>>  
>>      /* Pseudo address space under root PCI bus. */
>> -    pcms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_IOAPIC_SB_DEVID);
>> +    x86ms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_IOAPIC_SB_DEVID);
>>  
>>      /* set up MMIO */
>>      memory_region_init_io(&s->mmio, OBJECT(s), &mmio_mem_ops, s, "amdvi-mmio",
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 75ca6f9c70..21f091c654 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -29,6 +29,7 @@
>>  #include "hw/pci/pci.h"
>>  #include "hw/pci/pci_bus.h"
>>  #include "hw/qdev-properties.h"
>> +#include "hw/i386/x86.h"
>>  #include "hw/i386/pc.h"
>>  #include "hw/i386/apic-msidef.h"
>>  #include "hw/boards.h"
>> @@ -3703,6 +3704,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>>  {
>>      MachineState *ms = MACHINE(qdev_get_machine());
>>      PCMachineState *pcms = PC_MACHINE(ms);
>> +    X86MachineState *x86ms = X86_MACHINE(ms);
>>      PCIBus *bus = pcms->bus;
>>      IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
>>      X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
>> @@ -3743,7 +3745,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
>>      sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
>>      pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
>>      /* Pseudo address space under root PCI bus. */
>> -    pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
>> +    x86ms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
>>  }
>>  
>>  static void vtd_class_init(ObjectClass *klass, void *data)
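The pc.c hunks that follow size FW_CFG_MAX_CPUS and the NUMA fw_cfg table by apic_id_limit rather than by max_cpus. They also remove x86_cpu_apic_id_from_index() from pc.c. The sketch below shows why apic_id_limit and max_cpus can differ when a topology level is not a power of two. It assumes the bit-field packing used by x86_apicid_from_cpu_idx(): the SMT, core, die and package IDs each occupy a field rounded up to a power-of-two width.

The function names are hypothetical. The compat_apic_id_mode path, which simply returns the CPU index, is not modeled.

```c
#include <assert.h>

/* Bits needed to encode IDs 0..count-1 (ceil(log2(count)), 0 for count 1). */
static unsigned bitwidth_for_count(unsigned count)
{
    unsigned bits = 0;

    assert(count > 0);
    while ((1u << bits) < count) {
        bits++;
    }
    return bits;
}

/* Hypothetical mirror of the APIC ID packing: pkg | die | core | smt. */
static unsigned apicid_from_cpu_idx(unsigned nr_dies, unsigned nr_cores,
                                    unsigned nr_threads, unsigned cpu_index)
{
    unsigned smt_width  = bitwidth_for_count(nr_threads);
    unsigned core_width = bitwidth_for_count(nr_cores);
    unsigned die_width  = bitwidth_for_count(nr_dies);

    unsigned smt_id  = cpu_index % nr_threads;
    unsigned core_id = cpu_index / nr_threads % nr_cores;
    unsigned die_id  = cpu_index / (nr_cores * nr_threads) % nr_dies;
    unsigned pkg_id  = cpu_index / (nr_dies * nr_cores * nr_threads);

    return (pkg_id << (smt_width + core_width + die_width)) |
           (die_id << (smt_width + core_width)) |
           (core_id << smt_width) |
           smt_id;
}

/* All APIC IDs are < this value; it is what FW_CFG_MAX_CPUS ends up holding. */
static unsigned apic_id_limit(unsigned nr_dies, unsigned nr_cores,
                              unsigned nr_threads, unsigned max_cpus)
{
    return apicid_from_cpu_idx(nr_dies, nr_cores, nr_threads, max_cpus - 1) + 1;
}
```

With 3 cores per socket and 1 thread per core, the core field needs 2 bits. CPU index 3, the first CPU of the second socket, therefore gets APIC ID 4, and apic_id_limit for 4 CPUs is 5 rather than 4. With power-of-two counts at every level, the two values coincide.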
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 3920aa7e85..d18b461f01 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -24,6 +24,7 @@
>>  
>>  #include "qemu/osdep.h"
>>  #include "qemu/units.h"
>> +#include "hw/i386/x86.h"
>>  #include "hw/i386/pc.h"
>>  #include "hw/char/serial.h"
>>  #include "hw/char/parallel.h"
>> @@ -676,6 +677,7 @@ void pc_cmos_init(PCMachineState *pcms,
>>                    BusState *idebus0, BusState *idebus1,
>>                    ISADevice *s)
>>  {
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>      int val;
>>      static pc_cmos_init_late_arg arg;
>>  
>> @@ -683,12 +685,12 @@ void pc_cmos_init(PCMachineState *pcms,
>>  
>>      /* memory size */
>>      /* base memory (first MiB) */
>> -    val = MIN(pcms->below_4g_mem_size / KiB, 640);
>> +    val = MIN(x86ms->below_4g_mem_size / KiB, 640);
>>      rtc_set_memory(s, 0x15, val);
>>      rtc_set_memory(s, 0x16, val >> 8);
>>      /* extended memory (next 64MiB) */
>> -    if (pcms->below_4g_mem_size > 1 * MiB) {
>> -        val = (pcms->below_4g_mem_size - 1 * MiB) / KiB;
>> +    if (x86ms->below_4g_mem_size > 1 * MiB) {
>> +        val = (x86ms->below_4g_mem_size - 1 * MiB) / KiB;
>>      } else {
>>          val = 0;
>>      }
>> @@ -699,8 +701,8 @@ void pc_cmos_init(PCMachineState *pcms,
>>      rtc_set_memory(s, 0x30, val);
>>      rtc_set_memory(s, 0x31, val >> 8);
>>      /* memory between 16MiB and 4GiB */
>> -    if (pcms->below_4g_mem_size > 16 * MiB) {
>> -        val = (pcms->below_4g_mem_size - 16 * MiB) / (64 * KiB);
>> +    if (x86ms->below_4g_mem_size > 16 * MiB) {
>> +        val = (x86ms->below_4g_mem_size - 16 * MiB) / (64 * KiB);
>>      } else {
>>          val = 0;
>>      }
>> @@ -709,20 +711,20 @@ void pc_cmos_init(PCMachineState *pcms,
>>      rtc_set_memory(s, 0x34, val);
>>      rtc_set_memory(s, 0x35, val >> 8);
>>      /* memory above 4GiB */
>> -    val = pcms->above_4g_mem_size / 65536;
>> +    val = x86ms->above_4g_mem_size / 65536;
>>      rtc_set_memory(s, 0x5b, val);
>>      rtc_set_memory(s, 0x5c, val >> 8);
>>      rtc_set_memory(s, 0x5d, val >> 16);
>>  
>> -    object_property_add_link(OBJECT(pcms), "rtc_state",
>> +    object_property_add_link(OBJECT(x86ms), "rtc_state",
>>                               TYPE_ISA_DEVICE,
>> -                             (Object **)&pcms->rtc,
>> +                             (Object **)&x86ms->rtc,
>>                               object_property_allow_set_link,
>>                               OBJ_PROP_LINK_STRONG, &error_abort);
>> -    object_property_set_link(OBJECT(pcms), OBJECT(s),
>> +    object_property_set_link(OBJECT(x86ms), OBJECT(s),
>>                               "rtc_state", &error_abort);
>>  
>> -    set_boot_dev(s, MACHINE(pcms)->boot_order, &error_fatal);
>> +    set_boot_dev(s, MACHINE(x86ms)->boot_order, &error_fatal);
>>  
>>      val = 0;
>>      val |= 0x02; /* FPU is there */
>> @@ -863,35 +865,6 @@ static void handle_a20_line_change(void *opaque, int irq, int level)
>>      x86_cpu_set_a20(cpu, level);
>>  }
>>  
>> -/* Calculates initial APIC ID for a specific CPU index
>> - *
>> - * Currently we need to be able to calculate the APIC ID from the CPU index
>> - * alone (without requiring a CPU object), as the QEMU<->Seabios interfaces have
>> - * no concept of "CPU index", and the NUMA tables on fw_cfg need the APIC ID of
>> - * all CPUs up to max_cpus.
>> - */
>> -static uint32_t x86_cpu_apic_id_from_index(PCMachineState *pcms,
>> -                                           unsigned int cpu_index)
>> -{
>> -    MachineState *ms = MACHINE(pcms);
>> -    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>> -    uint32_t correct_id;
>> -    static bool warned;
>> -
>> -    correct_id = x86_apicid_from_cpu_idx(pcms->smp_dies, ms->smp.cores,
>> -                                         ms->smp.threads, cpu_index);
>> -    if (pcmc->compat_apic_id_mode) {
>> -        if (cpu_index != correct_id && !warned && !qtest_enabled()) {
>> -            error_report("APIC IDs set in compatibility mode, "
>> -                         "CPU topology won't match the configuration");
>> -            warned = true;
>> -        }
>> -        return cpu_index;
>> -    } else {
>> -        return correct_id;
>> -    }
>> -}
>> -
>>  static void pc_build_smbios(PCMachineState *pcms)
>>  {
>>      uint8_t *smbios_tables, *smbios_anchor;
>> @@ -899,6 +872,7 @@ static void pc_build_smbios(PCMachineState *pcms)
>>      struct smbios_phys_mem_area *mem_array;
>>      unsigned i, array_count;
>>      MachineState *ms = MACHINE(pcms);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>      X86CPU *cpu = X86_CPU(ms->possible_cpus->cpus[0].cpu);
>>  
>>      /* tell smbios about cpuid version and features */
>> @@ -906,7 +880,7 @@ static void pc_build_smbios(PCMachineState *pcms)
>>  
>>      smbios_tables = smbios_get_table_legacy(ms, &smbios_tables_len);
>>      if (smbios_tables) {
>> -        fw_cfg_add_bytes(pcms->fw_cfg, FW_CFG_SMBIOS_ENTRIES,
>> +        fw_cfg_add_bytes(x86ms->fw_cfg, FW_CFG_SMBIOS_ENTRIES,
>>                           smbios_tables, smbios_tables_len);
>>      }
>>  
>> @@ -927,9 +901,9 @@ static void pc_build_smbios(PCMachineState *pcms)
>>      g_free(mem_array);
>>  
>>      if (smbios_anchor) {
>> -        fw_cfg_add_file(pcms->fw_cfg, "etc/smbios/smbios-tables",
>> +        fw_cfg_add_file(x86ms->fw_cfg, "etc/smbios/smbios-tables",
>>                          smbios_tables, smbios_tables_len);
>> -        fw_cfg_add_file(pcms->fw_cfg, "etc/smbios/smbios-anchor",
>> +        fw_cfg_add_file(x86ms->fw_cfg, "etc/smbios/smbios-anchor",
>>                          smbios_anchor, smbios_anchor_len);
>>      }
>>  }
>> @@ -942,10 +916,11 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
>>      const CPUArchIdList *cpus;
>>      MachineClass *mc = MACHINE_GET_CLASS(pcms);
>>      MachineState *ms = MACHINE(pcms);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>      int nb_numa_nodes = ms->numa_state->num_nodes;
>>  
>>      fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as);
>> -    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
>> +    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
>>  
>>      /* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86:
>>       *
>> @@ -959,7 +934,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
>>       * So for compatibility reasons with old BIOSes we are stuck with
>>       * "etc/max-cpus" actually being apic_id_limit
>>       */
>> -    fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)pcms->apic_id_limit);
>> +    fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)x86ms->apic_id_limit);
>>      fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
>>      fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES,
>>                       acpi_tables, acpi_tables_len);
>> @@ -972,374 +947,25 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
>>       * of nodes, one word for each VCPU->node and one word for each node to
>>       * hold the amount of memory.
>>       */
>> -    numa_fw_cfg = g_new0(uint64_t, 1 + pcms->apic_id_limit + nb_numa_nodes);
>> +    numa_fw_cfg = g_new0(uint64_t, 1 + x86ms->apic_id_limit + nb_numa_nodes);
>>      numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
>>      cpus = mc->possible_cpu_arch_ids(MACHINE(pcms));
>>      for (i = 0; i < cpus->len; i++) {
>>          unsigned int apic_id = cpus->cpus[i].arch_id;
>> -        assert(apic_id < pcms->apic_id_limit);
>> +        assert(apic_id < x86ms->apic_id_limit);
>>          numa_fw_cfg[apic_id + 1] = cpu_to_le64(cpus->cpus[i].props.node_id);
>>      }
>>      for (i = 0; i < nb_numa_nodes; i++) {
>> -        numa_fw_cfg[pcms->apic_id_limit + 1 + i] =
>> +        numa_fw_cfg[x86ms->apic_id_limit + 1 + i] =
>>              cpu_to_le64(ms->numa_state->nodes[i].node_mem);
>>      }
>>      fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg,
>> -                     (1 + pcms->apic_id_limit + nb_numa_nodes) *
>> +                     (1 + x86ms->apic_id_limit + nb_numa_nodes) *
>>                       sizeof(*numa_fw_cfg));
>>  
>>      return fw_cfg;
>>  }
>>  
>> -static long get_file_size(FILE *f)
>> -{
>> -    long where, size;
>> -
>> -    /* XXX: on Unix systems, using fstat() probably makes more sense */
>> -
>> -    where = ftell(f);
>> -    fseek(f, 0, SEEK_END);
>> -    size = ftell(f);
>> -    fseek(f, where, SEEK_SET);
>> -
>> -    return size;
>> -}
>> -
>> -struct setup_data {
>> -    uint64_t next;
>> -    uint32_t type;
>> -    uint32_t len;
>> -    uint8_t data[0];
>> -} __attribute__((packed));
>> -
>> -static void load_linux(PCMachineState *pcms,
>> -                       FWCfgState *fw_cfg)
>> -{
>> -    uint16_t protocol;
>> -    int setup_size, kernel_size, cmdline_size;
>> -    int dtb_size, setup_data_offset;
>> -    uint32_t initrd_max;
>> -    uint8_t header[8192], *setup, *kernel;
>> -    hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0;
>> -    FILE *f;
>> -    char *vmode;
>> -    MachineState *machine = MACHINE(pcms);
>> -    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>> -    struct setup_data *setup_data;
>> -    const char *kernel_filename = machine->kernel_filename;
>> -    const char *initrd_filename = machine->initrd_filename;
>> -    const char *dtb_filename = machine->dtb;
>> -    const char *kernel_cmdline = machine->kernel_cmdline;
>> -
>> -    /* Align to 16 bytes as a paranoia measure */
>> -    cmdline_size = (strlen(kernel_cmdline)+16) & ~15;
>> -
>> -    /* load the kernel header */
>> -    f = fopen(kernel_filename, "rb");
>> -    if (!f || !(kernel_size = get_file_size(f)) ||
>> -        fread(header, 1, MIN(ARRAY_SIZE(header), kernel_size), f) !=
>> -        MIN(ARRAY_SIZE(header), kernel_size)) {
>> -        fprintf(stderr, "qemu: could not load kernel '%s': %s\n",
>> -                kernel_filename, strerror(errno));
>> -        exit(1);
>> -    }
>> -
>> -    /* kernel protocol version */
>> -#if 0
>> -    fprintf(stderr, "header magic: %#x\n", ldl_p(header+0x202));
>> -#endif
>> -    if (ldl_p(header+0x202) == 0x53726448) {
>> -        protocol = lduw_p(header+0x206);
>> -    } else {
>> -        size_t pvh_start_addr;
>> -        uint32_t mh_load_addr = 0;
>> -        uint32_t elf_kernel_size = 0;
>> -        /*
>> -         * This could be a multiboot kernel. If it is, let's stop treating it
>> -         * like a Linux kernel.
>> -         * Note: some multiboot images could be in the ELF format (the same of
>> -         * PVH), so we try multiboot first since we check the multiboot magic
>> -         * header before to load it.
>> -         */
>> -        if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
>> -                           kernel_cmdline, kernel_size, header)) {
>> -            return;
>> -        }
>> -        /*
>> -         * Check if the file is an uncompressed kernel file (ELF) and load it,
>> -         * saving the PVH entry point used by the x86/HVM direct boot ABI.
>> -         * If load_elfboot() is successful, populate the fw_cfg info.
>> -         */
>> -        if (pcmc->pvh_enabled &&
>> -            pvh_load_elfboot(kernel_filename,
>> -                             &mh_load_addr, &elf_kernel_size)) {
>> -            fclose(f);
>> -
>> -            pvh_start_addr = pvh_get_start_addr();
>> -
>> -            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
>> -            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
>> -            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
>> -
>> -            fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
>> -                strlen(kernel_cmdline) + 1);
>> -            fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
>> -
>> -            fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
>> -            fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
>> -                             header, sizeof(header));
>> -
>> -            /* load initrd */
>> -            if (initrd_filename) {
>> -                GMappedFile *mapped_file;
>> -                gsize initrd_size;
>> -                gchar *initrd_data;
>> -                GError *gerr = NULL;
>> -
>> -                mapped_file = g_mapped_file_new(initrd_filename, false, &gerr);
>> -                if (!mapped_file) {
>> -                    fprintf(stderr, "qemu: error reading initrd %s: %s\n",
>> -                            initrd_filename, gerr->message);
>> -                    exit(1);
>> -                }
>> -                pcms->initrd_mapped_file = mapped_file;
>> -
>> -                initrd_data = g_mapped_file_get_contents(mapped_file);
>> -                initrd_size = g_mapped_file_get_length(mapped_file);
>> -                initrd_max = pcms->below_4g_mem_size - pcmc->acpi_data_size - 1;
>> -                if (initrd_size >= initrd_max) {
>> -                    fprintf(stderr, "qemu: initrd is too large, cannot support."
>> -                            "(max: %"PRIu32", need %"PRId64")\n",
>> -                            initrd_max, (uint64_t)initrd_size);
>> -                    exit(1);
>> -                }
>> -
>> -                initrd_addr = (initrd_max - initrd_size) & ~4095;
>> -
>> -                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
>> -                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
>> -                fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data,
>> -                                 initrd_size);
>> -            }
>> -
>> -            option_rom[nb_option_roms].bootindex = 0;
>> -            option_rom[nb_option_roms].name = "pvh.bin";
>> -            nb_option_roms++;
>> -
>> -            return;
>> -        }
>> -        protocol = 0;
>> -    }
>> -
>> -    if (protocol < 0x200 || !(header[0x211] & 0x01)) {
>> -        /* Low kernel */
>> -        real_addr    = 0x90000;
>> -        cmdline_addr = 0x9a000 - cmdline_size;
>> -        prot_addr    = 0x10000;
>> -    } else if (protocol < 0x202) {
>> -        /* High but ancient kernel */
>> -        real_addr    = 0x90000;
>> -        cmdline_addr = 0x9a000 - cmdline_size;
>> -        prot_addr    = 0x100000;
>> -    } else {
>> -        /* High and recent kernel */
>> -        real_addr    = 0x10000;
>> -        cmdline_addr = 0x20000;
>> -        prot_addr    = 0x100000;
>> -    }
>> -
>> -#if 0
>> -    fprintf(stderr,
>> -            "qemu: real_addr     = 0x" TARGET_FMT_plx "\n"
>> -            "qemu: cmdline_addr  = 0x" TARGET_FMT_plx "\n"
>> -            "qemu: prot_addr     = 0x" TARGET_FMT_plx "\n",
>> -            real_addr,
>> -            cmdline_addr,
>> -            prot_addr);
>> -#endif
>> -
>> -    /* highest address for loading the initrd */
>> -    if (protocol >= 0x20c &&
>> -        lduw_p(header+0x236) & XLF_CAN_BE_LOADED_ABOVE_4G) {
>> -        /*
>> -         * Linux has supported initrd up to 4 GB for a very long time (2007,
>> -         * long before XLF_CAN_BE_LOADED_ABOVE_4G which was added in 2013),
>> -         * though it only sets initrd_max to 2 GB to "work around bootloader
>> -         * bugs". Luckily, QEMU firmware(which does something like bootloader)
>> -         * has supported this.
>> -         *
>> -         * It's believed that if XLF_CAN_BE_LOADED_ABOVE_4G is set, initrd can
>> -         * be loaded into any address.
>> -         *
>> -         * In addition, initrd_max is uint32_t simply because QEMU doesn't
>> -         * support the 64-bit boot protocol (specifically the ext_ramdisk_image
>> -         * field).
>> -         *
>> -         * Therefore here just limit initrd_max to UINT32_MAX simply as well.
>> -         */
>> -        initrd_max = UINT32_MAX;
>> -    } else if (protocol >= 0x203) {
>> -        initrd_max = ldl_p(header+0x22c);
>> -    } else {
>> -        initrd_max = 0x37ffffff;
>> -    }
>> -
>> -    if (initrd_max >= pcms->below_4g_mem_size - pcmc->acpi_data_size) {
>> -        initrd_max = pcms->below_4g_mem_size - pcmc->acpi_data_size - 1;
>> -    }
>> -
>> -    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
>> -    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline)+1);
>> -    fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
>> -
>> -    if (protocol >= 0x202) {
>> -        stl_p(header+0x228, cmdline_addr);
>> -    } else {
>> -        stw_p(header+0x20, 0xA33F);
>> -        stw_p(header+0x22, cmdline_addr-real_addr);
>> -    }
>> -
>> -    /* handle vga= parameter */
>> -    vmode = strstr(kernel_cmdline, "vga=");
>> -    if (vmode) {
>> -        unsigned int video_mode;
>> -        /* skip "vga=" */
>> -        vmode += 4;
>> -        if (!strncmp(vmode, "normal", 6)) {
>> -            video_mode = 0xffff;
>> -        } else if (!strncmp(vmode, "ext", 3)) {
>> -            video_mode = 0xfffe;
>> -        } else if (!strncmp(vmode, "ask", 3)) {
>> -            video_mode = 0xfffd;
>> -        } else {
>> -            video_mode = strtol(vmode, NULL, 0);
>> -        }
>> -        stw_p(header+0x1fa, video_mode);
>> -    }
>> -
>> -    /* loader type */
>> -    /* High nybble = B reserved for QEMU; low nybble is revision number.
>> -       If this code is substantially changed, you may want to consider
>> -       incrementing the revision. */
>> -    if (protocol >= 0x200) {
>> -        header[0x210] = 0xB0;
>> -    }
>> -    /* heap */
>> -    if (protocol >= 0x201) {
>> -        header[0x211] |= 0x80;	/* CAN_USE_HEAP */
>> -        stw_p(header+0x224, cmdline_addr-real_addr-0x200);
>> -    }
>> -
>> -    /* load initrd */
>> -    if (initrd_filename) {
>> -        GMappedFile *mapped_file;
>> -        gsize initrd_size;
>> -        gchar *initrd_data;
>> -        GError *gerr = NULL;
>> -
>> -        if (protocol < 0x200) {
>> -            fprintf(stderr, "qemu: linux kernel too old to load a ram disk\n");
>> -            exit(1);
>> -        }
>> -
>> -        mapped_file = g_mapped_file_new(initrd_filename, false, &gerr);
>> -        if (!mapped_file) {
>> -            fprintf(stderr, "qemu: error reading initrd %s: %s\n",
>> -                    initrd_filename, gerr->message);
>> -            exit(1);
>> -        }
>> -        pcms->initrd_mapped_file = mapped_file;
>> -
>> -        initrd_data = g_mapped_file_get_contents(mapped_file);
>> -        initrd_size = g_mapped_file_get_length(mapped_file);
>> -        if (initrd_size >= initrd_max) {
>> -            fprintf(stderr, "qemu: initrd is too large, cannot support."
>> -                    "(max: %"PRIu32", need %"PRId64")\n",
>> -                    initrd_max, (uint64_t)initrd_size);
>> -            exit(1);
>> -        }
>> -
>> -        initrd_addr = (initrd_max-initrd_size) & ~4095;
>> -
>> -        fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
>> -        fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
>> -        fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data, initrd_size);
>> -
>> -        stl_p(header+0x218, initrd_addr);
>> -        stl_p(header+0x21c, initrd_size);
>> -    }
>> -
>> -    /* load kernel and setup */
>> -    setup_size = header[0x1f1];
>> -    if (setup_size == 0) {
>> -        setup_size = 4;
>> -    }
>> -    setup_size = (setup_size+1)*512;
>> -    if (setup_size > kernel_size) {
>> -        fprintf(stderr, "qemu: invalid kernel header\n");
>> -        exit(1);
>> -    }
>> -    kernel_size -= setup_size;
>> -
>> -    setup  = g_malloc(setup_size);
>> -    kernel = g_malloc(kernel_size);
>> -    fseek(f, 0, SEEK_SET);
>> -    if (fread(setup, 1, setup_size, f) != setup_size) {
>> -        fprintf(stderr, "fread() failed\n");
>> -        exit(1);
>> -    }
>> -    if (fread(kernel, 1, kernel_size, f) != kernel_size) {
>> -        fprintf(stderr, "fread() failed\n");
>> -        exit(1);
>> -    }
>> -    fclose(f);
>> -
>> -    /* append dtb to kernel */
>> -    if (dtb_filename) {
>> -        if (protocol < 0x209) {
>> -            fprintf(stderr, "qemu: Linux kernel too old to load a dtb\n");
>> -            exit(1);
>> -        }
>> -
>> -        dtb_size = get_image_size(dtb_filename);
>> -        if (dtb_size <= 0) {
>> -            fprintf(stderr, "qemu: error reading dtb %s: %s\n",
>> -                    dtb_filename, strerror(errno));
>> -            exit(1);
>> -        }
>> -
>> -        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
>> -        kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size;
>> -        kernel = g_realloc(kernel, kernel_size);
>> -
>> -        stq_p(header+0x250, prot_addr + setup_data_offset);
>> -
>> -        setup_data = (struct setup_data *)(kernel + setup_data_offset);
>> -        setup_data->next = 0;
>> -        setup_data->type = cpu_to_le32(SETUP_DTB);
>> -        setup_data->len = cpu_to_le32(dtb_size);
>> -
>> -        load_image_size(dtb_filename, setup_data->data, dtb_size);
>> -    }
>> -
>> -    memcpy(setup, header, MIN(sizeof(header), setup_size));
>> -
>> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
>> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size);
>> -    fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel, kernel_size);
>> -
>> -    fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_ADDR, real_addr);
>> -    fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, setup_size);
>> -    fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
>> -
>> -    option_rom[nb_option_roms].bootindex = 0;
>> -    option_rom[nb_option_roms].name = "linuxboot.bin";
>> -    if (pcmc->linuxboot_dma_enabled && fw_cfg_dma_enabled(fw_cfg)) {
>> -        option_rom[nb_option_roms].name = "linuxboot_dma.bin";
>> -    }
>> -    nb_option_roms++;
>> -}
>> -
>>  #define NE2000_NB_MAX 6
>>  
>>  static const int ne2000_io[NE2000_NB_MAX] = { 0x300, 0x320, 0x340, 0x360,
>> @@ -1376,157 +1002,10 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level)
>>      }
>>  }
>>  
>> -static void pc_new_cpu(PCMachineState *pcms, int64_t apic_id, Error **errp)
>> -{
>> -    Object *cpu = NULL;
>> -    Error *local_err = NULL;
>> -    CPUX86State *env = NULL;
>> -
>> -    cpu = object_new(MACHINE(pcms)->cpu_type);
>> -
>> -    env = &X86_CPU(cpu)->env;
>> -    env->nr_dies = pcms->smp_dies;
>> -
>> -    object_property_set_uint(cpu, apic_id, "apic-id", &local_err);
>> -    object_property_set_bool(cpu, true, "realized", &local_err);
>> -
>> -    object_unref(cpu);
>> -    error_propagate(errp, local_err);
>> -}
>> -
>> -/*
>> - * This function is very similar to smp_parse()
>> - * in hw/core/machine.c but includes CPU die support.
>> - */
>> -void pc_smp_parse(MachineState *ms, QemuOpts *opts)
>> -{
>> -    PCMachineState *pcms = PC_MACHINE(ms);
>> -
>> -    if (opts) {
>> -        unsigned cpus    = qemu_opt_get_number(opts, "cpus", 0);
>> -        unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
>> -        unsigned dies = qemu_opt_get_number(opts, "dies", 1);
>> -        unsigned cores   = qemu_opt_get_number(opts, "cores", 0);
>> -        unsigned threads = qemu_opt_get_number(opts, "threads", 0);
>> -
>> -        /* compute missing values, prefer sockets over cores over threads */
>> -        if (cpus == 0 || sockets == 0) {
>> -            cores = cores > 0 ? cores : 1;
>> -            threads = threads > 0 ? threads : 1;
>> -            if (cpus == 0) {
>> -                sockets = sockets > 0 ? sockets : 1;
>> -                cpus = cores * threads * dies * sockets;
>> -            } else {
>> -                ms->smp.max_cpus =
>> -                        qemu_opt_get_number(opts, "maxcpus", cpus);
>> -                sockets = ms->smp.max_cpus / (cores * threads * dies);
>> -            }
>> -        } else if (cores == 0) {
>> -            threads = threads > 0 ? threads : 1;
>> -            cores = cpus / (sockets * dies * threads);
>> -            cores = cores > 0 ? cores : 1;
>> -        } else if (threads == 0) {
>> -            threads = cpus / (cores * dies * sockets);
>> -            threads = threads > 0 ? threads : 1;
>> -        } else if (sockets * dies * cores * threads < cpus) {
>> -            error_report("cpu topology: "
>> -                         "sockets (%u) * dies (%u) * cores (%u) * threads (%u) < "
>> -                         "smp_cpus (%u)",
>> -                         sockets, dies, cores, threads, cpus);
>> -            exit(1);
>> -        }
>> -
>> -        ms->smp.max_cpus =
>> -                qemu_opt_get_number(opts, "maxcpus", cpus);
>> -
>> -        if (ms->smp.max_cpus < cpus) {
>> -            error_report("maxcpus must be equal to or greater than smp");
>> -            exit(1);
>> -        }
>> -
>> -        if (sockets * dies * cores * threads > ms->smp.max_cpus) {
>> -            error_report("cpu topology: "
>> -                         "sockets (%u) * dies (%u) * cores (%u) * threads (%u) > "
>> -                         "maxcpus (%u)",
>> -                         sockets, dies, cores, threads,
>> -                         ms->smp.max_cpus);
>> -            exit(1);
>> -        }
>> -
>> -        if (sockets * dies * cores * threads != ms->smp.max_cpus) {
>> -            warn_report("Invalid CPU topology deprecated: "
>> -                        "sockets (%u) * dies (%u) * cores (%u) * threads (%u) "
>> -                        "!= maxcpus (%u)",
>> -                        sockets, dies, cores, threads,
>> -                        ms->smp.max_cpus);
>> -        }
>> -
>> -        ms->smp.cpus = cpus;
>> -        ms->smp.cores = cores;
>> -        ms->smp.threads = threads;
>> -        pcms->smp_dies = dies;
>> -    }
>> -
>> -    if (ms->smp.cpus > 1) {
>> -        Error *blocker = NULL;
>> -        error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp");
>> -        replay_add_blocker(blocker);
>> -    }
>> -}
>> -
>> -void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp)
>> -{
>> -    PCMachineState *pcms = PC_MACHINE(ms);
>> -    int64_t apic_id = x86_cpu_apic_id_from_index(pcms, id);
>> -    Error *local_err = NULL;
>> -
>> -    if (id < 0) {
>> -        error_setg(errp, "Invalid CPU id: %" PRIi64, id);
>> -        return;
>> -    }
>> -
>> -    if (apic_id >= ACPI_CPU_HOTPLUG_ID_LIMIT) {
>> -        error_setg(errp, "Unable to add CPU: %" PRIi64
>> -                   ", resulting APIC ID (%" PRIi64 ") is too large",
>> -                   id, apic_id);
>> -        return;
>> -    }
>> -
>> -    pc_new_cpu(PC_MACHINE(ms), apic_id, &local_err);
>> -    if (local_err) {
>> -        error_propagate(errp, local_err);
>> -        return;
>> -    }
>> -}
>> -
>> -void pc_cpus_init(PCMachineState *pcms)
>> -{
>> -    int i;
>> -    const CPUArchIdList *possible_cpus;
>> -    MachineState *ms = MACHINE(pcms);
>> -    MachineClass *mc = MACHINE_GET_CLASS(pcms);
>> -    PCMachineClass *pcmc = PC_MACHINE_CLASS(mc);
>> -
>> -    x86_cpu_set_default_version(pcmc->default_cpu_version);
>> -
>> -    /* Calculates the limit to CPU APIC ID values
>> -     *
>> -     * Limit for the APIC ID value, so that all
>> -     * CPU APIC IDs are < pcms->apic_id_limit.
>> -     *
>> -     * This is used for FW_CFG_MAX_CPUS. See comments on bochs_bios_init().
>> -     */
>> -    pcms->apic_id_limit = x86_cpu_apic_id_from_index(pcms,
>> -                                                     ms->smp.max_cpus - 1) + 1;
>> -    possible_cpus = mc->possible_cpu_arch_ids(ms);
>> -    for (i = 0; i < ms->smp.cpus; i++) {
>> -        pc_new_cpu(pcms, possible_cpus->cpus[i].arch_id, &error_fatal);
>> -    }
>> -}
>> -
>>  static void pc_build_feature_control_file(PCMachineState *pcms)
>>  {
>>      MachineState *ms = MACHINE(pcms);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>      X86CPU *cpu = X86_CPU(ms->possible_cpus->cpus[0].cpu);
>>      CPUX86State *env = &cpu->env;
>>      uint32_t unused, ecx, edx;
>> @@ -1550,7 +1029,7 @@ static void pc_build_feature_control_file(PCMachineState *pcms)
>>  
>>      val = g_malloc(sizeof(*val));
>>      *val = cpu_to_le64(feature_control_bits | FEATURE_CONTROL_LOCKED);
>> -    fw_cfg_add_file(pcms->fw_cfg, "etc/msr_feature_control", val, sizeof(*val));
>> +    fw_cfg_add_file(x86ms->fw_cfg, "etc/msr_feature_control", val, sizeof(*val));
>>  }
>>  
>>  static void rtc_set_cpus_count(ISADevice *rtc, uint16_t cpus_count)
>> @@ -1571,10 +1050,11 @@ void pc_machine_done(Notifier *notifier, void *data)
>>  {
>>      PCMachineState *pcms = container_of(notifier,
>>                                          PCMachineState, machine_done);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>      PCIBus *bus = pcms->bus;
>>  
>>      /* set the number of CPUs */
>> -    rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
>> +    rtc_set_cpus_count(x86ms->rtc, x86ms->boot_cpus);
>>  
>>      if (bus) {
>>          int extra_hosts = 0;
>> @@ -1585,23 +1065,23 @@ void pc_machine_done(Notifier *notifier, void *data)
>>                  extra_hosts++;
>>              }
>>          }
>> -        if (extra_hosts && pcms->fw_cfg) {
>> +        if (extra_hosts && x86ms->fw_cfg) {
>>              uint64_t *val = g_malloc(sizeof(*val));
>>              *val = cpu_to_le64(extra_hosts);
>> -            fw_cfg_add_file(pcms->fw_cfg,
>> +            fw_cfg_add_file(x86ms->fw_cfg,
>>                      "etc/extra-pci-roots", val, sizeof(*val));
>>          }
>>      }
>>  
>>      acpi_setup();
>> -    if (pcms->fw_cfg) {
>> +    if (x86ms->fw_cfg) {
>>          pc_build_smbios(pcms);
>>          pc_build_feature_control_file(pcms);
>>          /* update FW_CFG_NB_CPUS to account for -device added CPUs */
>> -        fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
>> +        fw_cfg_modify_i16(x86ms->fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
>>      }
>>  
>> -    if (pcms->apic_id_limit > 255 && !xen_enabled()) {
>> +    if (x86ms->apic_id_limit > 255 && !xen_enabled()) {
>>          IntelIOMMUState *iommu = INTEL_IOMMU_DEVICE(x86_iommu_get_default());
>>  
>>          if (!iommu || !x86_iommu_ir_supported(X86_IOMMU_DEVICE(iommu)) ||
>> @@ -1619,8 +1099,9 @@ void pc_guest_info_init(PCMachineState *pcms)
>>  {
>>      int i;
>>      MachineState *ms = MACHINE(pcms);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>  
>> -    pcms->apic_xrupt_override = kvm_allows_irq0_override();
>> +    x86ms->apic_xrupt_override = kvm_allows_irq0_override();
>>      pcms->numa_nodes = ms->numa_state->num_nodes;
>>      pcms->node_mem = g_malloc0(pcms->numa_nodes *
>>                                      sizeof *pcms->node_mem);
>> @@ -1645,14 +1126,17 @@ void xen_load_linux(PCMachineState *pcms)
>>  {
>>      int i;
>>      FWCfgState *fw_cfg;
>> +    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>  
>>      assert(MACHINE(pcms)->kernel_filename != NULL);
>>  
>>      fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
>> -    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
>> +    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
>>      rom_set_fw(fw_cfg);
>>  
>> -    load_linux(pcms, fw_cfg);
>> +    load_linux(x86ms, fw_cfg, pcmc->acpi_data_size,
>> +               pcmc->linuxboot_dma_enabled, pcmc->pvh_enabled);
>>      for (i = 0; i < nb_option_roms; i++) {
>>          assert(!strcmp(option_rom[i].name, "linuxboot.bin") ||
>>                 !strcmp(option_rom[i].name, "linuxboot_dma.bin") ||
>> @@ -1660,7 +1144,7 @@ void xen_load_linux(PCMachineState *pcms)
>>                 !strcmp(option_rom[i].name, "multiboot.bin"));
>>          rom_add_option(option_rom[i].name, option_rom[i].bootindex);
>>      }
>> -    pcms->fw_cfg = fw_cfg;
>> +    x86ms->fw_cfg = fw_cfg;
>>  }
>>  
>>  void pc_memory_init(PCMachineState *pcms,
>> @@ -1673,10 +1157,11 @@ void pc_memory_init(PCMachineState *pcms,
>>      MemoryRegion *ram_below_4g, *ram_above_4g;
>>      FWCfgState *fw_cfg;
>>      MachineState *machine = MACHINE(pcms);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>>  
>> -    assert(machine->ram_size == pcms->below_4g_mem_size +
>> -                                pcms->above_4g_mem_size);
>> +    assert(machine->ram_size == x86ms->below_4g_mem_size +
>> +                                x86ms->above_4g_mem_size);
>>  
>>      linux_boot = (machine->kernel_filename != NULL);
>>  
>> @@ -1690,17 +1175,17 @@ void pc_memory_init(PCMachineState *pcms,
>>      *ram_memory = ram;
>>      ram_below_4g = g_malloc(sizeof(*ram_below_4g));
>>      memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
>> -                             0, pcms->below_4g_mem_size);
>> +                             0, x86ms->below_4g_mem_size);
>>      memory_region_add_subregion(system_memory, 0, ram_below_4g);
>> -    e820_add_entry(0, pcms->below_4g_mem_size, E820_RAM);
>> -    if (pcms->above_4g_mem_size > 0) {
>> +    e820_add_entry(0, x86ms->below_4g_mem_size, E820_RAM);
>> +    if (x86ms->above_4g_mem_size > 0) {
>>          ram_above_4g = g_malloc(sizeof(*ram_above_4g));
>>          memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
>> -                                 pcms->below_4g_mem_size,
>> -                                 pcms->above_4g_mem_size);
>> +                                 x86ms->below_4g_mem_size,
>> +                                 x86ms->above_4g_mem_size);
>>          memory_region_add_subregion(system_memory, 0x100000000ULL,
>>                                      ram_above_4g);
>> -        e820_add_entry(0x100000000ULL, pcms->above_4g_mem_size, E820_RAM);
>> +        e820_add_entry(0x100000000ULL, x86ms->above_4g_mem_size, E820_RAM);
>>      }
>>  
>>      if (!pcmc->has_reserved_memory &&
>> @@ -1735,7 +1220,7 @@ void pc_memory_init(PCMachineState *pcms,
>>          }
>>  
>>          machine->device_memory->base =
>> -            ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size, 1 * GiB);
>> +            ROUND_UP(0x100000000ULL + x86ms->above_4g_mem_size, 1 * GiB);
>>  
>>          if (pcmc->enforce_aligned_dimm) {
>>              /* size device region assuming 1G page max alignment per slot */
>> @@ -1786,16 +1271,17 @@ void pc_memory_init(PCMachineState *pcms,
>>      }
>>  
>>      if (linux_boot) {
>> -        load_linux(pcms, fw_cfg);
>> +        load_linux(x86ms, fw_cfg, pcmc->acpi_data_size,
>> +                   pcmc->linuxboot_dma_enabled, pcmc->pvh_enabled);
>>      }
>>  
>>      for (i = 0; i < nb_option_roms; i++) {
>>          rom_add_option(option_rom[i].name, option_rom[i].bootindex);
>>      }
>> -    pcms->fw_cfg = fw_cfg;
>> +    x86ms->fw_cfg = fw_cfg;
>>  
>>      /* Init default IOAPIC address space */
>> -    pcms->ioapic_as = &address_space_memory;
>> +    x86ms->ioapic_as = &address_space_memory;
>>  }
>>  
>>  /*
>> @@ -1807,6 +1293,7 @@ uint64_t pc_pci_hole64_start(void)
>>      PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
>>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>>      MachineState *ms = MACHINE(pcms);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>      uint64_t hole64_start = 0;
>>  
>>      if (pcmc->has_reserved_memory && ms->device_memory->base) {
>> @@ -1815,7 +1302,7 @@ uint64_t pc_pci_hole64_start(void)
>>              hole64_start += memory_region_size(&ms->device_memory->mr);
>>          }
>>      } else {
>> -        hole64_start = 0x100000000ULL + pcms->above_4g_mem_size;
>> +        hole64_start = 0x100000000ULL + x86ms->above_4g_mem_size;
>>      }
>>  
>>      return ROUND_UP(hole64_start, 1 * GiB);
>> @@ -2154,6 +1641,7 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev,
>>      Error *local_err = NULL;
>>      X86CPU *cpu = X86_CPU(dev);
>>      PCMachineState *pcms = PC_MACHINE(hotplug_dev);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>  
>>      if (pcms->acpi_dev) {
>>          hotplug_handler_plug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err);
>> @@ -2163,12 +1651,12 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev,
>>      }
>>  
>>      /* increment the number of CPUs */
>> -    pcms->boot_cpus++;
>> -    if (pcms->rtc) {
>> -        rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
>> +    x86ms->boot_cpus++;
>> +    if (x86ms->rtc) {
>> +        rtc_set_cpus_count(x86ms->rtc, x86ms->boot_cpus);
>>      }
>> -    if (pcms->fw_cfg) {
>> -        fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
>> +    if (x86ms->fw_cfg) {
>> +        fw_cfg_modify_i16(x86ms->fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
>>      }
>>  
>>      found_cpu = pc_find_cpu_slot(MACHINE(pcms), cpu->apic_id, NULL);
>> @@ -2214,6 +1702,7 @@ static void pc_cpu_unplug_cb(HotplugHandler *hotplug_dev,
>>      Error *local_err = NULL;
>>      X86CPU *cpu = X86_CPU(dev);
>>      PCMachineState *pcms = PC_MACHINE(hotplug_dev);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>  
>>      hotplug_handler_unplug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err);
>>      if (local_err) {
>> @@ -2225,10 +1714,10 @@ static void pc_cpu_unplug_cb(HotplugHandler *hotplug_dev,
>>      object_property_set_bool(OBJECT(dev), false, "realized", NULL);
>>  
>>      /* decrement the number of CPUs */
>> -    pcms->boot_cpus--;
>> +    x86ms->boot_cpus--;
>>      /* Update the number of CPUs in CMOS */
>> -    rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
>> -    fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
>> +    rtc_set_cpus_count(x86ms->rtc, x86ms->boot_cpus);
>> +    fw_cfg_modify_i16(x86ms->fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
>>   out:
>>      error_propagate(errp, local_err);
>>  }
>> @@ -2244,6 +1733,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>      CPUX86State *env = &cpu->env;
>>      MachineState *ms = MACHINE(hotplug_dev);
>>      PCMachineState *pcms = PC_MACHINE(hotplug_dev);
>> +    X86MachineState *x86ms = X86_MACHINE(hotplug_dev);
>>      unsigned int smp_cores = ms->smp.cores;
>>      unsigned int smp_threads = ms->smp.threads;
>>  
>> @@ -2253,7 +1743,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>          return;
>>      }
>>  
>> -    env->nr_dies = pcms->smp_dies;
>> +    env->nr_dies = x86ms->smp_dies;
>>  
>>      /*
>>       * If APIC ID is not set,
>> @@ -2261,13 +1751,13 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>       */
>>      if (cpu->apic_id == UNASSIGNED_APIC_ID) {
>>          int max_socket = (ms->smp.max_cpus - 1) /
>> -                                smp_threads / smp_cores / pcms->smp_dies;
>> +                                smp_threads / smp_cores / x86ms->smp_dies;
>>  
>>          /*
>>           * die-id was optional in QEMU 4.0 and older, so keep it optional
>>           * if there's only one die per socket.
>>           */
>> -        if (cpu->die_id < 0 && pcms->smp_dies == 1) {
>> +        if (cpu->die_id < 0 && x86ms->smp_dies == 1) {
>>              cpu->die_id = 0;
>>          }
>>  
>> @@ -2282,9 +1772,9 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>          if (cpu->die_id < 0) {
>>              error_setg(errp, "CPU die-id is not set");
>>              return;
>> -        } else if (cpu->die_id > pcms->smp_dies - 1) {
>> +        } else if (cpu->die_id > x86ms->smp_dies - 1) {
>>              error_setg(errp, "Invalid CPU die-id: %u must be in range 0:%u",
>> -                       cpu->die_id, pcms->smp_dies - 1);
>> +                       cpu->die_id, x86ms->smp_dies - 1);
>>              return;
>>          }
>>          if (cpu->core_id < 0) {
>> @@ -2308,7 +1798,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>          topo.die_id = cpu->die_id;
>>          topo.core_id = cpu->core_id;
>>          topo.smt_id = cpu->thread_id;
>> -        cpu->apic_id = apicid_from_topo_ids(pcms->smp_dies, smp_cores,
>> +        cpu->apic_id = apicid_from_topo_ids(x86ms->smp_dies, smp_cores,
>>                                              smp_threads, &topo);
>>      }
>>  
>> @@ -2316,7 +1806,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>      if (!cpu_slot) {
>>          MachineState *ms = MACHINE(pcms);
>>  
>> -        x86_topo_ids_from_apicid(cpu->apic_id, pcms->smp_dies,
>> +        x86_topo_ids_from_apicid(cpu->apic_id, x86ms->smp_dies,
>>                                   smp_cores, smp_threads, &topo);
>>          error_setg(errp,
>>              "Invalid CPU [socket: %u, die: %u, core: %u, thread: %u] with"
>> @@ -2338,7 +1828,7 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>>      /* TODO: move socket_id/core_id/thread_id checks into x86_cpu_realizefn()
>>       * once -smp refactoring is complete and there will be CPU private
>>       * CPUState::nr_cores and CPUState::nr_threads fields instead of globals */
>> -    x86_topo_ids_from_apicid(cpu->apic_id, pcms->smp_dies,
>> +    x86_topo_ids_from_apicid(cpu->apic_id, x86ms->smp_dies,
>>                               smp_cores, smp_threads, &topo);
>>      if (cpu->socket_id != -1 && cpu->socket_id != topo.pkg_id) {
>>          error_setg(errp, "property socket-id: %u doesn't match set apic-id:"
>> @@ -2520,45 +2010,6 @@ pc_machine_get_device_memory_region_size(Object *obj, Visitor *v,
>>      visit_type_int(v, name, &value, errp);
>>  }
>>  
>> -static void pc_machine_get_max_ram_below_4g(Object *obj, Visitor *v,
>> -                                            const char *name, void *opaque,
>> -                                            Error **errp)
>> -{
>> -    PCMachineState *pcms = PC_MACHINE(obj);
>> -    uint64_t value = pcms->max_ram_below_4g;
>> -
>> -    visit_type_size(v, name, &value, errp);
>> -}
>> -
>> -static void pc_machine_set_max_ram_below_4g(Object *obj, Visitor *v,
>> -                                            const char *name, void *opaque,
>> -                                            Error **errp)
>> -{
>> -    PCMachineState *pcms = PC_MACHINE(obj);
>> -    Error *error = NULL;
>> -    uint64_t value;
>> -
>> -    visit_type_size(v, name, &value, &error);
>> -    if (error) {
>> -        error_propagate(errp, error);
>> -        return;
>> -    }
>> -    if (value > 4 * GiB) {
>> -        error_setg(&error,
>> -                   "Machine option 'max-ram-below-4g=%"PRIu64
>> -                   "' expects size less than or equal to 4G", value);
>> -        error_propagate(errp, error);
>> -        return;
>> -    }
>> -
>> -    if (value < 1 * MiB) {
>> -        warn_report("Only %" PRIu64 " bytes of RAM below the 4GiB boundary,"
>> -                    "BIOS may not work with less than 1MiB", value);
>> -    }
>> -
>> -    pcms->max_ram_below_4g = value;
>> -}
>> -
>>  static void pc_machine_get_vmport(Object *obj, Visitor *v, const char *name,
>>                                    void *opaque, Error **errp)
>>  {
>> @@ -2664,7 +2115,6 @@ static void pc_machine_initfn(Object *obj)
>>  {
>>      PCMachineState *pcms = PC_MACHINE(obj);
>>  
>> -    pcms->max_ram_below_4g = 0; /* use default */
>>      pcms->smm = ON_OFF_AUTO_AUTO;
>>  #ifdef CONFIG_VMPORT
>>      pcms->vmport = ON_OFF_AUTO_AUTO;
>> @@ -2676,7 +2126,6 @@ static void pc_machine_initfn(Object *obj)
>>      pcms->smbus_enabled = true;
>>      pcms->sata_enabled = true;
>>      pcms->pit_enabled = true;
>> -    pcms->smp_dies = 1;
>>  
>>      pc_system_flash_create(pcms);
>>  }
>> @@ -2707,85 +2156,6 @@ static void pc_machine_wakeup(MachineState *machine)
>>      cpu_synchronize_all_post_reset();
>>  }
>>  
>> -static CpuInstanceProperties
>> -pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>> -{
>> -    MachineClass *mc = MACHINE_GET_CLASS(ms);
>> -    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
>> -
>> -    assert(cpu_index < possible_cpus->len);
>> -    return possible_cpus->cpus[cpu_index].props;
>> -}
>> -
>> -static int64_t pc_get_default_cpu_node_id(const MachineState *ms, int idx)
>> -{
>> -   X86CPUTopoInfo topo;
>> -   PCMachineState *pcms = PC_MACHINE(ms);
>> -
>> -   assert(idx < ms->possible_cpus->len);
>> -   x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id,
>> -                            pcms->smp_dies, ms->smp.cores,
>> -                            ms->smp.threads, &topo);
>> -   return topo.pkg_id % ms->numa_state->num_nodes;
>> -}
>> -
>> -static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
>> -{
>> -    PCMachineState *pcms = PC_MACHINE(ms);
>> -    int i;
>> -    unsigned int max_cpus = ms->smp.max_cpus;
>> -
>> -    if (ms->possible_cpus) {
>> -        /*
>> -         * make sure that max_cpus hasn't changed since the first use, i.e.
>> -         * -smp hasn't been parsed after it
>> -        */
>> -        assert(ms->possible_cpus->len == max_cpus);
>> -        return ms->possible_cpus;
>> -    }
>> -
>> -    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
>> -                                  sizeof(CPUArchId) * max_cpus);
>> -    ms->possible_cpus->len = max_cpus;
>> -    for (i = 0; i < ms->possible_cpus->len; i++) {
>> -        X86CPUTopoInfo topo;
>> -
>> -        ms->possible_cpus->cpus[i].type = ms->cpu_type;
>> -        ms->possible_cpus->cpus[i].vcpus_count = 1;
>> -        ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(pcms, i);
>> -        x86_topo_ids_from_apicid(ms->possible_cpus->cpus[i].arch_id,
>> -                                 pcms->smp_dies, ms->smp.cores,
>> -                                 ms->smp.threads, &topo);
>> -        ms->possible_cpus->cpus[i].props.has_socket_id = true;
>> -        ms->possible_cpus->cpus[i].props.socket_id = topo.pkg_id;
>> -        if (pcms->smp_dies > 1) {
>> -            ms->possible_cpus->cpus[i].props.has_die_id = true;
>> -            ms->possible_cpus->cpus[i].props.die_id = topo.die_id;
>> -        }
>> -        ms->possible_cpus->cpus[i].props.has_core_id = true;
>> -        ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
>> -        ms->possible_cpus->cpus[i].props.has_thread_id = true;
>> -        ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
>> -    }
>> -    return ms->possible_cpus;
>> -}
>> -
>> -static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
>> -{
>> -    /* cpu index isn't used */
>> -    CPUState *cs;
>> -
>> -    CPU_FOREACH(cs) {
>> -        X86CPU *cpu = X86_CPU(cs);
>> -
>> -        if (!cpu->apic_state) {
>> -            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
>> -        } else {
>> -            apic_deliver_nmi(cpu->apic_state);
>> -        }
>> -    }
>> -}
>> -
>>  static void pc_machine_class_init(ObjectClass *oc, void *data)
>>  {
>>      MachineClass *mc = MACHINE_CLASS(oc);
>> @@ -2810,14 +2180,11 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
>>      pcmc->pvh_enabled = true;
>>      assert(!mc->get_hotplug_handler);
>>      mc->get_hotplug_handler = pc_get_hotplug_handler;
>> -    mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
>> -    mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
>> -    mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
>>      mc->auto_enable_numa_with_memhp = true;
>>      mc->has_hotpluggable_cpus = true;
>>      mc->default_boot_order = "cad";
>> -    mc->hot_add_cpu = pc_hot_add_cpu;
>> -    mc->smp_parse = pc_smp_parse;
>> +    mc->hot_add_cpu = x86_hot_add_cpu;
>> +    mc->smp_parse = x86_smp_parse;
>>      mc->block_default_type = IF_IDE;
>>      mc->max_cpus = 255;
>>      mc->reset = pc_machine_reset;
>> @@ -2835,13 +2202,6 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
>>          pc_machine_get_device_memory_region_size, NULL,
>>          NULL, NULL, &error_abort);
>>  
>> -    object_class_property_add(oc, PC_MACHINE_MAX_RAM_BELOW_4G, "size",
>> -        pc_machine_get_max_ram_below_4g, pc_machine_set_max_ram_below_4g,
>> -        NULL, NULL, &error_abort);
>> -
>> -    object_class_property_set_description(oc, PC_MACHINE_MAX_RAM_BELOW_4G,
>> -        "Maximum ram below the 4G boundary (32bit boundary)", &error_abort);
>> -
>>      object_class_property_add(oc, PC_MACHINE_SMM, "OnOffAuto",
>>          pc_machine_get_smm, pc_machine_set_smm,
>>          NULL, NULL, &error_abort);
>> @@ -2866,7 +2226,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
>>  
>>  static const TypeInfo pc_machine_info = {
>>      .name = TYPE_PC_MACHINE,
>> -    .parent = TYPE_MACHINE,
>> +    .parent = TYPE_X86_MACHINE,
>>      .abstract = true,
>>      .instance_size = sizeof(PCMachineState),
>>      .instance_init = pc_machine_initfn,
>> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
>> index 2362675149..f63c27bc74 100644
>> --- a/hw/i386/pc_piix.c
>> +++ b/hw/i386/pc_piix.c
>> @@ -27,6 +27,7 @@
>>  
>>  #include "qemu/units.h"
>>  #include "hw/loader.h"
>> +#include "hw/i386/x86.h"
>>  #include "hw/i386/pc.h"
>>  #include "hw/i386/apic.h"
>>  #include "hw/display/ramfb.h"
>> @@ -73,6 +74,7 @@ static void pc_init1(MachineState *machine,
>>  {
>>      PCMachineState *pcms = PC_MACHINE(machine);
>>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>      MemoryRegion *system_memory = get_system_memory();
>>      MemoryRegion *system_io = get_system_io();
>>      int i;
>> @@ -125,11 +127,11 @@ static void pc_init1(MachineState *machine,
>>      if (xen_enabled()) {
>>          xen_hvm_init(pcms, &ram_memory);
>>      } else {
>> -        if (!pcms->max_ram_below_4g) {
>> -            pcms->max_ram_below_4g = 0xe0000000; /* default: 3.5G */
>> +        if (!x86ms->max_ram_below_4g) {
>> +            x86ms->max_ram_below_4g = 0xe0000000; /* default: 3.5G */
>>          }
>> -        lowmem = pcms->max_ram_below_4g;
>> -        if (machine->ram_size >= pcms->max_ram_below_4g) {
>> +        lowmem = x86ms->max_ram_below_4g;
>> +        if (machine->ram_size >= x86ms->max_ram_below_4g) {
>>              if (pcmc->gigabyte_align) {
>>                  if (lowmem > 0xc0000000) {
>>                      lowmem = 0xc0000000;
>> @@ -138,21 +140,21 @@ static void pc_init1(MachineState *machine,
>>                      warn_report("Large machine and max_ram_below_4g "
>>                                  "(%" PRIu64 ") not a multiple of 1G; "
>>                                  "possible bad performance.",
>> -                                pcms->max_ram_below_4g);
>> +                                x86ms->max_ram_below_4g);
>>                  }
>>              }
>>          }
>>  
>>          if (machine->ram_size >= lowmem) {
>> -            pcms->above_4g_mem_size = machine->ram_size - lowmem;
>> -            pcms->below_4g_mem_size = lowmem;
>> +            x86ms->above_4g_mem_size = machine->ram_size - lowmem;
>> +            x86ms->below_4g_mem_size = lowmem;
>>          } else {
>> -            pcms->above_4g_mem_size = 0;
>> -            pcms->below_4g_mem_size = machine->ram_size;
>> +            x86ms->above_4g_mem_size = 0;
>> +            x86ms->below_4g_mem_size = machine->ram_size;
>>          }
>>      }
>>  
>> -    pc_cpus_init(pcms);
>> +    x86_cpus_init(x86ms, pcmc->default_cpu_version);
>>  
>>      if (kvm_enabled() && pcmc->kvmclock_enabled) {
>>          kvmclock_create();
>> @@ -190,19 +192,19 @@ static void pc_init1(MachineState *machine,
>>      gsi_state = g_malloc0(sizeof(*gsi_state));
>>      if (kvm_ioapic_in_kernel()) {
>>          kvm_pc_setup_irq_routing(pcmc->pci_enabled);
>> -        pcms->gsi = qemu_allocate_irqs(kvm_pc_gsi_handler, gsi_state,
>> -                                       GSI_NUM_PINS);
>> +        x86ms->gsi = qemu_allocate_irqs(kvm_pc_gsi_handler, gsi_state,
>> +                                        GSI_NUM_PINS);
>>      } else {
>> -        pcms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
>> +        x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
>>      }
>>  
>>      if (pcmc->pci_enabled) {
>>          pci_bus = i440fx_init(host_type,
>>                                pci_type,
>> -                              &i440fx_state, &piix3_devfn, &isa_bus, pcms->gsi,
>> +                              &i440fx_state, &piix3_devfn, &isa_bus, x86ms->gsi,
>>                                system_memory, system_io, machine->ram_size,
>> -                              pcms->below_4g_mem_size,
>> -                              pcms->above_4g_mem_size,
>> +                              x86ms->below_4g_mem_size,
>> +                              x86ms->above_4g_mem_size,
>>                                pci_memory, ram_memory);
>>          pcms->bus = pci_bus;
>>      } else {
>> @@ -212,7 +214,7 @@ static void pc_init1(MachineState *machine,
>>                                &error_abort);
>>          no_hpet = 1;
>>      }
>> -    isa_bus_irqs(isa_bus, pcms->gsi);
>> +    isa_bus_irqs(isa_bus, x86ms->gsi);
>>  
>>      if (kvm_pic_in_kernel()) {
>>          i8259 = kvm_i8259_init(isa_bus);
>> @@ -230,7 +232,7 @@ static void pc_init1(MachineState *machine,
>>          ioapic_init_gsi(gsi_state, "i440fx");
>>      }
>>  
>> -    pc_register_ferr_irq(pcms->gsi[13]);
>> +    pc_register_ferr_irq(x86ms->gsi[13]);
>>  
>>      pc_vga_init(isa_bus, pcmc->pci_enabled ? pci_bus : NULL);
>>  
>> @@ -240,7 +242,7 @@ static void pc_init1(MachineState *machine,
>>      }
>>  
>>      /* init basic PC hardware */
>> -    pc_basic_device_init(isa_bus, pcms->gsi, &rtc_state, true,
>> +    pc_basic_device_init(isa_bus, x86ms->gsi, &rtc_state, true,
>>                           (pcms->vmport != ON_OFF_AUTO_ON), pcms->pit_enabled,
>>                           0x4);
>>  
>> @@ -288,7 +290,7 @@ else {
>>          smi_irq = qemu_allocate_irq(pc_acpi_smi_interrupt, first_cpu, 0);
>>          /* TODO: Populate SPD eeprom data.  */
>>          smbus = piix4_pm_init(pci_bus, piix3_devfn + 3, 0xb100,
>> -                              pcms->gsi[9], smi_irq,
>> +                              x86ms->gsi[9], smi_irq,
>>                                pc_machine_is_smm_enabled(pcms),
>>                                &piix4_pm);
>>          smbus_eeprom_init(smbus, 8, NULL, 0);
>> @@ -304,7 +306,7 @@ else {
>>  
>>      if (machine->nvdimms_state->is_enabled) {
>>          nvdimm_init_acpi_state(machine->nvdimms_state, system_io,
>> -                               pcms->fw_cfg, OBJECT(pcms));
>> +                               x86ms->fw_cfg, OBJECT(pcms));
>>      }
>>  }
>>  
>> @@ -728,7 +730,7 @@ DEFINE_I440FX_MACHINE(v1_4, "pc-i440fx-1.4", pc_compat_1_4_fn,
>>  
>>  static void pc_i440fx_1_3_machine_options(MachineClass *m)
>>  {
>> -    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
>> +    X86MachineClass *x86mc = X86_MACHINE_CLASS(m);
>>      static GlobalProperty compat[] = {
>>          PC_CPU_MODEL_IDS("1.3.0")
>>          { "usb-tablet", "usb_version", "1" },
>> @@ -739,7 +741,7 @@ static void pc_i440fx_1_3_machine_options(MachineClass *m)
>>  
>>      pc_i440fx_1_4_machine_options(m);
>>      m->hw_version = "1.3.0";
>> -    pcmc->compat_apic_id_mode = true;
>> +    x86mc->compat_apic_id_mode = true;
>>      compat_props_add(m->compat_props, compat, G_N_ELEMENTS(compat));
>>  }
>>  
>> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
>> index d4e8a1cb9f..71f71bc61d 100644
>> --- a/hw/i386/pc_q35.c
>> +++ b/hw/i386/pc_q35.c
>> @@ -41,6 +41,7 @@
>>  #include "hw/pci-host/q35.h"
>>  #include "hw/qdev-properties.h"
>>  #include "exec/address-spaces.h"
>> +#include "hw/i386/x86.h"
>>  #include "hw/i386/pc.h"
>>  #include "hw/i386/ich9.h"
>>  #include "hw/i386/amd_iommu.h"
>> @@ -115,6 +116,7 @@ static void pc_q35_init(MachineState *machine)
>>  {
>>      PCMachineState *pcms = PC_MACHINE(machine);
>>      PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
>>      Q35PCIHost *q35_host;
>>      PCIHostState *phb;
>>      PCIBus *host_bus;
>> @@ -152,34 +154,34 @@ static void pc_q35_init(MachineState *machine)
>>      /* Handle the machine opt max-ram-below-4g.  It is basically doing
>>       * min(qemu limit, user limit).
>>       */
>> -    if (!pcms->max_ram_below_4g) {
>> -        pcms->max_ram_below_4g = 1ULL << 32; /* default: 4G */;
>> +    if (!x86ms->max_ram_below_4g) {
>> +        x86ms->max_ram_below_4g = 1ULL << 32; /* default: 4G */;
>>      }
>> -    if (lowmem > pcms->max_ram_below_4g) {
>> -        lowmem = pcms->max_ram_below_4g;
>> +    if (lowmem > x86ms->max_ram_below_4g) {
>> +        lowmem = x86ms->max_ram_below_4g;
>>          if (machine->ram_size - lowmem > lowmem &&
>>              lowmem & (1 * GiB - 1)) {
>>              warn_report("There is possibly poor performance as the ram size "
>>                          " (0x%" PRIx64 ") is more then twice the size of"
>>                          " max-ram-below-4g (%"PRIu64") and"
>>                          " max-ram-below-4g is not a multiple of 1G.",
>> -                        (uint64_t)machine->ram_size, pcms->max_ram_below_4g);
>> +                        (uint64_t)machine->ram_size, x86ms->max_ram_below_4g);
>>          }
>>      }
>>  
>>      if (machine->ram_size >= lowmem) {
>> -        pcms->above_4g_mem_size = machine->ram_size - lowmem;
>> -        pcms->below_4g_mem_size = lowmem;
>> +        x86ms->above_4g_mem_size = machine->ram_size - lowmem;
>> +        x86ms->below_4g_mem_size = lowmem;
>>      } else {
>> -        pcms->above_4g_mem_size = 0;
>> -        pcms->below_4g_mem_size = machine->ram_size;
>> +        x86ms->above_4g_mem_size = 0;
>> +        x86ms->below_4g_mem_size = machine->ram_size;
>>      }
>>  
>>      if (xen_enabled()) {
>>          xen_hvm_init(pcms, &ram_memory);
>>      }
>>  
>> -    pc_cpus_init(pcms);
>> +    x86_cpus_init(x86ms, pcmc->default_cpu_version);
>>  
>>      kvmclock_create();
>>  
>> @@ -213,10 +215,10 @@ static void pc_q35_init(MachineState *machine)
>>      gsi_state = g_malloc0(sizeof(*gsi_state));
>>      if (kvm_ioapic_in_kernel()) {
>>          kvm_pc_setup_irq_routing(pcmc->pci_enabled);
>> -        pcms->gsi = qemu_allocate_irqs(kvm_pc_gsi_handler, gsi_state,
>> +        x86ms->gsi = qemu_allocate_irqs(kvm_pc_gsi_handler, gsi_state,
>>                                         GSI_NUM_PINS);
>>      } else {
>> -        pcms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
>> +        x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
>>      }
>>  
>>      /* create pci host bus */
>> @@ -231,9 +233,9 @@ static void pc_q35_init(MachineState *machine)
>>                               MCH_HOST_PROP_SYSTEM_MEM, NULL);
>>      object_property_set_link(OBJECT(q35_host), OBJECT(system_io),
>>                               MCH_HOST_PROP_IO_MEM, NULL);
>> -    object_property_set_int(OBJECT(q35_host), pcms->below_4g_mem_size,
>> +    object_property_set_int(OBJECT(q35_host), x86ms->below_4g_mem_size,
>>                              PCI_HOST_BELOW_4G_MEM_SIZE, NULL);
>> -    object_property_set_int(OBJECT(q35_host), pcms->above_4g_mem_size,
>> +    object_property_set_int(OBJECT(q35_host), x86ms->above_4g_mem_size,
>>                              PCI_HOST_ABOVE_4G_MEM_SIZE, NULL);
>>      /* pci */
>>      qdev_init_nofail(DEVICE(q35_host));
>> @@ -255,7 +257,7 @@ static void pc_q35_init(MachineState *machine)
>>      ich9_lpc = ICH9_LPC_DEVICE(lpc);
>>      lpc_dev = DEVICE(lpc);
>>      for (i = 0; i < GSI_NUM_PINS; i++) {
>> -        qdev_connect_gpio_out_named(lpc_dev, ICH9_GPIO_GSI, i, pcms->gsi[i]);
>> +        qdev_connect_gpio_out_named(lpc_dev, ICH9_GPIO_GSI, i, x86ms->gsi[i]);
>>      }
>>      pci_bus_irqs(host_bus, ich9_lpc_set_irq, ich9_lpc_map_irq, ich9_lpc,
>>                   ICH9_LPC_NB_PIRQS);
>> @@ -279,7 +281,7 @@ static void pc_q35_init(MachineState *machine)
>>          ioapic_init_gsi(gsi_state, "q35");
>>      }
>>  
>> -    pc_register_ferr_irq(pcms->gsi[13]);
>> +    pc_register_ferr_irq(x86ms->gsi[13]);
>>  
>>      assert(pcms->vmport != ON_OFF_AUTO__MAX);
>>      if (pcms->vmport == ON_OFF_AUTO_AUTO) {
>> @@ -287,7 +289,7 @@ static void pc_q35_init(MachineState *machine)
>>      }
>>  
>>      /* init basic PC hardware */
>> -    pc_basic_device_init(isa_bus, pcms->gsi, &rtc_state, !mc->no_floppy,
>> +    pc_basic_device_init(isa_bus, x86ms->gsi, &rtc_state, !mc->no_floppy,
>>                           (pcms->vmport != ON_OFF_AUTO_ON), pcms->pit_enabled,
>>                           0xff0104);
>>  
>> @@ -330,7 +332,7 @@ static void pc_q35_init(MachineState *machine)
>>  
>>      if (machine->nvdimms_state->is_enabled) {
>>          nvdimm_init_acpi_state(machine->nvdimms_state, system_io,
>> -                               pcms->fw_cfg, OBJECT(pcms));
>> +                               x86ms->fw_cfg, OBJECT(pcms));
>>      }
>>  }
>>  
>> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
>> index a9983f0bfb..97f38e0423 100644
>> --- a/hw/i386/pc_sysfw.c
>> +++ b/hw/i386/pc_sysfw.c
>> @@ -31,6 +31,7 @@
>>  #include "qemu/option.h"
>>  #include "qemu/units.h"
>>  #include "hw/sysbus.h"
>> +#include "hw/i386/x86.h"
>>  #include "hw/i386/pc.h"
>>  #include "hw/loader.h"
>>  #include "hw/qdev-properties.h"
>> @@ -38,8 +39,6 @@
>>  #include "hw/block/flash.h"
>>  #include "sysemu/kvm.h"
>>  
>> -#define BIOS_FILENAME "bios.bin"
>> -
>>  /*
>>   * We don't have a theoretically justifiable exact lower bound on the base
>>   * address of any flash mapping. In practice, the IO-APIC MMIO range is
>> @@ -211,59 +210,6 @@ static void pc_system_flash_map(PCMachineState *pcms,
>>      }
>>  }
>>  
>> -static void old_pc_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw)
>> -{
>> -    char *filename;
>> -    MemoryRegion *bios, *isa_bios;
>> -    int bios_size, isa_bios_size;
>> -    int ret;
>> -
>> -    /* BIOS load */
>> -    if (bios_name == NULL) {
>> -        bios_name = BIOS_FILENAME;
>> -    }
>> -    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
>> -    if (filename) {
>> -        bios_size = get_image_size(filename);
>> -    } else {
>> -        bios_size = -1;
>> -    }
>> -    if (bios_size <= 0 ||
>> -        (bios_size % 65536) != 0) {
>> -        goto bios_error;
>> -    }
>> -    bios = g_malloc(sizeof(*bios));
>> -    memory_region_init_ram(bios, NULL, "pc.bios", bios_size, &error_fatal);
>> -    if (!isapc_ram_fw) {
>> -        memory_region_set_readonly(bios, true);
>> -    }
>> -    ret = rom_add_file_fixed(bios_name, (uint32_t)(-bios_size), -1);
>> -    if (ret != 0) {
>> -    bios_error:
>> -        fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
>> -        exit(1);
>> -    }
>> -    g_free(filename);
>> -
>> -    /* map the last 128KB of the BIOS in ISA space */
>> -    isa_bios_size = MIN(bios_size, 128 * KiB);
>> -    isa_bios = g_malloc(sizeof(*isa_bios));
>> -    memory_region_init_alias(isa_bios, NULL, "isa-bios", bios,
>> -                             bios_size - isa_bios_size, isa_bios_size);
>> -    memory_region_add_subregion_overlap(rom_memory,
>> -                                        0x100000 - isa_bios_size,
>> -                                        isa_bios,
>> -                                        1);
>> -    if (!isapc_ram_fw) {
>> -        memory_region_set_readonly(isa_bios, true);
>> -    }
>> -
>> -    /* map all the bios at the top of memory */
>> -    memory_region_add_subregion(rom_memory,
>> -                                (uint32_t)(-bios_size),
>> -                                bios);
>> -}
>> -
>>  void pc_system_firmware_init(PCMachineState *pcms,
>>                               MemoryRegion *rom_memory)
>>  {
>> @@ -272,7 +218,7 @@ void pc_system_firmware_init(PCMachineState *pcms,
>>      BlockBackend *pflash_blk[ARRAY_SIZE(pcms->flash)];
>>  
>>      if (!pcmc->pci_enabled) {
>> -        old_pc_system_rom_init(rom_memory, true);
>> +        x86_system_rom_init(rom_memory, true);
>>          return;
>>      }
>>  
>> @@ -293,7 +239,7 @@ void pc_system_firmware_init(PCMachineState *pcms,
>>  
>>      if (!pflash_blk[0]) {
>>          /* Machine property pflash0 not set, use ROM mode */
>> -        old_pc_system_rom_init(rom_memory, false);
>> +        x86_system_rom_init(rom_memory, false);
>>      } else {
>>          if (kvm_enabled() && !kvm_readonly_mem_enabled()) {
>>              /*
>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>> new file mode 100644
>> index 0000000000..4de9dd100f
>> --- /dev/null
>> +++ b/hw/i386/x86.c
>> @@ -0,0 +1,788 @@
>> +/*
>> + * Copyright (c) 2003-2004 Fabrice Bellard
>> + * Copyright (c) 2019 Red Hat, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +#include "qemu/osdep.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/option.h"
>> +#include "qemu/cutils.h"
>> +#include "qemu/units.h"
>> +#include "qapi/error.h"
>> +#include "qapi/qmp/qerror.h"
>> +#include "qapi/qapi-visit-common.h"
>> +#include "qapi/visitor.h"
>> +#include "sysemu/qtest.h"
>> +#include "sysemu/numa.h"
>> +#include "sysemu/replay.h"
>> +#include "sysemu/sysemu.h"
>> +
>> +#include "hw/i386/x86.h"
>> +#include "target/i386/cpu.h"
>> +#include "hw/i386/topology.h"
>> +#include "hw/i386/fw_cfg.h"
>> +#include "hw/acpi/cpu_hotplug.h"
>> +#include "hw/nmi.h"
>> +#include "hw/loader.h"
>> +#include "multiboot.h"
>> +#include "pvh.h"
>> +#include "standard-headers/asm-x86/bootparam.h"
>> +
>> +#define BIOS_FILENAME "bios.bin"
>> +
>> +/* Calculates initial APIC ID for a specific CPU index
>> + *
>> + * Currently we need to be able to calculate the APIC ID from the CPU index
>> + * alone (without requiring a CPU object), as the QEMU<->Seabios interfaces have
>> + * no concept of "CPU index", and the NUMA tables on fw_cfg need the APIC ID of
>> + * all CPUs up to max_cpus.
>> + */
>> +uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
>> +                                    unsigned int cpu_index)
>> +{
>> +    MachineState *ms = MACHINE(x86ms);
>> +    X86MachineClass *x86mc = X86_MACHINE_GET_CLASS(x86ms);
>> +    uint32_t correct_id;
>> +    static bool warned;
>> +
>> +    correct_id = x86_apicid_from_cpu_idx(x86ms->smp_dies, ms->smp.cores,
>> +                                         ms->smp.threads, cpu_index);
>> +    if (x86mc->compat_apic_id_mode) {
>> +        if (cpu_index != correct_id && !warned && !qtest_enabled()) {
>> +            error_report("APIC IDs set in compatibility mode, "
>> +                         "CPU topology won't match the configuration");
>> +            warned = true;
>> +        }
>> +        return cpu_index;
>> +    } else {
>> +        return correct_id;
>> +    }
>> +}
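The compatibility-mode warning above fires only when a CPU index and its "correct" APIC ID differ. Here is a minimal sketch of why that happens. It assumes the field layout used by hw/i386/topology.h: the thread, core and die IDs each get just enough bits for their count, and the package ID sits above them. The function names are hypothetical, and the real helpers are not reproduced verbatim:

```c
#include <stdint.h>

/* Bits needed to hold IDs 0..count-1 (0 when count <= 1). */
static unsigned topo_width(unsigned count)
{
    unsigned width = 0;

    while ((1u << width) < count) {
        width++;
    }
    return width;
}

/* Hypothetical: split a linear CPU index into thread/core/die/package IDs
 * and pack them, lowest field first, into an APIC ID. */
static uint32_t sketch_apicid_from_cpu_idx(unsigned dies, unsigned cores,
                                           unsigned threads,
                                           unsigned cpu_index)
{
    unsigned smt_w = topo_width(threads);
    unsigned core_w = topo_width(cores);
    unsigned die_w = topo_width(dies);

    unsigned smt_id = cpu_index % threads;
    unsigned core_id = (cpu_index / threads) % cores;
    unsigned die_id = (cpu_index / (threads * cores)) % dies;
    unsigned pkg_id = cpu_index / (threads * cores * dies);

    return ((uint32_t)pkg_id << (die_w + core_w + smt_w)) |
           ((uint32_t)die_id << (core_w + smt_w)) |
           ((uint32_t)core_id << smt_w) |
           smt_id;
}
```

With 3 cores per socket, the core field needs 2 bits. CPU index 3 (package 1, core 0) therefore gets APIC ID 4, and the index and the ID diverge. With power-of-two counts, the two coincide.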
>> +
>> +
>> +static void x86_new_cpu(X86MachineState *x86ms, int64_t apic_id, Error **errp)
>> +{
>> +    Object *cpu = NULL;
>> +    Error *local_err = NULL;
>> +    CPUX86State *env = NULL;
>> +
>> +    cpu = object_new(MACHINE(x86ms)->cpu_type);
>> +
>> +    env = &X86_CPU(cpu)->env;
>> +    env->nr_dies = x86ms->smp_dies;
>> +
>> +    object_property_set_uint(cpu, apic_id, "apic-id", &local_err);
>> +    object_property_set_bool(cpu, true, "realized", &local_err);
>> +
>> +    object_unref(cpu);
>> +    error_propagate(errp, local_err);
>> +}
>> +
>> +/*
>> + * This function is very similar to smp_parse()
>> + * in hw/core/machine.c but includes CPU die support.
>> + */
>> +void x86_smp_parse(MachineState *ms, QemuOpts *opts)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(ms);
>> +
>> +    if (opts) {
>> +        unsigned cpus    = qemu_opt_get_number(opts, "cpus", 0);
>> +        unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
>> +        unsigned dies = qemu_opt_get_number(opts, "dies", 1);
>> +        unsigned cores   = qemu_opt_get_number(opts, "cores", 0);
>> +        unsigned threads = qemu_opt_get_number(opts, "threads", 0);
>> +
>> +        /* compute missing values, prefer sockets over cores over threads */
>> +        if (cpus == 0 || sockets == 0) {
>> +            cores = cores > 0 ? cores : 1;
>> +            threads = threads > 0 ? threads : 1;
>> +            if (cpus == 0) {
>> +                sockets = sockets > 0 ? sockets : 1;
>> +                cpus = cores * threads * dies * sockets;
>> +            } else {
>> +                ms->smp.max_cpus =
>> +                        qemu_opt_get_number(opts, "maxcpus", cpus);
>> +                sockets = ms->smp.max_cpus / (cores * threads * dies);
>> +            }
>> +        } else if (cores == 0) {
>> +            threads = threads > 0 ? threads : 1;
>> +            cores = cpus / (sockets * dies * threads);
>> +            cores = cores > 0 ? cores : 1;
>> +        } else if (threads == 0) {
>> +            threads = cpus / (cores * dies * sockets);
>> +            threads = threads > 0 ? threads : 1;
>> +        } else if (sockets * dies * cores * threads < cpus) {
>> +            error_report("cpu topology: "
>> +                         "sockets (%u) * dies (%u) * cores (%u) * threads (%u) < "
>> +                         "smp_cpus (%u)",
>> +                         sockets, dies, cores, threads, cpus);
>> +            exit(1);
>> +        }
>> +
>> +        ms->smp.max_cpus =
>> +                qemu_opt_get_number(opts, "maxcpus", cpus);
>> +
>> +        if (ms->smp.max_cpus < cpus) {
>> +            error_report("maxcpus must be equal to or greater than smp");
>> +            exit(1);
>> +        }
>> +
>> +        if (sockets * dies * cores * threads > ms->smp.max_cpus) {
>> +            error_report("cpu topology: "
>> +                         "sockets (%u) * dies (%u) * cores (%u) * threads (%u) > "
>> +                         "maxcpus (%u)",
>> +                         sockets, dies, cores, threads,
>> +                         ms->smp.max_cpus);
>> +            exit(1);
>> +        }
>> +
>> +        if (sockets * dies * cores * threads != ms->smp.max_cpus) {
>> +            warn_report("Invalid CPU topology deprecated: "
>> +                        "sockets (%u) * dies (%u) * cores (%u) * threads (%u) "
>> +                        "!= maxcpus (%u)",
>> +                        sockets, dies, cores, threads,
>> +                        ms->smp.max_cpus);
>> +        }
>> +
>> +        ms->smp.cpus = cpus;
>> +        ms->smp.cores = cores;
>> +        ms->smp.threads = threads;
>> +        x86ms->smp_dies = dies;
>> +    }
>> +
>> +    if (ms->smp.cpus > 1) {
>> +        Error *blocker = NULL;
>> +        error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp");
>> +        replay_add_blocker(blocker);
>> +    }
>> +}
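x86_smp_parse() fills in whatever -smp did not specify. The sketch below follows the same decision order, reduced to plain integers so it can be exercised in isolation. `struct smp_topo` and `sketch_smp_fill()` are hypothetical, and a 0 field means "not given". Unlike the code above, the sketch returns false instead of calling exit(1) and does not emit the deprecation warning:

```c
#include <stdbool.h>

/* Hypothetical container for the -smp values; 0 means "not given". */
struct smp_topo {
    unsigned cpus, sockets, dies, cores, threads, maxcpus;
};

/* Fill in missing values; returns false for an inconsistent topology. */
static bool sketch_smp_fill(struct smp_topo *t)
{
    if (t->dies == 0) {
        t->dies = 1;                    /* "dies" defaults to 1 */
    }
    if (t->cpus == 0 || t->sockets == 0) {
        t->cores = t->cores > 0 ? t->cores : 1;
        t->threads = t->threads > 0 ? t->threads : 1;
        if (t->cpus == 0) {
            t->sockets = t->sockets > 0 ? t->sockets : 1;
            t->cpus = t->cores * t->threads * t->dies * t->sockets;
        } else {
            unsigned max = t->maxcpus ? t->maxcpus : t->cpus;
            t->sockets = max / (t->cores * t->threads * t->dies);
        }
    } else if (t->cores == 0) {
        t->threads = t->threads > 0 ? t->threads : 1;
        t->cores = t->cpus / (t->sockets * t->dies * t->threads);
        t->cores = t->cores > 0 ? t->cores : 1;
    } else if (t->threads == 0) {
        t->threads = t->cpus / (t->cores * t->dies * t->sockets);
        t->threads = t->threads > 0 ? t->threads : 1;
    } else if (t->sockets * t->dies * t->cores * t->threads < t->cpus) {
        return false;                   /* not enough room for cpus */
    }

    if (t->maxcpus == 0) {
        t->maxcpus = t->cpus;
    }
    if (t->maxcpus < t->cpus) {
        return false;
    }
    /* a product != maxcpus only warns above; only "too big" is fatal */
    return t->sockets * t->dies * t->cores * t->threads <= t->maxcpus;
}
```

For example, `-smp 8,sockets=2,threads=2` resolves to 2 cores per socket. `-smp 4` with nothing else becomes 4 single-core sockets.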
>> +
>> +void x86_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(ms);
>> +    int64_t apic_id = x86_cpu_apic_id_from_index(x86ms, id);
>> +    Error *local_err = NULL;
>> +
>> +    if (id < 0) {
>> +        error_setg(errp, "Invalid CPU id: %" PRIi64, id);
>> +        return;
>> +    }
>> +
>> +    if (apic_id >= ACPI_CPU_HOTPLUG_ID_LIMIT) {
>> +        error_setg(errp, "Unable to add CPU: %" PRIi64
>> +                   ", resulting APIC ID (%" PRIi64 ") is too large",
>> +                   id, apic_id);
>> +        return;
>> +    }
>> +
>> +    x86_new_cpu(X86_MACHINE(ms), apic_id, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +}
>> +
>> +void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
>> +{
>> +    int i;
>> +    const CPUArchIdList *possible_cpus;
>> +    MachineState *ms = MACHINE(x86ms);
>> +    MachineClass *mc = MACHINE_GET_CLASS(x86ms);
>> +
>> +    x86_cpu_set_default_version(default_cpu_version);
>> +
>> +    /* Calculates the limit to CPU APIC ID values
>> +     *
>> +     * Limit for the APIC ID value, so that all
>> +     * CPU APIC IDs are < x86ms->apic_id_limit.
>> +     *
>> +     * This is used for FW_CFG_MAX_CPUS. See comments on bochs_bios_init().
>> +     */
>> +    x86ms->apic_id_limit = x86_cpu_apic_id_from_index(x86ms,
>> +                                                      ms->smp.max_cpus - 1) + 1;
>> +    possible_cpus = mc->possible_cpu_arch_ids(ms);
>> +    for (i = 0; i < ms->smp.cpus; i++) {
>> +        x86_new_cpu(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal);
>> +    }
>> +}
>> +
>> +void x86_nmi(NMIState *n, int cpu_index, Error **errp)
>> +{
>> +    /* cpu index isn't used */
>> +    CPUState *cs;
>> +
>> +    CPU_FOREACH(cs) {
>> +        X86CPU *cpu = X86_CPU(cs);
>> +
>> +        if (!cpu->apic_state) {
>> +            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
>> +        } else {
>> +            apic_deliver_nmi(cpu->apic_state);
>> +        }
>> +    }
>> +}
>> +
>> +CpuInstanceProperties
>> +x86_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>> +{
>> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
>> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
>> +
>> +    assert(cpu_index < possible_cpus->len);
>> +    return possible_cpus->cpus[cpu_index].props;
>> +}
>> +
>> +int64_t x86_get_default_cpu_node_id(const MachineState *ms, int idx)
>> +{
>> +    X86CPUTopoInfo topo;
>> +    X86MachineState *x86ms = X86_MACHINE(ms);
>> +
>> +    assert(idx < ms->possible_cpus->len);
>> +    x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id,
>> +                             x86ms->smp_dies, ms->smp.cores,
>> +                             ms->smp.threads, &topo);
>> +    return topo.pkg_id % ms->numa_state->num_nodes;
>> +}
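x86_get_default_cpu_node_id() decodes an APIC ID back into topology IDs and assigns whole packages (sockets) to NUMA nodes round-robin. The sketch below is a simplified version of that decode. It assumes the same packed layout as topology.h, with the package ID in the bits above the thread, core and die fields. The names are hypothetical:

```c
#include <stdint.h>

/* Bits needed to hold IDs 0..count-1 (0 when count <= 1). */
static unsigned topo_width(unsigned count)
{
    unsigned width = 0;

    while ((1u << width) < count) {
        width++;
    }
    return width;
}

/* Hypothetical inverse of the packing: drop the thread, core and die
 * fields to recover the package (socket) ID. */
static unsigned sketch_pkg_id_from_apicid(uint32_t apic_id, unsigned dies,
                                          unsigned cores, unsigned threads)
{
    unsigned shift = topo_width(threads) + topo_width(cores) +
                     topo_width(dies);

    return apic_id >> shift;
}

/* Default NUMA node: whole packages are spread round-robin over nodes. */
static int64_t sketch_default_node_id(uint32_t apic_id, unsigned dies,
                                      unsigned cores, unsigned threads,
                                      unsigned num_nodes)
{
    return sketch_pkg_id_from_apicid(apic_id, dies, cores, threads) %
           num_nodes;
}
```

All CPUs of one package land on the same node, so a socket is never split across nodes by default.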
>> +
>> +const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(ms);
>> +    int i;
>> +    unsigned int max_cpus = ms->smp.max_cpus;
>> +
>> +    if (ms->possible_cpus) {
>> +        /*
>> +         * make sure that max_cpus hasn't changed since the first use, i.e.
>> +         * -smp hasn't been parsed after it
>> +         */
>> +        assert(ms->possible_cpus->len == max_cpus);
>> +        return ms->possible_cpus;
>> +    }
>> +
>> +    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
>> +                                  sizeof(CPUArchId) * max_cpus);
>> +    ms->possible_cpus->len = max_cpus;
>> +    for (i = 0; i < ms->possible_cpus->len; i++) {
>> +        X86CPUTopoInfo topo;
>> +
>> +        ms->possible_cpus->cpus[i].type = ms->cpu_type;
>> +        ms->possible_cpus->cpus[i].vcpus_count = 1;
>> +        ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(x86ms, i);
>> +        x86_topo_ids_from_apicid(ms->possible_cpus->cpus[i].arch_id,
>> +                                 x86ms->smp_dies, ms->smp.cores,
>> +                                 ms->smp.threads, &topo);
>> +        ms->possible_cpus->cpus[i].props.has_socket_id = true;
>> +        ms->possible_cpus->cpus[i].props.socket_id = topo.pkg_id;
>> +        if (x86ms->smp_dies > 1) {
>> +            ms->possible_cpus->cpus[i].props.has_die_id = true;
>> +            ms->possible_cpus->cpus[i].props.die_id = topo.die_id;
>> +        }
>> +        ms->possible_cpus->cpus[i].props.has_core_id = true;
>> +        ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
>> +        ms->possible_cpus->cpus[i].props.has_thread_id = true;
>> +        ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
>> +    }
>> +    return ms->possible_cpus;
>> +}
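x86_possible_cpu_arch_ids() sizes a single allocation for the list header plus max_cpus trailing entries. Below is a standalone sketch of that flexible-array pattern, using calloc() in place of g_malloc0(). SketchArchId and SketchArchIdList are trimmed-down, hypothetical stand-ins for CPUArchId and CPUArchIdList:

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t arch_id;
    int vcpus_count;
} SketchArchId;

typedef struct {
    int len;
    SketchArchId cpus[];            /* flexible array member */
} SketchArchIdList;

static SketchArchIdList *sketch_alloc_arch_id_list(unsigned max_cpus)
{
    /* one zeroed block: list header followed by max_cpus entries */
    SketchArchIdList *list = calloc(1, sizeof(*list) +
                                       sizeof(SketchArchId) * max_cpus);

    if (!list) {
        return NULL;
    }
    list->len = (int)max_cpus;
    for (unsigned i = 0; i < max_cpus; i++) {
        list->cpus[i].vcpus_count = 1;
        list->cpus[i].arch_id = i;  /* placeholder; QEMU stores the APIC ID */
    }
    return list;
}
```

A single free() releases the header and all the entries together.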
>> +
>> +void x86_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw)
>> +{
>> +    char *filename;
>> +    MemoryRegion *bios, *isa_bios;
>> +    int bios_size, isa_bios_size;
>> +    int ret;
>> +
>> +    /* BIOS load */
>> +    if (bios_name == NULL) {
>> +        bios_name = BIOS_FILENAME;
>> +    }
>> +    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
>> +    if (filename) {
>> +        bios_size = get_image_size(filename);
>> +    } else {
>> +        bios_size = -1;
>> +    }
>> +    if (bios_size <= 0 ||
>> +        (bios_size % 65536) != 0) {
>> +        goto bios_error;
>> +    }
>> +    bios = g_malloc(sizeof(*bios));
>> +    memory_region_init_ram(bios, NULL, "pc.bios", bios_size, &error_fatal);
>> +    if (!isapc_ram_fw) {
>> +        memory_region_set_readonly(bios, true);
>> +    }
>> +    ret = rom_add_file_fixed(bios_name, (uint32_t)(-bios_size), -1);
>> +    if (ret != 0) {
>> +    bios_error:
>> +        fprintf(stderr, "qemu: could not load PC BIOS '%s'\n", bios_name);
>> +        exit(1);
>> +    }
>> +    g_free(filename);
>> +
>> +    /* map the last 128KB of the BIOS in ISA space */
>> +    isa_bios_size = MIN(bios_size, 128 * KiB);
>> +    isa_bios = g_malloc(sizeof(*isa_bios));
>> +    memory_region_init_alias(isa_bios, NULL, "isa-bios", bios,
>> +                             bios_size - isa_bios_size, isa_bios_size);
>> +    memory_region_add_subregion_overlap(rom_memory,
>> +                                        0x100000 - isa_bios_size,
>> +                                        isa_bios,
>> +                                        1);
>> +    if (!isapc_ram_fw) {
>> +        memory_region_set_readonly(isa_bios, true);
>> +    }
>> +
>> +    /* map all the bios at the top of memory */
>> +    memory_region_add_subregion(rom_memory,
>> +                                (uint32_t)(-bios_size),
>> +                                bios);
>> +}
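x86_system_rom_init() maps the whole image so that it ends at the 4 GiB boundary. It also aliases the last (up to) 128 KiB of the image just below 1 MiB, where real-mode code expects the BIOS. The sketch below shows that address arithmetic with hypothetical names. It only computes the layout and does not touch any MemoryRegion API:

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_bios_layout {
    uint32_t top_base;      /* guest address of the full image */
    uint32_t isa_base;      /* guest address of the ISA alias */
    uint32_t isa_size;      /* size of the alias, at most 128 KiB */
    uint32_t alias_offset;  /* offset inside the image shown by the alias */
};

/* Same acceptance test as above: non-empty and a multiple of 64 KiB. */
static bool sketch_bios_size_ok(int64_t bios_size)
{
    return bios_size > 0 && bios_size % 65536 == 0;
}

static struct sketch_bios_layout sketch_compute_bios_layout(uint32_t bios_size)
{
    struct sketch_bios_layout l;

    l.isa_size = bios_size < 128 * 1024 ? bios_size : 128 * 1024;
    l.top_base = (uint32_t)(-bios_size);      /* 4 GiB - bios_size */
    l.isa_base = 0x100000 - l.isa_size;       /* alias ends at 1 MiB */
    l.alias_offset = bios_size - l.isa_size;  /* last isa_size bytes */
    return l;
}
```

For a 256 KiB image, this gives a top mapping at 0xfffc0000 and an ISA alias at 0xe0000 that shows image offset 0x20000.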
>> +
>> +static long get_file_size(FILE *f)
>> +{
>> +    long where, size;
>> +
>> +    /* XXX: on Unix systems, using fstat() probably makes more sense */
>> +
>> +    where = ftell(f);
>> +    fseek(f, 0, SEEK_END);
>> +    size = ftell(f);
>> +    fseek(f, where, SEEK_SET);
>> +
>> +    return size;
>> +}
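The XXX note in get_file_size() suggests fstat(). Below is a POSIX-only sketch of that alternative, under a hypothetical name. fstat() reports the size of the underlying file, so data still sitting in a stdio write buffer is not counted. That does not matter for a kernel image opened read-only:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Returns the size of the file behind f, or -1 on error (POSIX only). */
static long sketch_get_file_size_fstat(FILE *f)
{
    struct stat st;

    if (fstat(fileno(f), &st) < 0) {
        return -1;
    }
    return (long)st.st_size;
}
```

Unlike the ftell()/fseek() version, this leaves the stream position untouched and works on files larger than the range of `long` seeks on 32-bit hosts only as far as `st_size` fits in `long`.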
>> +
>> +struct setup_data {
>> +    uint64_t next;
>> +    uint32_t type;
>> +    uint32_t len;
>> +    uint8_t data[0];
>> +} __attribute__((packed));
>> +
>> +void load_linux(X86MachineState *x86ms,
>> +                FWCfgState *fw_cfg,
>> +                unsigned acpi_data_size,
>> +                bool linuxboot_dma_enabled,
>> +                bool pvh_enabled)
>> +{
>> +    uint16_t protocol;
>> +    int setup_size, kernel_size, cmdline_size;
>> +    int dtb_size, setup_data_offset;
>> +    uint32_t initrd_max;
>> +    uint8_t header[8192], *setup, *kernel;
>> +    hwaddr real_addr, prot_addr, cmdline_addr, initrd_addr = 0;
>> +    FILE *f;
>> +    char *vmode;
>> +    MachineState *machine = MACHINE(x86ms);
>> +    struct setup_data *setup_data;
>> +    const char *kernel_filename = machine->kernel_filename;
>> +    const char *initrd_filename = machine->initrd_filename;
>> +    const char *dtb_filename = machine->dtb;
>> +    const char *kernel_cmdline = machine->kernel_cmdline;
>> +
>> +    /* Align to 16 bytes as a paranoia measure */
>> +    cmdline_size = (strlen(kernel_cmdline)+16) & ~15;
>> +
>> +    /* load the kernel header */
>> +    f = fopen(kernel_filename, "rb");
>> +    if (!f || !(kernel_size = get_file_size(f)) ||
>> +        fread(header, 1, MIN(ARRAY_SIZE(header), kernel_size), f) !=
>> +        MIN(ARRAY_SIZE(header), kernel_size)) {
>> +        fprintf(stderr, "qemu: could not load kernel '%s': %s\n",
>> +                kernel_filename, strerror(errno));
>> +        exit(1);
>> +    }
>> +
>> +    /* kernel protocol version */
>> +#if 0
>> +    fprintf(stderr, "header magic: %#x\n", ldl_p(header+0x202));
>> +#endif
>> +    if (ldl_p(header+0x202) == 0x53726448) {
>> +        protocol = lduw_p(header+0x206);
>> +    } else {
>> +        size_t pvh_start_addr;
>> +        uint32_t mh_load_addr = 0;
>> +        uint32_t elf_kernel_size = 0;
>> +        /*
>> +         * This could be a multiboot kernel. If it is, let's stop treating it
>> +         * like a Linux kernel.
>> +         * Note: some multiboot images could be in the ELF format (the same as
>> +         * PVH), so we try multiboot first since we check the multiboot magic
>> +         * header before loading it.
>> +         */
>> +        if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
>> +                           kernel_cmdline, kernel_size, header)) {
>> +            return;
>> +        }
>> +        /*
>> +         * Check if the file is an uncompressed kernel file (ELF) and load it,
>> +         * saving the PVH entry point used by the x86/HVM direct boot ABI.
>> +         * If load_elfboot() is successful, populate the fw_cfg info.
>> +         */
>> +        if (pvh_enabled &&
>> +            pvh_load_elfboot(kernel_filename,
>> +                             &mh_load_addr, &elf_kernel_size)) {
>> +            fclose(f);
>> +
>> +            pvh_start_addr = pvh_get_start_addr();
>> +
>> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
>> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
>> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
>> +
>> +            fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
>> +                strlen(kernel_cmdline) + 1);
>> +            fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
>> +
>> +            fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
>> +            fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
>> +                             header, sizeof(header));
>> +
>> +            /* load initrd */
>> +            if (initrd_filename) {
>> +                GMappedFile *mapped_file;
>> +                gsize initrd_size;
>> +                gchar *initrd_data;
>> +                GError *gerr = NULL;
>> +
>> +                mapped_file = g_mapped_file_new(initrd_filename, false, &gerr);
>> +                if (!mapped_file) {
>> +                    fprintf(stderr, "qemu: error reading initrd %s: %s\n",
>> +                            initrd_filename, gerr->message);
>> +                    exit(1);
>> +                }
>> +                x86ms->initrd_mapped_file = mapped_file;
>> +
>> +                initrd_data = g_mapped_file_get_contents(mapped_file);
>> +                initrd_size = g_mapped_file_get_length(mapped_file);
>> +                initrd_max = x86ms->below_4g_mem_size - acpi_data_size - 1;
>> +                if (initrd_size >= initrd_max) {
>> +                    fprintf(stderr, "qemu: initrd is too large, cannot support."
>> +                            " (max: %"PRIu32", need %"PRIu64")\n",
>> +                            initrd_max, (uint64_t)initrd_size);
>> +                    exit(1);
>> +                }
>> +
>> +                initrd_addr = (initrd_max - initrd_size) & ~4095;
>> +
>> +                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
>> +                fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
>> +                fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data,
>> +                                 initrd_size);
>> +            }
>> +
>> +            option_rom[nb_option_roms].bootindex = 0;
>> +            option_rom[nb_option_roms].name = "pvh.bin";
>> +            nb_option_roms++;
>> +
>> +            return;
>> +        }
>> +        protocol = 0;
>> +    }
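The check above recognizes a Linux bzImage by the "HdrS" signature at offset 0x202 and reads the boot protocol version from offset 0x206. The sketch below performs the same test independently of host byte order. The names are hypothetical; QEMU itself uses ldl_p()/lduw_p() for the loads:

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian loads, independent of host byte order. */
static uint16_t sketch_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t sketch_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Boot protocol version, or 0 when the "HdrS" signature (0x53726448 read
 * little-endian) is missing, i.e. the multiboot/PVH fallback case. */
static uint16_t sketch_boot_protocol(const uint8_t *header, size_t len)
{
    if (len < 0x208 || sketch_le32(header + 0x202) != 0x53726448) {
        return 0;
    }
    return sketch_le16(header + 0x206);
}
```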
>> +
>> +    if (protocol < 0x200 || !(header[0x211] & 0x01)) {
>> +        /* Low kernel */
>> +        real_addr    = 0x90000;
>> +        cmdline_addr = 0x9a000 - cmdline_size;
>> +        prot_addr    = 0x10000;
>> +    } else if (protocol < 0x202) {
>> +        /* High but ancient kernel */
>> +        real_addr    = 0x90000;
>> +        cmdline_addr = 0x9a000 - cmdline_size;
>> +        prot_addr    = 0x100000;
>> +    } else {
>> +        /* High and recent kernel */
>> +        real_addr    = 0x10000;
>> +        cmdline_addr = 0x20000;
>> +        prot_addr    = 0x100000;
>> +    }
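The three branches above pick load addresses from the protocol version and bit 0 of loadflags (header byte 0x211, LOADED_HIGH). The sketch below restates them as a pure function. It also shows that the `(len + 16) & ~15` used for cmdline_size equals the string length plus its NUL terminator, rounded up to 16. The names are hypothetical:

```c
#include <stdint.h>

struct sketch_load_addrs {
    uint32_t real_addr;     /* real-mode setup code */
    uint32_t cmdline_addr;  /* kernel command line */
    uint32_t prot_addr;     /* protected-mode kernel */
};

/* strlen + NUL, rounded up to a multiple of 16 */
static uint32_t sketch_cmdline_size(uint32_t len)
{
    return (len + 16) & ~15u;
}

static struct sketch_load_addrs sketch_pick_addrs(uint16_t protocol,
                                                  uint8_t loadflags,
                                                  uint32_t cmdline_size)
{
    struct sketch_load_addrs a;

    if (protocol < 0x200 || !(loadflags & 0x01)) {
        /* low (zImage-style) kernel */
        a.real_addr = 0x90000;
        a.cmdline_addr = 0x9a000 - cmdline_size;
        a.prot_addr = 0x10000;
    } else if (protocol < 0x202) {
        /* loaded high, but too old for the cmd_line_ptr field */
        a.real_addr = 0x90000;
        a.cmdline_addr = 0x9a000 - cmdline_size;
        a.prot_addr = 0x100000;
    } else {
        /* loaded high, recent protocol */
        a.real_addr = 0x10000;
        a.cmdline_addr = 0x20000;
        a.prot_addr = 0x100000;
    }
    return a;
}
```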
>> +
>> +#if 0
>> +    fprintf(stderr,
>> +            "qemu: real_addr     = 0x" TARGET_FMT_plx "\n"
>> +            "qemu: cmdline_addr  = 0x" TARGET_FMT_plx "\n"
>> +            "qemu: prot_addr     = 0x" TARGET_FMT_plx "\n",
>> +            real_addr,
>> +            cmdline_addr,
>> +            prot_addr);
>> +#endif
>> +
>> +    /* highest address for loading the initrd */
>> +    if (protocol >= 0x20c &&
>> +        lduw_p(header+0x236) & XLF_CAN_BE_LOADED_ABOVE_4G) {
>> +        /*
>> +         * Linux has supported initrd up to 4 GB for a very long time (2007,
>> +         * long before XLF_CAN_BE_LOADED_ABOVE_4G which was added in 2013),
>> +         * though it only sets initrd_max to 2 GB to "work around bootloader
>> +         * bugs". Luckily, QEMU firmware (which acts like a bootloader)
>> +         * has supported this.
>> +         *
>> +         * It's believed that if XLF_CAN_BE_LOADED_ABOVE_4G is set, initrd can
>> +         * be loaded into any address.
>> +         *
>> +         * In addition, initrd_max is uint32_t simply because QEMU doesn't
>> +         * support the 64-bit boot protocol (specifically the ext_ramdisk_image
>> +         * field).
>> +         *
>> +         * Therefore here just limit initrd_max to UINT32_MAX simply as well.
>> +         */
>> +        initrd_max = UINT32_MAX;
>> +    } else if (protocol >= 0x203) {
>> +        initrd_max = ldl_p(header+0x22c);
>> +    } else {
>> +        initrd_max = 0x37ffffff;
>> +    }
>> +
>> +    if (initrd_max >= x86ms->below_4g_mem_size - acpi_data_size) {
>> +        initrd_max = x86ms->below_4g_mem_size - acpi_data_size - 1;
>> +    }
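initrd_max is the highest address the initrd may occupy. It is taken from the header when the kernel advertises it, then clamped below the ACPI data at the top of low RAM. The initrd itself is then placed page-aligned just under that limit. The sketch below covers both steps with hypothetical names. The XLF_CAN_BE_LOADED_ABOVE_4G bit value is assumed to match Linux's bootparam.h:

```c
#include <stdint.h>

/* Assumed to match XLF_CAN_BE_LOADED_ABOVE_4G in Linux's bootparam.h. */
#define SKETCH_XLF_CAN_BE_LOADED_ABOVE_4G (1u << 1)

static uint32_t sketch_initrd_max(uint16_t protocol, uint16_t xloadflags,
                                  uint32_t hdr_initrd_addr_max,
                                  uint64_t below_4g_mem_size,
                                  uint64_t acpi_data_size)
{
    uint32_t max;

    if (protocol >= 0x20c &&
        (xloadflags & SKETCH_XLF_CAN_BE_LOADED_ABOVE_4G)) {
        max = UINT32_MAX;                   /* any 32-bit address */
    } else if (protocol >= 0x203) {
        max = hdr_initrd_addr_max;          /* header field at 0x22c */
    } else {
        max = 0x37ffffff;                   /* pre-2.03 default */
    }
    /* keep the initrd below the ACPI data at the top of low RAM */
    if (max >= below_4g_mem_size - acpi_data_size) {
        max = (uint32_t)(below_4g_mem_size - acpi_data_size - 1);
    }
    return max;
}

/* Page-aligned load address that puts the initrd just under initrd_max. */
static uint32_t sketch_initrd_addr(uint32_t initrd_max, uint32_t initrd_size)
{
    return (initrd_max - initrd_size) & ~4095u;
}
```

With 2 GiB below 4G and 128 KiB of ACPI data, a modern kernel gets initrd_max = 0x7ffdffff. A 4 KiB initrd then lands at 0x7ffde000.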
>> +
>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline)+1);
>> +    fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
>> +
>> +    if (protocol >= 0x202) {
>> +        stl_p(header+0x228, cmdline_addr);
>> +    } else {
>> +        stw_p(header+0x20, 0xA33F);
>> +        stw_p(header+0x22, cmdline_addr-real_addr);
>> +    }
>> +
>> +    /* handle vga= parameter */
>> +    vmode = strstr(kernel_cmdline, "vga=");
>> +    if (vmode) {
>> +        unsigned int video_mode;
>> +        /* skip "vga=" */
>> +        vmode += 4;
>> +        if (!strncmp(vmode, "normal", 6)) {
>> +            video_mode = 0xffff;
>> +        } else if (!strncmp(vmode, "ext", 3)) {
>> +            video_mode = 0xfffe;
>> +        } else if (!strncmp(vmode, "ask", 3)) {
>> +            video_mode = 0xfffd;
>> +        } else {
>> +            video_mode = strtol(vmode, NULL, 0);
>> +        }
>> +        stw_p(header+0x1fa, video_mode);
>> +    }
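The vga= handling maps the three keywords to the special vid_mode values and otherwise parses a number in any base strtol() accepts. The sketch below uses a hypothetical name and returns -1 when the option is absent. Like the code above, it uses a plain strstr(), so it would also match inside a longer option such as "xvga=":

```c
#include <stdlib.h>
#include <string.h>

/* 16-bit vid_mode value for a "vga=" option, or -1 if there is none. */
static long sketch_vga_mode(const char *cmdline)
{
    const char *v = strstr(cmdline, "vga=");

    if (!v) {
        return -1;
    }
    v += 4;                                 /* skip "vga=" */
    if (!strncmp(v, "normal", 6)) {
        return 0xffff;
    } else if (!strncmp(v, "ext", 3)) {
        return 0xfffe;
    } else if (!strncmp(v, "ask", 3)) {
        return 0xfffd;
    }
    /* numeric mode; stw_p() above likewise keeps only 16 bits */
    return strtol(v, NULL, 0) & 0xffff;
}
```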
>> +
>> +    /* loader type */
>> +    /* High nybble = B reserved for QEMU; low nybble is revision number.
>> +       If this code is substantially changed, you may want to consider
>> +       incrementing the revision. */
>> +    if (protocol >= 0x200) {
>> +        header[0x210] = 0xB0;
>> +    }
>> +    /* heap */
>> +    if (protocol >= 0x201) {
>> +        header[0x211] |= 0x80;	/* CAN_USE_HEAP */
>> +        stw_p(header+0x224, cmdline_addr-real_addr-0x200);
>> +    }
>> +
>> +    /* load initrd */
>> +    if (initrd_filename) {
>> +        GMappedFile *mapped_file;
>> +        gsize initrd_size;
>> +        gchar *initrd_data;
>> +        GError *gerr = NULL;
>> +
>> +        if (protocol < 0x200) {
>> +            fprintf(stderr, "qemu: linux kernel too old to load a ram disk\n");
>> +            exit(1);
>> +        }
>> +
>> +        mapped_file = g_mapped_file_new(initrd_filename, false, &gerr);
>> +        if (!mapped_file) {
>> +            fprintf(stderr, "qemu: error reading initrd %s: %s\n",
>> +                    initrd_filename, gerr->message);
>> +            exit(1);
>> +        }
>> +        x86ms->initrd_mapped_file = mapped_file;
>> +
>> +        initrd_data = g_mapped_file_get_contents(mapped_file);
>> +        initrd_size = g_mapped_file_get_length(mapped_file);
>> +        if (initrd_size >= initrd_max) {
>> +            fprintf(stderr, "qemu: initrd is too large, cannot support."
>> +                    " (max: %"PRIu32", need %"PRIu64")\n",
>> +                    initrd_max, (uint64_t)initrd_size);
>> +            exit(1);
>> +        }
>> +
>> +        initrd_addr = (initrd_max-initrd_size) & ~4095;
>> +
>> +        fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
>> +        fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
>> +        fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data, initrd_size);
>> +
>> +        stl_p(header+0x218, initrd_addr);
>> +        stl_p(header+0x21c, initrd_size);
>> +    }
>> +
>> +    /* load kernel and setup */
>> +    setup_size = header[0x1f1];
>> +    if (setup_size == 0) {
>> +        setup_size = 4;
>> +    }
>> +    setup_size = (setup_size+1)*512;
>> +    if (setup_size > kernel_size) {
>> +        fprintf(stderr, "qemu: invalid kernel header\n");
>> +        exit(1);
>> +    }
>> +    kernel_size -= setup_size;
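setup_sects (header byte 0x1f1) counts the 512-byte sectors of real-mode setup code that follow the boot sector. A value of 0 is treated as the historical default of 4. The sketch below covers the size computation and the sanity check above, with hypothetical names:

```c
/* Bytes of real-mode setup: setup_sects sectors plus the boot sector. */
static long sketch_setup_size(unsigned char setup_sects)
{
    unsigned sects = setup_sects ? setup_sects : 4;  /* 0 means 4 */

    return (long)(sects + 1) * 512;
}

/* Protected-mode payload size, or -1 if the header claims more setup
 * than the file contains (the "invalid kernel header" case above). */
static long sketch_payload_size(long file_size, unsigned char setup_sects)
{
    long setup = sketch_setup_size(setup_sects);

    return setup > file_size ? -1 : file_size - setup;
}
```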
>> +
>> +    setup  = g_malloc(setup_size);
>> +    kernel = g_malloc(kernel_size);
>> +    fseek(f, 0, SEEK_SET);
>> +    if (fread(setup, 1, setup_size, f) != setup_size) {
>> +        fprintf(stderr, "fread() failed\n");
>> +        exit(1);
>> +    }
>> +    if (fread(kernel, 1, kernel_size, f) != kernel_size) {
>> +        fprintf(stderr, "fread() failed\n");
>> +        exit(1);
>> +    }
>> +    fclose(f);
>> +
>> +    /* append dtb to kernel */
>> +    if (dtb_filename) {
>> +        if (protocol < 0x209) {
>> +            fprintf(stderr, "qemu: Linux kernel too old to load a dtb\n");
>> +            exit(1);
>> +        }
>> +
>> +        dtb_size = get_image_size(dtb_filename);
>> +        if (dtb_size <= 0) {
>> +            fprintf(stderr, "qemu: error reading dtb %s: %s\n",
>> +                    dtb_filename, strerror(errno));
>> +            exit(1);
>> +        }
>> +
>> +        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
>> +        kernel_size = setup_data_offset + sizeof(struct setup_data) + dtb_size;
>> +        kernel = g_realloc(kernel, kernel_size);
>> +
>> +        stq_p(header+0x250, prot_addr + setup_data_offset);
>> +
>> +        setup_data = (struct setup_data *)(kernel + setup_data_offset);
>> +        setup_data->next = 0;
>> +        setup_data->type = cpu_to_le32(SETUP_DTB);
>> +        setup_data->len = cpu_to_le32(dtb_size);
>> +
>> +        load_image_size(dtb_filename, setup_data->data, dtb_size);
>> +    }
>> +
>> +    memcpy(setup, header, MIN(sizeof(header), setup_size));
>> +
>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size);
>> +    fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel, kernel_size);
>> +
>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_ADDR, real_addr);
>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, setup_size);
>> +    fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
>> +
>> +    option_rom[nb_option_roms].bootindex = 0;
>> +    option_rom[nb_option_roms].name = "linuxboot.bin";
>> +    if (linuxboot_dma_enabled && fw_cfg_dma_enabled(fw_cfg)) {
>> +        option_rom[nb_option_roms].name = "linuxboot_dma.bin";
>> +    }
>> +    nb_option_roms++;
>> +}
>> +
>> +static void x86_machine_get_max_ram_below_4g(Object *obj, Visitor *v,
>> +                                             const char *name, void *opaque,
>> +                                             Error **errp)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(obj);
>> +    uint64_t value = x86ms->max_ram_below_4g;
>> +
>> +    visit_type_size(v, name, &value, errp);
>> +}
>> +
>> +static void x86_machine_set_max_ram_below_4g(Object *obj, Visitor *v,
>> +                                             const char *name, void *opaque,
>> +                                             Error **errp)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(obj);
>> +    Error *error = NULL;
>> +    uint64_t value;
>> +
>> +    visit_type_size(v, name, &value, &error);
>> +    if (error) {
>> +        error_propagate(errp, error);
>> +        return;
>> +    }
>> +    if (value > 4 * GiB) {
>> +        error_setg(&error,
>> +                   "Machine option 'max-ram-below-4g=%"PRIu64
>> +                   "' expects size less than or equal to 4G", value);
>> +        error_propagate(errp, error);
>> +        return;
>> +    }
>> +
>> +    if (value < 1 * MiB) {
>> +        warn_report("Only %" PRIu64 " bytes of RAM below the 4GiB boundary,"
>> +                    " BIOS may not work with less than 1MiB", value);
>> +    }
>> +
>> +    x86ms->max_ram_below_4g = value;
>> +}
>> +
>> +static void x86_machine_initfn(Object *obj)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(obj);
>> +
>> +    x86ms->max_ram_below_4g = 0; /* use default */
>> +    x86ms->smp_dies = 1;
>> +}
>> +
>> +static void x86_machine_class_init(ObjectClass *oc, void *data)
>> +{
>> +    MachineClass *mc = MACHINE_CLASS(oc);
>> +
>> +    mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
>> +    mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
>> +    mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
>> +
>> +    object_class_property_add(oc, X86_MACHINE_MAX_RAM_BELOW_4G, "size",
>> +        x86_machine_get_max_ram_below_4g, x86_machine_set_max_ram_below_4g,
>> +        NULL, NULL, &error_abort);
>> +
>> +    object_class_property_set_description(oc, X86_MACHINE_MAX_RAM_BELOW_4G,
>> +        "Maximum ram below the 4G boundary (32bit boundary)", &error_abort);
>> +}
>> +
>> +static const TypeInfo x86_machine_info = {
>> +    .name = TYPE_X86_MACHINE,
>> +    .parent = TYPE_MACHINE,
>> +    .abstract = true,
>> +    .instance_size = sizeof(X86MachineState),
>> +    .instance_init = x86_machine_initfn,
>> +    .class_size = sizeof(X86MachineClass),
>> +    .class_init = x86_machine_class_init,

Don't we also have:

       .interfaces = (InterfaceInfo[]) {
           { TYPE_NMI },
           { }
       },

>> +};
>> +
>> +static void x86_machine_register_types(void)
>> +{
>> +    type_register_static(&x86_machine_info);
>> +}
>> +
>> +type_init(x86_machine_register_types)
>> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
>> index 1ede055387..e621dde6c3 100644
>> --- a/hw/intc/ioapic.c
>> +++ b/hw/intc/ioapic.c
>> @@ -23,6 +23,7 @@
>>  #include "qemu/osdep.h"
>>  #include "qapi/error.h"
>>  #include "monitor/monitor.h"
>> +#include "hw/i386/x86.h"
>>  #include "hw/i386/pc.h"
>>  #include "hw/i386/apic.h"
>>  #include "hw/i386/ioapic.h"
>> @@ -89,7 +90,7 @@ static void ioapic_entry_parse(uint64_t entry, struct ioapic_entry_info *info)
>>  
>>  static void ioapic_service(IOAPICCommonState *s)
>>  {
>> -    AddressSpace *ioapic_as = PC_MACHINE(qdev_get_machine())->ioapic_as;
>> +    AddressSpace *ioapic_as = X86_MACHINE(qdev_get_machine())->ioapic_as;
>>      struct ioapic_entry_info info;
>>      uint8_t i;
>>      uint32_t mask;
>> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
>> index 062feeb69e..de28d55e5c 100644
>> --- a/include/hw/i386/pc.h
>> +++ b/include/hw/i386/pc.h
>> @@ -3,6 +3,7 @@
>>  
>>  #include "exec/memory.h"
>>  #include "hw/boards.h"
>> +#include "hw/i386/x86.h"
>>  #include "hw/isa/isa.h"
>>  #include "hw/block/fdc.h"
>>  #include "hw/block/flash.h"
>> @@ -27,7 +28,7 @@
>>   */
>>  struct PCMachineState {
>>      /*< private >*/
>> -    MachineState parent_obj;
>> +    X86MachineState parent_obj;
>>  
>>      /* <public> */
>>  
>> @@ -36,15 +37,10 @@ struct PCMachineState {
>>  
>>      /* Pointers to devices and objects: */
>>      HotplugHandler *acpi_dev;
>> -    ISADevice *rtc;
>>      PCIBus *bus;
>> -    FWCfgState *fw_cfg;
>> -    qemu_irq *gsi;
>>      PFlashCFI01 *flash[2];
>> -    GMappedFile *initrd_mapped_file;
>>  
>>      /* Configuration options: */
>> -    uint64_t max_ram_below_4g;
>>      OnOffAuto vmport;
>>      OnOffAuto smm;
>>  
>> @@ -53,27 +49,13 @@ struct PCMachineState {
>>      bool sata_enabled;
>>      bool pit_enabled;
>>  
>> -    /* RAM information (sizes, addresses, configuration): */
>> -    ram_addr_t below_4g_mem_size, above_4g_mem_size;
>> -
>> -    /* CPU and apic information: */
>> -    bool apic_xrupt_override;
>> -    unsigned apic_id_limit;
>> -    uint16_t boot_cpus;
>> -    unsigned smp_dies;
>> -
>>      /* NUMA information: */
>>      uint64_t numa_nodes;
>>      uint64_t *node_mem;
>> -
>> -    /* Address space used by IOAPIC device. All IOAPIC interrupts
>> -     * will be translated to MSI messages in the address space. */
>> -    AddressSpace *ioapic_as;
>>  };
>>  
>>  #define PC_MACHINE_ACPI_DEVICE_PROP "acpi-device"
>>  #define PC_MACHINE_DEVMEM_REGION_SIZE "device-memory-region-size"
>> -#define PC_MACHINE_MAX_RAM_BELOW_4G "max-ram-below-4g"
>>  #define PC_MACHINE_VMPORT           "vmport"
>>  #define PC_MACHINE_SMM              "smm"
>>  #define PC_MACHINE_SMBUS            "smbus"
>> @@ -139,9 +121,6 @@ typedef struct PCMachineClass {
>>  
>>      /* use PVH to load kernels that support this feature */
>>      bool pvh_enabled;
>> -
>> -    /* Enables contiguous-apic-ID mode */
>> -    bool compat_apic_id_mode;
>>  } PCMachineClass;
>>  
>>  #define TYPE_PC_MACHINE "generic-pc-machine"
>> @@ -193,10 +172,6 @@ bool pc_machine_is_smm_enabled(PCMachineState *pcms);
>>  void pc_register_ferr_irq(qemu_irq irq);
>>  void pc_acpi_smi_interrupt(void *opaque, int irq, int level);
>>  
>> -void pc_cpus_init(PCMachineState *pcms);
>> -void pc_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp);
>> -void pc_smp_parse(MachineState *ms, QemuOpts *opts);
>> -
>>  void pc_guest_info_init(PCMachineState *pcms);
>>  
>>  #define PCI_HOST_PROP_PCI_HOLE_START   "pci-hole-start"
>> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
>> new file mode 100644
>> index 0000000000..5980090b29
>> --- /dev/null
>> +++ b/include/hw/i386/x86.h
>> @@ -0,0 +1,97 @@
>> +/*
>> + * Copyright (c) 2019 Red Hat, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2 or later, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef HW_I386_X86_H
>> +#define HW_I386_X86_H
>> +
>> +#include "qemu-common.h"
>> +#include "exec/hwaddr.h"
>> +#include "qemu/notify.h"
>> +
>> +#include "hw/boards.h"
>> +#include "hw/nmi.h"
>> +
>> +typedef struct {
>> +    /*< private >*/
>> +    MachineClass parent;
>> +
>> +    /*< public >*/
>> +
>> +    /* Enables contiguous-apic-ID mode */
>> +    bool compat_apic_id_mode;
>> +} X86MachineClass;
>> +
>> +typedef struct {
>> +    /*< private >*/
>> +    MachineState parent;
>> +
>> +    /*< public >*/
>> +
>> +    /* Pointers to devices and objects: */
>> +    ISADevice *rtc;
>> +    FWCfgState *fw_cfg;
>> +    qemu_irq *gsi;
>> +    GMappedFile *initrd_mapped_file;
>> +
>> +    /* Configuration options: */
>> +    uint64_t max_ram_below_4g;
>> +
>> +    /* RAM information (sizes, addresses, configuration): */
>> +    ram_addr_t below_4g_mem_size, above_4g_mem_size;
>> +
>> +    /* CPU and apic information: */
>> +    bool apic_xrupt_override;
>> +    unsigned apic_id_limit;
>> +    uint16_t boot_cpus;
>> +    unsigned smp_dies;
>> +
>> +    /* Address space used by IOAPIC device. All IOAPIC interrupts
>> +     * will be translated to MSI messages in the address space. */
>> +    AddressSpace *ioapic_as;
>> +} X86MachineState;
>> +
>> +#define X86_MACHINE_MAX_RAM_BELOW_4G "max-ram-below-4g"
>> +
>> +#define TYPE_X86_MACHINE   MACHINE_TYPE_NAME("x86")

Maybe we should name it TYPE_X86_BASE_MACHINE (or COMMON?) since it is
not a real machine, but an abstract base class.

>> +#define X86_MACHINE(obj) \
>> +    OBJECT_CHECK(X86MachineState, (obj), TYPE_X86_MACHINE)
>> +#define X86_MACHINE_GET_CLASS(obj) \
>> +    OBJECT_GET_CLASS(X86MachineClass, obj, TYPE_X86_MACHINE)
>> +#define X86_MACHINE_CLASS(class) \
>> +    OBJECT_CLASS_CHECK(X86MachineClass, class, TYPE_X86_MACHINE)
>> +
>> +uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
>> +                                    unsigned int cpu_index);
>> +
>> +void x86_cpus_init(X86MachineState *pcms, int default_cpu_version);
>> +void x86_hot_add_cpu(MachineState *ms, const int64_t id, Error **errp);
>> +void x86_smp_parse(MachineState *ms, QemuOpts *opts);
>> +void x86_nmi(NMIState *n, int cpu_index, Error **errp);
>> +
>> +CpuInstanceProperties x86_cpu_index_to_props(MachineState *ms,
>> +                                             unsigned cpu_index);
>> +int64_t x86_get_default_cpu_node_id(const MachineState *ms, int idx);
>> +const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms);
>> +
>> +void x86_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw);
>> +
>> +void load_linux(X86MachineState *x86ms,
> 
> Maybe rename it x86_load_linux()?
> 
>> +                FWCfgState *fw_cfg,
>> +                unsigned acpi_data_size,
>> +                bool linuxboot_dma_enabled,
>> +                bool pvh_enabled);
>> +
>> +#endif
>>
> 
> The patch looks good; however, I'd split it as follows:
> 
> 1/ rename functions x86_*
> 2/ export functions, add "hw/i386/x86.h"
> 3/ move functions to hw/i386/x86.c
> 4/ add/use X86MachineState
> 
> Anyhow, if the maintainer is happy with it as is:
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> 



* Re: [PATCH v4 8/8] hw/i386: Introduce the microvm machine type
  2019-09-24 12:44   ` Sergio Lopez
@ 2019-09-25 15:40     ` Philippe Mathieu-Daudé
  -1 siblings, 0 replies; 133+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-25 15:40 UTC (permalink / raw)
  To: Sergio Lopez, qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, lersek,
	kraxel, mtosatti, kvm



On 9/24/19 2:44 PM, Sergio Lopez wrote:
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> constructed after the machine model implemented by the latter.
> 
> Its main purpose is providing users with a minimalist machine type free
> from the burden of legacy compatibility, serving as a stepping stone
> for future projects aiming at improving boot times, reducing the
> attack surface and slimming down QEMU's footprint.
> 
> The microvm machine type supports the following devices:
> 
>  - ISA bus
>  - i8259 PIC
>  - LAPIC (implicit if using KVM)
>  - IOAPIC (defaults to kernel_irqchip_split = true)
>  - i8254 PIT
>  - MC146818 RTC (optional)
>  - kvmclock (if using KVM)
>  - fw_cfg
>  - One ISA serial port (optional)
>  - Up to eight virtio-mmio devices (configured by the user)
> 
> It supports the following machine-specific options:
> 
> microvm.option-roms=bool (Set off to disable loading option ROMs)
> microvm.isa-serial=bool (Set off to disable the instantiation of an ISA serial port)
> microvm.rtc=bool (Set off to disable the instantiation of an MC146818 RTC)
> microvm.kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
> 
> By default, microvm uses qboot as its BIOS, to obtain better boot
> times, but it's also compatible with SeaBIOS.
> 
> As no current FW is able to boot from a block device using virtio-mmio
> as its transport, a microvm-based VM needs to be run using a host-side
> kernel and, optionally, an initrd image.
> 
> This is an example of instantiating a microvm VM with a virtio-mmio
> based console:
> 
> qemu-system-x86_64 -M microvm \
>  -enable-kvm -cpu host -m 512m -smp 2 \
>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>  -nodefaults -no-user-config -nographic \
>  -chardev stdio,id=virtiocon0,server \
>  -device virtio-serial-device \
>  -device virtconsole,chardev=virtiocon0 \
>  -drive id=test,file=test.img,format=raw,if=none \
>  -device virtio-blk-device,drive=test \
>  -netdev tap,id=tap0,script=no,downscript=no \
>  -device virtio-net-device,netdev=tap0
> 
> This is another example, this time using an ISA serial port, useful
> for debugging purposes:
> 
> qemu-system-x86_64 -M microvm \
>  -enable-kvm -cpu host -m 512m -smp 2 \
>  -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/vda" \
>  -nodefaults -no-user-config -nographic \
>  -serial stdio \
>  -drive id=test,file=test.img,format=raw,if=none \
>  -device virtio-blk-device,drive=test \
>  -netdev tap,id=tap0,script=no,downscript=no \
>  -device virtio-net-device,netdev=tap0
> 
> Finally, in this example a microvm VM is instantiated without RTC,
> without an ISA serial port and without loading the option ROMs,
> obtaining the smallest configuration:
> 
> qemu-system-x86_64 -M microvm,rtc=off,isa-serial=off,option-roms=off \
>  -enable-kvm -cpu host -m 512m -smp 2 \
>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>  -nodefaults -no-user-config -nographic \
>  -chardev stdio,id=virtiocon0,server \
>  -device virtio-serial-device \
>  -device virtconsole,chardev=virtiocon0 \
>  -drive id=test,file=test.img,format=raw,if=none \
>  -device virtio-blk-device,drive=test \
>  -netdev tap,id=tap0,script=no,downscript=no \
>  -device virtio-net-device,netdev=tap0
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  default-configs/i386-softmmu.mak |   1 +
>  hw/i386/Kconfig                  |   4 +
>  hw/i386/Makefile.objs            |   1 +
>  hw/i386/microvm.c                | 512 +++++++++++++++++++++++++++++++
>  include/hw/i386/microvm.h        |  80 +++++
>  5 files changed, 598 insertions(+)
>  create mode 100644 hw/i386/microvm.c
>  create mode 100644 include/hw/i386/microvm.h
> 
> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> index cd5ea391e8..c27cdd98e9 100644
> --- a/default-configs/i386-softmmu.mak
> +++ b/default-configs/i386-softmmu.mak
> @@ -26,3 +26,4 @@ CONFIG_ISAPC=y
>  CONFIG_I440FX=y
>  CONFIG_Q35=y
>  CONFIG_ACPI_PCI=y
> +CONFIG_MICROVM=y
> \ No newline at end of file
> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
> index 6350438036..324e193dd8 100644
> --- a/hw/i386/Kconfig
> +++ b/hw/i386/Kconfig
> @@ -88,6 +88,10 @@ config Q35
>      select SMBIOS
>      select FW_CFG_DMA
>  
> +config MICROVM
> +    bool
> +    select VIRTIO_MMIO
> +
>  config VTD
>      bool
>  
> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> index 5b4b3a672e..bb17d54567 100644
> --- a/hw/i386/Makefile.objs
> +++ b/hw/i386/Makefile.objs
> @@ -6,6 +6,7 @@ obj-y += pc.o
>  obj-y += e820.o
>  obj-$(CONFIG_I440FX) += pc_piix.o
>  obj-$(CONFIG_Q35) += pc_q35.o
> +obj-$(CONFIG_MICROVM) += microvm.o
>  obj-y += fw_cfg.o pc_sysfw.o
>  obj-y += x86-iommu.o
>  obj-$(CONFIG_VTD) += intel_iommu.o
> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
> new file mode 100644
> index 0000000000..4b494a1b27
> --- /dev/null
> +++ b/hw/i386/microvm.c
> @@ -0,0 +1,512 @@
> +/*
> + * Copyright (c) 2018 Intel Corporation
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/error-report.h"
> +#include "qemu/cutils.h"
> +#include "qemu/units.h"
> +#include "qapi/error.h"
> +#include "qapi/visitor.h"
> +#include "sysemu/sysemu.h"
> +#include "sysemu/cpus.h"
> +#include "sysemu/numa.h"
> +#include "sysemu/reset.h"
> +
> +#include "hw/loader.h"
> +#include "hw/irq.h"
> +#include "hw/nmi.h"
> +#include "hw/kvm/clock.h"
> +#include "hw/i386/microvm.h"
> +#include "hw/i386/x86.h"
> +#include "hw/i386/pc.h"
> +#include "target/i386/cpu.h"
> +#include "hw/timer/i8254.h"
> +#include "hw/timer/mc146818rtc.h"
> +#include "hw/char/serial.h"
> +#include "hw/i386/topology.h"
> +#include "hw/i386/e820.h"
> +#include "hw/i386/fw_cfg.h"
> +#include "hw/virtio/virtio-mmio.h"
> +
> +#include "cpu.h"
> +#include "elf.h"
> +#include "pvh.h"
> +#include "kvm_i386.h"
> +#include "hw/xen/start_info.h"
> +
> +#define MICROVM_BIOS_FILENAME "bios-microvm.bin"
> +
> +static void microvm_set_rtc(MicrovmMachineState *mms, ISADevice *s)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(mms);
> +    int val;
> +
> +    val = MIN(x86ms->below_4g_mem_size / KiB, 640);
> +    rtc_set_memory(s, 0x15, val);
> +    rtc_set_memory(s, 0x16, val >> 8);
> +    /* extended memory (next 64MiB) */
> +    if (x86ms->below_4g_mem_size > 1 * MiB) {
> +        val = (x86ms->below_4g_mem_size - 1 * MiB) / KiB;
> +    } else {
> +        val = 0;
> +    }
> +    if (val > 65535) {
> +        val = 65535;
> +    }
> +    rtc_set_memory(s, 0x17, val);
> +    rtc_set_memory(s, 0x18, val >> 8);
> +    rtc_set_memory(s, 0x30, val);
> +    rtc_set_memory(s, 0x31, val >> 8);
> +    /* memory between 16MiB and 4GiB */
> +    if (x86ms->below_4g_mem_size > 16 * MiB) {
> +        val = (x86ms->below_4g_mem_size - 16 * MiB) / (64 * KiB);
> +    } else {
> +        val = 0;
> +    }
> +    if (val > 65535) {
> +        val = 65535;
> +    }
> +    rtc_set_memory(s, 0x34, val);
> +    rtc_set_memory(s, 0x35, val >> 8);
> +    /* memory above 4GiB */
> +    val = x86ms->above_4g_mem_size / 65536;
> +    rtc_set_memory(s, 0x5b, val);
> +    rtc_set_memory(s, 0x5c, val >> 8);
> +    rtc_set_memory(s, 0x5d, val >> 16);
> +}
> +
> +static void microvm_devices_init(MicrovmMachineState *mms)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(mms);
> +    ISABus *isa_bus;
> +    ISADevice *rtc_state;
> +    GSIState *gsi_state;
> +    qemu_irq *i8259;
> +    int i;
> +
> +    gsi_state = g_malloc0(sizeof(*gsi_state));
> +    x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
> +
> +    isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
> +                          &error_abort);
> +    isa_bus_irqs(isa_bus, x86ms->gsi);
> +
> +    i8259 = i8259_init(isa_bus, pc_allocate_cpu_irq());
> +
> +    for (i = 0; i < ISA_NUM_IRQS; i++) {
> +        gsi_state->i8259_irq[i] = i8259[i];
> +    }
> +
> +    ioapic_init_gsi(gsi_state, "machine");
> +
> +    if (mms->rtc_enabled) {
> +        rtc_state = mc146818_rtc_init(isa_bus, 2000, NULL);
> +        microvm_set_rtc(mms, rtc_state);
> +    }
> +

Maybe refactor that ...

> +    if (kvm_pit_in_kernel()) {
> +        kvm_pit_init(isa_bus, 0x40);
> +    } else {
> +        i8254_pit_init(isa_bus, 0x40, 0, NULL);
> +    }

... as an x86_pit_create() function?

> +
> +    kvmclock_create();
> +
> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
> +        int nirq = VIRTIO_IRQ_BASE + i;
> +        ISADevice *isadev = isa_create(isa_bus, TYPE_ISA_SERIAL);
> +        qemu_irq mmio_irq;
> +
> +        isa_init_irq(isadev, &mmio_irq, nirq);
> +        sysbus_create_simple("virtio-mmio",
> +                             VIRTIO_MMIO_BASE + i * 512,
> +                             x86ms->gsi[VIRTIO_IRQ_BASE + i]);
> +    }
> +
> +    g_free(i8259);

Not related to this patch, but the i8259_init() API is not clear:
does it return an allocated array of allocated qemu_irqs? Is it safe to
copy them to gsi_state and then free the array?

> +
> +    if (mms->isa_serial_enabled) {
> +        serial_hds_isa_init(isa_bus, 0, 1);
> +    }
> +
> +    if (bios_name == NULL) {
> +        bios_name = MICROVM_BIOS_FILENAME;
> +    }
> +    x86_system_rom_init(get_system_memory(), true);
> +}
> +
> +static void microvm_memory_init(MicrovmMachineState *mms)
> +{
> +    MachineState *machine = MACHINE(mms);
> +    X86MachineState *x86ms = X86_MACHINE(mms);
> +    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
> +    MemoryRegion *system_memory = get_system_memory();
> +    FWCfgState *fw_cfg;
> +    ram_addr_t lowmem;
> +    int i;
> +
> +    /*
> +     * Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
> +     * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
> +     * also known as MMCFG).
> +     * If it doesn't, we need to split it in chunks below and above 4G.
> +     * In any case, try to make sure that guest addresses aligned at
> +     * 1G boundaries get mapped to host addresses aligned at 1G boundaries.
> +     */
> +    if (machine->ram_size >= 0xb0000000) {
> +        lowmem = 0x80000000;
> +    } else {
> +        lowmem = 0xb0000000;
> +    }
> +
> +    /*
> +     * Handle the machine opt max-ram-below-4g.  It is basically doing
> +     * min(qemu limit, user limit).
> +     */
> +    if (!x86ms->max_ram_below_4g) {
> +        x86ms->max_ram_below_4g = 1ULL << 32; /* default: 4G */

Please use '4 * GiB' with no comment.

> +    }
> +    if (lowmem > x86ms->max_ram_below_4g) {
> +        lowmem = x86ms->max_ram_below_4g;
> +        if (machine->ram_size - lowmem > lowmem &&
> +            lowmem & (1 * GiB - 1)) {
> +            warn_report("There is possibly poor performance as the ram size "
> +                        "(0x%" PRIx64 ") is more than twice the size of"
> +                        " max-ram-below-4g (%"PRIu64") and"
> +                        " max-ram-below-4g is not a multiple of 1G.",
> +                        (uint64_t)machine->ram_size, x86ms->max_ram_below_4g);
> +        }
> +    }
> +
> +    if (machine->ram_size > lowmem) {
> +        x86ms->above_4g_mem_size = machine->ram_size - lowmem;
> +        x86ms->below_4g_mem_size = lowmem;
> +    } else {
> +        x86ms->above_4g_mem_size = 0;
> +        x86ms->below_4g_mem_size = machine->ram_size;
> +    }
> +
> +    ram = g_malloc(sizeof(*ram));
> +    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
> +                                         machine->ram_size);
> +
> +    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
> +    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
> +                             0, x86ms->below_4g_mem_size);
> +    memory_region_add_subregion(system_memory, 0, ram_below_4g);
> +
> +    e820_add_entry(0, x86ms->below_4g_mem_size, E820_RAM);
> +
> +    if (x86ms->above_4g_mem_size > 0) {
> +        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
> +        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
> +                                 x86ms->below_4g_mem_size,
> +                                 x86ms->above_4g_mem_size);
> +        memory_region_add_subregion(system_memory, 0x100000000ULL,
> +                                    ram_above_4g);
> +        e820_add_entry(0x100000000ULL, x86ms->above_4g_mem_size, E820_RAM);
> +    }
> +
> +    fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4,
> +                                &address_space_memory);
> +
> +    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
> +    fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)x86ms->apic_id_limit);
> +    fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)machine->ram_size);
> +    fw_cfg_add_i32(fw_cfg, FW_CFG_IRQ0_OVERRIDE, kvm_allows_irq0_override());
> +
> +    rom_set_fw(fw_cfg);
> +
> +    e820_create_fw_entry(fw_cfg);
> +
> +    load_linux(x86ms, fw_cfg, 0, true, true);
> +
> +    if (mms->option_roms_enabled) {
> +        for (i = 0; i < nb_option_roms; i++) {
> +            rom_add_option(option_rom[i].name, option_rom[i].bootindex);
> +        }
> +    }
> +
> +    x86ms->fw_cfg = fw_cfg;
> +    x86ms->ioapic_as = &address_space_memory;
> +}
> +
> +static gchar *microvm_get_mmio_cmdline(gchar *name)
> +{
> +    gchar *cmdline;
> +    gchar *separator;
> +    long int index;
> +    int ret;
> +
> +    separator = g_strrstr(name, ".");
> +    if (!separator) {
> +        return NULL;
> +    }
> +
> +    if (qemu_strtol(separator + 1, NULL, 10, &index) != 0) {
> +        return NULL;
> +    }
> +
> +    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
> +    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
> +                     " virtio_mmio.device=512@0x%lx:%ld",
> +                     VIRTIO_MMIO_BASE + index * 512,
> +                     VIRTIO_IRQ_BASE + index);
> +    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
> +        g_free(cmdline);
> +        return NULL;
> +    }
> +
> +    return cmdline;
> +}
> +
> +static void microvm_fix_kernel_cmdline(MachineState *machine)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(machine);
> +    BusState *bus;
> +    BusChild *kid;
> +    char *cmdline;
> +
> +    /*
> +     * Find MMIO transports with attached devices, and add them to the kernel
> +     * command line.
> +     *
> +     * Yes, this is a hack, but one that heavily improves the UX without
> +     * introducing any significant issues.
> +     */
> +    cmdline = g_strdup(machine->kernel_cmdline);
> +    bus = sysbus_get_default();
> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
> +        DeviceState *dev = kid->child;
> +        ObjectClass *class = object_get_class(OBJECT(dev));
> +
> +        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
> +
> +            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
> +                gchar *mmio_cmdline = microvm_get_mmio_cmdline(mmio_bus->name);
> +                if (mmio_cmdline) {
> +                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
> +                    g_free(mmio_cmdline);
> +                    g_free(cmdline);
> +                    cmdline = newcmd;
> +                }
> +            }
> +        }
> +    }
> +
> +    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
> +    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
> +}
> +
> +static void microvm_machine_state_init(MachineState *machine)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
> +    X86MachineState *x86ms = X86_MACHINE(machine);
> +    Error *local_err = NULL;
> +
> +    if (machine->kernel_filename == NULL) {
> +        error_report("missing kernel image file name, required by microvm");
> +        exit(1);
> +    }
> +
> +    microvm_memory_init(mms);
> +
> +    x86_cpus_init(x86ms, CPU_VERSION_LATEST);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        exit(1);
> +    }
> +
> +    microvm_devices_init(mms);
> +}
> +
> +static void microvm_machine_reset(MachineState *machine)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
> +    CPUState *cs;
> +    X86CPU *cpu;
> +
> +    if (mms->kernel_cmdline_enabled && !mms->kernel_cmdline_fixed) {
> +        microvm_fix_kernel_cmdline(machine);
> +        mms->kernel_cmdline_fixed = true;
> +    }
> +
> +    qemu_devices_reset();
> +
> +    CPU_FOREACH(cs) {
> +        cpu = X86_CPU(cs);
> +
> +        if (cpu->apic_state) {
> +            device_reset(cpu->apic_state);
> +        }
> +    }
> +}
> +
> +static bool microvm_machine_get_rtc(Object *obj, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    return mms->rtc_enabled;
> +}
> +
> +static void microvm_machine_set_rtc(Object *obj, bool value, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    mms->rtc_enabled = value;
> +}
> +
> +static bool microvm_machine_get_isa_serial(Object *obj, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    return mms->isa_serial_enabled;
> +}
> +
> +static void microvm_machine_set_isa_serial(Object *obj, bool value,
> +                                           Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    mms->isa_serial_enabled = value;
> +}
> +
> +static bool microvm_machine_get_option_roms(Object *obj, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    return mms->option_roms_enabled;
> +}
> +
> +static void microvm_machine_set_option_roms(Object *obj, bool value,
> +                                            Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    mms->option_roms_enabled = value;
> +}
> +
> +static bool microvm_machine_get_kernel_cmdline(Object *obj, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    return mms->kernel_cmdline_enabled;
> +}
> +
> +static void microvm_machine_set_kernel_cmdline(Object *obj, bool value,
> +                                               Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    mms->kernel_cmdline_enabled = value;
> +}
> +
> +static void microvm_machine_initfn(Object *obj)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    /* Configuration */
> +    mms->rtc_enabled = true;
> +    mms->isa_serial_enabled = true;
> +    mms->option_roms_enabled = true;
> +    mms->kernel_cmdline_enabled = true;
> +
> +    /* State */
> +    mms->kernel_cmdline_fixed = false;
> +}
> +
> +static void microvm_class_init(ObjectClass *oc, void *data)
> +{
> +    MachineClass *mc = MACHINE_CLASS(oc);
> +    NMIClass *nc = NMI_CLASS(oc);
> +
> +    mc->init = microvm_machine_state_init;
> +
> +    mc->family = "microvm_i386";
> +    mc->desc = "Microvm (i386)";
> +    mc->units_per_default_bus = 1;
> +    mc->no_floppy = 1;
> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");

Aren't these common to X86?

> +    mc->max_cpus = 288;
> +    mc->has_hotpluggable_cpus = false;
> +    mc->auto_enable_numa_with_memhp = false;
> +    mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
> +    mc->nvdimm_supported = false;
> +
> +    /* Avoid relying too much on kernel components */
> +    mc->default_kernel_irqchip_split = true;
> +
> +    /* Machine class handlers */
> +    mc->reset = microvm_machine_reset;
> +
> +    /* NMI handler */
> +    nc->nmi_monitor_handler = x86_nmi;
> +
> +    object_class_property_add_bool(oc, MICROVM_MACHINE_RTC,
> +                                   microvm_machine_get_rtc,
> +                                   microvm_machine_set_rtc,
> +                                   &error_abort);
> +    object_class_property_set_description(oc, MICROVM_MACHINE_RTC,
> +        "Set off to disable the instantiation of an MC146818 RTC",
> +        &error_abort);
> +
> +    object_class_property_add_bool(oc, MICROVM_MACHINE_ISA_SERIAL,
> +                                   microvm_machine_get_isa_serial,
> +                                   microvm_machine_set_isa_serial,
> +                                   &error_abort);
> +    object_class_property_set_description(oc, MICROVM_MACHINE_ISA_SERIAL,
> +        "Set off to disable the instantiation of an ISA serial port",
> +        &error_abort);
> +
> +    object_class_property_add_bool(oc, MICROVM_MACHINE_OPTION_ROMS,
> +                                   microvm_machine_get_option_roms,
> +                                   microvm_machine_set_option_roms,
> +                                   &error_abort);
> +    object_class_property_set_description(oc, MICROVM_MACHINE_OPTION_ROMS,
> +        "Set off to disable loading option ROMs", &error_abort);
> +
> +    object_class_property_add_bool(oc, MICROVM_MACHINE_KERNEL_CMDLINE,
> +                                   microvm_machine_get_kernel_cmdline,
> +                                   microvm_machine_set_kernel_cmdline,
> +                                   &error_abort);
> +    object_class_property_set_description(oc, MICROVM_MACHINE_KERNEL_CMDLINE,
> +        "Set off to disable adding virtio-mmio devices to the kernel cmdline",
> +        &error_abort);
> +}
> +
> +static const TypeInfo microvm_machine_info = {
> +    .name          = TYPE_MICROVM_MACHINE,
> +    .parent        = TYPE_X86_MACHINE,
> +    .instance_size = sizeof(MicrovmMachineState),
> +    .instance_init = microvm_machine_initfn,
> +    .class_size    = sizeof(MicrovmMachineClass),
> +    .class_init    = microvm_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +         { TYPE_NMI },

Isn't this inherited from TYPE_X86_MACHINE?

> +         { }
> +    },
> +};
> +
> +static void microvm_machine_init(void)
> +{
> +    type_register_static(&microvm_machine_info);
> +}
> +type_init(microvm_machine_init);
> diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
> new file mode 100644
> index 0000000000..04c8caf886
> --- /dev/null
> +++ b/include/hw/i386/microvm.h
> @@ -0,0 +1,80 @@
> +/*
> + * Copyright (c) 2018 Intel Corporation
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef HW_I386_MICROVM_H
> +#define HW_I386_MICROVM_H
> +
> +#include "qemu-common.h"
> +#include "exec/hwaddr.h"
> +#include "qemu/notify.h"
> +
> +#include "hw/boards.h"
> +#include "hw/i386/x86.h"
> +
> +/* Microvm memory layout */
> +#define PVH_START_INFO        0x6000
> +#define MEMMAP_START          0x7000
> +#define MODLIST_START         0x7800
> +#define BOOT_STACK_POINTER    0x8ff0
> +#define PML4_START            0x9000
> +#define PDPTE_START           0xa000
> +#define PDE_START             0xb000
> +#define KERNEL_CMDLINE_START  0x20000
> +#define EBDA_START            0x9fc00
> +#define HIMEM_START           0x100000
> +
> +/* Platform virtio definitions */
> +#define VIRTIO_MMIO_BASE      0xc0000000
> +#define VIRTIO_IRQ_BASE       5
> +#define VIRTIO_NUM_TRANSPORTS 8
> +#define VIRTIO_CMDLINE_MAXLEN 64
> +
> +/* Machine type options */
> +#define MICROVM_MACHINE_RTC            "rtc"
> +#define MICROVM_MACHINE_ISA_SERIAL     "isa-serial"
> +#define MICROVM_MACHINE_OPTION_ROMS    "option-roms"
> +#define MICROVM_MACHINE_KERNEL_CMDLINE "kernel-cmdline"
> +
> +typedef struct {
> +    X86MachineClass parent;
> +    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
> +                                           DeviceState *dev);
> +} MicrovmMachineClass;
> +
> +typedef struct {
> +    X86MachineState parent;
> +
> +    /* Machine type options */
> +    bool rtc_enabled;
> +    bool isa_serial_enabled;
> +    bool option_roms_enabled;
> +    bool kernel_cmdline_enabled;
> +
> +
> +    /* Machine state */
> +    bool kernel_cmdline_fixed;
> +} MicrovmMachineState;
> +
> +#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
> +#define MICROVM_MACHINE(obj) \
> +    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
> +#define MICROVM_MACHINE_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
> +#define MICROVM_MACHINE_CLASS(class) \
> +    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
> +
> +#endif
> 
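The virtio definitions in this header determine where each transport lives. Transport i is a 512-byte window at VIRTIO_MMIO_BASE + i * 512, wired to IRQ VIRTIO_IRQ_BASE + i. The standalone sketch below uses plain C with no QEMU or GLib. It models how microvm_get_mmio_cmdline() and microvm_fix_kernel_cmdline() turn that layout into virtio_mmio.device= kernel arguments. parse_transport_index(), mmio_cmdline_fragment() and build_cmdline() are hypothetical names. The "populated" bitmask stands in for walking the sysbus children, and the example bus name in the usage is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Values copied from the patch's include/hw/i386/microvm.h. */
#define VIRTIO_MMIO_BASE      0xc0000000UL
#define VIRTIO_IRQ_BASE       5
#define VIRTIO_NUM_TRANSPORTS 8
#define VIRTIO_CMDLINE_MAXLEN 64

/* Take the number after the last '.' of a bus name and reject trailing
 * junk (mirrors the g_strrstr() + qemu_strtol() pair in the patch). */
static bool parse_transport_index(const char *bus_name, long *index)
{
    const char *sep = strrchr(bus_name, '.');
    char *end;

    if (!sep || sep[1] == '\0') {
        return false;
    }
    errno = 0;
    *index = strtol(sep + 1, &end, 10);
    return errno == 0 && *end == '\0';
}

/* Format " virtio_mmio.device=512@<base>:<irq>" for one transport.
 * Unlike the patch, this sketch asserts the index is in range. */
static bool mmio_cmdline_fragment(long index, char *buf, size_t len)
{
    int ret;

    assert(index >= 0 && index < VIRTIO_NUM_TRANSPORTS);
    ret = snprintf(buf, len, " virtio_mmio.device=512@0x%lx:%ld",
                   VIRTIO_MMIO_BASE + (unsigned long)index * 512,
                   VIRTIO_IRQ_BASE + index);
    return ret >= 0 && (size_t)ret < len;
}

/* Append a fragment for every transport whose bit is set in 'populated'.
 * The caller frees the returned string. */
static char *build_cmdline(const char *base, unsigned populated)
{
    size_t cap = strlen(base) + 1 +
                 (size_t)VIRTIO_NUM_TRANSPORTS * VIRTIO_CMDLINE_MAXLEN;
    char *out = malloc(cap);

    if (!out) {
        return NULL;
    }
    strcpy(out, base);
    for (long i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
        char frag[VIRTIO_CMDLINE_MAXLEN];

        if ((populated & (1u << i)) &&
            mmio_cmdline_fragment(i, frag, sizeof(frag))) {
            strcat(out, frag);
        }
    }
    return out;
}
```

With transports 0 and 1 populated, a base command line of "console=hvc0 root=/dev/vda" gains "virtio_mmio.device=512@0xc0000000:5" and "virtio_mmio.device=512@0xc0000200:6".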

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 8/8] hw/i386: Introduce the microvm machine type
@ 2019-09-25 15:40     ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 133+ messages in thread
From: Philippe Mathieu-Daudé @ 2019-09-25 15:40 UTC (permalink / raw)
  To: Sergio Lopez, qemu-devel
  Cc: ehabkost, kvm, mst, mtosatti, kraxel, pbonzini, imammedo, lersek, rth



On 9/24/19 2:44 PM, Sergio Lopez wrote:
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> constructed after the machine model implemented by the latter.
> 
> Its main purpose is to provide users with a minimalist machine type free
> from the burden of legacy compatibility, serving as a stepping stone
> for future projects aiming at improving boot times, reducing the
> attack surface and slimming down QEMU's footprint.
> 
> The microvm machine type supports the following devices:
> 
>  - ISA bus
>  - i8259 PIC
>  - LAPIC (implicit if using KVM)
>  - IOAPIC (defaults to kernel_irqchip_split = true)
>  - i8254 PIT
>  - MC146818 RTC (optional)
>  - kvmclock (if using KVM)
>  - fw_cfg
>  - One ISA serial port (optional)
>  - Up to eight virtio-mmio devices (configured by the user)
> 
> It supports the following machine-specific options:
> 
> microvm.option-roms=bool (Set off to disable loading option ROMs)
> microvm.isa-serial=bool (Set off to disable the instantiation of an ISA serial port)
> microvm.rtc=bool (Set off to disable the instantiation of an MC146818 RTC)
> microvm.kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
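Each of these options defaults to on, as set in microvm_machine_initfn(). Passing key=off after the machine name turns it off. The sketch below shows how such a string combines with those defaults. It is only a stand-in: QEMU itself goes through its option parser and the QOM bool property setters. The struct microvm_opts type and the parse functions are hypothetical, and the sketch accepts only "on" and "off" as values.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mirror of the four option flags in MicrovmMachineState. */
struct microvm_opts {
    bool rtc;
    bool isa_serial;
    bool option_roms;
    bool kernel_cmdline;
};

/* Defaults as set by microvm_machine_initfn(): every option on. */
static const struct microvm_opts microvm_opts_default = {
    true, true, true, true
};

/* This sketch accepts only "on" and "off". */
static bool parse_on_off(const char *v, bool *out)
{
    if (strcmp(v, "on") == 0) {
        *out = true;
        return true;
    }
    if (strcmp(v, "off") == 0) {
        *out = false;
        return true;
    }
    return false;
}

/* Parse the text after "microvm," in "-M microvm,rtc=off,...".
 * Returns false on an unknown key or a malformed value; fields parsed
 * before the error keep their new values. */
static bool microvm_opts_parse(const char *spec, struct microvm_opts *o)
{
    char buf[256];

    assert(spec != NULL && o != NULL);
    if (strlen(spec) >= sizeof(buf)) {
        return false;
    }
    strcpy(buf, spec);
    for (char *tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
        char *eq = strchr(tok, '=');
        bool *field;

        if (!eq) {
            return false;
        }
        *eq = '\0';
        if (strcmp(tok, "rtc") == 0) {
            field = &o->rtc;
        } else if (strcmp(tok, "isa-serial") == 0) {
            field = &o->isa_serial;
        } else if (strcmp(tok, "option-roms") == 0) {
            field = &o->option_roms;
        } else if (strcmp(tok, "kernel-cmdline") == 0) {
            field = &o->kernel_cmdline;
        } else {
            return false;
        }
        if (!parse_on_off(eq + 1, field)) {
            return false;
        }
    }
    return true;
}
```

For "rtc=off,isa-serial=off,option-roms=off", three flags become false and kernel-cmdline stays on.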
> 
> By default, microvm uses qboot as its BIOS, to obtain better boot
> times, but it's also compatible with SeaBIOS.
> 
> As no current FW is able to boot from a block device using virtio-mmio
> as its transport, a microvm-based VM needs to be run using a host-side
> kernel and, optionally, an initrd image.
> 
> This is an example of instantiating a microvm VM with a virtio-mmio
> based console:
> 
> qemu-system-x86_64 -M microvm \
>  -enable-kvm -cpu host -m 512m -smp 2 \
>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>  -nodefaults -no-user-config -nographic \
>  -chardev stdio,id=virtiocon0,server \
>  -device virtio-serial-device \
>  -device virtconsole,chardev=virtiocon0 \
>  -drive id=test,file=test.img,format=raw,if=none \
>  -device virtio-blk-device,drive=test \
>  -netdev tap,id=tap0,script=no,downscript=no \
>  -device virtio-net-device,netdev=tap0
> 
> This is another example, this time using an ISA serial port, useful
> for debugging purposes:
> 
> qemu-system-x86_64 -M microvm \
>  -enable-kvm -cpu host -m 512m -smp 2 \
>  -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/vda" \
>  -nodefaults -no-user-config -nographic \
>  -serial stdio \
>  -drive id=test,file=test.img,format=raw,if=none \
>  -device virtio-blk-device,drive=test \
>  -netdev tap,id=tap0,script=no,downscript=no \
>  -device virtio-net-device,netdev=tap0
> 
> Finally, in this example a microvm VM is instantiated without RTC,
> without an ISA serial port and without loading the option ROMs,
> obtaining the smallest configuration:
> 
> qemu-system-x86_64 -M microvm,rtc=off,isa-serial=off,option-roms=off \
>  -enable-kvm -cpu host -m 512m -smp 2 \
>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>  -nodefaults -no-user-config -nographic \
>  -chardev stdio,id=virtiocon0,server \
>  -device virtio-serial-device \
>  -device virtconsole,chardev=virtiocon0 \
>  -drive id=test,file=test.img,format=raw,if=none \
>  -device virtio-blk-device,drive=test \
>  -netdev tap,id=tap0,script=no,downscript=no \
>  -device virtio-net-device,netdev=tap0
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  default-configs/i386-softmmu.mak |   1 +
>  hw/i386/Kconfig                  |   4 +
>  hw/i386/Makefile.objs            |   1 +
>  hw/i386/microvm.c                | 512 +++++++++++++++++++++++++++++++
>  include/hw/i386/microvm.h        |  80 +++++
>  5 files changed, 598 insertions(+)
>  create mode 100644 hw/i386/microvm.c
>  create mode 100644 include/hw/i386/microvm.h
> 
> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> index cd5ea391e8..c27cdd98e9 100644
> --- a/default-configs/i386-softmmu.mak
> +++ b/default-configs/i386-softmmu.mak
> @@ -26,3 +26,4 @@ CONFIG_ISAPC=y
>  CONFIG_I440FX=y
>  CONFIG_Q35=y
>  CONFIG_ACPI_PCI=y
> +CONFIG_MICROVM=y
> \ No newline at end of file
> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
> index 6350438036..324e193dd8 100644
> --- a/hw/i386/Kconfig
> +++ b/hw/i386/Kconfig
> @@ -88,6 +88,10 @@ config Q35
>      select SMBIOS
>      select FW_CFG_DMA
>  
> +config MICROVM
> +    bool
> +    select VIRTIO_MMIO
> +
>  config VTD
>      bool
>  
> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> index 5b4b3a672e..bb17d54567 100644
> --- a/hw/i386/Makefile.objs
> +++ b/hw/i386/Makefile.objs
> @@ -6,6 +6,7 @@ obj-y += pc.o
>  obj-y += e820.o
>  obj-$(CONFIG_I440FX) += pc_piix.o
>  obj-$(CONFIG_Q35) += pc_q35.o
> +obj-$(CONFIG_MICROVM) += microvm.o
>  obj-y += fw_cfg.o pc_sysfw.o
>  obj-y += x86-iommu.o
>  obj-$(CONFIG_VTD) += intel_iommu.o
> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
> new file mode 100644
> index 0000000000..4b494a1b27
> --- /dev/null
> +++ b/hw/i386/microvm.c
> @@ -0,0 +1,512 @@
> +/*
> + * Copyright (c) 2018 Intel Corporation
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/error-report.h"
> +#include "qemu/cutils.h"
> +#include "qemu/units.h"
> +#include "qapi/error.h"
> +#include "qapi/visitor.h"
> +#include "sysemu/sysemu.h"
> +#include "sysemu/cpus.h"
> +#include "sysemu/numa.h"
> +#include "sysemu/reset.h"
> +
> +#include "hw/loader.h"
> +#include "hw/irq.h"
> +#include "hw/nmi.h"
> +#include "hw/kvm/clock.h"
> +#include "hw/i386/microvm.h"
> +#include "hw/i386/x86.h"
> +#include "hw/i386/pc.h"
> +#include "target/i386/cpu.h"
> +#include "hw/timer/i8254.h"
> +#include "hw/timer/mc146818rtc.h"
> +#include "hw/char/serial.h"
> +#include "hw/i386/topology.h"
> +#include "hw/i386/e820.h"
> +#include "hw/i386/fw_cfg.h"
> +#include "hw/virtio/virtio-mmio.h"
> +
> +#include "cpu.h"
> +#include "elf.h"
> +#include "pvh.h"
> +#include "kvm_i386.h"
> +#include "hw/xen/start_info.h"
> +
> +#define MICROVM_BIOS_FILENAME "bios-microvm.bin"
> +
> +static void microvm_set_rtc(MicrovmMachineState *mms, ISADevice *s)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(mms);
> +    int val;
> +
> +    val = MIN(x86ms->below_4g_mem_size / KiB, 640);
> +    rtc_set_memory(s, 0x15, val);
> +    rtc_set_memory(s, 0x16, val >> 8);
> +    /* extended memory (next 64MiB) */
> +    if (x86ms->below_4g_mem_size > 1 * MiB) {
> +        val = (x86ms->below_4g_mem_size - 1 * MiB) / KiB;
> +    } else {
> +        val = 0;
> +    }
> +    if (val > 65535) {
> +        val = 65535;
> +    }
> +    rtc_set_memory(s, 0x17, val);
> +    rtc_set_memory(s, 0x18, val >> 8);
> +    rtc_set_memory(s, 0x30, val);
> +    rtc_set_memory(s, 0x31, val >> 8);
> +    /* memory between 16MiB and 4GiB */
> +    if (x86ms->below_4g_mem_size > 16 * MiB) {
> +        val = (x86ms->below_4g_mem_size - 16 * MiB) / (64 * KiB);
> +    } else {
> +        val = 0;
> +    }
> +    if (val > 65535) {
> +        val = 65535;
> +    }
> +    rtc_set_memory(s, 0x34, val);
> +    rtc_set_memory(s, 0x35, val >> 8);
> +    /* memory above 4GiB */
> +    val = x86ms->above_4g_mem_size / 65536;
> +    rtc_set_memory(s, 0x5b, val);
> +    rtc_set_memory(s, 0x5c, val >> 8);
> +    rtc_set_memory(s, 0x5d, val >> 16);
> +}
> +
> +static void microvm_devices_init(MicrovmMachineState *mms)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(mms);
> +    ISABus *isa_bus;
> +    ISADevice *rtc_state;
> +    GSIState *gsi_state;
> +    qemu_irq *i8259;
> +    int i;
> +
> +    gsi_state = g_malloc0(sizeof(*gsi_state));
> +    x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
> +
> +    isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
> +                          &error_abort);
> +    isa_bus_irqs(isa_bus, x86ms->gsi);
> +
> +    i8259 = i8259_init(isa_bus, pc_allocate_cpu_irq());
> +
> +    for (i = 0; i < ISA_NUM_IRQS; i++) {
> +        gsi_state->i8259_irq[i] = i8259[i];
> +    }
> +
> +    ioapic_init_gsi(gsi_state, "machine");
> +
> +    if (mms->rtc_enabled) {
> +        rtc_state = mc146818_rtc_init(isa_bus, 2000, NULL);
> +        microvm_set_rtc(mms, rtc_state);
> +    }
> +
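The CMOS memory-size fields that microvm_set_rtc() fills can be checked in isolation. The sketch below uses a plain 128-byte array in place of the MC146818 device and rtc_set_memory(). cmos_set16() and cmos_fill_memory() are hypothetical names. The register offsets, units and clamping follow the patch.

```c
#include <assert.h>
#include <stdint.h>

#define KiB (UINT64_C(1) << 10)
#define MiB (UINT64_C(1) << 20)

/* Store the low 16 bits of 'val' little-endian at 'reg'. */
static void cmos_set16(uint8_t cmos[128], int reg, uint64_t val)
{
    cmos[reg] = (uint8_t)val;
    cmos[reg + 1] = (uint8_t)(val >> 8);
}

/* Model of microvm_set_rtc() writing into a plain CMOS image. */
static void cmos_fill_memory(uint8_t cmos[128], uint64_t below_4g,
                             uint64_t above_4g)
{
    uint64_t val;

    /* 0x15/0x16: conventional memory in KiB, at most 640. */
    val = below_4g / KiB;
    cmos_set16(cmos, 0x15, val > 640 ? 640 : val);

    /* 0x17/0x18 and 0x30/0x31: KiB above 1 MiB, saturated at 65535. */
    val = below_4g > 1 * MiB ? (below_4g - 1 * MiB) / KiB : 0;
    if (val > 65535) {
        val = 65535;
    }
    cmos_set16(cmos, 0x17, val);
    cmos_set16(cmos, 0x30, val);

    /* 0x34/0x35: 64 KiB blocks between 16 MiB and 4 GiB, saturated. */
    val = below_4g > 16 * MiB ? (below_4g - 16 * MiB) / (64 * KiB) : 0;
    if (val > 65535) {
        val = 65535;
    }
    cmos_set16(cmos, 0x34, val);

    /* 0x5b..0x5d: 64 KiB blocks above 4 GiB, as a 24-bit value. */
    val = above_4g / (64 * KiB);
    cmos[0x5b] = (uint8_t)val;
    cmos[0x5c] = (uint8_t)(val >> 8);
    cmos[0x5d] = (uint8_t)(val >> 16);
}
```

Take 512 MiB below 4 GiB as an example. The base-memory field reads 640 KiB, the KiB-above-1-MiB field saturates at 0xffff, and 0x34/0x35 hold 7936 (0x1f00) blocks.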

Maybe refactor that ...

> +    if (kvm_pit_in_kernel()) {
> +        kvm_pit_init(isa_bus, 0x40);
> +    } else {
> +        i8254_pit_init(isa_bus, 0x40, 0, NULL);
> +    }

... as an x86_pit_create() function?

> +
> +    kvmclock_create();
> +
> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
> +        int nirq = VIRTIO_IRQ_BASE + i;
> +        ISADevice *isadev = isa_create(isa_bus, TYPE_ISA_SERIAL);
> +        qemu_irq mmio_irq;
> +
> +        isa_init_irq(isadev, &mmio_irq, nirq);
> +        sysbus_create_simple("virtio-mmio",
> +                             VIRTIO_MMIO_BASE + i * 512,
> +                             x86ms->gsi[VIRTIO_IRQ_BASE + i]);
> +    }
> +
> +    g_free(i8259);

Not related to this patch, but the i8259_init() API is not clear:
does it return an allocated array of allocated qemu_irqs? Is it safe to
copy them to gsi_state and then free the array?
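Whether copying the elements and then freeing the array is safe depends only on what the array owns. The minimal model below assumes, as the question suggests, that the function returns a freshly allocated array of pointers whose targets are allocated separately. Irq, GsiModel, alloc_irq_array() and wire_gsi() are hypothetical, and plain malloc() stands in for QEMU's allocators.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_IRQS 16

/* Hypothetical stand-in for a qemu_irq: a separately allocated object. */
typedef struct Irq {
    int line;
} Irq;

/* Hypothetical stand-in for GSIState: it keeps copies of the pointers. */
typedef struct {
    Irq *i8259_irq[NUM_IRQS];
} GsiModel;

/* Returns a new array whose elements point to objects allocated
 * independently of the array itself. */
static Irq **alloc_irq_array(void)
{
    Irq **arr = malloc(NUM_IRQS * sizeof(*arr));

    if (!arr) {
        abort();
    }
    for (int i = 0; i < NUM_IRQS; i++) {
        arr[i] = malloc(sizeof(*arr[i]));
        if (!arr[i]) {
            abort();
        }
        arr[i]->line = i;
    }
    return arr;
}

/* Copy the element pointers, then free only the container. */
static void wire_gsi(GsiModel *gsi)
{
    Irq **arr = alloc_irq_array();

    assert(gsi != NULL);
    for (int i = 0; i < NUM_IRQS; i++) {
        gsi->i8259_irq[i] = arr[i];
    }
    free(arr);    /* the Irq objects are untouched and still reachable */
}
```

Freeing the array releases only the pointer slots. The copies in the GSI state therefore stay valid, provided nothing else frees the pointed-to objects.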

> +
> +    if (mms->isa_serial_enabled) {
> +        serial_hds_isa_init(isa_bus, 0, 1);
> +    }
> +
> +    if (bios_name == NULL) {
> +        bios_name = MICROVM_BIOS_FILENAME;
> +    }
> +    x86_system_rom_init(get_system_memory(), true);
> +}
> +
> +static void microvm_memory_init(MicrovmMachineState *mms)
> +{
> +    MachineState *machine = MACHINE(mms);
> +    X86MachineState *x86ms = X86_MACHINE(mms);
> +    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
> +    MemoryRegion *system_memory = get_system_memory();
> +    FWCfgState *fw_cfg;
> +    ram_addr_t lowmem;
> +    int i;
> +
> +    /*
> +     * Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
> +     * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
> +     * also known as MMCFG).
> +     * If it doesn't, we need to split it in chunks below and above 4G.
> +     * In any case, try to make sure that guest addresses aligned at
> +     * 1G boundaries get mapped to host addresses aligned at 1G boundaries.
> +     */
> +    if (machine->ram_size >= 0xb0000000) {
> +        lowmem = 0x80000000;
> +    } else {
> +        lowmem = 0xb0000000;
> +    }
> +
> +    /*
> +     * Handle the machine opt max-ram-below-4g.  It is basically doing
> +     * min(qemu limit, user limit).
> +     */
> +    if (!x86ms->max_ram_below_4g) {
> +        x86ms->max_ram_below_4g = 1ULL << 32; /* default: 4G */

Please use '4 * GiB' with no comment.
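With '4 * GiB' in place of '1ULL << 32', the below/above-4G split in this function reduces to a few comparisons. The standalone sketch below assumes MiB/GiB macros defined the way qemu/units.h defines them. split_ram() and struct ram_split are hypothetical names, and the memory-region and e820 calls are left out. The below-4G part is mapped at guest address 0 and any remainder at 4 GiB. max-ram-below-4g can only lower the split point, never raise it above 0xb0000000.

```c
#include <assert.h>
#include <stdint.h>

/* Unit macros written the way qemu/units.h defines them (assumed). */
#define MiB (INT64_C(1) << 20)
#define GiB (INT64_C(1) << 30)

struct ram_split {
    uint64_t below_4g;    /* mapped at guest physical address 0 */
    uint64_t above_4g;    /* mapped at guest physical address 4 GiB */
};

/* Model of the lowmem logic in microvm_memory_init();
 * max_ram_below_4g == 0 means the user did not set it. */
static struct ram_split split_ram(uint64_t ram_size,
                                  uint64_t max_ram_below_4g)
{
    struct ram_split s;
    uint64_t lowmem = ram_size >= 0xb0000000ULL ? 0x80000000ULL
                                                : 0xb0000000ULL;

    assert(ram_size > 0);
    if (!max_ram_below_4g) {
        max_ram_below_4g = 4 * GiB;
    }
    if (lowmem > max_ram_below_4g) {
        lowmem = max_ram_below_4g;
    }
    if (ram_size > lowmem) {
        s.below_4g = lowmem;
        s.above_4g = ram_size - lowmem;
    } else {
        s.below_4g = ram_size;
        s.above_4g = 0;
    }
    return s;
}
```

For example, 3 GiB of RAM is split into 2 GiB below 4G and 1 GiB above it, because any size from 0xb0000000 up drops the split point to 2 GiB.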

> +    }
> +    if (lowmem > x86ms->max_ram_below_4g) {
> +        lowmem = x86ms->max_ram_below_4g;
> +        if (machine->ram_size - lowmem > lowmem &&
> +            lowmem & (1 * GiB - 1)) {
> +            warn_report("There is possibly poor performance as the ram size"
> +                        " (0x%" PRIx64 ") is more than twice the size of"
> +                        " max-ram-below-4g (%"PRIu64") and"
> +                        " max-ram-below-4g is not a multiple of 1G.",
> +                        (uint64_t)machine->ram_size, x86ms->max_ram_below_4g);
> +        }
> +    }
> +
> +    if (machine->ram_size > lowmem) {
> +        x86ms->above_4g_mem_size = machine->ram_size - lowmem;
> +        x86ms->below_4g_mem_size = lowmem;
> +    } else {
> +        x86ms->above_4g_mem_size = 0;
> +        x86ms->below_4g_mem_size = machine->ram_size;
> +    }
> +
> +    ram = g_malloc(sizeof(*ram));
> +    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
> +                                         machine->ram_size);
> +
> +    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
> +    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
> +                             0, x86ms->below_4g_mem_size);
> +    memory_region_add_subregion(system_memory, 0, ram_below_4g);
> +
> +    e820_add_entry(0, x86ms->below_4g_mem_size, E820_RAM);
> +
> +    if (x86ms->above_4g_mem_size > 0) {
> +        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
> +        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
> +                                 x86ms->below_4g_mem_size,
> +                                 x86ms->above_4g_mem_size);
> +        memory_region_add_subregion(system_memory, 0x100000000ULL,
> +                                    ram_above_4g);
> +        e820_add_entry(0x100000000ULL, x86ms->above_4g_mem_size, E820_RAM);
> +    }
> +
> +    fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4,
> +                                &address_space_memory);
> +
> +    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
> +    fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)x86ms->apic_id_limit);
> +    fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)machine->ram_size);
> +    fw_cfg_add_i32(fw_cfg, FW_CFG_IRQ0_OVERRIDE, kvm_allows_irq0_override());
> +
> +    rom_set_fw(fw_cfg);
> +
> +    e820_create_fw_entry(fw_cfg);
> +
> +    load_linux(x86ms, fw_cfg, 0, true, true);
> +
> +    if (mms->option_roms_enabled) {
> +        for (i = 0; i < nb_option_roms; i++) {
> +            rom_add_option(option_rom[i].name, option_rom[i].bootindex);
> +        }
> +    }
> +
> +    x86ms->fw_cfg = fw_cfg;
> +    x86ms->ioapic_as = &address_space_memory;
> +}
> +
> +static gchar *microvm_get_mmio_cmdline(gchar *name)
> +{
> +    gchar *cmdline;
> +    gchar *separator;
> +    long int index;
> +    int ret;
> +
> +    separator = g_strrstr(name, ".");
> +    if (!separator) {
> +        return NULL;
> +    }
> +
> +    if (qemu_strtol(separator + 1, NULL, 10, &index) != 0) {
> +        return NULL;
> +    }
> +
> +    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
> +    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
> +                     " virtio_mmio.device=512@0x%lx:%ld",
> +                     VIRTIO_MMIO_BASE + index * 512,
> +                     VIRTIO_IRQ_BASE + index);
> +    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
> +        g_free(cmdline);
> +        return NULL;
> +    }
> +
> +    return cmdline;
> +}
> +
> +static void microvm_fix_kernel_cmdline(MachineState *machine)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(machine);
> +    BusState *bus;
> +    BusChild *kid;
> +    char *cmdline;
> +
> +    /*
> +     * Find MMIO transports with attached devices, and add them to the kernel
> +     * command line.
> +     *
> +     * Yes, this is a hack, but one that heavily improves the UX without
> +     * introducing any significant issues.
> +     */
> +    cmdline = g_strdup(machine->kernel_cmdline);
> +    bus = sysbus_get_default();
> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
> +        DeviceState *dev = kid->child;
> +        ObjectClass *class = object_get_class(OBJECT(dev));
> +
> +        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
> +
> +            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
> +                gchar *mmio_cmdline = microvm_get_mmio_cmdline(mmio_bus->name);
> +                if (mmio_cmdline) {
> +                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
> +                    g_free(mmio_cmdline);
> +                    g_free(cmdline);
> +                    cmdline = newcmd;
> +                }
> +            }
> +        }
> +    }
> +
> +    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
> +    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
> +}
> +
> +static void microvm_machine_state_init(MachineState *machine)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
> +    X86MachineState *x86ms = X86_MACHINE(machine);
> +    Error *local_err = NULL;
> +
> +    if (machine->kernel_filename == NULL) {
> +        error_report("missing kernel image file name, required by microvm");
> +        exit(1);
> +    }
> +
> +    microvm_memory_init(mms);
> +
> +    x86_cpus_init(x86ms, CPU_VERSION_LATEST);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        exit(1);
> +    }
> +
> +    microvm_devices_init(mms);
> +}
> +
> +static void microvm_machine_reset(MachineState *machine)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
> +    CPUState *cs;
> +    X86CPU *cpu;
> +
> +    if (mms->kernel_cmdline_enabled && !mms->kernel_cmdline_fixed) {
> +        microvm_fix_kernel_cmdline(machine);
> +        mms->kernel_cmdline_fixed = true;
> +    }
> +
> +    qemu_devices_reset();
> +
> +    CPU_FOREACH(cs) {
> +        cpu = X86_CPU(cs);
> +
> +        if (cpu->apic_state) {
> +            device_reset(cpu->apic_state);
> +        }
> +    }
> +}
> +
> +static bool microvm_machine_get_rtc(Object *obj, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    return mms->rtc_enabled;
> +}
> +
> +static void microvm_machine_set_rtc(Object *obj, bool value, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    mms->rtc_enabled = value;
> +}
> +
> +static bool microvm_machine_get_isa_serial(Object *obj, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    return mms->isa_serial_enabled;
> +}
> +
> +static void microvm_machine_set_isa_serial(Object *obj, bool value,
> +                                           Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    mms->isa_serial_enabled = value;
> +}
> +
> +static bool microvm_machine_get_option_roms(Object *obj, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    return mms->option_roms_enabled;
> +}
> +
> +static void microvm_machine_set_option_roms(Object *obj, bool value,
> +                                            Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    mms->option_roms_enabled = value;
> +}
> +
> +static bool microvm_machine_get_kernel_cmdline(Object *obj, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    return mms->kernel_cmdline_enabled;
> +}
> +
> +static void microvm_machine_set_kernel_cmdline(Object *obj, bool value,
> +                                               Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    mms->kernel_cmdline_enabled = value;
> +}
> +
> +static void microvm_machine_initfn(Object *obj)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    /* Configuration */
> +    mms->rtc_enabled = true;
> +    mms->isa_serial_enabled = true;
> +    mms->option_roms_enabled = true;
> +    mms->kernel_cmdline_enabled = true;
> +
> +    /* State */
> +    mms->kernel_cmdline_fixed = false;
> +}
> +
> +static void microvm_class_init(ObjectClass *oc, void *data)
> +{
> +    MachineClass *mc = MACHINE_CLASS(oc);
> +    NMIClass *nc = NMI_CLASS(oc);
> +
> +    mc->init = microvm_machine_state_init;
> +
> +    mc->family = "microvm_i386";
> +    mc->desc = "Microvm (i386)";
> +    mc->units_per_default_bus = 1;
> +    mc->no_floppy = 1;
> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");

Aren't these common to X86?

> +    mc->max_cpus = 288;
> +    mc->has_hotpluggable_cpus = false;
> +    mc->auto_enable_numa_with_memhp = false;
> +    mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
> +    mc->nvdimm_supported = false;
> +
> +    /* Avoid relying too much on kernel components */
> +    mc->default_kernel_irqchip_split = true;
> +
> +    /* Machine class handlers */
> +    mc->reset = microvm_machine_reset;
> +
> +    /* NMI handler */
> +    nc->nmi_monitor_handler = x86_nmi;
> +
> +    object_class_property_add_bool(oc, MICROVM_MACHINE_RTC,
> +                                   microvm_machine_get_rtc,
> +                                   microvm_machine_set_rtc,
> +                                   &error_abort);
> +    object_class_property_set_description(oc, MICROVM_MACHINE_RTC,
> +        "Set off to disable the instantiation of an MC146818 RTC",
> +        &error_abort);
> +
> +    object_class_property_add_bool(oc, MICROVM_MACHINE_ISA_SERIAL,
> +                                   microvm_machine_get_isa_serial,
> +                                   microvm_machine_set_isa_serial,
> +                                   &error_abort);
> +    object_class_property_set_description(oc, MICROVM_MACHINE_ISA_SERIAL,
> +        "Set off to disable the instantiation of an ISA serial port",
> +        &error_abort);
> +
> +    object_class_property_add_bool(oc, MICROVM_MACHINE_OPTION_ROMS,
> +                                   microvm_machine_get_option_roms,
> +                                   microvm_machine_set_option_roms,
> +                                   &error_abort);
> +    object_class_property_set_description(oc, MICROVM_MACHINE_OPTION_ROMS,
> +        "Set off to disable loading option ROMs", &error_abort);
> +
> +    object_class_property_add_bool(oc, MICROVM_MACHINE_KERNEL_CMDLINE,
> +                                   microvm_machine_get_kernel_cmdline,
> +                                   microvm_machine_set_kernel_cmdline,
> +                                   &error_abort);
> +    object_class_property_set_description(oc, MICROVM_MACHINE_KERNEL_CMDLINE,
> +        "Set off to disable adding virtio-mmio devices to the kernel cmdline",
> +        &error_abort);
> +}
> +
> +static const TypeInfo microvm_machine_info = {
> +    .name          = TYPE_MICROVM_MACHINE,
> +    .parent        = TYPE_X86_MACHINE,
> +    .instance_size = sizeof(MicrovmMachineState),
> +    .instance_init = microvm_machine_initfn,
> +    .class_size    = sizeof(MicrovmMachineClass),
> +    .class_init    = microvm_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +         { TYPE_NMI },

Isn't this inherited from TYPE_X86_MACHINE?

> +         { }
> +    },
> +};
> +
> +static void microvm_machine_init(void)
> +{
> +    type_register_static(&microvm_machine_info);
> +}
> +type_init(microvm_machine_init);
> diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
> new file mode 100644
> index 0000000000..04c8caf886
> --- /dev/null
> +++ b/include/hw/i386/microvm.h
> @@ -0,0 +1,80 @@
> +/*
> + * Copyright (c) 2018 Intel Corporation
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef HW_I386_MICROVM_H
> +#define HW_I386_MICROVM_H
> +
> +#include "qemu-common.h"
> +#include "exec/hwaddr.h"
> +#include "qemu/notify.h"
> +
> +#include "hw/boards.h"
> +#include "hw/i386/x86.h"
> +
> +/* Microvm memory layout */
> +#define PVH_START_INFO        0x6000
> +#define MEMMAP_START          0x7000
> +#define MODLIST_START         0x7800
> +#define BOOT_STACK_POINTER    0x8ff0
> +#define PML4_START            0x9000
> +#define PDPTE_START           0xa000
> +#define PDE_START             0xb000
> +#define KERNEL_CMDLINE_START  0x20000
> +#define EBDA_START            0x9fc00
> +#define HIMEM_START           0x100000
> +
> +/* Platform virtio definitions */
> +#define VIRTIO_MMIO_BASE      0xc0000000
> +#define VIRTIO_IRQ_BASE       5
> +#define VIRTIO_NUM_TRANSPORTS 8
> +#define VIRTIO_CMDLINE_MAXLEN 64
> +
> +/* Machine type options */
> +#define MICROVM_MACHINE_RTC            "rtc"
> +#define MICROVM_MACHINE_ISA_SERIAL     "isa-serial"
> +#define MICROVM_MACHINE_OPTION_ROMS    "option-roms"
> +#define MICROVM_MACHINE_KERNEL_CMDLINE "kernel-cmdline"
> +
> +typedef struct {
> +    X86MachineClass parent;
> +    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
> +                                           DeviceState *dev);
> +} MicrovmMachineClass;
> +
> +typedef struct {
> +    X86MachineState parent;
> +
> +    /* Machine type options */
> +    bool rtc_enabled;
> +    bool isa_serial_enabled;
> +    bool option_roms_enabled;
> +    bool kernel_cmdline_enabled;
> +
> +
> +    /* Machine state */
> +    bool kernel_cmdline_fixed;
> +} MicrovmMachineState;
> +
> +#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
> +#define MICROVM_MACHINE(obj) \
> +    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
> +#define MICROVM_MACHINE_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
> +#define MICROVM_MACHINE_CLASS(class) \
> +    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
> +
> +#endif
> 


^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-25 15:04       ` Sergio Lopez
@ 2019-09-25 16:46         ` Paolo Bonzini
  -1 siblings, 0 replies; 133+ messages in thread
From: Paolo Bonzini @ 2019-09-25 16:46 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm


[-- Attachment #1.1: Type: text/plain, Size: 1101 bytes --]

On 25/09/19 17:04, Sergio Lopez wrote:
> I'm going back to this level of the thread, because after your
> suggestion I took a deeper look at how things work around the PIC, and
> discovered I was completely wrong about my assumptions.
> 
> For virtio-mmio devices, given that we don't have the ability to
> configure vectors (as it's done in the PCI case) we're stuck with the
> ones provided by the platform PIC, which in the x86 case is the i8259
> (at least from Linux's perspective).
> 
> So we can get rid of the IOAPIC, but we need to keep the i8259 (we have
> both a userspace and a kernel implementation too, so it should be fine).

Hmm...  I would have thought the vectors are just GSIs, which will be
configured to the IOAPIC if it is present.  Maybe something is causing
Linux to ignore the IOAPIC?

> As for the PIT, we can omit it if we're running with KVM acceleration,
> as kvmclock will be used to calculate loops per jiffy and avoid the
> calibration, leaving it enabled otherwise.

Can you make it an OnOffAuto property, and default to on iff !KVM?
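A minimal sketch of how such a property could be resolved. It assumes the enum mirrors the value order of QEMU's QAPI OnOffAuto (auto, on, off), redefined here so the sketch builds on its own. The helper name microvm_want_pit() is hypothetical. In QEMU itself, the property would more likely be plumbed through visit_type_OnOffAuto() in its getter and setter:

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the value order of QEMU's QAPI-generated OnOffAuto enum. */
typedef enum OnOffAuto {
    ON_OFF_AUTO_AUTO,
    ON_OFF_AUTO_ON,
    ON_OFF_AUTO_OFF,
} OnOffAuto;

/*
 * Hypothetical helper: decide whether microvm should create an i8254 PIT.
 * Explicit on/off wins; "auto" resolves to on iff KVM is not in use, since
 * with KVM the guest can take loops-per-jiffy from kvmclock instead.
 */
static bool microvm_want_pit(OnOffAuto pit, bool kvm_enabled)
{
    switch (pit) {
    case ON_OFF_AUTO_ON:
        return true;
    case ON_OFF_AUTO_OFF:
        return false;
    case ON_OFF_AUTO_AUTO:
        break;
    default:
        assert(!"invalid OnOffAuto value");
    }
    return !kvm_enabled;
}
```

Keeping "auto" as the default means TCG users still get a PIT for calibration, while explicit on/off overrides remain available.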

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-25 16:46         ` Paolo Bonzini
@ 2019-09-26  6:23           ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-26  6:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm

[-- Attachment #1: Type: text/plain, Size: 2820 bytes --]


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 25/09/19 17:04, Sergio Lopez wrote:
>> I'm going back to this level of the thread, because after your
>> suggestion I took a deeper look at how things work around the PIC, and
>> discovered I was completely wrong about my assumptions.
>> 
>> For virtio-mmio devices, given that we don't have the ability to
>> configure vectors (as it's done in the PCI case) we're stuck with the
>> ones provided by the platform PIC, which in the x86 case is the i8259
>> (at least from Linux's perspective).
>> 
>> So we can get rid of the IOAPIC, but we need to keep the i8259 (we have
>> both a userspace and a kernel implementation too, so it should be fine).
>
> Hmm...  I would have thought the vectors are just GSIs, which will be
> configured to the IOAPIC if it is present.  Maybe something is causing
> Linux to ignore the IOAPIC?

Turns out it was a bug in microvm. I was writing 0 to FW_CFG_NB_CPUS
(because I was using x86ms->boot_cpus instead of ms->smp.cpus), which
led to a broken MP table, causing Linux to ignore it and, as a side
effect, to disable IOAPIC symmetric I/O mode.

After fixing it we can, indeed, boot without the i8259 \o/ :

/ # dmesg | grep legacy
[    0.074144] Using NULL legacy PIC
/ # cat /pr[   12.116930] random: fast init done
/ # cat /proc/interrupts 
           CPU0       CPU1       
  4:          0        278   IO-APIC   4-edge      ttyS0
 12:         48          0   IO-APIC  12-edge      virtio0
NMI:          0          0   Non-maskable interrupts
LOC:        124         98   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:        476        535   Rescheduling interrupts
CAL:          0         76   Function call interrupts
TLB:          0          0   TLB shootdowns
HYP:          0          0   Hypervisor callback interrupts
ERR:          0
MIS:          0
PIN:          0          0   Posted-interrupt notification event
NPI:          0          0   Nested posted-interrupt event
PIW:          0          0   Posted-interrupt wakeup event

There's still one problem. If the guest doesn't have TSC_DEADLINE_TIMER,
Linux hangs on APIC timer calibration. I'm looking for a way to work
around this. Worst case scenario, we can check for that feature and add
both PIC and PIT if it is missing.
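A sketch of the worst-case fallback described above. It assumes the decision is made from the CPUID leaf 1 ECX value exposed to the guest, where bit 24 advertises TSC-deadline support. In QEMU this would presumably be env->features[FEAT_1_ECX] tested against CPUID_EXT_TSC_DEADLINE_TIMER. The helper name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 24: local APIC timer supports TSC-deadline mode. */
#define CPUID_1_ECX_TSC_DEADLINE (1u << 24)

/*
 * Hypothetical policy: without TSC-deadline, keep the legacy i8259 PIC and
 * i8254 PIT so Linux can calibrate the APIC timer instead of hanging.
 * 'guest_cpuid_1_ecx' is the ECX value the guest sees for CPUID leaf 1.
 */
static bool microvm_needs_legacy_pic_pit(uint32_t guest_cpuid_1_ecx)
{
    return (guest_cpuid_1_ecx & CPUID_1_ECX_TSC_DEADLINE) == 0;
}
```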

>> As for the PIT, we can omit it if we're running with KVM acceleration,
>> as kvmclock will be used to calculate loops per jiffy and avoid the
>> calibration, leaving it enabled otherwise.
>
> Can you make it an OnOffAuto property, and default to on iff !KVM?

Sure.

Thanks,
Sergio.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 8/8] hw/i386: Introduce the microvm machine type
  2019-09-25 15:40     ` Philippe Mathieu-Daudé
@ 2019-09-26  6:34       ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-26  6:34 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, pbonzini, rth,
	ehabkost, lersek, kraxel, mtosatti, kvm

[-- Attachment #1: Type: text/plain, Size: 27719 bytes --]


Philippe Mathieu-Daudé <philmd@redhat.com> writes:

> On 9/24/19 2:44 PM, Sergio Lopez wrote:
>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> constructed after the machine model implemented by the latter.
>> 
>> It's main purpose is providing users a minimalist machine type free
>> from the burden of legacy compatibility, serving as a stepping stone
>> for future projects aiming at improving boot times, reducing the
>> attack surface and slimming down QEMU's footprint.
>> 
>> The microvm machine type supports the following devices:
>> 
>>  - ISA bus
>>  - i8259 PIC
>>  - LAPIC (implicit if using KVM)
>>  - IOAPIC (defaults to kernel_irqchip_split = true)
>>  - i8254 PIT
>>  - MC146818 RTC (optional)
>>  - kvmclock (if using KVM)
>>  - fw_cfg
>>  - One ISA serial port (optional)
>>  - Up to eight virtio-mmio devices (configured by the user)
>> 
>> It supports the following machine-specific options:
>> 
>> microvm.option-roms=bool (Set off to disable loading option ROMs)
>> microvm.isa-serial=bool (Set off to disable the instantiation of an ISA serial port)
>> microvm.rtc=bool (Set off to disable the instantiation of an MC146818 RTC)
>> microvm.kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
>> 
>> By default, microvm uses qboot as its BIOS, to obtain better boot
>> times, but it's also compatible with SeaBIOS.
>> 
>> As no current FW is able to boot from a block device using virtio-mmio
>> as its transport, a microvm-based VM needs to be run using a host-side
>> kernel and, optionally, an initrd image.
>> 
>> This is an example of instantiating a microvm VM with a virtio-mmio
>> based console:
>> 
>> qemu-system-x86_64 -M microvm \
>>  -enable-kvm -cpu host -m 512m -smp 2 \
>>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>>  -nodefaults -no-user-config -nographic \
>>  -chardev stdio,id=virtiocon0,server \
>>  -device virtio-serial-device \
>>  -device virtconsole,chardev=virtiocon0 \
>>  -drive id=test,file=test.img,format=raw,if=none \
>>  -device virtio-blk-device,drive=test \
>>  -netdev tap,id=tap0,script=no,downscript=no \
>>  -device virtio-net-device,netdev=tap0
>> 
>> This is another example, this time using an ISA serial port, useful
>> for debugging purposes:
>> 
>> qemu-system-x86_64 -M microvm \
>>  -enable-kvm -cpu host -m 512m -smp 2 \
>>  -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/vda" \
>>  -nodefaults -no-user-config -nographic \
>>  -serial stdio \
>>  -drive id=test,file=test.img,format=raw,if=none \
>>  -device virtio-blk-device,drive=test \
>>  -netdev tap,id=tap0,script=no,downscript=no \
>>  -device virtio-net-device,netdev=tap0
>> 
>> Finally, in this example a microvm VM is instantiated without RTC,
>> without an ISA serial port and without loading the option ROMs,
>> obtaining the smallest configuration:
>> 
>> qemu-system-x86_64 -M microvm,rtc=off,isa-serial=off,option-roms=off \
>>  -enable-kvm -cpu host -m 512m -smp 2 \
>>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>>  -nodefaults -no-user-config -nographic \
>>  -chardev stdio,id=virtiocon0,server \
>>  -device virtio-serial-device \
>>  -device virtconsole,chardev=virtiocon0 \
>>  -drive id=test,file=test.img,format=raw,if=none \
>>  -device virtio-blk-device,drive=test \
>>  -netdev tap,id=tap0,script=no,downscript=no \
>>  -device virtio-net-device,netdev=tap0
>> 
>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>> ---
>>  default-configs/i386-softmmu.mak |   1 +
>>  hw/i386/Kconfig                  |   4 +
>>  hw/i386/Makefile.objs            |   1 +
>>  hw/i386/microvm.c                | 512 +++++++++++++++++++++++++++++++
>>  include/hw/i386/microvm.h        |  80 +++++
>>  5 files changed, 598 insertions(+)
>>  create mode 100644 hw/i386/microvm.c
>>  create mode 100644 include/hw/i386/microvm.h
>> 
>> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
>> index cd5ea391e8..c27cdd98e9 100644
>> --- a/default-configs/i386-softmmu.mak
>> +++ b/default-configs/i386-softmmu.mak
>> @@ -26,3 +26,4 @@ CONFIG_ISAPC=y
>>  CONFIG_I440FX=y
>>  CONFIG_Q35=y
>>  CONFIG_ACPI_PCI=y
>> +CONFIG_MICROVM=y
>> \ No newline at end of file
>> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
>> index 6350438036..324e193dd8 100644
>> --- a/hw/i386/Kconfig
>> +++ b/hw/i386/Kconfig
>> @@ -88,6 +88,10 @@ config Q35
>>      select SMBIOS
>>      select FW_CFG_DMA
>>  
>> +config MICROVM
>> +    bool
>> +    select VIRTIO_MMIO
>> +
>>  config VTD
>>      bool
>>  
>> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
>> index 5b4b3a672e..bb17d54567 100644
>> --- a/hw/i386/Makefile.objs
>> +++ b/hw/i386/Makefile.objs
>> @@ -6,6 +6,7 @@ obj-y += pc.o
>>  obj-y += e820.o
>>  obj-$(CONFIG_I440FX) += pc_piix.o
>>  obj-$(CONFIG_Q35) += pc_q35.o
>> +obj-$(CONFIG_MICROVM) += microvm.o
>>  obj-y += fw_cfg.o pc_sysfw.o
>>  obj-y += x86-iommu.o
>>  obj-$(CONFIG_VTD) += intel_iommu.o
>> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
>> new file mode 100644
>> index 0000000000..4b494a1b27
>> --- /dev/null
>> +++ b/hw/i386/microvm.c
>> @@ -0,0 +1,512 @@
>> +/*
>> + * Copyright (c) 2018 Intel Corporation
>> + * Copyright (c) 2019 Red Hat, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2 or later, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/cutils.h"
>> +#include "qemu/units.h"
>> +#include "qapi/error.h"
>> +#include "qapi/visitor.h"
>> +#include "sysemu/sysemu.h"
>> +#include "sysemu/cpus.h"
>> +#include "sysemu/numa.h"
>> +#include "sysemu/reset.h"
>> +
>> +#include "hw/loader.h"
>> +#include "hw/irq.h"
>> +#include "hw/nmi.h"
>> +#include "hw/kvm/clock.h"
>> +#include "hw/i386/microvm.h"
>> +#include "hw/i386/x86.h"
>> +#include "hw/i386/pc.h"
>> +#include "target/i386/cpu.h"
>> +#include "hw/timer/i8254.h"
>> +#include "hw/timer/mc146818rtc.h"
>> +#include "hw/char/serial.h"
>> +#include "hw/i386/topology.h"
>> +#include "hw/i386/e820.h"
>> +#include "hw/i386/fw_cfg.h"
>> +#include "hw/virtio/virtio-mmio.h"
>> +
>> +#include "cpu.h"
>> +#include "elf.h"
>> +#include "pvh.h"
>> +#include "kvm_i386.h"
>> +#include "hw/xen/start_info.h"
>> +
>> +#define MICROVM_BIOS_FILENAME "bios-microvm.bin"
>> +
>> +static void microvm_set_rtc(MicrovmMachineState *mms, ISADevice *s)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(mms);
>> +    int val;
>> +
>> +    val = MIN(x86ms->below_4g_mem_size / KiB, 640);
>> +    rtc_set_memory(s, 0x15, val);
>> +    rtc_set_memory(s, 0x16, val >> 8);
>> +    /* extended memory (next 64MiB) */
>> +    if (x86ms->below_4g_mem_size > 1 * MiB) {
>> +        val = (x86ms->below_4g_mem_size - 1 * MiB) / KiB;
>> +    } else {
>> +        val = 0;
>> +    }
>> +    if (val > 65535) {
>> +        val = 65535;
>> +    }
>> +    rtc_set_memory(s, 0x17, val);
>> +    rtc_set_memory(s, 0x18, val >> 8);
>> +    rtc_set_memory(s, 0x30, val);
>> +    rtc_set_memory(s, 0x31, val >> 8);
>> +    /* memory between 16MiB and 4GiB */
>> +    if (x86ms->below_4g_mem_size > 16 * MiB) {
>> +        val = (x86ms->below_4g_mem_size - 16 * MiB) / (64 * KiB);
>> +    } else {
>> +        val = 0;
>> +    }
>> +    if (val > 65535) {
>> +        val = 65535;
>> +    }
>> +    rtc_set_memory(s, 0x34, val);
>> +    rtc_set_memory(s, 0x35, val >> 8);
>> +    /* memory above 4GiB */
>> +    val = x86ms->above_4g_mem_size / 65536;
>> +    rtc_set_memory(s, 0x5b, val);
>> +    rtc_set_memory(s, 0x5c, val >> 8);
>> +    rtc_set_memory(s, 0x5d, val >> 16);
>> +}
>> +
>> +static void microvm_devices_init(MicrovmMachineState *mms)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(mms);
>> +    ISABus *isa_bus;
>> +    ISADevice *rtc_state;
>> +    GSIState *gsi_state;
>> +    qemu_irq *i8259;
>> +    int i;
>> +
>> +    gsi_state = g_malloc0(sizeof(*gsi_state));
>> +    x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
>> +
>> +    isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
>> +                          &error_abort);
>> +    isa_bus_irqs(isa_bus, x86ms->gsi);
>> +
>> +    i8259 = i8259_init(isa_bus, pc_allocate_cpu_irq());
>> +
>> +    for (i = 0; i < ISA_NUM_IRQS; i++) {
>> +        gsi_state->i8259_irq[i] = i8259[i];
>> +    }
>> +
>> +    ioapic_init_gsi(gsi_state, "machine");
>> +
>> +    if (mms->rtc_enabled) {
>> +        rtc_state = mc146818_rtc_init(isa_bus, 2000, NULL);
>> +        microvm_set_rtc(mms, rtc_state);
>> +    }
>> +
>
> Maybe refactor that ...
>
>> +    if (kvm_pit_in_kernel()) {
>> +        kvm_pit_init(isa_bus, 0x40);
>> +    } else {
>> +        i8254_pit_init(isa_bus, 0x40, 0, NULL);
>> +    }
>
> ... as an x86_pit_create() function?

This is bound to change in v5, as we want to avoid the legacy PIC+PIT
when possible.

>> +
>> +    kvmclock_create();
>> +
>> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
>> +        int nirq = VIRTIO_IRQ_BASE + i;
>> +        ISADevice *isadev = isa_create(isa_bus, TYPE_ISA_SERIAL);
>> +        qemu_irq mmio_irq;
>> +
>> +        isa_init_irq(isadev, &mmio_irq, nirq);
>> +        sysbus_create_simple("virtio-mmio",
>> +                             VIRTIO_MMIO_BASE + i * 512,
>> +                             x86ms->gsi[VIRTIO_IRQ_BASE + i]);
>> +    }
>> +
>> +    g_free(i8259);
>
> Not related to this patch, but i8259_init() API is not clear,
> it returns an allocated array of allocated qemu_irqs? Is it safe to copy
> them to gsi_state then free the array?

That's how I understand it, and also how it's used elsewhere.
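The following illustrates why the copy-then-free pattern should be safe. It assumes that i8259_init() returns a freshly allocated array of handles to inputs owned by the PIC device itself. Under that assumption, freeing the array releases only the container, not the IRQ objects. This is a standalone model with stand-in types and a hypothetical fake_i8259_init(), not QEMU code:

```c
#include <assert.h>
#include <stdlib.h>

#define ISA_NUM_IRQS 16

/* Stand-in for QEMU's IRQState; as in QEMU, qemu_irq is a pointer to it. */
typedef struct IRQState {
    int n;
} *qemu_irq;

/* Stand-in for the PIC device, which owns its IRQ input objects. */
static struct IRQState pic_inputs[ISA_NUM_IRQS];

/* Stand-in for i8259_init(): returns a new array of handles to those inputs. */
static qemu_irq *fake_i8259_init(void)
{
    qemu_irq *irq_set = calloc(ISA_NUM_IRQS, sizeof(*irq_set));
    int i;

    assert(irq_set != NULL);
    for (i = 0; i < ISA_NUM_IRQS; i++) {
        pic_inputs[i].n = i;
        irq_set[i] = &pic_inputs[i];
    }
    return irq_set;
}

typedef struct {
    qemu_irq i8259_irq[ISA_NUM_IRQS];
} GSIState;

/* Same pattern as microvm_devices_init(): copy the handles, free the array. */
static void wire_i8259(GSIState *gsi_state)
{
    qemu_irq *i8259 = fake_i8259_init();
    int i;

    for (i = 0; i < ISA_NUM_IRQS; i++) {
        gsi_state->i8259_irq[i] = i8259[i];
    }
    free(i8259); /* releases the container only; the inputs stay valid */
}
```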

>> +
>> +    if (mms->isa_serial_enabled) {
>> +        serial_hds_isa_init(isa_bus, 0, 1);
>> +    }
>> +
>> +    if (bios_name == NULL) {
>> +        bios_name = MICROVM_BIOS_FILENAME;
>> +    }
>> +    x86_system_rom_init(get_system_memory(), true);
>> +}
>> +
>> +static void microvm_memory_init(MicrovmMachineState *mms)
>> +{
>> +    MachineState *machine = MACHINE(mms);
>> +    X86MachineState *x86ms = X86_MACHINE(mms);
>> +    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
>> +    MemoryRegion *system_memory = get_system_memory();
>> +    FWCfgState *fw_cfg;
>> +    ram_addr_t lowmem;
>> +    int i;
>> +
>> +    /*
>> +     * Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
>> +     * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
>> +     * also known as MMCFG).
>> +     * If it doesn't, we need to split it in chunks below and above 4G.
>> +     * In any case, try to make sure that guest addresses aligned at
>> +     * 1G boundaries get mapped to host addresses aligned at 1G boundaries.
>> +     */
>> +    if (machine->ram_size >= 0xb0000000) {
>> +        lowmem = 0x80000000;
>> +    } else {
>> +        lowmem = 0xb0000000;
>> +    }
>> +
>> +    /*
>> +     * Handle the machine opt max-ram-below-4g.  It is basically doing
>> +     * min(qemu limit, user limit).
>> +     */
>> +    if (!x86ms->max_ram_below_4g) {
>> +        x86ms->max_ram_below_4g = 1ULL << 32; /* default: 4G */
>
> Please use '4 * GiB' with no comment.

Ack (this is copy-pasted from pc_q35.c).
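For the record, here is a standalone sketch of the low/high RAM split computed in microvm_memory_init(), written in the '4 * GiB' style suggested in the review. MiB and GiB are reproduced as defined in qemu/units.h so the sketch compiles on its own. RamSplit and split_ram() are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Same definitions as QEMU's qemu/units.h */
#define MiB     (INT64_C(1) << 20)
#define GiB     (INT64_C(1) << 30)

typedef struct {
    uint64_t below_4g;
    uint64_t above_4g;
} RamSplit;

/*
 * Mirrors the lowmem logic of microvm_memory_init(): keep 2 GiB (0x80000000)
 * or 2.75 GiB (0xb0000000) below 4 GiB, clamp to max-ram-below-4g, and map
 * the remainder above 4 GiB. A max_ram_below_4g of 0 means "not set".
 */
static RamSplit split_ram(uint64_t ram_size, uint64_t max_ram_below_4g)
{
    uint64_t lowmem = ram_size >= 0xb0000000 ? 0x80000000 : 0xb0000000;
    RamSplit s;

    if (!max_ram_below_4g) {
        max_ram_below_4g = 4 * GiB;
    }
    if (lowmem > max_ram_below_4g) {
        lowmem = max_ram_below_4g;
    }
    if (ram_size > lowmem) {
        s.above_4g = ram_size - lowmem;
        s.below_4g = lowmem;
    } else {
        s.above_4g = 0;
        s.below_4g = ram_size;
    }
    return s;
}
```

With the default limit, 4 GiB of RAM ends up as 2 GiB below and 2 GiB above the 4 GiB boundary, while 512 MiB stays entirely below it.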

>> +    }
>> +    if (lowmem > x86ms->max_ram_below_4g) {
>> +        lowmem = x86ms->max_ram_below_4g;
>> +        if (machine->ram_size - lowmem > lowmem &&
>> +            lowmem & (1 * GiB - 1)) {
>> +            warn_report("There is possibly poor performance as the ram size "
>> +                        "(0x%" PRIx64 ") is more than twice the size of"
>> +                        " max-ram-below-4g (%"PRIu64") and"
>> +                        " max-ram-below-4g is not a multiple of 1G.",
>> +                        (uint64_t)machine->ram_size, x86ms->max_ram_below_4g);
>> +        }
>> +    }
>> +
>> +    if (machine->ram_size > lowmem) {
>> +        x86ms->above_4g_mem_size = machine->ram_size - lowmem;
>> +        x86ms->below_4g_mem_size = lowmem;
>> +    } else {
>> +        x86ms->above_4g_mem_size = 0;
>> +        x86ms->below_4g_mem_size = machine->ram_size;
>> +    }
>> +
>> +    ram = g_malloc(sizeof(*ram));
>> +    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
>> +                                         machine->ram_size);
>> +
>> +    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
>> +    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
>> +                             0, x86ms->below_4g_mem_size);
>> +    memory_region_add_subregion(system_memory, 0, ram_below_4g);
>> +
>> +    e820_add_entry(0, x86ms->below_4g_mem_size, E820_RAM);
>> +
>> +    if (x86ms->above_4g_mem_size > 0) {
>> +        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
>> +        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
>> +                                 x86ms->below_4g_mem_size,
>> +                                 x86ms->above_4g_mem_size);
>> +        memory_region_add_subregion(system_memory, 0x100000000ULL,
>> +                                    ram_above_4g);
>> +        e820_add_entry(0x100000000ULL, x86ms->above_4g_mem_size, E820_RAM);
>> +    }
>> +
>> +    fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4,
>> +                                &address_space_memory);
>> +
>> +    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
>> +    fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)x86ms->apic_id_limit);
>> +    fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)machine->ram_size);
>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_IRQ0_OVERRIDE, kvm_allows_irq0_override());
>> +
>> +    rom_set_fw(fw_cfg);
>> +
>> +    e820_create_fw_entry(fw_cfg);
>> +
>> +    load_linux(x86ms, fw_cfg, 0, true, true);
>> +
>> +    if (mms->option_roms_enabled) {
>> +        for (i = 0; i < nb_option_roms; i++) {
>> +            rom_add_option(option_rom[i].name, option_rom[i].bootindex);
>> +        }
>> +    }
>> +
>> +    x86ms->fw_cfg = fw_cfg;
>> +    x86ms->ioapic_as = &address_space_memory;
>> +}
>> +
>> +static gchar *microvm_get_mmio_cmdline(gchar *name)
>> +{
>> +    gchar *cmdline;
>> +    gchar *separator;
>> +    long int index;
>> +    int ret;
>> +
>> +    separator = g_strrstr(name, ".");
>> +    if (!separator) {
>> +        return NULL;
>> +    }
>> +
>> +    if (qemu_strtol(separator + 1, NULL, 10, &index) != 0) {
>> +        return NULL;
>> +    }
>> +
>> +    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
>> +    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
>> +                     " virtio_mmio.device=512@0x%lx:%ld",
>> +                     VIRTIO_MMIO_BASE + index * 512,
>> +                     VIRTIO_IRQ_BASE + index);
>> +    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
>> +        g_free(cmdline);
>> +        return NULL;
>> +    }
>> +
>> +    return cmdline;
>> +}
>> +
>> +static void microvm_fix_kernel_cmdline(MachineState *machine)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(machine);
>> +    BusState *bus;
>> +    BusChild *kid;
>> +    char *cmdline;
>> +
>> +    /*
>> +     * Find MMIO transports with attached devices, and add them to the kernel
>> +     * command line.
>> +     *
>> +     * Yes, this is a hack, but one that heavily improves the UX without
>> +     * introducing any significant issues.
>> +     */
>> +    cmdline = g_strdup(machine->kernel_cmdline);
>> +    bus = sysbus_get_default();
>> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
>> +        DeviceState *dev = kid->child;
>> +        ObjectClass *class = object_get_class(OBJECT(dev));
>> +
>> +        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
>> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
>> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
>> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
>> +
>> +            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
>> +                gchar *mmio_cmdline = microvm_get_mmio_cmdline(mmio_bus->name);
>> +                if (mmio_cmdline) {
>> +                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
>> +                    g_free(mmio_cmdline);
>> +                    g_free(cmdline);
>> +                    cmdline = newcmd;
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
>> +    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
>> +}
>> +
>> +static void microvm_machine_state_init(MachineState *machine)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>> +    X86MachineState *x86ms = X86_MACHINE(machine);
>> +    Error *local_err = NULL;
>> +
>> +    if (machine->kernel_filename == NULL) {
>> +        error_report("missing kernel image file name, required by microvm");
>> +        exit(1);
>> +    }
>> +
>> +    microvm_memory_init(mms);
>> +
>> +    x86_cpus_init(x86ms, CPU_VERSION_LATEST);
>> +    if (local_err) {
>> +        error_report_err(local_err);
>> +        exit(1);
>> +    }
>> +
>> +    microvm_devices_init(mms);
>> +}
>> +
>> +static void microvm_machine_reset(MachineState *machine)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>> +    CPUState *cs;
>> +    X86CPU *cpu;
>> +
>> +    if (mms->kernel_cmdline_enabled && !mms->kernel_cmdline_fixed) {
>> +        microvm_fix_kernel_cmdline(machine);
>> +        mms->kernel_cmdline_fixed = true;
>> +    }
>> +
>> +    qemu_devices_reset();
>> +
>> +    CPU_FOREACH(cs) {
>> +        cpu = X86_CPU(cs);
>> +
>> +        if (cpu->apic_state) {
>> +            device_reset(cpu->apic_state);
>> +        }
>> +    }
>> +}
>> +
>> +static bool microvm_machine_get_rtc(Object *obj, Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    return mms->rtc_enabled;
>> +}
>> +
>> +static void microvm_machine_set_rtc(Object *obj, bool value, Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    mms->rtc_enabled = value;
>> +}
>> +
>> +static bool microvm_machine_get_isa_serial(Object *obj, Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    return mms->isa_serial_enabled;
>> +}
>> +
>> +static void microvm_machine_set_isa_serial(Object *obj, bool value,
>> +                                           Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    mms->isa_serial_enabled = value;
>> +}
>> +
>> +static bool microvm_machine_get_option_roms(Object *obj, Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    return mms->option_roms_enabled;
>> +}
>> +
>> +static void microvm_machine_set_option_roms(Object *obj, bool value,
>> +                                            Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    mms->option_roms_enabled = value;
>> +}
>> +
>> +static bool microvm_machine_get_kernel_cmdline(Object *obj, Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    return mms->kernel_cmdline_enabled;
>> +}
>> +
>> +static void microvm_machine_set_kernel_cmdline(Object *obj, bool value,
>> +                                               Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    mms->kernel_cmdline_enabled = value;
>> +}
>> +
>> +static void microvm_machine_initfn(Object *obj)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    /* Configuration */
>> +    mms->rtc_enabled = true;
>> +    mms->isa_serial_enabled = true;
>> +    mms->option_roms_enabled = true;
>> +    mms->kernel_cmdline_enabled = true;
>> +
>> +    /* State */
>> +    mms->kernel_cmdline_fixed = false;
>> +}
>> +
>> +static void microvm_class_init(ObjectClass *oc, void *data)
>> +{
>> +    MachineClass *mc = MACHINE_CLASS(oc);
>> +    NMIClass *nc = NMI_CLASS(oc);
>> +
>> +    mc->init = microvm_machine_state_init;
>> +
>> +    mc->family = "microvm_i386";
>> +    mc->desc = "Microvm (i386)";
>> +    mc->units_per_default_bus = 1;
>> +    mc->no_floppy = 1;
>> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
>> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
>
> Aren't these common to X86?

Hm... Those seem to be leftovers from NEMU's virt.c. I'll check if those
are really needed.

>> +    mc->max_cpus = 288;
>> +    mc->has_hotpluggable_cpus = false;
>> +    mc->auto_enable_numa_with_memhp = false;
>> +    mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
>> +    mc->nvdimm_supported = false;
>> +
>> +    /* Avoid relying too much on kernel components */
>> +    mc->default_kernel_irqchip_split = true;
>> +
>> +    /* Machine class handlers */
>> +    mc->reset = microvm_machine_reset;
>> +
>> +    /* NMI handler */
>> +    nc->nmi_monitor_handler = x86_nmi;
>> +
>> +    object_class_property_add_bool(oc, MICROVM_MACHINE_RTC,
>> +                                   microvm_machine_get_rtc,
>> +                                   microvm_machine_set_rtc,
>> +                                   &error_abort);
>> +    object_class_property_set_description(oc, MICROVM_MACHINE_RTC,
>> +        "Set off to disable the instantiation of an MC146818 RTC",
>> +        &error_abort);
>> +
>> +    object_class_property_add_bool(oc, MICROVM_MACHINE_ISA_SERIAL,
>> +                                   microvm_machine_get_isa_serial,
>> +                                   microvm_machine_set_isa_serial,
>> +                                   &error_abort);
>> +    object_class_property_set_description(oc, MICROVM_MACHINE_ISA_SERIAL,
>> +        "Set off to disable the instantiation of an ISA serial port",
>> +        &error_abort);
>> +
>> +    object_class_property_add_bool(oc, MICROVM_MACHINE_OPTION_ROMS,
>> +                                   microvm_machine_get_option_roms,
>> +                                   microvm_machine_set_option_roms,
>> +                                   &error_abort);
>> +    object_class_property_set_description(oc, MICROVM_MACHINE_OPTION_ROMS,
>> +        "Set off to disable loading option ROMs", &error_abort);
>> +
>> +    object_class_property_add_bool(oc, MICROVM_MACHINE_KERNEL_CMDLINE,
>> +                                   microvm_machine_get_kernel_cmdline,
>> +                                   microvm_machine_set_kernel_cmdline,
>> +                                   &error_abort);
>> +    object_class_property_set_description(oc, MICROVM_MACHINE_KERNEL_CMDLINE,
>> +        "Set off to disable adding virtio-mmio devices to the kernel cmdline",
>> +        &error_abort);
>> +}
>> +
>> +static const TypeInfo microvm_machine_info = {
>> +    .name          = TYPE_MICROVM_MACHINE,
>> +    .parent        = TYPE_X86_MACHINE,
>> +    .instance_size = sizeof(MicrovmMachineState),
>> +    .instance_init = microvm_machine_initfn,
>> +    .class_size    = sizeof(MicrovmMachineClass),
>> +    .class_init    = microvm_class_init,
>> +    .interfaces = (InterfaceInfo[]) {
>> +         { TYPE_NMI },
>
> Isn't this inherited from TYPE_X86_MACHINE?

Good question. Should we assume all x86-based machines have NMI, or just
leave it to each board?

Thanks,
Sergio.

>> +         { }
>> +    },
>> +};
>> +
>> +static void microvm_machine_init(void)
>> +{
>> +    type_register_static(&microvm_machine_info);
>> +}
>> +type_init(microvm_machine_init);
>> diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
>> new file mode 100644
>> index 0000000000..04c8caf886
>> --- /dev/null
>> +++ b/include/hw/i386/microvm.h
>> @@ -0,0 +1,80 @@
>> +/*
>> + * Copyright (c) 2018 Intel Corporation
>> + * Copyright (c) 2019 Red Hat, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2 or later, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef HW_I386_MICROVM_H
>> +#define HW_I386_MICROVM_H
>> +
>> +#include "qemu-common.h"
>> +#include "exec/hwaddr.h"
>> +#include "qemu/notify.h"
>> +
>> +#include "hw/boards.h"
>> +#include "hw/i386/x86.h"
>> +
>> +/* Microvm memory layout */
>> +#define PVH_START_INFO        0x6000
>> +#define MEMMAP_START          0x7000
>> +#define MODLIST_START         0x7800
>> +#define BOOT_STACK_POINTER    0x8ff0
>> +#define PML4_START            0x9000
>> +#define PDPTE_START           0xa000
>> +#define PDE_START             0xb000
>> +#define KERNEL_CMDLINE_START  0x20000
>> +#define EBDA_START            0x9fc00
>> +#define HIMEM_START           0x100000
>> +
>> +/* Platform virtio definitions */
>> +#define VIRTIO_MMIO_BASE      0xc0000000
>> +#define VIRTIO_IRQ_BASE       5
>> +#define VIRTIO_NUM_TRANSPORTS 8
>> +#define VIRTIO_CMDLINE_MAXLEN 64
>> +
>> +/* Machine type options */
>> +#define MICROVM_MACHINE_RTC            "rtc"
>> +#define MICROVM_MACHINE_ISA_SERIAL     "isa-serial"
>> +#define MICROVM_MACHINE_OPTION_ROMS    "option-roms"
>> +#define MICROVM_MACHINE_KERNEL_CMDLINE "kernel-cmdline"
>> +
>> +typedef struct {
>> +    X86MachineClass parent;
>> +    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
>> +                                           DeviceState *dev);
>> +} MicrovmMachineClass;
>> +
>> +typedef struct {
>> +    X86MachineState parent;
>> +
>> +    /* Machine type options */
>> +    bool rtc_enabled;
>> +    bool isa_serial_enabled;
>> +    bool option_roms_enabled;
>> +    bool kernel_cmdline_enabled;
>> +
>> +
>> +    /* Machine state */
>> +    bool kernel_cmdline_fixed;
>> +} MicrovmMachineState;
>> +
>> +#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
>> +#define MICROVM_MACHINE(obj) \
>> +    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
>> +#define MICROVM_MACHINE_GET_CLASS(obj) \
>> +    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
>> +#define MICROVM_MACHINE_CLASS(class) \
>> +    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
>> +
>> +#endif
>> 



* Re: [PATCH v4 8/8] hw/i386: Introduce the microvm machine type
@ 2019-09-26  6:34       ` Sergio Lopez
  0 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-26  6:34 UTC
  To: Philippe Mathieu-Daudé
  Cc: ehabkost, kvm, mst, mtosatti, qemu-devel, kraxel, pbonzini,
	imammedo, lersek, rth



Philippe Mathieu-Daudé <philmd@redhat.com> writes:

> On 9/24/19 2:44 PM, Sergio Lopez wrote:
>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> constructed after the machine model implemented by the latter.
>> 
>> It's main purpose is providing users a minimalist machine type free
>> from the burden of legacy compatibility, serving as a stepping stone
>> for future projects aiming at improving boot times, reducing the
>> attack surface and slimming down QEMU's footprint.
>> 
>> The microvm machine type supports the following devices:
>> 
>>  - ISA bus
>>  - i8259 PIC
>>  - LAPIC (implicit if using KVM)
>>  - IOAPIC (defaults to kernel_irqchip_split = true)
>>  - i8254 PIT
>>  - MC146818 RTC (optional)
>>  - kvmclock (if using KVM)
>>  - fw_cfg
>>  - One ISA serial port (optional)
>>  - Up to eight virtio-mmio devices (configured by the user)
>> 
>> It supports the following machine-specific options:
>> 
>> microvm.option-roms=bool (Set off to disable loading option ROMs)
>> microvm.isa-serial=bool (Set off to disable the instantiation of an ISA serial port)
>> microvm.rtc=bool (Set off to disable the instantiation of an MC146818 RTC)
>> microvm.kernel-cmdline=bool (Set off to disable adding virtio-mmio devices to the kernel cmdline)
>> 
>> By default, microvm uses qboot as its BIOS, to obtain better boot
>> times, but it's also compatible with SeaBIOS.
>> 
>> As no current FW is able to boot from a block device using virtio-mmio
>> as its transport, a microvm-based VM needs to be run using a host-side
>> kernel and, optionally, an initrd image.
>> 
>> This is an example of instantiating a microvm VM with a virtio-mmio
>> based console:
>> 
>> qemu-system-x86_64 -M microvm \
>>  -enable-kvm -cpu host -m 512m -smp 2 \
>>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>>  -nodefaults -no-user-config -nographic \
>>  -chardev stdio,id=virtiocon0,server \
>>  -device virtio-serial-device \
>>  -device virtconsole,chardev=virtiocon0 \
>>  -drive id=test,file=test.img,format=raw,if=none \
>>  -device virtio-blk-device,drive=test \
>>  -netdev tap,id=tap0,script=no,downscript=no \
>>  -device virtio-net-device,netdev=tap0
>> 
>> This is another example, this time using an ISA serial port, useful
>> for debugging purposes:
>> 
>> qemu-system-x86_64 -M microvm \
>>  -enable-kvm -cpu host -m 512m -smp 2 \
>>  -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/vda" \
>>  -nodefaults -no-user-config -nographic \
>>  -serial stdio \
>>  -drive id=test,file=test.img,format=raw,if=none \
>>  -device virtio-blk-device,drive=test \
>>  -netdev tap,id=tap0,script=no,downscript=no \
>>  -device virtio-net-device,netdev=tap0
>> 
>> Finally, in this example a microvm VM is instantiated without RTC,
>> without an ISA serial port and without loading the option ROMs,
>> obtaining the smallest configuration:
>> 
>> qemu-system-x86_64 -M microvm,rtc=off,isa-serial=off,option-roms=off \
>>  -enable-kvm -cpu host -m 512m -smp 2 \
>>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>>  -nodefaults -no-user-config -nographic \
>>  -chardev stdio,id=virtiocon0,server \
>>  -device virtio-serial-device \
>>  -device virtconsole,chardev=virtiocon0 \
>>  -drive id=test,file=test.img,format=raw,if=none \
>>  -device virtio-blk-device,drive=test \
>>  -netdev tap,id=tap0,script=no,downscript=no \
>>  -device virtio-net-device,netdev=tap0
>> 
>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>> ---
>>  default-configs/i386-softmmu.mak |   1 +
>>  hw/i386/Kconfig                  |   4 +
>>  hw/i386/Makefile.objs            |   1 +
>>  hw/i386/microvm.c                | 512 +++++++++++++++++++++++++++++++
>>  include/hw/i386/microvm.h        |  80 +++++
>>  5 files changed, 598 insertions(+)
>>  create mode 100644 hw/i386/microvm.c
>>  create mode 100644 include/hw/i386/microvm.h
>> 
>> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
>> index cd5ea391e8..c27cdd98e9 100644
>> --- a/default-configs/i386-softmmu.mak
>> +++ b/default-configs/i386-softmmu.mak
>> @@ -26,3 +26,4 @@ CONFIG_ISAPC=y
>>  CONFIG_I440FX=y
>>  CONFIG_Q35=y
>>  CONFIG_ACPI_PCI=y
>> +CONFIG_MICROVM=y
>> \ No newline at end of file
>> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
>> index 6350438036..324e193dd8 100644
>> --- a/hw/i386/Kconfig
>> +++ b/hw/i386/Kconfig
>> @@ -88,6 +88,10 @@ config Q35
>>      select SMBIOS
>>      select FW_CFG_DMA
>>  
>> +config MICROVM
>> +    bool
>> +    select VIRTIO_MMIO
>> +
>>  config VTD
>>      bool
>>  
>> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
>> index 5b4b3a672e..bb17d54567 100644
>> --- a/hw/i386/Makefile.objs
>> +++ b/hw/i386/Makefile.objs
>> @@ -6,6 +6,7 @@ obj-y += pc.o
>>  obj-y += e820.o
>>  obj-$(CONFIG_I440FX) += pc_piix.o
>>  obj-$(CONFIG_Q35) += pc_q35.o
>> +obj-$(CONFIG_MICROVM) += microvm.o
>>  obj-y += fw_cfg.o pc_sysfw.o
>>  obj-y += x86-iommu.o
>>  obj-$(CONFIG_VTD) += intel_iommu.o
>> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
>> new file mode 100644
>> index 0000000000..4b494a1b27
>> --- /dev/null
>> +++ b/hw/i386/microvm.c
>> @@ -0,0 +1,512 @@
>> +/*
>> + * Copyright (c) 2018 Intel Corporation
>> + * Copyright (c) 2019 Red Hat, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2 or later, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/error-report.h"
>> +#include "qemu/cutils.h"
>> +#include "qemu/units.h"
>> +#include "qapi/error.h"
>> +#include "qapi/visitor.h"
>> +#include "sysemu/sysemu.h"
>> +#include "sysemu/cpus.h"
>> +#include "sysemu/numa.h"
>> +#include "sysemu/reset.h"
>> +
>> +#include "hw/loader.h"
>> +#include "hw/irq.h"
>> +#include "hw/nmi.h"
>> +#include "hw/kvm/clock.h"
>> +#include "hw/i386/microvm.h"
>> +#include "hw/i386/x86.h"
>> +#include "hw/i386/pc.h"
>> +#include "target/i386/cpu.h"
>> +#include "hw/timer/i8254.h"
>> +#include "hw/timer/mc146818rtc.h"
>> +#include "hw/char/serial.h"
>> +#include "hw/i386/topology.h"
>> +#include "hw/i386/e820.h"
>> +#include "hw/i386/fw_cfg.h"
>> +#include "hw/virtio/virtio-mmio.h"
>> +
>> +#include "cpu.h"
>> +#include "elf.h"
>> +#include "pvh.h"
>> +#include "kvm_i386.h"
>> +#include "hw/xen/start_info.h"
>> +
>> +#define MICROVM_BIOS_FILENAME "bios-microvm.bin"
>> +
>> +static void microvm_set_rtc(MicrovmMachineState *mms, ISADevice *s)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(mms);
>> +    int val;
>> +
>> +    val = MIN(x86ms->below_4g_mem_size / KiB, 640);
>> +    rtc_set_memory(s, 0x15, val);
>> +    rtc_set_memory(s, 0x16, val >> 8);
>> +    /* extended memory (next 64MiB) */
>> +    if (x86ms->below_4g_mem_size > 1 * MiB) {
>> +        val = (x86ms->below_4g_mem_size - 1 * MiB) / KiB;
>> +    } else {
>> +        val = 0;
>> +    }
>> +    if (val > 65535) {
>> +        val = 65535;
>> +    }
>> +    rtc_set_memory(s, 0x17, val);
>> +    rtc_set_memory(s, 0x18, val >> 8);
>> +    rtc_set_memory(s, 0x30, val);
>> +    rtc_set_memory(s, 0x31, val >> 8);
>> +    /* memory between 16MiB and 4GiB */
>> +    if (x86ms->below_4g_mem_size > 16 * MiB) {
>> +        val = (x86ms->below_4g_mem_size - 16 * MiB) / (64 * KiB);
>> +    } else {
>> +        val = 0;
>> +    }
>> +    if (val > 65535) {
>> +        val = 65535;
>> +    }
>> +    rtc_set_memory(s, 0x34, val);
>> +    rtc_set_memory(s, 0x35, val >> 8);
>> +    /* memory above 4GiB */
>> +    val = x86ms->above_4g_mem_size / 65536;
>> +    rtc_set_memory(s, 0x5b, val);
>> +    rtc_set_memory(s, 0x5c, val >> 8);
>> +    rtc_set_memory(s, 0x5d, val >> 16);
>> +}
>> +
>> +static void microvm_devices_init(MicrovmMachineState *mms)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(mms);
>> +    ISABus *isa_bus;
>> +    ISADevice *rtc_state;
>> +    GSIState *gsi_state;
>> +    qemu_irq *i8259;
>> +    int i;
>> +
>> +    gsi_state = g_malloc0(sizeof(*gsi_state));
>> +    x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
>> +
>> +    isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
>> +                          &error_abort);
>> +    isa_bus_irqs(isa_bus, x86ms->gsi);
>> +
>> +    i8259 = i8259_init(isa_bus, pc_allocate_cpu_irq());
>> +
>> +    for (i = 0; i < ISA_NUM_IRQS; i++) {
>> +        gsi_state->i8259_irq[i] = i8259[i];
>> +    }
>> +
>> +    ioapic_init_gsi(gsi_state, "machine");
>> +
>> +    if (mms->rtc_enabled) {
>> +        rtc_state = mc146818_rtc_init(isa_bus, 2000, NULL);
>> +        microvm_set_rtc(mms, rtc_state);
>> +    }
>> +
>
> Maybe refactor that ...
>
>> +    if (kvm_pit_in_kernel()) {
>> +        kvm_pit_init(isa_bus, 0x40);
>> +    } else {
>> +        i8254_pit_init(isa_bus, 0x40, 0, NULL);
>> +    }
>
> ... as an x86_pit_create() function?

This is likely to change in v5, as we want to avoid the legacy PIC+PIT
when possible.

>> +
>> +    kvmclock_create();
>> +
>> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
>> +        int nirq = VIRTIO_IRQ_BASE + i;
>> +        ISADevice *isadev = isa_create(isa_bus, TYPE_ISA_SERIAL);
>> +        qemu_irq mmio_irq;
>> +
>> +        isa_init_irq(isadev, &mmio_irq, nirq);
>> +        sysbus_create_simple("virtio-mmio",
>> +                             VIRTIO_MMIO_BASE + i * 512,
>> +                             x86ms->gsi[VIRTIO_IRQ_BASE + i]);
>> +    }
>> +
>> +    g_free(i8259);
>
> Not related to this patch, but the i8259_init() API is not clear:
> does it return an allocated array of allocated qemu_irqs? Is it safe
> to copy them to gsi_state and then free the array?

That's how I understand it, and also how it's used elsewhere.

>> +
>> +    if (mms->isa_serial_enabled) {
>> +        serial_hds_isa_init(isa_bus, 0, 1);
>> +    }
>> +
>> +    if (bios_name == NULL) {
>> +        bios_name = MICROVM_BIOS_FILENAME;
>> +    }
>> +    x86_system_rom_init(get_system_memory(), true);
>> +}
>> +
>> +static void microvm_memory_init(MicrovmMachineState *mms)
>> +{
>> +    MachineState *machine = MACHINE(mms);
>> +    X86MachineState *x86ms = X86_MACHINE(mms);
>> +    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
>> +    MemoryRegion *system_memory = get_system_memory();
>> +    FWCfgState *fw_cfg;
>> +    ram_addr_t lowmem;
>> +    int i;
>> +
>> +    /*
>> +     * Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
>> +     * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
>> +     * also known as MMCFG).
>> +     * If it doesn't, we need to split it in chunks below and above 4G.
>> +     * In any case, try to make sure that guest addresses aligned at
>> +     * 1G boundaries get mapped to host addresses aligned at 1G boundaries.
>> +     */
>> +    if (machine->ram_size >= 0xb0000000) {
>> +        lowmem = 0x80000000;
>> +    } else {
>> +        lowmem = 0xb0000000;
>> +    }
>> +
>> +    /*
>> +     * Handle the machine opt max-ram-below-4g.  It is basically doing
>> +     * min(qemu limit, user limit).
>> +     */
>> +    if (!x86ms->max_ram_below_4g) {
>> +        x86ms->max_ram_below_4g = 1ULL << 32; /* default: 4G */
>
> Please use '4 * GiB' with no comment.

Ack (this is copy-pasted from pc_q35.c).

>> +    }
>> +    if (lowmem > x86ms->max_ram_below_4g) {
>> +        lowmem = x86ms->max_ram_below_4g;
>> +        if (machine->ram_size - lowmem > lowmem &&
>> +            lowmem & (1 * GiB - 1)) {
>> +            warn_report("There is possibly poor performance as the ram size "
>> +                        "(0x%" PRIx64 ") is more than twice the size of"
>> +                        " max-ram-below-4g (%"PRIu64") and"
>> +                        " max-ram-below-4g is not a multiple of 1G.",
>> +                        (uint64_t)machine->ram_size, x86ms->max_ram_below_4g);
>> +        }
>> +    }
>> +
>> +    if (machine->ram_size > lowmem) {
>> +        x86ms->above_4g_mem_size = machine->ram_size - lowmem;
>> +        x86ms->below_4g_mem_size = lowmem;
>> +    } else {
>> +        x86ms->above_4g_mem_size = 0;
>> +        x86ms->below_4g_mem_size = machine->ram_size;
>> +    }
>> +
>> +    ram = g_malloc(sizeof(*ram));
>> +    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
>> +                                         machine->ram_size);
>> +
>> +    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
>> +    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
>> +                             0, x86ms->below_4g_mem_size);
>> +    memory_region_add_subregion(system_memory, 0, ram_below_4g);
>> +
>> +    e820_add_entry(0, x86ms->below_4g_mem_size, E820_RAM);
>> +
>> +    if (x86ms->above_4g_mem_size > 0) {
>> +        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
>> +        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
>> +                                 x86ms->below_4g_mem_size,
>> +                                 x86ms->above_4g_mem_size);
>> +        memory_region_add_subregion(system_memory, 0x100000000ULL,
>> +                                    ram_above_4g);
>> +        e820_add_entry(0x100000000ULL, x86ms->above_4g_mem_size, E820_RAM);
>> +    }
>> +
>> +    fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4,
>> +                                &address_space_memory);
>> +
>> +    fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
>> +    fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)x86ms->apic_id_limit);
>> +    fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)machine->ram_size);
>> +    fw_cfg_add_i32(fw_cfg, FW_CFG_IRQ0_OVERRIDE, kvm_allows_irq0_override());
>> +
>> +    rom_set_fw(fw_cfg);
>> +
>> +    e820_create_fw_entry(fw_cfg);
>> +
>> +    load_linux(x86ms, fw_cfg, 0, true, true);
>> +
>> +    if (mms->option_roms_enabled) {
>> +        for (i = 0; i < nb_option_roms; i++) {
>> +            rom_add_option(option_rom[i].name, option_rom[i].bootindex);
>> +        }
>> +    }
>> +
>> +    x86ms->fw_cfg = fw_cfg;
>> +    x86ms->ioapic_as = &address_space_memory;
>> +}
>> +
>> +static gchar *microvm_get_mmio_cmdline(gchar *name)
>> +{
>> +    gchar *cmdline;
>> +    gchar *separator;
>> +    long int index;
>> +    int ret;
>> +
>> +    separator = g_strrstr(name, ".");
>> +    if (!separator) {
>> +        return NULL;
>> +    }
>> +
>> +    if (qemu_strtol(separator + 1, NULL, 10, &index) != 0) {
>> +        return NULL;
>> +    }
>> +
>> +    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
>> +    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
>> +                     " virtio_mmio.device=512@0x%lx:%ld",
>> +                     VIRTIO_MMIO_BASE + index * 512,
>> +                     VIRTIO_IRQ_BASE + index);
>> +    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
>> +        g_free(cmdline);
>> +        return NULL;
>> +    }
>> +
>> +    return cmdline;
>> +}
>> +
>> +static void microvm_fix_kernel_cmdline(MachineState *machine)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(machine);
>> +    BusState *bus;
>> +    BusChild *kid;
>> +    char *cmdline;
>> +
>> +    /*
>> +     * Find MMIO transports with attached devices, and add them to the kernel
>> +     * command line.
>> +     *
>> +     * Yes, this is a hack, but one that heavily improves the UX without
>> +     * introducing any significant issues.
>> +     */
>> +    cmdline = g_strdup(machine->kernel_cmdline);
>> +    bus = sysbus_get_default();
>> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
>> +        DeviceState *dev = kid->child;
>> +        ObjectClass *class = object_get_class(OBJECT(dev));
>> +
>> +        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
>> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
>> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
>> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
>> +
>> +            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
>> +                gchar *mmio_cmdline = microvm_get_mmio_cmdline(mmio_bus->name);
>> +                if (mmio_cmdline) {
>> +                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
>> +                    g_free(mmio_cmdline);
>> +                    g_free(cmdline);
>> +                    cmdline = newcmd;
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
>> +    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
>> +}
>> +
>> +static void microvm_machine_state_init(MachineState *machine)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>> +    X86MachineState *x86ms = X86_MACHINE(machine);
>> +    Error *local_err = NULL;
>> +
>> +    if (machine->kernel_filename == NULL) {
>> +        error_report("missing kernel image file name, required by microvm");
>> +        exit(1);
>> +    }
>> +
>> +    microvm_memory_init(mms);
>> +
>> +    x86_cpus_init(x86ms, CPU_VERSION_LATEST);
>> +    if (local_err) {
>> +        error_report_err(local_err);
>> +        exit(1);
>> +    }
>> +
>> +    microvm_devices_init(mms);
>> +}
>> +
>> +static void microvm_machine_reset(MachineState *machine)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>> +    CPUState *cs;
>> +    X86CPU *cpu;
>> +
>> +    if (mms->kernel_cmdline_enabled && !mms->kernel_cmdline_fixed) {
>> +        microvm_fix_kernel_cmdline(machine);
>> +        mms->kernel_cmdline_fixed = true;
>> +    }
>> +
>> +    qemu_devices_reset();
>> +
>> +    CPU_FOREACH(cs) {
>> +        cpu = X86_CPU(cs);
>> +
>> +        if (cpu->apic_state) {
>> +            device_reset(cpu->apic_state);
>> +        }
>> +    }
>> +}
>> +
>> +static bool microvm_machine_get_rtc(Object *obj, Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    return mms->rtc_enabled;
>> +}
>> +
>> +static void microvm_machine_set_rtc(Object *obj, bool value, Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    mms->rtc_enabled = value;
>> +}
>> +
>> +static bool microvm_machine_get_isa_serial(Object *obj, Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    return mms->isa_serial_enabled;
>> +}
>> +
>> +static void microvm_machine_set_isa_serial(Object *obj, bool value,
>> +                                           Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    mms->isa_serial_enabled = value;
>> +}
>> +
>> +static bool microvm_machine_get_option_roms(Object *obj, Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    return mms->option_roms_enabled;
>> +}
>> +
>> +static void microvm_machine_set_option_roms(Object *obj, bool value,
>> +                                            Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    mms->option_roms_enabled = value;
>> +}
>> +
>> +static bool microvm_machine_get_kernel_cmdline(Object *obj, Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    return mms->kernel_cmdline_enabled;
>> +}
>> +
>> +static void microvm_machine_set_kernel_cmdline(Object *obj, bool value,
>> +                                               Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    mms->kernel_cmdline_enabled = value;
>> +}
>> +
>> +static void microvm_machine_initfn(Object *obj)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    /* Configuration */
>> +    mms->rtc_enabled = true;
>> +    mms->isa_serial_enabled = true;
>> +    mms->option_roms_enabled = true;
>> +    mms->kernel_cmdline_enabled = true;
>> +
>> +    /* State */
>> +    mms->kernel_cmdline_fixed = false;
>> +}
>> +
>> +static void microvm_class_init(ObjectClass *oc, void *data)
>> +{
>> +    MachineClass *mc = MACHINE_CLASS(oc);
>> +    NMIClass *nc = NMI_CLASS(oc);
>> +
>> +    mc->init = microvm_machine_state_init;
>> +
>> +    mc->family = "microvm_i386";
>> +    mc->desc = "Microvm (i386)";
>> +    mc->units_per_default_bus = 1;
>> +    mc->no_floppy = 1;
>> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
>> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
>
> Aren't these common to X86?

Hm... Those seem to be leftovers from NEMU's virt.c. I'll check if those
are really needed.

>> +    mc->max_cpus = 288;
>> +    mc->has_hotpluggable_cpus = false;
>> +    mc->auto_enable_numa_with_memhp = false;
>> +    mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
>> +    mc->nvdimm_supported = false;
>> +
>> +    /* Avoid relying too much on kernel components */
>> +    mc->default_kernel_irqchip_split = true;
>> +
>> +    /* Machine class handlers */
>> +    mc->reset = microvm_machine_reset;
>> +
>> +    /* NMI handler */
>> +    nc->nmi_monitor_handler = x86_nmi;
>> +
>> +    object_class_property_add_bool(oc, MICROVM_MACHINE_RTC,
>> +                                   microvm_machine_get_rtc,
>> +                                   microvm_machine_set_rtc,
>> +                                   &error_abort);
>> +    object_class_property_set_description(oc, MICROVM_MACHINE_RTC,
>> +        "Set off to disable the instantiation of an MC146818 RTC",
>> +        &error_abort);
>> +
>> +    object_class_property_add_bool(oc, MICROVM_MACHINE_ISA_SERIAL,
>> +                                   microvm_machine_get_isa_serial,
>> +                                   microvm_machine_set_isa_serial,
>> +                                   &error_abort);
>> +    object_class_property_set_description(oc, MICROVM_MACHINE_ISA_SERIAL,
>> +        "Set off to disable the instantiation of an ISA serial port",
>> +        &error_abort);
>> +
>> +    object_class_property_add_bool(oc, MICROVM_MACHINE_OPTION_ROMS,
>> +                                   microvm_machine_get_option_roms,
>> +                                   microvm_machine_set_option_roms,
>> +                                   &error_abort);
>> +    object_class_property_set_description(oc, MICROVM_MACHINE_OPTION_ROMS,
>> +        "Set off to disable loading option ROMs", &error_abort);
>> +
>> +    object_class_property_add_bool(oc, MICROVM_MACHINE_KERNEL_CMDLINE,
>> +                                   microvm_machine_get_kernel_cmdline,
>> +                                   microvm_machine_set_kernel_cmdline,
>> +                                   &error_abort);
>> +    object_class_property_set_description(oc, MICROVM_MACHINE_KERNEL_CMDLINE,
>> +        "Set off to disable adding virtio-mmio devices to the kernel cmdline",
>> +        &error_abort);
>> +}
>> +
>> +static const TypeInfo microvm_machine_info = {
>> +    .name          = TYPE_MICROVM_MACHINE,
>> +    .parent        = TYPE_X86_MACHINE,
>> +    .instance_size = sizeof(MicrovmMachineState),
>> +    .instance_init = microvm_machine_initfn,
>> +    .class_size    = sizeof(MicrovmMachineClass),
>> +    .class_init    = microvm_class_init,
>> +    .interfaces = (InterfaceInfo[]) {
>> +         { TYPE_NMI },
>
> Isn't this inherited from TYPE_X86_MACHINE?

Good question. Should we assume all x86-based machines have NMI, or just
leave it to each board?

Thanks,
Sergio.

>> +         { }
>> +    },
>> +};
>> +
>> +static void microvm_machine_init(void)
>> +{
>> +    type_register_static(&microvm_machine_info);
>> +}
>> +type_init(microvm_machine_init);
>> diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
>> new file mode 100644
>> index 0000000000..04c8caf886
>> --- /dev/null
>> +++ b/include/hw/i386/microvm.h
>> @@ -0,0 +1,80 @@
>> +/*
>> + * Copyright (c) 2018 Intel Corporation
>> + * Copyright (c) 2019 Red Hat, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2 or later, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef HW_I386_MICROVM_H
>> +#define HW_I386_MICROVM_H
>> +
>> +#include "qemu-common.h"
>> +#include "exec/hwaddr.h"
>> +#include "qemu/notify.h"
>> +
>> +#include "hw/boards.h"
>> +#include "hw/i386/x86.h"
>> +
>> +/* Microvm memory layout */
>> +#define PVH_START_INFO        0x6000
>> +#define MEMMAP_START          0x7000
>> +#define MODLIST_START         0x7800
>> +#define BOOT_STACK_POINTER    0x8ff0
>> +#define PML4_START            0x9000
>> +#define PDPTE_START           0xa000
>> +#define PDE_START             0xb000
>> +#define KERNEL_CMDLINE_START  0x20000
>> +#define EBDA_START            0x9fc00
>> +#define HIMEM_START           0x100000
>> +
>> +/* Platform virtio definitions */
>> +#define VIRTIO_MMIO_BASE      0xc0000000
>> +#define VIRTIO_IRQ_BASE       5
>> +#define VIRTIO_NUM_TRANSPORTS 8
>> +#define VIRTIO_CMDLINE_MAXLEN 64
>> +
>> +/* Machine type options */
>> +#define MICROVM_MACHINE_RTC            "rtc"
>> +#define MICROVM_MACHINE_ISA_SERIAL     "isa-serial"
>> +#define MICROVM_MACHINE_OPTION_ROMS    "option-roms"
>> +#define MICROVM_MACHINE_KERNEL_CMDLINE "kernel-cmdline"
>> +
>> +typedef struct {
>> +    X86MachineClass parent;
>> +    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
>> +                                           DeviceState *dev);
>> +} MicrovmMachineClass;
>> +
>> +typedef struct {
>> +    X86MachineState parent;
>> +
>> +    /* Machine type options */
>> +    bool rtc_enabled;
>> +    bool isa_serial_enabled;
>> +    bool option_roms_enabled;
>> +    bool kernel_cmdline_enabled;
>> +
>> +
>> +    /* Machine state */
>> +    bool kernel_cmdline_fixed;
>> +} MicrovmMachineState;
>> +
>> +#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
>> +#define MICROVM_MACHINE(obj) \
>> +    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
>> +#define MICROVM_MACHINE_GET_CLASS(obj) \
>> +    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
>> +#define MICROVM_MACHINE_CLASS(class) \
>> +    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
>> +
>> +#endif
>> 
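The low-memory constants in this header describe boot structures (PVH start info, e820 map, module list, boot stack, page tables, command line) that must not collide with each other or with the EBDA. As a hedged illustration, and not part of the patch, C11 `static_assert` can encode those ordering and alignment expectations at compile time. The sketch assumes each region ends before the next constant begins and that the PDE table is a single 4 KiB page, since the header does not state region sizes; `X86_PAGE_SIZE` is a name introduced here.

```c
#include <assert.h>

/* Low-memory layout constants, copied from the microvm.h header above. */
#define PVH_START_INFO        0x6000
#define MEMMAP_START          0x7000
#define MODLIST_START         0x7800
#define BOOT_STACK_POINTER    0x8ff0
#define PML4_START            0x9000
#define PDPTE_START           0xa000
#define PDE_START             0xb000
#define KERNEL_CMDLINE_START  0x20000
#define EBDA_START            0x9fc00
#define HIMEM_START           0x100000

/* Introduced for this sketch: size of one x86 page-table page. */
#define X86_PAGE_SIZE         0x1000

/* Boot structures in strictly increasing address order (assumed non-overlapping). */
static_assert(PVH_START_INFO < MEMMAP_START, "PVH start info must precede the e820 map");
static_assert(MEMMAP_START < MODLIST_START, "e820 map must precede the module list");
static_assert(MODLIST_START < BOOT_STACK_POINTER, "module list must sit below the boot stack");
static_assert(BOOT_STACK_POINTER < PML4_START, "boot stack must start below the PML4 page");
static_assert(PML4_START < PDPTE_START && PDPTE_START < PDE_START, "page tables must not overlap");
static_assert(PDE_START + X86_PAGE_SIZE <= KERNEL_CMDLINE_START, "PDE page must end before the command line");
static_assert(KERNEL_CMDLINE_START < EBDA_START, "command line must sit below the EBDA");
static_assert(EBDA_START < HIMEM_START, "EBDA must sit below 1 MiB");

/* Page-table pages must be page-aligned. */
static_assert(PML4_START % X86_PAGE_SIZE == 0, "PML4 must be page-aligned");
static_assert(PDPTE_START % X86_PAGE_SIZE == 0, "PDPTE must be page-aligned");
static_assert(PDE_START % X86_PAGE_SIZE == 0, "PDE must be page-aligned");

/* Conventionally, the initial stack pointer is kept 16-byte aligned. */
static_assert(BOOT_STACK_POINTER % 16 == 0, "boot stack pointer should be 16-byte aligned");
```

If a later revision moves one of these constants, a violation then shows up as a build error instead of a guest that fails to boot.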


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-24 12:44 ` Sergio Lopez
@ 2019-09-26  7:48   ` Christian Borntraeger
  -1 siblings, 0 replies; 133+ messages in thread
From: Christian Borntraeger @ 2019-09-26  7:48 UTC (permalink / raw)
  To: Sergio Lopez, qemu-devel
  Cc: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, philmd,
	lersek, kraxel, mtosatti, kvm



On 24.09.19 14:44, Sergio Lopez wrote:
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> constructed after the machine model implemented by the latter.
> 
> Its main purpose is providing users a minimalist machine type free
> from the burden of legacy compatibility, serving as a stepping stone
> for future projects aiming at improving boot times, reducing the
> attack surface and slimming down QEMU's footprint.
> 
> The microvm machine type supports the following devices:
> 
>  - ISA bus
>  - i8259 PIC
>  - LAPIC (implicit if using KVM)
>  - IOAPIC (defaults to kernel_irqchip_split = true)
>  - i8254 PIT
>  - MC146818 RTC (optional)
>  - kvmclock (if using KVM)
>  - fw_cfg
>  - One ISA serial port (optional)
>  - Up to eight virtio-mmio devices (configured by the user)

Just out of curiosity. 
What is the reason for not going virtio-pci? Is the PCI bus really
that expensive and complicated?
FWIW, I do not complain. When people start using virtio-mmio more
often this would also help virtio-ccw (which I am interested in)
as this forces people to think beyond virtio-pci.


^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 0/8] Introduce the microvm machine type
  2019-09-26  7:48   ` Christian Borntraeger
@ 2019-09-26  8:22     ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-26  8:22 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, pbonzini, rth,
	ehabkost, philmd, lersek, kraxel, mtosatti, kvm

[-- Attachment #1: Type: text/plain, Size: 2179 bytes --]


Christian Borntraeger <borntraeger@de.ibm.com> writes:

> On 24.09.19 14:44, Sergio Lopez wrote:
>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> constructed after the machine model implemented by the latter.
>> 
>> Its main purpose is providing users a minimalist machine type free
>> from the burden of legacy compatibility, serving as a stepping stone
>> for future projects aiming at improving boot times, reducing the
>> attack surface and slimming down QEMU's footprint.
>> 
>> The microvm machine type supports the following devices:
>> 
>>  - ISA bus
>>  - i8259 PIC
>>  - LAPIC (implicit if using KVM)
>>  - IOAPIC (defaults to kernel_irqchip_split = true)
>>  - i8254 PIT
>>  - MC146818 RTC (optional)
>>  - kvmclock (if using KVM)
>>  - fw_cfg
>>  - One ISA serial port (optional)
>>  - Up to eight virtio-mmio devices (configured by the user)
>
> Just out of curiosity. 
> What is the reason for not going virtio-pci? Is the PCI bus really
> that expensive and complicated?

Well, expensive is a relative term. PCI does indeed require a
significant amount of code and cycles, but that's for a good reason, as
it provides extensive bus logic enabling things like vector
configuration, hot-plug, chaining, etc.

On the other hand, MMIO lacks any kind of bus logic, as it basically
works by telling the kernel "hey, take a look at this address, there may
be something there", so of course it is cheaper. This makes it ideal for
microvm's aim of supporting a VM with the smallest amount of code, but
bad for almost everything else.

I don't think this means PCI is expensive. That would be the case if
there were a bus providing similar functionality while requiring less
code and cycles, and MMIO is definitely not such a bus.

In other words, I think PCI's cost is justified by its use case, while
MMIO's simplicity makes it ideal for some specific purposes (like
microvm).
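To make the "take a look at this address" point concrete: discovery on virtio-mmio amounts to reading a few 32-bit registers at a known base and checking them. The sketch below follows the register layout in the virtio specification (MagicValue at 0x000 reading 0x74726976, Version at 0x004, DeviceID at 0x008, with a DeviceID of 0 meaning an empty transport). The function names are illustrative, and the register window is passed in as a plain pointer so the logic can run outside a guest; a real driver would use a mapped MMIO address.

```c
#include <stddef.h>
#include <stdint.h>

/* Register offsets from the virtio-mmio transport layout in the virtio spec. */
#define VIRTIO_MMIO_MAGIC_VALUE 0x000
#define VIRTIO_MMIO_VERSION     0x004
#define VIRTIO_MMIO_DEVICE_ID   0x008

#define VIRTIO_MMIO_MAGIC       0x74726976u /* "virt" as little-endian ASCII */

static uint32_t mmio_read32(const volatile void *base, size_t offset)
{
    return *(const volatile uint32_t *)((const volatile uint8_t *)base + offset);
}

/*
 * Returns the virtio device ID found at 'base', or 0 if nothing usable is
 * there. There is no enumeration protocol: the guest just reads registers
 * at an address it was told about (e.g. via the kernel command line).
 */
uint32_t virtio_mmio_probe(const volatile void *base)
{
    uint32_t version;

    if (mmio_read32(base, VIRTIO_MMIO_MAGIC_VALUE) != VIRTIO_MMIO_MAGIC) {
        return 0; /* not a virtio-mmio transport */
    }
    version = mmio_read32(base, VIRTIO_MMIO_VERSION);
    if (version != 1 && version != 2) {
        return 0; /* 1 = legacy interface, 2 = virtio 1.0 and later */
    }
    /* A DeviceID of 0 is a placeholder: the transport exists but is empty. */
    return mmio_read32(base, VIRTIO_MMIO_DEVICE_ID);
}
```

Everything PCI would negotiate (resource assignment, interrupt routing, hot-plug) is absent here, which is both why this is cheap and why the base address and IRQ have to be supplied out of band.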

Cheers,
Sergio.

> FWIW, I do not complain. When people start using virtio-mmio more
> often this would also help virtio-ccw (which I am interested in)
> as this forces people to think beyond virtio-pci.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-26  6:23           ` Sergio Lopez
@ 2019-09-26  8:58             ` Paolo Bonzini
  -1 siblings, 0 replies; 133+ messages in thread
From: Paolo Bonzini @ 2019-09-26  8:58 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm


[-- Attachment #1.1: Type: text/plain, Size: 587 bytes --]

On 26/09/19 08:23, Sergio Lopez wrote:
> 
> There's still one problem. If the Guest doesn't have TSC_DEADLINE_TIME,
> Linux hangs on APIC timer calibration. I'm looking for a way to work
> around this. Worst case scenario, we can check for that feature and add
> both PIC and PIT if it is missing.
> 

Huh, that's a silly thing that Linux is doing!  If KVM is in use, the
LAPIC timer frequency is known to be 1 GHz.

arch/x86/kernel/kvm.c can just set

	lapic_timer_period = 1000000000 / HZ;

and that should disable LAPIC calibration if TSC deadline is absent.
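The suggested assignment is simple arithmetic: if the LAPIC timer runs at 1 GHz under KVM, one jiffy (1/HZ seconds) lasts 10^9 / HZ timer ticks, so the kernel can use that value instead of measuring it. A small sketch of the computation for a few common HZ settings follows; the helper name is hypothetical, and in the kernel the result would simply be assigned to `lapic_timer_period`.

```c
#include <stdint.h>

/* LAPIC timer frequency under KVM, as stated in the message above: 1 GHz. */
#define KVM_LAPIC_TIMER_FREQ 1000000000u

/*
 * Hypothetical helper: LAPIC timer ticks per jiffy for a kernel built with
 * the given HZ. Integer division truncates, so HZ=300 gives 3333333.
 */
uint32_t lapic_ticks_per_jiffy(uint32_t hz)
{
    if (hz == 0) {
        return 0; /* avoid dividing by zero on a nonsensical HZ */
    }
    return KVM_LAPIC_TIMER_FREQ / hz;
}
```

For HZ=1000 this yields 1,000,000 ticks per jiffy, and for HZ=250 it yields 4,000,000.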

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 8/8] hw/i386: Introduce the microvm machine type
  2019-09-26  6:34       ` Sergio Lopez
@ 2019-09-26  8:59         ` Paolo Bonzini
  -1 siblings, 0 replies; 133+ messages in thread
From: Paolo Bonzini @ 2019-09-26  8:59 UTC (permalink / raw)
  To: Sergio Lopez, Philippe Mathieu-Daudé
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	lersek, kraxel, mtosatti, kvm


[-- Attachment #1.1: Type: text/plain, Size: 291 bytes --]

On 26/09/19 08:34, Sergio Lopez wrote:
>> Isn't this inherited from TYPE_X86_MACHINE?
> Good question. Should we assume all x86 based machines have NMI, or just
> leave it to each board?

NMI is hardcoded as exception 2 in the processor, so it is present in all
x86 machines.

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-26  8:58             ` Paolo Bonzini
@ 2019-09-26 10:16               ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-26 10:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm

[-- Attachment #1: Type: text/plain, Size: 862 bytes --]


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 26/09/19 08:23, Sergio Lopez wrote:
>> 
>> There's still one problem. If the Guest doesn't have TSC_DEADLINE_TIME,
>> Linux hangs on APIC timer calibration. I'm looking for a way to work
>> around this. Worst case scenario, we can check for that feature and add
>> both PIC and PIT if it is missing.
>> 
>
> Huh, that's a silly thing that Linux is doing!  If KVM is in use, the
> LAPIC timer frequency is known to be 1 GHz.
>
> arch/x86/kernel/kvm.c can just set
>
> 	lapic_timer_period = 1000000000 / HZ;
>
> and that should disable LAPIC calibration if TSC deadline is absent.

Given that they can only be omitted when a specific set of conditions
is met, I think I'm going to make them optional but enabled by default.

I'll also point to this in the documentation.
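One of the conditions in the report quoted above is whether the guest CPU advertises the TSC-deadline timer. On x86 that feature is reported in CPUID leaf 1, ECX bit 24. Below is a minimal sketch of the resulting predicate for the PIC and PIT. It assumes the caller already has the ECX value the guest will see (for example, derived from the configured CPU model rather than the host). The function name is hypothetical, and this is not the policy the thread settled on, which keeps both devices enabled by default.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 24: the LAPIC timer supports TSC-deadline mode. */
#define CPUID_1_ECX_TSC_DEADLINE (UINT32_C(1) << 24)

/*
 * Hypothetical check: per the thread, without TSC deadline Linux hangs
 * during LAPIC timer calibration unless the i8259 PIC and i8254 PIT are
 * present, so in that case microvm would need to keep them.
 */
bool microvm_needs_legacy_timers(uint32_t guest_cpuid1_ecx)
{
    return (guest_cpuid1_ecx & CPUID_1_ECX_TSC_DEADLINE) == 0;
}
```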

Thanks,
Sergio

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-26 10:16               ` Sergio Lopez
@ 2019-09-26 10:21                 ` Paolo Bonzini
  -1 siblings, 0 replies; 133+ messages in thread
From: Paolo Bonzini @ 2019-09-26 10:21 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm


[-- Attachment #1.1: Type: text/plain, Size: 675 bytes --]

On 26/09/19 12:16, Sergio Lopez wrote:
>> If KVM is in use, the
>> LAPIC timer frequency is known to be 1 GHz.
>>
>> arch/x86/kernel/kvm.c can just set
>>
>> 	lapic_timer_period = 1000000000 / HZ;
>>
>> and that should disable LAPIC calibration if TSC deadline is absent.
> Given that they can only be omitted when a specific set of conditions
> is met, I think I'm going to make them optional but enabled by default.

Please do introduce the infrastructure to make them OnOffAuto, and for
now make Auto the same as On.  We have time to review that since microvm
is not versioned.
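The request above maps onto QEMU's tri-state OnOffAuto property type, whose generated C enum uses the values ON_OFF_AUTO_AUTO, ON_OFF_AUTO_ON and ON_OFF_AUTO_OFF. The sketch below uses a local stand-in for that enum so it compiles outside the QEMU tree. It pairs it with a hypothetical resolver for an optional legacy device such as the PIC or PIT, where Auto behaves exactly like On for now, as requested.

```c
#include <stdbool.h>

/* Local stand-in mirroring QEMU's QAPI OnOffAuto enum ('auto', 'on', 'off'). */
typedef enum {
    ON_OFF_AUTO_AUTO,
    ON_OFF_AUTO_ON,
    ON_OFF_AUTO_OFF,
} OnOffAuto;

/*
 * Hypothetical resolver: only an explicit "off" disables the device.
 * Auto is treated like On for now; a later revision could decide Auto
 * from guest features without changing the user-visible option.
 */
bool microvm_device_enabled(OnOffAuto opt)
{
    switch (opt) {
    case ON_OFF_AUTO_OFF:
        return false;
    case ON_OFF_AUTO_ON:
    case ON_OFF_AUTO_AUTO:
    default:
        return true;
    }
}
```

Keeping Auto as the default means that changing what Auto does later will not require users to change their command lines, which is the point of introducing the tri-state before microvm is versioned.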

Thanks,

Paolo

> I'll also point to this in the documentation.




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type
  2019-09-26 10:21                 ` Paolo Bonzini
@ 2019-09-26 12:12                   ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-09-26 12:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, mst, imammedo, marcel.apfelbaum, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm

[-- Attachment #1: Type: text/plain, Size: 701 bytes --]


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 26/09/19 12:16, Sergio Lopez wrote:
>>> If KVM is in use, the
>>> LAPIC timer frequency is known to be 1 GHz.
>>>
>>> arch/x86/kernel/kvm.c can just set
>>>
>>> 	lapic_timer_period = 1000000000 / HZ;
>>>
>>> and that should disable LAPIC calibration if TSC deadline is absent.
>> Given that they can only be omitted when a specific set of conditions
>> is met, I think I'm going to make them optional but enabled by default.
>
> Please do introduce the infrastructure to make them OnOffAuto, and for
> now make Auto the same as On.  We have time to review that since microvm
> is not versioned.

OK, sounds like a good idea to me.

Thanks,
Sergio.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH v4 8/8] hw/i386: Introduce the microvm machine type
  2019-09-25  5:59       ` Sergio Lopez
@ 2019-10-01  8:56         ` Sergio Lopez
  -1 siblings, 0 replies; 133+ messages in thread
From: Sergio Lopez @ 2019-10-01  8:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost,
	philmd, lersek, kraxel, mtosatti, kvm

[-- Attachment #1: Type: text/plain, Size: 2825 bytes --]


Sergio Lopez <slp@redhat.com> writes:

> Michael S. Tsirkin <mst@redhat.com> writes:
>
>> On Tue, Sep 24, 2019 at 02:44:33PM +0200, Sergio Lopez wrote:
>>> +static void microvm_fix_kernel_cmdline(MachineState *machine)
>>> +{
>>> +    X86MachineState *x86ms = X86_MACHINE(machine);
>>> +    BusState *bus;
>>> +    BusChild *kid;
>>> +    char *cmdline;
>>> +
>>> +    /*
>>> +     * Find MMIO transports with attached devices, and add them to the kernel
>>> +     * command line.
>>> +     *
>>> +     * Yes, this is a hack, but one that heavily improves the UX without
>>> +     * introducing any significant issues.
>>> +     */
>>> +    cmdline = g_strdup(machine->kernel_cmdline);
>>> +    bus = sysbus_get_default();
>>> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
>>> +        DeviceState *dev = kid->child;
>>> +        ObjectClass *class = object_get_class(OBJECT(dev));
>>> +
>>> +        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
>>> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
>>> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
>>> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
>>> +
>>> +            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
>>> +                gchar *mmio_cmdline = microvm_get_mmio_cmdline(mmio_bus->name);
>>> +                if (mmio_cmdline) {
>>> +                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
>>> +                    g_free(mmio_cmdline);
>>> +                    g_free(cmdline);
>>> +                    cmdline = newcmd;
>>> +                }
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
>>> +    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
>>> +}
>>
>> Can we rearrange this somewhat? Maybe the mmio constructor
>> would format the device description and add to some list,
>> and then microvm would just get stuff from that list
>> and add it to kernel command line?
>> This way it can also be controlled by a virtio-mmio property, so
>> e.g. you can disable it per device if you like.
>> In particular, this seems like a handy trick for any machine type
>> using mmio.
>
> Disabling it per-device won't be easy, as transport options can't be
> specified using the underlying device properties.
>
> But, otherwise, sounds like a good idea to avoid having to traverse the
> qtree. I'll give it a try.

Hi Michael,

I'm working on this, but I can't find an easy way to obtain the actual IRQ
number from the data I have access to in virtio_mmio_realizefn(). Is there
a way to do that without building a new access interface? If there isn't,
given that this is a microvm-specific hack, is it really worth building
it, or can we just keep it as it is in v4?
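Michael's suggestion, quoted above, can be sketched independently of QEMU's object model. Each transport formats its own Linux `virtio_mmio.device=<size>@<base>:<irq>` fragment when it is realized and appends it to a shared list. The machine later concatenates that list onto the user's command line instead of walking the qtree. The sketch assumes a 512-byte register window per transport and consecutive IRQs starting at `VIRTIO_IRQ_BASE`; those are illustrative assumptions, not taken from the patch text shown here. All function and variable names are hypothetical. Note that it sidesteps the open question by taking the transport index as a parameter.

```c
#include <stdio.h>
#include <string.h>

/* Constants from the microvm.h header in this series. */
#define VIRTIO_MMIO_BASE      0xc0000000UL
#define VIRTIO_IRQ_BASE       5
#define VIRTIO_NUM_TRANSPORTS 8
#define VIRTIO_CMDLINE_MAXLEN 64

/* Assumption for this sketch: each transport has a 512-byte window. */
#define VIRTIO_MMIO_STRIDE    512UL

/* Hypothetical registry, filled in as transports with devices are realized. */
static char mmio_fragments[VIRTIO_NUM_TRANSPORTS][VIRTIO_CMDLINE_MAXLEN];
static int mmio_fragment_count;

/* Records the kernel parameter for transport 'index'. Returns 0 on success. */
int mmio_cmdline_register(int index)
{
    int len;

    if (index < 0 || index >= VIRTIO_NUM_TRANSPORTS ||
        mmio_fragment_count >= VIRTIO_NUM_TRANSPORTS) {
        return -1;
    }
    len = snprintf(mmio_fragments[mmio_fragment_count], VIRTIO_CMDLINE_MAXLEN,
                   " virtio_mmio.device=%lu@0x%lx:%d",
                   VIRTIO_MMIO_STRIDE,
                   VIRTIO_MMIO_BASE + (unsigned long)index * VIRTIO_MMIO_STRIDE,
                   VIRTIO_IRQ_BASE + index);
    if (len < 0 || len >= VIRTIO_CMDLINE_MAXLEN) {
        return -1; /* the fragment would have been truncated */
    }
    mmio_fragment_count++;
    return 0;
}

/* Writes 'base' followed by every registered fragment into 'out'. */
int mmio_cmdline_build(const char *base, char *out, size_t outlen)
{
    size_t used = strlen(base);
    int i;

    if (used >= outlen) {
        return -1;
    }
    memcpy(out, base, used + 1);
    for (i = 0; i < mmio_fragment_count; i++) {
        size_t frag = strlen(mmio_fragments[i]);

        if (used + frag >= outlen) {
            return -1; /* output buffer too small */
        }
        memcpy(out + used, mmio_fragments[i], frag + 1);
        used += frag;
    }
    return 0;
}
```

With this split, the machine never needs to know how transports are attached. The cost is that each transport must learn its own index or IRQ at realize time, which is exactly the missing piece in the message above.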

Thanks,
Sergio.


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 133+ messages in thread

end of thread, other threads:[~2019-10-01  9:03 UTC | newest]

Thread overview: 133+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-24 12:44 [PATCH v4 0/8] Introduce the microvm machine type Sergio Lopez
2019-09-24 12:44 ` Sergio Lopez
2019-09-24 12:44 ` [PATCH v4 1/8] hw/i386: Factorize PVH related functions Sergio Lopez
2019-09-24 12:44   ` Sergio Lopez
2019-09-24 13:18   ` Philippe Mathieu-Daudé
2019-09-24 13:18     ` Philippe Mathieu-Daudé
2019-09-25  6:03     ` Sergio Lopez
2019-09-25  6:03       ` Sergio Lopez
2019-09-25  8:36   ` Stefano Garzarella
2019-09-25  8:36     ` Stefano Garzarella
2019-09-25  9:00     ` Sergio Lopez
2019-09-25  9:00       ` Sergio Lopez
2019-09-25  9:29       ` Stefano Garzarella
2019-09-25  9:29         ` Stefano Garzarella
2019-09-24 12:44 ` [PATCH v4 2/8] hw/i386: Factorize e820 " Sergio Lopez
2019-09-24 12:44   ` Sergio Lopez
2019-09-24 13:20   ` Philippe Mathieu-Daudé
2019-09-24 13:20     ` Philippe Mathieu-Daudé
2019-09-24 14:12     ` Sergio Lopez
2019-09-24 14:12       ` Sergio Lopez
2019-09-24 12:44 ` [PATCH v4 3/8] hw/virtio: Factorize virtio-mmio headers Sergio Lopez
2019-09-24 12:44   ` Sergio Lopez
2019-09-24 12:44 ` [PATCH v4 4/8] hw/i386: split PCMachineState deriving X86MachineState from it Sergio Lopez
2019-09-24 13:40   ` Philippe Mathieu-Daudé
2019-09-25 15:39     ` Philippe Mathieu-Daudé
2019-09-24 12:44 ` [PATCH v4 5/8] fw_cfg: add "modify" functions for all types Sergio Lopez
2019-09-24 12:44   ` Sergio Lopez
2019-09-24 12:44 ` [PATCH v4 6/8] roms: add microvm-bios (qboot) as binary and git submodule Sergio Lopez
2019-09-24 12:44   ` Sergio Lopez
2019-09-24 13:31   ` Philippe Mathieu-Daudé
2019-09-24 13:31     ` Philippe Mathieu-Daudé
2019-09-25  6:09     ` Sergio Lopez
2019-09-25  6:09       ` Sergio Lopez
2019-09-24 12:44 ` [PATCH v4 7/8] docs/microvm.txt: document the new microvm machine type Sergio Lopez
2019-09-24 12:44   ` Sergio Lopez
2019-09-24 13:10   ` Paolo Bonzini
2019-09-24 13:10     ` Paolo Bonzini
2019-09-25  5:49     ` Sergio Lopez
2019-09-25  5:49       ` Sergio Lopez
2019-09-25  7:57       ` Paolo Bonzini
2019-09-25  7:57         ` Paolo Bonzini
2019-09-25  8:40         ` Sergio Lopez
2019-09-25  8:40           ` Sergio Lopez
2019-09-25  9:22           ` Paolo Bonzini
2019-09-25  9:22             ` Paolo Bonzini
2019-09-25 11:04             ` Sergio Lopez
2019-09-25 11:04               ` Sergio Lopez
2019-09-25 11:20               ` Paolo Bonzini
2019-09-25 11:20                 ` Paolo Bonzini
2019-09-25 15:04     ` Sergio Lopez
2019-09-25 15:04       ` Sergio Lopez
2019-09-25 16:46       ` Paolo Bonzini
2019-09-25 16:46         ` Paolo Bonzini
2019-09-26  6:23         ` Sergio Lopez
2019-09-26  6:23           ` Sergio Lopez
2019-09-26  8:58           ` Paolo Bonzini
2019-09-26  8:58             ` Paolo Bonzini
2019-09-26 10:16             ` Sergio Lopez
2019-09-26 10:16               ` Sergio Lopez
2019-09-26 10:21               ` Paolo Bonzini
2019-09-26 10:21                 ` Paolo Bonzini
2019-09-26 12:12                 ` Sergio Lopez
2019-09-26 12:12                   ` Sergio Lopez
2019-09-25  5:06   ` Gerd Hoffmann
2019-09-25  5:06     ` Gerd Hoffmann
2019-09-25  7:33     ` Sergio Lopez
2019-09-25  7:33       ` Sergio Lopez
2019-09-25  8:51       ` Gerd Hoffmann
2019-09-25  8:51         ` Gerd Hoffmann
2019-09-24 12:44 ` [PATCH v4 8/8] hw/i386: Introduce the " Sergio Lopez
2019-09-24 12:44   ` Sergio Lopez
2019-09-24 13:12   ` Paolo Bonzini
2019-09-24 13:12     ` Paolo Bonzini
2019-09-24 13:24     ` Michael S. Tsirkin
2019-09-24 13:24       ` Michael S. Tsirkin
2019-09-24 13:34       ` Paolo Bonzini
2019-09-24 13:34         ` Paolo Bonzini
2019-09-25  5:53     ` Sergio Lopez
2019-09-24 13:28   ` Michael S. Tsirkin
2019-09-25  5:59     ` Sergio Lopez
2019-10-01  8:56       ` Sergio Lopez
2019-09-25 15:40   ` Philippe Mathieu-Daudé
2019-09-26  6:34     ` Sergio Lopez
2019-09-26  8:59       ` Paolo Bonzini
2019-09-24 13:57 ` [PATCH v4 0/8] " Peter Maydell
2019-09-25  5:51   ` Sergio Lopez
2019-09-25 11:33     ` Philippe Mathieu-Daudé
2019-09-25 12:39       ` Peter Maydell
2019-09-25  7:41 ` David Hildenbrand
2019-09-25  7:58   ` Pankaj Gupta
2019-09-25  8:10   ` Sergio Lopez
2019-09-25  8:16     ` David Hildenbrand
2019-09-25  8:37       ` Pankaj Gupta
2019-09-25  8:26     ` Paolo Bonzini
2019-09-25  8:42       ` Sergio Lopez
2019-09-25  8:44       ` David Hildenbrand
2019-09-25 10:19         ` when to use virtio (was Re: [PATCH v4 0/8] Introduce the microvm machine type) Paolo Bonzini
2019-09-25 10:50           ` David Hildenbrand
2019-09-25 11:24             ` Paolo Bonzini
2019-09-25 11:32               ` David Hildenbrand
2019-09-25  9:12       ` [PATCH v4 0/8] Introduce the microvm machine type Gerd Hoffmann
2019-09-25  9:29         ` Paolo Bonzini
2019-09-25  9:47         ` David Hildenbrand
2019-09-26  7:48 ` Christian Borntraeger
2019-09-26  8:22   ` Sergio Lopez