qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH 0/4] Introduce the microvm machine type
@ 2019-06-28 11:53 Sergio Lopez
  2019-06-28 11:53 ` [Qemu-devel] [PATCH 1/4] hw/i386: Factorize CPU routine Sergio Lopez
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Sergio Lopez @ 2019-06-28 11:53 UTC
  To: mst, marcel.apfelbaum, pbonzini, rth, ehabkost; +Cc: qemu-devel, Sergio Lopez

Microvm is a machine type inspired by both NEMU and Firecracker, and
constructed after the machine model implemented by the latter.

Its main purpose is to provide users with a KVM-only machine type with fast
boot times, a minimal attack surface (measured as the number of IO ports
and MMIO regions exposed to the Guest) and a small footprint (especially
when combined with the ongoing QEMU modularization effort).

Normally, other than the device support provided by KVM itself,
microvm only supports virtio-mmio devices. Microvm also includes a
legacy mode, which adds an ISA bus with a 16550A serial port, useful
for being able to see the early boot kernel messages.

This is the list of the exposed IO ports and MMIO regions when running
in non-legacy mode:

address-space: memory
    00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
    00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
    00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
    00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
    00000000d0000800-00000000d00009ff (prio 0, i/o): virtio-mmio
    00000000d0000a00-00000000d0000bff (prio 0, i/o): virtio-mmio
    00000000d0000c00-00000000d0000dff (prio 0, i/o): virtio-mmio
    00000000d0000e00-00000000d0000fff (prio 0, i/o): virtio-mmio
    00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi

address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io
    0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
    0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
    000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
    00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
    00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
    00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
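
The virtio-mmio entries in the listing above follow a regular layout: eight
transports, each 0x200 bytes wide, starting at 0xd0000000. A minimal sketch
of that arithmetic follows. The macro and function names are hypothetical,
and the constants are read off the listing rather than taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, read off the memory map above; not taken from the patch. */
#define VIRTIO_MMIO_BASE   0xd0000000u
#define VIRTIO_MMIO_SIZE   0x200u
#define VIRTIO_MMIO_SLOTS  8u

/* First guest-physical address of virtio-mmio transport number "slot". */
static uint64_t virtio_mmio_slot_start(unsigned int slot)
{
    assert(slot < VIRTIO_MMIO_SLOTS);
    return (uint64_t)VIRTIO_MMIO_BASE + (uint64_t)slot * VIRTIO_MMIO_SIZE;
}

/* Last (inclusive) guest-physical address of the same transport. */
static uint64_t virtio_mmio_slot_end(unsigned int slot)
{
    return virtio_mmio_slot_start(slot) + VIRTIO_MMIO_SIZE - 1;
}
```

For slot 7 this gives the range d0000e00-d0000fff, matching the last
virtio-mmio line of the listing.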

A QEMU instance with the microvm machine type can be invoked this way:

 - Normal mode:

qemu-system-x86_64 -M microvm -m 512m -smp 2 \
 -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
 -nodefaults -no-user-config \
 -chardev pty,id=virtiocon0,server \
 -device virtio-serial-device \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0

 - Legacy mode:

qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
 -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
 -nodefaults -no-user-config \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0 \
 -serial stdio


Sergio Lopez (4):
  hw/i386: Factorize CPU routine
  hw/virtio: Factorize virtio-mmio headers
  hw/i386: Add an Intel MPTable generator
  hw/i386: Introduce the microvm machine type

 default-configs/i386-softmmu.mak            |   1 +
 hw/i386/Kconfig                             |   4 +
 hw/i386/Makefile.objs                       |   2 +
 hw/i386/cpu.c                               | 174 +++++++
 hw/i386/microvm.c                           | 518 ++++++++++++++++++++
 hw/i386/mptable.c                           | 157 ++++++
 hw/i386/pc.c                                | 151 +-----
 hw/i386/pc_piix.c                           |   3 +-
 hw/i386/pc_q35.c                            |   3 +-
 hw/virtio/virtio-mmio.c                     |  35 +-
 hw/virtio/virtio-mmio.h                     |  60 +++
 include/hw/i386/apic.h                      |   1 +
 include/hw/i386/cpu-internal.h              |  32 ++
 include/hw/i386/microvm.h                   |  85 ++++
 include/hw/i386/mptable.h                   |  37 ++
 include/standard-headers/linux/mpspec_def.h | 182 +++++++
 16 files changed, 1264 insertions(+), 181 deletions(-)
 create mode 100644 hw/i386/cpu.c
 create mode 100644 hw/i386/microvm.c
 create mode 100644 hw/i386/mptable.c
 create mode 100644 hw/virtio/virtio-mmio.h
 create mode 100644 include/hw/i386/cpu-internal.h
 create mode 100644 include/hw/i386/microvm.h
 create mode 100644 include/hw/i386/mptable.h
 create mode 100644 include/standard-headers/linux/mpspec_def.h

--
2.21.0



* [Qemu-devel] [PATCH 1/4] hw/i386: Factorize CPU routine
  2019-06-28 11:53 [Qemu-devel] [PATCH 0/4] Introduce the microvm machine type Sergio Lopez
@ 2019-06-28 11:53 ` Sergio Lopez
  2019-06-28 20:03   ` Eduardo Habkost
  2019-06-28 11:53 ` [Qemu-devel] [PATCH 2/4] hw/virtio: Factorize virtio-mmio headers Sergio Lopez
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Sergio Lopez @ 2019-06-28 11:53 UTC
  To: mst, marcel.apfelbaum, pbonzini, rth, ehabkost; +Cc: qemu-devel, Sergio Lopez

This is a combination of the following commits from
NEMU (https://github.com/intel/nemu):

===============================================
commit b6472ce5ce5108c7aacb0dfa3d74b3eb8f98ae85
Author: Samuel Ortiz <sameo@linux.intel.com>
Date:   Fri Mar 22 10:28:31 2019 +0800

    hw: i386: Factorize CPU routines

    A few routines are now shared between pc_* and virt, including the CPU
    init one.
    We factorize those routines into an i386 specific file that is now used
    by all x86 machines.

    Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

commit f29f3c294a889ad659dc8808728e8441e23a675c
Author: Samuel Ortiz <sameo@linux.intel.com>
Date:   Mon Oct 8 15:37:17 2018 +0200

    hw: i386: Remove the pc header dependency from the cpu code

    It's only a matter of moving the compat APIC boolean to the correct
    header file (apic.h).

    Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
===============================================
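
For context, the CPU init code being moved relies on deriving an APIC ID
from a CPU index and the -smp topology. The sketch below illustrates the
general idea (package, core and thread IDs packed into bit fields sized for
the configured counts). It is a simplified illustration with hypothetical
names, not the helper from hw/i386/topology.h:

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent the values 0..count-1. */
static unsigned int bits_for_count(unsigned int count)
{
    unsigned int bits = 0;

    assert(count >= 1);
    count -= 1;
    while (count) {
        bits++;
        count >>= 1;
    }
    return bits;
}

/*
 * Hypothetical helper: pack package/core/thread IDs into an APIC ID for
 * a machine with "cores" cores per package and "threads" threads per core.
 */
static uint32_t sketch_apicid_from_index(unsigned int cores,
                                         unsigned int threads,
                                         unsigned int cpu_index)
{
    unsigned int smt_width = bits_for_count(threads);
    unsigned int core_width = bits_for_count(cores);
    unsigned int core_index = cpu_index / threads;
    unsigned int smt_id = cpu_index % threads;
    unsigned int core_id = core_index % cores;
    unsigned int pkg_id = core_index / cores;

    return ((uint32_t)pkg_id << (core_width + smt_width)) |
           ((uint32_t)core_id << smt_width) |
           (uint32_t)smt_id;
}
```

With 3 cores and 2 threads per core, CPU index 6 maps to APIC ID 8, so the
two numbering schemes diverge whenever a count is not a power of two. That
is the case the compatibility-mode warning in cpu_apicid_from_index() is
about.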

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/i386/Makefile.objs          |   1 +
 hw/i386/cpu.c                  | 174 +++++++++++++++++++++++++++++++++
 hw/i386/pc.c                   | 151 ++--------------------------
 hw/i386/pc_piix.c              |   3 +-
 hw/i386/pc_q35.c               |   3 +-
 include/hw/i386/apic.h         |   1 +
 include/hw/i386/cpu-internal.h |  32 ++++++
 7 files changed, 218 insertions(+), 147 deletions(-)
 create mode 100644 hw/i386/cpu.c
 create mode 100644 include/hw/i386/cpu-internal.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 5d9c9efd5f..102f2b35fc 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -1,5 +1,6 @@
 obj-$(CONFIG_KVM) += kvm/
 obj-y += multiboot.o
+obj-y += cpu.o
 obj-y += pc.o
 obj-$(CONFIG_I440FX) += pc_piix.o
 obj-$(CONFIG_Q35) += pc_q35.o
diff --git a/hw/i386/cpu.c b/hw/i386/cpu.c
new file mode 100644
index 0000000000..e13ae61535
--- /dev/null
+++ b/hw/i386/cpu.c
@@ -0,0 +1,174 @@
+/*
+ *
+ * Copyright (c) 2018 Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+
+#include "sysemu/cpus.h"
+#include "sysemu/qtest.h"
+#include "sysemu/numa.h"
+#include "sysemu/sysemu.h"
+
+#include "hw/i386/cpu-internal.h"
+#include "hw/i386/apic.h"
+#include "hw/i386/topology.h"
+
+#include "hw/acpi/pc-hotplug.h"
+
+static void cpu_new(const char *typename, int64_t apic_id, Error **errp)
+{
+    Object *cpu = NULL;
+    Error *local_err = NULL;
+
+    cpu = object_new(typename);
+
+    object_property_set_uint(cpu, apic_id, "apic-id", &local_err);
+    object_property_set_bool(cpu, true, "realized", &local_err);
+
+    object_unref(cpu);
+    error_propagate(errp, local_err);
+}
+
+/* Calculates initial APIC ID for a specific CPU index
+ *
+ * Currently we need to be able to calculate the APIC ID from the CPU index
+ * alone (without requiring a CPU object), as the QEMU<->Seabios interfaces have
+ * no concept of "CPU index", and the NUMA tables on fw_cfg need the APIC ID of
+ * all CPUs up to max_cpus.
+ */
+uint32_t cpu_apicid_from_index(unsigned int cpu_index, bool compat)
+{
+    uint32_t correct_id;
+    static bool warned;
+
+    correct_id = x86_apicid_from_cpu_idx(smp_cores, smp_threads, cpu_index);
+    if (compat) {
+        if (cpu_index != correct_id && !warned && !qtest_enabled()) {
+            error_report("APIC IDs set in compatibility mode, "
+                         "CPU topology won't match the configuration");
+            warned = true;
+        }
+        return cpu_index;
+    } else {
+        return correct_id;
+    }
+}
+
+CpuInstanceProperties cpu_index_to_props(MachineState *ms, unsigned cpu_index)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+
+    assert(cpu_index < possible_cpus->len);
+    return possible_cpus->cpus[cpu_index].props;
+}
+
+
+int64_t cpu_get_default_cpu_node_id(const MachineState *ms, int idx)
+{
+   X86CPUTopoInfo topo;
+
+   assert(idx < ms->possible_cpus->len);
+   x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id,
+                            smp_cores, smp_threads, &topo);
+   return topo.pkg_id % nb_numa_nodes;
+}
+
+const CPUArchIdList *cpu_possible_cpu_arch_ids(MachineState *ms)
+{
+    int i;
+
+    if (ms->possible_cpus) {
+        /*
+         * make sure that max_cpus hasn't changed since the first use, i.e.
+         * -smp hasn't been parsed after it
+        */
+        assert(ms->possible_cpus->len == max_cpus);
+        return ms->possible_cpus;
+    }
+
+    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
+                                  sizeof(CPUArchId) * max_cpus);
+    ms->possible_cpus->len = max_cpus;
+    for (i = 0; i < ms->possible_cpus->len; i++) {
+        X86CPUTopoInfo topo;
+
+        ms->possible_cpus->cpus[i].type = ms->cpu_type;
+        ms->possible_cpus->cpus[i].vcpus_count = 1;
+        ms->possible_cpus->cpus[i].arch_id = cpu_apicid_from_index(i, compat_apic_id_mode);
+        x86_topo_ids_from_apicid(ms->possible_cpus->cpus[i].arch_id,
+                                 smp_cores, smp_threads, &topo);
+        ms->possible_cpus->cpus[i].props.has_socket_id = true;
+        ms->possible_cpus->cpus[i].props.socket_id = topo.pkg_id;
+        ms->possible_cpus->cpus[i].props.has_core_id = true;
+        ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
+        ms->possible_cpus->cpus[i].props.has_thread_id = true;
+        ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
+    }
+    return ms->possible_cpus;
+}
+
+
+void cpu_hot_add(const int64_t id, Error **errp)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    int64_t apic_id = cpu_apicid_from_index(id, compat_apic_id_mode);
+    Error *local_err = NULL;
+
+    if (id < 0) {
+        error_setg(errp, "Invalid CPU id: %" PRIi64, id);
+        return;
+    }
+
+    if (apic_id >= ACPI_CPU_HOTPLUG_ID_LIMIT) {
+        error_setg(errp, "Unable to add CPU: %" PRIi64
+                   ", resulting APIC ID (%" PRIi64 ") is too large",
+                   id, apic_id);
+        return;
+    }
+
+    cpu_new(ms->cpu_type, apic_id, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+}
+
+uint32_t cpus_init(MachineState *ms, bool compat)
+{
+    int i;
+    uint32_t apic_id_limit;
+    const CPUArchIdList *possible_cpus;
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+
+    /* Calculates the limit to CPU APIC ID values
+     *
+     * Limit for the APIC ID value, so that all
+     * CPU APIC IDs are < ms->apic_id_limit.
+     *
+     * This is used for FW_CFG_MAX_CPUS. See comments on bochs_bios_init().
+     */
+    apic_id_limit = cpu_apicid_from_index(max_cpus - 1, compat) + 1;
+    possible_cpus = mc->possible_cpu_arch_ids(ms);
+    for (i = 0; i < smp_cpus; i++) {
+        cpu_new(possible_cpus->cpus[i].type, possible_cpus->cpus[i].arch_id,
+                &error_fatal);
+    }
+
+    return apic_id_limit;
+}
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e96360b47a..07d67a5031 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -26,6 +26,7 @@
 #include "qemu/units.h"
 #include "hw/hw.h"
 #include "hw/i386/pc.h"
+#include "hw/i386/cpu-internal.h"
 #include "hw/char/serial.h"
 #include "hw/char/parallel.h"
 #include "hw/i386/apic.h"
@@ -914,38 +915,13 @@ bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length)
 }
 
 /* Enables contiguous-apic-ID mode, for compatibility */
-static bool compat_apic_id_mode;
+bool compat_apic_id_mode;
 
 void enable_compat_apic_id_mode(void)
 {
     compat_apic_id_mode = true;
 }
 
-/* Calculates initial APIC ID for a specific CPU index
- *
- * Currently we need to be able to calculate the APIC ID from the CPU index
- * alone (without requiring a CPU object), as the QEMU<->Seabios interfaces have
- * no concept of "CPU index", and the NUMA tables on fw_cfg need the APIC ID of
- * all CPUs up to max_cpus.
- */
-static uint32_t x86_cpu_apic_id_from_index(unsigned int cpu_index)
-{
-    uint32_t correct_id;
-    static bool warned;
-
-    correct_id = x86_apicid_from_cpu_idx(smp_cores, smp_threads, cpu_index);
-    if (compat_apic_id_mode) {
-        if (cpu_index != correct_id && !warned && !qtest_enabled()) {
-            error_report("APIC IDs set in compatibility mode, "
-                         "CPU topology won't match the configuration");
-            warned = true;
-        }
-        return cpu_index;
-    } else {
-        return correct_id;
-    }
-}
-
 static void pc_build_smbios(PCMachineState *pcms)
 {
     uint8_t *smbios_tables, *smbios_anchor;
@@ -1516,67 +1492,6 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level)
     }
 }
 
-static void pc_new_cpu(const char *typename, int64_t apic_id, Error **errp)
-{
-    Object *cpu = NULL;
-    Error *local_err = NULL;
-
-    cpu = object_new(typename);
-
-    object_property_set_uint(cpu, apic_id, "apic-id", &local_err);
-    object_property_set_bool(cpu, true, "realized", &local_err);
-
-    object_unref(cpu);
-    error_propagate(errp, local_err);
-}
-
-void pc_hot_add_cpu(const int64_t id, Error **errp)
-{
-    MachineState *ms = MACHINE(qdev_get_machine());
-    int64_t apic_id = x86_cpu_apic_id_from_index(id);
-    Error *local_err = NULL;
-
-    if (id < 0) {
-        error_setg(errp, "Invalid CPU id: %" PRIi64, id);
-        return;
-    }
-
-    if (apic_id >= ACPI_CPU_HOTPLUG_ID_LIMIT) {
-        error_setg(errp, "Unable to add CPU: %" PRIi64
-                   ", resulting APIC ID (%" PRIi64 ") is too large",
-                   id, apic_id);
-        return;
-    }
-
-    pc_new_cpu(ms->cpu_type, apic_id, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        return;
-    }
-}
-
-void pc_cpus_init(PCMachineState *pcms)
-{
-    int i;
-    const CPUArchIdList *possible_cpus;
-    MachineState *ms = MACHINE(pcms);
-    MachineClass *mc = MACHINE_GET_CLASS(pcms);
-
-    /* Calculates the limit to CPU APIC ID values
-     *
-     * Limit for the APIC ID value, so that all
-     * CPU APIC IDs are < pcms->apic_id_limit.
-     *
-     * This is used for FW_CFG_MAX_CPUS. See comments on bochs_bios_init().
-     */
-    pcms->apic_id_limit = x86_cpu_apic_id_from_index(max_cpus - 1) + 1;
-    possible_cpus = mc->possible_cpu_arch_ids(ms);
-    for (i = 0; i < smp_cpus; i++) {
-        pc_new_cpu(possible_cpus->cpus[i].type, possible_cpus->cpus[i].arch_id,
-                   &error_fatal);
-    }
-}
-
 static void pc_build_feature_control_file(PCMachineState *pcms)
 {
     MachineState *ms = MACHINE(pcms);
@@ -2638,60 +2553,6 @@ static void pc_machine_reset(void)
     }
 }
 
-static CpuInstanceProperties
-pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
-{
-    MachineClass *mc = MACHINE_GET_CLASS(ms);
-    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
-
-    assert(cpu_index < possible_cpus->len);
-    return possible_cpus->cpus[cpu_index].props;
-}
-
-static int64_t pc_get_default_cpu_node_id(const MachineState *ms, int idx)
-{
-   X86CPUTopoInfo topo;
-
-   assert(idx < ms->possible_cpus->len);
-   x86_topo_ids_from_apicid(ms->possible_cpus->cpus[idx].arch_id,
-                            smp_cores, smp_threads, &topo);
-   return topo.pkg_id % nb_numa_nodes;
-}
-
-static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
-{
-    int i;
-
-    if (ms->possible_cpus) {
-        /*
-         * make sure that max_cpus hasn't changed since the first use, i.e.
-         * -smp hasn't been parsed after it
-        */
-        assert(ms->possible_cpus->len == max_cpus);
-        return ms->possible_cpus;
-    }
-
-    ms->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
-                                  sizeof(CPUArchId) * max_cpus);
-    ms->possible_cpus->len = max_cpus;
-    for (i = 0; i < ms->possible_cpus->len; i++) {
-        X86CPUTopoInfo topo;
-
-        ms->possible_cpus->cpus[i].type = ms->cpu_type;
-        ms->possible_cpus->cpus[i].vcpus_count = 1;
-        ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(i);
-        x86_topo_ids_from_apicid(ms->possible_cpus->cpus[i].arch_id,
-                                 smp_cores, smp_threads, &topo);
-        ms->possible_cpus->cpus[i].props.has_socket_id = true;
-        ms->possible_cpus->cpus[i].props.socket_id = topo.pkg_id;
-        ms->possible_cpus->cpus[i].props.has_core_id = true;
-        ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
-        ms->possible_cpus->cpus[i].props.has_thread_id = true;
-        ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
-    }
-    return ms->possible_cpus;
-}
-
 static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
 {
     /* cpu index isn't used */
@@ -2732,13 +2593,13 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
     pcmc->pvh_enabled = true;
     assert(!mc->get_hotplug_handler);
     mc->get_hotplug_handler = pc_get_hotplug_handler;
-    mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
-    mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
-    mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+    mc->cpu_index_to_instance_props = cpu_index_to_props;
+    mc->get_default_cpu_node_id = cpu_get_default_cpu_node_id;
+    mc->possible_cpu_arch_ids = cpu_possible_cpu_arch_ids;
     mc->auto_enable_numa_with_memhp = true;
     mc->has_hotpluggable_cpus = true;
     mc->default_boot_order = "cad";
-    mc->hot_add_cpu = pc_hot_add_cpu;
+    mc->hot_add_cpu = cpu_hot_add;
     mc->block_default_type = IF_IDE;
     mc->max_cpus = 255;
     mc->reset = pc_machine_reset;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index c07c4a5b38..1e240004dd 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -28,6 +28,7 @@
 #include "hw/hw.h"
 #include "hw/loader.h"
 #include "hw/i386/pc.h"
+#include "hw/i386/cpu-internal.h"
 #include "hw/i386/apic.h"
 #include "hw/display/ramfb.h"
 #include "hw/firmware/smbios.h"
@@ -150,7 +151,7 @@ static void pc_init1(MachineState *machine,
         }
     }
 
-    pc_cpus_init(pcms);
+    pcms->apic_id_limit = cpus_init(machine, compat_apic_id_mode);
 
     if (kvm_enabled() && pcmc->kvmclock_enabled) {
         kvmclock_create();
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 57232aed6b..308cd04a13 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -43,6 +43,7 @@
 #include "hw/pci-host/q35.h"
 #include "exec/address-spaces.h"
 #include "hw/i386/pc.h"
+#include "hw/i386/cpu-internal.h"
 #include "hw/i386/ich9.h"
 #include "hw/i386/amd_iommu.h"
 #include "hw/i386/intel_iommu.h"
@@ -180,7 +181,7 @@ static void pc_q35_init(MachineState *machine)
         xen_hvm_init(pcms, &ram_memory);
     }
 
-    pc_cpus_init(pcms);
+    pcms->apic_id_limit = cpus_init(machine, compat_apic_id_mode);
 
     kvmclock_create();
 
diff --git a/include/hw/i386/apic.h b/include/hw/i386/apic.h
index da1d2fe155..f72be753b8 100644
--- a/include/hw/i386/apic.h
+++ b/include/hw/i386/apic.h
@@ -23,5 +23,6 @@ int apic_get_highest_priority_irr(DeviceState *dev);
 
 /* pc.c */
 DeviceState *cpu_get_current_apic(void);
+extern bool compat_apic_id_mode;
 
 #endif
diff --git a/include/hw/i386/cpu-internal.h b/include/hw/i386/cpu-internal.h
new file mode 100644
index 0000000000..48a5253aa9
--- /dev/null
+++ b/include/hw/i386/cpu-internal.h
@@ -0,0 +1,32 @@
+/*
+ *
+ * Copyright (c) 2018 Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_I386_CPU_H
+#define QEMU_I386_CPU_H
+
+#include "hw/boards.h"
+
+uint32_t cpu_apicid_from_index(unsigned int cpu_index, bool compat);
+
+CpuInstanceProperties cpu_index_to_props(MachineState *ms, unsigned cpu_index);
+int64_t cpu_get_default_cpu_node_id(const MachineState *ms, int idx);
+const CPUArchIdList *cpu_possible_cpu_arch_ids(MachineState *ms);
+
+void cpu_hot_add(const int64_t id, Error **errp);
+uint32_t cpus_init(MachineState *ms, bool compat);
+
+#endif
-- 
2.21.0




* [Qemu-devel] [PATCH 2/4] hw/virtio: Factorize virtio-mmio headers
  2019-06-28 11:53 [Qemu-devel] [PATCH 0/4] Introduce the microvm machine type Sergio Lopez
  2019-06-28 11:53 ` [Qemu-devel] [PATCH 1/4] hw/i386: Factorize CPU routine Sergio Lopez
@ 2019-06-28 11:53 ` Sergio Lopez
  2019-06-28 14:03   ` Michael S. Tsirkin
  2019-06-28 11:53 ` [Qemu-devel] [PATCH 3/4] hw/i386: Add an Intel MPTable generator Sergio Lopez
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Sergio Lopez @ 2019-06-28 11:53 UTC
  To: mst, marcel.apfelbaum, pbonzini, rth, ehabkost; +Cc: qemu-devel, Sergio Lopez

Put the QOM macros and the main struct definition in a separate header
file, so they can be accessed from other components.

This is needed for the microvm machine type implementation.
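
As a small illustration of what a consumer of the new header gets, the
sketch below decodes the VIRT_MAGIC and VIRT_VENDOR register values into
the ASCII tags mentioned in their comments. The two constants are copied
from the patch; the helper name is hypothetical and the example is
standalone rather than built against QEMU:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from hw/virtio/virtio-mmio.h in this patch. */
#define VIRT_MAGIC 0x74726976 /* 'virt' */
#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */

/*
 * Hypothetical helper: unpack a 32-bit register value into four
 * characters, least significant byte first, and NUL-terminate it.
 */
static void reg_to_ascii(uint32_t value, char out[5])
{
    size_t i;

    assert(out != NULL);
    for (i = 0; i < 4; i++) {
        out[i] = (char)((value >> (8 * i)) & 0xff);
    }
    out[4] = '\0';
}
```

Reading the bytes in little-endian order is what turns 0x74726976 into
"virt" and 0x554D4551 into "QEMU".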

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/virtio/virtio-mmio.c | 35 +-----------------------
 hw/virtio/virtio-mmio.h | 60 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+), 34 deletions(-)
 create mode 100644 hw/virtio/virtio-mmio.h

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 97b7f35496..87c7fe4d8d 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -26,44 +26,11 @@
 #include "qemu/host-utils.h"
 #include "qemu/module.h"
 #include "sysemu/kvm.h"
-#include "hw/virtio/virtio-bus.h"
+#include "virtio-mmio.h"
 #include "qemu/error-report.h"
 #include "qemu/log.h"
 #include "trace.h"
 
-/* QOM macros */
-/* virtio-mmio-bus */
-#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
-#define VIRTIO_MMIO_BUS(obj) \
-        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
-#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
-        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
-#define VIRTIO_MMIO_BUS_CLASS(klass) \
-        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
-
-/* virtio-mmio */
-#define TYPE_VIRTIO_MMIO "virtio-mmio"
-#define VIRTIO_MMIO(obj) \
-        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
-
-#define VIRT_MAGIC 0x74726976 /* 'virt' */
-#define VIRT_VERSION 1
-#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
-
-typedef struct {
-    /* Generic */
-    SysBusDevice parent_obj;
-    MemoryRegion iomem;
-    qemu_irq irq;
-    /* Guest accessible state needing migration and reset */
-    uint32_t host_features_sel;
-    uint32_t guest_features_sel;
-    uint32_t guest_page_shift;
-    /* virtio-bus */
-    VirtioBusState bus;
-    bool format_transport_address;
-} VirtIOMMIOProxy;
-
 static bool virtio_mmio_ioeventfd_enabled(DeviceState *d)
 {
     return kvm_eventfds_enabled();
diff --git a/hw/virtio/virtio-mmio.h b/hw/virtio/virtio-mmio.h
new file mode 100644
index 0000000000..2f3973f8c7
--- /dev/null
+++ b/hw/virtio/virtio-mmio.h
@@ -0,0 +1,60 @@
+/*
+ * Virtio MMIO bindings
+ *
+ * Copyright (c) 2011 Linaro Limited
+ *
+ * Author:
+ *  Peter Maydell <peter.maydell@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_VIRTIO_MMIO_H
+#define QEMU_VIRTIO_MMIO_H
+
+#include "hw/virtio/virtio-bus.h"
+
+/* QOM macros */
+/* virtio-mmio-bus */
+#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
+#define VIRTIO_MMIO_BUS(obj) \
+        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
+#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
+        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
+#define VIRTIO_MMIO_BUS_CLASS(klass) \
+        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
+
+/* virtio-mmio */
+#define TYPE_VIRTIO_MMIO "virtio-mmio"
+#define VIRTIO_MMIO(obj) \
+        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
+
+#define VIRT_MAGIC 0x74726976 /* 'virt' */
+#define VIRT_VERSION 1
+#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
+
+typedef struct {
+    /* Generic */
+    SysBusDevice parent_obj;
+    MemoryRegion iomem;
+    qemu_irq irq;
+    /* Guest accessible state needing migration and reset */
+    uint32_t host_features_sel;
+    uint32_t guest_features_sel;
+    uint32_t guest_page_shift;
+    /* virtio-bus */
+    VirtioBusState bus;
+    bool format_transport_address;
+} VirtIOMMIOProxy;
+
+#endif
-- 
2.21.0




* [Qemu-devel] [PATCH 3/4] hw/i386: Add an Intel MPTable generator
  2019-06-28 11:53 [Qemu-devel] [PATCH 0/4] Introduce the microvm machine type Sergio Lopez
  2019-06-28 11:53 ` [Qemu-devel] [PATCH 1/4] hw/i386: Factorize CPU routine Sergio Lopez
  2019-06-28 11:53 ` [Qemu-devel] [PATCH 2/4] hw/virtio: Factorize virtio-mmio headers Sergio Lopez
@ 2019-06-28 11:53 ` Sergio Lopez
  2019-06-28 11:53 ` [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type Sergio Lopez
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Sergio Lopez @ 2019-06-28 11:53 UTC
  To: mst, marcel.apfelbaum, pbonzini, rth, ehabkost; +Cc: qemu-devel, Sergio Lopez

Add a helper function (mptable_generate) for generating an Intel
MPTable according to version 1.4 of the specification.

This is needed for the microvm machine type implementation.
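
For readers unfamiliar with the format: the MP specification requires that
all bytes of a structure, including its checksum byte, add up to zero
modulo 256. The sketch below shows that rule in isolation. The function
names are hypothetical and this is not the code from mptable.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes in "buf", truncated to 8 bits. */
static uint8_t mp_byte_sum(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        sum = (uint8_t)(sum + buf[i]);
    }
    return sum;
}

/*
 * Value to store in the checksum field, assuming that field is still
 * zero inside "buf": the 8-bit two's complement of the byte sum.
 */
static uint8_t mp_checksum_for(const uint8_t *buf, size_t len)
{
    return (uint8_t)(0u - mp_byte_sum(buf, len));
}
```

A consumer validates a table by checking that mp_byte_sum() over the
complete structure, checksum byte included, returns 0.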

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/i386/mptable.c                           | 157 +++++++++++++++++
 include/hw/i386/mptable.h                   |  37 ++++
 include/standard-headers/linux/mpspec_def.h | 182 ++++++++++++++++++++
 3 files changed, 376 insertions(+)
 create mode 100644 hw/i386/mptable.c
 create mode 100644 include/hw/i386/mptable.h
 create mode 100644 include/standard-headers/linux/mpspec_def.h

diff --git a/hw/i386/mptable.c b/hw/i386/mptable.c
new file mode 100644
index 0000000000..c5cb57dd18
--- /dev/null
+++ b/hw/i386/mptable.c
@@ -0,0 +1,157 @@
+/*
+ * Intel MPTable generator
+ *
+ * Copyright (C) 2019 Red Hat, Inc.
+ *
+ * Authors:
+ *   Sergio Lopez <slp@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/i386/mptable.h"
+#include "standard-headers/linux/mpspec_def.h"
+
+static int mptable_checksum(char *buf, int size)
+{
+    int i;
+    int checksum = 0;
+
+    for (i = 0; i < size; i++) {
+        checksum += buf[i];
+    }
+
+    return checksum;
+}
+
+/*
+ * Generate an MPTable for "ncpus". "apic_id" must be the next available
+ * APIC ID (last CPU apic_id + 1). "table_base" is the physical location
+ * in the Guest where the caller intends to write the table, needed to
+ * fill the "physptr" field from the "mpf_intel" structure.
+ *
+ * On success, return a newly allocated buffer, that must be freed by the
+ * caller using "g_free" when it's no longer needed, and update
+ * "mptable_size" with the size of the buffer.
+ */
+char *mptable_generate(int ncpus, int apic_id,
+                        int table_base, int *mptable_size)
+{
+    struct mpf_intel *mpf;
+    struct mpc_table *table;
+    struct mpc_cpu *cpu;
+    struct mpc_bus *bus;
+    struct mpc_ioapic *ioapic;
+    struct mpc_intsrc *intsrc;
+    struct mpc_lintsrc *lintsrc;
+    const char mpc_signature[] = MPC_SIGNATURE;
+    const char smp_magic_ident[] = "_MP_";
+    char *mptable;
+    int checksum = 0;
+    int offset = 0;
+    int ssize;
+    int i;
+
+    ssize = sizeof(struct mpf_intel);
+    mptable = g_malloc0(ssize);
+
+    mpf = (struct mpf_intel *) mptable;
+    memcpy(mpf->signature, smp_magic_ident, sizeof(smp_magic_ident) - 1);
+    mpf->length = 1;
+    mpf->specification = 4;
+    mpf->physptr = table_base + ssize;
+    mpf->checksum -= mptable_checksum((char *) mpf, ssize);
+    offset = ssize + sizeof(struct mpc_table);
+
+    ssize = sizeof(struct mpc_cpu);
+    for (i = 0; i < ncpus; i++) {
+        mptable = g_realloc(mptable, offset + ssize);
+        cpu = (struct mpc_cpu *) (mptable + offset);
+        cpu->type = MP_PROCESSOR;
+        cpu->apicid = i;
+        cpu->apicver = APIC_VERSION;
+        cpu->cpuflag = CPU_ENABLED;
+        if (i == 0) {
+            cpu->cpuflag |= CPU_BOOTPROCESSOR;
+        }
+        cpu->cpufeature = CPU_STEPPING;
+        cpu->featureflag = CPU_FEATURE_APIC | CPU_FEATURE_FPU;
+        checksum += mptable_checksum((char *) cpu, ssize);
+        offset += ssize;
+    }
+
+    ssize = sizeof(struct mpc_bus);
+    mptable = g_realloc(mptable, offset + ssize);
+    bus = (struct mpc_bus *) (mptable + offset);
+    bus->type = MP_BUS;
+    bus->busid = 0;
+    memcpy(bus->bustype, BUS_TYPE_ISA, sizeof(BUS_TYPE_ISA) - 1);
+    checksum += mptable_checksum((char *) bus, ssize);
+    offset += ssize;
+
+    ssize = sizeof(struct mpc_ioapic);
+    mptable = g_realloc(mptable, offset + ssize);
+    ioapic = (struct mpc_ioapic *) (mptable + offset);
+    ioapic->type = MP_IOAPIC;
+    ioapic->apicid = ncpus + 1;
+    ioapic->apicver = APIC_VERSION;
+    ioapic->flags = MPC_APIC_USABLE;
+    ioapic->apicaddr = IO_APIC_DEFAULT_PHYS_BASE;
+    checksum += mptable_checksum((char *) ioapic, ssize);
+    offset += ssize;
+
+    ssize = sizeof(struct mpc_intsrc);
+    for (i = 0; i < 16; i++) {
+        mptable = g_realloc(mptable, offset + ssize);
+        intsrc = (struct mpc_intsrc *) (mptable + offset);
+        intsrc->type = MP_INTSRC;
+        intsrc->irqtype = mp_INT;
+        intsrc->irqflag = MP_IRQDIR_DEFAULT;
+        intsrc->srcbus = 0;
+        intsrc->srcbusirq = i;
+        intsrc->dstapic = ncpus + 1;
+        intsrc->dstirq = i;
+        checksum += mptable_checksum((char *) intsrc, ssize);
+        offset += ssize;
+    }
+
+    ssize = sizeof(struct mpc_lintsrc);
+    mptable = g_realloc(mptable, offset + (ssize * 2));
+    lintsrc = (struct mpc_lintsrc *) (mptable + offset);
+    lintsrc->type = MP_LINTSRC;
+    lintsrc->irqtype = mp_ExtINT;
+    lintsrc->irqflag = MP_IRQDIR_DEFAULT;
+    lintsrc->srcbusid = 0;
+    lintsrc->srcbusirq = 0;
+    lintsrc->destapic = 0;
+    lintsrc->destapiclint = 0;
+    checksum += mptable_checksum((char *) lintsrc, ssize);
+    offset += ssize;
+
+    lintsrc = (struct mpc_lintsrc *) (mptable + offset);
+    lintsrc->type = MP_LINTSRC;
+    lintsrc->irqtype = mp_NMI;
+    lintsrc->irqflag = MP_IRQDIR_DEFAULT;
+    lintsrc->srcbusid = 0;
+    lintsrc->srcbusirq = 0;
+    lintsrc->destapic = 0xFF;
+    lintsrc->destapiclint = 1;
+    checksum += mptable_checksum((char *) lintsrc, ssize);
+    offset += ssize;
+
+    ssize = sizeof(struct mpc_table);
+    table = (struct mpc_table *) (mptable + sizeof(struct mpf_intel));
+    memcpy(table->signature, mpc_signature, sizeof(mpc_signature) - 1);
+    table->length = offset - sizeof(struct mpf_intel);
+    table->spec = MPC_SPEC;
+    memcpy(table->oem, MPC_OEM, sizeof(MPC_OEM) - 1);
+    memcpy(table->productid, MPC_PRODUCT_ID, sizeof(MPC_PRODUCT_ID) - 1);
+    table->lapic = APIC_DEFAULT_PHYS_BASE;
+    checksum += mptable_checksum((char *) table, ssize);
+    table->checksum -= checksum;
+
+    *mptable_size = offset;
+    return mptable;
+}
diff --git a/include/hw/i386/mptable.h b/include/hw/i386/mptable.h
new file mode 100644
index 0000000000..9f9eb82618
--- /dev/null
+++ b/include/hw/i386/mptable.h
@@ -0,0 +1,37 @@
+/*
+ * Intel MPTable generator
+ *
+ * Copyright (C) 2019 Red Hat, Inc.
+ *
+ * Authors:
+ *   Sergio Lopez <slp@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_I386_MPTABLE_H
+#define HW_I386_MPTABLE_H
+
+#define APIC_VERSION     0x14
+#define CPU_STEPPING     0x600
+#define CPU_FEATURE_APIC 0x200
+#define CPU_FEATURE_FPU  0x001
+#define MPC_SPEC         0x4
+
+#define MP_IRQDIR_DEFAULT 0
+#define MP_IRQDIR_HIGH    1
+#define MP_IRQDIR_LOW     3
+
+static const char MPC_OEM[]        = "QEMU    ";
+static const char MPC_PRODUCT_ID[] = "000000000000";
+static const char BUS_TYPE_ISA[]   = "ISA   ";
+
+#define IO_APIC_DEFAULT_PHYS_BASE 0xfec00000
+#define APIC_DEFAULT_PHYS_BASE    0xfee00000
+#define APIC_VERSION              0x14
+
+char *mptable_generate(int ncpus, int apic_id,
+                       int table_base, int *mptable_size);
+
+#endif
diff --git a/include/standard-headers/linux/mpspec_def.h b/include/standard-headers/linux/mpspec_def.h
new file mode 100644
index 0000000000..6fb923a343
--- /dev/null
+++ b/include/standard-headers/linux/mpspec_def.h
@@ -0,0 +1,182 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_MPSPEC_DEF_H
+#define _ASM_X86_MPSPEC_DEF_H
+
+/*
+ * Structure definitions for SMP machines following the
+ * Intel Multiprocessing Specification 1.1 and 1.4.
+ */
+
+/*
+ * This tag identifies where the SMP configuration
+ * information is.
+ */
+
+#define SMP_MAGIC_IDENT	(('_'<<24) | ('P'<<16) | ('M'<<8) | '_')
+
+#ifdef CONFIG_X86_32
+# define MAX_MPC_ENTRY 1024
+#endif
+
+/* Intel MP Floating Pointer Structure */
+struct mpf_intel {
+	char signature[4];		/* "_MP_"			*/
+	unsigned int physptr;		/* Configuration table address	*/
+	unsigned char length;		/* Our length (paragraphs)	*/
+	unsigned char specification;	/* Specification version	*/
+	unsigned char checksum;		/* Checksum (makes sum 0)	*/
+	unsigned char feature1;		/* Standard or configuration ?	*/
+	unsigned char feature2;		/* Bit7 set for IMCR|PIC	*/
+	unsigned char feature3;		/* Unused (0)			*/
+	unsigned char feature4;		/* Unused (0)			*/
+	unsigned char feature5;		/* Unused (0)			*/
+};
+
+#define MPC_SIGNATURE "PCMP"
+
+struct mpc_table {
+	char signature[4];
+	unsigned short length;		/* Size of table */
+	char spec;			/* 0x01 */
+	char checksum;
+	char oem[8];
+	char productid[12];
+	unsigned int oemptr;		/* 0 if not present */
+	unsigned short oemsize;		/* 0 if not present */
+	unsigned short oemcount;
+	unsigned int lapic;		/* APIC address */
+	unsigned int reserved;
+};
+
+/* Followed by entries */
+
+#define	MP_PROCESSOR		0
+#define	MP_BUS			1
+#define	MP_IOAPIC		2
+#define	MP_INTSRC		3
+#define	MP_LINTSRC		4
+/* Used by IBM NUMA-Q to describe node locality */
+#define	MP_TRANSLATION		192
+
+#define CPU_ENABLED		1	/* Processor is available */
+#define CPU_BOOTPROCESSOR	2	/* Processor is the boot CPU */
+
+#define CPU_STEPPING_MASK	0x000F
+#define CPU_MODEL_MASK		0x00F0
+#define CPU_FAMILY_MASK		0x0F00
+
+struct mpc_cpu {
+	unsigned char type;
+	unsigned char apicid;		/* Local APIC number */
+	unsigned char apicver;		/* Its versions */
+	unsigned char cpuflag;
+	unsigned int cpufeature;
+	unsigned int featureflag;	/* CPUID feature value */
+	unsigned int reserved[2];
+};
+
+struct mpc_bus {
+	unsigned char type;
+	unsigned char busid;
+	unsigned char bustype[6];
+};
+
+/* List of Bus Type string values, Intel MP Spec. */
+#define BUSTYPE_EISA	"EISA"
+#define BUSTYPE_ISA	"ISA"
+#define BUSTYPE_INTERN	"INTERN"	/* Internal BUS */
+#define BUSTYPE_MCA	"MCA"		/* Obsolete */
+#define BUSTYPE_VL	"VL"		/* Local bus */
+#define BUSTYPE_PCI	"PCI"
+#define BUSTYPE_PCMCIA	"PCMCIA"
+#define BUSTYPE_CBUS	"CBUS"
+#define BUSTYPE_CBUSII	"CBUSII"
+#define BUSTYPE_FUTURE	"FUTURE"
+#define BUSTYPE_MBI	"MBI"
+#define BUSTYPE_MBII	"MBII"
+#define BUSTYPE_MPI	"MPI"
+#define BUSTYPE_MPSA	"MPSA"
+#define BUSTYPE_NUBUS	"NUBUS"
+#define BUSTYPE_TC	"TC"
+#define BUSTYPE_VME	"VME"
+#define BUSTYPE_XPRESS	"XPRESS"
+
+#define MPC_APIC_USABLE		0x01
+
+struct mpc_ioapic {
+	unsigned char type;
+	unsigned char apicid;
+	unsigned char apicver;
+	unsigned char flags;
+	unsigned int apicaddr;
+};
+
+struct mpc_intsrc {
+	unsigned char type;
+	unsigned char irqtype;
+	unsigned short irqflag;
+	unsigned char srcbus;
+	unsigned char srcbusirq;
+	unsigned char dstapic;
+	unsigned char dstirq;
+};
+
+enum mp_irq_source_types {
+	mp_INT = 0,
+	mp_NMI = 1,
+	mp_SMI = 2,
+	mp_ExtINT = 3
+};
+
+#define MP_IRQPOL_DEFAULT	0x0
+#define MP_IRQPOL_ACTIVE_HIGH	0x1
+#define MP_IRQPOL_RESERVED	0x2
+#define MP_IRQPOL_ACTIVE_LOW	0x3
+#define MP_IRQPOL_MASK		0x3
+
+#define MP_IRQTRIG_DEFAULT	0x0
+#define MP_IRQTRIG_EDGE		0x4
+#define MP_IRQTRIG_RESERVED	0x8
+#define MP_IRQTRIG_LEVEL	0xc
+#define MP_IRQTRIG_MASK		0xc
+
+#define MP_APIC_ALL	0xFF
+
+struct mpc_lintsrc {
+	unsigned char type;
+	unsigned char irqtype;
+	unsigned short irqflag;
+	unsigned char srcbusid;
+	unsigned char srcbusirq;
+	unsigned char destapic;
+	unsigned char destapiclint;
+};
+
+#define MPC_OEM_SIGNATURE "_OEM"
+
+struct mpc_oemtable {
+	char signature[4];
+	unsigned short length;		/* Size of table */
+	char  rev;			/* 0x01 */
+	char  checksum;
+	char  mpc[8];
+};
+
+/*
+ *	Default configurations
+ *
+ *	1	2 CPU ISA 82489DX
+ *	2	2 CPU EISA 82489DX neither IRQ 0 timer nor IRQ 13 DMA chaining
+ *	3	2 CPU EISA 82489DX
+ *	4	2 CPU MCA 82489DX
+ *	5	2 CPU ISA+PCI
+ *	6	2 CPU EISA+PCI
+ *	7	2 CPU MCA+PCI
+ */
+
+enum mp_bustype {
+	MP_BUS_ISA = 1,
+	MP_BUS_EISA,
+	MP_BUS_PCI,
+};
+#endif /* _ASM_X86_MPSPEC_DEF_H */
-- 
2.21.0
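
The checksum handling in mptable_generate() follows the MP specification
convention that all bytes of a structure, including the checksum byte
itself, must sum to zero modulo 256. The following is a minimal
standalone sketch of that convention, not code from the patch: the
struct mp_floating_ptr type and the byte_sum() and
mp_floating_ptr_init() helpers are hypothetical stand-ins for
struct mpf_intel and mptable_checksum().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 16-byte MP floating pointer structure. */
struct mp_floating_ptr {
    char     signature[4];   /* "_MP_" */
    uint32_t physptr;        /* address of the configuration table */
    uint8_t  length;         /* length in 16-byte paragraphs */
    uint8_t  specification;  /* 4 selects MP spec 1.4 */
    uint8_t  checksum;       /* chosen so that all bytes sum to zero */
    uint8_t  feature[5];
};

/* Sum every byte of a buffer modulo 256. */
static uint8_t byte_sum(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        sum = (uint8_t)(sum + p[i]);
    }
    return sum;
}

/* Fill in the structure and balance it with the checksum byte. */
static void mp_floating_ptr_init(struct mp_floating_ptr *mpf,
                                 uint32_t table_addr)
{
    memset(mpf, 0, sizeof(*mpf));
    memcpy(mpf->signature, "_MP_", 4);
    mpf->physptr = table_addr;
    mpf->length = 1;
    mpf->specification = 4;
    /* The checksum field is still zero here, so negating the sum works. */
    mpf->checksum = (uint8_t)(0u - byte_sum(mpf, sizeof(*mpf)));
}
```

A guest validating the table only needs to run byte_sum() over the
structure and check that the result is zero.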




* [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type
  2019-06-28 11:53 [Qemu-devel] [PATCH 0/4] Introduce the microvm machine type Sergio Lopez
                   ` (2 preceding siblings ...)
  2019-06-28 11:53 ` [Qemu-devel] [PATCH 3/4] hw/i386: Add an Intel MPTable generator Sergio Lopez
@ 2019-06-28 11:53 ` Sergio Lopez
  2019-06-28 14:06   ` Michael S. Tsirkin
                     ` (2 more replies)
  2019-06-28 13:21 ` [Qemu-devel] [PATCH 0/4] " Paolo Bonzini
                   ` (2 subsequent siblings)
  6 siblings, 3 replies; 29+ messages in thread
From: Sergio Lopez @ 2019-06-28 11:53 UTC (permalink / raw)
  To: mst, marcel.apfelbaum, pbonzini, rth, ehabkost; +Cc: qemu-devel, Sergio Lopez

Microvm is a machine type inspired by both NEMU and Firecracker, and
constructed after the machine model implemented by the latter.

Its main purpose is to provide users with a KVM-only machine type with
fast boot times, a minimal attack surface (measured as the number of IO
ports and MMIO regions exposed to the guest) and a small footprint
(especially when combined with the ongoing QEMU modularization effort).

Normally, other than the device support provided by KVM itself,
microvm only supports virtio-mmio devices. Microvm also includes a
legacy mode, which adds an ISA bus with a 16550A serial port so that
early-boot kernel messages can be seen.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 default-configs/i386-softmmu.mak |   1 +
 hw/i386/Kconfig                  |   4 +
 hw/i386/Makefile.objs            |   1 +
 hw/i386/microvm.c                | 518 +++++++++++++++++++++++++++++++
 include/hw/i386/microvm.h        |  85 +++++
 5 files changed, 609 insertions(+)
 create mode 100644 hw/i386/microvm.c
 create mode 100644 include/hw/i386/microvm.h
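
Since there is no ACPI or device tree, the guest learns about each
virtio-mmio transport through a virtio_mmio.device=<size>@<base>:<irq>
kernel parameter, which microvm_get_virtio_mmio_cmdline() appends for
every transport with an attached device. The following is a rough
standalone sketch of that address and IRQ arithmetic, assuming the
constants from microvm.h. The virtio_mmio_cmdline() helper and the
macro names are hypothetical, and the range check on the index is an
addition that the patch itself does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MMIO_BASE      0xd0000000UL  /* VIRTIO_MMIO_BASE in the patch */
#define MMIO_STRIDE    512UL         /* size of one transport window */
#define IRQ_BASE       5UL           /* VIRTIO_IRQ_BASE in the patch */
#define NUM_TRANSPORTS 8UL           /* VIRTIO_NUM_TRANSPORTS in the patch */

/*
 * Write the kernel command line fragment for transport 'index' into buf.
 * Returns 0 on success, -1 if the index is out of range or buf is too small.
 */
static int virtio_mmio_cmdline(unsigned long index, char *buf, size_t len)
{
    int ret;

    if (index >= NUM_TRANSPORTS) {
        return -1;
    }

    ret = snprintf(buf, len, " virtio_mmio.device=512@0x%lx:%lu",
                   MMIO_BASE + index * MMIO_STRIDE, IRQ_BASE + index);
    if (ret < 0 || (size_t)ret >= len) {
        return -1;
    }
    return 0;
}
```

For transport 3 this yields " virtio_mmio.device=512@0xd0000600:8",
which matches the fourth virtio-mmio region in the memory map shown in
the cover letter.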

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index cd5ea391e8..338f07420f 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -26,3 +26,4 @@ CONFIG_ISAPC=y
 CONFIG_I440FX=y
 CONFIG_Q35=y
 CONFIG_ACPI_PCI=y
+CONFIG_MICROVM=y
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index 9817888216..94c565d8db 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -87,6 +87,10 @@ config Q35
     select VMMOUSE
     select FW_CFG_DMA
 
+config MICROVM
+    bool
+    select VIRTIO_MMIO
+
 config VTD
     bool
 
diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 102f2b35fc..149bdd0784 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -4,6 +4,7 @@ obj-y += cpu.o
 obj-y += pc.o
 obj-$(CONFIG_I440FX) += pc_piix.o
 obj-$(CONFIG_Q35) += pc_q35.o
+obj-$(CONFIG_MICROVM) += mptable.o microvm.o
 obj-y += fw_cfg.o pc_sysfw.o
 obj-y += x86-iommu.o
 obj-$(CONFIG_VTD) += intel_iommu.o
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
new file mode 100644
index 0000000000..fff88c3697
--- /dev/null
+++ b/hw/i386/microvm.c
@@ -0,0 +1,518 @@
+/*
+ *
+ * Copyright (c) 2018 Intel Corporation
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/cpus.h"
+#include "sysemu/numa.h"
+
+#include "hw/loader.h"
+#include "hw/nmi.h"
+#include "hw/kvm/clock.h"
+#include "hw/i386/microvm.h"
+#include "hw/i386/pc.h"
+#include "hw/i386/cpu-internal.h"
+#include "target/i386/cpu.h"
+#include "hw/timer/i8254.h"
+#include "hw/char/serial.h"
+#include "hw/i386/topology.h"
+#include "hw/virtio/virtio-mmio.h"
+#include "hw/i386/mptable.h"
+
+#include "cpu.h"
+#include "elf.h"
+#include "kvm_i386.h"
+#include <asm/bootparam.h>
+
+#define DEFINE_MICROVM_MACHINE_LATEST(major, minor, latest) \
+    static void microvm_##major##_##minor##_object_class_init(ObjectClass *oc, \
+                                                              void *data) \
+    { \
+        MachineClass *mc = MACHINE_CLASS(oc); \
+        microvm_##major##_##minor##_machine_class_init(mc); \
+        mc->desc = "Microvm (i386)"; \
+        if (latest) { \
+            mc->alias = "microvm"; \
+        } \
+    } \
+    static const TypeInfo microvm_##major##_##minor##_info = { \
+        .name = MACHINE_TYPE_NAME("microvm-" # major "." # minor), \
+        .parent = TYPE_MICROVM_MACHINE, \
+        .instance_init = microvm_##major##_##minor##_instance_init, \
+        .class_init = microvm_##major##_##minor##_object_class_init, \
+    }; \
+    static void microvm_##major##_##minor##_init(void) \
+    { \
+        type_register_static(&microvm_##major##_##minor##_info); \
+    } \
+    type_init(microvm_##major##_##minor##_init);
+
+#define DEFINE_MICROVM_MACHINE_AS_LATEST(major, minor) \
+    DEFINE_MICROVM_MACHINE_LATEST(major, minor, true)
+#define DEFINE_MICROVM_MACHINE(major, minor) \
+    DEFINE_MICROVM_MACHINE_LATEST(major, minor, false)
+
+static void microvm_gsi_handler(void *opaque, int n, int level)
+{
+    qemu_irq *ioapic_irq = opaque;
+
+    qemu_set_irq(ioapic_irq[n], level);
+}
+
+static void microvm_legacy_init(MicrovmMachineState *mms)
+{
+    ISABus *isa_bus;
+    GSIState *gsi_state;
+    qemu_irq *i8259;
+    int i;
+
+    assert(kvm_irqchip_in_kernel());
+    gsi_state = g_malloc0(sizeof(*gsi_state));
+    mms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
+
+    isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
+                          &error_abort);
+    isa_bus_irqs(isa_bus, mms->gsi);
+
+    assert(kvm_pic_in_kernel());
+    i8259 = kvm_i8259_init(isa_bus);
+
+    for (i = 0; i < ISA_NUM_IRQS; i++) {
+        gsi_state->i8259_irq[i] = i8259[i];
+    }
+
+    kvm_pit_init(isa_bus, 0x40);
+
+    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
+        int nirq = VIRTIO_IRQ_BASE + i;
+        ISADevice *isadev = isa_create(isa_bus, TYPE_ISA_SERIAL);
+        qemu_irq mmio_irq;
+
+        isa_init_irq(isadev, &mmio_irq, nirq);
+        sysbus_create_simple("virtio-mmio",
+                             VIRTIO_MMIO_BASE + i * 512,
+                             mms->gsi[VIRTIO_IRQ_BASE + i]);
+    }
+
+    g_free(i8259);
+
+    serial_hds_isa_init(isa_bus, 0, 1);
+}
+
+static void microvm_ioapic_init(MicrovmMachineState *mms)
+{
+    qemu_irq *ioapic_irq;
+    DeviceState *ioapic_dev;
+    SysBusDevice *d;
+    int i;
+
+    assert(kvm_irqchip_in_kernel());
+    ioapic_irq = g_new0(qemu_irq, IOAPIC_NUM_PINS);
+    kvm_pc_setup_irq_routing(true);
+
+    assert(kvm_ioapic_in_kernel());
+    ioapic_dev = qdev_create(NULL, "kvm-ioapic");
+
+    object_property_add_child(qdev_get_machine(), "ioapic", OBJECT(ioapic_dev), NULL);
+
+    qdev_init_nofail(ioapic_dev);
+    d = SYS_BUS_DEVICE(ioapic_dev);
+    sysbus_mmio_map(d, 0, IO_APIC_DEFAULT_ADDRESS);
+
+    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+        ioapic_irq[i] = qdev_get_gpio_in(ioapic_dev, i);
+    }
+
+    mms->gsi = qemu_allocate_irqs(microvm_gsi_handler, ioapic_irq, IOAPIC_NUM_PINS);
+
+    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
+        sysbus_create_simple("virtio-mmio",
+                             VIRTIO_MMIO_BASE + i * 512,
+                             mms->gsi[VIRTIO_IRQ_BASE + i]);
+    }
+}
+
+static void microvm_memory_init(MicrovmMachineState *mms)
+{
+    MachineState *machine = MACHINE(mms);
+    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
+    MemoryRegion *system_memory = get_system_memory();
+
+    if (machine->ram_size > MICROVM_MAX_BELOW_4G) {
+        mms->above_4g_mem_size = machine->ram_size - MICROVM_MAX_BELOW_4G;
+        mms->below_4g_mem_size = MICROVM_MAX_BELOW_4G;
+    } else {
+        mms->above_4g_mem_size = 0;
+        mms->below_4g_mem_size = machine->ram_size;
+    }
+
+    ram = g_malloc(sizeof(*ram));
+    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
+                                         machine->ram_size);
+
+    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
+    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
+                             0, mms->below_4g_mem_size);
+    memory_region_add_subregion(system_memory, 0, ram_below_4g);
+
+    e820_add_entry(0, mms->below_4g_mem_size, E820_RAM);
+
+    if (mms->above_4g_mem_size > 0) {
+        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
+        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
+                                 mms->below_4g_mem_size,
+                                 mms->above_4g_mem_size);
+        memory_region_add_subregion(system_memory, 0x100000000ULL,
+                                    ram_above_4g);
+        e820_add_entry(0x100000000ULL, mms->above_4g_mem_size, E820_RAM);
+    }
+}
+
+static void microvm_machine_state_init(MachineState *machine)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
+    uint64_t elf_entry;
+    int kernel_size;
+
+    if (machine->kernel_filename == NULL) {
+        error_report("missing kernel image file name, required by microvm");
+        exit(1);
+    }
+
+    microvm_memory_init(mms);
+    if (mms->legacy) {
+        microvm_legacy_init(mms);
+    } else {
+        microvm_ioapic_init(mms);
+    }
+
+    mms->apic_id_limit = cpus_init(machine, false);
+
+    kvmclock_create();
+
+    kernel_size = load_elf(machine->kernel_filename, NULL,
+                           NULL, NULL, &elf_entry,
+                           NULL, NULL, 0, I386_ELF_MACHINE,
+                           0, 0);
+
+    if (kernel_size < 0) {
+        error_report("Error while loading elf kernel");
+        exit(1);
+    }
+
+    mms->elf_entry = elf_entry;
+}
+
+static gchar *microvm_get_virtio_mmio_cmdline(gchar *name)
+{
+    gchar *cmdline;
+    gchar *separator;
+    unsigned long index;
+    int ret;
+
+    separator = g_strrstr(name, ".");
+    if (!separator) {
+        return NULL;
+    }
+
+    index = strtol(separator + 1, NULL, 10);
+    if (index == LONG_MIN || index == LONG_MAX) {
+        return NULL;
+    }
+
+    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
+    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
+                     " virtio_mmio.device=512@0x%lx:%lu",
+                     VIRTIO_MMIO_BASE + index * 512,
+                     VIRTIO_IRQ_BASE + index);
+    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
+        g_free(cmdline);
+        return NULL;
+    }
+
+    return cmdline;
+}
+
+static void microvm_setup_bootparams(MicrovmMachineState *mms, const gchar *kernel_cmdline)
+{
+    struct boot_params params;
+    BusState *bus;
+    BusChild *kid;
+    gchar *cmdline;
+    int cmdline_len;
+    int i;
+
+    cmdline = g_strdup(kernel_cmdline);
+
+    /*
+     * Find MMIO transports with attached devices, and add them to the kernel
+     * command line.
+     */
+    bus = sysbus_get_default();
+    QTAILQ_FOREACH(kid, &bus->children, sibling) {
+        DeviceState *dev = kid->child;
+        ObjectClass *class = object_get_class(OBJECT(dev));
+
+        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
+            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
+            VirtioBusState *mmio_virtio_bus = &mmio->bus;
+            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
+
+            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
+                gchar *mmio_cmdline = microvm_get_virtio_mmio_cmdline(mmio_bus->name);
+                if (mmio_cmdline) {
+                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
+                    g_free(mmio_cmdline);
+                    g_free(cmdline);
+                    cmdline = newcmd;
+                }
+            }
+        }
+    }
+
+    cmdline_len = strlen(cmdline);
+
+    address_space_write(&address_space_memory,
+                        KERNEL_CMDLINE_START, MEMTXATTRS_UNSPECIFIED,
+                        (uint8_t *) cmdline, cmdline_len);
+
+    g_free(cmdline);
+
+    memset(&params, 0, sizeof(struct boot_params));
+
+    params.hdr.type_of_loader = KERNEL_LOADER_OTHER;
+    params.hdr.boot_flag = KERNEL_BOOT_FLAG_MAGIC;
+    params.hdr.header = KERNEL_HDR_MAGIC;
+    params.hdr.cmd_line_ptr = KERNEL_CMDLINE_START;
+    params.hdr.cmdline_size = cmdline_len;
+    params.hdr.kernel_alignment = KERNEL_MIN_ALIGNMENT_BYTES;
+
+    params.e820_entries = e820_get_num_entries();
+    for (i = 0; i < params.e820_entries; i++) {
+        uint64_t address, length;
+        if (e820_get_entry(i, E820_RAM, &address, &length)) {
+            params.e820_table[i].addr = address;
+            params.e820_table[i].size = length;
+            params.e820_table[i].type = E820_RAM;
+        }
+    }
+
+    address_space_write(&address_space_memory,
+                        ZERO_PAGE_START, MEMTXATTRS_UNSPECIFIED,
+                        (uint8_t *) &params, sizeof(struct boot_params));
+}
+
+static void microvm_init_page_tables(void)
+{
+    uint64_t val = 0;
+    int i;
+
+    val = PDPTE_START | 0x03;
+    address_space_write(&address_space_memory,
+                        PML4_START, MEMTXATTRS_UNSPECIFIED,
+                        (uint8_t *) &val, 8);
+    val = PDE_START | 0x03;
+    address_space_write(&address_space_memory,
+                        PDPTE_START, MEMTXATTRS_UNSPECIFIED,
+                        (uint8_t *) &val, 8);
+
+    for (i = 0; i < 512; i++) {
+        val = (i << 21) + 0x83;
+        address_space_write(&address_space_memory,
+                            PDE_START + (i * 8), MEMTXATTRS_UNSPECIFIED,
+                            (uint8_t *) &val, 8);
+    }
+}
+
+static void microvm_cpu_reset(CPUState *cs, uint64_t elf_entry)
+{
+    X86CPU *cpu = X86_CPU(cs);
+    CPUX86State *env = &cpu->env;
+    struct SegmentCache seg_code =
+        { .selector = 0x8, .base = 0x0, .limit = 0xfffff, .flags = 0xa09b00 };
+    struct SegmentCache seg_data =
+        { .selector = 0x10, .base = 0x0, .limit = 0xfffff, .flags = 0xc09300 };
+    struct SegmentCache seg_tr =
+        { .selector = 0x18, .base = 0x0, .limit = 0xfffff, .flags = 0x808b00 };
+
+    kvm_arch_get_registers(cs);
+
+    memcpy(&env->segs[R_CS], &seg_code, sizeof(struct SegmentCache));
+    memcpy(&env->segs[R_DS], &seg_data, sizeof(struct SegmentCache));
+    memcpy(&env->segs[R_ES], &seg_data, sizeof(struct SegmentCache));
+    memcpy(&env->segs[R_FS], &seg_data, sizeof(struct SegmentCache));
+    memcpy(&env->segs[R_GS], &seg_data, sizeof(struct SegmentCache));
+    memcpy(&env->segs[R_SS], &seg_data, sizeof(struct SegmentCache));
+    memcpy(&env->tr, &seg_tr, sizeof(struct SegmentCache));
+
+    env->efer |= MSR_EFER_LME | MSR_EFER_LMA;
+    env->regs[R_ESP] = BOOT_STACK_POINTER;
+    env->regs[R_EBP] = BOOT_STACK_POINTER;
+    env->regs[R_ESI] = ZERO_PAGE_START;
+
+    cpu_set_pc(cs, elf_entry);
+    cpu_x86_update_cr3(env, PML4_START);
+    cpu_x86_update_cr4(env, env->cr[4] | CR4_PAE_MASK);
+    cpu_x86_update_cr0(env, env->cr[0] | CR0_PE_MASK | CR0_PG_MASK);
+    x86_update_hflags(env);
+
+    kvm_arch_put_registers(cs, KVM_PUT_RESET_STATE);
+}
+
+static void microvm_mptable_setup(MicrovmMachineState *mms)
+{
+    char *mptable;
+    int size;
+
+    mptable = mptable_generate(smp_cpus, mms->apic_id_limit,
+                               EBDA_START, &size);
+    address_space_write(&address_space_memory,
+                        EBDA_START, MEMTXATTRS_UNSPECIFIED,
+                        (uint8_t *) mptable, size);
+    g_free(mptable);
+}
+
+static bool microvm_machine_get_legacy(Object *obj, Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    return mms->legacy;
+}
+
+static void microvm_machine_set_legacy(Object *obj, bool value, Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    mms->legacy = value;
+}
+
+static void microvm_machine_reset(void)
+{
+    MachineState *machine = MACHINE(qdev_get_machine());
+    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
+    CPUState *cs;
+    X86CPU *cpu;
+
+    qemu_devices_reset();
+
+    microvm_mptable_setup(mms);
+    microvm_setup_bootparams(mms, machine->kernel_cmdline);
+    microvm_init_page_tables();
+
+    CPU_FOREACH(cs) {
+        cpu = X86_CPU(cs);
+
+        /* Reset APIC after devices have been reset to cancel
+         * any changes that qemu_devices_reset() might have done.
+         */
+        if (cpu->apic_state) {
+            device_reset(cpu->apic_state);
+        }
+
+        microvm_cpu_reset(cs, mms->elf_entry);
+    }
+}
+
+static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
+{
+    CPUState *cs;
+
+    CPU_FOREACH(cs) {
+        X86CPU *cpu = X86_CPU(cs);
+
+        if (!cpu->apic_state) {
+            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
+        } else {
+            apic_deliver_nmi(cpu->apic_state);
+        }
+    }
+}
+
+static void microvm_machine_instance_init(Object *obj)
+{
+}
+
+static void microvm_class_init(ObjectClass *oc, void *data)
+{
+    NMIClass *nc = NMI_CLASS(oc);
+
+    /* NMI handler */
+    nc->nmi_monitor_handler = x86_nmi;
+
+    object_class_property_add_bool(oc, MICROVM_MACHINE_LEGACY,
+                                   microvm_machine_get_legacy,
+                                   microvm_machine_set_legacy,
+                                   &error_abort);
+}
+
+static const TypeInfo microvm_machine_info = {
+    .name          = TYPE_MICROVM_MACHINE,
+    .parent        = TYPE_MACHINE,
+    .abstract      = true,
+    .instance_size = sizeof(MicrovmMachineState),
+    .instance_init = microvm_machine_instance_init,
+    .class_size    = sizeof(MicrovmMachineClass),
+    .class_init    = microvm_class_init,
+    .interfaces = (InterfaceInfo[]) {
+         { TYPE_NMI },
+         { }
+    },
+};
+
+static void microvm_machine_init(void)
+{
+    type_register_static(&microvm_machine_info);
+}
+type_init(microvm_machine_init);
+
+static void microvm_1_0_instance_init(Object *obj)
+{
+}
+
+static void microvm_machine_class_init(MachineClass *mc)
+{
+    mc->init = microvm_machine_state_init;
+
+    mc->family = "microvm_i386";
+    mc->desc = "Microvm (i386)";
+    mc->units_per_default_bus = 1;
+    mc->no_floppy = 1;
+    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
+    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
+    mc->max_cpus = 288;
+    mc->has_hotpluggable_cpus = false;
+    mc->auto_enable_numa_with_memhp = false;
+    mc->default_cpu_type = X86_CPU_TYPE_NAME("host");
+    mc->nvdimm_supported = false;
+    mc->default_machine_opts = "accel=kvm";
+
+    /* Machine class handlers */
+    mc->cpu_index_to_instance_props = cpu_index_to_props;
+    mc->get_default_cpu_node_id = cpu_get_default_cpu_node_id;
+    mc->possible_cpu_arch_ids = cpu_possible_cpu_arch_ids;
+    mc->reset = microvm_machine_reset;
+}
+
+static void microvm_1_0_machine_class_init(MachineClass *mc)
+{
+    microvm_machine_class_init(mc);
+}
+DEFINE_MICROVM_MACHINE_AS_LATEST(1, 0)
diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
new file mode 100644
index 0000000000..544ef60563
--- /dev/null
+++ b/include/hw/i386/microvm.h
@@ -0,0 +1,85 @@
+/*
+ *
+ * Copyright (c) 2018 Intel Corporation
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_I386_MICROVM_H
+#define HW_I386_MICROVM_H
+
+#include "qemu-common.h"
+#include "exec/hwaddr.h"
+#include "qemu/notify.h"
+
+#include "hw/boards.h"
+
+/* Microvm memory layout */
+#define ZERO_PAGE_START      0x7000
+#define BOOT_STACK_POINTER   0x8ff0
+#define PML4_START           0x9000
+#define PDPTE_START          0xa000
+#define PDE_START            0xb000
+#define EBDA_START           0x9fc00
+#define HIMEM_START          0x100000
+#define MICROVM_MAX_BELOW_4G 0xe0000000
+
+/* Bootparams related definitions */
+#define KERNEL_BOOT_FLAG_MAGIC     0xaa55
+#define KERNEL_HDR_MAGIC           0x53726448
+#define KERNEL_LOADER_OTHER        0xff
+#define KERNEL_MIN_ALIGNMENT_BYTES 0x01000000
+#define KERNEL_CMDLINE_START       0x20000
+#define KERNEL_CMDLINE_MAX_SIZE    0x10000
+
+/* Platform virtio definitions */
+#define VIRTIO_MMIO_BASE      0xd0000000
+#define VIRTIO_IRQ_BASE       5
+#define VIRTIO_NUM_TRANSPORTS 8
+#define VIRTIO_CMDLINE_MAXLEN 64
+
+/* Machine type options */
+#define MICROVM_MACHINE_LEGACY "legacy"
+
+typedef struct {
+    MachineClass parent;
+    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
+                                           DeviceState *dev);
+} MicrovmMachineClass;
+
+typedef struct {
+    MachineState parent;
+    unsigned apic_id_limit;
+    qemu_irq *gsi;
+
+    /* RAM size */
+    ram_addr_t below_4g_mem_size;
+    ram_addr_t above_4g_mem_size;
+
+    /* Kernel ELF entry. On reset, vCPUs RIP will be set to this */
+    uint64_t elf_entry;
+
+    /* Legacy mode based on an ISA bus. Useful for debugging */
+    bool legacy;
+} MicrovmMachineState;
+
+#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
+#define MICROVM_MACHINE(obj) \
+    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
+#define MICROVM_MACHINE_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
+#define MICROVM_MACHINE_CLASS(class) \
+    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
+
+#endif
-- 
2.21.0
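
microvm_init_page_tables() builds the smallest long-mode mapping that
lets the kernel start: one PML4 entry, one PDPT entry and 512 page
directory entries, each mapping a 2 MiB page, which identity-maps the
first 1 GiB of guest memory. The following is a minimal standalone
sketch of the same layout built in host memory rather than written
into the guest. struct boot_tables, boot_tables_init() and
boot_tables_translate() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PTE_PRESENT  0x01ULL
#define PTE_WRITABLE 0x02ULL
#define PTE_HUGE     0x80ULL   /* PS bit: the entry maps a 2 MiB page */
#define ENTRIES      512

/* Hypothetical container for the three tables the patch writes. */
struct boot_tables {
    uint64_t pml4[ENTRIES];
    uint64_t pdpt[ENTRIES];
    uint64_t pd[ENTRIES];
};

/* pdpt_gpa and pd_gpa are the guest-physical addresses of the tables. */
static void boot_tables_init(struct boot_tables *t,
                             uint64_t pdpt_gpa, uint64_t pd_gpa)
{
    int i;

    memset(t, 0, sizeof(*t));
    t->pml4[0] = pdpt_gpa | PTE_PRESENT | PTE_WRITABLE;
    t->pdpt[0] = pd_gpa | PTE_PRESENT | PTE_WRITABLE;
    for (i = 0; i < ENTRIES; i++) {
        /* Entry i maps virtual [i * 2 MiB, (i + 1) * 2 MiB) onto itself. */
        t->pd[i] = ((uint64_t)i << 21) |
                   PTE_PRESENT | PTE_WRITABLE | PTE_HUGE;
    }
}

/* Translate a virtual address below 1 GiB through the page directory. */
static uint64_t boot_tables_translate(const struct boot_tables *t,
                                      uint64_t va)
{
    uint64_t pde = t->pd[(va >> 21) & 0x1ff];

    return (pde & ~0x1fffffULL) | (va & 0x1fffffULL);
}
```

With PDPTE_START (0xa000) and PDE_START (0xb000) from microvm.h, the
first entries come out as 0xa003 and 0xb003, and each page directory
entry carries the 0x83 flags used in the patch.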




* Re: [Qemu-devel] [PATCH 0/4] Introduce the microvm machine type
  2019-06-28 11:53 [Qemu-devel] [PATCH 0/4] Introduce the microvm machine type Sergio Lopez
                   ` (3 preceding siblings ...)
  2019-06-28 11:53 ` [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type Sergio Lopez
@ 2019-06-28 13:21 ` Paolo Bonzini
  2019-06-28 20:49   ` Sergio Lopez
  2019-06-28 16:32 ` no-reply
  2019-06-28 18:16 ` no-reply
  6 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2019-06-28 13:21 UTC (permalink / raw)
  To: Sergio Lopez, mst, marcel.apfelbaum, rth, ehabkost; +Cc: qemu-devel

On 28/06/19 13:53, Sergio Lopez wrote:
> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>  -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>  -nodefaults -no-user-config \
>  -drive id=test,file=test.img,format=raw,if=none \
>  -device virtio-blk-device,drive=test \
>  -netdev tap,id=tap0,script=no,downscript=no \
>  -device virtio-net-device,netdev=tap0 \
>  -serial stdio

I think the "non-legacy" mode can be obtained from the "legacy" one just
with -nodefaults (which all sane management should be using anyway), so
legacy=on can actually be the default. :)

I think this is interesting.  I'd love to have it optionally provide a
device tree as well.  It's not very common on x86 and most distro
kernels don't support device tree, but it would provide a more
out-of-the-box experience and it may even be a drop-in replacement for
q35 or pc as far as Kata is concerned.

Paolo


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] hw/virtio: Factorize virtio-mmio headers
  2019-06-28 11:53 ` [Qemu-devel] [PATCH 2/4] hw/virtio: Factorize virtio-mmio headers Sergio Lopez
@ 2019-06-28 14:03   ` Michael S. Tsirkin
  2019-06-28 20:50     ` Sergio Lopez
  0 siblings, 1 reply; 29+ messages in thread
From: Michael S. Tsirkin @ 2019-06-28 14:03 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: qemu-devel, pbonzini, ehabkost, rth

On Fri, Jun 28, 2019 at 01:53:47PM +0200, Sergio Lopez wrote:
> Put QOM and main struct definition in a separate header file, so it
> can be accessed from other components.
> 
> This is needed for the microvm machine type implementation.
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>

If you are going to productise virtio-mmio, then 1.0 support is a must.
I am not sure we want a new machine with 0.X mmio devices.
Especially considering that virtio-mmio does not have support for
transitional devices.
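
For reference, the distinction at stake is visible in the transport's Version register (offset 0x004): the legacy interface reports 1, which matches VIRT_VERSION in the quoted header, while the virtio 1.0 interface reports 2. A minimal sketch of a probe-side check, assuming the magic and version values have already been read from the device; the enum and function names are hypothetical and are not QEMU or Linux API:

```c
#include <stdint.h>

#define VIRT_MAGIC 0x74726976u /* 'virt', as in the quoted header */

/* Hypothetical classification of a virtio-mmio transport. */
enum mmio_kind {
    MMIO_INVALID,
    MMIO_LEGACY,  /* Version == 1: pre-1.0 ("0.X") register layout */
    MMIO_MODERN,  /* Version == 2: virtio 1.0 register layout */
};

/* Classify a transport from its MagicValue and Version registers. */
static enum mmio_kind classify_mmio(uint32_t magic, uint32_t version)
{
    if (magic != VIRT_MAGIC) {
        return MMIO_INVALID;
    }
    switch (version) {
    case 1:
        return MMIO_LEGACY;
    case 2:
        return MMIO_MODERN;
    default:
        return MMIO_INVALID;
    }
}
```

Since a virtio-mmio device reports exactly one of the two versions, a guest driver that only speaks 1.0 would reject every transport created by this series.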

> ---
>  hw/virtio/virtio-mmio.c | 35 +-----------------------
>  hw/virtio/virtio-mmio.h | 60 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 61 insertions(+), 34 deletions(-)
>  create mode 100644 hw/virtio/virtio-mmio.h
> 
> diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
> index 97b7f35496..87c7fe4d8d 100644
> --- a/hw/virtio/virtio-mmio.c
> +++ b/hw/virtio/virtio-mmio.c
> @@ -26,44 +26,11 @@
>  #include "qemu/host-utils.h"
>  #include "qemu/module.h"
>  #include "sysemu/kvm.h"
> -#include "hw/virtio/virtio-bus.h"
> +#include "virtio-mmio.h"
>  #include "qemu/error-report.h"
>  #include "qemu/log.h"
>  #include "trace.h"
>  
> -/* QOM macros */
> -/* virtio-mmio-bus */
> -#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
> -#define VIRTIO_MMIO_BUS(obj) \
> -        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
> -#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
> -        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
> -#define VIRTIO_MMIO_BUS_CLASS(klass) \
> -        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
> -
> -/* virtio-mmio */
> -#define TYPE_VIRTIO_MMIO "virtio-mmio"
> -#define VIRTIO_MMIO(obj) \
> -        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
> -
> -#define VIRT_MAGIC 0x74726976 /* 'virt' */
> -#define VIRT_VERSION 1
> -#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
> -
> -typedef struct {
> -    /* Generic */
> -    SysBusDevice parent_obj;
> -    MemoryRegion iomem;
> -    qemu_irq irq;
> -    /* Guest accessible state needing migration and reset */
> -    uint32_t host_features_sel;
> -    uint32_t guest_features_sel;
> -    uint32_t guest_page_shift;
> -    /* virtio-bus */
> -    VirtioBusState bus;
> -    bool format_transport_address;
> -} VirtIOMMIOProxy;
> -
>  static bool virtio_mmio_ioeventfd_enabled(DeviceState *d)
>  {
>      return kvm_eventfds_enabled();
> diff --git a/hw/virtio/virtio-mmio.h b/hw/virtio/virtio-mmio.h
> new file mode 100644
> index 0000000000..2f3973f8c7
> --- /dev/null
> +++ b/hw/virtio/virtio-mmio.h
> @@ -0,0 +1,60 @@
> +/*
> + * Virtio MMIO bindings
> + *
> + * Copyright (c) 2011 Linaro Limited
> + *
> + * Author:
> + *  Peter Maydell <peter.maydell@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef QEMU_VIRTIO_MMIO_H
> +#define QEMU_VIRTIO_MMIO_H
> +
> +#include "hw/virtio/virtio-bus.h"
> +
> +/* QOM macros */
> +/* virtio-mmio-bus */
> +#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
> +#define VIRTIO_MMIO_BUS(obj) \
> +        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
> +#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
> +        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
> +#define VIRTIO_MMIO_BUS_CLASS(klass) \
> +        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
> +
> +/* virtio-mmio */
> +#define TYPE_VIRTIO_MMIO "virtio-mmio"
> +#define VIRTIO_MMIO(obj) \
> +        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
> +
> +#define VIRT_MAGIC 0x74726976 /* 'virt' */
> +#define VIRT_VERSION 1
> +#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
> +
> +typedef struct {
> +    /* Generic */
> +    SysBusDevice parent_obj;
> +    MemoryRegion iomem;
> +    qemu_irq irq;
> +    /* Guest accessible state needing migration and reset */
> +    uint32_t host_features_sel;
> +    uint32_t guest_features_sel;
> +    uint32_t guest_page_shift;
> +    /* virtio-bus */
> +    VirtioBusState bus;
> +    bool format_transport_address;
> +} VirtIOMMIOProxy;
> +
> +#endif
> -- 
> 2.21.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type
  2019-06-28 11:53 ` [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type Sergio Lopez
@ 2019-06-28 14:06   ` Michael S. Tsirkin
  2019-06-28 20:56     ` Sergio Lopez
  2019-06-28 22:17     ` Paolo Bonzini
  2019-06-28 19:15   ` Maran Wilson
  2019-06-28 19:47   ` Eduardo Habkost
  2 siblings, 2 replies; 29+ messages in thread
From: Michael S. Tsirkin @ 2019-06-28 14:06 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: qemu-devel, pbonzini, ehabkost, rth

On Fri, Jun 28, 2019 at 01:53:49PM +0200, Sergio Lopez wrote:
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> constructed after the machine model implemented by the latter.
> 
> Its main purpose is providing users a KVM-only machine type with fast
> boot times, minimal attack surface (measured as the number of IO ports
> and MMIO regions exposed to the Guest) and small footprint (especially
> when combined with the ongoing QEMU modularization effort).
> 
> Normally, other than the device support provided by KVM itself,
> microvm only supports virtio-mmio devices. Microvm also includes a
> legacy mode, which adds an ISA bus with a 16550A serial port, useful
> for being able to see the early boot kernel messages.
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>  default-configs/i386-softmmu.mak |   1 +
>  hw/i386/Kconfig                  |   4 +
>  hw/i386/Makefile.objs            |   1 +
>  hw/i386/microvm.c                | 518 +++++++++++++++++++++++++++++++
>  include/hw/i386/microvm.h        |  85 +++++
>  5 files changed, 609 insertions(+)
>  create mode 100644 hw/i386/microvm.c
>  create mode 100644 include/hw/i386/microvm.h
> 
> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> index cd5ea391e8..338f07420f 100644
> --- a/default-configs/i386-softmmu.mak
> +++ b/default-configs/i386-softmmu.mak
> @@ -26,3 +26,4 @@ CONFIG_ISAPC=y
>  CONFIG_I440FX=y
>  CONFIG_Q35=y
>  CONFIG_ACPI_PCI=y
> +CONFIG_MICROVM=y
> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
> index 9817888216..94c565d8db 100644
> --- a/hw/i386/Kconfig
> +++ b/hw/i386/Kconfig
> @@ -87,6 +87,10 @@ config Q35
>      select VMMOUSE
>      select FW_CFG_DMA
>  
> +config MICROVM
> +    bool
> +    select VIRTIO_MMIO
> +
>  config VTD
>      bool
>  
> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> index 102f2b35fc..149bdd0784 100644
> --- a/hw/i386/Makefile.objs
> +++ b/hw/i386/Makefile.objs
> @@ -4,6 +4,7 @@ obj-y += cpu.o
>  obj-y += pc.o
>  obj-$(CONFIG_I440FX) += pc_piix.o
>  obj-$(CONFIG_Q35) += pc_q35.o
> +obj-$(CONFIG_MICROVM) += mptable.o microvm.o
>  obj-y += fw_cfg.o pc_sysfw.o
>  obj-y += x86-iommu.o
>  obj-$(CONFIG_VTD) += intel_iommu.o
> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
> new file mode 100644
> index 0000000000..fff88c3697
> --- /dev/null
> +++ b/hw/i386/microvm.c
> @@ -0,0 +1,518 @@
> +/*
> + *
> + * Copyright (c) 2018 Intel Corporation
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/error-report.h"
> +#include "qapi/error.h"
> +#include "qapi/visitor.h"
> +#include "sysemu/sysemu.h"
> +#include "sysemu/cpus.h"
> +#include "sysemu/numa.h"
> +
> +#include "hw/loader.h"
> +#include "hw/nmi.h"
> +#include "hw/kvm/clock.h"
> +#include "hw/i386/microvm.h"
> +#include "hw/i386/pc.h"
> +#include "hw/i386/cpu-internal.h"
> +#include "target/i386/cpu.h"
> +#include "hw/timer/i8254.h"
> +#include "hw/char/serial.h"
> +#include "hw/i386/topology.h"
> +#include "hw/virtio/virtio-mmio.h"
> +#include "hw/i386/mptable.h"
> +
> +#include "cpu.h"
> +#include "elf.h"
> +#include "kvm_i386.h"
> +#include <asm/bootparam.h>
> +
> +#define DEFINE_MICROVM_MACHINE_LATEST(major, minor, latest) \
> +    static void microvm_##major##_##minor##_object_class_init(ObjectClass *oc, \
> +                                                              void *data) \
> +    { \
> +        MachineClass *mc = MACHINE_CLASS(oc); \
> +        microvm_##major##_##minor##_machine_class_init(mc); \
> +        mc->desc = "Microvm (i386)"; \
> +        if (latest) { \
> +            mc->alias = "microvm"; \
> +        } \
> +    } \
> +    static const TypeInfo microvm_##major##_##minor##_info = { \
> +        .name = MACHINE_TYPE_NAME("microvm-" # major "." # minor), \
> +        .parent = TYPE_MICROVM_MACHINE, \
> +        .instance_init = microvm_##major##_##minor##_instance_init, \
> +        .class_init = microvm_##major##_##minor##_object_class_init, \
> +    }; \
> +    static void microvm_##major##_##minor##_init(void) \
> +    { \
> +        type_register_static(&microvm_##major##_##minor##_info); \
> +    } \
> +    type_init(microvm_##major##_##minor##_init);
> +
> +#define DEFINE_MICROVM_MACHINE_AS_LATEST(major, minor) \
> +    DEFINE_MICROVM_MACHINE_LATEST(major, minor, true)
> +#define DEFINE_MICROVM_MACHINE(major, minor) \
> +    DEFINE_MICROVM_MACHINE_LATEST(major, minor, false)
> +
> +static void microvm_gsi_handler(void *opaque, int n, int level)
> +{
> +    qemu_irq *ioapic_irq = opaque;
> +
> +    qemu_set_irq(ioapic_irq[n], level);
> +}
> +
> +static void microvm_legacy_init(MicrovmMachineState *mms)
> +{
> +    ISABus *isa_bus;
> +    GSIState *gsi_state;
> +    qemu_irq *i8259;
> +    int i;
> +
> +    assert(kvm_irqchip_in_kernel());
> +    gsi_state = g_malloc0(sizeof(*gsi_state));
> +    mms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
> +
> +    isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
> +                          &error_abort);
> +    isa_bus_irqs(isa_bus, mms->gsi);
> +
> +    assert(kvm_pic_in_kernel());
> +    i8259 = kvm_i8259_init(isa_bus);
> +
> +    for (i = 0; i < ISA_NUM_IRQS; i++) {
> +        gsi_state->i8259_irq[i] = i8259[i];
> +    }
> +
> +    kvm_pit_init(isa_bus, 0x40);
> +
> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
> +        int nirq = VIRTIO_IRQ_BASE + i;
> +        ISADevice *isadev = isa_create(isa_bus, TYPE_ISA_SERIAL);
> +        qemu_irq mmio_irq;
> +
> +        isa_init_irq(isadev, &mmio_irq, nirq);
> +        sysbus_create_simple("virtio-mmio",
> +                             VIRTIO_MMIO_BASE + i * 512,
> +                             mms->gsi[VIRTIO_IRQ_BASE + i]);
> +    }
> +
> +    g_free(i8259);
> +
> +    serial_hds_isa_init(isa_bus, 0, 1);
> +}
> +
> +static void microvm_ioapic_init(MicrovmMachineState *mms)
> +{
> +    qemu_irq *ioapic_irq;
> +    DeviceState *ioapic_dev;
> +    SysBusDevice *d;
> +    int i;
> +
> +    assert(kvm_irqchip_in_kernel());

Hmm - irqchip in kernel actually increases the attack surface,
does it not? Or at least, the severity of the attacks.

> +    ioapic_irq = g_new0(qemu_irq, IOAPIC_NUM_PINS);
> +    kvm_pc_setup_irq_routing(true);
> +
> +    assert(kvm_ioapic_in_kernel());
> +    ioapic_dev = qdev_create(NULL, "kvm-ioapic");
> +
> +    object_property_add_child(qdev_get_machine(), "ioapic", OBJECT(ioapic_dev), NULL);
> +
> +    qdev_init_nofail(ioapic_dev);
> +    d = SYS_BUS_DEVICE(ioapic_dev);
> +    sysbus_mmio_map(d, 0, IO_APIC_DEFAULT_ADDRESS);
> +
> +    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
> +        ioapic_irq[i] = qdev_get_gpio_in(ioapic_dev, i);
> +    }
> +
> +    mms->gsi = qemu_allocate_irqs(microvm_gsi_handler, ioapic_irq, IOAPIC_NUM_PINS);
> +
> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
> +        sysbus_create_simple("virtio-mmio",
> +                             VIRTIO_MMIO_BASE + i * 512,
> +                             mms->gsi[VIRTIO_IRQ_BASE + i]);
> +    }
> +}
> +
> +static void microvm_memory_init(MicrovmMachineState *mms)
> +{
> +    MachineState *machine = MACHINE(mms);
> +    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
> +    MemoryRegion *system_memory = get_system_memory();
> +
> +    if (machine->ram_size > MICROVM_MAX_BELOW_4G) {
> +        mms->above_4g_mem_size = machine->ram_size - MICROVM_MAX_BELOW_4G;
> +        mms->below_4g_mem_size = MICROVM_MAX_BELOW_4G;
> +    } else {
> +        mms->above_4g_mem_size = 0;
> +        mms->below_4g_mem_size = machine->ram_size;
> +    }
> +
> +    ram = g_malloc(sizeof(*ram));
> +    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
> +                                         machine->ram_size);
> +
> +    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
> +    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
> +                             0, mms->below_4g_mem_size);
> +    memory_region_add_subregion(system_memory, 0, ram_below_4g);
> +
> +    e820_add_entry(0, mms->below_4g_mem_size, E820_RAM);
> +
> +    if (mms->above_4g_mem_size > 0) {
> +        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
> +        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
> +                                 mms->below_4g_mem_size,
> +                                 mms->above_4g_mem_size);
> +        memory_region_add_subregion(system_memory, 0x100000000ULL,
> +                                    ram_above_4g);
> +        e820_add_entry(0x100000000ULL, mms->above_4g_mem_size, E820_RAM);
> +    }
> +}
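
To make the split in the function above concrete: RAM up to MICROVM_MAX_BELOW_4G (0xe0000000, i.e. 3.5 GiB) is mapped at guest address 0, and any remainder is mapped starting at 4 GiB. A standalone sketch of the same arithmetic, with no QEMU dependencies; struct ram_split and split_ram() are hypothetical names used only for illustration:

```c
#include <stdint.h>

#define MICROVM_MAX_BELOW_4G 0xe0000000ULL /* from the quoted microvm.h */

/* Hypothetical container for the two RAM region sizes. */
struct ram_split {
    uint64_t below_4g; /* bytes mapped at guest address 0 */
    uint64_t above_4g; /* bytes mapped at guest address 4 GiB */
};

/* Same decision as the quoted microvm_memory_init(), without QEMU types. */
static struct ram_split split_ram(uint64_t ram_size)
{
    struct ram_split s;

    if (ram_size > MICROVM_MAX_BELOW_4G) {
        s.below_4g = MICROVM_MAX_BELOW_4G;
        s.above_4g = ram_size - MICROVM_MAX_BELOW_4G;
    } else {
        s.below_4g = ram_size;
        s.above_4g = 0;
    }
    return s;
}
```

With -m 4G this gives 0xe0000000 bytes below and 0x20000000 bytes above, which leaves the range from 0xe0000000 to 4 GiB free of RAM.
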
> +
> +static void microvm_machine_state_init(MachineState *machine)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
> +    uint64_t elf_entry;
> +    int kernel_size;
> +
> +    if (machine->kernel_filename == NULL) {
> +        error_report("missing kernel image file name, required by microvm");
> +        exit(1);
> +    }
> +
> +    microvm_memory_init(mms);
> +    if (mms->legacy) {
> +        microvm_legacy_init(mms);
> +    } else {
> +        microvm_ioapic_init(mms);
> +    }
> +
> +    mms->apic_id_limit = cpus_init(machine, false);
> +
> +    kvmclock_create();
> +
> +    kernel_size = load_elf(machine->kernel_filename, NULL,
> +                           NULL, NULL, &elf_entry,
> +                           NULL, NULL, 0, I386_ELF_MACHINE,
> +                           0, 0);
> +
> +    if (kernel_size < 0) {
> +        error_report("Error while loading elf kernel");
> +        exit(1);
> +    }
> +
> +    mms->elf_entry = elf_entry;
> +}
> +
> +static gchar *microvm_get_virtio_mmio_cmdline(gchar *name)
> +{
> +    gchar *cmdline;
> +    gchar *separator;
> +    unsigned long index;
> +    int ret;
> +
> +    separator = g_strrstr(name, ".");
> +    if (!separator) {
> +        return NULL;
> +    }
> +
> +    index = strtol(separator + 1, NULL, 10);
> +    if (index == LONG_MIN || index == LONG_MAX) {
> +        return NULL;
> +    }
> +
> +    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
> +    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
> +                     " virtio_mmio.device=512@0x%lx:%ld",
> +                     VIRTIO_MMIO_BASE + index * 512,
> +                     VIRTIO_IRQ_BASE + index);
> +    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
> +        g_free(cmdline);
> +        return NULL;
> +    }
> +
> +    return cmdline;
> +}
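
The function above stores the strtol() result in an unsigned long and only checks it against LONG_MIN/LONG_MAX, so a non-numeric suffix would be parsed as index 0 rather than rejected. A hedged sketch of a stricter variant using only the C library; format_virtio_mmio_param() is a hypothetical helper, not a proposed replacement, and the bounds check against VIRTIO_NUM_TRANSPORTS is an assumption about what the caller wants:

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Values taken from the quoted include/hw/i386/microvm.h. */
#define VIRTIO_MMIO_BASE      0xd0000000UL
#define VIRTIO_IRQ_BASE       5
#define VIRTIO_NUM_TRANSPORTS 8

/*
 * Hypothetical helper: parse the transport index from the text after the
 * last '.' of the bus name and format the kernel parameter into buf.
 * Returns 0 on success, -1 on a parse, range or truncation error.
 */
static int format_virtio_mmio_param(const char *suffix, char *buf, size_t len)
{
    char *end;
    unsigned long index;
    int ret;

    errno = 0;
    index = strtoul(suffix, &end, 10);
    if (errno != 0 || end == suffix || *end != '\0' ||
        index >= VIRTIO_NUM_TRANSPORTS) {
        return -1;
    }

    /* 512-byte window per transport, one IRQ per transport. */
    ret = snprintf(buf, len, " virtio_mmio.device=512@0x%lx:%lu",
                   VIRTIO_MMIO_BASE + index * 512,
                   VIRTIO_IRQ_BASE + index);
    return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```

For suffix "2" this yields " virtio_mmio.device=512@0xd0000400:7". Inside QEMU, qemu_strtoul() would presumably be the preferred parser.
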
> +
> +static void microvm_setup_bootparams(MicrovmMachineState *mms, const gchar *kernel_cmdline)
> +{
> +    struct boot_params params;
> +    BusState *bus;
> +    BusChild *kid;
> +    gchar *cmdline;
> +    int cmdline_len;
> +    int i;
> +
> +    cmdline = g_strdup(kernel_cmdline);
> +
> +    /*
> +     * Find MMIO transports with attached devices, and add them to the kernel
> +     * command line.
> +     */
> +    bus = sysbus_get_default();
> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
> +        DeviceState *dev = kid->child;
> +        ObjectClass *class = object_get_class(OBJECT(dev));
> +
> +        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
> +
> +            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
> +                gchar *mmio_cmdline = microvm_get_virtio_mmio_cmdline(mmio_bus->name);
> +                if (mmio_cmdline) {
> +                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
> +                    g_free(mmio_cmdline);
> +                    g_free(cmdline);
> +                    cmdline = newcmd;
> +                }
> +            }
> +        }
> +    }
> +
> +    cmdline_len = strlen(cmdline);
> +
> +    address_space_write(&address_space_memory,
> +                        KERNEL_CMDLINE_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) cmdline, cmdline_len);
> +
> +    g_free(cmdline);
> +
> +    memset(&params, 0, sizeof(struct boot_params));
> +
> +    params.hdr.type_of_loader = KERNEL_LOADER_OTHER;
> +    params.hdr.boot_flag = KERNEL_BOOT_FLAG_MAGIC;
> +    params.hdr.header = KERNEL_HDR_MAGIC;
> +    params.hdr.cmd_line_ptr = KERNEL_CMDLINE_START;
> +    params.hdr.cmdline_size = cmdline_len;
> +    params.hdr.kernel_alignment = KERNEL_MIN_ALIGNMENT_BYTES;
> +
> +    params.e820_entries = e820_get_num_entries();
> +    for (i = 0; i < params.e820_entries; i++) {
> +        uint64_t address, length;
> +        if (e820_get_entry(i, E820_RAM, &address, &length)) {
> +            params.e820_table[i].addr = address;
> +            params.e820_table[i].size = length;
> +            params.e820_table[i].type = E820_RAM;
> +        }
> +    }
> +
> +    address_space_write(&address_space_memory,
> +                        ZERO_PAGE_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) &params, sizeof(struct boot_params));
> +}
> +
> +static void microvm_init_page_tables(void)
> +{
> +    uint64_t val = 0;
> +    int i;
> +
> +    val = PDPTE_START | 0x03;
> +    address_space_write(&address_space_memory,
> +                        PML4_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) &val, 8);
> +    val = PDE_START | 0x03;
> +    address_space_write(&address_space_memory,
> +                        PDPTE_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) &val, 8);
> +
> +    for (i = 0; i < 512; i++) {
> +        val = (i << 21) + 0x83;
> +        address_space_write(&address_space_memory,
> +                            PDE_START + (i * 8), MEMTXATTRS_UNSPECIFIED,
> +                            (uint8_t *) &val, 8);
> +    }
> +}
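
As I read it, the function above builds a one-entry PML4 and a one-entry PDPT, then fills 512 page-directory entries that identity-map the first 1 GiB with 2 MiB pages. The magic numbers decode as 0x03 = present + writable and 0x83 = present + writable + page-size. A small sketch of the entry values with named flags; the macro and helper names are hypothetical:

```c
#include <stdint.h>

/* Table addresses taken from the quoted include/hw/i386/microvm.h. */
#define PDPTE_START 0xa000ULL
#define PDE_START   0xb000ULL

/* x86-64 paging entry bits used by the quoted code. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_PSE      (1ULL << 7) /* in a PDE: maps a 2 MiB page */

/* Entry pointing at the next-level table (the 0x03 case). */
static uint64_t table_pointer(uint64_t next_table_addr)
{
    return next_table_addr | PTE_PRESENT | PTE_WRITABLE;
}

/* i-th page-directory entry of the identity map (the 0x83 case). */
static uint64_t identity_pde(unsigned int i)
{
    return ((uint64_t)i << 21) | PTE_PRESENT | PTE_WRITABLE | PTE_PSE;
}
```

Here table_pointer(PDPTE_START) is the value written at PML4_START, table_pointer(PDE_START) the one written at PDPTE_START, and identity_pde(i) the one written at PDE_START + i * 8.
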
> +
> +static void microvm_cpu_reset(CPUState *cs, uint64_t elf_entry)
> +{
> +    X86CPU *cpu = X86_CPU(cs);
> +    CPUX86State *env = &cpu->env;
> +    struct SegmentCache seg_code =
> +        { .selector = 0x8, .base = 0x0, .limit = 0xfffff, .flags = 0xa09b00 };
> +    struct SegmentCache seg_data =
> +        { .selector = 0x10, .base = 0x0, .limit = 0xfffff, .flags = 0xc09300 };
> +    struct SegmentCache seg_tr =
> +        { .selector = 0x18, .base = 0x0, .limit = 0xfffff, .flags = 0x808b00 };
> +
> +    kvm_arch_get_registers(cs);
> +
> +    memcpy(&env->segs[R_CS], &seg_code, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_DS], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_ES], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_FS], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_GS], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_SS], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->tr, &seg_tr, sizeof(struct SegmentCache));
> +
> +    env->efer |= MSR_EFER_LME | MSR_EFER_LMA;
> +    env->regs[R_ESP] = BOOT_STACK_POINTER;
> +    env->regs[R_EBP] = BOOT_STACK_POINTER;
> +    env->regs[R_ESI] = ZERO_PAGE_START;
> +
> +    cpu_set_pc(cs, elf_entry);
> +    cpu_x86_update_cr3(env, PML4_START);
> +    cpu_x86_update_cr4(env, env->cr[4] | CR4_PAE_MASK);
> +    cpu_x86_update_cr0(env, env->cr[0] | CR0_PE_MASK | CR0_PG_MASK);
> +    x86_update_hflags(env);
> +
> +    kvm_arch_put_registers(cs, KVM_PUT_RESET_STATE);
> +}
> +
> +static void microvm_mptable_setup(MicrovmMachineState *mms)
> +{
> +    char *mptable;
> +    int size;
> +
> +    mptable = mptable_generate(smp_cpus, mms->apic_id_limit,
> +                               EBDA_START, &size);
> +    address_space_write(&address_space_memory,
> +                        EBDA_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) mptable, size);
> +    g_free(mptable);
> +}
> +
> +static bool microvm_machine_get_legacy(Object *obj, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    return mms->legacy;
> +}
> +
> +static void microvm_machine_set_legacy(Object *obj, bool value, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    mms->legacy = value;
> +}
> +
> +static void microvm_machine_reset(void)
> +{
> +    MachineState *machine = MACHINE(qdev_get_machine());
> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
> +    CPUState *cs;
> +    X86CPU *cpu;
> +
> +    qemu_devices_reset();
> +
> +    microvm_mptable_setup(mms);
> +    microvm_setup_bootparams(mms, machine->kernel_cmdline);
> +    microvm_init_page_tables();
> +
> +    CPU_FOREACH(cs) {
> +        cpu = X86_CPU(cs);
> +
> +        /* Reset APIC after devices have been reset to cancel
> +         * any changes that qemu_devices_reset() might have done.
> +         */
> +        if (cpu->apic_state) {
> +            device_reset(cpu->apic_state);
> +        }
> +
> +        microvm_cpu_reset(cs, mms->elf_entry);
> +    }
> +}
> +
> +static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
> +{
> +    CPUState *cs;
> +
> +    CPU_FOREACH(cs) {
> +        X86CPU *cpu = X86_CPU(cs);
> +
> +        if (!cpu->apic_state) {
> +            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
> +        } else {
> +            apic_deliver_nmi(cpu->apic_state);
> +        }
> +    }
> +}
> +
> +static void microvm_machine_instance_init(Object *obj)
> +{
> +}
> +
> +static void microvm_class_init(ObjectClass *oc, void *data)
> +{
> +    NMIClass *nc = NMI_CLASS(oc);
> +
> +    /* NMI handler */
> +    nc->nmi_monitor_handler = x86_nmi;
> +
> +    object_class_property_add_bool(oc, MICROVM_MACHINE_LEGACY,
> +                                   microvm_machine_get_legacy,
> +                                   microvm_machine_set_legacy,
> +                                   &error_abort);
> +}
> +
> +static const TypeInfo microvm_machine_info = {
> +    .name          = TYPE_MICROVM_MACHINE,
> +    .parent        = TYPE_MACHINE,
> +    .abstract      = true,
> +    .instance_size = sizeof(MicrovmMachineState),
> +    .instance_init = microvm_machine_instance_init,
> +    .class_size    = sizeof(MicrovmMachineClass),
> +    .class_init    = microvm_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +         { TYPE_NMI },
> +         { }
> +    },
> +};
> +
> +static void microvm_machine_init(void)
> +{
> +    type_register_static(&microvm_machine_info);
> +}
> +type_init(microvm_machine_init);
> +
> +static void microvm_1_0_instance_init(Object *obj)
> +{
> +}
> +
> +static void microvm_machine_class_init(MachineClass *mc)
> +{
> +    mc->init = microvm_machine_state_init;
> +
> +    mc->family = "microvm_i386";
> +    mc->desc = "Microvm (i386)";
> +    mc->units_per_default_bus = 1;
> +    mc->no_floppy = 1;
> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
> +    mc->max_cpus = 288;
> +    mc->has_hotpluggable_cpus = false;
> +    mc->auto_enable_numa_with_memhp = false;
> +    mc->default_cpu_type = X86_CPU_TYPE_NAME ("host");
> +    mc->nvdimm_supported = false;
> +    mc->default_machine_opts = "accel=kvm";
> +
> +    /* Machine class handlers */
> +    mc->cpu_index_to_instance_props = cpu_index_to_props;
> +    mc->get_default_cpu_node_id = cpu_get_default_cpu_node_id;
> +    mc->possible_cpu_arch_ids = cpu_possible_cpu_arch_ids;
> +    mc->reset = microvm_machine_reset;
> +}
> +
> +static void microvm_1_0_machine_class_init(MachineClass *mc)
> +{
> +    microvm_machine_class_init(mc);
> +}
> +DEFINE_MICROVM_MACHINE_AS_LATEST(1, 0)
> diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
> new file mode 100644
> index 0000000000..544ef60563
> --- /dev/null
> +++ b/include/hw/i386/microvm.h
> @@ -0,0 +1,85 @@
> +/*
> + *
> + * Copyright (c) 2018 Intel Corporation
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef HW_I386_MICROVM_H
> +#define HW_I386_MICROVM_H
> +
> +#include "qemu-common.h"
> +#include "exec/hwaddr.h"
> +#include "qemu/notify.h"
> +
> +#include "hw/boards.h"
> +
> +/* Microvm memory layout */
> +#define ZERO_PAGE_START      0x7000
> +#define BOOT_STACK_POINTER   0x8ff0
> +#define PML4_START           0x9000
> +#define PDPTE_START          0xa000
> +#define PDE_START            0xb000
> +#define EBDA_START           0x9fc00
> +#define HIMEM_START          0x100000
> +#define MICROVM_MAX_BELOW_4G 0xe0000000
> +
> +/* Bootparams related definitions */
> +#define KERNEL_BOOT_FLAG_MAGIC     0xaa55
> +#define KERNEL_HDR_MAGIC           0x53726448
> +#define KERNEL_LOADER_OTHER        0xff
> +#define KERNEL_MIN_ALIGNMENT_BYTES 0x01000000
> +#define KERNEL_CMDLINE_START       0x20000
> +#define KERNEL_CMDLINE_MAX_SIZE    0x10000
> +
> +/* Platform virtio definitions */
> +#define VIRTIO_MMIO_BASE      0xd0000000
> +#define VIRTIO_IRQ_BASE       5
> +#define VIRTIO_NUM_TRANSPORTS 8
> +#define VIRTIO_CMDLINE_MAXLEN 64
> +
> +/* Machine type options */
> +#define MICROVM_MACHINE_LEGACY "legacy"
> +
> +typedef struct {
> +    MachineClass parent;
> +    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
> +                                           DeviceState *dev);
> +} MicrovmMachineClass;
> +
> +typedef struct {
> +    MachineState parent;
> +    unsigned apic_id_limit;
> +    qemu_irq *gsi;
> +
> +    /* RAM size */
> +    ram_addr_t below_4g_mem_size;
> +    ram_addr_t above_4g_mem_size;
> +
> +    /* Kernel ELF entry. On reset, vCPUs RIP will be set to this */
> +    uint64_t elf_entry;
> +
> +    /* Legacy mode based on an ISA bus. Useful for debugging */
> +    bool legacy;
> +} MicrovmMachineState;
> +
> +#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
> +#define MICROVM_MACHINE(obj) \
> +    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
> +#define MICROVM_MACHINE_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
> +#define MICROVM_MACHINE_CLASS(class) \
> +    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
> +
> +#endif
> -- 
> 2.21.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] Introduce the microvm machine type
  2019-06-28 11:53 [Qemu-devel] [PATCH 0/4] Introduce the microvm machine type Sergio Lopez
                   ` (4 preceding siblings ...)
  2019-06-28 13:21 ` [Qemu-devel] [PATCH 0/4] " Paolo Bonzini
@ 2019-06-28 16:32 ` no-reply
  2019-06-28 18:16 ` no-reply
  6 siblings, 0 replies; 29+ messages in thread
From: no-reply @ 2019-06-28 16:32 UTC (permalink / raw)
  To: slp; +Cc: ehabkost, slp, mst, qemu-devel, pbonzini, rth

Patchew URL: https://patchew.org/QEMU/20190628115349.60293-1-slp@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20190628115349.60293-1-slp@redhat.com
Type: series
Subject: [Qemu-devel] [PATCH 0/4] Introduce the microvm machine type

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
7ea633c hw/i386: Introduce the microvm machine type
df613e2 hw/i386: Add an Intel MPTable generator
a4046b8 hw/virtio: Factorize virtio-mmio headers
6fc3cc5 hw/i386: Factorize CPU routine

=== OUTPUT BEGIN ===
1/4 Checking commit 6fc3cc5db6b1 (hw/i386: Factorize CPU routine)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#51: 
new file mode 100644

WARNING: Block comments use a leading /* on a separate line
#102: FILE: hw/i386/cpu.c:47:
+/* Calculates initial APIC ID for a specific CPU index

WARNING: Block comments should align the * on each line
#155: FILE: hw/i386/cpu.c:100:
+         * -smp hasn't been parsed after it
+        */

ERROR: line over 90 characters
#168: FILE: hw/i386/cpu.c:113:
+        ms->possible_cpus->cpus[i].arch_id = cpu_apicid_from_index(i, compat_apic_id_mode);

WARNING: Block comments use a leading /* on a separate line
#214: FILE: hw/i386/cpu.c:159:
+    /* Calculates the limit to CPU APIC ID values

total: 1 errors, 4 warnings, 438 lines checked

Patch 1/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

2/4 Checking commit a4046b824588 (hw/virtio: Factorize virtio-mmio headers)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#66: 
new file mode 100644

total: 0 errors, 1 warnings, 105 lines checked

Patch 2/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
3/4 Checking commit df613e2dbf51 (hw/i386: Add an Intel MPTable generator)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#16: 
new file mode 100644

total: 0 errors, 1 warnings, 376 lines checked

Patch 3/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
4/4 Checking commit 7ea633c1e39e (hw/i386: Introduce the microvm machine type)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#60: 
new file mode 100644

WARNING: line over 80 characters
#198: FILE: hw/i386/microvm.c:134:
+    object_property_add_child(qdev_get_machine(), "ioapic", OBJECT(ioapic_dev), NULL);

WARNING: line over 80 characters
#208: FILE: hw/i386/microvm.c:144:
+    mms->gsi = qemu_allocate_irqs(microvm_gsi_handler, ioapic_irq, IOAPIC_NUM_PINS);

ERROR: consider using qemu_strtol in preference to strtol
#300: FILE: hw/i386/microvm.c:236:
+    index = strtol(separator + 1, NULL, 10);

ERROR: line over 90 characters
#318: FILE: hw/i386/microvm.c:254:
+static void microvm_setup_bootparams(MicrovmMachineState *mms, const gchar *kernel_cmdline)

WARNING: line over 80 characters
#344: FILE: hw/i386/microvm.c:280:
+                gchar *mmio_cmdline = microvm_get_virtio_mmio_cmdline(mmio_bus->name);

ERROR: that open brace { should be on the previous line
#414: FILE: hw/i386/microvm.c:350:
+    struct SegmentCache seg_code =
+        { .selector = 0x8, .base = 0x0, .limit = 0xfffff, .flags = 0xa09b00 };

ERROR: that open brace { should be on the previous line
#416: FILE: hw/i386/microvm.c:352:
+    struct SegmentCache seg_data =
+        { .selector = 0x10, .base = 0x0, .limit = 0xfffff, .flags = 0xc09300 };

ERROR: that open brace { should be on the previous line
#418: FILE: hw/i386/microvm.c:354:
+    struct SegmentCache seg_tr =
+        { .selector = 0x18, .base = 0x0, .limit = 0xfffff, .flags = 0x808b00 };

WARNING: Block comments use a leading /* on a separate line
#487: FILE: hw/i386/microvm.c:423:
+        /* Reset APIC after devices have been reset to cancel

ERROR: space prohibited between function name and open parenthesis '('
#567: FILE: hw/i386/microvm.c:503:
+    mc->default_cpu_type = X86_CPU_TYPE_NAME ("host");

total: 6 errors, 5 warnings, 624 lines checked

Patch 4/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1
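
Of the errors reported above, the strtol one is arguably more than a style issue: the flagged call passes NULL as the end pointer and never checks errno, so a bus name with no digits after the dot would quietly parse as index 0. QEMU's qemu_strtol exists for this purpose; the sketch below uses plain strtol only so that it stays self-contained. The helper name parse_bus_index, its max_index parameter, and the example bus name "virtio-mmio-bus.3" are assumptions made for illustration, not part of the patch.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract the numeric suffix that follows the last '.'
 * in a bus name. Returns 0 and stores the value in *index on success, or a
 * negative errno value on failure.
 */
static int parse_bus_index(const char *name, long max_index, long *index)
{
    const char *sep = strrchr(name, '.');
    char *end;
    long val;

    /* Require a '.' followed by at least one digit (no sign, no spaces). */
    if (!sep || !isdigit((unsigned char)sep[1])) {
        return -EINVAL;
    }

    errno = 0;
    val = strtol(sep + 1, &end, 10);
    if (errno == ERANGE) {
        return -ERANGE;          /* value does not fit in a long */
    }
    if (*end != '\0') {
        return -EINVAL;          /* trailing garbage after the digits */
    }
    if (val >= max_index) {
        return -ERANGE;          /* no such transport slot */
    }

    *index = val;
    return 0;
}
```

With a check like this, a caller would only build the virtio_mmio.device= fragment when the helper returns 0, and the index is already known to be below the number of transports.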


The full log is available at
http://patchew.org/logs/20190628115349.60293-1-slp@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [Qemu-devel] [PATCH 0/4] Introduce the microvm machine type
  2019-06-28 11:53 [Qemu-devel] [PATCH 0/4] Introduce the microvm machine type Sergio Lopez
                   ` (5 preceding siblings ...)
  2019-06-28 16:32 ` no-reply
@ 2019-06-28 18:16 ` no-reply
  6 siblings, 0 replies; 29+ messages in thread
From: no-reply @ 2019-06-28 18:16 UTC (permalink / raw)
  To: slp; +Cc: ehabkost, slp, mst, qemu-devel, pbonzini, rth

Patchew URL: https://patchew.org/QEMU/20190628115349.60293-1-slp@redhat.com/



Hi,

This series failed the build test on an s390x host. Please find the details below.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e

echo
echo "=== ENV ==="
env

echo
echo "=== PACKAGES ==="
rpm -qa

echo
echo "=== UNAME ==="
uname -a

CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

  CC      i386-softmmu/hw/i386/microvm.o
  CC      aarch64-softmmu/hw/misc/exynos4210_rng.o
  CC      lm32-softmmu/gdbstub.o
/var/tmp/patchew-tester-tmp-kt7629bk/src/hw/i386/microvm.c:43:10: fatal error: asm/bootparam.h: No such file or directory
   43 | #include <asm/bootparam.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
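
For context on the failure above: as far as I know, <asm/bootparam.h> is a Linux UAPI header that only x86 provides, so it is missing on an s390x build host even when the i386-softmmu target is being compiled. A minimal sketch of detecting the header at compile time follows. HOST_HAS_BOOTPARAM and host_has_bootparam() are hypothetical names, and a real fix might instead carry a private copy of the structure definitions rather than depend on the host header at all.

```c
#include <stdbool.h>

/*
 * __has_include is provided by GCC >= 5 and by Clang. Compilers that lack
 * it simply fall through to the "header not available" default below.
 */
#if defined(__has_include)
#  if __has_include(<asm/bootparam.h>)
#    include <asm/bootparam.h>
#    define HOST_HAS_BOOTPARAM 1
#  endif
#endif

#ifndef HOST_HAS_BOOTPARAM
#  define HOST_HAS_BOOTPARAM 0
#endif

/* Report whether struct boot_params came from the host's kernel headers. */
static bool host_has_bootparam(void)
{
    return HOST_HAS_BOOTPARAM;
}
```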


The full log is available at
http://patchew.org/logs/20190628115349.60293-1-slp@redhat.com/testing.s390x/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type
  2019-06-28 11:53 ` [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type Sergio Lopez
  2019-06-28 14:06   ` Michael S. Tsirkin
@ 2019-06-28 19:15   ` Maran Wilson
  2019-06-28 21:05     ` Sergio Lopez
  2019-06-28 19:47   ` Eduardo Habkost
  2 siblings, 1 reply; 29+ messages in thread
From: Maran Wilson @ 2019-06-28 19:15 UTC (permalink / raw)
  To: Sergio Lopez, mst, marcel.apfelbaum, pbonzini, rth, ehabkost
  Cc: qemu-devel, Maran Wilson

This seems like a good overall direction for QEMU.

But a lot of Linux-specific startup details are being baked into
the QEMU machine type here, things that are usually pushed into firmware
or an option ROM.

Instead of hard-coding all the zero page setup into the QEMU machine
model, couldn't you just set up the PVH kernel entry point and leave all
the OS-specific details to the OS being started? That way, at least you
are programming to a more generic ABI spec. See:
https://gist.github.com/stefano-garzarella/7b7e17e75add20abd1c42fb496cc6504

And I think you still wouldn't need any firmware if you just replaced
your zero page initialization with the PVH spec setup.
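
To make the suggestion concrete, here is a rough sketch of what the VMM would fill in for a PVH boot. The structure layouts follow my reading of the hvm_start_info definition in the Xen public headers (magic 0x336ec578, with the memory map fields added in version 1); the pvh_ prefixed names and the helper function are hypothetical and would need checking against the real header. As I understand the ABI, the guest is entered in 32-bit protected mode with paging disabled and the physical address of this structure in %ebx, so the VMM would not have to build page tables.

```c
#include <stdint.h>
#include <string.h>

#define PVH_START_MAGIC      0x336ec578u  /* XEN_HVM_START_MAGIC_VALUE */
#define PVH_MEMMAP_TYPE_RAM  1u           /* same numbering as E820 RAM */

/* Local mirror of the PVH start info structure (assumed layout). */
struct pvh_start_info {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
    uint32_t nr_modules;
    uint64_t modlist_paddr;
    uint64_t cmdline_paddr;
    uint64_t rsdp_paddr;
    uint64_t memmap_paddr;    /* version 1 and later */
    uint32_t memmap_entries;  /* version 1 and later */
    uint32_t reserved;
};

/* One memory map entry, located at memmap_paddr (assumed layout). */
struct pvh_memmap_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
    uint32_t reserved;
};

/* Hypothetical helper: describe the command line and memory map to the guest. */
static void pvh_fill_start_info(struct pvh_start_info *si,
                                uint64_t cmdline_paddr,
                                uint64_t memmap_paddr,
                                uint32_t memmap_entries)
{
    memset(si, 0, sizeof(*si));
    si->magic = PVH_START_MAGIC;
    si->version = 1;
    si->cmdline_paddr = cmdline_paddr;
    si->memmap_paddr = memmap_paddr;
    si->memmap_entries = memmap_entries;
}
```

The machine code would then copy the structure and the memory map into guest RAM and point %ebx at it, instead of writing a Linux-specific struct boot_params.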

Thanks,
-Maran

On 6/28/2019 4:53 AM, Sergio Lopez wrote:
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> constructed after the machine model implemented by the latter.
>
> Its main purpose is providing users a KVM-only machine type with fast
> boot times, minimal attack surface (measured as the number of IO ports
> and MMIO regions exposed to the Guest) and small footprint (especially
> when combined with the ongoing QEMU modularization effort).
>
> Normally, other than the device support provided by KVM itself,
> microvm only supports virtio-mmio devices. Microvm also includes a
> legacy mode, which adds an ISA bus with a 16550A serial port, useful
> for being able to see the early boot kernel messages.
>
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
>   default-configs/i386-softmmu.mak |   1 +
>   hw/i386/Kconfig                  |   4 +
>   hw/i386/Makefile.objs            |   1 +
>   hw/i386/microvm.c                | 518 +++++++++++++++++++++++++++++++
>   include/hw/i386/microvm.h        |  85 +++++
>   5 files changed, 609 insertions(+)
>   create mode 100644 hw/i386/microvm.c
>   create mode 100644 include/hw/i386/microvm.h
>
> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> index cd5ea391e8..338f07420f 100644
> --- a/default-configs/i386-softmmu.mak
> +++ b/default-configs/i386-softmmu.mak
> @@ -26,3 +26,4 @@ CONFIG_ISAPC=y
>   CONFIG_I440FX=y
>   CONFIG_Q35=y
>   CONFIG_ACPI_PCI=y
> +CONFIG_MICROVM=y
> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
> index 9817888216..94c565d8db 100644
> --- a/hw/i386/Kconfig
> +++ b/hw/i386/Kconfig
> @@ -87,6 +87,10 @@ config Q35
>       select VMMOUSE
>       select FW_CFG_DMA
>   
> +config MICROVM
> +    bool
> +    select VIRTIO_MMIO
> +
>   config VTD
>       bool
>   
> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> index 102f2b35fc..149bdd0784 100644
> --- a/hw/i386/Makefile.objs
> +++ b/hw/i386/Makefile.objs
> @@ -4,6 +4,7 @@ obj-y += cpu.o
>   obj-y += pc.o
>   obj-$(CONFIG_I440FX) += pc_piix.o
>   obj-$(CONFIG_Q35) += pc_q35.o
> +obj-$(CONFIG_MICROVM) += mptable.o microvm.o
>   obj-y += fw_cfg.o pc_sysfw.o
>   obj-y += x86-iommu.o
>   obj-$(CONFIG_VTD) += intel_iommu.o
> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
> new file mode 100644
> index 0000000000..fff88c3697
> --- /dev/null
> +++ b/hw/i386/microvm.c
> @@ -0,0 +1,518 @@
> +/*
> + *
> + * Copyright (c) 2018 Intel Corporation
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/error-report.h"
> +#include "qapi/error.h"
> +#include "qapi/visitor.h"
> +#include "sysemu/sysemu.h"
> +#include "sysemu/cpus.h"
> +#include "sysemu/numa.h"
> +
> +#include "hw/loader.h"
> +#include "hw/nmi.h"
> +#include "hw/kvm/clock.h"
> +#include "hw/i386/microvm.h"
> +#include "hw/i386/pc.h"
> +#include "hw/i386/cpu-internal.h"
> +#include "target/i386/cpu.h"
> +#include "hw/timer/i8254.h"
> +#include "hw/char/serial.h"
> +#include "hw/i386/topology.h"
> +#include "hw/virtio/virtio-mmio.h"
> +#include "hw/i386/mptable.h"
> +
> +#include "cpu.h"
> +#include "elf.h"
> +#include "kvm_i386.h"
> +#include <asm/bootparam.h>
> +
> +#define DEFINE_MICROVM_MACHINE_LATEST(major, minor, latest) \
> +    static void microvm_##major##_##minor##_object_class_init(ObjectClass *oc, \
> +                                                              void *data) \
> +    { \
> +        MachineClass *mc = MACHINE_CLASS(oc); \
> +        microvm_##major##_##minor##_machine_class_init(mc); \
> +        mc->desc = "Microvm (i386)"; \
> +        if (latest) { \
> +            mc->alias = "microvm"; \
> +        } \
> +    } \
> +    static const TypeInfo microvm_##major##_##minor##_info = { \
> +        .name = MACHINE_TYPE_NAME("microvm-" # major "." # minor), \
> +        .parent = TYPE_MICROVM_MACHINE, \
> +        .instance_init = microvm_##major##_##minor##_instance_init, \
> +        .class_init = microvm_##major##_##minor##_object_class_init, \
> +    }; \
> +    static void microvm_##major##_##minor##_init(void) \
> +    { \
> +        type_register_static(&microvm_##major##_##minor##_info); \
> +    } \
> +    type_init(microvm_##major##_##minor##_init);
> +
> +#define DEFINE_MICROVM_MACHINE_AS_LATEST(major, minor) \
> +    DEFINE_MICROVM_MACHINE_LATEST(major, minor, true)
> +#define DEFINE_MICROVM_MACHINE(major, minor) \
> +    DEFINE_MICROVM_MACHINE_LATEST(major, minor, false)
> +
> +static void microvm_gsi_handler(void *opaque, int n, int level)
> +{
> +    qemu_irq *ioapic_irq = opaque;
> +
> +    qemu_set_irq(ioapic_irq[n], level);
> +}
> +
> +static void microvm_legacy_init(MicrovmMachineState *mms)
> +{
> +    ISABus *isa_bus;
> +    GSIState *gsi_state;
> +    qemu_irq *i8259;
> +    int i;
> +
> +    assert(kvm_irqchip_in_kernel());
> +    gsi_state = g_malloc0(sizeof(*gsi_state));
> +    mms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
> +
> +    isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
> +                          &error_abort);
> +    isa_bus_irqs(isa_bus, mms->gsi);
> +
> +    assert(kvm_pic_in_kernel());
> +    i8259 = kvm_i8259_init(isa_bus);
> +
> +    for (i = 0; i < ISA_NUM_IRQS; i++) {
> +        gsi_state->i8259_irq[i] = i8259[i];
> +    }
> +
> +    kvm_pit_init(isa_bus, 0x40);
> +
> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
> +        int nirq = VIRTIO_IRQ_BASE + i;
> +        ISADevice *isadev = isa_create(isa_bus, TYPE_ISA_SERIAL);
> +        qemu_irq mmio_irq;
> +
> +        isa_init_irq(isadev, &mmio_irq, nirq);
> +        sysbus_create_simple("virtio-mmio",
> +                             VIRTIO_MMIO_BASE + i * 512,
> +                             mms->gsi[VIRTIO_IRQ_BASE + i]);
> +    }
> +
> +    g_free(i8259);
> +
> +    serial_hds_isa_init(isa_bus, 0, 1);
> +}
> +
> +static void microvm_ioapic_init(MicrovmMachineState *mms)
> +{
> +    qemu_irq *ioapic_irq;
> +    DeviceState *ioapic_dev;
> +    SysBusDevice *d;
> +    int i;
> +
> +    assert(kvm_irqchip_in_kernel());
> +    ioapic_irq = g_new0(qemu_irq, IOAPIC_NUM_PINS);
> +    kvm_pc_setup_irq_routing(true);
> +
> +    assert(kvm_ioapic_in_kernel());
> +    ioapic_dev = qdev_create(NULL, "kvm-ioapic");
> +
> +    object_property_add_child(qdev_get_machine(), "ioapic", OBJECT(ioapic_dev), NULL);
> +
> +    qdev_init_nofail(ioapic_dev);
> +    d = SYS_BUS_DEVICE(ioapic_dev);
> +    sysbus_mmio_map(d, 0, IO_APIC_DEFAULT_ADDRESS);
> +
> +    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
> +        ioapic_irq[i] = qdev_get_gpio_in(ioapic_dev, i);
> +    }
> +
> +    mms->gsi = qemu_allocate_irqs(microvm_gsi_handler, ioapic_irq, IOAPIC_NUM_PINS);
> +
> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
> +        sysbus_create_simple("virtio-mmio",
> +                             VIRTIO_MMIO_BASE + i * 512,
> +                             mms->gsi[VIRTIO_IRQ_BASE + i]);
> +    }
> +}
> +
> +static void microvm_memory_init(MicrovmMachineState *mms)
> +{
> +    MachineState *machine = MACHINE(mms);
> +    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
> +    MemoryRegion *system_memory = get_system_memory();
> +
> +    if (machine->ram_size > MICROVM_MAX_BELOW_4G) {
> +        mms->above_4g_mem_size = machine->ram_size - MICROVM_MAX_BELOW_4G;
> +        mms->below_4g_mem_size = MICROVM_MAX_BELOW_4G;
> +    } else {
> +        mms->above_4g_mem_size = 0;
> +        mms->below_4g_mem_size = machine->ram_size;
> +    }
> +
> +    ram = g_malloc(sizeof(*ram));
> +    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
> +                                         machine->ram_size);
> +
> +    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
> +    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
> +                             0, mms->below_4g_mem_size);
> +    memory_region_add_subregion(system_memory, 0, ram_below_4g);
> +
> +    e820_add_entry(0, mms->below_4g_mem_size, E820_RAM);
> +
> +    if (mms->above_4g_mem_size > 0) {
> +        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
> +        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
> +                                 mms->below_4g_mem_size,
> +                                 mms->above_4g_mem_size);
> +        memory_region_add_subregion(system_memory, 0x100000000ULL,
> +                                    ram_above_4g);
> +        e820_add_entry(0x100000000ULL, mms->above_4g_mem_size, E820_RAM);
> +    }
> +}
> +
> +static void microvm_machine_state_init(MachineState *machine)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
> +    uint64_t elf_entry;
> +    int kernel_size;
> +
> +    if (machine->kernel_filename == NULL) {
> +        error_report("missing kernel image file name, required by microvm");
> +        exit(1);
> +    }
> +
> +    microvm_memory_init(mms);
> +    if (mms->legacy) {
> +        microvm_legacy_init(mms);
> +    } else {
> +        microvm_ioapic_init(mms);
> +    }
> +
> +    mms->apic_id_limit = cpus_init(machine, false);
> +
> +    kvmclock_create();
> +
> +    kernel_size = load_elf(machine->kernel_filename, NULL,
> +                           NULL, NULL, &elf_entry,
> +                           NULL, NULL, 0, I386_ELF_MACHINE,
> +                           0, 0);
> +
> +    if (kernel_size < 0) {
> +        error_report("Error while loading elf kernel");
> +        exit(1);
> +    }
> +
> +    mms->elf_entry = elf_entry;
> +}
> +
> +static gchar *microvm_get_virtio_mmio_cmdline(gchar *name)
> +{
> +    gchar *cmdline;
> +    gchar *separator;
> +    unsigned long index;
> +    int ret;
> +
> +    separator = g_strrstr(name, ".");
> +    if (!separator) {
> +        return NULL;
> +    }
> +
> +    index = strtol(separator + 1, NULL, 10);
> +    if (index == LONG_MIN || index == LONG_MAX) {
> +        return NULL;
> +    }
> +
> +    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
> +    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
> +                     " virtio_mmio.device=512@0x%lx:%ld",
> +                     VIRTIO_MMIO_BASE + index * 512,
> +                     VIRTIO_IRQ_BASE + index);
> +    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
> +        g_free(cmdline);
> +        return NULL;
> +    }
> +
> +    return cmdline;
> +}
> +
> +static void microvm_setup_bootparams(MicrovmMachineState *mms, const gchar *kernel_cmdline)
> +{
> +    struct boot_params params;
> +    BusState *bus;
> +    BusChild *kid;
> +    gchar *cmdline;
> +    int cmdline_len;
> +    int i;
> +
> +    cmdline = g_strdup(kernel_cmdline);
> +
> +    /*
> +     * Find MMIO transports with attached devices, and add them to the kernel
> +     * command line.
> +     */
> +    bus = sysbus_get_default();
> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
> +        DeviceState *dev = kid->child;
> +        ObjectClass *class = object_get_class(OBJECT(dev));
> +
> +        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
> +
> +            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
> +                gchar *mmio_cmdline = microvm_get_virtio_mmio_cmdline(mmio_bus->name);
> +                if (mmio_cmdline) {
> +                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
> +                    g_free(mmio_cmdline);
> +                    g_free(cmdline);
> +                    cmdline = newcmd;
> +                }
> +            }
> +        }
> +    }
> +
> +    cmdline_len = strlen(cmdline);
> +
> +    address_space_write(&address_space_memory,
> +                        KERNEL_CMDLINE_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) cmdline, cmdline_len);
> +
> +    g_free(cmdline);
> +
> +    memset(&params, 0, sizeof(struct boot_params));
> +
> +    params.hdr.type_of_loader = KERNEL_LOADER_OTHER;
> +    params.hdr.boot_flag = KERNEL_BOOT_FLAG_MAGIC;
> +    params.hdr.header = KERNEL_HDR_MAGIC;
> +    params.hdr.cmd_line_ptr = KERNEL_CMDLINE_START;
> +    params.hdr.cmdline_size = cmdline_len;
> +    params.hdr.kernel_alignment = KERNEL_MIN_ALIGNMENT_BYTES;
> +
> +    params.e820_entries = e820_get_num_entries();
> +    for (i = 0; i < params.e820_entries; i++) {
> +        uint64_t address, length;
> +        if (e820_get_entry(i, E820_RAM, &address, &length)) {
> +            params.e820_table[i].addr = address;
> +            params.e820_table[i].size = length;
> +            params.e820_table[i].type = E820_RAM;
> +        }
> +    }
> +
> +    address_space_write(&address_space_memory,
> +                        ZERO_PAGE_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) &params, sizeof(struct boot_params));
> +}
> +
> +static void microvm_init_page_tables(void)
> +{
> +    uint64_t val = 0;
> +    int i;
> +
> +    val = PDPTE_START | 0x03;
> +    address_space_write(&address_space_memory,
> +                        PML4_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) &val, 8);
> +    val = PDE_START | 0x03;
> +    address_space_write(&address_space_memory,
> +                        PDPTE_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) &val, 8);
> +
> +    for (i = 0; i < 512; i++) {
> +        val = (i << 21) + 0x83;
> +        address_space_write(&address_space_memory,
> +                            PDE_START + (i * 8), MEMTXATTRS_UNSPECIFIED,
> +                            (uint8_t *) &val, 8);
> +    }
> +}
> +
> +static void microvm_cpu_reset(CPUState *cs, uint64_t elf_entry)
> +{
> +    X86CPU *cpu = X86_CPU(cs);
> +    CPUX86State *env = &cpu->env;
> +    struct SegmentCache seg_code =
> +        { .selector = 0x8, .base = 0x0, .limit = 0xfffff, .flags = 0xa09b00 };
> +    struct SegmentCache seg_data =
> +        { .selector = 0x10, .base = 0x0, .limit = 0xfffff, .flags = 0xc09300 };
> +    struct SegmentCache seg_tr =
> +        { .selector = 0x18, .base = 0x0, .limit = 0xfffff, .flags = 0x808b00 };
> +
> +    kvm_arch_get_registers(cs);
> +
> +    memcpy(&env->segs[R_CS], &seg_code, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_DS], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_ES], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_FS], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_GS], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_SS], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->tr, &seg_tr, sizeof(struct SegmentCache));
> +
> +    env->efer |= MSR_EFER_LME | MSR_EFER_LMA;
> +    env->regs[R_ESP] = BOOT_STACK_POINTER;
> +    env->regs[R_EBP] = BOOT_STACK_POINTER;
> +    env->regs[R_ESI] = ZERO_PAGE_START;
> +
> +    cpu_set_pc(cs, elf_entry);
> +    cpu_x86_update_cr3(env, PML4_START);
> +    cpu_x86_update_cr4(env, env->cr[4] | CR4_PAE_MASK);
> +    cpu_x86_update_cr0(env, env->cr[0] | CR0_PE_MASK | CR0_PG_MASK);
> +    x86_update_hflags(env);
> +
> +    kvm_arch_put_registers(cs, KVM_PUT_RESET_STATE);
> +}
> +
> +static void microvm_mptable_setup(MicrovmMachineState *mms)
> +{
> +    char *mptable;
> +    int size;
> +
> +    mptable = mptable_generate(smp_cpus, mms->apic_id_limit,
> +                               EBDA_START, &size);
> +    address_space_write(&address_space_memory,
> +                        EBDA_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) mptable, size);
> +    g_free(mptable);
> +}
> +
> +static bool microvm_machine_get_legacy(Object *obj, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    return mms->legacy;
> +}
> +
> +static void microvm_machine_set_legacy(Object *obj, bool value, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    mms->legacy = value;
> +}
> +
> +static void microvm_machine_reset(void)
> +{
> +    MachineState *machine = MACHINE(qdev_get_machine());
> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
> +    CPUState *cs;
> +    X86CPU *cpu;
> +
> +    qemu_devices_reset();
> +
> +    microvm_mptable_setup(mms);
> +    microvm_setup_bootparams(mms, machine->kernel_cmdline);
> +    microvm_init_page_tables();
> +
> +    CPU_FOREACH(cs) {
> +        cpu = X86_CPU(cs);
> +
> +        /* Reset APIC after devices have been reset to cancel
> +         * any changes that qemu_devices_reset() might have done.
> +         */
> +        if (cpu->apic_state) {
> +            device_reset(cpu->apic_state);
> +        }
> +
> +        microvm_cpu_reset(cs, mms->elf_entry);
> +    }
> +}
> +
> +static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
> +{
> +    CPUState *cs;
> +
> +    CPU_FOREACH(cs) {
> +        X86CPU *cpu = X86_CPU(cs);
> +
> +        if (!cpu->apic_state) {
> +            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
> +        } else {
> +            apic_deliver_nmi(cpu->apic_state);
> +        }
> +    }
> +}
> +
> +static void microvm_machine_instance_init(Object *obj)
> +{
> +}
> +
> +static void microvm_class_init(ObjectClass *oc, void *data)
> +{
> +    NMIClass *nc = NMI_CLASS(oc);
> +
> +    /* NMI handler */
> +    nc->nmi_monitor_handler = x86_nmi;
> +
> +    object_class_property_add_bool(oc, MICROVM_MACHINE_LEGACY,
> +                                   microvm_machine_get_legacy,
> +                                   microvm_machine_set_legacy,
> +                                   &error_abort);
> +}
> +
> +static const TypeInfo microvm_machine_info = {
> +    .name          = TYPE_MICROVM_MACHINE,
> +    .parent        = TYPE_MACHINE,
> +    .abstract      = true,
> +    .instance_size = sizeof(MicrovmMachineState),
> +    .instance_init = microvm_machine_instance_init,
> +    .class_size    = sizeof(MicrovmMachineClass),
> +    .class_init    = microvm_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +         { TYPE_NMI },
> +         { }
> +    },
> +};
> +
> +static void microvm_machine_init(void)
> +{
> +    type_register_static(&microvm_machine_info);
> +}
> +type_init(microvm_machine_init);
> +
> +static void microvm_1_0_instance_init(Object *obj)
> +{
> +}
> +
> +static void microvm_machine_class_init(MachineClass *mc)
> +{
> +    mc->init = microvm_machine_state_init;
> +
> +    mc->family = "microvm_i386";
> +    mc->desc = "Microvm (i386)";
> +    mc->units_per_default_bus = 1;
> +    mc->no_floppy = 1;
> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
> +    mc->max_cpus = 288;
> +    mc->has_hotpluggable_cpus = false;
> +    mc->auto_enable_numa_with_memhp = false;
> +    mc->default_cpu_type = X86_CPU_TYPE_NAME ("host");
> +    mc->nvdimm_supported = false;
> +    mc->default_machine_opts = "accel=kvm";
> +
> +    /* Machine class handlers */
> +    mc->cpu_index_to_instance_props = cpu_index_to_props;
> +    mc->get_default_cpu_node_id = cpu_get_default_cpu_node_id;
> +    mc->possible_cpu_arch_ids = cpu_possible_cpu_arch_ids;;
> +    mc->reset = microvm_machine_reset;
> +}
> +
> +static void microvm_1_0_machine_class_init(MachineClass *mc)
> +{
> +    microvm_machine_class_init(mc);
> +}
> +DEFINE_MICROVM_MACHINE_AS_LATEST(1, 0)
> diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
> new file mode 100644
> index 0000000000..544ef60563
> --- /dev/null
> +++ b/include/hw/i386/microvm.h
> @@ -0,0 +1,85 @@
> +/*
> + *
> + * Copyright (c) 2018 Intel Corporation
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef HW_I386_MICROVM_H
> +#define HW_I386_MICROVM_H
> +
> +#include "qemu-common.h"
> +#include "exec/hwaddr.h"
> +#include "qemu/notify.h"
> +
> +#include "hw/boards.h"
> +
> +/* Microvm memory layout */
> +#define ZERO_PAGE_START      0x7000
> +#define BOOT_STACK_POINTER   0x8ff0
> +#define PML4_START           0x9000
> +#define PDPTE_START          0xa000
> +#define PDE_START            0xb000
> +#define EBDA_START           0x9fc00
> +#define HIMEM_START          0x100000
> +#define MICROVM_MAX_BELOW_4G 0xe0000000
> +
> +/* Bootparams related definitions */
> +#define KERNEL_BOOT_FLAG_MAGIC     0xaa55
> +#define KERNEL_HDR_MAGIC           0x53726448
> +#define KERNEL_LOADER_OTHER        0xff
> +#define KERNEL_MIN_ALIGNMENT_BYTES 0x01000000
> +#define KERNEL_CMDLINE_START       0x20000
> +#define KERNEL_CMDLINE_MAX_SIZE    0x10000
> +
> +/* Platform virtio definitions */
> +#define VIRTIO_MMIO_BASE      0xd0000000
> +#define VIRTIO_IRQ_BASE       5
> +#define VIRTIO_NUM_TRANSPORTS 8
> +#define VIRTIO_CMDLINE_MAXLEN 64
> +
> +/* Machine type options */
> +#define MICROVM_MACHINE_LEGACY "legacy"
> +
> +typedef struct {
> +    MachineClass parent;
> +    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
> +                                           DeviceState *dev);
> +} MicrovmMachineClass;
> +
> +typedef struct {
> +    MachineState parent;
> +    unsigned apic_id_limit;
> +    qemu_irq *gsi;
> +
> +    /* RAM size */
> +    ram_addr_t below_4g_mem_size;
> +    ram_addr_t above_4g_mem_size;
> +
> +    /* Kernel ELF entry. On reset, vCPUs RIP will be set to this */
> +    uint64_t elf_entry;
> +
> +    /* Legacy mode based on an ISA bus. Useful for debugging */
> +    bool legacy;
> +} MicrovmMachineState;
> +
> +#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
> +#define MICROVM_MACHINE(obj) \
> +    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
> +#define MICROVM_MACHINE_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
> +#define MICROVM_MACHINE_CLASS(class) \
> +    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
> +
> +#endif




* Re: [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type
  2019-06-28 11:53 ` [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type Sergio Lopez
  2019-06-28 14:06   ` Michael S. Tsirkin
  2019-06-28 19:15   ` Maran Wilson
@ 2019-06-28 19:47   ` Eduardo Habkost
  2019-06-28 21:42     ` Sergio Lopez
  2 siblings, 1 reply; 29+ messages in thread
From: Eduardo Habkost @ 2019-06-28 19:47 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: pbonzini, rth, qemu-devel, mst

Hi,

This looks good overall; I'm just confused by the versioning
system.  Comments below:


On Fri, Jun 28, 2019 at 01:53:49PM +0200, Sergio Lopez wrote:
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> constructed after the machine model implemented by the latter.
> 
> Its main purpose is providing users a KVM-only machine type with fast
> boot times, minimal attack surface (measured as the number of IO ports
> and MMIO regions exposed to the Guest) and small footprint (especially
> when combined with the ongoing QEMU modularization effort).
> 
> Normally, other than the device support provided by KVM itself,
> microvm only supports virtio-mmio devices. Microvm also includes a
> legacy mode, which adds an ISA bus with a 16550A serial port, useful
> for being able to see the early boot kernel messages.
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>
[...]
> +static const TypeInfo microvm_machine_info = {
> +    .name          = TYPE_MICROVM_MACHINE,
> +    .parent        = TYPE_MACHINE,
> +    .abstract      = true,
> +    .instance_size = sizeof(MicrovmMachineState),
> +    .instance_init = microvm_machine_instance_init,
> +    .class_size    = sizeof(MicrovmMachineClass),
> +    .class_init    = microvm_class_init,

[1]

> +    .interfaces = (InterfaceInfo[]) {
> +         { TYPE_NMI },
> +         { }
> +    },
> +};
> +
> +static void microvm_machine_init(void)
> +{
> +    type_register_static(&microvm_machine_info);
> +}
> +type_init(microvm_machine_init);
> +
> +static void microvm_1_0_instance_init(Object *obj)
> +{
> +}

You shouldn't need an instance_init function if it's empty; I
believe you can delete it.

> +
> +static void microvm_machine_class_init(MachineClass *mc)

Why do you need both microvm_machine_class_init() and
microvm_class_init() [1]?

> +{
> +    mc->init = microvm_machine_state_init;
> +
> +    mc->family = "microvm_i386";
> +    mc->desc = "Microvm (i386)";
> +    mc->units_per_default_bus = 1;
> +    mc->no_floppy = 1;
> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
> +    mc->max_cpus = 288;

Where does this limit come from?

> +    mc->has_hotpluggable_cpus = false;
> +    mc->auto_enable_numa_with_memhp = false;
> +    mc->default_cpu_type = X86_CPU_TYPE_NAME ("host");
> +    mc->nvdimm_supported = false;
> +    mc->default_machine_opts = "accel=kvm";
> +
> +    /* Machine class handlers */
> +    mc->cpu_index_to_instance_props = cpu_index_to_props;
> +    mc->get_default_cpu_node_id = cpu_get_default_cpu_node_id;
> +    mc->possible_cpu_arch_ids = cpu_possible_cpu_arch_ids;;

I don't think these methods should be mandatory if you don't
support NUMA or CPU hotplug.  Do you really need them?

(If the core machine code makes them mandatory, it's probably not
intentional).


> +    mc->reset = microvm_machine_reset;
> +}
> +
> +static void microvm_1_0_machine_class_init(MachineClass *mc)
> +{
> +    microvm_machine_class_init(mc);
> +}
> +DEFINE_MICROVM_MACHINE_AS_LATEST(1, 0)


We only have multiple versions of some machine types (pc-*,
virt-*, pseries-*, s390-ccw-virtio-*) because of Guest ABI
compatibility (which you are not implementing here).  What's the
reason behind having multiple microvm machine versions?


[...]
> +#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")

Using MACHINE_TYPE_NAME("microvm") might eventually cause
conflicts with the "microvm" alias you are registering.  I
suggest using something like "microvm-machine-base".

A separate base class will only be necessary if you are really
planning to provide multiple versions of the machine type,
though.
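
To make the naming concern concrete, here is a minimal standalone sketch.
It assumes MACHINE_TYPE_NAME() simply appends a "-machine" suffix, and
that the user-visible machine name is the type name with that suffix
stripped. The two macros are redefined locally for illustration, and the
helper functions are hypothetical, not QEMU APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local mirrors of the QEMU macros (assumed definitions, for illustration). */
#define TYPE_MACHINE_SUFFIX "-machine"
#define MACHINE_TYPE_NAME(machinename) (machinename TYPE_MACHINE_SUFFIX)

/*
 * Hypothetical helper: derive the user-visible machine name by stripping
 * the "-machine" suffix from a QOM type name. Returns 0 on success, -1 if
 * the type name has no such suffix or the output buffer is too small.
 */
static int machine_name_from_type(const char *type, char *out, size_t outlen)
{
    size_t tlen, slen = strlen(TYPE_MACHINE_SUFFIX);

    assert(type != NULL && out != NULL);
    tlen = strlen(type);
    if (tlen <= slen || strcmp(type + tlen - slen, TYPE_MACHINE_SUFFIX) != 0) {
        return -1;
    }
    if (tlen - slen >= outlen) {
        return -1;
    }
    memcpy(out, type, tlen - slen);
    out[tlen - slen] = '\0';
    return 0;
}

/* Hypothetical check: would this type's derived name shadow the alias? */
static bool type_collides_with_alias(const char *type, const char *alias)
{
    char name[128];

    if (machine_name_from_type(type, name, sizeof(name)) != 0) {
        return false;
    }
    return strcmp(name, alias) == 0;
}
```

Under those assumptions, MACHINE_TYPE_NAME("microvm") yields a name stem
of "microvm", the same string as the alias registered by the versioned
type, while a base type called "microvm-machine-base" does not.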


> +#define MICROVM_MACHINE(obj) \
> +    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
> +#define MICROVM_MACHINE_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
> +#define MICROVM_MACHINE_CLASS(class) \
> +    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
> +
> +#endif
> -- 
> 2.21.0
> 

-- 
Eduardo



* Re: [Qemu-devel] [PATCH 1/4] hw/i386: Factorize CPU routine
  2019-06-28 11:53 ` [Qemu-devel] [PATCH 1/4] hw/i386: Factorize CPU routine Sergio Lopez
@ 2019-06-28 20:03   ` Eduardo Habkost
  2019-06-28 21:44     ` Sergio Lopez
  0 siblings, 1 reply; 29+ messages in thread
From: Eduardo Habkost @ 2019-06-28 20:03 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: pbonzini, rth, qemu-devel, mst

On Fri, Jun 28, 2019 at 01:53:46PM +0200, Sergio Lopez wrote:
[...]
>  /* Enables contiguous-apic-ID mode, for compatibility */
> -static bool compat_apic_id_mode;
> +bool compat_apic_id_mode;

We can get rid of this global variable, see the patch I have just
sent:

  [PATCH] pc: Move compat_apic_id_mode variable to PCMachineClass

-- 
Eduardo



* Re: [Qemu-devel] [PATCH 0/4] Introduce the microvm machine type
  2019-06-28 13:21 ` [Qemu-devel] [PATCH 0/4] " Paolo Bonzini
@ 2019-06-28 20:49   ` Sergio Lopez
  0 siblings, 0 replies; 29+ messages in thread
From: Sergio Lopez @ 2019-06-28 20:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, rth, ehabkost, mst



Paolo Bonzini <pbonzini@redhat.com> writes:

> On 28/06/19 13:53, Sergio Lopez wrote:
>> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>>  -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>>  -nodefaults -no-user-config \
>>  -drive id=test,file=test.img,format=raw,if=none \
>>  -device virtio-blk-device,drive=test \
>>  -netdev tap,id=tap0,script=no,downscript=no \
>>  -device virtio-net-device,netdev=tap0 \
>>  -serial stdio
>
> I think the "non-legacy" mode can be obtained from the "legacy" one just
> with -nodefaults (which all sane management should be using anyway), so
> legacy=on can actually be the default. :)

I'm a bit confused here. The "legacy" boolean property in the microvm
machine type is used to indicate that QEMU should instantiate an i8259
PIC and an ISA bus (mainly to have easy access to "isa-serial"), instead
of relying on KVM's LAPIC+IOAPIC exclusively.

> I think this is interesting.  I'd love to have it optionally provide a
> device tree as well.  It's not very common on x86 and most distro
> kernels don't support device tree, but it would provide a more
> out-of-the-box experience and it may even be a drop-in replacement for
> q35 or pc as far as Kata is concerned.

I guess generating a dtb and having the kernel parse it will eat some
valuable microseconds but, as you say, it may be a good idea to add it
in the future, as long as it's optional.

Sergio.



* Re: [Qemu-devel] [PATCH 2/4] hw/virtio: Factorize virtio-mmio headers
  2019-06-28 14:03   ` Michael S. Tsirkin
@ 2019-06-28 20:50     ` Sergio Lopez
  2019-06-30 21:36       ` Michael S. Tsirkin
  0 siblings, 1 reply; 29+ messages in thread
From: Sergio Lopez @ 2019-06-28 20:50 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel, pbonzini, ehabkost, rth



Michael S. Tsirkin <mst@redhat.com> writes:

> On Fri, Jun 28, 2019 at 01:53:47PM +0200, Sergio Lopez wrote:
>> Put QOM and main struct definition in a separate header file, so it
>> can be accessed from other components.
>> 
>> This is needed for the microvm machine type implementation.
>> 
>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>
> If you are going to productise virtio-mmio, then 1.0 support is a must.
> I am not sure we want a new machine with 0.X mmio devices.
> Especially considering that virtio-mmio does not have support for
> transitional devices.

What are the practical implications of that?

>> ---
>>  hw/virtio/virtio-mmio.c | 35 +-----------------------
>>  hw/virtio/virtio-mmio.h | 60 +++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 61 insertions(+), 34 deletions(-)
>>  create mode 100644 hw/virtio/virtio-mmio.h
>> 
>> diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
>> index 97b7f35496..87c7fe4d8d 100644
>> --- a/hw/virtio/virtio-mmio.c
>> +++ b/hw/virtio/virtio-mmio.c
>> @@ -26,44 +26,11 @@
>>  #include "qemu/host-utils.h"
>>  #include "qemu/module.h"
>>  #include "sysemu/kvm.h"
>> -#include "hw/virtio/virtio-bus.h"
>> +#include "virtio-mmio.h"
>>  #include "qemu/error-report.h"
>>  #include "qemu/log.h"
>>  #include "trace.h"
>>  
>> -/* QOM macros */
>> -/* virtio-mmio-bus */
>> -#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
>> -#define VIRTIO_MMIO_BUS(obj) \
>> -        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
>> -#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
>> -        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
>> -#define VIRTIO_MMIO_BUS_CLASS(klass) \
>> -        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
>> -
>> -/* virtio-mmio */
>> -#define TYPE_VIRTIO_MMIO "virtio-mmio"
>> -#define VIRTIO_MMIO(obj) \
>> -        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
>> -
>> -#define VIRT_MAGIC 0x74726976 /* 'virt' */
>> -#define VIRT_VERSION 1
>> -#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
>> -
>> -typedef struct {
>> -    /* Generic */
>> -    SysBusDevice parent_obj;
>> -    MemoryRegion iomem;
>> -    qemu_irq irq;
>> -    /* Guest accessible state needing migration and reset */
>> -    uint32_t host_features_sel;
>> -    uint32_t guest_features_sel;
>> -    uint32_t guest_page_shift;
>> -    /* virtio-bus */
>> -    VirtioBusState bus;
>> -    bool format_transport_address;
>> -} VirtIOMMIOProxy;
>> -
>>  static bool virtio_mmio_ioeventfd_enabled(DeviceState *d)
>>  {
>>      return kvm_eventfds_enabled();
>> diff --git a/hw/virtio/virtio-mmio.h b/hw/virtio/virtio-mmio.h
>> new file mode 100644
>> index 0000000000..2f3973f8c7
>> --- /dev/null
>> +++ b/hw/virtio/virtio-mmio.h
>> @@ -0,0 +1,60 @@
>> +/*
>> + * Virtio MMIO bindings
>> + *
>> + * Copyright (c) 2011 Linaro Limited
>> + *
>> + * Author:
>> + *  Peter Maydell <peter.maydell@linaro.org>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License; either version 2
>> + * of the License, or (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef QEMU_VIRTIO_MMIO_H
>> +#define QEMU_VIRTIO_MMIO_H
>> +
>> +#include "hw/virtio/virtio-bus.h"
>> +
>> +/* QOM macros */
>> +/* virtio-mmio-bus */
>> +#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
>> +#define VIRTIO_MMIO_BUS(obj) \
>> +        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
>> +#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
>> +        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
>> +#define VIRTIO_MMIO_BUS_CLASS(klass) \
>> +        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
>> +
>> +/* virtio-mmio */
>> +#define TYPE_VIRTIO_MMIO "virtio-mmio"
>> +#define VIRTIO_MMIO(obj) \
>> +        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
>> +
>> +#define VIRT_MAGIC 0x74726976 /* 'virt' */
>> +#define VIRT_VERSION 1
>> +#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
>> +
>> +typedef struct {
>> +    /* Generic */
>> +    SysBusDevice parent_obj;
>> +    MemoryRegion iomem;
>> +    qemu_irq irq;
>> +    /* Guest accessible state needing migration and reset */
>> +    uint32_t host_features_sel;
>> +    uint32_t guest_features_sel;
>> +    uint32_t guest_page_shift;
>> +    /* virtio-bus */
>> +    VirtioBusState bus;
>> +    bool format_transport_address;
>> +} VirtIOMMIOProxy;
>> +
>> +#endif
>> -- 
>> 2.21.0




* Re: [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type
  2019-06-28 14:06   ` Michael S. Tsirkin
@ 2019-06-28 20:56     ` Sergio Lopez
  2019-06-28 22:17     ` Paolo Bonzini
  1 sibling, 0 replies; 29+ messages in thread
From: Sergio Lopez @ 2019-06-28 20:56 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel, pbonzini, ehabkost, rth



Michael S. Tsirkin <mst@redhat.com> writes:

> On Fri, Jun 28, 2019 at 01:53:49PM +0200, Sergio Lopez wrote:
>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> constructed after the machine model implemented by the latter.
>> 
>> Its main purpose is providing users a KVM-only machine type with fast
>> boot times, minimal attack surface (measured as the number of IO ports
>> and MMIO regions exposed to the Guest) and small footprint (especially
>> when combined with the ongoing QEMU modularization effort).
>> 
>> Normally, other than the device support provided by KVM itself,
>> microvm only supports virtio-mmio devices. Microvm also includes a
>> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>> for being able to see the early boot kernel messages.
>> 
>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>> ---
>>  default-configs/i386-softmmu.mak |   1 +
>>  hw/i386/Kconfig                  |   4 +
>>  hw/i386/Makefile.objs            |   1 +
>>  hw/i386/microvm.c                | 518 +++++++++++++++++++++++++++++++
>>  include/hw/i386/microvm.h        |  85 +++++
>>  5 files changed, 609 insertions(+)
>>  create mode 100644 hw/i386/microvm.c
>>  create mode 100644 include/hw/i386/microvm.h
>> 
>> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
>> index cd5ea391e8..338f07420f 100644
>> --- a/default-configs/i386-softmmu.mak
>> +++ b/default-configs/i386-softmmu.mak
>> @@ -26,3 +26,4 @@ CONFIG_ISAPC=y
>>  CONFIG_I440FX=y
>>  CONFIG_Q35=y
>>  CONFIG_ACPI_PCI=y
>> +CONFIG_MICROVM=y
>> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
>> index 9817888216..94c565d8db 100644
>> --- a/hw/i386/Kconfig
>> +++ b/hw/i386/Kconfig
>> @@ -87,6 +87,10 @@ config Q35
>>      select VMMOUSE
>>      select FW_CFG_DMA
>>  
>> +config MICROVM
>> +    bool
>> +    select VIRTIO_MMIO
>> +
>>  config VTD
>>      bool
>>  
>> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
>> index 102f2b35fc..149bdd0784 100644
>> --- a/hw/i386/Makefile.objs
>> +++ b/hw/i386/Makefile.objs
>> @@ -4,6 +4,7 @@ obj-y += cpu.o
>>  obj-y += pc.o
>>  obj-$(CONFIG_I440FX) += pc_piix.o
>>  obj-$(CONFIG_Q35) += pc_q35.o
>> +obj-$(CONFIG_MICROVM) += mptable.o microvm.o
>>  obj-y += fw_cfg.o pc_sysfw.o
>>  obj-y += x86-iommu.o
>>  obj-$(CONFIG_VTD) += intel_iommu.o
>> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
>> new file mode 100644
>> index 0000000000..fff88c3697
>> --- /dev/null
>> +++ b/hw/i386/microvm.c
>> @@ -0,0 +1,518 @@
>> +/*
>> + *
>> + * Copyright (c) 2018 Intel Corporation
>> + * Copyright (c) 2019 Red Hat, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2 or later, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/error-report.h"
>> +#include "qapi/error.h"
>> +#include "qapi/visitor.h"
>> +#include "sysemu/sysemu.h"
>> +#include "sysemu/cpus.h"
>> +#include "sysemu/numa.h"
>> +
>> +#include "hw/loader.h"
>> +#include "hw/nmi.h"
>> +#include "hw/kvm/clock.h"
>> +#include "hw/i386/microvm.h"
>> +#include "hw/i386/pc.h"
>> +#include "hw/i386/cpu-internal.h"
>> +#include "target/i386/cpu.h"
>> +#include "hw/timer/i8254.h"
>> +#include "hw/char/serial.h"
>> +#include "hw/i386/topology.h"
>> +#include "hw/virtio/virtio-mmio.h"
>> +#include "hw/i386/mptable.h"
>> +
>> +#include "cpu.h"
>> +#include "elf.h"
>> +#include "kvm_i386.h"
>> +#include <asm/bootparam.h>
>> +
>> +#define DEFINE_MICROVM_MACHINE_LATEST(major, minor, latest) \
>> +    static void microvm_##major##_##minor##_object_class_init(ObjectClass *oc, \
>> +                                                              void *data) \
>> +    { \
>> +        MachineClass *mc = MACHINE_CLASS(oc); \
>> +        microvm_##major##_##minor##_machine_class_init(mc); \
>> +        mc->desc = "Microvm (i386)"; \
>> +        if (latest) { \
>> +            mc->alias = "microvm"; \
>> +        } \
>> +    } \
>> +    static const TypeInfo microvm_##major##_##minor##_info = { \
>> +        .name = MACHINE_TYPE_NAME("microvm-" # major "." # minor), \
>> +        .parent = TYPE_MICROVM_MACHINE, \
>> +        .instance_init = microvm_##major##_##minor##_instance_init, \
>> +        .class_init = microvm_##major##_##minor##_object_class_init, \
>> +    }; \
>> +    static void microvm_##major##_##minor##_init(void) \
>> +    { \
>> +        type_register_static(&microvm_##major##_##minor##_info); \
>> +    } \
>> +    type_init(microvm_##major##_##minor##_init);
>> +
>> +#define DEFINE_MICROVM_MACHINE_AS_LATEST(major, minor) \
>> +    DEFINE_MICROVM_MACHINE_LATEST(major, minor, true)
>> +#define DEFINE_MICROVM_MACHINE(major, minor) \
>> +    DEFINE_MICROVM_MACHINE_LATEST(major, minor, false)
>> +
>> +static void microvm_gsi_handler(void *opaque, int n, int level)
>> +{
>> +    qemu_irq *ioapic_irq = opaque;
>> +
>> +    qemu_set_irq(ioapic_irq[n], level);
>> +}
>> +
>> +static void microvm_legacy_init(MicrovmMachineState *mms)
>> +{
>> +    ISABus *isa_bus;
>> +    GSIState *gsi_state;
>> +    qemu_irq *i8259;
>> +    int i;
>> +
>> +    assert(kvm_irqchip_in_kernel());
>> +    gsi_state = g_malloc0(sizeof(*gsi_state));
>> +    mms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
>> +
>> +    isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
>> +                          &error_abort);
>> +    isa_bus_irqs(isa_bus, mms->gsi);
>> +
>> +    assert(kvm_pic_in_kernel());
>> +    i8259 = kvm_i8259_init(isa_bus);
>> +
>> +    for (i = 0; i < ISA_NUM_IRQS; i++) {
>> +        gsi_state->i8259_irq[i] = i8259[i];
>> +    }
>> +
>> +    kvm_pit_init(isa_bus, 0x40);
>> +
>> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
>> +        int nirq = VIRTIO_IRQ_BASE + i;
>> +        ISADevice *isadev = isa_create(isa_bus, TYPE_ISA_SERIAL);
>> +        qemu_irq mmio_irq;
>> +
>> +        isa_init_irq(isadev, &mmio_irq, nirq);
>> +        sysbus_create_simple("virtio-mmio",
>> +                             VIRTIO_MMIO_BASE + i * 512,
>> +                             mms->gsi[VIRTIO_IRQ_BASE + i]);
>> +    }
>> +
>> +    g_free(i8259);
>> +
>> +    serial_hds_isa_init(isa_bus, 0, 1);
>> +}
>> +
>> +static void microvm_ioapic_init(MicrovmMachineState *mms)
>> +{
>> +    qemu_irq *ioapic_irq;
>> +    DeviceState *ioapic_dev;
>> +    SysBusDevice *d;
>> +    int i;
>> +
>> +    assert(kvm_irqchip_in_kernel());
>
> Hmm - irqchip in kernel actually increases the attack surface,
> does it not? Or at least, the severity of the attacks.

I'd say the attack surface exposed to the Guest is roughly the same. As
for the severity of a hypothetical vulnerability, it depends a lot on
the context and the nature of the vulnerability itself, especially given
that QEMU is run as root and barely containerized in many scenarios.

Balancing the risks and costs, I'm more inclined to keep the irqchip in
kernel.

>> +    ioapic_irq = g_new0(qemu_irq, IOAPIC_NUM_PINS);
>> +    kvm_pc_setup_irq_routing(true);
>> +
>> +    assert(kvm_ioapic_in_kernel());
>> +    ioapic_dev = qdev_create(NULL, "kvm-ioapic");
>> +
>> +    object_property_add_child(qdev_get_machine(), "ioapic", OBJECT(ioapic_dev), NULL);
>> +
>> +    qdev_init_nofail(ioapic_dev);
>> +    d = SYS_BUS_DEVICE(ioapic_dev);
>> +    sysbus_mmio_map(d, 0, IO_APIC_DEFAULT_ADDRESS);
>> +
>> +    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>> +        ioapic_irq[i] = qdev_get_gpio_in(ioapic_dev, i);
>> +    }
>> +
>> +    mms->gsi = qemu_allocate_irqs(microvm_gsi_handler, ioapic_irq, IOAPIC_NUM_PINS);
>> +
>> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
>> +        sysbus_create_simple("virtio-mmio",
>> +                             VIRTIO_MMIO_BASE + i * 512,
>> +                             mms->gsi[VIRTIO_IRQ_BASE + i]);
>> +    }
>> +}
>> +
>> +static void microvm_memory_init(MicrovmMachineState *mms)
>> +{
>> +    MachineState *machine = MACHINE(mms);
>> +    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
>> +    MemoryRegion *system_memory = get_system_memory();
>> +
>> +    if (machine->ram_size > MICROVM_MAX_BELOW_4G) {
>> +        mms->above_4g_mem_size = machine->ram_size - MICROVM_MAX_BELOW_4G;
>> +        mms->below_4g_mem_size = MICROVM_MAX_BELOW_4G;
>> +    } else {
>> +        mms->above_4g_mem_size = 0;
>> +        mms->below_4g_mem_size = machine->ram_size;
>> +    }
>> +
>> +    ram = g_malloc(sizeof(*ram));
>> +    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
>> +                                         machine->ram_size);
>> +
>> +    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
>> +    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
>> +                             0, mms->below_4g_mem_size);
>> +    memory_region_add_subregion(system_memory, 0, ram_below_4g);
>> +
>> +    e820_add_entry(0, mms->below_4g_mem_size, E820_RAM);
>> +
>> +    if (mms->above_4g_mem_size > 0) {
>> +        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
>> +        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
>> +                                 mms->below_4g_mem_size,
>> +                                 mms->above_4g_mem_size);
>> +        memory_region_add_subregion(system_memory, 0x100000000ULL,
>> +                                    ram_above_4g);
>> +        e820_add_entry(0x100000000ULL, mms->above_4g_mem_size, E820_RAM);
>> +    }
>> +}
>> +
>> +static void microvm_machine_state_init(MachineState *machine)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>> +    uint64_t elf_entry;
>> +    int kernel_size;
>> +
>> +    if (machine->kernel_filename == NULL) {
>> +        error_report("missing kernel image file name, required by microvm");
>> +        exit(1);
>> +    }
>> +
>> +    microvm_memory_init(mms);
>> +    if (mms->legacy) {
>> +        microvm_legacy_init(mms);
>> +    } else {
>> +        microvm_ioapic_init(mms);
>> +    }
>> +
>> +    mms->apic_id_limit = cpus_init(machine, false);
>> +
>> +    kvmclock_create();
>> +
>> +    kernel_size = load_elf(machine->kernel_filename, NULL,
>> +                           NULL, NULL, &elf_entry,
>> +                           NULL, NULL, 0, I386_ELF_MACHINE,
>> +                           0, 0);
>> +
>> +    if (kernel_size < 0) {
>> +        error_report("Error while loading elf kernel");
>> +        exit(1);
>> +    }
>> +
>> +    mms->elf_entry = elf_entry;
>> +}
>> +
>> +static gchar *microvm_get_virtio_mmio_cmdline(gchar *name)
>> +{
>> +    gchar *cmdline;
>> +    gchar *separator;
>> +    unsigned long index;
>> +    int ret;
>> +
>> +    separator = g_strrstr(name, ".");
>> +    if (!separator) {
>> +        return NULL;
>> +    }
>> +
>> +    index = strtol(separator + 1, NULL, 10);
>> +    if (index == LONG_MIN || index == LONG_MAX) {
>> +        return NULL;
>> +    }
>> +
>> +    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
>> +    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
>> +                     " virtio_mmio.device=512@0x%lx:%ld",
>> +                     VIRTIO_MMIO_BASE + index * 512,
>> +                     VIRTIO_IRQ_BASE + index);
>> +    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
>> +        g_free(cmdline);
>> +        return NULL;
>> +    }
>> +
>> +    return cmdline;
>> +}
>> +
>> +static void microvm_setup_bootparams(MicrovmMachineState *mms, const gchar *kernel_cmdline)
>> +{
>> +    struct boot_params params;
>> +    BusState *bus;
>> +    BusChild *kid;
>> +    gchar *cmdline;
>> +    int cmdline_len;
>> +    int i;
>> +
>> +    cmdline = g_strdup(kernel_cmdline);
>> +
>> +    /*
>> +     * Find MMIO transports with attached devices, and add them to the kernel
>> +     * command line.
>> +     */
>> +    bus = sysbus_get_default();
>> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
>> +        DeviceState *dev = kid->child;
>> +        ObjectClass *class = object_get_class(OBJECT(dev));
>> +
>> +        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
>> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
>> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
>> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
>> +
>> +            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
>> +                gchar *mmio_cmdline = microvm_get_virtio_mmio_cmdline(mmio_bus->name);
>> +                if (mmio_cmdline) {
>> +                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
>> +                    g_free(mmio_cmdline);
>> +                    g_free(cmdline);
>> +                    cmdline = newcmd;
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    cmdline_len = strlen(cmdline);
>> +
>> +    address_space_write(&address_space_memory,
>> +                        KERNEL_CMDLINE_START, MEMTXATTRS_UNSPECIFIED,
>> +                        (uint8_t *) cmdline, cmdline_len);
>> +
>> +    g_free(cmdline);
>> +
>> +    memset(&params, 0, sizeof(struct boot_params));
>> +
>> +    params.hdr.type_of_loader = KERNEL_LOADER_OTHER;
>> +    params.hdr.boot_flag = KERNEL_BOOT_FLAG_MAGIC;
>> +    params.hdr.header = KERNEL_HDR_MAGIC;
>> +    params.hdr.cmd_line_ptr = KERNEL_CMDLINE_START;
>> +    params.hdr.cmdline_size = cmdline_len;
>> +    params.hdr.kernel_alignment = KERNEL_MIN_ALIGNMENT_BYTES;
>> +
>> +    params.e820_entries = e820_get_num_entries();
>> +    for (i = 0; i < params.e820_entries; i++) {
>> +        uint64_t address, length;
>> +        if (e820_get_entry(i, E820_RAM, &address, &length)) {
>> +            params.e820_table[i].addr = address;
>> +            params.e820_table[i].size = length;
>> +            params.e820_table[i].type = E820_RAM;
>> +        }
>> +    }
>> +
>> +    address_space_write(&address_space_memory,
>> +                        ZERO_PAGE_START, MEMTXATTRS_UNSPECIFIED,
>> +                        (uint8_t *) &params, sizeof(struct boot_params));
>> +}
>> +
>> +static void microvm_init_page_tables(void)
>> +{
>> +    uint64_t val = 0;
>> +    int i;
>> +
>> +    val = PDPTE_START | 0x03;
>> +    address_space_write(&address_space_memory,
>> +                        PML4_START, MEMTXATTRS_UNSPECIFIED,
>> +                        (uint8_t *) &val, 8);
>> +    val = PDE_START | 0x03;
>> +    address_space_write(&address_space_memory,
>> +                        PDPTE_START, MEMTXATTRS_UNSPECIFIED,
>> +                        (uint8_t *) &val, 8);
>> +
>> +    for (i = 0; i < 512; i++) {
>> +        val = (i << 21) + 0x83;
>> +        address_space_write(&address_space_memory,
>> +                            PDE_START + (i * 8), MEMTXATTRS_UNSPECIFIED,
>> +                            (uint8_t *) &val, 8);
>> +    }
>> +}
>> +
>> +static void microvm_cpu_reset(CPUState *cs, uint64_t elf_entry)
>> +{
>> +    X86CPU *cpu = X86_CPU(cs);
>> +    CPUX86State *env = &cpu->env;
>> +    struct SegmentCache seg_code =
>> +        { .selector = 0x8, .base = 0x0, .limit = 0xfffff, .flags = 0xa09b00 };
>> +    struct SegmentCache seg_data =
>> +        { .selector = 0x10, .base = 0x0, .limit = 0xfffff, .flags = 0xc09300 };
>> +    struct SegmentCache seg_tr =
>> +        { .selector = 0x18, .base = 0x0, .limit = 0xfffff, .flags = 0x808b00 };
>> +
>> +    kvm_arch_get_registers(cs);
>> +
>> +    memcpy(&env->segs[R_CS], &seg_code, sizeof(struct SegmentCache));
>> +    memcpy(&env->segs[R_DS], &seg_data, sizeof(struct SegmentCache));
>> +    memcpy(&env->segs[R_ES], &seg_data, sizeof(struct SegmentCache));
>> +    memcpy(&env->segs[R_FS], &seg_data, sizeof(struct SegmentCache));
>> +    memcpy(&env->segs[R_GS], &seg_data, sizeof(struct SegmentCache));
>> +    memcpy(&env->segs[R_SS], &seg_data, sizeof(struct SegmentCache));
>> +    memcpy(&env->tr, &seg_tr, sizeof(struct SegmentCache));
>> +
>> +    env->efer |= MSR_EFER_LME | MSR_EFER_LMA;
>> +    env->regs[R_ESP] = BOOT_STACK_POINTER;
>> +    env->regs[R_EBP] = BOOT_STACK_POINTER;
>> +    env->regs[R_ESI] = ZERO_PAGE_START;
>> +
>> +    cpu_set_pc(cs, elf_entry);
>> +    cpu_x86_update_cr3(env, PML4_START);
>> +    cpu_x86_update_cr4(env, env->cr[4] | CR4_PAE_MASK);
>> +    cpu_x86_update_cr0(env, env->cr[0] | CR0_PE_MASK | CR0_PG_MASK);
>> +    x86_update_hflags(env);
>> +
>> +    kvm_arch_put_registers(cs, KVM_PUT_RESET_STATE);
>> +}
>> +
>> +static void microvm_mptable_setup(MicrovmMachineState *mms)
>> +{
>> +    char *mptable;
>> +    int size;
>> +
>> +    mptable = mptable_generate(smp_cpus, mms->apic_id_limit,
>> +                               EBDA_START, &size);
>> +    address_space_write(&address_space_memory,
>> +                        EBDA_START, MEMTXATTRS_UNSPECIFIED,
>> +                        (uint8_t *) mptable, size);
>> +    g_free(mptable);
>> +}
>> +
>> +static bool microvm_machine_get_legacy(Object *obj, Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    return mms->legacy;
>> +}
>> +
>> +static void microvm_machine_set_legacy(Object *obj, bool value, Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    mms->legacy = value;
>> +}
>> +
>> +static void microvm_machine_reset(void)
>> +{
>> +    MachineState *machine = MACHINE(qdev_get_machine());
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>> +    CPUState *cs;
>> +    X86CPU *cpu;
>> +
>> +    qemu_devices_reset();
>> +
>> +    microvm_mptable_setup(mms);
>> +    microvm_setup_bootparams(mms, machine->kernel_cmdline);
>> +    microvm_init_page_tables();
>> +
>> +    CPU_FOREACH(cs) {
>> +        cpu = X86_CPU(cs);
>> +
>> +        /* Reset APIC after devices have been reset to cancel
>> +         * any changes that qemu_devices_reset() might have done.
>> +         */
>> +        if (cpu->apic_state) {
>> +            device_reset(cpu->apic_state);
>> +        }
>> +
>> +        microvm_cpu_reset(cs, mms->elf_entry);
>> +    }
>> +}
>> +
>> +static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
>> +{
>> +    CPUState *cs;
>> +
>> +    CPU_FOREACH(cs) {
>> +        X86CPU *cpu = X86_CPU(cs);
>> +
>> +        if (!cpu->apic_state) {
>> +            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
>> +        } else {
>> +            apic_deliver_nmi(cpu->apic_state);
>> +        }
>> +    }
>> +}
>> +
>> +static void microvm_machine_instance_init(Object *obj)
>> +{
>> +}
>> +
>> +static void microvm_class_init(ObjectClass *oc, void *data)
>> +{
>> +    NMIClass *nc = NMI_CLASS(oc);
>> +
>> +    /* NMI handler */
>> +    nc->nmi_monitor_handler = x86_nmi;
>> +
>> +    object_class_property_add_bool(oc, MICROVM_MACHINE_LEGACY,
>> +                                   microvm_machine_get_legacy,
>> +                                   microvm_machine_set_legacy,
>> +                                   &error_abort);
>> +}
>> +
>> +static const TypeInfo microvm_machine_info = {
>> +    .name          = TYPE_MICROVM_MACHINE,
>> +    .parent        = TYPE_MACHINE,
>> +    .abstract      = true,
>> +    .instance_size = sizeof(MicrovmMachineState),
>> +    .instance_init = microvm_machine_instance_init,
>> +    .class_size    = sizeof(MicrovmMachineClass),
>> +    .class_init    = microvm_class_init,
>> +    .interfaces = (InterfaceInfo[]) {
>> +         { TYPE_NMI },
>> +         { }
>> +    },
>> +};
>> +
>> +static void microvm_machine_init(void)
>> +{
>> +    type_register_static(&microvm_machine_info);
>> +}
>> +type_init(microvm_machine_init);
>> +
>> +static void microvm_1_0_instance_init(Object *obj)
>> +{
>> +}
>> +
>> +static void microvm_machine_class_init(MachineClass *mc)
>> +{
>> +    mc->init = microvm_machine_state_init;
>> +
>> +    mc->family = "microvm_i386";
>> +    mc->desc = "Microvm (i386)";
>> +    mc->units_per_default_bus = 1;
>> +    mc->no_floppy = 1;
>> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
>> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
>> +    mc->max_cpus = 288;
>> +    mc->has_hotpluggable_cpus = false;
>> +    mc->auto_enable_numa_with_memhp = false;
>> +    mc->default_cpu_type = X86_CPU_TYPE_NAME ("host");
>> +    mc->nvdimm_supported = false;
>> +    mc->default_machine_opts = "accel=kvm";
>> +
>> +    /* Machine class handlers */
>> +    mc->cpu_index_to_instance_props = cpu_index_to_props;
>> +    mc->get_default_cpu_node_id = cpu_get_default_cpu_node_id;
>> +    mc->possible_cpu_arch_ids = cpu_possible_cpu_arch_ids;;
>> +    mc->reset = microvm_machine_reset;
>> +}
>> +
>> +static void microvm_1_0_machine_class_init(MachineClass *mc)
>> +{
>> +    microvm_machine_class_init(mc);
>> +}
>> +DEFINE_MICROVM_MACHINE_AS_LATEST(1, 0)
>> diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
>> new file mode 100644
>> index 0000000000..544ef60563
>> --- /dev/null
>> +++ b/include/hw/i386/microvm.h
>> @@ -0,0 +1,85 @@
>> +/*
>> + *
>> + * Copyright (c) 2018 Intel Corporation
>> + * Copyright (c) 2019 Red Hat, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2 or later, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef HW_I386_MICROVM_H
>> +#define HW_I386_MICROVM_H
>> +
>> +#include "qemu-common.h"
>> +#include "exec/hwaddr.h"
>> +#include "qemu/notify.h"
>> +
>> +#include "hw/boards.h"
>> +
>> +/* Microvm memory layout */
>> +#define ZERO_PAGE_START      0x7000
>> +#define BOOT_STACK_POINTER   0x8ff0
>> +#define PML4_START           0x9000
>> +#define PDPTE_START          0xa000
>> +#define PDE_START            0xb000
>> +#define EBDA_START           0x9fc00
>> +#define HIMEM_START          0x100000
>> +#define MICROVM_MAX_BELOW_4G 0xe0000000
>> +
>> +/* Bootparams related definitions */
>> +#define KERNEL_BOOT_FLAG_MAGIC     0xaa55
>> +#define KERNEL_HDR_MAGIC           0x53726448
>> +#define KERNEL_LOADER_OTHER        0xff
>> +#define KERNEL_MIN_ALIGNMENT_BYTES 0x01000000
>> +#define KERNEL_CMDLINE_START       0x20000
>> +#define KERNEL_CMDLINE_MAX_SIZE    0x10000
>> +
>> +/* Platform virtio definitions */
>> +#define VIRTIO_MMIO_BASE      0xd0000000
>> +#define VIRTIO_IRQ_BASE       5
>> +#define VIRTIO_NUM_TRANSPORTS 8
>> +#define VIRTIO_CMDLINE_MAXLEN 64
>> +
>> +/* Machine type options */
>> +#define MICROVM_MACHINE_LEGACY "legacy"
>> +
>> +typedef struct {
>> +    MachineClass parent;
>> +    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
>> +                                           DeviceState *dev);
>> +} MicrovmMachineClass;
>> +
>> +typedef struct {
>> +    MachineState parent;
>> +    unsigned apic_id_limit;
>> +    qemu_irq *gsi;
>> +
>> +    /* RAM size */
>> +    ram_addr_t below_4g_mem_size;
>> +    ram_addr_t above_4g_mem_size;
>> +
>> +    /* Kernel ELF entry. On reset, vCPUs RIP will be set to this */
>> +    uint64_t elf_entry;
>> +
>> +    /* Legacy mode based on an ISA bus. Useful for debugging */
>> +    bool legacy;
>> +} MicrovmMachineState;
>> +
>> +#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
>> +#define MICROVM_MACHINE(obj) \
>> +    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
>> +#define MICROVM_MACHINE_GET_CLASS(obj) \
>> +    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
>> +#define MICROVM_MACHINE_CLASS(class) \
>> +    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
>> +
>> +#endif
>> -- 
>> 2.21.0



* Re: [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type
  2019-06-28 19:15   ` Maran Wilson
@ 2019-06-28 21:05     ` Sergio Lopez
  2019-06-28 21:54       ` Maran Wilson
  2019-06-28 21:56       ` Paolo Bonzini
  0 siblings, 2 replies; 29+ messages in thread
From: Sergio Lopez @ 2019-06-28 21:05 UTC (permalink / raw)
  To: Maran Wilson; +Cc: ehabkost, mst, qemu-devel, pbonzini, rth



Maran Wilson <maran.wilson@oracle.com> writes:

> This seems like a good overall direction to be headed with Qemu.
>
> But there is a lot of Linux OS specific startup details being baked
> into the Qemu machine type here. Things that are usually pushed into
> firmware or option ROM.
>
> Instead of hard coding all the Zero page stuff into the Qemu machine
> model, couldn't you just setup the PVH kernel entry point and leave
> all the OS specific details to the OS being started? That way, at
> least you are programming to a more generic ABI spec. See:
> https://gist.github.com/stefano-garzarella/7b7e17e75add20abd1c42fb496cc6504
>
> And I think you still wouldn't need any firmware if you just replace
> your zeropage initialization with PVH spec setup.

The main reason for relying on Linux's Zero Page is to be able to pass
the e820 table with the basic physical memory layout to the kernel
through it, as there is neither a BIOS nor ACPI. AFAIK, we can't do
that with PVH.
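
To make the e820 point concrete, here is a minimal standalone sketch of
the layout that ends up in the Zero Page. It only mirrors the split done
in microvm_memory_init() in the quoted patch; the struct and function
names (e820_sketch_entry, build_e820_sketch) and the constant names are
made up for illustration and are not QEMU or kernel API.

```c
#include <stdint.h>

/* Values taken from the quoted patch; the names here are illustrative. */
#define MAX_BELOW_4G  0xe0000000ULL   /* mirrors MICROVM_MAX_BELOW_4G */
#define HIGH_RAM_BASE 0x100000000ULL  /* RAM above the hole starts at 4G */
#define E820_TYPE_RAM 1               /* e820 type for usable RAM */

/* Hypothetical stand-in for one e820 entry. */
struct e820_sketch_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Split ram_size into one or two RAM entries: everything up to
 * MAX_BELOW_4G is placed at address 0, any remainder at 4G.
 * Returns the number of entries written to out[].
 */
static int build_e820_sketch(uint64_t ram_size,
                             struct e820_sketch_entry out[2])
{
    uint64_t below = ram_size > MAX_BELOW_4G ? MAX_BELOW_4G : ram_size;
    uint64_t above = ram_size - below;
    int n = 0;

    out[n++] = (struct e820_sketch_entry){ 0, below, E820_TYPE_RAM };
    if (above > 0) {
        out[n++] = (struct e820_sketch_entry){ HIGH_RAM_BASE, above,
                                               E820_TYPE_RAM };
    }
    return n;
}
```

With a guest of 512 MiB this yields a single entry; with 4 GiB it yields
two, the second one starting at 0x100000000.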

I'm inclined to keep it this way and, once there's interest in using
the microvm machine type with a different kernel, try to find some
common ground.

> Thanks,
> -Maran
>
> On 6/28/2019 4:53 AM, Sergio Lopez wrote:
>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> constructed after the machine model implemented by the latter.
>>
>> It's main purpose is providing users a KVM-only machine type with fast
>> boot times, minimal attack surface (measured as the number of IO ports
>> and MMIO regions exposed to the Guest) and small footprint (specially
>> when combined with the ongoing QEMU modularization effort).
>>
>> Normally, other than the device support provided by KVM itself,
>> microvm only supports virtio-mmio devices. Microvm also includes a
>> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>> for being able to see the early boot kernel messages.
>>
>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>> ---
>>   default-configs/i386-softmmu.mak |   1 +
>>   hw/i386/Kconfig                  |   4 +
>>   hw/i386/Makefile.objs            |   1 +
>>   hw/i386/microvm.c                | 518 +++++++++++++++++++++++++++++++
>>   include/hw/i386/microvm.h        |  85 +++++
>>   5 files changed, 609 insertions(+)
>>   create mode 100644 hw/i386/microvm.c
>>   create mode 100644 include/hw/i386/microvm.h
>>
>> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
>> index cd5ea391e8..338f07420f 100644
>> --- a/default-configs/i386-softmmu.mak
>> +++ b/default-configs/i386-softmmu.mak
>> @@ -26,3 +26,4 @@ CONFIG_ISAPC=y
>>   CONFIG_I440FX=y
>>   CONFIG_Q35=y
>>   CONFIG_ACPI_PCI=y
>> +CONFIG_MICROVM=y
>> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
>> index 9817888216..94c565d8db 100644
>> --- a/hw/i386/Kconfig
>> +++ b/hw/i386/Kconfig
>> @@ -87,6 +87,10 @@ config Q35
>>       select VMMOUSE
>>       select FW_CFG_DMA
>>   +config MICROVM
>> +    bool
>> +    select VIRTIO_MMIO
>> +
>>   config VTD
>>       bool
>>   diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
>> index 102f2b35fc..149bdd0784 100644
>> --- a/hw/i386/Makefile.objs
>> +++ b/hw/i386/Makefile.objs
>> @@ -4,6 +4,7 @@ obj-y += cpu.o
>>   obj-y += pc.o
>>   obj-$(CONFIG_I440FX) += pc_piix.o
>>   obj-$(CONFIG_Q35) += pc_q35.o
>> +obj-$(CONFIG_MICROVM) += mptable.o microvm.o
>>   obj-y += fw_cfg.o pc_sysfw.o
>>   obj-y += x86-iommu.o
>>   obj-$(CONFIG_VTD) += intel_iommu.o
>> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
>> new file mode 100644
>> index 0000000000..fff88c3697
>> --- /dev/null
>> +++ b/hw/i386/microvm.c
>> @@ -0,0 +1,518 @@
>> +/*
>> + *
>> + * Copyright (c) 2018 Intel Corporation
>> + * Copyright (c) 2019 Red Hat, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2 or later, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/error-report.h"
>> +#include "qapi/error.h"
>> +#include "qapi/visitor.h"
>> +#include "sysemu/sysemu.h"
>> +#include "sysemu/cpus.h"
>> +#include "sysemu/numa.h"
>> +
>> +#include "hw/loader.h"
>> +#include "hw/nmi.h"
>> +#include "hw/kvm/clock.h"
>> +#include "hw/i386/microvm.h"
>> +#include "hw/i386/pc.h"
>> +#include "hw/i386/cpu-internal.h"
>> +#include "target/i386/cpu.h"
>> +#include "hw/timer/i8254.h"
>> +#include "hw/char/serial.h"
>> +#include "hw/i386/topology.h"
>> +#include "hw/virtio/virtio-mmio.h"
>> +#include "hw/i386/mptable.h"
>> +
>> +#include "cpu.h"
>> +#include "elf.h"
>> +#include "kvm_i386.h"
>> +#include <asm/bootparam.h>
>> +
>> +#define DEFINE_MICROVM_MACHINE_LATEST(major, minor, latest) \
>> +    static void microvm_##major##_##minor##_object_class_init(ObjectClass *oc, \
>> +                                                              void *data) \
>> +    { \
>> +        MachineClass *mc = MACHINE_CLASS(oc); \
>> +        microvm_##major##_##minor##_machine_class_init(mc); \
>> +        mc->desc = "Microvm (i386)"; \
>> +        if (latest) { \
>> +            mc->alias = "microvm"; \
>> +        } \
>> +    } \
>> +    static const TypeInfo microvm_##major##_##minor##_info = { \
>> +        .name = MACHINE_TYPE_NAME("microvm-" # major "." # minor), \
>> +        .parent = TYPE_MICROVM_MACHINE, \
>> +        .instance_init = microvm_##major##_##minor##_instance_init, \
>> +        .class_init = microvm_##major##_##minor##_object_class_init, \
>> +    }; \
>> +    static void microvm_##major##_##minor##_init(void) \
>> +    { \
>> +        type_register_static(&microvm_##major##_##minor##_info); \
>> +    } \
>> +    type_init(microvm_##major##_##minor##_init);
>> +
>> +#define DEFINE_MICROVM_MACHINE_AS_LATEST(major, minor) \
>> +    DEFINE_MICROVM_MACHINE_LATEST(major, minor, true)
>> +#define DEFINE_MICROVM_MACHINE(major, minor) \
>> +    DEFINE_MICROVM_MACHINE_LATEST(major, minor, false)
>> +
>> +static void microvm_gsi_handler(void *opaque, int n, int level)
>> +{
>> +    qemu_irq *ioapic_irq = opaque;
>> +
>> +    qemu_set_irq(ioapic_irq[n], level);
>> +}
>> +
>> +static void microvm_legacy_init(MicrovmMachineState *mms)
>> +{
>> +    ISABus *isa_bus;
>> +    GSIState *gsi_state;
>> +    qemu_irq *i8259;
>> +    int i;
>> +
>> +    assert(kvm_irqchip_in_kernel());
>> +    gsi_state = g_malloc0(sizeof(*gsi_state));
>> +    mms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
>> +
>> +    isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
>> +                          &error_abort);
>> +    isa_bus_irqs(isa_bus, mms->gsi);
>> +
>> +    assert(kvm_pic_in_kernel());
>> +    i8259 = kvm_i8259_init(isa_bus);
>> +
>> +    for (i = 0; i < ISA_NUM_IRQS; i++) {
>> +        gsi_state->i8259_irq[i] = i8259[i];
>> +    }
>> +
>> +    kvm_pit_init(isa_bus, 0x40);
>> +
>> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
>> +        int nirq = VIRTIO_IRQ_BASE + i;
>> +        ISADevice *isadev = isa_create(isa_bus, TYPE_ISA_SERIAL);
>> +        qemu_irq mmio_irq;
>> +
>> +        isa_init_irq(isadev, &mmio_irq, nirq);
>> +        sysbus_create_simple("virtio-mmio",
>> +                             VIRTIO_MMIO_BASE + i * 512,
>> +                             mms->gsi[VIRTIO_IRQ_BASE + i]);
>> +    }
>> +
>> +    g_free(i8259);
>> +
>> +    serial_hds_isa_init(isa_bus, 0, 1);
>> +}
>> +
>> +static void microvm_ioapic_init(MicrovmMachineState *mms)
>> +{
>> +    qemu_irq *ioapic_irq;
>> +    DeviceState *ioapic_dev;
>> +    SysBusDevice *d;
>> +    int i;
>> +
>> +    assert(kvm_irqchip_in_kernel());
>> +    ioapic_irq = g_new0(qemu_irq, IOAPIC_NUM_PINS);
>> +    kvm_pc_setup_irq_routing(true);
>> +
>> +    assert(kvm_ioapic_in_kernel());
>> +    ioapic_dev = qdev_create(NULL, "kvm-ioapic");
>> +
>> +    object_property_add_child(qdev_get_machine(), "ioapic", OBJECT(ioapic_dev), NULL);
>> +
>> +    qdev_init_nofail(ioapic_dev);
>> +    d = SYS_BUS_DEVICE(ioapic_dev);
>> +    sysbus_mmio_map(d, 0, IO_APIC_DEFAULT_ADDRESS);
>> +
>> +    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>> +        ioapic_irq[i] = qdev_get_gpio_in(ioapic_dev, i);
>> +    }
>> +
>> +    mms->gsi = qemu_allocate_irqs(microvm_gsi_handler, ioapic_irq, IOAPIC_NUM_PINS);
>> +
>> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
>> +        sysbus_create_simple("virtio-mmio",
>> +                             VIRTIO_MMIO_BASE + i * 512,
>> +                             mms->gsi[VIRTIO_IRQ_BASE + i]);
>> +    }
>> +}
>> +
>> +static void microvm_memory_init(MicrovmMachineState *mms)
>> +{
>> +    MachineState *machine = MACHINE(mms);
>> +    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
>> +    MemoryRegion *system_memory = get_system_memory();
>> +
>> +    if (machine->ram_size > MICROVM_MAX_BELOW_4G) {
>> +        mms->above_4g_mem_size = machine->ram_size - MICROVM_MAX_BELOW_4G;
>> +        mms->below_4g_mem_size = MICROVM_MAX_BELOW_4G;
>> +    } else {
>> +        mms->above_4g_mem_size = 0;
>> +        mms->below_4g_mem_size = machine->ram_size;
>> +    }
>> +
>> +    ram = g_malloc(sizeof(*ram));
>> +    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
>> +                                         machine->ram_size);
>> +
>> +    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
>> +    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
>> +                             0, mms->below_4g_mem_size);
>> +    memory_region_add_subregion(system_memory, 0, ram_below_4g);
>> +
>> +    e820_add_entry(0, mms->below_4g_mem_size, E820_RAM);
>> +
>> +    if (mms->above_4g_mem_size > 0) {
>> +        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
>> +        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
>> +                                 mms->below_4g_mem_size,
>> +                                 mms->above_4g_mem_size);
>> +        memory_region_add_subregion(system_memory, 0x100000000ULL,
>> +                                    ram_above_4g);
>> +        e820_add_entry(0x100000000ULL, mms->above_4g_mem_size, E820_RAM);
>> +    }
>> +}
>> +
>> +static void microvm_machine_state_init(MachineState *machine)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>> +    uint64_t elf_entry;
>> +    int kernel_size;
>> +
>> +    if (machine->kernel_filename == NULL) {
>> +        error_report("missing kernel image file name, required by microvm");
>> +        exit(1);
>> +    }
>> +
>> +    microvm_memory_init(mms);
>> +    if (mms->legacy) {
>> +        microvm_legacy_init(mms);
>> +    } else {
>> +        microvm_ioapic_init(mms);
>> +    }
>> +
>> +    mms->apic_id_limit = cpus_init(machine, false);
>> +
>> +    kvmclock_create();
>> +
>> +    kernel_size = load_elf(machine->kernel_filename, NULL,
>> +                           NULL, NULL, &elf_entry,
>> +                           NULL, NULL, 0, I386_ELF_MACHINE,
>> +                           0, 0);
>> +
>> +    if (kernel_size < 0) {
>> +        error_report("Error while loading elf kernel");
>> +        exit(1);
>> +    }
>> +
>> +    mms->elf_entry = elf_entry;
>> +}
>> +
>> +static gchar *microvm_get_virtio_mmio_cmdline(gchar *name)
>> +{
>> +    gchar *cmdline;
>> +    gchar *separator;
>> +    unsigned long index;
>> +    int ret;
>> +
>> +    separator = g_strrstr(name, ".");
>> +    if (!separator) {
>> +        return NULL;
>> +    }
>> +
>> +    index = strtol(separator + 1, NULL, 10);
>> +    if (index == LONG_MIN || index == LONG_MAX) {
>> +        return NULL;
>> +    }
>> +
>> +    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
>> +    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
>> +                     " virtio_mmio.device=512@0x%lx:%ld",
>> +                     VIRTIO_MMIO_BASE + index * 512,
>> +                     VIRTIO_IRQ_BASE + index);
>> +    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
>> +        g_free(cmdline);
>> +        return NULL;
>> +    }
>> +
>> +    return cmdline;
>> +}
>> +
>> +static void microvm_setup_bootparams(MicrovmMachineState *mms, const gchar *kernel_cmdline)
>> +{
>> +    struct boot_params params;
>> +    BusState *bus;
>> +    BusChild *kid;
>> +    gchar *cmdline;
>> +    int cmdline_len;
>> +    int i;
>> +
>> +    cmdline = g_strdup(kernel_cmdline);
>> +
>> +    /*
>> +     * Find MMIO transports with attached devices, and add them to the kernel
>> +     * command line.
>> +     */
>> +    bus = sysbus_get_default();
>> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
>> +        DeviceState *dev = kid->child;
>> +        ObjectClass *class = object_get_class(OBJECT(dev));
>> +
>> +        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
>> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
>> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
>> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
>> +
>> +            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
>> +                gchar *mmio_cmdline = microvm_get_virtio_mmio_cmdline(mmio_bus->name);
>> +                if (mmio_cmdline) {
>> +                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
>> +                    g_free(mmio_cmdline);
>> +                    g_free(cmdline);
>> +                    cmdline = newcmd;
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    cmdline_len = strlen(cmdline);
>> +
>> +    address_space_write(&address_space_memory,
>> +                        KERNEL_CMDLINE_START, MEMTXATTRS_UNSPECIFIED,
>> +                        (uint8_t *) cmdline, cmdline_len);
>> +
>> +    g_free(cmdline);
>> +
>> +    memset(&params, 0, sizeof(struct boot_params));
>> +
>> +    params.hdr.type_of_loader = KERNEL_LOADER_OTHER;
>> +    params.hdr.boot_flag = KERNEL_BOOT_FLAG_MAGIC;
>> +    params.hdr.header = KERNEL_HDR_MAGIC;
>> +    params.hdr.cmd_line_ptr = KERNEL_CMDLINE_START;
>> +    params.hdr.cmdline_size = cmdline_len;
>> +    params.hdr.kernel_alignment = KERNEL_MIN_ALIGNMENT_BYTES;
>> +
>> +    params.e820_entries = e820_get_num_entries();
>> +    for (i = 0; i < params.e820_entries; i++) {
>> +        uint64_t address, length;
>> +        if (e820_get_entry(i, E820_RAM, &address, &length)) {
>> +            params.e820_table[i].addr = address;
>> +            params.e820_table[i].size = length;
>> +            params.e820_table[i].type = E820_RAM;
>> +        }
>> +    }
>> +
>> +    address_space_write(&address_space_memory,
>> +                        ZERO_PAGE_START, MEMTXATTRS_UNSPECIFIED,
>> +                        (uint8_t *) &params, sizeof(struct boot_params));
>> +}
>> +
>> +static void microvm_init_page_tables(void)
>> +{
>> +    uint64_t val = 0;
>> +    int i;
>> +
>> +    val = PDPTE_START | 0x03;
>> +    address_space_write(&address_space_memory,
>> +                        PML4_START, MEMTXATTRS_UNSPECIFIED,
>> +                        (uint8_t *) &val, 8);
>> +    val = PDE_START | 0x03;
>> +    address_space_write(&address_space_memory,
>> +                        PDPTE_START, MEMTXATTRS_UNSPECIFIED,
>> +                        (uint8_t *) &val, 8);
>> +
>> +    for (i = 0; i < 512; i++) {
>> +        val = (i << 21) + 0x83;
>> +        address_space_write(&address_space_memory,
>> +                            PDE_START + (i * 8), MEMTXATTRS_UNSPECIFIED,
>> +                            (uint8_t *) &val, 8);
>> +    }
>> +}
>> +
>> +static void microvm_cpu_reset(CPUState *cs, uint64_t elf_entry)
>> +{
>> +    X86CPU *cpu = X86_CPU(cs);
>> +    CPUX86State *env = &cpu->env;
>> +    struct SegmentCache seg_code =
>> +        { .selector = 0x8, .base = 0x0, .limit = 0xfffff, .flags = 0xa09b00 };
>> +    struct SegmentCache seg_data =
>> +        { .selector = 0x10, .base = 0x0, .limit = 0xfffff, .flags = 0xc09300 };
>> +    struct SegmentCache seg_tr =
>> +        { .selector = 0x18, .base = 0x0, .limit = 0xfffff, .flags = 0x808b00 };
>> +
>> +    kvm_arch_get_registers(cs);
>> +
>> +    memcpy(&env->segs[R_CS], &seg_code, sizeof(struct SegmentCache));
>> +    memcpy(&env->segs[R_DS], &seg_data, sizeof(struct SegmentCache));
>> +    memcpy(&env->segs[R_ES], &seg_data, sizeof(struct SegmentCache));
>> +    memcpy(&env->segs[R_FS], &seg_data, sizeof(struct SegmentCache));
>> +    memcpy(&env->segs[R_GS], &seg_data, sizeof(struct SegmentCache));
>> +    memcpy(&env->segs[R_SS], &seg_data, sizeof(struct SegmentCache));
>> +    memcpy(&env->tr, &seg_tr, sizeof(struct SegmentCache));
>> +
>> +    env->efer |= MSR_EFER_LME | MSR_EFER_LMA;
>> +    env->regs[R_ESP] = BOOT_STACK_POINTER;
>> +    env->regs[R_EBP] = BOOT_STACK_POINTER;
>> +    env->regs[R_ESI] = ZERO_PAGE_START;
>> +
>> +    cpu_set_pc(cs, elf_entry);
>> +    cpu_x86_update_cr3(env, PML4_START);
>> +    cpu_x86_update_cr4(env, env->cr[4] | CR4_PAE_MASK);
>> +    cpu_x86_update_cr0(env, env->cr[0] | CR0_PE_MASK | CR0_PG_MASK);
>> +    x86_update_hflags(env);
>> +
>> +    kvm_arch_put_registers(cs, KVM_PUT_RESET_STATE);
>> +}
>> +
>> +static void microvm_mptable_setup(MicrovmMachineState *mms)
>> +{
>> +    char *mptable;
>> +    int size;
>> +
>> +    mptable = mptable_generate(smp_cpus, mms->apic_id_limit,
>> +                               EBDA_START, &size);
>> +    address_space_write(&address_space_memory,
>> +                        EBDA_START, MEMTXATTRS_UNSPECIFIED,
>> +                        (uint8_t *) mptable, size);
>> +    g_free(mptable);
>> +}
>> +
>> +static bool microvm_machine_get_legacy(Object *obj, Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    return mms->legacy;
>> +}
>> +
>> +static void microvm_machine_set_legacy(Object *obj, bool value, Error **errp)
>> +{
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>> +
>> +    mms->legacy = value;
>> +}
>> +
>> +static void microvm_machine_reset(void)
>> +{
>> +    MachineState *machine = MACHINE(qdev_get_machine());
>> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>> +    CPUState *cs;
>> +    X86CPU *cpu;
>> +
>> +    qemu_devices_reset();
>> +
>> +    microvm_mptable_setup(mms);
>> +    microvm_setup_bootparams(mms, machine->kernel_cmdline);
>> +    microvm_init_page_tables();
>> +
>> +    CPU_FOREACH(cs) {
>> +        cpu = X86_CPU(cs);
>> +
>> +        /* Reset APIC after devices have been reset to cancel
>> +         * any changes that qemu_devices_reset() might have done.
>> +         */
>> +        if (cpu->apic_state) {
>> +            device_reset(cpu->apic_state);
>> +        }
>> +
>> +        microvm_cpu_reset(cs, mms->elf_entry);
>> +    }
>> +}
>> +
>> +static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
>> +{
>> +    CPUState *cs;
>> +
>> +    CPU_FOREACH(cs) {
>> +        X86CPU *cpu = X86_CPU(cs);
>> +
>> +        if (!cpu->apic_state) {
>> +            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
>> +        } else {
>> +            apic_deliver_nmi(cpu->apic_state);
>> +        }
>> +    }
>> +}
>> +
>> +static void microvm_machine_instance_init(Object *obj)
>> +{
>> +}
>> +
>> +static void microvm_class_init(ObjectClass *oc, void *data)
>> +{
>> +    NMIClass *nc = NMI_CLASS(oc);
>> +
>> +    /* NMI handler */
>> +    nc->nmi_monitor_handler = x86_nmi;
>> +
>> +    object_class_property_add_bool(oc, MICROVM_MACHINE_LEGACY,
>> +                                   microvm_machine_get_legacy,
>> +                                   microvm_machine_set_legacy,
>> +                                   &error_abort);
>> +}
>> +
>> +static const TypeInfo microvm_machine_info = {
>> +    .name          = TYPE_MICROVM_MACHINE,
>> +    .parent        = TYPE_MACHINE,
>> +    .abstract      = true,
>> +    .instance_size = sizeof(MicrovmMachineState),
>> +    .instance_init = microvm_machine_instance_init,
>> +    .class_size    = sizeof(MicrovmMachineClass),
>> +    .class_init    = microvm_class_init,
>> +    .interfaces = (InterfaceInfo[]) {
>> +         { TYPE_NMI },
>> +         { }
>> +    },
>> +};
>> +
>> +static void microvm_machine_init(void)
>> +{
>> +    type_register_static(&microvm_machine_info);
>> +}
>> +type_init(microvm_machine_init);
>> +
>> +static void microvm_1_0_instance_init(Object *obj)
>> +{
>> +}
>> +
>> +static void microvm_machine_class_init(MachineClass *mc)
>> +{
>> +    mc->init = microvm_machine_state_init;
>> +
>> +    mc->family = "microvm_i386";
>> +    mc->desc = "Microvm (i386)";
>> +    mc->units_per_default_bus = 1;
>> +    mc->no_floppy = 1;
>> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
>> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
>> +    mc->max_cpus = 288;
>> +    mc->has_hotpluggable_cpus = false;
>> +    mc->auto_enable_numa_with_memhp = false;
>> +    mc->default_cpu_type = X86_CPU_TYPE_NAME ("host");
>> +    mc->nvdimm_supported = false;
>> +    mc->default_machine_opts = "accel=kvm";
>> +
>> +    /* Machine class handlers */
>> +    mc->cpu_index_to_instance_props = cpu_index_to_props;
>> +    mc->get_default_cpu_node_id = cpu_get_default_cpu_node_id;
>> +    mc->possible_cpu_arch_ids = cpu_possible_cpu_arch_ids;;
>> +    mc->reset = microvm_machine_reset;
>> +}
>> +
>> +static void microvm_1_0_machine_class_init(MachineClass *mc)
>> +{
>> +    microvm_machine_class_init(mc);
>> +}
>> +DEFINE_MICROVM_MACHINE_AS_LATEST(1, 0)
>> diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
>> new file mode 100644
>> index 0000000000..544ef60563
>> --- /dev/null
>> +++ b/include/hw/i386/microvm.h
>> @@ -0,0 +1,85 @@
>> +/*
>> + *
>> + * Copyright (c) 2018 Intel Corporation
>> + * Copyright (c) 2019 Red Hat, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2 or later, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef HW_I386_MICROVM_H
>> +#define HW_I386_MICROVM_H
>> +
>> +#include "qemu-common.h"
>> +#include "exec/hwaddr.h"
>> +#include "qemu/notify.h"
>> +
>> +#include "hw/boards.h"
>> +
>> +/* Microvm memory layout */
>> +#define ZERO_PAGE_START      0x7000
>> +#define BOOT_STACK_POINTER   0x8ff0
>> +#define PML4_START           0x9000
>> +#define PDPTE_START          0xa000
>> +#define PDE_START            0xb000
>> +#define EBDA_START           0x9fc00
>> +#define HIMEM_START          0x100000
>> +#define MICROVM_MAX_BELOW_4G 0xe0000000
>> +
>> +/* Bootparams related definitions */
>> +#define KERNEL_BOOT_FLAG_MAGIC     0xaa55
>> +#define KERNEL_HDR_MAGIC           0x53726448
>> +#define KERNEL_LOADER_OTHER        0xff
>> +#define KERNEL_MIN_ALIGNMENT_BYTES 0x01000000
>> +#define KERNEL_CMDLINE_START       0x20000
>> +#define KERNEL_CMDLINE_MAX_SIZE    0x10000
>> +
>> +/* Platform virtio definitions */
>> +#define VIRTIO_MMIO_BASE      0xd0000000
>> +#define VIRTIO_IRQ_BASE       5
>> +#define VIRTIO_NUM_TRANSPORTS 8
>> +#define VIRTIO_CMDLINE_MAXLEN 64
>> +
>> +/* Machine type options */
>> +#define MICROVM_MACHINE_LEGACY "legacy"
>> +
>> +typedef struct {
>> +    MachineClass parent;
>> +    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
>> +                                           DeviceState *dev);
>> +} MicrovmMachineClass;
>> +
>> +typedef struct {
>> +    MachineState parent;
>> +    unsigned apic_id_limit;
>> +    qemu_irq *gsi;
>> +
>> +    /* RAM size */
>> +    ram_addr_t below_4g_mem_size;
>> +    ram_addr_t above_4g_mem_size;
>> +
>> +    /* Kernel ELF entry. On reset, vCPUs RIP will be set to this */
>> +    uint64_t elf_entry;
>> +
>> +    /* Legacy mode based on an ISA bus. Useful for debugging */
>> +    bool legacy;
>> +} MicrovmMachineState;
>> +
>> +#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
>> +#define MICROVM_MACHINE(obj) \
>> +    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
>> +#define MICROVM_MACHINE_GET_CLASS(obj) \
>> +    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
>> +#define MICROVM_MACHINE_CLASS(class) \
>> +    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
>> +
>> +#endif
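
As a side note on the quoted microvm_get_virtio_mmio_cmdline(): below is
a standalone sketch of the command-line fragment it builds for each
populated transport, in the "<size>@<baseaddr>:<irq>" form taken by the
kernel's virtio_mmio.device parameter. It uses plain snprintf() instead
of glib, and the function name virtio_mmio_cmdline_sketch and the
VIRTIO_MMIO_SIZE constant are made up for illustration. Unlike the
patch, it keeps the index unsigned throughout and matches the format
specifiers to the argument types.

```c
#include <stdio.h>

/* Values taken from the quoted header. */
#define VIRTIO_MMIO_BASE 0xd0000000UL
#define VIRTIO_IRQ_BASE  5
/* Illustrative name for the 512-byte window used per transport. */
#define VIRTIO_MMIO_SIZE 512

/*
 * Write " virtio_mmio.device=<size>@<base>:<irq>" for transport
 * number 'index' into buf. Returns 0 on success, -1 if the result
 * did not fit in len bytes (or on an output error).
 */
static int virtio_mmio_cmdline_sketch(char *buf, size_t len,
                                      unsigned long index)
{
    int ret = snprintf(buf, len, " virtio_mmio.device=%d@0x%lx:%lu",
                       VIRTIO_MMIO_SIZE,
                       VIRTIO_MMIO_BASE + index * VIRTIO_MMIO_SIZE,
                       VIRTIO_IRQ_BASE + index);

    return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```

For index 1 this produces " virtio_mmio.device=512@0xd0000200:6", which
lines up with the second virtio-mmio region listed in the cover letter.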



* Re: [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type
  2019-06-28 19:47   ` Eduardo Habkost
@ 2019-06-28 21:42     ` Sergio Lopez
  2019-06-28 21:57       ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Sergio Lopez @ 2019-06-28 21:42 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: pbonzini, rth, qemu-devel, mst



Eduardo Habkost <ehabkost@redhat.com> writes:

> Hi,
>
> This looks good, overall, I'm just confused by the versioning
> system.  Comments below:
>
>
> On Fri, Jun 28, 2019 at 01:53:49PM +0200, Sergio Lopez wrote:
>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> constructed after the machine model implemented by the latter.
>> 
>> It's main purpose is providing users a KVM-only machine type with fast
>> boot times, minimal attack surface (measured as the number of IO ports
>> and MMIO regions exposed to the Guest) and small footprint (specially
>> when combined with the ongoing QEMU modularization effort).
>> 
>> Normally, other than the device support provided by KVM itself,
>> microvm only supports virtio-mmio devices. Microvm also includes a
>> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>> for being able to see the early boot kernel messages.
>> 
>> Signed-off-by: Sergio Lopez <slp@redhat.com>
> [...]
>> +static const TypeInfo microvm_machine_info = {
>> +    .name          = TYPE_MICROVM_MACHINE,
>> +    .parent        = TYPE_MACHINE,
>> +    .abstract      = true,
>> +    .instance_size = sizeof(MicrovmMachineState),
>> +    .instance_init = microvm_machine_instance_init,
>> +    .class_size    = sizeof(MicrovmMachineClass),
>> +    .class_init    = microvm_class_init,
>
> [1]
>
>> +    .interfaces = (InterfaceInfo[]) {
>> +         { TYPE_NMI },
>> +         { }
>> +    },
>> +};
>> +
>> +static void microvm_machine_init(void)
>> +{
>> +    type_register_static(&microvm_machine_info);
>> +}
>> +type_init(microvm_machine_init);
>> +
>> +static void microvm_1_0_instance_init(Object *obj)
>> +{
>> +}
>
> You shouldn't need a instance_init function if it's empty, I
> believe you can delete it.

Ack.

>> +
>> +static void microvm_machine_class_init(MachineClass *mc)
>
> Why do you need both microvm_machine_class_init() [1] and
> microvm_class_init()?

No idea. To be honest, I took the boilerplate from NEMU's virt machine
type (hence the copyright notice), and I assumed that was actually
mandatory.

>> +{
>> +    mc->init = microvm_machine_state_init;
>> +
>> +    mc->family = "microvm_i386";
>> +    mc->desc = "Microvm (i386)";
>> +    mc->units_per_default_bus = 1;
>> +    mc->no_floppy = 1;
>> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
>> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
>> +    mc->max_cpus = 288;
>
> Where does this limit come from?

From pc_q35.c:366. Apparently, having this limit defined is mandatory,
and I wasn't sure which value would make sense for microvm.

>> +    mc->has_hotpluggable_cpus = false;
>> +    mc->auto_enable_numa_with_memhp = false;
>> +    mc->default_cpu_type = X86_CPU_TYPE_NAME ("host");
>> +    mc->nvdimm_supported = false;
>> +    mc->default_machine_opts = "accel=kvm";
>> +
>> +    /* Machine class handlers */
>> +    mc->cpu_index_to_instance_props = cpu_index_to_props;
>> +    mc->get_default_cpu_node_id = cpu_get_default_cpu_node_id;
>> +    mc->possible_cpu_arch_ids = cpu_possible_cpu_arch_ids;;
>
> I don't think these methods should be mandatory if you don't
> support NUMA or CPU hotplug.  Do you really need them?
>
> (If the core machine code makes them mandatory, it's probably not
> intentional).

Ack, I'll check whether this is actually needed or not.

>> +    mc->reset = microvm_machine_reset;
>> +}
>> +
>> +static void microvm_1_0_machine_class_init(MachineClass *mc)
>> +{
>> +    microvm_machine_class_init(mc);
>> +}
>> +DEFINE_MICROVM_MACHINE_AS_LATEST(1, 0)
>
>
> We only have multiple versions of some machine types (pc-*,
> virt-*, pseries-*, s390-ccw-virtio-*) because of Guest ABI
> compatibility (which you are not implementing here).  What's the
> reason behind having multiple microvm machine versions?

I thought it could be a good idea to have versioning already in place, in
case we need it in the future. But, perhaps we can do a simple machine
definition and just add versioning when it's really needed?

> [...]
>> +#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
>
> Using MACHINE_TYPE_NAME("microvm") might eventually cause
> conflicts with the "microvm" alias you are registering.  I
> suggest using something like "microvm-machine-base".
>
> A separate base class will only be necessary if you are really
> planning to provide multiple versions of the machine type,
> though.

Ack.

Thanks!
Sergio.

>> +#define MICROVM_MACHINE(obj) \
>> +    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
>> +#define MICROVM_MACHINE_GET_CLASS(obj) \
>> +    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
>> +#define MICROVM_MACHINE_CLASS(class) \
>> +    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
>> +
>> +#endif
>> -- 
>> 2.21.0
>> 




* Re: [Qemu-devel] [PATCH 1/4] hw/i386: Factorize CPU routine
  2019-06-28 20:03   ` Eduardo Habkost
@ 2019-06-28 21:44     ` Sergio Lopez
  2019-07-01  9:25       ` Michael S. Tsirkin
  0 siblings, 1 reply; 29+ messages in thread
From: Sergio Lopez @ 2019-06-28 21:44 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: pbonzini, rth, qemu-devel, mst

[-- Attachment #1: Type: text/plain, Size: 489 bytes --]


Eduardo Habkost <ehabkost@redhat.com> writes:

> On Fri, Jun 28, 2019 at 01:53:46PM +0200, Sergio Lopez wrote:
> [...]
>>  /* Enables contiguous-apic-ID mode, for compatibility */
>> -static bool compat_apic_id_mode;
>> +bool compat_apic_id_mode;
>
> We can get rid of this global variable, see the patch I have just
> sent:
>
>   [PATCH] pc: Move compat_apic_id_mode variable to PCMachineClass

Nice. I'll adapt the v2 of the patchset to assume this has been
committed.

Thanks!
Sergio.



* Re: [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type
  2019-06-28 21:05     ` Sergio Lopez
@ 2019-06-28 21:54       ` Maran Wilson
  2019-06-28 22:23         ` Sergio Lopez
  2019-06-28 21:56       ` Paolo Bonzini
  1 sibling, 1 reply; 29+ messages in thread
From: Maran Wilson @ 2019-06-28 21:54 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: ehabkost, Maran Wilson, mst, qemu-devel, pbonzini, rth

On 6/28/2019 2:05 PM, Sergio Lopez wrote:
> Maran Wilson <maran.wilson@oracle.com> writes:
>
>> This seems like a good overall direction to be headed with Qemu.
>>
>> But there is a lot of Linux OS specific startup details being baked
>> into the Qemu machine type here. Things that are usually pushed into
>> firmware or option ROM.
>>
>> Instead of hard coding all the Zero page stuff into the Qemu machine
>> model, couldn't you just setup the PVH kernel entry point and leave
>> all the OS specific details to the OS being started? That way, at
>> least you are programming to a more generic ABI spec. See:
>> https://gist.github.com/stefano-garzarella/7b7e17e75add20abd1c42fb496cc6504
>>
>> And I think you still wouldn't need any firmware if you just replace
>> your zeropage initialization with PVH spec setup.
> The main reason for relying on Linux's Zero Page, is to be able to
> pass the e820 table with the basic physical memory layout to the kernel
> through it, as there isn't a BIOS nor ACPI. AFAIK, we can't do that with
> PVH.

Actually, we specifically updated the PVH interface to add just that.
Please take a look at start_info.h in the Qemu source.

And here's the patch where the canonical definition for the HVM direct 
boot ABI was updated:
https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg00053.html
You can see we added fields for memmap_paddr/memmap_entries to the 
struct. I think the community feedback was to avoid labeling it "e820" 
but, in fact, that's exactly what it is -- and as you can see, the types 
were tied to the memmap types specified in the ACPI spec.
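As a rough illustration of the point above, the following standalone sketch fills a PVH-style start info block and a two-entry memory map for the same below/above-4G RAM split used by the patch. The struct and macro names here (pvh_start_info, pvh_memmap_entry, build_memmap, fill_start_info) are local, hypothetical mirrors written for this example; the field layout and the magic value are meant to follow start_info.h but should be checked against that header, and the guest physical addresses passed in are placeholders, not values taken from any patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the PVH definitions; verify against start_info.h. */
#define PVH_START_MAGIC 0x336ec578u     /* XEN_HVM_START_MAGIC_VALUE */
#define PVH_MEMMAP_RAM  1u              /* same numeric value as E820 RAM */
#define MAX_BELOW_4G    0xe0000000ull   /* MICROVM_MAX_BELOW_4G in the patch */
#define ABOVE_4G_START  0x100000000ull

struct pvh_memmap_entry {
    uint64_t addr;      /* base guest physical address of the region */
    uint64_t size;      /* size of the region in bytes */
    uint32_t type;      /* E820-style region type */
    uint32_t reserved;  /* must be zero */
};

struct pvh_start_info {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
    uint32_t nr_modules;
    uint64_t modlist_paddr;
    uint64_t cmdline_paddr;
    uint64_t rsdp_paddr;
    /* Fields below exist only in version 1 and newer. */
    uint64_t memmap_paddr;
    uint32_t memmap_entries;
    uint32_t reserved;
};

/* Describe guest RAM with one or two entries; returns the entry count. */
static uint32_t build_memmap(uint64_t ram_size, struct pvh_memmap_entry *map)
{
    uint32_t n = 0;
    uint64_t below = ram_size > MAX_BELOW_4G ? MAX_BELOW_4G : ram_size;

    assert(map != NULL);
    map[n++] = (struct pvh_memmap_entry){
        .addr = 0, .size = below, .type = PVH_MEMMAP_RAM,
    };
    if (ram_size > below) {
        map[n++] = (struct pvh_memmap_entry){
            .addr = ABOVE_4G_START, .size = ram_size - below,
            .type = PVH_MEMMAP_RAM,
        };
    }
    return n;
}

/* Point the start info at a memory map and command line already in guest RAM. */
static void fill_start_info(struct pvh_start_info *si, uint64_t memmap_gpa,
                            uint32_t entries, uint64_t cmdline_gpa)
{
    memset(si, 0, sizeof(*si));
    si->magic = PVH_START_MAGIC;
    si->version = 1;
    si->cmdline_paddr = cmdline_gpa;
    si->memmap_paddr = memmap_gpa;
    si->memmap_entries = entries;
}
```

A machine model taking this route would copy both structures into guest memory and hand the start info address to the kernel's PVH entry point, instead of building a Linux-specific zero page.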

And no BIOS nor Firmware was needed to boot via the PVH entry. When we 
first came to the community as an RFC with the PVH/KVM idea, we got 
feedback that folks would prefer to see the PVH setup stuff pushed into 
qboot or option rom so that's why it ended up that way (using a very 
minimal FW of your choice). But there is technically no reason why it 
can't be done directly in Qemu just like you have done all the zeropage 
init stuff in these patches. That would have been our first preference 
anyway.

Thanks,
-Maran

> I'm inclined to keep it this way, and once there's an interest to use
> the microvm machine type with a different kernel, try to find some
> common ground.
>
>> Thanks,
>> -Maran
>>
>> On 6/28/2019 4:53 AM, Sergio Lopez wrote:
>>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>>> constructed after the machine model implemented by the latter.
>>>
>>> Its main purpose is providing users a KVM-only machine type with fast
>>> boot times, minimal attack surface (measured as the number of IO ports
>>> and MMIO regions exposed to the Guest) and small footprint (especially
>>> when combined with the ongoing QEMU modularization effort).
>>>
>>> Normally, other than the device support provided by KVM itself,
>>> microvm only supports virtio-mmio devices. Microvm also includes a
>>> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>>> for being able to see the early boot kernel messages.
>>>
>>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>>> ---
>>>    default-configs/i386-softmmu.mak |   1 +
>>>    hw/i386/Kconfig                  |   4 +
>>>    hw/i386/Makefile.objs            |   1 +
>>>    hw/i386/microvm.c                | 518 +++++++++++++++++++++++++++++++
>>>    include/hw/i386/microvm.h        |  85 +++++
>>>    5 files changed, 609 insertions(+)
>>>    create mode 100644 hw/i386/microvm.c
>>>    create mode 100644 include/hw/i386/microvm.h
>>>
>>> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
>>> index cd5ea391e8..338f07420f 100644
>>> --- a/default-configs/i386-softmmu.mak
>>> +++ b/default-configs/i386-softmmu.mak
>>> @@ -26,3 +26,4 @@ CONFIG_ISAPC=y
>>>    CONFIG_I440FX=y
>>>    CONFIG_Q35=y
>>>    CONFIG_ACPI_PCI=y
>>> +CONFIG_MICROVM=y
>>> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
>>> index 9817888216..94c565d8db 100644
>>> --- a/hw/i386/Kconfig
>>> +++ b/hw/i386/Kconfig
>>> @@ -87,6 +87,10 @@ config Q35
>>>        select VMMOUSE
>>>        select FW_CFG_DMA
>>> +config MICROVM
>>> +    bool
>>> +    select VIRTIO_MMIO
>>> +
>>>    config VTD
>>>        bool
>>> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
>>> index 102f2b35fc..149bdd0784 100644
>>> --- a/hw/i386/Makefile.objs
>>> +++ b/hw/i386/Makefile.objs
>>> @@ -4,6 +4,7 @@ obj-y += cpu.o
>>>    obj-y += pc.o
>>>    obj-$(CONFIG_I440FX) += pc_piix.o
>>>    obj-$(CONFIG_Q35) += pc_q35.o
>>> +obj-$(CONFIG_MICROVM) += mptable.o microvm.o
>>>    obj-y += fw_cfg.o pc_sysfw.o
>>>    obj-y += x86-iommu.o
>>>    obj-$(CONFIG_VTD) += intel_iommu.o
>>> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
>>> new file mode 100644
>>> index 0000000000..fff88c3697
>>> --- /dev/null
>>> +++ b/hw/i386/microvm.c
>>> @@ -0,0 +1,518 @@
>>> +/*
>>> + *
>>> + * Copyright (c) 2018 Intel Corporation
>>> + * Copyright (c) 2019 Red Hat, Inc.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify it
>>> + * under the terms and conditions of the GNU General Public License,
>>> + * version 2 or later, as published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>> + * more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License along with
>>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qemu/error-report.h"
>>> +#include "qapi/error.h"
>>> +#include "qapi/visitor.h"
>>> +#include "sysemu/sysemu.h"
>>> +#include "sysemu/cpus.h"
>>> +#include "sysemu/numa.h"
>>> +
>>> +#include "hw/loader.h"
>>> +#include "hw/nmi.h"
>>> +#include "hw/kvm/clock.h"
>>> +#include "hw/i386/microvm.h"
>>> +#include "hw/i386/pc.h"
>>> +#include "hw/i386/cpu-internal.h"
>>> +#include "target/i386/cpu.h"
>>> +#include "hw/timer/i8254.h"
>>> +#include "hw/char/serial.h"
>>> +#include "hw/i386/topology.h"
>>> +#include "hw/virtio/virtio-mmio.h"
>>> +#include "hw/i386/mptable.h"
>>> +
>>> +#include "cpu.h"
>>> +#include "elf.h"
>>> +#include "kvm_i386.h"
>>> +#include <asm/bootparam.h>
>>> +
>>> +#define DEFINE_MICROVM_MACHINE_LATEST(major, minor, latest) \
>>> +    static void microvm_##major##_##minor##_object_class_init(ObjectClass *oc, \
>>> +                                                              void *data) \
>>> +    { \
>>> +        MachineClass *mc = MACHINE_CLASS(oc); \
>>> +        microvm_##major##_##minor##_machine_class_init(mc); \
>>> +        mc->desc = "Microvm (i386)"; \
>>> +        if (latest) { \
>>> +            mc->alias = "microvm"; \
>>> +        } \
>>> +    } \
>>> +    static const TypeInfo microvm_##major##_##minor##_info = { \
>>> +        .name = MACHINE_TYPE_NAME("microvm-" # major "." # minor), \
>>> +        .parent = TYPE_MICROVM_MACHINE, \
>>> +        .instance_init = microvm_##major##_##minor##_instance_init, \
>>> +        .class_init = microvm_##major##_##minor##_object_class_init, \
>>> +    }; \
>>> +    static void microvm_##major##_##minor##_init(void) \
>>> +    { \
>>> +        type_register_static(&microvm_##major##_##minor##_info); \
>>> +    } \
>>> +    type_init(microvm_##major##_##minor##_init);
>>> +
>>> +#define DEFINE_MICROVM_MACHINE_AS_LATEST(major, minor) \
>>> +    DEFINE_MICROVM_MACHINE_LATEST(major, minor, true)
>>> +#define DEFINE_MICROVM_MACHINE(major, minor) \
>>> +    DEFINE_MICROVM_MACHINE_LATEST(major, minor, false)
>>> +
>>> +static void microvm_gsi_handler(void *opaque, int n, int level)
>>> +{
>>> +    qemu_irq *ioapic_irq = opaque;
>>> +
>>> +    qemu_set_irq(ioapic_irq[n], level);
>>> +}
>>> +
>>> +static void microvm_legacy_init(MicrovmMachineState *mms)
>>> +{
>>> +    ISABus *isa_bus;
>>> +    GSIState *gsi_state;
>>> +    qemu_irq *i8259;
>>> +    int i;
>>> +
>>> +    assert(kvm_irqchip_in_kernel());
>>> +    gsi_state = g_malloc0(sizeof(*gsi_state));
>>> +    mms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
>>> +
>>> +    isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
>>> +                          &error_abort);
>>> +    isa_bus_irqs(isa_bus, mms->gsi);
>>> +
>>> +    assert(kvm_pic_in_kernel());
>>> +    i8259 = kvm_i8259_init(isa_bus);
>>> +
>>> +    for (i = 0; i < ISA_NUM_IRQS; i++) {
>>> +        gsi_state->i8259_irq[i] = i8259[i];
>>> +    }
>>> +
>>> +    kvm_pit_init(isa_bus, 0x40);
>>> +
>>> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
>>> +        int nirq = VIRTIO_IRQ_BASE + i;
>>> +        ISADevice *isadev = isa_create(isa_bus, TYPE_ISA_SERIAL);
>>> +        qemu_irq mmio_irq;
>>> +
>>> +        isa_init_irq(isadev, &mmio_irq, nirq);
>>> +        sysbus_create_simple("virtio-mmio",
>>> +                             VIRTIO_MMIO_BASE + i * 512,
>>> +                             mms->gsi[VIRTIO_IRQ_BASE + i]);
>>> +    }
>>> +
>>> +    g_free(i8259);
>>> +
>>> +    serial_hds_isa_init(isa_bus, 0, 1);
>>> +}
>>> +
>>> +static void microvm_ioapic_init(MicrovmMachineState *mms)
>>> +{
>>> +    qemu_irq *ioapic_irq;
>>> +    DeviceState *ioapic_dev;
>>> +    SysBusDevice *d;
>>> +    int i;
>>> +
>>> +    assert(kvm_irqchip_in_kernel());
>>> +    ioapic_irq = g_new0(qemu_irq, IOAPIC_NUM_PINS);
>>> +    kvm_pc_setup_irq_routing(true);
>>> +
>>> +    assert(kvm_ioapic_in_kernel());
>>> +    ioapic_dev = qdev_create(NULL, "kvm-ioapic");
>>> +
>>> +    object_property_add_child(qdev_get_machine(), "ioapic", OBJECT(ioapic_dev), NULL);
>>> +
>>> +    qdev_init_nofail(ioapic_dev);
>>> +    d = SYS_BUS_DEVICE(ioapic_dev);
>>> +    sysbus_mmio_map(d, 0, IO_APIC_DEFAULT_ADDRESS);
>>> +
>>> +    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>>> +        ioapic_irq[i] = qdev_get_gpio_in(ioapic_dev, i);
>>> +    }
>>> +
>>> +    mms->gsi = qemu_allocate_irqs(microvm_gsi_handler, ioapic_irq, IOAPIC_NUM_PINS);
>>> +
>>> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
>>> +        sysbus_create_simple("virtio-mmio",
>>> +                             VIRTIO_MMIO_BASE + i * 512,
>>> +                             mms->gsi[VIRTIO_IRQ_BASE + i]);
>>> +    }
>>> +}
>>> +
>>> +static void microvm_memory_init(MicrovmMachineState *mms)
>>> +{
>>> +    MachineState *machine = MACHINE(mms);
>>> +    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
>>> +    MemoryRegion *system_memory = get_system_memory();
>>> +
>>> +    if (machine->ram_size > MICROVM_MAX_BELOW_4G) {
>>> +        mms->above_4g_mem_size = machine->ram_size - MICROVM_MAX_BELOW_4G;
>>> +        mms->below_4g_mem_size = MICROVM_MAX_BELOW_4G;
>>> +    } else {
>>> +        mms->above_4g_mem_size = 0;
>>> +        mms->below_4g_mem_size = machine->ram_size;
>>> +    }
>>> +
>>> +    ram = g_malloc(sizeof(*ram));
>>> +    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
>>> +                                         machine->ram_size);
>>> +
>>> +    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
>>> +    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
>>> +                             0, mms->below_4g_mem_size);
>>> +    memory_region_add_subregion(system_memory, 0, ram_below_4g);
>>> +
>>> +    e820_add_entry(0, mms->below_4g_mem_size, E820_RAM);
>>> +
>>> +    if (mms->above_4g_mem_size > 0) {
>>> +        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
>>> +        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
>>> +                                 mms->below_4g_mem_size,
>>> +                                 mms->above_4g_mem_size);
>>> +        memory_region_add_subregion(system_memory, 0x100000000ULL,
>>> +                                    ram_above_4g);
>>> +        e820_add_entry(0x100000000ULL, mms->above_4g_mem_size, E820_RAM);
>>> +    }
>>> +}
>>> +
>>> +static void microvm_machine_state_init(MachineState *machine)
>>> +{
>>> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>>> +    uint64_t elf_entry;
>>> +    int kernel_size;
>>> +
>>> +    if (machine->kernel_filename == NULL) {
>>> +        error_report("missing kernel image file name, required by microvm");
>>> +        exit(1);
>>> +    }
>>> +
>>> +    microvm_memory_init(mms);
>>> +    if (mms->legacy) {
>>> +        microvm_legacy_init(mms);
>>> +    } else {
>>> +        microvm_ioapic_init(mms);
>>> +    }
>>> +
>>> +    mms->apic_id_limit = cpus_init(machine, false);
>>> +
>>> +    kvmclock_create();
>>> +
>>> +    kernel_size = load_elf(machine->kernel_filename, NULL,
>>> +                           NULL, NULL, &elf_entry,
>>> +                           NULL, NULL, 0, I386_ELF_MACHINE,
>>> +                           0, 0);
>>> +
>>> +    if (kernel_size < 0) {
>>> +        error_report("Error while loading elf kernel");
>>> +        exit(1);
>>> +    }
>>> +
>>> +    mms->elf_entry = elf_entry;
>>> +}
>>> +
>>> +static gchar *microvm_get_virtio_mmio_cmdline(gchar *name)
>>> +{
>>> +    gchar *cmdline;
>>> +    gchar *separator;
>>> +    unsigned long index;
>>> +    int ret;
>>> +
>>> +    separator = g_strrstr(name, ".");
>>> +    if (!separator) {
>>> +        return NULL;
>>> +    }
>>> +
>>> +    index = strtol(separator + 1, NULL, 10);
>>> +    if (index == LONG_MIN || index == LONG_MAX) {
>>> +        return NULL;
>>> +    }
>>> +
>>> +    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
>>> +    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
>>> +                     " virtio_mmio.device=512@0x%lx:%ld",
>>> +                     VIRTIO_MMIO_BASE + index * 512,
>>> +                     VIRTIO_IRQ_BASE + index);
>>> +    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
>>> +        g_free(cmdline);
>>> +        return NULL;
>>> +    }
>>> +
>>> +    return cmdline;
>>> +}
>>> +
>>> +static void microvm_setup_bootparams(MicrovmMachineState *mms, const gchar *kernel_cmdline)
>>> +{
>>> +    struct boot_params params;
>>> +    BusState *bus;
>>> +    BusChild *kid;
>>> +    gchar *cmdline;
>>> +    int cmdline_len;
>>> +    int i;
>>> +
>>> +    cmdline = g_strdup(kernel_cmdline);
>>> +
>>> +    /*
>>> +     * Find MMIO transports with attached devices, and add them to the kernel
>>> +     * command line.
>>> +     */
>>> +    bus = sysbus_get_default();
>>> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
>>> +        DeviceState *dev = kid->child;
>>> +        ObjectClass *class = object_get_class(OBJECT(dev));
>>> +
>>> +        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
>>> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
>>> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
>>> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
>>> +
>>> +            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
>>> +                gchar *mmio_cmdline = microvm_get_virtio_mmio_cmdline(mmio_bus->name);
>>> +                if (mmio_cmdline) {
>>> +                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
>>> +                    g_free(mmio_cmdline);
>>> +                    g_free(cmdline);
>>> +                    cmdline = newcmd;
>>> +                }
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    cmdline_len = strlen(cmdline);
>>> +
>>> +    address_space_write(&address_space_memory,
>>> +                        KERNEL_CMDLINE_START, MEMTXATTRS_UNSPECIFIED,
>>> +                        (uint8_t *) cmdline, cmdline_len);
>>> +
>>> +    g_free(cmdline);
>>> +
>>> +    memset(&params, 0, sizeof(struct boot_params));
>>> +
>>> +    params.hdr.type_of_loader = KERNEL_LOADER_OTHER;
>>> +    params.hdr.boot_flag = KERNEL_BOOT_FLAG_MAGIC;
>>> +    params.hdr.header = KERNEL_HDR_MAGIC;
>>> +    params.hdr.cmd_line_ptr = KERNEL_CMDLINE_START;
>>> +    params.hdr.cmdline_size = cmdline_len;
>>> +    params.hdr.kernel_alignment = KERNEL_MIN_ALIGNMENT_BYTES;
>>> +
>>> +    params.e820_entries = e820_get_num_entries();
>>> +    for (i = 0; i < params.e820_entries; i++) {
>>> +        uint64_t address, length;
>>> +        if (e820_get_entry(i, E820_RAM, &address, &length)) {
>>> +            params.e820_table[i].addr = address;
>>> +            params.e820_table[i].size = length;
>>> +            params.e820_table[i].type = E820_RAM;
>>> +        }
>>> +    }
>>> +
>>> +    address_space_write(&address_space_memory,
>>> +                        ZERO_PAGE_START, MEMTXATTRS_UNSPECIFIED,
>>> +                        (uint8_t *) &params, sizeof(struct boot_params));
>>> +}
>>> +
>>> +static void microvm_init_page_tables(void)
>>> +{
>>> +    uint64_t val = 0;
>>> +    int i;
>>> +
>>> +    val = PDPTE_START | 0x03;
>>> +    address_space_write(&address_space_memory,
>>> +                        PML4_START, MEMTXATTRS_UNSPECIFIED,
>>> +                        (uint8_t *) &val, 8);
>>> +    val = PDE_START | 0x03;
>>> +    address_space_write(&address_space_memory,
>>> +                        PDPTE_START, MEMTXATTRS_UNSPECIFIED,
>>> +                        (uint8_t *) &val, 8);
>>> +
>>> +    for (i = 0; i < 512; i++) {
>>> +        val = (i << 21) + 0x83;
>>> +        address_space_write(&address_space_memory,
>>> +                            PDE_START + (i * 8), MEMTXATTRS_UNSPECIFIED,
>>> +                            (uint8_t *) &val, 8);
>>> +    }
>>> +}
>>> +
>>> +static void microvm_cpu_reset(CPUState *cs, uint64_t elf_entry)
>>> +{
>>> +    X86CPU *cpu = X86_CPU(cs);
>>> +    CPUX86State *env = &cpu->env;
>>> +    struct SegmentCache seg_code =
>>> +        { .selector = 0x8, .base = 0x0, .limit = 0xfffff, .flags = 0xa09b00 };
>>> +    struct SegmentCache seg_data =
>>> +        { .selector = 0x10, .base = 0x0, .limit = 0xfffff, .flags = 0xc09300 };
>>> +    struct SegmentCache seg_tr =
>>> +        { .selector = 0x18, .base = 0x0, .limit = 0xfffff, .flags = 0x808b00 };
>>> +
>>> +    kvm_arch_get_registers(cs);
>>> +
>>> +    memcpy(&env->segs[R_CS], &seg_code, sizeof(struct SegmentCache));
>>> +    memcpy(&env->segs[R_DS], &seg_data, sizeof(struct SegmentCache));
>>> +    memcpy(&env->segs[R_ES], &seg_data, sizeof(struct SegmentCache));
>>> +    memcpy(&env->segs[R_FS], &seg_data, sizeof(struct SegmentCache));
>>> +    memcpy(&env->segs[R_GS], &seg_data, sizeof(struct SegmentCache));
>>> +    memcpy(&env->segs[R_SS], &seg_data, sizeof(struct SegmentCache));
>>> +    memcpy(&env->tr, &seg_tr, sizeof(struct SegmentCache));
>>> +
>>> +    env->efer |= MSR_EFER_LME | MSR_EFER_LMA;
>>> +    env->regs[R_ESP] = BOOT_STACK_POINTER;
>>> +    env->regs[R_EBP] = BOOT_STACK_POINTER;
>>> +    env->regs[R_ESI] = ZERO_PAGE_START;
>>> +
>>> +    cpu_set_pc(cs, elf_entry);
>>> +    cpu_x86_update_cr3(env, PML4_START);
>>> +    cpu_x86_update_cr4(env, env->cr[4] | CR4_PAE_MASK);
>>> +    cpu_x86_update_cr0(env, env->cr[0] | CR0_PE_MASK | CR0_PG_MASK);
>>> +    x86_update_hflags(env);
>>> +
>>> +    kvm_arch_put_registers(cs, KVM_PUT_RESET_STATE);
>>> +}
>>> +
>>> +static void microvm_mptable_setup(MicrovmMachineState *mms)
>>> +{
>>> +    char *mptable;
>>> +    int size;
>>> +
>>> +    mptable = mptable_generate(smp_cpus, mms->apic_id_limit,
>>> +                               EBDA_START, &size);
>>> +    address_space_write(&address_space_memory,
>>> +                        EBDA_START, MEMTXATTRS_UNSPECIFIED,
>>> +                        (uint8_t *) mptable, size);
>>> +    g_free(mptable);
>>> +}
>>> +
>>> +static bool microvm_machine_get_legacy(Object *obj, Error **errp)
>>> +{
>>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>>> +
>>> +    return mms->legacy;
>>> +}
>>> +
>>> +static void microvm_machine_set_legacy(Object *obj, bool value, Error **errp)
>>> +{
>>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>>> +
>>> +    mms->legacy = value;
>>> +}
>>> +
>>> +static void microvm_machine_reset(void)
>>> +{
>>> +    MachineState *machine = MACHINE(qdev_get_machine());
>>> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>>> +    CPUState *cs;
>>> +    X86CPU *cpu;
>>> +
>>> +    qemu_devices_reset();
>>> +
>>> +    microvm_mptable_setup(mms);
>>> +    microvm_setup_bootparams(mms, machine->kernel_cmdline);
>>> +    microvm_init_page_tables();
>>> +
>>> +    CPU_FOREACH(cs) {
>>> +        cpu = X86_CPU(cs);
>>> +
>>> +        /* Reset APIC after devices have been reset to cancel
>>> +         * any changes that qemu_devices_reset() might have done.
>>> +         */
>>> +        if (cpu->apic_state) {
>>> +            device_reset(cpu->apic_state);
>>> +        }
>>> +
>>> +        microvm_cpu_reset(cs, mms->elf_entry);
>>> +    }
>>> +}
>>> +
>>> +static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
>>> +{
>>> +    CPUState *cs;
>>> +
>>> +    CPU_FOREACH(cs) {
>>> +        X86CPU *cpu = X86_CPU(cs);
>>> +
>>> +        if (!cpu->apic_state) {
>>> +            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
>>> +        } else {
>>> +            apic_deliver_nmi(cpu->apic_state);
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +static void microvm_machine_instance_init(Object *obj)
>>> +{
>>> +}
>>> +
>>> +static void microvm_class_init(ObjectClass *oc, void *data)
>>> +{
>>> +    NMIClass *nc = NMI_CLASS(oc);
>>> +
>>> +    /* NMI handler */
>>> +    nc->nmi_monitor_handler = x86_nmi;
>>> +
>>> +    object_class_property_add_bool(oc, MICROVM_MACHINE_LEGACY,
>>> +                                   microvm_machine_get_legacy,
>>> +                                   microvm_machine_set_legacy,
>>> +                                   &error_abort);
>>> +}
>>> +
>>> +static const TypeInfo microvm_machine_info = {
>>> +    .name          = TYPE_MICROVM_MACHINE,
>>> +    .parent        = TYPE_MACHINE,
>>> +    .abstract      = true,
>>> +    .instance_size = sizeof(MicrovmMachineState),
>>> +    .instance_init = microvm_machine_instance_init,
>>> +    .class_size    = sizeof(MicrovmMachineClass),
>>> +    .class_init    = microvm_class_init,
>>> +    .interfaces = (InterfaceInfo[]) {
>>> +         { TYPE_NMI },
>>> +         { }
>>> +    },
>>> +};
>>> +
>>> +static void microvm_machine_init(void)
>>> +{
>>> +    type_register_static(&microvm_machine_info);
>>> +}
>>> +type_init(microvm_machine_init);
>>> +
>>> +static void microvm_1_0_instance_init(Object *obj)
>>> +{
>>> +}
>>> +
>>> +static void microvm_machine_class_init(MachineClass *mc)
>>> +{
>>> +    mc->init = microvm_machine_state_init;
>>> +
>>> +    mc->family = "microvm_i386";
>>> +    mc->desc = "Microvm (i386)";
>>> +    mc->units_per_default_bus = 1;
>>> +    mc->no_floppy = 1;
>>> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
>>> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
>>> +    mc->max_cpus = 288;
>>> +    mc->has_hotpluggable_cpus = false;
>>> +    mc->auto_enable_numa_with_memhp = false;
>>> +    mc->default_cpu_type = X86_CPU_TYPE_NAME ("host");
>>> +    mc->nvdimm_supported = false;
>>> +    mc->default_machine_opts = "accel=kvm";
>>> +
>>> +    /* Machine class handlers */
>>> +    mc->cpu_index_to_instance_props = cpu_index_to_props;
>>> +    mc->get_default_cpu_node_id = cpu_get_default_cpu_node_id;
>>> +    mc->possible_cpu_arch_ids = cpu_possible_cpu_arch_ids;
>>> +    mc->reset = microvm_machine_reset;
>>> +}
>>> +
>>> +static void microvm_1_0_machine_class_init(MachineClass *mc)
>>> +{
>>> +    microvm_machine_class_init(mc);
>>> +}
>>> +DEFINE_MICROVM_MACHINE_AS_LATEST(1, 0)
>>> diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
>>> new file mode 100644
>>> index 0000000000..544ef60563
>>> --- /dev/null
>>> +++ b/include/hw/i386/microvm.h
>>> @@ -0,0 +1,85 @@
>>> +/*
>>> + *
>>> + * Copyright (c) 2018 Intel Corporation
>>> + * Copyright (c) 2019 Red Hat, Inc.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify it
>>> + * under the terms and conditions of the GNU General Public License,
>>> + * version 2 or later, as published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>> + * more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License along with
>>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#ifndef HW_I386_MICROVM_H
>>> +#define HW_I386_MICROVM_H
>>> +
>>> +#include "qemu-common.h"
>>> +#include "exec/hwaddr.h"
>>> +#include "qemu/notify.h"
>>> +
>>> +#include "hw/boards.h"
>>> +
>>> +/* Microvm memory layout */
>>> +#define ZERO_PAGE_START      0x7000
>>> +#define BOOT_STACK_POINTER   0x8ff0
>>> +#define PML4_START           0x9000
>>> +#define PDPTE_START          0xa000
>>> +#define PDE_START            0xb000
>>> +#define EBDA_START           0x9fc00
>>> +#define HIMEM_START          0x100000
>>> +#define MICROVM_MAX_BELOW_4G 0xe0000000
>>> +
>>> +/* Bootparams related definitions */
>>> +#define KERNEL_BOOT_FLAG_MAGIC     0xaa55
>>> +#define KERNEL_HDR_MAGIC           0x53726448
>>> +#define KERNEL_LOADER_OTHER        0xff
>>> +#define KERNEL_MIN_ALIGNMENT_BYTES 0x01000000
>>> +#define KERNEL_CMDLINE_START       0x20000
>>> +#define KERNEL_CMDLINE_MAX_SIZE    0x10000
>>> +
>>> +/* Platform virtio definitions */
>>> +#define VIRTIO_MMIO_BASE      0xd0000000
>>> +#define VIRTIO_IRQ_BASE       5
>>> +#define VIRTIO_NUM_TRANSPORTS 8
>>> +#define VIRTIO_CMDLINE_MAXLEN 64
>>> +
>>> +/* Machine type options */
>>> +#define MICROVM_MACHINE_LEGACY "legacy"
>>> +
>>> +typedef struct {
>>> +    MachineClass parent;
>>> +    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
>>> +                                           DeviceState *dev);
>>> +} MicrovmMachineClass;
>>> +
>>> +typedef struct {
>>> +    MachineState parent;
>>> +    unsigned apic_id_limit;
>>> +    qemu_irq *gsi;
>>> +
>>> +    /* RAM size */
>>> +    ram_addr_t below_4g_mem_size;
>>> +    ram_addr_t above_4g_mem_size;
>>> +
>>> +    /* Kernel ELF entry. On reset, vCPUs RIP will be set to this */
>>> +    uint64_t elf_entry;
>>> +
>>> +    /* Legacy mode based on an ISA bus. Useful for debugging */
>>> +    bool legacy;
>>> +} MicrovmMachineState;
>>> +
>>> +#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
>>> +#define MICROVM_MACHINE(obj) \
>>> +    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
>>> +#define MICROVM_MACHINE_GET_CLASS(obj) \
>>> +    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
>>> +#define MICROVM_MACHINE_CLASS(class) \
>>> +    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
>>> +
>>> +#endif




* Re: [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type
  2019-06-28 21:05     ` Sergio Lopez
  2019-06-28 21:54       ` Maran Wilson
@ 2019-06-28 21:56       ` Paolo Bonzini
  1 sibling, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2019-06-28 21:56 UTC (permalink / raw)
  To: Sergio Lopez, Maran Wilson; +Cc: qemu-devel, rth, ehabkost, mst

On 28/06/19 23:05, Sergio Lopez wrote:
> The main reason for relying on Linux's Zero Page, is to be able to
> pass the e820 table with the basic physical memory layout to the kernel
> through it, as there isn't a BIOS nor ACPI. AFAIK, we can't do that with
> PVH.

e820 is passed through both PVH and multiboot.  qboot supports all
three, and also literally three 16-bit BIOS services for use with
vmlinuz.  I agree with Maran that it would be better to use a normal
firmware and reuse the fw_cfg code from the pc and q35 machine types.

qboot doesn't do mptable yet (it can only take ACPI tables from fw_cfg)
but it should be easy to lift the SeaBIOS code that generates the tables.

It should be very interesting to have a comparison between running
"firmware" code in the host vs. the guest.

Another promising thing to do is to mmap the -kernel ELF file instead of
reading it.  You still have the memcpy when it is read into guest memory
with fw_cfg, but given Linux patches the heck out of itself at boot, it
should be faster to memcpy than to mmap vmlinux directly as MAP_PRIVATE
and then suffer a thousand CoW page faults.
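To make the MAP_PRIVATE behaviour concrete, here is a small standalone sketch, not QEMU code. It maps a file privately, patches one byte through the mapping (the first write to a page is what triggers the copy-on-write fault), and then re-reads the file to confirm the change never reached it. The function name private_mapping_leaves_file_intact is made up for this example, and the sketch only demonstrates the semantics; it says nothing about the relative cost of memcpy versus CoW faults, which would need measuring.

```c
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a file MAP_PRIVATE, modify its first byte in memory and re-read
 * that byte from the file. Returns 1 if the file is unchanged, 0 if the
 * write leaked into the file, -1 on error.
 */
static int private_mapping_leaves_file_intact(const char *path)
{
    struct stat st;
    unsigned char *p;
    unsigned char before, on_disk;
    int result = -1;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    /* A private mapping may be writable even though fd is read-only. */
    p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    before = p[0];
    p[0] = (unsigned char)~before;   /* first write: page gets copied */

    if (pread(fd, &on_disk, 1, 0) == 1) {
        result = (on_disk == before);
    }

    munmap(p, (size_t)st.st_size);
    close(fd);
    return result;
}
```

A kernel that patches itself heavily at boot would take one such fault per touched page, which is the cost being weighed against a single up-front memcpy.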

Thanks,

Paolo

> I'm inclined to keep it this way, and once there's an interest to use
> the microvm machine type with a different kernel, try to find some
> common ground.



* Re: [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type
  2019-06-28 21:42     ` Sergio Lopez
@ 2019-06-28 21:57       ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2019-06-28 21:57 UTC (permalink / raw)
  To: Sergio Lopez, Eduardo Habkost; +Cc: rth, qemu-devel, mst

On 28/06/19 23:42, Sergio Lopez wrote:
> I thought it could be a good idea to have versioning already in place, in
> case we need it in the future. But, perhaps we can do a simple machine
> definition and just add versioning when it's really needed?
> 

I think if the use case is really short-lived VMs, versioning is mostly
unnecessary.

Paolo



* Re: [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type
  2019-06-28 14:06   ` Michael S. Tsirkin
  2019-06-28 20:56     ` Sergio Lopez
@ 2019-06-28 22:17     ` Paolo Bonzini
  2019-06-30 21:37       ` Michael S. Tsirkin
  1 sibling, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2019-06-28 22:17 UTC (permalink / raw)
  To: Michael S. Tsirkin, Sergio Lopez; +Cc: qemu-devel, ehabkost, rth

On 28/06/19 16:06, Michael S. Tsirkin wrote:
>> +    assert(kvm_irqchip_in_kernel());
> Hmm - irqchip in kernel actually increases the attack surface,
> does it not? Or at least, the severity of the attacks.

Yeah, we should at least support split irqchip.  But, irqchip completely
in userspace is slow when it is not broken, and it does not support
APICv.  So it's not really feasible.

Paolo



* Re: [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type
  2019-06-28 21:54       ` Maran Wilson
@ 2019-06-28 22:23         ` Sergio Lopez
  0 siblings, 0 replies; 29+ messages in thread
From: Sergio Lopez @ 2019-06-28 22:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, rth, mst, ehabkost, Maran Wilson



Maran Wilson <maran.wilson@oracle.com> writes:

> On 6/28/2019 2:05 PM, Sergio Lopez wrote:
>> Maran Wilson <maran.wilson@oracle.com> writes:
>>
>>> This seems like a good overall direction to be headed with Qemu.
>>>
>>> But there is a lot of Linux OS specific startup details being baked
>>> into the Qemu machine type here. Things that are usually pushed into
>>> firmware or option ROM.
>>>
>>> Instead of hard coding all the Zero page stuff into the Qemu machine
>>> model, couldn't you just setup the PVH kernel entry point and leave
>>> all the OS specific details to the OS being started? That way, at
>>> least you are programming to a more generic ABI spec. See:
>>> https://gist.github.com/stefano-garzarella/7b7e17e75add20abd1c42fb496cc6504
>>>
>>> And I think you still wouldn't need any firmware if you just replace
>>> your zeropage initialization with PVH spec setup.
>> The main reason for relying on Linux's Zero Page, is to be able to
>> pass the e820 table with the basic physical memory layout to the kernel
>> through it, as there isn't a BIOS nor ACPI. AFAIK, we can't do that with
>> PVH.
>
> Actually, we specifically updated the PVH interface to add just that
> (). Please take a look at start_info.h in the Qemu source.
>
> And here's the patch where the canonical definition for the HVM direct
> boot ABI was updated:
> https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg00053.html
> You can see we added fields for memmap_paddr/memmap_entries to the
> struct. I think the community feedback was to avoid labeling it "e820"
> but, in fact, that's exactly what it is -- and as you can see, the
> types were tied to the memmap types specified in the ACPI spec.

Nice! I'll give this a try next week. Thanks!
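
As a rough sketch of what passing the memory map through PVH could look like: the structures below are local copies written from memory of the HVM start_info ABI, so the field layout, the magic value and the RAM type constant should be treated as assumptions and checked against start_info.h. The function name `pvh_fill_memmap` is hypothetical. The RAM split reuses the 0xe0000000 and 0x100000000 boundaries from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values, recalled from the HVM start_info ABI; verify before use. */
#define HVM_START_MAGIC_VALUE 0x336ec578u
#define HVM_MEMMAP_TYPE_RAM   1u

/* Layout constants taken from the microvm patch. */
#define MAX_BELOW_4G  0xe0000000ULL
#define ABOVE_4G_BASE 0x100000000ULL

struct hvm_memmap_table_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
    uint32_t reserved;
};

struct hvm_start_info {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
    uint32_t nr_modules;
    uint64_t modlist_paddr;
    uint64_t cmdline_paddr;
    uint64_t rsdp_paddr;
    uint64_t memmap_paddr;   /* present in version 1 and newer */
    uint32_t memmap_entries; /* present in version 1 and newer */
    uint32_t reserved;
};

/*
 * Describe `ram_size` bytes of guest RAM as one or two memmap entries and
 * point `si` (expected to be zero-initialized by the caller) at the table,
 * which the caller will place at guest physical address `table_paddr`.
 * Returns the number of entries written.
 */
uint32_t pvh_fill_memmap(struct hvm_start_info *si,
                         struct hvm_memmap_table_entry table[2],
                         uint64_t table_paddr, uint64_t ram_size)
{
    uint64_t below = ram_size > MAX_BELOW_4G ? MAX_BELOW_4G : ram_size;
    uint64_t above = ram_size - below;
    uint32_t n = 0;

    table[n++] = (struct hvm_memmap_table_entry){
        .addr = 0, .size = below, .type = HVM_MEMMAP_TYPE_RAM,
    };
    if (above > 0) {
        table[n++] = (struct hvm_memmap_table_entry){
            .addr = ABOVE_4G_BASE, .size = above, .type = HVM_MEMMAP_TYPE_RAM,
        };
    }

    si->magic = HVM_START_MAGIC_VALUE;
    si->version = 1;
    si->memmap_paddr = table_paddr;
    si->memmap_entries = n;
    return n;
}
```

With 5 GiB of RAM this yields two entries: 0xe0000000 bytes at address 0 and the remaining 0x60000000 bytes at 0x100000000.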

> And no BIOS nor Firmware was needed to boot via the PVH entry. When we
> first came to the community as an RFC with the PVH/KVM idea, we got
> feedback that folks would prefer to see the PVH setup stuff pushed
> into qboot or option rom so that's why it ended up that way (using a
> very minimal FW of your choice). But there is technically no reason
> why it can't be done directly in Qemu just like you have done all the
> zeropage init stuff in these patches. That would have been our first
> preference anyway.
>
> Thanks,
> -Maran
>
>> I'm inclined to keep it this way, and once there's an interest to use
>> the microvm machine type with a different kernel, try to find some
>> common ground.
>>
>>> Thanks,
>>> -Maran
>>>
>>> On 6/28/2019 4:53 AM, Sergio Lopez wrote:
>>>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>>>> constructed after the machine model implemented by the latter.
>>>>
>>>> Its main purpose is providing users a KVM-only machine type with fast
>>>> boot times, minimal attack surface (measured as the number of IO ports
>>>> and MMIO regions exposed to the Guest) and small footprint (especially
>>>> when combined with the ongoing QEMU modularization effort).
>>>>
>>>> Normally, other than the device support provided by KVM itself,
>>>> microvm only supports virtio-mmio devices. Microvm also includes a
>>>> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>>>> for being able to see the early boot kernel messages.
>>>>
>>>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>>>> ---
>>>>    default-configs/i386-softmmu.mak |   1 +
>>>>    hw/i386/Kconfig                  |   4 +
>>>>    hw/i386/Makefile.objs            |   1 +
>>>>    hw/i386/microvm.c                | 518 +++++++++++++++++++++++++++++++
>>>>    include/hw/i386/microvm.h        |  85 +++++
>>>>    5 files changed, 609 insertions(+)
>>>>    create mode 100644 hw/i386/microvm.c
>>>>    create mode 100644 include/hw/i386/microvm.h
>>>>
>>>> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
>>>> index cd5ea391e8..338f07420f 100644
>>>> --- a/default-configs/i386-softmmu.mak
>>>> +++ b/default-configs/i386-softmmu.mak
>>>> @@ -26,3 +26,4 @@ CONFIG_ISAPC=y
>>>>    CONFIG_I440FX=y
>>>>    CONFIG_Q35=y
>>>>    CONFIG_ACPI_PCI=y
>>>> +CONFIG_MICROVM=y
>>>> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
>>>> index 9817888216..94c565d8db 100644
>>>> --- a/hw/i386/Kconfig
>>>> +++ b/hw/i386/Kconfig
>>>> @@ -87,6 +87,10 @@ config Q35
>>>>        select VMMOUSE
>>>>        select FW_CFG_DMA
>>>> +config MICROVM
>>>> +    bool
>>>> +    select VIRTIO_MMIO
>>>> +
>>>>    config VTD
>>>>        bool
>>>> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
>>>> index 102f2b35fc..149bdd0784 100644
>>>> --- a/hw/i386/Makefile.objs
>>>> +++ b/hw/i386/Makefile.objs
>>>> @@ -4,6 +4,7 @@ obj-y += cpu.o
>>>>    obj-y += pc.o
>>>>    obj-$(CONFIG_I440FX) += pc_piix.o
>>>>    obj-$(CONFIG_Q35) += pc_q35.o
>>>> +obj-$(CONFIG_MICROVM) += mptable.o microvm.o
>>>>    obj-y += fw_cfg.o pc_sysfw.o
>>>>    obj-y += x86-iommu.o
>>>>    obj-$(CONFIG_VTD) += intel_iommu.o
>>>> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
>>>> new file mode 100644
>>>> index 0000000000..fff88c3697
>>>> --- /dev/null
>>>> +++ b/hw/i386/microvm.c
>>>> @@ -0,0 +1,518 @@
>>>> +/*
>>>> + *
>>>> + * Copyright (c) 2018 Intel Corporation
>>>> + * Copyright (c) 2019 Red Hat, Inc.
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify it
>>>> + * under the terms and conditions of the GNU General Public License,
>>>> + * version 2 or later, as published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>>> + * more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License along with
>>>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>>>> + */
>>>> +
>>>> +#include "qemu/osdep.h"
>>>> +#include "qemu/error-report.h"
>>>> +#include "qapi/error.h"
>>>> +#include "qapi/visitor.h"
>>>> +#include "sysemu/sysemu.h"
>>>> +#include "sysemu/cpus.h"
>>>> +#include "sysemu/numa.h"
>>>> +
>>>> +#include "hw/loader.h"
>>>> +#include "hw/nmi.h"
>>>> +#include "hw/kvm/clock.h"
>>>> +#include "hw/i386/microvm.h"
>>>> +#include "hw/i386/pc.h"
>>>> +#include "hw/i386/cpu-internal.h"
>>>> +#include "target/i386/cpu.h"
>>>> +#include "hw/timer/i8254.h"
>>>> +#include "hw/char/serial.h"
>>>> +#include "hw/i386/topology.h"
>>>> +#include "hw/virtio/virtio-mmio.h"
>>>> +#include "hw/i386/mptable.h"
>>>> +
>>>> +#include "cpu.h"
>>>> +#include "elf.h"
>>>> +#include "kvm_i386.h"
>>>> +#include <asm/bootparam.h>
>>>> +
>>>> +#define DEFINE_MICROVM_MACHINE_LATEST(major, minor, latest) \
>>>> +    static void microvm_##major##_##minor##_object_class_init(ObjectClass *oc, \
>>>> +                                                              void *data) \
>>>> +    { \
>>>> +        MachineClass *mc = MACHINE_CLASS(oc); \
>>>> +        microvm_##major##_##minor##_machine_class_init(mc); \
>>>> +        mc->desc = "Microvm (i386)"; \
>>>> +        if (latest) { \
>>>> +            mc->alias = "microvm"; \
>>>> +        } \
>>>> +    } \
>>>> +    static const TypeInfo microvm_##major##_##minor##_info = { \
>>>> +        .name = MACHINE_TYPE_NAME("microvm-" # major "." # minor), \
>>>> +        .parent = TYPE_MICROVM_MACHINE, \
>>>> +        .instance_init = microvm_##major##_##minor##_instance_init, \
>>>> +        .class_init = microvm_##major##_##minor##_object_class_init, \
>>>> +    }; \
>>>> +    static void microvm_##major##_##minor##_init(void) \
>>>> +    { \
>>>> +        type_register_static(&microvm_##major##_##minor##_info); \
>>>> +    } \
>>>> +    type_init(microvm_##major##_##minor##_init);
>>>> +
>>>> +#define DEFINE_MICROVM_MACHINE_AS_LATEST(major, minor) \
>>>> +    DEFINE_MICROVM_MACHINE_LATEST(major, minor, true)
>>>> +#define DEFINE_MICROVM_MACHINE(major, minor) \
>>>> +    DEFINE_MICROVM_MACHINE_LATEST(major, minor, false)
>>>> +
>>>> +static void microvm_gsi_handler(void *opaque, int n, int level)
>>>> +{
>>>> +    qemu_irq *ioapic_irq = opaque;
>>>> +
>>>> +    qemu_set_irq(ioapic_irq[n], level);
>>>> +}
>>>> +
>>>> +static void microvm_legacy_init(MicrovmMachineState *mms)
>>>> +{
>>>> +    ISABus *isa_bus;
>>>> +    GSIState *gsi_state;
>>>> +    qemu_irq *i8259;
>>>> +    int i;
>>>> +
>>>> +    assert(kvm_irqchip_in_kernel());
>>>> +    gsi_state = g_malloc0(sizeof(*gsi_state));
>>>> +    mms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
>>>> +
>>>> +    isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
>>>> +                          &error_abort);
>>>> +    isa_bus_irqs(isa_bus, mms->gsi);
>>>> +
>>>> +    assert(kvm_pic_in_kernel());
>>>> +    i8259 = kvm_i8259_init(isa_bus);
>>>> +
>>>> +    for (i = 0; i < ISA_NUM_IRQS; i++) {
>>>> +        gsi_state->i8259_irq[i] = i8259[i];
>>>> +    }
>>>> +
>>>> +    kvm_pit_init(isa_bus, 0x40);
>>>> +
>>>> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
>>>> +        int nirq = VIRTIO_IRQ_BASE + i;
>>>> +        ISADevice *isadev = isa_create(isa_bus, TYPE_ISA_SERIAL);
>>>> +        qemu_irq mmio_irq;
>>>> +
>>>> +        isa_init_irq(isadev, &mmio_irq, nirq);
>>>> +        sysbus_create_simple("virtio-mmio",
>>>> +                             VIRTIO_MMIO_BASE + i * 512,
>>>> +                             mms->gsi[VIRTIO_IRQ_BASE + i]);
>>>> +    }
>>>> +
>>>> +    g_free(i8259);
>>>> +
>>>> +    serial_hds_isa_init(isa_bus, 0, 1);
>>>> +}
>>>> +
>>>> +static void microvm_ioapic_init(MicrovmMachineState *mms)
>>>> +{
>>>> +    qemu_irq *ioapic_irq;
>>>> +    DeviceState *ioapic_dev;
>>>> +    SysBusDevice *d;
>>>> +    int i;
>>>> +
>>>> +    assert(kvm_irqchip_in_kernel());
>>>> +    ioapic_irq = g_new0(qemu_irq, IOAPIC_NUM_PINS);
>>>> +    kvm_pc_setup_irq_routing(true);
>>>> +
>>>> +    assert(kvm_ioapic_in_kernel());
>>>> +    ioapic_dev = qdev_create(NULL, "kvm-ioapic");
>>>> +
>>>> +    object_property_add_child(qdev_get_machine(), "ioapic", OBJECT(ioapic_dev), NULL);
>>>> +
>>>> +    qdev_init_nofail(ioapic_dev);
>>>> +    d = SYS_BUS_DEVICE(ioapic_dev);
>>>> +    sysbus_mmio_map(d, 0, IO_APIC_DEFAULT_ADDRESS);
>>>> +
>>>> +    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
>>>> +        ioapic_irq[i] = qdev_get_gpio_in(ioapic_dev, i);
>>>> +    }
>>>> +
>>>> +    mms->gsi = qemu_allocate_irqs(microvm_gsi_handler, ioapic_irq, IOAPIC_NUM_PINS);
>>>> +
>>>> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
>>>> +        sysbus_create_simple("virtio-mmio",
>>>> +                             VIRTIO_MMIO_BASE + i * 512,
>>>> +                             mms->gsi[VIRTIO_IRQ_BASE + i]);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void microvm_memory_init(MicrovmMachineState *mms)
>>>> +{
>>>> +    MachineState *machine = MACHINE(mms);
>>>> +    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
>>>> +    MemoryRegion *system_memory = get_system_memory();
>>>> +
>>>> +    if (machine->ram_size > MICROVM_MAX_BELOW_4G) {
>>>> +        mms->above_4g_mem_size = machine->ram_size - MICROVM_MAX_BELOW_4G;
>>>> +        mms->below_4g_mem_size = MICROVM_MAX_BELOW_4G;
>>>> +    } else {
>>>> +        mms->above_4g_mem_size = 0;
>>>> +        mms->below_4g_mem_size = machine->ram_size;
>>>> +    }
>>>> +
>>>> +    ram = g_malloc(sizeof(*ram));
>>>> +    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
>>>> +                                         machine->ram_size);
>>>> +
>>>> +    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
>>>> +    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
>>>> +                             0, mms->below_4g_mem_size);
>>>> +    memory_region_add_subregion(system_memory, 0, ram_below_4g);
>>>> +
>>>> +    e820_add_entry(0, mms->below_4g_mem_size, E820_RAM);
>>>> +
>>>> +    if (mms->above_4g_mem_size > 0) {
>>>> +        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
>>>> +        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
>>>> +                                 mms->below_4g_mem_size,
>>>> +                                 mms->above_4g_mem_size);
>>>> +        memory_region_add_subregion(system_memory, 0x100000000ULL,
>>>> +                                    ram_above_4g);
>>>> +        e820_add_entry(0x100000000ULL, mms->above_4g_mem_size, E820_RAM);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void microvm_machine_state_init(MachineState *machine)
>>>> +{
>>>> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>>>> +    uint64_t elf_entry;
>>>> +    int kernel_size;
>>>> +
>>>> +    if (machine->kernel_filename == NULL) {
>>>> +        error_report("missing kernel image file name, required by microvm");
>>>> +        exit(1);
>>>> +    }
>>>> +
>>>> +    microvm_memory_init(mms);
>>>> +    if (mms->legacy) {
>>>> +        microvm_legacy_init(mms);
>>>> +    } else {
>>>> +        microvm_ioapic_init(mms);
>>>> +    }
>>>> +
>>>> +    mms->apic_id_limit = cpus_init(machine, false);
>>>> +
>>>> +    kvmclock_create();
>>>> +
>>>> +    kernel_size = load_elf(machine->kernel_filename, NULL,
>>>> +                           NULL, NULL, &elf_entry,
>>>> +                           NULL, NULL, 0, I386_ELF_MACHINE,
>>>> +                           0, 0);
>>>> +
>>>> +    if (kernel_size < 0) {
>>>> +        error_report("Error while loading elf kernel");
>>>> +        exit(1);
>>>> +    }
>>>> +
>>>> +    mms->elf_entry = elf_entry;
>>>> +}
>>>> +
>>>> +static gchar *microvm_get_virtio_mmio_cmdline(gchar *name)
>>>> +{
>>>> +    gchar *cmdline;
>>>> +    gchar *separator;
>>>> +    unsigned long index;
>>>> +    int ret;
>>>> +
>>>> +    separator = g_strrstr(name, ".");
>>>> +    if (!separator) {
>>>> +        return NULL;
>>>> +    }
>>>> +
>>>> +    index = strtol(separator + 1, NULL, 10);
>>>> +    if (index == LONG_MIN || index == LONG_MAX) {
>>>> +        return NULL;
>>>> +    }
>>>> +
>>>> +    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
>>>> +    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
>>>> +                     " virtio_mmio.device=512@0x%lx:%ld",
>>>> +                     VIRTIO_MMIO_BASE + index * 512,
>>>> +                     VIRTIO_IRQ_BASE + index);
>>>> +    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
>>>> +        g_free(cmdline);
>>>> +        return NULL;
>>>> +    }
>>>> +
>>>> +    return cmdline;
>>>> +}
>>>> +
>>>> +static void microvm_setup_bootparams(MicrovmMachineState *mms, const gchar *kernel_cmdline)
>>>> +{
>>>> +    struct boot_params params;
>>>> +    BusState *bus;
>>>> +    BusChild *kid;
>>>> +    gchar *cmdline;
>>>> +    int cmdline_len;
>>>> +    int i;
>>>> +
>>>> +    cmdline = g_strdup(kernel_cmdline);
>>>> +
>>>> +    /*
>>>> +     * Find MMIO transports with attached devices, and add them to the kernel
>>>> +     * command line.
>>>> +     */
>>>> +    bus = sysbus_get_default();
>>>> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
>>>> +        DeviceState *dev = kid->child;
>>>> +        ObjectClass *class = object_get_class(OBJECT(dev));
>>>> +
>>>> +        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
>>>> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
>>>> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
>>>> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
>>>> +
>>>> +            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
>>>> +                gchar *mmio_cmdline = microvm_get_virtio_mmio_cmdline(mmio_bus->name);
>>>> +                if (mmio_cmdline) {
>>>> +                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
>>>> +                    g_free(mmio_cmdline);
>>>> +                    g_free(cmdline);
>>>> +                    cmdline = newcmd;
>>>> +                }
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +
>>>> +    cmdline_len = strlen(cmdline);
>>>> +
>>>> +    address_space_write(&address_space_memory,
>>>> +                        KERNEL_CMDLINE_START, MEMTXATTRS_UNSPECIFIED,
>>>> +                        (uint8_t *) cmdline, cmdline_len);
>>>> +
>>>> +    g_free(cmdline);
>>>> +
>>>> +    memset(&params, 0, sizeof(struct boot_params));
>>>> +
>>>> +    params.hdr.type_of_loader = KERNEL_LOADER_OTHER;
>>>> +    params.hdr.boot_flag = KERNEL_BOOT_FLAG_MAGIC;
>>>> +    params.hdr.header = KERNEL_HDR_MAGIC;
>>>> +    params.hdr.cmd_line_ptr = KERNEL_CMDLINE_START;
>>>> +    params.hdr.cmdline_size = cmdline_len;
>>>> +    params.hdr.kernel_alignment = KERNEL_MIN_ALIGNMENT_BYTES;
>>>> +
>>>> +    params.e820_entries = e820_get_num_entries();
>>>> +    for (i = 0; i < params.e820_entries; i++) {
>>>> +        uint64_t address, length;
>>>> +        if (e820_get_entry(i, E820_RAM, &address, &length)) {
>>>> +            params.e820_table[i].addr = address;
>>>> +            params.e820_table[i].size = length;
>>>> +            params.e820_table[i].type = E820_RAM;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    address_space_write(&address_space_memory,
>>>> +                        ZERO_PAGE_START, MEMTXATTRS_UNSPECIFIED,
>>>> +                        (uint8_t *) &params, sizeof(struct boot_params));
>>>> +}
>>>> +
>>>> +static void microvm_init_page_tables(void)
>>>> +{
>>>> +    uint64_t val = 0;
>>>> +    int i;
>>>> +
>>>> +    val = PDPTE_START | 0x03;
>>>> +    address_space_write(&address_space_memory,
>>>> +                        PML4_START, MEMTXATTRS_UNSPECIFIED,
>>>> +                        (uint8_t *) &val, 8);
>>>> +    val = PDE_START | 0x03;
>>>> +    address_space_write(&address_space_memory,
>>>> +                        PDPTE_START, MEMTXATTRS_UNSPECIFIED,
>>>> +                        (uint8_t *) &val, 8);
>>>> +
>>>> +    for (i = 0; i < 512; i++) {
>>>> +        val = (i << 21) + 0x83;
>>>> +        address_space_write(&address_space_memory,
>>>> +                            PDE_START + (i * 8), MEMTXATTRS_UNSPECIFIED,
>>>> +                            (uint8_t *) &val, 8);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void microvm_cpu_reset(CPUState *cs, uint64_t elf_entry)
>>>> +{
>>>> +    X86CPU *cpu = X86_CPU(cs);
>>>> +    CPUX86State *env = &cpu->env;
>>>> +    struct SegmentCache seg_code =
>>>> +        { .selector = 0x8, .base = 0x0, .limit = 0xfffff, .flags = 0xa09b00 };
>>>> +    struct SegmentCache seg_data =
>>>> +        { .selector = 0x10, .base = 0x0, .limit = 0xfffff, .flags = 0xc09300 };
>>>> +    struct SegmentCache seg_tr =
>>>> +        { .selector = 0x18, .base = 0x0, .limit = 0xfffff, .flags = 0x808b00 };
>>>> +
>>>> +    kvm_arch_get_registers(cs);
>>>> +
>>>> +    memcpy(&env->segs[R_CS], &seg_code, sizeof(struct SegmentCache));
>>>> +    memcpy(&env->segs[R_DS], &seg_data, sizeof(struct SegmentCache));
>>>> +    memcpy(&env->segs[R_ES], &seg_data, sizeof(struct SegmentCache));
>>>> +    memcpy(&env->segs[R_FS], &seg_data, sizeof(struct SegmentCache));
>>>> +    memcpy(&env->segs[R_GS], &seg_data, sizeof(struct SegmentCache));
>>>> +    memcpy(&env->segs[R_SS], &seg_data, sizeof(struct SegmentCache));
>>>> +    memcpy(&env->tr, &seg_tr, sizeof(struct SegmentCache));
>>>> +
>>>> +    env->efer |= MSR_EFER_LME | MSR_EFER_LMA;
>>>> +    env->regs[R_ESP] = BOOT_STACK_POINTER;
>>>> +    env->regs[R_EBP] = BOOT_STACK_POINTER;
>>>> +    env->regs[R_ESI] = ZERO_PAGE_START;
>>>> +
>>>> +    cpu_set_pc(cs, elf_entry);
>>>> +    cpu_x86_update_cr3(env, PML4_START);
>>>> +    cpu_x86_update_cr4(env, env->cr[4] | CR4_PAE_MASK);
>>>> +    cpu_x86_update_cr0(env, env->cr[0] | CR0_PE_MASK | CR0_PG_MASK);
>>>> +    x86_update_hflags(env);
>>>> +
>>>> +    kvm_arch_put_registers(cs, KVM_PUT_RESET_STATE);
>>>> +}
>>>> +
>>>> +static void microvm_mptable_setup(MicrovmMachineState *mms)
>>>> +{
>>>> +    char *mptable;
>>>> +    int size;
>>>> +
>>>> +    mptable = mptable_generate(smp_cpus, mms->apic_id_limit,
>>>> +                               EBDA_START, &size);
>>>> +    address_space_write(&address_space_memory,
>>>> +                        EBDA_START, MEMTXATTRS_UNSPECIFIED,
>>>> +                        (uint8_t *) mptable, size);
>>>> +    g_free(mptable);
>>>> +}
>>>> +
>>>> +static bool microvm_machine_get_legacy(Object *obj, Error **errp)
>>>> +{
>>>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>>>> +
>>>> +    return mms->legacy;
>>>> +}
>>>> +
>>>> +static void microvm_machine_set_legacy(Object *obj, bool value, Error **errp)
>>>> +{
>>>> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
>>>> +
>>>> +    mms->legacy = value;
>>>> +}
>>>> +
>>>> +static void microvm_machine_reset(void)
>>>> +{
>>>> +    MachineState *machine = MACHINE(qdev_get_machine());
>>>> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>>>> +    CPUState *cs;
>>>> +    X86CPU *cpu;
>>>> +
>>>> +    qemu_devices_reset();
>>>> +
>>>> +    microvm_mptable_setup(mms);
>>>> +    microvm_setup_bootparams(mms, machine->kernel_cmdline);
>>>> +    microvm_init_page_tables();
>>>> +
>>>> +    CPU_FOREACH(cs) {
>>>> +        cpu = X86_CPU(cs);
>>>> +
>>>> +        /* Reset APIC after devices have been reset to cancel
>>>> +         * any changes that qemu_devices_reset() might have done.
>>>> +         */
>>>> +        if (cpu->apic_state) {
>>>> +            device_reset(cpu->apic_state);
>>>> +        }
>>>> +
>>>> +        microvm_cpu_reset(cs, mms->elf_entry);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
>>>> +{
>>>> +    CPUState *cs;
>>>> +
>>>> +    CPU_FOREACH(cs) {
>>>> +        X86CPU *cpu = X86_CPU(cs);
>>>> +
>>>> +        if (!cpu->apic_state) {
>>>> +            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
>>>> +        } else {
>>>> +            apic_deliver_nmi(cpu->apic_state);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +static void microvm_machine_instance_init(Object *obj)
>>>> +{
>>>> +}
>>>> +
>>>> +static void microvm_class_init(ObjectClass *oc, void *data)
>>>> +{
>>>> +    NMIClass *nc = NMI_CLASS(oc);
>>>> +
>>>> +    /* NMI handler */
>>>> +    nc->nmi_monitor_handler = x86_nmi;
>>>> +
>>>> +    object_class_property_add_bool(oc, MICROVM_MACHINE_LEGACY,
>>>> +                                   microvm_machine_get_legacy,
>>>> +                                   microvm_machine_set_legacy,
>>>> +                                   &error_abort);
>>>> +}
>>>> +
>>>> +static const TypeInfo microvm_machine_info = {
>>>> +    .name          = TYPE_MICROVM_MACHINE,
>>>> +    .parent        = TYPE_MACHINE,
>>>> +    .abstract      = true,
>>>> +    .instance_size = sizeof(MicrovmMachineState),
>>>> +    .instance_init = microvm_machine_instance_init,
>>>> +    .class_size    = sizeof(MicrovmMachineClass),
>>>> +    .class_init    = microvm_class_init,
>>>> +    .interfaces = (InterfaceInfo[]) {
>>>> +         { TYPE_NMI },
>>>> +         { }
>>>> +    },
>>>> +};
>>>> +
>>>> +static void microvm_machine_init(void)
>>>> +{
>>>> +    type_register_static(&microvm_machine_info);
>>>> +}
>>>> +type_init(microvm_machine_init);
>>>> +
>>>> +static void microvm_1_0_instance_init(Object *obj)
>>>> +{
>>>> +}
>>>> +
>>>> +static void microvm_machine_class_init(MachineClass *mc)
>>>> +{
>>>> +    mc->init = microvm_machine_state_init;
>>>> +
>>>> +    mc->family = "microvm_i386";
>>>> +    mc->desc = "Microvm (i386)";
>>>> +    mc->units_per_default_bus = 1;
>>>> +    mc->no_floppy = 1;
>>>> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
>>>> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
>>>> +    mc->max_cpus = 288;
>>>> +    mc->has_hotpluggable_cpus = false;
>>>> +    mc->auto_enable_numa_with_memhp = false;
>>>> +    mc->default_cpu_type = X86_CPU_TYPE_NAME ("host");
>>>> +    mc->nvdimm_supported = false;
>>>> +    mc->default_machine_opts = "accel=kvm";
>>>> +
>>>> +    /* Machine class handlers */
>>>> +    mc->cpu_index_to_instance_props = cpu_index_to_props;
>>>> +    mc->get_default_cpu_node_id = cpu_get_default_cpu_node_id;
>>>> +    mc->possible_cpu_arch_ids = cpu_possible_cpu_arch_ids;
>>>> +    mc->reset = microvm_machine_reset;
>>>> +}
>>>> +
>>>> +static void microvm_1_0_machine_class_init(MachineClass *mc)
>>>> +{
>>>> +    microvm_machine_class_init(mc);
>>>> +}
>>>> +DEFINE_MICROVM_MACHINE_AS_LATEST(1, 0)
>>>> diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
>>>> new file mode 100644
>>>> index 0000000000..544ef60563
>>>> --- /dev/null
>>>> +++ b/include/hw/i386/microvm.h
>>>> @@ -0,0 +1,85 @@
>>>> +/*
>>>> + *
>>>> + * Copyright (c) 2018 Intel Corporation
>>>> + * Copyright (c) 2019 Red Hat, Inc.
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify it
>>>> + * under the terms and conditions of the GNU General Public License,
>>>> + * version 2 or later, as published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>>> + * more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License along with
>>>> + * this program.  If not, see <http://www.gnu.org/licenses/>.
>>>> + */
>>>> +
>>>> +#ifndef HW_I386_MICROVM_H
>>>> +#define HW_I386_MICROVM_H
>>>> +
>>>> +#include "qemu-common.h"
>>>> +#include "exec/hwaddr.h"
>>>> +#include "qemu/notify.h"
>>>> +
>>>> +#include "hw/boards.h"
>>>> +
>>>> +/* Microvm memory layout */
>>>> +#define ZERO_PAGE_START      0x7000
>>>> +#define BOOT_STACK_POINTER   0x8ff0
>>>> +#define PML4_START           0x9000
>>>> +#define PDPTE_START          0xa000
>>>> +#define PDE_START            0xb000
>>>> +#define EBDA_START           0x9fc00
>>>> +#define HIMEM_START          0x100000
>>>> +#define MICROVM_MAX_BELOW_4G 0xe0000000
>>>> +
>>>> +/* Bootparams related definitions */
>>>> +#define KERNEL_BOOT_FLAG_MAGIC     0xaa55
>>>> +#define KERNEL_HDR_MAGIC           0x53726448
>>>> +#define KERNEL_LOADER_OTHER        0xff
>>>> +#define KERNEL_MIN_ALIGNMENT_BYTES 0x01000000
>>>> +#define KERNEL_CMDLINE_START       0x20000
>>>> +#define KERNEL_CMDLINE_MAX_SIZE    0x10000
>>>> +
>>>> +/* Platform virtio definitions */
>>>> +#define VIRTIO_MMIO_BASE      0xd0000000
>>>> +#define VIRTIO_IRQ_BASE       5
>>>> +#define VIRTIO_NUM_TRANSPORTS 8
>>>> +#define VIRTIO_CMDLINE_MAXLEN 64
>>>> +
>>>> +/* Machine type options */
>>>> +#define MICROVM_MACHINE_LEGACY "legacy"
>>>> +
>>>> +typedef struct {
>>>> +    MachineClass parent;
>>>> +    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
>>>> +                                           DeviceState *dev);
>>>> +} MicrovmMachineClass;
>>>> +
>>>> +typedef struct {
>>>> +    MachineState parent;
>>>> +    unsigned apic_id_limit;
>>>> +    qemu_irq *gsi;
>>>> +
>>>> +    /* RAM size */
>>>> +    ram_addr_t below_4g_mem_size;
>>>> +    ram_addr_t above_4g_mem_size;
>>>> +
>>>> +    /* Kernel ELF entry. On reset, vCPUs RIP will be set to this */
>>>> +    uint64_t elf_entry;
>>>> +
>>>> +    /* Legacy mode based on an ISA bus. Useful for debugging */
>>>> +    bool legacy;
>>>> +} MicrovmMachineState;
>>>> +
>>>> +#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
>>>> +#define MICROVM_MACHINE(obj) \
>>>> +    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
>>>> +#define MICROVM_MACHINE_GET_CLASS(obj) \
>>>> +    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
>>>> +#define MICROVM_MACHINE_CLASS(class) \
>>>> +    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
>>>> +
>>>> +#endif
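
For reference, the way the quoted patch advertises each populated virtio-mmio transport to the guest (a `virtio_mmio.device=<size>@<base>:<irq>` kernel parameter) can be sketched in isolation as below. The constants are the ones from the quoted header; the function name `format_virtio_mmio_param` and the explicit bounds check on the index are additions made for this sketch and are not in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Values from the quoted include/hw/i386/microvm.h. */
#define VIRTIO_MMIO_BASE      0xd0000000ULL
#define VIRTIO_IRQ_BASE       5
#define VIRTIO_NUM_TRANSPORTS 8
#define VIRTIO_MMIO_SIZE      512

/*
 * Hypothetical helper: write the kernel command-line fragment for transport
 * number `index` into `buf`. Returns the fragment length, or -1 if the index
 * is out of range or the buffer is too small.
 */
int format_virtio_mmio_param(unsigned index, char *buf, size_t len)
{
    uint64_t base;
    int ret;

    if (index >= VIRTIO_NUM_TRANSPORTS) {
        return -1;
    }

    /* Transports are laid out back to back, 512 bytes apart. */
    base = VIRTIO_MMIO_BASE + (uint64_t)index * VIRTIO_MMIO_SIZE;
    ret = snprintf(buf, len, " virtio_mmio.device=%d@0x%llx:%u",
                   VIRTIO_MMIO_SIZE, (unsigned long long)base,
                   VIRTIO_IRQ_BASE + index);
    if (ret < 0 || (size_t)ret >= len) {
        return -1;
    }
    return ret;
}
```

Transport 3, for example, produces " virtio_mmio.device=512@0xd0000600:8".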




* Re: [Qemu-devel] [PATCH 2/4] hw/virtio: Factorize virtio-mmio headers
  2019-06-28 20:50     ` Sergio Lopez
@ 2019-06-30 21:36       ` Michael S. Tsirkin
  2019-07-02  8:19         ` Gerd Hoffmann
  0 siblings, 1 reply; 29+ messages in thread
From: Michael S. Tsirkin @ 2019-06-30 21:36 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: qemu-devel, pbonzini, ehabkost, rth

On Fri, Jun 28, 2019 at 10:50:47PM +0200, Sergio Lopez wrote:
> 
> Michael S. Tsirkin <mst@redhat.com> writes:
> 
> > On Fri, Jun 28, 2019 at 01:53:47PM +0200, Sergio Lopez wrote:
> >> Put QOM and main struct definition in a separate header file, so it
> >> can be accessed from other components.
> >> 
> >> This is needed for the microvm machine type implementation.
> >> 
> >> Signed-off-by: Sergio Lopez <slp@redhat.com>
> >
> > If you are going to productise virtio-mmio, then 1.0 support is a must.
> > I am not sure we want a new machine with 0.X mmio devices.
> > Especially considering that virtio-mmio does not have support for
> > transitional devices.
> 
> What are the practical implications of that?

On the plus side, this means we don't need to maintain a bunch of hacks
for old guests with quirky drivers.

On the minus side, this requires Linux guests 3.19 and up.
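
To make the 0.X versus 1.0 distinction concrete: a driver tells the two apart by reading the transport's magic and version registers. The sketch below assumes the register semantics of the virtio specification (version 1 is the legacy interface, version 2 is virtio 1.0); `VIRT_MAGIC` is the value from the patch, while the enum and function names are hypothetical.

```c
#include <stdint.h>

#define VIRT_MAGIC 0x74726976u /* 'virt', as defined in the patch */

enum virtio_mmio_kind {
    VIRTIO_MMIO_INVALID, /* wrong magic or unknown version */
    VIRTIO_MMIO_LEGACY,  /* version 1: pre-1.0 interface */
    VIRTIO_MMIO_MODERN,  /* version 2: virtio 1.0 interface */
};

/* Classify a transport from the values read from its first two registers. */
enum virtio_mmio_kind classify_virtio_mmio(uint32_t magic, uint32_t version)
{
    if (magic != VIRT_MAGIC) {
        return VIRTIO_MMIO_INVALID;
    }
    switch (version) {
    case 1:
        return VIRTIO_MMIO_LEGACY;
    case 2:
        return VIRTIO_MMIO_MODERN;
    default:
        return VIRTIO_MMIO_INVALID;
    }
}
```

Since there are no transitional virtio-mmio devices, a given transport reports exactly one of the two versions, and the guest driver has to support that one.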




> >> ---
> >>  hw/virtio/virtio-mmio.c | 35 +-----------------------
> >>  hw/virtio/virtio-mmio.h | 60 +++++++++++++++++++++++++++++++++++++++++
> >>  2 files changed, 61 insertions(+), 34 deletions(-)
> >>  create mode 100644 hw/virtio/virtio-mmio.h
> >> 
> >> diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
> >> index 97b7f35496..87c7fe4d8d 100644
> >> --- a/hw/virtio/virtio-mmio.c
> >> +++ b/hw/virtio/virtio-mmio.c
> >> @@ -26,44 +26,11 @@
> >>  #include "qemu/host-utils.h"
> >>  #include "qemu/module.h"
> >>  #include "sysemu/kvm.h"
> >> -#include "hw/virtio/virtio-bus.h"
> >> +#include "virtio-mmio.h"
> >>  #include "qemu/error-report.h"
> >>  #include "qemu/log.h"
> >>  #include "trace.h"
> >>  
> >> -/* QOM macros */
> >> -/* virtio-mmio-bus */
> >> -#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
> >> -#define VIRTIO_MMIO_BUS(obj) \
> >> -        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
> >> -#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
> >> -        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
> >> -#define VIRTIO_MMIO_BUS_CLASS(klass) \
> >> -        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
> >> -
> >> -/* virtio-mmio */
> >> -#define TYPE_VIRTIO_MMIO "virtio-mmio"
> >> -#define VIRTIO_MMIO(obj) \
> >> -        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
> >> -
> >> -#define VIRT_MAGIC 0x74726976 /* 'virt' */
> >> -#define VIRT_VERSION 1
> >> -#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
> >> -
> >> -typedef struct {
> >> -    /* Generic */
> >> -    SysBusDevice parent_obj;
> >> -    MemoryRegion iomem;
> >> -    qemu_irq irq;
> >> -    /* Guest accessible state needing migration and reset */
> >> -    uint32_t host_features_sel;
> >> -    uint32_t guest_features_sel;
> >> -    uint32_t guest_page_shift;
> >> -    /* virtio-bus */
> >> -    VirtioBusState bus;
> >> -    bool format_transport_address;
> >> -} VirtIOMMIOProxy;
> >> -
> >>  static bool virtio_mmio_ioeventfd_enabled(DeviceState *d)
> >>  {
> >>      return kvm_eventfds_enabled();
> >> diff --git a/hw/virtio/virtio-mmio.h b/hw/virtio/virtio-mmio.h
> >> new file mode 100644
> >> index 0000000000..2f3973f8c7
> >> --- /dev/null
> >> +++ b/hw/virtio/virtio-mmio.h
> >> @@ -0,0 +1,60 @@
> >> +/*
> >> + * Virtio MMIO bindings
> >> + *
> >> + * Copyright (c) 2011 Linaro Limited
> >> + *
> >> + * Author:
> >> + *  Peter Maydell <peter.maydell@linaro.org>
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License; either version 2
> >> + * of the License, or (at your option) any later version.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License along
> >> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +
> >> +#ifndef QEMU_VIRTIO_MMIO_H
> >> +#define QEMU_VIRTIO_MMIO_H
> >> +
> >> +#include "hw/virtio/virtio-bus.h"
> >> +
> >> +/* QOM macros */
> >> +/* virtio-mmio-bus */
> >> +#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
> >> +#define VIRTIO_MMIO_BUS(obj) \
> >> +        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
> >> +#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
> >> +        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
> >> +#define VIRTIO_MMIO_BUS_CLASS(klass) \
> >> +        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
> >> +
> >> +/* virtio-mmio */
> >> +#define TYPE_VIRTIO_MMIO "virtio-mmio"
> >> +#define VIRTIO_MMIO(obj) \
> >> +        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
> >> +
> >> +#define VIRT_MAGIC 0x74726976 /* 'virt' */
> >> +#define VIRT_VERSION 1
> >> +#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
> >> +
> >> +typedef struct {
> >> +    /* Generic */
> >> +    SysBusDevice parent_obj;
> >> +    MemoryRegion iomem;
> >> +    qemu_irq irq;
> >> +    /* Guest accessible state needing migration and reset */
> >> +    uint32_t host_features_sel;
> >> +    uint32_t guest_features_sel;
> >> +    uint32_t guest_page_shift;
> >> +    /* virtio-bus */
> >> +    VirtioBusState bus;
> >> +    bool format_transport_address;
> >> +} VirtIOMMIOProxy;
> >> +
> >> +#endif
> >> -- 
> >> 2.21.0
> 





* Re: [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type
  2019-06-28 22:17     ` Paolo Bonzini
@ 2019-06-30 21:37       ` Michael S. Tsirkin
  0 siblings, 0 replies; 29+ messages in thread
From: Michael S. Tsirkin @ 2019-06-30 21:37 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, ehabkost, Sergio Lopez, rth

On Sat, Jun 29, 2019 at 12:17:22AM +0200, Paolo Bonzini wrote:
> On 28/06/19 16:06, Michael S. Tsirkin wrote:
> >> +    assert(kvm_irqchip_in_kernel());
> > Hmm - irqchip in kernel actually increases the attack surface,
> > does it not? Or at least, the severity of the attacks.
> 
> Yeah, we should at least support split irqchip.  But, irqchip completely
> in userspace is slow when it is not broken, and it does not support
> APICv.  So it's not really feasible.
> 
> Paolo

Right, I meant split.
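
To make the distinction concrete: the three irqchip arrangements differ in which interrupt controllers are emulated inside the kernel, which is what the attack-surface argument is about. This is only a toy model under that assumption. The enum, struct and function names are hypothetical and are not QEMU or KVM API.

```c
#include <stdbool.h>

/* Hypothetical names for the three arrangements discussed above. */
enum irqchip_mode {
    IRQCHIP_OFF,   /* everything emulated in userspace */
    IRQCHIP_SPLIT, /* local APIC in kernel, PIC/IOAPIC in userspace */
    IRQCHIP_ON     /* local APIC, PIC and IOAPIC all in kernel */
};

/* Which controllers end up emulated in the kernel for a given mode. */
struct irqchip_placement {
    bool lapic_in_kernel;
    bool ioapic_pic_in_kernel;
};

struct irqchip_placement irqchip_placement_for(enum irqchip_mode mode)
{
    struct irqchip_placement p = { false, false };

    switch (mode) {
    case IRQCHIP_ON:
        p.lapic_in_kernel = true;
        p.ioapic_pic_in_kernel = true;
        break;
    case IRQCHIP_SPLIT:
        p.lapic_in_kernel = true;
        break;
    case IRQCHIP_OFF:
        break;
    }
    return p;
}
```

In this model, split mode keeps the local APIC in the kernel while moving the PIC and IOAPIC out of it, which is the middle ground being asked for here.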

-- 
MST



* Re: [Qemu-devel] [PATCH 1/4] hw/i386: Factorize CPU routine
  2019-06-28 21:44     ` Sergio Lopez
@ 2019-07-01  9:25       ` Michael S. Tsirkin
  0 siblings, 0 replies; 29+ messages in thread
From: Michael S. Tsirkin @ 2019-07-01  9:25 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: qemu-devel, pbonzini, Eduardo Habkost, rth

On Fri, Jun 28, 2019 at 11:44:07PM +0200, Sergio Lopez wrote:
> 
> Eduardo Habkost <ehabkost@redhat.com> writes:
> 
> > On Fri, Jun 28, 2019 at 01:53:46PM +0200, Sergio Lopez wrote:
> > [...]
> >>  /* Enables contiguous-apic-ID mode, for compatibility */
> >> -static bool compat_apic_id_mode;
> >> +bool compat_apic_id_mode;
> >
> > We can get rid of this global variable, see the patch I have just
> > sent:
> >
> >   [PATCH] pc: Move compat_apic_id_mode variable to PCMachineClass
> 
> Nice. I'll adapt the v2 of the patchset to assume this has been
> committed.
> 
> Thanks!
> Sergio.


Or include it in the series for completeness.



* Re: [Qemu-devel] [PATCH 2/4] hw/virtio: Factorize virtio-mmio headers
  2019-06-30 21:36       ` Michael S. Tsirkin
@ 2019-07-02  8:19         ` Gerd Hoffmann
  2019-07-02 13:22           ` Michael S. Tsirkin
  0 siblings, 1 reply; 29+ messages in thread
From: Gerd Hoffmann @ 2019-07-02  8:19 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: pbonzini, rth, qemu-devel, Sergio Lopez, ehabkost

> > > I am not sure we want a new machine with 0.X mmio devices.
> > > Especially considering that virtio-mmio does not have support for
> > > transitional devices.
> > 
> > What are the practical implications of that?
> 
> On the plus side, this means we don't need to maintain a bunch of hacks
> for old guests with quirky drivers.

Also note that some newer virtio devices require virtio 1.0.

cheers,
  Gerd




* Re: [Qemu-devel] [PATCH 2/4] hw/virtio: Factorize virtio-mmio headers
  2019-07-02  8:19         ` Gerd Hoffmann
@ 2019-07-02 13:22           ` Michael S. Tsirkin
  0 siblings, 0 replies; 29+ messages in thread
From: Michael S. Tsirkin @ 2019-07-02 13:22 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: pbonzini, rth, qemu-devel, Sergio Lopez, ehabkost

On Tue, Jul 02, 2019 at 10:19:10AM +0200, Gerd Hoffmann wrote:
> > > > I am not sure we want a new machine with 0.X mmio devices.
> > > > Especially considering that virtio-mmio does not have support for
> > > > transitional devices.
> > > 
> > > What are the practical implications of that?
> > 
> > On the plus side, this means we don't need to maintain a bunch of hacks
> > for old guests with quirky drivers.
> 
> Also note that some newer virtio devices require virtio 1.0.
> 
> cheers,
>   Gerd

Right. I forgot.



Thread overview: 29+ messages
2019-06-28 11:53 [Qemu-devel] [PATCH 0/4] Introduce the microvm machine type Sergio Lopez
2019-06-28 11:53 ` [Qemu-devel] [PATCH 1/4] hw/i386: Factorize CPU routine Sergio Lopez
2019-06-28 20:03   ` Eduardo Habkost
2019-06-28 21:44     ` Sergio Lopez
2019-07-01  9:25       ` Michael S. Tsirkin
2019-06-28 11:53 ` [Qemu-devel] [PATCH 2/4] hw/virtio: Factorize virtio-mmio headers Sergio Lopez
2019-06-28 14:03   ` Michael S. Tsirkin
2019-06-28 20:50     ` Sergio Lopez
2019-06-30 21:36       ` Michael S. Tsirkin
2019-07-02  8:19         ` Gerd Hoffmann
2019-07-02 13:22           ` Michael S. Tsirkin
2019-06-28 11:53 ` [Qemu-devel] [PATCH 3/4] hw/i386: Add an Intel MPTable generator Sergio Lopez
2019-06-28 11:53 ` [Qemu-devel] [PATCH 4/4] hw/i386: Introduce the microvm machine type Sergio Lopez
2019-06-28 14:06   ` Michael S. Tsirkin
2019-06-28 20:56     ` Sergio Lopez
2019-06-28 22:17     ` Paolo Bonzini
2019-06-30 21:37       ` Michael S. Tsirkin
2019-06-28 19:15   ` Maran Wilson
2019-06-28 21:05     ` Sergio Lopez
2019-06-28 21:54       ` Maran Wilson
2019-06-28 22:23         ` Sergio Lopez
2019-06-28 21:56       ` Paolo Bonzini
2019-06-28 19:47   ` Eduardo Habkost
2019-06-28 21:42     ` Sergio Lopez
2019-06-28 21:57       ` Paolo Bonzini
2019-06-28 13:21 ` [Qemu-devel] [PATCH 0/4] " Paolo Bonzini
2019-06-28 20:49   ` Sergio Lopez
2019-06-28 16:32 ` no-reply
2019-06-28 18:16 ` no-reply
