* [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
@ 2019-07-02 12:11 Sergio Lopez
  2019-07-02 12:11 ` [Qemu-devel] [PATCH v3 1/4] hw/virtio: Factorize virtio-mmio headers Sergio Lopez
                   ` (8 more replies)
  0 siblings, 9 replies; 68+ messages in thread
From: Sergio Lopez @ 2019-07-02 12:11 UTC
  To: mst, marcel.apfelbaum, pbonzini, rth, ehabkost, maran.wilson,
	sgarzare, kraxel
  Cc: qemu-devel, Sergio Lopez

Microvm is a machine type inspired by both NEMU and Firecracker, and
modeled after the machine model implemented by the latter.

Its main purpose is to provide users with a KVM-only machine type with
fast boot times, a minimal attack surface (measured as the number of IO
ports and MMIO regions exposed to the Guest) and a small footprint
(especially when combined with the ongoing QEMU modularization effort).

Normally, other than the device support provided by KVM itself,
microvm only supports virtio-mmio devices. Microvm also includes a
legacy mode, which adds an ISA bus with a 16550A serial port, useful
for viewing early kernel boot messages.

Microvm only supports booting PVH-enabled Linux ELF images. Booting
other PVH-enabled kernels may be possible but, because microvm provides
neither ACPI nor firmware, the guest has to be told where the
virtio-mmio transports are located through the kernel command line. If
there's interest in using this machine type with other kernels, we'll
try to find some kind of middle-ground solution.
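
As a rough illustration of what "relying on the command line" could
look like, here is a minimal sketch that appends one Linux
"virtio_mmio.device=<size>@<baseaddr>:<irq>" parameter per transport.
The base address, the 512-byte region size and the count of eight come
from the memory map in this cover letter. The IRQ numbering and the
names (append_virtio_mmio_params, TRANSPORT_IRQ_BASE) are assumptions
made for the example; they are not taken from this series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Taken from the memory map in the cover letter. */
#define TRANSPORT_BASE  0xd0000000ULL
#define TRANSPORT_SIZE  0x200ULL      /* 512 bytes per transport */
#define TRANSPORT_COUNT 8u

/* ASSUMPTION: hypothetical first IRQ, not stated in the cover letter. */
#define TRANSPORT_IRQ_BASE 5u

/*
 * Append " virtio_mmio.device=<size>@<base>:<irq>" to "cmdline" for the
 * first "count" transports. "len" is the total capacity of "cmdline".
 * Returns 0 on success, or -1 if "count" is too large or the buffer is
 * too small (in which case "cmdline" is left unchanged).
 */
int append_virtio_mmio_params(char *cmdline, size_t len, unsigned count)
{
    size_t start, used;
    unsigned i;

    assert(cmdline != NULL);
    start = used = strlen(cmdline);
    assert(used < len);

    if (count > TRANSPORT_COUNT) {
        return -1;
    }

    for (i = 0; i < count; i++) {
        uint64_t base = TRANSPORT_BASE + (uint64_t)i * TRANSPORT_SIZE;
        int n = snprintf(cmdline + used, len - used,
                         " virtio_mmio.device=%llu@0x%llx:%u",
                         (unsigned long long)TRANSPORT_SIZE,
                         (unsigned long long)base,
                         TRANSPORT_IRQ_BASE + i);

        if (n < 0 || (size_t)n >= len - used) {
            cmdline[start] = '\0';  /* undo any partial output */
            return -1;
        }
        used += (size_t)n;
    }

    return 0;
}
```

Starting from "console=hvc0 root=/dev/vda" and asking for two
transports, the sketch yields a command line that describes the
regions at 0xd0000000 and 0xd0000200.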

This is the list of the exposed IO ports and MMIO regions when running
in non-legacy mode:

address-space: memory
    00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
    00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
    00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
    00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
    00000000d0000800-00000000d00009ff (prio 0, i/o): virtio-mmio
    00000000d0000a00-00000000d0000bff (prio 0, i/o): virtio-mmio
    00000000d0000c00-00000000d0000dff (prio 0, i/o): virtio-mmio
    00000000d0000e00-00000000d0000fff (prio 0, i/o): virtio-mmio
    00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi

address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io
    0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
    0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
    000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
    00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
    00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
    00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr

A QEMU instance with the microvm machine type can be invoked this way:

 - Normal mode:

qemu-system-x86_64 -M microvm -m 512m -smp 2 \
 -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
 -nodefaults -no-user-config \
 -chardev pty,id=virtiocon0,server \
 -device virtio-serial-device \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0

 - Legacy mode:

qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
 -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
 -nodefaults -no-user-config \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0 \
 -serial stdio


Changelog:
v3:
  - Add initrd support (thanks Stefano).

v2:
  - Drop "[PATCH 1/4] hw/i386: Factorize CPU routine".
  - Simplify machine definition (thanks Eduardo).
  - Remove use of unneeded NUMA-related callbacks (thanks Eduardo).
  - Add a patch to factorize PVH-related functions.
  - Replace use of Linux's Zero Page with PVH (thanks Maran and Paolo).


Sergio Lopez (4):
  hw/virtio: Factorize virtio-mmio headers
  hw/i386: Add an Intel MPTable generator
  hw/i386: Factorize PVH related functions
  hw/i386: Introduce the microvm machine type

 default-configs/i386-softmmu.mak            |   1 +
 hw/i386/Kconfig                             |   4 +
 hw/i386/Makefile.objs                       |   2 +
 hw/i386/microvm.c                           | 550 ++++++++++++++++++++
 hw/i386/mptable.c                           | 156 ++++++
 hw/i386/pc.c                                | 120 +----
 hw/i386/pvh.c                               | 113 ++++
 hw/i386/pvh.h                               |  10 +
 hw/virtio/virtio-mmio.c                     |  35 +-
 hw/virtio/virtio-mmio.h                     |  60 +++
 include/hw/i386/microvm.h                   |  82 +++
 include/hw/i386/mptable.h                   |  36 ++
 include/standard-headers/linux/mpspec_def.h | 182 +++++++
 13 files changed, 1209 insertions(+), 142 deletions(-)
 create mode 100644 hw/i386/microvm.c
 create mode 100644 hw/i386/mptable.c
 create mode 100644 hw/i386/pvh.c
 create mode 100644 hw/i386/pvh.h
 create mode 100644 hw/virtio/virtio-mmio.h
 create mode 100644 include/hw/i386/microvm.h
 create mode 100644 include/hw/i386/mptable.h
 create mode 100644 include/standard-headers/linux/mpspec_def.h

--
2.21.0


* [Qemu-devel] [PATCH v3 1/4] hw/virtio: Factorize virtio-mmio headers
  2019-07-02 12:11 [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type Sergio Lopez
@ 2019-07-02 12:11 ` Sergio Lopez
  2019-07-25  9:46   ` Liam Merwick
  2019-07-02 12:11 ` [Qemu-devel] [PATCH v3 2/4] hw/i386: Add an Intel MPTable generator Sergio Lopez
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 68+ messages in thread
From: Sergio Lopez @ 2019-07-02 12:11 UTC
  To: mst, marcel.apfelbaum, pbonzini, rth, ehabkost, maran.wilson,
	sgarzare, kraxel
  Cc: qemu-devel, Sergio Lopez

Put the QOM macros and the main struct definition in a separate header
file, so they can be accessed from other components.

This is needed for the microvm machine type implementation.

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/virtio/virtio-mmio.c | 35 +-----------------------
 hw/virtio/virtio-mmio.h | 60 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+), 34 deletions(-)
 create mode 100644 hw/virtio/virtio-mmio.h

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 97b7f35496..87c7fe4d8d 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -26,44 +26,11 @@
 #include "qemu/host-utils.h"
 #include "qemu/module.h"
 #include "sysemu/kvm.h"
-#include "hw/virtio/virtio-bus.h"
+#include "virtio-mmio.h"
 #include "qemu/error-report.h"
 #include "qemu/log.h"
 #include "trace.h"
 
-/* QOM macros */
-/* virtio-mmio-bus */
-#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
-#define VIRTIO_MMIO_BUS(obj) \
-        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
-#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
-        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
-#define VIRTIO_MMIO_BUS_CLASS(klass) \
-        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
-
-/* virtio-mmio */
-#define TYPE_VIRTIO_MMIO "virtio-mmio"
-#define VIRTIO_MMIO(obj) \
-        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
-
-#define VIRT_MAGIC 0x74726976 /* 'virt' */
-#define VIRT_VERSION 1
-#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
-
-typedef struct {
-    /* Generic */
-    SysBusDevice parent_obj;
-    MemoryRegion iomem;
-    qemu_irq irq;
-    /* Guest accessible state needing migration and reset */
-    uint32_t host_features_sel;
-    uint32_t guest_features_sel;
-    uint32_t guest_page_shift;
-    /* virtio-bus */
-    VirtioBusState bus;
-    bool format_transport_address;
-} VirtIOMMIOProxy;
-
 static bool virtio_mmio_ioeventfd_enabled(DeviceState *d)
 {
     return kvm_eventfds_enabled();
diff --git a/hw/virtio/virtio-mmio.h b/hw/virtio/virtio-mmio.h
new file mode 100644
index 0000000000..2f3973f8c7
--- /dev/null
+++ b/hw/virtio/virtio-mmio.h
@@ -0,0 +1,60 @@
+/*
+ * Virtio MMIO bindings
+ *
+ * Copyright (c) 2011 Linaro Limited
+ *
+ * Author:
+ *  Peter Maydell <peter.maydell@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_VIRTIO_MMIO_H
+#define QEMU_VIRTIO_MMIO_H
+
+#include "hw/virtio/virtio-bus.h"
+
+/* QOM macros */
+/* virtio-mmio-bus */
+#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
+#define VIRTIO_MMIO_BUS(obj) \
+        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
+#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
+        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
+#define VIRTIO_MMIO_BUS_CLASS(klass) \
+        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
+
+/* virtio-mmio */
+#define TYPE_VIRTIO_MMIO "virtio-mmio"
+#define VIRTIO_MMIO(obj) \
+        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
+
+#define VIRT_MAGIC 0x74726976 /* 'virt' */
+#define VIRT_VERSION 1
+#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
+
+typedef struct {
+    /* Generic */
+    SysBusDevice parent_obj;
+    MemoryRegion iomem;
+    qemu_irq irq;
+    /* Guest accessible state needing migration and reset */
+    uint32_t host_features_sel;
+    uint32_t guest_features_sel;
+    uint32_t guest_page_shift;
+    /* virtio-bus */
+    VirtioBusState bus;
+    bool format_transport_address;
+} VirtIOMMIOProxy;
+
+#endif
-- 
2.21.0



* [Qemu-devel] [PATCH v3 2/4] hw/i386: Add an Intel MPTable generator
  2019-07-02 12:11 [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type Sergio Lopez
  2019-07-02 12:11 ` [Qemu-devel] [PATCH v3 1/4] hw/virtio: Factorize virtio-mmio headers Sergio Lopez
@ 2019-07-02 12:11 ` Sergio Lopez
  2019-07-02 12:11 ` [Qemu-devel] [PATCH v3 3/4] hw/i386: Factorize PVH related functions Sergio Lopez
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 68+ messages in thread
From: Sergio Lopez @ 2019-07-02 12:11 UTC
  To: mst, marcel.apfelbaum, pbonzini, rth, ehabkost, maran.wilson,
	sgarzare, kraxel
  Cc: qemu-devel, Sergio Lopez

Add a helper function (mptable_generate) for generating an Intel
MPTable according to version 1.4 of the specification.

This is needed for the microvm machine type implementation.
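
For readers unfamiliar with the MP checksum rule, the sketch below
shows the idea in isolation: the checksum byte is chosen so that all
bytes of the structure add up to zero modulo 256. It is a standalone
illustration, not the code in this patch; "struct example_mpf" is a
hypothetical stand-in that mirrors the layout of "struct mpf_intel",
and the helper names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in mirroring the layout of struct mpf_intel. */
struct example_mpf {
    char     signature[4];   /* "_MP_" */
    uint32_t physptr;        /* configuration table address */
    uint8_t  length;         /* length in 16-byte paragraphs */
    uint8_t  specification;  /* 4 means version 1.4 */
    uint8_t  checksum;       /* makes the byte sum wrap to zero */
    uint8_t  feature[5];
};

static_assert(sizeof(struct example_mpf) == 16,
              "the floating pointer structure is one paragraph long");

/* Add up every byte of "buf", wrapping modulo 256. */
uint8_t mp_byte_sum(const void *buf, size_t size)
{
    const uint8_t *p = buf;
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < size; i++) {
        sum = (uint8_t)(sum + p[i]);
    }
    return sum;
}

/* Fill in "mpf" and set its checksum so that the byte sum is zero. */
void example_mpf_init(struct example_mpf *mpf, uint32_t physptr)
{
    memset(mpf, 0, sizeof(*mpf));
    memcpy(mpf->signature, "_MP_", 4);
    mpf->physptr = physptr;
    mpf->length = 1;
    mpf->specification = 4;
    /* The checksum field is still zero here, so it adds nothing. */
    mpf->checksum = (uint8_t)(0 - mp_byte_sum(mpf, sizeof(*mpf)));
}
```

A consumer can validate the structure by checking that
mp_byte_sum() over the whole structure returns 0.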

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/i386/mptable.c                           | 156 +++++++++++++++++
 include/hw/i386/mptable.h                   |  36 ++++
 include/standard-headers/linux/mpspec_def.h | 182 ++++++++++++++++++++
 3 files changed, 374 insertions(+)
 create mode 100644 hw/i386/mptable.c
 create mode 100644 include/hw/i386/mptable.h
 create mode 100644 include/standard-headers/linux/mpspec_def.h

diff --git a/hw/i386/mptable.c b/hw/i386/mptable.c
new file mode 100644
index 0000000000..cf1e0eef3a
--- /dev/null
+++ b/hw/i386/mptable.c
@@ -0,0 +1,156 @@
+/*
+ * Intel MPTable generator
+ *
+ * Copyright (C) 2019 Red Hat, Inc.
+ *
+ * Authors:
+ *   Sergio Lopez <slp@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/i386/mptable.h"
+#include "standard-headers/linux/mpspec_def.h"
+
+static int mptable_checksum(char *buf, int size)
+{
+    int i;
+    int checksum = 0;
+
+    for (i = 0; i < size; i++) {
+        checksum += buf[i];
+    }
+
+    return checksum;
+}
+
+/*
+ * Generate an MPTable for "ncpus". "apic_id" must be the next available
+ * APIC ID (last CPU apic_id + 1). "table_base" is the physical location
+ * in the Guest where the caller intends to write the table, needed to
+ * fill the "physptr" field from the "mpf_intel" structure.
+ *
+ * On success, return a newly allocated buffer, that must be freed by the
+ * caller using "g_free" when it's no longer needed, and update
+ * "mptable_size" with the size of the buffer.
+ */
+char *mptable_generate(int ncpus, int table_base, int *mptable_size)
+{
+    struct mpf_intel *mpf;
+    struct mpc_table *table;
+    struct mpc_cpu *cpu;
+    struct mpc_bus *bus;
+    struct mpc_ioapic *ioapic;
+    struct mpc_intsrc *intsrc;
+    struct mpc_lintsrc *lintsrc;
+    const char mpc_signature[] = MPC_SIGNATURE;
+    const char smp_magic_ident[] = "_MP_";
+    char *mptable;
+    int checksum = 0;
+    int offset = 0;
+    int ssize;
+    int i;
+
+    ssize = sizeof(struct mpf_intel);
+    mptable = g_malloc0(ssize);
+
+    mpf = (struct mpf_intel *) mptable;
+    memcpy(mpf->signature, smp_magic_ident, sizeof(smp_magic_ident) - 1);
+    mpf->length = 1;
+    mpf->specification = 4;
+    mpf->physptr = table_base + ssize;
+    mpf->checksum -= mptable_checksum((char *) mpf, ssize);
+    offset = ssize + sizeof(struct mpc_table);
+
+    ssize = sizeof(struct mpc_cpu);
+    for (i = 0; i < ncpus; i++) {
+        mptable = g_realloc(mptable, offset + ssize);
+        cpu = (struct mpc_cpu *) (mptable + offset);
+        cpu->type = MP_PROCESSOR;
+        cpu->apicid = i;
+        cpu->apicver = APIC_VERSION;
+        cpu->cpuflag = CPU_ENABLED;
+        if (i == 0) {
+            cpu->cpuflag |= CPU_BOOTPROCESSOR;
+        }
+        cpu->cpufeature = CPU_STEPPING;
+        cpu->featureflag = CPU_FEATURE_APIC | CPU_FEATURE_FPU;
+        checksum += mptable_checksum((char *) cpu, ssize);
+        offset += ssize;
+    }
+
+    ssize = sizeof(struct mpc_bus);
+    mptable = g_realloc(mptable, offset + ssize);
+    bus = (struct mpc_bus *) (mptable + offset);
+    bus->type = MP_BUS;
+    bus->busid = 0;
+    memcpy(bus->bustype, BUS_TYPE_ISA, sizeof(BUS_TYPE_ISA) - 1);
+    checksum += mptable_checksum((char *) bus, ssize);
+    offset += ssize;
+
+    ssize = sizeof(struct mpc_ioapic);
+    mptable = g_realloc(mptable, offset + ssize);
+    ioapic = (struct mpc_ioapic *) (mptable + offset);
+    ioapic->type = MP_IOAPIC;
+    ioapic->apicid = ncpus + 1;
+    ioapic->apicver = APIC_VERSION;
+    ioapic->flags = MPC_APIC_USABLE;
+    ioapic->apicaddr = IO_APIC_DEFAULT_PHYS_BASE;
+    checksum += mptable_checksum((char *) ioapic, ssize);
+    offset += ssize;
+
+    ssize = sizeof(struct mpc_intsrc);
+    for (i = 0; i < 16; i++) {
+        mptable = g_realloc(mptable, offset + ssize);
+        intsrc = (struct mpc_intsrc *) (mptable + offset);
+        intsrc->type = MP_INTSRC;
+        intsrc->irqtype = mp_INT;
+        intsrc->irqflag = MP_IRQDIR_DEFAULT;
+        intsrc->srcbus = 0;
+        intsrc->srcbusirq = i;
+        intsrc->dstapic = ncpus + 1;
+        intsrc->dstirq = i;
+        checksum += mptable_checksum((char *) intsrc, ssize);
+        offset += ssize;
+    }
+
+    ssize = sizeof(struct mpc_lintsrc);
+    mptable = g_realloc(mptable, offset + (ssize * 2));
+    lintsrc = (struct mpc_lintsrc *) (mptable + offset);
+    lintsrc->type = MP_LINTSRC;
+    lintsrc->irqtype = mp_ExtINT;
+    lintsrc->irqflag = MP_IRQDIR_DEFAULT;
+    lintsrc->srcbusid = 0;
+    lintsrc->srcbusirq = 0;
+    lintsrc->destapic = 0;
+    lintsrc->destapiclint = 0;
+    checksum += mptable_checksum((char *) lintsrc, ssize);
+    offset += ssize;
+
+    lintsrc = (struct mpc_lintsrc *) (mptable + offset);
+    lintsrc->type = MP_LINTSRC;
+    lintsrc->irqtype = mp_NMI;
+    lintsrc->irqflag = MP_IRQDIR_DEFAULT;
+    lintsrc->srcbusid = 0;
+    lintsrc->srcbusirq = 0;
+    lintsrc->destapic = 0xFF;
+    lintsrc->destapiclint = 1;
+    checksum += mptable_checksum((char *) lintsrc, ssize);
+    offset += ssize;
+
+    ssize = sizeof(struct mpc_table);
+    table = (struct mpc_table *) (mptable + sizeof(struct mpf_intel));
+    memcpy(table->signature, mpc_signature, sizeof(mpc_signature) - 1);
+    table->length = offset - sizeof(struct mpf_intel);
+    table->spec = MPC_SPEC;
+    memcpy(table->oem, MPC_OEM, sizeof(MPC_OEM) - 1);
+    memcpy(table->productid, MPC_PRODUCT_ID, sizeof(MPC_PRODUCT_ID) - 1);
+    table->lapic = APIC_DEFAULT_PHYS_BASE;
+    checksum += mptable_checksum((char *) table, ssize);
+    table->checksum -= checksum;
+
+    *mptable_size = offset;
+    return mptable;
+}
diff --git a/include/hw/i386/mptable.h b/include/hw/i386/mptable.h
new file mode 100644
index 0000000000..96a9778bba
--- /dev/null
+++ b/include/hw/i386/mptable.h
@@ -0,0 +1,36 @@
+/*
+ * Intel MPTable generator
+ *
+ * Copyright (C) 2019 Red Hat, Inc.
+ *
+ * Authors:
+ *   Sergio Lopez <slp@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_I386_MPTABLE_H
+#define HW_I386_MPTABLE_H
+
+#define APIC_VERSION     0x14
+#define CPU_STEPPING     0x600
+#define CPU_FEATURE_APIC 0x200
+#define CPU_FEATURE_FPU  0x001
+#define MPC_SPEC         0x4
+
+#define MP_IRQDIR_DEFAULT 0
+#define MP_IRQDIR_HIGH    1
+#define MP_IRQDIR_LOW     3
+
+static const char MPC_OEM[]        = "QEMU    ";
+static const char MPC_PRODUCT_ID[] = "000000000000";
+static const char BUS_TYPE_ISA[]   = "ISA   ";
+
+#define IO_APIC_DEFAULT_PHYS_BASE 0xfec00000
+#define APIC_DEFAULT_PHYS_BASE    0xfee00000
+#define APIC_VERSION              0x14
+
+char *mptable_generate(int ncpus, int table_base, int *mptable_size);
+
+#endif
diff --git a/include/standard-headers/linux/mpspec_def.h b/include/standard-headers/linux/mpspec_def.h
new file mode 100644
index 0000000000..6fb923a343
--- /dev/null
+++ b/include/standard-headers/linux/mpspec_def.h
@@ -0,0 +1,182 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_MPSPEC_DEF_H
+#define _ASM_X86_MPSPEC_DEF_H
+
+/*
+ * Structure definitions for SMP machines following the
+ * Intel Multiprocessing Specification 1.1 and 1.4.
+ */
+
+/*
+ * This tag identifies where the SMP configuration
+ * information is.
+ */
+
+#define SMP_MAGIC_IDENT	(('_'<<24) | ('P'<<16) | ('M'<<8) | '_')
+
+#ifdef CONFIG_X86_32
+# define MAX_MPC_ENTRY 1024
+#endif
+
+/* Intel MP Floating Pointer Structure */
+struct mpf_intel {
+	char signature[4];		/* "_MP_"			*/
+	unsigned int physptr;		/* Configuration table address	*/
+	unsigned char length;		/* Our length (paragraphs)	*/
+	unsigned char specification;	/* Specification version	*/
+	unsigned char checksum;		/* Checksum (makes sum 0)	*/
+	unsigned char feature1;		/* Standard or configuration ?	*/
+	unsigned char feature2;		/* Bit7 set for IMCR|PIC	*/
+	unsigned char feature3;		/* Unused (0)			*/
+	unsigned char feature4;		/* Unused (0)			*/
+	unsigned char feature5;		/* Unused (0)			*/
+};
+
+#define MPC_SIGNATURE "PCMP"
+
+struct mpc_table {
+	char signature[4];
+	unsigned short length;		/* Size of table */
+	char spec;			/* 0x01 */
+	char checksum;
+	char oem[8];
+	char productid[12];
+	unsigned int oemptr;		/* 0 if not present */
+	unsigned short oemsize;		/* 0 if not present */
+	unsigned short oemcount;
+	unsigned int lapic;		/* APIC address */
+	unsigned int reserved;
+};
+
+/* Followed by entries */
+
+#define	MP_PROCESSOR		0
+#define	MP_BUS			1
+#define	MP_IOAPIC		2
+#define	MP_INTSRC		3
+#define	MP_LINTSRC		4
+/* Used by IBM NUMA-Q to describe node locality */
+#define	MP_TRANSLATION		192
+
+#define CPU_ENABLED		1	/* Processor is available */
+#define CPU_BOOTPROCESSOR	2	/* Processor is the boot CPU */
+
+#define CPU_STEPPING_MASK	0x000F
+#define CPU_MODEL_MASK		0x00F0
+#define CPU_FAMILY_MASK		0x0F00
+
+struct mpc_cpu {
+	unsigned char type;
+	unsigned char apicid;		/* Local APIC number */
+	unsigned char apicver;		/* Its versions */
+	unsigned char cpuflag;
+	unsigned int cpufeature;
+	unsigned int featureflag;	/* CPUID feature value */
+	unsigned int reserved[2];
+};
+
+struct mpc_bus {
+	unsigned char type;
+	unsigned char busid;
+	unsigned char bustype[6];
+};
+
+/* List of Bus Type string values, Intel MP Spec. */
+#define BUSTYPE_EISA	"EISA"
+#define BUSTYPE_ISA	"ISA"
+#define BUSTYPE_INTERN	"INTERN"	/* Internal BUS */
+#define BUSTYPE_MCA	"MCA"		/* Obsolete */
+#define BUSTYPE_VL	"VL"		/* Local bus */
+#define BUSTYPE_PCI	"PCI"
+#define BUSTYPE_PCMCIA	"PCMCIA"
+#define BUSTYPE_CBUS	"CBUS"
+#define BUSTYPE_CBUSII	"CBUSII"
+#define BUSTYPE_FUTURE	"FUTURE"
+#define BUSTYPE_MBI	"MBI"
+#define BUSTYPE_MBII	"MBII"
+#define BUSTYPE_MPI	"MPI"
+#define BUSTYPE_MPSA	"MPSA"
+#define BUSTYPE_NUBUS	"NUBUS"
+#define BUSTYPE_TC	"TC"
+#define BUSTYPE_VME	"VME"
+#define BUSTYPE_XPRESS	"XPRESS"
+
+#define MPC_APIC_USABLE		0x01
+
+struct mpc_ioapic {
+	unsigned char type;
+	unsigned char apicid;
+	unsigned char apicver;
+	unsigned char flags;
+	unsigned int apicaddr;
+};
+
+struct mpc_intsrc {
+	unsigned char type;
+	unsigned char irqtype;
+	unsigned short irqflag;
+	unsigned char srcbus;
+	unsigned char srcbusirq;
+	unsigned char dstapic;
+	unsigned char dstirq;
+};
+
+enum mp_irq_source_types {
+	mp_INT = 0,
+	mp_NMI = 1,
+	mp_SMI = 2,
+	mp_ExtINT = 3
+};
+
+#define MP_IRQPOL_DEFAULT	0x0
+#define MP_IRQPOL_ACTIVE_HIGH	0x1
+#define MP_IRQPOL_RESERVED	0x2
+#define MP_IRQPOL_ACTIVE_LOW	0x3
+#define MP_IRQPOL_MASK		0x3
+
+#define MP_IRQTRIG_DEFAULT	0x0
+#define MP_IRQTRIG_EDGE		0x4
+#define MP_IRQTRIG_RESERVED	0x8
+#define MP_IRQTRIG_LEVEL	0xc
+#define MP_IRQTRIG_MASK		0xc
+
+#define MP_APIC_ALL	0xFF
+
+struct mpc_lintsrc {
+	unsigned char type;
+	unsigned char irqtype;
+	unsigned short irqflag;
+	unsigned char srcbusid;
+	unsigned char srcbusirq;
+	unsigned char destapic;
+	unsigned char destapiclint;
+};
+
+#define MPC_OEM_SIGNATURE "_OEM"
+
+struct mpc_oemtable {
+	char signature[4];
+	unsigned short length;		/* Size of table */
+	char  rev;			/* 0x01 */
+	char  checksum;
+	char  mpc[8];
+};
+
+/*
+ *	Default configurations
+ *
+ *	1	2 CPU ISA 82489DX
+ *	2	2 CPU EISA 82489DX neither IRQ 0 timer nor IRQ 13 DMA chaining
+ *	3	2 CPU EISA 82489DX
+ *	4	2 CPU MCA 82489DX
+ *	5	2 CPU ISA+PCI
+ *	6	2 CPU EISA+PCI
+ *	7	2 CPU MCA+PCI
+ */
+
+enum mp_bustype {
+	MP_BUS_ISA = 1,
+	MP_BUS_EISA,
+	MP_BUS_PCI,
+};
+#endif /* _ASM_X86_MPSPEC_DEF_H */
-- 
2.21.0



* [Qemu-devel] [PATCH v3 3/4] hw/i386: Factorize PVH related functions
  2019-07-02 12:11 [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type Sergio Lopez
  2019-07-02 12:11 ` [Qemu-devel] [PATCH v3 1/4] hw/virtio: Factorize virtio-mmio headers Sergio Lopez
  2019-07-02 12:11 ` [Qemu-devel] [PATCH v3 2/4] hw/i386: Add an Intel MPTable generator Sergio Lopez
@ 2019-07-02 12:11 ` Sergio Lopez
  2019-07-23  8:39   ` Liam Merwick
  2019-07-02 12:11 ` [Qemu-devel] [PATCH v3 4/4] hw/i386: Introduce the microvm machine type Sergio Lopez
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 68+ messages in thread
From: Sergio Lopez @ 2019-07-02 12:11 UTC
  To: mst, marcel.apfelbaum, pbonzini, rth, ehabkost, maran.wilson,
	sgarzare, kraxel
  Cc: qemu-devel, Sergio Lopez

Extract the PVH-related functions from pc.c and put them in pvh.c, so
they can be shared with other components.
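
As background, read_pvh_start_addr() (moved by this patch) finds the
PVH entry point by skipping the ELF note header and the aligned note
name. The sketch below reproduces only that offset arithmetic on a
plain byte buffer. It is a simplified illustration under stated
assumptions: the note header is taken to be three 32-bit words, the
value is read in host byte order, and all names (example_note_hdr,
example_note_desc32, EXAMPLE_ALIGN_UP) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: note header laid out as three 32-bit words. */
struct example_note_hdr {
    uint32_t n_namesz;  /* size of the name, including the NUL */
    uint32_t n_descsz;  /* size of the descriptor */
    uint32_t n_type;    /* note type, not checked in this sketch */
};

/* Round "n" up to the next multiple of "a". */
#define EXAMPLE_ALIGN_UP(n, a) ((((n) + (a) - 1) / (a)) * (a))

/*
 * Return the 32-bit value stored in the descriptor of the note that
 * starts at "note", or 0 if the note is missing or too short.
 * "align" is the alignment applied to the note name.
 */
uint32_t example_note_desc32(const uint8_t *note, size_t note_len,
                             uint32_t align)
{
    struct example_note_hdr hdr;
    uint32_t value;
    size_t desc_off;

    assert(align != 0);

    if (note == NULL || note_len < sizeof(hdr)) {
        return 0;
    }
    memcpy(&hdr, note, sizeof(hdr));

    /* descriptor = header + name rounded up to the alignment */
    desc_off = sizeof(hdr) +
               EXAMPLE_ALIGN_UP((size_t)hdr.n_namesz, (size_t)align);
    if (hdr.n_descsz < sizeof(value) ||
        desc_off + sizeof(value) > note_len) {
        return 0;
    }

    memcpy(&value, note + desc_off, sizeof(value));
    return value;
}
```

Unlike this sketch, the function in the patch receives the header and
alignment through opaque pointers from load_elf() and does no bounds
checking of its own.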

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 hw/i386/Makefile.objs |   1 +
 hw/i386/pc.c          | 120 +++++-------------------------------------
 hw/i386/pvh.c         | 113 +++++++++++++++++++++++++++++++++++++++
 hw/i386/pvh.h         |  10 ++++
 4 files changed, 136 insertions(+), 108 deletions(-)
 create mode 100644 hw/i386/pvh.c
 create mode 100644 hw/i386/pvh.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 5d9c9efd5f..c5f20bbd72 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -1,5 +1,6 @@
 obj-$(CONFIG_KVM) += kvm/
 obj-y += multiboot.o
+obj-y += pvh.o
 obj-y += pc.o
 obj-$(CONFIG_I440FX) += pc_piix.o
 obj-$(CONFIG_Q35) += pc_q35.o
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3983621f1c..325ec2c1c8 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -42,6 +42,7 @@
 #include "hw/loader.h"
 #include "elf.h"
 #include "multiboot.h"
+#include "pvh.h"
 #include "hw/timer/mc146818rtc.h"
 #include "hw/dma/i8257.h"
 #include "hw/timer/i8254.h"
@@ -108,9 +109,6 @@ static struct e820_entry *e820_table;
 static unsigned e820_entries;
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
-/* Physical Address of PVH entry point read from kernel ELF NOTE */
-static size_t pvh_start_addr;
-
 GlobalProperty pc_compat_4_0[] = {};
 const size_t pc_compat_4_0_len = G_N_ELEMENTS(pc_compat_4_0);
 
@@ -1061,109 +1059,6 @@ struct setup_data {
     uint8_t data[0];
 } __attribute__((packed));
 
-
-/*
- * The entry point into the kernel for PVH boot is different from
- * the native entry point.  The PVH entry is defined by the x86/HVM
- * direct boot ABI and is available in an ELFNOTE in the kernel binary.
- *
- * This function is passed to load_elf() when it is called from
- * load_elfboot() which then additionally checks for an ELF Note of
- * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
- * parse the PVH entry address from the ELF Note.
- *
- * Due to trickery in elf_opts.h, load_elf() is actually available as
- * load_elf32() or load_elf64() and this routine needs to be able
- * to deal with being called as 32 or 64 bit.
- *
- * The address of the PVH entry point is saved to the 'pvh_start_addr'
- * global variable.  (although the entry point is 32-bit, the kernel
- * binary can be either 32-bit or 64-bit).
- */
-static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
-{
-    size_t *elf_note_data_addr;
-
-    /* Check if ELF Note header passed in is valid */
-    if (arg1 == NULL) {
-        return 0;
-    }
-
-    if (is64) {
-        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
-        uint64_t nhdr_size64 = sizeof(struct elf64_note);
-        uint64_t phdr_align = *(uint64_t *)arg2;
-        uint64_t nhdr_namesz = nhdr64->n_namesz;
-
-        elf_note_data_addr =
-            ((void *)nhdr64) + nhdr_size64 +
-            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
-    } else {
-        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
-        uint32_t nhdr_size32 = sizeof(struct elf32_note);
-        uint32_t phdr_align = *(uint32_t *)arg2;
-        uint32_t nhdr_namesz = nhdr32->n_namesz;
-
-        elf_note_data_addr =
-            ((void *)nhdr32) + nhdr_size32 +
-            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
-    }
-
-    pvh_start_addr = *elf_note_data_addr;
-
-    return pvh_start_addr;
-}
-
-static bool load_elfboot(const char *kernel_filename,
-                   int kernel_file_size,
-                   uint8_t *header,
-                   size_t pvh_xen_start_addr,
-                   FWCfgState *fw_cfg)
-{
-    uint32_t flags = 0;
-    uint32_t mh_load_addr = 0;
-    uint32_t elf_kernel_size = 0;
-    uint64_t elf_entry;
-    uint64_t elf_low, elf_high;
-    int kernel_size;
-
-    if (ldl_p(header) != 0x464c457f) {
-        return false; /* no elfboot */
-    }
-
-    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
-    flags = elf_is64 ?
-        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
-
-    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
-        error_report("elfboot unsupported flags = %x", flags);
-        exit(1);
-    }
-
-    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
-    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
-                           NULL, &elf_note_type, &elf_entry,
-                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
-                           0, 0);
-
-    if (kernel_size < 0) {
-        error_report("Error while loading elf kernel");
-        exit(1);
-    }
-    mh_load_addr = elf_low;
-    elf_kernel_size = elf_high - elf_low;
-
-    if (pvh_start_addr == 0) {
-        error_report("Error loading uncompressed kernel without PVH ELF Note");
-        exit(1);
-    }
-    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
-    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
-    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
-
-    return true;
-}
-
 static void load_linux(PCMachineState *pcms,
                        FWCfgState *fw_cfg)
 {
@@ -1203,6 +1098,9 @@ static void load_linux(PCMachineState *pcms,
     if (ldl_p(header+0x202) == 0x53726448) {
         protocol = lduw_p(header+0x206);
     } else {
+        size_t pvh_start_addr;
+        uint32_t mh_load_addr = 0;
+        uint32_t elf_kernel_size = 0;
         /*
          * This could be a multiboot kernel. If it is, let's stop treating it
          * like a Linux kernel.
@@ -1220,10 +1118,16 @@ static void load_linux(PCMachineState *pcms,
          * If load_elfboot() is successful, populate the fw_cfg info.
          */
         if (pcmc->pvh_enabled &&
-            load_elfboot(kernel_filename, kernel_size,
-                         header, pvh_start_addr, fw_cfg)) {
+            pvh_load_elfboot(kernel_filename,
+                             &mh_load_addr, &elf_kernel_size)) {
             fclose(f);
 
+            pvh_start_addr = pvh_get_start_addr();
+
+            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
+            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
+            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
+
             fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
                 strlen(kernel_cmdline) + 1);
             fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
diff --git a/hw/i386/pvh.c b/hw/i386/pvh.c
new file mode 100644
index 0000000000..61623b4533
--- /dev/null
+++ b/hw/i386/pvh.c
@@ -0,0 +1,113 @@
+/*
+ * PVH Boot Helper
+ *
+ * Copyright (C) 2019 Oracle
+ * Copyright (C) 2019 Red Hat, Inc
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/error-report.h"
+#include "hw/loader.h"
+#include "cpu.h"
+#include "elf.h"
+#include "pvh.h"
+
+static size_t pvh_start_addr = 0;
+
+size_t pvh_get_start_addr(void)
+{
+    return pvh_start_addr;
+}
+
+/*
+ * The entry point into the kernel for PVH boot is different from
+ * the native entry point.  The PVH entry is defined by the x86/HVM
+ * direct boot ABI and is available in an ELFNOTE in the kernel binary.
+ *
+ * This function is passed to load_elf() when it is called from
+ * load_elfboot() which then additionally checks for an ELF Note of
+ * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
+ * parse the PVH entry address from the ELF Note.
+ *
+ * Due to trickery in elf_opts.h, load_elf() is actually available as
+ * load_elf32() or load_elf64() and this routine needs to be able
+ * to deal with being called as 32 or 64 bit.
+ *
+ * The address of the PVH entry point is saved to the 'pvh_start_addr'
+ * global variable.  (although the entry point is 32-bit, the kernel
+ * binary can be either 32-bit or 64-bit).
+ */
+
+static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
+{
+    size_t *elf_note_data_addr;
+
+    /* Check if ELF Note header passed in is valid */
+    if (arg1 == NULL) {
+        return 0;
+    }
+
+    if (is64) {
+        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
+        uint64_t nhdr_size64 = sizeof(struct elf64_note);
+        uint64_t phdr_align = *(uint64_t *)arg2;
+        uint64_t nhdr_namesz = nhdr64->n_namesz;
+
+        elf_note_data_addr =
+            ((void *)nhdr64) + nhdr_size64 +
+            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
+    } else {
+        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
+        uint32_t nhdr_size32 = sizeof(struct elf32_note);
+        uint32_t phdr_align = *(uint32_t *)arg2;
+        uint32_t nhdr_namesz = nhdr32->n_namesz;
+
+        elf_note_data_addr =
+            ((void *)nhdr32) + nhdr_size32 +
+            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
+    }
+
+    pvh_start_addr = *elf_note_data_addr;
+
+    return pvh_start_addr;
+}
+
+bool pvh_load_elfboot(const char *kernel_filename,
+                      uint32_t *mh_load_addr,
+                      uint32_t *elf_kernel_size)
+{
+    uint64_t elf_entry;
+    uint64_t elf_low, elf_high;
+    int kernel_size;
+    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
+
+    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
+                           NULL, &elf_note_type, &elf_entry,
+                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
+                           0, 0);
+
+    if (kernel_size < 0) {
+        error_report("Error while loading elf kernel");
+        return false;
+    }
+
+    if (pvh_start_addr == 0) {
+        error_report("Error loading uncompressed kernel without PVH ELF Note");
+        return false;
+    }
+
+    if (mh_load_addr) {
+        *mh_load_addr = elf_low;
+    }
+
+    if (elf_kernel_size) {
+        *elf_kernel_size = elf_high - elf_low;
+    }
+
+    return true;
+}
diff --git a/hw/i386/pvh.h b/hw/i386/pvh.h
new file mode 100644
index 0000000000..ada67ff6e8
--- /dev/null
+++ b/hw/i386/pvh.h
@@ -0,0 +1,10 @@
+#ifndef HW_I386_PVH_H
+#define HW_I386_PVH_H
+
+size_t pvh_get_start_addr(void);
+
+bool pvh_load_elfboot(const char *kernel_filename,
+                      uint32_t *mh_load_addr,
+                      uint32_t *elf_kernel_size);
+
+#endif
-- 
2.21.0




* [Qemu-devel] [PATCH v3 4/4] hw/i386: Introduce the microvm machine type
  2019-07-02 12:11 [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type Sergio Lopez
                   ` (2 preceding siblings ...)
  2019-07-02 12:11 ` [Qemu-devel] [PATCH v3 3/4] hw/i386: Factorize PVH related functions Sergio Lopez
@ 2019-07-02 12:11 ` Sergio Lopez
  2019-07-02 13:58   ` Gerd Hoffmann
  2019-07-25 10:47   ` Paolo Bonzini
  2019-07-02 15:01 ` [Qemu-devel] [PATCH v3 0/4] " no-reply
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 68+ messages in thread
From: Sergio Lopez @ 2019-07-02 12:11 UTC (permalink / raw)
  To: mst, marcel.apfelbaum, pbonzini, rth, ehabkost, maran.wilson,
	sgarzare, kraxel
  Cc: qemu-devel, Sergio Lopez

Microvm is a machine type inspired by both NEMU and Firecracker, and
constructed after the machine model implemented by the latter.

Its main purpose is to provide users with a KVM-only machine type with fast
boot times, a minimal attack surface (measured as the number of IO ports
and MMIO regions exposed to the guest) and a small footprint (especially
when combined with the ongoing QEMU modularization effort).

Normally, other than the device support provided by KVM itself,
microvm only supports virtio-mmio devices. Microvm also includes a
legacy mode, which adds an ISA bus with a 16550A serial port, useful
for being able to see the early boot kernel messages.

Microvm only supports booting PVH-enabled Linux ELF images. Booting
other PVH-enabled kernels may be possible, but due to the lack of ACPI
and firmware, we're relying on the command line for specifying the
location of the virtio-mmio transports. If there's an interest in
using this machine type with other kernels, we'll try to find some
kind of middle ground solution.
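
As an illustration of what that command-line dependency looks like, here
is a minimal standalone sketch of the parameter that gets appended for
each transport with an attached device. The constants are copied from
include/hw/i386/microvm.h in this patch; the helper name mmio_param() and
the VIRTIO_MMIO_SIZE macro are made up for this example (the patch does
this in microvm_get_mmio_cmdline() with a literal 512).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values copied from include/hw/i386/microvm.h in this patch. */
#define VIRTIO_MMIO_BASE      0xd0000000ULL
#define VIRTIO_IRQ_BASE       5
#define VIRTIO_NUM_TRANSPORTS 8
/* Hypothetical name for the 512-byte per-transport stride. */
#define VIRTIO_MMIO_SIZE      512

/*
 * Hypothetical helper (not part of the patch): format the kernel
 * parameter for transport number 'index' into buf.
 * Returns 0 on success, -1 if index is out of range or buf is too small.
 */
static int mmio_param(unsigned index, char *buf, size_t len)
{
    uint64_t base;
    int ret;

    assert(buf != NULL);

    if (index >= VIRTIO_NUM_TRANSPORTS) {
        return -1;
    }

    base = VIRTIO_MMIO_BASE + (uint64_t)index * VIRTIO_MMIO_SIZE;
    ret = snprintf(buf, len, "virtio_mmio.device=%d@0x%" PRIx64 ":%u",
                   VIRTIO_MMIO_SIZE, base, VIRTIO_IRQ_BASE + index);

    return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```

For transport 2 this yields virtio_mmio.device=512@0xd0000400:7, which
follows the <size>@<baseaddr>:<irq> form that the Linux virtio_mmio
driver accepts.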

Signed-off-by: Sergio Lopez <slp@redhat.com>
---
 default-configs/i386-softmmu.mak |   1 +
 hw/i386/Kconfig                  |   4 +
 hw/i386/Makefile.objs            |   1 +
 hw/i386/microvm.c                | 550 +++++++++++++++++++++++++++++++
 include/hw/i386/microvm.h        |  82 +++++
 5 files changed, 638 insertions(+)
 create mode 100644 hw/i386/microvm.c
 create mode 100644 include/hw/i386/microvm.h

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index cd5ea391e8..338f07420f 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -26,3 +26,4 @@ CONFIG_ISAPC=y
 CONFIG_I440FX=y
 CONFIG_Q35=y
 CONFIG_ACPI_PCI=y
+CONFIG_MICROVM=y
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index 9817888216..94c565d8db 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -87,6 +87,10 @@ config Q35
     select VMMOUSE
     select FW_CFG_DMA
 
+config MICROVM
+    bool
+    select VIRTIO_MMIO
+
 config VTD
     bool
 
diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index c5f20bbd72..7bffca413e 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -4,6 +4,7 @@ obj-y += pvh.o
 obj-y += pc.o
 obj-$(CONFIG_I440FX) += pc_piix.o
 obj-$(CONFIG_Q35) += pc_q35.o
+obj-$(CONFIG_MICROVM) += mptable.o microvm.o
 obj-y += fw_cfg.o pc_sysfw.o
 obj-y += x86-iommu.o
 obj-$(CONFIG_VTD) += intel_iommu.o
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
new file mode 100644
index 0000000000..b3b367add1
--- /dev/null
+++ b/hw/i386/microvm.c
@@ -0,0 +1,550 @@
+/*
+ * Copyright (c) 2018 Intel Corporation
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/cpus.h"
+#include "sysemu/numa.h"
+
+#include "hw/loader.h"
+#include "hw/nmi.h"
+#include "hw/kvm/clock.h"
+#include "hw/i386/microvm.h"
+#include "hw/i386/pc.h"
+#include "target/i386/cpu.h"
+#include "hw/timer/i8254.h"
+#include "hw/char/serial.h"
+#include "hw/i386/topology.h"
+#include "hw/virtio/virtio-mmio.h"
+#include "hw/i386/mptable.h"
+
+#include "cpu.h"
+#include "elf.h"
+#include "pvh.h"
+#include "kvm_i386.h"
+#include "hw/xen/start_info.h"
+
+static void microvm_gsi_handler(void *opaque, int n, int level)
+{
+    qemu_irq *ioapic_irq = opaque;
+
+    qemu_set_irq(ioapic_irq[n], level);
+}
+
+static void microvm_legacy_init(MicrovmMachineState *mms)
+{
+    ISABus *isa_bus;
+    GSIState *gsi_state;
+    qemu_irq *i8259;
+    int i;
+
+    assert(kvm_irqchip_in_kernel());
+    gsi_state = g_malloc0(sizeof(*gsi_state));
+    mms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
+
+    isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
+                          &error_abort);
+    isa_bus_irqs(isa_bus, mms->gsi);
+
+    assert(kvm_pic_in_kernel());
+    i8259 = kvm_i8259_init(isa_bus);
+
+    for (i = 0; i < ISA_NUM_IRQS; i++) {
+        gsi_state->i8259_irq[i] = i8259[i];
+    }
+
+    kvm_pit_init(isa_bus, 0x40);
+
+    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
+        int nirq = VIRTIO_IRQ_BASE + i;
+        ISADevice *isadev = isa_create(isa_bus, TYPE_ISA_SERIAL);
+        qemu_irq mmio_irq;
+
+        isa_init_irq(isadev, &mmio_irq, nirq);
+        sysbus_create_simple("virtio-mmio",
+                             VIRTIO_MMIO_BASE + i * 512,
+                             mms->gsi[VIRTIO_IRQ_BASE + i]);
+    }
+
+    g_free(i8259);
+
+    serial_hds_isa_init(isa_bus, 0, 1);
+}
+
+static void microvm_ioapic_init(MicrovmMachineState *mms)
+{
+    qemu_irq *ioapic_irq;
+    DeviceState *ioapic_dev;
+    SysBusDevice *d;
+    int i;
+
+    assert(kvm_irqchip_in_kernel());
+    ioapic_irq = g_new0(qemu_irq, IOAPIC_NUM_PINS);
+    kvm_pc_setup_irq_routing(true);
+
+    assert(kvm_ioapic_in_kernel());
+    ioapic_dev = qdev_create(NULL, "kvm-ioapic");
+
+    object_property_add_child(qdev_get_machine(),
+                              "ioapic", OBJECT(ioapic_dev), NULL);
+
+    qdev_init_nofail(ioapic_dev);
+    d = SYS_BUS_DEVICE(ioapic_dev);
+    sysbus_mmio_map(d, 0, IO_APIC_DEFAULT_ADDRESS);
+
+    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+        ioapic_irq[i] = qdev_get_gpio_in(ioapic_dev, i);
+    }
+
+    mms->gsi = qemu_allocate_irqs(microvm_gsi_handler,
+                                  ioapic_irq, IOAPIC_NUM_PINS);
+
+    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
+        sysbus_create_simple("virtio-mmio",
+                             VIRTIO_MMIO_BASE + i * 512,
+                             mms->gsi[VIRTIO_IRQ_BASE + i]);
+    }
+}
+
+static void microvm_memory_init(MicrovmMachineState *mms)
+{
+    MachineState *machine = MACHINE(mms);
+    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
+    MemoryRegion *system_memory = get_system_memory();
+
+    if (machine->ram_size > MICROVM_MAX_BELOW_4G) {
+        mms->above_4g_mem_size = machine->ram_size - MICROVM_MAX_BELOW_4G;
+        mms->below_4g_mem_size = MICROVM_MAX_BELOW_4G;
+    } else {
+        mms->above_4g_mem_size = 0;
+        mms->below_4g_mem_size = machine->ram_size;
+    }
+
+    ram = g_malloc(sizeof(*ram));
+    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
+                                         machine->ram_size);
+
+    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
+    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
+                             0, mms->below_4g_mem_size);
+    memory_region_add_subregion(system_memory, 0, ram_below_4g);
+
+    e820_add_entry(0, mms->below_4g_mem_size, E820_RAM);
+
+    if (mms->above_4g_mem_size > 0) {
+        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
+        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
+                                 mms->below_4g_mem_size,
+                                 mms->above_4g_mem_size);
+        memory_region_add_subregion(system_memory, 0x100000000ULL,
+                                    ram_above_4g);
+        e820_add_entry(0x100000000ULL, mms->above_4g_mem_size, E820_RAM);
+    }
+}
+
+static void microvm_cpus_init(const char *typename, Error **errp)
+{
+    int i;
+
+    for (i = 0; i < smp_cpus; i++) {
+        Object *cpu = NULL;
+        Error *local_err = NULL;
+
+        cpu = object_new(typename);
+
+        object_property_set_uint(cpu, i, "apic-id", &local_err);
+        object_property_set_bool(cpu, true, "realized", &local_err);
+
+        object_unref(cpu);
+        error_propagate(errp, local_err);
+    }
+}
+
+static void microvm_machine_state_init(MachineState *machine)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
+    Error *local_err = NULL;
+
+    if (machine->kernel_filename == NULL) {
+        error_report("missing kernel image file name, required by microvm");
+        exit(1);
+    }
+
+    microvm_memory_init(mms);
+
+    microvm_cpus_init(machine->cpu_type, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        exit(1);
+    }
+
+    if (mms->legacy) {
+        microvm_legacy_init(mms);
+    } else {
+        microvm_ioapic_init(mms);
+    }
+
+    kvmclock_create();
+
+    if (!pvh_load_elfboot(machine->kernel_filename, NULL, NULL)) {
+        error_report("Error while loading elf kernel");
+        exit(1);
+    }
+
+    if (machine->initrd_filename) {
+        uint32_t initrd_max;
+        gsize initrd_size;
+        gchar *initrd_data;
+        GError *gerr = NULL;
+
+        if (!g_file_get_contents(machine->initrd_filename, &initrd_data,
+                                 &initrd_size, &gerr)) {
+            error_report("qemu: error reading initrd %s: %s\n",
+                         machine->initrd_filename, gerr->message);
+            exit(1);
+        }
+
+        initrd_max = mms->below_4g_mem_size - HIMEM_START;
+        if (initrd_size >= initrd_max) {
+            error_report("qemu: initrd is too large, cannot support."
+                         "(max: %"PRIu32", need %"PRId64")\n",
+                         initrd_max, (uint64_t)initrd_size);
+            exit(1);
+        }
+
+        address_space_write(&address_space_memory,
+                            HIMEM_START, MEMTXATTRS_UNSPECIFIED,
+                            (uint8_t *) initrd_data, initrd_size);
+
+        g_free(initrd_data);
+
+        mms->initrd_addr = HIMEM_START;
+        mms->initrd_size = initrd_size;
+    }
+
+    mms->elf_entry = pvh_get_start_addr();
+}
+
+static gchar *microvm_get_mmio_cmdline(gchar *name)
+{
+    gchar *cmdline;
+    gchar *separator;
+    long int index;
+    int ret;
+
+    separator = g_strrstr(name, ".");
+    if (!separator) {
+        return NULL;
+    }
+
+    if (qemu_strtol(separator + 1, NULL, 10, &index) != 0) {
+        return NULL;
+    }
+
+    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
+    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
+                     " virtio_mmio.device=512@0x%lx:%ld",
+                     VIRTIO_MMIO_BASE + index * 512,
+                     VIRTIO_IRQ_BASE + index);
+    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
+        g_free(cmdline);
+        return NULL;
+    }
+
+    return cmdline;
+}
+
+static void microvm_setup_pvh(MicrovmMachineState *mms,
+                              const gchar *kernel_cmdline)
+{
+    struct hvm_memmap_table_entry *memmap_table;
+    struct hvm_start_info *start_info;
+    BusState *bus;
+    BusChild *kid;
+    gchar *cmdline;
+    int cmdline_len;
+    int memmap_entries;
+    int i;
+
+    cmdline = g_strdup(kernel_cmdline);
+
+    /*
+     * Find MMIO transports with attached devices, and add them to the kernel
+     * command line.
+     */
+    bus = sysbus_get_default();
+    QTAILQ_FOREACH(kid, &bus->children, sibling) {
+        DeviceState *dev = kid->child;
+        ObjectClass *class = object_get_class(OBJECT(dev));
+
+        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
+            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
+            VirtioBusState *mmio_virtio_bus = &mmio->bus;
+            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
+
+            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
+                gchar *mmio_cmdline = microvm_get_mmio_cmdline(mmio_bus->name);
+                if (mmio_cmdline) {
+                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
+                    g_free(mmio_cmdline);
+                    g_free(cmdline);
+                    cmdline = newcmd;
+                }
+            }
+        }
+    }
+
+    cmdline_len = strlen(cmdline);
+
+    address_space_write(&address_space_memory,
+                        KERNEL_CMDLINE_START, MEMTXATTRS_UNSPECIFIED,
+                        (uint8_t *) cmdline, cmdline_len);
+
+    g_free(cmdline);
+
+    memmap_entries = e820_get_num_entries();
+    memmap_table = g_new0(struct hvm_memmap_table_entry, memmap_entries);
+    for (i = 0; i < memmap_entries; i++) {
+        uint64_t address, length;
+        struct hvm_memmap_table_entry *entry = &memmap_table[i];
+
+        if (e820_get_entry(i, E820_RAM, &address, &length)) {
+            entry->addr = address;
+            entry->size = length;
+            entry->type = E820_RAM;
+            entry->reserved = 0;
+        }
+    }
+
+    address_space_write(&address_space_memory,
+                        MEMMAP_START, MEMTXATTRS_UNSPECIFIED,
+                        (uint8_t *) memmap_table,
+                        memmap_entries * sizeof(struct hvm_memmap_table_entry));
+
+    g_free(memmap_table);
+
+    start_info = g_malloc0(sizeof(struct hvm_start_info));
+
+    start_info->magic = XEN_HVM_START_MAGIC_VALUE;
+    start_info->version = 1;
+
+    start_info->nr_modules = 0;
+    start_info->cmdline_paddr = KERNEL_CMDLINE_START;
+    start_info->memmap_entries = memmap_entries;
+    start_info->memmap_paddr = MEMMAP_START;
+
+    if (mms->initrd_addr) {
+        struct hvm_modlist_entry *entry = g_new0(struct hvm_modlist_entry, 1);
+
+        entry->paddr = mms->initrd_addr;
+        entry->size = mms->initrd_size;
+
+        address_space_write(&address_space_memory,
+                            MODLIST_START, MEMTXATTRS_UNSPECIFIED,
+                            (uint8_t *) entry,
+                            sizeof(struct hvm_modlist_entry));
+        g_free(entry);
+
+        start_info->nr_modules = 1;
+        start_info->modlist_paddr = MODLIST_START;
+    } else {
+        start_info->nr_modules = 0;
+    }
+
+    address_space_write(&address_space_memory,
+                        PVH_START_INFO, MEMTXATTRS_UNSPECIFIED,
+                        (uint8_t *) start_info,
+                        sizeof(struct hvm_start_info));
+
+    g_free(start_info);
+}
+
+static void microvm_init_page_tables(void)
+{
+    uint64_t val = 0;
+    int i;
+
+    val = PDPTE_START | 0x03;
+    address_space_write(&address_space_memory,
+                        PML4_START, MEMTXATTRS_UNSPECIFIED,
+                        (uint8_t *) &val, 8);
+    val = PDE_START | 0x03;
+    address_space_write(&address_space_memory,
+                        PDPTE_START, MEMTXATTRS_UNSPECIFIED,
+                        (uint8_t *) &val, 8);
+
+    for (i = 0; i < 512; i++) {
+        val = (i << 21) + 0x83;
+        address_space_write(&address_space_memory,
+                            PDE_START + (i * 8), MEMTXATTRS_UNSPECIFIED,
+                            (uint8_t *) &val, 8);
+    }
+}
+
+static void microvm_cpu_reset(CPUState *cs, uint64_t elf_entry)
+{
+    X86CPU *cpu = X86_CPU(cs);
+    CPUX86State *env = &cpu->env;
+    struct SegmentCache seg_code = { .selector = 0x8,
+                                     .base = 0x0,
+                                     .limit = 0xffffffff,
+                                     .flags = 0xc09b00 };
+    struct SegmentCache seg_data = { .selector = 0x10,
+                                     .base = 0x0,
+                                     .limit = 0xffffffff,
+                                     .flags = 0xc09300 };
+    struct SegmentCache seg_tr = { .selector = 0x18,
+                                   .base = 0x0,
+                                   .limit = 0xffff,
+                                   .flags = 0x8b00 };
+
+    memcpy(&env->segs[R_CS], &seg_code, sizeof(struct SegmentCache));
+    memcpy(&env->segs[R_DS], &seg_data, sizeof(struct SegmentCache));
+    memcpy(&env->segs[R_ES], &seg_data, sizeof(struct SegmentCache));
+    memcpy(&env->segs[R_FS], &seg_data, sizeof(struct SegmentCache));
+    memcpy(&env->segs[R_GS], &seg_data, sizeof(struct SegmentCache));
+    memcpy(&env->segs[R_SS], &seg_data, sizeof(struct SegmentCache));
+    memcpy(&env->tr, &seg_tr, sizeof(struct SegmentCache));
+
+    env->regs[R_EBX] = PVH_START_INFO;
+
+    cpu_set_pc(cs, elf_entry);
+    cpu_x86_update_cr3(env, 0);
+    cpu_x86_update_cr4(env, 0);
+    cpu_x86_update_cr0(env, CR0_PE_MASK);
+
+    x86_update_hflags(env);
+}
+
+static void microvm_mptable_setup(MicrovmMachineState *mms)
+{
+    char *mptable;
+    int size;
+
+    mptable = mptable_generate(smp_cpus, EBDA_START, &size);
+    address_space_write(&address_space_memory,
+                        EBDA_START, MEMTXATTRS_UNSPECIFIED,
+                        (uint8_t *) mptable, size);
+    g_free(mptable);
+}
+
+static bool microvm_machine_get_legacy(Object *obj, Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    return mms->legacy;
+}
+
+static void microvm_machine_set_legacy(Object *obj, bool value, Error **errp)
+{
+    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+    mms->legacy = value;
+}
+
+static void microvm_machine_reset(void)
+{
+    MachineState *machine = MACHINE(qdev_get_machine());
+    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
+    CPUState *cs;
+    X86CPU *cpu;
+
+    qemu_devices_reset();
+
+    microvm_mptable_setup(mms);
+    microvm_setup_pvh(mms, machine->kernel_cmdline);
+    microvm_init_page_tables();
+
+    CPU_FOREACH(cs) {
+        cpu = X86_CPU(cs);
+
+        if (cpu->apic_state) {
+            device_reset(cpu->apic_state);
+        }
+
+        microvm_cpu_reset(cs, mms->elf_entry);
+    }
+}
+
+static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
+{
+    CPUState *cs;
+
+    CPU_FOREACH(cs) {
+        X86CPU *cpu = X86_CPU(cs);
+
+        if (!cpu->apic_state) {
+            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
+        } else {
+            apic_deliver_nmi(cpu->apic_state);
+        }
+    }
+}
+
+static void microvm_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+    NMIClass *nc = NMI_CLASS(oc);
+
+    mc->init = microvm_machine_state_init;
+
+    mc->family = "microvm_i386";
+    mc->desc = "Microvm (i386)";
+    mc->units_per_default_bus = 1;
+    mc->no_floppy = 1;
+    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
+    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
+    mc->max_cpus = 288;
+    mc->has_hotpluggable_cpus = false;
+    mc->auto_enable_numa_with_memhp = false;
+    mc->default_cpu_type = X86_CPU_TYPE_NAME("host");
+    mc->nvdimm_supported = false;
+    mc->default_machine_opts = "accel=kvm";
+
+    /* Machine class handlers */
+    mc->reset = microvm_machine_reset;
+
+    /* NMI handler */
+    nc->nmi_monitor_handler = x86_nmi;
+
+    object_class_property_add_bool(oc, MICROVM_MACHINE_LEGACY,
+                                   microvm_machine_get_legacy,
+                                   microvm_machine_set_legacy,
+                                   &error_abort);
+}
+
+static const TypeInfo microvm_machine_info = {
+    .name          = TYPE_MICROVM_MACHINE,
+    .parent        = TYPE_MACHINE,
+    .instance_size = sizeof(MicrovmMachineState),
+    .class_size    = sizeof(MicrovmMachineClass),
+    .class_init    = microvm_class_init,
+    .interfaces = (InterfaceInfo[]) {
+         { TYPE_NMI },
+         { }
+    },
+};
+
+static void microvm_machine_init(void)
+{
+    type_register_static(&microvm_machine_info);
+}
+type_init(microvm_machine_init);
diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
new file mode 100644
index 0000000000..fd6f370997
--- /dev/null
+++ b/include/hw/i386/microvm.h
@@ -0,0 +1,82 @@
+/*
+ * Copyright (c) 2018 Intel Corporation
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_I386_MICROVM_H
+#define HW_I386_MICROVM_H
+
+#include "qemu-common.h"
+#include "exec/hwaddr.h"
+#include "qemu/notify.h"
+
+#include "hw/boards.h"
+
+/* Microvm memory layout */
+#define PVH_START_INFO        0x6000
+#define MEMMAP_START          0x7000
+#define MODLIST_START         0x7800
+#define BOOT_STACK_POINTER    0x8ff0
+#define PML4_START            0x9000
+#define PDPTE_START           0xa000
+#define PDE_START             0xb000
+#define KERNEL_CMDLINE_START  0x20000
+#define EBDA_START            0x9fc00
+#define HIMEM_START           0x100000
+#define MICROVM_MAX_BELOW_4G  0xe0000000
+
+/* Platform virtio definitions */
+#define VIRTIO_MMIO_BASE      0xd0000000
+#define VIRTIO_IRQ_BASE       5
+#define VIRTIO_NUM_TRANSPORTS 8
+#define VIRTIO_CMDLINE_MAXLEN 64
+
+/* Machine type options */
+#define MICROVM_MACHINE_LEGACY "legacy"
+
+typedef struct {
+    MachineClass parent;
+    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
+                                           DeviceState *dev);
+} MicrovmMachineClass;
+
+typedef struct {
+    MachineState parent;
+    qemu_irq *gsi;
+
+    /* RAM size */
+    ram_addr_t below_4g_mem_size;
+    ram_addr_t above_4g_mem_size;
+
+    /* Kernel ELF entry. On reset, vCPUs RIP will be set to this */
+    uint64_t elf_entry;
+
+    /* Optional initrd start address and size */
+    uint64_t initrd_addr;
+    uint32_t initrd_size;
+
+    /* Legacy mode based on an ISA bus. Useful for debugging */
+    bool legacy;
+} MicrovmMachineState;
+
+#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
+#define MICROVM_MACHINE(obj) \
+    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
+#define MICROVM_MACHINE_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
+#define MICROVM_MACHINE_CLASS(class) \
+    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
+
+#endif
-- 
2.21.0




* Re: [Qemu-devel] [PATCH v3 4/4] hw/i386: Introduce the microvm machine type
  2019-07-02 12:11 ` [Qemu-devel] [PATCH v3 4/4] hw/i386: Introduce the microvm machine type Sergio Lopez
@ 2019-07-02 13:58   ` Gerd Hoffmann
  2019-07-25 10:47   ` Paolo Bonzini
  1 sibling, 0 replies; 68+ messages in thread
From: Gerd Hoffmann @ 2019-07-02 13:58 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: ehabkost, maran.wilson, mst, qemu-devel, pbonzini, sgarzare, rth

  Hi,

> +#define MICROVM_MAX_BELOW_4G  0xe0000000
> +
> +/* Platform virtio definitions */
> +#define VIRTIO_MMIO_BASE      0xd0000000

That isn't going to fly ...
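
A guess at why this pairing is a problem (the reply does not spell it
out, so treat this as an assumption): microvm_memory_init() maps up to
MICROVM_MAX_BELOW_4G bytes of RAM starting at address 0, so any guest
with more than 0xd0000000 bytes of RAM gets low RAM that runs into the
virtio-mmio window. A small standalone check, with helper names invented
for this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from include/hw/i386/microvm.h in this patch. */
#define MICROVM_MAX_BELOW_4G  0xe0000000ULL
#define VIRTIO_MMIO_BASE      0xd0000000ULL

/* The window starts below the low-RAM limit, so an overlap is possible. */
static_assert(VIRTIO_MMIO_BASE < MICROVM_MAX_BELOW_4G,
              "virtio-mmio window lies below the low-RAM limit");

/* Size of the low RAM alias, mirroring the split in microvm_memory_init(). */
static uint64_t below_4g_size(uint64_t ram_size)
{
    return ram_size > MICROVM_MAX_BELOW_4G ? MICROVM_MAX_BELOW_4G : ram_size;
}

/* True if low RAM [0, below_4g_size) reaches into the virtio-mmio window. */
static bool ram_overlaps_virtio_mmio(uint64_t ram_size)
{
    return below_4g_size(ram_size) > VIRTIO_MMIO_BASE;
}
```

With these values, a guest with 4 GiB of RAM would have low RAM covering
0xd0000000-0xdfffffff, the same range where the first transport is
mapped.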

I'd also suggest adding a microvm.txt file to docs/ with the specification
(io memory, io ports, memory layout in pvh mode, in firmware mode, ...)
and usage information.

Cutting & pasting the bits sprinkled all over the commit messages and
cover letter would be a good start.

cheers,
  Gerd




* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-02 12:11 [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type Sergio Lopez
                   ` (3 preceding siblings ...)
  2019-07-02 12:11 ` [Qemu-devel] [PATCH v3 4/4] hw/i386: Introduce the microvm machine type Sergio Lopez
@ 2019-07-02 15:01 ` no-reply
  2019-07-02 15:23 ` Peter Maydell
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 68+ messages in thread
From: no-reply @ 2019-07-02 15:01 UTC (permalink / raw)
  To: slp
  Cc: ehabkost, slp, maran.wilson, mst, qemu-devel, kraxel, pbonzini,
	sgarzare, rth

Patchew URL: https://patchew.org/QEMU/20190702121106.28374-1-slp@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Message-id: 20190702121106.28374-1-slp@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]      patchew/20190702113414.6896-1-armbru@redhat.com -> patchew/20190702113414.6896-1-armbru@redhat.com
Switched to a new branch 'test'
8ebe540 hw/i386: Introduce the microvm machine type
ac71c2a hw/i386: Factorize PVH related functions
faeccbd hw/i386: Add an Intel MPTable generator
7540b93 hw/virtio: Factorize virtio-mmio headers

=== OUTPUT BEGIN ===
1/4 Checking commit 7540b9358a0f (hw/virtio: Factorize virtio-mmio headers)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#66: 
new file mode 100644

total: 0 errors, 1 warnings, 105 lines checked

Patch 1/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
2/4 Checking commit faeccbd2c589 (hw/i386: Add an Intel MPTable generator)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#16: 
new file mode 100644

total: 0 errors, 1 warnings, 374 lines checked

Patch 2/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
3/4 Checking commit ac71c2af3972 (hw/i386: Factorize PVH related functions)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#186: 
new file mode 100644

ERROR: do not initialise statics to 0 or NULL
#210: FILE: hw/i386/pvh.c:20:
+static size_t pvh_start_addr = 0;

total: 1 errors, 1 warnings, 281 lines checked

Patch 3/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/4 Checking commit 8ebe540c4430 (hw/i386: Introduce the microvm machine type)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#67: 
new file mode 100644

ERROR: Error messages should not contain newlines
#291: FILE: hw/i386/microvm.c:220:
+            error_report("qemu: error reading initrd %s: %s\n",

ERROR: Error messages should not contain newlines
#299: FILE: hw/i386/microvm.c:228:
+                         "(max: %"PRIu32", need %"PRId64")\n",

total: 2 errors, 1 warnings, 653 lines checked

Patch 4/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190702121106.28374-1-slp@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-02 12:11 [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type Sergio Lopez
                   ` (4 preceding siblings ...)
  2019-07-02 15:01 ` [Qemu-devel] [PATCH v3 0/4] " no-reply
@ 2019-07-02 15:23 ` Peter Maydell
  2019-07-02 17:34   ` Sergio Lopez
  2019-07-02 15:30 ` no-reply
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2019-07-02 15:23 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: Eduardo Habkost, maran.wilson, Michael S. Tsirkin,
	QEMU Developers, Gerd Hoffmann, Paolo Bonzini,
	Stefano Garzarella, Richard Henderson

On Tue, 2 Jul 2019 at 13:14, Sergio Lopez <slp@redhat.com> wrote:
>
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> constructed after the machine model implemented by the latter.
>
> Its main purpose is to provide users with a KVM-only machine type with fast
> boot times, a minimal attack surface (measured as the number of IO ports
> and MMIO regions exposed to the guest) and a small footprint (especially
> when combined with the ongoing QEMU modularization effort).
>
> Normally, other than the device support provided by KVM itself,
> microvm only supports virtio-mmio devices. Microvm also includes a
> legacy mode, which adds an ISA bus with a 16550A serial port, useful
> for being able to see the early boot kernel messages.

Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
a bit deprecated and tends not to support all the features that
virtio-pci does. It was introduced mostly as a stopgap while we
didn't have PCI support in the aarch64 virt machine, and it remains
for legacy reasons ("we don't like to break existing working setups")
rather than as a recommended config for new systems.

thanks
-- PMM



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-02 12:11 [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type Sergio Lopez
                   ` (5 preceding siblings ...)
  2019-07-02 15:23 ` Peter Maydell
@ 2019-07-02 15:30 ` no-reply
  2019-07-03  9:58 ` Stefan Hajnoczi
  2019-08-29  9:02 ` Jing Liu
  8 siblings, 0 replies; 68+ messages in thread
From: no-reply @ 2019-07-02 15:30 UTC (permalink / raw)
  To: slp
  Cc: ehabkost, slp, maran.wilson, mst, qemu-devel, kraxel, pbonzini,
	sgarzare, rth

Patchew URL: https://patchew.org/QEMU/20190702121106.28374-1-slp@redhat.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===
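
The output that follows interleaves TAP-style result lines from several test binaries running in parallel with ASan warnings, so a failing line is easy to miss. A minimal sketch for tallying it is given here, assuming result lines have the shape `PASS <n> <test-binary> <path>` as seen in this log. The `FAIL`, `SKIP` and `ERROR` statuses and the helper names `summarize` and `stack_size_consistent` are assumptions made for illustration; they are not part of Patchew or QEMU.

```python
import re
from collections import Counter

# Result line shape taken from this log, e.g.
#   "PASS 2 fdc-test /x86_64/fdc/no_media_on_start"
# Statuses other than PASS are an assumption; only PASS appears in the excerpt.
RESULT_RE = re.compile(r"^(PASS|FAIL|SKIP|ERROR)\s+(\d+)\s+(\S+)\s+(\S+)")

# ASan "__asan_handle_no_return" warning, e.g.
#   "stack top: 0x...; bottom 0x...; size: 0x... (decimal)"
STACK_RE = re.compile(
    r"stack top: (0x[0-9a-fA-F]+); bottom (0x[0-9a-fA-F]+); "
    r"size: (0x[0-9a-fA-F]+) \((\d+)\)"
)


def summarize(lines):
    """Count results per (test binary, status) and collect FAIL/ERROR lines."""
    counts = Counter()
    problems = []
    for line in lines:
        m = RESULT_RE.match(line)
        if m is None:
            continue  # ASan warnings, command lines, '---' elision markers
        status, _num, binary, _path = m.groups()
        counts[(binary, status)] += 1
        if status in ("FAIL", "ERROR"):
            problems.append(line.rstrip())
    return counts, problems


def stack_size_consistent(line):
    """Check that the reported size equals top - bottom; None if no match."""
    m = STACK_RE.search(line)
    if m is None:
        return None
    top, bottom, size_hex = (int(g, 16) for g in m.groups()[:3])
    size_dec = int(m.group(4))
    return top - bottom == size_hex == size_dec
```

Feeding the saved log to `summarize` line by line gives per-binary counts plus any failing lines. `stack_size_consistent` only confirms that the "size" ASan prints is the distance between the two reported addresses, which in these warnings is hundreds of gigabytes.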

PASS 2 fdc-test /x86_64/fdc/no_media_on_start
PASS 3 fdc-test /x86_64/fdc/read_without_media
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/check-qlit -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="check-qlit" 
==7808==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 fdc-test /x86_64/fdc/media_change
PASS 5 fdc-test /x86_64/fdc/sense_interrupt
PASS 6 fdc-test /x86_64/fdc/relative_seek
---
PASS 32 test-opts-visitor /visitor/opts/range/beyond
PASS 33 test-opts-visitor /visitor/opts/dict/unvisited
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-coroutine -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-coroutine" 
==7851==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==7851==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc0ad0a000; bottom 0x7fa44def8000; size: 0x0057bce12000 (376831025152)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-coroutine /basic/no-dangling-access
---
PASS 11 test-aio /aio/event/wait
PASS 12 test-aio /aio/event/flush
PASS 13 test-aio /aio/event/wait/no-flush-cb
==7866==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 14 test-aio /aio/timer/schedule
PASS 15 test-aio /aio/coroutine/queue-chaining
PASS 16 test-aio /aio-gsource/flush
---
PASS 28 test-aio /aio-gsource/timer/schedule
PASS 13 fdc-test /x86_64/fdc/fuzz-registers
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-aio-multithread -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-aio-multithread" 
==7873==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-aio-multithread /aio/multi/lifecycle
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/ide-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="ide-test" 
==7890==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 test-aio-multithread /aio/multi/schedule
PASS 1 ide-test /x86_64/ide/identify
PASS 3 test-aio-multithread /aio/multi/mutex/contended
==7901==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 ide-test /x86_64/ide/flush
==7912==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 ide-test /x86_64/ide/bmdma/simple_rw
==7918==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 test-aio-multithread /aio/multi/mutex/handoff
PASS 4 ide-test /x86_64/ide/bmdma/trim
==7929==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 test-aio-multithread /aio/multi/mutex/mcs
PASS 5 ide-test /x86_64/ide/bmdma/short_prdt
==7940==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 test-aio-multithread /aio/multi/mutex/pthread
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-throttle -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-throttle" 
PASS 6 ide-test /x86_64/ide/bmdma/one_sector_short_prdt
==7948==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-throttle /throttle/leak_bucket
PASS 2 test-throttle /throttle/compute_wait
PASS 3 test-throttle /throttle/init
---
PASS 14 test-throttle /throttle/config/max
PASS 15 test-throttle /throttle/config/iops_size
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-thread-pool -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-thread-pool" 
==7951==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==7955==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-thread-pool /thread-pool/submit
PASS 2 test-thread-pool /thread-pool/submit-aio
PASS 3 test-thread-pool /thread-pool/submit-co
PASS 4 test-thread-pool /thread-pool/submit-many
PASS 7 ide-test /x86_64/ide/bmdma/long_prdt
==8027==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8027==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd45d06000; bottom 0x7f83e57fe000; size: 0x007960508000 (521306931200)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 8 ide-test /x86_64/ide/bmdma/no_busmaster
PASS 5 test-thread-pool /thread-pool/cancel
PASS 9 ide-test /x86_64/ide/flush/nodev
==8038==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 ide-test /x86_64/ide/flush/empty_drive
PASS 6 test-thread-pool /thread-pool/cancel-async
==8043==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-hbitmap -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-hbitmap" 
PASS 1 test-hbitmap /hbitmap/granularity
PASS 2 test-hbitmap /hbitmap/size/0
---
PASS 4 test-hbitmap /hbitmap/iter/empty
PASS 11 ide-test /x86_64/ide/flush/retry_pci
PASS 5 test-hbitmap /hbitmap/iter/partial
==8054==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 test-hbitmap /hbitmap/iter/granularity
PASS 7 test-hbitmap /hbitmap/iter/iter_and_reset
PASS 8 test-hbitmap /hbitmap/get/all
---
PASS 14 test-hbitmap /hbitmap/set/twice
PASS 15 test-hbitmap /hbitmap/set/overlap
PASS 16 test-hbitmap /hbitmap/reset/empty
==8060==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 17 test-hbitmap /hbitmap/reset/general
PASS 13 ide-test /x86_64/ide/cdrom/pio
PASS 18 test-hbitmap /hbitmap/reset/all
---
PASS 28 test-hbitmap /hbitmap/truncate/shrink/medium
PASS 29 test-hbitmap /hbitmap/truncate/shrink/large
PASS 30 test-hbitmap /hbitmap/meta/zero
==8066==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 14 ide-test /x86_64/ide/cdrom/pio_large
==8072==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 15 ide-test /x86_64/ide/cdrom/dma
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/ahci-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="ahci-test" 
==8086==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 31 test-hbitmap /hbitmap/meta/one
PASS 32 test-hbitmap /hbitmap/meta/byte
PASS 33 test-hbitmap /hbitmap/meta/word
PASS 1 ahci-test /x86_64/ahci/sanity
==8092==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 ahci-test /x86_64/ahci/pci_spec
PASS 34 test-hbitmap /hbitmap/meta/sector
PASS 35 test-hbitmap /hbitmap/serialize/align
==8098==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 ahci-test /x86_64/ahci/pci_enable
==8104==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 36 test-hbitmap /hbitmap/serialize/basic
PASS 37 test-hbitmap /hbitmap/serialize/part
PASS 38 test-hbitmap /hbitmap/serialize/zeroes
---
PASS 4 ahci-test /x86_64/ahci/hba_spec
PASS 43 test-hbitmap /hbitmap/next_dirty_area/next_dirty_area_4
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bdrv-drain -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bdrv-drain" 
==8113==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-bdrv-drain /bdrv-drain/nested
PASS 2 test-bdrv-drain /bdrv-drain/multiparent
PASS 3 test-bdrv-drain /bdrv-drain/set_aio_context
---
PASS 20 test-bdrv-drain /bdrv-drain/iothread/drain_subtree
PASS 21 test-bdrv-drain /bdrv-drain/blockjob/drain_all
PASS 22 test-bdrv-drain /bdrv-drain/blockjob/drain
==8110==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 23 test-bdrv-drain /bdrv-drain/blockjob/drain_subtree
PASS 24 test-bdrv-drain /bdrv-drain/blockjob/error/drain_all
PASS 25 test-bdrv-drain /bdrv-drain/blockjob/error/drain
---
PASS 39 test-bdrv-drain /bdrv-drain/attach/drain
PASS 5 ahci-test /x86_64/ahci/hba_enable
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bdrv-graph-mod -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bdrv-graph-mod" 
==8159==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-bdrv-graph-mod /bdrv-graph-mod/update-perm-tree
PASS 2 test-bdrv-graph-mod /bdrv-graph-mod/should-update-child
==8157==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-blockjob -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-blockjob" 
==8168==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-blockjob /blockjob/ids
PASS 2 test-blockjob /blockjob/cancel/created
PASS 3 test-blockjob /blockjob/cancel/running
---
PASS 8 test-blockjob /blockjob/cancel/concluded
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-blockjob-txn -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-blockjob-txn" 
PASS 6 ahci-test /x86_64/ahci/identify
==8174==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-blockjob-txn /single/success
PASS 2 test-blockjob-txn /single/failure
PASS 3 test-blockjob-txn /single/cancel
---
PASS 6 test-blockjob-txn /pair/cancel
PASS 7 test-blockjob-txn /pair/fail-cancel-race
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-block-backend -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-block-backend" 
==8176==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8181==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-block-backend /block-backend/drain_aio_error
PASS 2 test-block-backend /block-backend/drain_all_aio_error
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-block-iothread -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-block-iothread" 
PASS 7 ahci-test /x86_64/ahci/max
==8190==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-block-iothread /sync-op/pread
PASS 2 test-block-iothread /sync-op/pwrite
PASS 3 test-block-iothread /sync-op/load_vmstate
---
PASS 15 test-block-iothread /propagate/diamond
PASS 16 test-block-iothread /propagate/mirror
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-image-locking -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-image-locking" 
==8192==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8212==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-image-locking /image-locking/basic
PASS 2 test-image-locking /image-locking/set-perm-abort
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-x86-cpuid -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-x86-cpuid" 
---
PASS 4 test-xbzrle /xbzrle/encode_decode_1_byte
PASS 5 test-xbzrle /xbzrle/encode_decode_overflow
PASS 8 ahci-test /x86_64/ahci/reset
==8228==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8228==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc98ab7000; bottom 0x7f6a659fe000; size: 0x0092330b9000 (627921620992)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 6 test-xbzrle /xbzrle/encode_decode
---
PASS 133 test-cutils /cutils/strtosz/erange
PASS 134 test-cutils /cutils/strtosz/metric
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-shift128 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-shift128" 
==8240==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-shift128 /host-utils/test_lshift
PASS 2 test-shift128 /host-utils/test_rshift
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-mul64 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-mul64" 
==8240==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd869e8000; bottom 0x7f71117fe000; size: 0x008c751ea000 (603260362752)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-mul64 /host-utils/mulu64
---
PASS 9 test-int128 /int128/int128_gt
PASS 10 test-int128 /int128/int128_rshift
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/rcutorture -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="rcutorture" 
==8262==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8262==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fffd5dde000; bottom 0x7f7850bfe000; size: 0x0087851e0000 (582053920768)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 rcutorture /rcu/torture/1reader
PASS 11 ahci-test /x86_64/ahci/io/pio/lba28/simple/high
==8295==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 rcutorture /rcu/torture/10readers
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-rcu-list -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-list" 
==8295==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe6e62e000; bottom 0x7f1b1fbfe000; size: 0x00e34ea30000 (976276881408)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 12 ahci-test /x86_64/ahci/io/pio/lba28/double/zero
==8308==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-rcu-list /rcu/qlist/single-threaded
==8308==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc54a9f000; bottom 0x7f5c1bdfe000; size: 0x00a038ca1000 (688147533824)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 2 test-rcu-list /rcu/qlist/short-few
PASS 13 ahci-test /x86_64/ahci/io/pio/lba28/double/low
==8341==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8341==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc4b8b5000; bottom 0x7f782c7fe000; size: 0x00841f0b7000 (567456526336)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 14 ahci-test /x86_64/ahci/io/pio/lba28/double/high
==8347==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8347==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffeb2bc8000; bottom 0x7fd572124000; size: 0x002940aa4000 (177178558464)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 3 test-rcu-list /rcu/qlist/long-many
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-rcu-simpleq -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-simpleq" 
PASS 15 ahci-test /x86_64/ahci/io/pio/lba28/long/zero
PASS 1 test-rcu-simpleq /rcu/qsimpleq/single-threaded
==8360==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8360==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc5ebf2000; bottom 0x7f8d6cdfe000; size: 0x006ef1df4000 (476504342528)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 2 test-rcu-simpleq /rcu/qsimpleq/short-few
PASS 16 ahci-test /x86_64/ahci/io/pio/lba28/long/low
==8393==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8393==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc1e90d000; bottom 0x7fef47124000; size: 0x000cd77e9000 (55155003392)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 17 ahci-test /x86_64/ahci/io/pio/lba28/long/high
PASS 3 test-rcu-simpleq /rcu/qsimpleq/long-many
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-rcu-tailq -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-tailq" 
==8399==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 18 ahci-test /x86_64/ahci/io/pio/lba28/short/zero
PASS 1 test-rcu-tailq /rcu/qtailq/single-threaded
==8412==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 test-rcu-tailq /rcu/qtailq/short-few
PASS 19 ahci-test /x86_64/ahci/io/pio/lba28/short/low
==8445==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 20 ahci-test /x86_64/ahci/io/pio/lba28/short/high
PASS 3 test-rcu-tailq /rcu/qtailq/long-many
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qdist -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qdist" 
==8451==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-qdist /qdist/none
PASS 2 test-qdist /qdist/pr
PASS 3 test-qdist /qdist/single/empty
---
PASS 7 test-qdist /qdist/binning/expand
PASS 8 test-qdist /qdist/binning/shrink
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qht -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qht" 
==8451==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffcd1fb1000; bottom 0x7f8bae7fe000; size: 0x0071237b3000 (485926580224)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 21 ahci-test /x86_64/ahci/io/pio/lba48/simple/zero
==8466==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8466==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd0b06f000; bottom 0x7fd8d85fe000; size: 0x002432a71000 (155468632064)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 22 ahci-test /x86_64/ahci/io/pio/lba48/simple/low
==8472==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8472==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe2c664000; bottom 0x7f11299fe000; size: 0x00ed02c66000 (1017953804288)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 23 ahci-test /x86_64/ahci/io/pio/lba48/simple/high
==8478==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8478==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffdb1ded000; bottom 0x7f37fd1fe000; size: 0x00c5b4bef000 (849140969472)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 24 ahci-test /x86_64/ahci/io/pio/lba48/double/zero
==8484==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8484==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc4f4ff000; bottom 0x7ff9595fe000; size: 0x0002f5f01000 (12716085248)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 25 ahci-test /x86_64/ahci/io/pio/lba48/double/low
==8490==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8490==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffdb07bb000; bottom 0x7ffbc8dfe000; size: 0x0001e79bd000 (8180715520)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 26 ahci-test /x86_64/ahci/io/pio/lba48/double/high
==8496==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8496==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff207e2000; bottom 0x7fb6ffdfe000; size: 0x0048209e4000 (309784887296)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 27 ahci-test /x86_64/ahci/io/pio/lba48/long/zero
==8502==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8502==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc7d92d000; bottom 0x7f0b65b7c000; size: 0x00f117db1000 (1035487350784)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 28 ahci-test /x86_64/ahci/io/pio/lba48/long/low
==8508==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8508==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe6de73000; bottom 0x7fc79a9fe000; size: 0x0036d3475000 (235472900096)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 29 ahci-test /x86_64/ahci/io/pio/lba48/long/high
==8514==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 30 ahci-test /x86_64/ahci/io/pio/lba48/short/zero
==8520==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-qht /qht/mode/default
PASS 31 ahci-test /x86_64/ahci/io/pio/lba48/short/low
PASS 2 test-qht /qht/mode/resize
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qht-par -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qht-par" 
==8526==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 32 ahci-test /x86_64/ahci/io/pio/lba48/short/high
PASS 1 test-qht-par /qht/parallel/2threads-0%updates-1s
==8542==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 33 ahci-test /x86_64/ahci/io/dma/lba28/fragmented
PASS 2 test-qht-par /qht/parallel/2threads-20%updates-1s
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bitops -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bitops" 
==8555==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-bitops /bitops/sextract32
PASS 2 test-bitops /bitops/sextract64
PASS 3 test-bitops /bitops/half_shuffle32
---
PASS 1 check-qom-interface /qom/interface/direct_impl
PASS 2 check-qom-interface /qom/interface/intermediate_impl
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/check-qom-proplist -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="check-qom-proplist" 
==8580==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 check-qom-proplist /qom/proplist/createlist
PASS 2 check-qom-proplist /qom/proplist/createv
PASS 3 check-qom-proplist /qom/proplist/createcmdline
---
PASS 4 test-write-threshold /write-threshold/not-trigger
PASS 5 test-write-threshold /write-threshold/trigger
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-hash -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-hash" 
==8607==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-hash /crypto/hash/iov
PASS 2 test-crypto-hash /crypto/hash/alloc
PASS 3 test-crypto-hash /crypto/hash/prealloc
---
PASS 15 test-crypto-secret /crypto/secret/crypt/missingiv
PASS 16 test-crypto-secret /crypto/secret/crypt/badiv
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-tlscredsx509 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-tlscredsx509" 
==8630==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 37 ahci-test /x86_64/ahci/io/dma/lba28/simple/high
PASS 1 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/perfectserver
PASS 2 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/perfectclient
PASS 3 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca1
==8645==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca2
PASS 38 ahci-test /x86_64/ahci/io/dma/lba28/double/zero
PASS 5 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca3
PASS 6 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badca1
PASS 7 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badca2
PASS 8 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badca3
==8651==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver1
PASS 39 ahci-test /x86_64/ahci/io/dma/lba28/double/low
==8657==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 40 ahci-test /x86_64/ahci/io/dma/lba28/double/high
PASS 10 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver2
==8663==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver3
PASS 41 ahci-test /x86_64/ahci/io/dma/lba28/long/zero
PASS 12 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver4
==8669==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 42 ahci-test /x86_64/ahci/io/dma/lba28/long/low
PASS 13 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver5
PASS 14 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver6
---
PASS 32 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/inactive1
PASS 33 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/inactive2
PASS 34 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/inactive3
==8675==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 35 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/chain1
PASS 36 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/chain2
PASS 37 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/missingca
---
PASS 39 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/missingclient
PASS 43 ahci-test /x86_64/ahci/io/dma/lba28/long/high
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-tlssession -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-tlssession" 
==8682==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-tlssession /qcrypto/tlssession/psk
PASS 44 ahci-test /x86_64/ahci/io/dma/lba28/short/zero
PASS 2 test-crypto-tlssession /qcrypto/tlssession/basicca
==8692==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 test-crypto-tlssession /qcrypto/tlssession/differentca
PASS 45 ahci-test /x86_64/ahci/io/dma/lba28/short/low
==8698==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 test-crypto-tlssession /qcrypto/tlssession/altname1
PASS 46 ahci-test /x86_64/ahci/io/dma/lba28/short/high
==8704==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 test-crypto-tlssession /qcrypto/tlssession/altname2
PASS 47 ahci-test /x86_64/ahci/io/dma/lba48/simple/zero
PASS 6 test-crypto-tlssession /qcrypto/tlssession/altname3
PASS 7 test-crypto-tlssession /qcrypto/tlssession/altname4
==8710==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 test-crypto-tlssession /qcrypto/tlssession/altname5
PASS 48 ahci-test /x86_64/ahci/io/dma/lba48/simple/low
PASS 9 test-crypto-tlssession /qcrypto/tlssession/altname6
==8716==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 49 ahci-test /x86_64/ahci/io/dma/lba48/simple/high
PASS 10 test-crypto-tlssession /qcrypto/tlssession/wildcard1
==8722==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 test-crypto-tlssession /qcrypto/tlssession/wildcard2
PASS 12 test-crypto-tlssession /qcrypto/tlssession/wildcard3
PASS 50 ahci-test /x86_64/ahci/io/dma/lba48/double/zero
==8729==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 51 ahci-test /x86_64/ahci/io/dma/lba48/double/low
==8735==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 13 test-crypto-tlssession /qcrypto/tlssession/wildcard4
PASS 52 ahci-test /x86_64/ahci/io/dma/lba48/double/high
==8741==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 14 test-crypto-tlssession /qcrypto/tlssession/wildcard5
PASS 15 test-crypto-tlssession /qcrypto/tlssession/wildcard6
PASS 16 test-crypto-tlssession /qcrypto/tlssession/cachain
PASS 53 ahci-test /x86_64/ahci/io/dma/lba48/long/zero
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qga -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qga" 
==8748==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-qga /qga/sync-delimited
PASS 2 test-qga /qga/sync
PASS 3 test-qga /qga/ping
---
PASS 16 test-qga /qga/invalid-args
PASS 17 test-qga /qga/fsfreeze-status
PASS 54 ahci-test /x86_64/ahci/io/dma/lba48/long/low
==8760==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 18 test-qga /qga/blacklist
PASS 19 test-qga /qga/config
PASS 20 test-qga /qga/guest-exec
PASS 21 test-qga /qga/guest-exec-invalid
PASS 55 ahci-test /x86_64/ahci/io/dma/lba48/long/high
==8773==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 22 test-qga /qga/guest-get-osinfo
PASS 23 test-qga /qga/guest-get-host-name
PASS 24 test-qga /qga/guest-get-timezone
---
PASS 56 ahci-test /x86_64/ahci/io/dma/lba48/short/zero
PASS 1 test-util-filemonitor /util/filemonitor
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-util-sockets -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-util-sockets" 
==8790==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-util-sockets /util/socket/is-socket/bad
PASS 2 test-util-sockets /util/socket/is-socket/good
PASS 3 test-util-sockets /socket/fd-pass/name/good
---
PASS 4 test-authz-listfile /auth/list/explicit/deny
PASS 5 test-authz-listfile /auth/list/explicit/allow
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-io-task -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-io-task" 
==8818==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-io-task /crypto/task/complete
PASS 2 test-io-task /crypto/task/datafree
PASS 3 test-io-task /crypto/task/failure
---
PASS 5 test-io-channel-file /io/channel/pipe/async
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-io-channel-tls -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-io-channel-tls" 
PASS 58 ahci-test /x86_64/ahci/io/dma/lba48/short/high
==8885==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-io-channel-tls /qio/channel/tls/basic
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-io-channel-command -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-io-channel-command" 
PASS 1 test-io-channel-command /io/channel/command/fifo/sync
---
PASS 17 test-crypto-pbkdf /crypto/pbkdf/nonrfc/sha384/iter1200
PASS 18 test-crypto-pbkdf /crypto/pbkdf/nonrfc/ripemd160/iter1200
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-ivgen -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-ivgen" 
==8906==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-ivgen /crypto/ivgen/plain/1
PASS 2 test-crypto-ivgen /crypto/ivgen/plain/1f2e3d4c
PASS 3 test-crypto-ivgen /crypto/ivgen/plain/1f2e3d4c5b6a7988
---
PASS 1 test-logging /logging/parse_range
PASS 2 test-logging /logging/parse_path
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-replication -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-replication" 
==8947==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8945==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-replication /replication/primary/read
PASS 2 test-replication /replication/primary/write
PASS 61 ahci-test /x86_64/ahci/flush/simple
==8956==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 test-replication /replication/primary/start
PASS 4 test-replication /replication/primary/stop
PASS 5 test-replication /replication/primary/do_checkpoint
PASS 6 test-replication /replication/primary/get_error_all
PASS 62 ahci-test /x86_64/ahci/flush/retry
==8962==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 test-replication /replication/secondary/read
==8967==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 test-replication /replication/secondary/write
PASS 63 ahci-test /x86_64/ahci/flush/migrate
==8976==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8981==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8947==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc17022000; bottom 0x7fa4f2cfc000; size: 0x005724326000 (374269435904)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 9 test-replication /replication/secondary/start
PASS 64 ahci-test /x86_64/ahci/migrate/sanity
==9008==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9013==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 test-replication /replication/secondary/stop
PASS 65 ahci-test /x86_64/ahci/migrate/dma/simple
==9022==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9027==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 test-replication /replication/secondary/do_checkpoint
PASS 12 test-replication /replication/secondary/get_error_all
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bufferiszero -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bufferiszero" 
PASS 66 ahci-test /x86_64/ahci/migrate/dma/halted
==9040==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9045==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 67 ahci-test /x86_64/ahci/migrate/ncq/simple
==9054==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9059==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 68 ahci-test /x86_64/ahci/migrate/ncq/halted
==9068==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 69 ahci-test /x86_64/ahci/cdrom/eject
==9073==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 70 ahci-test /x86_64/ahci/cdrom/dma/single
==9079==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 71 ahci-test /x86_64/ahci/cdrom/dma/multi
==9085==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 72 ahci-test /x86_64/ahci/cdrom/pio/single
==9091==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9091==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffdd7f93000; bottom 0x7f75251fe000; size: 0x0088b2d95000 (587116138496)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 73 ahci-test /x86_64/ahci/cdrom/pio/multi
==9097==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 74 ahci-test /x86_64/ahci/cdrom/pio/bcl
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/hd-geo-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="hd-geo-test" 
PASS 1 hd-geo-test /x86_64/hd-geo/ide/none
==9111==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 hd-geo-test /x86_64/hd-geo/ide/drive/cd_0
==9117==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/blank
==9123==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/lba
==9129==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/chs
==9135==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 hd-geo-test /x86_64/hd-geo/ide/device/mbr/blank
==9141==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 hd-geo-test /x86_64/hd-geo/ide/device/mbr/lba
==9147==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 hd-geo-test /x86_64/hd-geo/ide/device/mbr/chs
==9153==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 hd-geo-test /x86_64/hd-geo/ide/device/user/chs
==9158==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 hd-geo-test /x86_64/hd-geo/ide/device/user/chst
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/boot-order-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="boot-order-test" 
PASS 1 test-bufferiszero /cutils/bufferiszero
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9243==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 bios-tables-test /x86_64/acpi/piix4
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9249==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 bios-tables-test /x86_64/acpi/q35
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9255==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 bios-tables-test /x86_64/acpi/piix4/bridge
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9261==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 bios-tables-test /x86_64/acpi/piix4/ipmi
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9267==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 bios-tables-test /x86_64/acpi/piix4/cpuhp
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9274==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 bios-tables-test /x86_64/acpi/piix4/memhp
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9280==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 bios-tables-test /x86_64/acpi/piix4/numamem
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9286==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 bios-tables-test /x86_64/acpi/piix4/dimmpxm
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9295==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 bios-tables-test /x86_64/acpi/q35/bridge
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9301==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 bios-tables-test /x86_64/acpi/q35/mmio64
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9307==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 bios-tables-test /x86_64/acpi/q35/ipmi
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9313==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 12 bios-tables-test /x86_64/acpi/q35/cpuhp
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9320==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 13 bios-tables-test /x86_64/acpi/q35/memhp
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9326==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 14 bios-tables-test /x86_64/acpi/q35/numamem
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9332==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 15 bios-tables-test /x86_64/acpi/q35/dimmpxm
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/boot-serial-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="boot-serial-test" 
PASS 1 boot-serial-test /x86_64/boot-serial/isapc
---
PASS 1 i440fx-test /x86_64/i440fx/defaults
PASS 2 i440fx-test /x86_64/i440fx/pam
PASS 3 i440fx-test /x86_64/i440fx/firmware/bios
==9416==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 i440fx-test /x86_64/i440fx/firmware/pflash
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/fw_cfg-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="fw_cfg-test" 
PASS 1 fw_cfg-test /x86_64/fw_cfg/signature
---
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/drive_del-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="drive_del-test" 
PASS 1 drive_del-test /x86_64/drive_del/without-dev
PASS 2 drive_del-test /x86_64/drive_del/after_failed_device_add
==9504==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 drive_del-test /x86_64/blockdev/drive_del_device_del
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/wdt_ib700-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="wdt_ib700-test" 
PASS 1 wdt_ib700-test /x86_64/wdt_ib700/pause
---
PASS 1 usb-hcd-uhci-test /x86_64/uhci/pci/init
PASS 2 usb-hcd-uhci-test /x86_64/uhci/pci/port1
PASS 3 usb-hcd-uhci-test /x86_64/uhci/pci/hotplug
==9699==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 usb-hcd-uhci-test /x86_64/uhci/pci/hotplug/usb-storage
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/usb-hcd-xhci-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="usb-hcd-xhci-test" 
PASS 1 usb-hcd-xhci-test /x86_64/xhci/pci/init
PASS 2 usb-hcd-xhci-test /x86_64/xhci/pci/hotplug
==9708==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 usb-hcd-xhci-test /x86_64/xhci/pci/hotplug/usb-uas
PASS 4 usb-hcd-xhci-test /x86_64/xhci/pci/hotplug/usb-ccid
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/cpu-plug-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="cpu-plug-test" 
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9814==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 vmgenid-test /x86_64/vmgenid/vmgenid/set-guid
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9820==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 vmgenid-test /x86_64/vmgenid/vmgenid/set-guid-auto
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9826==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 vmgenid-test /x86_64/vmgenid/vmgenid/query-monitor
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/tpm-crb-swtpm-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="tpm-crb-swtpm-test" 
SKIP 1 tpm-crb-swtpm-test /x86_64/tpm/crb-swtpm/test # SKIP swtpm not in PATH or missing --tpm2 support
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9931==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9936==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 migration-test /x86_64/migration/fd_proto
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9944==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9949==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 migration-test /x86_64/migration/postcopy/unix
PASS 5 migration-test /x86_64/migration/postcopy/recovery
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9979==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9984==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 migration-test /x86_64/migration/precopy/unix
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9993==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9998==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 migration-test /x86_64/migration/precopy/tcp
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==10007==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==10012==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 migration-test /x86_64/migration/xbzrle/unix
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/test-x86-cpuid-compat -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-x86-cpuid-compat" 
PASS 1 test-x86-cpuid-compat /x86/cpuid/parsing-plus-minus
---
PASS 6 numa-test /x86_64/numa/pc/dynamic/cpu
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/qmp-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="qmp-test" 
PASS 1 qmp-test /x86_64/qmp/protocol
==10341==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 qmp-test /x86_64/qmp/oob
PASS 3 qmp-test /x86_64/qmp/preconfig
PASS 4 qmp-test /x86_64/qmp/missing-any-arg
---
PASS 5 device-introspect-test /x86_64/device/introspect/abstract-interfaces

=================================================================
==10589==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x561de4fecb2e in calloc (/tmp/qemu-test/build/x86_64-softmmu/qemu-system-x86_64+0x19fdb2e)
---

SUMMARY: AddressSanitizer: 64 byte(s) leaked in 2 allocation(s).
/tmp/qemu-test/src/tests/libqtest.c:137: kill_qemu() tried to terminate QEMU process but encountered exit status 1
ERROR - too few tests run (expected 6, got 5)
make: *** [/tmp/qemu-test/src/tests/Makefile.include:894: check-qtest-x86_64] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):


The full log is available at
http://patchew.org/logs/20190702121106.28374-1-slp@redhat.com/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-02 15:23 ` Peter Maydell
@ 2019-07-02 17:34   ` Sergio Lopez
  2019-07-02 18:04     ` Peter Maydell
  0 siblings, 1 reply; 68+ messages in thread
From: Sergio Lopez @ 2019-07-02 17:34 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Eduardo Habkost, maran.wilson, Michael S. Tsirkin,
	QEMU Developers, Gerd Hoffmann, Paolo Bonzini,
	Stefano Garzarella, Richard Henderson



Peter Maydell <peter.maydell@linaro.org> writes:

> On Tue, 2 Jul 2019 at 13:14, Sergio Lopez <slp@redhat.com> wrote:
>>
>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> constructed after the machine model implemented by the latter.
>>
> >> Its main purpose is providing users a KVM-only machine type with fast
> >> boot times, minimal attack surface (measured as the number of IO ports
> >> and MMIO regions exposed to the Guest) and small footprint (especially
>> when combined with the ongoing QEMU modularization effort).
>>
>> Normally, other than the device support provided by KVM itself,
>> microvm only supports virtio-mmio devices. Microvm also includes a
>> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>> for being able to see the early boot kernel messages.
>
> Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
> a bit deprecated and tends not to support all the features that
> virtio-pci does. It was introduced mostly as a stopgap while we
> didn't have pci support in the aarch64 virt machine, and remains
> for legacy "we don't like to break existing working setups" rather
> than as a recommended config for new systems.

Using virtio-pci implies keeping PCI and ACPI support, defeating a
significant part of microvm's purpose.

What are the issues with the current state of virtio-mmio? Is there a
way I can help to improve the situation?

Sergio.





* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-02 17:34   ` Sergio Lopez
@ 2019-07-02 18:04     ` Peter Maydell
  2019-07-02 22:04       ` Sergio Lopez
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2019-07-02 18:04 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: Eduardo Habkost, maran.wilson, Michael S. Tsirkin,
	QEMU Developers, Gerd Hoffmann, Paolo Bonzini,
	Stefano Garzarella, Richard Henderson

On Tue, 2 Jul 2019 at 18:34, Sergio Lopez <slp@redhat.com> wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:
> > Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
> > a bit deprecated and tends not to support all the features that
> > virtio-pci does. It was introduced mostly as a stopgap while we
> > didn't have pci support in the aarch64 virt machine, and remains
> > for legacy "we don't like to break existing working setups" rather
> > than as a recommended config for new systems.
>
> Using virtio-pci implies keeping PCI and ACPI support, defeating a
> significant part of microvm's purpose.
>
> What are the issues with the current state of virtio-mmio? Is there a
> way I can help to improve the situation?

Off the top of my head:
 * limitations on numbers of devices
 * no hotplug support
 * unlike PCI, it's not probeable, so you have to tell the
   guest where all the transports are using device tree or
   some similar mechanism
 * you need one IRQ line per transport, which restricts how
   many you can have
 * it's only virtio-0.9, it doesn't support any of the new
   virtio-1.0 functionality
 * it is broadly not really maintained in QEMU (and I think
   not really in the kernel either? not sure), because we'd
   rather not have to maintain two mechanisms for doing virtio
   when virtio-pci is clearly better than virtio-mmio

thanks
-- PMM
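
The "not probeable" and "one IRQ line per transport" points above can be
made concrete with a small sketch. It assumes a guest kernel built with
CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES, which accepts parameters of the form
virtio_mmio.device=<size>@<baseaddr>:<irq>. The base address and the 0x200
region size are taken from the memory map in the cover letter; FIRST_IRQ
and the helper name virtio_mmio_cmdline are hypothetical, since the thread
does not say which IRQs microvm assigns.

```python
# Sketch only: build one virtio_mmio.device= parameter per transport so a
# guest without device tree or ACPI can be told where the transports live.

BASE_ADDR = 0xD0000000   # first virtio-mmio region in the cover letter's map
TRANSPORT_SIZE = 0x200   # each region in that map spans 0x200 bytes
FIRST_IRQ = 5            # hypothetical value, not taken from the thread


def virtio_mmio_cmdline(count, base=BASE_ADDR, size=TRANSPORT_SIZE,
                        first_irq=FIRST_IRQ):
    """Return space-separated virtio_mmio.device= parameters."""
    params = []
    for i in range(count):
        addr = base + i * size   # transports are laid out back to back
        irq = first_irq + i      # one IRQ line per transport
        params.append(f"virtio_mmio.device={size}@{addr:#x}:{irq}")
    return " ".join(params)


if __name__ == "__main__":
    # Parameters for the first two transports of the memory map.
    print(virtio_mmio_cmdline(2))
```

Because every transport consumes its own IRQ, the number of free interrupt
lines bounds the value of count that such a helper could usefully accept.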



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-02 18:04     ` Peter Maydell
@ 2019-07-02 22:04       ` Sergio Lopez
  2019-07-25  9:59         ` Michael S. Tsirkin
  0 siblings, 1 reply; 68+ messages in thread
From: Sergio Lopez @ 2019-07-02 22:04 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Eduardo Habkost, maran.wilson, Michael S. Tsirkin,
	QEMU Developers, Gerd Hoffmann, Paolo Bonzini,
	Stefano Garzarella, Richard Henderson

On Tue, Jul 02, 2019 at 07:04:15PM +0100, Peter Maydell wrote:
> On Tue, 2 Jul 2019 at 18:34, Sergio Lopez <slp@redhat.com> wrote:
> > Peter Maydell <peter.maydell@linaro.org> writes:
> > > Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
> > > a bit deprecated and tends not to support all the features that
> > > virtio-pci does. It was introduced mostly as a stopgap while we
> > > didn't have pci support in the aarch64 virt machine, and remains
> > > for legacy "we don't like to break existing working setups" rather
> > > than as a recommended config for new systems.
> >
> > Using virtio-pci implies keeping PCI and ACPI support, defeating a
> > significant part of microvm's purpose.
> >
> > What are the issues with the current state of virtio-mmio? Is there a
> > way I can help to improve the situation?
> 
> Off the top of my head:
>  * limitations on numbers of devices
>  * no hotplug support
>  * unlike PCI, it's not probeable, so you have to tell the
>    guest where all the transports are using device tree or
>    some similar mechanism
>  * you need one IRQ line per transport, which restricts how
>    many you can have
>  * it's only virtio-0.9, it doesn't support any of the new
>    virtio-1.0 functionality
>  * it is broadly not really maintained in QEMU (and I think
>    not really in the kernel either? not sure), because we'd
>    rather not have to maintain two mechanisms for doing virtio
>    when virtio-pci is clearly better than virtio-mmio

Some of these are design issues, but others can be improved with a bit
of work.

As for the maintenance burden, I volunteer to help with that, so it
won't have an impact on other developers and/or projects.

Sergio.



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-02 12:11 [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type Sergio Lopez
                   ` (6 preceding siblings ...)
  2019-07-02 15:30 ` no-reply
@ 2019-07-03  9:58 ` Stefan Hajnoczi
  2019-07-18 15:21   ` Sergio Lopez
  2019-08-29  9:02 ` Jing Liu
  8 siblings, 1 reply; 68+ messages in thread
From: Stefan Hajnoczi @ 2019-07-03  9:58 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: ehabkost, maran.wilson, mst, qemu-devel, kraxel, pbonzini, sgarzare, rth

On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> constructed after the machine model implemented by the latter.
> 
> Its main purpose is providing users a KVM-only machine type with fast
> boot times, minimal attack surface (measured as the number of IO ports
> and MMIO regions exposed to the Guest) and small footprint (especially
> when combined with the ongoing QEMU modularization effort).
> 
> Normally, other than the device support provided by KVM itself,
> microvm only supports virtio-mmio devices. Microvm also includes a
> legacy mode, which adds an ISA bus with a 16550A serial port, useful
> for being able to see the early boot kernel messages.
> 
> Microvm only supports booting PVH-enabled Linux ELF images. Booting
> other PVH-enabled kernels may be possible, but due to the lack of ACPI
> and firmware, we're relying on the command line for specifying the
> location of the virtio-mmio transports. If there's an interest in
> using this machine type with other kernels, we'll try to find some
> kind of middle ground solution.
> 
> This is the list of the exposed IO ports and MMIO regions when running
> in non-legacy mode:
> 
> address-space: memory
>     00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
>     00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
>     00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
>     00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
>     00000000d0000800-00000000d00009ff (prio 0, i/o): virtio-mmio
>     00000000d0000a00-00000000d0000bff (prio 0, i/o): virtio-mmio
>     00000000d0000c00-00000000d0000dff (prio 0, i/o): virtio-mmio
>     00000000d0000e00-00000000d0000fff (prio 0, i/o): virtio-mmio
>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
> 
> address-space: I/O
>   0000000000000000-000000000000ffff (prio 0, i/o): io
>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>     0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
> 
> A QEMU instance with the microvm machine type can be invoked this way:
> 
>  - Normal mode:
> 
> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>  -nodefaults -no-user-config \
>  -chardev pty,id=virtiocon0,server \
>  -device virtio-serial-device \
>  -device virtconsole,chardev=virtiocon0 \
>  -drive id=test,file=test.img,format=raw,if=none \
>  -device virtio-blk-device,drive=test \
>  -netdev tap,id=tap0,script=no,downscript=no \
>  -device virtio-net-device,netdev=tap0
> 
>  - Legacy mode:
> 
> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>  -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>  -nodefaults -no-user-config \
>  -drive id=test,file=test.img,format=raw,if=none \
>  -device virtio-blk-device,drive=test \
>  -netdev tap,id=tap0,script=no,downscript=no \
>  -device virtio-net-device,netdev=tap0 \
>  -serial stdio

Please post metrics that compare this against a minimal Q35.

With qboot it was later found that SeaBIOS can achieve comparable boot
times, so it wasn't worth maintaining qboot.

Data is needed to show that microvm is really a significant improvement
over a minimal Q35.

Stefan


* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-03  9:58 ` Stefan Hajnoczi
@ 2019-07-18 15:21   ` Sergio Lopez
  2019-07-19 10:29     ` Stefan Hajnoczi
  0 siblings, 1 reply; 68+ messages in thread
From: Sergio Lopez @ 2019-07-18 15:21 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: ehabkost, maran.wilson, mst, qemu-devel, kraxel, pbonzini, sgarzare, rth


Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> constructed after the machine model implemented by the latter.
>> 
>> Its main purpose is providing users a KVM-only machine type with fast
>> boot times, minimal attack surface (measured as the number of IO ports
>> and MMIO regions exposed to the Guest) and small footprint (especially
>> when combined with the ongoing QEMU modularization effort).
>> 
>> Normally, other than the device support provided by KVM itself,
>> microvm only supports virtio-mmio devices. Microvm also includes a
>> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>> for being able to see the early boot kernel messages.
>> 
>> Microvm only supports booting PVH-enabled Linux ELF images. Booting
>> other PVH-enabled kernels may be possible, but due to the lack of ACPI
>> and firmware, we're relying on the command line for specifying the
>> location of the virtio-mmio transports. If there's an interest in
>> using this machine type with other kernels, we'll try to find some
>> kind of middle ground solution.
>> 
>> This is the list of the exposed IO ports and MMIO regions when running
>> in non-legacy mode:
>> 
>> address-space: memory
>>     00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
>>     00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
>>     00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
>>     00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
>>     00000000d0000800-00000000d00009ff (prio 0, i/o): virtio-mmio
>>     00000000d0000a00-00000000d0000bff (prio 0, i/o): virtio-mmio
>>     00000000d0000c00-00000000d0000dff (prio 0, i/o): virtio-mmio
>>     00000000d0000e00-00000000d0000fff (prio 0, i/o): virtio-mmio
>>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
>> 
>> address-space: I/O
>>   0000000000000000-000000000000ffff (prio 0, i/o): io
>>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>>     0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
>>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
>> 
>> A QEMU instance with the microvm machine type can be invoked this way:
>> 
>>  - Normal mode:
>> 
>> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>>  -nodefaults -no-user-config \
>>  -chardev pty,id=virtiocon0,server \
>>  -device virtio-serial-device \
>>  -device virtconsole,chardev=virtiocon0 \
>>  -drive id=test,file=test.img,format=raw,if=none \
>>  -device virtio-blk-device,drive=test \
>>  -netdev tap,id=tap0,script=no,downscript=no \
>>  -device virtio-net-device,netdev=tap0
>> 
>>  - Legacy mode:
>> 
>> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>>  -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>>  -nodefaults -no-user-config \
>>  -drive id=test,file=test.img,format=raw,if=none \
>>  -device virtio-blk-device,drive=test \
>>  -netdev tap,id=tap0,script=no,downscript=no \
>>  -device virtio-net-device,netdev=tap0 \
>>  -serial stdio
>
> Please post metrics that compare this against a minimal Q35.
>
> With qboot it was later found that SeaBIOS can achieve comparable boot
> times, so it wasn't worth maintaining qboot.
>
> Data is needed to show that microvm is really a significant improvement
> over a minimal Q35.

I've just run some numbers using Stefano Garzarella's qemu-boot-time
scripts [1] on a server with 2x Intel Xeon Silver 4114 2.20GHz, using the
upstream QEMU (474f3938d79ab36b9231c9ad3b5a9314c2aeacde) built with
minimal features [2]. The VM boots a minimal kernel [3] without initrd,
using a kata container image as root via virtio-blk (though this isn't
really relevant, as we're just taking measurements until the kernel is
about to exec init).

To make the comparison as fair as possible, I've used a q35 machine
stripped down to as few devices as possible. Disabling HPET and PIT at
the same time caused the kernel to get stuck on boot, so I ran two
configurations, one without PIT (HPET only) and the other without HPET
(PIT only):


 -----------------
 | Q35 with HPET |
 -----------------

Command line:

./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm \
 -M q35,smbus=off,nvdimm=off,pit=off,vmport=off,sata=off,usb=off,graphics=off \
 -kernel /root/src/images/vmlinux-5.2 \
 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" \
 -smp 1 -nodefaults -no-user-config \
 -chardev pty,id=virtiocon0,server \
 -device virtio-serial \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none \
 -device virtio-blk,drive=test

Average boot times after 10 consecutive runs:

 qemu_init_end: 77.637936
 linux_start_kernel: 117.082526 (+39.44459)
 linux_start_user: 364.629972 (+247.547446)

Memory tree:

 address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff
    0000000000000000-ffffffffffffffff (prio -1, i/o): pci
      00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
      00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
      00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci
        00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common
        00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr
        00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device
        00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify
      00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci
        00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common
        00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr
        00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device
        00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify
      00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix
        00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table
        00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba
      00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix
        00000000febff000-00000000febff01f (prio 0, i/o): msix-table
        00000000febff800-00000000febff807 (prio 0, i/o): msix-pba
      00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
    00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
    00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled]
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled]
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled]
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled]
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled]
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled]
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled]
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled]
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled]
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled]
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled]
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled]
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled]
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled]
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled]
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled]
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled]
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled]
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled]
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled]
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled]
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled]
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled]
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled]
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled]
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled]
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled]
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled]
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled]
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled]
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled]
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled]
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled]
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled]
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled]
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled]
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled]
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled]
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled]
    0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled]
    00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio
    00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
    00000000fed00000-00000000fed003ff (prio 0, i/o): hpet
    00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio
    00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled]
    00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi

 address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io
    0000000000000000-0000000000000007 (prio 0, i/o): dma-chan
    0000000000000008-000000000000000f (prio 0, i/o): dma-cont
    0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
    0000000000000060-0000000000000060 (prio 0, i/o): i8042-data
    0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd
    0000000000000070-0000000000000071 (prio 0, i/o): rtc
      0000000000000070-0000000000000070 (prio 0, i/o): rtc-index
    000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
    0000000000000080-0000000000000080 (prio 0, i/o): ioport80
    0000000000000081-0000000000000083 (prio 0, i/o): dma-page
    0000000000000087-0000000000000087 (prio 0, i/o): dma-page
    0000000000000089-000000000000008b (prio 0, i/o): dma-page
    000000000000008f-000000000000008f (prio 0, i/o): dma-page
    0000000000000092-0000000000000092 (prio 0, i/o): port92
    00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
    00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io
    00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan
    00000000000000d0-00000000000000df (prio 0, i/o): dma-cont
    00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0
    00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
    00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
    0000000000000510-0000000000000511 (prio 0, i/o): fwcfg
    0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma
    0000000000000600-000000000000067f (prio 0, i/o): ich9-pm
      0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
      0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
      0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
      0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
      0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
      0000000000000660-000000000000067f (prio 0, i/o): sm-tco
    0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug
    0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx
    0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control
    0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data
    000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci
    000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci


 ----------------
 | Q35 with PIT |
 ----------------

Command line:

./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm \
 -M q35,smbus=off,nvdimm=off,pit=on,vmport=off,sata=off,usb=off,graphics=off \
 -no-hpet \
 -kernel /root/src/images/vmlinux-5.2 \
 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" \
 -smp 1 -nodefaults -no-user-config \
 -chardev pty,id=virtiocon0,server \
 -device virtio-serial \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none \
 -device virtio-blk,drive=test

Average boot times after 10 consecutive runs:

 qemu_init_end: 77.467852
 linux_start_kernel: 116.688472 (+39.22062)
 linux_start_user: 363.033365 (+246.344893)

Memory tree:

address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff
    0000000000000000-ffffffffffffffff (prio -1, i/o): pci
      00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
      00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
      00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci
        00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common
        00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr
        00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device
        00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify
      00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci
        00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common
        00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr
        00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device
        00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify
      00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix
        00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table
        00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba
      00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix
        00000000febff000-00000000febff01f (prio 0, i/o): msix-table
        00000000febff800-00000000febff807 (prio 0, i/o): msix-pba
      00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
    00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
    00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled]
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled]
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled]
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled]
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled]
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled]
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled]
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled]
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled]
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled]
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled]
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled]
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled]
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled]
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled]
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled]
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled]
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled]
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled]
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled]
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled]
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled]
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled]
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled]
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled]
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled]
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled]
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled]
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled]
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled]
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled]
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled]
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled]
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled]
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled]
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled]
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled]
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled]
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled]
    0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled]
    00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio
    00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
    00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio
    00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled]
    00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi

address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io
    0000000000000000-0000000000000007 (prio 0, i/o): dma-chan
    0000000000000008-000000000000000f (prio 0, i/o): dma-cont
    0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
    0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
    0000000000000060-0000000000000060 (prio 0, i/o): i8042-data
    0000000000000061-0000000000000061 (prio 0, i/o): pcspk
    0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd
    0000000000000070-0000000000000071 (prio 0, i/o): rtc
      0000000000000070-0000000000000070 (prio 0, i/o): rtc-index
    000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
    0000000000000080-0000000000000080 (prio 0, i/o): ioport80
    0000000000000081-0000000000000083 (prio 0, i/o): dma-page
    0000000000000087-0000000000000087 (prio 0, i/o): dma-page
    0000000000000089-000000000000008b (prio 0, i/o): dma-page
    000000000000008f-000000000000008f (prio 0, i/o): dma-page
    0000000000000092-0000000000000092 (prio 0, i/o): port92
    00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
    00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io
    00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan
    00000000000000d0-00000000000000df (prio 0, i/o): dma-cont
    00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0
    00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
    00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
    0000000000000510-0000000000000511 (prio 0, i/o): fwcfg
    0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma
    0000000000000600-000000000000067f (prio 0, i/o): ich9-pm
      0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
      0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
      0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
      0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
      0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
      0000000000000660-000000000000067f (prio 0, i/o): sm-tco
    0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug
    0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx
    0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control
    0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data
    000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci
    000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci


 -----------
 | microvm |
 -----------

Command line:

./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M microvm \
 -kernel /root/src/images/vmlinux-5.2 \
 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" \
 -smp 1 -nodefaults -no-user-config \
 -chardev pty,id=virtiocon0,server \
 -device virtio-serial-device \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none \
 -device virtio-blk-device,drive=test

Average boot times (in ms) over 10 consecutive runs:

 qemu_init_end: 64.043264
 linux_start_kernel: 65.481782 (+1.438518)
 linux_start_user: 114.938353 (+49.456571)

Memory tree:

 address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @microvm.ram 0000000000000000-000000001fffffff
    00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
    00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
    00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
    00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
    00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
    00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi

 address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io
    000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
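
As a rough way to compare trees like the one above, the enabled regions
in a dump can be counted per address space. The following is only a
sketch: it assumes the dump follows the format shown here (the one
printed by the QEMU monitor's "info mtree" command), it treats the first
region under each "address-space:" header as the root container, and it
counts RAM aliases like any other region. The names MTREE, LINE and
count_regions are hypothetical and not part of QEMU.

```python
import re
from collections import Counter

# The microvm dump from this mail, used as sample input.
MTREE = """\
address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @microvm.ram 0000000000000000-000000001fffffff
    00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
    00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
    00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
    00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
    00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
    00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi

address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io
    000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
"""

# One region per line: "<start>-<end> (prio N, <type>): <name>"
LINE = re.compile(
    r"^\s*([0-9a-f]{16})-([0-9a-f]{16}) \(prio (-?\d+), ([^)]+)\): (.+)$"
)


def count_regions(dump):
    """Count enabled regions per address space, skipping each root container."""
    counts = Counter()
    space = None
    root_seen = False
    for raw in dump.splitlines():
        line = raw.rstrip()
        if line.strip().startswith("address-space:"):
            space = line.split(":", 1)[1].strip()
            root_seen = False
            continue
        match = LINE.match(line)
        if match is None or space is None:
            continue
        if not root_seen:
            # First region after the header is the container ("system", "io").
            root_seen = True
            continue
        if match.group(5).endswith("[disabled]"):
            continue
        counts[space] += 1
    return counts


if __name__ == "__main__":
    for space, n in count_regions(MTREE).items():
        print(f"{space}: {n} enabled regions")
```

Feeding the Q35 dumps through the same function gives a like-for-like
count, although a bare region count says nothing about how much code
sits behind each region.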


 --------------
 | Conclusion |
 --------------

The average boot time of microvm is roughly a third of Q35's (115ms vs.
363ms), and it is lower in every phase (QEMU initialization, firmware
overhead and kernel start-to-user).
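
The per-phase split can be reproduced from the cumulative timestamps
reported in this mail. A minimal Python sketch, assuming the values are
milliseconds (as the 115ms vs. 363ms figures suggest); the names RUNS,
phases and total_ratio are hypothetical and not part of qemu-boot-time:

```python
# Cumulative timestamps copied from the averages reported in this mail.
# Assumption: values are milliseconds since QEMU was started.
RUNS = {
    "q35-hpet": {
        "qemu_init_end": 77.637936,
        "linux_start_kernel": 117.082526,
        "linux_start_user": 364.629972,
    },
    "q35-pit": {
        "qemu_init_end": 77.467852,
        "linux_start_kernel": 116.688472,
        "linux_start_user": 363.033365,
    },
    "microvm": {
        "qemu_init_end": 64.043264,
        "linux_start_kernel": 65.481782,
        "linux_start_user": 114.938353,
    },
}


def phases(ts):
    """Split cumulative timestamps into the three per-phase durations."""
    return {
        "qemu_init": ts["qemu_init_end"],
        "firmware": ts["linux_start_kernel"] - ts["qemu_init_end"],
        "kernel_to_user": ts["linux_start_user"] - ts["linux_start_kernel"],
    }


def total_ratio(a, b):
    """Ratio of total boot time (up to linux_start_user) of run a to run b."""
    return RUNS[a]["linux_start_user"] / RUNS[b]["linux_start_user"]


if __name__ == "__main__":
    for name, ts in RUNS.items():
        parts = "  ".join(f"{k}={v:8.3f}" for k, v in phases(ts).items())
        print(f"{name:9s} {parts}")
    print(f"microvm / q35-pit = {total_ratio('microvm', 'q35-pit'):.2f}")
```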

Microvm's memory tree is also visibly simpler, significantly reducing
the surface exposed to the guest.

While we can certainly work on making Q35 smaller, I definitely think
it's better (and way safer!) to have a specialized machine type for a
specific use case than a minimal Q35 whose behavior significantly
diverges from a conventional Q35.

Sergio.

[1] https://github.com/stefano-garzarella/qemu-boot-time
[2] https://paste.fedoraproject.org/paste/YZ9Ok-dJtQrc0xxctFm-nw
[3] https://paste.fedoraproject.org/paste/sck0jfioAJdMq51HH6wkmA


* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-18 15:21   ` Sergio Lopez
@ 2019-07-19 10:29     ` Stefan Hajnoczi
  2019-07-19 13:48       ` Sergio Lopez
  0 siblings, 1 reply; 68+ messages in thread
From: Stefan Hajnoczi @ 2019-07-19 10:29 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: ehabkost, maran.wilson, mst, qemu-devel, kraxel, pbonzini, sgarzare, rth

On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> 
> Stefan Hajnoczi <stefanha@gmail.com> writes:
> 
> > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> >> Microvm is a machine type inspired by both NEMU and Firecracker, and
> >> constructed after the machine model implemented by the latter.
> >> 
> >> Its main purpose is providing users a KVM-only machine type with fast
> >> boot times, minimal attack surface (measured as the number of IO ports
> >> and MMIO regions exposed to the Guest) and small footprint (especially
> >> when combined with the ongoing QEMU modularization effort).
> >> 
> >> Normally, other than the device support provided by KVM itself,
> >> microvm only supports virtio-mmio devices. Microvm also includes a
> >> legacy mode, which adds an ISA bus with a 16550A serial port, useful
> >> for being able to see the early boot kernel messages.
> >> 
> >> Microvm only supports booting PVH-enabled Linux ELF images. Booting
> >> other PVH-enabled kernels may be possible, but due to the lack of ACPI
> >> and firmware, we're relying on the command line for specifying the
> >> location of the virtio-mmio transports. If there's an interest in
> >> using this machine type with other kernels, we'll try to find some
> >> kind of middle ground solution.
> >> 
> >> This is the list of the exposed IO ports and MMIO regions when running
> >> in non-legacy mode:
> >> 
> >> address-space: memory
> >>     00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
> >>     00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
> >>     00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
> >>     00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
> >>     00000000d0000800-00000000d00009ff (prio 0, i/o): virtio-mmio
> >>     00000000d0000a00-00000000d0000bff (prio 0, i/o): virtio-mmio
> >>     00000000d0000c00-00000000d0000dff (prio 0, i/o): virtio-mmio
> >>     00000000d0000e00-00000000d0000fff (prio 0, i/o): virtio-mmio
> >>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
> >> 
> >> address-space: I/O
> >>   0000000000000000-000000000000ffff (prio 0, i/o): io
> >>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
> >>     0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
> >>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
> >>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
> >>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
> >>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
> >> 
> >> A QEMU instance with the microvm machine type can be invoked this way:
> >> 
> >>  - Normal mode:
> >> 
> >> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
> >>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
> >>  -nodefaults -no-user-config \
> >>  -chardev pty,id=virtiocon0,server \
> >>  -device virtio-serial-device \
> >>  -device virtconsole,chardev=virtiocon0 \
> >>  -drive id=test,file=test.img,format=raw,if=none \
> >>  -device virtio-blk-device,drive=test \
> >>  -netdev tap,id=tap0,script=no,downscript=no \
> >>  -device virtio-net-device,netdev=tap0
> >> 
> >>  - Legacy mode:
> >> 
> >> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
> >>  -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
> >>  -nodefaults -no-user-config \
> >>  -drive id=test,file=test.img,format=raw,if=none \
> >>  -device virtio-blk-device,drive=test \
> >>  -netdev tap,id=tap0,script=no,downscript=no \
> >>  -device virtio-net-device,netdev=tap0 \
> >>  -serial stdio
> >
> > Please post metrics that compare this against a minimal Q35.
> >
> > With qboot it was later found that SeaBIOS can achieve comparable boot
> > times, so it wasn't worth maintaining qboot.
> >
> > Data is needed to show that microvm is really a significant improvement
> > over a minimal Q35.
> 
> I've just run some numbers using Stefano Garzarella's qemu-boot-time
> scripts [1] on a server with 2xIntel Xeon Silver 4114 2.20GHz, using the
> upstream QEMU (474f3938d79ab36b9231c9ad3b5a9314c2aeacde) built with
> minimal features [2]. The VM boots a minimal kernel [3] without initrd,
> using a kata container image as root via virtio-blk (though this isn't
> really relevant, as we're just taking measurements until the kernel is
> about to exec init).
> 
> To try to make the comparison as fair as possible, I've used a minimal
> q35 machine with as few devices as possible. Disabling HPET and PIT at
> the same time caused the kernel to get stuck on boot, so I ran two
> iterations, one without HPET and the other without PIT:
> 
> 
>  -----------------
>  | Q35 with HPET |
>  -----------------
> 
> Command line:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M q35,smbus=off,nvdimm=off,pit=off,vmport=off,sata=off,usb=off,graphics=off -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk,drive=test
> 
> Average boot times after 10 consecutive runs:
> 
>  qemu_init_end: 77.637936
>  linux_start_kernel: 117.082526 (+39.44459)
>  linux_start_user: 364.629972 (+247.547446)
> 
> Memory tree:
> 
>  address-space: memory
>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>     0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff
>     0000000000000000-ffffffffffffffff (prio -1, i/o): pci
>       00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
>       00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
>       00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci
>         00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common
>         00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr
>         00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device
>         00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify
>       00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci
>         00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common
>         00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr
>         00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device
>         00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify
>       00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix
>         00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table
>         00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba
>       00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix
>         00000000febff000-00000000febff01f (prio 0, i/o): msix-table
>         00000000febff800-00000000febff807 (prio 0, i/o): msix-pba
>       00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
>     00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
>     00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled]
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled]
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled]
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled]
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled]
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled]
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled]
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled]
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled]
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled]
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled]
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled]
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled]
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled]
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled]
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled]
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled]
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled]
>     0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled]
>     00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio
>     00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
>     00000000fed00000-00000000fed003ff (prio 0, i/o): hpet
>     00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio
>     00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled]
>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
> 
>  address-space: I/O
>   0000000000000000-000000000000ffff (prio 0, i/o): io
>     0000000000000000-0000000000000007 (prio 0, i/o): dma-chan
>     0000000000000008-000000000000000f (prio 0, i/o): dma-cont
>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>     0000000000000060-0000000000000060 (prio 0, i/o): i8042-data
>     0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd
>     0000000000000070-0000000000000071 (prio 0, i/o): rtc
>       0000000000000070-0000000000000070 (prio 0, i/o): rtc-index
>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>     0000000000000080-0000000000000080 (prio 0, i/o): ioport80
>     0000000000000081-0000000000000083 (prio 0, i/o): dma-page
>     0000000000000087-0000000000000087 (prio 0, i/o): dma-page
>     0000000000000089-000000000000008b (prio 0, i/o): dma-page
>     000000000000008f-000000000000008f (prio 0, i/o): dma-page
>     0000000000000092-0000000000000092 (prio 0, i/o): port92
>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>     00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io
>     00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan
>     00000000000000d0-00000000000000df (prio 0, i/o): dma-cont
>     00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0
>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
>     0000000000000510-0000000000000511 (prio 0, i/o): fwcfg
>     0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma
>     0000000000000600-000000000000067f (prio 0, i/o): ich9-pm
>       0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
>       0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
>       0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
>       0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
>       0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
>       0000000000000660-000000000000067f (prio 0, i/o): sm-tco
>     0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug
>     0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx
>     0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control
>     0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data
>     000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci
>     000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci
> 
> 
>  ----------------
>  | Q35 with PIT |
>  ----------------
> 
> Command line:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M q35,smbus=off,nvdimm=off,pit=on,vmport=off,sata=off,usb=off,graphics=off -no-hpet -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk,drive=test
> 
> Average boot times after 10 consecutive runs:
> 
>  qemu_init_end: 77.467852
>  linux_start_kernel: 116.688472 (+39.22062)
>  linux_start_user: 363.033365 (+246.344893)
> 
> Memory tree:
> 
> address-space: memory
>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>     0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff
>     0000000000000000-ffffffffffffffff (prio -1, i/o): pci
>       00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
>       00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
>       00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci
>         00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common
>         00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr
>         00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device
>         00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify
>       00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci
>         00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common
>         00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr
>         00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device
>         00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify
>       00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix
>         00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table
>         00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba
>       00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix
>         00000000febff000-00000000febff01f (prio 0, i/o): msix-table
>         00000000febff800-00000000febff807 (prio 0, i/o): msix-pba
>       00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
>     00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
>     00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled]
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled]
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled]
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled]
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled]
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled]
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled]
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled]
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled]
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled]
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled]
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled]
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled]
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled]
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled]
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled]
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled]
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled]
>     0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled]
>     00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio
>     00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
>     00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio
>     00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled]
>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
> 
> address-space: I/O
>   0000000000000000-000000000000ffff (prio 0, i/o): io
>     0000000000000000-0000000000000007 (prio 0, i/o): dma-chan
>     0000000000000008-000000000000000f (prio 0, i/o): dma-cont
>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>     0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
>     0000000000000060-0000000000000060 (prio 0, i/o): i8042-data
>     0000000000000061-0000000000000061 (prio 0, i/o): pcspk
>     0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd
>     0000000000000070-0000000000000071 (prio 0, i/o): rtc
>       0000000000000070-0000000000000070 (prio 0, i/o): rtc-index
>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>     0000000000000080-0000000000000080 (prio 0, i/o): ioport80
>     0000000000000081-0000000000000083 (prio 0, i/o): dma-page
>     0000000000000087-0000000000000087 (prio 0, i/o): dma-page
>     0000000000000089-000000000000008b (prio 0, i/o): dma-page
>     000000000000008f-000000000000008f (prio 0, i/o): dma-page
>     0000000000000092-0000000000000092 (prio 0, i/o): port92
>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>     00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io
>     00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan
>     00000000000000d0-00000000000000df (prio 0, i/o): dma-cont
>     00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0
>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
>     0000000000000510-0000000000000511 (prio 0, i/o): fwcfg
>     0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma
>     0000000000000600-000000000000067f (prio 0, i/o): ich9-pm
>       0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
>       0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
>       0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
>       0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
>       0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
>       0000000000000660-000000000000067f (prio 0, i/o): sm-tco
>     0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug
>     0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx
>     0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control
>     0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data
>     000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci
>     000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci
> 
> 
>  -----------
>  | microvm |
>  -----------
> 
> Command line:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M microvm -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk-device,drive=test
> 
> Average boot times after 10 consecutive runs:
> 
>  qemu_init_end: 64.043264
>  linux_start_kernel: 65.481782 (+1.438518)
>  linux_start_user: 114.938353 (+49.456571)
> 
> Memory tree:
> 
>  address-space: memory
>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>     0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @microvm.ram 0000000000000000-000000001fffffff
>     00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
>     00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
>     00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
>     00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
>     00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
> 
>  address-space: I/O
>   0000000000000000-000000000000ffff (prio 0, i/o): io
>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
> 
> 
>  --------------
>  | Conclusion |
>  --------------
> 
> The average boot time of microvm is a third of Q35's (115ms vs. 363ms),
> and is smaller on all sections (QEMU initialization, firmware overhead
> and kernel start-to-user).
> 
> Microvm's memory tree is also visibly simpler, significantly reducing
> the exposed surface to the guest.
> 
> While we can certainly work on making Q35 smaller, I definitely think
> it's better (and way safer!) having a specialized machine type for a
> specific use case, than a minimal Q35 whose behavior significantly
> diverges from a conventional Q35.

Interesting, so not a 10x difference!  This might be amenable to
optimization.

My concern with microvm is that it's so limited that few users will be
able to benefit from the reduced attack surface and faster startup time.
I think it's worth investigating slimming down Q35 further first.
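
One crude way to put a number on the exposed surface is to count the
enabled regions in the memory tree dumps quoted above. The sketch below is
only an illustration: count_regions and MICROVM_MTREE are hypothetical
names, the sample is the microvm tree from this thread, and counting every
enabled entry under the root region (RAM aliases included) is an
assumption, not an established metric.

```python
import re

# One region line of the memory tree dumps quoted in this thread, e.g.
# "00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio"
REGION = re.compile(r"^[0-9a-f]{16}-[0-9a-f]{16} \(prio -?\d+, [^)]+\): .+$")

# microvm memory tree, copied from the measurements in this thread.
MICROVM_MTREE = """
address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @microvm.ram 0000000000000000-000000001fffffff
    00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
    00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
    00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
    00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
    00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
    00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi

address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io
    000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
"""


def count_regions(mtree_text):
    """Count enabled entries below the root region of each address space."""
    counts = {}
    space = None
    root_pending = False
    for line in mtree_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("address-space:"):
            space = stripped.split(":", 1)[1].strip()
            counts[space] = 0
            root_pending = True  # the next region line is the root container
            continue
        if space is None or REGION.match(stripped) is None:
            continue
        if root_pending:
            root_pending = False  # skip the root ("system" / "io")
            continue
        if stripped.endswith("[disabled]"):
            continue  # disabled aliases are not reachable by the guest
        counts[space] += 1
    return counts


print(count_regions(MICROVM_MTREE))
```

Feeding the two Q35 dumps through the same function would give a like-for-like
comparison, with the caveat that nested subregions (for example the
virtio-pci BAR layout) are each counted as separate entries.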

In terms of startup time, the first step would be to profile Q35 kernel
startup and find out what's taking so long (firmware initialization, PCI
probing, etc.).
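
Before profiling, the quoted averages already hint at where the gap sits.
The sketch below is a minimal illustration, not part of the original
measurements: RUNS, phases and gap_breakdown are hypothetical names, and
the timestamps are the "Q35 with PIT" and microvm averages copied from the
quoted message. It splits each run into the three sections named in the
conclusion and reports, per section, how much slower Q35 is and what share
of the total gap that section accounts for.

```python
# Average timestamps in ms, copied from the quoted measurements:
# (qemu_init_end, linux_start_kernel, linux_start_user)
RUNS = {
    "q35+pit": (77.467852, 116.688472, 363.033365),
    "microvm": (64.043264, 65.481782, 114.938353),
}


def phases(qemu_init_end, linux_start_kernel, linux_start_user):
    """Turn cumulative timestamps into per-section durations."""
    return {
        "qemu_init": qemu_init_end,
        "firmware": linux_start_kernel - qemu_init_end,
        "kernel": linux_start_user - linux_start_kernel,
    }


def gap_breakdown(slow, fast):
    """Per section: (slowdown ratio, share of the total time gap)."""
    s, f = phases(*slow), phases(*fast)
    total_gap = sum(s.values()) - sum(f.values())
    return {
        name: (s[name] / f[name], (s[name] - f[name]) / total_gap)
        for name in s
    }


for name, (ratio, share) in gap_breakdown(RUNS["q35+pit"], RUNS["microvm"]).items():
    print(f"{name:10s} {ratio:5.1f}x slower, {share:4.0%} of the gap")
```

With these numbers the kernel start-to-user section accounts for roughly
four fifths of the gap, even though the firmware section has the largest
ratio, which is consistent with starting the profiling on the kernel side.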

> Sergio.
> 
> [1] https://github.com/stefano-garzarella/qemu-boot-time
> [2] https://paste.fedoraproject.org/paste/YZ9Ok-dJtQrc0xxctFm-nw
> [3] https://paste.fedoraproject.org/paste/sck0jfioAJdMq51HH6wkmA




* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-19 10:29     ` Stefan Hajnoczi
@ 2019-07-19 13:48       ` Sergio Lopez
  2019-07-19 15:09         ` Stefan Hajnoczi
  0 siblings, 1 reply; 68+ messages in thread
From: Sergio Lopez @ 2019-07-19 13:48 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: ehabkost, maran.wilson, mst, qemu-devel, kraxel, pbonzini, sgarzare, rth


Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
>> 
>> Stefan Hajnoczi <stefanha@gmail.com> writes:
>> 
>> > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
>> >> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> >> constructed after the machine model implemented by the latter.
>> >> 
>> >> Its main purpose is providing users a KVM-only machine type with fast
>> >> boot times, minimal attack surface (measured as the number of IO ports
>> >> and MMIO regions exposed to the Guest) and small footprint (especially
>> >> when combined with the ongoing QEMU modularization effort).
>> >> 
>> >> Normally, other than the device support provided by KVM itself,
>> >> microvm only supports virtio-mmio devices. Microvm also includes a
>> >> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>> >> for being able to see the early boot kernel messages.
>> >> 
>> >> Microvm only supports booting PVH-enabled Linux ELF images. Booting
>> >> other PVH-enabled kernels may be possible, but due to the lack of ACPI
>> >> and firmware, we're relying on the command line for specifying the
>> >> location of the virtio-mmio transports. If there's an interest in
>> >> using this machine type with other kernels, we'll try to find some
>> >> kind of middle ground solution.
>> >> 
>> >> This is the list of the exposed IO ports and MMIO regions when running
>> >> in non-legacy mode:
>> >> 
>> >> address-space: memory
>> >>     00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000800-00000000d00009ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000a00-00000000d0000bff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000c00-00000000d0000dff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000e00-00000000d0000fff (prio 0, i/o): virtio-mmio
>> >>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
>> >> 
>> >> address-space: I/O
>> >>   0000000000000000-000000000000ffff (prio 0, i/o): io
>> >>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>> >>     0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
>> >>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>> >>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>> >>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>> >>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
>> >> 
>> >> A QEMU instance with the microvm machine type can be invoked this way:
>> >> 
>> >>  - Normal mode:
>> >> 
>> >> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>> >>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>> >>  -nodefaults -no-user-config \
>> >>  -chardev pty,id=virtiocon0,server \
>> >>  -device virtio-serial-device \
>> >>  -device virtconsole,chardev=virtiocon0 \
>> >>  -drive id=test,file=test.img,format=raw,if=none \
>> >>  -device virtio-blk-device,drive=test \
>> >>  -netdev tap,id=tap0,script=no,downscript=no \
>> >>  -device virtio-net-device,netdev=tap0
>> >> 
>> >>  - Legacy mode:
>> >> 
>> >> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>> >>  -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>> >>  -nodefaults -no-user-config \
>> >>  -drive id=test,file=test.img,format=raw,if=none \
>> >>  -device virtio-blk-device,drive=test \
>> >>  -netdev tap,id=tap0,script=no,downscript=no \
>> >>  -device virtio-net-device,netdev=tap0 \
>> >>  -serial stdio
>> >
>> > Please post metrics that compare this against a minimal Q35.
>> >
>> > With qboot it was later found that SeaBIOS can achieve comparable boot
>> > times, so it wasn't worth maintaining qboot.
>> >
>> > Data is needed to show that microvm is really a significant improvement
>> > over a minimal Q35.
>> 
>> I've just run some numbers using Stefano Garzarella's qemu-boot-time
>> scripts [1] on a server with 2xIntel Xeon Silver 4114 2.20GHz, using the
>> upstream QEMU (474f3938d79ab36b9231c9ad3b5a9314c2aeacde) built with
>> minimal features [2]. The VM boots a minimal kernel [3] without initrd,
>> using a kata container image as root via virtio-blk (though this isn't
>> really relevant, as we're just taking measurements until the kernel is
>> about to exec init).
>> 
>> To try to make the comparison as fair as possible, I've used a minimal
>> q35 machine with as few devices as possible. Disabling HPET and PIT at
>> the same time caused the kernel to get stuck on boot, so I ran two
>> iterations, one without HPET and the other without PIT:
>> 
>> 
>>  -----------------
>>  | Q35 with HPET |
>>  -----------------
>> 
>> Command line:
>> 
>> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M q35,smbus=off,nvdimm=off,pit=off,vmport=off,sata=off,usb=off,graphics=off -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk,drive=test
>> 
>> Average boot times after 10 consecutive runs:
>> 
>>  qemu_init_end: 77.637936
>>  linux_start_kernel: 117.082526 (+39.44459)
>>  linux_start_user: 364.629972 (+247.547446)
>> 
>> Memory tree:
>> 
>>  address-space: memory
>>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>>     0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff
>>     0000000000000000-ffffffffffffffff (prio -1, i/o): pci
>>       00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
>>       00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
>>       00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci
>>         00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common
>>         00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr
>>         00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device
>>         00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify
>>       00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci
>>         00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common
>>         00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr
>>         00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device
>>         00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify
>>       00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix
>>         00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table
>>         00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba
>>       00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix
>>         00000000febff000-00000000febff01f (prio 0, i/o): msix-table
>>         00000000febff800-00000000febff807 (prio 0, i/o): msix-pba
>>       00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
>>     00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
>>     00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled]
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled]
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled]
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled]
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled]
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled]
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled]
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled]
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled]
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled]
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled]
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled]
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled]
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled]
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled]
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled]
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled]
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled]
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled]
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled]
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled]
>>     0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled]
>>     00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio
>>     00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
>>     00000000fed00000-00000000fed003ff (prio 0, i/o): hpet
>>     00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio
>>     00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled]
>>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
>> 
>>  address-space: I/O
>>   0000000000000000-000000000000ffff (prio 0, i/o): io
>>     0000000000000000-0000000000000007 (prio 0, i/o): dma-chan
>>     0000000000000008-000000000000000f (prio 0, i/o): dma-cont
>>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>>     0000000000000060-0000000000000060 (prio 0, i/o): i8042-data
>>     0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd
>>     0000000000000070-0000000000000071 (prio 0, i/o): rtc
>>       0000000000000070-0000000000000070 (prio 0, i/o): rtc-index
>>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>>     0000000000000080-0000000000000080 (prio 0, i/o): ioport80
>>     0000000000000081-0000000000000083 (prio 0, i/o): dma-page
>>     0000000000000087-0000000000000087 (prio 0, i/o): dma-page
>>     0000000000000089-000000000000008b (prio 0, i/o): dma-page
>>     000000000000008f-000000000000008f (prio 0, i/o): dma-page
>>     0000000000000092-0000000000000092 (prio 0, i/o): port92
>>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>>     00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io
>>     00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan
>>     00000000000000d0-00000000000000df (prio 0, i/o): dma-cont
>>     00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0
>>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
>>     0000000000000510-0000000000000511 (prio 0, i/o): fwcfg
>>     0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma
>>     0000000000000600-000000000000067f (prio 0, i/o): ich9-pm
>>       0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
>>       0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
>>       0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
>>       0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
>>       0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
>>       0000000000000660-000000000000067f (prio 0, i/o): sm-tco
>>     0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug
>>     0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx
>>     0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control
>>     0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data
>>     000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci
>>     000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci
>> 
>> 
>>  ----------------
>>  | Q35 with PIT |
>>  ----------------
>> 
>> Command line:
>> 
>> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M q35,smbus=off,nvdimm=off,pit=on,vmport=off,sata=off,usb=off,graphics=off -no-hpet -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk,drive=test
>> 
>> Average boot times after 10 consecutive runs:
>> 
>>  qemu_init_end: 77.467852
>>  linux_start_kernel: 116.688472 (+39.22062)
>>  linux_start_user: 363.033365 (+246.344893)
>> 
>> Memory tree:
>> 
>> address-space: memory
>>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>>     0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff
>>     0000000000000000-ffffffffffffffff (prio -1, i/o): pci
>>       00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
>>       00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
>>       00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci
>>         00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common
>>         00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr
>>         00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device
>>         00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify
>>       00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci
>>         00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common
>>         00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr
>>         00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device
>>         00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify
>>       00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix
>>         00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table
>>         00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba
>>       00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix
>>         00000000febff000-00000000febff01f (prio 0, i/o): msix-table
>>         00000000febff800-00000000febff807 (prio 0, i/o): msix-pba
>>       00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
>>     00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
>>     00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled]
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled]
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled]
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled]
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled]
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled]
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled]
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled]
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled]
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled]
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled]
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled]
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled]
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled]
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled]
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled]
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled]
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled]
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled]
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled]
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled]
>>     0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled]
>>     00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio
>>     00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
>>     00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio
>>     00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled]
>>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
>> 
>> address-space: I/O
>>   0000000000000000-000000000000ffff (prio 0, i/o): io
>>     0000000000000000-0000000000000007 (prio 0, i/o): dma-chan
>>     0000000000000008-000000000000000f (prio 0, i/o): dma-cont
>>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>>     0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
>>     0000000000000060-0000000000000060 (prio 0, i/o): i8042-data
>>     0000000000000061-0000000000000061 (prio 0, i/o): pcspk
>>     0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd
>>     0000000000000070-0000000000000071 (prio 0, i/o): rtc
>>       0000000000000070-0000000000000070 (prio 0, i/o): rtc-index
>>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>>     0000000000000080-0000000000000080 (prio 0, i/o): ioport80
>>     0000000000000081-0000000000000083 (prio 0, i/o): dma-page
>>     0000000000000087-0000000000000087 (prio 0, i/o): dma-page
>>     0000000000000089-000000000000008b (prio 0, i/o): dma-page
>>     000000000000008f-000000000000008f (prio 0, i/o): dma-page
>>     0000000000000092-0000000000000092 (prio 0, i/o): port92
>>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>>     00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io
>>     00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan
>>     00000000000000d0-00000000000000df (prio 0, i/o): dma-cont
>>     00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0
>>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
>>     0000000000000510-0000000000000511 (prio 0, i/o): fwcfg
>>     0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma
>>     0000000000000600-000000000000067f (prio 0, i/o): ich9-pm
>>       0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
>>       0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
>>       0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
>>       0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
>>       0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
>>       0000000000000660-000000000000067f (prio 0, i/o): sm-tco
>>     0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug
>>     0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx
>>     0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control
>>     0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data
>>     000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci
>>     000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci
>> 
>> 
>>  -----------
>>  | microvm |
>>  -----------
>> 
>> Command line:
>> 
>> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M microvm -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk-device,drive=test
>> 
>> Average boot times after 10 consecutive runs:
>> 
>>  qemu_init_end: 64.043264
>>  linux_start_kernel: 65.481782 (+1.438518)
>>  linux_start_user: 114.938353 (+49.456571)
>> 
>> Memory tree:
>> 
>>  address-space: memory
>>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>>     0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @microvm.ram 0000000000000000-000000001fffffff
>>     00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
>>     00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
>>     00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
>>     00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
>>     00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
>>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
>> 
>>  address-space: I/O
>>   0000000000000000-000000000000ffff (prio 0, i/o): io
>>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>> 
>> 
>>  --------------
>>  | Conclusion |
>>  --------------
>> 
>> The average boot time of microvm is a third of Q35's (115ms vs. 363ms),
>> and is smaller on all sections (QEMU initialization, firmware overhead
>> and kernel start-to-user).
>> 
>> Microvm's memory tree is also visibly simpler, significantly reducing
>> the exposed surface to the guest.
>> 
>> While we can certainly work on making Q35 smaller, I definitely think
>> it's better (and way safer!) having a specialized machine type for a
>> specific use case, than a minimal Q35 whose behavior significantly
>> diverges from a conventional Q35.
>
> Interesting, so not a 10x difference!  This might be amenable to
> optimization.
>
> My concern with microvm is that it's so limited that few users will be
> able to benefit from the reduced attack surface and faster startup time.
> I think it's worth investigating slimming down Q35 further first.
>
> In terms of startup time the first step would be profiling Q35 kernel
> startup to find out what's taking so long (firmware initialization, PCI
> probing, etc)?

Some findings:

 1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host") saves a
    whopping 120ms by avoiding the APIC timer calibration at
    arch/x86/kernel/apic/apic.c:calibrate_APIC_clock

Average boot time with "-cpu host"
 qemu_init_end: 76.408950
 linux_start_kernel: 116.166142 (+39.757192)
 linux_start_user: 242.954347 (+126.788205)

Average boot time with default "cpu"
 qemu_init_end: 77.467852
 linux_start_kernel: 116.688472 (+39.22062)
 linux_start_user: 363.033365 (+246.344893)

 2. The other 130ms are a direct result of PCI and ACPI presence (tested
    with a kernel without support for those elements). I'll publish some
    detailed numbers next week.
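
For reference, the flag mentioned in item 1 is reported by CPUID leaf 1 in
ECX bit 24 (TSC-Deadline). A minimal sketch of testing that bit follows; the
helper and macro names are made up for illustration and are not taken from
QEMU or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 24 advertises the APIC timer's TSC-deadline mode. */
#define CPUID_01_ECX_TSC_DEADLINE (UINT32_C(1) << 24)

static_assert(CPUID_01_ECX_TSC_DEADLINE == 0x01000000, "flag must be bit 24");

/* Hypothetical helper: takes the ECX value returned by CPUID leaf 1. */
static bool cpu_has_tsc_deadline(uint32_t leaf1_ecx)
{
    return (leaf1_ecx & CPUID_01_ECX_TSC_DEADLINE) != 0;
}
```

On x86 with GCC or Clang, the ECX value can be obtained with
__get_cpuid(1, &eax, &ebx, &ecx, &edx) from <cpuid.h>. Inside a guest the bit
is only visible if the CPU model exposes it, which is what "-cpu host" does
in the measurement above.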

Sergio.



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-19 13:48       ` Sergio Lopez
@ 2019-07-19 15:09         ` Stefan Hajnoczi
  2019-07-19 15:42           ` Montes, Julio
  0 siblings, 1 reply; 68+ messages in thread
From: Stefan Hajnoczi @ 2019-07-19 15:09 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: Eduardo Habkost, Maran Wilson, Michael S. Tsirkin, qemu-devel,
	Gerd Hoffmann, Paolo Bonzini, Stefano Garzarella,
	Richard Henderson

On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote:
> Stefan Hajnoczi <stefanha@gmail.com> writes:
> > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> >>
> >> Stefan Hajnoczi <stefanha@gmail.com> writes:
> >>
> >> > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> >>  --------------
> >>  | Conclusion |
> >>  --------------
> >>
> >> The average boot time of microvm is a third of Q35's (115ms vs. 363ms),
> >> and is smaller on all sections (QEMU initialization, firmware overhead
> >> and kernel start-to-user).
> >>
> >> Microvm's memory tree is also visibly simpler, significantly reducing
> >> the exposed surface to the guest.
> >>
> >> While we can certainly work on making Q35 smaller, I definitely think
> >> it's better (and way safer!) having a specialized machine type for a
> >> specific use case, than a minimal Q35 whose behavior significantly
> >> diverges from a conventional Q35.
> >
> > Interesting, so not a 10x difference!  This might be amenable to
> > optimization.
> >
> > My concern with microvm is that it's so limited that few users will be
> > able to benefit from the reduced attack surface and faster startup time.
> > I think it's worth investigating slimming down Q35 further first.
> >
> > In terms of startup time the first step would be profiling Q35 kernel
> > startup to find out what's taking so long (firmware initialization, PCI
> > probing, etc)?
>
> Some findings:
>
>  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host") saves a
>     whopping 120ms by avoiding the APIC timer calibration at
>     arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
>
> Average boot time with "-cpu host"
>  qemu_init_end: 76.408950
>  linux_start_kernel: 116.166142 (+39.757192)
>  linux_start_user: 242.954347 (+126.788205)
>
> Average boot time with default "cpu"
>  qemu_init_end: 77.467852
>  linux_start_kernel: 116.688472 (+39.22062)
>  linux_start_user: 363.033365 (+246.344893)

\o/

>  2. The other 130ms are a direct result of PCI and ACPI presence (tested
>     with a kernel without support for those elements). I'll publish some
>     detailed numbers next week.

Here are the Kata Containers kernel parameters:

var kernelParams = []Param{
        {"tsc", "reliable"},
        {"no_timer_check", ""},
        {"rcupdate.rcu_expedited", "1"},
        {"i8042.direct", "1"},
        {"i8042.dumbkbd", "1"},
        {"i8042.nopnp", "1"},
        {"i8042.noaux", "1"},
        {"noreplace-smp", ""},
        {"reboot", "k"},
        {"console", "hvc0"},
        {"console", "hvc1"},
        {"iommu", "off"},
        {"cryptomgr.notests", ""},
        {"net.ifnames", "0"},
        {"pci", "lastbus=0"},
}

pci lastbus=0 looks interesting and so do some of the others :).
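
Each key/value pair in that list ends up as "key=value" (or a bare "key"
when the value is empty) on the kernel command line, so {"pci", "lastbus=0"}
becomes pci=lastbus=0. A small sketch of that flattening, written in C to
match the rest of the thread; the struct and function names are hypothetical
and this is not Kata's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct kernel_param {
    const char *key;
    const char *value; /* empty string means a bare flag */
};

/*
 * Join params into a space-separated command line.
 * Returns the string length, or -1 if buf is too small.
 */
static int build_cmdline(const struct kernel_param *params, size_t count,
                         char *buf, size_t buflen)
{
    size_t used = 0;

    assert(params != NULL && buf != NULL && buflen > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        const char *sep = (i == 0) ? "" : " ";
        int n;

        if (params[i].value[0] != '\0') {
            n = snprintf(buf + used, buflen - used, "%s%s=%s",
                         sep, params[i].key, params[i].value);
        } else {
            n = snprintf(buf + used, buflen - used, "%s%s",
                         sep, params[i].key);
        }
        if (n < 0 || (size_t)n >= buflen - used) {
            return -1; /* output would have been truncated */
        }
        used += (size_t)n;
    }
    return (int)used;
}
```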

Stefan



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-19 15:09         ` Stefan Hajnoczi
@ 2019-07-19 15:42           ` Montes, Julio
  2019-07-23  8:43             ` Sergio Lopez
  0 siblings, 1 reply; 68+ messages in thread
From: Montes, Julio @ 2019-07-19 15:42 UTC (permalink / raw)
  To: stefanha, slp
  Cc: ehabkost, mst, maran.wilson, qemu-devel, kraxel, pbonzini, rth, sgarzare

On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote:
> > Stefan Hajnoczi <stefanha@gmail.com> writes:
> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> > > > Stefan Hajnoczi <stefanha@gmail.com> writes:
> > > > 
> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> > > >  --------------
> > > >  | Conclusion |
> > > >  --------------
> > > > 
> > > > The average boot time of microvm is a third of Q35's (115ms vs.
> > > > 363ms),
> > > > and is smaller on all sections (QEMU initialization, firmware
> > > > overhead
> > > > and kernel start-to-user).
> > > > 
> > > > Microvm's memory tree is also visibly simpler, significantly
> > > > reducing
> > > > the exposed surface to the guest.
> > > > 
> > > > While we can certainly work on making Q35 smaller, I definitely
> > > > think
> > > > it's better (and way safer!) having a specialized machine type
> > > > for a
> > > > specific use case, than a minimal Q35 whose behavior
> > > > significantly
> > > > diverges from a conventional Q35.
> > > 
> > > Interesting, so not a 10x difference!  This might be amenable to
> > > optimization.
> > > 
> > > My concern with microvm is that it's so limited that few users
> > > will be
> > > able to benefit from the reduced attack surface and faster
> > > startup time.
> > > I think it's worth investigating slimming down Q35 further first.
> > > 
> > > In terms of startup time the first step would be profiling Q35
> > > kernel
> > > startup to find out what's taking so long (firmware
> > > initialization, PCI
> > > probing, etc)?
> > 
> > Some findings:
> > 
> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
> > saves a
> >     whopping 120ms by avoiding the APIC timer calibration at
> >     arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
> > 
> > Average boot time with "-cpu host"
> >  qemu_init_end: 76.408950
> >  linux_start_kernel: 116.166142 (+39.757192)
> >  linux_start_user: 242.954347 (+126.788205)
> > 
> > Average boot time with default "cpu"
> >  qemu_init_end: 77.467852
> >  linux_start_kernel: 116.688472 (+39.22062)
> >  linux_start_user: 363.033365 (+246.344893)
> 
> \o/
> 
> >  2. The other 130ms are a direct result of PCI and ACPI presence
> > (tested
> >     with a kernel without support for those elements). I'll publish
> > some
> >     detailed numbers next week.
> 
> Here are the Kata Containers kernel parameters:
> 
> var kernelParams = []Param{
>         {"tsc", "reliable"},
>         {"no_timer_check", ""},
>         {"rcupdate.rcu_expedited", "1"},
>         {"i8042.direct", "1"},
>         {"i8042.dumbkbd", "1"},
>         {"i8042.nopnp", "1"},
>         {"i8042.noaux", "1"},
>         {"noreplace-smp", ""},
>         {"reboot", "k"},
>         {"console", "hvc0"},
>         {"console", "hvc1"},
>         {"iommu", "off"},
>         {"cryptomgr.notests", ""},
>         {"net.ifnames", "0"},
>         {"pci", "lastbus=0"},
> }
> 
> pci lastbus=0 looks interesting and so do some of the others :).
> 

yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
kernel won't scan the 255.. buses :)
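
A rough way to see why this matters, assuming the kernel probes every device
slot on every bus up to the last bus number it was given. That scan model is
an assumption made for illustration, not something measured in this thread,
and the names below are hypothetical.

```c
#include <assert.h>

#define PCI_SLOTS_PER_BUS 32 /* device numbers 0..31 on each bus */

/* Hypothetical estimate: slots probed when scanning buses 0..last_bus. */
static unsigned int pci_slots_probed(unsigned int last_bus)
{
    assert(last_bus <= 255); /* bus numbers are 8 bits wide */
    return (last_bus + 1) * PCI_SLOTS_PER_BUS;
}
```

Under that assumption, lastbus=0 means 32 slots to probe instead of 8192.
Every config-space access is emulated by QEMU, so each avoided probe is at
least one avoided VM exit.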

> Stefan
> 


* Re: [Qemu-devel] [PATCH v3 3/4] hw/i386: Factorize PVH related functions
  2019-07-02 12:11 ` [Qemu-devel] [PATCH v3 3/4] hw/i386: Factorize PVH related functions Sergio Lopez
@ 2019-07-23  8:39   ` Liam Merwick
  0 siblings, 0 replies; 68+ messages in thread
From: Liam Merwick @ 2019-07-23  8:39 UTC (permalink / raw)
  To: Sergio Lopez, mst, marcel.apfelbaum, pbonzini, rth, ehabkost,
	maran.wilson, sgarzare, kraxel
  Cc: qemu-devel

On 02/07/2019 13:11, Sergio Lopez wrote:
> Extract PVH related functions from pc.c, and put them in pvh.c, so
> they can be shared with other components.
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>


Refactoring LGTM

Reviewed-by: Liam Merwick <liam.merwick@oracle.com>


> ---
>   hw/i386/Makefile.objs |   1 +
>   hw/i386/pc.c          | 120 +++++-------------------------------------
>   hw/i386/pvh.c         | 113 +++++++++++++++++++++++++++++++++++++++
>   hw/i386/pvh.h         |  10 ++++
>   4 files changed, 136 insertions(+), 108 deletions(-)
>   create mode 100644 hw/i386/pvh.c
>   create mode 100644 hw/i386/pvh.h
> 
> diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
> index 5d9c9efd5f..c5f20bbd72 100644
> --- a/hw/i386/Makefile.objs
> +++ b/hw/i386/Makefile.objs
> @@ -1,5 +1,6 @@
>   obj-$(CONFIG_KVM) += kvm/
>   obj-y += multiboot.o
> +obj-y += pvh.o
>   obj-y += pc.o
>   obj-$(CONFIG_I440FX) += pc_piix.o
>   obj-$(CONFIG_Q35) += pc_q35.o
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 3983621f1c..325ec2c1c8 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -42,6 +42,7 @@
>   #include "hw/loader.h"
>   #include "elf.h"
>   #include "multiboot.h"
> +#include "pvh.h"
>   #include "hw/timer/mc146818rtc.h"
>   #include "hw/dma/i8257.h"
>   #include "hw/timer/i8254.h"
> @@ -108,9 +109,6 @@ static struct e820_entry *e820_table;
>   static unsigned e820_entries;
>   struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
>   
> -/* Physical Address of PVH entry point read from kernel ELF NOTE */
> -static size_t pvh_start_addr;
> -
>   GlobalProperty pc_compat_4_0[] = {};
>   const size_t pc_compat_4_0_len = G_N_ELEMENTS(pc_compat_4_0);
>   
> @@ -1061,109 +1059,6 @@ struct setup_data {
>       uint8_t data[0];
>   } __attribute__((packed));
>   
> -
> -/*
> - * The entry point into the kernel for PVH boot is different from
> - * the native entry point.  The PVH entry is defined by the x86/HVM
> - * direct boot ABI and is available in an ELFNOTE in the kernel binary.
> - *
> - * This function is passed to load_elf() when it is called from
> - * load_elfboot() which then additionally checks for an ELF Note of
> - * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
> - * parse the PVH entry address from the ELF Note.
> - *
> - * Due to trickery in elf_opts.h, load_elf() is actually available as
> - * load_elf32() or load_elf64() and this routine needs to be able
> - * to deal with being called as 32 or 64 bit.
> - *
> - * The address of the PVH entry point is saved to the 'pvh_start_addr'
> - * global variable.  (although the entry point is 32-bit, the kernel
> - * binary can be either 32-bit or 64-bit).
> - */
> -static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
> -{
> -    size_t *elf_note_data_addr;
> -
> -    /* Check if ELF Note header passed in is valid */
> -    if (arg1 == NULL) {
> -        return 0;
> -    }
> -
> -    if (is64) {
> -        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
> -        uint64_t nhdr_size64 = sizeof(struct elf64_note);
> -        uint64_t phdr_align = *(uint64_t *)arg2;
> -        uint64_t nhdr_namesz = nhdr64->n_namesz;
> -
> -        elf_note_data_addr =
> -            ((void *)nhdr64) + nhdr_size64 +
> -            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> -    } else {
> -        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
> -        uint32_t nhdr_size32 = sizeof(struct elf32_note);
> -        uint32_t phdr_align = *(uint32_t *)arg2;
> -        uint32_t nhdr_namesz = nhdr32->n_namesz;
> -
> -        elf_note_data_addr =
> -            ((void *)nhdr32) + nhdr_size32 +
> -            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> -    }
> -
> -    pvh_start_addr = *elf_note_data_addr;
> -
> -    return pvh_start_addr;
> -}
> -
> -static bool load_elfboot(const char *kernel_filename,
> -                   int kernel_file_size,
> -                   uint8_t *header,
> -                   size_t pvh_xen_start_addr,
> -                   FWCfgState *fw_cfg)
> -{
> -    uint32_t flags = 0;
> -    uint32_t mh_load_addr = 0;
> -    uint32_t elf_kernel_size = 0;
> -    uint64_t elf_entry;
> -    uint64_t elf_low, elf_high;
> -    int kernel_size;
> -
> -    if (ldl_p(header) != 0x464c457f) {
> -        return false; /* no elfboot */
> -    }
> -
> -    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
> -    flags = elf_is64 ?
> -        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
> -
> -    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
> -        error_report("elfboot unsupported flags = %x", flags);
> -        exit(1);
> -    }
> -
> -    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
> -    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
> -                           NULL, &elf_note_type, &elf_entry,
> -                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
> -                           0, 0);
> -
> -    if (kernel_size < 0) {
> -        error_report("Error while loading elf kernel");
> -        exit(1);
> -    }
> -    mh_load_addr = elf_low;
> -    elf_kernel_size = elf_high - elf_low;
> -
> -    if (pvh_start_addr == 0) {
> -        error_report("Error loading uncompressed kernel without PVH ELF Note");
> -        exit(1);
> -    }
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> -    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> -
> -    return true;
> -}
> -
>   static void load_linux(PCMachineState *pcms,
>                          FWCfgState *fw_cfg)
>   {
> @@ -1203,6 +1098,9 @@ static void load_linux(PCMachineState *pcms,
>       if (ldl_p(header+0x202) == 0x53726448) {
>           protocol = lduw_p(header+0x206);
>       } else {
> +        size_t pvh_start_addr;
> +        uint32_t mh_load_addr = 0;
> +        uint32_t elf_kernel_size = 0;
>           /*
>            * This could be a multiboot kernel. If it is, let's stop treating it
>            * like a Linux kernel.
> @@ -1220,10 +1118,16 @@ static void load_linux(PCMachineState *pcms,
>            * If load_elfboot() is successful, populate the fw_cfg info.
>            */
>           if (pcmc->pvh_enabled &&
> -            load_elfboot(kernel_filename, kernel_size,
> -                         header, pvh_start_addr, fw_cfg)) {
> +            pvh_load_elfboot(kernel_filename,
> +                             &mh_load_addr, &elf_kernel_size)) {
>               fclose(f);
>   
> +            pvh_start_addr = pvh_get_start_addr();
> +
> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_start_addr);
> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> +            fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> +
>               fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
>                   strlen(kernel_cmdline) + 1);
>               fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> diff --git a/hw/i386/pvh.c b/hw/i386/pvh.c
> new file mode 100644
> index 0000000000..61623b4533
> --- /dev/null
> +++ b/hw/i386/pvh.c
> @@ -0,0 +1,113 @@
> +/*
> + * PVH Boot Helper
> + *
> + * Copyright (C) 2019 Oracle
> + * Copyright (C) 2019 Red Hat, Inc
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/units.h"
> +#include "qemu/error-report.h"
> +#include "hw/loader.h"
> +#include "cpu.h"
> +#include "elf.h"
> +#include "pvh.h"
> +
> +static size_t pvh_start_addr = 0;
> +
> +size_t pvh_get_start_addr(void)
> +{
> +    return pvh_start_addr;
> +}
> +
> +/*
> + * The entry point into the kernel for PVH boot is different from
> + * the native entry point.  The PVH entry is defined by the x86/HVM
> + * direct boot ABI and is available in an ELFNOTE in the kernel binary.
> + *
> + * This function is passed to load_elf() when it is called from
> + * load_elfboot() which then additionally checks for an ELF Note of
> + * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to
> + * parse the PVH entry address from the ELF Note.
> + *
> + * Due to trickery in elf_opts.h, load_elf() is actually available as
> + * load_elf32() or load_elf64() and this routine needs to be able
> + * to deal with being called as 32 or 64 bit.
> + *
> + * The address of the PVH entry point is saved to the 'pvh_start_addr'
> + * global variable.  (although the entry point is 32-bit, the kernel
> + * binary can be either 32-bit or 64-bit).
> + */
> +
> +static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64)
> +{
> +    size_t *elf_note_data_addr;
> +
> +    /* Check if ELF Note header passed in is valid */
> +    if (arg1 == NULL) {
> +        return 0;
> +    }
> +
> +    if (is64) {
> +        struct elf64_note *nhdr64 = (struct elf64_note *)arg1;
> +        uint64_t nhdr_size64 = sizeof(struct elf64_note);
> +        uint64_t phdr_align = *(uint64_t *)arg2;
> +        uint64_t nhdr_namesz = nhdr64->n_namesz;
> +
> +        elf_note_data_addr =
> +            ((void *)nhdr64) + nhdr_size64 +
> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> +    } else {
> +        struct elf32_note *nhdr32 = (struct elf32_note *)arg1;
> +        uint32_t nhdr_size32 = sizeof(struct elf32_note);
> +        uint32_t phdr_align = *(uint32_t *)arg2;
> +        uint32_t nhdr_namesz = nhdr32->n_namesz;
> +
> +        elf_note_data_addr =
> +            ((void *)nhdr32) + nhdr_size32 +
> +            QEMU_ALIGN_UP(nhdr_namesz, phdr_align);
> +    }
> +
> +    pvh_start_addr = *elf_note_data_addr;
> +
> +    return pvh_start_addr;
> +}
> +
> +bool pvh_load_elfboot(const char *kernel_filename,
> +                      uint32_t *mh_load_addr,
> +                      uint32_t *elf_kernel_size)
> +{
> +    uint64_t elf_entry;
> +    uint64_t elf_low, elf_high;
> +    int kernel_size;
> +    uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY;
> +
> +    kernel_size = load_elf(kernel_filename, read_pvh_start_addr,
> +                           NULL, &elf_note_type, &elf_entry,
> +                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
> +                           0, 0);
> +
> +    if (kernel_size < 0) {
> +        error_report("Error while loading elf kernel");
> +        return false;
> +    }
> +
> +    if (pvh_start_addr == 0) {
> +        error_report("Error loading uncompressed kernel without PVH ELF Note");
> +        return false;
> +    }
> +
> +    if (mh_load_addr) {
> +        *mh_load_addr = elf_low;
> +    }
> +
> +    if (elf_kernel_size) {
> +        *elf_kernel_size = elf_high - elf_low;
> +    }
> +
> +    return true;
> +}
> diff --git a/hw/i386/pvh.h b/hw/i386/pvh.h
> new file mode 100644
> index 0000000000..ada67ff6e8
> --- /dev/null
> +++ b/hw/i386/pvh.h
> @@ -0,0 +1,10 @@
> +#ifndef HW_I386_PVH_H
> +#define HW_I386_PVH_H
> +
> +size_t pvh_get_start_addr(void);
> +
> +bool pvh_load_elfboot(const char *kernel_filename,
> +                      uint32_t *mh_load_addr,
> +                      uint32_t *elf_kernel_size);
> +
> +#endif
> 
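
For readers unfamiliar with the note layout that read_pvh_start_addr()
relies on: an ELF note is a 12-byte header, followed by the name padded to
the alignment, followed by the descriptor, which for
XEN_ELFNOTE_PHYS32_ENTRY holds the 32-bit entry address. The standalone
sketch below walks a note segment that way. It is not the QEMU code:
find_pvh_entry(), struct note_hdr and align_up() are hypothetical names, and
it assumes the host byte order matches the ELF file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XEN_ELFNOTE_PHYS32_ENTRY 18 /* note type carrying the PVH entry */

/* Same layout for 32-bit and 64-bit ELF: three 32-bit words. */
struct note_hdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

/* Round n up to a multiple of a (local stand-in for QEMU_ALIGN_UP). */
static size_t align_up(size_t n, size_t a)
{
    return (n + a - 1) / a * a;
}

/*
 * Walk a note segment and return the 32-bit PVH entry point,
 * or 0 if no usable note is found.
 */
static uint32_t find_pvh_entry(const uint8_t *seg, size_t len, size_t align)
{
    size_t off = 0;

    assert(seg != NULL && align > 0);

    while (off <= len && len - off >= sizeof(struct note_hdr)) {
        struct note_hdr nh;
        size_t desc_off;

        memcpy(&nh, seg + off, sizeof(nh));
        desc_off = off + sizeof(nh) + align_up(nh.n_namesz, align);
        if (desc_off > len || nh.n_descsz > len - desc_off) {
            return 0; /* truncated or corrupt note */
        }
        if (nh.n_type == XEN_ELFNOTE_PHYS32_ENTRY &&
            nh.n_descsz >= sizeof(uint32_t)) {
            uint32_t entry;

            memcpy(&entry, seg + desc_off, sizeof(entry));
            return entry;
        }
        /* Not the note we want: skip its descriptor and keep going. */
        off = desc_off + align_up(nh.n_descsz, align);
    }
    return 0;
}
```

Unlike the patch, the sketch matches only on the note type and returns 0
instead of reporting an error; a stricter version would also compare the
note name against "Xen".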




* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-19 15:42           ` Montes, Julio
@ 2019-07-23  8:43             ` Sergio Lopez
  2019-07-23  9:47               ` Stefan Hajnoczi
  0 siblings, 1 reply; 68+ messages in thread
From: Sergio Lopez @ 2019-07-23  8:43 UTC (permalink / raw)
  To: Montes, Julio
  Cc: ehabkost, mst, stefanha, maran.wilson, qemu-devel, kraxel,
	pbonzini, rth, sgarzare



Montes, Julio <julio.montes@intel.com> writes:

> On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
>> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote:
>> > Stefan Hajnoczi <stefanha@gmail.com> writes:
>> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
>> > > > Stefan Hajnoczi <stefanha@gmail.com> writes:
>> > > > 
>> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
>> > > >  --------------
>> > > >  | Conclusion |
>> > > >  --------------
>> > > > 
>> > > > The average boot time of microvm is a third of Q35's (115ms vs.
>> > > > 363ms),
>> > > > and is smaller on all sections (QEMU initialization, firmware
>> > > > overhead
>> > > > and kernel start-to-user).
>> > > > 
>> > > > Microvm's memory tree is also visibly simpler, significantly
>> > > > reducing
>> > > > the exposed surface to the guest.
>> > > > 
>> > > > While we can certainly work on making Q35 smaller, I definitely
>> > > > think
>> > > > it's better (and way safer!) having a specialized machine type
>> > > > for a
>> > > > specific use case, than a minimal Q35 whose behavior
>> > > > significantly
>> > > > diverges from a conventional Q35.
>> > > 
>> > > Interesting, so not a 10x difference!  This might be amenable to
>> > > optimization.
>> > > 
>> > > My concern with microvm is that it's so limited that few users
>> > > will be
>> > > able to benefit from the reduced attack surface and faster
>> > > startup time.
>> > > I think it's worth investigating slimming down Q35 further first.
>> > > 
>> > > In terms of startup time the first step would be profiling Q35
>> > > kernel
>> > > startup to find out what's taking so long (firmware
>> > > initialization, PCI
>> > > probing, etc)?
>> > 
>> > Some findings:
>> > 
>> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
>> > saves a
>> >     whopping 120ms by avoiding the APIC timer calibration at
>> >     arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
>> > 
>> > Average boot time with "-cpu host"
>> >  qemu_init_end: 76.408950
>> >  linux_start_kernel: 116.166142 (+39.757192)
>> >  linux_start_user: 242.954347 (+126.788205)
>> > 
>> > Average boot time with default "cpu"
>> >  qemu_init_end: 77.467852
>> >  linux_start_kernel: 116.688472 (+39.22062)
>> >  linux_start_user: 363.033365 (+246.344893)
>> 
>> \o/
>> 
>> >  2. The other 130ms are a direct result of PCI and ACPI presence
>> > (tested
>> >     with a kernel without support for those elements). I'll publish
>> > some
>> >     detailed numbers next week.
>> 
>> Here are the Kata Containers kernel parameters:
>> 
>> var kernelParams = []Param{
>>         {"tsc", "reliable"},
>>         {"no_timer_check", ""},
>>         {"rcupdate.rcu_expedited", "1"},
>>         {"i8042.direct", "1"},
>>         {"i8042.dumbkbd", "1"},
>>         {"i8042.nopnp", "1"},
>>         {"i8042.noaux", "1"},
>>         {"noreplace-smp", ""},
>>         {"reboot", "k"},
>>         {"console", "hvc0"},
>>         {"console", "hvc1"},
>>         {"iommu", "off"},
>>         {"cryptomgr.notests", ""},
>>         {"net.ifnames", "0"},
>>         {"pci", "lastbus=0"},
>> }
>> 
>> pci lastbus=0 looks interesting and so do some of the others :).
>> 
>
> yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
> kernel won't scan the 255.. buses :)

I can confirm that adding pci=lastbus=0 makes a significant
improvement. In fact, it is the only option from Kata's kernel parameter
list that has an impact, probably because the kernel is already quite
minimalistic.

Average boot time with "-cpu host" and "pci=lastbus=0"
 qemu_init_end: 73.711569
 linux_start_kernel: 113.414311 (+39.702742)
 linux_start_user: 190.949939 (+77.535628)

That still leaves microvm ~40% faster (115 ms vs. 191 ms), and the gap
quickly widens when adding more PCI devices (each one adds 10-15ms), but
it's certainly an improvement over the original numbers.
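
The relative difference between the two totals can be expressed in two ways,
depending on which configuration is taken as the baseline. The sketch below
(function names are hypothetical) computes both from the averages quoted in
this thread: 114.938353 ms for microvm and 190.949939 ms for the tuned Q35.
Relative to Q35, microvm needs about 39.8% less time; relative to microvm,
Q35 needs about 66.1% more.

```c
#include <assert.h>

/* Percentage by which fast_ms undercuts slow_ms, relative to slow_ms. */
static double pct_faster(double fast_ms, double slow_ms)
{
    assert(slow_ms > 0.0);
    return (slow_ms - fast_ms) / slow_ms * 100.0;
}

/* Percentage by which slow_ms exceeds fast_ms, relative to fast_ms. */
static double pct_slower(double fast_ms, double slow_ms)
{
    assert(fast_ms > 0.0);
    return (slow_ms - fast_ms) / fast_ms * 100.0;
}
```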

On the other hand, there isn't much we can do here from QEMU's
perspective, as this is basically Guest OS tuning.

Sergio.



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-23  8:43             ` Sergio Lopez
@ 2019-07-23  9:47               ` Stefan Hajnoczi
  2019-07-23 10:01                 ` Paolo Bonzini
  2019-07-23 11:30                 ` Stefano Garzarella
  0 siblings, 2 replies; 68+ messages in thread
From: Stefan Hajnoczi @ 2019-07-23  9:47 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: ehabkost, mst, Montes, Julio, maran.wilson, qemu-devel, kraxel,
	pbonzini, rth, sgarzare

On Tue, Jul 23, 2019 at 9:43 AM Sergio Lopez <slp@redhat.com> wrote:
> Montes, Julio <julio.montes@intel.com> writes:
>
> > On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
> >> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote:
> >> > Stefan Hajnoczi <stefanha@gmail.com> writes:
> >> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> >> > > > Stefan Hajnoczi <stefanha@gmail.com> writes:
> >> > > >
> >> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> >> > > >  --------------
> >> > > >  | Conclusion |
> >> > > >  --------------
> >> > > >
> >> > > > The average boot time of microvm is a third of Q35's (115ms vs.
> >> > > > 363ms),
> >> > > > and is smaller on all sections (QEMU initialization, firmware
> >> > > > overhead
> >> > > > and kernel start-to-user).
> >> > > >
> >> > > > Microvm's memory tree is also visibly simpler, significantly
> >> > > > reducing
> >> > > > the exposed surface to the guest.
> >> > > >
> >> > > > While we can certainly work on making Q35 smaller, I definitely
> >> > > > think
> >> > > > it's better (and way safer!) having a specialized machine type
> >> > > > for a
> >> > > > specific use case, than a minimal Q35 whose behavior
> >> > > > significantly
> >> > > > diverges from a conventional Q35.
> >> > >
> >> > > Interesting, so not a 10x difference!  This might be amenable to
> >> > > optimization.
> >> > >
> >> > > My concern with microvm is that it's so limited that few users
> >> > > will be
> >> > > able to benefit from the reduced attack surface and faster
> >> > > startup time.
> >> > > I think it's worth investigating slimming down Q35 further first.
> >> > >
> >> > > In terms of startup time the first step would be profiling Q35
> >> > > kernel
> >> > > startup to find out what's taking so long (firmware
> >> > > initialization, PCI
> >> > > probing, etc)?
> >> >
> >> > Some findings:
> >> >
> >> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
> >> > saves a
> >> >     whopping 120ms by avoiding the APIC timer calibration at
> >> >     arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
> >> >
> >> > Average boot time with "-cpu host"
> >> >  qemu_init_end: 76.408950
> >> >  linux_start_kernel: 116.166142 (+39.757192)
> >> >  linux_start_user: 242.954347 (+126.788205)
> >> >
> >> > Average boot time with default "cpu"
> >> >  qemu_init_end: 77.467852
> >> >  linux_start_kernel: 116.688472 (+39.22062)
> >> >  linux_start_user: 363.033365 (+246.344893)
> >>
> >> \o/
> >>
> >> >  2. The other 130ms are a direct result of PCI and ACPI presence
> >> > (tested
> >> >     with a kernel without support for those elements). I'll publish
> >> > some
> >> >     detailed numbers next week.
> >>
> >> Here are the Kata Containers kernel parameters:
> >>
> >> var kernelParams = []Param{
> >>         {"tsc", "reliable"},
> >>         {"no_timer_check", ""},
> >>         {"rcupdate.rcu_expedited", "1"},
> >>         {"i8042.direct", "1"},
> >>         {"i8042.dumbkbd", "1"},
> >>         {"i8042.nopnp", "1"},
> >>         {"i8042.noaux", "1"},
> >>         {"noreplace-smp", ""},
> >>         {"reboot", "k"},
> >>         {"console", "hvc0"},
> >>         {"console", "hvc1"},
> >>         {"iommu", "off"},
> >>         {"cryptomgr.notests", ""},
> >>         {"net.ifnames", "0"},
> >>         {"pci", "lastbus=0"},
> >> }
> >>
> >> pci lastbus=0 looks interesting and so do some of the others :).
> >>
> >
> > yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
> > kernel won't scan the 255.. buses :)
>
> I can confirm that adding pci=lastbus=0 makes a significant
> improvement. In fact, it is the only option from Kata's kernel parameter
> list that has an impact, probably because the kernel is already quite
> minimalistic.
>
> Average boot time with "-cpu host" and "pci=lastbus=0"
>  qemu_init_end: 73.711569
>  linux_start_kernel: 113.414311 (+39.702742)
>  linux_start_user: 190.949939 (+77.535628)
>
> That still leaves microvm ~40% faster (115 ms vs. 191 ms), and the gap
> quickly widens when adding more PCI devices (each one adds 10-15ms), but
> it's certainly an improvement over the original numbers.
>
> On the other hand, there isn't much we can do here from QEMU's
> perspective, as this is basically Guest OS tuning.

fw_cfg could expose this information so guest kernels know when to
stop enumerating the PCI bus.  This would make all PCI guests with new
kernels boot ~50 ms faster, regardless of machine type.

The difference between microvm and tuned Q35 is 76 ms now.

microvm:
qemu_init_end: 64.043264
linux_start_kernel: 65.481782 (+1.438518)
linux_start_user: 114.938353 (+49.456571)

Q35 with -cpu host and pci=lastbus=0:
qemu_init_end: 73.711569
linux_start_kernel: 113.414311 (+39.702742)
linux_start_user: 190.949939 (+77.535628)

There is a ~39 ms difference before linux_start_kernel.  SeaBIOS is
loading the PVH Option ROM.
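
To make the arithmetic behind these figures explicit, here is a small
Python sketch that recomputes the per-phase deltas from the two traces
above. The dictionary layout and the `phase_deltas` helper are
illustrative assumptions, not part of the measurement tooling used in
this thread.

```python
# Trace points (ms since QEMU start), copied from the two listings above.
MICROVM = {
    "qemu_init_end": 64.043264,
    "linux_start_kernel": 65.481782,
    "linux_start_user": 114.938353,
}
Q35_TUNED = {
    "qemu_init_end": 73.711569,
    "linux_start_kernel": 113.414311,
    "linux_start_user": 190.949939,
}


def phase_deltas(trace):
    """Time spent between consecutive trace points (hypothetical helper)."""
    names = list(trace)
    return {f"{a} -> {b}": trace[b] - trace[a] for a, b in zip(names, names[1:])}


micro = phase_deltas(MICROVM)
q35 = phase_deltas(Q35_TUNED)

# Total gap at the point where userspace starts.
total_gap = Q35_TUNED["linux_start_user"] - MICROVM["linux_start_user"]

# Gap in the phase between the end of QEMU init and kernel entry,
# i.e. the part attributed to firmware in the discussion above.
firmware_key = "qemu_init_end -> linux_start_kernel"
firmware_gap = q35[firmware_key] - micro[firmware_key]

print(f"total gap:    {total_gap:.1f} ms")
print(f"firmware gap: {firmware_gap:.1f} ms")
```

With these inputs the firmware phase accounts for roughly half of the
76 ms total; the rest comes from QEMU initialization and from kernel
start-to-user.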

Stefano: any recommendations for profiling or tuning SeaBIOS?

Stefan



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-23  9:47               ` Stefan Hajnoczi
@ 2019-07-23 10:01                 ` Paolo Bonzini
  2019-07-24 11:14                   ` Paolo Bonzini
  2019-07-23 11:30                 ` Stefano Garzarella
  1 sibling, 1 reply; 68+ messages in thread
From: Paolo Bonzini @ 2019-07-23 10:01 UTC (permalink / raw)
  To: Stefan Hajnoczi, Sergio Lopez
  Cc: ehabkost, mst, Montes, Julio, maran.wilson, qemu-devel, kraxel,
	rth, sgarzare

On 23/07/19 11:47, Stefan Hajnoczi wrote:
> fw_cfg could expose this information so guest kernels know when to
> stop enumerating the PCI bus.  This would make all PCI guests with new
> kernels boot ~50 ms faster, regardless of machine type.

The number of buses is determined by the firmware, not by QEMU, so
fw_cfg would not be the right interface.  In fact (as I have just
learnt) lastbus is an x86-specific option that overrides the last bus
returned by SeaBIOS's handle_1ab101.

So the next step could be to figure out what is the lastbus returned by
handle_1ab101 and possibly why it isn't zero.

Paolo

> The difference between microvm and tuned Q35 is 76 ms now.
> 
> microvm:
> qemu_init_end: 64.043264
> linux_start_kernel: 65.481782 (+1.438518)
> linux_start_user: 114.938353 (+49.456571)
> 
> Q35 with -cpu host and pci=lastbus=0:
> qemu_init_end: 73.711569
> linux_start_kernel: 113.414311 (+39.702742)
> linux_start_user: 190.949939 (+77.535628)
> 
> There is a ~39 ms difference before linux_start_kernel.  SeaBIOS is
> loading the PVH Option ROM.
> 
> Stefano: any recommendations for profiling or tuning SeaBIOS?




* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-23  9:47               ` Stefan Hajnoczi
  2019-07-23 10:01                 ` Paolo Bonzini
@ 2019-07-23 11:30                 ` Stefano Garzarella
  2019-07-24 15:23                   ` Stefano Garzarella
  1 sibling, 1 reply; 68+ messages in thread
From: Stefano Garzarella @ 2019-07-23 11:30 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: ehabkost, Sergio Lopez, mst, Montes, Julio, maran.wilson,
	qemu-devel, kraxel, pbonzini, rth

On Tue, Jul 23, 2019 at 10:47:39AM +0100, Stefan Hajnoczi wrote:
> On Tue, Jul 23, 2019 at 9:43 AM Sergio Lopez <slp@redhat.com> wrote:
> > Montes, Julio <julio.montes@intel.com> writes:
> >
> > > On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
> > >> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote:
> > >> > Stefan Hajnoczi <stefanha@gmail.com> writes:
> > >> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> > >> > > > Stefan Hajnoczi <stefanha@gmail.com> writes:
> > >> > > >
> > >> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> > >> > > >  --------------
> > >> > > >  | Conclusion |
> > >> > > >  --------------
> > >> > > >
> > >> > > > The average boot time of microvm is a third of Q35's (115ms vs.
> > >> > > > 363ms),
> > >> > > > and is smaller on all sections (QEMU initialization, firmware
> > >> > > > overhead
> > >> > > > and kernel start-to-user).
> > >> > > >
> > >> > > > Microvm's memory tree is also visibly simpler, significantly
> > >> > > > reducing
> > >> > > > the exposed surface to the guest.
> > >> > > >
> > >> > > > While we can certainly work on making Q35 smaller, I definitely
> > >> > > > think
> > >> > > > it's better (and way safer!) having a specialized machine type
> > >> > > > for a
> > >> > > > specific use case, than a minimal Q35 whose behavior
> > >> > > > significantly
> > >> > > > diverges from a conventional Q35.
> > >> > >
> > >> > > Interesting, so not a 10x difference!  This might be amenable to
> > >> > > optimization.
> > >> > >
> > >> > > My concern with microvm is that it's so limited that few users
> > >> > > will be
> > >> > > able to benefit from the reduced attack surface and faster
> > >> > > startup time.
> > >> > > I think it's worth investigating slimming down Q35 further first.
> > >> > >
> > >> > > In terms of startup time the first step would be profiling Q35
> > >> > > kernel
> > >> > > startup to find out what's taking so long (firmware
> > >> > > initialization, PCI
> > >> > > probing, etc)?
> > >> >
> > >> > Some findings:
> > >> >
> > >> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
> > >> > saves a
> > >> >     whooping 120ms by avoiding the APIC timer calibration at
> > >> >     arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
> > >> >
> > >> > Average boot time with "-cpu host"
> > >> >  qemu_init_end: 76.408950
> > >> >  linux_start_kernel: 116.166142 (+39.757192)
> > >> >  linux_start_user: 242.954347 (+126.788205)
> > >> >
> > >> > Average boot time with default "cpu"
> > >> >  qemu_init_end: 77.467852
> > >> >  linux_start_kernel: 116.688472 (+39.22062)
> > >> >  linux_start_user: 363.033365 (+246.344893)
> > >>
> > >> \o/
> > >>
> > >> >  2. The other 130ms are a direct result of PCI and ACPI presence
> > >> > (tested
> > >> >     with a kernel without support for those elements). I'll publish
> > >> > some
> > >> >     detailed numbers next week.
> > >>
> > >> Here are the Kata Containers kernel parameters:
> > >>
> > >> var kernelParams = []Param{
> > >>         {"tsc", "reliable"},
> > >>         {"no_timer_check", ""},
> > >>         {"rcupdate.rcu_expedited", "1"},
> > >>         {"i8042.direct", "1"},
> > >>         {"i8042.dumbkbd", "1"},
> > >>         {"i8042.nopnp", "1"},
> > >>         {"i8042.noaux", "1"},
> > >>         {"noreplace-smp", ""},
> > >>         {"reboot", "k"},
> > >>         {"console", "hvc0"},
> > >>         {"console", "hvc1"},
> > >>         {"iommu", "off"},
> > >>         {"cryptomgr.notests", ""},
> > >>         {"net.ifnames", "0"},
> > >>         {"pci", "lastbus=0"},
> > >> }
> > >>
> > >> pci lastbus=0 looks interesting and so do some of the others :).
> > >>
> > >
> > > yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
> > > kernel won't scan the 255.. buses :)
> >
> > I can confirm that adding pci=lastbus=0 makes a significant
> > improvement. In fact, is the only option from Kata's kernel parameter
> > list that has an impact, probably because the kernel is already quite
> > minimalistic.
> >
> > Average boot time with "-cpu host" and "pci=lastbus=0"
> >  qemu_init_end: 73.711569
> >  linux_start_kernel: 113.414311 (+39.702742)
> >  linux_start_user: 190.949939 (+77.535628)
> >
> > That's still ~40% slower than microvm, and the breach quickly widens
> > when adding more PCI devices (each one adds 10-15ms), but it's certainly
> > an improvement over the original numbers.
> >
> > On the other hand, there isn't much we can do here from QEMU's
> > perspective, as this is basically Guest OS tuning.
> 
> fw_cfg could expose this information so guest kernels know when to
> stop enumerating the PCI bus.  This would make all PCI guests with new
> kernels boot ~50 ms faster, regardless of machine type.
> 
> The difference between microvm and tuned Q35 is 76 ms now.
> 
> microvm:
> qemu_init_end: 64.043264
> linux_start_kernel: 65.481782 (+1.438518)
> linux_start_user: 114.938353 (+49.456571)
> 
> Q35 with -cpu host and pci=lastbus=0:
> qemu_init_end: 73.711569
> linux_start_kernel: 113.414311 (+39.702742)
> linux_start_user: 190.949939 (+77.535628)
> 
> There is a ~39 ms difference before linux_start_kernel.  SeaBIOS is
> loading the PVH Option ROM.
> 
> Stefano: any recommendations for profiling or tuning SeaBIOS?

As I said on IRC, the SeaBIOS image shipped with QEMU is version 1.12.1,
which doesn't include this patch (already in upstream SeaBIOS) that saves
~10ms:

    commit 75b42835134553c96f113e5014072c0caf99d092
    Author: Stefano Garzarella <sgarzare@redhat.com>
    Date:   Sun Dec 2 14:10:13 2018 +0100

        qemu: avoid debug prints if debugcon is not enabled

        In order to speed up the boot phase, we can check the QEMU
        debugcon device, and disable the writes if it is not recognized.

        This patch allow us to save around 10 msec (time measured
        between SeaBIOS entry point and "linuxboot" entry point)
        when CONFIG_DEBUG_LEVEL=1 and debugcon is not enabled.

        Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
        Signed-off-by: Kevin O'Connor <kevin@koconnor.net>

As you said, we should update SeaBIOS for the next QEMU release.

For profiling, I have some patches that I used to put trace points in
the SeaBIOS code. I'll put them in this repository ASAP:
    https://github.com/stefano-garzarella/qemu-boot-time



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-23 10:01                 ` Paolo Bonzini
@ 2019-07-24 11:14                   ` Paolo Bonzini
  2019-07-25  9:35                     ` Sergio Lopez
                                       ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Paolo Bonzini @ 2019-07-24 11:14 UTC (permalink / raw)
  To: Stefan Hajnoczi, Sergio Lopez
  Cc: ehabkost, mst, Montes, Julio, maran.wilson, qemu-devel, kraxel,
	rth, sgarzare

On 23/07/19 12:01, Paolo Bonzini wrote:
> The number of buses is determined by the firmware, not by QEMU, so
> fw_cfg would not be the right interface.  In fact (as I have just
> learnt) lastbus is an x86-specific option that overrides the last bus
> returned by SeaBIOS's handle_1ab101.
> 
> So the next step could be to figure out what is the lastbus returned by
> handle_1ab101 and possibly why it isn't zero.

Some updates:

- for 64-bit, PCIBIOS (and thus handle_1ab101) is not called.  PCIBIOS is
only used by 32-bit kernels.  As a side effect, PCI expander bridges do not
work on 32-bit kernels with ACPI disabled, because they are located beyond
pcibios_last_bus (with ACPI enabled, the DSDT exposes them).

- for -M pc, pcibios_last_bus in Linux remains -1 and no "legacy scanning" is done.

- for -M q35, pcibios_last_bus in Linux is set based on the size of the 
MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs 
for buses above 0.

Here is a patch that only scans devfn==0, which should mostly remove the need
for pci=lastbus=0.  (Testing is welcome).
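
The relative cost of the scanning strategies can be estimated by counting
configuration-space reads. The Python sketch below is a rough model under
stated assumptions: buses 1 through 255 are all empty (so every probe
misses and no scan returns early), and each strategy issues one vendor-ID
read per probed devfn. The function name and the strategy labels are made
up for illustration; they only mirror the loops in the patch.

```python
def probes_per_empty_bus(strategy):
    """Vendor-ID reads issued on a bus with no devices (illustrative model)."""
    if strategy == "by_device":
        # devfn = 0, 8, 16, ...: function 0 of each of the 32 devices,
        # as in the default stride-8 loop.
        return len(range(0, 256, 8))
    if strategy == "by_function":
        # Every devfn, as in the Jailhouse variant.
        return len(range(0, 256))
    if strategy == "devfn0_only":
        # Only device 0, function 0, as in the KVM variant.
        return 1
    raise ValueError(f"unknown strategy: {strategy}")


BUSES_ABOVE_ZERO = 255  # assumption: buses 1..255 are all probed and empty

for strategy in ("by_device", "by_function", "devfn0_only"):
    total = BUSES_ABOVE_ZERO * probes_per_empty_bus(strategy)
    print(f"{strategy}: {total} reads")
```

Under this model the default scan issues 32*255 reads and the devfn==0
variant issues 255. Actual timings also depend on the cost of each
config access in the guest, which the sketch does not model.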

Actually, KVM could probably avoid the scanning altogether.  The only "hidden" root
buses we expect are from PCI expander bridges and if you found an MMCONFIG area
through the ACPI MCFG table, you can also use the DSDT to find PCI expander bridges.
However, I am being conservative.

A possible alternative could be a mechanism whereby the vmlinuz real mode entry
point, or the 32-bit PVH entry point, fetches lastbus and passes it to the
kernel via the vmlinuz or PVH boot information structs.  However, I don't think
that's very useful, and there is some risk of breaking real hardware too.

Paolo

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index 73bb404f4d2a..17012aa60d22 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -61,6 +61,7 @@ enum pci_bf_sort_state {
 extern struct pci_ops pci_root_ops;
 
 void pcibios_scan_specific_bus(int busn);
+void pcibios_scan_bus_by_device(int busn);
 
 /* pci-irq.c */
 
@@ -216,8 +217,10 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
 # endif
 # define x86_default_pci_init_irq	pcibios_irq_init
 # define x86_default_pci_fixup_irqs	pcibios_fixup_irqs
+# define x86_default_pci_scan_bus	pcibios_scan_bus_by_device
 #else
 # define x86_default_pci_init		NULL
 # define x86_default_pci_init_irq	NULL
 # define x86_default_pci_fixup_irqs	NULL
+# define x86_default_pci_scan_bus	NULL
 #endif
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index b85a7c54c6a1..4c3a0a17a600 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -251,6 +251,7 @@ struct x86_hyper_runtime {
  * @save_sched_clock_state:	save state for sched_clock() on suspend
  * @restore_sched_clock_state:	restore state for sched_clock() on resume
  * @apic_post_init:		adjust apic if needed
+ * @pci_scan_bus:		scan a PCI bus
  * @legacy:			legacy features
  * @set_legacy_features:	override legacy features. Use of this callback
  * 				is highly discouraged. You should only need
@@ -273,6 +274,7 @@ struct x86_platform_ops {
 	void (*save_sched_clock_state)(void);
 	void (*restore_sched_clock_state)(void);
 	void (*apic_post_init)(void);
+	void (*pci_scan_bus)(int busn);
 	struct x86_legacy_features legacy;
 	void (*set_legacy_features)(void);
 	struct x86_hyper_runtime hyper;
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index 6857b4577f17..b248d7036dd3 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -11,12 +11,14 @@
 #include <linux/acpi_pmtmr.h>
 #include <linux/kernel.h>
 #include <linux/reboot.h>
+#include <linux/pci.h>
 #include <asm/apic.h>
 #include <asm/cpu.h>
 #include <asm/hypervisor.h>
 #include <asm/i8259.h>
 #include <asm/irqdomain.h>
 #include <asm/pci_x86.h>
+#include <asm/pci.h>
 #include <asm/reboot.h>
 #include <asm/setup.h>
 #include <asm/jailhouse_para.h>
@@ -136,6 +138,22 @@ static int __init jailhouse_pci_arch_init(void)
 	return 0;
 }
 
+static void jailhouse_pci_scan_bus_by_function(int busn)
+{
+	int devfn;
+	u32 l;
+
+	for (devfn = 0; devfn < 256; devfn++) {
+		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
+		    l != 0x0000 && l != 0xffff) {
+			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
+			pr_info("PCI: Discovered peer bus %02x\n", busn);
+			pcibios_scan_root(busn);
+			return;
+		}
+	}
+}
+
 static void __init jailhouse_init_platform(void)
 {
 	u64 pa_data = boot_params.hdr.setup_data;
@@ -153,6 +171,7 @@ static void __init jailhouse_init_platform(void)
 	x86_platform.legacy.rtc		= 0;
 	x86_platform.legacy.warm_reset	= 0;
 	x86_platform.legacy.i8042	= X86_LEGACY_I8042_PLATFORM_ABSENT;
+	x86_platform.pci_scan_bus	= jailhouse_pci_scan_bus_by_function;
 
 	legacy_pic			= &null_legacy_pic;
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 82caf01b63dd..59f7204ed8f3 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -24,6 +24,7 @@
 #include <linux/debugfs.h>
 #include <linux/nmi.h>
 #include <linux/swait.h>
+#include <linux/pci.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -33,6 +34,7 @@
 #include <asm/apicdef.h>
 #include <asm/hypervisor.h>
 #include <asm/tlb.h>
+#include <asm/pci.h>
 
 static int kvmapf = 1;
 
@@ -621,10 +623,31 @@ static void kvm_flush_tlb_others(const struct cpumask *cpumask,
 	native_flush_tlb_others(flushmask, info);
 }
 
+#ifdef CONFIG_PCI
+static void kvm_pci_scan_bus(int busn)
+{
+	u32 l;
+
+	/*
+	 * Assume that there are no "hidden" buses, i.e. all PCI root buses
+	 * have a host bridge at device 0, function 0.
+	 */
+	if (!raw_pci_read(0, busn, 0, PCI_VENDOR_ID, 2, &l) &&
+	    l != 0x0000 && l != 0xffff) {
+		pr_info("PCI: Discovered peer bus %02x\n", busn);
+		pcibios_scan_root(busn);
+	}
+}
+#endif
+
 static void __init kvm_guest_init(void)
 {
 	int i;
 
+#ifdef CONFIG_PCI
+	x86_platform.pci_scan_bus = kvm_pci_scan_bus;
+#endif
+
 	if (!kvm_para_available())
 		return;
 
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 50a2b492fdd6..19e1cc2cb6e0 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -118,6 +118,7 @@ struct x86_platform_ops x86_platform __ro_after_init = {
 	.get_nmi_reason			= default_get_nmi_reason,
 	.save_sched_clock_state 	= tsc_save_sched_clock_state,
 	.restore_sched_clock_state 	= tsc_restore_sched_clock_state,
+	.pci_scan_bus			= x86_default_pci_scan_bus,
 	.hyper.pin_vcpu			= x86_op_int_noop,
 };
 
diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 467311b1eeea..6214dbce26d3 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -36,14 +36,19 @@ int __init pci_legacy_init(void)
 
 void pcibios_scan_specific_bus(int busn)
 {
-	int stride = jailhouse_paravirt() ? 1 : 8;
-	int devfn;
-	u32 l;
-
 	if (pci_find_bus(0, busn))
 		return;
 
-	for (devfn = 0; devfn < 256; devfn += stride) {
+	x86_platform.pci_scan_bus(busn);
+}
+EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
+
+void pcibios_scan_bus_by_device(int busn)
+{
+	int devfn;
+	u32 l;
+
+	for (devfn = 0; devfn < 256; devfn += 8) {
 		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
 		    l != 0x0000 && l != 0xffff) {
 			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
@@ -53,7 +58,6 @@ void pcibios_scan_specific_bus(int busn)
 		}
 	}
 }
-EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
 
 static int __init pci_subsys_init(void)
 {



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-23 11:30                 ` Stefano Garzarella
@ 2019-07-24 15:23                   ` Stefano Garzarella
  0 siblings, 0 replies; 68+ messages in thread
From: Stefano Garzarella @ 2019-07-24 15:23 UTC (permalink / raw)
  To: Stefan Hajnoczi, Sergio Lopez
  Cc: ehabkost, mst, Montes, Julio, maran.wilson, qemu-devel, kraxel,
	pbonzini, rth



On Tue, Jul 23, 2019 at 1:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Tue, Jul 23, 2019 at 10:47:39AM +0100, Stefan Hajnoczi wrote:
> > On Tue, Jul 23, 2019 at 9:43 AM Sergio Lopez <slp@redhat.com> wrote:
> > > Montes, Julio <julio.montes@intel.com> writes:
> > >
> > > > On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
> > > >> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote:
> > > >> > Stefan Hajnoczi <stefanha@gmail.com> writes:
> > > >> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> > > >> > > > Stefan Hajnoczi <stefanha@gmail.com> writes:
> > > >> > > >
> > > >> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> > > >> > > >  --------------
> > > >> > > >  | Conclusion |
> > > >> > > >  --------------
> > > >> > > >
> > > >> > > > The average boot time of microvm is a third of Q35's (115ms vs.
> > > >> > > > 363ms),
> > > >> > > > and is smaller on all sections (QEMU initialization, firmware
> > > >> > > > overhead
> > > >> > > > and kernel start-to-user).
> > > >> > > >
> > > >> > > > Microvm's memory tree is also visibly simpler, significantly
> > > >> > > > reducing
> > > >> > > > the exposed surface to the guest.
> > > >> > > >
> > > >> > > > While we can certainly work on making Q35 smaller, I definitely
> > > >> > > > think
> > > >> > > > it's better (and way safer!) having a specialized machine type
> > > >> > > > for a
> > > >> > > > specific use case, than a minimal Q35 whose behavior
> > > >> > > > significantly
> > > >> > > > diverges from a conventional Q35.
> > > >> > >
> > > >> > > Interesting, so not a 10x difference!  This might be amenable to
> > > >> > > optimization.
> > > >> > >
> > > >> > > My concern with microvm is that it's so limited that few users
> > > >> > > will be
> > > >> > > able to benefit from the reduced attack surface and faster
> > > >> > > startup time.
> > > >> > > I think it's worth investigating slimming down Q35 further first.
> > > >> > >
> > > >> > > In terms of startup time the first step would be profiling Q35
> > > >> > > kernel
> > > >> > > startup to find out what's taking so long (firmware
> > > >> > > initialization, PCI
> > > >> > > probing, etc)?
> > > >> >
> > > >> > Some findings:
> > > >> >
> > > >> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
> > > >> > saves a
> > > >> >     whooping 120ms by avoiding the APIC timer calibration at
> > > >> >     arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
> > > >> >
> > > >> > Average boot time with "-cpu host"
> > > >> >  qemu_init_end: 76.408950
> > > >> >  linux_start_kernel: 116.166142 (+39.757192)
> > > >> >  linux_start_user: 242.954347 (+126.788205)
> > > >> >
> > > >> > Average boot time with default "cpu"
> > > >> >  qemu_init_end: 77.467852
> > > >> >  linux_start_kernel: 116.688472 (+39.22062)
> > > >> >  linux_start_user: 363.033365 (+246.344893)
> > > >>
> > > >> \o/
> > > >>
> > > >> >  2. The other 130ms are a direct result of PCI and ACPI presence
> > > >> > (tested
> > > >> >     with a kernel without support for those elements). I'll publish
> > > >> > some
> > > >> >     detailed numbers next week.
> > > >>
> > > >> Here are the Kata Containers kernel parameters:
> > > >>
> > > >> var kernelParams = []Param{
> > > >>         {"tsc", "reliable"},
> > > >>         {"no_timer_check", ""},
> > > >>         {"rcupdate.rcu_expedited", "1"},
> > > >>         {"i8042.direct", "1"},
> > > >>         {"i8042.dumbkbd", "1"},
> > > >>         {"i8042.nopnp", "1"},
> > > >>         {"i8042.noaux", "1"},
> > > >>         {"noreplace-smp", ""},
> > > >>         {"reboot", "k"},
> > > >>         {"console", "hvc0"},
> > > >>         {"console", "hvc1"},
> > > >>         {"iommu", "off"},
> > > >>         {"cryptomgr.notests", ""},
> > > >>         {"net.ifnames", "0"},
> > > >>         {"pci", "lastbus=0"},
> > > >> }
> > > >>
> > > >> pci lastbus=0 looks interesting and so do some of the others :).
> > > >>
> > > >
> > > > yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
> > > > kernel won't scan the 255.. buses :)
> > >
> > > I can confirm that adding pci=lastbus=0 makes a significant
> > > improvement. In fact, is the only option from Kata's kernel parameter
> > > list that has an impact, probably because the kernel is already quite
> > > minimalistic.
> > >
> > > Average boot time with "-cpu host" and "pci=lastbus=0"
> > >  qemu_init_end: 73.711569
> > >  linux_start_kernel: 113.414311 (+39.702742)
> > >  linux_start_user: 190.949939 (+77.535628)
> > >
> > > That's still ~40% slower than microvm, and the breach quickly widens
> > > when adding more PCI devices (each one adds 10-15ms), but it's certainly
> > > an improvement over the original numbers.
> > >
> > > On the other hand, there isn't much we can do here from QEMU's
> > > perspective, as this is basically Guest OS tuning.
> >
> > fw_cfg could expose this information so guest kernels know when to
> > stop enumerating the PCI bus.  This would make all PCI guests with new
> > kernels boot ~50 ms faster, regardless of machine type.
> >
> > The difference between microvm and tuned Q35 is 76 ms now.
> >
> > microvm:
> > qemu_init_end: 64.043264
> > linux_start_kernel: 65.481782 (+1.438518)
> > linux_start_user: 114.938353 (+49.456571)
> >
> > Q35 with -cpu host and pci=lastbus=0:
> > qemu_init_end: 73.711569
> > linux_start_kernel: 113.414311 (+39.702742)
> > linux_start_user: 190.949939 (+77.535628)
> >
> > There is a ~39 ms difference before linux_start_kernel.  SeaBIOS is
> > loading the PVH Option ROM.
> >
> > Stefano: any recommendations for profiling or tuning SeaBIOS?
>
> As I said on IRC, the SeaBIOS image in QEMU is the 1.12.1 and it doesn't
> include this patch (available in the upstream SeaBIOS) that saves ~10ms:
>
>     commit 75b42835134553c96f113e5014072c0caf99d092
>     Author: Stefano Garzarella <sgarzare@redhat.com>
>     Date:   Sun Dec 2 14:10:13 2018 +0100
>
>         qemu: avoid debug prints if debugcon is not enabled
>
>         In order to speed up the boot phase, we can check the QEMU
>         debugcon device, and disable the writes if it is not recognized.
>
>         This patch allow us to save around 10 msec (time measured
>         between SeaBIOS entry point and "linuxboot" entry point)
>         when CONFIG_DEBUG_LEVEL=1 and debugcon is not enabled.
>
>         Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>         Signed-off-by: Kevin O'Connor <kevin@koconnor.net>
>
> As you said, we should update SeaBIOS for the next QEMU release.
>
> For profiling, I have some patches that I used to put trace points in
> the SeaBIOS code. I'll put them in this repository ASAP:
>     https://github.com/stefano-garzarella/qemu-boot-time

I pushed the QEMU (optionrom) and SeaBIOS patches to:
https://github.com/stefano-garzarella/qemu-boot-time
They can be useful for profiling.

Cheers,
Stefano



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-24 11:14                   ` Paolo Bonzini
@ 2019-07-25  9:35                     ` Sergio Lopez
  2019-07-25 10:03                     ` Michael S. Tsirkin
  2019-07-25 14:46                     ` Michael S. Tsirkin
  2 siblings, 0 replies; 68+ messages in thread
From: Sergio Lopez @ 2019-07-25  9:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: ehabkost, mst, Montes, Julio, Stefan Hajnoczi, maran.wilson,
	qemu-devel, kraxel, rth, sgarzare



Paolo Bonzini <pbonzini@redhat.com> writes:

> On 23/07/19 12:01, Paolo Bonzini wrote:
>> The number of buses is determined by the firmware, not by QEMU, so
>> fw_cfg would not be the right interface.  In fact (as I have just
>> learnt) lastbus is an x86-specific option that overrides the last bus
>> returned by SeaBIOS's handle_1ab101.
>> 
>> So the next step could be to figure out what is the lastbus returned by
>> handle_1ab101 and possibly why it isn't zero.
>
> Some update:
>
> - for 64-bit, PCIBIOS (and thus handle_1ab101) is not called.  PCIBIOS is
> only used by 32-bit kernels.  As a side effect, PCI expander bridges do not
> work on 32-bit kernels with ACPI disabled, because they are located beyond
> pcibios_last_bus (with ACPI enabled, the DSDT exposes them).
>
> - for -M pc, pcibios_last_bus in Linux remains -1 and no "legacy scanning" is done.
>
> - for -M q35, pcibios_last_bus in Linux is set based on the size of the 
> MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs 
> for buses above 0.
>
> Here is a patch that only scans devfn==0, which should mostly remove the need
> for pci=lastbus=0.  (Testing is welcome).

I just gave it a try. These are the results (avg on 10 consecutive runs):

 - Unpatched kernel:

Avg
 qemu_init_end: 75.207386
 linux_start_kernel: 115.056767 (+39.849381)
 linux_start_user: 241.020113 (+125.963346)

 - Unpatched kernel with pci=lastbus=0:

Avg
 qemu_init_end: 75.468282
 linux_start_kernel: 115.189322 (+39.72104)
 linux_start_user: 192.404823 (+77.215501)

 - Patched kernel (without pci=lastbus=0):

Avg
 qemu_init_end: 75.605627
 linux_start_kernel: 115.656557 (+40.05093)
 linux_start_user: 192.857655 (+77.201098)

Looks fine to me. There must be an extra cost in the patched kernel
vs. using pci=lastbus=0, but it's so low that it's hard to catch in the
average numbers.
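
For reference, the savings can be read directly off the kernel
start-to-user deltas. The short Python sketch below does that
subtraction; the dictionary and its labels are only an illustrative way
of holding the three averages reported above.

```python
# linux_start_kernel -> linux_start_user deltas (ms), copied from the
# three averaged runs above.
KERNEL_TO_USER_MS = {
    "unpatched": 125.963346,
    "unpatched, pci=lastbus=0": 77.215501,
    "patched": 77.201098,
}

baseline = KERNEL_TO_USER_MS["unpatched"]
savings = {name: baseline - delta for name, delta in KERNEL_TO_USER_MS.items()}

for name, saved in savings.items():
    share = saved / baseline
    print(f"{name}: {saved:.1f} ms saved ({share:.0%} of the unpatched delta)")
```

Both variants save a little under 49 ms, and the two differ by less than
0.1 ms, which is consistent with any extra cost of the patched scan being
lost in the averaging.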

> Actually, KVM could probably avoid the scanning altogether.  The only "hidden" root
> buses we expect are from PCI expander bridges and if you found an MMCONFIG area
> through the ACPI MCFG table, you can also use the DSDT to find PCI expander bridges.
> However, I am being conservative.
>
> A possible alternative could be a mechanism whereby the vmlinuz real mode entry
> point, or the 32-bit PVH entry point, fetch lastbus and they pass it to the
> kernel via the vmlinuz or PVH boot information structs.  However, I don't think
> that's very useful, and there is some risk of breaking real hardware too.
>
> Paolo
>
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index 73bb404f4d2a..17012aa60d22 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -61,6 +61,7 @@ enum pci_bf_sort_state {
>  extern struct pci_ops pci_root_ops;
>  
>  void pcibios_scan_specific_bus(int busn);
> +void pcibios_scan_bus_by_device(int busn);
>  
>  /* pci-irq.c */
>  
> @@ -216,8 +217,10 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
>  # endif
>  # define x86_default_pci_init_irq	pcibios_irq_init
>  # define x86_default_pci_fixup_irqs	pcibios_fixup_irqs
> +# define x86_default_pci_scan_bus	pcibios_scan_bus_by_device
>  #else
>  # define x86_default_pci_init		NULL
>  # define x86_default_pci_init_irq	NULL
>  # define x86_default_pci_fixup_irqs	NULL
> +# define x86_default_pci_scan_bus      NULL
>  #endif
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index b85a7c54c6a1..4c3a0a17a600 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -251,6 +251,7 @@ struct x86_hyper_runtime {
>   * @save_sched_clock_state:	save state for sched_clock() on suspend
>   * @restore_sched_clock_state:	restore state for sched_clock() on resume
>   * @apic_post_init:		adjust apic if needed
> + * @pci_scan_bus:		scan a PCI bus
>   * @legacy:			legacy features
>   * @set_legacy_features:	override legacy features. Use of this callback
>   * 				is highly discouraged. You should only need
> @@ -273,6 +274,7 @@ struct x86_platform_ops {
>  	void (*save_sched_clock_state)(void);
>  	void (*restore_sched_clock_state)(void);
>  	void (*apic_post_init)(void);
> +	void (*pci_scan_bus)(int busn);
>  	struct x86_legacy_features legacy;
>  	void (*set_legacy_features)(void);
>  	struct x86_hyper_runtime hyper;
> diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
> index 6857b4577f17..b248d7036dd3 100644
> --- a/arch/x86/kernel/jailhouse.c
> +++ b/arch/x86/kernel/jailhouse.c
> @@ -11,12 +11,14 @@
>  #include <linux/acpi_pmtmr.h>
>  #include <linux/kernel.h>
>  #include <linux/reboot.h>
> +#include <linux/pci.h>
>  #include <asm/apic.h>
>  #include <asm/cpu.h>
>  #include <asm/hypervisor.h>
>  #include <asm/i8259.h>
>  #include <asm/irqdomain.h>
>  #include <asm/pci_x86.h>
> +#include <asm/pci.h>
>  #include <asm/reboot.h>
>  #include <asm/setup.h>
>  #include <asm/jailhouse_para.h>
> @@ -136,6 +138,22 @@ static int __init jailhouse_pci_arch_init(void)
>  	return 0;
>  }
>  
> +static void jailhouse_pci_scan_bus_by_function(int busn)
> +{
> +        int devfn;
> +        u32 l;
> +
> +        for (devfn = 0; devfn < 256; devfn++) {
> +                if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
> +                    l != 0x0000 && l != 0xffff) {
> +                        DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
> +                        pr_info("PCI: Discovered peer bus %02x\n", busn);
> +                        pcibios_scan_root(busn);
> +                        return;
> +                }
> +        }
> +}
> +
>  static void __init jailhouse_init_platform(void)
>  {
>  	u64 pa_data = boot_params.hdr.setup_data;
> @@ -153,6 +171,7 @@ static void __init jailhouse_init_platform(void)
>  	x86_platform.legacy.rtc		= 0;
>  	x86_platform.legacy.warm_reset	= 0;
>  	x86_platform.legacy.i8042	= X86_LEGACY_I8042_PLATFORM_ABSENT;
> +	x86_platform.pci_scan_bus	= jailhouse_pci_scan_bus_by_function;
>  
>  	legacy_pic			= &null_legacy_pic;
>  
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 82caf01b63dd..59f7204ed8f3 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -24,6 +24,7 @@
>  #include <linux/debugfs.h>
>  #include <linux/nmi.h>
>  #include <linux/swait.h>
> +#include <linux/pci.h>
>  #include <asm/timer.h>
>  #include <asm/cpu.h>
>  #include <asm/traps.h>
> @@ -33,6 +34,7 @@
>  #include <asm/apicdef.h>
>  #include <asm/hypervisor.h>
>  #include <asm/tlb.h>
> +#include <asm/pci.h>
>  
>  static int kvmapf = 1;
>  
> @@ -621,10 +623,31 @@ static void kvm_flush_tlb_others(const struct cpumask *cpumask,
>  	native_flush_tlb_others(flushmask, info);
>  }
>  
> +#ifdef CONFIG_PCI
> +static void kvm_pci_scan_bus(int busn)
> +{
> +        u32 l;
> +
> +	/*
> +	 * Assume that there are no "hidden" buses, i.e. all PCI root buses
> +	 * have a host bridge at device 0, function 0.
> +	 */
> +	if (!raw_pci_read(0, busn, 0, PCI_VENDOR_ID, 2, &l) &&
> +	    l != 0x0000 && l != 0xffff) {
> +		pr_info("PCI: Discovered peer bus %02x\n", busn);
> +		pcibios_scan_root(busn);
> +        }
> +}
> +#endif
> +
>  static void __init kvm_guest_init(void)
>  {
>  	int i;
>  
> +#ifdef CONFIG_PCI
> +	x86_platform.pci_scan_bus = kvm_pci_scan_bus;
> +#endif
> +
>  	if (!kvm_para_available())
>  		return;
>  
> diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
> index 50a2b492fdd6..19e1cc2cb6e0 100644
> --- a/arch/x86/kernel/x86_init.c
> +++ b/arch/x86/kernel/x86_init.c
> @@ -118,6 +118,7 @@ struct x86_platform_ops x86_platform __ro_after_init = {
>  	.get_nmi_reason			= default_get_nmi_reason,
>  	.save_sched_clock_state 	= tsc_save_sched_clock_state,
>  	.restore_sched_clock_state 	= tsc_restore_sched_clock_state,
> +	.pci_scan_bus			= x86_default_pci_scan_bus,
>  	.hyper.pin_vcpu			= x86_op_int_noop,
>  };
>  
> diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
> index 467311b1eeea..6214dbce26d3 100644
> --- a/arch/x86/pci/legacy.c
> +++ b/arch/x86/pci/legacy.c
> @@ -36,14 +36,19 @@ int __init pci_legacy_init(void)
>  
>  void pcibios_scan_specific_bus(int busn)
>  {
> -	int stride = jailhouse_paravirt() ? 1 : 8;
> -	int devfn;
> -	u32 l;
> -
>  	if (pci_find_bus(0, busn))
>  		return;
>  
> -	for (devfn = 0; devfn < 256; devfn += stride) {
> +	x86_platform.pci_scan_bus(busn);
> +}
> +EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
> +
> +void pcibios_scan_bus_by_device(int busn)
> +{
> +	int devfn;
> +	u32 l;
> +
> +	for (devfn = 0; devfn < 256; devfn += 8) {
>  		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
>  		    l != 0x0000 && l != 0xffff) {
>  			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
> @@ -53,7 +58,6 @@ void pcibios_scan_specific_bus(int busn)
>  		}
>  	}
>  }
> -EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
>  
>  static int __init pci_subsys_init(void)
>  {



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/4] hw/virtio: Factorize virtio-mmio headers
  2019-07-02 12:11 ` [Qemu-devel] [PATCH v3 1/4] hw/virtio: Factorize virtio-mmio headers Sergio Lopez
@ 2019-07-25  9:46   ` Liam Merwick
  2019-07-25  9:58     ` Michael S. Tsirkin
  0 siblings, 1 reply; 68+ messages in thread
From: Liam Merwick @ 2019-07-25  9:46 UTC (permalink / raw)
  To: Sergio Lopez, mst, marcel.apfelbaum, pbonzini, rth, ehabkost,
	maran.wilson, sgarzare, kraxel
  Cc: qemu-devel

On 02/07/2019 13:11, Sergio Lopez wrote:
> Put QOM and main struct definition in a separate header file, so it
> can be accesed from other components.

typo: accesed -> accessed

> 
> This is needed for the microvm machine type implementation.
> 
> Signed-off-by: Sergio Lopez <slp@redhat.com>

One nit below, either way

Reviewed-by: Liam Merwick <liam.merwick@oracle.com>


> ---
>   hw/virtio/virtio-mmio.c | 35 +-----------------------
>   hw/virtio/virtio-mmio.h | 60 +++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 61 insertions(+), 34 deletions(-)
>   create mode 100644 hw/virtio/virtio-mmio.h
> 
> diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
> index 97b7f35496..87c7fe4d8d 100644
> --- a/hw/virtio/virtio-mmio.c
> +++ b/hw/virtio/virtio-mmio.c
> @@ -26,44 +26,11 @@
>   #include "qemu/host-utils.h"
>   #include "qemu/module.h"
>   #include "sysemu/kvm.h"
> -#include "hw/virtio/virtio-bus.h"
> +#include "virtio-mmio.h"


Virtually all the other includes of virtio-xxx.h files in hw/virtio use 
the full path - e.g. "hw/virtio/virtio-mmio.h" - maybe do the same to be 
consistent.


>   #include "qemu/error-report.h"
>   #include "qemu/log.h"
>   #include "trace.h"
>   
> -/* QOM macros */
> -/* virtio-mmio-bus */
> -#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
> -#define VIRTIO_MMIO_BUS(obj) \
> -        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
> -#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
> -        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
> -#define VIRTIO_MMIO_BUS_CLASS(klass) \
> -        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
> -
> -/* virtio-mmio */
> -#define TYPE_VIRTIO_MMIO "virtio-mmio"
> -#define VIRTIO_MMIO(obj) \
> -        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
> -
> -#define VIRT_MAGIC 0x74726976 /* 'virt' */
> -#define VIRT_VERSION 1
> -#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
> -
> -typedef struct {
> -    /* Generic */
> -    SysBusDevice parent_obj;
> -    MemoryRegion iomem;
> -    qemu_irq irq;
> -    /* Guest accessible state needing migration and reset */
> -    uint32_t host_features_sel;
> -    uint32_t guest_features_sel;
> -    uint32_t guest_page_shift;
> -    /* virtio-bus */
> -    VirtioBusState bus;
> -    bool format_transport_address;
> -} VirtIOMMIOProxy;
> -
>   static bool virtio_mmio_ioeventfd_enabled(DeviceState *d)
>   {
>       return kvm_eventfds_enabled();
> diff --git a/hw/virtio/virtio-mmio.h b/hw/virtio/virtio-mmio.h
> new file mode 100644
> index 0000000000..2f3973f8c7
> --- /dev/null
> +++ b/hw/virtio/virtio-mmio.h
> @@ -0,0 +1,60 @@
> +/*
> + * Virtio MMIO bindings
> + *
> + * Copyright (c) 2011 Linaro Limited
> + *
> + * Author:
> + *  Peter Maydell <peter.maydell@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef QEMU_VIRTIO_MMIO_H
> +#define QEMU_VIRTIO_MMIO_H
> +
> +#include "hw/virtio/virtio-bus.h"
> +
> +/* QOM macros */
> +/* virtio-mmio-bus */
> +#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
> +#define VIRTIO_MMIO_BUS(obj) \
> +        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
> +#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
> +        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
> +#define VIRTIO_MMIO_BUS_CLASS(klass) \
> +        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
> +
> +/* virtio-mmio */
> +#define TYPE_VIRTIO_MMIO "virtio-mmio"
> +#define VIRTIO_MMIO(obj) \
> +        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
> +
> +#define VIRT_MAGIC 0x74726976 /* 'virt' */
> +#define VIRT_VERSION 1
> +#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
> +
> +typedef struct {
> +    /* Generic */
> +    SysBusDevice parent_obj;
> +    MemoryRegion iomem;
> +    qemu_irq irq;
> +    /* Guest accessible state needing migration and reset */
> +    uint32_t host_features_sel;
> +    uint32_t guest_features_sel;
> +    uint32_t guest_page_shift;
> +    /* virtio-bus */
> +    VirtioBusState bus;
> +    bool format_transport_address;
> +} VirtIOMMIOProxy;
> +
> +#endif
> 



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/4] hw/virtio: Factorize virtio-mmio headers
  2019-07-25  9:46   ` Liam Merwick
@ 2019-07-25  9:58     ` Michael S. Tsirkin
  2019-07-25 10:03       ` Peter Maydell
  2019-07-25 10:36       ` Paolo Bonzini
  0 siblings, 2 replies; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25  9:58 UTC (permalink / raw)
  To: Liam Merwick
  Cc: ehabkost, Sergio Lopez, maran.wilson, qemu-devel, kraxel,
	pbonzini, sgarzare, rth

On Thu, Jul 25, 2019 at 10:46:00AM +0100, Liam Merwick wrote:
> On 02/07/2019 13:11, Sergio Lopez wrote:
> > Put QOM and main struct definition in a separate header file, so it
> > can be accesed from other components.
> 
> typo: accesed -> accessed
> 
> > 
> > This is needed for the microvm machine type implementation.
> > 
> > Signed-off-by: Sergio Lopez <slp@redhat.com>
> 
> One nit below, either way
> 
> Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
> 
> > ---
> >   hw/virtio/virtio-mmio.c | 35 +-----------------------
> >   hw/virtio/virtio-mmio.h | 60 +++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 61 insertions(+), 34 deletions(-)
> >   create mode 100644 hw/virtio/virtio-mmio.h
> > 
> > diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
> > index 97b7f35496..87c7fe4d8d 100644
> > --- a/hw/virtio/virtio-mmio.c
> > +++ b/hw/virtio/virtio-mmio.c
> > @@ -26,44 +26,11 @@
> >   #include "qemu/host-utils.h"
> >   #include "qemu/module.h"
> >   #include "sysemu/kvm.h"
> > -#include "hw/virtio/virtio-bus.h"
> > +#include "virtio-mmio.h"
> 
> 
> Virtually all the other includes of virtio-xxx.h files in hw/virtio use the
> full path - e.g. "hw/virtio/virtio-mmio.h" - maybe do the same to be
> consistent.

That's for headers under include/.
Local ones are ok with a short name.


> 
> >   #include "qemu/error-report.h"
> >   #include "qemu/log.h"
> >   #include "trace.h"
> > -/* QOM macros */
> > -/* virtio-mmio-bus */
> > -#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
> > -#define VIRTIO_MMIO_BUS(obj) \
> > -        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
> > -#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
> > -        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
> > -#define VIRTIO_MMIO_BUS_CLASS(klass) \
> > -        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
> > -
> > -/* virtio-mmio */
> > -#define TYPE_VIRTIO_MMIO "virtio-mmio"
> > -#define VIRTIO_MMIO(obj) \
> > -        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
> > -
> > -#define VIRT_MAGIC 0x74726976 /* 'virt' */
> > -#define VIRT_VERSION 1
> > -#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
> > -
> > -typedef struct {
> > -    /* Generic */
> > -    SysBusDevice parent_obj;
> > -    MemoryRegion iomem;
> > -    qemu_irq irq;
> > -    /* Guest accessible state needing migration and reset */
> > -    uint32_t host_features_sel;
> > -    uint32_t guest_features_sel;
> > -    uint32_t guest_page_shift;
> > -    /* virtio-bus */
> > -    VirtioBusState bus;
> > -    bool format_transport_address;
> > -} VirtIOMMIOProxy;
> > -
> >   static bool virtio_mmio_ioeventfd_enabled(DeviceState *d)
> >   {
> >       return kvm_eventfds_enabled();
> > diff --git a/hw/virtio/virtio-mmio.h b/hw/virtio/virtio-mmio.h
> > new file mode 100644
> > index 0000000000..2f3973f8c7
> > --- /dev/null
> > +++ b/hw/virtio/virtio-mmio.h
> > @@ -0,0 +1,60 @@
> > +/*
> > + * Virtio MMIO bindings
> > + *
> > + * Copyright (c) 2011 Linaro Limited
> > + *
> > + * Author:
> > + *  Peter Maydell <peter.maydell@linaro.org>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License along
> > + * with this program; if not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#ifndef QEMU_VIRTIO_MMIO_H
> > +#define QEMU_VIRTIO_MMIO_H
> > +
> > +#include "hw/virtio/virtio-bus.h"
> > +
> > +/* QOM macros */
> > +/* virtio-mmio-bus */
> > +#define TYPE_VIRTIO_MMIO_BUS "virtio-mmio-bus"
> > +#define VIRTIO_MMIO_BUS(obj) \
> > +        OBJECT_CHECK(VirtioBusState, (obj), TYPE_VIRTIO_MMIO_BUS)
> > +#define VIRTIO_MMIO_BUS_GET_CLASS(obj) \
> > +        OBJECT_GET_CLASS(VirtioBusClass, (obj), TYPE_VIRTIO_MMIO_BUS)
> > +#define VIRTIO_MMIO_BUS_CLASS(klass) \
> > +        OBJECT_CLASS_CHECK(VirtioBusClass, (klass), TYPE_VIRTIO_MMIO_BUS)
> > +
> > +/* virtio-mmio */
> > +#define TYPE_VIRTIO_MMIO "virtio-mmio"
> > +#define VIRTIO_MMIO(obj) \
> > +        OBJECT_CHECK(VirtIOMMIOProxy, (obj), TYPE_VIRTIO_MMIO)
> > +
> > +#define VIRT_MAGIC 0x74726976 /* 'virt' */
> > +#define VIRT_VERSION 1
> > +#define VIRT_VENDOR 0x554D4551 /* 'QEMU' */
> > +
> > +typedef struct {
> > +    /* Generic */
> > +    SysBusDevice parent_obj;
> > +    MemoryRegion iomem;
> > +    qemu_irq irq;
> > +    /* Guest accessible state needing migration and reset */
> > +    uint32_t host_features_sel;
> > +    uint32_t guest_features_sel;
> > +    uint32_t guest_page_shift;
> > +    /* virtio-bus */
> > +    VirtioBusState bus;
> > +    bool format_transport_address;
> > +} VirtIOMMIOProxy;


I'm repeating myself, but still: if you insist on virtio mmio, please
implement virtio 1 and use that with microvm. We can't keep carrying
legacy interface into every new machine type.

> > +
> > +#endif
> > 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-02 22:04       ` Sergio Lopez
@ 2019-07-25  9:59         ` Michael S. Tsirkin
  2019-07-25 10:05           ` Peter Maydell
  0 siblings, 1 reply; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25  9:59 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: Peter Maydell, Eduardo Habkost, maran.wilson, QEMU Developers,
	Gerd Hoffmann, Paolo Bonzini, Stefano Garzarella,
	Richard Henderson

On Wed, Jul 03, 2019 at 12:04:00AM +0200, Sergio Lopez wrote:
> On Tue, Jul 02, 2019 at 07:04:15PM +0100, Peter Maydell wrote:
> > On Tue, 2 Jul 2019 at 18:34, Sergio Lopez <slp@redhat.com> wrote:
> > > Peter Maydell <peter.maydell@linaro.org> writes:
> > > > Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
> > > > a bit deprecated and tends not to support all the features that
> > > > virtio-pci does. It was introduced mostly as a stopgap while we
> > > > didn't have pci support in the aarch64 virt machine, and remains
> > > > for legacy "we don't like to break existing working setups" rather
> > > > than as a recommended config for new systems.
> > >
> > > Using virtio-pci implies keeping PCI and ACPI support, defeating a
> > > significant part of microvm's purpose.
> > >
> > > What are the issues with the current state of virtio-mmio? Is there a
> > > way I can help to improve the situation?
> > 
> > Off the top of my head:
> >  * limitations on numbers of devices
> >  * no hotplug support
> >  * unlike PCI, it's not probeable, so you have to tell the
> >    guest where all the transports are using device tree or
> >    some similar mechanism
> >  * you need one IRQ line per transport, which restricts how
> >    many you can have
> >  * it's only virtio-0.9, it doesn't support any of the new
> >    virtio-1.0 functionality
> >  * it is broadly not really maintained in QEMU (and I think
> >    not really in the kernel either? not sure), because we'd
> >    rather not have to maintain two mechanisms for doing virtio
> >    when virtio-pci is clearly better than virtio-mmio
> 
> Some of these are design issues, but others can be improved with a bit
> of work.
> 
> As for the maintenance burden, I volunteer myself to help with that, so
> it won't have an impact on other developers and/or projects.
> 
> Sergio.

OK so please start with adding virtio 1 support. Guest bits
have been ready for years now.
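
For readers following along, the legacy vs. virtio 1 distinction on the
MMIO transport is visible in the first two registers. The following is a
minimal sketch, not QEMU or kernel code: it assumes the register layout
from the virtio specification (MagicValue at offset 0x000, Version at
offset 0x004, where 1 means legacy and 2 means virtio 1). The names
probe_transport(), reg_read() and the enum are hypothetical, and the
register window is simulated with a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_MMIO_MAGIC_VALUE 0x000      /* reads back 'virt' */
#define VIRTIO_MMIO_VERSION     0x004      /* 1 = legacy, 2 = virtio 1 */
#define VIRT_MAGIC              0x74726976 /* same value as in the patch */

/* Hypothetical classification of what sits behind a transport window. */
enum transport_kind { TRANSPORT_NONE, TRANSPORT_LEGACY, TRANSPORT_MODERN };

/* Stand-in for a 32-bit MMIO read; here the "device" is just an array. */
static uint32_t reg_read(const uint32_t *regs, unsigned int offset)
{
    return regs[offset / 4];
}

/* Check the magic first, then classify the transport by its version. */
static enum transport_kind probe_transport(const uint32_t *regs)
{
    assert(regs != NULL);

    if (reg_read(regs, VIRTIO_MMIO_MAGIC_VALUE) != VIRT_MAGIC)
        return TRANSPORT_NONE;

    switch (reg_read(regs, VIRTIO_MMIO_VERSION)) {
    case 1:
        return TRANSPORT_LEGACY;
    case 2:
        return TRANSPORT_MODERN;
    default:
        return TRANSPORT_NONE;
    }
}
```

With VIRT_VERSION defined as 1, as in the header being factored out, a
probe like this would classify the transport as legacy.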

-- 
MST


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-24 11:14                   ` Paolo Bonzini
  2019-07-25  9:35                     ` Sergio Lopez
@ 2019-07-25 10:03                     ` Michael S. Tsirkin
  2019-07-25 10:55                       ` Paolo Bonzini
  2019-07-25 14:46                     ` Michael S. Tsirkin
  2 siblings, 1 reply; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25 10:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: ehabkost, Sergio Lopez, maran.wilson, Montes, Julio,
	Stefan Hajnoczi, qemu-devel, kraxel, rth, sgarzare

On Wed, Jul 24, 2019 at 01:14:35PM +0200, Paolo Bonzini wrote:
> On 23/07/19 12:01, Paolo Bonzini wrote:
> > The number of buses is determined by the firmware, not by QEMU, so
> > fw_cfg would not be the right interface.  In fact (as I have just
> > learnt) lastbus is an x86-specific option that overrides the last bus
> > returned by SeaBIOS's handle_1ab101.
> > 
> > So the next step could be to figure out what is the lastbus returned by
> > handle_1ab101 and possibly why it isn't zero.
> 
> Some update:
> 
> - for 64-bit, PCIBIOS (and thus handle_1ab101) is not called.  PCIBIOS is
> only used by 32-bit kernels.  As a side effect, PCI expander bridges do not
> work on 32-bit kernels with ACPI disabled, because they are located beyond
> pcibios_last_bus (with ACPI enabled, the DSDT exposes them).
> 
> - for -M pc, pcibios_last_bus in Linux remains -1 and no "legacy scanning" is done.
> 
> - for -M q35, pcibios_last_bus in Linux is set based on the size of the 
> MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs 
> for buses above 0.
> 
> Here is a patch that only scans devfn==0, which should mostly remove the need
> for pci=lastbus=0.  (Testing is welcome).
> 
> Actually, KVM could probably avoid the scanning altogether.  The only "hidden" root
> buses we expect are from PCI expander bridges and if you found an MMCONFIG area
> through the ACPI MCFG table, you can also use the DSDT to find PCI expander bridges.
> However, I am being conservative.
> 
> A possible alternative could be a mechanism whereby the vmlinuz real mode entry
> point, or the 32-bit PVH entry point, fetch lastbus and they pass it to the
> kernel via the vmlinuz or PVH boot information structs.  However, I don't think
> that's very useful, and there is some risk of breaking real hardware too.
> 
> Paolo
> 
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index 73bb404f4d2a..17012aa60d22 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -61,6 +61,7 @@ enum pci_bf_sort_state {
>  extern struct pci_ops pci_root_ops;
>  
>  void pcibios_scan_specific_bus(int busn);
> +void pcibios_scan_bus_by_device(int busn);
>  
>  /* pci-irq.c */
>  
> @@ -216,8 +217,10 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
>  # endif
>  # define x86_default_pci_init_irq	pcibios_irq_init
>  # define x86_default_pci_fixup_irqs	pcibios_fixup_irqs
> +# define x86_default_pci_scan_bus	pcibios_scan_bus_by_device
>  #else
>  # define x86_default_pci_init		NULL
>  # define x86_default_pci_init_irq	NULL
>  # define x86_default_pci_fixup_irqs	NULL
> +# define x86_default_pci_scan_bus      NULL
>  #endif
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index b85a7c54c6a1..4c3a0a17a600 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -251,6 +251,7 @@ struct x86_hyper_runtime {
>   * @save_sched_clock_state:	save state for sched_clock() on suspend
>   * @restore_sched_clock_state:	restore state for sched_clock() on resume
>   * @apic_post_init:		adjust apic if needed
> + * @pci_scan_bus:		scan a PCI bus
>   * @legacy:			legacy features
>   * @set_legacy_features:	override legacy features. Use of this callback
>   * 				is highly discouraged. You should only need
> @@ -273,6 +274,7 @@ struct x86_platform_ops {
>  	void (*save_sched_clock_state)(void);
>  	void (*restore_sched_clock_state)(void);
>  	void (*apic_post_init)(void);
> +	void (*pci_scan_bus)(int busn);
>  	struct x86_legacy_features legacy;
>  	void (*set_legacy_features)(void);
>  	struct x86_hyper_runtime hyper;
> diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
> index 6857b4577f17..b248d7036dd3 100644
> --- a/arch/x86/kernel/jailhouse.c
> +++ b/arch/x86/kernel/jailhouse.c
> @@ -11,12 +11,14 @@
>  #include <linux/acpi_pmtmr.h>
>  #include <linux/kernel.h>
>  #include <linux/reboot.h>
> +#include <linux/pci.h>
>  #include <asm/apic.h>
>  #include <asm/cpu.h>
>  #include <asm/hypervisor.h>
>  #include <asm/i8259.h>
>  #include <asm/irqdomain.h>
>  #include <asm/pci_x86.h>
> +#include <asm/pci.h>
>  #include <asm/reboot.h>
>  #include <asm/setup.h>
>  #include <asm/jailhouse_para.h>
> @@ -136,6 +138,22 @@ static int __init jailhouse_pci_arch_init(void)
>  	return 0;
>  }
>  
> +static void jailhouse_pci_scan_bus_by_function(int busn)
> +{
> +        int devfn;
> +        u32 l;
> +
> +        for (devfn = 0; devfn < 256; devfn++) {
> +                if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
> +                    l != 0x0000 && l != 0xffff) {
> +                        DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
> +                        pr_info("PCI: Discovered peer bus %02x\n", busn);
> +                        pcibios_scan_root(busn);
> +                        return;
> +                }
> +        }
> +}
> +
>  static void __init jailhouse_init_platform(void)
>  {
>  	u64 pa_data = boot_params.hdr.setup_data;
> @@ -153,6 +171,7 @@ static void __init jailhouse_init_platform(void)
>  	x86_platform.legacy.rtc		= 0;
>  	x86_platform.legacy.warm_reset	= 0;
>  	x86_platform.legacy.i8042	= X86_LEGACY_I8042_PLATFORM_ABSENT;
> +	x86_platform.pci_scan_bus	= jailhouse_pci_scan_bus_by_function;
>  
>  	legacy_pic			= &null_legacy_pic;
>  
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 82caf01b63dd..59f7204ed8f3 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -24,6 +24,7 @@
>  #include <linux/debugfs.h>
>  #include <linux/nmi.h>
>  #include <linux/swait.h>
> +#include <linux/pci.h>
>  #include <asm/timer.h>
>  #include <asm/cpu.h>
>  #include <asm/traps.h>
> @@ -33,6 +34,7 @@
>  #include <asm/apicdef.h>
>  #include <asm/hypervisor.h>
>  #include <asm/tlb.h>
> +#include <asm/pci.h>
>  
>  static int kvmapf = 1;
>  
> @@ -621,10 +623,31 @@ static void kvm_flush_tlb_others(const struct cpumask *cpumask,
>  	native_flush_tlb_others(flushmask, info);
>  }
>  
> +#ifdef CONFIG_PCI
> +static void kvm_pci_scan_bus(int busn)
> +{
> +        u32 l;
> +
> +	/*
> +	 * Assume that there are no "hidden" buses, i.e. all PCI root buses
> +	 * have a host bridge at device 0, function 0.
> +	 */
> +	if (!raw_pci_read(0, busn, 0, PCI_VENDOR_ID, 2, &l) &&
> +	    l != 0x0000 && l != 0xffff) {
> +		pr_info("PCI: Discovered peer bus %02x\n", busn);
> +		pcibios_scan_root(busn);
> +        }
> +}
> +#endif
> +
>  static void __init kvm_guest_init(void)
>  {
>  	int i;
>  
> +#ifdef CONFIG_PCI
> +	x86_platform.pci_scan_bus = kvm_pci_scan_bus;
> +#endif
> +
>  	if (!kvm_para_available())
>  		return;
>  

Shouldn't this happen after the kvm_para_available() check?
In fact, let's add a CPUID flag for this, so it's
easy to tell the guest whether to scan extra buses.
What do you say?
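
A rough sketch of the ordering being suggested, under stated assumptions:
the hook is installed only after the paravirt check passes, and only if
the hypervisor advertises that there are no hidden root buses. Everything
here is a simplified model, not kernel code. struct platform_ops,
struct guest_info, the no_hidden_buses field and guest_init() are
hypothetical names, and no such CPUID bit is defined in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*pci_scan_bus_fn)(int busn);

/* Simplified stand-in for the pci_scan_bus member of x86_platform_ops. */
struct platform_ops {
    pci_scan_bus_fn pci_scan_bus;
};

/* Hypothetical summary of what the guest learned from the hypervisor. */
struct guest_info {
    bool para_available;  /* models the kvm_para_available() result */
    bool no_hidden_buses; /* models the proposed (not yet existing) flag */
};

/* Placeholder scan routines; only their addresses matter in this model. */
static void default_scan(int busn) { (void)busn; }
static void kvm_scan(int busn) { (void)busn; }

static void guest_init(struct platform_ops *ops, const struct guest_info *gi)
{
    assert(ops != NULL && gi != NULL);

    /* Not a KVM guest: leave the default scanning hook untouched. */
    if (!gi->para_available)
        return;

    /* Only take the shortcut when the hypervisor says it is safe. */
    if (gi->no_hidden_buses)
        ops->pci_scan_bus = kvm_scan;
}
```

The point of the ordering is that a kernel booted on bare metal, or under
a different hypervisor, never loses the default by-device scan.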

> diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
> index 50a2b492fdd6..19e1cc2cb6e0 100644
> --- a/arch/x86/kernel/x86_init.c
> +++ b/arch/x86/kernel/x86_init.c
> @@ -118,6 +118,7 @@ struct x86_platform_ops x86_platform __ro_after_init = {
>  	.get_nmi_reason			= default_get_nmi_reason,
>  	.save_sched_clock_state 	= tsc_save_sched_clock_state,
>  	.restore_sched_clock_state 	= tsc_restore_sched_clock_state,
> +	.pci_scan_bus			= x86_default_pci_scan_bus,
>  	.hyper.pin_vcpu			= x86_op_int_noop,
>  };
>  
> diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
> index 467311b1eeea..6214dbce26d3 100644
> --- a/arch/x86/pci/legacy.c
> +++ b/arch/x86/pci/legacy.c
> @@ -36,14 +36,19 @@ int __init pci_legacy_init(void)
>  
>  void pcibios_scan_specific_bus(int busn)
>  {
> -	int stride = jailhouse_paravirt() ? 1 : 8;
> -	int devfn;
> -	u32 l;
> -
>  	if (pci_find_bus(0, busn))
>  		return;
>  
> -	for (devfn = 0; devfn < 256; devfn += stride) {
> +	x86_platform.pci_scan_bus(busn);
> +}
> +EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
> +
> +void pcibios_scan_bus_by_device(int busn)
> +{
> +	int devfn;
> +	u32 l;
> +
> +	for (devfn = 0; devfn < 256; devfn += 8) {
>  		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
>  		    l != 0x0000 && l != 0xffff) {
>  			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
> @@ -53,7 +58,6 @@ void pcibios_scan_specific_bus(int busn)
>  		}
>  	}
>  }
> -EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
>  
>  static int __init pci_subsys_init(void)
>  {
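
To make the cost difference between the three scanning strategies in the
patch concrete, here is a small self-contained model. It is a sketch
only: the bus is simulated with an array of vendor IDs indexed by devfn,
and struct fake_bus, vendor_present() and the scan helpers are
hypothetical names. The stride values (1 for by-function, 8 for
by-device) and the devfn 0 shortcut mirror the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/* devfn packs a 5-bit device number and a 3-bit function number. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Simulated config space of one bus: one vendor ID per devfn. */
struct fake_bus {
    uint16_t vendor[256];
    unsigned int reads; /* number of config reads issued so far */
};

/* Mark every devfn as absent (0xffff) and reset the read counter. */
static void fake_bus_clear(struct fake_bus *bus)
{
    for (int devfn = 0; devfn < 256; devfn++)
        bus->vendor[devfn] = 0xffff;
    bus->reads = 0;
}

/* Same presence test as the patch: neither 0x0000 nor 0xffff. */
static int vendor_present(struct fake_bus *bus, int devfn)
{
    uint16_t v = bus->vendor[devfn];

    bus->reads++;
    return v != 0x0000 && v != 0xffff;
}

/* stride 8 probes function 0 of each device; stride 1 probes everything. */
static int scan_with_stride(struct fake_bus *bus, int stride)
{
    assert(stride > 0);
    for (int devfn = 0; devfn < 256; devfn += stride) {
        if (vendor_present(bus, devfn))
            return 1;
    }
    return 0;
}

/* The KVM variant: look only at device 0, function 0. */
static int scan_devfn0_only(struct fake_bus *bus)
{
    return vendor_present(bus, 0);
}
```

On an empty bus this model issues 256 reads by function, 32 by device,
and a single read for the devfn 0 variant. A device that only exists at a
non-zero function, which appears to be the case the by-function variant
is kept for, is found only with stride 1.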


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/4] hw/virtio: Factorize virtio-mmio headers
  2019-07-25  9:58     ` Michael S. Tsirkin
@ 2019-07-25 10:03       ` Peter Maydell
  2019-07-25 10:36       ` Paolo Bonzini
  1 sibling, 0 replies; 68+ messages in thread
From: Peter Maydell @ 2019-07-25 10:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eduardo Habkost, Sergio Lopez, maran.wilson, QEMU Developers,
	Gerd Hoffmann, Paolo Bonzini, Richard Henderson,
	Stefano Garzarella

On Thu, 25 Jul 2019 at 10:58, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jul 25, 2019 at 10:46:00AM +0100, Liam Merwick wrote:
> > On 02/07/2019 13:11, Sergio Lopez wrote:
> > > Put QOM and main struct definition in a separate header file, so it
> > > can be accesed from other components.
> >
> > typo: accesed -> accessed
> >
> > >
> > > This is needed for the microvm machine type implementation.
> > >
> > > Signed-off-by: Sergio Lopez <slp@redhat.com>
> >
> > One nit below, either way
> >
> > Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
> >
> > > ---
> > >   hw/virtio/virtio-mmio.c | 35 +-----------------------
> > >   hw/virtio/virtio-mmio.h | 60 +++++++++++++++++++++++++++++++++++++++++
> > >   2 files changed, 61 insertions(+), 34 deletions(-)
> > >   create mode 100644 hw/virtio/virtio-mmio.h
> > >
> > > diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
> > > index 97b7f35496..87c7fe4d8d 100644
> > > --- a/hw/virtio/virtio-mmio.c
> > > +++ b/hw/virtio/virtio-mmio.c
> > > @@ -26,44 +26,11 @@
> > >   #include "qemu/host-utils.h"
> > >   #include "qemu/module.h"
> > >   #include "sysemu/kvm.h"
> > > -#include "hw/virtio/virtio-bus.h"
> > > +#include "virtio-mmio.h"
> >
> >
> > Virtually all the other includes of virtio-xxx.h files in hw/virtio use the
> > full path - e.g. "hw/virtio/virtio-mmio.h" - maybe do the same to be
> > consistent.
>
> That's for headers under include/.
> Local ones are ok with a short name.

Yes, but we should put this one into include/ as that fits with
our usual arrangement of where we put the headers for devices.

> I'm repeating myself, but still: if you insist on virtio mmio, please
> implement virtio 1 and use that with microvm. We can't keep carrying
> legacy interface into every new machine type.

Agreed (but we've had this discussion on another thread, as you say).

thanks
-- PMM


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25  9:59         ` Michael S. Tsirkin
@ 2019-07-25 10:05           ` Peter Maydell
  2019-07-25 10:10             ` Michael S. Tsirkin
  2019-07-25 10:42             ` Sergio Lopez
  0 siblings, 2 replies; 68+ messages in thread
From: Peter Maydell @ 2019-07-25 10:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eduardo Habkost, Sergio Lopez, maran.wilson, QEMU Developers,
	Gerd Hoffmann, Paolo Bonzini, Stefano Garzarella,
	Richard Henderson

On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
> OK so please start with adding virtio 1 support. Guest bits
> have been ready for years now.

I'd still rather we just used pci virtio. If pci isn't
fast enough at startup, do something to make it faster...

thanks
-- PMM


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 10:05           ` Peter Maydell
@ 2019-07-25 10:10             ` Michael S. Tsirkin
  2019-07-25 14:52               ` Sergio Lopez
  2019-07-25 10:42             ` Sergio Lopez
  1 sibling, 1 reply; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25 10:10 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Eduardo Habkost, Sergio Lopez, maran.wilson, QEMU Developers,
	Gerd Hoffmann, Paolo Bonzini, Stefano Garzarella,
	Richard Henderson

On Thu, Jul 25, 2019 at 11:05:05AM +0100, Peter Maydell wrote:
> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
> > OK so please start with adding virtio 1 support. Guest bits
> > have been ready for years now.
> 
> I'd still rather we just used pci virtio. If pci isn't
> fast enough at startup, do something to make it faster...
> 
> thanks
> -- PMM

Oh that's putting microvm aside - if we have a maintainer for
virtio mmio that's great because it does need a maintainer,
and virtio 1 would be the thing to fix before adding features ;)

-- 
MST


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/4] hw/virtio: Factorize virtio-mmio headers
  2019-07-25  9:58     ` Michael S. Tsirkin
  2019-07-25 10:03       ` Peter Maydell
@ 2019-07-25 10:36       ` Paolo Bonzini
  1 sibling, 0 replies; 68+ messages in thread
From: Paolo Bonzini @ 2019-07-25 10:36 UTC (permalink / raw)
  To: Michael S. Tsirkin, Liam Merwick
  Cc: ehabkost, Sergio Lopez, maran.wilson, qemu-devel, kraxel, rth, sgarzare

On 25/07/19 11:58, Michael S. Tsirkin wrote:
> I'm repeating myself, but still: if you insist on virtio mmio, please
> implement virtio 1 and use that with microvm. We can't keep carrying
> legacy interface into every new machine type.

I'd give Sergio the benefit of the doubt, since so far he's addressed many
other review comments---just, one at a time. :)

Paolo


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 10:05           ` Peter Maydell
  2019-07-25 10:10             ` Michael S. Tsirkin
@ 2019-07-25 10:42             ` Sergio Lopez
  2019-07-25 11:23               ` Paolo Bonzini
  1 sibling, 1 reply; 68+ messages in thread
From: Sergio Lopez @ 2019-07-25 10:42 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Eduardo Habkost, maran.wilson, Michael S. Tsirkin,
	QEMU Developers, Gerd Hoffmann, Paolo Bonzini,
	Stefano Garzarella, Richard Henderson



Peter Maydell <peter.maydell@linaro.org> writes:

> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
>> OK so please start with adding virtio 1 support. Guest bits
>> have been ready for years now.
>
> I'd still rather we just used pci virtio. If pci isn't
> fast enough at startup, do something to make it faster...

Actually, removing PCI (and ACPI) is one of the main ways microvm
reduces not only boot time, but also the exposed attack surface and the
general footprint.

I think we need to discuss and settle whether using virtio-mmio (even if
maintained and upgraded to virtio 1) for a new machine type is
acceptable or not. Because if it isn't, we should probably just ditch
the whole microvm idea and move to something else.

Sergio.




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/4] hw/i386: Introduce the microvm machine type
  2019-07-02 12:11 ` [Qemu-devel] [PATCH v3 4/4] hw/i386: Introduce the microvm machine type Sergio Lopez
  2019-07-02 13:58   ` Gerd Hoffmann
@ 2019-07-25 10:47   ` Paolo Bonzini
  1 sibling, 0 replies; 68+ messages in thread
From: Paolo Bonzini @ 2019-07-25 10:47 UTC (permalink / raw)
  To: Sergio Lopez, mst, marcel.apfelbaum, rth, ehabkost, maran.wilson,
	sgarzare, kraxel
  Cc: qemu-devel

On 02/07/19 14:11, Sergio Lopez wrote:
> +static void microvm_ioapic_init(MicrovmMachineState *mms)
> +{
> +    qemu_irq *ioapic_irq;
> +    DeviceState *ioapic_dev;
> +    SysBusDevice *d;
> +    int i;
> +
> +    assert(kvm_irqchip_in_kernel());
> +    ioapic_irq = g_new0(qemu_irq, IOAPIC_NUM_PINS);
> +    kvm_pc_setup_irq_routing(true);
> +
> +    assert(kvm_ioapic_in_kernel());
> +    ioapic_dev = qdev_create(NULL, "kvm-ioapic");
> +
> +    object_property_add_child(qdev_get_machine(),
> +                              "ioapic", OBJECT(ioapic_dev), NULL);

Please use the userspace IOAPIC instead, using the kernel one is just
sweeping the attack surface under the rug.

You are also missing the LAPIC device; things are working only because
KVM is helpfully creating one for you but it's better to be precise in
the description of the hardware.  I'd like to have support for non-KVM
accelerators (TCG, HAX, HVF) and they should all come for free if you
support "-machine kernel_irqchip=off".

Finally, I think we can agree that legacy mode can go away.  At the same
time I think we should always have:

1) an ISA bus, even if it's mostly empty, since we have fw_cfg on it now

2) an optional RTC accessible via "-machine rtc=on|off", so that the
guest can know the current time even if it is running under an
accelerator other than KVM (or doesn't have access to kvmclock).

3) possibly, a fake "keyboard controller" device to support reset via
port 64h

Thanks!

Paolo

> +    qdev_init_nofail(ioapic_dev);
> +    d = SYS_BUS_DEVICE(ioapic_dev);
> +    sysbus_mmio_map(d, 0, IO_APIC_DEFAULT_ADDRESS);
> +
> +    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
> +        ioapic_irq[i] = qdev_get_gpio_in(ioapic_dev, i);
> +    }
> +
> +    mms->gsi = qemu_allocate_irqs(microvm_gsi_handler,
> +                                  ioapic_irq, IOAPIC_NUM_PINS);
> +
> +    for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
> +        sysbus_create_simple("virtio-mmio",
> +                             VIRTIO_MMIO_BASE + i * 512,
> +                             mms->gsi[VIRTIO_IRQ_BASE + i]);
> +    }
> +}
> +
> +static void microvm_memory_init(MicrovmMachineState *mms)
> +{
> +    MachineState *machine = MACHINE(mms);
> +    MemoryRegion *ram, *ram_below_4g, *ram_above_4g;
> +    MemoryRegion *system_memory = get_system_memory();
> +
> +    if (machine->ram_size > MICROVM_MAX_BELOW_4G) {
> +        mms->above_4g_mem_size = machine->ram_size - MICROVM_MAX_BELOW_4G;
> +        mms->below_4g_mem_size = MICROVM_MAX_BELOW_4G;
> +    } else {
> +        mms->above_4g_mem_size = 0;
> +        mms->below_4g_mem_size = machine->ram_size;
> +    }
> +
> +    ram = g_malloc(sizeof(*ram));
> +    memory_region_allocate_system_memory(ram, NULL, "microvm.ram",
> +                                         machine->ram_size);
> +
> +    ram_below_4g = g_malloc(sizeof(*ram_below_4g));
> +    memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
> +                             0, mms->below_4g_mem_size);
> +    memory_region_add_subregion(system_memory, 0, ram_below_4g);
> +
> +    e820_add_entry(0, mms->below_4g_mem_size, E820_RAM);
> +
> +    if (mms->above_4g_mem_size > 0) {
> +        ram_above_4g = g_malloc(sizeof(*ram_above_4g));
> +        memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
> +                                 mms->below_4g_mem_size,
> +                                 mms->above_4g_mem_size);
> +        memory_region_add_subregion(system_memory, 0x100000000ULL,
> +                                    ram_above_4g);
> +        e820_add_entry(0x100000000ULL, mms->above_4g_mem_size, E820_RAM);
> +    }
> +}
> +
> +static void microvm_cpus_init(const char *typename, Error **errp)
> +{
> +    int i;
> +
> +    for (i = 0; i < smp_cpus; i++) {
> +        Object *cpu = NULL;
> +        Error *local_err = NULL;
> +
> +        cpu = object_new(typename);
> +
> +        object_property_set_uint(cpu, i, "apic-id", &local_err);
> +        object_property_set_bool(cpu, true, "realized", &local_err);
> +
> +        object_unref(cpu);
> +        error_propagate(errp, local_err);
> +    }
> +}
> +
> +static void microvm_machine_state_init(MachineState *machine)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
> +    Error *local_err = NULL;
> +
> +    if (machine->kernel_filename == NULL) {
> +        error_report("missing kernel image file name, required by microvm");
> +        exit(1);
> +    }
> +
> +    microvm_memory_init(mms);
> +
> +    microvm_cpus_init(machine->cpu_type, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        exit(1);
> +    }
> +
> +    if (mms->legacy) {
> +        microvm_legacy_init(mms);
> +    } else {
> +        microvm_ioapic_init(mms);
> +    }
> +
> +    kvmclock_create();
> +
> +    if (!pvh_load_elfboot(machine->kernel_filename, NULL, NULL)) {
> +        error_report("Error while loading elf kernel");
> +        exit(1);
> +    }
> +
> +    if (machine->initrd_filename) {
> +        uint32_t initrd_max;
> +        gsize initrd_size;
> +        gchar *initrd_data;
> +        GError *gerr = NULL;
> +
> +        if (!g_file_get_contents(machine->initrd_filename, &initrd_data,
> +                                 &initrd_size, &gerr)) {
> +            error_report("qemu: error reading initrd %s: %s\n",
> +                         machine->initrd_filename, gerr->message);
> +            exit(1);
> +        }
> +
> +        initrd_max = mms->below_4g_mem_size - HIMEM_START;
> +        if (initrd_size >= initrd_max) {
> +            error_report("qemu: initrd is too large, cannot support."
> +                         "(max: %"PRIu32", need %"PRId64")\n",
> +                         initrd_max, (uint64_t)initrd_size);
> +            exit(1);
> +        }
> +
> +        address_space_write(&address_space_memory,
> +                            HIMEM_START, MEMTXATTRS_UNSPECIFIED,
> +                            (uint8_t *) initrd_data, initrd_size);
> +
> +        g_free(initrd_data);
> +
> +        mms->initrd_addr = HIMEM_START;
> +        mms->initrd_size = initrd_size;
> +    }
> +
> +    mms->elf_entry = pvh_get_start_addr();
> +}
> +
> +static gchar *microvm_get_mmio_cmdline(gchar *name)
> +{
> +    gchar *cmdline;
> +    gchar *separator;
> +    long int index;
> +    int ret;
> +
> +    separator = g_strrstr(name, ".");
> +    if (!separator) {
> +        return NULL;
> +    }
> +
> +    if (qemu_strtol(separator + 1, NULL, 10, &index) != 0) {
> +        return NULL;
> +    }
> +
> +    cmdline = g_malloc0(VIRTIO_CMDLINE_MAXLEN);
> +    ret = g_snprintf(cmdline, VIRTIO_CMDLINE_MAXLEN,
> +                     " virtio_mmio.device=512@0x%lx:%ld",
> +                     VIRTIO_MMIO_BASE + index * 512,
> +                     VIRTIO_IRQ_BASE + index);
> +    if (ret < 0 || ret >= VIRTIO_CMDLINE_MAXLEN) {
> +        g_free(cmdline);
> +        return NULL;
> +    }
> +
> +    return cmdline;
> +}
> +
> +static void microvm_setup_pvh(MicrovmMachineState *mms,
> +                              const gchar *kernel_cmdline)
> +{
> +    struct hvm_memmap_table_entry *memmap_table;
> +    struct hvm_start_info *start_info;
> +    BusState *bus;
> +    BusChild *kid;
> +    gchar *cmdline;
> +    int cmdline_len;
> +    int memmap_entries;
> +    int i;
> +
> +    cmdline = g_strdup(kernel_cmdline);
> +
> +    /*
> +     * Find MMIO transports with attached devices, and add them to the kernel
> +     * command line.
> +     */
> +    bus = sysbus_get_default();
> +    QTAILQ_FOREACH(kid, &bus->children, sibling) {
> +        DeviceState *dev = kid->child;
> +        ObjectClass *class = object_get_class(OBJECT(dev));
> +
> +        if (class == object_class_by_name(TYPE_VIRTIO_MMIO)) {
> +            VirtIOMMIOProxy *mmio = VIRTIO_MMIO(OBJECT(dev));
> +            VirtioBusState *mmio_virtio_bus = &mmio->bus;
> +            BusState *mmio_bus = &mmio_virtio_bus->parent_obj;
> +
> +            if (!QTAILQ_EMPTY(&mmio_bus->children)) {
> +                gchar *mmio_cmdline = microvm_get_mmio_cmdline(mmio_bus->name);
> +                if (mmio_cmdline) {
> +                    char *newcmd = g_strjoin(NULL, cmdline, mmio_cmdline, NULL);
> +                    g_free(mmio_cmdline);
> +                    g_free(cmdline);
> +                    cmdline = newcmd;
> +                }
> +            }
> +        }
> +    }
> +
> +    cmdline_len = strlen(cmdline);
> +
> +    address_space_write(&address_space_memory,
> +                        KERNEL_CMDLINE_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) cmdline, cmdline_len);
> +
> +    g_free(cmdline);
> +
> +    memmap_entries = e820_get_num_entries();
> +    memmap_table = g_new0(struct hvm_memmap_table_entry, memmap_entries);
> +    for (i = 0; i < memmap_entries; i++) {
> +        uint64_t address, length;
> +        struct hvm_memmap_table_entry *entry = &memmap_table[i];
> +
> +        if (e820_get_entry(i, E820_RAM, &address, &length)) {
> +            entry->addr = address;
> +            entry->size = length;
> +            entry->type = E820_RAM;
> +            entry->reserved = 0;
> +        }
> +    }
> +
> +    address_space_write(&address_space_memory,
> +                        MEMMAP_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) memmap_table,
> +                        memmap_entries * sizeof(struct hvm_memmap_table_entry));
> +
> +    g_free(memmap_table);
> +
> +    start_info = g_malloc0(sizeof(struct hvm_start_info));
> +
> +    start_info->magic = XEN_HVM_START_MAGIC_VALUE;
> +    start_info->version = 1;
> +
> +    start_info->nr_modules = 0;
> +    start_info->cmdline_paddr = KERNEL_CMDLINE_START;
> +    start_info->memmap_entries = memmap_entries;
> +    start_info->memmap_paddr = MEMMAP_START;
> +
> +    if (mms->initrd_addr) {
> +        struct hvm_modlist_entry *entry = g_new0(struct hvm_modlist_entry, 1);
> +
> +        entry->paddr = mms->initrd_addr;
> +        entry->size = mms->initrd_size;
> +
> +        address_space_write(&address_space_memory,
> +                            MODLIST_START, MEMTXATTRS_UNSPECIFIED,
> +                            (uint8_t *) entry,
> +                            sizeof(struct hvm_modlist_entry));
> +        g_free(entry);
> +
> +        start_info->nr_modules = 1;
> +        start_info->modlist_paddr = MODLIST_START;
> +    } else {
> +        start_info->nr_modules = 0;
> +    }
> +
> +    address_space_write(&address_space_memory,
> +                        PVH_START_INFO, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) start_info,
> +                        sizeof(struct hvm_start_info));
> +
> +    g_free(start_info);
> +}
> +
> +static void microvm_init_page_tables(void)
> +{
> +    uint64_t val = 0;
> +    int i;
> +
> +    val = PDPTE_START | 0x03;
> +    address_space_write(&address_space_memory,
> +                        PML4_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) &val, 8);
> +    val = PDE_START | 0x03;
> +    address_space_write(&address_space_memory,
> +                        PDPTE_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) &val, 8);
> +
> +    for (i = 0; i < 512; i++) {
> +        val = (i << 21) + 0x83;
> +        address_space_write(&address_space_memory,
> +                            PDE_START + (i * 8), MEMTXATTRS_UNSPECIFIED,
> +                            (uint8_t *) &val, 8);
> +    }
> +}
> +
> +static void microvm_cpu_reset(CPUState *cs, uint64_t elf_entry)
> +{
> +    X86CPU *cpu = X86_CPU(cs);
> +    CPUX86State *env = &cpu->env;
> +    struct SegmentCache seg_code = { .selector = 0x8,
> +                                     .base = 0x0,
> +                                     .limit = 0xffffffff,
> +                                     .flags = 0xc09b00 };
> +    struct SegmentCache seg_data = { .selector = 0x10,
> +                                     .base = 0x0,
> +                                     .limit = 0xffffffff,
> +                                     .flags = 0xc09300 };
> +    struct SegmentCache seg_tr = { .selector = 0x18,
> +                                   .base = 0x0,
> +                                   .limit = 0xffff,
> +                                   .flags = 0x8b00 };
> +
> +    memcpy(&env->segs[R_CS], &seg_code, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_DS], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_ES], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_FS], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_GS], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->segs[R_SS], &seg_data, sizeof(struct SegmentCache));
> +    memcpy(&env->tr, &seg_tr, sizeof(struct SegmentCache));
> +
> +    env->regs[R_EBX] = PVH_START_INFO;
> +
> +    cpu_set_pc(cs, elf_entry);
> +    cpu_x86_update_cr3(env, 0);
> +    cpu_x86_update_cr4(env, 0);
> +    cpu_x86_update_cr0(env, CR0_PE_MASK);
> +
> +    x86_update_hflags(env);
> +}
> +
> +static void microvm_mptable_setup(MicrovmMachineState *mms)
> +{
> +    char *mptable;
> +    int size;
> +
> +    mptable = mptable_generate(smp_cpus, EBDA_START, &size);
> +    address_space_write(&address_space_memory,
> +                        EBDA_START, MEMTXATTRS_UNSPECIFIED,
> +                        (uint8_t *) mptable, size);
> +    g_free(mptable);
> +}
> +
> +static bool microvm_machine_get_legacy(Object *obj, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    return mms->legacy;
> +}
> +
> +static void microvm_machine_set_legacy(Object *obj, bool value, Error **errp)
> +{
> +    MicrovmMachineState *mms = MICROVM_MACHINE(obj);
> +
> +    mms->legacy = value;
> +}
> +
> +static void microvm_machine_reset(void)
> +{
> +    MachineState *machine = MACHINE(qdev_get_machine());
> +    MicrovmMachineState *mms = MICROVM_MACHINE(machine);
> +    CPUState *cs;
> +    X86CPU *cpu;
> +
> +    qemu_devices_reset();
> +
> +    microvm_mptable_setup(mms);
> +    microvm_setup_pvh(mms, machine->kernel_cmdline);
> +    microvm_init_page_tables();
> +
> +    CPU_FOREACH(cs) {
> +        cpu = X86_CPU(cs);
> +
> +        if (cpu->apic_state) {
> +            device_reset(cpu->apic_state);
> +        }
> +
> +        microvm_cpu_reset(cs, mms->elf_entry);
> +    }
> +}
> +
> +static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
> +{
> +    CPUState *cs;
> +
> +    CPU_FOREACH(cs) {
> +        X86CPU *cpu = X86_CPU(cs);
> +
> +        if (!cpu->apic_state) {
> +            cpu_interrupt(cs, CPU_INTERRUPT_NMI);
> +        } else {
> +            apic_deliver_nmi(cpu->apic_state);
> +        }
> +    }
> +}
> +
> +static void microvm_class_init(ObjectClass *oc, void *data)
> +{
> +    MachineClass *mc = MACHINE_CLASS(oc);
> +    NMIClass *nc = NMI_CLASS(oc);
> +
> +    mc->init = microvm_machine_state_init;
> +
> +    mc->family = "microvm_i386";
> +    mc->desc = "Microvm (i386)";
> +    mc->units_per_default_bus = 1;
> +    mc->no_floppy = 1;
> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugcon");
> +    machine_class_allow_dynamic_sysbus_dev(mc, "sysbus-debugexit");
> +    mc->max_cpus = 288;
> +    mc->has_hotpluggable_cpus = false;
> +    mc->auto_enable_numa_with_memhp = false;
> +    mc->default_cpu_type = X86_CPU_TYPE_NAME("host");
> +    mc->nvdimm_supported = false;
> +    mc->default_machine_opts = "accel=kvm";
> +
> +    /* Machine class handlers */
> +    mc->reset = microvm_machine_reset;
> +
> +    /* NMI handler */
> +    nc->nmi_monitor_handler = x86_nmi;
> +
> +    object_class_property_add_bool(oc, MICROVM_MACHINE_LEGACY,
> +                                   microvm_machine_get_legacy,
> +                                   microvm_machine_set_legacy,
> +                                   &error_abort);
> +}
> +
> +static const TypeInfo microvm_machine_info = {
> +    .name          = TYPE_MICROVM_MACHINE,
> +    .parent        = TYPE_MACHINE,
> +    .instance_size = sizeof(MicrovmMachineState),
> +    .class_size    = sizeof(MicrovmMachineClass),
> +    .class_init    = microvm_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +         { TYPE_NMI },
> +         { }
> +    },
> +};
> +
> +static void microvm_machine_init(void)
> +{
> +    type_register_static(&microvm_machine_info);
> +}
> +type_init(microvm_machine_init);
> diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
> new file mode 100644
> index 0000000000..fd6f370997
> --- /dev/null
> +++ b/include/hw/i386/microvm.h
> @@ -0,0 +1,82 @@
> +/*
> + * Copyright (c) 2018 Intel Corporation
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef HW_I386_MICROVM_H
> +#define HW_I386_MICROVM_H
> +
> +#include "qemu-common.h"
> +#include "exec/hwaddr.h"
> +#include "qemu/notify.h"
> +
> +#include "hw/boards.h"
> +
> +/* Microvm memory layout */
> +#define PVH_START_INFO        0x6000
> +#define MEMMAP_START          0x7000
> +#define MODLIST_START         0x7800
> +#define BOOT_STACK_POINTER    0x8ff0
> +#define PML4_START            0x9000
> +#define PDPTE_START           0xa000
> +#define PDE_START             0xb000
> +#define KERNEL_CMDLINE_START  0x20000
> +#define EBDA_START            0x9fc00
> +#define HIMEM_START           0x100000
> +#define MICROVM_MAX_BELOW_4G  0xe0000000
> +
> +/* Platform virtio definitions */
> +#define VIRTIO_MMIO_BASE      0xd0000000
> +#define VIRTIO_IRQ_BASE       5
> +#define VIRTIO_NUM_TRANSPORTS 8
> +#define VIRTIO_CMDLINE_MAXLEN 64
> +
> +/* Machine type options */
> +#define MICROVM_MACHINE_LEGACY "legacy"
> +
> +typedef struct {
> +    MachineClass parent;
> +    HotplugHandler *(*orig_hotplug_handler)(MachineState *machine,
> +                                           DeviceState *dev);
> +} MicrovmMachineClass;
> +
> +typedef struct {
> +    MachineState parent;
> +    qemu_irq *gsi;
> +
> +    /* RAM size */
> +    ram_addr_t below_4g_mem_size;
> +    ram_addr_t above_4g_mem_size;
> +
> +    /* Kernel ELF entry. On reset, vCPUs RIP will be set to this */
> +    uint64_t elf_entry;
> +
> +    /* Optional initrd start address and size */
> +    uint64_t initrd_addr;
> +    uint32_t initrd_size;
> +
> +    /* Legacy mode based on an ISA bus. Useful for debugging */
> +    bool legacy;
> +} MicrovmMachineState;
> +
> +#define TYPE_MICROVM_MACHINE   MACHINE_TYPE_NAME("microvm")
> +#define MICROVM_MACHINE(obj) \
> +    OBJECT_CHECK(MicrovmMachineState, (obj), TYPE_MICROVM_MACHINE)
> +#define MICROVM_MACHINE_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(MicrovmMachineClass, obj, TYPE_MICROVM_MACHINE)
> +#define MICROVM_MACHINE_CLASS(class) \
> +    OBJECT_CLASS_CHECK(MicrovmMachineClass, class, TYPE_MICROVM_MACHINE)
> +
> +#endif
> 
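For anyone following along, the transport layout that the quoted
microvm_get_mmio_cmdline() hands to the guest can be sketched standalone.
This is only an illustration, not the patch's code: the base address, IRQ
base and transport count are copied from the quoted microvm.h, while
VIRTIO_MMIO_SIZE, mmio_cmdline() and the range check on the index are
additions of mine (the quoted code does not bound the index it parses).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values copied from the quoted microvm.h */
#define VIRTIO_MMIO_BASE      0xd0000000ULL
#define VIRTIO_IRQ_BASE       5
#define VIRTIO_NUM_TRANSPORTS 8
/* Hypothetical name: the patch hard-codes 512 bytes per transport */
#define VIRTIO_MMIO_SIZE      512

/*
 * Format the kernel parameter describing transport `index` into buf.
 * Returns 0 on success, -1 if index is out of range or buf is too small.
 */
static int mmio_cmdline(unsigned index, char *buf, size_t len)
{
    uint64_t base;
    int ret;

    assert(buf != NULL);
    if (index >= VIRTIO_NUM_TRANSPORTS) {
        return -1;
    }

    base = VIRTIO_MMIO_BASE + (uint64_t)index * VIRTIO_MMIO_SIZE;
    ret = snprintf(buf, len, " virtio_mmio.device=%d@0x%" PRIx64 ":%u",
                   VIRTIO_MMIO_SIZE, base, VIRTIO_IRQ_BASE + index);

    return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```

Each populated transport contributes one such parameter, appended to the
user-supplied kernel command line.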


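Similarly, the page directory that the quoted microvm_init_page_tables()
writes is an identity map of the first 1 GiB using 512 entries of 2 MiB
each.  A minimal sketch of just that loop follows; the PTE_* names and
fill_identity_pde() are made up for illustration, and the flag values are
my reading of the 0x83 constant in the patch (present, writable, page
size).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the bits that make up 0x83 in the patch */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_PAGESIZE (1ULL << 7)   /* entry maps a 2 MiB page directly */
#define PDE_ENTRIES  512

/* Fill one page directory so entry i identity-maps the 2 MiB at i << 21. */
static void fill_identity_pde(uint64_t *pde)
{
    uint64_t i;

    assert(pde != NULL);
    for (i = 0; i < PDE_ENTRIES; i++) {
        pde[i] = (i << 21) | PTE_PRESENT | PTE_WRITABLE | PTE_PAGESIZE;
    }
}
```

The PML4 and PDPT entries in the patch use 0x03 instead, i.e. present and
writable without the page-size bit, since they point at the next table
level rather than at a page.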


* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 10:03                     ` Michael S. Tsirkin
@ 2019-07-25 10:55                       ` Paolo Bonzini
  0 siblings, 0 replies; 68+ messages in thread
From: Paolo Bonzini @ 2019-07-25 10:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: ehabkost, Sergio Lopez, maran.wilson, Montes, Julio,
	Stefan Hajnoczi, qemu-devel, kraxel, sgarzare, rth

On 25/07/19 12:03, Michael S. Tsirkin wrote:
>> +#ifdef CONFIG_PCI
>> +	x86_platform.pci_scan_bus = kvm_pci_scan_bus;
>> +#endif
>> +
>>  	if (!kvm_para_available())
>>  		return;
>>  
> Shouldn't this happen after kvm_para_available?

Actually kvm_para_available() is not needed anymore, since this only
runs after kvm_detect() has returned true.

> In fact, let's add a CPU ID flag for this, so it's
> easy to tell guest whether to scan extra buses.
> What do you say?

I think it would make it much harder to deploy this, since it relies on
having new userspace and new machine types.  This patch is basically a
reflection of the status quo, which is that there are generally no
"hidden" buses on commonly-used KVM userspaces, and even in the weird
configurations that have them there is always something at devfn=0.

(On real hardware, the only such hidden buses are e.g. 0x7f/0xff, which
have a bunch of QPI and MCH-related devices.  This is not something
you'd have in a virtual machine).

Paolo



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 10:42             ` Sergio Lopez
@ 2019-07-25 11:23               ` Paolo Bonzini
  2019-07-25 12:01                 ` Stefan Hajnoczi
  0 siblings, 1 reply; 68+ messages in thread
From: Paolo Bonzini @ 2019-07-25 11:23 UTC (permalink / raw)
  To: Sergio Lopez, Peter Maydell
  Cc: Eduardo Habkost, maran.wilson, Michael S. Tsirkin,
	QEMU Developers, Gerd Hoffmann, Richard Henderson,
	Stefano Garzarella

On 25/07/19 12:42, Sergio Lopez wrote:
> 
> Peter Maydell <peter.maydell@linaro.org> writes:
> 
>> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> OK so please start with adding virtio 1 support. Guest bits
>>> have been ready for years now.
>>
>> I'd still rather we just used pci virtio. If pci isn't
>> fast enough at startup, do something to make it faster...
> 
> Actually, removing PCI (and ACPI), is one of the main ways microvm has
> to reduce not only boot time, but also the exposed surface and the
> general footprint.
> 
> I think we need to discuss and settle whether using virtio-mmio (even if
> maintained and upgraded to virtio 1) for a new machine type is
> acceptable or not. Because if it isn't, we should probably just ditch
> the whole microvm idea and move to something else.

I agree.  IMNSHO the reduced attack surface from removing PCI is
(mostly) security theater, however the boot time numbers that Sergio
showed for microvm are quite extreme and I don't think there is any hope
of getting even close with a PCI-based virtual machine.

So I'd even go a step further: if using virtio-mmio for a new machine
type is not acceptable, we should admit that boot time optimization in
QEMU is basically as good as it can get---low-hanging fruit has been
picked with PVH and mmap is the logical next step, but all that's left
is optimizing the guest or something else.

I must say that -M microvm took a while to grow on me, but I think it's
a great example of how the infrastructure provided by QEMU provides
useful features for free, even for the simplest emulated hardware.  For
example, in v3 microvm could only boot from PVH kernels, but the next
firmware-enabled version reuses more of the PC code and thus supports
all of vmlinuz, multiboot and PVH.

Again: Sergio has been very receptive to feedback and has provided
numbers to back the design choices, and we should reciprocate or at
least be very clear on the constraints.

Paolo



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 11:23               ` Paolo Bonzini
@ 2019-07-25 12:01                 ` Stefan Hajnoczi
  2019-07-25 12:10                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 68+ messages in thread
From: Stefan Hajnoczi @ 2019-07-25 12:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, Michael S. Tsirkin,
	Maran Wilson, QEMU Developers, Gerd Hoffmann, Stefano Garzarella,
	Richard Henderson

On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 25/07/19 12:42, Sergio Lopez wrote:
> > Peter Maydell <peter.maydell@linaro.org> writes:
> >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>> OK so please start with adding virtio 1 support. Guest bits
> >>> have been ready for years now.
> >>
> >> I'd still rather we just used pci virtio. If pci isn't
> >> fast enough at startup, do something to make it faster...
> >
> > Actually, removing PCI (and ACPI), is one of the main ways microvm has
> > to reduce not only boot time, but also the exposed surface and the
> > general footprint.
> >
> > I think we need to discuss and settle whether using virtio-mmio (even if
> > maintained and upgraded to virtio 1) for a new machine type is
> > acceptable or not. Because if it isn't, we should probably just ditch
> > the whole microvm idea and move to something else.
>
> I agree.  IMNSHO the reduced attack surface from removing PCI is
> (mostly) security theater, however the boot time numbers that Sergio
> showed for microvm are quite extreme and I don't think there is any hope
> of getting even close with a PCI-based virtual machine.
>
> So I'd even go a step further: if using virtio-mmio for a new machine
> type is not acceptable, we should admit that boot time optimization in
> QEMU is basically as good as it can get---low-hanging fruit has been
> picked with PVH and mmap is the logical next step, but all that's left
> is optimizing the guest or something else.

I haven't seen enough analysis to declare boot time optimization done.
QEMU startup can be profiled and improved.

The numbers show that removing PCI and ACPI makes things faster but
this doesn't justify removing them.  Understanding of why they are
slow is what justifies removing them.  Otherwise it could just be a
misconfiguration, inefficient implementation, etc and we've seen there
is low-hanging fruit.

How much time is spent doing PCI initialization?  Is the vmexit
pattern for PCI initialization as good as the hardware interface
allows?

Without an analysis of why things are slow it's not possible to come to
an informed decision.

Stefan



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 12:01                 ` Stefan Hajnoczi
@ 2019-07-25 12:10                   ` Michael S. Tsirkin
  2019-07-25 13:26                     ` Stefan Hajnoczi
  0 siblings, 1 reply; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25 12:10 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, Maran Wilson,
	QEMU Developers, Gerd Hoffmann, Paolo Bonzini,
	Stefano Garzarella, Richard Henderson

On Thu, Jul 25, 2019 at 01:01:29PM +0100, Stefan Hajnoczi wrote:
> On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > On 25/07/19 12:42, Sergio Lopez wrote:
> > > Peter Maydell <peter.maydell@linaro.org> writes:
> > >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >>> OK so please start with adding virtio 1 support. Guest bits
> > >>> have been ready for years now.
> > >>
> > >> I'd still rather we just used pci virtio. If pci isn't
> > >> fast enough at startup, do something to make it faster...
> > >
> > > Actually, removing PCI (and ACPI), is one of the main ways microvm has
> > > to reduce not only boot time, but also the exposed surface and the
> > > general footprint.
> > >
> > > I think we need to discuss and settle whether using virtio-mmio (even if
> > > maintained and upgraded to virtio 1) for a new machine type is
> > > acceptable or not. Because if it isn't, we should probably just ditch
> > > the whole microvm idea and move to something else.
> >
> > I agree.  IMNSHO the reduced attack surface from removing PCI is
> > (mostly) security theater, however the boot time numbers that Sergio
> > showed for microvm are quite extreme and I don't think there is any hope
> > of getting even close with a PCI-based virtual machine.
> >
> > So I'd even go a step further: if using virtio-mmio for a new machine
> > type is not acceptable, we should admit that boot time optimization in
> > QEMU is basically as good as it can get---low-hanging fruit has been
> > picked with PVH and mmap is the logical next step, but all that's left
> > is optimizing the guest or something else.
> 
> I haven't seen enough analysis to declare boot time optimization done.
> QEMU startup can be profiled and improved.

Right, and that will always stay the case. OTOH imho microvm is
non-intrusive enough, and small enough, that we'd just put it upstream
after addressing low-level comments.
This will allow more contributions from people interested in boot time.
With no cross-version migration support, or maybe migration
disabled completely, maintenance burden should not be too high.
Not everyone wants to hack on pci/acpi specifically.


> The numbers show that removing PCI and ACPI makes things faster but
> this doesn't justify removing them.  Understanding of why they are
> slow is what justifies removing them.  Otherwise it could just be a
> misconfiguration, inefficient implementation, etc and we've seen there
> is low-hanging fruit.
> 
> How much time is spent doing PCI initialization?  Is the vmexit
> pattern for PCI initialization as good as the hardware interface
> allows?

I know that in the BIOS we have wanted to use memory-mapped PCI config
accesses for a very long time now. This makes each vmexit slower but
cuts the number of exits by half. It only affects SeaBIOS though.
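
For reference, the "half" comes from how the two access methods are laid
out, as far as I understand them: the legacy port mechanism needs a write
of the target address to port 0xCF8 followed by a data access at 0xCFC
(two trapped port I/Os), whereas the memory-mapped (ECAM) layout encodes
bus/device/function/register in the address itself, so one access is
enough.  A rough sketch of the two encodings; the function names are made
up for illustration and are not taken from SeaBIOS or QEMU:

```c
#include <assert.h>
#include <stdint.h>

/* Legacy mechanism: value written to port 0xCF8 before touching 0xCFC. */
static uint32_t pci_cf8_address(uint8_t bus, uint8_t dev, uint8_t fn,
                                uint8_t reg)
{
    assert(dev < 32 && fn < 8);
    return 0x80000000u |               /* enable bit */
           ((uint32_t)bus << 16) |
           ((uint32_t)dev << 11) |
           ((uint32_t)fn << 8) |
           (reg & 0xfcu);              /* dword-aligned register */
}

/* ECAM: offset from the memory-mapped config base, 4 KiB per function. */
static uint64_t pci_ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn,
                                uint16_t reg)
{
    assert(dev < 32 && fn < 8 && reg < 4096);
    return ((uint64_t)bus << 20) |
           ((uint64_t)dev << 15) |
           ((uint64_t)fn << 12) |
           reg;
}
```

So reading one register of device 1 on bus 0 costs an out to 0xCF8 plus an
in from 0xCFC in the first scheme, and a single load in the second.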




> Without an analysis of why things are slow it's not possible come to
> an informed decision.
> 
> Stefan



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 12:10                   ` Michael S. Tsirkin
@ 2019-07-25 13:26                     ` Stefan Hajnoczi
  2019-07-25 13:43                       ` Paolo Bonzini
  2019-07-25 13:48                       ` Michael S. Tsirkin
  0 siblings, 2 replies; 68+ messages in thread
From: Stefan Hajnoczi @ 2019-07-25 13:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, Maran Wilson,
	QEMU Developers, Gerd Hoffmann, Paolo Bonzini,
	Stefano Garzarella, Richard Henderson

On Thu, Jul 25, 2019 at 1:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Jul 25, 2019 at 01:01:29PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > On 25/07/19 12:42, Sergio Lopez wrote:
> > > > Peter Maydell <peter.maydell@linaro.org> writes:
> > > >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >>> OK so please start with adding virtio 1 support. Guest bits
> > > >>> have been ready for years now.
> > > >>
> > > >> I'd still rather we just used pci virtio. If pci isn't
> > > >> fast enough at startup, do something to make it faster...
> > > >
> > > > Actually, removing PCI (and ACPI), is one of the main ways microvm has
> > > > to reduce not only boot time, but also the exposed surface and the
> > > > general footprint.
> > > >
> > > > I think we need to discuss and settle whether using virtio-mmio (even if
> > > > maintained and upgraded to virtio 1) for a new machine type is
> > > > acceptable or not. Because if it isn't, we should probably just ditch
> > > > the whole microvm idea and move to something else.
> > >
> > > I agree.  IMNSHO the reduced attack surface from removing PCI is
> > > (mostly) security theater, however the boot time numbers that Sergio
> > > showed for microvm are quite extreme and I don't think there is any hope
> > > of getting even close with a PCI-based virtual machine.
> > >
> > > So I'd even go a step further: if using virtio-mmio for a new machine
> > > type is not acceptable, we should admit that boot time optimization in
> > > QEMU is basically as good as it can get---low-hanging fruit has been
> > > picked with PVH and mmap is the logical next step, but all that's left
> > > is optimizing the guest or something else.
> >
> > I haven't seen enough analysis to declare boot time optimization done.
> > QEMU startup can be profiled and improved.
>
> Right, and that will always stay the case.

The microvm design has a premise and it can be answered definitively
through performance analysis.

If I had to explain to someone why PCI or ACPI significantly slows
things down, I couldn't honestly do so.  I say significantly because
PCI init definitely requires more vmexits but can it be a small
number?  For ACPI I have no idea why it would consume significant
amounts of time.

Until we have this knowledge, the premise of microvm is unproven and
merging it would be premature because maybe we can get into the same
ballpark by optimizing existing code.

I'm sorry for being a pain.  I actually think the analysis will
support microvm, but it still needs to be done in order to justify it.

Stefan



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 13:26                     ` Stefan Hajnoczi
@ 2019-07-25 13:43                       ` Paolo Bonzini
  2019-07-25 13:54                         ` Michael S. Tsirkin
                                           ` (2 more replies)
  2019-07-25 13:48                       ` Michael S. Tsirkin
  1 sibling, 3 replies; 68+ messages in thread
From: Paolo Bonzini @ 2019-07-25 13:43 UTC (permalink / raw)
  To: Stefan Hajnoczi, Michael S. Tsirkin
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, Maran Wilson,
	QEMU Developers, Gerd Hoffmann, Stefano Garzarella,
	Richard Henderson

On 25/07/19 15:26, Stefan Hajnoczi wrote:
> The microvm design has a premise and it can be answered definitively
> through performance analysis.
> 
> If I had to explain to someone why PCI or ACPI significantly slows
> things down, I couldn't honestly do so.  I say significantly because
> PCI init definitely requires more vmexits but can it be a small
> number?  For ACPI I have no idea why it would consume significant
> amounts of time.

My guess is that it's just a lot of code that has to run. :(

> Until we have this knowledge, the premise of microvm is unproven and
> merging it would be premature because maybe we can get into the same
> ballpark by optimizing existing code.
> 
> I'm sorry for being a pain.  I actually think the analysis will
> support microvm, but it still needs to be done in order to justify it.

No, you're not a pain, you're explaining your reasoning and that helps.

To me *maintainability is the biggest consideration* when introducing a
new feature.  "We can do just as well with q35" is a good reason to
deprecate and delete microvm, but not a good reason to reject it now as
long as microvm is good enough in terms of maintainability.  Keeping it
out of tree only makes it harder to do this kind of experiment.  virtio
1 seems to be the biggest remaining blocker and I think it'd be a good
thing to have even for the ARM virt machine type.

FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
and ~25 ms in the kernel.  I must say that's pretty good, but it's still
30% of the whole boot time and reducing it is the hardest part.  If
having microvm in tree can help reduce it, good.  Yes, it will get
users, but most likely they will have to support pc or q35 as a fallback
so we could still delete microvm at any time with the due deprecation
period if it turns out to be a failed experiment.

Whether to use qboot or SeaBIOS for microvm is another story, but it's
an implementation detail as long as the ROM size doesn't change and/or
we don't do versioned machine types.  So we can switch from one to the
other at any time; we can also include qboot directly in QEMU's tree,
without going through a submodule, which also reduces the infrastructure
needed (mirrors, etc.) and makes it easier to delete it.

Paolo

(*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
ends up measured as PCI in SeaBIOS, due to different init order, so the
real firmware cost of PAM and PCI initialization should be 5ms for qboot
and 10ms for SeaBIOS.
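
For context on the 0xcf8 measurement above, here is a minimal sketch of how
the value written to that port is formed under the legacy x86 configuration
mechanism.  The function name is hypothetical and is not taken from SeaBIOS
or qboot.  Each config access is a write to 0xcf8 followed by a read or
write at 0xcfc, and under KVM each of those port accesses typically traps
to the hypervisor.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy x86 PCI configuration mechanism #1 uses two I/O ports. */
#define PCI_CONFIG_ADDRESS 0xcf8  /* guest writes the target address here */
#define PCI_CONFIG_DATA    0xcfc  /* guest then reads or writes data here */

/*
 * Build the 32-bit value written to 0xcf8 to select one config dword.
 * Layout: bit 31 = enable, bits 23-16 = bus, bits 15-11 = device,
 * bits 10-8 = function, bits 7-2 = dword-aligned register offset.
 * (Hypothetical helper, for illustration only.)
 */
static uint32_t pci_config_address(unsigned bus, unsigned dev,
                                   unsigned fn, unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) |
           (reg & 0xfcu);
}
```

For example, pci_config_address(0, 31, 7, 0x3c) yields 0x8000ff3c.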



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 13:26                     ` Stefan Hajnoczi
  2019-07-25 13:43                       ` Paolo Bonzini
@ 2019-07-25 13:48                       ` Michael S. Tsirkin
  1 sibling, 0 replies; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25 13:48 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, Maran Wilson,
	QEMU Developers, Gerd Hoffmann, Paolo Bonzini,
	Stefano Garzarella, Richard Henderson

On Thu, Jul 25, 2019 at 02:26:12PM +0100, Stefan Hajnoczi wrote:
> On Thu, Jul 25, 2019 at 1:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Jul 25, 2019 at 01:01:29PM +0100, Stefan Hajnoczi wrote:
> > > On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > On 25/07/19 12:42, Sergio Lopez wrote:
> > > > > Peter Maydell <peter.maydell@linaro.org> writes:
> > > > >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >>> OK so please start with adding virtio 1 support. Guest bits
> > > > >>> have been ready for years now.
> > > > >>
> > > > >> I'd still rather we just used pci virtio. If pci isn't
> > > > >> fast enough at startup, do something to make it faster...
> > > > >
> > > > > Actually, removing PCI (and ACPI), is one of the main ways microvm has
> > > > > to reduce not only boot time, but also the exposed surface and the
> > > > > general footprint.
> > > > >
> > > > > I think we need to discuss and settle whether using virtio-mmio (even if
> > > > > maintained and upgraded to virtio 1) for a new machine type is
> > > > > acceptable or not. Because if it isn't, we should probably just ditch
> > > > > the whole microvm idea and move to something else.
> > > >
> > > > I agree.  IMNSHO the reduced attack surface from removing PCI is
> > > > (mostly) security theater, however the boot time numbers that Sergio
> > > > showed for microvm are quite extreme and I don't think there is any hope
> > > > of getting even close with a PCI-based virtual machine.
> > > >
> > > > So I'd even go a step further: if using virtio-mmio for a new machine
> > > > type is not acceptable, we should admit that boot time optimization in
> > > > QEMU is basically as good as it can get---low-hanging fruit has been
> > > > picked with PVH and mmap is the logical next step, but all that's left
> > > > is optimizing the guest or something else.
> > >
> > > I haven't seen enough analysis to declare boot time optimization done.
> > > QEMU startup can be profiled and improved.
> >
> > Right, and that will always stay the case.
> 
> The microvm design has a premise and it can be answered definitively
> through performance analysis.
> 
> If I had to explain to someone why PCI or ACPI significantly slows
> things down, I couldn't honestly do so.

well with pci each device describes itself. you read
this description dword by dword normally. typical
description is 20-50 words.

if both bios and linux do this, that's twice the amount.

bios also uses two vmexits for each access.

there's also the resource allocation game.

I would say up to 200 exits per device is reasonable.
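
As a rough sanity check of that figure, the sketch below multiplies out the
numbers above.  The exits-per-access value for Linux is an assumption (one
exit would hold for MMCONFIG reads, port I/O would cost two), and the
function name is illustrative only.

```c
/*
 * Rough vmexit estimate for enumerating one PCI device, following the
 * reasoning above.  All inputs are illustrative assumptions, not
 * measurements.
 *
 * header_dwords:        dwords of self-description read per device
 * fw_exits_per_access:  exits the firmware pays per dword (2 via 0xcf8/0xcfc)
 * os_exits_per_access:  exits the guest kernel pays per dword (assumed)
 */
static unsigned pci_enum_exits(unsigned header_dwords,
                               unsigned fw_exits_per_access,
                               unsigned os_exits_per_access)
{
    /* Firmware and kernel each read the whole description once. */
    return header_dwords * (fw_exits_per_access + os_exits_per_access);
}
```

With 20 to 50 dwords, two exits per firmware access and one per kernel
access, this gives 60 to 150 exits per device, which leaves the rest of the
200 for resource allocation.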


>  I say significantly because
> PCI init definitely requires more vmexits but can it be a small
> number?

each bus is scanned for devices. 32 accesses, 256 bus numbers
(that's the lastbus thing). Paolo posted a hack just
for the root bus but whenever we have a bridge the problem
will just re-surface.

pcie is actually link based so downstream buses do not
need to be scanned outside device 0 unless we see
a multifunction bit set. I don't think linux
implements this optimization atm.
But still the case for internal buses.
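
To put a number on the scan cost, here is a small sketch that counts
config-space probes under the two strategies described above.  It assumes
one probe per (bus, device) slot and ignores multifunction devices; the
function name is hypothetical.

```c
/*
 * Count config-space probes needed to scan `buses` bus numbers.
 * device0_only == 0: probe all 32 device slots on every bus.
 * device0_only != 0: probe only device 0 on every bus.
 * (Illustrative model only; not taken from the Linux PCI core.)
 */
static unsigned count_probes(unsigned buses, int device0_only)
{
    unsigned probes = 0;

    for (unsigned bus = 0; bus < buses; bus++) {
        unsigned slots = device0_only ? 1 : 32;

        for (unsigned dev = 0; dev < slots; dev++) {
            probes++;   /* one vendor-ID read for this slot */
        }
    }
    return probes;
}
```

For 256 bus numbers this is 8192 probes for the full scan against 256 when
only device 0 is probed.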


> For ACPI I have no idea why it would consume significant
> amounts of time.


me neither. I suspect it's not vmexit related at all.  Is ACPI driver in
linux just slow?  It's not been designed to be on any data path...
I'd love to know. I don't feel it's fair to ask someone
interested in writing new performant code to necessarily optimize
old non-performant code.

> Until we have this knowledge, the premise of microvm is unproven and
> merging it would be premature because maybe we can get into the same
> ballpark by optimizing existing code.

maybe but who is working on this right now?

If it's possible to make PC faster but not enough people
know how to do it, and enough people know how to make microvm
faster, then it does not matter what's possible in theory.


> 
> I'm sorry for being a pain.  I actually think the analysis will
> support microvm, but it still needs to be done in order to justify it.
> 
> Stefan

At some level it would be great to have someone do detailed performance
profiling. But it is a lot of work, which also needs to be justified
given there's working code, and it's not bad code at that.

Yes speeding up PC would be nice but if everyone's gut feeling is it
won't get us what microvm is trying to achieve, why spend cycles making
sure?

-- 
MST



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 13:43                       ` Paolo Bonzini
@ 2019-07-25 13:54                         ` Michael S. Tsirkin
  2019-07-25 14:13                           ` Paolo Bonzini
  2019-07-25 14:04                         ` Peter Maydell
  2019-07-25 14:42                         ` Sergio Lopez
  2 siblings, 1 reply; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25 13:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, Maran Wilson,
	Stefan Hajnoczi, QEMU Developers, Gerd Hoffmann,
	Stefano Garzarella, Richard Henderson

On Thu, Jul 25, 2019 at 03:43:12PM +0200, Paolo Bonzini wrote:
> On 25/07/19 15:26, Stefan Hajnoczi wrote:
> > The microvm design has a premise and it can be answered definitively
> > through performance analysis.
> > 
> > If I had to explain to someone why PCI or ACPI significantly slows
> > things down, I couldn't honestly do so.  I say significantly because
> > PCI init definitely requires more vmexits but can it be a small
> > number?  For ACPI I have no idea why it would consume significant
> > amounts of time.
> 
> My guess is that it's just a lot of code that has to run. :(
> 
> > Until we have this knowledge, the premise of microvm is unproven and
> > merging it would be premature because maybe we can get into the same
> > ballpark by optimizing existing code.
> > 
> > I'm sorry for being a pain.  I actually think the analysis will
> > support microvm, but it still needs to be done in order to justify it.
> 
> No, you're not a pain, you're explaining your reasoning and that helps.
> 
> To me *maintainability is the biggest consideration* when introducing a
> new feature.  "We can do just as well with q35" is a good reason to
> deprecate and delete microvm, but not a good reason to reject it now as
> long as microvm is good enough in terms of maintainability.  Keeping it
> out of tree only makes it harder to do this kind of experiment.  virtio
> 1 seems to be the biggest remaining blocker and I think it'd be a good
> thing to have even for the ARM virt machine type.

Yep. E.g. virtio-iommu guys wanted that too.

> FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> and ~25 ms in the kernel.

How did you measure the qemu time btw?

>  I must say that's pretty good, but it's still
> 30% of the whole boot time and reducing it is the hardest part.  If
> having microvm in tree can help reduce it, good.  Yes, it will get
> users, but most likely they will have to support pc or q35 as a fallback
> so we could still delete microvm at any time with the due deprecation
> period if it turns out to be a failed experiment.
> 
> Whether to use qboot or SeaBIOS for microvm is another story, but it's
> an implementation detail as long as the ROM size doesn't change and/or
> we don't do versioned machine types.  So we can switch from one to the
> other at any time; we can also include qboot directly in QEMU's tree,
> without going through a submodule, which also reduces the infrastructure
> needed (mirrors, etc.) and makes it easier to delete it.
> 
> Paolo
> 
> (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
> last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
> ends up measured as PCI in SeaBIOS, due to different init order, so the
> real firmware cost of PAM and PCI initialization should be 5ms for qboot
> and 10ms for SeaBIOS.



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 13:43                       ` Paolo Bonzini
  2019-07-25 13:54                         ` Michael S. Tsirkin
@ 2019-07-25 14:04                         ` Peter Maydell
  2019-07-25 14:26                           ` Paolo Bonzini
  2019-07-25 14:42                         ` Sergio Lopez
  2 siblings, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2019-07-25 14:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Eduardo Habkost, Sergio Lopez, Maran Wilson, Stefan Hajnoczi,
	Michael S. Tsirkin, QEMU Developers, Gerd Hoffmann,
	Stefano Garzarella, Richard Henderson

On Thu, 25 Jul 2019 at 14:43, Paolo Bonzini <pbonzini@redhat.com> wrote:
> To me *maintainability is the biggest consideration* when introducing a
> new feature.  "We can do just as well with q35" is a good reason to
> deprecate and delete microvm, but not a good reason to reject it now as
> long as microvm is good enough in terms of maintainability.

I think maintainability matters, but also important is "are
we going in the right direction in the first place?".
virtio-mmio is (variously deliberately and accidentally)
quite a long way behind virtio-pci, and certain kinds of things
(hotplug, extensibility beyond a certain number of endpoints)
are not going to be possible (either ever, or without a lot
of extra design and implementation work to reimplement stuff
we have already today with PCI). Are we sure we're not going
to end up with a stream of "oh, now we need to implement X for
virtio-mmio (that virtio-pci already has)", "users want Y now
(that virtio-pci already has)", etc?

The other thing is that once we've introduced something we're
stuck with whatever it does, because we don't like breaking
backwards compatibility. So I think getting the virtio-legacy
vs virtio-1 story sorted out before we land microvm is
important, at least to the point where we know we haven't
backed ourselves into a corner or required a lot of extra
effort on transitional-device support that we could have
avoided.

Which isn't to say that I'm against the microvm approach;
just that I'd like us to consider and make a decision on
these issues before landing it, rather than just saying
"the patches in themselves look good, let's merge it".

thanks
-- PMM



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 13:54                         ` Michael S. Tsirkin
@ 2019-07-25 14:13                           ` Paolo Bonzini
  2019-07-25 14:42                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 68+ messages in thread
From: Paolo Bonzini @ 2019-07-25 14:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, Maran Wilson,
	Stefan Hajnoczi, QEMU Developers, Gerd Hoffmann,
	Stefano Garzarella, Richard Henderson

On 25/07/19 15:54, Michael S. Tsirkin wrote:
>> FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
>> and ~25 ms in the kernel.
> How did you measure the qemu time btw?
> 

It's QEMU startup, but not QEMU altogether.  For example the time spent
in memory.c when a BAR is programmed is not part of those 10 ms.

So I just computed q35 qemu startup - microvm qemu startup, it's 65 vs
65 ms.

Paolo



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 14:04                         ` Peter Maydell
@ 2019-07-25 14:26                           ` Paolo Bonzini
  2019-07-25 14:35                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 68+ messages in thread
From: Paolo Bonzini @ 2019-07-25 14:26 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Eduardo Habkost, Sergio Lopez, Maran Wilson, Stefan Hajnoczi,
	Michael S. Tsirkin, QEMU Developers, Gerd Hoffmann,
	Stefano Garzarella, Richard Henderson

On 25/07/19 16:04, Peter Maydell wrote:
> On Thu, 25 Jul 2019 at 14:43, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> To me *maintainability is the biggest consideration* when introducing a
>> new feature.  "We can do just as well with q35" is a good reason to
>> deprecate and delete microvm, but not a good reason to reject it now as
>> long as microvm is good enough in terms of maintainability.
> 
> I think maintainability matters, but also important is "are
> we going in the right direction in the first place?".
> virtio-mmio is (variously deliberately and accidentally)
> quite a long way behind virtio-pci, and certain kinds of things
> (hotplug, extensibility beyond a certain number of endpoints)
> are not going to be possible (either ever, or without a lot
> of extra design and implementation work to reimplement stuff
> we have already today with PCI). Are we sure we're not going
> to end up with a stream of "oh, now we need to implement X for
> virtio-mmio (that virtio-pci already has)", "users want Y now
> (that virtio-pci already has)", etc?

I think this is part of maintainability in a wider sense.  For every
missing feature there should be a good reason why it's not needed.  And
if there is already code to do that in QEMU, then there should be an
excellent reason why it's not being used.  (This was the essence of the
firmware debate).

So for microvm you could do without hotplug because the idea is that you
just tear down the VM and restart it.  Lack of MSI is actually what
worries me the most, but we could say that microvm clients generally
have little multiprocessing so it's not common to have multiple network
flows at the same time and so you don't need multiqueue.

For microvm in particular there are two reasons why we can take some
shortcuts (but with care):

- we won't support versioned machine types for microvm.  microvm guests
die every time you upgrade QEMU, by design.  So this is not another QED,
which implemented more features than qcow2 but did so at the wrong place
of the stack.  In fact it's exactly the opposite (it implements fewer
features, so that the implementation of e.g. q35 or PCI is untouched and
does not need one-off boot time optimization hacks)

- we know that Amazon is using something very similar to microvm in
production, with virtio-mmio, so the feature set is at least usable for
something.

> The other thing is that once we've introduced something we're
> stuck with whatever it does, because we don't like breaking
> backwards compatibility. So I think getting the virtio-legacy
> vs virtio-1 story sorted out before we land microvm is
> important, at least to the point where we know we haven't
> backed ourselves into a corner or required a lot of extra
> effort on transitional-device support that we could have
> avoided.

Even though we won't support versioned machine types, I think there is
agreement that virtio 0.9 is a bad idea and should be fixed.

Paolo

> Which isn't to say that I'm against the microvm approach;
> just that I'd like us to consider and make a decision on
> these issues before landing it, rather than just saying
> "the patches in themselves look good, let's merge it".
> 
> thanks
> -- PMM
> 




* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 14:26                           ` Paolo Bonzini
@ 2019-07-25 14:35                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25 14:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, Maran Wilson,
	Stefan Hajnoczi, QEMU Developers, Gerd Hoffmann,
	Stefano Garzarella, Richard Henderson

On Thu, Jul 25, 2019 at 04:26:42PM +0200, Paolo Bonzini wrote:
> On 25/07/19 16:04, Peter Maydell wrote:
> > On Thu, 25 Jul 2019 at 14:43, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >> To me *maintainability is the biggest consideration* when introducing a
> >> new feature.  "We can do just as well with q35" is a good reason to
> >> deprecate and delete microvm, but not a good reason to reject it now as
> >> long as microvm is good enough in terms of maintainability.
> > 
> > I think maintainability matters, but also important is "are
> > we going in the right direction in the first place?".
> > virtio-mmio is (variously deliberately and accidentally)
> > quite a long way behind virtio-pci, and certain kinds of things
> > (hotplug, extensibility beyond a certain number of endpoints)
> > are not going to be possible (either ever, or without a lot
> > of extra design and implementation work to reimplement stuff
> > we have already today with PCI). Are we sure we're not going
> > to end up with a stream of "oh, now we need to implement X for
> > virtio-mmio (that virtio-pci already has)", "users want Y now
> > (that virtio-pci already has)", etc?
> 
> I think this is part of maintainability in a wider sense.  For every
> missing feature there should be a good reason why it's not needed.  And
> if there is already code to do that in QEMU, then there should be an
> excellent reason why it's not being used.  (This was the essence of the
> firmware debate).
> 
> So for microvm you could do without hotplug because the idea is that you
> just tear down the VM and restart it.  Lack of MSI is actually what
> worries me the most, but we could say that microvm clients generally
> have little multiprocessing so it's not common to have multiple network
> flows at the same time and so you don't need multiqueue.

Me too, and in fact someone just posted
	virtio-mmio: support multiple interrupt vectors


> For microvm in particular there are two reasons why we can take some
> shortcuts (but with care):
> 
> - we won't support versioned machine types for microvm.  microvm guests
> die every time you upgrade QEMU, by design.  So this is not another QED,
> which implemented more features than qcow2 but did so at the wrong place
> of the stack.  In fact it's exactly the opposite (it implements fewer
> features, so that the implementation of e.g. q35 or PCI is untouched and
> does not need one-off boot time optimization hacks)
> 
> - we know that Amazon is using something very similar to microvm in
> production, with virtio-mmio, so the feature set is at least usable for
> something.
> 
> > The other thing is that once we've introduced something we're
> > stuck with whatever it does, because we don't like breaking
> > backwards compatibility. So I think getting the virtio-legacy
> > vs virtio-1 story sorted out before we land microvm is
> > important, at least to the point where we know we haven't
> > backed ourselves into a corner or required a lot of extra
> > effort on transitional-device support that we could have
> > avoided.
> 
> Even though we won't support versioned machine types, I think there is
> agreement that virtio 0.9 is a bad idea and should be fixed.
> 
> Paolo

Right, for the simple reason that mmio does not support transitional
devices, only transitional drivers.  So if we commit to supporting old
guests, we won't be able to back out of that.

> > Which isn't to say that I'm against the microvm approach;
> > just that I'd like us to consider and make a decision on
> > these issues before landing it, rather than just saying
> > "the patches in themselves look good, let's merge it".
> > 
> > thanks
> > -- PMM
> > 



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 14:13                           ` Paolo Bonzini
@ 2019-07-25 14:42                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25 14:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, Maran Wilson,
	Stefan Hajnoczi, QEMU Developers, Gerd Hoffmann,
	Stefano Garzarella, Richard Henderson

On Thu, Jul 25, 2019 at 04:13:13PM +0200, Paolo Bonzini wrote:
> On 25/07/19 15:54, Michael S. Tsirkin wrote:
> >> FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> >> and ~25 ms in the kernel.
> > How did you measure the qemu time btw?
> > 
> 
> It's QEMU startup, but not QEMU altogether.  For example the time spent
> in memory.c when a BAR is programmed is not part of those 10 ms.
> 
> So I just computed q35 qemu startup - microvm qemu startup, it's 65 vs
> 65 ms.
> 
> Paolo

Oh so it could be eventfd or whatever, just as well.

I actually wonder whether we spend much time within
synchronize_* calls. eventfd triggers this a lot of times.

How about ioeventfd=off? Does this speed up things?



-- 
MST



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 13:43                       ` Paolo Bonzini
  2019-07-25 13:54                         ` Michael S. Tsirkin
  2019-07-25 14:04                         ` Peter Maydell
@ 2019-07-25 14:42                         ` Sergio Lopez
  2019-07-25 14:58                           ` Michael S. Tsirkin
  2 siblings, 1 reply; 68+ messages in thread
From: Sergio Lopez @ 2019-07-25 14:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Eduardo Habkost, Maran Wilson, Stefan Hajnoczi,
	Michael S. Tsirkin, QEMU Developers, Gerd Hoffmann,
	Stefano Garzarella, Richard Henderson


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 25/07/19 15:26, Stefan Hajnoczi wrote:
>> The microvm design has a premise and it can be answered definitively
>> through performance analysis.
>> 
>> If I had to explain to someone why PCI or ACPI significantly slows
>> things down, I couldn't honestly do so.  I say significantly because
>> PCI init definitely requires more vmexits but can it be a small
>> number?  For ACPI I have no idea why it would consume significant
>> amounts of time.
>
> My guess is that it's just a lot of code that has to run. :(

I don't think I've shared any numbers about ACPI.

I don't have details about where exactly the time is spent, but
compiling a guest kernel without ACPI decreases the average boot time by
~12ms, and the kernel's unstripped ELF binary size goes down by a
whopping ~300KiB.

On the other hand, removing ACPI from QEMU decreases its initialization
time by ~5ms, and the binary size is ~183KiB smaller.

IMHO, those are pretty relevant savings on both fronts.

>> Until we have this knowledge, the premise of microvm is unproven and
>> merging it would be premature because maybe we can get into the same
>> ballpark by optimizing existing code.
>> 
>> I'm sorry for being a pain.  I actually think the analysis will
>> support microvm, but it still needs to be done in order to justify it.
>
> No, you're not a pain, you're explaining your reasoning and that helps.
>
> To me *maintainability is the biggest consideration* when introducing a
> new feature.  "We can do just as well with q35" is a good reason to
> deprecate and delete microvm, but not a good reason to reject it now as
> long as microvm is good enough in terms of maintainability.  Keeping it
> out of tree only makes it harder to do this kind of experiment.  virtio
> 1 seems to be the biggest remaining blocker and I think it'd be a good
> thing to have even for the ARM virt machine type.
>
> FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> and ~25 ms in the kernel.  I must say that's pretty good, but it's still
> 30% of the whole boot time and reducing it is the hardest part.  If
> having microvm in tree can help reduce it, good.  Yes, it will get
> users, but most likely they will have to support pc or q35 as a fallback
> so we could still delete microvm at any time with the due deprecation
> period if it turns out to be a failed experiment.
>
> Whether to use qboot or SeaBIOS for microvm is another story, but it's
> an implementation detail as long as the ROM size doesn't change and/or
> we don't do versioned machine types.  So we can switch from one to the
> other at any time; we can also include qboot directly in QEMU's tree,
> without going through a submodule, which also reduces the infrastructure
> needed (mirrors, etc.) and makes it easier to delete it.
>
> Paolo
>
> (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
> last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
> ends up measured as PCI in SeaBIOS, due to different init order, so the
> real firmware cost of PAM and PCI initialization should be 5ms for qboot
> and 10ms for SeaBIOS.




* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-24 11:14                   ` Paolo Bonzini
  2019-07-25  9:35                     ` Sergio Lopez
  2019-07-25 10:03                     ` Michael S. Tsirkin
@ 2019-07-25 14:46                     ` Michael S. Tsirkin
  2019-07-25 15:35                       ` Paolo Bonzini
  2 siblings, 1 reply; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25 14:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: ehabkost, Sergio Lopez, maran.wilson, Montes, Julio,
	Stefan Hajnoczi, qemu-devel, kraxel, rth, sgarzare

On Wed, Jul 24, 2019 at 01:14:35PM +0200, Paolo Bonzini wrote:
> On 23/07/19 12:01, Paolo Bonzini wrote:
> > The number of buses is determined by the firmware, not by QEMU, so
> > fw_cfg would not be the right interface.  In fact (as I have just
> > learnt) lastbus is an x86-specific option that overrides the last bus
> > returned by SeaBIOS's handle_1ab101.
> > 
> > So the next step could be to figure out what is the lastbus returned by
> > handle_1ab101 and possibly why it isn't zero.
> 
> Some update:
> 
> - for 64-bit, PCIBIOS (and thus handle_1ab101) is not called.  PCIBIOS is
> only used by 32-bit kernels.  As a side effect, PCI expander bridges do not
> work on 32-bit kernels with ACPI disabled, because they are located beyond
> pcibios_last_bus (with ACPI enabled, the DSDT exposes them).
> 
> - for -M pc, pcibios_last_bus in Linux remains -1 and no "legacy scanning" is done.
> 
> - for -M q35, pcibios_last_bus in Linux is set based on the size of the 
> MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs 
> for buses above 0.
> 
> Here is a patch that only scans devfn==0, which should mostly remove the need
> for pci=lastbus=0.  (Testing is welcome).

Actually, I think I have a better idea.
At the moment we just get an exit on these reads and return all-ones.
Yes, in theory there could be a UR bit set in a bunch of
registers but in practice no one cares about these,
and I don't think we implement them.
So how about mapping a single page, read-only, and filling it
with all-ones?

We'll still run the scanning code within Linux, but the reads will no
longer cause exits, so it will be nearly free.

What do you think?
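
To make the idea concrete, here is a rough userspace sketch, not QEMU
code. It assumes a POSIX system with anonymous mmap; the helper names
alloc_all_ones_page() and read16() are made up for illustration. It
prepares one read-only page filled with 0xff, so any 16-bit vendor ID
read from it comes back as 0xffff without a handler being involved:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns a read-only page filled with 0xff, or NULL. */
static void *alloc_all_ones_page(size_t *len_out)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    assert(len_out != NULL);
    if (page <= 0)
        return NULL;

    /* Map writable first so the page can be filled. */
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    memset(p, 0xff, (size_t)page);

    /* Drop write permission: from now on the page is read-only. */
    if (mprotect(p, (size_t)page, PROT_READ) != 0) {
        munmap(p, (size_t)page);
        return NULL;
    }

    *len_out = (size_t)page;
    return p;
}

/* Read a 16-bit value at a byte offset, as a config-space read would. */
static uint16_t read16(const void *base, size_t off)
{
    uint16_t v;

    memcpy(&v, (const uint8_t *)base + off, sizeof(v));
    return v;
}
```

In a real VMM the same host page would have to be mapped at every
guest address that should read as all-ones, which is where the cost
in VMAs or memory slots comes from.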


> Actually, KVM could probably avoid the scanning altogether.  The only "hidden" root
> buses we expect are from PCI expander bridges and if you found an MMCONFIG area
> through the ACPI MCFG table, you can also use the DSDT to find PCI expander bridges.
> However, I am being conservative.
> 
> A possible alternative could be a mechanism whereby the vmlinuz real mode entry
> point, or the 32-bit PVH entry point, fetch lastbus and they pass it to the
> kernel via the vmlinuz or PVH boot information structs.  However, I don't think
> that's very useful, and there is some risk of breaking real hardware too.
> 
> Paolo
> 
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index 73bb404f4d2a..17012aa60d22 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -61,6 +61,7 @@ enum pci_bf_sort_state {
>  extern struct pci_ops pci_root_ops;
>  
>  void pcibios_scan_specific_bus(int busn);
> +void pcibios_scan_bus_by_device(int busn);
>  
>  /* pci-irq.c */
>  
> @@ -216,8 +217,10 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
>  # endif
>  # define x86_default_pci_init_irq	pcibios_irq_init
>  # define x86_default_pci_fixup_irqs	pcibios_fixup_irqs
> +# define x86_default_pci_scan_bus	pcibios_scan_bus_by_device
>  #else
>  # define x86_default_pci_init		NULL
>  # define x86_default_pci_init_irq	NULL
>  # define x86_default_pci_fixup_irqs	NULL
> +# define x86_default_pci_scan_bus      NULL
>  #endif
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index b85a7c54c6a1..4c3a0a17a600 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -251,6 +251,7 @@ struct x86_hyper_runtime {
>   * @save_sched_clock_state:	save state for sched_clock() on suspend
>   * @restore_sched_clock_state:	restore state for sched_clock() on resume
>   * @apic_post_init:		adjust apic if needed
> + * @pci_scan_bus:		scan a PCI bus
>   * @legacy:			legacy features
>   * @set_legacy_features:	override legacy features. Use of this callback
>   * 				is highly discouraged. You should only need
> @@ -273,6 +274,7 @@ struct x86_platform_ops {
>  	void (*save_sched_clock_state)(void);
>  	void (*restore_sched_clock_state)(void);
>  	void (*apic_post_init)(void);
> +	void (*pci_scan_bus)(int busn);
>  	struct x86_legacy_features legacy;
>  	void (*set_legacy_features)(void);
>  	struct x86_hyper_runtime hyper;
> diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
> index 6857b4577f17..b248d7036dd3 100644
> --- a/arch/x86/kernel/jailhouse.c
> +++ b/arch/x86/kernel/jailhouse.c
> @@ -11,12 +11,14 @@
>  #include <linux/acpi_pmtmr.h>
>  #include <linux/kernel.h>
>  #include <linux/reboot.h>
> +#include <linux/pci.h>
>  #include <asm/apic.h>
>  #include <asm/cpu.h>
>  #include <asm/hypervisor.h>
>  #include <asm/i8259.h>
>  #include <asm/irqdomain.h>
>  #include <asm/pci_x86.h>
> +#include <asm/pci.h>
>  #include <asm/reboot.h>
>  #include <asm/setup.h>
>  #include <asm/jailhouse_para.h>
> @@ -136,6 +138,22 @@ static int __init jailhouse_pci_arch_init(void)
>  	return 0;
>  }
>  
> +static void jailhouse_pci_scan_bus_by_function(int busn)
> +{
> +        int devfn;
> +        u32 l;
> +
> +        for (devfn = 0; devfn < 256; devfn++) {
> +                if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
> +                    l != 0x0000 && l != 0xffff) {
> +                        DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
> +                        pr_info("PCI: Discovered peer bus %02x\n", busn);
> +                        pcibios_scan_root(busn);
> +                        return;
> +                }
> +        }
> +}
> +
>  static void __init jailhouse_init_platform(void)
>  {
>  	u64 pa_data = boot_params.hdr.setup_data;
> @@ -153,6 +171,7 @@ static void __init jailhouse_init_platform(void)
>  	x86_platform.legacy.rtc		= 0;
>  	x86_platform.legacy.warm_reset	= 0;
>  	x86_platform.legacy.i8042	= X86_LEGACY_I8042_PLATFORM_ABSENT;
> +	x86_platform.pci_scan_bus	= jailhouse_pci_scan_bus_by_function;
>  
>  	legacy_pic			= &null_legacy_pic;
>  
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 82caf01b63dd..59f7204ed8f3 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -24,6 +24,7 @@
>  #include <linux/debugfs.h>
>  #include <linux/nmi.h>
>  #include <linux/swait.h>
> +#include <linux/pci.h>
>  #include <asm/timer.h>
>  #include <asm/cpu.h>
>  #include <asm/traps.h>
> @@ -33,6 +34,7 @@
>  #include <asm/apicdef.h>
>  #include <asm/hypervisor.h>
>  #include <asm/tlb.h>
> +#include <asm/pci.h>
>  
>  static int kvmapf = 1;
>  
> @@ -621,10 +623,31 @@ static void kvm_flush_tlb_others(const struct cpumask *cpumask,
>  	native_flush_tlb_others(flushmask, info);
>  }
>  
> +#ifdef CONFIG_PCI
> +static void kvm_pci_scan_bus(int busn)
> +{
> +        u32 l;
> +
> +	/*
> +	 * Assume that there are no "hidden" buses, i.e. all PCI root buses
> +	 * have a host bridge at device 0, function 0.
> +	 */
> +	if (!raw_pci_read(0, busn, 0, PCI_VENDOR_ID, 2, &l) &&
> +	    l != 0x0000 && l != 0xffff) {
> +		pr_info("PCI: Discovered peer bus %02x\n", busn);
> +		pcibios_scan_root(busn);
> +        }
> +}
> +#endif
> +
>  static void __init kvm_guest_init(void)
>  {
>  	int i;
>  
> +#ifdef CONFIG_PCI
> +	x86_platform.pci_scan_bus = kvm_pci_scan_bus;
> +#endif
> +
>  	if (!kvm_para_available())
>  		return;
>  
> diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
> index 50a2b492fdd6..19e1cc2cb6e0 100644
> --- a/arch/x86/kernel/x86_init.c
> +++ b/arch/x86/kernel/x86_init.c
> @@ -118,6 +118,7 @@ struct x86_platform_ops x86_platform __ro_after_init = {
>  	.get_nmi_reason			= default_get_nmi_reason,
>  	.save_sched_clock_state 	= tsc_save_sched_clock_state,
>  	.restore_sched_clock_state 	= tsc_restore_sched_clock_state,
> +	.pci_scan_bus			= x86_default_pci_scan_bus,
>  	.hyper.pin_vcpu			= x86_op_int_noop,
>  };
>  
> diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
> index 467311b1eeea..6214dbce26d3 100644
> --- a/arch/x86/pci/legacy.c
> +++ b/arch/x86/pci/legacy.c
> @@ -36,14 +36,19 @@ int __init pci_legacy_init(void)
>  
>  void pcibios_scan_specific_bus(int busn)
>  {
> -	int stride = jailhouse_paravirt() ? 1 : 8;
> -	int devfn;
> -	u32 l;
> -
>  	if (pci_find_bus(0, busn))
>  		return;
>  
> -	for (devfn = 0; devfn < 256; devfn += stride) {
> +	x86_platform.pci_scan_bus(busn);
> +}
> +EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
> +
> +void pcibios_scan_bus_by_device(int busn)
> +{
> +	int devfn;
> +	u32 l;
> +
> +	for (devfn = 0; devfn < 256; devfn += 8) {
>  		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
>  		    l != 0x0000 && l != 0xffff) {
>  			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
> @@ -53,7 +58,6 @@ void pcibios_scan_specific_bus(int busn)
>  		}
>  	}
>  }
> -EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
>  
>  static int __init pci_subsys_init(void)
>  {


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 10:10             ` Michael S. Tsirkin
@ 2019-07-25 14:52               ` Sergio Lopez
  0 siblings, 0 replies; 68+ messages in thread
From: Sergio Lopez @ 2019-07-25 14:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Eduardo Habkost, maran.wilson, QEMU Developers,
	Gerd Hoffmann, Paolo Bonzini, Stefano Garzarella,
	Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 872 bytes --]


Michael S. Tsirkin <mst@redhat.com> writes:

> On Thu, Jul 25, 2019 at 11:05:05AM +0100, Peter Maydell wrote:
>> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > OK so please start with adding virtio 1 support. Guest bits
>> > have been ready for years now.
>> 
>> I'd still rather we just used pci virtio. If pci isn't
>> fast enough at startup, do something to make it faster...
>> 
>> thanks
>> -- PMM
>
> Oh that's putting microvm aside - if we have a maintainer for
> virtio mmio that's great because it does need a maintainer,
> and virtio 1 would be the thing to fix before adding features ;)

There seems to be a general consensus that virtio-mmio needs some care,
and looking at the specs, implementing virtio-mmio v2/virtio v1
shouldn't be too time consuming, so I'm going to give it a try.

Cheers,
Sergio.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 14:42                         ` Sergio Lopez
@ 2019-07-25 14:58                           ` Michael S. Tsirkin
  2019-07-25 15:01                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25 14:58 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: Peter Maydell, Eduardo Habkost, Maran Wilson, Stefan Hajnoczi,
	QEMU Developers, Gerd Hoffmann, Paolo Bonzini,
	Stefano Garzarella, Richard Henderson

On Thu, Jul 25, 2019 at 04:42:42PM +0200, Sergio Lopez wrote:
> 
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
> > On 25/07/19 15:26, Stefan Hajnoczi wrote:
> >> The microvm design has a premise and it can be answered definitively
> >> through performance analysis.
> >> 
> >> If I had to explain to someone why PCI or ACPI significantly slows
> >> things down, I couldn't honestly do so.  I say significantly because
> >> PCI init definitely requires more vmexits but can it be a small
> >> number?  For ACPI I have no idea why it would consume significant
> >> amounts of time.
> >
> > My guess is that it's just a lot of code that has to run. :(
> 
> I think I haven't shared any numbers about ACPI.
> 
> I don't have details about where exactly the time is spent, but
> compiling a guest kernel without ACPI decreases the average boot time by
> ~12ms, and the kernel's unstripped ELF binary size goes down by a
> whopping ~300KiB.

At least the binary size is hardly surprising.

I'm guessing you built in lots of drivers.

It would be educational to try to enable ACPI core but disable all
optional features.


> On the other hand, removing ACPI from QEMU decreases its initialization
> time by ~5ms, and the binary size is ~183KiB smaller.

Yes - ACPI generation uses a ton of allocations and data copies.

Need to play with pre-allocation strategies. Maybe something
as simple as:

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index f3fdfefcd5..24becc069e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2629,8 +2629,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
     acpi_get_pci_holes(&pci_hole, &pci_hole64);
     acpi_get_slic_oem(&slic_oem);
 
+#define DEFAULT_ARRAY_SIZE 16
-    table_offsets = g_array_new(false, true /* clear */,
-                                        sizeof(uint32_t));
+    table_offsets = g_array_sized_new(false, true /* clear */,
+                                      sizeof(uint32_t),
+                                      DEFAULT_ARRAY_SIZE);
     ACPI_BUILD_DPRINTF("init ACPI tables\n");
 
     bios_linker_loader_alloc(tables->linker,

will already help a bit.
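
To show why reserving space up front helps, here is a small
stand-alone sketch in plain C. It is neither GLib nor QEMU code: the
vec_* names and the doubling growth policy are assumptions made for
the example, and GLib's real growth strategy may differ. It counts how
often a growable array has to reallocate with and without an initial
reservation (GLib's g_array_sized_new() takes such a reserved size as
its fourth argument):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal growable array of uint32_t that counts its reallocations. */
struct vec {
    uint32_t *data;
    size_t len, cap;
    unsigned reallocs;   /* number of times the buffer had to grow */
};

static void vec_init(struct vec *v, size_t reserved)
{
    v->data = reserved ? malloc(reserved * sizeof(*v->data)) : NULL;
    v->len = 0;
    v->cap = v->data ? reserved : 0;
    v->reallocs = 0;
}

static int vec_append(struct vec *v, uint32_t val)
{
    if (v->len == v->cap) {
        /* Double the capacity (assumed policy for this sketch). */
        size_t ncap = v->cap ? v->cap * 2 : 1;
        uint32_t *n = realloc(v->data, ncap * sizeof(*n));

        if (!n)
            return -1;
        v->data = n;
        v->cap = ncap;
        v->reallocs++;
    }
    v->data[v->len++] = val;
    return 0;
}

/* Append n elements and report how many reallocations were needed. */
static unsigned count_reallocs(size_t reserved, size_t n)
{
    struct vec v;
    unsigned r;
    size_t i;

    vec_init(&v, reserved);
    for (i = 0; i < n; i++) {
        if (vec_append(&v, (uint32_t)i))
            break;
    }
    assert(v.len <= v.cap);
    r = v.reallocs;
    free(v.data);
    return r;
}
```

With this doubling policy, appending 16 offsets to an empty array
grows the buffer five times, while reserving 16 slots up front avoids
every one of those reallocations and the copies that go with them.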

> 
> IMHO, those are pretty relevant savings on both fronts.
> 
> >> Until we have this knowledge, the premise of microvm is unproven and
> >> merging it would be premature because maybe we can get into the same
> >> ballpark by optimizing existing code.
> >> 
> >> I'm sorry for being a pain.  I actually think the analysis will
> >> support microvm, but it still needs to be done in order to justify it.
> >
> > No, you're not a pain, you're explaining your reasoning and that helps.
> >
> > To me *maintainability is the biggest consideration* when introducing a
> > new feature.  "We can do just as well with q35" is a good reason to
> > deprecate and delete microvm, but not a good reason to reject it now as
> > long as microvm is good enough in terms of maintainability.  Keeping it
> > out of tree only makes it harder to do this kind of experiment.  virtio
> > 1 seems to be the biggest remaining blocker and I think it'd be a good
> > thing to have even for the ARM virt machine type.
> >
> > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> > and ~25 ms in the kernel.  I must say that's pretty good, but it's still
> > 30% of the whole boot time and reducing it is the hardest part.  If
> > having microvm in tree can help reducing it, good.  Yes, it will get
> > users, but most likely they will have to support pc or q35 as a fallback
> > so we could still delete microvm at any time with the due deprecation
> > period if it turns out to be a failed experiment.
> >
> > Whether to use qboot or SeaBIOS for microvm is another story, but it's
> > an implementation detail as long as the ROM size doesn't change and/or
> > we don't do versioned machine types.  So we can switch from one to the
> > other at any time; we can also include qboot directly in QEMU's tree,
> > without going through a submodule, which also reduces the infrastructure
> > needed (mirrors, etc.) and makes it easier to delete it.
> >
> > Paolo
> >
> > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
> > last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
> > end up measured as PCI in SeaBIOS, due to different init order, so the
> > real firmware cost of PAM and PCI initialization should be 5ms for qboot
> > and 10ms for SeaBIOS.
> 




^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 14:58                           ` Michael S. Tsirkin
@ 2019-07-25 15:01                             ` Michael S. Tsirkin
  2019-07-25 15:39                               ` Paolo Bonzini
  2019-07-25 15:49                               ` Sergio Lopez
  0 siblings, 2 replies; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25 15:01 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: Peter Maydell, Eduardo Habkost, Maran Wilson, Stefan Hajnoczi,
	QEMU Developers, Gerd Hoffmann, Paolo Bonzini,
	Stefano Garzarella, Richard Henderson

On Thu, Jul 25, 2019 at 10:58:22AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 25, 2019 at 04:42:42PM +0200, Sergio Lopez wrote:
> > 
> > Paolo Bonzini <pbonzini@redhat.com> writes:
> > 
> > > On 25/07/19 15:26, Stefan Hajnoczi wrote:
> > >> The microvm design has a premise and it can be answered definitively
> > >> through performance analysis.
> > >> 
> > >> If I had to explain to someone why PCI or ACPI significantly slows
> > >> things down, I couldn't honestly do so.  I say significantly because
> > >> PCI init definitely requires more vmexits but can it be a small
> > >> number?  For ACPI I have no idea why it would consume significant
> > >> amounts of time.
> > >
> > > My guess is that it's just a lot of code that has to run. :(
> > 
> > I think I haven't shared any numbers about ACPI.
> > 
> > I don't have details about where exactly the time is spent, but
> > compiling a guest kernel without ACPI decreases the average boot time by
> > ~12ms, and the kernel's unstripped ELF binary size goes down by a
> > whopping ~300KiB.
> 
> At least the binary size is hardly surprising.
> 
> I'm guessing you built in lots of drivers.
> 
> It would be educational to try to enable ACPI core but disable all
> optional features.

Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.


> 
> > On the other hand, removing ACPI from QEMU decreases its initialization
> > time by ~5ms, and the binary size is ~183KiB smaller.
> 
> Yes - ACPI generation uses a ton of allocations and data copies.
> 
> Need to play with pre-allocation strategies. Maybe something
> as simple as:
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index f3fdfefcd5..24becc069e 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2629,8 +2629,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>      acpi_get_pci_holes(&pci_hole, &pci_hole64);
>      acpi_get_slic_oem(&slic_oem);
>  
> +#define DEFAULT_ARRAY_SIZE 16
> -    table_offsets = g_array_new(false, true /* clear */,
> -                                        sizeof(uint32_t));
> +    table_offsets = g_array_sized_new(false, true /* clear */,
> +                                      sizeof(uint32_t),
> +                                      DEFAULT_ARRAY_SIZE);
>      ACPI_BUILD_DPRINTF("init ACPI tables\n");
>  
>      bios_linker_loader_alloc(tables->linker,
> 
> will already help a bit.
> 
> > 
> > IMHO, those are pretty relevant savings on both fronts.
> > 
> > >> Until we have this knowledge, the premise of microvm is unproven and
> > >> merging it would be premature because maybe we can get into the same
> > >> ballpark by optimizing existing code.
> > >> 
> > >> I'm sorry for being a pain.  I actually think the analysis will
> > >> support microvm, but it still needs to be done in order to justify it.
> > >
> > > No, you're not a pain, you're explaining your reasoning and that helps.
> > >
> > > To me *maintainability is the biggest consideration* when introducing a
> > > new feature.  "We can do just as well with q35" is a good reason to
> > > deprecate and delete microvm, but not a good reason to reject it now as
> > > long as microvm is good enough in terms of maintainability.  Keeping it
> > > out of tree only makes it harder to do this kind of experiment.  virtio
> > > 1 seems to be the biggest remaining blocker and I think it'd be a good
> > > thing to have even for the ARM virt machine type.
> > >
> > > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> > > and ~25 ms in the kernel.  I must say that's pretty good, but it's still
> > > 30% of the whole boot time and reducing it is the hardest part.  If
> > > having microvm in tree can help reducing it, good.  Yes, it will get
> > > users, but most likely they will have to support pc or q35 as a fallback
> > > so we could still delete microvm at any time with the due deprecation
> > > period if it turns out to be a failed experiment.
> > >
> > > Whether to use qboot or SeaBIOS for microvm is another story, but it's
> > > an implementation detail as long as the ROM size doesn't change and/or
> > > we don't do versioned machine types.  So we can switch from one to the
> > > other at any time; we can also include qboot directly in QEMU's tree,
> > > without going through a submodule, which also reduces the infrastructure
> > > needed (mirrors, etc.) and makes it easier to delete it.
> > >
> > > Paolo
> > >
> > > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
> > > last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
> > > end up measured as PCI in SeaBIOS, due to different init order, so the
> > > real firmware cost of PAM and PCI initialization should be 5ms for qboot
> > > and 10ms for SeaBIOS.
> > 
> 
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 14:46                     ` Michael S. Tsirkin
@ 2019-07-25 15:35                       ` Paolo Bonzini
  2019-07-25 17:33                         ` Michael S. Tsirkin
  2019-07-25 20:30                         ` Michael S. Tsirkin
  0 siblings, 2 replies; 68+ messages in thread
From: Paolo Bonzini @ 2019-07-25 15:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: ehabkost, Sergio Lopez, maran.wilson, Montes, Julio,
	Stefan Hajnoczi, qemu-devel, kraxel, rth, sgarzare

On 25/07/19 16:46, Michael S. Tsirkin wrote:
> Actually, I think I have a better idea.
> At the moment we just get an exit on these reads and return all-ones.
> Yes, in theory there could be a UR bit set in a bunch of
> registers but in practice no one cares about these,
> and I don't think we implement them.
> So how about mapping a single page, read-only, and filling it
> with all-ones?

Yes, that's nice indeed. :)  But it does have some cost, in terms of
either number of VMAs or QEMU RSS since the MMCONFIG area is large.

What breaks if we return all zeroes?  Zero is not a valid vendor ID.
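
For reference, the presence test used by the scanning code quoted
earlier in this thread treats both values as "nothing here". A minimal
sketch of that check follows; the function name vendor_id_present() is
made up for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors the "l != 0x0000 && l != 0xffff" test from the patch:
 * neither all-zeroes nor all-ones is accepted as a vendor ID.
 */
static bool vendor_id_present(uint16_t vendor)
{
    return vendor != 0x0000 && vendor != 0xffff;
}
```

So as far as that particular check goes, a read of 0x0000 stops the
probe just as 0xffff does. Whether other guest code only tests
against all-ones is a separate question.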

Paolo


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 15:01                             ` Michael S. Tsirkin
@ 2019-07-25 15:39                               ` Paolo Bonzini
  2019-07-25 17:38                                 ` Michael S. Tsirkin
  2019-07-25 15:49                               ` Sergio Lopez
  1 sibling, 1 reply; 68+ messages in thread
From: Paolo Bonzini @ 2019-07-25 15:39 UTC (permalink / raw)
  To: Michael S. Tsirkin, Sergio Lopez
  Cc: Peter Maydell, Eduardo Habkost, Maran Wilson, Stefan Hajnoczi,
	QEMU Developers, Gerd Hoffmann, Stefano Garzarella,
	Richard Henderson

On 25/07/19 17:01, Michael S. Tsirkin wrote:
>> It would be educational to try to enable ACPI core but disable all
>> optional features.

A lot of them are select'ed so it's not easy.

> Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.

That's what the NEMU guys experimented with.  It's not supported by our
DSDT since it uses ACPI GPE, and the reduction in code size is small
(about 15000 lines of code in ACPICA, perhaps 100k if you're lucky?).

Paolo


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 15:01                             ` Michael S. Tsirkin
  2019-07-25 15:39                               ` Paolo Bonzini
@ 2019-07-25 15:49                               ` Sergio Lopez
  1 sibling, 0 replies; 68+ messages in thread
From: Sergio Lopez @ 2019-07-25 15:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Eduardo Habkost, Maran Wilson, Stefan Hajnoczi,
	QEMU Developers, Gerd Hoffmann, Paolo Bonzini,
	Stefano Garzarella, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 5230 bytes --]


Michael S. Tsirkin <mst@redhat.com> writes:

> On Thu, Jul 25, 2019 at 10:58:22AM -0400, Michael S. Tsirkin wrote:
>> On Thu, Jul 25, 2019 at 04:42:42PM +0200, Sergio Lopez wrote:
>> > 
>> > Paolo Bonzini <pbonzini@redhat.com> writes:
>> > 
>> > > On 25/07/19 15:26, Stefan Hajnoczi wrote:
>> > >> The microvm design has a premise and it can be answered definitively
>> > >> through performance analysis.
>> > >> 
>> > >> If I had to explain to someone why PCI or ACPI significantly slows
>> > >> things down, I couldn't honestly do so.  I say significantly because
>> > >> PCI init definitely requires more vmexits but can it be a small
>> > >> number?  For ACPI I have no idea why it would consume significant
>> > >> amounts of time.
>> > >
>> > > My guess is that it's just a lot of code that has to run. :(
>> > 
>> > I think I haven't shared any numbers about ACPI.
>> > 
>> > I don't have details about where exactly the time is spent, but
>> > compiling a guest kernel without ACPI decreases the average boot time by
>> > ~12ms, and the kernel's unstripped ELF binary size goes down by a
>> > whopping ~300KiB.
>> 
>> At least the binary size is hardly surprising.
>> 
>> I'm guessing you built in lots of drivers.
>> 
>> It would be educational to try to enable ACPI core but disable all
>> optional features.

I just tried disabling everything that menuconfig allowed me to. Saves
~27KiB and doesn't improve boot time.

> Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.

I also tried enabling this one in my original config. It saves ~11.5KiB,
and has no impact on boot time either.

>> 
>> > On the other hand, removing ACPI from QEMU decreases its initialization
>> > time by ~5ms, and the binary size is ~183KiB smaller.
>> 
>> Yes - ACPI generation uses a ton of allocations and data copies.
>> 
>> Need to play with pre-allocation strategies. Maybe something
>> as simple as:
>> 
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index f3fdfefcd5..24becc069e 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -2629,8 +2629,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>>      acpi_get_pci_holes(&pci_hole, &pci_hole64);
>>      acpi_get_slic_oem(&slic_oem);
>>  
>> +#define DEFAULT_ARRAY_SIZE 16
>> -    table_offsets = g_array_new(false, true /* clear */,
>> -                                        sizeof(uint32_t));
>> +    table_offsets = g_array_sized_new(false, true /* clear */,
>> +                                      sizeof(uint32_t),
>> +                                      DEFAULT_ARRAY_SIZE);
>>      ACPI_BUILD_DPRINTF("init ACPI tables\n");
>>  
>>      bios_linker_loader_alloc(tables->linker,
>> 
>> will already help a bit.
>> 
>> > 
>> > IMHO, those are pretty relevant savings on both fronts.
>> > 
>> > >> Until we have this knowledge, the premise of microvm is unproven and
>> > >> merging it would be premature because maybe we can get into the same
>> > >> ballpark by optimizing existing code.
>> > >> 
>> > >> I'm sorry for being a pain.  I actually think the analysis will
>> > >> support microvm, but it still needs to be done in order to justify it.
>> > >
>> > > No, you're not a pain, you're explaining your reasoning and that helps.
>> > >
>> > > To me *maintainability is the biggest consideration* when introducing a
>> > > new feature.  "We can do just as well with q35" is a good reason to
>> > > deprecate and delete microvm, but not a good reason to reject it now as
>> > > long as microvm is good enough in terms of maintainability.  Keeping it
>> > > out of tree only makes it harder to do this kind of experiment.  virtio
>> > > 1 seems to be the biggest remaining blocker and I think it'd be a good
>> > > thing to have even for the ARM virt machine type.
>> > >
>> > > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
>> > > and ~25 ms in the kernel.  I must say that's pretty good, but it's still
>> > > 30% of the whole boot time and reducing it is the hardest part.  If
>> > > having microvm in tree can help reducing it, good.  Yes, it will get
>> > > users, but most likely they will have to support pc or q35 as a fallback
>> > > so we could still delete microvm at any time with the due deprecation
>> > > period if it turns out to be a failed experiment.
>> > >
>> > > Whether to use qboot or SeaBIOS for microvm is another story, but it's
>> > > an implementation detail as long as the ROM size doesn't change and/or
>> > > we don't do versioned machine types.  So we can switch from one to the
>> > > other at any time; we can also include qboot directly in QEMU's tree,
>> > > without going through a submodule, which also reduces the infrastructure
>> > > needed (mirrors, etc.) and makes it easier to delete it.
>> > >
>> > > Paolo
>> > >
>> > > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
>> > > last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
>> > > end up measured as PCI in SeaBIOS, due to different init order, so the
>> > > real firmware cost of PAM and PCI initialization should be 5ms for qboot
>> > > and 10ms for SeaBIOS.
>> > 
>> 
>> 


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 15:35                       ` Paolo Bonzini
@ 2019-07-25 17:33                         ` Michael S. Tsirkin
  2019-07-25 20:30                         ` Michael S. Tsirkin
  1 sibling, 0 replies; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25 17:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: ehabkost, Sergio Lopez, maran.wilson, Montes, Julio,
	Stefan Hajnoczi, qemu-devel, kraxel, rth, sgarzare

On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
> On 25/07/19 16:46, Michael S. Tsirkin wrote:
> > Actually, I think I have a better idea.
> > At the moment we just get an exit on these reads and return all-ones.
> > Yes, in theory there could be a UR bit set in a bunch of
> > registers but in practice no one cares about these,
> > and I don't think we implement them.
> > So how about mapping a single page, read-only, and filling it
> > with all-ones?
> 
> Yes, that's nice indeed. :)  But it does have some cost, in terms of
> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
> 
> What breaks if we return all zeroes?  Zero is not a valid vendor ID.
> 
> Paolo

It isn't, but that's not what bare metal does. So there's some risk
there ...

Why is all zeroes better? We still need to map it, right?

-- 
MST


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 15:39                               ` Paolo Bonzini
@ 2019-07-25 17:38                                 ` Michael S. Tsirkin
  2019-07-26 12:46                                   ` Igor Mammedov
  0 siblings, 1 reply; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25 17:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, Maran Wilson,
	Stefan Hajnoczi, QEMU Developers, Gerd Hoffmann,
	Stefano Garzarella, Richard Henderson

On Thu, Jul 25, 2019 at 05:39:39PM +0200, Paolo Bonzini wrote:
> On 25/07/19 17:01, Michael S. Tsirkin wrote:
> >> It would be educational to try to enable ACPI core but disable all
> >> optional features.
> 
> A lot of them are select'ed so it's not easy.
> 
> > Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.
> 
> That's what the NEMU guys experimented with.  It's not supported by our
> DSDT since it uses ACPI GPE,

Well, there are two GPE blocks in FADT. We could just switch to
these if necessary, I think.

> and the reduction in code size is small
> (about 15000 lines of code in ACPICA, perhaps 100k if you're lucky?).
> 
> Paolo

Well ACPI is 150k loc I think, right?

linux]$ wc -l `find drivers/acpi/ -name '*.c' `|tail -1
 145926 total

So 100k wouldn't be too shabby.

-- 
MST


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 15:35                       ` Paolo Bonzini
  2019-07-25 17:33                         ` Michael S. Tsirkin
@ 2019-07-25 20:30                         ` Michael S. Tsirkin
  2019-07-26  7:57                           ` Paolo Bonzini
  1 sibling, 1 reply; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-25 20:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: ehabkost, Sergio Lopez, maran.wilson, Montes, Julio,
	Stefan Hajnoczi, qemu-devel, kraxel, rth, sgarzare

On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
> On 25/07/19 16:46, Michael S. Tsirkin wrote:
> > Actually, I think I have a better idea.
> > At the moment we just get an exit on these reads and return all-ones.
> > Yes, in theory there could be a UR bit set in a bunch of
> > registers but in practice no one cares about these,
> > and I don't think we implement them.
> > So how about mapping a single page, read-only, and filling it
> > with all-ones?
> 
> Yes, that's nice indeed. :)  But it does have some cost, in terms of
> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
> 
> What breaks if we return all zeroes?  Zero is not a valid vendor ID.
> 
> Paolo

I think I know what you are thinking of doing:
map /dev/zero so we get a single VMA but all mapped to
a single zero pte?

We could start with that, at least as an experiment.
Further:

- we can limit the amount of fragmentation and simply
  unmap everything if we exceed a specific limit:
  with more than X devices it's no longer a lightweight
  VM anyway :)

- we can implement /dev/ones. in fact, we can implement
  /dev/byteXX for each possible value, the cost will
  be only 1M on a 4k page system.
  it might come in handy for e.g. free page hinting:
  at the moment if guest memory is poisoned
  we can not unmap it, with this trick we can
  map it to /dev/byteXX.

Note that the kvm memory array is still fragmented.
Again, we can fall back on disabling the optimization
if there are too many devices.
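
As a rough illustration of the /dev/zero experiment, here is a minimal
Python sketch (not QEMU code). It creates one private read-only mapping
over a 256 MiB range, a hypothetical stand-in for the MMCONFIG window,
and reads the first dword of every 4 KiB page. The window size and the
helper names are assumptions made for illustration. On Linux, read
faults on such a mapping are expected to be served from the shared zero
page, so RSS should stay small, but that is worth measuring rather than
taking for granted.

```python
import mmap
import os

PAGE_SIZE = 4096                  # assumed 4 KiB base pages
WINDOW_SIZE = 256 * 1024 * 1024   # hypothetical MMCONFIG-sized window (256 MiB)


def map_zero_window(size=WINDOW_SIZE):
    """Map `size` bytes of /dev/zero read-only as a single private mapping."""
    fd = os.open("/dev/zero", os.O_RDONLY)
    try:
        # One mmap() call creates one VMA covering the whole window.
        return mmap.mmap(fd, size, flags=mmap.MAP_PRIVATE, prot=mmap.PROT_READ)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def scan_first_dwords(window, page_size=PAGE_SIZE):
    """Read 4 bytes at the start of each page, as a config-space scan would."""
    return {window[off:off + 4] for off in range(0, len(window), page_size)}


if __name__ == "__main__":
    window = map_zero_window()
    print(scan_first_dwords(window))  # every page reads back as zeroes
    window.close()
```

Note that this only shows the host-side mapping. The KVM memory slots
still need to cover the same range, which is where the fragmentation
concern comes from.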


-- 
MST



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 20:30                         ` Michael S. Tsirkin
@ 2019-07-26  7:57                           ` Paolo Bonzini
  2019-07-26 11:10                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 68+ messages in thread
From: Paolo Bonzini @ 2019-07-26  7:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: ehabkost, Sergio Lopez, maran.wilson, Montes, Julio,
	Stefan Hajnoczi, qemu-devel, kraxel, rth, sgarzare

On 25/07/19 22:30, Michael S. Tsirkin wrote:
> On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
>> On 25/07/19 16:46, Michael S. Tsirkin wrote:
>>> Actually, I think I have a better idea.
>>> At the moment we just get an exit on these reads and return all-ones.
>>> Yes, in theory there could be a UR bit set in a bunch of
>>> registers but in practice no one cares about these,
>>> and I don't think we implement them.
>>> So how about mapping a single page, read-only, and filling it
>>> with all-ones?
>>
>> Yes, that's nice indeed. :)  But it does have some cost, in terms of
>> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
>>
>> What breaks if we return all zeroes?  Zero is not a valid vendor ID.
>>
>> Paolo
> 
> I think I know what you are thinking of doing:
> map /dev/zero so we get a single VMA but all mapped to
> a single zero pte?

Yes, exactly.  You absolutely need to share the page because the guest
could easily touch 32*256 pages just to scan function 0 on every bus and
device, even if the VM has just 4 or 5 devices and all of them on the
root complex.  And that causes fragmentation so you have to map bigger
areas.
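
To put numbers on the 32*256 figure, here is a small Python sketch of
the arithmetic. It assumes the standard ECAM layout (bus in bits 20 and
up, device in bits 15-19, function in bits 12-14, one 4 KiB page per
function); the helper names are made up for illustration.

```python
PAGE_SIZE = 4096          # one 4 KiB config-space page per PCI function
BUSES, DEVICES, FUNCTIONS = 256, 32, 8


def ecam_offset(bus, dev, fn=0):
    """Offset of a function's config space inside the MMCONFIG (ECAM) window."""
    return (bus << 20) | (dev << 15) | (fn << 12)


def function0_scan_offsets():
    """Offsets a guest touches when probing function 0 of every bus/device."""
    return [ecam_offset(b, d) for b in range(BUSES) for d in range(DEVICES)]


offsets = function0_scan_offsets()
pages_touched = len({off // PAGE_SIZE for off in offsets})
window_size = BUSES * DEVICES * FUNCTIONS * PAGE_SIZE

print(pages_touched)                       # 8192 distinct pages (32 * 256)
print(pages_touched * PAGE_SIZE // 2**20)  # 32 MiB if each page were private
print(window_size // 2**20)                # 256 MiB for the full window
```

The touched pages are 32 KiB apart, which is why mapping only what the
guest reads would leave a very fragmented layout.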

> - we can implement /dev/ones. in fact, we can implement
>   /dev/byteXX for each possible value, the cost will
>   be only 1M on a 4k page system.
>   it might come in handy for e.g. free page hinting:
>   at the moment if guest memory is poisoned
>   we can not unmap it, with this trick we can
>   map it to /dev/byteXX.

I also thought of /dev/ones, not sure how it would be accepted. :)  Also
you cannot map lazily on page fault, otherwise you get a vmexit and it's
slow again.  So /dev/ones needs to be written to use a huge page, possibly.
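
For a feel of the memory cost, a quick Python sketch of the arithmetic
follows. The 2 MiB huge page size is an assumption (the usual x86-64
value), and the function name is hypothetical.

```python
KIB, MIB = 1024, 1024 * 1024


def pattern_pages_cost(page_size, values=256):
    """Bytes needed for one pre-filled page per possible byte value."""
    return values * page_size


print(pattern_pages_cost(4 * KIB) // MIB)            # 1 MiB with 4 KiB pages
print(pattern_pages_cost(2 * MIB) // MIB)            # 512 MiB with 2 MiB huge pages
print(pattern_pages_cost(2 * MIB, values=1) // MIB)  # 2 MiB for /dev/ones alone
```

So a huge-page /dev/ones alone stays cheap, while a huge-page variant of
every /dev/byteXX would not.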

Paolo



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-26  7:57                           ` Paolo Bonzini
@ 2019-07-26 11:10                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 68+ messages in thread
From: Michael S. Tsirkin @ 2019-07-26 11:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: ehabkost, Sergio Lopez, maran.wilson, Montes, Julio,
	Stefan Hajnoczi, qemu-devel, kraxel, rth, sgarzare

On Fri, Jul 26, 2019 at 09:57:51AM +0200, Paolo Bonzini wrote:
> On 25/07/19 22:30, Michael S. Tsirkin wrote:
> > On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
> >> On 25/07/19 16:46, Michael S. Tsirkin wrote:
> >>> Actually, I think I have a better idea.
> >>> At the moment we just get an exit on these reads and return all-ones.
> >>> Yes, in theory there could be a UR bit set in a bunch of
> >>> registers but in practice no one cares about these,
> >>> and I don't think we implement them.
> >>> So how about mapping a single page, read-only, and filling it
> >>> with all-ones?
> >>
> >> Yes, that's nice indeed. :)  But it does have some cost, in terms of
> >> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
> >>
> >> What breaks if we return all zeroes?  Zero is not a valid vendor ID.
> >>
> >> Paolo
> > 
> > I think I know what you are thinking of doing:
> > map /dev/zero so we get a single VMA but all mapped to
> > a single zero pte?
> 
> Yes, exactly.  You absolutely need to share the page because the guest
> could easily touch 32*256 pages just to scan function 0 on every bus and
> device, even if the VM has just 4 or 5 devices and all of them on the
> root complex.  And that causes fragmentation so you have to map bigger
> areas.
> 
> > - we can implement /dev/ones. in fact, we can implement
> >   /dev/byteXX for each possible value, the cost will
> >   be only 1M on a 4k page system.
> >   it might come in handy for e.g. free page hinting:
> >   at the moment if guest memory is poisoned
> >   we can not unmap it, with this trick we can
> >   map it to /dev/byteXX.
> 
> I also thought of /dev/ones, not sure how it would be accepted. :)  Also
> you cannot map lazily on page fault, otherwise you get a vmexit and it's
> slow again.  So /dev/ones needs to be written to use a huge page, possibly.
> 
> Paolo

It's not easy to do that - each device gets 4K within MCFG.

So what we need then is a kvm option to create an address range - or
maybe even a group of address ranges and aggressively map all pages in a
group to the same guest page on a fault of one page in the group.
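
A toy Python model of that idea, just to show the intended effect on the
number of faults. This is not a KVM interface; the class, its fields and
the group layout are all hypothetical.

```python
class GroupedMapping:
    """Toy model: a fault on one page maps every page of its group at once."""

    def __init__(self, groups):
        # groups: iterable of iterables of guest page numbers
        self.group_of = {}
        self.members = {}
        for gid, pages in enumerate(groups):
            self.members[gid] = list(pages)
            for page in self.members[gid]:
                self.group_of[page] = gid
        self.mapped = set()   # guest pages that currently have a mapping
        self.faults = 0       # stand-in for the number of vmexits taken

    def read(self, page):
        """Simulate a guest read; returns the shared all-ones dword."""
        if page not in self.mapped:
            self.faults += 1
            # Aggressively map the whole group to the same backing page.
            self.mapped.update(self.members[self.group_of[page]])
        return 0xFFFFFFFF


# Hypothetical layout: 8192 empty function-0 slots, one 4 KiB page each.
empty_slots = range(8192)
mapping = GroupedMapping([empty_slots])
for page in empty_slots:
    mapping.read(page)
print(mapping.faults)  # 1 fault instead of 8192
```

With one group per contiguous run of empty slots, the fault count scales
with the number of groups rather than the number of pages scanned.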

-- 
MST



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-25 17:38                                 ` Michael S. Tsirkin
@ 2019-07-26 12:46                                   ` Igor Mammedov
  0 siblings, 0 replies; 68+ messages in thread
From: Igor Mammedov @ 2019-07-26 12:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Eduardo Habkost, Sergio Lopez, Maran Wilson,
	Stefan Hajnoczi, QEMU Developers, Gerd Hoffmann, Paolo Bonzini,
	Richard Henderson, Stefano Garzarella

On Thu, 25 Jul 2019 13:38:48 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Jul 25, 2019 at 05:39:39PM +0200, Paolo Bonzini wrote:
> > On 25/07/19 17:01, Michael S. Tsirkin wrote:  
> > >> It would be educational to try to enable ACPI core but disable all
> > >> optional features.  
> > 
> > A lot of them are select'ed so it's not easy.
> >   
> > > Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.  
> > 
> > That's what the NEMU guys experimented with.  It's not supported by our
> > DSDT since it uses ACPI GPE,  
> 
> Well there are two GPE blocks in FADT. We could just switch to
> these if necessary I think.

If it's a simplistic VM, we could build a dedicated DSDT (or a whole set of tables)
for it and use the hardware-reduced profile like the arm-virt machine does (just a
newer version of FADT with the needed flags set). That would probably cut the ACPI
cost on the QEMU side.

> > and the reduction in code size is small
> > (about 15000 lines of code in ACPICA, perhaps 100k if you're lucky?).
> > 
> > Paolo  
> 
> Well ACPI is 150k loc I think, right?
> 
> linux]$ wc -l `find drivers/acpi/ -name '*.c' `|tail -1
>  145926 total
> 
> So 100k wouldn't be too shabby.
> 




* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-07-02 12:11 [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type Sergio Lopez
                   ` (7 preceding siblings ...)
  2019-07-03  9:58 ` Stefan Hajnoczi
@ 2019-08-29  9:02 ` Jing Liu
  2019-08-29 15:46   ` Sergio Lopez
  8 siblings, 1 reply; 68+ messages in thread
From: Jing Liu @ 2019-08-29  9:02 UTC (permalink / raw)
  To: Sergio Lopez, mst, marcel.apfelbaum, pbonzini, rth, ehabkost,
	maran.wilson, sgarzare, kraxel
  Cc: qemu-devel

Hi Sergio,

The idea is interesting. I tried to launch a guest following your
guide, but it seems to fail for me. I tried both legacy and normal modes,
but vncviewer connected and told me:
The vm has no graphic display device.
The whole VNC screen is just black.

kernel config:
CONFIG_KVM_MMIO=y
CONFIG_VIRTIO_MMIO=y

I don't know whether a specific kernel version/patch/config
is needed, or whether I missed anything.
Could you kindly give me some tips?

Thanks very much.
Jing



> A QEMU instance with the microvm machine type can be invoked this way:
> 
>   - Normal mode:
> 
> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>   -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>   -nodefaults -no-user-config \
>   -chardev pty,id=virtiocon0,server \
>   -device virtio-serial-device \
>   -device virtconsole,chardev=virtiocon0 \
>   -drive id=test,file=test.img,format=raw,if=none \
>   -device virtio-blk-device,drive=test \
>   -netdev tap,id=tap0,script=no,downscript=no \
>   -device virtio-net-device,netdev=tap0
> 
>   - Legacy mode:
> 
> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>   -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>   -nodefaults -no-user-config \
>   -drive id=test,file=test.img,format=raw,if=none \
>   -device virtio-blk-device,drive=test \
>   -netdev tap,id=tap0,script=no,downscript=no \
>   -device virtio-net-device,netdev=tap0 \
>   -serial stdio
> 



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-08-29  9:02 ` Jing Liu
@ 2019-08-29 15:46   ` Sergio Lopez
  2019-08-30  4:53     ` Jing Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Sergio Lopez @ 2019-08-29 15:46 UTC (permalink / raw)
  To: Jing Liu
  Cc: ehabkost, maran.wilson, mst, qemu-devel, kraxel, pbonzini, sgarzare, rth



Jing Liu <jing2.liu@linux.intel.com> writes:

> Hi Sergio,
>
> The idea is interesting and I tried to launch a guest by your
> guide but seems failed to me. I tried both legacy and normal modes,
> but the vncviewer connected and told me that:
> The vm has no graphic display device.
> All the screen in vnc is just black.

The microvm machine type doesn't support any graphics device, so you
need to rely on the serial console.

> kernel config:
> CONFIG_KVM_MMIO=y
> CONFIG_VIRTIO_MMIO=y
>
> I don't know if any specified kernel version/patch/config
> is needed or anything I missed.
> Could you kindly give some tips?

I'm testing it with upstream vanilla Linux. In addition to MMIO, you
need to add support for PVH (the next version of this patchset, v4, will
support booting from FW, so it'll be possible to use non-PVH ELF kernels
and bzImages too).
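
If it helps, here is a small Python sketch for checking a kernel .config
against a list of options. The checklist itself is an assumption on my
side: only CONFIG_VIRTIO_MMIO and the need for PVH support come from
this thread, the remaining names are the upstream Kconfig symbols I
would expect to matter for this kind of guest, and the function name is
hypothetical.

```python
import re

# Assumed checklist; adjust it to the devices you actually use.
WANTED = [
    "CONFIG_PVH",
    "CONFIG_VIRTIO_MMIO",
    "CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES",
    "CONFIG_VIRTIO_BLK",
    "CONFIG_VIRTIO_CONSOLE",
    "CONFIG_SERIAL_8250_CONSOLE",
]


def missing_options(config_text, wanted=WANTED):
    """Return the options from `wanted` that are not built in (=y)."""
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))
    return [opt for opt in wanted if opt not in enabled]


sample = """\
CONFIG_KVM_MMIO=y
CONFIG_VIRTIO_MMIO=y
# CONFIG_PVH is not set
"""
print(missing_options(sample))
```

Options built as modules (=m) are reported as missing on purpose, since
there is no initramfs in these examples to load them from.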

I've just uploaded a working kernel config here:

https://gist.github.com/slp/1060ba3aaf708584572ad4109f28c8f9

As for the QEMU command line, something like this should do the trick:

./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm \
  -M microvm,legacy \
  -kernel vmlinux \
  -append "earlyprintk=ttyS0 console=ttyS0 reboot=k panic=1" \
  -nodefaults -no-user-config -nographic \
  -serial stdio

If this works, you can move to non-legacy mode with a virtio-console:

./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm \
  -M microvm \
  -kernel vmlinux -append "console=hvc0 reboot=k panic=1" \
  -nodefaults -no-user-config -nographic \
  -serial pty \
  -chardev stdio,id=virtiocon0,server \
  -device virtio-serial-device \
  -device virtconsole,chardev=virtiocon0

If it's still working, you can try adding some devices too:

./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm \
  -M microvm \
  -kernel vmlinux -append "console=hvc0 reboot=k panic=1 root=/dev/vda" \
  -nodefaults -no-user-config -nographic \
  -serial pty \
  -chardev stdio,id=virtiocon0,server \
  -device virtio-serial-device \
  -device virtconsole,chardev=virtiocon0 \
  -netdev user,id=testnet \
  -device virtio-net-device,netdev=testnet \
  -drive id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none \
  -device virtio-blk-device,drive=test

Sergio.

> Thanks very much.
> Jing
>
>
>
>> A QEMU instance with the microvm machine type can be invoked this way:
>>
>>   - Normal mode:
>>
>> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>>   -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>>   -nodefaults -no-user-config \
>>   -chardev pty,id=virtiocon0,server \
>>   -device virtio-serial-device \
>>   -device virtconsole,chardev=virtiocon0 \
>>   -drive id=test,file=test.img,format=raw,if=none \
>>   -device virtio-blk-device,drive=test \
>>   -netdev tap,id=tap0,script=no,downscript=no \
>>   -device virtio-net-device,netdev=tap0
>>
>>   - Legacy mode:
>>
>> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>>   -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>>   -nodefaults -no-user-config \
>>   -drive id=test,file=test.img,format=raw,if=none \
>>   -device virtio-blk-device,drive=test \
>>   -netdev tap,id=tap0,script=no,downscript=no \
>>   -device virtio-net-device,netdev=tap0 \
>>   -serial stdio
>>




* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-08-29 15:46   ` Sergio Lopez
@ 2019-08-30  4:53     ` Jing Liu
  2019-08-30 14:27       ` Sergio Lopez
  0 siblings, 1 reply; 68+ messages in thread
From: Jing Liu @ 2019-08-30  4:53 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: ehabkost, maran.wilson, mst, qemu-devel, kraxel, pbonzini, sgarzare, rth

Hi Sergio,

On 8/29/2019 11:46 PM, Sergio Lopez wrote:
> 
> Jing Liu <jing2.liu@linux.intel.com> writes:
> 
>> Hi Sergio,
>>
>> The idea is interesting and I tried to launch a guest by your
>> guide but seems failed to me. I tried both legacy and normal modes,
>> but the vncviewer connected and told me that:
>> The vm has no graphic display device.
>> All the screen in vnc is just black.
> 
> The microvm machine type doesn't support any graphics device, so you
> need to rely on the serial console.
Got it.

> 
>> kernel config:
>> CONFIG_KVM_MMIO=y
>> CONFIG_VIRTIO_MMIO=y
>>
>> I don't know if any specified kernel version/patch/config
>> is needed or anything I missed.
>> Could you kindly give some tips?
> 
> I'm testing it with upstream vanilla Linux. In addition to MMIO, you
> need to add support for PVH (the next version of this patchset, v4, will
> support booting from FW, so it'll be possible to use non-PVH ELF kernels
> and bzImages too).
> 
> I've just uploaded a working kernel config here:
> 
> https://gist.github.com/slp/1060ba3aaf708584572ad4109f28c8f9
> 
Thanks very much and this config is helpful to me.

> As for the QEMU command line, something like this should do the trick:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm,legacy -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial stdio
> 
> If this works, you can move to non-legacy mode with a virtio-console:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0
> 
I tried the above two ways and it works now. Thanks!

> If it's still working, you can try adding some devices too:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1 root=/dev/vda" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -netdev user,id=testnet -device virtio-net-device,netdev=testnet -drive id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none -device virtio-blk-device,drive=test
> 
But I'm wondering why the image I used cannot be found. With
root=/dev/vda3, the same image works well with a normal QEMU/guest-config
boot, but it doesn't work here. The details are:

-append "console=hvc0 reboot=k panic=1 root=/dev/vda3 rw rootfstype=ext4" \

[    0.022784] Key type encrypted registered
[    0.022988] VFS: Cannot open root device "vda3" or unknown-block(254,3): error -6
[    0.023041] Please append a correct "root=" boot option; here are the available partitions:
[    0.023089] fe00         8946688 vda
[    0.023090]  driver: virtio_blk
[    0.023143] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,3)
[    0.023201] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc3 #23


BTW, I also tried root=/dev/vda and it didn't work either. The dmesg
output is a little different:

[    0.028050] Key type encrypted registered
[    0.028484] List of all partitions:
[    0.028529] fe00         8946688 vda
[    0.028529]  driver: virtio_blk
[    0.028615] No filesystem could mount root, tried:
[    0.028616]  ext4
[    0.028670]
[    0.028712] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,0)

I tried another ext4 image, but it still doesn't work.
Is there any limitation on the block image? Could I get a copy of your image
for a simple test?

Thanks in advance,
Jing

> Sergio.
> 
>> Thanks very much.
>> Jing
>>
>>
>>
>>> A QEMU instance with the microvm machine type can be invoked this way:
>>>
>>>    - Normal mode:
>>>
>>> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>>>    -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>>>    -nodefaults -no-user-config \
>>>    -chardev pty,id=virtiocon0,server \
>>>    -device virtio-serial-device \
>>>    -device virtconsole,chardev=virtiocon0 \
>>>    -drive id=test,file=test.img,format=raw,if=none \
>>>    -device virtio-blk-device,drive=test \
>>>    -netdev tap,id=tap0,script=no,downscript=no \
>>>    -device virtio-net-device,netdev=tap0
>>>
>>>    - Legacy mode:
>>>
>>> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>>>    -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>>>    -nodefaults -no-user-config \
>>>    -drive id=test,file=test.img,format=raw,if=none \
>>>    -device virtio-blk-device,drive=test \
>>>    -netdev tap,id=tap0,script=no,downscript=no \
>>>    -device virtio-net-device,netdev=tap0 \
>>>    -serial stdio
>>>
> 



* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-08-30  4:53     ` Jing Liu
@ 2019-08-30 14:27       ` Sergio Lopez
  2019-09-02  5:43         ` Jing Liu
  0 siblings, 1 reply; 68+ messages in thread
From: Sergio Lopez @ 2019-08-30 14:27 UTC (permalink / raw)
  To: Jing Liu
  Cc: ehabkost, maran.wilson, mst, qemu-devel, kraxel, pbonzini, sgarzare, rth



Jing Liu <jing2.liu@linux.intel.com> writes:

> Hi Sergio,
>
> On 8/29/2019 11:46 PM, Sergio Lopez wrote:
>>
>> Jing Liu <jing2.liu@linux.intel.com> writes:
>>
>>> Hi Sergio,
>>>
>>> The idea is interesting and I tried to launch a guest by your
>>> guide but seems failed to me. I tried both legacy and normal modes,
>>> but the vncviewer connected and told me that:
>>> The vm has no graphic display device.
>>> All the screen in vnc is just black.
>>
>> The microvm machine type doesn't support any graphics device, so you
>> need to rely on the serial console.
> Got it.
>
>>
>>> kernel config:
>>> CONFIG_KVM_MMIO=y
>>> CONFIG_VIRTIO_MMIO=y
>>>
>>> I don't know if any specified kernel version/patch/config
>>> is needed or anything I missed.
>>> Could you kindly give some tips?
>>
>> I'm testing it with upstream vanilla Linux. In addition to MMIO, you
>> need to add support for PVH (the next version of this patchset, v4, will
>> support booting from FW, so it'll be possible to use non-PVH ELF kernels
>> and bzImages too).
>>
>> I've just uploaded a working kernel config here:
>>
>> https://gist.github.com/slp/1060ba3aaf708584572ad4109f28c8f9
>>
> Thanks very much and this config is helpful to me.
>
>> As for the QEMU command line, something like this should do the trick:
>>
>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm,legacy -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial stdio
>>
>> If this works, you can move to non-legacy mode with a virtio-console:
>>
>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0
>>
> I tried the above two ways and it works now. Thanks!
>
>> If it's still working, you can try adding some devices too:
>>
>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1 root=/dev/vda" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -netdev user,id=testnet -device virtio-net-device,netdev=testnet -drive id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none -device virtio-blk-device,drive=test
>>
> But I'm wondering why the image I used can not be found.
> root=/dev/vda3 and the same image worked well on normal qemu/guest-
> config bootup, but didn't work here. The details are,
>
> -append "console=hvc0 reboot=k panic=1 root=/dev/vda3 rw rootfstype=ext4" \
>
> [    0.022784] Key type encrypted registered
> [    0.022988] VFS: Cannot open root device "vda3" or
> unknown-block(254,3): error -6
> [    0.023041] Please append a correct "root=" boot option; here are
> the available partitions:
> [    0.023089] fe00         8946688 vda
> [    0.023090]  driver: virtio_blk
> [    0.023143] Kernel panic - not syncing: VFS: Unable to mount root
> fs on unknown-block(254,3)
> [    0.023201] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc3 #23
>
>
> BTW, root=/dev/vda is also tried and didn't work. The dmesg is a
> little different:
>
> [    0.028050] Key type encrypted registered
> [    0.028484] List of all partitions:
> [    0.028529] fe00         8946688 vda
> [    0.028529]  driver: virtio_blk
> [    0.028615] No filesystem could mount root, tried:
> [    0.028616]  ext4
> [    0.028670]
> [    0.028712] Kernel panic - not syncing: VFS: Unable to mount root
> fs on unknown-block(254,0)
>
> I tried another ext4 img but still doesn't work.
> Is there any limitation of blk image? Could I copy your image for simple
> test?

The kernel config I posted lacks support for DOS partitions. Adding
CONFIG_MSDOS_PARTITION=y should allow you to boot from /dev/vda3.
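
To tell from the host whether a raw image carries a DOS (MBR) partition
table at all, something like the following Python sketch can be used.
It only looks at the classic MBR layout (signature 0x55 0xAA at offset
510, four 16-byte entries starting at offset 446) and knows nothing
about GPT; the function names are hypothetical.

```python
import struct

SECTOR_SIZE = 512


def mbr_partitions(first_sector):
    """Return (index, type, start_lba, sectors) for each used MBR entry."""
    if len(first_sector) < SECTOR_SIZE or first_sector[510:512] != b"\x55\xaa":
        return []  # no MBR signature: probably a bare filesystem image
    entries = []
    for i in range(4):
        raw = first_sector[446 + 16 * i:446 + 16 * (i + 1)]
        # Layout: status, CHS start (3 bytes), type, CHS end (3 bytes),
        # start LBA (u32 LE), sector count (u32 LE).
        _status, ptype, start_lba, sectors = struct.unpack("<B3xB3xII", raw)
        if ptype != 0:
            entries.append((i + 1, ptype, start_lba, sectors))
    return entries


def image_partitions(path):
    """Read the first sector of a raw disk image and list its MBR partitions."""
    with open(path, "rb") as image:
        return mbr_partitions(image.read(SECTOR_SIZE))
```

If image_partitions() lists an entry with index 3 but the guest only
shows "vda" in its partition list, the missing piece is most likely on
the kernel side, as described above.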

Anyway, in case you also want to try booting from /dev/vda (without
partitions), this is the recipe I use to quickly create a minimal rootfs
image:

# wget http://dl-cdn.alpinelinux.org/alpine/v3.10/releases/x86_64/alpine-minirootfs-3.10.2-x86_64.tar.gz
# qemu-img create -f raw alpine-rootfs-x86_64.raw 1G
# sudo losetup /dev/loop0 alpine-rootfs-x86_64.raw
# sudo mkfs.ext4 /dev/loop0
# sudo mount /dev/loop0 /mnt
# sudo tar xpf alpine-minirootfs-3.10.2-x86_64.tar.gz -C /mnt
# sudo umount /mnt
# sudo losetup -d /dev/loop0

The rootfs will be missing openrc, so you'll need to add "init=/bin/sh"
to the command line.

Sergio.

> Thanks in advance,
> Jing
>
>> Sergio.
>>
>>> Thanks very much.
>>> Jing
>>>
>>>
>>>
>>>> A QEMU instance with the microvm machine type can be invoked this way:
>>>>
>>>>    - Normal mode:
>>>>
>>>> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>>>>    -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>>>>    -nodefaults -no-user-config \
>>>>    -chardev pty,id=virtiocon0,server \
>>>>    -device virtio-serial-device \
>>>>    -device virtconsole,chardev=virtiocon0 \
>>>>    -drive id=test,file=test.img,format=raw,if=none \
>>>>    -device virtio-blk-device,drive=test \
>>>>    -netdev tap,id=tap0,script=no,downscript=no \
>>>>    -device virtio-net-device,netdev=tap0
>>>>
>>>>    - Legacy mode:
>>>>
>>>> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>>>>    -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>>>>    -nodefaults -no-user-config \
>>>>    -drive id=test,file=test.img,format=raw,if=none \
>>>>    -device virtio-blk-device,drive=test \
>>>>    -netdev tap,id=tap0,script=no,downscript=no \
>>>>    -device virtio-net-device,netdev=tap0 \
>>>>    -serial stdio
>>>>
>>




* Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
  2019-08-30 14:27       ` Sergio Lopez
@ 2019-09-02  5:43         ` Jing Liu
  0 siblings, 0 replies; 68+ messages in thread
From: Jing Liu @ 2019-09-02  5:43 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: ehabkost, mst, maran.wilson, qemu-devel, kraxel, pbonzini, rth, sgarzare



On 8/30/2019 10:27 PM, Sergio Lopez wrote:
> 
> Jing Liu <jing2.liu@linux.intel.com> writes:
> 
>> Hi Sergio,
>>
>> On 8/29/2019 11:46 PM, Sergio Lopez wrote:
>>>
>>> Jing Liu <jing2.liu@linux.intel.com> writes:
>>>
>>>> Hi Sergio,
>>>>
>>>> The idea is interesting and I tried to launch a guest by your
>>>> guide but seems failed to me. I tried both legacy and normal modes,
>>>> but the vncviewer connected and told me that:
>>>> The vm has no graphic display device.
>>>> All the screen in vnc is just black.
>>>
>>> The microvm machine type doesn't support any graphics device, so you
>>> need to rely on the serial console.
>> Got it.
>>
>>>
>>>> kernel config:
>>>> CONFIG_KVM_MMIO=y
>>>> CONFIG_VIRTIO_MMIO=y
>>>>
>>>> I don't know if any specified kernel version/patch/config
>>>> is needed or anything I missed.
>>>> Could you kindly give some tips?
>>>
>>> I'm testing it with upstream vanilla Linux. In addition to MMIO, you
>>> need to add support for PVH (the next version of this patchset, v4, will
>>> support booting from FW, so it'll be possible to use non-PVH ELF kernels
>>> and bzImages too).
>>>
>>> I've just uploaded a working kernel config here:
>>>
>>> https://gist.github.com/slp/1060ba3aaf708584572ad4109f28c8f9
>>>
>> Thanks very much and this config is helpful to me.
>>
>>> As for the QEMU command line, something like this should do the trick:
>>>
>>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm,legacy -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial stdio
>>>
>>> If this works, you can move to non-legacy mode with a virtio-console:
>>>
>>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0
>>>
>> I tried the above two ways and it works now. Thanks!
>>
>>> If it's still working, you can try adding some devices too:
>>>
>>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1 root=/dev/vda" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -netdev user,id=testnet -device virtio-net-device,netdev=testnet -drive id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none -device virtio-blk-device,drive=test
>>>
>> But I'm wondering why the image I used can not be found.
>> root=/dev/vda3 and the same image worked well on normal qemu/guest-
>> config bootup, but didn't work here. The details are,
>>
>> -append "console=hvc0 reboot=k panic=1 root=/dev/vda3 rw rootfstype=ext4" \
>>
>> [    0.022784] Key type encrypted registered
>> [    0.022988] VFS: Cannot open root device "vda3" or
>> unknown-block(254,3): error -6
>> [    0.023041] Please append a correct "root=" boot option; here are
>> the available partitions:
>> [    0.023089] fe00         8946688 vda
>> [    0.023090]  driver: virtio_blk
>> [    0.023143] Kernel panic - not syncing: VFS: Unable to mount root
>> fs on unknown-block(254,3)
>> [    0.023201] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc3 #23
>>
>>
>> BTW, root=/dev/vda is also tried and didn't work. The dmesg is a
>> little different:
>>
>> [    0.028050] Key type encrypted registered
>> [    0.028484] List of all partitions:
>> [    0.028529] fe00         8946688 vda
>> [    0.028529]  driver: virtio_blk
>> [    0.028615] No filesystem could mount root, tried:
>> [    0.028616]  ext4
>> [    0.028670]
>> [    0.028712] Kernel panic - not syncing: VFS: Unable to mount root
>> fs on unknown-block(254,0)
>>
>> I tried another ext4 img but still doesn't work.
>> Is there any limitation of blk image? Could I copy your image for simple
>> test?
> 
> The kernel config I posted lacks support for DOS partitions. Adding
> CONFIG_MSDOS_PARTITION=y should allow you to boot from /dev/vda3.
> 
> Anyway, in case you also want to try booting from /dev/vda (without
> partitions), this is the recipe I use to quickly create a minimal rootfs
> image:
> 
> # wget http://dl-cdn.alpinelinux.org/alpine/v3.10/releases/x86_64/alpine-minirootfs-3.10.2-x86_64.tar.gz
> # qemu-img create -f raw alpine-rootfs-x86_64.raw 1G
> # sudo losetup /dev/loop0 alpine-rootfs-x86_64.raw
> # sudo mkfs.ext4 /dev/loop0
> # sudo mount /dev/loop0 /mnt
> # sudo tar xpf alpine-minirootfs-3.10.2-x86_64.tar.gz -C /mnt
> # sudo umount /mnt
> # sudo losetup -d /dev/loop0
> 
> The rootfs will be missing openrc, so you'll need to add "init=/bin/sh"
> to the command line.
> 

Thank you Sergio. I'll try that.

Jing
> Sergio.
> 
>> Thanks in advance,
>> Jing
>>
>>> Sergio.


