* [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
@ 2016-06-17  8:14 Chao Peng
  2016-06-17  8:14 ` [Qemu-devel] [RFC 1/9] acpi: introduce light weight ACPI PM emulation pm-lite Chao Peng
                   ` (11 more replies)
  0 siblings, 12 replies; 32+ messages in thread
From: Chao Peng @ 2016-06-17  8:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang

This patchset is against commit 585fcd4 (Merge remote-tracking branch
'remotes/bonzini/tags/for-upstream' into staging) on the master branch. I
have also put it on GitHub:

https://github.com/chao-p/qemu pc-lite-v1
 
Although we have run the patchset internally for a while, it is still
considered an RFC. Any comments (on coding style or design issues) are
welcome.

Introduction
============
The patch series introduces a new platform, pc-lite, which is designed to
be a virtual, light weight x86 PC platform. It is not designed to be
compatible with old hardware and systems. Instead, it removes the burden
of legacy devices and emulates new, fast hardware as much as possible. It
is expected to be used together with an optimized guest (though an
unoptimized guest works as well) to gain the fast boot and small
footprint that are difficult to achieve on a traditional
hardware-emulated platform.

Basically:
- it removes old ISA devices and supports only PCI devices;
- it removes the 8259 and uses MSI as much as possible instead. The IOAPIC
  and PCI pin interrupts are still kept to support the ACPI SCI;
- it supports PCIe (you can use MMCFG instead of 0xcf8/0xcfc port
  access);
- it gets rid of legacy firmware interfaces and supports ACPI tables;
- it loads the guest kernel directly: no BIOS, no bootloader, no real-mode
  code;
- it supports CPU/memory/PCI hotplug;
- it is FAST;
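
To illustrate the PCIe item above, here is a minimal sketch (not taken
from this series) of how the same configuration register is addressed
through the legacy 0xcf8/0xcfc port pair and through MMCFG (ECAM). The
helper names cf8_address() and ecam_offset() are hypothetical; the bit
layouts are the standard PCI configuration mechanism #1 and PCIe ECAM
encodings.

```c
#include <stdint.h>

/* Hypothetical helper: value written to port 0xcf8 before accessing the
 * data window at 0xcfc (PCI configuration mechanism #1). Only the first
 * 256 bytes of a function's configuration space are reachable this way. */
static inline uint32_t cf8_address(uint8_t bus, uint8_t dev, uint8_t fn,
                                   uint8_t reg)
{
    return 0x80000000u                          /* enable bit */
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1f) << 11)
         | ((uint32_t)(fn & 0x07) << 8)
         | (reg & 0xfcu);                       /* dword-aligned register */
}

/* Hypothetical helper: byte offset of a register inside the MMCFG window.
 * Each function owns a 4 KiB slot, so the extended configuration space
 * (registers 0x100-0xfff) is reachable with a single memory access. */
static inline uint64_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn,
                                   uint16_t reg)
{
    return ((uint64_t)bus << 20)
         | ((uint64_t)(dev & 0x1f) << 15)
         | ((uint64_t)(fn & 0x07) << 12)
         | (reg & 0xfffu);
}
```

With MMCFG a guest adds ecam_offset() to the window base and performs one
memory access, instead of an address write to 0xcf8 followed by a data
access at 0xcfc.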

However:
- it supports only KVM hosts at present;
- it supports only Linux guests at present;
- you may need to configure the guest kernel carefully;
- you are forced to use virtio-serial-pci; the old 8250/16550 is not there;

Want to have a try?
===================

Please follow the instructions at https://github.com/chao-p/qemu-lite-tools.

Thanks,
Chao

Chao Peng (6):
  acpi: introduce light weight ACPI PM emulation pm-lite
  pci: introduce light weight PCIE Host emulation pci-lite
  acpi: add support for pc-lite platform
  pc: skip setting CMOS data when RTC device is unavailable
  pc: support direct loading protected/long mode kernel
  pc: introduce light weight PC board pc-lite

Haozhong Zhang (3):
  acpi: expose data structures and functions of BIOS linker loader
  acpi: expose acpi_checksum()
  acpi: patch guest ACPI for pc-lite

 default-configs/i386-softmmu.mak     |   1 +
 default-configs/x86_64-softmmu.mak   |   1 +
 docs/specs/acpi_cpu_hotplug.txt      |   1 +
 hw/acpi/Makefile.objs                |   2 +-
 hw/acpi/bios-linker-loader.c         |  83 +------
 hw/acpi/core.c                       |   2 +-
 hw/acpi/nvdimm.c                     |   6 +-
 hw/acpi/pm_lite.c                    | 446 +++++++++++++++++++++++++++++++++++
 hw/i386/Makefile.objs                |   2 +-
 hw/i386/acpi-build.c                 | 180 +++++++++-----
 hw/i386/pc.c                         | 263 ++++++++++++++++++---
 hw/i386/pc_lite.c                    | 205 ++++++++++++++++
 hw/i386/pc_lite_acpi.c               | 299 +++++++++++++++++++++++
 hw/i386/pc_piix.c                    |   2 +
 hw/i386/pc_q35.c                     |   2 +
 hw/pci-host/Makefile.objs            |   1 +
 hw/pci-host/pci_lite.c               | 259 ++++++++++++++++++++
 include/hw/acpi/acpi.h               |   2 +
 include/hw/acpi/bios-linker-loader.h |  85 +++++++
 include/hw/acpi/pc-hotplug.h         |   1 +
 include/hw/acpi/pm_lite.h            |   6 +
 include/hw/i386/pc.h                 |  17 ++
 include/hw/i386/pc_lite_acpi.h       |  10 +
 23 files changed, 1697 insertions(+), 179 deletions(-)
 create mode 100644 hw/acpi/pm_lite.c
 create mode 100644 hw/i386/pc_lite.c
 create mode 100644 hw/i386/pc_lite_acpi.c
 create mode 100644 hw/pci-host/pci_lite.c
 create mode 100644 include/hw/acpi/pm_lite.h
 create mode 100644 include/hw/i386/pc_lite_acpi.h

-- 
1.8.3.1


* [Qemu-devel] [RFC 1/9] acpi: introduce light weight ACPI PM emulation pm-lite
  2016-06-17  8:14 [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Chao Peng
@ 2016-06-17  8:14 ` Chao Peng
  2016-06-17  8:14 ` [Qemu-devel] [RFC 2/9] pci: introduce light weight PCIE Host emulation pci-lite Chao Peng
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Chao Peng @ 2016-06-17  8:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang

The code is loosely based on piix4_pm. The goal is to make it light
weight and dedicated to emulating the PM registers defined in the ACPI
spec. Unlike piix4_pm, the register address (PM_IO_BASE) is fixed, so
configuration by the BIOS is impossible and unnecessary.
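
As an illustration of the fixed layout (a sketch, not part of the patch):
with PM_IO_BASE fixed at 0x600, as defined in pm_lite.c, the guest-visible
register ports are compile-time constants. The offsets 0x0, 0x4 and 0x8
are an assumption based on where QEMU's generic ACPI core places the PM1
event, PM1 control and PM timer blocks inside the parent region; the enum
and helper names are hypothetical.

```c
#include <stdint.h>

#define PM_IO_BASE 0x600   /* fixed base, as defined in pm_lite.c */

/* Assumed offsets within the 64-byte PM I/O region. */
enum pm_lite_reg {
    PM1_EVT_OFF = 0x00,    /* PM1a event block: status + enable */
    PM1_CNT_OFF = 0x04,    /* PM1a control block */
    PM_TMR_OFF  = 0x08     /* ACPI PM timer */
};

/* Hypothetical helper: absolute I/O port of a PM register. */
static inline uint16_t pm_lite_port(enum pm_lite_reg reg)
{
    return (uint16_t)(PM_IO_BASE + reg);
}
```

Because nothing reprograms the base at run time, these are the ports that
the ACPI tables for this device would be expected to advertise.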

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 docs/specs/acpi_cpu_hotplug.txt |   1 +
 hw/acpi/Makefile.objs           |   2 +-
 hw/acpi/pm_lite.c               | 446 ++++++++++++++++++++++++++++++++++++++++
 include/hw/acpi/pc-hotplug.h    |   1 +
 include/hw/acpi/pm_lite.h       |   6 +
 include/hw/i386/pc.h            |   4 +
 6 files changed, 459 insertions(+), 1 deletion(-)
 create mode 100644 hw/acpi/pm_lite.c
 create mode 100644 include/hw/acpi/pm_lite.h

diff --git a/docs/specs/acpi_cpu_hotplug.txt b/docs/specs/acpi_cpu_hotplug.txt
index 340b751..06ac18e 100644
--- a/docs/specs/acpi_cpu_hotplug.txt
+++ b/docs/specs/acpi_cpu_hotplug.txt
@@ -13,6 +13,7 @@ hot-add/remove event to ACPI BIOS, via SCI interrupt.
 CPU present bitmap for:
   ICH9-LPC (IO port 0x0cd8-0xcf7, 1-byte access)
   PIIX-PM  (IO port 0xaf00-0xaf1f, 1-byte access)
+  PM-LITE  (IO port 0xaf00-0xaf1f, 1-byte access)
 ---------------------------------------------------------------
 One bit per CPU. Bit position reflects corresponding CPU APIC ID.
 Read-only.
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 66bd727..82adf32 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-$(CONFIG_ACPI_X86) += core.o piix4.o pcihp.o
+common-obj-$(CONFIG_ACPI_X86) += core.o piix4.o pcihp.o pm_lite.o
 common-obj-$(CONFIG_ACPI_X86_ICH) += ich9.o tco.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
 common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o memory_hotplug_acpi_table.o
diff --git a/hw/acpi/pm_lite.c b/hw/acpi/pm_lite.c
new file mode 100644
index 0000000..7c19e28
--- /dev/null
+++ b/hw/acpi/pm_lite.c
@@ -0,0 +1,446 @@
+/*
+ * Light weight ACPI PM implementation
+ *
+ * Copyright (c) 2006 Fabrice Bellard
+ * Copyright (C) 2016 Intel Corporation.
+ *
+ * Author:
+ *  Chao Peng <chao.p.peng@linux.intel.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License version 2 as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+#include "qemu/osdep.h"
+#include "hw/hw.h"
+#include "hw/i386/pc.h"
+#include "hw/pci/pci.h"
+#include "hw/acpi/acpi.h"
+#include "sysemu/sysemu.h"
+#include "qapi/error.h"
+#include "qemu/range.h"
+#include "exec/ioport.h"
+#include "hw/nvram/fw_cfg.h"
+#include "exec/address-spaces.h"
+#include "hw/acpi/pm_lite.h"
+#include "hw/acpi/pcihp.h"
+#include "hw/acpi/cpu_hotplug.h"
+#include "hw/hotplug.h"
+#include "hw/mem/pc-dimm.h"
+#include "hw/acpi/memory_hotplug.h"
+#include "hw/acpi/acpi_dev_interface.h"
+#include "hw/xen/xen.h"
+
+#define PM_IO_BASE      0x600
+#define GPE_BASE        0xafe0
+#define GPE_LEN         4
+
+typedef struct PMLiteState {
+    /*< private >*/
+    PCIDevice parent_obj;
+    /*< public >*/
+
+    MemoryRegion io;
+    MemoryRegion io_gpe;
+    ACPIREGS ar;
+
+    qemu_irq irq;
+    Notifier machine_ready;
+    Notifier powerdown_notifier;
+
+    AcpiPciHpState acpi_pci_hotplug;
+    bool use_acpi_pci_hotplug;
+
+    uint8_t disable_s3;
+    uint8_t disable_s4;
+    uint8_t s4_val;
+
+    AcpiCpuHotplug gpe_cpu;
+
+    MemHotplugState acpi_memory_hotplug;
+} PMLiteState;
+
+#define TYPE_PM_LITE "PM_LITE"
+
+#define PM_LITE(obj) \
+    OBJECT_CHECK(PMLiteState, (obj), TYPE_PM_LITE)
+
+#define ACPI_ENABLE 0xf1
+#define ACPI_DISABLE 0xf0
+
+static void pm_tmr_timer(ACPIREGS *ar)
+{
+    PMLiteState *s = container_of(ar, PMLiteState, ar);
+    acpi_update_sci(&s->ar, s->irq);
+}
+
+#define VMSTATE_GPE_ARRAY(_field, _state)                            \
+ {                                                                   \
+     .name       = (stringify(_field)),                              \
+     .version_id = 0,                                                \
+     .info       = &vmstate_info_uint16,                             \
+     .size       = sizeof(uint16_t),                                 \
+     .flags      = VMS_SINGLE | VMS_POINTER,                         \
+     .offset     = vmstate_offset_pointer(_state, _field, uint8_t),  \
+ }
+
+static const VMStateDescription vmstate_gpe = {
+    .name = "gpe",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_GPE_ARRAY(sts, ACPIGPE),
+        VMSTATE_GPE_ARRAY(en, ACPIGPE),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription vmstate_pci_status = {
+    .name = "pci_status",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(up, struct AcpiPciHpPciStatus),
+        VMSTATE_UINT32(down, struct AcpiPciHpPciStatus),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static bool vmstate_test_use_acpi_pci_hotplug(void *opaque, int version_id)
+{
+    PMLiteState *s = opaque;
+    return s->use_acpi_pci_hotplug;
+}
+
+static bool vmstate_test_no_use_acpi_pci_hotplug(void *opaque, int version_id)
+{
+    PMLiteState *s = opaque;
+    return !s->use_acpi_pci_hotplug;
+}
+
+static bool vmstate_test_use_memhp(void *opaque)
+{
+    PMLiteState *s = opaque;
+    return s->acpi_memory_hotplug.is_enabled;
+}
+
+static const VMStateDescription vmstate_memhp_state = {
+    .name = "pm_lite/memhp",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .needed = vmstate_test_use_memhp,
+    .fields      = (VMStateField[]) {
+        VMSTATE_MEMORY_HOTPLUG(acpi_memory_hotplug, PMLiteState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription vmstate_acpi = {
+    .name = "pm_lite",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_PCI_DEVICE(parent_obj, PMLiteState),
+        VMSTATE_UINT16(ar.pm1.evt.sts, PMLiteState),
+        VMSTATE_UINT16(ar.pm1.evt.en, PMLiteState),
+        VMSTATE_UINT16(ar.pm1.cnt.cnt, PMLiteState),
+        VMSTATE_TIMER_PTR(ar.tmr.timer, PMLiteState),
+        VMSTATE_INT64(ar.tmr.overflow_time, PMLiteState),
+        VMSTATE_STRUCT(ar.gpe, PMLiteState, 2, vmstate_gpe, ACPIGPE),
+        VMSTATE_STRUCT_TEST(
+            acpi_pci_hotplug.acpi_pcihp_pci_status[ACPI_PCIHP_BSEL_DEFAULT],
+            PMLiteState,
+            vmstate_test_no_use_acpi_pci_hotplug,
+            2, vmstate_pci_status,
+            struct AcpiPciHpPciStatus),
+        VMSTATE_PCI_HOTPLUG(acpi_pci_hotplug, PMLiteState,
+                            vmstate_test_use_acpi_pci_hotplug),
+        VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription*[]) {
+         &vmstate_memhp_state,
+         NULL
+    }
+};
+
+static void pm_lite_reset(void *opaque)
+{
+    PMLiteState *s = opaque;
+    acpi_pcihp_reset(&s->acpi_pci_hotplug);
+}
+
+static void pm_lite_powerdown_req(Notifier *n, void *opaque)
+{
+    PMLiteState *s = container_of(n, PMLiteState, powerdown_notifier);
+
+    assert(s != NULL);
+    acpi_pm1_evt_power_down(&s->ar);
+}
+
+static void pm_lite_device_plug_cb(HotplugHandler *hotplug_dev,
+                                   DeviceState *dev, Error **errp)
+{
+    PMLiteState *s = PM_LITE(hotplug_dev);
+
+    if (s->acpi_memory_hotplug.is_enabled &&
+        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        acpi_memory_plug_cb(hotplug_dev, &s->acpi_memory_hotplug, dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        acpi_pcihp_device_plug_cb(hotplug_dev, &s->acpi_pci_hotplug, dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+        legacy_acpi_cpu_plug_cb(hotplug_dev, &s->gpe_cpu, dev, errp);
+    } else {
+        error_setg(errp, "acpi: device plug request for not supported device"
+                   " type: %s", object_get_typename(OBJECT(dev)));
+    }
+}
+
+static void pm_lite_device_unplug_request_cb(HotplugHandler *hotplug_dev,
+                                             DeviceState *dev, Error **errp)
+{
+    PMLiteState *s = PM_LITE(hotplug_dev);
+
+    if (s->acpi_memory_hotplug.is_enabled &&
+        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        acpi_memory_unplug_request_cb(hotplug_dev, &s->acpi_memory_hotplug,
+                                      dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        acpi_pcihp_device_unplug_cb(hotplug_dev, &s->acpi_pci_hotplug, dev,
+                                    errp);
+    } else {
+        error_setg(errp, "acpi: device unplug request for not supported device"
+                   " type: %s", object_get_typename(OBJECT(dev)));
+    }
+}
+
+static void pm_lite_device_unplug_cb(HotplugHandler *hotplug_dev,
+                                     DeviceState *dev, Error **errp)
+{
+    PMLiteState *s = PM_LITE(hotplug_dev);
+
+    if (s->acpi_memory_hotplug.is_enabled &&
+        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        acpi_memory_unplug_cb(&s->acpi_memory_hotplug, dev, errp);
+    } else {
+        error_setg(errp, "acpi: device unplug for not supported device"
+                   " type: %s", object_get_typename(OBJECT(dev)));
+    }
+}
+
+static void pm_lite_update_bus_hotplug(PCIBus *pci_bus, void *opaque)
+{
+    PMLiteState *s = opaque;
+
+    qbus_set_hotplug_handler(BUS(pci_bus), DEVICE(s), &error_abort);
+}
+
+static void pm_lite_machine_ready(Notifier *n, void *opaque)
+{
+    PMLiteState *s = container_of(n, PMLiteState, machine_ready);
+    PCIDevice *d = PCI_DEVICE(s);
+
+    if (s->use_acpi_pci_hotplug) {
+        pci_for_each_bus(d->bus, pm_lite_update_bus_hotplug, s);
+    } else {
+        pm_lite_update_bus_hotplug(d->bus, s);
+    }
+}
+
+static void pm_lite_add_propeties(PMLiteState *s)
+{
+    static const uint8_t acpi_enable_cmd = ACPI_ENABLE;
+    static const uint8_t acpi_disable_cmd = ACPI_DISABLE;
+    static const uint32_t pm_io_base = PM_IO_BASE;
+    static const uint32_t gpe0_blk = GPE_BASE;
+    static const uint32_t gpe0_blk_len = GPE_LEN;
+    static const uint16_t sci_int = 9;
+
+    object_property_add_uint8_ptr(OBJECT(s), ACPI_PM_PROP_ACPI_ENABLE_CMD,
+                                  &acpi_enable_cmd, NULL);
+    object_property_add_uint8_ptr(OBJECT(s), ACPI_PM_PROP_ACPI_DISABLE_CMD,
+                                  &acpi_disable_cmd, NULL);
+    object_property_add_uint32_ptr(OBJECT(s), ACPI_PM_PROP_PM_IO_BASE,
+                                  &pm_io_base, NULL);
+    object_property_add_uint32_ptr(OBJECT(s), ACPI_PM_PROP_GPE0_BLK,
+                                  &gpe0_blk, NULL);
+    object_property_add_uint32_ptr(OBJECT(s), ACPI_PM_PROP_GPE0_BLK_LEN,
+                                  &gpe0_blk_len, NULL);
+    object_property_add_uint16_ptr(OBJECT(s), ACPI_PM_PROP_SCI_INT,
+                                  &sci_int, NULL);
+}
+
+Object *pm_lite_find(void)
+{
+    bool ambig;
+    Object *o = object_resolve_path_type("", TYPE_PM_LITE, &ambig);
+
+    if (ambig || !o) {
+        return NULL;
+    }
+    return o;
+}
+
+DeviceState *pm_lite_init(PCIBus *bus, int devfn, qemu_irq sci_irq)
+{
+    DeviceState *dev;
+    PMLiteState *s;
+
+    dev = DEVICE(pci_create(bus, devfn, TYPE_PM_LITE));
+
+    s = PM_LITE(dev);
+    s->irq = sci_irq;
+    if (xen_enabled()) {
+        s->use_acpi_pci_hotplug = false;
+    }
+
+    qdev_init_nofail(dev);
+
+    return dev;
+}
+
+static uint64_t gpe_readb(void *opaque, hwaddr addr, unsigned width)
+{
+    PMLiteState *s = opaque;
+    uint32_t val = acpi_gpe_ioport_readb(&s->ar, addr);
+
+    return val;
+}
+
+static void gpe_writeb(void *opaque, hwaddr addr, uint64_t val,
+                       unsigned width)
+{
+    PMLiteState *s = opaque;
+
+    acpi_gpe_ioport_writeb(&s->ar, addr, val);
+    acpi_update_sci(&s->ar, s->irq);
+}
+
+static const MemoryRegionOps pm_lite_gpe_ops = {
+    .read = gpe_readb,
+    .write = gpe_writeb,
+    .valid.min_access_size = 1,
+    .valid.max_access_size = 4,
+    .impl.min_access_size = 1,
+    .impl.max_access_size = 1,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static void pm_lite_acpi_system_hot_add_init(MemoryRegion *parent,
+                                             PCIBus *bus, PMLiteState *s)
+{
+    memory_region_init_io(&s->io_gpe, OBJECT(s), &pm_lite_gpe_ops, s,
+                          "acpi-gpe0", GPE_LEN);
+    memory_region_add_subregion(parent, GPE_BASE, &s->io_gpe);
+
+    acpi_pcihp_init(OBJECT(s), &s->acpi_pci_hotplug, bus, parent,
+                    s->use_acpi_pci_hotplug);
+
+    legacy_acpi_cpu_hotplug_init(parent, OBJECT(s), &s->gpe_cpu,
+                                 PM_LITE_CPU_HOTPLUG_IO_BASE);
+
+    if (s->acpi_memory_hotplug.is_enabled) {
+        acpi_memory_hotplug_init(parent, OBJECT(s), &s->acpi_memory_hotplug);
+    }
+}
+
+static void pm_lite_ospm_status(AcpiDeviceIf *adev, ACPIOSTInfoList ***list)
+{
+    PMLiteState *s = PM_LITE(adev);
+
+    acpi_memory_ospm_status(&s->acpi_memory_hotplug, list);
+}
+
+static void pm_lite_send_gpe(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
+{
+    PMLiteState *s = PM_LITE(adev);
+
+    acpi_send_gpe_event(&s->ar, s->irq, ev);
+}
+
+static Property pm_lite_properties[] = {
+    DEFINE_PROP_UINT8(ACPI_PM_PROP_S3_DISABLED, PMLiteState, disable_s3, 0),
+    DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_DISABLED, PMLiteState, disable_s4, 0),
+    DEFINE_PROP_UINT8(ACPI_PM_PROP_S4_VAL, PMLiteState, s4_val, 2),
+    DEFINE_PROP_BOOL("acpi-pci-hotplug-with-bridge-support", PMLiteState,
+                     use_acpi_pci_hotplug, true),
+    DEFINE_PROP_BOOL("memory-hotplug-support", PMLiteState,
+                     acpi_memory_hotplug.is_enabled, true),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void pm_lite_realize(PCIDevice *dev, Error **errp)
+{
+    PMLiteState *s = PM_LITE(dev);
+
+    memory_region_init(&s->io, OBJECT(s), "pm_lite", 64);
+    memory_region_add_subregion(pci_address_space_io(dev), PM_IO_BASE, &s->io);
+
+    acpi_pm_tmr_init(&s->ar, pm_tmr_timer, &s->io);
+    acpi_pm1_evt_init(&s->ar, pm_tmr_timer, &s->io);
+    acpi_pm1_cnt_init(&s->ar, &s->io, s->disable_s3, s->disable_s4, s->s4_val);
+    acpi_gpe_init(&s->ar, GPE_LEN);
+
+    s->powerdown_notifier.notify = pm_lite_powerdown_req;
+    qemu_register_powerdown_notifier(&s->powerdown_notifier);
+
+    s->machine_ready.notify = pm_lite_machine_ready;
+    qemu_add_machine_init_done_notifier(&s->machine_ready);
+    qemu_register_reset(pm_lite_reset, s);
+
+    pm_lite_acpi_system_hot_add_init(pci_address_space_io(dev), dev->bus, s);
+
+    pm_lite_add_propeties(s);
+}
+
+static void pm_lite_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
+    AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_CLASS(klass);
+
+    k->realize = pm_lite_realize;
+    k->class_id = PCI_CLASS_BRIDGE_OTHER;
+    dc->desc = "PM LITE";
+    dc->vmsd = &vmstate_acpi;
+    dc->props = pm_lite_properties;
+    /* Reason: part of pc-lite, needs to be wired up */
+    dc->cannot_instantiate_with_device_add_yet = true;
+    dc->hotpluggable = false;
+    hc->plug = pm_lite_device_plug_cb;
+    hc->unplug_request = pm_lite_device_unplug_request_cb;
+    hc->unplug = pm_lite_device_unplug_cb;
+    adevc->ospm_status = pm_lite_ospm_status;
+    adevc->send_event = pm_lite_send_gpe;
+}
+
+static const TypeInfo pm_lite_info = {
+    .name          = TYPE_PM_LITE,
+    .parent        = TYPE_PCI_DEVICE,
+    .instance_size = sizeof(PMLiteState),
+    .class_init    = pm_lite_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_HOTPLUG_HANDLER },
+        { TYPE_ACPI_DEVICE_IF },
+        { }
+    }
+};
+
+static void pm_lite_register_types(void)
+{
+    type_register_static(&pm_lite_info);
+}
+
+type_init(pm_lite_register_types)
diff --git a/include/hw/acpi/pc-hotplug.h b/include/hw/acpi/pc-hotplug.h
index 6a8d268..02dbe57 100644
--- a/include/hw/acpi/pc-hotplug.h
+++ b/include/hw/acpi/pc-hotplug.h
@@ -27,6 +27,7 @@
 
 #define ICH9_CPU_HOTPLUG_IO_BASE 0x0CD8
 #define PIIX4_CPU_HOTPLUG_IO_BASE 0xaf00
+#define PM_LITE_CPU_HOTPLUG_IO_BASE 0xaf00
 #define CPU_HOTPLUG_RESOURCE_DEVICE PRES
 
 #define ACPI_MEMORY_HOTPLUG_IO_LEN 24
diff --git a/include/hw/acpi/pm_lite.h b/include/hw/acpi/pm_lite.h
new file mode 100644
index 0000000..011233d
--- /dev/null
+++ b/include/hw/acpi/pm_lite.h
@@ -0,0 +1,6 @@
+#ifndef HW_ACPI_PM_LITE_H
+#define HW_ACPI_PM_LITE_H
+
+Object *pm_lite_find(void);
+
+#endif
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 49566c8..7c3506e 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -278,6 +278,10 @@ I2CBus *piix4_pm_init(PCIBus *bus, int devfn, uint32_t smb_io_base,
                       int smm_enabled, DeviceState **piix4_pm);
 void piix4_smbus_register_device(SMBusDevice *dev, uint8_t addr);
 
+/* pm_lite.c */
+
+DeviceState *pm_lite_init(PCIBus *bus, int devfn, qemu_irq sci_irq);
+
 /* hpet.c */
 extern int no_hpet;
 
-- 
1.8.3.1


* [Qemu-devel] [RFC 2/9] pci: introduce light weight PCIE Host emulation pci-lite
  2016-06-17  8:14 [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Chao Peng
  2016-06-17  8:14 ` [Qemu-devel] [RFC 1/9] acpi: introduce light weight ACPI PM emulation pm-lite Chao Peng
@ 2016-06-17  8:14 ` Chao Peng
  2016-06-17  8:14 ` [Qemu-devel] [RFC 3/9] acpi: add support for pc-lite platform Chao Peng
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Chao Peng @ 2016-06-17  8:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang

A minimal emulation of a PCIe host bridge that supports 0xcf8/0xcfc and
MMCFG. There is already a gpex host bridge, but it is designed mainly
for ARM and is not quite suitable for x86:
- it lacks things like the PCI hole properties, which are required by ACPI;
- the corresponding driver in Linux is designed to work with device tree,
  which is not suitable for x86. In this case, an additional guest driver
  is not even needed.

Currently the MMCFG size is limited to 1M, which means only 1 bus is
supported; this is intended to reduce the scan time in the guest. It also
doesn't have a valid vendor ID/device ID assigned, so the guest may not
recognize it; the functionality, however, is expected to be OK.
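
The relation between window size and bus count can be sketched as
follows (illustrative only, not part of the patch). In the standard ECAM
layout each function gets 4 KiB of configuration space, so one bus needs
32 devices * 8 functions * 4 KiB = 1 MiB. The macro and helper names are
hypothetical.

```c
#include <stdint.h>

#define ECAM_BYTES_PER_FN   4096u  /* 4 KiB of config space per function */
#define ECAM_FNS_PER_DEV    8u
#define ECAM_DEVS_PER_BUS   32u

/* Hypothetical helper: number of buses that an MMCFG window of the given
 * size can decode (1 MiB per bus). */
static inline uint32_t mmcfg_bus_count(uint64_t window_size)
{
    uint64_t per_bus = (uint64_t)ECAM_BYTES_PER_FN * ECAM_FNS_PER_DEV *
                       ECAM_DEVS_PER_BUS;

    return (uint32_t)(window_size / per_bus);
}
```

A 1 MiB window (PCI_LITE_PCIEXBAR_SIZE, 0x100000) therefore decodes
exactly one bus, whereas covering all 256 buses would take 256 MiB.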

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 hw/pci-host/Makefile.objs          |   1 +
 hw/pci-host/pci_lite.c             | 259 +++++++++++++++++++++++++++++++++++++
 include/hw/i386/pc.h               |   5 +
 5 files changed, 267 insertions(+)
 create mode 100644 hw/pci-host/pci_lite.c

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index b177e52..421ad0a 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -38,6 +38,7 @@ CONFIG_PFLASH_CFI01=y
 CONFIG_TPM_TIS=$(CONFIG_TPM)
 CONFIG_MC146818RTC=y
 CONFIG_PAM=y
+CONFIG_PCI_LITE=y
 CONFIG_PCI_PIIX=y
 CONFIG_WDT_IB700=y
 CONFIG_XEN_I386=$(CONFIG_XEN)
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 6e3b312..f197cfd 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -38,6 +38,7 @@ CONFIG_PFLASH_CFI01=y
 CONFIG_TPM_TIS=$(CONFIG_TPM)
 CONFIG_MC146818RTC=y
 CONFIG_PAM=y
+CONFIG_PCI_LITE=y
 CONFIG_PCI_PIIX=y
 CONFIG_WDT_IB700=y
 CONFIG_XEN_I386=$(CONFIG_XEN)
diff --git a/hw/pci-host/Makefile.objs b/hw/pci-host/Makefile.objs
index 45f1f0e..5dbb034 100644
--- a/hw/pci-host/Makefile.objs
+++ b/hw/pci-host/Makefile.objs
@@ -16,3 +16,4 @@ common-obj-$(CONFIG_FULONG) += bonito.o
 common-obj-$(CONFIG_PCI_PIIX) += piix.o
 common-obj-$(CONFIG_PCI_Q35) += q35.o
 common-obj-$(CONFIG_PCI_GENERIC) += gpex.o
+common-obj-$(CONFIG_PCI_LITE) += pci_lite.o
diff --git a/hw/pci-host/pci_lite.c b/hw/pci-host/pci_lite.c
new file mode 100644
index 0000000..15b388d
--- /dev/null
+++ b/hw/pci-host/pci_lite.c
@@ -0,0 +1,259 @@
+/*
+ * QEMU Light weight PCI Host Bridge Emulation
+ *
+ * Copyright (C) 2016 Intel Corporation.
+ *
+ * Author:
+ *  Chao Peng <chao.p.peng@linux.intel.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/hw.h"
+#include "hw/i386/pc.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/pcie_host.h"
+#include "hw/isa/isa.h"
+#include "hw/sysbus.h"
+#include "qapi/error.h"
+#include "qemu/range.h"
+#include "hw/xen/xen.h"
+#include "sysemu/sysemu.h"
+#include "hw/i386/ioapic.h"
+#include "qapi/visitor.h"
+#include "qemu/error-report.h"
+
+#define TYPE_PCI_LITE_HOST      "pci-lite-host"
+#define TYPE_PCI_LITE_DEVICE    "pci-lite-device"
+
+#define PCI_LITE_HOST(obj) \
+    OBJECT_CHECK(PCILiteHost, (obj), TYPE_PCI_LITE_HOST)
+
+#define PCI_LITE_NUM_IRQS       4
+#define PCI_LITE_PCIEXBAR_BASE  0xb0000000
+#define PCI_LITE_PCIEXBAR_SIZE  (0x100000) /* 1M for bus 0 */
+
+typedef struct PCILiteHost {
+    /*< private >*/
+    PCIExpressHost parent_obj;
+    /*< public >*/
+
+    PcPciInfo pci_info;
+    qemu_irq irq[PCI_LITE_NUM_IRQS];
+    uint64_t pci_hole64_size;
+} PCILiteHost;
+
+static void pci_lite_get_pci_hole_start(Object *obj, Visitor *v,
+                                        const char *name, void *opaque,
+                                        Error **errp)
+{
+    PCILiteHost *s = PCI_LITE_HOST(obj);
+    uint32_t value = s->pci_info.w32.begin;
+
+    visit_type_uint32(v, name, &value, errp);
+}
+
+static void pci_lite_get_pci_hole_end(Object *obj, Visitor *v,
+                                      const char *name, void *opaque,
+                                      Error **errp)
+{
+    PCILiteHost *s = PCI_LITE_HOST(obj);
+    uint32_t value = s->pci_info.w32.end;
+
+    visit_type_uint32(v, name, &value, errp);
+}
+
+static void pci_lite_get_pci_hole64_start(Object *obj, Visitor *v,
+                                          const char *name,
+                                          void *opaque, Error **errp)
+{
+    PCIHostState *h = PCI_HOST_BRIDGE(obj);
+    Range w64;
+
+    pci_bus_get_w64_range(h->bus, &w64);
+
+    visit_type_uint64(v, name, &w64.begin, errp);
+}
+
+static void pci_lite_get_pci_hole64_end(Object *obj, Visitor *v,
+                                        const char *name, void *opaque,
+                                        Error **errp)
+{
+    PCIHostState *h = PCI_HOST_BRIDGE(obj);
+    Range w64;
+
+    pci_bus_get_w64_range(h->bus, &w64);
+
+    visit_type_uint64(v, name, &w64.end, errp);
+}
+
+static void pci_lite_initfn(Object *obj)
+{
+    PCIHostState *s = PCI_HOST_BRIDGE(obj);
+    PCILiteHost *d = PCI_LITE_HOST(obj);
+
+    memory_region_init_io(&s->conf_mem, obj, &pci_host_conf_le_ops, s,
+                          "pci-conf-idx", 4);
+    memory_region_init_io(&s->data_mem, obj, &pci_host_data_le_ops, s,
+                          "pci-conf-data", 4);
+
+    object_property_add(obj, PCI_HOST_PROP_PCI_HOLE_START, "int",
+                        pci_lite_get_pci_hole_start,
+                        NULL, NULL, NULL, NULL);
+
+    object_property_add(obj, PCI_HOST_PROP_PCI_HOLE_END, "int",
+                        pci_lite_get_pci_hole_end,
+                        NULL, NULL, NULL, NULL);
+
+    object_property_add(obj, PCI_HOST_PROP_PCI_HOLE64_START, "int",
+                        pci_lite_get_pci_hole64_start,
+                        NULL, NULL, NULL, NULL);
+
+    object_property_add(obj, PCI_HOST_PROP_PCI_HOLE64_END, "int",
+                        pci_lite_get_pci_hole64_end,
+                        NULL, NULL, NULL, NULL);
+
+    d->pci_info.w32.end = IO_APIC_DEFAULT_ADDRESS;
+}
+
+static void pci_lite_set_irq(void *opaque, int irq_num, int level)
+{
+    PCILiteHost *d = opaque;
+
+    qemu_set_irq(d->irq[irq_num], level);
+}
+
+static void pci_lite_realize(DeviceState *dev, Error **errp)
+{
+    PCIHostState *s = PCI_HOST_BRIDGE(dev);
+    PCILiteHost *d = PCI_LITE_HOST(dev);
+    SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
+    int i;
+
+    sysbus_add_io(sbd, 0xcf8, &s->conf_mem);
+    sysbus_init_ioports(sbd, 0xcf8, 4);
+
+    sysbus_add_io(sbd, 0xcfc, &s->data_mem);
+    sysbus_init_ioports(sbd, 0xcfc, 4);
+
+    for (i = 0; i < PCI_LITE_NUM_IRQS; i++) {
+        sysbus_init_irq(sbd, &d->irq[i]);
+    }
+}
+
+PCIBus *pci_lite_init(MemoryRegion *address_space_mem,
+                      MemoryRegion *address_space_io,
+                      MemoryRegion *pci_address_space)
+{
+    DeviceState *dev;
+    PCIHostState *pci;
+    PCIExpressHost *pcie;
+    PCILiteHost *pci_lite;
+
+    dev = qdev_create(NULL, TYPE_PCI_LITE_HOST);
+    pci = PCI_HOST_BRIDGE(dev);
+    pcie = PCIE_HOST_BRIDGE(dev);
+
+    pci->bus = pci_register_bus(dev, "pcie.0", pci_lite_set_irq,
+                                pci_swizzle_map_irq_fn, pci, pci_address_space,
+                                address_space_io, 0, 4, TYPE_PCIE_BUS);
+
+    object_property_add_child(qdev_get_machine(), "pcilite", OBJECT(dev), NULL);
+    qdev_init_nofail(dev);
+
+    pci_lite = PCI_LITE_HOST(dev);
+    pci_lite->pci_info.w32.begin = PCI_LITE_PCIEXBAR_BASE +
+                                   PCI_LITE_PCIEXBAR_SIZE;
+
+    pcie_host_mmcfg_update(pcie, 1, PCI_LITE_PCIEXBAR_BASE,
+                           PCI_LITE_PCIEXBAR_SIZE);
+    e820_add_entry(PCI_LITE_PCIEXBAR_BASE, PCI_LITE_PCIEXBAR_SIZE,
+                   E820_RESERVED);
+
+    /* setup pci memory mapping */
+    pc_pci_as_mapping_init(OBJECT(dev), address_space_mem, pci_address_space);
+
+    pci_create_simple(pci->bus, 0, TYPE_PCI_LITE_DEVICE);
+    return pci->bus;
+}
+
+static const char *pci_lite_root_bus_path(PCIHostState *host_bridge,
+                                          PCIBus *rootbus)
+{
+    return "0000:00";
+}
+
+static Property pci_lite_props[] = {
+    DEFINE_PROP_UINT64(PCIE_HOST_MCFG_BASE, PCILiteHost,
+                       parent_obj.base_addr, PCI_LITE_PCIEXBAR_BASE),
+    DEFINE_PROP_UINT64(PCIE_HOST_MCFG_SIZE, PCILiteHost,
+                       parent_obj.size, PCI_LITE_PCIEXBAR_SIZE),
+    DEFINE_PROP_SIZE(PCI_HOST_PROP_PCI_HOLE64_SIZE, PCILiteHost,
+                     pci_hole64_size, DEFAULT_PCI_HOLE64_SIZE),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void pci_lite_host_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIHostBridgeClass *hc = PCI_HOST_BRIDGE_CLASS(klass);
+
+    set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
+    dc->realize = pci_lite_realize;
+    dc->props = pci_lite_props;
+    hc->root_bus_path = pci_lite_root_bus_path;
+}
+
+static const TypeInfo pci_lite_host_info = {
+    .name          = TYPE_PCI_LITE_HOST,
+    .parent        = TYPE_PCIE_HOST_BRIDGE,
+    .instance_size = sizeof(PCILiteHost),
+    .instance_init = pci_lite_initfn,
+    .class_init    = pci_lite_host_class_init,
+};
+
+static void pci_lite_device_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+    k->class_id = PCI_CLASS_BRIDGE_HOST;
+    dc->desc = "Host bridge";
+    /*
+     * PCI-facing part of the host bridge, not usable without the
+     * host-facing part, which can't be device_add'ed, yet.
+     */
+    dc->cannot_instantiate_with_device_add_yet = true;
+    dc->hotpluggable   = false;
+}
+
+static const TypeInfo pci_lite_device_info = {
+    .name          = TYPE_PCI_LITE_DEVICE,
+    .parent        = TYPE_PCI_DEVICE,
+    .class_init    = pci_lite_device_class_init,
+};
+
+static void pci_lite_register_types(void)
+{
+    type_register_static(&pci_lite_device_info);
+    type_register_static(&pci_lite_host_info);
+}
+
+type_init(pci_lite_register_types)
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 7c3506e..ad7533b 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -285,6 +285,11 @@ DeviceState *pm_lite_init(PCIBus *bus, int devfn, qemu_irq sci_irq);
 /* hpet.c */
 extern int no_hpet;
 
+/* pci_lite.c */
+PCIBus *pci_lite_init(MemoryRegion *address_space_mem,
+                      MemoryRegion *address_space_io,
+                      MemoryRegion *pci_memory);
+
 /* piix_pci.c */
 struct PCII440FXState;
 typedef struct PCII440FXState PCII440FXState;
-- 
1.8.3.1

* [Qemu-devel] [RFC 3/9] acpi: add support for pc-lite platform
  2016-06-17  8:14 [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Chao Peng
  2016-06-17  8:14 ` [Qemu-devel] [RFC 1/9] acpi: introduce light weight ACPI PM emulation pm-lite Chao Peng
  2016-06-17  8:14 ` [Qemu-devel] [RFC 2/9] pci: introduce light weight PCIE Host emulation pci-lite Chao Peng
@ 2016-06-17  8:14 ` Chao Peng
  2016-06-17  8:14 ` [Qemu-devel] [RFC 4/9] acpi: expose data structures and functions of BIOS linker loader Chao Peng
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Chao Peng @ 2016-06-17  8:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang

The upcoming pc-lite platform uses pm-lite for CPU/memory/PCI hotplug
and pci-lite as its host bridge. Teach the ACPI table builder about
both, reusing the existing piix/q35 code where possible.
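
As a rough illustration of the interrupt routing this sets up: the host
bridge from the previous patch registers its bus with
pci_swizzle_map_irq_fn, and build_lite_pci0_int() below declares LNKA..LNKD
with GSIs 0x10..0x13. The sketch assumes the _PRT produced by
build_prt(false) follows the same (slot + pin) swizzle and that the four
links map to those GSIs in order. The sketch_* names are made up for this
example and do not exist in QEMU.

```c
#include <assert.h>

/* Assumption: four links (LNKA..LNKD), first one routed to GSI 0x10. */
enum { SKETCH_NUM_LINKS = 4, SKETCH_FIRST_GSI = 0x10 };

/* slot: PCI device number on bus 0; pin: 0 = INTA# ... 3 = INTD#. */
int sketch_link_index(int slot, int pin)
{
    assert(slot >= 0 && pin >= 0 && pin < SKETCH_NUM_LINKS);
    return (slot + pin) % SKETCH_NUM_LINKS;
}

/* Link index 0..3 corresponds to LNKA..LNKD, i.e. GSI 0x10..0x13. */
int sketch_gsi(int slot, int pin)
{
    return SKETCH_FIRST_GSI + sketch_link_index(slot, pin);
}
```

Under these assumptions, sketch_gsi(3, 2) selects LNKB: (3 + 2) % 4 is 1,
so the interrupt would arrive on GSI 0x11.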

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 hw/i386/acpi-build.c | 108 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 77 insertions(+), 31 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 8ca2032..4b5ed96 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -47,6 +47,7 @@
 /* Supported chipsets: */
 #include "hw/acpi/piix4.h"
 #include "hw/acpi/pcihp.h"
+#include "hw/acpi/pm_lite.h"
 #include "hw/i386/ich9.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/pci-host/q35.h"
@@ -76,6 +77,8 @@
 #define ACPI_BUILD_DPRINTF(fmt, ...)
 #endif
 
+#define ACPI_PORT_SMI_CMD           0x00b2 /* TODO: this is APM_CNT_IOPORT */
+
 typedef struct AcpiMcfgInfo {
     uint64_t mcfg_base;
     uint32_t mcfg_size;
@@ -87,6 +90,7 @@ typedef struct AcpiPmInfo {
     bool pcihp_bridge_en;
     uint8_t s4_val;
     uint16_t sci_int;
+    uint32_t smi_cmd;
     uint8_t acpi_enable_cmd;
     uint8_t acpi_disable_cmd;
     uint32_t gpe0_blk;
@@ -99,8 +103,14 @@ typedef struct AcpiPmInfo {
     uint16_t pcihp_io_len;
 } AcpiPmInfo;
 
+typedef enum {
+    PMTYPE_PIIX4   = 0,
+    PMTYPE_LPC     = 1,
+    PMTYPE_LITE    = 2,
+} PMType;
+
 typedef struct AcpiMiscInfo {
-    bool is_piix4;
+    PMType pm_type;
     bool has_hpet;
     TPMVersion tpm_version;
     const unsigned char *dsdt_code;
@@ -118,27 +128,32 @@ typedef struct AcpiBuildPciBusHotplugState {
 
 static void acpi_get_pm_info(AcpiPmInfo *pm)
 {
-    Object *piix = piix4_pm_find();
-    Object *lpc = ich9_lpc_find();
-    Object *obj = NULL;
+    Object *obj = ich9_lpc_find();
     QObject *o;
 
     pm->cpu_hp_io_base = 0;
     pm->pcihp_io_base = 0;
     pm->pcihp_io_len = 0;
-    if (piix) {
-        obj = piix;
-        pm->cpu_hp_io_base = PIIX4_CPU_HOTPLUG_IO_BASE;
+    pm->smi_cmd = ACPI_PORT_SMI_CMD;
+
+    if (obj) {
+        pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
+    } else {
+        obj = piix4_pm_find();
+        if (obj) {
+            pm->cpu_hp_io_base = PIIX4_CPU_HOTPLUG_IO_BASE;
+        } else {
+            obj = pm_lite_find();
+            assert(obj);
+            pm->smi_cmd = 0;
+            pm->cpu_hp_io_base = PM_LITE_CPU_HOTPLUG_IO_BASE;
+        }
+
         pm->pcihp_io_base =
             object_property_get_int(obj, ACPI_PCIHP_IO_BASE_PROP, NULL);
         pm->pcihp_io_len =
             object_property_get_int(obj, ACPI_PCIHP_IO_LEN_PROP, NULL);
     }
-    if (lpc) {
-        obj = lpc;
-        pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
-    }
-    assert(obj);
 
     pm->mem_hp_io_base = ACPI_MEMORY_HOTPLUG_BASE;
     pm->mem_hp_io_len = ACPI_MEMORY_HOTPLUG_IO_LEN;
@@ -188,15 +203,12 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
 
 static void acpi_get_misc_info(AcpiMiscInfo *info)
 {
-    Object *piix = piix4_pm_find();
-    Object *lpc = ich9_lpc_find();
-    assert(!!piix != !!lpc);
-
-    if (piix) {
-        info->is_piix4 = true;
-    }
-    if (lpc) {
-        info->is_piix4 = false;
+    if (piix4_pm_find()) {
+        info->pm_type = PMTYPE_PIIX4;
+    } else if (ich9_lpc_find()) {
+        info->pm_type = PMTYPE_LPC;
+    } else if (pm_lite_find()) {
+        info->pm_type = PMTYPE_LITE;
     }
 
     info->has_hpet = hpet_find();
@@ -222,6 +234,12 @@ static Object *acpi_get_i386_pci_host(void)
                             TYPE_PCI_HOST_BRIDGE);
     }
 
+    if (!host) {
+        host = OBJECT_CHECK(PCIHostState,
+                            object_resolve_path("/machine/pcilite", NULL),
+                            TYPE_PCI_HOST_BRIDGE);
+    }
+
     return OBJECT(host);
 }
 
@@ -247,8 +265,6 @@ static void acpi_get_pci_info(PcPciInfo *info)
                                             NULL);
 }
 
-#define ACPI_PORT_SMI_CMD           0x00b2 /* TODO: this is APM_CNT_IOPORT */
-
 static void acpi_align_size(GArray *blob, unsigned align)
 {
     /* Align size to multiple of given size. This reduces the chance
@@ -272,7 +288,7 @@ static void fadt_setup(AcpiFadtDescriptorRev1 *fadt, AcpiPmInfo *pm)
     fadt->model = 1;
     fadt->reserved1 = 0;
     fadt->sci_int = cpu_to_le16(pm->sci_int);
-    fadt->smi_cmd = cpu_to_le32(ACPI_PORT_SMI_CMD);
+    fadt->smi_cmd = cpu_to_le32(pm->smi_cmd);
     fadt->acpi_enable = pm->acpi_enable_cmd;
     fadt->acpi_disable = pm->acpi_disable_cmd;
     /* EVT, CNT, TMR offset matches hw/acpi/core.c */
@@ -1774,7 +1790,7 @@ static void build_piix4_isa_bridge(Aml *table)
     aml_append(table, scope);
 }
 
-static void build_piix4_pci_hotplug(Aml *table)
+static void build_pci_hotplug(Aml *table)
 {
     Aml *scope;
     Aml *field;
@@ -1815,7 +1831,7 @@ static void build_piix4_pci_hotplug(Aml *table)
     aml_append(table, scope);
 }
 
-static Aml *build_q35_osc_method(void)
+static Aml *build_osc_method(void)
 {
     Aml *if_ctx;
     Aml *if_ctx2;
@@ -1864,6 +1880,21 @@ static Aml *build_q35_osc_method(void)
     return method;
 }
 
+static void build_lite_pci0_int(Aml *table)
+{
+    Aml *sb_scope = aml_scope("_SB");
+    Aml *pci0_scope = aml_scope("PCI0");
+
+    aml_append(pci0_scope, build_prt(false));
+    aml_append(sb_scope, pci0_scope);
+
+    aml_append(sb_scope, build_gsi_link_dev("LNKA", 0x10, 0x10));
+    aml_append(sb_scope, build_gsi_link_dev("LNKB", 0x11, 0x11));
+    aml_append(sb_scope, build_gsi_link_dev("LNKC", 0x12, 0x12));
+    aml_append(sb_scope, build_gsi_link_dev("LNKD", 0x13, 0x13));
+    aml_append(table, sb_scope);
+}
+
 static void
 build_dsdt(GArray *table_data, BIOSLinker *linker,
            AcpiPmInfo *pm, AcpiMiscInfo *misc,
@@ -1885,7 +1916,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
     acpi_data_push(dsdt->buf, sizeof(AcpiTableHeader));
 
     build_dbg_aml(dsdt);
-    if (misc->is_piix4) {
+    if (misc->pm_type == PMTYPE_PIIX4) {
         sb_scope = aml_scope("_SB");
         dev = aml_device("PCI0");
         aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
@@ -1898,9 +1929,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
         build_piix4_pm(dsdt);
         build_piix4_isa_bridge(dsdt);
         build_isa_devices_aml(dsdt);
-        build_piix4_pci_hotplug(dsdt);
+        build_pci_hotplug(dsdt);
         build_piix4_pci0_int(dsdt);
-    } else {
+    } else if (misc->pm_type == PMTYPE_LPC) {
         sb_scope = aml_scope("_SB");
         aml_append(sb_scope,
             aml_operation_region("PCST", AML_SYSTEM_IO, aml_int(0xae00), 0x0c));
@@ -1919,7 +1950,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
         aml_append(dev, aml_name_decl("_UID", aml_int(1)));
         aml_append(dev, aml_name_decl("SUPP", aml_int(0)));
         aml_append(dev, aml_name_decl("CTRL", aml_int(0)));
-        aml_append(dev, build_q35_osc_method());
+        aml_append(dev, build_osc_method());
         aml_append(sb_scope, dev);
         aml_append(dsdt, sb_scope);
 
@@ -1927,6 +1958,21 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
         build_q35_isa_bridge(dsdt);
         build_isa_devices_aml(dsdt);
         build_q35_pci0_int(dsdt);
+    } else { /* misc->pm_type == PMTYPE_LITE */
+        sb_scope = aml_scope("_SB");
+        dev = aml_device("PCI0");
+        aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
+        aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
+        aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
+        aml_append(dev, aml_name_decl("_UID", aml_int(1)));
+        aml_append(dev, aml_name_decl("SUPP", aml_int(0)));
+        aml_append(dev, aml_name_decl("CTRL", aml_int(0)));
+        aml_append(dev, build_osc_method());
+        aml_append(sb_scope, dev);
+        aml_append(dsdt, sb_scope);
+
+        build_pci_hotplug(dsdt);
+        build_lite_pci0_int(dsdt);
     }
 
     build_legacy_cpu_hotplug_aml(dsdt, machine, pm->cpu_hp_io_base);
@@ -1937,7 +1983,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
     {
         aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
 
-        if (misc->is_piix4) {
+        if (misc->pm_type == PMTYPE_PIIX4 || misc->pm_type == PMTYPE_LITE) {
             method = aml_method("_E01", 0, AML_NOTSERIALIZED);
             aml_append(method,
                 aml_acquire(aml_name("\\_SB.PCI0.BLCK"), 0xFFFF));
-- 
1.8.3.1

* [Qemu-devel] [RFC 4/9] acpi: expose data structures and functions of BIOS linker loader
  2016-06-17  8:14 [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Chao Peng
                   ` (2 preceding siblings ...)
  2016-06-17  8:14 ` [Qemu-devel] [RFC 3/9] acpi: add support for pc-lite platform Chao Peng
@ 2016-06-17  8:14 ` Chao Peng
  2016-06-17  8:14 ` [Qemu-devel] [RFC 5/9] acpi: expose acpi_checksum() Chao Peng
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Chao Peng @ 2016-06-17  8:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang

From: Haozhong Zhang <haozhong.zhang@intel.com>

Move the BIOS linker/loader data structures into the public header and
make bios_linker_find_file() non-static, so that later commits can use
them outside bios-linker-loader.c.
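
For readers unfamiliar with the command format, here is a minimal
stand-alone sketch of what ADD_POINTER asks the consumer to do, following
the comment in the header: add the address of the source table to a 1, 2,
4 or 8 byte field at a given offset in the destination table. It assumes a
little-endian (x86) guest. sketch_add_pointer() and its parameters are
hypothetical and are not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not QEMU code.
 *
 * table    - bytes of the destination table
 * offset   - position of the pointer field inside the table
 * size     - width of the field in bytes: 1, 2, 4 or 8
 * src_addr - guest address at which the source table was placed
 */
void sketch_add_pointer(uint8_t *table, uint32_t offset,
                        uint8_t size, uint64_t src_addr)
{
    uint64_t value = 0;
    int i;

    assert(size == 1 || size == 2 || size == 4 || size == 8);

    /* Read the current field contents as a little-endian integer. */
    for (i = 0; i < size; i++) {
        value |= (uint64_t)table[offset + i] << (8 * i);
    }

    /* Unsigned addition; anything above the field width is dropped. */
    value += src_addr;

    /* Store the low 'size' bytes back, little-endian. */
    for (i = 0; i < size; i++) {
        table[offset + i] = (uint8_t)(value >> (8 * i));
    }
}
```

The field normally holds an offset into the source table before patching,
which is why the command adds rather than overwrites.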

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 hw/acpi/bios-linker-loader.c         | 83 +----------------------------------
 include/hw/acpi/bios-linker-loader.h | 85 ++++++++++++++++++++++++++++++++++++
 2 files changed, 86 insertions(+), 82 deletions(-)

diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
index d963ebe..e9c19cf 100644
--- a/hw/acpi/bios-linker-loader.c
+++ b/hw/acpi/bios-linker-loader.c
@@ -21,91 +21,10 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "hw/acpi/bios-linker-loader.h"
-#include "hw/nvram/fw_cfg.h"
 
 #include "qemu/bswap.h"
 
 /*
- * Linker/loader is a paravirtualized interface that passes commands to guest.
- * The commands can be used to request guest to
- * - allocate memory chunks and initialize them from QEMU FW CFG files
- * - link allocated chunks by storing pointer to one chunk into another
- * - calculate ACPI checksum of part of the chunk and store into same chunk
- */
-#define BIOS_LINKER_LOADER_FILESZ FW_CFG_MAX_FILE_PATH
-
-struct BiosLinkerLoaderEntry {
-    uint32_t command;
-    union {
-        /*
-         * COMMAND_ALLOCATE - allocate a table from @alloc.file
-         * subject to @alloc.align alignment (must be power of 2)
-         * and @alloc.zone (can be HIGH or FSEG) requirements.
-         *
-         * Must appear exactly once for each file, and before
-         * this file is referenced by any other command.
-         */
-        struct {
-            char file[BIOS_LINKER_LOADER_FILESZ];
-            uint32_t align;
-            uint8_t zone;
-        } alloc;
-
-        /*
-         * COMMAND_ADD_POINTER - patch the table (originating from
-         * @dest_file) at @pointer.offset, by adding a pointer to the table
-         * originating from @src_file. 1,2,4 or 8 byte unsigned
-         * addition is used depending on @pointer.size.
-         */
-        struct {
-            char dest_file[BIOS_LINKER_LOADER_FILESZ];
-            char src_file[BIOS_LINKER_LOADER_FILESZ];
-            uint32_t offset;
-            uint8_t size;
-        } pointer;
-
-        /*
-         * COMMAND_ADD_CHECKSUM - calculate checksum of the range specified by
-         * @cksum_start and @cksum_length fields,
-         * and then add the value at @cksum.offset.
-         * Checksum simply sums -X for each byte X in the range
-         * using 8-bit math.
-         */
-        struct {
-            char file[BIOS_LINKER_LOADER_FILESZ];
-            uint32_t offset;
-            uint32_t start;
-            uint32_t length;
-        } cksum;
-
-        /* padding */
-        char pad[124];
-    };
-} QEMU_PACKED;
-typedef struct BiosLinkerLoaderEntry BiosLinkerLoaderEntry;
-
-enum {
-    BIOS_LINKER_LOADER_COMMAND_ALLOCATE     = 0x1,
-    BIOS_LINKER_LOADER_COMMAND_ADD_POINTER  = 0x2,
-    BIOS_LINKER_LOADER_COMMAND_ADD_CHECKSUM = 0x3,
-};
-
-enum {
-    BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH = 0x1,
-    BIOS_LINKER_LOADER_ALLOC_ZONE_FSEG = 0x2,
-};
-
-/*
- * BiosLinkerFileEntry:
- *
- * An internal type used for book-keeping file entries
- */
-typedef struct BiosLinkerFileEntry {
-    char *name; /* file name */
-    GArray *blob; /* data accosiated with @name */
-} BiosLinkerFileEntry;
-
-/*
  * bios_linker_loader_init: allocate a new linker object instance.
  *
  * After initialization, linker commands can be added, and will
@@ -137,7 +56,7 @@ void bios_linker_loader_cleanup(BIOSLinker *linker)
     g_free(linker);
 }
 
-static const BiosLinkerFileEntry *
+const BiosLinkerFileEntry *
 bios_linker_find_file(const BIOSLinker *linker, const char *name)
 {
     int i;
diff --git a/include/hw/acpi/bios-linker-loader.h b/include/hw/acpi/bios-linker-loader.h
index fa1e5d1..52c1e44 100644
--- a/include/hw/acpi/bios-linker-loader.h
+++ b/include/hw/acpi/bios-linker-loader.h
@@ -1,12 +1,93 @@
 #ifndef BIOS_LINKER_LOADER_H
 #define BIOS_LINKER_LOADER_H
 
+#include "hw/nvram/fw_cfg.h"
 
 typedef struct BIOSLinker {
     GArray *cmd_blob;
     GArray *file_list;
 } BIOSLinker;
 
+/*
+ * Linker/loader is a paravirtualized interface that passes commands to guest.
+ * The commands can be used to request guest to
+ * - allocate memory chunks and initialize them from QEMU FW CFG files
+ * - link allocated chunks by storing pointer to one chunk into another
+ * - calculate ACPI checksum of part of the chunk and store into same chunk
+ */
+#define BIOS_LINKER_LOADER_FILESZ FW_CFG_MAX_FILE_PATH
+
+struct BiosLinkerLoaderEntry {
+    uint32_t command;
+    union {
+        /*
+         * COMMAND_ALLOCATE - allocate a table from @alloc.file
+         * subject to @alloc.align alignment (must be power of 2)
+         * and @alloc.zone (can be HIGH or FSEG) requirements.
+         *
+         * Must appear exactly once for each file, and before
+         * this file is referenced by any other command.
+         */
+        struct {
+            char file[BIOS_LINKER_LOADER_FILESZ];
+            uint32_t align;
+            uint8_t zone;
+        } alloc;
+
+        /*
+         * COMMAND_ADD_POINTER - patch the table (originating from
+         * @dest_file) at @pointer.offset, by adding a pointer to the table
+         * originating from @src_file. 1,2,4 or 8 byte unsigned
+         * addition is used depending on @pointer.size.
+         */
+        struct {
+            char dest_file[BIOS_LINKER_LOADER_FILESZ];
+            char src_file[BIOS_LINKER_LOADER_FILESZ];
+            uint32_t offset;
+            uint8_t size;
+        } pointer;
+
+        /*
+         * COMMAND_ADD_CHECKSUM - calculate checksum of the range specified by
+         * @cksum_start and @cksum_length fields,
+         * and then add the value at @cksum.offset.
+         * Checksum simply sums -X for each byte X in the range
+         * using 8-bit math.
+         */
+        struct {
+            char file[BIOS_LINKER_LOADER_FILESZ];
+            uint32_t offset;
+            uint32_t start;
+            uint32_t length;
+        } cksum;
+
+        /* padding */
+        char pad[124];
+    };
+} QEMU_PACKED;
+typedef struct BiosLinkerLoaderEntry BiosLinkerLoaderEntry;
+
+enum {
+    BIOS_LINKER_LOADER_COMMAND_ALLOCATE     = 0x1,
+    BIOS_LINKER_LOADER_COMMAND_ADD_POINTER  = 0x2,
+    BIOS_LINKER_LOADER_COMMAND_ADD_CHECKSUM = 0x3,
+};
+
+enum {
+    BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH = 0x1,
+    BIOS_LINKER_LOADER_ALLOC_ZONE_FSEG = 0x2,
+};
+
+/*
+ * BiosLinkerFileEntry:
+ *
+ * An internal type used for book-keeping file entries
+ */
+typedef struct BiosLinkerFileEntry {
+    char *name; /* file name */
+    GArray *blob; /* data associated with @name */
+} BiosLinkerFileEntry;
+
 BIOSLinker *bios_linker_loader_init(void);
 
 void bios_linker_loader_alloc(BIOSLinker *linker,
@@ -27,4 +108,8 @@ void bios_linker_loader_add_pointer(BIOSLinker *linker,
                                     uint32_t src_offset);
 
 void bios_linker_loader_cleanup(BIOSLinker *linker);
+
+const BiosLinkerFileEntry *
+bios_linker_find_file(const BIOSLinker *linker, const char *name);
+
 #endif
-- 
1.8.3.1

* [Qemu-devel] [RFC 5/9] acpi: expose acpi_checksum()
  2016-06-17  8:14 [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Chao Peng
                   ` (3 preceding siblings ...)
  2016-06-17  8:14 ` [Qemu-devel] [RFC 4/9] acpi: expose data structures and functions of BIOS linker loader Chao Peng
@ 2016-06-17  8:14 ` Chao Peng
  2016-06-17  8:14 ` [Qemu-devel] [RFC 6/9] acpi: patch guest ACPI for pc-lite Chao Peng
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Chao Peng @ 2016-06-17  8:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang

From: Haozhong Zhang <haozhong.zhang@intel.com>

Make acpi_checksum() non-static and declare it in acpi.h. It will be
used in later commits.
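
For reference, the ACPI checksum rule is that all bytes of a table,
including the checksum byte itself, sum to zero modulo 256. The sketch
below is a stand-alone illustration of that rule, not the code in
hw/acpi/core.c. The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of -X over every byte X in the range, in 8-bit arithmetic. */
uint8_t sketch_checksum(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        sum = (uint8_t)(sum - data[i]);
    }
    return sum;
}

/* Set the checksum byte so that the whole range sums to zero mod 256. */
void sketch_fix_checksum(uint8_t *table, size_t len, size_t cksum_offset)
{
    assert(cksum_offset < len);
    table[cksum_offset] = 0;    /* field must not contribute to the sum */
    table[cksum_offset] = sketch_checksum(table, len);
}
```

The sketch clears the checksum byte first. The linker's ADD_CHECKSUM
command instead adds the result to whatever is stored there, so it relies
on the field being zero beforehand.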

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 hw/acpi/core.c         | 2 +-
 include/hw/acpi/acpi.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/core.c b/hw/acpi/core.c
index d24b9a9..70ad6ff 100644
--- a/hw/acpi/core.c
+++ b/hw/acpi/core.c
@@ -69,7 +69,7 @@ static void acpi_register_config(void)
 
 opts_init(acpi_register_config);
 
-static int acpi_checksum(const uint8_t *data, int len)
+int acpi_checksum(const uint8_t *data, int len)
 {
     int sum, i;
     sum = 0;
diff --git a/include/hw/acpi/acpi.h b/include/hw/acpi/acpi.h
index c717f15..b23eff8 100644
--- a/include/hw/acpi/acpi.h
+++ b/include/hw/acpi/acpi.h
@@ -188,4 +188,6 @@ struct AcpiSlicOem {
 };
 int acpi_get_slic_oem(AcpiSlicOem *oem);
 
+int acpi_checksum(const uint8_t *data, int len);
+
 #endif /* !QEMU_HW_ACPI_H */
-- 
1.8.3.1

* [Qemu-devel] [RFC 6/9] acpi: patch guest ACPI for pc-lite
  2016-06-17  8:14 [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Chao Peng
                   ` (4 preceding siblings ...)
  2016-06-17  8:14 ` [Qemu-devel] [RFC 5/9] acpi: expose acpi_checksum() Chao Peng
@ 2016-06-17  8:14 ` Chao Peng
  2016-06-17  8:14 ` [Qemu-devel] [RFC 7/9] pc: skip setting CMOS data when RTC device is unavailable Chao Peng
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Chao Peng @ 2016-06-17  8:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang

From: Haozhong Zhang <haozhong.zhang@intel.com>

Traditionally, guest firmware is responsible for patching the ACPI
tables generated by QEMU. pc-lite runs without firmware, so for pc-lite
QEMU has to apply the linker/loader commands itself.
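
Doing the firmware's job means QEMU must pick the guest addresses that
ALLOCATE commands would normally leave to the guest. The sketch below shows
the bump allocation idea behind pc_lite_acpi_zone_alloc() in a stand-alone
form: align the next free address, check the remaining space, advance the
offset. SketchZone, sketch_zone_alloc() and SKETCH_ALLOC_FAIL are
hypothetical names, and unlike the patch the sketch reports failure through
the return value rather than an Error object.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ALLOC_FAIL (~(uint64_t)0)

typedef struct SketchZone {
    uint64_t start;   /* guest-physical base of the zone */
    uint64_t size;    /* total bytes in the zone */
    uint64_t offset;  /* first free byte, relative to start */
} SketchZone;

/*
 * Round the next free address up to 'align' (a power of two), check that
 * 'size' bytes still fit, then advance the free offset.
 * Returns the block's offset within the zone, or SKETCH_ALLOC_FAIL.
 */
uint64_t sketch_zone_alloc(SketchZone *zone, uint64_t size, uint64_t align)
{
    uint64_t addr, offset;

    assert(align != 0 && (align & (align - 1)) == 0);

    addr = (zone->start + zone->offset + align - 1) & ~(align - 1);
    offset = addr - zone->start;
    if (size > zone->size || zone->size - size < offset) {
        return SKETCH_ALLOC_FAIL;   /* zone is left unchanged */
    }
    zone->offset = offset + size;
    return offset;
}
```

With a zone covering the FSEG range used in this patch (0xe0000, size
0x20000), a 20-byte table aligned to 16 lands at offset 0, and a following
36-byte table aligned to 64 lands at offset 64.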

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 hw/acpi/nvdimm.c               |   6 +-
 hw/i386/Makefile.objs          |   2 +-
 hw/i386/acpi-build.c           |  72 +++++-----
 hw/i386/pc_lite_acpi.c         | 299 +++++++++++++++++++++++++++++++++++++++++
 include/hw/i386/pc_lite_acpi.h |  10 ++
 5 files changed, 355 insertions(+), 34 deletions(-)
 create mode 100644 hw/i386/pc_lite_acpi.c
 create mode 100644 include/hw/i386/pc_lite_acpi.h

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index b4c2262..523c744 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -478,8 +478,10 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
 
     state->dsm_mem = g_array_new(false, true /* clear */, 1);
     acpi_data_push(state->dsm_mem, sizeof(NvdimmDsmIn));
-    fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data,
-                    state->dsm_mem->len);
+    if (fw_cfg) {
+        fw_cfg_add_file(fw_cfg, NVDIMM_DSM_MEM_FILE, state->dsm_mem->data,
+                        state->dsm_mem->len);
+    }
 }
 
 #define NVDIMM_COMMON_DSM      "NCAL"
diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index b52d5b8..7d29ec0 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -1,6 +1,6 @@
 obj-$(CONFIG_KVM) += kvm/
 obj-y += multiboot.o
-obj-y += pc.o pc_piix.o pc_q35.o
+obj-y += pc.o pc_piix.o pc_q35.o pc_lite_acpi.o
 obj-y += pc_sysfw.o
 obj-y += intel_iommu.o
 obj-$(CONFIG_XEN) += ../xenpv/ xen/
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 4b5ed96..6532de7 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -59,6 +59,8 @@
 #include "qapi/qmp/qint.h"
 #include "qom/qom-qobject.h"
 
+#include "hw/i386/pc_lite_acpi.h"
+
 /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
  * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
  * a little bit, there should be plenty of free space since the DSDT
@@ -2148,8 +2150,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
     aml_append(scope, aml_name_decl("_S5", pkg));
     aml_append(dsdt, scope);
 
-    /* create fw_cfg node, unconditionally */
-    {
+    /* create fw_cfg node for non pc-lite platforms */
+    if (misc->pm_type != PMTYPE_LITE) {
         /* when using port i/o, the 8-bit data register *always* overlaps
          * with half of the 16-bit control register. Hence, the total size
          * of the i/o region used is FW_CFG_CTL_SIZE; when using DMA, the
@@ -2775,8 +2777,9 @@ void acpi_setup(void)
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
     AcpiBuildTables tables;
     AcpiBuildState *build_state;
+    bool is_pc_lite = !!pm_lite_find();
 
-    if (!pcms->fw_cfg) {
+    if (!pcms->fw_cfg && !is_pc_lite) {
         ACPI_BUILD_DPRINTF("No fw cfg. Bailing out.\n");
         return;
     }
@@ -2799,41 +2802,48 @@ void acpi_setup(void)
     acpi_build(&tables, MACHINE(pcms));
 
     /* Now expose it all to Guest */
-    build_state->table_mr = acpi_add_rom_blob(build_state, tables.table_data,
-                                               ACPI_BUILD_TABLE_FILE,
-                                               ACPI_BUILD_TABLE_MAX_SIZE);
-    assert(build_state->table_mr != NULL);
-
-    build_state->linker_mr =
-        acpi_add_rom_blob(build_state, tables.linker->cmd_blob,
-                          "etc/table-loader", 0);
-
-    fw_cfg_add_file(pcms->fw_cfg, ACPI_BUILD_TPMLOG_FILE,
-                    tables.tcpalog->data, acpi_data_len(tables.tcpalog));
-
-    if (!pcmc->rsdp_in_ram) {
-        /*
-         * Keep for compatibility with old machine types.
-         * Though RSDP is small, its contents isn't immutable, so
-         * we'll update it along with the rest of tables on guest access.
-         */
-        uint32_t rsdp_size = acpi_data_len(tables.rsdp);
+    if (!is_pc_lite) {
+        build_state->table_mr = acpi_add_rom_blob(build_state, tables.table_data,
+                                                  ACPI_BUILD_TABLE_FILE,
+                                                  ACPI_BUILD_TABLE_MAX_SIZE);
+        assert(build_state->table_mr != NULL);
 
-        build_state->rsdp = g_memdup(tables.rsdp->data, rsdp_size);
-        fw_cfg_add_file_callback(pcms->fw_cfg, ACPI_BUILD_RSDP_FILE,
-                                 acpi_build_update, build_state,
-                                 build_state->rsdp, rsdp_size);
-        build_state->rsdp_mr = NULL;
-    } else {
-        build_state->rsdp = NULL;
-        build_state->rsdp_mr = acpi_add_rom_blob(build_state, tables.rsdp,
-                                                  ACPI_BUILD_RSDP_FILE, 0);
+        build_state->linker_mr =
+            acpi_add_rom_blob(build_state, tables.linker->cmd_blob,
+                              "etc/table-loader", 0);
+
+        fw_cfg_add_file(pcms->fw_cfg, ACPI_BUILD_TPMLOG_FILE,
+                        tables.tcpalog->data, acpi_data_len(tables.tcpalog));
+
+        if (!pcmc->rsdp_in_ram) {
+            /*
+             * Keep for compatibility with old machine types.
+             * Though RSDP is small, its contents isn't immutable, so
+             * we'll update it along with the rest of tables on guest access.
+             */
+            uint32_t rsdp_size = acpi_data_len(tables.rsdp);
+
+            build_state->rsdp = g_memdup(tables.rsdp->data, rsdp_size);
+            fw_cfg_add_file_callback(pcms->fw_cfg, ACPI_BUILD_RSDP_FILE,
+                                     acpi_build_update, build_state,
+                                     build_state->rsdp, rsdp_size);
+            build_state->rsdp_mr = NULL;
+        } else {
+            build_state->rsdp = NULL;
+            build_state->rsdp_mr = acpi_add_rom_blob(build_state, tables.rsdp,
+                                                     ACPI_BUILD_RSDP_FILE, 0);
+        }
     }
 
     qemu_register_reset(acpi_build_reset, build_state);
     acpi_build_reset(build_state);
     vmstate_register(NULL, 0, &vmstate_acpi_build, build_state);
 
+    if (is_pc_lite) {
+        pc_lite_acpi_build(pcms, tables.linker, &error_abort);
+        build_state->patched = 1;
+    }
+
     /* Cleanup tables but don't free the memory: we track it
      * in build_state.
      */
diff --git a/hw/i386/pc_lite_acpi.c b/hw/i386/pc_lite_acpi.c
new file mode 100644
index 0000000..01f4394
--- /dev/null
+++ b/hw/i386/pc_lite_acpi.c
@@ -0,0 +1,299 @@
+#include "qemu/osdep.h"
+#include <glib.h>
+#include "qemu-common.h"
+#include "qemu/mmap-alloc.h"
+#include "hw/i386/pc.h"
+#include "hw/i386/pc_lite_acpi.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "exec/memory.h"
+#include "qapi/error.h"
+
+/* #define DEBUG_PC_LITE_ACPI */
+#ifdef DEBUG_PC_LITE_ACPI
+#define pc_lite_acpi_dprintf(fmt, ...)                  \
+    do {                                                \
+        printf("PC_LITE_ACPI: "fmt, ##__VA_ARGS__);     \
+    } while (0)
+#else
+#define pc_lite_acpi_dprintf(fmt, ...)
+#endif
+
+
+typedef
+struct PCLiteAcpiZone {
+    MemoryRegion *mr;
+    hwaddr       start;
+    hwaddr       offset;
+} PCLiteAcpiZone;
+static PCLiteAcpiZone pc_lite_acpi_himem_zone;
+static PCLiteAcpiZone pc_lite_acpi_fseg_zone;
+
+#define PC_LITE_ACPI_HIMEM_SIZE (256 * 1024)
+#define PC_LITE_ACPI_FSEG_SIZE  (0x100000 - 0xe0000)
+
+static PCLiteAcpiZone *pc_lite_acpi_get_zone(uint8_t zone)
+{
+    if (zone == BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH) {
+        return &pc_lite_acpi_himem_zone;
+    } else if (zone == BIOS_LINKER_LOADER_ALLOC_ZONE_FSEG) {
+        return &pc_lite_acpi_fseg_zone;
+    } else {
+        return NULL;
+    }
+}
+
+static int pc_lite_acpi_zone_init(PCLiteAcpiZone *zone, const char *name,
+                                  hwaddr start, uint64_t size)
+{
+    void *buf;
+    MemoryRegion *mr;
+
+    buf = qemu_ram_mmap(-1, size, 0x1000, true);
+    if (buf == MAP_FAILED) {
+        return -1;
+    }
+
+    mr = g_malloc(sizeof(*mr));
+    memory_region_init_ram_ptr(mr, NULL, name, size, buf);
+    memory_region_add_subregion_overlap(get_system_memory(), start, mr, 0);
+    e820_add_entry(start, size, E820_RESERVED);
+
+    zone->mr = mr;
+    zone->start = start;
+    zone->offset = 0;
+
+    return 0;
+}
+
+static void pc_lite_acpi_zones_init(PCMachineState *pcms)
+{
+    uint64_t start;
+
+    assert(pcms->below_4g_mem_size >= PC_LITE_ACPI_HIMEM_SIZE);
+    start = pcms->below_4g_mem_size - PC_LITE_ACPI_HIMEM_SIZE;
+    pc_lite_acpi_zone_init(&pc_lite_acpi_himem_zone, "acpi_himem",
+                           start, PC_LITE_ACPI_HIMEM_SIZE);
+    pc_lite_acpi_zone_init(&pc_lite_acpi_fseg_zone, "acpi_fseg",
+                           0xe0000, PC_LITE_ACPI_FSEG_SIZE);
+}
+
+/* return the offset within the corresponding zone; on failure, set errp */
+static hwaddr pc_lite_acpi_zone_alloc(PCLiteAcpiZone *zone,
+                                      uint64_t size, uint64_t align,
+                                      Error **errp)
+{
+    hwaddr start = zone->start;
+    hwaddr offset = zone->offset;
+    uint64_t max_size = memory_region_size(zone->mr);
+    uint64_t addr;
+    Error *local_err = NULL;
+
+    addr = ROUND_UP(start + offset, align);
+    offset = addr - start;
+    if (size > max_size || max_size - size < offset) {
+        error_setg(&local_err, "Not enough space");
+        goto out;
+    }
+    zone->offset = offset + size;
+
+ out:
+    error_propagate(errp, local_err);
+    return offset;
+}
+
+
+typedef
+struct PCLiteAcpiFileEntry {
+    char         *name;
+    MemoryRegion *mr;
+    hwaddr       offset;
+} PCLiteAcpiFileEntry;
+
+typedef
+struct PCLiteAcpiFiles {
+    GArray *file_list;
+} PCLiteAcpiFiles;
+
+static PCLiteAcpiFiles *pc_lite_acpi_files;
+
+static void pc_lite_acpi_files_init(void)
+{
+    pc_lite_acpi_files = g_new(PCLiteAcpiFiles, 1);
+    pc_lite_acpi_files->file_list = g_array_new(false, true /* clear */,
+                                                sizeof(PCLiteAcpiFileEntry));
+}
+
+static PCLiteAcpiFileEntry *pc_lite_acpi_file_search(const char *name)
+{
+    int i;
+    GArray *file_list = pc_lite_acpi_files->file_list;
+    PCLiteAcpiFileEntry *file;
+
+    for (i = 0; i < file_list->len; i++) {
+        file = &g_array_index(file_list, PCLiteAcpiFileEntry, i);
+        if (!strcmp(file->name, name)) {
+            return file;
+        }
+    }
+    return NULL;
+}
+
+static void pc_lite_acpi_file_add(const char *name,
+                                  MemoryRegion *mr, hwaddr offset)
+{
+    PCLiteAcpiFileEntry file = { g_strdup(name), mr, offset };
+    assert(!pc_lite_acpi_file_search(name));
+    g_array_append_val(pc_lite_acpi_files->file_list, file);
+}
+
+static void *pc_lite_acpi_file_get_ptr(PCLiteAcpiFileEntry *file)
+{
+    void *ptr = memory_region_get_ram_ptr(file->mr);
+    return ptr + file->offset;
+}
+
+static hwaddr pc_lite_acpi_file_get_addr(PCLiteAcpiFileEntry *file)
+{
+    return file->mr->addr + file->offset;
+}
+
+static void pc_lite_acpi_patch_allocate(const BiosLinkerLoaderEntry *cmd,
+                                        const BiosLinkerFileEntry *file,
+                                        Error **errp)
+{
+    PCLiteAcpiZone *zone = pc_lite_acpi_get_zone(cmd->alloc.zone);
+    MemoryRegion *zone_mr = zone ? zone->mr : NULL;
+    GArray *data = file->blob;
+    unsigned size = acpi_data_len(data);
+    hwaddr offset;
+    void *dest;
+    Error *local_err = NULL;
+
+    assert(!strncmp(cmd->alloc.file, file->name, BIOS_LINKER_LOADER_FILESZ));
+
+    if (!zone) {
+        error_setg(&local_err, "Unknown zone type %d of file %s",
+                   cmd->alloc.zone, cmd->alloc.file);
+        goto out;
+    }
+
+    offset = pc_lite_acpi_zone_alloc(zone, size, cmd->alloc.align, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    dest = memory_region_get_ram_ptr(zone_mr);
+    memcpy(dest + offset, data->data, size);
+    memory_region_set_dirty(zone_mr, offset, size);
+
+    pc_lite_acpi_file_add(cmd->alloc.file, zone_mr, offset);
+
+ out:
+    error_propagate(errp, local_err);
+}
+
+static void pc_lite_acpi_patch_add_pointer(const BiosLinkerLoaderEntry *cmd,
+                                           Error **errp)
+{
+    PCLiteAcpiFileEntry *dest_file, *src_file;
+    void *dest;
+    uint64_t pointer = 0;
+    uint32_t offset = cmd->pointer.offset;
+    uint32_t size = cmd->pointer.size;
+    Error *local_err = NULL;
+
+    dest_file = pc_lite_acpi_file_search(cmd->pointer.dest_file);
+    if (!dest_file) {
+        error_setg(&local_err, "dest_file %s not found",
+                   cmd->pointer.dest_file);
+        goto out;
+    }
+    src_file = pc_lite_acpi_file_search(cmd->pointer.src_file);
+    if (!src_file) {
+        error_setg(&local_err, "src_file %s not found",
+                   cmd->pointer.src_file);
+        goto out;
+    }
+
+    dest = pc_lite_acpi_file_get_ptr(dest_file);
+    memcpy(&pointer, dest + offset, size);
+    pointer += pc_lite_acpi_file_get_addr(src_file);
+    memcpy(dest + offset, &pointer, size);
+    memory_region_set_dirty(dest_file->mr, dest_file->offset + offset, size);
+
+ out:
+    error_propagate(errp, local_err);
+}
+
+static void pc_lite_acpi_patch_add_checksum(const BiosLinkerLoaderEntry *cmd,
+                                            Error **errp)
+{
+    PCLiteAcpiFileEntry *file = pc_lite_acpi_file_search(cmd->cksum.file);
+    uint32_t offset = cmd->cksum.offset;
+    uint8_t *dest, *cksum;
+    Error *local_err = NULL;
+
+    if (!file) {
+        error_setg(&local_err, "File %s not found", cmd->cksum.file);
+        goto out;
+    }
+
+    dest = pc_lite_acpi_file_get_ptr(file);
+    cksum = dest + offset;
+    *cksum = acpi_checksum(dest + cmd->cksum.start, cmd->cksum.length);
+    memory_region_set_dirty(file->mr, file->offset + offset, sizeof(*cksum));
+
+ out:
+    error_propagate(errp, local_err);
+}
+
+/**
+ * Patch the guest ACPI tables. The guest BIOS usually does this, but
+ * pc-lite has no BIOS, so it has to be done in QEMU.
+ */
+static void pc_lite_acpi_patch(BIOSLinker *linker, Error **errp)
+{
+    void *cmd_blob_data = linker->cmd_blob->data;
+    unsigned cmd_blob_len = linker->cmd_blob->len;
+    uint64_t offset;
+    const BiosLinkerLoaderEntry *cmd;
+    const BiosLinkerFileEntry *file;
+    Error *local_err = NULL;
+
+    for (offset = 0; offset < cmd_blob_len; offset += sizeof(*cmd)) {
+        cmd = cmd_blob_data + offset;
+
+        switch (cmd->command) {
+        case BIOS_LINKER_LOADER_COMMAND_ALLOCATE:
+            file = bios_linker_find_file(linker, cmd->alloc.file);
+            pc_lite_acpi_patch_allocate(cmd, file, &local_err);
+            break;
+        case BIOS_LINKER_LOADER_COMMAND_ADD_POINTER:
+            pc_lite_acpi_patch_add_pointer(cmd, &local_err);
+            break;
+        case BIOS_LINKER_LOADER_COMMAND_ADD_CHECKSUM:
+            pc_lite_acpi_patch_add_checksum(cmd, &local_err);
+            break;
+        default:
+            pc_lite_acpi_dprintf("Ignore unknown command 0x%x\n", cmd->command);
+            continue;
+        }
+
+        if (local_err) {
+            goto out;
+        }
+    }
+
+ out:
+    error_propagate(errp, local_err);
+}
+
+
+void pc_lite_acpi_build(PCMachineState *pcms, BIOSLinker *linker, Error **errp)
+{
+    pc_lite_acpi_zones_init(pcms);
+    pc_lite_acpi_files_init();
+    pc_lite_acpi_patch(linker, errp);
+}
diff --git a/include/hw/i386/pc_lite_acpi.h b/include/hw/i386/pc_lite_acpi.h
new file mode 100644
index 0000000..aa08415
--- /dev/null
+++ b/include/hw/i386/pc_lite_acpi.h
@@ -0,0 +1,10 @@
+#ifndef HW_I386_PC_LITE_ACPI_H
+#define HW_I386_PC_LITE_ACPI_H
+
+#include "hw/i386/pc.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "qapi/error.h"
+
+void pc_lite_acpi_build(PCMachineState *pcms, BIOSLinker *linker, Error **errp);
+
+#endif
-- 
1.8.3.1
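
The three linker commands handled in this patch (ALLOCATE, ADD_POINTER,
ADD_CHECKSUM) reduce to a small amount of arithmetic. Below is a minimal
stand-alone sketch of that arithmetic, assuming power-of-two alignments and
little-endian table fields. The names zone_t, zone_alloc, add_pointer and
add_checksum are hypothetical and are not part of the patch or of QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PCLiteAcpiZone: a fixed window of guest memory. */
typedef struct {
    uint64_t start;   /* guest-physical base of the zone */
    uint64_t size;    /* total bytes in the zone */
    uint64_t offset;  /* first free byte, relative to start */
} zone_t;

/*
 * ALLOCATE: reserve 'size' bytes aligned to 'align' (a power of two).
 * Returns the offset within the zone, or UINT64_MAX if it does not fit.
 */
static uint64_t zone_alloc(zone_t *z, uint64_t size, uint64_t align)
{
    uint64_t addr, offset;

    assert(align != 0 && (align & (align - 1)) == 0);

    /* Align the absolute guest address, not the zone-relative offset. */
    addr = (z->start + z->offset + align - 1) & ~(align - 1);
    offset = addr - z->start;
    if (size > z->size || z->size - size < offset) {
        return UINT64_MAX;
    }
    z->offset = offset + size;
    return offset;
}

/*
 * ADD_POINTER: the field at table[offset] holds an offset into the source
 * file; adding the source file's guest address turns it into a pointer.
 */
static void add_pointer(uint8_t *table, size_t offset, unsigned size,
                        uint64_t src_addr)
{
    uint64_t value = 0;
    unsigned i;

    assert(size == 1 || size == 2 || size == 4 || size == 8);

    for (i = 0; i < size; i++) {
        value |= (uint64_t)table[offset + i] << (8 * i);
    }
    value += src_addr;
    for (i = 0; i < size; i++) {
        table[offset + i] = (uint8_t)(value >> (8 * i));
    }
}

/*
 * ADD_CHECKSUM: store at table[cksum_offset] the byte that makes
 * table[start .. start + len) sum to zero modulo 256.
 */
static void add_checksum(uint8_t *table, size_t cksum_offset,
                         size_t start, size_t len)
{
    uint8_t sum = 0;
    size_t i;

    table[cksum_offset] = 0;  /* the field must not count towards its own sum */
    for (i = 0; i < len; i++) {
        sum = (uint8_t)(sum + table[start + i]);
    }
    table[cksum_offset] = (uint8_t)(0 - sum);
}
```

The order matters: a table must be allocated before anything can point into
it, and its checksum is only valid once all pointers inside the checksummed
range have been patched.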


* [Qemu-devel] [RFC 7/9] pc: skip setting CMOS data when RTC device is unavailable
  2016-06-17  8:14 [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Chao Peng
                   ` (5 preceding siblings ...)
  2016-06-17  8:14 ` [Qemu-devel] [RFC 6/9] acpi: patch guest ACPI for pc-lite Chao Peng
@ 2016-06-17  8:14 ` Chao Peng
  2016-06-17  8:14 ` [Qemu-devel] [RFC 8/9] pc: support direct loading protected/long mode kernel Chao Peng
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Chao Peng @ 2016-06-17  8:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang

... to make sure that CPU hotplug still works on a new platform that has
no RTC support.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 hw/i386/pc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7198ed5..46ca0e3 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1695,8 +1695,10 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev,
         goto out;
     }
 
-    /* increment the number of CPUs */
-    rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) + 1);
+    if (pcms->rtc) {
+        /* increment the number of CPUs */
+        rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) + 1);
+    }
 
     apic_id.arch_id = cc->get_arch_id(CPU(dev));
     found_cpu = bsearch(&apic_id, pcms->possible_cpus->cpus,
-- 
1.8.3.1


* [Qemu-devel] [RFC 8/9] pc: support direct loading protected/long mode kernel
  2016-06-17  8:14 [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Chao Peng
                   ` (6 preceding siblings ...)
  2016-06-17  8:14 ` [Qemu-devel] [RFC 7/9] pc: skip setting CMOS data when RTC device is unavailable Chao Peng
@ 2016-06-17  8:14 ` Chao Peng
  2016-06-17  8:14 ` [Qemu-devel] [RFC 9/9] pc: introduce light weight PC board pc-lite Chao Peng
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Chao Peng @ 2016-06-17  8:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang

Traditionally, a PC boots through the following steps:

    QEMU -> BIOS -> bootloader -> kernel realmode code
         -> kernel protected/longmode code.

This process takes a lot of time. For a platform like pc-lite we let QEMU
load the protected/long mode kernel directly, skipping several steps
and speeding up the whole boot. We do this by filling the zero page per
the Linux boot protocol and then jumping to the kernel's protected/long
mode entry point. Registers and paging must also be put in the correct
state.

Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
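As an illustration of the zero-page step described in the commit message,
here is a minimal stand-alone sketch that fills the same boot-protocol fields
at the same offsets the patch uses. It writes into a plain byte buffer rather
than guest memory; struct map_entry, put_le and fill_zero_page are
hypothetical names, not QEMU APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ZERO_PAGE_SIZE 4096

/* Hypothetical memory-map entry; the zero page stores it packed in 20 bytes. */
struct map_entry {
    uint64_t addr;
    uint64_t len;
    uint32_t type;
};

/* Store a little-endian value byte by byte, independent of host endianness. */
static void put_le(uint8_t *p, uint64_t v, unsigned bytes)
{
    unsigned i;

    for (i = 0; i < bytes; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

static void fill_zero_page(uint8_t *zp, uint32_t cmdline_addr,
                           const struct map_entry *map, unsigned n)
{
    uint8_t *e;
    unsigned i;

    assert(n <= 128);                     /* room for 128 e820 entries */

    memset(zp, 0, ZERO_PAGE_SIZE);
    zp[0x1e8] = (uint8_t)n;               /* e820_entries */
    put_le(zp + 0x1fe, 0xAA55, 2);        /* hdr.boot_flag */
    put_le(zp + 0x202, 0x53726448, 4);    /* hdr.header, "HdrS" */
    zp[0x210] = 0xFF;                     /* hdr.type_of_loader */
    put_le(zp + 0x228, cmdline_addr, 4);  /* hdr.cmd_line_ptr */

    e = zp + 0x2d0;                       /* e820 table */
    for (i = 0; i < n; i++, e += 20) {
        put_le(e, map[i].addr, 8);
        put_le(e + 8, map[i].len, 8);
        put_le(e + 16, map[i].type, 4);
    }
}
```

The kernel's protected/long mode entry point expects the physical address of
this page in ESI/RSI, which is why the reset code loads BOOT_ZEROPAGE_OFFSET
into R_ESI.
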
 hw/i386/pc.c | 166 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 163 insertions(+), 3 deletions(-)
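
The paging state mentioned in the commit message is the smallest structure
that lets a long mode kernel start: one PML4 entry pointing at a PDPT, and one
PDPT entry that identity-maps the first 1 GiB with a single huge page. The
sketch below spells out the magic values 3 and 0x83 used by
setup_page_tables(); the PTE_* names and build_identity_map are hypothetical
and chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_HUGE     (1ULL << 7)   /* PS bit: in a PDPT entry, a 1 GiB page */

#define TABLE_ENTRIES 512          /* 512 eight-byte entries per 4 KiB table */

/*
 * Build an identity map of guest-physical 0 .. 1 GiB.
 * 'pdpt_phys' is the guest-physical address where 'pdpt' will live.
 */
static void build_identity_map(uint64_t *pml4, uint64_t *pdpt,
                               uint64_t pdpt_phys)
{
    assert((pdpt_phys & 0xfff) == 0);   /* tables are page aligned */

    memset(pml4, 0, TABLE_ENTRIES * sizeof(*pml4));
    memset(pdpt, 0, TABLE_ENTRIES * sizeof(*pdpt));

    /* PML4[0] -> PDPT, present and writable (the "| 3" in the patch). */
    pml4[0] = pdpt_phys | PTE_PRESENT | PTE_WRITABLE;

    /* PDPT[0] -> physical 0 as one 1 GiB page (the 0x83 in the patch). */
    pdpt[0] = 0 | PTE_PRESENT | PTE_WRITABLE | PTE_HUGE;
}
```

Note that 1 GiB pages need CPU support (the pdpe1gb feature), so this layout
assumes a host and CPU model that provide it.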

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 46ca0e3..64ce65c 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -85,6 +85,31 @@
 #define FW_CFG_E820_TABLE (FW_CFG_ARCH_LOCAL + 3)
 #define FW_CFG_HPET (FW_CFG_ARCH_LOCAL + 4)
 
+#define BOOT_GDT                0x500
+#define BOOT_IDT                0x520
+#define BOOT_GDT_NULL           0
+#define BOOT_GDT_CODE           1
+#define BOOT_GDT_DATA           2
+#define BOOT_GDT_TSS            3
+#define BOOT_GDT_MAX            4
+#define BOOT_GDT_FLAGS_CODE     (DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | \
+                                 DESC_R_MASK | DESC_A_MASK | DESC_G_MASK)
+#define BOOT_GDT_FLAGS_DATA     (DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |  \
+                                 DESC_A_MASK | DESC_B_MASK | DESC_G_MASK)
+#define BOOT_GDT_FLAGS_TSS      (DESC_P_MASK | (11 << DESC_TYPE_SHIFT))
+#define BOOT_PML4               0x9000
+#define BOOT_PDPTE              0xA000
+#define BOOT_LOADER_SP          0x8000
+#define BOOT_CMDLINE_OFFSET     0x20000
+#define BOOT_ZEROPAGE_OFFSET    0x7000
+
+#define GDT_ENTRY(flags, base, limit)               \
+       ((((base)  & 0xff000000ULL) << (56-24)) |    \
+       (((flags) & 0x0000f0ffULL) << 40) |          \
+       (((limit) & 0x000f0000ULL) << (48-16)) |     \
+       (((base)  & 0x00ffffffULL) << 16) |          \
+       (((limit) & 0x0000ffffULL)))
+
 #define E820_NR_ENTRIES		16
 
 struct e820_entry {
@@ -98,6 +123,13 @@ struct e820_table {
     struct e820_entry entry[E820_NR_ENTRIES];
 } QEMU_PACKED __attribute((__aligned__(4)));
 
+struct kernel_boot_info {
+    uint64_t entry;
+    bool protected_mode;
+    bool long_mode;
+};
+
+static struct kernel_boot_info boot_info;
 static struct e820_table e820_reserve;
 static struct e820_entry *e820_table;
 static unsigned e820_entries;
@@ -667,6 +699,124 @@ bool e820_get_entry(int idx, uint32_t type, uint64_t *address, uint64_t *length)
     return false;
 }
 
+static void reset_cpu(CPUX86State *env)
+{
+    unsigned int flags = BOOT_GDT_FLAGS_CODE;
+
+    if (boot_info.long_mode) {
+        flags |= DESC_L_MASK;
+    }
+    cpu_x86_load_seg_cache(env, R_CS, BOOT_GDT_CODE * 8, 0, 0xfffff, flags);
+
+    cpu_x86_load_seg_cache(env, R_DS, BOOT_GDT_DATA * 8, 0, 0xfffff,
+                           BOOT_GDT_FLAGS_DATA);
+    cpu_x86_load_seg_cache(env, R_ES, BOOT_GDT_DATA * 8, 0, 0xfffff,
+                           BOOT_GDT_FLAGS_DATA);
+    cpu_x86_load_seg_cache(env, R_FS, BOOT_GDT_DATA * 8, 0, 0xfffff,
+                           BOOT_GDT_FLAGS_DATA);
+    cpu_x86_load_seg_cache(env, R_GS, BOOT_GDT_DATA * 8, 0, 0xfffff,
+                           BOOT_GDT_FLAGS_DATA);
+    cpu_x86_load_seg_cache(env, R_SS, BOOT_GDT_DATA * 8, 0, 0xfffff,
+                           BOOT_GDT_FLAGS_DATA);
+
+    env->gdt.base = BOOT_GDT;
+    env->gdt.limit = BOOT_GDT_MAX * 8 - 1;
+
+    env->idt.base = BOOT_IDT;
+
+    env->tr.selector = BOOT_GDT_TSS * 8;
+    env->tr.flags = BOOT_GDT_FLAGS_TSS;
+
+    env->cr[3] = BOOT_PML4;
+    env->cr[0] |= (CR0_PG_MASK | CR0_PE_MASK);
+
+    if (boot_info.long_mode) {
+        env->cr[4] |= CR4_PAE_MASK;
+        cpu_load_efer(env, env->efer | MSR_EFER_LME | MSR_EFER_LMA);
+    }
+
+    env->regs[R_ESP] = BOOT_LOADER_SP;
+    env->regs[R_ESI] = BOOT_ZEROPAGE_OFFSET;
+    env->eip = boot_info.entry;
+}
+
+static void setup_seg_desc_tables(void)
+{
+    uint64_t idt = 0;
+    uint64_t gdt[BOOT_GDT_MAX] = {
+             [BOOT_GDT_NULL] = GDT_ENTRY(0, 0, 0),
+             [BOOT_GDT_CODE] = GDT_ENTRY(BOOT_GDT_FLAGS_CODE, 0, 0xFFFFF),
+             [BOOT_GDT_DATA] = GDT_ENTRY(BOOT_GDT_FLAGS_DATA, 0, 0xFFFFF),
+             [BOOT_GDT_TSS ] = GDT_ENTRY(BOOT_GDT_FLAGS_TSS, 0, 0xFFFFF)
+            };
+
+    if (boot_info.long_mode) {
+        gdt[BOOT_GDT_CODE] |= (1ULL << (32 + DESC_L_SHIFT));
+    }
+
+    cpu_physical_memory_write((hwaddr)BOOT_GDT, gdt, sizeof(gdt));
+    cpu_physical_memory_write((hwaddr)BOOT_IDT, &idt, sizeof(idt));
+}
+
+static void setup_page_tables(void)
+{
+    void *p;
+    size_t len = 4096;
+
+    p = cpu_physical_memory_map(BOOT_PML4, &len, 1);
+    memset(p, 0, 4096);
+    *(uint64_t*)p = (uint64_t)(BOOT_PDPTE | 3);
+    cpu_physical_memory_unmap(p, len, 1, len);
+
+    len = 4096;
+    p = cpu_physical_memory_map(BOOT_PDPTE, &len, 1);
+    memset(p, 0, 4096);
+    *(uint64_t*)p = 0x83;
+    cpu_physical_memory_unmap(p, len, 1, len);
+}
+
+static void setup_kernel_zero_page(void)
+{
+    int i;
+    uint8_t *zero_page;
+    void *e820_map;
+    size_t zero_page_size = 4096;
+    MachineState *machine = MACHINE(qdev_get_machine());
+    size_t cmdline_size = strlen(machine->kernel_cmdline) + 1;
+
+    cpu_physical_memory_write((hwaddr)BOOT_CMDLINE_OFFSET,
+                               machine->kernel_cmdline, cmdline_size);
+
+    zero_page = cpu_physical_memory_map((hwaddr)BOOT_ZEROPAGE_OFFSET,
+                                        &zero_page_size, 1);
+    memset(zero_page, 0, zero_page_size);
+
+    /* hdr.type_of_loader */
+    zero_page[0x210] = 0xFF;
+    /* hdr.boot_flag */
+    stw_p(zero_page + 0x1fe, 0xAA55);
+    /* hdr.header */
+    stl_p(zero_page + 0x202, 0x53726448);
+    /* hdr.cmd_line_ptr */
+    stl_p(zero_page + 0x228, BOOT_CMDLINE_OFFSET);
+    /* hdr.cmdline_size */
+    stl_p(zero_page + 0x238, cmdline_size);
+    /* e820_entries */
+    zero_page[0x1e8] = e820_entries;
+    /* e820_map */
+    e820_map = zero_page + 0x2d0;
+    for (i = 0; i < e820_entries; i++) {
+        stq_p(e820_map, e820_table[i].address);
+        e820_map += 8;
+        stq_p(e820_map, e820_table[i].length);
+        e820_map += 8;
+        stl_p(e820_map, e820_table[i].type);
+        e820_map += 4;
+    }
+
+    cpu_physical_memory_unmap(zero_page, zero_page_size, 1, zero_page_size);
+}
+
 /* Enables contiguous-apic-ID mode, for compatibility */
 static bool compat_apic_id_mode;
 
@@ -1928,15 +2078,25 @@ static void pc_machine_reset(void)
 
     qemu_devices_reset();
 
-    /* Reset APIC after devices have been reset to cancel
-     * any changes that qemu_devices_reset() might have done.
-     */
     CPU_FOREACH(cs) {
         cpu = X86_CPU(cs);
 
+        /* Reset APIC after devices have been reset to cancel
+         * any changes that qemu_devices_reset() might have done.
+         */
         if (cpu->apic_state) {
             device_reset(cpu->apic_state);
         }
+
+        if (boot_info.protected_mode && cpu_is_bsp(cpu)) {
+            reset_cpu(&cpu->env);
+        }
+    }
+
+    if (boot_info.protected_mode) {
+        setup_seg_desc_tables();
+        setup_page_tables();
+        setup_kernel_zero_page();
     }
 }
 
-- 
1.8.3.1


* [Qemu-devel] [RFC 9/9] pc: introduce light weight PC board pc-lite
  2016-06-17  8:14 [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Chao Peng
                   ` (7 preceding siblings ...)
  2016-06-17  8:14 ` [Qemu-devel] [RFC 8/9] pc: support direct loading protected/long mode kernel Chao Peng
@ 2016-06-17  8:14 ` Chao Peng
  2016-06-17 13:24 ` [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Paolo Bonzini
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Chao Peng @ 2016-06-17  8:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang

The new board gets rid of most legacy devices and mainly supports modern
PCI devices. The BIOS is skipped and an ELF format kernel must be specified;
QEMU boots this kernel directly.

To use it, add "-machine pc-lite -kernel $ELF_KERNEL -append $KERNEL_CMD"
to the QEMU command line.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
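The loader added here decides between protected mode and long mode entry from
the ELF class byte of the kernel image. A minimal stand-alone sketch of that
check follows. Unlike the patch, which leaves validation to load_elf(), the
sketch also verifies the ELF magic itself; elf_ident_bits and elf_kernel_bits
are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define ELF_IDENT_LEN 5   /* magic (4 bytes) plus the EI_CLASS byte */

/* Classify the first bytes of an image: 32, 64, or -1 if not usable ELF. */
static int elf_ident_bits(const unsigned char *ident, size_t len)
{
    if (len < ELF_IDENT_LEN ||
        ident[0] != 0x7f || ident[1] != 'E' ||
        ident[2] != 'L' || ident[3] != 'F') {
        return -1;
    }
    switch (ident[4]) {           /* e_ident[EI_CLASS] */
    case 1:
        return 32;                /* ELFCLASS32: enter in protected mode */
    case 2:
        return 64;                /* ELFCLASS64: enter in long mode */
    default:
        return -1;
    }
}

/* Read the identification bytes of 'path' and classify them. */
static int elf_kernel_bits(const char *path)
{
    unsigned char ident[ELF_IDENT_LEN];
    size_t n;
    FILE *f;

    assert(path != NULL);

    f = fopen(path, "rb");
    if (!f) {
        return -1;
    }
    n = fread(ident, 1, sizeof(ident), f);
    fclose(f);
    return elf_ident_bits(ident, n);
}
```

A bzImage fails this check because it does not start with the ELF magic, so
pc-lite needs the uncompressed vmlinux image.
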
 hw/i386/Makefile.objs |   2 +-
 hw/i386/pc.c          |  91 ++++++++++++++++------
 hw/i386/pc_lite.c     | 205 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/i386/pc_piix.c     |   2 +
 hw/i386/pc_q35.c      |   2 +
 include/hw/i386/pc.h  |   8 ++
 6 files changed, 284 insertions(+), 26 deletions(-)
 create mode 100644 hw/i386/pc_lite.c
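
pc_lite_init() inherits from pc_q35.c the rule for splitting guest RAM around
the PCI hole below 4G. The arithmetic is easy to misread inside the diff, so
here is a minimal stand-alone sketch of it. split_ram is a hypothetical name,
and max_below_4g stands in for the max-ram-below-4g machine option.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Split 'ram_size' bytes of guest RAM into a part below 4G and a part
 * above it. RAM that would reach 0xb0000000 is cut at 2G so that guest
 * addresses stay aligned at 1G boundaries; smaller guests keep all RAM low.
 */
static void split_ram(uint64_t ram_size, uint64_t max_below_4g,
                      uint64_t *below, uint64_t *above)
{
    uint64_t lowmem;

    assert(below != NULL && above != NULL);

    lowmem = ram_size >= 0xb0000000ULL ? 0x80000000ULL : 0xb0000000ULL;

    /* min(QEMU limit, user limit) */
    if (lowmem > max_below_4g) {
        lowmem = max_below_4g;
    }

    if (ram_size >= lowmem) {
        *below = lowmem;
        *above = ram_size - lowmem;
    } else {
        *below = ram_size;
        *above = 0;
    }
}
```

For example, a 4 GiB guest with the default limit ends up with 2 GiB below 4G
and 2 GiB above it, while a 1 GiB guest keeps everything below 4G.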

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 7d29ec0..af3f53d 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -1,6 +1,6 @@
 obj-$(CONFIG_KVM) += kvm/
 obj-y += multiboot.o
-obj-y += pc.o pc_piix.o pc_q35.o pc_lite_acpi.o
+obj-y += pc.o pc_piix.o pc_q35.o pc_lite_acpi.o pc_lite.o
 obj-y += pc_sysfw.o
 obj-y += intel_iommu.o
 obj-$(CONFIG_XEN) += ../xenpv/ xen/
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 64ce65c..0533951 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -962,8 +962,43 @@ static long get_file_size(FILE *f)
     return size;
 }
 
-static void load_linux(PCMachineState *pcms,
-                       FWCfgState *fw_cfg)
+static void load_linux_efi(PCMachineState *pcms)
+{
+    unsigned char class;
+    MachineState *machine = MACHINE(pcms);
+    FILE *file = fopen(machine->kernel_filename, "rb");
+
+    if (!file) {
+        goto err;
+    }
+
+    if (fseek(file, EI_CLASS, SEEK_SET) || fread(&class, 1, 1, file) != 1) {
+        fclose(file);
+        goto err;
+    }
+    fclose(file);
+
+    if (load_elf(machine->kernel_filename, NULL, NULL, &boot_info.entry,
+                   NULL, NULL, 0, EM_X86_64, 0, 0) < 0) {
+        goto err;
+    }
+
+    if (class == ELFCLASS64) {
+        boot_info.long_mode = true;
+    } else if (class != ELFCLASS32) {
+        goto err;
+    }
+
+    boot_info.protected_mode = true;
+    return;
+
+err:
+    fprintf(stderr, "qemu: could not load kernel '%s'\n",
+                    machine->kernel_filename);
+    exit(1);
+}
+
+static void load_linux_bzimage(PCMachineState *pcms, FWCfgState *fw_cfg)
 {
     uint16_t protocol;
     int setup_size, kernel_size, initrd_size = 0, cmdline_size;
@@ -1404,7 +1439,7 @@ void xen_load_linux(PCMachineState *pcms)
     fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
     rom_set_fw(fw_cfg);
 
-    load_linux(pcms, fw_cfg);
+    load_linux_bzimage(pcms, fw_cfg);
     for (i = 0; i < nb_option_roms; i++) {
         assert(!strcmp(option_rom[i].name, "linuxboot.bin") ||
                !strcmp(option_rom[i].name, "multiboot.bin"));
@@ -1421,7 +1456,7 @@ void pc_memory_init(PCMachineState *pcms,
     int linux_boot, i;
     MemoryRegion *ram, *option_rom_mr;
     MemoryRegion *ram_below_4g, *ram_above_4g;
-    FWCfgState *fw_cfg;
+    FWCfgState *fw_cfg = NULL;
     MachineState *machine = MACHINE(pcms);
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 
@@ -1503,36 +1538,42 @@ void pc_memory_init(PCMachineState *pcms,
                                     &pcms->hotplug_memory.mr);
     }
 
-    /* Initialize PC system firmware */
-    pc_system_firmware_init(rom_memory, !pcmc->pci_enabled);
+    if (pcmc->type != PC_MACHINE_TYPE_LITE) {
+        /* Initialize PC system firmware */
+        pc_system_firmware_init(rom_memory, !pcmc->pci_enabled);
 
-    option_rom_mr = g_malloc(sizeof(*option_rom_mr));
-    memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
-                           &error_fatal);
-    vmstate_register_ram_global(option_rom_mr);
-    memory_region_add_subregion_overlap(rom_memory,
-                                        PC_ROM_MIN_VGA,
-                                        option_rom_mr,
-                                        1);
+        option_rom_mr = g_malloc(sizeof(*option_rom_mr));
+        memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
+                               &error_fatal);
+        vmstate_register_ram_global(option_rom_mr);
+        memory_region_add_subregion_overlap(rom_memory,
+                                            PC_ROM_MIN_VGA,
+                                            option_rom_mr,
+                                            1);
 
-    fw_cfg = bochs_bios_init(&address_space_memory, pcms);
+        fw_cfg = bochs_bios_init(&address_space_memory, pcms);
 
-    rom_set_fw(fw_cfg);
+        rom_set_fw(fw_cfg);
 
-    if (pcmc->has_reserved_memory && pcms->hotplug_memory.base) {
-        uint64_t *val = g_malloc(sizeof(*val));
-        PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
-        uint64_t res_mem_end = pcms->hotplug_memory.base;
+        if (pcmc->has_reserved_memory && pcms->hotplug_memory.base) {
+            uint64_t *val = g_malloc(sizeof(*val));
+            PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+            uint64_t res_mem_end = pcms->hotplug_memory.base;
 
-        if (!pcmc->broken_reserved_end) {
-            res_mem_end += memory_region_size(&pcms->hotplug_memory.mr);
+            if (!pcmc->broken_reserved_end) {
+                res_mem_end += memory_region_size(&pcms->hotplug_memory.mr);
+            }
+            *val = cpu_to_le64(ROUND_UP(res_mem_end, 0x1ULL << 30));
+            fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, sizeof(*val));
         }
-        *val = cpu_to_le64(ROUND_UP(res_mem_end, 0x1ULL << 30));
-        fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, sizeof(*val));
     }
 
     if (linux_boot) {
-        load_linux(pcms, fw_cfg);
+        if (pcmc->type == PC_MACHINE_TYPE_LITE) {
+            load_linux_efi(pcms);
+        } else {
+            load_linux_bzimage(pcms, fw_cfg);
+        }
     }
 
     for (i = 0; i < nb_option_roms; i++) {
diff --git a/hw/i386/pc_lite.c b/hw/i386/pc_lite.c
new file mode 100644
index 0000000..3756848
--- /dev/null
+++ b/hw/i386/pc_lite.c
@@ -0,0 +1,205 @@
+/*
+ * Light weight PC chipset
+ *
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ * Copyright (c) 2009, 2010
+ *               Isaku Yamahata <yamahata at valinux co jp>
+ *               VA Linux Systems Japan K.K.
+ * Copyright (C) 2012 Jason Baron <jbaron@redhat.com>
+ * Copyright (C) 2016 Chao Peng <chao.p.peng@linux.intel.com>
+ *
+ * This is based on pc_q35.c, but heavily modified.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include "qemu/osdep.h"
+#include "hw/hw.h"
+#include "hw/loader.h"
+#include "sysemu/arch_init.h"
+#include "hw/boards.h"
+#include "hw/xen/xen.h"
+#include "sysemu/kvm.h"
+#include "hw/kvm/clock.h"
+#include "hw/pci-host/q35.h"
+#include "exec/address-spaces.h"
+#include "hw/i386/ich9.h"
+#include "hw/smbios/smbios.h"
+#include "hw/ide/pci.h"
+#include "qemu/error-report.h"
+#include "migration/migration.h"
+
+static void pc_lite_gsi_handler(void *opaque, int n, int level)
+{
+    GSIState *s = opaque;
+
+    qemu_set_irq(s->ioapic_irq[n], level);
+}
+
+static void pc_lite_init(MachineState *machine)
+{
+    PCMachineState *pcms = PC_MACHINE(machine);
+    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+    PCIBus *host_bus;
+    MemoryRegion *pci_memory;
+    MemoryRegion *rom_memory;
+    MemoryRegion *ram_memory;
+    GSIState *gsi_state;
+    qemu_irq *gsi;
+    ram_addr_t lowmem;
+    DeviceState *pm;
+    MachineClass *mc = MACHINE_GET_CLASS(machine);
+
+    /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
+     * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
+     * also known as MMCFG).
+     * If it doesn't, we need to split it in chunks below and above 4G.
+     * In any case, try to make sure that guest addresses aligned at
+     * 1G boundaries get mapped to host addresses aligned at 1G boundaries.
+     */
+    if (machine->ram_size >= 0xb0000000) {
+        lowmem = 0x80000000;
+    } else {
+        lowmem = 0xb0000000;
+    }
+
+    /* Handle the machine opt max-ram-below-4g.  It is basically doing
+     * min(qemu limit, user limit).
+     */
+    if (lowmem > pcms->max_ram_below_4g) {
+        lowmem = pcms->max_ram_below_4g;
+        if (machine->ram_size - lowmem > lowmem &&
+            lowmem & ((1ULL << 30) - 1)) {
+            error_report("Warning: Large machine and max_ram_below_4g(%"PRIu64
+                         ") not a multiple of 1G; possible bad performance.",
+                         pcms->max_ram_below_4g);
+        }
+    }
+
+    if (machine->ram_size >= lowmem) {
+        pcms->above_4g_mem_size = machine->ram_size - lowmem;
+        pcms->below_4g_mem_size = lowmem;
+    } else {
+        pcms->above_4g_mem_size = 0;
+        pcms->below_4g_mem_size = machine->ram_size;
+    }
+
+    if (xen_enabled()) {
+        xen_hvm_init(pcms, &ram_memory);
+    }
+
+    pc_cpus_init(pcms);
+
+    kvmclock_create();
+
+    /* pci enabled */
+    if (pcmc->pci_enabled) {
+        pci_memory = g_new(MemoryRegion, 1);
+        memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
+        rom_memory = pci_memory;
+    } else {
+        pci_memory = NULL;
+        rom_memory = get_system_memory();
+    }
+
+    pc_guest_info_init(pcms);
+
+    if (pcmc->smbios_defaults) {
+        /* These values are guest ABI, do not change */
+        smbios_set_defaults("QEMU", "Light weight PC",
+                            mc->name, pcmc->smbios_legacy_mode,
+                            pcmc->smbios_uuid_encoded,
+                            SMBIOS_ENTRY_POINT_21);
+    }
+
+    /* allocate ram and load rom/bios */
+    if (!xen_enabled()) {
+        pc_memory_init(pcms, get_system_memory(),
+                       rom_memory, &ram_memory);
+    }
+
+    /* irq lines */
+    gsi_state = g_malloc0(sizeof(*gsi_state));
+    if (kvm_irqchip_in_kernel()) {
+        kvm_pc_setup_irq_routing(pcmc->pci_enabled);
+    }
+    gsi = qemu_allocate_irqs(pc_lite_gsi_handler, gsi_state, GSI_NUM_PINS);
+
+    if (pcmc->pci_enabled) {
+        host_bus = pci_lite_init(get_system_memory(), get_system_io(),
+                                     pci_memory);
+        pcms->bus = host_bus;
+
+        if (acpi_enabled) {
+            pm = pm_lite_init(host_bus, -1, gsi[9]);
+
+            object_property_add_link(OBJECT(machine),
+                                     PC_MACHINE_ACPI_DEVICE_PROP,
+                                     TYPE_HOTPLUG_HANDLER,
+                                     (Object **)&pcms->acpi_dev,
+                                     object_property_allow_set_link,
+                                     OBJ_PROP_LINK_UNREF_ON_RELEASE,
+                                     &error_abort);
+            object_property_set_link(OBJECT(machine), OBJECT(pm),
+                                     PC_MACHINE_ACPI_DEVICE_PROP, &error_abort);
+        }
+    }
+
+    if (pcmc->pci_enabled) {
+        ioapic_init_gsi(gsi_state, "pcilite");
+    }
+
+    pc_register_ferr_irq(gsi[13]);
+
+    if (pcms->acpi_nvdimm_state.is_enabled) {
+        nvdimm_init_acpi_state(&pcms->acpi_nvdimm_state, get_system_io(),
+                               pcms->fw_cfg, OBJECT(pcms));
+    }
+}
+
+#define DEFINE_LITE_MACHINE(suffix, name, compatfn, optionfn) \
+    static void pc_init_##suffix(MachineState *machine) \
+    { \
+        void (*compat)(MachineState *m) = (compatfn); \
+        if (compat) { \
+            compat(machine); \
+        } \
+        pc_lite_init(machine); \
+    } \
+    DEFINE_PC_MACHINE(suffix, name, pc_init_##suffix, optionfn)
+
+
+static void pc_lite_machine_options(MachineClass *m)
+{
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+    m->family = "pc_lite";
+    m->desc = "Light weight PC";
+    m->hot_add_cpu = pc_hot_add_cpu;
+    m->units_per_default_bus = 1;
+    m->no_floppy = 1;
+    pcmc->type = PC_MACHINE_TYPE_LITE;
+}
+
+static void pc_lite_2_7_machine_options(MachineClass *m)
+{
+    pc_lite_machine_options(m);
+    m->alias = "pc-lite";
+}
+
+DEFINE_LITE_MACHINE(v2_7, "pc-lite-2.7", NULL,
+                    pc_lite_2_7_machine_options);
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 53bc968..9d25abc 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -425,11 +425,13 @@ static void pc_xen_hvm_init(MachineState *machine)
 
 static void pc_i440fx_machine_options(MachineClass *m)
 {
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
     m->family = "pc_piix";
     m->desc = "Standard PC (i440FX + PIIX, 1996)";
     m->hot_add_cpu = pc_hot_add_cpu;
     m->default_machine_opts = "firmware=bios-256k.bin";
     m->default_display = "std";
+    pcmc->type = PC_MACHINE_TYPE_PIIX;
 }
 
 static void pc_i440fx_2_7_machine_options(MachineClass *m)
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index e4b541f..af2c201 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -274,6 +274,7 @@ static void pc_q35_init(MachineState *machine)
 
 static void pc_q35_machine_options(MachineClass *m)
 {
+    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
     m->family = "pc_q35";
     m->desc = "Standard PC (Q35 + ICH9, 2009)";
     m->hot_add_cpu = pc_hot_add_cpu;
@@ -281,6 +282,7 @@ static void pc_q35_machine_options(MachineClass *m)
     m->default_machine_opts = "firmware=bios-256k.bin";
     m->default_display = "std";
     m->no_floppy = 1;
+    pcmc->type = PC_MACHINE_TYPE_Q35;
 }
 
 static void pc_q35_2_7_machine_options(MachineClass *m)
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index ad7533b..ef0e3df 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -33,6 +33,12 @@
 #define kvm_ioapic_in_kernel()   0
 #endif
 
+typedef enum {
+    PC_MACHINE_TYPE_PIIX    = 0,
+    PC_MACHINE_TYPE_Q35     = 1,
+    PC_MACHINE_TYPE_LITE    = 2,
+} PCMachineType;
+
 /**
  * PCMachineState:
  * @acpi_dev: link to ACPI PM device that performs ACPI hotplug handling
@@ -136,6 +142,8 @@ struct PCMachineClass {
 
     /* TSC rate migration: */
     bool save_tsc_khz;
+
+    PCMachineType type;
 };
 
 #define TYPE_PC_MACHINE "generic-pc-machine"
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-17  8:14 [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Chao Peng
                   ` (8 preceding siblings ...)
  2016-06-17  8:14 ` [Qemu-devel] [RFC 9/9] pc: introduce light weight PC board pc-lite Chao Peng
@ 2016-06-17 13:24 ` Paolo Bonzini
  2016-06-20  6:01   ` Chao Peng
  2016-06-20 10:36   ` Dr. David Alan Gilbert
  2016-06-19  3:51 ` Michael S. Tsirkin
  2016-06-19  8:21 ` Claudio Fontana
  11 siblings, 2 replies; 32+ messages in thread
From: Paolo Bonzini @ 2016-06-17 13:24 UTC (permalink / raw)
  To: Chao Peng, qemu-devel
  Cc: Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang



On 17/06/2016 10:14, Chao Peng wrote:
> Basically:
> - it removes old ISA devices and support only PCI devices;

I think you need to keep at least the RTC, otherwise where does Linux
get the time of day from?

> - it removes 8259, instead use MSI as much as possible. IOAPIC and PCI
>   PIN are still kept to support ACPI SCI;
> - it supports PCIE ( you can use MMFG instead of 0xcf8/0xcfc port
>   access);
> - it gets rid of legacy firmware interfaces and supports ACPI tables;
> - it supports CPU/memory/PCI hotplug;
> - it supports Linux-guest only at present;
> - You may need carefully configure guest kernel;
> - You are forced to use virtio-serial-pci, old 8250/16550 is not there;

It doesn't support PCIe hotplug though, I think? (Because it doesn't
support PCI bridges and PCIe hotplug doesn't work for root complex
devices).  So is it ACPI-based hotplug?

Lack of 8250/16550 means lack of earlyprintk.  I know the driver is slow
though, so I understand that.

Anyway, I guess all the items above are acceptable.

The ones that I think are "less acceptable" are just two. :)

1) I am a bit worried about introducing a custom northbridge and PM
device.  In principle you could remove most ISA devices (especially
those that take long to initialize) and the 8259 while keeping Q35 and
ICH9.  This would make it easier to choose between having a firmware and
direct guest kernel load.

In general I'd model the lightweight devices around Q35 and ICH9, not
PIIX.  ICH9 in particular is good because it integrates the PM device
and it has an ISA bridge for the RTC and perhaps an optional 8250.  But
it would be even better to use Q35 and ICH9, not model around them. :)

2) this:

> - it loads guest kernel directly, no BIOS, no bootloader, no realmode
>   code;

... which is related to Linux-only support.  How much does this gain
over a minimal firmware (either SeaBIOS with the fw_cfg DMA interface,
or qboot with cbfs in parallel flash)?


> - it supports KVM-host only at present;

Do you know why?

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-17  8:14 [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Chao Peng
                   ` (9 preceding siblings ...)
  2016-06-17 13:24 ` [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Paolo Bonzini
@ 2016-06-19  3:51 ` Michael S. Tsirkin
  2016-06-20  6:12   ` Chao Peng
  2016-06-19  8:21 ` Claudio Fontana
  11 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2016-06-19  3:51 UTC (permalink / raw)
  To: Chao Peng
  Cc: qemu-devel, Igor Mammedov, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang

On Fri, Jun 17, 2016 at 04:14:08AM -0400, Chao Peng wrote:
> - it is FAST;

Any numbers to demonstrate just how fast it is and fast at what?

> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-17  8:14 [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Chao Peng
                   ` (10 preceding siblings ...)
  2016-06-19  3:51 ` Michael S. Tsirkin
@ 2016-06-19  8:21 ` Claudio Fontana
  2016-06-20  6:30   ` Chao Peng
  11 siblings, 1 reply; 32+ messages in thread
From: Claudio Fontana @ 2016-06-19  8:21 UTC (permalink / raw)
  To: Chao Peng, qemu-devel
  Cc: Haozhong Zhang, Xiao Guangrong, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Igor Mammedov,
	Richard Henderson

Hi,

On 17.06.2016 10:14, Chao Peng wrote:
> This patchset is against commit 585fcd4 (Merge remote-tracking branch
> 'remotes/bonzini/tags/for-upstream' into staging) on master branch. I
> also put it on github:
> 
> https://github.com/chao-p/qemu pc-lite-v1
>  
> Although we have run the patchset internally for a while but it is still
> considered as RFC. Any comments (coding style or design issue) are
> welcome.
> 
> Introduction
> ============
> The patch series introduces a new platform pc-lite, which is designed to
> be a virtual and light weight x86 PC platform. It is not designed to be
> compatible with old hardware and system. Instead, It removes the burden
> of legacy devices and emulates new fast hardware as much as possible. It
> is expected to be used together with optimized guest (though unoptimized
> guest works as well) to gain fast booting and small footprint benefits
> that are difficult to achieve for traditional hardware-emulated
> platform.
> 
> Basically:
> - it removes old ISA devices and support only PCI devices;
> - it removes 8259, instead use MSI as much as possible. IOAPIC and PCI
>   PIN are still kept to support ACPI SCI;
> - it supports PCIE ( you can use MMFG instead of 0xcf8/0xcfc port
>   access);
> - it gets rid of legacy firmware interfaces and supports ACPI tables;
> - it loads guest kernel directly, no BIOS, no bootloader, no realmode
>   code;
> - it supports CPU/memory/PCI hotplug;
> - it is FAST;
> 
> However:
> - it supports KVM-host only at present;
> - it supports Linux-guest only at present;
> - You may need carefully configure guest kernel;
> - You are forced to use virtio-serial-pci, old 8250/16550 is not there;
> 
> Want to have a try?
> ===================

The combination of devices removed/supported seems a bit arbitrary to me.
The use in this text of "legacy" and non-"legacy" is also arbitrary if not given a definition.

What is the use case of this new platform?

Care to share your numbers about how FAST is FAST? Like how much time you need to boot from qemu launch to the guest being booted, and on which hardware, and which guest kernel configuration?

Ciao,

Claudio

> 
> Please follow https://github.com/chao-p/qemu-lite-tools.
> 
> Thanks,
> Chao
> 
> Chao Peng (6):
>   acpi: introduce light weight ACPI PM emulation pm-lite
>   pci: introduce light weight PCIE Host emulation pci-lite
>   acpi: add support for pc-lite platform
>   pc: skip setting CMOS data when RTC device is unavailable
>   pc: support direct loading protected/long mode kernel
>   pc: introduce light weight PC board pc-lite
> 
> Haozhong Zhang (3):
>   acpi: expose data structurs and functions of BIOS linker loader
>   acpi: expose acpi_checksum()
>   acpi: patch guest ACPI for pc-lite
> 
>  default-configs/i386-softmmu.mak     |   1 +
>  default-configs/x86_64-softmmu.mak   |   1 +
>  docs/specs/acpi_cpu_hotplug.txt      |   1 +
>  hw/acpi/Makefile.objs                |   2 +-
>  hw/acpi/bios-linker-loader.c         |  83 +------
>  hw/acpi/core.c                       |   2 +-
>  hw/acpi/nvdimm.c                     |   6 +-
>  hw/acpi/pm_lite.c                    | 446 +++++++++++++++++++++++++++++++++++
>  hw/i386/Makefile.objs                |   2 +-
>  hw/i386/acpi-build.c                 | 180 +++++++++-----
>  hw/i386/pc.c                         | 263 ++++++++++++++++++---
>  hw/i386/pc_lite.c                    | 205 ++++++++++++++++
>  hw/i386/pc_lite_acpi.c               | 299 +++++++++++++++++++++++
>  hw/i386/pc_piix.c                    |   2 +
>  hw/i386/pc_q35.c                     |   2 +
>  hw/pci-host/Makefile.objs            |   1 +
>  hw/pci-host/pci_lite.c               | 259 ++++++++++++++++++++
>  include/hw/acpi/acpi.h               |   2 +
>  include/hw/acpi/bios-linker-loader.h |  85 +++++++
>  include/hw/acpi/pc-hotplug.h         |   1 +
>  include/hw/acpi/pm_lite.h            |   6 +
>  include/hw/i386/pc.h                 |  17 ++
>  include/hw/i386/pc_lite_acpi.h       |  10 +
>  23 files changed, 1697 insertions(+), 179 deletions(-)
>  create mode 100644 hw/acpi/pm_lite.c
>  create mode 100644 hw/i386/pc_lite.c
>  create mode 100644 hw/i386/pc_lite_acpi.c
>  create mode 100644 hw/pci-host/pci_lite.c
>  create mode 100644 include/hw/acpi/pm_lite.h
>  create mode 100644 include/hw/i386/pc_lite_acpi.h
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-17 13:24 ` [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Paolo Bonzini
@ 2016-06-20  6:01   ` Chao Peng
  2016-06-20  6:54     ` Paolo Bonzini
  2016-06-20 10:36   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 32+ messages in thread
From: Chao Peng @ 2016-06-20  6:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang, anthony.xu

On Fri, Jun 17, 2016 at 03:24:59PM +0200, Paolo Bonzini wrote:
> 
> 
> On 17/06/2016 10:14, Chao Peng wrote:
> > Basically:
> > - it removes old ISA devices and support only PCI devices;
> 
> I think you need to keep at least the RTC, otherwise where does Linux
> get the time of day from?

PV clock will provide that.

> 
> It doesn't support PCIe hotplug though, I think? (Because it doesn't
> support PCI bridges and PCIe hotplug doesn't work for root complex
> devices).  So is it ACPI-based hotplug?

Yes, this is ACPI-based.

> 
> Lack of 8250/16550 means lack of earlyprintk.  I know the driver is slow
> though, so I understand that.

Understood. It might make debugging a little harder; one solution would be
to add an 8250/16550 in debug builds?

> 
> Anyway, I guess all the items above are acceptable.
> 
> The ones that I think are "less acceptable" are just two. :)
> 
> 1) I am a bit worried about introducing a custom northbridge and PM
> device.  In principle you could remove most ISA devices (especially
> those that take long to initialize) and the 8259 while keeping Q35 and
> ICH9.  This would make it easier to choose between having a firmware and
> direct guest kernel load.
> 
> In general I'd model the lightweight devices around Q35 and ICH9, not
> PIIX.  ICH9 in particular is good because it integrates the PM device
> and it has an ISA bridge for the RTC and perhaps an optional 8250.  But
> it would be even better to use Q35 and ICH9, not model around them. :)

We actually have patches for Q35 + ICH9, and they do exactly what you
describe here. We added a new platform only to:
 1) keep both the Q35 and 'lite' code clean, and
 2) avoid exposing two different Q35 implementations to the guest.

One example: we saw that SMRAM/PAM in Q35 slows down guest boot, so we
want to strip them out. We could change Q35 for our case (ifdef or
whatever), but this changes how the guest sees it and may also not
conform to the Q35 spec.

I don't insist strongly on this. Reusing Q35 in QEMU would also allow code
elsewhere (such as the BIOS) to be reused, provided it causes no problems.

> 
> 2) this:
> 
> > - it loads guest kernel directly, no BIOS, no bootloader, no realmode
> >   code;
> 
> ... which is related to Linux-only support.  How much does this gain
> over a minimal firmware (either SeaBIOS with the fw_cfg DMA interface,
> or qboot with cbfs in parallel flash)?

We have tried the Q35 version (as described above) with both SeaBIOS and qboot.
The best time we have seen with an optimized BIOS is ~15ms. Adding the time
spent in the kernel's real mode code, the total overhead compared to the
current Linux-aware implementation is more than 40ms. That still sounds like
a little too much for us.

One solution is to support both direct Linux booting and BIOS booting with a
little overhead (but with more flexibility).

> 
> 
> > - it supports KVM-host only at present;
> 
> Do you know why?

Just because we have never tested that on other hypervisors (like Xen).

Thanks for comments and suggestions.

Chao

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-19  3:51 ` Michael S. Tsirkin
@ 2016-06-20  6:12   ` Chao Peng
  2016-06-23 12:55     ` Daniel P. Berrange
  0 siblings, 1 reply; 32+ messages in thread
From: Chao Peng @ 2016-06-20  6:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Igor Mammedov, Xiao Guangrong, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang

On Sun, Jun 19, 2016 at 06:51:04AM +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 17, 2016 at 04:14:08AM -0400, Chao Peng wrote:
> > - it is FAST;
> 
> Any numbers to demonstrate just how fast it is and fast at what?

On a 2.30GHz Haswell server, guest kernel boot time is 59.9ms when
following the test steps listed at

https://github.com/chao-p/qemu-lite-tools

Running the same test with "-machine q35", the guest kernel boot
time is 129.8ms. There is an additional 75ms in SeaBIOS for the Q35 case.
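
As a quick check of the two kernel boot figures above, here is a minimal C sketch of the arithmetic. The helper names are made up for illustration and are not part of the patchset; the SeaBIOS time is left out because it is reported separately. With these inputs the difference is 69.9ms, roughly 54% of the Q35 figure.

```c
#include <stdio.h>

/* Guest kernel boot times reported in this thread, in milliseconds. */
#define PC_LITE_BOOT_MS 59.9
#define Q35_BOOT_MS     129.8

/* Absolute saving of the improved time over the baseline, in ms. */
static double boot_saving_ms(double baseline_ms, double improved_ms)
{
    return baseline_ms - improved_ms;
}

/* Relative saving as a percentage of the baseline. */
static double boot_saving_percent(double baseline_ms, double improved_ms)
{
    return 100.0 * (baseline_ms - improved_ms) / baseline_ms;
}

/* Print the comparison for the two figures quoted above. */
static void print_boot_comparison(void)
{
    printf("saved %.1f ms (%.1f%% of the q35 kernel boot time)\n",
           boot_saving_ms(Q35_BOOT_MS, PC_LITE_BOOT_MS),
           boot_saving_percent(Q35_BOOT_MS, PC_LITE_BOOT_MS));
}
```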

Thanks,
Chao

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-19  8:21 ` Claudio Fontana
@ 2016-06-20  6:30   ` Chao Peng
  0 siblings, 0 replies; 32+ messages in thread
From: Chao Peng @ 2016-06-20  6:30 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: qemu-devel, Haozhong Zhang, Xiao Guangrong, Eduardo Habkost,
	Michael S. Tsirkin, Paolo Bonzini, Igor Mammedov,
	Richard Henderson

> 
> The combination of devices removed/supported seems a bit arbitrary to me.
> The use in this text of "legacy" and non-"legacy" is also arbitrary if not given a definition.

The criterion here is to remove the devices that perform poorly.
You can think of legacy devices here as ISA devices. Most of them are
emulated with slow port IO and slow line-based IRQ handling.
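
To illustrate the port IO point for PCI configuration access (the cover letter contrasts 0xcf8/0xcfc with memory-mapped config), here is a minimal C sketch of the two standard address encodings. The function names are made up for illustration. The legacy mechanism needs two port accesses per config read (write the address to 0xCF8, then read 0xCFC), while the memory-mapped mechanism needs a single access at an offset from the MMCFG base.

```c
#include <stdint.h>

/* Legacy mechanism: value written to port 0xCF8 before accessing 0xCFC. */
static uint32_t pci_cf8_address(uint8_t bus, uint8_t dev, uint8_t fn,
                                uint8_t reg)
{
    return 0x80000000u                      /* enable bit */
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1f) << 11)
         | ((uint32_t)(fn & 0x07) << 8)
         | (reg & 0xfcu);                   /* dword-aligned register */
}

/* Memory-mapped mechanism (ECAM/MMCFG): byte offset from the MMCFG base. */
static uint64_t pci_ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn,
                                uint16_t reg)
{
    return ((uint64_t)bus << 20)
         | ((uint64_t)(dev & 0x1f) << 15)
         | ((uint64_t)(fn & 0x07) << 12)
         | (reg & 0xfffu);                  /* 4 KiB of config space */
}
```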

> 
> What is the use case of this new platform?

You can have a look at Clear Containers
(https://clearlinux.org/features/clear-containers) or other similar
technologies. Basically they try to provide a container-like service with
VT to improve security. Clear Containers employ kvm-tool to gain the 'lite'
benefit. This patchset, however, tries to achieve the same with QEMU.

> 
> Care to share your numbers about how FAST is FAST? Like how much time you need to boot from qemu launch to the guest being booted, and on which hardware, and which guest kernel configuration?

The kernel .config is already in the repo linked below. For guest kernel
and BIOS performance data, please see my other reply to mst.
QEMU startup time itself, however, is mostly not improved by this
patchset; the optimized QEMU we normally see takes ~45ms.

Thanks,
Chao

> 
> Ciao,
> 
> Claudio
> 
> > 
> > Please follow https://github.com/chao-p/qemu-lite-tools.
> > 
> > Thanks,
> > Chao

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-20  6:01   ` Chao Peng
@ 2016-06-20  6:54     ` Paolo Bonzini
  2016-06-20 12:31       ` Stefan Hajnoczi
                         ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Paolo Bonzini @ 2016-06-20  6:54 UTC (permalink / raw)
  To: Chao Peng
  Cc: qemu-devel, Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang, anthony.xu



On 20/06/2016 08:01, Chao Peng wrote:
> On Fri, Jun 17, 2016 at 03:24:59PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 17/06/2016 10:14, Chao Peng wrote:
>>> Basically:
>>> - it removes old ISA devices and support only PCI devices;
>>
>> I think you need to keep at least the RTC, otherwise where does Linux
>> get the time of day from?
> 
> PV clock will provide that.

It's KVM only, though.  Sometimes TCG is useful for debugging.

>> Lack of 8250/16550 means lack of earlyprintk.  I know the driver is slow
>> though, so I understand that.
> 
> Understand, it might be a little bit hard for debugging, one solution
> would be adding 8250/16550 in debug build?

The serial port is optional anyway, it's not there unless you specify
"-serial stdio" or similar.

> We actually have patches for Q35 + ICH9 and it does exactly the same
> thing you described here. Adding a new one is just:
>  1). to keep both Q35 and 'lite' code clean, and
>  2). don't expose two different Q35 implementations to guest.

It would be nice to at least see the patches. :)

I think a lightweight q35 platform that can run the usual firmware could
be acceptable in QEMU.

>> 2) this:
>>
>>> - it loads guest kernel directly, no BIOS, no bootloader, no realmode
>>>   code;
>>
>> ... which is related to Linux-only support.  How much does this gain
>> over a minimal firmware (either SeaBIOS with the fw_cfg DMA interface,
>> or qboot with cbfs in parallel flash)?
> 
> We have tried Q35 version (as described above) with both SeaBIOS and qboot.
> The 'perfect' time with optimized BIOS we have seen is ~15ms, with the
> additional time in kernel real mode code, the total time overhead comparing
> to current Linux-aware implementation is more than 40ms. This sounds still
> a little too much for us.

I guess it is related to real mode decompression code?

My main issue is that there are other things that the firmware does.
Not all of them are necessary (e.g. SMRAM is not needed, most PCI
devices need not be initialized), but in general we don't like putting
code in QEMU that modifies the guest state.  For example another Intel
person is adding code to SeaBIOS that initializes the feature control MSR.

I wonder if Linux could run as a multiboot-compliant ELF file, and what
the performance would be...  Multiboot omits the real mode stub.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-17 13:24 ` [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Paolo Bonzini
  2016-06-20  6:01   ` Chao Peng
@ 2016-06-20 10:36   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 32+ messages in thread
From: Dr. David Alan Gilbert @ 2016-06-20 10:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Chao Peng, qemu-devel, Haozhong Zhang, Xiao Guangrong,
	Eduardo Habkost, Michael S. Tsirkin, Igor Mammedov,
	Richard Henderson

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> 
> 
> On 17/06/2016 10:14, Chao Peng wrote:
> > Basically:
> > - it removes old ISA devices and support only PCI devices;
> 
> I think you need to keep at least the RTC, otherwise where does Linux
> get the time of day from?
> 
> > - it removes 8259, instead use MSI as much as possible. IOAPIC and PCI
> >   PIN are still kept to support ACPI SCI;
> > - it supports PCIE ( you can use MMFG instead of 0xcf8/0xcfc port
> >   access);
> > - it gets rid of legacy firmware interfaces and supports ACPI tables;
> > - it supports CPU/memory/PCI hotplug;
> > - it supports Linux-guest only at present;
> > - You may need carefully configure guest kernel;
> > - You are forced to use virtio-serial-pci, old 8250/16550 is not there;
> 
> It doesn't support PCIe hotplug though, I think? (Because it doesn't
> support PCI bridges and PCIe hotplug doesn't work for root complex
> devices).  So is it ACPI-based hotplug?
> 
> Lack of 8250/16550 means lack of earlyprintk.  I know the driver is slow
> though, so I understand that.

You could always define a different way to get early debug; for example
a chunk of RAM at a known address seems fine for a VM.
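
A minimal sketch of what such a RAM-based log could look like, in C. Everything here is hypothetical: the layout, the magic value and the function names are invented for illustration, and a real guest would place the structure at an agreed physical address rather than take a pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define EARLY_LOG_MAGIC 0x454c4f47u /* arbitrary tag, ASCII "ELOG" */

/* Hypothetical layout of a debug log kept in guest RAM. */
struct early_log {
    uint32_t magic;      /* lets the host see that the log was set up */
    uint32_t write_pos;  /* total bytes written; index wraps via modulo */
    char     data[4096]; /* ring buffer of log bytes */
};

/* Reset the log and mark it valid. */
static void early_log_init(struct early_log *log)
{
    log->write_pos = 0;
    log->magic = EARLY_LOG_MAGIC;
}

/* Append a string; old bytes are overwritten once the buffer is full. */
static void early_log_puts(struct early_log *log, const char *s)
{
    while (*s) {
        log->data[log->write_pos % sizeof(log->data)] = *s++;
        log->write_pos++;
    }
}
```

The host side would then read the same region of guest memory and print bytes up to write_pos, with no device emulation involved.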

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-20  6:54     ` Paolo Bonzini
@ 2016-06-20 12:31       ` Stefan Hajnoczi
  2016-06-20 13:00         ` Paolo Bonzini
  2016-06-21  1:23       ` Chao Peng
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Stefan Hajnoczi @ 2016-06-20 12:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Chao Peng, Haozhong Zhang, Xiao Guangrong, Eduardo Habkost,
	Michael S. Tsirkin, qemu-devel, anthony.xu, Igor Mammedov,
	Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 342 bytes --]

On Mon, Jun 20, 2016 at 08:54:10AM +0200, Paolo Bonzini wrote:
> I wonder if Linux could run as a multiboot-compliant ELF file, and what
> the performance would be...  Multiboot omits the real mode stub.

The Linux boot protocol does not require real mode.  I think "64-bit
BOOT PROTOCOL" in Documentation/x86/boot.txt could be used.
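
For reference, a minimal C sketch of how a loader might check whether a bzImage advertises the 64-bit entry point. The field offsets and the XLF_KERNEL_64 bit are taken from Documentation/x86/boot.txt; the function names are made up for illustration and this is not the code from the patchset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets into the bzImage setup header, per Documentation/x86/boot.txt. */
#define BOOT_FLAG_OFFSET   0x1fe     /* must hold 0xAA55 */
#define HDR_MAGIC_OFFSET   0x202     /* must hold "HdrS" */
#define HDR_VERSION_OFFSET 0x206     /* boot protocol version */
#define XLOADFLAGS_OFFSET  0x236     /* present from protocol 2.12 */
#define XLF_KERNEL_64      (1u << 0) /* kernel has the 64-bit entry point */

/* Read a little-endian 16-bit field. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Return 1 if the image advertises the 64-bit entry point, else 0. */
static int kernel_has_64bit_entry(const uint8_t *image, size_t len)
{
    if (len < XLOADFLAGS_OFFSET + 2) {
        return 0;
    }
    if (read_le16(image + BOOT_FLAG_OFFSET) != 0xaa55) {
        return 0;
    }
    if (memcmp(image + HDR_MAGIC_OFFSET, "HdrS", 4) != 0) {
        return 0;
    }
    if (read_le16(image + HDR_VERSION_OFFSET) < 0x020c) {
        return 0; /* xloadflags does not exist before protocol 2.12 */
    }
    return (read_le16(image + XLOADFLAGS_OFFSET) & XLF_KERNEL_64) != 0;
}
```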

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-20 12:31       ` Stefan Hajnoczi
@ 2016-06-20 13:00         ` Paolo Bonzini
  0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2016-06-20 13:00 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Chao Peng, Haozhong Zhang, Xiao Guangrong, Eduardo Habkost,
	Michael S. Tsirkin, qemu-devel, anthony.xu, Igor Mammedov,
	Richard Henderson



On 20/06/2016 14:31, Stefan Hajnoczi wrote:
> On Mon, Jun 20, 2016 at 08:54:10AM +0200, Paolo Bonzini wrote:
> > I wonder if Linux could run as a multiboot-compliant ELF file, and what
> > the performance would be...  Multiboot omits the real mode stub.
>
> The Linux boot protocol does not require real mode.  I think "64-bit
> BOOT PROTOCOL" in Documentation/x86/boot.txt could be used.

Yes, the real mode is only needed to decompress the kernel, to retrieve
the e820 memory map, and then invoke the Linux boot protocol.  However,
neither QEMU nor (I think) GRUB can boot a vmlinux file.  So I wondered
if it would be possible to compile Linux in a format that is not
compressed (for speed) and can be invoked by both QEMU and GRUB.

Multiboot seems interesting because it has other advantages.  For
example it supports modules, so you can use it with an initrd, and it
passes the e820 data directly to the loaded kernel.
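
For readers unfamiliar with Multiboot, a minimal C sketch of the (version 1) header a kernel image would have to carry. The magic value, the two flag bits and the checksum rule come from the Multiboot specification; the helper names are made up for illustration, and nothing here says how Linux would actually be built this way.

```c
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u
#define MULTIBOOT_PAGE_ALIGN   (1u << 0) /* align modules (e.g. initrd) */
#define MULTIBOOT_MEMORY_INFO  (1u << 1) /* ask loader for memory info */

/* The three mandatory fields of a Multiboot 1 header. */
struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum; /* magic + flags + checksum must be 0 mod 2^32 */
};

/* Build a header whose checksum satisfies the rule above. */
static struct multiboot_header make_multiboot_header(uint32_t flags)
{
    struct multiboot_header h;

    h.magic = MULTIBOOT_HEADER_MAGIC;
    h.flags = flags;
    h.checksum = 0u - (MULTIBOOT_HEADER_MAGIC + flags);
    return h;
}

/* Check a header the way a loader would before trusting it. */
static int multiboot_header_valid(const struct multiboot_header *h)
{
    return h->magic == MULTIBOOT_HEADER_MAGIC &&
           (uint32_t)(h->magic + h->flags + h->checksum) == 0;
}
```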

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-20  6:54     ` Paolo Bonzini
  2016-06-20 12:31       ` Stefan Hajnoczi
@ 2016-06-21  1:23       ` Chao Peng
  2016-06-21 16:44       ` Michael S. Tsirkin
  2016-06-23  8:32       ` Chao Peng
  3 siblings, 0 replies; 32+ messages in thread
From: Chao Peng @ 2016-06-21  1:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Haozhong Zhang, Xiao Guangrong, Eduardo Habkost,
	Michael S. Tsirkin, qemu-devel, anthony.xu, Igor Mammedov,
	Richard Henderson

> It would be nice to at least see the patches. :)
> 
> I think a lightweight q35 platform that can run the usual firmware could
> be acceptable in QEMU.

OK, I will send out v2.

> 
> >> 2) this:
> >>
> >>> - it loads guest kernel directly, no BIOS, no bootloader, no realmode
> >>>   code;
> >>
> >> ... which is related to Linux-only support.  How much does this gain
> >> over a minimal firmware (either SeaBIOS with the fw_cfg DMA interface,
> >> or qboot with cbfs in parallel flash)?
> > 
> > We have tried Q35 version (as described above) with both SeaBIOS and qboot.
> > The 'perfect' time with optimized BIOS we have seen is ~15ms, with the
> > additional time in kernel real mode code, the total time overhead comparing
> > to current Linux-aware implementation is more than 40ms. This sounds still
> > a little too much for us.
> 
> I guess it is related to real mode decompression code?

Yes, that's the major part.

> 
> My main issue is that there are other things that the firmware does.
> Not all of them are necessary (e.g. SMRAM is not needed, most PCI
> devices need not be initialized), but in general we don't like putting
> code in QEMU that modifies the guest state.  For example another Intel
> person is adding code to SeaBIOS that initializes the feature control MSR.
> 
> I wonder if Linux could run as a multiboot-compliant ELF file, and what
> the performance would be...  Multiboot omits the real mode stub.
> 

I can look into this. Thanks.

Chao

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-20  6:54     ` Paolo Bonzini
  2016-06-20 12:31       ` Stefan Hajnoczi
  2016-06-21  1:23       ` Chao Peng
@ 2016-06-21 16:44       ` Michael S. Tsirkin
  2016-06-23  8:32       ` Chao Peng
  3 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2016-06-21 16:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Chao Peng, qemu-devel, Igor Mammedov, Xiao Guangrong,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang, anthony.xu

On Mon, Jun 20, 2016 at 08:54:10AM +0200, Paolo Bonzini wrote:
> 
> 
> On 20/06/2016 08:01, Chao Peng wrote:
> > On Fri, Jun 17, 2016 at 03:24:59PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 17/06/2016 10:14, Chao Peng wrote:
> >>> Basically:
> >>> - it removes old ISA devices and support only PCI devices;
> >>
> >> I think you need to keep at least the RTC, otherwise where does Linux
> >> get the time of day from?
> > 
> > PV clock will provide that.
> 
> It's KVM only, though.  Sometimes TCG is useful for debugging.
> 
> >> Lack of 8250/16550 means lack of earlyprintk.  I know the driver is slow
> >> though, so I understand that.
> > 
> > Understand, it might be a little bit hard for debugging, one solution
> > would be adding 8250/16550 in debug build?
> 
> The serial port is optional anyway, it's not there unless you specify
> "-serial stdio" or similar.
> 
> > We actually have patches for Q35 + ICH9 and it does exactly the same
> > thing you described here. Adding a new one is just:
> >  1). to keep both Q35 and 'lite' code clean, and
> >  2). don't expose two different Q35 implementations to guest.
> 
> It would be nice to at least see the patches. :)
> 
> I think a lightweight q35 platform that can run the usual firmware could
> be acceptable in QEMU.

I agree.

> >> 2) this:
> >>
> >>> - it loads guest kernel directly, no BIOS, no bootloader, no realmode
> >>>   code;
> >>
> >> ... which is related to Linux-only support.  How much does this gain
> >> over a minimal firmware (either SeaBIOS with the fw_cfg DMA interface,
> >> or qboot with cbfs in parallel flash)?
> > 
> > We have tried Q35 version (as described above) with both SeaBIOS and qboot.
> > The 'perfect' time with optimized BIOS we have seen is ~15ms, with the
> > additional time in kernel real mode code, the total time overhead comparing
> > to current Linux-aware implementation is more than 40ms. This sounds still
> > a little too much for us.
> 
> I guess it is related to real mode decompression code?
> 
> My main issue is that there are other things that the firmware does.
> Not all of them are necessary (e.g. SMRAM is not needed, most PCI
> devices need not be initialized),

BTW there's a CMOS flag that says the guest is PNP, so only boot devices
need to be initialized. SeaBIOS could use it and skip non-bootable
PCI devices.

> but in general we don't like putting
> code in QEMU that modifies the guest state.  For example another Intel
> person is adding code to SeaBIOS that initializes the feature control MSR.
> 
> I wonder if Linux could run as a multiboot-compliant ELF file, and what
> the performance would be...  Multiboot omits the real mode stub.
> 
> Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-20  6:54     ` Paolo Bonzini
                         ` (2 preceding siblings ...)
  2016-06-21 16:44       ` Michael S. Tsirkin
@ 2016-06-23  8:32       ` Chao Peng
  2016-06-23 12:44         ` Paolo Bonzini
  3 siblings, 1 reply; 32+ messages in thread
From: Chao Peng @ 2016-06-23  8:32 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang, anthony.xu

> 
> I think a lightweight q35 platform that can run the usual firmware could
> be acceptable in QEMU.

After rethinking the target usage model, I'm not sure this is a good
idea.

The original usage model is to replace kvm-tool with QEMU for Clear
Containers (https://clearlinux.org/features/clear-containers). The goal is
not to present the guest with a real PC platform but a totally virtual
one. Every bit of boot time saved is important because we are trying to
achieve results comparable to those of native Linux containers.

With this usage model, I doubt introducing a firmware layer is a good
idea:

    On one side, even with an optimized and compact qboot, firmware still
    costs us ~15ms. This is not a small value because the current Linux
    kernel takes only ~50ms (and we are still working on optimizing it).
    And when you look at SeaBIOS or qboot, almost all of the code is
    useless for this usage model. It does things that are important for
    traditional PC booting but spends 15ms on work that is useless for us.
    (It is really not easy to save 15ms elsewhere, for example in Linux.
    Personally I lean toward changing the architecture for this new usage
    model, e.g. eliminating firmware.)

    On the other side, even if the new pc-lite platform boots with
    firmware, that does not mean it can support non-Linux systems like
    Windows. So in general I don't see the benefit of introducing a
    firmware layer.

Besides, I'm also not quite sure that building around Q35 is the best
solution:

    The problem with Q35 is that some features like SMM/SMRAM/PAM slow down
    booting even though we never actually use them. Removing these
    features, however, makes the guest see a different feature set for the
    same device, and it also prevents us from doing further optimizations
    on that in the guest.

I would really appreciate any further suggestions, or other usage models
you may have in mind.

Thanks,
Chao

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-23  8:32       ` Chao Peng
@ 2016-06-23 12:44         ` Paolo Bonzini
  2016-06-24  6:39           ` Claudio Fontana
  2016-06-28  9:27           ` Chao Peng
  0 siblings, 2 replies; 32+ messages in thread
From: Paolo Bonzini @ 2016-06-23 12:44 UTC (permalink / raw)
  To: Chao Peng
  Cc: qemu-devel, Michael S. Tsirkin, Igor Mammedov, Xiao Guangrong,
	Richard Henderson, Eduardo Habkost, Haozhong Zhang, anthony.xu



On 23/06/2016 10:32, Chao Peng wrote:
> The original usage model is to replace kvm-tool with QEMU for Clear
> Containers (https://clearlinux.org/features/clear-containers). It's not
> going to present the guest a real PC platform, but instead, a totally
> virtual platform.

It is not completely virtual; it has PCI for example.  Hyper-V is an
example of a completely virtual platform; even the LAPIC is customized
with paravirtual features.

qboot does basically four things: 1) relocate from ROM to 0xf0000; 2)
initialize PCI; 3) provide the ACPI and e820 tables; 4) boot.

If Linux can boot without initializing PCI bridges and without INTX, we
can remove that code from qboot.  The PCI scan is the most expensive
part, I think.  (3) and (4) are the same no matter if you run them in
QEMU or the guest.

That leaves only relocation (PAM).

> Every little bit boot time saving is important because
> we are trying to achieve comparable result with that for Linux native
> container.
> 
> With this usage model, I doubt introducing a firmware layer is a good
> idea:
> 
>     On one side, even with optimized and compact qboot it still takes us
>     ~15ms.

Have you profiled it?  If it is code in QEMU that we can optimize (e.g.
memory.c), that would benefit all guests.

>     This is not a small value because current Linux kernel takes
>     only ~50ms (and we are still on the way to optimize it). And when
>     you look at the SeaBIOS or qboot, almost all the code are useless for
>     this usage model. They are doing things that is important for
>     traditional PC booting but cost 15ms doing useless things for us (It
>     is really not easy to save 15ms in other place, for example, in
>     Linux. Personally I tend to change the architecture for this new
>     usage model, e.g. eliminate firmware).
> 
>     On the other side, even boot the new pc-lite platform with firmware,
>     it does not mean it can support non-Linux system like Windows. So
>     generally I don't see the benefit of introducing a firmware layer.

The main benefit is maintainability, by reducing the amount of code
specific to pc-lite.

> Besides, I'm also not quite sure if build around Q35 is the best
> solution:
> 
>     The problem with Q35 is some features like SMM/SMRAM/PAM slow down
>     the booting even we actually never use them. While removing these
>     features can cause guest see different feature set for a same device
>     and it also prevents us to do further optimizations on that in guest.

Of these, qboot only uses PAM, and even that could be removed (PAM is
only necessary because of how qboot probes parallel flash).  SMM and
SMRAM should not slow down booting if you don't use them.  Do they?

Paolo

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-20  6:12   ` Chao Peng
@ 2016-06-23 12:55     ` Daniel P. Berrange
  2016-06-28 10:10       ` Chao Peng
  0 siblings, 1 reply; 32+ messages in thread
From: Daniel P. Berrange @ 2016-06-23 12:55 UTC (permalink / raw)
  To: Chao Peng
  Cc: Michael S. Tsirkin, Haozhong Zhang, Xiao Guangrong,
	Eduardo Habkost, qemu-devel, Paolo Bonzini, Igor Mammedov,
	Richard Henderson

On Mon, Jun 20, 2016 at 02:12:17PM +0800, Chao Peng wrote:
> On Sun, Jun 19, 2016 at 06:51:04AM +0300, Michael S. Tsirkin wrote:
> > On Fri, Jun 17, 2016 at 04:14:08AM -0400, Chao Peng wrote:
> > > - it is FAST;
> > 
> > Any numbers to demonstrate just how fast it is and fast at what?
> 
> On a 2.30GHz Haswell server, guest kernel booting time is 59.9ms by
> following test steps listed at
> 
> https://github.com/chao-p/qemu-lite-tools
> 
> Ran the same test with "-machine q35", the guest kernel booting
> time is 129.8ms. There is additional 75ms in SeaBIOS for Q35 case.

I think it'd be useful / interesting to understand why we have saved
this time vs Q35. I'm not a huge fan of the idea of defining an
arbitrarily cut down machine type, because inevitably one application's
view of what is the "bare minimum required functionality" will be
different from another application's view.

It seems to me that whether some features emulated by QEMU are slow
or not should only matter if the guest OS actually tries to use those
features. IOW, could we achieve the same speed up in boot time by
making Linux more configurable at runtime? E.g. so that with a single
Linux kernel binary and a standard Q35/PIIX machine type, we can
disable slow functionality just by giving Linux suitable kernel
command line arguments.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-23 12:44         ` Paolo Bonzini
@ 2016-06-24  6:39           ` Claudio Fontana
  2016-06-24  6:41             ` Claudio Fontana
  2016-06-28  9:27           ` Chao Peng
  1 sibling, 1 reply; 32+ messages in thread
From: Claudio Fontana @ 2016-06-24  6:39 UTC (permalink / raw)
  To: Paolo Bonzini, Chao Peng
  Cc: Haozhong Zhang, Xiao Guangrong, Eduardo Habkost,
	Michael S. Tsirkin, qemu-devel, anthony.xu, Igor Mammedov,
	Richard Henderson

Hi Paolo,

On 23.06.2016 20:44, Paolo Bonzini wrote:
> 
> 
> On 23/06/2016 10:32, Chao Peng wrote:
>> The original usage model is to replace kvm-tool with QEMU for Clear
>> Containers (https://clearlinux.org/features/clear-containers). It's not
>> going to present the guest a real PC platform, but instead, a totally
>> virtual platform.
> 
> It is not completely virtual; it has PCI for example.  Hyper-V is an
> example of a completely virtual platform, even the LAPIC is customized
> with paravirtual features.
> 
> qboot does basically four things: 1) relocate from ROM to 0xf0000; 2)
> initialize PCI; 3) provide the ACPI and e820 tables; 4) boot.
> 
> If Linux can boot without initializing PCI bridges and without INTX, we
> can remove that code from qboot.  The PCI scan is the most expensive
> part, I think.  (3) and (4) are the same no matter if you run them in
> QEMU or the guest.
> 
> That leaves only relocation (PAM).
> 
>> Every little bit boot time saving is important because
>> we are trying to achieve comparable result with that for Linux native
>> container.
>>
>> With this usage model, I doubt introducing a firmware layer is a good
>> idea:
>>
>>     On one side, even with optimized and compact qboot it still takes us
>>     ~15ms.
> 
> Have you profiled it?  If it is code in QEMU that we can optimize (e.g.
> memory.c), that would benefit all guests.
> 
>>     This is not a small value because current Linux kernel takes
>>     only ~50ms (and we are still on the way to optimize it). And when
>>     you look at the SeaBIOS or qboot, almost all the code are useless for
>>     this usage model. They are doing things that is important for
>>     traditional PC booting but cost 15ms doing useless things for us (It
>>     is really not easy to save 15ms in other place, for example, in
>>     Linux. Personally I tend to change the architecture for this new
>>     usage model, e.g. eliminate firmware).
>>
>>     On the other side, even boot the new pc-lite platform with firmware,
>>     it does not mean it can support non-Linux system like Windows. So
>>     generally I don't see the benefit of introducing a firmware layer.
> 
> The main benefit is maintainability, by reducing the amount of code
> specific to pc-lite.
> 
>> Besides, I'm also not quite sure if build around Q35 is the best
>> solution:
>>
>>     The problem with Q35 is some features like SMM/SMRAM/PAM slow down
>>     the booting even we actually never use them. While removing these
>>     features can cause guest see different feature set for a same device
>>     and it also prevents us to do further optimizations on that in guest.
> 
> Of these, qboot only uses PAM, and even that could be removed (PAM is
> only necessary because of how qboot probes parallel flash).  SMRAM
> should not slow down booting if you don't use them.  Do they?
> 
> Paolo

I use qboot for similar goals. You mention that PAM is necessary because
of how qboot probes parallel flash; however, in my custom platform I
removed PAM completely from QEMU, and everything seems to work without
any problems.

Ciao

Claudio

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-24  6:39           ` Claudio Fontana
@ 2016-06-24  6:41             ` Claudio Fontana
  2016-06-24  6:53               ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Claudio Fontana @ 2016-06-24  6:41 UTC (permalink / raw)
  To: Paolo Bonzini, Chao Peng
  Cc: Haozhong Zhang, Xiao Guangrong, Eduardo Habkost,
	Michael S. Tsirkin, qemu-devel, anthony.xu, Igor Mammedov,
	Richard Henderson

On 24.06.2016 14:39, Claudio Fontana wrote:
> Hi Paolo,
> 
> On 23.06.2016 20:44, Paolo Bonzini wrote:
>>
>>
>> On 23/06/2016 10:32, Chao Peng wrote:
>>> The original usage model is to replace kvm-tool with QEMU for Clear
>>> Containers (https://clearlinux.org/features/clear-containers). It's not
>>> going to present the guest a real PC platform, but instead, a totally
>>> virtual platform.
>>
>> It is not completely virtual; it has PCI for example.  Hyper-V is an
>> example of a completely virtual platform, even the LAPIC is customized
>> with paravirtual features.
>>
>> qboot does basically four things: 1) relocate from ROM to 0xf0000; 2)
>> initialize PCI; 3) provide the ACPI and e820 tables; 4) boot.
>>
>> If Linux can boot without initializing PCI bridges and without INTX, we
>> can remove that code from qboot.  The PCI scan is the most expensive
>> part, I think.  (3) and (4) are the same no matter if you run them in
>> QEMU or the guest.
>>
>> That leaves only relocation (PAM).
>>
>>> Every little bit boot time saving is important because
>>> we are trying to achieve comparable result with that for Linux native
>>> container.
>>>
>>> With this usage model, I doubt introducing a firmware layer is a good
>>> idea:
>>>
>>>     On one side, even with optimized and compact qboot it still takes us
>>>     ~15ms.
>>
>> Have you profiled it?  If it is code in QEMU that we can optimize (e.g.
>> memory.c), that would benefit all guests.
>>
>>>     This is not a small value because current Linux kernel takes
>>>     only ~50ms (and we are still on the way to optimize it). And when
>>>     you look at the SeaBIOS or qboot, almost all the code are useless for
>>>     this usage model. They are doing things that is important for
>>>     traditional PC booting but cost 15ms doing useless things for us (It
>>>     is really not easy to save 15ms in other place, for example, in
>>>     Linux. Personally I tend to change the architecture for this new
>>>     usage model, e.g. eliminate firmware).
>>>
>>>     On the other side, even boot the new pc-lite platform with firmware,
>>>     it does not mean it can support non-Linux system like Windows. So
>>>     generally I don't see the benefit of introducing a firmware layer.
>>
>> The main benefit is maintainability, by reducing the amount of code
>> specific to pc-lite.
>>
>>> Besides, I'm also not quite sure if build around Q35 is the best
>>> solution:
>>>
>>>     The problem with Q35 is some features like SMM/SMRAM/PAM slow down
>>>     the booting even we actually never use them. While removing these
>>>     features can cause guest see different feature set for a same device
>>>     and it also prevents us to do further optimizations on that in guest.
>>
>> Of these, qboot only uses PAM, and even that could be removed (PAM is
>> only necessary because of how qboot probes parallel flash).  SMRAM
>> should not slow down booting if you don't use them.  Do they?
>>
>> Paolo
> 
> I use qboot for similar goals, you mention that PAM is necessary because of how qboot probes parallel flash,
> however in my custom platform I removed PAM completely from QEMU, and everything seems to work without any problems..
>

Btw before you ask: yes I am booting with pflash.

Ciao

C.
 

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-24  6:41             ` Claudio Fontana
@ 2016-06-24  6:53               ` Paolo Bonzini
  2016-06-27  2:55                 ` Claudio Fontana
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2016-06-24  6:53 UTC (permalink / raw)
  To: Claudio Fontana, Chao Peng
  Cc: Haozhong Zhang, Xiao Guangrong, Eduardo Habkost,
	Michael S. Tsirkin, qemu-devel, anthony.xu, Igor Mammedov,
	Richard Henderson



On 24/06/2016 08:41, Claudio Fontana wrote:
>> I use qboot for similar goals, you mention that PAM is necessary because of how qboot probes parallel flash,
>> however in my custom platform I removed PAM completely from QEMU, and everything seems to work without any problems..
> 
> Btw before you ask: yes I am booting with pflash.

By default, low memory points to the PCI address space:

    00000000000f0000-00000000000fffff (prio 1, RW): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled]
    00000000000f0000-00000000000fffff (prio 1, RW): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled]
    00000000000f0000-00000000000fffff (prio 1, R-): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff [disabled]
    00000000000f0000-00000000000fffff (prio 1, RW): alias pam-pci @pci 00000000000f0000-00000000000fffff

All that qboot does is enable pam-ram:

        // Make ram from 0xc0000-0xf0000 read-write
        int i;
        for (i=0; i<6; i++) {
                int pam = pambase + 1 + i;
                pci_config_writeb(bdf, pam, 0x33);
        }

        // Make ram from 0xf0000-0x100000 read-write and shadow BIOS
        // We're still running from 0xffff0000
        pci_config_writeb(bdf, pambase, 0x30);
        memcpy(low_start, bios_start, 0x10000);

So if you remove PAM but you are leaving 0xC0000-0x100000 pointing to
RAM, you are effectively moving qboot's PAM configuration to QEMU. :)

Of these writes, only the last write is strictly necessary.  qboot
currently uses 0xe0000-0xf0000 for the ACPI tables but we could move
them to the EBDA instead and save the initial loop.  But I'd like to
see a trace saying how much time is spent configuring PAM exactly.
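
For reference, the PAM values in the snippet above can be decoded with a
small helper. This is only a sketch, not code from qboot or QEMU: the
names pam_field_range, pam_field_attr and struct pam_range are made up
for illustration, and the layout assumed is the usual i440FX/Q35 one
(the high nibble of PAM0 covers 0xf0000-0xfffff, PAM1 to PAM6 each cover
two 16 KiB segments starting at 0xc0000, bit 0 of a nibble enables reads
from RAM and bit 1 enables writes to RAM).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper types, for illustration only. */
struct pam_range { uint32_t start, end; };      /* end is exclusive */

enum pam_attr {
    PAM_ATTR_PCI = 0,   /* RAM disabled, accesses go to PCI      */
    PAM_ATTR_ROM = 1,   /* reads from RAM, writes go to PCI      */
    PAM_ATTR_WO  = 2,   /* writes to RAM, reads come from PCI    */
    PAM_ATTR_RAM = 3    /* reads and writes both go to RAM       */
};

/*
 * Compute the legacy address range controlled by one PAM field.
 * index 0 is PAM0 (the register at pambase), 1..6 are PAM1..PAM6.
 * hi selects the high nibble (bits 5:4) instead of the low one (bits 1:0).
 * Returns 0 on success, -1 for an invalid or reserved field.
 */
int pam_field_range(int index, int hi, struct pam_range *r)
{
    assert(r != NULL);
    if (index < 0 || index > 6)
        return -1;
    if (index == 0) {
        if (!hi)
            return -1;              /* low nibble of PAM0 is reserved */
        r->start = 0xf0000;
        r->end = 0x100000;
        return 0;
    }
    /* PAM1..PAM6: 32 KiB per register, 16 KiB per nibble. */
    r->start = 0xc0000 + (uint32_t)(index - 1) * 0x8000 + (hi ? 0x4000 : 0);
    r->end = r->start + 0x4000;
    return 0;
}

/* Extract the two attribute bits of one nibble from a register value. */
enum pam_attr pam_field_attr(uint8_t value, int hi)
{
    return (enum pam_attr)((value >> (hi ? 4 : 0)) & 3);
}
```

With this, 0x33 written to PAM1..PAM6 decodes as read/write RAM for both
halves of each register, and 0x30 written to PAM0 decodes as read/write
RAM for 0xf0000-0xfffff, which matches the two comments in the qboot
code.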

Paolo

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-24  6:53               ` Paolo Bonzini
@ 2016-06-27  2:55                 ` Claudio Fontana
  0 siblings, 0 replies; 32+ messages in thread
From: Claudio Fontana @ 2016-06-27  2:55 UTC (permalink / raw)
  To: Paolo Bonzini, Chao Peng
  Cc: Haozhong Zhang, Xiao Guangrong, Eduardo Habkost,
	Michael S. Tsirkin, qemu-devel, anthony.xu, Igor Mammedov,
	Richard Henderson

Hi Paolo,

On 24.06.2016 14:53, Paolo Bonzini wrote:
> 
> 
> On 24/06/2016 08:41, Claudio Fontana wrote:
>>> I use qboot for similar goals, you mention that PAM is necessary because of how qboot probes parallel flash,
>>> however in my custom platform I removed PAM completely from QEMU, and everything seems to work without any problems..
>>
>> Btw before you ask: yes I am booting with pflash.
> 
> By default low memory points to PCI address space
> 
>     00000000000f0000-00000000000fffff (prio 1, RW): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, RW): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, R-): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, RW): alias pam-pci @pci 00000000000f0000-00000000000fffff
> 
> All that qboot does is enabling pam-ram:
> 
>         // Make ram from 0xc0000-0xf0000 read-write
>         int i;
>         for (i=0; i<6; i++) {
>                 int pam = pambase + 1 + i;
>                 pci_config_writeb(bdf, pam, 0x33);
>         }
> 
>         // Make ram from 0xf0000-0x100000 read-write and shadow BIOS
>         // We're still running from 0xffff0000
>         pci_config_writeb(bdf, pambase, 0x30);
>         memcpy(low_start, bios_start, 0x10000);
> 
> So if you remove PAM but you are leaving 0xC0000-0x100000 pointing to
> RAM, you are effectively moving qboot's PAM configuration to QEMU. :)
> 
> Of these writes, only the last write is strictly necessary.  qboot
> currently uses 0xe0000-0xf0000 for the ACPI tables but we could move
> them to the EBDA instead and save the initial loop.  But I'd like to
> see a trace saying how much time is spent configuring PAM exactly.
> 
> Paolo
> 

In my case the boot times are satisfactory even with the PAM
configuration loop in qboot.

I removed the PAM backend in QEMU (or rather, made it configurable via
the existing CONFIG_PAM) as part of memory saving patches, not because
of boot time issues.

Ciao,

Claudio

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-23 12:44         ` Paolo Bonzini
  2016-06-24  6:39           ` Claudio Fontana
@ 2016-06-28  9:27           ` Chao Peng
  1 sibling, 0 replies; 32+ messages in thread
From: Chao Peng @ 2016-06-28  9:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Haozhong Zhang, Xiao Guangrong, Eduardo Habkost,
	Michael S. Tsirkin, qemu-devel, anthony.xu, Igor Mammedov,
	Richard Henderson

> 
> qboot does basically four things: 1) relocate from ROM to 0xf0000; 2)
> initialize PCI; 3) provide the ACPI and e820 tables; 4) boot.
> 
> If Linux can boot without initializing PCI bridges and without INTX, we
> can remove that code from qboot.  The PCI scan is the most expensive
> part, I think.  (3) and (4) are the same no matter if you run them in
> QEMU or the guest.
> 
> That leaves only relocation (PAM).

Yes, most of the 'overhead' comes from PCI, fw_cfg, and PAM.

> 
> > Every little bit boot time saving is important because
> > we are trying to achieve comparable result with that for Linux native
> > container.
> > 
> > With this usage model, I doubt introducing a firmware layer is a good
> > idea:
> > 
> >     On one side, even with optimized and compact qboot it still takes us
> >     ~15ms.
> 
> Have you profiled it?  If it is code in QEMU that we can optimize (e.g.
> memory.c), that would benefit all guests.

Most optimizations do not help traditional guests (because they work by
removing functionality), but some do. For example, PAM in the Q35
emulation contains 12 memory regions (see "info mtree"). We can merge
them into 1 to reduce loop time (which even helps accesses to other
memory regions when PAM is disabled).

> 
> >     This is not a small value because current Linux kernel takes
> >     only ~50ms (and we are still on the way to optimize it). And when
> >     you look at the SeaBIOS or qboot, almost all the code are useless for
> >     this usage model. They are doing things that is important for
> >     traditional PC booting but cost 15ms doing useless things for us (It
> >     is really not easy to save 15ms in other place, for example, in
> >     Linux. Personally I tend to change the architecture for this new
> >     usage model, e.g. eliminate firmware).
> > 
> >     On the other side, even boot the new pc-lite platform with firmware,
> >     it does not mean it can support non-Linux system like Windows. So
> >     generally I don't see the benefit of introducing a firmware layer.
> 
> The main benefit is maintainability, by reducing the amount of code
> specific to pc-lite.

Understood.

Chao

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-23 12:55     ` Daniel P. Berrange
@ 2016-06-28 10:10       ` Chao Peng
  2016-06-28 10:26         ` Daniel P. Berrange
  0 siblings, 1 reply; 32+ messages in thread
From: Chao Peng @ 2016-06-28 10:10 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Michael S. Tsirkin, Haozhong Zhang, Xiao Guangrong,
	Eduardo Habkost, qemu-devel, Paolo Bonzini, Igor Mammedov,
	Richard Henderson

On Thu, Jun 23, 2016 at 01:55:06PM +0100, Daniel P. Berrange wrote:
> On Mon, Jun 20, 2016 at 02:12:17PM +0800, Chao Peng wrote:
> > On Sun, Jun 19, 2016 at 06:51:04AM +0300, Michael S. Tsirkin wrote:
> > > On Fri, Jun 17, 2016 at 04:14:08AM -0400, Chao Peng wrote:
> > > > - it is FAST;
> > > 
> > > Any numbers to demonstrate just how fast it is and fast at what?
> > 
> > On a 2.30GHz Haswell server, guest kernel booting time is 59.9ms by
> > following test steps listed at
> > 
> > https://github.com/chao-p/qemu-lite-tools
> > 
> > Ran the same test with "-machine q35", the guest kernel booting
> > time is 129.8ms. There is additional 75ms in SeaBIOS for Q35 case.
> 
> I think it'd be useful / interesting to understand why we have saved
> this time vs Q35. I'm not a huge fan of the idea of defining an
> arbitrarily cut down machine type, because inevitably one applications
> view of what is the "bare minimum required functionality" will be
> different from another applications' view.
> 
> It seems to me that whether some features emulated by QEMU are slow
> or not should only matter if the guest OS actually tries to use those
> features. IOW, could we achieve the same speed up in boot time, by
> making Linux more configurable at runtime. eg so with a single Linux
> kernel binary and standard Q35/PIIX machine type, we can disable
> slow functionality by just giving Linux suitable kernel command
> line arguments.

I totally agree with you. Our goal is reducing boot time, so I don't
mind using existing code to achieve it.

When I looked into this, I thought there might be a minimal platform
to which I could add other stuff on demand, or a fully functional
platform that allowed me to disable unnecessary functionality.

But in practice I can't get a system that fits exactly, even though
there are lots of configuration methods in QEMU, the BIOS, and the
kernel.

Take the kernel time overhead with Q35 as an example. Most of the time
saving comes from PCI initialization. The new pc-lite platform supports
only 1 bus, so the guest doesn't need to scan all the possible buses,
which saves ~60ms. The other saving is that the new pc-lite platform
removes the SMBUS/SATA/LPC bridges that Q35 creates by default.
Initialization for these devices costs ~20ms.
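
To put the bus-scan cost in perspective, here is a rough model of PCI
configuration-space addressing and of how many probes a brute-force scan
needs. It is a sketch under assumptions: the function names are
hypothetical, and it assumes the guest reads the vendor ID of function 0
in every slot of every bus it scans, which is a simplification of what
Linux actually does.

```c
#include <stdint.h>

/* Legacy access: value written to port 0xcf8 before reading port 0xcfc. */
uint32_t pci_cf8_address(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
    return 0x80000000u                      /* enable bit              */
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1f) << 11)   /* 32 devices per bus      */
         | ((uint32_t)(fn & 0x7) << 8)      /* 8 functions per device  */
         | (uint32_t)(reg & 0xfc);          /* dword-aligned register  */
}

/* PCIe MMCFG (ECAM): byte offset from the MMCFG base address. */
uint64_t pci_ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
    return ((uint64_t)bus << 20)            /* 1 MiB per bus           */
         | ((uint64_t)(dev & 0x1f) << 15)   /* 32 KiB per device       */
         | ((uint64_t)(fn & 0x7) << 12)     /* 4 KiB per function      */
         | (uint64_t)(reg & 0xfff);
}

/*
 * Assumed cost model: one vendor-ID read for function 0 of each of the
 * 32 slots on every bus from 0 to last_bus.
 */
unsigned pci_bruteforce_probes(unsigned last_bus)
{
    return (last_bus + 1) * 32;
}
```

Under that assumption a single bus needs 32 probes, while 256 buses need
8192, and each probe is a trap into QEMU when the device is emulated.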

The problem is that not all of this functionality can be disabled in
either the kernel or QEMU, so in the end we turned to introducing a new
platform.

Thanks,
Chao

* Re: [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite
  2016-06-28 10:10       ` Chao Peng
@ 2016-06-28 10:26         ` Daniel P. Berrange
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel P. Berrange @ 2016-06-28 10:26 UTC (permalink / raw)
  To: Chao Peng
  Cc: Michael S. Tsirkin, Haozhong Zhang, Xiao Guangrong,
	Eduardo Habkost, qemu-devel, Paolo Bonzini, Igor Mammedov,
	Richard Henderson

On Tue, Jun 28, 2016 at 06:10:51PM +0800, Chao Peng wrote:
> On Thu, Jun 23, 2016 at 01:55:06PM +0100, Daniel P. Berrange wrote:
> > On Mon, Jun 20, 2016 at 02:12:17PM +0800, Chao Peng wrote:
> > > On Sun, Jun 19, 2016 at 06:51:04AM +0300, Michael S. Tsirkin wrote:
> > > > On Fri, Jun 17, 2016 at 04:14:08AM -0400, Chao Peng wrote:
> > > > > - it is FAST;
> > > > 
> > > > Any numbers to demonstrate just how fast it is and fast at what?
> > > 
> > > On a 2.30GHz Haswell server, guest kernel booting time is 59.9ms by
> > > following test steps listed at
> > > 
> > > https://github.com/chao-p/qemu-lite-tools
> > > 
> > > Ran the same test with "-machine q35", the guest kernel booting
> > > time is 129.8ms. There is additional 75ms in SeaBIOS for Q35 case.
> > 
> > I think it'd be useful / interesting to understand why we have saved
> > this time vs Q35. I'm not a huge fan of the idea of defining an
> > arbitrarily cut down machine type, because inevitably one applications
> > view of what is the "bare minimum required functionality" will be
> > different from another applications' view.
> > 
> > It seems to me that whether some features emulated by QEMU are slow
> > or not should only matter if the guest OS actually tries to use those
> > features. IOW, could we achieve the same speed up in boot time, by
> > making Linux more configurable at runtime. eg so with a single Linux
> > kernel binary and standard Q35/PIIX machine type, we can disable
> > slow functionality by just giving Linux suitable kernel command
> > line arguments.
> 
> I totally agree with you. And our goal is reducing boot time so I don't
> mind using existing code to achieve this goal.
> 
> When I looked into this. I have thought there might be a minimal
> platform with which I can add other stuff on demand, or a full
> functional platform that allow me to disable unnecessary functionalities.
> 
> But in practice I can't get a system that exactly fit for me, regardless
> there is lots of configuration methods in both QEMU/BIOS/kernel.
> 
> Taking kernel time overhead with Q35 here for example. The most time
> saving comes from PCI initialization. New pc-lite platform supports only
> 1 bus so guest don't need to scan all the possible buses which save ~60ms.
> Another time saving place is new pc-lite platform removed SMBUS/SATA/LPC
> bridges that Q35 creates by default. Initialization for these devices
> costs ~20ms.
> 
> The problem is not all these functionalities can be disabled either in
> kernel or in QEMU, so in the end, turns out to introduce a new one.

I know they can't be disabled in the *current* kernel / QEMU. What I'm suggesting
is that you enhance the kernel / QEMU to allow more features to be disabled
dynamically. Adding yet another machine type with a static fixed set of
hardware will only ever suit niche use cases which happen to match whatever
you've personally decided is important, while adding long term support
burden on QEMU for yet another machine. IMHO it is a better long term
payout to make existing kernel/qemu more dynamically configurable instead
of adding yet another non-dynamic machine.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


Thread overview: 32+ messages
2016-06-17  8:14 [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Chao Peng
2016-06-17  8:14 ` [Qemu-devel] [RFC 1/9] acpi: introduce light weight ACPI PM emulation pm-lite Chao Peng
2016-06-17  8:14 ` [Qemu-devel] [RFC 2/9] pci: introduce light weight PCIE Host emulation pci-lite Chao Peng
2016-06-17  8:14 ` [Qemu-devel] [RFC 3/9] acpi: add support for pc-lite platform Chao Peng
2016-06-17  8:14 ` [Qemu-devel] [RFC 4/9] acpi: expose data structurs and functions of BIOS linker loader Chao Peng
2016-06-17  8:14 ` [Qemu-devel] [RFC 5/9] acpi: expose acpi_checksum() Chao Peng
2016-06-17  8:14 ` [Qemu-devel] [RFC 6/9] acpi: patch guest ACPI for pc-lite Chao Peng
2016-06-17  8:14 ` [Qemu-devel] [RFC 7/9] pc: skip setting CMOS data when RTC device is unavailable Chao Peng
2016-06-17  8:14 ` [Qemu-devel] [RFC 8/9] pc: support direct loading protected/long mode kernel Chao Peng
2016-06-17  8:14 ` [Qemu-devel] [RFC 9/9] pc: introduce light weight PC board pc-lite Chao Peng
2016-06-17 13:24 ` [Qemu-devel] [RFC 0/9] Introduce light weight PC platform pc-lite Paolo Bonzini
2016-06-20  6:01   ` Chao Peng
2016-06-20  6:54     ` Paolo Bonzini
2016-06-20 12:31       ` Stefan Hajnoczi
2016-06-20 13:00         ` Paolo Bonzini
2016-06-21  1:23       ` Chao Peng
2016-06-21 16:44       ` Michael S. Tsirkin
2016-06-23  8:32       ` Chao Peng
2016-06-23 12:44         ` Paolo Bonzini
2016-06-24  6:39           ` Claudio Fontana
2016-06-24  6:41             ` Claudio Fontana
2016-06-24  6:53               ` Paolo Bonzini
2016-06-27  2:55                 ` Claudio Fontana
2016-06-28  9:27           ` Chao Peng
2016-06-20 10:36   ` Dr. David Alan Gilbert
2016-06-19  3:51 ` Michael S. Tsirkin
2016-06-20  6:12   ` Chao Peng
2016-06-23 12:55     ` Daniel P. Berrange
2016-06-28 10:10       ` Chao Peng
2016-06-28 10:26         ` Daniel P. Berrange
2016-06-19  8:21 ` Claudio Fontana
2016-06-20  6:30   ` Chao Peng
