* [Qemu-devel] [PATCH v19 0/9] Virtual Machine Generation ID
@ 2016-01-28 10:54 Igor Mammedov
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 1/9] acpi: extend ACPI interface to provide access to ACPI registers and SCI irq Igor Mammedov
                   ` (8 more replies)
  0 siblings, 9 replies; 59+ messages in thread
From: Igor Mammedov @ 2016-01-28 10:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: ehabkost, mst, ghammer, lersek, lcapitulino

Changes since v18:
  - drop the MachineClass->default_props approach and make
    setting compat_props work incrementally, which
    allows using it for default properties as well and
    reduces data duplication by removing the
    nesting in the [PC|SPAPR]_COMPAT_* macros.
Changes since v17:
  - make the BAR prefetchable to meet the caching requirement of the MS spec
  - rename UUID/uuid to GUID/guid across series to match spec
  - qmp: add new GuidInfo type and use it instead of UuidInfo
  - tests: fail if the test times out while waiting for the address
Changes since v14:
  - statically reserve used BAR resources in SSDT, so
    that Windows won't claim them during PCI rebalancing
  - support VGID page in high mem in addition to low mem
  - add QMP/HMP interfaces to get/set VM Generation ID
  - do not consume a PCI slot by default and attach the
    vmgenid device as a function of a multifunction
    ISA bridge.
  - allow only one vmgenid device instance

This is a respin of the v14* series, which uses a PCI BAR
to map the VGID page into the guest address space.

Tested with WS2012R2x64; older Windows versions that don't
support vmgenid boot fine but show an unknown device,
which is expected.

Git tree for testing:
https://github.com/imammedo/qemu.git vmgenid_v19

* v14, https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg00530.html




Gal Hammer (1):
  docs: vm generation id device's description

Igor Mammedov (8):
  acpi: extend ACPI interface to provide access to ACPI registers and
    SCI irq
  pc: add a Virtual Machine Generation ID device
  tests: add a unit test for the vmgenid device.
  qmp/hmp: add query-vm-generation-id and 'info vm-generation-id'
    commands
  qmp/hmp: add set-vm-generation-id commands
  machine: add properties to compat_props incrementally
  pc: put PIIX3 in slot 1 explicitly and cleanup functions assignment
  pc/q35: by default put vmgenid device as a function of ISA bridge

 default-configs/i386-softmmu.mak     |   1 +
 default-configs/x86_64-softmmu.mak   |   1 +
 docs/specs/pci-ids.txt               |   1 +
 docs/specs/vmgenid.txt               |  36 +++++++
 hmp-commands-info.hx                 |  13 +++
 hmp-commands.hx                      |  13 +++
 hmp.c                                |  21 ++++
 hmp.h                                |   2 +
 hw/acpi/piix4.c                      |  17 ++++
 hw/core/machine.c                    |  10 ++
 hw/i386/acpi-build.c                 |  56 ++++++++++-
 hw/i386/pc_piix.c                    |  27 +++--
 hw/i386/pc_q35.c                     |   9 ++
 hw/isa/lpc_ich9.c                    |  16 +++
 hw/isa/vt82c686.c                    |  19 ++++
 hw/misc/Makefile.objs                |   1 +
 hw/misc/vmgenid.c                    | 185 +++++++++++++++++++++++++++++++++++
 hw/pci-host/piix.c                   |   9 +-
 hw/ppc/spapr.c                       |   3 -
 hw/s390x/s390-virtio-ccw.c           |  12 +--
 include/hw/acpi/acpi.h               |   1 +
 include/hw/acpi/acpi_dev_interface.h |   9 ++
 include/hw/boards.h                  |  11 ++-
 include/hw/i386/ich9.h               |   3 +-
 include/hw/i386/pc.h                 |  18 ++--
 include/hw/misc/vmgenid.h            |  27 +++++
 include/hw/pci/pci.h                 |   1 +
 qapi-schema.json                     |  31 ++++++
 qmp-commands.hx                      |  41 ++++++++
 stubs/Makefile.objs                  |   1 +
 stubs/vmgenid.c                      |  13 +++
 tests/Makefile                       |   2 +
 tests/vmgenid-test.c                 |  93 ++++++++++++++++++
 vl.c                                 |   6 +-
 34 files changed, 668 insertions(+), 41 deletions(-)
 create mode 100644 docs/specs/vmgenid.txt
 create mode 100644 hw/misc/vmgenid.c
 create mode 100644 include/hw/misc/vmgenid.h
 create mode 100644 stubs/vmgenid.c
 create mode 100644 tests/vmgenid-test.c

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [Qemu-devel] [PATCH v19 1/9] acpi: extend ACPI interface to provide access to ACPI registers and SCI irq
  2016-01-28 10:54 [Qemu-devel] [PATCH v19 0/9] Virtual Machine Generation ID Igor Mammedov
@ 2016-01-28 10:54 ` Igor Mammedov
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 2/9] docs: vm generation id device's description Igor Mammedov
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Igor Mammedov @ 2016-01-28 10:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: ehabkost, mst, ghammer, lersek, lcapitulino

so that we don't have to keep adding proxy wrappers in piix4pm/ich9
to access the ACPI registers and SCI kept in the piix4pm/lpc_ich9
devices, and can call the acpi_foo() API directly.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/acpi/piix4.c                      | 17 +++++++++++++++++
 hw/isa/lpc_ich9.c                    | 16 ++++++++++++++++
 hw/isa/vt82c686.c                    | 19 +++++++++++++++++++
 include/hw/acpi/acpi.h               |  1 +
 include/hw/acpi/acpi_dev_interface.h |  9 +++++++++
 5 files changed, 62 insertions(+)

diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 2cd2fee..5e29be4 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -583,6 +583,21 @@ static void piix4_ospm_status(AcpiDeviceIf *adev, ACPIOSTInfoList ***list)
     acpi_memory_ospm_status(&s->acpi_memory_hotplug, list);
 }
 
+static ACPIREGS *piix4_acpi_regs(AcpiDeviceIf *adev)
+{
+    PIIX4PMState *s = PIIX4_PM(adev);
+
+    return &s->ar;
+}
+
+static qemu_irq piix4_acpi_irq(AcpiDeviceIf *adev)
+{
+    PIIX4PMState *s = PIIX4_PM(adev);
+
+    return s->irq;
+}
+
+
 static Property piix4_pm_properties[] = {
     DEFINE_PROP_UINT32("smb_io_base", PIIX4PMState, smb_io_base, 0),
     DEFINE_PROP_UINT8(ACPI_PM_PROP_S3_DISABLED, PIIX4PMState, disable_s3, 0),
@@ -621,6 +636,8 @@ static void piix4_pm_class_init(ObjectClass *klass, void *data)
     hc->unplug_request = piix4_device_unplug_request_cb;
     hc->unplug = piix4_device_unplug_cb;
     adevc->ospm_status = piix4_ospm_status;
+    adevc->regs = piix4_acpi_regs;
+    adevc->sci = piix4_acpi_irq;
 }
 
 static const TypeInfo piix4_pm_info = {
diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index ed9907d..9d60caa 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -700,6 +700,20 @@ static Property ich9_lpc_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static ACPIREGS *ich9_acpi_regs(AcpiDeviceIf *adev)
+{
+    ICH9LPCState *s = ICH9_LPC_DEVICE(adev);
+
+    return &s->pm.acpi_regs;
+}
+
+static qemu_irq ich9_acpi_irq(AcpiDeviceIf *adev)
+{
+    ICH9LPCState *s = ICH9_LPC_DEVICE(adev);
+
+    return s->pm.irq;
+}
+
 static void ich9_lpc_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -727,6 +741,8 @@ static void ich9_lpc_class_init(ObjectClass *klass, void *data)
     hc->unplug_request = ich9_device_unplug_request_cb;
     hc->unplug = ich9_device_unplug_cb;
     adevc->ospm_status = ich9_pm_ospm_status;
+    adevc->regs = ich9_acpi_regs;
+    adevc->sci = ich9_acpi_irq;
 }
 
 static const TypeInfo ich9_lpc_info = {
diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index 6c2190b..58707d4 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -391,6 +391,18 @@ I2CBus *vt82c686b_pm_init(PCIBus *bus, int devfn, uint32_t smb_io_base,
     return s->smb.smbus;
 }
 
+static ACPIREGS *via_pm_acpi_regs(AcpiDeviceIf *adev)
+{
+    VT686PMState *s = VT82C686B_PM_DEVICE(adev);
+
+    return &s->ar;
+}
+
+static qemu_irq via_pm_acpi_irq(AcpiDeviceIf *adev)
+{
+    return NULL;
+}
+
 static Property via_pm_properties[] = {
     DEFINE_PROP_UINT32("smb_io_base", VT686PMState, smb_io_base, 0),
     DEFINE_PROP_END_OF_LIST(),
@@ -400,6 +412,7 @@ static void via_pm_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+    AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_CLASS(klass);
 
     k->realize = vt82c686b_pm_realize;
     k->config_write = pm_write_config;
@@ -411,6 +424,8 @@ static void via_pm_class_init(ObjectClass *klass, void *data)
     dc->vmsd = &vmstate_acpi;
     set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
     dc->props = via_pm_properties;
+    adevc->regs = via_pm_acpi_regs;
+    adevc->sci = via_pm_acpi_irq;
 }
 
 static const TypeInfo via_pm_info = {
@@ -418,6 +433,10 @@ static const TypeInfo via_pm_info = {
     .parent        = TYPE_PCI_DEVICE,
     .instance_size = sizeof(VT686PMState),
     .class_init    = via_pm_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_ACPI_DEVICE_IF },
+        { }
+    },
 };
 
 static const VMStateDescription vmstate_via = {
diff --git a/include/hw/acpi/acpi.h b/include/hw/acpi/acpi.h
index b20bd55..3fb6399 100644
--- a/include/hw/acpi/acpi.h
+++ b/include/hw/acpi/acpi.h
@@ -25,6 +25,7 @@
 #include "qemu/option.h"
 #include "exec/memory.h"
 #include "hw/irq.h"
+#include "hw/acpi/acpi_dev_interface.h"
 
 /*
  * current device naming scheme supports up to 256 memory devices
diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
index f245f8d..e3f1bad 100644
--- a/include/hw/acpi/acpi_dev_interface.h
+++ b/include/hw/acpi/acpi_dev_interface.h
@@ -3,6 +3,9 @@
 
 #include "qom/object.h"
 #include "qapi-types.h"
+#include "hw/irq.h"
+
+typedef struct ACPIREGS ACPIREGS;
 
 #define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
 
@@ -28,6 +31,10 @@ typedef struct AcpiDeviceIf {
  * ospm_status: returns status of ACPI device objects, reported
  *              via _OST method if device supports it.
  *
+ * regs: returns pointer to ACPI registers block
+ *
+ * sci: return pointer to IRQ object associated with SCI
+ *
  * Interface is designed for providing unified interface
  * to generic ACPI functionality that could be used without
  * knowledge about internals of actual device that implements
@@ -39,5 +46,7 @@ typedef struct AcpiDeviceIfClass {
 
     /* <public> */
     void (*ospm_status)(AcpiDeviceIf *adev, ACPIOSTInfoList ***list);
+    ACPIREGS *(*regs)(AcpiDeviceIf *adev);
+    qemu_irq (*sci)(AcpiDeviceIf *adev);
 } AcpiDeviceIfClass;
 #endif
-- 
1.8.3.1


* [Qemu-devel] [PATCH v19 2/9] docs: vm generation id device's description
  2016-01-28 10:54 [Qemu-devel] [PATCH v19 0/9] Virtual Machine Generation ID Igor Mammedov
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 1/9] acpi: extend ACPI interface to provide access to ACPI registers and SCI irq Igor Mammedov
@ 2016-01-28 10:54 ` Igor Mammedov
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device Igor Mammedov
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Igor Mammedov @ 2016-01-28 10:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: ehabkost, mst, ghammer, lersek, lcapitulino

From: Gal Hammer <ghammer@redhat.com>

Signed-off-by: Gal Hammer <ghammer@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 docs/specs/vmgenid.txt | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100644 docs/specs/vmgenid.txt

diff --git a/docs/specs/vmgenid.txt b/docs/specs/vmgenid.txt
new file mode 100644
index 0000000..462d17f
--- /dev/null
+++ b/docs/specs/vmgenid.txt
@@ -0,0 +1,36 @@
+VIRTUAL MACHINE GENERATION ID
+=============================
+
+Copyright (C) 2016 Red Hat, Inc.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+===
+
+The VM generation ID (vmgenid) device is an emulated device which
+exposes a 128-bit, cryptographically random, integer value identifier.
+This allows management applications (e.g. libvirt) to notify the guest
+operating system when the virtual machine is executed with a different
+configuration (e.g. snapshot execution or creation from a template).
+
+This is specified on the web at: http://go.microsoft.com/fwlink/?LinkId=260709
+
+---
+
+The vmgenid device is a PCI device with the following ACPI ID: "QEMU0003".
+
+The device has a "vmgenid.guid" property, which can be set using
+the command line argument or the QMP interface.
+For example:
+QEMU  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
+
+Or, to change the GUID at runtime, use:
+ qom-set "/machine/peripheral/FOO.guid" "124e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
+
+According to the specification, any change to the GUID executes an
+ACPI notification. The vmgenid device triggers the \_GPE._E00 handler
+which executes the ACPI Notify operation.
+
+Although not specified in Microsoft's document, it is assumed that the
+device is expected to use little-endian byte order.
-- 
1.8.3.1


* [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-01-28 10:54 [Qemu-devel] [PATCH v19 0/9] Virtual Machine Generation ID Igor Mammedov
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 1/9] acpi: extend ACPI interface to provide access to ACPI registers and SCI irq Igor Mammedov
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 2/9] docs: vm generation id device's description Igor Mammedov
@ 2016-01-28 10:54 ` Igor Mammedov
  2016-01-28 11:13   ` Michael S. Tsirkin
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 4/9] tests: add a unit test for the vmgenid device Igor Mammedov
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 59+ messages in thread
From: Igor Mammedov @ 2016-01-28 10:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: ehabkost, mst, ghammer, lersek, lcapitulino

Based on Microsoft's specification (the paper can be
downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
easily found with the "Virtual Machine Generation ID" keywords),
add a PCI device with a corresponding description in the
SSDT ACPI table.

The GUID is set using "vmgenid.guid" property or
a corresponding HMP/QMP command.

Example of using vmgenid device:
 -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"

The 'vmgenid' device initialization flow is as follows:
 1. vmgenid registers a RAM BAR sized to hold the GUID buffer
 2. the BIOS initializes PCI devices and maps the BAR in the PCI hole
 3. the BIOS reads the ACPI tables from QEMU; at that moment the
    tables are generated with the \_SB.VMGI.ADDR constant pointing
    to the GPA where the BIOS mapped vmgenid's BAR earlier

Note:
This implementation uses PCI class code 0x0500 for the vmgenid
device, which is marked as NO_DRV in Windows's machine.inf.
Testing various Windows versions showed that the OS
neither touches such PCI devices nor checks them
for resource conflicts.
There was a concern that during PCI rebalancing the OS
could reprogram the BAR to another location, which would
leave VGEN.ADDR pointing to the old (no longer valid)
address.
However, testing showed that Windows rebalances only
PCI devices that have a driver attached and
completely ignores NO_DRV-class devices.
That in turn creates a problem where the OS could remap
one of the PCI devices (with a driver) over the BAR used by
a driver-less PCI device.
Statically declaring the used memory range in VGEN._CRS
makes the OS honor the resource reservation, so the ignored
BAR range is no longer touched during PCI rebalancing.

Signed-off-by: Gal Hammer <ghammer@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
changes since 17:
  - small fixups suggested in the v14 review by "Michael S. Tsirkin" <mst@redhat.com>
  - make the BAR prefetchable so the region is cached as per the MS spec
  - s/uuid/guid/ to match the spec
changes since 14:
  - reserve BAR resources so that Windows won't touch them
    during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
  - ACPI: split the VGEN device off the PCI device descriptor
    and place it in the PCI0 scope, so there is no need to trace its
    location across PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
  - permit only one vmgenid device to be created
  - allow the BAR to be mapped above 4G if it can't be mapped in low mem
---
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 docs/specs/pci-ids.txt             |   1 +
 hw/i386/acpi-build.c               |  56 +++++++++++++-
 hw/misc/Makefile.objs              |   1 +
 hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
 include/hw/misc/vmgenid.h          |  27 +++++++
 include/hw/pci/pci.h               |   1 +
 8 files changed, 240 insertions(+), 2 deletions(-)
 create mode 100644 hw/misc/vmgenid.c
 create mode 100644 include/hw/misc/vmgenid.h

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index b177e52..6402439 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -51,6 +51,7 @@ CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_VMGENID=y
 CONFIG_NVDIMM=y
 CONFIG_ACPI_NVDIMM=y
 CONFIG_XIO3130=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 6e3b312..fdac18f 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -51,6 +51,7 @@ CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_VMGENID=y
 CONFIG_NVDIMM=y
 CONFIG_ACPI_NVDIMM=y
 CONFIG_XIO3130=y
diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
index 0adcb89..e65ecf9 100644
--- a/docs/specs/pci-ids.txt
+++ b/docs/specs/pci-ids.txt
@@ -47,6 +47,7 @@ PCI devices (other than virtio):
 1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
 1b36:0006  PCI Rocker Ethernet switch device
 1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
+1b36:0009  PCI VM-Generation device
 1b36:000a  PCI-PCI bridge (multiseat)
 
 All these devices are documented in docs/specs.
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 78758e2..0187262 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -44,6 +44,7 @@
 #include "hw/acpi/tpm.h"
 #include "sysemu/tpm_backend.h"
 #include "hw/timer/mc146818rtc_regs.h"
+#include "hw/misc/vmgenid.h"
 
 /* Supported chipsets: */
 #include "hw/acpi/piix4.h"
@@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
     info->applesmc_io_base = applesmc_port();
 }
 
+static Aml *build_vmgenid_device(uint64_t buf_paddr)
+{
+    Aml *dev, *pkg, *crs;
+
+    dev = aml_device("VGEN");
+    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
+    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
+    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
+
+    pkg = aml_package(2);
+    /* low 32 bits of UUID buffer addr */
+    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
+    /* high 32 bits of UUID buffer addr */
+    aml_append(pkg, aml_int(buf_paddr >> 32));
+    aml_append(dev, aml_name_decl("ADDR", pkg));
+
+    /*
+     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
+     * displays it as "PCI RAM controller" which is marked as NO_DRV
+     * so Windows ignores VMGEN device completely and doesn't check
+     * for resource conflicts which during PCI rebalancing can lead
+     * to another PCI device claiming ignored BARs. To prevent this
+     * statically reserve resources used by VM_Gen_Counter.
+     * For more verbose comment see this commit message.
+     */
+     crs = aml_resource_template();
+     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
+                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
+                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
+                VMGENID_VMGID_BUF_SIZE));
+     aml_append(dev, aml_name_decl("_CRS", crs));
+     return dev;
+}
+
 /*
  * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
  * On i386 arch we only have two pci hosts, so we can look only for them.
@@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
             }
 
             if (bus) {
+                Object *vmgen;
                 Aml *scope = aml_scope("PCI0");
                 /* Scan all PCI buses. Generate tables to support hotplug. */
                 build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
@@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
                     aml_append(scope, dev);
                 }
 
+                vmgen = find_vmgneid_dev(NULL);
+                if (vmgen) {
+                    PCIDevice *pdev = PCI_DEVICE(vmgen);
+                    uint64_t buf_paddr =
+                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
+
+                    if (buf_paddr != PCI_BAR_UNMAPPED) {
+                        aml_append(scope, build_vmgenid_device(buf_paddr));
+
+                        method = aml_method("\\_GPE._E00", 0,
+                                            AML_NOTSERIALIZED);
+                        aml_append(method,
+                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
+                                       aml_int(0x80)));
+                        aml_append(ssdt, method);
+                    }
+                }
+
                 aml_append(sb_scope, scope);
             }
         }
@@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
     {
         aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
 
-        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
-
         if (misc->is_piix4) {
             method = aml_method("_E01", 0, AML_NOTSERIALIZED);
             aml_append(method,
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index d4765c2..1f05edd 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
 
 obj-$(CONFIG_PVPANIC) += pvpanic.o
 obj-$(CONFIG_EDU) += edu.o
+obj-$(CONFIG_VMGENID) += vmgenid.o
 obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
new file mode 100644
index 0000000..a2fbdfc
--- /dev/null
+++ b/hw/misc/vmgenid.c
@@ -0,0 +1,154 @@
+/*
+ *  Virtual Machine Generation ID Device
+ *
+ *  Copyright (C) 2016 Red Hat Inc.
+ *
+ *  Authors: Gal Hammer <ghammer@redhat.com>
+ *           Igor Mammedov <imammedo@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/i386/pc.h"
+#include "hw/pci/pci.h"
+#include "hw/misc/vmgenid.h"
+#include "hw/acpi/acpi.h"
+#include "qapi/visitor.h"
+
+#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
+
+typedef struct VmGenIdState {
+    PCIDevice parent_obj;
+    MemoryRegion iomem;
+    union {
+        uint8_t guid[16];
+        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
+    };
+    bool guid_set;
+} VmGenIdState;
+
+Object *find_vmgneid_dev(Error **errp)
+{
+    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
+    if (!obj) {
+        error_setg(errp, VMGENID_DEVICE " is not found");
+    }
+    return obj;
+}
+
+static void vmgenid_update_guest(VmGenIdState *s)
+{
+    Object *acpi_obj;
+    void *ptr = memory_region_get_ram_ptr(&s->iomem);
+
+    memcpy(ptr, &s->guid, sizeof(s->guid));
+    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
+
+    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
+    if (acpi_obj) {
+        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
+        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
+        ACPIREGS *acpi_regs = adevc->regs(adev);
+
+        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
+        acpi_update_sci(acpi_regs, adevc->sci(adev));
+    }
+}
+
+static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
+{
+    VmGenIdState *s = VMGENID(obj);
+
+    if (qemu_uuid_parse(value, s->guid) < 0) {
+        error_setg(errp, "'%s." VMGENID_GUID
+                   "': Failed to parse GUID string: %s",
+                   object_get_typename(OBJECT(s)),
+                   value);
+        return;
+    }
+
+    s->guid_set = true;
+    vmgenid_update_guest(s);
+}
+
+static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
+                                   const char *name, Error **errp)
+{
+    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
+
+    if (value == PCI_BAR_UNMAPPED) {
+        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
+                   object_get_typename(OBJECT(obj)));
+        return;
+    }
+    visit_type_int(v, &value, name, errp);
+}
+
+static void vmgenid_initfn(Object *obj)
+{
+    VmGenIdState *s = VMGENID(obj);
+
+    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
+                           &error_abort);
+
+    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
+    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
+                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
+}
+
+
+static void vmgenid_realize(PCIDevice *dev, Error **errp)
+{
+    VmGenIdState *s = VMGENID(dev);
+    bool ambiguous = false;
+
+    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
+    if (ambiguous) {
+        error_setg(errp, "no more than one " VMGENID_DEVICE
+                         " device is permitted");
+        return;
+    }
+
+    if (!s->guid_set) {
+        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
+                   object_get_typename(OBJECT(s)));
+        return;
+    }
+
+    vmstate_register_ram(&s->iomem, DEVICE(s));
+    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
+        PCI_BASE_ADDRESS_MEM_PREFETCH |
+        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
+        &s->iomem);
+    return;
+}
+
+static void vmgenid_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    dc->hotpluggable = false;
+    k->realize = vmgenid_realize;
+    k->vendor_id = PCI_VENDOR_ID_REDHAT;
+    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
+    k->class_id = PCI_CLASS_MEMORY_RAM;
+}
+
+static const TypeInfo vmgenid_device_info = {
+    .name          = VMGENID_DEVICE,
+    .parent        = TYPE_PCI_DEVICE,
+    .instance_size = sizeof(VmGenIdState),
+    .instance_init = vmgenid_initfn,
+    .class_init    = vmgenid_class_init,
+};
+
+static void vmgenid_register_types(void)
+{
+    type_register_static(&vmgenid_device_info);
+}
+
+type_init(vmgenid_register_types)
diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
new file mode 100644
index 0000000..b90882c
--- /dev/null
+++ b/include/hw/misc/vmgenid.h
@@ -0,0 +1,27 @@
+/*
+ *  Virtual Machine Generation ID Device
+ *
+ *  Copyright (C) 2016 Red Hat Inc.
+ *
+ *  Authors: Gal Hammer <ghammer@redhat.com>
+ *           Igor Mammedov <imammedo@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef HW_MISC_VMGENID_H
+#define HW_MISC_VMGENID_H
+
+#include "qom/object.h"
+
+#define VMGENID_DEVICE           "vmgenid"
+#define VMGENID_GUID             "guid"
+#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
+#define VMGENID_VMGID_BUF_SIZE   0x1000
+#define VMGENID_VMGID_BUF_BAR    0
+
+Object *find_vmgneid_dev(Error **errp);
+
+#endif
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index dedf277..f4c9d48 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -94,6 +94,7 @@
 #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
 #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
 #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
+#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
 #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
 
 #define FMT_PCIBUS                      PRIx64
-- 
1.8.3.1


* [Qemu-devel] [PATCH v19 4/9] tests: add a unit test for the vmgenid device.
  2016-01-28 10:54 [Qemu-devel] [PATCH v19 0/9] Virtual Machine Generation ID Igor Mammedov
                   ` (2 preceding siblings ...)
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device Igor Mammedov
@ 2016-01-28 10:54 ` Igor Mammedov
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 5/9] qmp/hmp: add query-vm-generation-id and 'info vm-generation-id' commands Igor Mammedov
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Igor Mammedov @ 2016-01-28 10:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: ehabkost, mst, ghammer, lersek, lcapitulino

* test that the guest can read the GUID provided on the CLI from the
  buffer, accessing it at the HPA that is available via the
  'vmgid-addr' property once the device is initialized.
* test setting the GUID at runtime and check that it's updated
  at the expected HPA.

Signed-off-by: Gal Hammer <ghammer@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 tests/Makefile       |  2 ++
 tests/vmgenid-test.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)
 create mode 100644 tests/vmgenid-test.c

diff --git a/tests/Makefile b/tests/Makefile
index 650e654..5211971 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -206,6 +206,7 @@ check-qtest-i386-y += tests/usb-hcd-xhci-test$(EXESUF)
 gcov-files-i386-y += hw/usb/hcd-xhci.c
 check-qtest-i386-y += tests/pc-cpu-test$(EXESUF)
 check-qtest-i386-y += tests/q35-test$(EXESUF)
+check-qtest-i386-y += tests/vmgenid-test$(EXESUF)
 gcov-files-i386-y += hw/pci-host/q35.c
 check-qtest-i386-$(CONFIG_VHOST_NET_TEST_i386) += tests/vhost-user-test$(EXESUF)
 ifeq ($(CONFIG_VHOST_NET_TEST_i386),)
@@ -565,6 +566,7 @@ tests/test-write-threshold$(EXESUF): tests/test-write-threshold.o $(test-block-o
 tests/test-netfilter$(EXESUF): tests/test-netfilter.o $(qtest-obj-y)
 tests/ivshmem-test$(EXESUF): tests/ivshmem-test.o contrib/ivshmem-server/ivshmem-server.o $(libqos-pc-obj-y)
 tests/vhost-user-bridge$(EXESUF): tests/vhost-user-bridge.o
+tests/vmgenid-test$(EXESUF): tests/vmgenid-test.o
 
 ifeq ($(CONFIG_POSIX),y)
 LIBS += -lutil
diff --git a/tests/vmgenid-test.c b/tests/vmgenid-test.c
new file mode 100644
index 0000000..9388180
--- /dev/null
+++ b/tests/vmgenid-test.c
@@ -0,0 +1,93 @@
+/*
+ * QTest testcase for VM Generation ID
+ *
+ * Copyright (c) 2016 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <glib.h>
+#include <string.h>
+#include <unistd.h>
+#include "libqtest.h"
+
+/* Wait at most 1 minute */
+#define TEST_DELAY (1 * G_USEC_PER_SEC / 10)
+#define TEST_CYCLES MAX((60 * G_USEC_PER_SEC / TEST_DELAY), 1)
+
+#define VGID_GUID "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
+
+static void vmgenid_check_guid(const uint8_t *expected)
+{
+    uint8_t guid[16];
+    int i;
+    uint32_t addr = 0;
+    QDict *response;
+
+    for (i = 0; i < TEST_CYCLES; ++i) {
+        response = qmp("{ 'execute': 'qom-get', 'arguments': { "
+                       "'path': '/machine/peripheral/testvgid', "
+                       "'property': 'vmgid-addr' } }");
+        if (qdict_haskey(response, "return")) {
+            addr = qdict_get_int(response, "return");
+        }
+        QDECREF(response);
+        if (addr) {
+            break;
+        }
+        g_usleep(TEST_DELAY);
+    }
+    g_assert(addr);
+
+    /* Skip the ACPI ADDR method and read the GUID directly from memory */
+    for (i = 0; i < 16; i++) {
+        guid[i] = readb(addr + i);
+    }
+
+    g_assert(memcmp(guid, expected, sizeof(guid)) == 0);
+}
+
+static void vmgenid_test(void)
+{
+    static const uint8_t expected[16] = {
+        0x32, 0x4e, 0x6e, 0xaf, 0xd1, 0xd1, 0x4b, 0xf6,
+        0xbf, 0x41, 0xb9, 0xbb, 0x6c, 0x91, 0xfb, 0x87
+    };
+    vmgenid_check_guid(expected);
+}
+
+static void vmgenid_set_guid_test(void)
+{
+    QDict *response;
+    static const uint8_t expected[16] = {
+        0x12, 0x4e, 0x6e, 0xaf, 0xd1, 0xd1, 0x4b, 0xf6,
+        0xbf, 0x41, 0xb9, 0xbb, 0x6c, 0x91, 0xfb, 0x87
+    };
+
+    response = qmp("{ 'execute': 'qom-set', 'arguments': { "
+                   "'path': '/machine/peripheral/testvgid', "
+                   "'property': 'guid', 'value': '"
+                   "124e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87' } }");
+    g_assert(qdict_haskey(response, "return"));
+    QDECREF(response);
+
+    vmgenid_check_guid(expected);
+}
+
+int main(int argc, char **argv)
+{
+    int ret;
+
+    g_test_init(&argc, &argv, NULL);
+
+    qtest_start("-machine accel=tcg -device vmgenid,id=testvgid,"
+                "guid=" VGID_GUID);
+    qtest_add_func("/vmgenid/vmgenid", vmgenid_test);
+    qtest_add_func("/vmgenid/vmgenid/set-guid", vmgenid_set_guid_test);
+    ret = g_test_run();
+
+    qtest_end();
+
+    return ret;
+}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [Qemu-devel] [PATCH v19 5/9] qmp/hmp: add query-vm-generation-id and 'info vm-generation-id' commands
  2016-01-28 10:54 [Qemu-devel] [PATCH v19 0/9] Virtual Machine Generation ID Igor Mammedov
                   ` (3 preceding siblings ...)
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 4/9] tests: add a unit test for the vmgenid device Igor Mammedov
@ 2016-01-28 10:54 ` Igor Mammedov
  2016-02-09 17:31   ` Eric Blake
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 6/9] qmp/hmp: add set-vm-generation-id commands Igor Mammedov
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 59+ messages in thread
From: Igor Mammedov @ 2016-01-28 10:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: ehabkost, mst, ghammer, lersek, lcapitulino

Add commands to query the Virtual Machine Generation ID counter.

QMP command example:
    { "execute": "query-vm-generation-id" }

HMP command example:
    info vm-generation-id

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v18:
  - add a new QMP type GuidInfo instead of reusing UuidInfo
    Eric Blake <eblake@redhat.com>
---
 hmp-commands-info.hx | 13 +++++++++++++
 hmp.c                |  9 +++++++++
 hmp.h                |  1 +
 hw/misc/vmgenid.c    | 20 ++++++++++++++++++++
 qapi-schema.json     | 20 ++++++++++++++++++++
 qmp-commands.hx      | 19 +++++++++++++++++++
 stubs/Makefile.objs  |  1 +
 stubs/vmgenid.c      |  7 +++++++
 8 files changed, 90 insertions(+)
 create mode 100644 stubs/vmgenid.c

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 9b71351..b649a5d 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -787,6 +787,19 @@ Display the value of a storage key (s390 only)
 ETEXI
 
 STEXI
+@item info vm-generation-id
+Show Virtual Machine Generation ID
+ETEXI
+
+    {
+        .name       = "vm-generation-id",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Show Virtual Machine Generation ID",
+        .mhandler.cmd = hmp_info_vm_generation_id,
+    },
+
+STEXI
 @end table
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index 54f2620..aeb753d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2375,3 +2375,12 @@ void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict)
 
     qapi_free_RockerOfDpaGroupList(list);
 }
+
+void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict)
+{
+    GuidInfo *info = qmp_query_vm_generation_id(NULL);
+    if (info) {
+        monitor_printf(mon, "%s\n", info->guid);
+    }
+    qapi_free_GuidInfo(info);
+}
diff --git a/hmp.h b/hmp.h
index a8c5b5a..21c5132 100644
--- a/hmp.h
+++ b/hmp.h
@@ -131,5 +131,6 @@ void hmp_rocker(Monitor *mon, const QDict *qdict);
 void hmp_rocker_ports(Monitor *mon, const QDict *qdict);
 void hmp_rocker_of_dpa_flows(Monitor *mon, const QDict *qdict);
 void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict);
+void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
index a2fbdfc..24c0a4e 100644
--- a/hw/misc/vmgenid.c
+++ b/hw/misc/vmgenid.c
@@ -16,6 +16,7 @@
 #include "hw/misc/vmgenid.h"
 #include "hw/acpi/acpi.h"
 #include "qapi/visitor.h"
+#include "qmp-commands.h"
 
 #define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
 
@@ -38,6 +39,25 @@ Object *find_vmgneid_dev(Error **errp)
     return obj;
 }
 
+GuidInfo *qmp_query_vm_generation_id(Error **errp)
+{
+    GuidInfo *info;
+    VmGenIdState *vdev;
+    Object *obj = find_vmgneid_dev(errp);
+
+    if (!obj) {
+        return NULL;
+    }
+    vdev = VMGENID(obj);
+    info = g_malloc0(sizeof(*info));
+    info->guid = g_strdup_printf(UUID_FMT, vdev->guid[0], vdev->guid[1],
+        vdev->guid[2], vdev->guid[3], vdev->guid[4], vdev->guid[5],
+        vdev->guid[6], vdev->guid[7], vdev->guid[8], vdev->guid[9],
+        vdev->guid[10], vdev->guid[11], vdev->guid[12], vdev->guid[13],
+        vdev->guid[14], vdev->guid[15]);
+    return info;
+}
+
 static void vmgenid_update_guest(VmGenIdState *s)
 {
     Object *acpi_obj;
diff --git a/qapi-schema.json b/qapi-schema.json
index 8d04897..3f99549 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4083,3 +4083,23 @@
 ##
 { 'enum': 'ReplayMode',
   'data': [ 'none', 'record', 'play' ] }
+
+##
+# @GuidInfo:
+#
+# GUID information.
+#
+# @guid: the globally unique identifier
+#
+# Since: 2.6
+##
+{ 'struct': 'GuidInfo', 'data': {'guid': 'str'} }
+
+##
+# @query-vm-generation-id:
+#
+# Show Virtual Machine Generation ID
+#
+# Since: 2.6
+##
+{ 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index db072a6..38e4e16 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4795,3 +4795,22 @@ Example:
                  {"type": 0, "out-pport": 0, "pport": 0, "vlan-id": 3840,
                   "pop-vlan": 1, "id": 251658240}
    ]}
+
+EQMP
+
+    {
+        .name       = "query-vm-generation-id",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_query_vm_generation_id,
+    },
+
+SQMP
+Show Virtual Machine Generation ID counter
+------------------------------------------
+
+Arguments: none
+
+Example:
+
+-> { "execute": "query-vm-generation-id" }
+<- {"return": {"guid": "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"}}
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index d7898a0..c1ebfcc 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -38,3 +38,4 @@ stub-obj-y += qmp_pc_dimm_device_list.o
 stub-obj-y += target-monitor-defs.o
 stub-obj-y += target-get-monitor-def.o
 stub-obj-y += vhost.o
+stub-obj-y += vmgenid.o
diff --git a/stubs/vmgenid.c b/stubs/vmgenid.c
new file mode 100644
index 0000000..1ff8cd2
--- /dev/null
+++ b/stubs/vmgenid.c
@@ -0,0 +1,7 @@
+#include "qmp-commands.h"
+
+GuidInfo *qmp_query_vm_generation_id(Error **errp)
+{
+    error_setg(errp, "this command is not currently supported");
+    return NULL;
+}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [Qemu-devel] [PATCH v19 6/9] qmp/hmp: add set-vm-generation-id commands
  2016-01-28 10:54 [Qemu-devel] [PATCH v19 0/9] Virtual Machine Generation ID Igor Mammedov
                   ` (4 preceding siblings ...)
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 5/9] qmp/hmp: add query-vm-generation-id and 'info vm-generation-id' commands Igor Mammedov
@ 2016-01-28 10:54 ` Igor Mammedov
  2016-02-09 17:33   ` Eric Blake
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 8/9] pc: put PIIX3 in slot 1 explicitly and cleanup functions assignment Igor Mammedov
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 59+ messages in thread
From: Igor Mammedov @ 2016-01-28 10:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: ehabkost, mst, ghammer, lersek, lcapitulino

Add a set-vm-generation-id command to set the Virtual
Machine Generation ID counter.

QMP command example:
    { "execute": "set-vm-generation-id",
          "arguments": {
              "guid": "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
          }
    }

HMP command example:
    set-vm-generation-id 324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v18:
   - use new GuidInfo type introduced in previous patch and fix
     argument to lowercase and corresponding example.
     Eric Blake <eblake@redhat.com>
   - s/if (errp != NULL)/if (errp)/
     Eric Blake <eblake@redhat.com>
---
 hmp-commands.hx   | 13 +++++++++++++
 hmp.c             | 12 ++++++++++++
 hmp.h             |  1 +
 hw/misc/vmgenid.c | 11 +++++++++++
 qapi-schema.json  | 11 +++++++++++
 qmp-commands.hx   | 22 ++++++++++++++++++++++
 stubs/vmgenid.c   |  6 ++++++
 7 files changed, 76 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index bb52e4d..d310707 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1756,5 +1756,18 @@ ETEXI
     },
 
 STEXI
+@item set-vm-generation-id @var{guid}
+Set Virtual Machine Generation ID counter to @var{guid}
+ETEXI
+
+    {
+        .name       = "set-vm-generation-id",
+        .args_type  = "guid:s",
+        .params     = "guid",
+        .help       = "Set Virtual Machine Generation ID counter",
+        .mhandler.cmd = hmp_set_vm_generation_id,
+    },
+
+STEXI
 @end table
 ETEXI
diff --git a/hmp.c b/hmp.c
index aeb753d..c1f3a7a 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2384,3 +2384,15 @@ void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict)
     }
     qapi_free_GuidInfo(info);
 }
+
+void hmp_set_vm_generation_id(Monitor *mon, const QDict *qdict)
+{
+    Error *errp = NULL;
+    const char *guid = qdict_get_str(qdict, "guid");
+
+    qmp_set_vm_generation_id(guid, &errp);
+    if (errp) {
+        hmp_handle_error(mon, &errp);
+        return;
+    }
+}
diff --git a/hmp.h b/hmp.h
index 21c5132..cbf2045 100644
--- a/hmp.h
+++ b/hmp.h
@@ -132,5 +132,6 @@ void hmp_rocker_ports(Monitor *mon, const QDict *qdict);
 void hmp_rocker_of_dpa_flows(Monitor *mon, const QDict *qdict);
 void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict);
 void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);
+void hmp_set_vm_generation_id(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
index 24c0a4e..4fa52bc 100644
--- a/hw/misc/vmgenid.c
+++ b/hw/misc/vmgenid.c
@@ -58,6 +58,17 @@ GuidInfo *qmp_query_vm_generation_id(Error **errp)
     return info;
 }
 
+void qmp_set_vm_generation_id(const char *guid, Error **errp)
+{
+    Object *obj = find_vmgneid_dev(errp);
+
+    if (!obj) {
+        return;
+    }
+
+    object_property_set_str(obj, guid, VMGENID_GUID, errp);
+}
+
 static void vmgenid_update_guest(VmGenIdState *s)
 {
     Object *acpi_obj;
diff --git a/qapi-schema.json b/qapi-schema.json
index 3f99549..770d451 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4103,3 +4103,14 @@
 # Since 2.6
 ##
 { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
+
+##
+# @set-vm-generation-id:
+#
+# Set Virtual Machine Generation ID
+#
+# @guid: new GUID to set as Virtual Machine Generation ID
+#
+# Since: 2.6
+##
+{ 'command': 'set-vm-generation-id', 'data': { 'guid': 'str' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 38e4e16..84738c7 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4814,3 +4814,25 @@ Example:
 
 -> { "execute": "query-vm-generation-id" }
 <- {"return": {"guid": "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"}}
+
+EQMP
+
+    {
+        .name       = "set-vm-generation-id",
+        .args_type  = "guid:s",
+        .mhandler.cmd_new = qmp_marshal_set_vm_generation_id,
+    },
+
+SQMP
+Set Virtual Machine Generation ID counter
+-----------------------------------------
+
+Arguments:
+
+- "guid": counter ID in GUID string representation (json-string)
+
+Example:
+
+-> { "execute": "set-vm-generation-id",
+     "arguments": { "guid": "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87" } }
+<- {"return": {}}
diff --git a/stubs/vmgenid.c b/stubs/vmgenid.c
index 1ff8cd2..6af0b73 100644
--- a/stubs/vmgenid.c
+++ b/stubs/vmgenid.c
@@ -5,3 +5,9 @@ GuidInfo *qmp_query_vm_generation_id(Error **errp)
     error_setg(errp, "this command is not currently supported");
     return NULL;
 }
+
+void qmp_set_vm_generation_id(const char *guid, Error **errp)
+{
+    error_setg(errp, "this command is not currently supported");
+    return;
+}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [Qemu-devel] [PATCH v19 8/9] pc: put PIIX3 in slot 1 explicitly and cleanup functions assignment
  2016-01-28 10:54 [Qemu-devel] [PATCH v19 0/9] Virtual Machine Generation ID Igor Mammedov
                   ` (5 preceding siblings ...)
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 6/9] qmp/hmp: add set-vm-generation-id commands Igor Mammedov
@ 2016-01-28 10:54 ` Igor Mammedov
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 9/9] pc/q35: by default put vmgenid device as a function of ISA bridge Igor Mammedov
  2016-01-28 10:58 ` [Qemu-devel] [PATCH v19 7/9] machine: add properties to compat_props incrementally Igor Mammedov
  8 siblings, 0 replies; 59+ messages in thread
From: Igor Mammedov @ 2016-01-28 10:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: ehabkost, mst, ghammer, lersek, lcapitulino

Currently the slot for the PIIX3 bridge is selected
dynamically, but it always ends up in slot 1 for
existing machine types.
This is easy to regress if another PCI device is
added before PIIX3 is created, and it also requires
passing around the devfn of the created bridge.
Replace the dynamic slot assignment with a static
one, like it's done for ICH9_LPC, explicitly
setting the slot # for the bridge.
While at it, clean up the IDE/USB/PIIX4_PM function
assignment, replacing magic offsets with defines.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
Static assignment will help with adding other
functions to the multifunction bridge in the
following patch.
---
 hw/i386/pc_piix.c    | 17 ++++++++++-------
 hw/pci-host/piix.c   |  9 ++++-----
 include/hw/i386/pc.h |  8 +++++++-
 3 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index bc74557..2ea3d84 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -73,7 +73,6 @@ static void pc_init1(MachineState *machine,
     PCIBus *pci_bus;
     ISABus *isa_bus;
     PCII440FXState *i440fx_state;
-    int piix3_devfn = -1;
     qemu_irq *gsi;
     qemu_irq *i8259;
     qemu_irq smi_irq;
@@ -179,7 +178,7 @@ static void pc_init1(MachineState *machine,
     if (pcmc->pci_enabled) {
         pci_bus = i440fx_init(host_type,
                               pci_type,
-                              &i440fx_state, &piix3_devfn, &isa_bus, gsi,
+                              &i440fx_state, &isa_bus, gsi,
                               system_memory, system_io, machine->ram_size,
                               pcms->below_4g_mem_size,
                               pcms->above_4g_mem_size,
@@ -229,9 +228,11 @@ static void pc_init1(MachineState *machine,
     if (pcmc->pci_enabled) {
         PCIDevice *dev;
         if (xen_enabled()) {
-            dev = pci_piix3_xen_ide_init(pci_bus, hd, piix3_devfn + 1);
+            dev = pci_piix3_xen_ide_init(pci_bus, hd,
+                PCI_DEVFN(PIIX3_PCI_SLOT, PIIX3_IDE_FUNC));
         } else {
-            dev = pci_piix3_ide_init(pci_bus, hd, piix3_devfn + 1);
+            dev = pci_piix3_ide_init(pci_bus, hd,
+                PCI_DEVFN(PIIX3_PCI_SLOT, PIIX3_IDE_FUNC));
         }
         idebus[0] = qdev_get_child_bus(&dev->qdev, "ide.0");
         idebus[1] = qdev_get_child_bus(&dev->qdev, "ide.1");
@@ -254,7 +255,8 @@ static void pc_init1(MachineState *machine,
     pc_cmos_init(pcms, idebus[0], idebus[1], rtc_state);
 
     if (pcmc->pci_enabled && usb_enabled()) {
-        pci_create_simple(pci_bus, piix3_devfn + 2, "piix3-usb-uhci");
+        pci_create_simple(pci_bus, PCI_DEVFN(PIIX3_PCI_SLOT, PIIX3_USB_FUNC),
+                          "piix3-usb-uhci");
     }
 
     if (pcmc->pci_enabled && acpi_enabled) {
@@ -263,8 +265,9 @@ static void pc_init1(MachineState *machine,
 
         smi_irq = qemu_allocate_irq(pc_acpi_smi_interrupt, first_cpu, 0);
         /* TODO: Populate SPD eeprom data.  */
-        smbus = piix4_pm_init(pci_bus, piix3_devfn + 3, 0xb100,
-                              gsi[9], smi_irq,
+        smbus = piix4_pm_init(pci_bus,
+                              PCI_DEVFN(PIIX3_PCI_SLOT, PIIX3_PIIX4_PM_FUNC),
+                              0xb100, gsi[9], smi_irq,
                               pc_machine_is_smm_enabled(pcms),
                               &piix4_pm);
         smbus_eeprom_init(smbus, 8, NULL, 0);
diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index b0d7e31..b8bf1fc 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -310,7 +310,6 @@ static void i440fx_realize(PCIDevice *dev, Error **errp)
 
 PCIBus *i440fx_init(const char *host_type, const char *pci_type,
                     PCII440FXState **pi440fx_state,
-                    int *piix3_devfn,
                     ISABus **isa_bus, qemu_irq *pic,
                     MemoryRegion *address_space_mem,
                     MemoryRegion *address_space_io,
@@ -382,13 +381,15 @@ PCIBus *i440fx_init(const char *host_type, const char *pci_type,
      * These additional routes can be discovered through ACPI. */
     if (xen_enabled()) {
         PCIDevice *pci_dev = pci_create_simple_multifunction(b,
-                             -1, true, "PIIX3-xen");
+                             PCI_DEVFN(PIIX3_PCI_SLOT, PIIX3_PCI_FUNC),
+                             true, "PIIX3-xen");
         piix3 = PIIX3_PCI_DEVICE(pci_dev);
         pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq,
                 piix3, XEN_PIIX_NUM_PIRQS);
     } else {
         PCIDevice *pci_dev = pci_create_simple_multifunction(b,
-                             -1, true, "PIIX3");
+                             PCI_DEVFN(PIIX3_PCI_SLOT, PIIX3_PCI_FUNC),
+                             true, "PIIX3");
         piix3 = PIIX3_PCI_DEVICE(pci_dev);
         pci_bus_irqs(b, piix3_set_irq, pci_slot_get_pirq, piix3,
                 PIIX_NUM_PIRQS);
@@ -397,8 +398,6 @@ PCIBus *i440fx_init(const char *host_type, const char *pci_type,
     piix3->pic = pic;
     *isa_bus = ISA_BUS(qdev_get_child_bus(DEVICE(piix3), "isa.0"));
 
-    *piix3_devfn = piix3->dev.devfn;
-
     ram_size = ram_size / 8 / 1024 / 1024;
     if (ram_size > 255) {
         ram_size = 255;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 7713361..69ed687 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -291,8 +291,14 @@ typedef struct PCII440FXState PCII440FXState;
 
 #define TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE "igd-passthrough-i440FX"
 
+#define PIIX3_PCI_SLOT                       1
+#define PIIX3_PCI_FUNC                       0
+#define PIIX3_IDE_FUNC                       1
+#define PIIX3_USB_FUNC                       2
+#define PIIX3_PIIX4_PM_FUNC                  3
+
 PCIBus *i440fx_init(const char *host_type, const char *pci_type,
-                    PCII440FXState **pi440fx_state, int *piix_devfn,
+                    PCII440FXState **pi440fx_state,
                     ISABus **isa_bus, qemu_irq *pic,
                     MemoryRegion *address_space_mem,
                     MemoryRegion *address_space_io,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [Qemu-devel] [PATCH v19 9/9] pc/q35: by default put vmgenid device as a function of ISA bridge
  2016-01-28 10:54 [Qemu-devel] [PATCH v19 0/9] Virtual Machine Generation ID Igor Mammedov
                   ` (6 preceding siblings ...)
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 8/9] pc: put PIIX3 in slot 1 explicitly and cleanup functions assignment Igor Mammedov
@ 2016-01-28 10:54 ` Igor Mammedov
  2016-01-28 10:58 ` [Qemu-devel] [PATCH v19 7/9] machine: add properties to compat_props incrementally Igor Mammedov
  8 siblings, 0 replies; 59+ messages in thread
From: Igor Mammedov @ 2016-01-28 10:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: ehabkost, mst, ghammer, lersek, lcapitulino

It saves a PCI slot that would otherwise be consumed.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/i386/pc_piix.c      | 10 ++++++++++
 hw/i386/pc_q35.c       |  9 +++++++++
 include/hw/i386/ich9.h |  3 ++-
 include/hw/i386/pc.h   |  1 +
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 2ea3d84..e51e885 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -54,6 +54,7 @@
 #endif
 #include "migration/migration.h"
 #include "kvm_i386.h"
+#include "hw/misc/vmgenid.h"
 
 #define MAX_IDE_BUS 2
 
@@ -412,8 +413,17 @@ static void pc_xen_hvm_init(MachineState *machine)
     } \
     DEFINE_PC_MACHINE(suffix, name, pc_init_##suffix, optionfn)
 
+#define DEFAULT_PC_PROPS \
+    { \
+        .driver   = VMGENID_DEVICE, \
+        .property = "addr", \
+        .value    = stringify(PIIX3_PCI_SLOT) "."  \
+                    stringify(PIIX3_VMGENID_FUNC), \
+    }, \
+
 static void pc_i440fx_machine_options(MachineClass *m)
 {
+    SET_MACHINE_COMPAT(m, DEFAULT_PC_PROPS);
     m->family = "pc_piix";
     m->desc = "Standard PC (i440FX + PIIX, 1996)";
     m->hot_add_cpu = pc_hot_add_cpu;
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 6128b02..8c8d4ab 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -45,6 +45,7 @@
 #include "hw/usb.h"
 #include "qemu/error-report.h"
 #include "migration/migration.h"
+#include "hw/misc/vmgenid.h"
 
 /* ICH9 AHCI has 6 ports */
 #define MAX_SATA_PORTS     6
@@ -335,9 +336,17 @@ static void pc_compat_1_4(MachineState *machine)
     } \
     DEFINE_PC_MACHINE(suffix, name, pc_init_##suffix, optionfn)
 
+#define DEFAULT_Q35_PROPS \
+    { \
+        .driver   = VMGENID_DEVICE, \
+        .property = "addr",         \
+        .value    = stringify(ICH9_LPC_DEV) "."       \
+                    stringify(ICH9_LPC_VMGENID_FUNC), \
+    }, \
 
 static void pc_q35_machine_options(MachineClass *m)
 {
+    SET_MACHINE_COMPAT(m, DEFAULT_Q35_PROPS);
     m->family = "pc_q35";
     m->desc = "Standard PC (Q35 + ICH9, 2009)";
     m->hot_add_cpu = pc_hot_add_cpu;
diff --git a/include/hw/i386/ich9.h b/include/hw/i386/ich9.h
index b9d2b04..271b0c7 100644
--- a/include/hw/i386/ich9.h
+++ b/include/hw/i386/ich9.h
@@ -129,8 +129,9 @@ Object *ich9_lpc_find(void);
 #define ICH9_A2_LPC                             "ICH9 A2 LPC"
 #define ICH9_A2_LPC_SAVEVM_VERSION              0
 
-#define ICH9_LPC_DEV                            31
+#define ICH9_LPC_DEV                            0x1f
 #define ICH9_LPC_FUNC                           0
+#define ICH9_LPC_VMGENID_FUNC                   6
 
 #define ICH9_A2_LPC_REVISION                    0x2
 #define ICH9_LPC_NB_PIRQS                       8       /* PCI A-H */
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 69ed687..af3a74a 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -296,6 +296,7 @@ typedef struct PCII440FXState PCII440FXState;
 #define PIIX3_IDE_FUNC                       1
 #define PIIX3_USB_FUNC                       2
 #define PIIX3_PIIX4_PM_FUNC                  3
+#define PIIX3_VMGENID_FUNC                   7
 
 PCIBus *i440fx_init(const char *host_type, const char *pci_type,
                     PCII440FXState **pi440fx_state,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [Qemu-devel] [PATCH v19 7/9] machine: add properties to compat_props incrementally
  2016-01-28 10:54 [Qemu-devel] [PATCH v19 0/9] Virtual Machine Generation ID Igor Mammedov
                   ` (7 preceding siblings ...)
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 9/9] pc/q35: by default put vmgenid device as a function of ISA bridge Igor Mammedov
@ 2016-01-28 10:58 ` Igor Mammedov
  2016-01-28 14:02   ` Eduardo Habkost
  2016-01-29 12:51   ` [Qemu-devel] [PATCH v19 7/9] machine: add properties to compat_props incrementally Cornelia Huck
  8 siblings, 2 replies; 59+ messages in thread
From: Igor Mammedov @ 2016-01-28 10:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: ehabkost, mst, ghammer, lersek, agraf, lcapitulino, borntraeger,
	qemu-ppc, cornelia.huck, pbonzini, rth, david

Switch to adding compat properties incrementally instead of
completely overwriting compat_props per machine type.
That removes the data duplication we have due to nested
[PC|SPAPR]_COMPAT_* macros.

It also allows setting default device properties from the
default foo_machine_options() hook, which will be used
in a following patch for putting the VMGENID device on
a function of the ISA bridge on pc/q35 machines.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>

compat_props GArray
---
 hw/core/machine.c          | 10 ++++++++++
 hw/ppc/spapr.c             |  3 ---
 hw/s390x/s390-virtio-ccw.c | 12 ++----------
 include/hw/boards.h        | 11 +++++++++--
 include/hw/i386/pc.h       |  9 ---------
 vl.c                       |  6 +++++-
 6 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index c46ddc7..2b96f47 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -526,6 +526,15 @@ bool machine_mem_merge(MachineState *machine)
     return machine->mem_merge;
 }
 
+static void machine_class_finalize(ObjectClass *klass, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(klass);
+
+    if (mc->compat_props) {
+        g_array_free(mc->compat_props, true);
+    }
+}
+
 static const TypeInfo machine_info = {
     .name = TYPE_MACHINE,
     .parent = TYPE_OBJECT,
@@ -533,6 +542,7 @@ static const TypeInfo machine_info = {
     .class_size = sizeof(MachineClass),
     .class_init    = machine_class_init,
     .class_base_init = machine_class_base_init,
+    .class_finalize = machine_class_finalize,
     .instance_size = sizeof(MachineState),
     .instance_init = machine_initfn,
     .instance_finalize = machine_finalize,
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 50e5a26..4ec1156 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2412,7 +2412,6 @@ DEFINE_SPAPR_MACHINE(2_4, "2.4", false);
  * pseries-2.3
  */
 #define SPAPR_COMPAT_2_3 \
-        SPAPR_COMPAT_2_4 \
         HW_COMPAT_2_3 \
         {\
             .driver   = "spapr-pci-host-bridge",\
@@ -2439,7 +2438,6 @@ DEFINE_SPAPR_MACHINE(2_3, "2.3", false);
  */
 
 #define SPAPR_COMPAT_2_2 \
-        SPAPR_COMPAT_2_3 \
         HW_COMPAT_2_2 \
         {\
             .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,\
@@ -2463,7 +2461,6 @@ DEFINE_SPAPR_MACHINE(2_2, "2.2", false);
  * pseries-2.1
  */
 #define SPAPR_COMPAT_2_1 \
-        SPAPR_COMPAT_2_2 \
         HW_COMPAT_2_1
 
 static void spapr_machine_2_1_instance_options(MachineState *machine)
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 586ddbb..0d3c3f8 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -282,13 +282,9 @@ static const TypeInfo ccw_machine_info = {
 static void ccw_machine_2_4_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
-    static GlobalProperty compat_props[] = {
-        CCW_COMPAT_2_4
-        { /* end of list */ }
-    };
 
     mc->desc = "VirtIO-ccw based S390 machine v2.4";
-    mc->compat_props = compat_props;
+    SET_MACHINE_COMPAT(mc, CCW_COMPAT_2_4);
 }
 
 static const TypeInfo ccw_machine_2_4_info = {
@@ -300,13 +296,9 @@ static const TypeInfo ccw_machine_2_4_info = {
 static void ccw_machine_2_5_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
-    static GlobalProperty compat_props[] = {
-        CCW_COMPAT_2_5
-        { /* end of list */ }
-    };
 
     mc->desc = "VirtIO-ccw based S390 machine v2.5";
-    mc->compat_props = compat_props;
+    SET_MACHINE_COMPAT(mc, CCW_COMPAT_2_5);
 }
 
 static const TypeInfo ccw_machine_2_5_info = {
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 0f30959..cdb4a98 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -90,7 +90,7 @@ struct MachineClass {
     const char *default_machine_opts;
     const char *default_boot_order;
     const char *default_display;
-    GlobalProperty *compat_props;
+    GArray *compat_props;
     const char *hw_version;
     ram_addr_t default_ram_size;
     bool option_rom_has_mr;
@@ -159,11 +159,18 @@ struct MachineState {
 
 #define SET_MACHINE_COMPAT(m, COMPAT) \
     do {                              \
+        int i;                        \
         static GlobalProperty props[] = {       \
             COMPAT                              \
             { /* end of list */ }               \
         };                                      \
-        (m)->compat_props = props;              \
+        if (!m->compat_props) { \
+            m->compat_props = g_array_new(false, false, sizeof(void *)); \
+        } \
+        for (i = 0; props[i].driver != NULL; i++) {    \
+            GlobalProperty *prop = &props[i];          \
+            g_array_append_val(m->compat_props, prop); \
+        }                                              \
     } while (0)
 
 #endif
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 65e8f24..7713361 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -361,7 +361,6 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
     HW_COMPAT_2_5
 
 #define PC_COMPAT_2_4 \
-    PC_COMPAT_2_5 \
     HW_COMPAT_2_4 \
     {\
         .driver   = "Haswell-" TYPE_X86_CPU,\
@@ -432,7 +431,6 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
 
 
 #define PC_COMPAT_2_3 \
-    PC_COMPAT_2_4 \
     HW_COMPAT_2_3 \
     {\
         .driver   = TYPE_X86_CPU,\
@@ -513,7 +511,6 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
     },
 
 #define PC_COMPAT_2_2 \
-    PC_COMPAT_2_3 \
     HW_COMPAT_2_2 \
     {\
         .driver = "kvm64" "-" TYPE_X86_CPU,\
@@ -607,7 +604,6 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
     },
 
 #define PC_COMPAT_2_1 \
-    PC_COMPAT_2_2 \
     HW_COMPAT_2_1 \
     {\
         .driver = "coreduo" "-" TYPE_X86_CPU,\
@@ -621,7 +617,6 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
     },
 
 #define PC_COMPAT_2_0 \
-    PC_COMPAT_2_1 \
     {\
         .driver   = "virtio-scsi-pci",\
         .property = "any_layout",\
@@ -681,7 +676,6 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
     },
 
 #define PC_COMPAT_1_7 \
-    PC_COMPAT_2_0 \
     {\
         .driver   = TYPE_USB_DEVICE,\
         .property = "msos-desc",\
@@ -699,7 +693,6 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
     },
 
 #define PC_COMPAT_1_6 \
-    PC_COMPAT_1_7 \
     {\
         .driver   = "e1000",\
         .property = "mitigation",\
@@ -723,7 +716,6 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
     },
 
 #define PC_COMPAT_1_5 \
-    PC_COMPAT_1_6 \
     {\
         .driver   = "Conroe-" TYPE_X86_CPU,\
         .property = "model",\
@@ -767,7 +759,6 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
     },
 
 #define PC_COMPAT_1_4 \
-    PC_COMPAT_1_5 \
     {\
         .driver   = "scsi-hd",\
         .property = "discard_granularity",\
diff --git a/vl.c b/vl.c
index f043009..cf103d7 100644
--- a/vl.c
+++ b/vl.c
@@ -4492,7 +4492,11 @@ int main(int argc, char **argv, char **envp)
     }
 
     if (machine_class->compat_props) {
-        qdev_prop_register_global_list(machine_class->compat_props);
+        GlobalProperty *p;
+        for (i = 0; i < machine_class->compat_props->len; i++) {
+            p = g_array_index(machine_class->compat_props, GlobalProperty *, i);
+            qdev_prop_register_global(p);
+        }
     }
     qemu_add_globals();
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device Igor Mammedov
@ 2016-01-28 11:13   ` Michael S. Tsirkin
  2016-01-28 12:03     ` Igor Mammedov
  2016-01-28 13:48     ` Laszlo Ersek
  0 siblings, 2 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-01-28 11:13 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: ehabkost, ghammer, lersek, qemu-devel, lcapitulino

On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:
> Based on Microsoft's specifications (paper can be
> downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> easily found by "Virtual Machine Generation ID" keywords),
> add a PCI device with a corresponding description in the
> SSDT ACPI table.
> 
> The GUID is set using the "vmgenid.guid" property or
> a corresponding HMP/QMP command.
> 
> Example of using vmgenid device:
>  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> 
> The 'vmgenid' device initialization flow is as follows:
>  1. vmgenid has a RAM BAR registered with the size of the GUID buffer
>  2. BIOS initializes PCI devices and maps the BAR into the PCI hole
>  3. BIOS reads ACPI tables from QEMU; at that moment the tables
>     are generated with the \_SB.VMGI.ADDR constant pointing to
>     the GPA where the BIOS mapped vmgenid's BAR earlier
> 
> Note:
> This implementation uses PCI class code 0x0500 for the vmgenid
> device, which is marked as NO_DRV in Windows' machine.inf.
> Testing various Windows versions showed that the OS neither
> touches nor checks for resource conflicts
> for such PCI devices.
> There was a concern that during PCI rebalancing the OS
> could reprogram the BAR to a different address, which would
> leave VGEN.ADDR pointing to the old (no longer valid)
> address.
> However, testing showed that Windows rebalances only
> PCI devices that have a driver attached
> and completely ignores NO_DRV-class devices.
> This in turn creates a problem where the OS could remap
> one of the PCI devices (with a driver) over a BAR used by
> a driver-less PCI device.
> Statically declaring the used memory range in VGEN._CRS
> makes the OS honor the resource reservation, so the ignored
> BAR range is no longer touched during PCI rebalancing.
> 
> Signed-off-by: Gal Hammer <ghammer@redhat.com>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

It's an interesting hack, but this needs some thought. The BIOS has no
idea this BAR is special and cannot be rebalanced, so it might put the
BAR in the middle of the range, in effect fragmenting it.

Really, I think something like v12 just rewritten using the new APIs
(probably with something like the build_append_named_dword that I suggested)
would be a much simpler way to implement this device, given
the weird API limitations.

And hey, if you want to use a PCI device to pass the physical
address from guest to host, then instead of reserving
a couple of I/O addresses, sure, stick it in PCI config space in
a vendor-specific capability; this way it'll get migrated
automatically.
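As a rough illustration of that suggestion (this is a sketch, not code
from the patch series — the struct layout, field names, and the 0x40
offset are assumptions), a vendor-specific PCI capability (capability
ID 0x09) carrying the buffer's 64-bit guest physical address in config
space could look like:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: a vendor-specific PCI capability (ID 0x09)
 * holding the guest physical address of the vmgenid GUID buffer.
 * Because it lives in config space, it would be migrated along with
 * the rest of the device's config state. */
#define PCI_CAP_ID_VNDR 0x09

struct vmgenid_vndr_cap {
    uint8_t  cap_id;     /* PCI_CAP_ID_VNDR */
    uint8_t  cap_next;   /* offset of next capability, 0 if last */
    uint8_t  cap_len;    /* total length of this capability */
    uint8_t  reserved;
    uint64_t buf_paddr;  /* guest physical address of the GUID buffer */
} __attribute__((packed));

/* Serialize the capability into a config-space byte array at 'off'. */
static void write_vmgenid_cap(uint8_t *cfg, unsigned off, uint64_t paddr)
{
    struct vmgenid_vndr_cap cap = {
        .cap_id   = PCI_CAP_ID_VNDR,
        .cap_next = 0,
        .cap_len  = sizeof(cap),
        .buf_paddr = paddr,
    };
    memcpy(cfg + off, &cap, sizeof(cap));
}
```

In real QEMU code the capability would be added with the device's
capability-registration helpers rather than a raw memcpy; the point is
only that the address travels inside config space.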


> ---
> changes since v17:
>   - small fixups suggested in v14 review by "Michael S. Tsirkin" <mst@redhat.com>
>   - make the BAR prefetchable so the region is cached, as per the MS spec
>   - s/uuid/guid/ to match the spec
> changes since v14:
>   - reserve BAR resources so that Windows won't touch them
>     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
>   - ACPI: split the VGEN device off the PCI device descriptor
>     and place it in the PCI0 scope, so that there is no need to
>     trace its location across PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
>   - permit only one vmgenid to be created
>   - allow the BAR to be mapped above 4 GB if it can't be mapped in low mem
> ---
>  default-configs/i386-softmmu.mak   |   1 +
>  default-configs/x86_64-softmmu.mak |   1 +
>  docs/specs/pci-ids.txt             |   1 +
>  hw/i386/acpi-build.c               |  56 +++++++++++++-
>  hw/misc/Makefile.objs              |   1 +
>  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
>  include/hw/misc/vmgenid.h          |  27 +++++++
>  include/hw/pci/pci.h               |   1 +
>  8 files changed, 240 insertions(+), 2 deletions(-)
>  create mode 100644 hw/misc/vmgenid.c
>  create mode 100644 include/hw/misc/vmgenid.h
> 
> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> index b177e52..6402439 100644
> --- a/default-configs/i386-softmmu.mak
> +++ b/default-configs/i386-softmmu.mak
> @@ -51,6 +51,7 @@ CONFIG_APIC=y
>  CONFIG_IOAPIC=y
>  CONFIG_PVPANIC=y
>  CONFIG_MEM_HOTPLUG=y
> +CONFIG_VMGENID=y
>  CONFIG_NVDIMM=y
>  CONFIG_ACPI_NVDIMM=y
>  CONFIG_XIO3130=y
> diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> index 6e3b312..fdac18f 100644
> --- a/default-configs/x86_64-softmmu.mak
> +++ b/default-configs/x86_64-softmmu.mak
> @@ -51,6 +51,7 @@ CONFIG_APIC=y
>  CONFIG_IOAPIC=y
>  CONFIG_PVPANIC=y
>  CONFIG_MEM_HOTPLUG=y
> +CONFIG_VMGENID=y
>  CONFIG_NVDIMM=y
>  CONFIG_ACPI_NVDIMM=y
>  CONFIG_XIO3130=y
> diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> index 0adcb89..e65ecf9 100644
> --- a/docs/specs/pci-ids.txt
> +++ b/docs/specs/pci-ids.txt
> @@ -47,6 +47,7 @@ PCI devices (other than virtio):
>  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
>  1b36:0006  PCI Rocker Ethernet switch device
>  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> +1b36:0009  PCI VM-Generation device
>  1b36:000a  PCI-PCI bridge (multiseat)
>  
>  All these devices are documented in docs/specs.
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 78758e2..0187262 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -44,6 +44,7 @@
>  #include "hw/acpi/tpm.h"
>  #include "sysemu/tpm_backend.h"
>  #include "hw/timer/mc146818rtc_regs.h"
> +#include "hw/misc/vmgenid.h"
>  
>  /* Supported chipsets: */
>  #include "hw/acpi/piix4.h"
> @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
>      info->applesmc_io_base = applesmc_port();
>  }
>  
> +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> +{
> +    Aml *dev, *pkg, *crs;
> +
> +    dev = aml_device("VGEN");
> +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> +
> +    pkg = aml_package(2);
> +    /* low 32 bits of UUID buffer addr */
> +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> +    /* high 32 bits of UUID buffer addr */
> +    aml_append(pkg, aml_int(buf_paddr >> 32));
> +    aml_append(dev, aml_name_decl("ADDR", pkg));
> +
> +    /*
> +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> +     * displays it as "PCI RAM controller" which is marked as NO_DRV
> +     * so Windows ignores VMGEN device completely and doesn't check
> +     * for resource conflicts which during PCI rebalancing can lead
> +     * to another PCI device claiming ignored BARs. To prevent this
> +     * statically reserve resources used by VM_Gen_Counter.
> +     * For more verbose comment see this commit message.

What does "this commit message" mean?

> +     */
> +     crs = aml_resource_template();
> +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> +                VMGENID_VMGID_BUF_SIZE));
> +     aml_append(dev, aml_name_decl("_CRS", crs));
> +     return dev;
> +}
> +
>  /*
>   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
>   * On i386 arch we only have two pci hosts, so we can look only for them.
> @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
>              }
>  
>              if (bus) {
> +                Object *vmgen;
>                  Aml *scope = aml_scope("PCI0");
>                  /* Scan all PCI buses. Generate tables to support hotplug. */
>                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
>                      aml_append(scope, dev);
>                  }
>  
> +                vmgen = find_vmgneid_dev(NULL);
> +                if (vmgen) {
> +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> +                    uint64_t buf_paddr =
> +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> +
> +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> +
> +                        method = aml_method("\\_GPE._E00", 0,
> +                                            AML_NOTSERIALIZED);
> +                        aml_append(method,
> +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> +                                       aml_int(0x80)));
> +                        aml_append(ssdt, method);
> +                    }
> +                }
> +
>                  aml_append(sb_scope, scope);
>              }
>          }
> @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
>      {
>          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
>  
> -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> -
>          if (misc->is_piix4) {
>              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
>              aml_append(method,
> diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> index d4765c2..1f05edd 100644
> --- a/hw/misc/Makefile.objs
> +++ b/hw/misc/Makefile.objs
> @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
>  
>  obj-$(CONFIG_PVPANIC) += pvpanic.o
>  obj-$(CONFIG_EDU) += edu.o
> +obj-$(CONFIG_VMGENID) += vmgenid.o
>  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> new file mode 100644
> index 0000000..a2fbdfc
> --- /dev/null
> +++ b/hw/misc/vmgenid.c
> @@ -0,0 +1,154 @@
> +/*
> + *  Virtual Machine Generation ID Device
> + *
> + *  Copyright (C) 2016 Red Hat Inc.
> + *
> + *  Authors: Gal Hammer <ghammer@redhat.com>
> + *           Igor Mammedov <imammedo@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "hw/i386/pc.h"
> +#include "hw/pci/pci.h"
> +#include "hw/misc/vmgenid.h"
> +#include "hw/acpi/acpi.h"
> +#include "qapi/visitor.h"
> +
> +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> +
> +typedef struct VmGenIdState {
> +    PCIDevice parent_obj;
> +    MemoryRegion iomem;
> +    union {
> +        uint8_t guid[16];
> +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> +    };
> +    bool guid_set;
> +} VmGenIdState;
> +
> +Object *find_vmgneid_dev(Error **errp)
> +{
> +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> +    if (!obj) {
> +        error_setg(errp, VMGENID_DEVICE " is not found");
> +    }
> +    return obj;
> +}
> +
> +static void vmgenid_update_guest(VmGenIdState *s)
> +{
> +    Object *acpi_obj;
> +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> +
> +    memcpy(ptr, &s->guid, sizeof(s->guid));
> +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> +
> +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> +    if (acpi_obj) {
> +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> +        ACPIREGS *acpi_regs = adevc->regs(adev);
> +
> +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> +    }
> +}
> +
> +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> +{
> +    VmGenIdState *s = VMGENID(obj);
> +
> +    if (qemu_uuid_parse(value, s->guid) < 0) {
> +        error_setg(errp, "'%s." VMGENID_GUID
> +                   "': Failed to parse GUID string: %s",
> +                   object_get_typename(OBJECT(s)),
> +                   value);
> +        return;
> +    }
> +
> +    s->guid_set = true;
> +    vmgenid_update_guest(s);
> +}
> +
> +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> +                                   const char *name, Error **errp)
> +{
> +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> +
> +    if (value == PCI_BAR_UNMAPPED) {
> +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> +                   object_get_typename(OBJECT(obj)));
> +        return;
> +    }
> +    visit_type_int(v, &value, name, errp);
> +}
> +
> +static void vmgenid_initfn(Object *obj)
> +{
> +    VmGenIdState *s = VMGENID(obj);
> +
> +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> +                           &error_abort);
> +
> +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> +}
> +
> +
> +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> +{
> +    VmGenIdState *s = VMGENID(dev);
> +    bool ambiguous = false;
> +
> +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> +    if (ambiguous) {
> +        error_setg(errp, "no more than one " VMGENID_DEVICE
> +                         " device is permitted");
> +        return;
> +    }
> +
> +    if (!s->guid_set) {
> +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> +                   object_get_typename(OBJECT(s)));
> +        return;
> +    }
> +
> +    vmstate_register_ram(&s->iomem, DEVICE(s));
> +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> +        &s->iomem);
> +    return;
> +}
> +
> +static void vmgenid_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> +
> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> +    dc->hotpluggable = false;
> +    k->realize = vmgenid_realize;
> +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> +    k->class_id = PCI_CLASS_MEMORY_RAM;
> +}
> +
> +static const TypeInfo vmgenid_device_info = {
> +    .name          = VMGENID_DEVICE,
> +    .parent        = TYPE_PCI_DEVICE,
> +    .instance_size = sizeof(VmGenIdState),
> +    .instance_init = vmgenid_initfn,
> +    .class_init    = vmgenid_class_init,
> +};
> +
> +static void vmgenid_register_types(void)
> +{
> +    type_register_static(&vmgenid_device_info);
> +}
> +
> +type_init(vmgenid_register_types)
> diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> new file mode 100644
> index 0000000..b90882c
> --- /dev/null
> +++ b/include/hw/misc/vmgenid.h
> @@ -0,0 +1,27 @@
> +/*
> + *  Virtual Machine Generation ID Device
> + *
> + *  Copyright (C) 2016 Red Hat Inc.
> + *
> + *  Authors: Gal Hammer <ghammer@redhat.com>
> + *           Igor Mammedov <imammedo@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef HW_MISC_VMGENID_H
> +#define HW_MISC_VMGENID_H
> +
> +#include "qom/object.h"
> +
> +#define VMGENID_DEVICE           "vmgenid"
> +#define VMGENID_GUID             "guid"
> +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> +#define VMGENID_VMGID_BUF_SIZE   0x1000
> +#define VMGENID_VMGID_BUF_BAR    0
> +
> +Object *find_vmgneid_dev(Error **errp);
> +
> +#endif
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index dedf277..f4c9d48 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -94,6 +94,7 @@
>  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
>  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
>  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
>  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
>  
>  #define FMT_PCIBUS                      PRIx64
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-01-28 11:13   ` Michael S. Tsirkin
@ 2016-01-28 12:03     ` Igor Mammedov
  2016-01-28 12:59       ` Michael S. Tsirkin
  2016-01-28 13:48     ` Laszlo Ersek
  1 sibling, 1 reply; 59+ messages in thread
From: Igor Mammedov @ 2016-01-28 12:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: ehabkost, ghammer, lersek, qemu-devel, lcapitulino

On Thu, 28 Jan 2016 13:13:04 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:
> > Based on Microsoft's specifications (paper can be
> > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > easily found by "Virtual Machine Generation ID" keywords),
> > add a PCI device with a corresponding description in the
> > SSDT ACPI table.
> > 
> > The GUID is set using the "vmgenid.guid" property or
> > a corresponding HMP/QMP command.
> > 
> > Example of using vmgenid device:
> >  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > 
> > The 'vmgenid' device initialization flow is as follows:
> >  1. vmgenid has a RAM BAR registered with the size of the GUID buffer
> >  2. BIOS initializes PCI devices and maps the BAR into the PCI hole
> >  3. BIOS reads ACPI tables from QEMU; at that moment the tables
> >     are generated with the \_SB.VMGI.ADDR constant pointing to
> >     the GPA where the BIOS mapped vmgenid's BAR earlier
> > 
> > Note:
> > This implementation uses PCI class code 0x0500 for the vmgenid
> > device, which is marked as NO_DRV in Windows' machine.inf.
> > Testing various Windows versions showed that the OS neither
> > touches nor checks for resource conflicts
> > for such PCI devices.
> > There was a concern that during PCI rebalancing the OS
> > could reprogram the BAR to a different address, which would
> > leave VGEN.ADDR pointing to the old (no longer valid)
> > address.
> > However, testing showed that Windows rebalances only
> > PCI devices that have a driver attached
> > and completely ignores NO_DRV-class devices.
> > This in turn creates a problem where the OS could remap
> > one of the PCI devices (with a driver) over a BAR used by
> > a driver-less PCI device.
> > Statically declaring the used memory range in VGEN._CRS
> > makes the OS honor the resource reservation, so the ignored
> > BAR range is no longer touched during PCI rebalancing.
> > 
> > Signed-off-by: Gal Hammer <ghammer@redhat.com>
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> 
> It's an interesting hack, but this needs some thought. The BIOS has no idea
> this BAR is special and cannot be rebalanced, so it might put the BAR
> in the middle of the range, in effect fragmenting it.
Yep, that's the only drawback of the PCI approach.
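The fragmentation concern can be illustrated with a toy calculation
(not QEMU code — the window and BAR addresses are made-up examples):
placing a small fixed BAR in the middle of a free MMIO window roughly
halves the largest contiguous span left for other BARs, whereas placing
it at an edge barely shrinks it.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one free MMIO window [win_start, win_end) with a single
 * fixed BAR of bar_size placed at bar_start inside it. Returns the
 * largest contiguous free span remaining for other BARs. */
static uint64_t largest_free_span(uint64_t win_start, uint64_t win_end,
                                  uint64_t bar_start, uint64_t bar_size)
{
    uint64_t below = bar_start - win_start;             /* free space below the BAR */
    uint64_t above = win_end - (bar_start + bar_size);  /* free space above the BAR */
    return below > above ? below : above;
}
```

With a 256 MiB window and a 4 KiB BAR, an edge placement leaves a
256 MiB - 4 KiB span, while a mid-window placement leaves only 128 MiB.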

> Really, I think something like v12 just rewritten using the new APIs
> (probably with something like the build_append_named_dword that I suggested)
> would be a much simpler way to implement this device, given
> the weird API limitations.
We went over the drawbacks of both approaches several times,
and that's where I strongly disagree with using the v12 AML patching
approach, for the reasons stated in those discussions.

> And hey, if you want to use a PCI device to pass the physical
> address from guest to host, then instead of reserving
> a couple of I/O addresses, sure, stick it in PCI config space in
> a vendor-specific capability; this way it'll get migrated
> automatically.
Could you elaborate more on this suggestion?

> 
> 
> > ---
> > changes since v17:
> >   - small fixups suggested in v14 review by "Michael S. Tsirkin" <mst@redhat.com>
> >   - make the BAR prefetchable so the region is cached, as per the MS spec
> >   - s/uuid/guid/ to match the spec
> > changes since v14:
> >   - reserve BAR resources so that Windows won't touch them
> >     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
> >   - ACPI: split the VGEN device off the PCI device descriptor
> >     and place it in the PCI0 scope, so that there is no need to
> >     trace its location across PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
> >   - permit only one vmgenid to be created
> >   - allow the BAR to be mapped above 4 GB if it can't be mapped in low mem
> > ---
> >  default-configs/i386-softmmu.mak   |   1 +
> >  default-configs/x86_64-softmmu.mak |   1 +
> >  docs/specs/pci-ids.txt             |   1 +
> >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> >  hw/misc/Makefile.objs              |   1 +
> >  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
> >  include/hw/misc/vmgenid.h          |  27 +++++++
> >  include/hw/pci/pci.h               |   1 +
> >  8 files changed, 240 insertions(+), 2 deletions(-)
> >  create mode 100644 hw/misc/vmgenid.c
> >  create mode 100644 include/hw/misc/vmgenid.h
> > 
> > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > index b177e52..6402439 100644
> > --- a/default-configs/i386-softmmu.mak
> > +++ b/default-configs/i386-softmmu.mak
> > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> >  CONFIG_IOAPIC=y
> >  CONFIG_PVPANIC=y
> >  CONFIG_MEM_HOTPLUG=y
> > +CONFIG_VMGENID=y
> >  CONFIG_NVDIMM=y
> >  CONFIG_ACPI_NVDIMM=y
> >  CONFIG_XIO3130=y
> > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > index 6e3b312..fdac18f 100644
> > --- a/default-configs/x86_64-softmmu.mak
> > +++ b/default-configs/x86_64-softmmu.mak
> > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> >  CONFIG_IOAPIC=y
> >  CONFIG_PVPANIC=y
> >  CONFIG_MEM_HOTPLUG=y
> > +CONFIG_VMGENID=y
> >  CONFIG_NVDIMM=y
> >  CONFIG_ACPI_NVDIMM=y
> >  CONFIG_XIO3130=y
> > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > index 0adcb89..e65ecf9 100644
> > --- a/docs/specs/pci-ids.txt
> > +++ b/docs/specs/pci-ids.txt
> > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> >  1b36:0006  PCI Rocker Ethernet switch device
> >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > +1b36:0009  PCI VM-Generation device
> >  1b36:000a  PCI-PCI bridge (multiseat)
> >  
> >  All these devices are documented in docs/specs.
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 78758e2..0187262 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -44,6 +44,7 @@
> >  #include "hw/acpi/tpm.h"
> >  #include "sysemu/tpm_backend.h"
> >  #include "hw/timer/mc146818rtc_regs.h"
> > +#include "hw/misc/vmgenid.h"
> >  
> >  /* Supported chipsets: */
> >  #include "hw/acpi/piix4.h"
> > @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> >      info->applesmc_io_base = applesmc_port();
> >  }
> >  
> > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > +{
> > +    Aml *dev, *pkg, *crs;
> > +
> > +    dev = aml_device("VGEN");
> > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> > +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> > +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> > +
> > +    pkg = aml_package(2);
> > +    /* low 32 bits of UUID buffer addr */
> > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > +    /* high 32 bits of UUID buffer addr */
> > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > +
> > +    /*
> > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> > +     * displays it as "PCI RAM controller" which is marked as NO_DRV
> > +     * so Windows ignores VMGEN device completely and doesn't check
> > +     * for resource conflicts which during PCI rebalancing can lead
> > +     * to another PCI device claiming ignored BARs. To prevent this
> > +     * statically reserve resources used by VM_Gen_Counter.
> > +     * For more verbose comment see this commit message.  
> 
> What does "this commit message" mean?
The commit message above. Should I reword it to just 'see commit message'?

> 
> > +     */
> > +     crs = aml_resource_template();
> > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> > +                VMGENID_VMGID_BUF_SIZE));
> > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > +     return dev;
> > +}
> > +
> >  /*
> >   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
> >   * On i386 arch we only have two pci hosts, so we can look only for them.
> > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> >              }
> >  
> >              if (bus) {
> > +                Object *vmgen;
> >                  Aml *scope = aml_scope("PCI0");
> >                  /* Scan all PCI buses. Generate tables to support hotplug. */
> >                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
> >                      aml_append(scope, dev);
> >                  }
> >  
> > +                vmgen = find_vmgneid_dev(NULL);
> > +                if (vmgen) {
> > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > +                    uint64_t buf_paddr =
> > +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> > +
> > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> > +
> > +                        method = aml_method("\\_GPE._E00", 0,
> > +                                            AML_NOTSERIALIZED);
> > +                        aml_append(method,
> > +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > +                                       aml_int(0x80)));
> > +                        aml_append(ssdt, method);
> > +                    }
> > +                }
> > +
> >                  aml_append(sb_scope, scope);
> >              }
> >          }
> > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
> >      {
> >          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
> >  
> > -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> > -
> >          if (misc->is_piix4) {
> >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> >              aml_append(method,
> > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > index d4765c2..1f05edd 100644
> > --- a/hw/misc/Makefile.objs
> > +++ b/hw/misc/Makefile.objs
> > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> >  
> >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> >  obj-$(CONFIG_EDU) += edu.o
> > +obj-$(CONFIG_VMGENID) += vmgenid.o
> >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > new file mode 100644
> > index 0000000..a2fbdfc
> > --- /dev/null
> > +++ b/hw/misc/vmgenid.c
> > @@ -0,0 +1,154 @@
> > +/*
> > + *  Virtual Machine Generation ID Device
> > + *
> > + *  Copyright (C) 2016 Red Hat Inc.
> > + *
> > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > + *           Igor Mammedov <imammedo@redhat.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "hw/i386/pc.h"
> > +#include "hw/pci/pci.h"
> > +#include "hw/misc/vmgenid.h"
> > +#include "hw/acpi/acpi.h"
> > +#include "qapi/visitor.h"
> > +
> > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> > +
> > +typedef struct VmGenIdState {
> > +    PCIDevice parent_obj;
> > +    MemoryRegion iomem;
> > +    union {
> > +        uint8_t guid[16];
> > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > +    };
> > +    bool guid_set;
> > +} VmGenIdState;
> > +
> > +Object *find_vmgneid_dev(Error **errp)
> > +{
> > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> > +    if (!obj) {
> > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > +    }
> > +    return obj;
> > +}
> > +
> > +static void vmgenid_update_guest(VmGenIdState *s)
> > +{
> > +    Object *acpi_obj;
> > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > +
> > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > +
> > +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> > +    if (acpi_obj) {
> > +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > +
> > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > +    }
> > +}
> > +
> > +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> > +{
> > +    VmGenIdState *s = VMGENID(obj);
> > +
> > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > +        error_setg(errp, "'%s." VMGENID_GUID
> > +                   "': Failed to parse GUID string: %s",
> > +                   object_get_typename(OBJECT(s)),
> > +                   value);
> > +        return;
> > +    }
> > +
> > +    s->guid_set = true;
> > +    vmgenid_update_guest(s);
> > +}
> > +
> > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> > +                                   const char *name, Error **errp)
> > +{
> > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > +
> > +    if (value == PCI_BAR_UNMAPPED) {
> > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> > +                   object_get_typename(OBJECT(obj)));
> > +        return;
> > +    }
> > +    visit_type_int(v, &value, name, errp);
> > +}
> > +
> > +static void vmgenid_initfn(Object *obj)
> > +{
> > +    VmGenIdState *s = VMGENID(obj);
> > +
> > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> > +                           &error_abort);
> > +
> > +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> > +}
> > +
> > +
> > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > +{
> > +    VmGenIdState *s = VMGENID(dev);
> > +    bool ambiguous = false;
> > +
> > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > +    if (ambiguous) {
> > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > +                         " device is permitted");
> > +        return;
> > +    }
> > +
> > +    if (!s->guid_set) {
> > +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> > +                   object_get_typename(OBJECT(s)));
> > +        return;
> > +    }
> > +
> > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> > +        &s->iomem);
> > +    return;
> > +}
> > +
> > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > +{
> > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > +
> > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > +    dc->hotpluggable = false;
> > +    k->realize = vmgenid_realize;
> > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > +}
> > +
> > +static const TypeInfo vmgenid_device_info = {
> > +    .name          = VMGENID_DEVICE,
> > +    .parent        = TYPE_PCI_DEVICE,
> > +    .instance_size = sizeof(VmGenIdState),
> > +    .instance_init = vmgenid_initfn,
> > +    .class_init    = vmgenid_class_init,
> > +};
> > +
> > +static void vmgenid_register_types(void)
> > +{
> > +    type_register_static(&vmgenid_device_info);
> > +}
> > +
> > +type_init(vmgenid_register_types)
> > diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> > new file mode 100644
> > index 0000000..b90882c
> > --- /dev/null
> > +++ b/include/hw/misc/vmgenid.h
> > @@ -0,0 +1,27 @@
> > +/*
> > + *  Virtual Machine Generation ID Device
> > + *
> > + *  Copyright (C) 2016 Red Hat Inc.
> > + *
> > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > + *           Igor Mammedov <imammedo@redhat.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef HW_MISC_VMGENID_H
> > +#define HW_MISC_VMGENID_H
> > +
> > +#include "qom/object.h"
> > +
> > +#define VMGENID_DEVICE           "vmgenid"
> > +#define VMGENID_GUID             "guid"
> > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > +#define VMGENID_VMGID_BUF_BAR    0
> > +
> > +Object *find_vmgneid_dev(Error **errp);
> > +
> > +#endif
> > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > index dedf277..f4c9d48 100644
> > --- a/include/hw/pci/pci.h
> > +++ b/include/hw/pci/pci.h
> > @@ -94,6 +94,7 @@
> >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> >  
> >  #define FMT_PCIBUS                      PRIx64
> > -- 
> > 1.8.3.1  

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-01-28 12:03     ` Igor Mammedov
@ 2016-01-28 12:59       ` Michael S. Tsirkin
  2016-01-29 11:13         ` Igor Mammedov
  0 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-01-28 12:59 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: ehabkost, ghammer, lersek, qemu-devel, lcapitulino

On Thu, Jan 28, 2016 at 01:03:16PM +0100, Igor Mammedov wrote:
> On Thu, 28 Jan 2016 13:13:04 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:
> > > Based on Microsoft's specifications (paper can be
> > > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > > easily found by "Virtual Machine Generation ID" keywords),
> > > add a PCI device with corresponding description in
> > > SSDT ACPI table.
> > > 
> > > The GUID is set using "vmgenid.guid" property or
> > > a corresponding HMP/QMP command.
> > > 
> > > Example of using vmgenid device:
> > >  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > > 
> > > 'vmgenid' device initialization flow is as follows:
> > >  1. vmgenid has a RAM BAR registered with the size of the GUID buffer
> > >  2. BIOS initializes PCI devices and maps the BAR in the PCI hole
> > >  3. BIOS reads ACPI tables from QEMU; at that moment the tables
> > >     are generated with the \_SB.VMGI.ADDR constant pointing to
> > >     the GPA where the BIOS mapped vmgenid's BAR earlier
> > > 
> > > Note:
> > > This implementation uses the PCI class code 0x0500 for the vmgenid
> > > device, which is marked as NO_DRV in Windows's machine.inf.
> > > Testing various Windows versions showed that the OS
> > > neither touches nor checks for resource conflicts
> > > for such PCI devices.
> > > There was a concern that during PCI rebalancing, the OS
> > > could reprogram the BAR to another place, which would
> > > leave VGEN.ADDR pointing to the old (no longer valid)
> > > address.
> > > However, testing showed that Windows rebalances only
> > > PCI devices that have a driver attached and completely
> > > ignores NO_DRV class devices.
> > > This in turn creates a problem where the OS could remap
> > > a PCI device (with a driver) over the BAR used by
> > > a driver-less PCI device.
> > > Statically declaring the used memory range as VGEN._CRS
> > > makes the OS honor the resource reservation, so the ignored
> > > BAR range is no longer touched during PCI rebalancing.
> > > 
> > > Signed-off-by: Gal Hammer <ghammer@redhat.com>
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> > 
> > It's an interesting hack, but this needs some thought. BIOS has no idea
> > this BAR is special and can not be rebalanced, so it might put the BAR
> > in the middle of the range, in effect fragmenting it.
> yep that's the only drawback in PCI approach.
> 
> > Really I think something like V12 just rewritten using the new APIs
> > (probably with something like build_append_named_dword that I suggested)
> > would be a much simpler way to implement this device, given
> > the weird API limitations.
> We went over the drawbacks of both approaches several times,
> and that's where I strongly disagree with using the v12 AML
> patching approach, for the reasons stated in those discussions.

Yes, IIRC you dislike the need to allocate an IO range to pass the address
to the host, and to have custom code to migrate the address.

> > And hey, if you want to use a pci device to pass the physical
> > address guest to host, instead of reserving
> > a couple of IO addresses, sure, stick it in pci config in
> > a vendor-specific capability, this way it'll get migrated
> > automatically.
> Could you elaborate more on this suggestion?

I really just mean using PCI_Config operation region.
If you wish, I'll try to post a prototype next week.
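
To make the suggestion concrete — purely as an illustration with a
hypothetical byte layout, not QEMU's actual capability API or the promised
prototype — a vendor-specific PCI capability carrying the 64-bit buffer
address could be packed and unpacked roughly like this:

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CAP_ID_VNDR 0x09 /* vendor-specific capability ID (PCI spec) */

/* Hypothetical layout: [cap id][next ptr][len][reserved][addr: 8 bytes LE] */
enum { VGID_CAP_LEN = 12, VGID_CAP_ADDR_OFF = 4 };

/* Store the buffer's guest physical address into the capability body. */
static void vgid_cap_write(uint8_t *cap, uint64_t addr)
{
    int i;

    cap[0] = PCI_CAP_ID_VNDR;
    cap[1] = 0;            /* next-capability pointer, filled in by core */
    cap[2] = VGID_CAP_LEN; /* capability length */
    cap[3] = 0;            /* reserved */
    for (i = 0; i < 8; i++) {
        cap[VGID_CAP_ADDR_OFF + i] = (uint8_t)(addr >> (8 * i));
    }
}

/* Read the address back, byte by byte, little-endian. */
static uint64_t vgid_cap_read(const uint8_t *cap)
{
    uint64_t addr = 0;
    int i;

    for (i = 0; i < 8; i++) {
        addr |= (uint64_t)cap[VGID_CAP_ADDR_OFF + i] << (8 * i);
    }
    return addr;
}
```

Since config space is migrated as part of the PCI device's state, an address
stored this way would travel with the device across migration with no extra
vmstate code — which is the benefit being pointed at above.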

> > 
> > 
> > > ---
> > > changes since 17:
> > >   - small fixups suggested in v14 review by Michael S. Tsirkin" <mst@redhat.com>
> > >   - make BAR prefetchable to make region cached as per MS spec
> > >   - s/uuid/guid/ to match spec
> > > changes since 14:
> > >   - reserve BAR resources so that Windows won't touch it
> > >     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
> > >   - ACPI: split VGEN device of PCI device descriptor
> > >     and place it at PCI0 scope, so that won't be need trace its
> > >     location on PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
> > >   - permit only one vmgenid to be created
> > >   - enable BAR be mapped above 4Gb if it can't be mapped at low mem
> > > ---
> > >  default-configs/i386-softmmu.mak   |   1 +
> > >  default-configs/x86_64-softmmu.mak |   1 +
> > >  docs/specs/pci-ids.txt             |   1 +
> > >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> > >  hw/misc/Makefile.objs              |   1 +
> > >  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
> > >  include/hw/misc/vmgenid.h          |  27 +++++++
> > >  include/hw/pci/pci.h               |   1 +
> > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > >  create mode 100644 hw/misc/vmgenid.c
> > >  create mode 100644 include/hw/misc/vmgenid.h
> > > 
> > > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > > index b177e52..6402439 100644
> > > --- a/default-configs/i386-softmmu.mak
> > > +++ b/default-configs/i386-softmmu.mak
> > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > >  CONFIG_IOAPIC=y
> > >  CONFIG_PVPANIC=y
> > >  CONFIG_MEM_HOTPLUG=y
> > > +CONFIG_VMGENID=y
> > >  CONFIG_NVDIMM=y
> > >  CONFIG_ACPI_NVDIMM=y
> > >  CONFIG_XIO3130=y
> > > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > > index 6e3b312..fdac18f 100644
> > > --- a/default-configs/x86_64-softmmu.mak
> > > +++ b/default-configs/x86_64-softmmu.mak
> > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > >  CONFIG_IOAPIC=y
> > >  CONFIG_PVPANIC=y
> > >  CONFIG_MEM_HOTPLUG=y
> > > +CONFIG_VMGENID=y
> > >  CONFIG_NVDIMM=y
> > >  CONFIG_ACPI_NVDIMM=y
> > >  CONFIG_XIO3130=y
> > > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > > index 0adcb89..e65ecf9 100644
> > > --- a/docs/specs/pci-ids.txt
> > > +++ b/docs/specs/pci-ids.txt
> > > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> > >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> > >  1b36:0006  PCI Rocker Ethernet switch device
> > >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > > +1b36:0009  PCI VM-Generation device
> > >  1b36:000a  PCI-PCI bridge (multiseat)
> > >  
> > >  All these devices are documented in docs/specs.
> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > index 78758e2..0187262 100644
> > > --- a/hw/i386/acpi-build.c
> > > +++ b/hw/i386/acpi-build.c
> > > @@ -44,6 +44,7 @@
> > >  #include "hw/acpi/tpm.h"
> > >  #include "sysemu/tpm_backend.h"
> > >  #include "hw/timer/mc146818rtc_regs.h"
> > > +#include "hw/misc/vmgenid.h"
> > >  
> > >  /* Supported chipsets: */
> > >  #include "hw/acpi/piix4.h"
> > > @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> > >      info->applesmc_io_base = applesmc_port();
> > >  }
> > >  
> > > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > > +{
> > > +    Aml *dev, *pkg, *crs;
> > > +
> > > +    dev = aml_device("VGEN");
> > > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> > > +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> > > +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> > > +
> > > +    pkg = aml_package(2);
> > > +    /* low 32 bits of UUID buffer addr */
> > > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > > +    /* high 32 bits of UUID buffer addr */
> > > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > > +
> > > +    /*
> > > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> > > +     * displays it as "PCI RAM controller" which is marked as NO_DRV
> > > +     * so Windows ignores VMGEN device completely and doesn't check
> > > +     * for resource conflicts which during PCI rebalancing can lead
> > > +     * to another PCI device claiming ignored BARs. To prevent this
> > > +     * statically reserve resources used by VM_Gen_Counter.
> > > +     * For more verbose comment see this commit message.  
> > 
> > What does "this commit message" mean?
> the above commit message. Should I reword it to just 'see commit message'?
> 
> > 
> > > +     */
> > > +     crs = aml_resource_template();
> > > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> > > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> > > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> > > +                VMGENID_VMGID_BUF_SIZE));
> > > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > > +     return dev;
> > > +}
> > > +
> > >  /*
> > >   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
> > >   * On i386 arch we only have two pci hosts, so we can look only for them.
> > > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > >              }
> > >  
> > >              if (bus) {
> > > +                Object *vmgen;
> > >                  Aml *scope = aml_scope("PCI0");
> > >                  /* Scan all PCI buses. Generate tables to support hotplug. */
> > >                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> > > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
> > >                      aml_append(scope, dev);
> > >                  }
> > >  
> > > +                vmgen = find_vmgneid_dev(NULL);
> > > +                if (vmgen) {
> > > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > > +                    uint64_t buf_paddr =
> > > +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> > > +
> > > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > > +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> > > +
> > > +                        method = aml_method("\\_GPE._E00", 0,
> > > +                                            AML_NOTSERIALIZED);
> > > +                        aml_append(method,
> > > +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > > +                                       aml_int(0x80)));
> > > +                        aml_append(ssdt, method);
> > > +                    }
> > > +                }
> > > +
> > >                  aml_append(sb_scope, scope);
> > >              }
> > >          }
> > > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
> > >      {
> > >          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
> > >  
> > > -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> > > -
> > >          if (misc->is_piix4) {
> > >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> > >              aml_append(method,
> > > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > > index d4765c2..1f05edd 100644
> > > --- a/hw/misc/Makefile.objs
> > > +++ b/hw/misc/Makefile.objs
> > > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> > >  
> > >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> > >  obj-$(CONFIG_EDU) += edu.o
> > > +obj-$(CONFIG_VMGENID) += vmgenid.o
> > >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > > new file mode 100644
> > > index 0000000..a2fbdfc
> > > --- /dev/null
> > > +++ b/hw/misc/vmgenid.c
> > > @@ -0,0 +1,154 @@
> > > +/*
> > > + *  Virtual Machine Generation ID Device
> > > + *
> > > + *  Copyright (C) 2016 Red Hat Inc.
> > > + *
> > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > + *           Igor Mammedov <imammedo@redhat.com>
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > + * See the COPYING file in the top-level directory.
> > > + *
> > > + */
> > > +
> > > +#include "hw/i386/pc.h"
> > > +#include "hw/pci/pci.h"
> > > +#include "hw/misc/vmgenid.h"
> > > +#include "hw/acpi/acpi.h"
> > > +#include "qapi/visitor.h"
> > > +
> > > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> > > +
> > > +typedef struct VmGenIdState {
> > > +    PCIDevice parent_obj;
> > > +    MemoryRegion iomem;
> > > +    union {
> > > +        uint8_t guid[16];
> > > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > > +    };
> > > +    bool guid_set;
> > > +} VmGenIdState;
> > > +
> > > +Object *find_vmgneid_dev(Error **errp)
> > > +{
> > > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> > > +    if (!obj) {
> > > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > > +    }
> > > +    return obj;
> > > +}
> > > +
> > > +static void vmgenid_update_guest(VmGenIdState *s)
> > > +{
> > > +    Object *acpi_obj;
> > > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > > +
> > > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > > +
> > > +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> > > +    if (acpi_obj) {
> > > +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > > +
> > > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > > +    }
> > > +}
> > > +
> > > +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> > > +{
> > > +    VmGenIdState *s = VMGENID(obj);
> > > +
> > > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > > +        error_setg(errp, "'%s." VMGENID_GUID
> > > +                   "': Failed to parse GUID string: %s",
> > > +                   object_get_typename(OBJECT(s)),
> > > +                   value);
> > > +        return;
> > > +    }
> > > +
> > > +    s->guid_set = true;
> > > +    vmgenid_update_guest(s);
> > > +}
> > > +
> > > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> > > +                                   const char *name, Error **errp)
> > > +{
> > > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > > +
> > > +    if (value == PCI_BAR_UNMAPPED) {
> > > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> > > +                   object_get_typename(OBJECT(obj)));
> > > +        return;
> > > +    }
> > > +    visit_type_int(v, &value, name, errp);
> > > +}
> > > +
> > > +static void vmgenid_initfn(Object *obj)
> > > +{
> > > +    VmGenIdState *s = VMGENID(obj);
> > > +
> > > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> > > +                           &error_abort);
> > > +
> > > +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> > > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > > +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> > > +}
> > > +
> > > +
> > > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > > +{
> > > +    VmGenIdState *s = VMGENID(dev);
> > > +    bool ambiguous = false;
> > > +
> > > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > > +    if (ambiguous) {
> > > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > > +                         " device is permitted");
> > > +        return;
> > > +    }
> > > +
> > > +    if (!s->guid_set) {
> > > +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> > > +                   object_get_typename(OBJECT(s)));
> > > +        return;
> > > +    }
> > > +
> > > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > > +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> > > +        &s->iomem);
> > > +    return;
> > > +}
> > > +
> > > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > > +{
> > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > +
> > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > +    dc->hotpluggable = false;
> > > +    k->realize = vmgenid_realize;
> > > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > > +}
> > > +
> > > +static const TypeInfo vmgenid_device_info = {
> > > +    .name          = VMGENID_DEVICE,
> > > +    .parent        = TYPE_PCI_DEVICE,
> > > +    .instance_size = sizeof(VmGenIdState),
> > > +    .instance_init = vmgenid_initfn,
> > > +    .class_init    = vmgenid_class_init,
> > > +};
> > > +
> > > +static void vmgenid_register_types(void)
> > > +{
> > > +    type_register_static(&vmgenid_device_info);
> > > +}
> > > +
> > > +type_init(vmgenid_register_types)
> > > diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> > > new file mode 100644
> > > index 0000000..b90882c
> > > --- /dev/null
> > > +++ b/include/hw/misc/vmgenid.h
> > > @@ -0,0 +1,27 @@
> > > +/*
> > > + *  Virtual Machine Generation ID Device
> > > + *
> > > + *  Copyright (C) 2016 Red Hat Inc.
> > > + *
> > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > + *           Igor Mammedov <imammedo@redhat.com>
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > + * See the COPYING file in the top-level directory.
> > > + *
> > > + */
> > > +
> > > +#ifndef HW_MISC_VMGENID_H
> > > +#define HW_MISC_VMGENID_H
> > > +
> > > +#include "qom/object.h"
> > > +
> > > +#define VMGENID_DEVICE           "vmgenid"
> > > +#define VMGENID_GUID             "guid"
> > > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > > +#define VMGENID_VMGID_BUF_BAR    0
> > > +
> > > +Object *find_vmgneid_dev(Error **errp);
> > > +
> > > +#endif
> > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > index dedf277..f4c9d48 100644
> > > --- a/include/hw/pci/pci.h
> > > +++ b/include/hw/pci/pci.h
> > > @@ -94,6 +94,7 @@
> > >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> > >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> > >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> > >  
> > >  #define FMT_PCIBUS                      PRIx64
> > > -- 
> > > 1.8.3.1  

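As a side note on the quoted build_vmgenid_device(): the ADDR package stores
the BAR's guest physical address as two 32-bit package elements (low word,
high word). The split and its inverse are plain shifts and masks — a
standalone sketch for checking the arithmetic, not code from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit GPA the way the quoted build_vmgenid_device() fills the
 * two-element ADDR package: element 0 = low 32 bits, element 1 = high 32. */
static void addr_split(uint64_t paddr, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)(paddr & 0xFFFFFFFFULL);
    *hi = (uint32_t)(paddr >> 32);
}

/* Inverse operation: what a consumer of ADDR would do to rebuild the GPA. */
static uint64_t addr_join(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```

The round trip matters precisely because the series now allows the BAR to be
mapped above 4 GiB, where the high word is nonzero.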

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-01-28 11:13   ` Michael S. Tsirkin
  2016-01-28 12:03     ` Igor Mammedov
@ 2016-01-28 13:48     ` Laszlo Ersek
  1 sibling, 0 replies; 59+ messages in thread
From: Laszlo Ersek @ 2016-01-28 13:48 UTC (permalink / raw)
  To: Michael S. Tsirkin, Igor Mammedov
  Cc: ghammer, lcapitulino, qemu-devel, ehabkost

On 01/28/16 12:13, Michael S. Tsirkin wrote:
> On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:
>> Based on Microsoft's specifications (paper can be
>> downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
>> easily found by "Virtual Machine Generation ID" keywords),
>> add a PCI device with corresponding description in
>> SSDT ACPI table.
>>
>> The GUID is set using "vmgenid.guid" property or
>> a corresponding HMP/QMP command.
>>
>> Example of using vmgenid device:
>>  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
>>
>> 'vmgenid' device initialization flow is as follows:
>>  1. vmgenid has a RAM BAR registered with the size of the GUID buffer
>>  2. BIOS initializes PCI devices and maps the BAR in the PCI hole
>>  3. BIOS reads ACPI tables from QEMU; at that moment the tables
>>     are generated with the \_SB.VMGI.ADDR constant pointing to
>>     the GPA where the BIOS mapped vmgenid's BAR earlier
>>
>> Note:
>> This implementation uses the PCI class code 0x0500 for the vmgenid
>> device, which is marked as NO_DRV in Windows's machine.inf.
>> Testing various Windows versions showed that the OS
>> neither touches nor checks for resource conflicts
>> for such PCI devices.
>> There was a concern that during PCI rebalancing, the OS
>> could reprogram the BAR to another place, which would
>> leave VGEN.ADDR pointing to the old (no longer valid)
>> address.
>> However, testing showed that Windows rebalances only
>> PCI devices that have a driver attached and completely
>> ignores NO_DRV class devices.
>> This in turn creates a problem where the OS could remap
>> a PCI device (with a driver) over the BAR used by
>> a driver-less PCI device.
>> Statically declaring the used memory range as VGEN._CRS
>> makes the OS honor the resource reservation, so the ignored
>> BAR range is no longer touched during PCI rebalancing.
>>
>> Signed-off-by: Gal Hammer <ghammer@redhat.com>
>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> 
> It's an interesting hack, but this needs some thought. BIOS has no idea
> this BAR is special and can not be rebalanced, so it might put the BAR
> in the middle of the range, in effect fragmenting it.
> 
> Really I think something like V12 just rewritten using the new APIs
> (probably with something like build_append_named_dword that I suggested)
> would be a much simpler way to implement this device, given
> the weird API limitations.
> 
> And hey, if you want to use a pci device to pass the physical
> address guest to host, instead of reserving
> a couple of IO addresses, sure, stick it in pci config in
> a vendor-specific capability, this way it'll get migrated
> automatically.

Looks like the bottleneck for this feature is that we can't agree on the
design.

I thought that the ACPI approach was a nice idea; the DataTableRegion
trick made me enthusiastic. Since DataTableRegion won't work, I feel
that the ACPI approach has devolved into too many layers of manual
indirections. (If I recall right, I posted one of the recent updates on
that, with some diagrams).

I think that this investigation has been beaten to death; now we should
just do the simplest thing that works. I think Igor's PCI approach is
simpler, and may be more easily debuggable, than the one ACPI approach
that could *actually* be made work.

I believe it wasn't my original priority about this feature, but at v19,
simplicity of design & implementation looks like the most important
trait. At this point even the fixed MMIO range near the LAPIC range
(whose relocation we don't support anyway, BTW) could be preferable.

Anyway, I've said my piece, I'm outta here. I hope you guys can reach an
agreement.

Thanks
Laszlo

> 
> 
>> ---
>> changes since 17:
>>   - small fixups suggested in v14 review by Michael S. Tsirkin" <mst@redhat.com>
>>   - make BAR prefetchable to make region cached as per MS spec
>>   - s/uuid/guid/ to match spec
>> changes since 14:
>>   - reserve BAR resources so that Windows won't touch it
>>     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
>>   - ACPI: split VGEN device of PCI device descriptor
>>     and place it at PCI0 scope, so that won't be need trace its
>>     location on PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
>>   - permit only one vmgenid to be created
>>   - enable BAR be mapped above 4Gb if it can't be mapped at low mem
>> ---
>>  default-configs/i386-softmmu.mak   |   1 +
>>  default-configs/x86_64-softmmu.mak |   1 +
>>  docs/specs/pci-ids.txt             |   1 +
>>  hw/i386/acpi-build.c               |  56 +++++++++++++-
>>  hw/misc/Makefile.objs              |   1 +
>>  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
>>  include/hw/misc/vmgenid.h          |  27 +++++++
>>  include/hw/pci/pci.h               |   1 +
>>  8 files changed, 240 insertions(+), 2 deletions(-)
>>  create mode 100644 hw/misc/vmgenid.c
>>  create mode 100644 include/hw/misc/vmgenid.h
>>
>> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
>> index b177e52..6402439 100644
>> --- a/default-configs/i386-softmmu.mak
>> +++ b/default-configs/i386-softmmu.mak
>> @@ -51,6 +51,7 @@ CONFIG_APIC=y
>>  CONFIG_IOAPIC=y
>>  CONFIG_PVPANIC=y
>>  CONFIG_MEM_HOTPLUG=y
>> +CONFIG_VMGENID=y
>>  CONFIG_NVDIMM=y
>>  CONFIG_ACPI_NVDIMM=y
>>  CONFIG_XIO3130=y
>> diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
>> index 6e3b312..fdac18f 100644
>> --- a/default-configs/x86_64-softmmu.mak
>> +++ b/default-configs/x86_64-softmmu.mak
>> @@ -51,6 +51,7 @@ CONFIG_APIC=y
>>  CONFIG_IOAPIC=y
>>  CONFIG_PVPANIC=y
>>  CONFIG_MEM_HOTPLUG=y
>> +CONFIG_VMGENID=y
>>  CONFIG_NVDIMM=y
>>  CONFIG_ACPI_NVDIMM=y
>>  CONFIG_XIO3130=y
>> diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
>> index 0adcb89..e65ecf9 100644
>> --- a/docs/specs/pci-ids.txt
>> +++ b/docs/specs/pci-ids.txt
>> @@ -47,6 +47,7 @@ PCI devices (other than virtio):
>>  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
>>  1b36:0006  PCI Rocker Ethernet switch device
>>  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
>> +1b36:0009  PCI VM-Generation device
>>  1b36:000a  PCI-PCI bridge (multiseat)
>>  
>>  All these devices are documented in docs/specs.
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index 78758e2..0187262 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -44,6 +44,7 @@
>>  #include "hw/acpi/tpm.h"
>>  #include "sysemu/tpm_backend.h"
>>  #include "hw/timer/mc146818rtc_regs.h"
>> +#include "hw/misc/vmgenid.h"
>>  
>>  /* Supported chipsets: */
>>  #include "hw/acpi/piix4.h"
>> @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
>>      info->applesmc_io_base = applesmc_port();
>>  }
>>  
>> +static Aml *build_vmgenid_device(uint64_t buf_paddr)
>> +{
>> +    Aml *dev, *pkg, *crs;
>> +
>> +    dev = aml_device("VGEN");
>> +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
>> +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
>> +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
>> +
>> +    pkg = aml_package(2);
>> +    /* low 32 bits of UUID buffer addr */
>> +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
>> +    /* high 32 bits of UUID buffer addr */
>> +    aml_append(pkg, aml_int(buf_paddr >> 32));
>> +    aml_append(dev, aml_name_decl("ADDR", pkg));
>> +
>> +    /*
>> +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
>> +     * displays it as "PCI RAM controller" which is marked as NO_DRV
>> +     * so Windows ignores VMGEN device completely and doesn't check
>> +     * for resource conflicts which during PCI rebalancing can lead
>> +     * to another PCI device claiming ignored BARs. To prevent this
>> +     * statically reserve resources used by VM_Gen_Counter.
>> +     * For more verbose comment see this commit message.
> 
> What does "this commit message" mean?
> 
>> +     */
>> +     crs = aml_resource_template();
>> +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
>> +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
>> +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
>> +                VMGENID_VMGID_BUF_SIZE));
>> +     aml_append(dev, aml_name_decl("_CRS", crs));
>> +     return dev;
>> +}
>> +
>>  /*
>>   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
>>   * On i386 arch we only have two pci hosts, so we can look only for them.
>> @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
>>              }
>>  
>>              if (bus) {
>> +                Object *vmgen;
>>                  Aml *scope = aml_scope("PCI0");
>>                  /* Scan all PCI buses. Generate tables to support hotplug. */
>>                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
>> @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
>>                      aml_append(scope, dev);
>>                  }
>>  
>> +                vmgen = find_vmgneid_dev(NULL);
>> +                if (vmgen) {
>> +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
>> +                    uint64_t buf_paddr =
>> +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
>> +
>> +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
>> +                        aml_append(scope, build_vmgenid_device(buf_paddr));
>> +
>> +                        method = aml_method("\\_GPE._E00", 0,
>> +                                            AML_NOTSERIALIZED);
>> +                        aml_append(method,
>> +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
>> +                                       aml_int(0x80)));
>> +                        aml_append(ssdt, method);
>> +                    }
>> +                }
>> +
>>                  aml_append(sb_scope, scope);
>>              }
>>          }
>> @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
>>      {
>>          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
>>  
>> -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
>> -
>>          if (misc->is_piix4) {
>>              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
>>              aml_append(method,
>> diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
>> index d4765c2..1f05edd 100644
>> --- a/hw/misc/Makefile.objs
>> +++ b/hw/misc/Makefile.objs
>> @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
>>  
>>  obj-$(CONFIG_PVPANIC) += pvpanic.o
>>  obj-$(CONFIG_EDU) += edu.o
>> +obj-$(CONFIG_VMGENID) += vmgenid.o
>>  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
>> diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
>> new file mode 100644
>> index 0000000..a2fbdfc
>> --- /dev/null
>> +++ b/hw/misc/vmgenid.c
>> @@ -0,0 +1,154 @@
>> +/*
>> + *  Virtual Machine Generation ID Device
>> + *
>> + *  Copyright (C) 2016 Red Hat Inc.
>> + *
>> + *  Authors: Gal Hammer <ghammer@redhat.com>
>> + *           Igor Mammedov <imammedo@redhat.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "hw/i386/pc.h"
>> +#include "hw/pci/pci.h"
>> +#include "hw/misc/vmgenid.h"
>> +#include "hw/acpi/acpi.h"
>> +#include "qapi/visitor.h"
>> +
>> +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
>> +
>> +typedef struct VmGenIdState {
>> +    PCIDevice parent_obj;
>> +    MemoryRegion iomem;
>> +    union {
>> +        uint8_t guid[16];
>> +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
>> +    };
>> +    bool guid_set;
>> +} VmGenIdState;
>> +
>> +Object *find_vmgneid_dev(Error **errp)
>> +{
>> +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
>> +    if (!obj) {
>> +        error_setg(errp, VMGENID_DEVICE " is not found");
>> +    }
>> +    return obj;
>> +}
>> +
>> +static void vmgenid_update_guest(VmGenIdState *s)
>> +{
>> +    Object *acpi_obj;
>> +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
>> +
>> +    memcpy(ptr, &s->guid, sizeof(s->guid));
>> +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
>> +
>> +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
>> +    if (acpi_obj) {
>> +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
>> +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
>> +        ACPIREGS *acpi_regs = adevc->regs(adev);
>> +
>> +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
>> +        acpi_update_sci(acpi_regs, adevc->sci(adev));
>> +    }
>> +}
>> +
>> +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
>> +{
>> +    VmGenIdState *s = VMGENID(obj);
>> +
>> +    if (qemu_uuid_parse(value, s->guid) < 0) {
>> +        error_setg(errp, "'%s." VMGENID_GUID
>> +                   "': Failed to parse GUID string: %s",
>> +                   object_get_typename(OBJECT(s)),
>> +                   value);
>> +        return;
>> +    }
>> +
>> +    s->guid_set = true;
>> +    vmgenid_update_guest(s);
>> +}
>> +
>> +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
>> +                                   const char *name, Error **errp)
>> +{
>> +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
>> +
>> +    if (value == PCI_BAR_UNMAPPED) {
>> +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
>> +                   object_get_typename(OBJECT(obj)));
>> +        return;
>> +    }
>> +    visit_type_int(v, &value, name, errp);
>> +}
>> +
>> +static void vmgenid_initfn(Object *obj)
>> +{
>> +    VmGenIdState *s = VMGENID(obj);
>> +
>> +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
>> +                           &error_abort);
>> +
>> +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
>> +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
>> +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
>> +}
>> +
>> +
>> +static void vmgenid_realize(PCIDevice *dev, Error **errp)
>> +{
>> +    VmGenIdState *s = VMGENID(dev);
>> +    bool ambiguous = false;
>> +
>> +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
>> +    if (ambiguous) {
>> +        error_setg(errp, "no more than one " VMGENID_DEVICE
>> +                         " device is permitted");
>> +        return;
>> +    }
>> +
>> +    if (!s->guid_set) {
>> +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
>> +                   object_get_typename(OBJECT(s)));
>> +        return;
>> +    }
>> +
>> +    vmstate_register_ram(&s->iomem, DEVICE(s));
>> +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
>> +        PCI_BASE_ADDRESS_MEM_PREFETCH |
>> +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
>> +        &s->iomem);
>> +    return;
>> +}
>> +
>> +static void vmgenid_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>> +
>> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>> +    dc->hotpluggable = false;
>> +    k->realize = vmgenid_realize;
>> +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
>> +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
>> +    k->class_id = PCI_CLASS_MEMORY_RAM;
>> +}
>> +
>> +static const TypeInfo vmgenid_device_info = {
>> +    .name          = VMGENID_DEVICE,
>> +    .parent        = TYPE_PCI_DEVICE,
>> +    .instance_size = sizeof(VmGenIdState),
>> +    .instance_init = vmgenid_initfn,
>> +    .class_init    = vmgenid_class_init,
>> +};
>> +
>> +static void vmgenid_register_types(void)
>> +{
>> +    type_register_static(&vmgenid_device_info);
>> +}
>> +
>> +type_init(vmgenid_register_types)
>> diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
>> new file mode 100644
>> index 0000000..b90882c
>> --- /dev/null
>> +++ b/include/hw/misc/vmgenid.h
>> @@ -0,0 +1,27 @@
>> +/*
>> + *  Virtual Machine Generation ID Device
>> + *
>> + *  Copyright (C) 2016 Red Hat Inc.
>> + *
>> + *  Authors: Gal Hammer <ghammer@redhat.com>
>> + *           Igor Mammedov <imammedo@redhat.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#ifndef HW_MISC_VMGENID_H
>> +#define HW_MISC_VMGENID_H
>> +
>> +#include "qom/object.h"
>> +
>> +#define VMGENID_DEVICE           "vmgenid"
>> +#define VMGENID_GUID             "guid"
>> +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
>> +#define VMGENID_VMGID_BUF_SIZE   0x1000
>> +#define VMGENID_VMGID_BUF_BAR    0
>> +
>> +Object *find_vmgneid_dev(Error **errp);
>> +
>> +#endif
>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> index dedf277..f4c9d48 100644
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -94,6 +94,7 @@
>>  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
>>  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
>>  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
>> +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
>>  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
>>  
>>  #define FMT_PCIBUS                      PRIx64
>> -- 
>> 1.8.3.1

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 7/9] machine: add properties to compat_props incrementally
  2016-01-28 10:58 ` [Qemu-devel] [PATCH v19 7/9] machine: add properties to compat_props incrementally Igor Mammedov
@ 2016-01-28 14:02   ` Eduardo Habkost
  2016-01-28 17:00     ` Igor Mammedov
  2016-01-29 12:51   ` [Qemu-devel] [PATCH v19 7/9] machine: add properties to compat_props incrementally Cornelia Huck
  1 sibling, 1 reply; 59+ messages in thread
From: Eduardo Habkost @ 2016-01-28 14:02 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: agraf, mst, ghammer, lersek, qemu-devel, lcapitulino,
	borntraeger, qemu-ppc, cornelia.huck, pbonzini, rth, david

On Thu, Jan 28, 2016 at 11:58:08AM +0100, Igor Mammedov wrote:
> Switch to adding compat properties incrementally instead of
> completely overwriting compat_props per machine type.
> That removes the data duplication which we have due to the nested
> [PC|SPAPR]_COMPAT_* macros.
> 
> It also allows setting default device properties from
> the default foo_machine_options() hook, which will be used
> in a following patch for putting the VMGENID device as
> a function of the ISA bridge on pc/q35 machines.
> 
> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> 

Very nice. The only suggestion I have is to use the simpler GList
type, instead of GArray.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

> compat_props GArray

I assume this line was left here by mistake?

> ---
>  hw/core/machine.c          | 10 ++++++++++
>  hw/ppc/spapr.c             |  3 ---
>  hw/s390x/s390-virtio-ccw.c | 12 ++----------
>  include/hw/boards.h        | 11 +++++++++--
>  include/hw/i386/pc.h       |  9 ---------
>  vl.c                       |  6 +++++-
>  6 files changed, 26 insertions(+), 25 deletions(-)
[...]

-- 
Eduardo

^ permalink raw reply	[flat|nested] 59+ messages in thread
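The incremental compat_props scheme reviewed above can be modeled in a few lines. This is an illustrative Python sketch, not QEMU's actual API (class and property names are made up): each versioned machine starts from its parent's accumulated compat_props and appends only its own deltas, so no nested compat macros are needed.

```python
class MachineClass:
    """Toy model of incremental compat_props accumulation."""

    def __init__(self, parent=None):
        # Start from the parent's accumulated props instead of
        # overwriting compat_props wholesale per machine type.
        self.compat_props = list(parent.compat_props) if parent else []

    def add_compat_props(self, props):
        # Each machine version registers only the deltas it introduces.
        self.compat_props.extend(props)


# A newer machine version sets a default device property...
pc_2_6 = MachineClass()
pc_2_6.add_compat_props([("vmgenid", "use-isa-bridge", "on")])

# ...and an older version inherits it and appends its own compat
# entries, accumulating without duplication.
pc_2_5 = MachineClass(parent=pc_2_6)
pc_2_5.add_compat_props([("virtio-pci", "disable-modern", "on")])
```

The design point mirrors the patch: older machine types accumulate the union of all deltas registered along the chain, while newer ones stay minimal.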

* Re: [Qemu-devel] [PATCH v19 7/9] machine: add properties to compat_props incrementally
  2016-01-28 14:02   ` Eduardo Habkost
@ 2016-01-28 17:00     ` Igor Mammedov
  2016-02-03 17:55       ` [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementally) Eduardo Habkost
  0 siblings, 1 reply; 59+ messages in thread
From: Igor Mammedov @ 2016-01-28 17:00 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: agraf, mst, ghammer, lersek, qemu-devel, lcapitulino,
	borntraeger, qemu-ppc, cornelia.huck, pbonzini, rth, david

On Thu, 28 Jan 2016 12:02:12 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Thu, Jan 28, 2016 at 11:58:08AM +0100, Igor Mammedov wrote:
> > Switch to adding compat properties incrementally instead of
> > completely overwriting compat_props per machine type.
> > That removes the data duplication which we have due to the nested
> > [PC|SPAPR]_COMPAT_* macros.
> > 
> > It also allows setting default device properties from
> > the default foo_machine_options() hook, which will be used
> > in a following patch for putting the VMGENID device as
> > a function of the ISA bridge on pc/q35 machines.
> > 
> > Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> >   
> 
> Very nice. The only suggestion I have is to use the simpler GList
> type, instead of GArray.
It's fine with me to use GList here as well;
feel free to pick up this patch in case you'd like to do it.
It should be trivial to swap from one type to the other.

It looks like this series might go nowhere, but this patch
is not tied to it and is useful to us in general,
so perhaps you could pick it up after ACKs from the
S390/SPAPR maintainers.

> 
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> 
> > compat_props GArray  
> 
> I assume this line was left here by mistake?
Yep, it's a leftover from squashing a fixup.

> 
> > ---
> >  hw/core/machine.c          | 10 ++++++++++
> >  hw/ppc/spapr.c             |  3 ---
> >  hw/s390x/s390-virtio-ccw.c | 12 ++----------
> >  include/hw/boards.h        | 11 +++++++++--
> >  include/hw/i386/pc.h       |  9 ---------
> >  vl.c                       |  6 +++++-
> >  6 files changed, 26 insertions(+), 25 deletions(-)  
> [...]
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-01-28 12:59       ` Michael S. Tsirkin
@ 2016-01-29 11:13         ` Igor Mammedov
  2016-01-31 16:22           ` Michael S. Tsirkin
  0 siblings, 1 reply; 59+ messages in thread
From: Igor Mammedov @ 2016-01-29 11:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: ehabkost, ghammer, lersek, qemu-devel, lcapitulino

On Thu, 28 Jan 2016 14:59:25 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Jan 28, 2016 at 01:03:16PM +0100, Igor Mammedov wrote:
> > On Thu, 28 Jan 2016 13:13:04 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:  
> > > > Based on Microsoft's specifications (the paper can be
> > > > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > > > easily found with the "Virtual Machine Generation ID" keywords),
> > > > add a PCI device with a corresponding description in
> > > > the SSDT ACPI table.
> > > > 
> > > > The GUID is set using "vmgenid.guid" property or
> > > > a corresponding HMP/QMP command.
> > > > 
> > > > Example of using vmgenid device:
> > > >  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > > > 
> > > > The 'vmgenid' device initialization flow is as follows:
> > > >  1. vmgenid has a RAM BAR registered with the size of the GUID buffer
> > > >  2. the BIOS initializes PCI devices and maps the BAR in the PCI hole
> > > >  3. the BIOS reads ACPI tables from QEMU; at that moment the tables
> > > >     are generated with the \_SB.VMGI.ADDR constant pointing to the
> > > >     GPA where the BIOS mapped vmgenid's BAR earlier
> > > > 
> > > > Note:
> > > > This implementation uses PCI class code 0x0500 for the vmgenid
> > > > device, which is marked as NO_DRV in Windows's machine.inf.
> > > > Testing various Windows versions showed that the OS
> > > > doesn't touch nor check for resource conflicts
> > > > for such PCI devices.
> > > > There was a concern that during PCI rebalancing the OS
> > > > could reprogram the BAR to another place, which would
> > > > leave VGEN.ADDR pointing to the old (no longer valid)
> > > > address.
> > > > However, testing showed that Windows does rebalancing
> > > > only for PCI devices that have a driver attached
> > > > and completely ignores the NO_DRV class of devices,
> > > > which in turn creates a problem where the OS could remap
> > > > one of the PCI devices (with a driver) over a BAR used by
> > > > a driver-less PCI device.
> > > > Statically declaring the used memory range as VGEN._CRS
> > > > makes the OS honor the resource reservation, and the ignored
> > > > BAR range is no longer touched during PCI rebalancing.
> > > > 
> > > > Signed-off-by: Gal Hammer <ghammer@redhat.com>
> > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>    
> > > 
> > > It's an interesting hack, but this needs some thought. BIOS has no idea
> > > this BAR is special and can not be rebalanced, so it might put the BAR
> > > in the middle of the range, in effect fragmenting it.  
> > Yep, that's the only drawback of the PCI approach.
> >   
> > > Really I think something like V12 just rewritten using the new APIs
> > > (probably with something like build_append_named_dword that I suggested)
> > > would be a much simpler way to implement this device, given
> > > the weird API limitations.  
> > We went over the drawbacks of both approaches several times,
> > and that's where I strongly disagree with using the v12 AML patching
> > approach, for the reasons stated in those discussions.  
> 
> Yes, IIRC you dislike the need to allocate an IO range to pass the address
> to the host, and to have custom code to migrate the address.
Allocating IO ports is fine by me, but I'm against using the bios_linker
(ACPI) approach for the task at hand.
Let me enumerate one more time the issues that make me dislike it so much
(most disliked first):

1. It is over-engineered for the task at hand:
   for the device to become initialized, the guest OS has to execute AML,
   so the init chain looks like:
     QEMU -> BIOS (patch AML) -> OS (AML writes buf address to IO port) ->
         QEMU (update buf address)
   It's hell to debug when something doesn't work right in this chain,
   even if there isn't any memory corruption that incorrect AML patching
   could introduce.
   As a result of this complexity, patches are hard to review, since one
   has to remember/relearn all the details of how the bios_linker in QEMU
   and the BIOS work, hence the chance of regression is very high.
   Dynamically patched AML also introduces its own share of AML
   code that has to deal with the dynamic buffer address value.
   For example:
     "nvdimm acpi: add _CRS" https://patchwork.ozlabs.org/patch/566697/
   The 27-line patch could be just 5-6 lines if a static (known in advance)
   buffer address were used to declare a static _CRS variable.

2. The ACPI approach consumes guest-usable RAM to allocate a buffer
   and then makes the device DMA data into that RAM.
   That's a design point I don't agree with.
   Just compare with a graphics card design, where on-device memory
   is mapped directly at some GPA, not wasting RAM that the guest could
   use for other tasks.
   The VMGENID and NVDIMM use cases look exactly the same to me, i.e.
   instead of consuming the guest's RAM they should be mapped at
   some GPA and their memory accessed directly.
   In that case NVDIMM could even map the whole label area and
   significantly simplify the QEMU<->OSPM protocol that currently
   serializes that data through a 4K page.
   There is also a performance issue with a buffer allocated in RAM,
   because DMA adds an unnecessary copying step when data could
   be read/written directly from the NVDIMM.
   That might not matter much for the _DSM interface, but when it
   comes to supporting block mode it can become an issue.

The above points make the ACPI patching approach fragile, not robust,
and hard to maintain.
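The patching chain criticized in point 1 can be sketched outside QEMU. This is a toy Python model, grossly simplified compared to the real bios_linker command format and AML encoding (the byte layout and offset below are made up for illustration): QEMU emits AML containing a placeholder address, and the firmware later patches the real guest-physical address into the blob.

```python
import struct

PLACEHOLDER = 0xDEADBEEF  # build-time dummy address inside the AML blob

def build_aml_with_placeholder() -> bytearray:
    # QEMU emits AML with a dummy 32-bit address; a linker command
    # (not modeled here) tells firmware where to patch it.
    return bytearray(b"\x08ADDR" + struct.pack("<I", PLACEHOLDER))

def bios_patch(aml: bytearray, offset: int, real_addr: int) -> None:
    # Firmware allocates guest RAM for the buffer and overwrites the
    # placeholder in place -- the "BIOS (patch AML)" step of the chain.
    aml[offset:offset + 4] = struct.pack("<I", real_addr)
```

Even in this toy form, correctness hinges on QEMU and the firmware agreeing on the exact patch offset inside an opaque blob, which is the debuggability concern raised above.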

> 
> > > And hey, if you want to use a pci device to pass the physical
> > > address guest to host, instead of reserving
> > > a couple of IO addresses, sure, stick it in pci config in
> > > a vendor-specific capability, this way it'll get migrated
> > > automatically.  
> > Could you elaborate more on this suggestion?  
> 
> I really just mean using PCI_Config operation region.
> If you wish, I'll try to post a prototype next week.
I don't know much about PCI, but it would be interesting;
perhaps we could use it somewhere else as well.

However, it should be checked whether it works with Windows:
for example, the PCI-specific _DSM method is ignored by it
if the PCI device doesn't have a working PCI driver bound to it.
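Michael's suggestion — stashing the buffer's guest-physical address in a vendor-specific capability in PCI config space, so it migrates with the device automatically — relies on the standard capability-list layout. A small self-contained sketch (not code from this series; offsets follow the PCI spec: status register at 0x06 with the capabilities-list bit, capabilities pointer at 0x34, vendor-specific capability ID 0x09):

```python
VENDOR_CAP_ID = 0x09  # PCI vendor-specific capability ID

def find_vendor_cap(cfg: bytes) -> int:
    """Walk the capability list in a 256-byte config-space snapshot
    and return the offset of the vendor-specific capability, or -1."""
    if not cfg[0x06] & 0x10:            # capabilities-list status bit
        return -1
    ptr = cfg[0x34] & 0xFC              # capabilities pointer
    seen = set()
    while ptr and ptr not in seen:      # guard against malformed loops
        seen.add(ptr)
        cap_id, nxt = cfg[ptr], cfg[ptr + 1] & 0xFC
        if cap_id == VENDOR_CAP_ID:
            return ptr                  # address bytes would follow here
        ptr = nxt
    return -1
```

The appeal of the design is that config space is already part of the migrated PCI device state, so no extra vmstate handling is needed for the address.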

> 
> > > 
> > >   
> > > > ---
> > > > changes since 17:
> > > >   - small fixups suggested in v14 review by Michael S. Tsirkin" <mst@redhat.com>
> > > >   - make BAR prefetchable to make region cached as per MS spec
> > > >   - s/uuid/guid/ to match spec
> > > > changes since 14:
> > > >   - reserve BAR resources so that Windows won't touch it
> > > >     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
> > > >   - ACPI: split the VGEN device off the PCI device descriptor
> > > >     and place it at PCI0 scope, so that there is no need to trace its
> > > >     location on PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
> > > >   - permit only one vmgenid to be created
> > > >   - enable the BAR to be mapped above 4Gb if it can't be mapped at low mem
> > > > ---
> > > >  default-configs/i386-softmmu.mak   |   1 +
> > > >  default-configs/x86_64-softmmu.mak |   1 +
> > > >  docs/specs/pci-ids.txt             |   1 +
> > > >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> > > >  hw/misc/Makefile.objs              |   1 +
> > > >  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
> > > >  include/hw/misc/vmgenid.h          |  27 +++++++
> > > >  include/hw/pci/pci.h               |   1 +
> > > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > > >  create mode 100644 hw/misc/vmgenid.c
> > > >  create mode 100644 include/hw/misc/vmgenid.h
> > > > 
> > > > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > > > index b177e52..6402439 100644
> > > > --- a/default-configs/i386-softmmu.mak
> > > > +++ b/default-configs/i386-softmmu.mak
> > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > >  CONFIG_IOAPIC=y
> > > >  CONFIG_PVPANIC=y
> > > >  CONFIG_MEM_HOTPLUG=y
> > > > +CONFIG_VMGENID=y
> > > >  CONFIG_NVDIMM=y
> > > >  CONFIG_ACPI_NVDIMM=y
> > > >  CONFIG_XIO3130=y
> > > > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > > > index 6e3b312..fdac18f 100644
> > > > --- a/default-configs/x86_64-softmmu.mak
> > > > +++ b/default-configs/x86_64-softmmu.mak
> > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > >  CONFIG_IOAPIC=y
> > > >  CONFIG_PVPANIC=y
> > > >  CONFIG_MEM_HOTPLUG=y
> > > > +CONFIG_VMGENID=y
> > > >  CONFIG_NVDIMM=y
> > > >  CONFIG_ACPI_NVDIMM=y
> > > >  CONFIG_XIO3130=y
> > > > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > > > index 0adcb89..e65ecf9 100644
> > > > --- a/docs/specs/pci-ids.txt
> > > > +++ b/docs/specs/pci-ids.txt
> > > > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> > > >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> > > >  1b36:0006  PCI Rocker Ethernet switch device
> > > >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > > > +1b36:0009  PCI VM-Generation device
> > > >  1b36:000a  PCI-PCI bridge (multiseat)
> > > >  
> > > >  All these devices are documented in docs/specs.
> > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > index 78758e2..0187262 100644
> > > > --- a/hw/i386/acpi-build.c
> > > > +++ b/hw/i386/acpi-build.c
> > > > @@ -44,6 +44,7 @@
> > > >  #include "hw/acpi/tpm.h"
> > > >  #include "sysemu/tpm_backend.h"
> > > >  #include "hw/timer/mc146818rtc_regs.h"
> > > > +#include "hw/misc/vmgenid.h"
> > > >  
> > > >  /* Supported chipsets: */
> > > >  #include "hw/acpi/piix4.h"
> > > > @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> > > >      info->applesmc_io_base = applesmc_port();
> > > >  }
> > > >  
> > > > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > > > +{
> > > > +    Aml *dev, *pkg, *crs;
> > > > +
> > > > +    dev = aml_device("VGEN");
> > > > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> > > > +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> > > > +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> > > > +
> > > > +    pkg = aml_package(2);
> > > > +    /* low 32 bits of UUID buffer addr */
> > > > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > > > +    /* high 32 bits of UUID buffer addr */
> > > > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > > > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > > > +
> > > > +    /*
> > > > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> > > > +     * displays it as "PCI RAM controller" which is marked as NO_DRV
> > > > +     * so Windows ignores VMGEN device completely and doesn't check
> > > > +     * for resource conflicts which during PCI rebalancing can lead
> > > > +     * to another PCI device claiming ignored BARs. To prevent this
> > > > +     * statically reserve resources used by VM_Gen_Counter.
> > > > +     * For more verbose comment see this commit message.    
> > > 
> > > What does "this commit message" mean?  
> > The above commit message. Should I reword it to just 'see commit message'?
> >   
> > >   
> > > > +     */
> > > > +     crs = aml_resource_template();
> > > > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> > > > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> > > > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> > > > +                VMGENID_VMGID_BUF_SIZE));
> > > > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > > > +     return dev;
> > > > +}
> > > > +
> > > >  /*
> > > >   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
> > > >   * On i386 arch we only have two pci hosts, so we can look only for them.
> > > > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > >              }
> > > >  
> > > >              if (bus) {
> > > > +                Object *vmgen;
> > > >                  Aml *scope = aml_scope("PCI0");
> > > >                  /* Scan all PCI buses. Generate tables to support hotplug. */
> > > >                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> > > > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > >                      aml_append(scope, dev);
> > > >                  }
> > > >  
> > > > +                vmgen = find_vmgneid_dev(NULL);
> > > > +                if (vmgen) {
> > > > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > > > +                    uint64_t buf_paddr =
> > > > +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> > > > +
> > > > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > > > +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> > > > +
> > > > +                        method = aml_method("\\_GPE._E00", 0,
> > > > +                                            AML_NOTSERIALIZED);
> > > > +                        aml_append(method,
> > > > +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > > > +                                       aml_int(0x80)));
> > > > +                        aml_append(ssdt, method);
> > > > +                    }
> > > > +                }
> > > > +
> > > >                  aml_append(sb_scope, scope);
> > > >              }
> > > >          }
> > > > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
> > > >      {
> > > >          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
> > > >  
> > > > -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> > > > -
> > > >          if (misc->is_piix4) {
> > > >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> > > >              aml_append(method,
> > > > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > > > index d4765c2..1f05edd 100644
> > > > --- a/hw/misc/Makefile.objs
> > > > +++ b/hw/misc/Makefile.objs
> > > > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> > > >  
> > > >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> > > >  obj-$(CONFIG_EDU) += edu.o
> > > > +obj-$(CONFIG_VMGENID) += vmgenid.o
> > > >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > > > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > > > new file mode 100644
> > > > index 0000000..a2fbdfc
> > > > --- /dev/null
> > > > +++ b/hw/misc/vmgenid.c
> > > > @@ -0,0 +1,154 @@
> > > > +/*
> > > > + *  Virtual Machine Generation ID Device
> > > > + *
> > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > + *
> > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > + *
> > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > + * See the COPYING file in the top-level directory.
> > > > + *
> > > > + */
> > > > +
> > > > +#include "hw/i386/pc.h"
> > > > +#include "hw/pci/pci.h"
> > > > +#include "hw/misc/vmgenid.h"
> > > > +#include "hw/acpi/acpi.h"
> > > > +#include "qapi/visitor.h"
> > > > +
> > > > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> > > > +
> > > > +typedef struct VmGenIdState {
> > > > +    PCIDevice parent_obj;
> > > > +    MemoryRegion iomem;
> > > > +    union {
> > > > +        uint8_t guid[16];
> > > > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > > > +    };
> > > > +    bool guid_set;
> > > > +} VmGenIdState;
> > > > +
> > > > +Object *find_vmgneid_dev(Error **errp)
> > > > +{
> > > > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> > > > +    if (!obj) {
> > > > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > > > +    }
> > > > +    return obj;
> > > > +}
> > > > +
> > > > +static void vmgenid_update_guest(VmGenIdState *s)
> > > > +{
> > > > +    Object *acpi_obj;
> > > > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > > > +
> > > > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > > > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > > > +
> > > > +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> > > > +    if (acpi_obj) {
> > > > +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > > > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > > > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > > > +
> > > > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > > > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > > > +    }
> > > > +}
> > > > +
> > > > +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> > > > +{
> > > > +    VmGenIdState *s = VMGENID(obj);
> > > > +
> > > > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > > > +        error_setg(errp, "'%s." VMGENID_GUID
> > > > +                   "': Failed to parse GUID string: %s",
> > > > +                   object_get_typename(OBJECT(s)),
> > > > +                   value);
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    s->guid_set = true;
> > > > +    vmgenid_update_guest(s);
> > > > +}
> > > > +
> > > > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> > > > +                                   const char *name, Error **errp)
> > > > +{
> > > > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > > > +
> > > > +    if (value == PCI_BAR_UNMAPPED) {
> > > > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> > > > +                   object_get_typename(OBJECT(obj)));
> > > > +        return;
> > > > +    }
> > > > +    visit_type_int(v, &value, name, errp);
> > > > +}
> > > > +
> > > > +static void vmgenid_initfn(Object *obj)
> > > > +{
> > > > +    VmGenIdState *s = VMGENID(obj);
> > > > +
> > > > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> > > > +                           &error_abort);
> > > > +
> > > > +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> > > > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > > > +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> > > > +}
> > > > +
> > > > +
> > > > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > > > +{
> > > > +    VmGenIdState *s = VMGENID(dev);
> > > > +    bool ambiguous = false;
> > > > +
> > > > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > > > +    if (ambiguous) {
> > > > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > > > +                         " device is permitted");
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    if (!s->guid_set) {
> > > > +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> > > > +                   object_get_typename(OBJECT(s)));
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > > > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > > > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > > > +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> > > > +        &s->iomem);
> > > > +    return;
> > > > +}
> > > > +
> > > > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > > > +{
> > > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > > +
> > > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > > +    dc->hotpluggable = false;
> > > > +    k->realize = vmgenid_realize;
> > > > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > > > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > > > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > > > +}
> > > > +
> > > > +static const TypeInfo vmgenid_device_info = {
> > > > +    .name          = VMGENID_DEVICE,
> > > > +    .parent        = TYPE_PCI_DEVICE,
> > > > +    .instance_size = sizeof(VmGenIdState),
> > > > +    .instance_init = vmgenid_initfn,
> > > > +    .class_init    = vmgenid_class_init,
> > > > +};
> > > > +
> > > > +static void vmgenid_register_types(void)
> > > > +{
> > > > +    type_register_static(&vmgenid_device_info);
> > > > +}
> > > > +
> > > > +type_init(vmgenid_register_types)
> > > > diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> > > > new file mode 100644
> > > > index 0000000..b90882c
> > > > --- /dev/null
> > > > +++ b/include/hw/misc/vmgenid.h
> > > > @@ -0,0 +1,27 @@
> > > > +/*
> > > > + *  Virtual Machine Generation ID Device
> > > > + *
> > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > + *
> > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > + *
> > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > + * See the COPYING file in the top-level directory.
> > > > + *
> > > > + */
> > > > +
> > > > +#ifndef HW_MISC_VMGENID_H
> > > > +#define HW_MISC_VMGENID_H
> > > > +
> > > > +#include "qom/object.h"
> > > > +
> > > > +#define VMGENID_DEVICE           "vmgenid"
> > > > +#define VMGENID_GUID             "guid"
> > > > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > > > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > > > +#define VMGENID_VMGID_BUF_BAR    0
> > > > +
> > > > +Object *find_vmgneid_dev(Error **errp);
> > > > +
> > > > +#endif
> > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > index dedf277..f4c9d48 100644
> > > > --- a/include/hw/pci/pci.h
> > > > +++ b/include/hw/pci/pci.h
> > > > @@ -94,6 +94,7 @@
> > > >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> > > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> > > >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > > > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> > > >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> > > >  
> > > >  #define FMT_PCIBUS                      PRIx64
> > > > -- 
> > > > 1.8.3.1    

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 7/9] machine: add properties to compat_props incrementaly
  2016-01-28 10:58 ` [Qemu-devel] [PATCH v19 7/9] machine: add properties to compat_props incrementaly Igor Mammedov
  2016-01-28 14:02   ` Eduardo Habkost
@ 2016-01-29 12:51   ` Cornelia Huck
  1 sibling, 0 replies; 59+ messages in thread
From: Cornelia Huck @ 2016-01-29 12:51 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: agraf, ehabkost, mst, ghammer, qemu-devel, lcapitulino,
	borntraeger, qemu-ppc, pbonzini, rth, lersek, david

On Thu, 28 Jan 2016 11:58:08 +0100
Igor Mammedov <imammedo@redhat.com> wrote:

> Switch to adding compat properties incrementally instead of
> completely overwriting compat_props per machine type.
> That removes the data duplication which we have due to nested
> [PC|SPAPR]_COMPAT_* macros.

We'll try to switch to something similar to spapr for ccw so we can get
rid of the nesting as well (once one of us has time to look into that).

> 
> It also allows setting default device properties from the
> default foo_machine_options() hook, which will be used
> in a following patch to put the VMGENID device as
> a function of an ISA bridge on pc/q35 machines.
> 
> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

master + this patch (+ <20160115120143.GB2432@work-vm>) survives some
playing around with virsh managedsave and the 2.4/2.5/2.6 ccw machines,
so

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-01-29 11:13         ` Igor Mammedov
@ 2016-01-31 16:22           ` Michael S. Tsirkin
  2016-02-02  9:59             ` Igor Mammedov
  0 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-01-31 16:22 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: ehabkost, ghammer, lersek, qemu-devel, lcapitulino

On Fri, Jan 29, 2016 at 12:13:59PM +0100, Igor Mammedov wrote:
> On Thu, 28 Jan 2016 14:59:25 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Thu, Jan 28, 2016 at 01:03:16PM +0100, Igor Mammedov wrote:
> > > On Thu, 28 Jan 2016 13:13:04 +0200
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >   
> > > > On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:  
> > > > > Based on Microsoft's specifications (paper can be
> > > > > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > > > > easily found by "Virtual Machine Generation ID" keywords),
> > > > > add a PCI device with a corresponding description in
> > > > > the SSDT ACPI table.
> > > > > 
> > > > > The GUID is set using "vmgenid.guid" property or
> > > > > a corresponding HMP/QMP command.
> > > > > 
> > > > > Example of using vmgenid device:
> > > > >  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > > > > 
> > > > > The 'vmgenid' device initialization flow is as follows:
> > > > >  1. vmgenid has a RAM BAR registered with the size of the GUID buffer
> > > > >  2. the BIOS initializes PCI devices and maps the BAR in the PCI hole
> > > > >  3. the BIOS reads ACPI tables from QEMU; at that moment the tables
> > > > >     are generated with the \_SB.VMGI.ADDR constant pointing to
> > > > >     the GPA where the BIOS mapped vmgenid's BAR earlier
> > > > > 
> > > > > Note:
> > > > > This implementation uses PCI class 0x0500 code for the vmgenid
> > > > > device, which is marked as NO_DRV in Windows's machine.inf.
> > > > > Testing various Windows versions showed that the OS
> > > > > neither touches nor checks for resource conflicts
> > > > > for such PCI devices.
> > > > > There was a concern that during PCI rebalancing, the OS
> > > > > could reprogram the BAR to another place, which would
> > > > > leave VGEN.ADDR pointing to the old (no longer valid)
> > > > > address.
> > > > > However, testing showed that Windows does rebalancing
> > > > > only for PCI devices that have a driver attached
> > > > > and completely ignores the NO_DRV class of devices.
> > > > > This in turn creates a problem where the OS could remap
> > > > > one of the PCI devices (with a driver) over a BAR used by
> > > > > a driver-less PCI device.
> > > > > Statically declaring the used memory range as VGEN._CRS
> > > > > makes the OS honor the resource reservation, and the ignored
> > > > > BAR range is no longer touched during PCI rebalancing.
> > > > > 
> > > > > Signed-off-by: Gal Hammer <ghammer@redhat.com>
> > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>    
> > > > 
> > It's an interesting hack, but this needs some thought. The BIOS has no idea
> > this BAR is special and cannot be rebalanced, so it might put the BAR
> > in the middle of the range, in effect fragmenting it.
> > > yep that's the only drawback in the PCI approach.
> > >   
> > > > Really I think something like V12 just rewritten using the new APIs
> > > > (probably with something like build_append_named_dword that I suggested)
> > > > would be a much simpler way to implement this device, given
> > > > the weird API limitations.  
> > > We went over the drawbacks of both approaches several times,
> > > and that's where I strongly disagree with using the v12 AML patching
> > > approach, for the reasons stated in those discussions.  
> > 
> > Yes, IIRC you dislike the need to allocate an IO range to pass the address
> > to the host, and to have custom code to migrate the address.
> allocating IO ports is fine by me, but I'm against using the bios_linker (ACPI)
> approach for the task at hand;
> let me enumerate one more time the issues that make me dislike it so much
> (most disliked first):
> 
> 1. over-engineered for the task at hand:
>    for the device to become initialized, the guest OS has to execute AML,
>    so the init chain looks like:
>      QEMU -> BIOS (patch AML) -> OS (AML writes buf address to IO port) ->
>          QEMU (update buf address)
>    it's hell to debug when something doesn't work right in this chain

Well, this is not very different from e.g. virtio.
If it's just the AML that worries you, we could teach BIOS/EFI a new command
to report some addresses back to QEMU after linking. Would this address
this issue?


>    even if there isn't any memory corruption that incorrect AML patching
>    could introduce.
>    As a result of the complexity, patches are hard to review, since one has
>    to remember/relearn all the details of how the bios_linker in QEMU and
>    the BIOS works, hence the chance of regression is very high.
>    Dynamically patched AML also introduces its own share of AML
>    code that has to deal with the dynamic buffer address value.
>    For example:
>      "nvdimm acpi: add _CRS" https://patchwork.ozlabs.org/patch/566697/
>    The 27-line patch could be just 5-6 lines if a static (known in advance)
>    buffer address were used to declare a static _CRS variable.

The problem is with finding a fixed address, and the fragmentation that this
causes.  Look at the mess we have with just allocating addresses for
RAM.  I think it's a mistake to add to this mess.  Either let's teach
management to specify an address map, or let the guest allocate addresses
for us.


> 2. the ACPI approach consumes guest-usable RAM to allocate a buffer
>    and then makes the device DMA data into that RAM.
>    That's a design point I don't agree with.

Blame the broken VM GEN ID spec.

For VM GEN ID, instead of DMA we could make ACPI code read the PCI BAR and
copy the data over; this would fix rebalancing, but there is a problem with
this approach: it cannot be done atomically (while the VM is not yet
running and accessing RAM).  So you can have the guest read a partially
corrupted ID from memory.

And hey, nowadays we actually made fw_cfg do DMA too.


>    Just compare with a graphics card design, where on-device memory
>    is mapped directly at some GPA, not wasting RAM that the guest could
>    use for other tasks.

This might have been true 20 years ago.  Most modern cards do DMA.

>    The VMGENID and NVDIMM use-cases look exactly the same to me, i.e.
>    instead of consuming the guest's RAM they should be mapped at
>    some GPA and their memory accessed directly.

VMGENID is tied to a spec that rather arbitrarily asks for a fixed
address. This breaks the straight-forward approach of using a
rebalanceable PCI BAR.

>    In that case NVDIMM could even map the whole label area and
>    significantly simplify the QEMU<->OSPM protocol that currently
>    serializes that data through a 4K page.
>    There is also a performance issue with a buffer allocated in RAM,
>    because DMA adds an unnecessary copying step when data could
>    be read/written directly from the NVDIMM.
>    It might not be very important for the _DSM interface, but when it
>    comes to supporting block mode it can become an issue.

So for NVDIMM, presumably it will have code that accesses the PCI BAR
properly, so it's guaranteed to work across BAR rebalancing.
Would that address the performance issue?


> The above points make the ACPI patching approach fragile
> and hard to maintain.

Wrt GEN ID these are all kind of subjective though.  I especially don't
get what appears to be your general dislike of the linker host/guest
interface.  It's there and we are not moving away from it, so why not
use it in more places?  Or if you think it's wrong, why don't you build
something better, then?  We could then maybe use it for these things as
well.

> 
> > 
> > > > And hey, if you want to use a pci device to pass the physical
> > > > address guest to host, instead of reserving
> > > > a couple of IO addresses, sure, stick it in pci config in
> > > > a vendor-specific capability, this way it'll get migrated
> > > > automatically.  
> > > Could you elaborate more on this suggestion?  
> > 
> > I really just mean using PCI_Config operation region.
> > If you wish, I'll try to post a prototype next week.
> I don't know much about PCI but it would be interesting,
> perhaps we could use it somewhere else.
> 
> However, it should be checked whether this works with Windows;
> for example, the PCI-specific _DSM method is ignored by it
> if the PCI device doesn't have a working PCI driver bound to it.
> 
> > 
> > > > 
> > > >   
> > > > > ---
> > > > > changes since 17:
> > > > >   - small fixups suggested in v14 review by Michael S. Tsirkin" <mst@redhat.com>
> > > > >   - make BAR prefetchable to make region cached as per MS spec
> > > > >   - s/uuid/guid/ to match spec
> > > > > changes since 14:
> > > > >   - reserve BAR resources so that Windows won't touch it
> > > > >     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
> > > > >   - ACPI: split the VGEN device off the PCI device descriptor
> > > > >     and place it at PCI0 scope, so there won't be a need to trace its
> > > > >     location on PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
> > > > >   - permit only one vmgenid to be created
> > > > >   - enable the BAR to be mapped above 4Gb if it can't be mapped at low mem
> > > > > ---
> > > > >  default-configs/i386-softmmu.mak   |   1 +
> > > > >  default-configs/x86_64-softmmu.mak |   1 +
> > > > >  docs/specs/pci-ids.txt             |   1 +
> > > > >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> > > > >  hw/misc/Makefile.objs              |   1 +
> > > > >  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
> > > > >  include/hw/misc/vmgenid.h          |  27 +++++++
> > > > >  include/hw/pci/pci.h               |   1 +
> > > > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > > > >  create mode 100644 hw/misc/vmgenid.c
> > > > >  create mode 100644 include/hw/misc/vmgenid.h
> > > > > 
> > > > > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > > > > index b177e52..6402439 100644
> > > > > --- a/default-configs/i386-softmmu.mak
> > > > > +++ b/default-configs/i386-softmmu.mak
> > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > >  CONFIG_IOAPIC=y
> > > > >  CONFIG_PVPANIC=y
> > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > +CONFIG_VMGENID=y
> > > > >  CONFIG_NVDIMM=y
> > > > >  CONFIG_ACPI_NVDIMM=y
> > > > >  CONFIG_XIO3130=y
> > > > > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > > > > index 6e3b312..fdac18f 100644
> > > > > --- a/default-configs/x86_64-softmmu.mak
> > > > > +++ b/default-configs/x86_64-softmmu.mak
> > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > >  CONFIG_IOAPIC=y
> > > > >  CONFIG_PVPANIC=y
> > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > +CONFIG_VMGENID=y
> > > > >  CONFIG_NVDIMM=y
> > > > >  CONFIG_ACPI_NVDIMM=y
> > > > >  CONFIG_XIO3130=y
> > > > > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > > > > index 0adcb89..e65ecf9 100644
> > > > > --- a/docs/specs/pci-ids.txt
> > > > > +++ b/docs/specs/pci-ids.txt
> > > > > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> > > > >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> > > > >  1b36:0006  PCI Rocker Ethernet switch device
> > > > >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > > > > +1b36:0009  PCI VM-Generation device
> > > > >  1b36:000a  PCI-PCI bridge (multiseat)
> > > > >  
> > > > >  All these devices are documented in docs/specs.
> > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > index 78758e2..0187262 100644
> > > > > --- a/hw/i386/acpi-build.c
> > > > > +++ b/hw/i386/acpi-build.c
> > > > > @@ -44,6 +44,7 @@
> > > > >  #include "hw/acpi/tpm.h"
> > > > >  #include "sysemu/tpm_backend.h"
> > > > >  #include "hw/timer/mc146818rtc_regs.h"
> > > > > +#include "hw/misc/vmgenid.h"
> > > > >  
> > > > >  /* Supported chipsets: */
> > > > >  #include "hw/acpi/piix4.h"
> > > > > @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> > > > >      info->applesmc_io_base = applesmc_port();
> > > > >  }
> > > > >  
> > > > > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > > > > +{
> > > > > +    Aml *dev, *pkg, *crs;
> > > > > +
> > > > > +    dev = aml_device("VGEN");
> > > > > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> > > > > +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> > > > > +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> > > > > +
> > > > > +    pkg = aml_package(2);
> > > > > +    /* low 32 bits of UUID buffer addr */
> > > > > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > > > > +    /* high 32 bits of UUID buffer addr */
> > > > > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > > > > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > > > > +
> > > > > +    /*
> > > > > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> > > > > +     * displays it as "PCI RAM controller" which is marked as NO_DRV
> > > > > +     * so Windows ignores VMGEN device completely and doesn't check
> > > > > +     * for resource conflicts which during PCI rebalancing can lead
> > > > > +     * to another PCI device claiming ignored BARs. To prevent this
> > > > > +     * statically reserve resources used by VM_Gen_Counter.
> > > > > +     * For more verbose comment see this commit message.    
> > > > 
> > > > What does "this commit message" mean?  
> > > the above commit message. Should I reword it to just 'see commit message'?
> > >   
> > > >   
> > > > > +     */
> > > > > +     crs = aml_resource_template();
> > > > > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> > > > > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> > > > > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> > > > > +                VMGENID_VMGID_BUF_SIZE));
> > > > > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > > > > +     return dev;
> > > > > +}
> > > > > +
> > > > >  /*
> > > > >   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
> > > > >   * On i386 arch we only have two pci hosts, so we can look only for them.
> > > > > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > >              }
> > > > >  
> > > > >              if (bus) {
> > > > > +                Object *vmgen;
> > > > >                  Aml *scope = aml_scope("PCI0");
> > > > >                  /* Scan all PCI buses. Generate tables to support hotplug. */
> > > > >                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> > > > > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > >                      aml_append(scope, dev);
> > > > >                  }
> > > > >  
> > > > > +                vmgen = find_vmgneid_dev(NULL);
> > > > > +                if (vmgen) {
> > > > > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > > > > +                    uint64_t buf_paddr =
> > > > > +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> > > > > +
> > > > > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > > > > +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> > > > > +
> > > > > +                        method = aml_method("\\_GPE._E00", 0,
> > > > > +                                            AML_NOTSERIALIZED);
> > > > > +                        aml_append(method,
> > > > > +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > > > > +                                       aml_int(0x80)));
> > > > > +                        aml_append(ssdt, method);
> > > > > +                    }
> > > > > +                }
> > > > > +
> > > > >                  aml_append(sb_scope, scope);
> > > > >              }
> > > > >          }
> > > > > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
> > > > >      {
> > > > >          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
> > > > >  
> > > > > -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> > > > > -
> > > > >          if (misc->is_piix4) {
> > > > >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> > > > >              aml_append(method,
> > > > > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > > > > index d4765c2..1f05edd 100644
> > > > > --- a/hw/misc/Makefile.objs
> > > > > +++ b/hw/misc/Makefile.objs
> > > > > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> > > > >  
> > > > >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> > > > >  obj-$(CONFIG_EDU) += edu.o
> > > > > +obj-$(CONFIG_VMGENID) += vmgenid.o
> > > > >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > > > > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > > > > new file mode 100644
> > > > > index 0000000..a2fbdfc
> > > > > --- /dev/null
> > > > > +++ b/hw/misc/vmgenid.c
> > > > > @@ -0,0 +1,154 @@
> > > > > +/*
> > > > > + *  Virtual Machine Generation ID Device
> > > > > + *
> > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > + *
> > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > + *
> > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > + * See the COPYING file in the top-level directory.
> > > > > + *
> > > > > + */
> > > > > +
> > > > > +#include "hw/i386/pc.h"
> > > > > +#include "hw/pci/pci.h"
> > > > > +#include "hw/misc/vmgenid.h"
> > > > > +#include "hw/acpi/acpi.h"
> > > > > +#include "qapi/visitor.h"
> > > > > +
> > > > > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> > > > > +
> > > > > +typedef struct VmGenIdState {
> > > > > +    PCIDevice parent_obj;
> > > > > +    MemoryRegion iomem;
> > > > > +    union {
> > > > > +        uint8_t guid[16];
> > > > > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > > > > +    };
> > > > > +    bool guid_set;
> > > > > +} VmGenIdState;
> > > > > +
> > > > > +Object *find_vmgneid_dev(Error **errp)
> > > > > +{
> > > > > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> > > > > +    if (!obj) {
> > > > > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > > > > +    }
> > > > > +    return obj;
> > > > > +}
> > > > > +
> > > > > +static void vmgenid_update_guest(VmGenIdState *s)
> > > > > +{
> > > > > +    Object *acpi_obj;
> > > > > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > > > > +
> > > > > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > > > > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > > > > +
> > > > > +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> > > > > +    if (acpi_obj) {
> > > > > +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > > > > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > > > > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > > > > +
> > > > > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > > > > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> > > > > +{
> > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > +
> > > > > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > > > > +        error_setg(errp, "'%s." VMGENID_GUID
> > > > > +                   "': Failed to parse GUID string: %s",
> > > > > +                   object_get_typename(OBJECT(s)),
> > > > > +                   value);
> > > > > +        return;
> > > > > +    }
> > > > > +
> > > > > +    s->guid_set = true;
> > > > > +    vmgenid_update_guest(s);
> > > > > +}
> > > > > +
> > > > > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> > > > > +                                   const char *name, Error **errp)
> > > > > +{
> > > > > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > > > > +
> > > > > +    if (value == PCI_BAR_UNMAPPED) {
> > > > > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> > > > > +                   object_get_typename(OBJECT(obj)));
> > > > > +        return;
> > > > > +    }
> > > > > +    visit_type_int(v, &value, name, errp);
> > > > > +}
> > > > > +
> > > > > +static void vmgenid_initfn(Object *obj)
> > > > > +{
> > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > +
> > > > > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> > > > > +                           &error_abort);
> > > > > +
> > > > > +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> > > > > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > > > > +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> > > > > +}
> > > > > +
> > > > > +
> > > > > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > > > > +{
> > > > > +    VmGenIdState *s = VMGENID(dev);
> > > > > +    bool ambiguous = false;
> > > > > +
> > > > > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > > > > +    if (ambiguous) {
> > > > > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > > > > +                         " device is permitted");
> > > > > +        return;
> > > > > +    }
> > > > > +
> > > > > +    if (!s->guid_set) {
> > > > > +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> > > > > +                   object_get_typename(OBJECT(s)));
> > > > > +        return;
> > > > > +    }
> > > > > +
> > > > > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > > > > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > > > > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > > > > +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> > > > > +        &s->iomem);
> > > > > +    return;
> > > > > +}
> > > > > +
> > > > > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > > > > +{
> > > > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > > > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > > > +
> > > > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > > > +    dc->hotpluggable = false;
> > > > > +    k->realize = vmgenid_realize;
> > > > > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > > > > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > > > > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > > > > +}
> > > > > +
> > > > > +static const TypeInfo vmgenid_device_info = {
> > > > > +    .name          = VMGENID_DEVICE,
> > > > > +    .parent        = TYPE_PCI_DEVICE,
> > > > > +    .instance_size = sizeof(VmGenIdState),
> > > > > +    .instance_init = vmgenid_initfn,
> > > > > +    .class_init    = vmgenid_class_init,
> > > > > +};
> > > > > +
> > > > > +static void vmgenid_register_types(void)
> > > > > +{
> > > > > +    type_register_static(&vmgenid_device_info);
> > > > > +}
> > > > > +
> > > > > +type_init(vmgenid_register_types)
> > > > > diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> > > > > new file mode 100644
> > > > > index 0000000..b90882c
> > > > > --- /dev/null
> > > > > +++ b/include/hw/misc/vmgenid.h
> > > > > @@ -0,0 +1,27 @@
> > > > > +/*
> > > > > + *  Virtual Machine Generation ID Device
> > > > > + *
> > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > + *
> > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > + *
> > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > + * See the COPYING file in the top-level directory.
> > > > > + *
> > > > > + */
> > > > > +
> > > > > +#ifndef HW_MISC_VMGENID_H
> > > > > +#define HW_MISC_VMGENID_H
> > > > > +
> > > > > +#include "qom/object.h"
> > > > > +
> > > > > +#define VMGENID_DEVICE           "vmgenid"
> > > > > +#define VMGENID_GUID             "guid"
> > > > > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > > > > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > > > > +#define VMGENID_VMGID_BUF_BAR    0
> > > > > +
> > > > > +Object *find_vmgneid_dev(Error **errp);
> > > > > +
> > > > > +#endif
> > > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > > index dedf277..f4c9d48 100644
> > > > > --- a/include/hw/pci/pci.h
> > > > > +++ b/include/hw/pci/pci.h
> > > > > @@ -94,6 +94,7 @@
> > > > >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> > > > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> > > > >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > > > > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> > > > >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> > > > >  
> > > > >  #define FMT_PCIBUS                      PRIx64
> > > > > -- 
> > > > > 1.8.3.1    

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-01-31 16:22           ` Michael S. Tsirkin
@ 2016-02-02  9:59             ` Igor Mammedov
  2016-02-02 11:16               ` Michael S. Tsirkin
  0 siblings, 1 reply; 59+ messages in thread
From: Igor Mammedov @ 2016-02-02  9:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Xiao Guangrong, ehabkost, ghammer, qemu-devel, lcapitulino, lersek

On Sun, 31 Jan 2016 18:22:13 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Fri, Jan 29, 2016 at 12:13:59PM +0100, Igor Mammedov wrote:
> > On Thu, 28 Jan 2016 14:59:25 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Thu, Jan 28, 2016 at 01:03:16PM +0100, Igor Mammedov wrote:  
> > > > On Thu, 28 Jan 2016 13:13:04 +0200
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > >     
> > > > > On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:    
> > > > > > Based on Microsoft's specifications (paper can be
> > > > > > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > > > > > easily found by "Virtual Machine Generation ID" keywords),
> > > > > > add a PCI device with a corresponding description in
> > > > > > the SSDT ACPI table.
> > > > > > 
> > > > > > The GUID is set using "vmgenid.guid" property or
> > > > > > a corresponding HMP/QMP command.
> > > > > > 
> > > > > > Example of using vmgenid device:
> > > > > >  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > > > > > 
> > > > > > The 'vmgenid' device initialization flow is as follows:
> > > > > >  1. vmgenid has a RAM BAR registered with the size of the GUID buffer
> > > > > >  2. the BIOS initializes PCI devices and maps the BAR in the PCI hole
> > > > > >  3. the BIOS reads ACPI tables from QEMU; at that moment the tables
> > > > > >     are generated with the \_SB.VMGI.ADDR constant pointing to
> > > > > >     the GPA where the BIOS mapped vmgenid's BAR earlier
> > > > > > 
> > > > > > Note:
> > > > > > This implementation uses PCI class 0x0500 code for the vmgenid
> > > > > > device, which is marked as NO_DRV in Windows's machine.inf.
> > > > > > Testing various Windows versions showed that the OS
> > > > > > neither touches nor checks for resource conflicts
> > > > > > for such PCI devices.
> > > > > > There was a concern that during PCI rebalancing, the OS
> > > > > > could reprogram the BAR to another place, which would
> > > > > > leave VGEN.ADDR pointing to the old (no longer valid)
> > > > > > address.
> > > > > > However, testing showed that Windows does rebalancing
> > > > > > only for PCI devices that have a driver attached
> > > > > > and completely ignores the NO_DRV class of devices.
> > > > > > This in turn creates a problem where the OS could remap
> > > > > > one of the PCI devices (with a driver) over a BAR used by
> > > > > > a driver-less PCI device.
> > > > > > Statically declaring the used memory range as VGEN._CRS
> > > > > > makes the OS honor the resource reservation, and the ignored
> > > > > > BAR range is no longer touched during PCI rebalancing.
> > > > > > 
> > > > > > Signed-off-by: Gal Hammer <ghammer@redhat.com>
> > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>      
> > > > > 
> > > > > It's an interesting hack, but this needs some thought. The BIOS has no idea
> > > > > this BAR is special and cannot be rebalanced, so it might put the BAR
> > > > > in the middle of the range, in effect fragmenting it.
> > > > yep that's the only drawback in the PCI approach.
> > > >     
> > > > > Really I think something like V12 just rewritten using the new APIs
> > > > > (probably with something like build_append_named_dword that I suggested)
> > > > > would be a much simpler way to implement this device, given
> > > > > the weird API limitations.    
> > > > We went over the drawbacks of both approaches several times,
> > > > and that's where I strongly disagree with using the v12 AML patching
> > > > approach, for the reasons stated in those discussions.    
> > > 
> > > Yes, IIRC you dislike the need to allocate an IO range to pass the address
> > > to the host, and to have custom code to migrate the address.  
> > allocating IO ports is fine by me, but I'm against using the bios_linker (ACPI)
> > approach for the task at hand;
> > let me enumerate one more time the issues that make me dislike it so much
> > (most disliked first):
> > 
> > 1. over-engineered for the task at hand:
> >    for the device to become initialized, the guest OS has to execute AML,
> >    so the init chain looks like:
> >      QEMU -> BIOS (patch AML) -> OS (AML writes buf address to IO port) ->
> >          QEMU (update buf address)
> >    it's hell to debug when something doesn't work right in this chain  
> 
> Well this is not very different from e.g. virtio.
> If it's just AML that worries you, we could teach BIOS/EFI a new command
> to give some addresses after linking back to QEMU. Would this address
> this issue?
It would make it marginally better (especially from a testing point of view),
though it wouldn't fix the other issues.

> 
> 
> >    even if there isn't any memory corruption that incorrect AML patching
> >    could introduce.
> >    As result of complexity patches are hard to review since one has
> >    to remember/relearn all details how bios_linker in QEMU and BIOS works,
> >    hence chance of regression is very high.
> >    Dynamically patched AML also introduces its own share of AML
> >    code that has to deal with dynamic buff address value.
> >    For an example:
> >      "nvdimm acpi: add _CRS" https://patchwork.ozlabs.org/patch/566697/
> >    27 liner patch could be just 5-6 lines if static (known in advance)
> >    buffer address were used to declare static _CRS variable.  
> 
> Problem is with finding a fixed address, and fragmentation that this
> causes.  Look at the mess we have with just allocating addresses for
> RAM.  I think it's a mistake to add to this mess.  Either let's teach
> management to specify an address map, or let guest allocate addresses
> for us.

Yep, the problem here is the 'fixed' part, not having an address in general.
Allowing management to specify the address map partially works: see pc-dimm,
where on the target side libvirt specifies the address at which the device
was mapped on the source. The initial RAM mess could be fixed for future
machine types in a similar way, by replacing
memory_region_allocate_system_memory() with pc-dimms, so that a specific
machine type could reproduce the same layout.

But default addresses don't appear magically and have to
come from somewhere, so we need an address allocator
somewhere.
If we put the allocator into the guest and emulate a memory
controller in QEMU, we would probably need to add fw_cfg
interfaces that describe the hardware which needs mapping
(probably an ever-growing interface, as it was with ACPI
tables before we axed them in the BIOS and moved them into QEMU).
Alternatively we can put the allocator in QEMU, which could
be simpler to implement and maintain since we wouldn't need
to implement extra fw_cfg interfaces and a memory controller,
or ship/fix the QEMU/BIOS pair in sync as we did with ACPI
in the past.

> > 2. ACPI approach consumes guest usable RAM to allocate buffer
> >    and then makes device to DMA data in that RAM.
> >    That's a design point I don't agree with.  
> 
> Blame the broken VM GEN ID spec.
> 
> For VM GEN ID, instead of DMA we could make ACPI code read PCI BAR and
> copy data over, this would fix rebalancing but there is a problem with
> this approach: it can not be done atomically (while VM is not yet
> running and accessing RAM).  So you can have guest read a partially
> corrupted ID from memory.

Yep, the VM GEN ID spec is broken and we can't do anything about it:
it's absolutely impossible to guarantee an atomic update, since the
guest OS has the address of the buffer and can read it at any time.
Nothing can be done here.

> 
> And hey, nowadays we actually made fw cfg do DMA too.

I'm not against DMA; it's the direction that seems wrong to me.
What we need here is a way to allocate a GPA range and make sure
that QEMU maps device memory there; a PCI BAR is one
of the ways to do it.

> >    Just compare with a graphics card design, where on device memory
> >    is mapped directly at some GPA not wasting RAM that guest could
> >    use for other tasks.  
> 
> This might have been true 20 years ago.  Most modern cards do DMA.

Modern cards with their own RAM map their VRAM into the address space
directly and allow users to use it (GEM API), so they do not waste
conventional RAM. For example, NVIDIA VRAM is mapped as PCI BARs the
same way as in this series (even the PCI class id is the same).

> >    VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> >    instead of consuming guest's RAM they should be mapped at
> >    some GPA and their memory accessed directly.  
> 
> VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> address. This breaks the straight-forward approach of using a
> rebalanceable PCI BAR.

For PCI rebalancing to work on Windows, one has to provide a working PCI
driver; otherwise the OS will ignore the device when rebalancing happens
and might map something else over the ignored BAR.

> 
> >    In that case NVDIMM could even map whole label area and
> >    significantly simplify QEMU<->OSPM protocol that currently
> >    serializes that data through a 4K page.
> >    There is also performance issue with buffer allocated in RAM,
> >    because DMA adds unnecessary copying step when data could
> >    be read/written directly of NVDIMM.
> >    It might be no very important for _DSM interface but when it
> >    comes to supporting block mode it can become an issue.  
> 
> So for NVDIMM, presumably it will have code access PCI BAR properly, so
> it's guaranteed to work across BAR rebalancing.
> Would that address the performance issue?

It would, if rebalancing accounted for the BARs of driver-less PCI
devices, but it doesn't; hence such BARs need to be statically pinned
at the place where the BIOS put them at start-up.
I'm also not sure that a PCI_Config operation region would work
on Windows without a loaded driver (similar to the _DSM case).


> > Above points make ACPI patching approach not robust and fragile
> > and hard to maintain.  
> 
> Wrt GEN ID these are all kind of subjective though.  I especially don't
> get what appears your general dislike of the linker host/guest
> interface.
Beyond the technical issues, the general dislike is just what I've written:
the bios_linker_loader_add_pointer() interface is "not robust and fragile".

To make it less fragile:
 1. It should be impossible to corrupt memory or patch the wrong address.
    The current implementation silently relies on the value referenced by
    the 'pointer' argument, and to figure that out one has to read the
    linker code on the BIOS side. That could easily be set wrong and slip
    through review. The API shouldn't rely on the caller setting the value
    pointed to by that argument.
 2. If it's going to be used for patching AML, bios_linker_loader_add_pointer()
    should assert when the AML object to be patched is wrong and patching
    would corrupt the AML blob.


> It's there and we are not moving away from it, so why not
> use it in more places?  Or if you think it's wrong, why don't you build
> something better then?  We could then maybe use it for these things as
> well.

Yep, I think for vmgenid, and even more so for nvdimm,
it would be better to allocate GPAs in QEMU and map the backing
MemoryRegions directly in QEMU. For nvdimm (the main data region)
we already do it using pc-dimm's GPA allocation algorithm; we could
also use a similar approach for nvdimm's label area and for vmgenid.

Here is a simple attempt to add a limited GPA allocator in high memory:
 https://patchwork.ozlabs.org/patch/540852/
But it hasn't received any comments from you and was ignored.
Let's consider it; perhaps we could come up with a GPA allocator
that could be used for other things as well.

> 
> >   
> > >   
> > > > > And hey, if you want to use a pci device to pass the physical
> > > > > address guest to host, instead of reserving
> > > > > a couple of IO addresses, sure, stick it in pci config in
> > > > > a vendor-specific capability, this way it'll get migrated
> > > > > automatically.    
> > > > Could you elaborate more on this suggestion?    
> > > 
> > > I really just mean using PCI_Config operation region.
> > > If you wish, I'll try to post a prototype next week.  
> > I don't know much about PCI but it would be interesting,
> > perhaps we could use it somewhere else.
> > 
> > However it should be checked if it works with Windows,
> > for example PCI specific _DSM method is ignored by it
> > if PCI device doesn't have working PCI driver bound to it.
> >   
> > >   
> > > > > 
> > > > >     
> > > > > > ---
> > > > > > changes since 17:
> > > > > >   - small fixups suggested in v14 review by Michael S. Tsirkin" <mst@redhat.com>
> > > > > >   - make BAR prefetchable to make region cached as per MS spec
> > > > > >   - s/uuid/guid/ to match spec
> > > > > > changes since 14:
> > > > > >   - reserve BAR resources so that Windows won't touch it
> > > > > >     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > >   - ACPI: split VGEN device of PCI device descriptor
> > > > > >     and place it at PCI0 scope, so that won't be need trace its
> > > > > >     location on PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > >   - permit only one vmgenid to be created
> > > > > >   - enable BAR be mapped above 4Gb if it can't be mapped at low mem
> > > > > > ---
> > > > > >  default-configs/i386-softmmu.mak   |   1 +
> > > > > >  default-configs/x86_64-softmmu.mak |   1 +
> > > > > >  docs/specs/pci-ids.txt             |   1 +
> > > > > >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> > > > > >  hw/misc/Makefile.objs              |   1 +
> > > > > >  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
> > > > > >  include/hw/misc/vmgenid.h          |  27 +++++++
> > > > > >  include/hw/pci/pci.h               |   1 +
> > > > > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > > > > >  create mode 100644 hw/misc/vmgenid.c
> > > > > >  create mode 100644 include/hw/misc/vmgenid.h
> > > > > > 
> > > > > > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > > > > > index b177e52..6402439 100644
> > > > > > --- a/default-configs/i386-softmmu.mak
> > > > > > +++ b/default-configs/i386-softmmu.mak
> > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > >  CONFIG_IOAPIC=y
> > > > > >  CONFIG_PVPANIC=y
> > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > +CONFIG_VMGENID=y
> > > > > >  CONFIG_NVDIMM=y
> > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > >  CONFIG_XIO3130=y
> > > > > > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > > > > > index 6e3b312..fdac18f 100644
> > > > > > --- a/default-configs/x86_64-softmmu.mak
> > > > > > +++ b/default-configs/x86_64-softmmu.mak
> > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > >  CONFIG_IOAPIC=y
> > > > > >  CONFIG_PVPANIC=y
> > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > +CONFIG_VMGENID=y
> > > > > >  CONFIG_NVDIMM=y
> > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > >  CONFIG_XIO3130=y
> > > > > > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > > > > > index 0adcb89..e65ecf9 100644
> > > > > > --- a/docs/specs/pci-ids.txt
> > > > > > +++ b/docs/specs/pci-ids.txt
> > > > > > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> > > > > >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> > > > > >  1b36:0006  PCI Rocker Ethernet switch device
> > > > > >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > > > > > +1b36:0009  PCI VM-Generation device
> > > > > >  1b36:000a  PCI-PCI bridge (multiseat)
> > > > > >  
> > > > > >  All these devices are documented in docs/specs.
> > > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > > index 78758e2..0187262 100644
> > > > > > --- a/hw/i386/acpi-build.c
> > > > > > +++ b/hw/i386/acpi-build.c
> > > > > > @@ -44,6 +44,7 @@
> > > > > >  #include "hw/acpi/tpm.h"
> > > > > >  #include "sysemu/tpm_backend.h"
> > > > > >  #include "hw/timer/mc146818rtc_regs.h"
> > > > > > +#include "hw/misc/vmgenid.h"
> > > > > >  
> > > > > >  /* Supported chipsets: */
> > > > > >  #include "hw/acpi/piix4.h"
> > > > > > @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> > > > > >      info->applesmc_io_base = applesmc_port();
> > > > > >  }
> > > > > >  
> > > > > > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > > > > > +{
> > > > > > +    Aml *dev, *pkg, *crs;
> > > > > > +
> > > > > > +    dev = aml_device("VGEN");
> > > > > > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> > > > > > +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> > > > > > +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> > > > > > +
> > > > > > +    pkg = aml_package(2);
> > > > > > +    /* low 32 bits of UUID buffer addr */
> > > > > > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > > > > > +    /* high 32 bits of UUID buffer addr */
> > > > > > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > > > > > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > > > > > +
> > > > > > +    /*
> > > > > > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> > > > > > +     * displays it as "PCI RAM controller" which is marked as NO_DRV
> > > > > > +     * so Windows ignores VMGEN device completely and doesn't check
> > > > > > +     * for resource conflicts which during PCI rebalancing can lead
> > > > > > +     * to another PCI device claiming ignored BARs. To prevent this
> > > > > > +     * statically reserve resources used by VM_Gen_Counter.
> > > > > > +     * For more verbose comment see this commit message.      
> > > > > 
> > > > > What does "this commit message" mean?    
> > > > above commit message. Should I reword it to just 'see commit message'
> > > >     
> > > > >     
> > > > > > +     */
> > > > > > +     crs = aml_resource_template();
> > > > > > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> > > > > > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> > > > > > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> > > > > > +                VMGENID_VMGID_BUF_SIZE));
> > > > > > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > > > > > +     return dev;
> > > > > > +}
> > > > > > +
> > > > > >  /*
> > > > > >   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
> > > > > >   * On i386 arch we only have two pci hosts, so we can look only for them.
> > > > > > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > >              }
> > > > > >  
> > > > > >              if (bus) {
> > > > > > +                Object *vmgen;
> > > > > >                  Aml *scope = aml_scope("PCI0");
> > > > > >                  /* Scan all PCI buses. Generate tables to support hotplug. */
> > > > > >                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> > > > > > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > >                      aml_append(scope, dev);
> > > > > >                  }
> > > > > >  
> > > > > > +                vmgen = find_vmgneid_dev(NULL);
> > > > > > +                if (vmgen) {
> > > > > > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > > > > > +                    uint64_t buf_paddr =
> > > > > > +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> > > > > > +
> > > > > > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > > > > > +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> > > > > > +
> > > > > > +                        method = aml_method("\\_GPE._E00", 0,
> > > > > > +                                            AML_NOTSERIALIZED);
> > > > > > +                        aml_append(method,
> > > > > > +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > > > > > +                                       aml_int(0x80)));
> > > > > > +                        aml_append(ssdt, method);
> > > > > > +                    }
> > > > > > +                }
> > > > > > +
> > > > > >                  aml_append(sb_scope, scope);
> > > > > >              }
> > > > > >          }
> > > > > > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
> > > > > >      {
> > > > > >          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
> > > > > >  
> > > > > > -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> > > > > > -
> > > > > >          if (misc->is_piix4) {
> > > > > >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> > > > > >              aml_append(method,
> > > > > > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > > > > > index d4765c2..1f05edd 100644
> > > > > > --- a/hw/misc/Makefile.objs
> > > > > > +++ b/hw/misc/Makefile.objs
> > > > > > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> > > > > >  
> > > > > >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> > > > > >  obj-$(CONFIG_EDU) += edu.o
> > > > > > +obj-$(CONFIG_VMGENID) += vmgenid.o
> > > > > >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > > > > > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > > > > > new file mode 100644
> > > > > > index 0000000..a2fbdfc
> > > > > > --- /dev/null
> > > > > > +++ b/hw/misc/vmgenid.c
> > > > > > @@ -0,0 +1,154 @@
> > > > > > +/*
> > > > > > + *  Virtual Machine Generation ID Device
> > > > > > + *
> > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > + *
> > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > + *
> > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > + * See the COPYING file in the top-level directory.
> > > > > > + *
> > > > > > + */
> > > > > > +
> > > > > > +#include "hw/i386/pc.h"
> > > > > > +#include "hw/pci/pci.h"
> > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > +#include "hw/acpi/acpi.h"
> > > > > > +#include "qapi/visitor.h"
> > > > > > +
> > > > > > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> > > > > > +
> > > > > > +typedef struct VmGenIdState {
> > > > > > +    PCIDevice parent_obj;
> > > > > > +    MemoryRegion iomem;
> > > > > > +    union {
> > > > > > +        uint8_t guid[16];
> > > > > > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > > > > > +    };
> > > > > > +    bool guid_set;
> > > > > > +} VmGenIdState;
> > > > > > +
> > > > > > +Object *find_vmgneid_dev(Error **errp)
> > > > > > +{
> > > > > > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> > > > > > +    if (!obj) {
> > > > > > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > > > > > +    }
> > > > > > +    return obj;
> > > > > > +}
> > > > > > +
> > > > > > +static void vmgenid_update_guest(VmGenIdState *s)
> > > > > > +{
> > > > > > +    Object *acpi_obj;
> > > > > > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > > > > > +
> > > > > > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > > > > > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > > > > > +
> > > > > > +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> > > > > > +    if (acpi_obj) {
> > > > > > +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > > > > > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > > > > > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > > > > > +
> > > > > > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > > > > > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > > +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> > > > > > +{
> > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > +
> > > > > > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > > > > > +        error_setg(errp, "'%s." VMGENID_GUID
> > > > > > +                   "': Failed to parse GUID string: %s",
> > > > > > +                   object_get_typename(OBJECT(s)),
> > > > > > +                   value);
> > > > > > +        return;
> > > > > > +    }
> > > > > > +
> > > > > > +    s->guid_set = true;
> > > > > > +    vmgenid_update_guest(s);
> > > > > > +}
> > > > > > +
> > > > > > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> > > > > > +                                   const char *name, Error **errp)
> > > > > > +{
> > > > > > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > > > > > +
> > > > > > +    if (value == PCI_BAR_UNMAPPED) {
> > > > > > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> > > > > > +                   object_get_typename(OBJECT(obj)));
> > > > > > +        return;
> > > > > > +    }
> > > > > > +    visit_type_int(v, &value, name, errp);
> > > > > > +}
> > > > > > +
> > > > > > +static void vmgenid_initfn(Object *obj)
> > > > > > +{
> > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > +
> > > > > > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> > > > > > +                           &error_abort);
> > > > > > +
> > > > > > +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> > > > > > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > > > > > +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> > > > > > +}
> > > > > > +
> > > > > > +
> > > > > > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > > > > > +{
> > > > > > +    VmGenIdState *s = VMGENID(dev);
> > > > > > +    bool ambiguous = false;
> > > > > > +
> > > > > > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > > > > > +    if (ambiguous) {
> > > > > > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > > > > > +                         " device is permitted");
> > > > > > +        return;
> > > > > > +    }
> > > > > > +
> > > > > > +    if (!s->guid_set) {
> > > > > > +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> > > > > > +                   object_get_typename(OBJECT(s)));
> > > > > > +        return;
> > > > > > +    }
> > > > > > +
> > > > > > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > > > > > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > > > > > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > > > > > +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> > > > > > +        &s->iomem);
> > > > > > +    return;
> > > > > > +}
> > > > > > +
> > > > > > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > > > > > +{
> > > > > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > > > > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > > > > +
> > > > > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > > > > +    dc->hotpluggable = false;
> > > > > > +    k->realize = vmgenid_realize;
> > > > > > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > > > > > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > > > > > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > > > > > +}
> > > > > > +
> > > > > > +static const TypeInfo vmgenid_device_info = {
> > > > > > +    .name          = VMGENID_DEVICE,
> > > > > > +    .parent        = TYPE_PCI_DEVICE,
> > > > > > +    .instance_size = sizeof(VmGenIdState),
> > > > > > +    .instance_init = vmgenid_initfn,
> > > > > > +    .class_init    = vmgenid_class_init,
> > > > > > +};
> > > > > > +
> > > > > > +static void vmgenid_register_types(void)
> > > > > > +{
> > > > > > +    type_register_static(&vmgenid_device_info);
> > > > > > +}
> > > > > > +
> > > > > > +type_init(vmgenid_register_types)
> > > > > > diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> > > > > > new file mode 100644
> > > > > > index 0000000..b90882c
> > > > > > --- /dev/null
> > > > > > +++ b/include/hw/misc/vmgenid.h
> > > > > > @@ -0,0 +1,27 @@
> > > > > > +/*
> > > > > > + *  Virtual Machine Generation ID Device
> > > > > > + *
> > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > + *
> > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > + *
> > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > + * See the COPYING file in the top-level directory.
> > > > > > + *
> > > > > > + */
> > > > > > +
> > > > > > +#ifndef HW_MISC_VMGENID_H
> > > > > > +#define HW_MISC_VMGENID_H
> > > > > > +
> > > > > > +#include "qom/object.h"
> > > > > > +
> > > > > > +#define VMGENID_DEVICE           "vmgenid"
> > > > > > +#define VMGENID_GUID             "guid"
> > > > > > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > > > > > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > > > > > +#define VMGENID_VMGID_BUF_BAR    0
> > > > > > +
> > > > > > +Object *find_vmgneid_dev(Error **errp);
> > > > > > +
> > > > > > +#endif
> > > > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > > > index dedf277..f4c9d48 100644
> > > > > > --- a/include/hw/pci/pci.h
> > > > > > +++ b/include/hw/pci/pci.h
> > > > > > @@ -94,6 +94,7 @@
> > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> > > > > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > > > > > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> > > > > >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> > > > > >  
> > > > > >  #define FMT_PCIBUS                      PRIx64
> > > > > > -- 
> > > > > > 1.8.3.1      
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-02  9:59             ` Igor Mammedov
@ 2016-02-02 11:16               ` Michael S. Tsirkin
  2016-02-09 10:46                 ` Igor Mammedov
  0 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-02-02 11:16 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Xiao Guangrong, ehabkost, ghammer, qemu-devel, lcapitulino, lersek

On Tue, Feb 02, 2016 at 10:59:53AM +0100, Igor Mammedov wrote:
> On Sun, 31 Jan 2016 18:22:13 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Fri, Jan 29, 2016 at 12:13:59PM +0100, Igor Mammedov wrote:
> > > On Thu, 28 Jan 2016 14:59:25 +0200
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >   
> > > > On Thu, Jan 28, 2016 at 01:03:16PM +0100, Igor Mammedov wrote:  
> > > > > On Thu, 28 Jan 2016 13:13:04 +0200
> > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > >     
> > > > > > On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:    
> > > > > > > Based on Microsoft's specifications (paper can be
> > > > > > > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > > > > > > easily found by "Virtual Machine Generation ID" keywords),
> > > > > > > add a PCI device with corresponding description in
> > > > > > > SSDT ACPI table.
> > > > > > > 
> > > > > > > The GUID is set using "vmgenid.guid" property or
> > > > > > > a corresponding HMP/QMP command.
> > > > > > > 
> > > > > > > Example of using vmgenid device:
> > > > > > >  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > > > > > > 
> > > > > > > 'vmgenid' device initialization flow is as following:
> > > > > > >  1. vmgenid has RAM BAR registered with size of GUID buffer
> > > > > > >  2. BIOS initializes PCI devices and it maps BAR in PCI hole
> > > > > > >  3. BIOS reads ACPI tables from QEMU, at that moment tables
> > > > > > >     are generated with \_SB.VMGI.ADDR constant pointing to
> > > > > > >     GPA where BIOS's mapped vmgenid's BAR earlier
> > > > > > > 
> > > > > > > Note:
> > > > > > > This implementation uses PCI class 0x0500 code for vmgenid device,
> > > > > > > that is marked as NO_DRV in Windows's machine.inf.
> > > > > > > Testing various Windows versions showed that, OS
> > > > > > > doesn't touch nor checks for resource conflicts
> > > > > > > for such PCI devices.
> > > > > > > There was concern that during PCI rebalancing, OS
> > > > > > > could reprogram the BAR at other place, which would
> > > > > > > leave VGEN.ADDR pointing to the old (no more valid)
> > > > > > > address.
> > > > > > > However testing showed that Windows does rebalancing
> > > > > > > only for PCI device that have a driver attached
> > > > > > > and completely ignores NO_DRV class of devices.
> > > > > > > Which in turn creates a problem where OS could remap
> > > > > > > one of PCI devices(with driver) over BAR used by
> > > > > > > a driver-less PCI device.
> > > > > > > Statically declaring the used memory range as VGEN._CRS
> > > > > > > makes the OS honor the resource reservation, and the ignored
> > > > > > > BAR range is no longer touched during PCI rebalancing.
> > > > > > > 
> > > > > > > Signed-off-by: Gal Hammer <ghammer@redhat.com>
> > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>      
> > > > > > 
> > > > > > It's an interesting hack, but this needs some thought. BIOS has no idea
> > > > > > this BAR is special and can not be rebalanced, so it might put the BAR
> > > > > > in the middle of the range, in effect fragmenting it.    
> > > > > yep that's the only drawback in PCI approach.
> > > > >     
> > > > > > Really I think something like V12 just rewritten using the new APIs
> > > > > > (probably with something like build_append_named_dword that I suggested)
> > > > > > would be much a simpler way to implement this device, given
> > > > > > the weird API limitations.    
> > > > > We went over stating drawbacks of both approaches several times 
> > > > > and that's where I strongly disagree with using v12 AML patching
> > > > > approach for reasons stated in those discussions.    
> > > > 
> > > > Yes, IIRC you dislike the need to allocate an IO range to pass the address
> > > > to the host, and to have custom code to migrate the address.  
> > > allocating IO ports is fine by me but I'm against using bios_linker (ACPI)
> > > approach for task at hand,
> > > let me enumerate one more time the issues that make me dislike it so much
> > > (in order where most disliked ones go the first):
> > > 
> > > 1. over-engineered for the task at hand, 
> > >    for device to become initialized guest OS has to execute AML,
> > >    so init chain looks like:
> > >      QEMU -> BIOS (patch AML) -> OS (AML write buf address to IO port) ->
> > >          QEMU (update buf address)
> > >    it's hell to debug when something doesn't work right in this chain  
> > 
> > Well this is not very different from e.g. virtio.
> > If it's just AML that worries you, we could teach BIOS/EFI a new command
> > to give some addresses after linking back to QEMU. Would this address
> > this issue?
> it would make it marginally better (especially from tests pov)
> though it won't fix other issues.
> 
> > 
> > 
> > >    even if there isn't any memory corruption that incorrect AML patching
> > >    could introduce.
> > >    As result of complexity patches are hard to review since one has
> > >    to remember/relearn all details how bios_linker in QEMU and BIOS works,
> > >    hence chance of regression is very high.
> > >    Dynamically patched AML also introduces its own share of AML
> > >    code that has to deal with dynamic buff address value.
> > >    For an example:
> > >      "nvdimm acpi: add _CRS" https://patchwork.ozlabs.org/patch/566697/
> > >    27 liner patch could be just 5-6 lines if static (known in advance)
> > >    buffer address were used to declare static _CRS variable.  
> > 
> > Problem is with finding a fixed address, and fragmentation that this
> > causes.  Look at the mess we have with just allocating addresses for
> > RAM.  I think it's a mistake to add to this mess.  Either let's teach
> > management to specify an address map, or let guest allocate addresses
> > for us.
> 
> Yep, the problem here is the 'fixed' part, not an address in general.
> Allowing mgmt to specify an address map partially works. See pc-dimm:
> on the target side libvirt specifies the address where it has been mapped
> on the source. The initial RAM mess could be fixed for future machine types
> in a similar way by replacing memory_region_allocate_system_memory()
> with pc-dimms, so that a specific machine type could reproduce
> the same layout.

But this requires management to specify the complete memory map.
Otherwise we won't be able to change the layout, ever,
even when we change machine types.
Migration is different as addresses are queried on source.

Hmm, I did not realize someone might misuse this and
set address manually even without migration.
We need to find a way to prevent that before it's too late.
Eduardo - any ideas?


> But default addresses don't appear magically and have to
> come from somewhere, so we have to have an address allocator
> somewhere.

I feel that for advanced functionality like vm gen id,
we could require all addresses to be specified.

> If we put the allocator into the guest and emulate a memory controller
> in QEMU, we would probably need to add fw_cfg interfaces
> that describe the hardware which needs mapping (probably an ever
> growing interface, as it was with ACPI tables before
> we axed them in BIOS and moved them into QEMU).
> Alternatively we can put the allocator in QEMU, which could
> be simpler to implement and maintain since we won't need
> to implement extra fw_cfg interfaces and a memory controller
> and ship/fix the QEMU/BIOS pair in sync as it was with ACPI
> in the past.

So the linker interface solves this rather neatly:
BIOS allocates memory, BIOS passes the memory map to the guest.
It has served us well for several years without the need for extensions,
and it does solve the VM GEN ID problem, even though
1. it was never designed for huge areas like nvdimm seems to want to use
2. we might want to add a new 64 bit flag to avoid touching low memory


> > > 2. the ACPI approach consumes guest-usable RAM to allocate a buffer
> > >    and then makes the device DMA data into that RAM.
> > >    That's a design point I don't agree with.
> > 
> > Blame the broken VM GEN ID spec.
> > 
> > For VM GEN ID, instead of DMA we could make ACPI code read PCI BAR and
> > copy data over, this would fix rebalancing but there is a problem with
> > this approach: it can not be done atomically (while VM is not yet
> > running and accessing RAM).  So you can have guest read a partially
> > corrupted ID from memory.
> 
> Yep, the VM GEN ID spec is broken and we can't do anything about it, as
> it's absolutely impossible to guarantee an atomic update: the guest OS
> has the address of the buffer and can read it at any time. Nothing can be
> done here.

Hmm, I thought we could stop the VM while we are changing the ID.
But of course the VM could be accessing it at the same time.
So I take this back, ACPI code reading PCI BAR and
writing data out to the buffer would be fine
from this point of view.

> > 
> > And hey, nowadays we actually made fw_cfg do DMA too.
> 
> I'm not against DMA, it's the direction that seems wrong to me.
> What we need here is a way to allocate GPA range and make sure
> that QEMU maps device memory there, PCI BAR is one
> of the ways to do it.

OK fine, but returning the PCI BAR address to the guest is wrong.
How about reading it from ACPI then? Is it really
broken unless there's *also* a driver?


> > >    Just compare with a graphics card design, where on device memory
> > >    is mapped directly at some GPA not wasting RAM that guest could
> > >    use for other tasks.  
> > 
> > This might have been true 20 years ago.  Most modern cards do DMA.
> 
> Modern cards, with their own RAM, map their VRAM in the address space directly
> and allow users to use it (GEM API). So they do not waste conventional RAM.
> For example, NVIDIA VRAM is mapped as PCI BARs the same way as in this
> series (even the PCI class id is the same)

Don't know enough about graphics really, I'm not sure how these are
relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
mostly use guest RAM, not on card RAM.

> > >    The VMGENID and NVDIMM use-cases look exactly the same to me, i.e.
> > >    instead of consuming the guest's RAM they should be mapped at
> > >    some GPA and their memory accessed directly.
> > 
> > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > address. This breaks the straight-forward approach of using a
> > rebalanceable PCI BAR.
> 
> For PCI rebalancing to work on Windows, one has to provide a working PCI
> driver; otherwise the OS will ignore the device when rebalancing happens and
> might map something else over the ignored BAR.

Does it disable the BAR then? Or just move it elsewhere?

> > 
> > >    In that case NVDIMM could even map the whole label area and
> > >    significantly simplify the QEMU<->OSPM protocol that currently
> > >    serializes that data through a 4K page.
> > >    There is also a performance issue with a buffer allocated in RAM,
> > >    because DMA adds an unnecessary copying step when data could
> > >    be read/written directly from the NVDIMM.
> > >    It might be not very important for the _DSM interface but when it
> > >    comes to supporting block mode it can become an issue.
> > 
> > So for NVDIMM, presumably it will have code to access the PCI BAR properly,
> > so it's guaranteed to work across BAR rebalancing.
> > Would that address the performance issue?
> 
> it would if rebalancing were to account for driverless PCI device BARs,
> but it doesn't, hence such BARs need to be statically pinned
> at the place where the BIOS put them at startup.
> I'm also not sure that a PCIConfig operation region would work
> on Windows without a loaded driver (similar to the _DSM case).
> 
> 
> > > The above points make the ACPI patching approach not robust and fragile,
> > > and hard to maintain.
> > 
> > Wrt GEN ID these are all kind of subjective though.  I especially don't
> > get what appears to be your general dislike of the linker host/guest
> > interface.
> Besides the technical issues, the general dislike is just what I've written:
> the "not robust and fragile" bios_linker_loader_add_pointer() interface.
> 
> To make it less fragile:
>  1. it should be impossible to corrupt memory or patch the wrong address.
>     The current impl. silently relies on the value referenced by the 'pointer'
>     argument, and to figure that out one has to read the linker code on the
>     BIOS side. That could easily be set wrong and slip through review.

That's an API issue, it seemed like a good idea but I guess
it confuses people. Would you be happier using an offset
instead of a pointer?

>     The API shouldn't rely on the caller setting the value pointed to by that argument.

I couldn't parse that one. Care suggesting a cleaner API for linker?

>  2. If it's going to be used for patching AML, it should assert,
>     when bios_linker_loader_add_pointer() is called, if the to-be-patched
>     AML object is wrong and patching would corrupt the AML blob.

Hmm for example check that the patched data has
the expected pattern?

> 
> > It's there and we are not moving away from it, so why not
> > use it in more places?  Or if you think it's wrong, why don't you build
> > something better then?  We could then maybe use it for these things as
> > well.
> 
> Yep, I think for vmgenid and even more so for nvdimm
> it would be better to allocate GPAs in QEMU and map backing
> MemoryRegions directly in QEMU.
> For the nvdimm main data region
> we already do it using pc-dimm's GPA allocation algorithm; we could
> also use a similar approach for nvdimm's label area and vmgenid.
> 
> Here is a simple attempt to add a limited GPA allocator in high memory
>  https://patchwork.ozlabs.org/patch/540852/
> But it hasn't gotten any comments from you and was ignored.
> Let's consider it, and perhaps we could come up with a GPA allocator
> that could be used for other things as well.

For nvdimm label area, I agree passing things through
a 4K buffer seems inefficient.

I'm not sure what's a better way though.

Use 64 bit memory? Setting aside old guests such as XP,
does it break 32 bit guests?

I'm really afraid of adding yet another allocator; I think you
underestimate the maintenance headache: it's not theoretical and is
already felt.

> 
> > 
> > >   
> > > >   
> > > > > > And hey, if you want to use a pci device to pass the physical
> > > > > > address guest to host, instead of reserving
> > > > > > a couple of IO addresses, sure, stick it in pci config in
> > > > > > a vendor-specific capability, this way it'll get migrated
> > > > > > automatically.    
> > > > > Could you elaborate more on this suggestion?    
> > > > 
> > > > I really just mean using PCI_Config operation region.
> > > > If you wish, I'll try to post a prototype next week.  
> > > I don't know much about PCI but it would be interesting,
> > > perhaps we could use it somewhere else.
> > > 
> > > However it should be checked whether it works with Windows;
> > > for example, a PCI-specific _DSM method is ignored by it
> > > if the PCI device doesn't have a working PCI driver bound to it.
> > >   
> > > >   
> > > > > > 
> > > > > >     
> > > > > > > ---
> > > > > > > changes since 17:
> > > > > > >   - small fixups suggested in v14 review by Michael S. Tsirkin" <mst@redhat.com>
> > > > > > >   - make BAR prefetchable to make region cached as per MS spec
> > > > > > >   - s/uuid/guid/ to match spec
> > > > > > > changes since 14:
> > > > > > >   - reserve BAR resources so that Windows won't touch it
> > > > > > >     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > >   - ACPI: split VGEN device of PCI device descriptor
> > > > > > >     and place it at PCI0 scope, so that won't be need trace its
> > > > > > >     location on PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > >   - permit only one vmgenid to be created
> > > > > > >   - enable BAR be mapped above 4Gb if it can't be mapped at low mem
> > > > > > > ---
> > > > > > >  default-configs/i386-softmmu.mak   |   1 +
> > > > > > >  default-configs/x86_64-softmmu.mak |   1 +
> > > > > > >  docs/specs/pci-ids.txt             |   1 +
> > > > > > >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> > > > > > >  hw/misc/Makefile.objs              |   1 +
> > > > > > >  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
> > > > > > >  include/hw/misc/vmgenid.h          |  27 +++++++
> > > > > > >  include/hw/pci/pci.h               |   1 +
> > > > > > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > > > > > >  create mode 100644 hw/misc/vmgenid.c
> > > > > > >  create mode 100644 include/hw/misc/vmgenid.h
> > > > > > > 
> > > > > > > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > > > > > > index b177e52..6402439 100644
> > > > > > > --- a/default-configs/i386-softmmu.mak
> > > > > > > +++ b/default-configs/i386-softmmu.mak
> > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > >  CONFIG_IOAPIC=y
> > > > > > >  CONFIG_PVPANIC=y
> > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > +CONFIG_VMGENID=y
> > > > > > >  CONFIG_NVDIMM=y
> > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > >  CONFIG_XIO3130=y
> > > > > > > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > > > > > > index 6e3b312..fdac18f 100644
> > > > > > > --- a/default-configs/x86_64-softmmu.mak
> > > > > > > +++ b/default-configs/x86_64-softmmu.mak
> > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > >  CONFIG_IOAPIC=y
> > > > > > >  CONFIG_PVPANIC=y
> > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > +CONFIG_VMGENID=y
> > > > > > >  CONFIG_NVDIMM=y
> > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > >  CONFIG_XIO3130=y
> > > > > > > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > > > > > > index 0adcb89..e65ecf9 100644
> > > > > > > --- a/docs/specs/pci-ids.txt
> > > > > > > +++ b/docs/specs/pci-ids.txt
> > > > > > > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> > > > > > >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> > > > > > >  1b36:0006  PCI Rocker Ethernet switch device
> > > > > > >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > > > > > > +1b36:0009  PCI VM-Generation device
> > > > > > >  1b36:000a  PCI-PCI bridge (multiseat)
> > > > > > >  
> > > > > > >  All these devices are documented in docs/specs.
> > > > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > > > index 78758e2..0187262 100644
> > > > > > > --- a/hw/i386/acpi-build.c
> > > > > > > +++ b/hw/i386/acpi-build.c
> > > > > > > @@ -44,6 +44,7 @@
> > > > > > >  #include "hw/acpi/tpm.h"
> > > > > > >  #include "sysemu/tpm_backend.h"
> > > > > > >  #include "hw/timer/mc146818rtc_regs.h"
> > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > >  
> > > > > > >  /* Supported chipsets: */
> > > > > > >  #include "hw/acpi/piix4.h"
> > > > > > > @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> > > > > > >      info->applesmc_io_base = applesmc_port();
> > > > > > >  }
> > > > > > >  
> > > > > > > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > > > > > > +{
> > > > > > > +    Aml *dev, *pkg, *crs;
> > > > > > > +
> > > > > > > +    dev = aml_device("VGEN");
> > > > > > > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> > > > > > > +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> > > > > > > +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> > > > > > > +
> > > > > > > +    pkg = aml_package(2);
> > > > > > > +    /* low 32 bits of UUID buffer addr */
> > > > > > > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > > > > > > +    /* high 32 bits of UUID buffer addr */
> > > > > > > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > > > > > > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > > > > > > +
> > > > > > > +    /*
> > > > > > > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> > > > > > > +     * displays it as "PCI RAM controller" which is marked as NO_DRV
> > > > > > > +     * so Windows ignores VMGEN device completely and doesn't check
> > > > > > > +     * for resource conflicts which during PCI rebalancing can lead
> > > > > > > +     * to another PCI device claiming ignored BARs. To prevent this
> > > > > > > +     * statically reserve resources used by VM_Gen_Counter.
> > > > > > > +     * For more verbose comment see this commit message.      
> > > > > > 
> > > > > > What does "this commit message" mean?    
> > > > > above commit message. Should I reword it to just 'see commit message'
> > > > >     
> > > > > >     
> > > > > > > +     */
> > > > > > > +     crs = aml_resource_template();
> > > > > > > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> > > > > > > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> > > > > > > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> > > > > > > +                VMGENID_VMGID_BUF_SIZE));
> > > > > > > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > > > > > > +     return dev;
> > > > > > > +}
> > > > > > > +
> > > > > > >  /*
> > > > > > >   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
> > > > > > >   * On i386 arch we only have two pci hosts, so we can look only for them.
> > > > > > > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > >              }
> > > > > > >  
> > > > > > >              if (bus) {
> > > > > > > +                Object *vmgen;
> > > > > > >                  Aml *scope = aml_scope("PCI0");
> > > > > > >                  /* Scan all PCI buses. Generate tables to support hotplug. */
> > > > > > >                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> > > > > > > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > >                      aml_append(scope, dev);
> > > > > > >                  }
> > > > > > >  
> > > > > > > +                vmgen = find_vmgneid_dev(NULL);
> > > > > > > +                if (vmgen) {
> > > > > > > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > > > > > > +                    uint64_t buf_paddr =
> > > > > > > +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> > > > > > > +
> > > > > > > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > > > > > > +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> > > > > > > +
> > > > > > > +                        method = aml_method("\\_GPE._E00", 0,
> > > > > > > +                                            AML_NOTSERIALIZED);
> > > > > > > +                        aml_append(method,
> > > > > > > +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > > > > > > +                                       aml_int(0x80)));
> > > > > > > +                        aml_append(ssdt, method);
> > > > > > > +                    }
> > > > > > > +                }
> > > > > > > +
> > > > > > >                  aml_append(sb_scope, scope);
> > > > > > >              }
> > > > > > >          }
> > > > > > > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
> > > > > > >      {
> > > > > > >          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
> > > > > > >  
> > > > > > > -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> > > > > > > -
> > > > > > >          if (misc->is_piix4) {
> > > > > > >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> > > > > > >              aml_append(method,
> > > > > > > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > > > > > > index d4765c2..1f05edd 100644
> > > > > > > --- a/hw/misc/Makefile.objs
> > > > > > > +++ b/hw/misc/Makefile.objs
> > > > > > > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> > > > > > >  
> > > > > > >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> > > > > > >  obj-$(CONFIG_EDU) += edu.o
> > > > > > > +obj-$(CONFIG_VMGENID) += vmgenid.o
> > > > > > >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > > > > > > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > > > > > > new file mode 100644
> > > > > > > index 0000000..a2fbdfc
> > > > > > > --- /dev/null
> > > > > > > +++ b/hw/misc/vmgenid.c
> > > > > > > @@ -0,0 +1,154 @@
> > > > > > > +/*
> > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > + *
> > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > + *
> > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > + *
> > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > + *
> > > > > > > + */
> > > > > > > +
> > > > > > > +#include "hw/i386/pc.h"
> > > > > > > +#include "hw/pci/pci.h"
> > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > +#include "hw/acpi/acpi.h"
> > > > > > > +#include "qapi/visitor.h"
> > > > > > > +
> > > > > > > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> > > > > > > +
> > > > > > > +typedef struct VmGenIdState {
> > > > > > > +    PCIDevice parent_obj;
> > > > > > > +    MemoryRegion iomem;
> > > > > > > +    union {
> > > > > > > +        uint8_t guid[16];
> > > > > > > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > > > > > > +    };
> > > > > > > +    bool guid_set;
> > > > > > > +} VmGenIdState;
> > > > > > > +
> > > > > > > +Object *find_vmgneid_dev(Error **errp)
> > > > > > > +{
> > > > > > > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> > > > > > > +    if (!obj) {
> > > > > > > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > > > > > > +    }
> > > > > > > +    return obj;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void vmgenid_update_guest(VmGenIdState *s)
> > > > > > > +{
> > > > > > > +    Object *acpi_obj;
> > > > > > > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > > > > > > +
> > > > > > > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > > > > > > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > > > > > > +
> > > > > > > +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> > > > > > > +    if (acpi_obj) {
> > > > > > > +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > > > > > > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > > > > > > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > > > > > > +
> > > > > > > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > > > > > > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > > > > > > +    }
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> > > > > > > +{
> > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > +
> > > > > > > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID
> > > > > > > +                   "': Failed to parse GUID string: %s",
> > > > > > > +                   object_get_typename(OBJECT(s)),
> > > > > > > +                   value);
> > > > > > > +        return;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    s->guid_set = true;
> > > > > > > +    vmgenid_update_guest(s);
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> > > > > > > +                                   const char *name, Error **errp)
> > > > > > > +{
> > > > > > > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > > > > > > +
> > > > > > > +    if (value == PCI_BAR_UNMAPPED) {
> > > > > > > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> > > > > > > +                   object_get_typename(OBJECT(obj)));
> > > > > > > +        return;
> > > > > > > +    }
> > > > > > > +    visit_type_int(v, &value, name, errp);
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void vmgenid_initfn(Object *obj)
> > > > > > > +{
> > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > +
> > > > > > > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> > > > > > > +                           &error_abort);
> > > > > > > +
> > > > > > > +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> > > > > > > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > > > > > > +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> > > > > > > +}
> > > > > > > +
> > > > > > > +
> > > > > > > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > > > > > > +{
> > > > > > > +    VmGenIdState *s = VMGENID(dev);
> > > > > > > +    bool ambiguous = false;
> > > > > > > +
> > > > > > > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > > > > > > +    if (ambiguous) {
> > > > > > > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > > > > > > +                         " device is permitted");
> > > > > > > +        return;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    if (!s->guid_set) {
> > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> > > > > > > +                   object_get_typename(OBJECT(s)));
> > > > > > > +        return;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > > > > > > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > > > > > > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > > > > > > +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> > > > > > > +        &s->iomem);
> > > > > > > +    return;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > > > > > > +{
> > > > > > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > > > > > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > > > > > +
> > > > > > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > > > > > +    dc->hotpluggable = false;
> > > > > > > +    k->realize = vmgenid_realize;
> > > > > > > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > > > > > > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > > > > > > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static const TypeInfo vmgenid_device_info = {
> > > > > > > +    .name          = VMGENID_DEVICE,
> > > > > > > +    .parent        = TYPE_PCI_DEVICE,
> > > > > > > +    .instance_size = sizeof(VmGenIdState),
> > > > > > > +    .instance_init = vmgenid_initfn,
> > > > > > > +    .class_init    = vmgenid_class_init,
> > > > > > > +};
> > > > > > > +
> > > > > > > +static void vmgenid_register_types(void)
> > > > > > > +{
> > > > > > > +    type_register_static(&vmgenid_device_info);
> > > > > > > +}
> > > > > > > +
> > > > > > > +type_init(vmgenid_register_types)
> > > > > > > diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> > > > > > > new file mode 100644
> > > > > > > index 0000000..b90882c
> > > > > > > --- /dev/null
> > > > > > > +++ b/include/hw/misc/vmgenid.h
> > > > > > > @@ -0,0 +1,27 @@
> > > > > > > +/*
> > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > + *
> > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > + *
> > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > + *
> > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > + *
> > > > > > > + */
> > > > > > > +
> > > > > > > +#ifndef HW_MISC_VMGENID_H
> > > > > > > +#define HW_MISC_VMGENID_H
> > > > > > > +
> > > > > > > +#include "qom/object.h"
> > > > > > > +
> > > > > > > +#define VMGENID_DEVICE           "vmgenid"
> > > > > > > +#define VMGENID_GUID             "guid"
> > > > > > > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > > > > > > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > > > > > > +#define VMGENID_VMGID_BUF_BAR    0
> > > > > > > +
> > > > > > > +Object *find_vmgneid_dev(Error **errp);
> > > > > > > +
> > > > > > > +#endif
> > > > > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > > > > index dedf277..f4c9d48 100644
> > > > > > > --- a/include/hw/pci/pci.h
> > > > > > > +++ b/include/hw/pci/pci.h
> > > > > > > @@ -94,6 +94,7 @@
> > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> > > > > > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > > > > > > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> > > > > > >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> > > > > > >  
> > > > > > >  #define FMT_PCIBUS                      PRIx64
> > > > > > > -- 
> > > > > > > 1.8.3.1      
> > 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-01-28 17:00     ` Igor Mammedov
@ 2016-02-03 17:55       ` Eduardo Habkost
  2016-02-03 18:46         ` Laszlo Ersek
  2016-02-03 19:06         ` Michael S. Tsirkin
  0 siblings, 2 replies; 59+ messages in thread
From: Eduardo Habkost @ 2016-02-03 17:55 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: agraf, Peter Maydell, mst, ghammer, lersek, qemu-devel,
	lcapitulino, borntraeger, qemu-ppc, Gerd Hoffmann, cornelia.huck,
	pbonzini, rth, Andreas Färber, david

On Thu, Jan 28, 2016 at 06:00:31PM +0100, Igor Mammedov wrote:
[...]
> It looks like this series might go nowhere but this patch
> is not tied to it and useful to us in general
> so perhaps you could pick it up after ACKs from
> S390/SPAPR maintainers.
> 
> > 
> > Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

We don't have a maintainer for hw/core/machine.c, hw/core/qdev*,
and related files.

Assuming we don't have a volunteer to maintain them officially,
can we agree on a default destination for those patches so they
don't linger on the list? Michael? Andreas?

-- 
Eduardo


* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-03 17:55       ` [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly) Eduardo Habkost
@ 2016-02-03 18:46         ` Laszlo Ersek
  2016-02-03 19:06         ` Michael S. Tsirkin
  1 sibling, 0 replies; 59+ messages in thread
From: Laszlo Ersek @ 2016-02-03 18:46 UTC (permalink / raw)
  To: Eduardo Habkost, Igor Mammedov
  Cc: agraf, Peter Maydell, mst, ghammer, qemu-devel, lcapitulino,
	borntraeger, qemu-ppc, Gerd Hoffmann, cornelia.huck, pbonzini,
	rth, Andreas Färber, david

On 02/03/16 18:55, Eduardo Habkost wrote:
> On Thu, Jan 28, 2016 at 06:00:31PM +0100, Igor Mammedov wrote:
> [...]
>> It looks like this series might go nowhere but this patch
>> is not tied to it and useful to us in general
>> so perhaps you could pick it up after ACKs from
>> S390/SPAPR maintainers.
>>
>>>
>>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> 
> We don't have a maintainer for hw/core/machine.c, hw/core/qdev*,
> and related files.
> 
> Assuming we don't have a volunteer to maintain them officially,
> can we agree on a default destination for those patches so they
> don't linger on the list? Michael? Andreas?

Preferably someone who is otherwise not incessantly overloaded by
patches to review.

Just my two cents.

Laszlo


* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-03 17:55       ` [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly) Eduardo Habkost
  2016-02-03 18:46         ` Laszlo Ersek
@ 2016-02-03 19:06         ` Michael S. Tsirkin
  2016-02-04 11:31           ` Paolo Bonzini
  1 sibling, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-02-03 19:06 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: agraf, Peter Maydell, ghammer, qemu-devel, lcapitulino,
	borntraeger, qemu-ppc, Gerd Hoffmann, pbonzini, cornelia.huck,
	Igor Mammedov, rth, lersek, Andreas Färber, david

On Wed, Feb 03, 2016 at 03:55:04PM -0200, Eduardo Habkost wrote:
> On Thu, Jan 28, 2016 at 06:00:31PM +0100, Igor Mammedov wrote:
> [...]
> > It looks like this series might go nowhere but this patch
> > is not tied to it and useful to us in general
> > so perhaps you could pick it up after ACKs from
> > S390/SPAPR maintainers.
> > 
> > > 
> > > Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> 
> We don't have a maintainer for hw/core/machine.c, hw/core/qdev*,
> and related files.
> 
> Assuming we don't have a volunteer to maintain them officially,
> can we agree on a default destination for those patches so they
> don't linger on the list? Michael? Andreas?

Not me please. Have too much on my plate.
Would you like to maintain it yourself?

> -- 
> Eduardo


* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-03 19:06         ` Michael S. Tsirkin
@ 2016-02-04 11:31           ` Paolo Bonzini
  2016-02-04 11:41             ` Andreas Färber
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2016-02-04 11:31 UTC (permalink / raw)
  To: Michael S. Tsirkin, Eduardo Habkost
  Cc: agraf, Peter Maydell, ghammer, qemu-devel, lcapitulino,
	borntraeger, qemu-ppc, Gerd Hoffmann, cornelia.huck,
	Igor Mammedov, rth, lersek, Andreas Färber, david



On 03/02/2016 20:06, Michael S. Tsirkin wrote:
> On Wed, Feb 03, 2016 at 03:55:04PM -0200, Eduardo Habkost wrote:
>> On Thu, Jan 28, 2016 at 06:00:31PM +0100, Igor Mammedov wrote:
>> [...]
>>> It looks like this series might go nowhere but this patch
>>> is not tied to it and useful to us in general
>>> so perhaps you could pick it up after ACKs from
>>> S390/SPAPR maintainers.
>>>
>>>>
>>>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>>
>> We don't have a maintainer for hw/core/machine.c, hw/core/qdev*,
>> and related files.
>>
>> Assuming we don't have a volunteer to maintain them officially,
>> can we agree on a default destination for those patches so they
>> don't linger on the list? Michael? Andreas?
> 
> Not me please. Have too much on my plate.
> Would you like to maintain it yourself?

That's my suggestion too.  I guess Igor and I could help with reviews,
but testing and sending the pull requests would add too much work.
Since you're the main one touching it, it makes sense for you to handle it.

Paolo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-04 11:31           ` Paolo Bonzini
@ 2016-02-04 11:41             ` Andreas Färber
  2016-02-04 11:55               ` Paolo Bonzini
                                 ` (2 more replies)
  0 siblings, 3 replies; 59+ messages in thread
From: Andreas Färber @ 2016-02-04 11:41 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: agraf, Eduardo Habkost, Michael S. Tsirkin, ghammer, qemu-devel,
	lcapitulino, borntraeger, qemu-ppc, Gerd Hoffmann, cornelia.huck,
	Igor Mammedov, rth, lersek, Peter Maydell, Marcel Apfelbaum,
	david

Am 04.02.2016 um 12:31 schrieb Paolo Bonzini:
> On 03/02/2016 20:06, Michael S. Tsirkin wrote:
>> On Wed, Feb 03, 2016 at 03:55:04PM -0200, Eduardo Habkost wrote:
>>> On Thu, Jan 28, 2016 at 06:00:31PM +0100, Igor Mammedov wrote:
>>> [...]
>>>> It looks like this series might go nowhere but this patch
>>>> is not tied to it and useful to us in general
>>>> so perhaps you could pick it up after ACKs from
>>>> S390/SPAPR maintainers.
>>>>
>>>>>
>>>>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>>>
>>> We don't have a maintainer for hw/core/machine.c, hw/core/qdev*,
>>> and related files.
>>>
>>> Assuming we don't have a volunteer to maintain them officially,
>>> can we agree on a default destination for those patches so they
>>> don't linger on the list? Michael? Andreas?
>>
>> Not me please. Have too much on my plate.
>> Would you like to maintain it yourself?
> 
> That's my suggestion too.  I guess Igor and I could help with reviews,
> but testing and sending the pull requests would add too much work.
> Since you're the main one touching it, it makes sense for you to handle it.

You're talking about machine, right? Some time ago I had proposed Marcel
who initially worked on it, but I'm fine with anyone taking it.

For some (but not all) core qdev parts related to the (stalled) QOM
migration I've been taking care of via qom-next. Last time this came up
you didn't want anyone to be M: for qdev, so maybe we can use R: so that
at least people automatically get CC'ed and we avoid this recurring
discussion?

Regards,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-04 11:41             ` Andreas Färber
@ 2016-02-04 11:55               ` Paolo Bonzini
  2016-02-04 12:06                 ` Michael S. Tsirkin
  2016-02-05  7:52                 ` Markus Armbruster
  2016-02-04 12:03               ` Michael S. Tsirkin
  2016-02-04 12:12               ` Marcel Apfelbaum
  2 siblings, 2 replies; 59+ messages in thread
From: Paolo Bonzini @ 2016-02-04 11:55 UTC (permalink / raw)
  To: Andreas Färber
  Cc: agraf, Eduardo Habkost, Michael S. Tsirkin, ghammer, qemu-devel,
	lcapitulino, borntraeger, qemu-ppc, Gerd Hoffmann, cornelia.huck,
	Igor Mammedov, rth, lersek, Peter Maydell, Marcel Apfelbaum,
	david



On 04/02/2016 12:41, Andreas Färber wrote:
> You're talking about machine, right? Some time ago I had proposed Marcel
> who initially worked on it, but I'm fine with anyone taking it.

Yes.

> For some (but not all) core qdev parts related to the (stalled) QOM
> migration I've been taking care of via qom-next. Last time this came up
> you didn't want anyone to be M: for qdev, so maybe we can use R: so that
> at least people automatically get CC'ed and we avoid this recurring
> discussion?

I might have changed my mind on that.  You definitely should be M: for qdev.

Paolo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-04 11:41             ` Andreas Färber
  2016-02-04 11:55               ` Paolo Bonzini
@ 2016-02-04 12:03               ` Michael S. Tsirkin
  2016-02-04 12:12               ` Marcel Apfelbaum
  2 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-02-04 12:03 UTC (permalink / raw)
  To: Andreas Färber
  Cc: agraf, Eduardo Habkost, Peter Maydell, ghammer, qemu-devel,
	lcapitulino, borntraeger, qemu-ppc, Gerd Hoffmann, Igor Mammedov,
	cornelia.huck, Paolo Bonzini, rth, lersek, Marcel Apfelbaum,
	david

On Thu, Feb 04, 2016 at 12:41:39PM +0100, Andreas Färber wrote:
> Am 04.02.2016 um 12:31 schrieb Paolo Bonzini:
> > On 03/02/2016 20:06, Michael S. Tsirkin wrote:
> >> On Wed, Feb 03, 2016 at 03:55:04PM -0200, Eduardo Habkost wrote:
> >>> On Thu, Jan 28, 2016 at 06:00:31PM +0100, Igor Mammedov wrote:
> >>> [...]
> >>>> It looks like this series might go nowhere but this patch
> >>>> is not tied to it and useful to us in general
> >>>> so perhaps you could pick it up after ACKs from
> >>>> S390/SPAPR maintainers.
> >>>>
> >>>>>
> >>>>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> >>>
> >>> We don't have a maintainer for hw/core/machine.c, hw/core/qdev*,
> >>> and related files.
> >>>
> >>> Assuming we don't have a volunteer to maintain them officially,
> >>> can we agree on a default destination for those patches so they
> >>> don't linger on the list? Michael? Andreas?
> >>
> >> Not me please. Have too much on my plate.
> >> Would you like to maintain it yourself?
> > 
> > That's my suggestion too.  I guess Igor and I could help with reviews,
> > but testing and sending the pull requests would add too much work.
> > Since you're the main one touching it, it makes sense for you to handle it.
> 
> You're talking about machine, right? Some time ago I had proposed Marcel
> who initially worked on it, but I'm fine with anyone taking it.

Sure, Marcel can do it too.

> 
> For some (but not all) core qdev parts related to the (stalled) QOM
> migration I've been taking care of via qom-next. Last time this came up
> you didn't want anyone to be M: for qdev, so maybe we can use R: so that
> at least people automatically get CC'ed and we avoid this recurring
> discussion?
> 
> Regards,
> Andreas

I think that was because core was not changing much, so merging through
other trees was more appropriate, but that changed.

So it appears we already can have
M: Eduardo Habkost <ehabkost@redhat.com>
M: Marcel Apfelbaum <marcel@redhat.com>

This should spread the review load nicely.
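
For reference, the MAINTAINERS stanza being proposed would look roughly like this (the section name and the F: file patterns are my guess, taken from the paths mentioned earlier in the thread; only the two M: lines come from this mail):

```
Machine core
M: Eduardo Habkost <ehabkost@redhat.com>
M: Marcel Apfelbaum <marcel@redhat.com>
F: hw/core/machine.c
F: hw/core/qdev*
```

An R: line could be added the same way for reviewers who only want to be CC'ed automatically.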

> -- 
> SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-04 11:55               ` Paolo Bonzini
@ 2016-02-04 12:06                 ` Michael S. Tsirkin
  2016-02-05  7:49                   ` Markus Armbruster
  2016-02-05  7:52                 ` Markus Armbruster
  1 sibling, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-02-04 12:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: agraf, Eduardo Habkost, Peter Maydell, ghammer, qemu-devel,
	lcapitulino, borntraeger, qemu-ppc, Gerd Hoffmann, cornelia.huck,
	Igor Mammedov, rth, lersek, Andreas Färber,
	Marcel Apfelbaum, david

On Thu, Feb 04, 2016 at 12:55:22PM +0100, Paolo Bonzini wrote:
> 
> 
> On 04/02/2016 12:41, Andreas Färber wrote:
> > You're talking about machine, right? Some time ago I had proposed Marcel
> > who initially worked on it, but I'm fine with anyone taking it.
> 
> Yes.
> 
> > For some (but not all) core qdev parts related to the (stalled) QOM
> > migration I've been taking care of via qom-next. Last time this came up
> > you didn't want anyone to be M: for qdev, so maybe we can use R: so that
> > at least people automatically get CC'ed and we avoid this recurring
> > discussion?
> 
> I might have changed my mind on that.  You definitely should be M: for qdev.
> 
> Paolo

If Andreas wants to, that's also fine. Several maintainers are
better than one.

-- 
MST

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-04 11:41             ` Andreas Färber
  2016-02-04 11:55               ` Paolo Bonzini
  2016-02-04 12:03               ` Michael S. Tsirkin
@ 2016-02-04 12:12               ` Marcel Apfelbaum
  2 siblings, 0 replies; 59+ messages in thread
From: Marcel Apfelbaum @ 2016-02-04 12:12 UTC (permalink / raw)
  To: Andreas Färber, Paolo Bonzini
  Cc: agraf, Eduardo Habkost, Michael S. Tsirkin, ghammer, qemu-devel,
	lcapitulino, borntraeger, qemu-ppc, Gerd Hoffmann, cornelia.huck,
	Igor Mammedov, rth, lersek, Peter Maydell, david

On 02/04/2016 01:41 PM, Andreas Färber wrote:
> Am 04.02.2016 um 12:31 schrieb Paolo Bonzini:
>> On 03/02/2016 20:06, Michael S. Tsirkin wrote:
>>> On Wed, Feb 03, 2016 at 03:55:04PM -0200, Eduardo Habkost wrote:
>>>> On Thu, Jan 28, 2016 at 06:00:31PM +0100, Igor Mammedov wrote:
>>>> [...]
>>>>> It looks like this series might go nowhere but this patch
>>>>> is not tied to it and useful to us in general
>>>>> so perhaps you could pick it up after ACKs from
>>>>> S390/SPAPR maintainers.
>>>>>
>>>>>>
>>>>>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>>>>
>>>> We don't have a maintainer for hw/core/machine.c, hw/core/qdev*,
>>>> and related files.
>>>>
>>>> Assuming we don't have a volunteer to maintain them officially,
>>>> can we agree on a default destination for those patches so they
>>>> don't linger on the list? Michael? Andreas?
>>>
>>> Not me please. Have too much on my plate.
>>> Would you like to maintain it yourself?
>>
>> That's my suggestion too.  I guess Igor and I could help with reviews,
>> but testing and sending the pull requests would add too much work.
>> Since you're the main one touching it, it makes sense for you to handle it.
>
> You're talking about machine, right? Some time ago I had proposed Marcel
> who initially worked on it, but I'm fine with anyone taking it.

Hi,

As I previously said I can maintain the machine and the related code.
I'll gladly help Eduardo with reviews or have a tree ready for machine/qdev
and send pull requests if Peter agrees to it.

Eduardo, Peter what is your take on this?

Thanks,
Marcel


>
> For some (but not all) core qdev parts related to the (stalled) QOM
> migration I've been taking care of via qom-next. Last time this came up
> you didn't want anyone to be M: for qdev, so maybe we can use R: so that
> at least people automatically get CC'ed and we avoid this recurring
> discussion?
>
> Regards,
> Andreas
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-04 12:06                 ` Michael S. Tsirkin
@ 2016-02-05  7:49                   ` Markus Armbruster
  2016-02-05  7:51                     ` Marcel Apfelbaum
  0 siblings, 1 reply; 59+ messages in thread
From: Markus Armbruster @ 2016-02-05  7:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Eduardo Habkost, Marcel Apfelbaum, ghammer,
	qemu-devel, agraf, borntraeger, qemu-ppc, Gerd Hoffmann, david,
	Igor Mammedov, cornelia.huck, Paolo Bonzini, lcapitulino, lersek,
	Andreas Färber, rth

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Thu, Feb 04, 2016 at 12:55:22PM +0100, Paolo Bonzini wrote:
>> 
>> 
>> On 04/02/2016 12:41, Andreas Färber wrote:
>> > You're talking about machine, right? Some time ago I had proposed Marcel
>> > who initially worked on it, but I'm fine with anyone taking it.
>> 
>> Yes.
>> 
>> > For some (but not all) core qdev parts related to the (stalled) QOM
>> > migration I've been taking care of via qom-next. Last time this came up
>> > you didn't want anyone to be M: for qdev, so maybe we can use R: so that
>> > at least people automatically get CC'ed and we avoid this recurring
>> > discussion?
>> 
>> I might have changed my mind on that.  You definitely should be M: for qdev.
>> 
>> Paolo
>
> If Andreas wants to, that's also fine. Several maintainers are
> better than one.

*If* the maintainers are all willing and able to work together.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-05  7:49                   ` Markus Armbruster
@ 2016-02-05  7:51                     ` Marcel Apfelbaum
  2016-02-11 19:41                       ` Eduardo Habkost
  0 siblings, 1 reply; 59+ messages in thread
From: Marcel Apfelbaum @ 2016-02-05  7:51 UTC (permalink / raw)
  To: Markus Armbruster, Michael S. Tsirkin
  Cc: Peter Maydell, Eduardo Habkost, ghammer, qemu-devel, agraf,
	borntraeger, qemu-ppc, Gerd Hoffmann, david, Igor Mammedov,
	cornelia.huck, Paolo Bonzini, lcapitulino, lersek,
	Andreas Färber, rth

On 02/05/2016 09:49 AM, Markus Armbruster wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
>> On Thu, Feb 04, 2016 at 12:55:22PM +0100, Paolo Bonzini wrote:
>>>
>>>
>>> On 04/02/2016 12:41, Andreas Färber wrote:
>>>> You're talking about machine, right? Some time ago I had proposed Marcel
>>>> who initially worked on it, but I'm fine with anyone taking it.
>>>
>>> Yes.
>>>
>>>> For some (but not all) core qdev parts related to the (stalled) QOM
>>>> migration I've been taking care of via qom-next. Last time this came up
>>>> you didn't want anyone to be M: for qdev, so maybe we can use R: so that
>>>> at least people automatically get CC'ed and we avoid this recurring
>>>> discussion?
>>>
>>> I might have changed my mind on that.  You definitely should be M: for qdev.
>>>
>>> Paolo
>>
>> If Andreas wants to, that's also fine. Several maintainers are
>> better than one.
>
> *If* the maintainers are all willing and able to work together.
>

No problem here from my point of view :)

Thanks,
Marcel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-04 11:55               ` Paolo Bonzini
  2016-02-04 12:06                 ` Michael S. Tsirkin
@ 2016-02-05  7:52                 ` Markus Armbruster
  1 sibling, 0 replies; 59+ messages in thread
From: Markus Armbruster @ 2016-02-05  7:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Eduardo Habkost, Michael S. Tsirkin, ghammer,
	qemu-devel, agraf, borntraeger, qemu-ppc, Gerd Hoffmann, david,
	cornelia.huck, Igor Mammedov, lcapitulino, lersek,
	Andreas Färber, Marcel Apfelbaum, rth

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 04/02/2016 12:41, Andreas Färber wrote:
>> You're talking about machine, right? Some time ago I had proposed Marcel
>> who initially worked on it, but I'm fine with anyone taking it.
>
> Yes.
>
>> For some (but not all) core qdev parts related to the (stalled) QOM
>> migration I've been taking care of via qom-next. Last time this came up
>> you didn't want anyone to be M: for qdev, so maybe we can use R: so that
>> at least people automatically get CC'ed and we avoid this recurring
>> discussion?
>
> I might have changed my mind on that.  You definitely should be M: for qdev.

Yes.  Would you like co-maintainers for just qdev, for QOM+qdev, or
simply add qdev to your QOM portfolio?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-02 11:16               ` Michael S. Tsirkin
@ 2016-02-09 10:46                 ` Igor Mammedov
  2016-02-09 12:17                   ` Michael S. Tsirkin
  2016-02-10  8:51                   ` Michael S. Tsirkin
  0 siblings, 2 replies; 59+ messages in thread
From: Igor Mammedov @ 2016-02-09 10:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Xiao Guangrong, ehabkost, ghammer, qemu-devel, lcapitulino, lersek

On Tue, 2 Feb 2016 13:16:38 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Feb 02, 2016 at 10:59:53AM +0100, Igor Mammedov wrote:
> > On Sun, 31 Jan 2016 18:22:13 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Fri, Jan 29, 2016 at 12:13:59PM +0100, Igor Mammedov wrote:  
> > > > On Thu, 28 Jan 2016 14:59:25 +0200
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > >     
> > > > > On Thu, Jan 28, 2016 at 01:03:16PM +0100, Igor Mammedov wrote:    
> > > > > > On Thu, 28 Jan 2016 13:13:04 +0200
> > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > >       
> > > > > > > On Thu, Jan 28, 2016 at 11:54:25AM +0100, Igor Mammedov wrote:      
> > > > > > > > Based on Microsoft's specifications (paper can be
> > > > > > > > downloaded from http://go.microsoft.com/fwlink/?LinkId=260709,
> > > > > > > > easily found by "Virtual Machine Generation ID" keywords),
> > > > > > > > add a PCI device with corresponding description in
> > > > > > > > SSDT ACPI table.
> > > > > > > > 
> > > > > > > > The GUID is set using "vmgenid.guid" property or
> > > > > > > > a corresponding HMP/QMP command.
> > > > > > > > 
> > > > > > > > Example of using vmgenid device:
> > > > > > > >  -device vmgenid,id=FOO,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> > > > > > > > 
> > > > > > > > 'vmgenid' device initialization flow is as following:
> > > > > > > >  1. vmgenid has RAM BAR registered with size of GUID buffer
> > > > > > > >  2. BIOS initializes PCI devices and it maps BAR in PCI hole
> > > > > > > >  3. BIOS reads ACPI tables from QEMU, at that moment tables
> > > > > > > >     are generated with \_SB.VMGI.ADDR constant pointing to
> > > > > > > >     GPA where BIOS's mapped vmgenid's BAR earlier
> > > > > > > > 
> > > > > > > > Note:
> > > > > > > > This implementation uses PCI class 0x0500 code for vmgenid device,
> > > > > > > > that is marked as NO_DRV in Windows's machine.inf.
> > > > > > > > Testing various Windows versions showed that, OS
> > > > > > > > doesn't touch nor checks for resource conflicts
> > > > > > > > for such PCI devices.
> > > > > > > > There was concern that during PCI rebalancing, OS
> > > > > > > > could reprogram the BAR at other place, which would
> > > > > > > > leave VGEN.ADDR pointing to the old (no more valid)
> > > > > > > > address.
> > > > > > > > However testing showed that Windows does rebalancing
> > > > > > > > only for PCI device that have a driver attached
> > > > > > > > and completely ignores NO_DRV class of devices.
> > > > > > > > Which in turn creates a problem where OS could remap
> > > > > > > > one of PCI devices(with driver) over BAR used by
> > > > > > > > a driver-less PCI device.
> > > > > > > > Statically declaring used memory range as VGEN._CRS
> > > > > > > > makes OS to honor resource reservation and an ignored
> > > > > > > > BAR range is not longer touched during PCI rebalancing.
> > > > > > > > 
> > > > > > > > Signed-off-by: Gal Hammer <ghammer@redhat.com>
> > > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>        
> > > > > > > 
> > > > > > > It's an interesting hack, but this needs some thought. BIOS has no idea
> > > > > > > this BAR is special and can not be rebalanced, so it might put the BAR
> > > > > > > in the middle of the range, in effect fragmenting it.      
> > > > > > yep that's the only drawback in PCI approach.
> > > > > >       
> > > > > > > Really I think something like V12 just rewritten using the new APIs
> > > > > > > (probably with something like build_append_named_dword that I suggested)
> > > > > > > would be much a simpler way to implement this device, given
> > > > > > > the weird API limitations.      
> > > > > > We went over stating drawbacks of both approaches several times 
> > > > > > and that's where I strongly disagree with using v12 AML patching
> > > > > > approach for reasons stated in those discussions.      
> > > > > 
> > > > > Yes, IIRC you dislike the need to allocate an IO range to pass address
> > > > > to host, and to have custom code to migrate the address.
> > > > allocating IO ports is fine by me but I'm against using bios_linker (ACPI)
> > > > approach for task at hand,
> > > > let me enumerate one more time the issues that make me dislike it so much
> > > > (in order where most disliked ones go the first):
> > > > 
> > > > 1. over-engineered for the task at hand, 
> > > >    for device to become initialized guest OS has to execute AML,
> > > >    so init chain looks like:
> > > >      QEMU -> BIOS (patch AML) -> OS (AML write buf address to IO port) ->
> > > >          QEMU (update buf address)
> > > >    it's hell to debug when something doesn't work right in this chain    
> > > 
> > > Well this is not very different from e.g. virtio.
> > > If it's just AML that worries you, we could teach BIOS/EFI a new command
> > > to give some addresses after linking back to QEMU. Would this address
> > > this issue?  
> > it would make it marginally better (especially from tests pov)
> > though it won't fix other issues.
> >   
> > > 
> > >   
> > > >    even if there isn't any memory corruption that incorrect AML patching
> > > >    could introduce.
> > > >    As result of complexity patches are hard to review since one has
> > > >    to remember/relearn all details how bios_linker in QEMU and BIOS works,
> > > >    hence chance of regression is very high.
> > > >    Dynamically patched AML also introduces its own share of AML
> > > >    code that has to deal with dynamic buff address value.
> > > >    For an example:
> > > >      "nvdimm acpi: add _CRS" https://patchwork.ozlabs.org/patch/566697/
> > > >    27 liner patch could be just 5-6 lines if static (known in advance)
> > > >    buffer address were used to declare static _CRS variable.    
> > > 
> > > Problem is with finding a fixed address, and fragmentation that this
> > > causes.  Look at the mess we have with just allocating addresses for
> > > RAM.  I think it's a mistake to add to this mess.  Either let's teach
> > > management to specify an address map, or let guest allocate addresses
> > > for us.  
> > 
> > Yep, problem here is in 'fixed' part but not in an address in general.
> > Allowing Mgmt to specify address map partially works. See pc-dimm,
> > on target side libvirt specifies address where it has been mapped
> > on source. Initial RAM mess could be fixed for future machines types
> > in similar way by replacing memory_region_allocate_system_memory()
> > with pc-dimms, so that specific machine type could reproduce
> > the same layout.  
> 
> But this requires management to specify the complete memory map.
for a clean start-up I'd say it's impossible for mgmt to define
a complete or even a partial memory map; it just doesn't have
knowledge of the machine internals. So GPAs should be allocated
by QEMU (currently they are mostly hand-picked fixed ones)

> Otherwise we won't be able to change the layout, ever,
> even when we change machine types.
we won't be able to change the layout for old machines, but if
we make the layout configurable, then new machine types, probably
with auto-allocated GPAs, could be more flexible wrt layout
(but those are just vague thoughts for now; maybe some day
I'll try to implement it)

> Migration is different as addresses are queried on source.
> 
> Hmm, I did not realize someone might misuse this and
> set address manually even without migration.
> We need to find a way to prevent that before it's too late.
> Eduardo - any ideas?
it's too late for that, and it doesn't really matter, as pc-dimm.addr is
confined within the hotplug range.


> > But default addresses doesn't appear magically and have to
> > come from somewhere, so we have to have an address allocator
> > somewhere.  
> 
> I feel for advanced functionality like vm gen id,
> we could require all addresses to be specified.
Any suggestions on how mgmt would pick this address?


> > If we put allocator into guest and emulate memory controller
> > in QEMU, we probably would need to add fw_cfg interfaces
> > that describe hardware which needs mapping (probably ever
> > growing interface like it has been with ACPI tables, before
> > we axed them in BIOS and moved them into QEMU).
> > Alternatively we can put allocator in QEMU which could
> > be simpler to implement and maintain since we won't need
> > to implement extra fw_cfg interfaces and memory controller
> > and ship/fix QEMU/BIOS pair in sync as it has been with ACPI
> > in past.  
> 
> So the linker interface solves this rather neatly:
> bios allocates memory, bios passes memory map to guest.
> Served us well for several years without need for extensions,
> and it does solve the VM GEN ID problem, even though
> 1. it was never designed for huge areas like nvdimm seems to want to use
> 2. we might want to add a new 64 bit flag to avoid touching low memory
the linker interface is fine for some read-only data, like ACPI tables,
especially fixed tables; not so for AML ones, if one wants to patch them.

However, now that you want to use it for other purposes, you start
adding extensions and other guest->QEMU channels to communicate
patching info back.
It also steals guest memory, which is not nice and doesn't scale well.

> 
> 
> > > > 2. ACPI approach consumes guest usable RAM to allocate buffer
> > > >    and then makes device to DMA data in that RAM.
> > > >    That's a design point I don't agree with.    
> > > 
> > > Blame the broken VM GEN ID spec.
> > > 
> > > For VM GEN ID, instead of DMA we could make ACPI code read PCI BAR and
> > > copy data over, this would fix rebalancing but there is a problem with
> > > this approach: it can not be done atomically (while VM is not yet
> > > running and accessing RAM).  So you can have guest read a partially
> > > corrupted ID from memory.  
> > 
> > Yep, VM GEN ID spec is broken and we can't do anything about it as
> > it's absolutely impossible to guaranty atomic update as guest OS
> > has address of the buffer and can read it any time. Nothing could be
> > done here.  
> 
> Hmm I thought we can stop VM while we are changing the ID.
> But of course VM could be accessing it at the same time.
> So I take this back, ACPI code reading PCI BAR and
> writing data out to the buffer would be fine
> from this point of view.
the spec is broken regardless of who reads the BAR or RAM, as the read
isn't atomic and the UUID could be updated in the middle.
So MS will have to live with that, unless they have a secret
way to tell the guest to stop reading from that ADDR, do the update and
then signal the guest that it can read it again.

> 
> > > 
> > > And hey, nowdays we actually made fw cfg do DMA too.  
> > 
> > I'm not against DMA, it's direction which seems wrong to me.
> > What we need here is a way to allocate GPA range and make sure
> > that QEMU maps device memory there, PCI BAR is one
> > of the ways to do it.  
> 
> OK fine, but returning PCI BAR address to guest is wrong.
> How about reading it from ACPI then? Is it really
> broken unless there's *also* a driver?
I don't get the question; the MS spec requires an address (the ADDR method),
and it's read by ACPI (AML).
As for a PCI_Config OpRegion working without a driver, I haven't tried,
but I wouldn't be surprised if it doesn't, taking into account that
the MS-introduced _DSM doesn't.

> 
> 
> > > >    Just compare with a graphics card design, where on device memory
> > > >    is mapped directly at some GPA not wasting RAM that guest could
> > > >    use for other tasks.    
> > > 
> > > This might have been true 20 years ago.  Most modern cards do DMA.  
> > 
> > Modern cards, with it's own RAM, map its VRAM in address space directly
> > and allow users use it (GEM API). So they do not waste conventional RAM.
> > For example NVIDIA VRAM is mapped as PCI BARs the same way like in this
> > series (even PCI class id is the same)  
> 
> Don't know enough about graphics really, I'm not sure how these are
> relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> mostly use guest RAM, not on card RAM.
> 
> > > >    VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> > > >    instead of consuming guest's RAM they should be mapped at
> > > >    some GPA and their memory accessed directly.    
> > > 
> > > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > > address. This breaks the straight-forward approach of using a
> > > rebalanceable PCI BAR.  
> > 
> > For PCI rebalance to work on Windows, one has to provide working PCI driver
> > otherwise OS will ignore it when rebalancing happens and
> > might map something else over ignored BAR.  
> 
> Does it disable the BAR then? Or just move it elsewhere?
it doesn't, it just blindly ignores the BAR's existence and maps a BAR of
another device (one with a driver) over it.

> 
> > >   
> > > >    In that case NVDIMM could even map whole label area and
> > > >    significantly simplify QEMU<->OSPM protocol that currently
> > > >    serializes that data through a 4K page.
> > > >    There is also performance issue with buffer allocated in RAM,
> > > >    because DMA adds unnecessary copying step when data could
> > > >    be read/written directly of NVDIMM.
> > > >    It might be no very important for _DSM interface but when it
> > > >    comes to supporting block mode it can become an issue.    
> > > 
> > > So for NVDIMM, presumably it will have code access PCI BAR properly, so
> > > it's guaranteed to work across BAR rebalancing.
> > > Would that address the performance issue?  
> > 
> > it would if rebalancing were to account for driverless PCI device BARs,
> > but it doesn't hence such BARs need to be statically pinned
> > at place where BIOS put them at start up.
> > I'm also not sure that PCIConfig operation region would work
> > on Windows without loaded driver (similar to _DSM case).
> > 
> >   
> > > > Above points make ACPI patching approach not robust and fragile
> > > > and hard to maintain.    
> > > 
> > > Wrt GEN ID these are all kind of subjective though.  I especially don't
> > > get what appears your general dislike of the linker host/guest
> > > interface.  
> > Besides technical issues general dislike is just what I've written
> > "not robust and fragile" bios_linker_loader_add_pointer() interface.
> > 
> > to make it less fragile:
> >  1. it should be impossible to corrupt memory or patch wrong address.
> >     current impl. silently relies on value referenced by 'pointer' argument
> >     and to figure that out one has to read linker code on BIOS side.
> >     That could be easily set wrong and slip through review.  
> 
> That's an API issue, it seemed like a good idea but I guess
> it confuses people. Would you be happier using an offset
> instead of a pointer?
an offset is better, and it would be better still if it said
which offset it is (i.e. relative to what)

> 
> >     API shouldn't rely on the caller setting value pointed by that argument.  
> 
> I couldn't parse that one. Care suggesting a cleaner API for linker?
here is the current API signature:

bios_linker_loader_add_pointer(GArray *linker,
                                    const char *dest_file,
                                    const char *src_file,
                                    GArray *table, void *pointer,
                                    uint8_t pointer_size)

issue 1:
'pointer' is a real pointer pointing inside 'table', and the API
calculates the offset under the hood:
  offset = (gchar *)pointer - table->data;
and puts it in the ADD_POINTER command.

It's easy to get a wrong offset if 'pointer' is not from 'table'.
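
To make the hazard concrete, here is a hedged sketch (invented names and
types for illustration, not the actual QEMU code) contrasting the
pointer-derived offset with an explicit offset that the callee can check:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: the current API derives the ADD_POINTER
 * offset from a raw pointer, so a pointer that is not actually inside
 * 'table' silently yields a nonsense offset.  An offset-based variant
 * states the intent directly and lets the callee bounds-check it. */
typedef struct { uint8_t *data; size_t len; } TableBlob;

/* pointer-style: nothing stops 'pointer' coming from another blob */
static ptrdiff_t offset_from_pointer(const TableBlob *table,
                                     const void *pointer)
{
    return (const uint8_t *)pointer - table->data;
}

/* offset-style: the caller passes the offset and it can be validated */
static int offset_is_valid(const TableBlob *table, size_t offset,
                           size_t patch_size)
{
    return offset <= table->len && patch_size <= table->len - offset;
}
```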

issue 2:
'pointer' points at another offset of size 'pointer_size' within the
'table' blob. That means whoever composes the blob has to be aware of
it and fill in the correct value there, which one can only do right
after looking inside the SeaBIOS part of the linker interface.
That is easy to forget, and then one has to deal with the mess caused
by random memory corruption.

bios_linker_loader_add_pointer() and the corresponding ADD_POINTER
command should take this second offset as an argument and not require
'table' to be pre-filled with it; or, in the worst case, if extending
the ADD_POINTER command is problematic, bios_linker_loader_add_pointer()
should still take the second offset and patch 'table' itself, so that
the 'table' composer doesn't have to worry about it.
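
A rough sketch of the signature shape argued for here (hypothetical
names and semantics, purely to illustrate; not a proposed patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: both the location of the pointer field inside
 * the destination blob (dest_offset) and the offset it should resolve
 * to in the source file (src_offset) are explicit arguments.  The
 * function seeds the field itself, little-endian, so the blob composer
 * never pre-fills it by hand, and it refuses out-of-bounds patches. */
typedef struct { uint8_t *data; size_t len; } LinkerBlob;

static int linker_add_pointer(LinkerBlob *dest, size_t dest_offset,
                              uint8_t pointer_size, uint64_t src_offset)
{
    if ((pointer_size != 1 && pointer_size != 2 &&
         pointer_size != 4 && pointer_size != 8) ||
        dest_offset > dest->len || pointer_size > dest->len - dest_offset) {
        return -1;                       /* abort early, never corrupt */
    }
    for (unsigned i = 0; i < pointer_size; i++) {
        dest->data[dest_offset + i] = (uint8_t)(src_offset >> (8 * i));
    }
    /* ...a real implementation would then emit the ADD_POINTER command
     * carrying the same two offsets... */
    return 0;
}
```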

issue 3:
all patching obviously needs bounds checking on the QEMU side, so that
it aborts early rather than corrupting memory.

> 
> >  2. If it's going to be used for patching AML, it should assert
> >     when bios_linker_loader_add_pointer() is called if to be patched
> >     AML object is wrong and patching would corrupt AML blob.  
> 
> Hmm for example check that the patched data has
> the expected pattern?
yep, nothing can be done for raw tables, but that should be possible
for AML tables; if the pattern is unsupported or the size doesn't
match, QEMU should abort early instead of corrupting the table.
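
As a sketch of what such a check could look like for one AML pattern
(a DWordConst operand, encoded in AML as a 0x0C prefix byte followed by
a 32-bit little-endian value; the helper name is invented, not QEMU
code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: before patching a 32-bit value inside an AML
 * blob, verify that the byte before the target really is the AML
 * DWordPrefix (0x0C), i.e. the target is a DWordConst operand.  If the
 * pattern or the size doesn't match, refuse instead of corrupting the
 * table; a caller could then abort QEMU early. */
enum { AML_DWORD_PREFIX = 0x0C };

static int aml_patch_dword(uint8_t *aml, size_t aml_len,
                           size_t offset, uint32_t value)
{
    if (offset < 1 || offset + 4 > aml_len) {
        return -1;                          /* out of bounds */
    }
    if (aml[offset - 1] != AML_DWORD_PREFIX) {
        return -1;                          /* not a DWordConst operand */
    }
    for (int i = 0; i < 4; i++) {
        aml[offset + i] = (uint8_t)(value >> (8 * i));  /* little-endian */
    }
    return 0;
}
```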

> 
> >   
> > > It's there and we are not moving away from it, so why not
> > > use it in more places?  Or if you think it's wrong, why don't you build
> > > something better then?  We could then maybe use it for these things as
> > > well.  
> > 
> > Yep, I think for vmgenid and even more so for nvdimm
> > it would be better to allocate GPAs in QEMU and map backing
> > MemoryRegions directly in QEMU.
> > For nvdimm (the main data region) we already do it using pc-dimm's
> > GPA allocation algorithm; we could use a similar approach for
> > nvdimm's label area and for vmgenid.
> > 
> > Here is a simple attempt to add a limited GPA allocator in high memory
> >  https://patchwork.ozlabs.org/patch/540852/
> > But it hasn't received any comments from you and was ignored.
> > Let's consider it; perhaps we could come up with a GPA allocator
> > that could be used for other things as well.  
> 
> For nvdimm label area, I agree passing things through
> a 4K buffer seems inefficient.
> 
> I'm not sure what's a better way though.
> 
> Use 64 bit memory? Setting aside old guests such as XP,
> does it break 32 bit guests?
it might not work with 32-bit guests, the same way memory hotplug
doesn't work for them unless they are PAE-enabled.
But that's a limitation of the implementation, and considering that
the nvdimm storage area is mapped at a 64-bit GPA, it doesn't matter.
 
> I'm really afraid of adding yet another allocator, I think you
> underestimate the maintainance headache: it's not theoretical and is
> already felt.
The current maintenance headache is due to the fixed, hand-picked
memory layout. We can't do much about it for legacy machine types,
but with a QEMU-side GPA allocator we can try to switch to a flexible
memory layout that allocates GPAs depending on the QEMU config in a
stable manner.
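
For illustration only, a minimal bump-style GPA allocator over a
high-memory window might look like this (invented names; a real
allocator would also need stable ordering across hotplug and migration,
which is the hard part):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of a minimal GPA allocator: a bump allocator
 * over a fixed high-memory window that hands out aligned,
 * non-overlapping ranges in a deterministic order, so the same config
 * always yields the same layout. */
typedef struct {
    uint64_t base;   /* start of the window, e.g. just above 4G */
    uint64_t limit;  /* first address past the window */
    uint64_t next;   /* next free address */
} GpaWindow;

/* 'align' must be a power of two; returns 0 on failure */
static uint64_t gpa_alloc(GpaWindow *w, uint64_t size, uint64_t align)
{
    uint64_t start = (w->next + align - 1) & ~(align - 1);

    if (size == 0 || start + size > w->limit) {
        return 0;                /* window exhausted or bad request */
    }
    w->next = start + size;
    return start;
}
```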

Well, there is a maintenance headache with bios_linker as well, due to
its complexity (multiple layers of indirection), and it will grow as
more places try to use it.
Yes, we could use it as a hack, stealing RAM and trying to implement
backwards DMA, or we could be less afraid and consider yet another
allocator that does the job without hacks, which should benefit QEMU
in the long run (it might not be easy to implement right, but if we
don't even try we will be buried in complex hacks that 'work' for now).

> > >   
> > > >     
> > > > >     
> > > > > > > And hey, if you want to use a pci device to pass the physical
> > > > > > > address guest to host, instead of reserving
> > > > > > > a couple of IO addresses, sure, stick it in pci config in
> > > > > > > a vendor-specific capability, this way it'll get migrated
> > > > > > > automatically.      
> > > > > > Could you elaborate more on this suggestion?      
> > > > > 
> > > > > I really just mean using PCI_Config operation region.
> > > > > If you wish, I'll try to post a prototype next week.    
> > > > I don't know much about PCI but it would be interesting,
> > > > perhaps we could use it somewhere else.
> > > > 
> > > > However it should be checked if it works with Windows,
> > > > for example PCI specific _DSM method is ignored by it
> > > > if PCI device doesn't have working PCI driver bound to it.
> > > >     
> > > > >     
> > > > > > > 
> > > > > > >       
> > > > > > > > ---
> > > > > > > > changes since 17:
> > > > > > > > >   - small fixups suggested in v14 review by "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > >   - make BAR prefetchable to make region cached as per MS spec
> > > > > > > >   - s/uuid/guid/ to match spec
> > > > > > > > changes since 14:
> > > > > > > >   - reserve BAR resources so that Windows won't touch it
> > > > > > > >     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > >   - ACPI: split the VGEN device off the PCI device descriptor
> > > > > > > > >     and place it at PCI0 scope, so that there is no need to trace its
> > > > > > > > >     location on PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > >   - permit only one vmgenid to be created
> > > > > > > >   - enable BAR be mapped above 4Gb if it can't be mapped at low mem
> > > > > > > > ---
> > > > > > > >  default-configs/i386-softmmu.mak   |   1 +
> > > > > > > >  default-configs/x86_64-softmmu.mak |   1 +
> > > > > > > >  docs/specs/pci-ids.txt             |   1 +
> > > > > > > >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> > > > > > > >  hw/misc/Makefile.objs              |   1 +
> > > > > > > >  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
> > > > > > > >  include/hw/misc/vmgenid.h          |  27 +++++++
> > > > > > > >  include/hw/pci/pci.h               |   1 +
> > > > > > > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > > > > > > >  create mode 100644 hw/misc/vmgenid.c
> > > > > > > >  create mode 100644 include/hw/misc/vmgenid.h
> > > > > > > > 
> > > > > > > > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > > > > > > > index b177e52..6402439 100644
> > > > > > > > --- a/default-configs/i386-softmmu.mak
> > > > > > > > +++ b/default-configs/i386-softmmu.mak
> > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > +CONFIG_VMGENID=y
> > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > > > > > > > index 6e3b312..fdac18f 100644
> > > > > > > > --- a/default-configs/x86_64-softmmu.mak
> > > > > > > > +++ b/default-configs/x86_64-softmmu.mak
> > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > +CONFIG_VMGENID=y
> > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > > > > > > > index 0adcb89..e65ecf9 100644
> > > > > > > > --- a/docs/specs/pci-ids.txt
> > > > > > > > +++ b/docs/specs/pci-ids.txt
> > > > > > > > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> > > > > > > >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> > > > > > > >  1b36:0006  PCI Rocker Ethernet switch device
> > > > > > > >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > > > > > > > +1b36:0009  PCI VM-Generation device
> > > > > > > >  1b36:000a  PCI-PCI bridge (multiseat)
> > > > > > > >  
> > > > > > > >  All these devices are documented in docs/specs.
> > > > > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > > > > index 78758e2..0187262 100644
> > > > > > > > --- a/hw/i386/acpi-build.c
> > > > > > > > +++ b/hw/i386/acpi-build.c
> > > > > > > > @@ -44,6 +44,7 @@
> > > > > > > >  #include "hw/acpi/tpm.h"
> > > > > > > >  #include "sysemu/tpm_backend.h"
> > > > > > > >  #include "hw/timer/mc146818rtc_regs.h"
> > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > >  
> > > > > > > >  /* Supported chipsets: */
> > > > > > > >  #include "hw/acpi/piix4.h"
> > > > > > > > @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> > > > > > > >      info->applesmc_io_base = applesmc_port();
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > > > > > > > +{
> > > > > > > > +    Aml *dev, *pkg, *crs;
> > > > > > > > +
> > > > > > > > +    dev = aml_device("VGEN");
> > > > > > > > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> > > > > > > > +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> > > > > > > > +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> > > > > > > > +
> > > > > > > > +    pkg = aml_package(2);
> > > > > > > > +    /* low 32 bits of UUID buffer addr */
> > > > > > > > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > > > > > > > +    /* high 32 bits of UUID buffer addr */
> > > > > > > > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > > > > > > > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > > > > > > > +
> > > > > > > > +    /*
> > > > > > > > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> > > > > > > > +     * displays it as "PCI RAM controller" which is marked as NO_DRV
> > > > > > > > +     * so Windows ignores VMGEN device completely and doesn't check
> > > > > > > > +     * for resource conflicts which during PCI rebalancing can lead
> > > > > > > > +     * to another PCI device claiming ignored BARs. To prevent this
> > > > > > > > +     * statically reserve resources used by VM_Gen_Counter.
> > > > > > > > +     * For more verbose comment see this commit message.        
> > > > > > > 
> > > > > > > What does "this commit message" mean?      
> > > > > > above commit message. Should I reword it to just 'see commit message'?
> > > > > >       
> > > > > > >       
> > > > > > > > +     */
> > > > > > > > +     crs = aml_resource_template();
> > > > > > > > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> > > > > > > > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> > > > > > > > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> > > > > > > > +                VMGENID_VMGID_BUF_SIZE));
> > > > > > > > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > > > > > > > +     return dev;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /*
> > > > > > > >   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
> > > > > > > >   * On i386 arch we only have two pci hosts, so we can look only for them.
> > > > > > > > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > >              }
> > > > > > > >  
> > > > > > > >              if (bus) {
> > > > > > > > +                Object *vmgen;
> > > > > > > >                  Aml *scope = aml_scope("PCI0");
> > > > > > > >                  /* Scan all PCI buses. Generate tables to support hotplug. */
> > > > > > > >                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> > > > > > > > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > >                      aml_append(scope, dev);
> > > > > > > >                  }
> > > > > > > >  
> > > > > > > > +                vmgen = find_vmgneid_dev(NULL);
> > > > > > > > +                if (vmgen) {
> > > > > > > > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > > > > > > > +                    uint64_t buf_paddr =
> > > > > > > > +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> > > > > > > > +
> > > > > > > > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > > > > > > > +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> > > > > > > > +
> > > > > > > > +                        method = aml_method("\\_GPE._E00", 0,
> > > > > > > > +                                            AML_NOTSERIALIZED);
> > > > > > > > +                        aml_append(method,
> > > > > > > > +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > > > > > > > +                                       aml_int(0x80)));
> > > > > > > > +                        aml_append(ssdt, method);
> > > > > > > > +                    }
> > > > > > > > +                }
> > > > > > > > +
> > > > > > > >                  aml_append(sb_scope, scope);
> > > > > > > >              }
> > > > > > > >          }
> > > > > > > > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
> > > > > > > >      {
> > > > > > > >          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
> > > > > > > >  
> > > > > > > > -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> > > > > > > > -
> > > > > > > >          if (misc->is_piix4) {
> > > > > > > >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> > > > > > > >              aml_append(method,
> > > > > > > > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > > > > > > > index d4765c2..1f05edd 100644
> > > > > > > > --- a/hw/misc/Makefile.objs
> > > > > > > > +++ b/hw/misc/Makefile.objs
> > > > > > > > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> > > > > > > >  
> > > > > > > >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> > > > > > > >  obj-$(CONFIG_EDU) += edu.o
> > > > > > > > +obj-$(CONFIG_VMGENID) += vmgenid.o
> > > > > > > >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > > > > > > > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > > > > > > > new file mode 100644
> > > > > > > > index 0000000..a2fbdfc
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/hw/misc/vmgenid.c
> > > > > > > > @@ -0,0 +1,154 @@
> > > > > > > > +/*
> > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > + *
> > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > + *
> > > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > > + *
> > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > + *
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +#include "hw/i386/pc.h"
> > > > > > > > +#include "hw/pci/pci.h"
> > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > +#include "hw/acpi/acpi.h"
> > > > > > > > +#include "qapi/visitor.h"
> > > > > > > > +
> > > > > > > > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> > > > > > > > +
> > > > > > > > +typedef struct VmGenIdState {
> > > > > > > > +    PCIDevice parent_obj;
> > > > > > > > +    MemoryRegion iomem;
> > > > > > > > +    union {
> > > > > > > > +        uint8_t guid[16];
> > > > > > > > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > > > > > > > +    };
> > > > > > > > +    bool guid_set;
> > > > > > > > +} VmGenIdState;
> > > > > > > > +
> > > > > > > > +Object *find_vmgneid_dev(Error **errp)
> > > > > > > > +{
> > > > > > > > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> > > > > > > > +    if (!obj) {
> > > > > > > > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > > > > > > > +    }
> > > > > > > > +    return obj;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vmgenid_update_guest(VmGenIdState *s)
> > > > > > > > +{
> > > > > > > > +    Object *acpi_obj;
> > > > > > > > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > > > > > > > +
> > > > > > > > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > > > > > > > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > > > > > > > +
> > > > > > > > +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> > > > > > > > +    if (acpi_obj) {
> > > > > > > > +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > > > > > > > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > > > > > > > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > > > > > > > +
> > > > > > > > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > > > > > > > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > > > > > > > +    }
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> > > > > > > > +{
> > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > +
> > > > > > > > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID
> > > > > > > > +                   "': Failed to parse GUID string: %s",
> > > > > > > > +                   object_get_typename(OBJECT(s)),
> > > > > > > > +                   value);
> > > > > > > > +        return;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > > +    s->guid_set = true;
> > > > > > > > +    vmgenid_update_guest(s);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> > > > > > > > +                                   const char *name, Error **errp)
> > > > > > > > +{
> > > > > > > > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > > > > > > > +
> > > > > > > > +    if (value == PCI_BAR_UNMAPPED) {
> > > > > > > > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> > > > > > > > +                   object_get_typename(OBJECT(obj)));
> > > > > > > > +        return;
> > > > > > > > +    }
> > > > > > > > +    visit_type_int(v, &value, name, errp);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vmgenid_initfn(Object *obj)
> > > > > > > > +{
> > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > +
> > > > > > > > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> > > > > > > > +                           &error_abort);
> > > > > > > > +
> > > > > > > > +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> > > > > > > > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > > > > > > > +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > > > > > > > +{
> > > > > > > > +    VmGenIdState *s = VMGENID(dev);
> > > > > > > > +    bool ambiguous = false;
> > > > > > > > +
> > > > > > > > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > > > > > > > +    if (ambiguous) {
> > > > > > > > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > > > > > > > +                         " device is permitted");
> > > > > > > > +        return;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > > +    if (!s->guid_set) {
> > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> > > > > > > > +                   object_get_typename(OBJECT(s)));
> > > > > > > > +        return;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > > > > > > > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > > > > > > > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > > > > > > > +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> > > > > > > > +        &s->iomem);
> > > > > > > > +    return;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > > > > > > > +{
> > > > > > > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > > > > > > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > > > > > > +
> > > > > > > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > > > > > > +    dc->hotpluggable = false;
> > > > > > > > +    k->realize = vmgenid_realize;
> > > > > > > > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > > > > > > > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > > > > > > > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static const TypeInfo vmgenid_device_info = {
> > > > > > > > +    .name          = VMGENID_DEVICE,
> > > > > > > > +    .parent        = TYPE_PCI_DEVICE,
> > > > > > > > +    .instance_size = sizeof(VmGenIdState),
> > > > > > > > +    .instance_init = vmgenid_initfn,
> > > > > > > > +    .class_init    = vmgenid_class_init,
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > +static void vmgenid_register_types(void)
> > > > > > > > +{
> > > > > > > > +    type_register_static(&vmgenid_device_info);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +type_init(vmgenid_register_types)
> > > > > > > > diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> > > > > > > > new file mode 100644
> > > > > > > > index 0000000..b90882c
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/include/hw/misc/vmgenid.h
> > > > > > > > @@ -0,0 +1,27 @@
> > > > > > > > +/*
> > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > + *
> > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > + *
> > > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > > + *
> > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > + *
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +#ifndef HW_MISC_VMGENID_H
> > > > > > > > +#define HW_MISC_VMGENID_H
> > > > > > > > +
> > > > > > > > +#include "qom/object.h"
> > > > > > > > +
> > > > > > > > +#define VMGENID_DEVICE           "vmgenid"
> > > > > > > > +#define VMGENID_GUID             "guid"
> > > > > > > > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > > > > > > > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > > > > > > > +#define VMGENID_VMGID_BUF_BAR    0
> > > > > > > > +
> > > > > > > > +Object *find_vmgneid_dev(Error **errp);
> > > > > > > > +
> > > > > > > > +#endif
> > > > > > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > > > > > index dedf277..f4c9d48 100644
> > > > > > > > --- a/include/hw/pci/pci.h
> > > > > > > > +++ b/include/hw/pci/pci.h
> > > > > > > > @@ -94,6 +94,7 @@
> > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> > > > > > > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > > > > > > > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> > > > > > > >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> > > > > > > >  
> > > > > > > >  #define FMT_PCIBUS                      PRIx64
> > > > > > > > -- 
> > > > > > > > 1.8.3.1        
> > >   


* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-09 10:46                 ` Igor Mammedov
@ 2016-02-09 12:17                   ` Michael S. Tsirkin
  2016-02-11 15:16                     ` Igor Mammedov
  2016-02-10  8:51                   ` Michael S. Tsirkin
  1 sibling, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-02-09 12:17 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Xiao Guangrong, ehabkost, ghammer, qemu-devel, lcapitulino, lersek

On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> > So the linker interface solves this rather neatly:
> > bios allocates memory, bios passes memory map to guest.
> > Served us well for several years without need for extensions,
> > and it does solve the VM GEN ID problem, even though
> > 1. it was never designed for huge areas like nvdimm seems to want to use
> > 2. we might want to add a new 64 bit flag to avoid touching low memory
> the linker interface is fine for some read-only data, like ACPI
> tables, especially fixed tables; not so for AML ones if one wants to
> patch them.
> 
> However, now that you want to use it for other purposes, you start
> adding extensions and other guest->QEMU channels to communicate
> patching info back.
> It also steals the guest's memory, which is not nice and doesn't
> scale well.

This is an argument I don't get: memory is memory. Call it guest memory
or a RAM-backed PCI BAR - same thing. MMIO is cheaper of course
but much slower.

...

> > OK fine, but returning PCI BAR address to guest is wrong.
> > How about reading it from ACPI then? Is it really
> > broken unless there's *also* a driver?
> I don't get the question; the MS spec requires the address (the ADDR
> method), and it's read by ACPI (AML).

You were unhappy about DMA into guest memory.
As a replacement for DMA, we could have AML read from
e.g. PCI and write into RAM.
This way we don't need to pass the address to QEMU.

> As for a PCI_Config OpRegion working without a driver, I haven't
> tried it, but I wouldn't be surprised if it doesn't work, taking into
> account that the MS-introduced _DSM doesn't.
> 
> > 
> > 
> > > > >    Just compare with a graphics card design, where on device memory
> > > > >    is mapped directly at some GPA not wasting RAM that guest could
> > > > >    use for other tasks.    
> > > > 
> > > > This might have been true 20 years ago.  Most modern cards do DMA.  
> > > 
> > > Modern cards with their own RAM map their VRAM into the address
> > > space directly and allow users to use it (the GEM API). So they do
> > > not waste conventional RAM. For example, NVIDIA VRAM is mapped as
> > > PCI BARs the same way as in this series (even the PCI class id is
> > > the same).  
> > 
> > Don't know enough about graphics really, I'm not sure how these are
> > relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > mostly use guest RAM, not on card RAM.
> > 
> > > > >    VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> > > > >    instead of consuming guest's RAM they should be mapped at
> > > > >    some GPA and their memory accessed directly.    
> > > > 
> > > > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > > > address. This breaks the straight-forward approach of using a
> > > > rebalanceable PCI BAR.  
> > > 
> > > For PCI rebalance to work on Windows, one has to provide a working
> > > PCI driver; otherwise the OS will ignore the device when rebalancing
> > > happens and might map something else over the ignored BAR.  
> > 
> > Does it disable the BAR then? Or just move it elsewhere?
> it doesn't; it just blindly ignores the BAR's existence and maps a
> BAR of another device (one with a driver) over it.

Interesting. On classical PCI this is a forbidden configuration.
Maybe we do something that confuses Windows?
Could you tell me how to reproduce this behaviour?

> > 
> > > >   
> > > > >    In that case NVDIMM could even map whole label area and
> > > > >    significantly simplify QEMU<->OSPM protocol that currently
> > > > >    serializes that data through a 4K page.
> > > > >    There is also performance issue with buffer allocated in RAM,
> > > > >    because DMA adds unnecessary copying step when data could
> > > > >    be read/written directly of NVDIMM.
> > > > >    It might be no very important for _DSM interface but when it
> > > > >    comes to supporting block mode it can become an issue.    
> > > > 
> > > > So for NVDIMM, presumably it will have code access PCI BAR properly, so
> > > > it's guaranteed to work across BAR rebalancing.
> > > > Would that address the performance issue?  
> > > 
> > > it would if rebalancing were to account for driverless PCI device BARs,
> > > but it doesn't hence such BARs need to be statically pinned
> > > at place where BIOS put them at start up.
> > > I'm also not sure that PCIConfig operation region would work
> > > on Windows without loaded driver (similar to _DSM case).
> > > 
> > >   
> > > > > Above points make ACPI patching approach not robust and fragile
> > > > > and hard to maintain.    
> > > > 
> > > > Wrt GEN ID these are all kind of subjective though.  I especially don't
> > > > get what appears your general dislike of the linker host/guest
> > > > interface.  
> > > Besides technical issues general dislike is just what I've written
> > > "not robust and fragile" bios_linker_loader_add_pointer() interface.
> > > 
> > > to make it less fragile:
> > >  1. it should be impossible to corrupt memory or patch wrong address.
> > >     current impl. silently relies on value referenced by 'pointer' argument
> > >     and to figure that out one has to read linker code on BIOS side.
> > >     That could be easily set wrong and slip through review.  
> > 
> > That's an API issue, it seemed like a good idea but I guess
> > it confuses people. Would you be happier using an offset
> > instead of a pointer?
> offset is better and it would be better if it were saying
> which offset it is (i.e. relative to what)


Start of table, right?

> > 
> > >     API shouldn't rely on the caller setting value pointed by that argument.  
> > 
> > I couldn't parse that one. Care suggesting a cleaner API for linker?
> here is current API signature:
> 
> bios_linker_loader_add_pointer(GArray *linker,
>                                     const char *dest_file,
>                                     const char *src_file,
>                                     GArray *table, void *pointer,
>                                     uint8_t pointer_size)
> 
> issue 1: 
> where 'pointer' is a real pointer pointing inside 'table' and API
> calculates offset underhood:
>   offset = (gchar *)pointer - table->data;
> and puts it in ADD_POINTER command.
> 
> it's easy to get wrong offset if 'pointer' is not from 'table'.

OK, replace that with table_offset?

> issue 2:
> 'pointer' points to another offset of size 'pointer_size' in 'table'
> blob, that means that whoever composes blob, has to aware of
> it and fill correct value there which is possible to do right
> if one looks inside of SeaBIOS part of linker interface.
> Which is easy to forget and then one has to deal with mess
> caused by random memory corruption.
> 
> bios_linker_loader_add_pointer() and corresponding
> ADD_POINTER command should take this second offset as argument
> and do no require 'table' be pre-filled with it or
> in worst case if of extending ADD_POINTER command is problematic
> bios_linker_loader_add_pointer() should still take
> the second offset and patch 'table' itself so that 'table' composer
> don't have to worry about it.

This one I don't understand. What's the second pointer you
are talking about?

> issue 3:
> all patching obviously needs bounds checking on QEMU side
> so it would abort early if it could corrupt memory.

That's easy.

> > 
> > >  2. If it's going to be used for patching AML, it should assert
> > >     when bios_linker_loader_add_pointer() is called if to be patched
> > >     AML object is wrong and patching would corrupt AML blob.  
> > 
> > Hmm for example check that the patched data has
> > the expected pattern?
> yep, nothing could be done for raw tables but that should be possible
> for AML tables and if pattern is unsupported/size doesn't match
> it should abort QEMU early instead of corrupting table.

The above all sounds reasonable. Would you like to take a stab at it,
or would you prefer me to?

> > 
> > >   
> > > > It's there and we are not moving away from it, so why not
> > > > use it in more places?  Or if you think it's wrong, why don't you build
> > > > something better then?  We could then maybe use it for these things as
> > > > well.  
> > > 
> > > Yep, I think for vmgenid and even more so for nvdimm
> > > it would be better to allocate GPAs in QEMU and map backing
> > > MemoryRegions directly in QEMU.
> > > For nvdimm's main data region
> > > we already do it using pc-dimm's GPA allocation algorithm; we
> > > could use a similar approach for nvdimm's label area and vmgenid.
> > > 
> > > Here is a simple attempt to add a limited GPA allocator in high memory
> > >  https://patchwork.ozlabs.org/patch/540852/
> > > But it hasn't received any comments from you and was ignored.
> > > Let's consider it; perhaps we could come up with a GPA allocator
> > > that could be used for other things as well.  
> > 
> > For nvdimm label area, I agree passing things through
> > a 4K buffer seems inefficient.
> > 
> > I'm not sure what's a better way though.
> > 
> > Use 64 bit memory? Setting aside old guests such as XP,
> > does it break 32 bit guests?
> It might not work with 32-bit guests, the same way memory hotplug
> doesn't work for them unless they are PAE-enabled.

Right, I mean with PAE.

> But that's a limitation of the implementation, and considering that
> the nvdimm storage area is mapped at a 64-bit GPA, it doesn't matter.
>  
> > I'm really afraid of adding yet another allocator; I think you
> > underestimate the maintenance headache: it's not theoretical and is
> > already felt.
> The current maintenance headache is due to the fixed, hand-picked
> memory layout. We can't do much about it for legacy machine
> types, but with a QEMU-side GPA allocator we can try to switch
> to a flexible memory layout that allocates GPAs
> depending on the QEMU config in a stable manner.

So far, we haven't managed to. It seems to go in the reverse
direction, where we add more and more control to let management
influence the layout. Things like alignment requirements
also tend to surface later and wreak havoc on whatever
we do.

> Well, there is a maintenance headache with bios_linker as well,
> due to its complexity (multiple layers of indirection), and
> it will grow as more places try to use it.
> Yes, we could use it as a hack, stealing RAM and trying to implement
> backwards DMA, or we could be less afraid and consider
> yet another allocator that does the job without hacks,
> which should benefit QEMU in the long run (it might not be easy
> to implement right, but if we don't even try we will be buried
> in complex hacks that 'work' for now).
> 
> > > >   
> > > > >     
> > > > > >     
> > > > > > > > And hey, if you want to use a pci device to pass the physical
> > > > > > > > address guest to host, instead of reserving
> > > > > > > > a couple of IO addresses, sure, stick it in pci config in
> > > > > > > > a vendor-specific capability, this way it'll get migrated
> > > > > > > > automatically.      
> > > > > > > Could you elaborate more on this suggestion?      
> > > > > > 
> > > > > > I really just mean using PCI_Config operation region.
> > > > > > If you wish, I'll try to post a prototype next week.    
> > > > > I don't know much about PCI, but it would be interesting;
> > > > > perhaps we could use it somewhere else.
> > > > > 
> > > > > However, it should be checked whether it works with Windows;
> > > > > for example, the PCI-specific _DSM method is ignored by it
> > > > > if the PCI device doesn't have a working PCI driver bound to it.
> > > > >     
> > > > > >     
> > > > > > > > 
> > > > > > > >       
> > > > > > > > > ---
> > > > > > > > > changes since 17:
> > > > > > > > > > >   - small fixups suggested in v14 review by "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > >   - make BAR prefetchable to make region cached as per MS spec
> > > > > > > > >   - s/uuid/guid/ to match spec
> > > > > > > > > changes since 14:
> > > > > > > > > > >   - reserve BAR resources so that Windows won't touch them
> > > > > > > > > > >     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > > >   - ACPI: split the VGEN device off the PCI device descriptor
> > > > > > > > > > >     and place it at PCI0 scope, so there is no need to trace its
> > > > > > > > > > >     location on PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > > >   - permit only one vmgenid to be created
> > > > > > > > > > >   - enable the BAR to be mapped above 4GB if it can't be mapped in low mem
> > > > > > > > > ---
> > > > > > > > >  default-configs/i386-softmmu.mak   |   1 +
> > > > > > > > >  default-configs/x86_64-softmmu.mak |   1 +
> > > > > > > > >  docs/specs/pci-ids.txt             |   1 +
> > > > > > > > >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> > > > > > > > >  hw/misc/Makefile.objs              |   1 +
> > > > > > > > >  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
> > > > > > > > >  include/hw/misc/vmgenid.h          |  27 +++++++
> > > > > > > > >  include/hw/pci/pci.h               |   1 +
> > > > > > > > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > > > > > > > >  create mode 100644 hw/misc/vmgenid.c
> > > > > > > > >  create mode 100644 include/hw/misc/vmgenid.h
> > > > > > > > > 
> > > > > > > > > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > > > > > > > > index b177e52..6402439 100644
> > > > > > > > > --- a/default-configs/i386-softmmu.mak
> > > > > > > > > +++ b/default-configs/i386-softmmu.mak
> > > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > > +CONFIG_VMGENID=y
> > > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > > > > > > > > index 6e3b312..fdac18f 100644
> > > > > > > > > --- a/default-configs/x86_64-softmmu.mak
> > > > > > > > > +++ b/default-configs/x86_64-softmmu.mak
> > > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > > +CONFIG_VMGENID=y
> > > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > > > > > > > > index 0adcb89..e65ecf9 100644
> > > > > > > > > --- a/docs/specs/pci-ids.txt
> > > > > > > > > +++ b/docs/specs/pci-ids.txt
> > > > > > > > > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> > > > > > > > >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> > > > > > > > >  1b36:0006  PCI Rocker Ethernet switch device
> > > > > > > > >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > > > > > > > > +1b36:0009  PCI VM-Generation device
> > > > > > > > >  1b36:000a  PCI-PCI bridge (multiseat)
> > > > > > > > >  
> > > > > > > > >  All these devices are documented in docs/specs.
> > > > > > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > > > > > index 78758e2..0187262 100644
> > > > > > > > > --- a/hw/i386/acpi-build.c
> > > > > > > > > +++ b/hw/i386/acpi-build.c
> > > > > > > > > @@ -44,6 +44,7 @@
> > > > > > > > >  #include "hw/acpi/tpm.h"
> > > > > > > > >  #include "sysemu/tpm_backend.h"
> > > > > > > > >  #include "hw/timer/mc146818rtc_regs.h"
> > > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > >  
> > > > > > > > >  /* Supported chipsets: */
> > > > > > > > >  #include "hw/acpi/piix4.h"
> > > > > > > > > @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> > > > > > > > >      info->applesmc_io_base = applesmc_port();
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > > > > > > > > +{
> > > > > > > > > +    Aml *dev, *pkg, *crs;
> > > > > > > > > +
> > > > > > > > > +    dev = aml_device("VGEN");
> > > > > > > > > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> > > > > > > > > +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> > > > > > > > > +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> > > > > > > > > +
> > > > > > > > > +    pkg = aml_package(2);
> > > > > > > > > +    /* low 32 bits of UUID buffer addr */
> > > > > > > > > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > > > > > > > > +    /* high 32 bits of UUID buffer addr */
> > > > > > > > > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > > > > > > > > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > > > > > > > > +
> > > > > > > > > +    /*
> > > > > > > > > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> > > > > > > > > +     * displays it as "PCI RAM controller" which is marked as NO_DRV
> > > > > > > > > +     * so Windows ignores VMGEN device completely and doesn't check
> > > > > > > > > +     * for resource conflicts which during PCI rebalancing can lead
> > > > > > > > > +     * to another PCI device claiming ignored BARs. To prevent this
> > > > > > > > > +     * statically reserve resources used by VM_Gen_Counter.
> > > > > > > > > +     * For more verbose comment see this commit message.        
> > > > > > > > 
> > > > > > > > What does "this commit message" mean?      
> > > > > > > The commit message above. Should I reword it to just 'see commit message'?
> > > > > > >       
> > > > > > > >       
> > > > > > > > > +     */
> > > > > > > > > +     crs = aml_resource_template();
> > > > > > > > > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> > > > > > > > > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> > > > > > > > > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> > > > > > > > > +                VMGENID_VMGID_BUF_SIZE));
> > > > > > > > > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > > > > > > > > +     return dev;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  /*
> > > > > > > > >   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
> > > > > > > > >   * On i386 arch we only have two pci hosts, so we can look only for them.
> > > > > > > > > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > > >              }
> > > > > > > > >  
> > > > > > > > >              if (bus) {
> > > > > > > > > +                Object *vmgen;
> > > > > > > > >                  Aml *scope = aml_scope("PCI0");
> > > > > > > > >                  /* Scan all PCI buses. Generate tables to support hotplug. */
> > > > > > > > >                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> > > > > > > > > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > > >                      aml_append(scope, dev);
> > > > > > > > >                  }
> > > > > > > > >  
> > > > > > > > > +                vmgen = find_vmgneid_dev(NULL);
> > > > > > > > > +                if (vmgen) {
> > > > > > > > > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > > > > > > > > +                    uint64_t buf_paddr =
> > > > > > > > > +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> > > > > > > > > +
> > > > > > > > > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > > > > > > > > +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> > > > > > > > > +
> > > > > > > > > +                        method = aml_method("\\_GPE._E00", 0,
> > > > > > > > > +                                            AML_NOTSERIALIZED);
> > > > > > > > > +                        aml_append(method,
> > > > > > > > > +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > > > > > > > > +                                       aml_int(0x80)));
> > > > > > > > > +                        aml_append(ssdt, method);
> > > > > > > > > +                    }
> > > > > > > > > +                }
> > > > > > > > > +
> > > > > > > > >                  aml_append(sb_scope, scope);
> > > > > > > > >              }
> > > > > > > > >          }
> > > > > > > > > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
> > > > > > > > >      {
> > > > > > > > >          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
> > > > > > > > >  
> > > > > > > > > -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> > > > > > > > > -
> > > > > > > > >          if (misc->is_piix4) {
> > > > > > > > >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> > > > > > > > >              aml_append(method,
> > > > > > > > > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > > > > > > > > index d4765c2..1f05edd 100644
> > > > > > > > > --- a/hw/misc/Makefile.objs
> > > > > > > > > +++ b/hw/misc/Makefile.objs
> > > > > > > > > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> > > > > > > > >  
> > > > > > > > >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> > > > > > > > >  obj-$(CONFIG_EDU) += edu.o
> > > > > > > > > +obj-$(CONFIG_VMGENID) += vmgenid.o
> > > > > > > > >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > > > > > > > > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > > > > > > > > new file mode 100644
> > > > > > > > > index 0000000..a2fbdfc
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/hw/misc/vmgenid.c
> > > > > > > > > @@ -0,0 +1,154 @@
> > > > > > > > > +/*
> > > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > > + *
> > > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > > + *
> > > > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > + *
> > > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > + *
> > > > > > > > > + */
> > > > > > > > > +
> > > > > > > > > +#include "hw/i386/pc.h"
> > > > > > > > > +#include "hw/pci/pci.h"
> > > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > > +#include "hw/acpi/acpi.h"
> > > > > > > > > +#include "qapi/visitor.h"
> > > > > > > > > +
> > > > > > > > > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> > > > > > > > > +
> > > > > > > > > +typedef struct VmGenIdState {
> > > > > > > > > +    PCIDevice parent_obj;
> > > > > > > > > +    MemoryRegion iomem;
> > > > > > > > > +    union {
> > > > > > > > > +        uint8_t guid[16];
> > > > > > > > > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > > > > > > > > +    };
> > > > > > > > > +    bool guid_set;
> > > > > > > > > +} VmGenIdState;
> > > > > > > > > +
> > > > > > > > > +Object *find_vmgneid_dev(Error **errp)
> > > > > > > > > +{
> > > > > > > > > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> > > > > > > > > +    if (!obj) {
> > > > > > > > > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > > > > > > > > +    }
> > > > > > > > > +    return obj;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void vmgenid_update_guest(VmGenIdState *s)
> > > > > > > > > +{
> > > > > > > > > +    Object *acpi_obj;
> > > > > > > > > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > > > > > > > > +
> > > > > > > > > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > > > > > > > > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > > > > > > > > +
> > > > > > > > > +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> > > > > > > > > +    if (acpi_obj) {
> > > > > > > > > +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > > > > > > > > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > > > > > > > > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > > > > > > > > +
> > > > > > > > > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > > > > > > > > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > > > > > > > > +    }
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> > > > > > > > > +{
> > > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > > +
> > > > > > > > > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID
> > > > > > > > > +                   "': Failed to parse GUID string: %s",
> > > > > > > > > +                   object_get_typename(OBJECT(s)),
> > > > > > > > > +                   value);
> > > > > > > > > +        return;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > > +    s->guid_set = true;
> > > > > > > > > +    vmgenid_update_guest(s);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> > > > > > > > > +                                   const char *name, Error **errp)
> > > > > > > > > +{
> > > > > > > > > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > > > > > > > > +
> > > > > > > > > +    if (value == PCI_BAR_UNMAPPED) {
> > > > > > > > > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> > > > > > > > > +                   object_get_typename(OBJECT(obj)));
> > > > > > > > > +        return;
> > > > > > > > > +    }
> > > > > > > > > +    visit_type_int(v, &value, name, errp);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void vmgenid_initfn(Object *obj)
> > > > > > > > > +{
> > > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > > +
> > > > > > > > > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> > > > > > > > > +                           &error_abort);
> > > > > > > > > +
> > > > > > > > > +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> > > > > > > > > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > > > > > > > > +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +
> > > > > > > > > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > > > > > > > > +{
> > > > > > > > > +    VmGenIdState *s = VMGENID(dev);
> > > > > > > > > +    bool ambiguous = false;
> > > > > > > > > +
> > > > > > > > > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > > > > > > > > +    if (ambiguous) {
> > > > > > > > > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > > > > > > > > +                         " device is permitted");
> > > > > > > > > +        return;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > > +    if (!s->guid_set) {
> > > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> > > > > > > > > +                   object_get_typename(OBJECT(s)));
> > > > > > > > > +        return;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > > > > > > > > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > > > > > > > > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > > > > > > > > +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> > > > > > > > > +        &s->iomem);
> > > > > > > > > +    return;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > > > > > > > > +{
> > > > > > > > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > > > > > > > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > > > > > > > +
> > > > > > > > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > > > > > > > +    dc->hotpluggable = false;
> > > > > > > > > +    k->realize = vmgenid_realize;
> > > > > > > > > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > > > > > > > > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > > > > > > > > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static const TypeInfo vmgenid_device_info = {
> > > > > > > > > +    .name          = VMGENID_DEVICE,
> > > > > > > > > +    .parent        = TYPE_PCI_DEVICE,
> > > > > > > > > +    .instance_size = sizeof(VmGenIdState),
> > > > > > > > > +    .instance_init = vmgenid_initfn,
> > > > > > > > > +    .class_init    = vmgenid_class_init,
> > > > > > > > > +};
> > > > > > > > > +
> > > > > > > > > +static void vmgenid_register_types(void)
> > > > > > > > > +{
> > > > > > > > > +    type_register_static(&vmgenid_device_info);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +type_init(vmgenid_register_types)
> > > > > > > > > diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> > > > > > > > > new file mode 100644
> > > > > > > > > index 0000000..b90882c
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/include/hw/misc/vmgenid.h
> > > > > > > > > @@ -0,0 +1,27 @@
> > > > > > > > > +/*
> > > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > > + *
> > > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > > + *
> > > > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > + *
> > > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > + *
> > > > > > > > > + */
> > > > > > > > > +
> > > > > > > > > +#ifndef HW_MISC_VMGENID_H
> > > > > > > > > +#define HW_MISC_VMGENID_H
> > > > > > > > > +
> > > > > > > > > +#include "qom/object.h"
> > > > > > > > > +
> > > > > > > > > +#define VMGENID_DEVICE           "vmgenid"
> > > > > > > > > +#define VMGENID_GUID             "guid"
> > > > > > > > > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > > > > > > > > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > > > > > > > > +#define VMGENID_VMGID_BUF_BAR    0
> > > > > > > > > +
> > > > > > > > > +Object *find_vmgneid_dev(Error **errp);
> > > > > > > > > +
> > > > > > > > > +#endif
> > > > > > > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > > > > > > index dedf277..f4c9d48 100644
> > > > > > > > > --- a/include/hw/pci/pci.h
> > > > > > > > > +++ b/include/hw/pci/pci.h
> > > > > > > > > @@ -94,6 +94,7 @@
> > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > > > > > > > > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> > > > > > > > >  
> > > > > > > > >  #define FMT_PCIBUS                      PRIx64
> > > > > > > > > -- 
> > > > > > > > > 1.8.3.1        
> > > >   

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 5/9] qmp/hmp: add query-vm-generation-id and 'info vm-generation-id' commands
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 5/9] qmp/hmp: add query-vm-generation-id and 'info vm-generation-id' commands Igor Mammedov
@ 2016-02-09 17:31   ` Eric Blake
  0 siblings, 0 replies; 59+ messages in thread
From: Eric Blake @ 2016-02-09 17:31 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: ghammer, lcapitulino, lersek, ehabkost, mst


On 01/28/2016 03:54 AM, Igor Mammedov wrote:
> Add commands to query Virtual Machine Generation ID counter.
> 
> QMP command example:
>     { "execute": "query-vm-generation-id" }
> 
> HMP command example:
>     info vm-generation-id
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v18:
>   - add a new QMP type GuidInfo instead of reusing UuidInfo
>     Eric Blake <eblake@redhat.com>
> ---

> +++ b/hmp.c
> @@ -2375,3 +2375,12 @@ void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict)
>  
>      qapi_free_RockerOfDpaGroupList(list);
>  }
> +
> +void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict)
> +{
> +    GuidInfo *info = qmp_query_vm_generation_id(NULL);

Should we report rather than ignore errors in the HMP interface?

> +    if (info) {
> +        monitor_printf(mon, "%s\n", info->guid);
> +    }


> +++ b/qapi-schema.json
> @@ -4083,3 +4083,23 @@
>  ##
>  { 'enum': 'ReplayMode',
>    'data': [ 'none', 'record', 'play' ] }
> +
> +##
> +# @GuidInfo:
> +#
> +# GUID information.
> +#
> +# @guid: the globally unique identifier

Maybe add "in usual ASCII format" to make it obvious it is the 36-byte
string, not the 16-byte binary value.

But neither comment is a strong objection, so:
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 6/9] qmp/hmp: add set-vm-generation-id commands
  2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 6/9] qmp/hmp: add set-vm-generation-id commands Igor Mammedov
@ 2016-02-09 17:33   ` Eric Blake
  0 siblings, 0 replies; 59+ messages in thread
From: Eric Blake @ 2016-02-09 17:33 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: ghammer, lcapitulino, lersek, ehabkost, mst


On 01/28/2016 03:54 AM, Igor Mammedov wrote:
> Add set-vm-generation-id command to set Virtual Machine
> Generation ID counter.
> 
> QMP command example:
>     { "execute": "set-vm-generation-id",
>           "arguments": {
>               "guid": "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
>           }
>     }
> 
> HMP command example:
>     set-vm-generation-id 324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-09 10:46                 ` Igor Mammedov
  2016-02-09 12:17                   ` Michael S. Tsirkin
@ 2016-02-10  8:51                   ` Michael S. Tsirkin
  2016-02-10  9:28                     ` Michael S. Tsirkin
  1 sibling, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-02-10  8:51 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Xiao Guangrong, ehabkost, ghammer, qemu-devel, lcapitulino, lersek

On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> > > > > 2. ACPI approach consumes guest usable RAM to allocate buffer
> > > > >    and then makes device to DMA data in that RAM.
> > > > >    That's a design point I don't agree with.    
> > > > 
> > > > Blame the broken VM GEN ID spec.
> > > > 
> > > > For VM GEN ID, instead of DMA we could make ACPI code read the PCI
> > > > BAR and copy the data over; this would fix rebalancing, but there is
> > > > a problem with this approach: it cannot be done atomically (other
> > > > than while the VM is not yet running and accessing RAM).  So a guest
> > > > can read a partially corrupted ID from memory.  
> > > 
> > > Yep, the VM GEN ID spec is broken and we can't do anything about it:
> > > it's absolutely impossible to guarantee an atomic update, as the guest
> > > OS has the address of the buffer and can read it at any time. Nothing
> > > can be done here.  
> > 
> > Hmm I thought we can stop VM while we are changing the ID.
> > But of course VM could be accessing it at the same time.
> > So I take this back, ACPI code reading PCI BAR and
> > writing data out to the buffer would be fine
> > from this point of view.
> The spec is broken regardless of whether the BAR or RAM is read, since
> the read isn't atomic and the UUID could be updated in the middle.
> So MS will have to live with that, unless they have a secret
> way to tell the guest to stop reading from that address, do the update,
> and then signal the guest that it can read again.

And really, that's the issue that's driving this up to v19.  There's no
good way to implement a bad spec.  It feels more like a failed
experiment than a thought-through interface.  So maybe we should
leave this alone and wait until we see an actual user - that way we can
figure out the implementation constraints better.

-- 
MST

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-10  8:51                   ` Michael S. Tsirkin
@ 2016-02-10  9:28                     ` Michael S. Tsirkin
  2016-02-10 10:00                       ` Laszlo Ersek
  0 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-02-10  9:28 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Xiao Guangrong, ehabkost, ghammer, qemu-devel, lcapitulino, lersek

On Wed, Feb 10, 2016 at 10:51:47AM +0200, Michael S. Tsirkin wrote:
> So maybe we should
> leave this alone, wait until we see an actual user - this way we can
> figure out the implementation constraints better.

What I'm definitely interested in seeing is improving the
bios_linker_loader API within QEMU.

Is the below going in the right direction, in your opinion?


diff --git a/include/hw/acpi/bios-linker-loader.h b/include/hw/acpi/bios-linker-loader.h
index 498c0af..78d9a16 100644
--- a/include/hw/acpi/bios-linker-loader.h
+++ b/include/hw/acpi/bios-linker-loader.h
@@ -17,6 +17,17 @@ void bios_linker_loader_add_checksum(GArray *linker, const char *file,
                                      void *start, unsigned size,
                                      uint8_t *checksum);
 
+/*
+ * bios_linker_loader_add_pointer: ask the guest to add the address of the
+ * source file to the value at the specified pointer in the destination file.
+ *
+ * @linker: linker file array
+ * @dest_file: destination file that must be changed
+ * @src_file: source file whose address must be taken
+ * @table: destination file array
+ * @pointer: location of the pointer to be patched within destination file
+ * @pointer_size: size of pointer to be patched, in bytes
+ */
 void bios_linker_loader_add_pointer(GArray *linker,
                                     const char *dest_file,
                                     const char *src_file,
diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
index e04d60a..84be25a 100644
--- a/hw/acpi/bios-linker-loader.c
+++ b/hw/acpi/bios-linker-loader.c
@@ -142,7 +142,13 @@ void bios_linker_loader_add_pointer(GArray *linker,
                                     uint8_t pointer_size)
 {
     BiosLinkerLoaderEntry entry;
-    size_t offset = (gchar *)pointer - table->data;
+    size_t offset;
+
+    assert((gchar *)pointer >= table->data);
+
+    offset = (gchar *)pointer - table->data;
+
+    assert(offset + pointer_size <= table->len);
 
     memset(&entry, 0, sizeof entry);
     strncpy(entry.pointer.dest_file, dest_file,

-- 
MST

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-10  9:28                     ` Michael S. Tsirkin
@ 2016-02-10 10:00                       ` Laszlo Ersek
  0 siblings, 0 replies; 59+ messages in thread
From: Laszlo Ersek @ 2016-02-10 10:00 UTC (permalink / raw)
  To: Michael S. Tsirkin, Igor Mammedov
  Cc: ghammer, lcapitulino, Xiao Guangrong, ehabkost, qemu-devel

On 02/10/16 10:28, Michael S. Tsirkin wrote:
> On Wed, Feb 10, 2016 at 10:51:47AM +0200, Michael S. Tsirkin wrote:
>> So maybe we should
>> leave this alone, wait until we see an actual user - this way we can
>> figure out the implementation constraints better.
> 
> What I'm definitely interested in seeing is improving the
> bios_linker_loader API within QEMU.
> 
> Is the below going in the right direction, in your opinion?
> 
> 
> diff --git a/include/hw/acpi/bios-linker-loader.h b/include/hw/acpi/bios-linker-loader.h
> index 498c0af..78d9a16 100644
> --- a/include/hw/acpi/bios-linker-loader.h
> +++ b/include/hw/acpi/bios-linker-loader.h
> @@ -17,6 +17,17 @@ void bios_linker_loader_add_checksum(GArray *linker, const char *file,
>                                       void *start, unsigned size,
>                                       uint8_t *checksum);
>  
> +/*
> + * bios_linker_loader_add_pointer: ask guest to append address of source file
> + * into destination file at the specified pointer.
> + *
> + * @linker: linker file array
> + * @dest_file: destination file that must be changed
> + * @src_file: source file whose address must be taken
> + * @table: destination file array
> + * @pointer: location of the pointer to be patched within destination file
> + * @pointer_size: size of pointer to be patched, in bytes
> + */
>  void bios_linker_loader_add_pointer(GArray *linker,
>                                      const char *dest_file,
>                                      const char *src_file,
> diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
> index e04d60a..84be25a 100644
> --- a/hw/acpi/bios-linker-loader.c
> +++ b/hw/acpi/bios-linker-loader.c
> @@ -142,7 +142,13 @@ void bios_linker_loader_add_pointer(GArray *linker,
>                                      uint8_t pointer_size)
>  {
>      BiosLinkerLoaderEntry entry;
> -    size_t offset = (gchar *)pointer - table->data;
> +    size_t offset;
> +
> +    assert((gchar *)pointer >= table->data);
> +
> +    offset = (gchar *)pointer - table->data;
> +
> +    assert(offset + pointer_size < table->len);
>  
>      memset(&entry, 0, sizeof entry);
>      strncpy(entry.pointer.dest_file, dest_file,
> 

I have two suggestions (independently of Igor's upcoming opinion):

(1) I propose to do all this arithmetic in uintptr_t, not (char*).

(2) In the last assertion, < should be <=. Both sides are exclusive, so
equality is valid.
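For illustration, the two suggestions might look like the sketch below (a hypothetical stand-alone helper, not the actual QEMU code; `checked_offset` and its arguments are made up for this example):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Compute the offset of 'pointer' within a blob of 'blob_len' bytes
 * starting at 'blob_data': the arithmetic is done in uintptr_t (1),
 * and the upper bound is inclusive, since both sides of the
 * comparison are exclusive ends (2). */
static size_t checked_offset(const void *blob_data, size_t blob_len,
                             const void *pointer, uint8_t pointer_size)
{
    uintptr_t base = (uintptr_t)blob_data;
    uintptr_t p = (uintptr_t)pointer;
    size_t offset;

    assert(p >= base);
    offset = (size_t)(p - base);
    assert(offset + pointer_size <= blob_len); /* <=, not < */
    return offset;
}
```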

(BTW the OVMF implementation of the linker-loader client is chock full
of verifications like the above, done in UINT64, so I can only agree
with the above safety measures, independently of vmgenid.)

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-09 12:17                   ` Michael S. Tsirkin
@ 2016-02-11 15:16                     ` Igor Mammedov
  2016-02-11 16:30                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 59+ messages in thread
From: Igor Mammedov @ 2016-02-11 15:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Xiao Guangrong, ehabkost, ghammer, qemu-devel, lcapitulino, lersek

On Tue, 9 Feb 2016 14:17:44 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> > > So the linker interface solves this rather neatly:
> > > bios allocates memory, bios passes memory map to guest.
> > > Served us well for several years without need for extensions,
> > > and it does solve the VM GEN ID problem, even though
> > > 1. it was never designed for huge areas like nvdimm seems to want to use
> > > 2. we might want to add a new 64 bit flag to avoid touching low memory  
> > the linker interface is fine for some read-only data, like ACPI tables,
> > especially fixed tables; not so much for AML ones, if one wants to patch them.
> > 
> > However, now that you want to use it for other purposes, you start
> > adding extensions and other guest->QEMU channels to communicate
> > patching info back.
> > It steals the guest's memory, which is also not nice and doesn't scale well.  
> 
> This is an argument I don't get. Memory is memory. Call it guest memory
> or a RAM-backed PCI BAR - same thing. MMIO is cheaper of course,
> but much slower.
> 
> ...
It matters to the user, however: he pays for a guest with XXX RAM but gets less
than that. And it will only get worse as the number of such devices
increases.

> > > OK fine, but returning PCI BAR address to guest is wrong.
> > > How about reading it from ACPI then? Is it really
> > > broken unless there's *also* a driver?  
> > I don't get the question; the MS spec requires an address (the ADDR method),
> > and it's read by ACPI (AML).  
> 
> You were unhappy about DMA into guest memory.
> As a replacement for DMA, we could have AML read from
> e.g. PCI and write into RAM.
> This way we don't need to pass address to QEMU.
That sounds better, as it saves us from allocating an IO port,
and QEMU doesn't need to write into guest memory; the only question is
whether a PCI_Config opregion would work with a driver-less PCI device.

And it's still pretty much not testable, since it would require a
fully running OSPM to execute the AML side.

> 
> > As for a PCI_Config OpRegion working without a driver, I haven't tried,
> > but I wouldn't be surprised if it doesn't, taking into account that
> > the MS-introduced _DSM doesn't.
> >   
> >   
> > > 
> > >   
> > > > > >    Just compare with a graphics card design, where on device memory
> > > > > >    is mapped directly at some GPA not wasting RAM that guest could
> > > > > >    use for other tasks.      
> > > > > 
> > > > > This might have been true 20 years ago.  Most modern cards do DMA.    
> > > > 
> > > > Modern cards, with their own RAM, map their VRAM in the address space directly
> > > > and allow users to use it (GEM API). So they do not waste conventional RAM.
> > > > For example, NVIDIA VRAM is mapped as PCI BARs the same way as in this
> > > > series (even the PCI class id is the same)    
> > > 
> > > Don't know enough about graphics really, I'm not sure how these are
> > > relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > > mostly use guest RAM, not on card RAM.
> > >   
> > > > > >    VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> > > > > >    instead of consuming guest's RAM they should be mapped at
> > > > > >    some GPA and their memory accessed directly.      
> > > > > 
> > > > > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > > > > address. This breaks the straight-forward approach of using a
> > > > > rebalanceable PCI BAR.    
> > > > 
> > > > For PCI rebalancing to work on Windows, one has to provide a working PCI driver,
> > > > otherwise the OS will ignore the device when rebalancing happens and
> > > > might map something else over the ignored BAR.    
> > > 
> > > Does it disable the BAR then? Or just move it elsewhere?  
> > it doesn't, it just blindly ignores the BAR's existence and maps the BAR of
> > another device with a driver over it.  
> 
> Interesting. On classical PCI this is a forbidden configuration.
> Maybe we do something that confuses windows?
> Could you tell me how to reproduce this behaviour?
#cat > t << EOF
pci_update_mappings_del
pci_update_mappings_add
EOF

#./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
 -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
 -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
 -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01

wait till the OS boots, note the BARs programmed for ivshmem;
 in my case it was
   01:01.0 0,0xfe800000+0x100
then execute the script and watch the pci_update_mappings* trace events

# for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;

hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing, where
Windows unmaps all BARs of the NICs on the bridge but doesn't touch ivshmem,
and then programs new BARs, where:
  pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
creates a BAR overlapping with ivshmem


> 
> > >   
> > > > >     
> > > > > >    In that case NVDIMM could even map whole label area and
> > > > > >    significantly simplify QEMU<->OSPM protocol that currently
> > > > > >    serializes that data through a 4K page.
> > > > > >    There is also performance issue with buffer allocated in RAM,
> > > > > >    because DMA adds unnecessary copying step when data could
> > > > > >    be read/written directly of NVDIMM.
> > > > > >    It might be no very important for _DSM interface but when it
> > > > > >    comes to supporting block mode it can become an issue.      
> > > > > 
> > > > > So for NVDIMM, presumably it will have code access PCI BAR properly, so
> > > > > it's guaranteed to work across BAR rebalancing.
> > > > > Would that address the performance issue?    
> > > > 
> > > > it would, if rebalancing were to account for driverless PCI device BARs,
> > > > but it doesn't, hence such BARs need to be statically pinned
> > > > at the place where the BIOS put them at startup.
> > > > I'm also not sure that a PCI_Config operation region would work
> > > > on Windows without a loaded driver (similar to the _DSM case).
> > > > 
> > > >     
> > > > > > Above points make ACPI patching approach not robust and fragile
> > > > > > and hard to maintain.      
> > > > > 
> > > > > Wrt GEN ID these are all kind of subjective though.  I especially don't
> > > > > get what appears your general dislike of the linker host/guest
> > > > > interface.    
> > > > Besides the technical issues, the general dislike is just what I've written:
> > > > the "not robust and fragile" bios_linker_loader_add_pointer() interface.
> > > > 
> > > > to make it less fragile:
> > > >  1. it should be impossible to corrupt memory or patch a wrong address.
> > > >     the current impl. silently relies on the value referenced by the 'pointer'
> > > >     argument, and to figure that out one has to read the linker code on the
> > > >     BIOS side. That could easily be set wrong and slip through review.    
> > > 
> > > That's an API issue, it seemed like a good idea but I guess
> > > it confuses people. Would you be happier using an offset
> > > instead of a pointer?  
> > an offset is better, and it would be better still if it said
> > which offset it is (i.e. relative to what)  
> 
> 
> Start of table, right?
not sure; to me it looks like the start of a blob, not the table

> 
> > >   
> > > >     API shouldn't rely on the caller setting value pointed by that argument.    
> > > 
> > > I couldn't parse that one. Care suggesting a cleaner API for linker?  
> > here is current API signature:
> > 
> > bios_linker_loader_add_pointer(GArray *linker,
> >                                     const char *dest_file,
> >                                     const char *src_file,
> >                                     GArray *table, void *pointer,
> >                                     uint8_t pointer_size)
> > 
> > issue 1: 
> > where 'pointer' is a real pointer pointing inside 'table', and the API
> > calculates the offset under the hood:
> >   offset = (gchar *)pointer - table->data;
> > and puts it in the ADD_POINTER command.
> > 
> > it's easy to get a wrong offset if 'pointer' is not from 'table'.  
> 
> OK, replace that with table_offset?
blob_offset?

also s/table/blob/
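To make the proposal concrete, here is one way an offset-based variant could look, folding in all three issues below as well; this is only a sketch with made-up names (`Blob`, `blob_add_pointer`) on plain byte buffers, while the real code would operate on GArray blobs and emit an actual ADD_POINTER command:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t *data;
    size_t len;
} Blob;

/* Hypothetical offset-based ADD_POINTER helper: the caller passes
 * explicit blob offsets instead of a raw pointer (issue 1), the
 * helper pre-fills the patched field itself so the blob composer
 * need not do it (issue 2), and everything is bounds-checked up
 * front so bad arguments abort early (issue 3). */
static void blob_add_pointer(Blob *dst, size_t pointer_offset,
                             uint8_t pointer_size,
                             const Blob *src, uint64_t src_offset)
{
    assert(pointer_size == 1 || pointer_size == 2 ||
           pointer_size == 4 || pointer_size == 8);
    assert(pointer_offset + pointer_size <= dst->len);
    assert(src_offset < src->len);

    /* store src_offset little-endian; at load time the guest-side
     * loader adds src's base address to this field */
    for (unsigned i = 0; i < pointer_size; i++) {
        dst->data[pointer_offset + i] = (uint8_t)(src_offset >> (8 * i));
    }
}
```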

> 
> > issue 2:
> > 'pointer' points to another offset of size 'pointer_size' in the 'table'
> > blob; that means that whoever composes the blob has to be aware of
> > it and fill in the correct value there, which is only possible to do right
> > if one looks inside the SeaBIOS part of the linker interface.
> > That is easy to forget, and then one has to deal with the mess
> > caused by random memory corruption.
> > 
> > bios_linker_loader_add_pointer() and the corresponding
> > ADD_POINTER command should take this second offset as an argument
> > and not require 'table' to be pre-filled with it; or,
> > in the worst case, if extending the ADD_POINTER command is problematic,
> > bios_linker_loader_add_pointer() should still take
> > the second offset and patch 'table' itself so that the 'table' composer
> > doesn't have to worry about it.  
> 
> This one I don't understand. What's the second pointer you
> are talking about?
ha, see, even the author already has absolutely no clue how the linker works
or what the offsets are relative to.
see SeaBIOS romfile_loader_add_pointer():
    ...
    memcpy(&pointer, dest_file->data + offset, entry->pointer_size);
here is the second offset       ^^^^^^^^^^^^^
it should be a properly named field of the ADD_POINTER command and
not part of the data blob.

    pointer = le64_to_cpu(pointer);
    pointer += (unsigned long)src_file->data;
    pointer = cpu_to_le64(pointer);
    memcpy(dest_file->data + offset, &pointer, entry->pointer_size);

all these src|dst_file names and confusing offsets (whatever they might mean)
give me a headache every time I need to remember how the linker
works; I have to read both the QEMU and SeaBIOS code to figure it out each time.
That's what I'd call an unmaintainable and hard-to-use API.


> 
> > issue 3:
> > all patching obviously needs bounds checking on QEMU side
> > so it would abort early if it could corrupt memory.  
> 
> That's easy.
> 
> > >   
> > > >  2. If it's going to be used for patching AML, it should assert,
> > > >     when bios_linker_loader_add_pointer() is called, if the AML object
> > > >     to be patched is wrong and patching would corrupt the AML blob.    
> > > 
> > > Hmm for example check that the patched data has
> > > the expected pattern?  
> > yep, nothing can be done for raw tables, but that should be possible
> > for AML tables, and if the pattern is unsupported or the size doesn't match,
> > it should abort QEMU early instead of corrupting the table.  
> 
> Above all sounds reasonable. Would you like to take a stab
> at it, or would you prefer me to?
It would be better if it were you.

I wouldn't like to maintain it ever, as it's too complex and hard-to-use an API,
one I'd use only as a last resort if there weren't any other way
to implement the task at hand.
 
> > >   
> > > >     
> > > > > It's there and we are not moving away from it, so why not
> > > > > use it in more places?  Or if you think it's wrong, why don't you build
> > > > > something better then?  We could then maybe use it for these things as
> > > > > well.    
> > > > 
> > > > Yep, I think for vmgenid and even more so for nvdimm
> > > > it would be better to allocate GPAs in QEMU and map backing
> > > > MemoryRegions directly in QEMU.
> > > > For nvdimm (main data region)
> > > > we already do it using pc-dimm's GPA allocation algorithm, we also
> > > > could use similar approach for nvdimm's label area and vmgenid.
> > > > 
> > > > Here is a simple attempt to add a limited GPA allocator in high memory
> > > >  https://patchwork.ozlabs.org/patch/540852/
> > > > But it hasn't got any comments from you and was ignored.
> > > > Let's consider it, and perhaps we could come up with a GPA allocator
> > > > that could be used for other things as well.    
> > > 
> > > For nvdimm label area, I agree passing things through
> > > a 4K buffer seems inefficient.
> > > 
> > > I'm not sure what's a better way though.
> > > 
> > > Use 64 bit memory? Setting aside old guests such as XP,
> > > does it break 32 bit guests?  
> > it might not work with 32-bit guests, the same way that mem hotplug
> > doesn't work for them unless they are PAE-enabled.  
> 
> Right, I mean with PAE.
I've tested it with 32-bit XP and Windows 10; they boot fine, and
the vmgenid device is displayed as OK with the buffer above 4Gb (on Win10).
So at least it doesn't crash the guest.
I can't test more than that for 32-bit guests, since the utility
to read vmgenid works only on Windows Server and there isn't
a 32-bit version of it.

> 
> > but well, that's a limitation of the implementation, and considering
> > that the nvdimm storage area is mapped at a 64-bit GPA, it doesn't matter.
> >    
> > > I'm really afraid of adding yet another allocator, I think you
> > > underestimate the maintainance headache: it's not theoretical and is
> > > already felt.  
> > The current maintenance headache is due to the fixed, handpicked
> > memory layout; we can't do much about it for legacy machine
> > types, but with a QEMU-side GPA allocator we can try to switch
> > to a flexible memory layout that allocates GPAs
> > depending on the QEMU config in a stable manner.  
> 
> So far, we didn't manage to. It seems to go in the reverse
> direction, where we add more and more control to let management
> influence the layout. Things like alignment requirements
> also tend to surface later and wreak havoc on whatever
> we do.
Was there even an attempt to try it before? Could you point to it?
The only attempt I've seen was https://patchwork.ozlabs.org/patch/540852/
but it hasn't got any technical comments from you,
except for 'I'm afraid that it won't work' on IRC.

QEMU already has a GPA allocator, limited to the memory hotplug AS,
and it has passed through its 'growing' issues. What the above patch
proposes is to reuse the already existing memory hotplug AS and
maybe make its GPA allocator more generic (i.e. not tied
only to pc-dimm) on top of it.

It's sufficient for the vmgenid use case and definitely
much more suitable for nvdimm, which already uses it for mapping the
main storage MemoryRegion.

> > well, there is a maintenance headache with bios_linker as well,
> > due to its complexity (multiple layers of indirection), and
> > it will grow when more places try to use it.
> > Yep, we could use it as a hack, stealing RAM and trying to implement
> > backwards DMA, or we could be less afraid and consider
> > yet another allocator which will do the job without hacks,
> > which should benefit QEMU in the long run (it might not be easy
> > to impl. it right, but if we won't even try we will be buried
> > in complex hacks that 'work' for now)
> >   
> > > > >     
> > > > > >       
> > > > > > >       
> > > > > > > > > And hey, if you want to use a pci device to pass the physical
> > > > > > > > > address guest to host, instead of reserving
> > > > > > > > > a couple of IO addresses, sure, stick it in pci config in
> > > > > > > > > a vendor-specific capability, this way it'll get migrated
> > > > > > > > > automatically.        
> > > > > > > > Could you elaborate more on this suggestion?        
> > > > > > > 
> > > > > > > I really just mean using PCI_Config operation region.
> > > > > > > If you wish, I'll try to post a prototype next week.      
> > > > > > I don't know much about PCI but it would be interesting,
> > > > > > perhaps we could use it somewhere else.
> > > > > > 
> > > > > > However it should be checked if it works with Windows,
> > > > > > for example PCI specific _DSM method is ignored by it
> > > > > > if PCI device doesn't have working PCI driver bound to it.
> > > > > >       
> > > > > > >       
> > > > > > > > > 
> > > > > > > > >         
> > > > > > > > > > ---
> > > > > > > > > > changes since 17:
> > > > > > > > > >   - small fixups suggested in v14 review by Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > >   - make BAR prefetchable to make region cached as per MS spec
> > > > > > > > > >   - s/uuid/guid/ to match spec
> > > > > > > > > > changes since 14:
> > > > > > > > > >   - reserve BAR resources so that Windows won't touch it
> > > > > > > > > >     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > >   - ACPI: split VGEN device of PCI device descriptor
> > > > > > > > > >     and place it at PCI0 scope, so that won't be need trace its
> > > > > > > > > >     location on PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > >   - permit only one vmgenid to be created
> > > > > > > > > >   - enable BAR be mapped above 4Gb if it can't be mapped at low mem
> > > > > > > > > > ---
> > > > > > > > > >  default-configs/i386-softmmu.mak   |   1 +
> > > > > > > > > >  default-configs/x86_64-softmmu.mak |   1 +
> > > > > > > > > >  docs/specs/pci-ids.txt             |   1 +
> > > > > > > > > >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> > > > > > > > > >  hw/misc/Makefile.objs              |   1 +
> > > > > > > > > >  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
> > > > > > > > > >  include/hw/misc/vmgenid.h          |  27 +++++++
> > > > > > > > > >  include/hw/pci/pci.h               |   1 +
> > > > > > > > > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > > > > > > > > >  create mode 100644 hw/misc/vmgenid.c
> > > > > > > > > >  create mode 100644 include/hw/misc/vmgenid.h
> > > > > > > > > > 
> > > > > > > > > > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > > > > > > > > > index b177e52..6402439 100644
> > > > > > > > > > --- a/default-configs/i386-softmmu.mak
> > > > > > > > > > +++ b/default-configs/i386-softmmu.mak
> > > > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > > > +CONFIG_VMGENID=y
> > > > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > > > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > > > > > > > > > index 6e3b312..fdac18f 100644
> > > > > > > > > > --- a/default-configs/x86_64-softmmu.mak
> > > > > > > > > > +++ b/default-configs/x86_64-softmmu.mak
> > > > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > > > +CONFIG_VMGENID=y
> > > > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > > > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > > > > > > > > > index 0adcb89..e65ecf9 100644
> > > > > > > > > > --- a/docs/specs/pci-ids.txt
> > > > > > > > > > +++ b/docs/specs/pci-ids.txt
> > > > > > > > > > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> > > > > > > > > >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> > > > > > > > > >  1b36:0006  PCI Rocker Ethernet switch device
> > > > > > > > > >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > > > > > > > > > +1b36:0009  PCI VM-Generation device
> > > > > > > > > >  1b36:000a  PCI-PCI bridge (multiseat)
> > > > > > > > > >  
> > > > > > > > > >  All these devices are documented in docs/specs.
> > > > > > > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > > > > > > index 78758e2..0187262 100644
> > > > > > > > > > --- a/hw/i386/acpi-build.c
> > > > > > > > > > +++ b/hw/i386/acpi-build.c
> > > > > > > > > > @@ -44,6 +44,7 @@
> > > > > > > > > >  #include "hw/acpi/tpm.h"
> > > > > > > > > >  #include "sysemu/tpm_backend.h"
> > > > > > > > > >  #include "hw/timer/mc146818rtc_regs.h"
> > > > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > > >  
> > > > > > > > > >  /* Supported chipsets: */
> > > > > > > > > >  #include "hw/acpi/piix4.h"
> > > > > > > > > > @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> > > > > > > > > >      info->applesmc_io_base = applesmc_port();
> > > > > > > > > >  }
> > > > > > > > > >  
> > > > > > > > > > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > > > > > > > > > +{
> > > > > > > > > > +    Aml *dev, *pkg, *crs;
> > > > > > > > > > +
> > > > > > > > > > +    dev = aml_device("VGEN");
> > > > > > > > > > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> > > > > > > > > > +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> > > > > > > > > > +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> > > > > > > > > > +
> > > > > > > > > > +    pkg = aml_package(2);
> > > > > > > > > > +    /* low 32 bits of UUID buffer addr */
> > > > > > > > > > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > > > > > > > > > +    /* high 32 bits of UUID buffer addr */
> > > > > > > > > > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > > > > > > > > > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > > > > > > > > > +
> > > > > > > > > > +    /*
> > > > > > > > > > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> > > > > > > > > > +     * displays it as "PCI RAM controller" which is marked as NO_DRV
> > > > > > > > > > +     * so Windows ignores VMGEN device completely and doesn't check
> > > > > > > > > > +     * for resource conflicts which during PCI rebalancing can lead
> > > > > > > > > > +     * to another PCI device claiming ignored BARs. To prevent this
> > > > > > > > > > +     * statically reserve resources used by VM_Gen_Counter.
> > > > > > > > > > +     * For more verbose comment see this commit message.          
> > > > > > > > > 
> > > > > > > > > What does "this commit message" mean?        
> > > > > > > > above commit message. Should I reword it to just 'see commit message'
> > > > > > > >         
> > > > > > > > >         
> > > > > > > > > > +     */
> > > > > > > > > > +     crs = aml_resource_template();
> > > > > > > > > > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> > > > > > > > > > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> > > > > > > > > > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> > > > > > > > > > +                VMGENID_VMGID_BUF_SIZE));
> > > > > > > > > > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > > > > > > > > > +     return dev;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > >  /*
> > > > > > > > > >   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
> > > > > > > > > >   * On i386 arch we only have two pci hosts, so we can look only for them.
> > > > > > > > > > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > > > >              }
> > > > > > > > > >  
> > > > > > > > > >              if (bus) {
> > > > > > > > > > +                Object *vmgen;
> > > > > > > > > >                  Aml *scope = aml_scope("PCI0");
> > > > > > > > > >                  /* Scan all PCI buses. Generate tables to support hotplug. */
> > > > > > > > > >                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> > > > > > > > > > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > > > >                      aml_append(scope, dev);
> > > > > > > > > >                  }
> > > > > > > > > >  
> > > > > > > > > > +                vmgen = find_vmgneid_dev(NULL);
> > > > > > > > > > +                if (vmgen) {
> > > > > > > > > > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > > > > > > > > > +                    uint64_t buf_paddr =
> > > > > > > > > > +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> > > > > > > > > > +
> > > > > > > > > > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > > > > > > > > > +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> > > > > > > > > > +
> > > > > > > > > > +                        method = aml_method("\\_GPE._E00", 0,
> > > > > > > > > > +                                            AML_NOTSERIALIZED);
> > > > > > > > > > +                        aml_append(method,
> > > > > > > > > > +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > > > > > > > > > +                                       aml_int(0x80)));
> > > > > > > > > > +                        aml_append(ssdt, method);
> > > > > > > > > > +                    }
> > > > > > > > > > +                }
> > > > > > > > > > +
> > > > > > > > > >                  aml_append(sb_scope, scope);
> > > > > > > > > >              }
> > > > > > > > > >          }
> > > > > > > > > > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
> > > > > > > > > >      {
> > > > > > > > > >          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
> > > > > > > > > >  
> > > > > > > > > > -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> > > > > > > > > > -
> > > > > > > > > >          if (misc->is_piix4) {
> > > > > > > > > >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> > > > > > > > > >              aml_append(method,
> > > > > > > > > > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > > > > > > > > > index d4765c2..1f05edd 100644
> > > > > > > > > > --- a/hw/misc/Makefile.objs
> > > > > > > > > > +++ b/hw/misc/Makefile.objs
> > > > > > > > > > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> > > > > > > > > >  
> > > > > > > > > >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> > > > > > > > > >  obj-$(CONFIG_EDU) += edu.o
> > > > > > > > > > +obj-$(CONFIG_VMGENID) += vmgenid.o
> > > > > > > > > >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > > > > > > > > > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > > > > > > > > > new file mode 100644
> > > > > > > > > > index 0000000..a2fbdfc
> > > > > > > > > > --- /dev/null
> > > > > > > > > > +++ b/hw/misc/vmgenid.c
> > > > > > > > > > @@ -0,0 +1,154 @@
> > > > > > > > > > +/*
> > > > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > > > + *
> > > > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > > > + *
> > > > > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > > + *
> > > > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > > + *
> > > > > > > > > > + */
> > > > > > > > > > +
> > > > > > > > > > +#include "hw/i386/pc.h"
> > > > > > > > > > +#include "hw/pci/pci.h"
> > > > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > > > +#include "hw/acpi/acpi.h"
> > > > > > > > > > +#include "qapi/visitor.h"
> > > > > > > > > > +
> > > > > > > > > > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> > > > > > > > > > +
> > > > > > > > > > +typedef struct VmGenIdState {
> > > > > > > > > > +    PCIDevice parent_obj;
> > > > > > > > > > +    MemoryRegion iomem;
> > > > > > > > > > +    union {
> > > > > > > > > > +        uint8_t guid[16];
> > > > > > > > > > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > > > > > > > > > +    };
> > > > > > > > > > +    bool guid_set;
> > > > > > > > > > +} VmGenIdState;
> > > > > > > > > > +
> > > > > > > > > > +Object *find_vmgneid_dev(Error **errp)
> > > > > > > > > > +{
> > > > > > > > > > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> > > > > > > > > > +    if (!obj) {
> > > > > > > > > > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > > > > > > > > > +    }
> > > > > > > > > > +    return obj;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void vmgenid_update_guest(VmGenIdState *s)
> > > > > > > > > > +{
> > > > > > > > > > +    Object *acpi_obj;
> > > > > > > > > > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > > > > > > > > > +
> > > > > > > > > > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > > > > > > > > > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > > > > > > > > > +
> > > > > > > > > > +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> > > > > > > > > > +    if (acpi_obj) {
> > > > > > > > > > +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > > > > > > > > > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > > > > > > > > > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > > > > > > > > > +
> > > > > > > > > > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > > > > > > > > > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > > > > > > > > > +    }
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> > > > > > > > > > +{
> > > > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > > > +
> > > > > > > > > > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID
> > > > > > > > > > +                   "': Failed to parse GUID string: %s",
> > > > > > > > > > +                   object_get_typename(OBJECT(s)),
> > > > > > > > > > +                   value);
> > > > > > > > > > +        return;
> > > > > > > > > > +    }
> > > > > > > > > > +
> > > > > > > > > > +    s->guid_set = true;
> > > > > > > > > > +    vmgenid_update_guest(s);
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> > > > > > > > > > +                                   const char *name, Error **errp)
> > > > > > > > > > +{
> > > > > > > > > > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > > > > > > > > > +
> > > > > > > > > > +    if (value == PCI_BAR_UNMAPPED) {
> > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> > > > > > > > > > +                   object_get_typename(OBJECT(obj)));
> > > > > > > > > > +        return;
> > > > > > > > > > +    }
> > > > > > > > > > +    visit_type_int(v, &value, name, errp);
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void vmgenid_initfn(Object *obj)
> > > > > > > > > > +{
> > > > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > > > +
> > > > > > > > > > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> > > > > > > > > > +                           &error_abort);
> > > > > > > > > > +
> > > > > > > > > > +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> > > > > > > > > > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > > > > > > > > > +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +
> > > > > > > > > > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > > > > > > > > > +{
> > > > > > > > > > +    VmGenIdState *s = VMGENID(dev);
> > > > > > > > > > +    bool ambiguous = false;
> > > > > > > > > > +
> > > > > > > > > > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > > > > > > > > > +    if (ambiguous) {
> > > > > > > > > > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > > > > > > > > > +                         " device is permitted");
> > > > > > > > > > +        return;
> > > > > > > > > > +    }
> > > > > > > > > > +
> > > > > > > > > > +    if (!s->guid_set) {
> > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> > > > > > > > > > +                   object_get_typename(OBJECT(s)));
> > > > > > > > > > +        return;
> > > > > > > > > > +    }
> > > > > > > > > > +
> > > > > > > > > > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > > > > > > > > > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > > > > > > > > > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > > > > > > > > > +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> > > > > > > > > > +        &s->iomem);
> > > > > > > > > > +    return;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > > > > > > > > > +{
> > > > > > > > > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > > > > > > > > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > > > > > > > > +
> > > > > > > > > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > > > > > > > > +    dc->hotpluggable = false;
> > > > > > > > > > +    k->realize = vmgenid_realize;
> > > > > > > > > > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > > > > > > > > > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > > > > > > > > > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static const TypeInfo vmgenid_device_info = {
> > > > > > > > > > +    .name          = VMGENID_DEVICE,
> > > > > > > > > > +    .parent        = TYPE_PCI_DEVICE,
> > > > > > > > > > +    .instance_size = sizeof(VmGenIdState),
> > > > > > > > > > +    .instance_init = vmgenid_initfn,
> > > > > > > > > > +    .class_init    = vmgenid_class_init,
> > > > > > > > > > +};
> > > > > > > > > > +
> > > > > > > > > > +static void vmgenid_register_types(void)
> > > > > > > > > > +{
> > > > > > > > > > +    type_register_static(&vmgenid_device_info);
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +type_init(vmgenid_register_types)
> > > > > > > > > > diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> > > > > > > > > > new file mode 100644
> > > > > > > > > > index 0000000..b90882c
> > > > > > > > > > --- /dev/null
> > > > > > > > > > +++ b/include/hw/misc/vmgenid.h
> > > > > > > > > > @@ -0,0 +1,27 @@
> > > > > > > > > > +/*
> > > > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > > > + *
> > > > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > > > + *
> > > > > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > > + *
> > > > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > > + *
> > > > > > > > > > + */
> > > > > > > > > > +
> > > > > > > > > > +#ifndef HW_MISC_VMGENID_H
> > > > > > > > > > +#define HW_MISC_VMGENID_H
> > > > > > > > > > +
> > > > > > > > > > +#include "qom/object.h"
> > > > > > > > > > +
> > > > > > > > > > +#define VMGENID_DEVICE           "vmgenid"
> > > > > > > > > > +#define VMGENID_GUID             "guid"
> > > > > > > > > > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > > > > > > > > > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > > > > > > > > > +#define VMGENID_VMGID_BUF_BAR    0
> > > > > > > > > > +
> > > > > > > > > > +Object *find_vmgneid_dev(Error **errp);
> > > > > > > > > > +
> > > > > > > > > > +#endif
> > > > > > > > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > > > > > > > index dedf277..f4c9d48 100644
> > > > > > > > > > --- a/include/hw/pci/pci.h
> > > > > > > > > > +++ b/include/hw/pci/pci.h
> > > > > > > > > > @@ -94,6 +94,7 @@
> > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > > > > > > > > > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> > > > > > > > > >  
> > > > > > > > > >  #define FMT_PCIBUS                      PRIx64
> > > > > > > > > > -- 
> > > > > > > > > > 1.8.3.1          
> > > > >     
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-11 15:16                     ` Igor Mammedov
@ 2016-02-11 16:30                       ` Michael S. Tsirkin
  2016-02-11 17:34                         ` Marcel Apfelbaum
                                           ` (2 more replies)
  0 siblings, 3 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-02-11 16:30 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Xiao Guangrong, ehabkost, Marcel Apfelbaum, ghammer, qemu-devel,
	lcapitulino, lersek

On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
> On Tue, 9 Feb 2016 14:17:44 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> > > > So the linker interface solves this rather neatly:
> > > > bios allocates memory, bios passes memory map to guest.
> > > > Served us well for several years without need for extensions,
> > > > and it does solve the VM GEN ID problem, even though
> > > > 1. it was never designed for huge areas like nvdimm seems to want to use
> > > > 2. we might want to add a new 64 bit flag to avoid touching low memory  
> > > linker interface is fine for some read-only data, like ACPI tables,
> > > especially fixed tables; not so for AML ones if one wants to patch them.
> > > 
> > > However, now that you want to use it for other purposes, you start
> > > adding extensions and other guest->QEMU channels to communicate
> > > patching info back.
> > > It steals the guest's memory, which is also not nice and doesn't scale well.
> > 
> > This is an argument I don't get. Memory is memory: call it guest memory
> > or a RAM-backed PCI BAR - same thing. MMIO is cheaper, of course,
> > but much slower.
> > 
> > ...
> It matters for the user, however: he pays for a guest with XXX RAM but gets less
> than that. And that will get worse as the number of such devices
> increases.
> 
> > > > OK fine, but returning PCI BAR address to guest is wrong.
> > > > How about reading it from ACPI then? Is it really
> > > > broken unless there's *also* a driver?  
> > > I don't get the question; the MS spec requires an address (ADDR method),
> > > and it's read by ACPI (AML).
> > 
> > You were unhappy about DMA into guest memory.
> > As a replacement for DMA, we could have AML read from
> > e.g. PCI and write into RAM.
> > This way we don't need to pass address to QEMU.
> That sounds better, as it saves us from allocating an IO port,
> and QEMU doesn't need to write into guest memory; the only question is
> whether a PCI_Config opregion would work with a driver-less PCI device.

Or a PCI BAR, for that matter. I don't know for sure.

> 
> And it's still pretty much untestable, since it would require a
> fully running OSPM to execute the AML side.

AML is not testable, but that's nothing new.
You can test reading from PCI.

> > 
> > > As for a PCI_Config OpRegion working without a driver, I haven't tried it,
> > > but I wouldn't be surprised if it doesn't work, taking into account that
> > > the MS-introduced _DSM doesn't.
> > >   
> > > > 
> > > >   
> > > > > > >    Just compare with a graphics card design, where on device memory
> > > > > > >    is mapped directly at some GPA not wasting RAM that guest could
> > > > > > >    use for other tasks.      
> > > > > > 
> > > > > > This might have been true 20 years ago.  Most modern cards do DMA.    
> > > > > 
> > > > > Modern cards, with their own RAM, map their VRAM into the address space
> > > > > directly and allow users to use it (GEM API), so they do not waste
> > > > > conventional RAM. For example, NVIDIA VRAM is mapped as PCI BARs the same
> > > > > way as in this series (even the PCI class id is the same)
> > > > 
> > > > Don't know enough about graphics really, I'm not sure how these are
> > > > relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > > > mostly use guest RAM, not on card RAM.
> > > >   
> > > > > > >    VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> > > > > > >    instead of consuming guest's RAM they should be mapped at
> > > > > > >    some GPA and their memory accessed directly.      
> > > > > > 
> > > > > > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > > > > > address. This breaks the straight-forward approach of using a
> > > > > > rebalanceable PCI BAR.    
> > > > > 
> > > > > For PCI rebalancing to work on Windows, one has to provide a working PCI
> > > > > driver; otherwise the OS will ignore the device when rebalancing happens
> > > > > and might map something else over the ignored BAR.
> > > > 
> > > > Does it disable the BAR then? Or just move it elsewhere?  
> > > it doesn't; it just blindly ignores the BAR's existence and maps the BAR of
> > > another device (one with a driver) over it.
> > 
> > Interesting. On classical PCI this is a forbidden configuration.
> > Maybe we do something that confuses windows?
> > Could you tell me how to reproduce this behaviour?
> #cat > t << EOF
> pci_update_mappings_del
> pci_update_mappings_add
> EOF
> 
> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
>  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
>  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
>  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> 
> wait till the OS boots and note the BARs programmed for ivshmem;
>  in my case it was
>    01:01.0 0,0xfe800000+0x100
> then execute the script and watch the pci_update_mappings* trace events
> 
> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> 
> hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing, where
> Windows unmaps all BARs of the NICs on the bridge but doesn't touch ivshmem,
> and then programs new BARs, where:
>   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> creates overlapping BAR with ivshmem 


Thanks!
We need to figure this out, because currently this does not
work properly (or maybe it works, but merely by chance).
Marcel and I will play with this.

> 
> > 
> > > >   
> > > > > >     
> > > > > > >    In that case NVDIMM could even map whole label area and
> > > > > > >    significantly simplify QEMU<->OSPM protocol that currently
> > > > > > >    serializes that data through a 4K page.
> > > > > > >    There is also performance issue with buffer allocated in RAM,
> > > > > > >    because DMA adds unnecessary copying step when data could
> > > > > > >    be read/written directly of NVDIMM.
> > > > > > >    It might be no very important for _DSM interface but when it
> > > > > > >    comes to supporting block mode it can become an issue.      
> > > > > > 
> > > > > > So for NVDIMM, presumably it will have code access PCI BAR properly, so
> > > > > > it's guaranteed to work across BAR rebalancing.
> > > > > > Would that address the performance issue?    
> > > > > 
> > > > > it would if rebalancing were to account for driver-less PCI device BARs,
> > > > > but it doesn't; hence such BARs need to be statically pinned
> > > > > at the place where the BIOS put them at start-up.
> > > > > I'm also not sure that a PCI_Config operation region would work
> > > > > on Windows without a loaded driver (similar to the _DSM case).
> > > > > 
> > > > >     
> > > > > > > Above points make ACPI patching approach not robust and fragile
> > > > > > > and hard to maintain.      
> > > > > > 
> > > > > > Wrt GEN ID these are all kind of subjective though.  I especially don't
> > > > > > get what appears your general dislike of the linker host/guest
> > > > > > interface.    
> > > > > Besides the technical issues, the general dislike is just what I've written:
> > > > > the bios_linker_loader_add_pointer() interface is "not robust and fragile".
> > > > > 
> > > > > To make it less fragile:
> > > > >  1. it should be impossible to corrupt memory or patch a wrong address.
> > > > >     The current impl. silently relies on the value referenced by the
> > > > >     'pointer' argument, and to figure that out one has to read the linker
> > > > >     code on the BIOS side. That value could easily be set wrong and slip
> > > > >     through review.
> > > > 
> > > > That's an API issue, it seemed like a good idea but I guess
> > > > it confuses people. Would you be happier using an offset
> > > > instead of a pointer?  
> > > an offset is better, and it would be better still if it said
> > > which offset it is (i.e. relative to what)
> > 
> > 
> > Start of table, right?
> not sure, to me it looks like start of a blob and not the table

Right that's what I meant.

> > 
> > > >   
> > > > >     The API shouldn't rely on the caller setting the value pointed to by that argument.
> > > > 
> > > > I couldn't parse that one. Care to suggest a cleaner API for the linker?
> > > here is current API signature:
> > > 
> > > bios_linker_loader_add_pointer(GArray *linker,
> > >                                     const char *dest_file,
> > >                                     const char *src_file,
> > >                                     GArray *table, void *pointer,
> > >                                     uint8_t pointer_size)
> > > 
> > > issue 1:
> > > 'pointer' is a real pointer pointing inside 'table', and the API
> > > calculates the offset under the hood:
> > >   offset = (gchar *)pointer - table->data;
> > > and puts it in the ADD_POINTER command.
> > > 
> > > It's easy to get a wrong offset if 'pointer' is not from 'table'.
> > 
> > OK, replace that with table_offset?
> blob_offset?
> 
> also s/table/blob/


OK.

> > 
> > > issue 2:
> > > 'pointer' points to another offset of size 'pointer_size' inside the
> > > 'table' blob; that means whoever composes the blob has to be aware of
> > > it and fill in the correct value there, which is possible to do right
> > > only if one looks inside the SeaBIOS part of the linker interface.
> > > That is easy to forget, and then one has to deal with the mess
> > > caused by random memory corruption.
> > > 
> > > bios_linker_loader_add_pointer() and the corresponding
> > > ADD_POINTER command should take this second offset as an argument
> > > and not require 'table' to be pre-filled with it; or, in the worst
> > > case, if extending the ADD_POINTER command is problematic,
> > > bios_linker_loader_add_pointer() should still take
> > > the second offset and patch 'table' itself, so that the 'table'
> > > composer doesn't have to worry about it.
> > 
> > This one I don't understand. What's the second pointer you
> > are talking about?
> ha, see, even the author already has absolutely no clue how the linker works
> and what the offsets are relative to.
> see SeaBIOS romfile_loader_add_pointer():
>     ...
>     memcpy(&pointer, dest_file->data + offset, entry->pointer_size);
> here is the second offset       ^^^^^^^^^^^^^

It's the same offset in the entry.
        struct {
            char pointer_dest_file[ROMFILE_LOADER_FILESZ];
            char pointer_src_file[ROMFILE_LOADER_FILESZ];
            u32 pointer_offset;
            u8 pointer_size;
        };



> it should be a properly named field of the ADD_POINTER command,
> not part of the data blob.
> 
>     pointer = le64_to_cpu(pointer);
>     pointer += (unsigned long)src_file->data;
>     pointer = cpu_to_le64(pointer);
>     memcpy(dest_file->data + offset, &pointer, entry->pointer_size);
> 
> all this src|dst_file and the confusing offsets (whatever they might mean)
> give me a headache every time I need to remember how the linker
> works; I have to read both the QEMU and SeaBIOS code to figure it out each time.
> That's what I'd call an un-maintainable and hard-to-use API.

Right, there's a lack of documentation. It's my fault, so let's fix it.
It's an API issue, nothing to do with ABI.


> 
> > 
> > > issue 3:
> > > all patching obviously needs bounds checking on QEMU side
> > > so it would abort early if it could corrupt memory.  
> > 
> > That's easy.
> > 
> > > >   
> > > > >  2. If it's going to be used for patching AML, it should assert
> > > > >     when bios_linker_loader_add_pointer() is called if the to-be-patched
> > > > >     AML object is wrong and patching would corrupt the AML blob.
> > > > 
> > > > Hmm for example check that the patched data has
> > > > the expected pattern?  
> > > yep, nothing can be done for raw tables, but that should be possible
> > > for AML tables; and if the pattern is unsupported or the size doesn't match,
> > > it should abort QEMU early instead of corrupting the table.
> > 
> > Above all sounds reasonable. Would you like to take a stab
> > at it, or would you prefer me to?
> It would be better if it were you.
> 
> I wouldn't want to maintain it, ever, as it's too complex and hard-to-use
> an API, which I'd use only as a last resort if there weren't any other way
> to implement the task at hand.

Sorry, I don't get it. You don't like the API? Write a better one
for the existing ABI. If you prefer to wait for me to fix it,
that's fine too, but there are no guarantees that you will like the new one,
or about when it will happen.

Look, there has been one change (a bugfix for alignment) in the several years
since we added the linker. We don't maintain any compatibility flags
around it *at all*. It might have a hard-to-use API, but that is the
definition of easy to maintain. You are pushing allocating memory host-side
as an alternative; what happened there is the reverse: a ton of
changes and pain all the way, and we get to maintain a bag of compat
hacks for old machine types. You say we finally know what we are
doing and won't have to change it any more. I'm not convinced.

> > > >   
> > > > >     
> > > > > > It's there and we are not moving away from it, so why not
> > > > > > use it in more places?  Or if you think it's wrong, why don't you build
> > > > > > something better then?  We could then maybe use it for these things as
> > > > > > well.    
> > > > > 
> > > > > Yep, I think for vmgenid and even more so for nvdimm
> > > > > it would be better to allocate GPAs in QEMU and map backing
> > > > > MemoryRegions directly in QEMU.
> > > > > For nvdimm (main data region)
> > > > > we already do it using pc-dimm's GPA allocation algorithm, we also
> > > > > could use similar approach for nvdimm's label area and vmgenid.
> > > > > 
> > > > > Here is a simple attempt to add a limited GPA allocator in high memory
> > > > >  https://patchwork.ozlabs.org/patch/540852/
> > > > > But it hasn't got any comments from you and was ignored.
> > > > > Let's consider it; perhaps we could come up with a GPA allocator
> > > > > that could be used for other things as well.
> > > > 
> > > > For nvdimm label area, I agree passing things through
> > > > a 4K buffer seems inefficient.
> > > > 
> > > > I'm not sure what's a better way though.
> > > > 
> > > > Use 64 bit memory? Setting aside old guests such as XP,
> > > > does it break 32 bit guests?  
> > > it might not work with 32-bit guests, the same way mem hotplug
> > > doesn't work for them unless they are PAE-enabled.
> > 
> > Right, I mean with PAE.
> I've tested it with 32-bit XP and Windows 10; they boot fine, and the
> vmgenid device is displayed as OK with the buffer above 4Gb (on Win10).
> So at least it doesn't crash the guest.
> I can't test more than that for 32-bit guests, since the utility
> to read vmgenid works only on Windows Server and there isn't
> a 32-bit version of it.
> 
> > 
> > > but well, that's a limitation of the implementation, and considering
> > > that the nvdimm storage area is mapped at a 64-bit GPA, it doesn't matter.
> > >    
> > > > I'm really afraid of adding yet another allocator; I think you
> > > > underestimate the maintenance headache: it's not theoretical and is
> > > > already felt.
> > > The current maintenance headache is due to the fixed, hand-picked
> > > memory layout; we can't do much about it for legacy machine
> > > types, but with a QEMU-side GPA allocator we can try to switch
> > > to a flexible memory layout that allocates GPAs
> > > depending on the QEMU config in a stable manner.
> > 
> > So far, we haven't managed to. It seems to go in the reverse
> > direction, where we add more and more control to let management
> > influence the layout. Things like alignment requirements
> > also tend to surface later and wreak havoc on whatever
> > we do.
> Was there even an attempt to try it before? Could you point to it?

Look at the mess we have with the existing allocator.
As a way to fix this unmaintainable mess, what I see is suggestions
to drop old machine types so people have to reinstall guests.
This does not inspire confidence.

> The only attempt I've seen was https://patchwork.ozlabs.org/patch/540852/
> but it hasn't got any technical comments from you,
> except for 'I'm afraid that it won't work' on IRC.
> 
> QEMU already has a GPA allocator limited to the memory hotplug AS,
> and it has passed through its 'growing' issues. What the above patch
> proposes is to reuse the already existing memory hotplug AS and
> maybe make its GPA allocator more generic (i.e. not tied
> only to pc-dimm) on top of it.

You say we finally know what we are doing and won't have to change it
any more. I'm not convinced.

> 
> It's sufficient for the vmgenid use-case and definitely
> much more suitable for nvdimm, which already uses it for mapping
> the main storage MemoryRegion.
> 
> > > well, there is a maintenance headache with bios_linker as well,
> > > due to its complexity (multiple layers of indirection), and
> > > it will grow as more places try to use it.
> > > Yep, we could use it as a hack, stealing RAM and trying to implement
> > > backwards DMA, or we could be less afraid and consider
> > > yet another allocator which will do the job without hacks,
> > > which should benefit QEMU in the long run (it might not be easy
> > > to implement it right, but if we don't even try we will be buried
> > > in complex hacks that 'work' for now)
> > >   
> > > > > >     
> > > > > > >       
> > > > > > > >       
> > > > > > > > > > And hey, if you want to use a pci device to pass the physical
> > > > > > > > > > address guest to host, instead of reserving
> > > > > > > > > > a couple of IO addresses, sure, stick it in pci config in
> > > > > > > > > > a vendor-specific capability, this way it'll get migrated
> > > > > > > > > > automatically.        
> > > > > > > > > Could you elaborate more on this suggestion?        
> > > > > > > > 
> > > > > > > > I really just mean using PCI_Config operation region.
> > > > > > > > If you wish, I'll try to post a prototype next week.      
> > > > > > > I don't know much about PCI, but it would be interesting;
> > > > > > > perhaps we could use it somewhere else.
> > > > > > > 
> > > > > > > However, it should be checked whether it works with Windows;
> > > > > > > for example, the PCI-specific _DSM method is ignored by it
> > > > > > > if the PCI device doesn't have a working PCI driver bound to it.
> > > > > > >       
> > > > > > > >       
> > > > > > > > > > 
> > > > > > > > > >         
> > > > > > > > > > > ---
> > > > > > > > > > > changes since 17:
> > > > > > > > > > >   - small fixups suggested in v14 review by Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > > >   - make BAR prefetchable to make region cached as per MS spec
> > > > > > > > > > >   - s/uuid/guid/ to match spec
> > > > > > > > > > > changes since 14:
> > > > > > > > > > >   - reserve BAR resources so that Windows won't touch it
> > > > > > > > > > >     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > > >   - ACPI: split the VGEN device off the PCI device descriptor
> > > > > > > > > > >     and place it at PCI0 scope, so that there is no need to trace
> > > > > > > > > > >     its location on PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > > >   - permit only one vmgenid to be created
> > > > > > > > > > >   - enable BAR be mapped above 4Gb if it can't be mapped at low mem
> > > > > > > > > > > ---
> > > > > > > > > > >  default-configs/i386-softmmu.mak   |   1 +
> > > > > > > > > > >  default-configs/x86_64-softmmu.mak |   1 +
> > > > > > > > > > >  docs/specs/pci-ids.txt             |   1 +
> > > > > > > > > > >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> > > > > > > > > > >  hw/misc/Makefile.objs              |   1 +
> > > > > > > > > > >  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
> > > > > > > > > > >  include/hw/misc/vmgenid.h          |  27 +++++++
> > > > > > > > > > >  include/hw/pci/pci.h               |   1 +
> > > > > > > > > > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > > > > > > > > > >  create mode 100644 hw/misc/vmgenid.c
> > > > > > > > > > >  create mode 100644 include/hw/misc/vmgenid.h
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > > > > > > > > > > index b177e52..6402439 100644
> > > > > > > > > > > --- a/default-configs/i386-softmmu.mak
> > > > > > > > > > > +++ b/default-configs/i386-softmmu.mak
> > > > > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > > > > +CONFIG_VMGENID=y
> > > > > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > > > > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > index 6e3b312..fdac18f 100644
> > > > > > > > > > > --- a/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > +++ b/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > > > > +CONFIG_VMGENID=y
> > > > > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > > > > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > > > > > > > > > > index 0adcb89..e65ecf9 100644
> > > > > > > > > > > --- a/docs/specs/pci-ids.txt
> > > > > > > > > > > +++ b/docs/specs/pci-ids.txt
> > > > > > > > > > > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> > > > > > > > > > >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> > > > > > > > > > >  1b36:0006  PCI Rocker Ethernet switch device
> > > > > > > > > > >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > > > > > > > > > > +1b36:0009  PCI VM-Generation device
> > > > > > > > > > >  1b36:000a  PCI-PCI bridge (multiseat)
> > > > > > > > > > >  
> > > > > > > > > > >  All these devices are documented in docs/specs.
> > > > > > > > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > > > > > > > index 78758e2..0187262 100644
> > > > > > > > > > > --- a/hw/i386/acpi-build.c
> > > > > > > > > > > +++ b/hw/i386/acpi-build.c
> > > > > > > > > > > @@ -44,6 +44,7 @@
> > > > > > > > > > >  #include "hw/acpi/tpm.h"
> > > > > > > > > > >  #include "sysemu/tpm_backend.h"
> > > > > > > > > > >  #include "hw/timer/mc146818rtc_regs.h"
> > > > > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > > > >  
> > > > > > > > > > >  /* Supported chipsets: */
> > > > > > > > > > >  #include "hw/acpi/piix4.h"
> > > > > > > > > > > @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> > > > > > > > > > >      info->applesmc_io_base = applesmc_port();
> > > > > > > > > > >  }
> > > > > > > > > > >  
> > > > > > > > > > > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > > > > > > > > > > +{
> > > > > > > > > > > +    Aml *dev, *pkg, *crs;
> > > > > > > > > > > +
> > > > > > > > > > > +    dev = aml_device("VGEN");
> > > > > > > > > > > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> > > > > > > > > > > +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> > > > > > > > > > > +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> > > > > > > > > > > +
> > > > > > > > > > > +    pkg = aml_package(2);
> > > > > > > > > > > +    /* low 32 bits of UUID buffer addr */
> > > > > > > > > > > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > > > > > > > > > > +    /* high 32 bits of UUID buffer addr */
> > > > > > > > > > > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > > > > > > > > > > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > > > > > > > > > > +
> > > > > > > > > > > +    /*
> > > > > > > > > > > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> > > > > > > > > > > +     * displays it as "PCI RAM controller" which is marked as NO_DRV
> > > > > > > > > > > +     * so Windows ignores VMGEN device completely and doesn't check
> > > > > > > > > > > +     * for resource conflicts which during PCI rebalancing can lead
> > > > > > > > > > > +     * to another PCI device claiming ignored BARs. To prevent this
> > > > > > > > > > > +     * statically reserve resources used by VM_Gen_Counter.
> > > > > > > > > > > +     * For more verbose comment see this commit message.          
> > > > > > > > > > 
> > > > > > > > > > What does "this commit message" mean?        
> > > > > > > > > the above commit message. Should I reword it to just 'see commit message'?
> > > > > > > > >         
> > > > > > > > > >         
> > > > > > > > > > > +     */
> > > > > > > > > > > +     crs = aml_resource_template();
> > > > > > > > > > > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> > > > > > > > > > > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> > > > > > > > > > > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> > > > > > > > > > > +                VMGENID_VMGID_BUF_SIZE));
> > > > > > > > > > > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > > > > > > > > > > +     return dev;
> > > > > > > > > > > +}
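[Editor's note] The ADDR package built above hands the guest the buffer's physical address as two 32-bit integers, exactly the split done with `buf_paddr & 0xFFFFFFFFUL` and `buf_paddr >> 32`. A minimal C sketch of the split and the recombination a consumer would perform (helper names are illustrative, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Low half: what goes into the first ADDR package element. */
static uint32_t addr_lo(uint64_t paddr)
{
    return (uint32_t)(paddr & 0xFFFFFFFFULL);
}

/* High half: what goes into the second ADDR package element. */
static uint32_t addr_hi(uint64_t paddr)
{
    return (uint32_t)(paddr >> 32);
}

/* A consumer recombines the two halves into the original address. */
static uint64_t addr_join(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```

The round trip holds for any 64-bit address, which matters since the series allows the VGID page to be mapped in high memory as well as low memory.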
> > > > > > > > > > > +
> > > > > > > > > > >  /*
> > > > > > > > > > >   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
> > > > > > > > > > >   * On i386 arch we only have two pci hosts, so we can look only for them.
> > > > > > > > > > > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > > > > >              }
> > > > > > > > > > >  
> > > > > > > > > > >              if (bus) {
> > > > > > > > > > > +                Object *vmgen;
> > > > > > > > > > >                  Aml *scope = aml_scope("PCI0");
> > > > > > > > > > >                  /* Scan all PCI buses. Generate tables to support hotplug. */
> > > > > > > > > > >                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> > > > > > > > > > > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > > > > >                      aml_append(scope, dev);
> > > > > > > > > > >                  }
> > > > > > > > > > >  
> > > > > > > > > > > +                vmgen = find_vmgneid_dev(NULL);
> > > > > > > > > > > +                if (vmgen) {
> > > > > > > > > > > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > > > > > > > > > > +                    uint64_t buf_paddr =
> > > > > > > > > > > +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> > > > > > > > > > > +
> > > > > > > > > > > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > > > > > > > > > > +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> > > > > > > > > > > +
> > > > > > > > > > > +                        method = aml_method("\\_GPE._E00", 0,
> > > > > > > > > > > +                                            AML_NOTSERIALIZED);
> > > > > > > > > > > +                        aml_append(method,
> > > > > > > > > > > +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > > > > > > > > > > +                                       aml_int(0x80)));
> > > > > > > > > > > +                        aml_append(ssdt, method);
> > > > > > > > > > > +                    }
> > > > > > > > > > > +                }
> > > > > > > > > > > +
> > > > > > > > > > >                  aml_append(sb_scope, scope);
> > > > > > > > > > >              }
> > > > > > > > > > >          }
> > > > > > > > > > > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
> > > > > > > > > > >      {
> > > > > > > > > > >          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
> > > > > > > > > > >  
> > > > > > > > > > > -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> > > > > > > > > > > -
> > > > > > > > > > >          if (misc->is_piix4) {
> > > > > > > > > > >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> > > > > > > > > > >              aml_append(method,
> > > > > > > > > > > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > > > > > > > > > > index d4765c2..1f05edd 100644
> > > > > > > > > > > --- a/hw/misc/Makefile.objs
> > > > > > > > > > > +++ b/hw/misc/Makefile.objs
> > > > > > > > > > > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> > > > > > > > > > >  
> > > > > > > > > > >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> > > > > > > > > > >  obj-$(CONFIG_EDU) += edu.o
> > > > > > > > > > > +obj-$(CONFIG_VMGENID) += vmgenid.o
> > > > > > > > > > >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > > > > > > > > > > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > > > > > > > > > > new file mode 100644
> > > > > > > > > > > index 0000000..a2fbdfc
> > > > > > > > > > > --- /dev/null
> > > > > > > > > > > +++ b/hw/misc/vmgenid.c
> > > > > > > > > > > @@ -0,0 +1,154 @@
> > > > > > > > > > > +/*
> > > > > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > > > > + *
> > > > > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > > > > + *
> > > > > > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > > > + *
> > > > > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > > > + *
> > > > > > > > > > > + */
> > > > > > > > > > > +
> > > > > > > > > > > +#include "hw/i386/pc.h"
> > > > > > > > > > > +#include "hw/pci/pci.h"
> > > > > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > > > > +#include "hw/acpi/acpi.h"
> > > > > > > > > > > +#include "qapi/visitor.h"
> > > > > > > > > > > +
> > > > > > > > > > > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> > > > > > > > > > > +
> > > > > > > > > > > +typedef struct VmGenIdState {
> > > > > > > > > > > +    PCIDevice parent_obj;
> > > > > > > > > > > +    MemoryRegion iomem;
> > > > > > > > > > > +    union {
> > > > > > > > > > > +        uint8_t guid[16];
> > > > > > > > > > > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > > > > > > > > > > +    };
> > > > > > > > > > > +    bool guid_set;
> > > > > > > > > > > +} VmGenIdState;
> > > > > > > > > > > +
> > > > > > > > > > > +Object *find_vmgneid_dev(Error **errp)
> > > > > > > > > > > +{
> > > > > > > > > > > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> > > > > > > > > > > +    if (!obj) {
> > > > > > > > > > > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > > > > > > > > > > +    }
> > > > > > > > > > > +    return obj;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static void vmgenid_update_guest(VmGenIdState *s)
> > > > > > > > > > > +{
> > > > > > > > > > > +    Object *acpi_obj;
> > > > > > > > > > > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > > > > > > > > > > +
> > > > > > > > > > > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > > > > > > > > > > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > > > > > > > > > > +
> > > > > > > > > > > +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> > > > > > > > > > > +    if (acpi_obj) {
> > > > > > > > > > > +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > > > > > > > > > > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > > > > > > > > > > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > > > > > > > > > > +
> > > > > > > > > > > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > > > > > > > > > > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > > > > > > > > > > +    }
> > > > > > > > > > > +}
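[Editor's note] For readers unfamiliar with the GPE plumbing used here: setting bit 0 of GPE0_STS only results in an SCI if the corresponding enable bit is set, and acpi_update_sci() effectively recomputes the interrupt level from STS AND EN. A simplified model of that logic (struct and field names are illustrative, not QEMU's actual types):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified GPE block model: the SCI line level follows
 * (sts & en) != 0, which is what acpi_update_sci() recomputes
 * after a status bit changes. */
typedef struct {
    uint8_t sts; /* GPE0_STS: latched event status bits */
    uint8_t en;  /* GPE0_EN: guest-controlled enable bits */
} GpeModel;

static int gpe_sci_level(const GpeModel *g)
{
    return (g->sts & g->en) != 0;
}

static void gpe_raise_event(GpeModel *g, int bit)
{
    g->sts |= 1u << bit; /* bit 0 corresponds to the _GPE._E00 handler */
}
```

So the interrupt only fires once the guest's OSPM has enabled GPE 0; until then the status bit just stays latched.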
> > > > > > > > > > > +
> > > > > > > > > > > +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> > > > > > > > > > > +{
> > > > > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > > > > +
> > > > > > > > > > > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID
> > > > > > > > > > > +                   "': Failed to parse GUID string: %s",
> > > > > > > > > > > +                   object_get_typename(OBJECT(s)),
> > > > > > > > > > > +                   value);
> > > > > > > > > > > +        return;
> > > > > > > > > > > +    }
> > > > > > > > > > > +
> > > > > > > > > > > +    s->guid_set = true;
> > > > > > > > > > > +    vmgenid_update_guest(s);
> > > > > > > > > > > +}
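[Editor's note] qemu_uuid_parse() above expects the canonical 8-4-4-4-12 GUID string form. As a rough, self-contained illustration of that parsing step (this is not QEMU's implementation, and it ignores the byte-ordering subtleties of the real helper):

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for qemu_uuid_parse(): parse a canonical
 * "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" string into 16 bytes.
 * Returns 0 on success, -1 on a malformed string. */
static int guid_parse(const char *str, uint8_t guid[16])
{
    unsigned b[16];
    int n = sscanf(str,
        "%2x%2x%2x%2x-%2x%2x-%2x%2x-%2x%2x-%2x%2x%2x%2x%2x%2x",
        &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &b[6], &b[7],
        &b[8], &b[9], &b[10], &b[11], &b[12], &b[13], &b[14], &b[15]);
    if (n != 16) {
        return -1;
    }
    for (int i = 0; i < 16; i++) {
        guid[i] = (uint8_t)b[i];
    }
    return 0;
}
```

A failed parse leaves the property unset, which is why the realize hook above can reject a device whose guid was never successfully assigned.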
> > > > > > > > > > > +
> > > > > > > > > > > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> > > > > > > > > > > +                                   const char *name, Error **errp)
> > > > > > > > > > > +{
> > > > > > > > > > > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > > > > > > > > > > +
> > > > > > > > > > > +    if (value == PCI_BAR_UNMAPPED) {
> > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> > > > > > > > > > > +                   object_get_typename(OBJECT(obj)));
> > > > > > > > > > > +        return;
> > > > > > > > > > > +    }
> > > > > > > > > > > +    visit_type_int(v, &value, name, errp);
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static void vmgenid_initfn(Object *obj)
> > > > > > > > > > > +{
> > > > > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > > > > +
> > > > > > > > > > > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> > > > > > > > > > > +                           &error_abort);
> > > > > > > > > > > +
> > > > > > > > > > > +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> > > > > > > > > > > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > > > > > > > > > > +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +
> > > > > > > > > > > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > > > > > > > > > > +{
> > > > > > > > > > > +    VmGenIdState *s = VMGENID(dev);
> > > > > > > > > > > +    bool ambiguous = false;
> > > > > > > > > > > +
> > > > > > > > > > > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > > > > > > > > > > +    if (ambiguous) {
> > > > > > > > > > > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > > > > > > > > > > +                         " device is permitted");
> > > > > > > > > > > +        return;
> > > > > > > > > > > +    }
> > > > > > > > > > > +
> > > > > > > > > > > +    if (!s->guid_set) {
> > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> > > > > > > > > > > +                   object_get_typename(OBJECT(s)));
> > > > > > > > > > > +        return;
> > > > > > > > > > > +    }
> > > > > > > > > > > +
> > > > > > > > > > > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > > > > > > > > > > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > > > > > > > > > > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > > > > > > > > > > +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> > > > > > > > > > > +        &s->iomem);
> > > > > > > > > > > +    return;
> > > > > > > > > > > +}
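[Editor's note] The BAR registration above combines three low-order BAR flag bits. The numeric encodings below follow the standard PCI BAR layout (bit 0: space indicator, bits 2:1: type, bit 3: prefetchable), mirroring the pci_regs.h macros; the local names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI BAR low-bit encodings (see PCI spec / pci_regs.h). */
#define BAR_SPACE_MEMORY 0x00u /* bit 0 clear: memory space, not I/O */
#define BAR_TYPE_64      0x04u /* bits 2:1 = 10b: 64-bit BAR */
#define BAR_PREFETCH     0x08u /* bit 3: prefetchable (cacheable) */

/* The flag combination vmgenid_realize() passes to pci_register_bar(). */
static uint32_t vmgenid_bar_flags(void)
{
    return BAR_PREFETCH | BAR_SPACE_MEMORY | BAR_TYPE_64;
}
```

The prefetchable bit is what the v19 changelog refers to with "make BAR prefetchable to meet cached req of MS spec"; the 64-bit type is what lets the VGID page land in high memory.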
> > > > > > > > > > > +
> > > > > > > > > > > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > > > > > > > > > > +{
> > > > > > > > > > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > > > > > > > > > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > > > > > > > > > +
> > > > > > > > > > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > > > > > > > > > +    dc->hotpluggable = false;
> > > > > > > > > > > +    k->realize = vmgenid_realize;
> > > > > > > > > > > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > > > > > > > > > > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > > > > > > > > > > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static const TypeInfo vmgenid_device_info = {
> > > > > > > > > > > +    .name          = VMGENID_DEVICE,
> > > > > > > > > > > +    .parent        = TYPE_PCI_DEVICE,
> > > > > > > > > > > +    .instance_size = sizeof(VmGenIdState),
> > > > > > > > > > > +    .instance_init = vmgenid_initfn,
> > > > > > > > > > > +    .class_init    = vmgenid_class_init,
> > > > > > > > > > > +};
> > > > > > > > > > > +
> > > > > > > > > > > +static void vmgenid_register_types(void)
> > > > > > > > > > > +{
> > > > > > > > > > > +    type_register_static(&vmgenid_device_info);
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +type_init(vmgenid_register_types)
> > > > > > > > > > > diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> > > > > > > > > > > new file mode 100644
> > > > > > > > > > > index 0000000..b90882c
> > > > > > > > > > > --- /dev/null
> > > > > > > > > > > +++ b/include/hw/misc/vmgenid.h
> > > > > > > > > > > @@ -0,0 +1,27 @@
> > > > > > > > > > > +/*
> > > > > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > > > > + *
> > > > > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > > > > + *
> > > > > > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > > > + *
> > > > > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > > > + *
> > > > > > > > > > > + */
> > > > > > > > > > > +
> > > > > > > > > > > +#ifndef HW_MISC_VMGENID_H
> > > > > > > > > > > +#define HW_MISC_VMGENID_H
> > > > > > > > > > > +
> > > > > > > > > > > +#include "qom/object.h"
> > > > > > > > > > > +
> > > > > > > > > > > +#define VMGENID_DEVICE           "vmgenid"
> > > > > > > > > > > +#define VMGENID_GUID             "guid"
> > > > > > > > > > > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > > > > > > > > > > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > > > > > > > > > > +#define VMGENID_VMGID_BUF_BAR    0
> > > > > > > > > > > +
> > > > > > > > > > > +Object *find_vmgneid_dev(Error **errp);
> > > > > > > > > > > +
> > > > > > > > > > > +#endif
> > > > > > > > > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > > > > > > > > index dedf277..f4c9d48 100644
> > > > > > > > > > > --- a/include/hw/pci/pci.h
> > > > > > > > > > > +++ b/include/hw/pci/pci.h
> > > > > > > > > > > @@ -94,6 +94,7 @@
> > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > > > > > > > > > > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> > > > > > > > > > >  
> > > > > > > > > > >  #define FMT_PCIBUS                      PRIx64
> > > > > > > > > > > -- 
> > > > > > > > > > > 1.8.3.1          
> > > > > >     
> > 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-11 16:30                       ` Michael S. Tsirkin
@ 2016-02-11 17:34                         ` Marcel Apfelbaum
  2016-02-12  6:15                           ` Michael S. Tsirkin
  2016-02-15 10:30                         ` Igor Mammedov
  2016-02-16 10:05                         ` Marcel Apfelbaum
  2 siblings, 1 reply; 59+ messages in thread
From: Marcel Apfelbaum @ 2016-02-11 17:34 UTC (permalink / raw)
  To: Michael S. Tsirkin, Igor Mammedov
  Cc: Xiao Guangrong, ehabkost, Marcel Apfelbaum, ghammer, qemu-devel,
	lcapitulino, lersek

On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:
> On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
>> On Tue, 9 Feb 2016 14:17:44 +0200
>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>
>>> On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
>>>>> So the linker interface solves this rather neatly:
>>>>> bios allocates memory, bios passes memory map to guest.
>>>>> Served us well for several years without need for extensions,
>>>>> and it does solve the VM GEN ID problem, even though
>>>>> 1. it was never designed for huge areas like nvdimm seems to want to use
>>>>> 2. we might want to add a new 64 bit flag to avoid touching low memory
>>>> The linker interface is fine for some read-only data, like ACPI tables,
>>>> especially fixed tables, but not so for AML ones if one wants to patch them.
>>>>
>>>> However now when you want to use it for other purposes you start
>>>> adding extensions and other guest->QEMU channels to communicate
>>>> patching info back.
>>>> It steals guest's memory which is also not nice and doesn't scale well.
>>>
>>> This is an argument I don't get. memory is memory. call it guest memory
>>> or RAM backed PCI BAR - same thing. MMIO is cheaper of course
>>> but much slower.
>>>
>>> ...
>> It matters for the user, however: they pay for a guest with XXX RAM but get less
>> than that. And that will keep getting worse as the number of such devices
>> increases.
>>
>>>>> OK fine, but returning PCI BAR address to guest is wrong.
>>>>> How about reading it from ACPI then? Is it really
>>>>> broken unless there's *also* a driver?
>>>> I don't get the question; the MS spec requires an address (ADDR method),
>>>> and it's read by ACPI (AML).
>>>
>>> You were unhappy about DMA into guest memory.
>>> As a replacement for DMA, we could have AML read from
>>> e.g. PCI and write into RAM.
>>> This way we don't need to pass address to QEMU.
>> That sounds better, as it saves us from allocating an IO port,
>> and QEMU doesn't need to write into guest memory; the only question is
>> whether a PCI_Config opregion would work with a driver-less PCI device.
>
> Or PCI BAR for that reason. I don't know for sure.
>
>>
>> And it's still pretty much not testable, since it would require a
>> fully running OSPM to execute the AML side.
>
> AML is not testable, but that's nothing new.
> You can test reading from PCI.
>
>>>
>>>> As for a PCI_Config OpRegion working without a driver, I haven't tried,
>>>> but I wouldn't be surprised if it doesn't, taking into account that
>>>> the MS-introduced _DSM doesn't.
>>>>
>>>>>
>>>>>
>>>>>>>>     Just compare with a graphics card design, where on-device memory
>>>>>>>>     is mapped directly at some GPA, not wasting RAM that the guest could
>>>>>>>>     use for other tasks.
>>>>>>>
>>>>>>> This might have been true 20 years ago.  Most modern cards do DMA.
>>>>>>
>>>>>> Modern cards, with their own RAM, map their VRAM in the address space directly
>>>>>> and allow users to use it (GEM API). So they do not waste conventional RAM.
>>>>>> For example, NVIDIA VRAM is mapped as PCI BARs the same way as in this
>>>>>> series (even the PCI class id is the same).
>>>>>
>>>>> Don't know enough about graphics really, I'm not sure how these are
>>>>> relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
>>>>> mostly use guest RAM, not on card RAM.
>>>>>
>>>>>>>>     VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
>>>>>>>>     instead of consuming guest's RAM they should be mapped at
>>>>>>>>     some GPA and their memory accessed directly.
>>>>>>>
>>>>>>> VMGENID is tied to a spec that rather arbitrarily asks for a fixed
>>>>>>> address. This breaks the straight-forward approach of using a
>>>>>>> rebalanceable PCI BAR.
>>>>>>
>>>>>> For PCI rebalancing to work on Windows, one has to provide a working PCI driver;
>>>>>> otherwise the OS will ignore the device when rebalancing happens and
>>>>>> might map something else over the ignored BAR.
>>>>>
>>>>> Does it disable the BAR then? Or just move it elsewhere?
>>>> it doesn't; it just blindly ignores the BAR's existence and maps the BAR of
>>>> another device (one with a driver) over it.
>>>
>>> Interesting. On classical PCI this is a forbidden configuration.
>>> Maybe we do something that confuses windows?
>>> Could you tell me how to reproduce this behaviour?
>> #cat > t << EOF
>> pci_update_mappings_del
>> pci_update_mappings_add
>> EOF
>>
>> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
>>   -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
>>   -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
>>   -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
>>
>> wait till OS boots, note BARs programmed for ivshmem
>>   in my case it was
>>     01:01.0 0,0xfe800000+0x100
>> then execute script and watch pci_update_mappings* trace events
>>
>> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
>>
>> hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
>> Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
>> and then programs new BARs, where:
>>    pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
>> creates overlapping BAR with ivshmem
>

Hi,

Let me see if I understand.
You are saying that in Windows, if a device does not have a driver installed,
its BAR ranges can be reused by other devices after rebalancing, right?

If so, the device cannot be used in Windows anyway, so we shouldn't care, right?

Then our only remaining problem is the overlapping memory regions of
the old device and the new one, and we need to ensure that only the new device
uses these regions?


Thanks,
Marcel

>
> Thanks!
> We need to figure this out because currently this does not
> work properly (or maybe it works, but merely by chance).
> Marcel and I will play with this.
>
[...]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-05  7:51                     ` Marcel Apfelbaum
@ 2016-02-11 19:41                       ` Eduardo Habkost
  2016-02-12  9:17                         ` Marcel Apfelbaum
  0 siblings, 1 reply; 59+ messages in thread
From: Eduardo Habkost @ 2016-02-11 19:41 UTC (permalink / raw)
  To: Marcel Apfelbaum
  Cc: Peter Maydell, qemu-devel, Michael S. Tsirkin, ghammer, agraf,
	Markus Armbruster, borntraeger, qemu-ppc, Gerd Hoffmann, david,
	Igor Mammedov, cornelia.huck, Paolo Bonzini, lcapitulino, lersek,
	Andreas Färber, rth

On Fri, Feb 05, 2016 at 09:51:07AM +0200, Marcel Apfelbaum wrote:
> On 02/05/2016 09:49 AM, Markus Armbruster wrote:
> >"Michael S. Tsirkin" <mst@redhat.com> writes:
> >
> >>On Thu, Feb 04, 2016 at 12:55:22PM +0100, Paolo Bonzini wrote:
> >>>
> >>>
> >>>On 04/02/2016 12:41, Andreas Färber wrote:
> >>>>You're talking about machine, right? Some time ago I had proposed Marcel
> >>>>who initially worked on it, but I'm fine with anyone taking it.
> >>>
> >>>Yes.
> >>>
> >>>>For some (but not all) core qdev parts related to the (stalled) QOM
> >>>>migration I've been taking care of via qom-next. Last time this came up
> >>>>you didn't want anyone to be M: for qdev, so maybe we can use R: so that
> >>>>at least people automatically get CC'ed and we avoid this recurring
> >>>>discussion?
> >>>
> >>>I might have changed my mind on that.  You definitely should be M: for qdev.
> >>>
> >>>Paolo
> >>
> >>If Andreas wants to, that's also fine. Several maintainers are
> >>better than one.
> >
> >*If* the maintainers are all willing and able to work together.
> >
> 
> No problem here from my point of view :)

No problem for me either. :)

I am going to be away from work for 15 days starting on Tuesday
Feb 16th. So if Marcel wants to start queueing patches already,
please be my guest. I will be able to help on that after I'm
back.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-11 17:34                         ` Marcel Apfelbaum
@ 2016-02-12  6:15                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-02-12  6:15 UTC (permalink / raw)
  To: Marcel Apfelbaum
  Cc: Xiao Guangrong, ehabkost, Marcel Apfelbaum, ghammer, qemu-devel,
	lcapitulino, Igor Mammedov, lersek

On Thu, Feb 11, 2016 at 07:34:52PM +0200, Marcel Apfelbaum wrote:
> On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:
> >On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
> >>On Tue, 9 Feb 2016 14:17:44 +0200
> >>"Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>
> >>>On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
> >>>>>So the linker interface solves this rather neatly:
> >>>>>bios allocates memory, bios passes memory map to guest.
> >>>>>Served us well for several years without need for extensions,
> >>>>>and it does solve the VM GEN ID problem, even though
> >>>>>1. it was never designed for huge areas like nvdimm seems to want to use
> >>>>>2. we might want to add a new 64 bit flag to avoid touching low memory
> >>>>The linker interface is fine for some read-only data, like ACPI tables,
> >>>>especially fixed tables, but not so for AML ones if one wants to patch them.
> >>>>
> >>>>However now when you want to use it for other purposes you start
> >>>>adding extensions and other guest->QEMU channels to communicate
> >>>>patching info back.
> >>>>It steals guest's memory which is also not nice and doesn't scale well.
> >>>
> >>>This is an argument I don't get. memory is memory. call it guest memory
> >>>or RAM backed PCI BAR - same thing. MMIO is cheaper of course
> >>>but much slower.
> >>>
> >>>...
> >>It matters for the user, however: they pay for a guest with XXX RAM but get less
> >>than that. And that will keep getting worse as the number of such devices
> >>increases.
> >>
> >>>>>OK fine, but returning PCI BAR address to guest is wrong.
> >>>>>How about reading it from ACPI then? Is it really
> >>>>>broken unless there's *also* a driver?
> >>>>I don't get the question; the MS spec requires an address (ADDR method),
> >>>>and it's read by ACPI (AML).
> >>>
> >>>You were unhappy about DMA into guest memory.
> >>>As a replacement for DMA, we could have AML read from
> >>>e.g. PCI and write into RAM.
> >>>This way we don't need to pass address to QEMU.
> >>That sounds better, as it saves us from allocating an IO port,
> >>and QEMU doesn't need to write into guest memory; the only question is
> >>whether a PCI_Config opregion would work with a driver-less PCI device.
> >
> >Or PCI BAR for that reason. I don't know for sure.
> >
> >>
> >>And it's still pretty much not testable, since it would require a
> >>fully running OSPM to execute the AML side.
> >
> >AML is not testable, but that's nothing new.
> >You can test reading from PCI.
> >
> >>>
> >>>>As for a PCI_Config OpRegion working without a driver, I haven't tried,
> >>>>but I wouldn't be surprised if it doesn't, taking into account that
> >>>>the MS-introduced _DSM doesn't.
> >>>>
> >>>>>
> >>>>>
> >>>>>>>>    Just compare with a graphics card design, where on-device memory
> >>>>>>>>    is mapped directly at some GPA, not wasting RAM that the guest could
> >>>>>>>>    use for other tasks.
> >>>>>>>
> >>>>>>>This might have been true 20 years ago.  Most modern cards do DMA.
> >>>>>>
> >>>>>>Modern cards, with their own RAM, map their VRAM in the address space directly
> >>>>>>and allow users to use it (GEM API). So they do not waste conventional RAM.
> >>>>>>For example, NVIDIA VRAM is mapped as PCI BARs the same way as in this
> >>>>>>series (even the PCI class id is the same).
> >>>>>
> >>>>>Don't know enough about graphics really, I'm not sure how these are
> >>>>>relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> >>>>>mostly use guest RAM, not on card RAM.
> >>>>>
> >>>>>>>>    VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> >>>>>>>>    instead of consuming guest's RAM they should be mapped at
> >>>>>>>>    some GPA and their memory accessed directly.
> >>>>>>>
> >>>>>>>VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> >>>>>>>address. This breaks the straight-forward approach of using a
> >>>>>>>rebalanceable PCI BAR.
> >>>>>>
> >>>>>>For PCI rebalancing to work on Windows, one has to provide a working PCI driver;
> >>>>>>otherwise the OS will ignore the device when rebalancing happens and
> >>>>>>might map something else over the ignored BAR.
> >>>>>
> >>>>>Does it disable the BAR then? Or just move it elsewhere?
> >>>>it doesn't; it just blindly ignores the BAR's existence and maps the BAR of
> >>>>another device (one with a driver) over it.
> >>>
> >>>Interesting. On classical PCI this is a forbidden configuration.
> >>>Maybe we do something that confuses windows?
> >>>Could you tell me how to reproduce this behaviour?
> >>#cat > t << EOF
> >>pci_update_mappings_del
> >>pci_update_mappings_add
> >>EOF
> >>
> >>#./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> >>  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> >>  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> >>  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> >>
> >>wait till OS boots, note BARs programmed for ivshmem
> >>  in my case it was
> >>    01:01.0 0,0xfe800000+0x100
> >>then execute script and watch pci_update_mappings* trace events
> >>
> >># for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> >>
> >>hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
> >>Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
> >>and then programs new BARs, where:
> >>   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> >>creates overlapping BAR with ivshmem
> >
> 
> Hi,
> 
> Let me see if I understand.
> You are saying that in Windows, if a device does not have a driver installed,
> its BAR ranges can be reused by other devices after rebalancing, right?
> 
> If so, the device cannot be used in Windows anyway, so we shouldn't care, right?

If the e1000 (which has a driver) overlaps ivshmem (no driver), we have
a problem: the e1000 won't work, or it will, but mostly by luck.

> Then our only remaining problem is the overlapping memory regions of
> the old device and the new one, and we need to ensure that only the new device
> uses these regions?
> 
> 
> Thanks,
> Marcel
> 
> >
> >Thanks!
> >We need to figure this out because currently this does not
> >work properly (or maybe it works, but merely by chance).
> >Marcel and I will play with this.
> >
> [...]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-11 19:41                       ` Eduardo Habkost
@ 2016-02-12  9:17                         ` Marcel Apfelbaum
  2016-02-12 11:22                           ` Andreas Färber
  2016-02-12 18:09                           ` Eduardo Habkost
  0 siblings, 2 replies; 59+ messages in thread
From: Marcel Apfelbaum @ 2016-02-12  9:17 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Peter Maydell, qemu-devel, Michael S. Tsirkin, ghammer, agraf,
	Markus Armbruster, borntraeger, qemu-ppc, Gerd Hoffmann, david,
	Igor Mammedov, cornelia.huck, Paolo Bonzini, lcapitulino, lersek,
	Andreas Färber, rth

On 02/11/2016 09:41 PM, Eduardo Habkost wrote:
> On Fri, Feb 05, 2016 at 09:51:07AM +0200, Marcel Apfelbaum wrote:
>> On 02/05/2016 09:49 AM, Markus Armbruster wrote:
>>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>>
>>>> On Thu, Feb 04, 2016 at 12:55:22PM +0100, Paolo Bonzini wrote:
>>>>>
>>>>>
>>>>> On 04/02/2016 12:41, Andreas Färber wrote:
>>>>>> You're talking about machine, right? Some time ago I had proposed Marcel
>>>>>> who initially worked on it, but I'm fine with anyone taking it.
>>>>>
>>>>> Yes.
>>>>>
>>>>>> For some (but not all) core qdev parts related to the (stalled) QOM
>>>>>> migration I've been taking care of via qom-next. Last time this came up
>>>>>> you didn't want anyone to be M: for qdev, so maybe we can use R: so that
>>>>>> at least people automatically get CC'ed and we avoid this recurring
>>>>>> discussion?
>>>>>
>>>>> I might have changed my mind on that.  You definitely should be M: for qdev.
>>>>>
>>>>> Paolo
>>>>
>>>> If Andreas wants to, that's also fine. Several maintainers are
>>>> better than one.
>>>
>>> *If* the maintainers are all willing and able to work together.
>>>
>>
>> No problem here from my point of view :)
>
> No problem for me either. :)
>
> I am going to be away from work for 15 days starting on Tuesday
> Feb 16th. So if Marcel wants to start queueing patches already,
> please be my guest. I will be able to help on that after I'm
> back.
>

Hi,

If there are only a few patches on the mailing list, they can wait.
If the number grows, I'll send a pull request.

So the MAINTAINERS file should look like this, right?

Regarding qdev, Andreas, I also think you are the most qualified
to take it - will you?

diff --git a/MAINTAINERS b/MAINTAINERS
index 2d6ee17..a86491a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1200,6 +1200,13 @@ F: docs/*qmp-*
  F: scripts/qmp/
  T: git git://repo.or.cz/qemu/armbru.git qapi-next

+Machine
+M: Eduardo Habkost <ehabkost@redhat.com>
+M: Marcel Apfelbaum <marcel@redhat.com>
+S: Supported
+F: hw/core/machine.c
+F: include/hw/boards.h
+



Thanks,
Marcel

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-12  9:17                         ` Marcel Apfelbaum
@ 2016-02-12 11:22                           ` Andreas Färber
  2016-02-12 18:17                             ` Eduardo Habkost
  2016-02-12 18:09                           ` Eduardo Habkost
  1 sibling, 1 reply; 59+ messages in thread
From: Andreas Färber @ 2016-02-12 11:22 UTC (permalink / raw)
  To: Marcel Apfelbaum
  Cc: Peter Maydell, qemu-devel, Eduardo Habkost, Michael S. Tsirkin,
	ghammer, agraf, Markus Armbruster, borntraeger, qemu-ppc,
	Gerd Hoffmann, david, Igor Mammedov, cornelia.huck,
	Paolo Bonzini, lcapitulino, lersek, rth

On 12.02.2016 at 10:17, Marcel Apfelbaum wrote:
> On 02/11/2016 09:41 PM, Eduardo Habkost wrote:
>> On Fri, Feb 05, 2016 at 09:51:07AM +0200, Marcel Apfelbaum wrote:
>>> On 02/05/2016 09:49 AM, Markus Armbruster wrote:
>>>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>>>
>>>>> On Thu, Feb 04, 2016 at 12:55:22PM +0100, Paolo Bonzini wrote:
>>>>>>
>>>>>>
>>>>>> On 04/02/2016 12:41, Andreas Färber wrote:
>>>>>>> You're talking about machine, right? Some time ago I had proposed
>>>>>>> Marcel
>>>>>>> who initially worked on it, but I'm fine with anyone taking it.
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> For some (but not all) core qdev parts related to the (stalled) QOM
>>>>>>> migration I've been taking care of via qom-next. Last time this
>>>>>>> came up
>>>>>>> you didn't want anyone to be M: for qdev, so maybe we can use R:
>>>>>>> so that
>>>>>>> at least people automatically get CC'ed and we avoid this recurring
>>>>>>> discussion?
>>>>>>
>>>>>> I might have changed my mind on that.  You definitely should be M:
>>>>>> for qdev.
>>>>>>
>>>>>> Paolo
>>>>>
>>>>> If Andreas wants to, that's also fine. Several maintainers are
>>>>> better than one.
>>>>
>>>> *If* the maintainers are all willing and able to work together.
>>>>
>>>
>>> No problem here from my point of view :)
>>
>> No problem to me, too. :)
>>
>> I am going to be away from work for 15 days starting on Tuesday
>> Feb 16th. So if Marcel wants to start queueing patches already,
>> please be my guest. I will be able to help on that after I'm
>> back.
>>
> 
> Hi,
> 
> If there are only a few patches on the mailing list, they can wait.
> If the number will grow I'll send a pull request.
> 
> So the MAINTAINER file should look like this, right?
> 
> Regarding qdev, Andreas, I also think you are the most qualified
> to take it, will you?
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2d6ee17..a86491a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1200,6 +1200,13 @@ F: docs/*qmp-*
>  F: scripts/qmp/
>  T: git git://repo.or.cz/qemu/armbru.git qapi-next
> 
> +Machine
> +M: Eduardo Habkost <ehabkost@redhat.com>
> +M: Marcel Apfelbaum <marcel@redhat.com>
> +S: Supported
> +F: hw/core/machine.c
> +F: include/hw/boards.h
> +

Fine with me, ack.

For qdev.c itself I prefer not to create a misleading "QDev" section;
rather, I have just proposed a first step to split up qdev.c, not just into
common vs. system-only code but also into more maintainable subareas.
That's targeted at having a section like "Core device API" covering a
to-be-created device.c, with myself plus some backup as maintainer, then
Igor/mst/whomever for "Device hotplug interface" or the like.
We could consider splitting up qdev-system.c so that the block/net/char
specific parts can be assigned clear maintainers - I haven't investigated
that part yet. In the meantime we could simply create multiple sections
covering different aspects of the qdev* files.

Cheers,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-12  9:17                         ` Marcel Apfelbaum
  2016-02-12 11:22                           ` Andreas Färber
@ 2016-02-12 18:09                           ` Eduardo Habkost
  1 sibling, 0 replies; 59+ messages in thread
From: Eduardo Habkost @ 2016-02-12 18:09 UTC (permalink / raw)
  To: Marcel Apfelbaum
  Cc: Peter Maydell, qemu-devel, Michael S. Tsirkin, ghammer, agraf,
	Markus Armbruster, borntraeger, qemu-ppc, Gerd Hoffmann, david,
	Igor Mammedov, cornelia.huck, Paolo Bonzini, lcapitulino, lersek,
	Andreas Färber, rth

On Fri, Feb 12, 2016 at 11:17:14AM +0200, Marcel Apfelbaum wrote:
> On 02/11/2016 09:41 PM, Eduardo Habkost wrote:
> >On Fri, Feb 05, 2016 at 09:51:07AM +0200, Marcel Apfelbaum wrote:
> >>On 02/05/2016 09:49 AM, Markus Armbruster wrote:
> >>>"Michael S. Tsirkin" <mst@redhat.com> writes:
> >>>
> >>>>On Thu, Feb 04, 2016 at 12:55:22PM +0100, Paolo Bonzini wrote:
> >>>>>
> >>>>>
> >>>>>On 04/02/2016 12:41, Andreas Färber wrote:
> >>>>>>You're talking about machine, right? Some time ago I had proposed Marcel
> >>>>>>who initially worked on it, but I'm fine with anyone taking it.
> >>>>>
> >>>>>Yes.
> >>>>>
> >>>>>>For some (but not all) core qdev parts related to the (stalled) QOM
> >>>>>>migration I've been taking care of via qom-next. Last time this came up
> >>>>>>you didn't want anyone to be M: for qdev, so maybe we can use R: so that
> >>>>>>at least people automatically get CC'ed and we avoid this recurring
> >>>>>>discussion?
> >>>>>
> >>>>>I might have changed my mind on that.  You definitely should be M: for qdev.
> >>>>>
> >>>>>Paolo
> >>>>
> >>>>If Andreas wants to, that's also fine. Several maintainers are
> >>>>better than one.
> >>>
> >>>*If* the maintainers are all willing and able to work together.
> >>>
> >>
> >>No problem here from my point of view :)
> >
> >No problem to me, too. :)
> >
> >I am going to be away from work for 15 days starting on Tuesday
> >Feb 16th. So if Marcel wants to start queueing patches already,
> >please be my guest. I will be able to help on that after I'm
> >back.
> >
> 
> Hi,
> 
> If there are only a few patches on the mailing list, they can wait.
> If the number will grow I'll send a pull request.
> 
> So the MAINTAINER file should look like this, right?
> 
> Regarding qdev, Andreas, I also think you are the most qualified
> to take it, will you?
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2d6ee17..a86491a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1200,6 +1200,13 @@ F: docs/*qmp-*
>  F: scripts/qmp/
>  T: git git://repo.or.cz/qemu/armbru.git qapi-next
> 
> +Machine

I believe it would be clearer if described as "Machine core", or
"Common machine code".

> +M: Eduardo Habkost <ehabkost@redhat.com>
> +M: Marcel Apfelbaum <marcel@redhat.com>
> +S: Supported
> +F: hw/core/machine.c
> +F: include/hw/boards.h
> +
> 
> 
> 
> Thanks,
> Marcel

Thanks!

-- 
Eduardo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-12 11:22                           ` Andreas Färber
@ 2016-02-12 18:17                             ` Eduardo Habkost
  2016-02-12 22:30                               ` Paolo Bonzini
  0 siblings, 1 reply; 59+ messages in thread
From: Eduardo Habkost @ 2016-02-12 18:17 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Peter Maydell, qemu-devel, cornelia.huck, Michael S. Tsirkin,
	ghammer, agraf, Markus Armbruster, borntraeger, qemu-ppc,
	Gerd Hoffmann, david, Igor Mammedov, Marcel Apfelbaum,
	Paolo Bonzini, lcapitulino, lersek, rth

On Fri, Feb 12, 2016 at 12:22:41PM +0100, Andreas Färber wrote:
> On 12.02.2016 at 10:17, Marcel Apfelbaum wrote:
> > On 02/11/2016 09:41 PM, Eduardo Habkost wrote:
> >> On Fri, Feb 05, 2016 at 09:51:07AM +0200, Marcel Apfelbaum wrote:
> >>> On 02/05/2016 09:49 AM, Markus Armbruster wrote:
> >>>> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >>>>
> >>>>> On Thu, Feb 04, 2016 at 12:55:22PM +0100, Paolo Bonzini wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 04/02/2016 12:41, Andreas Färber wrote:
> >>>>>>> You're talking about machine, right? Some time ago I had proposed
> >>>>>>> Marcel
> >>>>>>> who initially worked on it, but I'm fine with anyone taking it.
> >>>>>>
> >>>>>> Yes.
> >>>>>>
> >>>>>>> For some (but not all) core qdev parts related to the (stalled) QOM
> >>>>>>> migration I've been taking care of via qom-next. Last time this
> >>>>>>> came up
> >>>>>>> you didn't want anyone to be M: for qdev, so maybe we can use R:
> >>>>>>> so that
> >>>>>>> at least people automatically get CC'ed and we avoid this recurring
> >>>>>>> discussion?
> >>>>>>
> >>>>>> I might have changed my mind on that.  You definitely should be M:
> >>>>>> for qdev.
> >>>>>>
> >>>>>> Paolo
> >>>>>
> >>>>> If Andreas wants to, that's also fine. Several maintainers are
> >>>>> better than one.
> >>>>
> >>>> *If* the maintainers are all willing and able to work together.
> >>>>
> >>>
> >>> No problem here from my point of view :)
> >>
> >> No problem to me, too. :)
> >>
> >> I am going to be away from work for 15 days starting on Tuesday
> >> Feb 16th. So if Marcel wants to start queueing patches already,
> >> please be my guest. I will be able to help on that after I'm
> >> back.
> >>
> > 
> > Hi,
> > 
> > If there are only a few patches on the mailing list, they can wait.
> > If the number will grow I'll send a pull request.
> > 
> > So the MAINTAINER file should look like this, right?
> > 
> > Regarding qdev, Andreas, I also think you are the most qualified
> > to take it, will you?
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 2d6ee17..a86491a 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -1200,6 +1200,13 @@ F: docs/*qmp-*
> >  F: scripts/qmp/
> >  T: git git://repo.or.cz/qemu/armbru.git qapi-next
> > 
> > +Machine
> > +M: Eduardo Habkost <ehabkost@redhat.com>
> > +M: Marcel Apfelbaum <marcel@redhat.com>
> > +S: Supported
> > +F: hw/core/machine.c
> > +F: include/hw/boards.h
> > +
> 
> Fine with me, ack.
> 
> For qdev.c itself I prefer not to create a misleading "QDev" section but
> rather just proposed a first step to split up qdev.c not just into
> common vs. system-only code but also in better maintainable subareas.
> That's targeted at having a section like "Core device API" covering a
> to-be-created device.c with myself plus some backup as maintainer, then
> Igor/mst/whomever for "Device hotplug interface" or the like.
> qdev-system.c we could consider to split up so that the block/net/char
> specific parts can be assigned clear maintainers - haven't investigated
> that part yet. In the meantime we could simply create multiple sections
> covering different aspects of qdev* files.

Related question: is it OK to have files appearing in multiple
sections? It would be useful for qdev*.c and vl.c. I would like
to be CCed in any vl.c patch affecting machine initialization,
for example.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementaly)
  2016-02-12 18:17                             ` Eduardo Habkost
@ 2016-02-12 22:30                               ` Paolo Bonzini
  0 siblings, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2016-02-12 22:30 UTC (permalink / raw)
  To: Eduardo Habkost, Andreas Färber
  Cc: Peter Maydell, qemu-devel, cornelia.huck, Michael S. Tsirkin,
	ghammer, agraf, Markus Armbruster, borntraeger, qemu-ppc,
	Gerd Hoffmann, david, Marcel Apfelbaum, Igor Mammedov,
	lcapitulino, lersek, rth



On 12/02/2016 19:17, Eduardo Habkost wrote:
> Related question: is it OK to have files appearing in multiple
> sections? It would be useful for qdev*.c and vl.c. I would like
> to be CCed in any vl.c patch affecting machine initialization,
> for example.

Sure it is.

Paolo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-11 16:30                       ` Michael S. Tsirkin
  2016-02-11 17:34                         ` Marcel Apfelbaum
@ 2016-02-15 10:30                         ` Igor Mammedov
  2016-02-15 11:26                           ` Michael S. Tsirkin
  2016-02-16 10:05                         ` Marcel Apfelbaum
  2 siblings, 1 reply; 59+ messages in thread
From: Igor Mammedov @ 2016-02-15 10:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Xiao Guangrong, ehabkost, Marcel Apfelbaum, ghammer, qemu-devel,
	lcapitulino, lersek

On Thu, 11 Feb 2016 18:30:19 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
> > On Tue, 9 Feb 2016 14:17:44 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:  
> > > > > So the linker interface solves this rather neatly:
> > > > > bios allocates memory, bios passes memory map to guest.
> > > > > Served us well for several years without need for extensions,
> > > > > and it does solve the VM GEN ID problem, even though
> > > > > 1. it was never designed for huge areas like nvdimm seems to want to use
> > > > > 2. we might want to add a new 64 bit flag to avoid touching low memory    
> > > > the linker interface is fine for some read-only data, like ACPI tables
> > > > (especially fixed tables), not so for AML ones if one wants to patch them.
> > > > 
> > > > However, now that you want to use it for other purposes, you start
> > > > adding extensions and other guest->QEMU channels to communicate
> > > > patching info back.
> > > > It steals the guest's memory, which is also not nice and doesn't scale well.
> > > 
> > > This is an argument I don't get. memory is memory. call it guest memory
> > > or RAM backed PCI BAR - same thing. MMIO is cheaper of course
> > > but much slower.
> > > 
> > > ...  
> > It however matters for the user: he pays for a guest with XXX RAM but gets less
> > than that. And that will get worse as the number of such devices
> > increases.
> >   
> > > > > OK fine, but returning PCI BAR address to guest is wrong.
> > > > > How about reading it from ACPI then? Is it really
> > > > > broken unless there's *also* a driver?    
> > > > I don't get question, MS Spec requires address (ADDR method),
> > > > and it's read by ACPI (AML).    
> > > 
> > > You were unhappy about DMA into guest memory.
> > > As a replacement for DMA, we could have AML read from
> > > e.g. PCI and write into RAM.
> > > This way we don't need to pass address to QEMU.  
> > That sounds better, as it saves us from allocating an IO port,
> > and QEMU doesn't need to write into guest memory; the only question is
> > whether a PCI_Config opregion would work with a driver-less PCI device.
> 
> Or PCI BAR for that reason. I don't know for sure.
unfortunately a BAR doesn't work for a driver-less PCI device,
but maybe we can add a vendor-specific PCI_Config region to the
always-present LPC/ISA bridges and make it do the job, like it
does for allocating IO ports for CPU/MEM hotplug now.

> 
> > 
> > And it's still pretty much not test-able since it would require
> > fully running OSPM to execute AML side.  
> 
> AML is not testable, but that's nothing new.
> You can test reading from PCI.
> 
> > >   
> > > > As for a working PCI_Config OpRegion without a driver, I haven't tried,
> > > > but I wouldn't be surprised if it doesn't work, taking into account that
> > > > the MS-introduced _DSM doesn't.
> > > >     
> > > > > 
> > > > >     
> > > > > > > >    Just compare with a graphics card design, where on device memory
> > > > > > > >    is mapped directly at some GPA not wasting RAM that guest could
> > > > > > > >    use for other tasks.        
> > > > > > > 
> > > > > > > This might have been true 20 years ago.  Most modern cards do DMA.      
> > > > > > 
> > > > > > Modern cards, with their own RAM, map their VRAM in the address space directly
> > > > > > and allow users to use it (GEM API). So they do not waste conventional RAM.
> > > > > > For example NVIDIA VRAM is mapped as PCI BARs the same way as in this
> > > > > > series (even the PCI class id is the same)
> > > > > 
> > > > > Don't know enough about graphics really, I'm not sure how these are
> > > > > relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > > > > mostly use guest RAM, not on card RAM.
> > > > >     
> > > > > > > >    VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> > > > > > > >    instead of consuming guest's RAM they should be mapped at
> > > > > > > >    some GPA and their memory accessed directly.        
> > > > > > > 
> > > > > > > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > > > > > > address. This breaks the straight-forward approach of using a
> > > > > > > rebalanceable PCI BAR.      
> > > > > > 
> > > > > > For PCI rebalancing to work on Windows, one has to provide a working PCI driver,
> > > > > > otherwise the OS will ignore the device when rebalancing happens and
> > > > > > might map something else over the ignored BAR.
> > > > > 
> > > > > Does it disable the BAR then? Or just move it elsewhere?    
> > > > it doesn't, it just blindly ignores the BAR's existence and maps the BAR of
> > > > another device (one with a driver) over it.
> > > 
> > > Interesting. On classical PCI this is a forbidden configuration.
> > > Maybe we do something that confuses windows?
> > > Could you tell me how to reproduce this behaviour?  
> > #cat > t << EOF
> > pci_update_mappings_del
> > pci_update_mappings_add
> > EOF
> > 
> > #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> >  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> >  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> >  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> > 
> > wait till OS boots, note BARs programmed for ivshmem
> >  in my case it was
> >    01:01.0 0,0xfe800000+0x100
> > then execute script and watch pci_update_mappings* trace events
> > 
> > # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> > 
> > hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
> > Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
> > and then programs new BARs, where:
> >   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> > creates overlapping BAR with ivshmem   
> 
> 
> Thanks!
> We need to figure this out because currently this does not
> work properly (or maybe it works, but merely by chance).
> Me and Marcel will play with this.
> 
> >   
> > >   
> > > > >     
> > > > > > >       
> > > > > > > >    In that case NVDIMM could even map the whole label area and
> > > > > > > >    significantly simplify the QEMU<->OSPM protocol that currently
> > > > > > > >    serializes that data through a 4K page.
> > > > > > > >    There is also a performance issue with a buffer allocated in RAM,
> > > > > > > >    because DMA adds an unnecessary copying step when data could
> > > > > > > >    be read/written directly from/to the NVDIMM.
> > > > > > > >    It might be not very important for the _DSM interface, but when it
> > > > > > > >    comes to supporting block mode it can become an issue.
> > > > > > > 
> > > > > > > So for NVDIMM, presumably it will have code access PCI BAR properly, so
> > > > > > > it's guaranteed to work across BAR rebalancing.
> > > > > > > Would that address the performance issue?      
> > > > > > 
> > > > > > it would if rebalancing were to account for driverless PCI device BARs,
> > > > > > but it doesn't hence such BARs need to be statically pinned
> > > > > > at place where BIOS put them at start up.
> > > > > > I'm also not sure that PCIConfig operation region would work
> > > > > > on Windows without loaded driver (similar to _DSM case).
> > > > > > 
> > > > > >       
> > > > > > > > The above points make the ACPI patching approach not robust and fragile,
> > > > > > > > and hard to maintain.
> > > > > > > 
> > > > > > > Wrt GEN ID these are all kind of subjective though.  I especially don't
> > > > > > > get what appears your general dislike of the linker host/guest
> > > > > > > interface.      
> > > > > > Besides the technical issues, the general dislike is just what I've written:
> > > > > > a "not robust and fragile" bios_linker_loader_add_pointer() interface.
> > > > > > 
> > > > > > to make it less fragile:
> > > > > >  1. it should be impossible to corrupt memory or patch wrong address.
> > > > > >     current impl. silently relies on value referenced by 'pointer' argument
> > > > > >     and to figure that out one has to read linker code on BIOS side.
> > > > > >     That could be easily set wrong and slip through review.      
> > > > > 
> > > > > That's an API issue, it seemed like a good idea but I guess
> > > > > it confuses people. Would you be happier using an offset
> > > > > instead of a pointer?    
> > > > offset is better and it would be better if it were saying
> > > > which offset it is (i.e. relative to what)    
> > > 
> > > 
> > > Start of table, right?  
> > not sure, to me it looks like start of a blob and not the table  
> 
> Right that's what I meant.
> 
> > >   
> > > > >     
> > > > > >     API shouldn't rely on the caller setting value pointed by that argument.      
> > > > > 
> > > > > I couldn't parse that one. Care suggesting a cleaner API for linker?    
> > > > here is current API signature:
> > > > 
> > > > bios_linker_loader_add_pointer(GArray *linker,
> > > >                                     const char *dest_file,
> > > >                                     const char *src_file,
> > > >                                     GArray *table, void *pointer,
> > > >                                     uint8_t pointer_size)
> > > > 
> > > > issue 1: 
> > > > where 'pointer' is a real pointer pointing inside 'table' and the API
> > > > calculates the offset under the hood:
> > > >   offset = (gchar *)pointer - table->data;
> > > > and puts it in the ADD_POINTER command.
> > > > 
> > > > it's easy to get a wrong offset if 'pointer' is not from 'table'.
> > > 
> > > OK, replace that with table_offset?  
> > blob_offset?
> > 
> > also s/table/blob/  
> 
> 
> OK.
> 
> > >   
> > > > issue 2:
> > > > 'pointer' points to another offset of size 'pointer_size' in the 'table'
> > > > blob, which means that whoever composes the blob has to be aware of
> > > > it and fill in the correct value there, which is possible to do right
> > > > only if one looks inside the SeaBIOS part of the linker interface.
> > > > That is easy to forget, and then one has to deal with the mess
> > > > caused by random memory corruption.
> > > > 
> > > > bios_linker_loader_add_pointer() and the corresponding
> > > > ADD_POINTER command should take this second offset as an argument
> > > > and not require 'table' to be pre-filled with it; or,
> > > > in the worst case, if extending the ADD_POINTER command is problematic,
> > > > bios_linker_loader_add_pointer() should still take
> > > > the second offset and patch 'table' itself so that the 'table' composer
> > > > doesn't have to worry about it.
> > > 
> > > This one I don't understand. What's the second pointer you
> > > are talking about?  
> > ha, see, even the author already has absolutely no clue how the linker works
> > or what the offsets are relative to.
> > see SeaBIOS romfile_loader_add_pointer():
> >     ...
> >     memcpy(&pointer, dest_file->data + offset, entry->pointer_size);
> > here is the second offset       ^^^^^^^^^^^^^  
> 
> It's the same offset in the entry.
>         struct {
>             char pointer_dest_file[ROMFILE_LOADER_FILESZ];
>             char pointer_src_file[ROMFILE_LOADER_FILESZ];
>             u32 pointer_offset;
>             u8 pointer_size;
>         };
pointer_offset == offset from above, but the question is what the result
of memcpy() is and how it's used below vvvv

> >     pointer = le64_to_cpu(pointer);
> >     pointer += (unsigned long)src_file->data;
as you see, *(foo_size *)(dest_file->data + offset) is the 2nd offset,
relative to the beginning of src_file, and the current API requires the
dst_file blob to already contain a valid value of pointer_size there.
i.e. the author of the AML has to prefill the 2nd offset before passing the
blob to bios_linker_loader_add_pointer(), which is rather fragile.
If it's difficult to make the ADD_POINTER command pass that offset
as part of the command, then it would be better to extend
bios_linker_loader_add_pointer() to take src_offset and write
it into the blob, instead of asking the AML author to do it manually
every time.
And maybe add a bios_linker_loader_add_aml_pointer() which would be able
to check that it patches AML correctly, while the old bios_linker_loader_add_pointer()
would do no check and work with raw tables.


> >     pointer = cpu_to_le64(pointer);
> >     memcpy(dest_file->data + offset, &pointer, entry->pointer_size);
> > 
> > all this src|dst_file and confusing offsets (whatever they might mean)
> > give me a headache every time I need to remember how the linker
> > works and read both the QEMU and SeaBIOS code to figure it out again.
> > That's what I'd call an un-maintainable and hard-to-use API.
> 
> Right, there's a lack of documentation. It's my fault, so let's fix it.
> It's an API issue, nothing to do with ABI.
> 
> 
> >   
> > >   
> > > > issue 3:
> > > > all patching obviously needs bounds checking on QEMU side
> > > > so it would abort early if it could corrupt memory.    
> > > 
> > > That's easy.
> > >   
> > > > >     
> > > > > >  2. If it's going to be used for patching AML, it should assert
> > > > > >     when bios_linker_loader_add_pointer() is called if to be patched
> > > > > >     AML object is wrong and patching would corrupt AML blob.      
> > > > > 
> > > > > Hmm for example check that the patched data has
> > > > > the expected pattern?    
> > > > yep, nothing can be done for raw tables, but that should be possible
> > > > for AML tables, and if the pattern is unsupported or the size doesn't match,
> > > > it should abort QEMU early instead of corrupting the table.
> > > 
> > > Above all sounds reasonable. Would you like to take a stub
> > > at it or prefer me to?  
> > It would be better if it were you.
> > 
> > I wouldn't ever like to maintain it, as it's a too complex and hard-to-use API,
> > which I'd use only as a last resort if there weren't any other way
> > to implement the task at hand.
> 
> Sorry, I don't get it. You don't like the API, write a better one
> for an existing ABI. If you prefer waiting for me to fix it,
> that's fine too but no guarantees that you will like the new one
> or when it will happen.
> 
> Look, there has been one change (a bugfix for alignment) in several years
> since we added the linker. We don't maintain any compatibility flags
> around it *at all*. It might have a hard-to-use API but that is the
> definition of easy to maintain.  You are pushing allocating memory host
> side as an alternative; what happened there is the reverse. A ton of
> changes and pain all the way, and we get to maintain a bag of compat
> hacks for old machine types. You say we finally know what we are
> doing and won't have to change it any more. I'm not convinced.
You are talking (lots of compat issues) about the manually mapped initial memory map,
while I'm talking about allocation in the hotplug memory region.
The latter also had only one compat 'improvement' (for alignment), but otherwise
it wasn't a source of any problems.

> > > > >     
> > > > > >       
> > > > > > > It's there and we are not moving away from it, so why not
> > > > > > > use it in more places?  Or if you think it's wrong, why don't you build
> > > > > > > something better then?  We could then maybe use it for these things as
> > > > > > > well.      
> > > > > > 
> > > > > > Yep, I think for vmgenid and even more so for nvdimm
> > > > > > it would be better to allocate GPAs in QEMU and map backing
> > > > > > MemoryRegions directly in QEMU.
> > > > > > For nvdimm (main data region)
> > > > > > we already do it using pc-dimm's GPA allocation algorithm, we also
> > > > > > could use similar approach for nvdimm's label area and vmgenid.
> > > > > > 
> > > > > > Here is a simple attempt to add a limited GPA allocator in high memory
> > > > > >  https://patchwork.ozlabs.org/patch/540852/
> > > > > > But it hasn't got any comments from you and was ignored.
> > > > > > Lets consider it and perhaps we could come up with GPA allocator
> > > > > > that could be used for other things as well.      
> > > > > 
> > > > > For nvdimm label area, I agree passing things through
> > > > > a 4K buffer seems inefficient.
> > > > > 
> > > > > I'm not sure what's a better way though.
> > > > > 
> > > > > Use 64 bit memory? Setting aside old guests such as XP,
> > > > > does it break 32 bit guests?    
> > > > it might not work with 32bit guests, the same way as mem hotplug
> > > > doesn't work for them unless they are PAE enabled.    
> > > 
> > > Right, I mean with PAE.  
> > I've tested it with 32-bit XP and Windows 10; they boot fine and
> > the vmgenid device is displayed as OK with the buffer above 4Gb (on Win10).
> > So at least it doesn't crash the guest.
> > I can't test more than that for 32-bit guests since the utility
> > to read vmgenid works only on Windows Server and there isn't
> > a 32-bit version of it.
> >   
> > >   
> > > > but well that's a limitation of implementation and considering
> > > > that storage nvdimm area is mapped at 64bit GPA it doesn't matter.
> > > >      
> > > > > I'm really afraid of adding yet another allocator, I think you
> > > > > underestimate the maintainance headache: it's not theoretical and is
> > > > > already felt.    
> > > > The current maintenance headache is due to the fixed, handpicked
> > > > mem layout; we can't do much about it for legacy machine
> > > > types, but with a QEMU-side GPA allocator we can try to switch
> > > > to a flexible memory layout that would allocate GPAs
> > > > depending on the QEMU config in a stable manner.
> > > 
> > > So far, we didn't manage to. It seems to go in the reverse
> > > direction where we add more and more control to let management
> > > influence the layout. Things like alignment requirements
> > > also tend to surface later and wreak havoc on whatever
> > > we do.
> > Was even there an attempt to try it before, could you point to it?  
> 
> Look at the mess we have with the existing allocator.
> As a way to fix this unmaintainable mess, what I see is suggestions
> to drop old machine types so people have to reinstall guests.
> This does not inspire confidence.
> 
> > The only attempt I've seen was https://patchwork.ozlabs.org/patch/540852/
> > but it hasn't got any technical comments from you,
> > except for 'I'm afraid that it won't work' on IRC.
> > 
> > QEMU already has GPA allocator limited to memory hotplug AS
> > and it has passed through 'growing' issues. What above patch
> > proposes is to reuse already existing memory hotplug AS and
> > maybe make its GPA allocator more generic (i.e. not tied
> > only to pc-dimm) on top of it.  
> 
> You say we finally know what we are doing and won't have to change it
> any more. I'm not convinced.
> 
> > 
> > It's sufficient for the vmgenid use case and definitely
> > much more suitable for nvdimm, which already uses it for mapping
> > its main storage MemoryRegion.
> >   
> > > > Well, there is a maintenance headache with bios_linker as well,
> > > > due to its complexity (multiple layers of indirection), and
> > > > it will grow as more places try to use it.
> > > > Yes, we could use it as a hack, stealing RAM and trying to implement
> > > > backwards DMA, or we could be less afraid and consider
> > > > yet another allocator which will do the job without hacks and
> > > > should benefit QEMU in the long run (it might not be easy
> > > > to implement it right, but if we don't even try we will be buried
> > > > in complex hacks that 'work' for now)
> > > >     
> > > > > > >       
> > > > > > > >         
> > > > > > > > >         
> > > > > > > > > > > And hey, if you want to use a pci device to pass the physical
> > > > > > > > > > > address guest to host, instead of reserving
> > > > > > > > > > > a couple of IO addresses, sure, stick it in pci config in
> > > > > > > > > > > a vendor-specific capability, this way it'll get migrated
> > > > > > > > > > > automatically.          
> > > > > > > > > > Could you elaborate more on this suggestion?          
> > > > > > > > > 
> > > > > > > > > I really just mean using PCI_Config operation region.
> > > > > > > > > If you wish, I'll try to post a prototype next week.        
> > > > > > > > I don't know much about PCI but it would be interesting,
> > > > > > > > perhaps we could use it somewhere else.
> > > > > > > > 
> > > > > > > > However, it should be checked whether it works with Windows;
> > > > > > > > for example, the PCI-specific _DSM method is ignored by it
> > > > > > > > if the PCI device doesn't have a working PCI driver bound to it.
> > > > > > > >         
> > > > > > > > >         
> > > > > > > > > > > 
> > > > > > > > > > >           
> > > > > > > > > > > > ---
> > > > > > > > > > > > changes since 17:
> > > > > > > > > > > >   - small fixups suggested in v14 review by "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > > > >   - make BAR prefetchable to make region cached as per MS spec
> > > > > > > > > > > >   - s/uuid/guid/ to match spec
> > > > > > > > > > > > changes since 14:
> > > > > > > > > > > >   - reserve BAR resources so that Windows won't touch it
> > > > > > > > > > > >     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > > > >   - ACPI: split the VGEN device off the PCI device descriptor
> > > > > > > > > > > >     and place it at PCI0 scope, so that there won't be a need to trace its
> > > > > > > > > > > >     location on PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > > > >   - permit only one vmgenid to be created
> > > > > > > > > > > >   - enable BAR to be mapped above 4GB if it can't be mapped at low mem
> > > > > > > > > > > > ---
> > > > > > > > > > > >  default-configs/i386-softmmu.mak   |   1 +
> > > > > > > > > > > >  default-configs/x86_64-softmmu.mak |   1 +
> > > > > > > > > > > >  docs/specs/pci-ids.txt             |   1 +
> > > > > > > > > > > >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> > > > > > > > > > > >  hw/misc/Makefile.objs              |   1 +
> > > > > > > > > > > >  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
> > > > > > > > > > > >  include/hw/misc/vmgenid.h          |  27 +++++++
> > > > > > > > > > > >  include/hw/pci/pci.h               |   1 +
> > > > > > > > > > > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > > > > > > > > > > >  create mode 100644 hw/misc/vmgenid.c
> > > > > > > > > > > >  create mode 100644 include/hw/misc/vmgenid.h
> > > > > > > > > > > > 
> > > > > > > > > > > > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > > > > > > > > > > > index b177e52..6402439 100644
> > > > > > > > > > > > --- a/default-configs/i386-softmmu.mak
> > > > > > > > > > > > +++ b/default-configs/i386-softmmu.mak
> > > > > > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > > > > > +CONFIG_VMGENID=y
> > > > > > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > > > > > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > > index 6e3b312..fdac18f 100644
> > > > > > > > > > > > --- a/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > > +++ b/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > > > > > +CONFIG_VMGENID=y
> > > > > > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > > > > > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > > > > > > > > > > > index 0adcb89..e65ecf9 100644
> > > > > > > > > > > > --- a/docs/specs/pci-ids.txt
> > > > > > > > > > > > +++ b/docs/specs/pci-ids.txt
> > > > > > > > > > > > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> > > > > > > > > > > >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> > > > > > > > > > > >  1b36:0006  PCI Rocker Ethernet switch device
> > > > > > > > > > > >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > > > > > > > > > > > +1b36:0009  PCI VM-Generation device
> > > > > > > > > > > >  1b36:000a  PCI-PCI bridge (multiseat)
> > > > > > > > > > > >  
> > > > > > > > > > > >  All these devices are documented in docs/specs.
> > > > > > > > > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > > > > > > > > index 78758e2..0187262 100644
> > > > > > > > > > > > --- a/hw/i386/acpi-build.c
> > > > > > > > > > > > +++ b/hw/i386/acpi-build.c
> > > > > > > > > > > > @@ -44,6 +44,7 @@
> > > > > > > > > > > >  #include "hw/acpi/tpm.h"
> > > > > > > > > > > >  #include "sysemu/tpm_backend.h"
> > > > > > > > > > > >  #include "hw/timer/mc146818rtc_regs.h"
> > > > > > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > > > > >  
> > > > > > > > > > > >  /* Supported chipsets: */
> > > > > > > > > > > >  #include "hw/acpi/piix4.h"
> > > > > > > > > > > > @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> > > > > > > > > > > >      info->applesmc_io_base = applesmc_port();
> > > > > > > > > > > >  }
> > > > > > > > > > > >  
> > > > > > > > > > > > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +    Aml *dev, *pkg, *crs;
> > > > > > > > > > > > +
> > > > > > > > > > > > +    dev = aml_device("VGEN");
> > > > > > > > > > > > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> > > > > > > > > > > > +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> > > > > > > > > > > > +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> > > > > > > > > > > > +
> > > > > > > > > > > > +    pkg = aml_package(2);
> > > > > > > > > > > > +    /* low 32 bits of UUID buffer addr */
> > > > > > > > > > > > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > > > > > > > > > > > +    /* high 32 bits of UUID buffer addr */
> > > > > > > > > > > > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > > > > > > > > > > > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > > > > > > > > > > > +
> > > > > > > > > > > > +    /*
> > > > > > > > > > > > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> > > > > > > > > > > > +     * displays it as "PCI RAM controller" which is marked as NO_DRV
> > > > > > > > > > > > +     * so Windows ignores VMGEN device completely and doesn't check
> > > > > > > > > > > > +     * for resource conflicts which during PCI rebalancing can lead
> > > > > > > > > > > > +     * to another PCI device claiming ignored BARs. To prevent this
> > > > > > > > > > > > +     * statically reserve resources used by VM_Gen_Counter.
> > > > > > > > > > > > +     * For more verbose comment see this commit message.            
> > > > > > > > > > > 
> > > > > > > > > > > What does "this commit message" mean?          
> > > > > > > > > > the above commit message. Should I reword it to just 'see commit message'?
> > > > > > > > > >           
> > > > > > > > > > >           
> > > > > > > > > > > > +     */
> > > > > > > > > > > > +     crs = aml_resource_template();
> > > > > > > > > > > > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> > > > > > > > > > > > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> > > > > > > > > > > > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> > > > > > > > > > > > +                VMGENID_VMGID_BUF_SIZE));
> > > > > > > > > > > > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > > > > > > > > > > > +     return dev;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > >  /*
> > > > > > > > > > > >   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
> > > > > > > > > > > >   * On i386 arch we only have two pci hosts, so we can look only for them.
> > > > > > > > > > > > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > > > > > >              }
> > > > > > > > > > > >  
> > > > > > > > > > > >              if (bus) {
> > > > > > > > > > > > +                Object *vmgen;
> > > > > > > > > > > >                  Aml *scope = aml_scope("PCI0");
> > > > > > > > > > > >                  /* Scan all PCI buses. Generate tables to support hotplug. */
> > > > > > > > > > > >                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> > > > > > > > > > > > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > > > > > >                      aml_append(scope, dev);
> > > > > > > > > > > >                  }
> > > > > > > > > > > >  
> > > > > > > > > > > > +                vmgen = find_vmgneid_dev(NULL);
> > > > > > > > > > > > +                if (vmgen) {
> > > > > > > > > > > > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > > > > > > > > > > > +                    uint64_t buf_paddr =
> > > > > > > > > > > > +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> > > > > > > > > > > > +
> > > > > > > > > > > > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > > > > > > > > > > > +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> > > > > > > > > > > > +
> > > > > > > > > > > > +                        method = aml_method("\\_GPE._E00", 0,
> > > > > > > > > > > > +                                            AML_NOTSERIALIZED);
> > > > > > > > > > > > +                        aml_append(method,
> > > > > > > > > > > > +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > > > > > > > > > > > +                                       aml_int(0x80)));
> > > > > > > > > > > > +                        aml_append(ssdt, method);
> > > > > > > > > > > > +                    }
> > > > > > > > > > > > +                }
> > > > > > > > > > > > +
> > > > > > > > > > > >                  aml_append(sb_scope, scope);
> > > > > > > > > > > >              }
> > > > > > > > > > > >          }
> > > > > > > > > > > > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
> > > > > > > > > > > >      {
> > > > > > > > > > > >          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
> > > > > > > > > > > >  
> > > > > > > > > > > > -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> > > > > > > > > > > > -
> > > > > > > > > > > >          if (misc->is_piix4) {
> > > > > > > > > > > >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> > > > > > > > > > > >              aml_append(method,
> > > > > > > > > > > > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > > > > > > > > > > > index d4765c2..1f05edd 100644
> > > > > > > > > > > > --- a/hw/misc/Makefile.objs
> > > > > > > > > > > > +++ b/hw/misc/Makefile.objs
> > > > > > > > > > > > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> > > > > > > > > > > >  
> > > > > > > > > > > >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> > > > > > > > > > > >  obj-$(CONFIG_EDU) += edu.o
> > > > > > > > > > > > +obj-$(CONFIG_VMGENID) += vmgenid.o
> > > > > > > > > > > >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > > > > > > > > > > > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > index 0000000..a2fbdfc
> > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > +++ b/hw/misc/vmgenid.c
> > > > > > > > > > > > @@ -0,0 +1,154 @@
> > > > > > > > > > > > +/*
> > > > > > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > > > > > + *
> > > > > > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > > > > > + *
> > > > > > > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > > > > + *
> > > > > > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > > > > + *
> > > > > > > > > > > > + */
> > > > > > > > > > > > +
> > > > > > > > > > > > +#include "hw/i386/pc.h"
> > > > > > > > > > > > +#include "hw/pci/pci.h"
> > > > > > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > > > > > +#include "hw/acpi/acpi.h"
> > > > > > > > > > > > +#include "qapi/visitor.h"
> > > > > > > > > > > > +
> > > > > > > > > > > > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> > > > > > > > > > > > +
> > > > > > > > > > > > +typedef struct VmGenIdState {
> > > > > > > > > > > > +    PCIDevice parent_obj;
> > > > > > > > > > > > +    MemoryRegion iomem;
> > > > > > > > > > > > +    union {
> > > > > > > > > > > > +        uint8_t guid[16];
> > > > > > > > > > > > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > > > > > > > > > > > +    };
> > > > > > > > > > > > +    bool guid_set;
> > > > > > > > > > > > +} VmGenIdState;
> > > > > > > > > > > > +
> > > > > > > > > > > > +Object *find_vmgneid_dev(Error **errp)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> > > > > > > > > > > > +    if (!obj) {
> > > > > > > > > > > > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > > > > > > > > > > > +    }
> > > > > > > > > > > > +    return obj;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static void vmgenid_update_guest(VmGenIdState *s)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +    Object *acpi_obj;
> > > > > > > > > > > > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > > > > > > > > > > > +
> > > > > > > > > > > > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > > > > > > > > > > > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > > > > > > > > > > > +
> > > > > > > > > > > > +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> > > > > > > > > > > > +    if (acpi_obj) {
> > > > > > > > > > > > +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > > > > > > > > > > > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > > > > > > > > > > > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > > > > > > > > > > > +
> > > > > > > > > > > > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > > > > > > > > > > > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > > > > > > > > > > > +    }
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > > > > > +
> > > > > > > > > > > > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID
> > > > > > > > > > > > +                   "': Failed to parse GUID string: %s",
> > > > > > > > > > > > +                   object_get_typename(OBJECT(s)),
> > > > > > > > > > > > +                   value);
> > > > > > > > > > > > +        return;
> > > > > > > > > > > > +    }
> > > > > > > > > > > > +
> > > > > > > > > > > > +    s->guid_set = true;
> > > > > > > > > > > > +    vmgenid_update_guest(s);
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> > > > > > > > > > > > +                                   const char *name, Error **errp)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > > > > > > > > > > > +
> > > > > > > > > > > > +    if (value == PCI_BAR_UNMAPPED) {
> > > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> > > > > > > > > > > > +                   object_get_typename(OBJECT(obj)));
> > > > > > > > > > > > +        return;
> > > > > > > > > > > > +    }
> > > > > > > > > > > > +    visit_type_int(v, &value, name, errp);
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static void vmgenid_initfn(Object *obj)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > > > > > +
> > > > > > > > > > > > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> > > > > > > > > > > > +                           &error_abort);
> > > > > > > > > > > > +
> > > > > > > > > > > > +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> > > > > > > > > > > > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > > > > > > > > > > > +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +
> > > > > > > > > > > > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +    VmGenIdState *s = VMGENID(dev);
> > > > > > > > > > > > +    bool ambiguous = false;
> > > > > > > > > > > > +
> > > > > > > > > > > > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > > > > > > > > > > > +    if (ambiguous) {
> > > > > > > > > > > > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > > > > > > > > > > > +                         " device is permitted");
> > > > > > > > > > > > +        return;
> > > > > > > > > > > > +    }
> > > > > > > > > > > > +
> > > > > > > > > > > > +    if (!s->guid_set) {
> > > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> > > > > > > > > > > > +                   object_get_typename(OBJECT(s)));
> > > > > > > > > > > > +        return;
> > > > > > > > > > > > +    }
> > > > > > > > > > > > +
> > > > > > > > > > > > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > > > > > > > > > > > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > > > > > > > > > > > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > > > > > > > > > > > +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> > > > > > > > > > > > +        &s->iomem);
> > > > > > > > > > > > +    return;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > > > > > > > > > > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > > > > > > > > > > +
> > > > > > > > > > > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > > > > > > > > > > +    dc->hotpluggable = false;
> > > > > > > > > > > > +    k->realize = vmgenid_realize;
> > > > > > > > > > > > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > > > > > > > > > > > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > > > > > > > > > > > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static const TypeInfo vmgenid_device_info = {
> > > > > > > > > > > > +    .name          = VMGENID_DEVICE,
> > > > > > > > > > > > +    .parent        = TYPE_PCI_DEVICE,
> > > > > > > > > > > > +    .instance_size = sizeof(VmGenIdState),
> > > > > > > > > > > > +    .instance_init = vmgenid_initfn,
> > > > > > > > > > > > +    .class_init    = vmgenid_class_init,
> > > > > > > > > > > > +};
> > > > > > > > > > > > +
> > > > > > > > > > > > +static void vmgenid_register_types(void)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +    type_register_static(&vmgenid_device_info);
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +type_init(vmgenid_register_types)
> > > > > > > > > > > > diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > index 0000000..b90882c
> > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > +++ b/include/hw/misc/vmgenid.h
> > > > > > > > > > > > @@ -0,0 +1,27 @@
> > > > > > > > > > > > +/*
> > > > > > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > > > > > + *
> > > > > > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > > > > > + *
> > > > > > > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > > > > + *
> > > > > > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > > > > + *
> > > > > > > > > > > > + */
> > > > > > > > > > > > +
> > > > > > > > > > > > +#ifndef HW_MISC_VMGENID_H
> > > > > > > > > > > > +#define HW_MISC_VMGENID_H
> > > > > > > > > > > > +
> > > > > > > > > > > > +#include "qom/object.h"
> > > > > > > > > > > > +
> > > > > > > > > > > > +#define VMGENID_DEVICE           "vmgenid"
> > > > > > > > > > > > +#define VMGENID_GUID             "guid"
> > > > > > > > > > > > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > > > > > > > > > > > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > > > > > > > > > > > +#define VMGENID_VMGID_BUF_BAR    0
> > > > > > > > > > > > +
> > > > > > > > > > > > +Object *find_vmgneid_dev(Error **errp);
> > > > > > > > > > > > +
> > > > > > > > > > > > +#endif
> > > > > > > > > > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > > > > > > > > > index dedf277..f4c9d48 100644
> > > > > > > > > > > > --- a/include/hw/pci/pci.h
> > > > > > > > > > > > +++ b/include/hw/pci/pci.h
> > > > > > > > > > > > @@ -94,6 +94,7 @@
> > > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> > > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> > > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > > > > > > > > > > > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> > > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> > > > > > > > > > > >  
> > > > > > > > > > > >  #define FMT_PCIBUS                      PRIx64
> > > > > > > > > > > > -- 
> > > > > > > > > > > > 1.8.3.1            
> > > > > > >       
> > >   

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-15 10:30                         ` Igor Mammedov
@ 2016-02-15 11:26                           ` Michael S. Tsirkin
  2016-02-15 13:56                             ` Igor Mammedov
  0 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-02-15 11:26 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Xiao Guangrong, ehabkost, Marcel Apfelbaum, ghammer, qemu-devel,
	lcapitulino, lersek

On Mon, Feb 15, 2016 at 11:30:24AM +0100, Igor Mammedov wrote:
> On Thu, 11 Feb 2016 18:30:19 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
> > > On Tue, 9 Feb 2016 14:17:44 +0200
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >   
> > > > On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:  
> > > > > > So the linker interface solves this rather neatly:
> > > > > > bios allocates memory, bios passes memory map to guest.
> > > > > > Served us well for several years without need for extensions,
> > > > > > and it does solve the VM GEN ID problem, even though
> > > > > > 1. it was never designed for huge areas like nvdimm seems to want to use
> > > > > > 2. we might want to add a new 64 bit flag to avoid touching low memory    
> > > > > The linker interface is fine for some read-only data, like ACPI tables,
> > > > > especially fixed tables; not so for AML ones if one wants to patch them.
> > > > > 
> > > > > However, now that you want to use it for other purposes, you start
> > > > > adding extensions and other guest->QEMU channels to communicate
> > > > > patching info back.
> > > > > It also steals the guest's memory, which is not nice and doesn't scale well.    
> > > > 
> > > > This is an argument I don't get. Memory is memory. Call it guest memory
> > > > or a RAM-backed PCI BAR - same thing. MMIO is cheaper of course
> > > > but much slower.
> > > > 
> > > > ...  
> > > It matters for the user, however: he pays for a guest with XXX RAM
> > > but gets less than that. And that will get worse as the number of
> > > such devices increases.
> > >   
> > > > > > OK fine, but returning PCI BAR address to guest is wrong.
> > > > > > How about reading it from ACPI then? Is it really
> > > > > > broken unless there's *also* a driver?    
> > > > > I don't get the question; the MS spec requires an address (the ADDR
> > > > > method), and it's read by ACPI (AML).    
> > > > 
> > > > You were unhappy about DMA into guest memory.
> > > > As a replacement for DMA, we could have AML read from
> > > > e.g. PCI and write into RAM.
> > > > This way we don't need to pass address to QEMU.  
> > > That sounds better, as it saves us from allocating an IO port,
> > > and QEMU doesn't need to write into guest memory; the only question is
> > > whether a PCI_Config opregion would work with a driver-less PCI device.  
> > 
> > Or a PCI BAR, for that matter. I don't know for sure.
> unfortunately a BAR doesn't work for a driver-less PCI device,
> but maybe we can add a vendor-specific PCI_Config to the always-present
> LPC/ISA bridges and make it do the job like it does for allocating
> IO ports for CPU/MEM hotplug now.
> 
> > 
> > > 
> > > And it's still pretty much not testable, since it would require
> > > a fully running OSPM to execute the AML side.  
> > 
> > AML is not testable, but that's nothing new.
> > You can test reading from PCI.
> > 
> > > >   
> > > > > As for a PCI_Config OpRegion working without a driver, I haven't tried,
> > > > > but I wouldn't be surprised if it doesn't, taking into account that
> > > > > the MS-introduced _DSM doesn't.
> > > > >     
> > > > > > 
> > > > > >     
> > > > > > > > >    Just compare with a graphics card design, where on-device memory
> > > > > > > > >    is mapped directly at some GPA, not wasting RAM that the guest
> > > > > > > > >    could use for other tasks.        
> > > > > > > > 
> > > > > > > > This might have been true 20 years ago.  Most modern cards do DMA.      
> > > > > > > 
> > > > > > > Modern cards with their own RAM map their VRAM into the address space
> > > > > > > directly and allow users to use it (GEM API), so they do not waste
> > > > > > > conventional RAM. For example, NVIDIA VRAM is mapped as PCI BARs the
> > > > > > > same way as in this series (even the PCI class id is the same)      
> > > > > > 
> > > > > > Don't know enough about graphics really, I'm not sure how these are
> > > > > > relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > > > > > mostly use guest RAM, not on card RAM.
> > > > > >     
> > > > > > > > >    The VMGENID and NVDIMM use cases look to me exactly the same, i.e.
> > > > > > > > >    instead of consuming the guest's RAM they should be mapped at
> > > > > > > > >    some GPA and their memory accessed directly.        
> > > > > > > > 
> > > > > > > > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > > > > > > > address. This breaks the straightforward approach of using a
> > > > > > > > rebalanceable PCI BAR.      
> > > > > > > 
> > > > > > > For PCI rebalancing to work on Windows, one has to provide a working
> > > > > > > PCI driver; otherwise the OS will ignore the device when rebalancing
> > > > > > > happens and might map something else over the ignored BAR.      
> > > > > > 
> > > > > > Does it disable the BAR then? Or just move it elsewhere?    
> > > > > it doesn't; it just blindly ignores the BAR's existence and maps the
> > > > > BAR of another device (one with a driver) over it.    
> > > > 
> > > > Interesting. On classical PCI this is a forbidden configuration.
> > > > Maybe we do something that confuses windows?
> > > > Could you tell me how to reproduce this behaviour?  
> > > #cat > t << EOF
> > > pci_update_mappings_del
> > > pci_update_mappings_add
> > > EOF
> > > 
> > > #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> > >  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> > >  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> > >  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> > > 
> > > wait till the OS boots, note the BARs programmed for ivshmem;
> > >  in my case it was
> > >    01:01.0 0,0xfe800000+0x100
> > > then execute the script and watch the pci_update_mappings* trace events
> > > 
> > > # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> > > 
> > > hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing, where
> > > Windows unmaps all BARs of the NICs on the bridge but doesn't touch
> > > ivshmem, and then programs new BARs, where:
> > >   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> > > creates a BAR overlapping with ivshmem   
> > 
> > 
> > Thanks!
> > We need to figure this out, because currently this does not
> > work properly (or maybe it works, but merely by chance).
> > Marcel and I will play with this.
> > 
> > >   
> > > >   
> > > > > >     
> > > > > > > >       
> > > > > > > > >    In that case NVDIMM could even map the whole label area and
> > > > > > > > >    significantly simplify the QEMU<->OSPM protocol that currently
> > > > > > > > >    serializes that data through a 4K page.
> > > > > > > > >    There is also a performance issue with a buffer allocated in RAM,
> > > > > > > > >    because DMA adds an unnecessary copying step when data could
> > > > > > > > >    be read/written directly from/to the NVDIMM.
> > > > > > > > >    It might be not very important for the _DSM interface, but when
> > > > > > > > >    it comes to supporting block mode it can become an issue.        
> > > > > > > > 
> > > > > > > > So for NVDIMM, presumably it will have code to access the PCI BAR
> > > > > > > > properly, so it's guaranteed to work across BAR rebalancing.
> > > > > > > > Would that address the performance issue?      
> > > > > > > 
> > > > > > > it would, if rebalancing were to account for driverless PCI device
> > > > > > > BARs; but it doesn't, hence such BARs need to be statically pinned
> > > > > > > at the place where the BIOS put them at startup.
> > > > > > > I'm also not sure that a PCI_Config operation region would work
> > > > > > > on Windows without a loaded driver (similar to the _DSM case).
> > > > > > > 
> > > > > > >       
> > > > > > > > > The above points make the ACPI patching approach not robust:
> > > > > > > > > fragile and hard to maintain.        
> > > > > > > > 
> > > > > > > > Wrt GEN ID these are all kind of subjective, though. I especially
> > > > > > > > don't get what appears to be your general dislike of the linker
> > > > > > > > host/guest interface.      
> > > > > > > Besides the technical issues, the general dislike is just what I've written:
> > > > > > > the "not robust and fragile" bios_linker_loader_add_pointer() interface.
> > > > > > > 
> > > > > > > to make it less fragile:
> > > > > > >  1. it should be impossible to corrupt memory or patch a wrong address.
> > > > > > >     The current implementation silently relies on the value referenced
> > > > > > >     by the 'pointer' argument, and to figure that out one has to read
> > > > > > >     the linker code on the BIOS side. That value could easily be set
> > > > > > >     wrong and slip through review.
> > > > > > 
> > > > > > That's an API issue, it seemed like a good idea but I guess
> > > > > > it confuses people. Would you be happier using an offset
> > > > > > instead of a pointer?    
> > > > > offset is better and it would be better if it were saying
> > > > > which offset it is (i.e. relative to what)    
> > > > 
> > > > 
> > > > Start of table, right?  
> > > not sure, to me it looks like start of a blob and not the table  
> > 
> > Right that's what I meant.
> > 
> > > >   
> > > > > >     
> > > > > > >     API shouldn't rely on the caller setting value pointed by that argument.      
> > > > > > 
> > > > > > I couldn't parse that one. Care suggesting a cleaner API for linker?    
> > > > > here is current API signature:
> > > > > 
> > > > > bios_linker_loader_add_pointer(GArray *linker,
> > > > >                                     const char *dest_file,
> > > > >                                     const char *src_file,
> > > > >                                     GArray *table, void *pointer,
> > > > >                                     uint8_t pointer_size)
> > > > > 
> > > > > issue 1: 
> > > > > where 'pointer' is a real pointer pointing inside 'table' and the API
> > > > > calculates the offset under the hood:
> > > > >   offset = (gchar *)pointer - table->data;
> > > > > and puts it in ADD_POINTER command.
> > > > > 
> > > > > it's easy to get a wrong offset if 'pointer' is not from 'table'.
> > > > 
> > > > OK, replace that with table_offset?  
> > > blob_offset?
> > > 
> > > also s/table/blob/  
> > 
> > 
> > OK.
> > 
> > > >   
> > > > > issue 2:
> > > > > 'pointer' points to another offset of size 'pointer_size' in the 'table'
> > > > > blob; that means that whoever composes the blob has to be aware of
> > > > > it and fill in the correct value there, which is only possible to do
> > > > > right if one looks inside the SeaBIOS part of the linker interface.
> > > > > That is easy to forget, and then one has to deal with the mess
> > > > > caused by random memory corruption.
> > > > > 
> > > > > bios_linker_loader_add_pointer() and the corresponding
> > > > > ADD_POINTER command should take this second offset as an argument
> > > > > and not require 'table' to be pre-filled with it; or, in the worst
> > > > > case, if extending the ADD_POINTER command is problematic,
> > > > > bios_linker_loader_add_pointer() should still take
> > > > > the second offset and patch 'table' itself so that the 'table'
> > > > > composer doesn't have to worry about it.
> > > > 
> > > > This one I don't understand. What's the second pointer you
> > > > are talking about?  
> > > ha, see, even the author already has absolutely no clue how the linker works
> > > or what the offsets are relative to.
> > > see SeaBIOS romfile_loader_add_pointer():
> > >     ...
> > >     memcpy(&pointer, dest_file->data + offset, entry->pointer_size);
> > > here is the second offset       ^^^^^^^^^^^^^  
> > 
> > It's the same offset in the entry.
> >         struct {
> >             char pointer_dest_file[ROMFILE_LOADER_FILESZ];
> >             char pointer_src_file[ROMFILE_LOADER_FILESZ];
> >             u32 pointer_offset;
> >             u8 pointer_size;
> >         };
> pointer_offset == the offset from above, but the question is what the result
> of memcpy() is and how it's used below vvvv

It's just because SeaBIOS is trying to be endianness
agnostic. So instead of patching the value directly,
we read the pointer value, add the offset, then store
the result back.

> > >     pointer = le64_to_cpu(pointer);
> > >     pointer += (unsigned long)src_file->data;
> as you see *(foo_size *)(dest_file->data + offset) is the 2nd offset
> relative to the beginning of src_file

Sorry I don't see. Where's the second offset? I see a single one.

> and the current API requires the
> dst_file blob to contain a valid value of pointer_size there,
> i.e. the author of the AML has to prefill the 2nd offset before
> passing the blob to bios_linker_loader_add_pointer()

Since you say prefill, I am guessing what you refer
to is the fact that the add_pointer command adds
the pointer to the current value in the dest blob,
as opposed to overwriting it.

So if you want the result to point at offset within source blob,
you can put that offset within the pointer value.




> which is rather fragile.
> If it's difficult to make the ADD_POINTER command pass that offset
> as part of the command, then it would be better to extend
> bios_linker_loader_add_pointer() to take a src_offset and write
> it into the blob, instead of asking the AML author to do it manually
> every time.

Okay. so let's be more specific. This is what makes you unhappy I guess?

    rsdp->rsdt_physical_address = cpu_to_le32(rsdt);
    /* Address to be filled by Guest linker */
    bios_linker_loader_add_pointer(linker, ACPI_BUILD_RSDP_FILE,
                                   ACPI_BUILD_TABLE_FILE,
                                   rsdp_table,
				   &rsdp->rsdt_physical_address,
                                   sizeof rsdp->rsdt_physical_address);

You need to remember to fill in the patched value.
How would you improve on this?



> And maybe add a bios_linker_loader_add_aml_pointer() which would be able
> to check that it patches AML correctly, while the old bios_linker_loader_add_pointer()
> wouldn't do the check and would work with raw tables.

I'm not sure how that will be used.


> 
> > >     pointer = cpu_to_le64(pointer);
> > >     memcpy(dest_file->data + offset, &pointer, entry->pointer_size);
> > > 
> > > all this src|dst_file and the confusing offsets (whatever they might mean)
> > > give me a headache every time I need to remember how the linker
> > > works and have to read both the QEMU and SeaBIOS code to figure it out
> > > each time. That's what I'd call an unmaintainable and hard-to-use API.
> > 
> > Right, there's a lack of documentation. It's my fault, so let's fix it.
> > It's an API issue, nothing to do with the ABI.
> > 
> > 
> > >   
> > > >   
> > > > > issue 3:
> > > > > all patching obviously needs bounds checking on the QEMU side,
> > > > > so that it would abort early instead of corrupting memory.
> > > > 
> > > > That's easy.
> > > >   
> > > > > >     
> > > > > > >  2. If it's going to be used for patching AML, it should assert,
> > > > > > >     when bios_linker_loader_add_pointer() is called, if the AML object
> > > > > > >     to be patched is wrong and patching would corrupt the AML blob.
> > > > > > 
> > > > > > Hmm for example check that the patched data has
> > > > > > the expected pattern?    
> > > > > yep, nothing can be done for raw tables, but that should be possible
> > > > > for AML tables, and if the pattern is unsupported or the size doesn't
> > > > > match, it should abort QEMU early instead of corrupting the table.
> > > > 
> > > > Above all sounds reasonable. Would you like to take a stab
> > > > at it, or would you prefer me to?
> > > It would be better if it were you.
> > > 
> > > I wouldn't ever like to maintain it, as the API is too complex and hard
> > > to use; I'd use it only as a last resort if there weren't any other way
> > > to implement the task at hand.
> > 
> > Sorry, I don't get it. You don't like the API, write a better one
> > for an existing ABI. If you prefer waiting for me to fix it,
> > that's fine too but no guarantees that you will like the new one
> > or when it will happen.
> > 
> > Look, there has been one change (a bugfix for alignment) in the several
> > years since we added the linker. We don't maintain any compatibility flags
> > around it *at all*. It might have a hard-to-use API but that is the
> > definition of easy to maintain.  You are pushing allocating memory host-side
> > as an alternative; what happened there is the reverse: a ton of
> > changes and pain all the way, and we get to maintain a bag of compat
> > hacks for old machine types. You say we finally know what we are
> > doing and won't have to change it any more. I'm not convinced.
> You are talking (lots of compat issues) about the manually mapped initial
> memory map, while I'm talking about allocation in the hotplug memory region.
> The latter also had only one compat 'improvement', for alignment, but
> otherwise it wasn't a source of any problems.

I'll have to look at the differences.
Why don't they affect hotplug memory region?

> > > > > >     
> > > > > > >       
> > > > > > > > It's there and we are not moving away from it, so why not
> > > > > > > > use it in more places?  Or if you think it's wrong, why don't you build
> > > > > > > > something better then?  We could then maybe use it for these things as
> > > > > > > > well.      
> > > > > > > 
> > > > > > > Yep, I think for vmgenid and even more so for nvdimm
> > > > > > > it would be better to allocate GPAs in QEMU and map backing
> > > > > > > MemoryRegions directly in QEMU.
> > > > > > > For nvdimm (the main data region)
> > > > > > > we already do it using pc-dimm's GPA allocation algorithm; we
> > > > > > > could also use a similar approach for nvdimm's label area and vmgenid.
> > > > > > > 
> > > > > > > Here is a simple attempt to add a limited GPA allocator in high memory
> > > > > > >  https://patchwork.ozlabs.org/patch/540852/
> > > > > > > But it hasn't got any comments from you and was ignored.
> > > > > > > Let's consider it, and perhaps we could come up with a GPA allocator
> > > > > > > that could be used for other things as well.
> > > > > > 
> > > > > > For nvdimm label area, I agree passing things through
> > > > > > a 4K buffer seems inefficient.
> > > > > > 
> > > > > > I'm not sure what's a better way though.
> > > > > > 
> > > > > > Use 64 bit memory? Setting aside old guests such as XP,
> > > > > > does it break 32 bit guests?    
> > > > > it might not work with 32-bit guests, the same way mem hotplug
> > > > > doesn't work for them unless they are PAE-enabled.
> > > > 
> > > > Right, I mean with PAE.  
> > > I've tested it with 32-bit XP and Windows 10; they boot fine, and the
> > > vmgenid device is displayed as OK with the buffer above 4Gb (on Win10).
> > > So at least it doesn't crash the guest.
> > > I can't test more than that for 32-bit guests, since the utility
> > > to read vmgenid works only on Windows Server and there isn't
> > > a 32-bit version of it.
> > >   
> > > >   
> > > > > but well, that's a limitation of the implementation, and considering
> > > > > that the nvdimm storage area is mapped at a 64-bit GPA, it doesn't matter.
> > > > >      
> > > > > > I'm really afraid of adding yet another allocator; I think you
> > > > > > underestimate the maintenance headache: it's not theoretical and is
> > > > > > already felt.
> > > > > The current maintenance headache is due to the fixed, handpicked
> > > > > memory layout; we can't do much about it for legacy machine
> > > > > types, but with a QEMU-side GPA allocator we can try to switch
> > > > > to a flexible memory layout that would allocate GPAs
> > > > > depending on the QEMU config in a stable manner.
> > > > 
> > > > So far, we haven't managed to. It seems to go in the reverse
> > > > direction, where we add more and more control to let management
> > > > influence the layout. Things like alignment requirements
> > > > also tend to surface later and wreak havoc on whatever
> > > > we do.
> > > Was there even an attempt to try it before? Could you point to it?
> > 
> > Look at the mess we have with the existing allocator.
> > As a way to fix this unmaintainable mess, what I see is suggestions
> > to drop old machine types so people have to reinstall guests.
> > This does not inspire confidence.
> > 
> > > The only attempt I've seen was https://patchwork.ozlabs.org/patch/540852/
> > > but it hasn't got any technical comments from you,
> > > except 'I'm afraid that it won't work' on IRC.
> > > 
> > > QEMU already has a GPA allocator limited to the memory hotplug AS,
> > > and it has passed through its 'growing' issues. What the above patch
> > > proposes is to reuse the already existing memory hotplug AS and
> > > maybe make its GPA allocator more generic (i.e. not tied
> > > only to pc-dimm) on top of it.
> > 
> > You say we finally know what we are doing and won't have to change it
> > any more. I'm not convinced.
> > 
> > > 
> > > It's sufficient for the vmgenid use case and definitely
> > > much more suitable for nvdimm, which already uses it for mapping
> > > the main storage MemoryRegion.
> > >   
> > > > > well, there is a maintenance headache with bios_linker as well,
> > > > > due to its complexity (multiple layers of indirection), and
> > > > > it will grow as more places try to use it.
> > > > > Yep, we could use it as a hack, stealing RAM and trying to implement
> > > > > backwards DMA, or we could be less afraid and consider
> > > > > yet another allocator which will do the job without hacks,
> > > > > which should benefit QEMU in the long run (it might not be easy
> > > > > to implement it right, but if we won't even try, we will be buried
> > > > > in complex hacks that 'work' for now)
> > > > >     
> > > > > > > >       
> > > > > > > > >         
> > > > > > > > > >         
> > > > > > > > > > > > And hey, if you want to use a pci device to pass the physical
> > > > > > > > > > > > address guest to host, instead of reserving
> > > > > > > > > > > > a couple of IO addresses, sure, stick it in pci config in
> > > > > > > > > > > > a vendor-specific capability, this way it'll get migrated
> > > > > > > > > > > > automatically.          
> > > > > > > > > > > Could you elaborate more on this suggestion?          
> > > > > > > > > > 
> > > > > > > > > > I really just mean using PCI_Config operation region.
> > > > > > > > > > If you wish, I'll try to post a prototype next week.        
> > > > > > > > > I don't know much about PCI but it would be interesting,
> > > > > > > > > perhaps we could use it somewhere else.
> > > > > > > > > 
> > > > > > > > > However, it should be checked whether it works with Windows;
> > > > > > > > > for example, the PCI-specific _DSM method is ignored by it
> > > > > > > > > if the PCI device doesn't have a working PCI driver bound to it.
> > > > > > > > >         
> > > > > > > > > >         
> > > > > > > > > > > > 
> > > > > > > > > > > >           
> > > > > > > > > > > > > ---
> > > > > > > > > > > > > changes since 17:
> > > > > > > > > > > > >   - small fixups suggested in v14 review by Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > > > > >   - make BAR prefetchable to make region cached as per MS spec
> > > > > > > > > > > > >   - s/uuid/guid/ to match spec
> > > > > > > > > > > > > changes since 14:
> > > > > > > > > > > > >   - reserve BAR resources so that Windows won't touch it
> > > > > > > > > > > > >     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > > > > > > >   - ACPI: split the VGEN device off the PCI device descriptor
> > > > > > > > > > > > > > >     and place it at PCI0 scope, so that there's no need to trace its
> > > > > > > > > > > > > > >     location on PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > > > > >   - permit only one vmgenid to be created
> > > > > > > > > > > > >   - enable BAR be mapped above 4Gb if it can't be mapped at low mem
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  default-configs/i386-softmmu.mak   |   1 +
> > > > > > > > > > > > >  default-configs/x86_64-softmmu.mak |   1 +
> > > > > > > > > > > > >  docs/specs/pci-ids.txt             |   1 +
> > > > > > > > > > > > >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> > > > > > > > > > > > >  hw/misc/Makefile.objs              |   1 +
> > > > > > > > > > > > >  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
> > > > > > > > > > > > >  include/hw/misc/vmgenid.h          |  27 +++++++
> > > > > > > > > > > > >  include/hw/pci/pci.h               |   1 +
> > > > > > > > > > > > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > > > > > > > > > > > >  create mode 100644 hw/misc/vmgenid.c
> > > > > > > > > > > > >  create mode 100644 include/hw/misc/vmgenid.h
> > > > > > > > > > > > > 
> > > > > > > > > > > > > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > > > > > > > > > > > > index b177e52..6402439 100644
> > > > > > > > > > > > > --- a/default-configs/i386-softmmu.mak
> > > > > > > > > > > > > +++ b/default-configs/i386-softmmu.mak
> > > > > > > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > > > > > > +CONFIG_VMGENID=y
> > > > > > > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > > > > > > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > > > index 6e3b312..fdac18f 100644
> > > > > > > > > > > > > --- a/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > > > +++ b/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > > > > > > +CONFIG_VMGENID=y
> > > > > > > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > > > > > > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > > > > > > > > > > > > index 0adcb89..e65ecf9 100644
> > > > > > > > > > > > > --- a/docs/specs/pci-ids.txt
> > > > > > > > > > > > > +++ b/docs/specs/pci-ids.txt
> > > > > > > > > > > > > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> > > > > > > > > > > > >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> > > > > > > > > > > > >  1b36:0006  PCI Rocker Ethernet switch device
> > > > > > > > > > > > >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > > > > > > > > > > > > +1b36:0009  PCI VM-Generation device
> > > > > > > > > > > > >  1b36:000a  PCI-PCI bridge (multiseat)
> > > > > > > > > > > > >  
> > > > > > > > > > > > >  All these devices are documented in docs/specs.
> > > > > > > > > > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > > > > > > > > > index 78758e2..0187262 100644
> > > > > > > > > > > > > --- a/hw/i386/acpi-build.c
> > > > > > > > > > > > > +++ b/hw/i386/acpi-build.c
> > > > > > > > > > > > > @@ -44,6 +44,7 @@
> > > > > > > > > > > > >  #include "hw/acpi/tpm.h"
> > > > > > > > > > > > >  #include "sysemu/tpm_backend.h"
> > > > > > > > > > > > >  #include "hw/timer/mc146818rtc_regs.h"
> > > > > > > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > > > > > >  
> > > > > > > > > > > > >  /* Supported chipsets: */
> > > > > > > > > > > > >  #include "hw/acpi/piix4.h"
> > > > > > > > > > > > > @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> > > > > > > > > > > > >      info->applesmc_io_base = applesmc_port();
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >  
> > > > > > > > > > > > > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +    Aml *dev, *pkg, *crs;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    dev = aml_device("VGEN");
> > > > > > > > > > > > > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> > > > > > > > > > > > > +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> > > > > > > > > > > > > +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    pkg = aml_package(2);
> > > > > > > > > > > > > +    /* low 32 bits of UUID buffer addr */
> > > > > > > > > > > > > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > > > > > > > > > > > > +    /* high 32 bits of UUID buffer addr */
> > > > > > > > > > > > > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > > > > > > > > > > > > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    /*
> > > > > > > > > > > > > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> > > > > > > > > > > > > +     * displays it as "PCI RAM controller" which is marked as NO_DRV
> > > > > > > > > > > > > +     * so Windows ignores VMGEN device completely and doesn't check
> > > > > > > > > > > > > +     * for resource conflicts which during PCI rebalancing can lead
> > > > > > > > > > > > > +     * to another PCI device claiming ignored BARs. To prevent this
> > > > > > > > > > > > > +     * statically reserve resources used by VM_Gen_Counter.
> > > > > > > > > > > > > +     * For more verbose comment see this commit message.            
> > > > > > > > > > > > 
> > > > > > > > > > > > What does "this commit message" mean?          
> > > > > > > > > > > above commit message. Should I reword it to just 'see commit message'
> > > > > > > > > > >           
> > > > > > > > > > > >           
> > > > > > > > > > > > > +     */
> > > > > > > > > > > > > +     crs = aml_resource_template();
> > > > > > > > > > > > > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> > > > > > > > > > > > > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> > > > > > > > > > > > > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> > > > > > > > > > > > > +                VMGENID_VMGID_BUF_SIZE));
> > > > > > > > > > > > > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > > > > > > > > > > > > +     return dev;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > >  /*
> > > > > > > > > > > > >   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
> > > > > > > > > > > > >   * On i386 arch we only have two pci hosts, so we can look only for them.
> > > > > > > > > > > > > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > > > > > > >              }
> > > > > > > > > > > > >  
> > > > > > > > > > > > >              if (bus) {
> > > > > > > > > > > > > +                Object *vmgen;
> > > > > > > > > > > > >                  Aml *scope = aml_scope("PCI0");
> > > > > > > > > > > > >                  /* Scan all PCI buses. Generate tables to support hotplug. */
> > > > > > > > > > > > >                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> > > > > > > > > > > > > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > > > > > > >                      aml_append(scope, dev);
> > > > > > > > > > > > >                  }
> > > > > > > > > > > > >  
> > > > > > > > > > > > > +                vmgen = find_vmgneid_dev(NULL);
> > > > > > > > > > > > > +                if (vmgen) {
> > > > > > > > > > > > > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > > > > > > > > > > > > +                    uint64_t buf_paddr =
> > > > > > > > > > > > > +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > > > > > > > > > > > > +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +                        method = aml_method("\\_GPE._E00", 0,
> > > > > > > > > > > > > +                                            AML_NOTSERIALIZED);
> > > > > > > > > > > > > +                        aml_append(method,
> > > > > > > > > > > > > +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > > > > > > > > > > > > +                                       aml_int(0x80)));
> > > > > > > > > > > > > +                        aml_append(ssdt, method);
> > > > > > > > > > > > > +                    }
> > > > > > > > > > > > > +                }
> > > > > > > > > > > > > +
> > > > > > > > > > > > >                  aml_append(sb_scope, scope);
> > > > > > > > > > > > >              }
> > > > > > > > > > > > >          }
> > > > > > > > > > > > > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
> > > > > > > > > > > > >      {
> > > > > > > > > > > > >          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
> > > > > > > > > > > > >  
> > > > > > > > > > > > > -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> > > > > > > > > > > > > -
> > > > > > > > > > > > >          if (misc->is_piix4) {
> > > > > > > > > > > > >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> > > > > > > > > > > > >              aml_append(method,
> > > > > > > > > > > > > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > > > > > > > > > > > > index d4765c2..1f05edd 100644
> > > > > > > > > > > > > --- a/hw/misc/Makefile.objs
> > > > > > > > > > > > > +++ b/hw/misc/Makefile.objs
> > > > > > > > > > > > > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> > > > > > > > > > > > >  
> > > > > > > > > > > > >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> > > > > > > > > > > > >  obj-$(CONFIG_EDU) += edu.o
> > > > > > > > > > > > > +obj-$(CONFIG_VMGENID) += vmgenid.o
> > > > > > > > > > > > >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > > > > > > > > > > > > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000..a2fbdfc
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/hw/misc/vmgenid.c
> > > > > > > > > > > > > @@ -0,0 +1,154 @@
> > > > > > > > > > > > > +/*
> > > > > > > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > > > > > > + *
> > > > > > > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > > > > > > + *
> > > > > > > > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > > > > > + *
> > > > > > > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > > > > > + *
> > > > > > > > > > > > > + */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#include "hw/i386/pc.h"
> > > > > > > > > > > > > +#include "hw/pci/pci.h"
> > > > > > > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > > > > > > +#include "hw/acpi/acpi.h"
> > > > > > > > > > > > > +#include "qapi/visitor.h"
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +typedef struct VmGenIdState {
> > > > > > > > > > > > > +    PCIDevice parent_obj;
> > > > > > > > > > > > > +    MemoryRegion iomem;
> > > > > > > > > > > > > +    union {
> > > > > > > > > > > > > +        uint8_t guid[16];
> > > > > > > > > > > > > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > > > > > > > > > > > > +    };
> > > > > > > > > > > > > +    bool guid_set;
> > > > > > > > > > > > > +} VmGenIdState;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +Object *find_vmgneid_dev(Error **errp)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> > > > > > > > > > > > > +    if (!obj) {
> > > > > > > > > > > > > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > > > > > > > > > > > > +    }
> > > > > > > > > > > > > +    return obj;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static void vmgenid_update_guest(VmGenIdState *s)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +    Object *acpi_obj;
> > > > > > > > > > > > > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > > > > > > > > > > > > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> > > > > > > > > > > > > +    if (acpi_obj) {
> > > > > > > > > > > > > +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > > > > > > > > > > > > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > > > > > > > > > > > > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > > > > > > > > > > > > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > > > > > > > > > > > > +    }
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > > > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID
> > > > > > > > > > > > > +                   "': Failed to parse GUID string: %s",
> > > > > > > > > > > > > +                   object_get_typename(OBJECT(s)),
> > > > > > > > > > > > > +                   value);
> > > > > > > > > > > > > +        return;
> > > > > > > > > > > > > +    }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    s->guid_set = true;
> > > > > > > > > > > > > +    vmgenid_update_guest(s);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> > > > > > > > > > > > > +                                   const char *name, Error **errp)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    if (value == PCI_BAR_UNMAPPED) {
> > > > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> > > > > > > > > > > > > +                   object_get_typename(OBJECT(obj)));
> > > > > > > > > > > > > +        return;
> > > > > > > > > > > > > +    }
> > > > > > > > > > > > > +    visit_type_int(v, &value, name, errp);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static void vmgenid_initfn(Object *obj)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> > > > > > > > > > > > > +                           &error_abort);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> > > > > > > > > > > > > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > > > > > > > > > > > > +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +    VmGenIdState *s = VMGENID(dev);
> > > > > > > > > > > > > +    bool ambiguous = false;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > > > > > > > > > > > > +    if (ambiguous) {
> > > > > > > > > > > > > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > > > > > > > > > > > > +                         " device is permitted");
> > > > > > > > > > > > > +        return;
> > > > > > > > > > > > > +    }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    if (!s->guid_set) {
> > > > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> > > > > > > > > > > > > +                   object_get_typename(OBJECT(s)));
> > > > > > > > > > > > > +        return;
> > > > > > > > > > > > > +    }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > > > > > > > > > > > > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > > > > > > > > > > > > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > > > > > > > > > > > > +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> > > > > > > > > > > > > +        &s->iomem);
> > > > > > > > > > > > > +    return;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > > > > > > > > > > > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > > > > > > > > > > > +    dc->hotpluggable = false;
> > > > > > > > > > > > > +    k->realize = vmgenid_realize;
> > > > > > > > > > > > > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > > > > > > > > > > > > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > > > > > > > > > > > > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static const TypeInfo vmgenid_device_info = {
> > > > > > > > > > > > > +    .name          = VMGENID_DEVICE,
> > > > > > > > > > > > > +    .parent        = TYPE_PCI_DEVICE,
> > > > > > > > > > > > > +    .instance_size = sizeof(VmGenIdState),
> > > > > > > > > > > > > +    .instance_init = vmgenid_initfn,
> > > > > > > > > > > > > +    .class_init    = vmgenid_class_init,
> > > > > > > > > > > > > +};
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static void vmgenid_register_types(void)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +    type_register_static(&vmgenid_device_info);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +type_init(vmgenid_register_types)
> > > > > > > > > > > > > diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > index 0000000..b90882c
> > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > +++ b/include/hw/misc/vmgenid.h
> > > > > > > > > > > > > @@ -0,0 +1,27 @@
> > > > > > > > > > > > > +/*
> > > > > > > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > > > > > > + *
> > > > > > > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > > > > > > + *
> > > > > > > > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > > > > > + *
> > > > > > > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > > > > > + *
> > > > > > > > > > > > > + */
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#ifndef HW_MISC_VMGENID_H
> > > > > > > > > > > > > +#define HW_MISC_VMGENID_H
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#include "qom/object.h"
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#define VMGENID_DEVICE           "vmgenid"
> > > > > > > > > > > > > +#define VMGENID_GUID             "guid"
> > > > > > > > > > > > > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > > > > > > > > > > > > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > > > > > > > > > > > > +#define VMGENID_VMGID_BUF_BAR    0
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +Object *find_vmgenid_dev(Error **errp);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +#endif
> > > > > > > > > > > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > > > > > > > > > > index dedf277..f4c9d48 100644
> > > > > > > > > > > > > --- a/include/hw/pci/pci.h
> > > > > > > > > > > > > +++ b/include/hw/pci/pci.h
> > > > > > > > > > > > > @@ -94,6 +94,7 @@
> > > > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> > > > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> > > > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > > > > > > > > > > > > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> > > > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> > > > > > > > > > > > >  
> > > > > > > > > > > > >  #define FMT_PCIBUS                      PRIx64
> > > > > > > > > > > > > -- 
> > > > > > > > > > > > > 1.8.3.1            
> > > > > > > >       
> > > >   

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-15 11:26                           ` Michael S. Tsirkin
@ 2016-02-15 13:56                             ` Igor Mammedov
  0 siblings, 0 replies; 59+ messages in thread
From: Igor Mammedov @ 2016-02-15 13:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Xiao Guangrong, ehabkost, Marcel Apfelbaum, ghammer, qemu-devel,
	lcapitulino, lersek

On Mon, 15 Feb 2016 13:26:29 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Feb 15, 2016 at 11:30:24AM +0100, Igor Mammedov wrote:
> > On Thu, 11 Feb 2016 18:30:19 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:  
> > > > On Tue, 9 Feb 2016 14:17:44 +0200
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > >     
> > > > > On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:    
> > > > > > > So the linker interface solves this rather neatly:
> > > > > > > bios allocates memory, bios passes memory map to guest.
> > > > > > > Served us well for several years without need for extensions,
> > > > > > > and it does solve the VM GEN ID problem, even though
> > > > > > > 1. it was never designed for huge areas like nvdimm seems to want to use
> > > > > > > 2. we might want to add a new 64 bit flag to avoid touching low memory      
> > > > > > linker interface is fine for some readonly data, like ACPI tables
> > > > > > especially fixed tables, but not so much for AML ones if one wants to patch them.
> > > > > > 
> > > > > > However, now when you want to use it for other purposes, you start
> > > > > > adding extensions and other guest->QEMU channels to communicate
> > > > > > patching info back.
> > > > > > It steals the guest's memory, which is also not nice and doesn't scale well.
> > > > > 
> > > > > This is an argument I don't get. Memory is memory. Call it guest memory
> > > > > or a RAM-backed PCI BAR - same thing. MMIO is cheaper, of course,
> > > > > but much slower.
> > > > > 
> > > > > ...    
> > > > It matters for the user, however: he pays for a guest with XXX RAM but gets
> > > > less than that. And that will keep getting worse as the number of such
> > > > devices increases.
> > > >     
> > > > > > > OK fine, but returning the PCI BAR address to the guest is wrong.
> > > > > > > How about reading it from ACPI then? Is it really
> > > > > > > broken unless there's *also* a driver?      
> > > > > > I don't get the question; the MS spec requires an address (ADDR method),
> > > > > > and it's read by ACPI (AML).
> > > > > 
> > > > > You were unhappy about DMA into guest memory.
> > > > > As a replacement for DMA, we could have AML read from
> > > > > e.g. PCI and write into RAM.
> > > > > This way we don't need to pass address to QEMU.    
> > > > That sounds better, as it saves us from allocating an IO port,
> > > > and QEMU doesn't need to write into guest memory; the only question is
> > > > whether a PCI_Config opregion would work with a driver-less PCI device.
> > > 
> > > Or a PCI BAR, for that matter. I don't know for sure.
> > unfortunately a BAR doesn't work for a driver-less PCI device,
> > but maybe we can add a vendor-specific PCI_Config to the always-present
> > LPC/ISA bridges and make it do the job, like it does for allocating
> > IO ports for CPU/MEM hotplug now.
> >   
> > >   
> > > > 
> > > > And it's still pretty much untestable, since it would require a
> > > > fully running OSPM to execute the AML side.
> > > 
> > > AML is not testable, but that's nothing new.
> > > You can test reading from PCI.
> > >   
> > > > >     
> > > > > > As for a PCI_Config OpRegion working without a driver, I haven't tried,
> > > > > > but I wouldn't be surprised if it doesn't, taking into account that
> > > > > > the MS-introduced _DSM doesn't.
> > > > > >       
> > > > > > > 
> > > > > > >       
> > > > > > > > > >    Just compare with a graphics card design, where on-device memory
> > > > > > > > > >    is mapped directly at some GPA, not wasting RAM that the guest could
> > > > > > > > > >    use for other tasks.
> > > > > > > > > 
> > > > > > > > > This might have been true 20 years ago.  Most modern cards do DMA.        
> > > > > > > > 
> > > > > > > > Modern cards, with their own RAM, map their VRAM directly into the
> > > > > > > > address space and allow users to use it (GEM API), so they do not waste
> > > > > > > > conventional RAM. For example, NVIDIA VRAM is mapped as PCI BARs the
> > > > > > > > same way as in this series (even the PCI class id is the same).
> > > > > > > 
> > > > > > > Don't know enough about graphics really, I'm not sure how these are
> > > > > > > relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > > > > > > mostly use guest RAM, not on card RAM.
> > > > > > >       
> > > > > > > > > >    VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> > > > > > > > > >    instead of consuming guest's RAM they should be mapped at
> > > > > > > > > >    some GPA and their memory accessed directly.          
> > > > > > > > > 
> > > > > > > > > VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > > > > > > > > address. This breaks the straight-forward approach of using a
> > > > > > > > > rebalanceable PCI BAR.        
> > > > > > > > 
> > > > > > > > For PCI rebalancing to work on Windows, one has to provide a working PCI
> > > > > > > > driver; otherwise the OS will ignore the device when rebalancing happens
> > > > > > > > and might map something else over the ignored BAR.
> > > > > > > 
> > > > > > > Does it disable the BAR then? Or just move it elsewhere?      
> > > > > > it doesn't; it just blindly ignores the BAR's existence and maps the BAR of
> > > > > > another device (one with a driver) over it.
> > > > > 
> > > > > Interesting. On classical PCI this is a forbidden configuration.
> > > > > Maybe we do something that confuses Windows?
> > > > > Could you tell me how to reproduce this behaviour?    
> > > > #cat > t << EOF
> > > > pci_update_mappings_del
> > > > pci_update_mappings_add
> > > > EOF
> > > > 
> > > > #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> > > >  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> > > >  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> > > >  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> > > > 
> > > > wait till OS boots, note BARs programmed for ivshmem
> > > >  in my case it was
> > > >    01:01.0 0,0xfe800000+0x100
> > > > then execute script and watch pci_update_mappings* trace events
> > > > 
> > > > # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> > > > 
> > > > hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing, where
> > > > Windows unmaps all the BARs of the NICs on the bridge but doesn't touch
> > > > ivshmem, and then programs new BARs, where:
> > > >   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> > > > creates a BAR overlapping with ivshmem
> > > 
> > > 
> > > Thanks!
> > > We need to figure this out because currently this does not
> > > work properly (or maybe it works, but merely by chance).
> > > Marcel and I will play with this.
> > >   
> > > >     
> > > > >     
> > > > > > >       
> > > > > > > > >         
> > > > > > > > > >    In that case NVDIMM could even map the whole label area and
> > > > > > > > > >    significantly simplify the QEMU<->OSPM protocol, which currently
> > > > > > > > > >    serializes that data through a 4K page.
> > > > > > > > > >    There is also a performance issue with a buffer allocated in RAM,
> > > > > > > > > >    because DMA adds an unnecessary copying step when data could
> > > > > > > > > >    be read/written directly from/to the NVDIMM.
> > > > > > > > > >    It might not be very important for the _DSM interface, but when it
> > > > > > > > > >    comes to supporting block mode it can become an issue.
> > > > > > > > > 
> > > > > > > > > So for NVDIMM, presumably it will have code to access the PCI BAR
> > > > > > > > > properly, so it's guaranteed to work across BAR rebalancing.
> > > > > > > > > Would that address the performance issue?        
> > > > > > > > 
> > > > > > > > it would if rebalancing were to account for driverless PCI device BARs,
> > > > > > > > but it doesn't; hence such BARs need to be statically pinned at the place
> > > > > > > > where the BIOS put them at start-up.
> > > > > > > > I'm also not sure that a PCI_Config operation region would work
> > > > > > > > on Windows without a loaded driver (similar to the _DSM case).
> > > > > > > > 
> > > > > > > >         
> > > > > > > > > > The above points make the ACPI patching approach not robust, fragile,
> > > > > > > > > > and hard to maintain.
> > > > > > > > > 
> > > > > > > > > Wrt GEN ID these are all kind of subjective, though.  I especially don't
> > > > > > > > > get what appears to be your general dislike of the linker host/guest
> > > > > > > > > interface.
> > > > > > > > Beyond the technical issues, the general dislike is just what I've
> > > > > > > > written: the bios_linker_loader_add_pointer() interface is "not robust
> > > > > > > > and fragile".
> > > > > > > > 
> > > > > > > > To make it less fragile:
> > > > > > > >  1. it should be impossible to corrupt memory or patch the wrong address.
> > > > > > > >     The current impl. silently relies on the value referenced by the
> > > > > > > >     'pointer' argument, and to figure that out one has to read the linker
> > > > > > > >     code on the BIOS side. That could easily be set wrong and slip
> > > > > > > >     through review.
> > > > > > > 
> > > > > > > That's an API issue, it seemed like a good idea but I guess
> > > > > > > it confuses people. Would you be happier using an offset
> > > > > > > instead of a pointer?      
> > > > > > an offset is better, and it would be better still if it said
> > > > > > which offset it is (i.e. relative to what)
> > > > > 
> > > > > 
> > > > > Start of table, right?    
> > > > not sure; to me it looks like the start of a blob, not the table
> > > 
> > > Right that's what I meant.
> > >   
> > > > >     
> > > > > > >       
> > > > > > > >     The API shouldn't rely on the caller setting the value pointed to by that argument.
> > > > > > > 
> > > > > > > I couldn't parse that one. Care suggesting a cleaner API for linker?      
> > > > > > here is the current API signature:
> > > > > > 
> > > > > > bios_linker_loader_add_pointer(GArray *linker,
> > > > > >                                     const char *dest_file,
> > > > > >                                     const char *src_file,
> > > > > >                                     GArray *table, void *pointer,
> > > > > >                                     uint8_t pointer_size)
> > > > > > 
> > > > > > issue 1: 
> > > > > > where 'pointer' is a real pointer pointing inside 'table', and the API
> > > > > > calculates the offset under the hood:
> > > > > >   offset = (gchar *)pointer - table->data;
> > > > > > and puts it in the ADD_POINTER command.
> > > > > > 
> > > > > > It's easy to get a wrong offset if 'pointer' is not from 'table'.
> > > > > 
> > > > > OK, replace that with table_offset?    
> > > > blob_offset?
> > > > 
> > > > also s/table/blob/    
> > > 
> > > 
> > > OK.
> > >   
> > > > >     
> > > > > > issue 2:
> > > > > > 'pointer' points to another offset of size 'pointer_size' in the 'table'
> > > > > > blob; that means whoever composes the blob has to be aware of
> > > > > > it and fill in the correct value there, which is only possible to do
> > > > > > right if one looks inside the SeaBIOS part of the linker interface.
> > > > > > That is easy to forget, and then one has to deal with the mess
> > > > > > caused by random memory corruption.
> > > > > > 
> > > > > > bios_linker_loader_add_pointer() and the corresponding
> > > > > > ADD_POINTER command should take this second offset as an argument
> > > > > > and not require 'table' to be pre-filled with it; or, in the worst case,
> > > > > > if extending the ADD_POINTER command is problematic,
> > > > > > bios_linker_loader_add_pointer() should still take
> > > > > > the second offset and patch 'table' itself, so that the 'table' composer
> > > > > > doesn't have to worry about it.
> > > > > 
> > > > > This one I don't understand. What's the second pointer you
> > > > > are talking about?    
> > > > ha, see, even the author already has absolutely no clue how the linker
> > > > works or what the offsets are relative to.
> > > > see SeaBIOS romfile_loader_add_pointer():
> > > >     ...
> > > >     memcpy(&pointer, dest_file->data + offset, entry->pointer_size);
> > > > here is the second offset       ^^^^^^^^^^^^^    
> > > 
> > > It's the same offset in the entry.
> > >         struct {
> > >             char pointer_dest_file[ROMFILE_LOADER_FILESZ];
> > >             char pointer_src_file[ROMFILE_LOADER_FILESZ];
> > >             u32 pointer_offset;
> > >             u8 pointer_size;
> > >         };  
> > pointer_offset == the offset from above, but the question is what the
> > result of memcpy() is and how it's used below vvvv
> 
> It's just because seabios is trying to be endianness-agnostic.
> So instead of patching the value directly,
> we read the pointer value, add the offset, then store
> the result back.
> 
> > > >     pointer = le64_to_cpu(pointer);
> > > >     pointer += (unsigned long)src_file->data;  
> > as you see *(foo_size *)(dest_file->data + offset) is the 2nd offset
> > relative to the beginning of src_file  
> 
> Sorry I don't see. Where's the second offset? I see a single one.
> 
> > and the current API requires the dst_file blob to already contain a valid
> > value of pointer_size there,
> > i.e. the author of the AML has to prefill the 2nd offset before passing
> > the blob to bios_linker_loader_add_pointer()
> 
> Since you say prefill, I am guessing what you refer
> to is the fact that the add_pointer command adds
> a pointer to the current value in the dest blob,
> as opposed to overwriting it.
> 
> So if you want the result to point at offset within source blob,
> you can put that offset within the pointer value.
> 
> 
> 
> 
> > which is rather fragile.
> > If it's difficult to make the ADD_POINTER command pass that offset
> > as part of the command, then it would be better to extend
> > bios_linker_loader_add_pointer() to take src_offset and write
> > it into the blob, instead of asking the AML author to do it manually
> > every time.
> 
> Okay, so let's be more specific. This is what makes you unhappy, I guess?
> 
>     rsdp->rsdt_physical_address = cpu_to_le32(rsdt);
>     /* Address to be filled by Guest linker */
>     bios_linker_loader_add_pointer(linker, ACPI_BUILD_RSDP_FILE,
>                                    ACPI_BUILD_TABLE_FILE,
>                                    rsdp_table,
> 				   &rsdp->rsdt_physical_address,
>                                    sizeof rsdp->rsdt_physical_address);
> 
> You need to remember to fill in the patched value.
> How would you improve on this?

It would be better and clearer if the user didn't have to use the following line:
  rsdp->rsdt_physical_address = cpu_to_le32(rsdt);

and the signature would look like:
 void bios_linker_loader_add_pointer(GArray *linker,
            const char *dest_file, uint_foo_t dst_offset, uint8_t dst_patched_size,
            const char *src_file, uint_foo_t src_offset)

so the above would look like:
 bios_linker_loader_add_pointer(linker,
     ACPI_BUILD_RSDP_FILE,
     (char *)&rsdp->rsdt_physical_address - rsdp_table->data,
     sizeof rsdp->rsdt_physical_address,
     ACPI_BUILD_TABLE_FILE, rsdt_offset_in_TABLE_FILE);


> 
> 
> 
> > And maybe add a bios_linker_loader_add_aml_pointer(), which would be able
> > to check that it patches AML correctly, while the old
> > bios_linker_loader_add_pointer() wouldn't do the check and would work
> > with raw tables.
> 
> I'm not sure how that will be used.
> 
> 
> >   
> > > >     pointer = cpu_to_le64(pointer);
> > > >     memcpy(dest_file->data + offset, &pointer, entry->pointer_size);
> > > > 
> > > > all this src|dst_file and the confusing offsets (whatever they might mean)
> > > > give me a headache every time I need to remember how the linker
> > > > works and have to read both the QEMU and SeaBIOS code to figure it out.
> > > > That's what I'd call an unmaintainable and hard-to-use API.
> > > 
> > > Right, there's a lack of documentation. It's my fault, so let's fix it.
> > > It's an API issue, nothing to do with ABI.
> > > 
> > >   
> > > >     
> > > > >     
> > > > > > issue 3:
> > > > > > all patching obviously needs bounds checking on QEMU side
> > > > > > so it would abort early if it could corrupt memory.      
> > > > > 
> > > > > That's easy.
> > > > >     
> > > > > > >       
> > > > > > > >  2. If it's going to be used for patching AML, it should assert,
> > > > > > > >     when bios_linker_loader_add_pointer() is called, if the to-be-patched
> > > > > > > >     AML object is wrong and patching would corrupt the AML blob.
> > > > > > > 
> > > > > > > Hmm for example check that the patched data has
> > > > > > > the expected pattern?      
> > > > > > yep, nothing can be done for raw tables, but that should be possible
> > > > > > for AML tables, and if the pattern is unsupported or the size doesn't
> > > > > > match, it should abort QEMU early instead of corrupting the table.
> > > > > 
> > > > > Above all sounds reasonable. Would you like to take a stab
> > > > > at it, or would you prefer me to?
> > > > It would be better if it were you.
> > > > 
> > > > I wouldn't like to maintain it, ever, as it's too complex and hard-to-use
> > > > an API, one I'd use only as a last resort if there weren't any other way
> > > > to implement the task at hand.
> > > 
> > > Sorry, I don't get it. You don't like the API? Then write a better one
> > > for the existing ABI. If you prefer waiting for me to fix it,
> > > that's fine too but no guarantees that you will like the new one
> > > or when it will happen.
> > > 
> > > Look, there has been one change (a bugfix for alignment) in the several
> > > years since we added the linker. We don't maintain any compatibility flags
> > > around it *at all*. It might have a hard-to-use API, but that is the
> > > definition of easy to maintain.  You are pushing host-side memory
> > > allocation as an alternative; what happened there is the reverse. A ton of
> > > changes and pain all the way, and we get to maintain a bag of compat
> > > hacks for old machine types. You say we finally know what we are
> > > doing and won't have to change it any more. I'm not convinced.
> > You are talking (lots of compat issues) about the manually mapped initial
> > memory map, while I'm talking about allocation in the hotplug memory region.
> > The latter also had only one compat 'improvement', for alignment, but
> > otherwise it wasn't a source of any problems.
> 
> I'll have to look at the differences.
> Why don't they affect hotplug memory region?
> 
> > > > > > >       
> > > > > > > >         
> > > > > > > > > It's there and we are not moving away from it, so why not
> > > > > > > > > use it in more places?  Or if you think it's wrong, why don't you build
> > > > > > > > > something better then?  We could then maybe use it for these things as
> > > > > > > > > well.        
> > > > > > > > 
> > > > > > > > Yep, I think for vmgenid and even more so for nvdimm
> > > > > > > > it would be better to allocate GPAs in QEMU and map backing
> > > > > > > > MemoryRegions directly in QEMU.
> > > > > > > > For nvdimm (the main data region)
> > > > > > > > we already do it using pc-dimm's GPA allocation algorithm; we could
> > > > > > > > also use a similar approach for nvdimm's label area and vmgenid.
> > > > > > > > 
> > > > > > > > Here is a simple attempt to add a limited GPA allocator in high memory
> > > > > > > >  https://patchwork.ozlabs.org/patch/540852/
> > > > > > > > But it hasn't received any comments from you and was ignored.
> > > > > > > > Let's consider it, and perhaps we can come up with a GPA allocator
> > > > > > > > that could be used for other things as well.
> > > > > > > 
> > > > > > > For nvdimm label area, I agree passing things through
> > > > > > > a 4K buffer seems inefficient.
> > > > > > > 
> > > > > > > I'm not sure what's a better way though.
> > > > > > > 
> > > > > > > Use 64 bit memory? Setting aside old guests such as XP,
> > > > > > > does it break 32 bit guests?      
> > > > > > it might not work with 32bit guests, the same way as mem hotplug
> > > > > > doesn't work for them unless they are PAE enabled.      
> > > > > 
> > > > > Right, I mean with PAE.    
> > > > I've tested it with 32-bit XP and Windows 10; they boot fine, and the
> > > > vmgenid device is displayed as OK with the buffer above 4Gb (on Win10).
> > > > So at least it doesn't crash the guest.
> > > > I can't test more than that for 32-bit guests, since the utility
> > > > to read vmgenid works only on Windows Server and there isn't
> > > > a 32-bit version of it.
> > > >     
> > > > >     
> > > > > > but well, that's a limitation of the implementation, and considering
> > > > > > that the nvdimm storage area is mapped at a 64-bit GPA, it doesn't matter.
> > > > > >        
> > > > > > > I'm really afraid of adding yet another allocator, I think you
> > > > > > > underestimate the maintenance headache: it's not theoretical and is
> > > > > > > already felt.      
> > > > > > The current maintenance headache is due to the fixed, handpicked
> > > > > > mem layout; we can't do much about it for legacy machine
> > > > > > types, but with a QEMU-side GPA allocator we can try to switch
> > > > > > to a flexible memory layout that would allocate GPAs
> > > > > > depending on the QEMU config in a stable manner.
> > > > > 
> > > > > So far, we haven't managed to. It seems to go in the reverse
> > > > > direction, where we add more and more control to let management
> > > > > influence the layout. Things like alignment requirements
> > > > > also tend to surface later and wreak havoc on whatever
> > > > > we do.
> > > > Was there even an attempt to try it before? Could you point to it?
> > > 
> > > Look at the mess we have with the existing allocator.
> > > As a way to fix this unmaintainable mess, what I see is suggestions
> > > to drop old machine types so people have to reinstall guests.
> > > This does not inspire confidence.
> > >   
> > > > The only attempt I've seen was https://patchwork.ozlabs.org/patch/540852/
> > > > but it hasn't received any technical comments from you,
> > > > except for 'I'm afraid that it won't work' on IRC.
> > > > 
> > > > QEMU already has a GPA allocator limited to the memory hotplug AS,
> > > > and it has passed through its 'growing' issues. What the above patch
> > > > proposes is to reuse the already existing memory hotplug AS, and
> > > > maybe make its GPA allocator more generic (i.e. not tied
> > > > only to pc-dimm) on top of it.
> > > 
> > > You say we finally know what we are doing and won't have to change it
> > > any more. I'm not convinced.
> > >   
> > > > 
> > > > It's sufficient for the vmgenid use-case and definitely
> > > > much more suitable for nvdimm, which already uses it for mapping
> > > > the main storage MemoryRegion.
> > > >     
> > > > > > well, there is a maintenance headache with bios_linker as well,
> > > > > > due to its complexity (multiple layers of indirection), and
> > > > > > it will grow as more places try to use it.
> > > > > > Yep, we could use it as a hack, stealing RAM and trying to implement
> > > > > > backwards DMA; or we could be less afraid and consider
> > > > > > yet another allocator which would do the job without hacks,
> > > > > > which should benefit QEMU in the long run (it might not be easy
> > > > > > to implement it right, but if we don't even try we will be buried
> > > > > > in complex hacks that 'work' for now)
> > > > > >       
> > > > > > > > >         
> > > > > > > > > >           
> > > > > > > > > > >           
> > > > > > > > > > > > > And hey, if you want to use a pci device to pass the physical
> > > > > > > > > > > > > address guest to host, instead of reserving
> > > > > > > > > > > > > a couple of IO addresses, sure, stick it in pci config in
> > > > > > > > > > > > > a vendor-specific capability, this way it'll get migrated
> > > > > > > > > > > > > automatically.            
> > > > > > > > > > > > Could you elaborate more on this suggestion?            
> > > > > > > > > > > 
> > > > > > > > > > > I really just mean using PCI_Config operation region.
> > > > > > > > > > > If you wish, I'll try to post a prototype next week.          
> > > > > > > > > > I don't know much about PCI, but it would be interesting;
> > > > > > > > > > perhaps we could use it somewhere else.
> > > > > > > > > > 
> > > > > > > > > > However, it should be checked whether it works with Windows;
> > > > > > > > > > for example, the PCI-specific _DSM method is ignored by it
> > > > > > > > > > if the PCI device doesn't have a working PCI driver bound to it.
> > > > > > > > > >           
> > > > > > > > > > >           
> > > > > > > > > > > > > 
> > > > > > > > > > > > >             
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > changes since 17:
> > > > > > > > > > > > > >   - small fixups suggested in v14 review by Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > > > > > >   - make BAR prefetchable to make region cached as per MS spec
> > > > > > > > > > > > > >   - s/uuid/guid/ to match spec
> > > > > > > > > > > > > > changes since 14:
> > > > > > > > > > > > > >   - reserve BAR resources so that Windows won't touch it
> > > > > > > > > > > > > >     during PCI rebalancing - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > > > > > >   - ACPI: split the VGEN device off the PCI device descriptor
> > > > > > > > > > > > > >     and place it at PCI0 scope, so that there is no need to trace its
> > > > > > > > > > > > > >     location on PCI buses. - "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > > > > > > > > >   - permit only one vmgenid to be created
> > > > > > > > > > > > > >   - enable the BAR to be mapped above 4Gb if it can't be mapped at low mem
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >  default-configs/i386-softmmu.mak   |   1 +
> > > > > > > > > > > > > >  default-configs/x86_64-softmmu.mak |   1 +
> > > > > > > > > > > > > >  docs/specs/pci-ids.txt             |   1 +
> > > > > > > > > > > > > >  hw/i386/acpi-build.c               |  56 +++++++++++++-
> > > > > > > > > > > > > >  hw/misc/Makefile.objs              |   1 +
> > > > > > > > > > > > > >  hw/misc/vmgenid.c                  | 154 +++++++++++++++++++++++++++++++++++++
> > > > > > > > > > > > > >  include/hw/misc/vmgenid.h          |  27 +++++++
> > > > > > > > > > > > > >  include/hw/pci/pci.h               |   1 +
> > > > > > > > > > > > > >  8 files changed, 240 insertions(+), 2 deletions(-)
> > > > > > > > > > > > > >  create mode 100644 hw/misc/vmgenid.c
> > > > > > > > > > > > > >  create mode 100644 include/hw/misc/vmgenid.h
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> > > > > > > > > > > > > > index b177e52..6402439 100644
> > > > > > > > > > > > > > --- a/default-configs/i386-softmmu.mak
> > > > > > > > > > > > > > +++ b/default-configs/i386-softmmu.mak
> > > > > > > > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > > > > > > > +CONFIG_VMGENID=y
> > > > > > > > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > > > > > > > diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > > > > index 6e3b312..fdac18f 100644
> > > > > > > > > > > > > > --- a/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > > > > +++ b/default-configs/x86_64-softmmu.mak
> > > > > > > > > > > > > > @@ -51,6 +51,7 @@ CONFIG_APIC=y
> > > > > > > > > > > > > >  CONFIG_IOAPIC=y
> > > > > > > > > > > > > >  CONFIG_PVPANIC=y
> > > > > > > > > > > > > >  CONFIG_MEM_HOTPLUG=y
> > > > > > > > > > > > > > +CONFIG_VMGENID=y
> > > > > > > > > > > > > >  CONFIG_NVDIMM=y
> > > > > > > > > > > > > >  CONFIG_ACPI_NVDIMM=y
> > > > > > > > > > > > > >  CONFIG_XIO3130=y
> > > > > > > > > > > > > > diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
> > > > > > > > > > > > > > index 0adcb89..e65ecf9 100644
> > > > > > > > > > > > > > --- a/docs/specs/pci-ids.txt
> > > > > > > > > > > > > > +++ b/docs/specs/pci-ids.txt
> > > > > > > > > > > > > > @@ -47,6 +47,7 @@ PCI devices (other than virtio):
> > > > > > > > > > > > > >  1b36:0005  PCI test device (docs/specs/pci-testdev.txt)
> > > > > > > > > > > > > >  1b36:0006  PCI Rocker Ethernet switch device
> > > > > > > > > > > > > >  1b36:0007  PCI SD Card Host Controller Interface (SDHCI)
> > > > > > > > > > > > > > +1b36:0009  PCI VM-Generation device
> > > > > > > > > > > > > >  1b36:000a  PCI-PCI bridge (multiseat)
> > > > > > > > > > > > > >  
> > > > > > > > > > > > > >  All these devices are documented in docs/specs.
> > > > > > > > > > > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > > > > > > > > > > index 78758e2..0187262 100644
> > > > > > > > > > > > > > --- a/hw/i386/acpi-build.c
> > > > > > > > > > > > > > +++ b/hw/i386/acpi-build.c
> > > > > > > > > > > > > > @@ -44,6 +44,7 @@
> > > > > > > > > > > > > >  #include "hw/acpi/tpm.h"
> > > > > > > > > > > > > >  #include "sysemu/tpm_backend.h"
> > > > > > > > > > > > > >  #include "hw/timer/mc146818rtc_regs.h"
> > > > > > > > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > > > > > > >  
> > > > > > > > > > > > > >  /* Supported chipsets: */
> > > > > > > > > > > > > >  #include "hw/acpi/piix4.h"
> > > > > > > > > > > > > > @@ -237,6 +238,40 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> > > > > > > > > > > > > >      info->applesmc_io_base = applesmc_port();
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > >  
> > > > > > > > > > > > > > +static Aml *build_vmgenid_device(uint64_t buf_paddr)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +    Aml *dev, *pkg, *crs;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +    dev = aml_device("VGEN");
> > > > > > > > > > > > > > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0003")));
> > > > > > > > > > > > > > +    aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
> > > > > > > > > > > > > > +    aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +    pkg = aml_package(2);
> > > > > > > > > > > > > > +    /* low 32 bits of UUID buffer addr */
> > > > > > > > > > > > > > +    aml_append(pkg, aml_int(buf_paddr & 0xFFFFFFFFUL));
> > > > > > > > > > > > > > +    /* high 32 bits of UUID buffer addr */
> > > > > > > > > > > > > > +    aml_append(pkg, aml_int(buf_paddr >> 32));
> > > > > > > > > > > > > > +    aml_append(dev, aml_name_decl("ADDR", pkg));
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +    /*
> > > > > > > > > > > > > > +     * VMGEN device has class_id PCI_CLASS_MEMORY_RAM and Windows
> > > > > > > > > > > > > > +     * displays it as "PCI RAM controller" which is marked as NO_DRV
> > > > > > > > > > > > > > +     * so Windows ignores VMGEN device completely and doesn't check
> > > > > > > > > > > > > > +     * for resource conflicts which during PCI rebalancing can lead
> > > > > > > > > > > > > > +     * to another PCI device claiming ignored BARs. To prevent this
> > > > > > > > > > > > > > +     * statically reserve resources used by VM_Gen_Counter.
> > > > > > > > > > > > > > +     * For more verbose comment see this commit message.              
> > > > > > > > > > > > > 
> > > > > > > > > > > > > What does "this commit message" mean?            
> > > > > > > > > > > > above commit message. Should I reword it to just 'see commit message'?
> > > > > > > > > > > >             
> > > > > > > > > > > > >             
> > > > > > > > > > > > > > +     */
> > > > > > > > > > > > > > +     crs = aml_resource_template();
> > > > > > > > > > > > > > +     aml_append(crs, aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
> > > > > > > > > > > > > > +                AML_MAX_FIXED, AML_CACHEABLE, AML_READ_WRITE, 0,
> > > > > > > > > > > > > > +                buf_paddr, buf_paddr + VMGENID_VMGID_BUF_SIZE - 1, 0,
> > > > > > > > > > > > > > +                VMGENID_VMGID_BUF_SIZE));
> > > > > > > > > > > > > > +     aml_append(dev, aml_name_decl("_CRS", crs));
> > > > > > > > > > > > > > +     return dev;
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >  /*
> > > > > > > > > > > > > >   * Because of the PXB hosts we cannot simply query TYPE_PCI_HOST_BRIDGE.
> > > > > > > > > > > > > >   * On i386 arch we only have two pci hosts, so we can look only for them.
> > > > > > > > > > > > > > @@ -2171,6 +2206,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > > > > > > > >              }
> > > > > > > > > > > > > >  
> > > > > > > > > > > > > >              if (bus) {
> > > > > > > > > > > > > > +                Object *vmgen;
> > > > > > > > > > > > > >                  Aml *scope = aml_scope("PCI0");
> > > > > > > > > > > > > >                  /* Scan all PCI buses. Generate tables to support hotplug. */
> > > > > > > > > > > > > >                  build_append_pci_bus_devices(scope, bus, pm->pcihp_bridge_en);
> > > > > > > > > > > > > > @@ -2187,6 +2223,24 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > > > > > > > > > > > >                      aml_append(scope, dev);
> > > > > > > > > > > > > >                  }
> > > > > > > > > > > > > >  
> > > > > > > > > > > > > > +                vmgen = find_vmgneid_dev(NULL);
> > > > > > > > > > > > > > +                if (vmgen) {
> > > > > > > > > > > > > > +                    PCIDevice *pdev = PCI_DEVICE(vmgen);
> > > > > > > > > > > > > > +                    uint64_t buf_paddr =
> > > > > > > > > > > > > > +                        pci_get_bar_addr(pdev, VMGENID_VMGID_BUF_BAR);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +                    if (buf_paddr != PCI_BAR_UNMAPPED) {
> > > > > > > > > > > > > > +                        aml_append(scope, build_vmgenid_device(buf_paddr));
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +                        method = aml_method("\\_GPE._E00", 0,
> > > > > > > > > > > > > > +                                            AML_NOTSERIALIZED);
> > > > > > > > > > > > > > +                        aml_append(method,
> > > > > > > > > > > > > > +                            aml_notify(aml_name("\\_SB.PCI0.VGEN"),
> > > > > > > > > > > > > > +                                       aml_int(0x80)));
> > > > > > > > > > > > > > +                        aml_append(ssdt, method);
> > > > > > > > > > > > > > +                    }
> > > > > > > > > > > > > > +                }
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >                  aml_append(sb_scope, scope);
> > > > > > > > > > > > > >              }
> > > > > > > > > > > > > >          }
> > > > > > > > > > > > > > @@ -2489,8 +2543,6 @@ build_dsdt(GArray *table_data, GArray *linker,
> > > > > > > > > > > > > >      {
> > > > > > > > > > > > > >          aml_append(scope, aml_name_decl("_HID", aml_string("ACPI0006")));
> > > > > > > > > > > > > >  
> > > > > > > > > > > > > > -        aml_append(scope, aml_method("_L00", 0, AML_NOTSERIALIZED));
> > > > > > > > > > > > > > -
> > > > > > > > > > > > > >          if (misc->is_piix4) {
> > > > > > > > > > > > > >              method = aml_method("_E01", 0, AML_NOTSERIALIZED);
> > > > > > > > > > > > > >              aml_append(method,
> > > > > > > > > > > > > > diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> > > > > > > > > > > > > > index d4765c2..1f05edd 100644
> > > > > > > > > > > > > > --- a/hw/misc/Makefile.objs
> > > > > > > > > > > > > > +++ b/hw/misc/Makefile.objs
> > > > > > > > > > > > > > @@ -43,4 +43,5 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
> > > > > > > > > > > > > >  
> > > > > > > > > > > > > >  obj-$(CONFIG_PVPANIC) += pvpanic.o
> > > > > > > > > > > > > >  obj-$(CONFIG_EDU) += edu.o
> > > > > > > > > > > > > > +obj-$(CONFIG_VMGENID) += vmgenid.o
> > > > > > > > > > > > > >  obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
> > > > > > > > > > > > > > diff --git a/hw/misc/vmgenid.c b/hw/misc/vmgenid.c
> > > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > > index 0000000..a2fbdfc
> > > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > > +++ b/hw/misc/vmgenid.c
> > > > > > > > > > > > > > @@ -0,0 +1,154 @@
> > > > > > > > > > > > > > +/*
> > > > > > > > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > > > > > > > + *
> > > > > > > > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > > > > > > > + *
> > > > > > > > > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > > > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > > > > > > + *
> > > > > > > > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > > > > > > + *
> > > > > > > > > > > > > > + */
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +#include "hw/i386/pc.h"
> > > > > > > > > > > > > > +#include "hw/pci/pci.h"
> > > > > > > > > > > > > > +#include "hw/misc/vmgenid.h"
> > > > > > > > > > > > > > +#include "hw/acpi/acpi.h"
> > > > > > > > > > > > > > +#include "qapi/visitor.h"
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +#define VMGENID(obj) OBJECT_CHECK(VmGenIdState, (obj), VMGENID_DEVICE)
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +typedef struct VmGenIdState {
> > > > > > > > > > > > > > +    PCIDevice parent_obj;
> > > > > > > > > > > > > > +    MemoryRegion iomem;
> > > > > > > > > > > > > > +    union {
> > > > > > > > > > > > > > +        uint8_t guid[16];
> > > > > > > > > > > > > > +        uint8_t guid_page[VMGENID_VMGID_BUF_SIZE];
> > > > > > > > > > > > > > +    };
> > > > > > > > > > > > > > +    bool guid_set;
> > > > > > > > > > > > > > +} VmGenIdState;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +Object *find_vmgneid_dev(Error **errp)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +    Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
> > > > > > > > > > > > > > +    if (!obj) {
> > > > > > > > > > > > > > +        error_setg(errp, VMGENID_DEVICE " is not found");
> > > > > > > > > > > > > > +    }
> > > > > > > > > > > > > > +    return obj;
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +static void vmgenid_update_guest(VmGenIdState *s)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +    Object *acpi_obj;
> > > > > > > > > > > > > > +    void *ptr = memory_region_get_ram_ptr(&s->iomem);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +    memcpy(ptr, &s->guid, sizeof(s->guid));
> > > > > > > > > > > > > > +    memory_region_set_dirty(&s->iomem, 0, sizeof(s->guid));
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +    acpi_obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
> > > > > > > > > > > > > > +    if (acpi_obj) {
> > > > > > > > > > > > > > +        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(acpi_obj);
> > > > > > > > > > > > > > +        AcpiDeviceIf *adev = ACPI_DEVICE_IF(acpi_obj);
> > > > > > > > > > > > > > +        ACPIREGS *acpi_regs = adevc->regs(adev);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +        acpi_regs->gpe.sts[0] |= 1; /* _GPE.E00 handler */
> > > > > > > > > > > > > > +        acpi_update_sci(acpi_regs, adevc->sci(adev));
> > > > > > > > > > > > > > +    }
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +    if (qemu_uuid_parse(value, s->guid) < 0) {
> > > > > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID
> > > > > > > > > > > > > > +                   "': Failed to parse GUID string: %s",
> > > > > > > > > > > > > > +                   object_get_typename(OBJECT(s)),
> > > > > > > > > > > > > > +                   value);
> > > > > > > > > > > > > > +        return;
> > > > > > > > > > > > > > +    }
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +    s->guid_set = true;
> > > > > > > > > > > > > > +    vmgenid_update_guest(s);
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +static void vmgenid_get_vmgid_addr(Object *obj, Visitor *v, void *opaque,
> > > > > > > > > > > > > > +                                   const char *name, Error **errp)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +    int64_t value = pci_get_bar_addr(PCI_DEVICE(obj), 0);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +    if (value == PCI_BAR_UNMAPPED) {
> > > > > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_VMGID_BUF_ADDR "': not initialized",
> > > > > > > > > > > > > > +                   object_get_typename(OBJECT(obj)));
> > > > > > > > > > > > > > +        return;
> > > > > > > > > > > > > > +    }
> > > > > > > > > > > > > > +    visit_type_int(v, &value, name, errp);
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +static void vmgenid_initfn(Object *obj)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +    VmGenIdState *s = VMGENID(obj);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +    memory_region_init_ram(&s->iomem, obj, "vgid.bar", sizeof(s->guid_page),
> > > > > > > > > > > > > > +                           &error_abort);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +    object_property_add_str(obj, VMGENID_GUID, NULL, vmgenid_set_guid, NULL);
> > > > > > > > > > > > > > +    object_property_add(obj, VMGENID_VMGID_BUF_ADDR, "int",
> > > > > > > > > > > > > > +                        vmgenid_get_vmgid_addr, NULL, NULL, NULL, NULL);
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +static void vmgenid_realize(PCIDevice *dev, Error **errp)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +    VmGenIdState *s = VMGENID(dev);
> > > > > > > > > > > > > > +    bool ambiguous = false;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +    object_resolve_path_type("", VMGENID_DEVICE, &ambiguous);
> > > > > > > > > > > > > > +    if (ambiguous) {
> > > > > > > > > > > > > > +        error_setg(errp, "no more than one " VMGENID_DEVICE
> > > > > > > > > > > > > > +                         " device is permitted");
> > > > > > > > > > > > > > +        return;
> > > > > > > > > > > > > > +    }
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +    if (!s->guid_set) {
> > > > > > > > > > > > > > +        error_setg(errp, "'%s." VMGENID_GUID "' property is not set",
> > > > > > > > > > > > > > +                   object_get_typename(OBJECT(s)));
> > > > > > > > > > > > > > +        return;
> > > > > > > > > > > > > > +    }
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +    vmstate_register_ram(&s->iomem, DEVICE(s));
> > > > > > > > > > > > > > +    pci_register_bar(PCI_DEVICE(s), VMGENID_VMGID_BUF_BAR,
> > > > > > > > > > > > > > +        PCI_BASE_ADDRESS_MEM_PREFETCH |
> > > > > > > > > > > > > > +        PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
> > > > > > > > > > > > > > +        &s->iomem);
> > > > > > > > > > > > > > +    return;
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +static void vmgenid_class_init(ObjectClass *klass, void *data)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > > > > > > > > > > > > +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > > > > > > > > > > > > +    dc->hotpluggable = false;
> > > > > > > > > > > > > > +    k->realize = vmgenid_realize;
> > > > > > > > > > > > > > +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > > > > > > > > > > > > > +    k->device_id = PCI_DEVICE_ID_REDHAT_VMGENID;
> > > > > > > > > > > > > > +    k->class_id = PCI_CLASS_MEMORY_RAM;
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +static const TypeInfo vmgenid_device_info = {
> > > > > > > > > > > > > > +    .name          = VMGENID_DEVICE,
> > > > > > > > > > > > > > +    .parent        = TYPE_PCI_DEVICE,
> > > > > > > > > > > > > > +    .instance_size = sizeof(VmGenIdState),
> > > > > > > > > > > > > > +    .instance_init = vmgenid_initfn,
> > > > > > > > > > > > > > +    .class_init    = vmgenid_class_init,
> > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +static void vmgenid_register_types(void)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +    type_register_static(&vmgenid_device_info);
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +type_init(vmgenid_register_types)
> > > > > > > > > > > > > > diff --git a/include/hw/misc/vmgenid.h b/include/hw/misc/vmgenid.h
> > > > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > > > index 0000000..b90882c
> > > > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > > > +++ b/include/hw/misc/vmgenid.h
> > > > > > > > > > > > > > @@ -0,0 +1,27 @@
> > > > > > > > > > > > > > +/*
> > > > > > > > > > > > > > + *  Virtual Machine Generation ID Device
> > > > > > > > > > > > > > + *
> > > > > > > > > > > > > > + *  Copyright (C) 2016 Red Hat Inc.
> > > > > > > > > > > > > > + *
> > > > > > > > > > > > > > + *  Authors: Gal Hammer <ghammer@redhat.com>
> > > > > > > > > > > > > > + *           Igor Mammedov <imammedo@redhat.com>
> > > > > > > > > > > > > > + *
> > > > > > > > > > > > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > > > > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > > > > > > + *
> > > > > > > > > > > > > > + */
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +#ifndef HW_MISC_VMGENID_H
> > > > > > > > > > > > > > +#define HW_MISC_VMGENID_H
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +#include "qom/object.h"
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +#define VMGENID_DEVICE           "vmgenid"
> > > > > > > > > > > > > > +#define VMGENID_GUID             "guid"
> > > > > > > > > > > > > > +#define VMGENID_VMGID_BUF_ADDR   "vmgid-addr"
> > > > > > > > > > > > > > +#define VMGENID_VMGID_BUF_SIZE   0x1000
> > > > > > > > > > > > > > +#define VMGENID_VMGID_BUF_BAR    0
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +Object *find_vmgneid_dev(Error **errp);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +#endif
> > > > > > > > > > > > > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > > > > > > > > > > > > index dedf277..f4c9d48 100644
> > > > > > > > > > > > > > --- a/include/hw/pci/pci.h
> > > > > > > > > > > > > > +++ b/include/hw/pci/pci.h
> > > > > > > > > > > > > > @@ -94,6 +94,7 @@
> > > > > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB         0x0009
> > > > > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
> > > > > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_PXB_PCIE    0x000b
> > > > > > > > > > > > > > +#define PCI_DEVICE_ID_REDHAT_VMGENID     0x000c
> > > > > > > > > > > > > >  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
> > > > > > > > > > > > > >  
> > > > > > > > > > > > > >  #define FMT_PCIBUS                      PRIx64
> > > > > > > > > > > > > > -- 
> > > > > > > > > > > > > > 1.8.3.1              
> > > > > > > > >         
> > > > >     
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-11 16:30                       ` Michael S. Tsirkin
  2016-02-11 17:34                         ` Marcel Apfelbaum
  2016-02-15 10:30                         ` Igor Mammedov
@ 2016-02-16 10:05                         ` Marcel Apfelbaum
  2016-02-16 12:17                           ` Igor Mammedov
  2 siblings, 1 reply; 59+ messages in thread
From: Marcel Apfelbaum @ 2016-02-16 10:05 UTC (permalink / raw)
  To: Michael S. Tsirkin, Igor Mammedov
  Cc: Xiao Guangrong, ehabkost, Marcel Apfelbaum, ghammer, qemu-devel,
	lcapitulino, lersek

On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:
> On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
>> On Tue, 9 Feb 2016 14:17:44 +0200
>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>
>>> On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
>>>>> So the linker interface solves this rather neatly:
>>>>> bios allocates memory, bios passes memory map to guest.
>>>>> Served us well for several years without need for extensions,
>>>>> and it does solve the VM GEN ID problem, even though
>>>>> 1. it was never designed for huge areas like nvdimm seems to want to use
>>>>> 2. we might want to add a new 64 bit flag to avoid touching low memory
>>>> linker interface is fine for some readonly data, like ACPI tables
>>>> especially fixed tables not so for AML ones is one wants to patch it.
>>>>
>>>> However now when you want to use it for other purposes you start
>>>> adding extensions and other guest->QEMU channels to communicate
>>>> patching info back.
>>>> It steals guest's memory which is also not nice and doesn't scale well.
>>>
>>> This is an argument I don't get. memory is memory. call it guest memory
>>> or RAM backed PCI BAR - same thing. MMIO is cheaper of course
>>> but much slower.
>>>
>>> ...
>> It however matters for user, he pays for guest with XXX RAM but gets less
>> than that. And that will be getting worse as a number of such devices
>> increases.
>>
>>>>> OK fine, but returning PCI BAR address to guest is wrong.
>>>>> How about reading it from ACPI then? Is it really
>>>>> broken unless there's *also* a driver?
>>>> I don't get question, MS Spec requires address (ADDR method),
>>>> and it's read by ACPI (AML).
>>>
>>> You were unhappy about DMA into guest memory.
>>> As a replacement for DMA, we could have AML read from
>>> e.g. PCI and write into RAM.
>>> This way we don't need to pass address to QEMU.
>> That sounds better as it saves us from allocation of IO port
>> and QEMU don't need to write into guest memory, the only question is
>> if PCI_Config opregion would work with driver-less PCI device.
>
> Or PCI BAR for that reason. I don't know for sure.
>
>>
>> And it's still pretty much not test-able since it would require
>> fully running OSPM to execute AML side.
>
> AML is not testable, but that's nothing new.
> You can test reading from PCI.
>
>>>
>>>> As for working PCI_Config OpRegion without driver, I haven't tried,
>>>> but I wouldn't be surprised if it doesn't, taking in account that
>>>> MS introduced _DSM doesn't.
>>>>
>>>>>
>>>>>
>>>>>>>>     Just compare with a graphics card design, where on device memory
>>>>>>>>     is mapped directly at some GPA not wasting RAM that guest could
>>>>>>>>     use for other tasks.
>>>>>>>
>>>>>>> This might have been true 20 years ago.  Most modern cards do DMA.
>>>>>>
>>>>>> Modern cards, with it's own RAM, map its VRAM in address space directly
>>>>>> and allow users use it (GEM API). So they do not waste conventional RAM.
>>>>>> For example NVIDIA VRAM is mapped as PCI BARs the same way like in this
>>>>>> series (even PCI class id is the same)
>>>>>
>>>>> Don't know enough about graphics really, I'm not sure how these are
>>>>> relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
>>>>> mostly use guest RAM, not on card RAM.
>>>>>
>>>>>>>>     VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
>>>>>>>>     instead of consuming guest's RAM they should be mapped at
>>>>>>>>     some GPA and their memory accessed directly.
>>>>>>>
>>>>>>> VMGENID is tied to a spec that rather arbitrarily asks for a fixed
>>>>>>> address. This breaks the straight-forward approach of using a
>>>>>>> rebalanceable PCI BAR.
>>>>>>
>>>>>> For PCI rebalance to work on Windows, one has to provide working PCI driver
>>>>>> otherwise OS will ignore it when rebalancing happens and
>>>>>> might map something else over ignored BAR.
>>>>>
>>>>> Does it disable the BAR then? Or just move it elsewhere?
>>>> it doesn't, it just blindly ignores BARs existence and maps BAR of
>>>> another device with driver over it.
>>>
>>> Interesting. On classical PCI this is a forbidden configuration.
>>> Maybe we do something that confuses windows?
>>> Could you tell me how to reproduce this behaviour?
>> #cat > t << EOF
>> pci_update_mappings_del
>> pci_update_mappings_add
>> EOF
>>
>> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
>>   -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
>>   -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
>>   -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
>>
>> wait till OS boots, note BARs programmed for ivshmem
>>   in my case it was
>>     01:01.0 0,0xfe800000+0x100
>> then execute script and watch pci_update_mappings* trace events
>>
>> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
>>
>> hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
>> Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
>> and then programs new BARs, where:
>>    pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
>> creates overlapping BAR with ivshmem
>
>
> Thanks!
> We need to figure this out because currently this does not
> work properly (or maybe it works, but merely by chance).
> Me and Marcel will play with this.
>

I checked and indeed we have 2 separate problems:

1. ivshmem is declared as a PCI RAM controller and Windows *does* have a driver
    for it, however it is not remapped on re-balancing.
    You can see in Device Manager two working devices with the same MMIO region - strange!
    This may be because PCI RAM controllers can't be re-mapped? Even then, the region should not be overridden.
    Maybe we need to add a clue for the OS in ACPI regarding this range?

2. PCI devices with no driver installed are not re-mapped. This can be OK
    from the Windows point of view, because the Resources window does not show the MMIO range
    for this device.

    If the other (re-mapped) device works, it is pure luck. Both Memory Regions occupy the same range
    and have the same priority.

We need to think about how to solve this.
One way would be to defer the BAR activation to the guest OS, but I am not sure of the consequences.
And this does not solve the ivshmem problem.

Thanks,
Marcel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-16 10:05                         ` Marcel Apfelbaum
@ 2016-02-16 12:17                           ` Igor Mammedov
  2016-02-16 12:36                             ` Marcel Apfelbaum
  0 siblings, 1 reply; 59+ messages in thread
From: Igor Mammedov @ 2016-02-16 12:17 UTC (permalink / raw)
  To: Marcel Apfelbaum
  Cc: Xiao Guangrong, ehabkost, Michael S. Tsirkin, ghammer,
	Marcel Apfelbaum, qemu-devel, lcapitulino, lersek

On Tue, 16 Feb 2016 12:05:33 +0200
Marcel Apfelbaum <marcel@redhat.com> wrote:

> On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:
> > On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:  
> >> On Tue, 9 Feb 2016 14:17:44 +0200
> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>  
> >>> On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:  
> >>>>> So the linker interface solves this rather neatly:
> >>>>> bios allocates memory, bios passes memory map to guest.
> >>>>> Served us well for several years without need for extensions,
> >>>>> and it does solve the VM GEN ID problem, even though
> >>>>> 1. it was never designed for huge areas like nvdimm seems to want to use
> >>>>> 2. we might want to add a new 64 bit flag to avoid touching low memory  
> >>>> linker interface is fine for some readonly data, like ACPI tables
> >>>> especially fixed tables not so for AML ones is one wants to patch it.
> >>>>
> >>>> However now when you want to use it for other purposes you start
> >>>> adding extensions and other guest->QEMU channels to communicate
> >>>> patching info back.
> >>>> It steals guest's memory which is also not nice and doesn't scale well.  
> >>>
> >>> This is an argument I don't get. memory is memory. call it guest memory
> >>> or RAM backed PCI BAR - same thing. MMIO is cheaper of course
> >>> but much slower.
> >>>
> >>> ...  
> >> It however matters for user, he pays for guest with XXX RAM but gets less
> >> than that. And that will be getting worse as a number of such devices
> >> increases.
> >>  
> >>>>> OK fine, but returning PCI BAR address to guest is wrong.
> >>>>> How about reading it from ACPI then? Is it really
> >>>>> broken unless there's *also* a driver?  
> >>>> I don't get question, MS Spec requires address (ADDR method),
> >>>> and it's read by ACPI (AML).  
> >>>
> >>> You were unhappy about DMA into guest memory.
> >>> As a replacement for DMA, we could have AML read from
> >>> e.g. PCI and write into RAM.
> >>> This way we don't need to pass address to QEMU.  
> >> That sounds better as it saves us from allocation of IO port
> >> and QEMU don't need to write into guest memory, the only question is
> >> if PCI_Config opregion would work with driver-less PCI device.  
> >
> > Or PCI BAR for that reason. I don't know for sure.
> >  
> >>
> >> And it's still pretty much not test-able since it would require
> >> fully running OSPM to execute AML side.  
> >
> > AML is not testable, but that's nothing new.
> > You can test reading from PCI.
> >  
> >>>  
> >>>> As for working PCI_Config OpRegion without driver, I haven't tried,
> >>>> but I wouldn't be surprised if it doesn't, taking in account that
> >>>> MS introduced _DSM doesn't.
> >>>>  
> >>>>>
> >>>>>  
> >>>>>>>>     Just compare with a graphics card design, where on device memory
> >>>>>>>>     is mapped directly at some GPA not wasting RAM that guest could
> >>>>>>>>     use for other tasks.  
> >>>>>>>
> >>>>>>> This might have been true 20 years ago.  Most modern cards do DMA.  
> >>>>>>
> >>>>>> Modern cards, with it's own RAM, map its VRAM in address space directly
> >>>>>> and allow users use it (GEM API). So they do not waste conventional RAM.
> >>>>>> For example NVIDIA VRAM is mapped as PCI BARs the same way like in this
> >>>>>> series (even PCI class id is the same)  
> >>>>>
> >>>>> Don't know enough about graphics really, I'm not sure how these are
> >>>>> relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> >>>>> mostly use guest RAM, not on card RAM.
> >>>>>  
> >>>>>>>>     VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> >>>>>>>>     instead of consuming guest's RAM they should be mapped at
> >>>>>>>>     some GPA and their memory accessed directly.  
> >>>>>>>
> >>>>>>> VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> >>>>>>> address. This breaks the straight-forward approach of using a
> >>>>>>> rebalanceable PCI BAR.  
> >>>>>>
> >>>>>> For PCI rebalance to work on Windows, one has to provide working PCI driver
> >>>>>> otherwise OS will ignore it when rebalancing happens and
> >>>>>> might map something else over ignored BAR.  
> >>>>>
> >>>>> Does it disable the BAR then? Or just move it elsewhere?  
> >>>> it doesn't, it just blindly ignores BARs existence and maps BAR of
> >>>> another device with driver over it.  
> >>>
> >>> Interesting. On classical PCI this is a forbidden configuration.
> >>> Maybe we do something that confuses windows?
> >>> Could you tell me how to reproduce this behaviour?  
> >> #cat > t << EOF
> >> pci_update_mappings_del
> >> pci_update_mappings_add
> >> EOF
> >>
> >> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> >>   -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> >>   -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> >>   -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> >>
> >> wait till OS boots, note BARs programmed for ivshmem
> >>   in my case it was
> >>     01:01.0 0,0xfe800000+0x100
> >> then execute script and watch pci_update_mappings* trace events
> >>
> >> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> >>
> >> hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
> >> Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
> >> and then programs new BARs, where:
> >>    pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> >> creates overlapping BAR with ivshmem  
> >
> >
> > Thanks!
> > We need to figure this out because currently this does not
> > work properly (or maybe it works, but merely by chance).
> > Me and Marcel will play with this.
> >  
> 
> I checked and indeed we have 2 separate problems:
> 
> 1. ivshmem is declared as PCI RAM controller and Windows *does* have the drivers
>     for it, however it is not remapped on re-balancing.
Does it really have a driver, i.e. an ivshmem-specific one?
It should have its own driver; otherwise userspace
won't be able to access or work with it, and it would be pointless
to add such a device to the machine.

>     You can see on Device Manage 2 working devices with the same MMIO region - strange!
>     This may be because PCI RAM controllers can't be re-mapped? Even then, it should not be overridden.
>     Maybe we need to add a clue to the OS in ACPI regarding this range?
> 
> 2. PCI devices with no driver installed are not re-mapped. This can be OK
>     from the Windows point of view because Resources Window does not show the MMIO range
>     for this device.
> 
>     If the other (re-mapped) device is working, is pure luck. Both Memory Regions occupy the same range
>     and have the same priority.
> 
> We need to think about how to solve this.
> One way would be to defer the BAR activation to the guest OS, but I am not sure of the consequences.
deferring won't solve the problem, as rebalancing could happen later
and make BARs overlap.
I've noticed that at startup Windows unmaps and then maps BARs
at the same addresses where the BIOS had put them before.

> And this does not solve the ivshmem problem.
So far the only way to avoid overlapping BARs due to Windows
doing rebalancing for driver-less devices is to pin such
BARs statically with _CRS in the ACPI table, but as Michael said
that fragments the PCI address space.
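For what it's worth, the overlap shown in the pci_update_mappings* trace quoted above can be spotted mechanically rather than by eye. Below is a minimal sketch; the helper name and the simplified line format it parses are my assumptions (it only handles lines shaped like the two trace samples in this thread), not part of QEMU:

```python
import re

# Parse "pci_update_mappings_add" trace lines of the form
#   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
# and report pairs of BARs whose [base, base+size) ranges intersect.
LINE_RE = re.compile(
    r'pci_update_mappings_add .* (\S+) (\d+),0x([0-9a-f]+)\+0x([0-9a-f]+)')

def find_overlaps(trace_lines):
    bars = []  # (bdf, bar_index, base, size)
    for line in trace_lines:
        m = LINE_RE.search(line)
        if m:
            bdf = m.group(1)
            bar = int(m.group(2))
            base = int(m.group(3), 16)
            size = int(m.group(4), 16)
            bars.append((bdf, bar, base, size))
    overlaps = []
    for i, (d1, b1, a1, s1) in enumerate(bars):
        for d2, b2, a2, s2 in bars[i + 1:]:
            # two half-open intervals intersect iff each starts
            # before the other ends
            if a1 < a2 + s2 and a2 < a1 + s1:
                overlaps.append(((d1, b1), (d2, b2)))
    return overlaps
```

Feeding it the ivshmem BAR noted after boot and the e1000 BAR programmed after rebalancing flags exactly the collision described above.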

> 
> Thanks,
> Marcel
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-16 12:17                           ` Igor Mammedov
@ 2016-02-16 12:36                             ` Marcel Apfelbaum
  2016-02-16 13:51                               ` Igor Mammedov
  2016-02-16 15:10                               ` Michael S. Tsirkin
  0 siblings, 2 replies; 59+ messages in thread
From: Marcel Apfelbaum @ 2016-02-16 12:36 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Xiao Guangrong, ehabkost, Michael S. Tsirkin, ghammer,
	Marcel Apfelbaum, qemu-devel, lcapitulino, lersek

On 02/16/2016 02:17 PM, Igor Mammedov wrote:
> On Tue, 16 Feb 2016 12:05:33 +0200
> Marcel Apfelbaum <marcel@redhat.com> wrote:
>
>> On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:
>>> On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:
>>>> On Tue, 9 Feb 2016 14:17:44 +0200
>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>>
>>>>> On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:
>>>>>>> So the linker interface solves this rather neatly:
>>>>>>> bios allocates memory, bios passes memory map to guest.
>>>>>>> Served us well for several years without need for extensions,
>>>>>>> and it does solve the VM GEN ID problem, even though
>>>>>>> 1. it was never designed for huge areas like nvdimm seems to want to use
>>>>>>> 2. we might want to add a new 64 bit flag to avoid touching low memory
>>>>>> linker interface is fine for some readonly data, like ACPI tables
>>>>>> especially fixed tables not so for AML ones is one wants to patch it.
>>>>>>
>>>>>> However now when you want to use it for other purposes you start
>>>>>> adding extensions and other guest->QEMU channels to communicate
>>>>>> patching info back.
>>>>>> It steals guest's memory which is also not nice and doesn't scale well.
>>>>>
>>>>> This is an argument I don't get. memory is memory. call it guest memory
>>>>> or RAM backed PCI BAR - same thing. MMIO is cheaper of course
>>>>> but much slower.
>>>>>
>>>>> ...
>>>> It however matters for user, he pays for guest with XXX RAM but gets less
>>>> than that. And that will be getting worse as a number of such devices
>>>> increases.
>>>>
>>>>>>> OK fine, but returning PCI BAR address to guest is wrong.
>>>>>>> How about reading it from ACPI then? Is it really
>>>>>>> broken unless there's *also* a driver?
>>>>>> I don't get question, MS Spec requires address (ADDR method),
>>>>>> and it's read by ACPI (AML).
>>>>>
>>>>> You were unhappy about DMA into guest memory.
>>>>> As a replacement for DMA, we could have AML read from
>>>>> e.g. PCI and write into RAM.
>>>>> This way we don't need to pass address to QEMU.
>>>> That sounds better as it saves us from allocation of IO port
>>>> and QEMU don't need to write into guest memory, the only question is
>>>> if PCI_Config opregion would work with driver-less PCI device.
>>>
>>> Or PCI BAR for that reason. I don't know for sure.
>>>
>>>>
>>>> And it's still pretty much not test-able since it would require
>>>> fully running OSPM to execute AML side.
>>>
>>> AML is not testable, but that's nothing new.
>>> You can test reading from PCI.
>>>
>>>>>
>>>>>> As for working PCI_Config OpRegion without driver, I haven't tried,
>>>>>> but I wouldn't be surprised if it doesn't, taking in account that
>>>>>> MS introduced _DSM doesn't.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>>      Just compare with a graphics card design, where on device memory
>>>>>>>>>>      is mapped directly at some GPA not wasting RAM that guest could
>>>>>>>>>>      use for other tasks.
>>>>>>>>>
>>>>>>>>> This might have been true 20 years ago.  Most modern cards do DMA.
>>>>>>>>
>>>>>>>> Modern cards, with it's own RAM, map its VRAM in address space directly
>>>>>>>> and allow users use it (GEM API). So they do not waste conventional RAM.
>>>>>>>> For example NVIDIA VRAM is mapped as PCI BARs the same way like in this
>>>>>>>> series (even PCI class id is the same)
>>>>>>>
>>>>>>> Don't know enough about graphics really, I'm not sure how these are
>>>>>>> relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
>>>>>>> mostly use guest RAM, not on card RAM.
>>>>>>>
>>>>>>>>>>      VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
>>>>>>>>>>      instead of consuming guest's RAM they should be mapped at
>>>>>>>>>>      some GPA and their memory accessed directly.
>>>>>>>>>
>>>>>>>>> VMGENID is tied to a spec that rather arbitrarily asks for a fixed
>>>>>>>>> address. This breaks the straight-forward approach of using a
>>>>>>>>> rebalanceable PCI BAR.
>>>>>>>>
>>>>>>>> For PCI rebalance to work on Windows, one has to provide working PCI driver
>>>>>>>> otherwise OS will ignore it when rebalancing happens and
>>>>>>>> might map something else over ignored BAR.
>>>>>>>
>>>>>>> Does it disable the BAR then? Or just move it elsewhere?
>>>>>> it doesn't, it just blindly ignores BARs existence and maps BAR of
>>>>>> another device with driver over it.
>>>>>
>>>>> Interesting. On classical PCI this is a forbidden configuration.
>>>>> Maybe we do something that confuses windows?
>>>>> Could you tell me how to reproduce this behaviour?
>>>> #cat > t << EOF
>>>> pci_update_mappings_del
>>>> pci_update_mappings_add
>>>> EOF
>>>>
>>>> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
>>>>    -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
>>>>    -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
>>>>    -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
>>>>
>>>> wait till OS boots, note BARs programmed for ivshmem
>>>>    in my case it was
>>>>      01:01.0 0,0xfe800000+0x100
>>>> then execute script and watch pci_update_mappings* trace events
>>>>
>>>> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
>>>>
>>>> hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
>>>> Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
>>>> and then programs new BARs, where:
>>>>     pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
>>>> creates overlapping BAR with ivshmem
>>>
>>>
>>> Thanks!
>>> We need to figure this out because currently this does not
>>> work properly (or maybe it works, but merely by chance).
>>> Me and Marcel will play with this.
>>>
>>
>> I checked and indeed we have 2 separate problems:
>>
>> 1. ivshmem is declared as PCI RAM controller and Windows *does* have the drivers
>>      for it, however it is not remapped on re-balancing.
> Does it really have a driver, i.e ivshmem specific one?
> It should have its own driver or otherwise userspace
> won't be able to access/work with it and it would be pointless
> to add such device to machine.

No, it does not.

>
>>      You can see on Device Manage 2 working devices with the same MMIO region - strange!
>>      This may be because PCI RAM controllers can't be re-mapped? Even then, it should not be overridden.
>>      Maybe we need to add a clue to the OS in ACPI regarding this range?
>>
>> 2. PCI devices with no driver installed are not re-mapped. This can be OK
>>      from the Windows point of view because Resources Window does not show the MMIO range
>>      for this device.
>>
>>      If the other (re-mapped) device is working, is pure luck. Both Memory Regions occupy the same range
>>      and have the same priority.
>>
>> We need to think about how to solve this.
>> One way would be to defer the BAR activation to the guest OS, but I am not sure of the consequences.
> deferring won't solve problem as rebalancing could happen later
> and make BARs overlap.

Why not? If we do not activate the BAR in firmware and Windows does not have a driver
for it, it will not activate it at all, right?
Why would Windows activate the device BAR if it can't use it? At least that is what I hope.
Any other ideas would be appreciated.


> I've noticed that at startup Windows unmaps and then maps BARs
> at the same addresses where BIOS've put them before.

Including devices without a working driver?


Thanks,
Marcel

>
>> And this does not solve the ivshmem problem.
> So far the only way to avoid overlapping BARs due to Windows
> doing rebalancing for driver-less devices is to pin such
> BARs statically with _CRS in ACPI table but as Michael said
> it fragments PCI address-space.
>
>>
>> Thanks,
>> Marcel
>>
>>
>>
>>
>>
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-16 12:36                             ` Marcel Apfelbaum
@ 2016-02-16 13:51                               ` Igor Mammedov
  2016-02-16 14:53                                 ` Michael S. Tsirkin
  2016-02-16 15:10                               ` Michael S. Tsirkin
  1 sibling, 1 reply; 59+ messages in thread
From: Igor Mammedov @ 2016-02-16 13:51 UTC (permalink / raw)
  To: Marcel Apfelbaum
  Cc: Xiao Guangrong, ehabkost, Marcel Apfelbaum, ghammer,
	Michael S. Tsirkin, qemu-devel, lcapitulino, lersek

On Tue, 16 Feb 2016 14:36:49 +0200
Marcel Apfelbaum <marcel@redhat.com> wrote:

> On 02/16/2016 02:17 PM, Igor Mammedov wrote:
> > On Tue, 16 Feb 2016 12:05:33 +0200
> > Marcel Apfelbaum <marcel@redhat.com> wrote:
> >  
> >> On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:  
> >>> On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:  
> >>>> On Tue, 9 Feb 2016 14:17:44 +0200
> >>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>>>  
> >>>>> On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:  
> >>>>>>> So the linker interface solves this rather neatly:
> >>>>>>> bios allocates memory, bios passes memory map to guest.
> >>>>>>> Served us well for several years without need for extensions,
> >>>>>>> and it does solve the VM GEN ID problem, even though
> >>>>>>> 1. it was never designed for huge areas like nvdimm seems to want to use
> >>>>>>> 2. we might want to add a new 64 bit flag to avoid touching low memory  
> >>>>>> linker interface is fine for some readonly data, like ACPI tables
> >>>>>> especially fixed tables not so for AML ones is one wants to patch it.
> >>>>>>
> >>>>>> However now when you want to use it for other purposes you start
> >>>>>> adding extensions and other guest->QEMU channels to communicate
> >>>>>> patching info back.
> >>>>>> It steals guest's memory which is also not nice and doesn't scale well.  
> >>>>>
> >>>>> This is an argument I don't get. memory is memory. call it guest memory
> >>>>> or RAM backed PCI BAR - same thing. MMIO is cheaper of course
> >>>>> but much slower.
> >>>>>
> >>>>> ...  
> >>>> It however matters for user, he pays for guest with XXX RAM but gets less
> >>>> than that. And that will be getting worse as a number of such devices
> >>>> increases.
> >>>>  
> >>>>>>> OK fine, but returning PCI BAR address to guest is wrong.
> >>>>>>> How about reading it from ACPI then? Is it really
> >>>>>>> broken unless there's *also* a driver?  
> >>>>>> I don't get question, MS Spec requires address (ADDR method),
> >>>>>> and it's read by ACPI (AML).  
> >>>>>
> >>>>> You were unhappy about DMA into guest memory.
> >>>>> As a replacement for DMA, we could have AML read from
> >>>>> e.g. PCI and write into RAM.
> >>>>> This way we don't need to pass address to QEMU.  
> >>>> That sounds better as it saves us from allocation of IO port
> >>>> and QEMU don't need to write into guest memory, the only question is
> >>>> if PCI_Config opregion would work with driver-less PCI device.  
> >>>
> >>> Or PCI BAR for that reason. I don't know for sure.
> >>>  
> >>>>
> >>>> And it's still pretty much not test-able since it would require
> >>>> fully running OSPM to execute AML side.  
> >>>
> >>> AML is not testable, but that's nothing new.
> >>> You can test reading from PCI.
> >>>  
> >>>>>  
> >>>>>> As for working PCI_Config OpRegion without driver, I haven't tried,
> >>>>>> but I wouldn't be surprised if it doesn't, taking in account that
> >>>>>> MS introduced _DSM doesn't.
> >>>>>>  
> >>>>>>>
> >>>>>>>  
> >>>>>>>>>>      Just compare with a graphics card design, where on device memory
> >>>>>>>>>>      is mapped directly at some GPA not wasting RAM that guest could
> >>>>>>>>>>      use for other tasks.  
> >>>>>>>>>
> >>>>>>>>> This might have been true 20 years ago.  Most modern cards do DMA.  
> >>>>>>>>
> >>>>>>>> Modern cards, with it's own RAM, map its VRAM in address space directly
> >>>>>>>> and allow users use it (GEM API). So they do not waste conventional RAM.
> >>>>>>>> For example NVIDIA VRAM is mapped as PCI BARs the same way like in this
> >>>>>>>> series (even PCI class id is the same)  
> >>>>>>>
> >>>>>>> Don't know enough about graphics really, I'm not sure how these are
> >>>>>>> relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> >>>>>>> mostly use guest RAM, not on card RAM.
> >>>>>>>  
> >>>>>>>>>>      VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> >>>>>>>>>>      instead of consuming guest's RAM they should be mapped at
> >>>>>>>>>>      some GPA and their memory accessed directly.  
> >>>>>>>>>
> >>>>>>>>> VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> >>>>>>>>> address. This breaks the straight-forward approach of using a
> >>>>>>>>> rebalanceable PCI BAR.  
> >>>>>>>>
> >>>>>>>> For PCI rebalance to work on Windows, one has to provide working PCI driver
> >>>>>>>> otherwise OS will ignore it when rebalancing happens and
> >>>>>>>> might map something else over ignored BAR.  
> >>>>>>>
> >>>>>>> Does it disable the BAR then? Or just move it elsewhere?  
> >>>>>> it doesn't, it just blindly ignores BARs existence and maps BAR of
> >>>>>> another device with driver over it.  
> >>>>>
> >>>>> Interesting. On classical PCI this is a forbidden configuration.
> >>>>> Maybe we do something that confuses windows?
> >>>>> Could you tell me how to reproduce this behaviour?  
> >>>> #cat > t << EOF
> >>>> pci_update_mappings_del
> >>>> pci_update_mappings_add
> >>>> EOF
> >>>>
> >>>> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> >>>>    -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> >>>>    -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> >>>>    -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> >>>>
> >>>> wait till OS boots, note BARs programmed for ivshmem
> >>>>    in my case it was
> >>>>      01:01.0 0,0xfe800000+0x100
> >>>> then execute script and watch pci_update_mappings* trace events
> >>>>
> >>>> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> >>>>
> >>>> hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
> >>>> Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
> >>>> and then programs new BARs, where:
> >>>>     pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> >>>> creates overlapping BAR with ivshmem  
> >>>
> >>>
> >>> Thanks!
> >>> We need to figure this out because currently this does not
> >>> work properly (or maybe it works, but merely by chance).
> >>> Me and Marcel will play with this.
> >>>  
> >>
> >> I checked and indeed we have 2 separate problems:
> >>
> >> 1. ivshmem is declared as PCI RAM controller and Windows *does* have the drivers
> >>      for it, however it is not remapped on re-balancing.  
> > Does it really have a driver, i.e ivshmem specific one?
> > It should have its own driver or otherwise userspace
> > won't be able to access/work with it and it would be pointless
> > to add such device to machine.  
> 
> No, it does not.
so it's a "PCI RAM controller", which is marked as NODRV in the INF file;
NODRV is used as a stub to stop Windows from asking for a driver, on the
assumption that HW owns/manages the device.
And when rebalancing happens, Windows completely ignores NODRV
BARs, which causes overlaps with devices that do have PCI drivers.
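The failure mode can be modeled in a few lines. This is a toy model of my own (a deliberate simplification, not Windows' actual rebalancing algorithm): it repacks only the BARs of devices that have a driver, starting from the window base, and never looks at NODRV devices at all, which reproduces the shadowing of the driver-less BAR:

```python
def rebalance(devices, window_base):
    """Reassign BARs only for devices with a driver, packing them from
    window_base; driver-less (NODRV) devices keep their old base and
    are never considered during reallocation."""
    next_base = window_base
    for dev in devices:
        if dev["has_driver"]:
            dev["base"] = next_base
            next_base += dev["size"]
    return devices

def overlaps(a, b):
    # half-open intervals [base, base+size) intersect
    return (a["base"] < b["base"] + b["size"]
            and b["base"] < a["base"] + a["size"])

# ivshmem (NODRV) sits at 0xfe800000; hotplugging forces a rebalance and
# an e1000 (which has a driver) gets packed from the same window base.
devs = [
    {"name": "ivshmem", "size": 0x2000,  "has_driver": False,
     "base": 0xfe800000},
    {"name": "e1000",   "size": 0x20000, "has_driver": True,
     "base": 0xfe900000},
]
rebalance(devs, 0xfe800000)
```

After the call, the e1000 BAR lands on 0xfe800000 and shadows the untouched ivshmem BAR, matching the trace from earlier in the thread.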

> 
> >  
> >>      You can see on Device Manage 2 working devices with the same MMIO region - strange!
> >>      This may be because PCI RAM controllers can't be re-mapped? Even then, it should not be overridden.
> >>      Maybe we need to add a clue to the OS in ACPI regarding this range?
> >>
> >> 2. PCI devices with no driver installed are not re-mapped. This can be OK
> >>      from the Windows point of view because Resources Window does not show the MMIO range
> >>      for this device.
> >>
> >>      If the other (re-mapped) device is working, is pure luck. Both Memory Regions occupy the same range
> >>      and have the same priority.
> >>
> >> We need to think about how to solve this.
> >> One way would be to defer the BAR activation to the guest OS, but I am not sure of the consequences.  
> > deferring won't solve problem as rebalancing could happen later
> > and make BARs overlap.  
> 
> Why not? If we do not activate the BAR in firmware and Windows does not have a driver
> for it, will not activate it at all, right?
> Why would Windows activate the device BAR if it can't use it? At least this is what I hope.
> Any other idea would be appreciated.
> 
> 
> > I've noticed that at startup Windows unmaps and then maps BARs
> > at the same addresses where BIOS've put them before.  
> 
> Including devices without a working driver?
I've just tried; it does so for ivshmem.

> 
> 
> Thanks,
> Marcel
> 
> >  
> >> And this does not solve the ivshmem problem.  
> > So far the only way to avoid overlapping BARs due to Windows
> > doing rebalancing for driver-less devices is to pin such
> > BARs statically with _CRS in ACPI table but as Michael said
> > it fragments PCI address-space.
> >  
> >>
> >> Thanks,
> >> Marcel
> >>
> >>
> >>
> >>
> >>  
> >  
> 
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-16 13:51                               ` Igor Mammedov
@ 2016-02-16 14:53                                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-02-16 14:53 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Xiao Guangrong, ehabkost, Marcel Apfelbaum, ghammer, qemu-devel,
	lcapitulino, Marcel Apfelbaum, lersek

On Tue, Feb 16, 2016 at 02:51:25PM +0100, Igor Mammedov wrote:
> On Tue, 16 Feb 2016 14:36:49 +0200
> Marcel Apfelbaum <marcel@redhat.com> wrote:
> 
> > On 02/16/2016 02:17 PM, Igor Mammedov wrote:
> > > On Tue, 16 Feb 2016 12:05:33 +0200
> > > Marcel Apfelbaum <marcel@redhat.com> wrote:
> > >  
> > >> On 02/11/2016 06:30 PM, Michael S. Tsirkin wrote:  
> > >>> On Thu, Feb 11, 2016 at 04:16:05PM +0100, Igor Mammedov wrote:  
> > >>>> On Tue, 9 Feb 2016 14:17:44 +0200
> > >>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >>>>  
> > >>>>> On Tue, Feb 09, 2016 at 11:46:08AM +0100, Igor Mammedov wrote:  
> > >>>>>>> So the linker interface solves this rather neatly:
> > >>>>>>> bios allocates memory, bios passes memory map to guest.
> > >>>>>>> Served us well for several years without need for extensions,
> > >>>>>>> and it does solve the VM GEN ID problem, even though
> > >>>>>>> 1. it was never designed for huge areas like nvdimm seems to want to use
> > >>>>>>> 2. we might want to add a new 64 bit flag to avoid touching low memory  
> > >>>>>> linker interface is fine for some readonly data, like ACPI tables
> > >>>>>> especially fixed tables not so for AML ones is one wants to patch it.
> > >>>>>>
> > >>>>>> However now when you want to use it for other purposes you start
> > >>>>>> adding extensions and other guest->QEMU channels to communicate
> > >>>>>> patching info back.
> > >>>>>> It steals guest's memory which is also not nice and doesn't scale well.  
> > >>>>>
> > >>>>> This is an argument I don't get. memory is memory. call it guest memory
> > >>>>> or RAM backed PCI BAR - same thing. MMIO is cheaper of course
> > >>>>> but much slower.
> > >>>>>
> > >>>>> ...  
> > >>>> It however matters for user, he pays for guest with XXX RAM but gets less
> > >>>> than that. And that will be getting worse as a number of such devices
> > >>>> increases.
> > >>>>  
> > >>>>>>> OK fine, but returning PCI BAR address to guest is wrong.
> > >>>>>>> How about reading it from ACPI then? Is it really
> > >>>>>>> broken unless there's *also* a driver?  
> > >>>>>> I don't get question, MS Spec requires address (ADDR method),
> > >>>>>> and it's read by ACPI (AML).  
> > >>>>>
> > >>>>> You were unhappy about DMA into guest memory.
> > >>>>> As a replacement for DMA, we could have AML read from
> > >>>>> e.g. PCI and write into RAM.
> > >>>>> This way we don't need to pass address to QEMU.  
> > >>>> That sounds better as it saves us from allocation of IO port
> > >>>> and QEMU don't need to write into guest memory, the only question is
> > >>>> if PCI_Config opregion would work with driver-less PCI device.  
> > >>>
> > >>> Or PCI BAR for that reason. I don't know for sure.
> > >>>  
> > >>>>
> > >>>> And it's still pretty much not test-able since it would require
> > >>>> fully running OSPM to execute AML side.  
> > >>>
> > >>> AML is not testable, but that's nothing new.
> > >>> You can test reading from PCI.
> > >>>  
> > >>>>>  
> > >>>>>> As for working PCI_Config OpRegion without driver, I haven't tried,
> > >>>>>> but I wouldn't be surprised if it doesn't, taking in account that
> > >>>>>> MS introduced _DSM doesn't.
> > >>>>>>  
> > >>>>>>>
> > >>>>>>>  
> > >>>>>>>>>>      Just compare with a graphics card design, where on device memory
> > >>>>>>>>>>      is mapped directly at some GPA not wasting RAM that guest could
> > >>>>>>>>>>      use for other tasks.  
> > >>>>>>>>>
> > >>>>>>>>> This might have been true 20 years ago.  Most modern cards do DMA.  
> > >>>>>>>>
> > >>>>>>>> Modern cards, with it's own RAM, map its VRAM in address space directly
> > >>>>>>>> and allow users use it (GEM API). So they do not waste conventional RAM.
> > >>>>>>>> For example NVIDIA VRAM is mapped as PCI BARs the same way like in this
> > >>>>>>>> series (even PCI class id is the same)  
> > >>>>>>>
> > >>>>>>> Don't know enough about graphics really, I'm not sure how these are
> > >>>>>>> relevant.  NICs and disks certainly do DMA.  And virtio gl seems to
> > >>>>>>> mostly use guest RAM, not on card RAM.
> > >>>>>>>  
> > >>>>>>>>>>      VMGENID and NVDIMM use-cases look to me exactly the same, i.e.
> > >>>>>>>>>>      instead of consuming guest's RAM they should be mapped at
> > >>>>>>>>>>      some GPA and their memory accessed directly.  
> > >>>>>>>>>
> > >>>>>>>>> VMGENID is tied to a spec that rather arbitrarily asks for a fixed
> > >>>>>>>>> address. This breaks the straight-forward approach of using a
> > >>>>>>>>> rebalanceable PCI BAR.  
> > >>>>>>>>
> > >>>>>>>> For PCI rebalance to work on Windows, one has to provide working PCI driver
> > >>>>>>>> otherwise OS will ignore it when rebalancing happens and
> > >>>>>>>> might map something else over ignored BAR.  
> > >>>>>>>
> > >>>>>>> Does it disable the BAR then? Or just move it elsewhere?  
> > >>>>>> it doesn't, it just blindly ignores BARs existence and maps BAR of
> > >>>>>> another device with driver over it.  
> > >>>>>
> > >>>>> Interesting. On classical PCI this is a forbidden configuration.
> > >>>>> Maybe we do something that confuses windows?
> > >>>>> Could you tell me how to reproduce this behaviour?  
> > >>>> #cat > t << EOF
> > >>>> pci_update_mappings_del
> > >>>> pci_update_mappings_add
> > >>>> EOF
> > >>>>
> > >>>> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> > >>>>    -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> > >>>>    -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> > >>>>    -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> > >>>>
> > >>>> wait till OS boots, note BARs programmed for ivshmem
> > >>>>    in my case it was
> > >>>>      01:01.0 0,0xfe800000+0x100
> > >>>> then execute script and watch pci_update_mappings* trace events
> > >>>>
> > >>>> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> > >>>>
> > >>>> hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
> > >>>> Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
> > >>>> and then programs new BARs, where:
> > >>>>     pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> > >>>> creates overlapping BAR with ivshmem  
> > >>>
> > >>>
> > >>> Thanks!
> > >>> We need to figure this out because currently this does not
> > >>> work properly (or maybe it works, but merely by chance).
> > >>> Me and Marcel will play with this.
> > >>>  
> > >>
> > >> I checked and indeed we have 2 separate problems:
> > >>
> > >> 1. ivshmem is declared as PCI RAM controller and Windows *does* have the drivers
> > >>      for it, however it is not remapped on re-balancing.  
> > > Does it really have a driver, i.e ivshmem specific one?
> > > It should have its own driver or otherwise userspace
> > > won't be able to access/work with it and it would be pointless
> > > to add such device to machine.  
> > 
> > No, it does not.
> so it's "PCI RAM controller", which is marked as NODRV in INF file,
> NODRV they use as a stub to prevent Windows asking for driver assuming
> that HW owns/manages device.
> And when rebalancing happens Windows completely ignores NODRV
> BARs which causes overlapping with devices that have PCI drivers.

But that can't work for classic PCI: if BARs overlap,
behaviour is undefined.
We do something that Windows does not expect, which makes
it create this setup.

Is it something about enabling BARs in firmware? Something else?

> > 
> > >  
> > >>      You can see on Device Manage 2 working devices with the same MMIO region - strange!
> > >>      This may be because PCI RAM controllers can't be re-mapped? Even then, it should not be overridden.
> > >>      Maybe we need to add a clue to the OS in ACPI regarding this range?
> > >>
> > >> 2. PCI devices with no driver installed are not re-mapped. This can be OK
> > >>      from the Windows point of view because Resources Window does not show the MMIO range
> > >>      for this device.
> > >>
> > >>      If the other (re-mapped) device is working, is pure luck. Both Memory Regions occupy the same range
> > >>      and have the same priority.
> > >>
> > >> We need to think about how to solve this.
> > >> One way would be to defer the BAR activation to the guest OS, but I am not sure of the consequences.  
> > > deferring won't solve problem as rebalancing could happen later
> > > and make BARs overlap.  
> > 
> > Why not? If we do not activate the BAR in firmware and Windows does not have a driver
> > for it, will not activate it at all, right?
> > Why would Windows activate the device BAR if it can't use it? At least this is what I hope.
> > Any other idea would be appreciated.
> > 
> > 
> > > I've noticed that at startup Windows unmaps and then maps BARs
> > > at the same addresses where BIOS've put them before.  
> > 
> > Including devices without a working driver?
> I've just tried, it does so for ivshmem.
> 
> > 
> > 
> > Thanks,
> > Marcel
> > 
> > >  
> > >> And this does not solve the ivshmem problem.  
> > > So far the only way to avoid overlapping BARs due to Windows
> > > doing rebalancing for driver-less devices is to pin such
> > > BARs statically with _CRS in ACPI table but as Michael said
> > > it fragments PCI address-space.
> > >  
> > >>
> > >> Thanks,
> > >> Marcel
> > >>
> > >>
> > >>
> > >>
> > >>  
> > >  
> > 
> > 
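The static _CRS pinning Igor mentions can be sketched as an SSDT fragment along the following lines. This is a hypothetical illustration, not the code from the series; the device name, address, and size are made up:

```asl
// Hypothetical SSDT fragment: a motherboard-resources device that
// claims the BAR's MMIO range so the OS treats it as reserved and
// will not relocate it during PCI rebalancing.
Device (VRES) {
    Name (_HID, EisaId ("PNP0C02"))   // generic motherboard resources
    Name (_CRS, ResourceTemplate () {
        // Example address/length only; the real values would have to
        // match where firmware programmed the BAR.
        Memory32Fixed (ReadOnly, 0xFEB00000, 0x1000)
    })
}
```

The trade-off discussed in the thread applies: once pinned this way, the range is carved out of the PCI address space permanently, which is the fragmentation Michael objects to.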


* Re: [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device
  2016-02-16 12:36                             ` Marcel Apfelbaum
  2016-02-16 13:51                               ` Igor Mammedov
@ 2016-02-16 15:10                               ` Michael S. Tsirkin
  1 sibling, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2016-02-16 15:10 UTC (permalink / raw)
  To: Marcel Apfelbaum
  Cc: Xiao Guangrong, ehabkost, Marcel Apfelbaum, ghammer, qemu-devel,
	lcapitulino, Igor Mammedov, lersek

On Tue, Feb 16, 2016 at 02:36:49PM +0200, Marcel Apfelbaum wrote:
> >>2. PCI devices with no driver installed are not re-mapped. This can be OK
> >>     from the Windows point of view because Resources Window does not show the MMIO range
> >>     for this device.
> >>
> >>     If the other (re-mapped) device is working, it is pure luck. Both Memory Regions occupy the same range
> >>     and have the same priority.
> >>
> >>We need to think about how to solve this.
> >>One way would be to defer the BAR activation to the guest OS, but I am not sure of the consequences.
> >deferring won't solve the problem, as rebalancing could happen later
> >and make BARs overlap.
> 
> Why not? If we do not activate the BAR in firmware and Windows does not have a driver
> for it, it will not activate it at all, right?
> Why would Windows activate the device BAR if it can't use it? At least this is what I hope.
> Any other idea would be appreciated.
> 

I wonder whether this is related to setting PnP in CMOS.
See e.g.  http://oss.sgi.com/LDP/HOWTO/Plug-and-Play-HOWTO-4.html
and https://support.microsoft.com/en-us/kb/321779



> >I've noticed that at startup Windows unmaps and then maps BARs
> >at the same addresses where the BIOS put them before.
> 
> Including devices without a working driver?
> 
> 
> Thanks,
> Marcel
> 
> >
> >>And this does not solve the ivshmem problem.
> >So far the only way to avoid overlapping BARs due to Windows
> >doing rebalancing for driver-less devices is to pin such
> >BARs statically with _CRS in the ACPI table, but as Michael said,
> >that fragments the PCI address space.
> >
> >>
> >>Thanks,
> >>Marcel
> >>
> >>
> >>
> >>
> >>
> >


end of thread, other threads:[~2016-02-16 15:10 UTC | newest]

Thread overview: 59+ messages
2016-01-28 10:54 [Qemu-devel] [PATCH v19 0/9] Virtual Machine Generation ID Igor Mammedov
2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 1/9] acpi: extend ACPI interface to provide access to ACPI registers and SCI irq Igor Mammedov
2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 2/9] docs: vm generation id device's description Igor Mammedov
2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 3/9] pc: add a Virtual Machine Generation ID device Igor Mammedov
2016-01-28 11:13   ` Michael S. Tsirkin
2016-01-28 12:03     ` Igor Mammedov
2016-01-28 12:59       ` Michael S. Tsirkin
2016-01-29 11:13         ` Igor Mammedov
2016-01-31 16:22           ` Michael S. Tsirkin
2016-02-02  9:59             ` Igor Mammedov
2016-02-02 11:16               ` Michael S. Tsirkin
2016-02-09 10:46                 ` Igor Mammedov
2016-02-09 12:17                   ` Michael S. Tsirkin
2016-02-11 15:16                     ` Igor Mammedov
2016-02-11 16:30                       ` Michael S. Tsirkin
2016-02-11 17:34                         ` Marcel Apfelbaum
2016-02-12  6:15                           ` Michael S. Tsirkin
2016-02-15 10:30                         ` Igor Mammedov
2016-02-15 11:26                           ` Michael S. Tsirkin
2016-02-15 13:56                             ` Igor Mammedov
2016-02-16 10:05                         ` Marcel Apfelbaum
2016-02-16 12:17                           ` Igor Mammedov
2016-02-16 12:36                             ` Marcel Apfelbaum
2016-02-16 13:51                               ` Igor Mammedov
2016-02-16 14:53                                 ` Michael S. Tsirkin
2016-02-16 15:10                               ` Michael S. Tsirkin
2016-02-10  8:51                   ` Michael S. Tsirkin
2016-02-10  9:28                     ` Michael S. Tsirkin
2016-02-10 10:00                       ` Laszlo Ersek
2016-01-28 13:48     ` Laszlo Ersek
2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 4/9] tests: add a unit test for the vmgenid device Igor Mammedov
2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 5/9] qmp/hmp: add query-vm-generation-id and 'info vm-generation-id' commands Igor Mammedov
2016-02-09 17:31   ` Eric Blake
2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 6/9] qmp/hmp: add set-vm-generation-id commands Igor Mammedov
2016-02-09 17:33   ` Eric Blake
2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 8/9] pc: put PIIX3 in slot 1 explicitly and cleanup functions assignment Igor Mammedov
2016-01-28 10:54 ` [Qemu-devel] [PATCH v19 9/9] pc/q35: by default put vmgenid device as a function of ISA bridge Igor Mammedov
2016-01-28 10:58 ` [Qemu-devel] [PATCH v19 7/9] machine: add properties to compat_props incrementally Igor Mammedov
2016-01-28 14:02   ` Eduardo Habkost
2016-01-28 17:00     ` Igor Mammedov
2016-02-03 17:55       ` [Qemu-devel] qdev & hw/core owner? (was Re: [PATCH v19 7/9] machine: add properties to compat_props incrementally) Eduardo Habkost
2016-02-03 18:46         ` Laszlo Ersek
2016-02-03 19:06         ` Michael S. Tsirkin
2016-02-04 11:31           ` Paolo Bonzini
2016-02-04 11:41             ` Andreas Färber
2016-02-04 11:55               ` Paolo Bonzini
2016-02-04 12:06                 ` Michael S. Tsirkin
2016-02-05  7:49                   ` Markus Armbruster
2016-02-05  7:51                     ` Marcel Apfelbaum
2016-02-11 19:41                       ` Eduardo Habkost
2016-02-12  9:17                         ` Marcel Apfelbaum
2016-02-12 11:22                           ` Andreas Färber
2016-02-12 18:17                             ` Eduardo Habkost
2016-02-12 22:30                               ` Paolo Bonzini
2016-02-12 18:09                           ` Eduardo Habkost
2016-02-05  7:52                 ` Markus Armbruster
2016-02-04 12:03               ` Michael S. Tsirkin
2016-02-04 12:12               ` Marcel Apfelbaum
2016-01-29 12:51   ` [Qemu-devel] [PATCH v19 7/9] machine: add properties to compat_props incrementally Cornelia Huck
