All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] vmgenid: add generation counter
@ 2022-08-03 13:41 bchalios
  2022-08-03 13:41 ` [PATCH 1/2] vmgenid: make device data size configurable bchalios
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: bchalios @ 2022-08-03 13:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: bchalios, ani, mst, imammedo, dwmw, graf, xmarcalx

From: Babis Chalios <bchalios@amazon.es>

VM generation ID exposes a GUID inside the VM which changes every time a
VM restore is happening. Typically, this GUID is used by the guest
kernel to re-seed its internal PRNG. As a result, this value cannot be
exposed in guest user-space as a notification mechanism for VM restore
events.

This patch set extends vmgenid to introduce a 32 bits generation counter
whose purpose is to be used as a VM restore notification mechanism for
the guest user-space.

It is true that such a counter could be implemented entirely by the
guest kernel, but this would rely on the vmgenid ACPI notification to
trigger the counter update, which is inherently racy. Exposing this
through the monitor allows the updated value to be in-place before
resuming the vcpus, so interested user-space code can (atomically)
observe the update without relying on the ACPI notification.

Babis Chalios (2):
  vmgenid: make device data size configurable
  vmgenid: add generation counter

 docs/specs/vmgenid.txt    | 101 ++++++++++++++++++--------
 hw/acpi/vmgenid.c         | 145 +++++++++++++++++++++++++++++++-------
 include/hw/acpi/vmgenid.h |  23 ++++--
 3 files changed, 204 insertions(+), 65 deletions(-)

-- 
2.37.1

Amazon Spain Services sociedad limitada unipersonal, Calle Ramirez de Prado 5, 28045 Madrid. Registro Mercantil de Madrid . Tomo 22458 . Folio 102 . Hoja M-401234 . CIF B84570936



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] vmgenid: make device data size configurable
  2022-08-03 13:41 [PATCH 0/2] vmgenid: add generation counter bchalios
@ 2022-08-03 13:41 ` bchalios
  2022-08-03 13:41 ` [PATCH 2/2] vmgenid: add generation counter bchalios
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: bchalios @ 2022-08-03 13:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: bchalios, ani, mst, imammedo, dwmw, graf, xmarcalx

From: Babis Chalios <bchalios@amazon.es>

When allocating memory for the device data the assumption is we are
dealing with 4K pages. Make this configurable, so that other
architectures can be handled.

Note, than in the original spec this is not a requirement, however, it
is useful for implementing the generation counter (see next commit in
this patchset) in architectures with page sizes other than 4K.

Signed-off-by: Babis Chalios <bchalios@amazon.es>
---
 docs/specs/vmgenid.txt    |  8 +++---
 hw/acpi/vmgenid.c         | 57 ++++++++++++++++++++++++++++++++++++---
 include/hw/acpi/vmgenid.h | 14 +++++-----
 3 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/docs/specs/vmgenid.txt b/docs/specs/vmgenid.txt
index 80ff69f31c..5274b4c895 100644
--- a/docs/specs/vmgenid.txt
+++ b/docs/specs/vmgenid.txt
@@ -225,14 +225,16 @@ following diagram:
 Device Usage:
 -------------
 
-The device has one property, which may be only be set using the command line:
+The device has two properties, which may be only be set using the command line:
 
-  guid - sets the value of the GUID.  A special value "auto" instructs
+  guid - sets the value of the GUID. A special value "auto" instructs
          QEMU to generate a new random GUID.
+  page-size - sets the target machines page size. Currently accepted values
+              are 4096 (default) and 65536.
 
 For example:
 
-  QEMU  -device vmgenid,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
+  QEMU  -device vmgenid,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87",page-size=65536
   QEMU  -device vmgenid,guid=auto
 
 The property may be queried via QMP/HMP:
diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
index 0c9f158ac9..ac2b116b6a 100644
--- a/hw/acpi/vmgenid.c
+++ b/hw/acpi/vmgenid.c
@@ -11,6 +11,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/visitor.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-machine.h"
 #include "qemu/module.h"
@@ -35,7 +36,7 @@ void vmgenid_build_acpi(VmGenIdState *vms, GArray *table_data, GArray *guid,
     /* Fill in the GUID values.  These need to be converted to little-endian
      * first, since that's what the guest expects
      */
-    g_array_set_size(guid, VMGENID_FW_CFG_SIZE - ARRAY_SIZE(guid_le.data));
+    g_array_set_size(guid, vms->page_size - ARRAY_SIZE(guid_le.data));
     guid_le = qemu_uuid_bswap(vms->guid);
     /* The GUID is written at a fixed offset into the fw_cfg file
      * in order to implement the "OVMF SDT Header probe suppressor"
@@ -94,7 +95,8 @@ void vmgenid_build_acpi(VmGenIdState *vms, GArray *table_data, GArray *guid,
     g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
 
     /* Allocate guest memory for the Data fw_cfg blob */
-    bios_linker_loader_alloc(linker, VMGENID_GUID_FW_CFG_FILE, guid, 4096,
+    bios_linker_loader_alloc(linker, VMGENID_GUID_FW_CFG_FILE, guid,
+                             vms->page_size,
                              false /* page boundary, high memory */);
 
     /* Patch address of GUID fw_cfg blob into the ADDR fw_cfg blob
@@ -124,8 +126,7 @@ void vmgenid_build_acpi(VmGenIdState *vms, GArray *table_data, GArray *guid,
 void vmgenid_add_fw_cfg(VmGenIdState *vms, FWCfgState *s, GArray *guid)
 {
     /* Create a read-only fw_cfg file for GUID */
-    fw_cfg_add_file(s, VMGENID_GUID_FW_CFG_FILE, guid->data,
-                    VMGENID_FW_CFG_SIZE);
+    fw_cfg_add_file(s, VMGENID_GUID_FW_CFG_FILE, guid->data, vms->page_size);
     /* Create a read-write fw_cfg file for Address */
     fw_cfg_add_file_callback(s, VMGENID_ADDR_FW_CFG_FILE, NULL, NULL, NULL,
                              vms->vmgenid_addr_le,
@@ -215,8 +216,56 @@ static void vmgenid_realize(DeviceState *dev, Error **errp)
     vmgenid_update_guest(vms);
 }
 
+static void get_page_size(Object *obj, Visitor *v, const char *name,
+                         void *opaque, Error **errp)
+{
+    Property *prop = opaque;
+    uint32_t *page_size = object_field_prop_ptr(obj, prop);
+
+    visit_type_uint32(v, name, page_size, errp);
+}
+
+static void set_page_size(Object *obj, Visitor *v, const char *name,
+                         void *opaque, Error **errp)
+{
+    Property *prop = opaque;
+    uint32_t *page_size = object_field_prop_ptr(obj, prop);
+    uint32_t val;
+    char str[10];
+
+    if (!visit_type_uint32(v, name, &val, errp)) {
+        return;
+    }
+
+    switch (val) {
+    case 4096:
+    case 65536:
+        *page_size = val;
+        break;
+    default:
+        snprintf(str, 10, "%d", val);
+        error_set_from_qdev_prop_error(errp, EINVAL, obj, name, str);
+    }
+}
+
+static void set_default_page_size(ObjectProperty *op, const Property *prop)
+{
+    object_property_set_default_uint(op, VMGENID_DEFAULT_FW_PAGE_SIZE);
+}
+
+const PropertyInfo vmgenid_prop_page_size = {
+    .name = "uint32",
+    .description = "Page size to use for allocating device memory. \""
+                   "\"Valid values: 4096(default) 65536",
+    .get = get_page_size,
+    .set = set_page_size,
+    .set_default_value = set_default_page_size,
+};
+
 static Property vmgenid_device_properties[] = {
     DEFINE_PROP_UUID(VMGENID_GUID, VmGenIdState, guid),
+    DEFINE_PROP_UNSIGNED(VMGENID_PAGE_SIZE, VmGenIdState, page_size, 0,
+                         vmgenid_prop_page_size, uint32_t),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/acpi/vmgenid.h b/include/hw/acpi/vmgenid.h
index dc8bb3433e..e4d83f5c74 100644
--- a/include/hw/acpi/vmgenid.h
+++ b/include/hw/acpi/vmgenid.h
@@ -6,15 +6,16 @@
 #include "qemu/uuid.h"
 #include "qom/object.h"
 
-#define TYPE_VMGENID           "vmgenid"
-#define VMGENID_GUID             "guid"
+#define TYPE_VMGENID                  "vmgenid"
+#define VMGENID_GUID                  "guid"
+#define VMGENID_PAGE_SIZE             "page-size"
 #define VMGENID_GUID_FW_CFG_FILE      "etc/vmgenid_guid"
 #define VMGENID_ADDR_FW_CFG_FILE      "etc/vmgenid_addr"
 
-#define VMGENID_FW_CFG_SIZE      4096 /* Occupy a page of memory */
-#define VMGENID_GUID_OFFSET      40   /* allow space for
-                                       * OVMF SDT Header Probe Supressor
-                                       */
+#define VMGENID_DEFAULT_FW_PAGE_SIZE  4096 /* Assume 4K page by default */
+#define VMGENID_GUID_OFFSET           40   /* allow space for OVMF SDT Header
+                                            * Probe Supressor
+                                            */
 
 OBJECT_DECLARE_SIMPLE_TYPE(VmGenIdState, VMGENID)
 
@@ -22,6 +23,7 @@ struct VmGenIdState {
     DeviceState parent_obj;
     QemuUUID guid;                /* The 128-bit GUID seen by the guest */
     uint8_t vmgenid_addr_le[8];   /* Address of the GUID (little-endian) */
+    uint32_t page_size;           /* Page size to use as a the allocation unit */
 };
 
 /* returns NULL unless there is exactly one device */
-- 
2.32.1 (Apple Git-133)

Amazon Spain Services sociedad limitada unipersonal, Calle Ramirez de Prado 5, 28045 Madrid. Registro Mercantil de Madrid . Tomo 22458 . Folio 102 . Hoja M-401234 . CIF B84570936



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] vmgenid: add generation counter
  2022-08-03 13:41 [PATCH 0/2] vmgenid: add generation counter bchalios
  2022-08-03 13:41 ` [PATCH 1/2] vmgenid: make device data size configurable bchalios
@ 2022-08-03 13:41 ` bchalios
  2022-08-03 15:36 ` [PATCH 0/2] " Michael S. Tsirkin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: bchalios @ 2022-08-03 13:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: bchalios, ani, mst, imammedo, dwmw, graf, xmarcalx

From: Babis Chalios <bchalios@amazon.es>

Some user-space applications such as user-space PRNGs or applications
that keep world-unique data need to be notified about VM restore events,
so that they modify their internal state. However, the vmgenid 128-bits
UUID is consumed by the guest kernel to reseed its RNG, hence it is not
suitable to be exposed to user-space.

In order to address the user-space needs this commit extends the vmgenid
to include a 32-bits generation counter. The value of the counter is
meant to be increased every time we start a new VM from a saved state.

It is stored in a page aligned, one page-long buffer, so that it allows
the guest kernel to expose it as an mmap-able device to the user-space
for low latency access, out of the notification path.

Signed-off-by: Babis Chalios <bchalios@amazon.es>
---
 docs/specs/vmgenid.txt    | 97 +++++++++++++++++++++++++++------------
 hw/acpi/vmgenid.c         | 96 +++++++++++++++++++++++++++-----------
 include/hw/acpi/vmgenid.h |  9 +++-
 3 files changed, 144 insertions(+), 58 deletions(-)

diff --git a/docs/specs/vmgenid.txt b/docs/specs/vmgenid.txt
index 5274b4c895..3cf6d0e191 100644
--- a/docs/specs/vmgenid.txt
+++ b/docs/specs/vmgenid.txt
@@ -20,6 +20,15 @@ guest operating system notices the change, and is then able to react as
 appropriate by marking its copies of distributed databases as dirty,
 re-initializing its random number generator etc.
 
+Currently, in Linux the UUID exposed by vmgenid is consumed by the guest kernel
+to reseed its RNG and not exposed in the user-space, hence leaving applications
+without a mechanism which would help them detect a VM restore event.
+
+To address the use-case of user-space applications we extend the vmgenid
+specification to include a 32-bit generation counter. The counter needs to be
+incremented every time a VM is loaded from a saved state by the user. The guest
+kernel needs to expose to user-space applications which should monitor it for
+detecting the VM-load event.
 
 Requirements
 ------------
@@ -53,12 +62,24 @@ R2 to R5. [These AML requirements are isolated well enough in the Microsoft
 R6. The hypervisor shall expose a _HID (hardware identifier) object in the
     VMGenId device's scope that is unique to the hypervisor vendor.
 
+The generation counter and the buffer holding it shall adhere to the same
+requirements as the generation ID. Moreover,
+
+R7. The BIOS will need to include an object below the device named "CTRA",
+    which, similarly to "ADDR", will evaluate to a package of two integers. The
+    first integer must be the low 32-bits of the guest physical address of the
+    buffer that contains generation counter. The second integer must be the
+    hight 32 bits of the guest physical address of the same buffer.
+
+R8. The buffer holding the generation counter will be page aligned and 1 page
+    long.
+
 
 QEMU Implementation
 -------------------
 
 The above-mentioned specification does not dictate which ACPI descriptor table
-will contain the VM Generation ID device.  Other implementations (Hyper-V and
+will contain the VM Generation ID device. Other implementations (Hyper-V and
 Xen) put it in the main descriptor table (Differentiated System Description
 Table or DSDT).  For ease of debugging and implementation, we have decided to
 put it in its own Secondary System Description Table, or SSDT.
@@ -72,8 +93,8 @@ ASL+ Optimizing Compiler version 20150717-64
 Copyright (c) 2000 - 2015 Intel Corporation
 
 Reading ACPI table from file /sys/firmware/acpi/tables/SSDT - Length
-00000198 (0x0000C6)
-ACPI: SSDT 0x0000000000000000 0000C6 (v01 BOCHS  VMGENID  00000001 BXPC
+00000236 (0x0000EC)
+ACPI: SSDT 0x0000000000000000 0000EC (v01 BOCHS  VMGENID  00000001 BXPC
 00000001)
 Acpi table [SSDT] successfully installed and loaded
 Pass 1 parse of [SSDT]
@@ -95,9 +116,9 @@ ASL Output:    ./SSDT.dsl - 1631 bytes
  *
  * Original Table Header:
  *     Signature        "SSDT"
- *     Length           0x000000CA (202)
+ *     Length           0x000000EC (236)
  *     Revision         0x01
- *     Checksum         0x4B
+ *     Checksum         0x63
  *     OEM ID           "BOCHS "
  *     OEM Table ID     "VMGENID"
  *     OEM Revision     0x00000001 (1)
@@ -133,6 +154,14 @@ DefinitionBlock ("/sys/firmware/acpi/tables/SSDT.aml", "SSDT", 1, "BOCHS ",
                 Index (Local0, One) = Zero
                 Return (Local0)
             }
+
+            Method (CTRA, 0, NotSerialized)
+            {
+                Local0 = Package (0x02) {}
+                Local0 [Zero] = (VGIA + 0x1000)
+                Local0 [One] = Zero
+                Return (Local0)
+            }
         }
     }
 
@@ -161,7 +190,8 @@ More information about fw_cfg can be found in "docs/specs/fw_cfg.txt"
 
 Two fw_cfg blobs are used in this case:
 
-/etc/vmgenid_guid - contains the actual VM Generation ID GUID
+/etc/vmgenid_data - contains the actual VM Generation ID GUID and Generation
+                    counter
                   - read-only to the guest
 /etc/vmgenid_addr - contains the address of the downloaded vmgenid blob
                   - writable by the guest
@@ -169,14 +199,14 @@ Two fw_cfg blobs are used in this case:
 
 QEMU sends the following commands to the guest at startup:
 
-1. Allocate memory for vmgenid_guid fw_cfg blob.
-2. Write the address of vmgenid_guid into the SSDT (VGIA ACPI variable as
+1. Allocate memory for vmgenid_data fw_cfg blob.
+2. Write the address of vmgenid_data into the SSDT (VGIA ACPI variable as
    shown above in the iasl dump).  Note that this change is not propagated
    back to QEMU.
-3. Write the address of vmgenid_guid back to QEMU's copy of vmgenid_addr
+3. Write the address of vmgenid_data back to QEMU's copy of vmgenid_addr
    via the fw_cfg DMA interface.
 
-After step 3, QEMU is able to update the contents of vmgenid_guid at will.
+After step 3, QEMU is able to update the contents of vmgenid_data at will.
 
 Since BIOS or UEFI does not necessarily run when we wish to change the GUID,
 the value of VGIA is persisted via the VMState mechanism.
@@ -196,46 +226,53 @@ All GUID passed in via command line or monitor are treated as big-endian.
 GUID values displayed via monitor are shown in big-endian format.
 
 
-GUID Storage Format:
+vmgenid Storage Format:
 --------------------
 
-In order to implement an OVMF "SDT Header Probe Suppressor", the contents of
-the vmgenid_guid fw_cfg blob are not simply a 128-bit GUID.  There is also
-significant padding in order to align and fill a memory page, as shown in the
-following diagram:
+The contents of the vmgenid_data fw_cfg blob are two memory pages. In beginning
+of the first page lives the OVMF SDT Header probe padded so that GUID is stored
+in an 8-byte aligned address. Immediately after that is stored the GUID
+followed by padding until the end of the page. The genration counter lives
+in the beginning of the second page. The rest of the page is padded with zeros.
+The layout is shown in the following diagram:
 
 +----------------------------------+
 | SSDT with OEM Table ID = VMGENID |
 +----------------------------------+
 | ...                              |       TOP OF PAGE
-| VGIA dword object ---------------|-----> +---------------------------+
-| ...                              |       | fw-allocated array for    |
-| _STA method referring to VGIA    |       | "etc/vmgenid_guid"        |
-| ...                              |       +---------------------------+
-| ADDR method referring to VGIA    |       |  0: OVMF SDT Header probe |
-| ...                              |       |     suppressor            |
-+----------------------------------+       | 36: padding for 8-byte    |
-                                           |     alignment             |
-                                           | 40: GUID                  |
-                                           | 56: padding to page size  |
-                                           +---------------------------+
-                                           END OF PAGE
+| VGIA dword object ---------------|-----> +------------------------------------+
+| ...                              |       | fw-allocated array for             |
+| _STA method referring to VGIA    |       | "etc/vmgenid_data"                 |
+| ...                              |       +------------------------------------+
+| ADDR method referring to VGIA    |       |  0: OVMF SDT Header probe          |
+| ...                              |       |     suppressor                     |
++----------------------------------+       | 36: padding for 8-byte             |
+                                           |     alignment                      |
+                                           | 40: GUID                           |
+                                           | 56: padding to page size           |
+                                           |                                    |
+                                           | ...                                |
+                                           | PAGE_SIZE    : Gen counter         |
+                                           | PAGE_SIZE + 4: padding to page size|
+                                           +------------------------------------+
 
 
 Device Usage:
 -------------
 
-The device has two properties, which may be only be set using the command line:
+The device has three properties, which may be only be set using the command
+line:
 
   guid - sets the value of the GUID. A special value "auto" instructs
          QEMU to generate a new random GUID.
   page-size - sets the target machines page size. Currently accepted values
               are 4096 (default) and 65536.
+  genctr - sets the Generation counter of the VM.
 
 For example:
 
-  QEMU  -device vmgenid,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87",page-size=65536
-  QEMU  -device vmgenid,guid=auto
+  QEMU  -device vmgenid,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87",page-size=65536,genctr=0
+  QEMU  -device vmgenid,guid=auto,genctr=42
 
 The property may be queried via QMP/HMP:
 
diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
index ac2b116b6a..980d19daf3 100644
--- a/hw/acpi/vmgenid.c
+++ b/hw/acpi/vmgenid.c
@@ -24,7 +24,7 @@
 #include "migration/vmstate.h"
 #include "sysemu/reset.h"
 
-void vmgenid_build_acpi(VmGenIdState *vms, GArray *table_data, GArray *guid,
+void vmgenid_build_acpi(VmGenIdState *vms, GArray *table_data, GArray *vmgenid,
                         BIOSLinker *linker, const char *oem_id)
 {
     Aml *ssdt, *dev, *scope, *method, *addr, *if_ctx;
@@ -34,17 +34,26 @@ void vmgenid_build_acpi(VmGenIdState *vms, GArray *table_data, GArray *guid,
                         .oem_id = oem_id, .oem_table_id = "VMGENID" };
 
     /* Fill in the GUID values.  These need to be converted to little-endian
-     * first, since that's what the guest expects
+     * first, since that's what the guest expects. Allocate two pages, so that
+     * we can dedicate a full page (the second one) to the generation counter.
      */
-    g_array_set_size(guid, vms->page_size - ARRAY_SIZE(guid_le.data));
+    g_array_set_size(vmgenid, 2 * vms->page_size - ARRAY_SIZE(guid_le.data)
+                     - sizeof(vms->gen_ctr));
     guid_le = qemu_uuid_bswap(vms->guid);
     /* The GUID is written at a fixed offset into the fw_cfg file
      * in order to implement the "OVMF SDT Header probe suppressor"
      * see docs/specs/vmgenid.txt for more details
      */
-    g_array_insert_vals(guid, VMGENID_GUID_OFFSET, guid_le.data,
+    g_array_insert_vals(vmgenid, VMGENID_GUID_OFFSET, guid_le.data,
                         ARRAY_SIZE(guid_le.data));
 
+    /* Write the generation counter at the beginning of the second page of
+     * the fw_cfg file.
+     */
+    g_array_insert_vals(vmgenid, vms->page_size, (uint8_t *)&vms->gen_ctr,
+                        sizeof(vms->gen_ctr));
+    vms->has_gen_ctr = true;
+
     /* Put VMGNEID into a separate SSDT table */
     acpi_table_begin(&table, table_data);
     ssdt = init_aml_allocator();
@@ -84,6 +93,23 @@ void vmgenid_build_acpi(VmGenIdState *vms, GArray *table_data, GArray *guid,
     aml_append(method, aml_return(addr));
 
     aml_append(dev, method);
+
+    /* the CTRA method returns two 32-bit words representing the lower and
+     * upper halves of the physical address of the Generation counter inside
+     * the fw_cfg blob
+     */
+    method = aml_method("CTRA", 0, AML_NOTSERIALIZED);
+
+    addr = aml_local(0);
+    aml_append(method, aml_store(aml_package(2), addr));
+    aml_append(method, aml_store(aml_add(aml_name("VGIA"),
+                                         aml_int(vms->page_size), NULL),
+                                 aml_index(addr, aml_int(0))));
+    aml_append(method, aml_store(aml_int(0), aml_index(addr, aml_int(1))));
+    aml_append(method, aml_return(addr));
+
+    aml_append(dev, method);
+
     aml_append(scope, dev);
     aml_append(ssdt, scope);
 
@@ -95,8 +121,8 @@ void vmgenid_build_acpi(VmGenIdState *vms, GArray *table_data, GArray *guid,
     g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
 
     /* Allocate guest memory for the Data fw_cfg blob */
-    bios_linker_loader_alloc(linker, VMGENID_GUID_FW_CFG_FILE, guid,
-                             vms->page_size,
+    bios_linker_loader_alloc(linker, VMGENID_DATA_FW_CFG_FILE, vmgenid,
+                             2 * vms->page_size,
                              false /* page boundary, high memory */);
 
     /* Patch address of GUID fw_cfg blob into the ADDR fw_cfg blob
@@ -108,7 +134,7 @@ void vmgenid_build_acpi(VmGenIdState *vms, GArray *table_data, GArray *guid,
      */
     bios_linker_loader_write_pointer(linker,
         VMGENID_ADDR_FW_CFG_FILE, 0, sizeof(uint64_t),
-        VMGENID_GUID_FW_CFG_FILE, VMGENID_GUID_OFFSET);
+        VMGENID_DATA_FW_CFG_FILE, VMGENID_GUID_OFFSET);
 
     /* Patch address of GUID fw_cfg blob into the AML so OSPM can retrieve
      * and read it.  Note that while we provide storage for 64 bits, only
@@ -116,17 +142,18 @@ void vmgenid_build_acpi(VmGenIdState *vms, GArray *table_data, GArray *guid,
      */
     bios_linker_loader_add_pointer(linker,
         ACPI_BUILD_TABLE_FILE, vgia_offset, sizeof(uint32_t),
-        VMGENID_GUID_FW_CFG_FILE, 0);
+        VMGENID_DATA_FW_CFG_FILE, 0);
 
     /* must be called after above command to ensure correct table checksum */
     acpi_table_end(linker, &table);
     free_aml_allocator();
 }
 
-void vmgenid_add_fw_cfg(VmGenIdState *vms, FWCfgState *s, GArray *guid)
+void vmgenid_add_fw_cfg(VmGenIdState *vms, FWCfgState *s, GArray *vmgenid)
 {
-    /* Create a read-only fw_cfg file for GUID */
-    fw_cfg_add_file(s, VMGENID_GUID_FW_CFG_FILE, guid->data, vms->page_size);
+    /* Create a read-only fw_cfg file for vmgenid data */
+    fw_cfg_add_file(s, VMGENID_DATA_FW_CFG_FILE, vmgenid->data,
+                    2 * vms->page_size);
     /* Create a read-write fw_cfg file for Address */
     fw_cfg_add_file_callback(s, VMGENID_ADDR_FW_CFG_FILE, NULL, NULL, NULL,
                              vms->vmgenid_addr_le,
@@ -136,7 +163,7 @@ void vmgenid_add_fw_cfg(VmGenIdState *vms, FWCfgState *s, GArray *guid)
 static void vmgenid_update_guest(VmGenIdState *vms)
 {
     Object *obj = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
-    uint32_t vmgenid_addr;
+    uint32_t vmgenid_addr, vmgenid_gen_ctr_addr;
     QemuUUID guid_le;
 
     if (obj) {
@@ -146,22 +173,36 @@ static void vmgenid_update_guest(VmGenIdState *vms)
         /* A zero value in vmgenid_addr means that BIOS has not yet written
          * the address
          */
-        if (vmgenid_addr) {
-            /* QemuUUID has the first three words as big-endian, and expect
-             * that any GUIDs passed in will always be BE.  The guest,
-             * however, will expect the fields to be little-endian.
-             * Perform a byte swap immediately before writing.
-             */
-            guid_le = qemu_uuid_bswap(vms->guid);
-            /* The GUID is written at a fixed offset into the fw_cfg file
-             * in order to implement the "OVMF SDT Header probe suppressor"
-             * see docs/specs/vmgenid.txt for more details.
-             */
-            cpu_physical_memory_write(vmgenid_addr, guid_le.data,
-                                      sizeof(guid_le.data));
-            /* Send _GPE.E05 event */
-            acpi_send_event(DEVICE(obj), ACPI_VMGENID_CHANGE_STATUS);
+        if (!vmgenid_addr) {
+            return;
+        }
+
+        /* Only update the generation counter if the original guest supported
+         * it.
+         */
+        if (vms->has_gen_ctr) {
+            /* The generation counter address is in the second page of the blob */
+            vmgenid_gen_ctr_addr = vmgenid_addr - VMGENID_GUID_OFFSET
+                                   + vms->page_size;
+
+            cpu_physical_memory_write(vmgenid_gen_ctr_addr, &vms->gen_ctr,
+                                      sizeof(vms->gen_ctr));
         }
+
+        /* QemuUUID has the first three words as big-endian, and expect
+         * that any GUIDs passed in will always be BE.  The guest,
+         * however, will expect the fields to be little-endian.
+         * Perform a byte swap immediately before writing.
+         */
+        guid_le = qemu_uuid_bswap(vms->guid);
+        /* The GUID is written at a fixed offset into the fw_cfg file
+         * in order to implement the "OVMF SDT Header probe suppressor"
+         * see docs/specs/vmgenid.txt for more details.
+         */
+        cpu_physical_memory_write(vmgenid_addr, guid_le.data,
+                sizeof(guid_le.data));
+        /* Send _GPE.E05 event */
+        acpi_send_event(DEVICE(obj), ACPI_VMGENID_CHANGE_STATUS);
     }
 }
 
@@ -266,6 +307,7 @@ static Property vmgenid_device_properties[] = {
     DEFINE_PROP_UUID(VMGENID_GUID, VmGenIdState, guid),
     DEFINE_PROP_UNSIGNED(VMGENID_PAGE_SIZE, VmGenIdState, page_size, 0,
                          vmgenid_prop_page_size, uint32_t),
+    DEFINE_PROP_UINT32(VMGENID_GEN_CTR, VmGenIdState, gen_ctr, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/acpi/vmgenid.h b/include/hw/acpi/vmgenid.h
index e4d83f5c74..56dd7533b0 100644
--- a/include/hw/acpi/vmgenid.h
+++ b/include/hw/acpi/vmgenid.h
@@ -9,7 +9,8 @@
 #define TYPE_VMGENID                  "vmgenid"
 #define VMGENID_GUID                  "guid"
 #define VMGENID_PAGE_SIZE             "page-size"
-#define VMGENID_GUID_FW_CFG_FILE      "etc/vmgenid_guid"
+#define VMGENID_GEN_CTR               "genctr"
+#define VMGENID_DATA_FW_CFG_FILE      "etc/vmgenid_data"
 #define VMGENID_ADDR_FW_CFG_FILE      "etc/vmgenid_addr"
 
 #define VMGENID_DEFAULT_FW_PAGE_SIZE  4096 /* Assume 4K page by default */
@@ -22,6 +23,12 @@ OBJECT_DECLARE_SIMPLE_TYPE(VmGenIdState, VMGENID)
 struct VmGenIdState {
     DeviceState parent_obj;
     QemuUUID guid;                /* The 128-bit GUID seen by the guest */
+    uint32_t gen_ctr;             /* A 32bit generation counter meant to
+                                   * be exposed in the guest user-space
+                                   */
+    bool has_gen_ctr;             /* Backwards compatibility: Only update a
+                                   * guest when generation current present
+                                   */
     uint8_t vmgenid_addr_le[8];   /* Address of the GUID (little-endian) */
     uint32_t page_size;           /* Page size to use as a the allocation unit */
 };
-- 
2.32.1 (Apple Git-133)

Amazon Spain Services sociedad limitada unipersonal, Calle Ramirez de Prado 5, 28045 Madrid. Registro Mercantil de Madrid . Tomo 22458 . Folio 102 . Hoja M-401234 . CIF B84570936



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] vmgenid: add generation counter
  2022-08-03 13:41 [PATCH 0/2] vmgenid: add generation counter bchalios
  2022-08-03 13:41 ` [PATCH 1/2] vmgenid: make device data size configurable bchalios
  2022-08-03 13:41 ` [PATCH 2/2] vmgenid: add generation counter bchalios
@ 2022-08-03 15:36 ` Michael S. Tsirkin
  2022-08-03 16:17   ` bchalios
  2022-08-03 16:26 ` Daniel P. Berrangé
  2022-08-04 15:01 ` Jason A. Donenfeld
  4 siblings, 1 reply; 12+ messages in thread
From: Michael S. Tsirkin @ 2022-08-03 15:36 UTC (permalink / raw)
  To: bchalios; +Cc: qemu-devel, ani, imammedo, dwmw, graf, xmarcalx

On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
> From: Babis Chalios <bchalios@amazon.es>
> 
> VM generation ID exposes a GUID inside the VM which changes every time a
> VM restore is happening. Typically, this GUID is used by the guest
> kernel to re-seed its internal PRNG. As a result, this value cannot be
> exposed in guest user-space as a notification mechanism for VM restore
> events.
> 
> This patch set extends vmgenid to introduce a 32 bits generation counter
> whose purpose is to be used as a VM restore notification mechanism for
> the guest user-space.
> 
> It is true that such a counter could be implemented entirely by the
> guest kernel, but this would rely on the vmgenid ACPI notification to
> trigger the counter update, which is inherently racy. Exposing this
> through the monitor allows the updated value to be in-place before
> resuming the vcpus, so interested user-space code can (atomically)
> observe the update without relying on the ACPI notification.

Producing another 4 bytes is not really the issue, the issue
is how does guest consume this.
So I would like this discussion to happen on the linux kernel mailing
list not just here.  Can you post the linux patch please?




> Babis Chalios (2):
>   vmgenid: make device data size configurable
>   vmgenid: add generation counter
> 
>  docs/specs/vmgenid.txt    | 101 ++++++++++++++++++--------
>  hw/acpi/vmgenid.c         | 145 +++++++++++++++++++++++++++++++-------
>  include/hw/acpi/vmgenid.h |  23 ++++--
>  3 files changed, 204 insertions(+), 65 deletions(-)
> 
> -- 
> 2.37.1
> 
> Amazon Spain Services sociedad limitada unipersonal, Calle Ramirez de Prado 5, 28045 Madrid. Registro Mercantil de Madrid . Tomo 22458 . Folio 102 . Hoja M-401234 . CIF B84570936



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] vmgenid: add generation counter
  2022-08-03 15:36 ` [PATCH 0/2] " Michael S. Tsirkin
@ 2022-08-03 16:17   ` bchalios
  0 siblings, 0 replies; 12+ messages in thread
From: bchalios @ 2022-08-03 16:17 UTC (permalink / raw)
  To: Michael S. Tsirkin, qemu-devel, ani, imammedo, dwmw, graf, xmarcalx



On 8/3/22 5:36 PM, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
> > From: Babis Chalios <bchalios@amazon.es>
> >
> > VM generation ID exposes a GUID inside the VM which changes every time a
> > VM restore is happening. Typically, this GUID is used by the guest
> > kernel to re-seed its internal PRNG. As a result, this value cannot be
> > exposed in guest user-space as a notification mechanism for VM restore
> > events.
> >
> > This patch set extends vmgenid to introduce a 32 bits generation counter
> > whose purpose is to be used as a VM restore notification mechanism for
> > the guest user-space.
> >
> > It is true that such a counter could be implemented entirely by the
> > guest kernel, but this would rely on the vmgenid ACPI notification to
> > trigger the counter update, which is inherently racy. Exposing this
> > through the monitor allows the updated value to be in-place before
> > resuming the vcpus, so interested user-space code can (atomically)
> > observe the update without relying on the ACPI notification.
> 
> Producing another 4 bytes is not really the issue, the issue
> is how does guest consume this.
> So I would like this discussion to happen on the linux kernel mailing
> list not just here.  Can you post the linux patch please?
> 

CCed you in the Linux patch thread.

> 
> 
> 
> > Babis Chalios (2):
> >    vmgenid: make device data size configurable
> >    vmgenid: add generation counter
> >
> >   docs/specs/vmgenid.txt    | 101 ++++++++++++++++++--------
> >   hw/acpi/vmgenid.c         | 145 +++++++++++++++++++++++++++++++-------
> >   include/hw/acpi/vmgenid.h |  23 ++++--
> >   3 files changed, 204 insertions(+), 65 deletions(-)
> >
> > --
> > 2.37.1
> >
> > Amazon Spain Services sociedad limitada unipersonal, Calle Ramirez de Prado 5, 28045 Madrid. Registro Mercantil de Madrid . Tomo 22458 . Folio 102 . Hoja M-401234 . CIF B84570936
> 
> 
Amazon Spain Services sociedad limitada unipersonal, Calle Ramirez de Prado 5, 28045 Madrid. Registro Mercantil de Madrid . Tomo 22458 . Folio 102 . Hoja M-401234 . CIF B84570936

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] vmgenid: add generation counter
  2022-08-03 13:41 [PATCH 0/2] vmgenid: add generation counter bchalios
                   ` (2 preceding siblings ...)
  2022-08-03 15:36 ` [PATCH 0/2] " Michael S. Tsirkin
@ 2022-08-03 16:26 ` Daniel P. Berrangé
  2022-08-04  9:54   ` Chalios, Babis
  2022-08-04 15:01 ` Jason A. Donenfeld
  4 siblings, 1 reply; 12+ messages in thread
From: Daniel P. Berrangé @ 2022-08-03 16:26 UTC (permalink / raw)
  To: bchalios; +Cc: qemu-devel, ani, mst, imammedo, dwmw, graf, xmarcalx

On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
> From: Babis Chalios <bchalios@amazon.es>
> 
> VM generation ID exposes a GUID inside the VM which changes every time a
> VM restore is happening. Typically, this GUID is used by the guest
> kernel to re-seed its internal PRNG. As a result, this value cannot be
> exposed in guest user-space as a notification mechanism for VM restore
> events.
> 
> This patch set extends vmgenid to introduce a 32 bits generation counter
> whose purpose is to be used as a VM restore notification mechanism for
> the guest user-space.
> 
> It is true that such a counter could be implemented entirely by the
> guest kernel, but this would rely on the vmgenid ACPI notification to
> trigger the counter update, which is inherently racy. Exposing this
> through the monitor allows the updated value to be in-place before
> resuming the vcpus, so interested user-space code can (atomically)
> observe the update without relying on the ACPI notification.

The VM generation ID feature in QEMU is implementing a spec defined
by Microsoft. It is implemented in HyperV, VMWare, QEMU and possibly
more. This series is proposing a QEMU specific variant, which means
Linux running on all these other hypervisor platforms won't benefit
from the change. If the counter were provided entirely in the guest
kernel, then it works across all hypervisors.

It feels like the kernel ought to provide an implementation itself
as a starting point, with this QEMU change merely being an optional
enhancement to close the race window.

Ideally there would be someone at Microsoft we could connect with to
propose they include this feature in a VM Gen ID spec update, but I
don't personally know who to contact about that kind of thing. A
spec update would increase chances that this change gets provieded
across all hypervisors.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] vmgenid: add generation counter
  2022-08-03 16:26 ` Daniel P. Berrangé
@ 2022-08-04  9:54   ` Chalios, Babis
  2022-08-04 10:02     ` Daniel P. Berrangé
  0 siblings, 1 reply; 12+ messages in thread
From: Chalios, Babis @ 2022-08-04  9:54 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, ani, mst, imammedo, dwmw, graf, xmarcalx, Jason

Hi Daniel,

On 3/8/22 18:26, Daniel P. Berrangé wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
>> From: Babis Chalios <bchalios@amazon.es>
>>
>> VM generation ID exposes a GUID inside the VM which changes every time a
>> VM restore is happening. Typically, this GUID is used by the guest
>> kernel to re-seed its internal PRNG. As a result, this value cannot be
>> exposed in guest user-space as a notification mechanism for VM restore
>> events.
>>
>> This patch set extends vmgenid to introduce a 32 bits generation counter
>> whose purpose is to be used as a VM restore notification mechanism for
>> the guest user-space.
>>
>> It is true that such a counter could be implemented entirely by the
>> guest kernel, but this would rely on the vmgenid ACPI notification to
>> trigger the counter update, which is inherently racy. Exposing this
>> through the monitor allows the updated value to be in-place before
>> resuming the vcpus, so interested user-space code can (atomically)
>> observe the update without relying on the ACPI notification.
> The VM generation ID feature in QEMU is implementing a spec defined
> by Microsoft. It is implemented in HyperV, VMWare, QEMU and possibly
> more. This series is proposing a QEMU specific variant, which means
> Linux running on all these other hypervisor platforms won't benefit
> from the change. If the counter were provided entirely in the guest
> kernel, then it works across all hypervisors.
>
> It feels like the kernel ought to provide an implementation itself
> as a starting point, with this QEMU change merely being an optional
> enhancement to close the race window.
>
> Ideally there would be someone at Microsoft we could connect with to
> propose they include this feature in a VM Gen ID spec update, but I
> don't personally know who to contact about that kind of thing. A
> spec update would increase chances that this change gets provieded
> across all hypervisors.

You are right, this *is* out-of-spec. The approach here is based on various
discussions happened last year when we first tried to upstream and more
recently when vmgenid landed in Linux. I find that this summary:
https://lkml.org/lkml/2022/3/1/693 quite to the point. (CCing Jason to
have his take on the matter).

This series comes together with a Linux counterpart:
https://lkml.org/lkml/2022/8/3/563, where the generation counter is
exposed to user-space as a misc device. There, I tried to make the
generation counter "optional", in the sense that if it is not there, the
ACPI device should not fail, exactly because, for the moment, this is
not in the spec and hypervisors might not want to implement it.

However, I think that changing the spec will take time and this is a
real issue affecting real use-cases, so we should start from somewhere.

Cheers,
Babis


>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

Amazon Spain Services sociedad limitada unipersonal, Calle Ramirez de Prado 5, 28045 Madrid. Registro Mercantil de Madrid . Tomo 22458 . Folio 102 . Hoja M-401234 . CIF B84570936

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] vmgenid: add generation counter
  2022-08-04  9:54   ` Chalios, Babis
@ 2022-08-04 10:02     ` Daniel P. Berrangé
  2022-08-04 10:17       ` Chalios, Babis
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel P. Berrangé @ 2022-08-04 10:02 UTC (permalink / raw)
  To: Chalios, Babis
  Cc: qemu-devel, ani, mst, imammedo, dwmw, graf, xmarcalx, Jason

On Thu, Aug 04, 2022 at 11:54:05AM +0200, Chalios, Babis wrote:
> Hi Daniel,
> 
> On 3/8/22 18:26, Daniel P. Berrangé wrote:
> > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> > 
> > 
> > 
> > On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
> > > From: Babis Chalios <bchalios@amazon.es>
> > > 
> > > VM generation ID exposes a GUID inside the VM which changes every time a
> > > VM restore is happening. Typically, this GUID is used by the guest
> > > kernel to re-seed its internal PRNG. As a result, this value cannot be
> > > exposed in guest user-space as a notification mechanism for VM restore
> > > events.
> > > 
> > > This patch set extends vmgenid to introduce a 32 bits generation counter
> > > whose purpose is to be used as a VM restore notification mechanism for
> > > the guest user-space.
> > > 
> > > It is true that such a counter could be implemented entirely by the
> > > guest kernel, but this would rely on the vmgenid ACPI notification to
> > > trigger the counter update, which is inherently racy. Exposing this
> > > through the monitor allows the updated value to be in-place before
> > > resuming the vcpus, so interested user-space code can (atomically)
> > > observe the update without relying on the ACPI notification.
> > The VM generation ID feature in QEMU is implementing a spec defined
> > by Microsoft. It is implemented in HyperV, VMWare, QEMU and possibly
> > more. This series is proposing a QEMU specific variant, which means
> > Linux running on all these other hypervisor platforms won't benefit
> > from the change. If the counter were provided entirely in the guest
> > kernel, then it works across all hypervisors.
> > 
> > It feels like the kernel ought to provide an implementation itself
> > as a starting point, with this QEMU change merely being an optional
> > enhancement to close the race window.
> > 
> > Ideally there would be someone at Microsoft we could connect with to
> > propose they include this feature in a VM Gen ID spec update, but I
> > don't personally know who to contact about that kind of thing. A
> > spec update would increase chances that this change gets provieded
> > across all hypervisors.
> 
> You are right, this *is* out-of-spec. The approach here is based on various
> discussions happened last year when we first tried to upstream and more
> recently when vmgenid landed in Linux. I find that this summary:
> https://lkml.org/lkml/2022/3/1/693 quite to the point. (CCing Jason to
> have his take on the matter).
> 
> This series comes together with a Linux counterpart:
> https://lkml.org/lkml/2022/8/3/563, where the generation counter is
> exposed to user-space as a misc device. There, I tried to make the
> generation counter "optional", in the sense that if it is not there, the
> ACPI device should not fail, exactly because, for the moment, this is
> not in the spec and hypervisors might not want to implement it.
> 
> However, I think that changing the spec will take time and this is a
> real issue affecting real use-cases, so we should start from somewhere.

I know a spec change can take time, but has there even been any effort
at all to try to start that action since first discussed a year ago ?

If these race condition issues are supposedly so serious that we need
to do this without waiting for a spec, then what is the answer for the
masses of users running Linux on VMware or HyperV/Azure ?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] vmgenid: add generation counter
  2022-08-04 10:02     ` Daniel P. Berrangé
@ 2022-08-04 10:17       ` Chalios, Babis
  2022-08-04 13:31         ` Chalios, Babis
  0 siblings, 1 reply; 12+ messages in thread
From: Chalios, Babis @ 2022-08-04 10:17 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, ani, mst, imammedo, dwmw, graf, xmarcalx, Jason, bchalios



On 4/8/22 12:02, Daniel P. Berrangé wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> On Thu, Aug 04, 2022 at 11:54:05AM +0200, Chalios, Babis wrote:
>> Hi Daniel,
>>
>> On 3/8/22 18:26, Daniel P. Berrangé wrote:
>>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>>>
>>>
>>>
>>> On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
>>>> From: Babis Chalios <bchalios@amazon.es>
>>>>
>>>> VM generation ID exposes a GUID inside the VM which changes every time a
>>>> VM restore is happening. Typically, this GUID is used by the guest
>>>> kernel to re-seed its internal PRNG. As a result, this value cannot be
>>>> exposed in guest user-space as a notification mechanism for VM restore
>>>> events.
>>>>
>>>> This patch set extends vmgenid to introduce a 32 bits generation counter
>>>> whose purpose is to be used as a VM restore notification mechanism for
>>>> the guest user-space.
>>>>
>>>> It is true that such a counter could be implemented entirely by the
>>>> guest kernel, but this would rely on the vmgenid ACPI notification to
>>>> trigger the counter update, which is inherently racy. Exposing this
>>>> through the monitor allows the updated value to be in-place before
>>>> resuming the vcpus, so interested user-space code can (atomically)
>>>> observe the update without relying on the ACPI notification.
>>> The VM generation ID feature in QEMU is implementing a spec defined
>>> by Microsoft. It is implemented in HyperV, VMWare, QEMU and possibly
>>> more. This series is proposing a QEMU specific variant, which means
>>> Linux running on all these other hypervisor platforms won't benefit
>>> from the change. If the counter were provided entirely in the guest
>>> kernel, then it works across all hypervisors.
>>>
>>> It feels like the kernel ought to provide an implementation itself
>>> as a starting point, with this QEMU change merely being an optional
>>> enhancement to close the race window.
>>>
>>> Ideally there would be someone at Microsoft we could connect with to
>>> propose they include this feature in a VM Gen ID spec update, but I
>>> don't personally know who to contact about that kind of thing. A
>>> spec update would increase chances that this change gets provieded
>>> across all hypervisors.
>> You are right, this *is* out-of-spec. The approach here is based on various
>> discussions happened last year when we first tried to upstream and more
>> recently when vmgenid landed in Linux. I find that this summary:
>> https://lkml.org/lkml/2022/3/1/693 quite to the point. (CCing Jason to
>> have his take on the matter).
>>
>> This series comes together with a Linux counterpart:
>> https://lkml.org/lkml/2022/8/3/563, where the generation counter is
>> exposed to user-space as a misc device. There, I tried to make the
>> generation counter "optional", in the sense that if it is not there, the
>> ACPI device should not fail, exactly because, for the moment, this is
>> not in the spec and hypervisors might not want to implement it.
>>
>> However, I think that changing the spec will take time and this is a
>> real issue affecting real use-cases, so we should start from somewhere.
> I know a spec change can take time, but has there even been any effort
> at all to try to start that action since first discussed a year ago ?

These patch-sets are out exactly for starting the conversation on adding
this to the spec. As you mentioned, it would be great if we could get the
opinion of someone at Microsoft on this.

>
> If these race condition issues are supposedly so serious that we need
> to do this without waiting for a spec, then what is the answer for the
> masses of users running Linux on VMware or HyperV/Azure ?

The problem arises when you start snapshotting and restoring on VMs,
so not everyone is affected from the issue. Use-cases interested in this
are ones that manage fleets of VMs that run code that relies on
user/kernel-space PRNGs or network-facing services using UUIDs, for
example.

>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

Cheers,
Babis
Amazon Spain Services sociedad limitada unipersonal, Calle Ramirez de Prado 5, 28045 Madrid. Registro Mercantil de Madrid . Tomo 22458 . Folio 102 . Hoja M-401234 . CIF B84570936

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] vmgenid: add generation counter
  2022-08-04 10:17       ` Chalios, Babis
@ 2022-08-04 13:31         ` Chalios, Babis
  2022-08-07 15:39           ` Michael Kelley (LINUX)
  0 siblings, 1 reply; 12+ messages in thread
From: Chalios, Babis @ 2022-08-04 13:31 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, ani, mst, imammedo, dwmw, graf, xmarcalx, Jason,
	bchalios, mikelley

On 4/8/22 12:17, Chalios, Babis wrote:
>
>
> On 4/8/22 12:02, Daniel P. Berrangé wrote:
>> CAUTION: This email originated from outside of the organization. Do 
>> not click links or open attachments unless you can confirm the sender 
>> and know the content is safe.
>>
>>
>>
>> On Thu, Aug 04, 2022 at 11:54:05AM +0200, Chalios, Babis wrote:
>>> Hi Daniel,
>>>
>>> On 3/8/22 18:26, Daniel P. Berrangé wrote:
>>>> CAUTION: This email originated from outside of the organization. Do 
>>>> not click links or open attachments unless you can confirm the 
>>>> sender and know the content is safe.
>>>>
>>>>
>>>>
>>>> On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
>>>>> From: Babis Chalios <bchalios@amazon.es>
>>>>>
>>>>> VM generation ID exposes a GUID inside the VM which changes every 
>>>>> time a
>>>>> VM restore is happening. Typically, this GUID is used by the guest
>>>>> kernel to re-seed its internal PRNG. As a result, this value 
>>>>> cannot be
>>>>> exposed in guest user-space as a notification mechanism for VM 
>>>>> restore
>>>>> events.
>>>>>
>>>>> This patch set extends vmgenid to introduce a 32 bits generation 
>>>>> counter
>>>>> whose purpose is to be used as a VM restore notification mechanism 
>>>>> for
>>>>> the guest user-space.
>>>>>
>>>>> It is true that such a counter could be implemented entirely by the
>>>>> guest kernel, but this would rely on the vmgenid ACPI notification to
>>>>> trigger the counter update, which is inherently racy. Exposing this
>>>>> through the monitor allows the updated value to be in-place before
>>>>> resuming the vcpus, so interested user-space code can (atomically)
>>>>> observe the update without relying on the ACPI notification.
>>>> The VM generation ID feature in QEMU is implementing a spec defined
>>>> by Microsoft. It is implemented in HyperV, VMWare, QEMU and possibly
>>>> more. This series is proposing a QEMU specific variant, which means
>>>> Linux running on all these other hypervisor platforms won't benefit
>>>> from the change. If the counter were provided entirely in the guest
>>>> kernel, then it works across all hypervisors.
>>>>
>>>> It feels like the kernel ought to provide an implementation itself
>>>> as a starting point, with this QEMU change merely being an optional
>>>> enhancement to close the race window.
>>>>
>>>> Ideally there would be someone at Microsoft we could connect with to
>>>> propose they include this feature in a VM Gen ID spec update, but I
>>>> don't personally know who to contact about that kind of thing. A
>>>> spec update would increase chances that this change gets provieded
>>>> across all hypervisors.
>>> You are right, this *is* out-of-spec. The approach here is based on 
>>> various
>>> discussions happened last year when we first tried to upstream and more
>>> recently when vmgenid landed in Linux. I find that this summary:
>>> https://lkml.org/lkml/2022/3/1/693 quite to the point. (CCing Jason to
>>> have his take on the matter).
>>>
>>> This series comes together with a Linux counterpart:
>>> https://lkml.org/lkml/2022/8/3/563, where the generation counter is
>>> exposed to user-space as a misc device. There, I tried to make the
>>> generation counter "optional", in the sense that if it is not there, 
>>> the
>>> ACPI device should not fail, exactly because, for the moment, this is
>>> not in the spec and hypervisors might not want to implement it.
>>>
>>> However, I think that changing the spec will take time and this is a
>>> real issue affecting real use-cases, so we should start from somewhere.
>> I know a spec change can take time, but has there even been any effort
>> at all to try to start that action since first discussed a year ago ?
>
> These patch-sets are out exactly for starting the conversation on adding
> this to the spec. As you mentioned, it would be great if we could get the
> opinion of someone at Microsoft on this.
>
>>
>> If these race condition issues are supposedly so serious that we need
>> to do this without waiting for a spec, then what is the answer for the
>> masses of users running Linux on VMware or HyperV/Azure ?
>
> The problem arises when you start snapshotting and restoring on VMs,
> so not everyone is affected from the issue. Use-cases interested in this
> are ones that manage fleets of VMs that run code that relies on
> user/kernel-space PRNGs or network-facing services using UUIDs, for
> example.
>
>>
>> With regards,
>> Daniel
>> -- 
>> |: https://berrange.com      -o- 
>> https://www.flickr.com/photos/dberrange :|
>> |: https://libvirt.org         -o- https://fstop138.berrange.com :|
>> |: https://entangle-photo.org    -o- 
>> https://www.instagram.com/dberrange :|
>>
>
> Cheers,
> Babis

I am CCing Michael from Microsoft. Maybe he has some input on this.
Amazon Spain Services sociedad limitada unipersonal, Calle Ramirez de Prado 5, 28045 Madrid. Registro Mercantil de Madrid . Tomo 22458 . Folio 102 . Hoja M-401234 . CIF B84570936

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] vmgenid: add generation counter
  2022-08-03 13:41 [PATCH 0/2] vmgenid: add generation counter bchalios
                   ` (3 preceding siblings ...)
  2022-08-03 16:26 ` Daniel P. Berrangé
@ 2022-08-04 15:01 ` Jason A. Donenfeld
  4 siblings, 0 replies; 12+ messages in thread
From: Jason A. Donenfeld @ 2022-08-04 15:01 UTC (permalink / raw)
  To: bchalios; +Cc: qemu-devel, ani, mst, imammedo, dwmw, graf, xmarcalx

Hi Babis,

On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
> From: Babis Chalios <bchalios@amazon.es>
> 
> VM generation ID exposes a GUID inside the VM which changes every time a
> VM restore is happening. Typically, this GUID is used by the guest
> kernel to re-seed its internal PRNG. As a result, this value cannot be
> exposed in guest user-space as a notification mechanism for VM restore
> events.
> 
> This patch set extends vmgenid to introduce a 32 bits generation counter
> whose purpose is to be used as a VM restore notification mechanism for
> the guest user-space.
> 
> It is true that such a counter could be implemented entirely by the
> guest kernel, but this would rely on the vmgenid ACPI notification to
> trigger the counter update, which is inherently racy. Exposing this
> through the monitor allows the updated value to be in-place before
> resuming the vcpus, so interested user-space code can (atomically)
> observe the update without relying on the ACPI notification.

As I wrote on LKML:
https://lore.kernel.org/lkml/Yuve4vuAnU85mdRY@zx2c4.com/
you seem to be rehashing something already discussed in earlier threads.
I don't think we should rush to adding something like this to QEMU.

Jason


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 0/2] vmgenid: add generation counter
  2022-08-04 13:31         ` Chalios, Babis
@ 2022-08-07 15:39           ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Kelley (LINUX) @ 2022-08-07 15:39 UTC (permalink / raw)
  To: Chalios, Babis, Daniel P. Berrangé
  Cc: qemu-devel, ani, mst, imammedo, dwmw, graf, xmarcalx, jason

From: Chalios, Babis <bchalios@amazon.es> Sent: Thursday, August 4, 2022 6:31 AM
> 
> On 4/8/22 12:17, Chalios, Babis wrote:
> >
> > On 4/8/22 12:02, Daniel P. Berrangé wrote:
> >>
> >> On Thu, Aug 04, 2022 at 11:54:05AM +0200, Chalios, Babis wrote:
> >>> Hi Daniel,
> >>>
> >>> On 3/8/22 18:26, Daniel P. Berrangé wrote:
> >>>>
> >>>> On Wed, Aug 03, 2022 at 03:41:45PM +0200, bchalios@amazon.es wrote:
> >>>>> From: Babis Chalios <bchalios@amazon.es>
> >>>>>
> >>>>> VM generation ID exposes a GUID inside the VM which changes every
> >>>>> time a
> >>>>> VM restore is happening. Typically, this GUID is used by the guest
> >>>>> kernel to re-seed its internal PRNG. As a result, this value
> >>>>> cannot be
> >>>>> exposed in guest user-space as a notification mechanism for VM
> >>>>> restore
> >>>>> events.
> >>>>>
> >>>>> This patch set extends vmgenid to introduce a 32 bits generation
> >>>>> counter
> >>>>> whose purpose is to be used as a VM restore notification mechanism
> >>>>> for
> >>>>> the guest user-space.
> >>>>>
> >>>>> It is true that such a counter could be implemented entirely by the
> >>>>> guest kernel, but this would rely on the vmgenid ACPI notification to
> >>>>> trigger the counter update, which is inherently racy. Exposing this
> >>>>> through the monitor allows the updated value to be in-place before
> >>>>> resuming the vcpus, so interested user-space code can (atomically)
> >>>>> observe the update without relying on the ACPI notification.
> >>>> The VM generation ID feature in QEMU is implementing a spec defined
> >>>> by Microsoft. It is implemented in HyperV, VMWare, QEMU and possibly
> >>>> more. This series is proposing a QEMU specific variant, which means
> >>>> Linux running on all these other hypervisor platforms won't benefit
> >>>> from the change. If the counter were provided entirely in the guest
> >>>> kernel, then it works across all hypervisors.
> >>>>
> >>>> It feels like the kernel ought to provide an implementation itself
> >>>> as a starting point, with this QEMU change merely being an optional
> >>>> enhancement to close the race window.
> >>>>
> >>>> Ideally there would be someone at Microsoft we could connect with to
> >>>> propose they include this feature in a VM Gen ID spec update, but I
> >>>> don't personally know who to contact about that kind of thing. A
> >>>> spec update would increase chances that this change gets provieded
> >>>> across all hypervisors.
> >>> You are right, this *is* out-of-spec. The approach here is based on
> >>> various
> >>> discussions happened last year when we first tried to upstream and more
> >>> recently when vmgenid landed in Linux. I find that this summary:
> >>> https://lkml.org/lkml/2022/3/1/693 quite to the point. (CCing Jason to
> >>> have his take on the matter).
> >>>
> >>> This series comes together with a Linux counterpart:
> >>> https://lkml.org/lkml/2022/8/3/563, where the generation counter is
> >>> exposed to user-space as a misc device. There, I tried to make the
> >>> generation counter "optional", in the sense that if it is not there, the
> >>> ACPI device should not fail, exactly because, for the moment, this is
> >>> not in the spec and hypervisors might not want to implement it.
> >>>
> >>> However, I think that changing the spec will take time and this is a
> >>> real issue affecting real use-cases, so we should start from somewhere.
> >> I know a spec change can take time, but has there even been any effort
> >> at all to try to start that action since first discussed a year ago ?
> >
> > These patch-sets are out exactly for starting the conversation on adding
> > this to the spec. As you mentioned, it would be great if we could get the
> > opinion of someone at Microsoft on this.
> >
> >>
> >> If these race condition issues are supposedly so serious that we need
> >> to do this without waiting for a spec, then what is the answer for the
> >> masses of users running Linux on VMware or HyperV/Azure ?
> >
> > The problem arises when you start snapshotting and restoring on VMs,
> > so not everyone is affected from the issue. Use-cases interested in this
> > are ones that manage fleets of VMs that run code that relies on
> > user/kernel-space PRNGs or network-facing services using UUIDs, for
> > example.
> >
> 
> I am CCing Michael from Microsoft. Maybe he has some input on this.

FWIW, Microsoft created the vmgenid concept and spec 10 years ago for
Windows running virtualized, with virtualized Active Directory Domain
Controllers being the primary use case.  I doubt there's anyone on the
Windows side here at Microsoft who has any further interest in the topic.
I do know there haven't been any requests to the Hyper-V team to
further enhance the vmgenid functionality.

As such, I think the Linux community is free to extend the functionality 
without conflicting with something Microsoft is doing.  I don't personally
have domain knowledge on these technical issues, so I'm acting more as
a Linux advocate with the Hyper-V team rather than a domain expert from
Microsoft.  I can circulate the Linux proposals with the Hyper-V team and
to see if they have any objections.  If there is agreement on extensions to
the hypervisor/VMM interface, I can ask the Hyper-V team to implement
them in support of Linux guest requirements.

I can also try to update the original spec with the extensions so that the
spec is captured in one place, though the internal logistics of updating an
old document like the vmgenid spec can be surprisingly difficult.  At worst,
there might need to be a separate spec for the Linux-driven extensions
that is hosted elsewhere.

Michael


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-08-07 23:05 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-03 13:41 [PATCH 0/2] vmgenid: add generation counter bchalios
2022-08-03 13:41 ` [PATCH 1/2] vmgenid: make device data size configurable bchalios
2022-08-03 13:41 ` [PATCH 2/2] vmgenid: add generation counter bchalios
2022-08-03 15:36 ` [PATCH 0/2] " Michael S. Tsirkin
2022-08-03 16:17   ` bchalios
2022-08-03 16:26 ` Daniel P. Berrangé
2022-08-04  9:54   ` Chalios, Babis
2022-08-04 10:02     ` Daniel P. Berrangé
2022-08-04 10:17       ` Chalios, Babis
2022-08-04 13:31         ` Chalios, Babis
2022-08-07 15:39           ` Michael Kelley (LINUX)
2022-08-04 15:01 ` Jason A. Donenfeld

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.