* [PATCH RFC v1 0/2] VM fork detection for RNG
@ 2022-02-23 13:12 ` Jason A. Donenfeld
  0 siblings, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-02-23 13:12 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, qemu-devel, kvm, linux-s390, adrian
  Cc: Jason A. Donenfeld, dwmw, acatan, graf, colmmacc, sblbir,
	raduweis, jannh, gregkh, tytso

This small series picks up work from Amazon that seems to have stalled
out last year around this time: listening for the vmgenid ACPI
notification, and using it to "do something." Last year, that something
involved a complicated userspace mmap chardev, which seems fraught with
difficulty. This year, I have something much simpler in mind: simply
using those ACPI notifications to tell the RNG to reinitialize safely,
so we don't repeat random numbers in cloned, forked, or rolled-back VM
instances.

This series consists of two patches. The first is a rather
straightforward addition to random.c, which I feel fine about. The
second patch is the reason this is just an RFC: it's a cleanup of the
ACPI driver from last year, and I don't really have much experience
writing, testing, debugging, or maintaining these types of drivers.
Ideally this thread would yield somebody saying, "I see the intent of
this; I'm happy to take over ownership of this part." That way, I can
focus on the RNG part, and whoever steps up for the paravirt ACPI part
can focus on that.

As a final note, this series intentionally does _not_ focus on
notification of these events to userspace or to other kernel consumers.
Since these VM fork detection events first need to hit the RNG, we can
later talk about what sorts of notifications or mmap'd counters the RNG
should make accessible elsewhere. But that's a different sort of
project and ties into a lot of more complicated concerns beyond this
more basic patchset. So hopefully we can keep the discussion here
focused on this ACPI business.

Cc: dwmw@amazon.co.uk
Cc: acatan@amazon.com
Cc: graf@amazon.com
Cc: colmmacc@amazon.com
Cc: sblbir@amazon.com
Cc: raduweis@amazon.com
Cc: jannh@google.com
Cc: gregkh@linuxfoundation.org
Cc: tytso@mit.edu

Jason A. Donenfeld (2):
  random: add mechanism for VM forks to reinitialize crng
  drivers/virt: add vmgenid driver for reinitializing RNG

 drivers/char/random.c  |  58 ++++++++++++++++++
 drivers/virt/Kconfig   |   8 +++
 drivers/virt/Makefile  |   1 +
 drivers/virt/vmgenid.c | 133 +++++++++++++++++++++++++++++++++++++++++
 include/linux/random.h |   1 +
 5 files changed, 201 insertions(+)
 create mode 100644 drivers/virt/vmgenid.c

-- 
2.35.1


^ permalink raw reply	[flat|nested] 35+ messages in thread


* [PATCH RFC v1 1/2] random: add mechanism for VM forks to reinitialize crng
  2022-02-23 13:12 ` Jason A. Donenfeld
@ 2022-02-23 13:12   ` Jason A. Donenfeld
  -1 siblings, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-02-23 13:12 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, qemu-devel, kvm, linux-s390, adrian
  Cc: Jason A. Donenfeld, dwmw, acatan, graf, colmmacc, sblbir,
	raduweis, jannh, gregkh, tytso

When a VM forks, we must immediately mix in additional information to
the stream of random output so that two forks or a rollback don't
produce the same stream of random numbers, which could have catastrophic
cryptographic consequences. This commit adds a simple API,
add_vmfork_randomness(), for that.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/char/random.c  | 58 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/random.h |  1 +
 2 files changed, 59 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 536237a0f073..29d6ce484d15 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -344,6 +344,46 @@ static void crng_reseed(void)
 	}
 }
 
+/*
+ * This mixes unique_vm_id directly into the base_crng key as soon as
+ * possible, similarly to crng_pre_init_inject(), even if the crng is
+ * already running, in order to immediately branch streams from prior
+ * VM instances.
+ */
+static void crng_vm_fork_inject(const void *unique_vm_id, size_t len)
+{
+	unsigned long flags, next_gen;
+	struct blake2s_state hash;
+
+	/*
+	 * Unlike crng_reseed(), we take the lock as early as possible,
+	 * since we don't want the RNG to be used until it's updated.
+	 */
+	spin_lock_irqsave(&base_crng.lock, flags);
+
+	/*
+	 * Also update the generation, while locked, as early as
+	 * possible. This will mean unlocked reads of the generation
+	 * will cause a reseeding of per-cpu crngs, and those will
+	 * spin on the base_crng lock waiting for the rest of this
+	 * operation to complete, which achieves the goal of blocking
+	 * the production of new output until this is done.
+	 */
+	next_gen = base_crng.generation + 1;
+	if (next_gen == ULONG_MAX)
+		++next_gen;
+	WRITE_ONCE(base_crng.generation, next_gen);
+	WRITE_ONCE(base_crng.birth, jiffies);
+
+	/* This is the same formulation used by crng_pre_init_inject(). */
+	blake2s_init(&hash, sizeof(base_crng.key));
+	blake2s_update(&hash, base_crng.key, sizeof(base_crng.key));
+	blake2s_update(&hash, unique_vm_id, len);
+	blake2s_final(&hash, base_crng.key);
+
+	spin_unlock_irqrestore(&base_crng.lock, flags);
+}
+
 /*
  * This generates a ChaCha block using the provided key, and then
 * immediately overwrites that key with half the block. It returns
@@ -935,6 +975,7 @@ static bool drain_entropy(void *buf, size_t nbytes)
  *	void add_hwgenerator_randomness(const void *buffer, size_t count,
  *					size_t entropy);
  *	void add_bootloader_randomness(const void *buf, size_t size);
+ *	void add_vmfork_randomness(const void *unique_vm_id, size_t size);
  *	void add_interrupt_randomness(int irq);
  *
  * add_device_randomness() adds data to the input pool that
@@ -966,6 +1007,11 @@ static bool drain_entropy(void *buf, size_t nbytes)
  * add_device_randomness(), depending on whether or not the configuration
  * option CONFIG_RANDOM_TRUST_BOOTLOADER is set.
  *
+ * add_vmfork_randomness() adds a unique (but not necessarily secret) ID
+ * representing the current instance of a VM to the pool, without crediting,
+ * and then immediately mixes that ID into the current base_crng key, so
+ * that it takes effect prior to a reseeding.
+ *
  * add_interrupt_randomness() uses the interrupt timing as random
  * inputs to the entropy pool. Using the cycle counters and the irq source
  * as inputs, it feeds the input pool roughly once a second or after 64
@@ -1195,6 +1241,18 @@ void add_bootloader_randomness(const void *buf, size_t size)
 }
 EXPORT_SYMBOL_GPL(add_bootloader_randomness);
 
+/*
+ * Handle a new unique VM ID, which is not secret, so we
+ * don't credit it, but we do mix it into the entropy pool and
+ * inject it into the crng.
+ */
+void add_vmfork_randomness(const void *unique_vm_id, size_t size)
+{
+	add_device_randomness(unique_vm_id, size);
+	crng_vm_fork_inject(unique_vm_id, size);
+}
+EXPORT_SYMBOL_GPL(add_vmfork_randomness);
+
 struct fast_pool {
 	union {
 		u32 pool32[4];
diff --git a/include/linux/random.h b/include/linux/random.h
index 6148b8d1ccf3..51b8ed797732 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -34,6 +34,7 @@ extern void add_input_randomness(unsigned int type, unsigned int code,
 extern void add_interrupt_randomness(int irq) __latent_entropy;
 extern void add_hwgenerator_randomness(const void *buffer, size_t count,
 				       size_t entropy);
+extern void add_vmfork_randomness(const void *unique_vm_id, size_t size);
 
 extern void get_random_bytes(void *buf, size_t nbytes);
 extern int wait_for_random_bytes(void);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread


* [PATCH RFC v1 2/2] drivers/virt: add vmgenid driver for reinitializing RNG
  2022-02-23 13:12 ` Jason A. Donenfeld
@ 2022-02-23 13:12   ` Jason A. Donenfeld
  -1 siblings, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-02-23 13:12 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, qemu-devel, kvm, linux-s390, adrian
  Cc: Jason A. Donenfeld, dwmw, acatan, graf, colmmacc, sblbir,
	raduweis, jannh, gregkh, tytso

VM Generation ID is a feature from Microsoft, described at
<https://go.microsoft.com/fwlink/?LinkId=260709>, and supported by
Hyper-V and QEMU. Its usage is described in Microsoft's RNG whitepaper,
<https://aka.ms/win10rng>, as:

    If the OS is running in a VM, there is a problem that most
    hypervisors can snapshot the state of the machine and later rewind
    the VM state to the saved state. This results in the machine running
    a second time with the exact same RNG state, which leads to serious
    security problems.  To reduce the window of vulnerability, Windows
    10 on a Hyper-V VM will detect when the VM state is reset, retrieve
    a unique (not random) value from the hypervisor, and reseed the root
    RNG with that unique value.  This does not eliminate the
    vulnerability, but it greatly reduces the time during which the RNG
    system will produce the same outputs as it did during a previous
    instantiation of the same VM state.

Linux has the same issue, and given that vmgenid is supported already by
multiple hypervisors, we can implement more or less the same solution.
So this commit wires up the vmgenid ACPI notification to the RNG's newly
added add_vmfork_randomness() function.

This driver builds on prior work from Adrian Catangiu at Amazon, and it
is my hope that the team there can resume maintenance of this driver.

Cc: Adrian Catangiu <adrian@parity.io>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/virt/Kconfig   |   8 +++
 drivers/virt/Makefile  |   1 +
 drivers/virt/vmgenid.c | 133 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 142 insertions(+)
 create mode 100644 drivers/virt/vmgenid.c

diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index 8061e8ef449f..e7f9c1bca02b 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -13,6 +13,14 @@ menuconfig VIRT_DRIVERS
 
 if VIRT_DRIVERS
 
+config VMGENID
+	bool "Virtual Machine Generation ID driver"
+	default y
+	depends on ACPI
+	help
+	  Say Y here to use the hypervisor provided Virtual Machine Generation ID
+	  to reseed the RNG when the VM is cloned.
+
 config FSL_HV_MANAGER
 	tristate "Freescale hypervisor management driver"
 	depends on FSL_SOC
diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index 3e272ea60cd9..108d0ffcc9aa 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -4,6 +4,7 @@
 #
 
 obj-$(CONFIG_FSL_HV_MANAGER)	+= fsl_hypervisor.o
+obj-$(CONFIG_VMGENID)		+= vmgenid.o
 obj-y				+= vboxguest/
 
 obj-$(CONFIG_NITRO_ENCLAVES)	+= nitro_enclaves/
diff --git a/drivers/virt/vmgenid.c b/drivers/virt/vmgenid.c
new file mode 100644
index 000000000000..11ed7f668bb1
--- /dev/null
+++ b/drivers/virt/vmgenid.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Virtual Machine Generation ID driver
+ *
+ * Copyright (C) 2022 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ * Copyright (C) 2020 Amazon. All rights reserved.
+ * Copyright (C) 2018 Red Hat Inc. All rights reserved.
+ */
+
+#define pr_fmt(fmt) "vmgenid: " fmt
+
+#include <linux/acpi.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/random.h>
+#include <linux/uuid.h>
+
+#define DEV_NAME "vmgenid"
+ACPI_MODULE_NAME(DEV_NAME);
+
+static struct {
+	uuid_t uuid;
+	void *uuid_iomap;
+} vmgenid_data;
+
+static int vmgenid_acpi_map(acpi_handle handle)
+{
+	phys_addr_t phys_addr = 0;
+	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
+	acpi_status status;
+	union acpi_object *pss;
+	union acpi_object *element;
+	int i;
+
+	status = acpi_evaluate_object(handle, "ADDR", NULL, &buffer);
+	if (ACPI_FAILURE(status)) {
+		ACPI_EXCEPTION((AE_INFO, status, "Evaluating ADDR"));
+		return -ENODEV;
+	}
+	pss = buffer.pointer;
+	if (!pss || pss->type != ACPI_TYPE_PACKAGE || pss->package.count != 2)
+		return -EINVAL;
+
+	for (i = 0; i < pss->package.count; ++i) {
+		element = &pss->package.elements[i];
+		if (element->type != ACPI_TYPE_INTEGER)
+			return -EINVAL;
+		phys_addr |= element->integer.value << i * 32;
+	}
+
+	vmgenid_data.uuid_iomap = acpi_os_map_memory(phys_addr, sizeof(vmgenid_data.uuid));
+	if (!vmgenid_data.uuid_iomap) {
+		pr_err("failed to map memory at %pa, size %zu\n",
+			&phys_addr, sizeof(vmgenid_data.uuid));
+		return -ENOMEM;
+	}
+
+	memcpy_fromio(&vmgenid_data.uuid, vmgenid_data.uuid_iomap, sizeof(vmgenid_data.uuid));
+
+	return 0;
+}
+
+static int vmgenid_acpi_add(struct acpi_device *device)
+{
+	int ret;
+
+	if (!device)
+		return -EINVAL;
+	ret = vmgenid_acpi_map(device->handle);
+	if (ret < 0) {
+		pr_err("failed to map acpi device: %d\n", ret);
+		return ret;
+	}
+	device->driver_data = &vmgenid_data;
+	add_device_randomness(&vmgenid_data.uuid, sizeof(vmgenid_data.uuid));
+	return 0;
+}
+
+static int vmgenid_acpi_remove(struct acpi_device *device)
+{
+	if (!device || acpi_driver_data(device) != &vmgenid_data)
+		return -EINVAL;
+	device->driver_data = NULL;
+	if (vmgenid_data.uuid_iomap)
+		acpi_os_unmap_memory(vmgenid_data.uuid_iomap, sizeof(vmgenid_data.uuid));
+	vmgenid_data.uuid_iomap = NULL;
+	return 0;
+}
+
+static void vmgenid_acpi_notify(struct acpi_device *device, u32 event)
+{
+	uuid_t old_uuid = vmgenid_data.uuid;
+
+	if (!device || acpi_driver_data(device) != &vmgenid_data)
+		return;
+	memcpy_fromio(&vmgenid_data.uuid, vmgenid_data.uuid_iomap, sizeof(vmgenid_data.uuid));
+	if (!memcmp(&old_uuid, &vmgenid_data.uuid, sizeof(vmgenid_data.uuid)))
+		return;
+	add_vmfork_randomness(&vmgenid_data.uuid, sizeof(vmgenid_data.uuid));
+}
+
+static const struct acpi_device_id vmgenid_ids[] = {
+	{"VMGENID", 0},
+	{"QEMUVGID", 0},
+	{"", 0},
+};
+
+static struct acpi_driver acpi_vmgenid_driver = {
+	.name = "vm_generation_id",
+	.ids = vmgenid_ids,
+	.owner = THIS_MODULE,
+	.ops = {
+		.add = vmgenid_acpi_add,
+		.remove = vmgenid_acpi_remove,
+		.notify = vmgenid_acpi_notify,
+	}
+};
+
+static int __init vmgenid_init(void)
+{
+	return acpi_bus_register_driver(&acpi_vmgenid_driver);
+}
+
+static void __exit vmgenid_exit(void)
+{
+	acpi_bus_unregister_driver(&acpi_vmgenid_driver);
+}
+
+module_init(vmgenid_init);
+module_exit(vmgenid_exit);
+
+MODULE_DESCRIPTION("Virtual Machine Generation ID");
+MODULE_LICENSE("GPL");
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread


* Re: [PATCH RFC v1 0/2] VM fork detection for RNG
  2022-02-23 13:12 ` Jason A. Donenfeld
@ 2022-02-23 16:08   ` Jason A. Donenfeld
  -1 siblings, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-02-23 16:08 UTC (permalink / raw)
  To: LKML, Linux Crypto Mailing List, QEMU Developers, KVM list,
	linux-s390, adrian
  Cc: Woodhouse, David, Catangiu, Adrian Costin, graf,
	Colm MacCarthaigh, Singh, Balbir, Weiss, Radu, Jann Horn,
	Greg Kroah-Hartman, Theodore Ts'o, Igor Mammedov, ehabkost,
	ben, Michael S. Tsirkin, lersek

On Wed, Feb 23, 2022 at 2:12 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> second patch is the reason this is just an RFC: it's a cleanup of the
> ACPI driver from last year, and I don't really have much experience
> writing, testing, debugging, or maintaining these types of drivers.
> Ideally this thread would yield somebody saying, "I see the intent of
> this; I'm happy to take over ownership of this part." That way, I can
> focus on the RNG part, and whoever steps up for the paravirt ACPI part
> can focus on that.

I actually managed to test this in QEMU, and it seems to work quite well. Steps:

$ qemu-system-x86_64 ... -device vmgenid,guid=auto -monitor stdio
(qemu) savevm blah
(qemu) quit
$ qemu-system-x86_64 ... -device vmgenid,guid=auto -monitor stdio
(qemu) loadvm blah

Doing this successfully triggers the function to reinitialize the RNG
with the new GUID. (It appears there's a bug in QEMU which prevents
the GUID from being reinitialized when running `loadvm` without
quitting first; I suppose this should be discussed with QEMU
upstream.)

So that's very positive. But I would appreciate hearing from some
ACPI/Virt/Amazon people about this.

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread


* Re: [PATCH RFC v1 0/2] VM fork detection for RNG
  2022-02-23 16:08   ` Jason A. Donenfeld
@ 2022-02-23 16:19     ` Jason A. Donenfeld
  -1 siblings, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-02-23 16:19 UTC (permalink / raw)
  To: LKML, Linux Crypto Mailing List, QEMU Developers, KVM list,
	linux-s390, adrian
  Cc: Woodhouse, David, Catangiu, Adrian Costin, graf,
	Colm MacCarthaigh, Singh, Balbir, Weiss, Radu, Jann Horn,
	Greg Kroah-Hartman, Theodore Ts'o, Igor Mammedov, ehabkost,
	ben, Michael S. Tsirkin, lersek

On Wed, Feb 23, 2022 at 5:08 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> On Wed, Feb 23, 2022 at 2:12 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > second patch is the reason this is just an RFC: it's a cleanup of the
> > ACPI driver from last year, and I don't really have much experience
> > writing, testing, debugging, or maintaining these types of drivers.
> > Ideally this thread would yield somebody saying, "I see the intent of
> > this; I'm happy to take over ownership of this part." That way, I can
> > focus on the RNG part, and whoever steps up for the paravirt ACPI part
> > can focus on that.
>
> I actually managed to test this in QEMU, and it seems to work quite well. Steps:
>
> $ qemu-system-x86_64 ... -device vmgenid,guid=auto -monitor stdio
> (qemu) savevm blah
> (qemu) quit
> $ qemu-system-x86_64 ... -device vmgenid,guid=auto -monitor stdio
> (qemu) loadvm blah
>
> Doing this successfully triggers the function to reinitialize the RNG
> with the new GUID. (It appears there's a bug in QEMU which prevents
> the GUID from being reinitialized when running `loadvm` without
> quitting first; I suppose this should be discussed with QEMU
> upstream.)
>
> So that's very positive. But I would appreciate hearing from some
> ACPI/Virt/Amazon people about this.

Because something something picture thousand words something, here's a
gif to see this working as expected:
https://data.zx2c4.com/vmgenid-appears-to-work.gif

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC v1 2/2] drivers/virt: add vmgenid driver for reinitializing RNG
  2022-02-23 13:12   ` Jason A. Donenfeld
@ 2022-02-23 16:36     ` Jason A. Donenfeld
  -1 siblings, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-02-23 16:36 UTC (permalink / raw)
  To: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, linux-hyperv
  Cc: Woodhouse, David, Catangiu, Adrian Costin, graf,
	Colm MacCarthaigh, Singh, Balbir, Weiss, Radu, Jann Horn,
	Greg Kroah-Hartman, Theodore Ts'o, Igor Mammedov, ehabkost,
	KVM list, adrian, ben, Michael S. Tsirkin,
	Linux Crypto Mailing List, lersek, linux-s390, LKML,
	QEMU Developers

Adding the Hyper-V people to this:

On Wed, Feb 23, 2022 at 2:13 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> VM Generation ID is a feature from Microsoft, described at
> <https://go.microsoft.com/fwlink/?LinkId=260709>, and supported by
> Hyper-V and QEMU. Its usage is described in Microsoft's RNG whitepaper,
> <https://aka.ms/win10rng>, as:
>
>     If the OS is running in a VM, there is a problem that most
>     hypervisors can snapshot the state of the machine and later rewind
>     the VM state to the saved state. This results in the machine running
>     a second time with the exact same RNG state, which leads to serious
>     security problems.  To reduce the window of vulnerability, Windows
>     10 on a Hyper-V VM will detect when the VM state is reset, retrieve
>     a unique (not random) value from the hypervisor, and reseed the root
>     RNG with that unique value.  This does not eliminate the
>     vulnerability, but it greatly reduces the time during which the RNG
>     system will produce the same outputs as it did during a previous
>     instantiation of the same VM state.
>
> Linux has the same issue, and given that vmgenid is supported already by
> multiple hypervisors, we can implement more or less the same solution.
> So this commit wires up the vmgenid ACPI notification to the RNG's newly
> added add_vmfork_randomness() function.
>
> This driver builds on prior work from Adrian Catangiu at Amazon, and it
> is my hope that that team can resume maintenance of this driver.

If any of you have some experience with the Hyper-V side of this
protocol, could you take a look at this and see if it matches the way
this is supposed to work? It appears to work fine with QEMU's
behavior, at least, but I know Hyper-V has a lot of additional
complexities.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC v1 1/2] random: add mechanism for VM forks to reinitialize crng
  2022-02-23 13:12   ` Jason A. Donenfeld
@ 2022-02-23 23:16     ` Eric Biggers
  -1 siblings, 0 replies; 35+ messages in thread
From: Eric Biggers @ 2022-02-23 23:16 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, linux-crypto, qemu-devel, kvm, linux-s390, adrian,
	dwmw, acatan, graf, colmmacc, sblbir, raduweis, jannh, gregkh,
	tytso

On Wed, Feb 23, 2022 at 02:12:30PM +0100, Jason A. Donenfeld wrote:
> When a VM forks, we must immediately mix in additional information to
> the stream of random output so that two forks or a rollback don't
> produce the same stream of random numbers, which could have catastrophic
> cryptographic consequences. This commit adds a simple API, add_vmfork_
> randomness(), for that.
> 
> Cc: Theodore Ts'o <tytso@mit.edu>
> Cc: Jann Horn <jannh@google.com>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  drivers/char/random.c  | 58 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/random.h |  1 +
>  2 files changed, 59 insertions(+)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 536237a0f073..29d6ce484d15 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -344,6 +344,46 @@ static void crng_reseed(void)
>  	}
>  }
>  
> +/*
> + * This mixes unique_vm_id directly into the base_crng key as soon as
> + * possible, similarly to crng_pre_init_inject(), even if the crng is
> + * already running, in order to immediately branch streams from prior
> + * VM instances.
> + */
> +static void crng_vm_fork_inject(const void *unique_vm_id, size_t len)
> +{
> +	unsigned long flags, next_gen;
> +	struct blake2s_state hash;
> +
> +	/*
> +	 * Unlike crng_reseed(), we take the lock as early as possible,
> +	 * since we don't want the RNG to be used until it's updated.
> +	 */
> +	spin_lock_irqsave(&base_crng.lock, flags);
> +
> +	/*
> +	 * Also update the generation, while locked, as early as
> +	 * possible. This will mean unlocked reads of the generation
> +	 * will cause a reseeding of per-cpu crngs, and those will
> +	 * spin on the base_crng lock waiting for the rest of this
> +	 * operation to complete, which achieves the goal of blocking
> +	 * the production of new output until this is done.
> +	 */
> +	next_gen = base_crng.generation + 1;
> +	if (next_gen == ULONG_MAX)
> +		++next_gen;
> +	WRITE_ONCE(base_crng.generation, next_gen);
> +	WRITE_ONCE(base_crng.birth, jiffies);
> +
> +	/* This is the same formulation used by crng_pre_init_inject(). */
> +	blake2s_init(&hash, sizeof(base_crng.key));
> +	blake2s_update(&hash, base_crng.key, sizeof(base_crng.key));
> +	blake2s_update(&hash, unique_vm_id, len);
> +	blake2s_final(&hash, base_crng.key);
> +
> +	spin_unlock_irqrestore(&base_crng.lock, flags);
> +}
[...]
> +/*
> + * Handle a new unique VM ID, which is unique, not secret, so we
> + * don't credit it, but we do mix it into the entropy pool and
> + * inject it into the crng.
> + */
> +void add_vmfork_randomness(const void *unique_vm_id, size_t size)
> +{
> +	add_device_randomness(unique_vm_id, size);
> +	crng_vm_fork_inject(unique_vm_id, size);
> +}
> +EXPORT_SYMBOL_GPL(add_vmfork_randomness);

I think we should be removing cases where the base_crng key is changed directly
besides extraction from the input_pool, not adding new ones.  Why not implement
this as add_device_randomness() followed by crng_reseed(force=true), where the
'force' argument forces a reseed to occur even if the entropy_count is too low?

- Eric

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC v1 1/2] random: add mechanism for VM forks to reinitialize crng
  2022-02-23 23:16     ` Eric Biggers
@ 2022-02-24  0:54       ` Jason A. Donenfeld
  -1 siblings, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-02-24  0:54 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-s390, tytso, kvm, adrian, jannh, gregkh, raduweis,
	qemu-devel, linux-kernel, acatan, graf, linux-crypto, colmmacc,
	sblbir, dwmw

On 2/24/22, Eric Biggers <ebiggers@kernel.org> wrote:
> I think we should be removing cases where the base_crng key is changed
> directly
> besides extraction from the input_pool, not adding new ones.  Why not
> implement
> this as add_device_randomness() followed by crng_reseed(force=true), where
> the
> 'force' argument forces a reseed to occur even if the entropy_count is too
> low?

Because that induces a "premature next" condition which can let that
entropy, potentially newly acquired by a storm of IRQs at power-on, be
bruteforced by unprivileged userspace. I actually had it exactly the
way you describe at first, but decided that this here is the lesser of
evils and doesn't really complicate things the way an intentional
premature next would. The only thing we care about here is branching
the crng stream, and so this does exactly that, without having to
interfere with how we collect entropy. Of course we *also* add it as
non-credited "device randomness" so that it's part of the next
reseeding, whenever that might occur.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC v1 1/2] random: add mechanism for VM forks to reinitialize crng
  2022-02-24  0:54       ` Jason A. Donenfeld
@ 2022-02-24  1:27         ` Eric Biggers
  -1 siblings, 0 replies; 35+ messages in thread
From: Eric Biggers @ 2022-02-24  1:27 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, linux-crypto, qemu-devel, kvm, linux-s390, adrian,
	dwmw, acatan, graf, colmmacc, sblbir, raduweis, jannh, gregkh,
	tytso

On Thu, Feb 24, 2022 at 01:54:54AM +0100, Jason A. Donenfeld wrote:
> On 2/24/22, Eric Biggers <ebiggers@kernel.org> wrote:
> > I think we should be removing cases where the base_crng key is changed
> > directly
> > besides extraction from the input_pool, not adding new ones.  Why not
> > implement
> > this as add_device_randomness() followed by crng_reseed(force=true), where
> > the
> > 'force' argument forces a reseed to occur even if the entropy_count is too
> > low?
> 
> Because that induces a "premature next" condition which can let that
> entropy, potentially newly acquired by a storm of IRQs at power-on, be
> bruteforced by unprivileged userspace. I actually had it exactly the
> way you describe at first, but decided that this here is the lesser of
> evils and doesn't really complicate things the way an intentional
> premature next would. The only thing we care about here is branching
> the crng stream, and so this does explicitly that, without having to
> interfere with how we collect entropy. Of course we *also* add it as
> non-credited "device randomness" so that it's part of the next
> reseeding, whenever that might occur.

Can you make sure to properly explain this in the code?

- Eric

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC v1 0/2] VM fork detection for RNG
  2022-02-23 16:08   ` Jason A. Donenfeld
  (?)
  (?)
@ 2022-02-24  8:22   ` Laszlo Ersek
  2022-02-24 10:43       ` Jason A. Donenfeld
  2022-02-24 10:55       ` Daniel P. Berrangé
  -1 siblings, 2 replies; 35+ messages in thread
From: Laszlo Ersek @ 2022-02-24  8:22 UTC (permalink / raw)
  To: Jason A. Donenfeld, LKML, Linux Crypto Mailing List,
	QEMU Developers, KVM list, linux-s390, adrian
  Cc: Theodore Ts'o, ehabkost, ben, Jann Horn, Greg Kroah-Hartman,
	Michael S. Tsirkin, Weiss, Radu, Daniel P. Berrangé,
	Richard W.M. Jones, Catangiu, Adrian Costin, graf, Igor Mammedov,
	Colm MacCarthaigh, Singh, Balbir, Woodhouse, David

(+Daniel, +Rich)

On 02/23/22 17:08, Jason A. Donenfeld wrote:
> On Wed, Feb 23, 2022 at 2:12 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>> second patch is the reason this is just an RFC: it's a cleanup of the
>> ACPI driver from last year, and I don't really have much experience
>> writing, testing, debugging, or maintaining these types of drivers.
>> Ideally this thread would yield somebody saying, "I see the intent of
>> this; I'm happy to take over ownership of this part." That way, I can
>> focus on the RNG part, and whoever steps up for the paravirt ACPI part
>> can focus on that.
> 
> I actually managed to test this in QEMU, and it seems to work quite well. Steps:
> 
> $ qemu-system-x86_64 ... -device vmgenid,guid=auto -monitor stdio
> (qemu) savevm blah
> (qemu) quit
> $ qemu-system-x86_64 ... -device vmgenid,guid=auto -monitor stdio
> (qemu) loadvm blah
> 
> Doing this successfully triggers the function to reinitialize the RNG
> with the new GUID.

QEMU's related design is documented in
<https://git.qemu.org/?p=qemu.git;a=blob;f=docs/specs/vmgenid.txt>.

The Microsoft specification referenced therein,
<http://go.microsoft.com/fwlink/?LinkId=260709>, contains the following
statement:

"they can also use the data provided in the 128-bit identifier as a high
entropy random data source"

So reinitializing an RNG from it is an express purpose.

More info in the libvirt docs (see "genid"):

https://libvirt.org/formatdomain.html#general-metadata

QEMU's interpretation of the VMGENID specifically as a UUID (which I
believe comes from me) has received (valid) criticism since:

https://github.com/libguestfs/virt-v2v/blob/master/docs/vm-generation-id-across-hypervisors.txt

(This document also investigates VMGENID on other hypervisors, which I
think pertains to your other message.)

> (It appears there's a bug in QEMU which prevents
> the GUID from being reinitialized when running `loadvm` without
> quitting first; I suppose this should be discussed with QEMU
> upstream.)

That's not (necessarily) a bug; see the end of the above-linked QEMU
document:

"There are no known use cases for changing the GUID once QEMU is
running, and adding this capability would greatly increase the complexity."

> 
> So that's very positive. But I would appreciate hearing from some
> ACPI/Virt/Amazon people about this.

I've only made some random comments; I didn't see a question so I
couldn't attempt to answer :)

Thanks
Laszlo



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC v1 0/2] VM fork detection for RNG
  2022-02-23 13:12 ` Jason A. Donenfeld
                   ` (3 preceding siblings ...)
  (?)
@ 2022-02-24  8:53 ` Alexander Graf
  2022-02-24 10:43     ` Daniel P. Berrangé
  2022-02-24 10:53     ` Jason A. Donenfeld
  -1 siblings, 2 replies; 35+ messages in thread
From: Alexander Graf @ 2022-02-24  8:53 UTC (permalink / raw)
  To: Jason A. Donenfeld, linux-kernel, linux-crypto, qemu-devel, kvm,
	linux-s390, adrian
  Cc: dwmw, acatan, colmmacc, sblbir, raduweis, jannh, gregkh, tytso

Hey Jason,

On 23.02.22 14:12, Jason A. Donenfeld wrote:
> This small series picks up work from Amazon that seems to have stalled
> out last year around this time: listening for the vmgenid ACPI
> notification, and using it to "do something." Last year, that something
> involved a complicated userspace mmap chardev, which seems fraught with
> difficulty. This year, I have something much simpler in mind: simply
> using those ACPI notifications to tell the RNG to reinitialize safely,
> so we don't repeat random numbers in cloned, forked, or rolled-back VM
> instances.
>
> This series consists of two patches. The first is a rather
> straightforward addition to random.c, which I feel fine about. The
> second patch is the reason this is just an RFC: it's a cleanup of the
> ACPI driver from last year, and I don't really have much experience
> writing, testing, debugging, or maintaining these types of drivers.
> Ideally this thread would yield somebody saying, "I see the intent of
> this; I'm happy to take over ownership of this part." That way, I can
> focus on the RNG part, and whoever steps up for the paravirt ACPI part
> can focus on that.
>
> As a final note, this series intentionally does _not_ focus on
> notification of these events to userspace or to other kernel consumers.
> Since these VM fork detection events first need to hit the RNG, we can
> later talk about what sorts of notifications or mmap'd counters the RNG
> should be making accessible to elsewhere. But that's a different sort of
> project and ties into a lot of more complicated concerns beyond this
> more basic patchset. So hopefully we can keep the discussion rather
> focused here to this ACPI business.


The main problem with VMGenID is that it is inherently racy. There will 
always be a (short) window of time during which the ACPI notification has 
not yet been processed, but the VM could already be using its RNG to, for 
example, establish TLS connections.

Hence, as the next step, we proposed a multi-stage quiesce/resume 
mechanism in which the system is aware that it is going into suspend - it 
can block network connections, for example - and only returns to a fully 
functional state after an unquiesce phase:

   https://github.com/systemd/systemd/issues/20222

Looking at the issue again, it seems like we completely forgot to follow 
up with a PR implementing that functionality :(.

What exact use case do you have in mind for the RNG/VMGenID update? Can 
you think of situations where the race is not an actual concern?


Alex


>
> Cc: dwmw@amazon.co.uk
> Cc: acatan@amazon.com
> Cc: graf@amazon.com
> Cc: colmmacc@amazon.com
> Cc: sblbir@amazon.com
> Cc: raduweis@amazon.com
> Cc: jannh@google.com
> Cc: gregkh@linuxfoundation.org
> Cc: tytso@mit.edu
>
> Jason A. Donenfeld (2):
>    random: add mechanism for VM forks to reinitialize crng
>    drivers/virt: add vmgenid driver for reinitializing RNG
>
>   drivers/char/random.c  |  58 ++++++++++++++++++
>   drivers/virt/Kconfig   |   8 +++
>   drivers/virt/Makefile  |   1 +
>   drivers/virt/vmgenid.c | 133 +++++++++++++++++++++++++++++++++++++++++
>   include/linux/random.h |   1 +
>   5 files changed, 201 insertions(+)
>   create mode 100644 drivers/virt/vmgenid.c
>
> --
> 2.35.1
>



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC v1 0/2] VM fork detection for RNG
  2022-02-24  8:53 ` Alexander Graf
@ 2022-02-24 10:43     ` Daniel P. Berrangé
  2022-02-24 10:53     ` Jason A. Donenfeld
  1 sibling, 0 replies; 35+ messages in thread
From: Daniel P. Berrangé @ 2022-02-24 10:43 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Jason A. Donenfeld, linux-kernel, linux-crypto, qemu-devel, kvm,
	linux-s390, adrian, dwmw, acatan, colmmacc, sblbir, raduweis,
	jannh, gregkh, tytso

On Thu, Feb 24, 2022 at 09:53:59AM +0100, Alexander Graf wrote:
> Hey Jason,
> 
> On 23.02.22 14:12, Jason A. Donenfeld wrote:
> > This small series picks up work from Amazon that seems to have stalled
> > out last year around this time: listening for the vmgenid ACPI
> > notification, and using it to "do something." Last year, that something
> > involved a complicated userspace mmap chardev, which seems fraught with
> > difficulty. This year, I have something much simpler in mind: simply
> > using those ACPI notifications to tell the RNG to reinitialize safely,
> > so we don't repeat random numbers in cloned, forked, or rolled-back VM
> > instances.
> > 
> > This series consists of two patches. The first is a rather
> > straightforward addition to random.c, which I feel fine about. The
> > second patch is the reason this is just an RFC: it's a cleanup of the
> > ACPI driver from last year, and I don't really have much experience
> > writing, testing, debugging, or maintaining these types of drivers.
> > Ideally this thread would yield somebody saying, "I see the intent of
> > this; I'm happy to take over ownership of this part." That way, I can
> > focus on the RNG part, and whoever steps up for the paravirt ACPI part
> > can focus on that.
> > 
> > As a final note, this series intentionally does _not_ focus on
> > notification of these events to userspace or to other kernel consumers.
> > Since these VM fork detection events first need to hit the RNG, we can
> > later talk about what sorts of notifications or mmap'd counters the RNG
> > should be making accessible to elsewhere. But that's a different sort of
> > project and ties into a lot of more complicated concerns beyond this
> > more basic patchset. So hopefully we can keep the discussion rather
> > focused here to this ACPI business.
> 
> 
> The main problem with VMGenID is that it is inherently racy. There will
> always be a (short) amount of time where the ACPI notification is not
> processed, but the VM could use its RNG to for example establish TLS
> connections.
> 
> Hence we as the next step proposed a multi-stage quiesce/resume mechanism
> where the system is aware that it is going into suspend - can block network
> connections for example - and only returns to a fully functional state after
> an unquiesce phase:
> 
>   https://github.com/systemd/systemd/issues/20222

The downside of course is precisely that the guest now needs to be aware
and involved every single time a snapshot is taken.

Currently with virt the act of taking a snapshot can often remain invisible
to the VM with no functional effect on the guest OS or its workload, and
the host OS knows it can complete a snapshot in a specific timeframe. That
said, this transparency to the VM is precisely the cause of the race
condition described.

With guest involvement to quiesce the bulk of activity for a period of
time, there is more likely to be a negative impact on the guest workload.
The guest admin likely needs to be more explicit about exactly when it is
reasonable to take a snapshot, to mitigate the impact.

The host OS snapshot operations are also now dependent on co-operation
of a guest OS that has to be considered potentially malicious, or
at least crashed/non-responsive. The guest OS also needs a way to receive
the triggers for snapshot capture and restore, most likely via an extension
to something like the QEMU guest agent or an equivalent for other
hypervisors.


Despite the above, I'm not against the idea of co-operative involvement
of the guest OS in the acts of taking & restoring snapshots. I can't
see any other proposals so far that can reliably eliminate the races
in the general case, from the kernel right up to user applications.
So I think it is necessary to have guest cooperative snapshotting.

> What exact use case do you have in mind for the RNG/VMGenID update? Can you
> think of situations where the race is not an actual concern?

Let's assume we do take the approach described in that systemd bug and
have a co-operative snapshot process. If the hypervisor does the right
thing and guest owners install the right things, they'll have a race-free
solution that works well in normal operation. That's good.


Realistically though, it is never going to be universally and reliably
put into practice. So what is our attitude to cases where the preferred
solution isn't available and/or operative?


There are going to be users who continue to build their guest disk images
without the QEMU guest agent (or equivalent for whatever hypervisor they
run on) installed because they don't know any better. Or where the guest
agent is mis-configured, fails to start, or some other scenario prevents
the quiesce working as desired. The host mgmt could refuse to take a
snapshot in these cases. More likely is that they are just going to go
ahead and do a snapshot anyway, because lack of a guest agent is a very
common scenario today and users want their snapshots.


There are going to be virt management apps / hypervisors that don't
support talking to any guest agent across their snapshot operation
in the first place, so systemd gets no way to trigger the required
quiesce dance on snapshot, but they likely have VMGenID support
implemented already.


IOW, I could view VMGenID-triggered fork detection integrated with
the kernel RNG as providing a backup line of defence that is going
to "just work", albeit with the known race. It isn't as good as the
guest co-operative snapshot approach, because it only tries to solve
the one specific targeted problem of updating the kernel RNG.

Is it still better than doing nothing at all though, for the scenario
where guest co-operative snapshot is unavailable?

If it is better than nothing, is it then compelling enough to justify
the maintenance cost of the code added to the kernel?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



* Re: [PATCH RFC v1 0/2] VM fork detection for RNG
  2022-02-24  8:22   ` Laszlo Ersek
@ 2022-02-24 10:43       ` Jason A. Donenfeld
  2022-02-24 10:55       ` Daniel P. Berrangé
  1 sibling, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-02-24 10:43 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: LKML, Linux Crypto Mailing List, QEMU Developers, KVM list,
	linux-s390, adrian, Woodhouse, David, Catangiu, Adrian Costin,
	graf, Colm MacCarthaigh, Singh, Balbir, Weiss, Radu, Jann Horn,
	Greg Kroah-Hartman, Theodore Ts'o, Igor Mammedov, ehabkost,
	ben, Michael S. Tsirkin, Daniel P. Berrangé,
	Richard W.M. Jones

Hi Laszlo,

Thanks for your reply.

On Thu, Feb 24, 2022 at 9:23 AM Laszlo Ersek <lersek@redhat.com> wrote:
> QEMU's related design is documented in
> <https://git.qemu.org/?p=qemu.git;a=blob;f=docs/specs/vmgenid.txt>.

I'll link to this document on the 2/2 patch next to the other ones.

> "they can also use the data provided in the 128-bit identifier as a high
> entropy random data source"
>
> So reinitializing an RNG from it is an express purpose.

It seems like this is indeed meant to be used for RNG purposes, but
the Windows 10 RNG document says: "Windows 10 on a Hyper-V VM will
detect when the VM state is reset, retrieve a unique (not random)
value from the hypervisor." I gather from that that it's not totally
clear what the "quality" of those 128 bits is. So this patchset mixes
them into the entropy pool, but does not credit them, which is
consistent with how the RNG deals with other data where the conclusion
is, "probably pretty good but maybe not," erring on the side of
caution. Either way, it's certainly being used -- and combined with
what was there before -- to reinitialize the RNG following a VM fork.

>
> More info in the libvirt docs (see "genid"):
>
> https://libvirt.org/formatdomain.html#general-metadata

Thanks, noted in the 2/2 patch too.

> QEMU's interpretation of the VMGENID specifically as a UUID (which I
> believe comes from me) has received (valid) criticism since:
>
> https://github.com/libguestfs/virt-v2v/blob/master/docs/vm-generation-id-across-hypervisors.txt
>
> (This document also investigates VMGENID on other hypervisors, which I
> think pertains to your other message.)

Thank you very much for this reference! You're absolutely right here.
v3 will treat this as just an opaque 128-bit binary blob. There's no
point, anyway, in treating it as a UUID in the kernel, since it never
should be printed or exposed to anywhere except random.c (and my gifs,
of course :-P).

>
> > (It appears there's a bug in QEMU which prevents
> > the GUID from being reinitialized when running `loadvm` without
> > quitting first; I suppose this should be discussed with QEMU
> > upstream.)
>
> That's not (necessarily) a bug; see the end of the above-linked QEMU
> document:
>
> "There are no known use cases for changing the GUID once QEMU is
> running, and adding this capability would greatly increase the complexity."

I read that, and I think I might disagree? If you're QEMUing with the
monitor and are jumping back and forth and all around between saved
snapshots, probably those snapshots should have their RNG
reinitialized through this mechanism, right? It seems like doing that
would be the proper behavior for `guid=auto`, but not for
`guid={some-fixed-thing}`.
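
For context, the two modes being contrasted are configured on the QEMU
command line roughly like this (a sketch; the disk image path is a
placeholder):

```shell
# guid=auto: QEMU generates a fresh 128-bit value at startup.
qemu-system-x86_64 -machine q35 -m 2G \
    -device vmgenid,guid=auto \
    -drive file=guest.img,format=qcow2    # placeholder image

# Or pin the generation ID to a fixed value across runs, e.g.:
#   -device vmgenid,guid=32fa7f03-727c-4b21-9099-edcbea011a49
```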

> > So that's very positive. But I would appreciate hearing from some
> > ACPI/Virt/Amazon people about this.
>
> I've only made some random comments; I didn't see a question so I
> couldn't attempt to answer :)

"Am I on the right track," I guess, and your reply has been very
informative. Thanks for your feedback. I'll have a v3 sent out not
before long.

Jason


* Re: [PATCH RFC v1 0/2] VM fork detection for RNG
  2022-02-24  8:53 ` Alexander Graf
@ 2022-02-24 10:53     ` Jason A. Donenfeld
  2022-02-24 10:53     ` Jason A. Donenfeld
  1 sibling, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-02-24 10:53 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linux-kernel, linux-crypto, qemu-devel, kvm, linux-s390, adrian,
	dwmw, acatan, colmmacc, sblbir, raduweis, jannh, gregkh, tytso

Hi Alex,

Strangely your message never made it to me, and I had to pull this out
of Lore after seeing Daniel's reply to it. I wonder what's up.

On Thu, Feb 24, 2022 at 09:53:59AM +0100, Alexander Graf wrote:
> The main problem with VMGenID is that it is inherently racy. There will 
> always be a (short) amount of time where the ACPI notification is not 
> processed, but the VM could use its RNG to for example establish TLS 
> connections.
> 
> Hence we as the next step proposed a multi-stage quiesce/resume 
> mechanism where the system is aware that it is going into suspend - can 
> block network connections for example - and only returns to a fully 
> functional state after an unquiesce phase:
> 
>    https://github.com/systemd/systemd/issues/20222
> 
> Looking at the issue again, it seems like we completely missed to follow 
> up with a PR to implement that functionality :(.
> 
> What exact use case do you have in mind for the RNG/VMGenID update? Can 
> you think of situations where the race is not an actual concern?

No, I think the race is something that remains a problem for the
situations I care about. There are simpler ways of fixing that -- just
expose a single incrementing integer so that it can be checked every
time the RNG does something, without being expensive, via the same
mechanism -- and then you don't need any complexity. But anyway, that
doesn't exist right now, so this series tries to implement something for
what does exist and is already supported by multiple hypervisors. I'd
suggest sending a proposal for an improved mechanism as part of a
different thread, and pull the various parties into that, and we can
make something good for the future. I'm happy to implement whatever the
virtual hardware exposes.

Jason



* Re: [PATCH RFC v1 0/2] VM fork detection for RNG
  2022-02-24  8:22   ` Laszlo Ersek
@ 2022-02-24 10:55       ` Daniel P. Berrangé
  2022-02-24 10:55       ` Daniel P. Berrangé
  1 sibling, 0 replies; 35+ messages in thread
From: Daniel P. Berrangé @ 2022-02-24 10:55 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Jason A. Donenfeld, LKML, Linux Crypto Mailing List,
	QEMU Developers, KVM list, linux-s390, adrian, Woodhouse, David,
	Catangiu, Adrian Costin, graf, Colm MacCarthaigh, Singh, Balbir,
	Weiss, Radu, Jann Horn, Greg Kroah-Hartman, Theodore Ts'o,
	Igor Mammedov, ehabkost, ben, Michael S. Tsirkin,
	Richard W.M. Jones

On Thu, Feb 24, 2022 at 09:22:50AM +0100, Laszlo Ersek wrote:
> (+Daniel, +Rich)
> 
> On 02/23/22 17:08, Jason A. Donenfeld wrote:
> > On Wed, Feb 23, 2022 at 2:12 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >> second patch is the reason this is just an RFC: it's a cleanup of the
> >> ACPI driver from last year, and I don't really have much experience
> >> writing, testing, debugging, or maintaining these types of drivers.
> >> Ideally this thread would yield somebody saying, "I see the intent of
> >> this; I'm happy to take over ownership of this part." That way, I can
> >> focus on the RNG part, and whoever steps up for the paravirt ACPI part
> >> can focus on that.
> 
> > (It appears there's a bug in QEMU which prevents
> > the GUID from being reinitialized when running `loadvm` without
> > quitting first; I suppose this should be discussed with QEMU
> > upstream.)
> 
> That's not (necessarily) a bug; see the end of the above-linked QEMU
> document:
> 
> "There are no known use cases for changing the GUID once QEMU is
> running, and adding this capability would greatly increase the complexity."

IIRC this part of the QEMU doc was making an implicit assumption
about the way QEMU is to be used by mgmt apps doing snapshots.

Instead of using the 'loadvm' command on the existing running QEMU
process, the doc seems to tacitly expect the management app will
throw away the existing QEMU process and spawn a brand new QEMU
process to load the snapshot into, thus getting the new GUID on
the QEMU command line. There are some downsides to doing this
as compared to running 'loadvm' in the existing QEMU, most
notably that the user's VNC/SPICE console session gets interrupted.
I guess the ease of implementation for QEMU was more compelling though.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH RFC v1 0/2] VM fork detection for RNG
  2022-02-24 10:55       ` Daniel P. Berrangé
@ 2022-02-24 10:57         ` Jason A. Donenfeld
  -1 siblings, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-02-24 10:57 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Laszlo Ersek, LKML, Linux Crypto Mailing List, QEMU Developers,
	KVM list, linux-s390, adrian, Woodhouse, David, Catangiu,
	Adrian Costin, graf, Colm MacCarthaigh, Singh, Balbir, Weiss,
	Radu, Jann Horn, Greg Kroah-Hartman, Theodore Ts'o,
	Igor Mammedov, ehabkost, ben, Michael S. Tsirkin,
	Richard W.M. Jones

On Thu, Feb 24, 2022 at 11:56 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> IIRC this part of the QEMU doc was making an implicit assumption
> about the way QEMU is to be used by mgmt apps doing snapshots.
>
> Instead of using the 'loadvm' command on the existing running QEMU
> process, the doc seems to tacitly expect the management app will
> throwaway the existing QEMU process and spawn a brand new QEMU
> process to load the snapshot into, thus getting the new GUID on
> the QEMU command line.

Right, exactly. The "there are no known use cases" statement, I think,
just overlooked one very common use case that perhaps wasn't in use
by the original author. So I'm pretty sure this remains a QEMU bug.

Jason



* Re: [PATCH RFC v1 1/2] random: add mechanism for VM forks to reinitialize crng
  2022-02-24  1:27         ` Eric Biggers
@ 2022-02-24 11:15           ` Jason A. Donenfeld
  -1 siblings, 0 replies; 35+ messages in thread
From: Jason A. Donenfeld @ 2022-02-24 11:15 UTC (permalink / raw)
  To: Eric Biggers
  Cc: LKML, Linux Crypto Mailing List, QEMU Developers, KVM list,
	linux-s390, adrian, Woodhouse, David, Catangiu, Adrian Costin,
	graf, Colm MacCarthaigh, Singh, Balbir, Weiss, Radu, Jann Horn,
	Greg Kroah-Hartman, Theodore Ts'o

Hey Eric,

On Thu, Feb 24, 2022 at 2:27 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Thu, Feb 24, 2022 at 01:54:54AM +0100, Jason A. Donenfeld wrote:
> > On 2/24/22, Eric Biggers <ebiggers@kernel.org> wrote:
> > > I think we should be removing cases where the base_crng key is changed
> > > directly
> > > besides extraction from the input_pool, not adding new ones.  Why not
> > > implement
> > > this as add_device_randomness() followed by crng_reseed(force=true), where
> > > the
> > > 'force' argument forces a reseed to occur even if the entropy_count is too
> > > low?
> >
> > Because that induces a "premature next" condition which can let that
> > entropy, potentially newly acquired by a storm of IRQs at power-on, be
> > bruteforced by unprivileged userspace. I actually had it exactly the
> > way you describe at first, but decided that this here is the lesser of
> > evils and doesn't really complicate things the way an intentional
> > premature next would. The only thing we care about here is branching
> > the crng stream, and so this does exactly that, without having to
> > interfere with how we collect entropy. Of course we *also* add it as
> > non-credited "device randomness" so that it's part of the next
> > reseeding, whenever that might occur.
>
> Can you make sure to properly explain this in the code?

The carousel keeps turning, and after I wrote to you last night I kept
thinking about the matter. Here's how it breaks down:

Injection method:
- Assumes existing pool of entropy is still sacred.
- Assumes base_crng timestamp is representative of pool age.
- Result: Mixes directly into base_crng to avoid premature-next of pool.

Input pool method:
- Assumes existing pool of entropy is old / out of date / used by a
different fork, so not sacred.
- Assumes base_crng timestamp is tied to irrelevant entropy pool.
- Result: Force-drains input pool, causing intentional premature-next.

Which of these assumptions better models the situation? I started in
the input pool method camp, then by the time I posted v1, was
concerned about power-on IRQs, but now I think relying at all on
snapshotted entropy represents the biggest issue. And judging from
your email, it appears that you do too. So v3 of this patchset will
switch back to the input pool method, per your suggestion.

As a plus, it means we go through the RDSEED'd extraction algorithm
too, which always helps.

Jason

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC v1 0/2] VM fork detection for RNG
  2022-02-24 10:43     ` Daniel P. Berrangé
  (?)
@ 2022-02-24 11:35     ` Alexander Graf
  -1 siblings, 0 replies; 35+ messages in thread
From: Alexander Graf @ 2022-02-24 11:35 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Jason A. Donenfeld, linux-kernel, linux-crypto, qemu-devel, kvm,
	linux-s390, adrian, dwmw, acatan, colmmacc, sblbir, raduweis,
	jannh, gregkh, tytso


On 24.02.22 11:43, Daniel P. Berrangé wrote:
> On Thu, Feb 24, 2022 at 09:53:59AM +0100, Alexander Graf wrote:
>> Hey Jason,
>>
>> On 23.02.22 14:12, Jason A. Donenfeld wrote:
>>> This small series picks up work from Amazon that seems to have stalled
>>> out later year around this time: listening for the vmgenid ACPI
>>> notification, and using it to "do something." Last year, that something
>>> involved a complicated userspace mmap chardev, which seems fraught with
>>> difficulty. This year, I have something much simpler in mind: simply
>>> using those ACPI notifications to tell the RNG to reinitialize safely,
>>> so we don't repeat random numbers in cloned, forked, or rolled-back VM
>>> instances.
>>>
>>> This series consists of two patches. The first is a rather
>>> straightforward addition to random.c, which I feel fine about. The
>>> second patch is the reason this is just an RFC: it's a cleanup of the
>>> ACPI driver from last year, and I don't really have much experience
>>> writing, testing, debugging, or maintaining these types of drivers.
>>> Ideally this thread would yield somebody saying, "I see the intent of
>>> this; I'm happy to take over ownership of this part." That way, I can
>>> focus on the RNG part, and whoever steps up for the paravirt ACPI part
>>> can focus on that.
>>>
>>> As a final note, this series intentionally does _not_ focus on
>>> notification of these events to userspace or to other kernel consumers.
>>> Since these VM fork detection events first need to hit the RNG, we can
>>> later talk about what sorts of notifications or mmap'd counters the RNG
>>> should be making accessible to elsewhere. But that's a different sort of
>>> project and ties into a lot of more complicated concerns beyond this
>>> more basic patchset. So hopefully we can keep the discussion rather
>>> focused here to this ACPI business.
>>
>> The main problem with VMGenID is that it is inherently racy. There will
>> always be a (short) window during which the ACPI notification has not
>> yet been processed, but the VM could already use its RNG to, for
>> example, establish TLS connections.
>>
>> Hence we as the next step proposed a multi-stage quiesce/resume mechanism
>> where the system is aware that it is going into suspend - can block network
>> connections for example - and only returns to a fully functional state after
>> an unquiesce phase:
>>
>>    https://github.com/systemd/systemd/issues/20222
> The downside of course is precisely that the guest now needs to be aware
> and involved every single time a snapshot is taken.
>
> Currently with virt the act of taking a snapshot can often remain invisible
> to the VM with no functional effect on the guest OS or its workload, and
> the host OS knows it can complete a snapshot in a specific timeframe. That
> said, this transparency to the VM is precisely the cause of the race
> condition described.
>
> With guest involvement to quiesce the bulk of activity for a time period,
> there is more likely to be a negative impact on the guest workload. The
> guest admin likely needs to be more explicit about exactly when in time
> it is reasonable to take a snapshot to mitigate the impact.
>
> The host OS snapshot operations are also now dependent on co-operation
> of a guest OS that has to be considered to be potentially malicious, or
> at least crashed/non-responsive. The guest OS also needs a way to receive
> the triggers for snapshot capture and restore, most likely via an extension
> to something like the QEMU guest agent or an equivalent for other
> hypervisors.


What you describe sounds almost exactly like pressing a power button on 
modern systems. You don't just kill the power line; you press a button 
and wait for the guest to acknowledge that it's ready.

Maybe the real answer to all of this is S3: Suspend to RAM. You press 
the suspend button, the guest can prepare for sleep (quiesce!) and the 
next time you run, it can check whether VMGenID changed and act accordingly.


> Despite the above, I'm not against the idea of co-operative involvement
> of the guest OS in the acts of taking & restoring snapshots. I can't
> see any other proposals so far that can reliably eliminate the races
> in the general case, from the kernel right up to user applications.
> So I think it is necessary to have guest cooperative snapshotting.
>
>> What exact use case do you have in mind for the RNG/VMGenID update? Can you
>> think of situations where the race is not an actual concern?
> Lets assume we do take the approach described in that systemd bug and
> have a co-operative snapshot process. If the hypervisor does the right
> thing and guest owners install the right things, they'll have a race
> free solution that works well in normal operation. That's good.
>
>
> Realistically though, it is never going to be universally and reliably
> put into practice. So what is our attitude to cases where the preferred
> solution isn't available and/or operative?
>
>
> There are going to be users who continue to build their guest disk images
> without the QEMU guest agent (or equivalent for whatever hypervisor they
> run on) installed because they don't know any better. Or where the guest
> agent is mis-configured or fails to starts or some other scenario that
> prevents the quiesce working as desired. The host mgmt could refuse to
> take a snapshot in these cases. More likely is that they are just
> going to go ahead and do a snapshot anyway because lack of guest agent
> is a very common scenario today and users want their snapshots.
>
>
> There are going to be virt management apps / hypervisors that don't
> support talking to any guest agent across their snapshot operation
> in the first place, so systemd gets no way to trigger the required
> quiesce dance on snapshot, but they likely have VMGenID support
> implemented already.
>
>
> IOW, I could view VMGenID triggered fork detection integrated with
> the kernel RNG as providing a backup line of defence that is going
> to "just work", albeit with the known race. It isn't as good as the
> guest co-operative snapshot approach, because it only tries to solve
> the one specific targeted problem of updating the kernel RNG.
>
> Is it still better than doing nothing at all though, for the scenario
> where guest co-operative snapshot is unavailable?
>
> If it is better than nothing, is it then compelling enough to justify
> the maintenance cost of the code added to the kernel?


I'm tempted to say "If it also exposes the VMGenID via sysfs so that you 
can actually check whether you were cloned, probably yes."


Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH RFC v1 0/2] VM fork detection for RNG
  2022-02-24 10:57         ` Jason A. Donenfeld
@ 2022-02-25 10:40           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2022-02-25 10:40 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Daniel P. Berrangé,
	Laszlo Ersek, LKML, Linux Crypto Mailing List, QEMU Developers,
	KVM list, linux-s390, adrian, Woodhouse, David, Catangiu,
	Adrian Costin, graf, Colm MacCarthaigh, Singh, Balbir, Weiss,
	Radu, Jann Horn, Greg Kroah-Hartman, Theodore Ts'o,
	Igor Mammedov, ehabkost, ben, Richard W.M. Jones

On Thu, Feb 24, 2022 at 11:57:34AM +0100, Jason A. Donenfeld wrote:
> On Thu, Feb 24, 2022 at 11:56 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > IIRC this part of the QEMU doc was making an implicit assumption
> > about the way QEMU is to be used by mgmt apps doing snapshots.
> >
> > Instead of using the 'loadvm' command on the existing running QEMU
> > process, the doc seems to tacitly expect the management app will
> > throw away the existing QEMU process and spawn a brand new QEMU
> > process to load the snapshot into, thus getting the new GUID on
> > the QEMU command line.
> 
> Right, exactly. The "there are no known use cases" bit I think just
> forgot about one very common use case that perhaps just wasn't in use
> by the original author. So I'm pretty sure this remains a QEMU bug.
> 
> Jason

Quite possibly yes.

-- 
MST


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2022-02-25 10:43 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-23 13:12 [PATCH RFC v1 0/2] VM fork detection for RNG Jason A. Donenfeld
2022-02-23 13:12 ` Jason A. Donenfeld
2022-02-23 13:12 ` [PATCH RFC v1 1/2] random: add mechanism for VM forks to reinitialize crng Jason A. Donenfeld
2022-02-23 13:12   ` Jason A. Donenfeld
2022-02-23 23:16   ` Eric Biggers
2022-02-23 23:16     ` Eric Biggers
2022-02-24  0:54     ` Jason A. Donenfeld
2022-02-24  0:54       ` Jason A. Donenfeld
2022-02-24  1:27       ` Eric Biggers
2022-02-24  1:27         ` Eric Biggers
2022-02-24 11:15         ` Jason A. Donenfeld
2022-02-24 11:15           ` Jason A. Donenfeld
2022-02-23 13:12 ` [PATCH RFC v1 2/2] drivers/virt: add vmgenid driver for reinitializing RNG Jason A. Donenfeld
2022-02-23 13:12   ` Jason A. Donenfeld
2022-02-23 16:36   ` Jason A. Donenfeld
2022-02-23 16:36     ` Jason A. Donenfeld
2022-02-23 16:08 ` [PATCH RFC v1 0/2] VM fork detection for RNG Jason A. Donenfeld
2022-02-23 16:08   ` Jason A. Donenfeld
2022-02-23 16:19   ` Jason A. Donenfeld
2022-02-23 16:19     ` Jason A. Donenfeld
2022-02-24  8:22   ` Laszlo Ersek
2022-02-24 10:43     ` Jason A. Donenfeld
2022-02-24 10:43       ` Jason A. Donenfeld
2022-02-24 10:55     ` Daniel P. Berrangé
2022-02-24 10:55       ` Daniel P. Berrangé
2022-02-24 10:57       ` Jason A. Donenfeld
2022-02-24 10:57         ` Jason A. Donenfeld
2022-02-25 10:40         ` Michael S. Tsirkin
2022-02-25 10:40           ` Michael S. Tsirkin
2022-02-24  8:53 ` Alexander Graf
2022-02-24 10:43   ` Daniel P. Berrangé
2022-02-24 10:43     ` Daniel P. Berrangé
2022-02-24 11:35     ` Alexander Graf
2022-02-24 10:53   ` Jason A. Donenfeld
2022-02-24 10:53     ` Jason A. Donenfeld

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.