* [RFC PATCH 00/13] Add support for Mirror VM.
@ 2021-08-16 13:25 Ashish Kalra
  2021-08-16 13:26 ` [RFC PATCH 01/13] machine: Add mirrorvcpus=N suboption to -smp Ashish Kalra
                   ` (15 more replies)
  0 siblings, 16 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 13:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

From: Ashish Kalra <ashish.kalra@amd.com>

This is an RFC series for Mirror VM support. Mirror VMs are
essentially secondary VMs that share the encryption context
(ASID) with a primary VM. The patch-set creates a new VM and
shares the primary VM's encryption context with it using the
KVM_CAP_VM_COPY_ENC_CONTEXT_FROM capability. The mirror VM uses
a separate pair of VM + vCPU file descriptors and also uses a
simplified KVM run loop; for example, it does not support any
interrupt vmexits. Currently the mirror VM shares the address
space of the primary VM.
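
At the KVM interface level, sharing the encryption context boils
down to creating a second VM file descriptor and enabling
KVM_CAP_VM_COPY_ENC_CONTEXT_FROM on it with the primary VM's fd as
the argument. A minimal raw-ioctl sketch of that flow (error
handling elided; kvm_fd and vm_fd are assumed to be the /dev/kvm
and primary VM file descriptors):

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int mirror_vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);

    struct kvm_enable_cap cap = {
        .cap = KVM_CAP_VM_COPY_ENC_CONTEXT_FROM,
        .args[0] = vm_fd,    /* share the primary VM's ASID */
    };
    ioctl(mirror_vm_fd, KVM_ENABLE_CAP, &cap);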

The mirror VM can be used for running an in-guest migration 
helper (MH). It also might have future uses for other in-guest
operations.

The mirror VM support is enabled by adding a mirrorvcpus=N
suboption to -smp, which designates the last N vcpus (normally 1)
as vcpus of the mirror VM.

Example usage for starting a 4-vcpu guest, of which 1 vcpu is marked as
a mirror vcpu:

    qemu-system-x86_64 -smp 4,mirrorvcpus=1 ...

Ashish Kalra (7):
  kvm: Add Mirror VM ioctl and enable cap interfaces.
  kvm: Add Mirror VM support.
  kvm: create Mirror VM and share primary VM's encryption context.
  softmmu/cpu: Skip mirror vcpu's for pause, resume and synchronization.
  kvm/apic: Disable in-kernel APIC support for mirror vcpu's.
  hw/acpi: disable modern CPU hotplug interface for mirror vcpu's
  hw/i386/pc: reduce fw_cfg boot cpu count taking into account mirror
    vcpu's.

Dov Murik (5):
  machine: Add mirrorvcpus=N suboption to -smp
  hw/boards: Add mirror_vcpu flag to CPUArchId
  hw/i386: Mark mirror vcpus in possible_cpus
  cpu: Add boolean mirror_vcpu field to CPUState
  hw/i386: Set CPUState.mirror_vcpu=true for mirror vcpus

Tobin Feldman-Fitzthum (1):
  hw/acpi: Don't include mirror vcpus in ACPI tables

 accel/kvm/kvm-accel-ops.c |  45 ++++++-
 accel/kvm/kvm-all.c       | 244 +++++++++++++++++++++++++++++++++++++-
 accel/kvm/kvm-cpus.h      |   2 +
 hw/acpi/cpu.c             |  21 +++-
 hw/core/cpu-common.c      |   1 +
 hw/core/machine.c         |   7 ++
 hw/i386/acpi-build.c      |   5 +
 hw/i386/acpi-common.c     |   5 +
 hw/i386/kvm/apic.c        |  15 +++
 hw/i386/pc.c              |  10 ++
 hw/i386/x86.c             |  11 +-
 include/hw/acpi/cpu.h     |   1 +
 include/hw/boards.h       |   3 +
 include/hw/core/cpu.h     |   3 +
 include/hw/i386/x86.h     |   3 +-
 include/sysemu/kvm.h      |  15 +++
 qapi/machine.json         |   5 +-
 softmmu/cpus.c            |  27 +++++
 softmmu/vl.c              |   3 +
 target/i386/kvm/kvm.c     |  42 +++++++
 20 files changed, 459 insertions(+), 9 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* [RFC PATCH 01/13] machine: Add mirrorvcpus=N suboption to -smp
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
@ 2021-08-16 13:26 ` Ashish Kalra
  2021-08-16 21:23     ` Eric Blake
  2021-08-16 13:27 ` [RFC PATCH 02/13] hw/boards: Add mirror_vcpu flag to CPUArchId Ashish Kalra
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 13:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

From: Dov Murik <dovmurik@linux.vnet.ibm.com>

Add a notion of mirror vcpus to CpuTopology, which will allow
designating a few vcpus (normally 1) for running the in-guest
migration helper (MH).

Example usage for starting a 4-vcpu guest, of which 1 vcpu is marked as
a mirror vcpu:

    qemu-system-x86_64 -smp 4,mirrorvcpus=1 ...

Signed-off-by: Dov Murik <dovmurik@linux.vnet.ibm.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 hw/core/machine.c   | 7 +++++++
 hw/i386/pc.c        | 7 +++++++
 include/hw/boards.h | 1 +
 qapi/machine.json   | 5 ++++-
 softmmu/vl.c        | 3 +++
 5 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 943974d411..059262f914 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -749,6 +749,7 @@ static void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
     unsigned sockets = config->has_sockets ? config->sockets : 0;
     unsigned cores   = config->has_cores ? config->cores : 0;
     unsigned threads = config->has_threads ? config->threads : 0;
+    unsigned mirror_vcpus = config->has_mirrorvcpus ? config->mirrorvcpus : 0;
 
     if (config->has_dies && config->dies != 0 && config->dies != 1) {
         error_setg(errp, "dies not supported by this machine's CPU topology");
@@ -787,6 +788,11 @@ static void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
         return;
     }
 
+    if (mirror_vcpus > ms->smp.max_cpus) {
+        error_setg(errp, "mirror vcpus must not exceed max cpus");
+        return;
+    }
+
     if (sockets * cores * threads != ms->smp.max_cpus) {
         error_setg(errp, "Invalid CPU topology: "
                    "sockets (%u) * cores (%u) * threads (%u) "
@@ -800,6 +806,7 @@ static void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
     ms->smp.cores = cores;
     ms->smp.threads = threads;
     ms->smp.sockets = sockets;
+    ms->smp.mirror_vcpus = mirror_vcpus;
 }
 
 static void machine_get_smp(Object *obj, Visitor *v, const char *name,
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c2b9d62a35..3856a47390 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -719,6 +719,7 @@ static void pc_smp_parse(MachineState *ms, SMPConfiguration *config, Error **err
     unsigned dies    = config->has_dies ? config->dies : 1;
     unsigned cores   = config->has_cores ? config->cores : 0;
     unsigned threads = config->has_threads ? config->threads : 0;
+    unsigned mirror_vcpus = config->has_mirrorvcpus ? config->mirrorvcpus : 0;
 
     /* compute missing values, prefer sockets over cores over threads */
     if (cpus == 0 || sockets == 0) {
@@ -753,6 +754,11 @@ static void pc_smp_parse(MachineState *ms, SMPConfiguration *config, Error **err
         return;
     }
 
+    if (mirror_vcpus > ms->smp.max_cpus) {
+        error_setg(errp, "mirror vcpus must not exceed max cpus");
+        return;
+    }
+
     if (sockets * dies * cores * threads != ms->smp.max_cpus) {
         error_setg(errp, "Invalid CPU topology deprecated: "
                    "sockets (%u) * dies (%u) * cores (%u) * threads (%u) "
@@ -767,6 +773,7 @@ static void pc_smp_parse(MachineState *ms, SMPConfiguration *config, Error **err
     ms->smp.threads = threads;
     ms->smp.sockets = sockets;
     ms->smp.dies = dies;
+    ms->smp.mirror_vcpus = mirror_vcpus;
 }
 
 static
diff --git a/include/hw/boards.h b/include/hw/boards.h
index accd6eff35..b0e599096a 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -286,6 +286,7 @@ typedef struct CpuTopology {
     unsigned int threads;
     unsigned int sockets;
     unsigned int max_cpus;
+    unsigned int mirror_vcpus;
 } CpuTopology;
 
 /**
diff --git a/qapi/machine.json b/qapi/machine.json
index c3210ee1fb..7888601715 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1303,6 +1303,8 @@
 #
 # @maxcpus: maximum number of hotpluggable virtual CPUs in the virtual machine
 #
+# @mirrorvcpus: number of mirror virtual CPUs in the virtual machine
+#
 # Since: 6.1
 ##
 { 'struct': 'SMPConfiguration', 'data': {
@@ -1311,4 +1313,5 @@
      '*dies': 'int',
      '*cores': 'int',
      '*threads': 'int',
-     '*maxcpus': 'int' } }
+     '*maxcpus': 'int',
+     '*mirrorvcpus': 'int' } }
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 5ca11e7469..6261f1cfb1 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -710,6 +710,9 @@ static QemuOptsList qemu_smp_opts = {
         }, {
             .name = "maxcpus",
             .type = QEMU_OPT_NUMBER,
+        }, {
+            .name = "mirrorvcpus",
+            .type = QEMU_OPT_NUMBER,
         },
         { /*End of list */ }
     },
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [RFC PATCH 02/13] hw/boards: Add mirror_vcpu flag to CPUArchId
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
  2021-08-16 13:26 ` [RFC PATCH 01/13] machine: Add mirrorvcpus=N suboption to -smp Ashish Kalra
@ 2021-08-16 13:27 ` Ashish Kalra
  2021-08-16 13:27 ` [RFC PATCH 03/13] hw/i386: Mark mirror vcpus in possible_cpus Ashish Kalra
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 13:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

From: Dov Murik <dovmurik@linux.vnet.ibm.com>

The mirror_vcpu flag indicates whether a vcpu is a mirror.

Signed-off-by: Dov Murik <dovmurik@linux.vnet.ibm.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 include/hw/boards.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index b0e599096a..f7f29a466c 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -89,6 +89,7 @@ MemoryRegion *machine_consume_memdev(MachineState *machine,
  * @type - QOM class name of possible @cpu object
  * @props - CPU object properties, initialized by board
  * #vcpus_count - number of threads provided by @cpu object
+ * @mirror_vcpu - is this a mirror VCPU
  */
 typedef struct CPUArchId {
     uint64_t arch_id;
@@ -96,6 +97,7 @@ typedef struct CPUArchId {
     CpuInstanceProperties props;
     Object *cpu;
     const char *type;
+    bool mirror_vcpu;
 } CPUArchId;
 
 /**
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [RFC PATCH 03/13] hw/i386: Mark mirror vcpus in possible_cpus
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
  2021-08-16 13:26 ` [RFC PATCH 01/13] machine: Add mirrorvcpus=N suboption to -smp Ashish Kalra
  2021-08-16 13:27 ` [RFC PATCH 02/13] hw/boards: Add mirror_vcpu flag to CPUArchId Ashish Kalra
@ 2021-08-16 13:27 ` Ashish Kalra
  2021-08-16 13:27 ` [RFC PATCH 04/13] hw/acpi: Don't include mirror vcpus in ACPI tables Ashish Kalra
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 13:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

From: Dov Murik <dovmurik@linux.vnet.ibm.com>

Mark the last mirror_vcpus vcpus in the machine state's possible_cpus
as mirror vcpus (e.g. with -smp 4,mirrorvcpus=1, the vcpu with index 3
is marked as a mirror).

Signed-off-by: Dov Murik <dovmurik@linux.vnet.ibm.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 hw/i386/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 00448ed55a..a0103cb0aa 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -448,6 +448,7 @@ const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
 {
     X86MachineState *x86ms = X86_MACHINE(ms);
     unsigned int max_cpus = ms->smp.max_cpus;
+    unsigned int mirror_vcpus_start_at = max_cpus - ms->smp.mirror_vcpus;
     X86CPUTopoInfo topo_info;
     int i;
 
@@ -475,6 +476,7 @@ const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
             x86_cpu_apic_id_from_index(x86ms, i);
         x86_topo_ids_from_apicid(ms->possible_cpus->cpus[i].arch_id,
                                  &topo_info, &topo_ids);
+        ms->possible_cpus->cpus[i].mirror_vcpu = i >= mirror_vcpus_start_at;
         ms->possible_cpus->cpus[i].props.has_socket_id = true;
         ms->possible_cpus->cpus[i].props.socket_id = topo_ids.pkg_id;
         if (ms->smp.dies > 1) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [RFC PATCH 04/13] hw/acpi: Don't include mirror vcpus in ACPI tables
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
                   ` (2 preceding siblings ...)
  2021-08-16 13:27 ` [RFC PATCH 03/13] hw/i386: Mark mirror vcpus in possible_cpus Ashish Kalra
@ 2021-08-16 13:27 ` Ashish Kalra
  2021-08-16 13:28 ` [RFC PATCH 05/13] cpu: Add boolean mirror_vcpu field to CPUState Ashish Kalra
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 13:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

From: Tobin Feldman-Fitzthum <tobin@linux.ibm.com>

By excluding mirror vcpus from the ACPI tables, we hide them from the
guest OS.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@linux.ibm.com>
Signed-off-by: Dov Murik <dovmurik@linux.vnet.ibm.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 hw/acpi/cpu.c         | 10 ++++++++++
 hw/i386/acpi-build.c  |  5 +++++
 hw/i386/acpi-common.c |  5 +++++
 3 files changed, 20 insertions(+)

diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index f82e9512fd..8ac2fd018e 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -435,6 +435,11 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
 
         method = aml_method(CPU_NOTIFY_METHOD, 2, AML_NOTSERIALIZED);
         for (i = 0; i < arch_ids->len; i++) {
+            if (arch_ids->cpus[i].mirror_vcpu) {
+                /* don't build objects for mirror vCPUs */
+                continue;
+            }
+
             Aml *cpu = aml_name(CPU_NAME_FMT, i);
             Aml *uid = aml_arg(0);
             Aml *event = aml_arg(1);
@@ -650,6 +655,11 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
 
         /* build Processor object for each processor */
         for (i = 0; i < arch_ids->len; i++) {
+            if (arch_ids->cpus[i].mirror_vcpu) {
+                /* don't build objects for mirror vCPUs */
+                continue;
+            }
+
             Aml *dev;
             Aml *uid = aml_int(i);
             GArray *madt_buf = g_array_new(0, 1, 1);
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a33ac8b91e..3c0a8b47ef 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1928,6 +1928,11 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
     srat->reserved1 = cpu_to_le32(1);
 
     for (i = 0; i < apic_ids->len; i++) {
+        if (apic_ids->cpus[i].mirror_vcpu) {
+            /* don't add SRAT entries for mirror vCPUs */
+            continue;
+        }
+
         int node_id = apic_ids->cpus[i].props.node_id;
         uint32_t apic_id = apic_ids->cpus[i].arch_id;
 
diff --git a/hw/i386/acpi-common.c b/hw/i386/acpi-common.c
index 1f5947fcf9..80aefbc920 100644
--- a/hw/i386/acpi-common.c
+++ b/hw/i386/acpi-common.c
@@ -91,6 +91,11 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
     madt->flags = cpu_to_le32(1);
 
     for (i = 0; i < apic_ids->len; i++) {
+        if (apic_ids->cpus[i].mirror_vcpu) {
+            /* don't add MADT entries for mirror vCPUs */
+            continue;
+        }
+
         adevc->madt_cpu(adev, i, apic_ids, table_data);
         if (apic_ids->cpus[i].arch_id > 254) {
             x2apic_mode = true;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [RFC PATCH 05/13] cpu: Add boolean mirror_vcpu field to CPUState
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
                   ` (3 preceding siblings ...)
  2021-08-16 13:27 ` [RFC PATCH 04/13] hw/acpi: Don't include mirror vcpus in ACPI tables Ashish Kalra
@ 2021-08-16 13:28 ` Ashish Kalra
  2021-08-16 13:28 ` [RFC PATCH 06/13] hw/i386: Set CPUState.mirror_vcpu=true for mirror vcpus Ashish Kalra
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 13:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

From: Dov Murik <dovmurik@linux.vnet.ibm.com>

The mirror_vcpu field indicates mirror VCPUs.  This will allow QEMU
to treat mirror VCPUs differently.

Signed-off-by: Dov Murik <dovmurik@linux.vnet.ibm.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 hw/core/cpu-common.c  | 1 +
 include/hw/core/cpu.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index e2f5a64604..99298f3a79 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -269,6 +269,7 @@ static Property cpu_common_props[] = {
                      MemoryRegion *),
 #endif
     DEFINE_PROP_BOOL("start-powered-off", CPUState, start_powered_off, false),
+    DEFINE_PROP_BOOL("mirror_vcpu", CPUState, mirror_vcpu, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index bc864564ce..a8f5f7862d 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -402,6 +402,9 @@ struct CPUState {
     /* shared by kvm, hax and hvf */
     bool vcpu_dirty;
 
+    /* does this cpu belong to the mirror VM? */
+    bool mirror_vcpu;
+
     /* Used to keep track of an outstanding cpu throttle thread for migration
      * autoconverge
      */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [RFC PATCH 06/13] hw/i386: Set CPUState.mirror_vcpu=true for mirror vcpus
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
                   ` (4 preceding siblings ...)
  2021-08-16 13:28 ` [RFC PATCH 05/13] cpu: Add boolean mirror_vcpu field to CPUState Ashish Kalra
@ 2021-08-16 13:28 ` Ashish Kalra
  2021-08-16 13:29 ` [RFC PATCH 07/13] kvm: Add Mirror VM ioctl and enable cap interfaces Ashish Kalra
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 13:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

From: Dov Murik <dovmurik@linux.vnet.ibm.com>

On x86 machines, when initializing the CPUState structs, set the
mirror_vcpu flag to true for mirror vcpus.

Signed-off-by: Dov Murik <dovmurik@linux.vnet.ibm.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 hw/i386/x86.c         | 9 +++++++--
 include/hw/i386/x86.h | 3 ++-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a0103cb0aa..67e2b331fc 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -101,13 +101,17 @@ uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
 }
 
 
-void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, Error **errp)
+void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, bool mirror_vcpu,
+                 Error **errp)
 {
     Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
 
     if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
         goto out;
     }
+    if (!object_property_set_bool(cpu, "mirror_vcpu", mirror_vcpu, errp)) {
+        goto out;
+    }
     qdev_realize(DEVICE(cpu), NULL, errp);
 
 out:
@@ -135,7 +139,8 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
                                                       ms->smp.max_cpus - 1) + 1;
     possible_cpus = mc->possible_cpu_arch_ids(ms);
     for (i = 0; i < ms->smp.cpus; i++) {
-        x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal);
+        x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id,
+                    possible_cpus->cpus[i].mirror_vcpu, &error_fatal);
     }
 }
 
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 6e9244a82c..9206826c36 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -96,7 +96,8 @@ void init_topo_info(X86CPUTopoInfo *topo_info, const X86MachineState *x86ms);
 uint32_t x86_cpu_apic_id_from_index(X86MachineState *pcms,
                                     unsigned int cpu_index);
 
-void x86_cpu_new(X86MachineState *pcms, int64_t apic_id, Error **errp);
+void x86_cpu_new(X86MachineState *pcms, int64_t apic_id, bool mirror_vcpu,
+                 Error **errp);
 void x86_cpus_init(X86MachineState *pcms, int default_cpu_version);
 CpuInstanceProperties x86_cpu_index_to_props(MachineState *ms,
                                              unsigned cpu_index);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [RFC PATCH 07/13] kvm: Add Mirror VM ioctl and enable cap interfaces.
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
                   ` (5 preceding siblings ...)
  2021-08-16 13:28 ` [RFC PATCH 06/13] hw/i386: Set CPUState.mirror_vcpu=true for mirror vcpus Ashish Kalra
@ 2021-08-16 13:29 ` Ashish Kalra
  2021-08-16 13:29 ` [RFC PATCH 08/13] kvm: Add Mirror VM support Ashish Kalra
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 13:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

From: Ashish Kalra <ashish.kalra@amd.com>

Add VM ioctl and enable cap support for mirror VMs, and
a new VM file descriptor for the mirror VM in KVMState.

The VCPU ioctl interface works unchanged for the mirror VM,
as it uses a CPUState and VCPU file descriptor allocated
and set up for the mirror vcpus.
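
For example, a later patch in this series uses these interfaces to
share the primary VM's encryption context with the mirror VM:

    ret = kvm_mirror_vm_enable_cap(s, KVM_CAP_VM_COPY_ENC_CONTEXT_FROM,
                                   0, s->vmfd);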

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 accel/kvm/kvm-all.c  | 23 +++++++++++++++++++++++
 include/sysemu/kvm.h | 14 ++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0125c17edb..4bc5971881 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -149,6 +149,7 @@ struct KVMState
     uint64_t kvm_dirty_ring_bytes;  /* Size of the per-vcpu dirty ring */
     uint32_t kvm_dirty_ring_size;   /* Number of dirty GFNs per ring */
     struct KVMDirtyRingReaper reaper;
+    int mirror_vm_fd;
 };
 
 KVMState *kvm_state;
@@ -3003,6 +3004,28 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
     return ret;
 }
 
+int kvm_mirror_vm_ioctl(KVMState *s, int type, ...)
+{
+    int ret;
+    void *arg;
+    va_list ap;
+
+    if (!s->mirror_vm_fd) {
+        return 0;
+    }
+
+    va_start(ap, type);
+    arg = va_arg(ap, void *);
+    va_end(ap);
+
+    trace_kvm_vm_ioctl(type, arg);
+    ret = ioctl(s->mirror_vm_fd, type, arg);
+    if (ret == -1) {
+        ret = -errno;
+    }
+    return ret;
+}
+
 int kvm_vcpu_ioctl(CPUState *cpu, int type, ...)
 {
     int ret;
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a1ab1ee12d..6847ffcdfd 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -255,6 +255,8 @@ int kvm_ioctl(KVMState *s, int type, ...);
 
 int kvm_vm_ioctl(KVMState *s, int type, ...);
 
+int kvm_mirror_vm_ioctl(KVMState *s, int type, ...);
+
 int kvm_vcpu_ioctl(CPUState *cpu, int type, ...);
 
 /**
@@ -434,6 +436,18 @@ int kvm_vm_check_extension(KVMState *s, unsigned int extension);
         kvm_vm_ioctl(s, KVM_ENABLE_CAP, &cap);                       \
     })
 
+#define kvm_mirror_vm_enable_cap(s, capability, cap_flags, ...)      \
+    ({                                                               \
+        struct kvm_enable_cap cap = {                                \
+            .cap = capability,                                       \
+            .flags = cap_flags,                                      \
+        };                                                           \
+        uint64_t args_tmp[] = { __VA_ARGS__ };                       \
+        size_t n = MIN(ARRAY_SIZE(args_tmp), ARRAY_SIZE(cap.args));  \
+        memcpy(cap.args, args_tmp, n * sizeof(cap.args[0]));         \
+        kvm_mirror_vm_ioctl(s, KVM_ENABLE_CAP, &cap);                \
+    })
+
 #define kvm_vcpu_enable_cap(cpu, capability, cap_flags, ...)         \
     ({                                                               \
         struct kvm_enable_cap cap = {                                \
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [RFC PATCH 08/13] kvm: Add Mirror VM support.
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
                   ` (6 preceding siblings ...)
  2021-08-16 13:29 ` [RFC PATCH 07/13] kvm: Add Mirror VM ioctl and enable cap interfaces Ashish Kalra
@ 2021-08-16 13:29 ` Ashish Kalra
  2021-08-16 13:29 ` [RFC PATCH 09/13] kvm: create Mirror VM and share primary VM's encryption context Ashish Kalra
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 13:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

From: Ashish Kalra <ashish.kalra@amd.com>

Add a new kvm_mirror_vcpu_thread_fn(), which is qemu's mirror vcpu
thread, and the corresponding kvm_init_mirror_vcpu(), which creates
the vcpus for the mirror VM. Also add a separate KVM run loop,
kvm_mirror_cpu_exec(), which differs from the main KVM run loop in
that it currently mainly handles IO and MMIO exits and does not
handle any interrupt exits, as the mirror VM does not have an
interrupt controller. This mirror vcpu run loop can be further
optimized.

Also, we have a different kvm_arch_put_registers() for mirror
vcpus, as we don't currently save/restore MSRs for mirror vcpus;
kvm_put_msrs() fails for mirror vcpus because the mirror VM does
not have an interrupt controller such as the in-kernel irqchip.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 accel/kvm/kvm-accel-ops.c |  45 ++++++++-
 accel/kvm/kvm-all.c       | 191 +++++++++++++++++++++++++++++++++++++-
 accel/kvm/kvm-cpus.h      |   2 +
 include/sysemu/kvm.h      |   1 +
 target/i386/kvm/kvm.c     |  42 +++++++++
 5 files changed, 277 insertions(+), 4 deletions(-)

diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
index 7516c67a3f..e49a14e58c 100644
--- a/accel/kvm/kvm-accel-ops.c
+++ b/accel/kvm/kvm-accel-ops.c
@@ -61,6 +61,42 @@ static void *kvm_vcpu_thread_fn(void *arg)
     return NULL;
 }
 
+static void *kvm_mirror_vcpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+    int r;
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->can_do_io = 1;
+
+    r = kvm_init_mirror_vcpu(cpu, &error_fatal);
+    kvm_init_cpu_signals(cpu);
+
+    /* signal CPU creation */
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            r = kvm_mirror_cpu_exec(cpu);
+            if (r == EXCP_DEBUG) {
+                cpu_handle_guest_debug(cpu);
+            }
+        }
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    kvm_destroy_vcpu(cpu);
+    qemu_mutex_unlock_iothread();
+    cpu_thread_signal_destroyed(cpu);
+    rcu_unregister_thread();
+    return NULL;
+}
+
 static void kvm_start_vcpu_thread(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
@@ -70,8 +106,13 @@ static void kvm_start_vcpu_thread(CPUState *cpu)
     qemu_cond_init(cpu->halt_cond);
     snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/KVM",
              cpu->cpu_index);
-    qemu_thread_create(cpu->thread, thread_name, kvm_vcpu_thread_fn,
-                       cpu, QEMU_THREAD_JOINABLE);
+    if (!cpu->mirror_vcpu) {
+        qemu_thread_create(cpu->thread, thread_name, kvm_vcpu_thread_fn,
+                            cpu, QEMU_THREAD_JOINABLE);
+    } else {
+        qemu_thread_create(cpu->thread, thread_name, kvm_mirror_vcpu_thread_fn,
+                           cpu, QEMU_THREAD_JOINABLE);
+    }
 }
 
 static void kvm_accel_ops_class_init(ObjectClass *oc, void *data)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 4bc5971881..f14b33dde1 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2294,6 +2294,55 @@ bool kvm_vcpu_id_is_valid(int vcpu_id)
     return vcpu_id >= 0 && vcpu_id < kvm_max_vcpu_id(s);
 }
 
+int kvm_init_mirror_vcpu(CPUState *cpu, Error **errp)
+{
+    KVMState *s = kvm_state;
+    long mmap_size;
+    int ret;
+
+    ret =  kvm_mirror_vm_ioctl(s, KVM_CREATE_VCPU, kvm_arch_vcpu_id(cpu));
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "kvm_init_mirror_vcpu: KVM_CREATE_VCPU failed");
+        goto err;
+    }
+
+    cpu->kvm_fd = ret;
+    cpu->kvm_state = s;
+    cpu->vcpu_dirty = true;
+
+    mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
+    if (mmap_size < 0) {
+        ret = mmap_size;
+        error_setg_errno(errp, -mmap_size,
+                         "kvm_init_mirror_vcpu: KVM_GET_VCPU_MMAP_SIZE failed");
+        goto err;
+    }
+
+    cpu->kvm_run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+                        cpu->kvm_fd, 0);
+    if (cpu->kvm_run == MAP_FAILED) {
+        ret = -errno;
+        error_setg_errno(errp, ret,
+                         "kvm_init_mirror_vcpu: mmap'ing vcpu state failed");
+    }
+
+    if (s->coalesced_mmio && !s->coalesced_mmio_ring) {
+        s->coalesced_mmio_ring =
+            (void *)cpu->kvm_run + s->coalesced_mmio * PAGE_SIZE;
+    }
+
+    ret = kvm_arch_init_vcpu(cpu);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "kvm_init_mirror_vcpu: kvm_arch_init_vcpu failed (%lu)",
+                         kvm_arch_vcpu_id(cpu));
+    }
+
+err:
+    return ret;
+}
+
 static int kvm_init(MachineState *ms)
 {
     MachineClass *mc = MACHINE_GET_CLASS(ms);
@@ -2717,7 +2766,11 @@ void kvm_cpu_synchronize_state(CPUState *cpu)
 
 static void do_kvm_cpu_synchronize_post_reset(CPUState *cpu, run_on_cpu_data arg)
 {
-    kvm_arch_put_registers(cpu, KVM_PUT_RESET_STATE);
+    if (!cpu->mirror_vcpu) {
+        kvm_arch_put_registers(cpu, KVM_PUT_RESET_STATE);
+    } else {
+        kvm_arch_mirror_put_registers(cpu, KVM_PUT_RESET_STATE);
+    }
     cpu->vcpu_dirty = false;
 }
 
@@ -2728,7 +2781,11 @@ void kvm_cpu_synchronize_post_reset(CPUState *cpu)
 
 static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
 {
-    kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
+    if (!cpu->mirror_vcpu) {
+        kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
+    } else {
+        kvm_arch_mirror_put_registers(cpu, KVM_PUT_FULL_STATE);
+    }
     cpu->vcpu_dirty = false;
 }
 
@@ -2968,6 +3025,136 @@ int kvm_cpu_exec(CPUState *cpu)
     return ret;
 }
 
+int kvm_mirror_cpu_exec(CPUState *cpu)
+{
+    struct kvm_run *run = cpu->kvm_run;
+    int ret, run_ret = 0;
+
+    DPRINTF("kvm_mirror_cpu_exec()\n");
+    assert(cpu->mirror_vcpu);
+
+    qemu_mutex_unlock_iothread();
+    cpu_exec_start(cpu);
+
+    do {
+        MemTxAttrs attrs;
+
+        if (cpu->vcpu_dirty) {
+            kvm_arch_mirror_put_registers(cpu, KVM_PUT_RUNTIME_STATE);
+            cpu->vcpu_dirty = false;
+        }
+
+        kvm_arch_pre_run(cpu, run);
+        if (qatomic_read(&cpu->exit_request)) {
+            DPRINTF("interrupt exit requested\n");
+            /*
+             * KVM requires us to reenter the kernel after IO exits to complete
+             * instruction emulation. This self-signal will ensure that we
+             * leave ASAP again.
+             */
+            kvm_cpu_kick_self();
+        }
+
+        /*
+         * Read cpu->exit_request before KVM_RUN reads run->immediate_exit.
+         * Matching barrier in kvm_eat_signals.
+         */
+        smp_rmb();
+
+        run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0);
+
+        attrs = kvm_arch_post_run(cpu, run);
+
+        if (run_ret < 0) {
+            if (run_ret == -EINTR || run_ret == -EAGAIN) {
+                DPRINTF("io window exit\n");
+                kvm_eat_signals(cpu);
+                ret = EXCP_INTERRUPT;
+                break;
+            }
+            fprintf(stderr, "error: kvm run failed %s\n",
+                    strerror(-run_ret));
+            ret = -1;
+            break;
+        }
+
+        trace_kvm_run_exit(cpu->cpu_index, run->exit_reason);
+        switch (run->exit_reason) {
+        case KVM_EXIT_IO:
+            DPRINTF("handle_io\n");
+            /* Called outside BQL */
+            kvm_handle_io(run->io.port, attrs,
+                          (uint8_t *)run + run->io.data_offset,
+                          run->io.direction,
+                          run->io.size,
+                          run->io.count);
+            ret = 0;
+            break;
+        case KVM_EXIT_MMIO:
+            DPRINTF("handle_mmio\n");
+            /* Called outside BQL */
+            address_space_rw(&address_space_memory,
+                             run->mmio.phys_addr, attrs,
+                             run->mmio.data,
+                             run->mmio.len,
+                             run->mmio.is_write);
+            ret = 0;
+            break;
+        case KVM_EXIT_SHUTDOWN:
+            DPRINTF("shutdown\n");
+            qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+            ret = EXCP_INTERRUPT;
+            break;
+        case KVM_EXIT_UNKNOWN:
+            fprintf(stderr, "KVM: unknown exit, hardware reason %" PRIx64 "\n",
+                    (uint64_t)run->hw.hardware_exit_reason);
+            ret = -1;
+            break;
+        case KVM_EXIT_INTERNAL_ERROR:
+            ret = kvm_handle_internal_error(cpu, run);
+            break;
+        case KVM_EXIT_SYSTEM_EVENT:
+            switch (run->system_event.type) {
+            case KVM_SYSTEM_EVENT_SHUTDOWN:
+                qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+                ret = EXCP_INTERRUPT;
+                break;
+            case KVM_SYSTEM_EVENT_RESET:
+                qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+                ret = EXCP_INTERRUPT;
+                break;
+            case KVM_SYSTEM_EVENT_CRASH:
+                kvm_cpu_synchronize_state(cpu);
+                qemu_mutex_lock_iothread();
+                qemu_system_guest_panicked(cpu_get_crash_info(cpu));
+                qemu_mutex_unlock_iothread();
+                ret = 0;
+                break;
+            default:
+                DPRINTF("kvm_arch_handle_exit\n");
+                ret = kvm_arch_handle_exit(cpu, run);
+                break;
+            }
+            break;
+        default:
+            DPRINTF("kvm_arch_handle_exit\n");
+            ret = kvm_arch_handle_exit(cpu, run);
+            break;
+        }
+    } while (ret == 0);
+
+    cpu_exec_end(cpu);
+    qemu_mutex_lock_iothread();
+
+    if (ret < 0) {
+        cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
+        vm_stop(RUN_STATE_INTERNAL_ERROR);
+    }
+
+    qatomic_set(&cpu->exit_request, 0);
+    return ret;
+}
+
 int kvm_ioctl(KVMState *s, int type, ...)
 {
     int ret;
diff --git a/accel/kvm/kvm-cpus.h b/accel/kvm/kvm-cpus.h
index bf0bd1bee4..c8c7e52bcd 100644
--- a/accel/kvm/kvm-cpus.h
+++ b/accel/kvm/kvm-cpus.h
@@ -13,7 +13,9 @@
 #include "sysemu/cpus.h"
 
 int kvm_init_vcpu(CPUState *cpu, Error **errp);
+int kvm_init_mirror_vcpu(CPUState *cpu, Error **errp);
 int kvm_cpu_exec(CPUState *cpu);
+int kvm_mirror_cpu_exec(CPUState *cpu);
 void kvm_destroy_vcpu(CPUState *cpu);
 void kvm_cpu_synchronize_post_reset(CPUState *cpu);
 void kvm_cpu_synchronize_post_init(CPUState *cpu);
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 6847ffcdfd..03e7b5afa0 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -352,6 +352,7 @@ int kvm_arch_get_registers(CPUState *cpu);
 #define KVM_PUT_FULL_STATE      3
 
 int kvm_arch_put_registers(CPUState *cpu, int level);
+int kvm_arch_mirror_put_registers(CPUState *cpu, int level);
 
 int kvm_arch_init(MachineState *ms, KVMState *s);
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index e69abe48e3..d6d52a06bc 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4154,6 +4154,48 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
     return 0;
 }
 
+int kvm_arch_mirror_put_registers(CPUState *cpu, int level)
+{
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    int ret;
+
+    assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
+
+    /* must be before kvm_put_nested_state so that EFER.SVME is set */
+    ret = kvm_put_sregs(x86_cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (level == KVM_PUT_FULL_STATE) {
+        /*
+         * We don't check for kvm_arch_set_tsc_khz() errors here,
+         * because TSC frequency mismatch shouldn't abort migration,
+         * unless the user explicitly asked for a more strict TSC
+         * setting (e.g. using an explicit "tsc-freq" option).
+         */
+        kvm_arch_set_tsc_khz(cpu);
+    }
+
+    ret = kvm_getput_regs(x86_cpu, 1);
+    if (ret < 0) {
+        return ret;
+    }
+    ret = kvm_put_xsave(x86_cpu);
+    if (ret < 0) {
+        return ret;
+    }
+    ret = kvm_put_xcrs(x86_cpu);
+    if (ret < 0) {
+        return ret;
+    }
+    ret = kvm_put_debugregs(x86_cpu);
+    if (ret < 0) {
+        return ret;
+    }
+    return 0;
+}
+
 int kvm_arch_get_registers(CPUState *cs)
 {
     X86CPU *cpu = X86_CPU(cs);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [RFC PATCH 09/13] kvm: create Mirror VM and share primary VM's encryption context.
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
                   ` (7 preceding siblings ...)
  2021-08-16 13:29 ` [RFC PATCH 08/13] kvm: Add Mirror VM support Ashish Kalra
@ 2021-08-16 13:29 ` Ashish Kalra
  2021-08-16 13:30 ` [RFC PATCH 10/13] softmmu/cpu: Skip mirror vcpu's for pause, resume and synchronization Ashish Kalra
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 13:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

From: Ashish Kalra <ashish.kalra@amd.com>

Create the Mirror VM and share the primary VM's encryption context
with it using the KVM_CAP_VM_COPY_ENC_CONTEXT_FROM capability.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 accel/kvm/kvm-all.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f14b33dde1..624d1f779e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -369,9 +369,17 @@ static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, boo
         if (ret < 0) {
             goto err;
         }
+        ret = kvm_mirror_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+        if (ret < 0) {
+            goto err;
+        }
     }
     mem.memory_size = slot->memory_size;
     ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+    if (ret < 0) {
+        goto err;
+    }
+    ret = kvm_mirror_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
     slot->old_flags = mem.flags;
 err:
     trace_kvm_set_user_memory(mem.slot, mem.flags, mem.guest_phys_addr,
@@ -2606,11 +2614,33 @@ static int kvm_init(MachineState *ms)
 
     kvm_state = s;
 
+    if (ms->smp.mirror_vcpus) {
+        do {
+            ret = kvm_ioctl(s, KVM_CREATE_VM, type);
+        } while (ret == -EINTR);
+
+        if (ret < 0) {
+            fprintf(stderr, "ioctl(KVM_CREATE_VM mirror vm) failed: %d %s\n",
+                    -ret, strerror(-ret));
+            goto err;
+        }
+        s->mirror_vm_fd = ret;
+    }
+
     ret = kvm_arch_init(ms, s);
     if (ret < 0) {
         goto err;
     }
 
+    if (s->mirror_vm_fd &&
+        kvm_vm_check_extension(s, KVM_CAP_VM_COPY_ENC_CONTEXT_FROM)) {
+        ret = kvm_mirror_vm_enable_cap(s, KVM_CAP_VM_COPY_ENC_CONTEXT_FROM,
+                                       0, s->vmfd);
+        if (ret < 0) {
+            goto err;
+        }
+    }
+
     if (s->kernel_irqchip_split == ON_OFF_AUTO_AUTO) {
         s->kernel_irqchip_split = mc->default_kernel_irqchip_split ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
     }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [RFC PATCH 10/13] softmmu/cpu: Skip mirror vcpu's for pause, resume and synchronization.
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
                   ` (8 preceding siblings ...)
  2021-08-16 13:29 ` [RFC PATCH 09/13] kvm: create Mirror VM and share primary VM's encryption context Ashish Kalra
@ 2021-08-16 13:30 ` Ashish Kalra
  2021-08-16 13:30 ` [RFC PATCH 11/13] kvm/apic: Disable in-kernel APIC support for mirror vcpu's Ashish Kalra
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 13:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

From: Ashish Kalra <ashish.kalra@amd.com>

Skip mirror vcpus for vcpu pause, resume and synchronization
operations.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 softmmu/cpus.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 071085f840..caed382669 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -101,6 +101,9 @@ bool all_cpu_threads_idle(void)
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
+        if (cpu->mirror_vcpu) {
+            continue;
+        }
         if (!cpu_thread_is_idle(cpu)) {
             return false;
         }
@@ -136,6 +139,9 @@ void cpu_synchronize_all_states(void)
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
+        if (cpu->mirror_vcpu) {
+            continue;
+        }
         cpu_synchronize_state(cpu);
     }
 }
@@ -145,6 +151,9 @@ void cpu_synchronize_all_post_reset(void)
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
+        if (cpu->mirror_vcpu) {
+            continue;
+        }
         cpu_synchronize_post_reset(cpu);
     }
 }
@@ -154,6 +163,9 @@ void cpu_synchronize_all_post_init(void)
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
+        if (cpu->mirror_vcpu) {
+            continue;
+        }
         cpu_synchronize_post_init(cpu);
     }
 }
@@ -163,6 +175,9 @@ void cpu_synchronize_all_pre_loadvm(void)
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
+        if (cpu->mirror_vcpu) {
+            continue;
+        }
         cpu_synchronize_pre_loadvm(cpu);
     }
 }
@@ -531,6 +546,9 @@ static bool all_vcpus_paused(void)
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
+        if (cpu->mirror_vcpu) {
+            continue;
+        }
         if (!cpu->stopped) {
             return false;
         }
@@ -545,6 +563,9 @@ void pause_all_vcpus(void)
 
     qemu_clock_enable(QEMU_CLOCK_VIRTUAL, false);
     CPU_FOREACH(cpu) {
+        if (cpu->mirror_vcpu) {
+            continue;
+        }
         if (qemu_cpu_is_self(cpu)) {
             qemu_cpu_stop(cpu, true);
         } else {
@@ -561,6 +582,9 @@ void pause_all_vcpus(void)
     while (!all_vcpus_paused()) {
         qemu_cond_wait(&qemu_pause_cond, &qemu_global_mutex);
         CPU_FOREACH(cpu) {
+            if (cpu->mirror_vcpu) {
+                continue;
+            }
             qemu_cpu_kick(cpu);
         }
     }
@@ -587,6 +611,9 @@ void resume_all_vcpus(void)
 
     qemu_clock_enable(QEMU_CLOCK_VIRTUAL, true);
     CPU_FOREACH(cpu) {
+        if (cpu->mirror_vcpu) {
+            continue;
+        }
         cpu_resume(cpu);
     }
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [RFC PATCH 11/13] kvm/apic: Disable in-kernel APIC support for mirror vcpu's.
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
                   ` (9 preceding siblings ...)
  2021-08-16 13:30 ` [RFC PATCH 10/13] softmmu/cpu: Skip mirror vcpu's for pause, resume and synchronization Ashish Kalra
@ 2021-08-16 13:30 ` Ashish Kalra
  2021-08-16 13:31 ` [RFC PATCH 12/13] hw/acpi: disable modern CPU hotplug interface " Ashish Kalra
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 13:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

From: Ashish Kalra <ashish.kalra@amd.com>

The mirror VM does not support any interrupt controller, which
requires disabling the in-kernel APIC support on mirror vcpus.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 hw/i386/kvm/apic.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 1e89ca0899..902fe49fc7 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -125,6 +125,11 @@ static void kvm_apic_vapic_base_update(APICCommonState *s)
         .vapic_addr = s->vapic_paddr,
     };
     int ret;
+    CPUState *cpu = CPU(s->cpu);
+
+    if (cpu->mirror_vcpu) {
+        return;
+    }
 
     ret = kvm_vcpu_ioctl(CPU(s->cpu), KVM_SET_VAPIC_ADDR, &vapid_addr);
     if (ret < 0) {
@@ -139,6 +144,11 @@ static void kvm_apic_put(CPUState *cs, run_on_cpu_data data)
     APICCommonState *s = data.host_ptr;
     struct kvm_lapic_state kapic;
     int ret;
+    CPUState *cpu = CPU(s->cpu);
+
+    if (cpu->mirror_vcpu) {
+        return;
+    }
 
     kvm_put_apicbase(s->cpu, s->apicbase);
     kvm_put_apic_state(s, &kapic);
@@ -227,6 +237,11 @@ static void kvm_apic_reset(APICCommonState *s)
 static void kvm_apic_realize(DeviceState *dev, Error **errp)
 {
     APICCommonState *s = APIC_COMMON(dev);
+    CPUState *cpu = CPU(s->cpu);
+
+    if (cpu->mirror_vcpu) {
+        return;
+    }
 
     memory_region_init_io(&s->io_memory, OBJECT(s), &kvm_apic_io_ops, s,
                           "kvm-apic-msi", APIC_SPACE_SIZE);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [RFC PATCH 12/13] hw/acpi: disable modern CPU hotplug interface for mirror vcpu's
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
                   ` (10 preceding siblings ...)
  2021-08-16 13:30 ` [RFC PATCH 11/13] kvm/apic: Disable in-kernel APIC support for mirror vcpu's Ashish Kalra
@ 2021-08-16 13:31 ` Ashish Kalra
  2021-08-16 13:31 ` [RFC PATCH 13/13] hw/i386/pc: reduce fw_cfg boot cpu count taking into account " Ashish Kalra
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 13:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

From: Ashish Kalra <ashish.kalra@amd.com>

OVMF expects both fw_cfg and the modern CPU hotplug interface to
return the same boot CPU count. We reduce the fw_cfg boot cpu count
by the number of mirror vcpus, which fails the OVMF sanity check
because the fw_cfg boot cpu count and the modern CPU hotplug
interface boot count no longer match (e.g. with -smp 4,mirrorvcpus=1,
fw_cfg reports 3 boot cpus while the hotplug interface reports 4);
hence, disable the modern CPU hotplug interface.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 hw/acpi/cpu.c         | 11 ++++++++++-
 include/hw/acpi/cpu.h |  1 +
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 8ac2fd018e..6cfaf2b450 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -86,7 +86,12 @@ static uint64_t cpu_hotplug_rd(void *opaque, hwaddr addr, unsigned size)
     case ACPI_CPU_CMD_DATA2_OFFSET_R:
         switch (cpu_st->command) {
         case CPHP_GET_NEXT_CPU_WITH_EVENT_CMD:
-           val = 0;
+           /* Disabling modern CPUHP interface for mirror vCPU support */
+           if (!cpu_st->mirror_vcpu_enabled) {
+               val = 0;
+           } else {
+               val = -1ULL;
+           }
            break;
         case CPHP_GET_CPU_ID_CMD:
            val = cdev->arch_id >> 32;
@@ -226,6 +231,10 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
     state->dev_count = id_list->len;
     state->devs = g_new0(typeof(*state->devs), state->dev_count);
     for (i = 0; i < id_list->len; i++) {
+        /* Disabling modern CPUHP interface for mirror vCPU support */
+        if (id_list->cpus[i].mirror_vcpu) {
+            state->mirror_vcpu_enabled = true;
+        }
         state->devs[i].cpu =  CPU(id_list->cpus[i].cpu);
         state->devs[i].arch_id = id_list->cpus[i].arch_id;
     }
diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
index 999caaf510..e7949e86b8 100644
--- a/include/hw/acpi/cpu.h
+++ b/include/hw/acpi/cpu.h
@@ -33,6 +33,7 @@ typedef struct CPUHotplugState {
     uint8_t command;
     uint32_t dev_count;
     AcpiCpuStatus *devs;
+    bool mirror_vcpu_enabled;
 } CPUHotplugState;
 
 void acpi_cpu_plug_cb(HotplugHandler *hotplug_dev,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [RFC PATCH 13/13] hw/i386/pc: reduce fw_cfg boot cpu count taking into account mirror vcpu's.
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
                   ` (11 preceding siblings ...)
  2021-08-16 13:31 ` [RFC PATCH 12/13] hw/acpi: disable modern CPU hotplug interface " Ashish Kalra
@ 2021-08-16 13:31 ` Ashish Kalra
  2021-08-16 14:01   ` Claudio Fontana
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 13:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

From: Ashish Kalra <ashish.kalra@amd.com>

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 hw/i386/pc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3856a47390..2c353becb7 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -962,6 +962,9 @@ void pc_memory_init(PCMachineState *pcms,
                                         option_rom_mr,
                                         1);
 
+    /* Reduce x86 boot cpu count taking into account mirror vcpus */
+    x86ms->boot_cpus -= machine->smp.mirror_vcpus;
+
     fw_cfg = fw_cfg_arch_create(machine,
                                 x86ms->boot_cpus, x86ms->apic_id_limit);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
@ 2021-08-16 14:01   ` Claudio Fontana
  2021-08-16 13:27 ` [RFC PATCH 02/13] hw/boards: Add mirror_vcpu flag to CPUArchId Ashish Kalra
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Claudio Fontana @ 2021-08-16 14:01 UTC (permalink / raw)
  To: Ashish Kalra, qemu-devel
  Cc: pbonzini, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm



On 8/16/21 3:25 PM, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> This is an RFC series for Mirror VM support. Mirror VMs are
> essentially secondary VMs that share the encryption context
> (ASID) with a primary VM. The patch-set creates a new VM and
> shares the primary VM's encryption context with it using the
> KVM_CAP_VM_COPY_ENC_CONTEXT_FROM capability. The mirror VM uses
> a separate pair of VM + vCPU file descriptors and also uses a
> simplified KVM run loop; for example, it does not support any
> interrupt vmexits. Currently the mirror VM shares the address
> space of the primary VM.

Hi,

I'd expect some entry in docs/ ?

Thanks,

Claudio

> 
> The mirror VM can be used for running an in-guest migration 
> helper (MH). It also might have future uses for other in-guest
> operations.
> 
> The mirror VM support is enabled by adding a mirrorvcpus=N
> suboption to -smp, which designates the last N vcpus (normally 1)
> as vcpus of the mirror VM.
> 
> Example usage for starting a 4-vcpu guest, of which 1 vcpu is marked as
> a mirror vcpu:
> 
>     qemu-system-x86_64 -smp 4,mirrorvcpus=1 ...
> 
> Ashish Kalra (7):
>   kvm: Add Mirror VM ioctl and enable cap interfaces.
>   kvm: Add Mirror VM support.
>   kvm: create Mirror VM and share primary VM's encryption context.
>   softmmu/cpu: Skip mirror vcpu's for pause, resume and synchronization.
>   kvm/apic: Disable in-kernel APIC support for mirror vcpu's.
>   hw/acpi: disable modern CPU hotplug interface for mirror vcpu's
>   hw/i386/pc: reduce fw_cfg boot cpu count taking into account mirror
>     vcpu's.
> 
> Dov Murik (5):
>   machine: Add mirrorvcpus=N suboption to -smp
>   hw/boards: Add mirror_vcpu flag to CPUArchId
>   hw/i386: Mark mirror vcpus in possible_cpus
>   cpu: Add boolean mirror_vcpu field to CPUState
>   hw/i386: Set CPUState.mirror_vcpu=true for mirror vcpus
> 
> Tobin Feldman-Fitzthum (1):
>   hw/acpi: Don't include mirror vcpus in ACPI tables
> 
>  accel/kvm/kvm-accel-ops.c |  45 ++++++-
>  accel/kvm/kvm-all.c       | 244 +++++++++++++++++++++++++++++++++++++-
>  accel/kvm/kvm-cpus.h      |   2 +
>  hw/acpi/cpu.c             |  21 +++-
>  hw/core/cpu-common.c      |   1 +
>  hw/core/machine.c         |   7 ++
>  hw/i386/acpi-build.c      |   5 +
>  hw/i386/acpi-common.c     |   5 +
>  hw/i386/kvm/apic.c        |  15 +++
>  hw/i386/pc.c              |  10 ++
>  hw/i386/x86.c             |  11 +-
>  include/hw/acpi/cpu.h     |   1 +
>  include/hw/boards.h       |   3 +
>  include/hw/core/cpu.h     |   3 +
>  include/hw/i386/x86.h     |   3 +-
>  include/sysemu/kvm.h      |  15 +++
>  qapi/machine.json         |   5 +-
>  softmmu/cpus.c            |  27 +++++
>  softmmu/vl.c              |   3 +
>  target/i386/kvm/kvm.c     |  42 +++++++
>  20 files changed, 459 insertions(+), 9 deletions(-)
> 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
@ 2021-08-16 14:15   ` Paolo Bonzini
  2021-08-16 13:27 ` [RFC PATCH 02/13] hw/boards: Add mirror_vcpu flag to CPUArchId Ashish Kalra
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Paolo Bonzini @ 2021-08-16 14:15 UTC (permalink / raw)
  To: Ashish Kalra, qemu-devel
  Cc: thomas.lendacky, brijesh.singh, ehabkost, mst, richard.henderson,
	jejb, tobin, dovmurik, frankeh, dgilbert, kvm

On 16/08/21 15:25, Ashish Kalra wrote:
> From: Ashish Kalra<ashish.kalra@amd.com>
> 
> This is an RFC series for Mirror VM support that are
> essentially secondary VMs sharing the encryption context
> (ASID) with a primary VM. The patch-set creates a new
> VM and shares the primary VM's encryption context
> with it using the KVM_CAP_VM_COPY_ENC_CONTEXT_FROM capability.
> The mirror VM uses a separate pair of VM + vCPU file
> descriptors and also uses a simplified KVM run loop,
> for example, it does not support any interrupt vmexit's. etc.
> Currently the mirror VM shares the address space of the
> primary VM.
> 
> The mirror VM can be used for running an in-guest migration
> helper (MH). It also might have future uses for other in-guest
> operations.

Hi,

first of all, thanks for posting this work and starting the discussion.

However, I am not sure if the in-guest migration helper vCPUs should use 
the existing KVM support code.  For example, they probably can just 
always work with host CPUID (copied directly from 
KVM_GET_SUPPORTED_CPUID), and they do not need to interface with QEMU's 
MMIO logic.  They would just sit on a "HLT" instruction and communicate 
with the main migration loop using some kind of standardized ring buffer 
protocol; the migration loop then executes KVM_RUN in order to start the 
processing of pages, and expects a KVM_EXIT_HLT when the VM has nothing 
to do or requires processing on the host.
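
As a rough sketch of that loop (mirror_vcpu_fd, mmap_size and
mh_process_ring() are hypothetical names, not anything in this series;
mmap_size would come from KVM_GET_VCPU_MMAP_SIZE):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    /* Sketch: drive the migration helper vCPU until it halts for good. */
    struct kvm_run *run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, mirror_vcpu_fd, 0);
    for (;;) {
        if (ioctl(mirror_vcpu_fd, KVM_RUN, 0) < 0) {
            break;                        /* e.g. -EINTR */
        }
        if (run->exit_reason == KVM_EXIT_HLT &&
            !mh_process_ring()) {         /* drain the shared ring buffer */
            break;                        /* nothing left to process */
        }
        /* no MMIO or interrupt exits are expected from the helper */
    }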

The migration helper can then also use its own address space, for 
example operating directly on ram_addr_t values with the helper running 
at very high virtual addresses.  Migration code can use a 
RAMBlockNotifier to invoke KVM_SET_USER_MEMORY_REGION on the mirror VM 
(and never enable dirty memory logging on the mirror VM, too, which has 
better performance).
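
For instance (a sketch only; the exact RAMBlockNotifier callback
signature may differ across QEMU versions, and mirror_vm_fd and
mirror_next_slot are hypothetical):

    /* Sketch: map every RAMBlock into the mirror VM 1:1 at its
     * ram_addr_t, and never set KVM_MEM_LOG_DIRTY_PAGES. */
    static void mirror_ram_block_added(RAMBlockNotifier *n, void *host,
                                       size_t size)
    {
        struct kvm_userspace_memory_region mem = {
            .slot            = mirror_next_slot++,
            .flags           = 0,                 /* no dirty logging */
            .guest_phys_addr = qemu_ram_addr_from_host(host),
            .memory_size     = size,
            .userspace_addr  = (uintptr_t)host,
        };
        ioctl(mirror_vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
    }

    static RAMBlockNotifier mirror_ram_notifier = {
        .ram_block_added = mirror_ram_block_added,
    };
    /* registered once with ram_block_notifier_add(&mirror_ram_notifier) */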

With this implementation, the number of mirror vCPUs does not even have 
to be indicated on the command line.  The VM and its vCPUs can simply be 
created when migration starts.  In the SEV-ES case, the guest can even 
provide the VMSA that starts the migration helper.

The disadvantage is that, as you point out, in the future some of the 
infrastructure you introduce might be useful for VMPL0 operation on 
SEV-SNP.  My proposal above might require some code duplication. 
However, it might even be that VMPL0 operation works best with a model 
more similar to my sketch of the migration helper; it's really too early 
to say.

Paolo


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 14:15   ` Paolo Bonzini
@ 2021-08-16 14:23     ` Daniel P. Berrangé
  -1 siblings, 0 replies; 104+ messages in thread
From: Daniel P. Berrangé @ 2021-08-16 14:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ashish Kalra, qemu-devel, thomas.lendacky, brijesh.singh,
	ehabkost, kvm, mst, tobin, jejb, richard.henderson, frankeh,
	dgilbert, dovmurik

On Mon, Aug 16, 2021 at 04:15:46PM +0200, Paolo Bonzini wrote:
> On 16/08/21 15:25, Ashish Kalra wrote:
> > From: Ashish Kalra<ashish.kalra@amd.com>
> > 
> > This is an RFC series for Mirror VM support that are
> > essentially secondary VMs sharing the encryption context
> > (ASID) with a primary VM. The patch-set creates a new
> > VM and shares the primary VM's encryption context
> > with it using the KVM_CAP_VM_COPY_ENC_CONTEXT_FROM capability.
> > The mirror VM uses a separate pair of VM + vCPU file
> > descriptors and also uses a simplified KVM run loop,
> > for example, it does not support any interrupt vmexit's. etc.
> > Currently the mirror VM shares the address space of the
> > primary VM.
> > 
> > The mirror VM can be used for running an in-guest migration
> > helper (MH). It also might have future uses for other in-guest
> > operations.
> 

snip

> With this implementation, the number of mirror vCPUs does not even have to
> be indicated on the command line.  The VM and its vCPUs can simply be
> created when migration starts.  In the SEV-ES case, the guest can even
> provide the VMSA that starts the migration helper.

I don't think management apps will accept that approach when pinning
guests. They will want control over how many extra vCPU threads are
created, what host pCPUs they are pinned to, and what scheduler
policies might be applied to them.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 14:15   ` Paolo Bonzini
  (?)
  (?)
@ 2021-08-16 14:44   ` Ashish Kalra
  2021-08-16 14:58       ` Paolo Bonzini
  -1 siblings, 1 reply; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 14:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

Hello Paolo,

On Mon, Aug 16, 2021 at 04:15:46PM +0200, Paolo Bonzini wrote:
> On 16/08/21 15:25, Ashish Kalra wrote:
> > From: Ashish Kalra<ashish.kalra@amd.com>
> > 
> > This is an RFC series for Mirror VM support that are
> > essentially secondary VMs sharing the encryption context
> > (ASID) with a primary VM. The patch-set creates a new
> > VM and shares the primary VM's encryption context
> > with it using the KVM_CAP_VM_COPY_ENC_CONTEXT_FROM capability.
> > The mirror VM uses a separate pair of VM + vCPU file
> > descriptors and also uses a simplified KVM run loop,
> > for example, it does not support any interrupt vmexit's. etc.
> > Currently the mirror VM shares the address space of the
> > primary VM.
> > 
> > The mirror VM can be used for running an in-guest migration
> > helper (MH). It also might have future uses for other in-guest
> > operations.
> 
> Hi,
> 
> first of all, thanks for posting this work and starting the discussion.
> 
> However, I am not sure if the in-guest migration helper vCPUs should use the
> existing KVM support code.  For example, they probably can just always work
> with host CPUID (copied directly from KVM_GET_SUPPORTED_CPUID), and they do
> not need to interface with QEMU's MMIO logic.  They would just sit on a
> "HLT" instruction and communicate with the main migration loop using some
> kind of standardized ring buffer protocol; the migration loop then executes
> KVM_RUN in order to start the processing of pages, and expects a
> KVM_EXIT_HLT when the VM has nothing to do or requires processing on the
> host.
> 

I am not sure we can really do without QEMU's MMIO logic. I think that once
the mirror VM starts booting and running the UEFI code, it might be only
during the PEI or DXE phase that it will actually start running the MH code,
so the mirror VM probably still needs to handle KVM_EXIT_IO when the SEC
phase does I/O; I can see PIC accesses and Debug Agent initialization in the
SEC startup code.
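
If so, the mirror vCPU run loop would need at least something along these
lines (a sketch only; handle_mirror_io() is a hypothetical shim around
QEMU's I/O dispatch, not code from this series):

    /* Sketch: forward port I/O from the SEC phase; the data to read or
     * write lives inside the shared kvm_run mapping. */
    case KVM_EXIT_IO: {
        uint8_t *data = (uint8_t *)run + run->io.data_offset;
        handle_mirror_io(run->io.port, data, run->io.direction,
                         run->io.size, run->io.count);
        break;
    }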

Additionally, this still requires the CPUState{..} structure and the backing
"X86CPU" structure, for example as part of kvm_arch_post_run() to get
the MemTxAttrs needed by kvm_handle_io().

Thanks,
Ashish

> The migration helper can then also use its own address space, for example
> operating directly on ram_addr_t values with the helper running at very high
> virtual addresses.  Migration code can use a RAMBlockNotifier to invoke
> KVM_SET_USER_MEMORY_REGION on the mirror VM (and never enable dirty memory
> logging on the mirror VM, too, which has better performance).
> 
> With this implementation, the number of mirror vCPUs does not even have to
> be indicated on the command line.  The VM and its vCPUs can simply be
> created when migration starts.  In the SEV-ES case, the guest can even
> provide the VMSA that starts the migration helper.
> 
> The disadvantage is that, as you point out, in the future some of the
> infrastructure you introduce might be useful for VMPL0 operation on SEV-SNP.
> My proposal above might require some code duplication. However, it might
> even be that VMPL0 operation works best with a model more similar to my
> sketch of the migration helper; it's really too early to say.
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 14:44   ` Ashish Kalra
@ 2021-08-16 14:58       ` Paolo Bonzini
  0 siblings, 0 replies; 104+ messages in thread
From: Paolo Bonzini @ 2021-08-16 14:58 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: qemu-devel, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

On 16/08/21 16:44, Ashish Kalra wrote:
> I think that once the mirror VM starts booting and running the UEFI
> code, it might be only during the PEI or DXE phase that it will
> actually start running the MH code, so the mirror VM probably still
> needs to handle KVM_EXIT_IO when the SEC phase does I/O; I can see PIC
> accesses and Debug Agent initialization in the SEC startup code.

That may be a design of the migration helper code that you were working
with, but it's not necessary.

The migration helper can be just some code that the guest "donates" to
the host.  The entry point need not be the usual 0xfffffff0; it can be
booted directly in 64-bit mode with a CR3 and EIP that the guest
provides to the host---for example with a UEFI GUIDed structure.
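
On the host side that could be as small as the following sketch
(guest_cr3/guest_rip stand for whatever the guest-provided structure
contains; in the SEV-ES case the register state would instead arrive via
the guest-provided VMSA):

    /* Sketch: start the helper vCPU directly in 64-bit mode. */
    struct kvm_sregs sregs;
    struct kvm_regs regs = { .rip = guest_rip, .rflags = 2 };

    ioctl(mirror_vcpu_fd, KVM_GET_SREGS, &sregs);
    sregs.cr3   = guest_cr3;
    sregs.cr0  |= 0x80000001u;    /* PG | PE */
    sregs.cr4  |= 0x20;           /* PAE */
    sregs.efer |= 0x500;          /* LMA | LME */
    sregs.cs.l  = 1;              /* long-mode code segment */
    sregs.cs.db = 0;
    ioctl(mirror_vcpu_fd, KVM_SET_SREGS, &sregs);
    ioctl(mirror_vcpu_fd, KVM_SET_REGS, &regs);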

In fact, the migration helper can run even before the guest has booted
and while the guest is paused, so I don't think that it is possible to
make use of any device emulation code in it.

Paolo


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 14:23     ` Daniel P. Berrangé
@ 2021-08-16 15:00       ` Paolo Bonzini
  -1 siblings, 0 replies; 104+ messages in thread
From: Paolo Bonzini @ 2021-08-16 15:00 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: thomas.lendacky, Ashish Kalra, brijesh.singh, ehabkost, kvm, mst,
	richard.henderson, jejb, tobin, qemu-devel, dgilbert, frankeh,
	dovmurik

On 16/08/21 16:23, Daniel P. Berrangé wrote:
> snip
> 
>> With this implementation, the number of mirror vCPUs does not even have to
>> be indicated on the command line.  The VM and its vCPUs can simply be
>> created when migration starts.  In the SEV-ES case, the guest can even
>> provide the VMSA that starts the migration helper.
>
> I don't think management apps will accept that approach when pinning
> guests. They will want control over how many extra vCPU threads are
> created, what host pCPUs they are pinned to, and what scheduler
> policies might be applied to them.

That doesn't require creating the migration threads at startup, or 
making them vCPU threads, does it?

The migration helper is guest code that is run within the context of the 
migration thread in order to decrypt/re-encrypt the pages (using a 
different tweak that is based on e.g. the ram_addr_t rather than the 
HPA).  How does libvirt pin migration thread(s) currently?

Paolo


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 14:58       ` Paolo Bonzini
  (?)
@ 2021-08-16 15:13       ` Ashish Kalra
  2021-08-16 15:38           ` Paolo Bonzini
  -1 siblings, 1 reply; 104+ messages in thread
From: Ashish Kalra @ 2021-08-16 15:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

Hello Paolo,

On Mon, Aug 16, 2021 at 04:58:02PM +0200, Paolo Bonzini wrote:
> On 16/08/21 16:44, Ashish Kalra wrote:
> > I think that once the mirror VM starts booting and running the UEFI
> > code, it might be only during the PEI or DXE phase that it will
> > actually start running the MH code, so the mirror VM probably still
> > needs to handle KVM_EXIT_IO when the SEC phase does I/O; I can see PIC
> > accesses and Debug Agent initialization in the SEC startup code.
> 
> That may be a design of the migration helper code that you were working
> with, but it's not necessary.
> 
Actually, my comments are about more generic MH code.

> The migration helper can be just some code that the guest "donates" to
> the host.  The entry point need not be the usual 0xfffffff0; it can be
> booted directly in 64-bit mode with a CR3 and EIP that the guest
> provides to the guest---for example with a UEFI GUIDed structure.

Yes, this is consistent with the MH code we are currently testing; it
boots directly into 64-bit mode. This is what Tobin's response also
points out.

Thanks,
Ashish
> 
> In fact, the migration helper can run even before the guest has booted
> and while the guest is paused, so I don't think that it is possible to
> make use of any device emulation code in it.
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 15:00       ` Paolo Bonzini
@ 2021-08-16 15:16         ` Daniel P. Berrangé
  -1 siblings, 0 replies; 104+ messages in thread
From: Daniel P. Berrangé @ 2021-08-16 15:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: thomas.lendacky, Ashish Kalra, brijesh.singh, ehabkost, kvm, mst,
	richard.henderson, jejb, tobin, qemu-devel, dgilbert, frankeh,
	dovmurik

On Mon, Aug 16, 2021 at 05:00:21PM +0200, Paolo Bonzini wrote:
> On 16/08/21 16:23, Daniel P. Berrangé wrote:
> > snip
> > 
> > > With this implementation, the number of mirror vCPUs does not even have to
> > > be indicated on the command line.  The VM and its vCPUs can simply be
> > > created when migration starts.  In the SEV-ES case, the guest can even
> > > provide the VMSA that starts the migration helper.
> > 
> > I don't think management apps will accept that approach when pinning
> > guests. They will want control over how many extra vCPU threads are
> > created, what host pCPUs they are pinned to, and what scheduler
> > policies might be applied to them.
> 
> That doesn't require creating the migration threads at startup, or making
> them vCPU threads, does it?
> 
> The migration helper is guest code that is run within the context of the
> migration thread in order to decrypt/re-encrypt the pages (using a different
> tweak that is based on e.g. the ram_addr_t rather than the HPA).  How does
> libvirt pin migration thread(s) currently?

I don't think we do explicit pinning of migration related threads right
now, which means they'll inherit characteristics of whichever thread
spawns the transient migration thread.  If the mgmt app has pinned the
emulator threads to a single CPU, then creating many migration threads
is a waste of time as they'll compete with each other.

It wouldn't be needed to create migration threads at startup - we should
just think about how we would identify and control them when created
at runtime. The complexity here is a trust issue - once guest code has
been run, we can't trust what QMP tells us - so I'm not sure how we
would learn what PIDs are associated with the transiently created
migration threads, in order to set their properties.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 15:16         ` Daniel P. Berrangé
@ 2021-08-16 15:35           ` Paolo Bonzini
  -1 siblings, 0 replies; 104+ messages in thread
From: Paolo Bonzini @ 2021-08-16 15:35 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: thomas.lendacky, Ashish Kalra, brijesh.singh, ehabkost, kvm, mst,
	richard.henderson, jejb, tobin, qemu-devel, dgilbert, frankeh,
	dovmurik

On 16/08/21 17:16, Daniel P. Berrangé wrote:
> I woudn't be needed to create migration threads at startup - we should
> just think about how we would identify and control them when created
> at runtime. The complexity here is a trust issue - once guest code has
> been run, we can't trust what QMP tells us - so I'm not sure how we
> would learn what PIDs are associated with the transiently created
> migration threads, in order to set their properties.

That would apply anyway to any kind of thread though.  It doesn't matter 
whether the migration thread runs host or (mostly) guest code.

Paolo


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 15:13       ` Ashish Kalra
@ 2021-08-16 15:38           ` Paolo Bonzini
  0 siblings, 0 replies; 104+ messages in thread
From: Paolo Bonzini @ 2021-08-16 15:38 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: qemu-devel, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

On 16/08/21 17:13, Ashish Kalra wrote:
>>> I think that once the mirror VM starts booting and running the UEFI
>>> code, it might be only during the PEI or DXE phase that it will
>>> actually start running the MH code, so the mirror VM probably still
>>> needs to handle KVM_EXIT_IO when the SEC phase does I/O; I can see PIC
>>> accesses and Debug Agent initialization in the SEC startup code.
>> That may be a design of the migration helper code that you were working
>> with, but it's not necessary.
>>
> Actually my comments are about a more generic MH code.

I don't think that would be a good idea; designing QEMU's migration 
helper interface to be as constrained as possible is a good thing.  The 
migration helper is extremely security sensitive code, so it should not 
expose itself to the attack surface of the whole of QEMU.

Paolo


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 15:38           ` Paolo Bonzini
@ 2021-08-16 15:48             ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 104+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-16 15:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: thomas.lendacky, Ashish Kalra, brijesh.singh, ehabkost, kvm, mst,
	tobin, jejb, richard.henderson, qemu-devel, frankeh, dovmurik

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 16/08/21 17:13, Ashish Kalra wrote:
> > > > I think that once the mirror VM starts booting and running the UEFI
> > > > code, it might be only during the PEI or DXE phase that it will
> > > > actually start running the MH code, so the mirror VM probably still
> > > > needs to handle KVM_EXIT_IO when the SEC phase does I/O; I can see PIC
> > > > accesses and Debug Agent initialization in the SEC startup code.
> > > That may be a design of the migration helper code that you were working
> > > with, but it's not necessary.
> > > 
> > Actually my comments are about a more generic MH code.
> 
> I don't think that would be a good idea; designing QEMU's migration helper
> interface to be as constrained as possible is a good thing.  The migration
> helper is extremely security sensitive code, so it should not expose itself
> to the attack surface of the whole of QEMU.

It's also odd in that it's provided by the guest and acting on behalf of
the migration code; that's an unusually trusting relationship.

Dave

> Paolo
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 14:15   ` Paolo Bonzini
@ 2021-08-16 17:23     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 104+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-16 17:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ashish Kalra, qemu-devel, thomas.lendacky, brijesh.singh,
	ehabkost, mst, richard.henderson, jejb, tobin, dovmurik, frankeh,
	kvm

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 16/08/21 15:25, Ashish Kalra wrote:
> > From: Ashish Kalra<ashish.kalra@amd.com>
> > 
> > This is an RFC series for Mirror VM support that are
> > essentially secondary VMs sharing the encryption context
> > (ASID) with a primary VM. The patch-set creates a new
> > VM and shares the primary VM's encryption context
> > with it using the KVM_CAP_VM_COPY_ENC_CONTEXT_FROM capability.
> > The mirror VM uses a separate pair of VM + vCPU file
> > descriptors and also uses a simplified KVM run loop,
> > for example, it does not support any interrupt vmexit's. etc.
> > Currently the mirror VM shares the address space of the
> > primary VM.
> > 
> > The mirror VM can be used for running an in-guest migration
> > helper (MH). It also might have future uses for other in-guest
> > operations.
> 
> Hi,
> 
> first of all, thanks for posting this work and starting the discussion.
> 
> However, I am not sure if the in-guest migration helper vCPUs should use the
> existing KVM support code.  For example, they probably can just always work
> with host CPUID (copied directly from KVM_GET_SUPPORTED_CPUID),

Doesn't at least one form of SEV have some masking of CPUID that's
visible to the guest, so perhaps we have to match the main vCPUs' idea of
CPUIDs?

>  and they do
> not need to interface with QEMU's MMIO logic.  They would just sit on a
> "HLT" instruction and communicate with the main migration loop using some
> kind of standardized ring buffer protocol; the migration loop then executes
> KVM_RUN in order to start the processing of pages, and expects a
> KVM_EXIT_HLT when the VM has nothing to do or requires processing on the
> host.
> 
> The migration helper can then also use its own address space, for example
> operating directly on ram_addr_t values with the helper running at very high
> virtual addresses.  Migration code can use a RAMBlockNotifier to invoke
> KVM_SET_USER_MEMORY_REGION on the mirror VM (and never enable dirty memory
> logging on the mirror VM, too, which has better performance).

How does the use of a very high virtual address help ?

> With this implementation, the number of mirror vCPUs does not even have to
> be indicated on the command line.  The VM and its vCPUs can simply be
> created when migration starts.  In the SEV-ES case, the guest can even
> provide the VMSA that starts the migration helper.
> 
> The disadvantage is that, as you point out, in the future some of the
> infrastructure you introduce might be useful for VMPL0 operation on SEV-SNP.
> My proposal above might require some code duplication. However, it might
> even be that VMPL0 operation works best with a model more similar to my
> sketch of the migration helper; it's really too early to say.
> 

Dave

> Paolo
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 17:23     ` Dr. David Alan Gilbert
  (?)
@ 2021-08-16 20:53     ` Paolo Bonzini
  -1 siblings, 0 replies; 104+ messages in thread
From: Paolo Bonzini @ 2021-08-16 20:53 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Thomas Lendacky, Ashish Kalra, Brijesh Singh, Habkost, Eduardo,
	kvm, S. Tsirkin, Michael, Tobin Feldman-Fitzthum,
	James E . J . Bottomley, Richard Henderson, qemu-devel,
	Hubertus Franke, Dov Murik

On Mon, Aug 16, 2021 at 19:23, Dr. David Alan Gilbert <dgilbert@redhat.com>
wrote:

> > However, I am not sure if the in-guest migration helper vCPUs should use the
> > existing KVM support code.  For example, they probably can just always work
> > with host CPUID (copied directly from KVM_GET_SUPPORTED_CPUID),
>
> Doesn't at least one form of SEV have some masking of CPUID that's
> visible to the guest, so perhaps we have to match the main vCPUs' idea of
> CPUIDs?
>

I don't think we do. Whatever startup code runs on the migration helper can
look at CPUID for purposes such as enabling AES instructions. It's a
separate VM and one that will never be migrated (it's started separately on
the source and destination).

> > The migration helper can then also use its own address space, for example
> > operating directly on ram_addr_t values with the helper running at very high
> > virtual addresses.  Migration code can use a RAMBlockNotifier to invoke
> > KVM_SET_USER_MEMORY_REGION on the mirror VM (and never enable dirty memory
> > logging on the mirror VM, too, which has better performance).
>
> How does the use of a very high virtual address help ?
>

Sorry, I meant physical addresses: the code and any dedicated
migration helper RAM (including communication structures) would be out of
the range used by ram_addr_ts. (The virtual addresses instead can be chosen
by the helper, since QEMU knows nothing about them).

Paolo



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 01/13] machine: Add mirrorvcpus=N suboption to -smp
  2021-08-16 13:26 ` [RFC PATCH 01/13] machine: Add mirrorvcpus=N suboption to -smp Ashish Kalra
@ 2021-08-16 21:23     ` Eric Blake
  0 siblings, 0 replies; 104+ messages in thread
From: Eric Blake @ 2021-08-16 21:23 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: qemu-devel, pbonzini, thomas.lendacky, brijesh.singh, ehabkost,
	mst, richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert,
	kvm

On Mon, Aug 16, 2021 at 01:26:45PM +0000, Ashish Kalra wrote:
> From: Dov Murik <dovmurik@linux.vnet.ibm.com>
> 
> Add a notion of mirror vcpus to CpuTopology, which will allow to
> designate a few vcpus (normally 1) for running the guest
> migration handler (MH).
> 
> Example usage for starting a 4-vcpu guest, of which 1 vcpu is marked as
> mirror vcpu.
> 
>     qemu-system-x86_64 -smp 4,mirrorvcpus=1 ...
> 
> Signed-off-by: Dov Murik <dovmurik@linux.vnet.ibm.com>
> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---

> +++ b/qapi/machine.json
> @@ -1303,6 +1303,8 @@
>  #
>  # @maxcpus: maximum number of hotpluggable virtual CPUs in the virtual machine
>  #
> +# @mirrorvcpus: maximum number of mirror virtual CPUs in the virtual machine
> +#

Needs a '(since 6.2)' tag.
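
i.e. something along the lines of:

   # @mirrorvcpus: maximum number of mirror virtual CPUs in the virtual
   #               machine (since 6.2)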

>  # Since: 6.1
>  ##
>  { 'struct': 'SMPConfiguration', 'data': {
> @@ -1311,4 +1313,5 @@
>       '*dies': 'int',
>       '*cores': 'int',
>       '*threads': 'int',
> -     '*maxcpus': 'int' } }
> +     '*maxcpus': 'int',
> +     '*mirrorvcpus': 'int' } }

Is this really the right place to be adding it?  The rest of this
struct feels like things that advertise what bare metal can do, and
therefore what we are emulating.  But bare metal can't do mirrors -
that's something that is completely in the realm of emulation only.
If I understand the cover letter, the guest shouldn't be able to
detect that mirroring exists, which is different from how the guest
DOES detect how many dies, cores, and threads are available to use.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
@ 2021-08-16 23:53   ` Steve Rutherford
  2021-08-16 13:27 ` [RFC PATCH 02/13] hw/boards: Add mirror_vcpu flag to CPUArchId Ashish Kalra
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 104+ messages in thread
From: Steve Rutherford @ 2021-08-16 23:53 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: qemu-devel, pbonzini, thomas.lendacky, brijesh.singh, ehabkost,
	mst, richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert,
	kvm

On Mon, Aug 16, 2021 at 6:37 AM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> This is an RFC series for Mirror VM support that are
> essentially secondary VMs sharing the encryption context
> (ASID) with a primary VM. The patch-set creates a new
> VM and shares the primary VM's encryption context
> with it using the KVM_CAP_VM_COPY_ENC_CONTEXT_FROM capability.
> The mirror VM uses a separate pair of VM + vCPU file
> descriptors and also uses a simplified KVM run loop,
> for example, it does not support any interrupt vmexit's. etc.
> Currently the mirror VM shares the address space of the
> primary VM.
Sharing an address space is incompatible with post-copy migration via
UFFD on the target side. I'll be honest and say I'm not deeply
familiar with QEMU's implementation of post-copy, but I imagine there
must be a mapping of guest memory that doesn't fault: on the target
side (or on both sides), the migration helper will need to have its
view of guest memory go through that mapping, or a similar mapping.
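
One way to get such a no-fault view (just a sketch of the idea, not what
QEMU does today) would be to back guest RAM with a memfd and map it twice,
registering only one of the mappings with userfaultfd:

    /* Sketch: two views of the same pages; only guest_view is armed
     * with UFFDIO_REGISTER, helper_view never faults. */
    int fd = memfd_create("guest-ram", 0);
    ftruncate(fd, ram_size);
    void *guest_view  = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);
    void *helper_view = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);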

Separately, I'm a little wary of leaving the migration helper mapped
into the shared address space as writable. Since the migration threads
will be executing guest-owned code, the guest could use these threads
to do whatever it pleases (including getting free cycles). The
migration helper's code needs to be trusted by both the host and the
guest. Making it non-writable, sourced by the host, and attested by
the hardware would mitigate these concerns. The host could also try to
monitor for malicious use of migration threads, but that would be
pretty finicky.  The host could competitively schedule the migration
helper vCPUs with the guest vCPUs, but I'd imagine that wouldn't be
the best for guest performance.


--Steve

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 23:53   ` Steve Rutherford
@ 2021-08-17  7:05     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 104+ messages in thread
From: Michael S. Tsirkin @ 2021-08-17  7:05 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Ashish Kalra, qemu-devel, pbonzini, thomas.lendacky,
	brijesh.singh, ehabkost, richard.henderson, jejb, tobin,
	dovmurik, frankeh, dgilbert, kvm

On Mon, Aug 16, 2021 at 04:53:17PM -0700, Steve Rutherford wrote:
> Separately, I'm a little wary of leaving the migration helper mapped
> into the shared address space as writable. Since the migration threads
> will be executing guest-owned code, the guest could use these threads
> to do whatever it pleases (including getting free cycles). The
> migration helper's code needs to be trusted by both the host and the
> guest. Making it non-writable, sourced by the host, and attested by
> the hardware would mitigate these concerns.

Well it's an ABI to maintain against *both* guest and host then.

And a separate attestation isn't making things easier to manage.

I feel the guest risks much more than the hypervisor here:
the hypervisor at worst is giving out free cycles, and that can
be mitigated, so it makes sense to have the guest be in control.

How about we source it from the guest but write-protect it on the
hypervisor side? It could include a signature that the hypervisor could
verify, which would be more flexible than hardware attestation.
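
A sketch of the write-protect half, assuming the helper sits in its own
memslot of the mirror VM (slot number and variable names hypothetical):

    /* Sketch: after copying in and verifying the guest-supplied helper,
     * expose it to the mirror VM read-only so the helper threads cannot
     * rewrite their own code; guest writes then exit to userspace. */
    struct kvm_userspace_memory_region mem = {
        .slot            = MH_CODE_SLOT,
        .flags           = KVM_MEM_READONLY,
        .guest_phys_addr = mh_gpa,
        .memory_size     = mh_len,
        .userspace_addr  = (uintptr_t)mh_host_copy,
    };
    ioctl(mirror_vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);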

-- 
MST


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 23:53   ` Steve Rutherford
@ 2021-08-17  8:38     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 104+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-17  8:38 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Ashish Kalra, qemu-devel, pbonzini, thomas.lendacky,
	brijesh.singh, ehabkost, mst, richard.henderson, jejb, tobin,
	dovmurik, frankeh, kvm

* Steve Rutherford (srutherford@google.com) wrote:
> On Mon, Aug 16, 2021 at 6:37 AM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> >
> > From: Ashish Kalra <ashish.kalra@amd.com>
> >
> > This is an RFC series for Mirror VM support that are
> > essentially secondary VMs sharing the encryption context
> > (ASID) with a primary VM. The patch-set creates a new
> > VM and shares the primary VM's encryption context
> > with it using the KVM_CAP_VM_COPY_ENC_CONTEXT_FROM capability.
> > The mirror VM uses a separate pair of VM + vCPU file
> > descriptors and also uses a simplified KVM run loop,
> > for example, it does not support any interrupt vmexit's. etc.
> > Currently the mirror VM shares the address space of the
> > primary VM.
> Sharing an address space is incompatible with post-copy migration via
> UFFD on the target side. I'll be honest and say I'm not deeply
> familiar with QEMU's implementation of post-copy, but I imagine there
> must be a mapping of guest memory that doesn't fault: on the target
> side (or on both sides), the migration helper will need to have it's
> view of guest memory go through that mapping, or a similar mapping.

Ignoring SEV, our postcopy currently has a single mapping which is
guarded by UFFD. There is no 'no-fault' mapping.  We use the uffd ioctl
to 'place' a page into that space when we receive it.
But yes, I guess that can't work with SEV; as you say, if the helper
has to do the write, it'll have to do it into a shadow that it can write
to, even though the rest of the guest must userfault on access.
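
(For reference, the 'place' operation is a single userfaultfd ioctl.
A minimal sketch of the idea, not QEMU's actual code; the names are
illustrative:)

#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Atomically copy a received page into the UFFD-guarded guest mapping
 * and wake whatever was blocked faulting on it. */
static int place_page(int uffd, uint64_t guest_addr, void *incoming,
                      uint64_t page_size)
{
    struct uffdio_copy copy = {
        .dst = guest_addr,                    /* faulting page in guest RAM */
        .src = (uint64_t)(uintptr_t)incoming, /* page from the migration stream */
        .len = page_size,
        .mode = 0,                            /* 0 => wake the waiters */
    };

    return ioctl(uffd, UFFDIO_COPY, &copy);
}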

> Separately, I'm a little weary of leaving the migration helper mapped
> into the shared address space as writable. Since the migration threads
> will be executing guest-owned code, the guest could use these threads
> to do whatever it pleases (including getting free cycles).

Agreed.

> The
> migration helper's code needs to be trusted by both the host and the
> guest. 


> Making it non-writable, sourced by the host, and attested by
> the hardware would mitigate these concerns.

Some people worry about having the host supply the guest firmware,
because they worry they'll be railroaded into using something they
don't actually trust, and if their aim in using SEV etc. is to avoid
trusting the cloud owner, that breaks that.

So for them, I think they want the migration helper to be trusted by the
guest and tolerated by the host.

> The host could also try to
> monitor for malicious use of migration threads, but that would be
> pretty finicky.  The host could competitively schedule the migration
> helper vCPUs with the guest vCPUs, but I'd imagine that wouldn't be
> the best for guest performance.

The CPU usage has to go somewhere.

Dave

> 
> --Steve
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-17  8:38     ` Dr. David Alan Gilbert
@ 2021-08-17 14:08     ` Ashish Kalra
  -1 siblings, 0 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-17 14:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Steve Rutherford, qemu-devel, pbonzini, thomas.lendacky,
	brijesh.singh, ehabkost, mst, richard.henderson, jejb, tobin,
	dovmurik, frankeh, kvm

Hello Dave, Steve,

On Tue, Aug 17, 2021 at 09:38:24AM +0100, Dr. David Alan Gilbert wrote:
> * Steve Rutherford (srutherford@google.com) wrote:
> > On Mon, Aug 16, 2021 at 6:37 AM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> > >
> > > From: Ashish Kalra <ashish.kalra@amd.com>
> > >
> > > This is an RFC series for Mirror VM support that are
> > > essentially secondary VMs sharing the encryption context
> > > (ASID) with a primary VM. The patch-set creates a new
> > > VM and shares the primary VM's encryption context
> > > with it using the KVM_CAP_VM_COPY_ENC_CONTEXT_FROM capability.
> > > The mirror VM uses a separate pair of VM + vCPU file
> > > descriptors and also uses a simplified KVM run loop,
> > > for example, it does not support any interrupt vmexit's. etc.
> > > Currently the mirror VM shares the address space of the
> > > primary VM.
> > Sharing an address space is incompatible with post-copy migration via
> > UFFD on the target side. I'll be honest and say I'm not deeply
> > familiar with QEMU's implementation of post-copy, but I imagine there
> > must be a mapping of guest memory that doesn't fault: on the target
> > side (or on both sides), the migration helper will need to have it's
> > view of guest memory go through that mapping, or a similar mapping.
> 
> Ignoring SEV, our postcopy currently has a single mapping which is
> guarded by UFFD. There is no 'no-fault' mapping.  We use the uffd ioctl
> to 'place' a page into that space when we receive it.
> But yes, I guess that can't work with SEV; as you say, if the helper
> has to do the write, it'll have to do it into a shadow that it can write
> to, even though the rest of the guest must userfault on access.
> 

I assume that the MH will be sharing the address space of the source
VM; this is compatible with host-based live migration, where the source
VM runs in the context of the QEMU process (with all its vCPU threads
and the migration thread).

Surely sharing the address space on the target side will be incompatible
with post-copy migration, as post-copy needs to set up UFFD mappings in
order to start running the target VM while migration is still active.

But will the UFFD mappings only be set up while post-copy migration is
active? Won't the target VM end up with page mappings similar to the
source VM's once migration completes?

Thanks,
Ashish

> > Separately, I'm a little weary of leaving the migration helper mapped
> > into the shared address space as writable. Since the migration threads
> > will be executing guest-owned code, the guest could use these threads
> > to do whatever it pleases (including getting free cycles).
> 
> Agreed.
> 
> > The
> > migration helper's code needs to be trusted by both the host and the
> > guest. 
> 
> 
> > Making it non-writable, sourced by the host, and attested by
> > the hardware would mitigate these concerns.
> 
> Some people worry about having the host supply the guest firmware,
> because they worry they'll be railroaded into using something they
> don't actually trust, and if their aim in using SEV etc. is to avoid
> trusting the cloud owner, that breaks that.
> 
> So for them, I think they want the migration helper to be trusted by the
> guest and tolerated by the host.
> 
> > The host could also try to
> > monitor for malicious use of migration threads, but that would be
> > pretty finicky.  The host could competitively schedule the migration
> > helper vCPUs with the guest vCPUs, but I'd imagine that wouldn't be
> > the best for guest performance.
> 
> The CPU usage has to go somewhere.
> 
> Dave
> 
> > 
> > --Steve
> > 
> -- 
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 23:53   ` Steve Rutherford
@ 2021-08-17 16:32     ` Paolo Bonzini
  -1 siblings, 0 replies; 104+ messages in thread
From: Paolo Bonzini @ 2021-08-17 16:32 UTC (permalink / raw)
  To: Steve Rutherford, Ashish Kalra
  Cc: qemu-devel, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

On 17/08/21 01:53, Steve Rutherford wrote:
> Separately, I'm a little weary of leaving the migration helper mapped
> into the shared address space as writable.

A related question here is what the API should be for how the migration 
helper sees the memory in both physical and virtual address.

First of all, I would like the addresses passed to and from the 
migration helper to *not* be guest physical addresses (this is what I 
referred to as QEMU's ram_addr_t in other messages).  The reason is that 
some unmapped memory regions, such as virtio-mem hotplugged memory, 
would still have to be transferred and could be encrypted.  While the 
guest->host hypercall interface uses guest physical addresses to 
communicate which pages are encrypted, the host can do the 
GPA->ram_addr_t conversion and remember the encryption status of 
currently-unmapped regions.

This poses a problem, in that the guest needs to prepare the page tables 
for the migration helper and those need to use the migration helper's 
physical address space.

There's three possibilities for this:

1) the easy one: the bottom 4G of guest memory are mapped in the mirror 
VM 1:1.  The ram_addr_t-based addresses are shifted by either 4G or a 
huge value such as 2^42 (MAXPHYADDR - physical address reduction - 1). 
This even lets the migration helper reuse the OVMF runtime services 
memory map (but be careful about thread safety...).

2) the more future-proof one.  Here, the migration helper tells QEMU 
which area to copy from the guest to the mirror VM, as a (main GPA, 
length, mirror GPA) tuple.  This could happen for example the first time 
the guest writes 1 to MSR_KVM_MIGRATION_CONTROL.  When migration starts, 
QEMU uses this information to issue KVM_SET_USER_MEMORY_REGION 
accordingly.  The page tables are built for this (usually very high) 
mirror GPA and the migration helper operates in a completely separate 
address space.  However, the backing memory would still be shared 
between the main and mirror VMs.  I am saying this is more future proof 
because we have more flexibility in setting up the physical address 
space of the mirror VM (a sketch of the memslot setup follows below).

3) the paranoid one, which I think is what you hint at above: this is an 
extension of (2), where userspace invokes the PSP send/receive API to 
copy the small requested area of the main VM into the mirror VM.  The 
mirror VM code and data are completely separate from the main VM.  All 
that the mirror VM shares is the ram_addr_t data.  Though I am not even 
sure it is possible to use the send/receive API this way...
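
To make (2) concrete, the QEMU-side memslot setup could look roughly
like the sketch below, once the (main GPA, length, mirror GPA) tuple is
known.  The fd and variable names are assumptions for illustration, not
code from this series:

#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Expose the requested guest area to the mirror VM at a (usually very
 * high) mirror GPA.  The backing host memory (hva) is shared with the
 * main VM; only the guest-physical address differs. */
static int map_area_into_mirror(int mirror_vm_fd, void *hva,
                                uint64_t mirror_gpa, uint64_t len)
{
    struct kvm_userspace_memory_region mr;

    memset(&mr, 0, sizeof(mr));
    mr.slot = 0;
    mr.guest_phys_addr = mirror_gpa;
    mr.memory_size = len;
    mr.userspace_addr = (uint64_t)(uintptr_t)hva;

    return ioctl(mirror_vm_fd, KVM_SET_USER_MEMORY_REGION, &mr);
}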

What do you think?

Paolo


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-17 16:32     ` Paolo Bonzini
@ 2021-08-17 20:50       ` Tobin Feldman-Fitzthum
  -1 siblings, 0 replies; 104+ messages in thread
From: Tobin Feldman-Fitzthum @ 2021-08-17 20:50 UTC (permalink / raw)
  To: Paolo Bonzini, Steve Rutherford, Ashish Kalra
  Cc: thomas.lendacky, brijesh.singh, ehabkost, kvm, mst, tobin, jejb,
	richard.henderson, qemu-devel, dgilbert, frankeh, dovmurik


On 8/17/21 12:32 PM, Paolo Bonzini wrote:
> On 17/08/21 01:53, Steve Rutherford wrote:
>> Separately, I'm a little weary of leaving the migration helper mapped
>> into the shared address space as writable.
>
> A related question here is what the API should be for how the 
> migration helper sees the memory in both physical and virtual address.
>
> First of all, I would like the addresses passed to and from the 
> migration helper to *not* be guest physical addresses (this is what I 
> referred to as QEMU's ram_addr_t in other messages).  The reason is 
> that some unmapped memory regions, such as virtio-mem hotplugged 
> memory, would still have to be transferred and could be encrypted.  
> While the guest->host hypercall interface uses guest physical 
> addresses to communicate which pages are encrypted, the host can do 
> the GPA->ram_addr_t conversion and remember the encryption status of 
> currently-unmapped regions.
>
> This poses a problem, in that the guest needs to prepare the page 
> tables for the migration helper and those need to use the migration 
> helper's physical address space.
>
> There's three possibilities for this:
>
> 1) the easy one: the bottom 4G of guest memory are mapped in the 
> mirror VM 1:1.  The ram_addr_t-based addresses are shifted by either 
> 4G or a huge value such as 2^42 (MAXPHYADDR - physical address 
> reduction - 1). This even lets the migration helper reuse the OVMF 
> runtime services memory map (but be careful about thread safety...).

This is essentially what we do in our prototype, although we have an 
even simpler approach. We have a 1:1 mapping that maps an address to 
itself with the cbit set. During migration, QEMU asks the migration 
handler to import/export encrypted pages and provides the GPA for said 
page. Since the migration handler only exports/imports encrypted pages, 
we can have the cbit set for every page in our mapping. We can still use 
OVMF functions with these mappings because they are on encrypted pages. 
The MH does need to use a few shared pages (to communicate with QEMU, 
for instance), so we have another mapping without the cbit that is at a 
large offset.

I think this is basically equivalent to what you suggest. As you point 
out above, this approach does require that any page that will be 
exported/imported by the MH is mapped in the guest. Is this a bad 
assumption? The VMSA for SEV-ES is one example of a region that is 
encrypted but not mapped in the guest (the PSP handles it directly). We 
have been planning to map the VMSA into the guest to support migration 
with SEV-ES (along with other changes).
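
As an illustration, a 2M page-table entry in such a mapping just maps
the address to itself with the C-bit ORed in.  This is a sketch, not
our actual OVMF code; the C-bit position is discovered at runtime from
CPUID 0x8000001F EBX[5:0], and 51 below is only an example:

#include <stdint.h>

#define CBIT_POS    51ULL          /* example; read from CPUID 0x8000001F */
#define CBIT        (1ULL << CBIT_POS)
#define PDE_PRESENT (1ULL << 0)
#define PDE_WRITE   (1ULL << 1)
#define PDE_PS      (1ULL << 7)    /* 2M page */

/* Identity-map a 2M-aligned GPA to itself with the C-bit set, so that
 * accesses through this mapping are treated as encrypted. */
static inline uint64_t encrypted_identity_pde(uint64_t gpa_2m)
{
    return gpa_2m | CBIT | PDE_PRESENT | PDE_WRITE | PDE_PS;
}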

> 2) the more future-proof one.  Here, the migration helper tells QEMU 
> which area to copy from the guest to the mirror VM, as a (main GPA, 
> length, mirror GPA) tuple.  This could happen for example the first 
> time the guest writes 1 to MSR_KVM_MIGRATION_CONTROL.  When migration 
> starts, QEMU uses this information to issue KVM_SET_USER_MEMORY_REGION 
> accordingly.  The page tables are built for this (usually very high) 
> mirror GPA and the migration helper operates in a completely separate 
> address space.  However, the backing memory would still be shared 
> between the main and mirror VMs.  I am saying this is more future 
> proof because we have more flexibility in setting up the physical 
> address space of the mirror VM.

The Migration Handler in OVMF is not a contiguous region of memory. The 
MH uses OVMF helper functions that are allocated in various regions of 
runtime memory. I guess I can see how separating the memory of the MH 
and the guest OS could be positive. On the other hand, since the MH is 
in OVMF, it is fundamentally designed to coexist with the guest OS.

What do you envision in terms of future changes to the mirror address space?

> 3) the paranoid one, which I think is what you hint at above: this is 
> an extension of (2), where userspace invokes the PSP send/receive API 
> to copy the small requested area of the main VM into the mirror VM.  
> The mirror VM code and data are completely separate from the main VM.  
> All that the mirror VM shares is the ram_addr_t data. Though I am not 
> even sure it is possible to use the send/receive API this way...

Yeah not sure if you could use the PSP for this.

-Tobin

>
> What do you think?
>
> Paolo
>
>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-17 16:32     ` Paolo Bonzini
@ 2021-08-17 21:54       ` Steve Rutherford
  -1 siblings, 0 replies; 104+ messages in thread
From: Steve Rutherford @ 2021-08-17 21:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ashish Kalra, qemu-devel, thomas.lendacky, brijesh.singh,
	ehabkost, mst, richard.henderson, jejb, tobin, dovmurik, frankeh,
	dgilbert, kvm

On Tue, Aug 17, 2021 at 9:32 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 17/08/21 01:53, Steve Rutherford wrote:
> > Separately, I'm a little weary of leaving the migration helper mapped
> > into the shared address space as writable.
>
> A related question here is what the API should be for how the migration
> helper sees the memory in both physical and virtual address.
>
> First of all, I would like the addresses passed to and from the
> migration helper to *not* be guest physical addresses (this is what I
> referred to as QEMU's ram_addr_t in other messages).  The reason is that
> some unmapped memory regions, such as virtio-mem hotplugged memory,
> would still have to be transferred and could be encrypted.  While the
> guest->host hypercall interface uses guest physical addresses to
> communicate which pages are encrypted, the host can do the
> GPA->ram_addr_t conversion and remember the encryption status of
> currently-unmapped regions.
>
> This poses a problem, in that the guest needs to prepare the page tables
> for the migration helper and those need to use the migration helper's
> physical address space.
>
> There's three possibilities for this:
>
> 1) the easy one: the bottom 4G of guest memory are mapped in the mirror
> VM 1:1.  The ram_addr_t-based addresses are shifted by either 4G or a
> huge value such as 2^42 (MAXPHYADDR - physical address reduction - 1).
> This even lets the migration helper reuse the OVMF runtime services
> memory map (but be careful about thread safety...).
If I understand what you are proposing, this would only work for
SEV/SEV-ES, since the RMP prevents these remapping games. This makes
me less enthusiastic about this (but I suspect that's why you call
this less future proof).
>
> 2) the more future-proof one.  Here, the migration helper tells QEMU
> which area to copy from the guest to the mirror VM, as a (main GPA,
> length, mirror GPA) tuple.  This could happen for example the first time
> the guest writes 1 to MSR_KVM_MIGRATION_CONTROL.  When migration starts,
> QEMU uses this information to issue KVM_SET_USER_MEMORY_REGION
> accordingly.  The page tables are built for this (usually very high)
> mirror GPA and the migration helper operates in a completely separate
> address space.  However, the backing memory would still be shared
> between the main and mirror VMs.  I am saying this is more future proof
> because we have more flexibility in setting up the physical address
> space of the mirror VM.
>
> 3) the paranoid one, which I think is what you hint at above: this is an
> extension of (2), where userspace invokes the PSP send/receive API to
> copy the small requested area of the main VM into the mirror VM.  The
> mirror VM code and data are completely separate from the main VM.  All
> that the mirror VM shares is the ram_addr_t data.  Though I am not even
> sure it is possible to use the send/receive API this way...
More to the point, what I was hinting at was treating the MH's code and
data the way firmware is treated, i.e. initializing it via
LAUNCH_UPDATE_DATA.  Getting the guest to trust host-supplied code
(i.e. firmware) needs to happen regardless.
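
On the host side that would mean running the MH's pages through the
same launch flow as the firmware.  A sketch, assuming a plain SEV
guest; the buffer and fd names are illustrative:

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Encrypt the MH's code/data in place and extend the launch
 * measurement with it, exactly as is done for OVMF today. */
static int measure_mh(int vm_fd, int sev_fd, void *mh_buf, uint32_t mh_len)
{
    struct kvm_sev_launch_update_data update = {
        .uaddr = (uint64_t)(uintptr_t)mh_buf,
        .len = mh_len,
    };
    struct kvm_sev_cmd cmd = {
        .id = KVM_SEV_LAUNCH_UPDATE_DATA,
        .data = (uint64_t)(uintptr_t)&update,
        .sev_fd = (uint32_t)sev_fd,   /* fd of /dev/sev */
    };

    return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}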

>
> What do you think?

My intuition for this leans more on the host side, but matches some of
the bits you've mentioned in (2)/(3): I would put the migration helper
incredibly high in GPA space, so that it does not collide with the rest
of the guest (and can then stay in the same place for a fairly long
period of time without needing to poke a hole in the guest). Then you
can leave the ram_addr_t-based addresses mapped normally (without the
offsetting). All this together allows the migration helper to be
orthogonal to the normal guest and normal firmware.

In this case, since the migration helper has a somewhat stable base
address, you can have a prebaked entry point and page tables
(determined at build time). The shared communication pages can come
from neighboring high-memory. The migration helper can support a
straightforward halt loop (or PIO loop, or whatever) where it reads
from a predefined page to find what work needs to be done (perhaps
with that page depending on which CPU it is, so you can support
multithreading of the migration helper). Additionally, having it high
in memory makes it quite easy to assess who owns which addresses: high
mem is under the purview of the migration helper and does not need to
be dirty tracked. Only "low" memory can and needs to be encrypted for
transport to the target side.
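
A sketch of what that per-CPU loop could look like inside the helper
(shown here as a simple polling variant; the mailbox layout and the
export/import helpers are invented for illustration):

#include <stdint.h>

struct mh_mailbox {
    volatile uint32_t cmd;    /* written by QEMU through a shared page */
    volatile uint64_t gpa;    /* which page to export/import */
};

enum { MH_CMD_NONE, MH_CMD_EXPORT, MH_CMD_IMPORT };

/* Hypothetical helpers wrapping the actual page encryption work. */
void mh_export_page(uint64_t gpa);
void mh_import_page(uint64_t gpa);

static void mh_vcpu_loop(struct mh_mailbox *mb)
{
    for (;;) {
        while (mb->cmd == MH_CMD_NONE)
            __asm__ __volatile__("pause");   /* spin on the shared page */

        if (mb->cmd == MH_CMD_EXPORT)
            mh_export_page(mb->gpa);
        else if (mb->cmd == MH_CMD_IMPORT)
            mh_import_page(mb->gpa);

        mb->cmd = MH_CMD_NONE;               /* signal completion to QEMU */
    }
}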

--Steve
>
> Paolo
>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-17 20:50       ` Tobin Feldman-Fitzthum
@ 2021-08-17 22:04         ` Steve Rutherford
  -1 siblings, 0 replies; 104+ messages in thread
From: Steve Rutherford @ 2021-08-17 22:04 UTC (permalink / raw)
  To: Tobin Feldman-Fitzthum
  Cc: Paolo Bonzini, Ashish Kalra, thomas.lendacky, brijesh.singh,
	ehabkost, kvm, mst, tobin, jejb, richard.henderson, qemu-devel,
	dgilbert, frankeh, dovmurik

On Tue, Aug 17, 2021 at 1:50 PM Tobin Feldman-Fitzthum
<tobin@linux.ibm.com> wrote:
>
>
> On 8/17/21 12:32 PM, Paolo Bonzini wrote:
> > There's three possibilities for this:
> >
> > 1) the easy one: the bottom 4G of guest memory are mapped in the
> > mirror VM 1:1.  The ram_addr_t-based addresses are shifted by either
> > 4G or a huge value such as 2^42 (MAXPHYADDR - physical address
> > reduction - 1). This even lets the migration helper reuse the OVMF
> > runtime services memory map (but be careful about thread safety...).
>
> This is essentially what we do in our prototype, although we have an
> even simpler approach. We have a 1:1 mapping that maps an address to
> itself with the cbit set. During migration, QEMU asks the migration
> handler to import/export encrypted pages and provides the GPA for said
> page. Since the migration handler only exports/imports encrypted pages,
> we can have the cbit set for every page in our mapping. We can still use
> OVMF functions with these mappings because they are on encrypted pages.
> The MH does need to use a few shared pages (to communicate with QEMU,
> for instance), so we have another mapping without the cbit that is at a
> large offset.
>
> I think this is basically equivalent to what you suggest. As you point
> out above, this approach does require that any page that will be
> exported/imported by the MH is mapped in the guest. Is this a bad
> assumption? The VMSA for SEV-ES is one example of a region that is
> encrypted but not mapped in the guest (the PSP handles it directly). We
> have been planning to map the VMSA into the guest to support migration
> with SEV-ES (along with other changes).

Ahh, it sounds like you are looking into sidestepping the existing
AMD-SP flows for migration. I assume the idea is to spin up a VM on
the target side, and have the two VMs attest to each other. How do the
two sides know if the other is legitimate? I take it that the source
is directing the LAUNCH flows?


--Steve

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-17 21:54       ` Steve Rutherford
@ 2021-08-17 22:37         ` Paolo Bonzini
  -1 siblings, 0 replies; 104+ messages in thread
From: Paolo Bonzini @ 2021-08-17 22:37 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Ashish Kalra, qemu-devel, Thomas Lendacky, Brijesh Singh,
	Habkost, Eduardo, S. Tsirkin, Michael, Richard Henderson,
	James E . J . Bottomley, Tobin Feldman-Fitzthum, Dov Murik,
	Hubertus Franke, David Gilbert, kvm

On Tue, Aug 17, 2021 at 11:54 PM Steve Rutherford
<srutherford@google.com> wrote:
> > 1) the easy one: the bottom 4G of guest memory are mapped in the mirror
> > VM 1:1.  The ram_addr_t-based addresses are shifted by either 4G or a
> > huge value such as 2^42 (MAXPHYADDR - physical address reduction - 1).
> > This even lets the migration helper reuse the OVMF runtime services
> > memory map (but be careful about thread safety...).
>
> If I understand what you are proposing, this would only work for
> SEV/SEV-ES, since the RMP prevents these remapping games. This makes
> me less enthusiastic about this (but I suspect that's why you call
> this less future proof).

I called it less future proof because it allows the migration helper
to rely more on OVMF details, but those may not apply in the future.

However, you're right about SNP; the same page cannot be mapped twice
at different GPAs by a single ASID (which includes the VM and the
migration helper). :( That does throw a wrench in the idea of mapping
pages by ram_addr_t(*), and this applies to both schemes.

Migrating RAM in PCI BARs is a mess anyway for SNP, because PCI BARs
can be moved and every time they do the migration helper needs to wait
for validation to happen. :(

Paolo

(*) ram_addr_t is not a GPA; it is constant throughout the life of the
guest and independent of e.g. PCI BARs. Internally, when QEMU
retrieves the dirty page bitmap from KVM it stores the bits indexed by
ram_addr_t (shifted right by PAGE_SHIFT).
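
(Illustration only, not QEMU's actual bitmap code: the lookup is by
ram_addr_t page number, so a GPA has to be translated to its ram_addr_t
before the test.)

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Test a page's dirty bit in a bitmap indexed by ram_addr_t pages. */
static bool ram_page_dirty(const unsigned long *bitmap, uint64_t ram_addr)
{
    uint64_t page = ram_addr >> PAGE_SHIFT;

    return bitmap[page / BITS_PER_LONG] & (1UL << (page % BITS_PER_LONG));
}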

> > 2) the more future-proof one.  Here, the migration helper tells QEMU
> > which area to copy from the guest to the mirror VM, as a (main GPA,
> > length, mirror GPA) tuple.  This could happen for example the first time
> > the guest writes 1 to MSR_KVM_MIGRATION_CONTROL.  When migration starts,
> > QEMU uses this information to issue KVM_SET_USER_MEMORY_REGION
> > accordingly.  The page tables are built for this (usually very high)
> > mirror GPA and the migration helper operates in a completely separate
> > address space.  However, the backing memory would still be shared
> > between the main and mirror VMs.  I am saying this is more future proof
> > because we have more flexibility in setting up the physical address
> > space of the mirror VM.
>
> My intuition for this leans more on the host, but matches some of the
> bits you've mentioned in (2)/(3). My intuition would be to put the
> migration helper incredibly high in gPA space, so that it does not
> collide with the rest of the guest (and can then stay in the same
> place for a fairly long period of time without needing to poke a hole
> in the guest). Then you can leave the ram_addr_t-based addresses
> mapped normally (without the offsetting). All this together allows the
> migration helper to be orthogonal to the normal guest and normal
> firmware.
>
> In this case, since the migration helper has a somewhat stable base
> address, you can have a prebaked entry point and page tables
> (determined at build time). The shared communication pages can come
> from neighboring high-memory. The migration helper can support a
> straightforward halt loop (or PIO loop, or whatever) where it reads
> from a predefined page to find what work needs to be done (perhaps
> with that page depending on which CPU it is, so you can support
> multithreading of the migration helper). Additionally, having it high
> in memory makes it quite easy to assess who owns which addresses: high
> mem is under the purview of the migration helper and does not need to
> be dirty tracked. Only "low" memory can and needs to be encrypted for
> transport to the target side.
>
> --Steve
> >
> > Paolo
> >
>


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-17 22:37         ` Paolo Bonzini
@ 2021-08-17 22:57           ` James Bottomley
  -1 siblings, 0 replies; 104+ messages in thread
From: James Bottomley @ 2021-08-17 22:57 UTC (permalink / raw)
  To: Paolo Bonzini, Steve Rutherford
  Cc: Ashish Kalra, qemu-devel, Thomas Lendacky, Brijesh Singh,
	Habkost, Eduardo, S. Tsirkin, Michael, Richard Henderson,
	Tobin Feldman-Fitzthum, Dov Murik, Hubertus Franke,
	David Gilbert, kvm

On Wed, 2021-08-18 at 00:37 +0200, Paolo Bonzini wrote:
> On Tue, Aug 17, 2021 at 11:54 PM Steve Rutherford
> <srutherford@google.com> wrote:
> > > 1) the easy one: the bottom 4G of guest memory are mapped in the
> > > mirror
> > > VM 1:1.  The ram_addr_t-based addresses are shifted by either 4G
> > > or a
> > > huge value such as 2^42 (MAXPHYADDR - physical address reduction
> > > - 1).
> > > This even lets the migration helper reuse the OVMF runtime
> > > services
> > > memory map (but be careful about thread safety...).
> > 
> > If I understand what you are proposing, this would only work for
> > SEV/SEV-ES, since the RMP prevents these remapping games. This
> > makes
> > me less enthusiastic about this (but I suspect that's why you call
> > this less future proof).
> 
> I called it less future proof because it allows the migration helper
> to rely more on OVMF details, but those may not apply in the future.
> 
> However you're right about SNP; the same page cannot be mapped twice
> at different GPAs by a single ASID (which includes the VM and the
> migration helper). :( That does throw a wrench in the idea of mapping
> pages by ram_addr_t(*), and this applies to both schemes.

Right, but in the current IBM approach, since we use the same mapping
for guest and mirror, we have the same GPA in both and it should work
with SEV-SNP.

> Migrating RAM in PCI BARs is a mess anyway for SNP, because PCI BARs
> can be moved and every time they do the migration helper needs to
> wait for validation to happen. :(

Realistically, migration is becoming a royal pain, not just for
confidential computing, but for virtual functions in general.  I really
think we should look at S3 suspend, where we shut down the drivers and
then reattach on S3 resume, as the potential pathway to getting
migration working both for virtual functions and for this use case.

James



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-17 22:57           ` James Bottomley
@ 2021-08-17 23:10             ` Steve Rutherford
  -1 siblings, 0 replies; 104+ messages in thread
From: Steve Rutherford @ 2021-08-17 23:10 UTC (permalink / raw)
  To: jejb
  Cc: Paolo Bonzini, Ashish Kalra, qemu-devel, Thomas Lendacky,
	Brijesh Singh, Habkost, Eduardo, S. Tsirkin, Michael,
	Richard Henderson, Tobin Feldman-Fitzthum, Dov Murik,
	Hubertus Franke, David Gilbert, kvm

On Tue, Aug 17, 2021 at 3:57 PM James Bottomley <jejb@linux.ibm.com> wrote:
> Realistically, migration is becoming a royal pain, not just for
> confidential computing, but for virtual functions in general.  I really
> think we should look at S3 suspend, where we shut down the drivers and
> then reattach on S3 resume as the potential pathway to getting
> migration working both for virtual functions and this use case.

This type of migration seems a little bit less "live", which makes me
concerned about its performance characteristics.

Steve

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-17 20:50       ` Tobin Feldman-Fitzthum
@ 2021-08-17 23:20         ` Paolo Bonzini
  -1 siblings, 0 replies; 104+ messages in thread
From: Paolo Bonzini @ 2021-08-17 23:20 UTC (permalink / raw)
  To: Tobin Feldman-Fitzthum
  Cc: Steve Rutherford, Ashish Kalra, Thomas Lendacky, Brijesh Singh,
	Habkost, Eduardo, kvm, S. Tsirkin, Michael,
	Tobin Feldman-Fitzthum, James E . J . Bottomley,
	Richard Henderson, qemu-devel, David Gilbert, Hubertus Franke,
	Dov Murik

On Tue, Aug 17, 2021 at 10:51 PM Tobin Feldman-Fitzthum
<tobin@linux.ibm.com> wrote:
> This is essentially what we do in our prototype, although we have an
> even simpler approach. We have a 1:1 mapping that maps an address to
> itself with the cbit set. During Migration QEMU asks the migration
> handler to import/export encrypted pages and provides the GPA for said
> page. Since the migration handler only exports/imports encrypted pages,
> we can have the cbit set for every page in our mapping. We can still use
> OVMF functions with these mappings because they are on encrypted pages.
> The MH does need to use a few shared pages (to communicate with QEMU,
> for instance), so we have another mapping without the cbit that is at a
> large offset.
>
> I think this is basically equivalent to what you suggest. As you point
> out above, this approach does require that any page that will be
> exported/imported by the MH is mapped in the guest. Is this a bad
> assumption?

It should work well enough in the common case; and with SNP it looks
like it is a necessary assumption anyway. *shrug*
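
(To make the double mapping concrete, here is a minimal sketch; the
C-bit position and the shared-window offset below are invented, and
real code would read the C-bit position from CPUID 0x8000001F:)

    #include <stdint.h>

    #define MH_CBIT        (1ULL << 47)  /* example C-bit position */
    #define MH_SHARED_BASE (1ULL << 44)  /* example shared-window base */

    /* Encrypted guest pages: identity-mapped, C-bit set in the PTE. */
    static inline uint64_t mh_enc_pte(uint64_t gpa, uint64_t flags)
    {
        return (gpa | MH_CBIT) | flags;
    }

    /* Shared pages: mapped at a large offset, C-bit clear, so the
     * same gpa is reachable unencrypted at MH_SHARED_BASE + gpa. */
    static inline uint64_t mh_shared_va(uint64_t gpa)
    {
        return MH_SHARED_BASE + gpa;
    }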

It would be a bit ugly because QEMU has to constantly convert
ram_addr_t's to guest physical addresses and back. But that's not a
performance problem.

The only important bit is that the encryption status bitmap be indexed
by ram_addr_t. This lets QEMU detect the problem of a ram_addr_t that
is marked encrypted but is not currently mapped, and abort migration.
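
(A sketch of that check; the helper names here are hypothetical, not
QEMU's actual bitmap or memory APIs:)

    /* Abort migration if a page is marked encrypted in the
     * ram_addr_t-indexed bitmap but currently has no GPA mapping.
     * encrypted_bitmap_test() and ram_addr_to_gpa() are made up;
     * TARGET_PAGE_BITS and error_report() are QEMU's. */
    static int check_encrypted_page(uint64_t ram_addr)
    {
        uint64_t gpa;

        if (!encrypted_bitmap_test(ram_addr >> TARGET_PAGE_BITS)) {
            return 0;            /* shared page: migrate as usual */
        }
        if (ram_addr_to_gpa(ram_addr, &gpa) < 0) {
            error_report("encrypted page at ram_addr 0x%" PRIx64
                         " is not mapped; aborting migration", ram_addr);
            return -1;
        }
        return 0;                /* hand 'gpa' to the MH for export */
    }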

> The Migration Handler in OVMF is not a contiguous region of memory. The
> MH uses OVMF helper functions that are allocated in various regions of
> runtime memory. I guess I can see how separating the memory of the MH
> and the guest OS could be positive. On the other hand, since the MH is
> in OVMF, it is fundamentally designed to coexist with the guest OS.

IIRC runtime services are not SMP-safe, so the migration helper cannot
coexist with the guest OS without extra care. I checked quickly and
CryptoPkg/Library/BaseCryptLib/SysCall/RuntimeMemAllocation.c does not
use any lock, so it would be bad if both OS-invoked runtime services
and the MH invoked the CryptoPkg malloc at the same time.
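
(To make the hazard concrete, a sketch in plain C11 atomics rather
than EDK2's actual primitives; RuntimeAllocatePool() stands in for
whatever unlocked allocator the runtime code uses:)

    #include <stdatomic.h>
    #include <stddef.h>

    void *RuntimeAllocatePool(size_t size);  /* the unlocked allocator */

    static atomic_flag rt_alloc_lock = ATOMIC_FLAG_INIT;

    /* Serialize the allocator shared by OS-invoked runtime services
     * and the MH; without a guard like this, two concurrent callers
     * can corrupt the shared free list. */
    void *rt_alloc_serialized(size_t size)
    {
        void *p;

        while (atomic_flag_test_and_set_explicit(&rt_alloc_lock,
                                                 memory_order_acquire)) {
            /* spin until the other context releases the allocator */
        }
        p = RuntimeAllocatePool(size);
        atomic_flag_clear_explicit(&rt_alloc_lock, memory_order_release);
        return p;
    }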

Paolo


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-17 23:10             ` Steve Rutherford
@ 2021-08-18  2:49               ` James Bottomley
  -1 siblings, 0 replies; 104+ messages in thread
From: James Bottomley @ 2021-08-18  2:49 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Paolo Bonzini, Ashish Kalra, qemu-devel, Thomas Lendacky,
	Brijesh Singh, Habkost, Eduardo, S. Tsirkin, Michael,
	Richard Henderson, Tobin Feldman-Fitzthum, Dov Murik,
	Hubertus Franke, David Gilbert, kvm

On Tue, 2021-08-17 at 16:10 -0700, Steve Rutherford wrote:
> On Tue, Aug 17, 2021 at 3:57 PM James Bottomley <jejb@linux.ibm.com>
> wrote:
> > Realistically, migration is becoming a royal pain, not just for
> > confidential computing, but for virtual functions in general.  I
> > really think we should look at S3 suspend, where we shut down the
> > drivers and then reattach on S3 resume as the potential pathway to
> > getting migration working both for virtual functions and this use
> > case.
> 
> This type of migration seems a little bit less "live", which makes me
> concerned about its performance characteristics.

Well, there are too many scenarios we just fail at migration today.  We
need help from the guest to quiesce or shut down the interior devices,
and S3 suspend seems to be the machine signal for that.  I think in
most clouds guests would accept some loss of "liveness" for a gain in
reliability as long as we keep them within the SLA ... which is 5
minutes a year for 5 nines.  Most failed migrations also instantly fail
SLAs because of the recovery times involved, so I don't see what's to be
achieved by keeping the current "we can migrate sometimes" approach.
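
(For reference, the arithmetic: five nines is 99.999% availability,
i.e. 365.25 days x 24 h x 60 min x 0.00001, which works out to about
5.26 minutes of allowed downtime per year.)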

James



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-16 15:38           ` Paolo Bonzini
@ 2021-08-18 10:31           ` Ashish Kalra
  2021-08-18 11:25               ` James Bottomley
  2021-08-18 19:47             ` Paolo Bonzini
  -1 siblings, 2 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-18 10:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, jejb, tobin, dovmurik, frankeh, dgilbert, kvm

Hello Paolo,

On Mon, Aug 16, 2021 at 05:38:55PM +0200, Paolo Bonzini wrote:
> On 16/08/21 17:13, Ashish Kalra wrote:
> > > > I think that once the mirror VM starts booting and running the UEFI
> > > > code, it might be only during the PEI or DXE phase where it will
> > > > start actually running the MH code, so the mirror VM probably still
> > > > needs to handle KVM_EXIT_IO when the SEC phase does I/O; I can see
> > > > PIC accesses and Debug Agent initialization stuff in SEC startup code.
> > > That may be a design of the migration helper code that you were working
> > > with, but it's not necessary.
> > > 
> > Actually my comments are about a more generic MH code.
> 
> I don't think that would be a good idea; designing QEMU's migration helper
> interface to be as constrained as possible is a good thing.  The migration
> helper is extremely security sensitive code, so it should not expose itself
> to the attack surface of the whole of QEMU.
> 
> 
One question I have here is: where exactly will the MH code exist
in QEMU?

I assume it will be x86 platform-specific code only; we will probably
never support it on other platforms?

So it will probably exist in hw/i386, something similar to "microvm"
support and using the same TYPE_X86_MACHINE?

Also, if we are not going to use the existing KVM support code and are
adding some duplicate KVM interface code, do we need to interface with
this added KVM code via the QEMU accelerator framework, or simply
invoke it statically?

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-18 10:31           ` Ashish Kalra
@ 2021-08-18 11:25               ` James Bottomley
  2021-08-18 19:47             ` Paolo Bonzini
  1 sibling, 0 replies; 104+ messages in thread
From: James Bottomley @ 2021-08-18 11:25 UTC (permalink / raw)
  To: Ashish Kalra, Paolo Bonzini
  Cc: qemu-devel, thomas.lendacky, brijesh.singh, ehabkost, mst,
	richard.henderson, tobin, dovmurik, frankeh, dgilbert, kvm

On Wed, 2021-08-18 at 10:31 +0000, Ashish Kalra wrote:
> Hello Paolo,
> 
> On Mon, Aug 16, 2021 at 05:38:55PM +0200, Paolo Bonzini wrote:
> > On 16/08/21 17:13, Ashish Kalra wrote:
> > > > > I think that once the mirror VM starts booting and running
> > > > > the UEFI code, it might be only during the PEI or DXE phase
> > > > > where it will start actually running the MH code, so the mirror
> > > > > VM probably still needs to handle KVM_EXIT_IO when the SEC phase
> > > > > does I/O; I can see PIC accesses and Debug Agent
> > > > > initialization stuff in SEC startup code.
> > > > That may be a design of the migration helper code that you were
> > > > working with, but it's not necessary.
> > > > 
> > > Actually my comments are about a more generic MH code.
> > 
> > I don't think that would be a good idea; designing QEMU's migration
> > helper interface to be as constrained as possible is a good
> > thing.  The migration helper is extremely security sensitive code,
> > so it should not expose itself to the attack surface of the whole
> > of QEMU.

The attack surface of the MH in the guest is simply the API.  The API
needs to do two things:

   1. Validate a correct endpoint and negotiate a wrapping key.
   2. When requested by QEMU, wrap a section of guest encrypted memory
      with the wrapping key and return it.

The big security risk is in 1: if the MH can be tricked into
communicating with the wrong endpoint, it will leak the entire guest.
If we can lock that down, I don't see any particular security problem
with 2.  So, provided we get the security properties of the API correct,
I think we won't have to worry overmuch about exposure of the API.
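
(A sketch of what that two-operation interface could look like; all
names below are invented for illustration, and nothing here is from
the patch set:)

    #include <stddef.h>
    #include <stdint.h>

    /* 1. Validate the target endpoint and negotiate a wrapping key.
     *    The endpoint's attestation evidence must check out before
     *    any key material is derived. */
    int mh_negotiate_key(const uint8_t *endpoint_evidence, size_t len,
                         uint8_t wrapping_key[32]);

    /* 2. On request from QEMU, wrap one region of guest encrypted
     *    memory with the negotiated key and return the ciphertext
     *    through a shared buffer. */
    int mh_wrap_region(uint64_t gpa, size_t len,
                       uint8_t *shared_buf, size_t buf_len);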

> One question I have here is: where exactly will the MH code exist
> in QEMU?
> 
> I assume it will be x86 platform-specific code only; we will probably
> never support it on other platforms?
> 
> So it will probably exist in hw/i386, something similar to "microvm"
> support and using the same TYPE_X86_MACHINE?

I don't think it should be x86 only.  The migration handler receiver
should be completely CPU-agnostic.  It's possible other CPUs will grow
an encrypted-memory confidential computing capability (Power already
has one and ARM is "thinking" about it, but even if it doesn't, there's
a similar problem if you want to use TrustZone isolation in VMs).  I
would envisage migration working substantially similarly on all of them
(we need to ask an agent in the guest to wrap an encrypted page for
transport), so I think we should add this capability to the generic QEMU
migration code and let other architectures take advantage of it as they
grow the facility.

> Also, if we are not going to use the existing KVM support code and are
> adding some duplicate KVM interface code, do we need to interface
> with this added KVM code via the QEMU accelerator framework, or
> simply invoke it statically?

I think we need to design the interface as cleanly as possible, so it
just depends on what's easiest.  We certainly need some KVM support for
the mirror CPUs, I think, but it's not clear to me yet what the simplest
way to do the interface is.

James



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-17 22:37         ` Paolo Bonzini
@ 2021-08-18 14:06         ` Ashish Kalra
  2021-08-18 17:07           ` Ashish Kalra
  -1 siblings, 1 reply; 104+ messages in thread
From: Ashish Kalra @ 2021-08-18 14:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Steve Rutherford, qemu-devel, Thomas Lendacky, Brijesh Singh,
	Habkost, Eduardo, S. Tsirkin, Michael, Richard Henderson,
	James E . J . Bottomley, Tobin Feldman-Fitzthum, Dov Murik,
	Hubertus Franke, David Gilbert, kvm

On Wed, Aug 18, 2021 at 12:37:32AM +0200, Paolo Bonzini wrote:
> On Tue, Aug 17, 2021 at 11:54 PM Steve Rutherford
> <srutherford@google.com> wrote:
> > > 1) the easy one: the bottom 4G of guest memory are mapped in the mirror
> > > VM 1:1.  The ram_addr_t-based addresses are shifted by either 4G or a
> > > huge value such as 2^42 (MAXPHYADDR - physical address reduction - 1).
> > > This even lets the migration helper reuse the OVMF runtime services
> > > memory map (but be careful about thread safety...).
> >
> > If I understand what you are proposing, this would only work for
> > SEV/SEV-ES, since the RMP prevents these remapping games. This makes
> > me less enthusiastic about this (but I suspect that's why you call
> > this less future proof).
> 
> I called it less future proof because it allows the migration helper
> to rely more on OVMF details, but those may not apply in the future.
> 
> However you're right about SNP; the same page cannot be mapped twice
> at different GPAs by a single ASID (which includes the VM and the
> migration helper). :( That does throw a wrench in the idea of mapping
> pages by ram_addr_t(*), and this applies to both schemes.
> 
> Migrating RAM in PCI BARs is a mess anyway for SNP, because PCI BARs
> can be moved and every time they do the migration helper needs to wait
> for validation to happen. :(
> 
> Paolo
> 
> (*) ram_addr_t is not a GPA; it is constant throughout the life of the
> guest and independent of e.g. PCI BARs. Internally, when QEMU
> retrieves the dirty page bitmap from KVM it stores the bits indexed by
> ram_addr_t (shifted right by PAGE_SHIFT).

With reference to SNP here, the mirror VM model seems to have a nice
fit with SNP:

SNP will support the separate address spaces for main VM and mirror VMs
implicitly, with the MH/MA running in VMPL0. 

Additionally, vTOM can be used to separate mirror VM and main VM memory,
with private mirror VM memory below vTOM and all the shared stuff with
main VM setup above vTOM. 

The design here should probably base itself on this model, to allow an
easy future port to SNP and also make it more future-proof.

Thanks,
Ashish

> > > 2) the more future-proof one.  Here, the migration helper tells QEMU
> > > which area to copy from the guest to the mirror VM, as a (main GPA,
> > > length, mirror GPA) tuple.  This could happen for example the first time
> > > the guest writes 1 to MSR_KVM_MIGRATION_CONTROL.  When migration starts,
> > > QEMU uses this information to issue KVM_SET_USER_MEMORY_REGION
> > > accordingly.  The page tables are built for this (usually very high)
> > > mirror GPA and the migration helper operates in a completely separate
> > > address space.  However, the backing memory would still be shared
> > > between the main and mirror VMs.  I am saying this is more future proof
> > > because we have more flexibility in setting up the physical address
> > > space of the mirror VM.
> >
> > My intuition for this leans more on the host, but matches some of the
> > bits you've mentioned in (2)/(3). My intuition would be to put the
> > migration helper incredibly high in gPA space, so that it does not
> > collide with the rest of the guest (and can then stay in the same
> > place for a fairly long period of time without needing to poke a hole
> > in the guest). Then you can leave the ram_addr_t-based addresses
> > mapped normally (without the offsetting). All this together allows the
> > migration helper to be orthogonal to the normal guest and normal
> > firmware.
> >
> > In this case, since the migration helper has a somewhat stable base
> > address, you can have a prebaked entry point and page tables
> > (determined at build time). The shared communication pages can come
> > from neighboring high-memory. The migration helper can support a
> > straightforward halt loop (or PIO loop, or whatever) where it reads
> > from a predefined page to find what work needs to be done (perhaps
> > with that page depending on which CPU it is, so you can support
> > multithreading of the migration helper). Additionally, having it high
> > in memory makes it quite easy to assess who owns which addresses: high
> > mem is under the purview of the migration helper and does not need to
> > be dirty tracked. Only "low" memory can and needs to be encrypted for
> > transport to the target side.
> >
> > --Steve
> > >
> > > Paolo
> > >
> >
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-18 11:25               ` James Bottomley
@ 2021-08-18 15:31                 ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 104+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-18 15:31 UTC (permalink / raw)
  To: James Bottomley
  Cc: Ashish Kalra, Paolo Bonzini, qemu-devel, thomas.lendacky,
	brijesh.singh, ehabkost, mst, richard.henderson, tobin, dovmurik,
	frankeh, kvm

* James Bottomley (jejb@linux.ibm.com) wrote:
> On Wed, 2021-08-18 at 10:31 +0000, Ashish Kalra wrote:
> > Hello Paolo,
> > 
> > On Mon, Aug 16, 2021 at 05:38:55PM +0200, Paolo Bonzini wrote:
> > > On 16/08/21 17:13, Ashish Kalra wrote:
> > > > > > I think that once the mirror VM starts booting and running
> > > > > > the UEFI code, it might be only during the PEI or DXE phase
> > > > > > where it will start actually running the MH code, so the mirror
> > > > > > VM probably still needs to handle KVM_EXIT_IO when the SEC phase
> > > > > > does I/O; I can see PIC accesses and Debug Agent
> > > > > > initialization stuff in SEC startup code.
> > > > > That may be a design of the migration helper code that you were
> > > > > working with, but it's not necessary.
> > > > > 
> > > > Actually my comments are about a more generic MH code.
> > > 
> > > I don't think that would be a good idea; designing QEMU's migration
> > > helper interface to be as constrained as possible is a good
> > > thing.  The migration helper is extremely security sensitive code,
> > > so it should not expose itself to the attack surface of the whole
> > > of QEMU.
> 
> The attack surface of the MH in the guest is simply the API.  The API
> needs to do two things:
> 
>    1. validate a correct endpoint and negotiate a wrapping key
>    2. When requested by QEMU, wrap a section of guest encrypted memory
>       with the wrapping key and return it.
> 
> The big security risk is in 1. if the MH can be tricked into
> communicating with the wrong endpoint it will leak the entire guest. 
> If we can lock that down, I don't see any particular security problem
> with 2. So, provided we get the security properties of the API correct,
> I think we won't have to worry over much about exposure of the API.

Well, we'd have to make sure it only does stuff on behalf of qemu; if
the guest can ever write to the MH's memory, it could do something that
the guest shouldn't be able to.

Dave

> > One question I have here is: where exactly will the MH code exist
> > in QEMU?
> > 
> > I assume it will be x86 platform-specific code only; we will
> > probably never support it on other platforms?
> > 
> > So it will probably exist in hw/i386, something similar to "microvm"
> > support and using the same TYPE_X86_MACHINE?
> 
> I don't think it should be x86 only.  The migration handler receiver
> should be completely CPU-agnostic.  It's possible other CPUs will grow
> an encrypted-memory confidential computing capability (Power already
> has one and ARM is "thinking" about it, but even if it doesn't, there's
> a similar problem if you want to use TrustZone isolation in VMs).  I
> would envisage migration working substantially similarly on all of them
> (we need to ask an agent in the guest to wrap an encrypted page for
> transport), so I think we should add this capability to the generic QEMU
> migration code and let other architectures take advantage of it as they
> grow the facility.
> 
> > Also, if we are not going to use the existing KVM support code and
> > are adding some duplicate KVM interface code, do we need to interface
> > with this added KVM code via the QEMU accelerator framework, or
> > simply invoke it statically?
> 
> I think we need to design the interface as cleanly as possible, so it
> just depends on what's easiest.  We certainly need some KVM support for
> the mirror CPUs, I think, but it's not clear to me yet what the simplest
> way to do the interface is.
> 
> James
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-17 22:04         ` Steve Rutherford
@ 2021-08-18 15:32           ` Tobin Feldman-Fitzthum
  -1 siblings, 0 replies; 104+ messages in thread
From: Tobin Feldman-Fitzthum @ 2021-08-18 15:32 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Paolo Bonzini, Ashish Kalra, thomas.lendacky, brijesh.singh,
	ehabkost, kvm, mst, tobin, jejb, richard.henderson, qemu-devel,
	dgilbert, frankeh, dovmurik

On 8/17/21 6:04 PM, Steve Rutherford wrote:
> On Tue, Aug 17, 2021 at 1:50 PM Tobin Feldman-Fitzthum
> <tobin@linux.ibm.com> wrote:
>> This is essentially what we do in our prototype, although we have an
>> even simpler approach. We have a 1:1 mapping that maps an address to
>> itself with the cbit set. During Migration QEMU asks the migration
>> handler to import/export encrypted pages and provides the GPA for said
>> page. Since the migration handler only exports/imports encrypted pages,
>> we can have the cbit set for every page in our mapping. We can still use
>> OVMF functions with these mappings because they are on encrypted pages.
>> The MH does need to use a few shared pages (to communicate with QEMU,
>> for instance), so we have another mapping without the cbit that is at a
>> large offset.
>>
>> I think this is basically equivalent to what you suggest. As you point
>> out above, this approach does require that any page that will be
>> exported/imported by the MH is mapped in the guest. Is this a bad
>> assumption? The VMSA for SEV-ES is one example of a region that is
>> encrypted but not mapped in the guest (the PSP handles it directly). We
>> have been planning to map the VMSA into the guest to support migration
>> with SEV-ES (along with other changes).
> Ahh, it sounds like you are looking into sidestepping the existing
> AMD-SP flows for migration. I assume the idea is to spin up a VM on
> the target side, and have the two VMs attest to each other. How do the
> two sides know if the other is legitimate? I take it that the source
> is directing the LAUNCH flows?

Yeah we don't use PSP migration flows at all. We don't need to send the 
MH code from the source to the target because the MH lives in firmware, 
which is common between the two. We start the target like a normal VM 
rather than waiting for an incoming migration. The plan is to treat the 
target like a normal VM for attestation as well. The guest owner will 
attest the target VM just like they would any other VM that is started 
on their behalf. Secret injection can be used to establish a shared key 
for the source and target.

-Tobin

>
> --Steve
>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-18 15:31                 ` Dr. David Alan Gilbert
@ 2021-08-18 15:35                   ` James Bottomley
  -1 siblings, 0 replies; 104+ messages in thread
From: James Bottomley @ 2021-08-18 15:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Ashish Kalra, Paolo Bonzini, qemu-devel, thomas.lendacky,
	brijesh.singh, ehabkost, mst, richard.henderson, tobin, dovmurik,
	frankeh, kvm

On Wed, 2021-08-18 at 16:31 +0100, Dr. David Alan Gilbert wrote:
> * James Bottomley (jejb@linux.ibm.com) wrote:
> > On Wed, 2021-08-18 at 10:31 +0000, Ashish Kalra wrote:
> > > Hello Paolo,
> > > 
> > > On Mon, Aug 16, 2021 at 05:38:55PM +0200, Paolo Bonzini wrote:
> > > > On 16/08/21 17:13, Ashish Kalra wrote:
> > > > > > > I think that once the mirror VM starts booting and
> > > > > > > running the UEFI code, it might be only during the PEI or
> > > > > > > DXE phase where it will start actually running the MH
> > > > > > > code, so the mirror VM probably still needs to handle
> > > > > > > KVM_EXIT_IO when the SEC phase does I/O; I can see PIC
> > > > > > > accesses and Debug Agent initialization stuff in SEC
> > > > > > > startup code.
> > > > > > That may be a design of the migration helper code that you
> > > > > > were working with, but it's not necessary.
> > > > > > 
> > > > > Actually my comments are about a more generic MH code.
> > > > 
> > > > I don't think that would be a good idea; designing QEMU's
> > > > migration helper interface to be as constrained as possible is
> > > > a good thing.  The migration helper is extremely security
> > > > sensitive code, so it should not expose itself to the attack
> > > > surface of the whole of QEMU.
> > 
> > The attack surface of the MH in the guest is simply the API.  The
> > API needs to do two things:
> > 
> >    1. validate a correct endpoint and negotiate a wrapping key
> >    2. When requested by QEMU, wrap a section of guest encrypted
> > memory
> >       with the wrapping key and return it.
> > 
> > The big security risk is in 1. if the MH can be tricked into
> > communicating with the wrong endpoint it will leak the entire
> > guest.  If we can lock that down, I don't see any particular
> > security problem with 2. So, provided we get the security
> > properties of the API correct, I think we won't have to worry over
> > much about exposure of the API.
> 
> Well, we'd have to make sure it only does stuff on behalf of qemu; if
> the guest can ever write to the MH's memory, it could do something that
> the guest shouldn't be able to.

Given the lack of SMI, we can't guarantee that with plain SEV and -ES. 
Once we move to -SNP, we can use VMPLs to achieve this.

But realistically, given the above API, even if the guest is malicious,
what can it do?  I think it can simply return bogus pages that cause a
crash on start after migration, which doesn't look like a huge risk to
the cloud to me (it's more a self-destructive act on behalf of the
guest).

James



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-18 15:35                   ` James Bottomley
@ 2021-08-18 15:43                     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 104+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-18 15:43 UTC (permalink / raw)
  To: James Bottomley
  Cc: Ashish Kalra, Paolo Bonzini, qemu-devel, thomas.lendacky,
	brijesh.singh, ehabkost, mst, richard.henderson, tobin, dovmurik,
	frankeh, kvm

* James Bottomley (jejb@linux.ibm.com) wrote:
> On Wed, 2021-08-18 at 16:31 +0100, Dr. David Alan Gilbert wrote:
> > * James Bottomley (jejb@linux.ibm.com) wrote:
> > > On Wed, 2021-08-18 at 10:31 +0000, Ashish Kalra wrote:
> > > > Hello Paolo,
> > > > 
> > > > On Mon, Aug 16, 2021 at 05:38:55PM +0200, Paolo Bonzini wrote:
> > > > > On 16/08/21 17:13, Ashish Kalra wrote:
> > > > > > > > I think that once the mirror VM starts booting and
> > > > > > > > running the UEFI code, it might be only during the PEI or
> > > > > > > > DXE phase where it will start actually running the MH
> > > > > > > > code, so the mirror VM probably still needs to handle
> > > > > > > > KVM_EXIT_IO when the SEC phase does I/O; I can see PIC
> > > > > > > > accesses and Debug Agent initialization stuff in SEC
> > > > > > > > startup code.
> > > > > > > That may be a design of the migration helper code that you
> > > > > > > were working with, but it's not necessary.
> > > > > > > 
> > > > > > Actually my comments are about a more generic MH code.
> > > > > 
> > > > > I don't think that would be a good idea; designing QEMU's
> > > > > migration helper interface to be as constrained as possible is
> > > > > a good thing.  The migration helper is extremely security
> > > > > sensitive code, so it should not expose itself to the attack
> > > > > surface of the whole of QEMU.
> > > 
> > > The attack surface of the MH in the guest is simply the API.  The
> > > API needs to do two things:
> > > 
> > >    1. validate a correct endpoint and negotiate a wrapping key
> > >    2. When requested by QEMU, wrap a section of guest encrypted
> > > memory
> > >       with the wrapping key and return it.
> > > 
> > > The big security risk is in 1. if the MH can be tricked into
> > > communicating with the wrong endpoint it will leak the entire
> > > guest.  If we can lock that down, I don't see any particular
> > > security problem with 2. So, provided we get the security
> > > properties of the API correct, I think we won't have to worry over
> > > much about exposure of the API.
> > 
> > Well, we'd have to make sure it only does stuff on behalf of qemu; if
> > the guest can ever write to MH's memory it could do something that
> > the guest shouldn't be able to.
> 
> Given the lack of SMI, we can't guarantee that with plain SEV and -ES. 
> Once we move to -SNP, we can use VMPLs to achieve this.

Doesn't the MH have access to different slots and run on separate
vCPUs, so it's still got some separation?

> But realistically, given the above API, even if the guest is malicious,
> what can it do?  I think it can simply return bogus pages that cause a
> crash on start after migration, which doesn't look like a huge risk to
> the cloud to me (it's more a self-destructive act on behalf of the
> guest).

I'm a bit worried about the data structures that are shared between the
migration code in qemu and the MH; the code in qemu is going to have to
be paranoid about not trusting anything coming from the MH.

Dave

> James
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-18 15:43                     ` Dr. David Alan Gilbert
@ 2021-08-18 16:28                       ` James Bottomley
  -1 siblings, 0 replies; 104+ messages in thread
From: James Bottomley @ 2021-08-18 16:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Ashish Kalra, Paolo Bonzini, qemu-devel, thomas.lendacky,
	brijesh.singh, ehabkost, mst, richard.henderson, tobin, dovmurik,
	frankeh, kvm

On Wed, 2021-08-18 at 16:43 +0100, Dr. David Alan Gilbert wrote:
> * James Bottomley (jejb@linux.ibm.com) wrote:
[...]
> > Given the lack of SMI, we can't guarantee that with plain SEV and
> > -ES. Once we move to -SNP, we can use VMPLs to achieve this.
> 
> Doesn't the MH have access to different slots and run on separate
> vCPUs, so it's still got some separation?

Remember that the OVMF code is provided by the host, but it's attested
to and run by the guest.  Once the guest takes control (i.e. after OVMF
boots the next thing), we can't guarantee that it won't overwrite the MH
code, so the host must treat the MH as untrusted.

> > But realistically, given the above API, even if the guest is
> > malicious, what can it do?  I think it can simply return bogus pages
> > that cause a crash on start after migration, which doesn't look
> > like a huge risk to the cloud to me (it's more a self-destructive
> > act on behalf of the guest).
> 
> I'm a bit worried about the data structures that are shared between
> the migration code in qemu and the MH; the code in qemu is going to
> have to be paranoid about not trusting anything coming from the MH.

Given that we have to treat the host MH structure as untrusted, this is
definitely something we have to do.  The primary API is simply "here's
a buffer, please fill it", so there's not much checking to do; we just
have to be careful that we don't expose any more of the buffer than the
guest needs to write to ... and, obviously, clean it before exposing it
to the guest.
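
(A sketch of that buffer discipline; the structure and helper names
are hypothetical:)

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical layout of the page shared between QEMU and the MH. */
    typedef struct {
        uint64_t req_gpa;    /* request: which guest page to wrap */
        uint64_t req_len;    /* request: size of the writable window */
        uint64_t resp_len;   /* response: untrusted, must be checked */
        uint8_t  data[4096];
    } MHSharedBuf;

    void mh_request_and_wait(MHSharedBuf *buf);  /* hypothetical doorbell */

    int fetch_wrapped_page(MHSharedBuf *buf, uint64_t gpa,
                           uint8_t *dst, size_t dst_len)
    {
        memset(buf->data, 0, sizeof(buf->data)); /* clean before exposing */
        buf->req_gpa = gpa;
        buf->req_len = sizeof(buf->data);        /* the writable window */

        mh_request_and_wait(buf);

        /* Everything the MH wrote back is untrusted: bound-check it. */
        if (buf->resp_len > dst_len || buf->resp_len > sizeof(buf->data)) {
            return -1;
        }
        memcpy(dst, buf->data, buf->resp_len);
        return 0;
    }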

James



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-18 14:06         ` Ashish Kalra
@ 2021-08-18 17:07           ` Ashish Kalra
  0 siblings, 0 replies; 104+ messages in thread
From: Ashish Kalra @ 2021-08-18 17:07 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Steve Rutherford, qemu-devel, Thomas Lendacky, Brijesh Singh,
	Habkost, Eduardo, S. Tsirkin, Michael, Richard Henderson,
	James E . J . Bottomley, Tobin Feldman-Fitzthum, Dov Murik,
	Hubertus Franke, David Gilbert, kvm

On Wed, Aug 18, 2021 at 02:06:25PM +0000, Ashish Kalra wrote:
> On Wed, Aug 18, 2021 at 12:37:32AM +0200, Paolo Bonzini wrote:
> > On Tue, Aug 17, 2021 at 11:54 PM Steve Rutherford
> > <srutherford@google.com> wrote:
> > > > 1) the easy one: the bottom 4G of guest memory are mapped in the mirror
> > > > VM 1:1.  The ram_addr_t-based addresses are shifted by either 4G or a
> > > > huge value such as 2^42 (MAXPHYADDR - physical address reduction - 1).
> > > > This even lets the migration helper reuse the OVMF runtime services
> > > > memory map (but be careful about thread safety...).
> > >
> > > If I understand what you are proposing, this would only work for
> > > SEV/SEV-ES, since the RMP prevents these remapping games. This makes
> > > me less enthusiastic about this (but I suspect that's why you call
> > > this less future proof).
> > 
> > I called it less future proof because it allows the migration helper
> > to rely more on OVMF details, but those may not apply in the future.
> > 
> > However you're right about SNP; the same page cannot be mapped twice
> > at different GPAs by a single ASID (which includes the VM and the
> > migration helper). :( That does throw a wrench in the idea of mapping
> > pages by ram_addr_t(*), and this applies to both schemes.
> > 
> > Migrating RAM in PCI BARs is a mess anyway for SNP, because PCI BARs
> > can be moved and every time they do the migration helper needs to wait
> > for validation to happen. :(
> > 
> > Paolo
> > 
> > (*) ram_addr_t is not a GPA; it is constant throughout the life of the
> > guest and independent of e.g. PCI BARs. Internally, when QEMU
> > retrieves the dirty page bitmap from KVM it stores the bits indexed by
> > ram_addr_t (shifted right by PAGE_SHIFT).
> 
> With reference to SNP here, the mirror VM model seems to have a nice
> fit with SNP:
> 
> SNP will support the separate address spaces for main VM and mirror VMs
> implicitly, with the MH/MA running in VMPL0. 
> 

I need to correct this statement: there is no separate address space as
such; there is basically page-level permission/protection between VMPL
levels.

> Additionally, vTOM can be used to separate mirror VM and main VM memory,
> with private mirror VM memory below vTOM and all the shared stuff with
> main VM setup above vTOM. 
> 

I need to take back the above statement; memory above vTOM is basically
decrypted memory.

Thanks,
Ashish

> The design here should probably base itself on this model, to allow an
> easy future port to SNP and also make it more future-proof.

> > > > 2) the more future-proof one.  Here, the migration helper tells QEMU
> > > > which area to copy from the guest to the mirror VM, as a (main GPA,
> > > > length, mirror GPA) tuple.  This could happen for example the first time
> > > > the guest writes 1 to MSR_KVM_MIGRATION_CONTROL.  When migration starts,
> > > > QEMU uses this information to issue KVM_SET_USER_MEMORY_REGION
> > > > accordingly.  The page tables are built for this (usually very high)
> > > > mirror GPA and the migration helper operates in a completely separate
> > > > address space.  However, the backing memory would still be shared
> > > > between the main and mirror VMs.  I am saying this is more future proof
> > > > because we have more flexibility in setting up the physical address
> > > > space of the mirror VM.
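
(For illustration, acting on such a (main GPA, length, mirror GPA)
tuple in QEMU could look roughly like this -- a sketch assuming the
host address backing the main GPA has already been looked up; the slot
number and function name are invented:)

    /* Sketch: expose the same host memory to the mirror VM at a high
     * mirror GPA.  hva is the host virtual address backing main_gpa
     * in the primary VM (lookup not shown). */
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int map_into_mirror(int mirror_vm_fd, void *hva,
                               uint64_t mirror_gpa, uint64_t len,
                               uint32_t slot)
    {
        struct kvm_userspace_memory_region region = {
            .slot            = slot,
            .guest_phys_addr = mirror_gpa,
            .memory_size     = len,
            .userspace_addr  = (uintptr_t)hva,
        };

        return ioctl(mirror_vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
    }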
> > >
> > > My intuition for this leans more on the host, but matches some of the
> > > bits you've mentioned in (2)/(3). My intuition would be to put the
> > > migration helper incredibly high in gPA space, so that it does not
> > > collide with the rest of the guest (and can then stay in the same
> > > place for a fairly long period of time without needing to poke a hole
> > > in the guest). Then you can leave the ram_addr_t-based addresses
> > > mapped normally (without the offsetting). All this together allows the
> > > migration helper to be orthogonal to the normal guest and normal
> > > firmware.
> > >
> > > In this case, since the migration helper has a somewhat stable base
> > > address, you can have a prebaked entry point and page tables
> > > (determined at build time). The shared communication pages can come
> > > from neighboring high-memory. The migration helper can support a
> > > straightforward halt loop (or PIO loop, or whatever) where it reads
> > > from a predefined page to find what work needs to be done (perhaps
> > > with that page depending on which CPU it is, so you can support
> > > multithreading of the migration helper). Additionally, having it high
> > > in memory makes it quite easy to assess who owns which addresses: high
> > > mem is under the purview of the migration helper and does not need to
> > > be dirty tracked. Only "low" memory can and needs to be encrypted for
> > > transport to the target side.
> > >
> > > --Steve
> > > >
> > > > Paolo
> > > >
> > >
> > 
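
(A toy version of the polling loop Steve describes above might look
like the following; the command page layout and handle_one() are
placeholders, invented purely for illustration:)

    /* Toy MH work loop: spin on a per-vCPU command page at a fixed,
     * high guest physical address.  All names and values are made up. */
    #include <stdint.h>

    struct mh_cmd {
        volatile uint32_t opcode;   /* 0 = idle; written by QEMU */
        uint64_t gpa;               /* page to export/import */
        uint64_t len;
    };

    /* Hypothetical: performs the actual page export/import. */
    extern void handle_one(struct mh_cmd *cmd);

    static void mh_loop(struct mh_cmd *cmd)
    {
        for (;;) {
            while (cmd->opcode == 0)
                __asm__ volatile("pause");  /* or hlt/PIO to yield */
            handle_one(cmd);
            cmd->opcode = 0;                /* signal completion */
        }
    }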

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-18 16:28                       ` James Bottomley
@ 2021-08-18 17:30                         ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 104+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-18 17:30 UTC (permalink / raw)
  To: James Bottomley
  Cc: Ashish Kalra, Paolo Bonzini, qemu-devel, thomas.lendacky,
	brijesh.singh, ehabkost, mst, richard.henderson, tobin, dovmurik,
	frankeh, kvm

* James Bottomley (jejb@linux.ibm.com) wrote:
> On Wed, 2021-08-18 at 16:43 +0100, Dr. David Alan Gilbert wrote:
> > * James Bottomley (jejb@linux.ibm.com) wrote:
> [...]
> > > Given the lack of SMI, we can't guarantee that with plain SEV and
> > > -ES. Once we move to -SNP, we can use VMPLs to achieve this.
> > 
> > Doesn't the MH have access to different slots and running on separate
> > vCPUs; so it's still got some separation?
> 
Remember that the OVMF code is provided by the host, but it's attested
to and run by the guest.  Once the guest takes control (i.e. after OVMF
boots the next thing), we can't guarantee that it won't overwrite the MH
code, so the host must treat the MH as untrusted.

Yeh; if it's in a ROM image I guess we could write-protect it?
(Not that I'd trust it still)

> > > But realistically, given the above API, even if the guest is
> > > malicious, what can it do?  I think it's simply return bogus pages
> > > that cause a crash on start after migration, which doesn't look
> > > like a huge risk to the cloud to me (it's more a self destructive
> > > act on behalf of the guest).
> > 
> > I'm a bit worried about the data structures that are shared between
> > the migration code in qemu and the MH; the code in qemu is going to
> > have to be paranoid about not trusting anything coming from the MH.
> 
> Given that we have to treat the host MH structure as untrusted, this is
> definitely something we have to do.  Although the primary API is simply
> "here's a buffer, please fill it", so there's not much checking to do,
> we just have to be careful that we don't expose any more of the buffer
> than the guest needs to write to ... and, obviously, clean it before
> exposing it to the guest.

I was assuming life got a bit more complicated than that; and we had
to have lists of pages we were requesting, and a list of pages that were
cooked and the qemu thread and the helper thread all had to work in
parallel.  So I'm guessing some list or bookkeeping that we need to be
very careful of.
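
(Purely to illustrate the kind of thing I mean -- every name below is
invented -- the danger is in fields like the indices that the MH can
scribble on:)

    /* Illustrative request ring shared between QEMU and the MH.
     * Every field the MH can write (state, tail) must be validated
     * by QEMU on each read, e.g. tail % RING_SIZE, since the MH is
     * untrusted. */
    #include <stdint.h>

    #define RING_SIZE 256

    struct page_req {
        uint64_t gpa;        /* page QEMU asked for */
        uint32_t state;      /* REQUESTED -> COOKED, advanced by MH */
    };

    struct page_ring {
        struct page_req req[RING_SIZE];
        uint32_t head;       /* written by QEMU */
        uint32_t tail;       /* written by the MH -- validate! */
    };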

Dave

> James
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 104+ messages in thread


* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-18 17:30                         ` Dr. David Alan Gilbert
@ 2021-08-18 18:51                           ` James Bottomley
  -1 siblings, 0 replies; 104+ messages in thread
From: James Bottomley @ 2021-08-18 18:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Ashish Kalra, Paolo Bonzini, qemu-devel, thomas.lendacky,
	brijesh.singh, ehabkost, mst, richard.henderson, tobin, dovmurik,
	frankeh, kvm

On Wed, 2021-08-18 at 18:30 +0100, Dr. David Alan Gilbert wrote:
> * James Bottomley (jejb@linux.ibm.com) wrote:
> > On Wed, 2021-08-18 at 16:43 +0100, Dr. David Alan Gilbert wrote:
> > > * James Bottomley (jejb@linux.ibm.com) wrote:
> > [...]
> > > > Given the lack of SMI, we can't guarantee that with plain SEV
> > > > and -ES. Once we move to -SNP, we can use VMPLs to achieve
> > > > this.
> > > 
> > > Doesn't the MH have access to different slots and running on
> > > separate vCPUs; so it's still got some separation?
> > 
> > Remember that the OVMF code is provided by the host, but it's
> > attested to and run by the guest.  Once the guest takes control
> > (i.e. after OVMF boots the next thing), we can't guarantee that it
> > won't overwrite the MH code, so the host must treat the MH as
> > untrusted.
> 
> Yeh; if it's in a ROM image I guess we could write-protect it?
> (Not that I'd trust it still)

Yes, but unfortunately OVMF (and edk2 in general) has another pitfall
for you: the initial pflash may be a read-only ROM image, but it
uncompresses itself to low RAM and executes itself out of there.
Anything in either PEI or DXE (which is where the migration handler
lies) is RAM-based after decompression.

> > > > But realistically, given the above API, even if the guest is
> > > > malicious, what can it do?  I think it's simply return bogus
> > > > pages that cause a crash on start after migration, which
> > > > doesn't look like a huge risk to the cloud to me (it's more a
> > > > self destructive act on behalf of the guest).
> > > 
> > > I'm a bit worried about the data structures that are shared
> > > between the migration code in qemu and the MH; the code in qemu
> > > is going to have to be paranoid about not trusting anything
> > > coming from the MH.
> > 
> > Given that we have to treat the host MH structure as untrusted,
> > this is definitely something we have to do.  Although the primary
> > API is simply "here's a buffer, please fill it", so there's not
> > much checking to do, we just have to be careful that we don't
> > expose any more of the buffer than the guest needs to write to ...
> > and, obviously, clean it before exposing it to the guest.
> 
> I was assuming life got a bit more complicated than that; and we had
> to have lists of pages we were requesting, and a list of pages that
> were cooked and the qemu thread and the helper thread all had to work
> in parallel.  So I'm guessing some list or bookkeeping that we need
> to be very careful of.

I was more or less imagining a GPA and length, so range-based, but it
could be we need something more sophisticated ... Tobin will look after
that part.  However, either way, we just need to be careful.

Regards,

James



^ permalink raw reply	[flat|nested] 104+ messages in thread


* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-18 15:32           ` Tobin Feldman-Fitzthum
@ 2021-08-18 19:04             ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 104+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-18 19:04 UTC (permalink / raw)
  To: Tobin Feldman-Fitzthum
  Cc: Steve Rutherford, Paolo Bonzini, Ashish Kalra, thomas.lendacky,
	brijesh.singh, ehabkost, kvm, mst, tobin, jejb,
	richard.henderson, qemu-devel, frankeh, dovmurik

* Tobin Feldman-Fitzthum (tobin@linux.ibm.com) wrote:
> On 8/17/21 6:04 PM, Steve Rutherford wrote:
> > On Tue, Aug 17, 2021 at 1:50 PM Tobin Feldman-Fitzthum
> > <tobin@linux.ibm.com> wrote:
> > > This is essentially what we do in our prototype, although we have an
> > > even simpler approach. We have a 1:1 mapping that maps an address to
> > > itself with the cbit set. During migration QEMU asks the migration
> > > handler to import/export encrypted pages and provides the GPA for said
> > > page. Since the migration handler only exports/imports encrypted pages,
> > > we can have the cbit set for every page in our mapping. We can still use
> > > OVMF functions with these mappings because they are on encrypted pages.
> > > The MH does need to use a few shared pages (to communicate with QEMU,
> > > for instance), so we have another mapping without the cbit that is at a
> > > large offset.
> > > 
> > > I think this is basically equivalent to what you suggest. As you point
> > > out above, this approach does require that any page that will be
> > > exported/imported by the MH is mapped in the guest. Is this a bad
> > > assumption? The VMSA for SEV-ES is one example of a region that is
> > > encrypted but not mapped in the guest (the PSP handles it directly). We
> > > have been planning to map the VMSA into the guest to support migration
> > > with SEV-ES (along with other changes).
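
(For illustration, the 1:1 encrypted mapping described above boils
down to setting the C-bit in each page-table entry; the C-bit position
is reported by CPUID leaf 0x8000001F EBX[5:0], and the helper names
below are invented:)

    /* Sketch: derive the SEV C-bit mask and build an identity PTE
     * with it set, so the mapped page is treated as encrypted. */
    #include <stdint.h>
    #include <cpuid.h>

    static uint64_t sev_cbit_mask(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(0x8000001f, &eax, &ebx, &ecx, &edx))
            return 0;                    /* no SEV leaf available */
        return 1ULL << (ebx & 0x3f);     /* EBX[5:0] = C-bit position */
    }

    static uint64_t make_encrypted_pte(uint64_t paddr, uint64_t prot)
    {
        /* Identity mapping: virtual address == physical address. */
        return (paddr & ~0xfffULL) | prot | sev_cbit_mask();
    }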
> > Ahh, It sounds like you are looking into sidestepping the existing
> > AMD-SP flows for migration. I assume the idea is to spin up a VM on
> > the target side, and have the two VMs attest to each other. How do the
> > two sides know if the other is legitimate? I take it that the source
> > is directing the LAUNCH flows?
> 
> Yeah we don't use PSP migration flows at all. We don't need to send the MH
> code from the source to the target because the MH lives in firmware, which
> is common between the two.

Are you relying on the target firmware to be *identical* or purely for
it to be *compatible* ?  It's normal for a migration to be the result of
wanting to do an upgrade; and that means the destination build of OVMF
might be newer (or older, or ...).

Dave


> We start the target like a normal VM rather than
> waiting for an incoming migration. The plan is to treat the target like a
> normal VM for attestation as well. The guest owner will attest the target VM
> just like they would any other VM that is started on their behalf. Secret
> injection can be used to establish a shared key for the source and target.
> 
> -Tobin
> 
> > 
> > --Steve
> > 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 104+ messages in thread


* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-18 10:31           ` Ashish Kalra
  2021-08-18 11:25               ` James Bottomley
@ 2021-08-18 19:47             ` Paolo Bonzini
  1 sibling, 0 replies; 104+ messages in thread
From: Paolo Bonzini @ 2021-08-18 19:47 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Thomas Lendacky, Brijesh Singh, Habkost, Eduardo, kvm,
	S. Tsirkin, Michael, Tobin Feldman-Fitzthum,
	James E . J . Bottomley, Richard Henderson, qemu-devel,
	David Gilbert, Hubertus Franke, Dov Murik


On Wed, 18 Aug 2021 at 12:31, Ashish Kalra <ashish.kalra@amd.com> wrote:

> Hello Paolo,
>
> On Mon, Aug 16, 2021 at 05:38:55PM +0200, Paolo Bonzini wrote:
> > On 16/08/21 17:13, Ashish Kalra wrote:
> > > > > I think that once the mirror VM starts booting and running the UEFI
> > > > > code, it might be only during the PEI or DXE phase where it will
> > > > > start actually running the MH code, so mirror VM probably still
> need
> > > > > to handles KVM_EXIT_IO when SEC phase does I/O, I can see PIC
> > > > > accesses and Debug Agent initialization stuff in SEC startup code.
> > > > That may be a design of the migration helper code that you were
> working
> > > > with, but it's not necessary.
> > > >
> > > Actually my comments are about a more generic MH code.
> >
> > I don't think that would be a good idea; designing QEMU's migration
> helper
> > interface to be as constrained as possible is a good thing.  The
> migration
> > helper is extremely security sensitive code, so it should not expose
> itself
> > to the attack surface of the whole of QEMU.
> >
> >
> One question I have here: where exactly will the MH code exist
> in QEMU?
>
> I assume it will be x86 platform-specific code only; we probably will
> never support it on other platforms?
>
> So it will probably exist in hw/i386, something similar to "microvm"
> support and using the same TYPE_X86_MACHINE?
>
> Also, if we are not going to use the existing KVM support code and are
> adding some duplicate KVM interface code, do we need to interface with
> this added KVM code via the QEMU accelerator framework, or simply invoke
> this KVM code statically?
>

I would expect it to be mostly separate. Once we get a second architecture we
may move part of it into TYPE_ACCEL_KVM, but that can come later and
probably it would still not reuse much code from the full-blown KVM code
that supports interrupts, MMIO and all that.

Paolo


> Thanks,
> Ashish
>
>


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-18 19:04             ` Dr. David Alan Gilbert
@ 2021-08-18 21:42               ` Tobin Feldman-Fitzthum
  -1 siblings, 0 replies; 104+ messages in thread
From: Tobin Feldman-Fitzthum @ 2021-08-18 21:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Steve Rutherford, Paolo Bonzini, Ashish Kalra, thomas.lendacky,
	brijesh.singh, ehabkost, kvm, mst, tobin, jejb,
	richard.henderson, qemu-devel, frankeh, dovmurik

On 8/18/21 3:04 PM, Dr. David Alan Gilbert wrote:
> * Tobin Feldman-Fitzthum (tobin@linux.ibm.com) wrote:
>> On 8/17/21 6:04 PM, Steve Rutherford wrote:
>>> Ahh, It sounds like you are looking into sidestepping the existing
>>> AMD-SP flows for migration. I assume the idea is to spin up a VM on
>>> the target side, and have the two VMs attest to each other. How do the
>>> two sides know if the other is legitimate? I take it that the source
>>> is directing the LAUNCH flows?
>> Yeah we don't use PSP migration flows at all. We don't need to send the MH
>> code from the source to the target because the MH lives in firmware, which
>> is common between the two.
> Are you relying on the target firmware to be *identical* or purely for
> it to be *compatible* ?  It's normal for a migration to be the result of
> wanting to do an upgrade; and that means the destination build of OVMF
> might be newer (or older, or ...).
>
> Dave

This is a good point. The migration handler on the source and target 
must have the same memory footprint or bad things will happen. Using the 
same firmware on the source and target is an easy way to guarantee this. 
Since the MH in OVMF is not a contiguous region of memory, but a group 
of functions scattered around OVMF, it is a bit difficult to guarantee 
that the memory footprint is the same if the build is different.

-Tobin

>
>
>> We start the target like a normal VM rather than
>> waiting for an incoming migration. The plan is to treat the target like a
>> normal VM for attestation as well. The guest owner will attest the target VM
>> just like they would any other VM that is started on their behalf. Secret
>> injection can be used to establish a shared key for the source and target.
>>
>> -Tobin
>>
>>> --Steve
>>>

^ permalink raw reply	[flat|nested] 104+ messages in thread


* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-18 21:42               ` Tobin Feldman-Fitzthum
@ 2021-08-19  8:22                 ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 104+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-19  8:22 UTC (permalink / raw)
  To: Tobin Feldman-Fitzthum
  Cc: Steve Rutherford, Paolo Bonzini, Ashish Kalra, thomas.lendacky,
	brijesh.singh, ehabkost, kvm, mst, tobin, jejb,
	richard.henderson, qemu-devel, frankeh, dovmurik

* Tobin Feldman-Fitzthum (tobin@linux.ibm.com) wrote:
> On 8/18/21 3:04 PM, Dr. David Alan Gilbert wrote:
> > * Tobin Feldman-Fitzthum (tobin@linux.ibm.com) wrote:
> > > On 8/17/21 6:04 PM, Steve Rutherford wrote:
> > > > Ahh, It sounds like you are looking into sidestepping the existing
> > > > AMD-SP flows for migration. I assume the idea is to spin up a VM on
> > > > the target side, and have the two VMs attest to each other. How do the
> > > > two sides know if the other is legitimate? I take it that the source
> > > > is directing the LAUNCH flows?
> > > Yeah we don't use PSP migration flows at all. We don't need to send the MH
> > > code from the source to the target because the MH lives in firmware, which
> > > is common between the two.
> > Are you relying on the target firmware to be *identical* or purely for
> > it to be *compatible* ?  It's normal for a migration to be the result of
> > wanting to do an upgrade; and that means the destination build of OVMF
> > might be newer (or older, or ...).
> > 
> > Dave
> 
> This is a good point. The migration handler on the source and target must
> have the same memory footprint or bad things will happen. Using the same
> firmware on the source and target is an easy way to guarantee this. Since
> the MH in OVMF is not a contiguous region of memory, but a group of
> functions scattered around OVMF, it is a bit difficult to guarantee that the
> memory footprint is the same if the build is different.

Can you explain what the 'memory footprint' consists of? Can't it just
be the whole of the OVMF ROM space if you have no way of nudging the MH
into its own chunk?

I think it really does have to cope with migration to a new version of
host.

Dave

> -Tobin
> 
> > 
> > 
> > > We start the target like a normal VM rather than
> > > waiting for an incoming migration. The plan is to treat the target like a
> > > normal VM for attestation as well. The guest owner will attest the target VM
> > > just like they would any other VM that is started on their behalf. Secret
> > > injection can be used to establish a shared key for the source and target.
> > > 
> > > -Tobin
> > > 
> > > > --Steve
> > > > 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 104+ messages in thread


* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-19  8:22                 ` Dr. David Alan Gilbert
@ 2021-08-19 14:06                   ` James Bottomley
  -1 siblings, 0 replies; 104+ messages in thread
From: James Bottomley @ 2021-08-19 14:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Tobin Feldman-Fitzthum
  Cc: Steve Rutherford, Paolo Bonzini, Ashish Kalra, thomas.lendacky,
	brijesh.singh, ehabkost, kvm, mst, tobin, richard.henderson,
	qemu-devel, frankeh, dovmurik

On Thu, 2021-08-19 at 09:22 +0100, Dr. David Alan Gilbert wrote:
> * Tobin Feldman-Fitzthum (tobin@linux.ibm.com) wrote:
> > On 8/18/21 3:04 PM, Dr. David Alan Gilbert wrote:
> > > * Tobin Feldman-Fitzthum (tobin@linux.ibm.com) wrote:
> > > > On 8/17/21 6:04 PM, Steve Rutherford wrote:
> > > > > Ahh, It sounds like you are looking into sidestepping the
> > > > > existing AMD-SP flows for migration. I assume the idea is to
> > > > > spin up a VM on the target side, and have the two VMs attest
> > > > > to each other. How do the two sides know if the other is
> > > > > legitimate? I take it that the source is directing the LAUNCH
> > > > > flows?
> > > >  
> > > > Yeah we don't use PSP migration flows at all. We don't need to
> > > > send the MH code from the source to the target because the MH
> > > > lives in firmware, which is common between the two.
> > >  
> > > Are you relying on the target firmware to be *identical* or
> > > purely for it to be *compatible* ?  It's normal for a migration
> > > to be the result of wanting to do an upgrade; and that means the
> > > destination build of OVMF might be newer (or older, or ...).
> > > 
> > > Dave
> > 
> > This is a good point. The migration handler on the source and
> > target must have the same memory footprint or bad things will
> > happen. Using the same firmware on the source and target is an easy
> > way to guarantee this. Since the MH in OVMF is not a contiguous
> > region of memory, but a group of functions scattered around OVMF,
> > it is a bit difficult to guarantee that the memory footprint is the
> > same if the build is different.
> 
> Can you explain what the 'memory footprint' consists of? Can't it
> > just be the whole of the OVMF ROM space if you have no way of nudging
> > the MH into its own chunk?

It might be possible depending on how we link it. At the moment it's
using the core OVMF libraries, but it is possible to retool the OVMF
build to copy those libraries into the MH DXE.

> I think it really does have to cope with migration to a new version
> of host.

Well, you're thinking of OVMF as belonging to the host because of the
way it is supplied, but think about the way it works in practice now,
forgetting about confidential computing: OVMF is RAM resident in
ordinary guests, so when you migrate them, the whole of OVMF (or at
least what's left at runtime) goes with the migration, thus it's not
possible to change the guest OVMF by migration.  The above is really
just an extension of that principle, the only difference for
confidential computing being you have to have an image of the current
OVMF ROM in the target to seed migration.

Technically, the problem is we can't overwrite running code and once
the guest is re-sited to the target, the OVMF there has to match
exactly what was on the source for the RT to still function.   Once the
migration has run, the OVMF on the target must be identical to what was
on the source (including internally allocated OVMF memory), and if we
can't copy the MH code, we have to rely on the target image providing
this identical code and we copy the rest.

James



^ permalink raw reply	[flat|nested] 104+ messages in thread


* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-19  8:22                 ` Dr. David Alan Gilbert
@ 2021-08-19 14:07                   ` Tobin Feldman-Fitzthum
  -1 siblings, 0 replies; 104+ messages in thread
From: Tobin Feldman-Fitzthum @ 2021-08-19 14:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: thomas.lendacky, Ashish Kalra, brijesh.singh, ehabkost, kvm, mst,
	Steve Rutherford, richard.henderson, jejb, tobin, qemu-devel,
	frankeh, Paolo Bonzini, dovmurik

On 8/19/21 4:22 AM, Dr. David Alan Gilbert wrote:
> * Tobin Feldman-Fitzthum (tobin@linux.ibm.com) wrote:
>> On 8/18/21 3:04 PM, Dr. David Alan Gilbert wrote:
>>
>>> Are you relying on the target firmware to be *identical* or purely for
>>> it to be *compatible* ?  It's normal for a migration to be the result of
>>> wanting to do an upgrade; and that means the destination build of OVMF
>>> might be newer (or older, or ...).
>>>
>>> Dave
>> This is a good point. The migration handler on the source and target must
>> have the same memory footprint or bad things will happen. Using the same
>> firmware on the source and target is an easy way to guarantee this. Since
>> the MH in OVMF is not a contiguous region of memory, but a group of
>> functions scattered around OVMF, it is a bit difficult to guarantee that the
>> memory footprint is the same if the build is different.
> Can you explain what the 'memory footprint' consists of? Can't it just
> be the whole of the OVMF ROM space if you have no way of nudging the MH
> into its own chunk?

The footprint is not massive. It is mainly ConfidentialMigrationDxe and 
the OVMF crypto support. It might be feasible to copy these components 
to a fixed location that would be the same across fw builds. It might 
also be feasible to pin these components to certain addresses. OVMF sort 
of supports doing this. We can raise the question in that community.

It also might work to protect the entirety of OVMF as you suggest.
Currently we don't copy any of the OVMF ROM (as in flash0) over. That
said, the MH doesn't run from the ROM, so we would need to protect the
memory used by OVMF as well. In some ways it might seem easier to
protect all of the OVMF memory rather than just a couple of packages,
but there are some complexities. For one thing, we would only want to
protect EFI runtime memory, as boot memory may be in use by the OS and
would need to be migrated. The MH could check whether each page is EFI
runtime memory and skip any pages that are (see the sketch below).
Runtime memory won't be a contiguous blob, however, so for this
approach the layout of the runtime memory would need to be the same on
the source and target.
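
(A rough sketch of that check, assuming the MH keeps a saved copy of
the memory map obtained from gBS->GetMemoryMap(); GpaIsRuntimeMemory is
an invented name:)

    /* Sketch: does Gpa fall in EFI runtime memory?  Walk a saved
     * UEFI memory map; note that descriptors must be stepped by
     * DescriptorSize, not sizeof (EFI_MEMORY_DESCRIPTOR). */
    #include <Uefi.h>

    BOOLEAN
    GpaIsRuntimeMemory (
      IN EFI_PHYSICAL_ADDRESS   Gpa,
      IN EFI_MEMORY_DESCRIPTOR  *Map,
      IN UINTN                  MapSize,
      IN UINTN                  DescriptorSize
      )
    {
      EFI_MEMORY_DESCRIPTOR  *Desc;

      for (Desc = Map;
           (UINT8 *)Desc < (UINT8 *)Map + MapSize;
           Desc = (EFI_MEMORY_DESCRIPTOR *)((UINT8 *)Desc + DescriptorSize)) {
        if (((Desc->Attribute & EFI_MEMORY_RUNTIME) != 0) &&
            (Gpa >= Desc->PhysicalStart) &&
            (Gpa < Desc->PhysicalStart + EFI_PAGES_TO_SIZE (Desc->NumberOfPages))) {
          return TRUE;
        }
      }

      return FALSE;
    }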

We can sidestep these issues entirely by using identical firmware 
images. That said, there are a number of strategies for developing 
compatible OVMF images and I definitely see the value of doing so.

-Tobin

>
> I think it really does have to cope with migration to a new version of
> host.
>
> Dave
>
>> -Tobin
>>
>>>
>>>> We start the target like a normal VM rather than
>>>> waiting for an incoming migration. The plan is to treat the target like a
>>>> normal VM for attestation as well. The guest owner will attest the target VM
>>>> just like they would any other VM that is started on their behalf. Secret
>>>> injection can be used to establish a shared key for the source and target.
>>>>
>>>> -Tobin
>>>>
>>>>> --Steve
>>>>>

^ permalink raw reply	[flat|nested] 104+ messages in thread


* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-19 14:06                   ` James Bottomley
@ 2021-08-19 14:28                     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 104+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-19 14:28 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tobin Feldman-Fitzthum, Steve Rutherford, Paolo Bonzini,
	Ashish Kalra, thomas.lendacky, brijesh.singh, ehabkost, kvm, mst,
	tobin, richard.henderson, qemu-devel, frankeh, dovmurik

* James Bottomley (jejb@linux.ibm.com) wrote:
> On Thu, 2021-08-19 at 09:22 +0100, Dr. David Alan Gilbert wrote:
> > * Tobin Feldman-Fitzthum (tobin@linux.ibm.com) wrote:
> > > On 8/18/21 3:04 PM, Dr. David Alan Gilbert wrote:
> > > > * Tobin Feldman-Fitzthum (tobin@linux.ibm.com) wrote:
> > > > > On 8/17/21 6:04 PM, Steve Rutherford wrote:
> > > > > > Ahh, It sounds like you are looking into sidestepping the
> > > > > > existing AMD-SP flows for migration. I assume the idea is to
> > > > > > spin up a VM on the target side, and have the two VMs attest
> > > > > > to each other. How do the two sides know if the other is
> > > > > > legitimate? I take it that the source is directing the LAUNCH
> > > > > > flows?
> > > > >  
> > > > > Yeah we don't use PSP migration flows at all. We don't need to
> > > > > send the MH code from the source to the target because the MH
> > > > > lives in firmware, which is common between the two.
> > > >  
> > > > Are you relying on the target firmware to be *identical* or
> > > > purely for it to be *compatible* ?  It's normal for a migration
> > > > to be the result of wanting to do an upgrade; and that means the
> > > > destination build of OVMF might be newer (or older, or ...).
> > > > 
> > > > Dave
> > > 
> > > This is a good point. The migration handler on the source and
> > > target must have the same memory footprint or bad things will
> > > happen. Using the same firmware on the source and target is an easy
> > > way to guarantee this. Since the MH in OVMF is not a contiguous
> > > region of memory, but a group of functions scattered around OVMF,
> > > it is a bit difficult to guarantee that the memory footprint is the
> > > same if the build is different.
> > 
> > Can you explain what the 'memory footprint' consists of? Can't it
> > just be the whole of the OVMF ROM space if you have no way of nudging
> > the MH into its own chunk?
> 
> It might be possible depending on how we link it. At the moment it's
> using the core OVMF libraries, but it is possible to retool the OVMF
> build to copy those libraries into the MH DXE.
> 
> > I think it really does have to cope with migration to a new version
> > of host.
> 
> Well, you're thinking of OVMF as belonging to the host because of the
> way it is supplied, but think about the way it works in practice now,
> forgetting about confidential computing: OVMF is RAM resident in
> ordinary guests, so when you migrate them, the whole of OVMF (or at
> least what's left at runtime) goes with the migration, thus it's not
> possible to change the guest OVMF by migration.  The above is really
> just an extension of that principle, the only difference for
> confidential computing being you have to have an image of the current
> OVMF ROM in the target to seed migration.
> 
> Technically, the problem is we can't overwrite running code and once
> the guest is re-sited to the target, the OVMF there has to match
> exactly what was on the source for the RT to still function.   Once the
> migration has run, the OVMF on the target must be identical to what was
> on the source (including internally allocated OVMF memory), and if we
> can't copy the MH code, we have to rely on the target image providing
> this identical code and we copy the rest.

I'm OK with the OVMF now being part of the guest image, and having to
exist on both; it's a bit delicate though unless we have a way to check
it (is there an attestation of the destination happening here?)

Dave

> James
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 104+ messages in thread


* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-19 14:28                     ` Dr. David Alan Gilbert
@ 2021-08-19 22:10                       ` James Bottomley
  -1 siblings, 0 replies; 104+ messages in thread
From: James Bottomley @ 2021-08-19 22:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Tobin Feldman-Fitzthum, Steve Rutherford, Paolo Bonzini,
	Ashish Kalra, thomas.lendacky, brijesh.singh, ehabkost, kvm, mst,
	tobin, richard.henderson, qemu-devel, frankeh, dovmurik

On Thu, 2021-08-19 at 15:28 +0100, Dr. David Alan Gilbert wrote:
> * James Bottomley (jejb@linux.ibm.com) wrote:
> > On Thu, 2021-08-19 at 09:22 +0100, Dr. David Alan Gilbert wrote:
[...]
> > > I think it really does have to cope with migration to a new
> > > version of host.
> > 
> > Well, you're thinking of OVMF as belonging to the host because of
> > the way it is supplied, but think about the way it works in
> > practice now, forgetting about confidential computing: OVMF is RAM
> > resident in ordinary guests, so when you migrate them, the whole of
> > OVMF (or at least what's left at runtime) goes with the migration,
> > thus it's not possible to change the guest OVMF by migration.  The
> > above is really just an extension of that principle, the only
> > difference for confidential computing being you have to have an
> > image of the current OVMF ROM in the target to seed migration.
> > 
> > Technically, the problem is we can't overwrite running code and
> > once the guest is re-sited to the target, the OVMF there has to
> > match exactly what was on the source for the RT to still
> > function.   Once the migration has run, the OVMF on the target must
> > be identical to what was on the source (including internally
> > allocated OVMF memory), and if we can't copy the MH code, we have
> > to rely on the target image providing this identical code and we
> > copy the rest.
> 
> I'm OK with the OVMF now being part of the guest image, and having to
> exist on both; it's a bit delicate though unless we have a way to
> check it (is there an attest of the destination happening here?)

There will be in the final version.  The attestations of the source and
target, being the hash of the OVMF (with the registers in the -ES
case), should be the same (modulo any firmware updates to the PSP,
whose firmware version is also hashed) to guarantee the OVMF is the
same on both sides.  We'll definitely take an action to get QEMU to
verify this ... made a lot easier now we have signed attestations ...
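
To be concrete about what that QEMU-side check involves, here is a rough
sketch (the report layout and the digest helper are hypothetical
placeholders; the point is only that the comparison has to be done modulo
the PSP firmware version, since that version is folded into the digest):

    /* Hypothetical sketch: each side's launch digest covers the OVMF
     * image (plus initial register state in the -ES case) and is bound
     * to that side's PSP firmware version, so recompute the expected
     * digest per side rather than comparing the two attestations
     * byte-for-byte.  All names below are placeholders. */
    #include <stdbool.h>
    #include <string.h>

    struct launch_report {                  /* assumed layout */
        unsigned char measurement[32];
        unsigned char fw_major, fw_minor, fw_build;
    };

    /* placeholder for the real digest construction */
    void expected_digest(unsigned char out[32],
                         const unsigned char ovmf_hash[32],
                         unsigned char maj, unsigned char min,
                         unsigned char build);

    static bool sides_run_same_ovmf(const struct launch_report *src,
                                    const struct launch_report *dst,
                                    const unsigned char ovmf_hash[32])
    {
        unsigned char want[32];

        /* Expected digest for the source, given its firmware version. */
        expected_digest(want, ovmf_hash, src->fw_major, src->fw_minor,
                        src->fw_build);
        if (memcmp(want, src->measurement, 32) != 0)
            return false;

        /* Same check for the destination. */
        expected_digest(want, ovmf_hash, dst->fw_major, dst->fw_minor,
                        dst->fw_build);
        return memcmp(want, dst->measurement, 32) == 0;
    }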

James



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-19 22:10                       ` James Bottomley
@ 2021-08-23 12:26                         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 104+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-23 12:26 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tobin Feldman-Fitzthum, Steve Rutherford, Paolo Bonzini,
	Ashish Kalra, thomas.lendacky, brijesh.singh, ehabkost, kvm, mst,
	tobin, richard.henderson, qemu-devel, frankeh, dovmurik

* James Bottomley (jejb@linux.ibm.com) wrote:
> On Thu, 2021-08-19 at 15:28 +0100, Dr. David Alan Gilbert wrote:
> > * James Bottomley (jejb@linux.ibm.com) wrote:
> > > On Thu, 2021-08-19 at 09:22 +0100, Dr. David Alan Gilbert wrote:
> [...]
> > > > I think it really does have to cope with migration to a new
> > > > version of host.
> > > 
> > > Well, you're thinking of OVMF as belonging to the host because of
> > > the way it is supplied, but think about the way it works in
> > > practice now, forgetting about confidential computing: OVMF is RAM
> > > resident in ordinary guests, so when you migrate them, the whole of
> > > OVMF (or at least what's left at runtime) goes with the migration,
> > > thus it's not possible to change the guest OVMF by migration.  The
> > > above is really just an extension of that principle, the only
> > > difference for confidential computing being you have to have an
> > > image of the current OVMF ROM in the target to seed migration.
> > > 
> > > Technically, the problem is we can't overwrite running code and
> > > once the guest is re-sited to the target, the OVMF there has to
> > > match exactly what was on the source for the RT to still
> > > function.   Once the migration has run, the OVMF on the target must
> > > be identical to what was on the source (including internally
> > > allocated OVMF memory), and if we can't copy the MH code, we have
> > > to rely on the target image providing this identical code and we
> > > copy the rest.
> > 
> > I'm OK with the OVMF now being part of the guest image, and having to
> > exist on both; it's a bit delicate though unless we have a way to
> > check it (is there an attest of the destination happening here?)
> 
> There will be in the final version.  The attestations of the source and
> target, being the hash of the OVMF (with the registers in the -ES
> case), should be the same (modulo any firmware updates to the PSP,
> whose firmware version is also hashed) to guarantee the OVMF is the
> same on both sides.  We'll definitely take an action to get QEMU to
> verify this ... made a lot easier now we have signed attestations ...

Hmm; I'm not sure you're allowed to have QEMU verify that - we don't
trust it; you need to have either the firmware say it's OK to migrate
to the destination (using the existing PSP mechanism) or get the source
MH to verify a quote from the destination.

[Somewhere along the line, if you're not using the PSP, I think you also
need to check the guest policy to confirm it is allowed to migrate].
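
(For concreteness, the PSP mechanism here is the KVM_SEV_SEND_START flow,
roughly as sketched below: the PSP checks the destination's certificate
chain against the guest policy and only returns session key material if
that succeeds.  Error handling and the fetching of the certificate blobs
from the destination are elided; treat this as a sketch, not QEMU's
actual code.)

    /* Sketch: ask the PSP to authorize migration to a destination.
     * The PSP verifies the destination's PDH certificate chain and
     * enforces the guest policy; only then does it produce a session
     * blob carrying the transport key material. */
    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    static int sev_authorize_destination(int vm_fd, int sev_fd, __u32 policy,
                                         void *pdh, __u32 pdh_len,
                                         void *plat, __u32 plat_len,
                                         void *amd, __u32 amd_len,
                                         void *session, __u32 session_len)
    {
        struct kvm_sev_send_start start = {
            .policy           = policy,  /* checked by the PSP */
            .pdh_cert_uaddr   = (__u64)(unsigned long)pdh,
            .pdh_cert_len     = pdh_len,
            .plat_certs_uaddr = (__u64)(unsigned long)plat,
            .plat_certs_len   = plat_len,
            .amd_certs_uaddr  = (__u64)(unsigned long)amd,
            .amd_certs_len    = amd_len,
            .session_uaddr    = (__u64)(unsigned long)session,
            .session_len      = session_len,
        };
        struct kvm_sev_cmd cmd = {
            .id     = KVM_SEV_SEND_START,
            .data   = (__u64)(unsigned long)&start,
            .sev_fd = sev_fd,
        };

        /* Fails, with no key material produced, if the policy forbids
         * migration or the certificates don't check out. */
        return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
    }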

Dave

> James
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
  2021-08-23 12:26                         ` Dr. David Alan Gilbert
@ 2021-08-23 16:28                           ` Tobin Feldman-Fitzthum
  0 siblings, 0 replies; 104+ messages in thread
From: Tobin Feldman-Fitzthum @ 2021-08-23 16:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, James Bottomley
  Cc: thomas.lendacky, Ashish Kalra, brijesh.singh, ehabkost, kvm, mst,
	Steve Rutherford, richard.henderson, tobin, qemu-devel, frankeh,
	Paolo Bonzini, dovmurik

On 8/23/21 8:26 AM, Dr. David Alan Gilbert wrote:

> * James Bottomley (jejb@linux.ibm.com) wrote:
>
>>> (is there an attest of the destination happening here?)
>> There will be in the final version.  The attestations of the source and
>> target, being the hash of the OVMF (with the registers in the -ES
>> case), should be the same (modulo any firmware updates to the PSP,
>> whose firmware version is also hashed) to guarantee the OVMF is the
>> same on both sides.  We'll definitely take an action to get QEMU to
>> verify this ... made a lot easier now we have signed attestations ...
> Hmm; I'm not sure you're allowed to have QEMU verify that - we don't
> trust it; you need to have either the firmware say it's OK to migrate
> to the destination (using the existing PSP mechanism) or get the source
> MH to verify a quote from the destination.

I think the check in QEMU would only be a convenience. The launch 
measurement of the target (verified by the guest owner) is what 
guarantees that the firmware, as well as the policy, of the target is 
what is expected. In PSP-assisted migration the source verifies the 
target, but our plan is to have the guest owner verify both the source 
and the target. The target will only be provisioned with the transport 
key if the measurement checks out. We will have some more details about 
this key agreement scheme soon.
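
Roughly, the guest-owner decision could look like the sketch below (every
name in it is a hypothetical placeholder, not the scheme we'll be
publishing):

    /* Hypothetical sketch of the guest-owner side: verify signed
     * launch attestations from BOTH source and target, check that the
     * policy permits migration, and only then wrap the transport key
     * for the target's Migration Handler. */
    #include <stdbool.h>
    #include <string.h>

    struct attestation {                 /* assumed fields */
        unsigned char measurement[32];   /* launch digest */
        unsigned int  policy;            /* guest policy bits */
        bool          signature_valid;   /* vs. the AMD cert chain */
    };

    #define POLICY_NO_MIGRATE 0x1u       /* placeholder bit */

    static bool owner_approve_migration(const struct attestation *src,
                                        const struct attestation *dst,
                                        const unsigned char *expected)
    {
        if (!src->signature_valid || !dst->signature_valid)
            return false;
        /* Both sides must run the expected OVMF (same MH code). */
        if (memcmp(src->measurement, expected, 32) != 0 ||
            memcmp(dst->measurement, expected, 32) != 0)
            return false;
        /* The policy must permit migration at all. */
        if (src->policy & POLICY_NO_MIGRATE)
            return false;
        return true;  /* caller then provisions the transport key */
    }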

> [Somewhere along the line, if you're not using the PSP, I think you also
> need to check the guest policy to confirm it is allowed to migrate].

Sources that aren't allowed to migrate won't be provisioned with the
transport key used to encrypt pages. A non-migratable guest could also be
booted with stock OvmfPkg firmware, which does not contain the migration
handler.
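
For example (image name illustrative), that comes down to nothing more
than which pflash build the guest boots:

    qemu-system-x86_64 \
        -drive if=pflash,format=raw,unit=0,readonly=on,file=OVMF_no_MH.fd \
        ...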

-Tobin

> Dave
>
>> James
>>
>>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC PATCH 00/13] Add support for Mirror VM.
@ 2021-08-16 15:07 ` Tobin Feldman-Fitzthum
  0 siblings, 0 replies; 104+ messages in thread
From: Tobin Feldman-Fitzthum @ 2021-08-16 15:07 UTC (permalink / raw)
  To: ashish.kalra
  Cc: qemu-devel, Tom Lendacky, Brijesh Singh, ehabkost, mst,
	richard.henderson, James Bottomley, dovmurik, Hubertus Franke,
	Dr. David Alan Gilbert, kvm list

On Mon, Aug 16 at 10:44 AM Ashish Kalra wrote:

 > I am not sure if we really don't need QEMU's MMIO logic; I think that
 > once the mirror VM starts booting and running the UEFI code, it might
 > be only during the PEI or DXE phase that it will start actually
 > running the MH code, so the mirror VM probably still needs to handle
 > KVM_EXIT_IO when the SEC phase does I/O. I can see PIC accesses and
 > Debug Agent initialization stuff in the SEC startup code.

The migration handler prototype that we are releasing in the near future 
does not use the SEC or PEI phases in the mirror. We have some support 
code that runs in the main VM and sets up the migration handler entry 
point. QEMU starts the mirror pointing to this entry point, which does 
some more setup (like switching to long mode) and jumps to the migration 
handler.
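
As a minimal sketch of that last step, for plain SEV where vCPU register
state is still settable from userspace (the entry address is whatever the
in-guest setup code advertised; the long-mode setup itself is not shown):

    /* Sketch: point the mirror vCPU at the MH entry point, then run. */
    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    static int start_mirror_at_mh(int mirror_vcpu_fd, __u64 mh_entry)
    {
        struct kvm_regs regs;

        if (ioctl(mirror_vcpu_fd, KVM_GET_REGS, &regs) < 0)
            return -1;
        regs.rip    = mh_entry;  /* jump straight to the MH entry */
        regs.rflags = 0x2;       /* bit 1 of RFLAGS is reserved-set */
        if (ioctl(mirror_vcpu_fd, KVM_SET_REGS, &regs) < 0)
            return -1;

        /* One iteration of the simplified run loop; the real loop
         * keeps calling KVM_RUN and handles the few exits the MH
         * needs. */
        return ioctl(mirror_vcpu_fd, KVM_RUN, 0);
    }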

-Tobin

 > Additionally, this still requires the CPUState{..} structure and the
 > backing "X86CPU" structure, for example, as part of kvm_arch_post_run()
 > to get the MemTxAttrs needed by kvm_handle_io().

 > Thanks,
 > Ashish


^ permalink raw reply	[flat|nested] 104+ messages in thread

Thread overview: 104+ messages

2021-08-16 13:25 [RFC PATCH 00/13] Add support for Mirror VM Ashish Kalra
2021-08-16 13:26 ` [RFC PATCH 01/13] machine: Add mirrorvcpus=N suboption to -smp Ashish Kalra
2021-08-16 21:23   ` Eric Blake
2021-08-16 13:27 ` [RFC PATCH 02/13] hw/boards: Add mirror_vcpu flag to CPUArchId Ashish Kalra
2021-08-16 13:27 ` [RFC PATCH 03/13] hw/i386: Mark mirror vcpus in possible_cpus Ashish Kalra
2021-08-16 13:27 ` [RFC PATCH 04/13] hw/acpi: Don't include mirror vcpus in ACPI tables Ashish Kalra
2021-08-16 13:28 ` [RFC PATCH 05/13] cpu: Add boolean mirror_vcpu field to CPUState Ashish Kalra
2021-08-16 13:28 ` [RFC PATCH 06/13] hw/i386: Set CPUState.mirror_vcpu=true for mirror vcpus Ashish Kalra
2021-08-16 13:29 ` [RFC PATCH 07/13] kvm: Add Mirror VM ioctl and enable cap interfaces Ashish Kalra
2021-08-16 13:29 ` [RFC PATCH 08/13] kvm: Add Mirror VM support Ashish Kalra
2021-08-16 13:29 ` [RFC PATCH 09/13] kvm: create Mirror VM and share primary VM's encryption context Ashish Kalra
2021-08-16 13:30 ` [RFC PATCH 10/13] softmmu/cpu: Skip mirror vcpu's for pause, resume and synchronization Ashish Kalra
2021-08-16 13:30 ` [RFC PATCH 11/13] kvm/apic: Disable in-kernel APIC support for mirror vcpu's Ashish Kalra
2021-08-16 13:31 ` [RFC PATCH 12/13] hw/acpi: disable modern CPU hotplug interface " Ashish Kalra
2021-08-16 13:31 ` [RFC PATCH 13/13] hw/i386/pc: reduce fw_cfg boot cpu count taking into account " Ashish Kalra
2021-08-16 14:01 ` [RFC PATCH 00/13] Add support for Mirror VM Claudio Fontana
2021-08-16 14:15 ` Paolo Bonzini
2021-08-16 14:23   ` Daniel P. Berrangé
2021-08-16 15:00     ` Paolo Bonzini
2021-08-16 15:16       ` Daniel P. Berrangé
2021-08-16 15:35         ` Paolo Bonzini
2021-08-16 14:44   ` Ashish Kalra
2021-08-16 14:58     ` Paolo Bonzini
2021-08-16 15:13       ` Ashish Kalra
2021-08-16 15:38         ` Paolo Bonzini
2021-08-16 15:48           ` Dr. David Alan Gilbert
2021-08-18 10:31           ` Ashish Kalra
2021-08-18 11:25             ` James Bottomley
2021-08-18 15:31               ` Dr. David Alan Gilbert
2021-08-18 15:35                 ` James Bottomley
2021-08-18 15:43                   ` Dr. David Alan Gilbert
2021-08-18 16:28                     ` James Bottomley
2021-08-18 17:30                       ` Dr. David Alan Gilbert
2021-08-18 18:51                         ` James Bottomley
2021-08-18 19:47             ` Paolo Bonzini
2021-08-16 17:23   ` Dr. David Alan Gilbert
2021-08-16 20:53     ` Paolo Bonzini
2021-08-16 23:53 ` Steve Rutherford
2021-08-17  7:05   ` Michael S. Tsirkin
2021-08-17  8:38   ` Dr. David Alan Gilbert
2021-08-17 14:08     ` Ashish Kalra
2021-08-17 16:32   ` Paolo Bonzini
2021-08-17 20:50     ` Tobin Feldman-Fitzthum
2021-08-17 22:04       ` Steve Rutherford
2021-08-18 15:32         ` Tobin Feldman-Fitzthum
2021-08-18 19:04           ` Dr. David Alan Gilbert
2021-08-18 21:42             ` Tobin Feldman-Fitzthum
2021-08-19  8:22               ` Dr. David Alan Gilbert
2021-08-19 14:06                 ` James Bottomley
2021-08-19 14:28                   ` Dr. David Alan Gilbert
2021-08-19 22:10                     ` James Bottomley
2021-08-23 12:26                       ` Dr. David Alan Gilbert
2021-08-23 16:28                         ` Tobin Feldman-Fitzthum
2021-08-19 14:07                 ` Tobin Feldman-Fitzthum
2021-08-17 23:20       ` Paolo Bonzini
2021-08-17 21:54     ` Steve Rutherford
2021-08-17 22:37       ` Paolo Bonzini
2021-08-17 22:57         ` James Bottomley
2021-08-17 23:10           ` Steve Rutherford
2021-08-18  2:49             ` James Bottomley
2021-08-18 14:06         ` Ashish Kalra
2021-08-18 17:07           ` Ashish Kalra
2021-08-16 15:07 Tobin Feldman-Fitzthum
