linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2 V4] Add AMD SEV and SEV-ES intra host migration support
@ 2021-08-19 15:49 Peter Gonda
  2021-08-19 15:49 ` [PATCH 1/2 V4] KVM, SEV: Add support for SEV intra host migration Peter Gonda
  2021-08-19 15:49 ` [PATCH 2/2 V4] KVM, SEV: Add support for SEV-ES " Peter Gonda
  0 siblings, 2 replies; 12+ messages in thread
From: Peter Gonda @ 2021-08-19 15:49 UTC (permalink / raw)
  To: kvm
  Cc: Peter Gonda, Paolo Bonzini, Sean Christopherson, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, linux-kernel

Intra host migration provides a low-cost mechanism for userspace VMM
upgrades.  It is an alternative to traditional (i.e., remote) live
migration. Whereas remote migration handles moving a guest to a new host,
intra host migration only handles moving a guest to a new userspace VMM
within a host.  This can be used to update, rollback, change flags of the
VMM, etc. The lower cost compared to live migration comes from the fact
that the guest's memory does not need to be copied between processes. A
handle to the guest memory simply gets passed to the new VMM, this could
be done via /dev/shm with share=on or similar feature.

The guest state can be transferred from an old VMM to a new VMM as follows:
1. Export guest state from KVM to the old user-space VMM via a getter
user-space/kernel API 2. Transfer guest state from old VMM to new VMM via
IPC communication 3. Import guest state into KVM from the new user-space
VMM via a setter user-space/kernel API VMMs by exporting from KVM using
getters, sending that data to the new VMM, then setting it again in KVM.

In the common case for intra host migration, we can rely on the normal
ioctls for passing data from one VMM to the next. SEV, SEV-ES, and other
confidential compute environments make most of this information opaque, and
render KVM ioctls such as "KVM_GET_REGS" irrelevant.  As a result, we need
the ability to pass this opaque metadata from one VMM to the next. The
easiest way to do this is to leave this data in the kernel, and transfer
ownership of the metadata from one KVM VM (or vCPU) to the next. For
example, we need to move the SEV enabled ASID, VMSAs, and GHCB metadata
from one VMM to the next.  In general, we need to be able to hand off any
data that would be unsafe/impossible for the kernel to hand directly to
userspace (and cannot be reproduced using data that can be handed safely to
userspace).

For the intra host operation the SEV required metadata, the source VM FD is
sent to the target VMM. The target VMM calls the new cap ioctl with the
source VM FD, KVM then moves all the SEV state to the target VM from the
source VM.

V4:
 * Move to seanjc@'s suggestion of source VM FD based single ioctl design.

v3:
 * Fix memory leak found by dan.carpenter@

v2:
 * Added marcorr@ reviewed by tag
 * Renamed function introduced in 1/3
 * Edited with seanjc@'s review comments
 ** Cleaned up WARN usage
 ** Userspace makes random token now
 * Edited with brijesh.singh@'s review comments
 ** Checks for different LAUNCH_* states in send function

v1: https://lore.kernel.org/kvm/20210621163118.1040170-1-pgonda@google.com/

Peter Gonda (2):
  KVM, SEV: Add support for SEV intra host migration
  KVM, SEV: Add support for SEV-ES intra host migration

 Documentation/virt/kvm/api.rst  |  15 +++
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kvm/svm/sev.c          | 160 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |   1 +
 arch/x86/kvm/svm/svm.h          |   3 +
 arch/x86/kvm/x86.c              |   5 +
 include/uapi/linux/kvm.h        |   1 +
 7 files changed, 186 insertions(+)

base-commit: a3e0b8bd99ab

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2 V4] KVM, SEV: Add support for SEV intra host migration
  2021-08-19 15:49 [PATCH 0/2 V4] Add AMD SEV and SEV-ES intra host migration support Peter Gonda
@ 2021-08-19 15:49 ` Peter Gonda
  2021-08-19 16:23   ` Marc Orr
  2021-08-19 15:49 ` [PATCH 2/2 V4] KVM, SEV: Add support for SEV-ES " Peter Gonda
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Gonda @ 2021-08-19 15:49 UTC (permalink / raw)
  To: kvm
  Cc: Peter Gonda, Sean Christopherson, Marc Orr, Paolo Bonzini,
	David Rientjes, Dr . David Alan Gilbert, Brijesh Singh,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	linux-kernel

For SEV to work with intra host migration, contents of the SEV info struct
such as the ASID (used to index the encryption key in the AMD SP) and
the list of memory regions need to be transferred to the target VM.
This change adds a commands for a target VMM to get a source SEV VM's sev
info.

The target is expected to be initialized (sev_guest_init), but not
launched state (sev_launch_start) when performing receive. Once the
target has received, it will be in a launched state and will not
need to perform the typical SEV launch commands.

Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Documentation/virt/kvm/api.rst  |  15 +++++
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kvm/svm/sev.c          | 108 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |   1 +
 arch/x86/kvm/svm/svm.h          |   3 +
 arch/x86/kvm/x86.c              |   5 ++
 include/uapi/linux/kvm.h        |   1 +
 7 files changed, 134 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 86d7ad3a126c..9dc56778b421 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6701,6 +6701,21 @@ MAP_SHARED mmap will result in an -EINVAL return.
 When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
 perform a bulk copy of tags to/from the guest.
 
+7.29 KVM_CAP_VM_MIGRATE_ENC_CONTEXT_FROM
+-------------------------------------
+
+Architectures: x86 SEV enabled
+Type: vm
+Parameters: args[0] is the fd of the source vm
+Returns: 0 on success
+
+This capability enables userspace to migrate the encryption context from the vm
+indicated by the fd to the vm this is called on.
+
+This is intended to support intra-host migration of VMs between userspace VMMs.
+in-guest workloads scheduled by the host. This allows for upgrading the VMM
+process without interrupting the guest.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 20daaf67a5bf..fd3a118c9e40 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1448,6 +1448,7 @@ struct kvm_x86_ops {
 	int (*mem_enc_reg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
 	int (*mem_enc_unreg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
 	int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
+	int (*vm_migrate_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
 
 	int (*get_msr_feature)(struct kvm_msr_entry *entry);
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 75e0b21ad07c..2d98b56b6f8c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1501,6 +1501,114 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	return sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, &data, &argp->error);
 }
 
+static int svm_sev_lock_for_migration(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+	int ret;
+
+	/*
+	 * Bail if this VM is already involved in a migration to avoid deadlock
+	 * between two VMs trying to migrate to/from each other.
+	 */
+	spin_lock(&sev->migration_lock);
+	if (sev->migration_in_progress)
+		ret = -EBUSY;
+	else {
+		/*
+		 * Otherwise indicate VM is migrating and take the KVM lock.
+		 */
+		sev->migration_in_progress = true;
+		mutex_lock(&kvm->lock);
+		ret = 0;
+	}
+	spin_unlock(&sev->migration_lock);
+
+	return ret;
+}
+
+static void svm_unlock_after_migration(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	mutex_unlock(&kvm->lock);
+	WRITE_ONCE(sev->migration_in_progress, false);
+}
+
+static void migrate_info_from(struct kvm_sev_info *dst,
+			      struct kvm_sev_info *src)
+{
+	sev_asid_free(dst);
+
+	dst->asid = src->asid;
+	dst->misc_cg = src->misc_cg;
+	dst->handle = src->handle;
+	dst->pages_locked = src->pages_locked;
+
+	src->asid = 0;
+	src->active = false;
+	src->handle = 0;
+	src->pages_locked = 0;
+	src->misc_cg = NULL;
+
+	INIT_LIST_HEAD(&dst->regions_list);
+	list_replace_init(&src->regions_list, &dst->regions_list);
+}
+
+int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
+{
+	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
+	struct file *source_kvm_file;
+	struct kvm *source_kvm;
+	int ret;
+
+	ret = svm_sev_lock_for_migration(kvm);
+	if (ret)
+		return ret;
+
+	if (!sev_guest(kvm) || sev_es_guest(kvm)) {
+		ret = -EINVAL;
+		pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");
+		goto out_unlock;
+	}
+
+	if (!list_empty(&dst_sev->regions_list)) {
+		ret = -EINVAL;
+		pr_warn_ratelimited(
+			"VM must not have encrypted regions to migrate to.\n");
+		goto out_unlock;
+	}
+
+	source_kvm_file = fget(source_fd);
+	if (!file_is_kvm(source_kvm_file)) {
+		ret = -EBADF;
+		goto out_fput;
+	}
+
+	source_kvm = source_kvm_file->private_data;
+	ret = svm_sev_lock_for_migration(source_kvm);
+	if (ret)
+		goto out_fput;
+
+	if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
+		ret = -EINVAL;
+		pr_warn_ratelimited(
+			"Source VM must be SEV enabled to migrate from.\n");
+		goto out_source;
+	}
+
+	migrate_info_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
+	ret = 0;
+
+out_source:
+	svm_unlock_after_migration(source_kvm);
+out_fput:
+	if (source_kvm_file)
+		fput(source_kvm_file);
+out_unlock:
+	svm_unlock_after_migration(kvm);
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7b58e445a967..8b5bcab48937 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4627,6 +4627,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.mem_enc_unreg_region = svm_unregister_enc_region,
 
 	.vm_copy_enc_context_from = svm_vm_copy_asid_from,
+	.vm_migrate_enc_context_from = svm_vm_migrate_from,
 
 	.can_emulate_instruction = svm_can_emulate_instruction,
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 524d943f3efc..3576a12700d7 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -80,6 +80,8 @@ struct kvm_sev_info {
 	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
 	struct kvm *enc_context_owner; /* Owner of copied encryption context */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
+	spinlock_t migration_lock;
+	bool migration_in_progress;
 };
 
 struct kvm_svm {
@@ -552,6 +554,7 @@ int svm_register_enc_region(struct kvm *kvm,
 int svm_unregister_enc_region(struct kvm *kvm,
 			      struct kvm_enc_region *range);
 int svm_vm_copy_asid_from(struct kvm *kvm, unsigned int source_fd);
+int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd);
 void pre_sev_run(struct vcpu_svm *svm, int cpu);
 void __init sev_set_cpu_caps(void);
 void __init sev_hardware_setup(void);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fdc0c18339fb..ea3100134e35 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5655,6 +5655,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		if (kvm_x86_ops.vm_copy_enc_context_from)
 			r = kvm_x86_ops.vm_copy_enc_context_from(kvm, cap->args[0]);
 		return r;
+	case KVM_CAP_VM_MIGRATE_ENC_CONTEXT_FROM:
+		r = -EINVAL;
+		if (kvm_x86_ops.vm_migrate_enc_context_from)
+			r = kvm_x86_ops.vm_migrate_enc_context_from(kvm, cap->args[0]);
+		return r;
 	case KVM_CAP_EXIT_HYPERCALL:
 		if (cap->args[0] & ~KVM_EXIT_HYPERCALL_VALID_MASK) {
 			r = -EINVAL;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a067410ebea5..49660204cdb9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_VM_MIGRATE_ENC_CONTEXT_FROM 206
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/2 V4] KVM, SEV: Add support for SEV-ES intra host migration
  2021-08-19 15:49 [PATCH 0/2 V4] Add AMD SEV and SEV-ES intra host migration support Peter Gonda
  2021-08-19 15:49 ` [PATCH 1/2 V4] KVM, SEV: Add support for SEV intra host migration Peter Gonda
@ 2021-08-19 15:49 ` Peter Gonda
  2021-08-20 21:00   ` Marc Orr
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Gonda @ 2021-08-19 15:49 UTC (permalink / raw)
  To: kvm
  Cc: Peter Gonda, Marc Orr, Paolo Bonzini, Sean Christopherson,
	David Rientjes, Dr . David Alan Gilbert, Brijesh Singh,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	linux-kernel

For SEV-ES to work with intra host migration the VMSAs, GHCB metadata,
and other SEV-ES info needs to be preserved along with the guest's
memory.

Signed-off-by: Peter Gonda <pgonda@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kvm/svm/sev.c | 58 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2d98b56b6f8c..970d75c34e9a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1554,6 +1554,53 @@ static void migrate_info_from(struct kvm_sev_info *dst,
 	list_replace_init(&src->regions_list, &dst->regions_list);
 }
 
+static int migrate_vmsa_from(struct kvm *dst, struct kvm *src)
+{
+	int i, num_vcpus;
+	struct kvm_vcpu *dst_vcpu, *src_vcpu;
+	struct vcpu_svm *dst_svm, *src_svm;
+
+	num_vcpus = atomic_read(&dst->online_vcpus);
+	if (num_vcpus != atomic_read(&src->online_vcpus)) {
+		pr_warn_ratelimited(
+			"Source and target VMs must have same number of vCPUs.\n");
+		return -EINVAL;
+	}
+
+	for (i = 0; i < num_vcpus; ++i) {
+		src_vcpu = src->vcpus[i];
+		if (!src_vcpu->arch.guest_state_protected) {
+			pr_warn_ratelimited(
+				"Source ES VM vCPUs must have protected state.\n");
+			return -EINVAL;
+		}
+	}
+
+	for (i = 0; i < num_vcpus; ++i) {
+		src_vcpu = src->vcpus[i];
+		src_svm = to_svm(src_vcpu);
+		dst_vcpu = dst->vcpus[i];
+		dst_svm = to_svm(dst_vcpu);
+
+		dst_vcpu->vcpu_id = src_vcpu->vcpu_id;
+		dst_svm->vmsa = src_svm->vmsa;
+		src_svm->vmsa = NULL;
+		dst_svm->ghcb = src_svm->ghcb;
+		src_svm->ghcb = NULL;
+		dst_svm->vmcb->control.ghcb_gpa =
+				src_svm->vmcb->control.ghcb_gpa;
+		dst_svm->ghcb_sa = src_svm->ghcb_sa;
+		src_svm->ghcb_sa = NULL;
+		dst_svm->ghcb_sa_len = src_svm->ghcb_sa_len;
+		src_svm->ghcb_sa_len = 0;
+		dst_svm->ghcb_sa_sync = src_svm->ghcb_sa_sync;
+		src_svm->ghcb_sa_sync = false;
+		dst_svm->ghcb_sa_free = src_svm->ghcb_sa_free;
+		src_svm->ghcb_sa_free = false;
+	}
+	return 0;
+}
+
 int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
 {
 	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
@@ -1565,7 +1612,7 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
 	if (ret)
 		return ret;
 
-	if (!sev_guest(kvm) || sev_es_guest(kvm)) {
+	if (!sev_guest(kvm)) {
 		ret = -EINVAL;
 		pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");
 		goto out_unlock;
@@ -1589,15 +1636,20 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
 	if (ret)
 		goto out_fput;
 
-	if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
+	if (!sev_guest(source_kvm)) {
 		ret = -EINVAL;
 		pr_warn_ratelimited(
 			"Source VM must be SEV enabled to migrate from.\n");
 		goto out_source;
 	}
 
+	if (sev_es_guest(kvm)) {
+		ret = migrate_vmsa_from(kvm, source_kvm);
+		if (ret)
+			goto out_source;
+	}
 	migrate_info_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
-        ret = 0;
+	ret = 0;
 
 out_source:
 	svm_unlock_after_migration(source_kvm);
-- 
2.33.0.rc1.237.g0d66db33f3-goog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2 V4] KVM, SEV: Add support for SEV intra host migration
  2021-08-19 15:49 ` [PATCH 1/2 V4] KVM, SEV: Add support for SEV intra host migration Peter Gonda
@ 2021-08-19 16:23   ` Marc Orr
  2021-08-19 21:00     ` Peter Gonda
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Orr @ 2021-08-19 16:23 UTC (permalink / raw)
  To: Peter Gonda
  Cc: kvm list, Sean Christopherson, Paolo Bonzini, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, linux-kernel

On Thu, Aug 19, 2021 at 8:49 AM Peter Gonda <pgonda@google.com> wrote:
>
> For SEV to work with intra host migration, contents of the SEV info struct
> such as the ASID (used to index the encryption key in the AMD SP) and
> the list of memory regions need to be transferred to the target VM.
> This change adds a commands for a target VMM to get a source SEV VM's sev
> info.
>
> The target is expected to be initialized (sev_guest_init), but not
> launched state (sev_launch_start) when performing receive. Once the
> target has received, it will be in a launched state and will not
> need to perform the typical SEV launch commands.
>
> Signed-off-by: Peter Gonda <pgonda@google.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Cc: Marc Orr <marcorr@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Brijesh Singh <brijesh.singh@amd.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  Documentation/virt/kvm/api.rst  |  15 +++++
>  arch/x86/include/asm/kvm_host.h |   1 +
>  arch/x86/kvm/svm/sev.c          | 108 ++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c          |   1 +
>  arch/x86/kvm/svm/svm.h          |   3 +
>  arch/x86/kvm/x86.c              |   5 ++
>  include/uapi/linux/kvm.h        |   1 +
>  7 files changed, 134 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 86d7ad3a126c..9dc56778b421 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6701,6 +6701,21 @@ MAP_SHARED mmap will result in an -EINVAL return.
>  When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
>  perform a bulk copy of tags to/from the guest.
>
> +7.29 KVM_CAP_VM_MIGRATE_ENC_CONTEXT_FROM
> +-------------------------------------
> +
> +Architectures: x86 SEV enabled
> +Type: vm
> +Parameters: args[0] is the fd of the source vm
> +Returns: 0 on success
> +
> +This capability enables userspace to migrate the encryption context from the vm
> +indicated by the fd to the vm this is called on.
> +
> +This is intended to support intra-host migration of VMs between userspace VMMs.
> +in-guest workloads scheduled by the host. This allows for upgrading the VMM
> +process without interrupting the guest.
> +
>  8. Other capabilities.
>  ======================
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 20daaf67a5bf..fd3a118c9e40 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1448,6 +1448,7 @@ struct kvm_x86_ops {
>         int (*mem_enc_reg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
>         int (*mem_enc_unreg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
>         int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
> +       int (*vm_migrate_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
>
>         int (*get_msr_feature)(struct kvm_msr_entry *entry);
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 75e0b21ad07c..2d98b56b6f8c 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1501,6 +1501,114 @@ static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>         return sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, &data, &argp->error);
>  }
>
> +static int svm_sev_lock_for_migration(struct kvm *kvm)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +       int ret;
> +
> +       /*
> +        * Bail if this VM is already involved in a migration to avoid deadlock
> +        * between two VMs trying to migrate to/from each other.
> +        */
> +       spin_lock(&sev->migration_lock);
> +       if (sev->migration_in_progress)
> +               ret = -EBUSY;
> +       else {
> +               /*
> +                * Otherwise indicate VM is migrating and take the KVM lock.
> +                */
> +               sev->migration_in_progress = true;
> +               mutex_lock(&kvm->lock);
> +               ret = 0;
> +       }
> +       spin_unlock(&sev->migration_lock);
> +
> +       return ret;
> +}
> +
> +static void svm_unlock_after_migration(struct kvm *kvm)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +
> +       mutex_unlock(&kvm->lock);
> +       WRITE_ONCE(sev->migration_in_progress, false);
> +}
> +

This entire locking scheme seems over-complicated to me. Can we simply
rely on `migration_lock` and get rid of `migration_in_progress`? I was
chatting about these patches with Peter, while he worked on this new
version. But he mentioned that this locking scheme had been suggested
by Sean in a previous review. Sean: what do you think? My rationale
was that this is called via a VM-level ioctl. So serializing the
entire code path on `migration_lock` seems fine. But maybe I'm missing
something?

> +static void migrate_info_from(struct kvm_sev_info *dst,
> +                             struct kvm_sev_info *src)
> +{
> +       sev_asid_free(dst);
> +
> +       dst->asid = src->asid;
> +       dst->misc_cg = src->misc_cg;
> +       dst->handle = src->handle;
> +       dst->pages_locked = src->pages_locked;
> +
> +       src->asid = 0;
> +       src->active = false;
> +       src->handle = 0;
> +       src->pages_locked = 0;
> +       src->misc_cg = NULL;
> +
> +       INIT_LIST_HEAD(&dst->regions_list);
> +       list_replace_init(&src->regions_list, &dst->regions_list);
> +}
> +
> +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
> +{
> +       struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
> +       struct file *source_kvm_file;
> +       struct kvm *source_kvm;
> +       int ret;
> +
> +       ret = svm_sev_lock_for_migration(kvm);
> +       if (ret)
> +               return ret;
> +
> +       if (!sev_guest(kvm) || sev_es_guest(kvm)) {
> +               ret = -EINVAL;
> +               pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");
> +               goto out_unlock;
> +       }
> +
> +       if (!list_empty(&dst_sev->regions_list)) {
> +               ret = -EINVAL;
> +               pr_warn_ratelimited(
> +                       "VM must not have encrypted regions to migrate to.\n");
> +               goto out_unlock;
> +       }
> +
> +       source_kvm_file = fget(source_fd);
> +       if (!file_is_kvm(source_kvm_file)) {
> +               ret = -EBADF;
> +               goto out_fput;
> +       }
> +
> +       source_kvm = source_kvm_file->private_data;
> +       ret = svm_sev_lock_for_migration(source_kvm);
> +       if (ret)
> +               goto out_fput;
> +
> +       if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
> +               ret = -EINVAL;
> +               pr_warn_ratelimited(
> +                       "Source VM must be SEV enabled to migrate from.\n");
> +               goto out_source;
> +       }
> +
> +       migrate_info_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
> +       ret = 0;
> +
> +out_source:
> +       svm_unlock_after_migration(source_kvm);
> +out_fput:
> +       if (source_kvm_file)
> +               fput(source_kvm_file);
> +out_unlock:
> +       svm_unlock_after_migration(kvm);
> +       return ret;
> +}
> +
>  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>         struct kvm_sev_cmd sev_cmd;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7b58e445a967..8b5bcab48937 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4627,6 +4627,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>         .mem_enc_unreg_region = svm_unregister_enc_region,
>
>         .vm_copy_enc_context_from = svm_vm_copy_asid_from,
> +       .vm_migrate_enc_context_from = svm_vm_migrate_from,
>
>         .can_emulate_instruction = svm_can_emulate_instruction,
>
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 524d943f3efc..3576a12700d7 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -80,6 +80,8 @@ struct kvm_sev_info {
>         u64 ap_jump_table;      /* SEV-ES AP Jump Table address */
>         struct kvm *enc_context_owner; /* Owner of copied encryption context */
>         struct misc_cg *misc_cg; /* For misc cgroup accounting */
> +       spinlock_t migration_lock;
> +       bool migration_in_progress;
>  };
>
>  struct kvm_svm {
> @@ -552,6 +554,7 @@ int svm_register_enc_region(struct kvm *kvm,
>  int svm_unregister_enc_region(struct kvm *kvm,
>                               struct kvm_enc_region *range);
>  int svm_vm_copy_asid_from(struct kvm *kvm, unsigned int source_fd);
> +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd);
>  void pre_sev_run(struct vcpu_svm *svm, int cpu);
>  void __init sev_set_cpu_caps(void);
>  void __init sev_hardware_setup(void);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fdc0c18339fb..ea3100134e35 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5655,6 +5655,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>                 if (kvm_x86_ops.vm_copy_enc_context_from)
>                         r = kvm_x86_ops.vm_copy_enc_context_from(kvm, cap->args[0]);
>                 return r;
> +       case KVM_CAP_VM_MIGRATE_ENC_CONTEXT_FROM:
> +               r = -EINVAL;
> +               if (kvm_x86_ops.vm_migrate_enc_context_from)
> +                       r = kvm_x86_ops.vm_migrate_enc_context_from(kvm, cap->args[0]);
> +               return r;
>         case KVM_CAP_EXIT_HYPERCALL:
>                 if (cap->args[0] & ~KVM_EXIT_HYPERCALL_VALID_MASK) {
>                         r = -EINVAL;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index a067410ebea5..49660204cdb9 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_BINARY_STATS_FD 203
>  #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
>  #define KVM_CAP_ARM_MTE 205
> +#define KVM_CAP_VM_MIGRATE_ENC_CONTEXT_FROM 206
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> --
> 2.33.0.rc1.237.g0d66db33f3-goog
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2 V4] KVM, SEV: Add support for SEV intra host migration
  2021-08-19 16:23   ` Marc Orr
@ 2021-08-19 21:00     ` Peter Gonda
  2021-08-19 22:58       ` Sean Christopherson
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Gonda @ 2021-08-19 21:00 UTC (permalink / raw)
  To: Marc Orr
  Cc: kvm list, Sean Christopherson, Paolo Bonzini, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, linux-kernel

> >
> > +static int svm_sev_lock_for_migration(struct kvm *kvm)
> > +{
> > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +       int ret;
> > +
> > +       /*
> > +        * Bail if this VM is already involved in a migration to avoid deadlock
> > +        * between two VMs trying to migrate to/from each other.
> > +        */
> > +       spin_lock(&sev->migration_lock);
> > +       if (sev->migration_in_progress)
> > +               ret = -EBUSY;
> > +       else {
> > +               /*
> > +                * Otherwise indicate VM is migrating and take the KVM lock.
> > +                */
> > +               sev->migration_in_progress = true;
> > +               mutex_lock(&kvm->lock);
> > +               ret = 0;
> > +       }
> > +       spin_unlock(&sev->migration_lock);
> > +
> > +       return ret;
> > +}
> > +
> > +static void svm_unlock_after_migration(struct kvm *kvm)
> > +{
> > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > +
> > +       mutex_unlock(&kvm->lock);
> > +       WRITE_ONCE(sev->migration_in_progress, false);
> > +}
> > +
>
> This entire locking scheme seems over-complicated to me. Can we simply
> rely on `migration_lock` and get rid of `migration_in_progress`? I was
> chatting about these patches with Peter, while he worked on this new
> version. But he mentioned that this locking scheme had been suggested
> by Sean in a previous review. Sean: what do you think? My rationale
> was that this is called via a VM-level ioctl. So serializing the
> entire code path on `migration_lock` seems fine. But maybe I'm missing
> something?
>

Marc I think that only having the spin lock could result in
deadlocking. If userspace double migrated 2 VMs, A and B for
discussion, A could grab VM_A.spin_lock then VM_A.kvm_mutex. Meanwhile
B could grab VM_B.spin_lock and VM_B.kvm_mutex. Then A attempts to
grab VM_B.spin_lock and we have a deadlock. If the same happens with
the proposed scheme when A attempts to lock B, VM_B.spin_lock will be
open but the bool will mark the VM under migration so A will unlock
and bail. Sean originally proposed a global spin lock but I thought a
per kvm_sev_info struct would also be safe.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2 V4] KVM, SEV: Add support for SEV intra host migration
  2021-08-19 21:00     ` Peter Gonda
@ 2021-08-19 22:58       ` Sean Christopherson
  2021-08-20  6:35         ` Marc Orr
  2021-08-20 20:53         ` Marc Orr
  0 siblings, 2 replies; 12+ messages in thread
From: Sean Christopherson @ 2021-08-19 22:58 UTC (permalink / raw)
  To: Peter Gonda
  Cc: Marc Orr, kvm list, Paolo Bonzini, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, linux-kernel

On Thu, Aug 19, 2021, Peter Gonda wrote:
> > >
> > > +static int svm_sev_lock_for_migration(struct kvm *kvm)
> > > +{
> > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > +       int ret;
> > > +
> > > +       /*
> > > +        * Bail if this VM is already involved in a migration to avoid deadlock
> > > +        * between two VMs trying to migrate to/from each other.
> > > +        */
> > > +       spin_lock(&sev->migration_lock);
> > > +       if (sev->migration_in_progress)
> > > +               ret = -EBUSY;
> > > +       else {
> > > +               /*
> > > +                * Otherwise indicate VM is migrating and take the KVM lock.
> > > +                */
> > > +               sev->migration_in_progress = true;
> > > +               mutex_lock(&kvm->lock);

Deadlock aside, mutex_lock() can sleep, which is not allowed while holding a
spinlock, i.e. this patch does not work.  That's my suggestion did the crazy
dance of "acquiring" a flag.

What I don't know is why on earth I suggested a global spinlock, a simple atomic
should work, e.g.

		if (atomic_cmpxchg_acquire(&sev->migration_in_progress, 0, 1))
			return -EBUSY;

		mutex_lock(&kvm->lock);

and on the backend...

		mutex_unlock(&kvm->lock); 

		atomic_set_release(&sev->migration_in_progress, 0);

> > > +               ret = 0;
> > > +       }
> > > +       spin_unlock(&sev->migration_lock);
> > > +
> > > +       return ret;
> > > +}
> > > +
> > > +static void svm_unlock_after_migration(struct kvm *kvm)
> > > +{
> > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > +
> > > +       mutex_unlock(&kvm->lock);
> > > +       WRITE_ONCE(sev->migration_in_progress, false);
> > > +}
> > > +
> >
> > This entire locking scheme seems over-complicated to me. Can we simply
> > rely on `migration_lock` and get rid of `migration_in_progress`? I was
> > chatting about these patches with Peter, while he worked on this new
> > version. But he mentioned that this locking scheme had been suggested
> > by Sean in a previous review. Sean: what do you think? My rationale
> > was that this is called via a VM-level ioctl. So serializing the
> > entire code path on `migration_lock` seems fine. But maybe I'm missing
> > something?
>
> 
> Marc I think that only having the spin lock could result in
> deadlocking. If userspace double migrated 2 VMs, A and B for
> discussion, A could grab VM_A.spin_lock then VM_A.kvm_mutex. Meanwhile
> B could grab VM_B.spin_lock and VM_B.kvm_mutex. Then A attempts to
> grab VM_B.spin_lock and we have a deadlock. If the same happens with
> the proposed scheme when A attempts to lock B, VM_B.spin_lock will be
> open but the bool will mark the VM under migration so A will unlock
> and bail. Sean originally proposed a global spin lock but I thought a
> per kvm_sev_info struct would also be safe.
 
Close.  The issue is taking kvm->lock from both VM_A and VM_B.  If userspace
double migrates we'll end up with lock ordering A->B and B-A, so we need a way
to guarantee one of those wins.  My proposed solution is to use a flag as a sort
of one-off "try lock" to detect a mean userspace.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2 V4] KVM, SEV: Add support for SEV intra host migration
  2021-08-19 22:58       ` Sean Christopherson
@ 2021-08-20  6:35         ` Marc Orr
  2021-08-20 14:50           ` Sean Christopherson
  2021-08-20 20:53         ` Marc Orr
  1 sibling, 1 reply; 12+ messages in thread
From: Marc Orr @ 2021-08-20  6:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Gonda, kvm list, Paolo Bonzini, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, linux-kernel

On Thu, Aug 19, 2021 at 3:58 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Aug 19, 2021, Peter Gonda wrote:
> > > >
> > > > +static int svm_sev_lock_for_migration(struct kvm *kvm)
> > > > +{
> > > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > +       int ret;
> > > > +
> > > > +       /*
> > > > +        * Bail if this VM is already involved in a migration to avoid deadlock
> > > > +        * between two VMs trying to migrate to/from each other.
> > > > +        */
> > > > +       spin_lock(&sev->migration_lock);
> > > > +       if (sev->migration_in_progress)
> > > > +               ret = -EBUSY;
> > > > +       else {
> > > > +               /*
> > > > +                * Otherwise indicate VM is migrating and take the KVM lock.
> > > > +                */
> > > > +               sev->migration_in_progress = true;
> > > > +               mutex_lock(&kvm->lock);
>
> Deadlock aside, mutex_lock() can sleep, which is not allowed while holding a
> spinlock, i.e. this patch does not work.  That's my suggestion did the crazy
> dance of "acquiring" a flag.
>
> What I don't know is why on earth I suggested a global spinlock, a simple atomic
> should work, e.g.
>
>                 if (atomic_cmpxchg_acquire(&sev->migration_in_progress, 0, 1))
>                         return -EBUSY;
>
>                 mutex_lock(&kvm->lock);
>
> and on the backend...
>
>                 mutex_unlock(&kvm->lock);
>
>                 atomic_set_release(&sev->migration_in_progress, 0);
>
> > > > +               ret = 0;
> > > > +       }
> > > > +       spin_unlock(&sev->migration_lock);
> > > > +
> > > > +       return ret;
> > > > +}
> > > > +
> > > > +static void svm_unlock_after_migration(struct kvm *kvm)
> > > > +{
> > > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > +
> > > > +       mutex_unlock(&kvm->lock);
> > > > +       WRITE_ONCE(sev->migration_in_progress, false);
> > > > +}
> > > > +
> > >
> > > This entire locking scheme seems over-complicated to me. Can we simply
> > > rely on `migration_lock` and get rid of `migration_in_progress`? I was
> > > chatting about these patches with Peter, while he worked on this new
> > > version. But he mentioned that this locking scheme had been suggested
> > > by Sean in a previous review. Sean: what do you think? My rationale
> > > was that this is called via a VM-level ioctl. So serializing the
> > > entire code path on `migration_lock` seems fine. But maybe I'm missing
> > > something?
> >
> >
> > Marc I think that only having the spin lock could result in
> > deadlocking. If userspace double migrated 2 VMs, A and B for
> > discussion, A could grab VM_A.spin_lock then VM_A.kvm_mutex. Meanwhile
> > B could grab VM_B.spin_lock and VM_B.kvm_mutex. Then A attempts to
> > grab VM_B.spin_lock and we have a deadlock. If the same happens with
> > the proposed scheme when A attempts to lock B, VM_B.spin_lock will be
> > open but the bool will mark the VM under migration so A will unlock
> > and bail. Sean originally proposed a global spin lock but I thought a
> > per kvm_sev_info struct would also be safe.
>
> Close.  The issue is taking kvm->lock from both VM_A and VM_B.  If userspace
> double migrates we'll end up with lock ordering A->B and B-A, so we need a way
> to guarantee one of those wins.  My proposed solution is to use a flag as a sort
> of one-off "try lock" to detect a mean userspace.

Got it now. Thanks to you both, for the explanation. By the way, just
to make sure I completely follow, I assume that if a "double
migration" occurs, then user space is mis-behaving -- correct? But
presumably, we need to reason about how to respond to such
mis-behavior so that buggy or malicious user-space code cannot stumble
over/exploit this scenario?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2 V4] KVM, SEV: Add support for SEV intra host migration
  2021-08-20  6:35         ` Marc Orr
@ 2021-08-20 14:50           ` Sean Christopherson
  0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2021-08-20 14:50 UTC (permalink / raw)
  To: Marc Orr
  Cc: Peter Gonda, kvm list, Paolo Bonzini, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, linux-kernel

On Thu, Aug 19, 2021, Marc Orr wrote:
> On Thu, Aug 19, 2021 at 3:58 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Aug 19, 2021, Peter Gonda wrote:
> > > Marc I think that only having the spin lock could result in
> > > deadlocking. If userspace double migrated 2 VMs, A and B for
> > > discussion, A could grab VM_A.spin_lock then VM_A.kvm_mutex. Meanwhile
> > > B could grab VM_B.spin_lock and VM_B.kvm_mutex. Then A attempts to
> > > grab VM_B.spin_lock and we have a deadlock. If the same happens with
> > > the proposed scheme when A attempts to lock B, VM_B.spin_lock will be
> > > open but the bool will mark the VM under migration so A will unlock
> > > and bail. Sean originally proposed a global spin lock but I thought a
> > > per kvm_sev_info struct would also be safe.
> >
> > Close.  The issue is taking kvm->lock from both VM_A and VM_B.  If userspace
> > double migrates we'll end up with lock ordering A->B and B-A, so we need a way
> > to guarantee one of those wins.  My proposed solution is to use a flag as a sort
> > of one-off "try lock" to detect a mean userspace.
> 
> Got it now. Thanks to you both, for the explanation. By the way, just
> to make sure I completely follow, I assume that if a "double
> migration" occurs, then user space is mis-behaving -- correct?

Yep.

> But presumably, we need to reason about how to respond to such mis-behavior
> so that buggy or malicious user-space code cannot stumble over/exploit this
> scenario?

That's what the anti-deadlock flag is for. :-)  With that in place, there's no
meaningful difference between say a bad userspace doing double migrate and a bad
userspace migrating from garbage, e.g. passing in a bogus fd.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2 V4] KVM, SEV: Add support for SEV intra host migration
  2021-08-19 22:58       ` Sean Christopherson
  2021-08-20  6:35         ` Marc Orr
@ 2021-08-20 20:53         ` Marc Orr
  2021-08-23 16:39           ` Peter Gonda
  1 sibling, 1 reply; 12+ messages in thread
From: Marc Orr @ 2021-08-20 20:53 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Gonda, kvm list, Paolo Bonzini, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, linux-kernel

On Thu, Aug 19, 2021 at 3:58 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Aug 19, 2021, Peter Gonda wrote:
> > > >
> > > > +static int svm_sev_lock_for_migration(struct kvm *kvm)
> > > > +{
> > > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > +       int ret;
> > > > +
> > > > +       /*
> > > > +        * Bail if this VM is already involved in a migration to avoid deadlock
> > > > +        * between two VMs trying to migrate to/from each other.
> > > > +        */
> > > > +       spin_lock(&sev->migration_lock);
> > > > +       if (sev->migration_in_progress)
> > > > +               ret = -EBUSY;
> > > > +       else {
> > > > +               /*
> > > > +                * Otherwise indicate VM is migrating and take the KVM lock.
> > > > +                */
> > > > +               sev->migration_in_progress = true;
> > > > +               mutex_lock(&kvm->lock);
>
> Deadlock aside, mutex_lock() can sleep, which is not allowed while holding a
> spinlock, i.e. this patch does not work.  That's my suggestion did the crazy
> dance of "acquiring" a flag.
>
> What I don't know is why on earth I suggested a global spinlock, a simple atomic
> should work, e.g.
>
>                 if (atomic_cmpxchg_acquire(&sev->migration_in_progress, 0, 1))
>                         return -EBUSY;
>
>                 mutex_lock(&kvm->lock);
>
> and on the backend...
>
>                 mutex_unlock(&kvm->lock);
>
>                 atomic_set_release(&sev->migration_in_progress, 0);

+1 to replacing the spin lock with an atomic flag. Correctness issues
aside, I think it's also cleaner. Also, I'd suggest adding a comment
to source code to explain that the `migration_in_progress` flag is to
prevent deadlock due to the "double migration" discussed previously.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2 V4] KVM, SEV: Add support for SEV-ES intra host migration
  2021-08-19 15:49 ` [PATCH 2/2 V4] KVM, SEV: Add support for SEV-ES " Peter Gonda
@ 2021-08-20 21:00   ` Marc Orr
  2021-08-23 16:38     ` Peter Gonda
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Orr @ 2021-08-20 21:00 UTC (permalink / raw)
  To: Peter Gonda
  Cc: kvm list, Paolo Bonzini, Sean Christopherson, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, linux-kernel

On Thu, Aug 19, 2021 at 8:49 AM Peter Gonda <pgonda@google.com> wrote:
>
> For SEV-ES to work with intra host migration the VMSAs, GHCB metadata,
> and other SEV-ES info needs to be preserved along with the guest's
> memory.
>
> Signed-off-by: Peter Gonda <pgonda@google.com>
> Cc: Marc Orr <marcorr@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Brijesh Singh <brijesh.singh@amd.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/x86/kvm/svm/sev.c | 58 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 55 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 2d98b56b6f8c..970d75c34e9a 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1554,6 +1554,53 @@ static void migrate_info_from(struct kvm_sev_info *dst,
>         list_replace_init(&src->regions_list, &dst->regions_list);
>  }
>
> +static int migrate_vmsa_from(struct kvm *dst, struct kvm *src)
> +{
> +       int i, num_vcpus;
> +       struct kvm_vcpu *dst_vcpu, *src_vcpu;
> +       struct vcpu_svm *dst_svm, *src_svm;
> +
> +       num_vcpus = atomic_read(&dst->online_vcpus);
> +       if (num_vcpus != atomic_read(&src->online_vcpus)) {
> +               pr_warn_ratelimited(
> +                       "Source and target VMs must have same number of vCPUs.\n");
> +               return -EINVAL;
> +       }
> +
> +       for (i = 0; i < num_vcpus; ++i) {
> +               src_vcpu = src->vcpus[i];
> +               if (!src_vcpu->arch.guest_state_protected) {
> +                       pr_warn_ratelimited(
> +                               "Source ES VM vCPUs must have protected state.\n");
> +                       return -EINVAL;
> +               }
> +       }
> +
> +       for (i = 0; i < num_vcpus; ++i) {
> +               src_vcpu = src->vcpus[i];
> +               src_svm = to_svm(src_vcpu);
> +               dst_vcpu = dst->vcpus[i];
> +               dst_svm = to_svm(dst_vcpu);
> +
> +               dst_vcpu->vcpu_id = src_vcpu->vcpu_id;
> +               dst_svm->vmsa = src_svm->vmsa;
> +               src_svm->vmsa = NULL;
> +               dst_svm->ghcb = src_svm->ghcb;
> +               src_svm->ghcb = NULL;
> +               dst_svm->vmcb->control.ghcb_gpa =
> +                               src_svm->vmcb->control.ghcb_gpa;

Should we clear `src_svm->vmcb->control.ghcb_gpa`? (All the other
fields in `srv_svm` that are copied are then cleared, except for this
one.) Aside: Do we really need to clear all of the fields in `src_svm`
after they're copied over? It might be worth adding a comment to
explain why we're clearing them. It's not immediately obvious to me
why we're doing that.

> +               dst_svm->ghcb_sa = src_svm->ghcb_sa;
> +               src_svm->ghcb_sa = NULL;
> +               dst_svm->ghcb_sa_len = src_svm->ghcb_sa_len;
> +               src_svm->ghcb_sa_len = 0;
> +               dst_svm->ghcb_sa_sync = src_svm->ghcb_sa_sync;
> +               src_svm->ghcb_sa_sync = false;
> +               dst_svm->ghcb_sa_free = src_svm->ghcb_sa_free;
> +               src_svm->ghcb_sa_free = false;
> +       }
> +       return 0;
> +}
> +
>  int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
>  {
>         struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
> @@ -1565,7 +1612,7 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
>         if (ret)
>                 return ret;
>
> -       if (!sev_guest(kvm) || sev_es_guest(kvm)) {
> +       if (!sev_guest(kvm)) {
>                 ret = -EINVAL;
>                 pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");
>                 goto out_unlock;
> @@ -1589,15 +1636,20 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
>         if (ret)
>                 goto out_fput;
>
> -       if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
> +       if (!sev_guest(source_kvm)) {
>                 ret = -EINVAL;
>                 pr_warn_ratelimited(
>                         "Source VM must be SEV enabled to migrate from.\n");
>                 goto out_source;
>         }
>
> +       if (sev_es_guest(kvm)) {
> +               ret = migrate_vmsa_from(kvm, source_kvm);
> +               if (ret)
> +                       goto out_source;
> +       }
>         migrate_info_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
> -        ret = 0;
> +       ret = 0;

nit: looks like there's some space issue on this line. Since I believe
this line was introduced in the previous patch, I wouldn't expect it
to show up with a diff in patch #2.

>  out_source:
>         svm_unlock_after_migration(source_kvm);
> --
> 2.33.0.rc1.237.g0d66db33f3-goog

This patch looks good overall to me.

Reviewed-by: Marc Orr <marcorr@google.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2 V4] KVM, SEV: Add support for SEV-ES intra host migration
  2021-08-20 21:00   ` Marc Orr
@ 2021-08-23 16:38     ` Peter Gonda
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Gonda @ 2021-08-23 16:38 UTC (permalink / raw)
  To: Marc Orr
  Cc: kvm list, Paolo Bonzini, Sean Christopherson, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, linux-kernel

On Fri, Aug 20, 2021 at 3:01 PM Marc Orr <marcorr@google.com> wrote:
>
> On Thu, Aug 19, 2021 at 8:49 AM Peter Gonda <pgonda@google.com> wrote:
> >
> > For SEV-ES to work with intra host migration the VMSAs, GHCB metadata,
> > and other SEV-ES info needs to be preserved along with the guest's
> > memory.
> >
> > Signed-off-by: Peter Gonda <pgonda@google.com>
> > Cc: Marc Orr <marcorr@google.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Sean Christopherson <seanjc@google.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Cc: Brijesh Singh <brijesh.singh@amd.com>
> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> > Cc: Wanpeng Li <wanpengli@tencent.com>
> > Cc: Jim Mattson <jmattson@google.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: kvm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  arch/x86/kvm/svm/sev.c | 58 +++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 55 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 2d98b56b6f8c..970d75c34e9a 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -1554,6 +1554,53 @@ static void migrate_info_from(struct kvm_sev_info *dst,
> >         list_replace_init(&src->regions_list, &dst->regions_list);
> >  }
> >
> > +static int migrate_vmsa_from(struct kvm *dst, struct kvm *src)
> > +{
> > +       int i, num_vcpus;
> > +       struct kvm_vcpu *dst_vcpu, *src_vcpu;
> > +       struct vcpu_svm *dst_svm, *src_svm;
> > +
> > +       num_vcpus = atomic_read(&dst->online_vcpus);
> > +       if (num_vcpus != atomic_read(&src->online_vcpus)) {
> > +               pr_warn_ratelimited(
> > +                       "Source and target VMs must have same number of vCPUs.\n");
> > +               return -EINVAL;
> > +       }
> > +
> > +       for (i = 0; i < num_vcpus; ++i) {
> > +               src_vcpu = src->vcpus[i];
> > +               if (!src_vcpu->arch.guest_state_protected) {
> > +                       pr_warn_ratelimited(
> > +                               "Source ES VM vCPUs must have protected state.\n");
> > +                       return -EINVAL;
> > +               }
> > +       }
> > +
> > +       for (i = 0; i < num_vcpus; ++i) {
> > +               src_vcpu = src->vcpus[i];
> > +               src_svm = to_svm(src_vcpu);
> > +               dst_vcpu = dst->vcpus[i];
> > +               dst_svm = to_svm(dst_vcpu);
> > +
> > +               dst_vcpu->vcpu_id = src_vcpu->vcpu_id;
> > +               dst_svm->vmsa = src_svm->vmsa;
> > +               src_svm->vmsa = NULL;
> > +               dst_svm->ghcb = src_svm->ghcb;
> > +               src_svm->ghcb = NULL;
> > +               dst_svm->vmcb->control.ghcb_gpa =
> > +                               src_svm->vmcb->control.ghcb_gpa;
>
> Should we clear `src_svm->vmcb->control.ghcb_gpa`? (All the other
> fields in `srv_svm` that are copied are then cleared, except for this
> one.) Aside: Do we really need to clear all of the fields in `src_svm`
> after they're copied over? It might be worth adding a comment to
> explain why we're clearing them. It's not immediately obvious to me
> why we're doing that.

Cleared in the next version and added a comment. We don't want to
leave references for the source VM to edit the VMSA or GHCB after the
migration.

>
> > +               dst_svm->ghcb_sa = src_svm->ghcb_sa;
> > +               src_svm->ghcb_sa = NULL;
> > +               dst_svm->ghcb_sa_len = src_svm->ghcb_sa_len;
> > +               src_svm->ghcb_sa_len = 0;
> > +               dst_svm->ghcb_sa_sync = src_svm->ghcb_sa_sync;
> > +               src_svm->ghcb_sa_sync = false;
> > +               dst_svm->ghcb_sa_free = src_svm->ghcb_sa_free;
> > +               src_svm->ghcb_sa_free = false;
> > +       }
> > +       return 0;
> > +}
> > +
> >  int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
> >  {
> >         struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
> > @@ -1565,7 +1612,7 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
> >         if (ret)
> >                 return ret;
> >
> > -       if (!sev_guest(kvm) || sev_es_guest(kvm)) {
> > +       if (!sev_guest(kvm)) {
> >                 ret = -EINVAL;
> >                 pr_warn_ratelimited("VM must be SEV enabled to migrate to.\n");
> >                 goto out_unlock;
> > @@ -1589,15 +1636,20 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
> >         if (ret)
> >                 goto out_fput;
> >
> > -       if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
> > +       if (!sev_guest(source_kvm)) {
> >                 ret = -EINVAL;
> >                 pr_warn_ratelimited(
> >                         "Source VM must be SEV enabled to migrate from.\n");
> >                 goto out_source;
> >         }
> >
> > +       if (sev_es_guest(kvm)) {
> > +               ret = migrate_vmsa_from(kvm, source_kvm);
> > +               if (ret)
> > +                       goto out_source;
> > +       }
> >         migrate_info_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
> > -        ret = 0;
> > +       ret = 0;
>
> nit: looks like there's some space issue on this line. Since I believe
> this line was introduced in the previous patch, I wouldn't expect it
> to show up with a diff in patch #2.

Fixed thanks.

>
> >  out_source:
> >         svm_unlock_after_migration(source_kvm);
> > --
> > 2.33.0.rc1.237.g0d66db33f3-goog
>
> This patch looks good overall to me.
>
> Reviewed-by: Marc Orr <marcorr@google.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2 V4] KVM, SEV: Add support for SEV intra host migration
  2021-08-20 20:53         ` Marc Orr
@ 2021-08-23 16:39           ` Peter Gonda
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Gonda @ 2021-08-23 16:39 UTC (permalink / raw)
  To: Marc Orr
  Cc: Sean Christopherson, kvm list, Paolo Bonzini, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, linux-kernel

On Fri, Aug 20, 2021 at 2:53 PM Marc Orr <marcorr@google.com> wrote:
>
> On Thu, Aug 19, 2021 at 3:58 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Aug 19, 2021, Peter Gonda wrote:
> > > > >
> > > > > +static int svm_sev_lock_for_migration(struct kvm *kvm)
> > > > > +{
> > > > > +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > > +       int ret;
> > > > > +
> > > > > +       /*
> > > > > +        * Bail if this VM is already involved in a migration to avoid deadlock
> > > > > +        * between two VMs trying to migrate to/from each other.
> > > > > +        */
> > > > > +       spin_lock(&sev->migration_lock);
> > > > > +       if (sev->migration_in_progress)
> > > > > +               ret = -EBUSY;
> > > > > +       else {
> > > > > +               /*
> > > > > +                * Otherwise indicate VM is migrating and take the KVM lock.
> > > > > +                */
> > > > > +               sev->migration_in_progress = true;
> > > > > +               mutex_lock(&kvm->lock);
> >
> > Deadlock aside, mutex_lock() can sleep, which is not allowed while holding a
> > spinlock, i.e. this patch does not work.  That's my suggestion did the crazy
> > dance of "acquiring" a flag.

Ah, makes sense.

> >
> > What I don't know is why on earth I suggested a global spinlock, a simple atomic
> > should work, e.g.
> >
> >                 if (atomic_cmpxchg_acquire(&sev->migration_in_progress, 0, 1))
> >                         return -EBUSY;
> >
> >                 mutex_lock(&kvm->lock);
> >
> > and on the backend...
> >
> >                 mutex_unlock(&kvm->lock);
> >
> >                 atomic_set_release(&sev->migration_in_progress, 0);
>
> +1 to replacing the spin lock with an atomic flag. Correctness issues
> aside, I think it's also cleaner. Also, I'd suggest adding a comment
> to source code to explain that the `migration_in_progress` flag is to
> prevent deadlock due to the "double migration" discussed previously.

Thanks! I've updated these locks to use the atomic. It looks much cleaner now.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-08-23 16:39 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-19 15:49 [PATCH 0/2 V4] Add AMD SEV and SEV-ES intra host migration support Peter Gonda
2021-08-19 15:49 ` [PATCH 1/2 V4] KVM, SEV: Add support for SEV intra host migration Peter Gonda
2021-08-19 16:23   ` Marc Orr
2021-08-19 21:00     ` Peter Gonda
2021-08-19 22:58       ` Sean Christopherson
2021-08-20  6:35         ` Marc Orr
2021-08-20 14:50           ` Sean Christopherson
2021-08-20 20:53         ` Marc Orr
2021-08-23 16:39           ` Peter Gonda
2021-08-19 15:49 ` [PATCH 2/2 V4] KVM, SEV: Add support for SEV-ES " Peter Gonda
2021-08-20 21:00   ` Marc Orr
2021-08-23 16:38     ` Peter Gonda

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).