kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5 V11] Add AMD SEV and SEV-ES intra host migration support
@ 2021-10-21 17:42 Peter Gonda
  2021-10-21 17:42 ` [PATCH 1/5 V11] KVM: SEV: Refactor out sev_es_state struct Peter Gonda
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Peter Gonda @ 2021-10-21 17:42 UTC (permalink / raw)
  To: kvm
  Cc: Peter Gonda, Marc Orr, Paolo Bonzini, Sean Christopherson,
	David Rientjes, Dr . David Alan Gilbert, Brijesh Singh,
	Tom Lendacky, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, linux-kernel

Intra host migration provides a low-cost mechanism for userspace VMM
upgrades.  It is an alternative to traditional (i.e., remote) live
migration. Whereas remote migration handles moving a guest to a new host,
intra host migration only handles moving a guest to a new userspace VMM
within a host.  This can be used to update, rollback, change flags of the
VMM, etc. The lower cost compared to live migration comes from the fact
that the guest's memory does not need to be copied between processes. A
handle to the guest memory simply gets passed to the new VMM, this could
be done via /dev/shm with share=on or similar feature.

The guest state can be transferred from an old VMM to a new VMM as follows:
1. Export guest state from KVM to the old user-space VMM via a getter
user-space/kernel API 2. Transfer guest state from old VMM to new VMM via
IPC communication 3. Import guest state into KVM from the new user-space
VMM via a setter user-space/kernel API VMMs by exporting from KVM using
getters, sending that data to the new VMM, then setting it again in KVM.

In the common case for intra host migration, we can rely on the normal
ioctls for passing data from one VMM to the next. SEV, SEV-ES, and other
confidential compute environments make most of this information opaque, and
render KVM ioctls such as "KVM_GET_REGS" irrelevant.  As a result, we need
the ability to pass this opaque metadata from one VMM to the next. The
easiest way to do this is to leave this data in the kernel, and transfer
ownership of the metadata from one KVM VM (or vCPU) to the next. For
example, we need to move the SEV enabled ASID, VMSAs, and GHCB metadata
from one VMM to the next.  In general, we need to be able to hand off any
data that would be unsafe/impossible for the kernel to hand directly to
userspace (and cannot be reproduced using data that can be handed safely to
userspace).

V11
 * Zero SEV-ES vCPU state on source.
 * Rebase onto SEV-ES fixes.

V10
 * Add new starting patch to refactor all SEV-ES related vCPU data into
   for easier copying.

V9
 * Fix sev_lock_vcpus_for_migration from unlocking the vCPU mutex it
   failed to unlock.

V8
 * Update to require that @dst is not SEV or SEV-ES enabled.
 * Address selftest feedback.

V7
 * Address selftest feedback.

V6
 * Add selftest.

V5:
 * Fix up locking scheme
 * Address marcorr@ comments.

V4:
 * Move to seanjc@'s suggestion of source VM FD based single ioctl design.

v3:
 * Fix memory leak found by dan.carpenter@

v2:
 * Added marcorr@ reviewed by tag
 * Renamed function introduced in 1/3
 * Edited with seanjc@'s review comments
 ** Cleaned up WARN usage
 ** Userspace makes random token now
 * Edited with brijesh.singh@'s review comments
 ** Checks for different LAUNCH_* states in send function

v1: https://lore.kernel.org/kvm/20210621163118.1040170-1-pgonda@google.com/

base-commit: 9f1ee7b169af

Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Peter Gonda (5):
  Refactor out sev_es_state struct
  KVM: SEV: Add support for SEV intra host migration
  KVM: SEV: Add support for SEV-ES intra host migration
  selftest: KVM: Add open sev dev helper
  selftest: KVM: Add intra host migration tests

 Documentation/virt/kvm/api.rst                |  15 +
 arch/x86/include/asm/kvm_host.h               |   1 +
 arch/x86/kvm/svm/sev.c                        | 268 +++++++++++++++---
 arch/x86/kvm/svm/svm.c                        |   9 +-
 arch/x86/kvm/svm/svm.h                        |  28 +-
 arch/x86/kvm/x86.c                            |   6 +
 include/uapi/linux/kvm.h                      |   1 +
 tools/testing/selftests/kvm/Makefile          |   3 +-
 .../testing/selftests/kvm/include/kvm_util.h  |   1 +
 .../selftests/kvm/include/x86_64/svm_util.h   |   2 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  24 +-
 tools/testing/selftests/kvm/lib/x86_64/svm.c  |  13 +
 .../selftests/kvm/x86_64/sev_vm_tests.c       | 203 +++++++++++++
 13 files changed, 507 insertions(+), 67 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/sev_vm_tests.c

-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/5 V11] KVM: SEV: Refactor out sev_es_state struct
  2021-10-21 17:42 [PATCH 0/5 V11] Add AMD SEV and SEV-ES intra host migration support Peter Gonda
@ 2021-10-21 17:42 ` Peter Gonda
  2021-11-04 20:06   ` Sean Christopherson
  2021-10-21 17:43 ` [PATCH V11 2/5] KVM: SEV: Add support for SEV intra host migration Peter Gonda
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Peter Gonda @ 2021-10-21 17:42 UTC (permalink / raw)
  To: kvm
  Cc: Peter Gonda, Tom Lendacky, Marc Orr, Paolo Bonzini,
	Sean Christopherson, David Rientjes, Dr . David Alan Gilbert,
	Brijesh Singh, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, linux-kernel

Move SEV-ES vCPU metadata into new sev_es_state struct from vcpu_svm.

Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kvm/svm/sev.c | 83 +++++++++++++++++++++---------------------
 arch/x86/kvm/svm/svm.c |  8 ++--
 arch/x86/kvm/svm/svm.h | 26 +++++++------
 3 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0d21d59936e5..109b223c0b58 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -590,7 +590,7 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	 * traditional VMSA as it has been built so far (in prep
 	 * for LAUNCH_UPDATE_VMSA) to be the initial SEV-ES state.
 	 */
-	memcpy(svm->vmsa, save, sizeof(*save));
+	memcpy(svm->sev_es.vmsa, save, sizeof(*save));
 
 	return 0;
 }
@@ -612,11 +612,11 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
 	 * the VMSA memory content (i.e it will write the same memory region
 	 * with the guest's key), so invalidate it first.
 	 */
-	clflush_cache_range(svm->vmsa, PAGE_SIZE);
+	clflush_cache_range(svm->sev_es.vmsa, PAGE_SIZE);
 
 	vmsa.reserved = 0;
 	vmsa.handle = to_kvm_svm(kvm)->sev_info.handle;
-	vmsa.address = __sme_pa(svm->vmsa);
+	vmsa.address = __sme_pa(svm->sev_es.vmsa);
 	vmsa.len = PAGE_SIZE;
 	ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_VMSA, &vmsa, error);
 	if (ret)
@@ -2031,16 +2031,16 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
 	svm = to_svm(vcpu);
 
 	if (vcpu->arch.guest_state_protected)
-		sev_flush_guest_memory(svm, svm->vmsa, PAGE_SIZE);
-	__free_page(virt_to_page(svm->vmsa));
+		sev_flush_guest_memory(svm, svm->sev_es.vmsa, PAGE_SIZE);
+	__free_page(virt_to_page(svm->sev_es.vmsa));
 
-	if (svm->ghcb_sa_free)
-		kfree(svm->ghcb_sa);
+	if (svm->sev_es.ghcb_sa_free)
+		kfree(svm->sev_es.ghcb_sa);
 }
 
 static void dump_ghcb(struct vcpu_svm *svm)
 {
-	struct ghcb *ghcb = svm->ghcb;
+	struct ghcb *ghcb = svm->sev_es.ghcb;
 	unsigned int nbits;
 
 	/* Re-use the dump_invalid_vmcb module parameter */
@@ -2066,7 +2066,7 @@ static void dump_ghcb(struct vcpu_svm *svm)
 static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
 {
 	struct kvm_vcpu *vcpu = &svm->vcpu;
-	struct ghcb *ghcb = svm->ghcb;
+	struct ghcb *ghcb = svm->sev_es.ghcb;
 
 	/*
 	 * The GHCB protocol so far allows for the following data
@@ -2086,7 +2086,7 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
 	struct kvm_vcpu *vcpu = &svm->vcpu;
-	struct ghcb *ghcb = svm->ghcb;
+	struct ghcb *ghcb = svm->sev_es.ghcb;
 	u64 exit_code;
 
 	/*
@@ -2133,7 +2133,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	struct ghcb *ghcb;
 	u64 exit_code = 0;
 
-	ghcb = svm->ghcb;
+	ghcb = svm->sev_es.ghcb;
 
 	/* Only GHCB Usage code 0 is supported */
 	if (ghcb->ghcb_usage)
@@ -2251,33 +2251,34 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 
 void sev_es_unmap_ghcb(struct vcpu_svm *svm)
 {
-	if (!svm->ghcb)
+	if (!svm->sev_es.ghcb)
 		return;
 
-	if (svm->ghcb_sa_free) {
+	if (svm->sev_es.ghcb_sa_free) {
 		/*
 		 * The scratch area lives outside the GHCB, so there is a
 		 * buffer that, depending on the operation performed, may
 		 * need to be synced, then freed.
 		 */
-		if (svm->ghcb_sa_sync) {
+		if (svm->sev_es.ghcb_sa_sync) {
 			kvm_write_guest(svm->vcpu.kvm,
-					ghcb_get_sw_scratch(svm->ghcb),
-					svm->ghcb_sa, svm->ghcb_sa_len);
-			svm->ghcb_sa_sync = false;
+					ghcb_get_sw_scratch(svm->sev_es.ghcb),
+					svm->sev_es.ghcb_sa,
+					svm->sev_es.ghcb_sa_len);
+			svm->sev_es.ghcb_sa_sync = false;
 		}
 
-		kfree(svm->ghcb_sa);
-		svm->ghcb_sa = NULL;
-		svm->ghcb_sa_free = false;
+		kfree(svm->sev_es.ghcb_sa);
+		svm->sev_es.ghcb_sa = NULL;
+		svm->sev_es.ghcb_sa_free = false;
 	}
 
-	trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->ghcb);
+	trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->sev_es.ghcb);
 
 	sev_es_sync_to_ghcb(svm);
 
-	kvm_vcpu_unmap(&svm->vcpu, &svm->ghcb_map, true);
-	svm->ghcb = NULL;
+	kvm_vcpu_unmap(&svm->vcpu, &svm->sev_es.ghcb_map, true);
+	svm->sev_es.ghcb = NULL;
 }
 
 void pre_sev_run(struct vcpu_svm *svm, int cpu)
@@ -2307,7 +2308,7 @@ void pre_sev_run(struct vcpu_svm *svm, int cpu)
 static bool setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
-	struct ghcb *ghcb = svm->ghcb;
+	struct ghcb *ghcb = svm->sev_es.ghcb;
 	u64 ghcb_scratch_beg, ghcb_scratch_end;
 	u64 scratch_gpa_beg, scratch_gpa_end;
 	void *scratch_va;
@@ -2343,7 +2344,7 @@ static bool setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
 			return false;
 		}
 
-		scratch_va = (void *)svm->ghcb;
+		scratch_va = (void *)svm->sev_es.ghcb;
 		scratch_va += (scratch_gpa_beg - control->ghcb_gpa);
 	} else {
 		/*
@@ -2373,12 +2374,12 @@ static bool setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
 		 * the vCPU next time (i.e. a read was requested so the data
 		 * must be written back to the guest memory).
 		 */
-		svm->ghcb_sa_sync = sync;
-		svm->ghcb_sa_free = true;
+		svm->sev_es.ghcb_sa_sync = sync;
+		svm->sev_es.ghcb_sa_free = true;
 	}
 
-	svm->ghcb_sa = scratch_va;
-	svm->ghcb_sa_len = len;
+	svm->sev_es.ghcb_sa = scratch_va;
+	svm->sev_es.ghcb_sa_len = len;
 
 	return true;
 }
@@ -2497,15 +2498,15 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		return -EINVAL;
 	}
 
-	if (kvm_vcpu_map(vcpu, ghcb_gpa >> PAGE_SHIFT, &svm->ghcb_map)) {
+	if (kvm_vcpu_map(vcpu, ghcb_gpa >> PAGE_SHIFT, &svm->sev_es.ghcb_map)) {
 		/* Unable to map GHCB from guest */
 		vcpu_unimpl(vcpu, "vmgexit: error mapping GHCB [%#llx] from guest\n",
 			    ghcb_gpa);
 		return -EINVAL;
 	}
 
-	svm->ghcb = svm->ghcb_map.hva;
-	ghcb = svm->ghcb_map.hva;
+	svm->sev_es.ghcb = svm->sev_es.ghcb_map.hva;
+	ghcb = svm->sev_es.ghcb_map.hva;
 
 	trace_kvm_vmgexit_enter(vcpu->vcpu_id, ghcb);
 
@@ -2528,7 +2529,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = kvm_sev_es_mmio_read(vcpu,
 					   control->exit_info_1,
 					   control->exit_info_2,
-					   svm->ghcb_sa);
+					   svm->sev_es.ghcb_sa);
 		break;
 	case SVM_VMGEXIT_MMIO_WRITE:
 		if (!setup_vmgexit_scratch(svm, false, control->exit_info_2))
@@ -2537,7 +2538,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 		ret = kvm_sev_es_mmio_write(vcpu,
 					    control->exit_info_1,
 					    control->exit_info_2,
-					    svm->ghcb_sa);
+					    svm->sev_es.ghcb_sa);
 		break;
 	case SVM_VMGEXIT_NMI_COMPLETE:
 		ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
@@ -2587,8 +2588,8 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
 	if (!setup_vmgexit_scratch(svm, in, svm->vmcb->control.exit_info_2))
 		return -EINVAL;
 
-	return kvm_sev_es_string_io(&svm->vcpu, size, port,
-				    svm->ghcb_sa, svm->ghcb_sa_len / size, in);
+	return kvm_sev_es_string_io(&svm->vcpu, size, port, svm->sev_es.ghcb_sa,
+				    svm->sev_es.ghcb_sa_len / size, in);
 }
 
 void sev_es_init_vmcb(struct vcpu_svm *svm)
@@ -2603,7 +2604,7 @@ void sev_es_init_vmcb(struct vcpu_svm *svm)
 	 * VMCB page. Do not include the encryption mask on the VMSA physical
 	 * address since hardware will access it using the guest key.
 	 */
-	svm->vmcb->control.vmsa_pa = __pa(svm->vmsa);
+	svm->vmcb->control.vmsa_pa = __pa(svm->sev_es.vmsa);
 
 	/* Can't intercept CR register access, HV can't modify CR registers */
 	svm_clr_intercept(svm, INTERCEPT_CR0_READ);
@@ -2675,8 +2676,8 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	/* First SIPI: Use the values as initially set by the VMM */
-	if (!svm->received_first_sipi) {
-		svm->received_first_sipi = true;
+	if (!svm->sev_es.received_first_sipi) {
+		svm->sev_es.received_first_sipi = true;
 		return;
 	}
 
@@ -2685,8 +2686,8 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 	 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
 	 * non-zero value.
 	 */
-	if (!svm->ghcb)
+	if (!svm->sev_es.ghcb)
 		return;
 
-	ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 1);
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 989685098b3e..68294491c23d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1372,7 +1372,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
 
 	if (vmsa_page)
-		svm->vmsa = page_address(vmsa_page);
+		svm->sev_es.vmsa = page_address(vmsa_page);
 
 	svm->guest_state_loaded = false;
 
@@ -2762,11 +2762,11 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
-	if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->ghcb))
+	if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->sev_es.ghcb))
 		return kvm_complete_insn_gp(vcpu, err);
 
-	ghcb_set_sw_exit_info_1(svm->ghcb, 1);
-	ghcb_set_sw_exit_info_2(svm->ghcb,
+	ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 1);
+	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb,
 				X86_TRAP_GP |
 				SVM_EVTINJ_TYPE_EXEPT |
 				SVM_EVTINJ_VALID);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5d30db599e10..6d8d762d208f 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -123,6 +123,20 @@ struct svm_nested_state {
 	bool initialized;
 };
 
+struct vcpu_sev_es_state {
+	/* SEV-ES support */
+	struct vmcb_save_area *vmsa;
+	struct ghcb *ghcb;
+	struct kvm_host_map ghcb_map;
+	bool received_first_sipi;
+
+	/* SEV-ES scratch area support */
+	void *ghcb_sa;
+	u32 ghcb_sa_len;
+	bool ghcb_sa_sync;
+	bool ghcb_sa_free;
+};
+
 struct vcpu_svm {
 	struct kvm_vcpu vcpu;
 	/* vmcb always points at current_vmcb->ptr, it's purely a shorthand. */
@@ -183,17 +197,7 @@ struct vcpu_svm {
 		DECLARE_BITMAP(write, MAX_DIRECT_ACCESS_MSRS);
 	} shadow_msr_intercept;
 
-	/* SEV-ES support */
-	struct vmcb_save_area *vmsa;
-	struct ghcb *ghcb;
-	struct kvm_host_map ghcb_map;
-	bool received_first_sipi;
-
-	/* SEV-ES scratch area support */
-	void *ghcb_sa;
-	u32 ghcb_sa_len;
-	bool ghcb_sa_sync;
-	bool ghcb_sa_free;
+	struct vcpu_sev_es_state sev_es;
 
 	bool guest_state_loaded;
 };
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V11 2/5] KVM: SEV: Add support for SEV intra host migration
  2021-10-21 17:42 [PATCH 0/5 V11] Add AMD SEV and SEV-ES intra host migration support Peter Gonda
  2021-10-21 17:42 ` [PATCH 1/5 V11] KVM: SEV: Refactor out sev_es_state struct Peter Gonda
@ 2021-10-21 17:43 ` Peter Gonda
  2021-11-04 22:07   ` Sean Christopherson
  2021-10-21 17:43 ` [PATCH V11 3/5] KVM: SEV: Add support for SEV-ES " Peter Gonda
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Peter Gonda @ 2021-10-21 17:43 UTC (permalink / raw)
  To: kvm
  Cc: Peter Gonda, Sean Christopherson, Marc Orr, Paolo Bonzini,
	David Rientjes, Dr . David Alan Gilbert, Brijesh Singh,
	Tom Lendacky, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, linux-kernel

For SEV to work with intra host migration, contents of the SEV info struct
such as the ASID (used to index the encryption key in the AMD SP) and
the list of memory regions need to be transferred to the target VM.
This change adds a commands for a target VMM to get a source SEV VM's sev
info.

Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Documentation/virt/kvm/api.rst  |  15 ++++
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kvm/svm/sev.c          | 137 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c          |   1 +
 arch/x86/kvm/svm/svm.h          |   2 +
 arch/x86/kvm/x86.c              |   6 ++
 include/uapi/linux/kvm.h        |   1 +
 7 files changed, 163 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a6729c8cf063..df6f56d689e2 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6706,6 +6706,21 @@ MAP_SHARED mmap will result in an -EINVAL return.
 When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
 perform a bulk copy of tags to/from the guest.
 
+7.29 KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM
+-------------------------------------
+
+Architectures: x86 SEV enabled
+Type: vm
+Parameters: args[0] is the fd of the source vm
+Returns: 0 on success
+
+This capability enables userspace to migrate the encryption context from the VM
+indicated by the fd to the VM this is called on.
+
+This is intended to support intra-host migration of VMs between userspace VMMs.
+in-guest workloads scheduled by the host. This allows for upgrading the VMM
+process without interrupting the guest.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f8f48a7ec577..422869707466 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1468,6 +1468,7 @@ struct kvm_x86_ops {
 	int (*mem_enc_reg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
 	int (*mem_enc_unreg_region)(struct kvm *kvm, struct kvm_enc_region *argp);
 	int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
+	int (*vm_migrate_protected_vm_from)(struct kvm *kvm, unsigned int source_fd);
 
 	int (*get_msr_feature)(struct kvm_msr_entry *entry);
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 109b223c0b58..2c2f724c9096 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1529,6 +1529,143 @@ static bool cmd_allowed_from_miror(u32 cmd_id)
 	return false;
 }
 
+static int sev_lock_for_migration(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	/*
+	 * Bail if this VM is already involved in a migration to avoid deadlock
+	 * between two VMs trying to migrate to/from each other.
+	 */
+	if (atomic_cmpxchg_acquire(&sev->migration_in_progress, 0, 1))
+		return -EBUSY;
+
+	mutex_lock(&kvm->lock);
+
+	return 0;
+}
+
+static void sev_unlock_after_migration(struct kvm *kvm)
+{
+	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+	mutex_unlock(&kvm->lock);
+	atomic_set_release(&sev->migration_in_progress, 0);
+}
+
+
+static int sev_lock_vcpus_for_migration(struct kvm *kvm)
+{
+	struct kvm_vcpu *vcpu;
+	int i, j;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (mutex_lock_killable(&vcpu->mutex))
+			goto out_unlock;
+	}
+
+	return 0;
+
+out_unlock:
+	kvm_for_each_vcpu(j, vcpu, kvm) {
+		if (i == j)
+			break;
+
+		mutex_unlock(&vcpu->mutex);
+	}
+	return -EINTR;
+}
+
+static void sev_unlock_vcpus_for_migration(struct kvm *kvm)
+{
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		mutex_unlock(&vcpu->mutex);
+	}
+}
+
+static void sev_migrate_from(struct kvm_sev_info *dst,
+			      struct kvm_sev_info *src)
+{
+	dst->active = true;
+	dst->asid = src->asid;
+	dst->misc_cg = src->misc_cg;
+	dst->handle = src->handle;
+	dst->pages_locked = src->pages_locked;
+
+	src->asid = 0;
+	src->active = false;
+	src->handle = 0;
+	src->pages_locked = 0;
+	src->misc_cg = NULL;
+
+	INIT_LIST_HEAD(&dst->regions_list);
+	list_replace_init(&src->regions_list, &dst->regions_list);
+}
+
+int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
+{
+	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
+	struct file *source_kvm_file;
+	struct kvm *source_kvm;
+	struct kvm_vcpu *vcpu;
+	int i, ret;
+
+	ret = sev_lock_for_migration(kvm);
+	if (ret)
+		return ret;
+
+	if (sev_guest(kvm)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	source_kvm_file = fget(source_fd);
+	if (!file_is_kvm(source_kvm_file)) {
+		ret = -EBADF;
+		goto out_fput;
+	}
+
+	source_kvm = source_kvm_file->private_data;
+	ret = sev_lock_for_migration(source_kvm);
+	if (ret)
+		goto out_fput;
+
+	if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
+		ret = -EINVAL;
+		goto out_source;
+	}
+	ret = sev_lock_vcpus_for_migration(kvm);
+	if (ret)
+		goto out_dst_vcpu;
+	ret = sev_lock_vcpus_for_migration(source_kvm);
+	if (ret)
+		goto out_source_vcpu;
+
+	sev_migrate_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
+	kvm_for_each_vcpu(i, vcpu, source_kvm) {
+		kvm_vcpu_reset(vcpu, /* init_event= */ false);
+	}
+	ret = 0;
+
+out_source_vcpu:
+	sev_unlock_vcpus_for_migration(source_kvm);
+
+out_dst_vcpu:
+	sev_unlock_vcpus_for_migration(kvm);
+
+out_source:
+	sev_unlock_after_migration(source_kvm);
+out_fput:
+	if (source_kvm_file)
+		fput(source_kvm_file);
+out_unlock:
+	sev_unlock_after_migration(kvm);
+	return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_sev_cmd sev_cmd;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 68294491c23d..c2e25ae4757f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4637,6 +4637,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.mem_enc_unreg_region = svm_unregister_enc_region,
 
 	.vm_copy_enc_context_from = svm_vm_copy_asid_from,
+	.vm_migrate_protected_vm_from = svm_vm_migrate_from,
 
 	.can_emulate_instruction = svm_can_emulate_instruction,
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6d8d762d208f..d7b44b37dfcf 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -80,6 +80,7 @@ struct kvm_sev_info {
 	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
 	struct kvm *enc_context_owner; /* Owner of copied encryption context */
 	struct misc_cg *misc_cg; /* For misc cgroup accounting */
+	atomic_t migration_in_progress;
 };
 
 struct kvm_svm {
@@ -557,6 +558,7 @@ int svm_register_enc_region(struct kvm *kvm,
 int svm_unregister_enc_region(struct kvm *kvm,
 			      struct kvm_enc_region *range);
 int svm_vm_copy_asid_from(struct kvm *kvm, unsigned int source_fd);
+int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd);
 void pre_sev_run(struct vcpu_svm *svm, int cpu);
 void __init sev_set_cpu_caps(void);
 void __init sev_hardware_setup(void);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c8b5129effd..c80fa1d378c9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5665,6 +5665,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		if (kvm_x86_ops.vm_copy_enc_context_from)
 			r = kvm_x86_ops.vm_copy_enc_context_from(kvm, cap->args[0]);
 		return r;
+	case KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM:
+		r = -EINVAL;
+		if (kvm_x86_ops.vm_migrate_protected_vm_from)
+			r = kvm_x86_ops.vm_migrate_protected_vm_from(
+				kvm, cap->args[0]);
+		return r;
 	case KVM_CAP_EXIT_HYPERCALL:
 		if (cap->args[0] & ~KVM_EXIT_HYPERCALL_VALID_MASK) {
 			r = -EINVAL;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a067410ebea5..77b292ed01c1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM 206
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V11 3/5] KVM: SEV: Add support for SEV-ES intra host migration
  2021-10-21 17:42 [PATCH 0/5 V11] Add AMD SEV and SEV-ES intra host migration support Peter Gonda
  2021-10-21 17:42 ` [PATCH 1/5 V11] KVM: SEV: Refactor out sev_es_state struct Peter Gonda
  2021-10-21 17:43 ` [PATCH V11 2/5] KVM: SEV: Add support for SEV intra host migration Peter Gonda
@ 2021-10-21 17:43 ` Peter Gonda
  2021-11-04 23:33   ` Sean Christopherson
  2021-10-21 17:43 ` [PATCH V11 4/5] selftest: KVM: Add open sev dev helper Peter Gonda
  2021-10-21 17:43 ` [PATCH V11 5/5] selftest: KVM: Add intra host migration tests Peter Gonda
  4 siblings, 1 reply; 14+ messages in thread
From: Peter Gonda @ 2021-10-21 17:43 UTC (permalink / raw)
  To: kvm
  Cc: Peter Gonda, Marc Orr, Paolo Bonzini, Sean Christopherson,
	David Rientjes, Dr . David Alan Gilbert, Brijesh Singh,
	Tom Lendacky, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, linux-kernel

For SEV-ES to work with intra host migration the VMSAs, GHCB metadata,
and other SEV-ES info needs to be preserved along with the guest's
memory.

Signed-off-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kvm/svm/sev.c | 50 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2c2f724c9096..d8ce93fd1129 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1605,6 +1605,48 @@ static void sev_migrate_from(struct kvm_sev_info *dst,
 	list_replace_init(&src->regions_list, &dst->regions_list);
 }
 
+static int sev_es_migrate_from(struct kvm *dst, struct kvm *src)
+{
+	int i;
+	struct kvm_vcpu *dst_vcpu, *src_vcpu;
+	struct vcpu_svm *dst_svm, *src_svm;
+
+	if (atomic_read(&src->online_vcpus) != atomic_read(&dst->online_vcpus))
+		return -EINVAL;
+
+	kvm_for_each_vcpu(i, src_vcpu, src) {
+		if (!src_vcpu->arch.guest_state_protected)
+			return -EINVAL;
+	}
+
+	kvm_for_each_vcpu(i, src_vcpu, src) {
+		src_svm = to_svm(src_vcpu);
+		dst_vcpu = kvm_get_vcpu(dst, i);
+		dst_svm = to_svm(dst_vcpu);
+
+		/*
+		 * Transfer VMSA and GHCB state to the destination.  Nullify and
+		 * clear source fields as appropriate, the state now belongs to
+		 * the destination.
+		 */
+		dst_vcpu->vcpu_id = src_vcpu->vcpu_id;
+		dst_svm->sev_es = src_svm->sev_es;
+		dst_svm->vmcb->control.ghcb_gpa =
+			src_svm->vmcb->control.ghcb_gpa;
+		dst_svm->vmcb->control.vmsa_pa = __pa(dst_svm->sev_es.vmsa);
+		dst_vcpu->arch.guest_state_protected = true;
+
+		memset(&src_svm->sev_es, 0, sizeof(src_svm->sev_es));
+		src_svm->vmcb->control.ghcb_gpa = 0;
+		src_svm->vmcb->control.vmsa_pa = 0;
+		src_vcpu->arch.guest_state_protected = false;
+	}
+	to_kvm_svm(src)->sev_info.es_active = false;
+	to_kvm_svm(dst)->sev_info.es_active = true;
+
+	return 0;
+}
+
 int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
 {
 	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
@@ -1633,7 +1675,7 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
 	if (ret)
 		goto out_fput;
 
-	if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
+	if (!sev_guest(source_kvm)) {
 		ret = -EINVAL;
 		goto out_source;
 	}
@@ -1644,6 +1686,12 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
 	if (ret)
 		goto out_source_vcpu;
 
+	if (sev_es_guest(source_kvm)) {
+		ret = sev_es_migrate_from(kvm, source_kvm);
+		if (ret)
+			goto out_source_vcpu;
+	}
+
 	sev_migrate_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
 	kvm_for_each_vcpu (i, vcpu, source_kvm) {
 		kvm_vcpu_reset(vcpu, /* init_event= */ false);
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V11 4/5] selftest: KVM: Add open sev dev helper
  2021-10-21 17:42 [PATCH 0/5 V11] Add AMD SEV and SEV-ES intra host migration support Peter Gonda
                   ` (2 preceding siblings ...)
  2021-10-21 17:43 ` [PATCH V11 3/5] KVM: SEV: Add support for SEV-ES " Peter Gonda
@ 2021-10-21 17:43 ` Peter Gonda
  2021-11-05 22:47   ` Sean Christopherson
  2021-10-21 17:43 ` [PATCH V11 5/5] selftest: KVM: Add intra host migration tests Peter Gonda
  4 siblings, 1 reply; 14+ messages in thread
From: Peter Gonda @ 2021-10-21 17:43 UTC (permalink / raw)
  To: kvm
  Cc: Peter Gonda, Sean Christopherson, Marc Orr, David Rientjes,
	Brijesh Singh, Tom Lendacky, Paolo Bonzini, linux-kernel

Refactors out open path support from open_kvm_dev_path_or_exit() and
adds new helper for SEV device path.

Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 .../testing/selftests/kvm/include/kvm_util.h  |  1 +
 .../selftests/kvm/include/x86_64/svm_util.h   |  2 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 24 +++++++++++--------
 tools/testing/selftests/kvm/lib/x86_64/svm.c  | 13 ++++++++++
 4 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 010b59b13917..368e88305046 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -80,6 +80,7 @@ struct vm_guest_mode_params {
 };
 extern const struct vm_guest_mode_params vm_guest_mode_params[];
 
+int open_path_or_exit(const char *path, int flags);
 int open_kvm_dev_path_or_exit(void);
 int kvm_check_cap(long cap);
 int vm_enable_cap(struct kvm_vm *vm, struct kvm_enable_cap *cap);
diff --git a/tools/testing/selftests/kvm/include/x86_64/svm_util.h b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
index b7531c83b8ae..587fbe408b99 100644
--- a/tools/testing/selftests/kvm/include/x86_64/svm_util.h
+++ b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
@@ -46,4 +46,6 @@ static inline bool cpu_has_svm(void)
 	return ecx & CPUID_SVM;
 }
 
+int open_sev_dev_path_or_exit(void);
+
 #endif /* SELFTEST_KVM_SVM_UTILS_H */
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 10a8ed691c66..06a6c04010fb 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -31,6 +31,19 @@ static void *align(void *x, size_t size)
 	return (void *) (((size_t) x + mask) & ~mask);
 }
 
+int open_path_or_exit(const char *path, int flags)
+{
+	int fd;
+
+	fd = open(path, flags);
+	if (fd < 0) {
+		print_skip("%s not available (errno: %d)", path, errno);
+		exit(KSFT_SKIP);
+	}
+
+	return fd;
+}
+
 /*
  * Open KVM_DEV_PATH if available, otherwise exit the entire program.
  *
@@ -42,16 +55,7 @@ static void *align(void *x, size_t size)
  */
 static int _open_kvm_dev_path_or_exit(int flags)
 {
-	int fd;
-
-	fd = open(KVM_DEV_PATH, flags);
-	if (fd < 0) {
-		print_skip("%s not available, is KVM loaded? (errno: %d)",
-			   KVM_DEV_PATH, errno);
-		exit(KSFT_SKIP);
-	}
-
-	return fd;
+	return open_path_or_exit(KVM_DEV_PATH, flags);
 }
 
 int open_kvm_dev_path_or_exit(void)
diff --git a/tools/testing/selftests/kvm/lib/x86_64/svm.c b/tools/testing/selftests/kvm/lib/x86_64/svm.c
index 2ac98d70d02b..14a8618efa9c 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/svm.c
@@ -13,6 +13,8 @@
 #include "processor.h"
 #include "svm_util.h"
 
+#define SEV_DEV_PATH "/dev/sev"
+
 struct gpr64_regs guest_regs;
 u64 rflags;
 
@@ -160,3 +162,14 @@ void nested_svm_check_supported(void)
 		exit(KSFT_SKIP);
 	}
 }
+
+/*
+ * Open SEV_DEV_PATH if available, otherwise exit the entire program.
+ *
+ * Return:
+ *   The opened file descriptor of /dev/sev.
+ */
+int open_sev_dev_path_or_exit(void)
+{
+	return open_path_or_exit(SEV_DEV_PATH, 0);
+}
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V11 5/5] selftest: KVM: Add intra host migration tests
  2021-10-21 17:42 [PATCH 0/5 V11] Add AMD SEV and SEV-ES intra host migration support Peter Gonda
                   ` (3 preceding siblings ...)
  2021-10-21 17:43 ` [PATCH V11 4/5] selftest: KVM: Add open sev dev helper Peter Gonda
@ 2021-10-21 17:43 ` Peter Gonda
  4 siblings, 0 replies; 14+ messages in thread
From: Peter Gonda @ 2021-10-21 17:43 UTC (permalink / raw)
  To: kvm
  Cc: Peter Gonda, Sean Christopherson, Marc Orr, David Rientjes,
	Brijesh Singh, Tom Lendacky, Paolo Bonzini, linux-kernel

Adds testcases for intra host migration for SEV and SEV-ES. Also adds
locking test to confirm no deadlock exists.

Signed-off-by: Peter Gonda <pgonda@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 tools/testing/selftests/kvm/Makefile          |   3 +-
 .../selftests/kvm/x86_64/sev_vm_tests.c       | 203 ++++++++++++++++++
 2 files changed, 205 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/sev_vm_tests.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index d1774f461393..c6b37d2045d9 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -72,7 +72,8 @@ TEST_GEN_PROGS_x86_64 += x86_64/tsc_msrs_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_pmu_msrs_test
 TEST_GEN_PROGS_x86_64 += x86_64/xen_shinfo_test
 TEST_GEN_PROGS_x86_64 += x86_64/xen_vmcall_test
-TEST_GEN_PROGS_x86_64 += access_tracking_perf_test
+TEST_GEN_PROGS_x86_64 += x86_64/vmx_pi_mmio_test
+TEST_GEN_PROGS_x86_64 += x86_64/sev_vm_tests
 TEST_GEN_PROGS_x86_64 += demand_paging_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
 TEST_GEN_PROGS_x86_64 += dirty_log_perf_test
diff --git a/tools/testing/selftests/kvm/x86_64/sev_vm_tests.c b/tools/testing/selftests/kvm/x86_64/sev_vm_tests.c
new file mode 100644
index 000000000000..ec3bbc96e73a
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/sev_vm_tests.c
@@ -0,0 +1,203 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/kvm.h>
+#include <linux/psp-sev.h>
+#include <stdio.h>
+#include <sys/ioctl.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <pthread.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+#include "kselftest.h"
+#include "../lib/kvm_util_internal.h"
+
+#define SEV_POLICY_ES 0b100
+
+#define NR_MIGRATE_TEST_VCPUS 4
+#define NR_MIGRATE_TEST_VMS 3
+#define NR_LOCK_TESTING_THREADS 3
+#define NR_LOCK_TESTING_ITERATIONS 10000
+
+static void sev_ioctl(int vm_fd, int cmd_id, void *data)
+{
+	struct kvm_sev_cmd cmd = {
+		.id = cmd_id,
+		.data = (uint64_t)data,
+		.sev_fd = open_sev_dev_path_or_exit(),
+	};
+	int ret;
+
+	ret = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
+	TEST_ASSERT((ret == 0 || cmd.error == SEV_RET_SUCCESS),
+		    "%d failed: return code: %d, errno: %d, fw error: %d",
+		    cmd_id, ret, errno, cmd.error);
+}
+
+static struct kvm_vm *sev_vm_create(bool es)
+{
+	struct kvm_vm *vm;
+	struct kvm_sev_launch_start start = { 0 };
+	int i;
+
+	vm = vm_create(VM_MODE_DEFAULT, 0, O_RDWR);
+	sev_ioctl(vm->fd, es ? KVM_SEV_ES_INIT : KVM_SEV_INIT, NULL);
+	for (i = 0; i < NR_MIGRATE_TEST_VCPUS; ++i)
+		vm_vcpu_add(vm, i);
+	if (es)
+		start.policy |= SEV_POLICY_ES;
+	sev_ioctl(vm->fd, KVM_SEV_LAUNCH_START, &start);
+	if (es)
+		sev_ioctl(vm->fd, KVM_SEV_LAUNCH_UPDATE_VMSA, NULL);
+	return vm;
+}
+
+static struct kvm_vm *__vm_create(void)
+{
+	struct kvm_vm *vm;
+	int i;
+
+	vm = vm_create(VM_MODE_DEFAULT, 0, O_RDWR);
+	for (i = 0; i < NR_MIGRATE_TEST_VCPUS; ++i)
+		vm_vcpu_add(vm, i);
+
+	return vm;
+}
+
+static int __sev_migrate_from(int dst_fd, int src_fd)
+{
+	struct kvm_enable_cap cap = {
+		.cap = KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM,
+		.args = { src_fd }
+	};
+
+	return ioctl(dst_fd, KVM_ENABLE_CAP, &cap);
+}
+
+
+static void sev_migrate_from(int dst_fd, int src_fd)
+{
+	int ret;
+
+	ret = __sev_migrate_from(dst_fd, src_fd);
+	TEST_ASSERT(!ret, "Migration failed, ret: %d, errno: %d\n", ret, errno);
+}
+
+static void test_sev_migrate_from(bool es)
+{
+	struct kvm_vm *src_vm;
+	struct kvm_vm *dst_vms[NR_MIGRATE_TEST_VMS];
+	int i;
+
+	src_vm = sev_vm_create(es);
+	for (i = 0; i < NR_MIGRATE_TEST_VMS; ++i)
+		dst_vms[i] = __vm_create();
+
+	/* Initial migration from the src to the first dst. */
+	sev_migrate_from(dst_vms[0]->fd, src_vm->fd);
+
+	for (i = 1; i < NR_MIGRATE_TEST_VMS; i++)
+		sev_migrate_from(dst_vms[i]->fd, dst_vms[i - 1]->fd);
+
+	/* Migrate the guest back to the original VM. */
+	sev_migrate_from(src_vm->fd, dst_vms[NR_MIGRATE_TEST_VMS - 1]->fd);
+
+	kvm_vm_free(src_vm);
+	for (i = 0; i < NR_MIGRATE_TEST_VMS; ++i)
+		kvm_vm_free(dst_vms[i]);
+}
+
+struct locking_thread_input {
+	struct kvm_vm *vm;
+	int source_fds[NR_LOCK_TESTING_THREADS];
+};
+
+static void *locking_test_thread(void *arg)
+{
+	int i, j;
+	struct locking_thread_input *input = (struct locking_test_thread *)arg;
+
+	for (i = 0; i < NR_LOCK_TESTING_ITERATIONS; ++i) {
+		j = i % NR_LOCK_TESTING_THREADS;
+		__sev_migrate_from(input->vm->fd, input->source_fds[j]);
+	}
+
+	return NULL;
+}
+
+static void test_sev_migrate_locking(void)
+{
+	struct locking_thread_input input[NR_LOCK_TESTING_THREADS];
+	pthread_t pt[NR_LOCK_TESTING_THREADS];
+	int i;
+
+	for (i = 0; i < NR_LOCK_TESTING_THREADS; ++i) {
+		input[i].vm = sev_vm_create(/* es= */ false);
+		input[0].source_fds[i] = input[i].vm->fd;
+	}
+	for (i = 1; i < NR_LOCK_TESTING_THREADS; ++i)
+		memcpy(input[i].source_fds, input[0].source_fds,
+		       sizeof(input[i].source_fds));
+
+	for (i = 0; i < NR_LOCK_TESTING_THREADS; ++i)
+		pthread_create(&pt[i], NULL, locking_test_thread, &input[i]);
+
+	for (i = 0; i < NR_LOCK_TESTING_THREADS; ++i)
+		pthread_join(pt[i], NULL);
+}
+
+static void test_sev_migrate_parameters(void)
+{
+	struct kvm_vm *sev_vm, *sev_es_vm, *vm_no_vcpu, *vm_no_sev,
+		*sev_es_vm_no_vmsa;
+	int ret;
+
+	sev_vm = sev_vm_create(/* es= */ false);
+	sev_es_vm = sev_vm_create(/* es= */ true);
+	vm_no_vcpu = vm_create(VM_MODE_DEFAULT, 0, O_RDWR);
+	vm_no_sev = __vm_create();
+	sev_es_vm_no_vmsa = vm_create(VM_MODE_DEFAULT, 0, O_RDWR);
+	sev_ioctl(sev_es_vm_no_vmsa->fd, KVM_SEV_ES_INIT, NULL);
+	vm_vcpu_add(sev_es_vm_no_vmsa, 1);
+
+
+	ret = __sev_migrate_from(sev_vm->fd, sev_es_vm->fd);
+	TEST_ASSERT(
+		ret == -1 && errno == EINVAL,
+		"Should not be able migrate to SEV enabled VM. ret: %d, errno: %d\n",
+		ret, errno);
+
+	ret = __sev_migrate_from(sev_es_vm->fd, sev_vm->fd);
+	TEST_ASSERT(
+		ret == -1 && errno == EINVAL,
+		"Should not be able migrate to SEV-ES enabled VM. ret: %d, errno: %d\n",
+		ret, errno);
+
+	ret = __sev_migrate_from(vm_no_vcpu->fd, sev_es_vm->fd);
+	TEST_ASSERT(
+		ret == -1 && errno == EINVAL,
+		"SEV-ES migrations require same number of vCPUS. ret: %d, errno: %d\n",
+		ret, errno);
+
+	ret = __sev_migrate_from(vm_no_vcpu->fd, sev_es_vm_no_vmsa->fd);
+	TEST_ASSERT(
+		ret == -1 && errno == EINVAL,
+		"SEV-ES migrations require UPDATE_VMSA. ret %d, errno: %d\n",
+		ret, errno);
+
+	ret = __sev_migrate_from(vm_no_vcpu->fd, vm_no_sev->fd);
+	TEST_ASSERT(ret == -1 && errno == EINVAL,
+		    "Migrations require SEV enabled. ret %d, errno: %d\n", ret,
+		    errno);
+}
+
+int main(int argc, char *argv[])
+{
+	test_sev_migrate_from(/* es= */ false);
+	test_sev_migrate_from(/* es= */ true);
+	test_sev_migrate_locking();
+	test_sev_migrate_parameters();
+	return 0;
+}
-- 
2.33.0.1079.g6e70778dc9-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/5 V11] KVM: SEV: Refactor out sev_es_state struct
  2021-10-21 17:42 ` [PATCH 1/5 V11] KVM: SEV: Refactor out sev_es_state struct Peter Gonda
@ 2021-11-04 20:06   ` Sean Christopherson
  0 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2021-11-04 20:06 UTC (permalink / raw)
  To: Peter Gonda
  Cc: kvm, Tom Lendacky, Marc Orr, Paolo Bonzini, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, linux-kernel

FWIW, "git format-patch -v $version" will generate the version info for you.
Based on the weird ordering of the shortlog and the cover letter subject

   [PATCH 1/5 V11] KVM: SEV: Refactor out sev_es_state struct

vs.

   [PATCH V11 2/5] KVM: SEV: Add support for SEV intra host migration

it looks like you might be hand-editing these?

Note, format-patch also takes --rfc, and you can override the entire thing for a
free form prefixes in --subject-prefix, e.g. I use it to call out "[kvm-unit-tests PATCH ...].

The PATCH part of the "prefix" needs to be provided manually, but then you do fun
things like replace PATCH with something else entirely, append an incremental
version, e.g. v2.1, which some people do if they need to make a small change to
one patch in a large series.

On Thu, Oct 21, 2021, Peter Gonda wrote:
> Move SEV-ES vCPU metadata into new sev_es_state struct from vcpu_svm.
> 
> Signed-off-by: Peter Gonda <pgonda@google.com>
> Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
> Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Marc Orr <marcorr@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: Brijesh Singh <brijesh.singh@amd.com>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---

At some point in the future we really need to do some cleanup to reduce the
"depth" of things like svm->sev_es... and svm->vcpu.arch..., but for now,

Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V11 2/5] KVM: SEV: Add support for SEV intra host migration
  2021-10-21 17:43 ` [PATCH V11 2/5] KVM: SEV: Add support for SEV intra host migration Peter Gonda
@ 2021-11-04 22:07   ` Sean Christopherson
  2021-11-04 23:04     ` Sean Christopherson
                       ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Sean Christopherson @ 2021-11-04 22:07 UTC (permalink / raw)
  To: Peter Gonda
  Cc: kvm, Marc Orr, Paolo Bonzini, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Tom Lendacky,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	linux-kernel

Paolo and anyone else, any thoughts before I lead Peter on an even longer wild
goose chase?

On Thu, Oct 21, 2021, Peter Gonda wrote:
> @@ -6706,6 +6706,21 @@ MAP_SHARED mmap will result in an -EINVAL return.
>  When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
>  perform a bulk copy of tags to/from the guest.
>  
> +7.29 KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM
> +-------------------------------------
> +
> +Architectures: x86 SEV enabled

I'd drop the "SEV enabled" part.  In a way, it's technically a lie for this one
patch since an SEV-ES VM is also an SEV VM, but doesn't support this capability.
And AFAICT no other ioctl()/capability provides this level of granularity.

> +Type: vm
> +Parameters: args[0] is the fd of the source vm
> +Returns: 0 on success
> +
> +This capability enables userspace to migrate the encryption context from the VM
> +indicated by the fd to the VM this is called on.
> +
> +This is intended to support intra-host migration of VMs between userspace VMMs.
> +in-guest workloads scheduled by the host. This allows for upgrading the VMM
> +process without interrupting the guest.
> +

...

> +static void sev_unlock_vcpus_for_migration(struct kvm *kvm)
> +{
> +	struct kvm_vcpu *vcpu;
> +	int i;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {

Braces not needed.

> +		mutex_unlock(&vcpu->mutex);
> +	}
> +}
> +
> +static void sev_migrate_from(struct kvm_sev_info *dst,
> +			      struct kvm_sev_info *src)
> +{
> +	dst->active = true;
> +	dst->asid = src->asid;
> +	dst->misc_cg = src->misc_cg;

Ah, this is not correct.  If @dst is in a different cgroup, then @dst needs to
be charged and @src needs to be uncharged.

That would also provide a good opportunity to more tightly couple ->asid and
->misc_cg in the form of a helper.  Looking at the code, there's an invariant
that misc_cg is NULL if an ASID is not assigned.  I.e. these three lines belong
in a helper, irrespective of this code.

	misc_cg_uncharge(type, sev->misc_cg, 1);
	put_misc_cg(sev->misc_cg);
	sev->misc_cg = NULL;

> +	dst->handle = src->handle;
> +	dst->pages_locked = src->pages_locked;
> +
> +	src->asid = 0;
> +	src->active = false;
> +	src->handle = 0;
> +	src->pages_locked = 0;
> +	src->misc_cg = NULL;
> +	INIT_LIST_HEAD(&dst->regions_list);
> +	list_replace_init(&src->regions_list, &dst->regions_list);
> +}
> +
> +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
> +{
> +	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
> +	struct file *source_kvm_file;
> +	struct kvm *source_kvm;
> +	struct kvm_vcpu *vcpu;
> +	int i, ret;
> +
> +	ret = sev_lock_for_migration(kvm);
> +	if (ret)
> +		return ret;
> +
> +	if (sev_guest(kvm)) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	source_kvm_file = fget(source_fd);
> +	if (!file_is_kvm(source_kvm_file)) {
> +		ret = -EBADF;
> +		goto out_fput;
> +	}
> +
> +	source_kvm = source_kvm_file->private_data;
> +	ret = sev_lock_for_migration(source_kvm);
> +	if (ret)
> +		goto out_fput;
> +
> +	if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
> +		ret = -EINVAL;
> +		goto out_source;
> +	}
> +	ret = sev_lock_vcpus_for_migration(kvm);
> +	if (ret)
> +		goto out_dst_vcpu;
> +	ret = sev_lock_vcpus_for_migration(source_kvm);
> +	if (ret)
> +		goto out_source_vcpu;
> +
> +	sev_migrate_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
> +	kvm_for_each_vcpu(i, vcpu, source_kvm) {

Braces not needed.

> +		kvm_vcpu_reset(vcpu, /* init_event= */ false);

Phooey.  I made this suggestion, but in hindsight, it's a bad suggestion as KVM
doesn't currently have a true RESET path; there are quite a few blobs of code
that assume the vCPU has never been run if init_event=false.

And to go through kvm_vcpu_reset(), the vcpu needs to be loaded, not just locked.
It won't fail as hard as VMX, where KVM would write the wrong VMCS, but odds are
good something will eventually go sideways.

Aha!  An idea.  Marking the VM bugged doesn't work because "we need to keep using
the  source VM even after the state is transfered"[*], but the core idea is sound,
it just needs to add a different flag to more precisely prevent kvm_vcpu_ioctl().

If we rename KVM_REQ_VM_BUGGED=>KVM_REQ_VM_DEAD in a prep patch (see below), then
this patch can add something here (can't think of a good name)

	source_kvm->??? = true;
	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);

and then check it in kvm_vcpu_ioctl()

	struct kvm *kvm = vcpu->kvm;

	if (kvm->mm != current->mm || kvm->vm_bugged || kvm->???)
		return -EIO;

That way the source vCPUs don't need to be locked and all vCPU ioctls() are
blocked, which I think is ideal since the vCPUs are in a frankenstate and really
should just die.

Maybe we can call the flag "zombie", or "mostly_dead" :-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c80fa1d378c9..e3f49ca01f95 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9423,7 +9423,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
        }

        if (kvm_request_pending(vcpu)) {
-               if (kvm_check_request(KVM_REQ_VM_BUGGED, vcpu)) {
+               if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
                        r = -EIO;
                        goto out;
                }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0f18df7fe874..de8d25cef183 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -150,7 +150,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_MMU_RELOAD        (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UNBLOCK           2
 #define KVM_REQ_UNHALT            3
-#define KVM_REQ_VM_BUGGED         (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_VM_DEAD                  (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQUEST_ARCH_BASE     8

 #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
@@ -654,7 +654,7 @@ struct kvm {
 static inline void kvm_vm_bugged(struct kvm *kvm)
 {
        kvm->vm_bugged = true;
-       kvm_make_all_cpus_request(kvm, KVM_REQ_VM_BUGGED);
+       kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);
 }

 #define KVM_BUG(cond, kvm, fmt...)


Back when I made this bad suggestion in v7, you said "we need to keep using the
source VM even after the state is transfered"[*].  What all do you need to do
after the migration?  I assume it's mostly memory related per-VM ioctls?


[*] https://lkml.kernel.org/r/CAMkAt6q3as414YMZco6UyCycY+jKbaYS5BUdC+U+8iWmBft3+A@mail.gmail.com

> +	}
> +	ret = 0;
> +
> +out_source_vcpu:
> +	sev_unlock_vcpus_for_migration(source_kvm);
> +
> +out_dst_vcpu:
> +	sev_unlock_vcpus_for_migration(kvm);
> +
> +out_source:
> +	sev_unlock_after_migration(source_kvm);
> +out_fput:
> +	if (source_kvm_file)
> +		fput(source_kvm_file);
> +out_unlock:
> +	sev_unlock_after_migration(kvm);
> +	return ret;
> +}
> +
>  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
>  	struct kvm_sev_cmd sev_cmd;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 68294491c23d..c2e25ae4757f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4637,6 +4637,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.mem_enc_unreg_region = svm_unregister_enc_region,
>  
>  	.vm_copy_enc_context_from = svm_vm_copy_asid_from,
> +	.vm_migrate_protected_vm_from = svm_vm_migrate_from,
>  
>  	.can_emulate_instruction = svm_can_emulate_instruction,
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 6d8d762d208f..d7b44b37dfcf 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -80,6 +80,7 @@ struct kvm_sev_info {
>  	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
>  	struct kvm *enc_context_owner; /* Owner of copied encryption context */
>  	struct misc_cg *misc_cg; /* For misc cgroup accounting */
> +	atomic_t migration_in_progress;
>  };
>  
>  struct kvm_svm {
> @@ -557,6 +558,7 @@ int svm_register_enc_region(struct kvm *kvm,
>  int svm_unregister_enc_region(struct kvm *kvm,
>  			      struct kvm_enc_region *range);
>  int svm_vm_copy_asid_from(struct kvm *kvm, unsigned int source_fd);
> +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd);
>  void pre_sev_run(struct vcpu_svm *svm, int cpu);
>  void __init sev_set_cpu_caps(void);
>  void __init sev_hardware_setup(void);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0c8b5129effd..c80fa1d378c9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5665,6 +5665,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		if (kvm_x86_ops.vm_copy_enc_context_from)
>  			r = kvm_x86_ops.vm_copy_enc_context_from(kvm, cap->args[0]);
>  		return r;
> +	case KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM:

I wonder... would it make sense to hedge and just call this KVM_CAP_VM_MIGRATE_VM_FROM?
I can't think of a use case where KVM would "need" to do this for a non-protected
VM, but I also don't see a huge naming problem if the "PROTECTED" is omitted.

> +		r = -EINVAL;
> +		if (kvm_x86_ops.vm_migrate_protected_vm_from)
> +			r = kvm_x86_ops.vm_migrate_protected_vm_from(
> +				kvm, cap->args[0]);

Either let that poke out and/or refactor to avoid the indentation.  E.g.

		r = -EINVAL;
		if (!kvm_x86_ops.vm_migrate_protected_vm_from)
			break;

		return kvm_x86_ops.vm_migrate_protected_vm_from(kvm, cap->args[0]);

		
> +		return r;
>  	case KVM_CAP_EXIT_HYPERCALL:
>  		if (cap->args[0] & ~KVM_EXIT_HYPERCALL_VALID_MASK) {
>  			r = -EINVAL;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index a067410ebea5..77b292ed01c1 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_BINARY_STATS_FD 203
>  #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
>  #define KVM_CAP_ARM_MTE 205
> +#define KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM 206
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> -- 
> 2.33.0.1079.g6e70778dc9-goog
> 

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH V11 2/5] KVM: SEV: Add support for SEV intra host migration
  2021-11-04 22:07   ` Sean Christopherson
@ 2021-11-04 23:04     ` Sean Christopherson
  2021-11-09 15:19     ` Peter Gonda
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2021-11-04 23:04 UTC (permalink / raw)
  To: Peter Gonda
  Cc: kvm, Marc Orr, Paolo Bonzini, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Tom Lendacky,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	linux-kernel

On Thu, Nov 04, 2021, Sean Christopherson wrote:
> On Thu, Oct 21, 2021, Peter Gonda wrote:
> > +	if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
> > +		ret = -EINVAL;
> > +		goto out_source;
> > +	}
> > +	ret = sev_lock_vcpus_for_migration(kvm);
> > +	if (ret)
> > +		goto out_dst_vcpu;
> > +	ret = sev_lock_vcpus_for_migration(source_kvm);
> > +	if (ret)
> > +		goto out_source_vcpu;
> > +
> > +	sev_migrate_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
> > +	kvm_for_each_vcpu(i, vcpu, source_kvm) {
> 
> Braces not needed.
> 
> > +		kvm_vcpu_reset(vcpu, /* init_event= */ false);

...

> That way the source vCPUs don't need to be locked

Scratch that particular idea, I keep forgetting the SEV-ES support needs to lock
the source vCPUs to transfer state.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V11 3/5] KVM: SEV: Add support for SEV-ES intra host migration
  2021-10-21 17:43 ` [PATCH V11 3/5] KVM: SEV: Add support for SEV-ES " Peter Gonda
@ 2021-11-04 23:33   ` Sean Christopherson
  0 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2021-11-04 23:33 UTC (permalink / raw)
  To: Peter Gonda
  Cc: kvm, Marc Orr, Paolo Bonzini, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Tom Lendacky,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	linux-kernel

On Thu, Oct 21, 2021, Peter Gonda wrote:
> ---
>  arch/x86/kvm/svm/sev.c | 50 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 2c2f724c9096..d8ce93fd1129 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1605,6 +1605,48 @@ static void sev_migrate_from(struct kvm_sev_info *dst,
>  	list_replace_init(&src->regions_list, &dst->regions_list);
>  }
>  
> +static int sev_es_migrate_from(struct kvm *dst, struct kvm *src)
> +{
> +	int i;
> +	struct kvm_vcpu *dst_vcpu, *src_vcpu;
> +	struct vcpu_svm *dst_svm, *src_svm;

What do you think about following the style of svm_vm_migrate_from(), where the
"dst" is simply "kvm"?  I like that because (a) it shortens all of these lines,
and (b) conveys the idea that the functions are running in the context of "this"
kvm, as opposed to being a third party that operates on an unrelated source and
destination.

> +
> +	if (atomic_read(&src->online_vcpus) != atomic_read(&dst->online_vcpus))
> +		return -EINVAL;
> +
> +	kvm_for_each_vcpu(i, src_vcpu, src) {
> +		if (!src_vcpu->arch.guest_state_protected)
> +			return -EINVAL;
> +	}
> +
> +	kvm_for_each_vcpu(i, src_vcpu, src) {
> +		src_svm = to_svm(src_vcpu);
> +		dst_vcpu = kvm_get_vcpu(dst, i);
> +		dst_svm = to_svm(dst_vcpu);
> +
> +		/*
> +		 * Transfer VMSA and GHCB state to the destination.  Nullify and
> +		 * clear source fields as appropriate, the state now belongs to
> +		 * the destination.
> +		 */
> +		dst_vcpu->vcpu_id = src_vcpu->vcpu_id;

vcpu_id is an odd thing to copy over.  That's fully controlled by userspace, and
is effectively immutable in KVM.  I don't think SEV-ES should be touching anything
besides SEV-ES state.

> +		dst_svm->sev_es = src_svm->sev_es;

Uber nit, maybe use memcpy() to "pair" with the memset() below?

> +		dst_svm->vmcb->control.ghcb_gpa =
> +			src_svm->vmcb->control.ghcb_gpa;
> +		dst_svm->vmcb->control.vmsa_pa = __pa(dst_svm->sev_es.vmsa);

Oof!  This _looks_ wrong, but it's not because dst_svm->sev_es.vmsa is copied
from the source in that subtle not-memcpy()-memcpy above.  The result of __pa()
absolutely will not change, so I would say just do the obvious

		dst_svm->vmcb->control.vmsa_pa = src_svm->vmcb->control.vmsa_pa;

and not force readers to think too hard.  That also avoids breakage if the order
is changed.

> +		dst_vcpu->arch.guest_state_protected = true;
> +
> +		memset(&src_svm->sev_es, 0, sizeof(src_svm->sev_es));
> +		src_svm->vmcb->control.ghcb_gpa = 0;
> +		src_svm->vmcb->control.vmsa_pa = 0;

'0' is not an invalid (G)PA.  INVALID_PAGE would be the most appropriate.

> +		src_vcpu->arch.guest_state_protected = false;
> +	}
> +	to_kvm_svm(src)->sev_info.es_active = false;
> +	to_kvm_svm(dst)->sev_info.es_active = true;
> +
> +	return 0;
> +}
> +
>  int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
>  {
>  	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;

And if we do the above s/dst_//, do it here as well.

> @@ -1633,7 +1675,7 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
>  	if (ret)
>  		goto out_fput;
>  
> -	if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
> +	if (!sev_guest(source_kvm)) {
>  		ret = -EINVAL;
>  		goto out_source;
>  	}
> @@ -1644,6 +1686,12 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
>  	if (ret)
>  		goto out_source_vcpu;
>  
> +	if (sev_es_guest(source_kvm)) {
> +		ret = sev_es_migrate_from(kvm, source_kvm);
> +		if (ret)
> +			goto out_source_vcpu;
> +	}
> +
>  	sev_migrate_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
>  	kvm_for_each_vcpu (i, vcpu, source_kvm) {
>  		kvm_vcpu_reset(vcpu, /* init_event= */ false);
> -- 
> 2.33.0.1079.g6e70778dc9-goog
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V11 4/5] selftest: KVM: Add open sev dev helper
  2021-10-21 17:43 ` [PATCH V11 4/5] selftest: KVM: Add open sev dev helper Peter Gonda
@ 2021-11-05 22:47   ` Sean Christopherson
  0 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2021-11-05 22:47 UTC (permalink / raw)
  To: Peter Gonda
  Cc: kvm, Marc Orr, David Rientjes, Brijesh Singh, Tom Lendacky,
	Paolo Bonzini, linux-kernel

On Thu, Oct 21, 2021, Peter Gonda wrote:
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 10a8ed691c66..06a6c04010fb 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -31,6 +31,19 @@ static void *align(void *x, size_t size)
>  	return (void *) (((size_t) x + mask) & ~mask);
>  }
>  
> +int open_path_or_exit(const char *path, int flags)
> +{
> +	int fd;
> +
> +	fd = open(path, flags);
> +	if (fd < 0) {
> +		print_skip("%s not available (errno: %d)", path, errno);

While you're here, can you add the strerror(errno) as well?  There are some other
enhancements I'd like to make as some failure modes are really annoying, e.g. if
the max vCPUs test fails/skips due to ulimits, but printing the human friendly
version is an easy one to pick off.

> +		exit(KSFT_SKIP);
> +	}
> +
> +	return fd;
> +}
> +
>  /*
>   * Open KVM_DEV_PATH if available, otherwise exit the entire program.
>   *
> @@ -42,16 +55,7 @@ static void *align(void *x, size_t size)
>   */
>  static int _open_kvm_dev_path_or_exit(int flags)
>  {
> -	int fd;
> -
> -	fd = open(KVM_DEV_PATH, flags);
> -	if (fd < 0) {
> -		print_skip("%s not available, is KVM loaded? (errno: %d)",
> -			   KVM_DEV_PATH, errno);
> -		exit(KSFT_SKIP);
> -	}
> -
> -	return fd;
> +	return open_path_or_exit(KVM_DEV_PATH, flags);
>  }
>  
>  int open_kvm_dev_path_or_exit(void)
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/svm.c b/tools/testing/selftests/kvm/lib/x86_64/svm.c
> index 2ac98d70d02b..14a8618efa9c 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/svm.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/svm.c
> @@ -13,6 +13,8 @@
>  #include "processor.h"
>  #include "svm_util.h"
>  
> +#define SEV_DEV_PATH "/dev/sev"
> +
>  struct gpr64_regs guest_regs;
>  u64 rflags;
>  
> @@ -160,3 +162,14 @@ void nested_svm_check_supported(void)
>  		exit(KSFT_SKIP);
>  	}
>  }
> +
> +/*
> + * Open SEV_DEV_PATH if available, otherwise exit the entire program.
> + *
> + * Return:
> + *   The opened file descriptor of /dev/sev.
> + */
> +int open_sev_dev_path_or_exit(void)
> +{
> +	return open_path_or_exit(SEV_DEV_PATH, 0);
> +}
> -- 
> 2.33.0.1079.g6e70778dc9-goog
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V11 2/5] KVM: SEV: Add support for SEV intra host migration
  2021-11-04 22:07   ` Sean Christopherson
  2021-11-04 23:04     ` Sean Christopherson
@ 2021-11-09 15:19     ` Peter Gonda
  2021-11-11 15:17     ` Paolo Bonzini
  2021-11-11 15:18     ` Paolo Bonzini
  3 siblings, 0 replies; 14+ messages in thread
From: Peter Gonda @ 2021-11-09 15:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Marc Orr, Paolo Bonzini, David Rientjes,
	Dr . David Alan Gilbert, Brijesh Singh, Tom Lendacky,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	linux-kernel

On Thu, Nov 4, 2021 at 4:07 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Paolo and anyone else, any thoughts before I lead Peter on an even longer wild
> goose chase?

Input would be appreciated.

Commented on some questions, otherwise taken feedback for a new revision.

>
> On Thu, Oct 21, 2021, Peter Gonda wrote:
> > @@ -6706,6 +6706,21 @@ MAP_SHARED mmap will result in an -EINVAL return.
> >  When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
> >  perform a bulk copy of tags to/from the guest.
> >
> > +7.29 KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM
> > +-------------------------------------
> > +
> > +Architectures: x86 SEV enabled
>
> I'd drop the "SEV enabled" part.  In a way, it's technically a lie for this one
> patch since an SEV-ES VM is also an SEV VM, but doesn't support this capability.
> And AFAICT no other ioctl()/capability provides this level of granularity.
>
> > +Type: vm
> > +Parameters: args[0] is the fd of the source vm
> > +Returns: 0 on success
> > +
> > +This capability enables userspace to migrate the encryption context from the VM
> > +indicated by the fd to the VM this is called on.
> > +
> > +This is intended to support intra-host migration of VMs between userspace VMMs.
> > +in-guest workloads scheduled by the host. This allows for upgrading the VMM
> > +process without interrupting the guest.
> > +
>
> ...
>
> > +static void sev_unlock_vcpus_for_migration(struct kvm *kvm)
> > +{
> > +     struct kvm_vcpu *vcpu;
> > +     int i;
> > +
> > +     kvm_for_each_vcpu(i, vcpu, kvm) {
>
> Braces not needed.
>
> > +             mutex_unlock(&vcpu->mutex);
> > +     }
> > +}
> > +
> > +static void sev_migrate_from(struct kvm_sev_info *dst,
> > +                           struct kvm_sev_info *src)
> > +{
> > +     dst->active = true;
> > +     dst->asid = src->asid;
> > +     dst->misc_cg = src->misc_cg;
>
> Ah, this is not correct.  If @dst is in a different cgroup, then @dst needs to
> be charged and @src needs to be uncharged.
>
> That would also provide a good opportunity to more tightly couple ->asid and
> ->misc_cg in the form of a helper.  Looking at the code, there's an invariant
> that misc_cg is NULL if an ASID is not assigned.  I.e. these three lines belong
> in a helper, irrespective of this code.
>
>         misc_cg_uncharge(type, sev->misc_cg, 1);
>         put_misc_cg(sev->misc_cg);
>         sev->misc_cg = NULL;
>

OK I can pull this out into a helper in a separate patch. Then do the
uncharge/charge here.

> > +     dst->handle = src->handle;
> > +     dst->pages_locked = src->pages_locked;
> > +
> > +     src->asid = 0;
> > +     src->active = false;
> > +     src->handle = 0;
> > +     src->pages_locked = 0;
> > +     src->misc_cg = NULL;
> > +     INIT_LIST_HEAD(&dst->regions_list);
> > +     list_replace_init(&src->regions_list, &dst->regions_list);
> > +}
> > +
> > +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
> > +{
> > +     struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
> > +     struct file *source_kvm_file;
> > +     struct kvm *source_kvm;
> > +     struct kvm_vcpu *vcpu;
> > +     int i, ret;
> > +
> > +     ret = sev_lock_for_migration(kvm);
> > +     if (ret)
> > +             return ret;
> > +
> > +     if (sev_guest(kvm)) {
> > +             ret = -EINVAL;
> > +             goto out_unlock;
> > +     }
> > +
> > +     source_kvm_file = fget(source_fd);
> > +     if (!file_is_kvm(source_kvm_file)) {
> > +             ret = -EBADF;
> > +             goto out_fput;
> > +     }
> > +
> > +     source_kvm = source_kvm_file->private_data;
> > +     ret = sev_lock_for_migration(source_kvm);
> > +     if (ret)
> > +             goto out_fput;
> > +
> > +     if (!sev_guest(source_kvm) || sev_es_guest(source_kvm)) {
> > +             ret = -EINVAL;
> > +             goto out_source;
> > +     }
> > +     ret = sev_lock_vcpus_for_migration(kvm);
> > +     if (ret)
> > +             goto out_dst_vcpu;
> > +     ret = sev_lock_vcpus_for_migration(source_kvm);
> > +     if (ret)
> > +             goto out_source_vcpu;
> > +
> > +     sev_migrate_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
> > +     kvm_for_each_vcpu(i, vcpu, source_kvm) {
>
> Braces not needed.
>
> > +             kvm_vcpu_reset(vcpu, /* init_event= */ false);
>
> Phooey.  I made this suggestion, but in hindsight, it's a bad suggestion as KVM
> doesn't currently have a true RESET path; there are quite a few blobs of code
> that assume the vCPU has never been run if init_event=false.
>
> And to go through kvm_vcpu_reset(), the vcpu needs to be loaded, not just locked.
> It won't fail as hard as VMX, where KVM would write the wrong VMCS, but odds are
> good something will eventually go sideways.
>
> Aha!  An idea.  Marking the VM bugged doesn't work because "we need to keep using
> the  source VM even after the state is transfered"[*], but the core idea is sound,
> it just needs to add a different flag to more precisely prevent kvm_vcpu_ioctl().
>
> If we rename KVM_REQ_VM_BUGGED=>KVM_REQ_VM_DEAD in a prep patch (see below), then
> this patch can add something here (can't think of a good name)
>
>         source_kvm->??? = true;
>         kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);
>
> and then check it in kvm_vcpu_ioctl()
>
>         struct kvm *kvm = vcpu->kvm;
>
>         if (kvm->mm != current->mm || kvm->vm_bugged || kvm->???)
>                 return -EIO;
>
> That way the source vCPUs don't need to be locked and all vCPU ioctls() are
> blocked, which I think is ideal since the vCPUs are in a frankenstate and really
> should just die.
>
> Maybe we can call the flag "zombie", or "mostly_dead" :-)

Do we actually need this functionality? We currently do intra-host
migration leaving the old vCPUs in a potentially dangling state until
we clean them up, so I am wondering why SEV VMs intra-host migration
should be special cased? Why not just zero out all the SEV-ES state
here so they cannot corrupt the GHCB, VMSA, etc. That is already safer
than what's done currently since the source vCPUs are still available
until the source VMM starts to tear down, those vCPUs could still be
run affecting guest state unexpectedly.

>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c80fa1d378c9..e3f49ca01f95 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9423,7 +9423,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>         }
>
>         if (kvm_request_pending(vcpu)) {
> -               if (kvm_check_request(KVM_REQ_VM_BUGGED, vcpu)) {
> +               if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
>                         r = -EIO;
>                         goto out;
>                 }
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 0f18df7fe874..de8d25cef183 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -150,7 +150,7 @@ static inline bool is_error_page(struct page *page)
>  #define KVM_REQ_MMU_RELOAD        (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQ_UNBLOCK           2
>  #define KVM_REQ_UNHALT            3
> -#define KVM_REQ_VM_BUGGED         (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> +#define KVM_REQ_VM_DEAD                  (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>  #define KVM_REQUEST_ARCH_BASE     8
>
>  #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
> @@ -654,7 +654,7 @@ struct kvm {
>  static inline void kvm_vm_bugged(struct kvm *kvm)
>  {
>         kvm->vm_bugged = true;
> -       kvm_make_all_cpus_request(kvm, KVM_REQ_VM_BUGGED);
> +       kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);
>  }
>
>  #define KVM_BUG(cond, kvm, fmt...)
>
>
> Back when I made this bad suggestion in v7, you said "we need to keep using the
> source VM even after the state is transfered"[*].  What all do you need to do
> after the migration?  I assume it's mostly memory related per-VM ioctls?
>
>
> [*] https://lkml.kernel.org/r/CAMkAt6q3as414YMZco6UyCycY+jKbaYS5BUdC+U+8iWmBft3+A@mail.gmail.com
>
> > +     }
> > +     ret = 0;
> > +
> > +out_source_vcpu:
> > +     sev_unlock_vcpus_for_migration(source_kvm);
> > +
> > +out_dst_vcpu:
> > +     sev_unlock_vcpus_for_migration(kvm);
> > +
> > +out_source:
> > +     sev_unlock_after_migration(source_kvm);
> > +out_fput:
> > +     if (source_kvm_file)
> > +             fput(source_kvm_file);
> > +out_unlock:
> > +     sev_unlock_after_migration(kvm);
> > +     return ret;
> > +}
> > +
> >  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> >  {
> >       struct kvm_sev_cmd sev_cmd;
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 68294491c23d..c2e25ae4757f 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4637,6 +4637,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> >       .mem_enc_unreg_region = svm_unregister_enc_region,
> >
> >       .vm_copy_enc_context_from = svm_vm_copy_asid_from,
> > +     .vm_migrate_protected_vm_from = svm_vm_migrate_from,
> >
> >       .can_emulate_instruction = svm_can_emulate_instruction,
> >
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index 6d8d762d208f..d7b44b37dfcf 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -80,6 +80,7 @@ struct kvm_sev_info {
> >       u64 ap_jump_table;      /* SEV-ES AP Jump Table address */
> >       struct kvm *enc_context_owner; /* Owner of copied encryption context */
> >       struct misc_cg *misc_cg; /* For misc cgroup accounting */
> > +     atomic_t migration_in_progress;
> >  };
> >
> >  struct kvm_svm {
> > @@ -557,6 +558,7 @@ int svm_register_enc_region(struct kvm *kvm,
> >  int svm_unregister_enc_region(struct kvm *kvm,
> >                             struct kvm_enc_region *range);
> >  int svm_vm_copy_asid_from(struct kvm *kvm, unsigned int source_fd);
> > +int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd);
> >  void pre_sev_run(struct vcpu_svm *svm, int cpu);
> >  void __init sev_set_cpu_caps(void);
> >  void __init sev_hardware_setup(void);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 0c8b5129effd..c80fa1d378c9 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5665,6 +5665,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >               if (kvm_x86_ops.vm_copy_enc_context_from)
> >                       r = kvm_x86_ops.vm_copy_enc_context_from(kvm, cap->args[0]);
> >               return r;
> > +     case KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM:
>
> I wonder... would it make sense to hedge and just call this KVM_CAP_VM_MIGRATE_VM_FROM?
> I can't think of a use case where KVM would "need" to do this for a non-protected
> VM, but I also don't see a huge naming problem if the "PROTECTED" is omitted.

Currently this CAP only deals with "protected" VM metadata. This call
isn't needed at all for non-protected VMs so wouldn't this change
imply that this call is needed for intra-host migration of all VMs?

>
> > +             r = -EINVAL;
> > +             if (kvm_x86_ops.vm_migrate_protected_vm_from)
> > +                     r = kvm_x86_ops.vm_migrate_protected_vm_from(
> > +                             kvm, cap->args[0]);
>
> Either let that poke out and/or refactor to avoid the indentation.  E.g.
>
>                 r = -EINVAL;
>                 if (!kvm_x86_ops.vm_migrate_protected_vm_from)
>                         break;
>
>                 return kvm_x86_ops.vm_migrate_protected_vm_from(kvm, cap->args[0]);
>
>
> > +             return r;
> >       case KVM_CAP_EXIT_HYPERCALL:
> >               if (cap->args[0] & ~KVM_EXIT_HYPERCALL_VALID_MASK) {
> >                       r = -EINVAL;
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index a067410ebea5..77b292ed01c1 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
> >  #define KVM_CAP_BINARY_STATS_FD 203
> >  #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
> >  #define KVM_CAP_ARM_MTE 205
> > +#define KVM_CAP_VM_MIGRATE_PROTECTED_VM_FROM 206
> >
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >
> > --
> > 2.33.0.1079.g6e70778dc9-goog
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V11 2/5] KVM: SEV: Add support for SEV intra host migration
  2021-11-04 22:07   ` Sean Christopherson
  2021-11-04 23:04     ` Sean Christopherson
  2021-11-09 15:19     ` Peter Gonda
@ 2021-11-11 15:17     ` Paolo Bonzini
  2021-11-11 15:18     ` Paolo Bonzini
  3 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2021-11-11 15:17 UTC (permalink / raw)
  To: Sean Christopherson, Peter Gonda
  Cc: kvm, Marc Orr, David Rientjes, Dr . David Alan Gilbert,
	Brijesh Singh, Tom Lendacky, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, linux-kernel

On 11/4/21 23:07, Sean Christopherson wrote:
> 
> That would also provide a good opportunity to more tightly couple ->asid and
> ->misc_cg in the form of a helper.  Looking at the code, there's an invariant
> that misc_cg is NULL if an ASID is not assigned.  I.e. these three lines belong
> in a helper, irrespective of this code.
> 
> 	misc_cg_uncharge(type, sev->misc_cg, 1);
> 	put_misc_cg(sev->misc_cg);
> 	sev->misc_cg = NULL;

Agreed.  Though it's a bit more complicated because if dst->misc_cg ==
src->misc_cg you should *not* charge and uncharge, because charging
could fail.  So I'm going for just simple charge/uncharge helpers for
now:

 From 0ac1004d9e15e9460b51651bd095e6c5ee9cf4f9 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:02:26 -0500
Subject: [PATCH 1/2] KVM: SEV: provide helpers to charge/uncharge misc_cg

Avoid code duplication across all callers of misc_cg_try_charge and
misc_cg_uncharge.  The resource type for KVM is always derived from
sev->es_active, and the quantity is always 1.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7c94fe307b39..5bafa4bf7c49 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -120,16 +120,26 @@ static bool __sev_recycle_asids(int min_asid, int max_asid)
  	return true;
  }
  
+static int sev_misc_cg_try_charge(struct kvm_sev_info *sev)
+{
+	enum misc_res_type type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
+	return misc_cg_try_charge(type, sev->misc_cg, 1);
+}
+
+static void sev_misc_cg_uncharge(struct kvm_sev_info *sev)
+{
+	enum misc_res_type type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
+	misc_cg_uncharge(type, sev->misc_cg, 1);
+}
+
  static int sev_asid_new(struct kvm_sev_info *sev)
  {
  	int asid, min_asid, max_asid, ret;
  	bool retry = true;
-	enum misc_res_type type;
  
-	type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
  	WARN_ON(sev->misc_cg);
  	sev->misc_cg = get_current_misc_cg();
-	ret = misc_cg_try_charge(type, sev->misc_cg, 1);
+	ret = sev_misc_cg_try_charge(sev);
  	if (ret) {
  		put_misc_cg(sev->misc_cg);
  		sev->misc_cg = NULL;
@@ -162,7 +172,7 @@ static int sev_asid_new(struct kvm_sev_info *sev)
  
  	return asid;
  e_uncharge:
-	misc_cg_uncharge(type, sev->misc_cg, 1);
+	sev_misc_cg_uncharge(sev);
  	put_misc_cg(sev->misc_cg);
  	sev->misc_cg = NULL;
  	return ret;
@@ -179,7 +189,6 @@ static void sev_asid_free(struct kvm_sev_info *sev)
  {
  	struct svm_cpu_data *sd;
  	int cpu;
-	enum misc_res_type type;
  
  	mutex_lock(&sev_bitmap_lock);
  
@@ -192,8 +201,7 @@ static void sev_asid_free(struct kvm_sev_info *sev)
  
  	mutex_unlock(&sev_bitmap_lock);
  
-	type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
-	misc_cg_uncharge(type, sev->misc_cg, 1);
+	sev_misc_cg_uncharge(sev);
  	put_misc_cg(sev->misc_cg);
  	sev->misc_cg = NULL;
  }



 From 5214132ae7e8310de26d5791f7fe913085a8e53c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:08:32 -0500
Subject: [PATCH] fix cgroup charging


diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5bafa4bf7c49..97048ff7c2ad 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1594,7 +1594,6 @@ static void sev_migrate_from(struct kvm_sev_info *dst,
  {
  	dst->active = true;
  	dst->asid = src->asid;
-	dst->misc_cg = src->misc_cg;
  	dst->handle = src->handle;
  	dst->pages_locked = src->pages_locked;
  
@@ -1602,6 +1601,11 @@ static void sev_migrate_from(struct kvm_sev_info *dst,
  	src->active = false;
  	src->handle = 0;
  	src->pages_locked = 0;
+
+	if (dst->misc_cg != src->misc_cg)
+		sev_misc_cg_uncharge(src);
+
+	put_misc_cg(src->misc_cg);
  	src->misc_cg = NULL;
  
  	INIT_LIST_HEAD(&dst->regions_list);
@@ -1611,6 +1615,7 @@ static void sev_migrate_from(struct kvm_sev_info *dst,
  int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
  {
  	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_info *src_sev;
  	struct file *source_kvm_file;
  	struct kvm *source_kvm;
  	struct kvm_vcpu *vcpu;
@@ -1640,25 +1645,39 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
  		ret = -EINVAL;
  		goto out_source;
  	}
+
+	src_sev = &to_kvm_svm(source_kvm)->sev_info;
+	dst_sev->misc_cg = get_current_misc_cg();
+	if (dst_sev->misc_cg != src_sev->misc_cg) {
+		ret = sev_misc_cg_try_charge(dst_sev);
+		if (ret)
+			goto out_dst_put_cgroup;
+	}
+
  	ret = sev_lock_vcpus_for_migration(kvm);
  	if (ret)
-		goto out_dst_vcpu;
+		goto out_dst_cgroup;
  	ret = sev_lock_vcpus_for_migration(source_kvm);
  	if (ret)
-		goto out_source_vcpu;
+		goto out_dst_vcpu;
  
-	sev_migrate_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
+	sev_migrate_from(dst_sev, src_sev);
  	kvm_for_each_vcpu(i, vcpu, source_kvm) {
  		kvm_vcpu_reset(vcpu, /* init_event= */ false);
  	}
  	ret = 0;
  
-out_source_vcpu:
  	sev_unlock_vcpus_for_migration(source_kvm);
  
  out_dst_vcpu:
  	sev_unlock_vcpus_for_migration(kvm);
-
+out_dst_cgroup:
+	if (ret < 0) {
+		sev_misc_cg_uncharge(dst_sev);
+out_dst_put_cgroup:
+		put_misc_cg(dst_sev->misc_cg);
+		dst_sev->misc_cg = NULL;
+	}
  out_source:
  	sev_unlock_after_migration(source_kvm);
  out_fput:



 From 943ba93c57ee25f85538decd68dca6e4ebdaf2c1 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:13:38 -0500
Subject: [PATCH] KVM: generalize "bugged" VM to "dead" VM

Generalize KVM_REQ_VM_BUGGED so that it can be called even in cases
where it is by design that the VM cannot be operated upon.  In this
case any KVM_BUG_ON should still warn, so introduce a new flag
kvm->vm_dead that is separate from kvm->vm_bugged.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ca9693d3436b..185094eb86b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9660,7 +9660,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
  	}
  
  	if (kvm_request_pending(vcpu)) {
-		if (kvm_check_request(KVM_REQ_VM_BUGGED, vcpu)) {
+		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
  			r = -EIO;
  			goto out;
  		}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 60a35d9fe259..9e0667e3723e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -150,7 +150,7 @@ static inline bool is_error_page(struct page *page)
  #define KVM_REQ_MMU_RELOAD        (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
  #define KVM_REQ_UNBLOCK           2
  #define KVM_REQ_UNHALT            3
-#define KVM_REQ_VM_BUGGED         (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_VM_DEAD           (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
  #define KVM_REQUEST_ARCH_BASE     8
  
  #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
@@ -617,6 +617,7 @@ struct kvm {
  	unsigned int max_halt_poll_ns;
  	u32 dirty_ring_size;
  	bool vm_bugged;
+	bool vm_dead;
  
  #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
  	struct notifier_block pm_notifier;
@@ -650,12 +651,19 @@ struct kvm {
  #define vcpu_err(vcpu, fmt, ...)					\
  	kvm_err("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
  
+static inline void kvm_vm_dead(struct kvm *kvm)
+{
+	kvm->vm_dead = true;
+	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);
+}
+
  static inline void kvm_vm_bugged(struct kvm *kvm)
  {
  	kvm->vm_bugged = true;
-	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_BUGGED);
+	kvm_vm_dead(kvm);
  }
  
+
  #define KVM_BUG(cond, kvm, fmt...)				\
  ({								\
  	int __ret = (cond);					\
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3f6d450355f0..d31724500501 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3747,7 +3747,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
  	struct kvm_fpu *fpu = NULL;
  	struct kvm_sregs *kvm_sregs = NULL;
  
-	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_bugged)
+	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
  		return -EIO;
  
  	if (unlikely(_IOC_TYPE(ioctl) != KVMIO))
@@ -3957,7 +3957,7 @@ static long kvm_vcpu_compat_ioctl(struct file *filp,
  	void __user *argp = compat_ptr(arg);
  	int r;
  
-	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_bugged)
+	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
  		return -EIO;
  
  	switch (ioctl) {
@@ -4023,7 +4023,7 @@ static long kvm_device_ioctl(struct file *filp, unsigned int ioctl,
  {
  	struct kvm_device *dev = filp->private_data;
  
-	if (dev->kvm->mm != current->mm || dev->kvm->vm_bugged)
+	if (dev->kvm->mm != current->mm || dev->kvm->vm_dead)
  		return -EIO;
  
  	switch (ioctl) {
@@ -4345,7 +4345,7 @@ static long kvm_vm_ioctl(struct file *filp,
  	void __user *argp = (void __user *)arg;
  	int r;
  
-	if (kvm->mm != current->mm || kvm->vm_bugged)
+	if (kvm->mm != current->mm || kvm->vm_dead)
  		return -EIO;
  	switch (ioctl) {
  	case KVM_CREATE_VCPU:
@@ -4556,7 +4556,7 @@ static long kvm_vm_compat_ioctl(struct file *filp,
  	struct kvm *kvm = filp->private_data;
  	int r;
  
-	if (kvm->mm != current->mm || kvm->vm_bugged)
+	if (kvm->mm != current->mm || kvm->vm_dead)
  		return -EIO;
  	switch (ioctl) {
  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT



 From fb168352e16a4dbd95a7c0d1e6add18f0496ac97 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:14:49 -0500
Subject: [PATCH] mark src vm as dead


diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 97048ff7c2ad..2403aea3dbd3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1662,12 +1662,9 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
  		goto out_dst_vcpu;
  
  	sev_migrate_from(dst_sev, src_sev);
-	kvm_for_each_vcpu(i, vcpu, source_kvm) {
-		kvm_vcpu_reset(vcpu, /* init_event= */ false);
-	}
-	ret = 0;
-
+	kvm_vm_dead(source_kvm);
  	sev_unlock_vcpus_for_migration(source_kvm);
+	ret = 0;
  
  out_dst_vcpu:
  	sev_unlock_vcpus_for_migration(kvm);


I'll send it out properly when I finish reviewing.

Paolo


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH V11 2/5] KVM: SEV: Add support for SEV intra host migration
  2021-11-04 22:07   ` Sean Christopherson
                       ` (2 preceding siblings ...)
  2021-11-11 15:17     ` Paolo Bonzini
@ 2021-11-11 15:18     ` Paolo Bonzini
  3 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2021-11-11 15:18 UTC (permalink / raw)
  To: Sean Christopherson, Peter Gonda
  Cc: kvm, Marc Orr, David Rientjes, Dr . David Alan Gilbert,
	Brijesh Singh, Tom Lendacky, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, linux-kernel

On 11/4/21 23:07, Sean Christopherson wrote:
> 
> That would also provide a good opportunity to more tightly couple ->asid and
> ->misc_cg in the form of a helper.  Looking at the code, there's an invariant
> that misc_cg is NULL if an ASID is not assigned.  I.e. these three lines belong
> in a helper, irrespective of this code.
> 
> 	misc_cg_uncharge(type, sev->misc_cg, 1);
> 	put_misc_cg(sev->misc_cg);
> 	sev->misc_cg = NULL;

Agreed.  Though it's a bit more complicated because if dst->misc_cg ==
src->misc_cg you should *not* charge and uncharge, because charging
could fail.  So I'm going for just simple charge/uncharge helpers for
now:

 From 0ac1004d9e15e9460b51651bd095e6c5ee9cf4f9 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:02:26 -0500
Subject: [PATCH 1/4] KVM: SEV: provide helpers to charge/uncharge misc_cg

Avoid code duplication across all callers of misc_cg_try_charge and
misc_cg_uncharge.  The resource type for KVM is always derived from
sev->es_active, and the quantity is always 1.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7c94fe307b39..5bafa4bf7c49 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -120,16 +120,26 @@ static bool __sev_recycle_asids(int min_asid, int max_asid)
  	return true;
  }
  
+static int sev_misc_cg_try_charge(struct kvm_sev_info *sev)
+{
+	enum misc_res_type type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
+	return misc_cg_try_charge(type, sev->misc_cg, 1);
+}
+
+static void sev_misc_cg_uncharge(struct kvm_sev_info *sev)
+{
+	enum misc_res_type type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
+	misc_cg_uncharge(type, sev->misc_cg, 1);
+}
+
  static int sev_asid_new(struct kvm_sev_info *sev)
  {
  	int asid, min_asid, max_asid, ret;
  	bool retry = true;
-	enum misc_res_type type;
  
-	type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
  	WARN_ON(sev->misc_cg);
  	sev->misc_cg = get_current_misc_cg();
-	ret = misc_cg_try_charge(type, sev->misc_cg, 1);
+	ret = sev_misc_cg_try_charge(sev);
  	if (ret) {
  		put_misc_cg(sev->misc_cg);
  		sev->misc_cg = NULL;
@@ -162,7 +172,7 @@ static int sev_asid_new(struct kvm_sev_info *sev)
  
  	return asid;
  e_uncharge:
-	misc_cg_uncharge(type, sev->misc_cg, 1);
+	sev_misc_cg_uncharge(sev);
  	put_misc_cg(sev->misc_cg);
  	sev->misc_cg = NULL;
  	return ret;
@@ -179,7 +189,6 @@ static void sev_asid_free(struct kvm_sev_info *sev)
  {
  	struct svm_cpu_data *sd;
  	int cpu;
-	enum misc_res_type type;
  
  	mutex_lock(&sev_bitmap_lock);
  
@@ -192,8 +201,7 @@ static void sev_asid_free(struct kvm_sev_info *sev)
  
  	mutex_unlock(&sev_bitmap_lock);
  
-	type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
-	misc_cg_uncharge(type, sev->misc_cg, 1);
+	sev_misc_cg_uncharge(sev);
  	put_misc_cg(sev->misc_cg);
  	sev->misc_cg = NULL;
  }



 From 5214132ae7e8310de26d5791f7fe913085a8e53c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:08:32 -0500
Subject: [PATCH 2/4] fix cgroup charging


diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 5bafa4bf7c49..97048ff7c2ad 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1594,7 +1594,6 @@ static void sev_migrate_from(struct kvm_sev_info *dst,
  {
  	dst->active = true;
  	dst->asid = src->asid;
-	dst->misc_cg = src->misc_cg;
  	dst->handle = src->handle;
  	dst->pages_locked = src->pages_locked;
  
@@ -1602,6 +1601,11 @@ static void sev_migrate_from(struct kvm_sev_info *dst,
  	src->active = false;
  	src->handle = 0;
  	src->pages_locked = 0;
+
+	if (dst->misc_cg != src->misc_cg)
+		sev_misc_cg_uncharge(src);
+
+	put_misc_cg(src->misc_cg);
  	src->misc_cg = NULL;
  
  	INIT_LIST_HEAD(&dst->regions_list);
@@ -1611,6 +1615,7 @@ static void sev_migrate_from(struct kvm_sev_info *dst,
  int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
  {
  	struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
+	struct kvm_sev_info *src_sev;
  	struct file *source_kvm_file;
  	struct kvm *source_kvm;
  	struct kvm_vcpu *vcpu;
@@ -1640,25 +1645,39 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
  		ret = -EINVAL;
  		goto out_source;
  	}
+
+	src_sev = &to_kvm_svm(source_kvm)->sev_info;
+	dst_sev->misc_cg = get_current_misc_cg();
+	if (dst_sev->misc_cg != src_sev->misc_cg) {
+		ret = sev_misc_cg_try_charge(dst_sev);
+		if (ret)
+			goto out_dst_put_cgroup;
+	}
+
  	ret = sev_lock_vcpus_for_migration(kvm);
  	if (ret)
-		goto out_dst_vcpu;
+		goto out_dst_cgroup;
  	ret = sev_lock_vcpus_for_migration(source_kvm);
  	if (ret)
-		goto out_source_vcpu;
+		goto out_dst_vcpu;
  
-	sev_migrate_from(dst_sev, &to_kvm_svm(source_kvm)->sev_info);
+	sev_migrate_from(dst_sev, src_sev);
  	kvm_for_each_vcpu(i, vcpu, source_kvm) {
  		kvm_vcpu_reset(vcpu, /* init_event= */ false);
  	}
  	ret = 0;
  
-out_source_vcpu:
  	sev_unlock_vcpus_for_migration(source_kvm);
  
  out_dst_vcpu:
  	sev_unlock_vcpus_for_migration(kvm);
-
+out_dst_cgroup:
+	if (ret < 0) {
+		sev_misc_cg_uncharge(dst_sev);
+out_dst_put_cgroup:
+		put_misc_cg(dst_sev->misc_cg);
+		dst_sev->misc_cg = NULL;
+	}
  out_source:
  	sev_unlock_after_migration(source_kvm);
  out_fput:



 From 943ba93c57ee25f85538decd68dca6e4ebdaf2c1 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:13:38 -0500
Subject: [PATCH 3/4] KVM: generalize "bugged" VM to "dead" VM

Generalize KVM_REQ_VM_BUGGED so that it can be called even in cases
where it is by design that the VM cannot be operated upon.  In this
case any KVM_BUG_ON should still warn, so introduce a new flag
kvm->vm_dead that is separate from kvm->vm_bugged.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ca9693d3436b..185094eb86b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9660,7 +9660,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
  	}
  
  	if (kvm_request_pending(vcpu)) {
-		if (kvm_check_request(KVM_REQ_VM_BUGGED, vcpu)) {
+		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
  			r = -EIO;
  			goto out;
  		}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 60a35d9fe259..9e0667e3723e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -150,7 +150,7 @@ static inline bool is_error_page(struct page *page)
  #define KVM_REQ_MMU_RELOAD        (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
  #define KVM_REQ_UNBLOCK           2
  #define KVM_REQ_UNHALT            3
-#define KVM_REQ_VM_BUGGED         (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_VM_DEAD           (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
  #define KVM_REQUEST_ARCH_BASE     8
  
  #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
@@ -617,6 +617,7 @@ struct kvm {
  	unsigned int max_halt_poll_ns;
  	u32 dirty_ring_size;
  	bool vm_bugged;
+	bool vm_dead;
  
  #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
  	struct notifier_block pm_notifier;
@@ -650,12 +651,19 @@ struct kvm {
  #define vcpu_err(vcpu, fmt, ...)					\
  	kvm_err("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
  
+static inline void kvm_vm_dead(struct kvm *kvm)
+{
+	kvm->vm_dead = true;
+	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_DEAD);
+}
+
  static inline void kvm_vm_bugged(struct kvm *kvm)
  {
  	kvm->vm_bugged = true;
-	kvm_make_all_cpus_request(kvm, KVM_REQ_VM_BUGGED);
+	kvm_vm_dead(kvm);
  }
  
+
  #define KVM_BUG(cond, kvm, fmt...)				\
  ({								\
  	int __ret = (cond);					\
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3f6d450355f0..d31724500501 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3747,7 +3747,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
  	struct kvm_fpu *fpu = NULL;
  	struct kvm_sregs *kvm_sregs = NULL;
  
-	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_bugged)
+	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
  		return -EIO;
  
  	if (unlikely(_IOC_TYPE(ioctl) != KVMIO))
@@ -3957,7 +3957,7 @@ static long kvm_vcpu_compat_ioctl(struct file *filp,
  	void __user *argp = compat_ptr(arg);
  	int r;
  
-	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_bugged)
+	if (vcpu->kvm->mm != current->mm || vcpu->kvm->vm_dead)
  		return -EIO;
  
  	switch (ioctl) {
@@ -4023,7 +4023,7 @@ static long kvm_device_ioctl(struct file *filp, unsigned int ioctl,
  {
  	struct kvm_device *dev = filp->private_data;
  
-	if (dev->kvm->mm != current->mm || dev->kvm->vm_bugged)
+	if (dev->kvm->mm != current->mm || dev->kvm->vm_dead)
  		return -EIO;
  
  	switch (ioctl) {
@@ -4345,7 +4345,7 @@ static long kvm_vm_ioctl(struct file *filp,
  	void __user *argp = (void __user *)arg;
  	int r;
  
-	if (kvm->mm != current->mm || kvm->vm_bugged)
+	if (kvm->mm != current->mm || kvm->vm_dead)
  		return -EIO;
  	switch (ioctl) {
  	case KVM_CREATE_VCPU:
@@ -4556,7 +4556,7 @@ static long kvm_vm_compat_ioctl(struct file *filp,
  	struct kvm *kvm = filp->private_data;
  	int r;
  
-	if (kvm->mm != current->mm || kvm->vm_bugged)
+	if (kvm->mm != current->mm || kvm->vm_dead)
  		return -EIO;
  	switch (ioctl) {
  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT



 From fb168352e16a4dbd95a7c0d1e6add18f0496ac97 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Thu, 11 Nov 2021 10:14:49 -0500
Subject: [PATCH 4/4] mark src vm as dead


diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 97048ff7c2ad..2403aea3dbd3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1662,12 +1662,9 @@ int svm_vm_migrate_from(struct kvm *kvm, unsigned int source_fd)
  		goto out_dst_vcpu;
  
  	sev_migrate_from(dst_sev, src_sev);
-	kvm_for_each_vcpu(i, vcpu, source_kvm) {
-		kvm_vcpu_reset(vcpu, /* init_event= */ false);
-	}
-	ret = 0;
-
+	kvm_vm_dead(source_kvm);
  	sev_unlock_vcpus_for_migration(source_kvm);
+	ret = 0;
  
  out_dst_vcpu:
  	sev_unlock_vcpus_for_migration(kvm);


I'll send it out properly when I finish reviewing.

Paolo


^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-11-11 15:18 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-21 17:42 [PATCH 0/5 V11] Add AMD SEV and SEV-ES intra host migration support Peter Gonda
2021-10-21 17:42 ` [PATCH 1/5 V11] KVM: SEV: Refactor out sev_es_state struct Peter Gonda
2021-11-04 20:06   ` Sean Christopherson
2021-10-21 17:43 ` [PATCH V11 2/5] KVM: SEV: Add support for SEV intra host migration Peter Gonda
2021-11-04 22:07   ` Sean Christopherson
2021-11-04 23:04     ` Sean Christopherson
2021-11-09 15:19     ` Peter Gonda
2021-11-11 15:17     ` Paolo Bonzini
2021-11-11 15:18     ` Paolo Bonzini
2021-10-21 17:43 ` [PATCH V11 3/5] KVM: SEV: Add support for SEV-ES " Peter Gonda
2021-11-04 23:33   ` Sean Christopherson
2021-10-21 17:43 ` [PATCH V11 4/5] selftest: KVM: Add open sev dev helper Peter Gonda
2021-11-05 22:47   ` Sean Christopherson
2021-10-21 17:43 ` [PATCH V11 5/5] selftest: KVM: Add intra host migration tests Peter Gonda

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).