linux-efi.vger.kernel.org archive mirror
* [PATCH v3 0/5] Add Guest API & Guest Kernel support for SEV live migration.
@ 2021-06-08 18:05 Ashish Kalra
  2021-06-08 18:05 ` [PATCH v3 1/5] KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall Ashish Kalra
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Ashish Kalra @ 2021-06-08 18:05 UTC (permalink / raw)
  To: pbonzini
  Cc: seanjc, tglx, bp, mingo, hpa, joro, Thomas.Lendacky, x86, kvm,
	linux-kernel, srutherford, brijesh.singh, linux-efi

From: Ashish Kalra <ashish.kalra@amd.com>

This series adds the guest API and guest kernel support for SEV live migration.

The series introduces a new hypercall that the guest OS uses to notify the
hypervisor of changes in page encryption status. If a page is encrypted with
the guest-specific key, the SEV commands are used to transfer it during
migration; if a page is not encrypted, migration falls back to the default
(plaintext) path.

This section describes how the SEV live migration feature is negotiated
between the host and the guest. The host indicates support for this feature
via a KVM feature CPUID bit. The guest firmware (OVMF) detects the feature
and sets a UEFI environment variable indicating OVMF support for live
migration. The guest kernel likewise detects the host support via CPUID;
for an EFI boot it additionally verifies that OVMF supports the feature by
reading the UEFI environment variable, and if the variable is set it enables
live migration on the host by writing to a custom MSR. If not booted under
EFI, the kernel enables the feature by writing to the custom MSR directly.
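
A condensed sketch of the guest-side logic described above (illustrative
only; adapted from patch 5 of this series, the function name is made up
and error handling is omitted):

static void __init sev_live_migration_enable(void)
{
	if (!sev_active() ||
	    !kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL))
		return;

	if (efi_enabled(EFI_BOOT)) {
		/*
		 * Deferred to a late_initcall() that first checks the
		 * "SevLiveMigrationEnabled" UEFI variable set by OVMF.
		 */
		return;
	}

	/* Non-EFI boot: enable live migration support directly. */
	wrmsrl(MSR_KVM_MIGRATION_CONTROL, KVM_MIGRATION_READY);
}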

Changes since v2:
 - Add guest api patch to this patchset.
 - Replace KVM_HC_PAGE_ENC_STATUS hypercall with the more generic
   KVM_HC_MAP_GPA_RANGE hypercall.
 - Add WARN_ONCE() messages if address lookup fails during kernel
   page table walk while issuing KVM_HC_MAP_GPA_RANGE hypercall.

Changes since v1:
 - Avoid having an SEV specific variant of kvm_hypercall3() and instead
   invert the default to VMMCALL.

Ashish Kalra (4):
  KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall
  KVM: x86: invert KVM_HYPERCALL to default to VMMCALL
  EFI: Introduce the new AMD Memory Encryption GUID.
  x86/kvm: Add guest support for detecting and enabling SEV Live
    Migration feature.

Brijesh Singh (1):
  mm: x86: Invoke hypercall when page encryption status is changed

 Documentation/virt/kvm/api.rst        |  19 +++++
 Documentation/virt/kvm/cpuid.rst      |   7 ++
 Documentation/virt/kvm/hypercalls.rst |  21 +++++
 Documentation/virt/kvm/msr.rst        |  13 ++++
 arch/x86/include/asm/kvm_host.h       |   2 +
 arch/x86/include/asm/kvm_para.h       |   2 +-
 arch/x86/include/asm/mem_encrypt.h    |   4 +
 arch/x86/include/asm/paravirt.h       |   6 ++
 arch/x86/include/asm/paravirt_types.h |   1 +
 arch/x86/include/asm/set_memory.h     |   1 +
 arch/x86/include/uapi/asm/kvm_para.h  |  13 ++++
 arch/x86/kernel/kvm.c                 | 107 ++++++++++++++++++++++++++
 arch/x86/kernel/paravirt.c            |   1 +
 arch/x86/kvm/x86.c                    |  46 +++++++++++
 arch/x86/mm/mem_encrypt.c             |  75 +++++++++++++++---
 arch/x86/mm/pat/set_memory.c          |   7 ++
 include/linux/efi.h                   |   1 +
 include/uapi/linux/kvm.h              |   1 +
 include/uapi/linux/kvm_para.h         |   1 +
 19 files changed, 318 insertions(+), 10 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 1/5] KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall
  2021-06-08 18:05 [PATCH v3 0/5] Add Guest API & Guest Kernel support for SEV live migration Ashish Kalra
@ 2021-06-08 18:05 ` Ashish Kalra
  2021-06-10 16:58   ` Paolo Bonzini
  2021-06-08 18:06 ` [PATCH v3 2/5] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL Ashish Kalra
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Ashish Kalra @ 2021-06-08 18:05 UTC (permalink / raw)
  To: pbonzini
  Cc: seanjc, tglx, bp, mingo, hpa, joro, Thomas.Lendacky, x86, kvm,
	linux-kernel, srutherford, brijesh.singh, linux-efi

From: Ashish Kalra <ashish.kalra@amd.com>

This hypercall is used by the SEV guest to notify a change in the page
encryption status to the hypervisor. The hypercall should be invoked
only when the encryption attribute is changed from encrypted -> decrypted
and vice versa. By default all guest pages are considered encrypted.

The hypercall exits to userspace to manage the guest shared regions and
integrate with the userspace VMM's migration code.
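
For the userspace side, handling the resulting exit in the VMM's run loop
could look roughly like the sketch below. This is illustrative only; it
uses the existing kvm_run->hypercall fields, and track_shared_region() is
a hypothetical helper that records shared GPA ranges for the migration
code.

static void handle_hc_map_gpa_range(struct kvm_run *run)
{
	__u64 gpa    = run->hypercall.args[0];
	__u64 npages = run->hypercall.args[1];
	__u64 attrs  = run->hypercall.args[2];

	track_shared_region(gpa, npages,
			    !(attrs & KVM_MAP_GPA_RANGE_ENCRYPTED));
	run->hypercall.ret = 0;		/* returned to the guest in RAX */
}

	/* ... in the KVM_RUN exit dispatch ... */
	case KVM_EXIT_HYPERCALL:
		if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
			handle_hc_map_gpa_range(run);
		break;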

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/api.rst        | 19 +++++++++++
 Documentation/virt/kvm/cpuid.rst      |  7 ++++
 Documentation/virt/kvm/hypercalls.rst | 21 ++++++++++++
 Documentation/virt/kvm/msr.rst        | 13 ++++++++
 arch/x86/include/asm/kvm_host.h       |  2 ++
 arch/x86/include/uapi/asm/kvm_para.h  | 13 ++++++++
 arch/x86/kvm/x86.c                    | 46 +++++++++++++++++++++++++++
 include/uapi/linux/kvm.h              |  1 +
 include/uapi/linux/kvm_para.h         |  1 +
 9 files changed, 123 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7fcb2fd38f42..6396ce8bfa44 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6891,3 +6891,22 @@ This capability is always enabled.
 This capability indicates that the KVM virtual PTP service is
 supported in the host. A VMM can check whether the service is
 available to the guest on migration.
+
+8.33 KVM_CAP_EXIT_HYPERCALL
+---------------------------
+
+:Capability: KVM_CAP_EXIT_HYPERCALL
+:Architectures: x86
+:Type: vm
+
+This capability, if enabled, will cause KVM to exit to userspace
+with KVM_EXIT_HYPERCALL exit reason to process some hypercalls.
+
+Calling KVM_CHECK_EXTENSION for this capability will return a bitmask
+of hypercalls that can be configured to exit to userspace.
+Right now, the only such hypercall is KVM_HC_MAP_GPA_RANGE.
+
+The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
+of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
+the hypercalls whose corresponding bit is in the argument, and return
+ENOSYS for the others.
diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index cf62162d4be2..bda3e3e737d7 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -96,6 +96,13 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
                                                before using extended destination
                                                ID bits in MSI address bits 11-5.
 
+KVM_FEATURE_HC_MAP_GPA_RANGE       16          guest checks this feature bit before
+                                               using the map gpa range hypercall
+                                               to notify the page state change
+
+KVM_FEATURE_MIGRATION_CONTROL      17          guest checks this feature bit before
+                                               using MSR_KVM_MIGRATION_CONTROL
+
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
                                                per-cpu warps are expected in
                                                kvmclock
diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
index ed4fddd364ea..e56fa8b9cfca 100644
--- a/Documentation/virt/kvm/hypercalls.rst
+++ b/Documentation/virt/kvm/hypercalls.rst
@@ -169,3 +169,24 @@ a0: destination APIC ID
 
 :Usage example: When sending a call-function IPI-many to vCPUs, yield if
 	        any of the IPI target vCPUs was preempted.
+
+8. KVM_HC_MAP_GPA_RANGE
+-------------------------
+:Architecture: x86
+:Status: active
+:Purpose: Request KVM to map a GPA range with the specified attributes.
+
+a0: the guest physical address of the start page
+a1: the number of (4kb) pages (must be contiguous in GPA space)
+a2: attributes
+
+    Where 'attributes' :
+        * bits  3:0 - preferred page size encoding 0 = 4kb, 1 = 2mb, 2 = 1gb, etc...
+        * bit     4 - plaintext = 0, encrypted = 1
+        * bits 63:5 - reserved (must be zero)
+
+**Implementation note**: this hypercall is implemented in userspace via
+the KVM_CAP_EXIT_HYPERCALL capability. Userspace must enable that capability
+before advertising KVM_FEATURE_HC_MAP_GPA_RANGE in the guest CPUID.  In
+addition, if the guest supports KVM_FEATURE_MIGRATION_CONTROL, userspace
+must also set up an MSR filter to process writes to MSR_KVM_MIGRATION_CONTROL.
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index e37a14c323d2..9315fc385fb0 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -376,3 +376,16 @@ data:
 	write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
 	and check if there are more notifications pending. The MSR is available
 	if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
+
+MSR_KVM_MIGRATION_CONTROL:
+        0x4b564d08
+
+data:
+        This MSR is available if KVM_FEATURE_MIGRATION_CONTROL is present in
+        CPUID.  Bit 0 represents whether live migration of the guest is allowed.
+
+        When a guest is started, bit 0 will be 0 if the guest has encrypted
+        memory and 1 if the guest does not have encrypted memory.  If the
+        guest is communicating page encryption status to the host using the
+        ``KVM_HC_MAP_GPA_RANGE`` hypercall, it can set bit 0 in this MSR to
+        allow live migration of the guest.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55efbacfc244..5b9bc8b3db20 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1067,6 +1067,8 @@ struct kvm_arch {
 	u32 user_space_msr_mask;
 	struct kvm_x86_msr_filter __rcu *msr_filter;
 
+	u32 hypercall_exit_enabled;
+
 	/* Guest can access the SGX PROVISIONKEY. */
 	bool sgx_provisioning_allowed;
 
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 950afebfba88..5146bbab84d4 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -33,6 +33,8 @@
 #define KVM_FEATURE_PV_SCHED_YIELD	13
 #define KVM_FEATURE_ASYNC_PF_INT	14
 #define KVM_FEATURE_MSI_EXT_DEST_ID	15
+#define KVM_FEATURE_HC_MAP_GPA_RANGE	16
+#define KVM_FEATURE_MIGRATION_CONTROL	17
 
 #define KVM_HINTS_REALTIME      0
 
@@ -54,6 +56,7 @@
 #define MSR_KVM_POLL_CONTROL	0x4b564d05
 #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
 #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
+#define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
 
 struct kvm_steal_time {
 	__u64 steal;
@@ -90,6 +93,16 @@ struct kvm_clock_pairing {
 /* MSR_KVM_ASYNC_PF_INT */
 #define KVM_ASYNC_PF_VEC_MASK			GENMASK(7, 0)
 
+/* MSR_KVM_MIGRATION_CONTROL */
+#define KVM_MIGRATION_READY		(1 << 0)
+
+/* KVM_HC_MAP_GPA_RANGE */
+#define KVM_MAP_GPA_RANGE_PAGE_SZ_4K	0
+#define KVM_MAP_GPA_RANGE_PAGE_SZ_2M	(1 << 0)
+#define KVM_MAP_GPA_RANGE_PAGE_SZ_1G	(1 << 1)
+#define KVM_MAP_GPA_RANGE_ENC_STAT(n)	(n << 4)
+#define KVM_MAP_GPA_RANGE_ENCRYPTED	KVM_MAP_GPA_RANGE_ENC_STAT(1)
+#define KVM_MAP_GPA_RANGE_DECRYPTED	KVM_MAP_GPA_RANGE_ENC_STAT(0)
 
 /* Operations for KVM_HC_MMU_OP */
 #define KVM_MMU_OP_WRITE_PTE            1
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b6bca616929..6686d99b1d7b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -102,6 +102,8 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
 
 static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
 
+#define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
+
 #define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
                                     KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
 
@@ -3894,6 +3896,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM:
 		r = 1;
 		break;
+	case KVM_CAP_EXIT_HYPERCALL:
+		r = KVM_EXIT_HYPERCALL_VALID_MASK;
+		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
 		return KVM_GUESTDBG_VALID_MASK;
 #ifdef CONFIG_KVM_XEN
@@ -5499,6 +5504,14 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		if (kvm_x86_ops.vm_copy_enc_context_from)
 			r = kvm_x86_ops.vm_copy_enc_context_from(kvm, cap->args[0]);
 		return r;
+	case KVM_CAP_EXIT_HYPERCALL:
+		if (cap->args[0] & ~KVM_EXIT_HYPERCALL_VALID_MASK) {
+			r = -EINVAL;
+			break;
+		}
+		kvm->arch.hypercall_exit_enabled = cap->args[0];
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -8384,6 +8397,17 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
 	return;
 }
 
+static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
+{
+	u64 ret = vcpu->run->hypercall.ret;
+
+	if (!is_64_bit_mode(vcpu))
+		ret = (u32)ret;
+	kvm_rax_write(vcpu, ret);
+	++vcpu->stat.hypercalls;
+	return kvm_skip_emulated_instruction(vcpu);
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -8449,6 +8473,28 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		kvm_sched_yield(vcpu, a0);
 		ret = 0;
 		break;
+	case KVM_HC_MAP_GPA_RANGE: {
+		u64 gpa = a0, npages = a1, attrs = a2;
+
+		ret = -KVM_ENOSYS;
+		if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_MAP_GPA_RANGE)))
+			break;
+
+		if (!PAGE_ALIGNED(gpa) || !npages ||
+		    gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
+			ret = -KVM_EINVAL;
+			break;
+		}
+
+		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
+		vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
+		vcpu->run->hypercall.args[0]  = gpa;
+		vcpu->run->hypercall.args[1]  = npages;
+		vcpu->run->hypercall.args[2]  = attrs;
+		vcpu->run->hypercall.longmode = op_64_bit;
+		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
+		return 0;
+	}
 	default:
 		ret = -KVM_ENOSYS;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3fd9a7e9d90c..1fb4fd863324 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SGX_ATTRIBUTE 196
 #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
 #define KVM_CAP_PTP_KVM 198
+#define KVM_CAP_EXIT_HYPERCALL 199
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 8b86609849b9..960c7e93d1a9 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -29,6 +29,7 @@
 #define KVM_HC_CLOCK_PAIRING		9
 #define KVM_HC_SEND_IPI		10
 #define KVM_HC_SCHED_YIELD		11
+#define KVM_HC_MAP_GPA_RANGE		12
 
 /*
  * hypercalls use architecture specific
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 2/5] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL
  2021-06-08 18:05 [PATCH v3 0/5] Add Guest API & Guest Kernel support for SEV live migration Ashish Kalra
  2021-06-08 18:05 ` [PATCH v3 1/5] KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall Ashish Kalra
@ 2021-06-08 18:06 ` Ashish Kalra
  2021-08-19 20:45   ` Sean Christopherson
  2021-06-08 18:06 ` [PATCH v3 3/5] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Ashish Kalra @ 2021-06-08 18:06 UTC (permalink / raw)
  To: pbonzini
  Cc: seanjc, tglx, bp, mingo, hpa, joro, Thomas.Lendacky, x86, kvm,
	linux-kernel, srutherford, brijesh.singh, linux-efi

From: Ashish Kalra <ashish.kalra@amd.com>

The KVM hypercall framework relies on the alternatives framework to patch
VMCALL -> VMMCALL on AMD platforms. If a hypercall is made before
apply_alternatives() is called, it defaults to VMCALL. That approach works
fine for non-SEV guests: the VMCALL causes a #UD, and the hypervisor can
decode the faulting instruction and do the right thing. But when SEV is
active, guest memory is encrypted with the guest key, so the hypervisor
cannot decode the instruction bytes.

So invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to VMMCALL
and opt into VMCALL.
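
For context, the macro is expanded inside the inline hypercall wrappers,
so whatever the ALTERNATIVE default is becomes the instruction executed by
any caller that runs before apply_alternatives(). The existing
kvm_hypercall3() wrapper (unchanged by this patch) looks roughly like:

static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
				  unsigned long p2, unsigned long p3)
{
	long ret;

	/* Emits the ALTERNATIVE default until apply_alternatives() runs. */
	asm volatile(KVM_HYPERCALL
		     : "=a"(ret)
		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
		     : "memory");
	return ret;
}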

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/kvm_para.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 69299878b200..0267bebb0b0f 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -17,7 +17,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
 #endif /* CONFIG_KVM_GUEST */
 
 #define KVM_HYPERCALL \
-        ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)
+	ALTERNATIVE("vmmcall", "vmcall", X86_FEATURE_VMCALL)
 
 /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
  * instruction.  The hypervisor may replace it with something else but only the
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 3/5] mm: x86: Invoke hypercall when page encryption status is changed
  2021-06-08 18:05 [PATCH v3 0/5] Add Guest API & Guest Kernel support for SEV live migration Ashish Kalra
  2021-06-08 18:05 ` [PATCH v3 1/5] KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall Ashish Kalra
  2021-06-08 18:06 ` [PATCH v3 2/5] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL Ashish Kalra
@ 2021-06-08 18:06 ` Ashish Kalra
  2021-06-10 18:30   ` Borislav Petkov
  2021-06-08 18:06 ` [PATCH v3 4/5] EFI: Introduce the new AMD Memory Encryption GUID Ashish Kalra
  2021-06-08 18:07 ` [PATCH v3 5/5] x86/kvm: Add guest support for detecting and enabling SEV Live Migration feature Ashish Kalra
  4 siblings, 1 reply; 16+ messages in thread
From: Ashish Kalra @ 2021-06-08 18:06 UTC (permalink / raw)
  To: pbonzini
  Cc: seanjc, tglx, bp, mingo, hpa, joro, Thomas.Lendacky, x86, kvm,
	linux-kernel, srutherford, brijesh.singh, linux-efi

From: Brijesh Singh <brijesh.singh@amd.com>

Invoke a hypercall when a memory region is changed from encrypted to
decrypted and vice versa. The hypervisor needs to know the page encryption
status during guest migration.
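
As an example of where this fires (a sketch, not part of the patch): a
driver that shares a buffer with the host via set_memory_decrypted() now
also notifies the hypervisor, because __set_memory_enc_dec() ends up
calling the new notify_range_enc_status_changed():

	/* Sketch: share one page with the host (error handling omitted). */
	struct page *page = alloc_page(GFP_KERNEL);
	unsigned long vaddr = (unsigned long)page_address(page);

	/*
	 * Clears _PAGE_ENC and, with this patch, also invokes the
	 * notify_page_enc_status_changed() paravirt hook for the range.
	 */
	set_memory_decrypted(vaddr, 1);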

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/paravirt.h       |  6 +++
 arch/x86/include/asm/paravirt_types.h |  1 +
 arch/x86/include/asm/set_memory.h     |  1 +
 arch/x86/kernel/paravirt.c            |  1 +
 arch/x86/mm/mem_encrypt.c             | 69 +++++++++++++++++++++++----
 arch/x86/mm/pat/set_memory.c          |  7 +++
 6 files changed, 76 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index da3a1ac82be5..540bf8cb37db 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -97,6 +97,12 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
 	PVOP_VCALL1(mmu.exit_mmap, mm);
 }
 
+static inline void notify_page_enc_status_changed(unsigned long pfn,
+						  int npages, bool enc)
+{
+	PVOP_VCALL3(mmu.notify_page_enc_status_changed, pfn, npages, enc);
+}
+
 #ifdef CONFIG_PARAVIRT_XXL
 static inline void load_sp0(unsigned long sp0)
 {
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index d9d6b0203ec4..664199820239 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -168,6 +168,7 @@ struct pv_mmu_ops {
 
 	/* Hook for intercepting the destruction of an mm_struct. */
 	void (*exit_mmap)(struct mm_struct *mm);
+	void (*notify_page_enc_status_changed)(unsigned long pfn, int npages, bool enc);
 
 #ifdef CONFIG_PARAVIRT_XXL
 	struct paravirt_callee_save read_cr2;
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..872617542bbc 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -83,6 +83,7 @@ int set_pages_rw(struct page *page, int numpages);
 int set_direct_map_invalid_noflush(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
 bool kernel_page_present(struct page *page);
+void notify_range_enc_status_changed(unsigned long vaddr, int npages, bool enc);
 
 extern int kernel_set_to_readonly;
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 04cafc057bed..1cc20ac9a54f 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -296,6 +296,7 @@ struct paravirt_patch_template pv_ops = {
 			(void (*)(struct mmu_gather *, void *))tlb_remove_page,
 
 	.mmu.exit_mmap		= paravirt_nop,
+	.mmu.notify_page_enc_status_changed	= paravirt_nop,
 
 #ifdef CONFIG_PARAVIRT_XXL
 	.mmu.read_cr2		= __PV_IS_CALLEE_SAVE(native_read_cr2),
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index ff08dc463634..6b12620376a4 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -228,29 +228,77 @@ void __init sev_setup_arch(void)
 	swiotlb_adjust_size(size);
 }
 
-static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
+static unsigned long pg_level_to_pfn(int level, pte_t *kpte, pgprot_t *ret_prot)
 {
-	pgprot_t old_prot, new_prot;
-	unsigned long pfn, pa, size;
-	pte_t new_pte;
+	unsigned long pfn = 0;
+	pgprot_t prot;
 
 	switch (level) {
 	case PG_LEVEL_4K:
 		pfn = pte_pfn(*kpte);
-		old_prot = pte_pgprot(*kpte);
+		prot = pte_pgprot(*kpte);
 		break;
 	case PG_LEVEL_2M:
 		pfn = pmd_pfn(*(pmd_t *)kpte);
-		old_prot = pmd_pgprot(*(pmd_t *)kpte);
+		prot = pmd_pgprot(*(pmd_t *)kpte);
 		break;
 	case PG_LEVEL_1G:
 		pfn = pud_pfn(*(pud_t *)kpte);
-		old_prot = pud_pgprot(*(pud_t *)kpte);
+		prot = pud_pgprot(*(pud_t *)kpte);
 		break;
 	default:
-		return;
+		WARN_ONCE(1, "Invalid level for kpte\n");
+		return 0;
 	}
 
+	if (ret_prot)
+		*ret_prot = prot;
+
+	return pfn;
+}
+
+void notify_range_enc_status_changed(unsigned long vaddr, int npages,
+				    bool enc)
+{
+#ifdef CONFIG_PARAVIRT
+	unsigned long sz = npages << PAGE_SHIFT;
+	unsigned long vaddr_end = vaddr + sz;
+
+	while (vaddr < vaddr_end) {
+		int psize, pmask, level;
+		unsigned long pfn;
+		pte_t *kpte;
+
+		kpte = lookup_address(vaddr, &level);
+		if (!kpte || pte_none(*kpte)) {
+			WARN_ONCE(1, "kpte lookup for vaddr\n");
+			return;
+		}
+
+		pfn = pg_level_to_pfn(level, kpte, NULL);
+		if (!pfn)
+			continue;
+
+		psize = page_level_size(level);
+		pmask = page_level_mask(level);
+
+		notify_page_enc_status_changed(pfn, psize >> PAGE_SHIFT, enc);
+
+		vaddr = (vaddr & pmask) + psize;
+	}
+#endif
+}
+
+static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
+{
+	pgprot_t old_prot, new_prot;
+	unsigned long pfn, pa, size;
+	pte_t new_pte;
+
+	pfn = pg_level_to_pfn(level, kpte, &old_prot);
+	if (!pfn)
+		return;
+
 	new_prot = old_prot;
 	if (enc)
 		pgprot_val(new_prot) |= _PAGE_ENC;
@@ -285,12 +333,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
 static int __init early_set_memory_enc_dec(unsigned long vaddr,
 					   unsigned long size, bool enc)
 {
-	unsigned long vaddr_end, vaddr_next;
+	unsigned long vaddr_end, vaddr_next, start;
 	unsigned long psize, pmask;
 	int split_page_size_mask;
 	int level, ret;
 	pte_t *kpte;
 
+	start = vaddr;
 	vaddr_next = vaddr;
 	vaddr_end = vaddr + size;
 
@@ -345,6 +394,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
 
 	ret = 0;
 
+	notify_range_enc_status_changed(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
+					enc);
 out:
 	__flush_tlb_all();
 	return ret;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 156cd235659f..9729cb0d99e3 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2020,6 +2020,13 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 	 */
 	cpa_flush(&cpa, 0);
 
+	/*
+	 * Notify hypervisor that a given memory range is mapped encrypted
+	 * or decrypted. The hypervisor will use this information during the
+	 * VM migration.
+	 */
+	notify_range_enc_status_changed(addr, numpages, enc);
+
 	return ret;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 4/5] EFI: Introduce the new AMD Memory Encryption GUID.
  2021-06-08 18:05 [PATCH v3 0/5] Add Guest API & Guest Kernel support for SEV live migration Ashish Kalra
                   ` (2 preceding siblings ...)
  2021-06-08 18:06 ` [PATCH v3 3/5] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
@ 2021-06-08 18:06 ` Ashish Kalra
  2021-06-10 15:01   ` Ard Biesheuvel
  2021-06-08 18:07 ` [PATCH v3 5/5] x86/kvm: Add guest support for detecting and enabling SEV Live Migration feature Ashish Kalra
  4 siblings, 1 reply; 16+ messages in thread
From: Ashish Kalra @ 2021-06-08 18:06 UTC (permalink / raw)
  To: pbonzini
  Cc: seanjc, tglx, bp, mingo, hpa, joro, Thomas.Lendacky, x86, kvm,
	linux-kernel, srutherford, brijesh.singh, linux-efi

From: Ashish Kalra <ashish.kalra@amd.com>

Introduce a new AMD Memory Encryption GUID, currently used to define a
new UEFI environment variable that indicates UEFI/OVMF support for the
SEV live migration feature. The variable is set up when UEFI/OVMF detects
host/hypervisor support for SEV live migration, and is later read by the
kernel via EFI runtime services to verify that OVMF supports the live
migration feature.
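
For reference, the kernel-side consumer added in patch 5 of this series
reads the variable roughly as follows (condensed sketch, error handling
omitted):

	bool enabled = false;
	unsigned long size = sizeof(enabled);
	efi_char16_t name[] = L"SevLiveMigrationEnabled";
	efi_guid_t guid = AMD_SEV_MEM_ENCRYPT_GUID;
	efi_status_t status;

	status = efi.get_variable(name, &guid, NULL, &size, &enabled);
	if (status == EFI_SUCCESS && enabled)
		wrmsrl(MSR_KVM_MIGRATION_CONTROL, KVM_MIGRATION_READY);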

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 include/linux/efi.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/efi.h b/include/linux/efi.h
index 6b5d36babfcc..dbd39b20e034 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -362,6 +362,7 @@ void efi_native_runtime_setup(void);
 
 /* OEM GUIDs */
 #define DELLEMC_EFI_RCI2_TABLE_GUID		EFI_GUID(0x2d9f28a2, 0xa886, 0x456a,  0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55)
+#define AMD_SEV_MEM_ENCRYPT_GUID		EFI_GUID(0x0cf29b71, 0x9e51, 0x433a,  0xa3, 0xb7, 0x81, 0xf3, 0xab, 0x16, 0xb8, 0x75)
 
 typedef struct {
 	efi_guid_t guid;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 5/5] x86/kvm: Add guest support for detecting and enabling SEV Live Migration feature.
  2021-06-08 18:05 [PATCH v3 0/5] Add Guest API & Guest Kernel support for SEV live migration Ashish Kalra
                   ` (3 preceding siblings ...)
  2021-06-08 18:06 ` [PATCH v3 4/5] EFI: Introduce the new AMD Memory Encryption GUID Ashish Kalra
@ 2021-06-08 18:07 ` Ashish Kalra
  2021-06-10 18:32   ` Borislav Petkov
  4 siblings, 1 reply; 16+ messages in thread
From: Ashish Kalra @ 2021-06-08 18:07 UTC (permalink / raw)
  To: pbonzini
  Cc: seanjc, tglx, bp, mingo, hpa, joro, Thomas.Lendacky, x86, kvm,
	linux-kernel, srutherford, brijesh.singh, linux-efi

From: Ashish Kalra <ashish.kalra@amd.com>

The guest support for detecting and enabling the SEV live migration
feature uses the following logic:

 - kvm_init_platform() checks whether the kernel was booted under EFI

   - If not EFI,

     i) if kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL), issue a
        wrmsrl() to enable SEV live migration support

   - If EFI,

     i) if kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL), read
        the UEFI variable which indicates OVMF support for live migration

     ii) if the variable indicates that live migration is supported, issue
         a wrmsrl() to enable SEV live migration support

The EFI live migration check is done using a late_initcall() callback.

Also, ensure that the _bss_decrypted section is marked as decrypted in
the shared pages list.

Also add kexec support for SEV live migration:

Reset the part of the host's shared pages list that reflects
kernel-specific page encryption status before loading a new kernel via
kexec. The complete shared pages list cannot be reset here, because the
UEFI/OVMF firmware-specific settings must be retained.

The host's shared pages list is maintained to keep track of all
unencrypted guest memory regions; therefore all kernel shared pages must
be explicitly marked as encrypted again before rebooting into the new
guest kernel.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/mem_encrypt.h |   4 ++
 arch/x86/kernel/kvm.c              | 107 +++++++++++++++++++++++++++++
 arch/x86/mm/mem_encrypt.c          |   6 ++
 3 files changed, 117 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 9c80c68d75b5..8dd373cc8b66 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -43,6 +43,8 @@ void __init sme_enable(struct boot_params *bp);
 
 int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size);
 int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
+void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
+					    bool enc);
 
 void __init mem_encrypt_free_decrypted_mem(void);
 
@@ -83,6 +85,8 @@ static inline int __init
 early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
 static inline int __init
 early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0; }
+static inline void __init
+early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages, bool enc) {}
 
 static inline void mem_encrypt_free_decrypted_mem(void) { }
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a26643dc6bd6..80a81de4c470 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -27,6 +27,7 @@
 #include <linux/nmi.h>
 #include <linux/swait.h>
 #include <linux/syscore_ops.h>
+#include <linux/efi.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -40,6 +41,7 @@
 #include <asm/ptrace.h>
 #include <asm/reboot.h>
 #include <asm/svm.h>
+#include <asm/e820/api.h>
 
 DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
 
@@ -433,6 +435,8 @@ static void kvm_guest_cpu_offline(bool shutdown)
 	kvm_disable_steal_time();
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
 		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
+	if (kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL))
+		wrmsrl(MSR_KVM_MIGRATION_CONTROL, 0);
 	kvm_pv_disable_apf();
 	if (!shutdown)
 		apf_task_wake_all();
@@ -547,6 +551,55 @@ static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
 	__send_ipi_mask(local_mask, vector);
 }
 
+static int __init setup_efi_kvm_sev_migration(void)
+{
+	efi_char16_t efi_sev_live_migration_enabled[] = L"SevLiveMigrationEnabled";
+	efi_guid_t efi_variable_guid = AMD_SEV_MEM_ENCRYPT_GUID;
+	efi_status_t status;
+	unsigned long size;
+	bool enabled;
+
+	if (!sev_active() ||
+	    !kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL))
+		return 0;
+
+	if (!efi_enabled(EFI_BOOT))
+		return 0;
+
+	if (!efi_enabled(EFI_RUNTIME_SERVICES)) {
+		pr_info("%s : EFI runtime services are not enabled\n", __func__);
+		return 0;
+	}
+
+	size = sizeof(enabled);
+
+	/* Get variable contents into buffer */
+	status = efi.get_variable(efi_sev_live_migration_enabled,
+				  &efi_variable_guid, NULL, &size, &enabled);
+
+	if (status == EFI_NOT_FOUND) {
+		pr_info("%s : EFI live migration variable not found\n", __func__);
+		return 0;
+	}
+
+	if (status != EFI_SUCCESS) {
+		pr_info("%s : EFI variable retrieval failed\n", __func__);
+		return 0;
+	}
+
+	if (enabled == 0) {
+		pr_info("%s: live migration disabled in EFI\n", __func__);
+		return 0;
+	}
+
+	pr_info("%s : live migration enabled in EFI\n", __func__);
+	wrmsrl(MSR_KVM_MIGRATION_CONTROL, KVM_MIGRATION_READY);
+
+	return true;
+}
+
+late_initcall(setup_efi_kvm_sev_migration);
+
 /*
  * Set the IPI entry points
  */
@@ -805,8 +858,62 @@ static bool __init kvm_msi_ext_dest_id(void)
 	return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID);
 }
 
+static void kvm_sev_hc_page_enc_status(unsigned long pfn, int npages, bool enc)
+{
+	kvm_hypercall3(KVM_HC_MAP_GPA_RANGE, pfn << PAGE_SHIFT, npages,
+		       KVM_MAP_GPA_RANGE_ENC_STAT(enc) | KVM_MAP_GPA_RANGE_PAGE_SZ_4K);
+}
+
 static void __init kvm_init_platform(void)
 {
+	if (sev_active() &&
+	    kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL)) {
+		unsigned long nr_pages;
+		int i;
+
+		pv_ops.mmu.notify_page_enc_status_changed =
+			kvm_sev_hc_page_enc_status;
+
+		/*
+		 * Reset the host's shared pages list related to kernel
+		 * specific page encryption status settings before we load a
+		 * new kernel by kexec. Reset the page encryption status
+		 * during early boot instead of just before kexec to avoid SMP
+		 * races during kvm_pv_guest_cpu_reboot().
+		 * NOTE: We cannot reset the complete shared pages list
+		 * here as we need to retain the UEFI/OVMF firmware
+		 * specific settings.
+		 */
+
+		for (i = 0; i < e820_table->nr_entries; i++) {
+			struct e820_entry *entry = &e820_table->entries[i];
+
+			if (entry->type != E820_TYPE_RAM)
+				continue;
+
+			nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
+
+			kvm_hypercall3(KVM_HC_MAP_GPA_RANGE, entry->addr,
+				       nr_pages,
+				       KVM_MAP_GPA_RANGE_ENCRYPTED | KVM_MAP_GPA_RANGE_PAGE_SZ_4K);
+		}
+
+		/*
+		 * Ensure that _bss_decrypted section is marked as decrypted in the
+		 * shared pages list.
+		 */
+		nr_pages = DIV_ROUND_UP(__end_bss_decrypted - __start_bss_decrypted,
+					PAGE_SIZE);
+		early_set_mem_enc_dec_hypercall((unsigned long)__start_bss_decrypted,
+						nr_pages, 0);
+
+		/*
+		 * If not booted using EFI, enable Live migration support.
+		 */
+		if (!efi_enabled(EFI_BOOT))
+			wrmsrl(MSR_KVM_MIGRATION_CONTROL,
+			       KVM_MIGRATION_READY);
+	}
 	kvmclock_init();
 	x86_platform.apic_post_init = kvm_apic_init;
 }
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 6b12620376a4..3d6a906d125c 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -411,6 +411,12 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size)
 	return early_set_memory_enc_dec(vaddr, size, true);
 }
 
+void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
+					bool enc)
+{
+	notify_range_enc_status_changed(vaddr, npages, enc);
+}
+
 /*
  * SME and SEV are very similar but they are not the same, so there are
  * times that the kernel will need to distinguish between SME and SEV. The
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 4/5] EFI: Introduce the new AMD Memory Encryption GUID.
  2021-06-08 18:06 ` [PATCH v3 4/5] EFI: Introduce the new AMD Memory Encryption GUID Ashish Kalra
@ 2021-06-10 15:01   ` Ard Biesheuvel
  0 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2021-06-10 15:01 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Joerg Roedel,
	Tom Lendacky, X86 ML, kvm, Linux Kernel Mailing List,
	Steve Rutherford, Brijesh Singh, linux-efi

On Tue, 8 Jun 2021 at 20:07, Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> Introduce a new AMD Memory Encryption GUID which is currently
> used for defining a new UEFI environment variable which indicates
> UEFI/OVMF support for the SEV live migration feature. This variable
> is setup when UEFI/OVMF detects host/hypervisor support for SEV
> live migration and later this variable is read by the kernel using
> EFI runtime services to verify if OVMF supports the live migration
> feature.
>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>

Acked-by: Ard Biesheuvel <ardb@kernel.org>

> ---
>  include/linux/efi.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index 6b5d36babfcc..dbd39b20e034 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -362,6 +362,7 @@ void efi_native_runtime_setup(void);
>
>  /* OEM GUIDs */
>  #define DELLEMC_EFI_RCI2_TABLE_GUID            EFI_GUID(0x2d9f28a2, 0xa886, 0x456a,  0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55)
> +#define AMD_SEV_MEM_ENCRYPT_GUID               EFI_GUID(0x0cf29b71, 0x9e51, 0x433a,  0xa3, 0xb7, 0x81, 0xf3, 0xab, 0x16, 0xb8, 0x75)
>
>  typedef struct {
>         efi_guid_t guid;
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 1/5] KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall
  2021-06-08 18:05 ` [PATCH v3 1/5] KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall Ashish Kalra
@ 2021-06-10 16:58   ` Paolo Bonzini
  0 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2021-06-10 16:58 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: seanjc, tglx, bp, mingo, hpa, joro, Thomas.Lendacky, x86, kvm,
	linux-kernel, srutherford, brijesh.singh, linux-efi

On 08/06/21 20:05, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> This hypercall is used by the SEV guest to notify a change in the page
> encryption status to the hypervisor. The hypercall should be invoked
> only when the encryption attribute is changed from encrypted -> decrypted
> and vice versa. By default all guest pages are considered encrypted.
> 
> The hypercall exits to userspace to manage the guest shared regions and
> integrate with the userspace VMM's migration code.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Steve Rutherford <srutherford@google.com>
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   Documentation/virt/kvm/api.rst        | 19 +++++++++++
>   Documentation/virt/kvm/cpuid.rst      |  7 ++++
>   Documentation/virt/kvm/hypercalls.rst | 21 ++++++++++++
>   Documentation/virt/kvm/msr.rst        | 13 ++++++++
>   arch/x86/include/asm/kvm_host.h       |  2 ++
>   arch/x86/include/uapi/asm/kvm_para.h  | 13 ++++++++
>   arch/x86/kvm/x86.c                    | 46 +++++++++++++++++++++++++++
>   include/uapi/linux/kvm.h              |  1 +
>   include/uapi/linux/kvm_para.h         |  1 +
>   9 files changed, 123 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 7fcb2fd38f42..6396ce8bfa44 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6891,3 +6891,22 @@ This capability is always enabled.
>   This capability indicates that the KVM virtual PTP service is
>   supported in the host. A VMM can check whether the service is
>   available to the guest on migration.
> +
> +8.33 KVM_CAP_EXIT_HYPERCALL
> +---------------------------
> +
> +:Capability: KVM_CAP_EXIT_HYPERCALL
> +:Architectures: x86
> +:Type: vm
> +
> +This capability, if enabled, will cause KVM to exit to userspace
> +with KVM_EXIT_HYPERCALL exit reason to process some hypercalls.
> +
> +Calling KVM_CHECK_EXTENSION for this capability will return a bitmask
> +of hypercalls that can be configured to exit to userspace.
> +Right now, the only such hypercall is KVM_HC_MAP_GPA_RANGE.
> +
> +The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
> +of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
> +the hypercalls whose corresponding bit is in the argument, and return
> +ENOSYS for the others.
> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index cf62162d4be2..bda3e3e737d7 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -96,6 +96,13 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
>                                                  before using extended destination
>                                                  ID bits in MSI address bits 11-5.
>   
> +KVM_FEATURE_HC_MAP_GPA_RANGE       16          guest checks this feature bit before
> +                                               using the map gpa range hypercall
> +                                               to notify the page state change
> +
> +KVM_FEATURE_MIGRATION_CONTROL      17          guest checks this feature bit before
> +                                               using MSR_KVM_MIGRATION_CONTROL
> +
>   KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
>                                                  per-cpu warps are expected in
>                                                  kvmclock
> diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
> index ed4fddd364ea..e56fa8b9cfca 100644
> --- a/Documentation/virt/kvm/hypercalls.rst
> +++ b/Documentation/virt/kvm/hypercalls.rst
> @@ -169,3 +169,24 @@ a0: destination APIC ID
>   
>   :Usage example: When sending a call-function IPI-many to vCPUs, yield if
>   	        any of the IPI target vCPUs was preempted.
> +
> +8. KVM_HC_MAP_GPA_RANGE
> +-------------------------
> +:Architecture: x86
> +:Status: active
> +:Purpose: Request KVM to map a GPA range with the specified attributes.
> +
> +a0: the guest physical address of the start page
> +a1: the number of (4kb) pages (must be contiguous in GPA space)
> +a2: attributes
> +
> +    Where 'attributes' :
> +        * bits  3:0 - preferred page size encoding 0 = 4kb, 1 = 2mb, 2 = 1gb, etc...
> +        * bit     4 - plaintext = 0, encrypted = 1
> +        * bits 63:5 - reserved (must be zero)
> +
> +**Implementation note**: this hypercall is implemented in userspace via
> +the KVM_CAP_EXIT_HYPERCALL capability. Userspace must enable that capability
> +before advertising KVM_FEATURE_HC_MAP_GPA_RANGE in the guest CPUID.  In
> +addition, if the guest supports KVM_FEATURE_MIGRATION_CONTROL, userspace
> +must also set up an MSR filter to process writes to MSR_KVM_MIGRATION_CONTROL.
> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> index e37a14c323d2..9315fc385fb0 100644
> --- a/Documentation/virt/kvm/msr.rst
> +++ b/Documentation/virt/kvm/msr.rst
> @@ -376,3 +376,16 @@ data:
>   	write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
>   	and check if there are more notifications pending. The MSR is available
>   	if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> +
> +MSR_KVM_MIGRATION_CONTROL:
> +        0x4b564d08
> +
> +data:
> +        This MSR is available if KVM_FEATURE_MIGRATION_CONTROL is present in
> +        CPUID.  Bit 0 represents whether live migration of the guest is allowed.
> +
> +        When a guest is started, bit 0 will be 0 if the guest has encrypted
> +        memory and 1 if the guest does not have encrypted memory.  If the
> +        guest is communicating page encryption status to the host using the
> +        ``KVM_HC_MAP_GPA_RANGE`` hypercall, it can set bit 0 in this MSR to
> +        allow live migration of the guest.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 55efbacfc244..5b9bc8b3db20 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1067,6 +1067,8 @@ struct kvm_arch {
>   	u32 user_space_msr_mask;
>   	struct kvm_x86_msr_filter __rcu *msr_filter;
>   
> +	u32 hypercall_exit_enabled;
> +
>   	/* Guest can access the SGX PROVISIONKEY. */
>   	bool sgx_provisioning_allowed;
>   
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 950afebfba88..5146bbab84d4 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -33,6 +33,8 @@
>   #define KVM_FEATURE_PV_SCHED_YIELD	13
>   #define KVM_FEATURE_ASYNC_PF_INT	14
>   #define KVM_FEATURE_MSI_EXT_DEST_ID	15
> +#define KVM_FEATURE_HC_MAP_GPA_RANGE	16
> +#define KVM_FEATURE_MIGRATION_CONTROL	17
>   
>   #define KVM_HINTS_REALTIME      0
>   
> @@ -54,6 +56,7 @@
>   #define MSR_KVM_POLL_CONTROL	0x4b564d05
>   #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
>   #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
> +#define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
>   
>   struct kvm_steal_time {
>   	__u64 steal;
> @@ -90,6 +93,16 @@ struct kvm_clock_pairing {
>   /* MSR_KVM_ASYNC_PF_INT */
>   #define KVM_ASYNC_PF_VEC_MASK			GENMASK(7, 0)
>   
> +/* MSR_KVM_MIGRATION_CONTROL */
> +#define KVM_MIGRATION_READY		(1 << 0)
> +
> +/* KVM_HC_MAP_GPA_RANGE */
> +#define KVM_MAP_GPA_RANGE_PAGE_SZ_4K	0
> +#define KVM_MAP_GPA_RANGE_PAGE_SZ_2M	(1 << 0)
> +#define KVM_MAP_GPA_RANGE_PAGE_SZ_1G	(1 << 1)
> +#define KVM_MAP_GPA_RANGE_ENC_STAT(n)	(n << 4)
> +#define KVM_MAP_GPA_RANGE_ENCRYPTED	KVM_MAP_GPA_RANGE_ENC_STAT(1)
> +#define KVM_MAP_GPA_RANGE_DECRYPTED	KVM_MAP_GPA_RANGE_ENC_STAT(0)
>   
>   /* Operations for KVM_HC_MMU_OP */
>   #define KVM_MMU_OP_WRITE_PTE            1
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9b6bca616929..6686d99b1d7b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -102,6 +102,8 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
>   
>   static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
>   
> +#define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
> +
>   #define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
>                                       KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
>   
> @@ -3894,6 +3896,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   	case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM:
>   		r = 1;
>   		break;
> +	case KVM_CAP_EXIT_HYPERCALL:
> +		r = KVM_EXIT_HYPERCALL_VALID_MASK;
> +		break;
>   	case KVM_CAP_SET_GUEST_DEBUG2:
>   		return KVM_GUESTDBG_VALID_MASK;
>   #ifdef CONFIG_KVM_XEN
> @@ -5499,6 +5504,14 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   		if (kvm_x86_ops.vm_copy_enc_context_from)
>   			r = kvm_x86_ops.vm_copy_enc_context_from(kvm, cap->args[0]);
>   		return r;
> +	case KVM_CAP_EXIT_HYPERCALL:
> +		if (cap->args[0] & ~KVM_EXIT_HYPERCALL_VALID_MASK) {
> +			r = -EINVAL;
> +			break;
> +		}
> +		kvm->arch.hypercall_exit_enabled = cap->args[0];
> +		r = 0;
> +		break;
>   	default:
>   		r = -EINVAL;
>   		break;
> @@ -8384,6 +8397,17 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
>   	return;
>   }
>   
> +static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
> +{
> +	u64 ret = vcpu->run->hypercall.ret;
> +
> +	if (!is_64_bit_mode(vcpu))
> +		ret = (u32)ret;
> +	kvm_rax_write(vcpu, ret);
> +	++vcpu->stat.hypercalls;
> +	return kvm_skip_emulated_instruction(vcpu);
> +}
> +
>   int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>   {
>   	unsigned long nr, a0, a1, a2, a3, ret;
> @@ -8449,6 +8473,28 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>   		kvm_sched_yield(vcpu, a0);
>   		ret = 0;
>   		break;
> +	case KVM_HC_MAP_GPA_RANGE: {
> +		u64 gpa = a0, npages = a1, attrs = a2;
> +
> +		ret = -KVM_ENOSYS;
> +		if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_MAP_GPA_RANGE)))
> +			break;
> +
> +		if (!PAGE_ALIGNED(gpa) || !npages ||
> +		    gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
> +			ret = -KVM_EINVAL;
> +			break;
> +		}
> +
> +		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
> +		vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
> +		vcpu->run->hypercall.args[0]  = gpa;
> +		vcpu->run->hypercall.args[1]  = npages;
> +		vcpu->run->hypercall.args[2]  = attrs;
> +		vcpu->run->hypercall.longmode = op_64_bit;
> +		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
> +		return 0;
> +	}
>   	default:
>   		ret = -KVM_ENOSYS;
>   		break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 3fd9a7e9d90c..1fb4fd863324 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
>   #define KVM_CAP_SGX_ATTRIBUTE 196
>   #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
>   #define KVM_CAP_PTP_KVM 198
> +#define KVM_CAP_EXIT_HYPERCALL 199
>   
>   #ifdef KVM_CAP_IRQ_ROUTING
>   
> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> index 8b86609849b9..960c7e93d1a9 100644
> --- a/include/uapi/linux/kvm_para.h
> +++ b/include/uapi/linux/kvm_para.h
> @@ -29,6 +29,7 @@
>   #define KVM_HC_CLOCK_PAIRING		9
>   #define KVM_HC_SEND_IPI		10
>   #define KVM_HC_SCHED_YIELD		11
> +#define KVM_HC_MAP_GPA_RANGE		12
>   
>   /*
>    * hypercalls use architecture specific
> 

Queued this one for 5.14, thanks!

Paolo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 3/5] mm: x86: Invoke hypercall when page encryption status is changed
  2021-06-08 18:06 ` [PATCH v3 3/5] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
@ 2021-06-10 18:30   ` Borislav Petkov
  2021-06-30  3:10     ` Ashish Kalra
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2021-06-10 18:30 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, seanjc, tglx, mingo, hpa, joro, Thomas.Lendacky, x86,
	kvm, linux-kernel, srutherford, brijesh.singh, linux-efi

On Tue, Jun 08, 2021 at 06:06:26PM +0000, Ashish Kalra wrote:
> +void notify_range_enc_status_changed(unsigned long vaddr, int npages,
> +				    bool enc)

You don't need to break this line.

> @@ -285,12 +333,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
>  static int __init early_set_memory_enc_dec(unsigned long vaddr,
>  					   unsigned long size, bool enc)
>  {
> -	unsigned long vaddr_end, vaddr_next;
> +	unsigned long vaddr_end, vaddr_next, start;
>  	unsigned long psize, pmask;
>  	int split_page_size_mask;
>  	int level, ret;
>  	pte_t *kpte;
>  
> +	start = vaddr;
>  	vaddr_next = vaddr;
>  	vaddr_end = vaddr + size;
>  
> @@ -345,6 +394,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
>  
>  	ret = 0;
>  
> +	notify_range_enc_status_changed(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
> +					enc);

Ditto.

>  out:
>  	__flush_tlb_all();
>  	return ret;
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 156cd235659f..9729cb0d99e3 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -2020,6 +2020,13 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>  	 */
>  	cpa_flush(&cpa, 0);
>  
> +	/*
> +	 * Notify hypervisor that a given memory range is mapped encrypted
> +	 * or decrypted. The hypervisor will use this information during the
> +	 * VM migration.
> +	 */

Simplify that comment:

        /*
         * Notify the hypervisor about the encryption status change of the memory
	 * range. It will use this information during the VM migration.
         */


With those nitpicks fixed:

Reviewed-by: Borislav Petkov <bp@suse.de>

Paolo, if you want me to take this, lemme know, but I think it'll
conflict with patch 5 so perhaps it all should go together through the
kvm tree...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 5/5] x86/kvm: Add guest support for detecting and enabling SEV Live Migration feature.
  2021-06-08 18:07 ` [PATCH v3 5/5] x86/kvm: Add guest support for detecting and enabling SEV Live Migration feature Ashish Kalra
@ 2021-06-10 18:32   ` Borislav Petkov
  0 siblings, 0 replies; 16+ messages in thread
From: Borislav Petkov @ 2021-06-10 18:32 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, seanjc, tglx, mingo, hpa, joro, Thomas.Lendacky, x86,
	kvm, linux-kernel, srutherford, brijesh.singh, linux-efi

On Tue, Jun 08, 2021 at 06:07:04PM +0000, Ashish Kalra wrote:
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 6b12620376a4..3d6a906d125c 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -411,6 +411,12 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size)
>  	return early_set_memory_enc_dec(vaddr, size, true);
>  }
>  
> +void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
> +					bool enc)

You don't have to break this line either.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 3/5] mm: x86: Invoke hypercall when page encryption status is changed
  2021-06-10 18:30   ` Borislav Petkov
@ 2021-06-30  3:10     ` Ashish Kalra
  0 siblings, 0 replies; 16+ messages in thread
From: Ashish Kalra @ 2021-06-30  3:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: pbonzini, seanjc, tglx, mingo, hpa, joro, Thomas.Lendacky, x86,
	kvm, linux-kernel, srutherford, brijesh.singh, linux-efi

Hello Boris, Paolo,

On Thu, Jun 10, 2021 at 08:30:52PM +0200, Borislav Petkov wrote:
> On Tue, Jun 08, 2021 at 06:06:26PM +0000, Ashish Kalra wrote:
> > +void notify_range_enc_status_changed(unsigned long vaddr, int npages,
> > +				    bool enc)
> 
> You don't need to break this line.
> 
> > @@ -285,12 +333,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
> >  static int __init early_set_memory_enc_dec(unsigned long vaddr,
> >  					   unsigned long size, bool enc)
> >  {
> > -	unsigned long vaddr_end, vaddr_next;
> > +	unsigned long vaddr_end, vaddr_next, start;
> >  	unsigned long psize, pmask;
> >  	int split_page_size_mask;
> >  	int level, ret;
> >  	pte_t *kpte;
> >  
> > +	start = vaddr;
> >  	vaddr_next = vaddr;
> >  	vaddr_end = vaddr + size;
> >  
> > @@ -345,6 +394,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
> >  
> >  	ret = 0;
> >  
> > +	notify_range_enc_status_changed(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
> > +					enc);
> 
> Ditto.
> 
> >  out:
> >  	__flush_tlb_all();
> >  	return ret;
> > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > index 156cd235659f..9729cb0d99e3 100644
> > --- a/arch/x86/mm/pat/set_memory.c
> > +++ b/arch/x86/mm/pat/set_memory.c
> > @@ -2020,6 +2020,13 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> >  	 */
> >  	cpa_flush(&cpa, 0);
> >  
> > +	/*
> > +	 * Notify hypervisor that a given memory range is mapped encrypted
> > +	 * or decrypted. The hypervisor will use this information during the
> > +	 * VM migration.
> > +	 */
> 
> Simplify that comment:
> 
>         /*
>          * Notify the hypervisor about the encryption status change of the memory
> 	 * range. It will use this information during the VM migration.
>          */
> 
> 
> With those nitpicks fixed:
> 
> Reviewed-by: Borislav Petkov <bp@suse.de>
> 
> Paolo, if you want me to take this, lemme know, but I think it'll
> conflict with patch 5 so perhaps it all should go together through the
> kvm tree...
> 

Will these patches be merged into 5.14?

I have posted another version (v5) for patch 5 after more review comments
from Boris, so please pull in all these patches together. 

Thanks,
Ashish

> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 2/5] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL
  2021-06-08 18:06 ` [PATCH v3 2/5] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL Ashish Kalra
@ 2021-08-19 20:45   ` Sean Christopherson
  2021-08-19 22:08     ` Kalra, Ashish
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2021-08-19 20:45 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, bp, mingo, hpa, joro, Thomas.Lendacky, x86, kvm,
	linux-kernel, srutherford, brijesh.singh, linux-efi

Preferred shortlog prefix for KVM guest changes is "x86/kvm".  "KVM: x86" is for
host changes.

On Tue, Jun 08, 2021, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> KVM hypercall framework relies on alternative framework to patch the
> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> apply_alternative() is called then it defaults to VMCALL. The approach
> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
> will be able to decode the instruction and do the right things. But
> when SEV is active, guest memory is encrypted with guest key and
> hypervisor will not be able to decode the instruction bytes.
> 
> So invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to VMMCALL
> and opt into VMCALL.

The changelog needs to explain why SEV hypercalls need to be made before
apply_alternative(), why it's ok to make Intel CPUs take #UDs on the unknown
VMMCALL, and why this is not creating the same conundrum for TDX.

Actually, I don't think making Intel CPUs take #UDs is acceptable.  This patch
breaks Linux on upstream KVM on Intel due to a bug in upstream KVM.  KVM attempts
to patch the "wrong" hypercall to the "right" hypercall, but stupidly does so
via an emulated write.  I.e. KVM honors the guest page table permissions and
injects a !WRITABLE #PF on the VMMCALL RIP if the kernel code is mapped RX.

In other words, trusting the VMM to not screw up the #UD is a bad idea.  This also
makes documenting the "why does SEV need super early hypercalls" extra important.

This patch doesn't work because X86_FEATURE_VMCALL is a synthetic flag and is
only set by VMware paravirt code, which is why the patching doesn't happen as
would be expected.  The obvious solution would be to manually set X86_FEATURE_VMCALL
where appropriate, but given that defaulting to VMCALL has worked for years,
defaulting to VMMCALL makes me nervous, e.g. even if we splatter X86_FEATURE_VMCALL
into Intel, Centaur, and Zhaoxin, there's a possibility we'll break existing VMs
that run on hypervisors that do something weird with the vendor string.

Rather than look for X86_FEATURE_VMCALL, I think it makes sense to have this be
a "pure" inversion, i.e. patch in VMCALL if VMMCALL is not supported, as opposed
to patching in VMCALL if VMCALL is supported.

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 69299878b200..61641e69cfda 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -17,7 +17,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
 #endif /* CONFIG_KVM_GUEST */

 #define KVM_HYPERCALL \
-        ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)
+        ALTERNATIVE("vmmcall", "vmcall", ALT_NOT(X86_FEATURE_VMMCALL))

 /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
  * instruction.  The hypervisor may replace it with something else but only the
 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org

Suggested-by: Sean Christopherson <seanjc@google.com>

> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>

Is Brijesh the author?  Co-developed-by for a one-line change would be odd...

> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  arch/x86/include/asm/kvm_para.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 69299878b200..0267bebb0b0f 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -17,7 +17,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
>  #endif /* CONFIG_KVM_GUEST */
>  
>  #define KVM_HYPERCALL \
> -        ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)
> +	ALTERNATIVE("vmmcall", "vmcall", X86_FEATURE_VMCALL)
>  
>  /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
>   * instruction.  The hypervisor may replace it with something else but only the
> -- 
> 2.17.1
> 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 2/5] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL
  2021-08-19 20:45   ` Sean Christopherson
@ 2021-08-19 22:08     ` Kalra, Ashish
  2021-08-19 23:02       ` Kalra, Ashish
  0 siblings, 1 reply; 16+ messages in thread
From: Kalra, Ashish @ 2021-08-19 22:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, tglx, bp, mingo, hpa, joro, Lendacky, Thomas, x86, kvm,
	linux-kernel, srutherford, Singh, Brijesh, linux-efi

Hello Sean,

> On Aug 20, 2021, at 2:15 AM, Sean Christopherson <seanjc@google.com> wrote:
> 
> Preferred shortlog prefix for KVM guest changes is "x86/kvm".  "KVM: x86" is for
> host changes.
> 
>> On Tue, Jun 08, 2021, Ashish Kalra wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>> 
>> KVM hypercall framework relies on alternative framework to patch the
>> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
>> apply_alternative() is called then it defaults to VMCALL. The approach
>> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
>> will be able to decode the instruction and do the right things. But
>> when SEV is active, guest memory is encrypted with guest key and
>> hypervisor will not be able to decode the instruction bytes.
>> 
>> So invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to VMMCALL
>> and opt into VMCALL.
> 
> The changelog needs to explain why SEV hypercalls need to be made before
> apply_alternative(), why it's ok to make Intel CPUs take #UDs on the unknown
> VMMCALL, and why this is not creating the same conundrum for TDX.

I think it makes more sense to stick to the original approach/patch, i.e., introduce a new private hypercall interface like kvm_sev_hypercall3() and let the early paravirtualized kernel code invoke this private hypercall interface wherever required.

This helps avoid Intel CPUs taking unnecessary #UDs and also avoids resorting to hacks like the one below.

TDX code can introduce a similar private hypercall interface for its early paravirtualized kernel code if required.

> 
> Actually, I don't think making Intel CPUs take #UDs is acceptable.  This patch
> breaks Linux on upstream KVM on Intel due a bug in upstream KVM.  KVM attempts
> to patch the "wrong" hypercall to the "right" hypercall, but stupidly does so
> via an emulated write.  I.e. KVM honors the guest page table permissions and
> injects a !WRITABLE #PF on the VMMCALL RIP if the kernel code is mapped RX.
> 
> In other words, trusting the VMM to not screw up the #UD is a bad idea.  This also
> makes documenting the "why does SEV need super early hypercalls" extra important.
> 

Makes sense.

Thanks,
Ashish

> This patch doesn't work because X86_FEATURE_VMCALL is a synthetic flag and is
> only set by VMware paravirt code, which is why the patching doesn't happen as
> would be expected.  The obvious solution would be to manually set X86_FEATURE_VMCALL
> where appropriate, but given that defaulting to VMCALL has worked for years,
> defaulting to VMMCALL makes me nervous, e.g. even if we splatter X86_FEATURE_VMCALL
> into Intel, Centaur, and Zhaoxin, there's a possibility we'll break existing VMs
> that run on hypervisors that do something weird with the vendor string.
> 
> Rather than look for X86_FEATURE_VMCALL, I think it makes sense to have this be
> a "pure" inversion, i.e. patch in VMCALL if VMMCALL is not supported, as opposed
> to patching in VMCALL if VMCALL is supproted.
> 
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 69299878b200..61641e69cfda 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -17,7 +17,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
> #endif /* CONFIG_KVM_GUEST */
> 
> #define KVM_HYPERCALL \
> -        ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)
> +        ALTERNATIVE("vmmcall", "vmcall", ALT_NOT(X86_FEATURE_VMMCALL))
> 
> /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
>  * instruction.  The hypervisor may replace it with something else but only the
> 
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Joerg Roedel <joro@8bytes.org>
>> Cc: Borislav Petkov <bp@suse.de>
>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>> Cc: x86@kernel.org
>> Cc: kvm@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> 
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> 
> Is Brijesh the author?  Co-developed-by for a one-line change would be odd...
> 
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> ---
>> arch/x86/include/asm/kvm_para.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
>> index 69299878b200..0267bebb0b0f 100644
>> --- a/arch/x86/include/asm/kvm_para.h
>> +++ b/arch/x86/include/asm/kvm_para.h
>> @@ -17,7 +17,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
>> #endif /* CONFIG_KVM_GUEST */
>> 
>> #define KVM_HYPERCALL \
>> -        ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)
>> +    ALTERNATIVE("vmmcall", "vmcall", X86_FEATURE_VMCALL)
>> 
>> /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
>>  * instruction.  The hypervisor may replace it with something else but only the
>> -- 
>> 2.17.1
>> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 2/5] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL
  2021-08-19 22:08     ` Kalra, Ashish
@ 2021-08-19 23:02       ` Kalra, Ashish
  2021-08-19 23:15         ` Sean Christopherson
  0 siblings, 1 reply; 16+ messages in thread
From: Kalra, Ashish @ 2021-08-19 23:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, tglx, bp, mingo, hpa, joro, Lendacky, Thomas, x86, kvm,
	linux-kernel, srutherford, Singh, Brijesh, linux-efi



> On Aug 20, 2021, at 3:38 AM, Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
> 
> Hello Sean,
> 
>> On Aug 20, 2021, at 2:15 AM, Sean Christopherson <seanjc@google.com> wrote:
>> 
>> Preferred shortlog prefix for KVM guest changes is "x86/kvm".  "KVM: x86" is for
>> host changes.
>> 
>>>> On Tue, Jun 08, 2021, Ashish Kalra wrote:
>>> From: Ashish Kalra <ashish.kalra@amd.com>
>>> 
>>> KVM hypercall framework relies on alternative framework to patch the
>>> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
>>> apply_alternative() is called then it defaults to VMCALL. The approach
>>> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
>>> will be able to decode the instruction and do the right things. But
>>> when SEV is active, guest memory is encrypted with guest key and
>>> hypervisor will not be able to decode the instruction bytes.
>>> 
>>> So invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to VMMCALL
>>> and opt into VMCALL.
>> 
>> The changelog needs to explain why SEV hypercalls need to be made before
>> apply_alternative(), why it's ok to make Intel CPUs take #UDs on the unknown
>> VMMCALL, and why this is not creating the same conundrum for TDX.
> 
> I think it makes more sense to stick to the original approach/patch, i.e., introducing a new private hypercall interface like kvm_sev_hypercall3() and let early paravirtualized kernel code invoke this private hypercall interface wherever required.
> 
> This helps avoiding Intel CPUs taking unnecessary #UDs and also avoid using hacks as below.
> 
> TDX code can introduce similar private hypercall interface for their early para virtualized kernel code if required.

Actually, if we use this kvm_sev_hypercall3() and do not modify KVM_HYPERCALL(), then Intel CPUs avoid unnecessary #UDs and TDX does not need any new interface. Only early AMD/SEV-specific code will use the kvm_sev_hypercall3() interface; TDX code will always work with KVM_HYPERCALL().
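
For reference, a minimal sketch of what such a private, pre-alternatives
interface could look like (illustrative only; it mirrors kvm_hypercall3()
but hardcodes VMMCALL, since SEV guests only ever run on AMD hardware):

static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
				      unsigned long p2, unsigned long p3)
{
	long ret;

	/* Always use VMMCALL; no alternatives patching needed this early. */
	asm volatile("vmmcall"
		     : "=a"(ret)
		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
		     : "memory");
	return ret;
}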

Thanks,
Ashish

> 
>> 
>> Actually, I don't think making Intel CPUs take #UDs is acceptable.  This patch
>> breaks Linux on upstream KVM on Intel due a bug in upstream KVM.  KVM attempts
>> to patch the "wrong" hypercall to the "right" hypercall, but stupidly does so
>> via an emulated write.  I.e. KVM honors the guest page table permissions and
>> injects a !WRITABLE #PF on the VMMCALL RIP if the kernel code is mapped RX.
>> 
>> In other words, trusting the VMM to not screw up the #UD is a bad idea.  This also
>> makes documenting the "why does SEV need super early hypercalls" extra important.
>> 
> 
> Makes sense.
> 
> Thanks,
> Ashish
> 
>> This patch doesn't work because X86_FEATURE_VMCALL is a synthetic flag and is
>> only set by VMware paravirt code, which is why the patching doesn't happen as
>> would be expected.  The obvious solution would be to manually set X86_FEATURE_VMCALL
>> where appropriate, but given that defaulting to VMCALL has worked for years,
>> defaulting to VMMCALL makes me nervous, e.g. even if we splatter X86_FEATURE_VMCALL
>> into Intel, Centaur, and Zhaoxin, there's a possibility we'll break existing VMs
>> that run on hypervisors that do something weird with the vendor string.
>> 
>> Rather than look for X86_FEATURE_VMCALL, I think it makes sense to have this be
>> a "pure" inversion, i.e. patch in VMCALL if VMMCALL is not supported, as opposed
>> to patching in VMCALL if VMCALL is supproted.
>> 
>> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
>> index 69299878b200..61641e69cfda 100644
>> --- a/arch/x86/include/asm/kvm_para.h
>> +++ b/arch/x86/include/asm/kvm_para.h
>> @@ -17,7 +17,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
>> #endif /* CONFIG_KVM_GUEST */
>> 
>> #define KVM_HYPERCALL \
>> -        ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)
>> +        ALTERNATIVE("vmmcall", "vmcall", ALT_NOT(X86_FEATURE_VMMCALL))
>> 
>> /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
>> * instruction.  The hypervisor may replace it with something else but only the
>> 
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Joerg Roedel <joro@8bytes.org>
>>> Cc: Borislav Petkov <bp@suse.de>
>>> Cc: Tom Lendacky <thomas.lendacky@amd.com>
>>> Cc: x86@kernel.org
>>> Cc: kvm@vger.kernel.org
>>> Cc: linux-kernel@vger.kernel.org
>> 
>> Suggested-by: Sean Christopherson <seanjc@google.com>
>> 
>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> 
>> Is Brijesh the author?  Co-developed-by for a one-line change would be odd...
>> 
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>>> ---
>>> arch/x86/include/asm/kvm_para.h | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
>>> index 69299878b200..0267bebb0b0f 100644
>>> --- a/arch/x86/include/asm/kvm_para.h
>>> +++ b/arch/x86/include/asm/kvm_para.h
>>> @@ -17,7 +17,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
>>> #endif /* CONFIG_KVM_GUEST */
>>> 
>>> #define KVM_HYPERCALL \
>>> -        ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)
>>> +    ALTERNATIVE("vmmcall", "vmcall", X86_FEATURE_VMCALL)
>>> 
>>> /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
>>> * instruction.  The hypervisor may replace it with something else but only the
>>> -- 
>>> 2.17.1
>>> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 2/5] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL
  2021-08-19 23:02       ` Kalra, Ashish
@ 2021-08-19 23:15         ` Sean Christopherson
  2021-08-20 13:32           ` Ashish Kalra
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2021-08-19 23:15 UTC (permalink / raw)
  To: Kalra, Ashish
  Cc: pbonzini, tglx, bp, mingo, hpa, joro, Lendacky, Thomas, x86, kvm,
	linux-kernel, srutherford, Singh, Brijesh, linux-efi

On Thu, Aug 19, 2021, Kalra, Ashish wrote:
> 
> > On Aug 20, 2021, at 3:38 AM, Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
> > I think it makes more sense to stick to the original approach/patch, i.e.,
> > introducing a new private hypercall interface like kvm_sev_hypercall3() and
> > let early paravirtualized kernel code invoke this private hypercall
> > interface wherever required.

I don't like the idea of duplicating code just because the problem is tricky to
solve.  Right now it's just one function, but it could balloon to multiple in
the future.  Plus there's always the possibility of a new, pre-alternatives
kvm_hypercall() being added in generic code, at which point using an SEV-specific
variant gets even uglier.

> > This helps avoiding Intel CPUs taking unnecessary #UDs and also avoid using
> > hacks as below.
> > 
> > TDX code can introduce similar private hypercall interface for their early
> > para virtualized kernel code if required.
> 
> Actually, if we are using this kvm_sev_hypercall3() and not modifying
> KVM_HYPERCALL() then Intel CPUs avoid unnecessary #UDs and TDX code does not
> need any new interface. Only early AMD/SEV specific code will use this
> kvm_sev_hypercall3() interface. TDX code will always work with
> KVM_HYPERCALL().

Even if VMCALL is the default, i.e. not patched in, it will #VE on TDX.
In other words, VMCALL isn't really any better than VMMCALL; TDX will need to do
something clever either way.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 2/5] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL
  2021-08-19 23:15         ` Sean Christopherson
@ 2021-08-20 13:32           ` Ashish Kalra
  0 siblings, 0 replies; 16+ messages in thread
From: Ashish Kalra @ 2021-08-20 13:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, tglx, bp, mingo, hpa, joro, Lendacky, Thomas, x86, kvm,
	linux-kernel, srutherford, Singh, Brijesh, linux-efi

On Thu, Aug 19, 2021 at 11:15:26PM +0000, Sean Christopherson wrote:
> On Thu, Aug 19, 2021, Kalra, Ashish wrote:
> > 
> > > On Aug 20, 2021, at 3:38 AM, Kalra, Ashish <Ashish.Kalra@amd.com> wrote:
> > > I think it makes more sense to stick to the original approach/patch, i.e.,
> > > introducing a new private hypercall interface like kvm_sev_hypercall3() and
> > > let early paravirtualized kernel code invoke this private hypercall
> > > interface wherever required.
> 
> I don't like the idea of duplicating code just because the problem is tricky to
> solve.  Right now it's just one function, but it could balloon to multiple in
> the future.  Plus there's always the possibility of a new, pre-alternatives
> kvm_hypercall() being added in generic code, at which point using an SEV-specific
> variant gets even uglier.
> 

Also, to highlight the need for this interface, here is the flow that leads
to these early hypercalls relative to apply_alternatives():

setup_arch() calls init_hypervisor_platform(), which detects the
hypervisor platform the kernel is running under, and then the
hypervisor-specific initialization code can make early hypercalls. For
example, KVM-specific initialization in the SEV case will try to mark the
"__bss_decrypted" section's encryption state via early page encryption
status hypercalls.
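
For illustration, a minimal sketch of that early marking, using the
early_set_mem_enc_dec_hypercall() signature from the hunk quoted earlier
in this thread (the helper name and call site are hypothetical):

static void __init kvm_sev_mark_bss_decrypted(void)
{
	unsigned long vaddr = (unsigned long)__start_bss_decrypted;
	unsigned long size  = __end_bss_decrypted - __start_bss_decrypted;

	/* Tell the host this range is shared (unencrypted). */
	early_set_mem_enc_dec_hypercall(vaddr, size >> PAGE_SHIFT, false);
}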

Now, apply_alternatives() is only invoked much later, when start_kernel()
calls check_bugs() after setup_arch() has returned, so we do need some
kind of early, pre-alternatives hypercall interface.

Other cases of pre-alternatives hypercalls include marking per-cpu GHCB
pages as decrypted on SEV-ES and per-cpu apf_reason, steal_time and
kvm_apic_eoi as decrypted for SEV generally.

Actually, use of this kvm_sev_hypercall3() function can be abstracted
quite nicely. All these early hypercalls are made through the
early_set_memory_XX() interfaces, which in turn invoke pv_ops.

Now, pv_ops can carry the SEV/TDX-specific abstractions.

Currently, the pv_ops.mmu.notify_page_enc_status_changed() callback is set
to kvm_sev_hypercall3() in the SEV case.

Similarly, in the TDX case, pv_ops.mmu.notify_page_enc_status_changed() can
be set to a TDX-specific callback.

Therefore, this early_set_memory_XX() -> pv_ops.mmu.notify_page_enc_status_changed()
path is a generic interface and can easily have SEV, TDX and any other future
platform-specific abstractions added to it.
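
A very rough sketch of that wiring (the KVM_MAP_GPA_RANGE_* attribute names
are taken from this series; the callback signature, the init function name
and the sev_active()-only check are illustrative simplifications):

/* Illustrative SEV notifier: forward the encryption status to the host. */
static void kvm_sev_page_enc_status_changed(unsigned long pfn, int npages,
					    bool enc)
{
	kvm_sev_hypercall3(KVM_HC_MAP_GPA_RANGE, pfn << PAGE_SHIFT, npages,
			   KVM_MAP_GPA_RANGE_PAGE_SZ_4K |
			   (enc ? KVM_MAP_GPA_RANGE_ENCRYPTED :
				  KVM_MAP_GPA_RANGE_DECRYPTED));
}

static void __init kvm_sev_platform_init(void)
{
	/* Illustrative hook-up; real code also checks the host feature bit. */
	if (sev_active())
		pv_ops.mmu.notify_page_enc_status_changed =
			kvm_sev_page_enc_status_changed;
}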

Thanks,
Ashish

> > > This helps avoiding Intel CPUs taking unnecessary #UDs and also avoid using
> > > hacks as below.
> > > 
> > > TDX code can introduce similar private hypercall interface for their early
> > > para virtualized kernel code if required.
> > 
> > Actually, if we are using this kvm_sev_hypercall3() and not modifying
> > KVM_HYPERCALL() then Intel CPUs avoid unnecessary #UDs and TDX code does not
> > need any new interface. Only early AMD/SEV specific code will use this
> > kvm_sev_hypercall3() interface. TDX code will always work with
> > KVM_HYPERCALL().
> 
> Even if VMCALL is the default, i.e. not patched in, VMCALL it will #VE on TDX.
> In other words, VMCALL isn't really any better than VMMCALL, TDX will need to do
> something clever either way.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-08-20 13:32 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-08 18:05 [PATCH v3 0/5] Add Guest API & Guest Kernel support for SEV live migration Ashish Kalra
2021-06-08 18:05 ` [PATCH v3 1/5] KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall Ashish Kalra
2021-06-10 16:58   ` Paolo Bonzini
2021-06-08 18:06 ` [PATCH v3 2/5] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL Ashish Kalra
2021-08-19 20:45   ` Sean Christopherson
2021-08-19 22:08     ` Kalra, Ashish
2021-08-19 23:02       ` Kalra, Ashish
2021-08-19 23:15         ` Sean Christopherson
2021-08-20 13:32           ` Ashish Kalra
2021-06-08 18:06 ` [PATCH v3 3/5] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
2021-06-10 18:30   ` Borislav Petkov
2021-06-30  3:10     ` Ashish Kalra
2021-06-08 18:06 ` [PATCH v3 4/5] EFI: Introduce the new AMD Memory Encryption GUID Ashish Kalra
2021-06-10 15:01   ` Ard Biesheuvel
2021-06-08 18:07 ` [PATCH v3 5/5] x86/kvm: Add guest support for detecting and enabling SEV Live Migration feature Ashish Kalra
2021-06-10 18:32   ` Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).