kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] Add guest support for SEV live migration.
@ 2021-04-23 15:57 Ashish Kalra
  2021-04-23 15:58 ` [PATCH v2 1/4] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL Ashish Kalra
                   ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: Ashish Kalra @ 2021-04-23 15:57 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Ashish Kalra <ashish.kalra@amd.com>

The series adds guest support for SEV live migration.

The patch series introduces a new hypercall. The guest OS can use this
hypercall to notify the page encryption status. If the page is encrypted
with guest specific-key then we use SEV command during the migration.
If page is not encrypted then fallback to default.

This section descibes how the SEV live migration feature is negotiated
between the host and guest, the host indicates this feature support via 
KVM_FEATURE_CPUID. The guest firmware (OVMF) detects this feature and
sets a UEFI enviroment variable indicating OVMF support for live
migration, the guest kernel also detects the host support for this
feature via cpuid and in case of an EFI boot verifies if OVMF also
supports this feature by getting the UEFI enviroment variable and if it
set then enables live migration feature on host by writing to a custom
MSR, if not booted under EFI, then it simply enables the feature by
again writing to the custom MSR.

Changes since v1:
 - Avoid having an SEV specific variant of kvm_hypercall3() and instead
   invert the default to VMMCALL.

Ashish Kalra (3):
  KVM: x86: invert KVM_HYPERCALL to default to VMMCALL
  EFI: Introduce the new AMD Memory Encryption GUID.
  x86/kvm: Add guest support for detecting and enabling SEV Live
    Migration feature.

Brijesh Singh (1):
  mm: x86: Invoke hypercall when page encryption status is changed

 arch/x86/include/asm/kvm_para.h       |   2 +-
 arch/x86/include/asm/mem_encrypt.h    |   4 +
 arch/x86/include/asm/paravirt.h       |   6 ++
 arch/x86/include/asm/paravirt_types.h |   2 +
 arch/x86/include/asm/set_memory.h     |   2 +
 arch/x86/kernel/kvm.c                 | 106 ++++++++++++++++++++++++++
 arch/x86/kernel/paravirt.c            |   1 +
 arch/x86/mm/mem_encrypt.c             |  72 ++++++++++++++---
 arch/x86/mm/pat/set_memory.c          |   7 ++
 include/linux/efi.h                   |   1 +
 10 files changed, 193 insertions(+), 10 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 1/4] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL
  2021-04-23 15:57 [PATCH v2 0/4] Add guest support for SEV live migration Ashish Kalra
@ 2021-04-23 15:58 ` Ashish Kalra
  2021-04-23 16:31   ` Jim Mattson
  2021-04-23 15:58 ` [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Ashish Kalra @ 2021-04-23 15:58 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Ashish Kalra <ashish.kalra@amd.com>

KVM hypercall framework relies on alternative framework to patch the
VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
apply_alternative() is called then it defaults to VMCALL. The approach
works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
will be able to decode the instruction and do the right things. But
when SEV is active, guest memory is encrypted with guest key and
hypervisor will not be able to decode the instruction bytes.

So invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to VMMCALL
and opt into VMCALL.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/kvm_para.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 338119852512..fda2fe0d1b10 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -19,7 +19,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
 #endif /* CONFIG_KVM_GUEST */
 
 #define KVM_HYPERCALL \
-        ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)
+	ALTERNATIVE("vmmcall", "vmcall", X86_FEATURE_VMCALL)
 
 /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
  * instruction.  The hypervisor may replace it with something else but only the
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-04-23 15:57 [PATCH v2 0/4] Add guest support for SEV live migration Ashish Kalra
  2021-04-23 15:58 ` [PATCH v2 1/4] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL Ashish Kalra
@ 2021-04-23 15:58 ` Ashish Kalra
  2021-05-12 13:15   ` Borislav Petkov
  2021-04-23 15:59 ` [PATCH v2 3/4] EFI: Introduce the new AMD Memory Encryption GUID Ashish Kalra
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Ashish Kalra @ 2021-04-23 15:58 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Brijesh Singh <brijesh.singh@amd.com>

Invoke a hypercall when a memory region is changed from encrypted ->
decrypted and vice versa. Hypervisor needs to know the page encryption
status during the guest migration.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Reviewed-by: Venu Busireddy <venu.busireddy@oracle.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/paravirt.h       |  6 +++
 arch/x86/include/asm/paravirt_types.h |  2 +
 arch/x86/include/asm/set_memory.h     |  2 +
 arch/x86/kernel/paravirt.c            |  1 +
 arch/x86/mm/mem_encrypt.c             | 66 +++++++++++++++++++++++----
 arch/x86/mm/pat/set_memory.c          |  7 +++
 6 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 4abf110e2243..c0ef13e97f5a 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -84,6 +84,12 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
 	PVOP_VCALL1(mmu.exit_mmap, mm);
 }
 
+static inline void notify_page_enc_status_changed(unsigned long pfn,
+						  int npages, bool enc)
+{
+	PVOP_VCALL3(mmu.notify_page_enc_status_changed, pfn, npages, enc);
+}
+
 #ifdef CONFIG_PARAVIRT_XXL
 static inline void load_sp0(unsigned long sp0)
 {
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index de87087d3bde..7245d08f5500 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -195,6 +195,8 @@ struct pv_mmu_ops {
 
 	/* Hook for intercepting the destruction of an mm_struct. */
 	void (*exit_mmap)(struct mm_struct *mm);
+	void (*notify_page_enc_status_changed)(unsigned long pfn, int npages,
+					       bool enc);
 
 #ifdef CONFIG_PARAVIRT_XXL
 	struct paravirt_callee_save read_cr2;
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 4352f08bfbb5..ed9cfe062634 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -83,6 +83,8 @@ int set_pages_rw(struct page *page, int numpages);
 int set_direct_map_invalid_noflush(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
 bool kernel_page_present(struct page *page);
+void notify_addr_enc_status_changed(unsigned long vaddr, int npages,
+				    bool enc);
 
 extern int kernel_set_to_readonly;
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index c60222ab8ab9..192230247ad7 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -335,6 +335,7 @@ struct paravirt_patch_template pv_ops = {
 			(void (*)(struct mmu_gather *, void *))tlb_remove_page,
 
 	.mmu.exit_mmap		= paravirt_nop,
+	.mmu.notify_page_enc_status_changed	= paravirt_nop,
 
 #ifdef CONFIG_PARAVIRT_XXL
 	.mmu.read_cr2		= __PV_IS_CALLEE_SAVE(native_read_cr2),
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 4b01f7dbaf30..e4b94099645b 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -229,29 +229,74 @@ void __init sev_setup_arch(void)
 	swiotlb_adjust_size(size);
 }
 
-static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
+static unsigned long pg_level_to_pfn(int level, pte_t *kpte, pgprot_t *ret_prot)
 {
-	pgprot_t old_prot, new_prot;
-	unsigned long pfn, pa, size;
-	pte_t new_pte;
+	unsigned long pfn = 0;
+	pgprot_t prot;
 
 	switch (level) {
 	case PG_LEVEL_4K:
 		pfn = pte_pfn(*kpte);
-		old_prot = pte_pgprot(*kpte);
+		prot = pte_pgprot(*kpte);
 		break;
 	case PG_LEVEL_2M:
 		pfn = pmd_pfn(*(pmd_t *)kpte);
-		old_prot = pmd_pgprot(*(pmd_t *)kpte);
+		prot = pmd_pgprot(*(pmd_t *)kpte);
 		break;
 	case PG_LEVEL_1G:
 		pfn = pud_pfn(*(pud_t *)kpte);
-		old_prot = pud_pgprot(*(pud_t *)kpte);
+		prot = pud_pgprot(*(pud_t *)kpte);
 		break;
 	default:
-		return;
+		return 0;
 	}
 
+	if (ret_prot)
+		*ret_prot = prot;
+
+	return pfn;
+}
+
+void notify_addr_enc_status_changed(unsigned long vaddr, int npages,
+				    bool enc)
+{
+#ifdef CONFIG_PARAVIRT
+	unsigned long sz = npages << PAGE_SHIFT;
+	unsigned long vaddr_end = vaddr + sz;
+
+	while (vaddr < vaddr_end) {
+		int psize, pmask, level;
+		unsigned long pfn;
+		pte_t *kpte;
+
+		kpte = lookup_address(vaddr, &level);
+		if (!kpte || pte_none(*kpte))
+			return;
+
+		pfn = pg_level_to_pfn(level, kpte, NULL);
+		if (!pfn)
+			continue;
+
+		psize = page_level_size(level);
+		pmask = page_level_mask(level);
+
+		notify_page_enc_status_changed(pfn, psize >> PAGE_SHIFT, enc);
+
+		vaddr = (vaddr & pmask) + psize;
+	}
+#endif
+}
+
+static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
+{
+	pgprot_t old_prot, new_prot;
+	unsigned long pfn, pa, size;
+	pte_t new_pte;
+
+	pfn = pg_level_to_pfn(level, kpte, &old_prot);
+	if (!pfn)
+		return;
+
 	new_prot = old_prot;
 	if (enc)
 		pgprot_val(new_prot) |= _PAGE_ENC;
@@ -286,12 +331,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
 static int __init early_set_memory_enc_dec(unsigned long vaddr,
 					   unsigned long size, bool enc)
 {
-	unsigned long vaddr_end, vaddr_next;
+	unsigned long vaddr_end, vaddr_next, start;
 	unsigned long psize, pmask;
 	int split_page_size_mask;
 	int level, ret;
 	pte_t *kpte;
 
+	start = vaddr;
 	vaddr_next = vaddr;
 	vaddr_end = vaddr + size;
 
@@ -346,6 +392,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
 
 	ret = 0;
 
+	notify_addr_enc_status_changed(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
+					enc);
 out:
 	__flush_tlb_all();
 	return ret;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 16f878c26667..45e65517405a 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2012,6 +2012,13 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 	 */
 	cpa_flush(&cpa, 0);
 
+	/*
+	 * Notify hypervisor that a given memory range is mapped encrypted
+	 * or decrypted. The hypervisor will use this information during the
+	 * VM migration.
+	 */
+	notify_addr_enc_status_changed(addr, numpages, enc);
+
 	return ret;
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 3/4] EFI: Introduce the new AMD Memory Encryption GUID.
  2021-04-23 15:57 [PATCH v2 0/4] Add guest support for SEV live migration Ashish Kalra
  2021-04-23 15:58 ` [PATCH v2 1/4] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL Ashish Kalra
  2021-04-23 15:58 ` [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
@ 2021-04-23 15:59 ` Ashish Kalra
  2021-05-12 13:19   ` Borislav Petkov
  2021-04-23 15:59 ` [PATCH v2 4/4] x86/kvm: Add guest support for detecting and enabling SEV Live Migration feature Ashish Kalra
  2021-04-30  7:19 ` [PATCH v2 0/4] Add guest support for SEV live migration Ashish Kalra
  4 siblings, 1 reply; 33+ messages in thread
From: Ashish Kalra @ 2021-04-23 15:59 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Ashish Kalra <ashish.kalra@amd.com>

Introduce a new AMD Memory Encryption GUID which is currently
used for defining a new UEFI environment variable which indicates
UEFI/OVMF support for the SEV live migration feature. This variable
is setup when UEFI/OVMF detects host/hypervisor support for SEV
live migration and later this variable is read by the kernel using
EFI runtime services to verify if OVMF supports the live migration
feature.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 include/linux/efi.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/efi.h b/include/linux/efi.h
index 8710f5710c1d..e95c144d1d02 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -360,6 +360,7 @@ void efi_native_runtime_setup(void);
 
 /* OEM GUIDs */
 #define DELLEMC_EFI_RCI2_TABLE_GUID		EFI_GUID(0x2d9f28a2, 0xa886, 0x456a,  0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55)
+#define MEM_ENCRYPT_GUID			EFI_GUID(0x0cf29b71, 0x9e51, 0x433a,  0xa3, 0xb7, 0x81, 0xf3, 0xab, 0x16, 0xb8, 0x75)
 
 typedef struct {
 	efi_guid_t guid;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 4/4] x86/kvm: Add guest support for detecting and enabling SEV Live Migration feature.
  2021-04-23 15:57 [PATCH v2 0/4] Add guest support for SEV live migration Ashish Kalra
                   ` (2 preceding siblings ...)
  2021-04-23 15:59 ` [PATCH v2 3/4] EFI: Introduce the new AMD Memory Encryption GUID Ashish Kalra
@ 2021-04-23 15:59 ` Ashish Kalra
  2021-04-30  7:19 ` [PATCH v2 0/4] Add guest support for SEV live migration Ashish Kalra
  4 siblings, 0 replies; 33+ messages in thread
From: Ashish Kalra @ 2021-04-23 15:59 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

From: Ashish Kalra <ashish.kalra@amd.com>

The guest support for detecting and enabling SEV Live migration
feature uses the following logic :

 - kvm_init_plaform() checks if its booted under the EFI

   - If not EFI,

     i) if kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL), issue a wrmsrl()
         to enable the SEV live migration support

   - If EFI,

     i) If kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL), read
        the UEFI variable which indicates OVMF support for live migration

     ii) the variable indicates live migration is supported, issue a wrmsrl() to
          enable the SEV live migration support

The EFI live migration check is done using a late_initcall() callback.

Also, ensure that _bss_decrypted section is marked as decrypted in the
shared pages list.

Also adds kexec support for SEV Live Migration.

Reset the host's shared pages list related to kernel
specific page encryption status settings before we load a
new kernel by kexec. We cannot reset the complete
shared pages list here as we need to retain the
UEFI/OVMF firmware specific settings.

The host's shared pages list is maintained for the
guest to keep track of all unencrypted guest memory regions,
therefore we need to explicitly mark all shared pages as
encrypted again before rebooting into the new guest kernel.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/include/asm/mem_encrypt.h |   4 ++
 arch/x86/kernel/kvm.c              | 106 +++++++++++++++++++++++++++++
 arch/x86/mm/mem_encrypt.c          |   6 ++
 3 files changed, 116 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 31c4df123aa0..776e3378a782 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -44,6 +44,8 @@ void __init sme_enable(struct boot_params *bp);
 
 int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size);
 int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
+void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
+					    bool enc);
 
 void __init mem_encrypt_free_decrypted_mem(void);
 
@@ -84,6 +86,8 @@ static inline int __init
 early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
 static inline int __init
 early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0; }
+static inline void __init
+early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages, bool enc) {}
 
 static inline void mem_encrypt_free_decrypted_mem(void) { }
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 224a7a1ed6c3..1844b200f6d4 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -26,6 +26,7 @@
 #include <linux/kprobes.h>
 #include <linux/nmi.h>
 #include <linux/swait.h>
+#include <linux/efi.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -38,6 +39,7 @@
 #include <asm/cpuidle_haltpoll.h>
 #include <asm/ptrace.h>
 #include <asm/svm.h>
+#include <asm/e820/api.h>
 
 DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
 
@@ -383,6 +385,8 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
 	 */
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
 		wrmsrl(MSR_KVM_PV_EOI_EN, 0);
+	if (kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL))
+		wrmsrl(MSR_KVM_MIGRATION_CONTROL, 0);
 	kvm_pv_disable_apf();
 	kvm_disable_steal_time();
 }
@@ -429,6 +433,55 @@ static inline void __set_percpu_decrypted(void *ptr, unsigned long size)
 	early_set_memory_decrypted((unsigned long) ptr, size);
 }
 
+static int __init setup_efi_kvm_sev_migration(void)
+{
+	efi_char16_t efi_sev_live_migration_enabled[] = L"SevLiveMigrationEnabled";
+	efi_guid_t efi_variable_guid = MEM_ENCRYPT_GUID;
+	efi_status_t status;
+	unsigned long size;
+	bool enabled;
+
+	if (!sev_active() ||
+	    !kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL))
+		return 0;
+
+	if (!efi_enabled(EFI_BOOT))
+		return 0;
+
+	if (!efi_enabled(EFI_RUNTIME_SERVICES)) {
+		pr_info("%s : EFI runtime services are not enabled\n", __func__);
+		return 0;
+	}
+
+	size = sizeof(enabled);
+
+	/* Get variable contents into buffer */
+	status = efi.get_variable(efi_sev_live_migration_enabled,
+				  &efi_variable_guid, NULL, &size, &enabled);
+
+	if (status == EFI_NOT_FOUND) {
+		pr_info("%s : EFI live migration variable not found\n", __func__);
+		return 0;
+	}
+
+	if (status != EFI_SUCCESS) {
+		pr_info("%s : EFI variable retrieval failed\n", __func__);
+		return 0;
+	}
+
+	if (enabled == 0) {
+		pr_info("%s: live migration disabled in EFI\n", __func__);
+		return 0;
+	}
+
+	pr_info("%s : live migration enabled in EFI\n", __func__);
+	wrmsrl(MSR_KVM_MIGRATION_CONTROL, KVM_PAGE_ENC_STATUS_UPTODATE);
+
+	return true;
+}
+
+late_initcall(setup_efi_kvm_sev_migration);
+
 /*
  * Iterate through all possible CPUs and map the memory region pointed
  * by apf_reason, steal_time and kvm_apic_eoi as decrypted at once.
@@ -763,8 +816,61 @@ static bool __init kvm_msi_ext_dest_id(void)
 	return kvm_para_has_feature(KVM_FEATURE_MSI_EXT_DEST_ID);
 }
 
+static void kvm_sev_hc_page_enc_status(unsigned long pfn, int npages, bool enc)
+{
+	kvm_hypercall3(KVM_HC_PAGE_ENC_STATUS, pfn << PAGE_SHIFT, npages,
+			   enc);
+}
+
 static void __init kvm_init_platform(void)
 {
+	if (sev_active() &&
+	    kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL)) {
+		unsigned long nr_pages;
+		int i;
+
+		pv_ops.mmu.notify_page_enc_status_changed =
+			kvm_sev_hc_page_enc_status;
+
+		/*
+		 * Reset the host's shared pages list related to kernel
+		 * specific page encryption status settings before we load a
+		 * new kernel by kexec. Reset the page encryption status
+		 * during early boot intead of just before kexec to avoid SMP
+		 * races during kvm_pv_guest_cpu_reboot().
+		 * NOTE: We cannot reset the complete shared pages list
+		 * here as we need to retain the UEFI/OVMF firmware
+		 * specific settings.
+		 */
+
+		for (i = 0; i < e820_table->nr_entries; i++) {
+			struct e820_entry *entry = &e820_table->entries[i];
+
+			if (entry->type != E820_TYPE_RAM)
+				continue;
+
+			nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
+
+			kvm_hypercall3(KVM_HC_PAGE_ENC_STATUS, entry->addr,
+					   nr_pages, 1);
+		}
+
+		/*
+		 * Ensure that _bss_decrypted section is marked as decrypted in the
+		 * shared pages list.
+		 */
+		nr_pages = DIV_ROUND_UP(__end_bss_decrypted - __start_bss_decrypted,
+					PAGE_SIZE);
+		early_set_mem_enc_dec_hypercall((unsigned long)__start_bss_decrypted,
+						nr_pages, 0);
+
+		/*
+		 * If not booted using EFI, enable Live migration support.
+		 */
+		if (!efi_enabled(EFI_BOOT))
+			wrmsrl(MSR_KVM_MIGRATION_CONTROL,
+			       KVM_PAGE_ENC_STATUS_UPTODATE);
+	}
 	kvmclock_init();
 	x86_platform.apic_post_init = kvm_apic_init;
 }
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index e4b94099645b..5294e7057ce1 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -409,6 +409,12 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size)
 	return early_set_memory_enc_dec(vaddr, size, true);
 }
 
+void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
+					bool enc)
+{
+	notify_addr_enc_status_changed(vaddr, npages, enc);
+}
+
 /*
  * SME and SEV are very similar but they are not the same, so there are
  * times that the kernel will need to distinguish between SME and SEV. The
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/4] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL
  2021-04-23 15:58 ` [PATCH v2 1/4] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL Ashish Kalra
@ 2021-04-23 16:31   ` Jim Mattson
  2021-04-23 17:44     ` Sean Christopherson
  0 siblings, 1 reply; 33+ messages in thread
From: Jim Mattson @ 2021-04-23 16:31 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	Joerg Roedel, Borislav Petkov, Tom Lendacky,
	the arch/x86 maintainers, kvm list, LKML, Steve Rutherford,
	Sean Christopherson, venu.busireddy, Brijesh Singh

On Fri, Apr 23, 2021 at 9:00 AM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> KVM hypercall framework relies on alternative framework to patch the
> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> apply_alternative() is called then it defaults to VMCALL. The approach
> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
> will be able to decode the instruction and do the right things. But
> when SEV is active, guest memory is encrypted with guest key and
> hypervisor will not be able to decode the instruction bytes.
>
> So invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to VMMCALL
> and opt into VMCALL.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Cc: x86@kernel.org
> Cc: kvm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  arch/x86/include/asm/kvm_para.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 338119852512..fda2fe0d1b10 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -19,7 +19,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
>  #endif /* CONFIG_KVM_GUEST */
>
>  #define KVM_HYPERCALL \
> -        ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)
> +       ALTERNATIVE("vmmcall", "vmcall", X86_FEATURE_VMCALL)
>
>  /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
>   * instruction.  The hypervisor may replace it with something else but only the
> --
> 2.17.1
>

Won't this result in the same problem when Intel implements full VM encryption?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/4] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL
  2021-04-23 16:31   ` Jim Mattson
@ 2021-04-23 17:44     ` Sean Christopherson
  0 siblings, 0 replies; 33+ messages in thread
From: Sean Christopherson @ 2021-04-23 17:44 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Ashish Kalra, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, Joerg Roedel, Borislav Petkov, Tom Lendacky,
	the arch/x86 maintainers, kvm list, LKML, Steve Rutherford,
	venu.busireddy, Brijesh Singh

On Fri, Apr 23, 2021, Jim Mattson wrote:
> On Fri, Apr 23, 2021 at 9:00 AM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> >
> > From: Ashish Kalra <ashish.kalra@amd.com>
> >
> > KVM hypercall framework relies on alternative framework to patch the
> > VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> > apply_alternative() is called then it defaults to VMCALL. The approach
> > works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
> > will be able to decode the instruction and do the right things. But
> > when SEV is active, guest memory is encrypted with guest key and
> > hypervisor will not be able to decode the instruction bytes.
> >
> > So invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to VMMCALL
> > and opt into VMCALL.
> >
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > Cc: x86@kernel.org
> > Cc: kvm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >  arch/x86/include/asm/kvm_para.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> > index 338119852512..fda2fe0d1b10 100644
> > --- a/arch/x86/include/asm/kvm_para.h
> > +++ b/arch/x86/include/asm/kvm_para.h
> > @@ -19,7 +19,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
> >  #endif /* CONFIG_KVM_GUEST */
> >
> >  #define KVM_HYPERCALL \
> > -        ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)
> > +       ALTERNATIVE("vmmcall", "vmcall", X86_FEATURE_VMCALL)
> >
> >  /* For KVM hypercalls, a three-byte sequence of either the vmcall or the vmmcall
> >   * instruction.  The hypervisor may replace it with something else but only the
> > --
> > 2.17.1
> >
> 
> Won't this result in the same problem when Intel implements full VM encryption?

TDX uses yet another opcode, TDCALL, along with a different ABI.  The existing
KVM hypercalls are then tunneled through that ABI.  TDX-specific hypercalls,
which will handle the private vs. shared conversions, will not go through the
KVM defined hypercalls because Intel has defined an ABI for guest/host
communication to handle hypercalls that will be needed by all guest+VMM combos.
E.g. to allow Linux/Windows guests to run on KVM/HyperV/VMware without having to
have additional enlightment for the "basic" functionality.

TL;DR: TDX won't use kvm_hypercall() before apply_alternative().

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/4] Add guest support for SEV live migration.
  2021-04-23 15:57 [PATCH v2 0/4] Add guest support for SEV live migration Ashish Kalra
                   ` (3 preceding siblings ...)
  2021-04-23 15:59 ` [PATCH v2 4/4] x86/kvm: Add guest support for detecting and enabling SEV Live Migration feature Ashish Kalra
@ 2021-04-30  7:19 ` Ashish Kalra
  2021-04-30  7:40   ` Paolo Bonzini
  4 siblings, 1 reply; 33+ messages in thread
From: Ashish Kalra @ 2021-04-30  7:19 UTC (permalink / raw)
  To: pbonzini
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

Hello Boris, Paolo,

Do you have additional comments, feedback on this updated patchset ?

Thanks,
Ashish

On Fri, Apr 23, 2021 at 03:57:37PM +0000, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> The series adds guest support for SEV live migration.
> 
> The patch series introduces a new hypercall. The guest OS can use this
> hypercall to notify the page encryption status. If the page is encrypted
> with guest specific-key then we use SEV command during the migration.
> If page is not encrypted then fallback to default.
> 
> This section descibes how the SEV live migration feature is negotiated
> between the host and guest, the host indicates this feature support via 
> KVM_FEATURE_CPUID. The guest firmware (OVMF) detects this feature and
> sets a UEFI enviroment variable indicating OVMF support for live
> migration, the guest kernel also detects the host support for this
> feature via cpuid and in case of an EFI boot verifies if OVMF also
> supports this feature by getting the UEFI enviroment variable and if it
> set then enables live migration feature on host by writing to a custom
> MSR, if not booted under EFI, then it simply enables the feature by
> again writing to the custom MSR.
> 
> Changes since v1:
>  - Avoid having an SEV specific variant of kvm_hypercall3() and instead
>    invert the default to VMMCALL.
> 
> Ashish Kalra (3):
>   KVM: x86: invert KVM_HYPERCALL to default to VMMCALL
>   EFI: Introduce the new AMD Memory Encryption GUID.
>   x86/kvm: Add guest support for detecting and enabling SEV Live
>     Migration feature.
> 
> Brijesh Singh (1):
>   mm: x86: Invoke hypercall when page encryption status is changed
> 
>  arch/x86/include/asm/kvm_para.h       |   2 +-
>  arch/x86/include/asm/mem_encrypt.h    |   4 +
>  arch/x86/include/asm/paravirt.h       |   6 ++
>  arch/x86/include/asm/paravirt_types.h |   2 +
>  arch/x86/include/asm/set_memory.h     |   2 +
>  arch/x86/kernel/kvm.c                 | 106 ++++++++++++++++++++++++++
>  arch/x86/kernel/paravirt.c            |   1 +
>  arch/x86/mm/mem_encrypt.c             |  72 ++++++++++++++---
>  arch/x86/mm/pat/set_memory.c          |   7 ++
>  include/linux/efi.h                   |   1 +
>  10 files changed, 193 insertions(+), 10 deletions(-)
> 
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/4] Add guest support for SEV live migration.
  2021-04-30  7:19 ` [PATCH v2 0/4] Add guest support for SEV live migration Ashish Kalra
@ 2021-04-30  7:40   ` Paolo Bonzini
  0 siblings, 0 replies; 33+ messages in thread
From: Paolo Bonzini @ 2021-04-30  7:40 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: tglx, mingo, hpa, joro, bp, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh

On 30/04/21 09:19, Ashish Kalra wrote:
> Hello Boris, Paolo,
> 
> Do you have additional comments, feedback on this updated patchset ?

Just a small one that the guest ABI isn't finished yet; some #defines 
are going to change names and there will be two separate CPUIDs, one for 
the hypercall and one for the MSR.  It would be nice to provide correct 
encryption status even if the migration MSR is not there, because it can 
be useful for debugging.

Paolo

> Thanks,
> Ashish
> 
> On Fri, Apr 23, 2021 at 03:57:37PM +0000, Ashish Kalra wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> The series adds guest support for SEV live migration.
>>
>> The patch series introduces a new hypercall. The guest OS can use this
>> hypercall to notify the page encryption status. If the page is encrypted
>> with guest specific-key then we use SEV command during the migration.
>> If page is not encrypted then fallback to default.
>>
>> This section descibes how the SEV live migration feature is negotiated
>> between the host and guest, the host indicates this feature support via
>> KVM_FEATURE_CPUID. The guest firmware (OVMF) detects this feature and
>> sets a UEFI enviroment variable indicating OVMF support for live
>> migration, the guest kernel also detects the host support for this
>> feature via cpuid and in case of an EFI boot verifies if OVMF also
>> supports this feature by getting the UEFI enviroment variable and if it
>> set then enables live migration feature on host by writing to a custom
>> MSR, if not booted under EFI, then it simply enables the feature by
>> again writing to the custom MSR.
>>
>> Changes since v1:
>>   - Avoid having an SEV specific variant of kvm_hypercall3() and instead
>>     invert the default to VMMCALL.
>>
>> Ashish Kalra (3):
>>    KVM: x86: invert KVM_HYPERCALL to default to VMMCALL
>>    EFI: Introduce the new AMD Memory Encryption GUID.
>>    x86/kvm: Add guest support for detecting and enabling SEV Live
>>      Migration feature.
>>
>> Brijesh Singh (1):
>>    mm: x86: Invoke hypercall when page encryption status is changed
>>
>>   arch/x86/include/asm/kvm_para.h       |   2 +-
>>   arch/x86/include/asm/mem_encrypt.h    |   4 +
>>   arch/x86/include/asm/paravirt.h       |   6 ++
>>   arch/x86/include/asm/paravirt_types.h |   2 +
>>   arch/x86/include/asm/set_memory.h     |   2 +
>>   arch/x86/kernel/kvm.c                 | 106 ++++++++++++++++++++++++++
>>   arch/x86/kernel/paravirt.c            |   1 +
>>   arch/x86/mm/mem_encrypt.c             |  72 ++++++++++++++---
>>   arch/x86/mm/pat/set_memory.c          |   7 ++
>>   include/linux/efi.h                   |   1 +
>>   10 files changed, 193 insertions(+), 10 deletions(-)
>>
>> -- 
>> 2.17.1
>>
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-04-23 15:58 ` [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
@ 2021-05-12 13:15   ` Borislav Petkov
  2021-05-12 15:51     ` Sean Christopherson
                       ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Borislav Petkov @ 2021-05-12 13:15 UTC (permalink / raw)
  To: Ashish Kalra, seanjc
  Cc: pbonzini, tglx, mingo, hpa, joro, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, venu.busireddy, brijesh.singh

On Fri, Apr 23, 2021 at 03:58:43PM +0000, Ashish Kalra wrote:
> +static inline void notify_page_enc_status_changed(unsigned long pfn,
> +						  int npages, bool enc)
> +{
> +	PVOP_VCALL3(mmu.notify_page_enc_status_changed, pfn, npages, enc);
> +}

Now the question is whether something like that is needed for TDX, and,
if so, could it be shared by both.

Sean?

> +void notify_addr_enc_status_changed(unsigned long vaddr, int npages,
> +				    bool enc)

Let that line stick out.

> +{
> +#ifdef CONFIG_PARAVIRT
> +	unsigned long sz = npages << PAGE_SHIFT;
> +	unsigned long vaddr_end = vaddr + sz;
> +
> +	while (vaddr < vaddr_end) {
> +		int psize, pmask, level;
> +		unsigned long pfn;
> +		pte_t *kpte;
> +
> +		kpte = lookup_address(vaddr, &level);
> +		if (!kpte || pte_none(*kpte))
> +			return;

What does this mean exactly? On the first failure to lookup the address,
you return? Why not continue so that you can notify about the remaining
pages in [vaddr - vaddr_end)?

Also, what does it mean for the current range if the lookup fails?
Innocuous situation or do you need to signal it with a WARN or so?

> +
> +		pfn = pg_level_to_pfn(level, kpte, NULL);
> +		if (!pfn)
> +			continue;

Same here: if it hits the default case, wouldn't it make sense to
WARN_ONCE or so to catch potential misuse? Or better yet, the WARN_ONCE
should be in pg_level_to_pfn().

> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 16f878c26667..45e65517405a 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -2012,6 +2012,13 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>  	 */
>  	cpa_flush(&cpa, 0);
>  
> +	/*
> +	 * Notify hypervisor that a given memory range is mapped encrypted
> +	 * or decrypted. The hypervisor will use this information during the
> +	 * VM migration.
> +	 */
> +	notify_addr_enc_status_changed(addr, numpages, enc);

If you notify about a range then that function should be called

	notify_range_enc_status_changed

or so.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 3/4] EFI: Introduce the new AMD Memory Encryption GUID.
  2021-04-23 15:59 ` [PATCH v2 3/4] EFI: Introduce the new AMD Memory Encryption GUID Ashish Kalra
@ 2021-05-12 13:19   ` Borislav Petkov
  2021-05-12 14:53     ` Ard Biesheuvel
  0 siblings, 1 reply; 33+ messages in thread
From: Borislav Petkov @ 2021-05-12 13:19 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: pbonzini, tglx, mingo, hpa, joro, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, seanjc, venu.busireddy, brijesh.singh,
	linux-efi

On Fri, Apr 23, 2021 at 03:59:01PM +0000, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
> 
> Introduce a new AMD Memory Encryption GUID which is currently
> used for defining a new UEFI environment variable which indicates
> UEFI/OVMF support for the SEV live migration feature. This variable
> is setup when UEFI/OVMF detects host/hypervisor support for SEV
> live migration and later this variable is read by the kernel using
> EFI runtime services to verify if OVMF supports the live migration
> feature.
> 
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  include/linux/efi.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index 8710f5710c1d..e95c144d1d02 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -360,6 +360,7 @@ void efi_native_runtime_setup(void);
>  
>  /* OEM GUIDs */
>  #define DELLEMC_EFI_RCI2_TABLE_GUID		EFI_GUID(0x2d9f28a2, 0xa886, 0x456a,  0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55)
> +#define MEM_ENCRYPT_GUID			EFI_GUID(0x0cf29b71, 0x9e51, 0x433a,  0xa3, 0xb7, 0x81, 0xf3, 0xab, 0x16, 0xb8, 0x75)
>  
>  typedef struct {
>  	efi_guid_t guid;
> -- 

When you apply this patch locally, you do:

$ git log -p -1 | ./scripts/get_maintainer.pl
Ard Biesheuvel <ardb@kernel.org> (maintainer:EXTENSIBLE FIRMWARE INTERFACE (EFI))
linux-efi@vger.kernel.org (open list:EXTENSIBLE FIRMWARE INTERFACE (EFI))
linux-kernel@vger.kernel.org (open list)

and this tells you that you need to CC EFI folks too.

I've CCed linux-efi now - please make sure you use that script to CC the
relevant parties on patches, in the future.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 3/4] EFI: Introduce the new AMD Memory Encryption GUID.
  2021-05-12 13:19   ` Borislav Petkov
@ 2021-05-12 14:53     ` Ard Biesheuvel
  2021-05-13  4:36       ` Ashish Kalra
  0 siblings, 1 reply; 33+ messages in thread
From: Ard Biesheuvel @ 2021-05-12 14:53 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ashish Kalra, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Tom Lendacky, X86 ML, kvm,
	Linux Kernel Mailing List, srutherford, Sean Christopherson,
	venu.busireddy, Brijesh Singh, linux-efi

On Wed, 12 May 2021 at 15:19, Borislav Petkov <bp@alien8.de> wrote:
>
> On Fri, Apr 23, 2021 at 03:59:01PM +0000, Ashish Kalra wrote:
> > From: Ashish Kalra <ashish.kalra@amd.com>
> >
> > Introduce a new AMD Memory Encryption GUID which is currently
> > used for defining a new UEFI environment variable which indicates
> > UEFI/OVMF support for the SEV live migration feature. This variable
> > is setup when UEFI/OVMF detects host/hypervisor support for SEV
> > live migration and later this variable is read by the kernel using
> > EFI runtime services to verify if OVMF supports the live migration
> > feature.
> >
> > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > ---
> >  include/linux/efi.h | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/include/linux/efi.h b/include/linux/efi.h
> > index 8710f5710c1d..e95c144d1d02 100644
> > --- a/include/linux/efi.h
> > +++ b/include/linux/efi.h
> > @@ -360,6 +360,7 @@ void efi_native_runtime_setup(void);
> >
> >  /* OEM GUIDs */
> >  #define DELLEMC_EFI_RCI2_TABLE_GUID          EFI_GUID(0x2d9f28a2, 0xa886, 0x456a,  0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55)
> > +#define MEM_ENCRYPT_GUID                     EFI_GUID(0x0cf29b71, 0x9e51, 0x433a,  0xa3, 0xb7, 0x81, 0xf3, 0xab, 0x16, 0xb8, 0x75)
> >
> >  typedef struct {
> >       efi_guid_t guid;
> > --
>
> When you apply this patch locally, you do:
>
> $ git log -p -1 | ./scripts/get_maintainer.pl
> Ard Biesheuvel <ardb@kernel.org> (maintainer:EXTENSIBLE FIRMWARE INTERFACE (EFI))
> linux-efi@vger.kernel.org (open list:EXTENSIBLE FIRMWARE INTERFACE (EFI))
> linux-kernel@vger.kernel.org (open list)
>
> and this tells you that you need to CC EFI folks too.
>
> I've CCed linux-efi now - please make sure you use that script to CC the
> relevant parties on patches, in the future.
>

Thanks Boris.

You are adding this GUID to the 'OEM GUIDs' section, in which case I'd
prefer the identifier to include which OEM.

Or alternatively, put it somewhere else, but in this case, putting
something like AMD_SEV in the identifier would still help to make it
more self-documenting.

Thanks,
Ard.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-12 13:15   ` Borislav Petkov
@ 2021-05-12 15:51     ` Sean Christopherson
  2021-05-12 16:23       ` Borislav Petkov
                         ` (2 more replies)
  2021-05-13  4:34     ` Ashish Kalra
  2021-05-19 23:29     ` Andy Lutomirski
  2 siblings, 3 replies; 33+ messages in thread
From: Sean Christopherson @ 2021-05-12 15:51 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ashish Kalra, pbonzini, tglx, mingo, hpa, joro, thomas.lendacky,
	x86, kvm, linux-kernel, srutherford, venu.busireddy,
	brijesh.singh

On Wed, May 12, 2021, Borislav Petkov wrote:
> On Fri, Apr 23, 2021 at 03:58:43PM +0000, Ashish Kalra wrote:
> > +static inline void notify_page_enc_status_changed(unsigned long pfn,
> > +						  int npages, bool enc)
> > +{
> > +	PVOP_VCALL3(mmu.notify_page_enc_status_changed, pfn, npages, enc);
> > +}
> 
> Now the question is whether something like that is needed for TDX, and,
> if so, could it be shared by both.

Yes, TDX needs this same hook, but "can't" reuse the hypercall verbatime.  Ditto
for SEV-SNP.  I wanted to squish everything into a single common hypercall, but
that didn't pan out.

The problem is that both TDX and SNP define their own versions of this so that
any guest kernel that complies with the TDX|SNP specification will run cleanly
on a hypervisor that also complies with the spec.  This KVM-specific hook doesn't
meet those requires because non-Linux guest support will be sketchy at best, and
non-KVM hypervisor support will be non-existent.

The best we can do, short of refusing to support TDX or SNP, is to make this
KVM-specific hypercall compatible with TDX and SNP so that the bulk of the
control logic is identical.  The mechanics of actually invoking the hypercall
will differ, but if done right, everything else should be reusable without
modification.

I had an in-depth analysis of this, but it was all off-list.  Pasted below. 

  TDX uses GPRs to communicate with the host, so it can tunnel "legacy" hypercalls
  from time zero.  SNP could technically do the same (with a revised GHCB spec),
  but it'd be butt ugly.  And of course trying to go that route for either TDX or
  SNP would run into the problem of having to coordinate the ABI for the "legacy"
  hypercall across all guests and hosts.  So yeah, trying to remove any of the
  three (KVM vs. SNP vs. TDX) interfaces is sadly just wishful thinking.

  That being said, I do think we can reuse the KVM specific hypercall for TDX and
  SNP.  Both will still need a {TDX,SNP}-specific GCH{I,B} protocol so that cross-
  vendor compatibility is guaranteed, but that shouldn't preclude a guest that is
  KVM enlightened from switching to the KVM specific hypercall once it can do so.
  More thoughts later on.

  > I guess a common structure could be used along the lines of what is in the
  > GHCB spec today, but that seems like overkill for SEV/SEV-ES, which will
  > only ever really do a single page range at a time (through
  > set_memory_encrypted() and set_memory_decrypted()). The reason for the
  > expanded form for SEV-SNP is that the OS can (proactively) adjust multiple
  > page ranges in advance. Will TDX need to do something similar?

  Yes, TDX needs the exact same thing.  All three (SEV, SNP, and TDX) have more or
  less the exact same hook in the guest (Linux, obviously) kernel.

  > If so, the only real common piece in KVM is a function to track what pages
  > are shared vs private, which would only require a simple interface.

  It's not just KVM, it's also the relevant code in the guest kernel(s) and other
  hypervisors.  And the MMU side of KVM will likely be able to share code, e.g. to
  act on the page size hint.

  > So for SEV/SEV-ES, a simpler hypercall interface to specify a single page
  > range is really all that is needed, but it must be common across
  > hypervisors. I think that was one Sean's points originally, we don't want
  > one hypercall for KVM, one for Hyper-V, one for VMware, one for Xen, etc.

  For the KVM defined interface (required for SEV/SEV-ES), I think it makes sense
  to make it a superset of the SNP and TDX protocols so that it _can_ be used in
  lieu of the SNP/TDX specific protocol.  I don't know for sure whether or not
  that will actually yield better code and/or performance, but it costs us almost
  nothing and at least gives us the option of further optimizing the Linux+KVM
  combination.

  It probably shouldn't be a strict superset, as in practice I don't think SNP
  approach of having individual entries when batching multiple pages will yield
  the best performance.  E.g. the vast majority (maybe all?) of conversions for a
  Linux guest will be physically contiguous and will have the same preferred page
  size, at which point there will be less overhead if the guest specifies a
  massive range as opposed to having to santize and fill a large buffer.

  TL;DR: I think the KVM hypercall should be something like this, so that it can
  be used for SNP and TDX, and possibly for other purposes, e.g. for paravirt
  performance enhancements or something.

    8. KVM_HC_MAP_GPA_RANGE
    -----------------------
    :Architecture: x86
    :Status: active
    :Purpose: Request KVM to map a GPA range with the specified attributes.

    a0: the guest physical address of the start page
    a1: the number of (4kb) pages (must be contiguous in GPA space)
    a2: attributes

  where 'attributes' could be something like:

    bits  3:0 - preferred page size encoding 0 = 4kb, 1 = 2mb, 2 = 1gb, etc...
    bit     4 - plaintext = 0, encrypted = 1
    bits 63:5 - reserved (must be zero)


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-12 15:51     ` Sean Christopherson
@ 2021-05-12 16:23       ` Borislav Petkov
  2021-05-13  6:57       ` Ashish Kalra
  2021-05-13 13:49       ` Tom Lendacky
  2 siblings, 0 replies; 33+ messages in thread
From: Borislav Petkov @ 2021-05-12 16:23 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ashish Kalra, pbonzini, tglx, mingo, hpa, joro, thomas.lendacky,
	x86, kvm, linux-kernel, srutherford, venu.busireddy,
	brijesh.singh

On Wed, May 12, 2021 at 03:51:10PM +0000, Sean Christopherson wrote:
>   TL;DR: I think the KVM hypercall should be something like this, so that it can
>   be used for SNP and TDX, and possibly for other purposes, e.g. for paravirt
>   performance enhancements or something.

Ok, good, I was only making sure this is on people's radar but it
actually is more than that. I'll let Tom and Jörg comment on the meat
of the thing - as always, thanks for the detailed explanation.

From my !virt guy POV, I like the aspect of sharing stuff as much as
possible and it all makes sense to me but what the hell do I know...

>     8. KVM_HC_MAP_GPA_RANGE
>     -----------------------
>     :Architecture: x86
>     :Status: active
>     :Purpose: Request KVM to map a GPA range with the specified attributes.
> 
>     a0: the guest physical address of the start page
>     a1: the number of (4kb) pages (must be contiguous in GPA space)
>     a2: attributes
> 
>   where 'attributes' could be something like:
> 
>     bits  3:0 - preferred page size encoding 0 = 4kb, 1 = 2mb, 2 = 1gb, etc...
>     bit     4 - plaintext = 0, encrypted = 1
>     bits 63:5 - reserved (must be zero)

Yah, nice and simple. I like.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-12 13:15   ` Borislav Petkov
  2021-05-12 15:51     ` Sean Christopherson
@ 2021-05-13  4:34     ` Ashish Kalra
  2021-05-14  7:33       ` Borislav Petkov
  2021-05-19 23:29     ` Andy Lutomirski
  2 siblings, 1 reply; 33+ messages in thread
From: Ashish Kalra @ 2021-05-13  4:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: seanjc, pbonzini, tglx, mingo, hpa, joro, thomas.lendacky, x86,
	kvm, linux-kernel, srutherford, venu.busireddy, brijesh.singh

Hello Boris,

On Wed, May 12, 2021 at 03:15:37PM +0200, Borislav Petkov wrote:
> On Fri, Apr 23, 2021 at 03:58:43PM +0000, Ashish Kalra wrote:
> > +static inline void notify_page_enc_status_changed(unsigned long pfn,
> > +						  int npages, bool enc)
> > +{
> > +	PVOP_VCALL3(mmu.notify_page_enc_status_changed, pfn, npages, enc);
> > +}
> 
> Now the question is whether something like that is needed for TDX, and,
> if so, could it be shared by both.
> 
> Sean?
> 
> > +void notify_addr_enc_status_changed(unsigned long vaddr, int npages,
> > +				    bool enc)
> 
> Let that line stick out.
> 
> > +{
> > +#ifdef CONFIG_PARAVIRT
> > +	unsigned long sz = npages << PAGE_SHIFT;
> > +	unsigned long vaddr_end = vaddr + sz;
> > +
> > +	while (vaddr < vaddr_end) {
> > +		int psize, pmask, level;
> > +		unsigned long pfn;
> > +		pte_t *kpte;
> > +
> > +		kpte = lookup_address(vaddr, &level);
> > +		if (!kpte || pte_none(*kpte))
> > +			return;
> 
> What does this mean exactly? On the first failure to lookup the address,
> you return? Why not continue so that you can notify about the remaining
> pages in [vaddr - vaddr_end)?

What's the use of notification of a partial page list, even a single
incorrect guest page encryption status can crash the guest/migrated
guest.

> Also, what does it mean for the current range if the lookup fails?
> Innocuous situation or do you need to signal it with a WARN or so?
> 

Yes, it makes sense to signal it with a WARN or so.

> > +
> > +		pfn = pg_level_to_pfn(level, kpte, NULL);
> > +		if (!pfn)
> > +			continue;
> 
> Same here: if it hits the default case, wouldn't it make sense to
> WARN_ONCE or so to catch potential misuse? Or better yet, the WARN_ONCE
> should be in pg_level_to_pfn().

Yes, it makes sense to add a WARN_ONCE() in pg_level_to_pfn().
> 
> > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > index 16f878c26667..45e65517405a 100644
> > --- a/arch/x86/mm/pat/set_memory.c
> > +++ b/arch/x86/mm/pat/set_memory.c
> > @@ -2012,6 +2012,13 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> >  	 */
> >  	cpa_flush(&cpa, 0);
> >  
> > +	/*
> > +	 * Notify hypervisor that a given memory range is mapped encrypted
> > +	 * or decrypted. The hypervisor will use this information during the
> > +	 * VM migration.
> > +	 */
> > +	notify_addr_enc_status_changed(addr, numpages, enc);
> 
> If you notify about a range then that function should be called
> 
> 	notify_range_enc_status_changed
> 

Ok. 

Thanks,
Ashish

> or so.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&amp;data=04%7C01%7CAshish.Kalra%40amd.com%7Cb880e2dae4d24f208c8b08d915480b4a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637564221487050648%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=q%2FOAt%2FQqv0t%2BXDhjvPQAEYj67XQIUWbis0MXGMu4EZY%3D&amp;reserved=0

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 3/4] EFI: Introduce the new AMD Memory Encryption GUID.
  2021-05-12 14:53     ` Ard Biesheuvel
@ 2021-05-13  4:36       ` Ashish Kalra
  0 siblings, 0 replies; 33+ messages in thread
From: Ashish Kalra @ 2021-05-13  4:36 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Borislav Petkov, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joerg Roedel, Tom Lendacky, X86 ML, kvm,
	Linux Kernel Mailing List, srutherford, Sean Christopherson,
	venu.busireddy, Brijesh Singh, linux-efi

On Wed, May 12, 2021 at 04:53:21PM +0200, Ard Biesheuvel wrote:
> On Wed, 12 May 2021 at 15:19, Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Fri, Apr 23, 2021 at 03:59:01PM +0000, Ashish Kalra wrote:
> > > From: Ashish Kalra <ashish.kalra@amd.com>
> > >
> > > Introduce a new AMD Memory Encryption GUID which is currently
> > > used for defining a new UEFI environment variable which indicates
> > > UEFI/OVMF support for the SEV live migration feature. This variable
> > > is setup when UEFI/OVMF detects host/hypervisor support for SEV
> > > live migration and later this variable is read by the kernel using
> > > EFI runtime services to verify if OVMF supports the live migration
> > > feature.
> > >
> > > Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> > > ---
> > >  include/linux/efi.h | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/include/linux/efi.h b/include/linux/efi.h
> > > index 8710f5710c1d..e95c144d1d02 100644
> > > --- a/include/linux/efi.h
> > > +++ b/include/linux/efi.h
> > > @@ -360,6 +360,7 @@ void efi_native_runtime_setup(void);
> > >
> > >  /* OEM GUIDs */
> > >  #define DELLEMC_EFI_RCI2_TABLE_GUID          EFI_GUID(0x2d9f28a2, 0xa886, 0x456a,  0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55)
> > > +#define MEM_ENCRYPT_GUID                     EFI_GUID(0x0cf29b71, 0x9e51, 0x433a,  0xa3, 0xb7, 0x81, 0xf3, 0xab, 0x16, 0xb8, 0x75)
> > >
> > >  typedef struct {
> > >       efi_guid_t guid;
> > > --
> >
> > When you apply this patch locally, you do:
> >
> > $ git log -p -1 | ./scripts/get_maintainer.pl
> > Ard Biesheuvel <ardb@kernel.org> (maintainer:EXTENSIBLE FIRMWARE INTERFACE (EFI))
> > linux-efi@vger.kernel.org (open list:EXTENSIBLE FIRMWARE INTERFACE (EFI))
> > linux-kernel@vger.kernel.org (open list)
> >
> > and this tells you that you need to CC EFI folks too.
> >
> > I've CCed linux-efi now - please make sure you use that script to CC the
> > relevant parties on patches, in the future.
> >
> 
> Thanks Boris.
> 
> You are adding this GUID to the 'OEM GUIDs' section, in which case I'd
> prefer the identifier to include which OEM.
> 
> Or alternatively, put it somewhere else, but in this case, putting
> something like AMD_SEV in the identifier would still help to make it
> more self-documenting.

I will add AMD_SEV in the identifier above.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-12 15:51     ` Sean Christopherson
  2021-05-12 16:23       ` Borislav Petkov
@ 2021-05-13  6:57       ` Ashish Kalra
  2021-05-13  8:40         ` Paolo Bonzini
  2021-05-13 13:49       ` Tom Lendacky
  2 siblings, 1 reply; 33+ messages in thread
From: Ashish Kalra @ 2021-05-13  6:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Borislav Petkov, pbonzini, tglx, mingo, hpa, joro,
	thomas.lendacky, x86, kvm, linux-kernel, srutherford,
	venu.busireddy, brijesh.singh

Hello Sean,

On Wed, May 12, 2021 at 03:51:10PM +0000, Sean Christopherson wrote:
> On Wed, May 12, 2021, Borislav Petkov wrote:
> > On Fri, Apr 23, 2021 at 03:58:43PM +0000, Ashish Kalra wrote:
> > > +static inline void notify_page_enc_status_changed(unsigned long pfn,
> > > +						  int npages, bool enc)
> > > +{
> > > +	PVOP_VCALL3(mmu.notify_page_enc_status_changed, pfn, npages, enc);
> > > +}
> > 
> > Now the question is whether something like that is needed for TDX, and,
> > if so, could it be shared by both.
> 
> Yes, TDX needs this same hook, but "can't" reuse the hypercall verbatime.  Ditto
> for SEV-SNP.  I wanted to squish everything into a single common hypercall, but
> that didn't pan out.
> 
> The problem is that both TDX and SNP define their own versions of this so that
> any guest kernel that complies with the TDX|SNP specification will run cleanly
> on a hypervisor that also complies with the spec.  This KVM-specific hook doesn't
> meet those requires because non-Linux guest support will be sketchy at best, and
> non-KVM hypervisor support will be non-existent.
> 
> The best we can do, short of refusing to support TDX or SNP, is to make this
> KVM-specific hypercall compatible with TDX and SNP so that the bulk of the
> control logic is identical.  The mechanics of actually invoking the hypercall
> will differ, but if done right, everything else should be reusable without
> modification.
> 
> I had an in-depth analysis of this, but it was all off-list.  Pasted below. 
> 
>   TDX uses GPRs to communicate with the host, so it can tunnel "legacy" hypercalls
>   from time zero.  SNP could technically do the same (with a revised GHCB spec),
>   but it'd be butt ugly.  And of course trying to go that route for either TDX or
>   SNP would run into the problem of having to coordinate the ABI for the "legacy"
>   hypercall across all guests and hosts.  So yeah, trying to remove any of the
>   three (KVM vs. SNP vs. TDX) interfaces is sadly just wishful thinking.
> 
>   That being said, I do think we can reuse the KVM specific hypercall for TDX and
>   SNP.  Both will still need a {TDX,SNP}-specific GCH{I,B} protocol so that cross-
>   vendor compatibility is guaranteed, but that shouldn't preclude a guest that is
>   KVM enlightened from switching to the KVM specific hypercall once it can do so.
>   More thoughts later on.
> 
>   > I guess a common structure could be used along the lines of what is in the
>   > GHCB spec today, but that seems like overkill for SEV/SEV-ES, which will
>   > only ever really do a single page range at a time (through
>   > set_memory_encrypted() and set_memory_decrypted()). The reason for the
>   > expanded form for SEV-SNP is that the OS can (proactively) adjust multiple
>   > page ranges in advance. Will TDX need to do something similar?
> 
>   Yes, TDX needs the exact same thing.  All three (SEV, SNP, and TDX) have more or
>   less the exact same hook in the guest (Linux, obviously) kernel.
> 
>   > If so, the only real common piece in KVM is a function to track what pages
>   > are shared vs private, which would only require a simple interface.
> 
>   It's not just KVM, it's also the relevant code in the guest kernel(s) and other
>   hypervisors.  And the MMU side of KVM will likely be able to share code, e.g. to
>   act on the page size hint.
> 
>   > So for SEV/SEV-ES, a simpler hypercall interface to specify a single page
>   > range is really all that is needed, but it must be common across
>   > hypervisors. I think that was one Sean's points originally, we don't want
>   > one hypercall for KVM, one for Hyper-V, one for VMware, one for Xen, etc.
> 
>   For the KVM defined interface (required for SEV/SEV-ES), I think it makes sense
>   to make it a superset of the SNP and TDX protocols so that it _can_ be used in
>   lieu of the SNP/TDX specific protocol.  I don't know for sure whether or not
>   that will actually yield better code and/or performance, but it costs us almost
>   nothing and at least gives us the option of further optimizing the Linux+KVM
>   combination.
> 
>   It probably shouldn't be a strict superset, as in practice I don't think SNP
>   approach of having individual entries when batching multiple pages will yield
>   the best performance.  E.g. the vast majority (maybe all?) of conversions for a
>   Linux guest will be physically contiguous and will have the same preferred page
>   size, at which point there will be less overhead if the guest specifies a
>   massive range as opposed to having to santize and fill a large buffer.
> 
>   TL;DR: I think the KVM hypercall should be something like this, so that it can
>   be used for SNP and TDX, and possibly for other purposes, e.g. for paravirt
>   performance enhancements or something.
> 
>     8. KVM_HC_MAP_GPA_RANGE
>     -----------------------
>     :Architecture: x86
>     :Status: active
>     :Purpose: Request KVM to map a GPA range with the specified attributes.
> 
>     a0: the guest physical address of the start page
>     a1: the number of (4kb) pages (must be contiguous in GPA space)
>     a2: attributes
> 
>   where 'attributes' could be something like:
> 
>     bits  3:0 - preferred page size encoding 0 = 4kb, 1 = 2mb, 2 = 1gb, etc...
>     bit     4 - plaintext = 0, encrypted = 1
>     bits 63:5 - reserved (must be zero)
> 

Ok. Will modify page encryption status hypercall to be compatible with
the above defined interface.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-13  6:57       ` Ashish Kalra
@ 2021-05-13  8:40         ` Paolo Bonzini
  0 siblings, 0 replies; 33+ messages in thread
From: Paolo Bonzini @ 2021-05-13  8:40 UTC (permalink / raw)
  To: Ashish Kalra, Sean Christopherson
  Cc: Borislav Petkov, tglx, mingo, hpa, joro, thomas.lendacky, x86,
	kvm, linux-kernel, srutherford, venu.busireddy, brijesh.singh

On 13/05/21 08:57, Ashish Kalra wrote:
>>      8. KVM_HC_MAP_GPA_RANGE
>>      -----------------------
>>      :Architecture: x86
>>      :Status: active
>>      :Purpose: Request KVM to map a GPA range with the specified attributes.
>>
>>      a0: the guest physical address of the start page
>>      a1: the number of (4kb) pages (must be contiguous in GPA space)
>>      a2: attributes
>>
>>    where 'attributes' could be something like:
>>
>>      bits  3:0 - preferred page size encoding 0 = 4kb, 1 = 2mb, 2 = 1gb, etc...
>>      bit     4 - plaintext = 0, encrypted = 1
>>      bits 63:5 - reserved (must be zero)
>>
> 
> Ok. Will modify page encryption status hypercall to be compatible with
> the above defined interface.

Great, this is the current state of the host-side patch (untested):

 From df571861e1d47d81a578b4950c704d01a0ed915e Mon Sep 17 00:00:00 2001
From: Ashish Kalra <ashish.kalra@amd.com>
Date: Thu, 15 Apr 2021 15:57:02 +0000
Subject: [PATCH] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

This hypercall is used by the SEV guest to notify a change in the page
encryption status to the hypervisor. The hypercall should be invoked
only when the encryption attribute is changed from encrypted -> decrypted
and vice versa. By default all guest pages are considered encrypted.

The hypercall exits to userspace to manage the guest shared regions and
integrate with the userspace VMM's migration code.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7fcb2fd38f42..0d2abcad0565 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6891,3 +6891,22 @@ This capability is always enabled.
  This capability indicates that the KVM virtual PTP service is
  supported in the host. A VMM can check whether the service is
  available to the guest on migration.
+
+8.33 KVM_CAP_EXIT_HYPERCALL
+---------------------------
+
+:Capability: KVM_CAP_EXIT_HYPERCALL
+:Architectures: x86
+:Type: vm
+
+This capability, if enabled, will cause KVM to exit to userspace
+with KVM_EXIT_HYPERCALL exit reason to process some hypercalls.
+
+Calling KVM_CHECK_EXTENSION for this capability will return a bitmask
+of hypercalls that can be configured to exit to userspace.
+Right now, the only such hypercall is KVM_HC_PAGE_ENC_STATUS.
+
+The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
+of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
+the hypercalls whose corresponding bit is in the argument, and return
+ENOSYS for the others.
diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index cf62162d4be2..1e0013d3c972 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -96,6 +96,14 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
                                                 before using extended destination
                                                 ID bits in MSI address bits 11-5.
  
+KVM_FEATURE_HC_PAGE_ENC_STATUS     16          guest checks this feature bit before
+                                               using the page encryption state
+                                               hypercall to notify the page state
+                                               change
+
+KVM_FEATURE_MIGRATION_CONTROL      17          guest checks this feature bit before
+                                               using MSR_KVM_MIGRATION_CONTROL
+
  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
                                                 per-cpu warps are expected in
                                                 kvmclock
diff --git a/Documentation/virt/kvm/hypercalls.rst b/Documentation/virt/kvm/hypercalls.rst
index ed4fddd364ea..117ff3b27d3c 100644
--- a/Documentation/virt/kvm/hypercalls.rst
+++ b/Documentation/virt/kvm/hypercalls.rst
@@ -169,3 +169,24 @@ a0: destination APIC ID
  
  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
  	        any of the IPI target vCPUs was preempted.
+
+
+8. KVM_HC_PAGE_ENC_STATUS
+-------------------------
+:Architecture: x86
+:Status: active
+:Purpose: Notify the encryption status changes in guest page table (SEV guest)
+
+a0: the guest physical address of the start page
+a1: the number of pages
+a2: page encryption status
+
+   Where:
+	* 1: Page is encrypted
+	* 0: Page is decrypted
+
+**Implementation note**: this hypercall is implemented in userspace via
+the KVM_CAP_EXIT_HYPERCALL capability.  Userspace must enable that capability
+before advertising KVM_FEATURE_HC_PAGE_ENC_STATUS in the guest CPUID.  In
+addition, if the guest supports KVM_FEATURE_MIGRATION_CONTROL, userspace
+must also set up an MSR filter to process writes to MSR_KVM_MIGRATION_CONTROL.
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index e37a14c323d2..977936176f36 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -376,3 +376,16 @@ data:
  	write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
  	and check if there are more notifications pending. The MSR is available
  	if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
+
+MSR_KVM_MIGRATION_CONTROL:
+        0x4b564d08
+
+data:
+        This MSR is available if KVM_FEATURE_MIGRATION_CONTROL is present in
+        CPUID.  Bit 0 represents whether live migration of the guest is allowed.
+
+        When a guest is started, bit 0 will be 0 if the guest has encrypted
+        memory and 1 if the guest does not have encrypted memory.  If the
+        guest is communicating page encryption status to the host using the
+        ``KVM_HC_PAGE_ENC_STATUS`` hypercall, it can set bit 0 in this MSR to
+        allow live migration of the guest.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55efbacfc244..5b9bc8b3db20 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1067,6 +1067,8 @@ struct kvm_arch {
  	u32 user_space_msr_mask;
  	struct kvm_x86_msr_filter __rcu *msr_filter;
  
+	u32 hypercall_exit_enabled;
+
  	/* Guest can access the SGX PROVISIONKEY. */
  	bool sgx_provisioning_allowed;
  
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 950afebfba88..cff18b8b6dec 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -33,6 +33,8 @@
  #define KVM_FEATURE_PV_SCHED_YIELD	13
  #define KVM_FEATURE_ASYNC_PF_INT	14
  #define KVM_FEATURE_MSI_EXT_DEST_ID	15
+#define KVM_FEATURE_HC_PAGE_ENC_STATUS	16
+#define KVM_FEATURE_MIGRATION_CONTROL	17
  
  #define KVM_HINTS_REALTIME      0
  
@@ -54,6 +56,7 @@
  #define MSR_KVM_POLL_CONTROL	0x4b564d05
  #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
  #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
+#define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
  
  struct kvm_steal_time {
  	__u64 steal;
@@ -90,6 +93,8 @@ struct kvm_clock_pairing {
  /* MSR_KVM_ASYNC_PF_INT */
  #define KVM_ASYNC_PF_VEC_MASK			GENMASK(7, 0)
  
+/* MSR_KVM_MIGRATION_CONTROL */
+#define KVM_MIGRATION_READY		(1 << 0)
  
  /* Operations for KVM_HC_MMU_OP */
  #define KVM_MMU_OP_WRITE_PTE            1
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5bd550eaf683..eab7d50eb4e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -102,6 +102,8 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
  
  static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
  
+#define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_PAGE_ENC_STATUS)
+
  #define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
                                      KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
  
@@ -3894,6 +3896,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
  	case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM:
  		r = 1;
  		break;
+	case KVM_CAP_EXIT_HYPERCALL:
+		r = KVM_EXIT_HYPERCALL_VALID_MASK;
+		break;
  	case KVM_CAP_SET_GUEST_DEBUG2:
  		return KVM_GUESTDBG_VALID_MASK;
  #ifdef CONFIG_KVM_XEN
@@ -5494,6 +5499,14 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
  		break;
  	}
  #endif
+	case KVM_CAP_EXIT_HYPERCALL:
+		if (cap->args[0] & ~KVM_EXIT_HYPERCALL_VALID_MASK) {
+			r = -EINVAL;
+			break;
+		}
+		kvm->arch.hypercall_exit_enabled = cap->args[0];
+		r = 0;
+		break;
  	case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM:
  		r = -EINVAL;
  		if (kvm_x86_ops.vm_copy_enc_context_from)
@@ -8384,6 +8397,16 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
  	return;
  }
  
+static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
+{
+	u64 ret = vcpu->run->hypercall.ret;
+	if (!is_64_bit_mode(vcpu))
+		ret = (u32)ret;
+	kvm_rax_write(vcpu, ret);
+	++vcpu->stat.hypercalls;
+	return kvm_skip_emulated_instruction(vcpu);
+}
+
  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
  {
  	unsigned long nr, a0, a1, a2, a3, ret;
@@ -8449,6 +8472,28 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
  		kvm_sched_yield(vcpu, a0);
  		ret = 0;
  		break;
+	case KVM_HC_PAGE_ENC_STATUS: {
+		u64 gpa = a0, npages = a1, enc = a2;
+
+		ret = -KVM_ENOSYS;
+		if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_PAGE_ENC_STATUS)))
+			break;
+
+		if (!PAGE_ALIGNED(gpa) || !npages ||
+		    gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
+			ret = -KVM_EINVAL;
+			break;
+		}
+
+		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
+		vcpu->run->hypercall.nr       = KVM_HC_PAGE_ENC_STATUS;
+		vcpu->run->hypercall.args[0]  = gpa;
+		vcpu->run->hypercall.args[1]  = npages;
+		vcpu->run->hypercall.args[2]  = enc;
+		vcpu->run->hypercall.longmode = op_64_bit;
+		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
+		return 0;
+	}
  	default:
  		ret = -KVM_ENOSYS;
  		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3fd9a7e9d90c..1fb4fd863324 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
  #define KVM_CAP_SGX_ATTRIBUTE 196
  #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
  #define KVM_CAP_PTP_KVM 198
+#define KVM_CAP_EXIT_HYPERCALL 199
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 8b86609849b9..847b83b75dc8 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -29,6 +29,7 @@
  #define KVM_HC_CLOCK_PAIRING		9
  #define KVM_HC_SEND_IPI		10
  #define KVM_HC_SCHED_YIELD		11
+#define KVM_HC_PAGE_ENC_STATUS		12
  
  /*
   * hypercalls use architecture specific


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-12 15:51     ` Sean Christopherson
  2021-05-12 16:23       ` Borislav Petkov
  2021-05-13  6:57       ` Ashish Kalra
@ 2021-05-13 13:49       ` Tom Lendacky
  2 siblings, 0 replies; 33+ messages in thread
From: Tom Lendacky @ 2021-05-13 13:49 UTC (permalink / raw)
  To: Sean Christopherson, Borislav Petkov
  Cc: Ashish Kalra, pbonzini, tglx, mingo, hpa, joro, x86, kvm,
	linux-kernel, srutherford, venu.busireddy, brijesh.singh

On 5/12/21 10:51 AM, Sean Christopherson wrote:
> On Wed, May 12, 2021, Borislav Petkov wrote:
>> On Fri, Apr 23, 2021 at 03:58:43PM +0000, Ashish Kalra wrote:
>>> +static inline void notify_page_enc_status_changed(unsigned long pfn,
>>> +						  int npages, bool enc)
>>> +{
>>> +	PVOP_VCALL3(mmu.notify_page_enc_status_changed, pfn, npages, enc);
>>> +}
>>
>> Now the question is whether something like that is needed for TDX, and,
>> if so, could it be shared by both.
> 
> Yes, TDX needs this same hook, but "can't" reuse the hypercall verbatime.  Ditto
> for SEV-SNP.  I wanted to squish everything into a single common hypercall, but
> that didn't pan out.
> 
> The problem is that both TDX and SNP define their own versions of this so that
> any guest kernel that complies with the TDX|SNP specification will run cleanly
> on a hypervisor that also complies with the spec.  This KVM-specific hook doesn't
> meet those requires because non-Linux guest support will be sketchy at best, and
> non-KVM hypervisor support will be non-existent.
> 
> The best we can do, short of refusing to support TDX or SNP, is to make this
> KVM-specific hypercall compatible with TDX and SNP so that the bulk of the
> control logic is identical.  The mechanics of actually invoking the hypercall
> will differ, but if done right, everything else should be reusable without
> modification.
> 
> I had an in-depth analysis of this, but it was all off-list.  Pasted below. 
> 
>   TDX uses GPRs to communicate with the host, so it can tunnel "legacy" hypercalls
>   from time zero.  SNP could technically do the same (with a revised GHCB spec),
>   but it'd be butt ugly.  And of course trying to go that route for either TDX or
>   SNP would run into the problem of having to coordinate the ABI for the "legacy"
>   hypercall across all guests and hosts.  So yeah, trying to remove any of the
>   three (KVM vs. SNP vs. TDX) interfaces is sadly just wishful thinking.
> 
>   That being said, I do think we can reuse the KVM specific hypercall for TDX and
>   SNP.  Both will still need a {TDX,SNP}-specific GCH{I,B} protocol so that cross-
>   vendor compatibility is guaranteed, but that shouldn't preclude a guest that is
>   KVM enlightened from switching to the KVM specific hypercall once it can do so.
>   More thoughts later on.
> 
>   > I guess a common structure could be used along the lines of what is in the
>   > GHCB spec today, but that seems like overkill for SEV/SEV-ES, which will
>   > only ever really do a single page range at a time (through
>   > set_memory_encrypted() and set_memory_decrypted()). The reason for the
>   > expanded form for SEV-SNP is that the OS can (proactively) adjust multiple
>   > page ranges in advance. Will TDX need to do something similar?
> 
>   Yes, TDX needs the exact same thing.  All three (SEV, SNP, and TDX) have more or
>   less the exact same hook in the guest (Linux, obviously) kernel.
> 
>   > If so, the only real common piece in KVM is a function to track what pages
>   > are shared vs private, which would only require a simple interface.
> 
>   It's not just KVM, it's also the relevant code in the guest kernel(s) and other
>   hypervisors.  And the MMU side of KVM will likely be able to share code, e.g. to
>   act on the page size hint.
> 
>   > So for SEV/SEV-ES, a simpler hypercall interface to specify a single page
>   > range is really all that is needed, but it must be common across
>   > hypervisors. I think that was one Sean's points originally, we don't want
>   > one hypercall for KVM, one for Hyper-V, one for VMware, one for Xen, etc.
> 
>   For the KVM defined interface (required for SEV/SEV-ES), I think it makes sense
>   to make it a superset of the SNP and TDX protocols so that it _can_ be used in
>   lieu of the SNP/TDX specific protocol.  I don't know for sure whether or not
>   that will actually yield better code and/or performance, but it costs us almost
>   nothing and at least gives us the option of further optimizing the Linux+KVM
>   combination.

Right, for SEV-SNP, as long as we know that the KVM interface is
available, it could be used. But we would have to fall back to the GHCB
specification if it could not be determined.

> 
>   It probably shouldn't be a strict superset, as in practice I don't think SNP
>   approach of having individual entries when batching multiple pages will yield
>   the best performance.  E.g. the vast majority (maybe all?) of conversions for a
>   Linux guest will be physically contiguous and will have the same preferred page
>   size, at which point there will be less overhead if the guest specifies a
>   massive range as opposed to having to santize and fill a large buffer.

Originally, the plan was to use ranges, but based on feedback during the
GHCB spec review it was updated to its current definition.

A concern from other hypervisor vendors was to be able to return back to
the guest in the middle of the hypercall, e.g. to deliver an interrupt or
such, and then allow the guest to resume the hypercall where it left off.
Not sure if you would want to build that into this hypercall, since
depending on the range, the operation could take a while.

Thanks,
Tom

> 
>   TL;DR: I think the KVM hypercall should be something like this, so that it can
>   be used for SNP and TDX, and possibly for other purposes, e.g. for paravirt
>   performance enhancements or something.
> 
>     8. KVM_HC_MAP_GPA_RANGE
>     -----------------------
>     :Architecture: x86
>     :Status: active
>     :Purpose: Request KVM to map a GPA range with the specified attributes.
> 
>     a0: the guest physical address of the start page
>     a1: the number of (4kb) pages (must be contiguous in GPA space)
>     a2: attributes
> 
>   where 'attributes' could be something like:
> 
>     bits  3:0 - preferred page size encoding 0 = 4kb, 1 = 2mb, 2 = 1gb, etc...
>     bit     4 - plaintext = 0, encrypted = 1
>     bits 63:5 - reserved (must be zero)
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-13  4:34     ` Ashish Kalra
@ 2021-05-14  7:33       ` Borislav Petkov
  2021-05-14  8:03         ` Paolo Bonzini
  0 siblings, 1 reply; 33+ messages in thread
From: Borislav Petkov @ 2021-05-14  7:33 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: seanjc, pbonzini, tglx, mingo, hpa, joro, thomas.lendacky, x86,
	kvm, linux-kernel, srutherford, venu.busireddy, brijesh.singh

On Thu, May 13, 2021 at 04:34:41AM +0000, Ashish Kalra wrote:
> What's the use of notification of a partial page list, even a single
> incorrect guest page encryption status can crash the guest/migrated
> guest.

Ok, so explain to me how this looks from the user standpoint: she starts
migrating the guest, it fails to lookup an address, there's nothing
saying where it failed but the guest crashed.

Do you think this is user-friendly?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-14  7:33       ` Borislav Petkov
@ 2021-05-14  8:03         ` Paolo Bonzini
  2021-05-14  9:05           ` Ashish Kalra
  2021-05-14  9:24           ` Borislav Petkov
  0 siblings, 2 replies; 33+ messages in thread
From: Paolo Bonzini @ 2021-05-14  8:03 UTC (permalink / raw)
  To: Borislav Petkov, Ashish Kalra
  Cc: seanjc, tglx, mingo, hpa, joro, thomas.lendacky, x86, kvm,
	linux-kernel, srutherford, venu.busireddy, brijesh.singh

On 14/05/21 09:33, Borislav Petkov wrote:
> Ok, so explain to me how this looks from the user standpoint: she starts
> migrating the guest, it fails to lookup an address, there's nothing
> saying where it failed but the guest crashed.
> 
> Do you think this is user-friendly?

Ok, so explain to me how this looks from the submitter standpoint: he 
reads your review of his patch, he acknowledges your point with "Yes, it 
makes sense to signal it with a WARN or so", and still is treated as shit.

Do you think this is friendly?

Paolo


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-14  8:03         ` Paolo Bonzini
@ 2021-05-14  9:05           ` Ashish Kalra
  2021-05-14  9:34             ` Borislav Petkov
  2021-05-14  9:57             ` Paolo Bonzini
  2021-05-14  9:24           ` Borislav Petkov
  1 sibling, 2 replies; 33+ messages in thread
From: Ashish Kalra @ 2021-05-14  9:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Borislav Petkov, seanjc, tglx, mingo, hpa, joro, thomas.lendacky,
	x86, kvm, linux-kernel, srutherford, venu.busireddy,
	brijesh.singh

Hello Boris, Paolo,

On Fri, May 14, 2021 at 10:03:18AM +0200, Paolo Bonzini wrote:
> On 14/05/21 09:33, Borislav Petkov wrote:
> > Ok, so explain to me how this looks from the user standpoint: she starts
> > migrating the guest, it fails to lookup an address, there's nothing
> > saying where it failed but the guest crashed.
> > 
> > Do you think this is user-friendly?
> 
> Ok, so explain to me how this looks from the submitter standpoint: he reads
> your review of his patch, he acknowledges your point with "Yes, it makes
> sense to signal it with a WARN or so", and still is treated as shit.
> 
> Do you think this is friendly?
> 
> 

I absolutely agree with both of your point of view. But what's the
alternative ?

Ideally we should fail/stop migration even if a single guest page
encryption status cannot be notified and that should be the way to
proceed in this case, the guest kernel should notify the source
userspace VMM to block/stop migration in this case.

From a practical side, i do see Qemu's migrate_add_blocker() interface
but that looks to be a static interface and also i don't think it will
force stop an ongoing migration, is there an existing mechanism
to inform userspace VMM from kernel about blocking/stopping migration ?

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-14  8:03         ` Paolo Bonzini
  2021-05-14  9:05           ` Ashish Kalra
@ 2021-05-14  9:24           ` Borislav Petkov
  2021-05-14  9:33             ` Ashish Kalra
  1 sibling, 1 reply; 33+ messages in thread
From: Borislav Petkov @ 2021-05-14  9:24 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ashish Kalra, seanjc, tglx, mingo, hpa, joro, thomas.lendacky,
	x86, kvm, linux-kernel, srutherford, venu.busireddy,
	brijesh.singh

On Fri, May 14, 2021 at 10:03:18AM +0200, Paolo Bonzini wrote:
> Ok, so explain to me how this looks from the submitter standpoint: he reads
> your review of his patch, he acknowledges your point with "Yes, it makes
> sense to signal it with a WARN or so", and still is treated as shit.

How is me asking about the user experience of it all, treating him like
shit?!

How should I have asked this so that it is not making you think I'm
treating him like shit?

Because treating someone like shit is not in my goals.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-14  9:24           ` Borislav Petkov
@ 2021-05-14  9:33             ` Ashish Kalra
  0 siblings, 0 replies; 33+ messages in thread
From: Ashish Kalra @ 2021-05-14  9:33 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Paolo Bonzini, seanjc, tglx, mingo, hpa, joro, thomas.lendacky,
	x86, kvm, linux-kernel, srutherford, venu.busireddy,
	brijesh.singh

On Fri, May 14, 2021 at 11:24:03AM +0200, Borislav Petkov wrote:
> On Fri, May 14, 2021 at 10:03:18AM +0200, Paolo Bonzini wrote:
> > Ok, so explain to me how this looks from the submitter standpoint: he reads
> > your review of his patch, he acknowledges your point with "Yes, it makes
> > sense to signal it with a WARN or so", and still is treated as shit.
> 
> How is me asking about the user experience of it all, treating him like
> shit?!
> 
> How should I have asked this so that it is not making you think I'm
> treating him like shit?
> 
> Because treating someone like shit is not in my goals.
> 

As i mentioned in my previous reply, this has to be treated as a fatal
error from the user point of view, and the kernel needs to inform the
userspace VMM to block/stop migration as response to this failure.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-14  9:05           ` Ashish Kalra
@ 2021-05-14  9:34             ` Borislav Petkov
  2021-05-14 10:05               ` Ashish Kalra
  2021-05-14  9:57             ` Paolo Bonzini
  1 sibling, 1 reply; 33+ messages in thread
From: Borislav Petkov @ 2021-05-14  9:34 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, seanjc, tglx, mingo, hpa, joro, thomas.lendacky,
	x86, kvm, linux-kernel, srutherford, venu.busireddy,
	brijesh.singh

On Fri, May 14, 2021 at 09:05:23AM +0000, Ashish Kalra wrote:
> Ideally we should fail/stop migration even if a single guest page
> encryption status cannot be notified and that should be the way to
> proceed in this case, the guest kernel should notify the source
> userspace VMM to block/stop migration in this case.

Yap, and what I'm trying to point out here is that if the low level
machinery fails for whatever reason and it cannot recover, we should
propagate that error up the chain so that the user is aware as to why it
failed.

WARN is a good first start but in some configurations those splats don't
even get shown as people don't look at dmesg, etc.

And I think it is very important to propagate those errors properly
because there's a *lot* of moving parts involved in a guest migration
and you have encrypted memory which makes debugging this probably even
harder, etc, etc.

I hope this makes more sense.

> From a practical side, i do see Qemu's migrate_add_blocker() interface
> but that looks to be a static interface and also i don't think it will
> force stop an ongoing migration, is there an existing mechanism
> to inform userspace VMM from kernel about blocking/stopping migration ?

Hmm, so __set_memory_enc_dec() which calls
notify_addr_enc_status_changed() is called by the guest, right, when it
starts migrating.

Can an error value from it be propagated up the callchain so it can be
turned into an error messsage for the guest owner to see?

(I might be way off base here as I have no clue how the whole migration
 machinery is kicked into gear...)

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-14  9:05           ` Ashish Kalra
  2021-05-14  9:34             ` Borislav Petkov
@ 2021-05-14  9:57             ` Paolo Bonzini
  1 sibling, 0 replies; 33+ messages in thread
From: Paolo Bonzini @ 2021-05-14  9:57 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Borislav Petkov, seanjc, tglx, mingo, hpa, joro, thomas.lendacky,
	x86, kvm, linux-kernel, srutherford, venu.busireddy,
	brijesh.singh

On 14/05/21 11:05, Ashish Kalra wrote:
> I absolutely agree with both of your point of view. But what's the
> alternative ?
> 
> Ideally we should fail/stop migration even if a single guest page
> encryption status cannot be notified and that should be the way to
> proceed in this case, the guest kernel should notify the source
> userspace VMM to block/stop migration in this case.
> 
>  From a practical side, i do see Qemu's migrate_add_blocker() interface
> but that looks to be a static interface and also i don't think it will
> force stop an ongoing migration, is there an existing mechanism
> to inform userspace VMM from kernel about blocking/stopping migration ?

On the Linux side, all you need to do is WARN and write 0 to the 
MIGRATION_CONTROL MSR.

QEMU can check the MSR value when migrating the CPU registers at the 
end, and fail migration if the MSR value is 0.

Paolo


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-14  9:34             ` Borislav Petkov
@ 2021-05-14 10:05               ` Ashish Kalra
  2021-05-14 10:38                 ` Borislav Petkov
  2021-05-18  2:01                 ` Steve Rutherford
  0 siblings, 2 replies; 33+ messages in thread
From: Ashish Kalra @ 2021-05-14 10:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Paolo Bonzini, seanjc, tglx, mingo, hpa, joro, thomas.lendacky,
	x86, kvm, linux-kernel, srutherford, venu.busireddy,
	brijesh.singh

On Fri, May 14, 2021 at 11:34:32AM +0200, Borislav Petkov wrote:
> On Fri, May 14, 2021 at 09:05:23AM +0000, Ashish Kalra wrote:
> > Ideally we should fail/stop migration even if a single guest page
> > encryption status cannot be notified and that should be the way to
> > proceed in this case, the guest kernel should notify the source
> > userspace VMM to block/stop migration in this case.
> 
> Yap, and what I'm trying to point out here is that if the low level
> machinery fails for whatever reason and it cannot recover, we should
> propagate that error up the chain so that the user is aware as to why it
> failed.
> 

I totally agree.

> WARN is a good first start but in some configurations those splats don't
> even get shown as people don't look at dmesg, etc.
> 
> And I think it is very important to propagate those errors properly
> because there's a *lot* of moving parts involved in a guest migration
> and you have encrypted memory which makes debugging this probably even
> harder, etc, etc.
> 
> I hope this makes more sense.
> 
> > From a practical side, i do see Qemu's migrate_add_blocker() interface
> > but that looks to be a static interface and also i don't think it will
> > force stop an ongoing migration, is there an existing mechanism
> > to inform userspace VMM from kernel about blocking/stopping migration ?
> 
> Hmm, so __set_memory_enc_dec() which calls
> notify_addr_enc_status_changed() is called by the guest, right, when it
> starts migrating.
> 

No, actually notify_addr_enc_status_changed() is called whenever a range
of memory is marked as encrypted or decrypted, so it has nothing to do
with migration as such. 

This is basically modifying the encryption attributes on the page tables
and correspondingly also making the hypercall to inform the hypervisor about
page status encryption changes. The hypervisor will use this information
during an ongoing or future migration, so this information is maintained
even though migration might never be initiated here.

> Can an error value from it be propagated up the callchain so it can be
> turned into an error messsage for the guest owner to see?
> 

The error value cannot be propogated up the callchain directly
here, but one possibility is to leverage the hypercall and use Sean's
proposed hypercall interface to notify the host/hypervisor to block/stop
any future/ongoing migration.

Or as from Paolo's response, writing 0 to MIGRATION_CONTROL MSR seems
more ideal.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-14 10:05               ` Ashish Kalra
@ 2021-05-14 10:38                 ` Borislav Petkov
  2021-05-18  2:01                 ` Steve Rutherford
  1 sibling, 0 replies; 33+ messages in thread
From: Borislav Petkov @ 2021-05-14 10:38 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Paolo Bonzini, seanjc, tglx, mingo, hpa, joro, thomas.lendacky,
	x86, kvm, linux-kernel, srutherford, venu.busireddy,
	brijesh.singh

On Fri, May 14, 2021 at 10:05:19AM +0000, Ashish Kalra wrote:
> No, actually notify_addr_enc_status_changed() is called whenever a range
> of memory is marked as encrypted or decrypted, so it has nothing to do
> with migration as such. 
> 
> This is basically modifying the encryption attributes on the page tables
> and correspondingly also making the hypercall to inform the hypervisor about
> page status encryption changes. The hypervisor will use this information
> during an ongoing or future migration, so this information is maintained
> even though migration might never be initiated here.

Doh, ofcourse. This doesn't make it easier.

> The error value cannot be propogated up the callchain directly
> here,

Yeah, my thinking was way wrong here - sorry about that.

> but one possibility is to leverage the hypercall and use Sean's
> proposed hypercall interface to notify the host/hypervisor to block/stop
> any future/ongoing migration.
>
> Or as from Paolo's response, writing 0 to MIGRATION_CONTROL MSR seems
> more ideal.

Ok.

So to sum up: notify_addr_enc_status_changed() should warn but not
because of migration but because regardless, we should tell the users
when page enc attributes updating fails as that is potentially hinting
at a bigger problem so we better make sufficient noise here.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-14 10:05               ` Ashish Kalra
  2021-05-14 10:38                 ` Borislav Petkov
@ 2021-05-18  2:01                 ` Steve Rutherford
  2021-05-19 12:06                   ` Ashish Kalra
  1 sibling, 1 reply; 33+ messages in thread
From: Steve Rutherford @ 2021-05-18  2:01 UTC (permalink / raw)
  To: Ashish Kalra
  Cc: Borislav Petkov, Paolo Bonzini, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joerg Roedel,
	Tom Lendacky, X86 ML, KVM list, LKML, Venu Busireddy,
	Brijesh Singh

On Fri, May 14, 2021 at 3:05 AM Ashish Kalra <ashish.kalra@amd.com> wrote:
>
> On Fri, May 14, 2021 at 11:34:32AM +0200, Borislav Petkov wrote:
> > On Fri, May 14, 2021 at 09:05:23AM +0000, Ashish Kalra wrote:
> > > Ideally we should fail/stop migration even if a single guest page
> > > encryption status cannot be notified and that should be the way to
> > > proceed in this case, the guest kernel should notify the source
> > > userspace VMM to block/stop migration in this case.
> >
> > Yap, and what I'm trying to point out here is that if the low level
> > machinery fails for whatever reason and it cannot recover, we should
> > propagate that error up the chain so that the user is aware as to why it
> > failed.
> >
>
> I totally agree.
>
> > WARN is a good first start but in some configurations those splats don't
> > even get shown as people don't look at dmesg, etc.
> >
> > And I think it is very important to propagate those errors properly
> > because there's a *lot* of moving parts involved in a guest migration
> > and you have encrypted memory which makes debugging this probably even
> > harder, etc, etc.
> >
> > I hope this makes more sense.
> >
> > > From a practical side, i do see Qemu's migrate_add_blocker() interface
> > > but that looks to be a static interface and also i don't think it will
> > > force stop an ongoing migration, is there an existing mechanism
> > > to inform userspace VMM from kernel about blocking/stopping migration ?
> >
> > Hmm, so __set_memory_enc_dec() which calls
> > notify_addr_enc_status_changed() is called by the guest, right, when it
> > starts migrating.
> >
>
> No, actually notify_addr_enc_status_changed() is called whenever a range
> of memory is marked as encrypted or decrypted, so it has nothing to do
> with migration as such.
>
> This is basically modifying the encryption attributes on the page tables
> and correspondingly also making the hypercall to inform the hypervisor about
> page status encryption changes. The hypervisor will use this information
> during an ongoing or future migration, so this information is maintained
> even though migration might never be initiated here.
>
> > Can an error value from it be propagated up the callchain so it can be
> > turned into an error messsage for the guest owner to see?
> >
>
> The error value cannot be propogated up the callchain directly
> here, but one possibility is to leverage the hypercall and use Sean's
> proposed hypercall interface to notify the host/hypervisor to block/stop
> any future/ongoing migration.
>
> Or as from Paolo's response, writing 0 to MIGRATION_CONTROL MSR seems
> more ideal.
>
> Thanks,
> Ashish

How realistic is this type of failure? If you've gotten this deep, it
seems like something has gone very wrong if the memory you are about
to mark as shared (or encrypted) doesn't exist and isn't mapped. In
particular, is the kernel going to page fault when it tries to
reinitialize the page it's currently changing the c-bit of? From what
I recall, most paths that do either set_memory_encrypted or
set_memory_decrypted memset the region being toggled. Note: dma_pool
doesn't immediately memset, but the VA it's providing to set_decrypted
is freshly fetched from a recently allocated region (something has to
have gone pretty wrong if this is invalid if I'm not mistaken. No one
would think twice if you wrote to that freshly allocated page).

The reason I mention this is that SEV migration is going to be the
least of your concerns if you are already on a one-way train towards a
Kernel oops. I'm not certain I would go so far as to say this should
BUG() instead (I think the page fault on access might be easier to
debug a BUG here), but I'm pretty skeptical that the kernel is going
to do too well if it doesn't know if its kernel VAs are valid.

If, despite the above, we expect to infrequently-but-not-never disable
migration with no intention of reenabling it, we should signal it
differently than we currently signal migration enablement. Currently,
if you toggle migration from on to off there is an implication that
you are about to reboot, and you are only ephemerally unable to
migrate. Having permanent disablement be indistinguishable from a
really long reboot is a recipe for a really sad long timeout in
userspace.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-18  2:01                 ` Steve Rutherford
@ 2021-05-19 12:06                   ` Ashish Kalra
  2021-05-19 13:44                     ` Paolo Bonzini
  0 siblings, 1 reply; 33+ messages in thread
From: Ashish Kalra @ 2021-05-19 12:06 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Borislav Petkov, Paolo Bonzini, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joerg Roedel,
	Tom Lendacky, X86 ML, KVM list, LKML, Venu Busireddy,
	Brijesh Singh

On Mon, May 17, 2021 at 07:01:09PM -0700, Steve Rutherford wrote:
> On Fri, May 14, 2021 at 3:05 AM Ashish Kalra <ashish.kalra@amd.com> wrote:
> >
> > On Fri, May 14, 2021 at 11:34:32AM +0200, Borislav Petkov wrote:
> > > On Fri, May 14, 2021 at 09:05:23AM +0000, Ashish Kalra wrote:
> > > > Ideally we should fail/stop migration even if a single guest page
> > > > encryption status cannot be notified and that should be the way to
> > > > proceed in this case, the guest kernel should notify the source
> > > > userspace VMM to block/stop migration in this case.
> > >
> > > Yap, and what I'm trying to point out here is that if the low level
> > > machinery fails for whatever reason and it cannot recover, we should
> > > propagate that error up the chain so that the user is aware as to why it
> > > failed.
> > >
> >
> > I totally agree.
> >
> > > WARN is a good first start but in some configurations those splats don't
> > > even get shown as people don't look at dmesg, etc.
> > >
> > > And I think it is very important to propagate those errors properly
> > > because there's a *lot* of moving parts involved in a guest migration
> > > and you have encrypted memory which makes debugging this probably even
> > > harder, etc, etc.
> > >
> > > I hope this makes more sense.
> > >
> > > > From a practical side, i do see Qemu's migrate_add_blocker() interface
> > > > but that looks to be a static interface and also i don't think it will
> > > > force stop an ongoing migration, is there an existing mechanism
> > > > to inform userspace VMM from kernel about blocking/stopping migration ?
> > >
> > > Hmm, so __set_memory_enc_dec() which calls
> > > notify_addr_enc_status_changed() is called by the guest, right, when it
> > > starts migrating.
> > >
> >
> > No, actually notify_addr_enc_status_changed() is called whenever a range
> > of memory is marked as encrypted or decrypted, so it has nothing to do
> > with migration as such.
> >
> > This is basically modifying the encryption attributes on the page tables
> > and correspondingly also making the hypercall to inform the hypervisor about
> > page status encryption changes. The hypervisor will use this information
> > during an ongoing or future migration, so this information is maintained
> > even though migration might never be initiated here.
> >
> > > Can an error value from it be propagated up the callchain so it can be
> > > turned into an error messsage for the guest owner to see?
> > >
> >
> > The error value cannot be propogated up the callchain directly
> > here, but one possibility is to leverage the hypercall and use Sean's
> > proposed hypercall interface to notify the host/hypervisor to block/stop
> > any future/ongoing migration.
> >
> > Or as from Paolo's response, writing 0 to MIGRATION_CONTROL MSR seems
> > more ideal.
> >
> > Thanks,
> > Ashish
> 
> How realistic is this type of failure? If you've gotten this deep, it
> seems like something has gone very wrong if the memory you are about
> to mark as shared (or encrypted) doesn't exist and isn't mapped. In
> particular, is the kernel going to page fault when it tries to
> reinitialize the page it's currently changing the c-bit of? From what
> I recall, most paths that do either set_memory_encrypted or
> set_memory_decrypted memset the region being toggled. Note: dma_pool
> doesn't immediately memset, but the VA it's providing to set_decrypted
> is freshly fetched from a recently allocated region (something has to
> have gone pretty wrong if this is invalid if I'm not mistaken. No one
> would think twice if you wrote to that freshly allocated page).
> 
> The reason I mention this is that SEV migration is going to be the
> least of your concerns if you are already on a one-way train towards a
> Kernel oops. I'm not certain I would go so far as to say this should
> BUG() instead (I think the page fault on access might be easier to
> debug a BUG here), but I'm pretty skeptical that the kernel is going
> to do too well if it doesn't know if its kernel VAs are valid.
> 
> If, despite the above, we expect to infrequently-but-not-never disable
> migration with no intention of reenabling it, we should signal it
> differently than we currently signal migration enablement. Currently,
> if you toggle migration from on to off there is an implication that
> you are about to reboot, and you are only ephemerally unable to
> migrate. Having permanent disablement be indistinguishable from a
> really long reboot is a recipe for a really sad long timeout in
> userspace.

Also looking at set_memory_encrypted and set_memory_decrypted usage
patterns, the persistent decrypted regions like the dma pool,
bss_decrypted, etc., will be setup at boot time. Later the unused
bss_decrypted section will be freed and set_memory_encrypted called on
the freed memory at end of kernel boot.

Most of the runtime set_memory_encrypted and set_memory_decrypted calls
will be on dynamically allocated dma buffers via dma_alloc_coherent()
and dma_free_coherent().

For example, A dma_alloc_coherent() request will allocate pages and then call
set_memory_decrypted() against them. When dma_free_coherent() is called,
set_memory_encrypted() is called against the pages about to be freed 
before they are actually freed.

Now these buffers have very short life and only used for immediate I/O
and then freed, so they may not be of major concern for SEV
migration ?

So disabling migration for failure of address lookup or mapping failures
on such pages will really be an overkill. 

Might be in favor of Steve's thoughts above of doing a BUG() here
instead.

Thanks,
Ashish

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-19 12:06                   ` Ashish Kalra
@ 2021-05-19 13:44                     ` Paolo Bonzini
  0 siblings, 0 replies; 33+ messages in thread
From: Paolo Bonzini @ 2021-05-19 13:44 UTC (permalink / raw)
  To: Ashish Kalra, Steve Rutherford
  Cc: Borislav Petkov, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Joerg Roedel, Tom Lendacky, X86 ML,
	KVM list, LKML, Venu Busireddy, Brijesh Singh

On 19/05/21 14:06, Ashish Kalra wrote:
> Now these buffers have very short life and only used for immediate I/O
> and then freed, so they may not be of major concern for SEV
> migration ?

Well, they are a concern because they do break migration.  But it may be 
indeed good enough to just have a WARN ("bad things may happen and you 
get to keep both pieces") and not disable future migration.

A BUG must always be avoided unless you're sure that something *worse* 
will happen in the future, e.g. a BUG is acceptable if you have detected 
a use-after-free or a dangling pointer.  This is not the case.

Paolo

> So disabling migration for failure of address lookup or mapping failures
> on such pages will really be an overkill.
> Might be in favor of Steve's thoughts above of doing a BUG() here
> instead.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-12 13:15   ` Borislav Petkov
  2021-05-12 15:51     ` Sean Christopherson
  2021-05-13  4:34     ` Ashish Kalra
@ 2021-05-19 23:29     ` Andy Lutomirski
  2021-05-19 23:44       ` Sean Christopherson
  2 siblings, 1 reply; 33+ messages in thread
From: Andy Lutomirski @ 2021-05-19 23:29 UTC (permalink / raw)
  To: Borislav Petkov, Ashish Kalra, Sean Christopherson
  Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joerg Roedel, thomas.lendacky, the arch/x86 maintainers,
	kvm list, Linux Kernel Mailing List, srutherford, venu.busireddy,
	brijesh.singh

On Wed, May 12, 2021, at 6:15 AM, Borislav Petkov wrote:
> On Fri, Apr 23, 2021 at 03:58:43PM +0000, Ashish Kalra wrote:
> > +static inline void notify_page_enc_status_changed(unsigned long pfn,
> > +						  int npages, bool enc)
> > +{
> > +	PVOP_VCALL3(mmu.notify_page_enc_status_changed, pfn, npages, enc);
> > +}
> 
> Now the question is whether something like that is needed for TDX, and,
> if so, could it be shared by both.

The TDX MapGPA call can fail, and presumably it will fail if the page is not sufficiently quiescent from the host's perspective.  It seems like a mistake to me to have a KVM-specific hypercall for this that cannot cleanly fail.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
  2021-05-19 23:29     ` Andy Lutomirski
@ 2021-05-19 23:44       ` Sean Christopherson
  0 siblings, 0 replies; 33+ messages in thread
From: Sean Christopherson @ 2021-05-19 23:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Borislav Petkov, Ashish Kalra, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Joerg Roedel, thomas.lendacky,
	the arch/x86 maintainers, kvm list, Linux Kernel Mailing List,
	srutherford, venu.busireddy, brijesh.singh

On Wed, May 19, 2021, Andy Lutomirski wrote:
> On Wed, May 12, 2021, at 6:15 AM, Borislav Petkov wrote:
> > On Fri, Apr 23, 2021 at 03:58:43PM +0000, Ashish Kalra wrote:
> > > +static inline void notify_page_enc_status_changed(unsigned long pfn,
> > > +						  int npages, bool enc)
> > > +{
> > > +	PVOP_VCALL3(mmu.notify_page_enc_status_changed, pfn, npages, enc);
> > > +}
> > 
> > Now the question is whether something like that is needed for TDX, and,
> > if so, could it be shared by both.
> 
> The TDX MapGPA call can fail, and presumably it will fail if the page is not
> sufficiently quiescent from the host's perspective.

Barring a guest bug, e.g. requesting a completely non-existent page, MapGPA
shouldn't fail.  The example in the the GHCI:

  Invalid operand – for example, the GPA may be already mapped as a shared page.

makes no sense to me.  An already-mapped page would be an -EBUSY style error,
not an invalid operand, and IIRC, I explicitly lobbied against allowing the VMM
to return "try again" precisely because it's impossible for the guest to handle
in a sane manner.  If the physical page is in a state that requires stalling the
vCPU, then the VMM is supposed to do exactly that, not punt the problem to the
guest.

Maybe we should get stronger language into the GHCI?

> It seems like a mistake to me to have a KVM-specific hypercall for this that
> cannot cleanly fail.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2021-05-19 23:44 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-23 15:57 [PATCH v2 0/4] Add guest support for SEV live migration Ashish Kalra
2021-04-23 15:58 ` [PATCH v2 1/4] KVM: x86: invert KVM_HYPERCALL to default to VMMCALL Ashish Kalra
2021-04-23 16:31   ` Jim Mattson
2021-04-23 17:44     ` Sean Christopherson
2021-04-23 15:58 ` [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed Ashish Kalra
2021-05-12 13:15   ` Borislav Petkov
2021-05-12 15:51     ` Sean Christopherson
2021-05-12 16:23       ` Borislav Petkov
2021-05-13  6:57       ` Ashish Kalra
2021-05-13  8:40         ` Paolo Bonzini
2021-05-13 13:49       ` Tom Lendacky
2021-05-13  4:34     ` Ashish Kalra
2021-05-14  7:33       ` Borislav Petkov
2021-05-14  8:03         ` Paolo Bonzini
2021-05-14  9:05           ` Ashish Kalra
2021-05-14  9:34             ` Borislav Petkov
2021-05-14 10:05               ` Ashish Kalra
2021-05-14 10:38                 ` Borislav Petkov
2021-05-18  2:01                 ` Steve Rutherford
2021-05-19 12:06                   ` Ashish Kalra
2021-05-19 13:44                     ` Paolo Bonzini
2021-05-14  9:57             ` Paolo Bonzini
2021-05-14  9:24           ` Borislav Petkov
2021-05-14  9:33             ` Ashish Kalra
2021-05-19 23:29     ` Andy Lutomirski
2021-05-19 23:44       ` Sean Christopherson
2021-04-23 15:59 ` [PATCH v2 3/4] EFI: Introduce the new AMD Memory Encryption GUID Ashish Kalra
2021-05-12 13:19   ` Borislav Petkov
2021-05-12 14:53     ` Ard Biesheuvel
2021-05-13  4:36       ` Ashish Kalra
2021-04-23 15:59 ` [PATCH v2 4/4] x86/kvm: Add guest support for detecting and enabling SEV Live Migration feature Ashish Kalra
2021-04-30  7:19 ` [PATCH v2 0/4] Add guest support for SEV live migration Ashish Kalra
2021-04-30  7:40   ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).