linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv
@ 2023-05-15 16:59 Tianyu Lan
  2023-05-15 16:59 ` [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler Tianyu Lan
                   ` (13 more replies)
  0 siblings, 14 replies; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Tianyu Lan <tiala@microsoft.com>

This patchset is to add AMD sev-snp enlightened guest
support on hyperv. Hyperv uses Linux direct boot mode
to boot up Linux kernel and so it needs to pvalidate
system memory by itself.

In hyperv case, there is no boot loader and so cc blob
is prepared by hypervisor. In this series, hypervisor
set the cc blob address directly into boot parameter
of Linux kernel.

Shared memory between guests and hypervisor should be
decrypted and zero memory after decrypt memory. The data
in the target address. It maybe smearedto avoid smearing
data.

Introduce #HV exception support in AMD sev snp code and
#HV handler.

Change since v6:
       - Merge Ashish Kalr patch https://github.com/
       ashkalra/linux/commit/6975484094b7cb8d703c45066780dd85043cd040
       - Merge patch "x86/sev: Fix interrupt exit code paths from
        #HV exception" with patch "x86/sev: Add AMD sev-snp enlightened guest
	 support on hyperv".
       - Fix getting processor num in the hv_snp_get_smp_config() when ealry is false.

Change since v4:
       - Use pgcount to free intput arg page.
       - Fix encrypt and free page order.
       - struct_size to calculate array size
       - Share asm code between #HV and #VC exception.

Change since v3:
       - Replace struct sev_es_save_area with struct vmcb_save_area
       - Move smp, cpu and memory enumerating code from mshyperv.c to ivm.c
       - Handle nested entry case of do_exc_hv() case.
       - Check NMI event when irq is disabled

Change since v2:
       - Remove validate kernel memory code at boot stage
       - Split #HV page patch into two parts
       - Remove HV-APIC change due to enable x2apic from
       	 host side
       - Rework vmbus code to handle error of decrypt page
       - Spilt memory and cpu initialization patch. 
Change since v1:
       - Remove boot param changes for cc blob address and
       use setup head to pass cc blob info
       - Remove unnessary WARN and BUG check
       - Add system vector table map in the #HV exception
       - Fix interrupt exit issue when use #HV exception

Ashish Kalra (1):
  x86/sev: optimize system vector processing invoked from #HV exception

Tianyu Lan (13):
  x86/sev: Add a #HV exception handler
  x86/sev: Add Check of #HV event in path
  x86/sev: Add AMD sev-snp enlightened guest support on hyperv
  x86/hyperv: Add sev-snp enlightened guest static key
  x86/hyperv: Mark Hyper-V vp assist page unencrypted in SEV-SNP
    enlightened guest
  x86/hyperv: Set Virtual Trust Level in VMBus init message
  x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp
    enlightened guest
  clocksource/drivers/hyper-v: decrypt hyperv tsc page in sev-snp
    enlightened guest
  hv: vmbus: Mask VMBus pages unencrypted for sev-snp enlightened guest
  drivers: hv: Decrypt percpu hvcall input arg page in sev-snp
    enlightened guest
  x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest
  x86/hyperv: Add smp support for sev-snp guest
  x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES

 arch/x86/entry/entry_64.S             |  46 ++-
 arch/x86/hyperv/hv_init.c             |  42 +++
 arch/x86/hyperv/ivm.c                 | 196 +++++++++++++
 arch/x86/include/asm/cpu_entry_area.h |   6 +
 arch/x86/include/asm/hyperv-tlfs.h    |   7 +
 arch/x86/include/asm/idtentry.h       |  52 +++-
 arch/x86/include/asm/irqflags.h       |  14 +-
 arch/x86/include/asm/mem_encrypt.h    |   2 +
 arch/x86/include/asm/mshyperv.h       |  74 ++++-
 arch/x86/include/asm/page_64_types.h  |   1 +
 arch/x86/include/asm/trapnr.h         |   1 +
 arch/x86/include/asm/traps.h          |   1 +
 arch/x86/include/uapi/asm/svm.h       |   4 +
 arch/x86/kernel/cpu/common.c          |   1 +
 arch/x86/kernel/cpu/mshyperv.c        |  42 ++-
 arch/x86/kernel/dumpstack_64.c        |   9 +-
 arch/x86/kernel/idt.c                 |   1 +
 arch/x86/kernel/sev.c                 | 404 +++++++++++++++++++++++---
 arch/x86/kernel/traps.c               |  60 ++++
 arch/x86/kernel/vmlinux.lds.S         |   7 +
 arch/x86/mm/cpu_entry_area.c          |   2 +
 drivers/clocksource/hyperv_timer.c    |   2 +-
 drivers/hv/connection.c               |   1 +
 drivers/hv/hv.c                       |  37 ++-
 drivers/hv/hv_common.c                |  27 +-
 include/asm-generic/hyperv-tlfs.h     |   3 +-
 include/asm-generic/mshyperv.h        |   1 +
 include/linux/hyperv.h                |   4 +-
 28 files changed, 960 insertions(+), 87 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler
  2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
@ 2023-05-15 16:59 ` Tianyu Lan
  2023-05-16  9:30   ` Peter Zijlstra
  2023-05-15 16:59 ` [RFC PATCH V6 02/14] x86/sev: Add Check of #HV event in path Tianyu Lan
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Tianyu Lan <tiala@microsoft.com>

Add a #HV exception handler that uses IST stack.

Co-developed-by: Kalra Ashish <ashish.kalra@amd.com>
Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
Change since RFC V5:
       * Merge Ashish Kalr patch https://github.com/
       ashkalra/linux/commit/6975484094b7cb8d703c45066780dd85043cd040

Change since RFC V2:
       * Remove unnecessary line in the change log.
---
 arch/x86/entry/entry_64.S             | 22 ++++++----
 arch/x86/include/asm/cpu_entry_area.h |  6 +++
 arch/x86/include/asm/idtentry.h       | 40 +++++++++++++++++-
 arch/x86/include/asm/page_64_types.h  |  1 +
 arch/x86/include/asm/trapnr.h         |  1 +
 arch/x86/include/asm/traps.h          |  1 +
 arch/x86/kernel/cpu/common.c          |  1 +
 arch/x86/kernel/dumpstack_64.c        |  9 ++++-
 arch/x86/kernel/idt.c                 |  1 +
 arch/x86/kernel/sev.c                 | 53 ++++++++++++++++++++++++
 arch/x86/kernel/traps.c               | 58 +++++++++++++++++++++++++++
 arch/x86/mm/cpu_entry_area.c          |  2 +
 12 files changed, 183 insertions(+), 12 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index eccc3431e515..653b1f10699b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -496,7 +496,7 @@ SYM_CODE_END(\asmsym)
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 /**
- * idtentry_vc - Macro to generate entry stub for #VC
+ * idtentry_sev - Macro to generate entry stub for #VC
  * @vector:		Vector number
  * @asmsym:		ASM symbol for the entry point
  * @cfunc:		C function to be called
@@ -515,14 +515,18 @@ SYM_CODE_END(\asmsym)
  *
  * The macro is only used for one vector, but it is planned to be extended in
  * the future for the #HV exception.
- */
-.macro idtentry_vc vector asmsym cfunc
+*/
+.macro idtentry_sev vector asmsym cfunc has_error_code:req
 SYM_CODE_START(\asmsym)
 	UNWIND_HINT_IRET_REGS
 	ENDBR
 	ASM_CLAC
 	cld
 
+	.if \vector == X86_TRAP_HV
+		pushq	$-1			/* ORIG_RAX: no syscall */
+	.endif
+
 	/*
 	 * If the entry is from userspace, switch stacks and treat it as
 	 * a normal entry.
@@ -545,7 +549,12 @@ SYM_CODE_START(\asmsym)
 	 * stack.
 	 */
 	movq	%rsp, %rdi		/* pt_regs pointer */
-	call	vc_switch_off_ist
+	.if \vector == X86_TRAP_VC
+		call	vc_switch_off_ist
+	.else
+		call	hv_switch_off_ist	
+	.endif
+
 	movq	%rax, %rsp		/* Switch to new stack */
 
 	ENCODE_FRAME_POINTER
@@ -568,10 +577,7 @@ SYM_CODE_START(\asmsym)
 
 	/* Switch to the regular task stack */
 .Lfrom_usermode_switch_stack_\@:
-	idtentry_body user_\cfunc, has_error_code=1
-
-_ASM_NOKPROBE(\asmsym)
-SYM_CODE_END(\asmsym)
+	idtentry_body user_\cfunc, \has_error_code
 .endm
 #endif
 
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index 462fc34f1317..2186ed601b4a 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -30,6 +30,10 @@
 	char	VC_stack[optional_stack_size];			\
 	char	VC2_stack_guard[guardsize];			\
 	char	VC2_stack[optional_stack_size];			\
+	char	HV_stack_guard[guardsize];			\
+	char	HV_stack[optional_stack_size];			\
+	char	HV2_stack_guard[guardsize];			\
+	char	HV2_stack[optional_stack_size];			\
 	char	IST_top_guard[guardsize];			\
 
 /* The exception stacks' physical storage. No guard pages required */
@@ -52,6 +56,8 @@ enum exception_stack_ordering {
 	ESTACK_MCE,
 	ESTACK_VC,
 	ESTACK_VC2,
+	ESTACK_HV,
+	ESTACK_HV2,
 	N_EXCEPTION_STACKS
 };
 
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index b241af4ce9b4..b0f3501b2767 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -317,6 +317,19 @@ static __always_inline void __##func(struct pt_regs *regs)
 	__visible noinstr void kernel_##func(struct pt_regs *regs, unsigned long error_code);	\
 	__visible noinstr void   user_##func(struct pt_regs *regs, unsigned long error_code)
 
+
+/**
+ * DECLARE_IDTENTRY_HV - Declare functions for the HV entry point
+ * @vector:	Vector number (ignored for C)
+ * @func:	Function name of the entry point
+ *
+ * Maps to DECLARE_IDTENTRY_RAW, but declares also the user C handler.
+ */
+#define DECLARE_IDTENTRY_HV(vector, func)				\
+	DECLARE_IDTENTRY_RAW_ERRORCODE(vector, func);			\
+	__visible noinstr void kernel_##func(struct pt_regs *regs);	\
+	__visible noinstr void   user_##func(struct pt_regs *regs)
+
 /**
  * DEFINE_IDTENTRY_IST - Emit code for IST entry points
  * @func:	Function name of the entry point
@@ -376,6 +389,26 @@ static __always_inline void __##func(struct pt_regs *regs)
 #define DEFINE_IDTENTRY_VC_USER(func)				\
 	DEFINE_IDTENTRY_RAW_ERRORCODE(user_##func)
 
+/**
+ * DEFINE_IDTENTRY_HV_KERNEL - Emit code for HV injection handler
+ *			       when raised from kernel mode
+ * @func:	Function name of the entry point
+ *
+ * Maps to DEFINE_IDTENTRY_RAW
+ */
+#define DEFINE_IDTENTRY_HV_KERNEL(func)					\
+	DEFINE_IDTENTRY_RAW(kernel_##func)
+
+/**
+ * DEFINE_IDTENTRY_HV_USER - Emit code for HV injection handler
+ *			     when raised from user mode
+ * @func:	Function name of the entry point
+ *
+ * Maps to DEFINE_IDTENTRY_RAW
+ */
+#define DEFINE_IDTENTRY_HV_USER(func)					\
+	DEFINE_IDTENTRY_RAW(user_##func)
+
 #else	/* CONFIG_X86_64 */
 
 /**
@@ -463,8 +496,10 @@ __visible noinstr void func(struct pt_regs *regs,			\
 	DECLARE_IDTENTRY(vector, func)
 
 # define DECLARE_IDTENTRY_VC(vector, func)				\
-	idtentry_vc vector asm_##func func
+	idtentry_sev vector asm_##func func has_error_code=1
 
+# define DECLARE_IDTENTRY_HV(vector, func)				\
+	idtentry_sev vector asm_##func func has_error_code=0
 #else
 # define DECLARE_IDTENTRY_MCE(vector, func)				\
 	DECLARE_IDTENTRY(vector, func)
@@ -618,9 +653,10 @@ DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF,	xenpv_exc_double_fault);
 DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP,	exc_control_protection);
 #endif
 
-/* #VC */
+/* #VC & #HV */
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 DECLARE_IDTENTRY_VC(X86_TRAP_VC,	exc_vmm_communication);
+DECLARE_IDTENTRY_HV(X86_TRAP_HV,	exc_hv_injection);
 #endif
 
 #ifdef CONFIG_XEN_PV
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index e9e2c3ba5923..0bd7dab676c5 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -29,6 +29,7 @@
 #define	IST_INDEX_DB		2
 #define	IST_INDEX_MCE		3
 #define	IST_INDEX_VC		4
+#define	IST_INDEX_HV		5
 
 /*
  * Set __PAGE_OFFSET to the most negative possible address +
diff --git a/arch/x86/include/asm/trapnr.h b/arch/x86/include/asm/trapnr.h
index f5d2325aa0b7..c6583631cecb 100644
--- a/arch/x86/include/asm/trapnr.h
+++ b/arch/x86/include/asm/trapnr.h
@@ -26,6 +26,7 @@
 #define X86_TRAP_XF		19	/* SIMD Floating-Point Exception */
 #define X86_TRAP_VE		20	/* Virtualization Exception */
 #define X86_TRAP_CP		21	/* Control Protection Exception */
+#define X86_TRAP_HV		28	/* HV injected exception in SNP restricted mode */
 #define X86_TRAP_VC		29	/* VMM Communication Exception */
 #define X86_TRAP_IRET		32	/* IRET Exception */
 
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 47ecfff2c83d..6795d3e517d6 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -16,6 +16,7 @@ asmlinkage __visible notrace
 struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs);
 void __init trap_init(void);
 asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *eregs);
+asmlinkage __visible noinstr struct pt_regs *hv_switch_off_ist(struct pt_regs *eregs);
 #endif
 
 extern bool ibt_selftest(void);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8cd4126d8253..5bc44bcf6e48 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2172,6 +2172,7 @@ static inline void tss_setup_ist(struct tss_struct *tss)
 	tss->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
 	/* Only mapped when SEV-ES is active */
 	tss->x86_tss.ist[IST_INDEX_VC] = __this_cpu_ist_top_va(VC);
+	tss->x86_tss.ist[IST_INDEX_HV] = __this_cpu_ist_top_va(HV);
 }
 
 #else /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index f05339fee778..6d8f8864810c 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -26,11 +26,14 @@ static const char * const exception_stack_names[] = {
 		[ ESTACK_MCE	]	= "#MC",
 		[ ESTACK_VC	]	= "#VC",
 		[ ESTACK_VC2	]	= "#VC2",
+		[ ESTACK_HV	]	= "#HV",
+		[ ESTACK_HV2	]	= "#HV2",
+		
 };
 
 const char *stack_type_name(enum stack_type type)
 {
-	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 8);
 
 	if (type == STACK_TYPE_TASK)
 		return "TASK";
@@ -89,6 +92,8 @@ struct estack_pages estack_pages[CEA_ESTACK_PAGES] ____cacheline_aligned = {
 	EPAGERANGE(MCE),
 	EPAGERANGE(VC),
 	EPAGERANGE(VC2),
+	EPAGERANGE(HV),
+	EPAGERANGE(HV2),
 };
 
 static __always_inline bool in_exception_stack(unsigned long *stack, struct stack_info *info)
@@ -98,7 +103,7 @@ static __always_inline bool in_exception_stack(unsigned long *stack, struct stac
 	struct pt_regs *regs;
 	unsigned int k;
 
-	BUILD_BUG_ON(N_EXCEPTION_STACKS != 6);
+	BUILD_BUG_ON(N_EXCEPTION_STACKS != 8);
 
 	begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
 	/*
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index a58c6bc1cd68..48c0a7e1dbcb 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -113,6 +113,7 @@ static const __initconst struct idt_data def_idts[] = {
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 	ISTG(X86_TRAP_VC,		asm_exc_vmm_communication, IST_INDEX_VC),
+	ISTG(X86_TRAP_HV,		asm_exc_hv_injection, IST_INDEX_HV),
 #endif
 
 	SYSG(X86_TRAP_OF,		asm_exc_overflow),
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index b031244d6d2d..e25445de0957 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -2006,6 +2006,59 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
 	irqentry_exit_to_user_mode(regs);
 }
 
+static bool hv_raw_handle_exception(struct pt_regs *regs)
+{
+	return false;
+}
+
+static __always_inline bool on_hv_fallback_stack(struct pt_regs *regs)
+{
+	unsigned long sp = (unsigned long)regs;
+
+	return (sp >= __this_cpu_ist_bottom_va(HV2) && sp < __this_cpu_ist_top_va(HV2));
+}
+
+DEFINE_IDTENTRY_HV_USER(exc_hv_injection)
+{
+	irqentry_enter_from_user_mode(regs);
+	instrumentation_begin();
+
+	if (!hv_raw_handle_exception(regs)) {
+		/*
+		 * Do not kill the machine if user-space triggered the
+		 * exception. Send SIGBUS instead and let user-space deal
+		 * with it.
+		 */
+		force_sig_fault(SIGBUS, BUS_OBJERR, (void __user *)0);
+	}
+
+	instrumentation_end();
+	irqentry_exit_to_user_mode(regs);
+}
+
+DEFINE_IDTENTRY_HV_KERNEL(exc_hv_injection)
+{
+	irqentry_state_t irq_state;
+
+	irq_state = irqentry_enter(regs);
+	instrumentation_begin();
+
+	if (!hv_raw_handle_exception(regs)) {
+		pr_emerg("PANIC: Unhandled #HV exception in kernel space\n");
+
+		/* Show some debug info */
+		show_regs(regs);
+
+		/* Ask hypervisor to sev_es_terminate */
+		sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+
+		panic("Returned from Terminate-Request to Hypervisor\n");
+	}
+
+	instrumentation_end();
+	irqentry_exit(regs, irq_state);
+}
+
 bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
 {
 	unsigned long exit_code = regs->orig_ax;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index d317dc3d06a3..5dca05d0fa38 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -905,6 +905,64 @@ asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *r
 
 	return regs_ret;
 }
+
+asmlinkage __visible noinstr struct pt_regs *hv_switch_off_ist(struct pt_regs *regs)
+{
+	unsigned long sp, *stack;
+	struct stack_info info;
+	struct pt_regs *regs_ret;
+
+	/*
+	 * In the SYSCALL entry path the RSP value comes from user-space - don't
+	 * trust it and switch to the current kernel stack
+	 */
+	if (ip_within_syscall_gap(regs)) {
+		sp = this_cpu_read(pcpu_hot.top_of_stack);
+		goto sync;
+	}
+
+	/*
+	 * From here on the RSP value is trusted. Now check whether entry
+	 * happened from a safe stack. Not safe are the entry or unknown stacks,
+	 * use the fall-back stack instead in this case.
+	 */
+	sp    = regs->sp;
+	stack = (unsigned long *)sp;
+
+	/*
+	 * We support nested #HV exceptions once the IST stack is
+	 * switched out. The HV can always inject an #HV, but as per
+	 * GHCB specs, the HV will not inject another #HV, if
+	 * PendingEvent.NoFurtherSignal is set and we only clear this
+	 * after switching out the IST stack and handling the current
+	 * #HV. But there is still a window before the IST stack is
+	 * switched out, where a malicious HV can inject nested #HV.
+	 * The code below checks the interrupted stack to check if
+	 * it is the IST stack, and if so panic as this is
+	 * not supported and this nested #HV would have corrupted
+	 * the iret frame of the previous #HV on the IST stack.
+	 */
+	if (get_stack_info_noinstr(stack, current, &info) &&
+	    (info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV) ||
+	     info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV2)))
+		panic("Nested #HV exception, HV IST corrupted, stack type = %d\n", info.type);
+
+	if (!get_stack_info_noinstr(stack, current, &info) || info.type == STACK_TYPE_ENTRY ||
+	    info.type > STACK_TYPE_EXCEPTION_LAST)
+		sp = __this_cpu_ist_top_va(HV2);
+sync:
+	/*
+	 * Found a safe stack - switch to it as if the entry didn't happen via
+	 * IST stack. The code below only copies pt_regs, the real switch happens
+	 * in assembly code.
+	 */
+	sp = ALIGN_DOWN(sp, 8) - sizeof(*regs_ret);
+
+	regs_ret = (struct pt_regs *)sp;
+	*regs_ret = *regs;
+
+	return regs_ret;
+}
 #endif
 
 asmlinkage __visible noinstr struct pt_regs *fixup_bad_iret(struct pt_regs *bad_regs)
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index e91500a80963..97554fa0ff30 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -160,6 +160,8 @@ static void __init percpu_setup_exception_stacks(unsigned int cpu)
 		if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
 			cea_map_stack(VC);
 			cea_map_stack(VC2);
+			cea_map_stack(HV);
+			cea_map_stack(HV2);
 		}
 	}
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V6 02/14] x86/sev: Add Check of #HV event in path
  2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
  2023-05-15 16:59 ` [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler Tianyu Lan
@ 2023-05-15 16:59 ` Tianyu Lan
  2023-05-16  9:32   ` Peter Zijlstra
  2023-05-15 16:59 ` [RFC PATCH V6 03/14] x86/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Tianyu Lan <tiala@microsoft.com>

Add check_hv_pending() and check_hv_pending_after_irq() to
check queued #HV event when irq is disabled.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
 arch/x86/entry/entry_64.S       | 18 ++++++++++++++++
 arch/x86/include/asm/irqflags.h | 14 +++++++++++-
 arch/x86/kernel/sev.c           | 38 +++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 653b1f10699b..147b850babf6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1019,6 +1019,15 @@ SYM_CODE_END(paranoid_entry)
  * R15 - old SPEC_CTRL
  */
 SYM_CODE_START_LOCAL(paranoid_exit)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	/*
+	 * If a #HV was delivered during execution and interrupts were
+	 * disabled, then check if it can be handled before the iret
+	 * (which may re-enable interrupts).
+	 */
+	mov     %rsp, %rdi
+	call    check_hv_pending
+#endif
 	UNWIND_HINT_REGS
 
 	/*
@@ -1143,6 +1152,15 @@ SYM_CODE_START(error_entry)
 SYM_CODE_END(error_entry)
 
 SYM_CODE_START_LOCAL(error_return)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	/*
+	 * If a #HV was delivered during execution and interrupts were
+	 * disabled, then check if it can be handled before the iret
+	 * (which may re-enable interrupts).
+	 */
+	mov     %rsp, %rdi
+	call    check_hv_pending
+#endif
 	UNWIND_HINT_REGS
 	DEBUG_ENTRY_ASSERT_IRQS_OFF
 	testb	$3, CS(%rsp)
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 8c5ae649d2df..d09ec6d76591 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -11,6 +11,10 @@
 /*
  * Interrupt control:
  */
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+void check_hv_pending(struct pt_regs *regs);
+void check_hv_pending_irq_enable(void);
+#endif
 
 /* Declaration required for gcc < 4.9 to prevent -Werror=missing-prototypes */
 extern inline unsigned long native_save_fl(void);
@@ -40,12 +44,20 @@ static __always_inline void native_irq_disable(void)
 static __always_inline void native_irq_enable(void)
 {
 	asm volatile("sti": : :"memory");
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	check_hv_pending_irq_enable();
+#endif
 }
 
 static __always_inline void native_safe_halt(void)
 {
 	mds_idle_clear_cpu_buffers();
-	asm volatile("sti; hlt": : :"memory");
+	asm volatile("sti": : :"memory");
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	check_hv_pending_irq_enable();
+#endif
+	asm volatile("hlt": : :"memory");
 }
 
 static __always_inline void native_halt(void)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index e25445de0957..ff5eab48bfe2 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -181,6 +181,44 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
 	this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], new_ist);
 }
 
+static void do_exc_hv(struct pt_regs *regs)
+{
+	/* Handle #HV exception. */
+}
+
+void check_hv_pending(struct pt_regs *regs)
+{
+	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+		return;
+
+	if ((regs->flags & X86_EFLAGS_IF) == 0)
+		return;
+
+	do_exc_hv(regs);
+}
+
+void check_hv_pending_irq_enable(void)
+{
+	struct pt_regs regs;
+
+	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+		return;
+
+	memset(&regs, 0, sizeof(struct pt_regs));
+	asm volatile("movl %%cs, %%eax;" : "=a" (regs.cs));
+	asm volatile("movl %%ss, %%eax;" : "=a" (regs.ss));
+	regs.orig_ax = 0xffffffff;
+	regs.flags = native_save_fl();
+
+	/*
+	 * Disable irq when handle pending #HV events after
+	 * re-enabling irq.
+	 */
+	asm volatile("cli" : : : "memory");
+	do_exc_hv(&regs);
+	asm volatile("sti" : : : "memory");
+}
+
 void noinstr __sev_es_ist_exit(void)
 {
 	unsigned long ist;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V6 03/14] x86/sev: Add AMD sev-snp enlightened guest support on hyperv
  2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
  2023-05-15 16:59 ` [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler Tianyu Lan
  2023-05-15 16:59 ` [RFC PATCH V6 02/14] x86/sev: Add Check of #HV event in path Tianyu Lan
@ 2023-05-15 16:59 ` Tianyu Lan
  2023-05-16  9:40   ` Peter Zijlstra
  2023-05-15 16:59 ` [RFC PATCH V6 04/14] x86/sev: optimize system vector processing invoked from #HV exception Tianyu Lan
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Tianyu Lan <tiala@microsoft.com>

Enable #HV exception to handle interrupt requests from hypervisor.

Co-developed-by: Lendacky Thomas <thomas.lendacky@amd.com>
Co-developed-by: Kalra Ashish <ashish.kalra@amd.com>
Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
Change since RFC V5:
       * Merge patch "x86/sev: Fix interrupt exit code paths from
        #HV exception" with this commit.

Change since RFC V3:
       * Check NMI event when irq is disabled.
       * Remove redundant variable
---
 arch/x86/include/asm/idtentry.h    |  12 +-
 arch/x86/include/asm/mem_encrypt.h |   2 +
 arch/x86/include/uapi/asm/svm.h    |   4 +
 arch/x86/kernel/sev.c              | 349 ++++++++++++++++++++++++-----
 arch/x86/kernel/traps.c            |   2 +
 5 files changed, 310 insertions(+), 59 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index b0f3501b2767..867073ccf1d1 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -13,6 +13,12 @@
 
 #include <asm/irq_stack.h>
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state);
+#else
+#define irqentry_exit_hv_cond(regs, state)	irqentry_exit(regs, state)
+#endif
+
 /**
  * DECLARE_IDTENTRY - Declare functions for simple IDT entry points
  *		      No error code pushed by hardware
@@ -201,7 +207,7 @@ __visible noinstr void func(struct pt_regs *regs,			\
 	kvm_set_cpu_l1tf_flush_l1d();					\
 	run_irq_on_irqstack_cond(__##func, regs, vector);		\
 	instrumentation_end();						\
-	irqentry_exit(regs, state);					\
+	irqentry_exit_hv_cond(regs, state);				\
 }									\
 									\
 static noinline void __##func(struct pt_regs *regs, u32 vector)
@@ -241,7 +247,7 @@ __visible noinstr void func(struct pt_regs *regs)			\
 	kvm_set_cpu_l1tf_flush_l1d();					\
 	run_sysvec_on_irqstack_cond(__##func, regs);			\
 	instrumentation_end();						\
-	irqentry_exit(regs, state);					\
+	irqentry_exit_hv_cond(regs, state);				\
 }									\
 									\
 static noinline void __##func(struct pt_regs *regs)
@@ -270,7 +276,7 @@ __visible noinstr void func(struct pt_regs *regs)			\
 	__##func (regs);						\
 	__irq_exit_raw();						\
 	instrumentation_end();						\
-	irqentry_exit(regs, state);					\
+	irqentry_exit_hv_cond(regs, state);				\
 }									\
 									\
 static __always_inline void __##func(struct pt_regs *regs)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index b7126701574c..9299caeca69f 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -50,6 +50,7 @@ void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
 void __init mem_encrypt_free_decrypted_mem(void);
 
 void __init sev_es_init_vc_handling(void);
+void __init sev_snp_init_hv_handling(void);
 
 #define __bss_decrypted __section(".bss..decrypted")
 
@@ -73,6 +74,7 @@ static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
 static inline void __init sme_enable(struct boot_params *bp) { }
 
 static inline void sev_es_init_vc_handling(void) { }
+static inline void sev_snp_init_hv_handling(void) { }
 
 static inline int __init
 early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 80e1df482337..828d624a38cf 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -115,6 +115,10 @@
 #define SVM_VMGEXIT_AP_CREATE_ON_INIT		0
 #define SVM_VMGEXIT_AP_CREATE			1
 #define SVM_VMGEXIT_AP_DESTROY			2
+#define SVM_VMGEXIT_HV_DOORBELL_PAGE		0x80000014
+#define SVM_VMGEXIT_GET_PREFERRED_HV_DOORBELL_PAGE	0
+#define SVM_VMGEXIT_SET_HV_DOORBELL_PAGE		1
+#define SVM_VMGEXIT_QUERY_HV_DOORBELL_PAGE		2
 #define SVM_VMGEXIT_HV_FEATURES			0x8000fffd
 #define SVM_VMGEXIT_TERM_REQUEST		0x8000fffe
 #define SVM_VMGEXIT_TERM_REASON(reason_set, reason_code)	\
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index ff5eab48bfe2..400ca555bd48 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -124,6 +124,165 @@ struct sev_config {
 
 static struct sev_config sev_cfg __read_mostly;
 
+static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state);
+static noinstr void __sev_put_ghcb(struct ghcb_state *state);
+static int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa);
+static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb);
+
+union hv_pending_events {
+	u16 events;
+	struct {
+		u8 vector;
+		u8 nmi : 1;
+		u8 mc : 1;
+		u8 reserved1 : 5;
+		u8 no_further_signal : 1;
+	};
+};
+
+struct sev_hv_doorbell_page {
+	union hv_pending_events pending_events;
+	u8 no_eoi_required;
+	u8 reserved2[61];
+	u8 padding[4032];
+};
+
+struct sev_snp_runtime_data {
+	struct sev_hv_doorbell_page hv_doorbell_page;
+	/*
+	 * Indication that we are currently handling #HV events.
+	 */
+	bool hv_handling_events;
+};
+
+static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
+
+static inline u64 sev_es_rd_ghcb_msr(void)
+{
+	return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
+}
+
+static __always_inline void sev_es_wr_ghcb_msr(u64 val)
+{
+	u32 low, high;
+
+	low  = (u32)(val);
+	high = (u32)(val >> 32);
+
+	native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
+}
+
+struct sev_hv_doorbell_page *sev_snp_current_doorbell_page(void)
+{
+	return &this_cpu_read(snp_runtime_data)->hv_doorbell_page;
+}
+
+static u8 sev_hv_pending(void)
+{
+	return sev_snp_current_doorbell_page()->pending_events.events;
+}
+
+#define sev_hv_pending_nmi	\
+		sev_snp_current_doorbell_page()->pending_events.nmi
+
+static void hv_doorbell_apic_eoi_write(u32 reg, u32 val)
+{
+	if (xchg(&sev_snp_current_doorbell_page()->no_eoi_required, 0) & 0x1)
+		return;
+
+	BUG_ON(reg != APIC_EOI);
+	apic->write(reg, val);
+}
+
+static void do_exc_hv(struct pt_regs *regs)
+{
+	union hv_pending_events pending_events;
+
+	/* Avoid nested entry. */
+	if (this_cpu_read(snp_runtime_data)->hv_handling_events)
+		return;
+
+	this_cpu_read(snp_runtime_data)->hv_handling_events = true;
+
+	while (sev_hv_pending()) {
+		pending_events.events = xchg(
+			&sev_snp_current_doorbell_page()->pending_events.events,
+			0);
+
+		if (pending_events.nmi)
+			exc_nmi(regs);
+
+#ifdef CONFIG_X86_MCE
+		if (pending_events.mc)
+			exc_machine_check(regs);
+#endif
+
+		if (!pending_events.vector)
+			goto out;
+
+		if (pending_events.vector < FIRST_EXTERNAL_VECTOR) {
+			/* Exception vectors */
+			WARN(1, "exception shouldn't happen\n");
+		} else if (pending_events.vector == FIRST_EXTERNAL_VECTOR) {
+			sysvec_irq_move_cleanup(regs);
+		} else if (pending_events.vector == IA32_SYSCALL_VECTOR) {
+			WARN(1, "syscall shouldn't happen\n");
+		} else if (pending_events.vector >= FIRST_SYSTEM_VECTOR) {
+			switch (pending_events.vector) {
+#if IS_ENABLED(CONFIG_HYPERV)
+			case HYPERV_STIMER0_VECTOR:
+				sysvec_hyperv_stimer0(regs);
+				break;
+			case HYPERVISOR_CALLBACK_VECTOR:
+				sysvec_hyperv_callback(regs);
+				break;
+#endif
+#ifdef CONFIG_SMP
+			case RESCHEDULE_VECTOR:
+				sysvec_reschedule_ipi(regs);
+				break;
+			case IRQ_MOVE_CLEANUP_VECTOR:
+				sysvec_irq_move_cleanup(regs);
+				break;
+			case REBOOT_VECTOR:
+				sysvec_reboot(regs);
+				break;
+			case CALL_FUNCTION_SINGLE_VECTOR:
+				sysvec_call_function_single(regs);
+				break;
+			case CALL_FUNCTION_VECTOR:
+				sysvec_call_function(regs);
+				break;
+#endif
+#ifdef CONFIG_X86_LOCAL_APIC
+			case ERROR_APIC_VECTOR:
+				sysvec_error_interrupt(regs);
+				break;
+			case SPURIOUS_APIC_VECTOR:
+				sysvec_spurious_apic_interrupt(regs);
+				break;
+			case LOCAL_TIMER_VECTOR:
+				sysvec_apic_timer_interrupt(regs);
+				break;
+			case X86_PLATFORM_IPI_VECTOR:
+				sysvec_x86_platform_ipi(regs);
+				break;
+#endif
+			case 0x0:
+				break;
+			default:
+				panic("Unexpected vector %d\n", vector);
+				unreachable();
+			}
+		} else {
+			common_interrupt(regs, pending_events.vector);
+		}
+	}
+
+out:
+	this_cpu_read(snp_runtime_data)->hv_handling_events = false;
+}
+
 static __always_inline bool on_vc_stack(struct pt_regs *regs)
 {
 	unsigned long sp = regs->sp;
@@ -181,18 +340,19 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
 	this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], new_ist);
 }
 
-static void do_exc_hv(struct pt_regs *regs)
-{
-	/* Handle #HV exception. */
-}
-
 void check_hv_pending(struct pt_regs *regs)
 {
 	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
 		return;
 
-	if ((regs->flags & X86_EFLAGS_IF) == 0)
+	/* Handle NMI when irq is disabled. */
+	if ((regs->flags & X86_EFLAGS_IF) == 0) {
+		if (sev_hv_pending_nmi) {
+			exc_nmi(regs);
+			sev_hv_pending_nmi = 0;
+		}
 		return;
+	}
 
 	do_exc_hv(regs);
 }
@@ -233,68 +393,35 @@ void noinstr __sev_es_ist_exit(void)
 	this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
 }
 
-/*
- * Nothing shall interrupt this code path while holding the per-CPU
- * GHCB. The backup GHCB is only for NMIs interrupting this path.
- *
- * Callers must disable local interrupts around it.
- */
-static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
+static bool sev_restricted_injection_enabled(void)
+{
+	return sev_status & MSR_AMD64_SNP_RESTRICTED_INJ;
+}
+
+void __init sev_snp_init_hv_handling(void)
 {
 	struct sev_es_runtime_data *data;
+	struct ghcb_state state;
 	struct ghcb *ghcb;
+	unsigned long flags;
 
 	WARN_ON(!irqs_disabled());
+	if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) || !sev_restricted_injection_enabled())
+		return;
 
 	data = this_cpu_read(runtime_data);
-	ghcb = &data->ghcb_page;
-
-	if (unlikely(data->ghcb_active)) {
-		/* GHCB is already in use - save its contents */
-
-		if (unlikely(data->backup_ghcb_active)) {
-			/*
-			 * Backup-GHCB is also already in use. There is no way
-			 * to continue here so just kill the machine. To make
-			 * panic() work, mark GHCBs inactive so that messages
-			 * can be printed out.
-			 */
-			data->ghcb_active        = false;
-			data->backup_ghcb_active = false;
-
-			instrumentation_begin();
-			panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
-			instrumentation_end();
-		}
-
-		/* Mark backup_ghcb active before writing to it */
-		data->backup_ghcb_active = true;
 
-		state->ghcb = &data->backup_ghcb;
+	local_irq_save(flags);
 
-		/* Backup GHCB content */
-		*state->ghcb = *ghcb;
-	} else {
-		state->ghcb = NULL;
-		data->ghcb_active = true;
-	}
+	ghcb = __sev_get_ghcb(&state);
 
-	return ghcb;
-}
+	sev_snp_setup_hv_doorbell_page(ghcb);
 
-static inline u64 sev_es_rd_ghcb_msr(void)
-{
-	return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
-}
-
-static __always_inline void sev_es_wr_ghcb_msr(u64 val)
-{
-	u32 low, high;
+	__sev_put_ghcb(&state);
 
-	low  = (u32)(val);
-	high = (u32)(val >> 32);
+	apic_set_eoi_write(hv_doorbell_apic_eoi_write);
 
-	native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
+	local_irq_restore(flags);
 }
 
 static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
@@ -555,6 +682,69 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt
 /* Include code shared with pre-decompression boot stage */
 #include "sev-shared.c"
 
+/*
+ * Nothing shall interrupt this code path while holding the per-CPU
+ * GHCB. The backup GHCB is only for NMIs interrupting this path.
+ *
+ * Callers must disable local interrupts around it.
+ */
+static noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
+{
+	struct sev_es_runtime_data *data;
+	struct ghcb *ghcb;
+
+	WARN_ON(!irqs_disabled());
+
+	data = this_cpu_read(runtime_data);
+	ghcb = &data->ghcb_page;
+
+	if (unlikely(data->ghcb_active)) {
+		/* GHCB is already in use - save its contents */
+
+		if (unlikely(data->backup_ghcb_active)) {
+			/*
+			 * Backup-GHCB is also already in use. There is no way
+			 * to continue here so just kill the machine. To make
+			 * panic() work, mark GHCBs inactive so that messages
+			 * can be printed out.
+			 */
+			data->ghcb_active        = false;
+			data->backup_ghcb_active = false;
+
+			instrumentation_begin();
+			panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
+			instrumentation_end();
+		}
+
+		/* Mark backup_ghcb active before writing to it */
+		data->backup_ghcb_active = true;
+
+		state->ghcb = &data->backup_ghcb;
+
+		/* Backup GHCB content */
+		*state->ghcb = *ghcb;
+	} else {
+		state->ghcb = NULL;
+		data->ghcb_active = true;
+	}
+
+	return ghcb;
+}
+
+static void sev_snp_setup_hv_doorbell_page(struct ghcb *ghcb)
+{
+	u64 pa;
+	enum es_result ret;
+
+	pa = __pa(sev_snp_current_doorbell_page());
+	vc_ghcb_invalidate(ghcb);
+	ret = vmgexit_hv_doorbell_page(ghcb,
+				       SVM_VMGEXIT_SET_HV_DOORBELL_PAGE,
+				       pa);
+	if (ret != ES_OK)
+		panic("SEV-SNP: failed to set up #HV doorbell page");
+}
+
 static noinstr void __sev_put_ghcb(struct ghcb_state *state)
 {
 	struct sev_es_runtime_data *data;
@@ -1283,6 +1473,7 @@ static void snp_register_per_cpu_ghcb(void)
 	ghcb = &data->ghcb_page;
 
 	snp_register_ghcb_early(__pa(ghcb));
+	sev_snp_setup_hv_doorbell_page(ghcb);
 }
 
 void setup_ghcb(void)
@@ -1322,6 +1513,11 @@ void setup_ghcb(void)
 		snp_register_ghcb_early(__pa(&boot_ghcb_page));
 }
 
+int vmgexit_hv_doorbell_page(struct ghcb *ghcb, u64 op, u64 pa)
+{
+	return sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_HV_DOORBELL_PAGE, op, pa);
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 static void sev_es_ap_hlt_loop(void)
 {
@@ -1395,6 +1591,7 @@ static void __init alloc_runtime_data(int cpu)
 static void __init init_ghcb(int cpu)
 {
 	struct sev_es_runtime_data *data;
+	struct sev_snp_runtime_data *snp_data;
 	int err;
 
 	data = per_cpu(runtime_data, cpu);
@@ -1406,6 +1603,19 @@ static void __init init_ghcb(int cpu)
 
 	memset(&data->ghcb_page, 0, sizeof(data->ghcb_page));
 
+	snp_data = memblock_alloc(sizeof(*snp_data), PAGE_SIZE);
+	if (!snp_data)
+		panic("Can't allocate SEV-SNP runtime data");
+
+	err = early_set_memory_decrypted((unsigned long)&snp_data->hv_doorbell_page,
+					 sizeof(snp_data->hv_doorbell_page));
+	if (err)
+		panic("Can't map #HV doorbell pages unencrypted");
+
+	memset(&snp_data->hv_doorbell_page, 0, sizeof(snp_data->hv_doorbell_page));
+
+	per_cpu(snp_runtime_data, cpu) = snp_data;
+
 	data->ghcb_active = false;
 	data->backup_ghcb_active = false;
 }
@@ -2046,7 +2256,12 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
 
 static bool hv_raw_handle_exception(struct pt_regs *regs)
 {
-	return false;
+	/* Clear the no_further_signal bit */
+	sev_snp_current_doorbell_page()->pending_events.events &= 0x7fff;
+
+	check_hv_pending(regs);
+
+	return true;
 }
 
 static __always_inline bool on_hv_fallback_stack(struct pt_regs *regs)
@@ -2360,3 +2575,25 @@ static int __init snp_init_platform_device(void)
 	return 0;
 }
 device_initcall(snp_init_platform_device);
+
+noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state)
+{
+	/*
+	 * Check whether this returns to user mode, if so and if
+	 * we are currently executing the #HV handler then we don't
+	 * want to follow the irqentry_exit_to_user_mode path as
+	 * that can potentially cause the #HV handler to be
+	 * preempted and rescheduled on another CPU. Rescheduled #HV
+	 * handler on another cpu will cause interrupts to be handled
+	 * on a different cpu than the injected one, causing
+	 * invalid EOIs and missed/lost guest interrupts and
+	 * corresponding hangs and/or per-cpu IRQs handled on
+	 * non-intended cpu.
+	 */
+	if (user_mode(regs) &&
+	    this_cpu_read(snp_runtime_data)->hv_handling_events)
+		return;
+
+	/* follow normal interrupt return/exit path */
+	irqentry_exit(regs, state);
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 5dca05d0fa38..4f0f8dd2a5cb 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1521,5 +1521,7 @@ void __init trap_init(void)
 	cpu_init_exception_handling();
 	/* Setup traps as cpu_init() might #GP */
 	idt_setup_traps();
+	sev_snp_init_hv_handling();
+
 	cpu_init();
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V6 04/14] x86/sev: optimize system vector processing invoked from #HV exception
  2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
                   ` (2 preceding siblings ...)
  2023-05-15 16:59 ` [RFC PATCH V6 03/14] x86/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
@ 2023-05-15 16:59 ` Tianyu Lan
  2023-05-16 10:23   ` Peter Zijlstra
  2023-05-15 16:59 ` [RFC PATCH V6 05/14] x86/hyperv: Add sev-snp enlightened guest static key Tianyu Lan
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Ashish Kalra <ashish.kalra@amd.com>

Construct system vector table and dispatch system vector exceptions through
sysvec_table from #HV exception handler instead of explicitly calling each
system vector. The system vector table is created dynamically and is placed
in a new named ELF section.

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
 arch/x86/entry/entry_64.S     |  6 +++
 arch/x86/kernel/sev.c         | 70 +++++++++++++----------------------
 arch/x86/kernel/vmlinux.lds.S |  7 ++++
 3 files changed, 38 insertions(+), 45 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 147b850babf6..f86b319d0a9e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -419,6 +419,12 @@ SYM_CODE_START(\asmsym)
 
 _ASM_NOKPROBE(\asmsym)
 SYM_CODE_END(\asmsym)
+	.if \vector >= FIRST_SYSTEM_VECTOR && \vector < NR_VECTORS
+		.section .system_vectors, "aw"
+		.byte \vector
+		.quad \cfunc
+		.previous
+	.endif
 .endm
 
 /*
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 400ca555bd48..ac3d758670b3 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -157,6 +157,16 @@ struct sev_snp_runtime_data {
 
 static DEFINE_PER_CPU(struct sev_snp_runtime_data*, snp_runtime_data);
 
+static void (*sysvec_table[NR_VECTORS - FIRST_SYSTEM_VECTOR])
+		(struct pt_regs *regs) __ro_after_init;
+
+struct sysvec_entry {
+	unsigned char vector;
+	void (*sysvec_func)(struct pt_regs *regs);
+} __packed;
+
+extern struct sysvec_entry __system_vectors[], __system_vectors_end[];
+
 static inline u64 sev_es_rd_ghcb_msr(void)
 {
 	return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
@@ -228,51 +238,11 @@ static void do_exc_hv(struct pt_regs *regs)
 		} else if (pending_events.vector == IA32_SYSCALL_VECTOR) {
 			WARN(1, "syscall shouldn't happen\n");
 		} else if (pending_events.vector >= FIRST_SYSTEM_VECTOR) {
-			switch (pending_events.vector) {
-#if IS_ENABLED(CONFIG_HYPERV)
-			case HYPERV_STIMER0_VECTOR:
-				sysvec_hyperv_stimer0(regs);
-				break;
-			case HYPERVISOR_CALLBACK_VECTOR:
-				sysvec_hyperv_callback(regs);
-				break;
-#endif
-#ifdef CONFIG_SMP
-			case RESCHEDULE_VECTOR:
-				sysvec_reschedule_ipi(regs);
-				break;
-			case IRQ_MOVE_CLEANUP_VECTOR:
-				sysvec_irq_move_cleanup(regs);
-				break;
-			case REBOOT_VECTOR:
-				sysvec_reboot(regs);
-				break;
-			case CALL_FUNCTION_SINGLE_VECTOR:
-				sysvec_call_function_single(regs);
-				break;
-			case CALL_FUNCTION_VECTOR:
-				sysvec_call_function(regs);
-				break;
-#endif
-#ifdef CONFIG_X86_LOCAL_APIC
-			case ERROR_APIC_VECTOR:
-				sysvec_error_interrupt(regs);
-				break;
-			case SPURIOUS_APIC_VECTOR:
-				sysvec_spurious_apic_interrupt(regs);
-				break;
-			case LOCAL_TIMER_VECTOR:
-				sysvec_apic_timer_interrupt(regs);
-				break;
-			case X86_PLATFORM_IPI_VECTOR:
-				sysvec_x86_platform_ipi(regs);
-				break;
-#endif
-			case 0x0:
-				break;
-			default:
-				panic("Unexpected vector %d\n", vector);
-				unreachable();
+			if (!(sysvec_table[pending_events.vector - FIRST_SYSTEM_VECTOR])) {
+				WARN(1, "system vector entry 0x%x is NULL\n",
+				     pending_events.vector);
+			} else {
+				(*sysvec_table[pending_events.vector - FIRST_SYSTEM_VECTOR])(regs);
 			}
 		} else {
 			common_interrupt(regs, pending_events.vector);
@@ -398,6 +368,14 @@ static bool sev_restricted_injection_enabled(void)
 	return sev_status & MSR_AMD64_SNP_RESTRICTED_INJ;
 }
 
+static void __init construct_sysvec_table(void)
+{
+	struct sysvec_entry *p;
+
+	for (p = __system_vectors; p < __system_vectors_end; p++)
+		sysvec_table[p->vector - FIRST_SYSTEM_VECTOR] = p->sysvec_func;
+}
+
 void __init sev_snp_init_hv_handling(void)
 {
 	struct sev_es_runtime_data *data;
@@ -422,6 +400,8 @@ void __init sev_snp_init_hv_handling(void)
 	apic_set_eoi_write(hv_doorbell_apic_eoi_write);
 
 	local_irq_restore(flags);
+
+	construct_sysvec_table();
 }
 
 static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 25f155205770..c37165d8e877 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -338,6 +338,13 @@ SECTIONS
 		*(.altinstr_replacement)
 	}
 
+	. = ALIGN(8);
+	.system_vectors : AT(ADDR(.system_vectors) - LOAD_OFFSET) {
+		__system_vectors = .;
+		*(.system_vectors)
+		__system_vectors_end = .;
+	}
+
 	. = ALIGN(8);
 	.apicdrivers : AT(ADDR(.apicdrivers) - LOAD_OFFSET) {
 		__apicdrivers = .;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V6 05/14] x86/hyperv: Add sev-snp enlightened guest static key
  2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
                   ` (3 preceding siblings ...)
  2023-05-15 16:59 ` [RFC PATCH V6 04/14] x86/sev: optimize system vector processing invoked from #HV exception Tianyu Lan
@ 2023-05-15 16:59 ` Tianyu Lan
  2023-05-15 16:59 ` [RFC PATCH V6 06/14] x86/hyperv: Mark Hyper-V vp assist page unencrypted in SEV-SNP enlightened guest Tianyu Lan
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Tianyu Lan <tiala@microsoft.com>

Introduce static key isolation_type_en_snp for enlightened
sev-snp guest check.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
Change since RFC-v3:
	* Remove some Hyper-V specific config setting
---
 arch/x86/hyperv/ivm.c           | 11 +++++++++++
 arch/x86/include/asm/mshyperv.h |  3 +++
 arch/x86/kernel/cpu/mshyperv.c  |  9 +++++++--
 drivers/hv/hv_common.c          |  6 ++++++
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 127d5b7b63de..368b2731950e 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -409,3 +409,14 @@ bool hv_isolation_type_snp(void)
 {
 	return static_branch_unlikely(&isolation_type_snp);
 }
+
+DEFINE_STATIC_KEY_FALSE(isolation_type_en_snp);
+/*
+ * hv_isolation_type_en_snp - Check system runs in the AMD SEV-SNP based
+ * isolation enlightened VM.
+ */
+bool hv_isolation_type_en_snp(void)
+{
+	return static_branch_unlikely(&isolation_type_en_snp);
+}
+
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index b445e252aa83..97d117ec95c4 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -26,6 +26,7 @@
 union hv_ghcb;
 
 DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
+DECLARE_STATIC_KEY_FALSE(isolation_type_en_snp);
 
 typedef int (*hyperv_fill_flush_list_func)(
 		struct hv_guest_mapping_flush_list *flush,
@@ -45,6 +46,8 @@ extern void *hv_hypercall_pg;
 
 extern u64 hv_current_partition_id;
 
+extern bool hv_isolation_type_en_snp(void);
+
 extern union hv_ghcb * __percpu *hv_ghcb_pg;
 
 int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index c7969e806c64..63a2bfbfe701 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -402,8 +402,12 @@ static void __init ms_hyperv_init_platform(void)
 		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
 
-		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+
+		if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
+			static_branch_enable(&isolation_type_en_snp);
+		} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
 			static_branch_enable(&isolation_type_snp);
+		}
 	}
 
 	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
@@ -473,7 +477,8 @@ static void __init ms_hyperv_init_platform(void)
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	if ((hv_get_isolation_type() == HV_ISOLATION_TYPE_VBS) ||
-	    (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP))
+	    (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP
+	     && !cc_platform_has(CC_ATTR_GUEST_SEV_SNP)))
 		hv_vtom_init();
 	/*
 	 * Setup the hook to get control post apic initialization.
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 64f9ceca887b..179bc5f5bf52 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -502,6 +502,12 @@ bool __weak hv_isolation_type_snp(void)
 }
 EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
 
+bool __weak hv_isolation_type_en_snp(void)
+{
+	return false;
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_en_snp);
+
 void __weak hv_setup_vmbus_handler(void (*handler)(void))
 {
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V6 06/14] x86/hyperv: Mark Hyper-V vp assist page unencrypted in SEV-SNP enlightened guest
  2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
                   ` (4 preceding siblings ...)
  2023-05-15 16:59 ` [RFC PATCH V6 05/14] x86/hyperv: Add sev-snp enlightened guest static key Tianyu Lan
@ 2023-05-15 16:59 ` Tianyu Lan
  2023-05-15 16:59 ` [RFC PATCH V6 07/14] x86/hyperv: Set Virtual Trust Level in VMBus init message Tianyu Lan
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Tianyu Lan <tiala@microsoft.com>

hv vp assist page needs to be shared between SEV-SNP guest and Hyper-V.
So mark the page unencrypted in the SEV-SNP guest.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
 arch/x86/hyperv/hv_init.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a5f9474f08e1..9f3e2d71d015 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -18,6 +18,7 @@
 #include <asm/hyperv-tlfs.h>
 #include <asm/mshyperv.h>
 #include <asm/idtentry.h>
+#include <asm/set_memory.h>
 #include <linux/kexec.h>
 #include <linux/version.h>
 #include <linux/vmalloc.h>
@@ -113,6 +114,11 @@ static int hv_cpu_init(unsigned int cpu)
 
 	}
 	if (!WARN_ON(!(*hvp))) {
+		if (hv_isolation_type_en_snp()) {
+			WARN_ON_ONCE(set_memory_decrypted((unsigned long)(*hvp), 1));
+			memset(*hvp, 0, PAGE_SIZE);
+		}
+
 		msr.enable = 1;
 		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V6 07/14] x86/hyperv: Set Virtual Trust Level in VMBus init message
  2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
                   ` (5 preceding siblings ...)
  2023-05-15 16:59 ` [RFC PATCH V6 06/14] x86/hyperv: Mark Hyper-V vp assist page unencrypted in SEV-SNP enlightened guest Tianyu Lan
@ 2023-05-15 16:59 ` Tianyu Lan
  2023-05-15 16:59 ` [RFC PATCH V6 08/14] x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest Tianyu Lan
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Tianyu Lan <tiala@microsoft.com>

sev-snp guest provides vtl(Virtual Trust Level) and
get it from hyperv hvcall via HVCALL_GET_VP_REGISTERS.
Set target vtl in the VMBus init message.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
Change since RFC v4:
       * Use struct_size to calculate array size.
       * Fix some coding style

Change since RFC v3:
       * Use the standard helper functions to check hypercall result
       * Fix coding style

Change since RFC v2:
       * Rename get_current_vtl() to get_vtl()
       * Fix some coding style issues
---
 arch/x86/hyperv/hv_init.c          | 36 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  7 ++++++
 drivers/hv/connection.c            |  1 +
 include/asm-generic/mshyperv.h     |  1 +
 include/linux/hyperv.h             |  4 ++--
 5 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 9f3e2d71d015..331b855314b7 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -384,6 +384,40 @@ static void __init hv_get_partition_id(void)
 	local_irq_restore(flags);
 }
 
+static u8 __init get_vtl(void)
+{
+	u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS;
+	struct hv_get_vp_registers_input *input;
+	struct hv_get_vp_registers_output *output;
+	u64 vtl = 0;
+	u64 ret;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	output = (struct hv_get_vp_registers_output *)input;
+	if (!input) {
+		local_irq_restore(flags);
+		goto done;
+	}
+
+	memset(input, 0, struct_size(input, element, 1));
+	input->header.partitionid = HV_PARTITION_ID_SELF;
+	input->header.vpindex = HV_VP_INDEX_SELF;
+	input->header.inputvtl = 0;
+	input->element[0].name0 = HV_X64_REGISTER_VSM_VP_STATUS;
+
+	ret = hv_do_hypercall(control, input, output);
+	if (hv_result_success(ret))
+		vtl = output->as64.low & HV_X64_VTL_MASK;
+	else
+		pr_err("Hyper-V: failed to get VTL! %lld", ret);
+	local_irq_restore(flags);
+
+done:
+	return vtl;
+}
+
 /*
  * This function is to be invoked early in the boot sequence after the
  * hypervisor has been detected.
@@ -512,6 +546,8 @@ void __init hyperv_init(void)
 	/* Query the VMs extended capability once, so that it can be cached. */
 	hv_query_ext_cap(0);
 
+	/* Find the VTL */
+	ms_hyperv.vtl = get_vtl();
 	return;
 
 clean_guest_os_id:
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index cea95dcd27c2..4bf0b315b0ce 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -301,6 +301,13 @@ enum hv_isolation_type {
 #define HV_X64_MSR_TIME_REF_COUNT	HV_REGISTER_TIME_REF_COUNT
 #define HV_X64_MSR_REFERENCE_TSC	HV_REGISTER_REFERENCE_TSC
 
+/*
+ * Registers are only accessible via HVCALL_GET_VP_REGISTERS hvcall and
+ * there is not associated MSR address.
+ */
+#define	HV_X64_REGISTER_VSM_VP_STATUS	0x000D0003
+#define	HV_X64_VTL_MASK			GENMASK(3, 0)
+
 /* Hyper-V memory host visibility */
 enum hv_mem_host_visibility {
 	VMBUS_PAGE_NOT_VISIBLE		= 0,
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 5978e9dbc286..02b54f85dc60 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -98,6 +98,7 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 	 */
 	if (version >= VERSION_WIN10_V5) {
 		msg->msg_sint = VMBUS_MESSAGE_SINT;
+		msg->msg_vtl = ms_hyperv.vtl;
 		vmbus_connection.msg_conn_id = VMBUS_MESSAGE_CONNECTION_ID_4;
 	} else {
 		msg->interrupt_page = virt_to_phys(vmbus_connection.int_page);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 402a8c1c202d..3052130ba4ef 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -48,6 +48,7 @@ struct ms_hyperv_info {
 		};
 	};
 	u64 shared_gpa_boundary;
+	u8 vtl;
 };
 extern struct ms_hyperv_info ms_hyperv;
 extern bool hv_nested;
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index bfbc37ce223b..1f2bfec4abde 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -665,8 +665,8 @@ struct vmbus_channel_initiate_contact {
 		u64 interrupt_page;
 		struct {
 			u8	msg_sint;
-			u8	padding1[3];
-			u32	padding2;
+			u8	msg_vtl;
+			u8	reserved[6];
 		};
 	};
 	u64 monitor_page1;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V6 08/14] x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest
  2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
                   ` (6 preceding siblings ...)
  2023-05-15 16:59 ` [RFC PATCH V6 07/14] x86/hyperv: Set Virtual Trust Level in VMBus init message Tianyu Lan
@ 2023-05-15 16:59 ` Tianyu Lan
  2023-05-16 10:29   ` Peter Zijlstra
  2023-05-15 16:59 ` [RFC PATCH V6 09/14] clocksource/drivers/hyper-v: decrypt hyperv tsc page " Tianyu Lan
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Tianyu Lan <tiala@microsoft.com>

In sev-snp enlightened guest, Hyper-V hypercall needs
to use vmmcall to trigger vmexit and notify hypervisor
to handle hypercall request.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
Change since RFC V2:
       * Fix indentation style
---
 arch/x86/include/asm/mshyperv.h | 44 ++++++++++++++++++++++++---------
 1 file changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 97d117ec95c4..939373791249 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -61,16 +61,25 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
-	if (!hv_hypercall_pg)
-		return U64_MAX;
+	if (hv_isolation_type_en_snp()) {
+		__asm__ __volatile__("mov %4, %%r8\n"
+				     "vmmcall"
+				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+				       "+c" (control), "+d" (input_address)
+				     :  "r" (output_address)
+				     : "cc", "memory", "r8", "r9", "r10", "r11");
+	} else {
+		if (!hv_hypercall_pg)
+			return U64_MAX;
 
-	__asm__ __volatile__("mov %4, %%r8\n"
-			     CALL_NOSPEC
-			     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
-			       "+c" (control), "+d" (input_address)
-			     :  "r" (output_address),
-				THUNK_TARGET(hv_hypercall_pg)
-			     : "cc", "memory", "r8", "r9", "r10", "r11");
+		__asm__ __volatile__("mov %4, %%r8\n"
+				     CALL_NOSPEC
+				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+				       "+c" (control), "+d" (input_address)
+				     :  "r" (output_address),
+					THUNK_TARGET(hv_hypercall_pg)
+				     : "cc", "memory", "r8", "r9", "r10", "r11");
+	}
 #else
 	u32 input_address_hi = upper_32_bits(input_address);
 	u32 input_address_lo = lower_32_bits(input_address);
@@ -104,7 +113,13 @@ static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
-	{
+	if (hv_isolation_type_en_snp()) {
+		__asm__ __volatile__(
+				"vmmcall"
+				: "=a" (hv_status), ASM_CALL_CONSTRAINT,
+				"+c" (control), "+d" (input1)
+				:: "cc", "r8", "r9", "r10", "r11");
+	} else {
 		__asm__ __volatile__(CALL_NOSPEC
 				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
 				       "+c" (control), "+d" (input1)
@@ -149,7 +164,14 @@ static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
 	u64 hv_status;
 
 #ifdef CONFIG_X86_64
-	{
+	if (hv_isolation_type_en_snp()) {
+		__asm__ __volatile__("mov %4, %%r8\n"
+				     "vmmcall"
+				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+				       "+c" (control), "+d" (input1)
+				     : "r" (input2)
+				     : "cc", "r8", "r9", "r10", "r11");
+	} else {
 		__asm__ __volatile__("mov %4, %%r8\n"
 				     CALL_NOSPEC
 				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V6 09/14] clocksource/drivers/hyper-v: decrypt hyperv tsc page in sev-snp enlightened guest
  2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
                   ` (7 preceding siblings ...)
  2023-05-15 16:59 ` [RFC PATCH V6 08/14] x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest Tianyu Lan
@ 2023-05-15 16:59 ` Tianyu Lan
  2023-05-15 16:59 ` [RFC PATCH V6 10/14] hv: vmbus: Mask VMBus pages unencrypted for " Tianyu Lan
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Tianyu Lan <tiala@microsoft.com>

Hyper-V tsc page is shared with hypervisor and it should
be decrypted in sev-snp enlightened guest when it's used.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
Change since RFC V2:
       * Change the Subject line prefix
---
 drivers/clocksource/hyperv_timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index bcd9042a0c9f..66e29a19770b 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -376,7 +376,7 @@ EXPORT_SYMBOL_GPL(hv_stimer_global_cleanup);
 static union {
 	struct ms_hyperv_tsc_page page;
 	u8 reserved[PAGE_SIZE];
-} tsc_pg __aligned(PAGE_SIZE);
+} tsc_pg __bss_decrypted __aligned(PAGE_SIZE);
 
 static struct ms_hyperv_tsc_page *tsc_page = &tsc_pg.page;
 static unsigned long tsc_pfn;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V6 10/14] hv: vmbus: Mask VMBus pages unencrypted for sev-snp enlightened guest
  2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
                   ` (8 preceding siblings ...)
  2023-05-15 16:59 ` [RFC PATCH V6 09/14] clocksource/drivers/hyper-v: decrypt hyperv tsc page " Tianyu Lan
@ 2023-05-15 16:59 ` Tianyu Lan
  2023-05-15 16:59 ` [RFC PATCH V6 11/14] drivers: hv: Decrypt percpu hvcall input arg page in " Tianyu Lan
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Tianyu Lan <tiala@microsoft.com>

VMBus post msg, synic event and message pages is necessary to
shared with hypervisor and so mask these pages unencrypted in
the sev-snp guest.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
Change sicne RFC V4:
       * Fix encrypt and free page order.

Change since RFC V3:
       * Set encrypt page back in the hv_synic_free()

Change since RFC V2:
       * Fix error in the error code path and encrypt
       	 pages correctly when decryption failure happens.
---
 drivers/hv/hv.c | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index de6708dbe0df..d29bbf0c7108 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -20,6 +20,7 @@
 #include <linux/interrupt.h>
 #include <clocksource/hyperv_timer.h>
 #include <asm/mshyperv.h>
+#include <linux/set_memory.h>
 #include "hyperv_vmbus.h"
 
 /* The one and only */
@@ -78,7 +79,7 @@ int hv_post_message(union hv_connection_id connection_id,
 
 int hv_synic_alloc(void)
 {
-	int cpu;
+	int cpu, ret;
 	struct hv_per_cpu_context *hv_cpu;
 
 	/*
@@ -123,9 +124,29 @@ int hv_synic_alloc(void)
 				goto err;
 			}
 		}
+
+		if (hv_isolation_type_en_snp()) {
+			ret = set_memory_decrypted((unsigned long)
+				hv_cpu->synic_message_page, 1);
+			if (ret)
+				goto err;
+
+			ret = set_memory_decrypted((unsigned long)
+				hv_cpu->synic_event_page, 1);
+			if (ret)
+				goto err_decrypt_event_page;
+
+			memset(hv_cpu->synic_message_page, 0, PAGE_SIZE);
+			memset(hv_cpu->synic_event_page, 0, PAGE_SIZE);
+		}
 	}
 
 	return 0;
+
+err_decrypt_event_page:
+	set_memory_encrypted((unsigned long)
+		hv_cpu->synic_message_page, 1);
+
 err:
 	/*
 	 * Any memory allocations that succeeded will be freed when
@@ -143,8 +164,18 @@ void hv_synic_free(void)
 		struct hv_per_cpu_context *hv_cpu
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
 
-		free_page((unsigned long)hv_cpu->synic_event_page);
-		free_page((unsigned long)hv_cpu->synic_message_page);
+		if (hv_isolation_type_en_snp()) {
+			if (!set_memory_encrypted((unsigned long)
+			    hv_cpu->synic_message_page, 1))
+				free_page((unsigned long)hv_cpu->synic_event_page);
+
+			if (!set_memory_encrypted((unsigned long)
+			    hv_cpu->synic_event_page, 1))
+				free_page((unsigned long)hv_cpu->synic_message_page);
+		} else {
+			free_page((unsigned long)hv_cpu->synic_event_page);
+			free_page((unsigned long)hv_cpu->synic_message_page);
+		}
 	}
 
 	kfree(hv_context.hv_numa_map);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V6 11/14] drivers: hv: Decrypt percpu hvcall input arg page in sev-snp enlightened guest
  2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
                   ` (9 preceding siblings ...)
  2023-05-15 16:59 ` [RFC PATCH V6 10/14] hv: vmbus: Mask VMBus pages unencrypted for " Tianyu Lan
@ 2023-05-15 16:59 ` Tianyu Lan
  2023-05-15 16:59 ` [RFC PATCH V6 12/14] x86/hyperv: Initialize cpu and memory for " Tianyu Lan
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Tianyu Lan <tiala@microsoft.com>

Hypervisor needs to access iput arg page and guest should decrypt
the page.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
Change since RFC V4:
        * Use pgcount to free intput arg page

Change since RFC V3:
	* Use pgcount to decrypt memory.

Change since RFC V2:
	* Set inputarg to be zero after kfree()
	* Not free mem when fail to encrypt mem in the hv_common_cpu_die().
---
 drivers/hv/hv_common.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 179bc5f5bf52..15d3054f3440 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -24,6 +24,7 @@
 #include <linux/kmsg_dump.h>
 #include <linux/slab.h>
 #include <linux/dma-map-ops.h>
+#include <linux/set_memory.h>
 #include <asm/hyperv-tlfs.h>
 #include <asm/mshyperv.h>
 
@@ -359,6 +360,7 @@ int hv_common_cpu_init(unsigned int cpu)
 	u64 msr_vp_index;
 	gfp_t flags;
 	int pgcount = hv_root_partition ? 2 : 1;
+	int ret;
 
 	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
 	flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
@@ -368,6 +370,17 @@ int hv_common_cpu_init(unsigned int cpu)
 	if (!(*inputarg))
 		return -ENOMEM;
 
+	if (hv_isolation_type_en_snp()) {
+		ret = set_memory_decrypted((unsigned long)*inputarg, pgcount);
+		if (ret) {
+			kfree(*inputarg);
+			*inputarg = NULL;
+			return ret;
+		}
+
+		memset(*inputarg, 0x00, pgcount * PAGE_SIZE);
+	}
+
 	if (hv_root_partition) {
 		outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
 		*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
@@ -387,6 +400,7 @@ int hv_common_cpu_die(unsigned int cpu)
 {
 	unsigned long flags;
 	void **inputarg, **outputarg;
+	int pgcount = hv_root_partition ? 2 : 1;
 	void *mem;
 
 	local_irq_save(flags);
@@ -402,7 +416,12 @@ int hv_common_cpu_die(unsigned int cpu)
 
 	local_irq_restore(flags);
 
-	kfree(mem);
+	if (hv_isolation_type_en_snp()) {
+		if (!set_memory_encrypted((unsigned long)mem, pgcount))
+			kfree(mem);
+	} else {
+		kfree(mem);
+	}
 
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V6 12/14] x86/hyperv: Initialize cpu and memory for sev-snp enlightened guest
  2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
                   ` (10 preceding siblings ...)
  2023-05-15 16:59 ` [RFC PATCH V6 11/14] drivers: hv: Decrypt percpu hvcall input arg page in " Tianyu Lan
@ 2023-05-15 16:59 ` Tianyu Lan
  2023-05-15 16:59 ` [RFC PATCH V6 13/14] x86/hyperv: Add smp support for sev-snp guest Tianyu Lan
  2023-05-15 16:59 ` [RFC PATCH V6 14/14] x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES Tianyu Lan
  13 siblings, 0 replies; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Tianyu Lan <tiala@microsoft.com>

Read processor amd memory info from specific address which are
populated by Hyper-V. Initialize smp cpu related ops, pvalidate
system memory and add it into e820 table.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
Change since RFCv5:
	* Fix getting processor num in the
	hv_snp_get_smp_config() when ealry is false.

Change since RFCv4:
	* Add mem info addr to get mem layout info
---
 arch/x86/hyperv/ivm.c           | 87 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h | 17 +++++++
 arch/x86/kernel/cpu/mshyperv.c  |  3 ++
 3 files changed, 107 insertions(+)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 368b2731950e..85e4378f052f 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -17,6 +17,11 @@
 #include <asm/mem_encrypt.h>
 #include <asm/mshyperv.h>
 #include <asm/hypervisor.h>
+#include <asm/coco.h>
+#include <asm/io_apic.h>
+#include <asm/sev.h>
+#include <asm/realmode.h>
+#include <asm/e820/api.h>
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
@@ -57,6 +62,8 @@ union hv_ghcb {
 
 static u16 hv_ghcb_version __ro_after_init;
 
+static u32 processor_count;
+
 u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
 {
 	union hv_ghcb *hv_ghcb;
@@ -356,6 +363,86 @@ static bool hv_is_private_mmio(u64 addr)
 	return false;
 }
 
+static __init void hv_snp_get_smp_config(unsigned int early)
+{
+	if (early)
+		return;
+
+	/*
+	 * There is no firmware and ACPI MADT table spport in
+	 * in the Hyper-V SEV-SNP enlightened guest. Set smp
+	 * related config variable.
+	 */
+	while (num_processors < processor_count) {
+		early_per_cpu(x86_cpu_to_apicid, num_processors) = num_processors;
+		early_per_cpu(x86_bios_cpu_apicid, num_processors) = num_processors;
+		physid_set(num_processors, phys_cpu_present_map);
+		set_cpu_possible(num_processors, true);
+		set_cpu_present(num_processors, true);
+		num_processors++;
+	}
+}
+
+__init void hv_sev_init_mem_and_cpu(void)
+{
+	struct memory_map_entry *entry;
+	struct e820_entry *e820_entry;
+	u64 e820_end;
+	u64 ram_end;
+	u64 page;
+
+	/*
+	 * Hyper-V enlightened snp guest boots kernel
+	 * directly without bootloader and so roms,
+	 * bios regions and reserve resources are not
+	 * available. Set these callback to NULL.
+	 */
+	x86_platform.legacy.rtc			= 0;
+	x86_platform.legacy.reserve_bios_regions = 0;
+	x86_platform.set_wallclock		= set_rtc_noop;
+	x86_platform.get_wallclock		= get_rtc_noop;
+	x86_init.resources.probe_roms		= x86_init_noop;
+	x86_init.resources.reserve_resources	= x86_init_noop;
+	x86_init.mpparse.find_smp_config	= x86_init_noop;
+	x86_init.mpparse.get_smp_config		= hv_snp_get_smp_config;
+
+	/*
+	 * Hyper-V SEV-SNP enlightened guest doesn't support ioapic
+	 * and legacy APIC page read/write. Switch to hv apic here.
+	 */
+	disable_ioapic_support();
+
+	/* Get processor and mem info. */
+	processor_count = *(u32 *)__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
+	entry = (struct memory_map_entry *)__va(EN_SEV_SNP_MEM_INFO_ADDR);
+
+	/*
+	 * There is no bootloader/EFI firmware in the SEV SNP guest.
+	 * E820 table in the memory just describes memory for kernel,
+	 * ACPI table, cmdline, boot params and ramdisk. The dynamic
+	 * data(e.g, vcpu nnumber and the rest memory layout) needs to
+	 * be read from EN_SEV_SNP_PROCESSOR_INFO_ADDR.
+	 */
+	for (; entry->numpages != 0; entry++) {
+		e820_entry = &e820_table->entries[
+				e820_table->nr_entries - 1];
+		e820_end = e820_entry->addr + e820_entry->size;
+		ram_end = (entry->starting_gpn +
+			   entry->numpages) * PAGE_SIZE;
+
+		if (e820_end < entry->starting_gpn * PAGE_SIZE)
+			e820_end = entry->starting_gpn * PAGE_SIZE;
+
+		if (e820_end < ram_end) {
+			pr_info("Hyper-V: add e820 entry [mem %#018Lx-%#018Lx]\n", e820_end, ram_end - 1);
+			e820__range_add(e820_end, ram_end - e820_end,
+					E820_TYPE_RAM);
+			for (page = e820_end; page < ram_end; page += PAGE_SIZE)
+				pvalidate((unsigned long)__va(page), RMP_PG_SIZE_4K, true);
+		}
+	}
+}
+
 void __init hv_vtom_init(void)
 {
 	/*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 939373791249..84e024ffacd5 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -50,6 +50,21 @@ extern bool hv_isolation_type_en_snp(void);
 
 extern union hv_ghcb * __percpu *hv_ghcb_pg;
 
+/*
+ * Hyper-V puts processor and memory layout info
+ * to this address in SEV-SNP enlightened guest.
+ */
+#define EN_SEV_SNP_PROCESSOR_INFO_ADDR  0x802000
+#define EN_SEV_SNP_MEM_INFO_ADDR	0x802018
+
+struct memory_map_entry {
+	u64 starting_gpn;
+	u64 numpages;
+	u16 type;
+	u16 flags;
+	u32 reserved;
+};
+
 int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
 int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
 int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
@@ -255,12 +270,14 @@ void hv_ghcb_msr_read(u64 msr, u64 *value);
 bool hv_ghcb_negotiate_protocol(void);
 void hv_ghcb_terminate(unsigned int set, unsigned int reason);
 void hv_vtom_init(void);
+void hv_sev_init_mem_and_cpu(void);
 #else
 static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
 static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
 static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
 static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
 static inline void hv_vtom_init(void) {}
+static inline void hv_sev_init_mem_and_cpu(void) {}
 #endif
 
 extern bool hv_isolation_type_snp(void);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 63a2bfbfe701..dea9b881180b 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -529,6 +529,9 @@ static void __init ms_hyperv_init_platform(void)
 	if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
 		mark_tsc_unstable("running on Hyper-V");
 
+	if (hv_isolation_type_en_snp())
+		hv_sev_init_mem_and_cpu();
+
 	hardlockup_detector_disable();
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V6 13/14] x86/hyperv: Add smp support for sev-snp guest
  2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
                   ` (11 preceding siblings ...)
  2023-05-15 16:59 ` [RFC PATCH V6 12/14] x86/hyperv: Initialize cpu and memory for " Tianyu Lan
@ 2023-05-15 16:59 ` Tianyu Lan
  2023-05-16  5:16   ` [EXTERNAL] " Saurabh Singh Sengar
  2023-05-15 16:59 ` [RFC PATCH V6 14/14] x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES Tianyu Lan
  13 siblings, 1 reply; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Tianyu Lan <tiala@microsoft.com>

The wakeup_secondary_cpu callback was populated with wakeup_
cpu_via_vmgexit() which doesn't work for Hyper-V and Hyper-V
requires to call Hyper-V specific hvcall to start APs. So override
it with Hyper-V specific hook to start AP sev_es_save_area data
structure.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
Change since RFC v5:
       * Remove some redundant structure definitions

Change sicne RFC v3:
       * Replace struct sev_es_save_area with struct
         vmcb_save_area
       * Move code from mshyperv.c to ivm.c

Change since RFC v2:
       * Add helper function to initialize segment
       * Fix some coding style
---
 arch/x86/hyperv/ivm.c             | 98 +++++++++++++++++++++++++++++++
 arch/x86/include/asm/mshyperv.h   | 10 ++++
 arch/x86/kernel/cpu/mshyperv.c    | 13 +++-
 include/asm-generic/hyperv-tlfs.h |  3 +-
 4 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 85e4378f052f..b7b8e1ba8223 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -22,11 +22,15 @@
 #include <asm/sev.h>
 #include <asm/realmode.h>
 #include <asm/e820/api.h>
+#include <asm/desc.h>
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
 #define GHCB_USAGE_HYPERV_CALL	1
 
+static u8 ap_start_input_arg[PAGE_SIZE] __bss_decrypted __aligned(PAGE_SIZE);
+static u8 ap_start_stack[PAGE_SIZE] __aligned(PAGE_SIZE);
+
 union hv_ghcb {
 	struct ghcb ghcb;
 	struct {
@@ -443,6 +447,100 @@ __init void hv_sev_init_mem_and_cpu(void)
 	}
 }
 
+#define hv_populate_vmcb_seg(seg, gdtr_base)			\
+do {								\
+	if (seg.selector) {					\
+		seg.base = 0;					\
+		seg.limit = HV_AP_SEGMENT_LIMIT;		\
+		seg.attrib = *(u16 *)(gdtr_base + seg.selector + 5);	\
+		seg.attrib = (seg.attrib & 0xFF) | ((seg.attrib >> 4) & 0xF00); \
+	}							\
+} while (0)							\
+
+int hv_snp_boot_ap(int cpu, unsigned long start_ip)
+{
+	struct sev_es_save_area *vmsa = (struct sev_es_save_area *)
+		__get_free_page(GFP_KERNEL | __GFP_ZERO);
+	struct desc_ptr gdtr;
+	u64 ret, rmp_adjust, retry = 5;
+	struct hv_enable_vp_vtl *start_vp_input;
+	unsigned long flags;
+
+	native_store_gdt(&gdtr);
+
+	vmsa->gdtr.base = gdtr.address;
+	vmsa->gdtr.limit = gdtr.size;
+
+	asm volatile("movl %%es, %%eax;" : "=a" (vmsa->es.selector));
+	hv_populate_vmcb_seg(vmsa->es, vmsa->gdtr.base);
+
+	asm volatile("movl %%cs, %%eax;" : "=a" (vmsa->cs.selector));
+	hv_populate_vmcb_seg(vmsa->cs, vmsa->gdtr.base);
+
+	asm volatile("movl %%ss, %%eax;" : "=a" (vmsa->ss.selector));
+	hv_populate_vmcb_seg(vmsa->ss, vmsa->gdtr.base);
+
+	asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
+	hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
+
+	vmsa->efer = native_read_msr(MSR_EFER);
+
+	asm volatile("movq %%cr4, %%rax;" : "=a" (vmsa->cr4));
+	asm volatile("movq %%cr3, %%rax;" : "=a" (vmsa->cr3));
+	asm volatile("movq %%cr0, %%rax;" : "=a" (vmsa->cr0));
+
+	vmsa->xcr0 = 1;
+	vmsa->g_pat = HV_AP_INIT_GPAT_DEFAULT;
+	vmsa->rip = (u64)secondary_startup_64_no_verify;
+	vmsa->rsp = (u64)&ap_start_stack[PAGE_SIZE];
+
+	/*
+	 * Set the SNP-specific fields for this VMSA:
+	 *   VMPL level
+	 *   SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
+	 */;
+	vmsa->vmpl = 0;
+	vmsa->sev_features = sev_status >> 2;
+
+	/*
+	 * Running at VMPL0 allows the kernel to change the VMSA bit for a page
+	 * using the RMPADJUST instruction. However, for the instruction to
+	 * succeed it must target the permissions of a lesser privileged
+	 * (higher numbered) VMPL level, so use VMPL1 (refer to the RMPADJUST
+	 * instruction in the AMD64 APM Volume 3).
+	 */
+	rmp_adjust = RMPADJUST_VMSA_PAGE_BIT | 1;
+	ret = rmpadjust((unsigned long)vmsa, RMP_PG_SIZE_4K,
+			rmp_adjust);
+	if (ret != 0) {
+		pr_err("RMPADJUST(%llx) failed: %llx\n", (u64)vmsa, ret);
+		return ret;
+	}
+
+	local_irq_save(flags);
+	start_vp_input =
+		(struct hv_enable_vp_vtl *)ap_start_input_arg;
+	memset(start_vp_input, 0, sizeof(*start_vp_input));
+	start_vp_input->partition_id = -1;
+	start_vp_input->vp_index = cpu;
+	start_vp_input->target_vtl.target_vtl = ms_hyperv.vtl;
+	*(u64 *)&start_vp_input->vp_context = __pa(vmsa) | 1;
+
+	do {
+		ret = hv_do_hypercall(HVCALL_START_VP,
+				      start_vp_input, NULL);
+	} while (hv_result(ret) == HV_STATUS_TIME_OUT && retry--);
+
+	if (!hv_result_success(ret)) {
+		pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
+		goto done;
+	}
+
+done:
+	local_irq_restore(flags);
+	return ret;
+}
+
 void __init hv_vtom_init(void)
 {
 	/*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 84e024ffacd5..9ad2a0f21d68 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -65,6 +65,13 @@ struct memory_map_entry {
 	u32 reserved;
 };
 
+/*
+ * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
+ * to start AP in enlightened SEV guest.
+ */
+#define HV_AP_INIT_GPAT_DEFAULT		0x0007040600070406ULL
+#define HV_AP_SEGMENT_LIMIT		0xffffffff
+
 int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
 int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
 int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
@@ -263,6 +270,7 @@ struct irq_domain *hv_create_pci_msi_domain(void);
 int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
 		struct hv_interrupt_entry *entry);
 int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
+int hv_snp_boot_ap(int cpu, unsigned long start_ip);
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 void hv_ghcb_msr_write(u64 msr, u64 value);
@@ -271,6 +279,7 @@ bool hv_ghcb_negotiate_protocol(void);
 void hv_ghcb_terminate(unsigned int set, unsigned int reason);
 void hv_vtom_init(void);
 void hv_sev_init_mem_and_cpu(void);
+int hv_snp_boot_ap(int cpu, unsigned long start_ip);
 #else
 static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
 static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
@@ -278,6 +287,7 @@ static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
 static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
 static inline void hv_vtom_init(void) {}
 static inline void hv_sev_init_mem_and_cpu(void) {}
+static int hv_snp_boot_ap(int cpu, unsigned long start_ip) {}
 #endif
 
 extern bool hv_isolation_type_snp(void);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index dea9b881180b..0c5f9f7bd7ba 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -295,6 +295,16 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
 
 	native_smp_prepare_cpus(max_cpus);
 
+	/*
+	 *  Override wakeup_secondary_cpu_64 callback for SEV-SNP
+	 *  enlightened guest.
+	 */
+	if (hv_isolation_type_en_snp())
+		apic->wakeup_secondary_cpu_64 = hv_snp_boot_ap;
+
+	if (!hv_root_partition)
+		return;
+
 #ifdef CONFIG_X86_64
 	for_each_present_cpu(i) {
 		if (i == 0)
@@ -502,8 +512,7 @@ static void __init ms_hyperv_init_platform(void)
 
 # ifdef CONFIG_SMP
 	smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
-	if (hv_root_partition)
-		smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
+	smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
 # endif
 
 	/*
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index f4e4cc4f965f..92dcc530350c 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -146,9 +146,9 @@ union hv_reference_tsc_msr {
 /* Declare the various hypercall operations. */
 #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE	0x0002
 #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST	0x0003
-#define HVCALL_ENABLE_VP_VTL			0x000f
 #define HVCALL_NOTIFY_LONG_SPIN_WAIT		0x0008
 #define HVCALL_SEND_IPI				0x000b
+#define HVCALL_ENABLE_VP_VTL			0x000f
 #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX	0x0013
 #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX	0x0014
 #define HVCALL_SEND_IPI_EX			0x0015
@@ -223,6 +223,7 @@ enum HV_GENERIC_SET_FORMAT {
 #define HV_STATUS_INVALID_PORT_ID		17
 #define HV_STATUS_INVALID_CONNECTION_ID		18
 #define HV_STATUS_INSUFFICIENT_BUFFERS		19
+#define HV_STATUS_TIME_OUT                      120
 #define HV_STATUS_VTL_ALREADY_ENABLED		134
 
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH V6 14/14] x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES
  2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
                   ` (12 preceding siblings ...)
  2023-05-15 16:59 ` [RFC PATCH V6 13/14] x86/hyperv: Add smp support for sev-snp guest Tianyu Lan
@ 2023-05-15 16:59 ` Tianyu Lan
  13 siblings, 0 replies; 40+ messages in thread
From: Tianyu Lan @ 2023-05-15 16:59 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, peterz, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Tianyu Lan <tiala@microsoft.com>

Add Hyperv-specific handling for faults caused by VMMCALL
instructions.

Signed-off-by: Tianyu Lan <tiala@microsoft.com>
---
 arch/x86/kernel/cpu/mshyperv.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 0c5f9f7bd7ba..3469b369e627 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -32,6 +32,7 @@
 #include <asm/nmi.h>
 #include <clocksource/hyperv_timer.h>
 #include <asm/numa.h>
+#include <asm/svm.h>
 
 /* Is Linux running as the root partition? */
 bool hv_root_partition;
@@ -577,6 +578,20 @@ static bool __init ms_hyperv_msi_ext_dest_id(void)
 	return eax & HYPERV_VS_PROPERTIES_EAX_EXTENDED_IOAPIC_RTE;
 }
 
+static void hv_sev_es_hcall_prepare(struct ghcb *ghcb, struct pt_regs *regs)
+{
+	/* RAX and CPL are already in the GHCB */
+	ghcb_set_rcx(ghcb, regs->cx);
+	ghcb_set_rdx(ghcb, regs->dx);
+	ghcb_set_r8(ghcb, regs->r8);
+}
+
+static bool hv_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs)
+{
+	/* No checking of the return state needed */
+	return true;
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
 	.name			= "Microsoft Hyper-V",
 	.detect			= ms_hyperv_platform,
@@ -584,4 +599,6 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
 	.init.x2apic_available	= ms_hyperv_x2apic_available,
 	.init.msi_ext_dest_id	= ms_hyperv_msi_ext_dest_id,
 	.init.init_platform	= ms_hyperv_init_platform,
+	.runtime.sev_es_hcall_prepare = hv_sev_es_hcall_prepare,
+	.runtime.sev_es_hcall_finish = hv_sev_es_hcall_finish,
 };
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* RE: [EXTERNAL] [RFC PATCH V6 13/14] x86/hyperv: Add smp support for sev-snp guest
  2023-05-15 16:59 ` [RFC PATCH V6 13/14] x86/hyperv: Add smp support for sev-snp guest Tianyu Lan
@ 2023-05-16  5:16   ` Saurabh Singh Sengar
  2023-05-17  8:19     ` Tianyu Lan
  0 siblings, 1 reply; 40+ messages in thread
From: Saurabh Singh Sengar @ 2023-05-16  5:16 UTC (permalink / raw)
  To: Tianyu Lan, luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	pbonzini, jgross, Tianyu Lan, kirill, jiangshan.ljs, peterz,
	ashish.kalra, srutherford, akpm, anshuman.khandual,
	pawan.kumar.gupta, adrian.hunter, daniel.sneddon,
	alexander.shishkin, sandipan.das, ray.huang, brijesh.singh,
	michael.roth, thomas.lendacky, venu.busireddy, sterritt,
	tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch



> -----Original Message-----
> From: Tianyu Lan <ltykernel@gmail.com>
> Sent: Monday, May 15, 2023 10:29 PM
> To: luto@kernel.org; tglx@linutronix.de; mingo@redhat.com; bp@alien8.de;
> dave.hansen@linux.intel.com; x86@kernel.org; hpa@zytor.com;
> seanjc@google.com; pbonzini@redhat.com; jgross@suse.com; Tianyu Lan
> <Tianyu.Lan@microsoft.com>; kirill@shutemov.name;
> jiangshan.ljs@antgroup.com; peterz@infradead.org; ashish.kalra@amd.com;
> srutherford@google.com; akpm@linux-foundation.org;
> anshuman.khandual@arm.com; pawan.kumar.gupta@linux.intel.com;
> adrian.hunter@intel.com; daniel.sneddon@linux.intel.com;
> alexander.shishkin@linux.intel.com; sandipan.das@amd.com;
> ray.huang@amd.com; brijesh.singh@amd.com; michael.roth@amd.com;
> thomas.lendacky@amd.com; venu.busireddy@oracle.com;
> sterritt@google.com; tony.luck@intel.com; samitolvanen@google.com;
> fenghua.yu@intel.com
> Cc: pangupta@amd.com; linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
> linux-hyperv@vger.kernel.org; linux-arch@vger.kernel.org
> Subject: [EXTERNAL] [RFC PATCH V6 13/14] x86/hyperv: Add smp support for
> sev-snp guest
> 
> From: Tianyu Lan <tiala@microsoft.com>
> 
> The wakeup_secondary_cpu callback was populated with wakeup_
> cpu_via_vmgexit() which doesn't work for Hyper-V and Hyper-V requires to
> call Hyper-V specific hvcall to start APs. So override it with Hyper-V specific
> hook to start AP sev_es_save_area data structure.
> 
> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
> ---
> Change since RFC v5:
>        * Remove some redundant structure definitions
> 
> Change sicne RFC v3:
>        * Replace struct sev_es_save_area with struct
>          vmcb_save_area
>        * Move code from mshyperv.c to ivm.c
> 
> Change since RFC v2:
>        * Add helper function to initialize segment
>        * Fix some coding style
> ---
>  arch/x86/hyperv/ivm.c             | 98 +++++++++++++++++++++++++++++++
>  arch/x86/include/asm/mshyperv.h   | 10 ++++
>  arch/x86/kernel/cpu/mshyperv.c    | 13 +++-
>  include/asm-generic/hyperv-tlfs.h |  3 +-
>  4 files changed, 121 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c index
> 85e4378f052f..b7b8e1ba8223 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -22,11 +22,15 @@
>  #include <asm/sev.h>
>  #include <asm/realmode.h>
>  #include <asm/e820/api.h>
> +#include <asm/desc.h>
> 
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
> 
>  #define GHCB_USAGE_HYPERV_CALL	1
> 
> +static u8 ap_start_input_arg[PAGE_SIZE] __bss_decrypted
> +__aligned(PAGE_SIZE); static u8 ap_start_stack[PAGE_SIZE]
> +__aligned(PAGE_SIZE);
> +
>  union hv_ghcb {
>  	struct ghcb ghcb;
>  	struct {
> @@ -443,6 +447,100 @@ __init void hv_sev_init_mem_and_cpu(void)
>  	}
>  }
> 
> +#define hv_populate_vmcb_seg(seg, gdtr_base)			\
> +do {								\
> +	if (seg.selector) {					\
> +		seg.base = 0;					\
> +		seg.limit = HV_AP_SEGMENT_LIMIT;		\
> +		seg.attrib = *(u16 *)(gdtr_base + seg.selector + 5);	\
> +		seg.attrib = (seg.attrib & 0xFF) | ((seg.attrib >> 4) & 0xF00); \
> +	}							\
> +} while (0)							\
> +
> +int hv_snp_boot_ap(int cpu, unsigned long start_ip) {
> +	struct sev_es_save_area *vmsa = (struct sev_es_save_area *)
> +		__get_free_page(GFP_KERNEL | __GFP_ZERO);
> +	struct desc_ptr gdtr;
> +	u64 ret, rmp_adjust, retry = 5;
> +	struct hv_enable_vp_vtl *start_vp_input;
> +	unsigned long flags;
> +
> +	native_store_gdt(&gdtr);
> +
> +	vmsa->gdtr.base = gdtr.address;
> +	vmsa->gdtr.limit = gdtr.size;
> +
> +	asm volatile("movl %%es, %%eax;" : "=a" (vmsa->es.selector));
> +	hv_populate_vmcb_seg(vmsa->es, vmsa->gdtr.base);
> +
> +	asm volatile("movl %%cs, %%eax;" : "=a" (vmsa->cs.selector));
> +	hv_populate_vmcb_seg(vmsa->cs, vmsa->gdtr.base);
> +
> +	asm volatile("movl %%ss, %%eax;" : "=a" (vmsa->ss.selector));
> +	hv_populate_vmcb_seg(vmsa->ss, vmsa->gdtr.base);
> +
> +	asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
> +	hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
> +
> +	vmsa->efer = native_read_msr(MSR_EFER);
> +
> +	asm volatile("movq %%cr4, %%rax;" : "=a" (vmsa->cr4));
> +	asm volatile("movq %%cr3, %%rax;" : "=a" (vmsa->cr3));
> +	asm volatile("movq %%cr0, %%rax;" : "=a" (vmsa->cr0));
> +
> +	vmsa->xcr0 = 1;
> +	vmsa->g_pat = HV_AP_INIT_GPAT_DEFAULT;
> +	vmsa->rip = (u64)secondary_startup_64_no_verify;
> +	vmsa->rsp = (u64)&ap_start_stack[PAGE_SIZE];
> +
> +	/*
> +	 * Set the SNP-specific fields for this VMSA:
> +	 *   VMPL level
> +	 *   SEV_FEATURES (matches the SEV STATUS MSR right shifted 2 bits)
> +	 */;
> +	vmsa->vmpl = 0;
> +	vmsa->sev_features = sev_status >> 2;
> +
> +	/*
> +	 * Running at VMPL0 allows the kernel to change the VMSA bit for a
> page
> +	 * using the RMPADJUST instruction. However, for the instruction to
> +	 * succeed it must target the permissions of a lesser privileged
> +	 * (higher numbered) VMPL level, so use VMPL1 (refer to the
> RMPADJUST
> +	 * instruction in the AMD64 APM Volume 3).
> +	 */
> +	rmp_adjust = RMPADJUST_VMSA_PAGE_BIT | 1;
> +	ret = rmpadjust((unsigned long)vmsa, RMP_PG_SIZE_4K,
> +			rmp_adjust);
> +	if (ret != 0) {
> +		pr_err("RMPADJUST(%llx) failed: %llx\n", (u64)vmsa, ret);
> +		return ret;
> +	}
> +
> +	local_irq_save(flags);
> +	start_vp_input =
> +		(struct hv_enable_vp_vtl *)ap_start_input_arg;
> +	memset(start_vp_input, 0, sizeof(*start_vp_input));
> +	start_vp_input->partition_id = -1;
> +	start_vp_input->vp_index = cpu;
> +	start_vp_input->target_vtl.target_vtl = ms_hyperv.vtl;
> +	*(u64 *)&start_vp_input->vp_context = __pa(vmsa) | 1;
> +
> +	do {
> +		ret = hv_do_hypercall(HVCALL_START_VP,
> +				      start_vp_input, NULL);
> +	} while (hv_result(ret) == HV_STATUS_TIME_OUT && retry--);

can we restore local_irq here ?

> +
> +	if (!hv_result_success(ret)) {
> +		pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
> +		goto done;

No need of goto here.

Regards,
Saurabh

> +	}
> +
> +done:
> +	local_irq_restore(flags);
> +	return ret;
> +}
> +
>  void __init hv_vtom_init(void)
>  {
>  	/*
> diff --git a/arch/x86/include/asm/mshyperv.h
> b/arch/x86/include/asm/mshyperv.h index 84e024ffacd5..9ad2a0f21d68
> 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -65,6 +65,13 @@ struct memory_map_entry {
>  	u32 reserved;
>  };
> 
> +/*
> + * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
> + * to start AP in enlightened SEV guest.
> + */
> +#define HV_AP_INIT_GPAT_DEFAULT		0x0007040600070406ULL
> +#define HV_AP_SEGMENT_LIMIT		0xffffffff
> +
>  int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);  int
> hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);  int
> hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags); @@ -
> 263,6 +270,7 @@ struct irq_domain *hv_create_pci_msi_domain(void);  int
> hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
>  		struct hv_interrupt_entry *entry);
>  int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry
> *entry);
> +int hv_snp_boot_ap(int cpu, unsigned long start_ip);
> 
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>  void hv_ghcb_msr_write(u64 msr, u64 value); @@ -271,6 +279,7 @@ bool
> hv_ghcb_negotiate_protocol(void);  void hv_ghcb_terminate(unsigned int set,
> unsigned int reason);  void hv_vtom_init(void);  void
> hv_sev_init_mem_and_cpu(void);
> +int hv_snp_boot_ap(int cpu, unsigned long start_ip);
>  #else
>  static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}  static inline void
> hv_ghcb_msr_read(u64 msr, u64 *value) {} @@ -278,6 +287,7 @@ static
> inline bool hv_ghcb_negotiate_protocol(void) { return false; }  static inline
> void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}  static inline
> void hv_vtom_init(void) {}  static inline void hv_sev_init_mem_and_cpu(void)
> {}
> +static int hv_snp_boot_ap(int cpu, unsigned long start_ip) {}
>  #endif
> 
>  extern bool hv_isolation_type_snp(void); diff --git
> a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index
> dea9b881180b..0c5f9f7bd7ba 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -295,6 +295,16 @@ static void __init hv_smp_prepare_cpus(unsigned int
> max_cpus)
> 
>  	native_smp_prepare_cpus(max_cpus);
> 
> +	/*
> +	 *  Override wakeup_secondary_cpu_64 callback for SEV-SNP
> +	 *  enlightened guest.
> +	 */
> +	if (hv_isolation_type_en_snp())
> +		apic->wakeup_secondary_cpu_64 = hv_snp_boot_ap;
> +
> +	if (!hv_root_partition)
> +		return;
> +
>  #ifdef CONFIG_X86_64
>  	for_each_present_cpu(i) {
>  		if (i == 0)
> @@ -502,8 +512,7 @@ static void __init ms_hyperv_init_platform(void)
> 
>  # ifdef CONFIG_SMP
>  	smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
> -	if (hv_root_partition)
> -		smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
> +	smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
>  # endif
> 
>  	/*
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-
> tlfs.h
> index f4e4cc4f965f..92dcc530350c 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -146,9 +146,9 @@ union hv_reference_tsc_msr {
>  /* Declare the various hypercall operations. */
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE	0x0002
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST	0x0003
> -#define HVCALL_ENABLE_VP_VTL			0x000f
>  #define HVCALL_NOTIFY_LONG_SPIN_WAIT		0x0008
>  #define HVCALL_SEND_IPI				0x000b
> +#define HVCALL_ENABLE_VP_VTL			0x000f
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX	0x0013
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX	0x0014
>  #define HVCALL_SEND_IPI_EX			0x0015
> @@ -223,6 +223,7 @@ enum HV_GENERIC_SET_FORMAT {
>  #define HV_STATUS_INVALID_PORT_ID		17
>  #define HV_STATUS_INVALID_CONNECTION_ID		18
>  #define HV_STATUS_INSUFFICIENT_BUFFERS		19
> +#define HV_STATUS_TIME_OUT                      120
>  #define HV_STATUS_VTL_ALREADY_ENABLED		134
> 
>  /*
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler
  2023-05-15 16:59 ` [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler Tianyu Lan
@ 2023-05-16  9:30   ` Peter Zijlstra
  2023-05-17  9:01     ` Tianyu Lan
  2023-05-30 12:16     ` Gupta, Pankaj
  0 siblings, 2 replies; 40+ messages in thread
From: Peter Zijlstra @ 2023-05-16  9:30 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, ashish.kalra, srutherford,
	akpm, anshuman.khandual, pawan.kumar.gupta, adrian.hunter,
	daniel.sneddon, alexander.shishkin, sandipan.das, ray.huang,
	brijesh.singh, michael.roth, thomas.lendacky, venu.busireddy,
	sterritt, tony.luck, samitolvanen, fenghua.yu, pangupta,
	linux-kernel, kvm, linux-hyperv, linux-arch

On Mon, May 15, 2023 at 12:59:03PM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <tiala@microsoft.com>
> 
> Add a #HV exception handler that uses IST stack.
> 

Urgh.. that is entirely insufficient. Like it doesn't even begin to
start to cover things.

The whole existing VC IST stack abuse is already a nightmare and you're
duplicating that.. without any explanation for why this would be needed
and how it is correct.

Please try again.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 02/14] x86/sev: Add Check of #HV event in path
  2023-05-15 16:59 ` [RFC PATCH V6 02/14] x86/sev: Add Check of #HV event in path Tianyu Lan
@ 2023-05-16  9:32   ` Peter Zijlstra
  2023-05-17  9:55     ` Tianyu Lan
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2023-05-16  9:32 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, ashish.kalra, srutherford,
	akpm, anshuman.khandual, pawan.kumar.gupta, adrian.hunter,
	daniel.sneddon, alexander.shishkin, sandipan.das, ray.huang,
	brijesh.singh, michael.roth, thomas.lendacky, venu.busireddy,
	sterritt, tony.luck, samitolvanen, fenghua.yu, pangupta,
	linux-kernel, kvm, linux-hyperv, linux-arch

On Mon, May 15, 2023 at 12:59:04PM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <tiala@microsoft.com>
> 
> Add check_hv_pending() and check_hv_pending_after_irq() to
> check queued #HV event when irq is disabled.
> 
> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
> ---
>  arch/x86/entry/entry_64.S       | 18 ++++++++++++++++
>  arch/x86/include/asm/irqflags.h | 14 +++++++++++-
>  arch/x86/kernel/sev.c           | 38 +++++++++++++++++++++++++++++++++
>  3 files changed, 69 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 653b1f10699b..147b850babf6 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1019,6 +1019,15 @@ SYM_CODE_END(paranoid_entry)
>   * R15 - old SPEC_CTRL
>   */
>  SYM_CODE_START_LOCAL(paranoid_exit)
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +	/*
> +	 * If a #HV was delivered during execution and interrupts were
> +	 * disabled, then check if it can be handled before the iret
> +	 * (which may re-enable interrupts).
> +	 */
> +	mov     %rsp, %rdi
> +	call    check_hv_pending
> +#endif
>  	UNWIND_HINT_REGS
>  
>  	/*
> @@ -1143,6 +1152,15 @@ SYM_CODE_START(error_entry)
>  SYM_CODE_END(error_entry)
>  
>  SYM_CODE_START_LOCAL(error_return)
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +	/*
> +	 * If a #HV was delivered during execution and interrupts were
> +	 * disabled, then check if it can be handled before the iret
> +	 * (which may re-enable interrupts).
> +	 */
> +	mov     %rsp, %rdi
> +	call    check_hv_pending
> +#endif
>  	UNWIND_HINT_REGS
>  	DEBUG_ENTRY_ASSERT_IRQS_OFF
>  	testb	$3, CS(%rsp)

Oh hell no... do now you're adding unconditional calls to every single
interrupt and nmi exit path, with the grand total of 0 justification.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 03/14] x86/sev: Add AMD sev-snp enlightened guest support on hyperv
  2023-05-15 16:59 ` [RFC PATCH V6 03/14] x86/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
@ 2023-05-16  9:40   ` Peter Zijlstra
  2023-05-16 15:38     ` Dionna Amalie Glaze
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2023-05-16  9:40 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, ashish.kalra, srutherford,
	akpm, anshuman.khandual, pawan.kumar.gupta, adrian.hunter,
	daniel.sneddon, alexander.shishkin, sandipan.das, ray.huang,
	brijesh.singh, michael.roth, thomas.lendacky, venu.busireddy,
	sterritt, tony.luck, samitolvanen, fenghua.yu, pangupta,
	linux-kernel, kvm, linux-hyperv, linux-arch

On Mon, May 15, 2023 at 12:59:05PM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <tiala@microsoft.com>
> 
> Enable #HV exception to handle interrupt requests from hypervisor.
> 
> Co-developed-by: Lendacky Thomas <thomas.lendacky@amd.com>
> Co-developed-by: Kalra Ashish <ashish.kalra@amd.com>
> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
> ---
> Change since RFC V5:
>        * Merge patch "x86/sev: Fix interrupt exit code paths from
>         #HV exception" with this commit.
> 
> Change since RFC V3:
>        * Check NMI event when irq is disabled.
>        * Remove redundant variable
> ---
>  arch/x86/include/asm/idtentry.h    |  12 +-
>  arch/x86/include/asm/mem_encrypt.h |   2 +
>  arch/x86/include/uapi/asm/svm.h    |   4 +
>  arch/x86/kernel/sev.c              | 349 ++++++++++++++++++++++++-----
>  arch/x86/kernel/traps.c            |   2 +
>  5 files changed, 310 insertions(+), 59 deletions(-)
> 
> diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
> index b0f3501b2767..867073ccf1d1 100644
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
> @@ -13,6 +13,12 @@
>  
>  #include <asm/irq_stack.h>
>  
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +noinstr void irqentry_exit_hv_cond(struct pt_regs *regs, irqentry_state_t state);
> +#else
> +#define irqentry_exit_hv_cond(regs, state)	irqentry_exit(regs, state)
> +#endif
> +
>  /**
>   * DECLARE_IDTENTRY - Declare functions for simple IDT entry points
>   *		      No error code pushed by hardware
> @@ -201,7 +207,7 @@ __visible noinstr void func(struct pt_regs *regs,			\
>  	kvm_set_cpu_l1tf_flush_l1d();					\
>  	run_irq_on_irqstack_cond(__##func, regs, vector);		\
>  	instrumentation_end();						\
> -	irqentry_exit(regs, state);					\
> +	irqentry_exit_hv_cond(regs, state);				\
>  }									\
>  									\
>  static noinline void __##func(struct pt_regs *regs, u32 vector)
> @@ -241,7 +247,7 @@ __visible noinstr void func(struct pt_regs *regs)			\
>  	kvm_set_cpu_l1tf_flush_l1d();					\
>  	run_sysvec_on_irqstack_cond(__##func, regs);			\
>  	instrumentation_end();						\
> -	irqentry_exit(regs, state);					\
> +	irqentry_exit_hv_cond(regs, state);				\
>  }									\
>  									\
>  static noinline void __##func(struct pt_regs *regs)
> @@ -270,7 +276,7 @@ __visible noinstr void func(struct pt_regs *regs)			\
>  	__##func (regs);						\
>  	__irq_exit_raw();						\
>  	instrumentation_end();						\
> -	irqentry_exit(regs, state);					\
> +	irqentry_exit_hv_cond(regs, state);				\
>  }									\
>  									\
>  static __always_inline void __##func(struct pt_regs *regs)

WTF is this supposed to do and why is this the right way to achieve the
desired result?

Your changelog gives me 0 clues -- guess how much I then care about your
patches?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 04/14] x86/sev: optimize system vector processing invoked from #HV exception
  2023-05-15 16:59 ` [RFC PATCH V6 04/14] x86/sev: optimize system vector processing invoked from #HV exception Tianyu Lan
@ 2023-05-16 10:23   ` Peter Zijlstra
  2023-05-17 13:28     ` Tianyu Lan
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2023-05-16 10:23 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, ashish.kalra, srutherford,
	akpm, anshuman.khandual, pawan.kumar.gupta, adrian.hunter,
	daniel.sneddon, alexander.shishkin, sandipan.das, ray.huang,
	brijesh.singh, michael.roth, thomas.lendacky, venu.busireddy,
	sterritt, tony.luck, samitolvanen, fenghua.yu, pangupta,
	linux-kernel, kvm, linux-hyperv, linux-arch

On Mon, May 15, 2023 at 12:59:06PM -0400, Tianyu Lan wrote:

So your subject states:

> Subject: [RFC PATCH V6 04/14] x86/sev: optimize system vector processing invoked from #HV exception
                                         ^^^^^^^^

> @@ -228,51 +238,11 @@ static void do_exc_hv(struct pt_regs *regs)
>  		} else if (pending_events.vector == IA32_SYSCALL_VECTOR) {
>  			WARN(1, "syscall shouldn't happen\n");
>  		} else if (pending_events.vector >= FIRST_SYSTEM_VECTOR) {
> -			switch (pending_events.vector) {
> -#if IS_ENABLED(CONFIG_HYPERV)
> -			case HYPERV_STIMER0_VECTOR:
> -				sysvec_hyperv_stimer0(regs);
> -				break;
> -			case HYPERVISOR_CALLBACK_VECTOR:
> -				sysvec_hyperv_callback(regs);
> -				break;
> -#endif
> -#ifdef CONFIG_SMP
> -			case RESCHEDULE_VECTOR:
> -				sysvec_reschedule_ipi(regs);
> -				break;
> -			case IRQ_MOVE_CLEANUP_VECTOR:
> -				sysvec_irq_move_cleanup(regs);
> -				break;
> -			case REBOOT_VECTOR:
> -				sysvec_reboot(regs);
> -				break;
> -			case CALL_FUNCTION_SINGLE_VECTOR:
> -				sysvec_call_function_single(regs);
> -				break;
> -			case CALL_FUNCTION_VECTOR:
> -				sysvec_call_function(regs);
> -				break;
> -#endif
> -#ifdef CONFIG_X86_LOCAL_APIC
> -			case ERROR_APIC_VECTOR:
> -				sysvec_error_interrupt(regs);
> -				break;
> -			case SPURIOUS_APIC_VECTOR:
> -				sysvec_spurious_apic_interrupt(regs);
> -				break;
> -			case LOCAL_TIMER_VECTOR:
> -				sysvec_apic_timer_interrupt(regs);
> -				break;
> -			case X86_PLATFORM_IPI_VECTOR:
> -				sysvec_x86_platform_ipi(regs);
> -				break;
> -#endif
> -			case 0x0:
> -				break;
> -			default:
> -				panic("Unexpected vector %d\n", vector);
> -				unreachable();
> +			if (!(sysvec_table[pending_events.vector - FIRST_SYSTEM_VECTOR])) {
> +				WARN(1, "system vector entry 0x%x is NULL\n",
> +				     pending_events.vector);
> +			} else {
> +				(*sysvec_table[pending_events.vector - FIRST_SYSTEM_VECTOR])(regs);
>  			}
>  		} else {
>  			common_interrupt(regs, pending_events.vector);

But your code replace direct calls with an indirect call. Now AFAIK,
this SNP shit came with Zen3, and Zen3 still uses retpolines for
indirect calls.

Can you connect the dots?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 08/14] x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest
  2023-05-15 16:59 ` [RFC PATCH V6 08/14] x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest Tianyu Lan
@ 2023-05-16 10:29   ` Peter Zijlstra
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2023-05-16 10:29 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, ashish.kalra, srutherford,
	akpm, anshuman.khandual, pawan.kumar.gupta, adrian.hunter,
	daniel.sneddon, alexander.shishkin, sandipan.das, ray.huang,
	brijesh.singh, michael.roth, thomas.lendacky, venu.busireddy,
	sterritt, tony.luck, samitolvanen, fenghua.yu, pangupta,
	linux-kernel, kvm, linux-hyperv, linux-arch

On Mon, May 15, 2023 at 12:59:10PM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <tiala@microsoft.com>
> 
> In sev-snp enlightened guest, Hyper-V hypercall needs
> to use vmmcall to trigger vmexit and notify hypervisor
> to handle hypercall request.
> 
> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
> ---
> Change since RFC V2:
>        * Fix indentation style
> ---
>  arch/x86/include/asm/mshyperv.h | 44 ++++++++++++++++++++++++---------
>  1 file changed, 33 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 97d117ec95c4..939373791249 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -61,16 +61,25 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  	u64 hv_status;
>  
>  #ifdef CONFIG_X86_64
> -	if (!hv_hypercall_pg)
> -		return U64_MAX;
> +	if (hv_isolation_type_en_snp()) {
> +		__asm__ __volatile__("mov %4, %%r8\n"
> +				     "vmmcall"
> +				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> +				       "+c" (control), "+d" (input_address)
> +				     :  "r" (output_address)
> +				     : "cc", "memory", "r8", "r9", "r10", "r11");
> +	} else {
> +		if (!hv_hypercall_pg)
> +			return U64_MAX;
>  
> -	__asm__ __volatile__("mov %4, %%r8\n"
> -			     CALL_NOSPEC
> -			     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> -			       "+c" (control), "+d" (input_address)
> -			     :  "r" (output_address),
> -				THUNK_TARGET(hv_hypercall_pg)
> -			     : "cc", "memory", "r8", "r9", "r10", "r11");
> +		__asm__ __volatile__("mov %4, %%r8\n"
> +				     CALL_NOSPEC
> +				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> +				       "+c" (control), "+d" (input_address)
> +				     :  "r" (output_address),
> +					THUNK_TARGET(hv_hypercall_pg)
> +				     : "cc", "memory", "r8", "r9", "r10", "r11");
> +	}

Wouldn't this generate better code with an alternative?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 03/14] x86/sev: Add AMD sev-snp enlightened guest support on hyperv
  2023-05-16  9:40   ` Peter Zijlstra
@ 2023-05-16 15:38     ` Dionna Amalie Glaze
  0 siblings, 0 replies; 40+ messages in thread
From: Dionna Amalie Glaze @ 2023-05-16 15:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tianyu Lan, luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	pbonzini, jgross, tiala, kirill, jiangshan.ljs, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu,
	pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

>
> WTF is this supposed to do and why is this the right way to achieve the
> desired result?
>
> Your changelog gives me 0 clues -- guess how much I then care about your
> patches?

Excuse me? No. This is incredibly rude and violates the community code
of conduct. Please review examples of creating a positive environment
here https://docs.kernel.org/process/code-of-conduct.html

-- 
-Dionna Glaze, PhD (she/her)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [EXTERNAL] [RFC PATCH V6 13/14] x86/hyperv: Add smp support for sev-snp guest
  2023-05-16  5:16   ` [EXTERNAL] " Saurabh Singh Sengar
@ 2023-05-17  8:19     ` Tianyu Lan
  0 siblings, 0 replies; 40+ messages in thread
From: Tianyu Lan @ 2023-05-17  8:19 UTC (permalink / raw)
  To: Saurabh Singh Sengar, luto, tglx, mingo, bp, dave.hansen, x86,
	hpa, seanjc, pbonzini, jgross, Tianyu Lan, kirill, jiangshan.ljs,
	peterz, ashish.kalra, srutherford, akpm, anshuman.khandual,
	pawan.kumar.gupta, adrian.hunter, daniel.sneddon,
	alexander.shishkin, sandipan.das, ray.huang, brijesh.singh,
	michael.roth, thomas.lendacky, venu.busireddy, sterritt,
	tony.luck, samitolvanen, fenghua.yu
  Cc: pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

On 5/16/2023 1:16 PM, Saurabh Singh Sengar wrote:
>> +		(struct hv_enable_vp_vtl *)ap_start_input_arg;
>> +	memset(start_vp_input, 0, sizeof(*start_vp_input));
>> +	start_vp_input->partition_id = -1;
>> +	start_vp_input->vp_index = cpu;
>> +	start_vp_input->target_vtl.target_vtl = ms_hyperv.vtl;
>> +	*(u64 *)&start_vp_input->vp_context = __pa(vmsa) | 1;
>> +
>> +	do {
>> +		ret = hv_do_hypercall(HVCALL_START_VP,
>> +				      start_vp_input, NULL);
>> +	} while (hv_result(ret) == HV_STATUS_TIME_OUT && retry--);
> can we restore local_irq here ?
> 
>> +
>> +	if (!hv_result_success(ret)) {
>> +		pr_err("HvCallStartVirtualProcessor failed: %llx\n", ret);
>> +		goto done;
> No need of goto here.
> 

Nice catch. The goto label should be removed here. Will update in the 
next version.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler
  2023-05-16  9:30   ` Peter Zijlstra
@ 2023-05-17  9:01     ` Tianyu Lan
  2023-05-30 12:16     ` Gupta, Pankaj
  1 sibling, 0 replies; 40+ messages in thread
From: Tianyu Lan @ 2023-05-17  9:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, ashish.kalra, srutherford,
	akpm, anshuman.khandual, pawan.kumar.gupta, adrian.hunter,
	daniel.sneddon, alexander.shishkin, sandipan.das, ray.huang,
	brijesh.singh, michael.roth, thomas.lendacky, venu.busireddy,
	sterritt, tony.luck, samitolvanen, fenghua.yu, pangupta,
	linux-kernel, kvm, linux-hyperv, linux-arch

On 5/16/2023 5:30 PM, Peter Zijlstra wrote:
> On Mon, May 15, 2023 at 12:59:03PM -0400, Tianyu Lan wrote:
>> From: Tianyu Lan<tiala@microsoft.com>
>>
>> Add a #HV exception handler that uses IST stack.
>>
> Urgh.. that is entirely insufficient. Like it doesn't even begin to
> start to cover things.
> 
> The whole existing VC IST stack abuse is already a nightmare and you're
> duplicating that.. without any explanation for why this would be needed
> and how it is correct.
> 
> Please try again.

Hi Peter:
	Thanks for your review. Will add more explanation in the next version.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 02/14] x86/sev: Add Check of #HV event in path
  2023-05-16  9:32   ` Peter Zijlstra
@ 2023-05-17  9:55     ` Tianyu Lan
  2023-05-17 13:09       ` Peter Zijlstra
  0 siblings, 1 reply; 40+ messages in thread
From: Tianyu Lan @ 2023-05-17  9:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, ashish.kalra, srutherford,
	akpm, anshuman.khandual, pawan.kumar.gupta, adrian.hunter,
	daniel.sneddon, alexander.shishkin, sandipan.das, ray.huang,
	brijesh.singh, michael.roth, thomas.lendacky, venu.busireddy,
	sterritt, tony.luck, samitolvanen, fenghua.yu, pangupta,
	linux-kernel, kvm, linux-hyperv, linux-arch

On 5/16/2023 5:32 PM, Peter Zijlstra wrote:
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -1019,6 +1019,15 @@ SYM_CODE_END(paranoid_entry)
>>    * R15 - old SPEC_CTRL
>>    */
>>   SYM_CODE_START_LOCAL(paranoid_exit)
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +	/*
>> +	 * If a #HV was delivered during execution and interrupts were
>> +	 * disabled, then check if it can be handled before the iret
>> +	 * (which may re-enable interrupts).
>> +	 */
>> +	mov     %rsp, %rdi
>> +	call    check_hv_pending
>> +#endif
>>   	UNWIND_HINT_REGS
>>   
>>   	/*
>> @@ -1143,6 +1152,15 @@ SYM_CODE_START(error_entry)
>>   SYM_CODE_END(error_entry)
>>   
>>   SYM_CODE_START_LOCAL(error_return)
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +	/*
>> +	 * If a #HV was delivered during execution and interrupts were
>> +	 * disabled, then check if it can be handled before the iret
>> +	 * (which may re-enable interrupts).
>> +	 */
>> +	mov     %rsp, %rdi
>> +	call    check_hv_pending
>> +#endif
>>   	UNWIND_HINT_REGS
>>   	DEBUG_ENTRY_ASSERT_IRQS_OFF
>>   	testb	$3, CS(%rsp)
> Oh hell no... do now you're adding unconditional calls to every single
> interrupt and nmi exit path, with the grand total of 0 justification.
> 

Sorry to Add check inside of check_hv_pending(). Will move the check 
before calling check_hv_pending() in the next version. Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 02/14] x86/sev: Add Check of #HV event in path
  2023-05-17  9:55     ` Tianyu Lan
@ 2023-05-17 13:09       ` Peter Zijlstra
  2023-05-31 14:50         ` Michael Kelley (LINUX)
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2023-05-17 13:09 UTC (permalink / raw)
  To: Tianyu Lan
  Cc: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, ashish.kalra, srutherford,
	akpm, anshuman.khandual, pawan.kumar.gupta, adrian.hunter,
	daniel.sneddon, alexander.shishkin, sandipan.das, ray.huang,
	brijesh.singh, michael.roth, thomas.lendacky, venu.busireddy,
	sterritt, tony.luck, samitolvanen, fenghua.yu, pangupta,
	linux-kernel, kvm, linux-hyperv, linux-arch

On Wed, May 17, 2023 at 05:55:45PM +0800, Tianyu Lan wrote:
> On 5/16/2023 5:32 PM, Peter Zijlstra wrote:
> > > --- a/arch/x86/entry/entry_64.S
> > > +++ b/arch/x86/entry/entry_64.S
> > > @@ -1019,6 +1019,15 @@ SYM_CODE_END(paranoid_entry)
> > >    * R15 - old SPEC_CTRL
> > >    */
> > >   SYM_CODE_START_LOCAL(paranoid_exit)
> > > +#ifdef CONFIG_AMD_MEM_ENCRYPT
> > > +	/*
> > > +	 * If a #HV was delivered during execution and interrupts were
> > > +	 * disabled, then check if it can be handled before the iret
> > > +	 * (which may re-enable interrupts).
> > > +	 */
> > > +	mov     %rsp, %rdi
> > > +	call    check_hv_pending
> > > +#endif
> > >   	UNWIND_HINT_REGS
> > >   	/*
> > > @@ -1143,6 +1152,15 @@ SYM_CODE_START(error_entry)
> > >   SYM_CODE_END(error_entry)
> > >   SYM_CODE_START_LOCAL(error_return)
> > > +#ifdef CONFIG_AMD_MEM_ENCRYPT
> > > +	/*
> > > +	 * If a #HV was delivered during execution and interrupts were
> > > +	 * disabled, then check if it can be handled before the iret
> > > +	 * (which may re-enable interrupts).
> > > +	 */
> > > +	mov     %rsp, %rdi
> > > +	call    check_hv_pending
> > > +#endif
> > >   	UNWIND_HINT_REGS
> > >   	DEBUG_ENTRY_ASSERT_IRQS_OFF
> > >   	testb	$3, CS(%rsp)
> > Oh hell no... do now you're adding unconditional calls to every single
> > interrupt and nmi exit path, with the grand total of 0 justification.
> > 
> 
> Sorry to Add check inside of check_hv_pending(). Will move the check before
> calling check_hv_pending() in the next version. Thanks.

You will also explain, in the Changelog, in excruciating detail, *WHY*
any of this is required.

Any additional code in these paths that are only required for some
random hypervisor had better proof that they are absolutely required and
no alternative solution exists and have no performance impact on normal
users.

If this is due to Hyper-V design idiocies over something fundamentally
required by the hardware design you'll get a NAK.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 04/14] x86/sev: optimize system vector processing invoked from #HV exception
  2023-05-16 10:23   ` Peter Zijlstra
@ 2023-05-17 13:28     ` Tianyu Lan
  0 siblings, 0 replies; 40+ messages in thread
From: Tianyu Lan @ 2023-05-17 13:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, ashish.kalra, srutherford,
	akpm, anshuman.khandual, pawan.kumar.gupta, adrian.hunter,
	daniel.sneddon, alexander.shishkin, sandipan.das, ray.huang,
	brijesh.singh, michael.roth, thomas.lendacky, venu.busireddy,
	sterritt, tony.luck, samitolvanen, fenghua.yu, pangupta,
	linux-kernel, kvm, linux-hyperv, linux-arch

On 5/16/2023 6:23 PM, Peter Zijlstra wrote:
>> -				panic("Unexpected vector %d\n", vector);
>> -				unreachable();
>> +			if (!(sysvec_table[pending_events.vector - FIRST_SYSTEM_VECTOR])) {
>> +				WARN(1, "system vector entry 0x%x is NULL\n",
>> +				     pending_events.vector);
>> +			} else {
>> +				(*sysvec_table[pending_events.vector - FIRST_SYSTEM_VECTOR])(regs);
>>   			}
>>   		} else {
>>   			common_interrupt(regs, pending_events.vector);
> But your code replace direct calls with an indirect call. Now AFAIK,
> this SNP shit came with Zen3, and Zen3 still uses retpolines for
> indirect calls.
> 
> Can you connect the dots?


The title is no exact and will update in the next version. Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler
  2023-05-16  9:30   ` Peter Zijlstra
  2023-05-17  9:01     ` Tianyu Lan
@ 2023-05-30 12:16     ` Gupta, Pankaj
  2023-05-30 14:35       ` Peter Zijlstra
  2023-05-30 15:18       ` Dave Hansen
  1 sibling, 2 replies; 40+ messages in thread
From: Gupta, Pankaj @ 2023-05-30 12:16 UTC (permalink / raw)
  To: Peter Zijlstra, Tianyu Lan
  Cc: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, ashish.kalra, srutherford,
	akpm, anshuman.khandual, pawan.kumar.gupta, adrian.hunter,
	daniel.sneddon, alexander.shishkin, sandipan.das, ray.huang,
	brijesh.singh, michael.roth, thomas.lendacky, venu.busireddy,
	sterritt, tony.luck, samitolvanen, fenghua.yu, pangupta,
	linux-kernel, kvm, linux-hyperv, linux-arch


>> Add a #HV exception handler that uses IST stack.
>>
> 
> Urgh.. that is entirely insufficient. Like it doesn't even begin to
> start to cover things.
> 
> The whole existing VC IST stack abuse is already a nightmare and you're
> duplicating that.. without any explanation for why this would be needed
> and how it is correct.
> 
> Please try again.

#HV handler handles both #NMI & #MCE in the guest and nested #HV is 
never raised by the hypervisor. Next #HV exception is only raised by the 
hypervisor when Guest acknowledges the pending #HV exception by clearing 
"NoFurtherSignal” bit in the doorbell page.

There is still protection (please see hv_switch_off_ist()) to gracefully 
exit the guest if by any chance a malicious hypervisor sends nested #HV. 
This saves with most of the nested IST stack pitfalls with #NMI & #MCE, 
also #DB is handled in noinstr code 
block(exc_vmm_communication()->vc_is_db {...}) hence avoid any recursive 
#DBs.

Do you see anything else needs to be handled in #HV IST handling?

Thanks,
Pankaj



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler
  2023-05-30 12:16     ` Gupta, Pankaj
@ 2023-05-30 14:35       ` Peter Zijlstra
  2023-05-30 15:59         ` Tom Lendacky
  2023-05-30 15:18       ` Dave Hansen
  1 sibling, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2023-05-30 14:35 UTC (permalink / raw)
  To: Gupta, Pankaj
  Cc: Tianyu Lan, luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	pbonzini, jgross, tiala, kirill, jiangshan.ljs, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu,
	pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

On Tue, May 30, 2023 at 02:16:55PM +0200, Gupta, Pankaj wrote:
> 
> > > Add a #HV exception handler that uses IST stack.
> > > 
> > 
> > Urgh.. that is entirely insufficient. Like it doesn't even begin to
> > start to cover things.
> > 
> > The whole existing VC IST stack abuse is already a nightmare and you're
> > duplicating that.. without any explanation for why this would be needed
> > and how it is correct.
> > 
> > Please try again.
> 
> #HV handler handles both #NMI & #MCE in the guest and nested #HV is never
> raised by the hypervisor. 

I thought all this confidental computing nonsense was about not trusting
the hypervisor, so how come we're now relying on the hypervisor being
sane?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler
  2023-05-30 12:16     ` Gupta, Pankaj
  2023-05-30 14:35       ` Peter Zijlstra
@ 2023-05-30 15:18       ` Dave Hansen
  1 sibling, 0 replies; 40+ messages in thread
From: Dave Hansen @ 2023-05-30 15:18 UTC (permalink / raw)
  To: Gupta, Pankaj, Peter Zijlstra, Tianyu Lan
  Cc: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, tiala, kirill, jiangshan.ljs, ashish.kalra, srutherford,
	akpm, anshuman.khandual, pawan.kumar.gupta, adrian.hunter,
	daniel.sneddon, alexander.shishkin, sandipan.das, ray.huang,
	brijesh.singh, michael.roth, thomas.lendacky, venu.busireddy,
	sterritt, tony.luck, samitolvanen, fenghua.yu, pangupta,
	linux-kernel, kvm, linux-hyperv, linux-arch

On 5/30/23 05:16, Gupta, Pankaj wrote:
> #HV handler handles both #NMI & #MCE in the guest and nested #HV is
> never raised by the hypervisor. Next #HV exception is only raised by the
> hypervisor when Guest acknowledges the pending #HV exception by clearing
> "NoFurtherSignal” bit in the doorbell page.

There's a big difference between "is never raised by" and "cannot be
raised by".

Either way, this series (and this patch in particular) needs some much
better changelogs so that this behavior is clear.  It would also be nice
to reference the relevant parts of the hardware specs if the "hardware"*
is helping to provide these guarantees.

* I say "hardware" in quotes because on TDX a big chunk of this behavior
  is implemented in software in the TDX module.  SEV probably does it in
  microcode (or maybe in the secure processor), but I kinda doubt it's
  purely silicon.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler
  2023-05-30 14:35       ` Peter Zijlstra
@ 2023-05-30 15:59         ` Tom Lendacky
  2023-05-30 18:52           ` Peter Zijlstra
  0 siblings, 1 reply; 40+ messages in thread
From: Tom Lendacky @ 2023-05-30 15:59 UTC (permalink / raw)
  To: Peter Zijlstra, Gupta, Pankaj
  Cc: Tianyu Lan, luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	pbonzini, jgross, tiala, kirill, jiangshan.ljs, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, venu.busireddy, sterritt,
	tony.luck, samitolvanen, fenghua.yu, pangupta, linux-kernel, kvm,
	linux-hyperv, linux-arch

On 5/30/23 09:35, Peter Zijlstra wrote:
> On Tue, May 30, 2023 at 02:16:55PM +0200, Gupta, Pankaj wrote:
>>
>>>> Add a #HV exception handler that uses IST stack.
>>>>
>>>
>>> Urgh.. that is entirely insufficient. Like it doesn't even begin to
>>> start to cover things.
>>>
>>> The whole existing VC IST stack abuse is already a nightmare and you're
>>> duplicating that.. without any explanation for why this would be needed
>>> and how it is correct.
>>>
>>> Please try again.
>>
>> #HV handler handles both #NMI & #MCE in the guest and nested #HV is never
>> raised by the hypervisor.
> 
> I thought all this confidental computing nonsense was about not trusting
> the hypervisor, so how come we're now relying on the hypervisor being
> sane?

That should really say that a nested #HV should never be raised by the 
hypervisor, but if it is, then the guest should detect that and 
self-terminate knowing that the hypervisor is possibly being malicious.

Thanks,
Tom

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler
  2023-05-30 15:59         ` Tom Lendacky
@ 2023-05-30 18:52           ` Peter Zijlstra
  2023-05-30 19:03             ` Dave Hansen
                               ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Peter Zijlstra @ 2023-05-30 18:52 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Gupta, Pankaj, Tianyu Lan, luto, tglx, mingo, bp, dave.hansen,
	x86, hpa, seanjc, pbonzini, jgross, tiala, kirill, jiangshan.ljs,
	ashish.kalra, srutherford, akpm, anshuman.khandual,
	pawan.kumar.gupta, adrian.hunter, daniel.sneddon,
	alexander.shishkin, sandipan.das, ray.huang, brijesh.singh,
	michael.roth, venu.busireddy, sterritt, tony.luck, samitolvanen,
	fenghua.yu, pangupta, linux-kernel, kvm, linux-hyperv,
	linux-arch

On Tue, May 30, 2023 at 10:59:01AM -0500, Tom Lendacky wrote:
> On 5/30/23 09:35, Peter Zijlstra wrote:
> > On Tue, May 30, 2023 at 02:16:55PM +0200, Gupta, Pankaj wrote:
> > > 
> > > > > Add a #HV exception handler that uses IST stack.
> > > > > 
> > > > 
> > > > Urgh.. that is entirely insufficient. Like it doesn't even begin to
> > > > start to cover things.
> > > > 
> > > > The whole existing VC IST stack abuse is already a nightmare and you're
> > > > duplicating that.. without any explanation for why this would be needed
> > > > and how it is correct.
> > > > 
> > > > Please try again.
> > > 
> > > #HV handler handles both #NMI & #MCE in the guest and nested #HV is never
> > > raised by the hypervisor.
> > 
> > I thought all this confidental computing nonsense was about not trusting
> > the hypervisor, so how come we're now relying on the hypervisor being
> > sane?
> 
> That should really say that a nested #HV should never be raised by the
> hypervisor, but if it is, then the guest should detect that and
> self-terminate knowing that the hypervisor is possibly being malicious.

I've yet to see code that can do that reliably.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler
  2023-05-30 18:52           ` Peter Zijlstra
@ 2023-05-30 19:03             ` Dave Hansen
  2023-05-31  9:14             ` Peter Zijlstra
  2023-06-06  6:00             ` Gupta, Pankaj
  2 siblings, 0 replies; 40+ messages in thread
From: Dave Hansen @ 2023-05-30 19:03 UTC (permalink / raw)
  To: Peter Zijlstra, Tom Lendacky
  Cc: Gupta, Pankaj, Tianyu Lan, luto, tglx, mingo, bp, dave.hansen,
	x86, hpa, seanjc, pbonzini, jgross, tiala, kirill, jiangshan.ljs,
	ashish.kalra, srutherford, akpm, anshuman.khandual,
	pawan.kumar.gupta, adrian.hunter, daniel.sneddon,
	alexander.shishkin, sandipan.das, ray.huang, brijesh.singh,
	michael.roth, venu.busireddy, sterritt, tony.luck, samitolvanen,
	fenghua.yu, pangupta, linux-kernel, kvm, linux-hyperv,
	linux-arch

On 5/30/23 11:52, Peter Zijlstra wrote:
>> That should really say that a nested #HV should never be raised by the
>> hypervisor, but if it is, then the guest should detect that and
>> self-terminate knowing that the hypervisor is possibly being malicious.
> I've yet to see code that can do that reliably.

By "#HV should never be raised by the hypervisor", I think Tom means:

	#HV can and will be raised by malicious hypervisors and the
	guest must be able to unambiguously handle it in a way that
	will not result in the guest getting rooted.

Right? ;)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler
  2023-05-30 18:52           ` Peter Zijlstra
  2023-05-30 19:03             ` Dave Hansen
@ 2023-05-31  9:14             ` Peter Zijlstra
  2023-06-07 18:19               ` Tom Lendacky
  2023-06-06  6:00             ` Gupta, Pankaj
  2 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2023-05-31  9:14 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Gupta, Pankaj, Tianyu Lan, luto, tglx, mingo, bp, dave.hansen,
	x86, hpa, seanjc, pbonzini, jgross, tiala, kirill, jiangshan.ljs,
	ashish.kalra, srutherford, akpm, anshuman.khandual,
	pawan.kumar.gupta, adrian.hunter, daniel.sneddon,
	alexander.shishkin, sandipan.das, ray.huang, brijesh.singh,
	michael.roth, venu.busireddy, sterritt, tony.luck, samitolvanen,
	fenghua.yu, pangupta, linux-kernel, kvm, linux-hyperv,
	linux-arch

On Tue, May 30, 2023 at 08:52:32PM +0200, Peter Zijlstra wrote:

> > That should really say that a nested #HV should never be raised by the
> > hypervisor, but if it is, then the guest should detect that and
> > self-terminate knowing that the hypervisor is possibly being malicious.
> 
> I've yet to see code that can do that reliably.

Tom; could you please investigate if this can be enforced in ucode?

Ideally #HV would have an internal latch such that a recursive #HV will
terminate the guest (much like double #MC and tripple-fault).

But unlike the #MC trainwreck, can we please not leave a glaring hole in
this latch and use a spare bit in the IRET frame please?

So have #HV delivery:
 - check internal latch; if set, terminate machine
 - set latch
 - write IRET frame with magic bit set

have IRET:
 - check magic bit and reset #HV latch


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [RFC PATCH V6 02/14] x86/sev: Add Check of #HV event in path
  2023-05-17 13:09       ` Peter Zijlstra
@ 2023-05-31 14:50         ` Michael Kelley (LINUX)
  2023-05-31 15:48           ` Peter Zijlstra
  0 siblings, 1 reply; 40+ messages in thread
From: Michael Kelley (LINUX) @ 2023-05-31 14:50 UTC (permalink / raw)
  To: Peter Zijlstra, Tianyu Lan
  Cc: luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc, pbonzini,
	jgross, Tianyu Lan, kirill, jiangshan.ljs, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, thomas.lendacky,
	venu.busireddy, sterritt, tony.luck, samitolvanen, fenghua.yu,
	pangupta, linux-kernel, kvm, linux-hyperv, linux-arch

From: Peter Zijlstra <peterz@infradead.org> Sent: Wednesday, May 17, 2023 6:10 AM
> 
> On Wed, May 17, 2023 at 05:55:45PM +0800, Tianyu Lan wrote:
> > On 5/16/2023 5:32 PM, Peter Zijlstra wrote:
> > > > --- a/arch/x86/entry/entry_64.S
> > > > +++ b/arch/x86/entry/entry_64.S
> > > > @@ -1019,6 +1019,15 @@ SYM_CODE_END(paranoid_entry)
> > > >    * R15 - old SPEC_CTRL
> > > >    */
> > > >   SYM_CODE_START_LOCAL(paranoid_exit)
> > > > +#ifdef CONFIG_AMD_MEM_ENCRYPT
> > > > +	/*
> > > > +	 * If a #HV was delivered during execution and interrupts were
> > > > +	 * disabled, then check if it can be handled before the iret
> > > > +	 * (which may re-enable interrupts).
> > > > +	 */
> > > > +	mov     %rsp, %rdi
> > > > +	call    check_hv_pending
> > > > +#endif
> > > >   	UNWIND_HINT_REGS
> > > >   	/*
> > > > @@ -1143,6 +1152,15 @@ SYM_CODE_START(error_entry)
> > > >   SYM_CODE_END(error_entry)
> > > >   SYM_CODE_START_LOCAL(error_return)
> > > > +#ifdef CONFIG_AMD_MEM_ENCRYPT
> > > > +	/*
> > > > +	 * If a #HV was delivered during execution and interrupts were
> > > > +	 * disabled, then check if it can be handled before the iret
> > > > +	 * (which may re-enable interrupts).
> > > > +	 */
> > > > +	mov     %rsp, %rdi
> > > > +	call    check_hv_pending
> > > > +#endif
> > > >   	UNWIND_HINT_REGS
> > > >   	DEBUG_ENTRY_ASSERT_IRQS_OFF
> > > >   	testb	$3, CS(%rsp)
> > > Oh hell no... do now you're adding unconditional calls to every single
> > > interrupt and nmi exit path, with the grand total of 0 justification.
> > >
> >
> > Sorry to Add check inside of check_hv_pending(). Will move the check before
> > calling check_hv_pending() in the next version. Thanks.
> 
> You will also explain, in the Changelog, in excruciating detail, *WHY*
> any of this is required.
> 
> Any additional code in these paths that are only required for some
> random hypervisor had better proof that they are absolutely required and
> no alternative solution exists and have no performance impact on normal
> users.
> 
> If this is due to Hyper-V design idiocies over something fundamentally
> required by the hardware design you'll get a NAK.

I'm jumping in to answer some of the basic questions here.  Yesterday,
there was a discussion about nested #HV exceptions, so maybe some of
this is already understood, but let me recap at a higher level, provide some
references, and suggest the path forward.

This code and some of the other patches in this series are for handling the
#HV exception that is introduced by the Restricted Interrupt Injection
feature of the SEV-SNP architecture.  See Section 15.36.16 of [1], and
Section 5 of [2].   There's also an AMD presentation from LPC last fall [3].

Hyper-V requires that the guest implement Restricted Interrupt Injection
to handle the case of a compromised hypervisor injecting an exception
(and forcing the running of that exception handler), even when it should
be disallowed by guest state. For example, the hypervisor could inject an
interrupt while the guest has interrupts disabled.  In time, presumably other
hypervisors like KVM will at least have an option where they expect SEV-SNP
guests to implement Restricted Interrupt Injection functionality, so it's
not Hyper-V specific.

Naming the new exception as #HV and use of "hv" as the Linux prefix
for related functions and variable names is a bit unfortunate.  It
conflicts with the existing use of the "hv" prefix to denote Hyper-V
specific code in the Linux kernel, and at first glance makes this code
look like it is Hyper-V specific code. Maybe we can choose a different
prefix ("hvex"?) for this #HV exception related code to avoid that
"first glance" confusion.

I've talked with Tianyu offline, and he will do the following:

1) Split this patch set into two patch sets.  The first patch set is Hyper-V
specific code for managing communication pages that must be shared
between the guest and Hyper-V, for starting APs, etc.  The second patch
set will be only the Restricted Interrupt Injection and #HV code.

2) For the Restricted Interrupt Injection code, Tianyu will look at
how to absolutely minimize the impact in the hot code paths,
particularly when SEV-SNP is not active.  Hopefully the impact can
be a couple of instructions at most, or even less with the use of
other existing kernel techniques.  He'll look at the other things you've
commented on and get the code into a better state.  I'll work with
him on writing commit messages and comments that explain what's
going on.

Michael

[1] https://www.amd.com/system/files/TechDocs/24593.pdf 
[2] https://www.amd.com/system/files/TechDocs/56421-guest-hypervisor-communication-block-standardization.pdf
[3] https://lpc.events/event/16/contributions/1321/attachments/965/1886/SNP_Interrupt_Security.pptx 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 02/14] x86/sev: Add Check of #HV event in path
  2023-05-31 14:50         ` Michael Kelley (LINUX)
@ 2023-05-31 15:48           ` Peter Zijlstra
  2023-05-31 15:58             ` Michael Kelley (LINUX)
  0 siblings, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2023-05-31 15:48 UTC (permalink / raw)
  To: Michael Kelley (LINUX)
  Cc: Tianyu Lan, luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	pbonzini, jgross, Tianyu Lan, kirill, jiangshan.ljs,
	ashish.kalra, srutherford, akpm, anshuman.khandual,
	pawan.kumar.gupta, adrian.hunter, daniel.sneddon,
	alexander.shishkin, sandipan.das, ray.huang, brijesh.singh,
	michael.roth, thomas.lendacky, venu.busireddy, sterritt,
	tony.luck, samitolvanen, fenghua.yu, pangupta, linux-kernel, kvm,
	linux-hyperv, linux-arch

On Wed, May 31, 2023 at 02:50:50PM +0000, Michael Kelley (LINUX) wrote:

> I'm jumping in to answer some of the basic questions here.  Yesterday,
> there was a discussion about nested #HV exceptions, so maybe some of
> this is already understood, but let me recap at a higher level, provide some
> references, and suggest the path forward.

> 2) For the Restricted Interrupt Injection code, Tianyu will look at
> how to absolutely minimize the impact in the hot code paths,
> particularly when SEV-SNP is not active.  Hopefully the impact can
> be a couple of instructions at most, or even less with the use of
> other existing kernel techniques.  He'll look at the other things you've
> commented on and get the code into a better state.  I'll work with
> him on writing commit messages and comments that explain what's
> going on.

So from what I understand of all this SEV-SNP/#HV muck is that it is
near impossible to get right without ucode/hw changes. Hence my request
to Tom to look into that.

The feature as specified in the AMD documentation seems fundamentally
buggered.

Specifically #HV needs to be IST because hypervisor can inject at any
moment, irrespective of IF or anything else -- even #HV itself. This
means also in the syscall gap.

Since it is IST, a nested #HV is instant stack corruption -- #HV can
attempt to play stack games as per the copied #VC crap (which I'm not at
all convinced about being correct itself), but this doesn't actually fix
anything, all you need is a single instruction window to wreck things.

Because as stated, the whole premise is that the hypervisor is out to
get you, you must not leave it room to wiggle. As is, this is security
through prayer, and we don't do that.

In short; I really want a solid proof that what you propose to implement
is correct and not wishful thinking.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [RFC PATCH V6 02/14] x86/sev: Add Check of #HV event in path
  2023-05-31 15:48           ` Peter Zijlstra
@ 2023-05-31 15:58             ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 40+ messages in thread
From: Michael Kelley (LINUX) @ 2023-05-31 15:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tianyu Lan, luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	pbonzini, jgross, Tianyu Lan, kirill, jiangshan.ljs,
	ashish.kalra, srutherford, akpm, anshuman.khandual,
	pawan.kumar.gupta, adrian.hunter, daniel.sneddon,
	alexander.shishkin, sandipan.das, ray.huang, brijesh.singh,
	michael.roth, thomas.lendacky, venu.busireddy, sterritt,
	tony.luck, samitolvanen, fenghua.yu, pangupta, linux-kernel, kvm,
	linux-hyperv, linux-arch

From: Peter Zijlstra <peterz@infradead.org> Sent: Wednesday, May 31, 2023 8:49 AM
> 
> On Wed, May 31, 2023 at 02:50:50PM +0000, Michael Kelley (LINUX) wrote:
> 
> > I'm jumping in to answer some of the basic questions here.  Yesterday,
> > there was a discussion about nested #HV exceptions, so maybe some of
> > this is already understood, but let me recap at a higher level, provide some
> > references, and suggest the path forward.
> 
> > 2) For the Restricted Interrupt Injection code, Tianyu will look at
> > how to absolutely minimize the impact in the hot code paths,
> > particularly when SEV-SNP is not active.  Hopefully the impact can
> > be a couple of instructions at most, or even less with the use of
> > other existing kernel techniques.  He'll look at the other things you've
> > commented on and get the code into a better state.  I'll work with
> > him on writing commit messages and comments that explain what's
> > going on.
> 
> So from what I understand of all this SEV-SNP/#HV muck is that it is
> near impossible to get right without ucode/hw changes. Hence my request
> to Tom to look into that.
> 
> The feature as specified in the AMD documentation seems fundamentally
> buggered.
> 
> Specifically #HV needs to be IST because hypervisor can inject at any
> moment, irrespective of IF or anything else -- even #HV itself. This
> means also in the syscall gap.
> 
> Since it is IST, a nested #HV is instant stack corruption -- #HV can
> attempt to play stack games as per the copied #VC crap (which I'm not at
> all convinced about being correct itself), but this doesn't actually fix
> anything, all you need is a single instruction window to wreck things.
> 
> Because as stated, the whole premise is that the hypervisor is out to
> get you, you must not leave it room to wiggle. As is, this is security
> through prayer, and we don't do that.
> 
> In short; I really want a solid proof that what you propose to implement
> is correct and not wishful thinking.

Fair enough.  We will be sync'ing with the AMD folks to make sure that
one way or another this really will work.

Michael


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler
  2023-05-30 18:52           ` Peter Zijlstra
  2023-05-30 19:03             ` Dave Hansen
  2023-05-31  9:14             ` Peter Zijlstra
@ 2023-06-06  6:00             ` Gupta, Pankaj
  2023-06-06  7:50               ` Peter Zijlstra
  2 siblings, 1 reply; 40+ messages in thread
From: Gupta, Pankaj @ 2023-06-06  6:00 UTC (permalink / raw)
  To: Peter Zijlstra, Tom Lendacky
  Cc: Tianyu Lan, luto, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	pbonzini, jgross, tiala, kirill, jiangshan.ljs, ashish.kalra,
	srutherford, akpm, anshuman.khandual, pawan.kumar.gupta,
	adrian.hunter, daniel.sneddon, alexander.shishkin, sandipan.das,
	ray.huang, brijesh.singh, michael.roth, venu.busireddy, sterritt,
	tony.luck, samitolvanen, fenghua.yu, pangupta, linux-kernel, kvm,
	linux-hyperv, linux-arch


>> That should really say that a nested #HV should never be raised by the
>> hypervisor, but if it is, then the guest should detect that and
>> self-terminate knowing that the hypervisor is possibly being malicious.
> 
> I've yet to see code that can do that reliably.

- Currently, we are detecting the direct nested #HV with below check and
   guest self terminate.

   <snip>
	if (get_stack_info_noinstr(stack, current, &info) &&
	    (info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV) ||
	     info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV2)))
		panic("Nested #HV exception, HV IST corrupted, stack
                 type = %d\n", info.type);
   </snip>

- Thinking about below solution to detect the nested
   #HV reliably:

   -- Make reliable IST stack switching for #VC -> #HV -> #VC case
      (similar to done in __sev_es_ist_enter/__sev_es_ist_exit for NMI
      IST stack).

   -- In addition to this, we can make nested #HV detection (with another
      exception type) more reliable with refcounting (percpu?).

Need your inputs before I implement this solution. Or any other idea in 
software you have in mind?

Thanks,
Pankaj


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler
  2023-06-06  6:00             ` Gupta, Pankaj
@ 2023-06-06  7:50               ` Peter Zijlstra
  0 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2023-06-06  7:50 UTC (permalink / raw)
  To: Gupta, Pankaj
  Cc: Tom Lendacky, Tianyu Lan, luto, tglx, mingo, bp, dave.hansen,
	x86, hpa, seanjc, pbonzini, jgross, tiala, kirill, jiangshan.ljs,
	ashish.kalra, srutherford, akpm, anshuman.khandual,
	pawan.kumar.gupta, adrian.hunter, daniel.sneddon,
	alexander.shishkin, sandipan.das, ray.huang, brijesh.singh,
	michael.roth, venu.busireddy, sterritt, tony.luck, samitolvanen,
	fenghua.yu, pangupta, linux-kernel, kvm, linux-hyperv,
	linux-arch

On Tue, Jun 06, 2023 at 08:00:32AM +0200, Gupta, Pankaj wrote:
> 
> > > That should really say that a nested #HV should never be raised by the
> > > hypervisor, but if it is, then the guest should detect that and
> > > self-terminate knowing that the hypervisor is possibly being malicious.
> > 
> > I've yet to see code that can do that reliably.
> 
> - Currently, we are detecting the direct nested #HV with below check and
>   guest self terminate.
> 
>   <snip>
> 	if (get_stack_info_noinstr(stack, current, &info) &&
> 	    (info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV) ||
> 	     info.type == (STACK_TYPE_EXCEPTION + ESTACK_HV2)))
> 		panic("Nested #HV exception, HV IST corrupted, stack
>                 type = %d\n", info.type);
>   </snip>
> 
> - Thinking about below solution to detect the nested
>   #HV reliably:
> 
>   -- Make reliable IST stack switching for #VC -> #HV -> #VC case
>      (similar to done in __sev_es_ist_enter/__sev_es_ist_exit for NMI
>      IST stack).

I'm not convinced any of that is actually correct; there is a *huge*
window between NMI hitting and calling __sev_es_ist_enter(), idem on the
exit side.

>   -- In addition to this, we can make nested #HV detection (with another
>      exception type) more reliable with refcounting (percpu?).

There is also #DB and the MOVSS shadow.

And no, I don't think any of that is what you'd call 'robust'. This is
what I call a trainwreck :/

And I'm more than willing to say no until the hardware is more sane.

Supervisor Shadow Stack support is in the same boat, that's on hold
until FRED makes things workable.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler
  2023-05-31  9:14             ` Peter Zijlstra
@ 2023-06-07 18:19               ` Tom Lendacky
  0 siblings, 0 replies; 40+ messages in thread
From: Tom Lendacky @ 2023-06-07 18:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gupta, Pankaj, Tianyu Lan, luto, tglx, mingo, bp, dave.hansen,
	x86, hpa, seanjc, pbonzini, jgross, tiala, kirill, jiangshan.ljs,
	ashish.kalra, srutherford, akpm, anshuman.khandual,
	pawan.kumar.gupta, adrian.hunter, daniel.sneddon,
	alexander.shishkin, sandipan.das, ray.huang, brijesh.singh,
	michael.roth, venu.busireddy, sterritt, tony.luck, samitolvanen,
	fenghua.yu, pangupta, linux-kernel, kvm, linux-hyperv,
	linux-arch

On 5/31/23 04:14, Peter Zijlstra wrote:
> On Tue, May 30, 2023 at 08:52:32PM +0200, Peter Zijlstra wrote:
> 
>>> That should really say that a nested #HV should never be raised by the
>>> hypervisor, but if it is, then the guest should detect that and
>>> self-terminate knowing that the hypervisor is possibly being malicious.
>>
>> I've yet to see code that can do that reliably.
> 
> Tom; could you please investigate if this can be enforced in ucode?
> 
> Ideally #HV would have an internal latch such that a recursive #HV will
> terminate the guest (much like double #MC and tripple-fault).
> 
> But unlike the #MC trainwreck, can we please not leave a glaring hole in
> this latch and use a spare bit in the IRET frame please?
> 
> So have #HV delivery:
>   - check internal latch; if set, terminate machine
>   - set latch
>   - write IRET frame with magic bit set
> 
> have IRET:
>   - check magic bit and reset #HV latch

Hi Peter,

I talked with the hardware team about this and, unfortunately, it is not 
practical to implement. The main concerns are that there are already two 
generations of hardware out there with the current support and, given 
limited patch space, in addition to the ucode support to track and perform 
the latch support, additional ucode support would be required to 
save/restore the latch information when handling a VMEXIT during #HV 
processing.

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2023-06-07 18:20 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-15 16:59 [RFC PATCH V6 00/14] x86/hyperv/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
2023-05-15 16:59 ` [RFC PATCH V6 01/14] x86/sev: Add a #HV exception handler Tianyu Lan
2023-05-16  9:30   ` Peter Zijlstra
2023-05-17  9:01     ` Tianyu Lan
2023-05-30 12:16     ` Gupta, Pankaj
2023-05-30 14:35       ` Peter Zijlstra
2023-05-30 15:59         ` Tom Lendacky
2023-05-30 18:52           ` Peter Zijlstra
2023-05-30 19:03             ` Dave Hansen
2023-05-31  9:14             ` Peter Zijlstra
2023-06-07 18:19               ` Tom Lendacky
2023-06-06  6:00             ` Gupta, Pankaj
2023-06-06  7:50               ` Peter Zijlstra
2023-05-30 15:18       ` Dave Hansen
2023-05-15 16:59 ` [RFC PATCH V6 02/14] x86/sev: Add Check of #HV event in path Tianyu Lan
2023-05-16  9:32   ` Peter Zijlstra
2023-05-17  9:55     ` Tianyu Lan
2023-05-17 13:09       ` Peter Zijlstra
2023-05-31 14:50         ` Michael Kelley (LINUX)
2023-05-31 15:48           ` Peter Zijlstra
2023-05-31 15:58             ` Michael Kelley (LINUX)
2023-05-15 16:59 ` [RFC PATCH V6 03/14] x86/sev: Add AMD sev-snp enlightened guest support on hyperv Tianyu Lan
2023-05-16  9:40   ` Peter Zijlstra
2023-05-16 15:38     ` Dionna Amalie Glaze
2023-05-15 16:59 ` [RFC PATCH V6 04/14] x86/sev: optimize system vector processing invoked from #HV exception Tianyu Lan
2023-05-16 10:23   ` Peter Zijlstra
2023-05-17 13:28     ` Tianyu Lan
2023-05-15 16:59 ` [RFC PATCH V6 05/14] x86/hyperv: Add sev-snp enlightened guest static key Tianyu Lan
2023-05-15 16:59 ` [RFC PATCH V6 06/14] x86/hyperv: Mark Hyper-V vp assist page unencrypted in SEV-SNP enlightened guest Tianyu Lan
2023-05-15 16:59 ` [RFC PATCH V6 07/14] x86/hyperv: Set Virtual Trust Level in VMBus init message Tianyu Lan
2023-05-15 16:59 ` [RFC PATCH V6 08/14] x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest Tianyu Lan
2023-05-16 10:29   ` Peter Zijlstra
2023-05-15 16:59 ` [RFC PATCH V6 09/14] clocksource/drivers/hyper-v: decrypt hyperv tsc page " Tianyu Lan
2023-05-15 16:59 ` [RFC PATCH V6 10/14] hv: vmbus: Mask VMBus pages unencrypted for " Tianyu Lan
2023-05-15 16:59 ` [RFC PATCH V6 11/14] drivers: hv: Decrypt percpu hvcall input arg page in " Tianyu Lan
2023-05-15 16:59 ` [RFC PATCH V6 12/14] x86/hyperv: Initialize cpu and memory for " Tianyu Lan
2023-05-15 16:59 ` [RFC PATCH V6 13/14] x86/hyperv: Add smp support for sev-snp guest Tianyu Lan
2023-05-16  5:16   ` [EXTERNAL] " Saurabh Singh Sengar
2023-05-17  8:19     ` Tianyu Lan
2023-05-15 16:59 ` [RFC PATCH V6 14/14] x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES Tianyu Lan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).