linux-coco.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/6] x86/sev-es: Fixes for SEV-ES Guest Support
@ 2021-06-14 13:53 Joerg Roedel
  2021-06-14 13:53 ` [PATCH v5 1/6] x86/sev-es: Fix error message in runtime #VC handler Joerg Roedel
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Joerg Roedel @ 2021-06-14 13:53 UTC (permalink / raw)
  To: x86
  Cc: Joerg Roedel, Joerg Roedel, hpa, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, David Rientjes, Cfir Cohen,
	Erdem Aktas, Masami Hiramatsu, Mike Stunes, Sean Christopherson,
	Martin Radev, Arvind Sankar, linux-coco, linux-kernel, kvm,
	virtualization

From: Joerg Roedel <jroedel@suse.de>

Hi,

here is the next revision of my pending fixes for Linux' SEV-ES
support. Changes to the previous version are:

	- Updated patch 2 and implemented wrappers around
	  sev_es_get/put_ghcb() which will disable/enable IRQs when
	  needed.
	- Updated patch 3 to now call the from_user and from_kernel
	  functions of the VC handler directly from entry-asm.

The previous version can be found here:

	https://lore.kernel.org/lkml/20210610091141.30322-1-joro@8bytes.org/

Changes are based on tip/x86/urgent. Please review.

Thanks,

	Joerg


Joerg Roedel (6):
  x86/sev-es: Fix error message in runtime #VC handler
  x86/sev-es: Make sure IRQs are disabled while GHCB is active
  x86/sev-es: Split up runtime #VC handler for correct state tracking
  x86/insn-eval: Make 0 a valid RIP for insn_get_effective_ip()
  x86/insn: Extend error reporting from
    insn_fetch_from_user[_inatomic]()
  x86/sev-es: Propagate #GP if getting linear instruction address failed

 arch/x86/entry/entry_64.S       |   4 +-
 arch/x86/include/asm/idtentry.h |  29 ++---
 arch/x86/kernel/sev.c           | 207 ++++++++++++++++++++------------
 arch/x86/kernel/umip.c          |  10 +-
 arch/x86/lib/insn-eval.c        |  22 ++--
 5 files changed, 156 insertions(+), 116 deletions(-)


base-commit: efa165504943f2128d50f63de0c02faf6dcceb0d
-- 
2.31.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 1/6] x86/sev-es: Fix error message in runtime #VC handler
  2021-06-14 13:53 [PATCH v5 0/6] x86/sev-es: Fixes for SEV-ES Guest Support Joerg Roedel
@ 2021-06-14 13:53 ` Joerg Roedel
  2021-06-14 13:53 ` [PATCH v5 2/6] x86/sev-es: Make sure IRQs are disabled while GHCB is active Joerg Roedel
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Joerg Roedel @ 2021-06-14 13:53 UTC (permalink / raw)
  To: x86
  Cc: Joerg Roedel, Joerg Roedel, hpa, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, David Rientjes, Cfir Cohen,
	Erdem Aktas, Masami Hiramatsu, Mike Stunes, Sean Christopherson,
	Martin Radev, Arvind Sankar, linux-coco, linux-kernel, kvm,
	virtualization

From: Joerg Roedel <jroedel@suse.de>

The runtime #VC handler is not "early" anymore. Fix the copy&paste error
and remove that word from the error message.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 651b81cd648e..4fd997bbf059 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1369,7 +1369,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 		vc_finish_insn(&ctxt);
 		break;
 	case ES_UNSUPPORTED:
-		pr_err_ratelimited("Unsupported exit-code 0x%02lx in early #VC exception (IP: 0x%lx)\n",
+		pr_err_ratelimited("Unsupported exit-code 0x%02lx in #VC exception (IP: 0x%lx)\n",
 				   error_code, regs->ip);
 		goto fail;
 	case ES_VMM_ERROR:
-- 
2.31.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 2/6] x86/sev-es: Make sure IRQs are disabled while GHCB is active
  2021-06-14 13:53 [PATCH v5 0/6] x86/sev-es: Fixes for SEV-ES Guest Support Joerg Roedel
  2021-06-14 13:53 ` [PATCH v5 1/6] x86/sev-es: Fix error message in runtime #VC handler Joerg Roedel
@ 2021-06-14 13:53 ` Joerg Roedel
  2021-06-14 16:25   ` Borislav Petkov
  2021-06-14 13:53 ` [PATCH v5 3/6] x86/sev-es: Split up runtime #VC handler for correct state tracking Joerg Roedel
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Joerg Roedel @ 2021-06-14 13:53 UTC (permalink / raw)
  To: x86
  Cc: Joerg Roedel, Joerg Roedel, hpa, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, David Rientjes, Cfir Cohen,
	Erdem Aktas, Masami Hiramatsu, Mike Stunes, Sean Christopherson,
	Martin Radev, Arvind Sankar, linux-coco, linux-kernel, kvm,
	virtualization

From: Joerg Roedel <jroedel@suse.de>

The #VC handler only cares about IRQs being disabled while the GHCB is
active, as it must not be interrupted by something which could cause
another #VC while it holds the GHCB (NMI is the exception for which the
backup GHCB exits).

Make sure nothing interrupts the code path while the GHCB is active by
disabling IRQs in sev_es_get_ghcb() and restoring the previous irq state
in sev_es_put_ghcb().

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev.c | 48 ++++++++++++++++++++++++++++++-------------
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 4fd997bbf059..f1bd95d451c3 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -192,7 +192,7 @@ void noinstr __sev_es_ist_exit(void)
 	this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
 }
 
-static __always_inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state)
+static __always_inline struct ghcb *__sev_es_get_ghcb(struct ghcb_state *state)
 {
 	struct sev_es_runtime_data *data;
 	struct ghcb *ghcb;
@@ -231,6 +231,18 @@ static __always_inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state)
 	return ghcb;
 }
 
+static __always_inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state,
+						    unsigned long *flags)
+{
+	/*
+	 * Nothing shall interrupt this code path while holding the per-cpu
+	 * GHCB. The backup GHCB is only for NMIs interrupting this path.
+	 */
+	local_irq_save(*flags);
+
+	return __sev_es_get_ghcb(state);
+}
+
 /* Needed in vc_early_forward_exception */
 void do_early_exception(struct pt_regs *regs, int trapnr);
 
@@ -479,7 +491,7 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt
 /* Include code shared with pre-decompression boot stage */
 #include "sev-shared.c"
 
-static __always_inline void sev_es_put_ghcb(struct ghcb_state *state)
+static __always_inline void __sev_es_put_ghcb(struct ghcb_state *state)
 {
 	struct sev_es_runtime_data *data;
 	struct ghcb *ghcb;
@@ -502,12 +514,21 @@ static __always_inline void sev_es_put_ghcb(struct ghcb_state *state)
 	}
 }
 
+static __always_inline void sev_es_put_ghcb(struct ghcb_state *state,
+					    unsigned long flags)
+{
+	__sev_es_put_ghcb(state);
+	local_irq_restore(flags);
+}
+
 void noinstr __sev_es_nmi_complete(void)
 {
 	struct ghcb_state state;
 	struct ghcb *ghcb;
 
-	ghcb = sev_es_get_ghcb(&state);
+	BUG_ON(!irqs_disabled());
+
+	ghcb = __sev_es_get_ghcb(&state);
 
 	vc_ghcb_invalidate(ghcb);
 	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_NMI_COMPLETE);
@@ -517,7 +538,7 @@ void noinstr __sev_es_nmi_complete(void)
 	sev_es_wr_ghcb_msr(__pa_nodebug(ghcb));
 	VMGEXIT();
 
-	sev_es_put_ghcb(&state);
+	__sev_es_put_ghcb(&state);
 }
 
 static u64 get_jump_table_addr(void)
@@ -527,9 +548,7 @@ static u64 get_jump_table_addr(void)
 	struct ghcb *ghcb;
 	u64 ret = 0;
 
-	local_irq_save(flags);
-
-	ghcb = sev_es_get_ghcb(&state);
+	ghcb = sev_es_get_ghcb(&state, &flags);
 
 	vc_ghcb_invalidate(ghcb);
 	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_JUMP_TABLE);
@@ -543,9 +562,7 @@ static u64 get_jump_table_addr(void)
 	    ghcb_sw_exit_info_2_is_valid(ghcb))
 		ret = ghcb->save.sw_exit_info_2;
 
-	sev_es_put_ghcb(&state);
-
-	local_irq_restore(flags);
+	sev_es_put_ghcb(&state, flags);
 
 	return ret;
 }
@@ -666,9 +683,10 @@ static bool __init sev_es_setup_ghcb(void)
 static void sev_es_ap_hlt_loop(void)
 {
 	struct ghcb_state state;
+	unsigned long flags;
 	struct ghcb *ghcb;
 
-	ghcb = sev_es_get_ghcb(&state);
+	ghcb = sev_es_get_ghcb(&state, &flags);
 
 	while (true) {
 		vc_ghcb_invalidate(ghcb);
@@ -685,7 +703,7 @@ static void sev_es_ap_hlt_loop(void)
 			break;
 	}
 
-	sev_es_put_ghcb(&state);
+	sev_es_put_ghcb(&state, flags);
 }
 
 /*
@@ -1335,6 +1353,8 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 	enum es_result result;
 	struct ghcb *ghcb;
 
+	BUG_ON(!irqs_disabled());
+
 	/*
 	 * Handle #DB before calling into !noinstr code to avoid recursive #DB.
 	 */
@@ -1353,7 +1373,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 	 * keep the IRQs disabled to protect us against concurrent TLB flushes.
 	 */
 
-	ghcb = sev_es_get_ghcb(&state);
+	ghcb = __sev_es_get_ghcb(&state);
 
 	vc_ghcb_invalidate(ghcb);
 	result = vc_init_em_ctxt(&ctxt, regs, error_code);
@@ -1361,7 +1381,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 	if (result == ES_OK)
 		result = vc_handle_exitcode(&ctxt, ghcb, error_code);
 
-	sev_es_put_ghcb(&state);
+	__sev_es_put_ghcb(&state);
 
 	/* Done - now check the result */
 	switch (result) {
-- 
2.31.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 3/6] x86/sev-es: Split up runtime #VC handler for correct state tracking
  2021-06-14 13:53 [PATCH v5 0/6] x86/sev-es: Fixes for SEV-ES Guest Support Joerg Roedel
  2021-06-14 13:53 ` [PATCH v5 1/6] x86/sev-es: Fix error message in runtime #VC handler Joerg Roedel
  2021-06-14 13:53 ` [PATCH v5 2/6] x86/sev-es: Make sure IRQs are disabled while GHCB is active Joerg Roedel
@ 2021-06-14 13:53 ` Joerg Roedel
  2021-06-16 16:04   ` Peter Zijlstra
  2021-06-14 13:53 ` [PATCH v5 4/6] x86/insn-eval: Make 0 a valid RIP for insn_get_effective_ip() Joerg Roedel
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Joerg Roedel @ 2021-06-14 13:53 UTC (permalink / raw)
  To: x86
  Cc: Joerg Roedel, Joerg Roedel, Peter Zijlstra, hpa, Andy Lutomirski,
	Dave Hansen, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, David Rientjes, Cfir Cohen,
	Erdem Aktas, Masami Hiramatsu, Mike Stunes, Sean Christopherson,
	Martin Radev, Arvind Sankar, linux-coco, linux-kernel, kvm,
	virtualization

From: Joerg Roedel <jroedel@suse.de>

Split up the #VC handler code into a from-user and a from-kernel part.
This allows clean and correct state tracking, as the #VC handler needs
to enter NMI-state when raised from kernel mode and plain IRQ state when
raised from user-mode.

Fixes: 62441a1fb532 ("x86/sev-es: Correctly track IRQ states in runtime #VC handler")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_64.S       |   4 +-
 arch/x86/include/asm/idtentry.h |  29 +++----
 arch/x86/kernel/sev.c           | 144 ++++++++++++++++++--------------
 3 files changed, 94 insertions(+), 83 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a16a5294d55f..1886aaf19914 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -506,7 +506,7 @@ SYM_CODE_START(\asmsym)
 
 	movq	%rsp, %rdi		/* pt_regs pointer */
 
-	call	\cfunc
+	call	kernel_\cfunc
 
 	/*
 	 * No need to switch back to the IST stack. The current stack is either
@@ -517,7 +517,7 @@ SYM_CODE_START(\asmsym)
 
 	/* Switch to the regular task stack */
 .Lfrom_usermode_switch_stack_\@:
-	idtentry_body safe_stack_\cfunc, has_error_code=1
+	idtentry_body user_\cfunc, has_error_code=1
 
 _ASM_NOKPROBE(\asmsym)
 SYM_CODE_END(\asmsym)
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 73d45b0dfff2..cd9f3e304944 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -312,8 +312,8 @@ static __always_inline void __##func(struct pt_regs *regs)
  */
 #define DECLARE_IDTENTRY_VC(vector, func)				\
 	DECLARE_IDTENTRY_RAW_ERRORCODE(vector, func);			\
-	__visible noinstr void ist_##func(struct pt_regs *regs, unsigned long error_code);	\
-	__visible noinstr void safe_stack_##func(struct pt_regs *regs, unsigned long error_code)
+	__visible noinstr void kernel_##func(struct pt_regs *regs, unsigned long error_code);	\
+	__visible noinstr void   user_##func(struct pt_regs *regs, unsigned long error_code)
 
 /**
  * DEFINE_IDTENTRY_IST - Emit code for IST entry points
@@ -355,33 +355,24 @@ static __always_inline void __##func(struct pt_regs *regs)
 	DEFINE_IDTENTRY_RAW_ERRORCODE(func)
 
 /**
- * DEFINE_IDTENTRY_VC_SAFE_STACK - Emit code for VMM communication handler
-				   which runs on a safe stack.
+ * DEFINE_IDTENTRY_VC_KERNEL - Emit code for VMM communication handler
+			       when raised from kernel mode
  * @func:	Function name of the entry point
  *
  * Maps to DEFINE_IDTENTRY_RAW_ERRORCODE
  */
-#define DEFINE_IDTENTRY_VC_SAFE_STACK(func)				\
-	DEFINE_IDTENTRY_RAW_ERRORCODE(safe_stack_##func)
+#define DEFINE_IDTENTRY_VC_KERNEL(func)				\
+	DEFINE_IDTENTRY_RAW_ERRORCODE(kernel_##func)
 
 /**
- * DEFINE_IDTENTRY_VC_IST - Emit code for VMM communication handler
-			    which runs on the VC fall-back stack
+ * DEFINE_IDTENTRY_VC_USER - Emit code for VMM communication handler
+			     when raised from user mode
  * @func:	Function name of the entry point
  *
  * Maps to DEFINE_IDTENTRY_RAW_ERRORCODE
  */
-#define DEFINE_IDTENTRY_VC_IST(func)				\
-	DEFINE_IDTENTRY_RAW_ERRORCODE(ist_##func)
-
-/**
- * DEFINE_IDTENTRY_VC - Emit code for VMM communication handler
- * @func:	Function name of the entry point
- *
- * Maps to DEFINE_IDTENTRY_RAW_ERRORCODE
- */
-#define DEFINE_IDTENTRY_VC(func)					\
-	DEFINE_IDTENTRY_RAW_ERRORCODE(func)
+#define DEFINE_IDTENTRY_VC_USER(func)				\
+	DEFINE_IDTENTRY_RAW_ERRORCODE(user_##func)
 
 #else	/* CONFIG_X86_64 */
 
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index f1bd95d451c3..6a580a8d5b32 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -793,7 +793,7 @@ void __init sev_es_init_vc_handling(void)
 	sev_es_setup_play_dead();
 
 	/* Secondary CPUs use the runtime #VC handler */
-	initial_vc_handler = (unsigned long)safe_stack_exc_vmm_communication;
+	initial_vc_handler = (unsigned long)kernel_exc_vmm_communication;
 }
 
 static void __init vc_early_forward_exception(struct es_em_ctxt *ctxt)
@@ -1334,45 +1334,16 @@ static __always_inline bool on_vc_fallback_stack(struct pt_regs *regs)
 	return (sp >= __this_cpu_ist_bottom_va(VC2) && sp < __this_cpu_ist_top_va(VC2));
 }
 
-/*
- * Main #VC exception handler. It is called when the entry code was able to
- * switch off the IST to a safe kernel stack.
- *
- * With the current implementation it is always possible to switch to a safe
- * stack because #VC exceptions only happen at known places, like intercepted
- * instructions or accesses to MMIO areas/IO ports. They can also happen with
- * code instrumentation when the hypervisor intercepts #DB, but the critical
- * paths are forbidden to be instrumented, so #DB exceptions currently also
- * only happen in safe places.
- */
-DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
+static bool vc_raw_handle_exception(struct pt_regs *regs, unsigned long error_code)
 {
-	irqentry_state_t irq_state;
 	struct ghcb_state state;
 	struct es_em_ctxt ctxt;
 	enum es_result result;
 	struct ghcb *ghcb;
+	bool ret = true;
 
 	BUG_ON(!irqs_disabled());
 
-	/*
-	 * Handle #DB before calling into !noinstr code to avoid recursive #DB.
-	 */
-	if (error_code == SVM_EXIT_EXCP_BASE + X86_TRAP_DB) {
-		vc_handle_trap_db(regs);
-		return;
-	}
-
-	irq_state = irqentry_nmi_enter(regs);
-	lockdep_assert_irqs_disabled();
-	instrumentation_begin();
-
-	/*
-	 * This is invoked through an interrupt gate, so IRQs are disabled. The
-	 * code below might walk page-tables for user or kernel addresses, so
-	 * keep the IRQs disabled to protect us against concurrent TLB flushes.
-	 */
-
 	ghcb = __sev_es_get_ghcb(&state);
 
 	vc_ghcb_invalidate(ghcb);
@@ -1391,15 +1362,18 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 	case ES_UNSUPPORTED:
 		pr_err_ratelimited("Unsupported exit-code 0x%02lx in #VC exception (IP: 0x%lx)\n",
 				   error_code, regs->ip);
-		goto fail;
+		ret = false;
+		break;
 	case ES_VMM_ERROR:
 		pr_err_ratelimited("Failure in communication with VMM (exit-code 0x%02lx IP: 0x%lx)\n",
 				   error_code, regs->ip);
-		goto fail;
+		ret = false;
+		break;
 	case ES_DECODE_FAILED:
 		pr_err_ratelimited("Failed to decode instruction (exit-code 0x%02lx IP: 0x%lx)\n",
 				   error_code, regs->ip);
-		goto fail;
+		ret = false;
+		break;
 	case ES_EXCEPTION:
 		vc_forward_exception(&ctxt);
 		break;
@@ -1415,24 +1389,55 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 		BUG();
 	}
 
-out:
-	instrumentation_end();
-	irqentry_nmi_exit(regs, irq_state);
+	return ret;
+}
 
-	return;
+static bool noinstr vc_check_and_handle_db(struct pt_regs *regs, unsigned long error_code)
+{
+	if (likely(error_code != SVM_EXIT_EXCP_BASE + X86_TRAP_DB))
+		return false;
 
-fail:
-	if (user_mode(regs)) {
-		/*
-		 * Do not kill the machine if user-space triggered the
-		 * exception. Send SIGBUS instead and let user-space deal with
-		 * it.
-		 */
-		force_sig_fault(SIGBUS, BUS_OBJERR, (void __user *)0);
-	} else {
-		pr_emerg("PANIC: Unhandled #VC exception in kernel space (result=%d)\n",
-			 result);
+	vc_handle_trap_db(regs);
+
+	return true;
+}
+
+/*
+ * Runtime #VC exception handler when raised from kernel mode. Runs in NMI mode
+ * and will panic when an error happens.
+ */
+DEFINE_IDTENTRY_VC_KERNEL(exc_vmm_communication)
+{
+	irqentry_state_t irq_state;
 
+	/*
+	 * With the current implementation it is always possible to switch to a
+	 * safe stack because #VC exceptions only happen at known places, like
+	 * intercepted instructions or accesses to MMIO areas/IO ports. They can
+	 * also happen with code instrumentation when the hypervisor intercepts
+	 * #DB, but the critical paths are forbidden to be instrumented, so #DB
+	 * exceptions currently also only happen in safe places.
+	 *
+	 * But keep this here in case the noinstr annotations are violated due
+	 * to bug elsewhere.
+	 */
+	if (unlikely(on_vc_fallback_stack(regs))) {
+		instrumentation_begin();
+		panic("Can't handle #VC exception from unsupported context\n");
+		instrumentation_end();
+	}
+
+	/*
+	 * Handle #DB before calling into !noinstr code to avoid recursive #DB.
+	 */
+	if (vc_check_and_handle_db(regs, error_code))
+		return;
+
+	irq_state = irqentry_nmi_enter(regs);
+
+	instrumentation_begin();
+
+	if (!vc_raw_handle_exception(regs, error_code)) {
 		/* Show some debug info */
 		show_regs(regs);
 
@@ -1443,23 +1448,38 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 		panic("Returned from Terminate-Request to Hypervisor\n");
 	}
 
-	goto out;
+	instrumentation_end();
+	irqentry_nmi_exit(regs, irq_state);
 }
 
-/* This handler runs on the #VC fall-back stack. It can cause further #VC exceptions */
-DEFINE_IDTENTRY_VC_IST(exc_vmm_communication)
+/*
+ * Runtime #VC exception handler when raised from user mode. Runs in IRQ mode
+ * and will kill the current task with SIGBUS when an error happens.
+ */
+DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
 {
+	irqentry_state_t irq_state;
+
+	/*
+	 * Handle #DB before calling into !noinstr code to avoid recursive #DB.
+	 */
+	if (vc_check_and_handle_db(regs, error_code))
+		return;
+
+	irq_state = irqentry_enter(regs);
 	instrumentation_begin();
-	panic("Can't handle #VC exception from unsupported context\n");
-	instrumentation_end();
-}
 
-DEFINE_IDTENTRY_VC(exc_vmm_communication)
-{
-	if (likely(!on_vc_fallback_stack(regs)))
-		safe_stack_exc_vmm_communication(regs, error_code);
-	else
-		ist_exc_vmm_communication(regs, error_code);
+	if (!vc_raw_handle_exception(regs, error_code)) {
+		/*
+		 * Do not kill the machine if user-space triggered the
+		 * exception. Send SIGBUS instead and let user-space deal with
+		 * it.
+		 */
+		force_sig_fault(SIGBUS, BUS_OBJERR, (void __user *)0);
+	}
+
+	instrumentation_end();
+	irqentry_exit(regs, irq_state);
 }
 
 bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
-- 
2.31.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 4/6] x86/insn-eval: Make 0 a valid RIP for insn_get_effective_ip()
  2021-06-14 13:53 [PATCH v5 0/6] x86/sev-es: Fixes for SEV-ES Guest Support Joerg Roedel
                   ` (2 preceding siblings ...)
  2021-06-14 13:53 ` [PATCH v5 3/6] x86/sev-es: Split up runtime #VC handler for correct state tracking Joerg Roedel
@ 2021-06-14 13:53 ` Joerg Roedel
  2021-06-14 13:53 ` [PATCH v5 5/6] x86/insn: Extend error reporting from insn_fetch_from_user[_inatomic]() Joerg Roedel
  2021-06-14 13:53 ` [PATCH v5 6/6] x86/sev-es: Propagate #GP if getting linear instruction address failed Joerg Roedel
  5 siblings, 0 replies; 12+ messages in thread
From: Joerg Roedel @ 2021-06-14 13:53 UTC (permalink / raw)
  To: x86
  Cc: Joerg Roedel, Joerg Roedel, hpa, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, David Rientjes, Cfir Cohen,
	Erdem Aktas, Masami Hiramatsu, Mike Stunes, Sean Christopherson,
	Martin Radev, Arvind Sankar, linux-coco, linux-kernel, kvm,
	virtualization

From: Joerg Roedel <jroedel@suse.de>

In theory 0 is a valid value for the instruction pointer, so don't use
it as the error return value from insn_get_effective_ip().

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/lib/insn-eval.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index a67afd74232c..4eecb9c7c6a0 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1417,7 +1417,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
 	}
 }
 
-static unsigned long insn_get_effective_ip(struct pt_regs *regs)
+static int insn_get_effective_ip(struct pt_regs *regs, unsigned long *ip)
 {
 	unsigned long seg_base = 0;
 
@@ -1430,10 +1430,12 @@ static unsigned long insn_get_effective_ip(struct pt_regs *regs)
 	if (!user_64bit_mode(regs)) {
 		seg_base = insn_get_seg_base(regs, INAT_SEG_REG_CS);
 		if (seg_base == -1L)
-			return 0;
+			return -EINVAL;
 	}
 
-	return seg_base + regs->ip;
+	*ip = seg_base + regs->ip;
+
+	return 0;
 }
 
 /**
@@ -1455,8 +1457,7 @@ int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE])
 	unsigned long ip;
 	int not_copied;
 
-	ip = insn_get_effective_ip(regs);
-	if (!ip)
+	if (insn_get_effective_ip(regs, &ip))
 		return 0;
 
 	not_copied = copy_from_user(buf, (void __user *)ip, MAX_INSN_SIZE);
@@ -1484,8 +1485,7 @@ int insn_fetch_from_user_inatomic(struct pt_regs *regs, unsigned char buf[MAX_IN
 	unsigned long ip;
 	int not_copied;
 
-	ip = insn_get_effective_ip(regs);
-	if (!ip)
+	if (insn_get_effective_ip(regs, &ip))
 		return 0;
 
 	not_copied = __copy_from_user_inatomic(buf, (void __user *)ip, MAX_INSN_SIZE);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 5/6] x86/insn: Extend error reporting from insn_fetch_from_user[_inatomic]()
  2021-06-14 13:53 [PATCH v5 0/6] x86/sev-es: Fixes for SEV-ES Guest Support Joerg Roedel
                   ` (3 preceding siblings ...)
  2021-06-14 13:53 ` [PATCH v5 4/6] x86/insn-eval: Make 0 a valid RIP for insn_get_effective_ip() Joerg Roedel
@ 2021-06-14 13:53 ` Joerg Roedel
  2021-06-14 13:53 ` [PATCH v5 6/6] x86/sev-es: Propagate #GP if getting linear instruction address failed Joerg Roedel
  5 siblings, 0 replies; 12+ messages in thread
From: Joerg Roedel @ 2021-06-14 13:53 UTC (permalink / raw)
  To: x86
  Cc: Joerg Roedel, Joerg Roedel, hpa, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, David Rientjes, Cfir Cohen,
	Erdem Aktas, Masami Hiramatsu, Mike Stunes, Sean Christopherson,
	Martin Radev, Arvind Sankar, linux-coco, linux-kernel, kvm,
	virtualization

From: Joerg Roedel <jroedel@suse.de>

The error reporting from the insn_fetch_from_user*() functions is not
very verbose. Extend it to include information on whether the linear
RIP could not be calculated or whether the memory access faulted.

This will be used in the SEV-ES code to propagate the correct
exception depending on what went wrong during instruction fetch.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev.c    |  8 ++++----
 arch/x86/kernel/umip.c   | 10 ++++------
 arch/x86/lib/insn-eval.c |  8 ++++++--
 3 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 6a580a8d5b32..259f64be2611 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -270,17 +270,17 @@ static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt,
 static enum es_result __vc_decode_user_insn(struct es_em_ctxt *ctxt)
 {
 	char buffer[MAX_INSN_SIZE];
-	int res;
+	int insn_bytes;
 
-	res = insn_fetch_from_user_inatomic(ctxt->regs, buffer);
-	if (!res) {
+	insn_bytes = insn_fetch_from_user_inatomic(ctxt->regs, buffer);
+	if (insn_bytes <= 0) {
 		ctxt->fi.vector     = X86_TRAP_PF;
 		ctxt->fi.error_code = X86_PF_INSTR | X86_PF_USER;
 		ctxt->fi.cr2        = ctxt->regs->ip;
 		return ES_EXCEPTION;
 	}
 
-	if (!insn_decode_from_regs(&ctxt->insn, ctxt->regs, buffer, res))
+	if (!insn_decode_from_regs(&ctxt->insn, ctxt->regs, buffer, insn_bytes))
 		return ES_DECODE_FAILED;
 
 	if (ctxt->insn.immediate.got)
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index 8daa70b0d2da..337178809c89 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -346,14 +346,12 @@ bool fixup_umip_exception(struct pt_regs *regs)
 	if (!regs)
 		return false;
 
-	nr_copied = insn_fetch_from_user(regs, buf);
-
 	/*
-	 * The insn_fetch_from_user above could have failed if user code
-	 * is protected by a memory protection key. Give up on emulation
-	 * in such a case.  Should we issue a page fault?
+	 * Give up on emulation if fetching the instruction failed. Should we
+	 * issue a page fault or a #GP?
 	 */
-	if (!nr_copied)
+	nr_copied = insn_fetch_from_user(regs, buf);
+	if (nr_copied <= 0)
 		return false;
 
 	if (!insn_decode_from_regs(&insn, regs, buf, nr_copied))
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 4eecb9c7c6a0..1b5cdf8b7a4e 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1451,6 +1451,8 @@ static int insn_get_effective_ip(struct pt_regs *regs, unsigned long *ip)
  * Number of instruction bytes copied.
  *
  * 0 if nothing was copied.
+ *
+ * -EINVAL if the linear address of the instruction could not be calculated
  */
 int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE])
 {
@@ -1458,7 +1460,7 @@ int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE])
 	int not_copied;
 
 	if (insn_get_effective_ip(regs, &ip))
-		return 0;
+		return -EINVAL;
 
 	not_copied = copy_from_user(buf, (void __user *)ip, MAX_INSN_SIZE);
 
@@ -1479,6 +1481,8 @@ int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE])
  * Number of instruction bytes copied.
  *
  * 0 if nothing was copied.
+ *
+ * -EINVAL if the linear address of the instruction could not be calculated
  */
 int insn_fetch_from_user_inatomic(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE])
 {
@@ -1486,7 +1490,7 @@ int insn_fetch_from_user_inatomic(struct pt_regs *regs, unsigned char buf[MAX_IN
 	int not_copied;
 
 	if (insn_get_effective_ip(regs, &ip))
-		return 0;
+		return -EINVAL;
 
 	not_copied = __copy_from_user_inatomic(buf, (void __user *)ip, MAX_INSN_SIZE);
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 6/6] x86/sev-es: Propagate #GP if getting linear instruction address failed
  2021-06-14 13:53 [PATCH v5 0/6] x86/sev-es: Fixes for SEV-ES Guest Support Joerg Roedel
                   ` (4 preceding siblings ...)
  2021-06-14 13:53 ` [PATCH v5 5/6] x86/insn: Extend error reporting from insn_fetch_from_user[_inatomic]() Joerg Roedel
@ 2021-06-14 13:53 ` Joerg Roedel
  5 siblings, 0 replies; 12+ messages in thread
From: Joerg Roedel @ 2021-06-14 13:53 UTC (permalink / raw)
  To: x86
  Cc: Joerg Roedel, Joerg Roedel, hpa, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, David Rientjes, Cfir Cohen,
	Erdem Aktas, Masami Hiramatsu, Mike Stunes, Sean Christopherson,
	Martin Radev, Arvind Sankar, linux-coco, linux-kernel, kvm,
	virtualization

From: Joerg Roedel <jroedel@suse.de>

When an instruction is fetched from user-space, segmentation needs to
be taken into account. This means that getting the linear address of
an instruction can fail. Hardware would raise a #GP
exception in that case, but the #VC exception handler would emulate it
as a page-fault.

The insn_fetch_from_user*() functions now provide the relevant
information in case of an failure. Use that and propagate a #GP when
the linear address of an instruction to fetch could not be calculated.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 259f64be2611..2a7ddefc61a5 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -273,11 +273,18 @@ static enum es_result __vc_decode_user_insn(struct es_em_ctxt *ctxt)
 	int insn_bytes;
 
 	insn_bytes = insn_fetch_from_user_inatomic(ctxt->regs, buffer);
-	if (insn_bytes <= 0) {
+	if (insn_bytes == 0) {
+		/* Nothing could be copied */
 		ctxt->fi.vector     = X86_TRAP_PF;
 		ctxt->fi.error_code = X86_PF_INSTR | X86_PF_USER;
 		ctxt->fi.cr2        = ctxt->regs->ip;
 		return ES_EXCEPTION;
+	} else if (insn_bytes == -EINVAL) {
+		/* Effective RIP could not be calculated */
+		ctxt->fi.vector     = X86_TRAP_GP;
+		ctxt->fi.error_code = 0;
+		ctxt->fi.cr2        = 0;
+		return ES_EXCEPTION;
 	}
 
 	if (!insn_decode_from_regs(&ctxt->insn, ctxt->regs, buffer, insn_bytes))
-- 
2.31.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/6] x86/sev-es: Make sure IRQs are disabled while GHCB is active
  2021-06-14 13:53 ` [PATCH v5 2/6] x86/sev-es: Make sure IRQs are disabled while GHCB is active Joerg Roedel
@ 2021-06-14 16:25   ` Borislav Petkov
  2021-06-14 16:30     ` Borislav Petkov
  2021-06-15  9:37     ` Joerg Roedel
  0 siblings, 2 replies; 12+ messages in thread
From: Borislav Petkov @ 2021-06-14 16:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, Joerg Roedel, hpa, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, David Rientjes, Cfir Cohen,
	Erdem Aktas, Masami Hiramatsu, Mike Stunes, Sean Christopherson,
	Martin Radev, Arvind Sankar, linux-coco, linux-kernel, kvm,
	virtualization

On Mon, Jun 14, 2021 at 03:53:23PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> The #VC handler only cares about IRQs being disabled while the GHCB is
> active, as it must not be interrupted by something which could cause
> another #VC while it holds the GHCB (NMI is the exception for which the
> backup GHCB exits).
> 
> Make sure nothing interrupts the code path while the GHCB is active by
> disabling IRQs in sev_es_get_ghcb() and restoring the previous irq state
> in sev_es_put_ghcb().
> 
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/kernel/sev.c | 48 ++++++++++++++++++++++++++++++-------------
>  1 file changed, 34 insertions(+), 14 deletions(-)

Here's a diff ontop of yours with a couple of points:

* I've named the low-level, interrupts-enabled workers
__sev_get_ghcb()/__sev_put_ghcb() to mean a couple of things:

** underscored to mean, that callers need to disable local locks. There's
also a lockdep_assert_irqs_disabled() to make sure, both in the get and
put function.

** also only "sev" in the name because this code is not used for SEV-ES
only anymore.

* I've done it this way because you have a well-recognized code pattern
where the caller disables interrupts, calls the low-level helpers and
then enables interrupts again when done. VS passing a flags pointer back
and forth which just looks weird.

And as to being easy to use - users can botch flags too, when passing
around so they can just as well do proper interrupts toggling like a
gazillion other places in the kernel.

Also, you have places like exc_vmm_communication() where you have
to artifically pass in flags - I'm looking at your previous version
- even if you already make sure interrupts are disabled with the
BUG_ON assertion on entry. So in those cases you can simply call the
interrupt-enabled, __-variants.

Btw, while we're on exc_vmm_communication, it has a:

	BUG_ON(!irqs_disabled());

on entry and then later

	lockdep_assert_irqs_disabled();

and that second assertion is not really needed, methinks. So a hunk
below removes it.

Thoughts?

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 7d70cddc38be..b85c4a2be9fa 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -192,11 +192,19 @@ void noinstr __sev_es_ist_exit(void)
 	this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist);
 }
 
-static __always_inline struct ghcb *__sev_es_get_ghcb(struct ghcb_state *state)
+/*
+ * Nothing shall interrupt this code path while holding the per-CPU
+ * GHCB. The backup GHCB is only for NMIs interrupting this path.
+ *
+ * Callers must disable local interrupts around it.
+ */
+static __always_inline struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
 {
 	struct sev_es_runtime_data *data;
 	struct ghcb *ghcb;
 
+	lockdep_assert_irqs_disabled();
+
 	data = this_cpu_read(runtime_data);
 	ghcb = &data->ghcb_page;
 
@@ -231,18 +239,6 @@ static __always_inline struct ghcb *__sev_es_get_ghcb(struct ghcb_state *state)
 	return ghcb;
 }
 
-static __always_inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state,
-						    unsigned long *flags)
-{
-	/*
-	 * Nothing shall interrupt this code path while holding the per-cpu
-	 * GHCB. The backup GHCB is only for NMIs interrupting this path.
-	 */
-	local_irq_save(*flags);
-
-	return __sev_es_get_ghcb(state);
-}
-
 /* Needed in vc_early_forward_exception */
 void do_early_exception(struct pt_regs *regs, int trapnr);
 
@@ -491,11 +487,13 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt
 /* Include code shared with pre-decompression boot stage */
 #include "sev-shared.c"
 
-static __always_inline void __sev_es_put_ghcb(struct ghcb_state *state)
+static __always_inline void __sev_put_ghcb(struct ghcb_state *state)
 {
 	struct sev_es_runtime_data *data;
 	struct ghcb *ghcb;
 
+	lockdep_assert_irqs_disabled();
+
 	data = this_cpu_read(runtime_data);
 	ghcb = &data->ghcb_page;
 
@@ -514,13 +512,6 @@ static __always_inline void __sev_es_put_ghcb(struct ghcb_state *state)
 	}
 }
 
-static __always_inline void sev_es_put_ghcb(struct ghcb_state *state,
-					    unsigned long flags)
-{
-	__sev_es_put_ghcb(state);
-	local_irq_restore(flags);
-}
-
 void noinstr __sev_es_nmi_complete(void)
 {
 	struct ghcb_state state;
@@ -528,7 +519,7 @@ void noinstr __sev_es_nmi_complete(void)
 
 	BUG_ON(!irqs_disabled());
 
-	ghcb = __sev_es_get_ghcb(&state);
+	ghcb = __sev_get_ghcb(&state);
 
 	vc_ghcb_invalidate(ghcb);
 	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_NMI_COMPLETE);
@@ -538,7 +529,7 @@ void noinstr __sev_es_nmi_complete(void)
 	sev_es_wr_ghcb_msr(__pa_nodebug(ghcb));
 	VMGEXIT();
 
-	__sev_es_put_ghcb(&state);
+	__sev_put_ghcb(&state);
 }
 
 static u64 get_jump_table_addr(void)
@@ -548,7 +539,9 @@ static u64 get_jump_table_addr(void)
 	struct ghcb *ghcb;
 	u64 ret = 0;
 
-	ghcb = sev_es_get_ghcb(&state, &flags);
+	local_irq_save(flags);
+
+	ghcb = __sev_get_ghcb(&state);
 
 	vc_ghcb_invalidate(ghcb);
 	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_JUMP_TABLE);
@@ -562,7 +555,9 @@ static u64 get_jump_table_addr(void)
 	    ghcb_sw_exit_info_2_is_valid(ghcb))
 		ret = ghcb->save.sw_exit_info_2;
 
-	sev_es_put_ghcb(&state, flags);
+	__sev_put_ghcb(&state);
+
+	local_irq_restore(flags);
 
 	return ret;
 }
@@ -686,7 +681,9 @@ static void sev_es_ap_hlt_loop(void)
 	unsigned long flags;
 	struct ghcb *ghcb;
 
-	ghcb = sev_es_get_ghcb(&state, &flags);
+	local_irq_save(flags);
+
+	ghcb = __sev_get_ghcb(&state);
 
 	while (true) {
 		vc_ghcb_invalidate(ghcb);
@@ -703,7 +700,9 @@ static void sev_es_ap_hlt_loop(void)
 			break;
 	}
 
-	sev_es_put_ghcb(&state, flags);
+	__sev_put_ghcb(&state);
+
+	local_irq_restore(flags);
 }
 
 /*
@@ -1364,7 +1363,6 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 	}
 
 	irq_state = irqentry_nmi_enter(regs);
-	lockdep_assert_irqs_disabled();
 	instrumentation_begin();
 
 	/*
@@ -1373,7 +1371,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 	 * keep the IRQs disabled to protect us against concurrent TLB flushes.
 	 */
 
-	ghcb = __sev_es_get_ghcb(&state);
+	ghcb = __sev_get_ghcb(&state);
 
 	vc_ghcb_invalidate(ghcb);
 	result = vc_init_em_ctxt(&ctxt, regs, error_code);
@@ -1381,7 +1379,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 	if (result == ES_OK)
 		result = vc_handle_exitcode(&ctxt, ghcb, error_code);
 
-	__sev_es_put_ghcb(&state);
+	__sev_put_ghcb(&state);
 
 	/* Done - now check the result */
 	switch (result) {

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/6] x86/sev-es: Make sure IRQs are disabled while GHCB is active
  2021-06-14 16:25   ` Borislav Petkov
@ 2021-06-14 16:30     ` Borislav Petkov
  2021-06-15  9:37     ` Joerg Roedel
  1 sibling, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2021-06-14 16:30 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, Joerg Roedel, hpa, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, David Rientjes, Cfir Cohen,
	Erdem Aktas, Masami Hiramatsu, Mike Stunes, Sean Christopherson,
	Martin Radev, Arvind Sankar, linux-coco, linux-kernel, kvm,
	virtualization

On Mon, Jun 14, 2021 at 06:25:18PM +0200, Borislav Petkov wrote:
> ** underscored to mean, that callers need to disable local locks. There's
							     ^^^^^^

"interrupts" ofc.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/6] x86/sev-es: Make sure IRQs are disabled while GHCB is active
  2021-06-14 16:25   ` Borislav Petkov
  2021-06-14 16:30     ` Borislav Petkov
@ 2021-06-15  9:37     ` Joerg Roedel
  1 sibling, 0 replies; 12+ messages in thread
From: Joerg Roedel @ 2021-06-15  9:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, Joerg Roedel, hpa, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, David Rientjes, Cfir Cohen,
	Erdem Aktas, Masami Hiramatsu, Mike Stunes, Sean Christopherson,
	Martin Radev, Arvind Sankar, linux-coco, linux-kernel, kvm,
	virtualization

On Mon, Jun 14, 2021 at 06:25:18PM +0200, Borislav Petkov wrote:
> Thoughts?

Okay, I tested a bit with this, it mostly works fine.

The lockdep_hardirqs_disabled() check in __sev_get/put_ghcb() was
problematic, because these functions are called also when no state is
set up yet. More specifically it is called from __sev_es_nmi_complete(),
which is called at the very beginning of do_nmi(), before the NMI
handler entered NMI mode. So this triggered a warning in the NMI
test-suite when booting up, I replaced these checks with a
WARN_ON(!irqs_disabled()) and also removed the BUG_ON()s.

With that it boots fine and without SEV-ES related warnings.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 3/6] x86/sev-es: Split up runtime #VC handler for correct state tracking
  2021-06-14 13:53 ` [PATCH v5 3/6] x86/sev-es: Split up runtime #VC handler for correct state tracking Joerg Roedel
@ 2021-06-16 16:04   ` Peter Zijlstra
  2021-06-16 19:01     ` Joerg Roedel
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2021-06-16 16:04 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, Joerg Roedel, hpa, Andy Lutomirski, Dave Hansen, Jiri Slaby,
	Dan Williams, Tom Lendacky, Juergen Gross, Kees Cook,
	David Rientjes, Cfir Cohen, Erdem Aktas, Masami Hiramatsu,
	Mike Stunes, Sean Christopherson, Martin Radev, Arvind Sankar,
	linux-coco, linux-kernel, kvm, virtualization

On Mon, Jun 14, 2021 at 03:53:24PM +0200, Joerg Roedel wrote:

> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -506,7 +506,7 @@ SYM_CODE_START(\asmsym)
>  
>  	movq	%rsp, %rdi		/* pt_regs pointer */
>  
> -	call	\cfunc
> +	call	kernel_\cfunc
>  
>  	/*
>  	 * No need to switch back to the IST stack. The current stack is either
> @@ -517,7 +517,7 @@ SYM_CODE_START(\asmsym)
>  
>  	/* Switch to the regular task stack */
>  .Lfrom_usermode_switch_stack_\@:
> -	idtentry_body safe_stack_\cfunc, has_error_code=1
> +	idtentry_body user_\cfunc, has_error_code=1
>  
>  _ASM_NOKPROBE(\asmsym)
>  SYM_CODE_END(\asmsym)

Consistency with idtentry_mce_db would seem to suggest using \cfunc and
noist_\cfunc.

amluto, tglx: do we have strong feelings on consistency?


> +static bool noinstr vc_check_and_handle_db(struct pt_regs *regs, unsigned long error_code)
> +{
> +	if (likely(error_code != SVM_EXIT_EXCP_BASE + X86_TRAP_DB))
> +		return false;
>  
> +	vc_handle_trap_db(regs);

It's a bit sad this does user_mode(regs) again.

> +
> +	return true;
> +}

Maybe something like:

static __always_inline bool vc_is_db(unsigned long error_code)
{
	return error_code == SVM_EXIT_EXCP_BASE + X86_TRAP_DB;
}

> +
> +/*
> + * Runtime #VC exception handler when raised from kernel mode. Runs in NMI mode
> + * and will panic when an error happens.
> + */
> +DEFINE_IDTENTRY_VC_KERNEL(exc_vmm_communication)
> +{
> +	irqentry_state_t irq_state;
>  
> +	/*
> +	 * With the current implementation it is always possible to switch to a
> +	 * safe stack because #VC exceptions only happen at known places, like
> +	 * intercepted instructions or accesses to MMIO areas/IO ports. They can
> +	 * also happen with code instrumentation when the hypervisor intercepts
> +	 * #DB, but the critical paths are forbidden to be instrumented, so #DB
> +	 * exceptions currently also only happen in safe places.
> +	 *
> +	 * But keep this here in case the noinstr annotations are violated due
> +	 * to bug elsewhere.
> +	 */
> +	if (unlikely(on_vc_fallback_stack(regs))) {
> +		instrumentation_begin();
> +		panic("Can't handle #VC exception from unsupported context\n");
> +		instrumentation_end();
> +	}
> +
> +	/*
> +	 * Handle #DB before calling into !noinstr code to avoid recursive #DB.
> +	 */
> +	if (vc_check_and_handle_db(regs, error_code))
> +		return;

	if (vc_is_db(error_core)) {
		exc_debug(regs);
		return;
	}

> +
> +	irq_state = irqentry_nmi_enter(regs);
> +
> +	instrumentation_begin();
> +
> +	if (!vc_raw_handle_exception(regs, error_code)) {
>  		/* Show some debug info */
>  		show_regs(regs);
>  
> @@ -1443,23 +1448,38 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
>  		panic("Returned from Terminate-Request to Hypervisor\n");
>  	}
>  
> +	instrumentation_end();
> +	irqentry_nmi_exit(regs, irq_state);
>  }
>  
> +/*
> + * Runtime #VC exception handler when raised from user mode. Runs in IRQ mode
> + * and will kill the current task with SIGBUS when an error happens.
> + */
> +DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
>  {
> +	irqentry_state_t irq_state;
> +
> +	/*
> +	 * Handle #DB before calling into !noinstr code to avoid recursive #DB.
> +	 */
> +	if (vc_check_and_handle_db(regs, error_code))
> +		return;

	if (vs_is_db(error_code)) {
		noist_exc_debug(regs);
		return;
	}

> +
> +	irq_state = irqentry_enter(regs);
>  	instrumentation_begin();
>  
> +	if (!vc_raw_handle_exception(regs, error_code)) {
> +		/*
> +		 * Do not kill the machine if user-space triggered the
> +		 * exception. Send SIGBUS instead and let user-space deal with
> +		 * it.
> +		 */
> +		force_sig_fault(SIGBUS, BUS_OBJERR, (void __user *)0);
> +	}
> +
> +	instrumentation_end();
> +	irqentry_exit(regs, irq_state);
>  }

Other than that, this seems *much* nicer. Thanks!



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 3/6] x86/sev-es: Split up runtime #VC handler for correct state tracking
  2021-06-16 16:04   ` Peter Zijlstra
@ 2021-06-16 19:01     ` Joerg Roedel
  0 siblings, 0 replies; 12+ messages in thread
From: Joerg Roedel @ 2021-06-16 19:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, Joerg Roedel, hpa, Andy Lutomirski, Dave Hansen, Jiri Slaby,
	Dan Williams, Tom Lendacky, Juergen Gross, Kees Cook,
	David Rientjes, Cfir Cohen, Erdem Aktas, Masami Hiramatsu,
	Mike Stunes, Sean Christopherson, Martin Radev, Arvind Sankar,
	linux-coco, linux-kernel, kvm, virtualization

Hi Peter,

sorry, missed this email before sending out v6.

On Wed, Jun 16, 2021 at 06:04:26PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 14, 2021 at 03:53:24PM +0200, Joerg Roedel wrote:
> >  _ASM_NOKPROBE(\asmsym)
> >  SYM_CODE_END(\asmsym)
> 
> Consistency with idtentry_mce_db would seem to suggest using \cfunc and
> noist_\cfunc.
> 
> amluto, tglx: do we have strong feelings on consistency?

Yeah, but this distinction does not make sense here, as none of the #VC
handlers C functions run on the actual #VC IST stack. The from-kernel
function might run on the fall-back stack (not really possible today
unless the hypervisor does something nasty). And the difference between
the fall-back stack and the actual IST stack is, that on the fall-back
stack nesting #VC exceptions is still supported.

> > +	vc_handle_trap_db(regs);
> 
> It's a bit sad this does user_mode(regs) again.

Okay, I will change this according your comments.

Thanks,

	Joerg


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-06-16 19:01 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-14 13:53 [PATCH v5 0/6] x86/sev-es: Fixes for SEV-ES Guest Support Joerg Roedel
2021-06-14 13:53 ` [PATCH v5 1/6] x86/sev-es: Fix error message in runtime #VC handler Joerg Roedel
2021-06-14 13:53 ` [PATCH v5 2/6] x86/sev-es: Make sure IRQs are disabled while GHCB is active Joerg Roedel
2021-06-14 16:25   ` Borislav Petkov
2021-06-14 16:30     ` Borislav Petkov
2021-06-15  9:37     ` Joerg Roedel
2021-06-14 13:53 ` [PATCH v5 3/6] x86/sev-es: Split up runtime #VC handler for correct state tracking Joerg Roedel
2021-06-16 16:04   ` Peter Zijlstra
2021-06-16 19:01     ` Joerg Roedel
2021-06-14 13:53 ` [PATCH v5 4/6] x86/insn-eval: Make 0 a valid RIP for insn_get_effective_ip() Joerg Roedel
2021-06-14 13:53 ` [PATCH v5 5/6] x86/insn: Extend error reporting from insn_fetch_from_user[_inatomic]() Joerg Roedel
2021-06-14 13:53 ` [PATCH v5 6/6] x86/sev-es: Propagate #GP if getting linear instruction address failed Joerg Roedel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).