* [PATCHv2 0/3] Fix for load_unaligned_zeropad() in TDX guest
@ 2022-05-20  3:13 Kirill A. Shutemov
  2022-05-20  3:13 ` [PATCHv2 1/3] x86/tdx: Fix early #VE handling Kirill A. Shutemov
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Kirill A. Shutemov @ 2022-05-20  3:13 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, luto, peterz
  Cc: ak, dan.j.williams, david, hpa, linux-kernel,
	sathyanarayanan.kuppuswamy, seanjc, thomas.lendacky, x86,
	Kirill A. Shutemov

An updated version of the load_unaligned_zeropad() fix.

v2:
 - Clarify RIP adjustments during #VE handling;
 - Fix early #VE handling;
 - Fix comment and commit message;

Kirill A. Shutemov (3):
  x86/tdx: Fix early #VE handling
  x86/tdx: Clarify RIP adjustments in #VE handler
  x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page

 arch/x86/coco/tdx/tdx.c | 166 +++++++++++++++++++++++++++-------------
 1 file changed, 113 insertions(+), 53 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv2 1/3] x86/tdx: Fix early #VE handling
  2022-05-20  3:13 [PATCHv2 0/3] Fix for load_unaligned_zeropad() in TDX guest Kirill A. Shutemov
@ 2022-05-20  3:13 ` Kirill A. Shutemov
  2022-05-20 17:01   ` Sean Christopherson
  2022-05-20 17:46   ` Sathyanarayanan Kuppuswamy
  2022-05-20  3:13 ` [PATCHv2 2/3] x86/tdx: Clarify RIP adjustments in #VE handler Kirill A. Shutemov
  2022-05-20  3:13 ` [PATCHv2 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page Kirill A. Shutemov
  2 siblings, 2 replies; 12+ messages in thread
From: Kirill A. Shutemov @ 2022-05-20  3:13 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, luto, peterz
  Cc: ak, dan.j.williams, david, hpa, linux-kernel,
	sathyanarayanan.kuppuswamy, seanjc, thomas.lendacky, x86,
	Kirill A. Shutemov

Move RIP in tdx_early_handle_ve() after handling the exception. Failing
to do that leads to an infinite loop of exceptions.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: 32e72854fa5f ("x86/tdx: Port I/O: Add early boot support")
---
 arch/x86/coco/tdx/tdx.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 03deb4d6920d..faae53f8d559 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -447,13 +447,17 @@ static bool handle_io(struct pt_regs *regs, u32 exit_qual)
 __init bool tdx_early_handle_ve(struct pt_regs *regs)
 {
 	struct ve_info ve;
+	bool ret;
 
 	tdx_get_ve_info(&ve);
 
 	if (ve.exit_reason != EXIT_REASON_IO_INSTRUCTION)
 		return false;
 
-	return handle_io(regs, ve.exit_qual);
+	ret = handle_io(regs, ve.exit_qual);
+	if (ret)
+		regs->ip += ve.instr_len;
+	return ret;
 }
 
 void tdx_get_ve_info(struct ve_info *ve)
-- 
2.35.1



* [PATCHv2 2/3] x86/tdx: Clarify RIP adjustments in #VE handler
  2022-05-20  3:13 [PATCHv2 0/3] Fix for load_unaligned_zeropad() in TDX guest Kirill A. Shutemov
  2022-05-20  3:13 ` [PATCHv2 1/3] x86/tdx: Fix early #VE handling Kirill A. Shutemov
@ 2022-05-20  3:13 ` Kirill A. Shutemov
  2022-05-20 17:52   ` Dave Hansen
  2022-05-20  3:13 ` [PATCHv2 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page Kirill A. Shutemov
  2 siblings, 1 reply; 12+ messages in thread
From: Kirill A. Shutemov @ 2022-05-20  3:13 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, luto, peterz
  Cc: ak, dan.j.williams, david, hpa, linux-kernel,
	sathyanarayanan.kuppuswamy, seanjc, thomas.lendacky, x86,
	Kirill A. Shutemov

After successful #VE handling, tdx_handle_virt_exception() has to move
RIP to the next instruction. The handler needs to know the length of
the instruction.

If the #VE happened due to instruction execution, the GET_VEINFO TDX
module call provides info on the instruction in R10, including its
length.

For a #VE due to an EPT violation, the info in R10 is not usable and
the kernel has to decode the instruction manually to find out its
length.

Restructure the code to make it explicit that the instruction length
depends on the type of #VE. The handler of an exit reason now returns
the instruction length on success or -errno on failure.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/coco/tdx/tdx.c | 150 +++++++++++++++++++++++++---------------
 1 file changed, 93 insertions(+), 57 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index faae53f8d559..010dc229096a 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -147,7 +147,7 @@ static u64 __cpuidle __halt(const bool irq_disabled, const bool do_sti)
 	return __tdx_hypercall(&args, do_sti ? TDX_HCALL_ISSUE_STI : 0);
 }
 
-static bool handle_halt(void)
+static int handle_halt(struct ve_info *ve)
 {
 	/*
 	 * Since non safe halt is mainly used in CPU offlining
@@ -158,9 +158,9 @@ static bool handle_halt(void)
 	const bool do_sti = false;
 
 	if (__halt(irq_disabled, do_sti))
-		return false;
+		return -EIO;
 
-	return true;
+	return ve->instr_len;
 }
 
 void __cpuidle tdx_safe_halt(void)
@@ -180,7 +180,7 @@ void __cpuidle tdx_safe_halt(void)
 		WARN_ONCE(1, "HLT instruction emulation failed\n");
 }
 
-static bool read_msr(struct pt_regs *regs)
+static int read_msr(struct pt_regs *regs, struct ve_info *ve)
 {
 	struct tdx_hypercall_args args = {
 		.r10 = TDX_HYPERCALL_STANDARD,
@@ -194,14 +194,14 @@ static bool read_msr(struct pt_regs *regs)
 	 * (GHCI), section titled "TDG.VP.VMCALL<Instruction.RDMSR>".
 	 */
 	if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT))
-		return false;
+		return -EIO;
 
 	regs->ax = lower_32_bits(args.r11);
 	regs->dx = upper_32_bits(args.r11);
-	return true;
+	return ve->instr_len;
 }
 
-static bool write_msr(struct pt_regs *regs)
+static int write_msr(struct pt_regs *regs, struct ve_info *ve)
 {
 	struct tdx_hypercall_args args = {
 		.r10 = TDX_HYPERCALL_STANDARD,
@@ -215,10 +215,13 @@ static bool write_msr(struct pt_regs *regs)
 	 * can be found in TDX Guest-Host-Communication Interface
 	 * (GHCI) section titled "TDG.VP.VMCALL<Instruction.WRMSR>".
 	 */
-	return !__tdx_hypercall(&args, 0);
+	if (__tdx_hypercall(&args, 0))
+		return -EIO;
+
+	return ve->instr_len;
 }
 
-static bool handle_cpuid(struct pt_regs *regs)
+static int handle_cpuid(struct pt_regs *regs, struct ve_info *ve)
 {
 	struct tdx_hypercall_args args = {
 		.r10 = TDX_HYPERCALL_STANDARD,
@@ -236,7 +239,7 @@ static bool handle_cpuid(struct pt_regs *regs)
 	 */
 	if (regs->ax < 0x40000000 || regs->ax > 0x4FFFFFFF) {
 		regs->ax = regs->bx = regs->cx = regs->dx = 0;
-		return true;
+		return ve->instr_len;
 	}
 
 	/*
@@ -245,7 +248,7 @@ static bool handle_cpuid(struct pt_regs *regs)
 	 * (GHCI), section titled "VP.VMCALL<Instruction.CPUID>".
 	 */
 	if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT))
-		return false;
+		return -EIO;
 
 	/*
 	 * As per TDX GHCI CPUID ABI, r12-r15 registers contain contents of
@@ -257,7 +260,7 @@ static bool handle_cpuid(struct pt_regs *regs)
 	regs->cx = args.r14;
 	regs->dx = args.r15;
 
-	return true;
+	return ve->instr_len;
 }
 
 static bool mmio_read(int size, unsigned long addr, unsigned long *val)
@@ -283,7 +286,7 @@ static bool mmio_write(int size, unsigned long addr, unsigned long val)
 			       EPT_WRITE, addr, val);
 }
 
-static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
+static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 {
 	char buffer[MAX_INSN_SIZE];
 	unsigned long *reg, val;
@@ -294,34 +297,36 @@ static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 
 	/* Only in-kernel MMIO is supported */
 	if (WARN_ON_ONCE(user_mode(regs)))
-		return false;
+		return -EFAULT;
 
 	if (copy_from_kernel_nofault(buffer, (void *)regs->ip, MAX_INSN_SIZE))
-		return false;
+		return -EFAULT;
 
 	if (insn_decode(&insn, buffer, MAX_INSN_SIZE, INSN_MODE_64))
-		return false;
+		return -EINVAL;
 
 	mmio = insn_decode_mmio(&insn, &size);
 	if (WARN_ON_ONCE(mmio == MMIO_DECODE_FAILED))
-		return false;
+		return -EINVAL;
 
 	if (mmio != MMIO_WRITE_IMM && mmio != MMIO_MOVS) {
 		reg = insn_get_modrm_reg_ptr(&insn, regs);
 		if (!reg)
-			return false;
+			return -EINVAL;
 	}
 
-	ve->instr_len = insn.length;
-
 	/* Handle writes first */
 	switch (mmio) {
 	case MMIO_WRITE:
 		memcpy(&val, reg, size);
-		return mmio_write(size, ve->gpa, val);
+		if (!mmio_write(size, ve->gpa, val))
+			return -EIO;
+		return insn.length;
 	case MMIO_WRITE_IMM:
 		val = insn.immediate.value;
-		return mmio_write(size, ve->gpa, val);
+		if (!mmio_write(size, ve->gpa, val))
+			return -EIO;
+		return insn.length;
 	case MMIO_READ:
 	case MMIO_READ_ZERO_EXTEND:
 	case MMIO_READ_SIGN_EXTEND:
@@ -334,15 +339,15 @@ static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 		 * decoded or handled properly. It was likely not using io.h
 		 * helpers or accessed MMIO accidentally.
 		 */
-		return false;
+		return -EINVAL;
 	default:
 		WARN_ONCE(1, "Unknown insn_decode_mmio() decode value?");
-		return false;
+		return -EINVAL;
 	}
 
 	/* Handle reads */
 	if (!mmio_read(size, ve->gpa, &val))
-		return false;
+		return -EIO;
 
 	switch (mmio) {
 	case MMIO_READ:
@@ -364,13 +369,13 @@ static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 	default:
 		/* All other cases has to be covered with the first switch() */
 		WARN_ON_ONCE(1);
-		return false;
+		return -EINVAL;
 	}
 
 	if (extend_size)
 		memset(reg, extend_val, extend_size);
 	memcpy(reg, &val, size);
-	return true;
+	return insn.length;
 }
 
 static bool handle_in(struct pt_regs *regs, int size, int port)
@@ -421,13 +426,14 @@ static bool handle_out(struct pt_regs *regs, int size, int port)
  *
  * Return True on success or False on failure.
  */
-static bool handle_io(struct pt_regs *regs, u32 exit_qual)
+static int handle_io(struct pt_regs *regs, struct ve_info *ve)
 {
+	u32 exit_qual = ve->exit_qual;
 	int size, port;
-	bool in;
+	bool in, ret;
 
 	if (VE_IS_IO_STRING(exit_qual))
-		return false;
+		return -EIO;
 
 	in   = VE_IS_IO_IN(exit_qual);
 	size = VE_GET_IO_SIZE(exit_qual);
@@ -435,9 +441,13 @@ static bool handle_io(struct pt_regs *regs, u32 exit_qual)
 
 
 	if (in)
-		return handle_in(regs, size, port);
+		ret = handle_in(regs, size, port);
 	else
-		return handle_out(regs, size, port);
+		ret = handle_out(regs, size, port);
+	if (!ret)
+		return -EIO;
+
+	return ve->instr_len;
 }
 
 /*
@@ -447,17 +457,19 @@ static bool handle_io(struct pt_regs *regs, u32 exit_qual)
 __init bool tdx_early_handle_ve(struct pt_regs *regs)
 {
 	struct ve_info ve;
-	bool ret;
+	int insn_len;
 
 	tdx_get_ve_info(&ve);
 
 	if (ve.exit_reason != EXIT_REASON_IO_INSTRUCTION)
 		return false;
 
-	ret = handle_io(regs, ve.exit_qual);
-	if (ret)
-		regs->ip += ve.instr_len;
-	return ret;
+	insn_len = handle_io(regs, &ve);
+	if (insn_len < 0)
+		return false;
+
+	regs->ip += insn_len;
+	return true;
 }
 
 void tdx_get_ve_info(struct ve_info *ve)
@@ -486,58 +498,82 @@ void tdx_get_ve_info(struct ve_info *ve)
 	ve->exit_qual   = out.rdx;
 	ve->gla         = out.r8;
 	ve->gpa         = out.r9;
-	ve->instr_len   = lower_32_bits(out.r10);
-	ve->instr_info  = upper_32_bits(out.r10);
+
+	/*
+	 * If the #VE happened due to instruction execution, GET_VEINFO
+	 * provides info on the instruction in out.r10.
+	 *
+	 * For #VE due to EPT violation, info in out.r10 is not usable and
+	 * kernel has to decode instruction manually to find out its length.
+	 */
+	if (ve->exit_reason != EXIT_REASON_EPT_VIOLATION) {
+		ve->instr_len   = lower_32_bits(out.r10);
+		ve->instr_info  = upper_32_bits(out.r10);
+	} else {
+		ve->instr_len   = 0;
+		ve->instr_info  = 0;
+	}
 }
 
-/* Handle the user initiated #VE */
-static bool virt_exception_user(struct pt_regs *regs, struct ve_info *ve)
+/*
+ * Handle the user initiated #VE.
+ *
+ * On success, returns the number of bytes RIP should be incremented (>=0)
+ * or -errno on error.
+ */
+static int virt_exception_user(struct pt_regs *regs, struct ve_info *ve)
 {
 	switch (ve->exit_reason) {
 	case EXIT_REASON_CPUID:
-		return handle_cpuid(regs);
+		return handle_cpuid(regs, ve);
 	default:
 		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
-		return false;
+		return -EIO;
 	}
 }
 
-/* Handle the kernel #VE */
-static bool virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve)
+/*
+ * Handle the kernel #VE.
+ *
+ * On success, returns the number of bytes RIP should be incremented (>=0)
+ * or -errno on error.
+ */
+static int virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve)
 {
 	switch (ve->exit_reason) {
 	case EXIT_REASON_HLT:
-		return handle_halt();
+		return handle_halt(ve);
 	case EXIT_REASON_MSR_READ:
-		return read_msr(regs);
+		return read_msr(regs, ve);
 	case EXIT_REASON_MSR_WRITE:
-		return write_msr(regs);
+		return write_msr(regs, ve);
 	case EXIT_REASON_CPUID:
-		return handle_cpuid(regs);
+		return handle_cpuid(regs, ve);
 	case EXIT_REASON_EPT_VIOLATION:
 		return handle_mmio(regs, ve);
 	case EXIT_REASON_IO_INSTRUCTION:
-		return handle_io(regs, ve->exit_qual);
+		return handle_io(regs, ve);
 	default:
 		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
-		return false;
+		return -EIO;
 	}
 }
 
 bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve)
 {
-	bool ret;
+	int insn_len;
 
 	if (user_mode(regs))
-		ret = virt_exception_user(regs, ve);
+		insn_len = virt_exception_user(regs, ve);
 	else
-		ret = virt_exception_kernel(regs, ve);
+		insn_len = virt_exception_kernel(regs, ve);
+	if (insn_len < 0)
+		return false;
 
 	/* After successful #VE handling, move the IP */
-	if (ret)
-		regs->ip += ve->instr_len;
+	regs->ip += insn_len;
 
-	return ret;
+	return true;
 }
 
 static bool tdx_tlb_flush_required(bool private)
-- 
2.35.1



* [PATCHv2 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
  2022-05-20  3:13 [PATCHv2 0/3] Fix for load_unaligned_zeropad() in TDX guest Kirill A. Shutemov
  2022-05-20  3:13 ` [PATCHv2 1/3] x86/tdx: Fix early #VE handling Kirill A. Shutemov
  2022-05-20  3:13 ` [PATCHv2 2/3] x86/tdx: Clarify RIP adjustments in #VE handler Kirill A. Shutemov
@ 2022-05-20  3:13 ` Kirill A. Shutemov
  2022-05-20 17:47   ` Sean Christopherson
  2 siblings, 1 reply; 12+ messages in thread
From: Kirill A. Shutemov @ 2022-05-20  3:13 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, luto, peterz
  Cc: ak, dan.j.williams, david, hpa, linux-kernel,
	sathyanarayanan.kuppuswamy, seanjc, thomas.lendacky, x86,
	Kirill A. Shutemov

load_unaligned_zeropad() can lead to unwanted loads across page
boundaries. The unwanted loads are typically harmless. But, they might
be made to totally unrelated or even unmapped memory.
load_unaligned_zeropad() relies on exception fixup (#PF, #GP and now
#VE) to recover from these unwanted loads.

In TDX guests, the second page can be a shared page and the VMM may
configure it to trigger #VE.

The kernel assumes that a #VE on a shared page is an MMIO access and
tries to decode the instruction to handle it. In the case of
load_unaligned_zeropad() this may result in confusion, as it is not an
MMIO access.

Check the fixup table before trying to handle MMIO.

The issue was discovered by code analysis. It was not triggered during
testing.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/coco/tdx/tdx.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 010dc229096a..1a1c8a92cfa5 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -11,6 +11,8 @@
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
 #include <asm/pgtable.h>
+#include <asm/trapnr.h>
+#include <asm/extable.h>
 
 /* TDX module Call Leaf IDs */
 #define TDX_GET_INFO			1
@@ -299,6 +301,24 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 	if (WARN_ON_ONCE(user_mode(regs)))
 		return -EFAULT;
 
+	/*
+	 * load_unaligned_zeropad() relies on exception fixups in case of the
+	 * word being a page-crosser and the second page is not accessible.
+	 *
+	 * In TDX guests, the second page can be shared page and VMM may
+	 * configure it to trigger #VE.
+	 *
+	 * Kernel assumes that #VE on a shared page is MMIO access and tries to
+	 * decode instruction to handle it. In case of load_unaligned_zeropad()
+	 * it may result in confusion as it is not MMIO access.
+	 *
+	 * Check fixup table before trying to handle MMIO.
+	 */
+	if (fixup_exception(regs, X86_TRAP_VE, 0, ve->gla)) {
+		/* regs->ip is adjusted by fixup_exception() */
+		return 0;
+	}
+
 	if (copy_from_kernel_nofault(buffer, (void *)regs->ip, MAX_INSN_SIZE))
 		return -EFAULT;
 
-- 
2.35.1



* Re: [PATCHv2 1/3] x86/tdx: Fix early #VE handling
  2022-05-20  3:13 ` [PATCHv2 1/3] x86/tdx: Fix early #VE handling Kirill A. Shutemov
@ 2022-05-20 17:01   ` Sean Christopherson
  2022-05-20 17:33     ` Sean Christopherson
  2022-05-20 17:46   ` Sathyanarayanan Kuppuswamy
  1 sibling, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2022-05-20 17:01 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: tglx, mingo, bp, dave.hansen, luto, peterz, ak, dan.j.williams,
	david, hpa, linux-kernel, sathyanarayanan.kuppuswamy,
	thomas.lendacky, x86

On Fri, May 20, 2022, Kirill A. Shutemov wrote:
> Move RIP in tdx_early_handle_ve() after handling the exception. Failure
> to do that leads to infinite loop of exceptions.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Fixes: 32e72854fa5f ("x86/tdx: Port I/O: Add early boot support")
> ---
>  arch/x86/coco/tdx/tdx.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 03deb4d6920d..faae53f8d559 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -447,13 +447,17 @@ static bool handle_io(struct pt_regs *regs, u32 exit_qual)
>  __init bool tdx_early_handle_ve(struct pt_regs *regs)
>  {
>  	struct ve_info ve;
> +	bool ret;
>  
>  	tdx_get_ve_info(&ve);
>  
>  	if (ve.exit_reason != EXIT_REASON_IO_INSTRUCTION)
>  		return false;
>  
> -	return handle_io(regs, ve.exit_qual);
> +	ret = handle_io(regs, ve.exit_qual);
> +	if (ret)

Ugh, the boolean returns instead of 0/-errno are fugly.  At first glance I thought
this was wrong, i.e. advancing RIP on failure.

Assuming moving away from booleans isn't happening anytime soon maybe s/ret/success
or s/ret/handled to make it more obvious that it's a happy path?


* Re: [PATCHv2 1/3] x86/tdx: Fix early #VE handling
  2022-05-20 17:01   ` Sean Christopherson
@ 2022-05-20 17:33     ` Sean Christopherson
  0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2022-05-20 17:33 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: tglx, mingo, bp, dave.hansen, luto, peterz, ak, dan.j.williams,
	david, hpa, linux-kernel, sathyanarayanan.kuppuswamy,
	thomas.lendacky, x86

On Fri, May 20, 2022, Sean Christopherson wrote:
> On Fri, May 20, 2022, Kirill A. Shutemov wrote:
> > Move RIP in tdx_early_handle_ve() after handling the exception. Failure
> > to do that leads to infinite loop of exceptions.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Fixes: 32e72854fa5f ("x86/tdx: Port I/O: Add early boot support")
> > ---
> >  arch/x86/coco/tdx/tdx.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> > index 03deb4d6920d..faae53f8d559 100644
> > --- a/arch/x86/coco/tdx/tdx.c
> > +++ b/arch/x86/coco/tdx/tdx.c
> > @@ -447,13 +447,17 @@ static bool handle_io(struct pt_regs *regs, u32 exit_qual)
> >  __init bool tdx_early_handle_ve(struct pt_regs *regs)
> >  {
> >  	struct ve_info ve;
> > +	bool ret;
> >  
> >  	tdx_get_ve_info(&ve);
> >  
> >  	if (ve.exit_reason != EXIT_REASON_IO_INSTRUCTION)
> >  		return false;
> >  
> > -	return handle_io(regs, ve.exit_qual);
> > +	ret = handle_io(regs, ve.exit_qual);
> > +	if (ret)
> 
> Ugh, the boolean returns instead of 0/-errno are fugly.  At first glance I thought
> this was wrong, i.e. advancing RIP on failure.
> 
> Assuming moving away from booleans isn't happening anytime soon maybe s/ret/success
> or s/ret/handled to make it more obvious that it's a happy path?

Doh, should have looked at patch 2...


* Re: [PATCHv2 1/3] x86/tdx: Fix early #VE handling
  2022-05-20  3:13 ` [PATCHv2 1/3] x86/tdx: Fix early #VE handling Kirill A. Shutemov
  2022-05-20 17:01   ` Sean Christopherson
@ 2022-05-20 17:46   ` Sathyanarayanan Kuppuswamy
  1 sibling, 0 replies; 12+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2022-05-20 17:46 UTC (permalink / raw)
  To: Kirill A. Shutemov, tglx, mingo, bp, dave.hansen, luto, peterz
  Cc: ak, dan.j.williams, david, hpa, linux-kernel, seanjc,
	thomas.lendacky, x86



On 5/19/22 8:13 PM, Kirill A. Shutemov wrote:
> Move RIP in tdx_early_handle_ve() after handling the exception. Failure
> to do that leads to infinite loop of exceptions.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Fixes: 32e72854fa5f ("x86/tdx: Port I/O: Add early boot support")
> ---
>   arch/x86/coco/tdx/tdx.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)

Looks good to me

Reviewed-by: Kuppuswamy Sathyanarayanan 
<sathyanarayanan.kuppuswamy@linux.intel.com>

> 
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 03deb4d6920d..faae53f8d559 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -447,13 +447,17 @@ static bool handle_io(struct pt_regs *regs, u32 exit_qual)
>   __init bool tdx_early_handle_ve(struct pt_regs *regs)
>   {
>   	struct ve_info ve;
> +	bool ret;
>   
>   	tdx_get_ve_info(&ve);
>   
>   	if (ve.exit_reason != EXIT_REASON_IO_INSTRUCTION)
>   		return false;
>   
> -	return handle_io(regs, ve.exit_qual);
> +	ret = handle_io(regs, ve.exit_qual);
> +	if (ret)
> +		regs->ip += ve.instr_len;
> +	return ret;
>   }
>   
>   void tdx_get_ve_info(struct ve_info *ve)

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


* Re: [PATCHv2 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
  2022-05-20  3:13 ` [PATCHv2 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page Kirill A. Shutemov
@ 2022-05-20 17:47   ` Sean Christopherson
  2022-05-20 18:43     ` Kirill A. Shutemov
  0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2022-05-20 17:47 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: tglx, mingo, bp, dave.hansen, luto, peterz, ak, dan.j.williams,
	david, hpa, linux-kernel, sathyanarayanan.kuppuswamy,
	thomas.lendacky, x86

On Fri, May 20, 2022, Kirill A. Shutemov wrote:
> load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
> The unwanted loads are typically harmless. But, they might be made to
> totally unrelated or even unmapped memory. load_unaligned_zeropad()
> relies on exception fixup (#PF, #GP and now #VE) to recover from these
> unwanted loads.
> 
> In TDX guests, the second page can be shared page and VMM may configure
> it to trigger #VE.
> 
> Kernel assumes that #VE on a shared page is MMIO access and tries to
> decode instruction to handle it. In case of load_unaligned_zeropad() it
> may result in confusion as it is not MMIO access.
> 
> Check fixup table before trying to handle MMIO.
> 
> The issue was discovered by analysis. It was not triggered during the
> testing.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/coco/tdx/tdx.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index 010dc229096a..1a1c8a92cfa5 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -11,6 +11,8 @@
>  #include <asm/insn.h>
>  #include <asm/insn-eval.h>
>  #include <asm/pgtable.h>
> +#include <asm/trapnr.h>
> +#include <asm/extable.h>
>  
>  /* TDX module Call Leaf IDs */
>  #define TDX_GET_INFO			1
> @@ -299,6 +301,24 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
>  	if (WARN_ON_ONCE(user_mode(regs)))
>  		return -EFAULT;
>  
> +	/*
> +	 * load_unaligned_zeropad() relies on exception fixups in case of the
> +	 * word being a page-crosser and the second page is not accessible.
> +	 *
> +	 * In TDX guests, the second page can be shared page and VMM may
> +	 * configure it to trigger #VE.
> +	 *
> +	 * Kernel assumes that #VE on a shared page is MMIO access and tries to
> +	 * decode instruction to handle it. In case of load_unaligned_zeropad()
> +	 * it may result in confusion as it is not MMIO access.

The guest kernel can't know that it's not "MMIO", e.g. nothing prevents the host
from manually serving accesses to some chunk of shared memory instead of backing
the shared chunk with host DRAM.

> +	 *
> +	 * Check fixup table before trying to handle MMIO.

This ordering is wrong, fixup should be done if and only if the instruction truly
"faults".  E.g. if there's an MMIO access lurking in the kernel that is wrapped in
exception fixup, then this will break that usage and provide garbage data on a read
and drop any write.

> +	 */
> +	if (fixup_exception(regs, X86_TRAP_VE, 0, ve->gla)) {
> +		/* regs->ip is adjusted by fixup_exception() */
> +		return 0;
> +	}
> +
>  	if (copy_from_kernel_nofault(buffer, (void *)regs->ip, MAX_INSN_SIZE))
>  		return -EFAULT;
>  
> -- 
> 2.35.1
> 


* Re: [PATCHv2 2/3] x86/tdx: Clarify RIP adjustments in #VE handler
  2022-05-20  3:13 ` [PATCHv2 2/3] x86/tdx: Clarify RIP adjustments in #VE handler Kirill A. Shutemov
@ 2022-05-20 17:52   ` Dave Hansen
  2022-05-20 18:01     ` Sean Christopherson
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2022-05-20 17:52 UTC (permalink / raw)
  To: Kirill A. Shutemov, tglx, mingo, bp, luto, peterz
  Cc: ak, dan.j.williams, david, hpa, linux-kernel,
	sathyanarayanan.kuppuswamy, seanjc, thomas.lendacky, x86

On 5/19/22 20:13, Kirill A. Shutemov wrote:
> +	/*
> +	 * If the #VE happened due to instruction execution, GET_VEINFO
> +	 * provides info on the instruction in out.r10.
> +	 *
> +	 * For #VE due to EPT violation, info in out.r10 is not usable and
> +	 * kernel has to decode instruction manually to find out its length.
> +	 */
> +	if (ve->exit_reason != EXIT_REASON_EPT_VIOLATION) {
> +		ve->instr_len   = lower_32_bits(out.r10);
> +		ve->instr_info  = upper_32_bits(out.r10);
> +	} else {
> +		ve->instr_len   = 0;
> +		ve->instr_info  = 0;
> +	}
>  }

This is _better_, but I still don't like it.  It still hides the magic
down in here and if someone screws it up, they'll silently get a loop of
#VE's.  If we stick the logic in a helper like:

int ve_instr_len(struct ve_info *ve)
{
	if (WARN_ON_ONCE(ve->exit_reason == EXIT_REASON_EPT_VIOLATION))
		return 0;

	return ve->instr_len;
}

and then folks consume the instruction length with that, we get a splat
right where and when it goes wrong.

BTW, how do we know that all non-EPT_VIOLATION exit reasons are
instruction execution?


* Re: [PATCHv2 2/3] x86/tdx: Clarify RIP adjustments in #VE handler
  2022-05-20 17:52   ` Dave Hansen
@ 2022-05-20 18:01     ` Sean Christopherson
  0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2022-05-20 18:01 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, tglx, mingo, bp, luto, peterz, ak,
	dan.j.williams, david, hpa, linux-kernel,
	sathyanarayanan.kuppuswamy, thomas.lendacky, x86

On Fri, May 20, 2022, Dave Hansen wrote:
> BTW, how do we know that all non-EPT_VIOLATION exits reasons are
> instruction execution?

The TDX module spec actually does a decent job of calling this out explicitly:

  #VE may be injected by the Intel TDX module in several cases:
    * Emulation of the architectural #VE injection on EPT violation, done by a
      guest-side Intel TDX module flow that performs an EPT walk.
    * As a result of guest TD execution of a disallowed instruction (see 10.4 above),
      a disallowed MSR access (see 10.7 above), or CPUID virtualization (see 10.8 above).
    * A notification to the guest TD about anomalous behavior (e.g., too many EPT
      violations reported on the same TD VCPU instruction without making progress).
      This kind of #VE is raised only if the guest TD enabled the specific notification
      (using TDG.VM.WR to write the TDCS.NOTIFY_ENABLES field) and when a #VE can be
      injected. See 16.3 for details.

The first one is (obviously) the EPT violation / MMIO case.  The second is purely
instruction execution (the MSR and CPUID clauses are specific to RDMSR, WRMSR,
and CPUID).  The third I hadn't seen until now, but it's opt-in.

The main switch statement further guarantees the kernel is only going to handle
EPT violations and instruction exits at this time:

        switch (ve->exit_reason) {
        case EXIT_REASON_HLT:
                return handle_halt();
        case EXIT_REASON_MSR_READ:
                return read_msr(regs);
        case EXIT_REASON_MSR_WRITE:
                return write_msr(regs);
        case EXIT_REASON_CPUID:
                return handle_cpuid(regs);
        case EXIT_REASON_EPT_VIOLATION:
                return handle_mmio(regs, ve);
        case EXIT_REASON_IO_INSTRUCTION:
                return handle_io(regs, ve->exit_qual);
        default:
                pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
                return false;
        }



* Re: [PATCHv2 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
  2022-05-20 17:47   ` Sean Christopherson
@ 2022-05-20 18:43     ` Kirill A. Shutemov
  2022-05-20 19:00       ` Sean Christopherson
  0 siblings, 1 reply; 12+ messages in thread
From: Kirill A. Shutemov @ 2022-05-20 18:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: tglx, mingo, bp, dave.hansen, luto, peterz, ak, dan.j.williams,
	david, hpa, linux-kernel, sathyanarayanan.kuppuswamy,
	thomas.lendacky, x86

On Fri, May 20, 2022 at 05:47:30PM +0000, Sean Christopherson wrote:
> On Fri, May 20, 2022, Kirill A. Shutemov wrote:
> > load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
> > The unwanted loads are typically harmless. But, they might be made to
> > totally unrelated or even unmapped memory. load_unaligned_zeropad()
> > relies on exception fixup (#PF, #GP and now #VE) to recover from these
> > unwanted loads.
> > 
> > In TDX guests, the second page can be shared page and VMM may configure
> > it to trigger #VE.
> > 
> > Kernel assumes that #VE on a shared page is MMIO access and tries to
> > decode instruction to handle it. In case of load_unaligned_zeropad() it
> > may result in confusion as it is not MMIO access.
> > 
> > Check fixup table before trying to handle MMIO.
> > 
> > The issue was discovered by analysis. It was not triggered during the
> > testing.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/coco/tdx/tdx.c | 20 ++++++++++++++++++++
> >  1 file changed, 20 insertions(+)
> > 
> > diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> > index 010dc229096a..1a1c8a92cfa5 100644
> > --- a/arch/x86/coco/tdx/tdx.c
> > +++ b/arch/x86/coco/tdx/tdx.c
> > @@ -11,6 +11,8 @@
> >  #include <asm/insn.h>
> >  #include <asm/insn-eval.h>
> >  #include <asm/pgtable.h>
> > +#include <asm/trapnr.h>
> > +#include <asm/extable.h>
> >  
> >  /* TDX module Call Leaf IDs */
> >  #define TDX_GET_INFO			1
> > @@ -299,6 +301,24 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
> >  	if (WARN_ON_ONCE(user_mode(regs)))
> >  		return -EFAULT;
> >  
> > +	/*
> > +	 * load_unaligned_zeropad() relies on exception fixups in case of the
> > +	 * word being a page-crosser and the second page is not accessible.
> > +	 *
> > +	 * In TDX guests, the second page can be shared page and VMM may
> > +	 * configure it to trigger #VE.
> > +	 *
> > +	 * Kernel assumes that #VE on a shared page is MMIO access and tries to
> > +	 * decode instruction to handle it. In case of load_unaligned_zeropad()
> > +	 * it may result in confusion as it is not MMIO access.
> 
> The guest kernel can't know that it's not "MMIO", e.g. nothing prevents the host
> from manually serving accesses to some chunk of shared memory instead of backing
> the shared chunk with host DRAM.

It would require the guest to access shared memory only with instructions
that we can deal with. I don't think we have such a guarantee.

> > +	 *
> > +	 * Check fixup table before trying to handle MMIO.
> 
> This ordering is wrong; fixup should be done if and only if the instruction truly
> "faults".  E.g. if there's an MMIO access lurking in the kernel that is wrapped in
> exception fixup, then this will break that usage and provide garbage data on a read
> and drop any write.

When I tried to trigger the bug, the #VE handling actually succeeded,
because load_unaligned_zeropad() uses an instruction we can decode. But due
to the misalignment, the part of the word that came from the non-shared
page got overwritten with data that came from the VMM.

I guess we can try to detect misaligned accesses and handle them
correctly. But that gets complicated and is easy to screw up.

Do we ever use exception fixups for MMIO accesses to justify the
complication?

-- 
 Kirill A. Shutemov


* Re: [PATCHv2 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
  2022-05-20 18:43     ` Kirill A. Shutemov
@ 2022-05-20 19:00       ` Sean Christopherson
  0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2022-05-20 19:00 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: tglx, mingo, bp, dave.hansen, luto, peterz, ak, dan.j.williams,
	david, hpa, linux-kernel, sathyanarayanan.kuppuswamy,
	thomas.lendacky, x86

On Fri, May 20, 2022, Kirill A. Shutemov wrote:
> On Fri, May 20, 2022 at 05:47:30PM +0000, Sean Christopherson wrote:
> > On Fri, May 20, 2022, Kirill A. Shutemov wrote:
> > > @@ -299,6 +301,24 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
> > >  	if (WARN_ON_ONCE(user_mode(regs)))
> > >  		return -EFAULT;
> > >  
> > > +	/*
> > > +	 * load_unaligned_zeropad() relies on exception fixups in case of the
> > > +	 * word being a page-crosser and the second page is not accessible.
> > > +	 *
> > > +	 * In TDX guests, the second page can be shared page and VMM may
> > > +	 * configure it to trigger #VE.
> > > +	 *
> > > +	 * Kernel assumes that #VE on a shared page is MMIO access and tries to
> > > +	 * decode instruction to handle it. In case of load_unaligned_zeropad()
> > > +	 * it may result in confusion as it is not MMIO access.
> > 
> > The guest kernel can't know that it's not "MMIO", e.g. nothing prevents the host
> > from manually serving accesses to some chunk of shared memory instead of backing
> > the shared chunk with host DRAM.
> 
> It would require the guest to access shared memory only with instructions
> that we can deal with. I don't think we have such a guarantee.

Ya, it's purely theoretical behavior.  But panicking if the kernel can't decode
the instruction is really all the guest can do.

> > > +	 * Check fixup table before trying to handle MMIO.
> > 
> > This ordering is wrong; fixup should be done if and only if the instruction truly
> > "faults".  E.g. if there's an MMIO access lurking in the kernel that is wrapped in
> > exception fixup, then this will break that usage and provide garbage data on a read
> > and drop any write.
> 
> When I tried to trigger the bug, the #VE handling actually succeeded,
> because load_unaligned_zeropad() uses an instruction we can decode. But
> due to the misalignment, the part of the word that came from the
> non-shared page got overwritten with data that came from the VMM.

That's a bug in the emulation then.  I.e. it needs to deal with page splits.

> I guess we can try to detect misaligned accesses and handle them
> correctly. But that gets complicated and is easy to screw up.

At a minimum, it should reject EPT violation #VEs that split pages (on either side).
That's needed irrespective of fixup, e.g. if there's a bug in the kernel that
results in splitting an MMIO region, then panicking is better than data corruption.

Then the post-failure fixup will work, i.e. the load_unaligned_zeropad() will work
like you intend here, without risking spurious fixup.
 
> Do we ever use exception fixups for MMIO accesses to justify the
> complication?

It's essentially impossible to prove because identifying all the MMIO accesses in
the kernel (and drivers!) is extremely difficult, e.g. see the I/O APIC code which
uses a struct to overlay MMIO.


end of thread, other threads:[~2022-05-20 19:01 UTC | newest]

Thread overview: 12+ messages
2022-05-20  3:13 [PATCHv2 0/3] Fix for load_unaligned_zeropad() in TDX guest Kirill A. Shutemov
2022-05-20  3:13 ` [PATCHv2 1/3] x86/tdx: Fix early #VE handling Kirill A. Shutemov
2022-05-20 17:01   ` Sean Christopherson
2022-05-20 17:33     ` Sean Christopherson
2022-05-20 17:46   ` Sathyanarayanan Kuppuswamy
2022-05-20  3:13 ` [PATCHv2 2/3] x86/tdx: Clarify RIP adjustments in #VE handler Kirill A. Shutemov
2022-05-20 17:52   ` Dave Hansen
2022-05-20 18:01     ` Sean Christopherson
2022-05-20  3:13 ` [PATCHv2 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page Kirill A. Shutemov
2022-05-20 17:47   ` Sean Christopherson
2022-05-20 18:43     ` Kirill A. Shutemov
2022-05-20 19:00       ` Sean Christopherson
