* [PATCHv3 0/3] Fix for load_unaligned_zeropad() in TDX guest
@ 2022-05-24 22:10 Kirill A. Shutemov
  2022-05-24 22:10 ` [PATCHv3 1/3] x86/tdx: Fix early #VE handling Kirill A. Shutemov
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Kirill A. Shutemov @ 2022-05-24 22:10 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, luto, peterz
  Cc: ak, dan.j.williams, david, hpa, linux-kernel,
	sathyanarayanan.kuppuswamy, seanjc, thomas.lendacky, x86,
	Kirill A. Shutemov

An updated version of the load_unaligned_zeropad() fix.

v3:
 - Rework load_unaligned_zeropad() fix: fail misaligned MMIO accesses
   instead of upfront exception fixups;
 - Add ve_instr_len() helper to wrap access to ve->instr_len;
 - Add Reviewed-by from Sathya;
v2:
 - Clarify RIP adjustments during #VE handling;
 - Fix early #VE handling;
 - Fix comment and commit message;

Kirill A. Shutemov (3):
  x86/tdx: Fix early #VE handling
  x86/tdx: Clarify RIP adjustments in #VE handler
  x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page

 arch/x86/coco/tdx/tdx.c | 154 +++++++++++++++++++++++++++-------------
 1 file changed, 104 insertions(+), 50 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCHv3 1/3] x86/tdx: Fix early #VE handling
  2022-05-24 22:10 [PATCHv3 0/3] Fix for load_unaligned_zeropad() in TDX guest Kirill A. Shutemov
@ 2022-05-24 22:10 ` Kirill A. Shutemov
  2022-05-24 22:10 ` [PATCHv3 2/3] x86/tdx: Clarify RIP adjustments in #VE handler Kirill A. Shutemov
  2022-05-24 22:10 ` [PATCHv3 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page Kirill A. Shutemov
  2 siblings, 0 replies; 11+ messages in thread
From: Kirill A. Shutemov @ 2022-05-24 22:10 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, luto, peterz
  Cc: ak, dan.j.williams, david, hpa, linux-kernel,
	sathyanarayanan.kuppuswamy, seanjc, thomas.lendacky, x86,
	Kirill A. Shutemov

Move RIP in tdx_early_handle_ve() after handling the exception. Failure
to do so leads to an infinite loop of exceptions.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: 32e72854fa5f ("x86/tdx: Port I/O: Add early boot support")
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 arch/x86/coco/tdx/tdx.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 03deb4d6920d..faae53f8d559 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -447,13 +447,17 @@ static bool handle_io(struct pt_regs *regs, u32 exit_qual)
 __init bool tdx_early_handle_ve(struct pt_regs *regs)
 {
 	struct ve_info ve;
+	bool ret;
 
 	tdx_get_ve_info(&ve);
 
 	if (ve.exit_reason != EXIT_REASON_IO_INSTRUCTION)
 		return false;
 
-	return handle_io(regs, ve.exit_qual);
+	ret = handle_io(regs, ve.exit_qual);
+	if (ret)
+		regs->ip += ve.instr_len;
+	return ret;
 }
 
 void tdx_get_ve_info(struct ve_info *ve)
-- 
2.35.1



* [PATCHv3 2/3] x86/tdx: Clarify RIP adjustments in #VE handler
  2022-05-24 22:10 [PATCHv3 0/3] Fix for load_unaligned_zeropad() in TDX guest Kirill A. Shutemov
  2022-05-24 22:10 ` [PATCHv3 1/3] x86/tdx: Fix early #VE handling Kirill A. Shutemov
@ 2022-05-24 22:10 ` Kirill A. Shutemov
  2022-05-25 16:02   ` Dave Hansen
  2022-05-24 22:10 ` [PATCHv3 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page Kirill A. Shutemov
  2 siblings, 1 reply; 11+ messages in thread
From: Kirill A. Shutemov @ 2022-05-24 22:10 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, luto, peterz
  Cc: ak, dan.j.williams, david, hpa, linux-kernel,
	sathyanarayanan.kuppuswamy, seanjc, thomas.lendacky, x86,
	Kirill A. Shutemov

After successful #VE handling, tdx_handle_virt_exception() has to move
RIP to the next instruction. The handler needs to know the length of the
instruction.

If the #VE happened due to instruction execution, the GET_VEINFO TDX
module call provides info on the instruction in R10, including its
length.

For a #VE due to an EPT violation, the info in R10 is not usable and the
kernel has to decode the instruction manually to find out its length.

Restructure the code to make it explicit that the instruction length
depends on the type of #VE. The handler for an exit reason now returns
the instruction length on success or -errno on failure.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/coco/tdx/tdx.c | 149 +++++++++++++++++++++++++---------------
 1 file changed, 94 insertions(+), 55 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index faae53f8d559..94e447e7f103 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -124,6 +124,22 @@ static u64 get_cc_mask(void)
 	return BIT_ULL(gpa_width - 1);
 }
 
+static int ve_instr_len(struct ve_info *ve)
+{
+	/*
+	 * If the #VE happened due to instruction execution, GET_VEINFO
+	 * provides info on the instruction.
+	 *
+	 * For #VE due to EPT violation, info provided by GET_VEINFO not usable
+	 * and kernel has to decode instruction manually to find out its
+	 * length. Catch such cases.
+	 */
+	if (WARN_ON_ONCE(ve->exit_reason == EXIT_REASON_EPT_VIOLATION))
+		return 0;
+
+	return ve->instr_len;
+}
+
 static u64 __cpuidle __halt(const bool irq_disabled, const bool do_sti)
 {
 	struct tdx_hypercall_args args = {
@@ -147,7 +163,7 @@ static u64 __cpuidle __halt(const bool irq_disabled, const bool do_sti)
 	return __tdx_hypercall(&args, do_sti ? TDX_HCALL_ISSUE_STI : 0);
 }
 
-static bool handle_halt(void)
+static int handle_halt(struct ve_info *ve)
 {
 	/*
 	 * Since non safe halt is mainly used in CPU offlining
@@ -158,9 +174,9 @@ static bool handle_halt(void)
 	const bool do_sti = false;
 
 	if (__halt(irq_disabled, do_sti))
-		return false;
+		return -EIO;
 
-	return true;
+	return ve_instr_len(ve);
 }
 
 void __cpuidle tdx_safe_halt(void)
@@ -180,7 +196,7 @@ void __cpuidle tdx_safe_halt(void)
 		WARN_ONCE(1, "HLT instruction emulation failed\n");
 }
 
-static bool read_msr(struct pt_regs *regs)
+static int read_msr(struct pt_regs *regs, struct ve_info *ve)
 {
 	struct tdx_hypercall_args args = {
 		.r10 = TDX_HYPERCALL_STANDARD,
@@ -194,14 +210,14 @@ static bool read_msr(struct pt_regs *regs)
 	 * (GHCI), section titled "TDG.VP.VMCALL<Instruction.RDMSR>".
 	 */
 	if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT))
-		return false;
+		return -EIO;
 
 	regs->ax = lower_32_bits(args.r11);
 	regs->dx = upper_32_bits(args.r11);
-	return true;
+	return ve_instr_len(ve);
 }
 
-static bool write_msr(struct pt_regs *regs)
+static int write_msr(struct pt_regs *regs, struct ve_info *ve)
 {
 	struct tdx_hypercall_args args = {
 		.r10 = TDX_HYPERCALL_STANDARD,
@@ -215,10 +231,13 @@ static bool write_msr(struct pt_regs *regs)
 	 * can be found in TDX Guest-Host-Communication Interface
 	 * (GHCI) section titled "TDG.VP.VMCALL<Instruction.WRMSR>".
 	 */
-	return !__tdx_hypercall(&args, 0);
+	if (__tdx_hypercall(&args, 0))
+		return -EIO;
+
+	return ve_instr_len(ve);
 }
 
-static bool handle_cpuid(struct pt_regs *regs)
+static int handle_cpuid(struct pt_regs *regs, struct ve_info *ve)
 {
 	struct tdx_hypercall_args args = {
 		.r10 = TDX_HYPERCALL_STANDARD,
@@ -236,7 +255,7 @@ static bool handle_cpuid(struct pt_regs *regs)
 	 */
 	if (regs->ax < 0x40000000 || regs->ax > 0x4FFFFFFF) {
 		regs->ax = regs->bx = regs->cx = regs->dx = 0;
-		return true;
+		return ve_instr_len(ve);
 	}
 
 	/*
@@ -245,7 +264,7 @@ static bool handle_cpuid(struct pt_regs *regs)
 	 * (GHCI), section titled "VP.VMCALL<Instruction.CPUID>".
 	 */
 	if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT))
-		return false;
+		return -EIO;
 
 	/*
 	 * As per TDX GHCI CPUID ABI, r12-r15 registers contain contents of
@@ -257,7 +276,7 @@ static bool handle_cpuid(struct pt_regs *regs)
 	regs->cx = args.r14;
 	regs->dx = args.r15;
 
-	return true;
+	return ve_instr_len(ve);
 }
 
 static bool mmio_read(int size, unsigned long addr, unsigned long *val)
@@ -283,7 +302,7 @@ static bool mmio_write(int size, unsigned long addr, unsigned long val)
 			       EPT_WRITE, addr, val);
 }
 
-static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
+static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 {
 	char buffer[MAX_INSN_SIZE];
 	unsigned long *reg, val;
@@ -294,34 +313,36 @@ static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 
 	/* Only in-kernel MMIO is supported */
 	if (WARN_ON_ONCE(user_mode(regs)))
-		return false;
+		return -EFAULT;
 
 	if (copy_from_kernel_nofault(buffer, (void *)regs->ip, MAX_INSN_SIZE))
-		return false;
+		return -EFAULT;
 
 	if (insn_decode(&insn, buffer, MAX_INSN_SIZE, INSN_MODE_64))
-		return false;
+		return -EINVAL;
 
 	mmio = insn_decode_mmio(&insn, &size);
 	if (WARN_ON_ONCE(mmio == MMIO_DECODE_FAILED))
-		return false;
+		return -EINVAL;
 
 	if (mmio != MMIO_WRITE_IMM && mmio != MMIO_MOVS) {
 		reg = insn_get_modrm_reg_ptr(&insn, regs);
 		if (!reg)
-			return false;
+			return -EINVAL;
 	}
 
-	ve->instr_len = insn.length;
-
 	/* Handle writes first */
 	switch (mmio) {
 	case MMIO_WRITE:
 		memcpy(&val, reg, size);
-		return mmio_write(size, ve->gpa, val);
+		if (!mmio_write(size, ve->gpa, val))
+			return -EIO;
+		return insn.length;
 	case MMIO_WRITE_IMM:
 		val = insn.immediate.value;
-		return mmio_write(size, ve->gpa, val);
+		if (!mmio_write(size, ve->gpa, val))
+			return -EIO;
+		return insn.length;
 	case MMIO_READ:
 	case MMIO_READ_ZERO_EXTEND:
 	case MMIO_READ_SIGN_EXTEND:
@@ -334,15 +355,15 @@ static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 		 * decoded or handled properly. It was likely not using io.h
 		 * helpers or accessed MMIO accidentally.
 		 */
-		return false;
+		return -EINVAL;
 	default:
 		WARN_ONCE(1, "Unknown insn_decode_mmio() decode value?");
-		return false;
+		return -EINVAL;
 	}
 
 	/* Handle reads */
 	if (!mmio_read(size, ve->gpa, &val))
-		return false;
+		return -EIO;
 
 	switch (mmio) {
 	case MMIO_READ:
@@ -364,13 +385,13 @@ static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 	default:
 		/* All other cases has to be covered with the first switch() */
 		WARN_ON_ONCE(1);
-		return false;
+		return -EINVAL;
 	}
 
 	if (extend_size)
 		memset(reg, extend_val, extend_size);
 	memcpy(reg, &val, size);
-	return true;
+	return insn.length;
 }
 
 static bool handle_in(struct pt_regs *regs, int size, int port)
@@ -421,13 +442,14 @@ static bool handle_out(struct pt_regs *regs, int size, int port)
  *
  * Return True on success or False on failure.
  */
-static bool handle_io(struct pt_regs *regs, u32 exit_qual)
+static int handle_io(struct pt_regs *regs, struct ve_info *ve)
 {
+	u32 exit_qual = ve->exit_qual;
 	int size, port;
-	bool in;
+	bool in, ret;
 
 	if (VE_IS_IO_STRING(exit_qual))
-		return false;
+		return -EIO;
 
 	in   = VE_IS_IO_IN(exit_qual);
 	size = VE_GET_IO_SIZE(exit_qual);
@@ -435,9 +457,13 @@ static bool handle_io(struct pt_regs *regs, u32 exit_qual)
 
 
 	if (in)
-		return handle_in(regs, size, port);
+		ret = handle_in(regs, size, port);
 	else
-		return handle_out(regs, size, port);
+		ret = handle_out(regs, size, port);
+	if (!ret)
+		return -EIO;
+
+	return ve_instr_len(ve);
 }
 
 /*
@@ -447,17 +473,19 @@ static bool handle_io(struct pt_regs *regs, u32 exit_qual)
 __init bool tdx_early_handle_ve(struct pt_regs *regs)
 {
 	struct ve_info ve;
-	bool ret;
+	int insn_len;
 
 	tdx_get_ve_info(&ve);
 
 	if (ve.exit_reason != EXIT_REASON_IO_INSTRUCTION)
 		return false;
 
-	ret = handle_io(regs, ve.exit_qual);
-	if (ret)
-		regs->ip += ve.instr_len;
-	return ret;
+	insn_len = handle_io(regs, &ve);
+	if (insn_len < 0)
+		return false;
+
+	regs->ip += insn_len;
+	return true;
 }
 
 void tdx_get_ve_info(struct ve_info *ve)
@@ -490,54 +518,65 @@ void tdx_get_ve_info(struct ve_info *ve)
 	ve->instr_info  = upper_32_bits(out.r10);
 }
 
-/* Handle the user initiated #VE */
-static bool virt_exception_user(struct pt_regs *regs, struct ve_info *ve)
+/*
+ * Handle the user initiated #VE.
+ *
+ * On success, returns the number of bytes RIP should be incremented (>=0)
+ * or -errno on error.
+ */
+static int virt_exception_user(struct pt_regs *regs, struct ve_info *ve)
 {
 	switch (ve->exit_reason) {
 	case EXIT_REASON_CPUID:
-		return handle_cpuid(regs);
+		return handle_cpuid(regs, ve);
 	default:
 		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
-		return false;
+		return -EIO;
 	}
 }
 
-/* Handle the kernel #VE */
-static bool virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve)
+/*
+ * Handle the kernel #VE.
+ *
+ * On success, returns the number of bytes RIP should be incremented (>=0)
+ * or -errno on error.
+ */
+static int virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve)
 {
 	switch (ve->exit_reason) {
 	case EXIT_REASON_HLT:
-		return handle_halt();
+		return handle_halt(ve);
 	case EXIT_REASON_MSR_READ:
-		return read_msr(regs);
+		return read_msr(regs, ve);
 	case EXIT_REASON_MSR_WRITE:
-		return write_msr(regs);
+		return write_msr(regs, ve);
 	case EXIT_REASON_CPUID:
-		return handle_cpuid(regs);
+		return handle_cpuid(regs, ve);
 	case EXIT_REASON_EPT_VIOLATION:
 		return handle_mmio(regs, ve);
 	case EXIT_REASON_IO_INSTRUCTION:
-		return handle_io(regs, ve->exit_qual);
+		return handle_io(regs, ve);
 	default:
 		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
-		return false;
+		return -EIO;
 	}
 }
 
 bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve)
 {
-	bool ret;
+	int insn_len;
 
 	if (user_mode(regs))
-		ret = virt_exception_user(regs, ve);
+		insn_len = virt_exception_user(regs, ve);
 	else
-		ret = virt_exception_kernel(regs, ve);
+		insn_len = virt_exception_kernel(regs, ve);
+	if (insn_len < 0)
+		return false;
 
 	/* After successful #VE handling, move the IP */
-	if (ret)
-		regs->ip += ve->instr_len;
+	regs->ip += insn_len;
 
-	return ret;
+	return true;
 }
 
 static bool tdx_tlb_flush_required(bool private)
-- 
2.35.1



* [PATCHv3 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
  2022-05-24 22:10 [PATCHv3 0/3] Fix for load_unaligned_zeropad() in TDX guest Kirill A. Shutemov
  2022-05-24 22:10 ` [PATCHv3 1/3] x86/tdx: Fix early #VE handling Kirill A. Shutemov
  2022-05-24 22:10 ` [PATCHv3 2/3] x86/tdx: Clarify RIP adjustments in #VE handler Kirill A. Shutemov
@ 2022-05-24 22:10 ` Kirill A. Shutemov
  2022-05-26 16:20   ` Dave Hansen
  2 siblings, 1 reply; 11+ messages in thread
From: Kirill A. Shutemov @ 2022-05-24 22:10 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, luto, peterz
  Cc: ak, dan.j.williams, david, hpa, linux-kernel,
	sathyanarayanan.kuppuswamy, seanjc, thomas.lendacky, x86,
	Kirill A. Shutemov

load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
The unwanted loads are typically harmless. But, they might be made to
totally unrelated or even unmapped memory. load_unaligned_zeropad()
relies on exception fixup (#PF, #GP and now #VE) to recover from these
unwanted loads.

In TDX guests, the second page can be a shared page and the VMM may
configure it to trigger #VE.

The kernel assumes that a #VE on a shared page is an MMIO access and
tries to decode the instruction to handle it. For
load_unaligned_zeropad() this results in confusion since the access is
not MMIO.

Fix it by detecting misaligned MMIO accesses (which include all
page-crossing accesses) and failing them. load_unaligned_zeropad() will
then recover via its exception fixups.

The issue was found by code analysis and was not triggered during
testing.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/coco/tdx/tdx.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 94e447e7f103..4e566ed67db8 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -331,6 +331,17 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
 			return -EINVAL;
 	}
 
+	/*
+	 * MMIO accesses suppose to be naturally aligned and therefore never
+	 * cross a page boundary. Seeing unaligned accesses indicates a bug or
+	 * load_unaligned_zeropad() that steps into unmapped shared page.
+	 *
+	 * In both cases fail the #VE handling. load_unaligned_zeropad() will
+	 * recover using exception fixups.
+	 */
+	if ((unsigned long)insn_get_addr_ref(&insn, regs) % size)
+		return -EFAULT;
+
 	/* Handle writes first */
 	switch (mmio) {
 	case MMIO_WRITE:
-- 
2.35.1



* Re: [PATCHv3 2/3] x86/tdx: Clarify RIP adjustments in #VE handler
  2022-05-24 22:10 ` [PATCHv3 2/3] x86/tdx: Clarify RIP adjustments in #VE handler Kirill A. Shutemov
@ 2022-05-25 16:02   ` Dave Hansen
  2022-05-26 20:13     ` Kirill A. Shutemov
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Hansen @ 2022-05-25 16:02 UTC (permalink / raw)
  To: Kirill A. Shutemov, tglx, mingo, bp, luto, peterz
  Cc: ak, dan.j.williams, david, hpa, linux-kernel,
	sathyanarayanan.kuppuswamy, seanjc, thomas.lendacky, x86

On 5/24/22 15:10, Kirill A. Shutemov wrote:
> +static int ve_instr_len(struct ve_info *ve)
> +{
> +	/*
> +	 * If the #VE happened due to instruction execution, GET_VEINFO
> +	 * provides info on the instruction.
> +	 *
> +	 * For #VE due to EPT violation, info provided by GET_VEINFO not usable
> +	 * and kernel has to decode instruction manually to find out its
> +	 * length. Catch such cases.
> +	 */
> +	if (WARN_ON_ONCE(ve->exit_reason == EXIT_REASON_EPT_VIOLATION))
> +		return 0;
> +
> +	return ve->instr_len;
> +}

I'm not super happy with how this comment ended up.  First, let's put
the comment next to the code to which it applies, like:

	/*
	 * ve->instr_len is not defined for EPT violations.  For those,
	 * the kernel must decode instructions manually and should not
	 * be using this function.
	 */
	if (WARN_ON_ONCE(ve->exit_reason == EXIT_REASON_EPT_VIOLATION))
		return 0;

	/*
	 * Assume that the #VE occurred due to instruction execution.
	 */
	return ve->instr_len;

Second, there also needs to be *something* here to link this back to the
TDX module spec and the VMCS documentation.  To make actual sense of
this, you need to tie together something like three or four pieces of
disparate information scattered across two massive documents.

Intel really made this quite the scavenger hunt.  Time to atone for the
sins of your colleagues by tying all of those things together in one place.


* Re: [PATCHv3 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
  2022-05-24 22:10 ` [PATCHv3 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page Kirill A. Shutemov
@ 2022-05-26 16:20   ` Dave Hansen
  2022-05-26 20:36     ` Kirill A. Shutemov
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Hansen @ 2022-05-26 16:20 UTC (permalink / raw)
  To: Kirill A. Shutemov, tglx, mingo, bp, luto, peterz
  Cc: ak, dan.j.williams, david, hpa, linux-kernel,
	sathyanarayanan.kuppuswamy, seanjc, thomas.lendacky, x86

On 5/24/22 15:10, Kirill A. Shutemov wrote:
> +	/*
> +	 * MMIO accesses suppose to be naturally aligned and therefore never
> +	 * cross a page boundary. Seeing unaligned accesses indicates a bug or
> +	 * load_unaligned_zeropad() that steps into unmapped shared page.

Wait a sec though...

We've been talking all along about how MMIO accesses are in some cases
just plain old compiler-generated memory accesses.  It's *probably* bad
code that does this, but it's not necessarily a bug.

It's kinda like the split lock detection patches.  Those definitely
found some stupid stuff, but it wasn't anything that I would have called
an outright bug.  Plus, in those cases, folks had explicitly opted in to
more crashes on stupid stuff.

That stupid stuff _might_ be rare enough that it's still OK to just punt
on it and not emulate the instruction (aka. crash).  Or, to say that TDX
guests are opting in to being more fragile, just like with split lock
detection.

But, either of those would call for a very different comment.


* Re: [PATCHv3 2/3] x86/tdx: Clarify RIP adjustments in #VE handler
  2022-05-25 16:02   ` Dave Hansen
@ 2022-05-26 20:13     ` Kirill A. Shutemov
  2022-05-26 20:18       ` Dave Hansen
  0 siblings, 1 reply; 11+ messages in thread
From: Kirill A. Shutemov @ 2022-05-26 20:13 UTC (permalink / raw)
  To: Dave Hansen
  Cc: tglx, mingo, bp, luto, peterz, ak, dan.j.williams, david, hpa,
	linux-kernel, sathyanarayanan.kuppuswamy, seanjc,
	thomas.lendacky, x86

On Wed, May 25, 2022 at 09:02:48AM -0700, Dave Hansen wrote:
> On 5/24/22 15:10, Kirill A. Shutemov wrote:
> > +static int ve_instr_len(struct ve_info *ve)
> > +{
> > +	/*
> > +	 * If the #VE happened due to instruction execution, GET_VEINFO
> > +	 * provides info on the instruction.
> > +	 *
> > +	 * For #VE due to EPT violation, info provided by GET_VEINFO not usable
> > +	 * and kernel has to decode instruction manually to find out its
> > +	 * length. Catch such cases.
> > +	 */
> > +	if (WARN_ON_ONCE(ve->exit_reason == EXIT_REASON_EPT_VIOLATION))
> > +		return 0;
> > +
> > +	return ve->instr_len;
> > +}
> 
> I'm not super happy with how this comment ended up.  First, let's put
> the comment next to the code to which it applies, like:
> 
> 	/*
> 	 * ve->instr_len is not defined for EPT violations.  For those,
> 	 * the kernel must decode instructions manually and should not
> 	 * be using this function.
> 	 */
> 	if (WARN_ON_ONCE(ve->exit_reason == EXIT_REASON_EPT_VIOLATION))
> 		return 0;
> 
> 	/*
> 	 * Assume that the #VE occurred due to instruction execution.
> 	 */
> 	return ve->instr_len;

Would it be helpful if the function had a whitelist of exit reasons where
using ve->instr_len is safe? WARN_ONCE() and return 0 otherwise.

-- 
 Kirill A. Shutemov


* Re: [PATCHv3 2/3] x86/tdx: Clarify RIP adjustments in #VE handler
  2022-05-26 20:13     ` Kirill A. Shutemov
@ 2022-05-26 20:18       ` Dave Hansen
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2022-05-26 20:18 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: tglx, mingo, bp, luto, peterz, ak, dan.j.williams, david, hpa,
	linux-kernel, sathyanarayanan.kuppuswamy, seanjc,
	thomas.lendacky, x86

On 5/26/22 13:13, Kirill A. Shutemov wrote:
>> I'm not super happy with how this comment ended up.  First, let's put
>> the comment next to the code to which it applies, like:
>>
>> 	/*
>> 	 * ve->instr_len is not defined for EPT violations.  For those,
>> 	 * the kernel must decode instructions manually and should not
>> 	 * be using this function.
>> 	 */
>> 	if (WARN_ON_ONCE(ve->exit_reason == EXIT_REASON_EPT_VIOLATION))
>> 		return 0;
>>
>> 	/*
>> 	 * Assume that the #VE occurred due to instruction execution.
>> 	 */
>> 	return ve->instr_len;
> Would it be helpful if the function had a whitelist of exit reasons where
> using ve->instr_len is safe? WARN_ONCE() and return 0 otherwise.

Maybe.  I was hoping to avoid mostly duplicating the
virt_exception_kernel() switch().



* Re: [PATCHv3 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
  2022-05-26 16:20   ` Dave Hansen
@ 2022-05-26 20:36     ` Kirill A. Shutemov
  2022-05-26 20:39       ` Dave Hansen
  0 siblings, 1 reply; 11+ messages in thread
From: Kirill A. Shutemov @ 2022-05-26 20:36 UTC (permalink / raw)
  To: Dave Hansen
  Cc: tglx, mingo, bp, luto, peterz, ak, dan.j.williams, david, hpa,
	linux-kernel, sathyanarayanan.kuppuswamy, seanjc,
	thomas.lendacky, x86

On Thu, May 26, 2022 at 09:20:56AM -0700, Dave Hansen wrote:
> On 5/24/22 15:10, Kirill A. Shutemov wrote:
> > +	/*
> > +	 * MMIO accesses suppose to be naturally aligned and therefore never
> > +	 * cross a page boundary. Seeing unaligned accesses indicates a bug or
> > +	 * load_unaligned_zeropad() that steps into unmapped shared page.
> 
> Wait a sec though...
> 
> We've been talking all along about how MMIO accesses are in some cases
> just plain old compiler-generated memory accesses.  It's *probably* bad
> code that does this, but it's not necessarily a bug.

Compiler-generated memory accesses tend to be aligned too. You need to do
something to make them unaligned, like __packed or pointer trickery.

> It's kinda like the split lock detection patches.  Those definitely
> found some stupid stuff, but it wasn't anything that I would have called
> an outright bug.  Plus, in those cases, folks had explicitly opted in to
> more crashes on stupid stuff.
> 
> That stupid stuff _might_ be rare enough that it's still OK to just punt
> on it and not emulate the instruction (aka. crash).  Or, to say that TDX
> guests are opting in to being more fragile, just like with split lock
> detection.

I think it is reasonable to expect that TDX users value their data
security higher than uptime.

And I'm not sure that comparing unaligned MMIO access to split-lock is
fair. Split-lock is a performance hit, but the semantics are defined. In
the unaligned MMIO case, I think the behaviour is not defined: it is not
clear what memory request should be issued on the memory bus for a
byte-aligned 4-byte access. It can make a difference on the device side.

> But, either of those would call for a very different comment.

Fair enough.

-- 
 Kirill A. Shutemov


* Re: [PATCHv3 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
  2022-05-26 20:36     ` Kirill A. Shutemov
@ 2022-05-26 20:39       ` Dave Hansen
  2022-05-26 21:00         ` Sean Christopherson
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Hansen @ 2022-05-26 20:39 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: tglx, mingo, bp, luto, peterz, ak, dan.j.williams, david, hpa,
	linux-kernel, sathyanarayanan.kuppuswamy, seanjc,
	thomas.lendacky, x86

On 5/26/22 13:36, Kirill A. Shutemov wrote:
> On Thu, May 26, 2022 at 09:20:56AM -0700, Dave Hansen wrote:
>> On 5/24/22 15:10, Kirill A. Shutemov wrote:
>>> +	/*
>>> +	 * MMIO accesses suppose to be naturally aligned and therefore never
>>> +	 * cross a page boundary. Seeing unaligned accesses indicates a bug or
>>> +	 * load_unaligned_zeropad() that steps into unmapped shared page.
>> Wait a sec though...
>>
>> We've been talking all along about how MMIO accesses are in some cases
>> just plain old compiler-generated memory accesses.  It's *probably* bad
>> code that does this, but it's not necessarily a bug.
> Compiler-generated memory accesses tend to be aligned too. You need to do
> something make them unalinged, like __packed or pointer trickery.

I totally agree.  But, the point is that __packed or pointer trickery is
goofy, but it's not necessarily a bug.  This might crash the kernel on
goofy stuff, not bugs.


* Re: [PATCHv3 3/3] x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
  2022-05-26 20:39       ` Dave Hansen
@ 2022-05-26 21:00         ` Sean Christopherson
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2022-05-26 21:00 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, tglx, mingo, bp, luto, peterz, ak,
	dan.j.williams, david, hpa, linux-kernel,
	sathyanarayanan.kuppuswamy, thomas.lendacky, x86

On Thu, May 26, 2022, Dave Hansen wrote:
> On 5/26/22 13:36, Kirill A. Shutemov wrote:
> > On Thu, May 26, 2022 at 09:20:56AM -0700, Dave Hansen wrote:
> >> On 5/24/22 15:10, Kirill A. Shutemov wrote:
> >>> +	/*
> >>> +	 * MMIO accesses suppose to be naturally aligned and therefore never
> >>> +	 * cross a page boundary. Seeing unaligned accesses indicates a bug or
> >>> +	 * load_unaligned_zeropad() that steps into unmapped shared page.
> >> Wait a sec though...
> >>
> >> We've been talking all along about how MMIO accesses are in some cases
> >> just plain old compiler-generated memory accesses.  It's *probably* bad
> >> code that does this, but it's not necessarily a bug.
> > Compiler-generated memory accesses tend to be aligned too. You need to do
> > something make them unalinged, like __packed or pointer trickery.
> 
> I totally agree.  But, the point is that __packed or pointer trickery is
> goofy, but it's not necessarily a bug.  This might crash the kernel on
> goofy stuff, not bugs.

Yeah, I don't think it's worth exploding on unaligned accesses, it's specifically
page splits that are a mess and are an absolutely nightmare to handle.  E.g. for
VirtIO kicks, the data and page offset are completely ignored/irrelevant, so a
multi-byte write to any random byte in the page should work, even though it's all
kinds of goofy.

