From: "tip-bot2 for Kirill A. Shutemov" <tip-bot2@linutronix.de>
To: linux-tip-commits@vger.kernel.org
Cc: Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>, Tony Luck <tony.luck@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	x86@kernel.org, linux-kernel@vger.kernel.org
Subject: [tip: x86/tdx] x86/tdx: Handle in-kernel MMIO
Date: Sat, 09 Apr 2022 01:27:34 -0000
Message-ID: <164946765464.4207.3715751176055921036.tip-bot2@tip-bot2>
In-Reply-To: <20220405232939.73860-12-kirill.shutemov@linux.intel.com>

The following commit has been merged into the x86/tdx branch of tip:

Commit-ID:     31d58c4e557d46fa7f8557714250fb6f89c941ae
Gitweb:        https://git.kernel.org/tip/31d58c4e557d46fa7f8557714250fb6f89c941ae
Author:        Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
AuthorDate:    Wed, 06 Apr 2022 02:29:20 +03:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Thu, 07 Apr 2022 08:27:51 -07:00

x86/tdx: Handle in-kernel MMIO

In non-TDX VMs, MMIO is implemented by providing the guest a mapping
that causes a VMEXIT on access; the VMM then emulates the instruction
that caused the VMEXIT. That is not possible for a TDX VM.

To emulate an instruction, an emulator needs two things:

  - R/W access to the register file to read/modify instruction arguments
    and see the RIP of the faulting instruction.

  - Read access to the memory where the instruction resides to see what
    to emulate. In this case it is guest kernel text.

Neither is available to the VMM in a TDX environment:

  - The register file is never exposed to the VMM. When a TD exits to
    the TDX module, the module saves registers into the state-save area
    allocated for that TD. It then scrubs these registers before
    returning execution control to the VMM, to help prevent leakage of
    TD state.

  - TDX does not allow guests to execute from shared memory. All executed
    instructions are in TD-private memory. Because that memory is private
    to the TD, the VMM has no way to access it and therefore cannot read
    the instruction to decode and emulate it.

In TDX, MMIO regions are instead configured by the VMM to trigger a #VE
exception in the guest.

Add #VE handling that emulates the MMIO instruction inside the guest and
converts it into a controlled hypercall to the host.
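
As a rough illustration of that flow, here is a userspace sketch (not
the kernel code). The mock_* names, the device address and the
single-register "host" are made up for the example; only the direction
values and the size/direction/address/value argument order follow the
patch below. It assumes a little-endian machine, as x86 is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Direction values mirror EPT_READ/EPT_WRITE from the patch. */
enum { MOCK_EPT_READ = 0, MOCK_EPT_WRITE = 1 };

/* Hypothetical stand-in for the hypercall argument block. */
struct mock_hcall_args {
	uint64_t size;      /* r12 in the patch: access width in bytes */
	uint64_t direction; /* r13: read or write */
	uint64_t addr;      /* r14: guest physical address */
	uint64_t val;       /* r15: data to write; holds the result of a read */
};

/* Mock "host": a single 8-byte device register at a made-up address. */
#define MOCK_DEV_BASE 0xfed00000ULL
static uint8_t mock_dev_reg[8];

/* Returns 0 on success and nonzero on failure. */
static int mock_hypercall(struct mock_hcall_args *a)
{
	uint64_t off;

	if (a->size == 0 || a->addr < MOCK_DEV_BASE)
		return -1;
	off = a->addr - MOCK_DEV_BASE;
	if (off >= sizeof(mock_dev_reg) || a->size > sizeof(mock_dev_reg) - off)
		return -1;

	if (a->direction == MOCK_EPT_WRITE) {
		/* Little-endian: the low 'size' bytes of val come first. */
		memcpy(mock_dev_reg + off, &a->val, a->size);
	} else {
		a->val = 0;
		memcpy(&a->val, mock_dev_reg + off, a->size);
	}
	return 0;
}

/* The guest asks the host to perform the write on its behalf. */
static bool mock_mmio_write(int size, uint64_t addr, uint64_t val)
{
	struct mock_hcall_args args = {
		.size = (uint64_t)size,
		.direction = MOCK_EPT_WRITE,
		.addr = addr,
		.val = val,
	};

	return !mock_hypercall(&args);
}

/* The guest asks the host for the value and gets it back as output. */
static bool mock_mmio_read(int size, uint64_t addr, uint64_t *val)
{
	struct mock_hcall_args args = {
		.size = (uint64_t)size,
		.direction = MOCK_EPT_READ,
		.addr = addr,
	};

	assert(val != NULL);
	if (mock_hypercall(&args))
		return false;
	*val = args.val;
	return true;
}
```

The point of the sketch is that the host only ever sees the access size,
direction, address and data. It never needs the guest's registers or
instruction bytes, which is what makes the scheme workable under TDX.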

This approach is bad for performance. But, it has (virtually) no impact
on the size of the kernel image and will work for a wide variety of
drivers. This allows TDX deployments to use arbitrary devices and device
drivers, including virtio. TDX customers have asked for the capability
to use random devices in their deployments.

In other words, even if all of the work were done to paravirtualize all
x86 MMIO users and virtio, this approach would still be needed. There
is essentially no way to get rid of this code.

This approach is functional for all in-kernel MMIO users current and
future and does so with a minimal amount of code and kernel image bloat.

Any CPU instruction that accesses memory can also be used to access
MMIO.  However, by convention, MMIO accesses are typically performed via
io.h helpers such as 'readl()' or 'writeq()'.  This patch addresses only
MMIO accesses done via those helpers.

The io.h helpers intentionally use a limited set of instructions when
accessing MMIO.  This known, limited set of instructions makes MMIO
instruction decoding and emulation feasible in KVM hosts and SEV guests
today.

MMIO accesses performed without the io.h helpers are at the mercy of the
compiler.  Compilers can and will generate a much broader set of
instructions which cannot practically be decoded and emulated.  TDX
guests will oops if they encounter one of these decoding failures.

This means that TDX guests *must* use the io.h helpers to access MMIO.

This requirement is not new.  Both KVM hosts and AMD SEV guests have the
same limitations on MMIO access.

=== Potential alternative approaches ===

== Paravirtualizing all MMIO ==

An alternative to letting MMIO induce a #VE exception is to avoid
the #VE in the first place. Similar to the port I/O case, it is
theoretically possible to paravirtualize MMIO accesses.

Like the exception-based approach offered here, a fully paravirtualized
approach would be limited to MMIO users that leverage common
infrastructure like the io.h macros.

However, any paravirtual approach would need to patch approximately
120k call sites, replacing a bare memory access instruction with (at
least) a function call. With a conservative overhead estimate of 5
bytes per call site (one CALL instruction), that bloats the code by
about 600k bytes.
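
The estimate above is plain multiplication. A small sketch, with a
hypothetical helper name, assuming the 5 bytes are a near CALL with a
32-bit relative displacement (1 opcode byte plus 4 displacement bytes):

```c
#include <assert.h>
#include <stddef.h>

/* Near CALL rel32: 1 opcode byte + 4 displacement bytes. */
#define CALL_INSN_BYTES 5

/* Hypothetical helper: extra code bytes if every call site grows by a CALL. */
static size_t paravirt_bloat_bytes(size_t call_sites, size_t bytes_per_site)
{
	/* Guard against overflow in the multiplication. */
	assert(bytes_per_site == 0 || call_sites <= (size_t)-1 / bytes_per_site);
	return call_sites * bytes_per_site;
}
```

With 120000 call sites and CALL_INSN_BYTES per site this gives 600000
bytes, matching the figure quoted above.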

Many drivers will never be used in the TDX environment and the bloat
cannot be justified.

== Patching TDX drivers ==

Rather than touching the entire kernel, it might also be possible to
just go after drivers that use MMIO in TDX guests *and* are
performance-critical enough to justify the effort. Right now, that's
limited only to virtio.

All virtio MMIO appears to be done through a single function, which
makes virtio eminently easy to patch.

This approach will be adopted in the future, removing the bulk of
MMIO #VEs. #VE-based MMIO handling will remain to serve non-virtio use
cases.

Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20220405232939.73860-12-kirill.shutemov@linux.intel.com
---
 arch/x86/coco/tdx/tdx.c | 121 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 121 insertions(+)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 50c3b97..ab10bc7 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -8,11 +8,17 @@
 #include <asm/coco.h>
 #include <asm/tdx.h>
 #include <asm/vmx.h>
+#include <asm/insn.h>
+#include <asm/insn-eval.h>
 
 /* TDX module Call Leaf IDs */
 #define TDX_GET_INFO			1
 #define TDX_GET_VEINFO			3
 
+/* MMIO direction */
+#define EPT_READ	0
+#define EPT_WRITE	1
+
 /*
  * Wrapper for standard use of __tdx_hypercall with no output aside from
  * return code.
@@ -222,6 +228,119 @@ static bool handle_cpuid(struct pt_regs *regs)
 	return true;
 }
 
+static bool mmio_read(int size, unsigned long addr, unsigned long *val)
+{
+	struct tdx_hypercall_args args = {
+		.r10 = TDX_HYPERCALL_STANDARD,
+		.r11 = hcall_func(EXIT_REASON_EPT_VIOLATION),
+		.r12 = size,
+		.r13 = EPT_READ,
+		.r14 = addr,
+		.r15 = *val,
+	};
+
+	if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT))
+		return false;
+	*val = args.r11;
+	return true;
+}
+
+static bool mmio_write(int size, unsigned long addr, unsigned long val)
+{
+	return !_tdx_hypercall(hcall_func(EXIT_REASON_EPT_VIOLATION), size,
+			       EPT_WRITE, addr, val);
+}
+
+static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
+{
+	char buffer[MAX_INSN_SIZE];
+	unsigned long *reg, val;
+	struct insn insn = {};
+	enum mmio_type mmio;
+	int size, extend_size;
+	u8 extend_val = 0;
+
+	/* Only in-kernel MMIO is supported */
+	if (WARN_ON_ONCE(user_mode(regs)))
+		return false;
+
+	if (copy_from_kernel_nofault(buffer, (void *)regs->ip, MAX_INSN_SIZE))
+		return false;
+
+	if (insn_decode(&insn, buffer, MAX_INSN_SIZE, INSN_MODE_64))
+		return false;
+
+	mmio = insn_decode_mmio(&insn, &size);
+	if (WARN_ON_ONCE(mmio == MMIO_DECODE_FAILED))
+		return false;
+
+	if (mmio != MMIO_WRITE_IMM && mmio != MMIO_MOVS) {
+		reg = insn_get_modrm_reg_ptr(&insn, regs);
+		if (!reg)
+			return false;
+	}
+
+	ve->instr_len = insn.length;
+
+	/* Handle writes first */
+	switch (mmio) {
+	case MMIO_WRITE:
+		memcpy(&val, reg, size);
+		return mmio_write(size, ve->gpa, val);
+	case MMIO_WRITE_IMM:
+		val = insn.immediate.value;
+		return mmio_write(size, ve->gpa, val);
+	case MMIO_READ:
+	case MMIO_READ_ZERO_EXTEND:
+	case MMIO_READ_SIGN_EXTEND:
+		/* Reads are handled below */
+		break;
+	case MMIO_MOVS:
+	case MMIO_DECODE_FAILED:
+		/*
+		 * MMIO was accessed with an instruction that could not be
+		 * decoded or handled properly. It was likely not using io.h
+		 * helpers or accessed MMIO accidentally.
+		 */
+		return false;
+	default:
+		WARN_ONCE(1, "Unknown insn_decode_mmio() decode value?");
+		return false;
+	}
+
+	/* Handle reads */
+	if (!mmio_read(size, ve->gpa, &val))
+		return false;
+
+	switch (mmio) {
+	case MMIO_READ:
+		/* Zero-extend for 32-bit operation */
+		extend_size = size == 4 ? sizeof(*reg) : 0;
+		break;
+	case MMIO_READ_ZERO_EXTEND:
+		/* Zero extend based on operand size */
+		extend_size = insn.opnd_bytes;
+		break;
+	case MMIO_READ_SIGN_EXTEND:
+		/* Sign extend based on operand size */
+		extend_size = insn.opnd_bytes;
+		if (size == 1 && val & BIT(7))
+			extend_val = 0xFF;
+		else if (size > 1 && val & BIT(15))
+			extend_val = 0xFF;
+		break;
+	default:
+		/* All other cases have to be covered by the first switch() */
+		WARN_ON_ONCE(1);
+		return false;
+	}
+
+	if (extend_size)
+		memset(reg, extend_val, extend_size);
+	memcpy(reg, &val, size);
+	return true;
+}
+
 void tdx_get_ve_info(struct ve_info *ve)
 {
 	struct tdx_module_output out;
@@ -276,6 +395,8 @@ static bool virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve)
 		return write_msr(regs);
 	case EXIT_REASON_CPUID:
 		return handle_cpuid(regs);
+	case EXIT_REASON_EPT_VIOLATION:
+		return handle_mmio(regs, ve);
 	default:
 		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
 		return false;
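
The register write-back at the end of handle_mmio() (memset() with the
extension byte, then memcpy() of the value) can be tried out in
userspace. The sketch below is only an illustration: the function name
is made up, the register is modelled as a plain uint64_t, and it assumes
a little-endian machine so that byte 0 is the least significant byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Model of the write-back step: optionally fill the low 'extend_size'
 * bytes of the register with 'extend_val' (0x00 or 0xFF), then copy the
 * low 'size' bytes of 'val' over them. Returns the new register value.
 */
static uint64_t mock_reg_writeback(uint64_t reg, uint64_t val, int size,
				   int extend_size, uint8_t extend_val)
{
	assert(size >= 1 && size <= 8);
	assert(extend_size >= 0 && extend_size <= 8);

	if (extend_size)
		memset(&reg, extend_val, (size_t)extend_size);
	memcpy(&reg, &val, (size_t)size);
	return reg;
}
```

For example, a 1-byte read of 0x80 with extend_size 8 and extend_val
0xFF yields 0xFFFFFFFFFFFFFF80, while a 2-byte read with extend_size 0
leaves the upper six bytes of the register untouched.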

