From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org
Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com,
	ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com,
	hpa@zytor.com, jgross@suse.com, jmattson@google.com,
	joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org,
	pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com,
	tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Subject: [PATCH 26/26] Documentation/x86: Document TDX kernel architecture
Date: Tue, 14 Dec 2021 18:03:04 +0300
Message-ID: <20211214150304.62613-27-kirill.shutemov@linux.intel.com>
In-Reply-To: <20211214150304.62613-1-kirill.shutemov@linux.intel.com>

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Document the TDX guest architecture details like #VE support,
shared memory, etc.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/index.rst |   1 +
 Documentation/x86/tdx.rst   | 194 ++++++++++++++++++++++++++++++++++++
 2 files changed, 195 insertions(+)
 create mode 100644 Documentation/x86/tdx.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index f498f1d36cd3..382e53ca850a 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -24,6 +24,7 @@ x86-specific Documentation
    intel-iommu
    intel_txt
    amd-memory-encryption
+   tdx
    pti
    mds
    microcode
diff --git a/Documentation/x86/tdx.rst b/Documentation/x86/tdx.rst
new file mode 100644
index 000000000000..8c9cf1a5bfb8
--- /dev/null
+++ b/Documentation/x86/tdx.rst
@@ -0,0 +1,194 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================
+Intel Trust Domain Extensions (TDX)
+=====================================
+
+Intel's Trust Domain Extensions (TDX) protect confidential guest VMs
+from the host and from physical attacks by isolating the guest register
+state and by encrypting the guest memory. In TDX, a special module,
+the TDX module, sits between the host and the guest. It runs in a
+special mode and manages the guest/host separation.
+
+Since the host cannot directly access guest registers or memory, much
+normal functionality of a hypervisor (such as trapping MMIO, some MSRs,
+some CPUID leaves, and some other instructions) has to be moved into
+the guest. This is implemented using a Virtualization Exception (#VE)
+that is handled by the guest kernel. Some #VEs are handled entirely
+inside the guest kernel, but some require the hypervisor (VMM) to be
+involved. The TD hypercall mechanism allows TD guests to call TDX
+module or hypervisor functions.
+
+#VE Exceptions:
+===============
+
+In TDX guests, #VE exceptions are delivered in the following
+scenarios:
+
+* Execution of certain instructions (see list below)
+* Certain MSR accesses
+* CPUID usage (only for certain leaves)
+* Shared memory access (including MMIO)
+
+#VE due to instruction execution
+---------------------------------
+
+Intel TDX disallows execution of certain instructions in non-root
+mode. Executing these instructions leads to a #VE or a #GP.
+
+The details are as follows.
+
+Instructions that can cause a #VE:
+
+* String I/O (INS, OUTS), IN, OUT
+* HLT
+* MONITOR, MWAIT
+* WBINVD, INVD
+* VMCALL
+
+Instructions that can cause a #GP:
+
+* All VMX instructions: INVEPT, INVVPID, VMCLEAR, VMFUNC, VMLAUNCH,
+  VMPTRLD, VMPTRST, VMREAD, VMRESUME, VMWRITE, VMXOFF, VMXON
+* ENCLS, ENCLV
+* GETSEC
+* RSM
+* ENQCMD
+
+#VE due to MSR access
+----------------------
+
+In a TDX guest, MSR access behavior falls into three categories:
+
+* Natively supported (also called "context switched MSR")
+  No special handling is required for these MSRs in TDX guests.
+* #GP triggered
+  A disallowed MSR read/write leads to a #GP.
+* #VE triggered
+  All MSRs that are neither natively supported nor disallowed
+  (disallowed MSRs trigger #GP) trigger a #VE. To support access to
+  these MSRs, the access must be emulated using TDCALL.
+
+See the Intel TDX Module Specification, section "MSR Virtualization", for the
+complete list of MSRs that fall under the categories above.
+
+#VE due to CPUID instruction
+----------------------------
+
+In TDX guests, most CPUID leaf/sub-leaf combinations are virtualized by
+the TDX module while some trigger a #VE. The CPUID leaf/sub-leaf
+combinations that trigger a #VE are configured by the VMM during TD
+initialization (using TDH.MNG.INIT).
+
+#VE on Memory Accesses
+----------------------
+
+A TD guest is in control of whether its memory accesses are treated as
+private or shared.  It selects the behavior with a bit in its page table
+entries.
+
+#VE on Shared Pages
+-------------------
+
+Access to shared mappings can cause a #VE. The hypervisor controls whether
+access to a shared mapping causes a #VE, so the guest must only reference
+shared pages where it can safely handle a #VE and avoid nested #VEs.
+
+The content of shared mappings is not trusted since shared memory is writable
+by the hypervisor. Shared mappings are never used for sensitive memory content
+like stacks or kernel text, only for I/O buffers and MMIO regions. The kernel
+will not encounter shared mappings in sensitive contexts like syscall entry
+or NMIs.
+
+#VE on Private Pages
+--------------------
+
+Some accesses to private mappings may cause #VEs.  Before a mapping is
+accepted (i.e. while in the SEPT_PENDING state), a reference causes a
+#VE. After acceptance, references typically succeed.
+
+The hypervisor can cause a private page reference to fail if it chooses
+to move an accepted page to a "blocked" state.  However, if it does
+this, page access will not generate a #VE.  It will, instead, cause a
+"TD Exit" where the hypervisor is required to handle the exception.
+
+Linux #VE handler
+-----------------
+
+Both user and kernel #VEs are handled by the tdx_handle_virt_exception()
+handler. If the #VE is handled successfully, the instruction pointer is
+advanced past the faulting instruction. If handling fails, the #VE is
+treated as a regular exception and handled via fixup handlers.
+
+In TD guests, the #VE nesting problem (a #VE triggered before the current
+one is handled, AKA the syscall gap issue) is handled by the TDX module
+ensuring that interrupts, including NMIs, are blocked. The hardware blocks
+interrupts starting with #VE delivery until TDGETVEINFO is called.
+
+The kernel must avoid triggering #VE in entry paths: do not touch TD-shared
+memory, including MMIO regions, and do not use MSRs, instructions, or
+CPUID leaves that might generate a #VE.
+
+MMIO handling:
+==============
+
+In non-TDX VMs, MMIO is usually implemented by giving a guest access to a
+mapping which will cause a VMEXIT on access, and then the VMM emulates the
+access. That's not possible in TDX guests because VMEXIT will expose the
+register state to the host. TDX guests don't trust the host and can't have
+their state exposed to the host.
+
+In TDX the MMIO regions are instead configured to trigger a #VE
+exception in the guest. The guest #VE handler then emulates the MMIO
+instructions inside the guest and converts them into a controlled TDCALL
+to the host, rather than completely exposing the state to the host.
+
+MMIO addresses on x86 are just special physical addresses. They can be
+accessed with any instruction that accesses memory. However, the
+instruction decoding in the #VE handler is limited. It is only designed
+to decode instructions like those generated by io.h macros.
+
+MMIO access via other means (like structure overlays) may result in
+MMIO_DECODE_FAILED and an oops.
+
+Shared memory:
+==============
+
+Intel TDX doesn't allow the VMM to access guest private memory. Any
+memory that is required for communication with the VMM must be shared
+explicitly by setting the shared bit in the page table entry. The
+shared bit can be enumerated with TDX_GET_INFO.
+
+After setting the shared bit, the conversion must be completed with the
+MapGPA hypercall. The call informs the VMM about the conversion between
+private/shared mappings.
+
+set_memory_decrypted() converts a range of pages to shared.
+set_memory_encrypted() converts memory back to private.
+
+Device drivers are the primary users of shared memory, but there's no
+need to touch every driver. DMA buffers and ioremap()'ed regions are
+converted to shared automatically.
+
+TDX uses SWIOTLB for most DMA allocations. The SWIOTLB buffer is
+converted to shared on boot.
+
+For coherent DMA allocations, the DMA buffer gets converted at
+allocation time. Check force_dma_unencrypted() for details.
+
+References
+==========
+
+More details about the TDX module (and its response to MSR, memory access,
+I/O, CPUID, etc.) can be found at,
+
+https://www.intel.com/content/dam/develop/external/us/en/documents/tdx-module-1.0-public-spec-v0.931.pdf
+
+More details about TDX hypercall and TDX module call ABI can be found
+at,
+
+https://www.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface-1.0-344426-002.pdf
+
+More details about TDVF requirements can be found at,
+
+https://www.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.01.pdf
-- 
2.32.0


