linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
To: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org
Cc: sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com,
	ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com,
	hpa@zytor.com, jgross@suse.com, jmattson@google.com,
	joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org,
	pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com,
	tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Subject: [PATCHv2 29/29] Documentation/x86: Document TDX kernel architecture
Date: Mon, 24 Jan 2022 18:02:15 +0300	[thread overview]
Message-ID: <20220124150215.36893-30-kirill.shutemov@linux.intel.com> (raw)
In-Reply-To: <20220124150215.36893-1-kirill.shutemov@linux.intel.com>

From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Document the TDX guest architecture details like #VE support,
shared memory, etc.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/x86/index.rst |   1 +
 Documentation/x86/tdx.rst   | 194 ++++++++++++++++++++++++++++++++++++
 2 files changed, 195 insertions(+)
 create mode 100644 Documentation/x86/tdx.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index f498f1d36cd3..382e53ca850a 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -24,6 +24,7 @@ x86-specific Documentation
    intel-iommu
    intel_txt
    amd-memory-encryption
+   tdx
    pti
    mds
    microcode
diff --git a/Documentation/x86/tdx.rst b/Documentation/x86/tdx.rst
new file mode 100644
index 000000000000..903c9cecccbd
--- /dev/null
+++ b/Documentation/x86/tdx.rst
@@ -0,0 +1,194 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================
+Intel Trust Domain Extensions (TDX)
+=====================================
+
+Intel's Trust Domain Extensions (TDX) protect confidential guest VMs
+from the host and physical attacks by isolating the guest register
+state and by encrypting the guest memory. In TDX, a special TDX module
+sits between the host and the guest, and runs in a special mode and
+manages the guest/host separation.
+
+Since the host cannot directly access guest registers or memory, much
+normal functionality of a hypervisor (such as trapping MMIO, some MSRs,
+some CPUIDs, and some other instructions) has to be moved into the
+guest. This is implemented using a Virtualization Exception (#VE) that
+is handled by the guest kernel. Some #VEs are handled inside the guest
+kernel, but some require the hypervisor (VMM) to be involved. The TD
+hypercall mechanism allows TD guests to call TDX module or hypervisor
+function.
+
+#VE Exceptions:
+===============
+
+In TDX guests, #VE Exceptions are delivered to TDX guests in following
+scenarios:
+
+* Execution of certain instructions (see list below)
+* Certain MSR accesses.
+* CPUID usage (only for certain leaves)
+* Shared memory access (including MMIO)
+
+#VE due to instruction execution
+---------------------------------
+
+Intel TDX dis-allows execution of certain instructions in non-root
+mode. Execution of these instructions would lead to #VE or #GP.
+
+Details are,
+
+List of instructions that can cause a #VE is,
+
+* String I/O (INS, OUTS), IN, OUT
+* HLT
+* MONITOR, MWAIT
+* WBINVD, INVD
+* VMCALL
+
+List of instructions that can cause a #GP is,
+
+* All VMX instructions: INVEPT, INVVPID, VMCLEAR, VMFUNC, VMLAUNCH,
+  VMPTRLD, VMPTRST, VMREAD, VMRESUME, VMWRITE, VMXOFF, VMXON
+* ENCLS, ENCLV
+* GETSEC
+* RSM
+* ENQCMD
+
+#VE due to MSR access
+----------------------
+
+In TDX guest, MSR access behavior can be categorized as,
+
+* Native supported (also called "context switched MSR")
+  No special handling is required for these MSRs in TDX guests.
+* #GP triggered
+  Dis-allowed MSR read/write would lead to #GP.
+* #VE triggered
+  All MSRs that are not natively supported or dis-allowed
+  (triggers #GP) will trigger #VE. To support access to
+  these MSRs, it needs to be emulated using TDCALL.
+
+Look Intel TDX Module Specification, sec "MSR Virtualization" for the complete
+list of MSRs that fall under the categories above.
+
+#VE due to CPUID instruction
+----------------------------
+
+In TDX guests, most of CPUID leaf/sub-leaf combinations are virtualized by
+the TDX module while some trigger #VE. Combinations of CPUID leaf/sub-leaf
+which triggers #VE are configured by the VMM during the TD initialization
+time (using TDH.MNG.INIT).
+
+#VE on Memory Accesses
+----------------------
+
+A TD guest is in control of whether its memory accesses are treated as
+private or shared.  It selects the behavior with a bit in its page table
+entries.
+
+#VE on Shared Pages
+-------------------
+
+Access to shared mappings can cause a #VE. The hypervisor controls whether
+access of shared mapping causes a #VE, so the guest must be careful to only
+reference shared pages it can safely handle a #VE, avoid nested #VEs.
+
+Content of shared mapping is not trusted since shared memory is writable
+by the hypervisor. Shared mappings are never used for sensitive memory content
+like stacks or kernel text, only for I/O buffers and MMIO regions. The kernel
+will not encounter shared mappings in sensitive contexts like syscall entry
+or NMIs.
+
+#VE on Private Pages
+--------------------
+
+Some accesses to private mappings may cause #VEs.  Before a mapping is
+accepted (AKA in the SEPT_PENDING state), a reference would cause a #VE.
+But, after acceptance, references typically succeed.
+
+The hypervisor can cause a private page reference to fail if it chooses
+to move an accepted page to a "blocked" state.  However, if it does
+this, page access will not generate a #VE.  It will, instead, cause a
+"TD Exit" where the hypervisor is required to handle the exception.
+
+Linux #VE handler
+-----------------
+
+Both user/kernel #VE exceptions are handled by the tdx_handle_virt_exception()
+handler. If successfully handled, the instruction pointer is incremented to
+complete the handling process. If failed to handle, it is treated as a regular
+exception and handled via fixup handlers.
+
+In TD guests, #VE nesting (a #VE triggered before handling the current one
+or AKA syscall gap issue) problem is handled by TDX module ensuring that
+interrupts, including NMIs, are blocked. The hardware blocks interrupts
+starting with #VE delivery until TDGETVEINFO is called.
+
+The kernel must avoid triggering #VE in entry paths: do not touch TD-shared
+memory, including MMIO regions, and do not use #VE triggering MSRs,
+instructions, or CPUID leaves that might generate #VE.
+
+MMIO handling:
+==============
+
+In non-TDX VMs, MMIO is usually implemented by giving a guest access to a
+mapping which will cause a VMEXIT on access, and then the VMM emulates the
+access. That's not possible in TDX guests because VMEXIT will expose the
+register state to the host. TDX guests don't trust the host and can't have
+their state exposed to the host.
+
+In TDX the MMIO regions are instead configured to trigger a #VE
+exception in the guest. The guest #VE handler then emulates the MMIO
+instructions inside the guest and converts them into a controlled TDCALL
+to the host, rather than completely exposing the state to the host.
+
+MMIO addresses on x86 are just special physical addresses. They can be
+accessed with any instruction that accesses memory. However, the
+introduced instruction decoding method is limited. It is only designed
+to decode instructions like those generated by io.h macros.
+
+MMIO access via other means (like structure overlays) may result in
+MMIO_DECODE_FAILED and an oops.
+
+Shared memory:
+==============
+
+Intel TDX doesn't allow the VMM to access guest private memory. Any
+memory that is required for communication with VMM must be shared
+explicitly by setting the bit in the page table entry. The shared bit
+can be enumerated with TDX_GET_INFO.
+
+After setting the shared bit, the conversion must be completed with
+MapGPA hypercall. The call informs the VMM about the conversion between
+private/shared mappings.
+
+set_memory_decrypted() converts a range of pages to shared.
+set_memory_encrypted() converts memory back to private.
+
+Device drivers are the primary user of shared memory, but there's no
+need in touching every driver. DMA buffers and ioremap()'ed regions are
+converted to shared automatically.
+
+TDX uses SWIOTLB for most DMA allocations. The SWIOTLB buffer is
+converted to shared on boot.
+
+For coherent DMA allocation, the DMA buffer gets converted on the
+allocation. Check force_dma_unencrypted() for details.
+
+References
+==========
+
+More details about TDX module (and its response for MSR, memory access,
+IO, CPUID etc) can be found at,
+
+https://www.intel.com/content/dam/develop/external/us/en/documents/tdx-module-1.0-public-spec-v0.931.pdf
+
+More details about TDX hypercall and TDX module call ABI can be found
+at,
+
+https://www.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface-1.0-344426-002.pdf
+
+More details about TDVF requirements can be found at,
+
+https://www.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.01.pdf
-- 
2.34.1


  parent reply	other threads:[~2022-01-24 15:03 UTC|newest]

Thread overview: 154+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-24 15:01 [PATCHv2 00/29] TDX Guest: TDX core support Kirill A. Shutemov
2022-01-24 15:01 ` [PATCHv2 01/29] x86/tdx: Detect running as a TDX guest in early boot Kirill A. Shutemov
2022-02-01 19:29   ` Thomas Gleixner
2022-02-01 23:14     ` Kirill A. Shutemov
2022-02-03  0:32       ` Josh Poimboeuf
2022-02-03 14:09         ` Kirill A. Shutemov
2022-02-03 15:13           ` Dave Hansen
2022-01-24 15:01 ` [PATCHv2 02/29] x86/tdx: Extend the cc_platform_has() API to support TDX guests Kirill A. Shutemov
2022-02-01 19:31   ` Thomas Gleixner
2022-01-24 15:01 ` [PATCHv2 03/29] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Kirill A. Shutemov
2022-02-01 19:58   ` Thomas Gleixner
2022-02-02  2:55     ` Kirill A. Shutemov
2022-02-02 10:59       ` Kai Huang
2022-02-03 14:44         ` Kirill A. Shutemov
2022-02-03 23:47           ` Kai Huang
2022-02-04  3:43           ` Kirill A. Shutemov
2022-02-04  9:51             ` Kai Huang
2022-02-04 13:20               ` Kirill A. Shutemov
2022-02-04 10:12             ` Kai Huang
2022-02-04 13:18               ` Kirill A. Shutemov
2022-02-05  0:06                 ` Kai Huang
2022-02-02 17:08       ` Thomas Gleixner
2022-01-24 15:01 ` [PATCHv2 04/29] x86/traps: Add #VE support for TDX guest Kirill A. Shutemov
2022-02-01 21:02   ` Thomas Gleixner
2022-02-01 21:26     ` Sean Christopherson
2022-02-12  1:42     ` Kirill A. Shutemov
2022-01-24 15:01 ` [PATCHv2 05/29] x86/tdx: Add HLT support for TDX guests Kirill A. Shutemov
2022-01-29 14:53   ` Borislav Petkov
2022-01-29 22:30     ` [PATCHv2.1 " Kirill A. Shutemov
2022-02-01 21:21       ` Thomas Gleixner
2022-02-02 12:48         ` Kirill A. Shutemov
2022-02-02 17:17           ` Thomas Gleixner
2022-02-04 16:55             ` Kirill A. Shutemov
2022-02-07 22:52               ` Sean Christopherson
2022-02-09 14:34                 ` Kirill A. Shutemov
2022-02-09 18:05                   ` Sean Christopherson
2022-02-09 22:23                     ` Kirill A. Shutemov
2022-02-10  1:21                       ` Sean Christopherson
2022-01-24 15:01 ` [PATCHv2 06/29] x86/tdx: Add MSR " Kirill A. Shutemov
2022-02-01 21:38   ` Thomas Gleixner
2022-02-02 13:06     ` Kirill A. Shutemov
2022-02-02 17:18       ` Thomas Gleixner
2022-01-24 15:01 ` [PATCHv2 07/29] x86/tdx: Handle CPUID via #VE Kirill A. Shutemov
2022-02-01 21:39   ` Thomas Gleixner
2022-01-24 15:01 ` [PATCHv2 08/29] x86/tdx: Handle in-kernel MMIO Kirill A. Shutemov
2022-01-24 19:30   ` Josh Poimboeuf
2022-01-24 22:08     ` Kirill A. Shutemov
2022-01-24 23:04       ` Josh Poimboeuf
2022-01-24 22:40   ` Dave Hansen
2022-01-24 23:04     ` [PATCHv2.1 " Kirill A. Shutemov
2022-02-01 16:14       ` Borislav Petkov
2022-02-01 22:30   ` [PATCHv2 " Thomas Gleixner
2022-01-24 15:01 ` [PATCHv2 09/29] x86/tdx: Detect TDX at early kernel decompression time Kirill A. Shutemov
2022-02-01 18:30   ` Borislav Petkov
2022-02-01 22:33   ` Thomas Gleixner
2022-01-24 15:01 ` [PATCHv2 10/29] x86: Consolidate port I/O helpers Kirill A. Shutemov
2022-02-01 22:36   ` Thomas Gleixner
2022-01-24 15:01 ` [PATCHv2 11/29] x86/boot: Allow to hook up alternative " Kirill A. Shutemov
2022-02-01 19:02   ` Borislav Petkov
2022-02-01 22:39   ` Thomas Gleixner
2022-02-01 22:53     ` Thomas Gleixner
2022-02-02 17:20       ` Kirill A. Shutemov
2022-02-02 19:05         ` Thomas Gleixner
2022-01-24 15:01 ` [PATCHv2 12/29] x86/boot/compressed: Support TDX guest port I/O at decompression time Kirill A. Shutemov
2022-02-01 22:55   ` Thomas Gleixner
2022-01-24 15:01 ` [PATCHv2 13/29] x86/tdx: Add port I/O emulation Kirill A. Shutemov
2022-02-01 23:01   ` Thomas Gleixner
2022-02-02  6:22   ` Borislav Petkov
2022-01-24 15:02 ` [PATCHv2 14/29] x86/tdx: Early boot handling of port I/O Kirill A. Shutemov
2022-02-01 23:02   ` Thomas Gleixner
2022-02-02 10:09   ` Borislav Petkov
2022-01-24 15:02 ` [PATCHv2 15/29] x86/tdx: Wire up KVM hypercalls Kirill A. Shutemov
2022-02-01 23:05   ` Thomas Gleixner
2022-01-24 15:02 ` [PATCHv2 16/29] x86/boot: Add a trampoline for booting APs via firmware handoff Kirill A. Shutemov
2022-02-01 23:06   ` Thomas Gleixner
2022-02-02 11:27   ` Borislav Petkov
2022-02-04 11:27     ` Kuppuswamy, Sathyanarayanan
2022-02-04 13:49       ` Borislav Petkov
2022-02-15 21:36         ` Kirill A. Shutemov
2022-02-16 10:07           ` Borislav Petkov
2022-02-16 14:10             ` Kirill A. Shutemov
2022-02-10  0:25       ` Kai Huang
2022-01-24 15:02 ` [PATCHv2 17/29] x86/acpi, x86/boot: Add multiprocessor wake-up support Kirill A. Shutemov
2022-02-01 23:27   ` Thomas Gleixner
2022-02-05 12:37     ` Kuppuswamy, Sathyanarayanan
2022-01-24 15:02 ` [PATCHv2 18/29] x86/boot: Avoid #VE during boot for TDX platforms Kirill A. Shutemov
2022-02-02  0:04   ` Thomas Gleixner
2022-02-11 16:13     ` Kirill A. Shutemov
2022-01-24 15:02 ` [PATCHv2 19/29] x86/topology: Disable CPU online/offline control for TDX guests Kirill A. Shutemov
2022-02-02  0:09   ` Thomas Gleixner
2022-02-02  0:11     ` Thomas Gleixner
2022-02-03 15:00       ` Borislav Petkov
2022-02-03 21:26         ` Thomas Gleixner
2022-01-24 15:02 ` [PATCHv2 20/29] x86/tdx: Get page shared bit info from the TDX module Kirill A. Shutemov
2022-02-02  0:14   ` Thomas Gleixner
2022-02-07 22:27     ` Sean Christopherson
2022-02-07 10:44   ` Borislav Petkov
2022-01-24 15:02 ` [PATCHv2 21/29] x86/tdx: Exclude shared bit from __PHYSICAL_MASK Kirill A. Shutemov
2022-02-02  0:18   ` Thomas Gleixner
2022-01-24 15:02 ` [PATCHv2 22/29] x86/tdx: Make pages shared in ioremap() Kirill A. Shutemov
2022-02-02  0:25   ` Thomas Gleixner
2022-02-02 19:27     ` Kirill A. Shutemov
2022-02-02 19:47       ` Thomas Gleixner
2022-02-07 16:27   ` Borislav Petkov
2022-02-07 16:57     ` Dave Hansen
2022-02-07 17:28       ` Borislav Petkov
2022-02-14 22:09         ` Kirill A. Shutemov
2022-02-15 10:50           ` Borislav Petkov
2022-02-15 14:49           ` Tom Lendacky
2022-02-15 15:41             ` Kirill A. Shutemov
2022-02-15 15:55               ` Tom Lendacky
2022-02-15 16:27                 ` Kirill A. Shutemov
2022-02-15 16:34                   ` Dave Hansen
2022-02-15 17:33                     ` Kirill A. Shutemov
2022-02-16  9:58                       ` Borislav Petkov
2022-02-16 15:37                         ` Kirill A. Shutemov
2022-02-17 15:24                           ` Borislav Petkov
2022-01-24 15:02 ` [PATCHv2 23/29] x86/tdx: Add helper to convert memory between shared and private Kirill A. Shutemov
2022-02-02  0:35   ` Thomas Gleixner
2022-02-08 12:12   ` Borislav Petkov
2022-02-09 23:21     ` Kirill A. Shutemov
2022-01-24 15:02 ` [PATCHv2 24/29] x86/mm/cpa: Add support for TDX shared memory Kirill A. Shutemov
2022-02-02  1:27   ` Thomas Gleixner
2022-01-24 15:02 ` [PATCHv2 25/29] x86/kvm: Use bounce buffers for TD guest Kirill A. Shutemov
2022-01-24 15:02 ` [PATCHv2 26/29] x86/tdx: ioapic: Add shared bit for IOAPIC base address Kirill A. Shutemov
2022-02-02  1:33   ` Thomas Gleixner
2022-02-04 22:09     ` Yamahata, Isaku
2022-02-04 22:31     ` Kirill A. Shutemov
2022-02-07 14:08       ` Tom Lendacky
2022-01-24 15:02 ` [PATCHv2 27/29] ACPICA: Avoid cache flush on TDX guest Kirill A. Shutemov
2022-01-24 15:02 ` [PATCHv2 28/29] x86/tdx: Warn about unexpected WBINVD Kirill A. Shutemov
2022-02-02  1:46   ` Thomas Gleixner
2022-02-04 21:35     ` Kirill A. Shutemov
2022-01-24 15:02 ` Kirill A. Shutemov [this message]
2022-02-24  9:08   ` [PATCHv2 29/29] Documentation/x86: Document TDX kernel architecture Xiaoyao Li
2022-02-09 10:56 ` [PATCHv2 00/29] TDX Guest: TDX core support Kai Huang
2022-02-09 11:08   ` Borislav Petkov
2022-02-09 11:30     ` Kai Huang
2022-02-09 11:40       ` Borislav Petkov
2022-02-09 11:48         ` Kai Huang
2022-02-09 11:56           ` Borislav Petkov
2022-02-09 11:58             ` Kai Huang
2022-02-09 16:50             ` Sean Christopherson
2022-02-09 19:11               ` Borislav Petkov
2022-02-09 20:07                 ` Sean Christopherson
2022-02-09 20:36                   ` Borislav Petkov
2022-02-10  0:05                     ` Kai Huang
2022-02-16 16:08                       ` Sean Christopherson
2022-02-16 15:48                     ` Kirill A. Shutemov
2022-02-17 15:19                       ` Borislav Petkov
2022-02-17 15:26                         ` Kirill A. Shutemov
2022-02-17 15:34                           ` Borislav Petkov
2022-02-17 15:29                         ` Sean Christopherson
2022-02-17 15:31                           ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220124150215.36893-30-kirill.shutemov@linux.intel.com \
    --to=kirill.shutemov@linux.intel.com \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=jpoimboe@redhat.com \
    --cc=knsathya@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=sdeep@vmware.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).