kvm.vger.kernel.org archive mirror
* [PATCH v1 00/40] TDX QEMU support
@ 2022-08-02  7:47 Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 01/40] *** HACK *** linux-headers: Update headers to pull in TDX API changes Xiaoyao Li
                   ` (41 more replies)
  0 siblings, 42 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

This is the first version to drop the RFC tag, since the last RFC received
several Acked-by tags. We hope more people can help review it.


This patch series aims to enable TDX support to allow creating and booting a
TD (TDX VM) with QEMU. It needs to work with corresponding KVM patch [1].
TDX related documents can be found in [2].

This series is also available on GitHub:

https://github.com/intel/qemu-tdx/tree/tdx-qemu-upstream-v1

Booting a TDX VM requires several changes and additional steps in the flow:

 1. specify the VM type KVM_X86_TDX_VM when creating the VM with
    IOCTL(KVM_CREATE_VM);
 2. initialize the VM-scope configuration before creating any vCPU;
 3. initialize the vCPU-scope configuration;
 4. initialize the virtual firmware (TDVF) in guest private memory before
    any vCPU runs.
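
The ordering constraints in the list above can be modelled as a small state
machine. This is only an illustrative sketch: the td_stage names and both
helpers are hypothetical and are not part of QEMU or KVM, and the strictly
linear order is a simplification of the real flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the TD bring-up order; not QEMU or KVM code. */
enum td_stage {
    TD_STAGE_NONE,        /* nothing done yet */
    TD_STAGE_VM_CREATED,  /* step 1: KVM_CREATE_VM with KVM_X86_TDX_VM */
    TD_STAGE_VM_INIT,     /* step 2: VM-scope configuration */
    TD_STAGE_VCPU_INIT,   /* step 3: vCPU-scope configuration */
    TD_STAGE_FW_LOADED,   /* step 4: TDVF placed in guest private memory */
};

/* Move to 'next' only if it is the stage directly after the current one. */
static bool td_advance(enum td_stage *cur, enum td_stage next)
{
    assert(cur != NULL);
    if ((int)next != (int)*cur + 1) {
        return false;
    }
    *cur = next;
    return true;
}

/* Return true if 'steps' walks all four stages in order, so vCPUs may run. */
static bool td_sequence_ok(const enum td_stage *steps, size_t n)
{
    enum td_stage cur = TD_STAGE_NONE;

    assert(steps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!td_advance(&cur, steps[i])) {
            return false;
        }
    }
    return cur == TD_STAGE_FW_LOADED;
}
```

A sequence that skips the VM-scope initialization, or that stops before the
firmware is loaded, is rejected by td_sequence_ok().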

In addition, a TDX VM must boot with TDVF (TDX virtual firmware); upstream OVMF
can currently serve as TDVF. This series adds support for parsing TDVF, loading
it into the guest's private memory and preparing the TD HOB info for TDVF.

[1] KVM TDX basic feature support v7
https://lore.kernel.org/all/cover.1656366337.git.isaku.yamahata@intel.com/

[2] https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html

== Limitation and future work ==
- Readonly memslot

  TDX only supports readonly (write-protected) memslots for shared memory, not
  for private memory. For simplicity, readonly memslots are marked as
  unsupported entirely for TDX.

- CPU model

  A TD cannot be created with an arbitrary CPU model the way a non-TDX VM can,
  because only a subset of features can be configured for a TD.
  
  - It's recommended to use '-cpu host' to create TD;
  - '+feature/-feature' might not work as expected;

  future work: To introduce specific CPU model for TDs and enhance +/-features
               for TDs.

- gdb support

  gdb support for debugging a TD in off-debug mode is future work.

== Patch organization ==
1           Manually fetch Linux UAPI changes for TDX;
2-19,29-30  Basic TDX support that parses vm-type and invokes TDX-specific
            IOCTLs;
20-28       Load, parse and initialize TDVF for TDX VM;
31-35       Disable unsupported functions for TDX VM;
36-39       Avoid errors due to KVM's requirement on TDX;
40          Add documentation of TDX;

== Change history ==
Changes from RFC v4:
[RFC v4] https://lore.kernel.org/qemu-devel/20220512031803.3315890-1-xiaoyao.li@intel.com/

- Add 3 more patches(9, 10, 11) to improve the tdx_get_supported_cpuid();
- make attributes of object tdx-guest not settable by user;
- improve get_tdx_capabilities() by using a known starting value and
  limiting the loop with a known size;
- clarify why isa.bios needs to be skipped;
- remove the MMIO hob setup since OVMF sets them up itself;

Changes from RFC v3:
[RFC v3] https://lore.kernel.org/qemu-devel/20220317135913.2166202-1-xiaoyao.li@intel.com/

- Load TDVF with -bios interface;
- Adapt to KVM API changes;
	- KVM_TDX_CAPABILITIES changes back to KVM-scope;
	- struct kvm_tdx_init_vm changes;
- Define TDX_SUPPORTED_KVM_FEATURES;
- Drop the patch of introducing property sept-ve-disable since it's not
  public yet;
- some misc cleanups


Changes from RFC v2:
[RFC v2] https://lore.kernel.org/qemu-devel/cover.1625704980.git.isaku.yamahata@intel.com/

- Get vm-type from confidential-guest-support object type;
- Drop machine_init_done_late_notifiers;
- Refactor tdx_ioctl implementation;
- re-use existing pflash interface to load TDVF (i.e., OVMF binaries);
- introduce new data structure to track memory type instead of changing
  e820 table;
- Force smm to off for TDX VM;
- Drop the patches that suppress level-trigger/SMI/INIT/SIPI since KVM
  will ignore them;
- Add documentation;


Changes from RFC v1:
[RFC v1] https://lore.kernel.org/qemu-devel/cover.1613188118.git.isaku.yamahata@intel.com/

- suppress level trigger/SMI/INIT/SIPI related to IOAPIC.
- add VM attribute sha384 to TD measurement.
- guest TSC Hz specification


Isaku Yamahata (4):
  i386/tdvf: Introduce function to parse TDVF metadata
  i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION
  hw/i386: add option to forcibly report edge trigger in acpi tables
  i386/tdx: Don't synchronize guest tsc for TDs

Sean Christopherson (2):
  i386/kvm: Move architectural CPUID leaf generation to separate helper
  i386/tdx: Don't get/put guest state for TDX VMs

Xiaoyao Li (34):
  *** HACK *** linux-headers: Update headers to pull in TDX API changes
  i386: Introduce tdx-guest object
  target/i386: Implement mc->kvm_type() to get VM type
  target/i386: Introduce kvm_confidential_guest_init()
  i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context
  i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
  i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object
  i386/tdx: Adjust the supported CPUID based on TDX restrictions
  i386/tdx: Update tdx_fixed0/1 bits by tdx_caps.cpuid_config[]
  i386/tdx: Integrate tdx_caps->xfam_fixed0/1 into tdx_cpuid_lookup
  i386/tdx: Integrate tdx_caps->attrs_fixed0/1 to tdx_cpuid_lookup
  KVM: Introduce kvm_arch_pre_create_vcpu()
  i386/tdx: Initialize TDX before creating TD vcpus
  i386/tdx: Add property sept-ve-disable for tdx-guest object
  i386/tdx: Wire CPU features up with attributes of TD guest
  i386/tdx: Validate TD attributes
  i386/tdx: Implement user specified tsc frequency
  i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM
  i386/tdx: Parse TDVF metadata for TDX VM
  i386/tdx: Skip BIOS shadowing setup
  i386/tdx: Don't initialize pc.rom for TDX VMs
  i386/tdx: Track mem_ptr for each firmware entry of TDVF
  i386/tdx: Track RAM entries for TDX VM
  headers: Add definitions from UEFI spec for volumes, resources, etc...
  i386/tdx: Setup the TD HOB list
  i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu
  i386/tdx: Finalize TDX VM
  i386/tdx: Disable SMM for TDX VMs
  i386/tdx: Disable PIC for TDX VMs
  i386/tdx: Don't allow system reset for TDX VMs
  hw/i386: add eoi_intercept_unsupported member to X86MachineState
  i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() for TDs
  i386/tdx: Skip kvm_put_apicbase() for TDs
  docs: Add TDX documentation

 accel/kvm/kvm-all.c                        |  21 +-
 configs/devices/i386-softmmu/default.mak   |   1 +
 docs/system/confidential-guest-support.rst |   1 +
 docs/system/i386/tdx.rst                   | 105 +++
 docs/system/target-i386.rst                |   1 +
 hw/i386/Kconfig                            |   6 +
 hw/i386/acpi-build.c                       |  99 ++-
 hw/i386/acpi-common.c                      |  50 +-
 hw/i386/meson.build                        |   1 +
 hw/i386/pc.c                               |  21 +-
 hw/i386/pc_sysfw.c                         |   7 +
 hw/i386/tdvf-hob.c                         | 146 ++++
 hw/i386/tdvf-hob.h                         |  24 +
 hw/i386/tdvf.c                             | 198 +++++
 hw/i386/x86.c                              |  35 +-
 include/hw/i386/tdvf.h                     |  58 ++
 include/hw/i386/x86.h                      |   1 +
 include/standard-headers/uefi/uefi.h       | 198 +++++
 include/sysemu/kvm.h                       |   1 +
 linux-headers/asm-x86/kvm.h                |  95 +++
 linux-headers/linux/kvm.h                  |   2 +
 qapi/qom.json                              |  14 +
 target/i386/cpu-internal.h                 |   9 +
 target/i386/cpu.c                          |  12 -
 target/i386/cpu.h                          |  21 +
 target/i386/kvm/kvm.c                      | 363 +++++----
 target/i386/kvm/kvm_i386.h                 |   6 +
 target/i386/kvm/meson.build                |   2 +
 target/i386/kvm/tdx-stub.c                 |  19 +
 target/i386/kvm/tdx.c                      | 838 +++++++++++++++++++++
 target/i386/kvm/tdx.h                      |  55 ++
 target/i386/sev.c                          |   1 -
 target/i386/sev.h                          |   2 +
 33 files changed, 2193 insertions(+), 220 deletions(-)
 create mode 100644 docs/system/i386/tdx.rst
 create mode 100644 hw/i386/tdvf-hob.c
 create mode 100644 hw/i386/tdvf-hob.h
 create mode 100644 hw/i386/tdvf.c
 create mode 100644 include/hw/i386/tdvf.h
 create mode 100644 include/standard-headers/uefi/uefi.h
 create mode 100644 target/i386/kvm/tdx-stub.c
 create mode 100644 target/i386/kvm/tdx.c
 create mode 100644 target/i386/kvm/tdx.h

-- 
2.27.0



* [PATCH v1 01/40] *** HACK *** linux-headers: Update headers to pull in TDX API changes
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  9:47   ` Daniel P. Berrangé
  2022-08-02  7:47 ` [PATCH v1 02/40] i386: Introduce tdx-guest object Xiaoyao Li
                   ` (40 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Pull in recent TDX updates, which are not backwards compatible.

This is only to make the series runnable. The headers will be updated with

	scripts/update-linux-headers.sh

once TDX support is upstream in the Linux kernel.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
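For reviewers who want to sanity-check the layout, the sketch below mirrors
two of the structures added by this patch and checks their sizes at compile
time. It is an illustration only: the tdx_* names are local stand-ins for the
kvm_tdx_* definitions, the union in struct kvm_tdx_init_vm is replaced by its
largest member (reserved[2028]), and tdx_cmd_prepare() is a hypothetical
helper, not a function from this series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of enum kvm_tdx_cmd_id (illustration only). */
enum tdx_cmd_id {
    TDX_CAPABILITIES = 0,
    TDX_INIT_VM,
    TDX_INIT_VCPU,
    TDX_INIT_MEM_REGION,
    TDX_FINALIZE_VM,
};

/* Local mirror of struct kvm_tdx_cmd. */
struct tdx_cmd {
    uint32_t id;      /* enum tdx_cmd_id */
    uint32_t flags;   /* zero if the sub-command does not use it */
    uint64_t data;    /* immediate value or userspace pointer */
    uint64_t error;   /* auxiliary error code filled in by the kernel */
    uint64_t unused;  /* reserved */
};

/* Simplified mirror of struct kvm_tdx_init_vm: the union is flattened. */
struct tdx_init_vm_layout {
    uint64_t attributes;
    uint32_t max_vcpus;
    uint32_t padding;
    uint64_t mrconfigid[6];     /* sha384 digest */
    uint64_t mrowner[6];        /* sha384 digest */
    uint64_t mrownerconfig[6];  /* sha384 digest */
    uint64_t reserved[2028];    /* stands in for the union */
};

_Static_assert(sizeof(struct tdx_cmd) == 32, "tdx_cmd should be 32 bytes");
_Static_assert(sizeof(struct tdx_init_vm_layout) == 16 * 1024,
               "init_vm layout should be 16KB");

/* Hypothetical helper: build a zeroed command with id, flags and data set. */
static struct tdx_cmd tdx_cmd_prepare(enum tdx_cmd_id id, uint32_t flags,
                                      const void *data)
{
    struct tdx_cmd cmd;

    memset(&cmd, 0, sizeof(cmd));
    cmd.id = (uint32_t)id;
    cmd.flags = flags;
    cmd.data = (uint64_t)(uintptr_t)data;
    return cmd;
}
```

Zeroing the whole command first keeps the error and unused fields at zero, as
the comments in the header ask for fields a sub-command does not use.
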
 linux-headers/asm-x86/kvm.h | 95 +++++++++++++++++++++++++++++++++++++
 linux-headers/linux/kvm.h   |  2 +
 2 files changed, 97 insertions(+)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index bf6e96011dfe..a5433cc71f79 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -525,4 +525,99 @@ struct kvm_pmu_event_filter {
 #define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
 #define   KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
 
+#define KVM_X86_DEFAULT_VM	0
+#define KVM_X86_TDX_VM		1
+
+/* Trust Domain eXtension sub-ioctl() commands. */
+enum kvm_tdx_cmd_id {
+	KVM_TDX_CAPABILITIES = 0,
+	KVM_TDX_INIT_VM,
+	KVM_TDX_INIT_VCPU,
+	KVM_TDX_INIT_MEM_REGION,
+	KVM_TDX_FINALIZE_VM,
+
+	KVM_TDX_CMD_NR_MAX,
+};
+
+struct kvm_tdx_cmd {
+	/* enum kvm_tdx_cmd_id */
+	__u32 id;
+	/* flags for sub-command. If sub-command doesn't use this, set zero. */
+	__u32 flags;
+	/*
+	 * data for each sub-command. An immediate or a pointer to the actual
+	 * data in process virtual address.  If sub-command doesn't use it,
+	 * set zero.
+	 */
+	__u64 data;
+	/*
+	 * Auxiliary error code.  The sub-command may return TDX SEAMCALL
+	 * status code in addition to -Exxx.
+	 * Defined for consistency with struct kvm_sev_cmd.
+	 */
+	__u64 error;
+	/* Reserved: Defined for consistency with struct kvm_sev_cmd. */
+	__u64 unused;
+};
+
+struct kvm_tdx_cpuid_config {
+	__u32 leaf;
+	__u32 sub_leaf;
+	__u32 eax;
+	__u32 ebx;
+	__u32 ecx;
+	__u32 edx;
+};
+
+struct kvm_tdx_capabilities {
+	__u64 attrs_fixed0;
+	__u64 attrs_fixed1;
+	__u64 xfam_fixed0;
+	__u64 xfam_fixed1;
+
+	__u32 nr_cpuid_configs;
+	__u32 padding;
+	struct kvm_tdx_cpuid_config cpuid_configs[0];
+};
+
+struct kvm_tdx_init_vm {
+	__u64 attributes;
+	__u32 max_vcpus;
+	__u32 padding;
+	__u64 mrconfigid[6];	/* sha384 digest */
+	__u64 mrowner[6];	/* sha384 digest */
+	__u64 mrownerconfig[6];	/* sha384 digest */
+	union {
+		/*
+		 * KVM_TDX_INIT_VM is called before vcpu creation, thus before
+		 * KVM_SET_CPUID2.  CPUID configurations need to be passed.
+		 *
+		 * This configuration supersedes KVM_SET_CPUID{,2}.
+		 * The user space VMM, e.g. qemu, should make them consistent
+		 * with these values.
+		 * sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES(256)
+		 * = 8KB.
+		 */
+		struct {
+			struct kvm_cpuid2 cpuid;
+			/* 8KB with KVM_MAX_CPUID_ENTRIES. */
+			struct kvm_cpuid_entry2 entries[];
+		};
+		/*
+		 * For future extensibility.
+		 * The size(struct kvm_tdx_init_vm) = 16KB.
+		 * This should be enough given sizeof(TD_PARAMS) = 1024
+		 */
+		__u64 reserved[2028];
+	};
+};
+
+#define KVM_TDX_MEASURE_MEMORY_REGION	(1UL << 0)
+
+struct kvm_tdx_init_mem_region {
+	__u64 source_addr;
+	__u64 gpa;
+	__u64 nr_pages;
+};
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index f089349149a5..054cf89fa2d6 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1151,6 +1151,8 @@ struct kvm_ppc_resize_hpt {
 /* #define KVM_CAP_VM_TSC_CONTROL 214 */
 #define KVM_CAP_SYSTEM_EVENT_DATA 215
 
+#define KVM_CAP_VM_TYPES 216
+
 #ifdef KVM_CAP_IRQ_ROUTING
 
 struct kvm_irq_routing_irqchip {
-- 
2.27.0



* [PATCH v1 02/40] i386: Introduce tdx-guest object
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 01/40] *** HACK *** linux-headers: Update headers to pull in TDX API changes Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 03/40] target/i386: Implement mc->kvm_type() to get VM type Xiaoyao Li
                   ` (39 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Introduce tdx-guest object which implements the interface of
CONFIDENTIAL_GUEST_SUPPORT, and will be used to create TDX VMs (TDs) by

  qemu -machine ...,confidential-guest-support=tdx0	\
       -object tdx-guest,id=tdx0

It has only one property, 'attributes', which has the fixed value 0 and is
not configurable so far.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
changes from RFC v4:
- make @attributes not user-settable
---
 configs/devices/i386-softmmu/default.mak |  1 +
 hw/i386/Kconfig                          |  5 +++
 qapi/qom.json                            | 12 +++++++
 target/i386/kvm/meson.build              |  2 ++
 target/i386/kvm/tdx.c                    | 40 ++++++++++++++++++++++++
 target/i386/kvm/tdx.h                    | 19 +++++++++++
 6 files changed, 79 insertions(+)
 create mode 100644 target/i386/kvm/tdx.c
 create mode 100644 target/i386/kvm/tdx.h

diff --git a/configs/devices/i386-softmmu/default.mak b/configs/devices/i386-softmmu/default.mak
index 598c6646dfc0..9b5ec59d65b0 100644
--- a/configs/devices/i386-softmmu/default.mak
+++ b/configs/devices/i386-softmmu/default.mak
@@ -18,6 +18,7 @@
 #CONFIG_QXL=n
 #CONFIG_SEV=n
 #CONFIG_SGA=n
+#CONFIG_TDX=n
 #CONFIG_TEST_DEVICES=n
 #CONFIG_TPM_CRB=n
 #CONFIG_TPM_TIS_ISA=n
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index d22ac4a4b952..9e40ff79fc2d 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -10,6 +10,10 @@ config SGX
     bool
     depends on KVM
 
+config TDX
+    bool
+    depends on KVM
+
 config PC
     bool
     imply APPLESMC
@@ -26,6 +30,7 @@ config PC
     imply QXL
     imply SEV
     imply SGX
+    imply TDX
     imply SGA
     imply TEST_DEVICES
     imply TPM_CRB
diff --git a/qapi/qom.json b/qapi/qom.json
index 80dd419b3925..38177848abc1 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -830,6 +830,16 @@
             'reduced-phys-bits': 'uint32',
             '*kernel-hashes': 'bool' } }
 
+##
+# @TdxGuestProperties:
+#
+# Properties for tdx-guest objects.
+#
+# Since: 7.2
+##
+{ 'struct': 'TdxGuestProperties',
+  'data': { }}
+
 ##
 # @ObjectType:
 #
@@ -883,6 +893,7 @@
       'if': 'CONFIG_SECRET_KEYRING' },
     'sev-guest',
     's390-pv-guest',
+    'tdx-guest',
     'throttle-group',
     'tls-creds-anon',
     'tls-creds-psk',
@@ -948,6 +959,7 @@
       'secret_keyring':             { 'type': 'SecretKeyringProperties',
                                       'if': 'CONFIG_SECRET_KEYRING' },
       'sev-guest':                  'SevGuestProperties',
+      'tdx-guest':                  'TdxGuestProperties',
       'throttle-group':             'ThrottleGroupProperties',
       'tls-creds-anon':             'TlsCredsAnonProperties',
       'tls-creds-psk':              'TlsCredsPskProperties',
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index 736df8b72e3f..b2d7d41acde2 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -9,6 +9,8 @@ i386_softmmu_kvm_ss.add(files(
 
 i386_softmmu_kvm_ss.add(when: 'CONFIG_SEV', if_false: files('sev-stub.c'))
 
+i386_softmmu_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'))
+
 i386_softmmu_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), if_false: files('hyperv-stub.c'))
 
 i386_softmmu_ss.add_all(when: 'CONFIG_KVM', if_true: i386_softmmu_kvm_ss)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
new file mode 100644
index 000000000000..d3792d4a3d56
--- /dev/null
+++ b/target/i386/kvm/tdx.c
@@ -0,0 +1,40 @@
+/*
+ * QEMU TDX support
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Xiaoyao Li <xiaoyao.li@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qom/object_interfaces.h"
+
+#include "tdx.h"
+
+/* tdx guest */
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
+                                   tdx_guest,
+                                   TDX_GUEST,
+                                   CONFIDENTIAL_GUEST_SUPPORT,
+                                   { TYPE_USER_CREATABLE },
+                                   { NULL })
+
+static void tdx_guest_init(Object *obj)
+{
+    TdxGuest *tdx = TDX_GUEST(obj);
+
+    tdx->attributes = 0;
+}
+
+static void tdx_guest_finalize(Object *obj)
+{
+}
+
+static void tdx_guest_class_init(ObjectClass *oc, void *data)
+{
+}
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
new file mode 100644
index 000000000000..415aeb5af746
--- /dev/null
+++ b/target/i386/kvm/tdx.h
@@ -0,0 +1,19 @@
+#ifndef QEMU_I386_TDX_H
+#define QEMU_I386_TDX_H
+
+#include "exec/confidential-guest-support.h"
+
+#define TYPE_TDX_GUEST "tdx-guest"
+#define TDX_GUEST(obj)  OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
+
+typedef struct TdxGuestClass {
+    ConfidentialGuestSupportClass parent_class;
+} TdxGuestClass;
+
+typedef struct TdxGuest {
+    ConfidentialGuestSupport parent_obj;
+
+    uint64_t attributes;    /* TD attributes */
+} TdxGuest;
+
+#endif /* QEMU_I386_TDX_H */
-- 
2.27.0



* [PATCH v1 03/40] target/i386: Implement mc->kvm_type() to get VM type
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 01/40] *** HACK *** linux-headers: Update headers to pull in TDX API changes Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 02/40] i386: Introduce tdx-guest object Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 04/40] target/i386: Introduce kvm_confidential_guest_init() Xiaoyao Li
                   ` (38 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

TDX VM requires VM type KVM_X86_TDX_VM to be passed to
kvm_ioctl(KVM_CREATE_VM). Hence implement mc->kvm_type() for i386
architecture.

If a tdx-guest object is passed to confidential-guest-support, e.g.

  qemu -machine ...,confidential-guest-support=tdx0 \
       -object tdx-guest,id=tdx0,...

the VM type is KVM_X86_TDX_VM. Otherwise, it is KVM_X86_DEFAULT_VM.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
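The capability check in kvm_get_vm_type() treats the value returned for
KVM_CAP_VM_TYPES as a bitmask indexed by VM type. A minimal standalone sketch
of that test follows; vm_type_supported() and the X86_* constants are local
stand-ins chosen for illustration (the values match patch 01), not code from
this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the VM type values defined in patch 01. */
#define X86_DEFAULT_VM 0
#define X86_TDX_VM     1

/*
 * cap_mask models the value reported for KVM_CAP_VM_TYPES: bit N set means
 * VM type N is supported. A value of 0 models an old KVM without the
 * capability.
 */
static bool vm_type_supported(int cap_mask, int vm_type)
{
    assert(vm_type >= 0 && vm_type < 31);

    /* The default VM type is always available, even on old KVM. */
    if (vm_type == X86_DEFAULT_VM) {
        return true;
    }
    if (cap_mask <= 0) {
        return false;
    }
    return (cap_mask & (1 << vm_type)) != 0;
}
```

Handling the default type before looking at the mask mirrors the early return
in the patch, so a kernel without KVM_CAP_VM_TYPES still boots ordinary VMs.
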
 hw/i386/x86.c              |  6 ++++++
 target/i386/kvm/kvm.c      | 30 ++++++++++++++++++++++++++++++
 target/i386/kvm/kvm_i386.h |  1 +
 3 files changed, 37 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 050eedc0c8e2..a15fadeb0e68 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1379,6 +1379,11 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
     qapi_free_SgxEPCList(list);
 }
 
+static int x86_kvm_type(MachineState *ms, const char *vm_type)
+{
+    return kvm_get_vm_type(ms, vm_type);
+}
+
 static void x86_machine_initfn(Object *obj)
 {
     X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1403,6 +1408,7 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
     mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
     mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
     mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
+    mc->kvm_type = x86_kvm_type;
     x86mc->save_tsc_khz = true;
     x86mc->fwcfg_dma_enabled = true;
     nc->nmi_monitor_handler = x86_nmi;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f148a6d52fa4..33e0d2948f77 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -30,6 +30,7 @@
 #include "sysemu/runstate.h"
 #include "kvm_i386.h"
 #include "sev.h"
+#include "tdx.h"
 #include "hyperv.h"
 #include "hyperv-proto.h"
 
@@ -143,6 +144,35 @@ static struct kvm_msr_list *kvm_feature_msrs;
 static RateLimit bus_lock_ratelimit_ctrl;
 static int kvm_get_one_msr(X86CPU *cpu, int index, uint64_t *value);
 
+static const char* vm_type_name[] = {
+    [KVM_X86_DEFAULT_VM] = "X86_DEFAULT_VM",
+    [KVM_X86_TDX_VM] = "X86_TDX_VM",
+};
+
+int kvm_get_vm_type(MachineState *ms, const char *vm_type)
+{
+    int kvm_type = KVM_X86_DEFAULT_VM;
+
+    if (ms->cgs && object_dynamic_cast(OBJECT(ms->cgs), TYPE_TDX_GUEST)) {
+        kvm_type = KVM_X86_TDX_VM;
+    }
+
+    /*
+     * old KVM doesn't support KVM_CAP_VM_TYPES and KVM_X86_DEFAULT_VM
+     * is always supported
+     */
+    if (kvm_type == KVM_X86_DEFAULT_VM) {
+        return kvm_type;
+    }
+
+    if (!(kvm_check_extension(KVM_STATE(ms->accelerator), KVM_CAP_VM_TYPES) & BIT(kvm_type))) {
+        error_report("vm-type %s not supported by KVM", vm_type_name[kvm_type]);
+        exit(1);
+    }
+
+    return kvm_type;
+}
+
 int kvm_has_pit_state2(void)
 {
     return has_pit_state2;
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index 4124912c202e..b434feaa6b1d 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -37,6 +37,7 @@ bool kvm_has_adjust_clock(void);
 bool kvm_has_adjust_clock_stable(void);
 bool kvm_has_exception_payload(void);
 void kvm_synchronize_all_tsc(void);
+int kvm_get_vm_type(MachineState *ms, const char *vm_type);
 void kvm_arch_reset_vcpu(X86CPU *cs);
 void kvm_arch_do_init_vcpu(X86CPU *cs);
 
-- 
2.27.0



* [PATCH v1 04/40] target/i386: Introduce kvm_confidential_guest_init()
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (2 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 03/40] target/i386: Implement mc->kvm_type() to get VM type Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 05/40] i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context Xiaoyao Li
                   ` (37 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Introduce a separate function kvm_confidential_guest_init() for SEV (and
future TDX).

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
 target/i386/kvm/kvm.c | 11 ++++++++++-
 target/i386/sev.c     |  1 -
 target/i386/sev.h     |  2 ++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 33e0d2948f77..1f4a6a4dff28 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2448,6 +2448,15 @@ static void register_smram_listener(Notifier *n, void *unused)
                                  &smram_address_space, 1, "kvm-smram");
 }
 
+static int kvm_confidential_guest_init(MachineState *ms, Error **errp)
+{
+    if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_SEV_GUEST)) {
+        return sev_kvm_init(ms->cgs, errp);
+    }
+
+    return 0;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     uint64_t identity_base = 0xfffbc000;
@@ -2468,7 +2477,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
      * mechanisms are supported in future (e.g. TDX), they'll need
      * their own initialization either here or elsewhere.
      */
-    ret = sev_kvm_init(ms->cgs, &local_err);
+    ret = kvm_confidential_guest_init(ms, &local_err);
     if (ret < 0) {
         error_report_err(local_err);
         return ret;
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 32f7dbac4efa..6089b91cc698 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -39,7 +39,6 @@
 #include "hw/i386/pc.h"
 #include "exec/address-spaces.h"
 
-#define TYPE_SEV_GUEST "sev-guest"
 OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
 
 
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 7b1528248a54..64fbf186dbd2 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -20,6 +20,8 @@
 
 #include "exec/confidential-guest-support.h"
 
+#define TYPE_SEV_GUEST "sev-guest"
+
 #define SEV_POLICY_NODBG        0x1
 #define SEV_POLICY_NOKS         0x2
 #define SEV_POLICY_ES           0x4
-- 
2.27.0



* [PATCH v1 05/40] i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (3 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 04/40] target/i386: Introduce kvm_confidential_guest_init() Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 06/40] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES Xiaoyao Li
                   ` (36 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Introduce tdx_kvm_init() and invoke it in kvm_confidential_guest_init()
if it's a TDX VM. More initialization will be added later.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
 target/i386/kvm/kvm.c       | 15 ++++++---------
 target/i386/kvm/meson.build |  2 +-
 target/i386/kvm/tdx-stub.c  |  9 +++++++++
 target/i386/kvm/tdx.c       |  7 +++++++
 target/i386/kvm/tdx.h       |  2 ++
 5 files changed, 25 insertions(+), 10 deletions(-)
 create mode 100644 target/i386/kvm/tdx-stub.c

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 1f4a6a4dff28..335f87e6cc59 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -54,6 +54,7 @@
 #include "migration/blocker.h"
 #include "exec/memattrs.h"
 #include "trace.h"
+#include "tdx.h"
 
 #include CONFIG_DEVICES
 
@@ -2452,6 +2453,8 @@ static int kvm_confidential_guest_init(MachineState *ms, Error **errp)
 {
     if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_SEV_GUEST)) {
         return sev_kvm_init(ms->cgs, errp);
+    } else if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_TDX_GUEST)) {
+        return tdx_kvm_init(ms, errp);
     }
 
     return 0;
@@ -2466,16 +2469,10 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     Error *local_err = NULL;
 
     /*
-     * Initialize SEV context, if required
+     * Initialize confidential guest (SEV/TDX) context, if required
      *
-     * If no memory encryption is requested (ms->cgs == NULL) this is
-     * a no-op.
-     *
-     * It's also a no-op if a non-SEV confidential guest support
-     * mechanism is selected.  SEV is the only mechanism available to
-     * select on x86 at present, so this doesn't arise, but if new
-     * mechanisms are supported in future (e.g. TDX), they'll need
-     * their own initialization either here or elsewhere.
+     * It's a no-op if no confidential guest support is requested
+     * (ms->cgs == NULL) or a non-SEV/non-TDX mechanism is selected.
      */
     ret = kvm_confidential_guest_init(ms, &local_err);
     if (ret < 0) {
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index b2d7d41acde2..fd30b93ecec9 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -9,7 +9,7 @@ i386_softmmu_kvm_ss.add(files(
 
 i386_softmmu_kvm_ss.add(when: 'CONFIG_SEV', if_false: files('sev-stub.c'))
 
-i386_softmmu_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'))
+i386_softmmu_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'), if_false: files('tdx-stub.c'))
 
 i386_softmmu_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), if_false: files('hyperv-stub.c'))
 
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
new file mode 100644
index 000000000000..1df24735201e
--- /dev/null
+++ b/target/i386/kvm/tdx-stub.c
@@ -0,0 +1,9 @@
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "tdx.h"
+
+int tdx_kvm_init(MachineState *ms, Error **errp)
+{
+    return -EINVAL;
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index d3792d4a3d56..77e33ae01147 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -12,10 +12,17 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/error.h"
 #include "qom/object_interfaces.h"
 
+#include "hw/i386/x86.h"
 #include "tdx.h"
 
+int tdx_kvm_init(MachineState *ms, Error **errp)
+{
+    return 0;
+}
+
 /* tdx guest */
 OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
                                    tdx_guest,
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 415aeb5af746..c8a23d95258d 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -16,4 +16,6 @@ typedef struct TdxGuest {
     uint64_t attributes;    /* TD attributes */
 } TdxGuest;
 
+int tdx_kvm_init(MachineState *ms, Error **errp);
+
 #endif /* QEMU_I386_TDX_H */
-- 
2.27.0



* [PATCH v1 06/40] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (4 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 05/40] i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-25 10:12   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 07/40] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object Xiaoyao Li
                   ` (35 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

KVM provides TDX capabilities via the sub-command KVM_TDX_CAPABILITIES of
IOCTL(KVM_MEMORY_ENCRYPT_OP). Get the capabilities when initializing the
TDX context. They will be used later to validate the user's settings.

Since there is no interface reporting how many CPUID configs
KVM_TDX_CAPABILITIES contains, QEMU starts with a known number, retries
with a larger buffer on -E2BIG, and aborts when the number exceeds
KVM_MAX_CPUID_ENTRIES.
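
The probing strategy described above can be sketched in isolation. This is
only an illustration under stated assumptions: fake_query() is a
hypothetical stand-in for the KVM_TDX_CAPABILITIES call (it is not the KVM
interface), and MAX_ENTRIES mirrors KVM_MAX_CPUID_ENTRIES.

```c
#include <assert.h>
#include <errno.h>

#define MAX_ENTRIES 100   /* mirrors KVM_MAX_CPUID_ENTRIES */

/*
 * Hypothetical stand-in for the capabilities query: it succeeds only when
 * the caller's buffer has room for all `needed` entries.
 */
static int fake_query(int capacity, int needed)
{
    return capacity >= needed ? 0 : -E2BIG;
}

/*
 * Double the capacity on -E2BIG until the query succeeds.
 * Returns the capacity that worked, or -1 once the limit is exceeded.
 */
static int probe_capacity(int start, int needed)
{
    int n = start;

    assert(start > 0);
    while (fake_query(n, needed) == -E2BIG) {
        n *= 2;
        if (n > MAX_ENTRIES) {
            return -1;
        }
    }
    return n;
}
```

With a start of 6 the capacities tried are 6, 12, 24, 48 and 96. The next
doubling (192) exceeds the limit, so in this sketch a requirement of 97 to
100 entries is reported as a failure even though it is within the limit.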

Besides, introduce the interfaces to invoke TDX "ioctls" at different
scopes (KVM, VM and VCPU) in preparation for later patches.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
changes from RFC v4:
  - start from nr_cpuid_configs = 6 for the loop;
  - stop the loop when nr_cpuid_configs exceeds KVM_MAX_CPUID_ENTRIES;
---
 target/i386/kvm/kvm.c      |  2 -
 target/i386/kvm/kvm_i386.h |  2 +
 target/i386/kvm/tdx.c      | 92 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 335f87e6cc59..9e30fa9f4eb5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1704,8 +1704,6 @@ static int hyperv_init_vcpu(X86CPU *cpu)
 
 static Error *invtsc_mig_blocker;
 
-#define KVM_MAX_CPUID_ENTRIES  100
-
 static void kvm_init_xsave(CPUX86State *env)
 {
     if (has_xsave2) {
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index b434feaa6b1d..6b24ab2a7813 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -13,6 +13,8 @@
 
 #include "sysemu/kvm.h"
 
+#define KVM_MAX_CPUID_ENTRIES  100
+
 #define kvm_apic_in_kernel() (kvm_irqchip_in_kernel())
 
 #ifdef CONFIG_KVM
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 77e33ae01147..89f81f7d7082 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -14,12 +14,104 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
+#include "sysemu/kvm.h"
 
 #include "hw/i386/x86.h"
+#include "kvm_i386.h"
 #include "tdx.h"
 
+static struct kvm_tdx_capabilities *tdx_caps;
+
+enum tdx_ioctl_level{
+    TDX_PLATFORM_IOCTL,
+    TDX_VM_IOCTL,
+    TDX_VCPU_IOCTL,
+};
+
+static int __tdx_ioctl(void *state, enum tdx_ioctl_level level, int cmd_id,
+                        __u32 flags, void *data)
+{
+    struct kvm_tdx_cmd tdx_cmd;
+    int r;
+
+    memset(&tdx_cmd, 0x0, sizeof(tdx_cmd));
+
+    tdx_cmd.id = cmd_id;
+    tdx_cmd.flags = flags;
+    tdx_cmd.data = (__u64)(unsigned long)data;
+
+    switch (level) {
+    case TDX_PLATFORM_IOCTL:
+        r = kvm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+        break;
+    case TDX_VM_IOCTL:
+        r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+        break;
+    case TDX_VCPU_IOCTL:
+        r = kvm_vcpu_ioctl(state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+        break;
+    default:
+        error_report("Invalid tdx_ioctl_level %d", level);
+        exit(1);
+    }
+
+    return r;
+}
+
+static inline int tdx_platform_ioctl(int cmd_id, __u32 flags, void *data)
+{
+    return __tdx_ioctl(NULL, TDX_PLATFORM_IOCTL, cmd_id, flags, data);
+}
+
+static inline int tdx_vm_ioctl(int cmd_id, __u32 flags, void *data)
+{
+    return __tdx_ioctl(NULL, TDX_VM_IOCTL, cmd_id, flags, data);
+}
+
+static inline int tdx_vcpu_ioctl(void *vcpu_fd, int cmd_id, __u32 flags,
+                                 void *data)
+{
+    return  __tdx_ioctl(vcpu_fd, TDX_VCPU_IOCTL, cmd_id, flags, data);
+}
+
+static void get_tdx_capabilities(void)
+{
+    struct kvm_tdx_capabilities *caps;
+    /* 1st generation of TDX reports 6 cpuid configs */
+    int nr_cpuid_configs = 6;
+    int r, size;
+
+    do {
+        size = sizeof(struct kvm_tdx_capabilities) +
+               nr_cpuid_configs * sizeof(struct kvm_tdx_cpuid_config);
+        caps = g_malloc0(size);
+        caps->nr_cpuid_configs = nr_cpuid_configs;
+
+        r = tdx_platform_ioctl(KVM_TDX_CAPABILITIES, 0, caps);
+        if (r == -E2BIG) {
+            g_free(caps);
+            nr_cpuid_configs *= 2;
+            if (nr_cpuid_configs > KVM_MAX_CPUID_ENTRIES) {
+                error_report("KVM TDX seems broken");
+                exit(1);
+            }
+        } else if (r < 0) {
+            g_free(caps);
+            error_report("KVM_TDX_CAPABILITIES failed: %s", strerror(-r));
+            exit(1);
+        }
+    }
+    while (r == -E2BIG);
+
+    tdx_caps = caps;
+}
+
 int tdx_kvm_init(MachineState *ms, Error **errp)
 {
+    if (!tdx_caps) {
+        get_tdx_capabilities();
+    }
+
     return 0;
 }
 
-- 
2.27.0



* [PATCH v1 07/40] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (5 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 06/40] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-25 10:16   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 08/40] i386/tdx: Adjust the supported CPUID based on TDX restrictions Xiaoyao Li
                   ` (34 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

TDX VMs need special handling throughout QEMU. Introduce the is_tdx_vm()
helper to query whether the current VM is a TDX VM.

Cache the tdx_guest object so there is no need to cast from ms->cgs every
time.
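
The caching pattern can be sketched outside QEMU as follows. The names
Guest, cached_guest, guest_init() and is_cached_vm() are hypothetical and
only stand in for TdxGuest, tdx_guest, tdx_kvm_init() and is_tdx_vm(); the
sketch assumes initialization happens once before any query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the guest object. */
typedef struct Guest {
    unsigned long attributes;
} Guest;

/* Cached at init time so callers never need to repeat the cast. */
static Guest *cached_guest;

/* Only meaningful after guest_init() has run. */
static bool is_cached_vm(void)
{
    return cached_guest != NULL;
}

static void guest_init(Guest *g)
{
    assert(g != NULL);
    cached_guest = g;
}
```

The helper then reduces to a NULL check on the cached pointer, which is
why it is only valid after initialization.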

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 13 +++++++++++++
 target/i386/kvm/tdx.h | 10 ++++++++++
 2 files changed, 23 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 89f81f7d7082..fdd6bec58758 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -20,8 +20,16 @@
 #include "kvm_i386.h"
 #include "tdx.h"
 
+static TdxGuest *tdx_guest;
+
 static struct kvm_tdx_capabilities *tdx_caps;
 
+/* It's valid after kvm_confidential_guest_init()->tdx_kvm_init() */
+bool is_tdx_vm(void)
+{
+    return !!tdx_guest;
+}
+
 enum tdx_ioctl_level{
     TDX_PLATFORM_IOCTL,
     TDX_VM_IOCTL,
@@ -108,10 +116,15 @@ static void get_tdx_capabilities(void)
 
 int tdx_kvm_init(MachineState *ms, Error **errp)
 {
+    TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
+                                                    TYPE_TDX_GUEST);
+
     if (!tdx_caps) {
         get_tdx_capabilities();
     }
 
+    tdx_guest = tdx;
+
     return 0;
 }
 
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index c8a23d95258d..4036ca2f3f99 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -1,6 +1,10 @@
 #ifndef QEMU_I386_TDX_H
 #define QEMU_I386_TDX_H
 
+#ifndef CONFIG_USER_ONLY
+#include CONFIG_DEVICES /* CONFIG_TDX */
+#endif
+
 #include "exec/confidential-guest-support.h"
 
 #define TYPE_TDX_GUEST "tdx-guest"
@@ -16,6 +20,12 @@ typedef struct TdxGuest {
     uint64_t attributes;    /* TD attributes */
 } TdxGuest;
 
+#ifdef CONFIG_TDX
+bool is_tdx_vm(void);
+#else
+#define is_tdx_vm() 0
+#endif /* CONFIG_TDX */
+
 int tdx_kvm_init(MachineState *ms, Error **errp);
 
 #endif /* QEMU_I386_TDX_H */
-- 
2.27.0



* [PATCH v1 08/40] i386/tdx: Adjust the supported CPUID based on TDX restrictions
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (6 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 07/40] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-03  7:33   ` Chenyi Qiang
  2022-08-25 11:26   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 09/40] i386/tdx: Update tdx_fixed0/1 bits by tdx_caps.cpuid_config[] Xiaoyao Li
                   ` (33 subsequent siblings)
  41 siblings, 2 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

According to Chapter "CPUID Virtualization" in TDX module spec, CPUID
bits of TD can be classified into 6 types:

------------------------------------------------------------------------
1 | As configured | configurable by VMM, independent of native value;
------------------------------------------------------------------------
2 | As configured | configurable by VMM if the bit is supported natively;
  | (if native)   | otherwise it equals the native value (0).
------------------------------------------------------------------------
3 | Fixed         | fixed to 0/1
------------------------------------------------------------------------
4 | Native        | reflect the native value
------------------------------------------------------------------------
5 | Calculated    | calculated by TDX module.
------------------------------------------------------------------------
6 | Inducing #VE  | get #VE exception
------------------------------------------------------------------------

Note:
1. All the configurable XFAM related features and TD attributes related
   features fall into type #2. And fixed0/1 bits of XFAM and TD
   attributes fall into type #3.

2. CPUID leaves not listed in the "CPUID virtualization Overview" table
   in the TDX module spec induce #VE: when they are queried, the TDX
   module injects #VE into the TD. In this case, TDs can request CPUID
   emulation from the VMM via TDVMCALL, and the values are fully
   controlled by the VMM.

Because the TDX module has its own virtualization policy for CPUID bits,
what KVM_GET_SUPPORTED_CPUID reports diverges from the CPUID bits actually
supported for TDs. To keep the CPUID configuration consistent between the
VMM and TDs, adjust the supported CPUID for TDs based on TDX restrictions.

Currently only focus on the CPUID leaves recognized by QEMU's
feature_word_info[] that are indexed by a FeatureWord.

Introduce a TDX CPUID lookup table, which maintains one entry for each
FeatureWord. Each entry has the following fields:

 - tdx_fixed0/1: The bits that are fixed as 0/1;

 - vmm_fixup:   The bits that are configurable from the view of the TDX
                module but require VMM emulation when they are configured
                as enabled. They are not supported if the VMM doesn't
                report them as supported, so they need to be fixed up by
                checking whether the VMM supports them.

 - inducing_ve: TD gets #VE when querying this CPUID leaf. The result is
                totally configurable by VMM.

 - supported_on_ve: It's valid only when @inducing_ve is true. It represents
                    the maximum feature set that can be emulated for TDs.

By applying the TDX CPUID lookup table and the TDX capabilities reported by
the TDX module, the supported CPUID for TDs can be obtained with the
following steps:

- get the base of VMM supported feature set;

- if the leaf is not a FeatureWord, just return VMM's value without
  modification;

- if the leaf is an inducing_ve type, apply the supported_on_ve mask and
  return;

- include all native bits; this covers types #2 and #4 and part of type #1
  (it also includes some unsupported bits, which the following step
  corrects);

- apply fixed0/1 to it (this covers type #3 and rectifies the previous step);

- add configurable bits (this covers the other part of type #1);

- fix up the bits in vmm_fixup;

- filter the ones that have a valid .supported field;

(The Calculated type is ignored since it's determined at runtime.)
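
The mask arithmetic behind these steps can be sketched as a standalone
function. This is a simplified illustration, not the patch itself: Lookup
and adjust_supported() are hypothetical names, the native and configurable
values are passed in as plain parameters instead of being read from the
host CPUID and tdx_caps, and the FeatureWord lookup is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced form of one lookup-table entry. */
typedef struct Lookup {
    uint32_t fixed0;          /* bits fixed to 0 */
    uint32_t fixed1;          /* bits fixed to 1 */
    uint32_t vmm_fixup;       /* configurable bits needing VMM emulation */
    bool inducing_ve;         /* leaf induces #VE, fully VMM-controlled */
    uint32_t supported_on_ve; /* mask applied when inducing_ve is set */
} Lookup;

static uint32_t adjust_supported(const Lookup *e, uint32_t vmm_cap,
                                 uint32_t native, uint32_t configurable)
{
    uint32_t ret = vmm_cap;           /* base: what the VMM supports */

    assert(e != NULL);
    if (e->inducing_ve) {
        return ret & e->supported_on_ve;
    }

    ret |= native;                    /* include all native bits */
    ret |= e->fixed1;                 /* force fixed-1 bits on */
    ret &= ~e->fixed0;                /* force fixed-0 bits off */
    ret |= configurable;              /* add unconditionally configurable bits */
    ret &= ~(~vmm_cap & e->vmm_fixup); /* drop fixup bits the VMM lacks */
    return ret;
}
```

For example, with fixed0 = 0x1, fixed1 = 0x2 and vmm_fixup = 0x8, a VMM
value of 0x10, native bits 0x0D and configurable bits 0x20 yield 0x36: bit
0 is cleared by fixed0, and bit 3 is cleared because the VMM did not report
it.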

Co-developed-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/cpu.h     |  16 +++
 target/i386/kvm/kvm.c |   4 +
 target/i386/kvm/tdx.c | 255 ++++++++++++++++++++++++++++++++++++++++++
 target/i386/kvm/tdx.h |   2 +
 4 files changed, 277 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 82004b65b944..cc9da9fc4318 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -771,6 +771,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 
 /* Support RDFSBASE/RDGSBASE/WRFSBASE/WRGSBASE */
 #define CPUID_7_0_EBX_FSGSBASE          (1U << 0)
+/* Support for TSC adjustment MSR 0x3B */
+#define CPUID_7_0_EBX_TSC_ADJUST        (1U << 1)
 /* Support SGX */
 #define CPUID_7_0_EBX_SGX               (1U << 2)
 /* 1st Group of Advanced Bit Manipulation Extensions */
@@ -789,8 +791,12 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_0_EBX_INVPCID           (1U << 10)
 /* Restricted Transactional Memory */
 #define CPUID_7_0_EBX_RTM               (1U << 11)
+/* Cache QoS Monitoring */
+#define CPUID_7_0_EBX_PQM               (1U << 12)
 /* Memory Protection Extension */
 #define CPUID_7_0_EBX_MPX               (1U << 14)
+/* Resource Director Technology Allocation */
+#define CPUID_7_0_EBX_RDT_A             (1U << 15)
 /* AVX-512 Foundation */
 #define CPUID_7_0_EBX_AVX512F           (1U << 16)
 /* AVX-512 Doubleword & Quadword Instruction */
@@ -846,10 +852,16 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_0_ECX_AVX512VNNI        (1U << 11)
 /* Support for VPOPCNT[B,W] and VPSHUFBITQMB */
 #define CPUID_7_0_ECX_AVX512BITALG      (1U << 12)
+/* Intel Total Memory Encryption */
+#define CPUID_7_0_ECX_TME               (1U << 13)
 /* POPCNT for vectors of DW/QW */
 #define CPUID_7_0_ECX_AVX512_VPOPCNTDQ  (1U << 14)
+/* Placeholder for bit 15 */
+#define CPUID_7_0_ECX_FZM               (1U << 15)
 /* 5-level Page Tables */
 #define CPUID_7_0_ECX_LA57              (1U << 16)
+/* MAWAU for MPX */
+#define CPUID_7_0_ECX_MAWAU             (31U << 17)
 /* Read Processor ID */
 #define CPUID_7_0_ECX_RDPID             (1U << 22)
 /* Bus Lock Debug Exception */
@@ -860,6 +872,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_0_ECX_MOVDIRI           (1U << 27)
 /* Move 64 Bytes as Direct Store Instruction */
 #define CPUID_7_0_ECX_MOVDIR64B         (1U << 28)
+/* ENQCMD and ENQCMDS instructions */
+#define CPUID_7_0_ECX_ENQCMD            (1U << 29)
 /* Support SGX Launch Control */
 #define CPUID_7_0_ECX_SGX_LC            (1U << 30)
 /* Protection Keys for Supervisor-mode Pages */
@@ -877,6 +891,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_0_EDX_SERIALIZE         (1U << 14)
 /* TSX Suspend Load Address Tracking instruction */
 #define CPUID_7_0_EDX_TSX_LDTRK         (1U << 16)
+/* PCONFIG instruction */
+#define CPUID_7_0_EDX_PCONFIG           (1U << 18)
 /* Architectural LBRs */
 #define CPUID_7_0_EDX_ARCH_LBR          (1U << 19)
 /* AVX512_FP16 instruction */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9e30fa9f4eb5..9930902ae890 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -492,6 +492,10 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
         ret |= 1U << KVM_HINTS_REALTIME;
     }
 
+    if (is_tdx_vm()) {
+        tdx_get_supported_cpuid(function, index, reg, &ret);
+    }
+
     return ret;
 }
 
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index fdd6bec58758..e3e9a424512e 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -14,11 +14,134 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
+#include "standard-headers/asm-x86/kvm_para.h"
 #include "sysemu/kvm.h"
 
 #include "hw/i386/x86.h"
 #include "kvm_i386.h"
 #include "tdx.h"
+#include "../cpu-internal.h"
+
+#define TDX_SUPPORTED_KVM_FEATURES  ((1U << KVM_FEATURE_NOP_IO_DELAY) | \
+                                     (1U << KVM_FEATURE_PV_UNHALT) | \
+                                     (1U << KVM_FEATURE_PV_TLB_FLUSH) | \
+                                     (1U << KVM_FEATURE_PV_SEND_IPI) | \
+                                     (1U << KVM_FEATURE_POLL_CONTROL) | \
+                                     (1U << KVM_FEATURE_PV_SCHED_YIELD) | \
+                                     (1U << KVM_FEATURE_MSI_EXT_DEST_ID))
+
+typedef struct KvmTdxCpuidLookup {
+    uint32_t tdx_fixed0;
+    uint32_t tdx_fixed1;
+
+    /*
+     * The CPUID bits that are configurable from the view of TDX module
+     * but require VMM emulation if the VMM configures them as enabled.
+     *
+     * Such bits cannot actually be enabled if the VMM (KVM/QEMU) cannot
+     * virtualize them.
+     */
+    uint32_t vmm_fixup;
+
+    bool inducing_ve;
+    /*
+     * The maximum supported feature set for given inducing-#VE leaf.
+     * It's valid only when .inducing_ve is true.
+     */
+    uint32_t supported_on_ve;
+} KvmTdxCpuidLookup;
+
+/*
+ * QEMU-maintained TDX CPUID lookup table, reflecting how CPUIDs are
+ * virtualized for guest TDs based on "CPUID virtualization" of TDX spec.
+ *
+ * Note:
+ *
+ * This table is updated at runtime from tdx_caps reported by the platform.
+ *
+ */
+static KvmTdxCpuidLookup tdx_cpuid_lookup[FEATURE_WORDS] = {
+    [FEAT_1_EDX] = {
+        .tdx_fixed0 =
+            BIT(10) | BIT(20) | CPUID_IA64,
+        .tdx_fixed1 =
+            CPUID_MSR | CPUID_PAE | CPUID_MCE | CPUID_APIC |
+            CPUID_MTRR | CPUID_MCA | CPUID_CLFLUSH | CPUID_DTS,
+        .vmm_fixup =
+            CPUID_ACPI | CPUID_PBE,
+    },
+    [FEAT_1_ECX] = {
+        .tdx_fixed0 =
+            CPUID_EXT_MONITOR | CPUID_EXT_VMX | CPUID_EXT_SMX |
+            BIT(16),
+        .tdx_fixed1 =
+            CPUID_EXT_CX16 | CPUID_EXT_PDCM | CPUID_EXT_X2APIC |
+            CPUID_EXT_AES | CPUID_EXT_XSAVE | CPUID_EXT_RDRAND |
+            CPUID_EXT_HYPERVISOR,
+        .vmm_fixup =
+            CPUID_EXT_EST | CPUID_EXT_TM2 | CPUID_EXT_XTPR | CPUID_EXT_DCA,
+    },
+    [FEAT_8000_0001_EDX] = {
+        .tdx_fixed1 =
+            CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB | CPUID_EXT2_RDTSCP |
+            CPUID_EXT2_LM,
+    },
+    [FEAT_7_0_EBX] = {
+        .tdx_fixed0 =
+            CPUID_7_0_EBX_TSC_ADJUST | CPUID_7_0_EBX_SGX | CPUID_7_0_EBX_MPX,
+        .tdx_fixed1 =
+            CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_RTM |
+            CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_SMAP |
+            CPUID_7_0_EBX_CLFLUSHOPT | CPUID_7_0_EBX_CLWB |
+            CPUID_7_0_EBX_SHA_NI,
+        .vmm_fixup =
+            CPUID_7_0_EBX_PQM | CPUID_7_0_EBX_RDT_A,
+    },
+    [FEAT_7_0_ECX] = {
+        .tdx_fixed0 =
+            CPUID_7_0_ECX_FZM | CPUID_7_0_ECX_MAWAU |
+            CPUID_7_0_ECX_ENQCMD | CPUID_7_0_ECX_SGX_LC,
+        .tdx_fixed1 =
+            CPUID_7_0_ECX_MOVDIR64B | CPUID_7_0_ECX_BUS_LOCK_DETECT,
+        .vmm_fixup =
+            CPUID_7_0_ECX_TME,
+    },
+    [FEAT_7_0_EDX] = {
+        .tdx_fixed1 =
+            CPUID_7_0_EDX_SPEC_CTRL | CPUID_7_0_EDX_ARCH_CAPABILITIES |
+            CPUID_7_0_EDX_CORE_CAPABILITY | CPUID_7_0_EDX_SPEC_CTRL_SSBD,
+        .vmm_fixup =
+            CPUID_7_0_EDX_PCONFIG,
+    },
+    [FEAT_8000_0001_EDX] = {
+        .tdx_fixed1 =
+            CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB |
+            CPUID_EXT2_RDTSCP | CPUID_EXT2_LM,
+    },
+    [FEAT_8000_0008_EBX] = {
+        .tdx_fixed0 =
+            ~CPUID_8000_0008_EBX_WBNOINVD,
+        .tdx_fixed1 =
+            CPUID_8000_0008_EBX_WBNOINVD,
+    },
+    [FEAT_XSAVE] = {
+        .tdx_fixed1 =
+            CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
+            CPUID_XSAVE_XSAVES,
+    },
+    [FEAT_6_EAX] = {
+        .inducing_ve = true,
+        .supported_on_ve = -1U,
+    },
+    [FEAT_8000_0007_EDX] = {
+        .inducing_ve = true,
+        .supported_on_ve = -1U,
+    },
+    [FEAT_KVM] = {
+        .inducing_ve = true,
+        .supported_on_ve = TDX_SUPPORTED_KVM_FEATURES,
+    },
+};
 
 static TdxGuest *tdx_guest;
 
@@ -30,6 +153,138 @@ bool is_tdx_vm(void)
     return !!tdx_guest;
 }
 
+static inline uint32_t host_cpuid_reg(uint32_t function,
+                                      uint32_t index, int reg)
+{
+    uint32_t eax, ebx, ecx, edx;
+    uint32_t ret = 0;
+
+    host_cpuid(function, index, &eax, &ebx, &ecx, &edx);
+
+    switch (reg) {
+    case R_EAX:
+        ret |= eax;
+        break;
+    case R_EBX:
+        ret |= ebx;
+        break;
+    case R_ECX:
+        ret |= ecx;
+        break;
+    case R_EDX:
+        ret |= edx;
+        break;
+    }
+    return ret;
+}
+
+static inline uint32_t tdx_cap_cpuid_config(uint32_t function,
+                                            uint32_t index, int reg)
+{
+    struct kvm_tdx_cpuid_config *cpuid_c;
+    int ret = 0;
+    int i;
+
+    if (tdx_caps->nr_cpuid_configs <= 0) {
+        return ret;
+    }
+
+    for (i = 0; i < tdx_caps->nr_cpuid_configs; i++) {
+        cpuid_c = &tdx_caps->cpuid_configs[i];
+        /* 0xffffffff in sub_leaf means the leaf doesn't require a sub-leaf */
+        if (cpuid_c->leaf == function &&
+            (cpuid_c->sub_leaf == 0xffffffff || cpuid_c->sub_leaf == index)) {
+            switch (reg) {
+            case R_EAX:
+                ret = cpuid_c->eax;
+                break;
+            case R_EBX:
+                ret = cpuid_c->ebx;
+                break;
+            case R_ECX:
+                ret = cpuid_c->ecx;
+                break;
+            case R_EDX:
+                ret = cpuid_c->edx;
+                break;
+            default:
+                return 0;
+            }
+        }
+    }
+    return ret;
+}
+
+static FeatureWord get_cpuid_featureword_index(uint32_t function,
+                                               uint32_t index, int reg)
+{
+    FeatureWord w;
+
+    for (w = 0; w < FEATURE_WORDS; w++) {
+        FeatureWordInfo *f = &feature_word_info[w];
+
+        if (f->type == MSR_FEATURE_WORD || f->cpuid.eax != function ||
+            f->cpuid.reg != reg ||
+            (f->cpuid.needs_ecx && f->cpuid.ecx != index)) {
+            continue;
+        }
+
+        return w;
+    }
+
+    return w;
+}
+
+/*
+ * TDX supported CPUID varies from what KVM reports. Adjust the result by
+ * applying the TDX restrictions.
+ */
+void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
+                             uint32_t *ret)
+{
+    uint32_t vmm_cap = *ret;
+    FeatureWord w;
+
+    /* Only handle feature leaves that are recognized by feature_word_info[] */
+    w = get_cpuid_featureword_index(function, index, reg);
+    if (w == FEATURE_WORDS) {
+        return;
+    }
+
+    if (tdx_cpuid_lookup[w].inducing_ve) {
+        *ret &= tdx_cpuid_lookup[w].supported_on_ve;
+        return;
+    }
+
+    /*
+     * Include all the native bits as first step. It covers types
+     * - As configured (if native)
+     * - Native
+     * - XFAM related and Attributes related
+     *
+     * It also has side effect to enable unsupported bits, e.g., the
+     * bits of "fixed0" type while present natively. It's safe because
+     * the unsupported bits will be masked off by .fixed0 later.
+     */
+    *ret |= host_cpuid_reg(function, index, reg);
+
+    /* Adjust according to "fixed" type in tdx_cpuid_lookup. */
+    *ret |= tdx_cpuid_lookup[w].tdx_fixed1;
+    *ret &= ~tdx_cpuid_lookup[w].tdx_fixed0;
+
+    /*
+     * Configurable cpuids are supported unconditionally. It's mainly to
+     * include those configurable regardless of native existence.
+     */
+    *ret |= tdx_cap_cpuid_config(function, index, reg);
+
+    /*
+     * Clear the configurable bits that require VMM emulation when the VMM
+     * doesn't report support for them.
+     */
+    *ret &= ~(~vmm_cap & tdx_cpuid_lookup[w].vmm_fixup);
+}
+
 enum tdx_ioctl_level{
     TDX_PLATFORM_IOCTL,
     TDX_VM_IOCTL,
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 4036ca2f3f99..06599b65b827 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -27,5 +27,7 @@ bool is_tdx_vm(void);
 #endif /* CONFIG_TDX */
 
 int tdx_kvm_init(MachineState *ms, Error **errp);
+void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
+                             uint32_t *ret);
 
 #endif /* QEMU_I386_TDX_H */
-- 
2.27.0



* [PATCH v1 09/40] i386/tdx: Update tdx_fixed0/1 bits by tdx_caps.cpuid_config[]
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (7 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 08/40] i386/tdx: Adjust the supported CPUID based on TDX restrictions Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 10/40] i386/tdx: Integrate tdx_caps->xfam_fixed0/1 into tdx_cpuid_lookup Xiaoyao Li
                   ` (32 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

tdx_cpuid_lookup[].tdx_fixed0/1 is QEMU-maintained data that reflects the
TDX restrictions regarding how some CPUID bits are virtualized by TDX. It
is derived from the TDX spec. However, TDX may change some fixed fields to
configurable in the future. Update the tdx_cpuid_lookup[].tdx_fixed0/1
fields by removing the bits that the TDX module reports as configurable.
This lets QEMU adapt to an updated TDX module automatically.
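
The update itself is a pair of mask operations. A minimal sketch, assuming
the configurable bits for one feature word have already been looked up;
drop_configurable() is a hypothetical name used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Remove from both fixed masks every bit that is reported as
 * configurable: a configurable bit is neither fixed-0 nor fixed-1.
 */
static void drop_configurable(uint32_t *fixed0, uint32_t *fixed1,
                              uint32_t config)
{
    assert(fixed0 != NULL && fixed1 != NULL);
    *fixed0 &= ~config;
    *fixed1 &= ~config;
}
```

Starting from fixed0 = 0x0F and fixed1 = 0xF0, a configurable mask of 0x3C
leaves fixed0 = 0x03 and fixed1 = 0xC0.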

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index e3e9a424512e..d12b03fa05c9 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -369,6 +369,34 @@ static void get_tdx_capabilities(void)
     tdx_caps = caps;
 }
 
+static void update_tdx_cpuid_lookup_by_tdx_caps(void)
+{
+    KvmTdxCpuidLookup *entry;
+    FeatureWordInfo *fi;
+    uint32_t config;
+    FeatureWord w;
+
+    /*
+     * Patch tdx_fixed0/1 with tdx_caps: bits that the TDX module reports as
+     * configurable are not fixed.
+     */
+    for (w = 0; w < FEATURE_WORDS; w++) {
+        fi = &feature_word_info[w];
+        entry = &tdx_cpuid_lookup[w];
+
+        if (fi->type != CPUID_FEATURE_WORD) {
+            continue;
+        }
+
+        config = tdx_cap_cpuid_config(fi->cpuid.eax,
+                                      fi->cpuid.needs_ecx ? fi->cpuid.ecx : ~0u,
+                                      fi->cpuid.reg);
+
+        entry->tdx_fixed0 &= ~config;
+        entry->tdx_fixed1 &= ~config;
+    }
+}
+
 int tdx_kvm_init(MachineState *ms, Error **errp)
 {
     TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
@@ -378,6 +406,8 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
         get_tdx_capabilities();
     }
 
+    update_tdx_cpuid_lookup_by_tdx_caps();
+
     tdx_guest = tdx;
 
     return 0;
-- 
2.27.0



* [PATCH v1 10/40] i386/tdx: Integrate tdx_caps->xfam_fixed0/1 into tdx_cpuid_lookup
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (8 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 09/40] i386/tdx: Update tdx_fixed0/1 bits by tdx_caps.cpuid_config[] Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 11/40] i386/tdx: Integrate tdx_caps->attrs_fixed0/1 to tdx_cpuid_lookup Xiaoyao Li
                   ` (31 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

KVM requires userspace to pass the XFAM configuration via CPUID leaf 0xD.

Convert tdx_caps->xfam_fixed0/1 into the corresponding
tdx_cpuid_lookup[].tdx_fixed0/1 fields of CPUID leaf 0xD, so that the
requirement is applied naturally.
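
The conversion splits each 64-bit XFAM mask into a low and a high 32-bit
half. The sketch below is an illustration only: FixedPair and
split_xfam() are hypothetical names, and the reading that a clear bit in
xfam_fixed0 means "must be 0" is inferred from the inversion used in this
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pair of per-register fixed masks. */
typedef struct FixedPair {
    uint32_t fixed0; /* bits that must be 0 */
    uint32_t fixed1; /* bits that must be 1 */
} FixedPair;

/*
 * Split 64-bit XFAM fixed masks into 32-bit halves, restricted to the
 * bits selected by `mask` (e.g. the XCR0 or XSS portion).
 */
static void split_xfam(uint64_t xfam_fixed0, uint64_t xfam_fixed1,
                       uint64_t mask, FixedPair *lo, FixedPair *hi)
{
    uint64_t must0 = ~xfam_fixed0 & mask; /* clear in fixed0: must be 0 */
    uint64_t must1 = xfam_fixed1 & mask;  /* set in fixed1: must be 1 */

    assert(lo != NULL && hi != NULL);
    lo->fixed0 = (uint32_t)must0;
    lo->fixed1 = (uint32_t)must1;
    hi->fixed0 = (uint32_t)(must0 >> 32);
    hi->fixed1 = (uint32_t)(must1 >> 32);
}
```

Masking the full 64-bit value before shifting keeps the high half correct
regardless of how wide the mask constant is.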

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/cpu.c     |  3 ---
 target/i386/cpu.h     |  3 +++
 target/i386/kvm/tdx.c | 24 ++++++++++++++++++++++++
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 194b5a31afac..45652bb2fd7c 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1418,9 +1418,6 @@ static const X86RegisterInfo32 x86_reg_info_32[CPU_NB_REGS32] = {
 };
 #undef REGISTER
 
-/* CPUID feature bits available in XSS */
-#define CPUID_XSTATE_XSS_MASK    (XSTATE_ARCH_LBR_MASK)
-
 ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT] = {
     [XSTATE_FP_BIT] = {
         /* x87 FP state component is always enabled if XSAVE is supported */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index cc9da9fc4318..90f403aecd8b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -583,6 +583,9 @@ typedef enum X86Seg {
                                  XSTATE_Hi16_ZMM_MASK | XSTATE_PKRU_MASK | \
                                  XSTATE_XTILE_CFG_MASK | XSTATE_XTILE_DATA_MASK)
 
+/* CPUID feature bits available in XSS */
+#define CPUID_XSTATE_XSS_MASK    (XSTATE_ARCH_LBR_MASK)
+
 /* CPUID feature words */
 typedef enum FeatureWord {
     FEAT_1_EDX,         /* CPUID[1].EDX */
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index d12b03fa05c9..dffaa533f899 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -395,6 +395,30 @@ static void update_tdx_cpuid_lookup_by_tdx_caps(void)
         entry->tdx_fixed0 &= ~config;
         entry->tdx_fixed1 &= ~config;
     }
+
+    /*
+     * Because KVM gets XFAM settings via CPUID leaf 0xD, map
+     * tdx_caps->xfam_fixed{0, 1} into tdx_cpuid_lookup[].tdx_fixed{0, 1}.
+     *
+     * Then the enforcement applies in tdx_get_supported_cpuid() naturally.
+     */
+    tdx_cpuid_lookup[FEAT_XSAVE_XCR0_LO].tdx_fixed0 =
+            (uint32_t)~tdx_caps->xfam_fixed0 & CPUID_XSTATE_XCR0_MASK;
+    tdx_cpuid_lookup[FEAT_XSAVE_XCR0_LO].tdx_fixed1 =
+            (uint32_t)tdx_caps->xfam_fixed1 & CPUID_XSTATE_XCR0_MASK;
+    tdx_cpuid_lookup[FEAT_XSAVE_XCR0_HI].tdx_fixed0 =
+            (~tdx_caps->xfam_fixed0 & CPUID_XSTATE_XCR0_MASK) >> 32;
+    tdx_cpuid_lookup[FEAT_XSAVE_XCR0_HI].tdx_fixed1 =
+            (tdx_caps->xfam_fixed1 & CPUID_XSTATE_XCR0_MASK) >> 32;
+
+    tdx_cpuid_lookup[FEAT_XSAVE_XSS_LO].tdx_fixed0 =
+            (uint32_t)~tdx_caps->xfam_fixed0 & CPUID_XSTATE_XSS_MASK;
+    tdx_cpuid_lookup[FEAT_XSAVE_XSS_LO].tdx_fixed1 =
+            (uint32_t)tdx_caps->xfam_fixed1 & CPUID_XSTATE_XSS_MASK;
+    tdx_cpuid_lookup[FEAT_XSAVE_XSS_HI].tdx_fixed0 =
+            (~tdx_caps->xfam_fixed0 & CPUID_XSTATE_XSS_MASK) >> 32;
+    tdx_cpuid_lookup[FEAT_XSAVE_XSS_HI].tdx_fixed1 =
+            (tdx_caps->xfam_fixed1 & CPUID_XSTATE_XSS_MASK) >> 32;
 }
 
 int tdx_kvm_init(MachineState *ms, Error **errp)
-- 
2.27.0



* [PATCH v1 11/40] i386/tdx: Integrate tdx_caps->attrs_fixed0/1 to tdx_cpuid_lookup
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (9 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 10/40] i386/tdx: Integrate tdx_caps->xfam_fixed0/1 into tdx_cpuid_lookup Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 12/40] i386/kvm: Move architectural CPUID leaf generation to separate helper Xiaoyao Li
                   ` (30 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Some bits in the TD attributes have corresponding CPUID feature bits.
Reflect the fixed0/1 restrictions on those TD attributes in their
corresponding CPUID bits in tdx_cpuid_lookup[] as well.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/cpu-internal.h |  9 +++++++++
 target/i386/cpu.c          |  9 ---------
 target/i386/cpu.h          |  2 ++
 target/i386/kvm/tdx.c      | 21 +++++++++++++++++++++
 4 files changed, 32 insertions(+), 9 deletions(-)
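
A minimal standalone sketch of the table-driven mapping this patch adds,
for readers who want to try the idea outside QEMU. The types AttrToCpuid
and CpuidFixed, the enum WORD_7_0_ECX and the function name are
hypothetical stand-ins for FeatureMask, KvmTdxCpuidLookup, FEAT_7_0_ECX
and the loop in update_tdx_cpuid_lookup_by_tdx_caps(). The PKS bit
position (ECX bit 31) is an assumption taken from the architectural CPUID
layout. The loop copies the logic as posted and takes no position on the
polarity of attrs_fixed0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's FeatureWord enum. */
enum { WORD_7_0_ECX, WORD_COUNT };

/* Hypothetical stand-in for FeatureMask: one CPUID word plus a bit mask. */
typedef struct {
    int word;
    uint32_t mask;   /* 0 means the attribute bit has no CPUID counterpart */
} AttrToCpuid;

/* Hypothetical stand-in for the fixed0/fixed1 part of KvmTdxCpuidLookup. */
typedef struct {
    uint32_t fixed0;
    uint32_t fixed1;
} CpuidFixed;

#define ATTR_BITS 64

/* Indexed by TD attribute bit number; unlisted entries are zero-initialized. */
static const AttrToCpuid attr_map[ATTR_BITS] = {
    [30] = { WORD_7_0_ECX, 1U << 31 },   /* PKS (assumed bit position) */
    [31] = { WORD_7_0_ECX, 1U << 23 },   /* KeyLocker, as in the patch */
};

static void fold_attrs_into_cpuid(uint64_t attrs_fixed0, uint64_t attrs_fixed1,
                                  CpuidFixed lookup[WORD_COUNT])
{
    assert(lookup != NULL);

    for (size_t i = 0; i < ATTR_BITS; i++) {
        const AttrToCpuid *m = &attr_map[i];

        /* Unmapped bits OR in a zero mask, so they change nothing. */
        if (attrs_fixed0 & (1ULL << i)) {
            lookup[m->word].fixed0 |= m->mask;
        }
        if (attrs_fixed1 & (1ULL << i)) {
            lookup[m->word].fixed1 |= m->mask;
        }
    }
}
```

Using a sparse array indexed by attribute bit keeps the loop free of
special cases: adding another attribute-to-CPUID pairing only needs one
more designated initializer.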

diff --git a/target/i386/cpu-internal.h b/target/i386/cpu-internal.h
index 9baac5c0b450..e980f6e3147f 100644
--- a/target/i386/cpu-internal.h
+++ b/target/i386/cpu-internal.h
@@ -20,6 +20,15 @@
 #ifndef I386_CPU_INTERNAL_H
 #define I386_CPU_INTERNAL_H
 
+typedef struct FeatureMask {
+    FeatureWord index;
+    uint64_t mask;
+} FeatureMask;
+
+typedef struct FeatureDep {
+    FeatureMask from, to;
+} FeatureDep;
+
 typedef enum FeatureWordType {
    CPUID_FEATURE_WORD,
    MSR_FEATURE_WORD,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 45652bb2fd7c..e5c1ffcb138a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1289,15 +1289,6 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
     },
 };
 
-typedef struct FeatureMask {
-    FeatureWord index;
-    uint64_t mask;
-} FeatureMask;
-
-typedef struct FeatureDep {
-    FeatureMask from, to;
-} FeatureDep;
-
 static FeatureDep feature_dependencies[] = {
     {
         .from = { FEAT_7_0_EDX,             CPUID_7_0_EDX_ARCH_CAPABILITIES },
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 90f403aecd8b..8f4de62b02e9 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -867,6 +867,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_0_ECX_MAWAU             (31U << 17)
 /* Read Processor ID */
 #define CPUID_7_0_ECX_RDPID             (1U << 22)
+/* KeyLocker */
+#define CPUID_7_0_ECX_KeyLocker         (1U << 23)
 /* Bus Lock Debug Exception */
 #define CPUID_7_0_ECX_BUS_LOCK_DETECT   (1U << 24)
 /* Cache Line Demote Instruction */
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index dffaa533f899..6fe47cf4e29e 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -30,6 +30,13 @@
                                      (1U << KVM_FEATURE_PV_SCHED_YIELD) | \
                                      (1U << KVM_FEATURE_MSI_EXT_DEST_ID))
 
+#define TDX_ATTRIBUTES_MAX_BITS      64
+
+static FeatureMask tdx_attrs_ctrl_fields[TDX_ATTRIBUTES_MAX_BITS] = {
+    [30] = { .index = FEAT_7_0_ECX, .mask = CPUID_7_0_ECX_PKS },
+    [31] = { .index = FEAT_7_0_ECX, .mask = CPUID_7_0_ECX_KeyLocker },
+};
+
 typedef struct KvmTdxCpuidLookup {
     uint32_t tdx_fixed0;
     uint32_t tdx_fixed1;
@@ -375,6 +382,8 @@ static void update_tdx_cpuid_lookup_by_tdx_caps(void)
     FeatureWordInfo *fi;
     uint32_t config;
     FeatureWord w;
+    FeatureMask *fm;
+    int i;
 
     /*
      * Patch tdx_fixed0/1 by tdx_caps that what TDX module reports as
@@ -396,6 +405,18 @@ static void update_tdx_cpuid_lookup_by_tdx_caps(void)
         entry->tdx_fixed1 &= ~config;
     }
 
+    for (i = 0; i < ARRAY_SIZE(tdx_attrs_ctrl_fields); i++) {
+        fm = &tdx_attrs_ctrl_fields[i];
+
+        if (tdx_caps->attrs_fixed0 & (1ULL << i)) {
+            tdx_cpuid_lookup[fm->index].tdx_fixed0 |= fm->mask;
+        }
+
+        if (tdx_caps->attrs_fixed1 & (1ULL << i)) {
+            tdx_cpuid_lookup[fm->index].tdx_fixed1 |= fm->mask;
+        }
+    }
+
     /*
      * Because KVM gets XFAM settings via CPUID leaves 0xD,  map
      * tdx_caps->xfam_fixed{0, 1} into tdx_cpuid_lookup[].tdx_fixed{0, 1}.
-- 
2.27.0



* [PATCH v1 12/40] i386/kvm: Move architectural CPUID leaf generation to separate helper
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (10 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 11/40] i386/tdx: Integrate tdx_caps->attrs_fixed0/1 to tdx_cpuid_lookup Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 13/40] KVM: Introduce kvm_arch_pre_create_vcpu() Xiaoyao Li
                   ` (29 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

From: Sean Christopherson <sean.j.christopherson@intel.com>

Move the architectural (for lack of a better term) CPUID leaf generation
to a separate helper so that the generation code can be reused by TDX,
which needs to generate a canonical VM-scoped configuration.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c      | 220 +++++++++++++++++++------------------
 target/i386/kvm/kvm_i386.h |   3 +
 2 files changed, 118 insertions(+), 105 deletions(-)
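
A minimal sketch of the calling convention the new helper uses: the caller
owns the entry array, passes in the index of the next free slot, and gets
back the updated count. Everything here is hypothetical (struct entry,
MAX_ENTRIES, fill_arch_leaves); it only illustrates the shape of
kvm_x86_arch_cpuid(), not its real leaf handling.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ENTRIES 8   /* hypothetical; QEMU uses KVM_MAX_CPUID_ENTRIES */

/* Hypothetical stand-in for struct kvm_cpuid_entry2. */
struct entry {
    uint32_t function;
    uint32_t eax;
};

/*
 * Append leaves 0..limit to entries[], starting at slot "next".
 * Returns the index of the first unused slot, i.e. the new entry count.
 */
static uint32_t fill_arch_leaves(struct entry *entries, uint32_t next,
                                 uint32_t limit)
{
    for (uint32_t i = 0; i <= limit; i++) {
        /* The real helper aborts when the array is full. */
        assert(next < MAX_ENTRIES);
        entries[next].function = i;
        entries[next].eax = 0;
        next++;
    }
    return next;
}
```

A per-vCPU caller can pre-fill hypervisor leaves and pass a non-zero
start index, while a VM-scoped caller such as TDX can start from 0 and
use the return value directly as nent.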

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9930902ae890..9c0d5be5cc23 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1728,115 +1728,21 @@ static void kvm_init_xsave(CPUX86State *env)
            env->xsave_buf_len);
 }
 
-int kvm_arch_init_vcpu(CPUState *cs)
+uint32_t kvm_x86_arch_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
+                            uint32_t cpuid_i)
 {
-    struct {
-        struct kvm_cpuid2 cpuid;
-        struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
-    } cpuid_data;
-    /*
-     * The kernel defines these structs with padding fields so there
-     * should be no extra padding in our cpuid_data struct.
-     */
-    QEMU_BUILD_BUG_ON(sizeof(cpuid_data) !=
-                      sizeof(struct kvm_cpuid2) +
-                      sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
-
-    X86CPU *cpu = X86_CPU(cs);
-    CPUX86State *env = &cpu->env;
-    uint32_t limit, i, j, cpuid_i;
+    uint32_t limit, i, j;
     uint32_t unused;
     struct kvm_cpuid_entry2 *c;
-    uint32_t signature[3];
-    int kvm_base = KVM_CPUID_SIGNATURE;
-    int max_nested_state_len;
-    int r;
-    Error *local_err = NULL;
-
-    memset(&cpuid_data, 0, sizeof(cpuid_data));
-
-    cpuid_i = 0;
-
-    has_xsave2 = kvm_check_extension(cs->kvm_state, KVM_CAP_XSAVE2);
-
-    r = kvm_arch_set_tsc_khz(cs);
-    if (r < 0) {
-        return r;
-    }
-
-    /* vcpu's TSC frequency is either specified by user, or following
-     * the value used by KVM if the former is not present. In the
-     * latter case, we query it from KVM and record in env->tsc_khz,
-     * so that vcpu's TSC frequency can be migrated later via this field.
-     */
-    if (!env->tsc_khz) {
-        r = kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ?
-            kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) :
-            -ENOTSUP;
-        if (r > 0) {
-            env->tsc_khz = r;
-        }
-    }
-
-    env->apic_bus_freq = KVM_APIC_BUS_FREQUENCY;
-
-    /*
-     * kvm_hyperv_expand_features() is called here for the second time in case
-     * KVM_CAP_SYS_HYPERV_CPUID is not supported. While we can't possibly handle
-     * 'query-cpu-model-expansion' in this case as we don't have a KVM vCPU to
-     * check which Hyper-V enlightenments are supported and which are not, we
-     * can still proceed and check/expand Hyper-V enlightenments here so legacy
-     * behavior is preserved.
-     */
-    if (!kvm_hyperv_expand_features(cpu, &local_err)) {
-        error_report_err(local_err);
-        return -ENOSYS;
-    }
-
-    if (hyperv_enabled(cpu)) {
-        r = hyperv_init_vcpu(cpu);
-        if (r) {
-            return r;
-        }
-
-        cpuid_i = hyperv_fill_cpuids(cs, cpuid_data.entries);
-        kvm_base = KVM_CPUID_SIGNATURE_NEXT;
-        has_msr_hv_hypercall = true;
-    }
-
-    if (cpu->expose_kvm) {
-        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
-        c = &cpuid_data.entries[cpuid_i++];
-        c->function = KVM_CPUID_SIGNATURE | kvm_base;
-        c->eax = KVM_CPUID_FEATURES | kvm_base;
-        c->ebx = signature[0];
-        c->ecx = signature[1];
-        c->edx = signature[2];
-
-        c = &cpuid_data.entries[cpuid_i++];
-        c->function = KVM_CPUID_FEATURES | kvm_base;
-        c->eax = env->features[FEAT_KVM];
-        c->edx = env->features[FEAT_KVM_HINTS];
-    }
 
     cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
 
-    if (cpu->kvm_pv_enforce_cpuid) {
-        r = kvm_vcpu_enable_cap(cs, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, 0, 1);
-        if (r < 0) {
-            fprintf(stderr,
-                    "failed to enable KVM_CAP_ENFORCE_PV_FEATURE_CPUID: %s",
-                    strerror(-r));
-            abort();
-        }
-    }
-
     for (i = 0; i <= limit; i++) {
         if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
             fprintf(stderr, "unsupported level value: 0x%x\n", limit);
             abort();
         }
-        c = &cpuid_data.entries[cpuid_i++];
+        c = &entries[cpuid_i++];
 
         switch (i) {
         case 2: {
@@ -1855,7 +1761,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                             "cpuid(eax:2):eax & 0xf = 0x%x\n", times);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
                 c->function = i;
                 c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC;
                 cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
@@ -1901,7 +1807,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                             "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
             }
             break;
         case 0x7:
@@ -1921,7 +1827,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                                 "cpuid(eax:0x12,ecx:0x%x)\n", j);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
             }
             break;
         case 0x14:
@@ -1941,7 +1847,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                                 "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
                 c->function = i;
                 c->index = j;
                 c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
@@ -1998,7 +1904,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
             fprintf(stderr, "unsupported xlevel value: 0x%x\n", limit);
             abort();
         }
-        c = &cpuid_data.entries[cpuid_i++];
+        c = &entries[cpuid_i++];
 
         switch (i) {
         case 0x8000001d:
@@ -2017,7 +1923,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                             "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
             }
             break;
         default:
@@ -2044,7 +1950,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                 fprintf(stderr, "unsupported xlevel2 value: 0x%x\n", limit);
                 abort();
             }
-            c = &cpuid_data.entries[cpuid_i++];
+            c = &entries[cpuid_i++];
 
             c->function = i;
             c->flags = 0;
@@ -2052,6 +1958,110 @@ int kvm_arch_init_vcpu(CPUState *cs)
         }
     }
 
+    return cpuid_i;
+}
+
+int kvm_arch_init_vcpu(CPUState *cs)
+{
+    struct {
+        struct kvm_cpuid2 cpuid;
+        struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
+    } cpuid_data;
+    /*
+     * The kernel defines these structs with padding fields so there
+     * should be no extra padding in our cpuid_data struct.
+     */
+    QEMU_BUILD_BUG_ON(sizeof(cpuid_data) !=
+                      sizeof(struct kvm_cpuid2) +
+                      sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
+
+    X86CPU *cpu = X86_CPU(cs);
+    CPUX86State *env = &cpu->env;
+    uint32_t cpuid_i;
+    struct kvm_cpuid_entry2 *c;
+    uint32_t signature[3];
+    int kvm_base = KVM_CPUID_SIGNATURE;
+    int max_nested_state_len;
+    int r;
+    Error *local_err = NULL;
+
+    memset(&cpuid_data, 0, sizeof(cpuid_data));
+
+    cpuid_i = 0;
+
+    has_xsave2 = kvm_check_extension(cs->kvm_state, KVM_CAP_XSAVE2);
+
+    r = kvm_arch_set_tsc_khz(cs);
+    if (r < 0) {
+        return r;
+    }
+
+    /* vcpu's TSC frequency is either specified by user, or following
+     * the value used by KVM if the former is not present. In the
+     * latter case, we query it from KVM and record in env->tsc_khz,
+     * so that vcpu's TSC frequency can be migrated later via this field.
+     */
+    if (!env->tsc_khz) {
+        r = kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ?
+            kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) :
+            -ENOTSUP;
+        if (r > 0) {
+            env->tsc_khz = r;
+        }
+    }
+
+    env->apic_bus_freq = KVM_APIC_BUS_FREQUENCY;
+
+    /*
+     * kvm_hyperv_expand_features() is called here for the second time in case
+     * KVM_CAP_SYS_HYPERV_CPUID is not supported. While we can't possibly handle
+     * 'query-cpu-model-expansion' in this case as we don't have a KVM vCPU to
+     * check which Hyper-V enlightenments are supported and which are not, we
+     * can still proceed and check/expand Hyper-V enlightenments here so legacy
+     * behavior is preserved.
+     */
+    if (!kvm_hyperv_expand_features(cpu, &local_err)) {
+        error_report_err(local_err);
+        return -ENOSYS;
+    }
+
+    if (hyperv_enabled(cpu)) {
+        r = hyperv_init_vcpu(cpu);
+        if (r) {
+            return r;
+        }
+
+        cpuid_i = hyperv_fill_cpuids(cs, cpuid_data.entries);
+        kvm_base = KVM_CPUID_SIGNATURE_NEXT;
+        has_msr_hv_hypercall = true;
+    }
+
+    if (cpu->expose_kvm) {
+        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
+        c = &cpuid_data.entries[cpuid_i++];
+        c->function = KVM_CPUID_SIGNATURE | kvm_base;
+        c->eax = KVM_CPUID_FEATURES | kvm_base;
+        c->ebx = signature[0];
+        c->ecx = signature[1];
+        c->edx = signature[2];
+
+        c = &cpuid_data.entries[cpuid_i++];
+        c->function = KVM_CPUID_FEATURES | kvm_base;
+        c->eax = env->features[FEAT_KVM];
+        c->edx = env->features[FEAT_KVM_HINTS];
+    }
+
+    if (cpu->kvm_pv_enforce_cpuid) {
+        r = kvm_vcpu_enable_cap(cs, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, 0, 1);
+        if (r < 0) {
+            fprintf(stderr,
+                    "failed to enable KVM_CAP_ENFORCE_PV_FEATURE_CPUID: %s",
+                    strerror(-r));
+            abort();
+        }
+    }
+
+    cpuid_i = kvm_x86_arch_cpuid(env, cpuid_data.entries, cpuid_i);
     cpuid_data.cpuid.nent = cpuid_i;
 
     if (((env->cpuid_version >> 8)&0xF) >= 6
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index 6b24ab2a7813..c77dd7a95a7c 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -26,6 +26,9 @@
 #define kvm_ioapic_in_kernel() \
     (kvm_irqchip_in_kernel() && !kvm_irqchip_is_split())
 
+uint32_t kvm_x86_arch_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
+                            uint32_t cpuid_i);
+
 #else
 
 #define kvm_pit_in_kernel()      0
-- 
2.27.0



* [PATCH v1 13/40] KVM: Introduce kvm_arch_pre_create_vcpu()
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (11 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 12/40] i386/kvm: Move architectural CPUID leaf generation to separate helper Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-25 11:28   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 14/40] i386/tdx: Initialize TDX before creating TD vcpus Xiaoyao Li
                   ` (28 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Introduce kvm_arch_pre_create_vcpu() to perform arch-dependent work
before any vcpu is created. This is for i386 TDX, which needs to call
KVM_TDX_INIT_VM before creating any vcpu.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c  | 12 ++++++++++++
 include/sysemu/kvm.h |  1 +
 2 files changed, 13 insertions(+)
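
A minimal sketch of the weak-default hook pattern used here, assuming a
GCC or Clang toolchain (the weak attribute is a compiler extension). The
names struct cpu, arch_pre_create_vcpu and init_vcpu are hypothetical
stand-ins for CPUState, kvm_arch_pre_create_vcpu() and kvm_init_vcpu().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CPUState. */
struct cpu {
    int index;
};

/*
 * Weak default: does nothing and reports success. An architecture that
 * needs extra setup provides a normal (strong) definition with the same
 * name in its own file, and the linker picks that one instead.
 */
int __attribute__((weak)) arch_pre_create_vcpu(struct cpu *cpu)
{
    (void)cpu;
    return 0;
}

static int init_vcpu(struct cpu *cpu)
{
    int ret;

    assert(cpu != NULL);

    ret = arch_pre_create_vcpu(cpu);
    if (ret < 0) {
        return ret;   /* propagate the negative errno-style code */
    }

    /* ... the vCPU itself would be created here ... */
    return 0;
}
```

The weak default means architectures without such a requirement need no
stub of their own.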

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 46e609570ce1..c26d602f5476 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -460,6 +460,11 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
     return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
 }
 
+int __attribute__ ((weak)) kvm_arch_pre_create_vcpu(CPUState *cpu)
+{
+    return 0;
+}
+
 int kvm_init_vcpu(CPUState *cpu, Error **errp)
 {
     KVMState *s = kvm_state;
@@ -468,6 +473,13 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
     trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
+    ret = kvm_arch_pre_create_vcpu(cpu);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "kvm_init_vcpu: kvm_arch_pre_create_vcpu() failed");
+        goto err;
+    }
+
     ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
     if (ret < 0) {
         error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index efd6dee818f2..e3159e1e711d 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -373,6 +373,7 @@ int kvm_arch_put_registers(CPUState *cpu, int level);
 
 int kvm_arch_init(MachineState *ms, KVMState *s);
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu);
 int kvm_arch_init_vcpu(CPUState *cpu);
 int kvm_arch_destroy_vcpu(CPUState *cpu);
 
-- 
2.27.0



* [PATCH v1 14/40] i386/tdx: Initialize TDX before creating TD vcpus
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (12 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 13/40] KVM: Introduce kvm_arch_pre_create_vcpu() Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-25 11:29   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object Xiaoyao Li
                   ` (27 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Invoke KVM_TDX_INIT_VM in kvm_arch_pre_create_vcpu(). KVM_TDX_INIT_VM
configures global TD state, e.g. the canonical CPUID config, and must
be executed before any vCPU is created.

Use kvm_x86_arch_cpuid() to set up the CPUID settings for the TDX VM and
tie x86cpu->enable_pmu with TD's attributes.

Note, this doesn't address the fact that QEMU may change the CPUID
configuration when creating vCPUs, i.e. punts on refactoring QEMU to
provide a stable CPUID config prior to kvm_arch_init().

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c        |  9 ++++++++-
 target/i386/kvm/kvm.c      |  8 ++++++++
 target/i386/kvm/tdx-stub.c |  5 +++++
 target/i386/kvm/tdx.c      | 34 ++++++++++++++++++++++++++++++++++
 target/i386/kvm/tdx.h      |  4 ++++
 5 files changed, 59 insertions(+), 1 deletion(-)
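
A minimal sketch of the run-once pattern in tdx_pre_create_vcpu(): the
hook is called for every vCPU, but the VM-scoped initialization must
happen exactly once, so a flag guarded by a mutex decides who does the
work. This uses POSIX threads rather than QemuMutex, and struct vm_state,
do_vm_init and pre_create_vcpu are hypothetical names. do_vm_init stands
in for the KVM_TDX_INIT_VM ioctl.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lock/initialized fields of TdxGuest. */
struct vm_state {
    pthread_mutex_t lock;
    bool initialized;
    int init_calls;     /* only here so the behavior can be observed */
};

/* Placeholder for the VM-scoped initialization call. */
static int do_vm_init(struct vm_state *vm)
{
    vm->init_calls++;
    return 0;
}

static int pre_create_vcpu(struct vm_state *vm)
{
    int r = 0;

    assert(vm != NULL);

    pthread_mutex_lock(&vm->lock);
    if (!vm->initialized) {
        r = do_vm_init(vm);
        if (r == 0) {
            /* Set the flag only on success so a later caller can retry. */
            vm->initialized = true;
        }
    }
    pthread_mutex_unlock(&vm->lock);

    return r;
}
```

Leaving the flag clear on failure matches the posted code, where the
error path jumps to "out" before tdx_guest->initialized is set.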

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c26d602f5476..c1348c380680 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -473,10 +473,17 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
     trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
+    /*
+     * tdx_pre_create_vcpu() may call cpu_x86_cpuid(). It in turn may call
+     * kvm_vm_ioctl(). Set cpu->kvm_state in advance to avoid NULL pointer
+     * dereference.
+     */
+    cpu->kvm_state = s;
     ret = kvm_arch_pre_create_vcpu(cpu);
     if (ret < 0) {
         error_setg_errno(errp, -ret,
                          "kvm_init_vcpu: kvm_arch_pre_create_vcpu() failed");
+        cpu->kvm_state = NULL;
         goto err;
     }
 
@@ -484,11 +491,11 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
     if (ret < 0) {
         error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
                          kvm_arch_vcpu_id(cpu));
+        cpu->kvm_state = NULL;
         goto err;
     }
 
     cpu->kvm_fd = ret;
-    cpu->kvm_state = s;
     cpu->vcpu_dirty = true;
     cpu->dirty_pages = 0;
     cpu->throttle_us_per_full = 0;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9c0d5be5cc23..4f491f871f3e 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2197,6 +2197,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
     return r;
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu)
+{
+    if (is_tdx_vm())
+        return tdx_pre_create_vcpu(cpu);
+
+    return 0;
+}
+
 int kvm_arch_destroy_vcpu(CPUState *cs)
 {
     X86CPU *cpu = X86_CPU(cs);
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
index 1df24735201e..2871de9d7b56 100644
--- a/target/i386/kvm/tdx-stub.c
+++ b/target/i386/kvm/tdx-stub.c
@@ -7,3 +7,8 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
 {
     return -EINVAL;
 }
+
+int tdx_pre_create_vcpu(CPUState *cpu)
+{
+    return -EINVAL;
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 6fe47cf4e29e..ecb0205651bd 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -458,6 +458,38 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
     return 0;
 }
 
+int tdx_pre_create_vcpu(CPUState *cpu)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    struct kvm_tdx_init_vm init_vm;
+    int r = 0;
+
+    qemu_mutex_lock(&tdx_guest->lock);
+    if (tdx_guest->initialized) {
+        goto out;
+    }
+
+    memset(&init_vm, 0, sizeof(init_vm));
+    init_vm.cpuid.nent = kvm_x86_arch_cpuid(env, init_vm.entries, 0);
+
+    init_vm.attributes = tdx_guest->attributes;
+    init_vm.max_vcpus = ms->smp.cpus;
+
+    r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);
+    if (r < 0) {
+        error_report("KVM_TDX_INIT_VM failed %s", strerror(-r));
+        goto out;
+    }
+
+    tdx_guest->initialized = true;
+
+out:
+    qemu_mutex_unlock(&tdx_guest->lock);
+    return r;
+}
+
 /* tdx guest */
 OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
                                    tdx_guest,
@@ -470,6 +502,8 @@ static void tdx_guest_init(Object *obj)
 {
     TdxGuest *tdx = TDX_GUEST(obj);
 
+    qemu_mutex_init(&tdx->lock);
+
     tdx->attributes = 0;
 }
 
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 06599b65b827..46a24ee8c7cc 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -17,6 +17,9 @@ typedef struct TdxGuestClass {
 typedef struct TdxGuest {
     ConfidentialGuestSupport parent_obj;
 
+    QemuMutex lock;
+
+    bool initialized;
     uint64_t attributes;    /* TD attributes */
 } TdxGuest;
 
@@ -29,5 +32,6 @@ bool is_tdx_vm(void);
 int tdx_kvm_init(MachineState *ms, Error **errp);
 void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
                              uint32_t *ret);
+int tdx_pre_create_vcpu(CPUState *cpu);
 
 #endif /* QEMU_I386_TDX_H */
-- 
2.27.0



* [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (13 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 14/40] i386/tdx: Initialize TDX before creating TD vcpus Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-25 11:36   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 16/40] i386/tdx: Wire CPU features up with attributes of TD guest Xiaoyao Li
                   ` (26 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Bit 28 of the TD attributes, named SEPT_VE_DISABLE, disables the
conversion of EPT violations to #VE on guest TD access to PENDING pages
when set to 1. Some guest OSes (e.g., the Linux TD guest) may require
this bit to be set to 1 and refuse to boot otherwise.

Add a sept-ve-disable property to the tdx-guest object so that the user
can configure this bit.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 qapi/qom.json         |  4 +++-
 target/i386/kvm/tdx.c | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)
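
A minimal sketch of what the property accessors do, stripped of the QOM
plumbing: a boolean view onto bit 28 of a 64-bit attributes word. The
function and macro names are hypothetical. The real accessors take an
Object pointer and operate on TdxGuest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ATTR_SEPT_VE_DISABLE (1ULL << 28)   /* bit 28, as in the patch */

static bool get_sept_ve_disable(const uint64_t *attributes)
{
    assert(attributes != NULL);
    return (*attributes & ATTR_SEPT_VE_DISABLE) != 0;
}

static void set_sept_ve_disable(uint64_t *attributes, bool value)
{
    assert(attributes != NULL);
    if (value) {
        *attributes |= ATTR_SEPT_VE_DISABLE;
    } else {
        /* Clear only bit 28 and leave every other attribute untouched. */
        *attributes &= ~ATTR_SEPT_VE_DISABLE;
    }
}
```

On the command line this would presumably be set with something like
"-object tdx-guest,id=tdx0,sept-ve-disable=on", where the id value is
arbitrary.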

diff --git a/qapi/qom.json b/qapi/qom.json
index 38177848abc1..2a5486bfed3e 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -835,10 +835,12 @@
 #
 # Properties for tdx-guest objects.
 #
+# @sept-ve-disable: bit 28 of TD attributes (default: 0)
+#
 # Since: 7.2
 ##
 { 'struct': 'TdxGuestProperties',
-  'data': { }}
+  'data': { '*sept-ve-disable': 'bool' } }
 
 ##
 # @ObjectType:
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index ecb0205651bd..bf57f270ac9d 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -30,6 +30,8 @@
                                      (1U << KVM_FEATURE_PV_SCHED_YIELD) | \
                                      (1U << KVM_FEATURE_MSI_EXT_DEST_ID))
 
+#define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE   BIT_ULL(28)
+
 #define TDX_ATTRIBUTES_MAX_BITS      64
 
 static FeatureMask tdx_attrs_ctrl_fields[TDX_ATTRIBUTES_MAX_BITS] = {
@@ -490,6 +492,24 @@ out:
     return r;
 }
 
+static bool tdx_guest_get_sept_ve_disable(Object *obj, Error **errp)
+{
+    TdxGuest *tdx = TDX_GUEST(obj);
+
+    return !!(tdx->attributes & TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE);
+}
+
+static void tdx_guest_set_sept_ve_disable(Object *obj, bool value, Error **errp)
+{
+    TdxGuest *tdx = TDX_GUEST(obj);
+
+    if (value) {
+        tdx->attributes |= TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE;
+    } else {
+        tdx->attributes &= ~TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE;
+    }
+}
+
 /* tdx guest */
 OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
                                    tdx_guest,
@@ -505,6 +525,10 @@ static void tdx_guest_init(Object *obj)
     qemu_mutex_init(&tdx->lock);
 
     tdx->attributes = 0;
+
+    object_property_add_bool(obj, "sept-ve-disable",
+                             tdx_guest_get_sept_ve_disable,
+                             tdx_guest_set_sept_ve_disable);
 }
 
 static void tdx_guest_finalize(Object *obj)
-- 
2.27.0



* [PATCH v1 16/40] i386/tdx: Wire CPU features up with attributes of TD guest
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (14 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-25 11:38   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 17/40] i386/tdx: Validate TD attributes Xiaoyao Li
                   ` (25 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

For QEMU VMs, PKS is configured via CPUID_7_0_ECX_PKS and PMU is
configured by x86cpu->enable_pmu. Reuse the existing configuration
interface for TDX VMs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
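
A minimal sketch of the derivation done by setup_td_guest_attributes():
two existing CPU-level settings are translated into TD attribute bits.
struct td_guest and the macro names are hypothetical, and the PKS CPUID
bit position (ECX bit 31) is an assumption taken from the architectural
CPUID layout. The attribute bit numbers 30 and 63 are the ones the patch
defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_ECX_PKS   (1U << 31)    /* assumed CPUID.(7,0):ECX bit */
#define ATTR_PKS        (1ULL << 30)
#define ATTR_PERFMON    (1ULL << 63)

/* Hypothetical stand-in for the attributes field of TdxGuest. */
struct td_guest {
    uint64_t attributes;
};

static void setup_td_attributes(struct td_guest *td, uint32_t feat_7_0_ecx,
                                bool enable_pmu)
{
    assert(td != NULL);

    /* Bits are only ever ORed in, so user-set attributes are preserved. */
    if (feat_7_0_ecx & CPUID_ECX_PKS) {
        td->attributes |= ATTR_PKS;
    }
    if (enable_pmu) {
        td->attributes |= ATTR_PERFMON;
    }
}
```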

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index bf57f270ac9d..f2372002077d 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -31,6 +31,8 @@
                                      (1U << KVM_FEATURE_MSI_EXT_DEST_ID))
 
 #define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE   BIT_ULL(28)
+#define TDX_TD_ATTRIBUTES_PKS               BIT_ULL(30)
+#define TDX_TD_ATTRIBUTES_PERFMON           BIT_ULL(63)
 
 #define TDX_ATTRIBUTES_MAX_BITS      64
 
@@ -460,6 +462,15 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
     return 0;
 }
 
+static void setup_td_guest_attributes(X86CPU *x86cpu)
+{
+    CPUX86State *env = &x86cpu->env;
+
+    tdx_guest->attributes |= (env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_PKS) ?
+                             TDX_TD_ATTRIBUTES_PKS : 0;
+    tdx_guest->attributes |= x86cpu->enable_pmu ? TDX_TD_ATTRIBUTES_PERFMON : 0;
+}
+
 int tdx_pre_create_vcpu(CPUState *cpu)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
@@ -473,6 +484,8 @@ int tdx_pre_create_vcpu(CPUState *cpu)
         goto out;
     }
 
+    setup_td_guest_attributes(x86cpu);
+
     memset(&init_vm, 0, sizeof(init_vm));
     init_vm.cpuid.nent = kvm_x86_arch_cpuid(env, init_vm.entries, 0);
 
-- 
2.27.0



* [PATCH v1 17/40] i386/tdx: Validate TD attributes
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (15 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 16/40] i386/tdx: Wire CPU features up with attributes of TD guest Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-25 11:39   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 18/40] i386/tdx: Implement user specified tsc frequency Xiaoyao Li
                   ` (24 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Validate the TD attributes against tdx_caps: fixed-0 bits must be zero
and fixed-1 bits must be set.

In addition, reject attribute bits that QEMU does not support yet, e.g.
the debug bit. It will be allowed in the future when debug TD support
lands in QEMU.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index f2372002077d..42cef484c574 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -30,6 +30,7 @@
                                      (1U << KVM_FEATURE_PV_SCHED_YIELD) | \
                                      (1U << KVM_FEATURE_MSI_EXT_DEST_ID))
 
+#define TDX_TD_ATTRIBUTES_DEBUG             BIT_ULL(0)
 #define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE   BIT_ULL(28)
 #define TDX_TD_ATTRIBUTES_PKS               BIT_ULL(30)
 #define TDX_TD_ATTRIBUTES_PERFMON           BIT_ULL(63)
@@ -462,13 +463,32 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
     return 0;
 }
 
-static void setup_td_guest_attributes(X86CPU *x86cpu)
+static int tdx_validate_attributes(TdxGuest *tdx)
+{
+    if (((tdx->attributes & tdx_caps->attrs_fixed0) | tdx_caps->attrs_fixed1) !=
+        tdx->attributes) {
+        error_report("Invalid attributes 0x%" PRIx64 " for TDX VM (fixed0 0x%llx, fixed1 0x%llx)",
+                     tdx->attributes, tdx_caps->attrs_fixed0, tdx_caps->attrs_fixed1);
+        return -EINVAL;
+    }
+
+    if (tdx->attributes & TDX_TD_ATTRIBUTES_DEBUG) {
+        error_report("Current QEMU doesn't support attributes.debug[bit 0] for TDX VM");
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static int setup_td_guest_attributes(X86CPU *x86cpu)
 {
     CPUX86State *env = &x86cpu->env;
 
     tdx_guest->attributes |= (env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_PKS) ?
                              TDX_TD_ATTRIBUTES_PKS : 0;
     tdx_guest->attributes |= x86cpu->enable_pmu ? TDX_TD_ATTRIBUTES_PERFMON : 0;
+
+    return tdx_validate_attributes(tdx_guest);
 }
 
 int tdx_pre_create_vcpu(CPUState *cpu)
@@ -484,7 +504,10 @@ int tdx_pre_create_vcpu(CPUState *cpu)
         goto out;
     }
 
-    setup_td_guest_attributes(x86cpu);
+    r = setup_td_guest_attributes(x86cpu);
+    if (r) {
+        goto out;
+    }
 
     memset(&init_vm, 0, sizeof(init_vm));
     init_vm.cpuid.nent = kvm_x86_arch_cpuid(env, init_vm.entries, 0);
-- 
2.27.0



* [PATCH v1 18/40] i386/tdx: Implement user specified tsc frequency
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (16 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 17/40] i386/tdx: Validate TD attributes Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-25 11:41   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 19/40] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM Xiaoyao Li
                   ` (23 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Reuse "-cpu,tsc-frequency=" to get the TSC frequency requested by the user,
and call the VM-scope KVM_SET_TSC_KHZ to set the TSC frequency of the TD
before KVM_TDX_INIT_VM.

In addition, sanity check that the TSC frequency is within the legal range
and has the legal granularity (both required by the TDX module).
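
As a rough sketch of the range and granularity rules, the check can be
written as a pure function. The name check_td_tsc_khz() and the macro names
are hypothetical; the limits (100 MHz to 10 GHz, multiples of 25 MHz) are
the ones used in this patch, and 0 is treated as "not specified".

```c
#include <stdint.h>

#define SKETCH_MIN_TSC_KHZ   (100 * 1000)         /* 100 MHz */
#define SKETCH_MAX_TSC_KHZ   (10 * 1000 * 1000)   /* 10 GHz */
#define SKETCH_TSC_STEP_KHZ  (25 * 1000)          /* 25 MHz */

/*
 * Hypothetical helper, not part of the patch.
 * Returns 0 if tsc_khz is acceptable for a TD, -1 otherwise.
 * A value of 0 means the user did not request a frequency.
 */
static int check_td_tsc_khz(int64_t tsc_khz)
{
    if (tsc_khz == 0) {
        return 0;
    }
    if (tsc_khz < SKETCH_MIN_TSC_KHZ || tsc_khz > SKETCH_MAX_TSC_KHZ) {
        return -1;
    }
    if (tsc_khz % SKETCH_TSC_STEP_KHZ) {
        return -1;
    }
    return 0;
}
```

For example, 2500000 kHz (2.5 GHz) is accepted, while 2410000 kHz is rejected
because it is not a multiple of 25 MHz.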

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes from RFC v4:
  - Use VM-scope KVM_SET_TSC_KHZ to set the TSC frequency of the TD, since the
    KVM side dropped the @tsc_khz field from struct kvm_tdx_init_vm
---
 target/i386/kvm/kvm.c |  9 +++++++++
 target/i386/kvm/tdx.c | 24 ++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4f491f871f3e..1545b6f870f5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -812,6 +812,15 @@ static int kvm_arch_set_tsc_khz(CPUState *cs)
     int r, cur_freq;
     bool set_ioctl = false;
 
+    /*
+     * The TSC of a TD vcpu is immutable: it cannot be set or changed via the
+     * vcpu-scope KVM_SET_TSC_KHZ. It can only be initialized via the VM-scope
+     * KVM_SET_TSC_KHZ before ioctl KVM_TDX_INIT_VM in tdx_pre_create_vcpu().
+     */
+    if (is_tdx_vm()) {
+        return 0;
+    }
+
     if (!env->tsc_khz) {
         return 0;
     }
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 42cef484c574..0162d7cc9df4 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -30,6 +30,9 @@
                                      (1U << KVM_FEATURE_PV_SCHED_YIELD) | \
                                      (1U << KVM_FEATURE_MSI_EXT_DEST_ID))
 
+#define TDX_MIN_TSC_FREQUENCY_KHZ   (100 * 1000)
+#define TDX_MAX_TSC_FREQUENCY_KHZ   (10 * 1000 * 1000)
+
 #define TDX_TD_ATTRIBUTES_DEBUG             BIT_ULL(0)
 #define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE   BIT_ULL(28)
 #define TDX_TD_ATTRIBUTES_PKS               BIT_ULL(30)
@@ -504,6 +507,27 @@ int tdx_pre_create_vcpu(CPUState *cpu)
         goto out;
     }
 
+    r = -EINVAL;
+    if (env->tsc_khz && (env->tsc_khz < TDX_MIN_TSC_FREQUENCY_KHZ ||
+                         env->tsc_khz > TDX_MAX_TSC_FREQUENCY_KHZ)) {
+        error_report("Invalid TSC %" PRId64 " kHz, tsc-frequency must be within [%d, %d] kHz",
+                      env->tsc_khz, TDX_MIN_TSC_FREQUENCY_KHZ,
+                      TDX_MAX_TSC_FREQUENCY_KHZ);
+        goto out;
+    }
+
+    if (env->tsc_khz % (25 * 1000)) {
+        error_report("Invalid TSC %" PRId64 " kHz, it must be a multiple of 25 MHz", env->tsc_khz);
+        goto out;
+    }
+
+    /* Safe even if env->tsc_khz is 0: KVM then uses the host's tsc_khz. */
+    r = kvm_vm_ioctl(kvm_state, KVM_SET_TSC_KHZ, env->tsc_khz);
+    if (r < 0) {
+        error_report("Unable to set TSC frequency to %" PRId64 " kHz", env->tsc_khz);
+        goto out;
+    }
+
     r = setup_td_guest_attributes(x86cpu);
     if (r) {
         goto out;
-- 
2.27.0



* [PATCH v1 19/40] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (17 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 18/40] i386/tdx: Implement user specified tsc frequency Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 20/40] i386/tdvf: Introduce function to parse TDVF metadata Xiaoyao Li
                   ` (22 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

TDX only supports readonly for shared memory but not for private memory.

From QEMU's point of view, there is no way to tell whether a memslot is used
as shared or private memory. Thus simply set kvm_readonly_mem_allowed to
false for TDX VMs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
 target/i386/kvm/tdx.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 0162d7cc9df4..3aa0e374a514 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -461,6 +461,15 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
 
     update_tdx_cpuid_lookup_by_tdx_caps();
 
+    /*
+     * Set kvm_readonly_mem_allowed to false, because TDX only supports readonly
+     * memory for shared memory but not for private memory. Besides, whether a
+     * memslot is private or shared is not determined by QEMU.
+     *
+     * Thus, just mark readonly memory not supported for simplicity.
+     */
+    kvm_readonly_mem_allowed = false;
+
     tdx_guest = tdx;
 
     return 0;
-- 
2.27.0



* [PATCH v1 20/40] i386/tdvf: Introduce function to parse TDVF metadata
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (18 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 19/40] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-26  9:12   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 21/40] i386/tdx: Parse TDVF metadata for TDX VM Xiaoyao Li
                   ` (21 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

From: Isaku Yamahata <isaku.yamahata@intel.com>

A TDX VM needs to boot with its specialized firmware, Trust Domain
Virtual Firmware (TDVF). QEMU needs to parse TDVF and map it into TD
guest memory prior to running the TDX VM.

The TDVF metadata in the TDVF image describes the structure of the firmware.
QEMU refers to it to set up memory for TDVF. Introduce the function
tdvf_parse_metadata() to parse the metadata from the TDVF image and store
the info of each TDVF section.
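
The per-section sanity rules applied while parsing can be summarized in a
small standalone sketch. The function section_entry_ok() and the SKETCH_*
names are hypothetical, and a 4 KiB page size is assumed; the section type
values match those defined in include/hw/i386/tdvf.h in this patch.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB target pages */

enum {
    SKETCH_TYPE_BFV      = 0,
    SKETCH_TYPE_CFV      = 1,
    SKETCH_TYPE_TD_HOB   = 2,
    SKETCH_TYPE_TEMP_MEM = 3,
};

/*
 * Hypothetical helper, not part of the patch.
 * Returns 1 if the section entry looks sane, 0 otherwise.
 */
static int section_entry_ok(uint32_t type, uint32_t raw_size,
                            uint64_t mem_addr, uint64_t mem_size)
{
    /* The in-memory size must cover the data carried in the image. */
    if (mem_size < raw_size) {
        return 0;
    }
    /* Address and size must both be page aligned. */
    if (mem_addr % SKETCH_PAGE_SIZE || mem_size % SKETCH_PAGE_SIZE) {
        return 0;
    }
    switch (type) {
    case SKETCH_TYPE_BFV:
    case SKETCH_TYPE_CFV:
        /* Copied from the image, so there must be data to copy. */
        return raw_size != 0;
    case SKETCH_TYPE_TD_HOB:
    case SKETCH_TYPE_TEMP_MEM:
        /* Not backed by the image, so no raw data is expected. */
        return raw_size == 0;
    default:
        return 0;
    }
}
```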

The TDX metadata is located by a TDX metadata offset block, which is a
GUID-ed structure. The data portion of the GUID structure contains
only a 4-byte field, which is the offset of the TDX metadata from the end
of the firmware file.
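
A minimal sketch of how the "offset from the end of the file" can be turned
into an offset from the start of the image is shown below. The helper names
read_le32() and metadata_offset() are hypothetical, and the bounds check is
intentionally a little stricter than the one in tdvf_get_metadata().

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value independent of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper, not part of the patch.
 *
 * offset_field: the 4-byte data portion of the GUID-ed offset block.
 * image_size:   size of the firmware image in bytes.
 * hdr_size:     minimum number of bytes the metadata header occupies.
 *
 * Returns the metadata offset from the start of the image, or -1 if the
 * header would not fit inside the image.
 */
static int64_t metadata_offset(const uint8_t *offset_field,
                               uint32_t image_size, uint32_t hdr_size)
{
    uint32_t from_end = read_le32(offset_field);

    if (from_end > image_size || from_end < hdr_size) {
        return -1;
    }
    return (int64_t)image_size - from_end;
}
```

For a 4 MiB image whose offset field holds 0x1000, the metadata starts at
0x400000 - 0x1000 = 0x3ff000.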

Select X86_FW_OVMF when TDX is enabled to leverage the existing functions
that parse and search OVMF's GUID-ed structures.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

---
Changes from RFC v4:
 - rename tdvf_parse_section_entry() to
   tdvf_parse_and_check_section_entry()
Changes in v4:
 - rename TDX_METADATA_GUID to TDX_METADATA_OFFSET_GUID
---
 hw/i386/Kconfig        |   1 +
 hw/i386/meson.build    |   1 +
 hw/i386/tdvf.c         | 197 +++++++++++++++++++++++++++++++++++++++++
 include/hw/i386/tdvf.h |  51 +++++++++++
 4 files changed, 250 insertions(+)
 create mode 100644 hw/i386/tdvf.c
 create mode 100644 include/hw/i386/tdvf.h

diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index 9e40ff79fc2d..0c3e3a464012 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -12,6 +12,7 @@ config SGX
 
 config TDX
     bool
+    select X86_FW_OVMF
     depends on KVM
 
 config PC
diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 213e2e82b3d7..97f3b50503b0 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -28,6 +28,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
   'port92.c'))
 i386_ss.add(when: 'CONFIG_X86_FW_OVMF', if_true: files('pc_sysfw_ovmf.c'),
                                         if_false: files('pc_sysfw_ovmf-stubs.c'))
+i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c'))
 
 subdir('kvm')
 subdir('xen')
diff --git a/hw/i386/tdvf.c b/hw/i386/tdvf.c
new file mode 100644
index 000000000000..a40198f9407a
--- /dev/null
+++ b/hw/i386/tdvf.c
@@ -0,0 +1,197 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ *                        <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/i386/pc.h"
+#include "hw/i386/tdvf.h"
+#include "sysemu/kvm.h"
+
+#define TDX_METADATA_OFFSET_GUID    "e47a6535-984a-4798-865e-4685a7bf8ec2"
+#define TDX_METADATA_VERSION        1
+#define TDVF_SIGNATURE              0x46564454 /* TDVF as little endian */
+
+typedef struct {
+    uint32_t DataOffset;
+    uint32_t RawDataSize;
+    uint64_t MemoryAddress;
+    uint64_t MemoryDataSize;
+    uint32_t Type;
+    uint32_t Attributes;
+} TdvfSectionEntry;
+
+typedef struct {
+    uint32_t Signature;
+    uint32_t Length;
+    uint32_t Version;
+    uint32_t NumberOfSectionEntries;
+    TdvfSectionEntry SectionEntries[];
+} TdvfMetadata;
+
+struct tdx_metadata_offset {
+    uint32_t offset;
+};
+
+static TdvfMetadata *tdvf_get_metadata(void *flash_ptr, int size)
+{
+    TdvfMetadata *metadata;
+    uint32_t offset = 0;
+    uint8_t *data;
+
+    if ((uint32_t) size != size) {
+        return NULL;
+    }
+
+    if (pc_system_ovmf_table_find(TDX_METADATA_OFFSET_GUID, &data, NULL)) {
+        offset = size - le32_to_cpu(((struct tdx_metadata_offset *)data)->offset);
+
+        if (offset + sizeof(*metadata) > size) {
+            return NULL;
+        }
+    } else {
+        error_report("Cannot find TDX_METADATA_OFFSET_GUID");
+        return NULL;
+    }
+
+    metadata = flash_ptr + offset;
+
+    /* Finally, verify the signature to determine if this is a TDVF image. */
+    metadata->Signature = le32_to_cpu(metadata->Signature);
+    if (metadata->Signature != TDVF_SIGNATURE) {
+        error_report("Invalid TDVF signature in metadata!");
+        return NULL;
+    }
+
+    /* Sanity check that the metadata fits within the firmware image. */
+    metadata->Length = le32_to_cpu(metadata->Length);
+    if (offset + metadata->Length > size) {
+        return NULL;
+    }
+
+    /* Only version 1 is supported/defined. */
+    metadata->Version = le32_to_cpu(metadata->Version);
+    if (metadata->Version != TDX_METADATA_VERSION) {
+        return NULL;
+    }
+
+    return metadata;
+}
+
+static int tdvf_parse_and_check_section_entry(const TdvfSectionEntry *src,
+                                              TdxFirmwareEntry *entry)
+{
+    entry->data_offset = le32_to_cpu(src->DataOffset);
+    entry->data_len = le32_to_cpu(src->RawDataSize);
+    entry->address = le64_to_cpu(src->MemoryAddress);
+    entry->size = le64_to_cpu(src->MemoryDataSize);
+    entry->type = le32_to_cpu(src->Type);
+    entry->attributes = le32_to_cpu(src->Attributes);
+
+    /* sanity check */
+    if (entry->size < entry->data_len) {
+        error_report("Broken metadata RawDataSize 0x%x MemoryDataSize 0x%lx",
+                     entry->data_len, entry->size);
+        return -1;
+    }
+    if (!QEMU_IS_ALIGNED(entry->address, TARGET_PAGE_SIZE)) {
+        error_report("MemoryAddress 0x%lx not page aligned", entry->address);
+        return -1;
+    }
+    if (!QEMU_IS_ALIGNED(entry->size, TARGET_PAGE_SIZE)) {
+        error_report("MemoryDataSize 0x%lx not page aligned", entry->size);
+        return -1;
+    }
+
+    switch (entry->type) {
+    case TDVF_SECTION_TYPE_BFV:
+    case TDVF_SECTION_TYPE_CFV:
+        /* The sections that must be copied from firmware image to TD memory */
+        if (entry->data_len == 0) {
+            error_report("%d section with RawDataSize == 0", entry->type);
+            return -1;
+        }
+        break;
+    case TDVF_SECTION_TYPE_TD_HOB:
+    case TDVF_SECTION_TYPE_TEMP_MEM:
+        /* The sections that need not be copied from the firmware image */
+        if (entry->data_len != 0) {
+            error_report("%d section with RawDataSize 0x%x != 0",
+                         entry->type, entry->data_len);
+            return -1;
+        }
+        break;
+    default:
+        error_report("TDVF contains unsupported section type %d", entry->type);
+        return -1;
+    }
+
+    return 0;
+}
+
+int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size)
+{
+    TdvfSectionEntry *sections;
+    TdvfMetadata *metadata;
+    ssize_t entries_size;
+    uint32_t len, i;
+
+    metadata = tdvf_get_metadata(flash_ptr, size);
+    if (!metadata) {
+        return -EINVAL;
+    }
+
+    /* load and parse metadata entries */
+    fw->nr_entries = le32_to_cpu(metadata->NumberOfSectionEntries);
+    if (fw->nr_entries < 2) {
+        error_report("Invalid number of fw entries (%u) in TDVF", fw->nr_entries);
+        return -EINVAL;
+    }
+
+    len = metadata->Length; /* already converted by tdvf_get_metadata() */
+    entries_size = fw->nr_entries * sizeof(TdvfSectionEntry);
+    if (len != sizeof(*metadata) + entries_size) {
+        error_report("TDVF metadata len (0x%x) mismatch, expected (0x%x)",
+                     len, (uint32_t)(sizeof(*metadata) + entries_size));
+        return -EINVAL;
+    }
+
+    fw->entries = g_new(TdxFirmwareEntry, fw->nr_entries);
+    sections = g_new(TdvfSectionEntry, fw->nr_entries);
+
+    if (!memcpy(sections, (void *)metadata + sizeof(*metadata), entries_size))  {
+        error_report("Failed to read TDVF section entries");
+        goto err;
+    }
+
+    for (i = 0; i < fw->nr_entries; i++) {
+        if (tdvf_parse_and_check_section_entry(&sections[i], &fw->entries[i])) {
+            goto err;
+        }
+    }
+    g_free(sections);
+
+    return 0;
+
+err:
+    g_free(sections);
+    g_free(fw->entries);
+    fw->entries = NULL;
+    return -EINVAL;
+}
diff --git a/include/hw/i386/tdvf.h b/include/hw/i386/tdvf.h
new file mode 100644
index 000000000000..593341eb2e93
--- /dev/null
+++ b/include/hw/i386/tdvf.h
@@ -0,0 +1,51 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ *                        <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_I386_TDVF_H
+#define HW_I386_TDVF_H
+
+#include "qemu/osdep.h"
+
+#define TDVF_SECTION_TYPE_BFV               0
+#define TDVF_SECTION_TYPE_CFV               1
+#define TDVF_SECTION_TYPE_TD_HOB            2
+#define TDVF_SECTION_TYPE_TEMP_MEM          3
+
+#define TDVF_SECTION_ATTRIBUTES_MR_EXTEND   (1U << 0)
+#define TDVF_SECTION_ATTRIBUTES_PAGE_AUG    (1U << 1)
+
+typedef struct TdxFirmwareEntry {
+    uint32_t data_offset;
+    uint32_t data_len;
+    uint64_t address;
+    uint64_t size;
+    uint32_t type;
+    uint32_t attributes;
+} TdxFirmwareEntry;
+
+typedef struct TdxFirmware {
+    uint32_t nr_entries;
+    TdxFirmwareEntry *entries;
+} TdxFirmware;
+
+int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size);
+
+#endif /* HW_I386_TDVF_H */
-- 
2.27.0



* [PATCH v1 21/40] i386/tdx: Parse TDVF metadata for TDX VM
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (19 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 20/40] i386/tdvf: Introduce function to parse TDVF metadata Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 22/40] i386/tdx: Skip BIOS shadowing setup Xiaoyao Li
                   ` (20 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

TDX cannot support the pflash device since it supports neither read-only
memslots nor emulation. Load TDVF (OVMF) with the -bios option for TDs.

When booting a TD, besides loading TDVF to an address below 4G, QEMU also
needs to parse the TDVF metadata.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
 hw/i386/pc_sysfw.c         | 7 +++++++
 hw/i386/x86.c              | 3 ++-
 target/i386/kvm/tdx-stub.c | 5 +++++
 target/i386/kvm/tdx.c      | 5 +++++
 target/i386/kvm/tdx.h      | 4 ++++
 5 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index c8d9e71b889b..cf63434ba89d 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -37,6 +37,7 @@
 #include "hw/block/flash.h"
 #include "sysemu/kvm.h"
 #include "sev.h"
+#include "kvm/tdx.h"
 
 #define FLASH_SECTOR_SIZE 4096
 
@@ -265,5 +266,11 @@ void x86_firmware_configure(void *ptr, int size)
         }
 
         sev_encrypt_flash(ptr, size, &error_fatal);
+    } else if (is_tdx_vm()) {
+        ret = tdx_parse_tdvf(ptr, size);
+        if (ret) {
+            error_report("failed to parse TDVF for TDX VM");
+            exit(1);
+        }
     }
 }
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a15fadeb0e68..006b0e670e4d 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -49,6 +49,7 @@
 #include "hw/intc/i8259.h"
 #include "hw/rtc/mc146818rtc.h"
 #include "target/i386/sev.h"
+#include "kvm/tdx.h"
 
 #include "hw/acpi/cpu_hotplug.h"
 #include "hw/irq.h"
@@ -1149,7 +1150,7 @@ void x86_bios_rom_init(MachineState *ms, const char *default_firmware,
     }
     bios = g_malloc(sizeof(*bios));
     memory_region_init_ram(bios, NULL, "pc.bios", bios_size, &error_fatal);
-    if (sev_enabled()) {
+    if (sev_enabled() || is_tdx_vm()) {
         /*
          * The concept of a "reset" simply doesn't exist for
          * confidential computing guests, we have to destroy and
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
index 2871de9d7b56..395a59721266 100644
--- a/target/i386/kvm/tdx-stub.c
+++ b/target/i386/kvm/tdx-stub.c
@@ -12,3 +12,8 @@ int tdx_pre_create_vcpu(CPUState *cpu)
 {
     return -EINVAL;
 }
+
+int tdx_parse_tdvf(void *flash_ptr, int size)
+{
+    return -EINVAL;
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 3aa0e374a514..25b3e2058cb3 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -561,6 +561,11 @@ out:
     return r;
 }
 
+int tdx_parse_tdvf(void *flash_ptr, int size)
+{
+    return tdvf_parse_metadata(&tdx_guest->tdvf, flash_ptr, size);
+}
+
 static bool tdx_guest_get_sept_ve_disable(Object *obj, Error **errp)
 {
     TdxGuest *tdx = TDX_GUEST(obj);
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 46a24ee8c7cc..12bcf25bb95b 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -6,6 +6,7 @@
 #endif
 
 #include "exec/confidential-guest-support.h"
+#include "hw/i386/tdvf.h"
 
 #define TYPE_TDX_GUEST "tdx-guest"
 #define TDX_GUEST(obj)  OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
@@ -21,6 +22,8 @@ typedef struct TdxGuest {
 
     bool initialized;
     uint64_t attributes;    /* TD attributes */
+
+    TdxFirmware tdvf;
 } TdxGuest;
 
 #ifdef CONFIG_TDX
@@ -33,5 +36,6 @@ int tdx_kvm_init(MachineState *ms, Error **errp);
 void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
                              uint32_t *ret);
 int tdx_pre_create_vcpu(CPUState *cpu);
+int tdx_parse_tdvf(void *flash_ptr, int size);
 
 #endif /* QEMU_I386_TDX_H */
-- 
2.27.0



* [PATCH v1 22/40] i386/tdx: Skip BIOS shadowing setup
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (20 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 21/40] i386/tdx: Parse TDVF metadata for TDX VM Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-26  9:13   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 23/40] i386/tdx: Don't initialize pc.rom for TDX VMs Xiaoyao Li
                   ` (19 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

TDX doesn't support mapping different GPAs to the same private memory. Thus,
aliasing the top 128KB of the BIOS as isa-bios is not supported.

On the other hand, a TDX guest cannot enter real mode, so it works fine
without isa-bios.
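
For reference, the arithmetic behind the isa-bios alias that is skipped for
TDX can be sketched as follows. The struct and function names are
hypothetical; the values mirror the existing code in x86_bios_rom_init(),
which aliases the last MIN(bios_size, 128 KiB) bytes of the BIOS so that
they end at the 1 MiB boundary.

```c
#include <stdint.h>

#define SKETCH_KIB 1024ULL

/* Hypothetical description of the alias, not a QEMU type. */
struct isa_alias {
    uint64_t src_offset;   /* offset into the BIOS region being aliased */
    uint64_t size;         /* size of the alias */
    uint64_t gpa;          /* guest physical address where the alias starts */
};

/* Hypothetical helper, not part of the patch. */
static struct isa_alias isa_bios_alias(uint64_t bios_size)
{
    struct isa_alias a;

    a.size = bios_size < 128 * SKETCH_KIB ? bios_size : 128 * SKETCH_KIB;
    a.src_offset = bios_size - a.size;
    a.gpa = 0x100000 - a.size;
    return a;
}
```

For a 4 MiB firmware image this gives a 0x20000-byte alias of image offset
0x3e0000 at GPA 0xe0000, i.e. a second GPA for memory that is already mapped
below 4G, which is exactly what TDX cannot do for private memory.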

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes from RFC v4:
 - update commit message and comment to clarify
---
 hw/i386/x86.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 006b0e670e4d..a389ee26265a 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1172,17 +1172,20 @@ void x86_bios_rom_init(MachineState *ms, const char *default_firmware,
     }
     g_free(filename);
 
-    /* map the last 128KB of the BIOS in ISA space */
-    isa_bios_size = MIN(bios_size, 128 * KiB);
-    isa_bios = g_malloc(sizeof(*isa_bios));
-    memory_region_init_alias(isa_bios, NULL, "isa-bios", bios,
-                             bios_size - isa_bios_size, isa_bios_size);
-    memory_region_add_subregion_overlap(rom_memory,
-                                        0x100000 - isa_bios_size,
-                                        isa_bios,
-                                        1);
-    if (!isapc_ram_fw) {
-        memory_region_set_readonly(isa_bios, true);
+    /* TDX cannot alias different GPAs to the same private memory */
+    if (!is_tdx_vm()) {
+        /* map the last 128KB of the BIOS in ISA space */
+        isa_bios_size = MIN(bios_size, 128 * KiB);
+        isa_bios = g_malloc(sizeof(*isa_bios));
+        memory_region_init_alias(isa_bios, NULL, "isa-bios", bios,
+                                 bios_size - isa_bios_size, isa_bios_size);
+        memory_region_add_subregion_overlap(rom_memory,
+                                            0x100000 - isa_bios_size,
+                                            isa_bios,
+                                            1);
+        if (!isapc_ram_fw) {
+            memory_region_set_readonly(isa_bios, true);
+        }
     }
 
     /* map all the bios at the top of memory */
-- 
2.27.0



* [PATCH v1 23/40] i386/tdx: Don't initialize pc.rom for TDX VMs
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (21 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 22/40] i386/tdx: Skip BIOS shadowing setup Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 24/40] i386/tdx: Track mem_ptr for each firmware entry of TDVF Xiaoyao Li
                   ` (18 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

For TDX, the addresses below 1MB are entirely general RAM. There is no need
to initialize the pc.rom memory region for TDs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
This is more of a workaround for the issue that, for the q35 machine type,
the real memslot update (which requires memslot deletion) for pc.rom happens
after tdx_init_memory_region. As a result, the private memory ADD'ed
before gets lost. I haven't worked out a good solution to the ordering
issue, so just skip the pc.rom setup to avoid memslot deletion.
---
 hw/i386/pc.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 1f62971759bf..c089dc49485d 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -62,6 +62,7 @@
 #include "sysemu/reset.h"
 #include "sysemu/runstate.h"
 #include "kvm/kvm_i386.h"
+#include "kvm/tdx.h"
 #include "hw/xen/xen.h"
 #include "hw/xen/start_info.h"
 #include "ui/qemu-spice.h"
@@ -1084,16 +1085,18 @@ void pc_memory_init(PCMachineState *pcms,
     /* Initialize PC system firmware */
     pc_system_firmware_init(pcms, rom_memory);
 
-    option_rom_mr = g_malloc(sizeof(*option_rom_mr));
-    memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
-                           &error_fatal);
-    if (pcmc->pci_enabled) {
-        memory_region_set_readonly(option_rom_mr, true);
+    if (!is_tdx_vm()) {
+        option_rom_mr = g_malloc(sizeof(*option_rom_mr));
+        memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
+                               &error_fatal);
+        if (pcmc->pci_enabled) {
+            memory_region_set_readonly(option_rom_mr, true);
+        }
+        memory_region_add_subregion_overlap(rom_memory,
+                                            PC_ROM_MIN_VGA,
+                                            option_rom_mr,
+                                            1);
     }
-    memory_region_add_subregion_overlap(rom_memory,
-                                        PC_ROM_MIN_VGA,
-                                        option_rom_mr,
-                                        1);
 
     fw_cfg = fw_cfg_arch_create(machine,
                                 x86ms->boot_cpus, x86ms->apic_id_limit);
-- 
2.27.0



* [PATCH v1 24/40] i386/tdx: Track mem_ptr for each firmware entry of TDVF
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (22 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 23/40] i386/tdx: Don't initialize pc.rom for TDX VMs Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 25/40] i386/tdx: Track RAM entries for TDX VM Xiaoyao Li
                   ` (17 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

For each TDVF section, QEMU needs to copy the content to guest
private memory via the KVM API (KVM_TDX_INIT_MEM_REGION).

Introduce a field @mem_ptr in TdxFirmwareEntry to track the memory
pointer of each TDVF section, so that QEMU can add/copy them to guest
private memory later.

TDVF sections can be classified into two groups:
 - The firmware itself, e.g., TDVF BFV and CFV, which is located separately
   from guest RAM. Its memory pointer is the bios pointer.

 - Sections located in guest RAM, e.g., TEMP_MEM and TD_HOB.
   mmap a new memory range for them.

Register a machine_init_done callback to set up these pointers.
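
The two groups can be illustrated with a small standalone sketch. The helper
section_image_ptr() is hypothetical: it returns a pointer into the firmware
image for image-backed sections and NULL for sections that need freshly
allocated memory (the patch uses qemu_ram_mmap() for those).

```c
#include <stddef.h>
#include <stdint.h>

enum {
    SKETCH_SECTION_BFV      = 0,
    SKETCH_SECTION_CFV      = 1,
    SKETCH_SECTION_TD_HOB   = 2,
    SKETCH_SECTION_TEMP_MEM = 3,
};

/*
 * Hypothetical helper, not part of the patch.
 *
 * For BFV/CFV the content lives in the firmware image, so the section's
 * memory pointer is the image base plus the section's data offset.
 * For any other type NULL is returned, meaning the caller has to provide
 * separate memory for the section.
 */
static const uint8_t *section_image_ptr(const uint8_t *image, uint32_t type,
                                        uint32_t data_offset)
{
    switch (type) {
    case SKETCH_SECTION_BFV:
    case SKETCH_SECTION_CFV:
        return image + data_offset;
    default:
        return NULL;
    }
}
```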

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
 hw/i386/tdvf.c         |  1 +
 include/hw/i386/tdvf.h |  7 +++++++
 target/i386/kvm/tdx.c  | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 40 insertions(+)

diff --git a/hw/i386/tdvf.c b/hw/i386/tdvf.c
index a40198f9407a..dca209098f7a 100644
--- a/hw/i386/tdvf.c
+++ b/hw/i386/tdvf.c
@@ -187,6 +187,7 @@ int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size)
     }
     g_free(sections);
 
+    fw->mem_ptr = flash_ptr;
     return 0;
 
 err:
diff --git a/include/hw/i386/tdvf.h b/include/hw/i386/tdvf.h
index 593341eb2e93..d880af245a73 100644
--- a/include/hw/i386/tdvf.h
+++ b/include/hw/i386/tdvf.h
@@ -39,13 +39,20 @@ typedef struct TdxFirmwareEntry {
     uint64_t size;
     uint32_t type;
     uint32_t attributes;
+
+    void *mem_ptr;
 } TdxFirmwareEntry;
 
 typedef struct TdxFirmware {
+    void *mem_ptr;
+
     uint32_t nr_entries;
     TdxFirmwareEntry *entries;
 } TdxFirmware;
 
+#define for_each_tdx_fw_entry(fw, e)    \
+    for (e = (fw)->entries; e != (fw)->entries + (fw)->nr_entries; e++)
+
 int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size);
 
 #endif /* HW_I386_TDVF_H */
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 25b3e2058cb3..95a9c2b26516 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -12,12 +12,15 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/mmap-alloc.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
 #include "standard-headers/asm-x86/kvm_para.h"
 #include "sysemu/kvm.h"
+#include "sysemu/sysemu.h"
 
 #include "hw/i386/x86.h"
+#include "hw/i386/tdvf.h"
 #include "kvm_i386.h"
 #include "tdx.h"
 #include "../cpu-internal.h"
@@ -450,6 +453,33 @@ static void update_tdx_cpuid_lookup_by_tdx_caps(void)
             (tdx_caps->xfam_fixed1 & CPUID_XSTATE_XSS_MASK) >> 32;
 }
 
+static void tdx_finalize_vm(Notifier *notifier, void *unused)
+{
+    TdxFirmware *tdvf = &tdx_guest->tdvf;
+    TdxFirmwareEntry *entry;
+
+    for_each_tdx_fw_entry(tdvf, entry) {
+        switch (entry->type) {
+        case TDVF_SECTION_TYPE_BFV:
+        case TDVF_SECTION_TYPE_CFV:
+            entry->mem_ptr = tdvf->mem_ptr + entry->data_offset;
+            break;
+        case TDVF_SECTION_TYPE_TD_HOB:
+        case TDVF_SECTION_TYPE_TEMP_MEM:
+            entry->mem_ptr = qemu_ram_mmap(-1, entry->size,
+                                           qemu_real_host_page_size(), 0, 0);
+            break;
+        default:
+            error_report("Unsupported TDVF section %d", entry->type);
+            exit(1);
+        }
+    }
+}
+
+static Notifier tdx_machine_done_notify = {
+    .notify = tdx_finalize_vm,
+};
+
 int tdx_kvm_init(MachineState *ms, Error **errp)
 {
     TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
@@ -470,6 +500,8 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
      */
     kvm_readonly_mem_allowed = false;
 
+    qemu_add_machine_init_done_notifier(&tdx_machine_done_notify);
+
     tdx_guest = tdx;
 
     return 0;
-- 
2.27.0



* [PATCH v1 25/40] i386/tdx: Track RAM entries for TDX VM
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (23 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 24/40] i386/tdx: Track mem_ptr for each firmware entry of TDVF Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-26  9:15   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 26/40] headers: Add definitions from UEFI spec for volumes, resources, etc Xiaoyao Li
                   ` (16 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

The RAM of a TDX VM can be classified into two types:

 - TDX_RAM_UNACCEPTED: the default type of TDX memory, which needs to be
   accepted by the TDX guest before it can be used and will be all-zeros
   after being accepted.

 - TDX_RAM_ADDED: RAM that is ADD'ed to the TD guest before it runs and
   can be used directly, e.g., the TD HOB and TEMP MEM that are needed by
   TDVF.

Maintain TdxRamEntries[], which grabs the initial RAM info from the e820
table and marks each RAM range as the default type TDX_RAM_UNACCEPTED.

Then turn the ranges of TD HOB and TEMP MEM into TDX_RAM_ADDED, since
these ranges will be ADD'ed before the TD runs and do not need to be
accepted at runtime.

TdxRamEntries[] is later used to set up the memory resource HOBs in the
TD HOB that pass memory info from QEMU to TDVF.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

---
Changes from RFC v4:
  - simplify the algorithm of tdx_accept_ram_range() (Suggested-by: Gerd Hoffmann)
    (1) Change the existing entry to cover the accepted ram range.
    (2) If there is room before the accepted ram range add a
	TDX_RAM_UNACCEPTED entry for that.
    (3) If there is room after the accepted ram range add a
	TDX_RAM_UNACCEPTED entry for that.
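
The three steps above can be sketched as a small standalone helper. This
is an illustration under assumptions, not the function from this patch:
the sketch_* names are hypothetical, a fixed-size table replaces the
g_renew()-grown array, and a range that is not fully contained in a
single entry is simply rejected.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_ENTRIES 16

enum sketch_ram_type { SKETCH_UNACCEPTED, SKETCH_ADDED };

struct sketch_ram_entry {
    uint64_t address;
    uint64_t length;
    enum sketch_ram_type type;
};

struct sketch_ram_map {
    struct sketch_ram_entry entries[SKETCH_MAX_ENTRIES];
    unsigned nr;
};

/* Append an entry; returns -1 if the fixed-size table is full. */
int sketch_add(struct sketch_ram_map *m, uint64_t addr, uint64_t len,
               enum sketch_ram_type type)
{
    if (m->nr == SKETCH_MAX_ENTRIES) {
        return -1;
    }
    m->entries[m->nr++] = (struct sketch_ram_entry){ addr, len, type };
    return 0;
}

/*
 * Mark [addr, addr + len) as ADDED. The range must lie inside exactly one
 * UNACCEPTED entry. Returns 0 on success, -1 otherwise.
 */
int sketch_accept(struct sketch_ram_map *m, uint64_t addr, uint64_t len)
{
    if (len == 0) {
        return -1;
    }
    for (unsigned i = 0; i < m->nr; i++) {
        struct sketch_ram_entry *e = &m->entries[i];
        uint64_t old_start = e->address;
        uint64_t old_end = e->address + e->length;

        if (addr < old_start || addr + len > old_end) {
            continue;               /* not fully contained in this entry */
        }
        if (e->type != SKETCH_UNACCEPTED) {
            return -1;              /* already added */
        }
        /* Make sure the head and tail leftovers will fit. */
        unsigned extra = (addr > old_start) + (addr + len < old_end);
        if (m->nr + extra > SKETCH_MAX_ENTRIES) {
            return -1;
        }
        /* (1) Reuse the existing entry for the accepted range. */
        e->address = addr;
        e->length = len;
        e->type = SKETCH_ADDED;
        /* (2) Leftover before the accepted range stays unaccepted. */
        if (addr > old_start) {
            sketch_add(m, old_start, addr - old_start, SKETCH_UNACCEPTED);
        }
        /* (3) Leftover after the accepted range stays unaccepted. */
        if (addr + len < old_end) {
            sketch_add(m, addr + len, old_end - (addr + len),
                       SKETCH_UNACCEPTED);
        }
        return 0;
    }
    return -1;
}
```

New entries are appended at the end, so the table is no longer ordered by
address after a split; that is why a sort pass is still needed afterwards.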
---
 target/i386/kvm/tdx.c | 110 ++++++++++++++++++++++++++++++++++++++++++
 target/i386/kvm/tdx.h |  14 ++++++
 2 files changed, 124 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 95a9c2b26516..59cff141b4f3 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -19,6 +19,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/sysemu.h"
 
+#include "hw/i386/e820_memory_layout.h"
 #include "hw/i386/x86.h"
 #include "hw/i386/tdvf.h"
 #include "kvm_i386.h"
@@ -453,11 +454,116 @@ static void update_tdx_cpuid_lookup_by_tdx_caps(void)
             (tdx_caps->xfam_fixed1 & CPUID_XSTATE_XSS_MASK) >> 32;
 }
 
+static void tdx_add_ram_entry(uint64_t address, uint64_t length, uint32_t type)
+{
+    uint32_t nr_entries = tdx_guest->nr_ram_entries;
+    tdx_guest->ram_entries = g_renew(TdxRamEntry, tdx_guest->ram_entries,
+                                     nr_entries + 1);
+
+    tdx_guest->ram_entries[nr_entries].address = address;
+    tdx_guest->ram_entries[nr_entries].length = length;
+    tdx_guest->ram_entries[nr_entries].type = type;
+    tdx_guest->nr_ram_entries++;
+}
+
+static int tdx_accept_ram_range(uint64_t address, uint64_t length)
+{
+    uint64_t head_start, tail_start, head_length, tail_length;
+    uint64_t tmp_address, tmp_length;
+    TdxRamEntry *e;
+    int i;
+
+    for (i = 0; i < tdx_guest->nr_ram_entries; i++) {
+        e = &tdx_guest->ram_entries[i];
+
+        if (address + length <= e->address ||
+            e->address + e->length <= address) {
+            continue;
+        }
+
+        /*
+         * The to-be-accepted ram range must be fully contained by one
+         * RAM entry.
+         */
+        if (e->address > address ||
+            e->address + e->length < address + length) {
+            return -EINVAL;
+        }
+
+        if (e->type == TDX_RAM_ADDED) {
+            return -EINVAL;
+        }
+
+        break;
+    }
+
+    if (i == tdx_guest->nr_ram_entries) {
+        return -1;
+    }
+
+    tmp_address = e->address;
+    tmp_length = e->length;
+
+    e->address = address;
+    e->length = length;
+    e->type = TDX_RAM_ADDED;
+
+    head_length = address - tmp_address;
+    if (head_length > 0) {
+        head_start = tmp_address;
+        tdx_add_ram_entry(head_start, head_length, TDX_RAM_UNACCEPTED);
+    }
+
+    tail_start = address + length;
+    if (tail_start < tmp_address + tmp_length) {
+        tail_length = tmp_address + tmp_length - tail_start;
+        tdx_add_ram_entry(tail_start, tail_length, TDX_RAM_UNACCEPTED);
+    }
+
+    return 0;
+}
+
+static int tdx_ram_entry_compare(const void *lhs_, const void *rhs_)
+{
+    const TdxRamEntry *lhs = lhs_;
+    const TdxRamEntry *rhs = rhs_;
+
+    if (lhs->address == rhs->address) {
+        return 0;
+    }
+    if (lhs->address > rhs->address) {
+        return 1;
+    }
+    return -1;
+}
+
+static void tdx_init_ram_entries(void)
+{
+    unsigned i, j, nr_e820_entries;
+
+    nr_e820_entries = e820_get_num_entries();
+    tdx_guest->ram_entries = g_new(TdxRamEntry, nr_e820_entries);
+
+    for (i = 0, j = 0; i < nr_e820_entries; i++) {
+        uint64_t addr, len;
+
+        if (e820_get_entry(i, E820_RAM, &addr, &len)) {
+            tdx_guest->ram_entries[j].address = addr;
+            tdx_guest->ram_entries[j].length = len;
+            tdx_guest->ram_entries[j].type = TDX_RAM_UNACCEPTED;
+            j++;
+        }
+    }
+    tdx_guest->nr_ram_entries = j;
+}
+
 static void tdx_finalize_vm(Notifier *notifier, void *unused)
 {
     TdxFirmware *tdvf = &tdx_guest->tdvf;
     TdxFirmwareEntry *entry;
 
+    tdx_init_ram_entries();
+
     for_each_tdx_fw_entry(tdvf, entry) {
         switch (entry->type) {
         case TDVF_SECTION_TYPE_BFV:
@@ -468,12 +574,16 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
         case TDVF_SECTION_TYPE_TEMP_MEM:
             entry->mem_ptr = qemu_ram_mmap(-1, entry->size,
                                            qemu_real_host_page_size(), 0, 0);
+            tdx_accept_ram_range(entry->address, entry->size);
             break;
         default:
             error_report("Unsupported TDVF section %d", entry->type);
             exit(1);
         }
     }
+
+    qsort(tdx_guest->ram_entries, tdx_guest->nr_ram_entries,
+          sizeof(TdxRamEntry), &tdx_ram_entry_compare);
 }
 
 static Notifier tdx_machine_done_notify = {
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 12bcf25bb95b..5792518afa62 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -15,6 +15,17 @@ typedef struct TdxGuestClass {
     ConfidentialGuestSupportClass parent_class;
 } TdxGuestClass;
 
+enum TdxRamType {
+    TDX_RAM_UNACCEPTED,
+    TDX_RAM_ADDED,
+};
+
+typedef struct TdxRamEntry {
+    uint64_t address;
+    uint64_t length;
+    uint32_t type;
+} TdxRamEntry;
+
 typedef struct TdxGuest {
     ConfidentialGuestSupport parent_obj;
 
@@ -24,6 +35,9 @@ typedef struct TdxGuest {
     uint64_t attributes;    /* TD attributes */
 
     TdxFirmware tdvf;
+
+    uint32_t nr_ram_entries;
+    TdxRamEntry *ram_entries;
 } TdxGuest;
 
 #ifdef CONFIG_TDX
-- 
2.27.0



* [PATCH v1 26/40] headers: Add definitions from UEFI spec for volumes, resources, etc...
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (24 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 25/40] i386/tdx: Track RAM entries for TDX VM Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-26  9:19   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 27/40] i386/tdx: Setup the TD HOB list Xiaoyao Li
                   ` (15 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Add UEFI definitions for literals, enums, structs, GUIDs, etc. that
will be used by TDX to build the UEFI Hand-Off Block (HOB) that is passed
to the Trusted Domain Virtual Firmware (TDVF).

All values come from the UEFI specification and TDVF design guide. [1]

Note, EFI_RESOURCE_MEMORY_UNACCEPTED will be added in a future UEFI spec.

[1] https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.pdf

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 include/standard-headers/uefi/uefi.h | 198 +++++++++++++++++++++++++++
 1 file changed, 198 insertions(+)
 create mode 100644 include/standard-headers/uefi/uefi.h

diff --git a/include/standard-headers/uefi/uefi.h b/include/standard-headers/uefi/uefi.h
new file mode 100644
index 000000000000..b15aba796156
--- /dev/null
+++ b/include/standard-headers/uefi/uefi.h
@@ -0,0 +1,198 @@
+/*
+ * Copyright (C) 2020 Intel Corporation
+ *
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ *                        <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#ifndef HW_I386_UEFI_H
+#define HW_I386_UEFI_H
+
+/***************************************************************************/
+/*
+ * basic EFI definitions
+ * supplemented with UEFI Specification Version 2.8 (Errata A)
+ * released February 2020
+ */
+/* UEFI integer is little endian */
+
+typedef struct {
+    uint32_t Data1;
+    uint16_t Data2;
+    uint16_t Data3;
+    uint8_t Data4[8];
+} EFI_GUID;
+
+typedef enum {
+    EfiReservedMemoryType,
+    EfiLoaderCode,
+    EfiLoaderData,
+    EfiBootServicesCode,
+    EfiBootServicesData,
+    EfiRuntimeServicesCode,
+    EfiRuntimeServicesData,
+    EfiConventionalMemory,
+    EfiUnusableMemory,
+    EfiACPIReclaimMemory,
+    EfiACPIMemoryNVS,
+    EfiMemoryMappedIO,
+    EfiMemoryMappedIOPortSpace,
+    EfiPalCode,
+    EfiPersistentMemory,
+    EfiUnacceptedMemoryType,
+    EfiMaxMemoryType
+} EFI_MEMORY_TYPE;
+
+#define EFI_HOB_HANDOFF_TABLE_VERSION 0x0009
+
+#define EFI_HOB_TYPE_HANDOFF              0x0001
+#define EFI_HOB_TYPE_MEMORY_ALLOCATION    0x0002
+#define EFI_HOB_TYPE_RESOURCE_DESCRIPTOR  0x0003
+#define EFI_HOB_TYPE_GUID_EXTENSION       0x0004
+#define EFI_HOB_TYPE_FV                   0x0005
+#define EFI_HOB_TYPE_CPU                  0x0006
+#define EFI_HOB_TYPE_MEMORY_POOL          0x0007
+#define EFI_HOB_TYPE_FV2                  0x0009
+#define EFI_HOB_TYPE_LOAD_PEIM_UNUSED     0x000A
+#define EFI_HOB_TYPE_UEFI_CAPSULE         0x000B
+#define EFI_HOB_TYPE_FV3                  0x000C
+#define EFI_HOB_TYPE_UNUSED               0xFFFE
+#define EFI_HOB_TYPE_END_OF_HOB_LIST      0xFFFF
+
+typedef struct {
+    uint16_t HobType;
+    uint16_t HobLength;
+    uint32_t Reserved;
+} EFI_HOB_GENERIC_HEADER;
+
+typedef uint64_t EFI_PHYSICAL_ADDRESS;
+typedef uint32_t EFI_BOOT_MODE;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    uint32_t Version;
+    EFI_BOOT_MODE BootMode;
+    EFI_PHYSICAL_ADDRESS EfiMemoryTop;
+    EFI_PHYSICAL_ADDRESS EfiMemoryBottom;
+    EFI_PHYSICAL_ADDRESS EfiFreeMemoryTop;
+    EFI_PHYSICAL_ADDRESS EfiFreeMemoryBottom;
+    EFI_PHYSICAL_ADDRESS EfiEndOfHobList;
+} EFI_HOB_HANDOFF_INFO_TABLE;
+
+#define EFI_RESOURCE_SYSTEM_MEMORY          0x00000000
+#define EFI_RESOURCE_MEMORY_MAPPED_IO       0x00000001
+#define EFI_RESOURCE_IO                     0x00000002
+#define EFI_RESOURCE_FIRMWARE_DEVICE        0x00000003
+#define EFI_RESOURCE_MEMORY_MAPPED_IO_PORT  0x00000004
+#define EFI_RESOURCE_MEMORY_RESERVED        0x00000005
+#define EFI_RESOURCE_IO_RESERVED            0x00000006
+#define EFI_RESOURCE_MEMORY_UNACCEPTED      0x00000007
+#define EFI_RESOURCE_MAX_MEMORY_TYPE        0x00000008
+
+#define EFI_RESOURCE_ATTRIBUTE_PRESENT                  0x00000001
+#define EFI_RESOURCE_ATTRIBUTE_INITIALIZED              0x00000002
+#define EFI_RESOURCE_ATTRIBUTE_TESTED                   0x00000004
+#define EFI_RESOURCE_ATTRIBUTE_SINGLE_BIT_ECC           0x00000008
+#define EFI_RESOURCE_ATTRIBUTE_MULTIPLE_BIT_ECC         0x00000010
+#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_1           0x00000020
+#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_2           0x00000040
+#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTED           0x00000080
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTED          0x00000100
+#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTED      0x00000200
+#define EFI_RESOURCE_ATTRIBUTE_UNCACHEABLE              0x00000400
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_COMBINEABLE        0x00000800
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_THROUGH_CACHEABLE  0x00001000
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_BACK_CACHEABLE     0x00002000
+#define EFI_RESOURCE_ATTRIBUTE_16_BIT_IO                0x00004000
+#define EFI_RESOURCE_ATTRIBUTE_32_BIT_IO                0x00008000
+#define EFI_RESOURCE_ATTRIBUTE_64_BIT_IO                0x00010000
+#define EFI_RESOURCE_ATTRIBUTE_UNCACHED_EXPORTED        0x00020000
+#define EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTED      0x00040000
+#define EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTABLE    0x00080000
+#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTABLE         0x00100000
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTABLE        0x00200000
+#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTABLE    0x00400000
+#define EFI_RESOURCE_ATTRIBUTE_PERSISTENT               0x00800000
+#define EFI_RESOURCE_ATTRIBUTE_PERSISTABLE              0x01000000
+#define EFI_RESOURCE_ATTRIBUTE_MORE_RELIABLE            0x02000000
+
+typedef uint32_t EFI_RESOURCE_TYPE;
+typedef uint32_t EFI_RESOURCE_ATTRIBUTE_TYPE;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_GUID Owner;
+    EFI_RESOURCE_TYPE ResourceType;
+    EFI_RESOURCE_ATTRIBUTE_TYPE ResourceAttribute;
+    EFI_PHYSICAL_ADDRESS PhysicalStart;
+    uint64_t ResourceLength;
+} EFI_HOB_RESOURCE_DESCRIPTOR;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_GUID Name;
+
+    /* guid specific data follows */
+} EFI_HOB_GUID_TYPE;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+} EFI_HOB_FIRMWARE_VOLUME;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+    EFI_GUID FvName;
+    EFI_GUID FileName;
+} EFI_HOB_FIRMWARE_VOLUME2;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+    uint32_t AuthenticationStatus;
+    bool ExtractedFv;
+    EFI_GUID FvName;
+    EFI_GUID FileName;
+} EFI_HOB_FIRMWARE_VOLUME3;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    uint8_t SizeOfMemorySpace;
+    uint8_t SizeOfIoSpace;
+    uint8_t Reserved[6];
+} EFI_HOB_CPU;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+} EFI_HOB_MEMORY_POOL;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+} EFI_HOB_UEFI_CAPSULE;
+
+#define EFI_HOB_OWNER_ZERO                                      \
+    ((EFI_GUID){ 0x00000000, 0x0000, 0x0000,                    \
+        { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 } })
+
+#endif
-- 
2.27.0



* [PATCH v1 27/40] i386/tdx: Setup the TD HOB list
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (25 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 26/40] headers: Add definitions from UEFI spec for volumes, resources, etc Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-26 10:27   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 28/40] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION Xiaoyao Li
                   ` (14 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

The TD HOB list is used to pass information from the VMM to TDVF. The TD
HOB list must include a PHIT HOB and Resource Descriptor HOBs. More
details can be found in the TDVF specification and the PI specification.

Build the TD HOB in TDX's machine_init_done callback.
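
As a rough illustration of how such a list is laid out, the sketch below
appends HOBs back to back into a buffer, keeps each one 8-byte aligned,
and derives the guest address of the end of the list. It is a simplified
sketch, not the code added by this patch: the sketch_* names are
hypothetical, only the generic header is filled in, and a little-endian
host is assumed so no byte swapping is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic HOB header; the type values match the definitions in this series. */
struct sketch_hob_header {
    uint16_t type;
    uint16_t length;
    uint32_t reserved;
};

#define SKETCH_HOB_TYPE_HANDOFF   0x0001
#define SKETCH_HOB_TYPE_RESOURCE  0x0003
#define SKETCH_HOB_TYPE_END       0xFFFF

struct sketch_hob_buf {
    uint64_t guest_addr;  /* guest physical address of the buffer */
    uint8_t *base;        /* host pointer, assumed 8-byte aligned */
    size_t size;          /* capacity in bytes */
    size_t used;          /* bytes consumed so far */
};

/* Reserve len zeroed bytes and keep the next HOB 8-byte aligned. */
void *sketch_hob_reserve(struct sketch_hob_buf *b, size_t len)
{
    size_t padded = (len + 7) & ~(size_t)7;
    void *p;

    if (padded < len || padded > b->size - b->used) {
        return NULL;    /* overflow or not enough room left */
    }
    p = b->base + b->used;
    memset(p, 0, padded);
    b->used += padded;
    return p;
}

/* Append one HOB of the given type; length includes the header. */
struct sketch_hob_header *sketch_hob_append(struct sketch_hob_buf *b,
                                            uint16_t type, uint16_t length)
{
    struct sketch_hob_header *h;

    assert(length >= sizeof(struct sketch_hob_header));
    h = sketch_hob_reserve(b, length);
    if (h) {
        h->type = type;
        h->length = length;
    }
    return h;
}

/* Guest address one past the last HOB written so far. */
uint64_t sketch_hob_end(const struct sketch_hob_buf *b)
{
    return b->guest_addr + b->used;
}
```

A caller would append the PHIT HOB first, then one resource descriptor per
RAM entry, then the end-of-list marker, and finally store the value of
sketch_hob_end() in the PHIT's end-of-HOB-list field.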

Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

---
Changes from RFC v4:
  - drop the code that adds MMIO resources, since OVMF prepares all the
    MMIO HOBs itself.
---
 hw/i386/meson.build   |   2 +-
 hw/i386/tdvf-hob.c    | 146 ++++++++++++++++++++++++++++++++++++++++++
 hw/i386/tdvf-hob.h    |  24 +++++++
 target/i386/kvm/tdx.c |  16 +++++
 4 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 hw/i386/tdvf-hob.c
 create mode 100644 hw/i386/tdvf-hob.h

diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 97f3b50503b0..b59e0d35bba3 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -28,7 +28,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
   'port92.c'))
 i386_ss.add(when: 'CONFIG_X86_FW_OVMF', if_true: files('pc_sysfw_ovmf.c'),
                                         if_false: files('pc_sysfw_ovmf-stubs.c'))
-i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c'))
+i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c', 'tdvf-hob.c'))
 
 subdir('kvm')
 subdir('xen')
diff --git a/hw/i386/tdvf-hob.c b/hw/i386/tdvf-hob.c
new file mode 100644
index 000000000000..bdf3b4823340
--- /dev/null
+++ b/hw/i386/tdvf-hob.c
@@ -0,0 +1,146 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ *                        <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "e820_memory_layout.h"
+#include "hw/i386/pc.h"
+#include "hw/i386/x86.h"
+#include "hw/pci/pcie_host.h"
+#include "sysemu/kvm.h"
+#include "standard-headers/uefi/uefi.h"
+#include "tdvf-hob.h"
+
+typedef struct TdvfHob {
+    hwaddr hob_addr;
+    void *ptr;
+    int size;
+
+    /* working area */
+    void *current;
+    void *end;
+} TdvfHob;
+
+static uint64_t tdvf_current_guest_addr(const TdvfHob *hob)
+{
+    return hob->hob_addr + (hob->current - hob->ptr);
+}
+
+static void tdvf_align(TdvfHob *hob, size_t align)
+{
+    hob->current = QEMU_ALIGN_PTR_UP(hob->current, align);
+}
+
+static void *tdvf_get_area(TdvfHob *hob, uint64_t size)
+{
+    void *ret;
+
+    if (hob->current + size > hob->end) {
+        error_report("TD_HOB overrun, size = 0x%" PRIx64, size);
+        exit(1);
+    }
+
+    ret = hob->current;
+    hob->current += size;
+    tdvf_align(hob, 8);
+    return ret;
+}
+
+static void tdvf_hob_add_memory_resources(TdxGuest *tdx, TdvfHob *hob)
+{
+    EFI_HOB_RESOURCE_DESCRIPTOR *region;
+    EFI_RESOURCE_ATTRIBUTE_TYPE attr;
+    EFI_RESOURCE_TYPE resource_type;
+
+    TdxRamEntry *e;
+    int i;
+
+    for (i = 0; i < tdx->nr_ram_entries; i++) {
+        e = &tdx->ram_entries[i];
+
+        if (e->type == TDX_RAM_UNACCEPTED) {
+            resource_type = EFI_RESOURCE_MEMORY_UNACCEPTED;
+            attr = EFI_RESOURCE_ATTRIBUTE_TDVF_UNACCEPTED;
+        } else if (e->type == TDX_RAM_ADDED) {
+            resource_type = EFI_RESOURCE_SYSTEM_MEMORY;
+            attr = EFI_RESOURCE_ATTRIBUTE_TDVF_PRIVATE;
+        } else {
+            error_report("unknown TDX_RAM_ENTRY type %d", e->type);
+            exit(1);
+        }
+
+        region = tdvf_get_area(hob, sizeof(*region));
+        *region = (EFI_HOB_RESOURCE_DESCRIPTOR) {
+            .Header = {
+                .HobType = EFI_HOB_TYPE_RESOURCE_DESCRIPTOR,
+                .HobLength = cpu_to_le16(sizeof(*region)),
+                .Reserved = cpu_to_le32(0),
+            },
+            .Owner = EFI_HOB_OWNER_ZERO,
+            .ResourceType = cpu_to_le32(resource_type),
+            .ResourceAttribute = cpu_to_le32(attr),
+            .PhysicalStart = cpu_to_le64(e->address),
+            .ResourceLength = cpu_to_le64(e->length),
+        };
+    }
+}
+
+void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *td_hob)
+{
+    TdvfHob hob = {
+        .hob_addr = td_hob->address,
+        .size = td_hob->size,
+        .ptr = td_hob->mem_ptr,
+
+        .current = td_hob->mem_ptr,
+        .end = td_hob->mem_ptr + td_hob->size,
+    };
+
+    EFI_HOB_GENERIC_HEADER *last_hob;
+    EFI_HOB_HANDOFF_INFO_TABLE *hit;
+
+    /* Note, Efi{Free}Memory{Bottom,Top} are ignored, leave 'em zeroed. */
+    hit = tdvf_get_area(&hob, sizeof(*hit));
+    *hit = (EFI_HOB_HANDOFF_INFO_TABLE) {
+        .Header = {
+            .HobType = EFI_HOB_TYPE_HANDOFF,
+            .HobLength = cpu_to_le16(sizeof(*hit)),
+            .Reserved = cpu_to_le32(0),
+        },
+        .Version = cpu_to_le32(EFI_HOB_HANDOFF_TABLE_VERSION),
+        .BootMode = cpu_to_le32(0),
+        .EfiMemoryTop = cpu_to_le64(0),
+        .EfiMemoryBottom = cpu_to_le64(0),
+        .EfiFreeMemoryTop = cpu_to_le64(0),
+        .EfiFreeMemoryBottom = cpu_to_le64(0),
+        .EfiEndOfHobList = cpu_to_le64(0), /* initialized later */
+    };
+
+    tdvf_hob_add_memory_resources(tdx, &hob);
+
+    last_hob = tdvf_get_area(&hob, sizeof(*last_hob));
+    *last_hob = (EFI_HOB_GENERIC_HEADER) {
+        .HobType = EFI_HOB_TYPE_END_OF_HOB_LIST,
+        .HobLength = cpu_to_le16(sizeof(*last_hob)),
+        .Reserved = cpu_to_le32(0),
+    };
+    hit->EfiEndOfHobList = cpu_to_le64(tdvf_current_guest_addr(&hob));
+}
diff --git a/hw/i386/tdvf-hob.h b/hw/i386/tdvf-hob.h
new file mode 100644
index 000000000000..1b737e946a8d
--- /dev/null
+++ b/hw/i386/tdvf-hob.h
@@ -0,0 +1,24 @@
+#ifndef HW_I386_TD_HOB_H
+#define HW_I386_TD_HOB_H
+
+#include "hw/i386/tdvf.h"
+#include "target/i386/kvm/tdx.h"
+
+void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *td_hob);
+
+#define EFI_RESOURCE_ATTRIBUTE_TDVF_PRIVATE     \
+    (EFI_RESOURCE_ATTRIBUTE_PRESENT |           \
+     EFI_RESOURCE_ATTRIBUTE_INITIALIZED |       \
+     EFI_RESOURCE_ATTRIBUTE_TESTED)
+
+#define EFI_RESOURCE_ATTRIBUTE_TDVF_UNACCEPTED  \
+    (EFI_RESOURCE_ATTRIBUTE_PRESENT |           \
+     EFI_RESOURCE_ATTRIBUTE_INITIALIZED |       \
+     EFI_RESOURCE_ATTRIBUTE_TESTED)
+
+#define EFI_RESOURCE_ATTRIBUTE_TDVF_MMIO        \
+    (EFI_RESOURCE_ATTRIBUTE_PRESENT     |       \
+     EFI_RESOURCE_ATTRIBUTE_INITIALIZED |       \
+     EFI_RESOURCE_ATTRIBUTE_UNCACHEABLE)
+
+#endif
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 59cff141b4f3..944f2f5b6921 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -22,6 +22,7 @@
 #include "hw/i386/e820_memory_layout.h"
 #include "hw/i386/x86.h"
 #include "hw/i386/tdvf.h"
+#include "hw/i386/tdvf-hob.h"
 #include "kvm_i386.h"
 #include "tdx.h"
 #include "../cpu-internal.h"
@@ -454,6 +455,19 @@ static void update_tdx_cpuid_lookup_by_tdx_caps(void)
             (tdx_caps->xfam_fixed1 & CPUID_XSTATE_XSS_MASK) >> 32;
 }
 
+static TdxFirmwareEntry *tdx_get_hob_entry(TdxGuest *tdx)
+{
+    TdxFirmwareEntry *entry;
+
+    for_each_tdx_fw_entry(&tdx->tdvf, entry) {
+        if (entry->type == TDVF_SECTION_TYPE_TD_HOB) {
+            return entry;
+        }
+    }
+    error_report("TDVF metadata doesn't specify TD_HOB location.");
+    exit(1);
+}
+
 static void tdx_add_ram_entry(uint64_t address, uint64_t length, uint32_t type)
 {
     uint32_t nr_entries = tdx_guest->nr_ram_entries;
@@ -584,6 +598,8 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
 
     qsort(tdx_guest->ram_entries, tdx_guest->nr_ram_entries,
           sizeof(TdxRamEntry), &tdx_ram_entry_compare);
+
+    tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
 }
 
 static Notifier tdx_machine_done_notify = {
-- 
2.27.0



* [PATCH v1 28/40] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (26 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 27/40] i386/tdx: Setup the TD HOB list Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 29/40] i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu Xiaoyao Li
                   ` (13 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

From: Isaku Yamahata <isaku.yamahata@intel.com>

TDVF firmware (CODE and VARS) needs to be added/copied to TD's private
memory via KVM_TDX_INIT_MEM_REGION, as well as TD HOB and TEMP memory.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>

---
Changes from RFC v4:
  - rename variable @metadata to @flags
---
 target/i386/kvm/tdx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 944f2f5b6921..d0bbe06f5504 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -575,6 +575,7 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
 {
     TdxFirmware *tdvf = &tdx_guest->tdvf;
     TdxFirmwareEntry *entry;
+    int r;
 
     tdx_init_ram_entries();
 
@@ -600,6 +601,29 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
           sizeof(TdxRamEntry), &tdx_ram_entry_compare);
 
     tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
+
+    for_each_tdx_fw_entry(tdvf, entry) {
+        struct kvm_tdx_init_mem_region mem_region = {
+            .source_addr = (__u64)entry->mem_ptr,
+            .gpa = entry->address,
+            .nr_pages = entry->size / 4096,
+        };
+
+        __u32 flags = entry->attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND ?
+                      KVM_TDX_MEASURE_MEMORY_REGION : 0;
+
+        r = tdx_vm_ioctl(KVM_TDX_INIT_MEM_REGION, flags, &mem_region);
+        if (r < 0) {
+            error_report("KVM_TDX_INIT_MEM_REGION failed %s", strerror(-r));
+            exit(1);
+        }
+
+        if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
+            entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
+            qemu_ram_munmap(-1, entry->mem_ptr, entry->size);
+            entry->mem_ptr = NULL;
+        }
+    }
 }
 
 static Notifier tdx_machine_done_notify = {
-- 
2.27.0



* [PATCH v1 29/40] i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (27 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 28/40] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 30/40] i386/tdx: Finalize TDX VM Xiaoyao Li
                   ` (12 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

A TDX vcpu needs to be initialized by SEAMCALL(TDH.VP.INIT), and KVM
provides the vcpu-level IOCTL KVM_TDX_INIT_VCPU for it.

KVM_TDX_INIT_VCPU needs the address of the HOB as input. Invoke it for
each vcpu after the HOB list is created.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
 target/i386/kvm/tdx.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index d0bbe06f5504..2dbe26f2e950 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -571,6 +571,22 @@ static void tdx_init_ram_entries(void)
     tdx_guest->nr_ram_entries = j;
 }
 
+static void tdx_post_init_vcpus(void)
+{
+    TdxFirmwareEntry *hob;
+    CPUState *cpu;
+    int r;
+
+    hob = tdx_get_hob_entry(tdx_guest);
+    CPU_FOREACH(cpu) {
+        r = tdx_vcpu_ioctl(cpu, KVM_TDX_INIT_VCPU, 0, (void *)hob->address);
+        if (r < 0) {
+            error_report("KVM_TDX_INIT_VCPU failed %s", strerror(-r));
+            exit(1);
+        }
+    }
+}
+
 static void tdx_finalize_vm(Notifier *notifier, void *unused)
 {
     TdxFirmware *tdvf = &tdx_guest->tdvf;
@@ -602,6 +618,8 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
 
     tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
 
+    tdx_post_init_vcpus();
+
     for_each_tdx_fw_entry(tdvf, entry) {
         struct kvm_tdx_init_mem_region mem_region = {
             .source_addr = (__u64)entry->mem_ptr,
-- 
2.27.0



* [PATCH v1 30/40] i386/tdx: Finalize TDX VM
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (28 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 29/40] i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 31/40] i386/tdx: Disable SMM for TDX VMs Xiaoyao Li
                   ` (11 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Invoke KVM_TDX_FINALIZE_VM to finalize the TD's measurement and make
the TD vCPUs runnable once machine initialization is complete.
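
For illustration only, the call order that tdx_finalize_vm() ends up with
after this patch (HOB creation, per-vcpu init, TDVF memory regions, then
finalize) can be modeled as a small state machine. This is a sketch under
assumptions: the td_build_* names and the stage enum are hypothetical and
exist neither in QEMU nor in KVM; the real code simply issues the ioctls in
sequence.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stages, mirroring the call order in tdx_finalize_vm(). */
enum td_build_stage {
    TD_STAGE_NEW,
    TD_STAGE_HOB_CREATED,        /* tdvf_hob_create() */
    TD_STAGE_VCPUS_INITIALIZED,  /* KVM_TDX_INIT_VCPU on each vcpu */
    TD_STAGE_MEMORY_ADDED,       /* KVM_TDX_INIT_MEM_REGION per TDVF entry */
    TD_STAGE_FINALIZED,          /* KVM_TDX_FINALIZE_VM */
};

struct td_build {
    enum td_build_stage stage;
};

/* Advance only to the immediate successor of the current stage. */
static int td_build_advance(struct td_build *td, enum td_build_stage next)
{
    assert(td);
    if ((int)next != (int)td->stage + 1) {
        return -EINVAL;
    }
    td->stage = next;
    return 0;
}

/* In this model the TD may run only after the finalize step. */
static int td_build_is_runnable(const struct td_build *td)
{
    assert(td);
    return td->stage == TD_STAGE_FINALIZED;
}
```

In this model, skipping a stage or finalizing early returns -EINVAL, and
td_build_is_runnable() plays the role of the `ready` flag set above.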

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
 target/i386/kvm/tdx.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 2dbe26f2e950..1de767a990ba 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -642,6 +642,13 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
             entry->mem_ptr = NULL;
         }
     }
+
+    r = tdx_vm_ioctl(KVM_TDX_FINALIZE_VM, 0, NULL);
+    if (r < 0) {
+        error_report("KVM_TDX_FINALIZE_VM failed %s", strerror(-r));
+        exit(1);
+    }
+    tdx_guest->parent_obj.ready = true;
 }
 
 static Notifier tdx_machine_done_notify = {
-- 
2.27.0



* [PATCH v1 31/40] i386/tdx: Disable SMM for TDX VMs
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (29 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 30/40] i386/tdx: Finalize TDX VM Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 32/40] i386/tdx: Disable PIC " Xiaoyao Li
                   ` (10 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

TDX doesn't support SMM, and the VMM cannot emulate SMM for TDX VMs
because it cannot manipulate a TDX VM's memory.

Disable SMM for TDX VMs and error out if the user requests SMM.
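
The tri-state handling used here (auto becomes off, explicit on is an
error) can be sketched in isolation as follows. The enum and function names
are hypothetical stand-ins for this example; QEMU's real type is OnOffAuto
and the real check lives in tdx_kvm_init().

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for QEMU's OnOffAuto; the values are local to this sketch. */
typedef enum {
    SKETCH_AUTO,
    SKETCH_ON,
    SKETCH_OFF,
} sketch_on_off_auto;

/*
 * Resolve a tri-state option for a feature the platform cannot provide:
 * AUTO is turned into OFF, OFF is kept, and an explicit ON is rejected
 * with -EINVAL while leaving the option untouched.
 */
static int resolve_unsupported_option(sketch_on_off_auto *opt)
{
    assert(opt);
    switch (*opt) {
    case SKETCH_AUTO:
        *opt = SKETCH_OFF;
        return 0;
    case SKETCH_OFF:
        return 0;
    case SKETCH_ON:
    default:
        return -EINVAL;
    }
}
```

Rejecting an explicit "on" instead of silently overriding it keeps the
user's request and the resulting machine configuration consistent.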

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
 target/i386/kvm/tdx.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 1de767a990ba..70c56b7ba32c 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -657,9 +657,17 @@ static Notifier tdx_machine_done_notify = {
 
 int tdx_kvm_init(MachineState *ms, Error **errp)
 {
+    X86MachineState *x86ms = X86_MACHINE(ms);
     TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
                                                     TYPE_TDX_GUEST);
 
+    if (x86ms->smm == ON_OFF_AUTO_AUTO) {
+        x86ms->smm = ON_OFF_AUTO_OFF;
+    } else if (x86ms->smm == ON_OFF_AUTO_ON) {
+        error_setg(errp, "TDX VM doesn't support SMM");
+        return -EINVAL;
+    }
+
     if (!tdx_caps) {
         get_tdx_capabilities();
     }
-- 
2.27.0



* [PATCH v1 32/40] i386/tdx: Disable PIC for TDX VMs
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (30 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 31/40] i386/tdx: Disable SMM for TDX VMs Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 33/40] i386/tdx: Don't allow system reset " Xiaoyao Li
                   ` (9 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Legacy PIC (8259) cannot be supported for TDX VMs since the TDX module
doesn't allow direct interrupt injection.  Using posted interrupts
for the PIC is not a viable option as the guest BIOS/kernel will not
do EOI for PIC IRQs, i.e. will leave the vIRR bit set.

Hence disable PIC for TDX VMs and error out if the user requests PIC.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
 target/i386/kvm/tdx.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 70c56b7ba32c..2f317a6bb55b 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -668,6 +668,13 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
         return -EINVAL;
     }
 
+    if (x86ms->pic == ON_OFF_AUTO_AUTO) {
+        x86ms->pic = ON_OFF_AUTO_OFF;
+    } else if (x86ms->pic == ON_OFF_AUTO_ON) {
+        error_setg(errp, "TDX VM doesn't support PIC");
+        return -EINVAL;
+    }
+
     if (!tdx_caps) {
         get_tdx_capabilities();
     }
-- 
2.27.0



* [PATCH v1 33/40] i386/tdx: Don't allow system reset for TDX VMs
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (31 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 32/40] i386/tdx: Disable PIC " Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 34/40] hw/i386: add eoi_intercept_unsupported member to X86MachineState Xiaoyao Li
                   ` (8 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

TDX CPU state is protected, so vcpu state cannot be reset by the VMM.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
 target/i386/kvm/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 1545b6f870f5..8c282122ed67 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5455,7 +5455,7 @@ bool kvm_has_waitpkg(void)
 
 bool kvm_arch_cpu_check_are_resettable(void)
 {
-    return !sev_es_enabled();
+    return !sev_es_enabled() && !is_tdx_vm();
 }
 
 #define ARCH_REQ_XCOMP_GUEST_PERM       0x1025
-- 
2.27.0



* [PATCH v1 34/40] hw/i386: add eoi_intercept_unsupported member to X86MachineState
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (32 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 33/40] i386/tdx: Don't allow system reset " Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-26 10:32   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 35/40] hw/i386: add option to forcibly report edge trigger in acpi tables Xiaoyao Li
                   ` (7 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Add a new bool member, eoi_intercept_unsupported, to X86MachineState
with default value false. Set it to true for TDX VMs.

When EOI cannot be intercepted, a level-triggered interrupt cannot be
re-injected while its line is still asserted, which affects interrupt
controller emulation.
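
A toy model may help show why EOI interception matters for level-triggered
interrupts. It is a sketch under assumptions, not QEMU's IOAPIC emulation:
the struct and the model_* functions are hypothetical and only track how
often an interrupt is delivered while the line stays asserted.

```c
#include <assert.h>
#include <stdbool.h>

struct irq_line_model {
    bool level_asserted;   /* device still holds the line high */
    bool in_service;       /* delivered, and no EOI seen by the VMM yet */
    bool eoi_intercepted;  /* can the VMM observe the guest's EOI? */
    unsigned deliveries;   /* number of injections so far */
};

/* Inject once if the line is asserted and nothing is in service. */
static void model_inject_if_pending(struct irq_line_model *m)
{
    if (m->level_asserted && !m->in_service) {
        m->in_service = true;
        m->deliveries++;
    }
}

static void model_raise(struct irq_line_model *m)
{
    assert(m);
    m->level_asserted = true;
    model_inject_if_pending(m);
}

static void model_lower(struct irq_line_model *m)
{
    assert(m);
    m->level_asserted = false;
}

/* The guest signals end-of-interrupt. */
static void model_guest_eoi(struct irq_line_model *m)
{
    assert(m);
    if (!m->eoi_intercepted) {
        /* The VMM never learns about the EOI, so it cannot re-inject. */
        return;
    }
    m->in_service = false;
    model_inject_if_pending(m);
}
```

With eoi_intercepted set, an EOI while the line is still high produces a
second delivery; without it, the model delivers once and then stalls,
which is the behaviour the new flag lets other code account for.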

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/i386/x86.c         | 1 +
 include/hw/i386/x86.h | 1 +
 target/i386/kvm/tdx.c | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a389ee26265a..6ab023713bf1 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1401,6 +1401,7 @@ static void x86_machine_initfn(Object *obj)
     x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
     x86ms->bus_lock_ratelimit = 0;
     x86ms->above_4g_mem_start = 4 * GiB;
+    x86ms->eoi_intercept_unsupported = false;
 }
 
 static void x86_machine_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 62fa5774f849..0a294f9c3176 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -61,6 +61,7 @@ struct X86MachineState {
 
     /* CPU and apic information: */
     bool apic_xrupt_override;
+    bool eoi_intercept_unsupported;
     unsigned pci_irq_mask;
     unsigned apic_id_limit;
     uint16_t boot_cpus;
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 2f317a6bb55b..c734772200d0 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -675,6 +675,8 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
         return -EINVAL;
     }
 
+    x86ms->eoi_intercept_unsupported = true;
+
     if (!tdx_caps) {
         get_tdx_capabilities();
     }
-- 
2.27.0



* [PATCH v1 35/40] hw/i386: add option to forcibly report edge trigger in acpi tables
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (33 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 34/40] hw/i386: add eoi_intercept_unsupported member to X86MachineState Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-26 10:32   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 36/40] i386/tdx: Don't synchronize guest tsc for TDs Xiaoyao Li
                   ` (6 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

From: Isaku Yamahata <isaku.yamahata@intel.com>

When level trigger isn't supported on the x86 platform,
forcibly report edge trigger in the ACPI tables.
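
The MADT part of this change passes raw flag values (1 | (1 << 2) and 0xd)
to build_xrupt_override(). As a reading aid, the sketch below spells out
the MPS INTI flag layout used by Interrupt Source Override entries
(polarity in bits 1:0, trigger mode in bits 3:2). The enum and helper
names are hypothetical and are not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Polarity field, bits 1:0 of the MPS INTI flags. */
enum madt_polarity {
    MADT_POL_CONFORMS    = 0,
    MADT_POL_ACTIVE_HIGH = 1,
    MADT_POL_ACTIVE_LOW  = 3,
};

/* Trigger mode field, bits 3:2 of the MPS INTI flags. */
enum madt_trigger {
    MADT_TRIG_CONFORMS = 0,
    MADT_TRIG_EDGE     = 1,
    MADT_TRIG_LEVEL    = 3,
};

/* Pack polarity and trigger mode into the 16-bit flags field. */
static uint16_t madt_inti_flags(enum madt_polarity pol, enum madt_trigger trig)
{
    return (uint16_t)((pol & 0x3u) | ((trig & 0x3u) << 2));
}

/* Flags for a PCI IRQ override, depending on level-trigger support. */
static uint16_t madt_pci_irq_flags(int level_trigger_unsupported)
{
    assert(level_trigger_unsupported == 0 || level_trigger_unsupported == 1);
    return madt_inti_flags(MADT_POL_ACTIVE_HIGH,
                           level_trigger_unsupported ? MADT_TRIG_EDGE
                                                     : MADT_TRIG_LEVEL);
}
```

With this layout, active high plus edge triggered encodes to 0x5 and
active high plus level triggered encodes to 0xd, matching the literals in
the diff below.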

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/i386/acpi-build.c  | 99 ++++++++++++++++++++++++++++---------------
 hw/i386/acpi-common.c | 50 ++++++++++++++++------
 2 files changed, 104 insertions(+), 45 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 0355bd3ddaad..83d4777ca9ad 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -894,7 +894,8 @@ static void build_dbg_aml(Aml *table)
     aml_append(table, scope);
 }
 
-static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
+static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg,
+                           bool level_trigger_unsupported)
 {
     Aml *dev;
     Aml *crs;
@@ -906,7 +907,10 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
     aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
 
     crs = aml_resource_template();
-    aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
+    aml_append(crs, aml_interrupt(AML_CONSUMER,
+                                  level_trigger_unsupported ?
+                                  AML_EDGE : AML_LEVEL,
+                                  AML_ACTIVE_HIGH,
                                   AML_SHARED, irqs, ARRAY_SIZE(irqs)));
     aml_append(dev, aml_name_decl("_PRS", crs));
 
@@ -930,7 +934,8 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
     return dev;
  }
 
-static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
+static Aml *build_gsi_link_dev(const char *name, uint8_t uid,
+                               uint8_t gsi, bool level_trigger_unsupported)
 {
     Aml *dev;
     Aml *crs;
@@ -943,7 +948,10 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
 
     crs = aml_resource_template();
     irqs = gsi;
-    aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
+    aml_append(crs, aml_interrupt(AML_CONSUMER,
+                                  level_trigger_unsupported ?
+                                  AML_EDGE : AML_LEVEL,
+                                  AML_ACTIVE_HIGH,
                                   AML_SHARED, &irqs, 1));
     aml_append(dev, aml_name_decl("_PRS", crs));
 
@@ -962,7 +970,7 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
 }
 
 /* _CRS method - get current settings */
-static Aml *build_iqcr_method(bool is_piix4)
+static Aml *build_iqcr_method(bool is_piix4, bool level_trigger_unsupported)
 {
     Aml *if_ctx;
     uint32_t irqs;
@@ -970,7 +978,9 @@ static Aml *build_iqcr_method(bool is_piix4)
     Aml *crs = aml_resource_template();
 
     irqs = 0;
-    aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
+    aml_append(crs, aml_interrupt(AML_CONSUMER,
+                                  level_trigger_unsupported ?
+                                  AML_EDGE : AML_LEVEL,
                                   AML_ACTIVE_HIGH, AML_SHARED, &irqs, 1));
     aml_append(method, aml_name_decl("PRR0", crs));
 
@@ -1004,7 +1014,7 @@ static Aml *build_irq_status_method(void)
     return method;
 }
 
-static void build_piix4_pci0_int(Aml *table)
+static void build_piix4_pci0_int(Aml *table, bool level_trigger_unsupported)
 {
     Aml *dev;
     Aml *crs;
@@ -1025,12 +1035,16 @@ static void build_piix4_pci0_int(Aml *table)
     aml_append(sb_scope, field);
 
     aml_append(sb_scope, build_irq_status_method());
-    aml_append(sb_scope, build_iqcr_method(true));
+    aml_append(sb_scope, build_iqcr_method(true, level_trigger_unsupported));
 
-    aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0")));
-    aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1")));
-    aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2")));
-    aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3")));
+    aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3"),
+                                        level_trigger_unsupported));
 
     dev = aml_device("LNKS");
     {
@@ -1039,7 +1053,9 @@ static void build_piix4_pci0_int(Aml *table)
 
         crs = aml_resource_template();
         irqs = 9;
-        aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
+        aml_append(crs, aml_interrupt(AML_CONSUMER,
+                                      level_trigger_unsupported ?
+                                      AML_EDGE : AML_LEVEL,
                                       AML_ACTIVE_HIGH, AML_SHARED,
                                       &irqs, 1));
         aml_append(dev, aml_name_decl("_PRS", crs));
@@ -1125,7 +1141,7 @@ static Aml *build_q35_routing_table(const char *str)
     return pkg;
 }
 
-static void build_q35_pci0_int(Aml *table)
+static void build_q35_pci0_int(Aml *table, bool level_trigger_unsupported)
 {
     Aml *field;
     Aml *method;
@@ -1177,25 +1193,41 @@ static void build_q35_pci0_int(Aml *table)
     aml_append(sb_scope, field);
 
     aml_append(sb_scope, build_irq_status_method());
-    aml_append(sb_scope, build_iqcr_method(false));
+    aml_append(sb_scope, build_iqcr_method(false, level_trigger_unsupported));
 
-    aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA")));
-    aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB")));
-    aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC")));
-    aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD")));
-    aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE")));
-    aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF")));
-    aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG")));
-    aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH")));
+    aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH"),
+                                        level_trigger_unsupported));
 
-    aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10));
-    aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11));
-    aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12));
-    aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13));
-    aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14));
-    aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15));
-    aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16));
-    aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17));
+    aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17,
+                                            level_trigger_unsupported));
 
     aml_append(table, sb_scope);
 }
@@ -1440,6 +1472,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
     PCMachineState *pcms = PC_MACHINE(machine);
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
     X86MachineState *x86ms = X86_MACHINE(machine);
+    bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
     AcpiMcfgInfo mcfg;
     bool mcfg_valid = !!acpi_get_mcfg(&mcfg);
     uint32_t nr_mem = machine->ram_slots;
@@ -1474,7 +1507,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
         if (pm->pcihp_bridge_en || pm->pcihp_root_en) {
             build_x86_acpi_pci_hotplug(dsdt, pm->pcihp_io_base);
         }
-        build_piix4_pci0_int(dsdt);
+        build_piix4_pci0_int(dsdt, level_trigger_unsupported);
     } else {
         sb_scope = aml_scope("_SB");
         dev = aml_device("PCI0");
@@ -1522,7 +1555,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
         if (pm->pcihp_bridge_en) {
             build_x86_acpi_pci_hotplug(dsdt, pm->pcihp_io_base);
         }
-        build_q35_pci0_int(dsdt);
+        build_q35_pci0_int(dsdt, level_trigger_unsupported);
         if (pcms->smbus) {
             build_smb0(dsdt, ICH9_SMB_DEV, ICH9_SMB_FUNC);
         }
diff --git a/hw/i386/acpi-common.c b/hw/i386/acpi-common.c
index 4aaafbdd7b5d..485fc17816be 100644
--- a/hw/i386/acpi-common.c
+++ b/hw/i386/acpi-common.c
@@ -105,6 +105,7 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
     AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(adev);
     AcpiTable table = { .sig = "APIC", .rev = 1, .oem_id = oem_id,
                         .oem_table_id = oem_table_id };
+    bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
 
     acpi_table_begin(&table, table_data);
     /* Local APIC Address */
@@ -124,18 +125,43 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
                      IO_APIC_SECONDARY_ADDRESS, IO_APIC_SECONDARY_IRQBASE);
     }
 
-    if (x86ms->apic_xrupt_override) {
-        build_xrupt_override(table_data, 0, 2,
-            0 /* Flags: Conforms to the specifications of the bus */);
-    }
-
-    for (i = 1; i < 16; i++) {
-        if (!(x86ms->pci_irq_mask & (1 << i))) {
-            /* No need for a INT source override structure. */
-            continue;
-        }
-        build_xrupt_override(table_data, i, i,
-            0xd /* Flags: Active high, Level Triggered */);
+    if (level_trigger_unsupported) {
+        /* Force edge trigger */
+        if (x86ms->apic_xrupt_override) {
+            build_xrupt_override(table_data, 0, 2,
+                                 /* Flags: active high, edge triggered */
+                                 1 | (1 << 2));
+        }
+
+        for (i = x86ms->apic_xrupt_override ? 1 : 0; i < 16; i++) {
+            build_xrupt_override(table_data, i, i,
+                                 /* Flags: active high, edge triggered */
+                                 1 | (1 << 2));
+        }
+
+        if (x86ms->ioapic2) {
+            for (i = 0; i < 16; i++) {
+                build_xrupt_override(table_data, IO_APIC_SECONDARY_IRQBASE + i,
+                                     IO_APIC_SECONDARY_IRQBASE + i,
+                                     /* Flags: active high, edge triggered */
+                                     1 | (1 << 2));
+            }
+        }
+    } else {
+        if (x86ms->apic_xrupt_override) {
+            build_xrupt_override(table_data, 0, 2,
+                                 0 /* Flags: Conforms to the specifications of the bus */);
+        }
+
+        for (i = 1; i < 16; i++) {
+            if (!(x86ms->pci_irq_mask & (1 << i))) {
+                /* No need for a INT source override structure. */
+                continue;
+            }
+            build_xrupt_override(table_data, i, i,
+                                 0xd /* Flags: Active high, Level Triggered */);
+
+        }
     }
 
     if (x2apic_mode) {
-- 
2.27.0



* [PATCH v1 36/40] i386/tdx: Don't synchronize guest tsc for TDs
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (34 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 35/40] hw/i386: add option to forcibly report edge trigger in acpi tables Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-26 10:33   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 37/40] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() " Xiaoyao Li
                   ` (5 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

From: Isaku Yamahata <isaku.yamahata@intel.com>

The TSC of a TD is not accessible and KVM doesn't allow access to
MSR_IA32_TSC for TDs. To avoid the assert() in kvm_get_tsc(), make
kvm_synchronize_all_tsc() a no-op for TDs.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8c282122ed67..8999f64eeaf1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -270,7 +270,7 @@ void kvm_synchronize_all_tsc(void)
 {
     CPUState *cpu;
 
-    if (kvm_enabled()) {
+    if (kvm_enabled() && !is_tdx_vm()) {
         CPU_FOREACH(cpu) {
             run_on_cpu(cpu, do_kvm_synchronize_tsc, RUN_ON_CPU_NULL);
         }
-- 
2.27.0



* [PATCH v1 37/40] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() for TDs
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (35 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 36/40] i386/tdx: Don't synchronize guest tsc for TDs Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-02  7:47 ` [PATCH v1 38/40] i386/tdx: Skip kvm_put_apicbase() " Xiaoyao Li
                   ` (4 subsequent siblings)
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

For TDs, MSR_IA32_UCODE_REV is the only MSR in kvm_init_msrs() that the
VMM can configure; the features enumerated or controlled by the other
MSRs there are not under VMM control.

Only configure MSR_IA32_UCODE_REV for TDs.
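
The resulting selection logic can be sketched as a pure function that
lists which MSRs would be written. This is an illustration under
assumptions: select_init_msrs() and struct msr_support are hypothetical,
the perf-capability and VMX MSRs handled by the real kvm_init_msrs() are
left out for brevity, and the SKETCH_ constants carry the architectural
MSR indices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MSR_IA32_UCODE_REV          0x8bu
#define SKETCH_MSR_IA32_CORE_CAPABILITY    0xcfu
#define SKETCH_MSR_IA32_ARCH_CAPABILITIES  0x10au

/* Which of the MSRs the host KVM reports as available. */
struct msr_support {
    bool arch_capabs;
    bool core_capabs;
    bool ucode_rev;
};

/*
 * Store the indices of the MSRs to initialize in 'out' (at most 'cap'
 * entries) and return how many were stored. For a TD only the microcode
 * revision MSR is selected.
 */
static size_t select_init_msrs(const struct msr_support *has, bool is_td,
                               uint32_t *out, size_t cap)
{
    size_t n = 0;

    assert(has && out);
    if (!is_td) {
        if (has->arch_capabs && n < cap) {
            out[n++] = SKETCH_MSR_IA32_ARCH_CAPABILITIES;
        }
        if (has->core_capabs && n < cap) {
            out[n++] = SKETCH_MSR_IA32_CORE_CAPABILITY;
        }
    }
    if (has->ucode_rev && n < cap) {
        out[n++] = SKETCH_MSR_IA32_UCODE_REV;
    }
    return n;
}
```

Keeping the microcode revision outside the !is_td branch mirrors the
structure of the diff below, where only that entry stays unconditional.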

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
---
 target/i386/kvm/kvm.c | 44 ++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8999f64eeaf1..53ab539e7e4d 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3167,32 +3167,34 @@ static void kvm_init_msrs(X86CPU *cpu)
     CPUX86State *env = &cpu->env;
 
     kvm_msr_buf_reset(cpu);
-    if (has_msr_arch_capabs) {
-        kvm_msr_entry_add(cpu, MSR_IA32_ARCH_CAPABILITIES,
-                          env->features[FEAT_ARCH_CAPABILITIES]);
-    }
-
-    if (has_msr_core_capabs) {
-        kvm_msr_entry_add(cpu, MSR_IA32_CORE_CAPABILITY,
-                          env->features[FEAT_CORE_CAPABILITY]);
-    }
-
-    if (has_msr_perf_capabs && cpu->enable_pmu) {
-        kvm_msr_entry_add_perf(cpu, env->features);
+
+    if (!is_tdx_vm()) {
+        if (has_msr_arch_capabs) {
+            kvm_msr_entry_add(cpu, MSR_IA32_ARCH_CAPABILITIES,
+                                env->features[FEAT_ARCH_CAPABILITIES]);
+        }
+
+        if (has_msr_core_capabs) {
+            kvm_msr_entry_add(cpu, MSR_IA32_CORE_CAPABILITY,
+                                env->features[FEAT_CORE_CAPABILITY]);
+        }
+
+        if (has_msr_perf_capabs && cpu->enable_pmu) {
+            kvm_msr_entry_add_perf(cpu, env->features);
+        }
+
+        /*
+         * Older kernels do not include VMX MSRs in KVM_GET_MSR_INDEX_LIST, but
+         * all kernels with MSR features should have them.
+         */
+        if (kvm_feature_msrs && cpu_has_vmx(env)) {
+            kvm_msr_entry_add_vmx(cpu, env->features);
+        }
     }
 
     if (has_msr_ucode_rev) {
         kvm_msr_entry_add(cpu, MSR_IA32_UCODE_REV, cpu->ucode_rev);
     }
-
-    /*
-     * Older kernels do not include VMX MSRs in KVM_GET_MSR_INDEX_LIST, but
-     * all kernels with MSR features should have them.
-     */
-    if (kvm_feature_msrs && cpu_has_vmx(env)) {
-        kvm_msr_entry_add_vmx(cpu, env->features);
-    }
-
     assert(kvm_buf_set_msrs(cpu) == 0);
 }
 
-- 
2.27.0



* [PATCH v1 38/40] i386/tdx: Skip kvm_put_apicbase() for TDs
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (36 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 37/40] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() " Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-26 10:34   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 39/40] i386/tdx: Don't get/put guest state for TDX VMs Xiaoyao Li
                   ` (3 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

KVM doesn't allow writing to MSR_IA32_APICBASE for TDs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 53ab539e7e4d..948c87ebdb97 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2949,6 +2949,11 @@ void kvm_put_apicbase(X86CPU *cpu, uint64_t value)
 {
     int ret;
 
+    /* TODO: Allow accessing guest state for debug TDs. */
+    if (is_tdx_vm()) {
+        return;
+    }
+
     ret = kvm_put_one_msr(cpu, MSR_IA32_APICBASE, value);
     assert(ret == 1);
 }
-- 
2.27.0



* [PATCH v1 39/40] i386/tdx: Don't get/put guest state for TDX VMs
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (37 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 38/40] i386/tdx: Skip kvm_put_apicbase() " Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-26 10:35   ` Gerd Hoffmann
  2022-08-02  7:47 ` [PATCH v1 40/40] docs: Add TDX documentation Xiaoyao Li
                   ` (2 subsequent siblings)
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

From: Sean Christopherson <sean.j.christopherson@intel.com>

Don't get/put state of TDX VMs since accessing/mutating guest state of
production TDs is not supported.

Note, it will be allowed for a debug TD. Corresponding support will be
introduced when debug TD support is implemented in the future.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 948c87ebdb97..95afbbac7116 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4584,6 +4584,11 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 
     assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
 
+    /* TODO: Allow accessing guest state for debug TDs. */
+    if (is_tdx_vm()) {
+        return 0;
+    }
+
     /* must be before kvm_put_nested_state so that EFER.SVME is set */
     ret = has_sregs2 ? kvm_put_sregs2(x86_cpu) : kvm_put_sregs(x86_cpu);
     if (ret < 0) {
@@ -4678,6 +4683,12 @@ int kvm_arch_get_registers(CPUState *cs)
     if (ret < 0) {
         goto out;
     }
+
+    /* TODO: Allow accessing guest state for debug TDs. */
+    if (is_tdx_vm()) {
+        return 0;
+    }
+
     ret = kvm_getput_regs(cpu, 0);
     if (ret < 0) {
         goto out;
-- 
2.27.0



* [PATCH v1 40/40] docs: Add TDX documentation
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (38 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 39/40] i386/tdx: Don't get/put guest state for TDX VMs Xiaoyao Li
@ 2022-08-02  7:47 ` Xiaoyao Li
  2022-08-26 10:36   ` Gerd Hoffmann
  2022-08-02  9:49 ` [PATCH v1 00/40] TDX QEMU support Daniel P. Berrangé
  2022-09-05  0:58 ` Xiaoyao Li
  41 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02  7:47 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Add docs/system/i386/tdx.rst for TDX support, and add tdx in
confidential-guest-support.rst

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

---
changes in v5:
 - add the restriction that kernel-irqchip must be split
---
 docs/system/confidential-guest-support.rst |   1 +
 docs/system/i386/tdx.rst                   | 105 +++++++++++++++++++++
 docs/system/target-i386.rst                |   1 +
 3 files changed, 107 insertions(+)
 create mode 100644 docs/system/i386/tdx.rst

diff --git a/docs/system/confidential-guest-support.rst b/docs/system/confidential-guest-support.rst
index 0c490dbda2b7..66129fbab64c 100644
--- a/docs/system/confidential-guest-support.rst
+++ b/docs/system/confidential-guest-support.rst
@@ -38,6 +38,7 @@ Supported mechanisms
 Currently supported confidential guest mechanisms are:
 
 * AMD Secure Encrypted Virtualization (SEV) (see :doc:`i386/amd-memory-encryption`)
+* Intel Trust Domain Extensions (TDX) (see :doc:`i386/tdx`)
 * POWER Protected Execution Facility (PEF) (see :ref:`power-papr-protected-execution-facility-pef`)
 * s390x Protected Virtualization (PV) (see :doc:`s390x/protvirt`)
 
diff --git a/docs/system/i386/tdx.rst b/docs/system/i386/tdx.rst
new file mode 100644
index 000000000000..1f95e742f75c
--- /dev/null
+++ b/docs/system/i386/tdx.rst
@@ -0,0 +1,105 @@
+Intel Trust Domain Extensions (TDX)
+===================================
+
+Intel Trust Domain Extensions (TDX) refers to an Intel technology that extends
+Virtual Machine Extensions (VMX) and Multi-Key Total Memory Encryption (MKTME)
+with a new kind of virtual machine guest called a Trust Domain (TD). A TD runs
+in a CPU mode that is designed to protect the confidentiality of its memory
+contents and its CPU state from any other software, including the hosting
+Virtual Machine Monitor (VMM), unless explicitly shared by the TD itself.
+
+Prerequisites
+-------------
+
+To run a TD, the physical machine needs to have the TDX module loaded and
+initialized, and the KVM hypervisor needs to have TDX support enabled. If those
+requirements are met, ``KVM_CAP_VM_TYPES`` reports support for ``KVM_X86_TDX_VM``.
+
+Trust Domain Virtual Firmware (TDVF)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Trust Domain Virtual Firmware (TDVF) is required to provide TD services to boot
+TD Guest OS. TDVF needs to be copied to guest private memory and measured before
+a TD boots.
+
+The VM scope ``MEMORY_ENCRYPT_OP`` ioctl provides command ``KVM_TDX_INIT_MEM_REGION``
+to copy the TDVF image to TD's private memory space.
+
+Since TDX doesn't support read-only memslots, TDVF cannot be mapped as a pflash
+device; it is loaded as RAM instead, using the ``-bios`` option.
+
+OVMF is the open-source firmware that implements TDVF support. Thus the
+command line option to specify and load TDVF is ``-bios OVMF.fd``.
+
+Feature Control
+---------------
+
+Unlike a non-TDX VM, the CPU features (enumerated by CPUID or MSR) of a TD are
+not under full control of the VMM. The VMM can only configure some features of
+a TD, via the ``KVM_TDX_INIT_VM`` command of the VM scope ``MEMORY_ENCRYPT_OP`` ioctl.
+
+The configurable features have three types:
+
+- Attributes:
+  - PKS (bit 30) controls whether Supervisor Protection Keys is exposed to TD,
+  which determines related CPUID bit and CR4 bit;
+  - PERFMON (bit 63) controls whether PMU is exposed to TD.
+
+- XSAVE related features (XFAM):
+  XFAM is a 64-bit mask, which has the same format as XCR0 or IA32_XSS MSR. It
+  determines the set of extended features available for use by the guest TD.
+
+- CPUID features:
+  Only some bits of some CPUID leaves are directly configurable by VMM.
+
+What features can be configured is reported via TDX capabilities.
+
+TDX capabilities
+~~~~~~~~~~~~~~~~
+
+The VM scope ``MEMORY_ENCRYPT_OP`` ioctl provides command ``KVM_TDX_CAPABILITIES``
+to get the TDX capabilities from KVM. It returns a data structure of
+``struct kvm_tdx_capabilities``, which tells the supported configuration of
+attributes, XFAM and CPUIDs.
+
+Launching a TD (TDX VM)
+-----------------------
+
+To launch a TDX guest:
+
+.. parsed-literal::
+
+    |qemu_system_x86| \\
+        -machine ...,kernel-irqchip=split,confidential-guest-support=tdx0 \\
+        -object tdx-guest,id=tdx0 \\
+        -bios OVMF.fd
+
+Debugging
+---------
+
+Bit 0 of the TD attributes is the DEBUG bit, which decides whether the TD runs
+in off-TD debug mode. In off-TD debug mode, the TD's VCPU state and private
+memory are accessible via given SEAMCALLs. This requires KVM to expose APIs to
+invoke those SEAMCALLs, and corresponding QEMU changes.
+
+It's targeted as future work.
+
+Restrictions
+------------
+
+ - kernel-irqchip must be split;
+
+ - No readonly support for private memory;
+
+ - No SMM support: SMM support requires manipulating the guest register states
+   which is not allowed;
+
+Live Migration
+--------------
+
+TODO
+
+References
+----------
+
+- `TDX Homepage <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html>`__
diff --git a/docs/system/target-i386.rst b/docs/system/target-i386.rst
index e64c0130772d..25aa626b4a33 100644
--- a/docs/system/target-i386.rst
+++ b/docs/system/target-i386.rst
@@ -30,6 +30,7 @@ Architectural features
    i386/kvm-pv
    i386/sgx
    i386/amd-memory-encryption
+   i386/tdx
 
 .. _pcsys_005freq:
 
-- 
2.27.0



* Re: [PATCH v1 01/40] *** HACK *** linux-headers: Update headers to pull in TDX API changes
  2022-08-02  7:47 ` [PATCH v1 01/40] *** HACK *** linux-headers: Update headers to pull in TDX API changes Xiaoyao Li
@ 2022-08-02  9:47   ` Daniel P. Berrangé
  2022-08-02 10:38     ` Xiaoyao Li
  0 siblings, 1 reply; 80+ messages in thread
From: Daniel P. Berrangé @ 2022-08-02  9:47 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:11PM +0800, Xiaoyao Li wrote:
> Pull in recent TDX updates, which are not backwards compatible.
> 
> It's just to make this series runnable. It will be updated by script
> 
> 	scripts/update-linux-headers.sh
> 
> once TDX support is upstreamed in linux kernel.

I saw a bunch of TDX support merged in 5.19:

commit 3a755ebcc2557e22b895b8976257f682c653db1d
Merge: 5b828263b180 c796f02162e4
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon May 23 17:51:12 2022 -0700

    Merge tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
    Pull Intel TDX support from Borislav Petkov:
     "Intel Trust Domain Extensions (TDX) support.
    
      This is the Intel version of a confidential computing solution called
      Trust Domain Extensions (TDX). This series adds support to run the
      kernel as part of a TDX guest. It provides similar guest protections
      to AMD's SEV-SNP like guest memory and register state encryption,
      memory integrity protection and a lot more.
    
      Design-wise, it differs from AMD's solution considerably: it uses a
      software module which runs in a special CPU mode called (Secure
      Arbitration Mode) SEAM. As the name suggests, this module serves as
      sort of an arbiter which the confidential guest calls for services it
      needs during its lifetime.
    
      Just like AMD's SNP set, this series reworks and streamlines certain
      parts of x86 arch code so that this feature can be properly
      accomodated"


Is that sufficient for this patch, or is there more pending out of
tree that QEMU still depends on ?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



* Re: [PATCH v1 00/40] TDX QEMU support
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (39 preceding siblings ...)
  2022-08-02  7:47 ` [PATCH v1 40/40] docs: Add TDX documentation Xiaoyao Li
@ 2022-08-02  9:49 ` Daniel P. Berrangé
  2022-08-02 10:55   ` Xiaoyao Li
  2022-09-05  0:58 ` Xiaoyao Li
  41 siblings, 1 reply; 80+ messages in thread
From: Daniel P. Berrangé @ 2022-08-02  9:49 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:10PM +0800, Xiaoyao Li wrote:
> This is the first version that removes RFC tag since last RFC gots
> several acked-by. Hope more people and reviewers can help review it.
> 
> 
> This patch series aims to enable TDX support to allow creating and booting a
> TD (TDX VM) with QEMU. It needs to work with corresponding KVM patch [1].
> TDX related documents can be found in [2].
> 
> this series is also available in github:
> 
> https://github.com/intel/qemu-tdx/tree/tdx-qemu-upstream-v1
> 
> To boot a TDX VM, it requires several changes/additional steps in the flow:
> 
>  1. specify the vm type KVM_X86_TDX_VM when creating VM with
>     IOCTL(KVM_CREATE_VM);
>  2. initialize VM scope configuration before creating any VCPU;
>  3. initialize VCPU scope configuration;
>  4. initialize virtual firmware (TDVF) in guest private memory before
>     vcpu running;
> 
> Besides, TDX VM needs to boot with TDVF (TDX virtual firmware) and currently
> upstream OVMF can serve as TDVF. This series adds the support of parsing TDVF,
> loading TDVF into guest's private memory and preparing TD HOB info for TDVF.
> 
> [1] KVM TDX basic feature support v7
> https://lore.kernel.org/all/cover.1656366337.git.isaku.yamahata@intel.com/
> 
> [2] https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html
> 
> == Limitation and future work ==


> - CPU model
> 
>   We cannot create a TD with arbitrary CPU model like what for non-TDX VMs,
>   because only a subset of features can be configured for TD.
>   
>   - It's recommended to use '-cpu host' to create TD;
>   - '+feature/-feature' might not work as expected;
> 
>   future work: To introduce specific CPU model for TDs and enhance +/-features
>                for TDs.

Which features are incompatible with TDX ?

Presumably you have such a list, so that KVM can block them when
using '-cpu host' ? If so, we should be able to sanity check the
use of these features in QEMU for the named CPU models / feature
selection too.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



* Re: [PATCH v1 01/40] *** HACK *** linux-headers: Update headers to pull in TDX API changes
  2022-08-02  9:47   ` Daniel P. Berrangé
@ 2022-08-02 10:38     ` Xiaoyao Li
  0 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02 10:38 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 8/2/2022 5:47 PM, Daniel P. Berrangé wrote:
> On Tue, Aug 02, 2022 at 03:47:11PM +0800, Xiaoyao Li wrote:
>> Pull in recent TDX updates, which are not backwards compatible.
>>
>> It's just to make this series runnable. It will be updated by script
>>
>> 	scripts/update-linux-headers.sh
>>
>> once TDX support is upstreamed in linux kernel.
> 
> I saw a bunch of TDX support merged in 5.19:
> 
> commit 3a755ebcc2557e22b895b8976257f682c653db1d
> Merge: 5b828263b180 c796f02162e4
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Mon May 23 17:51:12 2022 -0700
> 
>      Merge tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>      
>      Pull Intel TDX support from Borislav Petkov:
>       "Intel Trust Domain Extensions (TDX) support.
>      
>        This is the Intel version of a confidential computing solution called
>        Trust Domain Extensions (TDX). This series adds support to run the
>        kernel as part of a TDX guest. It provides similar guest protections
>        to AMD's SEV-SNP like guest memory and register state encryption,
>        memory integrity protection and a lot more.
>      
>        Design-wise, it differs from AMD's solution considerably: it uses a
>        software module which runs in a special CPU mode called (Secure
>        Arbitration Mode) SEAM. As the name suggests, this module serves as
>        sort of an arbiter which the confidential guest calls for services it
>        needs during its lifetime.
>      
>        Just like AMD's SNP set, this series reworks and streamlines certain
>        parts of x86 arch code so that this feature can be properly
>        accomodated"
> 
> 
> Is that sufficient for this patch, or is there more pending out of
> tree that QEMU still depends on ?

That's TDX guest support, i.e., running Linux as a TDX guest OS.

What QEMU needs is TDX KVM support and that hasn't been merged yet.

> With regards,
> Daniel



* Re: [PATCH v1 00/40] TDX QEMU support
  2022-08-02  9:49 ` [PATCH v1 00/40] TDX QEMU support Daniel P. Berrangé
@ 2022-08-02 10:55   ` Xiaoyao Li
  2022-08-03 17:44     ` Daniel P. Berrangé
  0 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-02 10:55 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 8/2/2022 5:49 PM, Daniel P. Berrangé wrote:
> On Tue, Aug 02, 2022 at 03:47:10PM +0800, Xiaoyao Li wrote:

>> - CPU model
>>
>>    We cannot create a TD with arbitrary CPU model like what for non-TDX VMs,
>>    because only a subset of features can be configured for TD.
>>    
>>    - It's recommended to use '-cpu host' to create TD;
>>    - '+feature/-feature' might not work as expected;
>>
>>    future work: To introduce specific CPU model for TDs and enhance +/-features
>>                 for TDs.
> 
> Which features are incompatible with TDX ?

TDX enforces some features fixed to 1 (e.g., CPUID_EXT_X2APIC,
CPUID_EXT_HYPERVISOR) and some fixed to 0 (e.g., CPUID_EXT_VMX).

Details can be found in patch 8 and in the TDX spec chapter "CPUID virtualization".
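
As a rough illustration of what "fixed" means here, the effect on one CPUID
register can be sketched as below. This is only a sketch: the helper name
apply_tdx_fixed_bits() is hypothetical and does not exist in this series, and
the three macros simply spell out the architectural CPUID.01H:ECX bit
positions of the features named above.

```c
#include <stdint.h>

/* Architectural CPUID.01H:ECX bit positions of the features named above. */
#define ECX_VMX         (1U << 5)
#define ECX_X2APIC      (1U << 21)
#define ECX_HYPERVISOR  (1U << 31)

/*
 * Hypothetical helper (not part of this series): force the fixed-1 bits on
 * and the fixed-0 bits off, whatever the user asked for.
 */
static uint32_t apply_tdx_fixed_bits(uint32_t requested,
                                     uint32_t fixed0, uint32_t fixed1)
{
    return (requested | fixed1) & ~fixed0;
}
```

So a request that enables VMX and leaves out X2APIC still ends up with VMX
cleared and X2APIC set in what the TD sees.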

> Presumably you have such a list, so that KVM can block them when
> using '-cpu host' ? 

No, KVM doesn't do this. KVM reports no error, but what the TD OS sees
from CPUID might differ from what the user specifies in QEMU.

> If so, we should be able to sanity check the
> use of these features in QEMU for the named CPU models / feature
> selection too.

This series enhances get_supported_cpuid() for TDX. If a named CPU model
is used to boot a TDX guest, it will likely get warnings of "xxx feature
is not available".

We have another series to enhance "-feature" for TDX, to warn if a fixed1
feature is specified to be removed. Besides, we will introduce specific
named CPU models for TDX, e.g., TDX-SapphireRapids, which contains the
maximum feature set a TDX guest can have on an SPR host.
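
The "-feature" warning mentioned above could look roughly like the sketch
below. It is not code from either series: warn_removed_fixed1() and its
parameters are hypothetical, and the message text is made up for
illustration.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from this series): report every bit the user
 * asked to remove with "-feature" that TDX fixes to 1.
 * Returns the number of warnings printed.
 */
static int warn_removed_fixed1(uint32_t minus_features, uint32_t fixed1)
{
    uint32_t conflict = minus_features & fixed1;
    int warnings = 0;

    for (int bit = 0; bit < 32; bit++) {
        if (conflict & (1U << bit)) {
            fprintf(stderr,
                    "warning: bit %d is fixed to 1 for TDs and cannot be removed\n",
                    bit);
            warnings++;
        }
    }
    return warnings;
}
```

Removing a bit that is not fixed to 1 produces no warning in this sketch;
whether such a removal takes effect is a separate question.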

> 
> With regards,
> Daniel



* Re: [PATCH v1 08/40] i386/tdx: Adjust the supported CPUID based on TDX restrictions
  2022-08-02  7:47 ` [PATCH v1 08/40] i386/tdx: Adjust the supported CPUID based on TDX restrictions Xiaoyao Li
@ 2022-08-03  7:33   ` Chenyi Qiang
  2022-08-04  0:55     ` Xiaoyao Li
  2022-08-26  4:00     ` Xiaoyao Li
  2022-08-25 11:26   ` Gerd Hoffmann
  1 sibling, 2 replies; 80+ messages in thread
From: Chenyi Qiang @ 2022-08-03  7:33 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc



On 8/2/2022 3:47 PM, Xiaoyao Li wrote:
> According to Chapter "CPUID Virtualization" in TDX module spec, CPUID
> bits of TD can be classified into 6 types:
> 
> ------------------------------------------------------------------------
> 1 | As configured | configurable by VMM, independent of native value;
> ------------------------------------------------------------------------
> 2 | As configured | configurable by VMM if the bit is supported natively
>      (if native)   | Otherwise it equals as native(0).
> ------------------------------------------------------------------------
> 3 | Fixed         | fixed to 0/1
> ------------------------------------------------------------------------
> 4 | Native        | reflect the native value
> ------------------------------------------------------------------------
> 5 | Calculated    | calculated by TDX module.
> ------------------------------------------------------------------------
> 6 | Inducing #VE  | get #VE exception
> ------------------------------------------------------------------------
> 
> Note:
> 1. All the configurable XFAM related features and TD attributes related
>     features fall into type #2. And fixed0/1 bits of XFAM and TD
>     attributes fall into type #3.
> 
> 2. For CPUID leaves not listed in "CPUID virtualization Overview" table
>     in TDX module spec. When they are queried, TDX module injects #VE to
>     TDs. For this case, TDs can request CPUID emulation from VMM via
>     TDVMCALL and the values are fully controlled by VMM.
> 
> Because the TDX module has its own virtualization policy on CPUID bits, what
> is reported via KVM_GET_SUPPORTED_CPUID diverges from the supported CPUID
> bits for TDs. In order to keep a consistent CPUID configuration between VMM
> and TDs, adjust the supported CPUID for TDs based on TDX restrictions.
> 
> Currently only focus on the CPUID leaves recognized by QEMU's
> feature_word_info[] that are indexed by a FeatureWord.
> 
> Introduce a TDX CPUID lookup table, which maintains 1 entry for each
> FeatureWord. Each entry has below fields:
> 
>   - tdx_fixed0/1: The bits that are fixed as 0/1;
> 
>   - vmm_fixup:   The bits that are configurable from the view of TDX module.
>                  But they requires emulation of VMM when they are configured
> 	        as enabled. For those, they are not supported if VMM doesn't
> 		report them as supported. So they need be fixed up by
> 		checking if VMM supports them.
> 
>   - inducing_ve: TD gets #VE when querying this CPUID leaf. The result is
>                  totally configurable by VMM.
> 
>   - supported_on_ve: It's valid only when @inducing_ve is true. It represents
> 		    the maximum feature set supported that be emulated
> 		    for TDs.
> 
> By applying TDX CPUID lookup table and TDX capabilities reported from
> TDX module, the supported CPUID for TDs can be obtained from following
> steps:
> 
> - get the base of VMM supported feature set;
> 
> - if the leaf is not a FeatureWord just return VMM's value without
>    modification;
> 
> - if the leaf is an inducing_ve type, applying supported_on_ve mask and
>    return;
> 
> - include all native bits, it covers type #2, #4, and parts of type #1.
>    (it also includes some unsupported bits. The following step will
>     correct it.)
> 
> - apply fixed0/1 to it (it covers #3, and rectifies the previous step);
> 
> - add configurable bits (it covers the other part of type #1);
> 
> - fix the ones in vmm_fixup;
> 
> - filter the one has valid .supported field;

What does .supported field filter mean here?
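
For reference, read literally, the quoted steps before the ".supported" one
amount to roughly the following for a single register of a FeatureWord leaf.
This is a sketch under assumptions, not the code in this patch: the struct
and function names are hypothetical, the vmm_fixup step is taken to mean
"drop fixup bits the VMM does not support", and the final ".supported"
filter is left out because its meaning is exactly what is being asked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-register inputs; the names do not come from the patch. */
struct tdx_cpuid_inputs {
    uint32_t vmm;             /* bits the VMM (KVM/QEMU) reports as supported */
    uint32_t native;          /* host CPUID value */
    uint32_t configurable;    /* configurable bits from TDX capabilities */
    uint32_t fixed0;          /* bits fixed to 0 by TDX */
    uint32_t fixed1;          /* bits fixed to 1 by TDX */
    uint32_t vmm_fixup;       /* configurable bits that need VMM emulation */
    bool inducing_ve;         /* leaf causes #VE, value fully VMM-controlled */
    uint32_t supported_on_ve; /* valid only when inducing_ve is true */
};

static uint32_t tdx_supported_bits(const struct tdx_cpuid_inputs *in)
{
    uint32_t ret = in->vmm;                /* base: VMM supported set */

    if (in->inducing_ve) {
        return ret & in->supported_on_ve;  /* mask and return early */
    }

    ret |= in->native;                     /* include all native bits */
    ret |= in->fixed1;                     /* apply fixed1 ... */
    ret &= ~in->fixed0;                    /* ... and fixed0 */
    ret |= in->configurable;               /* add configurable bits */
    ret &= ~(in->vmm_fixup & ~in->vmm);    /* drop fixup bits the VMM lacks */
    return ret;
}
```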

> 
> (Calculated type is ignored since it's determined at runtime).
> 
> Co-developed-by: Chenyi Qiang <chenyi.qiang@intel.com>
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>   target/i386/cpu.h     |  16 +++
>   target/i386/kvm/kvm.c |   4 +
>   target/i386/kvm/tdx.c | 255 ++++++++++++++++++++++++++++++++++++++++++
>   target/i386/kvm/tdx.h |   2 +
>   4 files changed, 277 insertions(+)
> 
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 82004b65b944..cc9da9fc4318 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -771,6 +771,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
>   
>   /* Support RDFSBASE/RDGSBASE/WRFSBASE/WRGSBASE */
>   #define CPUID_7_0_EBX_FSGSBASE          (1U << 0)
> +/* Support for TSC adjustment MSR 0x3B */
> +#define CPUID_7_0_EBX_TSC_ADJUST        (1U << 1)
>   /* Support SGX */
>   #define CPUID_7_0_EBX_SGX               (1U << 2)
>   /* 1st Group of Advanced Bit Manipulation Extensions */
> @@ -789,8 +791,12 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
>   #define CPUID_7_0_EBX_INVPCID           (1U << 10)
>   /* Restricted Transactional Memory */
>   #define CPUID_7_0_EBX_RTM               (1U << 11)
> +/* Cache QoS Monitoring */
> +#define CPUID_7_0_EBX_PQM               (1U << 12)
>   /* Memory Protection Extension */
>   #define CPUID_7_0_EBX_MPX               (1U << 14)
> +/* Resource Director Technology Allocation */
> +#define CPUID_7_0_EBX_RDT_A             (1U << 15)
>   /* AVX-512 Foundation */
>   #define CPUID_7_0_EBX_AVX512F           (1U << 16)
>   /* AVX-512 Doubleword & Quadword Instruction */
> @@ -846,10 +852,16 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
>   #define CPUID_7_0_ECX_AVX512VNNI        (1U << 11)
>   /* Support for VPOPCNT[B,W] and VPSHUFBITQMB */
>   #define CPUID_7_0_ECX_AVX512BITALG      (1U << 12)
> +/* Intel Total Memory Encryption */
> +#define CPUID_7_0_ECX_TME               (1U << 13)
>   /* POPCNT for vectors of DW/QW */
>   #define CPUID_7_0_ECX_AVX512_VPOPCNTDQ  (1U << 14)
> +/* Placeholder for bit 15 */
> +#define CPUID_7_0_ECX_FZM               (1U << 15)
>   /* 5-level Page Tables */
>   #define CPUID_7_0_ECX_LA57              (1U << 16)
> +/* MAWAU for MPX */
> +#define CPUID_7_0_ECX_MAWAU             (31U << 17)
>   /* Read Processor ID */
>   #define CPUID_7_0_ECX_RDPID             (1U << 22)
>   /* Bus Lock Debug Exception */
> @@ -860,6 +872,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
>   #define CPUID_7_0_ECX_MOVDIRI           (1U << 27)
>   /* Move 64 Bytes as Direct Store Instruction */
>   #define CPUID_7_0_ECX_MOVDIR64B         (1U << 28)
> +/* ENQCMD and ENQCMDS instructions */
> +#define CPUID_7_0_ECX_ENQCMD            (1U << 29)
>   /* Support SGX Launch Control */
>   #define CPUID_7_0_ECX_SGX_LC            (1U << 30)
>   /* Protection Keys for Supervisor-mode Pages */
> @@ -877,6 +891,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
>   #define CPUID_7_0_EDX_SERIALIZE         (1U << 14)
>   /* TSX Suspend Load Address Tracking instruction */
>   #define CPUID_7_0_EDX_TSX_LDTRK         (1U << 16)
> +/* PCONFIG instruction */
> +#define CPUID_7_0_EDX_PCONFIG           (1U << 18)
>   /* Architectural LBRs */
>   #define CPUID_7_0_EDX_ARCH_LBR          (1U << 19)
>   /* AVX512_FP16 instruction */
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 9e30fa9f4eb5..9930902ae890 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -492,6 +492,10 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
>           ret |= 1U << KVM_HINTS_REALTIME;
>       }
>   
> +    if (is_tdx_vm()) {
> +        tdx_get_supported_cpuid(function, index, reg, &ret);
> +    }
> +
>       return ret;
>   }
>   
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index fdd6bec58758..e3e9a424512e 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -14,11 +14,134 @@
>   #include "qemu/osdep.h"
>   #include "qapi/error.h"
>   #include "qom/object_interfaces.h"
> +#include "standard-headers/asm-x86/kvm_para.h"
>   #include "sysemu/kvm.h"
>   
>   #include "hw/i386/x86.h"
>   #include "kvm_i386.h"
>   #include "tdx.h"
> +#include "../cpu-internal.h"
> +
> +#define TDX_SUPPORTED_KVM_FEATURES  ((1U << KVM_FEATURE_NOP_IO_DELAY) | \
> +                                     (1U << KVM_FEATURE_PV_UNHALT) | \
> +                                     (1U << KVM_FEATURE_PV_TLB_FLUSH) | \
> +                                     (1U << KVM_FEATURE_PV_SEND_IPI) | \
> +                                     (1U << KVM_FEATURE_POLL_CONTROL) | \
> +                                     (1U << KVM_FEATURE_PV_SCHED_YIELD) | \
> +                                     (1U << KVM_FEATURE_MSI_EXT_DEST_ID))
> +
> +typedef struct KvmTdxCpuidLookup {
> +    uint32_t tdx_fixed0;
> +    uint32_t tdx_fixed1;
> +
> +    /*
> +     * The CPUID bits that are configurable from the view of TDX module
> +     * but require VMM emulation if configured to enabled by VMM.
> +     *
> +     * For those bits, they cannot be enabled actually if VMM (KVM/QEMU) cannot
> +     * virtualize them.
> +     */
> +    uint32_t vmm_fixup;
> +
> +    bool inducing_ve;
> +    /*
> +     * The maximum supported feature set for given inducing-#VE leaf.
> +     * It's valid only when .inducing_ve is true.
> +     */
> +    uint32_t supported_on_ve;
> +} KvmTdxCpuidLookup;
> +
> + /*
> +  * QEMU maintained TDX CPUID lookup tables, which reflects how CPUIDs are
> +  * virtualized for guest TDs based on "CPUID virtualization" of TDX spec.
> +  *
> +  * Note:
> +  *
> +  * This table will be updated runtime by tdx_caps reported by platform.
> +  *
> +  */
> +static KvmTdxCpuidLookup tdx_cpuid_lookup[FEATURE_WORDS] = {
> +    [FEAT_1_EDX] = {
> +        .tdx_fixed0 =
> +            BIT(10) | BIT(20) | CPUID_IA64,
> +        .tdx_fixed1 =
> +            CPUID_MSR | CPUID_PAE | CPUID_MCE | CPUID_APIC |
> +            CPUID_MTRR | CPUID_MCA | CPUID_CLFLUSH | CPUID_DTS,
> +        .vmm_fixup =
> +            CPUID_ACPI | CPUID_PBE,
> +    },
> +    [FEAT_1_ECX] = {
> +        .tdx_fixed0 =
> +            CPUID_EXT_MONITOR | CPUID_EXT_VMX | CPUID_EXT_SMX |
> +            BIT(16),
> +        .tdx_fixed1 =
> +            CPUID_EXT_CX16 | CPUID_EXT_PDCM | CPUID_EXT_X2APIC |
> +            CPUID_EXT_AES | CPUID_EXT_XSAVE | CPUID_EXT_RDRAND |
> +            CPUID_EXT_HYPERVISOR,
> +        .vmm_fixup =
> +            CPUID_EXT_EST | CPUID_EXT_TM2 | CPUID_EXT_XTPR | CPUID_EXT_DCA,
> +    },
> +    [FEAT_8000_0001_EDX] = {

...

> +        .tdx_fixed1 =
> +            CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB | CPUID_EXT2_RDTSCP |
> +            CPUID_EXT2_LM,
> +    },
> +    [FEAT_7_0_EBX] = {
> +        .tdx_fixed0 =
> +            CPUID_7_0_EBX_TSC_ADJUST | CPUID_7_0_EBX_SGX | CPUID_7_0_EBX_MPX,
> +        .tdx_fixed1 =
> +            CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_RTM |
> +            CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_SMAP |
> +            CPUID_7_0_EBX_CLFLUSHOPT | CPUID_7_0_EBX_CLWB |
> +            CPUID_7_0_EBX_SHA_NI,
> +        .vmm_fixup =
> +            CPUID_7_0_EBX_PQM | CPUID_7_0_EBX_RDT_A,
> +    },
> +    [FEAT_7_0_ECX] = {
> +        .tdx_fixed0 =
> +            CPUID_7_0_ECX_FZM | CPUID_7_0_ECX_MAWAU |
> +            CPUID_7_0_ECX_ENQCMD | CPUID_7_0_ECX_SGX_LC,
> +        .tdx_fixed1 =
> +            CPUID_7_0_ECX_MOVDIR64B | CPUID_7_0_ECX_BUS_LOCK_DETECT,
> +        .vmm_fixup =
> +            CPUID_7_0_ECX_TME,
> +    },
> +    [FEAT_7_0_EDX] = {
> +        .tdx_fixed1 =
> +            CPUID_7_0_EDX_SPEC_CTRL | CPUID_7_0_EDX_ARCH_CAPABILITIES |
> +            CPUID_7_0_EDX_CORE_CAPABILITY | CPUID_7_0_EDX_SPEC_CTRL_SSBD,
> +        .vmm_fixup =
> +            CPUID_7_0_EDX_PCONFIG,
> +    },
> +    [FEAT_8000_0001_EDX] = {
> +        .tdx_fixed1 =
> +            CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB |
> +            CPUID_EXT2_RDTSCP | CPUID_EXT2_LM,
> +    },

duplicated FEAT_8000_0001_EDX item.

> +    [FEAT_8000_0008_EBX] = {
> +        .tdx_fixed0 =
> +            ~CPUID_8000_0008_EBX_WBNOINVD,
> +        .tdx_fixed1 =
> +            CPUID_8000_0008_EBX_WBNOINVD,
> +    },
> +    [FEAT_XSAVE] = {
> +        .tdx_fixed1 =
> +            CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XSAVEC |
> +            CPUID_XSAVE_XSAVES,
> +    },
> +    [FEAT_6_EAX] = {
> +        .inducing_ve = true,
> +        .supported_on_ve = -1U,
> +    },
> +    [FEAT_8000_0007_EDX] = {
> +        .inducing_ve = true,
> +        .supported_on_ve = -1U,
> +    },
> +    [FEAT_KVM] = {
> +        .inducing_ve = true,
> +        .supported_on_ve = TDX_SUPPORTED_KVM_FEATURES,
> +    },
> +};
>   
>   static TdxGuest *tdx_guest;
>   
> @@ -30,6 +153,138 @@ bool is_tdx_vm(void)
>       return !!tdx_guest;
>   }
>   
> +static inline uint32_t host_cpuid_reg(uint32_t function,
> +                                      uint32_t index, int reg)
> +{
> +    uint32_t eax, ebx, ecx, edx;
> +    uint32_t ret = 0;
> +
> +    host_cpuid(function, index, &eax, &ebx, &ecx, &edx);
> +
> +    switch (reg) {
> +    case R_EAX:
> +        ret |= eax;
> +        break;
> +    case R_EBX:
> +        ret |= ebx;
> +        break;
> +    case R_ECX:
> +        ret |= ecx;
> +        break;
> +    case R_EDX:
> +        ret |= edx;
> +        break;
> +    }
> +    return ret;
> +}
> +
> +static inline uint32_t tdx_cap_cpuid_config(uint32_t function,
> +                                            uint32_t index, int reg)
> +{
> +    struct kvm_tdx_cpuid_config *cpuid_c;
> +    int ret = 0;
> +    int i;
> +
> +    if (tdx_caps->nr_cpuid_configs <= 0) {
> +        return ret;
> +    }
> +
> +    for (i = 0; i < tdx_caps->nr_cpuid_configs; i++) {
> +        cpuid_c = &tdx_caps->cpuid_configs[i];
> +        /* 0xffffffff in sub_leaf means the leaf doesn't require a sub-leaf */
> +        if (cpuid_c->leaf == function &&
> +            (cpuid_c->sub_leaf == 0xffffffff || cpuid_c->sub_leaf == index)) {
> +            switch (reg) {
> +            case R_EAX:
> +                ret = cpuid_c->eax;
> +                break;
> +            case R_EBX:
> +                ret = cpuid_c->ebx;
> +                break;
> +            case R_ECX:
> +                ret = cpuid_c->ecx;
> +                break;
> +            case R_EDX:
> +                ret = cpuid_c->edx;
> +                break;
> +            default:
> +                return 0;
> +            }
> +        }
> +    }
> +    return ret;
> +}
> +
> +static FeatureWord get_cpuid_featureword_index(uint32_t function,
> +                                               uint32_t index, int reg)
> +{
> +    FeatureWord w;
> +
> +    for (w = 0; w < FEATURE_WORDS; w++) {
> +        FeatureWordInfo *f = &feature_word_info[w];
> +
> +        if (f->type == MSR_FEATURE_WORD || f->cpuid.eax != function ||
> +            f->cpuid.reg != reg ||
> +            (f->cpuid.needs_ecx && f->cpuid.ecx != index)) {
> +            continue;
> +        }
> +
> +        return w;
> +    }
> +
> +    return w;
> +}
> +
> +/*
> + * TDX supported CPUID varies from what KVM reports. Adjust the result by
> + * applying the TDX restrictions.
> + */
> +void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
> +                             uint32_t *ret)
> +{
> +    uint32_t vmm_cap = *ret;
> +    FeatureWord w;
> +
> +    /* Only handle feature leaves that are recognized by feature_word_info[] */
> +    w = get_cpuid_featureword_index(function, index, reg);
> +    if (w == FEATURE_WORDS) {
> +        return;
> +    }
> +
> +    if (tdx_cpuid_lookup[w].inducing_ve) {
> +        *ret &= tdx_cpuid_lookup[w].supported_on_ve;
> +        return;
> +    }
> +
> +    /*
> +     * Include all the native bits as first step. It covers types
> +     * - As configured (if native)
> +     * - Native
> +     * - XFAM related and Attributes related
> +     *
> +     * It also has side effect to enable unsupported bits, e.g., the
> +     * bits of "fixed0" type while present natively. It's safe because
> +     * the unsupported bits will be masked off by .fixed0 later.
> +     */
> +    *ret |= host_cpuid_reg(function, index, reg);
> +
> +    /* Adjust according to "fixed" type in tdx_cpuid_lookup. */
> +    *ret |= tdx_cpuid_lookup[w].tdx_fixed1;
> +    *ret &= ~tdx_cpuid_lookup[w].tdx_fixed0;
> +
> +    /*
> +     * Configurable cpuids are supported unconditionally. It's mainly to
> +     * include those configurable regardless of native existence.
> +     */
> +    *ret |= tdx_cap_cpuid_config(function, index, reg);
> +
> +    /*
> +     * clear the configurable bits that require VMM emulation and VMM doesn't
> +     * report the support.
> +     */
> +    *ret &= ~(~vmm_cap & tdx_cpuid_lookup[w].vmm_fixup);
> +}
> +
>   enum tdx_ioctl_level{
>       TDX_PLATFORM_IOCTL,
>       TDX_VM_IOCTL,
> diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
> index 4036ca2f3f99..06599b65b827 100644
> --- a/target/i386/kvm/tdx.h
> +++ b/target/i386/kvm/tdx.h
> @@ -27,5 +27,7 @@ bool is_tdx_vm(void);
>   #endif /* CONFIG_TDX */
>   
>   int tdx_kvm_init(MachineState *ms, Error **errp);
> +void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
> +                             uint32_t *ret);
>   
>   #endif /* QEMU_I386_TDX_H */
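
For readers skimming the quoted tdx_cap_cpuid_config() helper, the wildcard
sub-leaf match can be tried in isolation. The following is only a minimal
sketch: struct cpuid_config, enum cpuid_reg, SUB_LEAF_ANY and
lookup_configurable() are hypothetical stand-ins, not the kernel or QEMU
definitions. Unlike the quoted helper, which keeps the last matching entry,
this sketch returns on the first match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUB_LEAF_ANY 0xffffffffu   /* wildcard: the leaf takes no sub-leaf */

/* Hypothetical stand-in for struct kvm_tdx_cpuid_config. */
struct cpuid_config {
    uint32_t leaf;
    uint32_t sub_leaf;
    uint32_t eax, ebx, ecx, edx;
};

enum cpuid_reg { REG_EAX, REG_EBX, REG_ECX, REG_EDX };

/*
 * Return the configurable bits for (leaf, sub_leaf, reg), or 0 when the
 * table has no matching entry. An entry whose sub_leaf is SUB_LEAF_ANY
 * matches every requested sub-leaf.
 */
static uint32_t lookup_configurable(const struct cpuid_config *tbl, size_t n,
                                    uint32_t leaf, uint32_t sub_leaf,
                                    enum cpuid_reg reg)
{
    assert(tbl != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const struct cpuid_config *c = &tbl[i];

        if (c->leaf != leaf ||
            (c->sub_leaf != SUB_LEAF_ANY && c->sub_leaf != sub_leaf)) {
            continue;
        }
        switch (reg) {
        case REG_EAX: return c->eax;
        case REG_EBX: return c->ebx;
        case REG_ECX: return c->ecx;
        case REG_EDX: return c->edx;
        default:      return 0;   /* unknown register selector */
        }
    }
    return 0;
}
```

Returning uint32_t throughout avoids mixing a signed accumulator with
unsigned register values.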

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 00/40] TDX QEMU support
  2022-08-02 10:55   ` Xiaoyao Li
@ 2022-08-03 17:44     ` Daniel P. Berrangé
  2022-08-05  0:16       ` Xiaoyao Li
  0 siblings, 1 reply; 80+ messages in thread
From: Daniel P. Berrangé @ 2022-08-03 17:44 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 06:55:48PM +0800, Xiaoyao Li wrote:
> On 8/2/2022 5:49 PM, Daniel P. Berrangé wrote:
> > On Tue, Aug 02, 2022 at 03:47:10PM +0800, Xiaoyao Li wrote:
> 
> > > - CPU model
> > > 
> > >    We cannot create a TD with arbitrary CPU model like what for non-TDX VMs,
> > >    because only a subset of features can be configured for TD.
> > >    - It's recommended to use '-cpu host' to create TD;
> > >    - '+feature/-feature' might not work as expected;
> > > 
> > >    future work: To introduce specific CPU model for TDs and enhance +/-features
> > >                 for TDs.
> > 
> > Which features are incompatible with TDX ?
> 
> TDX enforces some features fixed to 1 (e.g., CPUID_EXT_X2APIC,
> CPUID_EXT_HYPERVISOR) and some fixed to 0 (e.g., CPUID_EXT_VMX).
> 
> Details can be found in patch 8 and TDX spec chapter "CPUID virtualization"
> 
> > Presumably you have such a list, so that KVM can block them when
> > using '-cpu host' ?
> 
> No, KVM doesn't do this. KVM reports no error, but what the TD OS sees
> from CPUID might differ from what the user specifies in QEMU.
> 
> > If so, we should be able to sanity check the
> > use of these features in QEMU for the named CPU models / feature
> > selection too.
> 
> This series enhances get_supported_cpuid() for TDX. If a named CPU model is
> used to boot a TDX guest, it will likely get warnings like "xxx feature is
> not available".

If the ',check=on' arg is given to -cpu, does it ensure that the
guest fails to start up with an incompatible feature set? That's
really the key thing to protect the user from mistakes.


> We have another series to enhance "-feature" handling for TDX, so that it
> warns if a fixed1 feature is requested to be removed. Besides, we will
> introduce a specific named CPU model for TDX, e.g., TDX-SapphireRapids,
> which contains the maximum feature set a TDX guest can have on an SPR host.

I don't know if this is the right approach or not, but we should at least
consider making use of CPU versioning here.  ie have a single "SapphireRapids"
alias, which resolves to a suitable specific CPU version depending on whether
TDX is used or not.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 08/40] i386/tdx: Adjust the supported CPUID based on TDX restrictions
  2022-08-03  7:33   ` Chenyi Qiang
@ 2022-08-04  0:55     ` Xiaoyao Li
  2022-08-26  4:00     ` Xiaoyao Li
  1 sibling, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-04  0:55 UTC (permalink / raw)
  To: Chenyi Qiang, Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 8/3/2022 3:33 PM, Chenyi Qiang wrote:
> 
> 
> On 8/2/2022 3:47 PM, Xiaoyao Li wrote:
>> According to Chapter "CPUID Virtualization" in TDX module spec, CPUID
>> bits of TD can be classified into 6 types:
>>
>> ------------------------------------------------------------------------
>> 1 | As configured | configurable by VMM, independent of native value;
>> ------------------------------------------------------------------------
>> 2 | As configured | configurable by VMM if the bit is supported natively
>>      (if native)   | Otherwise it equals as native(0).
>> ------------------------------------------------------------------------
>> 3 | Fixed         | fixed to 0/1
>> ------------------------------------------------------------------------
>> 4 | Native        | reflect the native value
>> ------------------------------------------------------------------------
>> 5 | Calculated    | calculated by TDX module.
>> ------------------------------------------------------------------------
>> 6 | Inducing #VE  | get #VE exception
>> ------------------------------------------------------------------------
>>
>> Note:
>> 1. All the configurable XFAM related features and TD attributes related
>>     features fall into type #2. And fixed0/1 bits of XFAM and TD
>>     attributes fall into type #3.
>>
>> 2. For CPUID leaves not listed in the "CPUID virtualization Overview" table
>>     in the TDX module spec, the TDX module injects #VE into TDs when they
>>     are queried. In this case, TDs can request CPUID emulation from the VMM
>>     via TDVMCALL and the values are fully controlled by the VMM.
>>
>> Because the TDX module has its own virtualization policy on CPUID bits,
>> what is reported via KVM_GET_SUPPORTED_CPUID diverges from the supported
>> CPUID bits for TDs. In order to keep a consistent CPUID configuration
>> between VMM and TDs, adjust supported CPUID for TDs based on TDX
>> restrictions.
>>
>> Currently only focus on the CPUID leaves recognized by QEMU's
>> feature_word_info[] that are indexed by a FeatureWord.
>>
>> Introduce a TDX CPUID lookup table, which maintains 1 entry for each
>> FeatureWord. Each entry has below fields:
>>
>>   - tdx_fixed0/1: The bits that are fixed as 0/1;
>>
>>   - vmm_fixup:   The bits that are configurable from the view of the TDX
>>                  module, but that require VMM emulation when they are
>>                  configured as enabled. They are not supported if the VMM
>>                  doesn't report them as supported, so they need to be
>>                  fixed up by checking whether the VMM supports them.
>>
>>   - inducing_ve: TD gets #VE when querying this CPUID leaf. The result is
>>                  totally configurable by VMM.
>>
>>   - supported_on_ve: It's valid only when @inducing_ve is true. It
>>                  represents the maximum feature set that can be emulated
>>                  for TDs.
>>
>> By applying TDX CPUID lookup table and TDX capabilities reported from
>> TDX module, the supported CPUID for TDs can be obtained from following
>> steps:
>>
>> - get the base of VMM supported feature set;
>>
>> - if the leaf is not a FeatureWord just return VMM's value without
>>    modification;
>>
>> - if the leaf is an inducing_ve type, applying supported_on_ve mask and
>>    return;
>>
>> - include all native bits, it covers type #2, #4, and parts of type #1.
>>    (it also includes some unsupported bits. The following step will
>>     correct it.)
>>
>> - apply fixed0/1 to it (it covers #3, and rectifies the previous step);
>>
>> - add configurable bits (it covers the other part of type #1);
>>
>> - fix the ones in vmm_fixup;
>>
>> - filter the one has valid .supported field;
> 
> What does .supported field filter mean here?
> 
>>
>> (Calculated type is ignored since it's determined at runtime).
>>
>> Co-developed-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>>   target/i386/cpu.h     |  16 +++
>>   target/i386/kvm/kvm.c |   4 +
>>   target/i386/kvm/tdx.c | 255 ++++++++++++++++++++++++++++++++++++++++++
>>   target/i386/kvm/tdx.h |   2 +
>>   4 files changed, 277 insertions(+)
>>
>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>> index 82004b65b944..cc9da9fc4318 100644
>> --- a/target/i386/cpu.h
>> +++ b/target/i386/cpu.h
>> @@ -771,6 +771,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
>>   /* Support RDFSBASE/RDGSBASE/WRFSBASE/WRGSBASE */
>>   #define CPUID_7_0_EBX_FSGSBASE          (1U << 0)
>> +/* Support for TSC adjustment MSR 0x3B */
>> +#define CPUID_7_0_EBX_TSC_ADJUST        (1U << 1)
>>   /* Support SGX */
>>   #define CPUID_7_0_EBX_SGX               (1U << 2)
>>   /* 1st Group of Advanced Bit Manipulation Extensions */
>> @@ -789,8 +791,12 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
>>   #define CPUID_7_0_EBX_INVPCID           (1U << 10)
>>   /* Restricted Transactional Memory */
>>   #define CPUID_7_0_EBX_RTM               (1U << 11)
>> +/* Cache QoS Monitoring */
>> +#define CPUID_7_0_EBX_PQM               (1U << 12)
>>   /* Memory Protection Extension */
>>   #define CPUID_7_0_EBX_MPX               (1U << 14)
>> +/* Resource Director Technology Allocation */
>> +#define CPUID_7_0_EBX_RDT_A             (1U << 15)
>>   /* AVX-512 Foundation */
>>   #define CPUID_7_0_EBX_AVX512F           (1U << 16)
>>   /* AVX-512 Doubleword & Quadword Instruction */
>> @@ -846,10 +852,16 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
>>   #define CPUID_7_0_ECX_AVX512VNNI        (1U << 11)
>>   /* Support for VPOPCNT[B,W] and VPSHUFBITQMB */
>>   #define CPUID_7_0_ECX_AVX512BITALG      (1U << 12)
>> +/* Intel Total Memory Encryption */
>> +#define CPUID_7_0_ECX_TME               (1U << 13)
>>   /* POPCNT for vectors of DW/QW */
>>   #define CPUID_7_0_ECX_AVX512_VPOPCNTDQ  (1U << 14)
>> +/* Placeholder for bit 15 */
>> +#define CPUID_7_0_ECX_FZM               (1U << 15)
>>   /* 5-level Page Tables */
>>   #define CPUID_7_0_ECX_LA57              (1U << 16)
>> +/* MAWAU for MPX */
>> +#define CPUID_7_0_ECX_MAWAU             (31U << 17)
>>   /* Read Processor ID */
>>   #define CPUID_7_0_ECX_RDPID             (1U << 22)
>>   /* Bus Lock Debug Exception */
>> @@ -860,6 +872,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
>>   #define CPUID_7_0_ECX_MOVDIRI           (1U << 27)
>>   /* Move 64 Bytes as Direct Store Instruction */
>>   #define CPUID_7_0_ECX_MOVDIR64B         (1U << 28)
>> +/* ENQCMD and ENQCMDS instructions */
>> +#define CPUID_7_0_ECX_ENQCMD            (1U << 29)
>>   /* Support SGX Launch Control */
>>   #define CPUID_7_0_ECX_SGX_LC            (1U << 30)
>>   /* Protection Keys for Supervisor-mode Pages */
>> @@ -877,6 +891,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
>>   #define CPUID_7_0_EDX_SERIALIZE         (1U << 14)
>>   /* TSX Suspend Load Address Tracking instruction */
>>   #define CPUID_7_0_EDX_TSX_LDTRK         (1U << 16)
>> +/* PCONFIG instruction */
>> +#define CPUID_7_0_EDX_PCONFIG           (1U << 18)
>>   /* Architectural LBRs */
>>   #define CPUID_7_0_EDX_ARCH_LBR          (1U << 19)
>>   /* AVX512_FP16 instruction */
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index 9e30fa9f4eb5..9930902ae890 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -492,6 +492,10 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
>>           ret |= 1U << KVM_HINTS_REALTIME;
>>       }
>> +    if (is_tdx_vm()) {
>> +        tdx_get_supported_cpuid(function, index, reg, &ret);
>> +    }
>> +
>>       return ret;
>>   }
>> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
>> index fdd6bec58758..e3e9a424512e 100644
>> --- a/target/i386/kvm/tdx.c
>> +++ b/target/i386/kvm/tdx.c
>> @@ -14,11 +14,134 @@
>>   #include "qemu/osdep.h"
>>   #include "qapi/error.h"
>>   #include "qom/object_interfaces.h"
>> +#include "standard-headers/asm-x86/kvm_para.h"
>>   #include "sysemu/kvm.h"
>>   #include "hw/i386/x86.h"
>>   #include "kvm_i386.h"
>>   #include "tdx.h"
>> +#include "../cpu-internal.h"
>> +
>> +#define TDX_SUPPORTED_KVM_FEATURES  ((1U << KVM_FEATURE_NOP_IO_DELAY) | \
>> +                                     (1U << KVM_FEATURE_PV_UNHALT) | \
>> +                                     (1U << KVM_FEATURE_PV_TLB_FLUSH) | \
>> +                                     (1U << KVM_FEATURE_PV_SEND_IPI) | \
>> +                                     (1U << KVM_FEATURE_POLL_CONTROL) | \
>> +                                     (1U << KVM_FEATURE_PV_SCHED_YIELD) | \
>> +                                     (1U << KVM_FEATURE_MSI_EXT_DEST_ID))
>> +
>> +typedef struct KvmTdxCpuidLookup {
>> +    uint32_t tdx_fixed0;
>> +    uint32_t tdx_fixed1;
>> +
>> +    /*
>> +     * The CPUID bits that are configurable from the view of TDX module
>> +     * but require VMM emulation if configured to enabled by VMM.
>> +     *
>> +     * For those bits, they cannot be enabled actually if VMM (KVM/QEMU) cannot
>> +     * virtualize them.
>> +     */
>> +    uint32_t vmm_fixup;
>> +
>> +    bool inducing_ve;
>> +    /*
>> +     * The maximum supported feature set for given inducing-#VE leaf.
>> +     * It's valid only when .inducing_ve is true.
>> +     */
>> +    uint32_t supported_on_ve;
>> +} KvmTdxCpuidLookup;
>> +
>> + /*
>> +  * QEMU maintained TDX CPUID lookup tables, which reflects how CPUIDs are
>> +  * virtualized for guest TDs based on "CPUID virtualization" of TDX spec.
>> +  *
>> +  * Note:
>> +  *
>> +  * This table will be updated runtime by tdx_caps reported by platform.
>> +  *
>> +  */
>> +static KvmTdxCpuidLookup tdx_cpuid_lookup[FEATURE_WORDS] = {
>> +    [FEAT_1_EDX] = {
>> +        .tdx_fixed0 =
>> +            BIT(10) | BIT(20) | CPUID_IA64,
>> +        .tdx_fixed1 =
>> +            CPUID_MSR | CPUID_PAE | CPUID_MCE | CPUID_APIC |
>> +            CPUID_MTRR | CPUID_MCA | CPUID_CLFLUSH | CPUID_DTS,
>> +        .vmm_fixup =
>> +            CPUID_ACPI | CPUID_PBE,
>> +    },
>> +    [FEAT_1_ECX] = {
>> +        .tdx_fixed0 =
>> +            CPUID_EXT_MONITOR | CPUID_EXT_VMX | CPUID_EXT_SMX |
>> +            BIT(16),
>> +        .tdx_fixed1 =
>> +            CPUID_EXT_CX16 | CPUID_EXT_PDCM | CPUID_EXT_X2APIC |
>> +            CPUID_EXT_AES | CPUID_EXT_XSAVE | CPUID_EXT_RDRAND |
>> +            CPUID_EXT_HYPERVISOR,
>> +        .vmm_fixup =
>> +            CPUID_EXT_EST | CPUID_EXT_TM2 | CPUID_EXT_XTPR | CPUID_EXT_DCA,
>> +    },
>> +    [FEAT_8000_0001_EDX] = {
> 
> ...
> 
>> +        .tdx_fixed1 =
>> +            CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB | CPUID_EXT2_RDTSCP |
>> +            CPUID_EXT2_LM,
>> +    },
>> +    [FEAT_7_0_EBX] = {
>> +        .tdx_fixed0 =
>> +            CPUID_7_0_EBX_TSC_ADJUST | CPUID_7_0_EBX_SGX | CPUID_7_0_EBX_MPX,
>> +        .tdx_fixed1 =
>> +            CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_RTM |
>> +            CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_SMAP |
>> +            CPUID_7_0_EBX_CLFLUSHOPT | CPUID_7_0_EBX_CLWB |
>> +            CPUID_7_0_EBX_SHA_NI,
>> +        .vmm_fixup =
>> +            CPUID_7_0_EBX_PQM | CPUID_7_0_EBX_RDT_A,
>> +    },
>> +    [FEAT_7_0_ECX] = {
>> +        .tdx_fixed0 =
>> +            CPUID_7_0_ECX_FZM | CPUID_7_0_ECX_MAWAU |
>> +            CPUID_7_0_ECX_ENQCMD | CPUID_7_0_ECX_SGX_LC,
>> +        .tdx_fixed1 =
>> +            CPUID_7_0_ECX_MOVDIR64B | CPUID_7_0_ECX_BUS_LOCK_DETECT,
>> +        .vmm_fixup =
>> +            CPUID_7_0_ECX_TME,
>> +    },
>> +    [FEAT_7_0_EDX] = {
>> +        .tdx_fixed1 =
>> +            CPUID_7_0_EDX_SPEC_CTRL | CPUID_7_0_EDX_ARCH_CAPABILITIES |
>> +            CPUID_7_0_EDX_CORE_CAPABILITY | CPUID_7_0_EDX_SPEC_CTRL_SSBD,
>> +        .vmm_fixup =
>> +            CPUID_7_0_EDX_PCONFIG,
>> +    },
>> +    [FEAT_8000_0001_EDX] = {
>> +        .tdx_fixed1 =
>> +            CPUID_EXT2_NX | CPUID_EXT2_PDPE1GB |
>> +            CPUID_EXT2_RDTSCP | CPUID_EXT2_LM,
>> +    },
> 
> duplicated FEAT_8000_0001_EDX item.
> 

fixed.

Thanks,
-Xiaoyao

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 00/40] TDX QEMU support
  2022-08-03 17:44     ` Daniel P. Berrangé
@ 2022-08-05  0:16       ` Xiaoyao Li
  0 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-05  0:16 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 8/4/2022 1:44 AM, Daniel P. Berrangé wrote:
> On Tue, Aug 02, 2022 at 06:55:48PM +0800, Xiaoyao Li wrote:
>> On 8/2/2022 5:49 PM, Daniel P. Berrangé wrote:
>>> On Tue, Aug 02, 2022 at 03:47:10PM +0800, Xiaoyao Li wrote:
>>
>>>> - CPU model
>>>>
>>>>     We cannot create a TD with arbitrary CPU model like what for non-TDX VMs,
>>>>     because only a subset of features can be configured for TD.
>>>>     - It's recommended to use '-cpu host' to create TD;
>>>>     - '+feature/-feature' might not work as expected;
>>>>
>>>>     future work: To introduce specific CPU model for TDs and enhance +/-features
>>>>                  for TDs.
>>>
>>> Which features are incompatible with TDX ?
>>
>> TDX enforces some features fixed to 1 (e.g., CPUID_EXT_X2APIC,
>> CPUID_EXT_HYPERVISOR) and some fixed to 0 (e.g., CPUID_EXT_VMX).
>>
>> Details can be found in patch 8 and TDX spec chapter "CPUID virtualization"
>>
>>> Presumably you have such a list, so that KVM can block them when
>>> using '-cpu host' ?
>>
>> No, KVM doesn't do this. KVM reports no error, but what the TD OS sees
>> from CPUID might differ from what the user specifies in QEMU.
>>
>>> If so, we should be able to sanity check the
>>> use of these features in QEMU for the named CPU models / feature
>>> selection too.
>>
>> This series enhances get_supported_cpuid() for TDX. If a named CPU model is
>> used to boot a TDX guest, it will likely get warnings like "xxx feature is
>> not available".
> 
> If the ',check=on' arg is given to -cpu, does it ensure that the
> guest fails to start up with an incompatible feature set? That's
> really the key thing to protect the user from mistakes.

"check=on" won't stop startup with an incompatible feature set, but
"enforce=on" will. Yes, this series can ensure it with "enforce=on".

> 
>> We have another series to enhance "-feature" handling for TDX, so that it
>> warns if a fixed1 feature is requested to be removed. Besides, we will
>> introduce a specific named CPU model for TDX, e.g., TDX-SapphireRapids,
>> which contains the maximum feature set a TDX guest can have on an SPR host.
> 
> I don't know if this is the right approach or not, but we should at least
> consider making use of CPU versioning here.  ie have a single "SapphireRapids"
> alias, which resolves to a suitable specific CPU version depending on whether
> TDX is used or not.

A new version of a CPU model inherits from the previous version. This fits
well with CPU model fixups, where features need to be removed from or added
to an existing CPU model to make it work well with the latest kernel, and a
new version is created.

However, I think it is less suitable to define a TDX variant as a versioned
CPU model. For example, if we have SPR-V(x), then we need to define
SPR-V(x+1) and alias it as SPR-TDX. For SPR-V(x+1), we need to add and
remove several features. In the future, we may need an SPR-V(x+2) to fix
up the normal SPR CPU model SPR-V(x). All the changes in V(x+1)/SPR-TDX
would have to be reverted first.

Anyway, we can discuss it in the future when we post the series of TDX 
CPU model. We plan to do that after this basic series gets merged. :)

> With regards,
> Daniel


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 06/40] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
  2022-08-02  7:47 ` [PATCH v1 06/40] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES Xiaoyao Li
@ 2022-08-25 10:12   ` Gerd Hoffmann
  2022-08-25 15:35     ` Xiaoyao Li
  0 siblings, 1 reply; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-25 10:12 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

  Hi,

> +        r = tdx_platform_ioctl(KVM_TDX_CAPABILITIES, 0, caps);
> +        if (r == -E2BIG) {
> +            g_free(caps);
> +            nr_cpuid_configs *= 2;
> +            if (nr_cpuid_configs > KVM_MAX_CPUID_ENTRIES) {
> +                error_report("KVM TDX seems broken");

Maybe, but IMHO this should still report what exactly the problem is
(number of cpuid entries exceeds limit).

take care,
  Gerd
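
A rough sketch of what a more specific error could look like in a
grow-and-retry loop follows. Everything here is hypothetical: the
caps_query_fn callback stands in for the ioctl, MAX_CPUID_ENTRIES is an
assumed placeholder for KVM_MAX_CPUID_ENTRIES, and the two demo_* backends
exist only so that the loop can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define MAX_CPUID_ENTRIES 256   /* assumed placeholder for the real limit */

/* Hypothetical query: 0 on success, -E2BIG if `capacity` is too small. */
typedef int (*caps_query_fn)(unsigned int capacity);

/* Demo backends, for illustration only. */
static int demo_needs_24(unsigned int capacity)
{
    return capacity >= 24 ? 0 : -E2BIG;
}

static int demo_never_fits(unsigned int capacity)
{
    (void)capacity;
    return -E2BIG;
}

/*
 * Double the buffer capacity until the query stops returning -E2BIG.
 * On giving up, say exactly which limit was exceeded.
 */
static int query_with_growth(caps_query_fn query, unsigned int start,
                             unsigned int *used)
{
    unsigned int n = start;

    assert(query != NULL && used != NULL && start > 0);

    for (;;) {
        int r = query(n);

        if (r != -E2BIG) {
            *used = n;
            return r;
        }
        n *= 2;
        if (n > MAX_CPUID_ENTRIES) {
            fprintf(stderr, "number of CPUID config entries (%u) exceeds "
                    "the limit (%d)\n", n, MAX_CPUID_ENTRIES);
            return -E2BIG;
        }
    }
}
```

The message names both the attempted count and the limit, so a failure can be
told apart from other ways the interface might be broken.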


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 07/40] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object
  2022-08-02  7:47 ` [PATCH v1 07/40] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object Xiaoyao Li
@ 2022-08-25 10:16   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-25 10:16 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:17PM +0800, Xiaoyao Li wrote:
> TDX VMs will need special handling all around QEMU.
> Introduce is_tdx_vm() helper to query if it's a TDX VM.
> 
> Cache the tdx_guest object so there is no need to cast from ms->cgs every time.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 08/40] i386/tdx: Adjust the supported CPUID based on TDX restrictions
  2022-08-02  7:47 ` [PATCH v1 08/40] i386/tdx: Adjust the supported CPUID based on TDX restrictions Xiaoyao Li
  2022-08-03  7:33   ` Chenyi Qiang
@ 2022-08-25 11:26   ` Gerd Hoffmann
  2022-08-25 12:44     ` Xiaoyao Li
  1 sibling, 1 reply; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-25 11:26 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

  Hi,

> between VMM and TDs. Adjust supported CPUID for TDs based on TDX
> restrictions.

Automatic adjustment depending on hardware capabilities isn't going to
fly long-term, you'll run into compatibility problems sooner or later,
for example when different hardware with diverging capabilities (first
vs. second TDX generation) leads to different CPUID capsets in an
otherwise identical configuration.

Verification should happen of course, but I think qemu should just throw
an error in case TDX can't support a given cpu configuration.

(see also Daniel's reply to the cover letter).

take care,
  Gerd
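
To make the "verify and error out" alternative concrete, here is a minimal
sketch of checking one requested feature word against fixed-0/fixed-1 masks
instead of silently adjusting it. validate_against_fixed() is a hypothetical
helper, and the masks are assumed to be disjoint.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Return 0 if `requested` is compatible with the fixed masks, -1 otherwise.
 * Nothing is adjusted; every mismatch is reported instead.
 */
static int validate_against_fixed(uint32_t requested,
                                  uint32_t fixed0, uint32_t fixed1)
{
    uint32_t must_clear = requested & fixed0;   /* set, but forced to 0 */
    uint32_t must_set = ~requested & fixed1;    /* clear, but forced to 1 */

    assert((fixed0 & fixed1) == 0);

    if (must_clear) {
        fprintf(stderr, "error: bits 0x%08" PRIx32
                " are fixed to 0 and cannot be enabled\n", must_clear);
    }
    if (must_set) {
        fprintf(stderr, "error: bits 0x%08" PRIx32
                " are fixed to 1 and cannot be disabled\n", must_set);
    }
    return (must_clear || must_set) ? -1 : 0;
}
```

Because the result depends only on the requested configuration and the masks,
the same command line either starts or fails with an explicit message,
rather than yielding a different guest-visible CPUID per host.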


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 13/40] KVM: Introduce kvm_arch_pre_create_vcpu()
  2022-08-02  7:47 ` [PATCH v1 13/40] KVM: Introduce kvm_arch_pre_create_vcpu() Xiaoyao Li
@ 2022-08-25 11:28   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-25 11:28 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:23PM +0800, Xiaoyao Li wrote:
> Introduce kvm_arch_pre_create_vcpu() to perform arch-dependent
> work prior to creating any vcpu. This is for i386 TDX because it needs
> to call TDX_INIT_VM before creating any vcpu.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 14/40] i386/tdx: Initialize TDX before creating TD vcpus
  2022-08-02  7:47 ` [PATCH v1 14/40] i386/tdx: Initialize TDX before creating TD vcpus Xiaoyao Li
@ 2022-08-25 11:29   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-25 11:29 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:24PM +0800, Xiaoyao Li wrote:
> Invoke KVM_TDX_INIT in kvm_arch_pre_create_vcpu(), since KVM_TDX_INIT
> configures global TD state, e.g. the canonical CPUID config, and must
> be executed prior to creating vCPUs.
> 
> Use kvm_x86_arch_cpuid() to setup the CPUID settings for TDX VM and
> tie x86cpu->enable_pmu with TD's attributes.
> 
> Note, this doesn't address the fact that QEMU may change the CPUID
> configuration when creating vCPUs, i.e. punts on refactoring QEMU to
> provide a stable CPUID config prior to kvm_arch_init().

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object
  2022-08-02  7:47 ` [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object Xiaoyao Li
@ 2022-08-25 11:36   ` Gerd Hoffmann
  2022-08-25 14:42     ` Xiaoyao Li
  0 siblings, 1 reply; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-25 11:36 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:25PM +0800, Xiaoyao Li wrote:
> Bit 28, named SEPT_VE_DISABLE, disables EPT violation conversion to #VE
> on guest TD access of PENDING pages when set to 1. Some guest OSes (e.g.,
> Linux TD guests) may require this bit to be set to 1 and otherwise refuse
> to boot.

--verbose please.  That somehow doesn't make sense to me.

A guest is either TDX-aware (which should be the case for linux 5.19+),
or it is not.  My expectation would be that guests which are not
TDX-aware will be disturbed by any #VE exception, not only the ones
triggered by EPT violations.  So I'm wondering what this config bit
actually is useful for ...

take care,
  Gerd


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 16/40] i386/tdx: Wire CPU features up with attributes of TD guest
  2022-08-02  7:47 ` [PATCH v1 16/40] i386/tdx: Wire CPU features up with attributes of TD guest Xiaoyao Li
@ 2022-08-25 11:38   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-25 11:38 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:26PM +0800, Xiaoyao Li wrote:
> For QEMU VMs, PKS is configured via CPUID_7_0_ECX_PKS and PMU is
> configured by x86cpu->enable_pmu. Reuse the existing configuration
> interface for TDX VMs.

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 17/40] i386/tdx: Validate TD attributes
  2022-08-02  7:47 ` [PATCH v1 17/40] i386/tdx: Validate TD attributes Xiaoyao Li
@ 2022-08-25 11:39   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-25 11:39 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:27PM +0800, Xiaoyao Li wrote:
> Validate TD attributes against tdx_caps: fixed-0 bits must be zero and
> fixed-1 bits must be set.
> 
> Besides, sanity check the attribute bits that are not yet supported by
> QEMU, e.g., the debug bit. It will be allowed in the future when debug
> TD support lands in QEMU.

Acked-by: Gerd Hoffmann <kraxel@redhat.com>
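
As an illustration of the fixed-0/fixed-1 rule described in the commit
message above, a minimal sketch follows. It is not the code from the
patch: the function name and both mask parameters are hypothetical, and
the masks are assumed to have been derived beforehand from the TDX
capabilities reported by KVM.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch.
 *
 * must_be_zero: mask of attribute bits that have to be 0 (assumed input)
 * must_be_one:  mask of attribute bits that have to be 1 (assumed input)
 */
static bool td_attributes_valid(uint64_t attrs, uint64_t must_be_zero,
                                uint64_t must_be_one)
{
    /* Any bit set inside the must-be-zero mask is a violation. */
    if (attrs & must_be_zero) {
        return false;
    }
    /* Every bit of the must-be-one mask has to be present. */
    if ((attrs & must_be_one) != must_be_one) {
        return false;
    }
    return true;
}
```

A check for bits that QEMU does not support yet (such as the debug bit)
could be expressed the same way, by folding those bits into the
must-be-zero mask.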


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 18/40] i386/tdx: Implement user specified tsc frequency
  2022-08-02  7:47 ` [PATCH v1 18/40] i386/tdx: Implement user specified tsc frequency Xiaoyao Li
@ 2022-08-25 11:41   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-25 11:41 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:28PM +0800, Xiaoyao Li wrote:
> Reuse "-cpu,tsc-frequency=" to get the TSC frequency requested by the
> user and call the VM-scope VM_SET_TSC_KHZ to set the TSC frequency of the
> TD before KVM_TDX_INIT_VM.
> 
> Besides, sanity check that the TSC frequency is within the legal range
> and has the legal granularity (required by the TDX module).
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>
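
The range and granularity check mentioned in the commit message could be
sketched roughly as below. This is only an illustration, not the patch's
code: the function name is hypothetical, and because the thread does not
state the actual limits, the minimum, maximum and granularity are passed
in as parameters rather than hardcoded.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if tsc_khz lies within
 * [min_khz, max_khz] and is a multiple of granule_khz.
 * The limits are assumed to come from the TDX module requirements.
 */
static bool tsc_khz_valid(uint64_t tsc_khz, uint64_t min_khz,
                          uint64_t max_khz, uint64_t granule_khz)
{
    if (granule_khz == 0) {
        return false;           /* avoid dividing by zero below */
    }
    if (tsc_khz < min_khz || tsc_khz > max_khz) {
        return false;           /* outside the legal range */
    }
    return (tsc_khz % granule_khz) == 0;
}
```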


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 08/40] i386/tdx: Adjust the supported CPUID based on TDX restrictions
  2022-08-25 11:26   ` Gerd Hoffmann
@ 2022-08-25 12:44     ` Xiaoyao Li
  0 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-25 12:44 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 8/25/2022 7:26 PM, Gerd Hoffmann wrote:
>    Hi,
> 
>> between VMM and TDs. Adjust supported CPUID for TDs based on TDX
>> restrictions.
> 
> Automatic adjustment depending on hardware capabilities isn't going to
> fly long-term, you'll run into compatibility problems sooner or later,
> for example when different hardware with diverging capabilities (first
> vs. second TDX generation) leads to different CPUID capsets in an
> otherwise identical configuration.
> 
> Verification should happen of course, but I think qemu should just throw
> an error in case the tdx can't support a given cpu configuration.

I think you misunderstood this patch.

It adjusts the supported feature set of the platform, not the feature
set of a given VM/TD. I.e., the adjusted supported feature set is used
to *verify* the VM settings specified by the user. Of course, if the
user requests an unsupported feature, QEMU will throw an error.

> (see also Daniel's reply to the cover letter).
> 
> take care,
>    Gerd
> 


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object
  2022-08-25 11:36   ` Gerd Hoffmann
@ 2022-08-25 14:42     ` Xiaoyao Li
  2022-08-26  5:57       ` Gerd Hoffmann
  0 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-25 14:42 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 8/25/2022 7:36 PM, Gerd Hoffmann wrote:
> On Tue, Aug 02, 2022 at 03:47:25PM +0800, Xiaoyao Li wrote:
>> Bit 28, named SEPT_VE_DISABLE, disables EPT violation conversion to #VE
>> on guest TD access of PENDING pages when set to 1. Some guest OSes (e.g.,
>> a Linux TD guest) may require this bit to be set to 1 and refuse to boot
>> otherwise.
> 
> --verbose please.  That somehow doesn't make sense to me.
> 
> A guest is either TDX-aware (which should be the case for linux 5.19+),
> or it is not.  My expectation would be that guests which are not
> TDX-aware will be disturbed by any #VE exception, not only the ones
> triggered by EPT violations.  So I'm wondering what this config bit
> actually is useful for ...

This bit, like the other properties of the tdx-guest object, is supposed
to be configured for TDs only. At VM creation, the user needs to decide
whether it's a TD (TDX VM) or a non-TD (ordinary VM) by attaching a
tdx-guest object or not.

If the VM is created as a TD but the guest kernel is not
TDX-capable/-aware, it's doomed to fail booting.

A TD guest kernel has its own reasons to want SEPT_VE on or off.
E.g., a Linux TD guest requires SEPT_VE to be disabled to avoid #VE in
the syscall gap [1]. Frankly speaking, this bit would be better
configured by the TD guest kernel, but the current TDX architecture
leaves it to the VMM to configure.
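
On the VMM side this boils down to setting or clearing a single bit in
the TD attributes. A minimal sketch, for illustration only: the bit
position (28) is from the patch description, while the macro and function
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 28 per the patch description; the macro name is hypothetical. */
#define TD_ATTR_SEPT_VE_DISABLE (UINT64_C(1) << 28)

/*
 * Hypothetical helper: returns attrs with SEPT_VE_DISABLE set when
 * disable is true, and cleared otherwise. Other bits are untouched.
 */
static uint64_t td_attrs_set_sept_ve_disable(uint64_t attrs, bool disable)
{
    if (disable) {
        return attrs | TD_ATTR_SEPT_VE_DISABLE;
    }
    return attrs & ~TD_ATTR_SEPT_VE_DISABLE;
}
```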


[1]: TD pages that are not accepted cause a #VE exception.
It is possible for a hypervisor to take away a guest page
and thus trigger a #VE the next time it is accessed.
Normally the guest would just panic in such a case, but
for that it first needs to execute the #VE handler
reliably.

This can cause problems with the "system call gap": a malicious
hypervisor might trigger a #VE for example on the system call entry
code, and when a user process does a system call it would trigger a
#VE. Since SYSCALL relies on the kernel code to switch to the kernel stack,
this would lead to kernel code running on the ring 3 stack.  This could
be exploited by a combination of malicious host and malicious ring 3
program to attack the kernel.


> take care,
>    Gerd
> 


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 06/40] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
  2022-08-25 10:12   ` Gerd Hoffmann
@ 2022-08-25 15:35     ` Xiaoyao Li
  0 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-25 15:35 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 8/25/2022 6:12 PM, Gerd Hoffmann wrote:
>    Hi,
> 
>> +        r = tdx_platform_ioctl(KVM_TDX_CAPABILITIES, 0, caps);
>> +        if (r == -E2BIG) {
>> +            g_free(caps);
>> +            nr_cpuid_configs *= 2;
>> +            if (nr_cpuid_configs > KVM_MAX_CPUID_ENTRIES) {
>> +                error_report("KVM TDX seems broken");
> 
> Maybe, but IMHO this should still report what exactly the problem is
> (number of cpuid entries exceeds limit).

Will update it to

	error_report("KVM TDX seems broken: number of CPUID entries in "
	             "kvm_tdx_capabilities exceeds limit");
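
For context, the grow-and-retry pattern around that error path can be
sketched as below. This is a simplified illustration, not the patch's
code: caps_query_fn, demo_query and query_with_growth are hypothetical
names, the callback stands in for the real KVM_TDX_CAPABILITIES ioctl,
and the buffer is freed immediately, whereas real code would keep it.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical callback standing in for the capabilities query:
 * returns 0 on success, -E2BIG if nr entries are too few.
 */
typedef int (*caps_query_fn)(void *buf, unsigned int nr);

/* Stand-in used for demonstration: pretends 24 entries are needed. */
static int demo_query(void *buf, unsigned int nr)
{
    (void)buf;
    return nr >= 24 ? 0 : -E2BIG;
}

/*
 * Double the entry count on -E2BIG until the query succeeds or the
 * count exceeds limit. Returns the count that worked, or 0 on failure.
 */
static unsigned int query_with_growth(caps_query_fn query, unsigned int start,
                                      unsigned int limit, size_t entry_size)
{
    unsigned int nr;

    for (nr = start; nr > 0 && nr <= limit; nr *= 2) {
        void *buf = calloc(nr, entry_size);
        int r;

        if (!buf) {
            return 0;
        }
        r = query(buf, nr);
        free(buf);
        if (r == 0) {
            return nr;
        }
        if (r != -E2BIG) {
            fprintf(stderr, "capabilities query failed: %d\n", r);
            return 0;
        }
    }
    /* Report the precise reason, as requested in the review. */
    fprintf(stderr, "number of CPUID entries exceeds limit (%u)\n", limit);
    return 0;
}
```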

> take care,
>    Gerd
> 
> 


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 08/40] i386/tdx: Adjust the supported CPUID based on TDX restrictions
  2022-08-03  7:33   ` Chenyi Qiang
  2022-08-04  0:55     ` Xiaoyao Li
@ 2022-08-26  4:00     ` Xiaoyao Li
  1 sibling, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-08-26  4:00 UTC (permalink / raw)
  To: Chenyi Qiang, Paolo Bonzini, Isaku Yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 8/3/2022 3:33 PM, Chenyi Qiang wrote:
> 
> 
> On 8/2/2022 3:47 PM, Xiaoyao Li wrote:
>> According to Chapter "CPUID Virtualization" in TDX module spec, CPUID
>> bits of TD can be classified into 6 types:
>>
>> ------------------------------------------------------------------------
>> 1 | As configured | configurable by VMM, independent of native value;
>> ------------------------------------------------------------------------
>> 2 | As configured | configurable by VMM if the bit is supported natively
>>      (if native)   | Otherwise it equals as native(0).
>> ------------------------------------------------------------------------
>> 3 | Fixed         | fixed to 0/1
>> ------------------------------------------------------------------------
>> 4 | Native        | reflect the native value
>> ------------------------------------------------------------------------
>> 5 | Calculated    | calculated by TDX module.
>> ------------------------------------------------------------------------
>> 6 | Inducing #VE  | get #VE exception
>> ------------------------------------------------------------------------
>>
>> Note:
>> 1. All the configurable XFAM related features and TD attributes related
>>     features fall into type #2. And fixed0/1 bits of XFAM and TD
>>     attributes fall into type #3.
>>
>> 2. For CPUID leaves not listed in the "CPUID virtualization Overview"
>>     table in the TDX module spec, the TDX module injects #VE into TDs
>>     when they are queried. In this case, TDs can request CPUID emulation
>>     from the VMM via TDVMCALL and the values are fully controlled by the
>>     VMM.
>>
>> Because the TDX module has its own virtualization policy on CPUID bits,
>> what is reported via KVM_GET_SUPPORTED_CPUID diverges from the supported
>> CPUID bits for TDs. In order to keep a consistent CPUID configuration
>> between VMM and TDs, adjust the supported CPUID for TDs based on TDX
>> restrictions.
>>
>> Currently only focus on the CPUID leaves recognized by QEMU's
>> feature_word_info[] that are indexed by a FeatureWord.
>>
>> Introduce a TDX CPUID lookup table, which maintains 1 entry for each
>> FeatureWord. Each entry has below fields:
>>
>>   - tdx_fixed0/1: The bits that are fixed as 0/1;
>>
>>   - vmm_fixup:   The bits that are configurable from the view of the TDX
>>                  module, but that require emulation by the VMM when they
>>                  are configured as enabled. They are not supported if the
>>                  VMM doesn't report them as supported, so they need to be
>>                  fixed up by checking whether the VMM supports them.
>>
>>   - inducing_ve: TD gets #VE when querying this CPUID leaf. The result is
>>                  totally configurable by VMM.
>>
>>   - supported_on_ve: It's valid only when @inducing_ve is true. It
>>                  represents the maximum feature set that can be emulated
>>                  for TDs.
>>
>> By applying TDX CPUID lookup table and TDX capabilities reported from
>> TDX module, the supported CPUID for TDs can be obtained from following
>> steps:
>>
>> - get the base of VMM supported feature set;
>>
>> - if the leaf is not a FeatureWord just return VMM's value without
>>    modification;
>>
>> - if the leaf is an inducing_ve type, apply the supported_on_ve mask and
>>    return;
>>
>> - include all native bits, it covers type #2, #4, and parts of type #1.
>>    (it also includes some unsupported bits. The following step will
>>     correct it.)
>>
>> - apply fixed0/1 to it (it covers #3, and rectifies the previous step);
>>
>> - add configurable bits (it covers the other part of type #1);
>>
>> - fix the ones in vmm_fixup;
>>
>> - filter the one has valid .supported field;
> 
> What does .supported field filter mean here?
> 

Sorry, I missed this comment before.

The statement above is a leftover from internal development. It
actually needs to be removed.
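
For readers following the quoted steps, the mask arithmetic for one
FeatureWord can be sketched as below, with the ".supported" step left
out. This is an illustration under assumptions, not the patch's code:
the struct, its field names and the function name are hypothetical, and
the "not a FeatureWord" case is assumed to be handled by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-FeatureWord inputs, named loosely after the commit text. */
struct tdx_word_masks {
    uint32_t native;          /* bits reported by the host CPU */
    uint32_t fixed0;          /* bits fixed to 0 for TDs */
    uint32_t fixed1;          /* bits fixed to 1 for TDs */
    uint32_t configurable;    /* bits configurable regardless of native */
    uint32_t vmm_fixup;       /* bits that also need VMM emulation */
    uint32_t vmm_supported;   /* bits the VMM reports as supported */
    bool     inducing_ve;     /* leaf triggers #VE in the TD */
    uint32_t supported_on_ve; /* emulatable bits for a #VE leaf */
};

static uint32_t tdx_supported_bits(const struct tdx_word_masks *m)
{
    uint32_t bits;

    /* #VE-inducing leaves are fully emulated: just apply the mask. */
    if (m->inducing_ve) {
        return m->vmm_supported & m->supported_on_ve;
    }

    bits = m->native;           /* start from all native bits */
    bits &= ~m->fixed0;         /* drop bits fixed to 0 */
    bits |= m->fixed1;          /* force bits fixed to 1 */
    bits |= m->configurable;    /* add freely configurable bits */

    /* Drop fixup bits that the VMM cannot emulate. */
    bits &= ~(m->vmm_fixup & ~m->vmm_supported);
    return bits;
}
```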

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object
  2022-08-25 14:42     ` Xiaoyao Li
@ 2022-08-26  5:57       ` Gerd Hoffmann
  2022-09-02  2:33         ` Xiaoyao Li
  0 siblings, 1 reply; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-26  5:57 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

  Hi,
 
> For TD guest kernel, it has its own reason to turn SEPT_VE on or off. E.g.,
> linux TD guest requires SEPT_VE to be disabled to avoid #VE on syscall gap
> [1].

Why is that a problem for a TD guest kernel?  Installing exception
handlers is done quite early in the boot process, certainly before any
userspace code runs.  So I think we should never see a syscall without
a #VE handler being installed.  /me is confused.

Or do you want to tell me linux has no #VE handler?

> Frankly speaking, this bit is better to be configured by TD guest
> kernel, however current TDX architecture makes the design to let VMM
> configure.

Indeed.  Requiring users to know guest kernel capabilities and manually
configuring the vmm accordingly looks fragile to me.

Even better would be to not have that bit in the first place and require
TD guests to properly handle #VE exceptions.

> This can cause problems with the "system call gap": a malicious
> hypervisor might trigger a #VE for example on the system call entry
> code, and when a user process does a system call it would trigger a
> #VE. Since SYSCALL relies on the kernel code to switch to the kernel stack,
> this would lead to kernel code running on the ring 3 stack.

Hmm?  Exceptions switch to kernel context too ...

take care,
  Gerd


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 20/40] i386/tdvf: Introduce function to parse TDVF metadata
  2022-08-02  7:47 ` [PATCH v1 20/40] i386/tdvf: Introduce function to parse TDVF metadata Xiaoyao Li
@ 2022-08-26  9:12   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-26  9:12 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:30PM +0800, Xiaoyao Li wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> TDX VM needs to boot with its specialized firmware, Trusted Domain
> Virtual Firmware (TDVF). QEMU needs to parse TDVF and map it in TD
> guest memory prior to running the TDX VM.
> 
> The TDVF metadata in the TDVF image describes the structure of the
> firmware. QEMU refers to it to set up memory for TDVF. Introduce function
> tdvf_parse_metadata() to parse the metadata from the TDVF image and store
> the info of each TDVF section.
> 
> TDX metadata is located by a TDX metadata offset block, which is a
> GUID-ed structure. The data portion of the GUID structure contains
> only a 4-byte field that is the offset of the TDX metadata from the end
> of the firmware file.
> 
> Select X86_FW_OVMF when TDX is enabled to leverage existing functions
> to parse and search OVMF's GUID-ed structures.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>
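
To make the "offset from the end of the firmware file" wording concrete,
a small sketch follows. It is not the patch's tdvf_parse_metadata(): the
helper names are hypothetical, and the 4-byte field is assumed to be
little-endian, as is usual for x86 firmware structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a 4-byte little-endian value (byte order is an assumption). */
static uint32_t le32_decode(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: given the 4-byte data portion of the offset
 * block and the firmware size, compute where the metadata starts.
 * Returns 0 on success and stores the file position in *pos,
 * or -1 if the offset does not fit inside the file.
 */
static int tdvf_metadata_pos(const uint8_t *offset_field, size_t fw_size,
                             size_t *pos)
{
    uint32_t off = le32_decode(offset_field);

    if (off == 0 || off > fw_size) {
        return -1;
    }
    *pos = fw_size - off;       /* counted back from the end of the file */
    return 0;
}
```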


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 22/40] i386/tdx: Skip BIOS shadowing setup
  2022-08-02  7:47 ` [PATCH v1 22/40] i386/tdx: Skip BIOS shadowing setup Xiaoyao Li
@ 2022-08-26  9:13   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-26  9:13 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:32PM +0800, Xiaoyao Li wrote:
> TDX doesn't support mapping different GPAs to the same private memory.
> Thus, aliasing the top 128KB of BIOS as isa-bios is not supported.
> 
> On the other hand, a TDX guest cannot enter real mode, so it works fine
> without isa-bios.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 25/40] i386/tdx: Track RAM entries for TDX VM
  2022-08-02  7:47 ` [PATCH v1 25/40] i386/tdx: Track RAM entries for TDX VM Xiaoyao Li
@ 2022-08-26  9:15   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-26  9:15 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:35PM +0800, Xiaoyao Li wrote:
> The RAM of TDX VM can be classified into two types:
> 
>  - TDX_RAM_UNACCEPTED: default type of TDX memory, which needs to be
>    accepted by TDX guest before it can be used and will be all-zeros
>    after being accepted.
> 
>  - TDX_RAM_ADDED: the RAM that is ADD'ed to the TD guest before running,
>    and can be used directly. E.g., the TD HOB and TEMP MEM needed by TDVF.
> 
> Maintain TdxRamEntries[], which grabs the initial RAM info from the e820
> table and marks each RAM range as the default type TDX_RAM_UNACCEPTED.
> 
> Then turn the ranges of TD HOB and TEMP MEM into TDX_RAM_ADDED, since these
> ranges will be ADD'ed before the TD runs and need not be accepted at runtime.
> 
> The TdxRamEntries[] are later used to set up the memory TD resource HOB
> that passes memory info from QEMU to TDVF.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>
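
One way to picture the "turn a range into TDX_RAM_ADDED" step is to split
an unaccepted entry into up to three pieces. The sketch below is only an
illustration under assumptions, not the patch's implementation: all type
and function names are hypothetical, the table has a fixed capacity, and
entries are left unsorted after a split.

```c
#include <stdint.h>

enum ram_type { RAM_UNACCEPTED, RAM_ADDED };

struct ram_entry {
    uint64_t addr;
    uint64_t len;
    enum ram_type type;
};

#define MAX_ENTRIES 16

struct ram_map {
    struct ram_entry e[MAX_ENTRIES];
    unsigned int n;
};

/*
 * Mark [addr, addr + len) as ADDED. The range has to lie entirely
 * inside one UNACCEPTED entry. Returns 0 on success, -1 otherwise.
 */
static int ram_map_add_range(struct ram_map *m, uint64_t addr, uint64_t len)
{
    unsigned int i;

    if (len == 0) {
        return -1;
    }
    for (i = 0; i < m->n; i++) {
        struct ram_entry *e = &m->e[i];
        uint64_t old_addr = e->addr;
        uint64_t head, tail;

        if (e->type != RAM_UNACCEPTED || addr < e->addr) {
            continue;
        }
        head = addr - e->addr;              /* unaccepted part before */
        if (head >= e->len || len > e->len - head) {
            continue;                       /* range not inside entry */
        }
        tail = e->len - head - len;         /* unaccepted part after */
        if (m->n + (head != 0) + (tail != 0) > MAX_ENTRIES) {
            return -1;                      /* no room for leftovers */
        }

        /* Reuse the slot for the ADDED part, then append leftovers. */
        e->addr = addr;
        e->len = len;
        e->type = RAM_ADDED;
        if (head) {
            m->e[m->n++] = (struct ram_entry){ old_addr, head, RAM_UNACCEPTED };
        }
        if (tail) {
            m->e[m->n++] = (struct ram_entry){ addr + len, tail, RAM_UNACCEPTED };
        }
        return 0;
    }
    return -1;
}
```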


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 26/40] headers: Add definitions from UEFI spec for volumes, resources, etc...
  2022-08-02  7:47 ` [PATCH v1 26/40] headers: Add definitions from UEFI spec for volumes, resources, etc Xiaoyao Li
@ 2022-08-26  9:19   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-26  9:19 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:36PM +0800, Xiaoyao Li wrote:
> Add UEFI definitions for literals, enums, structs, GUIDs, etc... that
> will be used by TDX to build the UEFI Hand-Off Block (HOB) that is passed
> to the Trusted Domain Virtual Firmware (TDVF).
> 
> All values come from the UEFI specification and TDVF design guide. [1]
> 
> Note, EFI_RESOURCE_MEMORY_UNACCEPTED will be added in future UEFI spec.
> 
> [1] https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.pdf
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 27/40] i386/tdx: Setup the TD HOB list
  2022-08-02  7:47 ` [PATCH v1 27/40] i386/tdx: Setup the TD HOB list Xiaoyao Li
@ 2022-08-26 10:27   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-26 10:27 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:37PM +0800, Xiaoyao Li wrote:
> The TD HOB list is used to pass the information from VMM to TDVF. The TD
> HOB must include PHIT HOB and Resource Descriptor HOB. More details can
> be found in TDVF specification and PI specification.
> 
> Build the TD HOB in TDX's machine_init_done callback.
> 
> Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 34/40] hw/i386: add eoi_intercept_unsupported member to X86MachineState
  2022-08-02  7:47 ` [PATCH v1 34/40] hw/i386: add eoi_intercept_unsupported member to X86MachineState Xiaoyao Li
@ 2022-08-26 10:32   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-26 10:32 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:44PM +0800, Xiaoyao Li wrote:
> Add a new bool member, eoi_intercept_unsupported, to X86MachineState
> with default value false. Set true for TDX VM.
> 
> Without the ability to intercept EOI, it is impossible to emulate a
> level-triggered interrupt being re-injected while the level is still
> kept active, which affects interrupt controller emulation.

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 35/40] hw/i386: add option to forcibly report edge trigger in acpi tables
  2022-08-02  7:47 ` [PATCH v1 35/40] hw/i386: add option to forcibly report edge trigger in acpi tables Xiaoyao Li
@ 2022-08-26 10:32   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-26 10:32 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:45PM +0800, Xiaoyao Li wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> When level trigger isn't supported on x86 platform,
> forcibly report edge trigger in acpi tables.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 36/40] i386/tdx: Don't synchronize guest tsc for TDs
  2022-08-02  7:47 ` [PATCH v1 36/40] i386/tdx: Don't synchronize guest tsc for TDs Xiaoyao Li
@ 2022-08-26 10:33   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-26 10:33 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:46PM +0800, Xiaoyao Li wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> TSC of TDs is not accessible and KVM doesn't allow access of
> MSR_IA32_TSC for TDs. To avoid the assert() in kvm_get_tsc, make
> kvm_synchronize_all_tsc() a no-op for TDs.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 38/40] i386/tdx: Skip kvm_put_apicbase() for TDs
  2022-08-02  7:47 ` [PATCH v1 38/40] i386/tdx: Skip kvm_put_apicbase() " Xiaoyao Li
@ 2022-08-26 10:34   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-26 10:34 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:48PM +0800, Xiaoyao Li wrote:
> KVM doesn't allow writing to MSR_IA32_APICBASE for TDs.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 39/40] i386/tdx: Don't get/put guest state for TDX VMs
  2022-08-02  7:47 ` [PATCH v1 39/40] i386/tdx: Don't get/put guest state for TDX VMs Xiaoyao Li
@ 2022-08-26 10:35   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-26 10:35 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:49PM +0800, Xiaoyao Li wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Don't get/put state of TDX VMs since accessing/mutating guest state of
> production TDs is not supported.
> 
> Note, it will be allowed for a debug TD. Corresponding support will be
> introduced when debug TD support is implemented in the future.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 40/40] docs: Add TDX documentation
  2022-08-02  7:47 ` [PATCH v1 40/40] docs: Add TDX documentation Xiaoyao Li
@ 2022-08-26 10:36   ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-08-26 10:36 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P . Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, Aug 02, 2022 at 03:47:50PM +0800, Xiaoyao Li wrote:
> Add docs/system/i386/tdx.rst for TDX support, and add tdx in
> confidential-guest-support.rst
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object
  2022-08-26  5:57       ` Gerd Hoffmann
@ 2022-09-02  2:33         ` Xiaoyao Li
  2022-09-02  2:52           ` Sean Christopherson
  0 siblings, 1 reply; 80+ messages in thread
From: Xiaoyao Li @ 2022-09-02  2:33 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 8/26/2022 1:57 PM, Gerd Hoffmann wrote:
>    Hi,
>   
>> For TD guest kernel, it has its own reason to turn SEPT_VE on or off. E.g.,
>> linux TD guest requires SEPT_VE to be disabled to avoid #VE on syscall gap
>> [1].
> 
> Why is that a problem for a TD guest kernel?  Installing exception
> handlers is done quite early in the boot process, certainly before any
> userspace code runs.  So I think we should never see a syscall without
> a #VE handler being installed.  /me is confused.
> 
> Or do you want tell me linux has no #VE handler?

The problem is not "no #VE handler"; Linux does have a #VE handler. The
problem is that Linux doesn't want any (or certain) exceptions to occur in
the syscall gap, which is not specific to #VE. Frankly, I don't understand
the reason clearly; it's something related to the IST used in the x86
Linux kernel.

>> Frankly speaking, this bit is better to be configured by TD guest
>> kernel, however current TDX architecture makes the design to let VMM
>> configure.
> 
> Indeed.  Requiring users to know guest kernel capabilities and manually
> configuring the vmm accordingly looks fragile to me.
> 
> Even better would be to not have that bit in the first place and require
> TD guests properly handle #VE exceptions.
> 
>> This can cause problems with the "system call gap": a malicious
>> hypervisor might trigger a #VE for example on the system call entry
>> code, and when a user process does a system call it would trigger a
>> #VE. Since SYSCALL relies on the kernel code to switch to the kernel stack,
>> this would lead to kernel code running on the ring 3 stack.
> 
> Hmm?  Exceptions switch to kernel context too ...
> 
> take care,
>    Gerd
> 


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object
  2022-09-02  2:33         ` Xiaoyao Li
@ 2022-09-02  2:52           ` Sean Christopherson
  2022-09-02  5:46             ` Gerd Hoffmann
  0 siblings, 1 reply; 80+ messages in thread
From: Sean Christopherson @ 2022-09-02  2:52 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Gerd Hoffmann, Paolo Bonzini, Isaku Yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel

On Fri, Sep 02, 2022, Xiaoyao Li wrote:
> On 8/26/2022 1:57 PM, Gerd Hoffmann wrote:
> >    Hi,
> > > For TD guest kernel, it has its own reason to turn SEPT_VE on or off. E.g.,
> > > linux TD guest requires SEPT_VE to be disabled to avoid #VE on syscall gap
> > > [1].
> > 
> > Why is that a problem for a TD guest kernel?  Installing exception
> > handlers is done quite early in the boot process, certainly before any
> > userspace code runs.  So I think we should never see a syscall without
> > a #VE handler being installed.  /me is confused.
> > 
> > Or do you want tell me linux has no #VE handler?
> 
> The problem is not "no #VE handler" and Linux does have #VE handler. The
> problem is Linux doesn't want any (or certain) exception occurrence in
> syscall gap, it's not specific to #VE. Frankly, I don't understand the
> reason clearly, it's something related to IST used in x86 Linux kernel.

The SYSCALL gap issue is that because SYSCALL doesn't load RSP, the first instruction
at the SYSCALL entry point runs with a userspace-controlled RSP.  With TDX, a
malicious hypervisor can induce a #VE on the SYSCALL page and thus get the kernel
to run the #VE handler with a userspace stack.

The "fix" is to use an IST for #VE so that a kernel-controlled RSP is loaded on #VE,
but ISTs are terrible because they don't play nice with re-entrancy (among other
reasons).  The RSP used for IST-based handlers is hardcoded, and so if a #VE
handler triggers another #VE at any point before IRET, the second #VE will clobber
the stack and hose the kernel.

It's possible to work around this, e.g. change the IST entry at the very beginning
of the handler, but it's a maintenance burden.  Since the only reason to use an IST
is to guard against a malicious hypervisor, Linux decided it would be just as easy
and more beneficial to avoid unexpected #VEs due to unaccepted private pages entirely.
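
To illustrate the re-entrancy problem described above, here is a minimal toy
model in C.  It is only a sketch, not kernel code: the names toy_stack,
toy_deliver_ist and toy_deliver_push are hypothetical, and a "frame" is reduced
to a single saved return address.  It assumes nothing beyond what is stated
above, namely that an IST-based handler always starts from the same hardcoded
RSP, while a non-IST handler pushes below the current stack pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_STACK_WORDS 8

/* Hypothetical stand-in for an exception stack. */
struct toy_stack {
    uint64_t words[TOY_STACK_WORDS];
};

/* IST-style delivery: the frame always lands in the same fixed slot. */
static uint64_t *toy_deliver_ist(struct toy_stack *s, uint64_t return_rip)
{
    uint64_t *frame = &s->words[TOY_STACK_WORDS - 1];

    *frame = return_rip;
    return frame;
}

/* Non-IST delivery: the frame is pushed below the current stack pointer. */
static uint64_t *toy_deliver_push(struct toy_stack *s, size_t *sp,
                                  uint64_t return_rip)
{
    *sp -= 1;
    s->words[*sp] = return_rip;
    return &s->words[*sp];
}

/* Returns 1 if a nested IST delivery overwrote the first frame. */
static int toy_nested_ist_clobbers(void)
{
    struct toy_stack s;
    uint64_t *first;

    memset(&s, 0, sizeof(s));
    first = toy_deliver_ist(&s, 0x1000);
    toy_deliver_ist(&s, 0x2000);    /* second #VE before the first "IRET" */
    return *first != 0x1000;
}

/* Returns 1 if a nested push-style delivery overwrote the first frame. */
static int toy_nested_push_clobbers(void)
{
    struct toy_stack s;
    size_t sp = TOY_STACK_WORDS;
    uint64_t *first;

    memset(&s, 0, sizeof(s));
    first = toy_deliver_push(&s, &sp, 0x1000);
    toy_deliver_push(&s, &sp, 0x2000);
    return *first != 0x1000;
}
```

In this model the nested IST delivery overwrites the first saved return
address, while the push-style delivery keeps both frames intact.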

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object
  2022-09-02  2:52           ` Sean Christopherson
@ 2022-09-02  5:46             ` Gerd Hoffmann
  2022-09-02 15:26               ` Sean Christopherson
  0 siblings, 1 reply; 80+ messages in thread
From: Gerd Hoffmann @ 2022-09-02  5:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Xiaoyao Li, Paolo Bonzini, Isaku Yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel

On Fri, Sep 02, 2022 at 02:52:25AM +0000, Sean Christopherson wrote:
> On Fri, Sep 02, 2022, Xiaoyao Li wrote:
> > On 8/26/2022 1:57 PM, Gerd Hoffmann wrote:
> > >    Hi,
> > > > For TD guest kernel, it has its own reason to turn SEPT_VE on or off. E.g.,
> > > > linux TD guest requires SEPT_VE to be disabled to avoid #VE on syscall gap
> > > > [1].
> > > 
> > > Why is that a problem for a TD guest kernel?  Installing exception
> > > handlers is done quite early in the boot process, certainly before any
> > > userspace code runs.  So I think we should never see a syscall without
> > > a #VE handler being installed.  /me is confused.
> > > 
> > > Or do you want tell me linux has no #VE handler?
> > 
> > The problem is not "no #VE handler" and Linux does have #VE handler. The
> > problem is Linux doesn't want any (or certain) exception occurrence in
> > syscall gap, it's not specific to #VE. Frankly, I don't understand the
> > reason clearly, it's something related to IST used in x86 Linux kernel.
> 
> The SYSCALL gap issue is that because SYSCALL doesn't load RSP, the first instruction
> at the SYSCALL entry point runs with a userspace-controlled RSP.  With TDX, a
> malicious hypervisor can induce a #VE on the SYSCALL page and thus get the kernel
> to run the #VE handler with a userspace stack.
> 
> The "fix" is to use an IST for #VE so that a kernel-controlled RSP is loaded on #VE,
> but ISTs are terrible because they don't play nice with re-entrancy (among other
> reasons).  The RSP used for IST-based handlers is hardcoded, and so if a #VE
> handler triggers another #VE at any point before IRET, the second #VE will clobber
> the stack and hose the kernel.
> 
> It's possible to work around this, e.g. change the IST entry at the very beginning
> of the handler, but it's a maintenance burden.  Since the only reason to use an IST
> is to guard against a malicious hypervisor, Linux decided it would be just as easy
> and more beneficial to avoid unexpected #VEs due to unaccepted private pages entirely.

Hmm, ok, but shouldn't the SEPT_VE bit *really* be controlled by the guest then?

Having a hypervisor-controlled config bit to protect against a malicious
hypervisor looks pointless to me ...

take care,
  Gerd


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object
  2022-09-02  5:46             ` Gerd Hoffmann
@ 2022-09-02 15:26               ` Sean Christopherson
  2022-09-02 16:52                 ` Gerd Hoffmann
  0 siblings, 1 reply; 80+ messages in thread
From: Sean Christopherson @ 2022-09-02 15:26 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Xiaoyao Li, Paolo Bonzini, Isaku Yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel

On Fri, Sep 02, 2022, Gerd Hoffmann wrote:
> On Fri, Sep 02, 2022 at 02:52:25AM +0000, Sean Christopherson wrote:
> > On Fri, Sep 02, 2022, Xiaoyao Li wrote:
> > > On 8/26/2022 1:57 PM, Gerd Hoffmann wrote:
> > > >    Hi,
> > > > > For TD guest kernel, it has its own reason to turn SEPT_VE on or off. E.g.,
> > > > > linux TD guest requires SEPT_VE to be disabled to avoid #VE on syscall gap
> > > > > [1].
> > > > 
> > > > Why is that a problem for a TD guest kernel?  Installing exception
> > > > handlers is done quite early in the boot process, certainly before any
> > > > userspace code runs.  So I think we should never see a syscall without
> > > > a #VE handler being installed.  /me is confused.
> > > > 
> > > > Or do you want tell me linux has no #VE handler?
> > > 
> > > The problem is not "no #VE handler" and Linux does have #VE handler. The
> > > problem is Linux doesn't want any (or certain) exception occurrence in
> > > syscall gap, it's not specific to #VE. Frankly, I don't understand the
> > > reason clearly, it's something related to IST used in x86 Linux kernel.
> > 
> > The SYSCALL gap issue is that because SYSCALL doesn't load RSP, the first instruction
> > at the SYSCALL entry point runs with a userspace-controlled RSP.  With TDX, a
> > malicious hypervisor can induce a #VE on the SYSCALL page and thus get the kernel
> > to run the #VE handler with a userspace stack.
> > 
> > The "fix" is to use an IST for #VE so that a kernel-controlled RSP is loaded on #VE,
> > but ISTs are terrible because they don't play nice with re-entrancy (among other
> > reasons).  The RSP used for IST-based handlers is hardcoded, and so if a #VE
> > handler triggers another #VE at any point before IRET, the second #VE will clobber
> > the stack and hose the kernel.
> > 
> > It's possible to work around this, e.g. change the IST entry at the very beginning
> > of the handler, but it's a maintenance burden.  Since the only reason to use an IST
> > is to guard against a malicious hypervisor, Linux decided it would be just as easy
> > and more beneficial to avoid unexpected #VEs due to unaccepted private pages entirely.
> 
> Hmm, ok, but shouldn't the SEPT_VE bit *really* be controlled by the guest then?
> 
> Having a hypervisor-controlled config bit to protect against a malicious
> hypervisor looks pointless to me ...

IIRC, all (most?) of the attributes are included in the attestation report, so a
guest/customer can refuse to provision secrets to the guest if the hypervisor is
misbehaving.

I'm guessing Intel made it an attribute and not a dynamic control knob to simplify
the TDX module implementation.
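
As a rough sketch of what such a check could look like on the verifier side,
assuming the 64-bit TD attributes field has already been extracted from a
verified attestation report (that extraction is not shown here):  the helper
name td_attrs_acceptable is hypothetical, and the bit position (28) is the one
the Linux TDX guest code uses for ATTR_SEPT_VE_DISABLE as far as I know; please
double-check it against the TDX module spec before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed bit position of SEPT_VE_DISABLE in the TD attributes
 * (matches ATTR_SEPT_VE_DISABLE in the Linux TDX guest code, to the
 * best of my knowledge; verify against the TDX module spec).
 */
#define TD_ATTR_SEPT_VE_DISABLE (1ULL << 28)

/*
 * Hypothetical relying-party policy: only provision secrets if the
 * hypervisor created the TD with SEPT_VE_DISABLE set.
 */
static bool td_attrs_acceptable(uint64_t td_attributes)
{
    return (td_attributes & TD_ATTR_SEPT_VE_DISABLE) != 0;
}
```

A real policy would likely check more attributes than this one bit; the point
is only that the guest owner, not the hypervisor, gets the final say.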

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object
  2022-09-02 15:26               ` Sean Christopherson
@ 2022-09-02 16:52                 ` Gerd Hoffmann
  0 siblings, 0 replies; 80+ messages in thread
From: Gerd Hoffmann @ 2022-09-02 16:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Xiaoyao Li, Paolo Bonzini, Isaku Yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel

On Fri, Sep 02, 2022 at 03:26:35PM +0000, Sean Christopherson wrote:
> On Fri, Sep 02, 2022, Gerd Hoffmann wrote:
> > 
> > Hmm, ok, but shouldn't the SEPT_VE bit *really* be controlled by the guest then?
> > 
> > Having a hypervisor-controlled config bit to protect against a malicious
> > hypervisor looks pointless to me ...
> 
> IIRC, all (most?) of the attributes are included in the attestation report, so a
> guest/customer can refuse to provision secrets to the guest if the hypervisor is
> misbehaving.

Good.  I think we sorted all issues then.

Acked-by: Gerd Hoffmann <kraxel@redhat.com>

take care,
  Gerd


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v1 00/40] TDX QEMU support
  2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
                   ` (40 preceding siblings ...)
  2022-08-02  9:49 ` [PATCH v1 00/40] TDX QEMU support Daniel P. Berrangé
@ 2022-09-05  0:58 ` Xiaoyao Li
  41 siblings, 0 replies; 80+ messages in thread
From: Xiaoyao Li @ 2022-09-05  0:58 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: kvm, qemu-devel

Hi Gerd

On 8/2/2022 3:47 PM, Xiaoyao Li wrote:
..
> == Change history ==
> Changes from RFC v4:
> [RFC v4] https://lore.kernel.org/qemu-devel/20220512031803.3315890-1-xiaoyao.li@intel.com/
> 
> - Add 3 more patches(9, 10, 11) to improve the tdx_get_supported_cpuid();

Patches 8-11 are the only ones left without your Acked-by. Do you
have any comments on them?

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2022-09-05  0:58 UTC | newest]

Thread overview: 80+ messages
2022-08-02  7:47 [PATCH v1 00/40] TDX QEMU support Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 01/40] *** HACK *** linux-headers: Update headers to pull in TDX API changes Xiaoyao Li
2022-08-02  9:47   ` Daniel P. Berrangé
2022-08-02 10:38     ` Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 02/40] i386: Introduce tdx-guest object Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 03/40] target/i386: Implement mc->kvm_type() to get VM type Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 04/40] target/i386: Introduce kvm_confidential_guest_init() Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 05/40] i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 06/40] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES Xiaoyao Li
2022-08-25 10:12   ` Gerd Hoffmann
2022-08-25 15:35     ` Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 07/40] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object Xiaoyao Li
2022-08-25 10:16   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 08/40] i386/tdx: Adjust the supported CPUID based on TDX restrictions Xiaoyao Li
2022-08-03  7:33   ` Chenyi Qiang
2022-08-04  0:55     ` Xiaoyao Li
2022-08-26  4:00     ` Xiaoyao Li
2022-08-25 11:26   ` Gerd Hoffmann
2022-08-25 12:44     ` Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 09/40] i386/tdx: Update tdx_fixed0/1 bits by tdx_caps.cpuid_config[] Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 10/40] i386/tdx: Integrate tdx_caps->xfam_fixed0/1 into tdx_cpuid_lookup Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 11/40] i386/tdx: Integrate tdx_caps->attrs_fixed0/1 to tdx_cpuid_lookup Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 12/40] i386/kvm: Move architectural CPUID leaf generation to separate helper Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 13/40] KVM: Introduce kvm_arch_pre_create_vcpu() Xiaoyao Li
2022-08-25 11:28   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 14/40] i386/tdx: Initialize TDX before creating TD vcpus Xiaoyao Li
2022-08-25 11:29   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object Xiaoyao Li
2022-08-25 11:36   ` Gerd Hoffmann
2022-08-25 14:42     ` Xiaoyao Li
2022-08-26  5:57       ` Gerd Hoffmann
2022-09-02  2:33         ` Xiaoyao Li
2022-09-02  2:52           ` Sean Christopherson
2022-09-02  5:46             ` Gerd Hoffmann
2022-09-02 15:26               ` Sean Christopherson
2022-09-02 16:52                 ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 16/40] i386/tdx: Wire CPU features up with attributes of TD guest Xiaoyao Li
2022-08-25 11:38   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 17/40] i386/tdx: Validate TD attributes Xiaoyao Li
2022-08-25 11:39   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 18/40] i386/tdx: Implement user specified tsc frequency Xiaoyao Li
2022-08-25 11:41   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 19/40] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 20/40] i386/tdvf: Introduce function to parse TDVF metadata Xiaoyao Li
2022-08-26  9:12   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 21/40] i386/tdx: Parse TDVF metadata for TDX VM Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 22/40] i386/tdx: Skip BIOS shadowing setup Xiaoyao Li
2022-08-26  9:13   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 23/40] i386/tdx: Don't initialize pc.rom for TDX VMs Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 24/40] i386/tdx: Track mem_ptr for each firmware entry of TDVF Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 25/40] i386/tdx: Track RAM entries for TDX VM Xiaoyao Li
2022-08-26  9:15   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 26/40] headers: Add definitions from UEFI spec for volumes, resources, etc Xiaoyao Li
2022-08-26  9:19   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 27/40] i386/tdx: Setup the TD HOB list Xiaoyao Li
2022-08-26 10:27   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 28/40] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 29/40] i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 30/40] i386/tdx: Finalize TDX VM Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 31/40] i386/tdx: Disable SMM for TDX VMs Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 32/40] i386/tdx: Disable PIC " Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 33/40] i386/tdx: Don't allow system reset " Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 34/40] hw/i386: add eoi_intercept_unsupported member to X86MachineState Xiaoyao Li
2022-08-26 10:32   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 35/40] hw/i386: add option to forcibly report edge trigger in acpi tables Xiaoyao Li
2022-08-26 10:32   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 36/40] i386/tdx: Don't synchronize guest tsc for TDs Xiaoyao Li
2022-08-26 10:33   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 37/40] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() " Xiaoyao Li
2022-08-02  7:47 ` [PATCH v1 38/40] i386/tdx: Skip kvm_put_apicbase() " Xiaoyao Li
2022-08-26 10:34   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 39/40] i386/tdx: Don't get/put guest state for TDX VMs Xiaoyao Li
2022-08-26 10:35   ` Gerd Hoffmann
2022-08-02  7:47 ` [PATCH v1 40/40] docs: Add TDX documentation Xiaoyao Li
2022-08-26 10:36   ` Gerd Hoffmann
2022-08-02  9:49 ` [PATCH v1 00/40] TDX QEMU support Daniel P. Berrangé
2022-08-02 10:55   ` Xiaoyao Li
2022-08-03 17:44     ` Daniel P. Berrangé
2022-08-05  0:16       ` Xiaoyao Li
2022-09-05  0:58 ` Xiaoyao Li
