* [RFC PATCH v4 00/36] TDX QEMU support
@ 2022-05-12  3:17 Xiaoyao Li
  2022-05-12  3:17 ` [RFC PATCH v4 01/36] *** HACK *** linux-headers: Update headers to pull in TDX API changes Xiaoyao Li
                   ` (35 more replies)
  0 siblings, 36 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

This is v4 of the RFC. I would like feedback on whether the design in this
series is the right direction for enabling TDX in QEMU.

This patch series enables TDX support so that a TD (TDX VM) can be created and
booted with QEMU. It must be used together with the corresponding v6 KVM patch
series for TDX [1]. TDX-related documents can be found in [2].

This series is also available in the following GitHub repository:

https://github.com/intel/qemu-tdx/tree/tdx-qemu-upstream-rfc-v4

It is based on two cleanup patches:

https://lore.kernel.org/qemu-devel/20220310122811.807794-1-xiaoyao.li@intel.com/


Booting a TDX VM requires several changes and additional steps in the flow:

 1. specify the VM type KVM_X86_TDX_VM when creating the VM with
    IOCTL(KVM_CREATE_VM);
 2. initialize the VM-scope configuration before creating any VCPU;
 3. initialize the VCPU-scope configuration;
 4. initialize the virtual firmware in guest private memory before any VCPU
    runs.

In addition, a TDX VM must boot with TDVF (TDX virtual firmware); currently
upstream OVMF can serve as TDVF. This series adds support for parsing TDVF,
loading TDVF into the guest's private memory, and preparing the TD HOB info
for TDVF.

[1] KVM TDX basic feature support
https://lore.kernel.org/all/cover.1646422845.git.isaku.yamahata@intel.com/

[2] https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html

== Limitations and future work ==
- Readonly memslot

  TDX supports readonly (write-protected) memslots only for shared memory, not
  for private memory. For simplicity, readonly memslots are marked as entirely
  unsupported for TDX.

- CPU model

  A TD cannot be created with an arbitrary CPU model the way a non-TDX VM can,
  because only a subset of features can be configured for a TD.

  - It's recommended to use '-cpu host' to create a TD;
  - '+feature/-feature' might not work as expected;

  Future work: introduce a specific CPU model for TDs and enhance
               +/-feature handling for TDs.

- gdb support

  gdb support for debugging a TD in off-debug mode is future work.

== Patch organization ==
1           Manually fetch Linux UAPI changes for TDX;
2-15,25-26  Basic TDX support that parses vm-type and invokes TDX
            specific IOCTLs;
16-24       Load, parse and initialize TDVF for TDX VM;
27-31       Disable unsupported functions for TDX VM;
32-35       Avoid errors due to KVM's requirement on TDX;
36          Add documentation of TDX;

== Change history ==
Changes from RFC v3:
- Load TDVF with -bios interface;
- Adapt to KVM API changes;
	- KVM_TDX_CAPABILITIES changes back to KVM-scope;
	- struct kvm_tdx_init_vm changes;
- Define TDX_SUPPORTED_KVM_FEATURES;
- Drop the patch of introducing property sept-ve-disable since it's not
  public yet;
- some misc cleanups

Changes from RFC v2:
- Get vm-type from confidential-guest-support object type;
- Drop machine_init_done_late_notifiers;
- Refactor tdx_ioctl implementation;
- re-use existing pflash interface to load TDVF (i.e., OVMF binaries);
- introduce new data structure to track memory type instead of changing
  e820 table;
- Force smm to off for TDX VM;
- Drop the patches that suppress level-trigger/SMI/INIT/SIPI since KVM
  will ignore them;
- Add documentation;

[v2] https://lore.kernel.org/qemu-devel/cover.1625704980.git.isaku.yamahata@intel.com/

Changes from RFC v1:
- suppress level trigger/SMI/INIT/SIPI related to IOAPIC.
- add VM attribute sha384 to TD measurement.
- guest TSC Hz specification

[v1] https://lore.kernel.org/qemu-devel/cover.1613188118.git.isaku.yamahata@intel.com/

Isaku Yamahata (4):
  i386/tdvf: Introduce function to parse TDVF metadata
  i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION
  hw/i386: add option to forcibly report edge trigger in acpi tables
  i386/tdx: Don't synchronize guest tsc for TDs

Sean Christopherson (2):
  i386/kvm: Move architectural CPUID leaf generation to separate helper
  i386/tdx: Don't get/put guest state for TDX VMs

Xiaoyao Li (30):
  *** HACK *** linux-headers: Update headers to pull in TDX API changes
  i386: Introduce tdx-guest object
  target/i386: Implement mc->kvm_type() to get VM type
  target/i386: Introduce kvm_confidential_guest_init()
  i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context
  i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
  i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object
  i386/tdx: Adjust get_supported_cpuid() for TDX VM
  KVM: Introduce kvm_arch_pre_create_vcpu()
  i386/tdx: Initialize TDX before creating TD vcpus
  i386/tdx: Wire CPU features up with attributes of TD guest
  i386/tdx: Validate TD attributes
  i386/tdx: Implement user specified tsc frequency
  i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM
  i386/tdx: Parse TDVF metadata for TDX VM
  i386/tdx: Skip BIOS shadowing setup
  i386/tdx: Don't initialize pc.rom for TDX VMs
  i386/tdx: Register a machine_init_done callback for TD
  i386/tdx: Track mem_ptr for each firmware entry of TDVF
  i386/tdx: Track RAM entries for TDX VM
  i386/tdx: Setup the TD HOB list
  i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu
  i386/tdx: Finalize TDX VM
  i386/tdx: Disable SMM for TDX VMs
  i386/tdx: Disable PIC for TDX VMs
  i386/tdx: Don't allow system reset for TDX VMs
  hw/i386: add eoi_intercept_unsupported member to X86MachineState
  i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() for TDs
  i386/tdx: Skip kvm_put_apicbase() for TDs
  docs: Add TDX documentation

 accel/kvm/kvm-all.c                        |  21 +-
 configs/devices/i386-softmmu/default.mak   |   1 +
 docs/system/confidential-guest-support.rst |   1 +
 docs/system/i386/tdx.rst                   | 103 +++++
 docs/system/target-i386.rst                |   1 +
 hw/i386/Kconfig                            |   6 +
 hw/i386/acpi-build.c                       |  99 ++--
 hw/i386/acpi-common.c                      |  50 +-
 hw/i386/meson.build                        |   1 +
 hw/i386/pc.c                               |  21 +-
 hw/i386/pc_sysfw.c                         |   7 +
 hw/i386/tdvf-hob.c                         | 212 +++++++++
 hw/i386/tdvf-hob.h                         |  25 +
 hw/i386/tdvf.c                             | 198 ++++++++
 hw/i386/uefi.h                             | 198 ++++++++
 hw/i386/x86.c                              |  34 +-
 include/hw/i386/tdvf.h                     |  58 +++
 include/hw/i386/x86.h                      |   1 +
 include/sysemu/kvm.h                       |   1 +
 linux-headers/asm-x86/kvm.h                |  95 ++++
 linux-headers/linux/kvm.h                  |   2 +
 qapi/qom.json                              |  14 +
 target/i386/cpu.h                          |   5 +
 target/i386/kvm/kvm.c                      | 362 +++++++++------
 target/i386/kvm/kvm_i386.h                 |   5 +
 target/i386/kvm/meson.build                |   2 +
 target/i386/kvm/tdx-stub.c                 |  19 +
 target/i386/kvm/tdx.c                      | 505 +++++++++++++++++++++
 target/i386/kvm/tdx.h                      |  55 +++
 target/i386/sev.c                          |   1 -
 target/i386/sev.h                          |   2 +
 31 files changed, 1897 insertions(+), 208 deletions(-)
 create mode 100644 docs/system/i386/tdx.rst
 create mode 100644 hw/i386/tdvf-hob.c
 create mode 100644 hw/i386/tdvf-hob.h
 create mode 100644 hw/i386/tdvf.c
 create mode 100644 hw/i386/uefi.h
 create mode 100644 include/hw/i386/tdvf.h
 create mode 100644 target/i386/kvm/tdx-stub.c
 create mode 100644 target/i386/kvm/tdx.c
 create mode 100644 target/i386/kvm/tdx.h

-- 
2.27.0



* [RFC PATCH v4 01/36] *** HACK *** linux-headers: Update headers to pull in TDX API changes
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-12  3:17 ` [RFC PATCH v4 02/36] i386: Introduce tdx-guest object Xiaoyao Li
                   ` (34 subsequent siblings)
  35 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Pull in recent TDX updates, which are not backwards compatible.

This is only to make the series runnable. The headers will be updated by the
script

	scripts/update-linux-headers.sh

once TDX support is upstreamed in the Linux kernel.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 linux-headers/asm-x86/kvm.h | 95 +++++++++++++++++++++++++++++++++++++
 linux-headers/linux/kvm.h   |  2 +
 2 files changed, 97 insertions(+)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index bf6e96011dfe..8a06a2a7527e 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -525,4 +525,99 @@ struct kvm_pmu_event_filter {
 #define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
 #define   KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
 
+#define KVM_X86_DEFAULT_VM	0
+#define KVM_X86_TDX_VM		1
+
+/* Trust Domain eXtension sub-ioctl() commands. */
+enum kvm_tdx_cmd_id {
+	KVM_TDX_CAPABILITIES = 0,
+	KVM_TDX_INIT_VM,
+	KVM_TDX_INIT_VCPU,
+	KVM_TDX_INIT_MEM_REGION,
+	KVM_TDX_FINALIZE_VM,
+
+	KVM_TDX_CMD_NR_MAX,
+};
+
+struct kvm_tdx_cmd {
+	/* enum kvm_tdx_cmd_id */
+	__u32 id;
+	/* flags for sub-command. If sub-command doesn't use this, set zero. */
+	__u32 flags;
+	/*
+	 * data for each sub-command. An immediate or a pointer to the actual
+	 * data in process virtual address.  If sub-command doesn't use it,
+	 * set zero.
+	 */
+	__u64 data;
+	/*
+	 * Auxiliary error code.  The sub-command may return TDX SEAMCALL
+	 * status code in addition to -Exxx.
+	 * Defined for consistency with struct kvm_sev_cmd.
+	 */
+	__u64 error;
+	/* Reserved: Defined for consistency with struct kvm_sev_cmd. */
+	__u64 unused;
+};
+
+struct kvm_tdx_cpuid_config {
+	__u32 leaf;
+	__u32 sub_leaf;
+	__u32 eax;
+	__u32 ebx;
+	__u32 ecx;
+	__u32 edx;
+};
+
+struct kvm_tdx_capabilities {
+	__u64 attrs_fixed0;
+	__u64 attrs_fixed1;
+	__u64 xfam_fixed0;
+	__u64 xfam_fixed1;
+
+	__u32 nr_cpuid_configs;
+	__u32 padding;
+	struct kvm_tdx_cpuid_config cpuid_configs[0];
+};
+
+struct kvm_tdx_init_vm {
+	__u64 attributes;
+	__u32 max_vcpus;
+	__u32 tsc_khz;
+	__u64 mrconfigid[6];	/* sha384 digest */
+	__u64 mrowner[6];	/* sha384 digest */
+	__u64 mrownerconfig[6];	/* sha384 digest */
+	union {
+		/*
+		 * KVM_TDX_INIT_VM is called before vcpu creation, thus before
+		 * KVM_SET_CPUID2.  CPUID configurations need to be passed.
+		 *
+		 * This configuration supersedes KVM_SET_CPUID{,2}.
+		 * The user space VMM, e.g. qemu, should make them consistent
+		 * with these values.
+		 * sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES(256)
+		 * = 8KB.
+		 */
+		struct {
+			struct kvm_cpuid2 cpuid;
+			/* 8KB with KVM_MAX_CPUID_ENTRIES. */
+			struct kvm_cpuid_entry2 entries[];
+		};
+		/*
+		 * For future extensibility.
+		 * The size(struct kvm_tdx_init_vm) = 16KB.
+		 * This should be enough given sizeof(TD_PARAMS) = 1024
+		 */
+		__u64 reserved[2028];
+	};
+};
+
+#define KVM_TDX_MEASURE_MEMORY_REGION	(1UL << 0)
+
+struct kvm_tdx_init_mem_region {
+	__u64 source_addr;
+	__u64 gpa;
+	__u64 nr_pages;
+};
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index d232feaae972..b69898e4d036 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1135,6 +1135,8 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_XSAVE2 208
 #define KVM_CAP_SYS_ATTRIBUTES 209
 
+#define KVM_CAP_VM_TYPES 216
+
 #ifdef KVM_CAP_IRQ_ROUTING
 
 struct kvm_irq_routing_irqchip {
-- 
2.27.0



* [RFC PATCH v4 02/36] i386: Introduce tdx-guest object
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
  2022-05-12  3:17 ` [RFC PATCH v4 01/36] *** HACK *** linux-headers: Update headers to pull in TDX API changes Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-12  3:17 ` [RFC PATCH v4 03/36] target/i386: Implement mc->kvm_type() to get VM type Xiaoyao Li
                   ` (33 subsequent siblings)
  35 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Introduce the tdx-guest object, which implements the
CONFIDENTIAL_GUEST_SUPPORT interface and is used to create TDX VMs (TDs) with

  qemu -machine ...,confidential-guest-support=tdx0	\
       -object tdx-guest,id=tdx0

It has a single property, 'attributes', which is fixed to 0 and is not
configurable so far.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 configs/devices/i386-softmmu/default.mak |  1 +
 hw/i386/Kconfig                          |  5 +++
 qapi/qom.json                            | 14 +++++++++
 target/i386/kvm/meson.build              |  2 ++
 target/i386/kvm/tdx.c                    | 40 ++++++++++++++++++++++++
 target/i386/kvm/tdx.h                    | 19 +++++++++++
 6 files changed, 81 insertions(+)
 create mode 100644 target/i386/kvm/tdx.c
 create mode 100644 target/i386/kvm/tdx.h

diff --git a/configs/devices/i386-softmmu/default.mak b/configs/devices/i386-softmmu/default.mak
index 598c6646dfc0..9b5ec59d65b0 100644
--- a/configs/devices/i386-softmmu/default.mak
+++ b/configs/devices/i386-softmmu/default.mak
@@ -18,6 +18,7 @@
 #CONFIG_QXL=n
 #CONFIG_SEV=n
 #CONFIG_SGA=n
+#CONFIG_TDX=n
 #CONFIG_TEST_DEVICES=n
 #CONFIG_TPM_CRB=n
 #CONFIG_TPM_TIS_ISA=n
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index d22ac4a4b952..9e40ff79fc2d 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -10,6 +10,10 @@ config SGX
     bool
     depends on KVM
 
+config TDX
+    bool
+    depends on KVM
+
 config PC
     bool
     imply APPLESMC
@@ -26,6 +30,7 @@ config PC
     imply QXL
     imply SEV
     imply SGX
+    imply TDX
     imply SGA
     imply TEST_DEVICES
     imply TPM_CRB
diff --git a/qapi/qom.json b/qapi/qom.json
index eeb5395ff3b7..fde489e311dc 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -785,6 +785,18 @@
             'reduced-phys-bits': 'uint32',
             '*kernel-hashes': 'bool' } }
 
+##
+# @TdxGuestProperties:
+#
+# Properties for tdx-guest objects.
+#
+# @attributes: TDX guest's attributes (default: 0)
+#
+# Since: 7.1
+##
+{ 'struct': 'TdxGuestProperties',
+  'data': { '*attributes': 'uint64' } }
+
 ##
 # @ObjectType:
 #
@@ -837,6 +849,7 @@
       'if': 'CONFIG_SECRET_KEYRING' },
     'sev-guest',
     's390-pv-guest',
+    'tdx-guest',
     'throttle-group',
     'tls-creds-anon',
     'tls-creds-psk',
@@ -900,6 +913,7 @@
       'secret_keyring':             { 'type': 'SecretKeyringProperties',
                                       'if': 'CONFIG_SECRET_KEYRING' },
       'sev-guest':                  'SevGuestProperties',
+      'tdx-guest':                  'TdxGuestProperties',
       'throttle-group':             'ThrottleGroupProperties',
       'tls-creds-anon':             'TlsCredsAnonProperties',
       'tls-creds-psk':              'TlsCredsPskProperties',
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index 736df8b72e3f..b2d7d41acde2 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -9,6 +9,8 @@ i386_softmmu_kvm_ss.add(files(
 
 i386_softmmu_kvm_ss.add(when: 'CONFIG_SEV', if_false: files('sev-stub.c'))
 
+i386_softmmu_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'))
+
 i386_softmmu_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), if_false: files('hyperv-stub.c'))
 
 i386_softmmu_ss.add_all(when: 'CONFIG_KVM', if_true: i386_softmmu_kvm_ss)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
new file mode 100644
index 000000000000..d3792d4a3d56
--- /dev/null
+++ b/target/i386/kvm/tdx.c
@@ -0,0 +1,40 @@
+/*
+ * QEMU TDX support
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Xiaoyao Li <xiaoyao.li@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qom/object_interfaces.h"
+
+#include "tdx.h"
+
+/* tdx guest */
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
+                                   tdx_guest,
+                                   TDX_GUEST,
+                                   CONFIDENTIAL_GUEST_SUPPORT,
+                                   { TYPE_USER_CREATABLE },
+                                   { NULL })
+
+static void tdx_guest_init(Object *obj)
+{
+    TdxGuest *tdx = TDX_GUEST(obj);
+
+    tdx->attributes = 0;
+}
+
+static void tdx_guest_finalize(Object *obj)
+{
+}
+
+static void tdx_guest_class_init(ObjectClass *oc, void *data)
+{
+}
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
new file mode 100644
index 000000000000..415aeb5af746
--- /dev/null
+++ b/target/i386/kvm/tdx.h
@@ -0,0 +1,19 @@
+#ifndef QEMU_I386_TDX_H
+#define QEMU_I386_TDX_H
+
+#include "exec/confidential-guest-support.h"
+
+#define TYPE_TDX_GUEST "tdx-guest"
+#define TDX_GUEST(obj)  OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
+
+typedef struct TdxGuestClass {
+    ConfidentialGuestSupportClass parent_class;
+} TdxGuestClass;
+
+typedef struct TdxGuest {
+    ConfidentialGuestSupport parent_obj;
+
+    uint64_t attributes;    /* TD attributes */
+} TdxGuest;
+
+#endif /* QEMU_I386_TDX_H */
-- 
2.27.0



* [RFC PATCH v4 03/36] target/i386: Implement mc->kvm_type() to get VM type
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
  2022-05-12  3:17 ` [RFC PATCH v4 01/36] *** HACK *** linux-headers: Update headers to pull in TDX API changes Xiaoyao Li
  2022-05-12  3:17 ` [RFC PATCH v4 02/36] i386: Introduce tdx-guest object Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-23  8:36   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 04/36] target/i386: Introduce kvm_confidential_guest_init() Xiaoyao Li
                   ` (32 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

A TDX VM requires the VM type KVM_X86_TDX_VM to be passed to
kvm_ioctl(KVM_CREATE_VM). Hence implement mc->kvm_type() for the i386
architecture.

If a tdx-guest object is specified for confidential-guest-support, e.g.,

  qemu -machine ...,confidential-guest-support=tdx0 \
       -object tdx-guest,id=tdx0,...

the VM type is KVM_X86_TDX_VM. Otherwise, it is KVM_X86_DEFAULT_VM.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/i386/x86.c              |  6 ++++++
 target/i386/kvm/kvm.c      | 30 ++++++++++++++++++++++++++++++
 target/i386/kvm/kvm_i386.h |  1 +
 3 files changed, 37 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index fba790f7b49c..4d0b0047627d 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1345,6 +1345,11 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
     qapi_free_SgxEPCList(list);
 }
 
+static int x86_kvm_type(MachineState *ms, const char *vm_type)
+{
+    return kvm_get_vm_type(ms, vm_type);
+}
+
 static void x86_machine_initfn(Object *obj)
 {
     X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1368,6 +1373,7 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
     mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
     mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
     mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
+    mc->kvm_type = x86_kvm_type;
     x86mc->save_tsc_khz = true;
     x86mc->fwcfg_dma_enabled = true;
     nc->nmi_monitor_handler = x86_nmi;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index c885763a5bb5..3f8a9183fa9b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -30,6 +30,7 @@
 #include "sysemu/runstate.h"
 #include "kvm_i386.h"
 #include "sev.h"
+#include "tdx.h"
 #include "hyperv.h"
 #include "hyperv-proto.h"
 
@@ -142,6 +143,35 @@ static struct kvm_msr_list *kvm_feature_msrs;
 #define BUS_LOCK_SLICE_TIME 1000000000ULL /* ns */
 static RateLimit bus_lock_ratelimit_ctrl;
 
+static const char *vm_type_name[] = {
+    [KVM_X86_DEFAULT_VM] = "X86_DEFAULT_VM",
+    [KVM_X86_TDX_VM] = "X86_TDX_VM",
+};
+
+int kvm_get_vm_type(MachineState *ms, const char *vm_type)
+{
+    int kvm_type = KVM_X86_DEFAULT_VM;
+
+    if (ms->cgs && object_dynamic_cast(OBJECT(ms->cgs), TYPE_TDX_GUEST)) {
+        kvm_type = KVM_X86_TDX_VM;
+    }
+
+    /*
+     * old KVM doesn't support KVM_CAP_VM_TYPES and KVM_X86_DEFAULT_VM
+     * is always supported
+     */
+    if (kvm_type == KVM_X86_DEFAULT_VM) {
+        return kvm_type;
+    }
+
+    if (!(kvm_check_extension(KVM_STATE(ms->accelerator), KVM_CAP_VM_TYPES) & BIT(kvm_type))) {
+        error_report("vm-type %s not supported by KVM", vm_type_name[kvm_type]);
+        exit(1);
+    }
+
+    return kvm_type;
+}
+
 int kvm_has_pit_state2(void)
 {
     return has_pit_state2;
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index 4124912c202e..b434feaa6b1d 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -37,6 +37,7 @@ bool kvm_has_adjust_clock(void);
 bool kvm_has_adjust_clock_stable(void);
 bool kvm_has_exception_payload(void);
 void kvm_synchronize_all_tsc(void);
+int kvm_get_vm_type(MachineState *ms, const char *vm_type);
 void kvm_arch_reset_vcpu(X86CPU *cs);
 void kvm_arch_do_init_vcpu(X86CPU *cs);
 
-- 
2.27.0



* [RFC PATCH v4 04/36] target/i386: Introduce kvm_confidential_guest_init()
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (2 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 03/36] target/i386: Implement mc->kvm_type() to get VM type Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-23  8:37   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 05/36] i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context Xiaoyao Li
                   ` (31 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Introduce a separate function kvm_confidential_guest_init() for SEV (and
future TDX).

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c | 11 ++++++++++-
 target/i386/sev.c     |  1 -
 target/i386/sev.h     |  2 ++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3f8a9183fa9b..f657518616f1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2426,6 +2426,15 @@ static void register_smram_listener(Notifier *n, void *unused)
                                  &smram_address_space, 1, "kvm-smram");
 }
 
+static int kvm_confidential_guest_init(MachineState *ms, Error **errp)
+{
+    if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_SEV_GUEST)) {
+        return sev_kvm_init(ms->cgs, errp);
+    }
+
+    return 0;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     uint64_t identity_base = 0xfffbc000;
@@ -2446,7 +2455,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
      * mechanisms are supported in future (e.g. TDX), they'll need
      * their own initialization either here or elsewhere.
      */
-    ret = sev_kvm_init(ms->cgs, &local_err);
+    ret = kvm_confidential_guest_init(ms, &local_err);
     if (ret < 0) {
         error_report_err(local_err);
         return ret;
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 32f7dbac4efa..6089b91cc698 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -39,7 +39,6 @@
 #include "hw/i386/pc.h"
 #include "exec/address-spaces.h"
 
-#define TYPE_SEV_GUEST "sev-guest"
 OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
 
 
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 83e82aa42c41..a9c980dd4b2d 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -20,6 +20,8 @@
 
 #include "exec/confidential-guest-support.h"
 
+#define TYPE_SEV_GUEST "sev-guest"
+
 #define SEV_POLICY_NODBG        0x1
 #define SEV_POLICY_NOKS         0x2
 #define SEV_POLICY_ES           0x4
-- 
2.27.0



* [RFC PATCH v4 05/36] i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (3 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 04/36] target/i386: Introduce kvm_confidential_guest_init() Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-23  8:38   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 06/36] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES Xiaoyao Li
                   ` (30 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Introduce tdx_kvm_init() and invoke it in kvm_confidential_guest_init()
if it's a TDX VM. More initialization will be added later.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c       | 15 ++++++---------
 target/i386/kvm/meson.build |  2 +-
 target/i386/kvm/tdx-stub.c  |  9 +++++++++
 target/i386/kvm/tdx.c       |  7 +++++++
 target/i386/kvm/tdx.h       |  2 ++
 5 files changed, 25 insertions(+), 10 deletions(-)
 create mode 100644 target/i386/kvm/tdx-stub.c

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f657518616f1..f257ffda259d 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -54,6 +54,7 @@
 #include "migration/blocker.h"
 #include "exec/memattrs.h"
 #include "trace.h"
+#include "tdx.h"
 
 #include CONFIG_DEVICES
 
@@ -2430,6 +2431,8 @@ static int kvm_confidential_guest_init(MachineState *ms, Error **errp)
 {
     if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_SEV_GUEST)) {
         return sev_kvm_init(ms->cgs, errp);
+    } else if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_TDX_GUEST)) {
+        return tdx_kvm_init(ms, errp);
     }
 
     return 0;
@@ -2444,16 +2447,10 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     Error *local_err = NULL;
 
     /*
-     * Initialize SEV context, if required
+     * Initialize confidential guest (SEV/TDX) context, if required
      *
-     * If no memory encryption is requested (ms->cgs == NULL) this is
-     * a no-op.
-     *
-     * It's also a no-op if a non-SEV confidential guest support
-     * mechanism is selected.  SEV is the only mechanism available to
-     * select on x86 at present, so this doesn't arise, but if new
-     * mechanisms are supported in future (e.g. TDX), they'll need
-     * their own initialization either here or elsewhere.
+     * It's a no-op if no confidential guest support is requested
+     * (ms->cgs == NULL) or if a non-SEV/non-TDX mechanism is selected.
      */
     ret = kvm_confidential_guest_init(ms, &local_err);
     if (ret < 0) {
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index b2d7d41acde2..fd30b93ecec9 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -9,7 +9,7 @@ i386_softmmu_kvm_ss.add(files(
 
 i386_softmmu_kvm_ss.add(when: 'CONFIG_SEV', if_false: files('sev-stub.c'))
 
-i386_softmmu_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'))
+i386_softmmu_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'), if_false: files('tdx-stub.c'))
 
 i386_softmmu_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), if_false: files('hyperv-stub.c'))
 
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
new file mode 100644
index 000000000000..1df24735201e
--- /dev/null
+++ b/target/i386/kvm/tdx-stub.c
@@ -0,0 +1,9 @@
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "tdx.h"
+
+int tdx_kvm_init(MachineState *ms, Error **errp)
+{
+    return -EINVAL;
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index d3792d4a3d56..77e33ae01147 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -12,10 +12,17 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/error.h"
 #include "qom/object_interfaces.h"
 
+#include "hw/i386/x86.h"
 #include "tdx.h"
 
+int tdx_kvm_init(MachineState *ms, Error **errp)
+{
+    return 0;
+}
+
 /* tdx guest */
 OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
                                    tdx_guest,
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 415aeb5af746..c8a23d95258d 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -16,4 +16,6 @@ typedef struct TdxGuest {
     uint64_t attributes;    /* TD attributes */
 } TdxGuest;
 
+int tdx_kvm_init(MachineState *ms, Error **errp);
+
 #endif /* QEMU_I386_TDX_H */
-- 
2.27.0



* [RFC PATCH v4 06/36] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (4 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 05/36] i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-12 17:38   ` Isaku Yamahata
  2022-05-23  8:45   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 07/36] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object Xiaoyao Li
                   ` (29 subsequent siblings)
  35 siblings, 2 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

KVM exposes TDX capabilities through the KVM_TDX_CAPABILITIES sub-command
of IOCTL(KVM_MEMORY_ENCRYPT_OP). Query the capabilities when initializing
the TDX context; they are used later to validate the user's settings.

Also introduce helpers to invoke TDX "ioctls" at the different scopes
(KVM, VM and VCPU) in preparation for later patches.
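
The grow-and-retry pattern used for the capabilities query can be sketched
in isolation. This is a minimal sketch under stated assumptions:
query_caps() is a hypothetical stand-in for the KVM_TDX_CAPABILITIES call,
and struct caps / struct cfg are simplified placeholders rather than the
kernel's real layout. It only illustrates doubling the entry count while
the callee reports -E2BIG, and freeing the buffer on every failure path.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified placeholders; NOT the real kvm_tdx_capabilities layout. */
struct cfg { uint32_t leaf, sub_leaf; };
struct caps {
    uint32_t nr_configs;      /* in: capacity, out: entries filled */
    struct cfg configs[];     /* flexible array member */
};

/* Hypothetical stand-in for the ioctl: it needs room for 5 entries. */
static int query_caps(struct caps *c)
{
    const uint32_t needed = 5;

    if (c->nr_configs < needed) {
        return -E2BIG;
    }
    c->nr_configs = needed;
    return 0;
}

/* Returns a heap buffer the caller must free(), or NULL on failure. */
static struct caps *get_caps(uint32_t *capacity_out)
{
    uint32_t max_ent = 1;

    for (;;) {
        struct caps *c = calloc(1, sizeof(*c) + max_ent * sizeof(struct cfg));
        int r;

        if (!c) {
            return NULL;
        }
        c->nr_configs = max_ent;

        r = query_caps(c);
        if (r == 0) {
            *capacity_out = max_ent;
            return c;
        }
        free(c);              /* release the buffer on every failure path */
        if (r != -E2BIG) {
            return NULL;
        }
        max_ent *= 2;         /* buffer too small: double and retry */
    }
}
```

With the stand-in above, the capacity grows 1, 2, 4, 8 before the query
succeeds.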

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 85 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 77e33ae01147..68bedbad0ebe 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -14,12 +14,97 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
+#include "sysemu/kvm.h"
 
 #include "hw/i386/x86.h"
 #include "tdx.h"
 
+enum tdx_ioctl_level{
+    TDX_PLATFORM_IOCTL,
+    TDX_VM_IOCTL,
+    TDX_VCPU_IOCTL,
+};
+
+static int __tdx_ioctl(void *state, enum tdx_ioctl_level level, int cmd_id,
+                        __u32 flags, void *data)
+{
+    struct kvm_tdx_cmd tdx_cmd;
+    int r;
+
+    memset(&tdx_cmd, 0x0, sizeof(tdx_cmd));
+
+    tdx_cmd.id = cmd_id;
+    tdx_cmd.flags = flags;
+    tdx_cmd.data = (__u64)(unsigned long)data;
+
+    switch (level) {
+    case TDX_PLATFORM_IOCTL:
+        r = kvm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+        break;
+    case TDX_VM_IOCTL:
+        r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+        break;
+    case TDX_VCPU_IOCTL:
+        r = kvm_vcpu_ioctl(state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+        break;
+    default:
+        error_report("Invalid tdx_ioctl_level %d", level);
+        exit(1);
+    }
+
+    return r;
+}
+
+static inline int tdx_platform_ioctl(int cmd_id, __u32 metadata, void *data)
+{
+    return __tdx_ioctl(NULL, TDX_PLATFORM_IOCTL, cmd_id, metadata, data);
+}
+
+static inline int tdx_vm_ioctl(int cmd_id, __u32 metadata, void *data)
+{
+    return __tdx_ioctl(NULL, TDX_VM_IOCTL, cmd_id, metadata, data);
+}
+
+static inline int tdx_vcpu_ioctl(void *vcpu_fd, int cmd_id, __u32 metadata,
+                                 void *data)
+{
+    return __tdx_ioctl(vcpu_fd, TDX_VCPU_IOCTL, cmd_id, metadata, data);
+}
+
+static struct kvm_tdx_capabilities *tdx_caps;
+
+static void get_tdx_capabilities(void)
+{
+    struct kvm_tdx_capabilities *caps;
+    int max_ent = 1;
+    int r, size;
+
+    do {
+        size = sizeof(struct kvm_tdx_capabilities) +
+               max_ent * sizeof(struct kvm_tdx_cpuid_config);
+        caps = g_malloc0(size);
+        caps->nr_cpuid_configs = max_ent;
+
+        r = tdx_platform_ioctl(KVM_TDX_CAPABILITIES, 0, caps);
+        if (r == -E2BIG) {
+            g_free(caps);
+            max_ent *= 2;
+        } else if (r < 0) {
+            error_report("KVM_TDX_CAPABILITIES failed: %s", strerror(-r));
+            exit(1);
+        }
+    }
+    while (r == -E2BIG);
+
+    tdx_caps = caps;
+}
+
 int tdx_kvm_init(MachineState *ms, Error **errp)
 {
+    if (!tdx_caps) {
+        get_tdx_capabilities();
+    }
+
     return 0;
 }
 
-- 
2.27.0



* [RFC PATCH v4 07/36] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (5 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 06/36] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-23  8:48   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 08/36] i386/tdx: Adjust get_supported_cpuid() for TDX VM Xiaoyao Li
                   ` (28 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

TDX VMs need special handling throughout QEMU. Introduce the is_tdx_vm()
helper to query whether the current VM is a TDX VM.

Cache the tdx_guest object so that callers do not need to cast from
ms->cgs every time.
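
The caching idea can be sketched as follows. The names (struct guest,
cached_guest, guest_init(), is_confidential_vm()) are hypothetical
stand-ins and not QEMU's API; the point is only that the predicate is
meaningful solely after the init hook has run.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TdxGuest. */
struct guest { unsigned long attributes; };

/* NULL until the init hook has run (or if no such guest is configured). */
static struct guest *cached_guest;

static bool is_confidential_vm(void)
{
    return cached_guest != NULL;
}

/* Stand-in for the init hook: remember the object for later queries. */
static int guest_init(struct guest *g)
{
    cached_guest = g;
    return 0;
}
```

Any caller that runs before the init hook sees "false", which is why the
patch documents when the helper becomes valid.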

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 13 +++++++++++++
 target/i386/kvm/tdx.h | 10 ++++++++++
 2 files changed, 23 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 68bedbad0ebe..803154efdb91 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -19,6 +19,14 @@
 #include "hw/i386/x86.h"
 #include "tdx.h"
 
+static TdxGuest *tdx_guest;
+
+/* It's valid after kvm_confidential_guest_init()->kvm_tdx_init() */
+bool is_tdx_vm(void)
+{
+    return !!tdx_guest;
+}
+
 enum tdx_ioctl_level{
     TDX_PLATFORM_IOCTL,
     TDX_VM_IOCTL,
@@ -101,10 +109,15 @@ static void get_tdx_capabilities(void)
 
 int tdx_kvm_init(MachineState *ms, Error **errp)
 {
+    TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
+                                                    TYPE_TDX_GUEST);
+
     if (!tdx_caps) {
         get_tdx_capabilities();
     }
 
+    tdx_guest = tdx;
+
     return 0;
 }
 
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index c8a23d95258d..4036ca2f3f99 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -1,6 +1,10 @@
 #ifndef QEMU_I386_TDX_H
 #define QEMU_I386_TDX_H
 
+#ifndef CONFIG_USER_ONLY
+#include CONFIG_DEVICES /* CONFIG_TDX */
+#endif
+
 #include "exec/confidential-guest-support.h"
 
 #define TYPE_TDX_GUEST "tdx-guest"
@@ -16,6 +20,12 @@ typedef struct TdxGuest {
     uint64_t attributes;    /* TD attributes */
 } TdxGuest;
 
+#ifdef CONFIG_TDX
+bool is_tdx_vm(void);
+#else
+#define is_tdx_vm() 0
+#endif /* CONFIG_TDX */
+
 int tdx_kvm_init(MachineState *ms, Error **errp);
 
 #endif /* QEMU_I386_TDX_H */
-- 
2.27.0



* [RFC PATCH v4 08/36] i386/tdx: Adjust get_supported_cpuid() for TDX VM
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (6 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 07/36] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-23  9:01   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 09/36] KVM: Introduce kvm_arch_pre_create_vcpu() Xiaoyao Li
                   ` (27 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

For TDX, the allowable CPUID configuration differs from what KVM
reports for KVM scope via KVM_GET_SUPPORTED_CPUID.

- Some CPUID bits are not supported for TDX VMs even though KVM reports
  them as supported. Mask them off for TDX VMs, e.g. CPUID_EXT_VMX and
  some PV features.

- The supported XCR0 and XSS bits need to be capped by tdx_caps, because
  KVM uses them to set up the XFAM of the TD.

Introduce tdx_get_supported_cpuid() to adjust the result of
kvm_arch_get_supported_cpuid() for TDX VMs.
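
The XCR0 capping can be illustrated with a small standalone sketch. It
assumes, as the code in this patch reads, that fixed0 holds the bits that
are allowed to be 1 and fixed1 holds the bits that are forced to 1.
DEMO_XCR0_MASK and cap_xcr0() are hypothetical names, and the mask is an
illustrative three-bit subset (x87, SSE, AVX), not the XCR0_MASK added to
cpu.h.

```c
#include <stdint.h>

/* Illustrative subset of XCR0 bits: x87 (0), SSE (1), AVX (2). */
#define DEMO_XCR0_MASK 0x7ULL

/*
 * Cap one 32-bit half of CPUID.0xD.0: high == 0 selects EAX (bits 31:0),
 * high != 0 selects EDX (bits 63:32).
 */
static uint32_t cap_xcr0(uint32_t kvm_val, uint64_t fixed0, uint64_t fixed1,
                         int high)
{
    uint64_t allow = fixed0 & DEMO_XCR0_MASK;  /* bits that may be set  */
    uint64_t force = fixed1 & DEMO_XCR0_MASK;  /* bits that must be set */
    unsigned shift = high ? 32 : 0;

    kvm_val &= (uint32_t)(allow >> shift);
    kvm_val |= (uint32_t)(force >> shift);
    return kvm_val;
}
```

For example, with fixed0 = 0x6 and fixed1 = 0x1, a KVM-reported EAX of
0xff is reduced to 0x7: only the allowed bits survive and the forced bit
is added.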

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/cpu.h     |  5 +++++
 target/i386/kvm/kvm.c |  4 ++++
 target/i386/kvm/tdx.c | 44 +++++++++++++++++++++++++++++++++++++++++++
 target/i386/kvm/tdx.h |  2 ++
 4 files changed, 55 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 9661f9fbd1c6..0c922e5a305a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -567,6 +567,11 @@ typedef enum X86Seg {
 #define ESA_FEATURE_XFD_MASK            (1U << ESA_FEATURE_XFD_BIT)
 
 
+#define XCR0_MASK       (XSTATE_FP_MASK | XSTATE_SSE_MASK | XSTATE_YMM_MASK | \
+                         XSTATE_BNDREGS_MASK | XSTATE_BNDCSR_MASK | \
+                         XSTATE_OPMASK_MASK | XSTATE_ZMM_Hi256_MASK | \
+                         XSTATE_Hi16_ZMM_MASK | XSTATE_PKRU_MASK)
+
 /* CPUID feature words */
 typedef enum FeatureWord {
     FEAT_1_EDX,         /* CPUID[1].EDX */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f257ffda259d..0751e6e102cc 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -498,6 +498,10 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
         ret |= 1U << KVM_HINTS_REALTIME;
     }
 
+    if (is_tdx_vm()) {
+        tdx_get_supported_cpuid(function, index, reg, &ret);
+    }
+
     return ret;
 }
 
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 803154efdb91..6e3b15ba8a4a 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -14,11 +14,22 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
+#include "standard-headers/asm-x86/kvm_para.h"
 #include "sysemu/kvm.h"
 
 #include "hw/i386/x86.h"
 #include "tdx.h"
 
+#define TDX_SUPPORTED_KVM_FEATURES  ((1ULL << KVM_FEATURE_NOP_IO_DELAY) | \
+                                     (1ULL << KVM_FEATURE_STEAL_TIME) | \
+                                     (1ULL << KVM_FEATURE_PV_EOI) | \
+                                     (1ULL << KVM_FEATURE_PV_UNHALT) | \
+                                     (1ULL << KVM_FEATURE_PV_TLB_FLUSH) | \
+                                     (1ULL << KVM_FEATURE_PV_SEND_IPI) | \
+                                     (1ULL << KVM_FEATURE_POLL_CONTROL) | \
+                                     (1ULL << KVM_FEATURE_PV_SCHED_YIELD) | \
+                                     (1ULL << KVM_FEATURE_MSI_EXT_DEST_ID))
+
 static TdxGuest *tdx_guest;
 
 /* It's valid after kvm_confidential_guest_init()->kvm_tdx_init() */
@@ -121,6 +132,39 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
     return 0;
 }
 
+void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
+                             uint32_t *ret)
+{
+    switch (function) {
+    case 1:
+        if (reg == R_ECX) {
+            *ret &= ~CPUID_EXT_VMX;
+        }
+        break;
+    case 0xd:
+        if (index == 0) {
+            if (reg == R_EAX) {
+                *ret &= (uint32_t)tdx_caps->xfam_fixed0 & XCR0_MASK;
+                *ret |= (uint32_t)tdx_caps->xfam_fixed1 & XCR0_MASK;
+            } else if (reg == R_EDX) {
+                *ret &= (tdx_caps->xfam_fixed0 & XCR0_MASK) >> 32;
+                *ret |= (tdx_caps->xfam_fixed1 & XCR0_MASK) >> 32;
+            }
+        } else if (index == 1) {
+            /* TODO: Adjust XSS when it's supported. */
+        }
+        break;
+    case KVM_CPUID_FEATURES:
+        if (reg == R_EAX) {
+            *ret &= TDX_SUPPORTED_KVM_FEATURES;
+        }
+        break;
+    default:
+        /* TODO: Use tdx_caps to adjust CPUID leafs. */
+        break;
+    }
+}
+
 /* tdx guest */
 OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
                                    tdx_guest,
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 4036ca2f3f99..06599b65b827 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -27,5 +27,7 @@ bool is_tdx_vm(void);
 #endif /* CONFIG_TDX */
 
 int tdx_kvm_init(MachineState *ms, Error **errp);
+void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
+                             uint32_t *ret);
 
 #endif /* QEMU_I386_TDX_H */
-- 
2.27.0



* [RFC PATCH v4 09/36] KVM: Introduce kvm_arch_pre_create_vcpu()
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (7 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 08/36] i386/tdx: Adjust get_supported_cpuid() for TDX VM Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-12 17:50   ` Isaku Yamahata
  2022-05-12  3:17 ` [RFC PATCH v4 10/36] i386/kvm: Move architectural CPUID leaf generation to separate helper Xiaoyao Li
                   ` (26 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Introduce kvm_arch_pre_create_vcpu() to perform arch-dependent work
before any vcpu is created. i386 TDX needs this because KVM_TDX_INIT_VM
must be called before creating any vcpu.
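
The weak-default hook can be sketched on its own. This assumes a GCC or
Clang toolchain (the weak attribute is a compiler extension), and
arch_pre_create_vcpu() / init_vcpu() are simplified hypothetical names
rather than the QEMU functions.

```c
/* Generic code: weak default, used when no architecture overrides it. */
int __attribute__((weak)) arch_pre_create_vcpu(int cpu_index)
{
    (void)cpu_index;
    return 0;
}

/* Generic vCPU setup: run the arch hook first, bail out on failure. */
int init_vcpu(int cpu_index)
{
    int ret = arch_pre_create_vcpu(cpu_index);

    if (ret < 0) {
        return ret;   /* propagate the error; do not create the vCPU */
    }
    /* ... the vCPU itself would be created here ... */
    return 0;
}
```

An architecture that needs the hook provides a normal (strong) definition
of the same symbol in its own object file, and the linker picks that one
over the weak default.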

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c  | 12 ++++++++++++
 include/sysemu/kvm.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 32e177bd26b4..e6fa9d23207a 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -457,6 +457,11 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
     return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
 }
 
+int __attribute__ ((weak)) kvm_arch_pre_create_vcpu(CPUState *cpu)
+{
+    return 0;
+}
+
 int kvm_init_vcpu(CPUState *cpu, Error **errp)
 {
     KVMState *s = kvm_state;
@@ -465,6 +470,13 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
     trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
+    ret = kvm_arch_pre_create_vcpu(cpu);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "kvm_init_vcpu: kvm_arch_pre_create_vcpu() failed");
+        goto err;
+    }
+
     ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
     if (ret < 0) {
         error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a783c7886811..0e94031ab7c7 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -373,6 +373,7 @@ int kvm_arch_put_registers(CPUState *cpu, int level);
 
 int kvm_arch_init(MachineState *ms, KVMState *s);
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu);
 int kvm_arch_init_vcpu(CPUState *cpu);
 int kvm_arch_destroy_vcpu(CPUState *cpu);
 
-- 
2.27.0



* [RFC PATCH v4 10/36] i386/kvm: Move architectural CPUID leaf generation to separate helper
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (8 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 09/36] KVM: Introduce kvm_arch_pre_create_vcpu() Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-12 17:48   ` Isaku Yamahata
  2022-05-12  3:17 ` [RFC PATCH v4 11/36] i386/tdx: Initialize TDX before creating TD vcpus Xiaoyao Li
                   ` (25 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

From: Sean Christopherson <sean.j.christopherson@intel.com>

Move the architectural (for lack of a better term) CPUID leaf generation
to a separate helper so that the generation code can be reused by TDX,
which needs to generate a canonical VM-scoped configuration.
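
The shape of the refactored helper, which appends into a caller-provided
array and returns the new entry count, can be sketched as below. struct
entry, MAX_ENTRIES and fill_arch_leaves() are hypothetical simplifications;
unlike the real code, which aborts on overflow, this sketch simply stops
when the array is full.

```c
#include <stdint.h>

#define MAX_ENTRIES 8

/* Simplified stand-in for struct kvm_cpuid_entry2. */
struct entry { uint32_t function; uint32_t eax; };

/*
 * Append leaves 0..limit starting at entries[idx] and return the new
 * number of valid entries. Entries before idx are left untouched.
 */
static uint32_t fill_arch_leaves(struct entry *entries, uint32_t idx,
                                 uint32_t limit)
{
    for (uint32_t i = 0; i <= limit && idx < MAX_ENTRIES; i++) {
        entries[idx].function = i;
        entries[idx].eax = 0;
        idx++;
    }
    return idx;
}
```

A per-vCPU caller that has already written its own leading entries passes
their count as idx, while a VM-scoped caller passes 0 and gets only the
architectural leaves.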

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c      | 222 +++++++++++++++++++------------------
 target/i386/kvm/kvm_i386.h |   4 +
 2 files changed, 119 insertions(+), 107 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 0751e6e102cc..5be151e6499b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1686,8 +1686,6 @@ static int hyperv_init_vcpu(X86CPU *cpu)
 
 static Error *invtsc_mig_blocker;
 
-#define KVM_MAX_CPUID_ENTRIES  100
-
 static void kvm_init_xsave(CPUX86State *env)
 {
     if (has_xsave2) {
@@ -1708,115 +1706,21 @@ static void kvm_init_xsave(CPUX86State *env)
            env->xsave_buf_len);
 }
 
-int kvm_arch_init_vcpu(CPUState *cs)
+uint32_t kvm_x86_arch_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
+                            uint32_t cpuid_i)
 {
-    struct {
-        struct kvm_cpuid2 cpuid;
-        struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
-    } cpuid_data;
-    /*
-     * The kernel defines these structs with padding fields so there
-     * should be no extra padding in our cpuid_data struct.
-     */
-    QEMU_BUILD_BUG_ON(sizeof(cpuid_data) !=
-                      sizeof(struct kvm_cpuid2) +
-                      sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
-
-    X86CPU *cpu = X86_CPU(cs);
-    CPUX86State *env = &cpu->env;
-    uint32_t limit, i, j, cpuid_i;
+    uint32_t limit, i, j;
     uint32_t unused;
     struct kvm_cpuid_entry2 *c;
-    uint32_t signature[3];
-    int kvm_base = KVM_CPUID_SIGNATURE;
-    int max_nested_state_len;
-    int r;
-    Error *local_err = NULL;
-
-    memset(&cpuid_data, 0, sizeof(cpuid_data));
-
-    cpuid_i = 0;
-
-    has_xsave2 = kvm_check_extension(cs->kvm_state, KVM_CAP_XSAVE2);
-
-    r = kvm_arch_set_tsc_khz(cs);
-    if (r < 0) {
-        return r;
-    }
-
-    /* vcpu's TSC frequency is either specified by user, or following
-     * the value used by KVM if the former is not present. In the
-     * latter case, we query it from KVM and record in env->tsc_khz,
-     * so that vcpu's TSC frequency can be migrated later via this field.
-     */
-    if (!env->tsc_khz) {
-        r = kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ?
-            kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) :
-            -ENOTSUP;
-        if (r > 0) {
-            env->tsc_khz = r;
-        }
-    }
-
-    env->apic_bus_freq = KVM_APIC_BUS_FREQUENCY;
-
-    /*
-     * kvm_hyperv_expand_features() is called here for the second time in case
-     * KVM_CAP_SYS_HYPERV_CPUID is not supported. While we can't possibly handle
-     * 'query-cpu-model-expansion' in this case as we don't have a KVM vCPU to
-     * check which Hyper-V enlightenments are supported and which are not, we
-     * can still proceed and check/expand Hyper-V enlightenments here so legacy
-     * behavior is preserved.
-     */
-    if (!kvm_hyperv_expand_features(cpu, &local_err)) {
-        error_report_err(local_err);
-        return -ENOSYS;
-    }
-
-    if (hyperv_enabled(cpu)) {
-        r = hyperv_init_vcpu(cpu);
-        if (r) {
-            return r;
-        }
-
-        cpuid_i = hyperv_fill_cpuids(cs, cpuid_data.entries);
-        kvm_base = KVM_CPUID_SIGNATURE_NEXT;
-        has_msr_hv_hypercall = true;
-    }
-
-    if (cpu->expose_kvm) {
-        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
-        c = &cpuid_data.entries[cpuid_i++];
-        c->function = KVM_CPUID_SIGNATURE | kvm_base;
-        c->eax = KVM_CPUID_FEATURES | kvm_base;
-        c->ebx = signature[0];
-        c->ecx = signature[1];
-        c->edx = signature[2];
-
-        c = &cpuid_data.entries[cpuid_i++];
-        c->function = KVM_CPUID_FEATURES | kvm_base;
-        c->eax = env->features[FEAT_KVM];
-        c->edx = env->features[FEAT_KVM_HINTS];
-    }
 
     cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
 
-    if (cpu->kvm_pv_enforce_cpuid) {
-        r = kvm_vcpu_enable_cap(cs, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, 0, 1);
-        if (r < 0) {
-            fprintf(stderr,
-                    "failed to enable KVM_CAP_ENFORCE_PV_FEATURE_CPUID: %s",
-                    strerror(-r));
-            abort();
-        }
-    }
-
     for (i = 0; i <= limit; i++) {
         if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
             fprintf(stderr, "unsupported level value: 0x%x\n", limit);
             abort();
         }
-        c = &cpuid_data.entries[cpuid_i++];
+        c = &entries[cpuid_i++];
 
         switch (i) {
         case 2: {
@@ -1835,7 +1739,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                             "cpuid(eax:2):eax & 0xf = 0x%x\n", times);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
                 c->function = i;
                 c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC;
                 cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
@@ -1881,7 +1785,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                             "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
             }
             break;
         case 0x7:
@@ -1901,7 +1805,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                                 "cpuid(eax:0x12,ecx:0x%x)\n", j);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
             }
             break;
         case 0x14:
@@ -1921,7 +1825,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                                 "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
                 c->function = i;
                 c->index = j;
                 c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
@@ -1978,7 +1882,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
             fprintf(stderr, "unsupported xlevel value: 0x%x\n", limit);
             abort();
         }
-        c = &cpuid_data.entries[cpuid_i++];
+        c = &entries[cpuid_i++];
 
         switch (i) {
         case 0x8000001d:
@@ -1997,7 +1901,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                             "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
             }
             break;
         default:
@@ -2024,7 +1928,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                 fprintf(stderr, "unsupported xlevel2 value: 0x%x\n", limit);
                 abort();
             }
-            c = &cpuid_data.entries[cpuid_i++];
+            c = &entries[cpuid_i++];
 
             c->function = i;
             c->flags = 0;
@@ -2032,6 +1936,110 @@ int kvm_arch_init_vcpu(CPUState *cs)
         }
     }
 
+    return cpuid_i;
+}
+
+int kvm_arch_init_vcpu(CPUState *cs)
+{
+    struct {
+        struct kvm_cpuid2 cpuid;
+        struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
+    } cpuid_data;
+    /*
+     * The kernel defines these structs with padding fields so there
+     * should be no extra padding in our cpuid_data struct.
+     */
+    QEMU_BUILD_BUG_ON(sizeof(cpuid_data) !=
+                      sizeof(struct kvm_cpuid2) +
+                      sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
+
+    X86CPU *cpu = X86_CPU(cs);
+    CPUX86State *env = &cpu->env;
+    uint32_t cpuid_i;
+    struct kvm_cpuid_entry2 *c;
+    uint32_t signature[3];
+    int kvm_base = KVM_CPUID_SIGNATURE;
+    int max_nested_state_len;
+    int r;
+    Error *local_err = NULL;
+
+    memset(&cpuid_data, 0, sizeof(cpuid_data));
+
+    cpuid_i = 0;
+
+    has_xsave2 = kvm_check_extension(cs->kvm_state, KVM_CAP_XSAVE2);
+
+    r = kvm_arch_set_tsc_khz(cs);
+    if (r < 0) {
+        return r;
+    }
+
+    /* vcpu's TSC frequency is either specified by user, or following
+     * the value used by KVM if the former is not present. In the
+     * latter case, we query it from KVM and record in env->tsc_khz,
+     * so that vcpu's TSC frequency can be migrated later via this field.
+     */
+    if (!env->tsc_khz) {
+        r = kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ?
+            kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) :
+            -ENOTSUP;
+        if (r > 0) {
+            env->tsc_khz = r;
+        }
+    }
+
+    env->apic_bus_freq = KVM_APIC_BUS_FREQUENCY;
+
+    /*
+     * kvm_hyperv_expand_features() is called here for the second time in case
+     * KVM_CAP_SYS_HYPERV_CPUID is not supported. While we can't possibly handle
+     * 'query-cpu-model-expansion' in this case as we don't have a KVM vCPU to
+     * check which Hyper-V enlightenments are supported and which are not, we
+     * can still proceed and check/expand Hyper-V enlightenments here so legacy
+     * behavior is preserved.
+     */
+    if (!kvm_hyperv_expand_features(cpu, &local_err)) {
+        error_report_err(local_err);
+        return -ENOSYS;
+    }
+
+    if (hyperv_enabled(cpu)) {
+        r = hyperv_init_vcpu(cpu);
+        if (r) {
+            return r;
+        }
+
+        cpuid_i = hyperv_fill_cpuids(cs, cpuid_data.entries);
+        kvm_base = KVM_CPUID_SIGNATURE_NEXT;
+        has_msr_hv_hypercall = true;
+    }
+
+    if (cpu->expose_kvm) {
+        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
+        c = &cpuid_data.entries[cpuid_i++];
+        c->function = KVM_CPUID_SIGNATURE | kvm_base;
+        c->eax = KVM_CPUID_FEATURES | kvm_base;
+        c->ebx = signature[0];
+        c->ecx = signature[1];
+        c->edx = signature[2];
+
+        c = &cpuid_data.entries[cpuid_i++];
+        c->function = KVM_CPUID_FEATURES | kvm_base;
+        c->eax = env->features[FEAT_KVM];
+        c->edx = env->features[FEAT_KVM_HINTS];
+    }
+
+    if (cpu->kvm_pv_enforce_cpuid) {
+        r = kvm_vcpu_enable_cap(cs, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, 0, 1);
+        if (r < 0) {
+            fprintf(stderr,
+                    "failed to enable KVM_CAP_ENFORCE_PV_FEATURE_CPUID: %s",
+                    strerror(-r));
+            abort();
+        }
+    }
+
+    cpuid_i = kvm_x86_arch_cpuid(env, cpuid_data.entries, cpuid_i);
     cpuid_data.cpuid.nent = cpuid_i;
 
     if (((env->cpuid_version >> 8)&0xF) >= 6
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index b434feaa6b1d..5c7972f617e8 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -24,6 +24,10 @@
 #define kvm_ioapic_in_kernel() \
     (kvm_irqchip_in_kernel() && !kvm_irqchip_is_split())
 
+#define KVM_MAX_CPUID_ENTRIES  100
+uint32_t kvm_x86_arch_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
+                            uint32_t cpuid_i);
+
 #else
 
 #define kvm_pit_in_kernel()      0
-- 
2.27.0



* [RFC PATCH v4 11/36] i386/tdx: Initialize TDX before creating TD vcpus
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (9 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 10/36] i386/kvm: Move architectural CPUID leaf generation to separate helper Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-23  9:20   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 12/36] i386/tdx: Wire CPU features up with attributes of TD guest Xiaoyao Li
                   ` (24 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Invoke KVM_TDX_INIT_VM in kvm_arch_pre_create_vcpu(). KVM_TDX_INIT_VM
configures global TD state, e.g. the canonical CPUID config, and must
be executed prior to creating vCPUs.

Use kvm_x86_arch_cpuid() to set up the CPUID settings for the TDX VM and
tie x86cpu->enable_pmu to the TD's attributes.

Note, this doesn't address the fact that QEMU may change the CPUID
configuration when creating vCPUs, i.e. punts on refactoring QEMU to
provide a stable CPUID config prior to kvm_arch_init().
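
Since the hook runs once per vCPU but the VM-scope step must happen only
once, the patch guards it with a lock and a flag. A minimal sketch of that
pattern, using POSIX threads instead of QemuMutex and hypothetical names
(struct vm_ctx, vm_scope_init(), pre_create_vcpu()):

```c
#include <pthread.h>
#include <stdbool.h>

struct vm_ctx {
    pthread_mutex_t lock;
    bool initialized;
    int init_calls;     /* counts how often the VM-scope step ran */
};

static void vm_ctx_init(struct vm_ctx *vm)
{
    pthread_mutex_init(&vm->lock, NULL);
    vm->initialized = false;
    vm->init_calls = 0;
}

/* Hypothetical stand-in for the VM-scope initialization call. */
static int vm_scope_init(struct vm_ctx *vm)
{
    vm->init_calls++;
    return 0;
}

/* Called once per vCPU; only the first successful call does the work. */
static int pre_create_vcpu(struct vm_ctx *vm)
{
    int r = 0;

    pthread_mutex_lock(&vm->lock);
    if (!vm->initialized) {
        r = vm_scope_init(vm);
        if (r == 0) {
            vm->initialized = true;   /* mark done only on success */
        }
    }
    pthread_mutex_unlock(&vm->lock);
    return r;
}
```

Leaving the flag unset on failure means a later vCPU would retry the
VM-scope step, which matches how the patch only sets it after the ioctl
succeeds.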

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c        |  9 ++++++++-
 target/i386/kvm/kvm.c      |  8 ++++++++
 target/i386/kvm/tdx-stub.c |  5 +++++
 target/i386/kvm/tdx.c      | 35 +++++++++++++++++++++++++++++++++++
 target/i386/kvm/tdx.h      |  4 ++++
 5 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index e6fa9d23207a..88468878d181 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -470,10 +470,17 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
     trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
+    /*
+     * tdx_pre_create_vcpu() may call cpu_x86_cpuid(). It in turn may call
+     * kvm_vm_ioctl(). Set cpu->kvm_state in advance to avoid NULL pointer
+     * dereference.
+     */
+    cpu->kvm_state = s;
     ret = kvm_arch_pre_create_vcpu(cpu);
     if (ret < 0) {
         error_setg_errno(errp, -ret,
                          "kvm_init_vcpu: kvm_arch_pre_create_vcpu() failed");
+        cpu->kvm_state = NULL;
         goto err;
     }
 
@@ -481,11 +488,11 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
     if (ret < 0) {
         error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
                          kvm_arch_vcpu_id(cpu));
+        cpu->kvm_state = NULL;
         goto err;
     }
 
     cpu->kvm_fd = ret;
-    cpu->kvm_state = s;
     cpu->vcpu_dirty = true;
     cpu->dirty_pages = 0;
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5be151e6499b..f2d7c3cf59ac 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2175,6 +2175,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
     return r;
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu)
+{
+    if (is_tdx_vm())
+        return tdx_pre_create_vcpu(cpu);
+
+    return 0;
+}
+
 int kvm_arch_destroy_vcpu(CPUState *cs)
 {
     X86CPU *cpu = X86_CPU(cs);
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
index 1df24735201e..2871de9d7b56 100644
--- a/target/i386/kvm/tdx-stub.c
+++ b/target/i386/kvm/tdx-stub.c
@@ -7,3 +7,8 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
 {
     return -EINVAL;
 }
+
+int tdx_pre_create_vcpu(CPUState *cpu)
+{
+    return -EINVAL;
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 6e3b15ba8a4a..3472b59c2dbb 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -18,6 +18,7 @@
 #include "sysemu/kvm.h"
 
 #include "hw/i386/x86.h"
+#include "kvm_i386.h"
 #include "tdx.h"
 
 #define TDX_SUPPORTED_KVM_FEATURES  ((1ULL << KVM_FEATURE_NOP_IO_DELAY) | \
@@ -165,6 +166,38 @@ void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
     }
 }
 
+int tdx_pre_create_vcpu(CPUState *cpu)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    struct kvm_tdx_init_vm init_vm;
+    int r = 0;
+
+    qemu_mutex_lock(&tdx_guest->lock);
+    if (tdx_guest->initialized) {
+        goto out;
+    }
+
+    memset(&init_vm, 0, sizeof(init_vm));
+    init_vm.cpuid.nent = kvm_x86_arch_cpuid(env, init_vm.entries, 0);
+
+    init_vm.attributes = tdx_guest->attributes;
+    init_vm.max_vcpus = ms->smp.cpus;
+
+    r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);
+    if (r < 0) {
+        error_report("KVM_TDX_INIT_VM failed %s", strerror(-r));
+        goto out;
+    }
+
+    tdx_guest->initialized = true;
+
+out:
+    qemu_mutex_unlock(&tdx_guest->lock);
+    return r;
+}
+
 /* tdx guest */
 OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
                                    tdx_guest,
@@ -177,6 +210,8 @@ static void tdx_guest_init(Object *obj)
 {
     TdxGuest *tdx = TDX_GUEST(obj);
 
+    qemu_mutex_init(&tdx->lock);
+
     tdx->attributes = 0;
 }
 
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 06599b65b827..46a24ee8c7cc 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -17,6 +17,9 @@ typedef struct TdxGuestClass {
 typedef struct TdxGuest {
     ConfidentialGuestSupport parent_obj;
 
+    QemuMutex lock;
+
+    bool initialized;
     uint64_t attributes;    /* TD attributes */
 } TdxGuest;
 
@@ -29,5 +32,6 @@ bool is_tdx_vm(void);
 int tdx_kvm_init(MachineState *ms, Error **errp);
 void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
                              uint32_t *ret);
+int tdx_pre_create_vcpu(CPUState *cpu);
 
 #endif /* QEMU_I386_TDX_H */
-- 
2.27.0


* [RFC PATCH v4 12/36] i386/tdx: Wire CPU features up with attributes of TD guest
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (10 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 11/36] i386/tdx: Initialize TDX before creating TD vcpus Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-12  3:17 ` [RFC PATCH v4 13/36] i386/tdx: Validate TD attributes Xiaoyao Li
                   ` (23 subsequent siblings)
  35 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

For QEMU VMs, PKS is configured via CPUID_7_0_ECX_PKS and PMU is
configured by x86cpu->enable_pmu. Reuse the existing configuration
interface for TDX VMs.
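
As a rough illustration of the mapping described above, the standalone
sketch below derives the two TD attribute bits from a CPUID leaf 7 ECX
value and a PMU flag. The TD attribute bit positions are the ones this
patch defines. The helper name td_attributes_from_features() and its
parameters are hypothetical, and the PKS CPUID bit (ECX bit 31) is assumed
to match QEMU's CPUID_7_0_ECX_PKS definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match QEMU's definition: CPUID.(EAX=7,ECX=0):ECX bit 31. */
#define CPUID_7_0_ECX_PKS          (1U << 31)

/* TD attribute bits as defined by this patch. */
#define TDX_TD_ATTRIBUTES_PKS      (1ULL << 30)
#define TDX_TD_ATTRIBUTES_PERFMON  (1ULL << 63)

/*
 * Hypothetical helper: translate the existing CPU feature configuration
 * into the TD attribute bits that are passed to KVM_TDX_INIT_VM.
 */
static uint64_t td_attributes_from_features(uint32_t cpuid_7_0_ecx,
                                            bool enable_pmu)
{
    uint64_t attributes = 0;

    if (cpuid_7_0_ecx & CPUID_7_0_ECX_PKS) {
        attributes |= TDX_TD_ATTRIBUTES_PKS;
    }
    if (enable_pmu) {
        attributes |= TDX_TD_ATTRIBUTES_PERFMON;
    }

    return attributes;
}
```

With PKS enabled and the PMU disabled, only bit 30 is set; enabling the
PMU additionally sets bit 63.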

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 3472b59c2dbb..e9c6e6fb396c 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -31,6 +31,9 @@
                                      (1ULL << KVM_FEATURE_PV_SCHED_YIELD) | \
                                      (1ULL << KVM_FEATURE_MSI_EXT_DEST_ID))
 
+#define TDX_TD_ATTRIBUTES_PKS               BIT_ULL(30)
+#define TDX_TD_ATTRIBUTES_PERFMON           BIT_ULL(63)
+
 static TdxGuest *tdx_guest;
 
 /* It's valid after kvm_confidential_guest_init()->kvm_tdx_init() */
@@ -166,6 +169,15 @@ void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
     }
 }
 
+static void setup_td_guest_attributes(X86CPU *x86cpu)
+{
+    CPUX86State *env = &x86cpu->env;
+
+    tdx_guest->attributes |= (env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_PKS) ?
+                             TDX_TD_ATTRIBUTES_PKS : 0;
+    tdx_guest->attributes |= x86cpu->enable_pmu ? TDX_TD_ATTRIBUTES_PERFMON : 0;
+}
+
 int tdx_pre_create_vcpu(CPUState *cpu)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
@@ -179,6 +191,8 @@ int tdx_pre_create_vcpu(CPUState *cpu)
         goto out;
     }
 
+    setup_td_guest_attributes(x86cpu);
+
     memset(&init_vm, 0, sizeof(init_vm));
     init_vm.cpuid.nent = kvm_x86_arch_cpuid(env, init_vm.entries, 0);
 
-- 
2.27.0


* [RFC PATCH v4 13/36] i386/tdx: Validate TD attributes
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (11 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 12/36] i386/tdx: Wire CPU features up with attributes of TD guest Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-23  9:39   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 14/36] i386/tdx: Implement user specified tsc frequency Xiaoyao Li
                   ` (22 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Validate the TD attributes against tdx_caps: fixed-0 bits must be zero
and fixed-1 bits must be set.

Also reject attribute bits that QEMU does not support yet, e.g. the debug
bit. It will be allowed once debug TD support lands in QEMU.
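
The fixed-0/fixed-1 rule can be sketched in isolation as follows. This is
a minimal standalone version of the same mask test the patch uses, not the
QEMU code itself. The function name td_attributes_valid() is hypothetical,
and the reading that a clear bit in fixed0 means "must be zero" is inferred
from the check in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the check in tdx_validate_attributes():
 *  - a bit that is clear in fixed0 must be clear in attrs;
 *  - a bit that is set in fixed1 must be set in attrs.
 * Masking with fixed0 drops forbidden bits and OR-ing with fixed1 adds the
 * mandatory ones, so attrs is valid only if both steps leave it unchanged.
 */
static bool td_attributes_valid(uint64_t attrs, uint64_t fixed0,
                                uint64_t fixed1)
{
    return ((attrs & fixed0) | fixed1) == attrs;
}
```

For example, with made-up masks fixed0 = 0x0f and fixed1 = 0x01, the value
0x03 passes, 0x02 fails because the mandatory bit 0 is missing, and 0x11
fails because bit 4 is not allowed.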

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index e9c6e6fb396c..9f2cdf640b5c 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -31,6 +31,7 @@
                                      (1ULL << KVM_FEATURE_PV_SCHED_YIELD) | \
                                      (1ULL << KVM_FEATURE_MSI_EXT_DEST_ID))
 
+#define TDX_TD_ATTRIBUTES_DEBUG             BIT_ULL(0)
 #define TDX_TD_ATTRIBUTES_PKS               BIT_ULL(30)
 #define TDX_TD_ATTRIBUTES_PERFMON           BIT_ULL(63)
 
@@ -169,13 +170,32 @@ void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
     }
 }
 
-static void setup_td_guest_attributes(X86CPU *x86cpu)
+static int tdx_validate_attributes(TdxGuest *tdx)
+{
+    if (((tdx->attributes & tdx_caps->attrs_fixed0) | tdx_caps->attrs_fixed1) !=
+        tdx->attributes) {
+        error_report("Invalid attributes 0x%lx for TDX VM (fixed0 0x%llx, fixed1 0x%llx)",
+                     tdx->attributes, tdx_caps->attrs_fixed0, tdx_caps->attrs_fixed1);
+        return -EINVAL;
+    }
+
+    if (tdx->attributes & TDX_TD_ATTRIBUTES_DEBUG) {
+        error_report("Current QEMU doesn't support attributes.debug[bit 0] for TDX VM");
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static int setup_td_guest_attributes(X86CPU *x86cpu)
 {
     CPUX86State *env = &x86cpu->env;
 
     tdx_guest->attributes |= (env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_PKS) ?
                              TDX_TD_ATTRIBUTES_PKS : 0;
     tdx_guest->attributes |= x86cpu->enable_pmu ? TDX_TD_ATTRIBUTES_PERFMON : 0;
+
+    return tdx_validate_attributes(tdx_guest);
 }
 
 int tdx_pre_create_vcpu(CPUState *cpu)
@@ -191,7 +211,10 @@ int tdx_pre_create_vcpu(CPUState *cpu)
         goto out;
     }
 
-    setup_td_guest_attributes(x86cpu);
+    r = setup_td_guest_attributes(x86cpu);
+    if (r) {
+        goto out;
+    }
 
     memset(&init_vm, 0, sizeof(init_vm));
     init_vm.cpuid.nent = kvm_x86_arch_cpuid(env, init_vm.entries, 0);
-- 
2.27.0


* [RFC PATCH v4 14/36] i386/tdx: Implement user specified tsc frequency
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (12 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 13/36] i386/tdx: Validate TD attributes Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-12 18:04   ` Isaku Yamahata
  2022-05-23  9:43   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 15/36] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM Xiaoyao Li
                   ` (21 subsequent siblings)
  35 siblings, 2 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Reuse the "tsc-frequency=" option of -cpu to get the TSC frequency the
user wants and pass it to KVM_TDX_INIT_VM.

Also sanity check that the TSC frequency is within the legal range and
has the legal granularity (both required by the TDX module).
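
The two checks can be sketched as a single standalone predicate. The
limits (100 MHz to 10 GHz, in steps of 25 MHz) are the ones this patch
uses. The function name tdx_tsc_khz_valid() and the granularity macro are
hypothetical, and treating 0 as "not specified by the user" follows the
behaviour of the code in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TDX_MIN_TSC_FREQUENCY_KHZ   (100 * 1000)
#define TDX_MAX_TSC_FREQUENCY_KHZ   (10 * 1000 * 1000)
/* Hypothetical name for the 25 MHz step used by the patch. */
#define TDX_TSC_GRANULARITY_KHZ     (25 * 1000)

/* Hypothetical helper: true if tsc_khz may be passed to KVM_TDX_INIT_VM. */
static bool tdx_tsc_khz_valid(int64_t tsc_khz)
{
    if (tsc_khz == 0) {
        /* No frequency requested by the user. */
        return true;
    }
    if (tsc_khz < TDX_MIN_TSC_FREQUENCY_KHZ ||
        tsc_khz > TDX_MAX_TSC_FREQUENCY_KHZ) {
        return false;
    }
    /* The frequency must be a multiple of 25 MHz. */
    return tsc_khz % TDX_TSC_GRANULARITY_KHZ == 0;
}
```

Under these rules 2500000 kHz (2.5 GHz) is accepted, while 2510000 kHz is
rejected because it is not a multiple of 25 MHz.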

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c |  8 ++++++++
 target/i386/kvm/tdx.c | 18 ++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f2d7c3cf59ac..c51125ab200f 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -818,6 +818,14 @@ static int kvm_arch_set_tsc_khz(CPUState *cs)
     int r, cur_freq;
     bool set_ioctl = false;
 
+    /*
+     * TD guest's TSC is immutable, it cannot be set/changed via
+     * KVM_SET_TSC_KHZ, but only be initialized via KVM_TDX_INIT_VM
+     */
+    if (is_tdx_vm()) {
+        return 0;
+    }
+
     if (!env->tsc_khz) {
         return 0;
     }
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 9f2cdf640b5c..622efc409438 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -35,6 +35,9 @@
 #define TDX_TD_ATTRIBUTES_PKS               BIT_ULL(30)
 #define TDX_TD_ATTRIBUTES_PERFMON           BIT_ULL(63)
 
+#define TDX_MIN_TSC_FREQUENCY_KHZ   (100 * 1000)
+#define TDX_MAX_TSC_FREQUENCY_KHZ   (10 * 1000 * 1000)
+
 static TdxGuest *tdx_guest;
 
 /* It's valid after kvm_confidential_guest_init()->kvm_tdx_init() */
@@ -211,6 +214,20 @@ int tdx_pre_create_vcpu(CPUState *cpu)
         goto out;
     }
 
+    r = -EINVAL;
+    if (env->tsc_khz && (env->tsc_khz < TDX_MIN_TSC_FREQUENCY_KHZ ||
+                         env->tsc_khz > TDX_MAX_TSC_FREQUENCY_KHZ)) {
+        error_report("Invalid TSC %ld kHz, must specify tsc-frequency between [%d, %d] kHz",
+                     env->tsc_khz, TDX_MIN_TSC_FREQUENCY_KHZ,
+                     TDX_MAX_TSC_FREQUENCY_KHZ);
+        goto out;
+    }
+
+    if (env->tsc_khz % (25 * 1000)) {
+        error_report("Invalid TSC %ld kHz, it must be a multiple of 25MHz", env->tsc_khz);
+        goto out;
+    }
+
     r = setup_td_guest_attributes(x86cpu);
     if (r) {
         goto out;
@@ -221,6 +238,7 @@ int tdx_pre_create_vcpu(CPUState *cpu)
 
     init_vm.attributes = tdx_guest->attributes;
     init_vm.max_vcpus = ms->smp.cpus;
+    init_vm.tsc_khz = env->tsc_khz;
 
     r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);
     if (r < 0) {
-- 
2.27.0


* [RFC PATCH v4 15/36] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (13 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 14/36] i386/tdx: Implement user specified tsc frequency Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-23  9:45   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 16/36] i386/tdvf: Introduce function to parse TDVF metadata Xiaoyao Li
                   ` (20 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

TDX supports read-only memory only for shared memory, not for private
memory.

QEMU has no way to know whether a memslot is used as shared or private
memory. Thus, for simplicity, just set kvm_readonly_mem_allowed to false
for TDX VMs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 622efc409438..23bc3c32b14a 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -135,6 +135,15 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
         get_tdx_capabilities();
     }
 
+    /*
+     * Set kvm_readonly_mem_allowed to false, because TDX only supports readonly
+     * memory for shared memory but not for private memory. Besides, whether a
+     * memslot is private or shared is not determined by QEMU.
+     *
+     * Thus, just mark readonly memory not supported for simplicity.
+     */
+    kvm_readonly_mem_allowed = false;
+
     tdx_guest = tdx;
 
     return 0;
-- 
2.27.0


* [RFC PATCH v4 16/36] i386/tdvf: Introduce function to parse TDVF metadata
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (14 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 15/36] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-24  7:02   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 17/36] i386/tdx: Parse TDVF metadata for TDX VM Xiaoyao Li
                   ` (19 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

From: Isaku Yamahata <isaku.yamahata@intel.com>

A TDX VM needs to boot with its specialized firmware, TDX Virtual
Firmware (TDVF). QEMU needs to parse TDVF and map it into TD guest
memory prior to running the TDX VM.

The TDVF metadata in a TDVF image describes the structure of the
firmware. QEMU refers to it to set up memory for TDVF. Introduce function
tdvf_parse_metadata() to parse the metadata from the TDVF image and store
the info of each TDVF section.

The TDX metadata is located by a TDX metadata offset block, which is a
GUID-ed structure. The data portion of the GUID structure contains
only a 4-byte field that is the offset of the TDX metadata from the end
of the firmware file.

Select X86_FW_OVMF when TDX is enabled to leverage the existing functions
that parse and search OVMF's GUID-ed structures.
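
The offset arithmetic described above can be sketched as a standalone
function. This is only an illustration: it assumes the 4-byte offset has
already been extracted from the GUID-ed block (the GUID table lookup is
omitted), the helper names read_le32() and tdvf_locate_metadata() are
hypothetical, and the 16-byte header size is derived from the four 32-bit
fields that precede the section entries in TdvfMetadata.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TDVF_SIGNATURE             0x46564454u /* "TDVF" as little endian */
/* Signature, Length, Version, NumberOfSectionEntries: 4 x 32 bits. */
#define TDVF_METADATA_HEADER_SIZE  16u

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: given the firmware image and the metadata offset
 * measured from the END of the image, return a pointer to the metadata
 * header, or NULL if the offset is out of range or the signature is wrong.
 */
static const uint8_t *tdvf_locate_metadata(const uint8_t *image, size_t size,
                                           uint32_t offset_from_end)
{
    size_t offset;

    assert(image != NULL);

    /* The header must lie entirely inside the image. */
    if (offset_from_end > size ||
        offset_from_end < TDVF_METADATA_HEADER_SIZE) {
        return NULL;
    }
    offset = size - offset_from_end;

    if (read_le32(image + offset) != TDVF_SIGNATURE) {
        return NULL;
    }
    return image + offset;
}
```

Unlike the code in this patch, the sketch also rejects an offset larger
than the image size before subtracting, so the computed position cannot
wrap around.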

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

---
Changes in v4:
 - rename TDX_METADATA_GUID to TDX_METADATA_OFFSET_GUID
---
 hw/i386/Kconfig        |   1 +
 hw/i386/meson.build    |   1 +
 hw/i386/tdvf.c         | 197 +++++++++++++++++++++++++++++++++++++++++
 include/hw/i386/tdvf.h |  51 +++++++++++
 4 files changed, 250 insertions(+)
 create mode 100644 hw/i386/tdvf.c
 create mode 100644 include/hw/i386/tdvf.h

diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index 9e40ff79fc2d..0c3e3a464012 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -12,6 +12,7 @@ config SGX
 
 config TDX
     bool
+    select X86_FW_OVMF
     depends on KVM
 
 config PC
diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 213e2e82b3d7..97f3b50503b0 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -28,6 +28,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
   'port92.c'))
 i386_ss.add(when: 'CONFIG_X86_FW_OVMF', if_true: files('pc_sysfw_ovmf.c'),
                                         if_false: files('pc_sysfw_ovmf-stubs.c'))
+i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c'))
 
 subdir('kvm')
 subdir('xen')
diff --git a/hw/i386/tdvf.c b/hw/i386/tdvf.c
new file mode 100644
index 000000000000..a691d92eee6e
--- /dev/null
+++ b/hw/i386/tdvf.c
@@ -0,0 +1,197 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ *                        <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/i386/pc.h"
+#include "hw/i386/tdvf.h"
+#include "sysemu/kvm.h"
+
+#define TDX_METADATA_OFFSET_GUID    "e47a6535-984a-4798-865e-4685a7bf8ec2"
+#define TDX_METADATA_VERSION        1
+#define TDVF_SIGNATURE              0x46564454 /* TDVF as little endian */
+
+typedef struct {
+    uint32_t DataOffset;
+    uint32_t RawDataSize;
+    uint64_t MemoryAddress;
+    uint64_t MemoryDataSize;
+    uint32_t Type;
+    uint32_t Attributes;
+} TdvfSectionEntry;
+
+typedef struct {
+    uint32_t Signature;
+    uint32_t Length;
+    uint32_t Version;
+    uint32_t NumberOfSectionEntries;
+    TdvfSectionEntry SectionEntries[];
+} TdvfMetadata;
+
+struct tdx_metadata_offset {
+    uint32_t offset;
+};
+
+static TdvfMetadata *tdvf_get_metadata(void *flash_ptr, int size)
+{
+    TdvfMetadata *metadata;
+    uint32_t offset = 0;
+    uint8_t *data;
+
+    if ((uint32_t) size != size) {
+        return NULL;
+    }
+
+    if (pc_system_ovmf_table_find(TDX_METADATA_OFFSET_GUID, &data, NULL)) {
+        offset = size - le32_to_cpu(((struct tdx_metadata_offset *)data)->offset);
+
+        if (offset + sizeof(*metadata) > size) {
+            return NULL;
+        }
+    } else {
+        error_report("Cannot find TDX_METADATA_OFFSET_GUID");
+        return NULL;
+    }
+
+    metadata = flash_ptr + offset;
+
+    /* Finally, verify the signature to determine if this is a TDVF image. */
+    metadata->Signature = le32_to_cpu(metadata->Signature);
+    if (metadata->Signature != TDVF_SIGNATURE) {
+        error_report("Invalid TDVF signature in metadata!");
+        return NULL;
+    }
+
+    /* Sanity check that the TDVF doesn't overlap its own metadata. */
+    metadata->Length = le32_to_cpu(metadata->Length);
+    if (offset + metadata->Length > size) {
+        return NULL;
+    }
+
+    /* Only version 1 is supported/defined. */
+    metadata->Version = le32_to_cpu(metadata->Version);
+    if (metadata->Version != TDX_METADATA_VERSION) {
+        return NULL;
+    }
+
+    return metadata;
+}
+
+static int tdvf_parse_section_entry(const TdvfSectionEntry *src,
+                                     TdxFirmwareEntry *entry)
+{
+    entry->data_offset = le32_to_cpu(src->DataOffset);
+    entry->data_len = le32_to_cpu(src->RawDataSize);
+    entry->address = le64_to_cpu(src->MemoryAddress);
+    entry->size = le64_to_cpu(src->MemoryDataSize);
+    entry->type = le32_to_cpu(src->Type);
+    entry->attributes = le32_to_cpu(src->Attributes);
+
+    /* sanity check */
+    if (entry->size < entry->data_len) {
+        error_report("Broken metadata RawDataSize 0x%x MemoryDataSize 0x%lx",
+                     entry->data_len, entry->size);
+        return -1;
+    }
+    if (!QEMU_IS_ALIGNED(entry->address, TARGET_PAGE_SIZE)) {
+        error_report("MemoryAddress 0x%lx not page aligned", entry->address);
+        return -1;
+    }
+    if (!QEMU_IS_ALIGNED(entry->size, TARGET_PAGE_SIZE)) {
+        error_report("MemoryDataSize 0x%lx not page aligned", entry->size);
+        return -1;
+    }
+
+    switch (entry->type) {
+    case TDVF_SECTION_TYPE_BFV:
+    case TDVF_SECTION_TYPE_CFV:
+        /* The sections that must be copied from firmware image to TD memory */
+        if (entry->data_len == 0) {
+            error_report("%d section with RawDataSize == 0", entry->type);
+            return -1;
+        }
+        break;
+    case TDVF_SECTION_TYPE_TD_HOB:
+    case TDVF_SECTION_TYPE_TEMP_MEM:
+        /* The sections that need not be copied from firmware image */
+        if (entry->data_len != 0) {
+            error_report("%d section with RawDataSize 0x%x != 0",
+                         entry->type, entry->data_len);
+            return -1;
+        }
+        break;
+    default:
+        error_report("TDVF contains unsupported section type %d", entry->type);
+        return -1;
+    }
+
+    return 0;
+}
+
+int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size)
+{
+    TdvfSectionEntry *sections;
+    TdvfMetadata *metadata;
+    ssize_t entries_size;
+    uint32_t len, i;
+
+    metadata = tdvf_get_metadata(flash_ptr, size);
+    if (!metadata) {
+        return -EINVAL;
+    }
+
+    /* load and parse metadata entries */
+    fw->nr_entries = le32_to_cpu(metadata->NumberOfSectionEntries);
+    if (fw->nr_entries < 2) {
+        error_report("Invalid number of fw entries (%u) in TDVF", fw->nr_entries);
+        return -EINVAL;
+    }
+
+    len = le32_to_cpu(metadata->Length);
+    entries_size = fw->nr_entries * sizeof(TdvfSectionEntry);
+    if (len != sizeof(*metadata) + entries_size) {
+        error_report("TDVF metadata len (0x%x) mismatch, expected (0x%x)",
+                     len, (uint32_t)(sizeof(*metadata) + entries_size));
+        return -EINVAL;
+    }
+
+    fw->entries = g_new(TdxFirmwareEntry, fw->nr_entries);
+    sections = g_new(TdvfSectionEntry, fw->nr_entries);
+
+    if (!memcpy(sections, (void *)metadata + sizeof(*metadata), entries_size))  {
+        error_report("Failed to read TDVF section entries");
+        goto err;
+    }
+
+    for (i = 0; i < fw->nr_entries; i++) {
+        if (tdvf_parse_section_entry(&sections[i], &fw->entries[i])) {
+            goto err;
+        }
+    }
+    g_free(sections);
+
+    return 0;
+
+err:
+    g_free(sections);
+    g_free(fw->entries);
+    fw->entries = NULL;
+    return -EINVAL;
+}
diff --git a/include/hw/i386/tdvf.h b/include/hw/i386/tdvf.h
new file mode 100644
index 000000000000..593341eb2e93
--- /dev/null
+++ b/include/hw/i386/tdvf.h
@@ -0,0 +1,51 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ *                        <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_I386_TDVF_H
+#define HW_I386_TDVF_H
+
+#include "qemu/osdep.h"
+
+#define TDVF_SECTION_TYPE_BFV               0
+#define TDVF_SECTION_TYPE_CFV               1
+#define TDVF_SECTION_TYPE_TD_HOB            2
+#define TDVF_SECTION_TYPE_TEMP_MEM          3
+
+#define TDVF_SECTION_ATTRIBUTES_MR_EXTEND   (1U << 0)
+#define TDVF_SECTION_ATTRIBUTES_PAGE_AUG    (1U << 1)
+
+typedef struct TdxFirmwareEntry {
+    uint32_t data_offset;
+    uint32_t data_len;
+    uint64_t address;
+    uint64_t size;
+    uint32_t type;
+    uint32_t attributes;
+} TdxFirmwareEntry;
+
+typedef struct TdxFirmware {
+    uint32_t nr_entries;
+    TdxFirmwareEntry *entries;
+} TdxFirmware;
+
+int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size);
+
+#endif /* HW_I386_TDVF_H */
-- 
2.27.0


* [RFC PATCH v4 17/36] i386/tdx: Parse TDVF metadata for TDX VM
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (15 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 16/36] i386/tdvf: Introduce function to parse TDVF metadata Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-24  7:03   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 18/36] i386/tdx: Skip BIOS shadowing setup Xiaoyao Li
                   ` (18 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

TDX cannot support the pflash device since it supports neither read-only
memslots nor emulation. Load TDVF (OVMF) with the -bios option for TDs.

When booting a TD, besides loading TDVF to the address below 4G, QEMU
also needs to parse the TDVF metadata.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/i386/pc_sysfw.c         | 7 +++++++
 hw/i386/x86.c              | 3 ++-
 target/i386/kvm/tdx-stub.c | 5 +++++
 target/i386/kvm/tdx.c      | 4 ++++
 target/i386/kvm/tdx.h      | 4 ++++
 5 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index c8d9e71b889b..cf63434ba89d 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -37,6 +37,7 @@
 #include "hw/block/flash.h"
 #include "sysemu/kvm.h"
 #include "sev.h"
+#include "kvm/tdx.h"
 
 #define FLASH_SECTOR_SIZE 4096
 
@@ -265,5 +266,11 @@ void x86_firmware_configure(void *ptr, int size)
         }
 
         sev_encrypt_flash(ptr, size, &error_fatal);
+    } else if (is_tdx_vm()) {
+        ret = tdx_parse_tdvf(ptr, size);
+        if (ret) {
+            error_report("failed to parse TDVF for TDX VM");
+            exit(1);
+        }
     }
 }
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 4d0b0047627d..fdf6af2f6add 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -47,6 +47,7 @@
 #include "hw/intc/i8259.h"
 #include "hw/rtc/mc146818rtc.h"
 #include "target/i386/sev.h"
+#include "kvm/tdx.h"
 
 #include "hw/acpi/cpu_hotplug.h"
 #include "hw/irq.h"
@@ -1115,7 +1116,7 @@ void x86_bios_rom_init(MachineState *ms, const char *default_firmware,
     }
     bios = g_malloc(sizeof(*bios));
     memory_region_init_ram(bios, NULL, "pc.bios", bios_size, &error_fatal);
-    if (sev_enabled()) {
+    if (sev_enabled() || is_tdx_vm()) {
         /*
          * The concept of a "reset" simply doesn't exist for
          * confidential computing guests, we have to destroy and
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
index 2871de9d7b56..395a59721266 100644
--- a/target/i386/kvm/tdx-stub.c
+++ b/target/i386/kvm/tdx-stub.c
@@ -12,3 +12,8 @@ int tdx_pre_create_vcpu(CPUState *cpu)
 {
     return -EINVAL;
 }
+
+int tdx_parse_tdvf(void *flash_ptr, int size)
+{
+    return -EINVAL;
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 23bc3c32b14a..2953d2728b32 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -261,6 +261,10 @@ out:
     qemu_mutex_unlock(&tdx_guest->lock);
     return r;
 }
+int tdx_parse_tdvf(void *flash_ptr, int size)
+{
+    return tdvf_parse_metadata(&tdx_guest->tdvf, flash_ptr, size);
+}
 
 /* tdx guest */
 OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 46a24ee8c7cc..12bcf25bb95b 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -6,6 +6,7 @@
 #endif
 
 #include "exec/confidential-guest-support.h"
+#include "hw/i386/tdvf.h"
 
 #define TYPE_TDX_GUEST "tdx-guest"
 #define TDX_GUEST(obj)  OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
@@ -21,6 +22,8 @@ typedef struct TdxGuest {
 
     bool initialized;
     uint64_t attributes;    /* TD attributes */
+
+    TdxFirmware tdvf;
 } TdxGuest;
 
 #ifdef CONFIG_TDX
@@ -33,5 +36,6 @@ int tdx_kvm_init(MachineState *ms, Error **errp);
 void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
                              uint32_t *ret);
 int tdx_pre_create_vcpu(CPUState *cpu);
+int tdx_parse_tdvf(void *flash_ptr, int size);
 
 #endif /* QEMU_I386_TDX_H */
-- 
2.27.0


* [RFC PATCH v4 18/36] i386/tdx: Skip BIOS shadowing setup
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (16 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 17/36] i386/tdx: Parse TDVF metadata for TDX VM Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-24  7:08   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 19/36] i386/tdx: Don't initialize pc.rom for TDX VMs Xiaoyao Li
                   ` (17 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

A TDX guest cannot enter real mode, so just skip the setup of isa-bios.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/i386/x86.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index fdf6af2f6add..17f2252296c5 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1138,17 +1138,19 @@ void x86_bios_rom_init(MachineState *ms, const char *default_firmware,
     }
     g_free(filename);
 
-    /* map the last 128KB of the BIOS in ISA space */
-    isa_bios_size = MIN(bios_size, 128 * KiB);
-    isa_bios = g_malloc(sizeof(*isa_bios));
-    memory_region_init_alias(isa_bios, NULL, "isa-bios", bios,
-                             bios_size - isa_bios_size, isa_bios_size);
-    memory_region_add_subregion_overlap(rom_memory,
-                                        0x100000 - isa_bios_size,
-                                        isa_bios,
-                                        1);
-    if (!isapc_ram_fw) {
-        memory_region_set_readonly(isa_bios, true);
+    if (!is_tdx_vm()) {
+        /* map the last 128KB of the BIOS in ISA space */
+        isa_bios_size = MIN(bios_size, 128 * KiB);
+        isa_bios = g_malloc(sizeof(*isa_bios));
+        memory_region_init_alias(isa_bios, NULL, "isa-bios", bios,
+                                bios_size - isa_bios_size, isa_bios_size);
+        memory_region_add_subregion_overlap(rom_memory,
+                                            0x100000 - isa_bios_size,
+                                            isa_bios,
+                                            1);
+        if (!isapc_ram_fw) {
+            memory_region_set_readonly(isa_bios, true);
+        }
     }
 
     /* map all the bios at the top of memory */
-- 
2.27.0


* [RFC PATCH v4 19/36] i386/tdx: Don't initialize pc.rom for TDX VMs
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (17 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 18/36] i386/tdx: Skip BIOS shadowing setup Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-12  3:17 ` [RFC PATCH v4 20/36] i386/tdx: Register a machine_init_done callback for TD Xiaoyao Li
                   ` (16 subsequent siblings)
  35 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

For TDX, the addresses below 1MB are entirely general RAM, so there is no
need to initialize the pc.rom memory region for TDs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/i386/pc.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5ef20e2071a7..c8d3f2fbf9fc 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -61,6 +61,7 @@
 #include "sysemu/reset.h"
 #include "sysemu/runstate.h"
 #include "kvm/kvm_i386.h"
+#include "kvm/tdx.h"
 #include "hw/xen/xen.h"
 #include "hw/xen/start_info.h"
 #include "ui/qemu-spice.h"
@@ -908,16 +909,18 @@ void pc_memory_init(PCMachineState *pcms,
     /* Initialize PC system firmware */
     pc_system_firmware_init(pcms, rom_memory);
 
-    option_rom_mr = g_malloc(sizeof(*option_rom_mr));
-    memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
-                           &error_fatal);
-    if (pcmc->pci_enabled) {
-        memory_region_set_readonly(option_rom_mr, true);
+    if (!is_tdx_vm()) {
+        option_rom_mr = g_malloc(sizeof(*option_rom_mr));
+        memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
+                               &error_fatal);
+        if (pcmc->pci_enabled) {
+            memory_region_set_readonly(option_rom_mr, true);
+        }
+        memory_region_add_subregion_overlap(rom_memory,
+                                            PC_ROM_MIN_VGA,
+                                            option_rom_mr,
+                                            1);
     }
-    memory_region_add_subregion_overlap(rom_memory,
-                                        PC_ROM_MIN_VGA,
-                                        option_rom_mr,
-                                        1);
 
     fw_cfg = fw_cfg_arch_create(machine,
                                 x86ms->boot_cpus, x86ms->apic_id_limit);
-- 
2.27.0



* [RFC PATCH v4 20/36] i386/tdx: Register a machine_init_done callback for TD
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (18 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 19/36] i386/tdx: Don't initialize pc.rom for TDX VMs Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-24  7:09   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 21/36] i386/tdx: Track mem_ptr for each firmware entry of TDVF Xiaoyao Li
                   ` (15 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Before a TD can run, QEMU needs to:
 - set up and configure the TD HOB list;
 - initialize TDVF in the TD's private memory;
 - initialize the TD vcpu state.

Register a machine_init_done callback to do all of this.
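
As a rough illustration of the deferred-callback pattern used here, the
sketch below registers a callback early and fires it once machine setup is
complete. It is a standalone toy, not QEMU code: DemoNotifier, demo_run()
and the other demo_* names are hypothetical stand-ins for Notifier,
qemu_add_machine_init_done_notifier() and tdx_finalize_vm().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's Notifier; not the real definition. */
typedef struct DemoNotifier DemoNotifier;
struct DemoNotifier {
    void (*notify)(DemoNotifier *n, void *data);
    DemoNotifier *next;
};

static DemoNotifier *demo_init_done_list;   /* registered callbacks */
static int demo_finalize_calls;             /* how often finalize ran */

/* Register a callback; it does not run until init is reported done. */
static void demo_add_init_done_notifier(DemoNotifier *n)
{
    n->next = demo_init_done_list;
    demo_init_done_list = n;
}

/* Called once the machine is fully built: fire every registered callback. */
static void demo_machine_init_done(void)
{
    for (DemoNotifier *n = demo_init_done_list; n; n = n->next) {
        n->notify(n, NULL);
    }
}

/* Plays the role of tdx_finalize_vm(): HOB, TDVF and vcpu setup go here. */
static void demo_tdx_finalize_vm(DemoNotifier *n, void *unused)
{
    (void)n;
    (void)unused;
    demo_finalize_calls++;
}

static DemoNotifier demo_tdx_notifier = { .notify = demo_tdx_finalize_vm };

/* Call once: register early, then signal that machine init is done. */
int demo_run(void)
{
    demo_add_init_done_notifier(&demo_tdx_notifier);
    assert(demo_finalize_calls == 0);   /* registration alone runs nothing */
    demo_machine_init_done();
    return demo_finalize_calls;
}
```

The point of the pattern is ordering: the finalize step runs only after
firmware loading and memory layout are settled, even though it is
registered much earlier, at KVM init time.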

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 2953d2728b32..a95d5b894c34 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -16,6 +16,7 @@
 #include "qom/object_interfaces.h"
 #include "standard-headers/asm-x86/kvm_para.h"
 #include "sysemu/kvm.h"
+#include "sysemu/sysemu.h"
 
 #include "hw/i386/x86.h"
 #include "kvm_i386.h"
@@ -126,6 +127,15 @@ static void get_tdx_capabilities(void)
     tdx_caps = caps;
 }
 
+static void tdx_finalize_vm(Notifier *notifier, void *unused)
+{
+    /* TODO */
+}
+
+static Notifier tdx_machine_done_notify = {
+    .notify = tdx_finalize_vm,
+};
+
 int tdx_kvm_init(MachineState *ms, Error **errp)
 {
     TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
@@ -144,6 +154,8 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
      */
     kvm_readonly_mem_allowed = false;
 
+    qemu_add_machine_init_done_notifier(&tdx_machine_done_notify);
+
     tdx_guest = tdx;
 
     return 0;
-- 
2.27.0



* [RFC PATCH v4 21/36] i386/tdx: Track mem_ptr for each firmware entry of TDVF
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (19 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 20/36] i386/tdx: Register a machine_init_done callback for TD Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-24  7:11   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 22/36] i386/tdx: Track RAM entries for TDX VM Xiaoyao Li
                   ` (14 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

For each TDVF section, QEMU needs to copy the content to guest
private memory via the KVM API (KVM_TDX_INIT_MEM_REGION).

Introduce a field @mem_ptr in TdxFirmwareEntry to track the memory
pointer of each TDVF section, so that QEMU can add/copy them to guest
private memory later.

TDVF sections can be classified into two groups:
 - The firmware itself, e.g., BFV and CFV, which is located separately
   from guest RAM. Its memory pointer is derived from the bios pointer
   (the flash pointer plus the section's data offset).

 - Sections located in guest RAM, e.g., TEMP_MEM and TD_HOB.
   mmap a new memory range for them.
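
The two groups above can be sketched in isolation as follows. This is a
simplified illustration, not the patch's code: struct demo_entry and
demo_assign_mem_ptr() are hypothetical names, and calloc() stands in for
the qemu_ram_mmap() call that the patch uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical section types mirroring BFV, CFV, TD_HOB and TEMP_MEM. */
enum demo_section_type { DEMO_BFV, DEMO_CFV, DEMO_TD_HOB, DEMO_TEMP_MEM };

struct demo_entry {
    uint32_t data_offset;           /* offset of the section in the image */
    uint64_t size;                  /* size of the section in memory */
    enum demo_section_type type;
    void *mem_ptr;                  /* host pointer backing the section */
};

/* Returns 0 on success, -1 on unknown type or allocation failure. */
int demo_assign_mem_ptr(struct demo_entry *e, uint8_t *flash_ptr)
{
    switch (e->type) {
    case DEMO_BFV:
    case DEMO_CFV:
        /* Firmware volumes live inside the already-loaded image. */
        e->mem_ptr = flash_ptr + e->data_offset;
        return 0;
    case DEMO_TD_HOB:
    case DEMO_TEMP_MEM:
        /* RAM-backed sections get fresh zeroed memory (mmap in QEMU). */
        e->mem_ptr = calloc(1, e->size);
        return e->mem_ptr ? 0 : -1;
    default:
        return -1;
    }
}
```

Unlike the patch, which exits on an unsupported section type, this sketch
returns an error code so that the caller decides how to fail.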

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/i386/tdvf.c         |  1 +
 include/hw/i386/tdvf.h |  7 +++++++
 target/i386/kvm/tdx.c  | 22 +++++++++++++++++++++-
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/hw/i386/tdvf.c b/hw/i386/tdvf.c
index a691d92eee6e..8776a2f39f01 100644
--- a/hw/i386/tdvf.c
+++ b/hw/i386/tdvf.c
@@ -187,6 +187,7 @@ int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size)
     }
     g_free(sections);
 
+    fw->mem_ptr = flash_ptr;
     return 0;
 
 err:
diff --git a/include/hw/i386/tdvf.h b/include/hw/i386/tdvf.h
index 593341eb2e93..d880af245a73 100644
--- a/include/hw/i386/tdvf.h
+++ b/include/hw/i386/tdvf.h
@@ -39,13 +39,20 @@ typedef struct TdxFirmwareEntry {
     uint64_t size;
     uint32_t type;
     uint32_t attributes;
+
+    void *mem_ptr;
 } TdxFirmwareEntry;
 
 typedef struct TdxFirmware {
+    void *mem_ptr;
+
     uint32_t nr_entries;
     TdxFirmwareEntry *entries;
 } TdxFirmware;
 
+#define for_each_tdx_fw_entry(fw, e)    \
+    for (e = (fw)->entries; e != (fw)->entries + (fw)->nr_entries; e++)
+
 int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size);
 
 #endif /* HW_I386_TDVF_H */
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index a95d5b894c34..8bac49419f37 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -12,6 +12,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/mmap-alloc.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
 #include "standard-headers/asm-x86/kvm_para.h"
@@ -19,6 +20,7 @@
 #include "sysemu/sysemu.h"
 
 #include "hw/i386/x86.h"
+#include "hw/i386/tdvf.h"
 #include "kvm_i386.h"
 #include "tdx.h"
 
@@ -129,7 +131,25 @@ static void get_tdx_capabilities(void)
 
 static void tdx_finalize_vm(Notifier *notifier, void *unused)
 {
-    /* TODO */
+    TdxFirmware *tdvf = &tdx_guest->tdvf;
+    TdxFirmwareEntry *entry;
+
+    for_each_tdx_fw_entry(tdvf, entry) {
+        switch (entry->type) {
+        case TDVF_SECTION_TYPE_BFV:
+        case TDVF_SECTION_TYPE_CFV:
+            entry->mem_ptr = tdvf->mem_ptr + entry->data_offset;
+            break;
+        case TDVF_SECTION_TYPE_TD_HOB:
+        case TDVF_SECTION_TYPE_TEMP_MEM:
+            entry->mem_ptr = qemu_ram_mmap(-1, entry->size,
+                                           qemu_real_host_page_size(), 0, 0);
+            break;
+        default:
+            error_report("Unsupported TDVF section %d", entry->type);
+            exit(1);
+        }
+    }
 }
 
 static Notifier tdx_machine_done_notify = {
-- 
2.27.0



* [RFC PATCH v4 22/36] i386/tdx: Track RAM entries for TDX VM
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (20 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 21/36] i386/tdx: Track mem_ptr for each firmware entry of TDVF Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-24  7:37   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 23/36] i386/tdx: Setup the TD HOB list Xiaoyao Li
                   ` (13 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

The RAM of a TDX VM can be classified into two types:

 - TDX_RAM_UNACCEPTED: the default type of TDX memory, which must be
   accepted by the TDX guest before it can be used and will be all zeros
   after being accepted.

 - TDX_RAM_ADDED: RAM that is ADD'ed to the TD guest before it runs and
   can be used directly, e.g., the TD HOB and TEMP MEM needed by TDVF.

Maintain TdxRamEntries[], which takes the initial RAM info from the e820
table and marks each RAM range as the default type TDX_RAM_UNACCEPTED.

Then change the TD HOB and TEMP MEM ranges to TDX_RAM_ADDED, since these
ranges will be ADD'ed before the TD runs and need not be accepted at
runtime.

TdxRamEntries[] is later used to set up the memory resource HOBs
that pass memory info from QEMU to TDVF.
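
The range splitting described above can be illustrated with a standalone
sketch. It is a simplified variant, not the patch's implementation: it
uses a fixed-size array instead of g_renew(), rewrites the matching entry
in place, and all demo_* names are hypothetical. As in the patch, the
resulting entries are unsorted and would need sorting by address later.

```c
#include <assert.h>
#include <stdint.h>

enum { DEMO_UNACCEPTED, DEMO_ADDED };
#define DEMO_MAX 16

struct demo_ram { uint64_t address, length; uint32_t type; };
struct demo_map { struct demo_ram e[DEMO_MAX]; unsigned n; };

/* Append one entry; returns -1 when the fixed-size table is full. */
int demo_push(struct demo_map *m, uint64_t a, uint64_t l, uint32_t t)
{
    if (m->n == DEMO_MAX) {
        return -1;
    }
    m->e[m->n++] = (struct demo_ram){ a, l, t };
    return 0;
}

/*
 * Mark [addr, addr + len) as ADDED. The range must lie entirely inside one
 * existing entry; any head or tail left over is appended as UNACCEPTED.
 */
int demo_mark_added(struct demo_map *m, uint64_t addr, uint64_t len)
{
    uint64_t end = addr + len;

    for (unsigned i = 0; i < m->n; i++) {
        struct demo_ram *e = &m->e[i];
        uint64_t e_start = e->address;
        uint64_t e_end = e->address + e->length;

        if (end <= e_start || e_end <= addr) {
            continue;                   /* disjoint, adjacency included */
        }
        if (addr < e_start || end > e_end || m->n + 2 > DEMO_MAX) {
            return -1;                  /* straddles entries or no room */
        }

        *e = (struct demo_ram){ addr, len, DEMO_ADDED };
        if (addr > e_start) {
            demo_push(m, e_start, addr - e_start, DEMO_UNACCEPTED);
        }
        if (e_end > end) {
            demo_push(m, end, e_end - end, DEMO_UNACCEPTED);
        }
        return 0;
    }
    return -1;                          /* no entry contains the range */
}
```

For example, marking 0x4000..0x6000 inside a single 0x0..0x10000 entry
leaves three entries: the ADDED range plus an UNACCEPTED head and tail
whose lengths still sum to the original 0x10000.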

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 99 +++++++++++++++++++++++++++++++++++++++++++
 target/i386/kvm/tdx.h | 14 ++++++
 2 files changed, 113 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 8bac49419f37..e7071bfe4c9c 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -19,6 +19,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/sysemu.h"
 
+#include "hw/i386/e820_memory_layout.h"
 #include "hw/i386/x86.h"
 #include "hw/i386/tdvf.h"
 #include "kvm_i386.h"
@@ -129,11 +130,105 @@ static void get_tdx_capabilities(void)
     tdx_caps = caps;
 }
 
+static void tdx_add_ram_entry(uint64_t address, uint64_t length, uint32_t type)
+{
+    uint32_t nr_entries = tdx_guest->nr_ram_entries;
+    tdx_guest->ram_entries = g_renew(TdxRamEntry, tdx_guest->ram_entries,
+                                     nr_entries + 1);
+
+    tdx_guest->ram_entries[nr_entries].address = address;
+    tdx_guest->ram_entries[nr_entries].length = length;
+    tdx_guest->ram_entries[nr_entries].type = type;
+    tdx_guest->nr_ram_entries++;
+}
+
+static int tdx_accept_ram_range(uint64_t address, uint64_t length)
+{
+    TdxRamEntry *e;
+    int i;
+
+    for (i = 0; i < tdx_guest->nr_ram_entries; i++) {
+        e = &tdx_guest->ram_entries[i];
+
+        if (address + length <= e->address ||
+            e->address + e->length <= address) {
+            continue;
+        }
+
+        if (e->address > address ||
+            e->address + e->length < address + length) {
+            return -EINVAL;
+        }
+
+        if (e->address == address && e->length == length) {
+            e->type = TDX_RAM_ADDED;
+        } else if (e->address == address) {
+            e->address += length;
+            e->length -= length;
+            tdx_add_ram_entry(address, length, TDX_RAM_ADDED);
+        } else if (e->address + e->length == address + length) {
+            e->length -= length;
+            tdx_add_ram_entry(address, length, TDX_RAM_ADDED);
+        } else {
+            TdxRamEntry tmp = {
+                .address = e->address,
+                .length = e->length,
+            };
+            e->length = address - tmp.address;
+
+            tdx_add_ram_entry(address, length, TDX_RAM_ADDED);
+            tdx_add_ram_entry(address + length,
+                              tmp.address + tmp.length - (address + length),
+                              TDX_RAM_UNACCEPTED);
+        }
+
+        return 0;
+    }
+
+    return -1;
+}
+
+static int tdx_ram_entry_compare(const void *lhs_, const void *rhs_)
+{
+    const TdxRamEntry *lhs = lhs_;
+    const TdxRamEntry *rhs = rhs_;
+
+    if (lhs->address == rhs->address) {
+        return 0;
+    }
+    if (lhs->address > rhs->address) {
+        return 1;
+    }
+    return -1;
+}
+
+static void tdx_init_ram_entries(void)
+{
+    unsigned i, j, nr_e820_entries;
+
+    nr_e820_entries = e820_get_num_entries();
+    tdx_guest->ram_entries = g_new(TdxRamEntry, nr_e820_entries);
+
+    for (i = 0, j = 0; i < nr_e820_entries; i++) {
+        uint64_t addr, len;
+
+        if (e820_get_entry(i, E820_RAM, &addr, &len)) {
+            tdx_guest->ram_entries[j].address = addr;
+            tdx_guest->ram_entries[j].length = len;
+            tdx_guest->ram_entries[j].type = TDX_RAM_UNACCEPTED;
+            j++;
+        }
+    }
+    tdx_guest->nr_ram_entries = j;
+}
+
 static void tdx_finalize_vm(Notifier *notifier, void *unused)
 {
     TdxFirmware *tdvf = &tdx_guest->tdvf;
     TdxFirmwareEntry *entry;
 
+    tdx_init_ram_entries();
+
     for_each_tdx_fw_entry(tdvf, entry) {
         switch (entry->type) {
         case TDVF_SECTION_TYPE_BFV:
@@ -144,12 +239,16 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
         case TDVF_SECTION_TYPE_TEMP_MEM:
             entry->mem_ptr = qemu_ram_mmap(-1, entry->size,
                                            qemu_real_host_page_size(), 0, 0);
+            tdx_accept_ram_range(entry->address, entry->size);
             break;
         default:
             error_report("Unsupported TDVF section %d", entry->type);
             exit(1);
         }
     }
+
+    qsort(tdx_guest->ram_entries, tdx_guest->nr_ram_entries,
+          sizeof(TdxRamEntry), &tdx_ram_entry_compare);
 }
 
 static Notifier tdx_machine_done_notify = {
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 12bcf25bb95b..5792518afa62 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -15,6 +15,17 @@ typedef struct TdxGuestClass {
     ConfidentialGuestSupportClass parent_class;
 } TdxGuestClass;
 
+enum TdxRamType {
+    TDX_RAM_UNACCEPTED,
+    TDX_RAM_ADDED,
+};
+
+typedef struct TdxRamEntry {
+    uint64_t address;
+    uint64_t length;
+    uint32_t type;
+} TdxRamEntry;
+
 typedef struct TdxGuest {
     ConfidentialGuestSupport parent_obj;
 
@@ -24,6 +35,9 @@ typedef struct TdxGuest {
     uint64_t attributes;    /* TD attributes */
 
     TdxFirmware tdvf;
+
+    uint32_t nr_ram_entries;
+    TdxRamEntry *ram_entries;
 } TdxGuest;
 
 #ifdef CONFIG_TDX
-- 
2.27.0



* [RFC PATCH v4 23/36] i386/tdx: Setup the TD HOB list
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (21 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 22/36] i386/tdx: Track RAM entries for TDX VM Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-12 18:33   ` Isaku Yamahata
  2022-05-24  7:56   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 24/36] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION Xiaoyao Li
                   ` (12 subsequent siblings)
  35 siblings, 2 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

The TD HOB list is used to pass information from the VMM to TDVF. The TD
HOB must include a PHIT HOB and Resource Descriptor HOBs. More details can
be found in the TDVF specification and the PI specification.

Build the TD HOB in TDX's machine_init_done callback.
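
To make the list layout concrete, the sketch below appends HOBs to a
buffer with 8-byte alignment and then walks them by their length fields
until the end-of-list marker. It is an illustration under assumptions, not
the code in this patch: the demo_* names are hypothetical, fields are kept
in host byte order (the patch converts with cpu_to_le16() and friends),
and only the generic header is filled in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HOB_TYPE_HANDOFF   0x0001
#define DEMO_HOB_TYPE_RESOURCE  0x0003
#define DEMO_HOB_TYPE_END       0xFFFF

/* Same shape as the 8-byte generic HOB header. */
struct demo_hob_header { uint16_t type; uint16_t length; uint32_t reserved; };

struct demo_hob_buf { uint8_t *base; size_t size; size_t used; };

/* Append a zero-filled HOB of len bytes; returns NULL on overrun. */
void *demo_hob_append(struct demo_hob_buf *b, uint16_t type, uint16_t len)
{
    size_t aligned = ((size_t)len + 7) & ~(size_t)7;
    struct demo_hob_header h = { .type = type, .length = len, .reserved = 0 };
    void *p;

    if (len < sizeof(h) || aligned > b->size - b->used) {
        return NULL;
    }
    p = b->base + b->used;
    memset(p, 0, aligned);
    memcpy(p, &h, sizeof(h));
    b->used += aligned;
    return p;
}

/* Count HOBs up to and including the end marker; -1 if none is found. */
int demo_hob_count(const struct demo_hob_buf *b)
{
    size_t off = 0;
    int count = 0;

    while (off + sizeof(struct demo_hob_header) <= b->used) {
        struct demo_hob_header h;

        memcpy(&h, b->base + off, sizeof(h));
        count++;
        if (h.type == DEMO_HOB_TYPE_END) {
            return count;
        }
        off += ((size_t)h.length + 7) & ~(size_t)7;
    }
    return -1;
}
```

With the structure sizes used in this series (56 bytes for the handoff
table, 48 for a resource descriptor, 8 for the end marker), a minimal list
of one of each occupies 112 bytes.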

Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/i386/meson.build   |   2 +-
 hw/i386/tdvf-hob.c    | 212 ++++++++++++++++++++++++++++++++++++++++++
 hw/i386/tdvf-hob.h    |  25 +++++
 hw/i386/uefi.h        | 198 +++++++++++++++++++++++++++++++++++++++
 target/i386/kvm/tdx.c |  16 ++++
 5 files changed, 452 insertions(+), 1 deletion(-)
 create mode 100644 hw/i386/tdvf-hob.c
 create mode 100644 hw/i386/tdvf-hob.h
 create mode 100644 hw/i386/uefi.h

diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 97f3b50503b0..b59e0d35bba3 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -28,7 +28,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
   'port92.c'))
 i386_ss.add(when: 'CONFIG_X86_FW_OVMF', if_true: files('pc_sysfw_ovmf.c'),
                                         if_false: files('pc_sysfw_ovmf-stubs.c'))
-i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c'))
+i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c', 'tdvf-hob.c'))
 
 subdir('kvm')
 subdir('xen')
diff --git a/hw/i386/tdvf-hob.c b/hw/i386/tdvf-hob.c
new file mode 100644
index 000000000000..31160e9f95c5
--- /dev/null
+++ b/hw/i386/tdvf-hob.c
@@ -0,0 +1,212 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ *                        <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "e820_memory_layout.h"
+#include "hw/i386/pc.h"
+#include "hw/i386/x86.h"
+#include "hw/pci/pcie_host.h"
+#include "sysemu/kvm.h"
+#include "tdvf-hob.h"
+#include "uefi.h"
+
+typedef struct TdvfHob {
+    hwaddr hob_addr;
+    void *ptr;
+    int size;
+
+    /* working area */
+    void *current;
+    void *end;
+} TdvfHob;
+
+static uint64_t tdvf_current_guest_addr(const TdvfHob *hob)
+{
+    return hob->hob_addr + (hob->current - hob->ptr);
+}
+
+static void tdvf_align(TdvfHob *hob, size_t align)
+{
+    hob->current = QEMU_ALIGN_PTR_UP(hob->current, align);
+}
+
+static void *tdvf_get_area(TdvfHob *hob, uint64_t size)
+{
+    void *ret;
+
+    if (hob->current + size > hob->end) {
+        error_report("TD_HOB overrun, size = 0x%" PRIx64, size);
+        exit(1);
+    }
+
+    ret = hob->current;
+    hob->current += size;
+    tdvf_align(hob, 8);
+    return ret;
+}
+
+static void tdvf_hob_add_mmio_resource(TdvfHob *hob, uint64_t start,
+                                       uint64_t end)
+{
+    EFI_HOB_RESOURCE_DESCRIPTOR *region;
+
+    if (!start) {
+        return;
+    }
+
+    region = tdvf_get_area(hob, sizeof(*region));
+    *region = (EFI_HOB_RESOURCE_DESCRIPTOR) {
+        .Header = {
+            .HobType = EFI_HOB_TYPE_RESOURCE_DESCRIPTOR,
+            .HobLength = cpu_to_le16(sizeof(*region)),
+            .Reserved = cpu_to_le32(0),
+        },
+        .Owner = EFI_HOB_OWNER_ZERO,
+        .ResourceType = cpu_to_le32(EFI_RESOURCE_MEMORY_MAPPED_IO),
+        .ResourceAttribute = cpu_to_le32(EFI_RESOURCE_ATTRIBUTE_TDVF_MMIO),
+        .PhysicalStart = cpu_to_le64(start),
+        .ResourceLength = cpu_to_le64(end - start),
+    };
+}
+
+static void tdvf_hob_add_mmio_resources(TdvfHob *hob)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    X86MachineState *x86ms = X86_MACHINE(ms);
+    PCIHostState *pci_host;
+    uint64_t start, end;
+    uint64_t mcfg_base, mcfg_size;
+    Object *host;
+
+    /* Effectively PCI hole + other MMIO devices. */
+    tdvf_hob_add_mmio_resource(hob, x86ms->below_4g_mem_size,
+                               APIC_DEFAULT_ADDRESS);
+
+    /* Stolen from acpi_get_i386_pci_host(), there's gotta be an easier way. */
+    pci_host = OBJECT_CHECK(PCIHostState,
+                            object_resolve_path("/machine/i440fx", NULL),
+                            TYPE_PCI_HOST_BRIDGE);
+    if (!pci_host) {
+        pci_host = OBJECT_CHECK(PCIHostState,
+                                object_resolve_path("/machine/q35", NULL),
+                                TYPE_PCI_HOST_BRIDGE);
+    }
+    g_assert(pci_host);
+
+    host = OBJECT(pci_host);
+
+    /* PCI hole above 4gb. */
+    start = object_property_get_uint(host, PCI_HOST_PROP_PCI_HOLE64_START,
+                                     NULL);
+    end = object_property_get_uint(host, PCI_HOST_PROP_PCI_HOLE64_END, NULL);
+    tdvf_hob_add_mmio_resource(hob, start, end);
+
+    /* MMCFG region */
+    mcfg_base = object_property_get_uint(host, PCIE_HOST_MCFG_BASE, NULL);
+    mcfg_size = object_property_get_uint(host, PCIE_HOST_MCFG_SIZE, NULL);
+    if (mcfg_base && mcfg_base != PCIE_BASE_ADDR_UNMAPPED && mcfg_size) {
+        tdvf_hob_add_mmio_resource(hob, mcfg_base, mcfg_base + mcfg_size);
+    }
+}
+
+static void tdvf_hob_add_memory_resources(TdxGuest *tdx, TdvfHob *hob)
+{
+    EFI_HOB_RESOURCE_DESCRIPTOR *region;
+    EFI_RESOURCE_ATTRIBUTE_TYPE attr;
+    EFI_RESOURCE_TYPE resource_type;
+
+    TdxRamEntry *e;
+    int i;
+
+    for (i = 0; i < tdx->nr_ram_entries; i++) {
+        e = &tdx->ram_entries[i];
+
+        if (e->type == TDX_RAM_UNACCEPTED) {
+            resource_type = EFI_RESOURCE_MEMORY_UNACCEPTED;
+            attr = EFI_RESOURCE_ATTRIBUTE_TDVF_UNACCEPTED;
+        } else if (e->type == TDX_RAM_ADDED) {
+            resource_type = EFI_RESOURCE_SYSTEM_MEMORY;
+            attr = EFI_RESOURCE_ATTRIBUTE_TDVF_PRIVATE;
+        } else {
+            error_report("unknown TDXRAMENTRY type %d", e->type);
+            exit(1);
+        }
+
+        region = tdvf_get_area(hob, sizeof(*region));
+        *region = (EFI_HOB_RESOURCE_DESCRIPTOR) {
+            .Header = {
+                .HobType = EFI_HOB_TYPE_RESOURCE_DESCRIPTOR,
+                .HobLength = cpu_to_le16(sizeof(*region)),
+                .Reserved = cpu_to_le32(0),
+            },
+            .Owner = EFI_HOB_OWNER_ZERO,
+            .ResourceType = cpu_to_le32(resource_type),
+            .ResourceAttribute = cpu_to_le32(attr),
+            .PhysicalStart = cpu_to_le64(e->address),
+            .ResourceLength = cpu_to_le64(e->length),
+        };
+    }
+}
+
+void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *td_hob)
+{
+    TdvfHob hob = {
+        .hob_addr = td_hob->address,
+        .size = td_hob->size,
+        .ptr = td_hob->mem_ptr,
+
+        .current = td_hob->mem_ptr,
+        .end = td_hob->mem_ptr + td_hob->size,
+    };
+
+    EFI_HOB_GENERIC_HEADER *last_hob;
+    EFI_HOB_HANDOFF_INFO_TABLE *hit;
+
+    /* Note, Efi{Free}Memory{Bottom,Top} are ignored, leave 'em zeroed. */
+    hit = tdvf_get_area(&hob, sizeof(*hit));
+    *hit = (EFI_HOB_HANDOFF_INFO_TABLE) {
+        .Header = {
+            .HobType = EFI_HOB_TYPE_HANDOFF,
+            .HobLength = cpu_to_le16(sizeof(*hit)),
+            .Reserved = cpu_to_le32(0),
+        },
+        .Version = cpu_to_le32(EFI_HOB_HANDOFF_TABLE_VERSION),
+        .BootMode = cpu_to_le32(0),
+        .EfiMemoryTop = cpu_to_le64(0),
+        .EfiMemoryBottom = cpu_to_le64(0),
+        .EfiFreeMemoryTop = cpu_to_le64(0),
+        .EfiFreeMemoryBottom = cpu_to_le64(0),
+        .EfiEndOfHobList = cpu_to_le64(0), /* initialized later */
+    };
+
+    tdvf_hob_add_memory_resources(tdx, &hob);
+
+    tdvf_hob_add_mmio_resources(&hob);
+
+    last_hob = tdvf_get_area(&hob, sizeof(*last_hob));
+    *last_hob = (EFI_HOB_GENERIC_HEADER) {
+        .HobType = EFI_HOB_TYPE_END_OF_HOB_LIST,
+        .HobLength = cpu_to_le16(sizeof(*last_hob)),
+        .Reserved = cpu_to_le32(0),
+    };
+    hit->EfiEndOfHobList = cpu_to_le64(tdvf_current_guest_addr(&hob));
+}
diff --git a/hw/i386/tdvf-hob.h b/hw/i386/tdvf-hob.h
new file mode 100644
index 000000000000..f0494e8c4af8
--- /dev/null
+++ b/hw/i386/tdvf-hob.h
@@ -0,0 +1,25 @@
+#ifndef HW_I386_TD_HOB_H
+#define HW_I386_TD_HOB_H
+
+#include "hw/i386/tdvf.h"
+#include "hw/i386/uefi.h"
+#include "target/i386/kvm/tdx.h"
+
+void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *td_hob);
+
+#define EFI_RESOURCE_ATTRIBUTE_TDVF_PRIVATE     \
+    (EFI_RESOURCE_ATTRIBUTE_PRESENT |           \
+     EFI_RESOURCE_ATTRIBUTE_INITIALIZED |       \
+     EFI_RESOURCE_ATTRIBUTE_TESTED)
+
+#define EFI_RESOURCE_ATTRIBUTE_TDVF_UNACCEPTED  \
+    (EFI_RESOURCE_ATTRIBUTE_PRESENT |           \
+     EFI_RESOURCE_ATTRIBUTE_INITIALIZED |       \
+     EFI_RESOURCE_ATTRIBUTE_TESTED)
+
+#define EFI_RESOURCE_ATTRIBUTE_TDVF_MMIO        \
+    (EFI_RESOURCE_ATTRIBUTE_PRESENT     |       \
+     EFI_RESOURCE_ATTRIBUTE_INITIALIZED |       \
+     EFI_RESOURCE_ATTRIBUTE_UNCACHEABLE)
+
+#endif
diff --git a/hw/i386/uefi.h b/hw/i386/uefi.h
new file mode 100644
index 000000000000..b15aba796156
--- /dev/null
+++ b/hw/i386/uefi.h
@@ -0,0 +1,198 @@
+/*
+ * Copyright (C) 2020 Intel Corporation
+ *
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ *                        <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#ifndef HW_I386_UEFI_H
+#define HW_I386_UEFI_H
+
+/***************************************************************************/
+/*
+ * basic EFI definitions
+ * supplemented with UEFI Specification Version 2.8 (Errata A)
+ * released February 2020
+ */
+/* UEFI integer is little endian */
+
+typedef struct {
+    uint32_t Data1;
+    uint16_t Data2;
+    uint16_t Data3;
+    uint8_t Data4[8];
+} EFI_GUID;
+
+typedef enum {
+    EfiReservedMemoryType,
+    EfiLoaderCode,
+    EfiLoaderData,
+    EfiBootServicesCode,
+    EfiBootServicesData,
+    EfiRuntimeServicesCode,
+    EfiRuntimeServicesData,
+    EfiConventionalMemory,
+    EfiUnusableMemory,
+    EfiACPIReclaimMemory,
+    EfiACPIMemoryNVS,
+    EfiMemoryMappedIO,
+    EfiMemoryMappedIOPortSpace,
+    EfiPalCode,
+    EfiPersistentMemory,
+    EfiUnacceptedMemoryType,
+    EfiMaxMemoryType
+} EFI_MEMORY_TYPE;
+
+#define EFI_HOB_HANDOFF_TABLE_VERSION 0x0009
+
+#define EFI_HOB_TYPE_HANDOFF              0x0001
+#define EFI_HOB_TYPE_MEMORY_ALLOCATION    0x0002
+#define EFI_HOB_TYPE_RESOURCE_DESCRIPTOR  0x0003
+#define EFI_HOB_TYPE_GUID_EXTENSION       0x0004
+#define EFI_HOB_TYPE_FV                   0x0005
+#define EFI_HOB_TYPE_CPU                  0x0006
+#define EFI_HOB_TYPE_MEMORY_POOL          0x0007
+#define EFI_HOB_TYPE_FV2                  0x0009
+#define EFI_HOB_TYPE_LOAD_PEIM_UNUSED     0x000A
+#define EFI_HOB_TYPE_UEFI_CAPSULE         0x000B
+#define EFI_HOB_TYPE_FV3                  0x000C
+#define EFI_HOB_TYPE_UNUSED               0xFFFE
+#define EFI_HOB_TYPE_END_OF_HOB_LIST      0xFFFF
+
+typedef struct {
+    uint16_t HobType;
+    uint16_t HobLength;
+    uint32_t Reserved;
+} EFI_HOB_GENERIC_HEADER;
+
+typedef uint64_t EFI_PHYSICAL_ADDRESS;
+typedef uint32_t EFI_BOOT_MODE;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    uint32_t Version;
+    EFI_BOOT_MODE BootMode;
+    EFI_PHYSICAL_ADDRESS EfiMemoryTop;
+    EFI_PHYSICAL_ADDRESS EfiMemoryBottom;
+    EFI_PHYSICAL_ADDRESS EfiFreeMemoryTop;
+    EFI_PHYSICAL_ADDRESS EfiFreeMemoryBottom;
+    EFI_PHYSICAL_ADDRESS EfiEndOfHobList;
+} EFI_HOB_HANDOFF_INFO_TABLE;
+
+#define EFI_RESOURCE_SYSTEM_MEMORY          0x00000000
+#define EFI_RESOURCE_MEMORY_MAPPED_IO       0x00000001
+#define EFI_RESOURCE_IO                     0x00000002
+#define EFI_RESOURCE_FIRMWARE_DEVICE        0x00000003
+#define EFI_RESOURCE_MEMORY_MAPPED_IO_PORT  0x00000004
+#define EFI_RESOURCE_MEMORY_RESERVED        0x00000005
+#define EFI_RESOURCE_IO_RESERVED            0x00000006
+#define EFI_RESOURCE_MEMORY_UNACCEPTED      0x00000007
+#define EFI_RESOURCE_MAX_MEMORY_TYPE        0x00000008
+
+#define EFI_RESOURCE_ATTRIBUTE_PRESENT                  0x00000001
+#define EFI_RESOURCE_ATTRIBUTE_INITIALIZED              0x00000002
+#define EFI_RESOURCE_ATTRIBUTE_TESTED                   0x00000004
+#define EFI_RESOURCE_ATTRIBUTE_SINGLE_BIT_ECC           0x00000008
+#define EFI_RESOURCE_ATTRIBUTE_MULTIPLE_BIT_ECC         0x00000010
+#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_1           0x00000020
+#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_2           0x00000040
+#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTED           0x00000080
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTED          0x00000100
+#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTED      0x00000200
+#define EFI_RESOURCE_ATTRIBUTE_UNCACHEABLE              0x00000400
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_COMBINEABLE        0x00000800
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_THROUGH_CACHEABLE  0x00001000
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_BACK_CACHEABLE     0x00002000
+#define EFI_RESOURCE_ATTRIBUTE_16_BIT_IO                0x00004000
+#define EFI_RESOURCE_ATTRIBUTE_32_BIT_IO                0x00008000
+#define EFI_RESOURCE_ATTRIBUTE_64_BIT_IO                0x00010000
+#define EFI_RESOURCE_ATTRIBUTE_UNCACHED_EXPORTED        0x00020000
+#define EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTED      0x00040000
+#define EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTABLE    0x00080000
+#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTABLE         0x00100000
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTABLE        0x00200000
+#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTABLE    0x00400000
+#define EFI_RESOURCE_ATTRIBUTE_PERSISTENT               0x00800000
+#define EFI_RESOURCE_ATTRIBUTE_PERSISTABLE              0x01000000
+#define EFI_RESOURCE_ATTRIBUTE_MORE_RELIABLE            0x02000000
+
+typedef uint32_t EFI_RESOURCE_TYPE;
+typedef uint32_t EFI_RESOURCE_ATTRIBUTE_TYPE;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_GUID Owner;
+    EFI_RESOURCE_TYPE ResourceType;
+    EFI_RESOURCE_ATTRIBUTE_TYPE ResourceAttribute;
+    EFI_PHYSICAL_ADDRESS PhysicalStart;
+    uint64_t ResourceLength;
+} EFI_HOB_RESOURCE_DESCRIPTOR;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_GUID Name;
+
+    /* guid specific data follows */
+} EFI_HOB_GUID_TYPE;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+} EFI_HOB_FIRMWARE_VOLUME;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+    EFI_GUID FvName;
+    EFI_GUID FileName;
+} EFI_HOB_FIRMWARE_VOLUME2;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+    uint32_t AuthenticationStatus;
+    bool ExtractedFv;
+    EFI_GUID FvName;
+    EFI_GUID FileName;
+} EFI_HOB_FIRMWARE_VOLUME3;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    uint8_t SizeOfMemorySpace;
+    uint8_t SizeOfIoSpace;
+    uint8_t Reserved[6];
+} EFI_HOB_CPU;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+} EFI_HOB_MEMORY_POOL;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+} EFI_HOB_UEFI_CAPSULE;
+
+#define EFI_HOB_OWNER_ZERO                                      \
+    ((EFI_GUID){ 0x00000000, 0x0000, 0x0000,                    \
+        { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 } })
+
+#endif
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index e7071bfe4c9c..3e18ace90bf7 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -22,6 +22,7 @@
 #include "hw/i386/e820_memory_layout.h"
 #include "hw/i386/x86.h"
 #include "hw/i386/tdvf.h"
+#include "hw/i386/tdvf-hob.h"
 #include "kvm_i386.h"
 #include "tdx.h"
 
@@ -130,6 +131,19 @@ static void get_tdx_capabilities(void)
     tdx_caps = caps;
 }
 
+static TdxFirmwareEntry *tdx_get_hob_entry(TdxGuest *tdx)
+{
+    TdxFirmwareEntry *entry;
+
+    for_each_tdx_fw_entry(&tdx->tdvf, entry) {
+        if (entry->type == TDVF_SECTION_TYPE_TD_HOB) {
+            return entry;
+        }
+    }
+    error_report("TDVF metadata doesn't specify TD_HOB location.");
+    exit(1);
+}
+
 static void tdx_add_ram_entry(uint64_t address, uint64_t length, uint32_t type)
 {
     uint32_t nr_entries = tdx_guest->nr_ram_entries;
@@ -249,6 +263,8 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
 
     qsort(tdx_guest->ram_entries, tdx_guest->nr_ram_entries,
           sizeof(TdxRamEntry), &tdx_ram_entry_compare);
+
+    tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
 }
 
 static Notifier tdx_machine_done_notify = {
-- 
2.27.0



* [RFC PATCH v4 24/36] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (22 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 23/36] i386/tdx: Setup the TD HOB list Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-12 18:34   ` Isaku Yamahata
  2022-05-24  7:57   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 25/36] i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu Xiaoyao Li
                   ` (11 subsequent siblings)
  35 siblings, 2 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

From: Isaku Yamahata <isaku.yamahata@intel.com>

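The TDVF firmware (CODE and VARS), as well as the TD HOB and TEMP memory,
needs to be copied into the TD's private memory via
KVM_TDX_INIT_MEM_REGION. Sections whose TDVF metadata sets MR_EXTEND are
also measured. The HOB and TEMP mappings are unmapped afterwards.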
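
For reference, a minimal sketch of the per-section argument setup. It is
not the code in this patch: every sketch_/SKETCH_ name is hypothetical, the
constant values are assumptions, and the alignment check is an extra that
the patch does not perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; the real definitions live in the
 * TDVF and KVM TDX headers used by this series. */
#define SKETCH_PAGE_SIZE             4096ULL
#define SKETCH_ATTR_MR_EXTEND        (1U << 0)
#define SKETCH_MEASURE_MEMORY_REGION (1U << 0)

/* Hypothetical stand-in for struct kvm_tdx_init_mem_region. */
struct sketch_mem_region {
    uint64_t source_addr;
    uint64_t gpa;
    uint64_t nr_pages;
};

/*
 * Fill the region arguments for one firmware section.
 * Returns false when gpa or size is not page aligned, since a plain
 * size / 4096 would silently drop the tail of such a section.
 */
static bool sketch_fill_region(uint64_t src, uint64_t gpa, uint64_t size,
                               struct sketch_mem_region *out)
{
    if (size == 0 || (gpa % SKETCH_PAGE_SIZE) || (size % SKETCH_PAGE_SIZE)) {
        return false;
    }
    out->source_addr = src;
    out->gpa = gpa;
    out->nr_pages = size / SKETCH_PAGE_SIZE;
    return true;
}

/* Sections flagged MR_EXTEND are measured, all others are only copied. */
static uint32_t sketch_region_flags(uint32_t attributes)
{
    return (attributes & SKETCH_ATTR_MR_EXTEND) ?
           SKETCH_MEASURE_MEMORY_REGION : 0;
}
```

Whether unaligned sections should be rejected here or when the TDVF
metadata is parsed is a design choice this sketch leaves open.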

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 3e18ace90bf7..567ee12e88f0 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -240,6 +240,7 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
 {
     TdxFirmware *tdvf = &tdx_guest->tdvf;
     TdxFirmwareEntry *entry;
+    int r;
 
     tdx_init_ram_entries();
 
@@ -265,6 +266,29 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
           sizeof(TdxRamEntry), &tdx_ram_entry_compare);
 
     tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
+
+    for_each_tdx_fw_entry(tdvf, entry) {
+        struct kvm_tdx_init_mem_region mem_region = {
+            .source_addr = (__u64)entry->mem_ptr,
+            .gpa = entry->address,
+            .nr_pages = entry->size / 4096,
+        };
+
+        __u32 metadata = entry->attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND ?
+                         KVM_TDX_MEASURE_MEMORY_REGION : 0;
+
+        r = tdx_vm_ioctl(KVM_TDX_INIT_MEM_REGION, metadata, &mem_region);
+        if (r < 0) {
+            error_report("KVM_TDX_INIT_MEM_REGION failed %s", strerror(-r));
+            exit(1);
+        }
+
+        if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
+            entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
+            qemu_ram_munmap(-1, entry->mem_ptr, entry->size);
+            entry->mem_ptr = NULL;
+        }
+    }
 }
 
 static Notifier tdx_machine_done_notify = {
-- 
2.27.0



* [RFC PATCH v4 25/36] i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (23 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 24/36] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-24  7:59   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 26/36] i386/tdx: Finalize TDX VM Xiaoyao Li
                   ` (10 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

A TDX vcpu needs to be initialized by SEAMCALL(TDH.VP.INIT), and KVM
provides the vcpu-level IOCTL KVM_TDX_INIT_VCPU for it.

KVM_TDX_INIT_VCPU takes the address of the HOB as input. Invoke it for
each vcpu after the HOB list is created.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 567ee12e88f0..d8b05b7749f7 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -236,6 +236,22 @@ static void tdx_init_ram_entries(void)
     tdx_guest->nr_ram_entries = j;
 }
 
+static void tdx_post_init_vcpus(void)
+{
+    TdxFirmwareEntry *hob;
+    CPUState *cpu;
+    int r;
+
+    hob = tdx_get_hob_entry(tdx_guest);
+    CPU_FOREACH(cpu) {
+        r = tdx_vcpu_ioctl(cpu, KVM_TDX_INIT_VCPU, 0, (void *)hob->address);
+        if (r < 0) {
+            error_report("KVM_TDX_INIT_VCPU failed %s", strerror(-r));
+            exit(1);
+        }
+    }
+}
+
 static void tdx_finalize_vm(Notifier *notifier, void *unused)
 {
     TdxFirmware *tdvf = &tdx_guest->tdvf;
@@ -267,6 +283,8 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
 
     tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
 
+    tdx_post_init_vcpus();
+
     for_each_tdx_fw_entry(tdvf, entry) {
         struct kvm_tdx_init_mem_region mem_region = {
             .source_addr = (__u64)entry->mem_ptr,
-- 
2.27.0



* [RFC PATCH v4 26/36] i386/tdx: Finalize TDX VM
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (24 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 25/36] i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-24  7:59   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 27/36] i386/tdx: Disable SMM for TDX VMs Xiaoyao Li
                   ` (9 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Invoke KVM_TDX_FINALIZE_VM to finalize the TD's measurement and make
the TD vCPUs runnable once machine initialization is complete.
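
As a rough model of the call order in tdx_finalize_vm() after this patch
(HOB creation, vcpu init, memory regions, finalize), the sketch below only
tracks that the steps arrive in sequence. The step names and the struct
are hypothetical, and it makes no claim about what KVM itself enforces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical step names; they mirror the call order in tdx_finalize_vm(). */
enum sketch_step {
    SKETCH_HOB_CREATED,
    SKETCH_VCPUS_INITIALIZED,
    SKETCH_MEMORY_ADDED,
    SKETCH_FINALIZED,
    SKETCH_NR_STEPS,
};

struct sketch_td {
    int next;      /* next step that is allowed to run */
    bool ready;    /* set only after the final step, like parent_obj.ready */
};

/* Advance only when the step arrives in order; 0 on success, -1 otherwise. */
static int sketch_td_advance(struct sketch_td *td, enum sketch_step step)
{
    if (td->next >= SKETCH_NR_STEPS || (int)step != td->next) {
        return -1;
    }
    td->next++;
    td->ready = (td->next == SKETCH_NR_STEPS);
    return 0;
}
```

The ready flag only flips after the last step, which matches setting
parent_obj.ready after KVM_TDX_FINALIZE_VM succeeds.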

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index d8b05b7749f7..4a7c149f895c 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -307,6 +307,13 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
             entry->mem_ptr = NULL;
         }
     }
+
+    r = tdx_vm_ioctl(KVM_TDX_FINALIZE_VM, 0, NULL);
+    if (r < 0) {
+        error_report("KVM_TDX_FINALIZE_VM failed %s", strerror(-r));
+        exit(1);
+    }
+    tdx_guest->parent_obj.ready = true;
 }
 
 static Notifier tdx_machine_done_notify = {
-- 
2.27.0



* [RFC PATCH v4 27/36] i386/tdx: Disable SMM for TDX VMs
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (25 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 26/36] i386/tdx: Finalize TDX VM Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-24  8:00   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 28/36] i386/tdx: Disable PIC " Xiaoyao Li
                   ` (8 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

TDX doesn't support SMM, and the VMM cannot emulate SMM for TDX VMs
because it cannot manipulate a TDX VM's memory.

Disable SMM for TDX VMs and error out if the user requests SMM.
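
The auto/on/off handling can be sketched on its own as below. The enum is
a local copy for illustration (QEMU's OnOffAuto is generated from QAPI)
and sketch_force_off() is a hypothetical helper, not something this patch
adds.

```c
#include <assert.h>
#include <errno.h>

/* Local copy for illustration; QEMU's OnOffAuto is generated from QAPI. */
typedef enum {
    SKETCH_AUTO,
    SKETCH_ON,
    SKETCH_OFF,
} SketchOnOffAuto;

/*
 * Resolve a tri-state option for a feature the platform cannot provide:
 * auto becomes off, an explicit on is rejected, off is left alone.
 */
static int sketch_force_off(SketchOnOffAuto *opt)
{
    if (*opt == SKETCH_ON) {
        return -EINVAL;
    }
    *opt = SKETCH_OFF;
    return 0;
}
```

A helper of this shape would fit any option that must resolve to off for
TDX VMs, at the cost of losing the per-feature error message unless the
caller supplies it.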

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 4a7c149f895c..7ff4c6a9a7ca 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -322,9 +322,17 @@ static Notifier tdx_machine_done_notify = {
 
 int tdx_kvm_init(MachineState *ms, Error **errp)
 {
+    X86MachineState *x86ms = X86_MACHINE(ms);
     TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
                                                     TYPE_TDX_GUEST);
 
+    if (x86ms->smm == ON_OFF_AUTO_AUTO) {
+        x86ms->smm = ON_OFF_AUTO_OFF;
+    } else if (x86ms->smm == ON_OFF_AUTO_ON) {
+        error_setg(errp, "TDX VM doesn't support SMM");
+        return -EINVAL;
+    }
+
     if (!tdx_caps) {
         get_tdx_capabilities();
     }
-- 
2.27.0



* [RFC PATCH v4 28/36] i386/tdx: Disable PIC for TDX VMs
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (26 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 27/36] i386/tdx: Disable SMM for TDX VMs Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-24  8:00   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 29/36] i386/tdx: Don't allow system reset " Xiaoyao Li
                   ` (7 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Legacy PIC (8259) cannot be supported for TDX VMs since the TDX module
doesn't allow direct interrupt injection.  Using posted interrupts
for the PIC is not a viable option as the guest BIOS/kernel will not
do EOI for PIC IRQs, i.e. will leave the vIRR bit set.

Hence disable PIC for TDX VMs and error out if the user requests PIC.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/tdx.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 7ff4c6a9a7ca..59c7aa8f1818 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -333,6 +333,13 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
         return -EINVAL;
     }
 
+    if (x86ms->pic == ON_OFF_AUTO_AUTO) {
+        x86ms->pic = ON_OFF_AUTO_OFF;
+    } else if (x86ms->pic == ON_OFF_AUTO_ON) {
+        error_setg(errp, "TDX VM doesn't support PIC");
+        return -EINVAL;
+    }
+
     if (!tdx_caps) {
         get_tdx_capabilities();
     }
-- 
2.27.0



* [RFC PATCH v4 29/36] i386/tdx: Don't allow system reset for TDX VMs
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (27 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 28/36] i386/tdx: Disable PIC " Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-24  8:01   ` Gerd Hoffmann
  2022-05-12  3:17 ` [RFC PATCH v4 30/36] hw/i386: add eoi_intercept_unsupported member to X86MachineState Xiaoyao Li
                   ` (6 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

TDX CPU state is protected and thus vcpu state can't be reset by the VMM.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index c51125ab200f..9a1e1dab938f 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5349,7 +5349,7 @@ bool kvm_has_waitpkg(void)
 
 bool kvm_arch_cpu_check_are_resettable(void)
 {
-    return !sev_es_enabled();
+    return !sev_es_enabled() && !is_tdx_vm();
 }
 
 #define ARCH_REQ_XCOMP_GUEST_PERM       0x1025
-- 
2.27.0



* [RFC PATCH v4 30/36] hw/i386: add eoi_intercept_unsupported member to X86MachineState
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (28 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 29/36] i386/tdx: Don't allow system reset " Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-12  3:17 ` [RFC PATCH v4 31/36] hw/i386: add option to forcibly report edge trigger in acpi tables Xiaoyao Li
                   ` (5 subsequent siblings)
  35 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Add a new bool member, eoi_intercept_unsupported, to X86MachineState
with default value false. Set it to true for TDX VMs.

Without EOI interception, a level-triggered interrupt cannot be
re-injected while the line is still asserted, which affects interrupt
controller emulation.
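
To illustrate why the EOI matters, here is a deliberately simplified model
of one level-triggered line. It is not QEMU's IOAPIC code and all names
are hypothetical; the point is only that re-delivery happens from the EOI
path.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of one level-triggered line; not QEMU's IOAPIC code. */
struct sketch_level_irq {
    bool asserted;      /* device is still driving the line */
    bool in_service;    /* delivered, waiting for the guest's EOI */
    unsigned injected;  /* how many times the interrupt was delivered */
};

/* Deliver once if the line is asserted and nothing is in service. */
static void sketch_deliver(struct sketch_level_irq *irq)
{
    if (irq->asserted && !irq->in_service) {
        irq->in_service = true;
        irq->injected++;
    }
}

/* Device raises or lowers the line. */
static void sketch_set_level(struct sketch_level_irq *irq, bool level)
{
    irq->asserted = level;
    sketch_deliver(irq);
}

/*
 * Only reachable when the VMM can intercept EOI. Without this callback the
 * model never learns that the line may be delivered again.
 */
static void sketch_eoi(struct sketch_level_irq *irq)
{
    irq->in_service = false;
    sketch_deliver(irq);
}
```

If sketch_eoi() is never called, a line that stays asserted is delivered
exactly once in this model, which is the situation the new flag describes.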

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/i386/x86.c         | 1 +
 include/hw/i386/x86.h | 1 +
 target/i386/kvm/tdx.c | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 17f2252296c5..182ec544611b 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1365,6 +1365,7 @@ static void x86_machine_initfn(Object *obj)
     x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
     x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
     x86ms->bus_lock_ratelimit = 0;
+    x86ms->eoi_intercept_unsupported = false;
 }
 
 static void x86_machine_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 9089bdd99c3a..5bf91dd934db 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -58,6 +58,7 @@ struct X86MachineState {
 
     /* CPU and apic information: */
     bool apic_xrupt_override;
+    bool eoi_intercept_unsupported;
     unsigned pci_irq_mask;
     unsigned apic_id_limit;
     uint16_t boot_cpus;
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 59c7aa8f1818..583fda52de5f 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -340,6 +340,8 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
         return -EINVAL;
     }
 
+    x86ms->eoi_intercept_unsupported = true;
+
     if (!tdx_caps) {
         get_tdx_capabilities();
     }
-- 
2.27.0



* [RFC PATCH v4 31/36] hw/i386: add option to forcibly report edge trigger in acpi tables
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (29 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 30/36] hw/i386: add eoi_intercept_unsupported member to X86MachineState Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-12  3:17 ` [RFC PATCH v4 32/36] i386/tdx: Don't synchronize guest tsc for TDs Xiaoyao Li
                   ` (4 subsequent siblings)
  35 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

From: Isaku Yamahata <isaku.yamahata@intel.com>

When level-triggered interrupts aren't supported on an x86 platform,
forcibly report edge trigger in the ACPI tables.
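
For the MADT part, the flag values passed to build_xrupt_override() follow
the MPS INTI layout from the ACPI specification: polarity in bits 1:0 and
trigger mode in bits 3:2. The helper below is a hypothetical sketch that
spells out the encoding; the patch itself uses the literals directly.

```c
#include <assert.h>
#include <stdint.h>

/* MPS INTI flag fields as laid out in the ACPI MADT definition. */
enum sketch_polarity {
    SKETCH_POL_BUS  = 0,    /* conforms to the bus specification */
    SKETCH_POL_HIGH = 1,    /* active high */
    SKETCH_POL_LOW  = 3,    /* active low */
};

enum sketch_trigger {
    SKETCH_TRIG_BUS   = 0,  /* conforms to the bus specification */
    SKETCH_TRIG_EDGE  = 1,  /* edge triggered */
    SKETCH_TRIG_LEVEL = 3,  /* level triggered */
};

/* Polarity occupies bits 1:0, trigger mode bits 3:2. */
static uint16_t sketch_inti_flags(enum sketch_polarity pol,
                                  enum sketch_trigger trig)
{
    return (uint16_t)(pol | (trig << 2));
}
```

With this encoding, active high plus edge gives 1 | (1 << 2) and active
high plus level gives 0xd, matching the two literals in acpi_build_madt().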

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/i386/acpi-build.c  | 99 ++++++++++++++++++++++++++++---------------
 hw/i386/acpi-common.c | 50 ++++++++++++++++------
 2 files changed, 104 insertions(+), 45 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index dcf6ece3d043..bd068bba534b 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -912,7 +912,8 @@ static void build_dbg_aml(Aml *table)
     aml_append(table, scope);
 }
 
-static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
+static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg,
+                           bool level_trigger_unsupported)
 {
     Aml *dev;
     Aml *crs;
@@ -924,7 +925,10 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
     aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
 
     crs = aml_resource_template();
-    aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
+    aml_append(crs, aml_interrupt(AML_CONSUMER,
+                                  level_trigger_unsupported ?
+                                  AML_EDGE : AML_LEVEL,
+                                  AML_ACTIVE_HIGH,
                                   AML_SHARED, irqs, ARRAY_SIZE(irqs)));
     aml_append(dev, aml_name_decl("_PRS", crs));
 
@@ -948,7 +952,8 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
     return dev;
  }
 
-static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
+static Aml *build_gsi_link_dev(const char *name, uint8_t uid,
+                               uint8_t gsi, bool level_trigger_unsupported)
 {
     Aml *dev;
     Aml *crs;
@@ -961,7 +966,10 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
 
     crs = aml_resource_template();
     irqs = gsi;
-    aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
+    aml_append(crs, aml_interrupt(AML_CONSUMER,
+                                  level_trigger_unsupported ?
+                                  AML_EDGE : AML_LEVEL,
+                                  AML_ACTIVE_HIGH,
                                   AML_SHARED, &irqs, 1));
     aml_append(dev, aml_name_decl("_PRS", crs));
 
@@ -980,7 +988,7 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
 }
 
 /* _CRS method - get current settings */
-static Aml *build_iqcr_method(bool is_piix4)
+static Aml *build_iqcr_method(bool is_piix4, bool level_trigger_unsupported)
 {
     Aml *if_ctx;
     uint32_t irqs;
@@ -988,7 +996,9 @@ static Aml *build_iqcr_method(bool is_piix4)
     Aml *crs = aml_resource_template();
 
     irqs = 0;
-    aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
+    aml_append(crs, aml_interrupt(AML_CONSUMER,
+                                  level_trigger_unsupported ?
+                                  AML_EDGE : AML_LEVEL,
                                   AML_ACTIVE_HIGH, AML_SHARED, &irqs, 1));
     aml_append(method, aml_name_decl("PRR0", crs));
 
@@ -1022,7 +1032,7 @@ static Aml *build_irq_status_method(void)
     return method;
 }
 
-static void build_piix4_pci0_int(Aml *table)
+static void build_piix4_pci0_int(Aml *table, bool level_trigger_unsupported)
 {
     Aml *dev;
     Aml *crs;
@@ -1043,12 +1053,16 @@ static void build_piix4_pci0_int(Aml *table)
     aml_append(sb_scope, field);
 
     aml_append(sb_scope, build_irq_status_method());
-    aml_append(sb_scope, build_iqcr_method(true));
+    aml_append(sb_scope, build_iqcr_method(true, level_trigger_unsupported));
 
-    aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0")));
-    aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1")));
-    aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2")));
-    aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3")));
+    aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3"),
+                                        level_trigger_unsupported));
 
     dev = aml_device("LNKS");
     {
@@ -1057,7 +1071,9 @@ static void build_piix4_pci0_int(Aml *table)
 
         crs = aml_resource_template();
         irqs = 9;
-        aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
+        aml_append(crs, aml_interrupt(AML_CONSUMER,
+                                      level_trigger_unsupported ?
+                                      AML_EDGE : AML_LEVEL,
                                       AML_ACTIVE_HIGH, AML_SHARED,
                                       &irqs, 1));
         aml_append(dev, aml_name_decl("_PRS", crs));
@@ -1143,7 +1159,7 @@ static Aml *build_q35_routing_table(const char *str)
     return pkg;
 }
 
-static void build_q35_pci0_int(Aml *table)
+static void build_q35_pci0_int(Aml *table, bool level_trigger_unsupported)
 {
     Aml *field;
     Aml *method;
@@ -1195,25 +1211,41 @@ static void build_q35_pci0_int(Aml *table)
     aml_append(sb_scope, field);
 
     aml_append(sb_scope, build_irq_status_method());
-    aml_append(sb_scope, build_iqcr_method(false));
+    aml_append(sb_scope, build_iqcr_method(false, level_trigger_unsupported));
 
-    aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA")));
-    aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB")));
-    aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC")));
-    aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD")));
-    aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE")));
-    aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF")));
-    aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG")));
-    aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH")));
+    aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH"),
+                                        level_trigger_unsupported));
 
-    aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10));
-    aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11));
-    aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12));
-    aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13));
-    aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14));
-    aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15));
-    aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16));
-    aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17));
+    aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17,
+                                            level_trigger_unsupported));
 
     aml_append(table, sb_scope);
 }
@@ -1420,6 +1452,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
     PCMachineState *pcms = PC_MACHINE(machine);
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
     X86MachineState *x86ms = X86_MACHINE(machine);
+    bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
     AcpiMcfgInfo mcfg;
     bool mcfg_valid = !!acpi_get_mcfg(&mcfg);
     uint32_t nr_mem = machine->ram_slots;
@@ -1454,7 +1487,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
         if (pm->pcihp_bridge_en || pm->pcihp_root_en) {
             build_x86_acpi_pci_hotplug(dsdt, pm->pcihp_io_base);
         }
-        build_piix4_pci0_int(dsdt);
+        build_piix4_pci0_int(dsdt, level_trigger_unsupported);
     } else {
         sb_scope = aml_scope("_SB");
         dev = aml_device("PCI0");
@@ -1503,7 +1536,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
         if (pm->pcihp_bridge_en) {
             build_x86_acpi_pci_hotplug(dsdt, pm->pcihp_io_base);
         }
-        build_q35_pci0_int(dsdt);
+        build_q35_pci0_int(dsdt, level_trigger_unsupported);
         if (pcms->smbus && !pcmc->do_not_add_smb_acpi) {
             build_smb0(dsdt, pcms->smbus, ICH9_SMB_DEV, ICH9_SMB_FUNC);
         }
diff --git a/hw/i386/acpi-common.c b/hw/i386/acpi-common.c
index 4aaafbdd7b5d..485fc17816be 100644
--- a/hw/i386/acpi-common.c
+++ b/hw/i386/acpi-common.c
@@ -105,6 +105,7 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
     AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(adev);
     AcpiTable table = { .sig = "APIC", .rev = 1, .oem_id = oem_id,
                         .oem_table_id = oem_table_id };
+    bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
 
     acpi_table_begin(&table, table_data);
     /* Local APIC Address */
@@ -124,18 +125,43 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
                      IO_APIC_SECONDARY_ADDRESS, IO_APIC_SECONDARY_IRQBASE);
     }
 
-    if (x86ms->apic_xrupt_override) {
-        build_xrupt_override(table_data, 0, 2,
-            0 /* Flags: Conforms to the specifications of the bus */);
-    }
-
-    for (i = 1; i < 16; i++) {
-        if (!(x86ms->pci_irq_mask & (1 << i))) {
-            /* No need for a INT source override structure. */
-            continue;
-        }
-        build_xrupt_override(table_data, i, i,
-            0xd /* Flags: Active high, Level Triggered */);
+    if (level_trigger_unsupported) {
+        /* Force edge trigger */
+        if (x86ms->apic_xrupt_override) {
+            build_xrupt_override(table_data, 0, 2,
+                                 /* Flags: active high, edge triggered */
+                                 1 | (1 << 2));
+        }
+
+        for (i = x86ms->apic_xrupt_override ? 1 : 0; i < 16; i++) {
+            build_xrupt_override(table_data, i, i,
+                                 /* Flags: active high, edge triggered */
+                                 1 | (1 << 2));
+        }
+
+        if (x86ms->ioapic2) {
+            for (i = 0; i < 16; i++) {
+                build_xrupt_override(table_data, IO_APIC_SECONDARY_IRQBASE + i,
+                                     IO_APIC_SECONDARY_IRQBASE + i,
+                                     /* Flags: active high, edge triggered */
+                                     1 | (1 << 2));
+            }
+        }
+    } else {
+        if (x86ms->apic_xrupt_override) {
+            build_xrupt_override(table_data, 0, 2,
+                                 0 /* Flags: Conforms to the specifications of the bus */);
+        }
+
+        for (i = 1; i < 16; i++) {
+            if (!(x86ms->pci_irq_mask & (1 << i))) {
+                /* No need for a INT source override structure. */
+                continue;
+            }
+            build_xrupt_override(table_data, i, i,
+                                 0xd /* Flags: Active high, Level Triggered */);
+
+        }
     }
 
     if (x2apic_mode) {
-- 
2.27.0



* [RFC PATCH v4 32/36] i386/tdx: Don't synchronize guest tsc for TDs
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (30 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 31/36] hw/i386: add option to forcibly report edge trigger in acpi tables Xiaoyao Li
@ 2022-05-12  3:17 ` Xiaoyao Li
  2022-05-24  8:04   ` Gerd Hoffmann
  2022-05-12  3:18 ` [RFC PATCH v4 33/36] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() " Xiaoyao Li
                   ` (3 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:17 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

From: Isaku Yamahata <isaku.yamahata@intel.com>

The TSC of a TD is not accessible, and KVM doesn't allow access to
MSR_IA32_TSC for TDs. To avoid the assert() in kvm_get_tsc, make
kvm_synchronize_all_tsc() a no-op for TDs.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9a1e1dab938f..c79dbff747e8 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -276,7 +276,7 @@ void kvm_synchronize_all_tsc(void)
 {
     CPUState *cpu;
 
-    if (kvm_enabled()) {
+    if (kvm_enabled() && !is_tdx_vm()) {
         CPU_FOREACH(cpu) {
             run_on_cpu(cpu, do_kvm_synchronize_tsc, RUN_ON_CPU_NULL);
         }
-- 
2.27.0



* [RFC PATCH v4 33/36] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() for TDs
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (31 preceding siblings ...)
  2022-05-12  3:17 ` [RFC PATCH v4 32/36] i386/tdx: Don't synchronize guest tsc for TDs Xiaoyao Li
@ 2022-05-12  3:18 ` Xiaoyao Li
  2022-05-24  8:05   ` Gerd Hoffmann
  2022-05-12  3:18 ` [RFC PATCH v4 34/36] i386/tdx: Skip kvm_put_apicbase() " Xiaoyao Li
                   ` (2 subsequent siblings)
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:18 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

For TDs, MSR_IA32_UCODE_REV is the only MSR in kvm_init_msrs() that the
VMM can configure. The features enumerated/controlled by the other MSRs
set in kvm_init_msrs() are not under the VMM's control.

Only configure MSR_IA32_UCODE_REV for TDs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c | 44 ++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index c79dbff747e8..9c5bf075b542 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3125,32 +3125,34 @@ static void kvm_init_msrs(X86CPU *cpu)
     CPUX86State *env = &cpu->env;
 
     kvm_msr_buf_reset(cpu);
-    if (has_msr_arch_capabs) {
-        kvm_msr_entry_add(cpu, MSR_IA32_ARCH_CAPABILITIES,
-                          env->features[FEAT_ARCH_CAPABILITIES]);
-    }
-
-    if (has_msr_core_capabs) {
-        kvm_msr_entry_add(cpu, MSR_IA32_CORE_CAPABILITY,
-                          env->features[FEAT_CORE_CAPABILITY]);
-    }
-
-    if (has_msr_perf_capabs && cpu->enable_pmu) {
-        kvm_msr_entry_add_perf(cpu, env->features);
+
+    if (!is_tdx_vm()) {
+        if (has_msr_arch_capabs) {
+            kvm_msr_entry_add(cpu, MSR_IA32_ARCH_CAPABILITIES,
+                                env->features[FEAT_ARCH_CAPABILITIES]);
+        }
+
+        if (has_msr_core_capabs) {
+            kvm_msr_entry_add(cpu, MSR_IA32_CORE_CAPABILITY,
+                                env->features[FEAT_CORE_CAPABILITY]);
+        }
+
+        if (has_msr_perf_capabs && cpu->enable_pmu) {
+            kvm_msr_entry_add_perf(cpu, env->features);
+        }
+
+        /*
+         * Older kernels do not include VMX MSRs in KVM_GET_MSR_INDEX_LIST, but
+         * all kernels with MSR features should have them.
+         */
+        if (kvm_feature_msrs && cpu_has_vmx(env)) {
+            kvm_msr_entry_add_vmx(cpu, env->features);
+        }
     }
 
     if (has_msr_ucode_rev) {
         kvm_msr_entry_add(cpu, MSR_IA32_UCODE_REV, cpu->ucode_rev);
     }
-
-    /*
-     * Older kernels do not include VMX MSRs in KVM_GET_MSR_INDEX_LIST, but
-     * all kernels with MSR features should have them.
-     */
-    if (kvm_feature_msrs && cpu_has_vmx(env)) {
-        kvm_msr_entry_add_vmx(cpu, env->features);
-    }
-
     assert(kvm_buf_set_msrs(cpu) == 0);
 }
 
-- 
2.27.0



* [RFC PATCH v4 34/36] i386/tdx: Skip kvm_put_apicbase() for TDs
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (32 preceding siblings ...)
  2022-05-12  3:18 ` [RFC PATCH v4 33/36] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() " Xiaoyao Li
@ 2022-05-12  3:18 ` Xiaoyao Li
  2022-05-12  3:18 ` [RFC PATCH v4 35/36] i386/tdx: Don't get/put guest state for TDX VMs Xiaoyao Li
  2022-05-12  3:18 ` [RFC PATCH v4 36/36] docs: Add TDX documentation Xiaoyao Li
  35 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:18 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

KVM doesn't allow writing to MSR_IA32_APICBASE for TDs.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9c5bf075b542..4d520d0e34bd 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2907,6 +2907,11 @@ void kvm_put_apicbase(X86CPU *cpu, uint64_t value)
 {
     int ret;
 
+    /* TODO: Allow accessing guest state for debug TDs. */
+    if (is_tdx_vm()) {
+        return;
+    }
+
     ret = kvm_put_one_msr(cpu, MSR_IA32_APICBASE, value);
     assert(ret == 1);
 }
-- 
2.27.0



* [RFC PATCH v4 35/36] i386/tdx: Don't get/put guest state for TDX VMs
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (33 preceding siblings ...)
  2022-05-12  3:18 ` [RFC PATCH v4 34/36] i386/tdx: Skip kvm_put_apicbase() " Xiaoyao Li
@ 2022-05-12  3:18 ` Xiaoyao Li
  2022-05-12  3:18 ` [RFC PATCH v4 36/36] docs: Add TDX documentation Xiaoyao Li
  35 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:18 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

From: Sean Christopherson <sean.j.christopherson@intel.com>

Don't get/put state of TDX VMs since accessing/mutating guest state of
production TDs is not supported.

Note, it will be allowed for a debug TD. Corresponding support will be
introduced when debug TD support is implemented in the future.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4d520d0e34bd..3e26dacf7807 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4478,6 +4478,11 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 
     assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
 
+    /* TODO: Allow accessing guest state for debug TDs. */
+    if (is_tdx_vm()) {
+        return 0;
+    }
+
     /* must be before kvm_put_nested_state so that EFER.SVME is set */
     ret = has_sregs2 ? kvm_put_sregs2(x86_cpu) : kvm_put_sregs(x86_cpu);
     if (ret < 0) {
@@ -4572,6 +4577,12 @@ int kvm_arch_get_registers(CPUState *cs)
     if (ret < 0) {
         goto out;
     }
+
+    /* TODO: Allow accessing guest state for debug TDs. */
+    if (is_tdx_vm()) {
+        return 0;
+    }
+
     ret = kvm_getput_regs(cpu, 0);
     if (ret < 0) {
         goto out;
-- 
2.27.0



* [RFC PATCH v4 36/36] docs: Add TDX documentation
  2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
                   ` (34 preceding siblings ...)
  2022-05-12  3:18 ` [RFC PATCH v4 35/36] i386/tdx: Don't get/put guest state for TDX VMs Xiaoyao Li
@ 2022-05-12  3:18 ` Xiaoyao Li
  2022-05-12 18:42   ` Isaku Yamahata
  35 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-12  3:18 UTC (permalink / raw)
  To: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake
  Cc: Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, xiaoyao.li

Add docs/system/i386/tdx.rst for TDX support, and list TDX in
confidential-guest-support.rst.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 docs/system/confidential-guest-support.rst |   1 +
 docs/system/i386/tdx.rst                   | 103 +++++++++++++++++++++
 docs/system/target-i386.rst                |   1 +
 3 files changed, 105 insertions(+)
 create mode 100644 docs/system/i386/tdx.rst

diff --git a/docs/system/confidential-guest-support.rst b/docs/system/confidential-guest-support.rst
index 0c490dbda2b7..66129fbab64c 100644
--- a/docs/system/confidential-guest-support.rst
+++ b/docs/system/confidential-guest-support.rst
@@ -38,6 +38,7 @@ Supported mechanisms
 Currently supported confidential guest mechanisms are:
 
 * AMD Secure Encrypted Virtualization (SEV) (see :doc:`i386/amd-memory-encryption`)
+* Intel Trust Domain Extensions (TDX) (see :doc:`i386/tdx`)
 * POWER Protected Execution Facility (PEF) (see :ref:`power-papr-protected-execution-facility-pef`)
 * s390x Protected Virtualization (PV) (see :doc:`s390x/protvirt`)
 
diff --git a/docs/system/i386/tdx.rst b/docs/system/i386/tdx.rst
new file mode 100644
index 000000000000..96d91fea5516
--- /dev/null
+++ b/docs/system/i386/tdx.rst
@@ -0,0 +1,103 @@
+Intel Trust Domain Extensions (TDX)
+===================================
+
+Intel Trust Domain Extensions (TDX) refers to an Intel technology that extends
+Virtual Machine Extensions (VMX) and Multi-Key Total Memory Encryption (MKTME)
+with a new kind of virtual machine guest called a Trust Domain (TD). A TD runs
+in a CPU mode that is designed to protect the confidentiality of its memory
+contents and its CPU state from any other software, including the hosting
+Virtual Machine Monitor (VMM), unless explicitly shared by the TD itself.
+
+Prerequisites
+-------------
+
+To run a TD, the physical machine needs to have the TDX module loaded and
+initialized, and the KVM hypervisor needs to have TDX support enabled. If those
+requirements are met, ``KVM_CAP_VM_TYPES`` reports support for ``KVM_X86_TDX_VM``.
+
+Trust Domain Virtual Firmware (TDVF)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Trust Domain Virtual Firmware (TDVF) is required to provide TD services to boot
+the TD guest OS. TDVF needs to be copied to guest private memory and measured
+before a TD boots.
+
+The VM-scope ``MEMORY_ENCRYPT_OP`` ioctl provides the command
+``KVM_TDX_INIT_MEM_REGION`` to copy the TDVF image to the TD's private memory.
+
+Since TDX doesn't support read-only memslots, TDVF cannot be mapped as a pflash
+device; it effectively works as RAM. The ``-bios`` option is used to load TDVF.
+
+OVMF is the open-source firmware that implements TDVF support. Thus the
+command line to specify and load TDVF is ``-bios OVMF.fd``.
+
+Feature Control
+---------------
+
+Unlike a non-TDX VM, the CPU features (enumerated by CPUID or MSR) of a TD are
+not under full control of the VMM. The VMM can only configure a subset of them,
+via the ``KVM_TDX_INIT_VM`` command of the VM-scope ``MEMORY_ENCRYPT_OP`` ioctl.
+
+There are three types of configurable features:
+
+- Attributes:
+  - PKS (bit 30) controls whether Supervisor Protection Keys is exposed to the
+    TD, which determines the related CPUID bit and CR4 bit;
+  - PERFMON (bit 63) controls whether the PMU is exposed to the TD.
+
+- XSAVE related features (XFAM):
+  XFAM is a 64-bit mask with the same format as XCR0 or the IA32_XSS MSR. It
+  determines the set of extended features available for use by the guest TD.
+
+- CPUID features:
+  Only some bits of some CPUID leaves are directly configurable by the VMM.
+
+Which features can be configured is reported via TDX capabilities.
+
+TDX capabilities
+~~~~~~~~~~~~~~~~
+
+The VM-scope ``MEMORY_ENCRYPT_OP`` ioctl provides the command
+``KVM_TDX_CAPABILITIES`` to get the TDX capabilities from KVM. It returns a
+``struct kvm_tdx_capabilities``, which describes the supported configuration of
+attributes, XFAM and CPUIDs.
+
+Launching a TD (TDX VM)
+-----------------------
+
+To launch a TDX guest:
+
+.. parsed-literal::
+
+    |qemu_system_x86| \\
+        -machine ...,confidential-guest-support=tdx0 \\
+        -object tdx-guest,id=tdx0 \\
+        -bios OVMF.fd
+
+Debugging
+---------
+
+Bit 0 of the TD attributes is the DEBUG bit, which decides whether the TD runs
+in off-TD debug mode. In off-TD debug mode, the TD's VCPU state and private
+memory are accessible via specific SEAMCALLs. This requires KVM to expose APIs
+to invoke those SEAMCALLs, and corresponding QEMU changes.
+
+It's targeted as future work.
+
+Restrictions
+------------
+
+ - No read-only support for private memory;
+
+ - No SMM support: SMM support requires manipulating the guest register state,
+   which is not allowed.
+
+Live Migration
+--------------
+
+TODO
+
+References
+----------
+
+- `TDX Homepage <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html>`__
diff --git a/docs/system/target-i386.rst b/docs/system/target-i386.rst
index 96bf54889a82..16dd4f1a8c80 100644
--- a/docs/system/target-i386.rst
+++ b/docs/system/target-i386.rst
@@ -29,6 +29,7 @@ Architectural features
    i386/kvm-pv
    i386/sgx
    i386/amd-memory-encryption
+   i386/tdx
 
 .. _pcsys_005freq:
 
-- 
2.27.0



* Re: [RFC PATCH v4 06/36] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
  2022-05-12  3:17 ` [RFC PATCH v4 06/36] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES Xiaoyao Li
@ 2022-05-12 17:38   ` Isaku Yamahata
  2022-05-23  8:45   ` Gerd Hoffmann
  1 sibling, 0 replies; 105+ messages in thread
From: Isaku Yamahata @ 2022-05-12 17:38 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:33AM +0800,
Xiaoyao Li <xiaoyao.li@intel.com> wrote:

> KVM provides TDX capabilities via sub command KVM_TDX_CAPABILITIES of
> IOCTL(KVM_MEMORY_ENCRYPT_OP). Get the capabilities when initializing
> TDX context. It will be used to validate user's setting later.
> 
> Besides, introduce the interfaces to invoke TDX "ioctls" at different
> scope (KVM, VM and VCPU) in preparation.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  target/i386/kvm/tdx.c | 85 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 85 insertions(+)
> 
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 77e33ae01147..68bedbad0ebe 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -14,12 +14,97 @@
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
>  #include "qom/object_interfaces.h"
> +#include "sysemu/kvm.h"
>  
>  #include "hw/i386/x86.h"
>  #include "tdx.h"
>  
> +enum tdx_ioctl_level{
> +    TDX_PLATFORM_IOCTL,
> +    TDX_VM_IOCTL,
> +    TDX_VCPU_IOCTL,
> +};
> +
> +static int __tdx_ioctl(void *state, enum tdx_ioctl_level level, int cmd_id,
> +                        __u32 flags, void *data)
> +{
> +    struct kvm_tdx_cmd tdx_cmd;
> +    int r;
> +
> +    memset(&tdx_cmd, 0x0, sizeof(tdx_cmd));
> +
> +    tdx_cmd.id = cmd_id;
> +    tdx_cmd.flags = flags;
> +    tdx_cmd.data = (__u64)(unsigned long)data;
> +
> +    switch (level) {
> +    case TDX_PLATFORM_IOCTL:
> +        r = kvm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
> +        break;
> +    case TDX_VM_IOCTL:
> +        r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
> +        break;
> +    case TDX_VCPU_IOCTL:
> +        r = kvm_vcpu_ioctl(state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
> +        break;
> +    default:
> +        error_report("Invalid tdx_ioctl_level %d", level);
> +        exit(1);
> +    }
> +
> +    return r;
> +}
> +
> +static inline int tdx_platform_ioctl(int cmd_id, __u32 metadata, void *data)

nitpick: because metadata was renamed to flags for clarity, please update
these parameter names as well.

> +{
> +    return __tdx_ioctl(NULL, TDX_PLATFORM_IOCTL, cmd_id, metadata, data);
> +}
> +
> +static inline int tdx_vm_ioctl(int cmd_id, __u32 metadata, void *data)
> +{
> +    return __tdx_ioctl(NULL, TDX_VM_IOCTL, cmd_id, metadata, data);
> +}
> +
> +static inline int tdx_vcpu_ioctl(void *vcpu_fd, int cmd_id, __u32 metadata,
> +                                 void *data)
> +{
> +    return  __tdx_ioctl(vcpu_fd, TDX_VCPU_IOCTL, cmd_id, metadata, data);
> +}
> +
> +static struct kvm_tdx_capabilities *tdx_caps;
> +
> +static void get_tdx_capabilities(void)
> +{
> +    struct kvm_tdx_capabilities *caps;
> +    int max_ent = 1;

Because we know the number of entries for TDX 1.0, we can start with a better
initial value, with a comment explaining it.
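
For example, a minimal sketch of the retry loop taking the initial size as a
parameter (untested against real KVM; fake_get_caps(), struct fake_caps and
FAKE_KERNEL_ENTRIES are hypothetical stand-ins for the ioctl, the uAPI
structure and the number of entries reported, not real interfaces):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the uAPI structures; the layout is illustrative. */
struct fake_cpuid_config {
    unsigned int leaf, sub_leaf, eax, ebx, ecx, edx;
};

struct fake_caps {
    unsigned int nr_cpuid_configs;
    struct fake_cpuid_config cpuid_configs[];
};

/* Assumed number of entries the "kernel" wants to report. */
#define FAKE_KERNEL_ENTRIES 6

/* Hypothetical stand-in for tdx_platform_ioctl(KVM_TDX_CAPABILITIES, ...). */
static int fake_get_caps(struct fake_caps *caps)
{
    if (caps->nr_cpuid_configs < FAKE_KERNEL_ENTRIES) {
        return -E2BIG;          /* caller's buffer is too small */
    }
    caps->nr_cpuid_configs = FAKE_KERNEL_ENTRIES;
    return 0;
}

/*
 * Query the capabilities, doubling the buffer on -E2BIG.
 * Returns a heap-allocated buffer (caller frees) or NULL on error.
 * *attempts receives the number of calls that were needed.
 */
static struct fake_caps *get_caps(unsigned int initial_entries, int *attempts)
{
    unsigned int max_ent = initial_entries;
    struct fake_caps *caps;
    int r;

    /* Doubling zero would never grow the buffer. */
    assert(initial_entries > 0);

    *attempts = 0;
    for (;;) {
        size_t size = sizeof(*caps) +
                      max_ent * sizeof(caps->cpuid_configs[0]);

        caps = calloc(1, size);
        if (!caps) {
            return NULL;
        }
        caps->nr_cpuid_configs = max_ent;

        (*attempts)++;
        r = fake_get_caps(caps);
        if (r == 0) {
            return caps;
        }

        free(caps);
        if (r != -E2BIG) {
            return NULL;        /* any other error is fatal */
        }
        max_ent *= 2;
    }
}
```

With these assumed numbers, starting from 1 needs four calls (1, 2, 4, 8
entries), while starting from the known entry count needs only one.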


> +    int r, size;
> +
> +    do {
> +        size = sizeof(struct kvm_tdx_capabilities) +
> +               max_ent * sizeof(struct kvm_tdx_cpuid_config);
> +        caps = g_malloc0(size);
> +        caps->nr_cpuid_configs = max_ent;
> +
> +        r = tdx_platform_ioctl(KVM_TDX_CAPABILITIES, 0, caps);
> +        if (r == -E2BIG) {
> +            g_free(caps);
> +            max_ent *= 2;
> +        } else if (r < 0) {
> +            error_report("KVM_TDX_CAPABILITIES failed: %s\n", strerror(-r));
> +            exit(1);
> +        }
> +    }
> +    while (r == -E2BIG);
> +
> +    tdx_caps = caps;
> +}
> +
>  int tdx_kvm_init(MachineState *ms, Error **errp)
>  {
> +    if (!tdx_caps) {
> +        get_tdx_capabilities();
> +    }
> +
>      return 0;
>  }
>  
> -- 
> 2.27.0
> 
> 

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


* Re: [RFC PATCH v4 10/36] i386/kvm: Move architectural CPUID leaf generation to separate helper
  2022-05-12  3:17 ` [RFC PATCH v4 10/36] i386/kvm: Move architectural CPUID leaf generation to separate helper Xiaoyao Li
@ 2022-05-12 17:48   ` Isaku Yamahata
  2022-05-13  0:37     ` Xiaoyao Li
  2022-05-23  9:06     ` Gerd Hoffmann
  0 siblings, 2 replies; 105+ messages in thread
From: Isaku Yamahata @ 2022-05-12 17:48 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:37AM +0800,
Xiaoyao Li <xiaoyao.li@intel.com> wrote:

> diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
> index b434feaa6b1d..5c7972f617e8 100644
> --- a/target/i386/kvm/kvm_i386.h
> +++ b/target/i386/kvm/kvm_i386.h
> @@ -24,6 +24,10 @@
>  #define kvm_ioapic_in_kernel() \
>      (kvm_irqchip_in_kernel() && !kvm_irqchip_is_split())
>  
> +#define KVM_MAX_CPUID_ENTRIES  100

On the Linux side, the value was bumped to 256. Let's opportunistically make
it the same here.

3f4e3eb417b1 KVM: x86: bump KVM_MAX_CPUID_ENTRIES

> +uint32_t kvm_x86_arch_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
> +                            uint32_t cpuid_i);
> +
>  #else
>  
>  #define kvm_pit_in_kernel()      0
> -- 
> 2.27.0
> 
> 

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


* Re: [RFC PATCH v4 09/36] KVM: Introduce kvm_arch_pre_create_vcpu()
  2022-05-12  3:17 ` [RFC PATCH v4 09/36] KVM: Introduce kvm_arch_pre_create_vcpu() Xiaoyao Li
@ 2022-05-12 17:50   ` Isaku Yamahata
  2022-05-13  0:15     ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Isaku Yamahata @ 2022-05-12 17:50 UTC (permalink / raw)
  To: Xiaoyao Li, g
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:36AM +0800,
Xiaoyao Li <xiaoyao.li@intel.com> wrote:

> Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent
> work prior to create any vcpu. This is for i386 TDX because it needs
> call TDX_INIT_VM before creating any vcpu.

Because "11/36 i386/tdx: Initialize TDX before creating TD vcpus" uses
kvm_arch_pre_create_vcpu() (and 10/36 doesn't use it), please move this patch
right before 11/36 (i.e. swap 09/36 and 10/36).

Thanks,

> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  accel/kvm/kvm-all.c  | 12 ++++++++++++
>  include/sysemu/kvm.h |  1 +
>  2 files changed, 13 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 32e177bd26b4..e6fa9d23207a 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -457,6 +457,11 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
>      return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
>  }
>  
> +int __attribute__ ((weak)) kvm_arch_pre_create_vcpu(CPUState *cpu)
> +{
> +    return 0;
> +}
> +
>  int kvm_init_vcpu(CPUState *cpu, Error **errp)
>  {
>      KVMState *s = kvm_state;
> @@ -465,6 +470,13 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>  
>      trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  
> +    ret = kvm_arch_pre_create_vcpu(cpu);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret,
> +                         "kvm_init_vcpu: kvm_arch_pre_create_vcpu() failed");
> +        goto err;
> +    }
> +
>      ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
>      if (ret < 0) {
>          error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index a783c7886811..0e94031ab7c7 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -373,6 +373,7 @@ int kvm_arch_put_registers(CPUState *cpu, int level);
>  
>  int kvm_arch_init(MachineState *ms, KVMState *s);
>  
> +int kvm_arch_pre_create_vcpu(CPUState *cpu);
>  int kvm_arch_init_vcpu(CPUState *cpu);
>  int kvm_arch_destroy_vcpu(CPUState *cpu);
>  
> -- 
> 2.27.0
> 
> 

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


* Re: [RFC PATCH v4 14/36] i386/tdx: Implement user specified tsc frequency
  2022-05-12  3:17 ` [RFC PATCH v4 14/36] i386/tdx: Implement user specified tsc frequency Xiaoyao Li
@ 2022-05-12 18:04   ` Isaku Yamahata
  2022-05-13  0:46     ` Xiaoyao Li
  2022-05-23  9:43   ` Gerd Hoffmann
  1 sibling, 1 reply; 105+ messages in thread
From: Isaku Yamahata @ 2022-05-12 18:04 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:41AM +0800,
Xiaoyao Li <xiaoyao.li@intel.com> wrote:

> Reuse "-cpu,tsc-frequency=" to get user wanted tsc frequency and pass it
> to KVM_TDX_INIT_VM.
> 
> Besides, sanity check the tsc frequency to be in the legal range and
> legal granularity (required by TDX module).

Just to make sure: you didn't use the VM-scoped KVM_SET_TSC_KHZ because the
KVM-side patch is still in kvm/queue?  Once the patch lands, we should use it.
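
For reference, the range and granularity check described in the commit message
can be sketched as a standalone helper. This is only an illustration under
assumptions: tdx_check_tsc_khz() and TDX_TSC_GRANULARITY_KHZ are hypothetical
names, and a value of 0 is treated as "not specified by the user", which is
how the quoted hunk behaves.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Limits taken from the patch. */
#define TDX_MIN_TSC_FREQUENCY_KHZ   (100 * 1000)
#define TDX_MAX_TSC_FREQUENCY_KHZ   (10 * 1000 * 1000)
/* Hypothetical name for the 25 MHz granularity the patch checks inline. */
#define TDX_TSC_GRANULARITY_KHZ     (25 * 1000)

/*
 * Hypothetical helper: returns 0 if tsc_khz is acceptable for a TD,
 * -EINVAL otherwise. 0 means the user did not request a frequency.
 */
static int tdx_check_tsc_khz(int64_t tsc_khz)
{
    if (tsc_khz == 0) {
        return 0;
    }

    /* Must lie within [100 MHz, 10 GHz]. */
    if (tsc_khz < TDX_MIN_TSC_FREQUENCY_KHZ ||
        tsc_khz > TDX_MAX_TSC_FREQUENCY_KHZ) {
        return -EINVAL;
    }

    /* Must be a multiple of 25 MHz. */
    if (tsc_khz % TDX_TSC_GRANULARITY_KHZ) {
        return -EINVAL;
    }

    return 0;
}
```

For instance, 2500000 kHz passes, while 2510000 kHz is in range but fails the
25 MHz granularity check.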

Thanks,

> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  target/i386/kvm/kvm.c |  8 ++++++++
>  target/i386/kvm/tdx.c | 18 ++++++++++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index f2d7c3cf59ac..c51125ab200f 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -818,6 +818,14 @@ static int kvm_arch_set_tsc_khz(CPUState *cs)
>      int r, cur_freq;
>      bool set_ioctl = false;
>  
> +    /*
> +     * TD guest's TSC is immutable, it cannot be set/changed via
> +     * KVM_SET_TSC_KHZ, but only be initialized via KVM_TDX_INIT_VM
> +     */
> +    if (is_tdx_vm()) {
> +        return 0;
> +    }
> +
>      if (!env->tsc_khz) {
>          return 0;
>      }
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 9f2cdf640b5c..622efc409438 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -35,6 +35,9 @@
>  #define TDX_TD_ATTRIBUTES_PKS               BIT_ULL(30)
>  #define TDX_TD_ATTRIBUTES_PERFMON           BIT_ULL(63)
>  
> +#define TDX_MIN_TSC_FREQUENCY_KHZ   (100 * 1000)
> +#define TDX_MAX_TSC_FREQUENCY_KHZ   (10 * 1000 * 1000)
> +
>  static TdxGuest *tdx_guest;
>  
>  /* It's valid after kvm_confidential_guest_init()->kvm_tdx_init() */
> @@ -211,6 +214,20 @@ int tdx_pre_create_vcpu(CPUState *cpu)
>          goto out;
>      }
>  
> +    r = -EINVAL;
> +    if (env->tsc_khz && (env->tsc_khz < TDX_MIN_TSC_FREQUENCY_KHZ ||
> +                         env->tsc_khz > TDX_MAX_TSC_FREQUENCY_KHZ)) {
> +        error_report("Invalid TSC %ld KHz, must specify cpu_frequency between [%d, %d] kHz",
> +                      env->tsc_khz, TDX_MIN_TSC_FREQUENCY_KHZ,
> +                      TDX_MAX_TSC_FREQUENCY_KHZ);
> +        goto out;
> +    }
> +
> +    if (env->tsc_khz % (25 * 1000)) {
> +        error_report("Invalid TSC %ld KHz, it must be multiple of 25MHz", env->tsc_khz);
> +        goto out;
> +    }
> +
>      r = setup_td_guest_attributes(x86cpu);
>      if (r) {
>          goto out;
> @@ -221,6 +238,7 @@ int tdx_pre_create_vcpu(CPUState *cpu)
>  
>      init_vm.attributes = tdx_guest->attributes;
>      init_vm.max_vcpus = ms->smp.cpus;
> +    init_vm.tsc_khz = env->tsc_khz;
>  
>      r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);
>      if (r < 0) {
> -- 
> 2.27.0
> 
> 

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


* Re: [RFC PATCH v4 23/36] i386/tdx: Setup the TD HOB list
  2022-05-12  3:17 ` [RFC PATCH v4 23/36] i386/tdx: Setup the TD HOB list Xiaoyao Li
@ 2022-05-12 18:33   ` Isaku Yamahata
  2022-05-24  7:56   ` Gerd Hoffmann
  1 sibling, 0 replies; 105+ messages in thread
From: Isaku Yamahata @ 2022-05-12 18:33 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:50AM +0800,
Xiaoyao Li <xiaoyao.li@intel.com> wrote:

> The TD HOB list is used to pass the information from VMM to TDVF. The TD
> HOB must include PHIT HOB and Resource Descriptor HOB. More details can
> be found in TDVF specification and PI specification.
> 
> Build the TD HOB in TDX's machine_init_done callback.

Because HOB is introduced here for the first time, please expand the acronym
(Hand-Off Block).


> Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  hw/i386/meson.build   |   2 +-
>  hw/i386/tdvf-hob.c    | 212 ++++++++++++++++++++++++++++++++++++++++++
>  hw/i386/tdvf-hob.h    |  25 +++++
>  hw/i386/uefi.h        | 198 +++++++++++++++++++++++++++++++++++++++
>  target/i386/kvm/tdx.c |  16 ++++
>  5 files changed, 452 insertions(+), 1 deletion(-)
>  create mode 100644 hw/i386/tdvf-hob.c
>  create mode 100644 hw/i386/tdvf-hob.h
>  create mode 100644 hw/i386/uefi.h
> 
> diff --git a/hw/i386/meson.build b/hw/i386/meson.build
> index 97f3b50503b0..b59e0d35bba3 100644
> --- a/hw/i386/meson.build
> +++ b/hw/i386/meson.build
> @@ -28,7 +28,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
>    'port92.c'))
>  i386_ss.add(when: 'CONFIG_X86_FW_OVMF', if_true: files('pc_sysfw_ovmf.c'),
>                                          if_false: files('pc_sysfw_ovmf-stubs.c'))
> -i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c'))
> +i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c', 'tdvf-hob.c'))
>  
>  subdir('kvm')
>  subdir('xen')
> diff --git a/hw/i386/tdvf-hob.c b/hw/i386/tdvf-hob.c
> new file mode 100644
> index 000000000000..31160e9f95c5
> --- /dev/null
> +++ b/hw/i386/tdvf-hob.c
> @@ -0,0 +1,212 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> +
> + * Copyright (c) 2020 Intel Corporation
> + * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
> + *                        <isaku.yamahata at intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "e820_memory_layout.h"
> +#include "hw/i386/pc.h"
> +#include "hw/i386/x86.h"
> +#include "hw/pci/pcie_host.h"
> +#include "sysemu/kvm.h"
> +#include "tdvf-hob.h"
> +#include "uefi.h"
> +
> +typedef struct TdvfHob {
> +    hwaddr hob_addr;
> +    void *ptr;
> +    int size;
> +
> +    /* working area */
> +    void *current;
> +    void *end;
> +} TdvfHob;
> +
> +static uint64_t tdvf_current_guest_addr(const TdvfHob *hob)
> +{
> +    return hob->hob_addr + (hob->current - hob->ptr);
> +}
> +
> +static void tdvf_align(TdvfHob *hob, size_t align)
> +{
> +    hob->current = QEMU_ALIGN_PTR_UP(hob->current, align);
> +}
> +
> +static void *tdvf_get_area(TdvfHob *hob, uint64_t size)
> +{
> +    void *ret;
> +
> +    if (hob->current + size > hob->end) {
> +        error_report("TD_HOB overrun, size = 0x%" PRIx64, size);
> +        exit(1);
> +    }
> +
> +    ret = hob->current;
> +    hob->current += size;
> +    tdvf_align(hob, 8);
> +    return ret;
> +}
> +
> +static void tdvf_hob_add_mmio_resource(TdvfHob *hob, uint64_t start,
> +                                       uint64_t end)
> +{
> +    EFI_HOB_RESOURCE_DESCRIPTOR *region;
> +
> +    if (!start) {
> +        return;
> +    }
> +
> +    region = tdvf_get_area(hob, sizeof(*region));
> +    *region = (EFI_HOB_RESOURCE_DESCRIPTOR) {
> +        .Header = {
> +            .HobType = EFI_HOB_TYPE_RESOURCE_DESCRIPTOR,
> +            .HobLength = cpu_to_le16(sizeof(*region)),
> +            .Reserved = cpu_to_le32(0),
> +        },
> +        .Owner = EFI_HOB_OWNER_ZERO,
> +        .ResourceType = cpu_to_le32(EFI_RESOURCE_MEMORY_MAPPED_IO),
> +        .ResourceAttribute = cpu_to_le32(EFI_RESOURCE_ATTRIBUTE_TDVF_MMIO),
> +        .PhysicalStart = cpu_to_le64(start),
> +        .ResourceLength = cpu_to_le64(end - start),
> +    };
> +}
> +
> +static void tdvf_hob_add_mmio_resources(TdvfHob *hob)
> +{
> +    MachineState *ms = MACHINE(qdev_get_machine());
> +    X86MachineState *x86ms = X86_MACHINE(ms);
> +    PCIHostState *pci_host;
> +    uint64_t start, end;
> +    uint64_t mcfg_base, mcfg_size;
> +    Object *host;
> +
> +    /* Effectively PCI hole + other MMIO devices. */
> +    tdvf_hob_add_mmio_resource(hob, x86ms->below_4g_mem_size,
> +                               APIC_DEFAULT_ADDRESS);
> +
> +    /* Stolen from acpi_get_i386_pci_host(), there's gotta be an easier way. */
> +    pci_host = OBJECT_CHECK(PCIHostState,
> +                            object_resolve_path("/machine/i440fx", NULL),
> +                            TYPE_PCI_HOST_BRIDGE);
> +    if (!pci_host) {
> +        pci_host = OBJECT_CHECK(PCIHostState,
> +                                object_resolve_path("/machine/q35", NULL),
> +                                TYPE_PCI_HOST_BRIDGE);
> +    }
> +    g_assert(pci_host);
> +
> +    host = OBJECT(pci_host);
> +
> +    /* PCI hole above 4gb. */
> +    start = object_property_get_uint(host, PCI_HOST_PROP_PCI_HOLE64_START,
> +                                     NULL);
> +    end = object_property_get_uint(host, PCI_HOST_PROP_PCI_HOLE64_END, NULL);
> +    tdvf_hob_add_mmio_resource(hob, start, end);
> +
> +    /* MMCFG region */
> +    mcfg_base = object_property_get_uint(host, PCIE_HOST_MCFG_BASE, NULL);
> +    mcfg_size = object_property_get_uint(host, PCIE_HOST_MCFG_SIZE, NULL);
> +    if (mcfg_base && mcfg_base != PCIE_BASE_ADDR_UNMAPPED && mcfg_size) {
> +        tdvf_hob_add_mmio_resource(hob, mcfg_base, mcfg_base + mcfg_size);
> +    }
> +}
> +
> +static void tdvf_hob_add_memory_resources(TdxGuest *tdx, TdvfHob *hob)
> +{
> +    EFI_HOB_RESOURCE_DESCRIPTOR *region;
> +    EFI_RESOURCE_ATTRIBUTE_TYPE attr;
> +    EFI_RESOURCE_TYPE resource_type;
> +
> +    TdxRamEntry *e;
> +    int i;
> +
> +    for (i = 0; i < tdx->nr_ram_entries; i++) {
> +        e = &tdx->ram_entries[i];
> +
> +        if (e->type == TDX_RAM_UNACCEPTED) {
> +            resource_type = EFI_RESOURCE_MEMORY_UNACCEPTED;
> +            attr = EFI_RESOURCE_ATTRIBUTE_TDVF_UNACCEPTED;
> +        } else if (e->type == TDX_RAM_ADDED){
> +            resource_type = EFI_RESOURCE_SYSTEM_MEMORY;
> +            attr = EFI_RESOURCE_ATTRIBUTE_TDVF_PRIVATE;
> +        } else {
> +            error_report("unknown TDXRAMENTRY type %d", e->type);
> +            exit(1);
> +        }
> +
> +        region = tdvf_get_area(hob, sizeof(*region));
> +        *region = (EFI_HOB_RESOURCE_DESCRIPTOR) {
> +            .Header = {
> +                .HobType = EFI_HOB_TYPE_RESOURCE_DESCRIPTOR,
> +                .HobLength = cpu_to_le16(sizeof(*region)),
> +                .Reserved = cpu_to_le32(0),
> +            },
> +            .Owner = EFI_HOB_OWNER_ZERO,
> +            .ResourceType = cpu_to_le32(resource_type),
> +            .ResourceAttribute = cpu_to_le32(attr),
> +            .PhysicalStart = e->address,

nitpick: cpu_to_le64() for consistency. My bad.

> +            .ResourceLength = e->length,

ditto.
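
To illustrate the nit: on a little-endian x86 host cpu_to_le64() should be a
no-op, so the request is about consistency with the other fields, not a
functional fix. A minimal sketch of what "store as little-endian regardless of
host byte order" means, using hypothetical helpers store_le64()/load_le64()
that are not part of this patch or of QEMU:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: write a 64-bit value into a
 * byte buffer in little-endian order, independent of host endianness.
 */
static void store_le64(uint8_t *dst, uint64_t value)
{
    assert(dst != NULL);
    for (size_t i = 0; i < 8; i++) {
        /* Byte i holds bits [8*i, 8*i + 7] of the value. */
        dst[i] = (uint8_t)(value >> (8 * i));
    }
}

/* Hypothetical counterpart: read the value back, again host independent. */
static uint64_t load_le64(const uint8_t *src)
{
    uint64_t value = 0;

    assert(src != NULL);
    for (size_t i = 0; i < 8; i++) {
        value |= (uint64_t)src[i] << (8 * i);
    }
    return value;
}
```

With PhysicalStart and ResourceLength wrapped the same way as the other
descriptor fields, the HOB layout no longer depends on the host being
little-endian.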

Thanks,

> +        };
> +    }
> +}
> +
> +void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *td_hob)
> +{
> +    TdvfHob hob = {
> +        .hob_addr = td_hob->address,
> +        .size = td_hob->size,
> +        .ptr = td_hob->mem_ptr,
> +
> +        .current = td_hob->mem_ptr,
> +        .end = td_hob->mem_ptr + td_hob->size,
> +    };
> +
> +    EFI_HOB_GENERIC_HEADER *last_hob;
> +    EFI_HOB_HANDOFF_INFO_TABLE *hit;
> +
> +    /* Note, Efi{Free}Memory{Bottom,Top} are ignored, leave 'em zeroed. */
> +    hit = tdvf_get_area(&hob, sizeof(*hit));
> +    *hit = (EFI_HOB_HANDOFF_INFO_TABLE) {
> +        .Header = {
> +            .HobType = EFI_HOB_TYPE_HANDOFF,
> +            .HobLength = cpu_to_le16(sizeof(*hit)),
> +            .Reserved = cpu_to_le32(0),
> +        },
> +        .Version = cpu_to_le32(EFI_HOB_HANDOFF_TABLE_VERSION),
> +        .BootMode = cpu_to_le32(0),
> +        .EfiMemoryTop = cpu_to_le64(0),
> +        .EfiMemoryBottom = cpu_to_le64(0),
> +        .EfiFreeMemoryTop = cpu_to_le64(0),
> +        .EfiFreeMemoryBottom = cpu_to_le64(0),
> +        .EfiEndOfHobList = cpu_to_le64(0), /* initialized later */
> +    };
> +
> +    tdvf_hob_add_memory_resources(tdx, &hob);
> +
> +    tdvf_hob_add_mmio_resources(&hob);
> +
> +    last_hob = tdvf_get_area(&hob, sizeof(*last_hob));
> +    *last_hob =  (EFI_HOB_GENERIC_HEADER) {
> +        .HobType = EFI_HOB_TYPE_END_OF_HOB_LIST,
> +        .HobLength = cpu_to_le16(sizeof(*last_hob)),
> +        .Reserved = cpu_to_le32(0),
> +    };
> +    hit->EfiEndOfHobList = tdvf_current_guest_addr(&hob);
> +}
> diff --git a/hw/i386/tdvf-hob.h b/hw/i386/tdvf-hob.h
> new file mode 100644
> index 000000000000..f0494e8c4af8
> --- /dev/null
> +++ b/hw/i386/tdvf-hob.h
> @@ -0,0 +1,25 @@
> +#ifndef HW_I386_TD_HOB_H
> +#define HW_I386_TD_HOB_H
> +
> +#include "hw/i386/tdvf.h"
> +#include "hw/i386/uefi.h"
> +#include "target/i386/kvm/tdx.h"
> +
> +void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *td_hob);
> +
> +#define EFI_RESOURCE_ATTRIBUTE_TDVF_PRIVATE     \
> +    (EFI_RESOURCE_ATTRIBUTE_PRESENT |           \
> +     EFI_RESOURCE_ATTRIBUTE_INITIALIZED |       \
> +     EFI_RESOURCE_ATTRIBUTE_TESTED)
> +
> +#define EFI_RESOURCE_ATTRIBUTE_TDVF_UNACCEPTED  \
> +    (EFI_RESOURCE_ATTRIBUTE_PRESENT |           \
> +     EFI_RESOURCE_ATTRIBUTE_INITIALIZED |       \
> +     EFI_RESOURCE_ATTRIBUTE_TESTED)
> +
> +#define EFI_RESOURCE_ATTRIBUTE_TDVF_MMIO        \
> +    (EFI_RESOURCE_ATTRIBUTE_PRESENT     |       \
> +     EFI_RESOURCE_ATTRIBUTE_INITIALIZED |       \
> +     EFI_RESOURCE_ATTRIBUTE_UNCACHEABLE)
> +
> +#endif
> diff --git a/hw/i386/uefi.h b/hw/i386/uefi.h
> new file mode 100644
> index 000000000000..b15aba796156
> --- /dev/null
> +++ b/hw/i386/uefi.h
> @@ -0,0 +1,198 @@
> +/*
> + * Copyright (C) 2020 Intel Corporation
> + *
> + * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
> + *                        <isaku.yamahata at intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + *
> + */
> +
> +#ifndef HW_I386_UEFI_H
> +#define HW_I386_UEFI_H
> +
> +/***************************************************************************/
> +/*
> + * basic EFI definitions
> + * supplemented with UEFI Specification Version 2.8 (Errata A)
> + * released February 2020
> + */
> +/* UEFI integer is little endian */
> +
> +typedef struct {
> +    uint32_t Data1;
> +    uint16_t Data2;
> +    uint16_t Data3;
> +    uint8_t Data4[8];
> +} EFI_GUID;
> +
> +typedef enum {
> +    EfiReservedMemoryType,
> +    EfiLoaderCode,
> +    EfiLoaderData,
> +    EfiBootServicesCode,
> +    EfiBootServicesData,
> +    EfiRuntimeServicesCode,
> +    EfiRuntimeServicesData,
> +    EfiConventionalMemory,
> +    EfiUnusableMemory,
> +    EfiACPIReclaimMemory,
> +    EfiACPIMemoryNVS,
> +    EfiMemoryMappedIO,
> +    EfiMemoryMappedIOPortSpace,
> +    EfiPalCode,
> +    EfiPersistentMemory,
> +    EfiUnacceptedMemoryType,
> +    EfiMaxMemoryType
> +} EFI_MEMORY_TYPE;
> +
> +#define EFI_HOB_HANDOFF_TABLE_VERSION 0x0009
> +
> +#define EFI_HOB_TYPE_HANDOFF              0x0001
> +#define EFI_HOB_TYPE_MEMORY_ALLOCATION    0x0002
> +#define EFI_HOB_TYPE_RESOURCE_DESCRIPTOR  0x0003
> +#define EFI_HOB_TYPE_GUID_EXTENSION       0x0004
> +#define EFI_HOB_TYPE_FV                   0x0005
> +#define EFI_HOB_TYPE_CPU                  0x0006
> +#define EFI_HOB_TYPE_MEMORY_POOL          0x0007
> +#define EFI_HOB_TYPE_FV2                  0x0009
> +#define EFI_HOB_TYPE_LOAD_PEIM_UNUSED     0x000A
> +#define EFI_HOB_TYPE_UEFI_CAPSULE         0x000B
> +#define EFI_HOB_TYPE_FV3                  0x000C
> +#define EFI_HOB_TYPE_UNUSED               0xFFFE
> +#define EFI_HOB_TYPE_END_OF_HOB_LIST      0xFFFF
> +
> +typedef struct {
> +    uint16_t HobType;
> +    uint16_t HobLength;
> +    uint32_t Reserved;
> +} EFI_HOB_GENERIC_HEADER;
> +
> +typedef uint64_t EFI_PHYSICAL_ADDRESS;
> +typedef uint32_t EFI_BOOT_MODE;
> +
> +typedef struct {
> +    EFI_HOB_GENERIC_HEADER Header;
> +    uint32_t Version;
> +    EFI_BOOT_MODE BootMode;
> +    EFI_PHYSICAL_ADDRESS EfiMemoryTop;
> +    EFI_PHYSICAL_ADDRESS EfiMemoryBottom;
> +    EFI_PHYSICAL_ADDRESS EfiFreeMemoryTop;
> +    EFI_PHYSICAL_ADDRESS EfiFreeMemoryBottom;
> +    EFI_PHYSICAL_ADDRESS EfiEndOfHobList;
> +} EFI_HOB_HANDOFF_INFO_TABLE;
> +
> +#define EFI_RESOURCE_SYSTEM_MEMORY          0x00000000
> +#define EFI_RESOURCE_MEMORY_MAPPED_IO       0x00000001
> +#define EFI_RESOURCE_IO                     0x00000002
> +#define EFI_RESOURCE_FIRMWARE_DEVICE        0x00000003
> +#define EFI_RESOURCE_MEMORY_MAPPED_IO_PORT  0x00000004
> +#define EFI_RESOURCE_MEMORY_RESERVED        0x00000005
> +#define EFI_RESOURCE_IO_RESERVED            0x00000006
> +#define EFI_RESOURCE_MEMORY_UNACCEPTED      0x00000007
> +#define EFI_RESOURCE_MAX_MEMORY_TYPE        0x00000008
> +
> +#define EFI_RESOURCE_ATTRIBUTE_PRESENT                  0x00000001
> +#define EFI_RESOURCE_ATTRIBUTE_INITIALIZED              0x00000002
> +#define EFI_RESOURCE_ATTRIBUTE_TESTED                   0x00000004
> +#define EFI_RESOURCE_ATTRIBUTE_SINGLE_BIT_ECC           0x00000008
> +#define EFI_RESOURCE_ATTRIBUTE_MULTIPLE_BIT_ECC         0x00000010
> +#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_1           0x00000020
> +#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_2           0x00000040
> +#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTED           0x00000080
> +#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTED          0x00000100
> +#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTED      0x00000200
> +#define EFI_RESOURCE_ATTRIBUTE_UNCACHEABLE              0x00000400
> +#define EFI_RESOURCE_ATTRIBUTE_WRITE_COMBINEABLE        0x00000800
> +#define EFI_RESOURCE_ATTRIBUTE_WRITE_THROUGH_CACHEABLE  0x00001000
> +#define EFI_RESOURCE_ATTRIBUTE_WRITE_BACK_CACHEABLE     0x00002000
> +#define EFI_RESOURCE_ATTRIBUTE_16_BIT_IO                0x00004000
> +#define EFI_RESOURCE_ATTRIBUTE_32_BIT_IO                0x00008000
> +#define EFI_RESOURCE_ATTRIBUTE_64_BIT_IO                0x00010000
> +#define EFI_RESOURCE_ATTRIBUTE_UNCACHED_EXPORTED        0x00020000
> +#define EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTED      0x00040000
> +#define EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTABLE    0x00080000
> +#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTABLE         0x00100000
> +#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTABLE        0x00200000
> +#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTABLE    0x00400000
> +#define EFI_RESOURCE_ATTRIBUTE_PERSISTENT               0x00800000
> +#define EFI_RESOURCE_ATTRIBUTE_PERSISTABLE              0x01000000
> +#define EFI_RESOURCE_ATTRIBUTE_MORE_RELIABLE            0x02000000
> +
> +typedef uint32_t EFI_RESOURCE_TYPE;
> +typedef uint32_t EFI_RESOURCE_ATTRIBUTE_TYPE;
> +
> +typedef struct {
> +    EFI_HOB_GENERIC_HEADER Header;
> +    EFI_GUID Owner;
> +    EFI_RESOURCE_TYPE ResourceType;
> +    EFI_RESOURCE_ATTRIBUTE_TYPE ResourceAttribute;
> +    EFI_PHYSICAL_ADDRESS PhysicalStart;
> +    uint64_t ResourceLength;
> +} EFI_HOB_RESOURCE_DESCRIPTOR;
> +
> +typedef struct {
> +    EFI_HOB_GENERIC_HEADER Header;
> +    EFI_GUID Name;
> +
> +    /* guid specific data follows */
> +} EFI_HOB_GUID_TYPE;
> +
> +typedef struct {
> +    EFI_HOB_GENERIC_HEADER Header;
> +    EFI_PHYSICAL_ADDRESS BaseAddress;
> +    uint64_t Length;
> +} EFI_HOB_FIRMWARE_VOLUME;
> +
> +typedef struct {
> +    EFI_HOB_GENERIC_HEADER Header;
> +    EFI_PHYSICAL_ADDRESS BaseAddress;
> +    uint64_t Length;
> +    EFI_GUID FvName;
> +    EFI_GUID FileName;
> +} EFI_HOB_FIRMWARE_VOLUME2;
> +
> +typedef struct {
> +    EFI_HOB_GENERIC_HEADER Header;
> +    EFI_PHYSICAL_ADDRESS BaseAddress;
> +    uint64_t Length;
> +    uint32_t AuthenticationStatus;
> +    bool ExtractedFv;
> +    EFI_GUID FvName;
> +    EFI_GUID FileName;
> +} EFI_HOB_FIRMWARE_VOLUME3;
> +
> +typedef struct {
> +    EFI_HOB_GENERIC_HEADER Header;
> +    uint8_t SizeOfMemorySpace;
> +    uint8_t SizeOfIoSpace;
> +    uint8_t Reserved[6];
> +} EFI_HOB_CPU;
> +
> +typedef struct {
> +    EFI_HOB_GENERIC_HEADER Header;
> +} EFI_HOB_MEMORY_POOL;
> +
> +typedef struct {
> +    EFI_HOB_GENERIC_HEADER Header;
> +
> +    EFI_PHYSICAL_ADDRESS BaseAddress;
> +    uint64_t Length;
> +} EFI_HOB_UEFI_CAPSULE;
> +
> +#define EFI_HOB_OWNER_ZERO                                      \
> +    ((EFI_GUID){ 0x00000000, 0x0000, 0x0000,                    \
> +        { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 } })
> +
> +#endif
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index e7071bfe4c9c..3e18ace90bf7 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -22,6 +22,7 @@
>  #include "hw/i386/e820_memory_layout.h"
>  #include "hw/i386/x86.h"
>  #include "hw/i386/tdvf.h"
> +#include "hw/i386/tdvf-hob.h"
>  #include "kvm_i386.h"
>  #include "tdx.h"
>  
> @@ -130,6 +131,19 @@ static void get_tdx_capabilities(void)
>      tdx_caps = caps;
>  }
>  
> +static TdxFirmwareEntry *tdx_get_hob_entry(TdxGuest *tdx)
> +{
> +    TdxFirmwareEntry *entry;
> +
> +    for_each_tdx_fw_entry(&tdx->tdvf, entry) {
> +        if (entry->type == TDVF_SECTION_TYPE_TD_HOB) {
> +            return entry;
> +        }
> +    }
> +    error_report("TDVF metadata doesn't specify TD_HOB location.");
> +    exit(1);
> +}
> +
>  static void tdx_add_ram_entry(uint64_t address, uint64_t length, uint32_t type)
>  {
>      uint32_t nr_entries = tdx_guest->nr_ram_entries;
> @@ -249,6 +263,8 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
>  
>      qsort(tdx_guest->ram_entries, tdx_guest->nr_ram_entries,
>            sizeof(TdxRamEntry), &tdx_ram_entry_compare);
> +
> +    tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
>  }
>  
>  static Notifier tdx_machine_done_notify = {
> -- 
> 2.27.0
> 
> 

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


* Re: [RFC PATCH v4 24/36] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION
  2022-05-12  3:17 ` [RFC PATCH v4 24/36] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION Xiaoyao Li
@ 2022-05-12 18:34   ` Isaku Yamahata
  2022-05-13  0:46     ` Xiaoyao Li
  2022-05-24  7:57   ` Gerd Hoffmann
  1 sibling, 1 reply; 105+ messages in thread
From: Isaku Yamahata @ 2022-05-12 18:34 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:51AM +0800,
Xiaoyao Li <xiaoyao.li@intel.com> wrote:

> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> TDVF firmware (CODE and VARS) needs to be added/copied to TD's private
> memory via KVM_TDX_INIT_MEM_REGION, as well as TD HOB and TEMP memory.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  target/i386/kvm/tdx.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 3e18ace90bf7..567ee12e88f0 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -240,6 +240,7 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
>  {
>      TdxFirmware *tdvf = &tdx_guest->tdvf;
>      TdxFirmwareEntry *entry;
> +    int r;
>  
>      tdx_init_ram_entries();
>  
> @@ -265,6 +266,29 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
>            sizeof(TdxRamEntry), &tdx_ram_entry_compare);
>  
>      tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
> +
> +    for_each_tdx_fw_entry(tdvf, entry) {
> +        struct kvm_tdx_init_mem_region mem_region = {
> +            .source_addr = (__u64)entry->mem_ptr,
> +            .gpa = entry->address,
> +            .nr_pages = entry->size / 4096,
> +        };
> +
> +        __u32 metadata = entry->attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND ?
> +                         KVM_TDX_MEASURE_MEMORY_REGION : 0;

Please use flags instead of metadata.
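
A rough sketch of the per-entry computation with the suggested name. The macro
values below are assumptions made so the snippet compiles on its own (the real
definitions live in the TDVF header and the KVM TDX uAPI header), and
TDX_PAGE_SIZE, tdvf_entry_flags() and tdvf_entry_nr_pages() are hypothetical
names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define TDVF_SECTION_ATTRIBUTES_MR_EXTEND   (1U << 0)
#define KVM_TDX_MEASURE_MEMORY_REGION       (1U << 0)
#define TDX_PAGE_SIZE                       4096U   /* hypothetical name */

/* Map the TDVF section attributes to the flags passed with the ioctl. */
static uint32_t tdvf_entry_flags(uint32_t attributes)
{
    return (attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND) ?
           KVM_TDX_MEASURE_MEMORY_REGION : 0;
}

/*
 * Number of 4 KiB pages covered by a section. The alignment assert is an
 * addition of this sketch; the patch divides without checking.
 */
static uint64_t tdvf_entry_nr_pages(uint64_t size)
{
    assert(size % TDX_PAGE_SIZE == 0);
    return size / TDX_PAGE_SIZE;
}
```

The result of tdvf_entry_flags() would then be what is passed as the second
argument of tdx_vm_ioctl(KVM_TDX_INIT_MEM_REGION, ...).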


> +        r = tdx_vm_ioctl(KVM_TDX_INIT_MEM_REGION, metadata, &mem_region);
> +        if (r < 0) {
> +             error_report("KVM_TDX_INIT_MEM_REGION failed %s", strerror(-r));
> +             exit(1);
> +        }
> +
> +        if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
> +            entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
> +            qemu_ram_munmap(-1, entry->mem_ptr, entry->size);
> +            entry->mem_ptr = NULL;
> +        }
> +    }
>  }
>  
>  static Notifier tdx_machine_done_notify = {
> -- 
> 2.27.0
> 
> 

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


* Re: [RFC PATCH v4 36/36] docs: Add TDX documentation
  2022-05-12  3:18 ` [RFC PATCH v4 36/36] docs: Add TDX documentation Xiaoyao Li
@ 2022-05-12 18:42   ` Isaku Yamahata
  0 siblings, 0 replies; 105+ messages in thread
From: Isaku Yamahata @ 2022-05-12 18:42 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata, Gerd Hoffmann,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:18:03AM +0800,
Xiaoyao Li <xiaoyao.li@intel.com> wrote:

> Add docs/system/i386/tdx.rst for TDX support, and add tdx in
> confidential-guest-support.rst
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  docs/system/confidential-guest-support.rst |   1 +
>  docs/system/i386/tdx.rst                   | 103 +++++++++++++++++++++
>  docs/system/target-i386.rst                |   1 +
>  3 files changed, 105 insertions(+)
>  create mode 100644 docs/system/i386/tdx.rst
> 
> diff --git a/docs/system/confidential-guest-support.rst b/docs/system/confidential-guest-support.rst
> index 0c490dbda2b7..66129fbab64c 100644
> --- a/docs/system/confidential-guest-support.rst
> +++ b/docs/system/confidential-guest-support.rst
> @@ -38,6 +38,7 @@ Supported mechanisms
>  Currently supported confidential guest mechanisms are:
>  
>  * AMD Secure Encrypted Virtualization (SEV) (see :doc:`i386/amd-memory-encryption`)
> +* Intel Trust Domain Extension (TDX) (see :doc:`i386/tdx`)
>  * POWER Protected Execution Facility (PEF) (see :ref:`power-papr-protected-execution-facility-pef`)
>  * s390x Protected Virtualization (PV) (see :doc:`s390x/protvirt`)
>  
> diff --git a/docs/system/i386/tdx.rst b/docs/system/i386/tdx.rst
> new file mode 100644
> index 000000000000..96d91fea5516
> --- /dev/null
> +++ b/docs/system/i386/tdx.rst
> @@ -0,0 +1,103 @@
> +Intel Trusted Domain eXtension (TDX)
> +====================================
> +
> +Intel Trusted Domain eXtensions (TDX) refers to an Intel technology that extends
> +Virtual Machine Extensions (VMX) and Multi-Key Total Memory Encryption (MKTME)
> +with a new kind of virtual machine guest called a Trust Domain (TD). A TD runs
> +in a CPU mode that is designed to protect the confidentiality of its memory
> +contents and its CPU state from any other software, including the hosting
> +Virtual Machine Monitor (VMM), unless explicitly shared by the TD itself.
> +
> +Prerequisites
> +-------------
> +
> +To run TD, the physical machine needs to have TDX module loaded and initialized
> +while KVM hypervisor has TDX support and has TDX enabled. If those requirements
> +are met, the ``KVM_CAP_VM_TYPES`` will report the support of ``KVM_X86_TDX_VM``.
> +
> +Trust Domain Virtual Firmware (TDVF)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Trust Domain Virtual Firmware (TDVF) is required to provide TD services to boot
> +TD Guest OS. TDVF needs to be copied to guest private memory and measured before
> +a TD boots.
> +
> +The VM scope ``MEMORY_ENCRYPT_OP`` ioctl provides command ``KVM_TDX_INIT_MEM_REGION``
> +to copy the TDVF image to TD's private memory space.
> +
> +Since TDX doesn't support readonly memslot, TDVF cannot be mapped as pflash
> +device and it actually works as RAM. "-bios" option is chosen to load TDVF.
> +
> +OVMF is the opensource firmware that implements the TDVF support. Thus the
> +command line to specify and load TDVF is `-bios OVMF.fd`
> +
> +Feature Control
> +---------------
> +
> +Unlike non-TDX VM, the CPU features (enumerated by CPU or MSR) of a TD is not
> +under full control of VMM. VMM can only configure part of features of a TD on
> +``KVM_TDX_INIT_VM`` command of VM scope ``MEMORY_ENCRYPT_OP`` ioctl.
> +
> +The configurable features have three types:
> +
> +- Attributes:
> +  - PKS (bit 30) controls whether Supervisor Protection Keys is exposed to TD,
> +  which determines related CPUID bit and CR4 bit;
> +  - PERFMON (bit 63) controls whether PMU is exposed to TD.
> +
> +- XSAVE related features (XFAM):
> +  XFAM is a 64b mask, which has the same format as XCR0 or IA32_XSS MSR. It
> +  determines the set of extended features available for use by the guest TD.
> +
> +- CPUID features:
> +  Only some bits of some CPUID leaves are directly configurable by VMM.
> +
> +What features can be configured is reported via TDX capabilities.
> +
> +TDX capabilities
> +~~~~~~~~~~~~~~~~
> +
> +The VM scope ``MEMORY_ENCRYPT_OP`` ioctl provides command ``KVM_TDX_CAPABILITIES``
> +to get the TDX capabilities from KVM. It returns a data structure of
> +``struct kvm_tdx_capabilities``, which tells the supported configuration of
> +attributes, XFAM and CPUIDs.
> +
> +Launching a TD (TDX VM)
> +-----------------------
> +
> +To launch a TDX guest:
> +
> +.. parsed-literal::
> +
> +    |qemu_system_x86| \\
> +        -machine ...,confidential-guest-support=tdx0 \\
> +        -object tdx-guest,id=tdx0 \\
> +        -bios OVMF.fd \\

Don't we need kernel-irqchip=split?
Or does this patch series set it automatically?

Thanks,

> +
> +Debugging
> +---------
> +
> +Bit 0 of TD attributes, is DEBUG bit, which decides if the TD runs in off-TD
> +debug mode. When in off-TD debug mode, TD's VCPU state and private memory are
> +accessible via given SEAMCALLs. This requires KVM to expose APIs to invoke those
> +SEAMCALLs and corresponding QEMU change.
> +
> +It's targeted as future work.
> +
> +restrictions
> +------------
> +
> + - No readonly support for private memory;
> +
> + - No SMM support: SMM support requires manipulating the guest register states
> +   which is not allowed;
> +
> +Live Migration
> +--------------
> +
> +TODO
> +
> +References
> +----------
> +
> +- `TDX Homepage <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html>`__
> diff --git a/docs/system/target-i386.rst b/docs/system/target-i386.rst
> index 96bf54889a82..16dd4f1a8c80 100644
> --- a/docs/system/target-i386.rst
> +++ b/docs/system/target-i386.rst
> @@ -29,6 +29,7 @@ Architectural features
>     i386/kvm-pv
>     i386/sgx
>     i386/amd-memory-encryption
> +   i386/tdx
>  
>  .. _pcsys_005freq:
>  
> -- 
> 2.27.0
> 
> 

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


* Re: [RFC PATCH v4 09/36] KVM: Introduce kvm_arch_pre_create_vcpu()
  2022-05-12 17:50   ` Isaku Yamahata
@ 2022-05-13  0:15     ` Xiaoyao Li
  0 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-13  0:15 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: Paolo Bonzini, isaku.yamahata, Gerd Hoffmann,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/13/2022 1:50 AM, Isaku Yamahata wrote:
> On Thu, May 12, 2022 at 11:17:36AM +0800,
> Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 
>> Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent
>> work prior to create any vcpu. This is for i386 TDX because it needs
>> call TDX_INIT_VM before creating any vcpu.
> 
> Because "11/36 i386/tdx: Initialize TDX before creating TD vcpus" uses
> kvm_arch_pre_create_vcpu() (and 10/36 doesn't use it), please move this patch
> right before 11/36. (swap 09/36 and 10/36).
> 

OK.

I will change the order.




* Re: [RFC PATCH v4 10/36] i386/kvm: Move architectural CPUID leaf generation to separate helper
  2022-05-12 17:48   ` Isaku Yamahata
@ 2022-05-13  0:37     ` Xiaoyao Li
  2022-05-23  9:06     ` Gerd Hoffmann
  1 sibling, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-13  0:37 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: Paolo Bonzini, isaku.yamahata, Gerd Hoffmann,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/13/2022 1:48 AM, Isaku Yamahata wrote:
> On Thu, May 12, 2022 at 11:17:37AM +0800,
> Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 
>> diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
>> index b434feaa6b1d..5c7972f617e8 100644
>> --- a/target/i386/kvm/kvm_i386.h
>> +++ b/target/i386/kvm/kvm_i386.h
>> @@ -24,6 +24,10 @@
>>   #define kvm_ioapic_in_kernel() \
>>       (kvm_irqchip_in_kernel() && !kvm_irqchip_is_split())
>>   
>> +#define KVM_MAX_CPUID_ENTRIES  100
> 
> In Linux side, the value was bumped to 256.  Opportunistically let's make it
> same.
> 
> 3f4e3eb417b1 KVM: x86: bump KVM_MAX_CPUID_ENTRIES

I don't think so.

In KVM, KVM_MAX_CPUID_ENTRIES guards the ioctls
KVM_SET_CPUID/KVM_SET_CPUID2/KVM_GET_SUPPORTED_CPUID/KVM_GET_EMULATED_CPUID:
KVM handles at most KVM_MAX_CPUID_ENTRIES entries.

However, in QEMU, KVM_MAX_CPUID_ENTRIES is the maximum total number of
CPUID entries that QEMU generates. It guards the entry count in
kvm_x86_arch_cpuid().

I think we can increase the number when we actually hit the check in
kvm_x86_arch_cpuid().
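
To make the QEMU-side role of the limit concrete, here is a simplified sketch
of a bounded append into a fixed-size CPUID table. struct cpuid_entry,
struct cpuid_table and cpuid_table_append() are hypothetical stand-ins, not the
real struct kvm_cpuid_entry2 or the code in kvm_x86_arch_cpuid(), and the
sketch returns an error where the real code may react differently:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUID_ENTRIES 100   /* same value as the QEMU-side define above */

/* Hypothetical, heavily simplified entry. */
struct cpuid_entry {
    uint32_t function;
    uint32_t index;
};

struct cpuid_table {
    struct cpuid_entry entries[MAX_CPUID_ENTRIES];
    uint32_t count;
};

/* Append one generated entry; returns 0 on success, -1 when the table is full. */
static int cpuid_table_append(struct cpuid_table *t, uint32_t function,
                              uint32_t index)
{
    assert(t != NULL);
    if (t->count >= MAX_CPUID_ENTRIES) {
        return -1;      /* this is the check that would force raising the limit */
    }
    t->entries[t->count].function = function;
    t->entries[t->count].index = index;
    t->count++;
    return 0;
}
```

As long as the generator never reaches the -1 path, the limit only bounds what
QEMU itself produces and is independent of the kernel-side value.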

>> +uint32_t kvm_x86_arch_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
>> +                            uint32_t cpuid_i);
>> +
>>   #else
>>   
>>   #define kvm_pit_in_kernel()      0
>> -- 
>> 2.27.0
>>
>>
> 



* Re: [RFC PATCH v4 14/36] i386/tdx: Implement user specified tsc frequency
  2022-05-12 18:04   ` Isaku Yamahata
@ 2022-05-13  0:46     ` Xiaoyao Li
  0 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-13  0:46 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: Paolo Bonzini, isaku.yamahata, Gerd Hoffmann,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/13/2022 2:04 AM, Isaku Yamahata wrote:
> On Thu, May 12, 2022 at 11:17:41AM +0800,
> Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 
>> Reuse "-cpu,tsc-frequency=" to get user wanted tsc frequency and pass it
>> to KVM_TDX_INIT_VM.
>>
>> Besides, sanity check the tsc frequency to be in the legal range and
>> legal granularity (required by TDX module).
> 
> Just to make it sure.
> You didn't use VM-scoped KVM_SET_TSC_KHZ because KVM side patch is still in
> kvm/queue?  Once the patch lands, we should use it.

I didn't use VM-scoped KVM_SET_TSC_KHZ because

1) the corresponding TDX KVM v6 series still provides tsc_khz in
    struct kvm_tdx_init_vm

2) using KVM_SET_TSC_KHZ to set a VM-scoped TSC frequency looks applicable
to all VMs, not only TDs, so it doesn't look like a small task.

I need more time to evaluate the effort.
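
For reference, the sanity check this patch applies before handing tsc_khz to
KVM_TDX_INIT_VM can be condensed into one predicate. This is only a sketch of
the quoted logic; tdx_tsc_khz_is_valid() and TDX_TSC_GRANULARITY_KHZ are
hypothetical names, while the range limits are the ones defined in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TDX_MIN_TSC_FREQUENCY_KHZ   (100 * 1000)
#define TDX_MAX_TSC_FREQUENCY_KHZ   (10 * 1000 * 1000)
#define TDX_TSC_GRANULARITY_KHZ     (25 * 1000)     /* hypothetical name */

/*
 * Returns true when tsc_khz is acceptable: either 0 (no frequency requested
 * by the user) or within [min, max] and a multiple of 25 MHz.
 */
static bool tdx_tsc_khz_is_valid(int64_t tsc_khz)
{
    if (tsc_khz == 0) {
        return true;
    }
    if (tsc_khz < TDX_MIN_TSC_FREQUENCY_KHZ ||
        tsc_khz > TDX_MAX_TSC_FREQUENCY_KHZ) {
        return false;
    }
    return tsc_khz % TDX_TSC_GRANULARITY_KHZ == 0;
}
```

A value of 0 passes both checks in the patch as well, which leaves the choice
of frequency to whatever KVM_TDX_INIT_VM does with an unset tsc_khz.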

> Thanks,
> 
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>>   target/i386/kvm/kvm.c |  8 ++++++++
>>   target/i386/kvm/tdx.c | 18 ++++++++++++++++++
>>   2 files changed, 26 insertions(+)
>>
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index f2d7c3cf59ac..c51125ab200f 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -818,6 +818,14 @@ static int kvm_arch_set_tsc_khz(CPUState *cs)
>>       int r, cur_freq;
>>       bool set_ioctl = false;
>>   
>> +    /*
>> +     * TD guest's TSC is immutable, it cannot be set/changed via
>> +     * KVM_SET_TSC_KHZ, but only be initialized via KVM_TDX_INIT_VM
>> +     */
>> +    if (is_tdx_vm()) {
>> +        return 0;
>> +    }
>> +
>>       if (!env->tsc_khz) {
>>           return 0;
>>       }
>> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
>> index 9f2cdf640b5c..622efc409438 100644
>> --- a/target/i386/kvm/tdx.c
>> +++ b/target/i386/kvm/tdx.c
>> @@ -35,6 +35,9 @@
>>   #define TDX_TD_ATTRIBUTES_PKS               BIT_ULL(30)
>>   #define TDX_TD_ATTRIBUTES_PERFMON           BIT_ULL(63)
>>   
>> +#define TDX_MIN_TSC_FREQUENCY_KHZ   (100 * 1000)
>> +#define TDX_MAX_TSC_FREQUENCY_KHZ   (10 * 1000 * 1000)
>> +
>>   static TdxGuest *tdx_guest;
>>   
>>   /* It's valid after kvm_confidential_guest_init()->kvm_tdx_init() */
>> @@ -211,6 +214,20 @@ int tdx_pre_create_vcpu(CPUState *cpu)
>>           goto out;
>>       }
>>   
>> +    r = -EINVAL;
>> +    if (env->tsc_khz && (env->tsc_khz < TDX_MIN_TSC_FREQUENCY_KHZ ||
>> +                         env->tsc_khz > TDX_MAX_TSC_FREQUENCY_KHZ)) {
>> +        error_report("Invalid TSC %ld KHz, must specify cpu_frequency between [%d, %d] kHz",
>> +                      env->tsc_khz, TDX_MIN_TSC_FREQUENCY_KHZ,
>> +                      TDX_MAX_TSC_FREQUENCY_KHZ);
>> +        goto out;
>> +    }
>> +
>> +    if (env->tsc_khz % (25 * 1000)) {
>> +        error_report("Invalid TSC %ld KHz, it must be multiple of 25MHz", env->tsc_khz);
>> +        goto out;
>> +    }
>> +
>>       r = setup_td_guest_attributes(x86cpu);
>>       if (r) {
>>           goto out;
>> @@ -221,6 +238,7 @@ int tdx_pre_create_vcpu(CPUState *cpu)
>>   
>>       init_vm.attributes = tdx_guest->attributes;
>>       init_vm.max_vcpus = ms->smp.cpus;
>> +    init_vm.tsc_khz = env->tsc_khz;
>>   
>>       r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);
>>       if (r < 0) {
>> -- 
>> 2.27.0
>>
>>
> 



* Re: [RFC PATCH v4 24/36] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION
  2022-05-12 18:34   ` Isaku Yamahata
@ 2022-05-13  0:46     ` Xiaoyao Li
  0 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-13  0:46 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: Paolo Bonzini, isaku.yamahata, Gerd Hoffmann,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/13/2022 2:34 AM, Isaku Yamahata wrote:
> On Thu, May 12, 2022 at 11:17:51AM +0800,
> Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 
>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>
>> TDVF firmware (CODE and VARS) needs to be added/copied to TD's private
>> memory via KVM_TDX_INIT_MEM_REGION, as well as TD HOB and TEMP memory.
>>
>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>>   target/i386/kvm/tdx.c | 24 ++++++++++++++++++++++++
>>   1 file changed, 24 insertions(+)
>>
>> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
>> index 3e18ace90bf7..567ee12e88f0 100644
>> --- a/target/i386/kvm/tdx.c
>> +++ b/target/i386/kvm/tdx.c
>> @@ -240,6 +240,7 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
>>   {
>>       TdxFirmware *tdvf = &tdx_guest->tdvf;
>>       TdxFirmwareEntry *entry;
>> +    int r;
>>   
>>       tdx_init_ram_entries();
>>   
>> @@ -265,6 +266,29 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
>>             sizeof(TdxRamEntry), &tdx_ram_entry_compare);
>>   
>>       tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
>> +
>> +    for_each_tdx_fw_entry(tdvf, entry) {
>> +        struct kvm_tdx_init_mem_region mem_region = {
>> +            .source_addr = (__u64)entry->mem_ptr,
>> +            .gpa = entry->address,
>> +            .nr_pages = entry->size / 4096,
>> +        };
>> +
>> +        __u32 metadata = entry->attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND ?
>> +                         KVM_TDX_MEASURE_MEMORY_REGION : 0;
> 
> Please use flags instead of metadata.

Sure. Will change it.

> 
>> +        r = tdx_vm_ioctl(KVM_TDX_INIT_MEM_REGION, metadata, &mem_region);
>> +        if (r < 0) {
>> +             error_report("KVM_TDX_INIT_MEM_REGION failed %s", strerror(-r));
>> +             exit(1);
>> +        }
>> +
>> +        if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
>> +            entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
>> +            qemu_ram_munmap(-1, entry->mem_ptr, entry->size);
>> +            entry->mem_ptr = NULL;
>> +        }
>> +    }
>>   }
>>   
>>   static Notifier tdx_machine_done_notify = {
>> -- 
>> 2.27.0
>>
>>
> 



* Re: [RFC PATCH v4 03/36] target/i386: Implement mc->kvm_type() to get VM type
  2022-05-12  3:17 ` [RFC PATCH v4 03/36] target/i386: Implement mc->kvm_type() to get VM type Xiaoyao Li
@ 2022-05-23  8:36   ` Gerd Hoffmann
  2022-05-23 14:55     ` Isaku Yamahata
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-23  8:36 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

  Hi,

> +    if (!(kvm_check_extension(KVM_STATE(ms->accelerator), KVM_CAP_VM_TYPES) & BIT(kvm_type))) {
> +        error_report("vm-type %s not supported by KVM", vm_type_name[kvm_type]);
> +        exit(1);
> +    }

Not sure why TDX needs a new vm type whereas sev doesn't.  But that's up
for debate in the kernel tdx patches, not here.  Assuming the kernel
interface that actually gets merged looks like this, the patch makes sense.

Acked-by: Gerd Hoffmann <kraxel@redhat.com>

take care,
  Gerd



* Re: [RFC PATCH v4 04/36] target/i386: Introduce kvm_confidential_guest_init()
  2022-05-12  3:17 ` [RFC PATCH v4 04/36] target/i386: Introduce kvm_confidential_guest_init() Xiaoyao Li
@ 2022-05-23  8:37   ` Gerd Hoffmann
  0 siblings, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-23  8:37 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:31AM +0800, Xiaoyao Li wrote:
> Introduce a separate function kvm_confidential_guest_init() for SEV (and
> future TDX).
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>



* Re: [RFC PATCH v4 05/36] i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context
  2022-05-12  3:17 ` [RFC PATCH v4 05/36] i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context Xiaoyao Li
@ 2022-05-23  8:38   ` Gerd Hoffmann
  0 siblings, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-23  8:38 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:32AM +0800, Xiaoyao Li wrote:
> Introduce tdx_kvm_init() and invoke it in kvm_confidential_guest_init()
> if it's a TDX VM. More initialization will be added later.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>



* Re: [RFC PATCH v4 06/36] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
  2022-05-12  3:17 ` [RFC PATCH v4 06/36] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES Xiaoyao Li
  2022-05-12 17:38   ` Isaku Yamahata
@ 2022-05-23  8:45   ` Gerd Hoffmann
  2022-05-23 15:30     ` Xiaoyao Li
  1 sibling, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-23  8:45 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

  Hi,

> +    do {
> +        size = sizeof(struct kvm_tdx_capabilities) +
> +               max_ent * sizeof(struct kvm_tdx_cpuid_config);
> +        caps = g_malloc0(size);
> +        caps->nr_cpuid_configs = max_ent;
> +
> +        r = tdx_platform_ioctl(KVM_TDX_CAPABILITIES, 0, caps);
> +        if (r == -E2BIG) {
> +            g_free(caps);
> +            max_ent *= 2;
> +        } else if (r < 0) {
> +            error_report("KVM_TDX_CAPABILITIES failed: %s\n", strerror(-r));
> +            exit(1);
> +        }
> +    }
> +    while (r == -E2BIG);

This should have a limit for the number of loop runs.

take care,
  Gerd



* Re: [RFC PATCH v4 07/36] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object
  2022-05-12  3:17 ` [RFC PATCH v4 07/36] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object Xiaoyao Li
@ 2022-05-23  8:48   ` Gerd Hoffmann
  2022-05-23 14:59     ` Isaku Yamahata
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-23  8:48 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

> diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
> index c8a23d95258d..4036ca2f3f99 100644
> --- a/target/i386/kvm/tdx.h
> +++ b/target/i386/kvm/tdx.h
> @@ -1,6 +1,10 @@
>  #ifndef QEMU_I386_TDX_H
>  #define QEMU_I386_TDX_H
>  
> +#ifndef CONFIG_USER_ONLY
> +#include CONFIG_DEVICES /* CONFIG_TDX */
> +#endif
> +
>  #include "exec/confidential-guest-support.h"
>  
>  #define TYPE_TDX_GUEST "tdx-guest"
> @@ -16,6 +20,12 @@ typedef struct TdxGuest {
>      uint64_t attributes;    /* TD attributes */
>  } TdxGuest;
>  
> +#ifdef CONFIG_TDX
> +bool is_tdx_vm(void);
> +#else
> +#define is_tdx_vm() 0

Just add that to the tdx-stubs.c file you already created in one of the
previous patches and drop this #ifdef mess ;)

take care,
  Gerd



* Re: [RFC PATCH v4 08/36] i386/tdx: Adjust get_supported_cpuid() for TDX VM
  2022-05-12  3:17 ` [RFC PATCH v4 08/36] i386/tdx: Adjust get_supported_cpuid() for TDX VM Xiaoyao Li
@ 2022-05-23  9:01   ` Gerd Hoffmann
  2022-05-23 15:37     ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-23  9:01 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

  Hi,

> - The supported XCR0 and XSS bits need to be capped by tdx_caps, because
>   KVM uses them to set up the XFAM of the TD.

> +    case 0xd:
> +        if (index == 0) {
> +            if (reg == R_EAX) {
> +                *ret &= (uint32_t)tdx_caps->xfam_fixed0 & XCR0_MASK;
> +                *ret |= (uint32_t)tdx_caps->xfam_fixed1 & XCR0_MASK;
> +            } else if (reg == R_EDX) {
> +                *ret &= (tdx_caps->xfam_fixed0 & XCR0_MASK) >> 32;
> +                *ret |= (tdx_caps->xfam_fixed1 & XCR0_MASK) >> 32;
> +            }
> +        } else if (index == 1) {
> +            /* TODO: Adjust XSS when it's supported. */
> +        }
> +        break;

> +    default:
> +        /* TODO: Use tdx_caps to adjust CPUID leafs. */
> +        break;

Hmm, that all looks a bit messy and incomplete.  Also, the commit
message doesn't match the patch (it describes XSS, which isn't actually
implemented).

take care,
  Gerd



* Re: [RFC PATCH v4 10/36] i386/kvm: Move architectural CPUID leaf generation to separate helper
  2022-05-12 17:48   ` Isaku Yamahata
  2022-05-13  0:37     ` Xiaoyao Li
@ 2022-05-23  9:06     ` Gerd Hoffmann
  1 sibling, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-23  9:06 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: Xiaoyao Li, Paolo Bonzini, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 10:48:14AM -0700, Isaku Yamahata wrote:
> On Thu, May 12, 2022 at 11:17:37AM +0800,
> Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 
> > diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
> > index b434feaa6b1d..5c7972f617e8 100644
> > --- a/target/i386/kvm/kvm_i386.h
> > +++ b/target/i386/kvm/kvm_i386.h
> > @@ -24,6 +24,10 @@
> >  #define kvm_ioapic_in_kernel() \
> >      (kvm_irqchip_in_kernel() && !kvm_irqchip_is_split())
> >  
> > +#define KVM_MAX_CPUID_ENTRIES  100
> 
> On the Linux side, the value was bumped to 256.  Opportunistically let's make
> it the same.

When doing so use a separate patch please.  A patch like this -- moving
around code without functional changes -- should not be mixed with other
changes.

take care,
  Gerd



* Re: [RFC PATCH v4 11/36] i386/tdx: Initialize TDX before creating TD vcpus
  2022-05-12  3:17 ` [RFC PATCH v4 11/36] i386/tdx: Initialize TDX before creating TD vcpus Xiaoyao Li
@ 2022-05-23  9:20   ` Gerd Hoffmann
  2022-05-23 15:42     ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-23  9:20 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

> +int tdx_pre_create_vcpu(CPUState *cpu)
> +{
> +    MachineState *ms = MACHINE(qdev_get_machine());
> +    X86CPU *x86cpu = X86_CPU(cpu);
> +    CPUX86State *env = &x86cpu->env;
> +    struct kvm_tdx_init_vm init_vm;
> +    int r = 0;
> +
> +    qemu_mutex_lock(&tdx_guest->lock);
> +    if (tdx_guest->initialized) {
> +        goto out;
> +    }
> +
> +    memset(&init_vm, 0, sizeof(init_vm));
> +    init_vm.cpuid.nent = kvm_x86_arch_cpuid(env, init_vm.entries, 0);
> +
> +    init_vm.attributes = tdx_guest->attributes;
> +    init_vm.max_vcpus = ms->smp.cpus;
> +
> +    r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);
> +    if (r < 0) {
> +        error_report("KVM_TDX_INIT_VM failed %s", strerror(-r));
> +        goto out;
> +    }
> +
> +    tdx_guest->initialized = true;
> +
> +out:
> +    qemu_mutex_unlock(&tdx_guest->lock);
> +    return r;
> +}

Hmm, hooking *vm* initialization into *vcpu* creation looks wrong to me.

take care,
  Gerd



* Re: [RFC PATCH v4 13/36] i386/tdx: Validate TD attributes
  2022-05-12  3:17 ` [RFC PATCH v4 13/36] i386/tdx: Validate TD attributes Xiaoyao Li
@ 2022-05-23  9:39   ` Gerd Hoffmann
  2022-05-24  4:19     ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-23  9:39 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

> Validate TD attributes against tdx_caps: fixed-0 bits must be zero and
> fixed-1 bits must be set.

> -static void setup_td_guest_attributes(X86CPU *x86cpu)
> +static int tdx_validate_attributes(TdxGuest *tdx)
> +{
> +    if (((tdx->attributes & tdx_caps->attrs_fixed0) | tdx_caps->attrs_fixed1) !=
> +        tdx->attributes) {
> +            error_report("Invalid attributes 0x%lx for TDX VM (fixed0 0x%llx, fixed1 0x%llx)",
> +                          tdx->attributes, tdx_caps->attrs_fixed0, tdx_caps->attrs_fixed1);
> +            return -EINVAL;
> +    }

So, how is this supposed to work?  Patch #2 introduces attributes as
user-settable property.  So do users have to manually figure out and pass
the correct value, so the check passes?  Specifically the fixed1 check?

I think 'attributes' should not be user-settable in the first place.
Each feature-bit which is actually user-settable (and not already
covered by another option like pmu) should be a separate attribute for
tdx-object.  Then the tdx code can create attributes from hardware
capabilities and user settings.

When user-settable options might not be available depending on hardware
capabilities, best practice is to create them as OnOffAuto properties.

  Auto == qemu can pick the value, typical behavior is to enable the
          feature if the hardware supports it.
  On == must enable, if it isn't possible throw an error and exit.
  Off == must disable, if it isn't possible throw an error and exit. 
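
As a rough illustration of these semantics (a minimal sketch, not code from
this series): the enum DemoOnOffAuto and the helper resolve_attr_bit() are
hypothetical names, and the fixed0/fixed1 interpretation is taken from the
check in the quoted patch, where a bit that is clear in fixed0 must be 0 and
a bit that is set in fixed1 must be 1.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for an on/off/auto option; not QEMU's own type. */
typedef enum { DEMO_AUTO, DEMO_ON, DEMO_OFF } DemoOnOffAuto;

/*
 * Resolve one attribute bit from a user option and the hardware limits.
 * Returns 0 and updates *attrs on success, -EINVAL if the request cannot
 * be honoured.
 */
static int resolve_attr_bit(DemoOnOffAuto opt, uint64_t bit,
                            uint64_t fixed0, uint64_t fixed1,
                            uint64_t *attrs)
{
    bool can_enable = (fixed0 & bit) != 0;  /* clear in fixed0: must be 0 */
    bool must_enable = (fixed1 & bit) != 0; /* set in fixed1: must be 1 */

    switch (opt) {
    case DEMO_ON:
        if (!can_enable) {
            return -EINVAL;                 /* user insists, hardware cannot */
        }
        *attrs |= bit;
        return 0;
    case DEMO_OFF:
        if (must_enable) {
            return -EINVAL;                 /* user insists, hardware forces on */
        }
        *attrs &= ~bit;
        return 0;
    case DEMO_AUTO:
    default:
        /* Auto: enable the feature whenever the hardware allows it. */
        if (can_enable) {
            *attrs |= bit;
        } else {
            *attrs &= ~bit;
        }
        return 0;
    }
}
```

With one such option per user-visible bit, the final attributes value would
be assembled by the tdx code rather than passed in as a raw number.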

take care,
  Gerd



* Re: [RFC PATCH v4 14/36] i386/tdx: Implement user specified tsc frequency
  2022-05-12  3:17 ` [RFC PATCH v4 14/36] i386/tdx: Implement user specified tsc frequency Xiaoyao Li
  2022-05-12 18:04   ` Isaku Yamahata
@ 2022-05-23  9:43   ` Gerd Hoffmann
  1 sibling, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-23  9:43 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:41AM +0800, Xiaoyao Li wrote:
> Reuse "-cpu,tsc-frequency=" to get the user-requested TSC frequency and pass
> it to KVM_TDX_INIT_VM.
> 
> Besides, sanity check that the TSC frequency is within the legal range and
> has the legal granularity (required by the TDX module).

Acked-by: Gerd Hoffmann <kraxel@redhat.com>



* Re: [RFC PATCH v4 15/36] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM
  2022-05-12  3:17 ` [RFC PATCH v4 15/36] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM Xiaoyao Li
@ 2022-05-23  9:45   ` Gerd Hoffmann
  0 siblings, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-23  9:45 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:42AM +0800, Xiaoyao Li wrote:
> TDX supports read-only memory only for shared memory, not for private memory.
> 
> QEMU has no way to know whether a memslot is used as shared or private
> memory. Thus, for simplicity, just set kvm_readonly_mem_enabled to false for
> a TDX VM.

Acked-by: Gerd Hoffmann <kraxel@redhat.com>



* Re: [RFC PATCH v4 03/36] target/i386: Implement mc->kvm_type() to get VM type
  2022-05-23  8:36   ` Gerd Hoffmann
@ 2022-05-23 14:55     ` Isaku Yamahata
  0 siblings, 0 replies; 105+ messages in thread
From: Isaku Yamahata @ 2022-05-23 14:55 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Xiaoyao Li, Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Mon, May 23, 2022 at 10:36:16AM +0200,
Gerd Hoffmann <kraxel@redhat.com> wrote:

>   Hi,
> 
> > +    if (!(kvm_check_extension(KVM_STATE(ms->accelerator), KVM_CAP_VM_TYPES) & BIT(kvm_type))) {
> > +        error_report("vm-type %s not supported by KVM", vm_type_name[kvm_type]);
> > +        exit(1);
> > +    }
> 
> Not sure why TDX needs a new vm type whereas sev doesn't.  But that's up
> for debate in the kernel tdx patches, not here.  Assuming the kernel
> interface that actually gets merged looks like this, the patch makes sense.

Because VM operations such as KVM_CREATE_VCPU need TDX-specific handling on
the KVM side, we need to tell KVM that this VM is a TD.
It's also for consistency.  Specifying the vm type with KVM_CREATE_VM is a
common pattern among other archs: S390, PPC, MIPS, and ARM64.  Only SEV is
an exception; it turns a default VM into a confidential VM after KVM_CREATE_VM.
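
For illustration only, the capability test in the quoted hunk boils down to a
bitmask lookup. The sketch below assumes the capability query returns a mask
with one bit per supported VM type; the enum values and the helper name
demo_vm_type_supported() are made up and are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type numbers, chosen only for this example. */
enum demo_vm_type {
    DEMO_VM_DEFAULT = 0,
    DEMO_VM_TDX = 1,
};

/*
 * supported_mask: one bit per VM type, as a capability query might return.
 * Returns true only if the bit for the requested type is set.
 */
static bool demo_vm_type_supported(uint32_t supported_mask, unsigned int type)
{
    /* Guard the shift: shifting a 32-bit value by >= 32 is undefined. */
    return type < 32 && (supported_mask & (UINT32_C(1) << type)) != 0;
}
```

The type that passes this test is then the one handed to VM creation, so KVM
knows from the start which kind of VM it is dealing with.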

Thanks,

> 
> Acked-by: Gerd Hoffmann <kraxel@redhat.com>
> 
> take care,
>   Gerd
> 
> 

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


* Re: [RFC PATCH v4 07/36] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object
  2022-05-23  8:48   ` Gerd Hoffmann
@ 2022-05-23 14:59     ` Isaku Yamahata
  2022-05-24  6:42       ` Gerd Hoffmann
  0 siblings, 1 reply; 105+ messages in thread
From: Isaku Yamahata @ 2022-05-23 14:59 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Xiaoyao Li, Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Mon, May 23, 2022 at 10:48:17AM +0200,
Gerd Hoffmann <kraxel@redhat.com> wrote:

> > diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
> > index c8a23d95258d..4036ca2f3f99 100644
> > --- a/target/i386/kvm/tdx.h
> > +++ b/target/i386/kvm/tdx.h
> > @@ -1,6 +1,10 @@
> >  #ifndef QEMU_I386_TDX_H
> >  #define QEMU_I386_TDX_H
> >  
> > +#ifndef CONFIG_USER_ONLY
> > +#include CONFIG_DEVICES /* CONFIG_TDX */
> > +#endif
> > +
> >  #include "exec/confidential-guest-support.h"
> >  
> >  #define TYPE_TDX_GUEST "tdx-guest"
> > @@ -16,6 +20,12 @@ typedef struct TdxGuest {
> >      uint64_t attributes;    /* TD attributes */
> >  } TdxGuest;
> >  
> > +#ifdef CONFIG_TDX
> > +bool is_tdx_vm(void);
> > +#else
> > +#define is_tdx_vm() 0
> 
> Just add that to the tdx-stubs.c file you already created in one of the
> previous patches and drop this #ifdef mess ;)

This is for consistency with SEV.  Anyway, either way is okay.

From target/i386/sev.h
  ...
  #ifdef CONFIG_SEV
  bool sev_enabled(void);
  bool sev_es_enabled(void);
  #else
  #define sev_enabled() 0
  #define sev_es_enabled() 0
  #endif

-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


* Re: [RFC PATCH v4 06/36] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
  2022-05-23  8:45   ` Gerd Hoffmann
@ 2022-05-23 15:30     ` Xiaoyao Li
  0 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-23 15:30 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/23/2022 4:45 PM, Gerd Hoffmann wrote:
>    Hi,
> 
>> +    do {
>> +        size = sizeof(struct kvm_tdx_capabilities) +
>> +               max_ent * sizeof(struct kvm_tdx_cpuid_config);
>> +        caps = g_malloc0(size);
>> +        caps->nr_cpuid_configs = max_ent;
>> +
>> +        r = tdx_platform_ioctl(KVM_TDX_CAPABILITIES, 0, caps);
>> +        if (r == -E2BIG) {
>> +            g_free(caps);
>> +            max_ent *= 2;
>> +        } else if (r < 0) {
>> +            error_report("KVM_TDX_CAPABILITIES failed: %s\n", strerror(-r));
>> +            exit(1);
>> +        }
>> +    }
>> +    while (r == -E2BIG);
> 
> This should have a limit for the number of loop runs.

Actually, this logic is copied from get_supported_cpuid().

Anyway, I can cap it at 256 entries (which should be large enough), or 
maybe re-use KVM_MAX_CPUID_ENTRIES. If the limit is ever hit, we know we 
need to update QEMU to fit TDX on a new platform.
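
A minimal sketch of what such a bounded retry could look like, outside of
QEMU: the query callback stands in for the ioctl, DEMO_MAX_ENTRIES is a
hypothetical cap mirroring the 256 mentioned above, and all demo_* names are
invented for this example.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical cap on the number of entries we are willing to allocate. */
#define DEMO_MAX_ENTRIES 256

/* Stand-in for the ioctl: 0 on success, -E2BIG if nr is too small. */
typedef int (*demo_query_fn)(void *buf, unsigned int nr);

/* Fake queries for illustration: they need 40 and 1000 entries. */
static int demo_query_needs_40(void *buf, unsigned int nr)
{
    (void)buf;
    return nr >= 40 ? 0 : -E2BIG;
}

static int demo_query_needs_1000(void *buf, unsigned int nr)
{
    (void)buf;
    return nr >= 1000 ? 0 : -E2BIG;
}

/* Double the buffer on -E2BIG, but give up once the cap is exceeded. */
static int demo_query_bounded(demo_query_fn query, size_t entry_size,
                              unsigned int start, void **out,
                              unsigned int *out_nr)
{
    unsigned int nr = start;

    if (nr == 0 || entry_size == 0) {
        return -EINVAL;      /* doubling 0 would never make progress */
    }

    while (nr <= DEMO_MAX_ENTRIES) {
        void *buf = calloc(nr, entry_size);
        int r;

        if (!buf) {
            return -ENOMEM;
        }
        r = query(buf, nr);
        if (r == 0) {
            *out = buf;      /* caller frees */
            *out_nr = nr;
            return 0;
        }
        free(buf);
        if (r != -E2BIG) {
            return r;        /* real error, do not retry */
        }
        nr *= 2;             /* buffer too small, grow and retry */
    }
    return -E2BIG;           /* cap exceeded: caller reports and exits */
}
```

Starting from 6 entries, the first fake query succeeds at 48 entries, while
the second one stops with -E2BIG once the next size would exceed the cap.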

> take care,
>    Gerd
> 



* Re: [RFC PATCH v4 08/36] i386/tdx: Adjust get_supported_cpuid() for TDX VM
  2022-05-23  9:01   ` Gerd Hoffmann
@ 2022-05-23 15:37     ` Xiaoyao Li
  0 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-23 15:37 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/23/2022 5:01 PM, Gerd Hoffmann wrote:
>    Hi,
> 
>> - The supported XCR0 and XSS bits need to be capped by tdx_caps, because
>>    KVM uses them to set up the XFAM of the TD.
> 
>> +    case 0xd:
>> +        if (index == 0) {
>> +            if (reg == R_EAX) {
>> +                *ret &= (uint32_t)tdx_caps->xfam_fixed0 & XCR0_MASK;
>> +                *ret |= (uint32_t)tdx_caps->xfam_fixed1 & XCR0_MASK;
>> +            } else if (reg == R_EDX) {
>> +                *ret &= (tdx_caps->xfam_fixed0 & XCR0_MASK) >> 32;
>> +                *ret |= (tdx_caps->xfam_fixed1 & XCR0_MASK) >> 32;
>> +            }
>> +        } else if (index == 1) {
>> +            /* TODO: Adjust XSS when it's supported. */
>> +        }
>> +        break;
> 
>> +    default:
>> +        /* TODO: Use tdx_caps to adjust CPUID leafs. */
>> +        break;
> 
> Hmm, that all looks a bit messy and incomplete.  Also, the commit
> message doesn't match the patch (it describes XSS, which isn't actually
> implemented).

For XSS, QEMU recently got an XSS mask defined; I will use it in this patch.

For the other CPUID leaves, we have a follow-up series that enables 
fine-grained feature control and a CPU model for TDX guests. So the 
plan for this basic series is just to make it functional without errors. 
Anyway, I will update the commit message to describe this clearly.

> take care,
>    Gerd
> 



* Re: [RFC PATCH v4 11/36] i386/tdx: Initialize TDX before creating TD vcpus
  2022-05-23  9:20   ` Gerd Hoffmann
@ 2022-05-23 15:42     ` Xiaoyao Li
  2022-05-24  6:57       ` Gerd Hoffmann
  0 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-23 15:42 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/23/2022 5:20 PM, Gerd Hoffmann wrote:
>> +int tdx_pre_create_vcpu(CPUState *cpu)
>> +{
>> +    MachineState *ms = MACHINE(qdev_get_machine());
>> +    X86CPU *x86cpu = X86_CPU(cpu);
>> +    CPUX86State *env = &x86cpu->env;
>> +    struct kvm_tdx_init_vm init_vm;
>> +    int r = 0;
>> +
>> +    qemu_mutex_lock(&tdx_guest->lock);
>> +    if (tdx_guest->initialized) {
>> +        goto out;
>> +    }
>> +
>> +    memset(&init_vm, 0, sizeof(init_vm));
>> +    init_vm.cpuid.nent = kvm_x86_arch_cpuid(env, init_vm.entries, 0);
>> +
>> +    init_vm.attributes = tdx_guest->attributes;
>> +    init_vm.max_vcpus = ms->smp.cpus;
>> +
>> +    r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);
>> +    if (r < 0) {
>> +        error_report("KVM_TDX_INIT_VM failed %s", strerror(-r));
>> +        goto out;
>> +    }
>> +
>> +    tdx_guest->initialized = true;
>> +
>> +out:
>> +    qemu_mutex_unlock(&tdx_guest->lock);
>> +    return r;
>> +}
> 
> Hmm, hooking *vm* initialization into *vcpu* creation looks wrong to me.

That's because TDX has to do VM-scope (feature) initialization 
before creating any vcpu. This is new to KVM and QEMU: until now, every 
feature was vcpu-scope and configured per vcpu.

To minimize the changes to QEMU, we want to use @cpu and @cpu->env to 
grab the configuration info. That's why it is done this way.

Do you have a better idea?

> take care,
>    Gerd
> 



* Re: [RFC PATCH v4 13/36] i386/tdx: Validate TD attributes
  2022-05-23  9:39   ` Gerd Hoffmann
@ 2022-05-24  4:19     ` Xiaoyao Li
  2022-05-24  6:59       ` Gerd Hoffmann
  0 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-24  4:19 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/23/2022 5:39 PM, Gerd Hoffmann wrote:
>> Validate TD attributes against tdx_caps: fixed-0 bits must be zero and
>> fixed-1 bits must be set.
> 
>> -static void setup_td_guest_attributes(X86CPU *x86cpu)
>> +static int tdx_validate_attributes(TdxGuest *tdx)
>> +{
>> +    if (((tdx->attributes & tdx_caps->attrs_fixed0) | tdx_caps->attrs_fixed1) !=
>> +        tdx->attributes) {
>> +            error_report("Invalid attributes 0x%lx for TDX VM (fixed0 0x%llx, fixed1 0x%llx)",
>> +                          tdx->attributes, tdx_caps->attrs_fixed0, tdx_caps->attrs_fixed1);
>> +            return -EINVAL;
>> +    }
> 
> So, how is this supposed to work?  Patch #2 introduces attributes as
> user-settable property.  So do users have to manually figure out and pass
> the correct value, so the check passes?  Specifically the fixed1 check?
> 
> I think 'attributes' should not be user-settable in the first place.
> Each feature-bit which is actually user-settable (and not already
> covered by another option like pmu) should be a separate attribute for
> tdx-object.  Then the tdx code can create attributes from hardware
> capabilities and user settings.

In patch #2, tdx-guest.attributes is defined as a field that holds the 
64-bit attributes value, but no getter/setter is provided for 
it. So it's *not* user-settable.

Did I miss something? (I'm not very familiar with QEMU objects.)

> When user-settable options might not be available depending on hardware
> capabilities, best practice is to create them as OnOffAuto properties.
> 
>    Auto == qemu can pick the value, typical behavior is to enable the
>            feature if the hardware supports it.
>    On == must enable, if it isn't possible throw an error and exit.
>    Off == must disable, if it isn't possible throw an error and exit.
> 
> take care,
>    Gerd
> 



* Re: [RFC PATCH v4 07/36] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object
  2022-05-23 14:59     ` Isaku Yamahata
@ 2022-05-24  6:42       ` Gerd Hoffmann
  0 siblings, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  6:42 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: Xiaoyao Li, Paolo Bonzini, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

> > > +#ifdef CONFIG_TDX
> > > +bool is_tdx_vm(void);
> > > +#else
> > > +#define is_tdx_vm() 0
> > 
> > Just add that to the tdx-stubs.c file you already created in one of the
> > previous patches and drop this #ifdef mess ;)
> 
> This is for consistency with SEV.  Anyway, either way is okay.

> From target/i386/sev.h
>   ...
>   #ifdef CONFIG_SEV
>   bool sev_enabled(void);
>   bool sev_es_enabled(void);
>   #else
>   #define sev_enabled() 0
>   #define sev_es_enabled() 0
>   #endif

Hmm, not sure why sev did it this way.  One possible reason is that the
compiler optimizer can see sev_enabled() evaluates to 0 and throw away
the dead code branches then.

So, yes, maybe it makes sense to stick to the #ifdef in this specific
case.
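
To make the optimizer argument concrete, here is a small sketch with made-up
names (CONFIG_DEMO_TDX and demo_is_tdx_vm() are stand-ins, not QEMU symbols).
When the config symbol is undefined, the macro turns the condition into the
constant 0, so the compiler can drop the guarded branch without needing to
look into another translation unit, which a stub function in a separate .c
file would generally require (unless link-time optimization is used).

```c
#include <stdbool.h>

/* Left undefined here on purpose, to model a build without the feature. */
/* #define CONFIG_DEMO_TDX 1 */

#ifdef CONFIG_DEMO_TDX
bool demo_is_tdx_vm(void);       /* real implementation lives elsewhere */
#else
#define demo_is_tdx_vm() 0       /* compile-time constant */
#endif

static int demo_setup(void)
{
    int steps = 1;               /* common setup */

    if (demo_is_tdx_vm()) {
        /* With the macro variant this is "if (0)", i.e. dead code. */
        steps += 10;
    }
    return steps;
}
```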

take care,
  Gerd



* Re: [RFC PATCH v4 11/36] i386/tdx: Initialize TDX before creating TD vcpus
  2022-05-23 15:42     ` Xiaoyao Li
@ 2022-05-24  6:57       ` Gerd Hoffmann
  2022-06-01  7:20         ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  6:57 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

  Hi,

> > Hmm, hooking *vm* initialization into *vcpu* creation looks wrong to me.
> 
> That's because TDX has to do VM-scope (feature) initialization
> before creating any vcpu. This is new to KVM and QEMU: until now, every
> feature was vcpu-scope and configured per vcpu.
> 
> To minimize the changes to QEMU, we want to use @cpu and @cpu->env to
> grab the configuration info. That's why it is done this way.
> 
> Do you have a better idea?

Maybe it's a bit more work to add VM-scope initialization support to
qemu.  But I expect that approach will work better long-term.  You need
this mutex and the 'initialized' variable in your code to make sure it
runs only once because the way you hook it in is not ideal ...
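
Purely as a sketch of the shape I have in mind, with invented names and no
claim about how the kvm code in qemu is actually structured: if the
VM-scope step is its own call made once before the vcpu loop, no lock or
'initialized' flag is needed to keep it from running per vcpu.

```c
#include <stdbool.h>

/* Hypothetical VM state, only for this illustration. */
typedef struct DemoVm {
    bool vm_initialized;
    int init_calls;
    int nr_vcpus;
} DemoVm;

/* VM-scope configuration: meant to run exactly once per VM. */
static int demo_vm_scope_init(DemoVm *vm)
{
    vm->init_calls++;
    vm->vm_initialized = true;
    return 0;
}

/* vcpu creation only checks the precondition, it does not initialize. */
static int demo_vcpu_create(DemoVm *vm)
{
    if (!vm->vm_initialized) {
        return -1;
    }
    vm->nr_vcpus++;
    return 0;
}

/* The caller sequences the two scopes explicitly. */
static int demo_machine_init(DemoVm *vm, int cpus)
{
    int r = demo_vm_scope_init(vm);
    int i;

    if (r < 0) {
        return r;
    }
    for (i = 0; i < cpus; i++) {
        r = demo_vcpu_create(vm);
        if (r < 0) {
            return r;
        }
    }
    return 0;
}
```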

[ disclaimer: I'm not that familiar with the kvm interface in qemu ]

take care,
  Gerd



* Re: [RFC PATCH v4 13/36] i386/tdx: Validate TD attributes
  2022-05-24  4:19     ` Xiaoyao Li
@ 2022-05-24  6:59       ` Gerd Hoffmann
  2022-05-24  8:11         ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  6:59 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, May 24, 2022 at 12:19:51PM +0800, Xiaoyao Li wrote:
> On 5/23/2022 5:39 PM, Gerd Hoffmann wrote:
> > So, how is this supposed to work?  Patch #2 introduces attributes as
> > user-settable property.  So do users have to manually figure out and pass
> > the correct value, so the check passes?  Specifically the fixed1 check?
> > 
> > I think 'attributes' should not be user-settable in the first place.
> > Each feature-bit which is actually user-settable (and not already
> > covered by another option like pmu) should be a separate attribute for
> > tdx-object.  Then the tdx code can create attributes from hardware
> > capabilities and user settings.
> 
> In patch #2, tdx-guest.attributes is defined as a field to hold a 64 bits
> value of attributes but it doesn't provide any getter/setter for it. So it's
> *not* user-settable.

Ok.  Why is it declared as an object property in the first place, then?

take care,
  Gerd


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 16/36] i386/tdvf: Introduce function to parse TDVF metadata
  2022-05-12  3:17 ` [RFC PATCH v4 16/36] i386/tdvf: Introduce function to parse TDVF metadata Xiaoyao Li
@ 2022-05-24  7:02   ` Gerd Hoffmann
  2022-05-26  2:25     ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  7:02 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

  Hi,

> +static int tdvf_parse_section_entry(const TdvfSectionEntry *src,
> +                                     TdxFirmwareEntry *entry)

> +    /* sanity check */

That is what the whole function is doing.  So rename it to
tdvf_check_section_entry to clarify that?

take care,
  Gerd


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 17/36] i386/tdx: Parse TDVF metadata for TDX VM
  2022-05-12  3:17 ` [RFC PATCH v4 17/36] i386/tdx: Parse TDVF metadata for TDX VM Xiaoyao Li
@ 2022-05-24  7:03   ` Gerd Hoffmann
  0 siblings, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  7:03 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:44AM +0800, Xiaoyao Li wrote:
> TDX cannot support pflash device since it doesn't support read-only
> memslot and doesn't support emulation. Load TDVF(OVMF) with -bios option
> for TDs.
> 
> When booting a TD, besides loading TDVF to an address below 4G, QEMU
> needs to parse the TDVF metadata.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 18/36] i386/tdx: Skip BIOS shadowing setup
  2022-05-12  3:17 ` [RFC PATCH v4 18/36] i386/tdx: Skip BIOS shadowing setup Xiaoyao Li
@ 2022-05-24  7:08   ` Gerd Hoffmann
  2022-05-26  2:48     ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  7:08 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:45AM +0800, Xiaoyao Li wrote:
> TDX guest cannot go to real mode, so just skip the setup of isa-bios.

Does isa-bios setup cause any actual problems?
(same question for patch #19).

"is not needed" IMHO isn't a good enough reason to special-case tdx
here.

take care,
  Gerd


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 20/36] i386/tdx: Register a machine_init_done callback for TD
  2022-05-12  3:17 ` [RFC PATCH v4 20/36] i386/tdx: Register a machine_init_done callback for TD Xiaoyao Li
@ 2022-05-24  7:09   ` Gerd Hoffmann
  2022-05-26  2:52     ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  7:09 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:47AM +0800, Xiaoyao Li wrote:
> Before a TD can run, it needs to
>  - setup/configure TD HOB list;
>  - initialize TDVF into TD's private memory;
>  - initialize TD vcpu state;
> 
> Register a machine_init_done callback to do all of that.

> +static void tdx_finalize_vm(Notifier *notifier, void *unused)
> +{
> +    /* TODO */
> +}

I'd suggest squashing this into the patch that actually implements
tdx_finalize_vm.

take care,
  Gerd


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 21/36] i386/tdx: Track mem_ptr for each firmware entry of TDVF
  2022-05-12  3:17 ` [RFC PATCH v4 21/36] i386/tdx: Track mem_ptr for each firmware entry of TDVF Xiaoyao Li
@ 2022-05-24  7:11   ` Gerd Hoffmann
  0 siblings, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  7:11 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:48AM +0800, Xiaoyao Li wrote:
> For each TDVF section, QEMU needs to copy the content to guest
> private memory via KVM API (KVM_TDX_INIT_MEM_REGION).
> 
> Introduce a field @mem_ptr for TdxFirmwareEntry to track the memory
> pointer of each TDVF section, so that QEMU can add/copy them to guest
> private memory later.
> 
> TDVF sections can be classified into two groups:
>  - Firmware itself, e.g., BFV and CFV, which is located separately from
>    guest RAM. Its memory pointer is the bios pointer.
> 
>  - Sections located at guest RAM, e.g., TEMP_MEM and TD_HOB.
>    mmap a new memory range for them.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 22/36] i386/tdx: Track RAM entries for TDX VM
  2022-05-12  3:17 ` [RFC PATCH v4 22/36] i386/tdx: Track RAM entries for TDX VM Xiaoyao Li
@ 2022-05-24  7:37   ` Gerd Hoffmann
  2022-05-26  7:33     ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  7:37 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

> +static int tdx_accept_ram_range(uint64_t address, uint64_t length)
> +{
> +    TdxRamEntry *e;
> +    int i;
> +
> +    for (i = 0; i < tdx_guest->nr_ram_entries; i++) {
> +        e = &tdx_guest->ram_entries[i];
> +
> +        if (address + length < e->address ||
> +            e->address + e->length < address) {
> +                continue;
> +        }
> +
> +        if (e->address > address ||
> +            e->address + e->length < address + length) {
> +            return -EINVAL;
> +        }

if (e->type == TDX_RAM_ADDED)
	return -EINVAL;

> +        if (e->address == address && e->length == length) {
> +            e->type = TDX_RAM_ADDED;
> +        } else if (e->address == address) {
> +            e->address += length;
> +            e->length -= length;
> +            tdx_add_ram_entry(address, length, TDX_RAM_ADDED);
> +        } else if (e->address + e->length == address + length) {
> +            e->length -= length;
> +            tdx_add_ram_entry(address, length, TDX_RAM_ADDED);
> +        } else {
> +            TdxRamEntry tmp = {
> +                .address = e->address,
> +                .length = e->length,
> +            };
> +            e->length = address - tmp.address;
> +
> +            tdx_add_ram_entry(address, length, TDX_RAM_ADDED);
> +            tdx_add_ram_entry(address + length,
> +                              tmp.address + tmp.length - (address + length),
> +                              TDX_RAM_UNACCEPTED);
> +        }

I think all this can be simplified, by
  (1) Change the existing entry to cover the accepted ram range.
  (2) If there is room before the accepted ram range add a
      TDX_RAM_UNACCEPTED entry for that.
  (3) If there is room after the accepted ram range add a
      TDX_RAM_UNACCEPTED entry for that.
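
A minimal sketch of the three steps above, assuming a fixed-size table in
place of the tdx_guest state. The names accept_ram_range, add_ram_entry,
ram_entries and MAX_RAM_ENTRIES are hypothetical stand-ins, not the ones
from the patch, and the already-added check suggested earlier is folded in:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the patch's types; only the shape matters. */
enum TdxRamType { TDX_RAM_UNACCEPTED, TDX_RAM_ADDED };

typedef struct TdxRamEntry {
    uint64_t address;
    uint64_t length;
    enum TdxRamType type;
} TdxRamEntry;

#define MAX_RAM_ENTRIES 32

static TdxRamEntry ram_entries[MAX_RAM_ENTRIES];
static int nr_ram_entries;

static int add_ram_entry(uint64_t address, uint64_t length,
                         enum TdxRamType type)
{
    if (nr_ram_entries >= MAX_RAM_ENTRIES) {
        return -ENOMEM;
    }
    ram_entries[nr_ram_entries++] = (TdxRamEntry){ address, length, type };
    return 0;
}

static int accept_ram_range(uint64_t address, uint64_t length)
{
    uint64_t end = address + length;

    if (length == 0) {
        return -EINVAL;
    }

    for (int i = 0; i < nr_ram_entries; i++) {
        TdxRamEntry *e = &ram_entries[i];
        uint64_t e_end = e->address + e->length;

        /* Only an entry that fully contains the range can be split. */
        if (address < e->address || end > e_end) {
            continue;
        }
        if (e->type == TDX_RAM_ADDED) {
            return -EINVAL;
        }

        uint64_t head_start = e->address;
        uint64_t head_length = address - e->address;
        uint64_t tail_length = e_end - end;
        int needed = (head_length ? 1 : 0) + (tail_length ? 1 : 0);

        /* Check capacity first so the table is never left half-updated. */
        if (nr_ram_entries + needed > MAX_RAM_ENTRIES) {
            return -ENOMEM;
        }

        /* (1) The existing entry becomes the accepted range. */
        e->address = address;
        e->length = length;
        e->type = TDX_RAM_ADDED;

        /* (2) Room before the accepted range stays unaccepted. */
        if (head_length) {
            add_ram_entry(head_start, head_length, TDX_RAM_UNACCEPTED);
        }
        /* (3) Room after the accepted range stays unaccepted. */
        if (tail_length) {
            add_ram_entry(end, tail_length, TDX_RAM_UNACCEPTED);
        }
        return 0;
    }

    /* No single entry contains the requested range. */
    return -EINVAL;
}
```

With this shape the four-way if/else chain collapses into one path, since
an empty head or tail simply adds no entry.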

take care,
  Gerd


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 23/36] i386/tdx: Setup the TD HOB list
  2022-05-12  3:17 ` [RFC PATCH v4 23/36] i386/tdx: Setup the TD HOB list Xiaoyao Li
  2022-05-12 18:33   ` Isaku Yamahata
@ 2022-05-24  7:56   ` Gerd Hoffmann
  2022-06-02  9:27     ` Xiaoyao Li
  1 sibling, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  7:56 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

  Hi,

> +static void tdvf_hob_add_mmio_resources(TdvfHob *hob)
> +{
> +    MachineState *ms = MACHINE(qdev_get_machine());
> +    X86MachineState *x86ms = X86_MACHINE(ms);
> +    PCIHostState *pci_host;
> +    uint64_t start, end;
> +    uint64_t mcfg_base, mcfg_size;
> +    Object *host;
> +
> +    /* Effectively PCI hole + other MMIO devices. */
> +    tdvf_hob_add_mmio_resource(hob, x86ms->below_4g_mem_size,
> +                               APIC_DEFAULT_ADDRESS);
> +
> +    /* Stolen from acpi_get_i386_pci_host(), there's gotta be an easier way. */
> +    pci_host = OBJECT_CHECK(PCIHostState,
> +                            object_resolve_path("/machine/i440fx", NULL),
> +                            TYPE_PCI_HOST_BRIDGE);
> +    if (!pci_host) {
> +        pci_host = OBJECT_CHECK(PCIHostState,
> +                                object_resolve_path("/machine/q35", NULL),
> +                                TYPE_PCI_HOST_BRIDGE);
> +    }
> +    g_assert(pci_host);
> +
> +    host = OBJECT(pci_host);
> +
> +    /* PCI hole above 4gb. */
> +    start = object_property_get_uint(host, PCI_HOST_PROP_PCI_HOLE64_START,
> +                                     NULL);
> +    end = object_property_get_uint(host, PCI_HOST_PROP_PCI_HOLE64_END, NULL);
> +    tdvf_hob_add_mmio_resource(hob, start, end);
> +
> +    /* MMCFG region */
> +    mcfg_base = object_property_get_uint(host, PCIE_HOST_MCFG_BASE, NULL);
> +    mcfg_size = object_property_get_uint(host, PCIE_HOST_MCFG_SIZE, NULL);
> +    if (mcfg_base && mcfg_base != PCIE_BASE_ADDR_UNMAPPED && mcfg_size) {
> +        tdvf_hob_add_mmio_resource(hob, mcfg_base, mcfg_base + mcfg_size);
> +    }
> +}

That looks suspicious.  I think you need none of this, except for the
first tdvf_hob_add_mmio_resource() call which adds the below-4G hole.

It is the firmware which places the mmio resources into the address
space by programming the pci config space of the devices.  qemu doesn't
dictate any of this, and I doubt you get any useful values here.  This
code runs before the firmware has had the chance to do any setup here ...

> new file mode 100644
> index 000000000000..b15aba796156
> --- /dev/null
> +++ b/hw/i386/uefi.h

Separate patch please.

Also this should probably go somewhere below
include/standard-headers/

take care,
  Gerd


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 24/36] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION
  2022-05-12  3:17 ` [RFC PATCH v4 24/36] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION Xiaoyao Li
  2022-05-12 18:34   ` Isaku Yamahata
@ 2022-05-24  7:57   ` Gerd Hoffmann
  1 sibling, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  7:57 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:51AM +0800, Xiaoyao Li wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> TDVF firmware (CODE and VARS) needs to be added/copied to TD's private
> memory via KVM_TDX_INIT_MEM_REGION, as well as TD HOB and TEMP memory.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 25/36] i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu
  2022-05-12  3:17 ` [RFC PATCH v4 25/36] i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu Xiaoyao Li
@ 2022-05-24  7:59   ` Gerd Hoffmann
  0 siblings, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  7:59 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:52AM +0800, Xiaoyao Li wrote:
> TDX vcpu needs to be initialized by SEAMCALL(TDH.VP.INIT) and KVM
> provides vcpu level IOCTL KVM_TDX_INIT_VCPU for it.
> 
> KVM_TDX_INIT_VCPU needs the address of the HOB as input. Invoke it for
> each vcpu after HOB list is created.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 26/36] i386/tdx: Finalize TDX VM
  2022-05-12  3:17 ` [RFC PATCH v4 26/36] i386/tdx: Finalize TDX VM Xiaoyao Li
@ 2022-05-24  7:59   ` Gerd Hoffmann
  0 siblings, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  7:59 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:53AM +0800, Xiaoyao Li wrote:
> Invoke KVM_TDX_FINALIZE_VM to finalize the TD's measurement and make
> the TD vCPUs runnable once machine initialization is complete.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 27/36] i386/tdx: Disable SMM for TDX VMs
  2022-05-12  3:17 ` [RFC PATCH v4 27/36] i386/tdx: Disable SMM for TDX VMs Xiaoyao Li
@ 2022-05-24  8:00   ` Gerd Hoffmann
  0 siblings, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  8:00 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:54AM +0800, Xiaoyao Li wrote:
> TDX doesn't support SMM and VMM cannot emulate SMM for TDX VMs because
> VMM cannot manipulate TDX VM's memory.
> 
> Disable SMM for TDX VMs and error out if user requests to enable SMM.

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 28/36] i386/tdx: Disable PIC for TDX VMs
  2022-05-12  3:17 ` [RFC PATCH v4 28/36] i386/tdx: Disable PIC " Xiaoyao Li
@ 2022-05-24  8:00   ` Gerd Hoffmann
  0 siblings, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  8:00 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:55AM +0800, Xiaoyao Li wrote:
> Legacy PIC (8259) cannot be supported for TDX VMs since the TDX module
> doesn't allow direct interrupt injection.  Using posted interrupts
> for the PIC is not a viable option as the guest BIOS/kernel will not
> do EOI for PIC IRQs, i.e. will leave the vIRR bit set.
> 
> Hence disable PIC for TDX VMs and error out if user wants PIC.

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 29/36] i386/tdx: Don't allow system reset for TDX VMs
  2022-05-12  3:17 ` [RFC PATCH v4 29/36] i386/tdx: Don't allow system reset " Xiaoyao Li
@ 2022-05-24  8:01   ` Gerd Hoffmann
  0 siblings, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  8:01 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:56AM +0800, Xiaoyao Li wrote:
> TDX CPU state is protected and thus vcpu state cannot be reset by VMM.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 32/36] i386/tdx: Don't synchronize guest tsc for TDs
  2022-05-12  3:17 ` [RFC PATCH v4 32/36] i386/tdx: Don't synchronize guest tsc for TDs Xiaoyao Li
@ 2022-05-24  8:04   ` Gerd Hoffmann
  0 siblings, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  8:04 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:17:59AM +0800, Xiaoyao Li wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> TSC of TDs is not accessible and KVM doesn't allow access of
> MSR_IA32_TSC for TDs. To avoid the assert() in kvm_get_tsc, make
> kvm_synchronize_all_tsc() a no-op for TDs.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 33/36] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() for TDs
  2022-05-12  3:18 ` [RFC PATCH v4 33/36] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() " Xiaoyao Li
@ 2022-05-24  8:05   ` Gerd Hoffmann
  0 siblings, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  8:05 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P . Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 12, 2022 at 11:18:00AM +0800, Xiaoyao Li wrote:
> For TDs, only MSR_IA32_UCODE_REV in kvm_init_msrs() can be configured
> by VMM, while the features enumerated/controlled by other MSRs except
> MSR_IA32_UCODE_REV in kvm_init_msrs() are not under control of VMM.
> 
> Only configure MSR_IA32_UCODE_REV for TDs.

Acked-by: Gerd Hoffmann <kraxel@redhat.com>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 13/36] i386/tdx: Validate TD attributes
  2022-05-24  6:59       ` Gerd Hoffmann
@ 2022-05-24  8:11         ` Xiaoyao Li
  2022-05-24  8:29           ` Gerd Hoffmann
  0 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-24  8:11 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/24/2022 2:59 PM, Gerd Hoffmann wrote:
> On Tue, May 24, 2022 at 12:19:51PM +0800, Xiaoyao Li wrote:
>> On 5/23/2022 5:39 PM, Gerd Hoffmann wrote:
>>> So, how is this supposed to work?  Patch #2 introduces attributes as
>>> user-settable property.  So do users have to manually figure and pass
>>> the correct value, so the check passes?  Specifically the fixed1 check?
>>>
>>> I think 'attributes' should not be user-settable in the first place.
>>> Each feature-bit which is actually user-settable (and not already
>>> covered by another option like pmu) should be a separate attribute for
>>> tdx-object.  Then the tdx code can create attributes from hardware
>>> capabilities and user settings.
>>
>> In patch #2, tdx-guest.attributes is defined as a field to hold a 64 bits
>> value of attributes but it doesn't provide any getter/setter for it. So it's
>> *not* user-settable.
> 
> Ok.  Why it is declared as object property in the first place then?

Is there another way to define a member/field of an object besides a property?

> take care,
>    Gerd
> 


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 13/36] i386/tdx: Validate TD attributes
  2022-05-24  8:11         ` Xiaoyao Li
@ 2022-05-24  8:29           ` Gerd Hoffmann
  2022-05-26  3:44             ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-24  8:29 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Tue, May 24, 2022 at 04:11:56PM +0800, Xiaoyao Li wrote:
> On 5/24/2022 2:59 PM, Gerd Hoffmann wrote:
> > On Tue, May 24, 2022 at 12:19:51PM +0800, Xiaoyao Li wrote:
> > > On 5/23/2022 5:39 PM, Gerd Hoffmann wrote:
> > > > So, how is this supposed to work?  Patch #2 introduces attributes as
> > > > user-settable property.  So do users have to manually figure and pass
> > > > the correct value, so the check passes?  Specifically the fixed1 check?
> > > > 
> > > > I think 'attributes' should not be user-settable in the first place.
> > > > Each feature-bit which is actually user-settable (and not already
> > > > covered by another option like pmu) should be a separate attribute for
> > > > tdx-object.  Then the tdx code can create attributes from hardware
> > > > capabilities and user settings.
> > > 
> > > In patch #2, tdx-guest.attributes is defined as a field to hold a 64 bits
> > > value of attributes but it doesn't provide any getter/setter for it. So it's
> > > *not* user-settable.
> > 
> > Ok.  Why it is declared as object property in the first place then?
> 
> Is there another way to define a member/field of object besides property?

Well, the C object struct is completely independent from the qapi
struct.  Typically qapi-generated structs are added as struct fields.
Look at ui/input-linux.c for example.

struct InputLinux holds all the object state.  It has a GrabToggleKeys
field, that is a qapi-generated enum (see qapi/common.json) and is
user-configurable (there are getter and setter for it).

So, you can have a private 'attributes' struct field in your tdx class,
but the field doesn't have to be in the qapi struct for that.
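
A rough sketch of that split, together with the earlier idea of building
the attributes from hardware capabilities plus user settings.  Everything
here is hypothetical: TdxGuestSketch, the debug flag, SKETCH_ATTR_DEBUG
and the fixed0/fixed1 meaning (fixed0 = bits allowed to be 1, fixed1 =
bits that must be 1) are assumptions for illustration, not the real
tdx-guest object or the KVM capability layout:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute bit, chosen only for this sketch. */
#define SKETCH_ATTR_DEBUG (1ULL << 0)

typedef struct TdxGuestSketch {
    /* User-configurable: this is what would get a getter and setter. */
    bool debug;
    /* Private state: plain struct field, never exposed as a property. */
    uint64_t attributes;
} TdxGuestSketch;

/*
 * Derive the attributes value from user settings and (assumed)
 * hardware capabilities, instead of letting the user pass it in.
 */
static int sketch_setup_attributes(TdxGuestSketch *tdx,
                                   uint64_t fixed0, uint64_t fixed1)
{
    uint64_t attrs = fixed1;            /* bits that must be set */

    if (tdx->debug) {
        attrs |= SKETCH_ATTR_DEBUG;     /* bit requested by the user */
    }
    if (attrs & ~fixed0) {
        return -EINVAL;                 /* a bit that is not allowed */
    }
    tdx->attributes = attrs;
    return 0;
}
```

The user only ever touches the boolean; the 64-bit value stays internal,
so there is nothing for the user to get wrong in the fixed1 check.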

HTH,
  Gerd


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 16/36] i386/tdvf: Introduce function to parse TDVF metadata
  2022-05-24  7:02   ` Gerd Hoffmann
@ 2022-05-26  2:25     ` Xiaoyao Li
  0 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-26  2:25 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/24/2022 3:02 PM, Gerd Hoffmann wrote:
>    Hi,
> 
>> +static int tdvf_parse_section_entry(const TdvfSectionEntry *src,
>> +                                     TdxFirmwareEntry *entry)
> 
>> +    /* sanity check */
> 
> That is what the whole function is doing.  So rename it to
> tdvf_check_section_entry to clarify that?

I will rename it to tdvf_parse_and_check_section_entry(), since it first
parses the section entries from TDVF into the software-defined data
structure TdxFirmwareEntry.

> take care,
>    Gerd
> 


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 18/36] i386/tdx: Skip BIOS shadowing setup
  2022-05-24  7:08   ` Gerd Hoffmann
@ 2022-05-26  2:48     ` Xiaoyao Li
  2022-05-30 11:49       ` Gerd Hoffmann
  0 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-26  2:48 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/24/2022 3:08 PM, Gerd Hoffmann wrote:
> On Thu, May 12, 2022 at 11:17:45AM +0800, Xiaoyao Li wrote:
>> TDX guest cannot go to real mode, so just skip the setup of isa-bios.
> 
> Does isa-bios setup cause any actual problems?
> (same question for patch #19).

It causes mem_region split and mem_slot deletion on KVM.

TDVF marks pages starting from 0x800000 as TEMP_MEM and TD_HOB, which 
are TD's private memory and are TDH_MEM_PAGE_ADD'ed to TD via 
KVM_TDX_INIT_MEM_REGION

However, if isa-bios and pc.rom are not skipped, their memory_region
initialization happens after KVM_TDX_INIT_MEM_REGION in
tdx_machine_done_notify(). (I haven't figured out why it happens in this
order, though.)

And then it causes a memory region split that splits
	[0, ram_below_4g)
to
	[0, 0xc0 000),
	[0xc0 000, 0xe0 000),
	[0xe0 000, 0x100 000),
	[0x100 000, ram_below_4g)

which causes mem_slot deletion on KVM. On the KVM side, we lose the page
content on mem_slot deletion. Thus, we lose the content of the TD HOB.
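
To illustrate the split above, here is a small standalone model.  It is
only a sketch: Range and split_range are hypothetical names, this is not
QEMU's memory API, and it assumes the overlays are sorted and do not
overlap each other.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open address range [start, end). Hypothetical helper type. */
typedef struct Range {
    uint64_t start;
    uint64_t end;
} Range;

/*
 * Split `ram` at the boundaries of the overlays lying inside it.
 * Returns the number of pieces written to `out`, or -1 if `out`
 * is too small.
 */
static int split_range(Range ram, const Range *overlays, int n,
                       Range *out, int max_out)
{
    uint64_t cursor = ram.start;
    int count = 0;

    for (int i = 0; i < n; i++) {
        Range o = overlays[i];

        /* Ignore overlays that fall outside the remaining RAM. */
        if (o.end <= cursor || o.start >= ram.end) {
            continue;
        }
        /* Clamp the overlay to the RAM range. */
        if (o.start < cursor) {
            o.start = cursor;
        }
        if (o.end > ram.end) {
            o.end = ram.end;
        }
        /* Plain RAM in front of the overlay becomes its own piece. */
        if (o.start > cursor) {
            if (count == max_out) {
                return -1;
            }
            out[count++] = (Range){ cursor, o.start };
        }
        if (count == max_out) {
            return -1;
        }
        out[count++] = o;
        cursor = o.end;
    }
    /* Whatever is left after the last overlay. */
    if (cursor < ram.end) {
        if (count == max_out) {
            return -1;
        }
        out[count++] = (Range){ cursor, ram.end };
    }
    return count;
}
```

Feeding it [0, ram_below_4g) with overlays at [0xc0000, 0xe0000) and
[0xe0000, 0x100000) gives the four ranges listed above, which is why the
original single slot has to be deleted and re-created.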

Yes, the better solution seems to be to ensure KVM_TDX_INIT_MEM_REGION is
called after all the mem regions have settled down. But I haven't figured
out the reason why the isa-bios and pc.rom initialization happens after
machine_init_done_notifier.

On the other hand, to keep isa-bios and pc.rom, we need additional work
to copy the content from the end_of_4G to end_of_1M.

I'm not sure whether isa-bios and pc.rom are needed by people on a TD guest,
so I just skip them for simplicity.

> "is not needed" IMHO isn't a good enough reason to special-case tdx
> here.
> 
> take care,
>    Gerd
> 


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 20/36] i386/tdx: Register a machine_init_done callback for TD
  2022-05-24  7:09   ` Gerd Hoffmann
@ 2022-05-26  2:52     ` Xiaoyao Li
  0 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-26  2:52 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/24/2022 3:09 PM, Gerd Hoffmann wrote:
> On Thu, May 12, 2022 at 11:17:47AM +0800, Xiaoyao Li wrote:
>> Before a TD can run, it needs to
>>   - setup/configure TD HOB list;
>>   - initialize TDVF into TD's private memory;
>>   - initialize TD vcpu state;
>>
>> Register a machine_init_done callback to do all of that.
> 
>> +static void tdx_finalize_vm(Notifier *notifier, void *unused)
>> +{
>> +    /* TODO */
>> +}
> 
> I'd suggest to squash this into the patch actually implementing
> tdx_finalize_vm.

OK. I'll squash it into the next patch.

> take care,
>    Gerd
> 


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 13/36] i386/tdx: Validate TD attributes
  2022-05-24  8:29           ` Gerd Hoffmann
@ 2022-05-26  3:44             ` Xiaoyao Li
  0 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-26  3:44 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/24/2022 4:29 PM, Gerd Hoffmann wrote:
> On Tue, May 24, 2022 at 04:11:56PM +0800, Xiaoyao Li wrote:
>> On 5/24/2022 2:59 PM, Gerd Hoffmann wrote:
>>> On Tue, May 24, 2022 at 12:19:51PM +0800, Xiaoyao Li wrote:
>>>> On 5/23/2022 5:39 PM, Gerd Hoffmann wrote:
>>>>> So, how is this supposed to work?  Patch #2 introduces attributes as
>>>>> user-settable property.  So do users have to manually figure and pass
>>>>> the correct value, so the check passes?  Specifically the fixed1 check?
>>>>>
>>>>> I think 'attributes' should not be user-settable in the first place.
>>>>> Each feature-bit which is actually user-settable (and not already
>>>>> covered by another option like pmu) should be a separate attribute for
>>>>> tdx-object.  Then the tdx code can create attributes from hardware
>>>>> capabilities and user settings.
>>>>
>>>> In patch #2, tdx-guest.attributes is defined as a field to hold a 64 bits
>>>> value of attributes but it doesn't provide any getter/setter for it. So it's
>>>> *not* user-settable.
>>>
>>> Ok.  Why it is declared as object property in the first place then?
>>
>> Is there another way to define a member/field of object besides property?
> 
> Well, the C object struct is completely independent from the qapi
> struct.  Typically qapi-generated structs are added as struct fields.
> Look at ui/input-linux.c for example.
> 
> struct InputLinux holds all the object state.  It has a GrabToggleKeys
> field, that is a qapi-generated enum (see qapi/common.json) and is
> user-configurable (there are getter and setter for it).
> 
> So, you can have a private 'attributes' struct field in your tdx class,
> but the field doesn't have to be in the qapi struct for that.

I see. Thanks for the explanation!

I will remove the qom property definition in patch 2.

> HTH,
>    Gerd
> 


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [RFC PATCH v4 22/36] i386/tdx: Track RAM entries for TDX VM
  2022-05-24  7:37   ` Gerd Hoffmann
@ 2022-05-26  7:33     ` Xiaoyao Li
  2022-05-26 18:48       ` Isaku Yamahata
  2022-05-27  8:36       ` Xiaoyao Li
  0 siblings, 2 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-26  7:33 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/24/2022 3:37 PM, Gerd Hoffmann wrote:
>> +static int tdx_accept_ram_range(uint64_t address, uint64_t length)
>> +{
>> +    TdxRamEntry *e;
>> +    int i;
>> +
>> +    for (i = 0; i < tdx_guest->nr_ram_entries; i++) {
>> +        e = &tdx_guest->ram_entries[i];
>> +
>> +        if (address + length < e->address ||
>> +            e->address + e->length < address) {
>> +                continue;
>> +        }
>> +
>> +        if (e->address > address ||
>> +            e->address + e->length < address + length) {
>> +            return -EINVAL;
>> +        }
> 
> if (e->type == TDX_RAM_ADDED)
> 	return -EINVAL
> 
>> +        if (e->address == address && e->length == length) {
>> +            e->type = TDX_RAM_ADDED;
>> +        } else if (e->address == address) {
>> +            e->address += length;
>> +            e->length -= length;
>> +            tdx_add_ram_entry(address, length, TDX_RAM_ADDED);
>> +        } else if (e->address + e->length == address + length) {
>> +            e->length -= length;
>> +            tdx_add_ram_entry(address, length, TDX_RAM_ADDED);
>> +        } else {
>> +            TdxRamEntry tmp = {
>> +                .address = e->address,
>> +                .length = e->length,
>> +            };
>> +            e->length = address - tmp.address;
>> +
>> +            tdx_add_ram_entry(address, length, TDX_RAM_ADDED);
>> +            tdx_add_ram_entry(address + length,
>> +                              tmp.address + tmp.length - (address + length),
>> +                              TDX_RAM_UNACCEPTED);
>> +        }
> 
> I think all this can be simplified, by
>    (1) Change the existing entry to cover the accepted ram range.
>    (2) If there is room before the accepted ram range add a
>        TDX_RAM_UNACCEPTED entry for that.
>    (3) If there is room after the accepted ram range add a
>        TDX_RAM_UNACCEPTED entry for that.

I implemented it as below. Please help review.

+static int tdx_accept_ram_range(uint64_t address, uint64_t length)
+{
+    uint64_t head_start, tail_start, head_length, tail_length;
+    uint64_t tmp_address, tmp_length;
+    TdxRamEntry *e;
+    int i;
+
+    for (i = 0; i < tdx_guest->nr_ram_entries; i++) {
+        e = &tdx_guest->ram_entries[i];
+
+        if (address + length < e->address ||
+            e->address + e->length < address) {
+            continue;
+        }
+
+        /*
+         * The to-be-accepted ram range must be fully contained by a
+         * single RAM entry.
+         */
+        if (e->address > address ||
+            e->address + e->length < address + length) {
+            return -EINVAL;
+        }
+
+        if (e->type == TDX_RAM_ADDED) {
+            return -EINVAL;
+        }
+
+        tmp_address = e->address;
+        tmp_length = e->length;
+
+        e->address = address;
+        e->length = length;
+        e->type = TDX_RAM_ADDED;
+
+        head_length = address - tmp_address;
+        if (head_length > 0) {
+            head_start = e->address;
+            tdx_add_ram_entry(head_start, head_length, TDX_RAM_UNACCEPTED);
+        }
+
+        tail_start = address + length;
+        if (tail_start < tmp_address + tmp_length) {
+            tail_length = e->address + e->length - tail_start;
+            tdx_add_ram_entry(tail_start, tail_length, TDX_RAM_UNACCEPTED);
+        }
+
+        return 0;
+    }
+
+    return -1;
+}



> take care,
>    Gerd
> 



* Re: [RFC PATCH v4 22/36] i386/tdx: Track RAM entries for TDX VM
  2022-05-26  7:33     ` Xiaoyao Li
@ 2022-05-26 18:48       ` Isaku Yamahata
  2022-05-27  8:39         ` Xiaoyao Li
  2022-05-27  8:36       ` Xiaoyao Li
  1 sibling, 1 reply; 105+ messages in thread
From: Isaku Yamahata @ 2022-05-26 18:48 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Gerd Hoffmann, Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 26, 2022 at 03:33:10PM +0800,
Xiaoyao Li <xiaoyao.li@intel.com> wrote:

> On 5/24/2022 3:37 PM, Gerd Hoffmann wrote:
> > I think all this can be simplified, by
> >    (1) Change the existing entry to cover the accepted ram range.
> >    (2) If there is room before the accepted ram range add a
> >        TDX_RAM_UNACCEPTED entry for that.
> >    (3) If there is room after the accepted ram range add a
> >        TDX_RAM_UNACCEPTED entry for that.
> 
> I implement as below. Please help review.
> 
> +static int tdx_accept_ram_range(uint64_t address, uint64_t length)
> +{
> +    uint64_t head_start, tail_start, head_length, tail_length;
> +    uint64_t tmp_address, tmp_length;
> +    TdxRamEntry *e;
> +    int i;
> +
> +    for (i = 0; i < tdx_guest->nr_ram_entries; i++) {
> +        e = &tdx_guest->ram_entries[i];
> +
> +        if (address + length < e->address ||
> +            e->address + e->length < address) {
> +                continue;
> +        }
> +
> +        /*
> +         * The to-be-accepted ram range must be fully contained by one
> +         * RAM entries
> +         */
> +        if (e->address > address ||
> +            e->address + e->length < address + length) {
> +            return -EINVAL;
> +        }
> +
> +        if (e->type == TDX_RAM_ADDED) {
> +            return -EINVAL;
> +        }
> +
> +        tmp_address = e->address;
> +        tmp_length = e->length;
> +
> +        e->address = address;
> +        e->length = length;
> +        e->type = TDX_RAM_ADDED;
> +
> +        head_length = address - tmp_address;
> +        if (head_length > 0) {
> +            head_start = e->address;
> +            tdx_add_ram_entry(head_start, head_length, TDX_RAM_UNACCEPTED);

tdx_add_ram_entry() increments tdx_guest->nr_ram_entries.  I think it's worth
adding a comment on why this is safe with respect to this for-loop.
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


* Re: [RFC PATCH v4 22/36] i386/tdx: Track RAM entries for TDX VM
  2022-05-26  7:33     ` Xiaoyao Li
  2022-05-26 18:48       ` Isaku Yamahata
@ 2022-05-27  8:36       ` Xiaoyao Li
  1 sibling, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-27  8:36 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/26/2022 3:33 PM, Xiaoyao Li wrote:
> On 5/24/2022 3:37 PM, Gerd Hoffmann wrote:

>>> +        if (e->address == address && e->length == length) {
>>> +            e->type = TDX_RAM_ADDED;
>>> +        } else if (e->address == address) {
>>> +            e->address += length;
>>> +            e->length -= length;
>>> +            tdx_add_ram_entry(address, length, TDX_RAM_ADDED);
>>> +        } else if (e->address + e->length == address + length) {
>>> +            e->length -= length;
>>> +            tdx_add_ram_entry(address, length, TDX_RAM_ADDED);
>>> +        } else {
>>> +            TdxRamEntry tmp = {
>>> +                .address = e->address,
>>> +                .length = e->length,
>>> +            };
>>> +            e->length = address - tmp.address;
>>> +
>>> +            tdx_add_ram_entry(address, length, TDX_RAM_ADDED);
>>> +            tdx_add_ram_entry(address + length,
>>> +                              tmp.address + tmp.length - (address + length),
>>> +                              TDX_RAM_UNACCEPTED);
>>> +        }
>>
>> I think all this can be simplified, by
>>    (1) Change the existing entry to cover the accepted ram range.
>>    (2) If there is room before the accepted ram range add a
>>        TDX_RAM_UNACCEPTED entry for that.
>>    (3) If there is room after the accepted ram range add a
>>        TDX_RAM_UNACCEPTED entry for that.
> 
> I implement as below. Please help review.
> 
> +static int tdx_accept_ram_range(uint64_t address, uint64_t length)
> +{
> +    uint64_t head_start, tail_start, head_length, tail_length;
> +    uint64_t tmp_address, tmp_length;
> +    TdxRamEntry *e;
> +    int i;
> +
> +    for (i = 0; i < tdx_guest->nr_ram_entries; i++) {
> +        e = &tdx_guest->ram_entries[i];
> +
> +        if (address + length < e->address ||
> +            e->address + e->length < address) {
> +                continue;
> +        }
> +
> +        /*
> +         * The to-be-accepted ram range must be fully contained by one
> +         * RAM entries
> +         */
> +        if (e->address > address ||
> +            e->address + e->length < address + length) {
> +            return -EINVAL;
> +        }
> +
> +        if (e->type == TDX_RAM_ADDED) {
> +            return -EINVAL;
> +        }
> +
> +        tmp_address = e->address;
> +        tmp_length = e->length;
> +
> +        e->address = address;
> +        e->length = length;
> +        e->type = TDX_RAM_ADDED;
> +
> +        head_length = address - tmp_address;
> +        if (head_length > 0) {
> +            head_start = e->address;
> +            tdx_add_ram_entry(head_start, head_length, TDX_RAM_UNACCEPTED);
> +        }
> +
> +        tail_start = address + length;
> +        if (tail_start < tmp_address + tmp_length) {
> +            tail_length = e->address + e->length - tail_start;
> +            tdx_add_ram_entry(tail_start, tail_length, TDX_RAM_UNACCEPTED);
> +        }
> +
> +        return 0;
> +    }
> +
> +    return -1;
> +}

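The above is incorrect: head_start and tail_length were computed from the
already-updated entry instead of the saved tmp_address/tmp_length. Here is
the fixed version: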

+static int tdx_accept_ram_range(uint64_t address, uint64_t length)
+{
+    uint64_t head_start, tail_start, head_length, tail_length;
+    uint64_t tmp_address, tmp_length;
+    TdxRamEntry *e;
+    int i;
+
+    for (i = 0; i < tdx_guest->nr_ram_entries; i++) {
+        e = &tdx_guest->ram_entries[i];
+
+        if (address + length < e->address ||
+            e->address + e->length < address) {
+            continue;
+        }
+
+        /*
+         * The to-be-accepted ram range must be fully contained by a
+         * single RAM entry.
+         */
+        if (e->address > address ||
+            e->address + e->length < address + length) {
+            return -EINVAL;
+        }
+
+        if (e->type == TDX_RAM_ADDED) {
+            return -EINVAL;
+        }
+
+        tmp_address = e->address;
+        tmp_length = e->length;
+
+        e->address = address;
+        e->length = length;
+        e->type = TDX_RAM_ADDED;
+
+        head_length = address - tmp_address;
+        if (head_length > 0) {
+            head_start = tmp_address;
+            tdx_add_ram_entry(head_start, head_length, TDX_RAM_UNACCEPTED);
+        }
+
+        tail_start = address + length;
+        if (tail_start < tmp_address + tmp_length) {
+            tail_length = tmp_address + tmp_length - tail_start;
+            tdx_add_ram_entry(tail_start, tail_length, TDX_RAM_UNACCEPTED);
+        }
+
+        return 0;
+    }
+
+    return -1;
+}


> 
> 
>> take care,
>>    Gerd
>>
> 



* Re: [RFC PATCH v4 22/36] i386/tdx: Track RAM entries for TDX VM
  2022-05-26 18:48       ` Isaku Yamahata
@ 2022-05-27  8:39         ` Xiaoyao Li
  2022-05-30 11:59           ` Gerd Hoffmann
  0 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-27  8:39 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: Gerd Hoffmann, Paolo Bonzini, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/27/2022 2:48 AM, Isaku Yamahata wrote:
> On Thu, May 26, 2022 at 03:33:10PM +0800,
> Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 
>> On 5/24/2022 3:37 PM, Gerd Hoffmann wrote:
>>> I think all this can be simplified, by
>>>     (1) Change the existing entry to cover the accepted ram range.
>>>     (2) If there is room before the accepted ram range add a
>>>         TDX_RAM_UNACCEPTED entry for that.
>>>     (3) If there is room after the accepted ram range add a
>>>         TDX_RAM_UNACCEPTED entry for that.
>>
>> I implement as below. Please help review.
>>
>> +static int tdx_accept_ram_range(uint64_t address, uint64_t length)
>> +{
>> +    uint64_t head_start, tail_start, head_length, tail_length;
>> +    uint64_t tmp_address, tmp_length;
>> +    TdxRamEntry *e;
>> +    int i;
>> +
>> +    for (i = 0; i < tdx_guest->nr_ram_entries; i++) {
>> +        e = &tdx_guest->ram_entries[i];
>> +
>> +        if (address + length < e->address ||
>> +            e->address + e->length < address) {
>> +                continue;
>> +        }
>> +
>> +        /*
>> +         * The to-be-accepted ram range must be fully contained by one
>> +         * RAM entries
>> +         */
>> +        if (e->address > address ||
>> +            e->address + e->length < address + length) {
>> +            return -EINVAL;
>> +        }
>> +
>> +        if (e->type == TDX_RAM_ADDED) {
>> +            return -EINVAL;
>> +        }
>> +
>> +        tmp_address = e->address;
>> +        tmp_length = e->length;
>> +
>> +        e->address = address;
>> +        e->length = length;
>> +        e->type = TDX_RAM_ADDED;
>> +
>> +        head_length = address - tmp_address;
>> +        if (head_length > 0) {
>> +            head_start = e->address;
>> +            tdx_add_ram_entry(head_start, head_length, TDX_RAM_UNACCEPTED);
> 
> tdx_add_ram_entry() increments tdx_guest->nr_ram_entries.  I think it's worth
> for comments why this is safe regarding to this for-loop.

The for-loop only finds the existing RAM entry (from the E820 table) that
contains the range. It updates that entry and increments
tdx_guest->nr_ram_entries when the initial RAM entry needs to be split.
However, once the entry is found, the loop stops because the function
returns unconditionally.



* Re: [RFC PATCH v4 18/36] i386/tdx: Skip BIOS shadowing setup
  2022-05-26  2:48     ` Xiaoyao Li
@ 2022-05-30 11:49       ` Gerd Hoffmann
  2022-07-29  7:14         ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-30 11:49 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Thu, May 26, 2022 at 10:48:56AM +0800, Xiaoyao Li wrote:
> On 5/24/2022 3:08 PM, Gerd Hoffmann wrote:
> > On Thu, May 12, 2022 at 11:17:45AM +0800, Xiaoyao Li wrote:
> > > TDX guest cannot go to real mode, so just skip the setup of isa-bios.
> > 
> > Does isa-bios setup cause any actual problems?
> > (same question for patch #19).
> 
> It causes mem_region split and mem_slot deletion on KVM.
> 
> TDVF marks pages starting from 0x800000 as TEMP_MEM and TD_HOB, which are
> TD's private memory and are TDH_MEM_PAGE_ADD'ed to TD via
> KVM_TDX_INIT_MEM_REGION
> 
> However, if isa-bios and pc.rom are not skipped, the memory_region
> initialization of them is after KVM_TDX_INIT_MEM_REGION in
> tdx_machine_done_notify(). (I didn't figure out why this order though)
> 
> And the it causes memory region split that splits
> 	[0, ram_below_4g)
> to
> 	[0, 0xc0 000),
> 	[0xc0 000, 0xe0 000),
> 	[0xe0 000, 0x100 000),
> 	[0x100 000, ram_below_4g)
> 
> which causes mem_slot deletion on KVM. On KVM side, we lose the page content
> when mem_slot deletion.  Thus, the we lose the content of TD HOB.

Hmm, removing and re-creating memory slots shouldn't cause page content
to go away.  I'm wondering what the *real* problem is.  Maybe you lose
tdx-specific state, i.e. this removes TDH_MEM_PAGE_ADD changes?

> Yes, the better solution seems to be ensure KVM_TDX_INIT_MEM_REGION is
> called after all the mem region is settled down.

Yes, especially if tdx can't tolerate memory slots coming and going.

> But I haven't figured out the reason why the isa-bios and pc.rom
> initialization happens after machine_init_done_notifier

Probably happens when a flatview is created from the address space.

Maybe that is delayed somehow for machine creation, so all the address
space updates caused by device creation don't lead to lots of flatviews
being created and thrown away.

> on the other hand, to keep isa-bios and pc.rom, we need additional work to
> copy the content from the end_of_4G to end_of_1M.

There is no need for copying, end_of_1M is an alias memory region for
end_of_4G, so the backing storage is the same.
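
As a rough illustration of why an alias needs no copy, here is a small
standalone sketch. It does not use QEMU's memory API: Region and
region_alias() are hypothetical names, and the 256 KiB / 128 KiB sizes in
the usage note are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a memory region. */
typedef struct Region {
    uint8_t *base;   /* points into the backing storage, never owns it */
    size_t size;
} Region;

/*
 * Create a window onto part of an existing region.  No bytes are copied,
 * so both regions always show the same content.  Returns a region with
 * base == NULL if the window does not fit inside the original.
 */
static Region region_alias(const Region *orig, size_t offset, size_t size)
{
    Region r = { NULL, 0 };

    if (offset <= orig->size && size <= orig->size - offset) {
        r.base = orig->base + offset;
        r.size = size;
    }
    return r;
}
```

With a 256 KiB firmware image, region_alias(&bios, 128 * 1024, 128 * 1024)
gives a view of the last 128 KiB; a byte written through the original is
immediately visible through the alias.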

> I'm not sure if isa-bios and pc.rom are needed from people on TD guest, so I
> just skip them for simplicity,

Given that TDX guests start in 32bit mode, not in real mode, everything
should work fine without isa-bios.

I'd prefer to avoid creating a special case for tdx though.  Should make
long-term maintenance a bit easier when this is not needed.

take care,
  Gerd



* Re: [RFC PATCH v4 22/36] i386/tdx: Track RAM entries for TDX VM
  2022-05-27  8:39         ` Xiaoyao Li
@ 2022-05-30 11:59           ` Gerd Hoffmann
  2022-05-31  2:09             ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-05-30 11:59 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Isaku Yamahata, Paolo Bonzini, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

  Hi,

> > tdx_add_ram_entry() increments tdx_guest->nr_ram_entries.  I think it's worth
> > for comments why this is safe regarding to this for-loop.
> 
> The for-loop is to find the valid existing RAM entry (from E820 table).
> It will update the RAM entry and increment tdx_guest->nr_ram_entries when
> the initial RAM entry needs to be split. However, once find, the for-loop is
> certainly stopped since it returns unconditionally.

Adding a comment saying so would be good.

Or move the code block doing the update out of the loop.  That will
likewise make clear that finding the entry which must be updated is
the only purpose of the loop.

take care,
  Gerd



* Re: [RFC PATCH v4 22/36] i386/tdx: Track RAM entries for TDX VM
  2022-05-30 11:59           ` Gerd Hoffmann
@ 2022-05-31  2:09             ` Xiaoyao Li
  0 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-05-31  2:09 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Isaku Yamahata, Paolo Bonzini, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/30/2022 7:59 PM, Gerd Hoffmann wrote:
>    Hi,
> 
>>> tdx_add_ram_entry() increments tdx_guest->nr_ram_entries.  I think it's worth
>>> for comments why this is safe regarding to this for-loop.
>>
>> The for-loop is to find the valid existing RAM entry (from E820 table).
>> It will update the RAM entry and increment tdx_guest->nr_ram_entries when
>> the initial RAM entry needs to be split. However, once find, the for-loop is
>> certainly stopped since it returns unconditionally.
> 
> Add a comment saying so would be good.
> 
> Or move the code block doing the update out of the loop.  That will
> likewise make clear that finding the entry which must be updated is
> the only purpose of the loop.

Good idea. I'll go this way.

> take care,
>    Gerd
> 



* Re: [RFC PATCH v4 11/36] i386/tdx: Initialize TDX before creating TD vcpus
  2022-05-24  6:57       ` Gerd Hoffmann
@ 2022-06-01  7:20         ` Xiaoyao Li
  2022-06-01  7:54           ` Gerd Hoffmann
  0 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-06-01  7:20 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/24/2022 2:57 PM, Gerd Hoffmann wrote:
>    Hi,
> 
>>> Hmm, hooking *vm* initialization into *vcpu* creation looks wrong to me.
>>
>> That's because for TDX, it has to do VM-scope (feature) initialization
>> before creating vcpu. This is new to KVM and QEMU, that every feature is
>> vcpu-scope and configured per-vcpu before.
>>
>> To minimize the change to QEMU, we want to utilize @cpu and @cpu->env to
>> grab the configuration info. That's why it goes this way.
>>
>> Do you have any better idea on it?
> 
> Maybe it's a bit more work to add VM-scope initialization support to
> qemu.  

Just introducing VM-scope initialization to QEMU would be easy.
What matters is what needs to be done inside VM-scope initialization.

For TDX, we need to settle the features that are configured for the TD.
Typically, the features are attributes of the cpu object, parsed from the
"-cpu" option and stored in the cpu object.

I cannot think of a clean solution for it, other than
1) duplicating the same attributes from the cpu object in the machine
object, or
2) creating a CPU object when initializing the machine object, collecting
all the info from "-cpu" and dropping it in the end; but then why not do it
when creating the 1st vcpu, like this patch does.

That's what I can think of. Let's see if anyone has a better idea.

> But I expect that approach will work better long-term.  You need
> this mutex and the 'initialized' variable in your code to make sure it
> runs only once because the way you hook it in is not ideal ...
> 
> [ disclaimer: I'm not that familiar with the kvm interface in qemu ]
> 
> take care,
>    Gerd
> 



* Re: [RFC PATCH v4 11/36] i386/tdx: Initialize TDX before creating TD vcpus
  2022-06-01  7:20         ` Xiaoyao Li
@ 2022-06-01  7:54           ` Gerd Hoffmann
  2022-06-02  1:01             ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-06-01  7:54 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Wed, Jun 01, 2022 at 03:20:46PM +0800, Xiaoyao Li wrote:
> On 5/24/2022 2:57 PM, Gerd Hoffmann wrote:
> >    Hi,
> > Maybe it's a bit more work to add VM-scope initialization support to
> > qemu.
> 
> If just introducing VM-scope initialization to QEMU, it would be easy. What
> matters is what needs to be done inside VM-scope initialization.
> 
> For TDX, we need to settle down the features that configured for the TD.
> Typically, the features are attributes of cpu object, parsed from "-cpu"
> option and stored in cpu object.

> 2) create a CPU object when initializing machine object and collect all the
> info from "-cpu" and drop it in the end; then why not do it when creating
> 1st vcpu like this patch.

Do VM-scope tdx initialization late enough that cpu objects are already
created at that point, so you can collect the info you need without a
dummy cpu?

I guess it could be helpful for the discussion if you can outline the
'big picture' for tdx initialization.  How does kvm accel setup look
like without TDX, and what additional actions are needed for TDX?  What
ordering requirements and other constraints exist?

take care,
  Gerd



* Re: [RFC PATCH v4 11/36] i386/tdx: Initialize TDX before creating TD vcpus
  2022-06-01  7:54           ` Gerd Hoffmann
@ 2022-06-02  1:01             ` Xiaoyao Li
  2022-06-07 11:16               ` Gerd Hoffmann
  0 siblings, 1 reply; 105+ messages in thread
From: Xiaoyao Li @ 2022-06-02  1:01 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 6/1/2022 3:54 PM, Gerd Hoffmann wrote:
> On Wed, Jun 01, 2022 at 03:20:46PM +0800, Xiaoyao Li wrote:
>> On 5/24/2022 2:57 PM, Gerd Hoffmann wrote:
>>>     Hi,
>>> Maybe it's a bit more work to add VM-scope initialization support to
>>> qemu.
>>
>> If just introducing VM-scope initialization to QEMU, it would be easy. What
>> matters is what needs to be done inside VM-scope initialization.
>>
>> For TDX, we need to settle down the features that configured for the TD.
>> Typically, the features are attributes of cpu object, parsed from "-cpu"
>> option and stored in cpu object.
> 
>> 2) create a CPU object when initializing machine object and collect all the
>> info from "-cpu" and drop it in the end; then why not do it when creating
>> 1st vcpu like this patch.
> 
> Do VM-scope tdx initialization late enough that cpu objects are already
> created at that point, so you can collect the info you need without a
> dummy cpu?

A new CPU object is created when each vcpu is created. So we have to use a
mutex and a flag to ensure VM-scope initialization is executed only once.

And it's weird to hook VM-scope initialization into the middle of the
vcpu creation phase to satisfy "late enough", so we chose to do it just
before calling the KVM API to initialize the vcpu.
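
A minimal sketch of the mutex-and-flag pattern described above. The names
are hypothetical (vm_scope_init() stands in for the real VM-scope setup,
pre_create_vcpu() for the per-vcpu hook), and the call counter exists only
so the behaviour can be checked; this is not the code from the patch.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t vm_init_lock = PTHREAD_MUTEX_INITIALIZER;
static bool vm_initialized;   /* set once VM-scope init has succeeded */
static int vm_init_calls;     /* illustration only: counts real inits */

/* Hypothetical placeholder for the VM-scope configuration step. */
static int vm_scope_init(void)
{
    vm_init_calls++;
    return 0;
}

/*
 * Called on the creation path of every vcpu.  Only the first caller
 * performs the VM-scope initialization; later callers see the flag and
 * return immediately.  On failure the flag stays clear, so the next
 * caller retries.
 */
int pre_create_vcpu(void)
{
    int ret = 0;

    pthread_mutex_lock(&vm_init_lock);
    if (!vm_initialized) {
        ret = vm_scope_init();
        if (ret == 0) {
            vm_initialized = true;
        }
    }
    pthread_mutex_unlock(&vm_init_lock);

    return ret;
}
```

The lock is needed because vcpus may be created from different threads;
the flag alone would leave a window where two of them both run the init.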

> I guess it could be helpful for the discussion when you can outine the
> 'big picture' for tdx initialization.  How does kvm accel setup look
> like without TDX, and what additional actions are needed for TDX?  What
> ordering requirements and other constrains exist?

To boot a TDX VM, it requires several changes/additional steps in the flow:

  1. specify the vm type KVM_X86_TDX_VM when creating VM with
     IOCTL(KVM_CREATE_VM);
	- When initializing KVM accel

  2. initialize VM scope configuration before creating any VCPU;

  3. initialize VCPU scope configuration;
	- done inside machine_init_done_notifier;

  4. initialize virtual firmware in guest private memory before vcpu
     running;
	- done inside machine_init_done_notifier;

  5. finalize the TD's measurement;
	- done inside machine_init_done_notifier;


And we are discussing where to do step 2).

We can find from the code of tdx_pre_create_vcpu(), that it needs
cpuid entries[] and attributes as input to KVM.

   cpuid entries[] is set up by kvm_x86_arch_cpuid() mainly based on
   'CPUX86State *env'

   attributes.pks is retrieved from env->features[]
   and attributes.pmu is retrieved from x86cpu->enable_pmu

To keep the VM-scope data consistent with the VCPU data, we choose a
point late enough that all the info/configuration from the VCPU has
settled, which is just before calling the KVM API to do VCPU-scope
configuration.

> take care,
>    Gerd
> 



* Re: [RFC PATCH v4 23/36] i386/tdx: Setup the TD HOB list
  2022-05-24  7:56   ` Gerd Hoffmann
@ 2022-06-02  9:27     ` Xiaoyao Li
  0 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-06-02  9:27 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc, Xu, Min M

On 5/24/2022 3:56 PM, Gerd Hoffmann wrote:
>    Hi,
> 
>> +static void tdvf_hob_add_mmio_resources(TdvfHob *hob)
>> +{
>> +    MachineState *ms = MACHINE(qdev_get_machine());
>> +    X86MachineState *x86ms = X86_MACHINE(ms);
>> +    PCIHostState *pci_host;
>> +    uint64_t start, end;
>> +    uint64_t mcfg_base, mcfg_size;
>> +    Object *host;
>> +
>> +    /* Effectively PCI hole + other MMIO devices. */
>> +    tdvf_hob_add_mmio_resource(hob, x86ms->below_4g_mem_size,
>> +                               APIC_DEFAULT_ADDRESS);
>> +
>> +    /* Stolen from acpi_get_i386_pci_host(), there's gotta be an easier way. */
>> +    pci_host = OBJECT_CHECK(PCIHostState,
>> +                            object_resolve_path("/machine/i440fx", NULL),
>> +                            TYPE_PCI_HOST_BRIDGE);
>> +    if (!pci_host) {
>> +        pci_host = OBJECT_CHECK(PCIHostState,
>> +                                object_resolve_path("/machine/q35", NULL),
>> +                                TYPE_PCI_HOST_BRIDGE);
>> +    }
>> +    g_assert(pci_host);
>> +
>> +    host = OBJECT(pci_host);
>> +
>> +    /* PCI hole above 4gb. */
>> +    start = object_property_get_uint(host, PCI_HOST_PROP_PCI_HOLE64_START,
>> +                                     NULL);
>> +    end = object_property_get_uint(host, PCI_HOST_PROP_PCI_HOLE64_END, NULL);
>> +    tdvf_hob_add_mmio_resource(hob, start, end);
>> +
>> +    /* MMCFG region */
>> +    mcfg_base = object_property_get_uint(host, PCIE_HOST_MCFG_BASE, NULL);
>> +    mcfg_size = object_property_get_uint(host, PCIE_HOST_MCFG_SIZE, NULL);
>> +    if (mcfg_base && mcfg_base != PCIE_BASE_ADDR_UNMAPPED && mcfg_size) {
>> +        tdvf_hob_add_mmio_resource(hob, mcfg_base, mcfg_base + mcfg_size);
>> +    }
>> +}
> 
> That looks suspicious.  I think you need none of this, except for the
> first tdvf_hob_add_mmio_resource() call which adds the below-4G hole.

The below-4G hole, it seems, can be removed as well, since I notice that
OVMF prepares that mmio hob for the TD itself. Is that correct?

> It is the firmware which places the mmio resources into the address
> space by programming the pci config space of the devices.  qemu doesn't
> dictate any of this, and I doubt you get any useful values here.  The
> core runs before the firmware had the chance to do any setup here ...
> 
>> new file mode 100644
>> index 000000000000..b15aba796156
>> --- /dev/null
>> +++ b/hw/i386/uefi.h
> 
> Separate patch please.
> 
> Also this should probably go somewhere below
> include/standard-headers/

I will do it in the next post.

> take care,
>    Gerd
> 



* Re: [RFC PATCH v4 11/36] i386/tdx: Initialize TDX before creating TD vcpus
  2022-06-02  1:01             ` Xiaoyao Li
@ 2022-06-07 11:16               ` Gerd Hoffmann
  2022-06-08  1:50                 ` Xiaoyao Li
  0 siblings, 1 reply; 105+ messages in thread
From: Gerd Hoffmann @ 2022-06-07 11:16 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

  Hi,

> > I guess it could be helpful for the discussion when you can outine the
> > 'big picture' for tdx initialization.  How does kvm accel setup look
> > like without TDX, and what additional actions are needed for TDX?  What
> > ordering requirements and other constrains exist?
> 
> To boot a TDX VM, it requires several changes/additional steps in the flow:
> 
>  1. specify the vm type KVM_X86_TDX_VM when creating VM with
>     IOCTL(KVM_CREATE_VM);
> 	- When initializing KVM accel
> 
>  2. initialize VM scope configuration before creating any VCPU;
> 
>  3. initialize VCPU scope configuration;
> 	- done inside machine_init_done_notifier;
> 
>  4. initialize virtual firmware in guest private memory before vcpu running;
> 	- done inside machine_init_done_notifier;
> 
>  5. finalize the TD's measurement;
> 	- done inside machine init_done_notifier;
> 
> 
> And we are discussing where to do step 2).
> 
> We can find from the code of tdx_pre_create_vcpu(), that it needs
> cpuid entries[] and attributes as input to KVM.
> 
>   cpuid entries[] is set up by kvm_x86_arch_cpuid() mainly based on
>   'CPUX86State *env'
> 
>   attributes.pks is retrieved from env->features[]
>   and attributes.pmu is retrieved from x86cpu->enable_pmu
> 
> To make sure VM-scope data is consistent with VCPU data, we choose a point
> late enough that all the info/configuration from the VCPU has settled down,
> that is, just before calling the KVM API to do VCPU-scope configuration.

So essentially tdx defines (some) vcpu properties at vm scope?  Given
that all vcpus are typically identical (and maybe tdx even enforces this)
this makes sense.

A comment in the source code explaining this would be good.

thanks,
  Gerd


* Re: [RFC PATCH v4 11/36] i386/tdx: Initialize TDX before creating TD vcpus
  2022-06-07 11:16               ` Gerd Hoffmann
@ 2022-06-08  1:50                 ` Xiaoyao Li
  0 siblings, 0 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-06-08  1:50 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 6/7/2022 7:16 PM, Gerd Hoffmann wrote:
>    Hi,
> 
>>> I guess it could be helpful for the discussion when you can outline the
>>> 'big picture' for tdx initialization.  How does kvm accel setup look
>>> like without TDX, and what additional actions are needed for TDX?  What
>>> ordering requirements and other constraints exist?
>>
>> To boot a TDX VM, it requires several changes/additional steps in the flow:
>>
>>   1. specify the vm type KVM_X86_TDX_VM when creating VM with
>>      IOCTL(KVM_CREATE_VM);
>> 	- When initializing KVM accel
>>
>>   2. initialize VM scope configuration before creating any VCPU;
>>
>>   3. initialize VCPU scope configuration;
>> 	- done inside machine_init_done_notifier;
>>
>>   4. initialize virtual firmware in guest private memory before vcpu running;
>> 	- done inside machine_init_done_notifier;
>>
>>   5. finalize the TD's measurement;
>> 	- done inside machine_init_done_notifier;
>>
>>
>> And we are discussing where to do step 2).
>>
>> We can find from the code of tdx_pre_create_vcpu(), that it needs
>> cpuid entries[] and attributes as input to KVM.
>>
>>    cpuid entries[] is set up by kvm_x86_arch_cpuid() mainly based on
>>    'CPUX86State *env'
>>
>>    attributes.pks is retrieved from env->features[]
>>    and attributes.pmu is retrieved from x86cpu->enable_pmu
>>
>> To make sure VM-scope data is consistent with VCPU data, we choose a point
>> late enough that all the info/configuration from the VCPU has settled down,
>> that is, just before calling the KVM API to do VCPU-scope configuration.
> 
> So essentially tdx defines (some) vcpu properties at vm scope?  

Not TDX, but QEMU. Most of the CPU features are configured by the "-cpu"
option, not the "-machine" option.
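
The ordering constraint in this thread can be pictured with a small,
self-contained model. This is only a sketch under stated assumptions: the
enum, struct and function names below are hypothetical and do not exist in
QEMU or KVM. It shows the five steps being forced into sequence, and step 2
(VM-scope configuration) being refused until the "-cpu" derived vCPU
configuration is final.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the TD bring-up order listed in this thread.
 * None of these names exist in QEMU or KVM; they only label the steps.
 */
enum td_stage {
    TD_NONE = 0,       /* nothing done yet */
    TD_VM_CREATED,     /* step 1: KVM_CREATE_VM with the TDX VM type */
    TD_VM_SCOPE_INIT,  /* step 2: VM-scope configuration */
    TD_VCPU_INIT,      /* step 3: VCPU-scope configuration */
    TD_FW_LOADED,      /* step 4: firmware placed in private memory */
    TD_FINALIZED,      /* step 5: measurement finalized */
};

struct td_model {
    enum td_stage stage;
    bool vcpu_config_settled; /* "-cpu" derived CPUID/attributes are final */
};

/* Move to 'next' only if it directly follows the current stage. */
static bool td_advance(struct td_model *td, enum td_stage next)
{
    assert(td != NULL);

    if ((int)next != (int)td->stage + 1) {
        return false;
    }
    /* Step 2 consumes vCPU-derived data, so that data must be final. */
    if (next == TD_VM_SCOPE_INIT && !td->vcpu_config_settled) {
        return false;
    }
    td->stage = next;
    return true;
}
```

In this model, calling td_advance() for step 3 before step 2, or for step 2
while vcpu_config_settled is still false, returns false and leaves the
stage unchanged.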

> Given
> that all vcpus typically identical (and maybe tdx even enforces this)
> this makes sense.
> 
> A comment in the source code explaining this would be good.
> 
> thanks,
>    Gerd
> 


* Re: [RFC PATCH v4 18/36] i386/tdx: Skip BIOS shadowing setup
  2022-05-30 11:49       ` Gerd Hoffmann
@ 2022-07-29  7:14         ` Xiaoyao Li
  2022-08-16  7:13           ` Gerd Hoffmann
  2022-08-16  7:16           ` Gerd Hoffmann
  0 siblings, 2 replies; 105+ messages in thread
From: Xiaoyao Li @ 2022-07-29  7:14 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On 5/30/2022 7:49 PM, Gerd Hoffmann wrote:
> On Thu, May 26, 2022 at 10:48:56AM +0800, Xiaoyao Li wrote:
>> On 5/24/2022 3:08 PM, Gerd Hoffmann wrote:
>>> On Thu, May 12, 2022 at 11:17:45AM +0800, Xiaoyao Li wrote:
>>>> TDX guest cannot go to real mode, so just skip the setup of isa-bios.
>>>
>>> Does isa-bios setup cause any actual problems?
>>> (same question for patch #19).
>>
>> It causes mem_region split and mem_slot deletion on KVM.
>>
>> TDVF marks pages starting from 0x800000 as TEMP_MEM and TD_HOB, which are
>> TD's private memory and are TDH_MEM_PAGE_ADD'ed to TD via
>> KVM_TDX_INIT_MEM_REGION
>>
>> However, if isa-bios and pc.rom are not skipped, the memory_region
>> initialization of them is after KVM_TDX_INIT_MEM_REGION in
>> tdx_machine_done_notify(). (I didn't figure out why this order though)
>>
>> And then it causes a memory region split that splits
>> 	[0, ram_below_4g)
>> to
>> 	[0, 0xc0 000),
>> 	[0xc0 000, 0xe0 000),
>> 	[0xe0 000, 0x100 000),
>> 	[0x100 000, ram_below_4g)
>>
>> which causes mem_slot deletion on KVM. On the KVM side, we lose the page
>> content on mem_slot deletion.  Thus, we lose the content of the TD HOB.
> 
> Hmm, removing and re-creating memory slots shouldn't cause page content
> to go away.   I'm wondering what the *real* problem is?  Maybe you lose
> tdx-specific state, i.e. this removes TDH_MEM_PAGE_ADD changes?
> 
>> Yes, the better solution seems to be ensure KVM_TDX_INIT_MEM_REGION is
>> called after all the mem region is settled down.
> 
> Yes, especially if tdx can't tolerate memory slots coming and going.

Actually, only the private memory that TDVF assumes to be already accepted
via SEAMCALL(TDH.MEM.PAGE.ADD) cannot tolerate being removed. TDVF assumes
that memory has initialized content and can be accessed directly. In other
words, QEMU always needs to call SEAMCALL(TDH.MEM.PAGE.ADD) to "add" that
memory before TDVF runs.
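
To make the split quoted above concrete, here is a minimal sketch of
cutting one half-open range at the isa-bios/pc.rom boundaries. The struct
and helper names are hypothetical, not QEMU or KVM code. It only shows
which resulting range ends up holding the TDVF pages at 0x800000, that is,
the slot whose re-creation would drop the TDH.MEM.PAGE.ADD state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open guest-physical range [start, end); hypothetical helper type. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Split 'whole' at the given cut points, which must be sorted ascending.
 * Cuts outside the range are ignored.  Returns the number of pieces
 * written to 'out' (at most 'out_cap').
 */
static size_t split_range(struct range whole, const uint64_t *cuts,
                          size_t ncuts, struct range *out, size_t out_cap)
{
    size_t n = 0;
    uint64_t cur = whole.start;

    assert(cuts != NULL && out != NULL);

    for (size_t i = 0; i < ncuts; i++) {
        if (cuts[i] <= cur || cuts[i] >= whole.end) {
            continue; /* cut does not fall strictly inside what is left */
        }
        if (n == out_cap) {
            return n;
        }
        out[n++] = (struct range){ cur, cuts[i] };
        cur = cuts[i];
    }
    if (n < out_cap) {
        out[n++] = (struct range){ cur, whole.end };
    }
    return n;
}

/* True if 'addr' lies inside the half-open range 'r'. */
static bool range_contains(struct range r, uint64_t addr)
{
    return addr >= r.start && addr < r.end;
}
```

With cuts at 0xc0000, 0xe0000 and 0x100000 and any ram_below_4g above
0x800000, the result is four ranges, and the last one, starting at
0x100000, contains 0x800000.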

>> But I haven't figured out the reason why the isa-bios and pc.rom
>> initialization happens after machine_init_done_notifier
> 
> Probably happens when a flatview is created from the address space.
> 
> Maybe that is delayed somehow for machine creation, so all the address
> space updates caused by device creation don't lead to lots of flatviews
> being created and thrown away.

Sorry for the late response.

I did some tracing for this, and the result differs between the q35
machine type and the pc machine type.

- For q35, the memslot update for isa-bios/pc.rom happens in
mc->reset(), which is triggered via

   qdev_machine_creation_done()
     -> qemu_system_reset(SHUTDOWN_CAUSE_NONE);

That is certainly later than TDX's machine_init_done_notify callback,
which initializes part of the private memory via KVM_TDX_INIT_MEM_REGION.

- For the pc machine type, the memslot update happens in i440fx_init(),
which is earlier than TDX's machine_init_done_notify callback.

I don't yet fully understand under what conditions QEMU carries out the
memslot update. I will keep learning and try to come up with a solution
to ensure TDX's machine_init_done_notify callback is executed after all
the memslots have settled down.

>> on the other hand, to keep isa-bios and pc.rom, we need additional work to
>> copy the content from the end_of_4G to end_of_1M.
> 
> There is no need for copying, end_of_1M is a alias memory region for
> end_of_4G, so the backing storage is the same.

That is the reason the current alias approach cannot work for TDX: in
TDX, a private page can be mapped to only one GPA. So for simplicity,
I will just skip isa-bios shadowing for TDX instead of implementing a
non-alias + memcpy approach.
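
The one-GPA restriction can be modelled in a few lines. This is a sketch
only: the struct, macro and function below are hypothetical and are not
the TDX module or KVM interface. A private page remembers the single GPA
it is mapped at, and a second mapping at a different GPA, which is what an
alias region would need, is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Marker for "not mapped anywhere yet"; hypothetical. */
#define GPA_UNMAPPED UINT64_MAX

/* One private page and the single GPA it may be mapped at. */
struct private_page {
    uint64_t gpa;
};

/*
 * Map the page at 'gpa'.  Re-mapping at the same GPA is a no-op; mapping
 * at a second, different GPA fails, which is the aliasing case.
 */
static bool private_page_map(struct private_page *page, uint64_t gpa)
{
    assert(page != NULL);

    if (page->gpa != GPA_UNMAPPED && page->gpa != gpa) {
        return false;
    }
    page->gpa = gpa;
    return true;
}
```

For example, a page first mapped just below 4G cannot then also be mapped
at 0xe0000 below 1M in this model. The addresses are illustrative only.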

For pc.rom in the next patch, I don't have a strong reason to skip it.
But I will keep that patch in the next version so that the whole TDX
series works for the q35 machine type, until I come up with a good
solution to ensure the memslot update happens before TDX's
machine_init_done_notify callback.

>> I'm not sure if isa-bios and pc.rom are needed from people on TD guest, so I
>> just skip them for simplicity,
> 
> Given that TDX guests start in 32bit mode not in real mode everything
> should work fine without isa-bios.
> 
> I'd prefer to avoid creating a special case for tdx though.  Should make
> long-term maintenance a bit easier when this is not needed.
> 
> take care,
>    Gerd
> 


* Re: [RFC PATCH v4 18/36] i386/tdx: Skip BIOS shadowing setup
  2022-07-29  7:14         ` Xiaoyao Li
@ 2022-08-16  7:13           ` Gerd Hoffmann
  2022-08-16  7:16           ` Gerd Hoffmann
  1 sibling, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-08-16  7:13 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

On Fri, Jul 29, 2022 at 03:14:02PM +0800, Xiaoyao Li wrote:
> On 5/30/2022 7:49 PM, Gerd Hoffmann wrote:
> > On Thu, May 26, 2022 at 10:48:56AM +0800, Xiaoyao Li wrote:
> > > On 5/24/2022 3:08 PM, Gerd Hoffmann wrote:
> > > > On Thu, May 12, 2022 at 11:17:45AM +0800, Xiaoyao Li wrote:
> > > > > TDX guest cannot go to real mode, so just skip the setup of isa-bios.
> > > > 
> > > > Does isa-bios setup cause any actual problems?
> > > > (same question for patch #19).

> > There is no need for copying, end_of_1M is a alias memory region for
> > end_of_4G, so the backing storage is the same.
> 
> It is a reason that current alias approach cannot work for TDX. Because in
> TDX a private page can be only mapped to one gpa.

Ok, so memory aliasing not being supported by TDX is the underlying
reason.

> So for simplicity, I will
> just skip isa-bios shadowing for TDX instead of implementing a non-alias +
> memcpy approach.

Makes sense given that tdx wouldn't use the mapping below 1M anyway.
A comment explaining the tdx aliasing restriction would be good to make
clear why the special case for tdx exists.

take care,
  Gerd


* Re: [RFC PATCH v4 18/36] i386/tdx: Skip BIOS shadowing setup
  2022-07-29  7:14         ` Xiaoyao Li
  2022-08-16  7:13           ` Gerd Hoffmann
@ 2022-08-16  7:16           ` Gerd Hoffmann
  1 sibling, 0 replies; 105+ messages in thread
From: Gerd Hoffmann @ 2022-08-16  7:16 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Isaku Yamahata, isaku.yamahata,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Richard Henderson, Michael S . Tsirkin, Marcel Apfelbaum,
	Cornelia Huck, Marcelo Tosatti, Laszlo Ersek, Eric Blake,
	Connor Kuehl, erdemaktas, kvm, qemu-devel, seanjc

  Hi,

> I did some tracing for this, and the result differs for q35 machine type and
> pc machine type.
> 
> - For q35, the memslot update for isa-bios/pc.rom happens when mc->reset()
> that is triggered via
> 
>   qdev_machine_creation_done()
>     -> qemu_system_reset(SHUTDOWN_CAUSE_NONE);
> 
> It's surely later than TDX's machine_init_done_notify callback which
> initializes the part of private memory via KVM_TDX_INIT_MEM_REGION
> 
> - For pc machine type, the memslot update happens in i440fx_init(), which is
> earlier than TDX's machine_init_done_notify callback
> 
> I haven't fully understand in what condition will QEMU carry out the memslot
> update yet. I will keep learning and try to come up a solution to ensure
> TDX's machine_init_done_notify callback executed after all the memslot
> settle down.

My guess would be that the rom shadowing initialization is slightly
different in 'pc' and 'q35'.

take care,
  Gerd


Thread overview: 105+ messages
2022-05-12  3:17 [RFC PATCH v4 00/36] TDX QEMU support Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 01/36] *** HACK *** linux-headers: Update headers to pull in TDX API changes Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 02/36] i386: Introduce tdx-guest object Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 03/36] target/i386: Implement mc->kvm_type() to get VM type Xiaoyao Li
2022-05-23  8:36   ` Gerd Hoffmann
2022-05-23 14:55     ` Isaku Yamahata
2022-05-12  3:17 ` [RFC PATCH v4 04/36] target/i386: Introduce kvm_confidential_guest_init() Xiaoyao Li
2022-05-23  8:37   ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 05/36] i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context Xiaoyao Li
2022-05-23  8:38   ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 06/36] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES Xiaoyao Li
2022-05-12 17:38   ` Isaku Yamahata
2022-05-23  8:45   ` Gerd Hoffmann
2022-05-23 15:30     ` Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 07/36] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object Xiaoyao Li
2022-05-23  8:48   ` Gerd Hoffmann
2022-05-23 14:59     ` Isaku Yamahata
2022-05-24  6:42       ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 08/36] i386/tdx: Adjust get_supported_cpuid() for TDX VM Xiaoyao Li
2022-05-23  9:01   ` Gerd Hoffmann
2022-05-23 15:37     ` Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 09/36] KVM: Introduce kvm_arch_pre_create_vcpu() Xiaoyao Li
2022-05-12 17:50   ` Isaku Yamahata
2022-05-13  0:15     ` Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 10/36] i386/kvm: Move architectural CPUID leaf generation to separate helper Xiaoyao Li
2022-05-12 17:48   ` Isaku Yamahata
2022-05-13  0:37     ` Xiaoyao Li
2022-05-23  9:06     ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 11/36] i386/tdx: Initialize TDX before creating TD vcpus Xiaoyao Li
2022-05-23  9:20   ` Gerd Hoffmann
2022-05-23 15:42     ` Xiaoyao Li
2022-05-24  6:57       ` Gerd Hoffmann
2022-06-01  7:20         ` Xiaoyao Li
2022-06-01  7:54           ` Gerd Hoffmann
2022-06-02  1:01             ` Xiaoyao Li
2022-06-07 11:16               ` Gerd Hoffmann
2022-06-08  1:50                 ` Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 12/36] i386/tdx: Wire CPU features up with attributes of TD guest Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 13/36] i386/tdx: Validate TD attributes Xiaoyao Li
2022-05-23  9:39   ` Gerd Hoffmann
2022-05-24  4:19     ` Xiaoyao Li
2022-05-24  6:59       ` Gerd Hoffmann
2022-05-24  8:11         ` Xiaoyao Li
2022-05-24  8:29           ` Gerd Hoffmann
2022-05-26  3:44             ` Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 14/36] i386/tdx: Implement user specified tsc frequency Xiaoyao Li
2022-05-12 18:04   ` Isaku Yamahata
2022-05-13  0:46     ` Xiaoyao Li
2022-05-23  9:43   ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 15/36] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM Xiaoyao Li
2022-05-23  9:45   ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 16/36] i386/tdvf: Introduce function to parse TDVF metadata Xiaoyao Li
2022-05-24  7:02   ` Gerd Hoffmann
2022-05-26  2:25     ` Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 17/36] i386/tdx: Parse TDVF metadata for TDX VM Xiaoyao Li
2022-05-24  7:03   ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 18/36] i386/tdx: Skip BIOS shadowing setup Xiaoyao Li
2022-05-24  7:08   ` Gerd Hoffmann
2022-05-26  2:48     ` Xiaoyao Li
2022-05-30 11:49       ` Gerd Hoffmann
2022-07-29  7:14         ` Xiaoyao Li
2022-08-16  7:13           ` Gerd Hoffmann
2022-08-16  7:16           ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 19/36] i386/tdx: Don't initialize pc.rom for TDX VMs Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 20/36] i386/tdx: Register a machine_init_done callback for TD Xiaoyao Li
2022-05-24  7:09   ` Gerd Hoffmann
2022-05-26  2:52     ` Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 21/36] i386/tdx: Track mem_ptr for each firmware entry of TDVF Xiaoyao Li
2022-05-24  7:11   ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 22/36] i386/tdx: Track RAM entries for TDX VM Xiaoyao Li
2022-05-24  7:37   ` Gerd Hoffmann
2022-05-26  7:33     ` Xiaoyao Li
2022-05-26 18:48       ` Isaku Yamahata
2022-05-27  8:39         ` Xiaoyao Li
2022-05-30 11:59           ` Gerd Hoffmann
2022-05-31  2:09             ` Xiaoyao Li
2022-05-27  8:36       ` Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 23/36] i386/tdx: Setup the TD HOB list Xiaoyao Li
2022-05-12 18:33   ` Isaku Yamahata
2022-05-24  7:56   ` Gerd Hoffmann
2022-06-02  9:27     ` Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 24/36] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION Xiaoyao Li
2022-05-12 18:34   ` Isaku Yamahata
2022-05-13  0:46     ` Xiaoyao Li
2022-05-24  7:57   ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 25/36] i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu Xiaoyao Li
2022-05-24  7:59   ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 26/36] i386/tdx: Finalize TDX VM Xiaoyao Li
2022-05-24  7:59   ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 27/36] i386/tdx: Disable SMM for TDX VMs Xiaoyao Li
2022-05-24  8:00   ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 28/36] i386/tdx: Disable PIC " Xiaoyao Li
2022-05-24  8:00   ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 29/36] i386/tdx: Don't allow system reset " Xiaoyao Li
2022-05-24  8:01   ` Gerd Hoffmann
2022-05-12  3:17 ` [RFC PATCH v4 30/36] hw/i386: add eoi_intercept_unsupported member to X86MachineState Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 31/36] hw/i386: add option to forcibly report edge trigger in acpi tables Xiaoyao Li
2022-05-12  3:17 ` [RFC PATCH v4 32/36] i386/tdx: Don't synchronize guest tsc for TDs Xiaoyao Li
2022-05-24  8:04   ` Gerd Hoffmann
2022-05-12  3:18 ` [RFC PATCH v4 33/36] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() " Xiaoyao Li
2022-05-24  8:05   ` Gerd Hoffmann
2022-05-12  3:18 ` [RFC PATCH v4 34/36] i386/tdx: Skip kvm_put_apicbase() " Xiaoyao Li
2022-05-12  3:18 ` [RFC PATCH v4 35/36] i386/tdx: Don't get/put guest state for TDX VMs Xiaoyao Li
2022-05-12  3:18 ` [RFC PATCH v4 36/36] docs: Add TDX documentation Xiaoyao Li
2022-05-12 18:42   ` Isaku Yamahata
