* [RFC PATCH v2 00/44] TDX support
From: isaku.yamahata @ 2021-07-08  0:54 UTC
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

This patch series enables TDX support.  It needs the corresponding KVM patch
series for TDX [1].  That series [1] requires more patches to be functional, so
this patch series is RFC.  For those who want to try it, a GitHub repo is
available at [2].

The patch series is organized as follows:
 1- 5 code refactoring and simple hooks that will be used later
 6- 9 introduce the kvm type and the tdx type; disallow unusable operations
10-15 wire up the TDX KVM ioctls needed to initialize a TD guest
16-24 load TDVF and set up the information TDVF needs
25-26 prohibit unsupported operations related to SMM
28-29 force x2apic and disable PIC
30-31 allow the user to specify sha384 values for the TD guest
32-33 add QMP commands to query KVM capability and TD info
34    set the reboot action to shutdown
35-43 suppress level-trigger/SMI/INIT/SIPI
44    suppress S3/S4

TODO:
- gdb support
- sanity check of CPUID

Changes from v1:
- suppress level trigger/SMI/INIT/SIPI related to IOAPIC.
- add VM attribute sha384 to TD measurement.
- guest TSC Hz specification.

Links:
[1] KVM TDX patch series v2
    https://patchwork.kernel.org/project/kvm/list/?series=510271
[2] Intel public GitHub
    KVM TDX branch: https://github.com/intel/tdx/tree/kvm
    TDX guest branch: https://github.com/intel/tdx/tree/guest
    QEMU TDX: https://github.com/intel/qemu-tdx
[3] TDVF
    https://github.com/tianocore/edk2-staging/tree/TDVF
[4] TDX specs
    https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html

Chenyi Qiang (1):
  qmp: add query-tdx-capabilities query-tdx command

Isaku Yamahata (29):
  kvm: Switch KVM_CAP_READONLY_MEM to a per-VM ioctl()
  vl: Introduce machine_init_done_late notifier
  i386/kvm: Skip KVM_X86_SETUP_MCE for TDX guests
  target/i386: kvm: don't synchronize guest tsc for TD guest
  i386/tdx: Frame in the call for KVM_TDX_INIT_VCPU
  hw/i386: Add definitions from UEFI spec for volumes, resources, etc...
  i386/tdx: Add definitions for TDVF metadata
  hw/i386: refactor e820_add_entry()
  hw/i386/e820: introduce a helper function to change type of e820
  i386/tdx: Parse tdx metadata and store the result into TdxGuestState
  i386/tdx: Create the TD HOB list upon machine init done
  i386/tdx: Add TDVF memory via INIT_MEM_REGION
  i386/tdx: Use KVM_TDX_INIT_VCPU to pass HOB to TDVF
  pci-host/q35: Move PAM initialization above SMRAM initialization
  q35: Introduce smm_ranges property for q35-pci-host
  qom: implement property helper for sha384
  target/i386/tdx: Allows mrconfigid/mrowner/mrownerconfig for
    TDX_INIT_VM
  tdx: add kvm_tdx_enabled() accessor for later use
  target/i386/tdx: set reboot action to shutdown when tdx
  ioapic: add property to disable level interrupt
  hw/i386: add eoi_intercept_unsupported member to X86MachineState
  hw/i386: add option to forcibly report edge trigger in acpi tables
  hw/i386: plug eoi_intercept_unsupported to ioapic
  ioapic: add property to disallow SMI delivery mode
  hw/i386: add a flag to disallow SMI
  ioapic: add property to disallow INIT/SIPI delivery mode
  hw/i386: add a flag to disable init/sipi delivery mode of interrupt
  i386/tdx: disallow level interrupt and SMI/INIT/SIPI delivery mode
  i386/tdx: disable S3/S4 unconditionally

Sean Christopherson (9):
  target/i386: Expose x86_cpu_get_supported_feature_word() for TDX
  i386/kvm: Move architectural CPUID leaf generation to separate helper
  i386/kvm: Squash getting/putting guest state for TDX VMs
  i386/tdx: Frame in tdx_get_supported_cpuid with KVM_TDX_CAPABILITIES
  i386/tdx: Add hook to require generic device loader
  i386/tdx: Add MMIO HOB entries
  q35: Move PCIe BAR check above PAM check in mch_write_config()
  i386/tdx: Force x2apic mode and routing for TDs
  target/i386: Add machine option to disable PIC/8259

Xiaoyao Li (5):
  linux-headers: Update headers to pull in TDX API changes
  hw/i386: Introduce kvm-type for TDX guest
  hw/i386: Initialize TDX via KVM ioctl() when kvm_type is TDX
  i386/tdx: Implement user specified tsc frequency
  target/i386/tdx: Finalize the TD's measurement when machine is done

 accel/kvm/kvm-all.c                      |   4 +-
 default-configs/devices/i386-softmmu.mak |   1 +
 hw/core/generic-loader.c                 |   5 +
 hw/core/machine.c                        |  26 ++
 hw/core/meson.build                      |   3 +
 hw/core/tdvf-stub.c                      |   6 +
 hw/i386/Kconfig                          |   5 +
 hw/i386/acpi-build.c                     | 103 +++--
 hw/i386/acpi-common.c                    |  74 +++-
 hw/i386/e820_memory_layout.c             | 114 +++++-
 hw/i386/e820_memory_layout.h             |   1 +
 hw/i386/meson.build                      |   1 +
 hw/i386/microvm.c                        |   7 +-
 hw/i386/pc.c                             |  18 +
 hw/i386/pc_piix.c                        |   7 +-
 hw/i386/pc_q35.c                         |   9 +-
 hw/i386/pc_sysfw.c                       |   6 +
 hw/i386/tdvf-hob.c                       | 235 +++++++++++
 hw/i386/tdvf-hob.h                       |  25 ++
 hw/i386/tdvf.c                           | 312 ++++++++++++++
 hw/i386/uefi.h                           | 496 +++++++++++++++++++++++
 hw/i386/x86.c                            |  72 +++-
 hw/intc/apic_common.c                    |  12 +
 hw/intc/ioapic.c                         |  57 +++
 hw/intc/ioapic_common.c                  |  68 ++++
 hw/pci-host/q35.c                        |  67 +--
 include/hw/i386/apic.h                   |   1 +
 include/hw/i386/apic_internal.h          |   1 +
 include/hw/i386/ioapic_internal.h        |   3 +
 include/hw/i386/pc.h                     |   3 +
 include/hw/i386/tdvf.h                   |  55 +++
 include/hw/i386/x86.h                    |  14 +-
 include/hw/pci-host/q35.h                |   1 +
 include/qom/object.h                     |  17 +
 include/sysemu/sysemu.h                  |   2 +
 include/sysemu/tdvf.h                    |   6 +
 include/sysemu/tdx.h                     |  22 +
 linux-headers/asm-x86/kvm.h              |  60 +++
 linux-headers/linux/kvm.h                |   2 +
 qapi/misc-target.json                    |  59 +++
 qapi/qom.json                            |  23 ++
 qom/object.c                             |  76 ++++
 target/i386/cpu.c                        |   4 +-
 target/i386/cpu.h                        |   3 +
 target/i386/kvm/kvm-stub.c               |   5 +
 target/i386/kvm/kvm.c                    | 255 +++++++-----
 target/i386/kvm/kvm_i386.h               |   5 +
 target/i386/kvm/meson.build              |   1 +
 target/i386/kvm/tdx-stub.c               |  33 ++
 target/i386/kvm/tdx.c                    | 417 +++++++++++++++++++
 target/i386/kvm/tdx.h                    |  58 +++
 target/i386/monitor.c                    |  23 ++
 52 files changed, 2685 insertions(+), 198 deletions(-)
 create mode 100644 hw/core/tdvf-stub.c
 create mode 100644 hw/i386/tdvf-hob.c
 create mode 100644 hw/i386/tdvf-hob.h
 create mode 100644 hw/i386/tdvf.c
 create mode 100644 hw/i386/uefi.h
 create mode 100644 include/hw/i386/tdvf.h
 create mode 100644 include/sysemu/tdvf.h
 create mode 100644 include/sysemu/tdx.h
 create mode 100644 target/i386/kvm/tdx-stub.c
 create mode 100644 target/i386/kvm/tdx.c
 create mode 100644 target/i386/kvm/tdx.h

-- 
2.25.1


* [RFC PATCH v2 01/44] target/i386: Expose x86_cpu_get_supported_feature_word() for TDX
From: isaku.yamahata @ 2021-07-08  0:54 UTC
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Expose x86_cpu_get_supported_feature_word() outside of cpu.c so that it
can be used by TDX to set up the VM-wide CPUID configuration.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 target/i386/cpu.c | 4 ++--
 target/i386/cpu.h | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index d8f3ab3192..45b81a63df 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4894,8 +4894,8 @@ CpuDefinitionInfoList *qmp_query_cpu_definitions(Error **errp)
     return cpu_list;
 }
 
-static uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
-                                                   bool migratable_only)
+uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
+                                            bool migratable_only)
 {
     FeatureWordInfo *wi = &feature_word_info[w];
     uint64_t r = 0;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f7fa5870b1..ff8f9532b9 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1838,6 +1838,9 @@ void cpu_clear_ignne(void);
 /* mpx_helper.c */
 void cpu_sync_bndcs_hflags(CPUX86State *env);
 
+uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
+                                            bool migratable_only);
+
 /* this function must always be used to load data in the segment
    cache: it synchronizes the hflags with the segment cache values */
 static inline void cpu_x86_load_seg_cache(CPUX86State *env,
-- 
2.25.1


* [RFC PATCH v2 02/44] kvm: Switch KVM_CAP_READONLY_MEM to a per-VM ioctl()
From: isaku.yamahata @ 2021-07-08  0:54 UTC
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Switch to making a VM ioctl() call for KVM_CAP_READONLY_MEM, which may
be conditional on VM type in recent versions of KVM, e.g. when TDX is
supported.

kvm_vm_check_extension() first tries the VM ioctl via kvm_vm_ioctl() and
falls back to the system ioctl via kvm_check_extension() if that fails,
which keeps compatibility with old kernels.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
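Note for reviewers: the fallback described in the commit message can be
sketched roughly as below.  This is a simplified, standalone illustration and
not the QEMU code: the cap_check_fn callbacks, EXAMPLE_CAP, the stub functions
and check_extension_with_fallback() are hypothetical stand-ins for the
VM-scoped and system-scoped KVM_CHECK_EXTENSION queries, and the sketch assumes
that the VM-scoped query returns a negative value on kernels that do not
implement it.

```c
#include <assert.h>
#include <errno.h>

#define EXAMPLE_CAP 1u   /* arbitrary illustrative capability number */

/* Hypothetical stand-in for a KVM_CHECK_EXTENSION query on some fd. */
typedef int (*cap_check_fn)(unsigned int cap);

/*
 * Try the VM-scoped query first.  If it fails (negative return), fall
 * back to the system-scoped query.  A non-negative VM-scoped answer,
 * including 0 ("not supported for this VM"), is returned as-is.
 */
static int check_extension_with_fallback(cap_check_fn vm_check,
                                         cap_check_fn sys_check,
                                         unsigned int cap)
{
    int ret = vm_check(cap);

    if (ret < 0) {
        /* VM-wide query not implemented: use the global one instead. */
        ret = sys_check(cap);
    }
    return ret;
}

/* Stubs modelling an old kernel (no VM-scoped query) and a newer one. */
static int old_kernel_vm_check(unsigned int cap) { (void)cap; return -EINVAL; }
static int new_kernel_vm_check(unsigned int cap) { (void)cap; return 0; }
static int sys_check_supported(unsigned int cap) { (void)cap; return 1; }
```

With the old-kernel stub the system-scoped answer is used; with the newer stub
the per-VM answer wins even though the system-scoped query would report
support, which is the behavior a VM-type-dependent capability needs.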
 accel/kvm/kvm-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index e5b10dd129..fdbe24bf59 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2531,7 +2531,7 @@ static int kvm_init(MachineState *ms)
     }
 
     kvm_readonly_mem_allowed =
-        (kvm_check_extension(s, KVM_CAP_READONLY_MEM) > 0);
+        (kvm_vm_check_extension(s, KVM_CAP_READONLY_MEM) > 0);
 
     kvm_eventfds_allowed =
         (kvm_check_extension(s, KVM_CAP_IOEVENTFD) > 0);
-- 
2.25.1


* [RFC PATCH v2 03/44] i386/kvm: Move architectural CPUID leaf generation to separate helper
From: isaku.yamahata @ 2021-07-08  0:54 UTC
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Move the architectural (for lack of a better term) CPUID leaf generation
to a separate helper so that the generation code can be reused by TDX,
which needs to generate a canonical VM-scoped configuration.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
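Note for reviewers: the calling convention of the new helper (the caller owns
the entry array and passes in the next free index; the helper appends entries
and returns the updated index) can be illustrated with a small standalone
sketch.  The struct, the EXAMPLE_MAX_ENTRIES bound and append_basic_leaves()
are hypothetical and far simpler than kvm_x86_arch_cpuid(); only the
index-passing pattern is the point.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define EXAMPLE_MAX_ENTRIES 8   /* illustrative bound, not KVM's */

/* Hypothetical, reduced stand-in for struct kvm_cpuid_entry2. */
struct example_cpuid_entry {
    uint32_t function;
    uint32_t index;
};

/*
 * Append one entry per leaf in [0, limit], starting at slot "next" of a
 * caller-provided array, and return the index of the next free slot.
 * Entries already written below "next" are left untouched.
 */
static uint32_t append_basic_leaves(struct example_cpuid_entry *entries,
                                    uint32_t next, uint32_t limit)
{
    for (uint32_t leaf = 0; leaf <= limit; leaf++) {
        if (next == EXAMPLE_MAX_ENTRIES) {
            fprintf(stderr, "too many entries at leaf 0x%x\n", leaf);
            abort();
        }
        entries[next].function = leaf;
        entries[next].index = 0;
        next++;
    }
    return next;
}
```

A caller that has already written its own leaves at the start of the array
passes the count of those entries as "next" and uses the return value as the
final entry count, much as kvm_arch_init_vcpu() assigns the helper's result to
cpuid_data.cpuid.nent in this patch.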
 target/i386/kvm/kvm.c      | 186 +++++++++++++++++++------------------
 target/i386/kvm/kvm_i386.h |   4 +
 2 files changed, 102 insertions(+), 88 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 04e4ec063f..0558e4b506 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1507,90 +1507,12 @@ static int hyperv_init_vcpu(X86CPU *cpu)
 
 static Error *invtsc_mig_blocker;
 
-#define KVM_MAX_CPUID_ENTRIES  100
-
-int kvm_arch_init_vcpu(CPUState *cs)
+uint32_t kvm_x86_arch_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
+                            uint32_t cpuid_i)
 {
-    struct {
-        struct kvm_cpuid2 cpuid;
-        struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
-    } cpuid_data;
-    /*
-     * The kernel defines these structs with padding fields so there
-     * should be no extra padding in our cpuid_data struct.
-     */
-    QEMU_BUILD_BUG_ON(sizeof(cpuid_data) !=
-                      sizeof(struct kvm_cpuid2) +
-                      sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
-
-    X86CPU *cpu = X86_CPU(cs);
-    CPUX86State *env = &cpu->env;
-    uint32_t limit, i, j, cpuid_i;
+    uint32_t limit, i, j;
     uint32_t unused;
     struct kvm_cpuid_entry2 *c;
-    uint32_t signature[3];
-    int kvm_base = KVM_CPUID_SIGNATURE;
-    int max_nested_state_len;
-    int r;
-    Error *local_err = NULL;
-
-    memset(&cpuid_data, 0, sizeof(cpuid_data));
-
-    cpuid_i = 0;
-
-    r = kvm_arch_set_tsc_khz(cs);
-    if (r < 0) {
-        return r;
-    }
-
-    /* vcpu's TSC frequency is either specified by user, or following
-     * the value used by KVM if the former is not present. In the
-     * latter case, we query it from KVM and record in env->tsc_khz,
-     * so that vcpu's TSC frequency can be migrated later via this field.
-     */
-    if (!env->tsc_khz) {
-        r = kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ?
-            kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) :
-            -ENOTSUP;
-        if (r > 0) {
-            env->tsc_khz = r;
-        }
-    }
-
-    env->apic_bus_freq = KVM_APIC_BUS_FREQUENCY;
-
-    /* Paravirtualization CPUIDs */
-    hyperv_expand_features(cs, &local_err);
-    if (local_err) {
-        error_report_err(local_err);
-        return -ENOSYS;
-    }
-
-    if (hyperv_enabled(cpu)) {
-        r = hyperv_init_vcpu(cpu);
-        if (r) {
-            return r;
-        }
-
-        cpuid_i = hyperv_fill_cpuids(cs, cpuid_data.entries);
-        kvm_base = KVM_CPUID_SIGNATURE_NEXT;
-        has_msr_hv_hypercall = true;
-    }
-
-    if (cpu->expose_kvm) {
-        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
-        c = &cpuid_data.entries[cpuid_i++];
-        c->function = KVM_CPUID_SIGNATURE | kvm_base;
-        c->eax = KVM_CPUID_FEATURES | kvm_base;
-        c->ebx = signature[0];
-        c->ecx = signature[1];
-        c->edx = signature[2];
-
-        c = &cpuid_data.entries[cpuid_i++];
-        c->function = KVM_CPUID_FEATURES | kvm_base;
-        c->eax = env->features[FEAT_KVM];
-        c->edx = env->features[FEAT_KVM_HINTS];
-    }
 
     cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
 
@@ -1599,7 +1521,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
             fprintf(stderr, "unsupported level value: 0x%x\n", limit);
             abort();
         }
-        c = &cpuid_data.entries[cpuid_i++];
+        c = &entries[cpuid_i++];
 
         switch (i) {
         case 2: {
@@ -1618,7 +1540,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                             "cpuid(eax:2):eax & 0xf = 0x%x\n", times);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
                 c->function = i;
                 c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC;
                 cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
@@ -1664,7 +1586,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                             "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
             }
             break;
         case 0x7:
@@ -1683,7 +1605,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                                 "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
                 c->function = i;
                 c->index = j;
                 c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
@@ -1740,7 +1662,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
             fprintf(stderr, "unsupported xlevel value: 0x%x\n", limit);
             abort();
         }
-        c = &cpuid_data.entries[cpuid_i++];
+        c = &entries[cpuid_i++];
 
         switch (i) {
         case 0x8000001d:
@@ -1759,7 +1681,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                             "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
             }
             break;
         default:
@@ -1786,7 +1708,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                 fprintf(stderr, "unsupported xlevel2 value: 0x%x\n", limit);
                 abort();
             }
-            c = &cpuid_data.entries[cpuid_i++];
+            c = &entries[cpuid_i++];
 
             c->function = i;
             c->flags = 0;
@@ -1794,6 +1716,94 @@ int kvm_arch_init_vcpu(CPUState *cs)
         }
     }
 
+    return cpuid_i;
+}
+
+#define KVM_MAX_CPUID_ENTRIES  100
+
+int kvm_arch_init_vcpu(CPUState *cs)
+{
+    struct {
+        struct kvm_cpuid2 cpuid;
+        struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
+    } cpuid_data;
+    /*
+     * The kernel defines these structs with padding fields so there
+     * should be no extra padding in our cpuid_data struct.
+     */
+    QEMU_BUILD_BUG_ON(sizeof(cpuid_data) !=
+                      sizeof(struct kvm_cpuid2) +
+                      sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
+
+    X86CPU *cpu = X86_CPU(cs);
+    CPUX86State *env = &cpu->env;
+    uint32_t cpuid_i;
+    struct kvm_cpuid_entry2 *c;
+    uint32_t signature[3];
+    int kvm_base = KVM_CPUID_SIGNATURE;
+    int max_nested_state_len;
+    int r;
+    Error *local_err = NULL;
+
+    memset(&cpuid_data, 0, sizeof(cpuid_data));
+
+    cpuid_i = 0;
+
+    r = kvm_arch_set_tsc_khz(cs);
+    if (r < 0) {
+        return r;
+    }
+
+    /* vcpu's TSC frequency is either specified by user, or following
+     * the value used by KVM if the former is not present. In the
+     * latter case, we query it from KVM and record in env->tsc_khz,
+     * so that vcpu's TSC frequency can be migrated later via this field.
+     */
+    if (!env->tsc_khz) {
+        r = kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ?
+            kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) :
+            -ENOTSUP;
+        if (r > 0) {
+            env->tsc_khz = r;
+        }
+    }
+
+    env->apic_bus_freq = KVM_APIC_BUS_FREQUENCY;
+
+    /* Paravirtualization CPUIDs */
+    hyperv_expand_features(cs, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return -ENOSYS;
+    }
+
+    if (hyperv_enabled(cpu)) {
+        r = hyperv_init_vcpu(cpu);
+        if (r) {
+            return r;
+        }
+
+        cpuid_i = hyperv_fill_cpuids(cs, cpuid_data.entries);
+        kvm_base = KVM_CPUID_SIGNATURE_NEXT;
+        has_msr_hv_hypercall = true;
+    }
+
+    if (cpu->expose_kvm) {
+        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
+        c = &cpuid_data.entries[cpuid_i++];
+        c->function = KVM_CPUID_SIGNATURE | kvm_base;
+        c->eax = KVM_CPUID_FEATURES | kvm_base;
+        c->ebx = signature[0];
+        c->ecx = signature[1];
+        c->edx = signature[2];
+
+        c = &cpuid_data.entries[cpuid_i++];
+        c->function = KVM_CPUID_FEATURES | kvm_base;
+        c->eax = env->features[FEAT_KVM];
+        c->edx = env->features[FEAT_KVM_HINTS];
+    }
+
+    cpuid_i = kvm_x86_arch_cpuid(env, cpuid_data.entries, cpuid_i);
     cpuid_data.cpuid.nent = cpuid_i;
 
     if (((env->cpuid_version >> 8)&0xF) >= 6
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index dc72508389..c9a92578b1 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -24,6 +24,10 @@
 #define kvm_ioapic_in_kernel() \
     (kvm_irqchip_in_kernel() && !kvm_irqchip_is_split())
 
+#define KVM_MAX_CPUID_ENTRIES  100
+uint32_t kvm_x86_arch_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
+                            uint32_t cpuid_i);
+
 #else
 
 #define kvm_pit_in_kernel()      0
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 03/44] i386/kvm: Move architectural CPUID leaf generation to separate helper
@ 2021-07-08  0:54   ` isaku.yamahata
  0 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: isaku.yamahata, Sean Christopherson, isaku.yamahata, kvm

From: Sean Christopherson <sean.j.christopherson@intel.com>

Move the architectural (for lack of a better term) CPUID leaf generation
to a separate helper so that the generation code can be reused by TDX,
which needs to generate a canonical VM-scoped configuration.
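
As a rough illustration of the shape of this refactoring (not the QEMU
code itself), the sketch below shows a helper that appends entries to a
caller-owned array starting at a given slot and returns the new entry
count, so the caller can write its own leaves first.  The names
cpuid_entry, fill_arch_leaves and build_table are hypothetical stand-ins;
only the 0x40000000/0x40000001 KVM leaf numbers and the limit of 100
entries come from the real code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvm_cpuid_entry2; only two fields kept. */
struct cpuid_entry {
    uint32_t function;
    uint32_t index;
};

#define MAX_ENTRIES 100   /* mirrors KVM_MAX_CPUID_ENTRIES in the patch */

/*
 * Append leaves 0..limit to entries[], starting at slot 'start'.
 * Returns the index one past the last slot written, i.e. the new count.
 */
static uint32_t fill_arch_leaves(struct cpuid_entry *entries, uint32_t start,
                                 uint32_t limit)
{
    uint32_t n = start;

    for (uint32_t leaf = 0; leaf <= limit; leaf++) {
        assert(n < MAX_ENTRIES);   /* the real code aborts on overflow */
        entries[n].function = leaf;
        entries[n].index = 0;
        n++;
    }
    return n;
}

/* Caller: write the paravirt leaves first, then let the helper continue. */
static uint32_t build_table(struct cpuid_entry *entries, int expose_pv)
{
    uint32_t n = 0;

    if (expose_pv) {
        entries[n++].function = 0x40000000;   /* KVM signature leaf */
        entries[n++].function = 0x40000001;   /* KVM features leaf */
    }
    return fill_arch_leaves(entries, n, 3);
}
```

Because the helper only sees the array and a start index, a second
caller (TDX, per the description above) could presumably call it with
start = 0 on its own buffer without going through the vcpu init path.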

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 target/i386/kvm/kvm.c      | 186 +++++++++++++++++++------------------
 target/i386/kvm/kvm_i386.h |   4 +
 2 files changed, 102 insertions(+), 88 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 04e4ec063f..0558e4b506 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1507,90 +1507,12 @@ static int hyperv_init_vcpu(X86CPU *cpu)
 
 static Error *invtsc_mig_blocker;
 
-#define KVM_MAX_CPUID_ENTRIES  100
-
-int kvm_arch_init_vcpu(CPUState *cs)
+uint32_t kvm_x86_arch_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
+                            uint32_t cpuid_i)
 {
-    struct {
-        struct kvm_cpuid2 cpuid;
-        struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
-    } cpuid_data;
-    /*
-     * The kernel defines these structs with padding fields so there
-     * should be no extra padding in our cpuid_data struct.
-     */
-    QEMU_BUILD_BUG_ON(sizeof(cpuid_data) !=
-                      sizeof(struct kvm_cpuid2) +
-                      sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
-
-    X86CPU *cpu = X86_CPU(cs);
-    CPUX86State *env = &cpu->env;
-    uint32_t limit, i, j, cpuid_i;
+    uint32_t limit, i, j;
     uint32_t unused;
     struct kvm_cpuid_entry2 *c;
-    uint32_t signature[3];
-    int kvm_base = KVM_CPUID_SIGNATURE;
-    int max_nested_state_len;
-    int r;
-    Error *local_err = NULL;
-
-    memset(&cpuid_data, 0, sizeof(cpuid_data));
-
-    cpuid_i = 0;
-
-    r = kvm_arch_set_tsc_khz(cs);
-    if (r < 0) {
-        return r;
-    }
-
-    /* vcpu's TSC frequency is either specified by user, or following
-     * the value used by KVM if the former is not present. In the
-     * latter case, we query it from KVM and record in env->tsc_khz,
-     * so that vcpu's TSC frequency can be migrated later via this field.
-     */
-    if (!env->tsc_khz) {
-        r = kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ?
-            kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) :
-            -ENOTSUP;
-        if (r > 0) {
-            env->tsc_khz = r;
-        }
-    }
-
-    env->apic_bus_freq = KVM_APIC_BUS_FREQUENCY;
-
-    /* Paravirtualization CPUIDs */
-    hyperv_expand_features(cs, &local_err);
-    if (local_err) {
-        error_report_err(local_err);
-        return -ENOSYS;
-    }
-
-    if (hyperv_enabled(cpu)) {
-        r = hyperv_init_vcpu(cpu);
-        if (r) {
-            return r;
-        }
-
-        cpuid_i = hyperv_fill_cpuids(cs, cpuid_data.entries);
-        kvm_base = KVM_CPUID_SIGNATURE_NEXT;
-        has_msr_hv_hypercall = true;
-    }
-
-    if (cpu->expose_kvm) {
-        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
-        c = &cpuid_data.entries[cpuid_i++];
-        c->function = KVM_CPUID_SIGNATURE | kvm_base;
-        c->eax = KVM_CPUID_FEATURES | kvm_base;
-        c->ebx = signature[0];
-        c->ecx = signature[1];
-        c->edx = signature[2];
-
-        c = &cpuid_data.entries[cpuid_i++];
-        c->function = KVM_CPUID_FEATURES | kvm_base;
-        c->eax = env->features[FEAT_KVM];
-        c->edx = env->features[FEAT_KVM_HINTS];
-    }
 
     cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
 
@@ -1599,7 +1521,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
             fprintf(stderr, "unsupported level value: 0x%x\n", limit);
             abort();
         }
-        c = &cpuid_data.entries[cpuid_i++];
+        c = &entries[cpuid_i++];
 
         switch (i) {
         case 2: {
@@ -1618,7 +1540,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                             "cpuid(eax:2):eax & 0xf = 0x%x\n", times);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
                 c->function = i;
                 c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC;
                 cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
@@ -1664,7 +1586,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                             "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
             }
             break;
         case 0x7:
@@ -1683,7 +1605,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                                 "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
                 c->function = i;
                 c->index = j;
                 c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
@@ -1740,7 +1662,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
             fprintf(stderr, "unsupported xlevel value: 0x%x\n", limit);
             abort();
         }
-        c = &cpuid_data.entries[cpuid_i++];
+        c = &entries[cpuid_i++];
 
         switch (i) {
         case 0x8000001d:
@@ -1759,7 +1681,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                             "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
                     abort();
                 }
-                c = &cpuid_data.entries[cpuid_i++];
+                c = &entries[cpuid_i++];
             }
             break;
         default:
@@ -1786,7 +1708,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
                 fprintf(stderr, "unsupported xlevel2 value: 0x%x\n", limit);
                 abort();
             }
-            c = &cpuid_data.entries[cpuid_i++];
+            c = &entries[cpuid_i++];
 
             c->function = i;
             c->flags = 0;
@@ -1794,6 +1716,94 @@ int kvm_arch_init_vcpu(CPUState *cs)
         }
     }
 
+    return cpuid_i;
+}
+
+#define KVM_MAX_CPUID_ENTRIES  100
+
+int kvm_arch_init_vcpu(CPUState *cs)
+{
+    struct {
+        struct kvm_cpuid2 cpuid;
+        struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
+    } cpuid_data;
+    /*
+     * The kernel defines these structs with padding fields so there
+     * should be no extra padding in our cpuid_data struct.
+     */
+    QEMU_BUILD_BUG_ON(sizeof(cpuid_data) !=
+                      sizeof(struct kvm_cpuid2) +
+                      sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
+
+    X86CPU *cpu = X86_CPU(cs);
+    CPUX86State *env = &cpu->env;
+    uint32_t cpuid_i;
+    struct kvm_cpuid_entry2 *c;
+    uint32_t signature[3];
+    int kvm_base = KVM_CPUID_SIGNATURE;
+    int max_nested_state_len;
+    int r;
+    Error *local_err = NULL;
+
+    memset(&cpuid_data, 0, sizeof(cpuid_data));
+
+    cpuid_i = 0;
+
+    r = kvm_arch_set_tsc_khz(cs);
+    if (r < 0) {
+        return r;
+    }
+
+    /* vcpu's TSC frequency is either specified by user, or following
+     * the value used by KVM if the former is not present. In the
+     * latter case, we query it from KVM and record in env->tsc_khz,
+     * so that vcpu's TSC frequency can be migrated later via this field.
+     */
+    if (!env->tsc_khz) {
+        r = kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ?
+            kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) :
+            -ENOTSUP;
+        if (r > 0) {
+            env->tsc_khz = r;
+        }
+    }
+
+    env->apic_bus_freq = KVM_APIC_BUS_FREQUENCY;
+
+    /* Paravirtualization CPUIDs */
+    hyperv_expand_features(cs, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return -ENOSYS;
+    }
+
+    if (hyperv_enabled(cpu)) {
+        r = hyperv_init_vcpu(cpu);
+        if (r) {
+            return r;
+        }
+
+        cpuid_i = hyperv_fill_cpuids(cs, cpuid_data.entries);
+        kvm_base = KVM_CPUID_SIGNATURE_NEXT;
+        has_msr_hv_hypercall = true;
+    }
+
+    if (cpu->expose_kvm) {
+        memcpy(signature, "KVMKVMKVM\0\0\0", 12);
+        c = &cpuid_data.entries[cpuid_i++];
+        c->function = KVM_CPUID_SIGNATURE | kvm_base;
+        c->eax = KVM_CPUID_FEATURES | kvm_base;
+        c->ebx = signature[0];
+        c->ecx = signature[1];
+        c->edx = signature[2];
+
+        c = &cpuid_data.entries[cpuid_i++];
+        c->function = KVM_CPUID_FEATURES | kvm_base;
+        c->eax = env->features[FEAT_KVM];
+        c->edx = env->features[FEAT_KVM_HINTS];
+    }
+
+    cpuid_i = kvm_x86_arch_cpuid(env, cpuid_data.entries, cpuid_i);
     cpuid_data.cpuid.nent = cpuid_i;
 
     if (((env->cpuid_version >> 8)&0xF) >= 6
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index dc72508389..c9a92578b1 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -24,6 +24,10 @@
 #define kvm_ioapic_in_kernel() \
     (kvm_irqchip_in_kernel() && !kvm_irqchip_is_split())
 
+#define KVM_MAX_CPUID_ENTRIES  100
+uint32_t kvm_x86_arch_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
+                            uint32_t cpuid_i);
+
 #else
 
 #define kvm_pit_in_kernel()      0
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 04/44] vl: Introduce machine_init_done_late notifier
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Introduce a new notifier, machine_init_done_late, that is notified after
machine_init_done.  This will be used by TDX to generate the HOB for its
virtual firmware, which needs to be done after all guest memory has been
added, i.e. after machine_init_done notifiers have run, because some
code registers memory from its machine_init_done notifier.
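
The ordering and the "late registration fires immediately" behaviour can
be sketched in isolation as follows.  This is a minimal model, assuming a
hand-rolled singly linked list; struct notifier, add_late_notifier,
run_late_notifiers and build_hob are hypothetical names, and QEMU's real
Notifier/NotifierList types are not used here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal notifier; QEMU's real Notifier type differs. */
struct notifier {
    void (*notify)(struct notifier *n, void *data);
    struct notifier *next;
};

static struct notifier *late_list;   /* registered subscribers */
static bool late_done;               /* set once the late stage has run */

/* Register a subscriber; if the stage already ran, call it right away. */
static void add_late_notifier(struct notifier *n)
{
    n->next = late_list;
    late_list = n;
    if (late_done) {
        n->notify(n, NULL);
    }
}

/* Mark the stage as done and call every subscriber registered so far. */
static void run_late_notifiers(void)
{
    late_done = true;
    for (struct notifier *n = late_list; n; n = n->next) {
        n->notify(n, NULL);
    }
}

/* Example subscriber: counts how often it was invoked. */
static int hob_calls;

static void build_hob(struct notifier *n, void *data)
{
    (void)n;
    (void)data;
    hob_calls++;
}
```

A subscriber added before run_late_notifiers() is called once when the
stage runs; one added afterwards is called once at registration time, so
no subscriber misses the event.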

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/core/machine.c       | 26 ++++++++++++++++++++++++++
 include/sysemu/sysemu.h |  2 ++
 2 files changed, 28 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index ffc076ae84..66c39cf72a 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1278,6 +1278,31 @@ void qemu_remove_machine_init_done_notifier(Notifier *notify)
     notifier_remove(notify);
 }
 
+static NotifierList machine_init_done_late_notifiers =
+    NOTIFIER_LIST_INITIALIZER(machine_init_done_late_notifiers);
+
+static bool machine_init_done_late;
+
+void qemu_add_machine_init_done_late_notifier(Notifier *notify)
+{
+    notifier_list_add(&machine_init_done_late_notifiers, notify);
+    if (machine_init_done_late) {
+        notify->notify(notify, NULL);
+    }
+}
+
+void qemu_remove_machine_init_done_late_notifier(Notifier *notify)
+{
+    notifier_remove(notify);
+}
+
+
+static void qemu_run_machine_init_done_late_notifiers(void)
+{
+    machine_init_done_late = true;
+    notifier_list_notify(&machine_init_done_late_notifiers, NULL);
+}
+
 void qdev_machine_creation_done(void)
 {
     cpu_synchronize_all_post_init();
@@ -1311,6 +1336,7 @@ void qdev_machine_creation_done(void)
     if (rom_check_and_register_reset() != 0) {
         exit(1);
     }
+    qemu_run_machine_init_done_late_notifiers();
 
     replay_start();
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 8fae667172..d44f8cf778 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -19,6 +19,8 @@ void qemu_remove_exit_notifier(Notifier *notify);
 void qemu_run_machine_init_done_notifiers(void);
 void qemu_add_machine_init_done_notifier(Notifier *notify);
 void qemu_remove_machine_init_done_notifier(Notifier *notify);
+void qemu_add_machine_init_done_late_notifier(Notifier *notify);
+void qemu_remove_machine_init_done_late_notifier(Notifier *notify);
 
 void configure_rtc(QemuOpts *opts);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 05/44] linux-headers: Update headers to pull in TDX API changes
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Xiaoyao Li <xiaoyao.li@intel.com>

Pull in recent TDX updates, which are not backwards compatible.
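
For readers checking the layout of struct kvm_tdx_init_vm, the sketch
below mirrors its fields with fixed-width C types and verifies the
arithmetic at compile time.  The mirror struct and the helper
tdx_init_vm_size() are hypothetical, uint32_t/uint64_t stand in for
__u32/__u64, and the 512-byte total is an observation derived from the
field sizes, not something the patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of struct kvm_tdx_init_vm using fixed-width stand-in types. */
struct tdx_init_vm_mirror {
    uint32_t max_vcpus;
    uint32_t tsc_khz;
    uint64_t attributes;
    uint64_t cpuid;
    uint64_t mrconfigid[6];
    uint64_t mrowner[6];
    uint64_t mrownerconfig[6];
    uint64_t reserved[43];
};

/* A SHA-384 digest is 384 bits = 48 bytes = six 64-bit words. */
_Static_assert(sizeof(((struct tdx_init_vm_mirror *)0)->mrconfigid) == 384 / 8,
               "digest field must hold a SHA-384 value");

/* 4 + 4 + 8 + 8 + 3 * 48 = 168 bytes precede the reserved area. */
_Static_assert(offsetof(struct tdx_init_vm_mirror, reserved) == 168,
               "unexpected offset of reserved[]");

/* 168 + 43 * 8 = 512 bytes in total. */
_Static_assert(sizeof(struct tdx_init_vm_mirror) == 512,
               "unexpected struct size");

/* Runtime accessor so the size can also be checked from a test. */
static size_t tdx_init_vm_size(void)
{
    return sizeof(struct tdx_init_vm_mirror);
}
```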

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 linux-headers/asm-x86/kvm.h | 60 +++++++++++++++++++++++++++++++++++++
 linux-headers/linux/kvm.h   |  2 ++
 2 files changed, 62 insertions(+)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 0662f644aa..dbcb590fb8 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -490,4 +490,64 @@ struct kvm_pmu_event_filter {
 #define KVM_PMU_EVENT_ALLOW 0
 #define KVM_PMU_EVENT_DENY 1
 
+#define KVM_X86_LEGACY_VM	0
+#define KVM_X86_SW_PROTECTED_VM	1
+#define KVM_X86_TDX_VM		2
+
+/* Trust Domain Extensions (TDX) commands */
+enum kvm_tdx_cmd_id {
+	KVM_TDX_CAPABILITIES = 0,
+	KVM_TDX_INIT_VM,
+	KVM_TDX_INIT_VCPU,
+	KVM_TDX_INIT_MEM_REGION,
+	KVM_TDX_FINALIZE_VM,
+
+	KVM_TDX_CMD_NR_MAX,
+};
+
+struct kvm_tdx_cmd {
+	__u32 id;
+	__u32 metadata;
+	__u64 data;
+};
+
+struct kvm_tdx_cpuid_config {
+	__u32 leaf;
+	__u32 sub_leaf;
+	__u32 eax;
+	__u32 ebx;
+	__u32 ecx;
+	__u32 edx;
+};
+
+struct kvm_tdx_capabilities {
+	__u64 attrs_fixed0;
+	__u64 attrs_fixed1;
+	__u64 xfam_fixed0;
+	__u64 xfam_fixed1;
+
+	__u32 nr_cpuid_configs;
+	__u32 padding;
+	struct kvm_tdx_cpuid_config cpuid_configs[0];
+};
+
+struct kvm_tdx_init_vm {
+	__u32 max_vcpus;
+	__u32 tsc_khz;
+	__u64 attributes;
+	__u64 cpuid;
+	__u64 mrconfigid[6];    /* sha384 digest */
+	__u64 mrowner[6];       /* sha384 digest */
+	__u64 mrownerconfig[6]; /* sha384 digest */
+	__u64 reserved[43];     /* must be zero for future extensibility */
+};
+
+#define KVM_TDX_MEASURE_MEMORY_REGION	(1UL << 0)
+
+struct kvm_tdx_init_mem_region {
+	__u64 source_addr;
+	__u64 gpa;
+	__u64 nr_pages;
+};
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 20d6a263bb..65ac70d6fd 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1084,6 +1084,8 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
 #define KVM_CAP_PTP_KVM 198
 
+#define KVM_CAP_VM_TYPES 1000
+
 #ifdef KVM_CAP_IRQ_ROUTING
 
 struct kvm_irq_routing_irqchip {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 06/44] hw/i386: Introduce kvm-type for TDX guest
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Xiaoyao Li <xiaoyao.li@intel.com>

Introduce a machine property, kvm-type, to allow the user to create a
Trust Domain Extensions (TDX) VM, a.k.a. a Trust Domain (TD), e.g.:

 # $QEMU \
	-machine ...,kvm-type=tdx \
	...

Only two types are supported: "legacy" and "tdx", with "legacy" being
the default.
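
The selection logic described above can be modelled standalone as
follows.  This is a sketch under assumptions: parse_kvm_type,
vm_type_supported and eq_nocase are hypothetical helpers, the capability
value is passed in as a plain bitmask instead of being queried from KVM,
and only the type numbers (legacy = 0, tdx = 2) are taken from the header
update earlier in this series.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

enum { VM_INVALID = -1, VM_LEGACY = 0, VM_TDX = 2 };

/* Case-insensitive string equality using only standard C. */
static int eq_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Map the property string to a VM type; unset or "" means legacy. */
static int parse_kvm_type(const char *s)
{
    if (!s || s[0] == '\0' || eq_nocase(s, "legacy")) {
        return VM_LEGACY;
    }
    if (eq_nocase(s, "tdx")) {
        return VM_TDX;
    }
    return VM_INVALID;
}

/*
 * Legacy VMs are always accepted; a TDX VM needs bit VM_TDX set in the
 * bitmask of supported VM types.
 */
static int vm_type_supported(uint32_t caps, int type)
{
    if (type == VM_LEGACY) {
        return 1;
    }
    if (type == VM_TDX) {
        return (caps >> VM_TDX) & 1;
    }
    return 0;
}
```

With this model, "kvm-type=TDX" parses to type 2 and is accepted only
when the host reports bit 2 in its VM-types mask; an unknown string is
rejected before any capability check.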

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 default-configs/devices/i386-softmmu.mak |  1 +
 hw/i386/Kconfig                          |  5 +++
 hw/i386/x86.c                            | 44 ++++++++++++++++++++++++
 include/hw/i386/x86.h                    |  1 +
 include/sysemu/tdx.h                     | 10 ++++++
 target/i386/kvm/kvm-stub.c               |  5 +++
 target/i386/kvm/kvm.c                    | 16 +++++++++
 target/i386/kvm/kvm_i386.h               |  1 +
 target/i386/kvm/meson.build              |  1 +
 target/i386/kvm/tdx-stub.c               | 10 ++++++
 target/i386/kvm/tdx.c                    | 30 ++++++++++++++++
 11 files changed, 124 insertions(+)
 create mode 100644 include/sysemu/tdx.h
 create mode 100644 target/i386/kvm/tdx-stub.c
 create mode 100644 target/i386/kvm/tdx.c

diff --git a/default-configs/devices/i386-softmmu.mak b/default-configs/devices/i386-softmmu.mak
index 84d1a2487c..6e805407b8 100644
--- a/default-configs/devices/i386-softmmu.mak
+++ b/default-configs/devices/i386-softmmu.mak
@@ -18,6 +18,7 @@
 #CONFIG_QXL=n
 #CONFIG_SEV=n
 #CONFIG_SGA=n
+#CONFIG_TDX=n
 #CONFIG_TEST_DEVICES=n
 #CONFIG_TPM_CRB=n
 #CONFIG_TPM_TIS_ISA=n
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index aacb6f6d96..01633123e0 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -2,6 +2,10 @@ config SEV
     bool
     depends on KVM
 
+config TDX
+    bool
+    depends on KVM
+
 config PC
     bool
     imply APPLESMC
@@ -17,6 +21,7 @@ config PC
     imply PVPANIC_ISA
     imply QXL
     imply SEV
+    imply TDX
     imply SGA
     imply TEST_DEVICES
     imply TPM_CRB
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 00448ed55a..ed15f6f2cf 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -21,6 +21,7 @@
  * THE SOFTWARE.
  */
 #include "qemu/osdep.h"
+#include <linux/kvm.h>
 #include "qemu/error-report.h"
 #include "qemu/option.h"
 #include "qemu/cutils.h"
@@ -31,6 +32,7 @@
 #include "qapi/qmp/qerror.h"
 #include "qapi/qapi-visit-common.h"
 #include "qapi/visitor.h"
+#include "sysemu/kvm_int.h"
 #include "sysemu/qtest.h"
 #include "sysemu/whpx.h"
 #include "sysemu/numa.h"
@@ -1263,6 +1265,42 @@ static void x86_machine_set_bus_lock_ratelimit(Object *obj, Visitor *v,
     visit_type_uint64(v, name, &x86ms->bus_lock_ratelimit, errp);
 }
 
+static char *x86_get_kvm_type(Object *obj, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    return g_strdup(x86ms->kvm_type);
+}
+
+static void x86_set_kvm_type(Object *obj, const char *value, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    g_free(x86ms->kvm_type);
+    x86ms->kvm_type = g_strdup(value);
+}
+
+static int x86_kvm_type(MachineState *ms, const char *vm_type)
+{
+    int kvm_type;
+
+    if (!vm_type || !strcmp(vm_type, "") ||
+        !g_ascii_strcasecmp(vm_type, "legacy")) {
+        kvm_type = KVM_X86_LEGACY_VM;
+    } else if (!g_ascii_strcasecmp(vm_type, "tdx")) {
+        kvm_type = KVM_X86_TDX_VM;
+    } else {
+        error_report("Unknown kvm-type specified '%s'", vm_type);
+        exit(1);
+    }
+    if (kvm_set_vm_type(ms, kvm_type)) {
+        error_report("kvm-type '%s' not supported by KVM", vm_type);
+        exit(1);
+    }
+
+    return kvm_type;
+}
+
 static void x86_machine_initfn(Object *obj)
 {
     X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1273,6 +1311,11 @@ static void x86_machine_initfn(Object *obj)
     x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
     x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
     x86ms->bus_lock_ratelimit = 0;
+
+    object_property_add_str(obj, "kvm-type",
+                            x86_get_kvm_type, x86_set_kvm_type);
+    object_property_set_description(obj, "kvm-type",
+                                    "KVM guest type (legacy, tdx)");
 }
 
 static void x86_machine_class_init(ObjectClass *oc, void *data)
@@ -1284,6 +1327,7 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
     mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
     mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
     mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
+    mc->kvm_type = x86_kvm_type;
     x86mc->compat_apic_id_mode = false;
     x86mc->save_tsc_khz = true;
     nc->nmi_monitor_handler = x86_nmi;
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 6e9244a82c..a450b5e226 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -56,6 +56,7 @@ struct X86MachineState {
 
     /* RAM information (sizes, addresses, configuration): */
     ram_addr_t below_4g_mem_size, above_4g_mem_size;
+    char *kvm_type;
 
     /* CPU and apic information: */
     bool apic_xrupt_override;
diff --git a/include/sysemu/tdx.h b/include/sysemu/tdx.h
new file mode 100644
index 0000000000..60ebded851
--- /dev/null
+++ b/include/sysemu/tdx.h
@@ -0,0 +1,10 @@
+#ifndef QEMU_TDX_H
+#define QEMU_TDX_H
+
+#ifndef CONFIG_USER_ONLY
+#include "sysemu/kvm.h"
+
+bool kvm_has_tdx(KVMState *s);
+#endif
+
+#endif
diff --git a/target/i386/kvm/kvm-stub.c b/target/i386/kvm/kvm-stub.c
index 92f49121b8..e9221de76f 100644
--- a/target/i386/kvm/kvm-stub.c
+++ b/target/i386/kvm/kvm-stub.c
@@ -39,3 +39,8 @@ bool kvm_hv_vpindex_settable(void)
 {
     return false;
 }
+
+int kvm_set_vm_type(MachineState *ms, int kvm_type)
+{
+    return 0;
+}
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 0558e4b506..a3d5b334d1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -27,6 +27,7 @@
 #include "sysemu/hw_accel.h"
 #include "sysemu/kvm_int.h"
 #include "sysemu/runstate.h"
+#include "sysemu/tdx.h"
 #include "kvm_i386.h"
 #include "sev_i386.h"
 #include "hyperv.h"
@@ -132,9 +133,24 @@ static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
 static struct kvm_msr_list *kvm_feature_msrs;
 
+
 #define BUS_LOCK_SLICE_TIME 1000000000ULL /* ns */
 static RateLimit bus_lock_ratelimit_ctrl;
 
+static int vm_type;
+
+int kvm_set_vm_type(MachineState *ms, int kvm_type)
+{
+    if (kvm_type == KVM_X86_LEGACY_VM ||
+        (kvm_type == KVM_X86_TDX_VM &&
+         kvm_has_tdx(KVM_STATE(ms->accelerator)))) {
+        vm_type = kvm_type;
+        return 0;
+    }
+
+    return -ENOTSUP;
+}
+
 int kvm_has_pit_state2(void)
 {
     return has_pit_state2;
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index c9a92578b1..8e63365162 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -41,6 +41,7 @@ bool kvm_has_adjust_clock(void);
 bool kvm_has_adjust_clock_stable(void);
 bool kvm_has_exception_payload(void);
 void kvm_synchronize_all_tsc(void);
+int kvm_set_vm_type(MachineState *ms, int kvm_type);
 void kvm_arch_reset_vcpu(X86CPU *cs);
 void kvm_arch_do_init_vcpu(X86CPU *cs);
 
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index 0a533411ca..3c143a3c93 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -6,3 +6,4 @@ i386_softmmu_ss.add(when: 'CONFIG_KVM', if_true: files(
 ))
 
 i386_softmmu_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), if_false: files('hyperv-stub.c'))
+i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'), if_false: files('tdx-stub.c'))
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
new file mode 100644
index 0000000000..e1eb09cae1
--- /dev/null
+++ b/target/i386/kvm/tdx-stub.c
@@ -0,0 +1,10 @@
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "sysemu/tdx.h"
+
+#ifndef CONFIG_USER_ONLY
+bool kvm_has_tdx(KVMState *s)
+{
+    return false;
+}
+#endif
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
new file mode 100644
index 0000000000..e62a570f75
--- /dev/null
+++ b/target/i386/kvm/tdx.c
@@ -0,0 +1,30 @@
+/*
+ * QEMU TDX support
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Xiaoyao Li <xiaoyao.li@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include <linux/kvm.h>
+
+#include "cpu.h"
+#include "hw/boards.h"
+#include "qapi/error.h"
+#include "qom/object_interfaces.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
+#include "sysemu/kvm_int.h"
+#include "sysemu/tdx.h"
+
+bool kvm_has_tdx(KVMState *s)
+{
+    return !!(kvm_check_extension(s, KVM_CAP_VM_TYPES) & BIT(KVM_X86_TDX_VM));
+}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 06/44] hw/i386: Introduce kvm-type for TDX guest
@ 2021-07-08  0:54   ` isaku.yamahata
  0 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: isaku.yamahata, Sean Christopherson, isaku.yamahata, kvm

From: Xiaoyao Li <xiaoyao.li@intel.com>

Introduce a machine property, kvm-type, to allow the user to create a
Trust Domain Extensions (TDX) VM, a.k.a. a Trust Domain (TD), e.g.:

 # $QEMU \
	-machine ...,kvm-type=tdx \
	...

Only two types are supported: "legacy" and "tdx", with "legacy" being
the default.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
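Note: the string-to-type mapping done by x86_kvm_type() and the bitmask
test done by kvm_has_tdx() can be sketched outside QEMU as below.  This is
a minimal standalone sketch, not the patch code.  The numeric values of
the two VM types, the helper names parse_kvm_type() and
vm_type_supported(), the use of POSIX strcasecmp() in place of
g_ascii_strcasecmp(), and returning -1 instead of exiting are all
assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/* Assumed values for illustration; the real ones come from <linux/kvm.h>. */
enum { SKETCH_LEGACY_VM = 0, SKETCH_TDX_VM = 1 };

/*
 * Map a kvm-type string to a VM type.  NULL, "" and "legacy" select the
 * default; matching is case-insensitive.  Returns -1 for unknown strings
 * (the patch reports an error and exits instead).
 */
static int parse_kvm_type(const char *s)
{
    if (s == NULL || s[0] == '\0' || strcasecmp(s, "legacy") == 0) {
        return SKETCH_LEGACY_VM;
    }
    if (strcasecmp(s, "tdx") == 0) {
        return SKETCH_TDX_VM;
    }
    return -1;
}

/*
 * Test one bit of a supported-VM-types bitmask, in the same way that
 * kvm_has_tdx() tests the value returned for KVM_CAP_VM_TYPES.
 */
static bool vm_type_supported(unsigned int caps, int type)
{
    return type >= 0 && type < 32 && (caps & (1u << type)) != 0;
}
```

A caller would first parse the user's string and then check the result
against the capability mask before committing to the VM type, which is
the order x86_kvm_type() and kvm_set_vm_type() follow.
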
 default-configs/devices/i386-softmmu.mak |  1 +
 hw/i386/Kconfig                          |  5 +++
 hw/i386/x86.c                            | 44 ++++++++++++++++++++++++
 include/hw/i386/x86.h                    |  1 +
 include/sysemu/tdx.h                     | 10 ++++++
 target/i386/kvm/kvm-stub.c               |  5 +++
 target/i386/kvm/kvm.c                    | 16 +++++++++
 target/i386/kvm/kvm_i386.h               |  1 +
 target/i386/kvm/meson.build              |  1 +
 target/i386/kvm/tdx-stub.c               | 10 ++++++
 target/i386/kvm/tdx.c                    | 30 ++++++++++++++++
 11 files changed, 124 insertions(+)
 create mode 100644 include/sysemu/tdx.h
 create mode 100644 target/i386/kvm/tdx-stub.c
 create mode 100644 target/i386/kvm/tdx.c

diff --git a/default-configs/devices/i386-softmmu.mak b/default-configs/devices/i386-softmmu.mak
index 84d1a2487c..6e805407b8 100644
--- a/default-configs/devices/i386-softmmu.mak
+++ b/default-configs/devices/i386-softmmu.mak
@@ -18,6 +18,7 @@
 #CONFIG_QXL=n
 #CONFIG_SEV=n
 #CONFIG_SGA=n
+#CONFIG_TDX=n
 #CONFIG_TEST_DEVICES=n
 #CONFIG_TPM_CRB=n
 #CONFIG_TPM_TIS_ISA=n
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index aacb6f6d96..01633123e0 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -2,6 +2,10 @@ config SEV
     bool
     depends on KVM
 
+config TDX
+    bool
+    depends on KVM
+
 config PC
     bool
     imply APPLESMC
@@ -17,6 +21,7 @@ config PC
     imply PVPANIC_ISA
     imply QXL
     imply SEV
+    imply TDX
     imply SGA
     imply TEST_DEVICES
     imply TPM_CRB
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 00448ed55a..ed15f6f2cf 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -21,6 +21,7 @@
  * THE SOFTWARE.
  */
 #include "qemu/osdep.h"
+#include <linux/kvm.h>
 #include "qemu/error-report.h"
 #include "qemu/option.h"
 #include "qemu/cutils.h"
@@ -31,6 +32,7 @@
 #include "qapi/qmp/qerror.h"
 #include "qapi/qapi-visit-common.h"
 #include "qapi/visitor.h"
+#include "sysemu/kvm_int.h"
 #include "sysemu/qtest.h"
 #include "sysemu/whpx.h"
 #include "sysemu/numa.h"
@@ -1263,6 +1265,42 @@ static void x86_machine_set_bus_lock_ratelimit(Object *obj, Visitor *v,
     visit_type_uint64(v, name, &x86ms->bus_lock_ratelimit, errp);
 }
 
+static char *x86_get_kvm_type(Object *obj, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    return g_strdup(x86ms->kvm_type);
+}
+
+static void x86_set_kvm_type(Object *obj, const char *value, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    g_free(x86ms->kvm_type);
+    x86ms->kvm_type = g_strdup(value);
+}
+
+static int x86_kvm_type(MachineState *ms, const char *vm_type)
+{
+    int kvm_type;
+
+    if (!vm_type || !strcmp(vm_type, "") ||
+        !g_ascii_strcasecmp(vm_type, "legacy")) {
+        kvm_type = KVM_X86_LEGACY_VM;
+    } else if (!g_ascii_strcasecmp(vm_type, "tdx")) {
+        kvm_type = KVM_X86_TDX_VM;
+    } else {
+        error_report("Unknown kvm-type specified '%s'", vm_type);
+        exit(1);
+    }
+    if (kvm_set_vm_type(ms, kvm_type)) {
+        error_report("kvm-type '%s' not supported by KVM", vm_type);
+        exit(1);
+    }
+
+    return kvm_type;
+}
+
 static void x86_machine_initfn(Object *obj)
 {
     X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1273,6 +1311,11 @@ static void x86_machine_initfn(Object *obj)
     x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
     x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
     x86ms->bus_lock_ratelimit = 0;
+
+    object_property_add_str(obj, "kvm-type",
+                            x86_get_kvm_type, x86_set_kvm_type);
+    object_property_set_description(obj, "kvm-type",
+                                    "KVM guest type (legacy, tdx)");
 }
 
 static void x86_machine_class_init(ObjectClass *oc, void *data)
@@ -1284,6 +1327,7 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
     mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
     mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
     mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
+    mc->kvm_type = x86_kvm_type;
     x86mc->compat_apic_id_mode = false;
     x86mc->save_tsc_khz = true;
     nc->nmi_monitor_handler = x86_nmi;
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 6e9244a82c..a450b5e226 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -56,6 +56,7 @@ struct X86MachineState {
 
     /* RAM information (sizes, addresses, configuration): */
     ram_addr_t below_4g_mem_size, above_4g_mem_size;
+    char *kvm_type;
 
     /* CPU and apic information: */
     bool apic_xrupt_override;
diff --git a/include/sysemu/tdx.h b/include/sysemu/tdx.h
new file mode 100644
index 0000000000..60ebded851
--- /dev/null
+++ b/include/sysemu/tdx.h
@@ -0,0 +1,10 @@
+#ifndef QEMU_TDX_H
+#define QEMU_TDX_H
+
+#ifndef CONFIG_USER_ONLY
+#include "sysemu/kvm.h"
+
+bool kvm_has_tdx(KVMState *s);
+#endif
+
+#endif
diff --git a/target/i386/kvm/kvm-stub.c b/target/i386/kvm/kvm-stub.c
index 92f49121b8..e9221de76f 100644
--- a/target/i386/kvm/kvm-stub.c
+++ b/target/i386/kvm/kvm-stub.c
@@ -39,3 +39,8 @@ bool kvm_hv_vpindex_settable(void)
 {
     return false;
 }
+
+int kvm_set_vm_type(MachineState *ms, int kvm_type)
+{
+    return 0;
+}
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 0558e4b506..a3d5b334d1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -27,6 +27,7 @@
 #include "sysemu/hw_accel.h"
 #include "sysemu/kvm_int.h"
 #include "sysemu/runstate.h"
+#include "sysemu/tdx.h"
 #include "kvm_i386.h"
 #include "sev_i386.h"
 #include "hyperv.h"
@@ -132,9 +133,24 @@ static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
 static struct kvm_msr_list *kvm_feature_msrs;
 
+
 #define BUS_LOCK_SLICE_TIME 1000000000ULL /* ns */
 static RateLimit bus_lock_ratelimit_ctrl;
 
+static int vm_type;
+
+int kvm_set_vm_type(MachineState *ms, int kvm_type)
+{
+    if (kvm_type == KVM_X86_LEGACY_VM ||
+        (kvm_type == KVM_X86_TDX_VM &&
+         kvm_has_tdx(KVM_STATE(ms->accelerator)))) {
+        vm_type = kvm_type;
+        return 0;
+    }
+
+    return -ENOTSUP;
+}
+
 int kvm_has_pit_state2(void)
 {
     return has_pit_state2;
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index c9a92578b1..8e63365162 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -41,6 +41,7 @@ bool kvm_has_adjust_clock(void);
 bool kvm_has_adjust_clock_stable(void);
 bool kvm_has_exception_payload(void);
 void kvm_synchronize_all_tsc(void);
+int kvm_set_vm_type(MachineState *ms, int kvm_type);
 void kvm_arch_reset_vcpu(X86CPU *cs);
 void kvm_arch_do_init_vcpu(X86CPU *cs);
 
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index 0a533411ca..3c143a3c93 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -6,3 +6,4 @@ i386_softmmu_ss.add(when: 'CONFIG_KVM', if_true: files(
 ))
 
 i386_softmmu_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), if_false: files('hyperv-stub.c'))
+i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'), if_false: files('tdx-stub.c'))
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
new file mode 100644
index 0000000000..e1eb09cae1
--- /dev/null
+++ b/target/i386/kvm/tdx-stub.c
@@ -0,0 +1,10 @@
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "sysemu/tdx.h"
+
+#ifndef CONFIG_USER_ONLY
+bool kvm_has_tdx(KVMState *s)
+{
+        return false;
+}
+#endif
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
new file mode 100644
index 0000000000..e62a570f75
--- /dev/null
+++ b/target/i386/kvm/tdx.c
@@ -0,0 +1,30 @@
+/*
+ * QEMU TDX support
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *      Xiaoyao Li <xiaoyao.li@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include <linux/kvm.h>
+
+#include "cpu.h"
+#include "hw/boards.h"
+#include "qapi/error.h"
+#include "qom/object_interfaces.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
+#include "sysemu/kvm_int.h"
+#include "sysemu/tdx.h"
+
+bool kvm_has_tdx(KVMState *s)
+{
+    return !!(kvm_check_extension(s, KVM_CAP_VM_TYPES) & BIT(KVM_X86_TDX_VM));
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 07/44] i386/kvm: Squash getting/putting guest state for TDX VMs
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Ignore get/put state of TDX VMs, as accessing or mutating the guest state
of production TDs is not supported.
Allow kvm_arch_get_registers() to run as normal, except for MSRs, for
debug TDs, and silently ignore attempts to read guest state for
non-debug TDs.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
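Note: the behaviour of the diff below can be sketched in isolation as
follows.  This is a simplified model under stated assumptions, not QEMU
code: the enum vm_kind, the struct sync_stats counters and both function
names are hypothetical stand-ins for the KVM register ioctls.

```c
/* Stand-ins for KVM_X86_LEGACY_VM and KVM_X86_TDX_VM. */
enum vm_kind { VM_LEGACY, VM_TDX };

/* Counters standing in for the real KVM register accesses. */
struct sync_stats {
    int regs_read;
    int msrs_read;
    int regs_written;
};

/* Writes are dropped entirely for a TD; success is still reported. */
static int put_registers(enum vm_kind kind, struct sync_stats *st)
{
    if (kind == VM_TDX) {
        return 0;
    }
    st->regs_written++;
    return 0;
}

/* Reads proceed as usual, except that the MSR read is skipped for a TD. */
static int get_registers(enum vm_kind kind, struct sync_stats *st)
{
    st->regs_read++;
    if (kind != VM_TDX) {
        st->msrs_read++;
    }
    return 0;
}
```

Returning success from the write path keeps existing callers unchanged,
at the cost of hiding the fact that nothing was written.
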
 target/i386/kvm/kvm.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a3d5b334d1..27b64dedc2 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2641,6 +2641,11 @@ void kvm_put_apicbase(X86CPU *cpu, uint64_t value)
 {
     int ret;
 
+    /* TODO: Allow accessing guest state for debug TDs. */
+    if (vm_type == KVM_X86_TDX_VM) {
+        return;
+    }
+
     ret = kvm_put_one_msr(cpu, MSR_IA32_APICBASE, value);
     assert(ret == 1);
 }
@@ -4099,6 +4104,11 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 
     assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
 
+    /* TODO: Allow accessing guest state for debug TDs. */
+    if (vm_type == KVM_X86_TDX_VM) {
+        return 0;
+    }
+
     /* must be before kvm_put_nested_state so that EFER.SVME is set */
     ret = kvm_put_sregs(x86_cpu);
     if (ret < 0) {
@@ -4209,9 +4219,11 @@ int kvm_arch_get_registers(CPUState *cs)
     if (ret < 0) {
         goto out;
     }
-    ret = kvm_get_msrs(cpu);
-    if (ret < 0) {
-        goto out;
+    if (vm_type != KVM_X86_TDX_VM) {
+        ret = kvm_get_msrs(cpu);
+        if (ret < 0) {
+            goto out;
+        }
     }
     ret = kvm_get_apic(cpu);
     if (ret < 0) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 08/44] i386/kvm: Skip KVM_X86_SETUP_MCE for TDX guests
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Despite advertising MCE support to the guest, TDX-SEAM doesn't support
injecting #MCs into the guest, so KVM rejects all of the associated
setup.  Skip KVM_X86_SETUP_MCE for TDX guests.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
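Note: the condition that gates the MCE setup after this change can be
written as a standalone predicate.  The sketch below is illustrative
only; want_mce_setup() and its parameters are hypothetical names, and
the two boolean inputs stand in for the KVM_CAP_MCE check and the
vm_type comparison.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_1_EDX_MCE (1u << 7)    /* CPUID.01H:EDX bit 7 */
#define CPUID_1_EDX_MCA (1u << 14)   /* CPUID.01H:EDX bit 14 */

/*
 * MCE setup is attempted only when the CPU family is at least 6, the
 * guest CPUID advertises both MCE and MCA, KVM reports the MCE
 * capability, and the VM is not a TD.
 */
static bool want_mce_setup(uint32_t cpuid_version, uint32_t feat_1_edx,
                           bool kvm_has_mce_cap, bool is_tdx)
{
    const uint32_t need = CPUID_1_EDX_MCE | CPUID_1_EDX_MCA;
    uint32_t family = (cpuid_version >> 8) & 0xF;

    return family >= 6 &&
           (feat_1_edx & need) == need &&
           kvm_has_mce_cap &&
           !is_tdx;
}
```
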
 target/i386/kvm/kvm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 27b64dedc2..c29cb420a1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1825,7 +1825,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
     if (((env->cpuid_version >> 8)&0xF) >= 6
         && (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
            (CPUID_MCE | CPUID_MCA)
-        && kvm_check_extension(cs->kvm_state, KVM_CAP_MCE) > 0) {
+        && kvm_check_extension(cs->kvm_state, KVM_CAP_MCE) > 0
+        && vm_type != KVM_X86_TDX_VM) {
         uint64_t mcg_cap, unsupported_caps;
         int banks;
         int ret;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 09/44] target/i386: kvm: don't synchronize guest tsc for TD guest
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Make kvm_synchronize_all_tsc() a no-op for TD guests.

The TDX module specification, 9.11.1 TSC Virtualization, states:
"Virtual TSC values are consistent among all the TD's VCPUs at the
level supported by the CPU".
There is no need for QEMU to synchronize the TSC, and the VMM can't access
the guest TSC.  In fact, do_kvm_synchronize_tsc() hits an assertion because
accessing the guest TSC fails:

> qemu/target/i386/kvm.c:235: kvm_get_tsc: Assertion `ret == 1' failed.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 target/i386/kvm/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index c29cb420a1..ecb1714920 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -254,7 +254,7 @@ void kvm_synchronize_all_tsc(void)
 {
     CPUState *cpu;
 
-    if (kvm_enabled()) {
+    if (kvm_enabled() && vm_type != KVM_X86_TDX_VM) {
         CPU_FOREACH(cpu) {
             run_on_cpu(cpu, do_kvm_synchronize_tsc, RUN_ON_CPU_NULL);
         }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 10/44] hw/i386: Initialize TDX via KVM ioctl() when kvm_type is TDX
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Xiaoyao Li <xiaoyao.li@intel.com>

Introduce tdx_ioctl() to invoke TDX-specific sub-ioctls of
KVM_MEMORY_ENCRYPT_OP.  Use tdx_ioctl() to invoke KVM_TDX_INIT_VM, by way
of tdx_pre_create_vcpu(), from kvm_init_vcpu().  KVM_TDX_INIT_VM
configures global TD state, e.g. the canonical CPUID config, and must be
executed prior to creating vCPUs.

Note, this doesn't address the fact that QEMU may change the CPUID
configuration when creating vCPUs, i.e. it punts on refactoring QEMU to
provide a stable CPUID config prior to kvm_arch_init().

Explicitly set the subleaf index and flags when adding a CPUID entry to
avoid propagating stale state from a removed entry, e.g. when the CPUID
0x4 loop bails, it can leave a non-zero index and flags in the array.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
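Note: tdx_pre_create_vcpu() is called for every vCPU but must perform
the VM-wide initialization only once.  The pattern can be sketched with
plain pthreads as below.  This is a minimal sketch under stated
assumptions: struct td_state, td_init_vm() and td_pre_create_vcpu() are
hypothetical names, and td_init_vm() merely counts calls where the patch
issues the KVM_TDX_INIT_VM ioctl.

```c
#include <pthread.h>
#include <stdbool.h>

struct td_state {
    pthread_mutex_t lock;
    bool initialized;
    int init_calls;     /* how often the VM-wide init actually ran */
};

/* Hypothetical stand-in for the VM-wide initialization ioctl. */
static void td_init_vm(struct td_state *td)
{
    td->init_calls++;
}

/* Called once per vCPU; only the first caller performs the VM-wide init. */
static void td_pre_create_vcpu(struct td_state *td)
{
    pthread_mutex_lock(&td->lock);
    if (!td->initialized) {
        td->initialized = true;
        td_init_vm(td);
    }
    pthread_mutex_unlock(&td->lock);
}
```

Holding the lock across the initialization, rather than only around the
flag test, means a second vCPU thread cannot proceed until the VM-wide
setup has finished.
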
 accel/kvm/kvm-all.c        |   2 +
 include/sysemu/tdx.h       |   2 +
 qapi/qom.json              |  14 +++++
 target/i386/kvm/tdx-stub.c |   4 ++
 target/i386/kvm/tdx.c      | 126 +++++++++++++++++++++++++++++++++++++
 target/i386/kvm/tdx.h      |  24 +++++++
 6 files changed, 172 insertions(+)
 create mode 100644 target/i386/kvm/tdx.h
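
Note: the QEMU_BUILD_BUG_ON() in tdx_pre_create_vcpu() checks that a
header struct followed by an array of entries has no hidden padding, so
the pair can be handed to the kernel as one contiguous buffer.  The
sketch below shows the same check with C11 _Static_assert.  The struct
names, the entry count of 4 and the ptr_to_u64() helper are assumptions
for illustration; the real types are struct kvm_cpuid2 and struct
kvm_cpuid_entry2 from <linux/kvm.h>.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ENTRIES 4   /* assumed; stands in for KVM_MAX_CPUID_ENTRIES */

struct sketch_entry {          /* simplified stand-in for one CPUID entry */
    uint32_t function;
    uint32_t index;
    uint32_t flags;
    uint32_t eax, ebx, ecx, edx;
    uint32_t padding[3];
};

struct sketch_header {         /* stand-in for the header preceding the entries */
    uint32_t nent;
    uint32_t padding;
};

struct sketch_cpuid_data {
    struct sketch_header cpuid;
    struct sketch_entry entries[SKETCH_MAX_ENTRIES];
};

/* Fail the build if the compiler inserted padding anywhere. */
_Static_assert(sizeof(struct sketch_cpuid_data) ==
               sizeof(struct sketch_header) +
               sizeof(struct sketch_entry) * SKETCH_MAX_ENTRIES,
               "no padding between header and entries");
_Static_assert(offsetof(struct sketch_cpuid_data, entries) ==
               sizeof(struct sketch_header),
               "entries start right after the header");

/* Carry a pointer in a 64-bit field; going via uintptr_t is safe on 32-bit hosts. */
static uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}
```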

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index fdbe24bf59..6475f15d5f 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -38,6 +38,7 @@
 #include "qemu/main-loop.h"
 #include "trace.h"
 #include "hw/irq.h"
+#include "sysemu/tdx.h"
 #include "qapi/visitor.h"
 #include "qapi/qapi-types-common.h"
 #include "qapi/qapi-visit-common.h"
@@ -459,6 +460,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
     trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
+    tdx_pre_create_vcpu(cpu);
     ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
     if (ret < 0) {
         error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
diff --git a/include/sysemu/tdx.h b/include/sysemu/tdx.h
index 60ebded851..36a901e723 100644
--- a/include/sysemu/tdx.h
+++ b/include/sysemu/tdx.h
@@ -7,4 +7,6 @@
 bool kvm_has_tdx(KVMState *s);
 #endif
 
+void tdx_pre_create_vcpu(CPUState *cpu);
+
 #endif
diff --git a/qapi/qom.json b/qapi/qom.json
index 652be317b8..70c70e3efe 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -760,6 +760,18 @@
             '*cbitpos': 'uint32',
             'reduced-phys-bits': 'uint32' } }
 
+##
+# @TdxGuestProperties:
+#
+# Properties for tdx-guest objects.
+#
+# @debug: enable debug mode (default: off)
+#
+# Since: 6.0
+##
+{ 'struct': 'TdxGuestProperties',
+  'data': { '*debug': 'bool' } }
+
 ##
 # @ObjectType:
 #
@@ -802,6 +814,7 @@
     'secret_keyring',
     'sev-guest',
     's390-pv-guest',
+    'tdx-guest',
     'throttle-group',
     'tls-creds-anon',
     'tls-creds-psk',
@@ -858,6 +871,7 @@
       'secret':                     'SecretProperties',
       'secret_keyring':             'SecretKeyringProperties',
       'sev-guest':                  'SevGuestProperties',
+      'tdx-guest':                  'TdxGuestProperties',
       'throttle-group':             'ThrottleGroupProperties',
       'tls-creds-anon':             'TlsCredsAnonProperties',
       'tls-creds-psk':              'TlsCredsPskProperties',
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
index e1eb09cae1..93d5913c89 100644
--- a/target/i386/kvm/tdx-stub.c
+++ b/target/i386/kvm/tdx-stub.c
@@ -8,3 +8,7 @@ bool kvm_has_tdx(KVMState *s)
         return false;
 }
 #endif
+
+void tdx_pre_create_vcpu(CPUState *cpu)
+{
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index e62a570f75..e8c70f241d 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -14,8 +14,10 @@
 #include "qemu/osdep.h"
 
 #include <linux/kvm.h>
+#include <sys/ioctl.h>
 
 #include "cpu.h"
+#include "kvm_i386.h"
 #include "hw/boards.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
@@ -23,8 +25,132 @@
 #include "sysemu/kvm.h"
 #include "sysemu/kvm_int.h"
 #include "sysemu/tdx.h"
+#include "tdx.h"
+
+#define TDX1_TD_ATTRIBUTE_DEBUG BIT_ULL(0)
+#define TDX1_TD_ATTRIBUTE_PERFMON BIT_ULL(63)
 
 bool kvm_has_tdx(KVMState *s)
 {
     return !!(kvm_check_extension(s, KVM_CAP_VM_TYPES) & BIT(KVM_X86_TDX_VM));
 }
+
+static void __tdx_ioctl(int ioctl_no, const char *ioctl_name,
+                        __u32 metadata, void *data)
+{
+    struct kvm_tdx_cmd tdx_cmd;
+    int r;
+
+    memset(&tdx_cmd, 0x0, sizeof(tdx_cmd));
+
+    tdx_cmd.id = ioctl_no;
+    tdx_cmd.metadata = metadata;
+    tdx_cmd.data = (__u64)(unsigned long)data;
+
+    r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+    if (r) {
+        error_report("%s failed: %s", ioctl_name, strerror(-r));
+        exit(1);
+    }
+}
+#define tdx_ioctl(ioctl_no, metadata, data) \
+        __tdx_ioctl(ioctl_no, stringify(ioctl_no), metadata, data)
+
+void tdx_pre_create_vcpu(CPUState *cpu)
+{
+    struct {
+        struct kvm_cpuid2 cpuid;
+        struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
+    } cpuid_data;
+
+    /*
+     * The kernel defines these structs with padding fields so there
+     * should be no extra padding in our cpuid_data struct.
+     */
+    QEMU_BUILD_BUG_ON(sizeof(cpuid_data) !=
+                      sizeof(struct kvm_cpuid2) +
+                      sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
+
+    MachineState *ms = MACHINE(qdev_get_machine());
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
+                                                    TYPE_TDX_GUEST);
+    struct kvm_tdx_init_vm init_vm;
+
+    if (!tdx) {
+        return;
+    }
+
+    /* HACK: Remove MPX support, which is not allowed by TDX. */
+    env->features[FEAT_XSAVE_COMP_LO] &= ~(XSTATE_BNDREGS_MASK |
+                                           XSTATE_BNDCSR_MASK);
+
+    if (!(env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE)) {
+        error_report("TDX VM must support XSAVE features");
+        exit(1);
+    }
+
+    qemu_mutex_lock(&tdx->lock);
+    if (tdx->initialized) {
+        goto out;
+    }
+    tdx->initialized = true;
+
+    memset(&cpuid_data, 0, sizeof(cpuid_data));
+
+    cpuid_data.cpuid.nent = kvm_x86_arch_cpuid(env, cpuid_data.entries, 0);
+    cpuid_data.cpuid.padding = 0;
+
+    init_vm.max_vcpus = ms->smp.cpus;
+    init_vm.attributes = 0;
+    init_vm.attributes |= tdx->debug ? TDX1_TD_ATTRIBUTE_DEBUG : 0;
+    init_vm.attributes |= x86cpu->enable_pmu ? TDX1_TD_ATTRIBUTE_PERFMON : 0;
+
+    init_vm.cpuid = (__u64)(&cpuid_data);
+    tdx_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);
+out:
+    qemu_mutex_unlock(&tdx->lock);
+}
+
+static bool tdx_guest_get_debug(Object *obj, Error **errp)
+{
+    TdxGuest *tdx = TDX_GUEST(obj);
+
+    return tdx->debug;
+}
+
+static void tdx_guest_set_debug(Object *obj, bool value, Error **errp)
+{
+    TdxGuest *tdx = TDX_GUEST(obj);
+
+    tdx->debug = value;
+}
+
+/* tdx guest */
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
+                                   tdx_guest,
+                                   TDX_GUEST,
+                                   CONFIDENTIAL_GUEST_SUPPORT,
+                                   { TYPE_USER_CREATABLE },
+                                   { NULL })
+
+static void tdx_guest_init(Object *obj)
+{
+    TdxGuest *tdx = TDX_GUEST(obj);
+
+    tdx->parent_obj.ready = true;
+    qemu_mutex_init(&tdx->lock);
+
+    tdx->debug = false;
+    object_property_add_bool(obj, "debug", tdx_guest_get_debug,
+                             tdx_guest_set_debug);
+}
+
+static void tdx_guest_finalize(Object *obj)
+{
+}
+
+static void tdx_guest_class_init(ObjectClass *oc, void *data)
+{
+}
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
new file mode 100644
index 0000000000..6ad6c9a313
--- /dev/null
+++ b/target/i386/kvm/tdx.h
@@ -0,0 +1,24 @@
+#ifndef QEMU_I386_TDX_H
+#define QEMU_I386_TDX_H
+
+#include "qom/object.h"
+#include "exec/confidential-guest-support.h"
+
+#define TYPE_TDX_GUEST "tdx-guest"
+#define TDX_GUEST(obj)     \
+    OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
+
+typedef struct TdxGuestClass {
+    ConfidentialGuestSupportClass parent_class;
+} TdxGuestClass;
+
+typedef struct TdxGuest {
+    ConfidentialGuestSupport parent_obj;
+
+    QemuMutex lock;
+
+    bool initialized;
+    bool debug;
+} TdxGuest;
+
+#endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 10/44] hw/i386: Initialize TDX via KVM ioctl() when kvm_type is TDX
@ 2021-07-08  0:54   ` isaku.yamahata
  0 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: isaku.yamahata, Sean Christopherson, isaku.yamahata, kvm

From: Xiaoyao Li <xiaoyao.li@intel.com>

Introduce tdx_ioctl() to invoke TDX-specific sub-ioctls of
KVM_MEMORY_ENCRYPT_OP.  Use tdx_ioctl() to invoke KVM_TDX_INIT_VM, by way
of tdx_pre_create_vcpu(), from kvm_init_vcpu().  KVM_TDX_INIT_VM
configures global TD state, e.g. the canonical CPUID config, and must be
executed prior to creating vCPUs.

Note, this doesn't address the fact that QEMU may change the CPUID
configuration when creating vCPUs, i.e. it punts on refactoring QEMU to
provide a stable CPUID config prior to kvm_arch_init().

Explicitly set the subleaf index and flags when adding a CPUID entry to
avoid propagating stale state from a removed entry, e.g. when the CPUID
0x4 loop bails, it can leave a non-zero index and flags in the array.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 accel/kvm/kvm-all.c        |   2 +
 include/sysemu/tdx.h       |   2 +
 qapi/qom.json              |  14 +++++
 target/i386/kvm/tdx-stub.c |   4 ++
 target/i386/kvm/tdx.c      | 126 +++++++++++++++++++++++++++++++++++++
 target/i386/kvm/tdx.h      |  24 +++++++
 6 files changed, 172 insertions(+)
 create mode 100644 target/i386/kvm/tdx.h

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index fdbe24bf59..6475f15d5f 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -38,6 +38,7 @@
 #include "qemu/main-loop.h"
 #include "trace.h"
 #include "hw/irq.h"
+#include "sysemu/tdx.h"
 #include "qapi/visitor.h"
 #include "qapi/qapi-types-common.h"
 #include "qapi/qapi-visit-common.h"
@@ -459,6 +460,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
     trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
+    tdx_pre_create_vcpu(cpu);
     ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
     if (ret < 0) {
         error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
diff --git a/include/sysemu/tdx.h b/include/sysemu/tdx.h
index 60ebded851..36a901e723 100644
--- a/include/sysemu/tdx.h
+++ b/include/sysemu/tdx.h
@@ -7,4 +7,6 @@
 bool kvm_has_tdx(KVMState *s);
 #endif
 
+void tdx_pre_create_vcpu(CPUState *cpu);
+
 #endif
diff --git a/qapi/qom.json b/qapi/qom.json
index 652be317b8..70c70e3efe 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -760,6 +760,18 @@
             '*cbitpos': 'uint32',
             'reduced-phys-bits': 'uint32' } }
 
+##
+# @TdxGuestProperties:
+#
+# Properties for tdx-guest objects.
+#
+# @debug: enable debug mode (default: off)
+#
+# Since: 6.0
+##
+{ 'struct': 'TdxGuestProperties',
+  'data': { '*debug': 'bool' } }
+
 ##
 # @ObjectType:
 #
@@ -802,6 +814,7 @@
     'secret_keyring',
     'sev-guest',
     's390-pv-guest',
+    'tdx-guest',
     'throttle-group',
     'tls-creds-anon',
     'tls-creds-psk',
@@ -858,6 +871,7 @@
       'secret':                     'SecretProperties',
       'secret_keyring':             'SecretKeyringProperties',
       'sev-guest':                  'SevGuestProperties',
+      'tdx-guest':                  'TdxGuestProperties',
       'throttle-group':             'ThrottleGroupProperties',
       'tls-creds-anon':             'TlsCredsAnonProperties',
       'tls-creds-psk':              'TlsCredsPskProperties',
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
index e1eb09cae1..93d5913c89 100644
--- a/target/i386/kvm/tdx-stub.c
+++ b/target/i386/kvm/tdx-stub.c
@@ -8,3 +8,7 @@ bool kvm_has_tdx(KVMState *s)
         return false;
 }
 #endif
+
+void tdx_pre_create_vcpu(CPUState *cpu)
+{
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index e62a570f75..e8c70f241d 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -14,8 +14,10 @@
 #include "qemu/osdep.h"
 
 #include <linux/kvm.h>
+#include <sys/ioctl.h>
 
 #include "cpu.h"
+#include "kvm_i386.h"
 #include "hw/boards.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
@@ -23,8 +25,132 @@
 #include "sysemu/kvm.h"
 #include "sysemu/kvm_int.h"
 #include "sysemu/tdx.h"
+#include "tdx.h"
+
+#define TDX1_TD_ATTRIBUTE_DEBUG BIT_ULL(0)
+#define TDX1_TD_ATTRIBUTE_PERFMON BIT_ULL(63)
 
 bool kvm_has_tdx(KVMState *s)
 {
     return !!(kvm_check_extension(s, KVM_CAP_VM_TYPES) & BIT(KVM_X86_TDX_VM));
 }
+
+static void __tdx_ioctl(int ioctl_no, const char *ioctl_name,
+                        __u32 metadata, void *data)
+{
+    struct kvm_tdx_cmd tdx_cmd;
+    int r;
+
+    memset(&tdx_cmd, 0x0, sizeof(tdx_cmd));
+
+    tdx_cmd.id = ioctl_no;
+    tdx_cmd.metadata = metadata;
+    tdx_cmd.data = (__u64)(unsigned long)data;
+
+    r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+    if (r) {
+        error_report("%s failed: %s", ioctl_name, strerror(-r));
+        exit(1);
+    }
+}
+#define tdx_ioctl(ioctl_no, metadata, data) \
+        __tdx_ioctl(ioctl_no, stringify(ioctl_no), metadata, data)
+
+void tdx_pre_create_vcpu(CPUState *cpu)
+{
+    struct {
+        struct kvm_cpuid2 cpuid;
+        struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
+    } cpuid_data;
+
+    /*
+     * The kernel defines these structs with padding fields so there
+     * should be no extra padding in our cpuid_data struct.
+     */
+    QEMU_BUILD_BUG_ON(sizeof(cpuid_data) !=
+                      sizeof(struct kvm_cpuid2) +
+                      sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
+
+    MachineState *ms = MACHINE(qdev_get_machine());
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
+                                                    TYPE_TDX_GUEST);
+    struct kvm_tdx_init_vm init_vm;
+
+    if (!tdx) {
+        return;
+    }
+
+    /* HACK: Remove MPX support, which is not allowed by TDX. */
+    env->features[FEAT_XSAVE_COMP_LO] &= ~(XSTATE_BNDREGS_MASK |
+                                           XSTATE_BNDCSR_MASK);
+
+    if (!(env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE)) {
+        error_report("TDX VM must support XSAVE features");
+        exit(1);
+    }
+
+    qemu_mutex_lock(&tdx->lock);
+    if (tdx->initialized) {
+        goto out;
+    }
+    tdx->initialized = true;
+
+    memset(&cpuid_data, 0, sizeof(cpuid_data));
+
+    cpuid_data.cpuid.nent = kvm_x86_arch_cpuid(env, cpuid_data.entries, 0);
+    cpuid_data.cpuid.padding = 0;
+
+    init_vm.max_vcpus = ms->smp.cpus;
+    init_vm.attributes = 0;
+    init_vm.attributes |= tdx->debug ? TDX1_TD_ATTRIBUTE_DEBUG : 0;
+    init_vm.attributes |= x86cpu->enable_pmu ? TDX1_TD_ATTRIBUTE_PERFMON : 0;
+
+    init_vm.cpuid = (__u64)(&cpuid_data);
+    tdx_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);
+out:
+    qemu_mutex_unlock(&tdx->lock);
+}
+
+static bool tdx_guest_get_debug(Object *obj, Error **errp)
+{
+    TdxGuest *tdx = TDX_GUEST(obj);
+
+    return tdx->debug;
+}
+
+static void tdx_guest_set_debug(Object *obj, bool value, Error **errp)
+{
+    TdxGuest *tdx = TDX_GUEST(obj);
+
+    tdx->debug = value;
+}
+
+/* tdx guest */
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
+                                   tdx_guest,
+                                   TDX_GUEST,
+                                   CONFIDENTIAL_GUEST_SUPPORT,
+                                   { TYPE_USER_CREATABLE },
+                                   { NULL })
+
+static void tdx_guest_init(Object *obj)
+{
+    TdxGuest *tdx = TDX_GUEST(obj);
+
+    tdx->parent_obj.ready = true;
+    qemu_mutex_init(&tdx->lock);
+
+    tdx->debug = false;
+    object_property_add_bool(obj, "debug", tdx_guest_get_debug,
+                             tdx_guest_set_debug);
+}
+
+static void tdx_guest_finalize(Object *obj)
+{
+}
+
+static void tdx_guest_class_init(ObjectClass *oc, void *data)
+{
+}
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
new file mode 100644
index 0000000000..6ad6c9a313
--- /dev/null
+++ b/target/i386/kvm/tdx.h
@@ -0,0 +1,24 @@
+#ifndef QEMU_I386_TDX_H
+#define QEMU_I386_TDX_H
+
+#include "qom/object.h"
+#include "exec/confidential-guest-support.h"
+
+#define TYPE_TDX_GUEST "tdx-guest"
+#define TDX_GUEST(obj)     \
+    OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
+
+typedef struct TdxGuestClass {
+    ConfidentialGuestSupportClass parent_class;
+} TdxGuestClass;
+
+typedef struct TdxGuest {
+    ConfidentialGuestSupport parent_obj;
+
+    QemuMutex lock;
+
+    bool initialized;
+    bool debug;
+} TdxGuest;
+
+#endif
-- 
2.25.1
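
Note on the tdx_pre_create_vcpu() hunk above: the hook runs once per vCPU,
but KVM_TDX_INIT_VM is a VM-wide operation, so the patch guards it with
tdx->lock and tdx->initialized. The following is a minimal standalone sketch
of that once-only pattern. It assumes pthreads in place of QemuMutex, and
struct td_state, td_init_vm() and td_pre_create_vcpu() are hypothetical
stand-ins, not QEMU or KVM interfaces.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the TdxGuest state used in the patch. */
struct td_state {
    pthread_mutex_t lock;
    bool initialized;
    int init_calls;   /* counts how many times the VM-wide init ran */
};

/* Stand-in for the KVM_TDX_INIT_VM ioctl; here it only counts calls. */
static void td_init_vm(struct td_state *td)
{
    td->init_calls++;
}

/* Called once per vCPU; only the first caller performs the VM-wide init. */
static void td_pre_create_vcpu(struct td_state *td)
{
    pthread_mutex_lock(&td->lock);
    if (!td->initialized) {
        td->initialized = true;
        td_init_vm(td);
    }
    pthread_mutex_unlock(&td->lock);
}
```

Setting the flag before the init call mirrors the order used in the patch;
because the patch exits on ioctl failure, a failed init is never retried.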



^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 11/44] i386/tdx: Implement user specified tsc frequency
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Xiaoyao Li <xiaoyao.li@intel.com>

Reuse the -cpu tsc-frequency= option to obtain the user-requested TSC
frequency and pass it to KVM_TDX_INIT_VM.

Also sanity-check that the TSC frequency lies within the legal range and
is a multiple of the legal granularity (both required by the SEAM module).

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 target/i386/kvm/kvm.c |  8 ++++++++
 target/i386/kvm/tdx.c | 16 ++++++++++++++++
 2 files changed, 24 insertions(+)
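
The two checks added to tdx_pre_create_vcpu() can be read as a single
predicate. The sketch below is only an illustration: the limits are the ones
defined in this patch, while tdx_tsc_khz_is_valid() and
TDX1_TSC_GRANULARITY_KHZ are hypothetical names that do not appear in the
series. As in the patch, a value of 0 (no frequency specified) is accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limits taken from the patch; all values are in kHz. */
#define TDX1_MIN_TSC_FREQUENCY_KHZ (100 * 1000)
#define TDX1_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000)
#define TDX1_TSC_GRANULARITY_KHZ   (25 * 1000)   /* hypothetical name */

/*
 * Hypothetical helper: returns true if tsc_khz is acceptable.
 * Zero means "not specified by the user" and is accepted, as in the patch.
 */
static bool tdx_tsc_khz_is_valid(int64_t tsc_khz)
{
    if (tsc_khz == 0) {
        return true;
    }
    if (tsc_khz < TDX1_MIN_TSC_FREQUENCY_KHZ ||
        tsc_khz > TDX1_MAX_TSC_FREQUENCY_KHZ) {
        return false;
    }
    /* Must be a whole multiple of 25 MHz. */
    return tsc_khz % TDX1_TSC_GRANULARITY_KHZ == 0;
}
```

For example, 2500000 kHz (2.5 GHz) passes, while 2510000 kHz is within range
but rejected because it is not a multiple of 25 MHz.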

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index ecb1714920..be0b96b120 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -763,6 +763,14 @@ static int kvm_arch_set_tsc_khz(CPUState *cs)
     int r, cur_freq;
     bool set_ioctl = false;
 
+    /*
+     * TD guest's TSC is immutable, it cannot be set/changed via
+     * KVM_SET_TSC_KHZ, but only be initialized via KVM_TDX_INIT_VM
+     */
+    if (vm_type == KVM_X86_TDX_VM) {
+        return 0;
+    }
+
     if (!env->tsc_khz) {
         return 0;
     }
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index e8c70f241d..c50a0dcf11 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -29,6 +29,8 @@
 
 #define TDX1_TD_ATTRIBUTE_DEBUG BIT_ULL(0)
 #define TDX1_TD_ATTRIBUTE_PERFMON BIT_ULL(63)
+#define TDX1_MIN_TSC_FREQUENCY_KHZ (100 * 1000)
+#define TDX1_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000)
 
 bool kvm_has_tdx(KVMState *s)
 {
@@ -91,6 +93,19 @@ void tdx_pre_create_vcpu(CPUState *cpu)
         exit(1);
     }
 
+    if (env->tsc_khz && (env->tsc_khz < TDX1_MIN_TSC_FREQUENCY_KHZ ||
+                         env->tsc_khz > TDX1_MAX_TSC_FREQUENCY_KHZ)) {
+        error_report("Invalid TSC %" PRId64 " kHz, must specify a frequency in "
+                     "the range [%d, %d] kHz", env->tsc_khz,
+                     TDX1_MIN_TSC_FREQUENCY_KHZ, TDX1_MAX_TSC_FREQUENCY_KHZ);
+        exit(1);
+    }
+
+    if (env->tsc_khz % (25 * 1000)) {
+        error_report("Invalid TSC %" PRId64 " kHz, it must be a multiple of 25 MHz", env->tsc_khz);
+        exit(1);
+    }
+
     qemu_mutex_lock(&tdx->lock);
     if (tdx->initialized) {
         goto out;
@@ -103,6 +118,7 @@ void tdx_pre_create_vcpu(CPUState *cpu)
     cpuid_data.cpuid.padding = 0;
 
     init_vm.max_vcpus = ms->smp.cpus;
+    init_vm.tsc_khz = env->tsc_khz;
     init_vm.attributes = 0;
     init_vm.attributes |= tdx->debug ? TDX1_TD_ATTRIBUTE_DEBUG : 0;
     init_vm.attributes |= x86cpu->enable_pmu ? TDX1_TD_ATTRIBUTE_PERFMON : 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 12/44] target/i386/tdx: Finalize the TD's measurement when machine is done
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Xiaoyao Li <xiaoyao.li@intel.com>

Invoke KVM_TDX_FINALIZE_VM to finalize the TD's measurement and make
the TD vCPUs runnable once machine initialization is complete.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 target/i386/kvm/kvm.c |  7 +++++++
 target/i386/kvm/tdx.c | 21 +++++++++++++++++++++
 target/i386/kvm/tdx.h |  3 +++
 3 files changed, 31 insertions(+)
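
The deferral works by registering a notifier in tdx_kvm_init() and issuing the
finalize command only when the notifier fires. The sketch below shows the
shape of that pattern in standalone C. It assumes a simplified Notifier type
and a hand-rolled list; add_machine_done_late_notifier() and
run_machine_done_late_notifiers() are hypothetical stand-ins, not QEMU's real
notifier API, and the callback only counts calls instead of issuing an ioctl.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for QEMU's Notifier; not the real definition. */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *notifier, void *data);
    Notifier *next;
};

static Notifier *machine_done_late_list;  /* hypothetical global list */
static int finalize_calls;                /* observable effect for the demo */

/* Register a callback to run once machine initialization has finished. */
static void add_machine_done_late_notifier(Notifier *n)
{
    n->next = machine_done_late_list;
    machine_done_late_list = n;
}

/* Invoked by the (hypothetical) machine code at the very end of init. */
static void run_machine_done_late_notifiers(void)
{
    for (Notifier *n = machine_done_late_list; n; n = n->next) {
        n->notify(n, NULL);
    }
}

/* Stand-in for issuing KVM_TDX_FINALIZE_VM. */
static void tdx_finalize_vm(Notifier *notifier, void *unused)
{
    (void)notifier;
    (void)unused;
    finalize_calls++;
}

static Notifier tdx_machine_done_late_notify = {
    .notify = tdx_finalize_vm,
};
```

Registration alone has no effect; the measurement is finalized only when the
list is run, which is what keeps the finalize step after all other machine
setup.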

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index be0b96b120..5742fa4806 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -53,6 +53,7 @@
 #include "migration/blocker.h"
 #include "exec/memattrs.h"
 #include "trace.h"
+#include "tdx.h"
 
 //#define DEBUG_KVM
 
@@ -2246,6 +2247,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         return ret;
     }
 
+    ret = tdx_kvm_init(ms->cgs, &local_err);
+    if (ret < 0) {
+        error_report_err(local_err);
+        return ret;
+    }
+
     if (!kvm_check_extension(s, KVM_CAP_IRQ_ROUTING)) {
         error_report("kvm: KVM_CAP_IRQ_ROUTING not supported by KVM");
         return -ENOTSUP;
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index c50a0dcf11..f8c7560fc8 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -58,6 +58,27 @@ static void __tdx_ioctl(int ioctl_no, const char *ioctl_name,
 #define tdx_ioctl(ioctl_no, metadata, data) \
         __tdx_ioctl(ioctl_no, stringify(ioctl_no), metadata, data)
 
+static void tdx_finalize_vm(Notifier *notifier, void *unused)
+{
+    tdx_ioctl(KVM_TDX_FINALIZE_VM, 0, NULL);
+}
+
+static Notifier tdx_machine_done_late_notify = {
+    .notify = tdx_finalize_vm,
+};
+
+int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
+{
+    TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(cgs),
+                                                    TYPE_TDX_GUEST);
+    if (!tdx) {
+        return 0;
+    }
+
+    qemu_add_machine_init_done_late_notifier(&tdx_machine_done_late_notify);
+    return 0;
+}
+
 void tdx_pre_create_vcpu(CPUState *cpu)
 {
     struct {
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 6ad6c9a313..e15657d272 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -2,6 +2,7 @@
 #define QEMU_I386_TDX_H
 
 #include "qom/object.h"
+#include "qapi/error.h"
 #include "exec/confidential-guest-support.h"
 
 #define TYPE_TDX_GUEST "tdx-guest"
@@ -21,4 +22,6 @@ typedef struct TdxGuest {
     bool debug;
 } TdxGuest;
 
+int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
+
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 13/44] i386/tdx: Frame in tdx_get_supported_cpuid with KVM_TDX_CAPABILITIES
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

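Add support for retrieving KVM_TDX_CAPABILITIES and use the new
tdx_get_supported_cpuid() hook, called from kvm_arch_get_supported_cpuid(),
to adjust the supported XCR0 bits.  The hook also clears VMX and the KVM
paravirtual clock and async page fault feature bits for TD guests.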

Add TODOs for the remaining work.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 target/i386/kvm/kvm.c |  2 ++
 target/i386/kvm/tdx.c | 79 ++++++++++++++++++++++++++++++++++++++++---
 target/i386/kvm/tdx.h |  2 ++
 3 files changed, 78 insertions(+), 5 deletions(-)
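
The CPUID.0xD.0 adjustment applies two masks to each 32-bit half of XCR0.
The sketch below restates that arithmetic as one function. It assumes the
usual reading of the masks that the patch code implies: a clear bit in
xfam_fixed0 forces the feature off, and a set bit in xfam_fixed1 forces it
on. xcr0_apply_fixed() is a hypothetical helper name; XCR0_MASK covers the
same bits as in the patch (0-7 and 9).

```c
#include <assert.h>
#include <stdint.h>

/* XCR0-controlled state components: bits 0-7 and bit 9, as in the patch. */
#define XCR0_MASK (UINT64_C(0xff) | (UINT64_C(1) << 9))

/*
 * Hypothetical helper: constrain one 32-bit half of CPUID.0xD.0
 * (EAX holds bits 31:0, EDX holds bits 63:32) using the XFAM fixed masks.
 *   fixed0: a clear bit means the feature must be 0
 *   fixed1: a set bit means the feature must be 1
 */
static uint32_t xcr0_apply_fixed(uint32_t val, uint64_t fixed0,
                                 uint64_t fixed1, int high_half)
{
    unsigned shift = high_half ? 32 : 0;

    val &= (uint32_t)((fixed0 & XCR0_MASK) >> shift);
    val |= (uint32_t)((fixed1 & XCR0_MASK) >> shift);
    return val;
}
```

With fixed0 = 0xe7 (the MPX bits 3 and 4 cleared) and fixed1 = 0x3, a host
value of 0xff becomes 0xe7. Because XCR0_MASK has no bits above bit 9, the
EDX half always comes out as 0.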

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5742fa4806..25dcecd60c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -448,6 +448,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
         ret |= 1U << KVM_HINTS_REALTIME;
     }
 
+    tdx_get_supported_cpuid(s, function, index, reg, &ret);
+
     return ret;
 }
 
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index f8c7560fc8..b1e4f27c9a 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -21,6 +21,7 @@
 #include "hw/boards.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
+#include "standard-headers/asm-x86/kvm_para.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
 #include "sysemu/kvm_int.h"
@@ -49,7 +50,11 @@ static void __tdx_ioctl(int ioctl_no, const char *ioctl_name,
     tdx_cmd.metadata = metadata;
     tdx_cmd.data = (__u64)(unsigned long)data;
 
-    r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+    if (ioctl_no == KVM_TDX_CAPABILITIES) {
+        r = kvm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+    } else {
+        r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+    }
     if (r) {
         error_report("%s failed: %s", ioctl_name, strerror(-r));
         exit(1);
@@ -67,6 +72,18 @@ static Notifier tdx_machine_done_late_notify = {
     .notify = tdx_finalize_vm,
 };
 
+#define TDX1_MAX_NR_CPUID_CONFIGS 6
+
+static struct {
+    struct kvm_tdx_capabilities __caps;
+    struct kvm_tdx_cpuid_config __cpuid_configs[TDX1_MAX_NR_CPUID_CONFIGS];
+} __tdx_caps;
+
+static struct kvm_tdx_capabilities *tdx_caps = (void *)&__tdx_caps;
+
+#define XCR0_MASK (MAKE_64BIT_MASK(0, 8) | BIT_ULL(9))
+#define XSS_MASK (~XCR0_MASK)
+
 int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
     TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(cgs),
@@ -75,10 +92,65 @@ int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
         return 0;
     }
 
+    QEMU_BUILD_BUG_ON(sizeof(__tdx_caps) !=
+                      sizeof(struct kvm_tdx_capabilities) +
+                      sizeof(struct kvm_tdx_cpuid_config) *
+                      TDX1_MAX_NR_CPUID_CONFIGS);
+
+    tdx_caps->nr_cpuid_configs = TDX1_MAX_NR_CPUID_CONFIGS;
+    tdx_ioctl(KVM_TDX_CAPABILITIES, 0, tdx_caps);
+
     qemu_add_machine_init_done_late_notifier(&tdx_machine_done_late_notify);
+
     return 0;
 }
 
+void tdx_get_supported_cpuid(KVMState *s, uint32_t function,
+                             uint32_t index, int reg, uint32_t *ret)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
+                                                    TYPE_TDX_GUEST);
+
+    if (!tdx) {
+        return;
+    }
+
+    switch (function) {
+    case 1:
+        if (reg == R_ECX) {
+            *ret &= ~CPUID_EXT_VMX;
+        }
+        break;
+    case 0xd:
+        if (index == 0) {
+            if (reg == R_EAX) {
+                *ret &= (uint32_t)tdx_caps->xfam_fixed0 & XCR0_MASK;
+                *ret |= (uint32_t)tdx_caps->xfam_fixed1 & XCR0_MASK;
+            } else if (reg == R_EDX) {
+                *ret &= (tdx_caps->xfam_fixed0 & XCR0_MASK) >> 32;
+                *ret |= (tdx_caps->xfam_fixed1 & XCR0_MASK) >> 32;
+            }
+        } else if (index == 1) {
+            /* TODO: Adjust XSS when it's supported. */
+        }
+        break;
+    case KVM_CPUID_FEATURES:
+        if (reg == R_EAX) {
+            *ret &= ~((1ULL << KVM_FEATURE_CLOCKSOURCE) |
+                      (1ULL << KVM_FEATURE_CLOCKSOURCE2) |
+                      (1ULL << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+                      (1ULL << KVM_FEATURE_ASYNC_PF) |
+                      (1ULL << KVM_FEATURE_ASYNC_PF_VMEXIT) |
+                      (1ULL << KVM_FEATURE_ASYNC_PF_INT));
+        }
+        break;
+    default:
+        /* TODO: Use tdx_caps to adjust CPUID leaves. */
+        break;
+    }
+}
+
 void tdx_pre_create_vcpu(CPUState *cpu)
 {
     struct {
@@ -105,10 +177,7 @@ void tdx_pre_create_vcpu(CPUState *cpu)
         return;
     }
 
-    /* HACK: Remove MPX support, which is not allowed by TDX. */
-    env->features[FEAT_XSAVE_COMP_LO] &= ~(XSTATE_BNDREGS_MASK |
-                                           XSTATE_BNDCSR_MASK);
-
+    /* TODO: Use tdx_caps to validate the config. */
     if (!(env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE)) {
         error_report("TDX VM must support XSAVE features");
         exit(1);
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index e15657d272..844d24aade 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -23,5 +23,7 @@ typedef struct TdxGuest {
 } TdxGuest;
 
 int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
+void tdx_get_supported_cpuid(KVMState *s, uint32_t function,
+                             uint32_t index, int reg, uint32_t *ret);
 
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 14/44] i386/tdx: Frame in the call for KVM_TDX_INIT_VCPU
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Isaku Yamahata <isaku.yamahata@intel.com>

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 include/sysemu/tdx.h       |  1 +
 target/i386/kvm/kvm.c      |  8 ++++++++
 target/i386/kvm/tdx-stub.c |  4 ++++
 target/i386/kvm/tdx.c      | 20 ++++++++++++++++----
 4 files changed, 29 insertions(+), 4 deletions(-)
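
With this patch __tdx_ioctl() takes a void *state and picks the system, VM or
vCPU ioctl path based on the command number. The sketch below isolates just
that routing decision so it can be checked on its own. The enum values and
tdx_cmd_scope() are hypothetical names for illustration; the real command
numbers come from the KVM TDX uAPI headers in the companion kernel series.

```c
#include <assert.h>

/* Hypothetical command IDs; the real values come from the KVM TDX uAPI. */
enum tdx_cmd_id {
    TDX_CMD_CAPABILITIES,
    TDX_CMD_INIT_VM,
    TDX_CMD_INIT_VCPU,
    TDX_CMD_FINALIZE_VM,
};

/* Which KVM file descriptor a command has to be issued on. */
enum tdx_scope {
    TDX_SCOPE_SYSTEM,   /* /dev/kvm fd, via kvm_ioctl()      */
    TDX_SCOPE_VM,       /* VM fd,       via kvm_vm_ioctl()   */
    TDX_SCOPE_VCPU,     /* vCPU fd,     via kvm_vcpu_ioctl() */
};

/* Mirror of the if/else chain in __tdx_ioctl(): VM scope is the default. */
static enum tdx_scope tdx_cmd_scope(enum tdx_cmd_id id)
{
    switch (id) {
    case TDX_CMD_CAPABILITIES:
        return TDX_SCOPE_SYSTEM;
    case TDX_CMD_INIT_VCPU:
        return TDX_SCOPE_VCPU;
    default:
        return TDX_SCOPE_VM;
    }
}
```

Keeping the scope decision in one place makes it easier to see that the
void *state argument must be a CPUState only for the vCPU-scoped command and
a KVMState for everything else.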

diff --git a/include/sysemu/tdx.h b/include/sysemu/tdx.h
index 36a901e723..03461b6ae8 100644
--- a/include/sysemu/tdx.h
+++ b/include/sysemu/tdx.h
@@ -8,5 +8,6 @@ bool kvm_has_tdx(KVMState *s);
 #endif
 
 void tdx_pre_create_vcpu(CPUState *cpu);
+void tdx_post_init_vcpu(CPUState *cpu);
 
 #endif
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 25dcecd60c..af6b5f350e 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4122,6 +4122,14 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 
     assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
 
+    /*
+     * level == KVM_PUT_FULL_STATE is only set by
+     * kvm_cpu_synchronize_post_init() after initialization
+     */
+    if (vm_type == KVM_X86_TDX_VM && level == KVM_PUT_FULL_STATE) {
+        tdx_post_init_vcpu(cpu);
+    }
+
     /* TODO: Allow accessing guest state for debug TDs. */
     if (vm_type == KVM_X86_TDX_VM) {
         return 0;
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
index 93d5913c89..93afe07ddb 100644
--- a/target/i386/kvm/tdx-stub.c
+++ b/target/i386/kvm/tdx-stub.c
@@ -12,3 +12,7 @@ bool kvm_has_tdx(KVMState *s)
 void tdx_pre_create_vcpu(CPUState *cpu)
 {
 }
+
+void tdx_post_init_vcpu(CPUState *cpu)
+{
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index b1e4f27c9a..67fb03b4b5 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -38,7 +38,7 @@ bool kvm_has_tdx(KVMState *s)
     return !!(kvm_check_extension(s, KVM_CAP_VM_TYPES) & BIT(KVM_X86_TDX_VM));
 }
 
-static void __tdx_ioctl(int ioctl_no, const char *ioctl_name,
+static void __tdx_ioctl(void *state, int ioctl_no, const char *ioctl_name,
                         __u32 metadata, void *data)
 {
     struct kvm_tdx_cmd tdx_cmd;
@@ -51,17 +51,21 @@ static void __tdx_ioctl(int ioctl_no, const char *ioctl_name,
     tdx_cmd.data = (__u64)(unsigned long)data;
 
     if (ioctl_no == KVM_TDX_CAPABILITIES) {
-        r = kvm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+        r = kvm_ioctl(state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+    } else if (ioctl_no == KVM_TDX_INIT_VCPU) {
+        r = kvm_vcpu_ioctl(state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
     } else {
-        r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+        r = kvm_vm_ioctl(state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
     }
     if (r) {
         error_report("%s failed: %s", ioctl_name, strerror(-r));
         exit(1);
     }
 }
+#define _tdx_ioctl(cpu, ioctl_no, metadata, data) \
+        __tdx_ioctl(cpu, ioctl_no, stringify(ioctl_no), metadata, data)
 #define tdx_ioctl(ioctl_no, metadata, data) \
-        __tdx_ioctl(ioctl_no, stringify(ioctl_no), metadata, data)
+        _tdx_ioctl(kvm_state, ioctl_no, metadata, data)
 
 static void tdx_finalize_vm(Notifier *notifier, void *unused)
 {
@@ -219,6 +223,14 @@ out:
     qemu_mutex_unlock(&tdx->lock);
 }
 
+void tdx_post_init_vcpu(CPUState *cpu)
+{
+    CPUX86State *env = &X86_CPU(cpu)->env;
+
+    _tdx_ioctl(cpu, KVM_TDX_INIT_VCPU, 0,
+               (void *)(unsigned long)env->regs[R_ECX]);
+}
+
 static bool tdx_guest_get_debug(Object *obj, Error **errp)
 {
     TdxGuest *tdx = TDX_GUEST(obj);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread
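
The dispatch added to __tdx_ioctl() in patch 14 picks the ioctl wrapper from
the command number: KVM_TDX_CAPABILITIES goes through kvm_ioctl(),
KVM_TDX_INIT_VCPU through kvm_vcpu_ioctl(), and every other command through
kvm_vm_ioctl().  A minimal standalone sketch of that selection follows.  The
enum values and the helpers tdx_cmd_scope() and scope_wrapper_name() are
hypothetical and exist only for illustration; the real KVM_TDX_* numbers come
from the KVM TDX uapi header and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command identifiers, NOT the real KVM_TDX_* values. */
enum tdx_cmd_id {
    TDX_CMD_CAPABILITIES,
    TDX_CMD_INIT_VM,
    TDX_CMD_INIT_VCPU,
    TDX_CMD_FINALIZE_VM,
};

/* The three scopes a KVM ioctl can target. */
enum ioctl_scope {
    SCOPE_SYSTEM,   /* issued via kvm_ioctl() in QEMU */
    SCOPE_VM,       /* issued via kvm_vm_ioctl() */
    SCOPE_VCPU,     /* issued via kvm_vcpu_ioctl() */
};

/* Mirror of the if/else chain in the patch: two special cases, VM otherwise. */
static enum ioctl_scope tdx_cmd_scope(enum tdx_cmd_id cmd)
{
    switch (cmd) {
    case TDX_CMD_CAPABILITIES:
        return SCOPE_SYSTEM;
    case TDX_CMD_INIT_VCPU:
        return SCOPE_VCPU;
    default:
        return SCOPE_VM;
    }
}

/* Name of the QEMU wrapper that corresponds to each scope. */
static const char *scope_wrapper_name(enum ioctl_scope scope)
{
    static const char *const names[] = {
        "kvm_ioctl",
        "kvm_vm_ioctl",
        "kvm_vcpu_ioctl",
    };

    assert((size_t)scope < sizeof(names) / sizeof(names[0]));
    return names[scope];
}
```

Because the patch passes the target as a void *, the caller has to pair each
command with the matching object (a KVMState for system and VM scope, a
CPUState for vCPU scope).  A mapping like the one above may make that pairing
easier to audit.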

* [RFC PATCH v2 15/44] i386/tdx: Add hook to require generic device loader
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a hook for TDX to denote that the TD Virtual Firmware must be
provided via the "generic" device loader.  Error out if pflash is used
in conjunction with TDX.

Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/i386/pc_sysfw.c         |  6 ++++++
 include/sysemu/tdx.h       |  2 ++
 target/i386/kvm/tdx-stub.c |  5 +++++
 target/i386/kvm/tdx.c      | 25 +++++++++++++++++++++++++
 4 files changed, 38 insertions(+)

diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index 6ce37a2b05..5ff571af36 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -38,6 +38,7 @@
 #include "hw/block/flash.h"
 #include "sysemu/kvm.h"
 #include "sysemu/sev.h"
+#include "sysemu/tdx.h"
 
 #define FLASH_SECTOR_SIZE 4096
 
@@ -328,6 +329,11 @@ void pc_system_firmware_init(PCMachineState *pcms,
     int i;
     BlockBackend *pflash_blk[ARRAY_SIZE(pcms->flash)];
 
+    if (!tdx_system_firmware_init(pcms, rom_memory)) {
+        pc_system_flash_cleanup_unused(pcms);
+        return;
+    }
+
     if (!pcmc->pci_enabled) {
         x86_bios_rom_init(MACHINE(pcms), "bios.bin", rom_memory, true);
         return;
diff --git a/include/sysemu/tdx.h b/include/sysemu/tdx.h
index 03461b6ae8..70eb01348f 100644
--- a/include/sysemu/tdx.h
+++ b/include/sysemu/tdx.h
@@ -3,8 +3,10 @@
 
 #ifndef CONFIG_USER_ONLY
 #include "sysemu/kvm.h"
+#include "hw/i386/pc.h"
 
 bool kvm_has_tdx(KVMState *s);
+int tdx_system_firmware_init(PCMachineState *pcms, MemoryRegion *rom_memory);
 #endif
 
 void tdx_pre_create_vcpu(CPUState *cpu);
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
index 93afe07ddb..4e1a0a4280 100644
--- a/target/i386/kvm/tdx-stub.c
+++ b/target/i386/kvm/tdx-stub.c
@@ -7,6 +7,11 @@ bool kvm_has_tdx(KVMState *s)
 {
         return false;
 }
+
+int tdx_system_firmware_init(PCMachineState *pcms, MemoryRegion *rom_memory)
+{
+    return -ENOSYS;
+}
 #endif
 
 void tdx_pre_create_vcpu(CPUState *cpu)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 67fb03b4b5..48c04d344d 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -109,6 +109,31 @@ int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
     return 0;
 }
 
+int tdx_system_firmware_init(PCMachineState *pcms, MemoryRegion *rom_memory)
+{
+    MachineState *ms = MACHINE(pcms);
+    TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
+                                                    TYPE_TDX_GUEST);
+    int i;
+
+    if (!tdx) {
+        return -ENOSYS;
+    }
+
+    /*
+     * Sanity check for TDX:
+     * TDX uses generic loader to load bios instead of pflash.
+     */
+    for (i = 0; i < ARRAY_SIZE(pcms->flash); i++) {
+        if (drive_get(IF_PFLASH, 0, i)) {
+            error_report("pflash not supported by VM type, "
+                         "use -device loader,file=<path>");
+            exit(1);
+        }
+    }
+    return 0;
+}
+
 void tdx_get_supported_cpuid(KVMState *s, uint32_t function,
                              uint32_t index, int reg, uint32_t *ret)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread
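
The hook in patch 15 relies on a return convention that is easy to misread:
tdx_system_firmware_init() returns 0 when the guest is a TD, so that
pc_system_firmware_init() skips the pflash/BIOS setup, and -ENOSYS when it is
not, so that the caller falls through to the existing path.  A rough sketch of
that control flow, under stated assumptions: struct fw_config,
tdx_firmware_check() and choose_fw_path() are hypothetical names, and where the
real code calls exit(1) on a pflash drive the sketch returns -EINVAL instead so
that it stays testable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant machine configuration. */
struct fw_config {
    bool is_tdx_guest;      /* a tdx-guest object backs the machine */
    bool has_pflash_drive;  /* some -drive if=pflash was given */
};

enum fw_path {
    FW_PATH_TDX_LOADER,     /* firmware comes from the generic loader */
    FW_PATH_LEGACY,         /* existing pflash / bios.bin handling */
    FW_PATH_REJECTED,       /* configuration refused */
};

/*
 * Same convention as the hook: 0 means "TDX took over firmware setup",
 * -ENOSYS means "not a TD, nothing done".  -EINVAL replaces exit(1).
 */
static int tdx_firmware_check(const struct fw_config *cfg)
{
    if (!cfg->is_tdx_guest) {
        return -ENOSYS;
    }
    if (cfg->has_pflash_drive) {
        return -EINVAL;
    }
    return 0;
}

/* Caller side: only a zero return short-circuits the legacy path. */
static enum fw_path choose_fw_path(const struct fw_config *cfg)
{
    int r;

    assert(cfg != NULL);
    r = tdx_firmware_check(cfg);
    if (r == 0) {
        return FW_PATH_TDX_LOADER;
    }
    if (r == -ENOSYS) {
        return FW_PATH_LEGACY;
    }
    return FW_PATH_REJECTED;
}
```

Note that a non-TD machine with pflash still takes the legacy path; the pflash
check only applies once the guest is known to be a TD.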

* [RFC PATCH v2 16/44] hw/i386: Add definitions from UEFI spec for volumes, resources, etc...
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add definitions for literals, enums, structs, GUIDs, etc. that will be
used by TDX to build the UEFI Hand-Off Block (HOB) list that is passed to
the TDX Virtual Firmware (TDVF).  All values come from the UEFI and UEFI
Platform Initialization specifications and the TDVF design guide. [1]

Note: EFI_RESOURCE_ATTRIBUTE_{ENCRYPTED, UNACCEPTED} are expected to be
added in a future UEFI spec.  Until then, UNACCEPTED is defined as a
placeholder value of 0.

[1] https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.pdf

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/i386/uefi.h | 496 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 496 insertions(+)
 create mode 100644 hw/i386/uefi.h

diff --git a/hw/i386/uefi.h b/hw/i386/uefi.h
new file mode 100644
index 0000000000..72bfc2f6a9
--- /dev/null
+++ b/hw/i386/uefi.h
@@ -0,0 +1,496 @@
+/*
+ * Copyright (C) 2020 Intel Corporation
+ *
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ *                        <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#ifndef HW_I386_UEFI_H
+#define HW_I386_UEFI_H
+
+/***************************************************************************/
+/*
+ * basic EFI definitions
+ * supplemented with UEFI Specification Version 2.8 (Errata A)
+ * released February 2020
+ */
+/* UEFI integers are little-endian */
+
+typedef struct {
+    uint32_t Data1;
+    uint16_t Data2;
+    uint16_t Data3;
+    uint8_t Data4[8];
+} EFI_GUID;
+
+typedef uint64_t EFI_PHYSICAL_ADDRESS;
+typedef uint32_t EFI_BOOT_MODE;
+
+typedef enum {
+    EfiReservedMemoryType,
+    EfiLoaderCode,
+    EfiLoaderData,
+    EfiBootServicesCode,
+    EfiBootServicesData,
+    EfiRuntimeServicesCode,
+    EfiRuntimeServicesData,
+    EfiConventionalMemory,
+    EfiUnusableMemory,
+    EfiACPIReclaimMemory,
+    EfiACPIMemoryNVS,
+    EfiMemoryMappedIO,
+    EfiMemoryMappedIOPortSpace,
+    EfiPalCode,
+    EfiPersistentMemory,
+    EfiMaxMemoryType
+} EFI_MEMORY_TYPE;
+
+
+/*
+ * data structures for firmware volumes/files
+ * based on
+ * UEFI Platform Initialization Specification Version 1.7. vol 3, 3.2.1
+ */
+
+#define SIGNATURE_16(A, B)        ((A) | ((B) << 8))
+#define SIGNATURE_32(A, B, C, D)  ((A) | ((B) << 8) | ((C) << 16) | ((D) << 24))
+#define SIGNATURE_64(A, B, C, D, E, F, G, H)                            \
+    (SIGNATURE_32(A, B, C, D) | ((uint64_t)(SIGNATURE_32(E, F, G, H)) << 32))
+
+/***************************************************************************/
+/* Firmware Volume format */
+
+typedef uint32_t EFI_FV_FILE_ATTRIBUTES;
+
+
+#define EFI_FV_FILE_ATTRIB_ALIGNMENT     0x0000001F
+#define EFI_FV_FILE_ATTRIB_FIXED         0x00000100
+#define EFI_FV_FILE_ATTRIB_MEMORY_MAPPED 0x00000200
+
+typedef uint32_t EFI_FVB_ATTRIBUTES_2;
+
+
+#define EFI_FVB2_READ_DISABLED_CAP  0x00000001
+#define EFI_FVB2_READ_ENABLED_CAP   0x00000002
+#define EFI_FVB2_READ_STATUS        0x00000004
+#define EFI_FVB2_WRITE_DISABLED_CAP 0x00000008
+#define EFI_FVB2_WRITE_ENABLED_CAP  0x00000010
+#define EFI_FVB2_WRITE_STATUS       0x00000020
+#define EFI_FVB2_LOCK_CAP           0x00000040
+#define EFI_FVB2_LOCK_STATUS        0x00000080
+#define EFI_FVB2_STICKY_WRITE       0x00000200
+#define EFI_FVB2_MEMORY_MAPPED      0x00000400
+#define EFI_FVB2_ERASE_POLARITY     0x00000800
+#define EFI_FVB2_READ_LOCK_CAP      0x00001000
+#define EFI_FVB2_READ_LOCK_STATUS   0x00002000
+#define EFI_FVB2_WRITE_LOCK_CAP     0x00004000
+#define EFI_FVB2_WRITE_LOCK_STATUS  0x00008000
+#define EFI_FVB2_ALIGNMENT          0x001F0000
+#define EFI_FVB2_WEAK_ALIGNMENT     0x80000000
+#define EFI_FVB2_ALIGNMENT_1        0x00000000
+#define EFI_FVB2_ALIGNMENT_2        0x00010000
+#define EFI_FVB2_ALIGNMENT_4        0x00020000
+#define EFI_FVB2_ALIGNMENT_8        0x00030000
+#define EFI_FVB2_ALIGNMENT_16       0x00040000
+#define EFI_FVB2_ALIGNMENT_32       0x00050000
+#define EFI_FVB2_ALIGNMENT_64       0x00060000
+#define EFI_FVB2_ALIGNMENT_128      0x00070000
+#define EFI_FVB2_ALIGNMENT_256      0x00080000
+#define EFI_FVB2_ALIGNMENT_512      0x00090000
+#define EFI_FVB2_ALIGNMENT_1K       0x000A0000
+#define EFI_FVB2_ALIGNMENT_2K       0x000B0000
+#define EFI_FVB2_ALIGNMENT_4K       0x000C0000
+#define EFI_FVB2_ALIGNMENT_8K       0x000D0000
+#define EFI_FVB2_ALIGNMENT_16K      0x000E0000
+#define EFI_FVB2_ALIGNMENT_32K      0x000F0000
+#define EFI_FVB2_ALIGNMENT_64K      0x00100000
+#define EFI_FVB2_ALIGNMENT_128K     0x00110000
+#define EFI_FVB2_ALIGNMENT_256K     0x00120000
+#define EFI_FVB2_ALIGNMENT_512K     0x00130000
+#define EFI_FVB2_ALIGNMENT_1M       0x00140000
+#define EFI_FVB2_ALIGNMENT_2M       0x00150000
+#define EFI_FVB2_ALIGNMENT_4M       0x00160000
+#define EFI_FVB2_ALIGNMENT_8M       0x00170000
+#define EFI_FVB2_ALIGNMENT_16M      0x00180000
+#define EFI_FVB2_ALIGNMENT_32M      0x00190000
+#define EFI_FVB2_ALIGNMENT_64M      0x001A0000
+#define EFI_FVB2_ALIGNMENT_128M     0x001B0000
+#define EFI_FVB2_ALIGNMENT_256M     0x001C0000
+#define EFI_FVB2_ALIGNMENT_512M     0x001D0000
+#define EFI_FVB2_ALIGNMENT_1G       0x001E0000
+#define EFI_FVB2_ALIGNMENT_2G       0x001F0000
+
+typedef struct {
+    uint32_t NumBlocks;
+    uint32_t Length;
+} EFI_FV_BLOCK_MAP_ENTRY;
+
+typedef struct {
+    uint8_t ZeroVector[16];
+    EFI_GUID FileSystemGuid;
+    uint64_t FvLength;
+    uint32_t Signature;
+    EFI_FVB_ATTRIBUTES_2 Attributes;
+    uint16_t HeaderLength;
+    uint16_t Checksum;
+    uint16_t ExtHeaderOffset;
+    uint8_t Reserved[1];
+    uint8_t Revision;
+    EFI_FV_BLOCK_MAP_ENTRY BlockMap[1];
+} EFI_FIRMWARE_VOLUME_HEADER;
+
+#define EFI_FVH_SIGNATURE SIGNATURE_32('_', 'F', 'V', 'H')
+
+#define EFI_FVH_REVISION 0x02
+
+typedef struct {
+    EFI_GUID FvName;
+    uint32_t ExtHeaderSize;
+} EFI_FIRMWARE_VOLUME_EXT_HEADER;
+
+typedef struct {
+    uint16_t ExtEntrySize;
+    uint16_t ExtEntryType;
+} EFI_FIRMWARE_VOLUME_EXT_ENTRY;
+
+#define EFI_FV_EXT_TYPE_OEM_TYPE 0x01
+typedef struct {
+    EFI_FIRMWARE_VOLUME_EXT_ENTRY Hdr;
+    uint32_t TypeMask;
+
+    EFI_GUID Types[];
+} EFI_FIRMWARE_VOLUME_EXT_ENTRY_OEM_TYPE;
+
+#define EFI_FV_EXT_TYPE_GUID_TYPE 0x0002
+typedef struct {
+    EFI_FIRMWARE_VOLUME_EXT_ENTRY Hdr;
+    EFI_GUID FormatType;
+
+    uint8_t Data[];
+} EFI_FIRMWARE_VOLUME_EXT_ENTRY_GUID_TYPE;
+
+#define EFI_FV_EXT_TYPE_USED_SIZE_TYPE 0x03
+typedef struct {
+    EFI_FIRMWARE_VOLUME_EXT_ENTRY Hdr;
+    uint32_t UsedSize;
+} EFI_FIRMWARE_VOLUME_EXT_ENTRY_USED_SIZE_TYPE;
+
+/***************************************************************************/
+/* Firmware File */
+
+#pragma pack(push, 1)
+
+typedef union {
+    struct {
+        uint8_t Header;
+        uint8_t File;
+    } Checksum;
+    uint16_t Checksum16;
+} EFI_FFS_INTEGRITY_CHECK;
+
+typedef uint8_t EFI_FV_FILETYPE;
+typedef uint8_t EFI_FFS_FILE_ATTRIBUTES;
+typedef uint8_t EFI_FFS_FILE_STATE;
+
+
+#define EFI_FV_FILETYPE_ALL                   0x00
+#define EFI_FV_FILETYPE_RAW                   0x01
+#define EFI_FV_FILETYPE_FREEFORM              0x02
+#define EFI_FV_FILETYPE_SECURITY_CORE         0x03
+#define EFI_FV_FILETYPE_PEI_CORE              0x04
+#define EFI_FV_FILETYPE_DXE_CORE              0x05
+#define EFI_FV_FILETYPE_PEIM                  0x06
+#define EFI_FV_FILETYPE_DRIVER                0x07
+#define EFI_FV_FILETYPE_COMBINED_PEIM_DRIVER  0x08
+#define EFI_FV_FILETYPE_APPLICATION           0x09
+#define EFI_FV_FILETYPE_SMM                   0x0A
+#define EFI_FV_FILETYPE_FIRMWARE_VOLUME_IMAGE 0x0B
+#define EFI_FV_FILETYPE_COMBINED_SMM_DXE      0x0C
+#define EFI_FV_FILETYPE_SMM_CORE              0x0D
+#define EFI_FV_FILETYPE_MM_STANDALONE         0x0E
+#define EFI_FV_FILETYPE_MM_CORE_STANDALONE    0x0F
+#define EFI_FV_FILETYPE_OEM_MIN               0xc0
+#define EFI_FV_FILETYPE_OEM_MAX               0xdf
+#define EFI_FV_FILETYPE_DEBUG_MIN             0xe0
+#define EFI_FV_FILETYPE_DEBUG_MAX             0xef
+#define EFI_FV_FILETYPE_FFS_MIN               0xf0
+#define EFI_FV_FILETYPE_FFS_MAX               0xff
+#define EFI_FV_FILETYPE_FFS_PAD               0xf0
+
+
+#define FFS_ATTRIB_LARGE_FILE         0x01
+#define FFS_ATTRIB_DATA_ALIGNMENT2    0x02
+#define FFS_ATTRIB_FIXED              0x04
+#define FFS_ATTRIB_DATA_ALIGNMENT     0x38
+#define FFS_ATTRIB_CHECKSUM           0x40
+
+
+#define EFI_FILE_HEADER_CONSTRUCTION  0x01
+#define EFI_FILE_HEADER_VALID         0x02
+#define EFI_FILE_DATA_VALID           0x04
+#define EFI_FILE_MARKED_FOR_UPDATE    0x08
+#define EFI_FILE_DELETED              0x10
+#define EFI_FILE_HEADER_INVALID       0x20
+
+
+#define EFI_FILE_ALL_STATE_BITS                 \
+    (EFI_FILE_HEADER_CONSTRUCTION |             \
+     EFI_FILE_HEADER_VALID |                    \
+     EFI_FILE_DATA_VALID |                      \
+     EFI_FILE_MARKED_FOR_UPDATE |               \
+     EFI_FILE_DELETED |                         \
+     EFI_FILE_HEADER_INVALID)
+
+
+typedef struct {
+    EFI_GUID Name;
+    EFI_FFS_INTEGRITY_CHECK IntegrityCheck;
+    EFI_FV_FILETYPE Type;
+    EFI_FFS_FILE_ATTRIBUTES Attributes;
+    uint8_t Size[3];
+    EFI_FFS_FILE_STATE State;
+} EFI_FFS_FILE_HEADER;
+
+
+typedef struct {
+    EFI_GUID Name;
+    EFI_FFS_INTEGRITY_CHECK IntegrityCheck;
+    EFI_FV_FILETYPE Type;
+    EFI_FFS_FILE_ATTRIBUTES Attributes;
+    uint8_t Size[3];
+    EFI_FFS_FILE_STATE State;
+    uint64_t ExtendedSize;
+} EFI_FFS_FILE_HEADER2;
+
+#define MAX_FFS_SIZE 0x1000000
+
+#pragma pack(pop)
+
+
+/***************************************************************************/
+/* GUIDs */
+#define EFI_FIRMWARE_FILE_SYSTEM2_GUID                          \
+    ((EFI_GUID){ 0x8c8ce578, 0x8a3d, 0x4f1c,                    \
+        { 0x99, 0x35, 0x89, 0x61, 0x85, 0xc3, 0x2d, 0xd3 } })
+
+#define EFI_FIRMWARE_FILE_SYSTEM3_GUID                          \
+    ((EFI_GUID){ 0x5473c07a, 0x3dcb, 0x4dca,                    \
+        { 0xbd, 0x6f, 0x1e, 0x96, 0x89, 0xe7, 0x34, 0x9a } })
+
+#define EFI_SYSTEM_NV_DATA_FV_GUID                              \
+    ((EFI_GUID){ 0xfff12b8d, 0x7696, 0x4c8b,                    \
+        { 0xa9, 0x85, 0x27, 0x47, 0x7, 0x5b, 0x4f, 0x50 } })
+
+#define EFI_FFS_VOLUME_TOP_FILE_GUID                            \
+    ((EFI_GUID){ 0x1BA0062E, 0xC779, 0x4582,                    \
+        { 0x85, 0x66, 0x33, 0x6A, 0xE8, 0xF7, 0x8F, 0x09 } })
+
+/*
+ * data structures for HOBs (Hand-Off Blocks)
+ * based on
+ * UEFI Platform Initialization Specification Version 1.7. vol 3, chap 4 and 5
+ */
+
+#define EFI_HOB_TYPE_HANDOFF              0x0001
+#define EFI_HOB_TYPE_MEMORY_ALLOCATION    0x0002
+#define EFI_HOB_TYPE_RESOURCE_DESCRIPTOR  0x0003
+#define EFI_HOB_TYPE_GUID_EXTENSION       0x0004
+#define EFI_HOB_TYPE_FV                   0x0005
+#define EFI_HOB_TYPE_CPU                  0x0006
+#define EFI_HOB_TYPE_MEMORY_POOL          0x0007
+#define EFI_HOB_TYPE_FV2                  0x0009
+#define EFI_HOB_TYPE_LOAD_PEIM_UNUSED     0x000A
+#define EFI_HOB_TYPE_UEFI_CAPSULE         0x000B
+#define EFI_HOB_TYPE_FV3                  0x000C
+#define EFI_HOB_TYPE_UNUSED               0xFFFE
+#define EFI_HOB_TYPE_END_OF_HOB_LIST      0xFFFF
+
+typedef struct {
+    uint16_t HobType;
+    uint16_t HobLength;
+    uint32_t Reserved;
+} EFI_HOB_GENERIC_HEADER;
+
+
+#define EFI_HOB_HANDOFF_TABLE_VERSION 0x0009
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    uint32_t Version;
+    EFI_BOOT_MODE BootMode;
+    EFI_PHYSICAL_ADDRESS EfiMemoryTop;
+    EFI_PHYSICAL_ADDRESS EfiMemoryBottom;
+    EFI_PHYSICAL_ADDRESS EfiFreeMemoryTop;
+    EFI_PHYSICAL_ADDRESS EfiFreeMemoryBottom;
+    EFI_PHYSICAL_ADDRESS EfiEndOfHobList;
+} EFI_HOB_HANDOFF_INFO_TABLE;
+
+typedef struct {
+    EFI_GUID Name;
+    EFI_PHYSICAL_ADDRESS MemoryBaseAddress;
+    uint64_t MemoryLength;
+    EFI_MEMORY_TYPE MemoryType;
+    uint8_t Reserved[4];
+} EFI_HOB_MEMORY_ALLOCATION_HEADER;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_HOB_MEMORY_ALLOCATION_HEADER AllocDescriptor;
+} EFI_HOB_MEMORY_ALLOCATION;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_HOB_MEMORY_ALLOCATION_HEADER AllocDescriptor;
+} EFI_HOB_MEMORY_ALLOCATION_STACK;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_HOB_MEMORY_ALLOCATION_HEADER AllocDescriptor;
+} EFI_HOB_MEMORY_ALLOCATION_BSP_STORE;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_HOB_MEMORY_ALLOCATION_HEADER MemoryAllocationHeader;
+    EFI_GUID ModuleName;
+    EFI_PHYSICAL_ADDRESS EntryPoint;
+} EFI_HOB_MEMORY_ALLOCATION_MODULE;
+
+#define EFI_HOB_MEMORY_ALLOC_STACK_GUID                         \
+    ((EFI_GUID){ 0x4ed4bf27, 0x4092, 0x42e9,                    \
+        { 0x80, 0x7d, 0x52, 0x7b, 0x1d, 0x0, 0xc9, 0xbd } })
+
+#define EFI_HOB_MEMORY_ALLOC_BSP_STORE_GUID                     \
+    ((EFI_GUID){ 0x564b33cd, 0xc92a, 0x4593,                    \
+        { 0x90, 0xbf, 0x24, 0x73, 0xe4, 0x3c, 0x63, 0x22 } })
+
+#define EFI_HOB_MEMORY_ALLOC_MODULE_GUID                        \
+    ((EFI_GUID){ 0xf8e21975, 0x899, 0x4f58,                     \
+        { 0xa4, 0xbe, 0x55, 0x25, 0xa9, 0xc6, 0xd7, 0x7a } })
+
+
+typedef uint32_t EFI_RESOURCE_TYPE;
+
+#define EFI_RESOURCE_SYSTEM_MEMORY          0x00000000
+#define EFI_RESOURCE_MEMORY_MAPPED_IO       0x00000001
+#define EFI_RESOURCE_IO                     0x00000002
+#define EFI_RESOURCE_FIRMWARE_DEVICE        0x00000003
+#define EFI_RESOURCE_MEMORY_MAPPED_IO_PORT  0x00000004
+#define EFI_RESOURCE_MEMORY_RESERVED        0x00000005
+#define EFI_RESOURCE_IO_RESERVED            0x00000006
+#define EFI_RESOURCE_MAX_MEMORY_TYPE        0x00000007
+
+typedef uint32_t EFI_RESOURCE_ATTRIBUTE_TYPE;
+
+#define EFI_RESOURCE_ATTRIBUTE_PRESENT                  0x00000001
+#define EFI_RESOURCE_ATTRIBUTE_INITIALIZED              0x00000002
+#define EFI_RESOURCE_ATTRIBUTE_TESTED                   0x00000004
+#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTED           0x00000080
+
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTED          0x00000100
+#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTED      0x00000200
+#define EFI_RESOURCE_ATTRIBUTE_PERSISTENT               0x00800000
+
+#define EFI_RESOURCE_ATTRIBUTE_SINGLE_BIT_ECC           0x00000008
+#define EFI_RESOURCE_ATTRIBUTE_MULTIPLE_BIT_ECC         0x00000010
+#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_1           0x00000020
+#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_2           0x00000040
+#define EFI_RESOURCE_ATTRIBUTE_UNCACHEABLE              0x00000400
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_COMBINEABLE        0x00000800
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_THROUGH_CACHEABLE  0x00001000
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_BACK_CACHEABLE     0x00002000
+#define EFI_RESOURCE_ATTRIBUTE_16_BIT_IO                0x00004000
+#define EFI_RESOURCE_ATTRIBUTE_32_BIT_IO                0x00008000
+#define EFI_RESOURCE_ATTRIBUTE_64_BIT_IO                0x00010000
+#define EFI_RESOURCE_ATTRIBUTE_UNCACHED_EXPORTED        0x00020000
+#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTABLE         0x00100000
+
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTABLE        0x00200000
+#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTABLE    0x00400000
+#define EFI_RESOURCE_ATTRIBUTE_PERSISTABLE              0x01000000
+
+#define EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTED      0x00040000
+#define EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTABLE    0x00080000
+
+#define EFI_RESOURCE_ATTRIBUTE_MORE_RELIABLE            0x02000000
+#define EFI_RESOURCE_ATTRIBUTE_ENCRYPTED                0x04000000
+
+/* FIXME: placeholder for now */
+#define EFI_RESOURCE_ATTRIBUTE_UNACCEPTED               0x00000000
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_GUID Owner;
+    EFI_RESOURCE_TYPE ResourceType;
+    EFI_RESOURCE_ATTRIBUTE_TYPE ResourceAttribute;
+    EFI_PHYSICAL_ADDRESS PhysicalStart;
+    uint64_t ResourceLength;
+} EFI_HOB_RESOURCE_DESCRIPTOR;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_GUID Name;
+
+    /* guid specific data follows */
+} EFI_HOB_GUID_TYPE;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+} EFI_HOB_FIRMWARE_VOLUME;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+    EFI_GUID FvName;
+    EFI_GUID FileName;
+} EFI_HOB_FIRMWARE_VOLUME2;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+    uint32_t AuthenticationStatus;
+    bool ExtractedFv;
+    EFI_GUID FvName;
+    EFI_GUID FileName;
+} EFI_HOB_FIRMWARE_VOLUME3;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    uint8_t SizeOfMemorySpace;
+    uint8_t SizeOfIoSpace;
+    uint8_t Reserved[6];
+} EFI_HOB_CPU;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+} EFI_HOB_MEMORY_POOL;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+} EFI_HOB_UEFI_CAPSULE;
+
+#define EFI_HOB_OWNER_ZERO                                      \
+    ((EFI_GUID){ 0x00000000, 0x0000, 0x0000,                    \
+        { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 } })
+
+#endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread
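
To show how the HOB definitions in patch 16 fit together, here is a minimal
sketch that lays out a three-entry HOB list (handoff table, one system-memory
resource descriptor, end-of-list marker) in a host buffer and then walks it.
It is not the code from this series.  build_hob_list(), count_hobs() and the
hob_gpa parameter are hypothetical; the structs are trimmed copies of the ones
in hw/i386/uefi.h; a little-endian host is assumed; boot mode 0 is assumed to
mean a full-configuration boot; and pointing EfiEndOfHobList at the end-of-list
header is an assumption about how the firmware reads that field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EFI_HOB_TYPE_HANDOFF              0x0001
#define EFI_HOB_TYPE_RESOURCE_DESCRIPTOR  0x0003
#define EFI_HOB_TYPE_END_OF_HOB_LIST      0xFFFF
#define EFI_HOB_HANDOFF_TABLE_VERSION     0x0009
#define EFI_RESOURCE_SYSTEM_MEMORY        0x00000000
#define EFI_RESOURCE_ATTRIBUTE_PRESENT     0x00000001
#define EFI_RESOURCE_ATTRIBUTE_INITIALIZED 0x00000002
#define EFI_RESOURCE_ATTRIBUTE_TESTED      0x00000004

typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t Data4[8];
} EFI_GUID;

typedef struct {
    uint16_t HobType;
    uint16_t HobLength;
    uint32_t Reserved;
} EFI_HOB_GENERIC_HEADER;

typedef struct {
    EFI_HOB_GENERIC_HEADER Header;
    uint32_t Version;
    uint32_t BootMode;
    uint64_t EfiMemoryTop;
    uint64_t EfiMemoryBottom;
    uint64_t EfiFreeMemoryTop;
    uint64_t EfiFreeMemoryBottom;
    uint64_t EfiEndOfHobList;
} EFI_HOB_HANDOFF_INFO_TABLE;

typedef struct {
    EFI_HOB_GENERIC_HEADER Header;
    EFI_GUID Owner;
    uint32_t ResourceType;
    uint32_t ResourceAttribute;
    uint64_t PhysicalStart;
    uint64_t ResourceLength;
} EFI_HOB_RESOURCE_DESCRIPTOR;

/* Lay out the list in buf; hob_gpa is where buf will live in the guest. */
static size_t build_hob_list(uint8_t *buf, size_t cap, uint64_t hob_gpa,
                             uint64_t ram_base, uint64_t ram_size)
{
    EFI_HOB_HANDOFF_INFO_TABLE hit;
    EFI_HOB_RESOURCE_DESCRIPTOR res;
    EFI_HOB_GENERIC_HEADER end;
    size_t off = 0;

    assert(cap >= sizeof(hit) + sizeof(res) + sizeof(end));

    memset(&hit, 0, sizeof(hit));
    hit.Header.HobType = EFI_HOB_TYPE_HANDOFF;
    hit.Header.HobLength = sizeof(hit);
    hit.Version = EFI_HOB_HANDOFF_TABLE_VERSION;
    hit.BootMode = 0;
    hit.EfiEndOfHobList = hob_gpa + sizeof(hit) + sizeof(res);
    memcpy(buf + off, &hit, sizeof(hit));
    off += sizeof(hit);

    memset(&res, 0, sizeof(res));   /* Owner stays all-zero */
    res.Header.HobType = EFI_HOB_TYPE_RESOURCE_DESCRIPTOR;
    res.Header.HobLength = sizeof(res);
    res.ResourceType = EFI_RESOURCE_SYSTEM_MEMORY;
    res.ResourceAttribute = EFI_RESOURCE_ATTRIBUTE_PRESENT |
                            EFI_RESOURCE_ATTRIBUTE_INITIALIZED |
                            EFI_RESOURCE_ATTRIBUTE_TESTED;
    res.PhysicalStart = ram_base;
    res.ResourceLength = ram_size;
    memcpy(buf + off, &res, sizeof(res));
    off += sizeof(res);

    memset(&end, 0, sizeof(end));
    end.HobType = EFI_HOB_TYPE_END_OF_HOB_LIST;
    end.HobLength = sizeof(end);
    memcpy(buf + off, &end, sizeof(end));
    off += sizeof(end);

    return off;
}

/* Walk the list by HobLength; the end marker is included in the count. */
static unsigned count_hobs(const uint8_t *buf, size_t len)
{
    unsigned n = 0;
    size_t off = 0;

    while (off + sizeof(EFI_HOB_GENERIC_HEADER) <= len) {
        EFI_HOB_GENERIC_HEADER h;

        memcpy(&h, buf + off, sizeof(h));
        n++;
        if (h.HobType == EFI_HOB_TYPE_END_OF_HOB_LIST || h.HobLength == 0) {
            break;
        }
        off += h.HobLength;
    }
    return n;
}
```

Each entry starts with the generic header, so a consumer only needs HobType
and HobLength to step from one HOB to the next, which is what count_hobs()
does.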

* [RFC PATCH v2 16/44] hw/i386: Add definitions from UEFI spec for volumes, resources, etc...
@ 2021-07-08  0:54   ` isaku.yamahata
  0 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: isaku.yamahata, isaku.yamahata, kvm

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add definitions for literals, enums, structs, GUIDs, etc. that will be
used by TDX to build the UEFI Hand-Off Block (HOB) list that is passed to
the TDX Virtual Firmware (TDVF).  All values come from the UEFI and UEFI
Platform Initialization specifications and the TDVF design guide. [1]

Note: EFI_RESOURCE_ATTRIBUTE_{ENCRYPTED, UNACCEPTED} are expected to be
added in a future UEFI spec.  Until then, UNACCEPTED is defined as a
placeholder value of 0.

[1] https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.pdf

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/i386/uefi.h | 496 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 496 insertions(+)
 create mode 100644 hw/i386/uefi.h

diff --git a/hw/i386/uefi.h b/hw/i386/uefi.h
new file mode 100644
index 0000000000..72bfc2f6a9
--- /dev/null
+++ b/hw/i386/uefi.h
@@ -0,0 +1,496 @@
+/*
+ * Copyright (C) 2020 Intel Corporation
+ *
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ *                        <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#ifndef HW_I386_UEFI_H
+#define HW_I386_UEFI_H
+
+/***************************************************************************/
+/*
+ * basic EFI definitions
+ * supplemented with UEFI Specification Version 2.8 (Errata A)
+ * released February 2020
+ */
+/* UEFI integer is little endian */
+
+typedef struct {
+    uint32_t Data1;
+    uint16_t Data2;
+    uint16_t Data3;
+    uint8_t Data4[8];
+} EFI_GUID;
+
+typedef uint64_t EFI_PHYSICAL_ADDRESS;
+typedef uint32_t EFI_BOOT_MODE;
+
+typedef enum {
+    EfiReservedMemoryType,
+    EfiLoaderCode,
+    EfiLoaderData,
+    EfiBootServicesCode,
+    EfiBootServicesData,
+    EfiRuntimeServicesCode,
+    EfiRuntimeServicesData,
+    EfiConventionalMemory,
+    EfiUnusableMemory,
+    EfiACPIReclaimMemory,
+    EfiACPIMemoryNVS,
+    EfiMemoryMappedIO,
+    EfiMemoryMappedIOPortSpace,
+    EfiPalCode,
+    EfiPersistentMemory,
+    EfiMaxMemoryType
+} EFI_MEMORY_TYPE;
+
+
+/*
+ * data structure firmware volume/file
+ * based on
+ * UEFI Platform Initialization Specification Version 1.7. vol 3, 3.2.1
+ */
+
+#define SIGNATURE_16(A, B)        ((A) | ((B) << 8))
+#define SIGNATURE_32(A, B, C, D)  ((A) | ((B) << 8) | ((C) << 16) | ((D) << 24))
+#define SIGNATURE_64(A, B, C, D, E, F, G, H)                            \
+    (SIGNATURE_32(A, B, C, D) | ((uint64_t)(SIGNATURE_32(E, F, G, H)) << 32))
+
+/***************************************************************************/
+/* Firmware Volume format */
+
+typedef uint32_t EFI_FV_FILE_ATTRIBUTES;
+
+
+#define EFI_FV_FILE_ATTRIB_ALIGNMENT     0x0000001F
+#define EFI_FV_FILE_ATTRIB_FIXED         0x00000100
+#define EFI_FV_FILE_ATTRIB_MEMORY_MAPPED 0x00000200
+
+typedef uint32_t EFI_FVB_ATTRIBUTES_2;
+
+
+#define EFI_FVB2_READ_DISABLED_CAP  0x00000001
+#define EFI_FVB2_READ_ENABLED_CAP   0x00000002
+#define EFI_FVB2_READ_STATUS        0x00000004
+#define EFI_FVB2_WRITE_DISABLED_CAP 0x00000008
+#define EFI_FVB2_WRITE_ENABLED_CAP  0x00000010
+#define EFI_FVB2_WRITE_STATUS       0x00000020
+#define EFI_FVB2_LOCK_CAP           0x00000040
+#define EFI_FVB2_LOCK_STATUS        0x00000080
+#define EFI_FVB2_STICKY_WRITE       0x00000200
+#define EFI_FVB2_MEMORY_MAPPED      0x00000400
+#define EFI_FVB2_ERASE_POLARITY     0x00000800
+#define EFI_FVB2_READ_LOCK_CAP      0x00001000
+#define EFI_FVB2_READ_LOCK_STATUS   0x00002000
+#define EFI_FVB2_WRITE_LOCK_CAP     0x00004000
+#define EFI_FVB2_WRITE_LOCK_STATUS  0x00008000
+#define EFI_FVB2_ALIGNMENT          0x001F0000
+#define EFI_FVB2_WEAK_ALIGNMENT     0x80000000
+#define EFI_FVB2_ALIGNMENT_1        0x00000000
+#define EFI_FVB2_ALIGNMENT_2        0x00010000
+#define EFI_FVB2_ALIGNMENT_4        0x00020000
+#define EFI_FVB2_ALIGNMENT_8        0x00030000
+#define EFI_FVB2_ALIGNMENT_16       0x00040000
+#define EFI_FVB2_ALIGNMENT_32       0x00050000
+#define EFI_FVB2_ALIGNMENT_64       0x00060000
+#define EFI_FVB2_ALIGNMENT_128      0x00070000
+#define EFI_FVB2_ALIGNMENT_256      0x00080000
+#define EFI_FVB2_ALIGNMENT_512      0x00090000
+#define EFI_FVB2_ALIGNMENT_1K       0x000A0000
+#define EFI_FVB2_ALIGNMENT_2K       0x000B0000
+#define EFI_FVB2_ALIGNMENT_4K       0x000C0000
+#define EFI_FVB2_ALIGNMENT_8K       0x000D0000
+#define EFI_FVB2_ALIGNMENT_16K      0x000E0000
+#define EFI_FVB2_ALIGNMENT_32K      0x000F0000
+#define EFI_FVB2_ALIGNMENT_64K      0x00100000
+#define EFI_FVB2_ALIGNMENT_128K     0x00110000
+#define EFI_FVB2_ALIGNMENT_256K     0x00120000
+#define EFI_FVB2_ALIGNMENT_512K     0x00130000
+#define EFI_FVB2_ALIGNMENT_1M       0x00140000
+#define EFI_FVB2_ALIGNMENT_2M       0x00150000
+#define EFI_FVB2_ALIGNMENT_4M       0x00160000
+#define EFI_FVB2_ALIGNMENT_8M       0x00170000
+#define EFI_FVB2_ALIGNMENT_16M      0x00180000
+#define EFI_FVB2_ALIGNMENT_32M      0x00190000
+#define EFI_FVB2_ALIGNMENT_64M      0x001A0000
+#define EFI_FVB2_ALIGNMENT_128M     0x001B0000
+#define EFI_FVB2_ALIGNMENT_256M     0x001C0000
+#define EFI_FVB2_ALIGNMENT_512M     0x001D0000
+#define EFI_FVB2_ALIGNMENT_1G       0x001E0000
+#define EFI_FVB2_ALIGNMENT_2G       0x001F0000
+
+typedef struct {
+    uint32_t NumBlocks;
+    uint32_t Length;
+} EFI_FV_BLOCK_MAP_ENTRY;
+
+typedef struct {
+    uint8_t ZeroVector[16];
+    EFI_GUID FileSystemGuid;
+    uint64_t FvLength;
+    uint32_t Signature;
+    EFI_FVB_ATTRIBUTES_2 Attributes;
+    uint16_t HeaderLength;
+    uint16_t Checksum;
+    uint16_t ExtHeaderOffset;
+    uint8_t Reserved[1];
+    uint8_t Revision;
+    EFI_FV_BLOCK_MAP_ENTRY BlockMap[1];
+} EFI_FIRMWARE_VOLUME_HEADER;
+
+#define EFI_FVH_SIGNATURE SIGNATURE_32('_', 'F', 'V', 'H')
+
+#define EFI_FVH_REVISION 0x02
+
+typedef struct {
+    EFI_GUID FvName;
+    uint32_t ExtHeaderSize;
+} EFI_FIRMWARE_VOLUME_EXT_HEADER;
+
+typedef struct {
+    uint16_t ExtEntrySize;
+    uint16_t ExtEntryType;
+} EFI_FIRMWARE_VOLUME_EXT_ENTRY;
+
+#define EFI_FV_EXT_TYPE_OEM_TYPE 0x01
+typedef struct {
+    EFI_FIRMWARE_VOLUME_EXT_ENTRY Hdr;
+    uint32_t TypeMask;
+
+    EFI_GUID Types[];
+} EFI_FIRMWARE_VOLUME_EXT_ENTRY_OEM_TYPE;
+
+#define EFI_FV_EXT_TYPE_GUID_TYPE 0x0002
+typedef struct {
+    EFI_FIRMWARE_VOLUME_EXT_ENTRY Hdr;
+    EFI_GUID FormatType;
+
+    uint8_t Data[];
+} EFI_FIRMWARE_VOLUME_EXT_ENTRY_GUID_TYPE;
+
+#define EFI_FV_EXT_TYPE_USED_SIZE_TYPE 0x03
+typedef struct {
+    EFI_FIRMWARE_VOLUME_EXT_ENTRY Hdr;
+    uint32_t UsedSize;
+} EFI_FIRMWARE_VOLUME_EXT_ENTRY_USED_SIZE_TYPE;
+
+/***************************************************************************/
+/* Firmware File */
+
+#pragma pack(push, 1)
+
+typedef union {
+    struct {
+        uint8_t Header;
+        uint8_t File;
+    } Checksum;
+    uint16_t Checksum16;
+} EFI_FFS_INTEGRITY_CHECK;
+
+typedef uint8_t EFI_FV_FILETYPE;
+typedef uint8_t EFI_FFS_FILE_ATTRIBUTES;
+typedef uint8_t EFI_FFS_FILE_STATE;
+
+
+#define EFI_FV_FILETYPE_ALL                   0x00
+#define EFI_FV_FILETYPE_RAW                   0x01
+#define EFI_FV_FILETYPE_FREEFORM              0x02
+#define EFI_FV_FILETYPE_SECURITY_CORE         0x03
+#define EFI_FV_FILETYPE_PEI_CORE              0x04
+#define EFI_FV_FILETYPE_DXE_CORE              0x05
+#define EFI_FV_FILETYPE_PEIM                  0x06
+#define EFI_FV_FILETYPE_DRIVER                0x07
+#define EFI_FV_FILETYPE_COMBINED_PEIM_DRIVER  0x08
+#define EFI_FV_FILETYPE_APPLICATION           0x09
+#define EFI_FV_FILETYPE_SMM                   0x0A
+#define EFI_FV_FILETYPE_FIRMWARE_VOLUME_IMAGE 0x0B
+#define EFI_FV_FILETYPE_COMBINED_SMM_DXE      0x0C
+#define EFI_FV_FILETYPE_SMM_CORE              0x0D
+#define EFI_FV_FILETYPE_MM_STANDALONE         0x0E
+#define EFI_FV_FILETYPE_MM_CORE_STANDALONE    0x0F
+#define EFI_FV_FILETYPE_OEM_MIN               0xc0
+#define EFI_FV_FILETYPE_OEM_MAX               0xdf
+#define EFI_FV_FILETYPE_DEBUG_MIN             0xe0
+#define EFI_FV_FILETYPE_DEBUG_MAX             0xef
+#define EFI_FV_FILETYPE_FFS_MIN               0xf0
+#define EFI_FV_FILETYPE_FFS_MAX               0xff
+#define EFI_FV_FILETYPE_FFS_PAD               0xf0
+
+
+#define FFS_ATTRIB_LARGE_FILE         0x01
+#define FFS_ATTRIB_DATA_ALIGNMENT2    0x02
+#define FFS_ATTRIB_FIXED              0x04
+#define FFS_ATTRIB_DATA_ALIGNMENT     0x38
+#define FFS_ATTRIB_CHECKSUM           0x40
+
+
+#define EFI_FILE_HEADER_CONSTRUCTION  0x01
+#define EFI_FILE_HEADER_VALID         0x02
+#define EFI_FILE_DATA_VALID           0x04
+#define EFI_FILE_MARKED_FOR_UPDATE    0x08
+#define EFI_FILE_DELETED              0x10
+#define EFI_FILE_HEADER_INVALID       0x20
+
+
+#define EFI_FILE_ALL_STATE_BITS                 \
+    (EFI_FILE_HEADER_CONSTRUCTION |             \
+     EFI_FILE_HEADER_VALID |                    \
+     EFI_FILE_DATA_VALID |                      \
+     EFI_FILE_MARKED_FOR_UPDATE |               \
+     EFI_FILE_DELETED |                         \
+     EFI_FILE_HEADER_INVALID)
+
+
+typedef struct {
+    EFI_GUID Name;
+    EFI_FFS_INTEGRITY_CHECK IntegrityCheck;
+    EFI_FV_FILETYPE Type;
+    EFI_FFS_FILE_ATTRIBUTES Attributes;
+    uint8_t Size[3];
+    EFI_FFS_FILE_STATE State;
+} EFI_FFS_FILE_HEADER;
+
+
+typedef struct {
+    EFI_GUID Name;
+    EFI_FFS_INTEGRITY_CHECK IntegrityCheck;
+    EFI_FV_FILETYPE Type;
+    EFI_FFS_FILE_ATTRIBUTES Attributes;
+    uint8_t Size[3];
+    EFI_FFS_FILE_STATE State;
+    uint64_t ExtendedSize;
+} EFI_FFS_FILE_HEADER2;
+
+#define MAX_FFS_SIZE 0x1000000
+
+#pragma pack(pop)
+
+
+/***************************************************************************/
+/* GUIDs */
+#define EFI_FIRMWARE_FILE_SYSTEM2_GUID                          \
+    ((EFI_GUID){ 0x8c8ce578, 0x8a3d, 0x4f1c,                    \
+        { 0x99, 0x35, 0x89, 0x61, 0x85, 0xc3, 0x2d, 0xd3 } })
+
+#define EFI_FIRMWARE_FILE_SYSTEM3_GUID                          \
+    ((EFI_GUID){ 0x5473c07a, 0x3dcb, 0x4dca,                    \
+        { 0xbd, 0x6f, 0x1e, 0x96, 0x89, 0xe7, 0x34, 0x9a } })
+
+#define EFI_SYSTEM_NV_DATA_FV_GUID                              \
+    ((EFI_GUID){ 0xfff12b8d, 0x7696, 0x4c8b,                    \
+        { 0xa9, 0x85, 0x27, 0x47, 0x7, 0x5b, 0x4f, 0x50 } })
+
+#define EFI_FFS_VOLUME_TOP_FILE_GUID                            \
+    ((EFI_GUID){ 0x1BA0062E, 0xC779, 0x4582,                    \
+        { 0x85, 0x66, 0x33, 0x6A, 0xE8, 0xF7, 0x8F, 0x09 } })
+
+/*
+ * data structure for hob(Hand-Off block)
+ * based on
+ * UEFI Platform Initialization Specification Version 1.7. vol 3, chap 4 and 5
+ */
+
+#define EFI_HOB_TYPE_HANDOFF              0x0001
+#define EFI_HOB_TYPE_MEMORY_ALLOCATION    0x0002
+#define EFI_HOB_TYPE_RESOURCE_DESCRIPTOR  0x0003
+#define EFI_HOB_TYPE_GUID_EXTENSION       0x0004
+#define EFI_HOB_TYPE_FV                   0x0005
+#define EFI_HOB_TYPE_CPU                  0x0006
+#define EFI_HOB_TYPE_MEMORY_POOL          0x0007
+#define EFI_HOB_TYPE_FV2                  0x0009
+#define EFI_HOB_TYPE_LOAD_PEIM_UNUSED     0x000A
+#define EFI_HOB_TYPE_UEFI_CAPSULE         0x000B
+#define EFI_HOB_TYPE_FV3                  0x000C
+#define EFI_HOB_TYPE_UNUSED               0xFFFE
+#define EFI_HOB_TYPE_END_OF_HOB_LIST      0xFFFF
+
+typedef struct {
+    uint16_t HobType;
+    uint16_t HobLength;
+    uint32_t Reserved;
+} EFI_HOB_GENERIC_HEADER;
+
+
+#define EFI_HOB_HANDOFF_TABLE_VERSION 0x0009
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    uint32_t Version;
+    EFI_BOOT_MODE BootMode;
+    EFI_PHYSICAL_ADDRESS EfiMemoryTop;
+    EFI_PHYSICAL_ADDRESS EfiMemoryBottom;
+    EFI_PHYSICAL_ADDRESS EfiFreeMemoryTop;
+    EFI_PHYSICAL_ADDRESS EfiFreeMemoryBottom;
+    EFI_PHYSICAL_ADDRESS EfiEndOfHobList;
+} EFI_HOB_HANDOFF_INFO_TABLE;
+
+typedef struct {
+    EFI_GUID Name;
+    EFI_PHYSICAL_ADDRESS MemoryBaseAddress;
+    uint64_t MemoryLength;
+    EFI_MEMORY_TYPE MemoryType;
+    uint8_t Reserved[4];
+} EFI_HOB_MEMORY_ALLOCATION_HEADER;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_HOB_MEMORY_ALLOCATION_HEADER AllocDescriptor;
+} EFI_HOB_MEMORY_ALLOCATION;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_HOB_MEMORY_ALLOCATION_HEADER AllocDescriptor;
+} EFI_HOB_MEMORY_ALLOCATION_STACK;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_HOB_MEMORY_ALLOCATION_HEADER AllocDescriptor;
+} EFI_HOB_MEMORY_ALLOCATION_BSP_STORE;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_HOB_MEMORY_ALLOCATION_HEADER MemoryAllocationHeader;
+    EFI_GUID ModuleName;
+    EFI_PHYSICAL_ADDRESS EntryPoint;
+} EFI_HOB_MEMORY_ALLOCATION_MODULE;
+
+#define EFI_HOB_MEMORY_ALLOC_STACK_GUID                         \
+    ((EFI_GUID){ 0x4ed4bf27, 0x4092, 0x42e9,                    \
+        { 0x80, 0x7d, 0x52, 0x7b, 0x1d, 0x0, 0xc9, 0xbd } })
+
+#define EFI_HOB_MEMORY_ALLOC_BSP_STORE_GUID                     \
+    ((EFI_GUID){ 0x564b33cd, 0xc92a, 0x4593,                    \
+        { 0x90, 0xbf, 0x24, 0x73, 0xe4, 0x3c, 0x63, 0x22 } })
+
+#define EFI_HOB_MEMORY_ALLOC_MODULE_GUID                        \
+    ((EFI_GUID){ 0xf8e21975, 0x899, 0x4f58,                     \
+        { 0xa4, 0xbe, 0x55, 0x25, 0xa9, 0xc6, 0xd7, 0x7a } })
+
+
+typedef uint32_t EFI_RESOURCE_TYPE;
+
+#define EFI_RESOURCE_SYSTEM_MEMORY          0x00000000
+#define EFI_RESOURCE_MEMORY_MAPPED_IO       0x00000001
+#define EFI_RESOURCE_IO                     0x00000002
+#define EFI_RESOURCE_FIRMWARE_DEVICE        0x00000003
+#define EFI_RESOURCE_MEMORY_MAPPED_IO_PORT  0x00000004
+#define EFI_RESOURCE_MEMORY_RESERVED        0x00000005
+#define EFI_RESOURCE_IO_RESERVED            0x00000006
+#define EFI_RESOURCE_MAX_MEMORY_TYPE        0x00000007
+
+typedef uint32_t EFI_RESOURCE_ATTRIBUTE_TYPE;
+
+#define EFI_RESOURCE_ATTRIBUTE_PRESENT                  0x00000001
+#define EFI_RESOURCE_ATTRIBUTE_INITIALIZED              0x00000002
+#define EFI_RESOURCE_ATTRIBUTE_TESTED                   0x00000004
+#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTED           0x00000080
+
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTED          0x00000100
+#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTED      0x00000200
+#define EFI_RESOURCE_ATTRIBUTE_PERSISTENT               0x00800000
+
+#define EFI_RESOURCE_ATTRIBUTE_SINGLE_BIT_ECC           0x00000008
+#define EFI_RESOURCE_ATTRIBUTE_MULTIPLE_BIT_ECC         0x00000010
+#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_1           0x00000020
+#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_2           0x00000040
+#define EFI_RESOURCE_ATTRIBUTE_UNCACHEABLE              0x00000400
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_COMBINEABLE        0x00000800
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_THROUGH_CACHEABLE  0x00001000
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_BACK_CACHEABLE     0x00002000
+#define EFI_RESOURCE_ATTRIBUTE_16_BIT_IO                0x00004000
+#define EFI_RESOURCE_ATTRIBUTE_32_BIT_IO                0x00008000
+#define EFI_RESOURCE_ATTRIBUTE_64_BIT_IO                0x00010000
+#define EFI_RESOURCE_ATTRIBUTE_UNCACHED_EXPORTED        0x00020000
+#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTABLE         0x00100000
+
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTABLE        0x00200000
+#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTABLE    0x00400000
+#define EFI_RESOURCE_ATTRIBUTE_PERSISTABLE              0x01000000
+
+#define EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTED      0x00040000
+#define EFI_RESOURCE_ATTRIBUTE_READ_ONLY_PROTECTABLE    0x00080000
+
+#define EFI_RESOURCE_ATTRIBUTE_MORE_RELIABLE            0x02000000
+#define EFI_RESOURCE_ATTRIBUTE_ENCRYPTED                0x04000000
+
+/* FIXME: place holder for now */
+#define EFI_RESOURCE_ATTRIBUTE_UNACCEPTED               0x00000000
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_GUID Owner;
+    EFI_RESOURCE_TYPE ResourceType;
+    EFI_RESOURCE_ATTRIBUTE_TYPE ResourceAttribute;
+    EFI_PHYSICAL_ADDRESS PhysicalStart;
+    uint64_t ResourceLength;
+} EFI_HOB_RESOURCE_DESCRIPTOR;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_GUID Name;
+
+    /* guid specific data follows */
+} EFI_HOB_GUID_TYPE;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+} EFI_HOB_FIRMWARE_VOLUME;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+    EFI_GUID FvName;
+    EFI_GUID FileName;
+} EFI_HOB_FIRMWARE_VOLUME2;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+    uint32_t AuthenticationStatus;
+    bool ExtractedFv;
+    EFI_GUID FvName;
+    EFI_GUID FileName;
+} EFI_HOB_FIRMWARE_VOLUME3;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+    uint8_t SizeOfMemorySpace;
+    uint8_t SizeOfIoSpace;
+    uint8_t Reserved[6];
+} EFI_HOB_CPU;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+} EFI_HOB_MEMORY_POOL;
+
+typedef struct {
+    EFI_HOB_GENERIC_HEADER Header;
+
+    EFI_PHYSICAL_ADDRESS BaseAddress;
+    uint64_t Length;
+} EFI_HOB_UEFI_CAPSULE;
+
+#define EFI_HOB_OWNER_ZERO                                      \
+    ((EFI_GUID){ 0x00000000, 0x0000, 0x0000,                    \
+        { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 } })
+
+#endif
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 17/44] i386/tdx: Add definitions for TDVF metadata
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add constants and structs for the TD Virtual Firmware metadata, which
describes how the TDVF must be built to ensure correct functionality and
measurement.  They are defined in the TDVF Design Guide [1].

[1] TDVF Design Guide
https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.pdf
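
For illustration only, the sketch below shows one way a consumer might
sanity-check a metadata blob laid out as TdvfMetadata followed by its
section entries.  tdvf_metadata_scan() and le32_at() are hypothetical
names, not part of this series.  The sketch assumes that Length covers
the header plus all section entries and that the caller has already
located the metadata inside the firmware image; how that is done is not
shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the definitions from include/hw/i386/tdvf.h. */
#define TDVF_SECTION_ATTRIBUTES_EXTENDMR    (1U << 0)
#define TDVF_SIGNATURE_LE32     0x46564454 /* TDVF as little endian */

typedef struct {
    uint32_t DataOffset;
    uint32_t RawDataSize;
    uint64_t MemoryAddress;
    uint64_t MemoryDataSize;
    uint32_t Type;
    uint32_t Attributes;
} TdvfSectionEntry;

typedef struct {
    uint8_t Signature[4];
    uint32_t Length;
    uint32_t Version;
    uint32_t NumberOfSectionEntries;
    TdvfSectionEntry SectionEntries[];
} TdvfMetadata;

/* Read a little-endian 32-bit value regardless of host endianness. */
static uint32_t le32_at(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical checker: returns the number of section entries, or -1 if
 * the blob is malformed.  *extendmr receives the number of sections that
 * have TDVF_SECTION_ATTRIBUTES_EXTENDMR set.
 */
int tdvf_metadata_scan(const uint8_t *blob, size_t size, unsigned *extendmr)
{
    const size_t hdr = offsetof(TdvfMetadata, SectionEntries);
    uint32_t length, count, i;

    *extendmr = 0;
    if (size < hdr || le32_at(blob) != TDVF_SIGNATURE_LE32) {
        return -1;
    }
    length = le32_at(blob + offsetof(TdvfMetadata, Length));
    count = le32_at(blob + offsetof(TdvfMetadata, NumberOfSectionEntries));
    /* Assumption: Length covers the header and every section entry. */
    if (length < hdr || length > size ||
        count > (length - hdr) / sizeof(TdvfSectionEntry)) {
        return -1;
    }
    for (i = 0; i < count; i++) {
        const uint8_t *e = blob + hdr + (size_t)i * sizeof(TdvfSectionEntry);
        uint32_t attr = le32_at(e + offsetof(TdvfSectionEntry, Attributes));

        if (attr & TDVF_SECTION_ATTRIBUTES_EXTENDMR) {
            (*extendmr)++;
        }
    }
    return (int)count;
}
```

Reading each field byte by byte keeps the check independent of host
endianness and of the alignment of the blob.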

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 include/hw/i386/tdvf.h | 55 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)
 create mode 100644 include/hw/i386/tdvf.h

diff --git a/include/hw/i386/tdvf.h b/include/hw/i386/tdvf.h
new file mode 100644
index 0000000000..5c78e2affb
--- /dev/null
+++ b/include/hw/i386/tdvf.h
@@ -0,0 +1,55 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ *                        <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_I386_TDVF_H
+#define HW_I386_TDVF_H
+
+#include "qemu/osdep.h"
+
+#define TDVF_METDATA_OFFSET_FROM_END    0x20
+
+#define TDVF_SECTION_TYPE_BFV               0
+#define TDVF_SECTION_TYPE_CFV               1
+#define TDVF_SECTION_TYPE_TD_HOB            2
+#define TDVF_SECTION_TYPE_TEMP_MEM          3
+
+#define TDVF_SECTION_ATTRIBUTES_EXTENDMR    (1U << 0)
+
+typedef struct {
+    uint32_t DataOffset;
+    uint32_t RawDataSize;
+    uint64_t MemoryAddress;
+    uint64_t MemoryDataSize;
+    uint32_t Type;
+    uint32_t Attributes;
+} TdvfSectionEntry;
+
+#define TDVF_SIGNATURE_LE32     0x46564454 /* TDVF as little endian */
+
+typedef struct {
+    uint8_t Signature[4];
+    uint32_t Length;
+    uint32_t Version;
+    uint32_t NumberOfSectionEntries;
+    TdvfSectionEntry SectionEntries[];
+} TdvfMetadata;
+
+#endif /* HW_I386_TDVF_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 18/44] hw/i386: refactor e820_add_entry()
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Split e820_add_entry() into two helpers, e820_append_reserve() and
e820_append_entry(), with no functional change.  The following patch will
use them.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/i386/e820_memory_layout.c | 42 ++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/hw/i386/e820_memory_layout.c b/hw/i386/e820_memory_layout.c
index bcf9eaf837..d9bb11c02a 100644
--- a/hw/i386/e820_memory_layout.c
+++ b/hw/i386/e820_memory_layout.c
@@ -14,31 +14,45 @@ static size_t e820_entries;
 struct e820_table e820_reserve;
 struct e820_entry *e820_table;
 
-int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
+static int e820_append_reserve(uint64_t address, uint64_t length, uint32_t type)
 {
     int index = le32_to_cpu(e820_reserve.count);
     struct e820_entry *entry;
 
-    if (type != E820_RAM) {
-        /* old FW_CFG_E820_TABLE entry -- reservations only */
-        if (index >= E820_NR_ENTRIES) {
-            return -EBUSY;
-        }
-        entry = &e820_reserve.entry[index++];
+    /* old FW_CFG_E820_TABLE entry -- reservations only */
+    if (index >= E820_NR_ENTRIES) {
+        return -EBUSY;
+    }
+    entry = &e820_reserve.entry[index++];
 
-        entry->address = cpu_to_le64(address);
-        entry->length = cpu_to_le64(length);
-        entry->type = cpu_to_le32(type);
+    entry->address = cpu_to_le64(address);
+    entry->length = cpu_to_le64(length);
+    entry->type = cpu_to_le32(type);
 
-        e820_reserve.count = cpu_to_le32(index);
-    }
+    e820_reserve.count = cpu_to_le32(index);
+    return 0;
+}
 
-    /* new "etc/e820" file -- include ram too */
-    e820_table = g_renew(struct e820_entry, e820_table, e820_entries + 1);
+static void e820_append_entry(uint64_t address, uint64_t length, uint32_t type)
+{
     e820_table[e820_entries].address = cpu_to_le64(address);
     e820_table[e820_entries].length = cpu_to_le64(length);
     e820_table[e820_entries].type = cpu_to_le32(type);
     e820_entries++;
+}
+
+int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
+{
+    if (type != E820_RAM) {
+        int ret = e820_append_reserve(address, length, type);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    /* new "etc/e820" file -- include ram too */
+    e820_table = g_renew(struct e820_entry, e820_table, e820_entries + 1);
+    e820_append_entry(address, length, type);
 
     return e820_entries;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 19/44] hw/i386/e820: introduce a helper function to change type of e820
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Introduce a helper function, e820_change_type(), that changes the type
of a subregion of an existing e820 entry.  The following patch uses it.
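
The splitting logic can be sketched in isolation as follows.  This is a
simplified stand-in rather than the code in this patch: it uses a
fixed-size, host-endian table instead of the g_renew()-managed e820_table,
and struct range, range_append() and range_change_type() are hypothetical
names.  It shows the four cases: exact match, subregion at the head, at
the tail, and in the middle of an existing entry.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2
#define MAX_ENTRIES   16    /* hypothetical fixed capacity for this sketch */

struct range {
    uint64_t start;
    uint64_t len;
    uint32_t type;
};

static struct range table[MAX_ENTRIES];
static size_t nr;

static int range_append(uint64_t start, uint64_t len, uint32_t type)
{
    if (nr >= MAX_ENTRIES) {
        return -1;
    }
    table[nr++] = (struct range){ start, len, type };
    return 0;
}

/*
 * Retype [start, start + len), which must lie entirely inside one
 * existing entry of a different type.  Returns the new number of
 * entries, or -1 on error.
 */
int range_change_type(uint64_t start, uint64_t len, uint32_t type)
{
    uint64_t end = start + len;
    size_t i;

    for (i = 0; i < nr; i++) {
        struct range old = table[i];
        uint64_t old_end = old.start + old.len;
        size_t need;

        if (end <= old.start || old_end <= start) {
            continue;               /* disjoint or merely adjacent */
        }
        if (start < old.start || old_end < end || old.type == type) {
            return -1;              /* partial overlap or no real change */
        }
        /* One extra entry per side on which the old entry sticks out. */
        need = (start != old.start) + (end != old_end);
        if (nr + need > MAX_ENTRIES) {
            return -1;
        }
        if (need == 0) {            /* exact match: retype in place */
            table[i].type = type;
            return (int)nr;
        }
        if (start == old.start) {   /* head: old entry keeps the right part */
            table[i].start = end;
            table[i].len = old_end - end;
        } else {                    /* tail or middle: keeps the left part */
            table[i].len = start - old.start;
            if (end != old_end) {
                range_append(end, old_end - end, old.type);
            }
        }
        range_append(start, len, type);
        return (int)nr;
    }
    return -1;
}
```

Note that in the head case both the start and the length of the old entry
have to be adjusted, and that an entry which only touches the requested
region at a boundary is treated as disjoint.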

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/i386/e820_memory_layout.c | 72 ++++++++++++++++++++++++++++++++++++
 hw/i386/e820_memory_layout.h |  1 +
 2 files changed, 73 insertions(+)

diff --git a/hw/i386/e820_memory_layout.c b/hw/i386/e820_memory_layout.c
index d9bb11c02a..109c4f715a 100644
--- a/hw/i386/e820_memory_layout.c
+++ b/hw/i386/e820_memory_layout.c
@@ -57,6 +57,78 @@ int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
     return e820_entries;
 }
 
+int e820_change_type(uint64_t address, uint64_t length, uint32_t type)
+{
+    size_t i;
+
+    if (type != E820_RAM) {
+        int ret = e820_append_reserve(address, length, type);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    /* new "etc/e820" file -- include ram too */
+    for (i = 0; i < e820_entries; i++) {
+        struct e820_entry *e = &e820_table[i];
+        struct e820_entry tmp = {
+            .address = le64_to_cpu(e->address),
+            .length = le64_to_cpu(e->length),
+            .type = le32_to_cpu(e->type),
+        };
+        /* overlap? */
+        if (address + length <= tmp.address ||
+            tmp.address + tmp.length <= address) {
+            continue;
+        }
+        /*
+         * Partial overlap is not allowed.
+         * The region is assumed to be completely contained within
+         * an existing entry.
+         */
+        if (address < tmp.address ||
+            tmp.address + tmp.length < address + length) {
+            return -EINVAL;
+        }
+        /* only real type change is allowed. */
+        if (tmp.type == type) {
+            return -EINVAL;
+        }
+
+        if (tmp.address == address &&
+            tmp.address + tmp.length == address + length) {
+            e->type = cpu_to_le32(type);
+            return e820_entries;
+        } else if (tmp.address == address) {
+            e820_table = g_renew(struct e820_entry,
+                                 e820_table, e820_entries + 1);
+            e820_table[i].address = cpu_to_le64(tmp.address + length);
+            e820_table[i].length = cpu_to_le64(tmp.length - length);
+            e820_append_entry(address, length, type);
+            return e820_entries;
+        } else if (tmp.address + tmp.length == address + length) {
+            e820_table = g_renew(struct e820_entry,
+                                 e820_table, e820_entries + 1);
+            e = &e820_table[i];
+            e->length = cpu_to_le64(tmp.length - length);
+            e820_append_entry(address, length, type);
+            return e820_entries;
+        } else {
+            e820_table = g_renew(struct e820_entry,
+                                 e820_table, e820_entries + 2);
+            e = &e820_table[i];
+            e->length = cpu_to_le64(address - tmp.address);
+            e820_append_entry(address, length, type);
+            e820_append_entry(address + length,
+                              tmp.address + tmp.length - (address + length),
+                              tmp.type);
+            return e820_entries;
+        }
+    }
+
+    return -EINVAL;
+}
+
 int e820_get_num_entries(void)
 {
     return e820_entries;
diff --git a/hw/i386/e820_memory_layout.h b/hw/i386/e820_memory_layout.h
index 2a0ceb8b9c..5f27cee476 100644
--- a/hw/i386/e820_memory_layout.h
+++ b/hw/i386/e820_memory_layout.h
@@ -33,6 +33,7 @@ extern struct e820_table e820_reserve;
 extern struct e820_entry *e820_table;
 
 int e820_add_entry(uint64_t address, uint64_t length, uint32_t type);
+int e820_change_type(uint64_t address, uint64_t length, uint32_t type);
 int e820_get_num_entries(void);
 bool e820_get_entry(int index, uint32_t type,
                     uint64_t *address, uint64_t *length);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 20/44] i386/tdx: Parse tdx metadata and store the result into TdxGuestState
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson, Min M . Xu

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add support for loading TDX's Trusted Domain Virtual Firmware (TDVF) via
the generic loader.  Prioritize the TDVF above plain hex to avoid false
positives with hex (TDVF has explicit metadata to confirm it's a TDVF).

Enumerate TempMem as added, private memory, i.e. E820_RESERVED,
otherwise TDVF will interpret the whole shebang as MMIO and complain
that the aperture overlaps other MMIO regions.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reported-by: Min M. Xu <min.m.xu@intel.com>
---
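Illustration only, not part of the patch: a minimal sketch of how a TDVF image
is recognized, working on an in-memory buffer instead of a file descriptor.
A 32-bit little-endian pointer near the end of the image gives the metadata
offset, and the "TDVF" signature there confirms the image type. The 0x20
distance from the end of the file and the header field order (Signature,
Length, Version, NumberOfSectionEntries) are assumptions here, since
hw/i386/tdvf.h is not part of this patch. The names tdvf_header, read_le32()
and tdvf_find_header() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define META_PTR_FROM_END 0x20u /* assumed value, see the note above */
#define META_HEADER_SIZE  16u   /* 4-byte signature plus three 32-bit fields */

struct tdvf_header {
    uint32_t length;
    uint32_t version;
    uint32_t nr_sections;
};

/* Read a 32-bit little-endian value regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 and fills *out if img looks like a TDVF image, else -1. */
static int tdvf_find_header(const uint8_t *img, size_t size,
                            struct tdvf_header *out)
{
    uint32_t offset;

    assert(img != NULL && out != NULL);
    if (size < META_PTR_FROM_END || size > UINT32_MAX) {
        return -1;
    }

    /* Chase the pointer stored near the end of the image. */
    offset = read_le32(img + size - META_PTR_FROM_END);
    if (offset > size - META_HEADER_SIZE) {
        return -1;
    }

    /* The signature decides whether this is a TDVF image at all. */
    if (memcmp(img + offset, "TDVF", 4) != 0) {
        return -1;
    }

    out->length = read_le32(img + offset + 4);
    out->version = read_le32(img + offset + 8);
    out->nr_sections = read_le32(img + offset + 12);

    /* The metadata must fit inside the image; only version 1 is known. */
    if (out->length < META_HEADER_SIZE || out->length > size - offset) {
        return -1;
    }
    if (out->version != 1) {
        return -1;
    }
    return 0;
}
```

Returning -1 for anything that does not carry the signature is what lets a
caller fall through to the next loader, which is the behaviour the commit
message describes for the generic loader.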
 hw/core/generic-loader.c |   5 +
 hw/core/meson.build      |   3 +
 hw/core/tdvf-stub.c      |   6 +
 hw/i386/meson.build      |   1 +
 hw/i386/tdvf.c           | 312 +++++++++++++++++++++++++++++++++++++++
 include/sysemu/tdvf.h    |   6 +
 target/i386/kvm/tdx.h    |  26 ++++
 7 files changed, 359 insertions(+)
 create mode 100644 hw/core/tdvf-stub.c
 create mode 100644 hw/i386/tdvf.c
 create mode 100644 include/sysemu/tdvf.h

diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c
index d14f932eea..ee2f49b47a 100644
--- a/hw/core/generic-loader.c
+++ b/hw/core/generic-loader.c
@@ -34,6 +34,7 @@
 #include "hw/core/cpu.h"
 #include "sysemu/dma.h"
 #include "sysemu/reset.h"
+#include "sysemu/tdvf.h"
 #include "hw/boards.h"
 #include "hw/loader.h"
 #include "hw/qdev-properties.h"
@@ -147,6 +148,10 @@ static void generic_loader_realize(DeviceState *dev, Error **errp)
                                       as);
             }
 
+            if (size < 0) {
+                size = load_tdvf(s->file);
+            }
+
             if (size < 0) {
                 size = load_targphys_hex_as(s->file, &entry, as);
             }
diff --git a/hw/core/meson.build b/hw/core/meson.build
index 18f44fb7c2..ec943debf1 100644
--- a/hw/core/meson.build
+++ b/hw/core/meson.build
@@ -24,6 +24,9 @@ common_ss.add(when: 'CONFIG_REGISTER', if_true: files('register.c'))
 common_ss.add(when: 'CONFIG_SPLIT_IRQ', if_true: files('split-irq.c'))
 common_ss.add(when: 'CONFIG_XILINX_AXI', if_true: files('stream.c'))
 
+common_ss.add(when: 'CONFIG_TDX', if_false: files('tdvf-stub.c'))
+common_ss.add(when: 'CONFIG_ALL', if_true: files('tdvf-stub.c'))
+
 softmmu_ss.add(files(
   'cpu-sysemu.c',
   'fw-path-provider.c',
diff --git a/hw/core/tdvf-stub.c b/hw/core/tdvf-stub.c
new file mode 100644
index 0000000000..5f2586dd70
--- /dev/null
+++ b/hw/core/tdvf-stub.c
@@ -0,0 +1,6 @@
+#include "sysemu/tdvf.h"
+
+int load_tdvf(const char *filename)
+{
+    return -1;
+}
diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index e5d109f5c6..945e805525 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -24,6 +24,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
   'pc_sysfw.c',
   'acpi-build.c',
   'port92.c'))
+i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c'))
 
 subdir('kvm')
 subdir('xen')
diff --git a/hw/i386/tdvf.c b/hw/i386/tdvf.c
new file mode 100644
index 0000000000..9b0065d656
--- /dev/null
+++ b/hw/i386/tdvf.c
@@ -0,0 +1,312 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ *                        <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/units.h"
+#include "cpu.h"
+#include "exec/hwaddr.h"
+#include "hw/boards.h"
+#include "hw/i386/e820_memory_layout.h"
+#include "hw/i386/tdvf.h"
+#include "hw/i386/x86.h"
+#include "hw/loader.h"
+#include "sysemu/tdx.h"
+#include "sysemu/tdvf.h"
+#include "target/i386/kvm/tdx.h"
+
+static void tdvf_init_ram_memory(MachineState *ms, TdxFirmwareEntry *entry)
+{
+    void *ram_ptr = memory_region_get_ram_ptr(ms->ram);
+    X86MachineState *x86ms = X86_MACHINE(ms);
+
+    if (entry->type == TDVF_SECTION_TYPE_BFV ||
+        entry->type == TDVF_SECTION_TYPE_CFV) {
+        error_report("TDVF type %u addr 0x%" PRIx64 " in RAM (disallowed)",
+                     entry->type, entry->address);
+        exit(1);
+    }
+
+    if (entry->address < 4 * GiB) {
+        entry->mem_ptr = ram_ptr + entry->address;
+    } else {
+        /*
+         * If the TDVF temp memory described in the TDVF metadata lies in RAM,
+         * reserve the region properly.
+         */
+        if (entry->address >= 4 * GiB + x86ms->above_4g_mem_size ||
+            entry->address + entry->size >= 4 * GiB + x86ms->above_4g_mem_size) {
+            error_report("TDVF type %u address 0x%" PRIx64 " size 0x%" PRIx64
+                         " above high memory",
+                         entry->type, entry->address, entry->size);
+            exit(1);
+        }
+        entry->mem_ptr = ram_ptr + x86ms->below_4g_mem_size +
+                         entry->address - 4 * GiB;
+    }
+    e820_change_type(entry->address, entry->size, E820_RESERVED);
+}
+
+static void tdvf_init_bios_memory(int fd, const char *filename,
+                                  TdxFirmwareEntry *entry)
+{
+    static unsigned int nr_cfv;
+    static unsigned int nr_tmp;
+
+    MemoryRegion *system_memory = get_system_memory();
+    Error *err = NULL;
+    const char *name;
+
+    /* Error out if the section might overlap other structures. */
+    if (entry->address < 4 * GiB - 16 * MiB) {
+        error_report("TDVF type %u address 0x%" PRIx64 " in PCI hole",
+                     entry->type, entry->address);
+        exit(1);
+    }
+
+    if (entry->type == TDVF_SECTION_TYPE_BFV) {
+        name = g_strdup("tdvf.bfv");
+    } else if (entry->type == TDVF_SECTION_TYPE_CFV) {
+        name = g_strdup_printf("tdvf.cfv%u", nr_cfv++);
+    } else if (entry->type == TDVF_SECTION_TYPE_TD_HOB) {
+        name = g_strdup("tdvf.hob");
+    } else if (entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
+        name = g_strdup_printf("tdvf.tmp%u", nr_tmp++);
+    } else {
+        error_report("TDVF type %u unknown/unsupported", entry->type);
+        exit(1);
+    }
+    entry->mr = g_malloc(sizeof(*entry->mr));
+
+    memory_region_init_ram(entry->mr, NULL, name, entry->size, &err);
+    if (err) {
+        error_report_err(err);
+        exit(1);
+    }
+
+    entry->mem_ptr = memory_region_get_ram_ptr(entry->mr);
+    if (entry->data_len) {
+        /*
+         * The memory_region api doesn't allow partial file mapping, create
+         * ram and copy the contents
+         */
+        if (lseek(fd, entry->data_offset, SEEK_SET) != entry->data_offset) {
+            error_report("can't seek to 0x%x %s", entry->data_offset, filename);
+            exit(1);
+        }
+        if (read(fd, entry->mem_ptr, entry->data_len) != entry->data_len) {
+            error_report("can't read 0x%x %s", entry->data_len, filename);
+            exit(1);
+        }
+    }
+
+    memory_region_add_subregion(system_memory, entry->address, entry->mr);
+
+    if (entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
+        e820_add_entry(entry->address, entry->size, E820_RESERVED);
+    }
+}
+
+static void tdvf_parse_section_entry(TdxFirmwareEntry *entry,
+                                     const TdvfSectionEntry *src,
+                                     uint64_t file_size)
+{
+    entry->data_offset = le32_to_cpu(src->DataOffset);
+    entry->data_len = le32_to_cpu(src->RawDataSize);
+    entry->address = le64_to_cpu(src->MemoryAddress);
+    entry->size = le64_to_cpu(src->MemoryDataSize);
+    entry->type = le32_to_cpu(src->Type);
+    entry->attributes = le32_to_cpu(src->Attributes);
+
+    /* sanity check */
+    if (entry->data_offset + entry->data_len > file_size) {
+        error_report("too large section: DataOffset 0x%x RawDataSize 0x%x",
+                     entry->data_offset, entry->data_len);
+        exit(1);
+    }
+    if (entry->size < entry->data_len) {
+        error_report("broken metadata RawDataSize 0x%x MemoryDataSize 0x%"
+                     PRIx64, entry->data_len, entry->size);
+        exit(1);
+    }
+    if (!QEMU_IS_ALIGNED(entry->address, TARGET_PAGE_SIZE)) {
+        error_report("MemoryAddress 0x%" PRIx64 " not page aligned", entry->address);
+        exit(1);
+    }
+    if (!QEMU_IS_ALIGNED(entry->size, TARGET_PAGE_SIZE)) {
+        error_report("MemoryDataSize 0x%" PRIx64 " not page aligned", entry->size);
+        exit(1);
+    }
+    if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
+        entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
+        if (entry->data_len > 0) {
+            error_report("%u section with RawDataSize 0x%x > 0",
+                         entry->type, entry->data_len);
+            exit(1);
+        }
+    }
+}
+
+static void tdvf_parse_metadata_entries(int fd, TdxFirmware *fw,
+                                        TdvfMetadata *metadata)
+{
+
+    TdvfSectionEntry *sections;
+    ssize_t entries_size;
+    uint32_t len, i;
+
+    fw->nr_entries = le32_to_cpu(metadata->NumberOfSectionEntries);
+    if (fw->nr_entries < 2) {
+        error_report("Invalid number of entries (%u) in TDVF", fw->nr_entries);
+        exit(1);
+    }
+
+    len = le32_to_cpu(metadata->Length);
+    entries_size = fw->nr_entries * sizeof(TdvfSectionEntry);
+    if (len != sizeof(*metadata) + entries_size) {
+        error_report("TDVF metadata len (0x%x) mismatch, expected (0x%x)",
+                     len, (uint32_t)(sizeof(*metadata) + entries_size));
+        exit(1);
+    }
+
+    fw->entries = g_new(TdxFirmwareEntry, fw->nr_entries);
+    sections = g_new(TdvfSectionEntry, fw->nr_entries);
+
+    if (read(fd, sections, entries_size) != entries_size)  {
+        error_report("Failed to read TDVF section entries");
+        exit(1);
+    }
+
+    for (i = 0; i < fw->nr_entries; i++) {
+        tdvf_parse_section_entry(&fw->entries[i], &sections[i], fw->file_size);
+    }
+    g_free(sections);
+}
+
+static int tdvf_parse_metadata_header(int fd, TdvfMetadata *metadata)
+{
+    uint32_t offset;
+    int64_t size;
+
+    size = lseek(fd, 0, SEEK_END);
+    if (size < TDVF_METDATA_OFFSET_FROM_END || (uint32_t)size != size) {
+        return -1;
+    }
+
+    /* Chase the metadata pointer to get to the actual metadata. */
+    offset = size - TDVF_METDATA_OFFSET_FROM_END;
+    if (lseek(fd, offset, SEEK_SET) != offset) {
+        return -1;
+    }
+    if (read(fd, &offset, sizeof(offset)) != sizeof(offset)) {
+        return -1;
+    }
+
+    offset = le32_to_cpu(offset);
+    if (offset > size - sizeof(*metadata)) {
+        return -1;
+    }
+
+    /* Pointer to the metadata has been resolved, read the actual metadata. */
+    if (lseek(fd, offset, SEEK_SET) != offset) {
+        return -1;
+    }
+    if (read(fd, metadata, sizeof(*metadata)) != sizeof(*metadata)) {
+        return -1;
+    }
+
+    /* Finally, verify the signature to determine if this is a TDVF image. */
+    if (metadata->Signature[0] != 'T' || metadata->Signature[1] != 'D' ||
+        metadata->Signature[2] != 'V' || metadata->Signature[3] != 'F') {
+        return -1;
+    }
+
+    /* Sanity check that the TDVF doesn't overlap its own metadata. */
+    metadata->Length = le32_to_cpu(metadata->Length);
+    if (metadata->Length > size - offset) {
+        return -1;
+    }
+
+    /* Only version 1 is supported/defined. */
+    metadata->Version = le32_to_cpu(metadata->Version);
+    if (metadata->Version != 1) {
+        return -1;
+    }
+
+    return size;
+}
+
+int load_tdvf(const char *filename)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    X86MachineState *x86ms = X86_MACHINE(ms);
+    TdxFirmwareEntry *entry;
+    TdvfMetadata metadata;
+    TdxGuest *tdx;
+    TdxFirmware *fw;
+    int64_t size;
+    int fd;
+
+    if (!kvm_enabled()) {
+        return -1;
+    }
+
+    tdx = (void *)object_dynamic_cast(OBJECT(ms->cgs), TYPE_TDX_GUEST);
+    if (!tdx) {
+        return -1;
+    }
+
+    fd = open(filename, O_RDONLY | O_BINARY);
+    if (fd < 0) {
+        return -1;
+    }
+
+    size = tdvf_parse_metadata_header(fd, &metadata);
+    if (size < 0) {
+        close(fd);
+        return -1;
+    }
+
+    /* Error out if the user is attempting to load multiple TDVFs. */
+    fw = &tdx->fw;
+    if (fw->file_name) {
+        error_report("tdvf can only be specified once.");
+        exit(1);
+    }
+
+    fw->file_size = size;
+    fw->file_name = g_strdup(filename);
+
+    tdvf_parse_metadata_entries(fd, fw, &metadata);
+
+    for_each_fw_entry(fw, entry) {
+        if (entry->address < x86ms->below_4g_mem_size ||
+            entry->address > 4 * GiB) {
+            tdvf_init_ram_memory(ms, entry);
+        } else {
+            tdvf_init_bios_memory(fd, filename, entry);
+        }
+    }
+
+    close(fd);
+    return 0;
+}
diff --git a/include/sysemu/tdvf.h b/include/sysemu/tdvf.h
new file mode 100644
index 0000000000..0cf085e3ae
--- /dev/null
+++ b/include/sysemu/tdvf.h
@@ -0,0 +1,6 @@
+#ifndef QEMU_TDVF_H
+#define QEMU_TDVF_H
+
+int load_tdvf(const char *filename);
+
+#endif
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 844d24aade..2fed27b3fb 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -5,6 +5,30 @@
 #include "qapi/error.h"
 #include "exec/confidential-guest-support.h"
 
+typedef struct TdxFirmwareEntry {
+    uint32_t data_offset;
+    uint32_t data_len;
+    uint64_t address;
+    uint64_t size;
+    uint32_t type;
+    uint32_t attributes;
+
+    MemoryRegion *mr;
+    void *mem_ptr;
+} TdxFirmwareEntry;
+
+typedef struct TdxFirmware {
+    const char *file_name;
+    uint64_t file_size;
+
+    /* metadata */
+    uint32_t nr_entries;
+    TdxFirmwareEntry *entries;
+} TdxFirmware;
+
+#define for_each_fw_entry(fw, e)                                        \
+    for (e = (fw)->entries; e != (fw)->entries + (fw)->nr_entries; e++)
+
 #define TYPE_TDX_GUEST "tdx-guest"
 #define TDX_GUEST(obj)     \
     OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
@@ -20,6 +44,8 @@ typedef struct TdxGuest {
 
     bool initialized;
     bool debug;
+
+    TdxFirmware fw;
 } TdxGuest;
 
 int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 20/44] i386/tdx: Parse tdx metadata and store the result into TdxGuestState
@ 2021-07-08  0:54   ` isaku.yamahata
  0 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: isaku.yamahata, Sean Christopherson, isaku.yamahata, kvm, Min M . Xu

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add support for loading TDX's Trusted Domain Virtual Firmware (TDVF) via
the generic loader.  Prioritize the TDVF above plain hex to avoid false
positives with hex (TDVF has explicit metadata to confirm it's a TDVF).

Enumerate TempMem as added, private memory, i.e. E820_RESERVED,
otherwise TDVF will interpret the whole shebang as MMIO and complain
that the aperture overlaps other MMIO regions.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reported-by: Min M. Xu <min.m.xu@intel.com>
---
 hw/core/generic-loader.c |   5 +
 hw/core/meson.build      |   3 +
 hw/core/tdvf-stub.c      |   6 +
 hw/i386/meson.build      |   1 +
 hw/i386/tdvf.c           | 312 +++++++++++++++++++++++++++++++++++++++
 include/sysemu/tdvf.h    |   6 +
 target/i386/kvm/tdx.h    |  26 ++++
 7 files changed, 359 insertions(+)
 create mode 100644 hw/core/tdvf-stub.c
 create mode 100644 hw/i386/tdvf.c
 create mode 100644 include/sysemu/tdvf.h

diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c
index d14f932eea..ee2f49b47a 100644
--- a/hw/core/generic-loader.c
+++ b/hw/core/generic-loader.c
@@ -34,6 +34,7 @@
 #include "hw/core/cpu.h"
 #include "sysemu/dma.h"
 #include "sysemu/reset.h"
+#include "sysemu/tdvf.h"
 #include "hw/boards.h"
 #include "hw/loader.h"
 #include "hw/qdev-properties.h"
@@ -147,6 +148,10 @@ static void generic_loader_realize(DeviceState *dev, Error **errp)
                                       as);
             }
 
+            if (size < 0) {
+                size = load_tdvf(s->file);
+            }
+
             if (size < 0) {
                 size = load_targphys_hex_as(s->file, &entry, as);
             }
diff --git a/hw/core/meson.build b/hw/core/meson.build
index 18f44fb7c2..ec943debf1 100644
--- a/hw/core/meson.build
+++ b/hw/core/meson.build
@@ -24,6 +24,9 @@ common_ss.add(when: 'CONFIG_REGISTER', if_true: files('register.c'))
 common_ss.add(when: 'CONFIG_SPLIT_IRQ', if_true: files('split-irq.c'))
 common_ss.add(when: 'CONFIG_XILINX_AXI', if_true: files('stream.c'))
 
+common_ss.add(when: 'CONFIG_TDX', if_false: files('tdvf-stub.c'))
+common_ss.add(when: 'CONFIG_ALL', if_true: files('tdvf-stub.c'))
+
 softmmu_ss.add(files(
   'cpu-sysemu.c',
   'fw-path-provider.c',
diff --git a/hw/core/tdvf-stub.c b/hw/core/tdvf-stub.c
new file mode 100644
index 0000000000..5f2586dd70
--- /dev/null
+++ b/hw/core/tdvf-stub.c
@@ -0,0 +1,6 @@
+#include "sysemu/tdvf.h"
+
+int load_tdvf(const char *filename)
+{
+    return -1;
+}
diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index e5d109f5c6..945e805525 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -24,6 +24,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
   'pc_sysfw.c',
   'acpi-build.c',
   'port92.c'))
+i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c'))
 
 subdir('kvm')
 subdir('xen')
diff --git a/hw/i386/tdvf.c b/hw/i386/tdvf.c
new file mode 100644
index 0000000000..9b0065d656
--- /dev/null
+++ b/hw/i386/tdvf.c
@@ -0,0 +1,312 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ *                        <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/units.h"
+#include "cpu.h"
+#include "exec/hwaddr.h"
+#include "hw/boards.h"
+#include "hw/i386/e820_memory_layout.h"
+#include "hw/i386/tdvf.h"
+#include "hw/i386/x86.h"
+#include "hw/loader.h"
+#include "sysemu/tdx.h"
+#include "sysemu/tdvf.h"
+#include "target/i386/kvm/tdx.h"
+
+static void tdvf_init_ram_memory(MachineState *ms, TdxFirmwareEntry *entry)
+{
+    void *ram_ptr = memory_region_get_ram_ptr(ms->ram);
+    X86MachineState *x86ms = X86_MACHINE(ms);
+
+    if (entry->type == TDVF_SECTION_TYPE_BFV ||
+        entry->type == TDVF_SECTION_TYPE_CFV) {
+        error_report("TDVF type %u addr 0x%" PRIx64 " in RAM (disallowed)",
+                     entry->type, entry->address);
+        exit(1);
+    }
+
+    if (entry->address < 4 * GiB) {
+        entry->mem_ptr = ram_ptr + entry->address;
+    } else {
+        /*
+         * If the TDVF temp memory described in the TDVF metadata lies in RAM,
+         * reserve the region properly.
+         */
+        if (entry->address >= 4 * GiB + x86ms->above_4g_mem_size ||
+            entry->address + entry->size >= 4 * GiB + x86ms->above_4g_mem_size) {
+            error_report("TDVF type %u address 0x%" PRIx64 " size 0x%" PRIx64
+                         " above high memory",
+                         entry->type, entry->address, entry->size);
+            exit(1);
+        }
+        entry->mem_ptr = ram_ptr + x86ms->below_4g_mem_size +
+                         entry->address - 4 * GiB;
+    }
+    e820_change_type(entry->address, entry->size, E820_RESERVED);
+}
+
+static void tdvf_init_bios_memory(int fd, const char *filename,
+                                  TdxFirmwareEntry *entry)
+{
+    static unsigned int nr_cfv;
+    static unsigned int nr_tmp;
+
+    MemoryRegion *system_memory = get_system_memory();
+    Error *err = NULL;
+    const char *name;
+
+    /* Error out if the section might overlap other structures. */
+    if (entry->address < 4 * GiB - 16 * MiB) {
+        error_report("TDVF type %u address 0x%" PRIx64 " in PCI hole",
+                     entry->type, entry->address);
+        exit(1);
+    }
+
+    if (entry->type == TDVF_SECTION_TYPE_BFV) {
+        name = g_strdup("tdvf.bfv");
+    } else if (entry->type == TDVF_SECTION_TYPE_CFV) {
+        name = g_strdup_printf("tdvf.cfv%u", nr_cfv++);
+    } else if (entry->type == TDVF_SECTION_TYPE_TD_HOB) {
+        name = g_strdup("tdvf.hob");
+    } else if (entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
+        name = g_strdup_printf("tdvf.tmp%u", nr_tmp++);
+    } else {
+        error_report("TDVF type %u unknown/unsupported", entry->type);
+        exit(1);
+    }
+    entry->mr = g_malloc(sizeof(*entry->mr));
+
+    memory_region_init_ram(entry->mr, NULL, name, entry->size, &err);
+    if (err) {
+        error_report_err(err);
+        exit(1);
+    }
+
+    entry->mem_ptr = memory_region_get_ram_ptr(entry->mr);
+    if (entry->data_len) {
+        /*
+         * The memory_region api doesn't allow partial file mapping, create
+         * ram and copy the contents
+         */
+        if (lseek(fd, entry->data_offset, SEEK_SET) != entry->data_offset) {
+            error_report("can't seek to 0x%x %s", entry->data_offset, filename);
+            exit(1);
+        }
+        if (read(fd, entry->mem_ptr, entry->data_len) != entry->data_len) {
+            error_report("can't read 0x%x %s", entry->data_len, filename);
+            exit(1);
+        }
+    }
+
+    memory_region_add_subregion(system_memory, entry->address, entry->mr);
+
+    if (entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
+        e820_add_entry(entry->address, entry->size, E820_RESERVED);
+    }
+}
+
+static void tdvf_parse_section_entry(TdxFirmwareEntry *entry,
+                                     const TdvfSectionEntry *src,
+                                     uint64_t file_size)
+{
+    entry->data_offset = le32_to_cpu(src->DataOffset);
+    entry->data_len = le32_to_cpu(src->RawDataSize);
+    entry->address = le64_to_cpu(src->MemoryAddress);
+    entry->size = le64_to_cpu(src->MemoryDataSize);
+    entry->type = le32_to_cpu(src->Type);
+    entry->attributes = le32_to_cpu(src->Attributes);
+
+    /* sanity check */
+    if (entry->data_offset + entry->data_len > file_size) {
+        error_report("too large section: DataOffset 0x%x RawDataSize 0x%x",
+                     entry->data_offset, entry->data_len);
+        exit(1);
+    }
+    if (entry->size < entry->data_len) {
+        error_report("broken metadata RawDataSize 0x%x MemoryDataSize 0x%"
+                     PRIx64, entry->data_len, entry->size);
+        exit(1);
+    }
+    if (!QEMU_IS_ALIGNED(entry->address, TARGET_PAGE_SIZE)) {
+        error_report("MemoryAddress 0x%" PRIx64 " not page aligned", entry->address);
+        exit(1);
+    }
+    if (!QEMU_IS_ALIGNED(entry->size, TARGET_PAGE_SIZE)) {
+        error_report("MemoryDataSize 0x%" PRIx64 " not page aligned", entry->size);
+        exit(1);
+    }
+    if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
+        entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
+        if (entry->data_len > 0) {
+            error_report("%u section with RawDataSize 0x%x > 0",
+                         entry->type, entry->data_len);
+            exit(1);
+        }
+    }
+}
+
+static void tdvf_parse_metadata_entries(int fd, TdxFirmware *fw,
+                                        TdvfMetadata *metadata)
+{
+
+    TdvfSectionEntry *sections;
+    ssize_t entries_size;
+    uint32_t len, i;
+
+    fw->nr_entries = le32_to_cpu(metadata->NumberOfSectionEntries);
+    if (fw->nr_entries < 2) {
+        error_report("Invalid number of entries (%u) in TDVF", fw->nr_entries);
+        exit(1);
+    }
+
+    len = le32_to_cpu(metadata->Length);
+    entries_size = fw->nr_entries * sizeof(TdvfSectionEntry);
+    if (len != sizeof(*metadata) + entries_size) {
+        error_report("TDVF metadata len (0x%x) mismatch, expected (0x%x)",
+                     len, (uint32_t)(sizeof(*metadata) + entries_size));
+        exit(1);
+    }
+
+    fw->entries = g_new(TdxFirmwareEntry, fw->nr_entries);
+    sections = g_new(TdvfSectionEntry, fw->nr_entries);
+
+    if (read(fd, sections, entries_size) != entries_size)  {
+        error_report("Failed to read TDVF section entries");
+        exit(1);
+    }
+
+    for (i = 0; i < fw->nr_entries; i++) {
+        tdvf_parse_section_entry(&fw->entries[i], &sections[i], fw->file_size);
+    }
+    g_free(sections);
+}
+
+static int tdvf_parse_metadata_header(int fd, TdvfMetadata *metadata)
+{
+    uint32_t offset;
+    int64_t size;
+
+    size = lseek(fd, 0, SEEK_END);
+    if (size < TDVF_METDATA_OFFSET_FROM_END || (uint32_t)size != size) {
+        return -1;
+    }
+
+    /* Chase the metadata pointer to get to the actual metadata. */
+    offset = size - TDVF_METDATA_OFFSET_FROM_END;
+    if (lseek(fd, offset, SEEK_SET) != offset) {
+        return -1;
+    }
+    if (read(fd, &offset, sizeof(offset)) != sizeof(offset)) {
+        return -1;
+    }
+
+    offset = le32_to_cpu(offset);
+    if (offset > size - sizeof(*metadata)) {
+        return -1;
+    }
+
+    /* Pointer to the metadata has been resolved, read the actual metadata. */
+    if (lseek(fd, offset, SEEK_SET) != offset) {
+        return -1;
+    }
+    if (read(fd, metadata, sizeof(*metadata)) != sizeof(*metadata)) {
+        return -1;
+    }
+
+    /* Finally, verify the signature to determine if this is a TDVF image. */
+    if (metadata->Signature[0] != 'T' || metadata->Signature[1] != 'D' ||
+        metadata->Signature[2] != 'V' || metadata->Signature[3] != 'F') {
+        return -1;
+    }
+
+    /* Sanity check that the TDVF doesn't overlap its own metadata. */
+    metadata->Length = le32_to_cpu(metadata->Length);
+    if (metadata->Length > size - offset) {
+        return -1;
+    }
+
+    /* Only version 1 is supported/defined. */
+    metadata->Version = le32_to_cpu(metadata->Version);
+    if (metadata->Version != 1) {
+        return -1;
+    }
+
+    return size;
+}
+
+int load_tdvf(const char *filename)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    X86MachineState *x86ms = X86_MACHINE(ms);
+    TdxFirmwareEntry *entry;
+    TdvfMetadata metadata;
+    TdxGuest *tdx;
+    TdxFirmware *fw;
+    int64_t size;
+    int fd;
+
+    if (!kvm_enabled()) {
+        return -1;
+    }
+
+    tdx = (void *)object_dynamic_cast(OBJECT(ms->cgs), TYPE_TDX_GUEST);
+    if (!tdx) {
+        return -1;
+    }
+
+    fd = open(filename, O_RDONLY | O_BINARY);
+    if (fd < 0) {
+        return -1;
+    }
+
+    size = tdvf_parse_metadata_header(fd, &metadata);
+    if (size < 0) {
+        close(fd);
+        return -1;
+    }
+
+    /* Error out if the user is attempting to load multiple TDVFs. */
+    fw = &tdx->fw;
+    if (fw->file_name) {
+        error_report("tdvf can only be specified once.");
+        exit(1);
+    }
+
+    fw->file_size = size;
+    fw->file_name = g_strdup(filename);
+
+    tdvf_parse_metadata_entries(fd, fw, &metadata);
+
+    for_each_fw_entry(fw, entry) {
+        if (entry->address < x86ms->below_4g_mem_size ||
+            entry->address > 4 * GiB) {
+            tdvf_init_ram_memory(ms, entry);
+        } else {
+            tdvf_init_bios_memory(fd, filename, entry);
+        }
+    }
+
+    close(fd);
+    return 0;
+}
diff --git a/include/sysemu/tdvf.h b/include/sysemu/tdvf.h
new file mode 100644
index 0000000000..0cf085e3ae
--- /dev/null
+++ b/include/sysemu/tdvf.h
@@ -0,0 +1,6 @@
+#ifndef QEMU_TDVF_H
+#define QEMU_TDVF_H
+
+int load_tdvf(const char *filename);
+
+#endif
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 844d24aade..2fed27b3fb 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -5,6 +5,30 @@
 #include "qapi/error.h"
 #include "exec/confidential-guest-support.h"
 
+typedef struct TdxFirmwareEntry {
+    uint32_t data_offset;
+    uint32_t data_len;
+    uint64_t address;
+    uint64_t size;
+    uint32_t type;
+    uint32_t attributes;
+
+    MemoryRegion *mr;
+    void *mem_ptr;
+} TdxFirmwareEntry;
+
+typedef struct TdxFirmware {
+    const char *file_name;
+    uint64_t file_size;
+
+    /* metadata */
+    uint32_t nr_entries;
+    TdxFirmwareEntry *entries;
+} TdxFirmware;
+
+#define for_each_fw_entry(fw, e)                                        \
+    for (e = (fw)->entries; e != (fw)->entries + (fw)->nr_entries; e++)
+
 #define TYPE_TDX_GUEST "tdx-guest"
 #define TDX_GUEST(obj)     \
     OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
@@ -20,6 +44,8 @@ typedef struct TdxGuest {
 
     bool initialized;
     bool debug;
+
+    TdxFirmware fw;
 } TdxGuest;
 
 int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
-- 
2.25.1
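
For illustration only: a minimal, standalone sketch of the generic
per-section checks done by tdvf_parse_section_entry() above (raw data
fits in the file, memory size covers the raw data, address and size are
page aligned). The struct, the function name and the 4 KiB page size are
assumptions made for this sketch and are not part of the series. The
sum is widened to 64 bits before the comparison so that two 32-bit
fields cannot wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Hypothetical, simplified stand-in for a parsed section entry. */
struct section {
    uint32_t data_offset;   /* offset of the raw data within the image */
    uint32_t data_len;      /* bytes of raw data present in the image */
    uint64_t address;       /* guest physical address of the section */
    uint64_t size;          /* bytes of guest memory the section covers */
};

/* Return true if the entry passes the generic sanity checks. */
static bool section_is_sane(const struct section *s, uint64_t file_size)
{
    /* Widen before adding so two 32-bit values cannot wrap around. */
    if ((uint64_t)s->data_offset + s->data_len > file_size) {
        return false;
    }
    /* The in-memory size must be able to hold the raw data. */
    if (s->size < s->data_len) {
        return false;
    }
    /* Both the address and the size must be page aligned. */
    if (s->address % SKETCH_PAGE_SIZE || s->size % SKETCH_PAGE_SIZE) {
        return false;
    }
    return true;
}
```

Unlike the patch, the sketch reports failure to the caller instead of
calling exit(1), which makes it easier to exercise in isolation.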



^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 21/44] i386/tdx: Create the TD HOB list upon machine init done
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Isaku Yamahata <isaku.yamahata@intel.com>

Build the TD HOB during machine late initialization, i.e. once guest
memory is fully defined.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
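For illustration only: the HOB list in this patch is built with a small
bump allocator over the TD_HOB section (tdvf_get_area() plus
tdvf_align()). Below is a minimal standalone sketch of that technique.
The struct and function names are hypothetical, and the sketch returns
NULL on overrun where the patch calls exit(1).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor over the TD_HOB section; names are illustrative. */
struct hob_cursor {
    uint64_t guest_base;  /* guest physical address of the section */
    uint8_t *base;        /* host mapping of the section */
    size_t used;          /* bytes handed out so far, always <= capacity */
    size_t capacity;      /* total section size in bytes */
};

/* Reserve 'size' bytes, or return NULL if the section would overflow. */
static void *hob_alloc(struct hob_cursor *c, size_t size)
{
    void *ret;

    if (size > c->capacity - c->used) {
        return NULL;
    }
    ret = c->base + c->used;

    /* Advance, then round the cursor up to the next 8-byte boundary. */
    c->used = (c->used + size + 7) & ~(size_t)7;
    if (c->used > c->capacity) {
        /* Only possible when capacity is not a multiple of 8. */
        c->used = c->capacity;
    }
    return ret;
}

/* Guest physical address that corresponds to the current cursor. */
static uint64_t hob_guest_addr(const struct hob_cursor *c)
{
    return c->guest_base + c->used;
}
```

Tracking an offset rather than a raw 'current' pointer keeps the
overrun check free of pointer arithmetic past the end of the buffer.
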
 hw/i386/meson.build   |   2 +-
 hw/i386/tdvf-hob.c    | 166 ++++++++++++++++++++++++++++++++++++++++++
 hw/i386/tdvf-hob.h    |  20 +++++
 target/i386/kvm/tdx.c |  19 +++++
 4 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 hw/i386/tdvf-hob.c
 create mode 100644 hw/i386/tdvf-hob.h

diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 945e805525..8175c3c638 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -24,7 +24,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
   'pc_sysfw.c',
   'acpi-build.c',
   'port92.c'))
-i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c'))
+i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c', 'tdvf-hob.c'))
 
 subdir('kvm')
 subdir('xen')
diff --git a/hw/i386/tdvf-hob.c b/hw/i386/tdvf-hob.c
new file mode 100644
index 0000000000..5e0bf807f7
--- /dev/null
+++ b/hw/i386/tdvf-hob.c
@@ -0,0 +1,166 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata <isaku.yamahata at gmail.com>
+ *                        <isaku.yamahata at intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "e820_memory_layout.h"
+#include "hw/i386/x86.h"
+#include "sysemu/tdx.h"
+#include "tdvf-hob.h"
+#include "uefi.h"
+
+typedef struct TdvfHob {
+    hwaddr hob_addr;
+    void *ptr;
+    int size;
+
+    /* working area */
+    void *current;
+    void *end;
+} TdvfHob;
+
+static uint64_t tdvf_current_guest_addr(const TdvfHob *hob)
+{
+    return hob->hob_addr + (hob->current - hob->ptr);
+}
+
+static void tdvf_align(TdvfHob *hob, size_t align)
+{
+    hob->current = QEMU_ALIGN_PTR_UP(hob->current, align);
+}
+
+static void *tdvf_get_area(TdvfHob *hob, uint64_t size)
+{
+    void *ret;
+
+    if (hob->current + size > hob->end) {
+        error_report("TD_HOB overrun, size = 0x%" PRIx64, size);
+        exit(1);
+    }
+
+    ret = hob->current;
+    hob->current += size;
+    tdvf_align(hob, 8);
+    return ret;
+}
+
+static int tdvf_e820_compare(const void *lhs_, const void *rhs_)
+{
+    const struct e820_entry *lhs = lhs_;
+    const struct e820_entry *rhs = rhs_;
+
+    if (lhs->address == rhs->address) {
+        return 0;
+    }
+    if (le64_to_cpu(lhs->address) > le64_to_cpu(rhs->address)) {
+        return 1;
+    }
+    return -1;
+}
+
+static void tdvf_hob_add_memory_resources(TdvfHob *hob)
+{
+    EFI_HOB_RESOURCE_DESCRIPTOR *region;
+    EFI_RESOURCE_ATTRIBUTE_TYPE attr;
+    EFI_RESOURCE_TYPE resource_type;
+
+    struct e820_entry *e820_entries, *e820_entry;
+    int nr_e820_entries, i;
+
+    nr_e820_entries = e820_get_num_entries();
+    e820_entries = g_new(struct e820_entry, nr_e820_entries);
+
+    /* Copy and sort the e820 tables to add them to the HOB. */
+    memcpy(e820_entries, e820_table,
+           nr_e820_entries * sizeof(struct e820_entry));
+    qsort(e820_entries, nr_e820_entries, sizeof(struct e820_entry),
+          &tdvf_e820_compare);
+
+    for (i = 0; i < nr_e820_entries; i++) {
+        e820_entry = &e820_entries[i];
+
+        if (le32_to_cpu(e820_entry->type) == E820_RAM) {
+            resource_type = EFI_RESOURCE_SYSTEM_MEMORY;
+            attr = EFI_RESOURCE_ATTRIBUTE_TDVF_UNACCEPTED;
+        } else {
+            resource_type = EFI_RESOURCE_MEMORY_RESERVED;
+            attr = EFI_RESOURCE_ATTRIBUTE_TDVF_PRIVATE;
+        }
+
+        region = tdvf_get_area(hob, sizeof(*region));
+        *region = (EFI_HOB_RESOURCE_DESCRIPTOR) {
+            .Header = {
+                .HobType = EFI_HOB_TYPE_RESOURCE_DESCRIPTOR,
+                .HobLength = cpu_to_le16(sizeof(*region)),
+                .Reserved = cpu_to_le32(0),
+            },
+            .Owner = EFI_HOB_OWNER_ZERO,
+            .ResourceType = cpu_to_le32(resource_type),
+            .ResourceAttribute = cpu_to_le32(attr),
+            .PhysicalStart = e820_entry->address,
+            .ResourceLength = e820_entry->length,
+        };
+    }
+
+    g_free(e820_entries);
+}
+
+void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *hob_entry)
+{
+    TdvfHob hob = {
+        .hob_addr = hob_entry->address,
+        .ptr = hob_entry->mem_ptr,
+        .size = hob_entry->size,
+
+        .current = hob_entry->mem_ptr,
+        .end = hob_entry->mem_ptr + hob_entry->size,
+    };
+
+    EFI_HOB_GENERIC_HEADER *last_hob;
+    EFI_HOB_HANDOFF_INFO_TABLE *hit;
+
+    /* Note, Efi{Free}Memory{Bottom,Top} are ignored, leave 'em zeroed. */
+    hit = tdvf_get_area(&hob, sizeof(*hit));
+    *hit = (EFI_HOB_HANDOFF_INFO_TABLE) {
+        .Header = {
+            .HobType = EFI_HOB_TYPE_HANDOFF,
+            .HobLength = cpu_to_le16(sizeof(*hit)),
+            .Reserved = cpu_to_le32(0),
+        },
+        .Version = cpu_to_le32(EFI_HOB_HANDOFF_TABLE_VERSION),
+        .BootMode = cpu_to_le32(0),
+        .EfiMemoryTop = cpu_to_le64(0),
+        .EfiMemoryBottom = cpu_to_le64(0),
+        .EfiFreeMemoryTop = cpu_to_le64(0),
+        .EfiFreeMemoryBottom = cpu_to_le64(0),
+        .EfiEndOfHobList = cpu_to_le64(0), /* initialized later */
+    };
+
+    tdvf_hob_add_memory_resources(&hob);
+
+    last_hob = tdvf_get_area(&hob, sizeof(*last_hob));
+    *last_hob = (EFI_HOB_GENERIC_HEADER) {
+        .HobType = EFI_HOB_TYPE_END_OF_HOB_LIST,
+        .HobLength = cpu_to_le16(sizeof(*last_hob)),
+        .Reserved = cpu_to_le32(0),
+    };
+    hit->EfiEndOfHobList = cpu_to_le64(tdvf_current_guest_addr(&hob));
+}
diff --git a/hw/i386/tdvf-hob.h b/hw/i386/tdvf-hob.h
new file mode 100644
index 0000000000..c6c5c1d564
--- /dev/null
+++ b/hw/i386/tdvf-hob.h
@@ -0,0 +1,20 @@
+#ifndef HW_I386_TD_HOB_H
+#define HW_I386_TD_HOB_H
+
+#include "hw/i386/tdvf.h"
+#include "target/i386/kvm/tdx.h"
+
+void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *hob_entry);
+
+#define EFI_RESOURCE_ATTRIBUTE_TDVF_PRIVATE     \
+    (EFI_RESOURCE_ATTRIBUTE_PRESENT |           \
+     EFI_RESOURCE_ATTRIBUTE_INITIALIZED |       \
+     EFI_RESOURCE_ATTRIBUTE_ENCRYPTED |         \
+     EFI_RESOURCE_ATTRIBUTE_TESTED)
+
+#define EFI_RESOURCE_ATTRIBUTE_TDVF_UNACCEPTED  \
+    (EFI_RESOURCE_ATTRIBUTE_PRESENT |           \
+     EFI_RESOURCE_ATTRIBUTE_INITIALIZED |       \
+     EFI_RESOURCE_ATTRIBUTE_UNACCEPTED)
+
+#endif
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 48c04d344d..12b2e02fa2 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -19,6 +19,7 @@
 #include "cpu.h"
 #include "kvm_i386.h"
 #include "hw/boards.h"
+#include "hw/i386/tdvf-hob.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
 #include "standard-headers/asm-x86/kvm_para.h"
@@ -67,8 +68,26 @@ static void __tdx_ioctl(void *state, int ioctl_no, const char *ioctl_name,
 #define tdx_ioctl(ioctl_no, metadata, data) \
         _tdx_ioctl(kvm_state, ioctl_no, metadata, data)
 
+static TdxFirmwareEntry *tdx_get_hob_entry(TdxGuest *tdx)
+{
+    TdxFirmwareEntry *entry;
+
+    for_each_fw_entry(&tdx->fw, entry) {
+        if (entry->type == TDVF_SECTION_TYPE_TD_HOB) {
+            return entry;
+        }
+    }
+    error_report("TDVF metadata doesn't specify TD_HOB location.");
+    exit(1);
+}
+
 static void tdx_finalize_vm(Notifier *notifier, void *unused)
 {
+    MachineState *ms = MACHINE(qdev_get_machine());
+    TdxGuest *tdx = TDX_GUEST(ms->cgs);
+
+    tdvf_hob_create(tdx, tdx_get_hob_entry(tdx));
+
     tdx_ioctl(KVM_TDX_FINALIZE_VM, 0, NULL);
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 22/44] i386/tdx: Add TDVF memory via INIT_MEM_REGION
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add, and optionally measure, TDVF memory via KVM_TDX_INIT_MEM_REGION as
part of finalizing the TD.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
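For illustration only: each TDVF section below is turned into one
KVM_TDX_INIT_MEM_REGION request, with the page count derived from the
section size and the measurement flag derived from the section
attributes. A minimal standalone sketch of that mapping follows. The
struct, the function name and the SKETCH_* constants are placeholders
invented here; they are not the real kvm_tdx_init_mem_region layout or
the real flag values.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE     4096u      /* 4 KiB pages, as in the patch */
#define SKETCH_ATTR_EXTENDMR (1u << 0)  /* placeholder attribute bit */
#define SKETCH_MEASURE       1u         /* placeholder "measure" flag */

/* Hypothetical request: where the pages go and whether to measure them. */
struct region_request {
    uint64_t gpa;       /* guest physical address of the first page */
    uint64_t nr_pages;  /* number of 4 KiB pages to add */
    uint32_t flags;     /* SKETCH_MEASURE or 0 */
};

/* Build a request from a section's address, size and attributes. */
static struct region_request make_request(uint64_t gpa, uint64_t size,
                                          uint32_t attributes)
{
    struct region_request req = {
        .gpa = gpa,
        .nr_pages = size / SKETCH_PAGE_SIZE,
        .flags = (attributes & SKETCH_ATTR_EXTENDMR) ? SKETCH_MEASURE : 0,
    };

    return req;
}
```

The division is exact only because section sizes are validated as page
aligned when the metadata is parsed.
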
 target/i386/kvm/tdx.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 12b2e02fa2..0cd649dd01 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -85,10 +85,26 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
     TdxGuest *tdx = TDX_GUEST(ms->cgs);
+    TdxFirmwareEntry *entry;
 
     tdvf_hob_create(tdx, tdx_get_hob_entry(tdx));
 
+    for_each_fw_entry(&tdx->fw, entry) {
+        struct kvm_tdx_init_mem_region mem_region = {
+            .source_addr = (__u64)entry->mem_ptr,
+            .gpa = entry->address,
+            .nr_pages = entry->size / 4096,
+        };
+
+        __u32 metadata = entry->attributes & TDVF_SECTION_ATTRIBUTES_EXTENDMR ?
+                         KVM_TDX_MEASURE_MEMORY_REGION : 0;
+
+        tdx_ioctl(KVM_TDX_INIT_MEM_REGION, metadata, &mem_region);
+    }
+
     tdx_ioctl(KVM_TDX_FINALIZE_VM, 0, NULL);
+
+    tdx->parent_obj.ready = true;
 }
 
 static Notifier tdx_machine_done_late_notify = {
@@ -301,7 +317,6 @@ static void tdx_guest_init(Object *obj)
 {
     TdxGuest *tdx = TDX_GUEST(obj);
 
-    tdx->parent_obj.ready = true;
     qemu_mutex_init(&tdx->lock);
 
     tdx->debug = false;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 23/44] i386/tdx: Use KVM_TDX_INIT_VCPU to pass HOB to TDVF
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Specify the initial value for RCX/R8 to be the address of the HOB.
Don't propagate the value to QEMU's cache of the registers, so as to
avoid implying that the register state is valid, e.g. QEMU doesn't model
TDX-SEAM behavior for initializing other GPRs.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
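For illustration only: the value handed to KVM_TDX_INIT_VCPU below is
the guest address of the TD_HOB section, found by scanning the parsed
firmware entries for a matching type. A minimal standalone sketch of
that lookup follows. The struct, the function name and the numeric type
value are placeholders invented here, and the sketch returns NULL where
tdx_get_hob_entry() calls exit(1).

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TYPE_TD_HOB 2u  /* placeholder section type value */

/* Hypothetical, reduced firmware entry: only the fields used here. */
struct fw_entry {
    uint64_t address;  /* guest physical address of the section */
    uint32_t type;     /* section type from the metadata */
};

/* Return the first entry of the given type, or NULL if there is none. */
static const struct fw_entry *find_entry(const struct fw_entry *entries,
                                         size_t n, uint32_t type)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i].type == type) {
            return &entries[i];
        }
    }
    return NULL;
}
```
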
 target/i386/kvm/tdx.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 0cd649dd01..c348626dbf 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -285,10 +285,17 @@ out:
 
 void tdx_post_init_vcpu(CPUState *cpu)
 {
-    CPUX86State *env = &X86_CPU(cpu)->env;
+    MachineState *ms = MACHINE(qdev_get_machine());
+    TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
+                                                    TYPE_TDX_GUEST);
+    TdxFirmwareEntry *hob;
+
+    if (!tdx) {
+        return;
+    }
 
-    _tdx_ioctl(cpu, KVM_TDX_INIT_VCPU, 0,
-               (void *)(unsigned long)env->regs[R_ECX]);
+    hob = tdx_get_hob_entry(tdx);
+    _tdx_ioctl(cpu, KVM_TDX_INIT_VCPU, 0, (void *)hob->address);
 }
 
 static bool tdx_guest_get_debug(Object *obj, Error **errp)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 24/44] i386/tdx: Add MMIO HOB entries
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add MMIO HOB entries, which are needed to enumerate legal MMIO ranges to
early TDVF.

Note, the attribute absolutely must include UNCACHEABLE, else TDVF will
effectively consider it a bad HOB entry and ignore it.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
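For illustration only: tdvf_hob_add_mmio_resource() below describes
each MMIO window as a half-open range [start, end) and stores it as a
start plus a length, skipping windows whose start is zero. A minimal
standalone sketch of that conversion follows. The struct and function
name are hypothetical, and the extra 'end <= start' guard is an
assumption added in the sketch, not something the patch checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical start/length pair, as stored in a resource descriptor. */
struct mmio_range {
    uint64_t start;   /* first byte of the window */
    uint64_t length;  /* size of the window in bytes */
};

/* Fill 'out' for [start, end); return false if the window is skipped. */
static bool mmio_range_init(struct mmio_range *out, uint64_t start,
                            uint64_t end)
{
    /* start == 0 means "not configured"; an empty range is also skipped. */
    if (start == 0 || end <= start) {
        return false;
    }
    out->start = start;
    out->length = end - start;
    return true;
}
```

For the first window added below, with 2 GiB of low RAM and the local
APIC at 0xfee00000, the range would be [0x80000000, 0xfee00000) and the
length 0x7ee00000.
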
 hw/i386/tdvf-hob.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++
 hw/i386/tdvf-hob.h |  5 ++++
 2 files changed, 74 insertions(+)

diff --git a/hw/i386/tdvf-hob.c b/hw/i386/tdvf-hob.c
index 5e0bf807f7..60c5ed0e03 100644
--- a/hw/i386/tdvf-hob.c
+++ b/hw/i386/tdvf-hob.c
@@ -22,7 +22,10 @@
 #include "qemu/osdep.h"
 #include "qemu/log.h"
 #include "e820_memory_layout.h"
+#include "hw/i386/pc.h"
 #include "hw/i386/x86.h"
+#include "hw/pci/pci_host.h"
+#include "hw/pci/pcie_host.h"
 #include "sysemu/tdx.h"
 #include "tdvf-hob.h"
 #include "uefi.h"
@@ -62,6 +65,70 @@ static void *tdvf_get_area(TdvfHob *hob, uint64_t size)
     return ret;
 }
 
+static void tdvf_hob_add_mmio_resource(TdvfHob *hob, uint64_t start,
+                                       uint64_t end)
+{
+    EFI_HOB_RESOURCE_DESCRIPTOR *region;
+
+    if (!start) {
+        return;
+    }
+
+    region = tdvf_get_area(hob, sizeof(*region));
+    *region = (EFI_HOB_RESOURCE_DESCRIPTOR) {
+        .Header = {
+            .HobType = EFI_HOB_TYPE_RESOURCE_DESCRIPTOR,
+            .HobLength = cpu_to_le16(sizeof(*region)),
+            .Reserved = cpu_to_le32(0),
+        },
+        .Owner = EFI_HOB_OWNER_ZERO,
+        .ResourceType = cpu_to_le32(EFI_RESOURCE_MEMORY_MAPPED_IO),
+        .ResourceAttribute = cpu_to_le32(EFI_RESOURCE_ATTRIBUTE_TDVF_MMIO),
+        .PhysicalStart = cpu_to_le64(start),
+        .ResourceLength = cpu_to_le64(end - start),
+    };
+}
+
+static void tdvf_hob_add_mmio_resources(TdvfHob *hob)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    X86MachineState *x86ms = X86_MACHINE(ms);
+    PCIHostState *pci_host;
+    uint64_t start, end;
+    uint64_t mcfg_base, mcfg_size;
+    Object *host;
+
+    /* Effectively PCI hole + other MMIO devices. */
+    tdvf_hob_add_mmio_resource(hob, x86ms->below_4g_mem_size,
+                               APIC_DEFAULT_ADDRESS);
+
+    /* Stolen from acpi_get_i386_pci_host(), there's gotta be an easier way. */
+    pci_host = OBJECT_CHECK(PCIHostState,
+                            object_resolve_path("/machine/i440fx", NULL),
+                            TYPE_PCI_HOST_BRIDGE);
+    if (!pci_host) {
+        pci_host = OBJECT_CHECK(PCIHostState,
+                                object_resolve_path("/machine/q35", NULL),
+                                TYPE_PCI_HOST_BRIDGE);
+    }
+    g_assert(pci_host);
+
+    host = OBJECT(pci_host);
+
+    /* PCI hole above 4gb. */
+    start = object_property_get_uint(host, PCI_HOST_PROP_PCI_HOLE64_START,
+                                     NULL);
+    end = object_property_get_uint(host, PCI_HOST_PROP_PCI_HOLE64_END, NULL);
+    tdvf_hob_add_mmio_resource(hob, start, end);
+
+    /* MMCFG region */
+    mcfg_base = object_property_get_uint(host, PCIE_HOST_MCFG_BASE, NULL);
+    mcfg_size = object_property_get_uint(host, PCIE_HOST_MCFG_SIZE, NULL);
+    if (mcfg_base && mcfg_base != PCIE_BASE_ADDR_UNMAPPED && mcfg_size) {
+        tdvf_hob_add_mmio_resource(hob, mcfg_base, mcfg_base + mcfg_size);
+    }
+}
+
 static int tdvf_e820_compare(const void *lhs_, const void* rhs_)
 {
     const struct e820_entry *lhs = lhs_;
@@ -156,6 +223,8 @@ void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *hob_entry)
 
     tdvf_hob_add_memory_resources(&hob);
 
+    tdvf_hob_add_mmio_resources(&hob);
+
     last_hob = tdvf_get_area(&hob, sizeof(*last_hob));
     *last_hob =  (EFI_HOB_GENERIC_HEADER) {
         .HobType = EFI_HOB_TYPE_END_OF_HOB_LIST,
diff --git a/hw/i386/tdvf-hob.h b/hw/i386/tdvf-hob.h
index c6c5c1d564..9967dbfe5a 100644
--- a/hw/i386/tdvf-hob.h
+++ b/hw/i386/tdvf-hob.h
@@ -17,4 +17,9 @@ void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *hob_entry);
      EFI_RESOURCE_ATTRIBUTE_INITIALIZED |       \
      EFI_RESOURCE_ATTRIBUTE_UNACCEPTED)
 
+#define EFI_RESOURCE_ATTRIBUTE_TDVF_MMIO        \
+    (EFI_RESOURCE_ATTRIBUTE_PRESENT     |       \
+     EFI_RESOURCE_ATTRIBUTE_INITIALIZED |       \
+     EFI_RESOURCE_ATTRIBUTE_UNCACHEABLE)
+
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 25/44] q35: Move PCIe BAR check above PAM check in mch_write_config()
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Process PCIe BAR before PAM so that a future patch can skip all the
SMM-related crud with a single check-and-return.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
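Reviewer note: a minimal standalone sketch of why the ordering matters.
Once the non-SMM checks come first, one early return can skip everything
SMM-related.  The `demo_*` names, the `has_smm` flag and the register
offsets and sizes below are illustrative assumptions, not values taken from
this patch; the overlap helper is a standalone re-implementation of the
usual inclusive-bounds test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative config-space offsets and sizes (assumptions). */
#define DEMO_PCIEXBAR       0x60
#define DEMO_PCIEXBAR_SIZE  8
#define DEMO_PAM0           0x90
#define DEMO_PAM_SIZE       7
#define DEMO_SMRAM          0x9d
#define DEMO_SMRAM_SIZE     1

enum {
    DEMO_UPDATE_PCIEXBAR = 1 << 0,
    DEMO_UPDATE_PAM      = 1 << 1,
    DEMO_UPDATE_SMRAM    = 1 << 2,
};

/* True when the two ranges intersect; both lengths must be non-zero. */
static bool demo_ranges_overlap(uint64_t first1, uint64_t len1,
                                uint64_t first2, uint64_t len2)
{
    uint64_t last1 = first1 + len1 - 1;
    uint64_t last2 = first2 + len2 - 1;

    return !(last2 < first1 || last1 < first2);
}

/* Return a bitmask of the updates a config write would trigger. */
static unsigned demo_write_config_updates(uint32_t address, uint32_t len,
                                          bool has_smm)
{
    unsigned updates = 0;

    if (demo_ranges_overlap(address, len, DEMO_PCIEXBAR,
                            DEMO_PCIEXBAR_SIZE)) {
        updates |= DEMO_UPDATE_PCIEXBAR;
    }
    if (demo_ranges_overlap(address, len, DEMO_PAM0, DEMO_PAM_SIZE)) {
        updates |= DEMO_UPDATE_PAM;
    }

    /* Hypothetical early return enabled by the reordering. */
    if (!has_smm) {
        return updates;
    }

    if (demo_ranges_overlap(address, len, DEMO_SMRAM, DEMO_SMRAM_SIZE)) {
        updates |= DEMO_UPDATE_SMRAM;
    }
    return updates;
}
```

In this sketch a 2-byte write at 0x9c reports only the SMRAM update when
has_smm is true, and no update at all when it is false.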
 hw/pci-host/q35.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 2eb729dff5..9a2be237d7 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -468,16 +468,16 @@ static void mch_write_config(PCIDevice *d,
 
     pci_default_write_config(d, address, val, len);
 
-    if (ranges_overlap(address, len, MCH_HOST_BRIDGE_PAM0,
-                       MCH_HOST_BRIDGE_PAM_SIZE)) {
-        mch_update_pam(mch);
-    }
-
     if (ranges_overlap(address, len, MCH_HOST_BRIDGE_PCIEXBAR,
                        MCH_HOST_BRIDGE_PCIEXBAR_SIZE)) {
         mch_update_pciexbar(mch);
     }
 
+    if (ranges_overlap(address, len, MCH_HOST_BRIDGE_PAM0,
+                       MCH_HOST_BRIDGE_PAM_SIZE)) {
+        mch_update_pam(mch);
+    }
+
     if (ranges_overlap(address, len, MCH_HOST_BRIDGE_SMRAM,
                        MCH_HOST_BRIDGE_SMRAM_SIZE)) {
         mch_update_smram(mch);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 26/44] pci-host/q35: Move PAM initialization above SMRAM initialization
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

In mch_realize(), process PAM initialization before SMRAM initialization so
that a later patch can skip all the SMRAM-related setup with a single check.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/pci-host/q35.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 9a2be237d7..68234d209c 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -571,6 +571,16 @@ static void mch_realize(PCIDevice *d, Error **errp)
     pc_pci_as_mapping_init(OBJECT(mch), mch->system_memory,
                            mch->pci_address_space);
 
+    /* PAM */
+    init_pam(DEVICE(mch), mch->ram_memory, mch->system_memory,
+             mch->pci_address_space, &mch->pam_regions[0],
+             PAM_BIOS_BASE, PAM_BIOS_SIZE);
+    for (i = 0; i < ARRAY_SIZE(mch->pam_regions) - 1; ++i) {
+        init_pam(DEVICE(mch), mch->ram_memory, mch->system_memory,
+                 mch->pci_address_space, &mch->pam_regions[i+1],
+                 PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
+    }
+
     /* if *disabled* show SMRAM to all CPUs */
     memory_region_init_alias(&mch->smram_region, OBJECT(mch), "smram-region",
                              mch->pci_address_space, MCH_HOST_BRIDGE_SMRAM_C_BASE,
@@ -637,15 +647,6 @@ static void mch_realize(PCIDevice *d, Error **errp)
 
     object_property_add_const_link(qdev_get_machine(), "smram",
                                    OBJECT(&mch->smram));
-
-    init_pam(DEVICE(mch), mch->ram_memory, mch->system_memory,
-             mch->pci_address_space, &mch->pam_regions[0],
-             PAM_BIOS_BASE, PAM_BIOS_SIZE);
-    for (i = 0; i < ARRAY_SIZE(mch->pam_regions) - 1; ++i) {
-        init_pam(DEVICE(mch), mch->ram_memory, mch->system_memory,
-                 mch->pci_address_space, &mch->pam_regions[i+1],
-                 PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
-    }
 }
 
 uint64_t mch_mcfg_base(void)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 27/44] q35: Introduce smm_ranges property for q35-pci-host
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Isaku Yamahata, Sean Christopherson

From: Isaku Yamahata <isaku.yamahata@linux.intel.com>

Add a q35 property that indicates whether SMM ranges (e.g. SMRAM, TSEG)
exist for the target platform.  TDX doesn't support SMM and doesn't play
nice with QEMU modifying the related guest memory ranges.

Signed-off-by: Isaku Yamahata <isaku.yamahata@linux.intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
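Reviewer note: a simplified model, under stated assumptions, of one
consequence of gating the reset defaults on the new flag.  If the SMM
register bytes and their write masks start out zeroed (an assumption about
the device's initial state) and reset never installs a mask, a masked
config write cannot change them.  The `demo_*` names and the default and
mask values used in the usage line are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of one config-space byte with a write mask. */
struct demo_cfg_byte {
    uint8_t value;
    uint8_t wmask;   /* a set bit means the guest may change that bit */
};

/* Apply a guest write: only bits enabled in wmask can change. */
static void demo_cfg_write(struct demo_cfg_byte *reg, uint8_t val)
{
    reg->value = (uint8_t)((reg->value & ~reg->wmask) | (val & reg->wmask));
}

/*
 * Reset in the style of this patch: the default value and write mask are
 * installed only when SMM ranges exist; otherwise the byte is left as is.
 */
static void demo_cfg_reset(struct demo_cfg_byte *reg, bool has_smm_ranges,
                           uint8_t def, uint8_t wmask)
{
    if (has_smm_ranges) {
        reg->value = def;
        reg->wmask = wmask;
    }
}
```

With a zeroed byte, `demo_cfg_reset(&reg, false, 0x02, 0x1f)` followed by
`demo_cfg_write(&reg, 0xff)` leaves the value at 0, whereas the same
sequence with `true` ends with the value 0x1f.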
 hw/i386/pc_q35.c          |  2 ++
 hw/pci-host/q35.c         | 42 +++++++++++++++++++++++++++------------
 include/hw/i386/pc.h      |  1 +
 include/hw/pci-host/q35.h |  1 +
 4 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 46a0f196f4..1718aa94d9 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -219,6 +219,8 @@ static void pc_q35_init(MachineState *machine)
                             x86ms->below_4g_mem_size, NULL);
     object_property_set_int(OBJECT(q35_host), PCI_HOST_ABOVE_4G_MEM_SIZE,
                             x86ms->above_4g_mem_size, NULL);
+    object_property_set_bool(OBJECT(q35_host), PCI_HOST_PROP_SMM_RANGES,
+                             x86_machine_is_smm_enabled(x86ms), NULL);
     /* pci */
     sysbus_realize_and_unref(SYS_BUS_DEVICE(q35_host), &error_fatal);
     phb = PCI_HOST_BRIDGE(q35_host);
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 68234d209c..ba28d969ba 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -183,6 +183,8 @@ static Property q35_host_props[] = {
                      mch.below_4g_mem_size, 0),
     DEFINE_PROP_SIZE(PCI_HOST_ABOVE_4G_MEM_SIZE, Q35PCIHost,
                      mch.above_4g_mem_size, 0),
+    DEFINE_PROP_BOOL(PCI_HOST_PROP_SMM_RANGES, Q35PCIHost,
+                     mch.has_smm_ranges, true),
     DEFINE_PROP_BOOL("x-pci-hole64-fix", Q35PCIHost, pci_hole64_fix, true),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -218,6 +220,7 @@ static void q35_host_initfn(Object *obj)
     /* mch's object_initialize resets the default value, set it again */
     qdev_prop_set_uint64(DEVICE(s), PCI_HOST_PROP_PCI_HOLE64_SIZE,
                          Q35_PCI_HOST_HOLE64_SIZE_DEFAULT);
+
     object_property_add(obj, PCI_HOST_PROP_PCI_HOLE_START, "uint32",
                         q35_host_get_pci_hole_start,
                         NULL, NULL, NULL);
@@ -478,6 +481,10 @@ static void mch_write_config(PCIDevice *d,
         mch_update_pam(mch);
     }
 
+    if (!mch->has_smm_ranges) {
+        return;
+    }
+
     if (ranges_overlap(address, len, MCH_HOST_BRIDGE_SMRAM,
                        MCH_HOST_BRIDGE_SMRAM_SIZE)) {
         mch_update_smram(mch);
@@ -496,10 +503,13 @@ static void mch_write_config(PCIDevice *d,
 static void mch_update(MCHPCIState *mch)
 {
     mch_update_pciexbar(mch);
+
     mch_update_pam(mch);
-    mch_update_smram(mch);
-    mch_update_ext_tseg_mbytes(mch);
-    mch_update_smbase_smram(mch);
+    if (mch->has_smm_ranges) {
+        mch_update_smram(mch);
+        mch_update_ext_tseg_mbytes(mch);
+        mch_update_smbase_smram(mch);
+    }
 
     /*
      * pci hole goes from end-of-low-ram to io-apic.
@@ -540,18 +550,20 @@ static void mch_reset(DeviceState *qdev)
     pci_set_quad(d->config + MCH_HOST_BRIDGE_PCIEXBAR,
                  MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT);
 
-    d->config[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_DEFAULT;
-    d->config[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_DEFAULT;
-    d->wmask[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_WMASK;
-    d->wmask[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_WMASK;
+    if (mch->has_smm_ranges) {
+        d->config[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_DEFAULT;
+        d->config[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_DEFAULT;
+        d->wmask[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_WMASK;
+        d->wmask[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_WMASK;
 
-    if (mch->ext_tseg_mbytes > 0) {
-        pci_set_word(d->config + MCH_HOST_BRIDGE_EXT_TSEG_MBYTES,
-                     MCH_HOST_BRIDGE_EXT_TSEG_MBYTES_QUERY);
-    }
+        if (mch->ext_tseg_mbytes > 0) {
+            pci_set_word(d->config + MCH_HOST_BRIDGE_EXT_TSEG_MBYTES,
+                         MCH_HOST_BRIDGE_EXT_TSEG_MBYTES_QUERY);
+        }
 
-    d->config[MCH_HOST_BRIDGE_F_SMBASE] = 0;
-    d->wmask[MCH_HOST_BRIDGE_F_SMBASE] = 0xff;
+        d->config[MCH_HOST_BRIDGE_F_SMBASE] = 0;
+        d->wmask[MCH_HOST_BRIDGE_F_SMBASE] = 0xff;
+    }
 
     mch_update(mch);
 }
@@ -581,6 +593,10 @@ static void mch_realize(PCIDevice *d, Error **errp)
                  PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
     }
 
+    if (!mch->has_smm_ranges) {
+        return;
+    }
+
     /* if *disabled* show SMRAM to all CPUs */
     memory_region_init_alias(&mch->smram_region, OBJECT(mch), "smram-region",
                              mch->pci_address_space, MCH_HOST_BRIDGE_SMRAM_C_BASE,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 87294f2632..cd2113c763 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -147,6 +147,7 @@ void pc_guest_info_init(PCMachineState *pcms);
 #define PCI_HOST_PROP_PCI_HOLE64_SIZE  "pci-hole64-size"
 #define PCI_HOST_BELOW_4G_MEM_SIZE     "below-4g-mem-size"
 #define PCI_HOST_ABOVE_4G_MEM_SIZE     "above-4g-mem-size"
+#define PCI_HOST_PROP_SMM_RANGES       "smm-ranges"
 
 
 void pc_pci_as_mapping_init(Object *owner, MemoryRegion *system_memory,
diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
index ab989698ef..ce634e708a 100644
--- a/include/hw/pci-host/q35.h
+++ b/include/hw/pci-host/q35.h
@@ -50,6 +50,7 @@ struct MCHPCIState {
     MemoryRegion tseg_blackhole, tseg_window;
     MemoryRegion smbase_blackhole, smbase_window;
     bool has_smram_at_smbase;
+    bool has_smm_ranges;
     Range pci_hole;
     uint64_t below_4g_mem_size;
     uint64_t above_4g_mem_size;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 28/44] i386/tdx: Force x2apic mode and routing for TDs
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

TDX requires x2apic and "resets" vCPUs to have x2apic enabled.  Model
this in QEMU and unconditionally enable x2apic interrupt routing.

This fixes issues where interrupts from IRQFD would not get forwarded to
the guest due to KVM silently dropping the invalid routing entry.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
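Reviewer note: a minimal standalone sketch of the reset value that the
apic_reset_common() hunk below computes.  The bit positions are the
architectural IA32_APIC_BASE layout (BSP is bit 8, x2APIC enable is bit 10,
global enable is bit 11); the `demo_*` / `DEMO_*` names are hypothetical
stand-ins for the QEMU definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_APIC_DEFAULT_ADDRESS  0xfee00000u
#define DEMO_APICBASE_BSP          (1u << 8)
#define DEMO_APICBASE_EXTD         (1u << 10)  /* x2APIC mode */
#define DEMO_APICBASE_ENABLE       (1u << 11)  /* APIC globally enabled */

/*
 * Compute the APIC base after reset: keep only the BSP flag from the old
 * value, enable the APIC at the default address, and set the x2APIC bit
 * when x2apic is forced.
 */
static uint32_t demo_apic_reset_base(uint32_t old_base, bool force_x2apic)
{
    uint32_t bsp = old_base & DEMO_APICBASE_BSP;
    uint32_t base = DEMO_APIC_DEFAULT_ADDRESS | bsp | DEMO_APICBASE_ENABLE;

    if (force_x2apic) {
        base |= DEMO_APICBASE_EXTD;
    }
    return base;
}
```

For a boot processor with x2apic forced this gives 0xfee00d00; for an
application processor without forcing it gives 0xfee00800.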
 hw/intc/apic_common.c           | 12 ++++++++++++
 include/hw/i386/apic.h          |  1 +
 include/hw/i386/apic_internal.h |  1 +
 target/i386/kvm/tdx.c           |  7 +++++++
 4 files changed, 21 insertions(+)

diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 2a20982066..b95fed95da 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -262,6 +262,15 @@ void apic_designate_bsp(DeviceState *dev, bool bsp)
     }
 }
 
+void apic_force_x2apic(DeviceState *dev)
+{
+    if (dev == NULL) {
+        return;
+    }
+
+    APIC_COMMON(dev)->force_x2apic = true;
+}
+
 static void apic_reset_common(DeviceState *dev)
 {
     APICCommonState *s = APIC_COMMON(dev);
@@ -270,6 +279,9 @@ static void apic_reset_common(DeviceState *dev)
 
     bsp = s->apicbase & MSR_IA32_APICBASE_BSP;
     s->apicbase = APIC_DEFAULT_ADDRESS | bsp | MSR_IA32_APICBASE_ENABLE;
+    if (s->force_x2apic) {
+        s->apicbase |= MSR_IA32_APICBASE_EXTD;
+    }
     s->id = s->initial_apic_id;
 
     apic_reset_irq_delivered();
diff --git a/include/hw/i386/apic.h b/include/hw/i386/apic.h
index da1d2fe155..7d05abd7e0 100644
--- a/include/hw/i386/apic.h
+++ b/include/hw/i386/apic.h
@@ -19,6 +19,7 @@ void apic_init_reset(DeviceState *s);
 void apic_sipi(DeviceState *s);
 void apic_poll_irq(DeviceState *d);
 void apic_designate_bsp(DeviceState *d, bool bsp);
+void apic_force_x2apic(DeviceState *d);
 int apic_get_highest_priority_irr(DeviceState *dev);
 
 /* pc.c */
diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
index c175e7e718..eda0b5a587 100644
--- a/include/hw/i386/apic_internal.h
+++ b/include/hw/i386/apic_internal.h
@@ -187,6 +187,7 @@ struct APICCommonState {
     DeviceState *vapic;
     hwaddr vapic_paddr; /* note: persistence via kvmvapic */
     bool legacy_instance_id;
+    bool force_x2apic;
 };
 
 typedef struct VAPICState {
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index c348626dbf..47a502051c 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -139,6 +139,11 @@ int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
     tdx_caps->nr_cpuid_configs = TDX1_MAX_NR_CPUID_CONFIGS;
     tdx_ioctl(KVM_TDX_CAPABILITIES, 0, tdx_caps);
 
+    if (!kvm_enable_x2apic()) {
+        error_report("Failed to enable x2apic in KVM");
+        exit(1);
+    }
+
     qemu_add_machine_init_done_late_notifier(&tdx_machine_done_late_notify);
 
     return 0;
@@ -296,6 +301,8 @@ void tdx_post_init_vcpu(CPUState *cpu)
 
     hob = tdx_get_hob_entry(tdx);
     _tdx_ioctl(cpu, KVM_TDX_INIT_VCPU, 0, (void *)hob->address);
+
+    apic_force_x2apic(X86_CPU(cpu)->apic_state);
 }
 
 static bool tdx_guest_get_debug(Object *obj, Error **errp)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 29/44] target/i386: Add machine option to disable PIC/8259
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:54   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:54 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Sean Christopherson

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a machine option to disable the legacy PIC (8259), which cannot be
supported for TDX guests as TDX-SEAM doesn't allow direct interrupt
injection.  Using posted interrupts for the PIC is not a viable option
as the guest BIOS/kernel will not do EOI for PIC IRQs, i.e. will leave
the vIRR bit set.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/i386/pc.c         | 18 ++++++++++++++++++
 hw/i386/pc_piix.c    |  4 +++-
 hw/i386/pc_q35.c     |  4 +++-
 include/hw/i386/pc.h |  2 ++
 4 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8e1220db72..f4590df231 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1522,6 +1522,20 @@ static void pc_machine_set_hpet(Object *obj, bool value, Error **errp)
     pcms->hpet_enabled = value;
 }
 
+static bool pc_machine_get_pic(Object *obj, Error **errp)
+{
+    PCMachineState *pcms = PC_MACHINE(obj);
+
+    return pcms->pic_enabled;
+}
+
+static void pc_machine_set_pic(Object *obj, bool value, Error **errp)
+{
+    PCMachineState *pcms = PC_MACHINE(obj);
+
+    pcms->pic_enabled = value;
+}
+
 static void pc_machine_get_max_ram_below_4g(Object *obj, Visitor *v,
                                             const char *name, void *opaque,
                                             Error **errp)
@@ -1617,6 +1631,7 @@ static void pc_machine_initfn(Object *obj)
     pcms->smbus_enabled = true;
     pcms->sata_enabled = true;
     pcms->pit_enabled = true;
+    pcms->pic_enabled = true;
     pcms->max_fw_size = 8 * MiB;
 #ifdef CONFIG_HPET
     pcms->hpet_enabled = true;
@@ -1742,6 +1757,9 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
     object_class_property_add_bool(oc, PC_MACHINE_PIT,
         pc_machine_get_pit, pc_machine_set_pit);
 
+    object_class_property_add_bool(oc, PC_MACHINE_PIC,
+        pc_machine_get_pic, pc_machine_set_pic);
+
     object_class_property_add_bool(oc, "hpet",
         pc_machine_get_hpet, pc_machine_set_hpet);
 
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 30b8bd6ea9..4c1e31f180 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -218,7 +218,9 @@ static void pc_init1(MachineState *machine,
     }
     isa_bus_irqs(isa_bus, x86ms->gsi);
 
-    pc_i8259_create(isa_bus, gsi_state->i8259_irq);
+    if (pcms->pic_enabled) {
+        pc_i8259_create(isa_bus, gsi_state->i8259_irq);
+    }
 
     if (pcmc->pci_enabled) {
         ioapic_init_gsi(gsi_state, "i440fx");
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 1718aa94d9..106f5726cc 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -251,7 +251,9 @@ static void pc_q35_init(MachineState *machine)
     pci_bus_set_route_irq_fn(host_bus, ich9_route_intx_pin_to_irq);
     isa_bus = ich9_lpc->isa_bus;
 
-    pc_i8259_create(isa_bus, gsi_state->i8259_irq);
+    if (pcms->pic_enabled) {
+        pc_i8259_create(isa_bus, gsi_state->i8259_irq);
+    }
 
     if (pcmc->pci_enabled) {
         ioapic_init_gsi(gsi_state, "q35");
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index cd2113c763..9cede7a260 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -44,6 +44,7 @@ typedef struct PCMachineState {
     bool sata_enabled;
     bool pit_enabled;
     bool hpet_enabled;
+    bool pic_enabled;
     uint64_t max_fw_size;
 
     /* NUMA information: */
@@ -61,6 +62,7 @@ typedef struct PCMachineState {
 #define PC_MACHINE_SMBUS            "smbus"
 #define PC_MACHINE_SATA             "sata"
 #define PC_MACHINE_PIT              "pit"
+#define PC_MACHINE_PIC              "pic"
 #define PC_MACHINE_MAX_FW_SIZE      "max-fw-size"
 /**
  * PCMachineClass:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 30/44] qom: implement property helper for sha384
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Implement object_property_add_sha384(), which adds a property that converts
between a hex string and uint8_t[48].  It will be used for TDX, which uses
sha384 for measurement.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
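Note: the hex string <-> uint8_t[48] conversion can be sketched as
standalone C.  This is not the code in the patch: the patch uses
sscanf()/sprintf() and writes into the property storage directly,
whereas this sketch decodes nibbles by hand into a temporary buffer so
that the output is left untouched on a bad input.  The names
sha384_from_hex(), sha384_to_hex() and hex_nibble() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_SIZE 48  /* SHA-384 digest size in bytes */

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    if (c >= 'a' && c <= 'f') {
        return c - 'a' + 10;
    }
    if (c >= 'A' && c <= 'F') {
        return c - 'A' + 10;
    }
    return -1;
}

/* Parse exactly 96 hex characters into 48 bytes.
 * Returns 0 on success, -1 on bad length or a non-hex character;
 * out is only written on success. */
static int sha384_from_hex(const char *hex, uint8_t out[DIGEST_SIZE])
{
    uint8_t tmp[DIGEST_SIZE];
    size_t i;

    assert(hex != NULL && out != NULL);
    if (strlen(hex) != DIGEST_SIZE * 2) {
        return -1;
    }
    for (i = 0; i < DIGEST_SIZE; i++) {
        int hi = hex_nibble(hex[i * 2]);
        int lo = hex_nibble(hex[i * 2 + 1]);

        if (hi < 0 || lo < 0) {
            return -1;
        }
        tmp[i] = (uint8_t)((hi << 4) | lo);
    }
    memcpy(out, tmp, sizeof(tmp));
    return 0;
}

/* Format 48 bytes as a NUL-terminated lowercase hex string. */
static void sha384_to_hex(const uint8_t in[DIGEST_SIZE],
                          char out[DIGEST_SIZE * 2 + 1])
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < DIGEST_SIZE; i++) {
        out[i * 2] = digits[in[i] >> 4];
        out[i * 2 + 1] = digits[in[i] & 0x0f];
    }
    out[DIGEST_SIZE * 2] = '\0';
}
```

Decoding into a temporary buffer is a design choice of the sketch only;
it avoids a half-updated digest when the string is rejected midway.
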
 include/qom/object.h | 17 ++++++++++
 qom/object.c         | 76 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+)

diff --git a/include/qom/object.h b/include/qom/object.h
index 6721cd312e..594d0ec52c 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -1853,6 +1853,23 @@ ObjectProperty *object_property_add_alias(Object *obj, const char *name,
 ObjectProperty *object_property_add_const_link(Object *obj, const char *name,
                                                Object *target);
 
+
+/**
+ * object_property_add_sha384:
+ * @obj: the object to add a property to
+ * @name: the name of the property
+ * @v: pointer to value
+ * @flags: bitwise-or'd ObjectPropertyFlags
+ *
+ * Add an sha384 property in memory.  This function will add a
+ * property of type 'sha384'.
+ *
+ * Returns: The newly added property on success, or %NULL on failure.
+ */
+ObjectProperty * object_property_add_sha384(Object *obj, const char *name,
+                                            const uint8_t *v,
+                                            ObjectPropertyFlags flags);
+
 /**
  * object_property_set_description:
  * @obj: the object owning the property
diff --git a/qom/object.c b/qom/object.c
index 6a01d56546..e33a0b8c5d 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -15,6 +15,7 @@
 #include "qapi/error.h"
 #include "qom/object.h"
 #include "qom/object_interfaces.h"
+#include "qemu/ctype.h"
 #include "qemu/cutils.h"
 #include "qapi/visitor.h"
 #include "qapi/string-input-visitor.h"
@@ -2749,6 +2750,81 @@ object_property_add_alias(Object *obj, const char *name,
     return op;
 }
 
+#define SHA384_DIGEST_SIZE      48
+static void property_get_sha384(Object *obj, Visitor *v, const char *name,
+                                void *opaque, Error **errp)
+{
+    uint8_t *value = (uint8_t *)opaque;
+    char str[SHA384_DIGEST_SIZE * 2 + 1];
+    char *str_ = (char*)str;
+    size_t i;
+
+    for (i = 0; i < SHA384_DIGEST_SIZE; i++) {
+        char *buf;
+        buf = &str[i * 2];
+
+        sprintf(buf, "%02hhx", value[i]);
+    }
+    str[SHA384_DIGEST_SIZE * 2] = '\0';
+
+    visit_type_str(v, name, &str_, errp);
+}
+
+static void property_set_sha384(Object *obj, Visitor *v, const char *name,
+                                    void *opaque, Error **errp)
+{
+    uint8_t *value = (uint8_t *)opaque;
+    char* str;
+    size_t len;
+    size_t i;
+
+    if (!visit_type_str(v, name, &str, errp)) {
+        goto err;
+    }
+
+    len = strlen(str);
+    if (len != SHA384_DIGEST_SIZE * 2) {
+        error_setg(errp, "invalid length for sha384 hex string %s. "
+                   "it must be 48 * 2 hex", name);
+        goto err;
+    }
+
+    for (i = 0; i < SHA384_DIGEST_SIZE; i++) {
+        if (!qemu_isxdigit(str[i * 2]) || !qemu_isxdigit(str[i * 2 + 1])) {
+            error_setg(errp, "invalid char for sha384 hex string %s at %c%c",
+                       name, str[i * 2], str[i * 2 + 1]);
+            goto err;
+        }
+
+        if (sscanf(str + i * 2, "%02hhx", &value[i]) != 1) {
+            error_setg(errp, "invalid format for sha384 hex string %s", name);
+            goto err;
+        }
+    }
+
+err:
+    g_free(str);
+}
+
+ObjectProperty *
+object_property_add_sha384(Object *obj, const char *name,
+                           const uint8_t *v, ObjectPropertyFlags flags)
+{
+    ObjectPropertyAccessor *getter = NULL;
+    ObjectPropertyAccessor *setter = NULL;
+
+    if ((flags & OBJ_PROP_FLAG_READ) == OBJ_PROP_FLAG_READ) {
+        getter = property_get_sha384;
+    }
+
+    if ((flags & OBJ_PROP_FLAG_WRITE) == OBJ_PROP_FLAG_WRITE) {
+        setter = property_set_sha384;
+    }
+
+    return object_property_add(obj, name, "sha384",
+                               getter, setter, NULL, (void *)v);
+}
+
 void object_property_set_description(Object *obj, const char *name,
                                      const char *description)
 {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 31/44] target/i386/tdx: Allows mrconfigid/mrowner/mrownerconfig for TDX_INIT_VM
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

When creating a VM with TDX_INIT_VM, three sha384 hash values are accepted
for TDX attestation.  So far they were hard-coded as 0.  Now allow the user
to specify those values via the properties mrconfigid, mrowner and
mrownerconfig.  The value of each property is a hex string of 48 * 2 = 96
characters.

Example:
-device tdx-guest, \
  mrconfigid=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef, \
  mrowner=fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210, \
  mrownerconfig=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
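Note: the copy into the TDX_INIT_VM argument pairs each memcpy() with a
build-time size check.  A minimal standalone sketch of that pattern
follows, using C11 _Static_assert in place of QEMU_BUILD_BUG_ON.  The
struct names guest_cfg and init_vm_args, the COPY_DIGEST() macro and
fill_measurements() are hypothetical and only stand in for TdxGuest and
the KVM argument structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the user-configured guest object. */
struct guest_cfg {
    uint8_t mrconfigid[48];
    uint8_t mrowner[48];
    uint8_t mrownerconfig[48];
};

/* Hypothetical stand-in for the argument passed to the init ioctl. */
struct init_vm_args {
    uint8_t mrconfigid[48];
    uint8_t mrowner[48];
    uint8_t mrownerconfig[48];
};

/* Copy one digest; the build fails if the two arrays differ in size.
 * Both arguments must be arrays, not pointers, for sizeof to be useful. */
#define COPY_DIGEST(dst, src)                                  \
    do {                                                       \
        _Static_assert(sizeof(dst) == sizeof(src),             \
                       "digest size mismatch");                \
        memcpy((dst), (src), sizeof(dst));                     \
    } while (0)

static void fill_measurements(struct init_vm_args *args,
                              const struct guest_cfg *cfg)
{
    assert(args != NULL && cfg != NULL);
    COPY_DIGEST(args->mrconfigid, cfg->mrconfigid);
    COPY_DIGEST(args->mrowner, cfg->mrowner);
    COPY_DIGEST(args->mrownerconfig, cfg->mrownerconfig);
}
```

The size check costs nothing at run time and catches a layout change in
either structure when the code is compiled.
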
 qapi/qom.json         | 11 ++++++++++-
 target/i386/kvm/tdx.c | 17 +++++++++++++++++
 target/i386/kvm/tdx.h |  3 +++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 70c70e3efe..8f8b7828b3 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -767,10 +767,19 @@
 #
 # @debug: enable debug mode (default: off)
 #
+# @mrconfigid: MRCONFIGID SHA384 hex string of 48 * 2 length (default: 0)
+#
+# @mrowner: MROWNER SHA384 hex string of 48 * 2 length (default: 0)
+#
+# @mrownerconfig: MROWNERCONFIG SHA384 hex string of 48 * 2 length (default: 0)
+#
 # Since: 6.0
 ##
 { 'struct': 'TdxGuestProperties',
-  'data': { '*debug': 'bool' } }
+  'data': { '*debug': 'bool',
+            '*mrconfigid': 'str',
+            '*mrowner': 'str',
+            '*mrownerconfig': 'str' } }
 
 ##
 # @ObjectType:
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 47a502051c..6b560c1c0b 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -282,6 +282,17 @@ void tdx_pre_create_vcpu(CPUState *cpu)
     init_vm.attributes |= tdx->debug ? TDX1_TD_ATTRIBUTE_DEBUG : 0;
     init_vm.attributes |= x86cpu->enable_pmu ? TDX1_TD_ATTRIBUTE_PERFMON : 0;
 
+    QEMU_BUILD_BUG_ON(sizeof(init_vm.mrconfigid) != sizeof(tdx->mrconfigid));
+    memcpy(init_vm.mrconfigid, tdx->mrconfigid, sizeof(init_vm.mrconfigid));
+    QEMU_BUILD_BUG_ON(sizeof(init_vm.mrowner) != sizeof(tdx->mrowner));
+    memcpy(init_vm.mrowner, tdx->mrowner, sizeof(init_vm.mrowner));
+    QEMU_BUILD_BUG_ON(sizeof(init_vm.mrownerconfig) !=
+                      sizeof(tdx->mrownerconfig));
+    memcpy(init_vm.mrownerconfig, tdx->mrownerconfig,
+           sizeof(init_vm.mrownerconfig));
+
+    memset(init_vm.reserved, 0, sizeof(init_vm.reserved));
+
     init_vm.cpuid = (__u64)(&cpuid_data);
     tdx_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);
 out:
@@ -336,6 +347,12 @@ static void tdx_guest_init(Object *obj)
     tdx->debug = false;
     object_property_add_bool(obj, "debug", tdx_guest_get_debug,
                              tdx_guest_set_debug);
+    object_property_add_sha384(obj, "mrconfigid", tdx->mrconfigid,
+                               OBJ_PROP_FLAG_READWRITE);
+    object_property_add_sha384(obj, "mrowner", tdx->mrowner,
+                               OBJ_PROP_FLAG_READWRITE);
+    object_property_add_sha384(obj, "mrownerconfig", tdx->mrownerconfig,
+                               OBJ_PROP_FLAG_READWRITE);
 }
 
 static void tdx_guest_finalize(Object *obj)
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 2fed27b3fb..4132d1be30 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -44,6 +44,9 @@ typedef struct TdxGuest {
 
     bool initialized;
     bool debug;
+    uint8_t mrconfigid[48];     /* sha384 digest */
+    uint8_t mrowner[48];        /* sha384 digest */
+    uint8_t mrownerconfig[48];  /* sha384 digest */
 
     TdxFirmware fw;
 } TdxGuest;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 32/44] tdx: add kvm_tdx_enabled() accessor for later use
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 include/sysemu/tdx.h  | 1 +
 target/i386/kvm/kvm.c | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/include/sysemu/tdx.h b/include/sysemu/tdx.h
index 70eb01348f..f3eced10f9 100644
--- a/include/sysemu/tdx.h
+++ b/include/sysemu/tdx.h
@@ -6,6 +6,7 @@
 #include "hw/i386/pc.h"
 
 bool kvm_has_tdx(KVMState *s);
+bool kvm_tdx_enabled(void);
 int tdx_system_firmware_init(PCMachineState *pcms, MemoryRegion *rom_memory);
 #endif
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index af6b5f350e..76c3ea9fac 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -152,6 +152,11 @@ int kvm_set_vm_type(MachineState *ms, int kvm_type)
     return -ENOTSUP;
 }
 
+bool kvm_tdx_enabled(void)
+{
+    return vm_type == KVM_X86_TDX_VM;
+}
+
 int kvm_has_pit_state2(void)
 {
     return has_pit_state2;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 33/44] qmp: add query-tdx-capabilities query-tdx command
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata, Chenyi Qiang

From: Chenyi Qiang <chenyi.qiang@intel.com>

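Add two QMP commands, query-tdx-capabilities and query-tdx, that libvirt
can use to query the TDX capabilities and TDX info.  For now the only
field reported by either command is 'enabled': for query-tdx-capabilities
it means KVM supports the TDX VM type, and for query-tdx it means the
running guest is a TD.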

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
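For reference, a minimal standalone sketch of how the two 'enabled' flags
differ, assuming the semantics visible in tdx_get_capabilities() (host KVM
advertises the TDX VM type) and tdx_get_info() (the VM type in use is TDX).
All demo_* names and the VM type numbering are hypothetical stand-ins; they
are not the QEMU or KVM definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VM type numbers, used only as bit positions in this sketch. */
enum { DEMO_VM_TYPE_LEGACY = 0, DEMO_VM_TYPE_TDX = 1 };

/* Hypothetical stand-in for the accelerator state the real code consults. */
struct demo_host {
    bool kvm_enabled;        /* the accelerator in use is KVM */
    uint32_t vm_types_mask;  /* bitmask of VM types the host KVM offers */
    int vm_type;             /* VM type selected for this guest */
};

struct demo_tdx_capability { bool enabled; };
struct demo_tdx_info { bool enabled; };

/* Models query-tdx-capabilities: can this host create a TD at all? */
struct demo_tdx_capability demo_query_tdx_capabilities(const struct demo_host *h)
{
    assert(h != NULL);
    struct demo_tdx_capability cap = {
        .enabled = h->kvm_enabled &&
                   (h->vm_types_mask & (UINT32_C(1) << DEMO_VM_TYPE_TDX)) != 0,
    };
    return cap;
}

/* Models query-tdx: is the guest that is running right now a TD? */
struct demo_tdx_info demo_query_tdx(const struct demo_host *h)
{
    assert(h != NULL);
    struct demo_tdx_info info = {
        .enabled = h->kvm_enabled && h->vm_type == DEMO_VM_TYPE_TDX,
    };
    return info;
}
```

In this model a TDX-capable host running a non-TD guest reports
capabilities.enabled == true and info.enabled == false, which is the case
a management tool would need to tell apart.
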
 include/sysemu/tdx.h       |  6 ++++
 qapi/misc-target.json      | 59 ++++++++++++++++++++++++++++++++++++++
 target/i386/kvm/tdx-stub.c | 10 +++++++
 target/i386/kvm/tdx.c      | 19 ++++++++++++
 target/i386/monitor.c      | 23 +++++++++++++++
 5 files changed, 117 insertions(+)

diff --git a/include/sysemu/tdx.h b/include/sysemu/tdx.h
index f3eced10f9..756f46d2de 100644
--- a/include/sysemu/tdx.h
+++ b/include/sysemu/tdx.h
@@ -13,4 +13,10 @@ int tdx_system_firmware_init(PCMachineState *pcms, MemoryRegion *rom_memory);
 void tdx_pre_create_vcpu(CPUState *cpu);
 void tdx_post_init_vcpu(CPUState *cpu);
 
+struct TDXInfo;
+struct TDXInfo *tdx_get_info(void);
+
+struct TDXCapability;
+struct TDXCapability *tdx_get_capabilities(void);
+
 #endif
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 5573dcf8f0..c1de95c082 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -323,3 +323,62 @@
 { 'command': 'query-sev-attestation-report', 'data': { 'mnonce': 'str' },
   'returns': 'SevAttestationReport',
   'if': 'defined(TARGET_I386)' }
+
+##
+# @TDXInfo:
+#
+# Information about Trust Domain Extensions (TDX) support
+#
+# @enabled: true if TDX is active
+#
+##
+{ 'struct': 'TDXInfo',
+    'data': { 'enabled': 'bool' },
+  'if': 'defined(TARGET_I386)'
+}
+
+##
+# @query-tdx:
+#
+# Returns information about TDX
+#
+# Returns: @TDXInfo
+#
+#
+# Example:
+#
+# -> { "execute": "query-tdx" }
+# <- { "return": { "enabled": true } }
+#
+##
+{ 'command': 'query-tdx', 'returns': 'TDXInfo',
+  'if': 'defined(TARGET_I386)' }
+
+##
+# @TDXCapability:
+#
+# The struct describes capability for a TDX
+# feature.
+#
+##
+{ 'struct': 'TDXCapability',
+  'data': { 'enabled': 'bool' },
+  'if': 'defined(TARGET_I386)' }
+
+##
+# @query-tdx-capabilities:
+#
+# This command is used to get the TDX capabilities, and is supported on Intel
+# X86 platforms only.
+#
+# Returns: @TDXCapability.
+#
+#
+# Example:
+#
+# -> { "execute": "query-tdx-capabilities" }
+# <- { "return": { "enabled": true } }
+#
+##
+{ 'command': 'query-tdx-capabilities', 'returns': 'TDXCapability',
+  'if': 'defined(TARGET_I386)' }
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
index 4e1a0a4280..5d8faf0716 100644
--- a/target/i386/kvm/tdx-stub.c
+++ b/target/i386/kvm/tdx-stub.c
@@ -21,3 +21,13 @@ void tdx_pre_create_vcpu(CPUState *cpu)
 void tdx_post_init_vcpu(CPUState *cpu)
 {
 }
+
+struct TDXInfo *tdx_get_info(void)
+{
+    return NULL;
+}
+
+struct TDXCapability *tdx_get_capabilities(void)
+{
+    return NULL;
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 6b560c1c0b..1316d95209 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -22,6 +22,7 @@
 #include "hw/i386/tdvf-hob.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
+#include "qapi/qapi-types-misc-target.h"
 #include "standard-headers/asm-x86/kvm_para.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
@@ -39,6 +40,24 @@ bool kvm_has_tdx(KVMState *s)
     return !!(kvm_check_extension(s, KVM_CAP_VM_TYPES) & BIT(KVM_X86_TDX_VM));
 }
 
+TDXInfo *tdx_get_info(void)
+{
+    TDXInfo *info;
+
+    info = g_new0(TDXInfo, 1);
+    info->enabled = kvm_enabled() && kvm_tdx_enabled();
+    return info;
+}
+
+TDXCapability *tdx_get_capabilities(void)
+{
+    TDXCapability *cap;
+
+    cap = g_new0(TDXCapability, 1);
+    cap->enabled = kvm_enabled() && kvm_has_tdx(kvm_state);
+    return cap;
+}
+
 static void __tdx_ioctl(void *state, int ioctl_no, const char *ioctl_name,
                         __u32 metadata, void *data)
 {
diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 119211f0b0..c0be99d13d 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -30,6 +30,7 @@
 #include "qapi/qmp/qdict.h"
 #include "sysemu/kvm.h"
 #include "sysemu/sev.h"
+#include "sysemu/tdx.h"
 #include "qapi/error.h"
 #include "sev_i386.h"
 #include "qapi/qapi-commands-misc-target.h"
@@ -763,3 +764,25 @@ qmp_query_sev_attestation_report(const char *mnonce, Error **errp)
 {
     return sev_get_attestation_report(mnonce, errp);
 }
+
+TDXInfo *qmp_query_tdx(Error **errp)
+{
+    TDXInfo *info;
+
+    info = tdx_get_info();
+    if (!info) {
+        error_setg(errp, "TDX is not available.");
+    }
+    return info;
+}
+
+TDXCapability *qmp_query_tdx_capabilities(Error **errp)
+{
+    TDXCapability *cap;
+
+    cap = tdx_get_capabilities();
+    if (!cap) {
+        error_setg(errp, "TDX is not available.");
+    }
+    return cap;
+}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 34/44] target/i386/tdx: set reboot action to shutdown when tdx
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

In TDX, CPU state is also protected, so the VMM can't reset vcpu state.
Force -action reboot=shutdown for TD guests instead of silently ignoring
vcpu reset.

TDX module spec version 344425-002US doesn't support vcpu reset by the VMM.
The VM needs to be destroyed and created again to emulate
REBOOT_ACTION_RESET.  For simplicity, leave that responsibility to the
management system (e.g. libvirt), because it's difficult for the current
qemu implementation to destroy and re-create KVM VM resources while keeping
other resources.

If the management system wants reboot behavior for its users, it needs to
 - set reboot_action to REBOOT_ACTION_SHUTDOWN,
 - optionally set shutdown_action to SHUTDOWN_ACTION_PAUSE, and
 - subscribe to VM state changes and, on reboot, start a new qemu
   (destroying the old qemu first if SHUTDOWN_ACTION_PAUSE is set).

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
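For reference, a minimal standalone sketch of the flow described above,
assuming that a shutdown with SHUTDOWN_ACTION_PAUSE leaves a paused qemu
behind and any other shutdown makes qemu exit.  The demo_* enums and
functions are hypothetical models for illustration, not QEMU or libvirt
code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirrors of the reboot/shutdown actions named above. */
enum demo_reboot_action { DEMO_REBOOT_RESET, DEMO_REBOOT_SHUTDOWN };
enum demo_shutdown_action { DEMO_SHUTDOWN_POWEROFF, DEMO_SHUTDOWN_PAUSE };

/* What the management system observes after the guest asks to reboot. */
enum demo_outcome {
    DEMO_RESET_IN_PLACE,  /* vcpus were reset, same qemu keeps running */
    DEMO_QEMU_EXITED,     /* qemu process is gone */
    DEMO_QEMU_PAUSED      /* qemu is still there, guest is stopped */
};

/* What the management system has to do to give the user a "reboot". */
struct demo_mgmt_plan {
    bool destroy_old_qemu;
    bool start_new_qemu;
};

/* Models the VMM side: a TD guest always gets reboot=shutdown. */
enum demo_outcome demo_on_guest_reboot(bool is_td,
                                       enum demo_reboot_action reboot,
                                       enum demo_shutdown_action shutdown)
{
    if (is_td) {
        reboot = DEMO_REBOOT_SHUTDOWN;  /* vcpu reset is not possible */
    }
    if (reboot == DEMO_REBOOT_RESET) {
        return DEMO_RESET_IN_PLACE;
    }
    return shutdown == DEMO_SHUTDOWN_PAUSE ? DEMO_QEMU_PAUSED
                                           : DEMO_QEMU_EXITED;
}

/* Models the management side reacting to the observed outcome. */
struct demo_mgmt_plan demo_mgmt_plan_for(enum demo_outcome outcome)
{
    struct demo_mgmt_plan plan = { false, false };

    switch (outcome) {
    case DEMO_RESET_IN_PLACE:
        break;                          /* nothing to do */
    case DEMO_QEMU_EXITED:
        plan.start_new_qemu = true;
        break;
    case DEMO_QEMU_PAUSED:
        plan.destroy_old_qemu = true;   /* paused qemu must go first */
        plan.start_new_qemu = true;
        break;
    default:
        assert(0 && "unknown outcome");
    }
    return plan;
}
```

In this model a TD guest never reaches DEMO_RESET_IN_PLACE, so the
management system always has to start a new qemu to emulate a reboot.
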
 target/i386/kvm/tdx.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 1316d95209..0621317b0a 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -25,6 +25,7 @@
 #include "qapi/qapi-types-misc-target.h"
 #include "standard-headers/asm-x86/kvm_para.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/runstate-action.h"
 #include "sysemu/kvm.h"
 #include "sysemu/kvm_int.h"
 #include "sysemu/tdx.h"
@@ -363,6 +364,19 @@ static void tdx_guest_init(Object *obj)
 
     qemu_mutex_init(&tdx->lock);
 
+    /*
+     * TDX module spec version 344425-002US doesn't support reset of vcpu by
+     * the VMM.  The VM needs to be destroyed and created again to emulate
+     * REBOOT_ACTION_RESET.  For simplicity, leave that responsibility to
+     * the management system (e.g. libvirt).
+     *
+     * The management system should
+     *  - set reboot_action to REBOOT_ACTION_SHUTDOWN
+     *  - set shutdown_action to SHUTDOWN_ACTION_PAUSE
+     *  - watch VM state and, on reboot, destroy qemu and start a new qemu
+     */
+    reboot_action = REBOOT_ACTION_SHUTDOWN;
+
     tdx->debug = false;
     object_property_add_bool(obj, "debug", tdx_guest_get_debug,
                              tdx_guest_set_debug);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 35/44] ioapic: add property to disable level interrupt
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

According to TDX module spec version 344425-002US [1], the VMM can inject
virtual interrupts only via posted interrupts, and the VMM can't get a
TDEXIT on guest EOI to the virtual x2APIC.  Posted interrupts are
edge-triggered, and the VMM would need to hook guest EOI to re-inject a
level-triggered interrupt while the level is still active, so level trigger
isn't supported for TD guest VMs.

Add a property that prevents the trigger mode from being set to level
trigger: such a request is ignored with a warning and the entry stays
edge-triggered.  Without this guard, qemu can show unexpected behavior
later.

[1] https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-module-1eas-v0.85.039.pdf

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
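For reference, a minimal standalone sketch of the guard on a redirection
table entry.  The DEMO_* constants are local copies and are assumed to
match the IOAPIC_* macros in ioapic_internal.h (trigger mode in bit 15,
0 = edge, 1 = level).  demo_force_edge_trigger() is a hypothetical name,
not the function added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies for illustration; assumed layout of a redirection entry. */
#define DEMO_TRIGGER_MODE_SHIFT 15
#define DEMO_TRIGGER_MODE_MASK  (UINT64_C(1) << DEMO_TRIGGER_MODE_SHIFT)
#define DEMO_TRIGGER_EDGE       UINT64_C(0)
#define DEMO_TRIGGER_LEVEL      UINT64_C(1)

/*
 * Rewrite a redirection entry so that it never requests level trigger.
 * Returns true if the entry asked for level trigger and was changed,
 * false if it was already edge-triggered and left untouched.
 */
bool demo_force_edge_trigger(uint64_t *entry)
{
    assert(entry != NULL);

    if ((*entry & DEMO_TRIGGER_MODE_MASK) ==
        (DEMO_TRIGGER_EDGE << DEMO_TRIGGER_MODE_SHIFT)) {
        return false;
    }

    /* Clear the trigger mode field, then write the edge encoding back. */
    *entry &= ~DEMO_TRIGGER_MODE_MASK;
    *entry |= DEMO_TRIGGER_EDGE << DEMO_TRIGGER_MODE_SHIFT;
    return true;
}
```

All other bits of the entry (vector, delivery mode, mask, destination) are
left as the guest wrote them.  A caller could use the return value to
decide whether to emit the one-time warning.
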
 hw/intc/ioapic.c                  | 20 ++++++++++++++++++++
 hw/intc/ioapic_common.c           | 27 +++++++++++++++++++++++++++
 include/hw/i386/ioapic_internal.h |  1 +
 3 files changed, 48 insertions(+)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 264262959d..6d61744961 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -364,6 +364,23 @@ ioapic_fix_edge_remote_irr(uint64_t *entry)
     }
 }
 
+static inline void
+ioapic_fix_level_trigger_unsupported(uint64_t *entry)
+{
+    if ((*entry & IOAPIC_LVT_TRIGGER_MODE) !=
+        IOAPIC_TRIGGER_EDGE << IOAPIC_LVT_TRIGGER_MODE_SHIFT) {
+        /*
+         * ignore a request for level trigger because
+         * level trigger requires eoi intercept to re-inject
+         * interrupt when the level is still active.
+         */
+        warn_report_once("attempting to set level-trigger mode "
+                         "while eoi intercept isn't supported");
+        *entry &= ~IOAPIC_LVT_TRIGGER_MODE;
+        *entry |= IOAPIC_TRIGGER_EDGE << IOAPIC_LVT_TRIGGER_MODE_SHIFT;
+    }
+}
+
 static void
 ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
                  unsigned int size)
@@ -404,6 +421,9 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
                 s->ioredtbl[index] &= IOAPIC_RW_BITS;
                 s->ioredtbl[index] |= ro_bits;
                 s->irq_eoi[index] = 0;
+                if (s->level_trigger_unsupported) {
+                    ioapic_fix_level_trigger_unsupported(&s->ioredtbl[index]);
+                }
                 ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);
                 ioapic_service(s);
             }
diff --git a/hw/intc/ioapic_common.c b/hw/intc/ioapic_common.c
index 3cccfc1556..07ee142470 100644
--- a/hw/intc/ioapic_common.c
+++ b/hw/intc/ioapic_common.c
@@ -150,6 +150,32 @@ static int ioapic_dispatch_post_load(void *opaque, int version_id)
     return 0;
 }
 
+static bool ioapic_common_get_level_trigger_unsupported(Object *obj,
+                                                        Error **errp)
+{
+    IOAPICCommonState *s = IOAPIC_COMMON(obj);
+    return s->level_trigger_unsupported;
+}
+
+static void ioapic_common_set_level_trigger_unsupported(Object *obj, bool value,
+                                                       Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    IOAPICCommonState *s = IOAPIC_COMMON(obj);
+    /* only disabling before realize is allowed */
+    assert(!dev->realized);
+    assert(!s->level_trigger_unsupported);
+    s->level_trigger_unsupported = value;
+}
+
+static void ioapic_common_init(Object *obj)
+{
+    object_property_add_bool(obj, "level_trigger_unsupported",
+                             ioapic_common_get_level_trigger_unsupported,
+                             ioapic_common_set_level_trigger_unsupported);
+
+}
+
 static void ioapic_common_realize(DeviceState *dev, Error **errp)
 {
     IOAPICCommonState *s = IOAPIC_COMMON(dev);
@@ -207,6 +233,7 @@ static const TypeInfo ioapic_common_type = {
     .name = TYPE_IOAPIC_COMMON,
     .parent = TYPE_SYS_BUS_DEVICE,
     .instance_size = sizeof(IOAPICCommonState),
+    .instance_init = ioapic_common_init,
     .class_size = sizeof(IOAPICCommonClass),
     .class_init = ioapic_common_class_init,
     .abstract = true,
diff --git a/include/hw/i386/ioapic_internal.h b/include/hw/i386/ioapic_internal.h
index 021e715f11..20f2fc7897 100644
--- a/include/hw/i386/ioapic_internal.h
+++ b/include/hw/i386/ioapic_internal.h
@@ -103,6 +103,7 @@ struct IOAPICCommonState {
     uint32_t irr;
     uint64_t ioredtbl[IOAPIC_NUM_PINS];
     Notifier machine_done;
+    bool level_trigger_unsupported;
     uint8_t version;
     uint64_t irq_count[IOAPIC_NUM_PINS];
     int irq_level[IOAPIC_NUM_PINS];
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 36/44] hw/i386: add eoi_intercept_unsupported member to X86MachineState
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add a new bool member, eoi_intercept_unsupported, to X86MachineState with
default value false.  It is to be set to true for the tdx kvm type.
Without EOI interception, it is impossible to emulate re-injection of a
level-triggered interrupt while the level is still active, which affects
interrupt controller emulation.  That new behavior will be introduced
later.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/i386/x86.c         | 1 +
 include/hw/i386/x86.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index ed15f6f2cf..9862fe5bc9 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1311,6 +1311,7 @@ static void x86_machine_initfn(Object *obj)
     x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
     x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
     x86ms->bus_lock_ratelimit = 0;
+    x86ms->eoi_intercept_unsupported = false;
 
     object_property_add_str(obj, "kvm-type",
                             x86_get_kvm_type, x86_set_kvm_type);
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index a450b5e226..6eff42550f 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -63,6 +63,7 @@ struct X86MachineState {
     unsigned pci_irq_mask;
     unsigned apic_id_limit;
     uint16_t boot_cpus;
+    bool eoi_intercept_unsupported;
 
     OnOffAuto smm;
     OnOffAuto acpi;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 37/44] hw/i386: add option to forcibly report edge trigger in acpi tables
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

When level-triggered interrupts aren't supported on the x86 platform,
forcibly report edge trigger in the ACPI tables (the DSDT interrupt
resources and the MADT interrupt source overrides).
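
For reference, the MADT flag values used by this patch can be read as
follows.  This is a minimal standalone sketch, not QEMU code: the enum and
function names (madt_polarity, madt_trigger, madt_inti_flags,
madt_override_flags) are made up for illustration.  Only the bit layout
(polarity in bits 1:0, trigger mode in bits 3:2) is taken from the ACPI
definition of the interrupt source override flags.

```c
#include <assert.h>
#include <stdint.h>

/* Polarity field, bits 1:0 of the override flags (value 2 is reserved). */
enum madt_polarity {
    MADT_POL_CONFORMS    = 0,
    MADT_POL_ACTIVE_HIGH = 1,
    MADT_POL_ACTIVE_LOW  = 3,
};

/* Trigger mode field, bits 3:2 of the override flags (value 2 is reserved). */
enum madt_trigger {
    MADT_TRIG_CONFORMS = 0,
    MADT_TRIG_EDGE     = 1,
    MADT_TRIG_LEVEL    = 3,
};

/* Hypothetical helper: pack polarity and trigger mode into the 16-bit flags. */
static uint16_t madt_inti_flags(enum madt_polarity pol, enum madt_trigger trig)
{
    assert((unsigned)pol <= 3 && (unsigned)trig <= 3);
    return (uint16_t)(((unsigned)pol & 0x3) | (((unsigned)trig & 0x3) << 2));
}

/*
 * Hypothetical helper mirroring the choice made in the MADT hunk:
 * active high in both cases, edge when level trigger is unsupported,
 * level otherwise.
 */
static uint16_t madt_override_flags(int level_trigger_unsupported)
{
    return madt_inti_flags(MADT_POL_ACTIVE_HIGH,
                           level_trigger_unsupported ? MADT_TRIG_EDGE
                                                     : MADT_TRIG_LEVEL);
}
```

With this encoding, the forced-edge value written as 1 | (1 << 2) in the
diff is 0x5, and the existing "active high, level triggered" value is 0xd.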

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/i386/acpi-build.c  | 103 ++++++++++++++++++++++++++++--------------
 hw/i386/acpi-common.c |  74 ++++++++++++++++++++++--------
 2 files changed, 124 insertions(+), 53 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 796ffc6f5c..d0d52258b9 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -866,7 +866,8 @@ static void build_dbg_aml(Aml *table)
     aml_append(table, scope);
 }
 
-static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
+static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg,
+                           bool level_trigger_unsupported)
 {
     Aml *dev;
     Aml *crs;
@@ -878,7 +879,10 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
     aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
 
     crs = aml_resource_template();
-    aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
+    aml_append(crs, aml_interrupt(AML_CONSUMER,
+                                  level_trigger_unsupported ?
+                                  AML_EDGE : AML_LEVEL,
+                                  AML_ACTIVE_HIGH,
                                   AML_SHARED, irqs, ARRAY_SIZE(irqs)));
     aml_append(dev, aml_name_decl("_PRS", crs));
 
@@ -902,7 +906,8 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
     return dev;
  }
 
-static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
+static Aml *build_gsi_link_dev(const char *name, uint8_t uid,
+                               uint8_t gsi, bool level_trigger_unsupported)
 {
     Aml *dev;
     Aml *crs;
@@ -915,7 +920,10 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
 
     crs = aml_resource_template();
     irqs = gsi;
-    aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
+    aml_append(crs, aml_interrupt(AML_CONSUMER,
+                                  level_trigger_unsupported ?
+                                  AML_EDGE : AML_LEVEL,
+                                  AML_ACTIVE_HIGH,
                                   AML_SHARED, &irqs, 1));
     aml_append(dev, aml_name_decl("_PRS", crs));
 
@@ -934,7 +942,7 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
 }
 
 /* _CRS method - get current settings */
-static Aml *build_iqcr_method(bool is_piix4)
+static Aml *build_iqcr_method(bool is_piix4, bool level_trigger_unsupported)
 {
     Aml *if_ctx;
     uint32_t irqs;
@@ -942,7 +950,9 @@ static Aml *build_iqcr_method(bool is_piix4)
     Aml *crs = aml_resource_template();
 
     irqs = 0;
-    aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
+    aml_append(crs, aml_interrupt(AML_CONSUMER,
+                                  level_trigger_unsupported ?
+                                  AML_EDGE : AML_LEVEL,
                                   AML_ACTIVE_HIGH, AML_SHARED, &irqs, 1));
     aml_append(method, aml_name_decl("PRR0", crs));
 
@@ -976,7 +986,7 @@ static Aml *build_irq_status_method(void)
     return method;
 }
 
-static void build_piix4_pci0_int(Aml *table)
+static void build_piix4_pci0_int(Aml *table, bool level_trigger_unsupported)
 {
     Aml *dev;
     Aml *crs;
@@ -997,12 +1007,16 @@ static void build_piix4_pci0_int(Aml *table)
     aml_append(sb_scope, field);
 
     aml_append(sb_scope, build_irq_status_method());
-    aml_append(sb_scope, build_iqcr_method(true));
+    aml_append(sb_scope, build_iqcr_method(true, level_trigger_unsupported));
 
-    aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0")));
-    aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1")));
-    aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2")));
-    aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3")));
+    aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3"),
+                                        level_trigger_unsupported));
 
     dev = aml_device("LNKS");
     {
@@ -1011,7 +1025,9 @@ static void build_piix4_pci0_int(Aml *table)
 
         crs = aml_resource_template();
         irqs = 9;
-        aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
+        aml_append(crs, aml_interrupt(AML_CONSUMER,
+                                      level_trigger_unsupported ?
+                                      AML_EDGE : AML_LEVEL,
                                       AML_ACTIVE_HIGH, AML_SHARED,
                                       &irqs, 1));
         aml_append(dev, aml_name_decl("_PRS", crs));
@@ -1097,7 +1113,7 @@ static Aml *build_q35_routing_table(const char *str)
     return pkg;
 }
 
-static void build_q35_pci0_int(Aml *table)
+static void build_q35_pci0_int(Aml *table, bool level_trigger_unsupported)
 {
     Aml *field;
     Aml *method;
@@ -1149,25 +1165,41 @@ static void build_q35_pci0_int(Aml *table)
     aml_append(sb_scope, field);
 
     aml_append(sb_scope, build_irq_status_method());
-    aml_append(sb_scope, build_iqcr_method(false));
-
-    aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA")));
-    aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB")));
-    aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC")));
-    aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD")));
-    aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE")));
-    aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF")));
-    aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG")));
-    aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH")));
-
-    aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10));
-    aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11));
-    aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12));
-    aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13));
-    aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14));
-    aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15));
-    aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16));
-    aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17));
+    aml_append(sb_scope, build_iqcr_method(false, level_trigger_unsupported));
+
+    aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQA"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQB"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQC"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQD"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKE", 4, aml_name("PRQE"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKF", 5, aml_name("PRQF"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKG", 6, aml_name("PRQG"),
+                                        level_trigger_unsupported));
+    aml_append(sb_scope, build_link_dev("LNKH", 7, aml_name("PRQH"),
+                                        level_trigger_unsupported));
+
+    aml_append(sb_scope, build_gsi_link_dev("GSIA", 0x10, 0x10,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIB", 0x11, 0x11,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIC", 0x12, 0x12,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSID", 0x13, 0x13,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIE", 0x14, 0x14,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIF", 0x15, 0x15,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIG", 0x16, 0x16,
+                                            level_trigger_unsupported));
+    aml_append(sb_scope, build_gsi_link_dev("GSIH", 0x17, 0x17,
+                                            level_trigger_unsupported));
 
     aml_append(table, sb_scope);
 }
@@ -1370,6 +1402,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
     PCMachineState *pcms = PC_MACHINE(machine);
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
     X86MachineState *x86ms = X86_MACHINE(machine);
+    bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
     AcpiMcfgInfo mcfg;
     bool mcfg_valid = !!acpi_get_mcfg(&mcfg);
     uint32_t nr_mem = machine->ram_slots;
@@ -1404,7 +1437,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
         if (pm->pcihp_bridge_en || pm->pcihp_root_en) {
             build_piix4_pci_hotplug(dsdt);
         }
-        build_piix4_pci0_int(dsdt);
+        build_piix4_pci0_int(dsdt, level_trigger_unsupported);
     } else {
         sb_scope = aml_scope("_SB");
         dev = aml_device("PCI0");
@@ -1450,7 +1483,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
         }
         build_q35_isa_bridge(dsdt);
         build_isa_devices_aml(dsdt);
-        build_q35_pci0_int(dsdt);
+        build_q35_pci0_int(dsdt, level_trigger_unsupported);
         if (pcms->smbus && !pcmc->do_not_add_smb_acpi) {
             build_smb0(dsdt, pcms->smbus, ICH9_SMB_DEV, ICH9_SMB_FUNC);
         }
diff --git a/hw/i386/acpi-common.c b/hw/i386/acpi-common.c
index 1f5947fcf9..90cb05a46d 100644
--- a/hw/i386/acpi-common.c
+++ b/hw/i386/acpi-common.c
@@ -80,6 +80,7 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
     int madt_start = table_data->len;
     AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(adev);
     bool x2apic_mode = false;
+    bool level_trigger_unsupported = x86ms->eoi_intercept_unsupported;
 
     AcpiMultipleApicTable *madt;
     AcpiMadtIoApic *io_apic;
@@ -114,26 +115,63 @@ void acpi_build_madt(GArray *table_data, BIOSLinker *linker,
         io_apic2->interrupt = cpu_to_le32(IO_APIC_SECONDARY_IRQBASE);
     }
 
-    if (x86ms->apic_xrupt_override) {
-        intsrcovr = acpi_data_push(table_data, sizeof *intsrcovr);
-        intsrcovr->type   = ACPI_APIC_XRUPT_OVERRIDE;
-        intsrcovr->length = sizeof(*intsrcovr);
-        intsrcovr->source = 0;
-        intsrcovr->gsi    = cpu_to_le32(2);
-        intsrcovr->flags  = cpu_to_le16(0); /* conforms to bus specifications */
-    }
+    if (level_trigger_unsupported) {
+        /* Force edge trigger */
+        if (x86ms->apic_xrupt_override) {
+            intsrcovr = acpi_data_push(table_data, sizeof *intsrcovr);
+            intsrcovr->type   = ACPI_APIC_XRUPT_OVERRIDE;
+            intsrcovr->length = sizeof(*intsrcovr);
+            intsrcovr->source = 0;
+            intsrcovr->gsi    = cpu_to_le32(2);
+            /* active high, edge triggered */
+            intsrcovr->flags  = cpu_to_le16(1 | (1 << 2));
+        }
+
+        for (i = x86ms->apic_xrupt_override ? 1 : 0; i < 16; i++) {
+            intsrcovr = acpi_data_push(table_data, sizeof *intsrcovr);
+            intsrcovr->type   = ACPI_APIC_XRUPT_OVERRIDE;
+            intsrcovr->length = sizeof(*intsrcovr);
+            intsrcovr->source = i;
+            intsrcovr->gsi    = cpu_to_le32(i);
+            /* active high, edge triggered */
+            intsrcovr->flags  = cpu_to_le16(1 | (1 << 2));
+        }
+
+        if (x86ms->ioapic2) {
+            for (i = 0; i < 16; i++) {
+                intsrcovr = acpi_data_push(table_data, sizeof *intsrcovr);
+                intsrcovr->type   = ACPI_APIC_XRUPT_OVERRIDE;
+                intsrcovr->length = sizeof(*intsrcovr);
+                intsrcovr->source = IO_APIC_SECONDARY_IRQBASE + i;
+                intsrcovr->gsi    = cpu_to_le32(IO_APIC_SECONDARY_IRQBASE + i);
+                /* active high, edge triggered */
+                intsrcovr->flags  = cpu_to_le16(1 | (1 << 2));
+            }
+        }
+    } else {
+        if (x86ms->apic_xrupt_override) {
+            intsrcovr = acpi_data_push(table_data, sizeof *intsrcovr);
+            intsrcovr->type   = ACPI_APIC_XRUPT_OVERRIDE;
+            intsrcovr->length = sizeof(*intsrcovr);
+            intsrcovr->source = 0;
+            intsrcovr->gsi    = cpu_to_le32(2);
+            /* conforms to bus specifications */
+            intsrcovr->flags  = cpu_to_le16(0);
+        }
 
-    for (i = 1; i < 16; i++) {
-        if (!(x86ms->pci_irq_mask & (1 << i))) {
-            /* No need for a INT source override structure. */
-            continue;
+        for (i = 1; i < 16; i++) {
+            if (!(x86ms->pci_irq_mask & (1 << i))) {
+                /* No need for a INT source override structure. */
+                continue;
+            }
+            intsrcovr = acpi_data_push(table_data, sizeof *intsrcovr);
+            intsrcovr->type   = ACPI_APIC_XRUPT_OVERRIDE;
+            intsrcovr->length = sizeof(*intsrcovr);
+            intsrcovr->source = i;
+            intsrcovr->gsi    = cpu_to_le32(i);
+            /* active high, level triggered */
+            intsrcovr->flags  = cpu_to_le16(0xd);
         }
-        intsrcovr = acpi_data_push(table_data, sizeof *intsrcovr);
-        intsrcovr->type   = ACPI_APIC_XRUPT_OVERRIDE;
-        intsrcovr->length = sizeof(*intsrcovr);
-        intsrcovr->source = i;
-        intsrcovr->gsi    = cpu_to_le32(i);
-        intsrcovr->flags  = cpu_to_le16(0xd); /* active high, level triggered */
     }
 
     if (x2apic_mode) {
-- 
2.25.1



* [RFC PATCH v2 38/44] hw/i386: plug eoi_intercept_unsupported to ioapic
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

When the x86 machine doesn't support EOI intercept, set the
level_trigger_unsupported property of the ioapic to true so that the ioapic
doesn't accept a configuration that uses level trigger.
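
To illustrate what "doesn't accept" could mean for a guest write, here is a
toy model.  It is an assumption made for illustration only, not the ioapic
change from this series: the struct, the function and the policy of
silently forcing edge mode are all hypothetical, and the real device may
handle such a write differently.  The only hardware fact used is that bit
15 of an I/O APIC redirection table entry selects the trigger mode
(1 = level, 0 = edge).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 15 of a redirection table entry: 1 = level, 0 = edge. */
#define TOY_IOREDTBL_TRIGGER_LEVEL (UINT64_C(1) << 15)
#define TOY_IOAPIC_NUM_PINS        24

/* Hypothetical, heavily reduced device state. */
struct toy_ioapic {
    bool level_trigger_unsupported;
    uint64_t redtbl[TOY_IOAPIC_NUM_PINS];
};

/*
 * Store a redirection entry.  Returns true if the value was stored as
 * written, false if the pin is out of range or the entry had to be
 * downgraded from level to edge because level trigger is unsupported.
 */
static bool toy_ioapic_write_entry(struct toy_ioapic *s, unsigned pin,
                                   uint64_t val)
{
    assert(s != NULL);
    if (pin >= TOY_IOAPIC_NUM_PINS) {
        return false;
    }
    if (s->level_trigger_unsupported && (val & TOY_IOREDTBL_TRIGGER_LEVEL)) {
        /* Assumed policy: keep the rest of the entry, force edge mode. */
        s->redtbl[pin] = val & ~TOY_IOREDTBL_TRIGGER_LEVEL;
        return false;
    }
    s->redtbl[pin] = val;
    return true;
}
```

In this model a machine with eoi_intercept_unsupported set would create the
device with level_trigger_unsupported = true, and a request for a
level-triggered pin would end up stored as edge-triggered.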

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/i386/microvm.c     |  5 +++--
 hw/i386/pc_piix.c     |  2 +-
 hw/i386/pc_q35.c      |  2 +-
 hw/i386/x86.c         | 10 ++++++++--
 include/hw/i386/x86.h |  6 ++++--
 5 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index aba0c83219..9b03d051ca 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -175,9 +175,10 @@ static void microvm_devices_init(MicrovmMachineState *mms)
                           &error_abort);
     isa_bus_irqs(isa_bus, x86ms->gsi);
 
-    ioapic_init_gsi(gsi_state, "machine");
+    ioapic_init_gsi(gsi_state, "machine", x86ms->eoi_intercept_unsupported);
     if (ioapics > 1) {
-        x86ms->ioapic2 = ioapic_init_secondary(gsi_state);
+        x86ms->ioapic2 = ioapic_init_secondary(
+            gsi_state, x86ms->eoi_intercept_unsupported);
     }
 
     kvmclock_create(true);
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 4c1e31f180..a601c4a916 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -223,7 +223,7 @@ static void pc_init1(MachineState *machine,
     }
 
     if (pcmc->pci_enabled) {
-        ioapic_init_gsi(gsi_state, "i440fx");
+        ioapic_init_gsi(gsi_state, "i440fx", x86ms->eoi_intercept_unsupported);
     }
 
     if (tcg_enabled()) {
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 106f5726cc..464463766c 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -256,7 +256,7 @@ static void pc_q35_init(MachineState *machine)
     }
 
     if (pcmc->pci_enabled) {
-        ioapic_init_gsi(gsi_state, "q35");
+        ioapic_init_gsi(gsi_state, "q35", x86ms->eoi_intercept_unsupported);
     }
 
     if (tcg_enabled()) {
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 9862fe5bc9..88c365b72d 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -608,7 +608,8 @@ void gsi_handler(void *opaque, int n, int level)
     }
 }
 
-void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name)
+void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name,
+                     bool level_trigger_unsupported)
 {
     DeviceState *dev;
     SysBusDevice *d;
@@ -622,6 +623,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name)
     }
     object_property_add_child(object_resolve_path(parent_name, NULL),
                               "ioapic", OBJECT(dev));
+    object_property_set_bool(OBJECT(dev), "level_trigger_unsupported",
+                             level_trigger_unsupported, NULL);
     d = SYS_BUS_DEVICE(dev);
     sysbus_realize_and_unref(d, &error_fatal);
     sysbus_mmio_map(d, 0, IO_APIC_DEFAULT_ADDRESS);
@@ -631,13 +634,16 @@ void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name)
     }
 }
 
-DeviceState *ioapic_init_secondary(GSIState *gsi_state)
+DeviceState *ioapic_init_secondary(GSIState *gsi_state,
+                                   bool level_trigger_unsupported)
 {
     DeviceState *dev;
     SysBusDevice *d;
     unsigned int i;
 
     dev = qdev_new(TYPE_IOAPIC);
+    object_property_set_bool(OBJECT(dev), "level_trigger_unsupported",
+                             level_trigger_unsupported, NULL);
     d = SYS_BUS_DEVICE(dev);
     sysbus_realize_and_unref(d, &error_fatal);
     sysbus_mmio_map(d, 0, IO_APIC_SECONDARY_ADDRESS);
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 6eff42550f..7536e5fb8c 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -140,7 +140,9 @@ typedef struct GSIState {
 
 qemu_irq x86_allocate_cpu_irq(void);
 void gsi_handler(void *opaque, int n, int level);
-void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name);
-DeviceState *ioapic_init_secondary(GSIState *gsi_state);
+void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name,
+                     bool eoi_intercept_unsupported);
+DeviceState *ioapic_init_secondary(GSIState *gsi_state,
+                                   bool eoi_intercept_unsupported);
 
 #endif
-- 
2.25.1
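
For readers following the plumbing, here is a minimal standalone sketch of
the ordering that ioapic_init_gsi() relies on: the machine-level flag is
copied into the device property first, and only then is the device realized.
The demo_* types and functions are hypothetical stand-ins, not QEMU APIs, and
the before-realize assertion is an assumption modelled on the property
setters added elsewhere in this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ioapic device and the x86 machine state. */
struct demo_ioapic {
    bool realized;
    bool level_trigger_unsupported;
};

struct demo_machine {
    bool eoi_intercept_unsupported;
};

/* Assumed contract: the flag may only be changed before realize. */
void demo_ioapic_set_level_trigger_unsupported(struct demo_ioapic *dev,
                                               bool value)
{
    assert(!dev->realized);
    dev->level_trigger_unsupported = value;
}

void demo_ioapic_realize(struct demo_ioapic *dev)
{
    dev->realized = true;
}

/* Same order as in the patch: set the property, then realize. */
struct demo_ioapic demo_ioapic_create(const struct demo_machine *ms)
{
    struct demo_ioapic dev = { .realized = false,
                               .level_trigger_unsupported = false };

    demo_ioapic_set_level_trigger_unsupported(&dev,
                                              ms->eoi_intercept_unsupported);
    demo_ioapic_realize(&dev);
    return dev;
}
```

Calling demo_ioapic_create() with eoi_intercept_unsupported set yields a
realized device whose level_trigger_unsupported flag is true; swapping the
two calls inside it would trip the assertion.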


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 39/44] ioapic: add property to disallow SMI delivery mode
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add a property that stops the ioapic from accepting SMI delivery mode.  A
request for it is rewritten to fixed delivery mode with a one-time warning.
Without this guard, qemu can behave unexpectedly.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/intc/ioapic.c                  | 18 ++++++++++++++++++
 hw/intc/ioapic_common.c           | 20 ++++++++++++++++++++
 include/hw/i386/ioapic_internal.h |  1 +
 3 files changed, 39 insertions(+)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 6d61744961..1815fbd282 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -381,6 +381,21 @@ ioapic_fix_level_trigger_unsupported(uint64_t *entry)
     }
 }
 
+static inline void
+ioapic_fix_smi_unsupported(uint64_t *entry)
+{
+    if ((*entry & IOAPIC_LVT_DELIV_MODE) ==
+        IOAPIC_DM_PMI << IOAPIC_LVT_DELIV_MODE_SHIFT) {
+        /*
+         * ignore a request for SMI delivery mode
+         */
+        warn_report_once("attempting to set delivery mode to SMI "
+                         "which is not supported");
+        *entry &= ~IOAPIC_LVT_DELIV_MODE;
+        *entry |= IOAPIC_DM_FIXED << IOAPIC_LVT_DELIV_MODE_SHIFT;
+    }
+}
+
 static void
 ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
                  unsigned int size)
@@ -424,6 +439,9 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
                 if (s->level_trigger_unsupported) {
                     ioapic_fix_level_trigger_unsupported(&s->ioredtbl[index]);
                 }
+                if (s->smi_unsupported) {
+                    ioapic_fix_smi_unsupported(&s->ioredtbl[index]);
+                }
                 ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);
                 ioapic_service(s);
             }
diff --git a/hw/intc/ioapic_common.c b/hw/intc/ioapic_common.c
index 07ee142470..b8ef7efbad 100644
--- a/hw/intc/ioapic_common.c
+++ b/hw/intc/ioapic_common.c
@@ -168,12 +168,32 @@ static void ioapic_common_set_level_trigger_unsupported(Object *obj, bool value,
     s->level_trigger_unsupported = value;
 }
 
+static bool ioapic_common_get_smi_unsupported(Object *obj, Error **errp)
+{
+    IOAPICCommonState *s = IOAPIC_COMMON(obj);
+    return s->smi_unsupported;
+}
+
+static void ioapic_common_set_smi_unsupported(Object *obj, bool value,
+                                              Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    IOAPICCommonState *s = IOAPIC_COMMON(obj);
+    /* only disabling before realize is allowed */
+    assert(!dev->realized);
+    assert(!s->smi_unsupported);
+    s->smi_unsupported = value;
+}
+
 static void ioapic_common_init(Object *obj)
 {
     object_property_add_bool(obj, "level_trigger_unsupported",
                              ioapic_common_get_level_trigger_unsupported,
                              ioapic_common_set_level_trigger_unsupported);
 
+    object_property_add_bool(obj, "smi_unsupported",
+                             ioapic_common_get_smi_unsupported,
+                             ioapic_common_set_smi_unsupported);
 }
 
 static void ioapic_common_realize(DeviceState *dev, Error **errp)
diff --git a/include/hw/i386/ioapic_internal.h b/include/hw/i386/ioapic_internal.h
index 20f2fc7897..46f22a4f85 100644
--- a/include/hw/i386/ioapic_internal.h
+++ b/include/hw/i386/ioapic_internal.h
@@ -104,6 +104,7 @@ struct IOAPICCommonState {
     uint64_t ioredtbl[IOAPIC_NUM_PINS];
     Notifier machine_done;
     bool level_trigger_unsupported;
+    bool smi_unsupported;
     uint8_t version;
     uint64_t irq_count[IOAPIC_NUM_PINS];
     int irq_level[IOAPIC_NUM_PINS];
-- 
2.25.1
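
As an illustration of what the new guard does to a redirection table entry,
here is a minimal standalone sketch.  It assumes the delivery mode field
occupies bits 10:8 of the entry and that SMI is encoded as 0x2 (the value
behind IOAPIC_DM_PMI in the patch); the DEMO_* names and demo_* functions
are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: delivery mode lives in bits 10:8 of a redirection entry. */
#define DEMO_DELIV_MODE_SHIFT 8
#define DEMO_DELIV_MODE_MASK  (UINT64_C(7) << DEMO_DELIV_MODE_SHIFT)
#define DEMO_DM_FIXED         UINT64_C(0x0)
#define DEMO_DM_SMI           UINT64_C(0x2)  /* assumed to equal IOAPIC_DM_PMI */
#define DEMO_DM_NMI           UINT64_C(0x4)

/* Rewrite an SMI request to fixed delivery mode; leave other modes alone. */
uint64_t demo_fix_smi_unsupported(uint64_t entry)
{
    if ((entry & DEMO_DELIV_MODE_MASK) ==
        (DEMO_DM_SMI << DEMO_DELIV_MODE_SHIFT)) {
        entry &= ~DEMO_DELIV_MODE_MASK;
        entry |= DEMO_DM_FIXED << DEMO_DELIV_MODE_SHIFT;
    }
    return entry;
}

/* Usage: vector 0x31 with SMI mode is downgraded, NMI mode is untouched. */
void demo_smi_usage(void)
{
    uint64_t smi = (DEMO_DM_SMI << DEMO_DELIV_MODE_SHIFT) | 0x31;
    uint64_t nmi = (DEMO_DM_NMI << DEMO_DELIV_MODE_SHIFT) | 0x31;

    assert(demo_fix_smi_unsupported(smi) == 0x31);
    assert(demo_fix_smi_unsupported(nmi) == nmi);
}
```

Only the three delivery mode bits change; the vector and the remaining
fields of the entry are preserved.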


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 40/44] hw/i386: add a flag to disallow SMI
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add a new flag, smi_unsupported, to X86MachineState to disallow SMI and pass
it to ioapic creation so that the ioapic disallows SMI delivery mode.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/i386/microvm.c     |  6 ++++--
 hw/i386/pc_piix.c     |  3 ++-
 hw/i386/pc_q35.c      |  3 ++-
 hw/i386/x86.c         | 11 +++++++++--
 include/hw/i386/x86.h |  7 +++++--
 5 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 9b03d051ca..7504324891 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -175,10 +175,12 @@ static void microvm_devices_init(MicrovmMachineState *mms)
                           &error_abort);
     isa_bus_irqs(isa_bus, x86ms->gsi);
 
-    ioapic_init_gsi(gsi_state, "machine", x86ms->eoi_intercept_unsupported);
+    ioapic_init_gsi(gsi_state, "machine", x86ms->eoi_intercept_unsupported,
+                    x86ms->smi_unsupported);
     if (ioapics > 1) {
         x86ms->ioapic2 = ioapic_init_secondary(
-            gsi_state, x86ms->eoi_intercept_unsupported);
+            gsi_state, x86ms->eoi_intercept_unsupported,
+            x86ms->smi_unsupported);
     }
 
     kvmclock_create(true);
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index a601c4a916..0958035bf8 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -223,7 +223,8 @@ static void pc_init1(MachineState *machine,
     }
 
     if (pcmc->pci_enabled) {
-        ioapic_init_gsi(gsi_state, "i440fx", x86ms->eoi_intercept_unsupported);
+        ioapic_init_gsi(gsi_state, "i440fx", x86ms->eoi_intercept_unsupported,
+                        x86ms->smi_unsupported);
     }
 
     if (tcg_enabled()) {
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 464463766c..1ab8a6a78b 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -256,7 +256,8 @@ static void pc_q35_init(MachineState *machine)
     }
 
     if (pcmc->pci_enabled) {
-        ioapic_init_gsi(gsi_state, "q35", x86ms->eoi_intercept_unsupported);
+        ioapic_init_gsi(gsi_state, "q35", x86ms->eoi_intercept_unsupported,
+                        x86ms->smi_unsupported);
     }
 
     if (tcg_enabled()) {
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 88c365b72d..3dc36e3590 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -609,7 +609,8 @@ void gsi_handler(void *opaque, int n, int level)
 }
 
 void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name,
-                     bool level_trigger_unsupported)
+                     bool level_trigger_unsupported,
+                     bool smi_unsupported)
 {
     DeviceState *dev;
     SysBusDevice *d;
@@ -625,6 +626,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name,
                               "ioapic", OBJECT(dev));
     object_property_set_bool(OBJECT(dev), "level_trigger_unsupported",
                              level_trigger_unsupported, NULL);
+    object_property_set_bool(OBJECT(dev), "smi_unsupported",
+                             smi_unsupported, NULL);
     d = SYS_BUS_DEVICE(dev);
     sysbus_realize_and_unref(d, &error_fatal);
     sysbus_mmio_map(d, 0, IO_APIC_DEFAULT_ADDRESS);
@@ -635,7 +638,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name,
 }
 
 DeviceState *ioapic_init_secondary(GSIState *gsi_state,
-                                   bool level_trigger_unsupported)
+                                   bool level_trigger_unsupported,
+                                   bool smi_unsupported)
 {
     DeviceState *dev;
     SysBusDevice *d;
@@ -644,6 +648,8 @@ DeviceState *ioapic_init_secondary(GSIState *gsi_state,
     dev = qdev_new(TYPE_IOAPIC);
     object_property_set_bool(OBJECT(dev), "level_trigger_unsupported",
                              level_trigger_unsupported, NULL);
+    object_property_set_bool(OBJECT(dev), "smi_unsupported",
+                             smi_unsupported, NULL);
     d = SYS_BUS_DEVICE(dev);
     sysbus_realize_and_unref(d, &error_fatal);
     sysbus_mmio_map(d, 0, IO_APIC_SECONDARY_ADDRESS);
@@ -1318,6 +1324,7 @@ static void x86_machine_initfn(Object *obj)
     x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
     x86ms->bus_lock_ratelimit = 0;
     x86ms->eoi_intercept_unsupported = false;
+    x86ms->smi_unsupported = false;
 
     object_property_add_str(obj, "kvm-type",
                             x86_get_kvm_type, x86_set_kvm_type);
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 7536e5fb8c..3d1d74d171 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -64,6 +64,7 @@ struct X86MachineState {
     unsigned apic_id_limit;
     uint16_t boot_cpus;
     bool eoi_intercept_unsupported;
+    bool smi_unsupported;
 
     OnOffAuto smm;
     OnOffAuto acpi;
@@ -141,8 +142,10 @@ typedef struct GSIState {
 qemu_irq x86_allocate_cpu_irq(void);
 void gsi_handler(void *opaque, int n, int level);
 void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name,
-                     bool eoi_intercept_unsupported);
+                     bool eoi_intercept_unsupported,
+                     bool smi_unsupported);
 DeviceState *ioapic_init_secondary(GSIState *gsi_state,
-                                   bool eoi_intercept_unsupported);
+                                   bool eoi_intercept_unsupported,
+                                   bool smi_unsupported);
 
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 41/44] ioapic: add property to disallow INIT/SIPI delivery mode
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add a property that stops the ioapic from accepting INIT/SIPI delivery mode.
A request for either is rewritten to fixed delivery mode with a one-time
warning.  Without this guard, qemu can behave unexpectedly.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/intc/ioapic.c                  | 19 +++++++++++++++++++
 hw/intc/ioapic_common.c           | 21 +++++++++++++++++++++
 include/hw/i386/ioapic_internal.h |  1 +
 3 files changed, 41 insertions(+)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 1815fbd282..f7eb9f7146 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -396,6 +396,22 @@ ioapic_fix_smi_unsupported(uint64_t *entry)
     }
 }
 
+static inline void
+ioapic_fix_init_sipi_unsupported(uint64_t *entry)
+{
+    uint64_t delmode = *entry & IOAPIC_LVT_DELIV_MODE;
+    if (delmode == IOAPIC_DM_INIT << IOAPIC_LVT_DELIV_MODE_SHIFT ||
+        delmode == IOAPIC_DM_SIPI << IOAPIC_LVT_DELIV_MODE_SHIFT) {
+        /*
+         * ignore a request for INIT/SIPI delivery mode
+         */
+        warn_report_once("attempting to set delivery mode to INIT/SIPI "
+                         "which is not supported");
+        *entry &= ~IOAPIC_LVT_DELIV_MODE;
+        *entry |= IOAPIC_DM_FIXED << IOAPIC_LVT_DELIV_MODE_SHIFT;
+    }
+}
+
 static void
 ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
                  unsigned int size)
@@ -442,6 +458,9 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
                 if (s->smi_unsupported) {
                     ioapic_fix_smi_unsupported(&s->ioredtbl[index]);
                 }
+                if (s->init_sipi_unsupported) {
+                    ioapic_fix_init_sipi_unsupported(&s->ioredtbl[index]);
+                }
                 ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);
                 ioapic_service(s);
             }
diff --git a/hw/intc/ioapic_common.c b/hw/intc/ioapic_common.c
index b8ef7efbad..018bacbf96 100644
--- a/hw/intc/ioapic_common.c
+++ b/hw/intc/ioapic_common.c
@@ -185,6 +185,23 @@ static void ioapic_common_set_smi_unsupported(Object *obj, bool value,
     s->smi_unsupported = value;
 }
 
+static bool ioapic_common_get_init_sipi_unsupported(Object *obj, Error **errp)
+{
+    IOAPICCommonState *s = IOAPIC_COMMON(obj);
+    return s->init_sipi_unsupported;
+}
+
+static void ioapic_common_set_init_sipi_unsupported(Object *obj, bool value,
+                                                    Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    IOAPICCommonState *s = IOAPIC_COMMON(obj);
+    /* only disabling before realize is allowed */
+    assert(!dev->realized);
+    assert(!s->init_sipi_unsupported);
+    s->init_sipi_unsupported = value;
+}
+
 static void ioapic_common_init(Object *obj)
 {
     object_property_add_bool(obj, "level_trigger_unsupported",
@@ -194,6 +211,10 @@ static void ioapic_common_init(Object *obj)
     object_property_add_bool(obj, "smi_unsupported",
                              ioapic_common_get_smi_unsupported,
                              ioapic_common_set_smi_unsupported);
+
+    object_property_add_bool(obj, "init_sipi_unsupported",
+                             ioapic_common_get_init_sipi_unsupported,
+                             ioapic_common_set_init_sipi_unsupported);
 }
 
 static void ioapic_common_realize(DeviceState *dev, Error **errp)
diff --git a/include/hw/i386/ioapic_internal.h b/include/hw/i386/ioapic_internal.h
index 46f22a4f85..634b97426d 100644
--- a/include/hw/i386/ioapic_internal.h
+++ b/include/hw/i386/ioapic_internal.h
@@ -105,6 +105,7 @@ struct IOAPICCommonState {
     Notifier machine_done;
     bool level_trigger_unsupported;
     bool smi_unsupported;
+    bool init_sipi_unsupported;
     uint8_t version;
     uint64_t irq_count[IOAPIC_NUM_PINS];
     int irq_level[IOAPIC_NUM_PINS];
-- 
2.25.1
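
To show how the per-device flags gate the delivery mode checks in the write
path, here is a minimal standalone sketch.  Each check runs only when its
flag is set, in the same order as in ioapic_mem_write().  The bit layout
(delivery mode in bits 10:8) and the encodings SMI=0x2, INIT=0x5, SIPI=0x6
are assumptions, and every DEMO_* and demo_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout and encodings of the delivery mode field. */
#define DEMO_DM_SHIFT 8
#define DEMO_DM_MASK  (UINT64_C(7) << DEMO_DM_SHIFT)
#define DEMO_DM_FIXED UINT64_C(0x0)
#define DEMO_DM_SMI   UINT64_C(0x2)
#define DEMO_DM_INIT  UINT64_C(0x5)
#define DEMO_DM_SIPI  UINT64_C(0x6)

/* Hypothetical mirror of the two boolean properties on the device. */
struct demo_guards {
    bool smi_unsupported;
    bool init_sipi_unsupported;
};

/* Replace the delivery mode field with fixed mode, keeping other bits. */
uint64_t demo_force_fixed(uint64_t entry)
{
    return (entry & ~DEMO_DM_MASK) | (DEMO_DM_FIXED << DEMO_DM_SHIFT);
}

/* Apply each enabled guard in turn, as the write handler does. */
uint64_t demo_sanitize_entry(const struct demo_guards *g, uint64_t entry)
{
    uint64_t mode = (entry & DEMO_DM_MASK) >> DEMO_DM_SHIFT;

    if (g->smi_unsupported && mode == DEMO_DM_SMI) {
        entry = demo_force_fixed(entry);
    }
    if (g->init_sipi_unsupported &&
        (mode == DEMO_DM_INIT || mode == DEMO_DM_SIPI)) {
        entry = demo_force_fixed(entry);
    }
    return entry;
}

/* Usage: with only the SMI guard set, a SIPI request still passes through. */
void demo_guards_usage(void)
{
    struct demo_guards only_smi = { .smi_unsupported = true,
                                    .init_sipi_unsupported = false };
    uint64_t sipi = (DEMO_DM_SIPI << DEMO_DM_SHIFT) | 0x40;

    assert(demo_sanitize_entry(&only_smi, sipi) == sipi);
}
```

Keeping the guards as independent flags lets a machine type disable only the
delivery modes it cannot support.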


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 41/44] ioapic: add property to disallow INIT/SIPI delivery mode
@ 2021-07-08  0:55   ` isaku.yamahata
  0 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: isaku.yamahata, isaku.yamahata, kvm

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add a property to prevent the ioapic from accepting the INIT/SIPI delivery
modes.  Without this guard, QEMU can exhibit unexpected behavior.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/intc/ioapic.c                  | 19 +++++++++++++++++++
 hw/intc/ioapic_common.c           | 21 +++++++++++++++++++++
 include/hw/i386/ioapic_internal.h |  1 +
 3 files changed, 41 insertions(+)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 1815fbd282..f7eb9f7146 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -396,6 +396,22 @@ ioapic_fix_smi_unsupported(uint64_t *entry)
     }
 }
 
+static inline void
+ioapic_fix_init_sipi_unsupported(uint64_t *entry)
+{
+    uint64_t delmode = *entry & IOAPIC_LVT_DELIV_MODE;
+    if (delmode == IOAPIC_DM_INIT << IOAPIC_LVT_DELIV_MODE_SHIFT ||
+        delmode == IOAPIC_DM_SIPI << IOAPIC_LVT_DELIV_MODE_SHIFT) {
+        /*
+         * ignore a request for INIT/SIPI delivery mode; use fixed instead
+         */
+        warn_report_once("attempting to set delivery mode to INIT/SIPI, "
+                         "which is not supported");
+        *entry &= ~IOAPIC_LVT_DELIV_MODE;
+        *entry |= IOAPIC_DM_FIXED << IOAPIC_LVT_DELIV_MODE_SHIFT;
+    }
+}
+
 static void
 ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
                  unsigned int size)
@@ -442,6 +458,9 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
                 if (s->smi_unsupported) {
                     ioapic_fix_smi_unsupported(&s->ioredtbl[index]);
                 }
+                if (s->init_sipi_unsupported) {
+                    ioapic_fix_init_sipi_unsupported(&s->ioredtbl[index]);
+                }
                 ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);
                 ioapic_service(s);
             }
diff --git a/hw/intc/ioapic_common.c b/hw/intc/ioapic_common.c
index b8ef7efbad..018bacbf96 100644
--- a/hw/intc/ioapic_common.c
+++ b/hw/intc/ioapic_common.c
@@ -185,6 +185,23 @@ static void ioapic_common_set_smi_unsupported(Object *obj, bool value,
     s->smi_unsupported = value;
 }
 
+static bool ioapic_common_get_init_sipi_unsupported(Object *obj, Error **errp)
+{
+    IOAPICCommonState *s = IOAPIC_COMMON(obj);
+    return s->init_sipi_unsupported;
+}
+
+static void ioapic_common_set_init_sipi_unsupported(Object *obj, bool value,
+                                                       Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    IOAPICCommonState *s = IOAPIC_COMMON(obj);
+    /* only disabling before realize is allowed */
+    assert(!dev->realized);
+    assert(!s->init_sipi_unsupported);
+    s->init_sipi_unsupported = value;
+}
+
 static void ioapic_common_init(Object *obj)
 {
     object_property_add_bool(obj, "level_trigger_unsupported",
@@ -194,6 +211,10 @@ static void ioapic_common_init(Object *obj)
     object_property_add_bool(obj, "smi_unsupported",
                              ioapic_common_get_smi_unsupported,
                              ioapic_common_set_smi_unsupported);
+
+    object_property_add_bool(obj, "init_sipi_unsupported",
+                             ioapic_common_get_init_sipi_unsupported,
+                             ioapic_common_set_init_sipi_unsupported);
 }
 
 static void ioapic_common_realize(DeviceState *dev, Error **errp)
diff --git a/include/hw/i386/ioapic_internal.h b/include/hw/i386/ioapic_internal.h
index 46f22a4f85..634b97426d 100644
--- a/include/hw/i386/ioapic_internal.h
+++ b/include/hw/i386/ioapic_internal.h
@@ -105,6 +105,7 @@ struct IOAPICCommonState {
     Notifier machine_done;
     bool level_trigger_unsupported;
     bool smi_unsupported;
+    bool init_sipi_unsupported;
     uint8_t version;
     uint64_t irq_count[IOAPIC_NUM_PINS];
     int irq_level[IOAPIC_NUM_PINS];
-- 
2.25.1
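
A minimal standalone sketch of the redirection-entry rewrite done by
ioapic_fix_init_sipi_unsupported(): if the delivery-mode field holds INIT
or SIPI, it is cleared and replaced with fixed delivery.  The SKETCH_*
constants and sketch_fix_init_sipi() are hypothetical names for this
example.  Their values (a 3-bit field at bits 10:8, INIT = 5, SIPI = 6,
fixed = 0) are assumptions about the IOAPIC_* definitions the patch
relies on.

```c
#include <stdint.h>

/* Assumed layout of the delivery-mode field in a redirection entry. */
#define SKETCH_DM_FIXED  0x0u
#define SKETCH_DM_INIT   0x5u
#define SKETCH_DM_SIPI   0x6u
#define SKETCH_DM_MASK   0x7u
#define SKETCH_DM_SHIFT  8

/*
 * Rewrites INIT/SIPI delivery mode to fixed delivery.
 * Returns 1 if the entry was changed, 0 if it was left alone.
 */
int sketch_fix_init_sipi(uint64_t *entry)
{
    uint64_t field = (uint64_t)SKETCH_DM_MASK << SKETCH_DM_SHIFT;
    uint64_t mode = (*entry & field) >> SKETCH_DM_SHIFT;

    if (mode != SKETCH_DM_INIT && mode != SKETCH_DM_SIPI) {
        return 0;
    }
    /* Clear the field, then store the fixed delivery mode. */
    *entry &= ~field;
    *entry |= (uint64_t)SKETCH_DM_FIXED << SKETCH_DM_SHIFT;
    return 1;
}
```

All bits outside the delivery-mode field, such as the vector in the low
byte, are preserved by the rewrite.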



^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 42/44] hw/i386: add a flag to disable init/sipi delivery mode of interrupt
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add a new flag to X86MachineState to disallow the INIT/SIPI interrupt
delivery modes, and pass it to ioapic creation so that the ioapic rejects
them.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/i386/microvm.c     |  4 ++--
 hw/i386/pc_piix.c     |  2 +-
 hw/i386/pc_q35.c      |  2 +-
 hw/i386/x86.c         | 11 +++++++++--
 include/hw/i386/x86.h |  7 +++++--
 5 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 7504324891..c790adecfb 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -176,11 +176,11 @@ static void microvm_devices_init(MicrovmMachineState *mms)
     isa_bus_irqs(isa_bus, x86ms->gsi);
 
     ioapic_init_gsi(gsi_state, "machine", x86ms->eoi_intercept_unsupported,
-                    x86ms->smi_unsupported);
+                    x86ms->smi_unsupported, x86ms->init_sipi_unsupported);
     if (ioapics > 1) {
         x86ms->ioapic2 = ioapic_init_secondary(
             gsi_state, x86ms->eoi_intercept_unsupported,
-            x86ms->smi_unsupported);
+            x86ms->smi_unsupported, x86ms->init_sipi_unsupported);
     }
 
     kvmclock_create(true);
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 0958035bf8..940cd0f47b 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -224,7 +224,7 @@ static void pc_init1(MachineState *machine,
 
     if (pcmc->pci_enabled) {
         ioapic_init_gsi(gsi_state, "i440fx", x86ms->eoi_intercept_unsupported,
-                        x86ms->smi_unsupported);
+                        x86ms->smi_unsupported, x86ms->init_sipi_unsupported);
     }
 
     if (tcg_enabled()) {
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 1ab8a6a78b..8f677ec136 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -257,7 +257,7 @@ static void pc_q35_init(MachineState *machine)
 
     if (pcmc->pci_enabled) {
         ioapic_init_gsi(gsi_state, "q35", x86ms->eoi_intercept_unsupported,
-                        x86ms->smi_unsupported);
+                        x86ms->smi_unsupported, x86ms->init_sipi_unsupported);
     }
 
     if (tcg_enabled()) {
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 3dc36e3590..24af05c313 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -610,7 +610,8 @@ void gsi_handler(void *opaque, int n, int level)
 
 void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name,
                      bool level_trigger_unsupported,
-                     bool smi_unsupported)
+                     bool smi_unsupported,
+                     bool init_sipi_unsupported)
 {
     DeviceState *dev;
     SysBusDevice *d;
@@ -628,6 +629,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name,
                              level_trigger_unsupported, NULL);
     object_property_set_bool(OBJECT(dev), "smi_unsupported",
                              smi_unsupported, NULL);
+    object_property_set_bool(OBJECT(dev), "init_sipi_unsupported",
+                             init_sipi_unsupported, NULL);
     d = SYS_BUS_DEVICE(dev);
     sysbus_realize_and_unref(d, &error_fatal);
     sysbus_mmio_map(d, 0, IO_APIC_DEFAULT_ADDRESS);
@@ -639,7 +642,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name,
 
 DeviceState *ioapic_init_secondary(GSIState *gsi_state,
                                    bool level_trigger_unsupported,
-                                   bool smi_unsupported)
+                                   bool smi_unsupported,
+                                   bool init_sipi_unsupported)
 {
     DeviceState *dev;
     SysBusDevice *d;
@@ -650,6 +654,8 @@ DeviceState *ioapic_init_secondary(GSIState *gsi_state,
                              level_trigger_unsupported, NULL);
     object_property_set_bool(OBJECT(dev), "smi_unsupported",
                              smi_unsupported, NULL);
+    object_property_set_bool(OBJECT(dev), "init_sipi_unsupported",
+                             init_sipi_unsupported, NULL);
     d = SYS_BUS_DEVICE(dev);
     sysbus_realize_and_unref(d, &error_fatal);
     sysbus_mmio_map(d, 0, IO_APIC_SECONDARY_ADDRESS);
@@ -1325,6 +1331,7 @@ static void x86_machine_initfn(Object *obj)
     x86ms->bus_lock_ratelimit = 0;
     x86ms->eoi_intercept_unsupported = false;
     x86ms->smi_unsupported = false;
+    x86ms->init_sipi_unsupported = false;
 
     object_property_add_str(obj, "kvm-type",
                             x86_get_kvm_type, x86_set_kvm_type);
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 3d1d74d171..bca8c2b57d 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -65,6 +65,7 @@ struct X86MachineState {
     uint16_t boot_cpus;
     bool eoi_intercept_unsupported;
     bool smi_unsupported;
+    bool init_sipi_unsupported;
 
     OnOffAuto smm;
     OnOffAuto acpi;
@@ -143,9 +144,11 @@ qemu_irq x86_allocate_cpu_irq(void);
 void gsi_handler(void *opaque, int n, int level);
 void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name,
                      bool eoi_intercept_unsupported,
-                     bool smi_unsupported);
+                     bool smi_unsupported,
+                     bool init_sipi_unsupported);
 DeviceState *ioapic_init_secondary(GSIState *gsi_state,
                                    bool eoi_intercept_unsupported,
-                                   bool smi_unsupported);
+                                   bool smi_unsupported,
+                                   bool init_sipi_unsupported);
 
 #endif
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 43/44] i386/tdx: disallow level interrupt and SMI/INIT/SIPI delivery mode
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

TDX doesn't allow level-triggered interrupts or the SMI/INIT/SIPI interrupt
delivery modes, so disallow them.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/i386/x86.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 24af05c313..c372403b87 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1307,6 +1307,9 @@ static int x86_kvm_type(MachineState *ms, const char *vm_type)
         kvm_type = KVM_X86_LEGACY_VM;
     } else if (!g_ascii_strcasecmp(vm_type, "tdx")) {
         kvm_type = KVM_X86_TDX_VM;
+        X86_MACHINE(ms)->eoi_intercept_unsupported = true;
+        X86_MACHINE(ms)->smi_unsupported = true;
+        X86_MACHINE(ms)->init_sipi_unsupported = true;
     } else {
         error_report("Unknown kvm-type specified '%s'", vm_type);
         exit(1);
-- 
2.25.1
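
A minimal standalone sketch of how selecting the "tdx" kvm-type turns on
the three restriction flags.  struct sketch_machine, enum sketch_vm_type
and both functions are hypothetical stand-ins, not QEMU API.  Treating a
NULL type as "legacy" is an assumption based on "legacy" being described
as the default, and the sketch returns an error value where the patched
function reports the error and exits.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the X86MachineState fields set by the patch. */
struct sketch_machine {
    bool eoi_intercept_unsupported;
    bool smi_unsupported;
    bool init_sipi_unsupported;
};

enum sketch_vm_type {
    SKETCH_VM_UNKNOWN = -1,
    SKETCH_VM_LEGACY = 0,
    SKETCH_VM_TDX = 1
};

/* Case-insensitive comparison of two ASCII strings. */
bool sketch_ascii_equal_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return false;
        }
        a++;
        b++;
    }
    return *a == *b;
}

/* Maps the type string to a VM type; "tdx" also sets the three flags. */
int sketch_kvm_type(struct sketch_machine *m, const char *vm_type)
{
    if (vm_type == NULL || sketch_ascii_equal_nocase(vm_type, "legacy")) {
        return SKETCH_VM_LEGACY;
    }
    if (sketch_ascii_equal_nocase(vm_type, "tdx")) {
        m->eoi_intercept_unsupported = true;
        m->smi_unsupported = true;
        m->init_sipi_unsupported = true;
        return SKETCH_VM_TDX;
    }
    return SKETCH_VM_UNKNOWN;
}
```

A legacy VM leaves all three flags false, so the ioapic behaves as before
unless TDX is requested.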


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* [RFC PATCH v2 44/44] i386/tdx: disable S3/S4 unconditionally
  2021-07-08  0:54 ` isaku.yamahata
@ 2021-07-08  0:55   ` isaku.yamahata
  -1 siblings, 0 replies; 173+ messages in thread
From: isaku.yamahata @ 2021-07-08  0:55 UTC (permalink / raw)
  To: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas
  Cc: kvm, isaku.yamahata, isaku.yamahata

From: Isaku Yamahata <isaku.yamahata@intel.com>

Disable S3/S4 unconditionally when TDX is enabled.  Because CPU state is
protected, resetting it is not allowed, so S3/S4 can't be
supported.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 target/i386/kvm/tdx.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 0621317b0a..0dd6d94c2a 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -31,6 +31,9 @@
 #include "sysemu/tdx.h"
 #include "tdx.h"
 
+#include "hw/southbridge/piix.h"
+#include "hw/i386/ich9.h"
+
 #define TDX1_TD_ATTRIBUTE_DEBUG BIT_ULL(0)
 #define TDX1_TD_ATTRIBUTE_PERFMON BIT_ULL(63)
 #define TDX1_MIN_TSC_FREQUENCY_KHZ (100 * 1000)
@@ -103,10 +106,27 @@ static TdxFirmwareEntry *tdx_get_hob_entry(TdxGuest *tdx)
 
 static void tdx_finalize_vm(Notifier *notifier, void *unused)
 {
+    Object *pm;
+    bool ambig;
     MachineState *ms = MACHINE(qdev_get_machine());
     TdxGuest *tdx = TDX_GUEST(ms->cgs);
     TdxFirmwareEntry *entry;
 
+    /*
+     * The object lookup logic is copied from acpi_get_pm_info()
+     * in hw/i386/acpi-build.c.
+     * This property override needs to be done after machine initialization
+     * as there is no ordering of creation of objects/properties.
+     */
+    pm = object_resolve_path_type("", TYPE_PIIX4_PM, &ambig);
+    if (ambig || !pm) {
+        pm = object_resolve_path_type("", TYPE_ICH9_LPC_DEVICE, &ambig);
+    }
+    if (!ambig && pm) {
+        object_property_set_uint(pm, ACPI_PM_PROP_S3_DISABLED, 1, NULL);
+        object_property_set_uint(pm, ACPI_PM_PROP_S4_DISABLED, 1, NULL);
+    }
+
     tdvf_hob_create(tdx, tdx_get_hob_entry(tdx));
 
     for_each_fw_entry(&tdx->fw, entry) {
-- 
2.25.1
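
A minimal standalone sketch of the lookup fallback used in
tdx_finalize_vm(): try the PIIX4 PM device first, fall back to the ICH9
LPC device if that lookup is missing or ambiguous, and apply the S3/S4
override only when one unambiguous object was found.  struct sketch_pm,
struct sketch_lookup and sketch_disable_s3_s4() are hypothetical; each
sketch_lookup value models the result of one object_resolve_path_type()
call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PM object's two properties. */
struct sketch_pm {
    bool s3_disabled;
    bool s4_disabled;
};

/* Models one lookup result: the object found (or NULL) and ambiguity. */
struct sketch_lookup {
    struct sketch_pm *obj;
    bool ambiguous;
};

/* Returns true when the override was applied, false when it was skipped. */
bool sketch_disable_s3_s4(struct sketch_lookup piix4,
                          struct sketch_lookup ich9)
{
    struct sketch_lookup pick = piix4;

    /* Fall back to the second device type if the first is unusable. */
    if (pick.ambiguous || pick.obj == NULL) {
        pick = ich9;
    }
    if (pick.ambiguous || pick.obj == NULL) {
        return false;
    }
    pick.obj->s3_disabled = true;
    pick.obj->s4_disabled = true;
    return true;
}
```

The return value makes the skipped case visible to the caller; the patch
itself skips the override silently when no unambiguous device is found.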


^ permalink raw reply related	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 01/44] target/i386: Expose x86_cpu_get_supported_feature_word() for TDX
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-07-22 17:52     ` Connor Kuehl
  -1 siblings, 0 replies; 173+ messages in thread
From: Connor Kuehl @ 2021-07-22 17:52 UTC (permalink / raw)
  To: isaku.yamahata, qemu-devel, pbonzini, alistair, ehabkost,
	marcel.apfelbaum, mst, cohuck, mtosatti, xiaoyao.li, seanjc,
	erdemaktas
  Cc: isaku.yamahata, Sean Christopherson, kvm

On 7/7/21 7:54 PM, isaku.yamahata@gmail.com wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Expose x86_cpu_get_supported_feature_word() outside of cpu.c so that it
> can be used by TDX to setup the VM-wide CPUID configuration.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>

Reviewed-by: Connor Kuehl <ckuehl@redhat.com>


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 02/44] kvm: Switch KVM_CAP_READONLY_MEM to a per-VM ioctl()
  2021-07-08  0:54   ` isaku.yamahata
  (?)
@ 2021-07-22 17:52   ` Connor Kuehl
  -1 siblings, 0 replies; 173+ messages in thread
From: Connor Kuehl @ 2021-07-22 17:52 UTC (permalink / raw)
  To: isaku.yamahata, qemu-devel, pbonzini, alistair, ehabkost,
	marcel.apfelbaum, mst, cohuck, mtosatti, xiaoyao.li, seanjc,
	erdemaktas
  Cc: isaku.yamahata, kvm

On 7/7/21 7:54 PM, isaku.yamahata@gmail.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Switch to making a VM ioctl() call for KVM_CAP_READONLY_MEM, which may
> be conditional on VM type in recent versions of KVM, e.g. when TDX is
> supported.
> 
> kvm_vm_check_extension() has fallback from kvm_vm_ioctl() to
> kvm_check_extension(). fallback from VM ioctl to System ioctl for
> compatibility for old kernel.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>   accel/kvm/kvm-all.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index e5b10dd129..fdbe24bf59 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2531,7 +2531,7 @@ static int kvm_init(MachineState *ms)
>       }
>   
>       kvm_readonly_mem_allowed =
> -        (kvm_check_extension(s, KVM_CAP_READONLY_MEM) > 0);
> +        (kvm_vm_check_extension(s, KVM_CAP_READONLY_MEM) > 0);
>   
>       kvm_eventfds_allowed =
>           (kvm_check_extension(s, KVM_CAP_IOEVENTFD) > 0);
> 

Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
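
A minimal standalone sketch of the fallback described in the quoted
commit message: query the capability through the VM-scoped check first
and, if that fails (as on an old kernel), retry through the system-scoped
check.  The function-pointer type, sketch_vm_check_extension() and the
three mock backends are hypothetical and do not issue real ioctls; the
extension number passed in is arbitrary.

```c
/* A capability query: returns < 0 on failure, else the capability value. */
typedef int (*sketch_check_fn)(int extension);

/* Try the VM-scoped query, then fall back to the system-scoped query. */
int sketch_vm_check_extension(sketch_check_fn vm_check,
                              sketch_check_fn sys_check,
                              int extension)
{
    int ret = vm_check(extension);

    if (ret < 0) {
        ret = sys_check(extension);
    }
    return ret;
}

/* Mock: an old kernel that rejects the VM-scoped query. */
int sketch_old_kernel_vm_check(int extension)
{
    (void)extension;
    return -1;
}

/* Mock: a newer kernel whose VM type does not offer the capability. */
int sketch_new_kernel_vm_check(int extension)
{
    (void)extension;
    return 0;
}

/* Mock: the system-scoped query reports the capability as present. */
int sketch_system_check(int extension)
{
    (void)extension;
    return 1;
}
```

With the old-kernel mock, the result comes from the system-scoped query.
With the newer mock, the per-VM answer of 0 is returned and the
system-scoped query is never consulted, which is what lets the answer
depend on the VM type.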


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 04/44] vl: Introduce machine_init_done_late notifier
  2021-07-08  0:54   ` isaku.yamahata
  (?)
@ 2021-07-22 17:52   ` Connor Kuehl
  -1 siblings, 0 replies; 173+ messages in thread
From: Connor Kuehl @ 2021-07-22 17:52 UTC (permalink / raw)
  To: isaku.yamahata, qemu-devel, pbonzini, alistair, ehabkost,
	marcel.apfelbaum, mst, cohuck, mtosatti, xiaoyao.li, seanjc,
	erdemaktas
  Cc: isaku.yamahata, kvm

On 7/7/21 7:54 PM, isaku.yamahata@gmail.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Introduce a new notifier, machine_init_done_late, that is notified after
> machine_init_done.  This will be used by TDX to generate the HOB for its
> virtual firmware, which needs to be done after all guest memory has been
> added, i.e. after machine_init_done notifiers have run.  Some code
> registers memory by machine_init_done().
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>   hw/core/machine.c       | 26 ++++++++++++++++++++++++++
>   include/sysemu/sysemu.h |  2 ++
>   2 files changed, 28 insertions(+)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index ffc076ae84..66c39cf72a 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -1278,6 +1278,31 @@ void qemu_remove_machine_init_done_notifier(Notifier *notify)
>       notifier_remove(notify);
>   }
>   
> +static NotifierList machine_init_done_late_notifiers =
> +    NOTIFIER_LIST_INITIALIZER(machine_init_done_late_notifiers);

I think a comment here describing the difference between
machine_init_done and machine_init_done_late would go a
long way for other developers so they don't have to hunt
through the git log.

Connor
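
A minimal standalone sketch of the ordering that the quoted commit
message relies on: every machine_init_done notifier runs before any
machine_init_done_late notifier, so a late callback sees memory added by
an earlier one.  The fixed-size list, struct sketch_guest and all
sketch_* functions are hypothetical simplifications and do not reflect
QEMU's NotifierList implementation.

```c
#include <stddef.h>

#define SKETCH_MAX_NOTIFIERS 8

typedef void (*sketch_notify_fn)(void *opaque);

/* A simplified notifier list: callbacks run in registration order. */
struct sketch_notifier_list {
    sketch_notify_fn fn[SKETCH_MAX_NOTIFIERS];
    void *opaque[SKETCH_MAX_NOTIFIERS];
    size_t count;
};

/* Returns 0 on success, -1 when the list is full. */
int sketch_notifier_add(struct sketch_notifier_list *l,
                        sketch_notify_fn fn, void *opaque)
{
    if (l->count == SKETCH_MAX_NOTIFIERS) {
        return -1;
    }
    l->fn[l->count] = fn;
    l->opaque[l->count] = opaque;
    l->count++;
    return 0;
}

void sketch_notify_all(const struct sketch_notifier_list *l)
{
    for (size_t i = 0; i < l->count; i++) {
        l->fn[i](l->opaque[i]);
    }
}

/* Run the "done" list to completion, then the "done late" list. */
void sketch_machine_ready(const struct sketch_notifier_list *done,
                          const struct sketch_notifier_list *done_late)
{
    sketch_notify_all(done);
    sketch_notify_all(done_late);
}

/* Example state: memory regions added, and regions described in the HOB. */
struct sketch_guest {
    int memory_regions;
    int hob_regions;
};

/* A "done" callback that registers one more memory region. */
void sketch_add_memory(void *opaque)
{
    ((struct sketch_guest *)opaque)->memory_regions++;
}

/* A "done late" callback that snapshots the final region count. */
void sketch_build_hob(void *opaque)
{
    struct sketch_guest *g = (struct sketch_guest *)opaque;
    g->hob_regions = g->memory_regions;
}
```

Even if the late callback is registered first, it still observes the
region added by the "done" callback, because the two lists are drained
strictly in sequence.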


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 09/44] target/i386: kvm: don't synchronize guest tsc for TD guest
  2021-07-08  0:54   ` isaku.yamahata
  (?)
@ 2021-07-22 17:53   ` Connor Kuehl
  -1 siblings, 0 replies; 173+ messages in thread
From: Connor Kuehl @ 2021-07-22 17:53 UTC (permalink / raw)
  To: isaku.yamahata, qemu-devel, pbonzini, alistair, ehabkost,
	marcel.apfelbaum, mst, cohuck, mtosatti, xiaoyao.li, seanjc,
	erdemaktas
  Cc: isaku.yamahata, kvm

On 7/7/21 7:54 PM, isaku.yamahata@gmail.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Make kvm_synchronize_all_tsc() nop for TD-guest.

s/nop/noop

> 
> TDX module specification, 9.11.1 TSC Virtualization

This appears in 9.12.1 of the latest revision as of this writing.

https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-module-1eas-v0.85.039.pdf

> "Virtual TSC values are consistent among all the TD;s VCPUs at the

s/TD;s/TDs

> level suppored by the CPU".

s/suppored/supported

> There is no need for qemu to synchronize tsc and VMM can't access
> to guest TSC. Actually do_kvm_synchronize_tsc() hits assert due to
> failure to write to guest tsc.
> 
>> qemu/target/i386/kvm.c:235: kvm_get_tsc: Assertion `ret == 1' failed.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>

Reviewed-by: Connor Kuehl <ckuehl@redhat.com>


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 12/44] target/i386/tdx: Finalize the TD's measurement when machine is done
  2021-07-08  0:54   ` isaku.yamahata
  (?)
@ 2021-07-22 17:53   ` Connor Kuehl
  -1 siblings, 0 replies; 173+ messages in thread
From: Connor Kuehl @ 2021-07-22 17:53 UTC (permalink / raw)
  To: isaku.yamahata, qemu-devel, pbonzini, alistair, ehabkost,
	marcel.apfelbaum, mst, cohuck, mtosatti, xiaoyao.li, seanjc,
	erdemaktas
  Cc: isaku.yamahata, kvm

On 7/7/21 7:54 PM, isaku.yamahata@gmail.com wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> Invoke KVM_TDX_FINALIZEMR to finalize the TD's measurement and make
> the TD vCPUs runnable once machine initialization is complete.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>   target/i386/kvm/kvm.c |  7 +++++++
>   target/i386/kvm/tdx.c | 21 +++++++++++++++++++++
>   target/i386/kvm/tdx.h |  3 +++
>   3 files changed, 31 insertions(+)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index be0b96b120..5742fa4806 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -53,6 +53,7 @@
>   #include "migration/blocker.h"
>   #include "exec/memattrs.h"
>   #include "trace.h"
> +#include "tdx.h"
>   
>   //#define DEBUG_KVM
>   
> @@ -2246,6 +2247,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>           return ret;
>       }

This is probably a good place in the series to update the comment
preceding the sev_kvm_init call since TDX is now here and otherwise
the comment seems untimely.

Reviewed-by: Connor Kuehl <ckuehl@redhat.com>


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 06/44] hw/i386: Introduce kvm-type for TDX guest
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-07-22 17:53     ` Connor Kuehl
  -1 siblings, 0 replies; 173+ messages in thread
From: Connor Kuehl @ 2021-07-22 17:53 UTC (permalink / raw)
  To: isaku.yamahata, qemu-devel, pbonzini, alistair, ehabkost,
	marcel.apfelbaum, mst, cohuck, mtosatti, xiaoyao.li, seanjc,
	erdemaktas
  Cc: isaku.yamahata, Sean Christopherson, kvm

On 7/7/21 7:54 PM, isaku.yamahata@gmail.com wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> Introduce a machine property, kvm-type, to allow the user to create a
> Trusted Domain eXtensions (TDX) VM, a.k.a. a Trusted Domain (TD), e.g.:
> 
>   # $QEMU \
> 	-machine ...,kvm-type=tdx \
> 	...
> 
> Only two types are supported: "legacy" and "tdx", with "legacy" being
> the default.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>

I am not a QEMU command line expert, so my mental model of this may be
wrong, but:

This seems to have a very broad scope on the command line and I
am wondering if it's possible to associate it with a TDX command
line object specifically to narrow its scope.

I.e., is it possible to express this on the command line when
launching something that is _not_ meant to be powered by TDX,
such as an SEV guest? If it doesn't make sense to express that
command line argument in a situation like that, perhaps it could
be constrained only to the TDX-specific command-line objects.


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 32/44] tdx: add kvm_tdx_enabled() accessor for later use
  2021-07-08  0:55   ` isaku.yamahata
  (?)
@ 2021-07-22 17:53   ` Connor Kuehl
  2021-12-09 14:31     ` Xiaoyao Li
  -1 siblings, 1 reply; 173+ messages in thread
From: Connor Kuehl @ 2021-07-22 17:53 UTC (permalink / raw)
  To: isaku.yamahata, qemu-devel, pbonzini, alistair, ehabkost,
	marcel.apfelbaum, mst, cohuck, mtosatti, xiaoyao.li, seanjc,
	erdemaktas
  Cc: isaku.yamahata, kvm

On 7/7/21 7:55 PM, isaku.yamahata@gmail.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>   include/sysemu/tdx.h  | 1 +
>   target/i386/kvm/kvm.c | 5 +++++
>   2 files changed, 6 insertions(+)
> 
> diff --git a/include/sysemu/tdx.h b/include/sysemu/tdx.h
> index 70eb01348f..f3eced10f9 100644
> --- a/include/sysemu/tdx.h
> +++ b/include/sysemu/tdx.h
> @@ -6,6 +6,7 @@
>   #include "hw/i386/pc.h"
>   
>   bool kvm_has_tdx(KVMState *s);
> +bool kvm_tdx_enabled(void);
>   int tdx_system_firmware_init(PCMachineState *pcms, MemoryRegion *rom_memory);
>   #endif
>   
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index af6b5f350e..76c3ea9fac 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -152,6 +152,11 @@ int kvm_set_vm_type(MachineState *ms, int kvm_type)
>       return -ENOTSUP;
>   }
>   
> +bool kvm_tdx_enabled(void)
> +{
> +    return vm_type == KVM_X86_TDX_VM;
> +}
> +

Is this the whole story? Does this guarantee that the VM QEMU is
responsible to bring up is a successfully initialized TD?

From my reading of the series as it unfolded, this looks like the
function proves that KVM can support TDs and that the user requested
a TDX kvm-type, not that we have a fully-formed TD.

Is it possible to associate this with a more verifiable metric that
the TD has been or will be created successfully? I.e., once the VM
has successfully called the TDX INIT ioctl or has finalized setup?

My question mainly comes from a later patch in the series, where the
"query-tdx-capabilities" and "query-tdx" QMP commands are added.

Forgive me if I am misinterpreting the semantics of each of these
commands:

"query-tdx-capabilities" sounds like it answers the question of
"can it run a TD?"

and "query-tdx" sounds like it answers the question of "is it a TD?"

Is the assumption with "query-tdx" that anything that's gone wrong
with developing a TD will have resulted in the QEMU process exiting
and therefore if we get to a point where we can run "query-tdx" then
we know the TD was successfully formed?


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 11/44] i386/tdx: Implement user specified tsc frequency
  2021-07-08  0:54   ` isaku.yamahata
  (?)
@ 2021-07-22 17:53   ` Connor Kuehl
  2021-12-02  8:56     ` Xiaoyao Li
  -1 siblings, 1 reply; 173+ messages in thread
From: Connor Kuehl @ 2021-07-22 17:53 UTC (permalink / raw)
  To: isaku.yamahata, qemu-devel, pbonzini, alistair, ehabkost,
	marcel.apfelbaum, mst, cohuck, mtosatti, xiaoyao.li, seanjc,
	erdemaktas
  Cc: isaku.yamahata, kvm

On 7/7/21 7:54 PM, isaku.yamahata@gmail.com wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> Reuse -cpu,tsc-frequency= to get user wanted tsc frequency and pass it
> to KVM_TDX_INIT_VM.
> 
> Besides, sanity check the tsc frequency to be in the legal range and
> legal granularity (required by SEAM module).
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
> [..]
> +    if (env->tsc_khz && (env->tsc_khz < TDX1_MIN_TSC_FREQUENCY_KHZ ||
> +                         env->tsc_khz > TDX1_MAX_TSC_FREQUENCY_KHZ)) {
> +        error_report("Invalid TSC %ld KHz, must specify cpu_frequecy between [%d, %d] kHz\n",

s/frequecy/frequency

> +                      env->tsc_khz, TDX1_MIN_TSC_FREQUENCY_KHZ,
> +                      TDX1_MAX_TSC_FREQUENCY_KHZ);
> +        exit(1);
> +    }
> +
> +    if (env->tsc_khz % (25 * 1000)) {
> +        error_report("Invalid TSC %ld KHz, it must be multiple of 25MHz\n", env->tsc_khz);

Should this be 25KHz instead of 25MHz?
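
For reference, a minimal sketch of the arithmetic, assuming env->tsc_khz
holds the frequency in kHz as its name suggests: 25 * 1000 kHz is 25 MHz,
so the modulo test as quoted rejects anything that is not a multiple of
25 MHz. The helper name and the standalone form are illustrative only and
are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Granularity used by the quoted check, in kHz: 25 * 1000 kHz = 25 MHz. */
#define TSC_GRANULARITY_KHZ (25 * 1000)

/*
 * Hypothetical helper (not part of the patch): returns true when tsc_khz
 * is a whole multiple of 25 MHz, mirroring the quoted modulo test.
 */
static bool tsc_khz_is_25mhz_multiple(int64_t tsc_khz)
{
    assert(tsc_khz >= 0); /* a negative frequency is meaningless */
    return tsc_khz % TSC_GRANULARITY_KHZ == 0;
}
```

With this, 2500000 kHz (2.5 GHz) is accepted, while 2500025 kHz, which is
a multiple of 25 kHz but not of 25 MHz, is rejected. So the message and
the code agree with each other; if the SEAM module actually requires
25 kHz granularity, it would be the divisor that needs changing.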



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 34/44] target/i386/tdx: set reboot action to shutdown when tdx
  2021-07-08  0:55   ` isaku.yamahata
  (?)
@ 2021-07-22 17:54   ` Connor Kuehl
  2021-12-10  9:54     ` Xiaoyao Li
  -1 siblings, 1 reply; 173+ messages in thread
From: Connor Kuehl @ 2021-07-22 17:54 UTC (permalink / raw)
  To: isaku.yamahata, qemu-devel, pbonzini, alistair, ehabkost,
	marcel.apfelbaum, mst, cohuck, mtosatti, xiaoyao.li, seanjc,
	erdemaktas
  Cc: isaku.yamahata, kvm

On 7/7/21 7:55 PM, isaku.yamahata@gmail.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> In TDX CPU state is also protected, thus vcpu state can't be reset by VMM.
> It assumes -action reboot=shutdown instead of silently ignoring vcpu reset.
> 
> TDX module spec version 344425-002US doesn't support vcpu reset by VMM.  VM
> needs to be destroyed and created again to emulate REBOOT_ACTION_RESET.
> For simplicity, put its responsibility to management system like libvirt
> because it's difficult for the current qemu implementation to destroy and
> re-create KVM VM resources with keeping other resources.
> 
> If management system wants reboot behavior for its users, it needs to
>   - set reboot_action to REBOOT_ACTION_SHUTDOWN,
>   - set shutdown_action to SHUTDOWN_ACTION_PAUSE optionally and,
>   - subscribe VM state change and on reboot, (destroy qemu if
>     SHUTDOWN_ACTION_PAUSE and) start new qemu.
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>   target/i386/kvm/tdx.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
> 
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 1316d95209..0621317b0a 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -25,6 +25,7 @@
>   #include "qapi/qapi-types-misc-target.h"
>   #include "standard-headers/asm-x86/kvm_para.h"
>   #include "sysemu/sysemu.h"
> +#include "sysemu/runstate-action.h"
>   #include "sysemu/kvm.h"
>   #include "sysemu/kvm_int.h"
>   #include "sysemu/tdx.h"
> @@ -363,6 +364,19 @@ static void tdx_guest_init(Object *obj)
>   
>       qemu_mutex_init(&tdx->lock);
>   
> +    /*
> +     * TDX module spec version 344425-002US doesn't support reset of vcpu by
> +     * VMM.  VM needs to be destroyed and created again to emulate
> +     * REBOOT_ACTION_RESET.  For simplicity, put its responsibility to
> +     * management system like libvirt.
> +     *
> +     * Management system should
> +     *  - set reboot_action to REBOOT_ACTION_SHUTDOWN
> +     *  - set shutdown_action to SHUTDOWN_ACTION_PAUSE
> +     *  - subscribe VM state and on reboot, destroy qemu and start new qemu
> +     */
> +    reboot_action = REBOOT_ACTION_SHUTDOWN;
> +
>       tdx->debug = false;
>       object_property_add_bool(obj, "debug", tdx_guest_get_debug,
>                                tdx_guest_set_debug);
> 

I think the same effect could be accomplished with modifying
kvm_arch_cpu_check_are_resettable.
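
A rough sketch of what I have in mind, written as a standalone file so
it compiles on its own. To my understanding the i386 hook currently only
checks for SEV-ES; the two stub_* flags and the local definitions of
sev_es_enabled() and kvm_tdx_enabled() are stand-ins for the real QEMU
accessors and exist here purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for QEMU state, for illustration only. */
static bool stub_sev_es_enabled;
static bool stub_tdx_enabled;

static bool sev_es_enabled(void)  { return stub_sev_es_enabled; }
static bool kvm_tdx_enabled(void) { return stub_tdx_enabled; }

/*
 * Sketch: report vCPUs as not resettable whenever the VMM cannot
 * rewrite vCPU state, which is assumed here to cover both SEV-ES
 * and TDX guests.
 */
bool kvm_arch_cpu_check_are_resettable(void)
{
    return !sev_es_enabled() && !kvm_tdx_enabled();
}
```

If the generic reset path consults this hook the way I expect, a reset
request on a TD would then end in a shutdown without the TDX code having
to overwrite reboot_action behind the user's back.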


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 04/44] vl: Introduce machine_init_done_late notifier
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 10:13     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 10:13 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata

On Wed, Jul 07, 2021 at 05:54:34PM -0700, isaku.yamahata@gmail.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Introduce a new notifier, machine_init_done_late, that is notified after
> machine_init_done.  This will be used by TDX to generate the HOB for its
> virtual firmware, which needs to be done after all guest memory has been
> added, i.e. after machine_init_done notifiers have run.  Some code
> registers memory by machine_init_done().

Can you be more specific than "some code"?

I see only pc_memory_init() adding guest ram (and the corresponding e820
entries), and that should run early enough ...

thanks,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 06/44] hw/i386: Introduce kvm-type for TDX guest
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 10:22     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 10:22 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: isaku.yamahata, cohuck, ehabkost, kvm, mst, seanjc, alistair,
	xiaoyao.li, qemu-devel, Sean Christopherson, mtosatti,
	erdemaktas, pbonzini

On Wed, Jul 07, 2021 at 05:54:36PM -0700, isaku.yamahata@gmail.com wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> Introduce a machine property, kvm-type, to allow the user to create a
> Trusted Domain eXtensions (TDX) VM, a.k.a. a Trusted Domain (TD), e.g.:
> 
>  # $QEMU \
> 	-machine ...,kvm-type=tdx \
> 	...

Can we align sev and tdx better than that?

SEV is enabled this way:

qemu -machine ...,confidential-guest-support=sev0 \
     -object sev-guest,id=sev0,...

(see docs/amd-memory-encryption.txt for details).

tdx could likewise use a tdx-guest object (and both sev-guest and
tdx-guest should probably have a common parent object type) to enable
and configure tdx support.

take care,
  Gerd



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 07/44] i386/kvm: Squash getting/putting guest state for TDX VMs
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 10:24     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 10:24 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata, Sean Christopherson

On Wed, Jul 07, 2021 at 05:54:37PM -0700, isaku.yamahata@gmail.com wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Ignore get/put state of TDX VMs as accessing/mutating guest state of
> production TDs is not supported.

Why silently ignore instead of returning an error?

take care,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 10/44] hw/i386: Initialize TDX via KVM ioctl() when kvm_type is TDX
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 10:27     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 10:27 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata, Sean Christopherson

  Hi,

>        'sev-guest':                  'SevGuestProperties',
> +      'tdx-guest':                  'TdxGuestProperties',

Ah, see, it's already there ...

take care,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 15/44] i386/tdx: Add hook to require generic device loader
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 10:41     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 10:41 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata, Sean Christopherson

> +    /*
> +     * Sanity check for tdx:
> +     * TDX uses generic loader to load bios instead of pflash.
> +     */
> +    for (i = 0; i < ARRAY_SIZE(pcms->flash); i++) {
> +        if (drive_get(IF_PFLASH, 0, i)) {
> +            error_report("pflash not supported by VM type, "
> +                         "use -device loader,file=<path>");
> +            exit(1);
> +        }

I suspect that catches only "-drive if=pflash,..."
but not "-machine pflash0=..."

Also: why does tdx not support flash?
Should be explained in the commit message.

thanks,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 16/44] hw/i386: Add definitions from UEFI spec for volumes, resources, etc...
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 10:46     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 10:46 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata

On Wed, Jul 07, 2021 at 05:54:46PM -0700, isaku.yamahata@gmail.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Add definitions for literals, enums, structs, GUIDs, etc... that will be
> used by TDX to build the UEFI Hand-Off Block (HOB) that is passed to the
> Trusted Domain Virtual Firmware (TDVF).  All values come from the UEFI
> specification and TDVF design guide. [1]

This looks like it was copied wholesale from somewhere else?
If so, please name the source.

Also, it should go somewhere below include/standard-headers/.

take care,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 18/44] hw/i386: refactor e820_add_entry()
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 10:49     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 10:49 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata

On Wed, Jul 07, 2021 at 05:54:48PM -0700, isaku.yamahata@gmail.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> The following patch will utilize this refactoring.

More verbose commit message please.

thanks,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 20/44] i386/tdx: Parse tdx metadata and store the result into TdxGuestState
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 11:18     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 11:18 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata, Sean Christopherson, Min M . Xu

  Hi,

> +        /*
> +         * If TDVF temp memory described in TDVF metadata lies in RAM, reserve
> +         * the region properly.
> +         */
> +        if (entry->address >= 4 * GiB + x86ms->above_4g_mem_size ||
> +            entry->address + entry->size >= 4 * GiB + x86ms->above_4g_mem_size) {
> +            error_report("TDVF type %u address 0x%" PRIx64 " size 0x%" PRIx64
> +                         " above high memory",
> +                         entry->type, entry->address, entry->size);
> +            exit(1);
> +        }

I think you can simply use the dma_memory_map() API, then work with
guest physical addresses and drop the messy and error-prone memory
region offset calculations.

> +    entry->mem_ptr = memory_region_get_ram_ptr(entry->mr);
> +    if (entry->data_len) {
> +        /*
> +         * The memory_region api doesn't allow partial file mapping, create
> +         * ram and copy the contents
> +         */
> +        if (lseek(fd, entry->data_offset, SEEK_SET) != entry->data_offset) {
> +            error_report("can't seek to 0x%x %s", entry->data_offset, filename);
> +            exit(1);
> +        }
> +        if (read(fd, entry->mem_ptr, entry->data_len) != entry->data_len) {
> +            error_report("can't read 0x%x %s", entry->data_len, filename);
> +            exit(1);
> +        }
> +    }

Wouldn't a simple rom_add_blob work here?

> +int load_tdvf(const char *filename)
> +{

> +    for_each_fw_entry(fw, entry) {
> +        if (entry->address < x86ms->below_4g_mem_size ||
> +            entry->address > 4 * GiB) {
> +            tdvf_init_ram_memory(ms, entry);
> +        } else {
> +            tdvf_init_bios_memory(fd, filename, entry);
> +        }
> +    }

Why are there two different ways to load the firmware?

Also: why is all this firmware volume parsing needed?  The normal ovmf
firmware can simply be mapped just below 4G, why can't tdvf work the
same way?

thanks,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 19/44] hw/i386/e820: introduce a helper function to change type of e820
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 11:22     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 11:22 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata

On Wed, Jul 07, 2021 at 05:54:49PM -0700, isaku.yamahata@gmail.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Introduce a helper function, e820_change_type(), that change
> the type of subregion of e820 entry.

The entry is split into multiple entries if needed.

> The following patch uses it.

Used to mark ram regions used for firmware as reserved.

More verbose commit messages please, it makes review easier if I don't
have to read the details out of the code changes.
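
To illustrate the kind of description I mean, here is a standalone
sketch of the splitting behaviour. It is not the code from the patch:
the struct layout, the type constants and the function name
e820_split_change_type() are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define E820_RAM      1
#define E820_RESERVED 2

/* Simplified e820 entry, assumed layout for this sketch. */
struct e820_entry {
    uint64_t address;
    uint64_t length;
    uint32_t type;
};

/*
 * Change the type of [start, start + len) inside table[idx], splitting
 * that entry into up to three pieces: an unchanged head, the retyped
 * middle, and an unchanged tail.  Returns the new entry count, or 0 if
 * the range is not contained in the entry or the table lacks room.
 */
static size_t e820_split_change_type(struct e820_entry *table, size_t nr,
                                     size_t cap, size_t idx,
                                     uint64_t start, uint64_t len,
                                     uint32_t new_type)
{
    assert(idx < nr);

    struct e820_entry old = table[idx];
    uint64_t old_end = old.address + old.length;
    uint64_t end = start + len;

    if (len == 0 || end < start || start < old.address || end > old_end) {
        return 0;
    }

    size_t pieces = 1 + (start > old.address) + (end < old_end);
    if (nr + pieces - 1 > cap) {
        return 0;
    }

    /* Shift the entries after idx to make room for the extra pieces. */
    memmove(&table[idx + pieces], &table[idx + 1],
            (nr - idx - 1) * sizeof(table[0]));

    size_t i = idx;
    if (start > old.address) {
        table[i++] = (struct e820_entry){ old.address, start - old.address,
                                          old.type };
    }
    table[i++] = (struct e820_entry){ start, len, new_type };
    if (end < old_end) {
        table[i++] = (struct e820_entry){ end, old_end - end, old.type };
    }
    return nr + pieces - 1;
}
```

Marking 64 KiB at 0x80000 as reserved inside a single 1 MiB RAM entry
thus yields three entries: RAM, reserved, RAM.  Two sentences along
those lines in the commit message would be enough.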

thanks,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 21/44] i386/tdx: Create the TD HOB list upon machine init done
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 11:29     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 11:29 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata, Sean Christopherson

  Hi,

> +static void tdvf_hob_add_memory_resources(TdvfHob *hob)
> +{

> +    /* Copy and sort the e820 tables to add them to the HOB. */
> +    memcpy(e820_entries, e820_table,
> +           nr_e820_entries * sizeof(struct e820_entry));
> +    qsort(e820_entries, nr_e820_entries, sizeof(struct e820_entry),
> +          &tdvf_e820_compare);

I guess patch #19 should make sure the e820 entries stay sorted instead
of sorting them here.
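
Something along these lines, as a standalone sketch only.  The struct
layout and the function name e820_insert_sorted() are made up for the
example and do not correspond to the existing e820 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified e820 entry, assumed layout for this sketch. */
struct e820_entry {
    uint64_t address;
    uint64_t length;
    uint32_t type;
};

/*
 * Insert e so that the table stays sorted by start address.
 * Returns the new entry count, or 0 if the table is already full.
 */
static size_t e820_insert_sorted(struct e820_entry *table, size_t nr,
                                 size_t cap, struct e820_entry e)
{
    size_t pos = 0;

    assert(table != NULL);
    if (nr >= cap) {
        return 0;
    }

    /* Find the first entry that starts after the new one. */
    while (pos < nr && table[pos].address <= e.address) {
        pos++;
    }

    /* Open a slot at pos and drop the new entry in. */
    memmove(&table[pos + 1], &table[pos], (nr - pos) * sizeof(table[0]));
    table[pos] = e;
    return nr + 1;
}
```

If every addition goes through a helper like this, and the type-change
helper replaces an entry with its pieces in address order, the copy and
qsort() here become unnecessary.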

take care,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 27/44] q35: Introduce smm_ranges property for q35-pci-host
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 11:38     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 11:38 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata, Isaku Yamahata, Sean Christopherson

On Wed, Jul 07, 2021 at 05:54:57PM -0700, isaku.yamahata@gmail.com wrote:
> From: Isaku Yamahata <isaku.yamahata@linux.intel.com>
> 
> Add a q35 property to check whether or not SMM ranges, e.g. SMRAM, TSEG,
> etc... exist for the target platform.  TDX doesn't support SMM and doesn't
> play nice with QEMU modifying related guest memory ranges.

"qemu -M q35,smm=off" doesn't work?
If so: what is the exact problem?

take care,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 28/44] i386/tdx: Force x2apic mode and routing for TDs
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 11:42     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 11:42 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata, Sean Christopherson

On Wed, Jul 07, 2021 at 05:54:58PM -0700, isaku.yamahata@gmail.com wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> TDX requires x2apic and "resets" vCPUs to have x2apic enabled.  Model
> this in QEMU and unconditionally enable x2apic interrupt routing.

We have a cpu flag for that.  IMHO you should verify it is set and error
out if not instead of silently fixing up things.
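
A rough sketch of the "verify and error out" approach, as opposed to
silently enabling the flag.  The struct and function below are
hypothetical and only model the check; they are not QEMU's actual CPU
or error-reporting APIs:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified view of the vCPU configuration. */
struct demo_cpu_config {
    bool x2apic;   /* stands in for the guest-visible x2apic CPU flag */
};

/*
 * Validate instead of fixing up: returns 0 if the configuration is
 * usable for a TD, -1 (with a message on stderr) otherwise.
 */
static int demo_tdx_check_cpu(const struct demo_cpu_config *cfg)
{
    if (!cfg->x2apic) {
        fprintf(stderr, "TDX requires x2apic; enable it on the CPU model\n");
        return -1;
    }
    return 0;
}
```

The point is that the user gets a clear error for an unsupported
configuration rather than a guest that differs from what was asked for.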

take care,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 29/44] target/i386: Add machine option to disable PIC/8259
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 11:50     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 11:50 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata, Sean Christopherson

  Hi,

> +    object_class_property_add_bool(oc, PC_MACHINE_PIC,
> +        pc_machine_get_pic, pc_machine_set_pic);

microvm already has such an option.  We should move it from microvm to
the common x86 base type so pc can use it too.

>  #define PC_MACHINE_PIT              "pit"
> +#define PC_MACHINE_PIC              "pic"

Oh, same for pit.  Instead of both pc and microvm having that it
likewise should be a property of the common x86 base machine type.

take care,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 33/44] qmp: add query-tdx-capabilities query-tdx command
  2021-07-08  0:55   ` isaku.yamahata
@ 2021-08-26 11:59     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 11:59 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata, Chenyi Qiang

> +##
> +# @TDXInfo:
> +#
> +# Information about Trust Domain Extensions (TDX) support
> +#
> +# @enabled: true if TDX is active
> +#
> +##
> +{ 'struct': 'TDXInfo',
> +    'data': { 'enabled': 'bool' },
> +  'if': 'defined(TARGET_I386)'
> +}

I think a generic 'ConfidentialComputing' enum with 'none', 'sev' and
'tdx' would be better.

Hmm, I see sev already has a collection of sev-specific commands, so not
sure whether going that route now buys us much though ...

take care,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 34/44] target/i386/tdx: set reboot action to shutdown when tdx
  2021-07-08  0:55   ` isaku.yamahata
@ 2021-08-26 12:01     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 12:01 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata

  Hi,

> In TDX CPU state is also protected, thus vcpu state can't be reset by VMM.
> It assumes -action reboot=shutdown instead of silently ignoring vcpu reset.

Again, better throw an error instead of silently fixing up settings.

take care,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 42/44] hw/i386: add a flag to disable init/sipi delivery mode of interrupt
  2021-07-08  0:55   ` isaku.yamahata
@ 2021-08-26 12:15     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 12:15 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata

  Hi,

>      ioapic_init_gsi(gsi_state, "machine", x86ms->eoi_intercept_unsupported,
> -                    x86ms->smi_unsupported);
> +                    x86ms->smi_unsupported, x86ms->init_sipi_unsupported);

Hmm, why add three different switches here?  I suspect these would all
be used together anyway?  So maybe just add a "tdx mode" to the ioapic?
Or maybe better a "confidential-computing" mode, as I guess amd will
have similar requirements for similar reasons?
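
To make the idea concrete, a small sketch in which one mode flag
implies all three restrictions.  The struct and helper are hypothetical;
only the three field names are taken from the patch:

```c
#include <stdbool.h>

/* Hypothetical grouping of the three per-feature switches. */
struct demo_ioapic_limits {
    bool eoi_intercept_unsupported;
    bool smi_unsupported;
    bool init_sipi_unsupported;
};

/* Derive all three restrictions from a single mode flag. */
static struct demo_ioapic_limits demo_ioapic_limits_for(bool confidential)
{
    struct demo_ioapic_limits l = {
        .eoi_intercept_unsupported = confidential,
        .smi_unsupported = confidential,
        .init_sipi_unsupported = confidential,
    };
    return l;
}
```

This assumes the three switches really are always toggled together,
which is the open question above.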

thanks,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 24/44] i386/tdx: Add MMIO HOB entries
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 12:17     ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2021-08-26 12:17 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, kvm,
	isaku.yamahata, Sean Christopherson

  Hi,

> +    /* PCI hole above 4gb. */
> +    start = object_property_get_uint(host, PCI_HOST_PROP_PCI_HOLE64_START,
> +                                     NULL);
> +    end = object_property_get_uint(host, PCI_HOST_PROP_PCI_HOLE64_END, NULL);
> +    tdvf_hob_add_mmio_resource(hob, start, end);
> +
> +    /* MMCFG region */
> +    mcfg_base = object_property_get_uint(host, PCIE_HOST_MCFG_BASE, NULL);
> +    mcfg_size = object_property_get_uint(host, PCIE_HOST_MCFG_SIZE, NULL);
> +    if (mcfg_base && mcfg_base != PCIE_BASE_ADDR_UNMAPPED && mcfg_size) {
> +        tdvf_hob_add_mmio_resource(hob, mcfg_base, mcfg_base + mcfg_size);
> +    }

I doubt this works.  These are initialized by the firmware, not qemu.

take care,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 10/44] hw/i386: Initialize TDX via KVM ioctl() when kvm_type is TDX
  2021-07-08  0:54   ` isaku.yamahata
@ 2021-08-26 15:06     ` Eric Blake
  -1 siblings, 0 replies; 173+ messages in thread
From: Eric Blake @ 2021-08-26 15:06 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, isaku.yamahata,
	Sean Christopherson, kvm

On Wed, Jul 07, 2021 at 05:54:40PM -0700, isaku.yamahata@gmail.com wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> Introduce tdx_ioctl() to invoke TDX specific sub-ioctls of
> KVM_MEMORY_ENCRYPT_OP.  Use tdx_ioctl() to invoke KVM_TDX_INIT, by way
> of tdx_init(), during kvm_arch_init().  KVM_TDX_INIT configures global
> TD state, e.g. the canonical CPUID config, and must be executed prior to
> creating vCPUs.
> 
> Note, this doesn't address the fact that Qemu may change the CPUID
> configuration when creating vCPUs, i.e. punts on refactoring Qemu to
> provide a stable CPUID config prior to kvm_arch_init().
> 
> Explicitly set subleaf index and flags when adding CPUID
> Set the index and flags when adding a CPUID entry to avoid propagating
> stale state from a removed entry, e.g. when the CPUID 0x4 loop bails, it
> can leave non-zero index and flags in the array.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---

> +++ b/qapi/qom.json
> @@ -760,6 +760,18 @@
>              '*cbitpos': 'uint32',
>              'reduced-phys-bits': 'uint32' } }
>  
> +##
> +# @TdxGuestProperties:
> +#
> +# Properties for tdx-guest objects.
> +#
> +# @debug: enable debug mode (default: off)
> +#
> +# Since: 6.0

This should be 6.2

> +##
> +{ 'struct': 'TdxGuestProperties',
> +  'data': { '*debug': 'bool' } }
> +
>  ##
>  # @ObjectType:
>  #
> @@ -802,6 +814,7 @@
>      'secret_keyring',
>      'sev-guest',
>      's390-pv-guest',
> +    'tdx-guest',
>      'throttle-group',
>      'tls-creds-anon',
>      'tls-creds-psk',
> @@ -858,6 +871,7 @@
>        'secret':                     'SecretProperties',
>        'secret_keyring':             'SecretKeyringProperties',
>        'sev-guest':                  'SevGuestProperties',
> +      'tdx-guest':                  'TdxGuestProperties',
>        'throttle-group':             'ThrottleGroupProperties',
>        'tls-creds-anon':             'TlsCredsAnonProperties',
>        'tls-creds-psk':              'TlsCredsPskProperties',

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 31/44] target/i386/tdx: Allows mrconfigid/mrowner/mrownerconfig for TDX_INIT_VM
  2021-07-08  0:55   ` isaku.yamahata
@ 2021-08-26 15:13     ` Eric Blake
  -1 siblings, 0 replies; 173+ messages in thread
From: Eric Blake @ 2021-08-26 15:13 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, isaku.yamahata,
	kvm

On Wed, Jul 07, 2021 at 05:55:01PM -0700, isaku.yamahata@gmail.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> When creating VM with TDX_INIT_VM, three sha384 hash values are accepted
> for TDX attestation.
> So far they were hard coded as 0. Now allow user to specify those values
> via property mrconfigid, mrowner and mrownerconfig.
> string for those property are hex string of 48 * 2 length.
> 
> example
> -device tdx-guest, \
>   mrconfigid=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef, \
>   mrowner=fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210, \
>   mrownerconfig=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  qapi/qom.json         | 11 ++++++++++-
>  target/i386/kvm/tdx.c | 17 +++++++++++++++++
>  target/i386/kvm/tdx.h |  3 +++
>  3 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi/qom.json b/qapi/qom.json
> index 70c70e3efe..8f8b7828b3 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -767,10 +767,19 @@
>  #
>  # @debug: enable debug mode (default: off)
>  #
> +# @mrconfigid: MRCONFIGID SHA384 hex string of 48 * 2 length (default: 0)
> +#
> +# @mrowner: MROWNER SHA384 hex string of 48 * 2 length (default: 0)
> +#
> +# @mrownerconfig: MROWNERCONFIG SHA384 hex string of 48 * 2 length (default: 0)
> +#
>  # Since: 6.0

As these are additions in a later release, they'll need a '(since 6.2)' tag.

>  ##
>  { 'struct': 'TdxGuestProperties',
> -  'data': { '*debug': 'bool' } }
> +  'data': { '*debug': 'bool',
> +            '*mrconfigid': 'str',
> +            '*mrowner': 'str',
> +            '*mrownerconfig': 'str' } }

Do we really want hex-encoded strings?  Elsewhere in QMP, we've
favored the more compact base64 encoding; if you have a strong
argument why hex representation is worth the break in consistency,
it's worth calling out in the commit message.
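
To put numbers on "more compact", a small sketch of the encoded lengths.
The helper names are made up for this example; the formulas are the
standard ones for hex and for padded base64 (RFC 4648):

```c
#include <stddef.h>

/* Characters needed to hex-encode n bytes (two digits per byte). */
static size_t hex_encoded_len(size_t n)
{
    return 2 * n;
}

/* Characters needed for padded base64: 4 per (partial) 3-byte group. */
static size_t base64_encoded_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}
```

For a 48-byte SHA-384 digest this gives 96 characters in hex versus 64
in base64.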

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 33/44] qmp: add query-tdx-capabilities query-tdx command
  2021-07-08  0:55   ` isaku.yamahata
@ 2021-08-26 15:21     ` Eric Blake
  -1 siblings, 0 replies; 173+ messages in thread
From: Eric Blake @ 2021-08-26 15:21 UTC (permalink / raw)
  To: isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, xiaoyao.li, seanjc, erdemaktas, isaku.yamahata,
	kvm, Chenyi Qiang

On Wed, Jul 07, 2021 at 05:55:03PM -0700, isaku.yamahata@gmail.com wrote:
> From: Chenyi Qiang <chenyi.qiang@intel.com>
> 
> Add QMP commands that can be used by libvirt to query the TDX capabilities
> and TDX info.  The set of capabilities that needs to be reported is only
> enabled at the moment, which means TDX is enabled.
> 
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  include/sysemu/tdx.h       |  6 ++++
>  qapi/misc-target.json      | 59 ++++++++++++++++++++++++++++++++++++++

In addition to Gerd's suggestion to use an enum,

> +++ b/qapi/misc-target.json
> @@ -323,3 +323,62 @@
>  { 'command': 'query-sev-attestation-report', 'data': { 'mnonce': 'str' },
>    'returns': 'SevAttestationReport',
>    'if': 'defined(TARGET_I386)' }
> +
> +##
> +# @TDXInfo:
> +#
> +# Information about Trust Domain Extensions (TDX) support
> +#
> +# @enabled: true if TDX is active
> +#
> +##

Missing a 'Since: 6.2' line, here and elsewhere in the patch.

> +{ 'struct': 'TDXInfo',
> +    'data': { 'enabled': 'bool' },
> +  'if': 'defined(TARGET_I386)'
> +}
> +
> +##
> +# @query-tdx:
> +#
> +# Returns information about TDX
> +#
> +# Returns: @TdxInfo
> +#
> +#
> +# Example:
> +#
> +# -> { "execute": "query-tdx" }
> +# <- { "return": { "enabled": true } }
> +#
> +##
> +{ 'command': 'query-tdx', 'returns': 'TDXInfo',
> +  'if': 'defined(TARGET_I386)' }
> +
> +##
> +# @TDXCapability:
> +#
> +# The struct describes capability for a TDX
> +# feature.
> +#
> +##
> +{ 'struct': 'TDXCapability',
> +  'data': { 'enabled': 'bool' },
> +  'if': 'defined(TARGET_I386)' }
> +
> +##
> +# @query-tdx-capabilities:

Do we need two separate commands, or could 'query-tdx' be made
sufficiently powerful to tell you both whether tdx is available, and
what capabilities it has, all in one command?

> +#
> +# This command is used to get the TDX capabilities, and is supported on Intel
> +# X86 platforms only.
> +#
> +# Returns: @TDXCapability.
> +#
> +#
> +# Example:
> +#
> +# -> { "execute": "query-tdx-capabilities" }
> +# <- { "return": { 'enabled': 'bool' }}
> +#
> +##
> +{ 'command': 'query-tdx-capabilities', 'returns': 'TDXCapability',
> +  'if': 'defined(TARGET_I386)' }

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 06/44] hw/i386: Introduce kvm-type for TDX guest
  2021-08-26 10:22     ` Gerd Hoffmann
@ 2021-11-24  7:31       ` Xiaoyao Li
  -1 siblings, 0 replies; 173+ messages in thread
From: Xiaoyao Li @ 2021-11-24  7:31 UTC (permalink / raw)
  To: Gerd Hoffmann, isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, seanjc, erdemaktas, kvm, isaku.yamahata

On 8/26/2021 6:22 PM, Gerd Hoffmann wrote:
> On Wed, Jul 07, 2021 at 05:54:36PM -0700, isaku.yamahata@gmail.com wrote:
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>
>> Introduce a machine property, kvm-type, to allow the user to create a
>> Trusted Domain eXtensions (TDX) VM, a.k.a. a Trusted Domain (TD), e.g.:
>>
>>   # $QEMU \
>> 	-machine ...,kvm-type=tdx \
>> 	...

Sorry for the very late reply.

> Can we align sev and tdx better than that?
> 
> SEV is enabled this way:
> 
> qemu -machine ...,confidential-guest-support=sev0 \
>       -object sev-guest,id=sev0,...
> 
> (see docs/amd-memory-encryption.txt for details).
> 
> tdx could likewise use a tdx-guest object (and both sev-guest and
> tdx-guest should probably have a common parent object type) to enable
> and configure tdx support.

Yes, SEV only introduced a new object and passed it to
confidential-guest-support. That works because SEV doesn't require a new
type of VM. TDX, however, does require one.

In KVM, KVM_CREATE_VM takes a parameter that carries the vm_type, though
x86 doesn't use it so far. QEMU likewise already has code to configure
the VM type on the command line (kvm-type), which the x86 arch doesn't
implement yet. With the upcoming TDX support, x86 will implement and use
a VM type for TDX. That's why we wrote this patch to implement kvm-type
for x86, similar to other arches.

Yes, of course we can infer the vm_type from "-object tdx-guest", but I
prefer to just use vm_type. Let's see what others think.
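
As an illustration of the alternative mentioned above (inferring the VM
type from the confidential-guest-support object), here is a minimal
sketch. The function name, the enum values, and the idea of keying on the
QOM type name are assumptions made for this example only. They are not
taken from the patch series, and the real VM type constants come from the
KVM uapi headers.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder values for illustration; not the real KVM constants. */
enum {
    EXAMPLE_VM_TYPE_DEFAULT = 0,
    EXAMPLE_VM_TYPE_TDX = 1,
};

/*
 * cgs_typename: QOM type name of the object passed via
 * confidential-guest-support, or NULL when none was given.
 */
static int example_infer_vm_type(const char *cgs_typename)
{
    if (cgs_typename != NULL && strcmp(cgs_typename, "tdx-guest") == 0) {
        return EXAMPLE_VM_TYPE_TDX;
    }
    /* SEV and non-confidential guests keep the default VM type. */
    return EXAMPLE_VM_TYPE_DEFAULT;
}
```

With such a helper the user would only specify "-object tdx-guest", and
no separate kvm-type machine property would be needed.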

thanks,
-Xiaoyao

> take care,
>    Gerd
> 


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 11/44] i386/tdx: Implement user specified tsc frequency
  2021-07-22 17:53   ` Connor Kuehl
@ 2021-12-02  8:56     ` Xiaoyao Li
  0 siblings, 0 replies; 173+ messages in thread
From: Xiaoyao Li @ 2021-12-02  8:56 UTC (permalink / raw)
  To: Connor Kuehl, isaku.yamahata, qemu-devel, pbonzini, alistair,
	ehabkost, marcel.apfelbaum, mst, cohuck, mtosatti, seanjc,
	erdemaktas
  Cc: isaku.yamahata, kvm

On 7/23/2021 1:53 AM, Connor Kuehl wrote:
> On 7/7/21 7:54 PM, isaku.yamahata@gmail.com wrote:
>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>
>> Reuse -cpu,tsc-frequency= to get user wanted tsc frequency and pass it
>> to KVM_TDX_INIT_VM.
>>
>> Besides, sanity check the tsc frequency to be in the legal range and
>> legal granularity (required by SEAM module).
>>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>> ---
>> [..]
>> +    if (env->tsc_khz && (env->tsc_khz < TDX1_MIN_TSC_FREQUENCY_KHZ ||
>> +                         env->tsc_khz > TDX1_MAX_TSC_FREQUENCY_KHZ)) {
>> +        error_report("Invalid TSC %ld KHz, must specify cpu_frequecy 
>> between [%d, %d] kHz\n",
> 
> s/frequecy/frequency

will fix it, thanks!

>> +                      env->tsc_khz, TDX1_MIN_TSC_FREQUENCY_KHZ,
>> +                      TDX1_MAX_TSC_FREQUENCY_KHZ);
>> +        exit(1);
>> +    }
>> +
>> +    if (env->tsc_khz % (25 * 1000)) {
>> +        error_report("Invalid TSC %ld KHz, it must be multiple of 
>> 25MHz\n", env->tsc_khz);
> 
> Should this be 25KHz instead of 25MHz?

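No. tsc_khz is in kHz, so checking for a multiple of 25 * 1000 kHz is
checking for a multiple of 25 MHz. It is equivalent to

	(env->tsc_khz * 1000) % (25 * 1000 * 1000)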
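
To spell out the arithmetic, here is a minimal sketch that compares the
check as written in the patch (in kHz) with the same check done in Hz.
The helper names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the patch: tsc_khz is in kHz, 25 * 1000 kHz == 25 MHz. */
static bool tsc_khz_is_25mhz_multiple(int64_t tsc_khz)
{
    return tsc_khz % (25 * 1000) == 0;
}

/* The same check after converting the frequency to Hz. */
static bool tsc_hz_is_25mhz_multiple(int64_t tsc_khz)
{
    /* Guard the multiplication below against overflow. */
    assert(tsc_khz >= 0 && tsc_khz <= INT64_MAX / 1000);
    return (tsc_khz * 1000) % (25 * 1000 * 1000) == 0;
}
```

Both helpers agree for every input: 2500000 kHz (2.5 GHz) passes, while
2500025 kHz is a multiple of 25 kHz but not of 25 MHz and is rejected.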





^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 07/44] i386/kvm: Squash getting/putting guest state for TDX VMs
  2021-08-26 10:24     ` Gerd Hoffmann
@ 2021-12-09  3:33       ` Xiaoyao Li
  -1 siblings, 0 replies; 173+ messages in thread
From: Xiaoyao Li @ 2021-12-09  3:33 UTC (permalink / raw)
  To: Gerd Hoffmann, isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, seanjc, erdemaktas, kvm, isaku.yamahata

On 8/26/2021 6:24 PM, Gerd Hoffmann wrote:
> On Wed, Jul 07, 2021 at 05:54:37PM -0700, isaku.yamahata@gmail.com wrote:
>> From: Sean Christopherson <sean.j.christopherson@intel.com>
>>
>> Ignore get/put state of TDX VMs as accessing/mutating guest state of
>> producation TDs is not supported.
> 
> Why silently ignore instead of returning an error?

The error would be returned to the upper-level caller in QEMU, right?
There needs to be some place in QEMU that avoids calling the ioctls that
get the guest state of a TD VM.

Let's reword "Ignore" to "Don't". Is that OK?



^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 32/44] tdx: add kvm_tdx_enabled() accessor for later use
  2021-07-22 17:53   ` Connor Kuehl
@ 2021-12-09 14:31     ` Xiaoyao Li
  0 siblings, 0 replies; 173+ messages in thread
From: Xiaoyao Li @ 2021-12-09 14:31 UTC (permalink / raw)
  To: Connor Kuehl, isaku.yamahata, qemu-devel, pbonzini, alistair,
	ehabkost, marcel.apfelbaum, mst, cohuck, mtosatti, seanjc,
	erdemaktas
  Cc: isaku.yamahata, kvm

On 7/23/2021 1:53 AM, Connor Kuehl wrote:
> On 7/7/21 7:55 PM, isaku.yamahata@gmail.com wrote:
>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>
>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>> ---
>>   include/sysemu/tdx.h  | 1 +
>>   target/i386/kvm/kvm.c | 5 +++++
>>   2 files changed, 6 insertions(+)
>>
>> diff --git a/include/sysemu/tdx.h b/include/sysemu/tdx.h
>> index 70eb01348f..f3eced10f9 100644
>> --- a/include/sysemu/tdx.h
>> +++ b/include/sysemu/tdx.h
>> @@ -6,6 +6,7 @@
>>   #include "hw/i386/pc.h"
>>   bool kvm_has_tdx(KVMState *s);
>> +bool kvm_tdx_enabled(void);
>>   int tdx_system_firmware_init(PCMachineState *pcms, MemoryRegion 
>> *rom_memory);
>>   #endif
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index af6b5f350e..76c3ea9fac 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -152,6 +152,11 @@ int kvm_set_vm_type(MachineState *ms, int kvm_type)
>>       return -ENOTSUP;
>>   }
>> +bool kvm_tdx_enabled(void)
>> +{
>> +    return vm_type == KVM_X86_TDX_VM;
>> +}
>> +
> 
> Is this the whole story? Does this guarantee that the VM QEMU is
> responsible to bring up is a successfully initialized TD?

No, it just means a TDX guest is requested.

>  From my reading of the series as it unfolded, this looks like the
> function proves that KVM can support TDs and that the user requested
> a TDX kvm-type, not that we have a fully-formed TD.

Yes, you are right. We modeled it on sev_enabled() and sev_es_enabled().

If the name is misleading, would is_tdx_vm() be a better name?
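
To make the distinction concrete, here is a small sketch that separates
"a TDX VM was requested" from "the TD has been finalized". The state enum
and both helpers are hypothetical. The series itself only tracks the
requested VM type.

```c
#include <stdbool.h>

/* Hypothetical lifecycle states, for illustration only. */
typedef enum {
    EXAMPLE_TD_NOT_REQUESTED,   /* ordinary VM */
    EXAMPLE_TD_REQUESTED,       /* user asked for a TDX VM */
    EXAMPLE_TD_INITIALIZED,     /* TDX init ioctl succeeded */
    EXAMPLE_TD_FINALIZED,       /* setup finalized, TD can run */
} ExampleTdState;

/* What kvm_tdx_enabled() answers today: was a TDX VM requested? */
static bool example_is_tdx_vm(ExampleTdState s)
{
    return s != EXAMPLE_TD_NOT_REQUESTED;
}

/* The stronger question raised in the review: is the TD fully formed? */
static bool example_td_is_finalized(ExampleTdState s)
{
    return s == EXAMPLE_TD_FINALIZED;
}
```

A name like is_tdx_vm() matches the first helper. A "query-tdx" style
answer would need something closer to the second.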

> Is it possible to associate this with a more verifiable metric that
> the TD has been or will be created successfully? I.e., once the VM
> has successfully called the TDX INIT ioctl or has finalized setup?
> 
> My question mainly comes from a later patch in the series, where the
> "query-tdx-capabilities" and "query-tdx" QMP commands are added.
> 
> Forgive me if I am misinterpreting the semantics of each of these
> commands:

Your understanding is correct.

> "query-tdx-capabilities" sounds like it answers the question of
> "can it run a TD?"
> 
> and "query-tdx" sounds like it answers the question of "is it a TD?"
> 
> Is the assumption with "query-tdx" that anything that's gone wrong
> with developing a TD will have resulted in the QEMU process exiting
> and therefore if we get to a point where we can run "query-tdx" then
> we know the TD was successfully formed?
> 


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 34/44] target/i386/tdx: set reboot action to shutdown when tdx
  2021-07-22 17:54   ` Connor Kuehl
@ 2021-12-10  9:54     ` Xiaoyao Li
  0 siblings, 0 replies; 173+ messages in thread
From: Xiaoyao Li @ 2021-12-10  9:54 UTC (permalink / raw)
  To: Connor Kuehl, isaku.yamahata, qemu-devel, pbonzini, alistair,
	ehabkost, marcel.apfelbaum, mst, cohuck, mtosatti, seanjc,
	erdemaktas
  Cc: isaku.yamahata, kvm

On 7/23/2021 1:54 AM, Connor Kuehl wrote:
> On 7/7/21 7:55 PM, isaku.yamahata@gmail.com wrote:
>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>
>> In TDX CPU state is also protected, thus vcpu state can't be reset by 
>> VMM.
>> It assumes -action reboot=shutdown instead of silently ignoring vcpu 
>> reset.
>>
>> TDX module spec version 344425-002US doesn't support vcpu reset by 
>> VMM.  VM
>> needs to be destroyed and created again to emulate REBOOT_ACTION_RESET.
>> For simplicity, put its responsibility to management system like libvirt
>> because it's difficult for the current qemu implementation to destroy and
>> re-create KVM VM resources with keeping other resources.
>>
>> If management system wants reboot behavior for its users, it needs to
>>   - set reboot_action to REBOOT_ACTION_SHUTDOWN,
>>   - set shutdown_action to SHUTDOWN_ACTION_PAUSE optionally and,
>>   - subscribe VM state change and on reboot, (destroy qemu if
>>     SHUTDOWN_ACTION_PAUSE and) start new qemu.
>>
>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>> ---
>>   target/i386/kvm/tdx.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
>> index 1316d95209..0621317b0a 100644
>> --- a/target/i386/kvm/tdx.c
>> +++ b/target/i386/kvm/tdx.c
>> @@ -25,6 +25,7 @@
>>   #include "qapi/qapi-types-misc-target.h"
>>   #include "standard-headers/asm-x86/kvm_para.h"
>>   #include "sysemu/sysemu.h"
>> +#include "sysemu/runstate-action.h"
>>   #include "sysemu/kvm.h"
>>   #include "sysemu/kvm_int.h"
>>   #include "sysemu/tdx.h"
>> @@ -363,6 +364,19 @@ static void tdx_guest_init(Object *obj)
>>       qemu_mutex_init(&tdx->lock);
>> +    /*
>> +     * TDX module spec version 344425-002US doesn't support reset of 
>> vcpu by
>> +     * VMM.  VM needs to be destroyed and created again to emulate
>> +     * REBOOT_ACTION_RESET.  For simplicity, put its responsibility to
>> +     * management system like libvirt.
>> +     *
>> +     * Management system should
>> +     *  - set reboot_action to REBOOT_ACTION_SHUTDOWN
>> +     *  - set shutdown_action to SHUTDOWN_ACTION_PAUSE
>> +     *  - subscribe VM state and on reboot, destroy qemu and start 
>> new qemu
>> +     */
>> +    reboot_action = REBOOT_ACTION_SHUTDOWN;
>> +
>>       tdx->debug = false;
>>       object_property_add_bool(obj, "debug", tdx_guest_get_debug,
>>                                tdx_guest_set_debug);
>>
> 
> I think the same effect could be accomplished with modifying
> kvm_arch_cpu_check_are_resettable.
> 

Yes. Thanks for pointing it out. We will take this approach.
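
A rough sketch of what that could look like. The two accessors are
stubbed here so the snippet stands alone. In QEMU they would be the real
sev_es_enabled() and the kvm_tdx_enabled() accessor from patch 32. That
the hook currently checks only SEV-ES is my assumption about the existing
code, not something shown in this thread.

```c
#include <stdbool.h>

/* Stand-ins for QEMU state, only for this example. */
static bool example_sev_es_active;
static bool example_tdx_requested;

static bool sev_es_enabled(void)
{
    return example_sev_es_active;
}

static bool kvm_tdx_enabled(void)
{
    return example_tdx_requested;
}

/*
 * vCPU state of a TD is protected, so the VMM cannot reset it.
 * Reporting "not resettable" lets the generic reset path refuse the
 * reset instead of forcing reboot_action in tdx_guest_init().
 */
bool kvm_arch_cpu_check_are_resettable(void)
{
    return !sev_es_enabled() && !kvm_tdx_enabled();
}
```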

^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 20/44] i386/tdx: Parse tdx metadata and store the result into TdxGuestState
  2021-08-26 11:18     ` Gerd Hoffmann
@ 2022-01-04 13:08       ` Xiaoyao Li
  -1 siblings, 0 replies; 173+ messages in thread
From: Xiaoyao Li @ 2022-01-04 13:08 UTC (permalink / raw)
  To: Gerd Hoffmann, isaku.yamahata, Laszlo Ersek
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, seanjc, erdemaktas, kvm, isaku.yamahata,
	Min M . Xu

On 8/26/2021 7:18 PM, Gerd Hoffmann wrote:
>> +int load_tdvf(const char *filename)
>> +{
> 
>> +    for_each_fw_entry(fw, entry) {
>> +        if (entry->address < x86ms->below_4g_mem_size ||
>> +            entry->address > 4 * GiB) {
>> +            tdvf_init_ram_memory(ms, entry);
>> +        } else {
>> +            tdvf_init_bios_memory(fd, filename, entry);
>> +        }
>> +    }
> 
> Why there are two different ways to load the firmware?

Because there are two different parts in TDVF:
  a) the firmware volumes (BFV and CFV, i.e., OVMF_CODE.fd and
OVMF_VARS.fd). Those are ROMs;

  b) some RAM regions, e.g., temporary memory for early BFV
execution and the TD HOB used to pass info to TDVF. Those are RAM that
has already been added to the TDX VM.
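
The split between parts a) and b) follows from the address test in the
quoted load_tdvf() loop. A minimal, self-contained sketch of that test is
below. The struct and helper names are made up for the example, and only
the comparison itself comes from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_GIB (1ULL << 30)

/* Hypothetical stand-in for one TDVF metadata section entry. */
typedef struct {
    uint64_t address;   /* guest physical address of the section */
} ExampleTdvfEntry;

/*
 * Entries that fall inside guest RAM (below the low-memory limit or
 * above 4 GiB) are part b). Everything else sits in the firmware
 * window just below 4 GiB and is part a).
 */
static bool example_entry_is_ram(const ExampleTdvfEntry *entry,
                                 uint64_t below_4g_mem_size)
{
    return entry->address < below_4g_mem_size ||
           entry->address > 4 * EXAMPLE_GIB;
}
```

For a guest with 2 GiB of low memory, an entry at 0x800000 is treated as
RAM, while an entry at 0xFFC00000 is treated as firmware ROM.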

> Also: why is all this firmware volume parsing needed?  The normal ovmf
> firmware can simply be mapped just below 4G, why can't tdvf work the
> same way?

Ideally, the firmware (part a above) can be mapped just below 4G like
what we do for OVMF.

But mapping part a) needs additional work: parsing the metadata to get
the location of part b) and initializing the RAM of part b). Yes, the
additional work can be added to the existing OVMF loading flow as pflash.


+ Laszlo,

Regarding loading TDVF as pflash, I have some questions:

- pflash requires KVM to support read-only memory. However, TDX
doesn't support read-only memory. Is it a must, or can we make an
exception for TDX?

- I saw in
https://lists.gnu.org/archive/html/qemu-discuss/2018-04/msg00045.html
that you said when OVMF is loaded as pflash, it's MMIO. But TDVF is
treated as private memory. I'm not sure whether loading TDVF with pflash
will cause some potential problem.

Anyway, I tried changing the existing pflash approach to load TDVF. It
can boot a TDX VM without issue.

> thanks,
>    Gerd
> 


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 20/44] i386/tdx: Parse tdx metadata and store the result into TdxGuestState
  2022-01-04 13:08       ` Xiaoyao Li
@ 2022-01-06 16:06         ` Laszlo Ersek
  -1 siblings, 0 replies; 173+ messages in thread
From: Laszlo Ersek @ 2022-01-06 16:06 UTC (permalink / raw)
  To: Xiaoyao Li, Gerd Hoffmann, isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, seanjc, erdemaktas, kvm, isaku.yamahata,
	Min M . Xu

On 01/04/22 14:08, Xiaoyao Li wrote:

> + Laszlo,
> 
> Regarding loading TDVF as pflash, I have some questions:
> 
> - pflash requires KVM to support readonly memory. However, for TDX, it
> doesn't support readonly memory. Is it a must? or we can make an
> exception for TDX?
> 
> - I saw from
> https://lists.gnu.org/archive/html/qemu-discuss/2018-04/msg00045.html,
> you said when load OVMF as pflash, it's MMIO. But for TDVF, it's treated
> as private memory. I'm not sure whether it will cause some potential
> problem if loading TDVF with pflash.
> 
> Anyway I tried changing the existing pflash approach to load TDVF. It
> can boot a TDX VM and no issue.

I have no comments on whether TDX should or should not use pflash.

If you go without pflash, then you likely will not have a
standards-conformant UEFI variable store. (Unless you reimplement the
variable arch protocols in edk2 on top of something else than the Fault
Tolerant Write and Firmware Volume Block protocols.) Whether a
conformant UEFI varstore matters to you (or to TDX in general) is
something I can't comment on.

(I've generally stopped commenting on confidential computing topics, but
this message allows for comments on just pflash, and how it impacts OVMF.)

Regarding pflash itself, the read-only KVM memslot is required for it.
Otherwise pflash cannot work as a "ROMD device" (= you can't flip it
back and forth between ROM mode and programming (MMIO) mode).
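
As a toy model of the "ROMD device" behaviour described above (not
QEMU's pflash implementation): in ROM mode guest reads are served
straight from the read-only memslot, while every guest write traps to the
device model and may switch the device to programming (MMIO) mode, where
reads trap as well. All names and the command value are hypothetical.

```c
#include <stdint.h>

/* Hypothetical command that returns the device to ROM mode. */
#define EXAMPLE_CMD_READ_ARRAY 0xFF

typedef enum {
    EXAMPLE_MODE_ROM,    /* reads go directly to the read-only memslot */
    EXAMPLE_MODE_MMIO,   /* programming mode: reads and writes trap */
} ExampleFlashMode;

typedef struct {
    ExampleFlashMode mode;
    unsigned mmio_exits;   /* accesses that had to be emulated */
} ExampleFlash;

static void example_guest_read(ExampleFlash *f)
{
    if (f->mode == EXAMPLE_MODE_MMIO) {
        f->mmio_exits++;
    }
}

static void example_guest_write(ExampleFlash *f, uint8_t cmd)
{
    /* Writes always trap, because the memslot is read-only. */
    f->mmio_exits++;
    f->mode = (cmd == EXAMPLE_CMD_READ_ARRAY) ? EXAMPLE_MODE_ROM
                                              : EXAMPLE_MODE_MMIO;
}
```

Without a read-only memslot there is no way to make writes trap while
reads stay direct, which is why the mode flip depends on it.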

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 20/44] i386/tdx: Parse tdx metadata and store the result into TdxGuestState
  2022-01-06 16:06         ` Laszlo Ersek
@ 2022-01-07  7:05           ` Xiaoyao Li
  -1 siblings, 0 replies; 173+ messages in thread
From: Xiaoyao Li @ 2022-01-07  7:05 UTC (permalink / raw)
  To: Laszlo Ersek, Gerd Hoffmann, isaku.yamahata
  Cc: qemu-devel, pbonzini, alistair, ehabkost, marcel.apfelbaum, mst,
	cohuck, mtosatti, seanjc, erdemaktas, kvm, isaku.yamahata,
	Min M . Xu

On 1/7/2022 12:06 AM, Laszlo Ersek wrote:
> On 01/04/22 14:08, Xiaoyao Li wrote:
> 
>> + Laszlo,
>>
>> Regarding loading TDVF as pflash, I have some questions:
>>
>> - pflash requires KVM to support readonly memory. However, for TDX, it
>> doesn't support readonly memory. Is it a must? or we can make an
>> exception for TDX?
>>
>> - I saw from
>> https://lists.gnu.org/archive/html/qemu-discuss/2018-04/msg00045.html,
>> you said when load OVMF as pflash, it's MMIO. But for TDVF, it's treated
>> as private memory. I'm not sure whether it will cause some potential
>> problem if loading TDVF with pflash.
>>
>> Anyway I tried changing the existing pflash approach to load TDVF. It
>> can boot a TDX VM and no issue.
> 
> I have no comments on whether TDX should or should not use pflash.
> 
> If you go without pflash, then you likely will not have a
> standards-conformant UEFI variable store. (Unless you reimplement the
> variable arch protocols in edk2 on top of something else than the Fault
> Tolerant Write and Firmware Volume Block protocols.) Whether a
> conformant UEFI varstore matters to you (or to TDX in general) is
> something I can't comment on.

Thanks for your reply, Laszlo!

Regarding "standards-conformant UEFI variable store", I guess you mean
that changes to UEFI non-volatile variables need to be synced back to the
OVMF_VARS.fd file, right?

If so, I need to sync with the internal folks who are upstreaming TDVF
support into OVMF.

> (I've generally stopped commenting on confidential computing topics, but
> this message allows for comments on just pflash, and how it impacts OVMF.)
> 
> Regarding pflash itself, the read-only KVM memslot is required for it.
> Otherwise pflash cannot work as a "ROMD device" (= you can't flip it
> back and forth between ROM mode and programming (MMIO) mode).

We don't need read-only mode for TDVF so far. In that case, would it be
acceptable to allow a pflash without KVM read-only memslot support
when read-only is not required for the specific pflash device?

We are trying to follow the existing usage of OVMF for TDX, since TDVF
support will land in OVMF instead of in a new separate binary.

> Thanks
> Laszlo
> 


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 20/44] i386/tdx: Parse tdx metadata and store the result into TdxGuestState
  2022-01-07  7:05           ` Xiaoyao Li
@ 2022-01-10 11:01             ` Gerd Hoffmann
  -1 siblings, 0 replies; 173+ messages in thread
From: Gerd Hoffmann @ 2022-01-10 11:01 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Laszlo Ersek, isaku.yamahata, qemu-devel, pbonzini, alistair,
	ehabkost, marcel.apfelbaum, mst, cohuck, mtosatti, seanjc,
	erdemaktas, kvm, isaku.yamahata, Min M . Xu

> > If you go without pflash, then you likely will not have a
> > standards-conformant UEFI variable store. (Unless you reimplement the
> > variable arch protocols in edk2 on top of something else than the Fault
> > Tolerant Write and Firmware Volume Block protocols.) Whether a
> > conformant UEFI varstore matters to you (or to TDX in general) is
> > something I can't comment on.
> 
> Thanks for your reply! Laszlo
> 
> regarding "standards-conformant UEFI variable store", I guess you mean the
> change to UEFI non-volatile variables needs to be synced back to the
> OVMF_VARS.fd file. right?

Yes.  UEFI variables are expected to be persistent, and syncing to
OVMF_VARS.fd handles that.

Not fully sure whether that expectation holds up in the CC world.  At
least the AmdSev variant has just OVMF.fd, i.e. no CODE/VARS split.

> > Regarding pflash itself, the read-only KVM memslot is required for it.
> > Otherwise pflash cannot work as a "ROMD device" (= you can't flip it
> > back and forth between ROM mode and programming (MMIO) mode).
> 
> We don't need Read-only mode for TDVF so far. If for this purpose, is it
> acceptable that allowing a pflash without KVM readonly memslot support if
> read-only is not required for the specific pflash device?

In case you don't want/need persistent VARS (which strictly speaking is
a UEFI spec violation) you should be able to go for a simple "-bios
OVMF.fd".

take care,
  Gerd


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 06/44] hw/i386: Introduce kvm-type for TDX guest
  2021-11-24  7:31       ` Xiaoyao Li
@ 2022-01-10 11:18         ` Daniel P. Berrangé
  -1 siblings, 0 replies; 173+ messages in thread
From: Daniel P. Berrangé @ 2022-01-10 11:18 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Gerd Hoffmann, isaku.yamahata, isaku.yamahata, cohuck, ehabkost,
	kvm, mst, seanjc, alistair, qemu-devel, mtosatti, erdemaktas,
	pbonzini

On Wed, Nov 24, 2021 at 03:31:13PM +0800, Xiaoyao Li wrote:
> On 8/26/2021 6:22 PM, Gerd Hoffmann wrote:
> > On Wed, Jul 07, 2021 at 05:54:36PM -0700, isaku.yamahata@gmail.com wrote:
> > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > 
> > > Introduce a machine property, kvm-type, to allow the user to create a
> > > Trusted Domain eXtensions (TDX) VM, a.k.a. a Trusted Domain (TD), e.g.:
> > > 
> > >   # $QEMU \
> > > 	-machine ...,kvm-type=tdx \
> > > 	...
> 
> Sorry for the very late reply.
> 
> > Can we align sev and tdx better than that?
> > 
> > SEV is enabled this way:
> > 
> > qemu -machine ...,confidential-guest-support=sev0 \
> >       -object sev-guest,id=sev0,...
> > 
> > (see docs/amd-memory-encryption.txt for details).
> > 
> > tdx could likewise use a tdx-guest object (and both sev-guest and
> > tdx-guest should probably have a common parent object type) to enable
> > and configure tdx support.
> 
> yes, sev only introduced a new object and passed it to
> confidential-guest-support. This is because SEV doesn't require the new type
> of VM.
> However, TDX does require a new type of VM.
> 
> If we read the KVM code, there is a parameter of CREATE_VM to pass the vm_type,
> though x86 doesn't use this field so far. The QEMU side also has code to
> pass/configure vm-type on the command line. Of course, the x86 arch doesn't
> implement it. With upcoming TDX, it will implement and use vm type for TDX.
> That's the reason we wrote this patch to implement kvm-type for x86, similar
> to other arches.
> 
> yes, of course we can infer the vm_type from "-object tdx-guest". But I
> prefer to just use vm_type. Let's see others' opinions.

It isn't just SEV that is using the confidential-guest-support approach.
This was done for PPC64 and S390x too.  This gives QEMU a standard
internal interface to declare that a confidential guest is being used /
configured. IMHO, TDX needs to use this too, unless there's a compelling
technical reason why it is a bad approach & needs to diverge from every
other confidential guest impl in QEMU.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 06/44] hw/i386: Introduce kvm-type for TDX guest
  2022-01-10 11:18         ` Daniel P. Berrangé
@ 2022-01-10 12:01           ` Xiaoyao Li
  -1 siblings, 0 replies; 173+ messages in thread
From: Xiaoyao Li @ 2022-01-10 12:01 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Gerd Hoffmann, isaku.yamahata, isaku.yamahata, cohuck, ehabkost,
	kvm, mst, seanjc, alistair, qemu-devel, mtosatti, erdemaktas,
	pbonzini

On 1/10/2022 7:18 PM, Daniel P. Berrangé wrote:
> On Wed, Nov 24, 2021 at 03:31:13PM +0800, Xiaoyao Li wrote:
>> On 8/26/2021 6:22 PM, Gerd Hoffmann wrote:
>>> On Wed, Jul 07, 2021 at 05:54:36PM -0700, isaku.yamahata@gmail.com wrote:
>>>> From: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>
>>>> Introduce a machine property, kvm-type, to allow the user to create a
>>>> Trusted Domain eXtensions (TDX) VM, a.k.a. a Trusted Domain (TD), e.g.:
>>>>
>>>>    # $QEMU \
>>>> 	-machine ...,kvm-type=tdx \
>>>> 	...
>>
>> Sorry for the very late reply.
>>
>>> Can we align sev and tdx better than that?
>>>
>>> SEV is enabled this way:
>>>
>>> qemu -machine ...,confidential-guest-support=sev0 \
>>>        -object sev-guest,id=sev0,...
>>>
>>> (see docs/amd-memory-encryption.txt for details).
>>>
>>> tdx could likewise use a tdx-guest object (and both sev-guest and
>>> tdx-guest should probably have a common parent object type) to enable
>>> and configure tdx support.
>>
>> yes, sev only introduced a new object and passed it to
>> confidential-guest-support. This is because SEV doesn't require the new type
>> of VM.
>> However, TDX does require a new type of VM.
>>
>> If we read the KVM code, there is a parameter of CREATE_VM to pass the vm_type,
>> though x86 doesn't use this field so far. The QEMU side also has code to
>> pass/configure vm-type on the command line. Of course, the x86 arch doesn't
>> implement it. With upcoming TDX, it will implement and use vm type for TDX.
>> That's the reason we wrote this patch to implement kvm-type for x86, similar
>> to other arches.
>>
>> yes, of course we can infer the vm_type from "-object tdx-guest". But I
>> prefer to just use vm_type. Let's see others' opinions.
> 
> It isn't just SEV that is using the confidential-guest-support approach.
> This was done for PPC64 and S390x too.  This gives QEMU a standard
> internal interface to declare that a confidential guest is being used /
> configured. IMHO, TDX needs to use this too, unless there's a compelling
> technical reason why it is a bad approach & needs to diverge from every
> other confidential guest impl in QEMU.
> 

I forgot to send an update: we have gone in the direction of identifying
the TDX vm_type based on confidential-guest-support, like below:


if (ms->cgs && object_dynamic_cast(OBJECT(ms->cgs), TYPE_TDX_GUEST)) {
    kvm_type = KVM_X86_TDX_VM;
} else {
    kvm_type = KVM_X86_DEFAULT_VM;
}


I think it's what you want, right?

BTW, the next version of the whole TDX QEMU series should be released
together with the next version of the TDX KVM series, but I cannot give
an exact date yet.


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 06/44] hw/i386: Introduce kvm-type for TDX guest
  2022-01-10 12:01           ` Xiaoyao Li
@ 2022-01-10 12:05             ` Daniel P. Berrangé
  -1 siblings, 0 replies; 173+ messages in thread
From: Daniel P. Berrangé @ 2022-01-10 12:05 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Gerd Hoffmann, isaku.yamahata, isaku.yamahata, cohuck, ehabkost,
	kvm, mst, seanjc, alistair, qemu-devel, mtosatti, erdemaktas,
	pbonzini

On Mon, Jan 10, 2022 at 08:01:33PM +0800, Xiaoyao Li wrote:
> On 1/10/2022 7:18 PM, Daniel P. Berrangé wrote:
> > On Wed, Nov 24, 2021 at 03:31:13PM +0800, Xiaoyao Li wrote:
> > > On 8/26/2021 6:22 PM, Gerd Hoffmann wrote:
> > > > On Wed, Jul 07, 2021 at 05:54:36PM -0700, isaku.yamahata@gmail.com wrote:
> > > > > From: Xiaoyao Li <xiaoyao.li@intel.com>
> > > > > 
> > > > > Introduce a machine property, kvm-type, to allow the user to create a
> > > > > Trusted Domain eXtensions (TDX) VM, a.k.a. a Trusted Domain (TD), e.g.:
> > > > > 
> > > > >    # $QEMU \
> > > > > 	-machine ...,kvm-type=tdx \
> > > > > 	...
> > > 
> > > Sorry for the very late reply.
> > > 
> > > > Can we align sev and tdx better than that?
> > > > 
> > > > SEV is enabled this way:
> > > > 
> > > > qemu -machine ...,confidential-guest-support=sev0 \
> > > >        -object sev-guest,id=sev0,...
> > > > 
> > > > (see docs/amd-memory-encryption.txt for details).
> > > > 
> > > > tdx could likewise use a tdx-guest object (and both sev-guest and
> > > > tdx-guest should probably have a common parent object type) to enable
> > > > and configure tdx support.
> > > 
> > > yes, sev only introduced a new object and passed it to
> > > confidential-guest-support. This is because SEV doesn't require the new type
> > > of VM.
> > > However, TDX does require a new type of VM.
> > > 
> > > If we read the KVM code, there is a parameter of CREATE_VM to pass the vm_type,
> > > though x86 doesn't use this field so far. The QEMU side also has code to
> > > pass/configure vm-type on the command line. Of course, the x86 arch doesn't
> > > implement it. With upcoming TDX, it will implement and use vm type for TDX.
> > > That's the reason we wrote this patch to implement kvm-type for x86, similar
> > > to other arches.
> > > 
> > > yes, of course we can infer the vm_type from "-object tdx-guest". But I
> > > prefer to just use vm_type. Let's see others' opinions.
> > 
> > It isn't just SEV that is using the confidential-guest-support approach.
> > This was done for PPC64 and S390x too.  This gives QEMU a standard
> > internal interface to declare that a confidential guest is being used /
> > configured. IMHO, TDX needs to use this too, unless there's a compelling
> > technical reason why it is a bad approach & needs to diverge from every
> > other confidential guest impl in QEMU.
> > 
> 
> I forgot to send an update: we have gone in the direction of identifying
> the TDX vm_type based on confidential-guest-support, like below:
> 
> 
> if (ms->cgs && object_dynamic_cast(OBJECT(ms->cgs), TYPE_TDX_GUEST)) {
>         kvm_type = KVM_X86_TDX_VM;
> } else {
>         kvm_type = KVM_X86_DEFAULT_VM;
> }
> 
> 
> I think it's what you want, right?

Yes, that seems reasonable

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 20/44] i386/tdx: Parse tdx metadata and store the result into TdxGuestState
  2022-01-10 11:01             ` Gerd Hoffmann
@ 2022-01-10 12:09               ` Xiaoyao Li
  -1 siblings, 0 replies; 173+ messages in thread
From: Xiaoyao Li @ 2022-01-10 12:09 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Laszlo Ersek, isaku.yamahata, qemu-devel, pbonzini, alistair,
	ehabkost, marcel.apfelbaum, mst, cohuck, mtosatti, seanjc,
	erdemaktas, kvm, isaku.yamahata, Min M . Xu

On 1/10/2022 7:01 PM, Gerd Hoffmann wrote:
>>> If you go without pflash, then you likely will not have a
>>> standards-conformant UEFI variable store. (Unless you reimplement the
>>> variable arch protocols in edk2 on top of something else than the Fault
>>> Tolerant Write and Firmware Volume Block protocols.) Whether a
>>> conformant UEFI varstore matters to you (or to TDX in general) is
>>> something I can't comment on.
>>
>> Thanks for your reply, Laszlo!
>>
>> Regarding "standards-conformant UEFI variable store", I guess you mean that
>> changes to UEFI non-volatile variables need to be synced back to the
>> OVMF_VARS.fd file, right?
> 
> Yes.  UEFI variables are expected to be persistent, and syncing to
> OVMF_VARS.fd handles that.

A further question.

Is this achieved via the read-only memslot? That is, when a UEFI variable
gets changed, the write exits to QEMU with KVM_EXIT_MMIO because of the
read-only memslot, so that QEMU can sync the change to OVMF_VARS.fd?

> Not fully sure whether that expectation holds up in the CC world.  At
> least the AmdSev variant has just OVMF.fd, i.e. no CODE/VARS split.
> 
>>> Regarding pflash itself, the read-only KVM memslot is required for it.
>>> Otherwise pflash cannot work as a "ROMD device" (= you can't flip it
>>> back and forth between ROM mode and programming (MMIO) mode).
>>
>> We don't need read-only mode for TDVF so far. In that case, would it be
>> acceptable to allow a pflash without KVM read-only memslot support when
>> read-only is not required for the specific pflash device?
> 
> In case you don't want/need persistent VARS (which strictly speaking is
> a UEFI spec violation) you should be able to go for a simple "-bios
> OVMF.fd".
> 
> take care,
>    Gerd
> 


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 20/44] i386/tdx: Parse tdx metadata and store the result into TdxGuestState
  2022-01-10 12:09               ` Xiaoyao Li
@ 2022-01-11  8:19                 ` Laszlo Ersek
  -1 siblings, 0 replies; 173+ messages in thread
From: Laszlo Ersek @ 2022-01-11  8:19 UTC (permalink / raw)
  To: Xiaoyao Li, Gerd Hoffmann
  Cc: isaku.yamahata, qemu-devel, pbonzini, alistair, ehabkost,
	marcel.apfelbaum, mst, cohuck, mtosatti, seanjc, erdemaktas, kvm,
	isaku.yamahata, Min M . Xu

On 01/10/22 13:09, Xiaoyao Li wrote:
> On 1/10/2022 7:01 PM, Gerd Hoffmann wrote:
>>>> If you go without pflash, then you likely will not have a
>>>> standards-conformant UEFI variable store. (Unless you reimplement
>>>> the variable arch protocols in edk2 on top of something else than
>>>> the Fault Tolerant Write and Firmware Volume Block protocols.)
>>>> Whether a conformant UEFI varstore matters to you (or to TDX in
>>>> general) is something I can't comment on.
>>>
>>> Thanks for your reply, Laszlo!
>>>
>>> Regarding "standards-conformant UEFI variable store", I guess you
>>> mean that changes to UEFI non-volatile variables need to be synced
>>> back to the OVMF_VARS.fd file, right?
>>
>> Yes.  UEFI variables are expected to be persistent, and syncing to
>> OVMF_VARS.fd handles that.
>
> A further question.
>
> Is this achieved via the read-only memslot? That is, when a UEFI variable
> gets changed, the write exits to QEMU with KVM_EXIT_MMIO because of the
> read-only memslot, so that QEMU can sync the change to OVMF_VARS.fd?

Yes.

When the flash device is in "romd_mode", that's when a readonly KVM
memslot is used. In this case, the guest can read and execute from the
memory region in question; only writes trap to QEMU. Such a write
(WRITE_BYTE_CMD) is what the guest's flash driver uses to flip the flash
device out of "romd_mode".

When the flash device is not in "romd_mode", then no KVM memslot is used
at all, and both reads and writes trap to QEMU. Once the flash
programming is done, the guest's flash driver issues a particular write
command (READ_ARRAY_CMD) that flips the device back to "romd_mode" (and
then the readonly KVM memslot is re-established).

Here's a rough call tree (for the non-SMM case, updating a
non-authenticated non-volatile variable):

  VariableServiceSetVariable()                             [MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c]
    UpdateVariable()                                       [MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c]
      UpdateVariableStore()                                [MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c]
        FvbProtocolWrite()                                 [OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FwBlockService.c]
          QemuFlashWrite()                                 [OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlash.c]

            QemuFlashPtrWrite (WRITE_BYTE_CMD /* 0x10 */)
              QEMU:
                pflash_write()                             [hw/block/pflash_cfi01.c]
                  (wcycle == 0)
                  memory_region_rom_device_set_romd(false) [softmmu/memory.c]
                    ...
                      kvm_region_del()                     [accel/kvm/kvm-all.c]
                        kvm_set_phys_mem(false)            [accel/kvm/kvm-all.c]
                          /* unregister the slot */

                  /* Single Byte Program */
                  wcycle++

            QemuFlashPtrWrite (Buffer[Loop])
              QEMU:
                pflash_write()                             [hw/block/pflash_cfi01.c]
                  (wcycle == 1)
                  /* Single Byte Program */
                  pflash_data_write()                      [hw/block/pflash_cfi01.c]
                  pflash_update()                          [hw/block/pflash_cfi01.c]
                    blk_pwrite()                           [block/block-backend.c]
                  wcycle = 0

            QemuFlashPtrWrite (READ_ARRAY_CMD /* 0xff */)
              QEMU:
                pflash_write()                             [hw/block/pflash_cfi01.c]
                  (wcycle == 0)
                  memory_region_rom_device_set_romd(false) [softmmu/memory.c]
                    /* no actual change */
                  /* Read Array */
                  memory_region_rom_device_set_romd(true)  [softmmu/memory.c]
                    kvm_region_add()                       [accel/kvm/kvm-all.c]
                      kvm_set_phys_mem(true)               [accel/kvm/kvm-all.c]
                        /* register the new slot */
                        kvm_mem_flags()                    [accel/kvm/kvm-all.c]
                          ... memory_region_is_romd() ...  [include/exec/memory.h]
                          flags |= KVM_MEM_READONLY

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 173+ messages in thread

* Re: [RFC PATCH v2 20/44] i386/tdx: Parse tdx metadata and store the result into TdxGuestState
  2022-01-11  8:19                 ` Laszlo Ersek
  (?)
@ 2022-01-11  8:48                 ` Laszlo Ersek
  -1 siblings, 0 replies; 173+ messages in thread
From: Laszlo Ersek @ 2022-01-11  8:48 UTC (permalink / raw)
  To: Xiaoyao Li, Gerd Hoffmann
  Cc: isaku.yamahata, cohuck, ehabkost, kvm, mst, seanjc, alistair,
	qemu-devel, mtosatti, Min M . Xu, erdemaktas, pbonzini,
	isaku.yamahata

[-- Attachment #1: Type: text/plain, Size: 3514 bytes --]

On 01/11/22 09:19, Laszlo Ersek wrote:

> Here's a rough call tree (for the non-SMM case, updating a
> non-authenticated non-volatile variable):
>
>   VariableServiceSetVariable()                             [MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c]
>     UpdateVariable()                                       [MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c]
>       UpdateVariableStore()                                [MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c]
>         FvbProtocolWrite()                                 [OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FwBlockService.c]
>           QemuFlashWrite()                                 [OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlash.c]
>
>             QemuFlashPtrWrite (WRITE_BYTE_CMD /* 0x10 */)
>                QEMU:
>                 pflash_write()                             [hw/block/pflash_cfi01.c]
>                   (wcycle == 0)
>                   memory_region_rom_device_set_romd(false) [softmmu/memory.c]
>                     ...
>                       kvm_region_del()                     [accel/kvm/kvm-all.c]
>                         kvm_set_phys_mem(false)            [accel/kvm/kvm-all.c]
>                           /* unregister the slot */
>
>                   /* Single Byte Program */
>                   wcycle++
>
>             QemuFlashPtrWrite (Buffer[Loop])
>               QEMU:
>                 pflash_write()                             [hw/block/pflash_cfi01.c]
>                   (wcycle == 1)
>                   /* Single Byte Program */
>                   pflash_data_write()                      [hw/block/pflash_cfi01.c]
>                   pflash_update()                          [hw/block/pflash_cfi01.c]
>                     blk_pwrite()                           [block/block-backend.c]
>                   wcycle = 0
>
>             QemuFlashPtrWrite (READ_ARRAY_CMD /* 0xff */)
>               QEMU:
>                 pflash_write()                             [hw/block/pflash_cfi01.c]
>                   (wcycle == 0)
>                   memory_region_rom_device_set_romd(false) [softmmu/memory.c]
>                     /* no actual change */
>                   /* Read Array */
>                   memory_region_rom_device_set_romd(true)  [softmmu/memory.c]
>                     kvm_region_add()                       [accel/kvm/kvm-all.c]
>                       kvm_set_phys_mem(true)               [accel/kvm/kvm-all.c]
>                         /* register the new slot */
>                         kvm_mem_flags()                    [accel/kvm/kvm-all.c]
>                           ... memory_region_is_romd() ...  [include/exec/memory.h]
>                           flags |= KVM_MEM_READONLY
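
As a rough illustration of the QEMU side of the call tree quoted above,
the three-write sequence can be modelled in a few lines of C. This is a
toy sketch, not QEMU code: ToyFlash, toy_set_romd(), toy_flash_write()
and toy_program_byte() are hypothetical names, the single "cell" byte
stands in for the whole flash image, and "slot_changes" only counts the
places where the real code would go through kvm_region_del() or
kvm_region_add().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Command values as annotated in the call tree. */
enum { WRITE_BYTE_CMD = 0x10, READ_ARRAY_CMD = 0xff };

/* Hypothetical stand-in for the pflash device state. */
typedef struct {
    int     wcycle;       /* position within the current command sequence */
    bool    romd;         /* true: reads are served from a read-only slot */
    int     slot_changes; /* number of slot removals plus registrations */
    uint8_t cell;         /* one byte standing in for the flash contents */
} ToyFlash;

/* Switch between ROMD and MMIO mode; a repeated request is a no-op. */
void toy_set_romd(ToyFlash *f, bool romd)
{
    if (f->romd != romd) {
        f->romd = romd;
        f->slot_changes++;
    }
}

/* One guest write to the flash window. */
void toy_flash_write(ToyFlash *f, uint8_t value)
{
    assert(f->wcycle == 0 || f->wcycle == 1);

    if (f->wcycle == 0) {
        /* Every command first drops out of ROMD mode. */
        toy_set_romd(f, false);
        if (value == WRITE_BYTE_CMD) {
            f->wcycle = 1;          /* the next write carries the data byte */
        } else if (value == READ_ARRAY_CMD) {
            toy_set_romd(f, true);  /* back to the read-only slot */
        }
    } else {
        /* Single Byte Program: store the data byte (simplified). */
        f->cell = value;
        f->wcycle = 0;
    }
}

/* The guest-side sequence: command, data byte, return to read-array mode. */
void toy_program_byte(ToyFlash *f, uint8_t data)
{
    toy_flash_write(f, WRITE_BYTE_CMD);
    toy_flash_write(f, data);
    toy_flash_write(f, READ_ARRAY_CMD);
}
```

Starting from a ToyFlash in ROMD mode, one toy_program_byte() call leaves
the model in ROMD mode again with slot_changes == 2, which corresponds to
the one slot removal and the one slot registration in the tree.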

In that call tree, I ignored Reclaim()
[MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c]; Reclaim() is
called from more places than just from UpdateVariable().

In Reclaim(), we (roughly) have

  Reclaim()              [MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c]
    FtwVariableSpace()   [MdeModulePkg/Universal/Variable/RuntimeDxe/Reclaim.c]
      FtwWrite()         [MdeModulePkg/Universal/FaultTolerantWriteDxe/FaultTolerantWrite.c]
        QemuFlashWrite() [OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlash.c]

For a bit more info on the internals of FtwWrite(), see the attached
message (I'd provide a URL, but Intel had killed the edk2-devel archives
on lists.01.org, and the other archives don't go back to 2014...)

Thanks
Laszlo

[-- Attachment #2: Attached Message --]
[-- Type: message/rfc822, Size: 151720 bytes --]

[-- Attachment #2.1.1: Type: text/plain, Size: 2104 bytes --]

On 04/18/14 21:32, Kirkendall, Garrett wrote:
> Is there any good documentation for how the Fault Tolerant Write is
> supposed to work?
> 
> I understand that NV storage, FTW working space and FTW spare space are
> supposed to be in the same Firmware Volume.
> 
> I’m having trouble deciphering how big the FTW working and spare areas
> should be in relation to the NV storage space.
> 
> Also, in a bunch of places, it looks like the code was written such that
> the working space must fit within one block size.  What happens if I need
> space spanning multiple blocks?  Below are parts from two functions that
> end in an ASSERT because only one block gets read and returns an error
> when the requested FVB->Read input size is larger than one block of data.

If it's any help, here's a diagram I derived last December, while I was
hunting down <https://github.com/tianocore/edk2/commit/06f1982a>:

On 12/17/13 07:16, Laszlo Ersek wrote:
> During reclaim, the following data movements take place (I'm skipping
> the erasures and the in-memory buffer manipulations):
>
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   L: event log
> LIVE  |    varstore               |L|W|   W: working block
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> SPARE |                               |
>       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
> (1) copy LIVE to MyBuffer
> (2) copy SPARE to SpareBuffer
> (3) copy MyBuffer to SPARE
> (4) copy SPARE to Buffer
> (5) copy Buffer to LIVE
> (6) copy SpareBuffer to SPARE

(MyBuffer, SpareBuffer, and Buffer are temporary memory buffers.)
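
The six steps above can be sketched in plain C as follows. This is only
an illustration under stated assumptions, not the edk2 implementation:
toy_ftw_write() is a hypothetical name, the LIVE and SPARE areas are
ordinary memory arrays rather than flash, erasures are skipped just like
in the diagram, and the single in-memory update applied to MyBuffer
between steps (1) and (3) is my assumption about where the new data
enters the picture.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of the reclaim data movements. "live" and "spare"
 * are both area_size bytes long; "data" (length bytes) is the update to
 * place at "offset" within the live area. Returns 0 on success, -1 on a
 * bad range or an allocation failure.
 */
int toy_ftw_write(uint8_t *live, uint8_t *spare, size_t area_size,
                  size_t offset, const uint8_t *data, size_t length)
{
    uint8_t *my_buffer, *spare_buffer, *buffer;

    assert(live != NULL && spare != NULL);
    if (offset > area_size || length > area_size - offset) {
        return -1;
    }

    my_buffer    = malloc(area_size);
    spare_buffer = malloc(area_size);
    buffer       = malloc(area_size);
    if (my_buffer == NULL || spare_buffer == NULL || buffer == NULL) {
        free(my_buffer);
        free(spare_buffer);
        free(buffer);
        return -1;
    }

    memcpy(my_buffer, live, area_size);            /* (1) LIVE -> MyBuffer */
    if (length > 0) {
        memcpy(my_buffer + offset, data, length);  /* assumed in-memory update */
    }
    memcpy(spare_buffer, spare, area_size);        /* (2) SPARE -> SpareBuffer */
    memcpy(spare, my_buffer, area_size);           /* (3) MyBuffer -> SPARE */
    memcpy(buffer, spare, area_size);              /* (4) SPARE -> Buffer */
    memcpy(live, buffer, area_size);               /* (5) Buffer -> LIVE */
    memcpy(spare, spare_buffer, area_size);        /* (6) SpareBuffer -> SPARE */

    free(my_buffer);
    free(spare_buffer);
    free(buffer);
    return 0;
}
```

In the sketch, the updated image sits in SPARE from step (3) until
step (5) has rewritten LIVE, and step (6) then puts back the previous
SPARE contents.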

In OVMF, the block size is 4K. The varstore is 14 blocks (56K), plus we
have one block (4K) for the event log and one block (4K) for the working
block. In total, 64K in the live half. The spare half is the same size,
giving 128K total for the firmware volume.
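
The size arithmetic above can be written down as a small C fragment. The
TOY_* constants and the two helper functions are hypothetical names that
only restate the numbers from this message; they are not taken from any
OVMF header.

```c
#include <assert.h>
#include <stddef.h>

/* Layout numbers as stated in the text (names are illustrative only). */
enum {
    TOY_BLOCK_SIZE       = 4 * 1024, /* 4K flash block */
    TOY_VARSTORE_BLOCKS  = 14,       /* 56K varstore */
    TOY_EVENT_LOG_BLOCKS = 1,        /* 4K event log */
    TOY_WORKING_BLOCKS   = 1         /* 4K working block */
};

/* Compile-time checks of the totals quoted in the text. */
static_assert(TOY_VARSTORE_BLOCKS * TOY_BLOCK_SIZE == 56 * 1024,
              "varstore should be 56K");
static_assert((TOY_VARSTORE_BLOCKS + TOY_EVENT_LOG_BLOCKS +
               TOY_WORKING_BLOCKS) * TOY_BLOCK_SIZE == 64 * 1024,
              "live half should be 64K");

/* Size of the live half: varstore + event log + working block. */
size_t toy_live_half_size(void)
{
    return (size_t)(TOY_VARSTORE_BLOCKS + TOY_EVENT_LOG_BLOCKS +
                    TOY_WORKING_BLOCKS) * TOY_BLOCK_SIZE;
}

/* The spare half mirrors the live half, so the volume is twice as large. */
size_t toy_volume_size(void)
{
    return 2 * toy_live_half_size();
}
```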

I'm also attaching the debug patch I wrote at that time for the FTW and
auth variable services, plus its output (which I annotated during
analysis) that helped me understand what was happening. Maybe you can
reuse something from them.

Laszlo

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2.1.2: debug.diff --]
[-- Type: text/x-patch; name="debug.diff", Size: 66793 bytes --]

diff --git a/MdeModulePkg/Universal/FaultTolerantWriteDxe/FaultTolerantWrite.c b/MdeModulePkg/Universal/FaultTolerantWriteDxe/FaultTolerantWrite.c
index 714b5d8..2918d58 100644
--- a/MdeModulePkg/Universal/FaultTolerantWriteDxe/FaultTolerantWrite.c
+++ b/MdeModulePkg/Universal/FaultTolerantWriteDxe/FaultTolerantWrite.c
@@ -39,7 +39,10 @@ FtwGetMaxBlockSize (
 {
   EFI_FTW_DEVICE  *FtwDevice;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   if (!FeaturePcdGet(PcdFullFtwServiceEnable)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1\n", __FUNCTION__));
     return EFI_UNSUPPORTED;
   }
 
@@ -47,6 +50,8 @@ FtwGetMaxBlockSize (
 
   *BlockSize  = FtwDevice->SpareAreaLength;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 2, BlockSize=0x%Lx\n", __FUNCTION__,
+    (UINT64) *BlockSize));
   return EFI_SUCCESS;
 }
 
@@ -86,10 +91,15 @@ FtwAllocate (
   EFI_FTW_DEVICE                  *FtwDevice;
   EFI_FAULT_TOLERANT_WRITE_HEADER *FtwHeader;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, CallerId=%g, PrivateDataSize=0x%Lx, "
+    "NumberOfWrites=0x%Lx\n", __FUNCTION__, CallerId, (UINT64)PrivateDataSize,
+    (UINT64)NumberOfWrites));
+
   FtwDevice = FTW_CONTEXT_FROM_THIS (This);
 
   Status    = WorkSpaceRefresh (FtwDevice);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
   //
@@ -97,6 +107,7 @@ FtwAllocate (
   //
   if (FTW_WRITE_TOTAL_SIZE (NumberOfWrites, PrivateDataSize) > FtwDevice->FtwWorkSpaceHeader->WriteQueueSize) {
     DEBUG ((EFI_D_ERROR, "Ftw: Allocate() request exceed Workspace, Caller: %g\n", CallerId));
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 2\n", __FUNCTION__));
     return EFI_BUFFER_TOO_SMALL;
   }
   //
@@ -109,6 +120,7 @@ FtwAllocate (
   // Previous write has not completed, access denied.
   //
   if ((FtwHeader->HeaderAllocated == FTW_VALID_STATE) || (FtwHeader->WritesAllocated == FTW_VALID_STATE)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 3\n", __FUNCTION__));
     return EFI_ACCESS_DENIED;
   }
   //
@@ -118,6 +130,7 @@ FtwAllocate (
   if (Offset + FTW_WRITE_TOTAL_SIZE (NumberOfWrites, PrivateDataSize) > FtwDevice->FtwWorkSpaceSize) {
     Status = FtwReclaimWorkSpace (FtwDevice, TRUE);
     if (EFI_ERROR (Status)) {
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 4: %r\n", __FUNCTION__, Status));
       return EFI_ABORTED;
     }
 
@@ -143,6 +156,7 @@ FtwAllocate (
                                     (UINT8 *) FtwHeader
                                     );
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 5: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
   //
@@ -155,6 +169,7 @@ FtwAllocate (
             WRITES_ALLOCATED
             );
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 6: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
 
@@ -165,6 +180,7 @@ FtwAllocate (
     NumberOfWrites)
     );
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 7\n", __FUNCTION__));
   return EFI_SUCCESS;
 }
 
@@ -195,6 +211,8 @@ FtwWriteRecord (
   UINTN                           Offset;
   EFI_LBA                         WorkSpaceLbaOffset;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   FtwDevice = FTW_CONTEXT_FROM_THIS (This);
 
   WorkSpaceLbaOffset = FtwDevice->FtwWorkSpaceLba - FtwDevice->FtwWorkBlockLba;
@@ -223,6 +241,7 @@ FtwWriteRecord (
               SPARE_COMPLETED
               );
     if (EFI_ERROR (Status)) {
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 1: %r\n", __FUNCTION__, Status));
       return EFI_ABORTED;
     }
 
@@ -240,6 +259,7 @@ FtwWriteRecord (
   }
 
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 2: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
   //
@@ -253,6 +273,7 @@ FtwWriteRecord (
             DEST_COMPLETED
             );
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 3: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
 
@@ -272,10 +293,12 @@ FtwWriteRecord (
               );
     Header->Complete = FTW_VALID_STATE;
     if (EFI_ERROR (Status)) {
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 4: %r\n", __FUNCTION__, Status));
       return EFI_ABORTED;
     }
   }
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 5\n", __FUNCTION__));
   return EFI_SUCCESS;
 }
 
@@ -331,10 +354,15 @@ FtwWrite (
   UINT8                               *Ptr;
   EFI_PHYSICAL_ADDRESS                FvbPhysicalAddress;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Lba=0x%Lx Offset=0x%Lx Length=0x%Lx "
+    "PrivateData=%p FvBlockHandle=%p Buffer=%p\n", __FUNCTION__, (UINT64)Lba,
+    (UINT64)Offset, (UINT64)Length, PrivateData, FvBlockHandle, Buffer));
+
   FtwDevice = FTW_CONTEXT_FROM_THIS (This);
 
   Status    = WorkSpaceRefresh (FtwDevice);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
 
@@ -349,6 +377,7 @@ FtwWrite (
       //
       Status = FtwAllocate (This, &gEfiCallerIdGuid, 0, 1);
       if (EFI_ERROR (Status)) {
+        DEBUG ((DEBUG_VERBOSE, "%a: exit 2: %r\n", __FUNCTION__, Status));
         return Status;
       }
     } else {
@@ -358,6 +387,7 @@ FtwWrite (
       //
       DEBUG ((EFI_D_ERROR, "Ftw: no allocates space for write record!\n"));
       DEBUG ((EFI_D_ERROR, "Ftw: Allocate service should be called before Write service!\n"));
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 3\n", __FUNCTION__));
       return EFI_NOT_READY;
     }
   }
@@ -366,6 +396,7 @@ FtwWrite (
   // If Record is out of the range of Header, return access denied.
   //
   if (((UINTN)((UINT8 *) Record - (UINT8 *) Header)) > FTW_WRITE_TOTAL_SIZE (Header->NumberOfWrites - 1, Header->PrivateDataSize)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 4\n", __FUNCTION__));
     return EFI_ACCESS_DENIED;
   }
 
@@ -373,20 +404,24 @@ FtwWrite (
   // Check the COMPLETE flag of last write header
   //
   if (Header->Complete == FTW_VALID_STATE) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 5\n", __FUNCTION__));
     return EFI_ACCESS_DENIED;
   }
 
   if (Record->DestinationComplete == FTW_VALID_STATE) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 6\n", __FUNCTION__));
     return EFI_ACCESS_DENIED;
   }
 
   if ((Record->SpareComplete == FTW_VALID_STATE) && (Record->DestinationComplete != FTW_VALID_STATE)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 7\n", __FUNCTION__));
     return EFI_NOT_READY;
   }
   //
   // Check if the input data can fit within the target block
   //
   if ((Offset + Length) > FtwDevice->SpareAreaLength) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 8\n", __FUNCTION__));
     return EFI_BAD_BUFFER_SIZE;
   }
   //
@@ -394,12 +429,14 @@ FtwWrite (
   //
   Status = FtwGetFvbByHandle (FvBlockHandle, &Fvb);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 9: %r\n", __FUNCTION__, Status));
     return EFI_NOT_FOUND;
   }
 
   Status = Fvb->GetPhysicalAddress (Fvb, &FvbPhysicalAddress);
   if (EFI_ERROR (Status)) {
     DEBUG ((EFI_D_ERROR, "FtwLite: Get FVB physical address - %r\n", Status));
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 10: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
 
@@ -431,6 +468,7 @@ FtwWrite (
                                     (UINT8 *) Record
                                     );
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 11: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
   //
@@ -442,6 +480,7 @@ FtwWrite (
   MyBufferSize  = FtwDevice->SpareAreaLength;
   MyBuffer      = AllocatePool (MyBufferSize);
   if (MyBuffer == NULL) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 12\n", __FUNCTION__));
     return EFI_OUT_OF_RESOURCES;
   }
   //
@@ -453,6 +492,7 @@ FtwWrite (
     Status    = Fvb->Read (Fvb, Lba + Index, 0, &MyLength, Ptr);
     if (EFI_ERROR (Status)) {
       FreePool (MyBuffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 13: %r\n", __FUNCTION__, Status));
       return EFI_ABORTED;
     }
 
@@ -472,6 +512,7 @@ FtwWrite (
   SpareBuffer     = AllocatePool (SpareBufferSize);
   if (SpareBuffer == NULL) {
     FreePool (MyBuffer);
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 14\n", __FUNCTION__));
     return EFI_OUT_OF_RESOURCES;
   }
 
@@ -488,6 +529,7 @@ FtwWrite (
     if (EFI_ERROR (Status)) {
       FreePool (MyBuffer);
       FreePool (SpareBuffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 15: %r\n", __FUNCTION__, Status));
       return EFI_ABORTED;
     }
 
@@ -510,6 +552,7 @@ FtwWrite (
     if (EFI_ERROR (Status)) {
       FreePool (MyBuffer);
       FreePool (SpareBuffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 16: %r\n", __FUNCTION__, Status));
       return EFI_ABORTED;
     }
 
@@ -532,6 +575,7 @@ FtwWrite (
             );
   if (EFI_ERROR (Status)) {
     FreePool (SpareBuffer);
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 17: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
 
@@ -544,6 +588,7 @@ FtwWrite (
   Status = FtwWriteRecord (This, Fvb);
   if (EFI_ERROR (Status)) {
     FreePool (SpareBuffer);
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 18: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
   //
@@ -562,6 +607,7 @@ FtwWrite (
                                         );
     if (EFI_ERROR (Status)) {
       FreePool (SpareBuffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 19: %r\n", __FUNCTION__, Status));
       return EFI_ABORTED;
     }
 
@@ -580,6 +626,7 @@ FtwWrite (
     Length)
     );
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 20\n", __FUNCTION__));
   return EFI_SUCCESS;
 }
 
@@ -610,10 +657,14 @@ FtwRestart (
   EFI_FAULT_TOLERANT_WRITE_RECORD     *Record;
   EFI_FIRMWARE_VOLUME_BLOCK_PROTOCOL  *Fvb;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, FvBlockHandle=%p\n", __FUNCTION__,
+    FvBlockHandle));
+
   FtwDevice = FTW_CONTEXT_FROM_THIS (This);
 
   Status    = WorkSpaceRefresh (FtwDevice);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
 
@@ -626,6 +677,7 @@ FtwRestart (
   //
   Status = FtwGetFvbByHandle (FvBlockHandle, &Fvb);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 2: %r\n", __FUNCTION__, Status));
     return EFI_NOT_FOUND;
   }
 
@@ -633,6 +685,7 @@ FtwRestart (
   // Check the COMPLETE flag of last write header
   //
   if (Header->Complete == FTW_VALID_STATE) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 3\n", __FUNCTION__));
     return EFI_ACCESS_DENIED;
   }
 
@@ -640,10 +693,12 @@ FtwRestart (
   // Check the flags of last write record
   //
   if (Record->DestinationComplete == FTW_VALID_STATE) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 4\n", __FUNCTION__));
     return EFI_ACCESS_DENIED;
   }
 
   if ((Record->SpareComplete != FTW_VALID_STATE)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 5\n", __FUNCTION__));
     return EFI_ABORTED;
   }
 
@@ -653,6 +708,7 @@ FtwRestart (
   //
   Status = FtwWriteRecord (This, Fvb);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 6: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
 
@@ -663,6 +719,7 @@ FtwRestart (
   FtwEraseSpareBlock (FtwDevice);
 
   DEBUG ((EFI_D_ERROR, "Ftw: Restart() success \n"));
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 7\n", __FUNCTION__));
   return EFI_SUCCESS;
 }
 
@@ -686,18 +743,23 @@ FtwAbort (
   UINTN           Offset;
   EFI_FTW_DEVICE  *FtwDevice;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   FtwDevice = FTW_CONTEXT_FROM_THIS (This);
 
   Status    = WorkSpaceRefresh (FtwDevice);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
 
   if (FtwDevice->FtwLastWriteHeader->HeaderAllocated != FTW_VALID_STATE) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 2\n", __FUNCTION__));
     return EFI_NOT_FOUND;
   }
 
   if (FtwDevice->FtwLastWriteHeader->Complete == FTW_VALID_STATE) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 3\n", __FUNCTION__));
     return EFI_NOT_FOUND;
   }
   //
@@ -711,12 +773,14 @@ FtwAbort (
             WRITES_COMPLETED
             );
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 4: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
 
   FtwDevice->FtwLastWriteHeader->Complete = FTW_VALID_STATE;
 
   DEBUG ((EFI_D_ERROR, "Ftw: Abort() success \n"));
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 5\n", __FUNCTION__));
   return EFI_SUCCESS;
 }
 
@@ -761,7 +825,11 @@ FtwGetLastWrite (
   EFI_FAULT_TOLERANT_WRITE_HEADER *Header;
   EFI_FAULT_TOLERANT_WRITE_RECORD *Record;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, PrivateDataSize=0x%Lx\n", __FUNCTION__,
+    (UINT64)*PrivateDataSize));
+
   if (!FeaturePcdGet(PcdFullFtwServiceEnable)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1\n", __FUNCTION__));
     return EFI_UNSUPPORTED;
   }
 
@@ -769,6 +837,7 @@ FtwGetLastWrite (
 
   Status    = WorkSpaceRefresh (FtwDevice);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 2: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
 
@@ -786,6 +855,7 @@ FtwGetLastWrite (
 
     Status    = FtwAbort (This);
     *Complete = TRUE;
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 3\n", __FUNCTION__));
     return EFI_NOT_FOUND;
   }
   //
@@ -793,6 +863,7 @@ FtwGetLastWrite (
   //
   if (Header->HeaderAllocated != FTW_VALID_STATE) {
     *Complete = TRUE;
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 4\n", __FUNCTION__));
     return EFI_NOT_FOUND;
   }
   //
@@ -803,6 +874,7 @@ FtwGetLastWrite (
     if (EFI_ERROR (Status)) {
       FtwAbort (This);
       *Complete = TRUE;
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 5: %r\n", __FUNCTION__, Status));
       return EFI_NOT_FOUND;
     }
     ASSERT (Record != NULL);
@@ -829,6 +901,11 @@ FtwGetLastWrite (
 
   DEBUG ((EFI_D_ERROR, "Ftw: GetLasetWrite() success\n"));
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 6, CallerId=%g Lba=0x%Lx Offset=0x%Lx "
+    "Length=0x%Lx PrivateDataSize=0x%Lx PrivateData=%p Complete=%d\n",
+    __FUNCTION__, CallerId, (UINT64)*Lba, (UINT64)*Offset, (UINT64)*Length,
+    (UINT64)*PrivateDataSize, PrivateData, *Complete));
+
   return Status;
 }
 
diff --git a/MdeModulePkg/Universal/FaultTolerantWriteDxe/FaultTolerantWriteDxe.c b/MdeModulePkg/Universal/FaultTolerantWriteDxe/FaultTolerantWriteDxe.c
index 1235bd8..caece56 100644
--- a/MdeModulePkg/Universal/FaultTolerantWriteDxe/FaultTolerantWriteDxe.c
+++ b/MdeModulePkg/Universal/FaultTolerantWriteDxe/FaultTolerantWriteDxe.c
@@ -73,14 +73,19 @@ FtwGetFvbByHandle (
   OUT EFI_FIRMWARE_VOLUME_BLOCK_PROTOCOL  **FvBlock
   )
 {
+  EFI_STATUS Status;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
   //
   // To get the FVB protocol interface on the handle
   //
-  return gBS->HandleProtocol (
+  Status = gBS->HandleProtocol (
                 FvBlockHandle,
                 &gEfiFirmwareVolumeBlockProtocolGuid,
                 (VOID **) FvBlock
                 );
+  DEBUG ((DEBUG_VERBOSE, "%a: exit: %r\n", __FUNCTION__, Status));
+  return Status;
 }
 
 /**
@@ -100,6 +105,7 @@ FtwGetSarProtocol (
 {
   EFI_STATUS                              Status;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
   //
   // Locate Swap Address Range protocol
   //
@@ -108,6 +114,7 @@ FtwGetSarProtocol (
                   NULL, 
                   SarProtocol
                   );
+  DEBUG ((DEBUG_VERBOSE, "%a: exit: %r\n", __FUNCTION__, Status));
   return Status;
 }
 
@@ -134,6 +141,7 @@ GetFvbCountAndBuffer (
 {
   EFI_STATUS                              Status;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
   //
   // Locate all handles of Fvb protocol
   //
@@ -144,6 +152,7 @@ GetFvbCountAndBuffer (
                   NumberHandles,
                   Buffer
                   );
+  DEBUG ((DEBUG_VERBOSE, "%a: exit: %r\n", __FUNCTION__, Status));
   return Status;
 }
 
@@ -166,6 +175,9 @@ FvbNotificationEvent (
   EFI_FAULT_TOLERANT_WRITE_PROTOCOL       *FtwProtocol;
   EFI_FTW_DEVICE                          *FtwDevice;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Event=%p Context=%p\n", __FUNCTION__,
+    Event, Context));
+
   //
   // Just return to avoid installing FaultTolerantWriteProtocol again
   // if Fault Tolerant Write protocol has been installed.
@@ -176,6 +188,7 @@ FvbNotificationEvent (
                   (VOID **) &FtwProtocol
                   );
   if (!EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1: %r\n", __FUNCTION__, Status));
     return ;
   }
 
@@ -185,6 +198,7 @@ FvbNotificationEvent (
   FtwDevice = (EFI_FTW_DEVICE *)Context;
   Status = InitFtwProtocol (FtwDevice);
   if (EFI_ERROR(Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 2: %r\n", __FUNCTION__, Status));
     return ;
   }                          
     
@@ -202,6 +216,7 @@ FvbNotificationEvent (
   Status = gBS->CloseEvent (Event);
   ASSERT_EFI_ERROR (Status);
   
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 3: %r\n", __FUNCTION__, Status));
   return;
 }
 
@@ -227,11 +242,14 @@ FaultTolerantWriteInitialize (
   EFI_STATUS                              Status;
   EFI_FTW_DEVICE                          *FtwDevice;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   //
   // Allocate private data structure for FTW protocol and do some initialization
   //
   Status = InitFtwDevice (&FtwDevice);
   if (EFI_ERROR(Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1: %r\n", __FUNCTION__, Status));
     return Status;
   }
 
@@ -246,5 +264,6 @@ FaultTolerantWriteInitialize (
     &mFvbRegistration
     );
   
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 2\n", __FUNCTION__));
   return EFI_SUCCESS;
 }
diff --git a/MdeModulePkg/Universal/FaultTolerantWriteDxe/FtwMisc.c b/MdeModulePkg/Universal/FaultTolerantWriteDxe/FtwMisc.c
index b3352bb..436de4c 100644
--- a/MdeModulePkg/Universal/FaultTolerantWriteDxe/FtwMisc.c
+++ b/MdeModulePkg/Universal/FaultTolerantWriteDxe/FtwMisc.c
@@ -35,6 +35,9 @@ IsErasedFlashBuffer (
   UINT8   *Ptr;
   UINTN   Index;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Buffer=%p BufferSize=0x%Lx\n",
+    __FUNCTION__, Buffer, (UINT64)BufferSize));
+
   Ptr     = Buffer;
   IsEmpty = TRUE;
   for (Index = 0; Index < BufferSize; Index += 1) {
@@ -44,6 +47,7 @@ IsErasedFlashBuffer (
     }
   }
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit, IsEmpty=%d\n", __FUNCTION__, IsEmpty));
   return IsEmpty;
 }
 
@@ -66,12 +70,17 @@ FtwEraseBlock (
   EFI_LBA                             Lba
   )
 {
-  return FvBlock->EraseBlocks (
+  EFI_STATUS Status;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Lba=0x%Lx\n", __FUNCTION__, (UINT64)Lba));
+  Status = FvBlock->EraseBlocks (
                     FvBlock,
                     Lba,
                     FtwDevice->NumberOfSpareBlock,
                     EFI_LBA_LIST_TERMINATOR
                     );
+  DEBUG ((DEBUG_VERBOSE, "%a: exit: %r\n", __FUNCTION__, Status));
+  return Status;
 }
 
 /**
@@ -96,12 +105,17 @@ FtwEraseSpareBlock (
   IN EFI_FTW_DEVICE   *FtwDevice
   )
 {
-  return FtwDevice->FtwBackupFvb->EraseBlocks (
+  EFI_STATUS Status;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+  Status = FtwDevice->FtwBackupFvb->EraseBlocks (
                                     FtwDevice->FtwBackupFvb,
                                     FtwDevice->FtwSpareLba,
                                     FtwDevice->NumberOfSpareBlock,
                                     EFI_LBA_LIST_TERMINATOR
                                     );
+  DEBUG ((DEBUG_VERBOSE, "%a: exit: %r\n", __FUNCTION__, Status));
+  return Status;
 }
 
 /**
@@ -122,17 +136,22 @@ IsWorkingBlock (
   EFI_LBA                             Lba
   )
 {
+  BOOLEAN Ret;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Lba=0x%Lx\n", __FUNCTION__, (UINT64)Lba));
   //
   // If matching the following condition, the target block is in working block.
   // 1. Target block is on the FV of working block (Using the same FVB protocol instance).
   // 2. Lba falls into the range of working block.
   //
-  return (BOOLEAN)
+  Ret = (BOOLEAN)
     (
       (FvBlock == FtwDevice->FtwFvBlock) &&
       (Lba >= FtwDevice->FtwWorkBlockLba) &&
       (Lba <= FtwDevice->FtwWorkSpaceLba)
     );
+  DEBUG ((DEBUG_VERBOSE, "%a: exit: %d\n", __FUNCTION__, Ret));
+  return Ret;
 }
 
 /**
@@ -162,6 +181,9 @@ GetFvbByAddress (
   EFI_FIRMWARE_VOLUME_HEADER          *FwVolHeader;
   EFI_HANDLE                          FvbHandle;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Address=0x%Lx\n", __FUNCTION__,
+    (UINT64)Address));
+
   *FvBlock  = NULL;
   FvbHandle = NULL;
   //
@@ -169,6 +191,7 @@ GetFvbByAddress (
   //
   Status = GetFvbCountAndBuffer (&HandleCount, &HandleBuffer);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1: %r\n", __FUNCTION__, Status));
     return NULL;
   }
   //
@@ -196,6 +219,7 @@ GetFvbByAddress (
   }
 
   FreePool (HandleBuffer);
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 2\n", __FUNCTION__));
   return FvbHandle;
 }
 
@@ -227,12 +251,16 @@ IsBootBlock (
   BOOLEAN                             IsSwapped;
   EFI_HANDLE                          FvbHandle;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Lba=0x%Lx\n", __FUNCTION__, (UINT64)Lba));
+
   if (!FeaturePcdGet(PcdFullFtwServiceEnable)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1\n", __FUNCTION__));
     return FALSE;
   }
 
   Status = FtwGetSarProtocol ((VOID **) &SarProtocol);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 2: %r\n", __FUNCTION__, Status));
     return FALSE;
   }
   //
@@ -246,11 +274,13 @@ IsBootBlock (
                           &BackupBlockSize
                           );
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 3: %r\n", __FUNCTION__, Status));
     return FALSE;
   }
 
   Status = SarProtocol->GetSwapState (SarProtocol, &IsSwapped);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 4: %r\n", __FUNCTION__, Status));
     return FALSE;
   }
   //
@@ -263,11 +293,14 @@ IsBootBlock (
   }
 
   if (FvbHandle == NULL) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 5\n", __FUNCTION__));
     return FALSE;
   }
   //
   // Compare the Fvb
   //
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 6: %d\n", __FUNCTION__,
+    FvBlock == BootFvb));
   return (BOOLEAN) (FvBlock == BootFvb);
 }
 
@@ -313,7 +346,10 @@ FlushSpareBlockToBootBlock (
   EFI_FIRMWARE_VOLUME_BLOCK_PROTOCOL  *BootFvb;
   EFI_LBA                             BootLba;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   if (!FeaturePcdGet(PcdFullFtwServiceEnable)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1\n", __FUNCTION__));
     return EFI_UNSUPPORTED;
   }
 
@@ -322,6 +358,7 @@ FlushSpareBlockToBootBlock (
   //
   Status = FtwGetSarProtocol ((VOID **) &SarProtocol);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 2: %r\n", __FUNCTION__, Status));
     return Status;
   }
   //
@@ -330,6 +367,7 @@ FlushSpareBlockToBootBlock (
   Length = FtwDevice->SpareAreaLength;
   Buffer  = AllocatePool (Length);
   if (Buffer == NULL) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 3\n", __FUNCTION__));
     return EFI_OUT_OF_RESOURCES;
   }
   //
@@ -339,6 +377,7 @@ FlushSpareBlockToBootBlock (
   if (EFI_ERROR (Status)) {
     DEBUG ((EFI_D_ERROR, "Ftw: Get Top Swapped status - %r\n", Status));
     FreePool (Buffer);
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 4: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
 
@@ -348,6 +387,7 @@ FlushSpareBlockToBootBlock (
     //
     if (GetFvbByAddress (FtwDevice->SpareAreaAddress + FtwDevice->SpareAreaLength, &BootFvb) == NULL) {
       FreePool (Buffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 5\n", __FUNCTION__));
       return EFI_ABORTED;
     }
     //
@@ -366,6 +406,7 @@ FlushSpareBlockToBootBlock (
                           );
       if (EFI_ERROR (Status)) {
         FreePool (Buffer);
+        DEBUG ((DEBUG_VERBOSE, "%a: exit 6: %r\n", __FUNCTION__, Status));
         return Status;
       }
 
@@ -387,6 +428,7 @@ FlushSpareBlockToBootBlock (
                                           );
       if (EFI_ERROR (Status)) {
         FreePool (Buffer);
+        DEBUG ((DEBUG_VERBOSE, "%a: exit 7: %r\n", __FUNCTION__, Status));
         return Status;
       }
 
@@ -398,6 +440,7 @@ FlushSpareBlockToBootBlock (
     Status = SarProtocol->SetSwapState (SarProtocol, TRUE);
     if (EFI_ERROR (Status)) {
       FreePool (Buffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 8: %r\n", __FUNCTION__, Status));
       return Status;
     }
   }
@@ -408,6 +451,7 @@ FlushSpareBlockToBootBlock (
   Status = FtwEraseSpareBlock (FtwDevice);
   if (EFI_ERROR (Status)) {
     FreePool (Buffer);
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 9: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
   //
@@ -426,6 +470,7 @@ FlushSpareBlockToBootBlock (
     if (EFI_ERROR (Status)) {
       DEBUG ((EFI_D_ERROR, "Ftw: FVB Write boot block - %r\n", Status));
       FreePool (Buffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 10: %r\n", __FUNCTION__, Status));
       return Status;
     }
 
@@ -439,6 +484,7 @@ FlushSpareBlockToBootBlock (
   //
   Status = SarProtocol->SetSwapState (SarProtocol, FALSE);
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 11: %r\n", __FUNCTION__, Status));
   return Status;
 }
 
@@ -472,7 +518,10 @@ FlushSpareBlockToTargetBlock (
   UINT8       *Ptr;
   UINTN       Index;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Lba=0x%Lx\n", __FUNCTION__, (UINT64)Lba));
+
   if ((FtwDevice == NULL) || (FvBlock == NULL)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1\n", __FUNCTION__));
     return EFI_INVALID_PARAMETER;
   }
   //
@@ -481,6 +530,7 @@ FlushSpareBlockToTargetBlock (
   Length = FtwDevice->SpareAreaLength;
   Buffer  = AllocatePool (Length);
   if (Buffer == NULL) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 2\n", __FUNCTION__));
     return EFI_OUT_OF_RESOURCES;
   }
   //
@@ -498,6 +548,7 @@ FlushSpareBlockToTargetBlock (
                                         );
     if (EFI_ERROR (Status)) {
       FreePool (Buffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 3: %r\n", __FUNCTION__, Status));
       return Status;
     }
 
@@ -509,6 +560,7 @@ FlushSpareBlockToTargetBlock (
   Status = FtwEraseBlock (FtwDevice, FvBlock, Lba);
   if (EFI_ERROR (Status)) {
     FreePool (Buffer);
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 4: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
   //
@@ -521,6 +573,7 @@ FlushSpareBlockToTargetBlock (
     if (EFI_ERROR (Status)) {
       DEBUG ((EFI_D_ERROR, "Ftw: FVB Write block - %r\n", Status));
       FreePool (Buffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 5: %r\n", __FUNCTION__, Status));
       return Status;
     }
 
@@ -529,6 +582,7 @@ FlushSpareBlockToTargetBlock (
 
   FreePool (Buffer);
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 6: %r\n", __FUNCTION__, Status));
   return Status;
 }
 
@@ -564,12 +618,17 @@ FlushSpareBlockToWorkingBlock (
   UINTN                                   Index;
   EFI_LBA                                 WorkSpaceLbaOffset;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+  Status = EFI_SUCCESS;
+
   //
   // Allocate a memory buffer
   //
   Length = FtwDevice->SpareAreaLength;
   Buffer  = AllocatePool (Length);
   if (Buffer == NULL) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, EFI_OUT_OF_RESOURCES));
     return EFI_OUT_OF_RESOURCES;
   }
 
@@ -603,6 +662,8 @@ FlushSpareBlockToWorkingBlock (
                                         );
     if (EFI_ERROR (Status)) {
       FreePool (Buffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, Status));
       return Status;
     }
 
@@ -634,6 +695,8 @@ FlushSpareBlockToWorkingBlock (
             );
   if (EFI_ERROR (Status)) {
     FreePool (Buffer);
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, Status));
     return EFI_ABORTED;
   }
 
@@ -645,6 +708,8 @@ FlushSpareBlockToWorkingBlock (
   Status = FtwEraseBlock (FtwDevice, FtwDevice->FtwFvBlock, FtwDevice->FtwWorkBlockLba);
   if (EFI_ERROR (Status)) {
     FreePool (Buffer);
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, Status));
     return EFI_ABORTED;
   }
   //
@@ -663,6 +728,8 @@ FlushSpareBlockToWorkingBlock (
     if (EFI_ERROR (Status)) {
       DEBUG ((EFI_D_ERROR, "Ftw: FVB Write block - %r\n", Status));
       FreePool (Buffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, Status));
       return Status;
     }
 
@@ -686,12 +753,16 @@ FlushSpareBlockToWorkingBlock (
             WORKING_BLOCK_VALID
             );
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, Status));
     return EFI_ABORTED;
   }
 
   FtwDevice->FtwWorkSpaceHeader->WorkingBlockInvalid = FTW_INVALID_STATE;
   FtwDevice->FtwWorkSpaceHeader->WorkingBlockValid = FTW_VALID_STATE;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return EFI_SUCCESS;
 }
 
@@ -725,12 +796,17 @@ FtwUpdateFvState (
   UINT8       State;
   UINTN       Length;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Lba=0x%Lx Offset=0x%Lx NewBit=%d\n",
+    __FUNCTION__, (UINT64)Lba, (UINT64)Offset, NewBit));
+
   //
   // Read state from device, assume State is only one byte.
   //
   Length  = sizeof (UINT8);
   Status  = FvBlock->Read (FvBlock, Lba, Offset, &Length, &State);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, Status));
     return EFI_ABORTED;
   }
 
@@ -744,6 +820,8 @@ FtwUpdateFvState (
   Length  = sizeof (UINT8);
   Status  = FvBlock->Write (FvBlock, Lba, Offset, &Length, &State);
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return Status;
 }
 
@@ -770,6 +848,11 @@ FtwGetLastWriteHeader (
 {
   UINTN                           Offset;
   EFI_FAULT_TOLERANT_WRITE_HEADER *FtwHeader;
+  EFI_STATUS                      Status;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, FtwWorkSpaceSize=0x%Lx\n", __FUNCTION__,
+    (UINT64)FtwWorkSpaceSize));
+  Status = EFI_SUCCESS;
 
   *FtwWriteHeader = NULL;
   FtwHeader       = (EFI_FAULT_TOLERANT_WRITE_HEADER *) (FtwWorkSpaceHeader + 1);
@@ -782,6 +865,8 @@ FtwGetLastWriteHeader (
     //
     if (Offset >= FtwWorkSpaceSize) {
       *FtwWriteHeader = FtwHeader;
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, EFI_ABORTED));
       return EFI_ABORTED;
     }
 
@@ -792,6 +877,8 @@ FtwGetLastWriteHeader (
   //
   *FtwWriteHeader = FtwHeader;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return EFI_SUCCESS;
 }
 
@@ -816,6 +903,10 @@ FtwGetLastWriteRecord (
 {
   UINTN                           Index;
   EFI_FAULT_TOLERANT_WRITE_RECORD *FtwRecord;
+  EFI_STATUS                      Status;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+  Status = EFI_SUCCESS;
 
   *FtwWriteRecord = NULL;
   FtwRecord       = (EFI_FAULT_TOLERANT_WRITE_RECORD *) (FtwWriteHeader + 1);
@@ -829,6 +920,8 @@ FtwGetLastWriteRecord (
       // The last write record is found
       //
       *FtwWriteRecord = FtwRecord;
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, Status));
       return EFI_SUCCESS;
     }
 
@@ -846,9 +939,13 @@ FtwGetLastWriteRecord (
   //
   if (Index == FtwWriteHeader->NumberOfWrites) {
     *FtwWriteRecord = (EFI_FAULT_TOLERANT_WRITE_RECORD *) ((UINTN) FtwRecord - FTW_RECORD_SIZE (FtwWriteHeader->PrivateDataSize));
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, Status));
     return EFI_SUCCESS;
   }
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, EFI_ABORTED));
   return EFI_ABORTED;
 }
 
@@ -899,10 +996,13 @@ IsLastRecordOfWrites (
   UINT8 *Head;
   UINT8 *Ptr;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   Head  = (UINT8 *) FtwHeader;
   Ptr   = (UINT8 *) FtwRecord;
 
   Head += FTW_WRITE_TOTAL_SIZE (FtwHeader->NumberOfWrites - 1, FtwHeader->PrivateDataSize);
+  DEBUG ((DEBUG_VERBOSE, "%a: exit: %d\n", __FUNCTION__, Head == Ptr));
   return (BOOLEAN) (Head == Ptr);
 }
 
@@ -923,15 +1023,22 @@ GetPreviousRecordOfWrites (
   )
 {
   UINT8 *Ptr;
+  EFI_STATUS Status = EFI_SUCCESS;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
 
   if (IsFirstRecordOfWrites (FtwHeader, *FtwRecord)) {
     *FtwRecord = NULL;
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, EFI_ACCESS_DENIED));
     return EFI_ACCESS_DENIED;
   }
 
   Ptr = (UINT8 *) (*FtwRecord);
   Ptr -= FTW_RECORD_SIZE (FtwHeader->PrivateDataSize);
   *FtwRecord = (EFI_FAULT_TOLERANT_WRITE_RECORD *) Ptr;
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return EFI_SUCCESS;
 }
 
@@ -951,13 +1058,18 @@ InitFtwDevice (
   )
 {
   EFI_FTW_DEVICE                   *FtwDevice;
-  
+  EFI_STATUS Status = EFI_SUCCESS;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   //
   // Allocate private data of this driver,
   // Including the FtwWorkSpace[FTW_WORK_SPACE_SIZE].
   //
   FtwDevice = AllocateZeroPool (sizeof (EFI_FTW_DEVICE) + PcdGet32 (PcdFlashNvStorageFtwWorkingSize));
   if (FtwDevice == NULL) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, EFI_OUT_OF_RESOURCES));
     return EFI_OUT_OF_RESOURCES;
   }
 
@@ -969,6 +1081,8 @@ InitFtwDevice (
   if ((FtwDevice->WorkSpaceLength == 0) || (FtwDevice->SpareAreaLength == 0)) {
     DEBUG ((EFI_D_ERROR, "Ftw: Workspace or Spare block does not exist!\n"));
     FreePool (FtwDevice);
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, EFI_INVALID_PARAMETER));
     return EFI_INVALID_PARAMETER;
   }
 
@@ -989,6 +1103,8 @@ InitFtwDevice (
   }  
 
   *FtwData = FtwDevice;
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return EFI_SUCCESS;
 }
 
@@ -1019,11 +1135,15 @@ FindFvbForFtw (
   EFI_FV_BLOCK_MAP_ENTRY              *FvbMapEntry;
   UINT32                              LbaIndex;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   //
   // Get all FVB handle.
   //
   Status = GetFvbCountAndBuffer (&HandleCount, &HandleBuffer);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, Status));
     return EFI_NOT_FOUND;
   }
 
@@ -1135,9 +1255,13 @@ FindFvbForFtw (
 
   if ((FtwDevice->FtwBackupFvb == NULL) || (FtwDevice->FtwFvBlock == NULL) ||
     (FtwDevice->FtwWorkSpaceLba == (EFI_LBA) (-1)) || (FtwDevice->FtwSpareLba == (EFI_LBA) (-1))) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, Status));
     return EFI_ABORTED;
   }
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return EFI_SUCCESS;
 }
 
@@ -1164,11 +1288,15 @@ InitFtwProtocol (
   EFI_HANDLE                          FvbHandle;
   EFI_LBA                             WorkSpaceLbaOffset;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   //
   // Find the right SMM Fvb protocol instance for FTW.
   //
   Status = FindFvbForFtw (FtwDevice);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, Status));
     return EFI_NOT_FOUND;
   }  
   
@@ -1317,6 +1445,8 @@ InitFtwProtocol (
   FtwDevice->FtwInstance.Abort           = FtwAbort;
   FtwDevice->FtwInstance.GetLastWrite    = FtwGetLastWrite;
     
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return EFI_SUCCESS;
 }
 
diff --git a/MdeModulePkg/Universal/FaultTolerantWriteDxe/UpdateWorkingBlock.c b/MdeModulePkg/Universal/FaultTolerantWriteDxe/UpdateWorkingBlock.c
index a5fa12b..85a3a2d 100644
--- a/MdeModulePkg/Universal/FaultTolerantWriteDxe/UpdateWorkingBlock.c
+++ b/MdeModulePkg/Universal/FaultTolerantWriteDxe/UpdateWorkingBlock.c
@@ -31,6 +31,7 @@ InitializeLocalWorkSpaceHeader (
 {
   EFI_STATUS                              Status;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
   //
   // Check signature with gEdkiiWorkingBlockSignatureGuid.
   //
@@ -38,6 +39,7 @@ InitializeLocalWorkSpaceHeader (
     //
     // The local work space header has been initialized.
     //
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1\n", __FUNCTION__));
     return;
   }
 
@@ -73,6 +75,8 @@ InitializeLocalWorkSpaceHeader (
 
   mWorkingBlockHeader.WorkingBlockValid    = FTW_VALID_STATE;
   mWorkingBlockHeader.WorkingBlockInvalid  = FTW_INVALID_STATE;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 2\n", __FUNCTION__));
 }
 
 /**
@@ -90,15 +94,20 @@ IsValidWorkSpace (
   IN EFI_FAULT_TOLERANT_WORKING_BLOCK_HEADER *WorkingHeader
   )
 {
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   if (WorkingHeader == NULL) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1\n", __FUNCTION__));
     return FALSE;
   }
 
   if (CompareMem (WorkingHeader, &mWorkingBlockHeader, sizeof (EFI_FAULT_TOLERANT_WORKING_BLOCK_HEADER)) == 0) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 2\n", __FUNCTION__));
     return TRUE;
   }
 
   DEBUG ((EFI_D_ERROR, "Ftw: Work block header check error\n"));
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 3\n", __FUNCTION__));
   return FALSE;
 }
 
@@ -116,12 +125,15 @@ InitWorkSpaceHeader (
   IN EFI_FAULT_TOLERANT_WORKING_BLOCK_HEADER *WorkingHeader
   )
 {
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
   if (WorkingHeader == NULL) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1\n", __FUNCTION__));
     return EFI_INVALID_PARAMETER;
   }
 
   CopyMem (WorkingHeader, &mWorkingBlockHeader, sizeof (EFI_FAULT_TOLERANT_WORKING_BLOCK_HEADER));
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 2\n", __FUNCTION__));
   return EFI_SUCCESS;
 }
 
@@ -143,6 +155,7 @@ WorkSpaceRefresh (
   UINTN                           Length;
   UINTN                           RemainingSpaceSize;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
   //
   // Initialize WorkSpace as FTW_ERASED_BYTE
   //
@@ -164,6 +177,7 @@ WorkSpaceRefresh (
                                     FtwDevice->FtwWorkSpace
                                     );
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
   //
@@ -189,6 +203,7 @@ WorkSpaceRefresh (
     Status = FtwReclaimWorkSpace (FtwDevice, TRUE);
     if (EFI_ERROR (Status)) {
       DEBUG ((EFI_D_ERROR, "Ftw: Reclaim workspace - %r\n", Status));
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 2: %r\n", __FUNCTION__, Status));
       return EFI_ABORTED;
     }
     //
@@ -203,6 +218,7 @@ WorkSpaceRefresh (
                                       FtwDevice->FtwWorkSpace
                                       );
     if (EFI_ERROR (Status)) {
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 3: %r\n", __FUNCTION__, Status));
       return EFI_ABORTED;
     }
 
@@ -212,6 +228,7 @@ WorkSpaceRefresh (
               &FtwDevice->FtwLastWriteHeader
               );
     if (EFI_ERROR (Status)) {
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 4: %r\n", __FUNCTION__, Status));
       return EFI_ABORTED;
     }
   }
@@ -223,9 +240,11 @@ WorkSpaceRefresh (
             &FtwDevice->FtwLastWriteRecord
             );
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 5: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 6\n", __FUNCTION__));
   return EFI_SUCCESS;
 }
 
@@ -258,6 +277,8 @@ FtwReclaimWorkSpace (
   UINT8                                   *Ptr;
   EFI_LBA                                 WorkSpaceLbaOffset;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, PreserveRecord=%d\n", __FUNCTION__,
+    PreserveRecord));
   DEBUG ((EFI_D_INFO, "Ftw: start to reclaim work space\n"));
 
   WorkSpaceLbaOffset = FtwDevice->FtwWorkSpaceLba - FtwDevice->FtwWorkBlockLba;
@@ -268,6 +289,7 @@ FtwReclaimWorkSpace (
   TempBufferSize = FtwDevice->SpareAreaLength;
   TempBuffer     = AllocateZeroPool (TempBufferSize);
   if (TempBuffer == NULL) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1\n", __FUNCTION__));
     return EFI_OUT_OF_RESOURCES;
   }
 
@@ -283,6 +305,7 @@ FtwReclaimWorkSpace (
                                           );
     if (EFI_ERROR (Status)) {
       FreePool (TempBuffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 2: %r\n", __FUNCTION__, Status));
       return EFI_ABORTED;
     }
 
@@ -361,6 +384,7 @@ FtwReclaimWorkSpace (
   SpareBuffer     = AllocatePool (SpareBufferSize);
   if (SpareBuffer == NULL) {
     FreePool (TempBuffer);
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 3\n", __FUNCTION__));
     return EFI_OUT_OF_RESOURCES;
   }
 
@@ -377,6 +401,7 @@ FtwReclaimWorkSpace (
     if (EFI_ERROR (Status)) {
       FreePool (TempBuffer);
       FreePool (SpareBuffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 4: %r\n", __FUNCTION__, Status));
       return EFI_ABORTED;
     }
 
@@ -399,6 +424,7 @@ FtwReclaimWorkSpace (
     if (EFI_ERROR (Status)) {
       FreePool (TempBuffer);
       FreePool (SpareBuffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 5: %r\n", __FUNCTION__, Status));
       return EFI_ABORTED;
     }
 
@@ -420,6 +446,7 @@ FtwReclaimWorkSpace (
             );
   if (EFI_ERROR (Status)) {
     FreePool (SpareBuffer);
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 6: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
   //
@@ -436,6 +463,7 @@ FtwReclaimWorkSpace (
             );
   if (EFI_ERROR (Status)) {
     FreePool (SpareBuffer);
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 7: %r\n", __FUNCTION__, Status));
     return EFI_ABORTED;
   }
 
@@ -447,6 +475,7 @@ FtwReclaimWorkSpace (
   Status = FlushSpareBlockToWorkingBlock (FtwDevice);
   if (EFI_ERROR (Status)) {
     FreePool (SpareBuffer);
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 8: %r\n", __FUNCTION__, Status));
     return Status;
   }
   //
@@ -465,6 +494,7 @@ FtwReclaimWorkSpace (
                                         );
     if (EFI_ERROR (Status)) {
       FreePool (SpareBuffer);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit 9: %r\n", __FUNCTION__, Status));
       return EFI_ABORTED;
     }
 
@@ -475,5 +505,6 @@ FtwReclaimWorkSpace (
 
   DEBUG ((EFI_D_INFO, "Ftw: reclaim work space successfully\n"));
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 10\n", __FUNCTION__));
   return EFI_SUCCESS;
 }
diff --git a/OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FvbInfo.c b/OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FvbInfo.c
index 72845f9..e9f4254 100644
--- a/OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FvbInfo.c
+++ b/OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FvbInfo.c
@@ -112,6 +112,10 @@ GetFvbInfo (
 {
   STATIC BOOLEAN Checksummed = FALSE;
   UINTN Index;
+  EFI_STATUS Status = EFI_SUCCESS;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, FvLength=0x%Lx\n", __FUNCTION__,
+    FvLength));
 
   if (!Checksummed) {
     for (Index = 0; Index < sizeof (mPlatformFvbMediaInfo) / sizeof (EFI_FVB_MEDIA_INFO); Index += 1) {
@@ -129,9 +133,13 @@ GetFvbInfo (
   for (Index = 0; Index < sizeof (mPlatformFvbMediaInfo) / sizeof (EFI_FVB_MEDIA_INFO); Index += 1) {
     if (mPlatformFvbMediaInfo[Index].FvLength == FvLength) {
       *FvbInfo = &mPlatformFvbMediaInfo[Index].FvbInfo;
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, Status));
       return EFI_SUCCESS;
     }
   }
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, EFI_NOT_FOUND));
   return EFI_NOT_FOUND;
 }
diff --git a/OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FwBlockService.c b/OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FwBlockService.c
index 7d26c41..56d1b73 100644
--- a/OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FwBlockService.c
+++ b/OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FwBlockService.c
@@ -147,6 +147,8 @@ Returns:
   EFI_FW_VOL_INSTANCE *FwhInstance;
   UINTN               Index;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   EfiConvertPointer (0x0, (VOID **) &mFvbModuleGlobal->FvInstance[FVB_VIRTUAL]);
 
   //
@@ -167,6 +169,8 @@ Returns:
   EfiConvertPointer (0x0, (VOID **) &mFvbModuleGlobal->FvbScratchSpace[FVB_VIRTUAL]);
   EfiConvertPointer (0x0, (VOID **) &mFvbModuleGlobal);
   QemuFlashConvertPointers ();
+
+  DEBUG ((DEBUG_VERBOSE, "%a: exit\n", __FUNCTION__));
 }
 
 EFI_STATUS
@@ -196,8 +200,14 @@ Returns:
 --*/
 {
   EFI_FW_VOL_INSTANCE *FwhRecord;
+  EFI_STATUS Status = EFI_SUCCESS;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Instance=0x%Lx, Global=%p, Virtual=%d\n",
+    __FUNCTION__, (UINT64)Instance, Global, Virtual));
 
   if (Instance >= Global->NumFv) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, EFI_INVALID_PARAMETER));
     return EFI_INVALID_PARAMETER;
   }
   //
@@ -215,6 +225,8 @@ Returns:
 
   *FwhInstance = FwhRecord;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return EFI_SUCCESS;
 }
 
@@ -249,6 +261,9 @@ Returns:
   EFI_FW_VOL_INSTANCE *FwhInstance;
   EFI_STATUS          Status;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Instance=0x%Lx, Global=%p, Virtual=%d\n",
+    __FUNCTION__, (UINT64)Instance, Global, Virtual));
+
   //
   // Find the right instance of the FVB private data
   //
@@ -256,6 +271,8 @@ Returns:
   ASSERT_EFI_ERROR (Status);
   *Address = FwhInstance->FvBase[Virtual];
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r, Address=0x%Lx\n",
+    __FUNCTION__, __LINE__, Status, (UINT64)*Address));
   return EFI_SUCCESS;
 }
 
@@ -289,6 +306,9 @@ Returns:
   EFI_FW_VOL_INSTANCE *FwhInstance;
   EFI_STATUS          Status;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Instance=0x%Lx, Global=%p, Virtual=%d\n",
+    __FUNCTION__, (UINT64)Instance, Global, Virtual));
+
   //
   // Find the right instance of the FVB private data
   //
@@ -296,6 +316,8 @@ Returns:
   ASSERT_EFI_ERROR (Status);
   *Attributes = FwhInstance->VolumeHeader.Attributes;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r, Attributes=0x%Lx\n",
+    __FUNCTION__, __LINE__, Status, (UINT64)*Attributes));
   return EFI_SUCCESS;
 }
 
@@ -343,6 +365,10 @@ Returns:
   EFI_FV_BLOCK_MAP_ENTRY  *BlockMap;
   EFI_STATUS              Status;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Instance=0x%Lx, Global=%p, Virtual=%d, "
+    "Lba=0x%Lx\n", __FUNCTION__, (UINT64)Instance, Global, Virtual,
+    (UINT64)Lba));
+
   //
   // Find the right instance of the FVB private data
   //
@@ -361,6 +387,8 @@ Returns:
     BlockLength = BlockMap->Length;
 
     if (NumBlocks == 0 || BlockLength == 0) {
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, EFI_INVALID_PARAMETER));
       return EFI_INVALID_PARAMETER;
     }
 
@@ -373,16 +401,24 @@ Returns:
       Offset = Offset + (UINTN) MultU64x32 ((Lba - StartLba), BlockLength);
       if (LbaAddress != NULL) {
         *LbaAddress = FwhInstance->FvBase[Virtual] + Offset;
+        DEBUG ((DEBUG_VERBOSE, "%a: LbaAddress=0x%Lx\n", __FUNCTION__,
+          (UINT64)*LbaAddress));
       }
 
       if (LbaLength != NULL) {
         *LbaLength = BlockLength;
+        DEBUG ((DEBUG_VERBOSE, "%a: LbaLength=0x%Lx\n", __FUNCTION__,
+          (UINT64)*LbaLength));
       }
 
       if (NumOfBlocks != NULL) {
         *NumOfBlocks = (UINTN) (NextLba - Lba);
+        DEBUG ((DEBUG_VERBOSE, "%a: NumOfBlocks=0x%Lx\n", __FUNCTION__,
+          (UINT64)*NumOfBlocks));
       }
 
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, Status));
       return EFI_SUCCESS;
     }
 
@@ -434,6 +470,10 @@ Returns:
   EFI_STATUS            Status;
   EFI_FVB_ATTRIBUTES_2  UnchangedAttributes;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Instance=0x%Lx, Global=%p, Virtual=%d, "
+    "Attributes=0x%Lx\n", __FUNCTION__, (UINT64)Instance, Global, Virtual,
+    (UINT64)*Attributes));
+
   //
   // Find the right instance of the FVB private data
   //
@@ -467,6 +507,8 @@ Returns:
   // Some attributes of FV is read only can *not* be set
   //
   if ((OldAttributes & UnchangedAttributes) ^ (*Attributes & UnchangedAttributes)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, EFI_INVALID_PARAMETER));
     return EFI_INVALID_PARAMETER;
   }
   //
@@ -474,6 +516,8 @@ Returns:
   //
   if (OldAttributes & EFI_FVB2_LOCK_STATUS) {
     if (OldStatus ^ NewStatus) {
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, EFI_ACCESS_DENIED));
       return EFI_ACCESS_DENIED;
     }
   }
@@ -482,6 +526,8 @@ Returns:
   //
   if ((Capabilities & EFI_FVB2_READ_DISABLED_CAP) == 0) {
     if ((NewStatus & EFI_FVB2_READ_STATUS) == 0) {
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, EFI_INVALID_PARAMETER));
       return EFI_INVALID_PARAMETER;
     }
   }
@@ -490,6 +536,8 @@ Returns:
   //
   if ((Capabilities & EFI_FVB2_READ_ENABLED_CAP) == 0) {
     if (NewStatus & EFI_FVB2_READ_STATUS) {
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, EFI_INVALID_PARAMETER));
       return EFI_INVALID_PARAMETER;
     }
   }
@@ -498,6 +546,8 @@ Returns:
   //
   if ((Capabilities & EFI_FVB2_WRITE_DISABLED_CAP) == 0) {
     if ((NewStatus & EFI_FVB2_WRITE_STATUS) == 0) {
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, EFI_INVALID_PARAMETER));
       return EFI_INVALID_PARAMETER;
     }
   }
@@ -506,6 +556,8 @@ Returns:
   //
   if ((Capabilities & EFI_FVB2_WRITE_ENABLED_CAP) == 0) {
     if (NewStatus & EFI_FVB2_WRITE_STATUS) {
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, EFI_INVALID_PARAMETER));
       return EFI_INVALID_PARAMETER;
     }
   }
@@ -514,6 +566,8 @@ Returns:
   //
   if ((Capabilities & EFI_FVB2_LOCK_CAP) == 0) {
     if (NewStatus & EFI_FVB2_LOCK_STATUS) {
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, EFI_INVALID_PARAMETER));
       return EFI_INVALID_PARAMETER;
     }
   }
@@ -522,6 +576,8 @@ Returns:
   *AttribPtr  = (*AttribPtr) | NewStatus;
   *Attributes = *AttribPtr;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r, Attributes=0x%Lx\n",
+    __FUNCTION__, __LINE__, Status, (UINT64)*Attributes));
   return EFI_SUCCESS;
 }
 
@@ -553,10 +609,16 @@ Returns:
 --*/
 {
   EFI_FW_VOL_BLOCK_DEVICE *FvbDevice;
+  EFI_STATUS Status;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
 
   FvbDevice = FVB_DEVICE_FROM_THIS (This);
 
-  return FvbGetPhysicalAddress (FvbDevice->Instance, Address, mFvbModuleGlobal, EfiGoneVirtual ());
+  Status = FvbGetPhysicalAddress (FvbDevice->Instance, Address, mFvbModuleGlobal, EfiGoneVirtual ());
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
+  return Status;
 }
 
 EFI_STATUS
@@ -589,10 +651,13 @@ Returns:
 --*/
 {
   EFI_FW_VOL_BLOCK_DEVICE *FvbDevice;
+  EFI_STATUS Status;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Lba=0x%Lx\n", __FUNCTION__, (UINT64)Lba));
 
   FvbDevice = FVB_DEVICE_FROM_THIS (This);
 
-  return FvbGetLbaAddress (
+  Status = FvbGetLbaAddress (
           FvbDevice->Instance,
           Lba,
           NULL,
@@ -601,6 +666,10 @@ Returns:
           mFvbModuleGlobal,
           EfiGoneVirtual ()
           );
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r BlockSize=0x%Lx "
+    "NumOfBlocks=0x%Lx\n", __FUNCTION__, __LINE__, Status, (UINT64)*BlockSize,
+    (UINT64)*NumOfBlocks));
+  return Status;
 }
 
 EFI_STATUS
@@ -624,10 +693,16 @@ Returns:
 --*/
 {
   EFI_FW_VOL_BLOCK_DEVICE *FvbDevice;
+  EFI_STATUS Status;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
 
   FvbDevice = FVB_DEVICE_FROM_THIS (This);
 
-  return FvbGetVolumeAttributes (FvbDevice->Instance, Attributes, mFvbModuleGlobal, EfiGoneVirtual ());
+  Status = FvbGetVolumeAttributes (FvbDevice->Instance, Attributes, mFvbModuleGlobal, EfiGoneVirtual ());
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
+  return Status;
 }
 
 EFI_STATUS
@@ -651,10 +726,16 @@ Returns:
 --*/
 {
   EFI_FW_VOL_BLOCK_DEVICE *FvbDevice;
+  EFI_STATUS Status;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
 
   FvbDevice = FVB_DEVICE_FROM_THIS (This);
 
-  return FvbSetVolumeAttributes (FvbDevice->Instance, Attributes, mFvbModuleGlobal, EfiGoneVirtual ());
+  Status = FvbSetVolumeAttributes (FvbDevice->Instance, Attributes, mFvbModuleGlobal, EfiGoneVirtual ());
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r, Attributes=0x%Lx\n",
+    __FUNCTION__, __LINE__, Status, (UINT64)*Attributes));
+  return Status;
 }
 
 EFI_STATUS
@@ -696,6 +777,8 @@ Returns:
   UINTN                   NumOfLba;
   EFI_STATUS              Status;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   FvbDevice = FVB_DEVICE_FROM_THIS (This);
 
   Status    = GetFvbInstance (FvbDevice->Instance, mFvbModuleGlobal, &FwhInstance, EfiGoneVirtual ());
@@ -718,6 +801,8 @@ Returns:
     //
     if ((NumOfLba == 0) || ((StartingLba + NumOfLba) > NumOfBlocks)) {
       VA_END (args);
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, EFI_INVALID_PARAMETER));
       return EFI_INVALID_PARAMETER;
     }
   } while (1);
@@ -737,6 +822,8 @@ Returns:
       Status = QemuFlashEraseBlock (StartingLba);
       if (EFI_ERROR (Status)) {
         VA_END (args);
+        DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+          __FUNCTION__, __LINE__, Status));
         return Status;
       }
 
@@ -748,6 +835,8 @@ Returns:
 
   VA_END (args);
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return EFI_SUCCESS;
 }
 
@@ -791,7 +880,16 @@ Returns:
 
 --*/
 {
-  return QemuFlashWrite ((EFI_LBA)Lba, (UINTN)Offset, NumBytes, (UINT8 *)Buffer);
+  EFI_STATUS Status;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Lba=0x%Lx Offset=0x%Lx NumBytes=0x%Lx "
+    "Buffer=%p\n", __FUNCTION__, (UINT64)Lba, (UINT64)Offset,
+    (UINT64)*NumBytes, Buffer));
+
+  Status = QemuFlashWrite ((EFI_LBA)Lba, (UINTN)Offset, NumBytes, (UINT8 *)Buffer);
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r, NumBytes=0x%Lx\n",
+    __FUNCTION__, __LINE__, Status, (UINT64)*NumBytes));
+  return Status;
 }
 
 EFI_STATUS
@@ -835,7 +933,16 @@ Returns:
 
 --*/
 {
-  return QemuFlashRead ((EFI_LBA)Lba, (UINTN)Offset, NumBytes, (UINT8 *)Buffer);
+  EFI_STATUS Status;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Lba=0x%Lx Offset=0x%Lx NumBytes=0x%Lx "
+    "Buffer=%p\n", __FUNCTION__, (UINT64)Lba, (UINT64)Offset,
+    (UINT64)*NumBytes, Buffer));
+
+  Status = QemuFlashRead ((EFI_LBA)Lba, (UINTN)Offset, NumBytes, (UINT8 *)Buffer);
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r, NumBytes=0x%Lx\n",
+    __FUNCTION__, __LINE__, Status, (UINT64)*NumBytes));
+  return Status;
 }
 
 EFI_STATUS
@@ -857,6 +964,9 @@ Returns:
 --*/
 {
   UINT16 Checksum;
+  EFI_STATUS Status = EFI_SUCCESS;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
 
   //
   // Verify the header revision, header signature, length
@@ -868,6 +978,8 @@ Returns:
       (FwVolHeader->FvLength == ((UINTN) -1)) ||
       ((FwVolHeader->HeaderLength & 0x01) != 0)
       ) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, EFI_NOT_FOUND));
     return EFI_NOT_FOUND;
   }
   
@@ -883,9 +995,13 @@ Returns:
 
     DEBUG ((EFI_D_INFO, "FV@%p Checksum is 0x%x, expected 0x%x\n",
             FwVolHeader, FwVolHeader->Checksum, Expected));
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, EFI_NOT_FOUND));
     return EFI_NOT_FOUND;
   }
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return EFI_SUCCESS;
 }
 
@@ -898,6 +1014,9 @@ MarkMemoryRangeForRuntimeAccess (
 {
   EFI_STATUS                          Status;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, BaseAddress=0x%Lx Length=0x%Lx\n",
+    __FUNCTION__, (UINT64)BaseAddress, (UINT64)Length));
+
   //
   // Mark flash region as runtime memory
   //
@@ -922,6 +1041,8 @@ MarkMemoryRangeForRuntimeAccess (
                   );
   ASSERT_EFI_ERROR (Status);
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return Status;
 }
 
@@ -938,6 +1059,8 @@ InitializeVariableFvHeader (
   UINTN                               WriteLength;
   UINTN                               BlockSize;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   FwVolHeader =
     (EFI_FIRMWARE_VOLUME_HEADER *) (UINTN)
       PcdGet32 (PcdOvmfFlashNvStorageVariableBase);
@@ -994,6 +1117,8 @@ InitializeVariableFvHeader (
     ASSERT (WriteLength == GoodFwVolHeader->HeaderLength);
   }
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return Status;
 }
 
@@ -1028,11 +1153,15 @@ Returns:
   UINTN                               NumOfBlocks;
   EFI_EVENT                           VirtualAddressChangeEvent;
 
-  if (EFI_ERROR (QemuFlashInitialize ())) {
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
+  if (EFI_ERROR ((Status = QemuFlashInitialize ()))) {
     //
     // Return an error so image will be unloaded
     //
     DEBUG ((EFI_D_INFO, "QEMU flash was not detected. Writable FVB is not being installed.\n"));
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, Status));
     return EFI_WRITE_PROTECTED;
   }
 
@@ -1049,6 +1178,8 @@ Returns:
   Status = InitializeVariableFvHeader ();
   if (EFI_ERROR (Status)) {
     DEBUG ((EFI_D_INFO, "QEMU Flash: Unable to initialize variable FV header\n"));
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, Status));
     return EFI_WRITE_PROTECTED;
   }
 
@@ -1061,6 +1192,8 @@ Returns:
     Status = GetFvbInfo (Length, &FwVolHeader);
     if (EFI_ERROR (Status)) {
       DEBUG ((EFI_D_INFO, "EFI_ERROR (GetFvbInfo (Length, &FwVolHeader))\n"));
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, Status));
       return EFI_WRITE_PROTECTED;
     }
   }
@@ -1223,5 +1356,7 @@ Returns:
   ASSERT_EFI_ERROR (Status);
 
   PcdSetBool (PcdOvmfFlashVariablesEnable, TRUE);
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return EFI_SUCCESS;
 }
diff --git a/OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlash.c b/OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlash.c
index a3fe7d8..32e07e3 100644
--- a/OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlash.c
+++ b/OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlash.c
@@ -43,7 +43,9 @@ QemuFlashConvertPointers (
   VOID
   )
 {
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
   EfiConvertPointer (0x0, (VOID **) &mFlashBase);
+  DEBUG ((DEBUG_VERBOSE, "%a: exit\n", __FUNCTION__));
 }
 
 
@@ -54,7 +56,13 @@ QemuFlashPtr (
   IN        UINTN                               Offset
   )
 {
-  return mFlashBase + (Lba * mFdBlockSize) + Offset;
+  UINT8 *Ret;
+
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Lba=0x%Lx Offset=0x%Lx\n", __FUNCTION__,
+    (UINT64)Lba, (UINT64)Offset));
+  Ret = mFlashBase + (Lba * mFdBlockSize) + Offset;
+  DEBUG ((DEBUG_VERBOSE, "%a: exit, Ret=%p\n", __FUNCTION__, Ret));
+  return Ret;
 }
 
 
@@ -78,6 +86,8 @@ QemuFlashDetected (
   UINT8 OriginalUint8;
   UINT8 ProbeUint8;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   FlashDetected = FALSE;
   Ptr = QemuFlashPtr (0, 0);
 
@@ -93,6 +103,7 @@ QemuFlashDetected (
 
   if (Offset >= mFdBlockSize) {
     DEBUG ((EFI_D_INFO, "QEMU Flash: Failed to find probe location\n"));
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1\n", __FUNCTION__));
     return FALSE;
   }
 
@@ -122,6 +133,7 @@ QemuFlashDetected (
 
   DEBUG ((EFI_D_INFO, "QemuFlashDetected => %a\n",
                       FlashDetected ? "Yes" : "No"));
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 2\n", __FUNCTION__));
   return FlashDetected;
 }
 
@@ -146,11 +158,16 @@ QemuFlashRead (
 {
   UINT8  *Ptr;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Lba=0x%Lx Offset=0x%Lx NumBytes=0x%Lx "
+    "Buffer=%p\n", __FUNCTION__, (UINT64)Lba, (UINT64)Offset,
+    (UINT64)*NumBytes, Buffer));
+
   //
   // Only write to the first 64k. We don't bother saving the FTW Spare
   // block into the flash memory.
   //
   if (Lba >= mFdBlockCount) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1\n", __FUNCTION__));
     return EFI_INVALID_PARAMETER;
   }
 
@@ -161,6 +178,7 @@ QemuFlashRead (
 
   CopyMem (Buffer, Ptr, *NumBytes);
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 2\n", __FUNCTION__));
   return EFI_SUCCESS;
 }
 
@@ -186,11 +204,16 @@ QemuFlashWrite (
   volatile UINT8  *Ptr;
   UINTN           Loop;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Lba=0x%Lx Offset=0x%Lx NumBytes=0x%Lx "
+    "Buffer=%p\n", __FUNCTION__, (UINT64)Lba, (UINT64)Offset,
+    (UINT64)*NumBytes, Buffer));
+
   //
   // Only write to the first 64k. We don't bother saving the FTW Spare
   // block into the flash memory.
   //
   if (Lba >= mFdBlockCount) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1\n", __FUNCTION__));
     return EFI_INVALID_PARAMETER;
   }
 
@@ -211,6 +234,7 @@ QemuFlashWrite (
     *Ptr = READ_ARRAY_CMD;
   }
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 2\n", __FUNCTION__));
   return EFI_SUCCESS;
 }
 
@@ -228,13 +252,17 @@ QemuFlashEraseBlock (
 {
   volatile UINT8  *Ptr;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, Lba=0x%Lx\n", __FUNCTION__, (UINT64)Lba));
+
   if (Lba >= mFdBlockCount) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1\n", __FUNCTION__));
     return EFI_INVALID_PARAMETER;
   }
 
   Ptr = QemuFlashPtr (Lba, 0);
   *Ptr = BLOCK_ERASE_CMD;
   *Ptr = BLOCK_ERASE_CONFIRM_CMD;
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 2\n", __FUNCTION__));
   return EFI_SUCCESS;
 }
 
@@ -251,15 +279,19 @@ QemuFlashInitialize (
   VOID
   )
 {
+  DEBUG ((DEBUG_VERBOSE, "%a: enter\n", __FUNCTION__));
+
   mFlashBase = (UINT8*)(UINTN) PcdGet32 (PcdOvmfFdBaseAddress);
   mFdBlockSize = PcdGet32 (PcdOvmfFirmwareBlockSize);
   ASSERT(PcdGet32 (PcdOvmfFirmwareFdSize) % mFdBlockSize == 0);
   mFdBlockCount = PcdGet32 (PcdOvmfFirmwareFdSize) / mFdBlockSize;
 
   if (!QemuFlashDetected ()) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit 1\n", __FUNCTION__));
     return EFI_WRITE_PROTECTED;
   }
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit 2\n", __FUNCTION__));
   return EFI_SUCCESS;
 }
 
diff --git a/SecurityPkg/VariableAuthenticated/RuntimeDxe/Reclaim.c b/SecurityPkg/VariableAuthenticated/RuntimeDxe/Reclaim.c
index b20facd..fc85b18 100644
--- a/SecurityPkg/VariableAuthenticated/RuntimeDxe/Reclaim.c
+++ b/SecurityPkg/VariableAuthenticated/RuntimeDxe/Reclaim.c
@@ -118,11 +118,14 @@ FtwVariableSpace (
   UINTN                              FtwBufferSize;
   EFI_FAULT_TOLERANT_WRITE_PROTOCOL  *FtwProtocol;
 
+  DEBUG ((DEBUG_VERBOSE, "%a:%a: enter\n", __FUNCTION__, __FILE__));
   //
   // Locate fault tolerant write protocol.
   //
   Status = GetFtwProtocol((VOID **) &FtwProtocol);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, Status));
     return EFI_NOT_FOUND;
   }
   //
@@ -130,6 +133,8 @@ FtwVariableSpace (
   //
   Status = GetFvbInfoByAddress (VariableBase, &FvbHandle, NULL);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, Status));
     return Status;
   }
   //
@@ -137,6 +142,8 @@ FtwVariableSpace (
   //
   Status = GetLbaAndOffsetByAddress (VariableBase, &VarLba, &VarOffset);
   if (EFI_ERROR (Status)) {
+    DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+      __FUNCTION__, __LINE__, Status));
     return EFI_ABORTED;
   }
 
@@ -156,5 +163,7 @@ FtwVariableSpace (
                           (VOID *) VariableBuffer // write buffer
                           );
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return Status;
 }
diff --git a/SecurityPkg/VariableAuthenticated/RuntimeDxe/Variable.c b/SecurityPkg/VariableAuthenticated/RuntimeDxe/Variable.c
index 28d026a..369e7bb 100644
--- a/SecurityPkg/VariableAuthenticated/RuntimeDxe/Variable.c
+++ b/SecurityPkg/VariableAuthenticated/RuntimeDxe/Variable.c
@@ -776,6 +776,12 @@ Reclaim (
   VARIABLE_HEADER       *UpdatingVariable;
   VARIABLE_HEADER       *UpdatingInDeletedTransition;
 
+  DEBUG ((DEBUG_VERBOSE, "%a: enter, VariableBase=0x%Lx, IsVolatile=%d, "
+    "NewVariableSize=0x%Lx, ReclaimPubKeyStore=%d\n", __FUNCTION__,
+    (UINT64)VariableBase, IsVolatile, (UINT64)NewVariableSize,
+    ReclaimPubKeyStore));
+  Status = EFI_SUCCESS;
+
   UpdatingVariable = NULL;
   UpdatingInDeletedTransition = NULL;
   if (UpdatingPtrTrack != NULL) {
@@ -826,6 +832,8 @@ Reclaim (
     MaximumBufferSize += 1;
     ValidBuffer = AllocatePool (MaximumBufferSize);
     if (ValidBuffer == NULL) {
+      DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+        __FUNCTION__, __LINE__, EFI_OUT_OF_RESOURCES));
       return EFI_OUT_OF_RESOURCES;
     }
   } else {
@@ -1067,6 +1075,8 @@ Done:
     }
   }
 
+  DEBUG ((DEBUG_VERBOSE, "%a: exit @ %d, Status=%r\n",
+    __FUNCTION__, __LINE__, Status));
   return Status;
 }
 

[-- Attachment #2.1.3: direct.txt --]
[-- Type: text/plain, Size: 58143 bytes --]

Reclaim: enter, VariableBase=0xFFE00048, IsVolatile=0, NewVariableSize=0x0, ReclaimPubKeyStore=0
  FtwVariableSpace:/home/lacos/src/upstream/edk2-git-svn/SecurityPkg/VariableAuthenticated/RuntimeDxe/Reclaim.c: enter
    FvbProtocolGetAttributes: enter
      FvbGetVolumeAttributes: enter, Instance=0x0, Global=9F7DAF18, Virtual=0
        GetFvbInstance: enter, Instance=0x0, Global=9F7DAF18, Virtual=0
        GetFvbInstance: exit @ 229, Status=Success
      FvbGetVolumeAttributes: exit @ 320, Status=Success, Attributes=0x4FEFF
    FvbProtocolGetAttributes: exit @ 704, Status=Success

    FvbProtocolGetPhysicalAddress: enter
      FvbGetPhysicalAddress: enter, Instance=0x0, Global=9F7DAF18, Virtual=0
        GetFvbInstance: enter, Instance=0x0, Global=9F7DAF18, Virtual=0
        GetFvbInstance: exit @ 229, Status=Success
      FvbGetPhysicalAddress: exit @ 275, Status=Success, Address=0xFFE00000
    FvbProtocolGetPhysicalAddress: exit @ 620, Status=Success

    FvbProtocolGetAttributes: enter
      FvbGetVolumeAttributes: enter, Instance=0x0, Global=9F7DAF18, Virtual=0
        GetFvbInstance: enter, Instance=0x0, Global=9F7DAF18, Virtual=0
        GetFvbInstance: exit @ 229, Status=Success
      FvbGetVolumeAttributes: exit @ 320, Status=Success, Attributes=0x4FEFF
    FvbProtocolGetAttributes: exit @ 704, Status=Success

    FvbProtocolGetPhysicalAddress: enter
      FvbGetPhysicalAddress: enter, Instance=0x0, Global=9F7DAF18, Virtual=0
        GetFvbInstance: enter, Instance=0x0, Global=9F7DAF18, Virtual=0
        GetFvbInstance: exit @ 229, Status=Success
      FvbGetPhysicalAddress: exit @ 275, Status=Success, Address=0xFFE00000
    FvbProtocolGetPhysicalAddress: exit @ 620, Status=Success

    FvbProtocolGetPhysicalAddress: enter
      FvbGetPhysicalAddress: enter, Instance=0x0, Global=9F7DAF18, Virtual=0
        GetFvbInstance: enter, Instance=0x0, Global=9F7DAF18, Virtual=0
        GetFvbInstance: exit @ 229, Status=Success
      FvbGetPhysicalAddress: exit @ 275, Status=Success, Address=0xFFE00000
    FvbProtocolGetPhysicalAddress: exit @ 620, Status=Success

    FtwWrite: enter, Lba=0x0 Offset=0x48 Length=0xDFB8 PrivateData=0 FvBlockHandle=9F58E798 Buffer=9F7BE060
      WorkSpaceRefresh: enter
        FvbProtocolRead: enter, Lba=0xF Offset=0x0 NumBytes=0x1000 Buffer=9EBEF0E0
          QemuFlashRead: enter, Lba=0xF Offset=0x0 NumBytes=0x1000 Buffer=9EBEF0E0
            QemuFlashPtr: enter, Lba=0xF Offset=0x0
            QemuFlashPtr: exit, Ret=FFE0F000
          QemuFlashRead: exit 2
        FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000
        FtwGetLastWriteHeader: enter, FtwWorkSpaceSize=0x1000
        FtwGetLastWriteHeader: exit @ 881, Status=Success
        Ftw: Remaining work space size - 590
        FtwGetLastWriteRecord: enter
        FtwGetLastWriteRecord: exit @ 924, Status=Success
      WorkSpaceRefresh: exit 6

      IsErasedFlashBuffer: enter, Buffer=9EBEFB50 BufferSize=0x28
      IsErasedFlashBuffer: exit, IsEmpty=1

      FtwAllocate: enter, CallerId=FE5CEA76-4F72-49E8-986F-2CD899DFFE5D, PrivateDataSize=0x0, NumberOfWrites=0x1
        WorkSpaceRefresh: enter
          FvbProtocolRead: enter, Lba=0xF Offset=0x0 NumBytes=0x1000 Buffer=9EBEF0E0
            QemuFlashRead: enter, Lba=0xF Offset=0x0 NumBytes=0x1000 Buffer=9EBEF0E0
              QemuFlashPtr: enter, Lba=0xF Offset=0x0
              QemuFlashPtr: exit, Ret=FFE0F000
            QemuFlashRead: exit 2
          FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000
          FtwGetLastWriteHeader: enter, FtwWorkSpaceSize=0x1000
          FtwGetLastWriteHeader: exit @ 881, Status=Success
          Ftw: Remaining work space size - 590
          FtwGetLastWriteRecord: enter
          FtwGetLastWriteRecord: exit @ 924, Status=Success
        WorkSpaceRefresh: exit 6
        FvbProtocolWrite: enter, Lba=0xF Offset=0xA70 NumBytes=0x28 Buffer=9EBEFB50
          QemuFlashWrite: enter, Lba=0xF Offset=0xA70 NumBytes=0x28 Buffer=9EBEFB50
            QemuFlashPtr: enter, Lba=0xF Offset=0xA70
            QemuFlashPtr: exit, Ret=FFE0FA70
          QemuFlashWrite: exit 2
        FvbProtocolWrite: exit @ 891, Status=Success, NumBytes=0x28
        FtwUpdateFvState: enter, Lba=0xF Offset=0xA70 NewBit=2
          FvbProtocolRead: enter, Lba=0xF Offset=0xA70 NumBytes=0x1 Buffer=9F852647
            QemuFlashRead: enter, Lba=0xF Offset=0xA70 NumBytes=0x1 Buffer=9F852647
              QemuFlashPtr: enter, Lba=0xF Offset=0xA70
              QemuFlashPtr: exit, Ret=FFE0FA70
            QemuFlashRead: exit 2
          FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1
          FvbProtocolWrite: enter, Lba=0xF Offset=0xA70 NumBytes=0x1 Buffer=9F852647
            QemuFlashWrite: enter, Lba=0xF Offset=0xA70 NumBytes=0x1 Buffer=9F852647
              QemuFlashPtr: enter, Lba=0xF Offset=0xA70
              QemuFlashPtr: exit, Ret=FFE0FA70
            QemuFlashWrite: exit 2
          FvbProtocolWrite: exit @ 891, Status=Success, NumBytes=0x1
        FtwUpdateFvState: exit @ 824, Status=Success
        Ftw: Allocate() success, Caller:FE5CEA76-4F72-49E8-986F-2CD899DFFE5D, # 1
      FtwAllocate: exit 7

      FtwGetFvbByHandle: enter
      FtwGetFvbByHandle: exit: Success

      FvbProtocolGetPhysicalAddress: enter
        FvbGetPhysicalAddress: enter, Instance=0x0, Global=9F7DAF18, Virtual=0
          GetFvbInstance: enter, Instance=0x0, Global=9F7DAF18, Virtual=0
          GetFvbInstance: exit @ 229, Status=Success
        FvbGetPhysicalAddress: exit @ 275, Status=Success, Address=0xFFE00000
      FvbProtocolGetPhysicalAddress: exit @ 620, Status=Success

      IsBootBlock: enter, Lba=0x0
        FtwGetSarProtocol: enter
        FtwGetSarProtocol: exit: Not Found
      IsBootBlock: exit 2: Not Found

      FvbProtocolWrite: enter, Lba=0xF Offset=0xA98 NumBytes=0x28 Buffer=9EBEFB78
        QemuFlashWrite: enter, Lba=0xF Offset=0xA98 NumBytes=0x28 Buffer=9EBEFB78
          QemuFlashPtr: enter, Lba=0xF Offset=0xA98
          QemuFlashPtr: exit, Ret=FFE0FA98
        QemuFlashWrite: exit 2
      FvbProtocolWrite: exit @ 891, Status=Success, NumBytes=0x28

      //
      // Read all original data from target block to memory buffer
      //
      FvbProtocolRead: enter, Lba=0x0 Offset=0x0 NumBytes=0x1000 Buffer=9C806018
        QemuFlashRead: enter, Lba=0x0 Offset=0x0 NumBytes=0x1000 Buffer=9C806018
          QemuFlashPtr: enter, Lba=0x0 Offset=0x0
          QemuFlashPtr: exit, Ret=FFE00000
        QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x1 Offset=0x0 NumBytes=0x1000 Buffer=9C807018
      QemuFlashRead: enter, Lba=0x1 Offset=0x0 NumBytes=0x1000 Buffer=9C807018
      QemuFlashPtr: enter, Lba=0x1 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE01000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x2 Offset=0x0 NumBytes=0x1000 Buffer=9C808018
      QemuFlashRead: enter, Lba=0x2 Offset=0x0 NumBytes=0x1000 Buffer=9C808018
      QemuFlashPtr: enter, Lba=0x2 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE02000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x3 Offset=0x0 NumBytes=0x1000 Buffer=9C809018
      QemuFlashRead: enter, Lba=0x3 Offset=0x0 NumBytes=0x1000 Buffer=9C809018
      QemuFlashPtr: enter, Lba=0x3 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE03000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x4 Offset=0x0 NumBytes=0x1000 Buffer=9C80A018
      QemuFlashRead: enter, Lba=0x4 Offset=0x0 NumBytes=0x1000 Buffer=9C80A018
      QemuFlashPtr: enter, Lba=0x4 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE04000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x5 Offset=0x0 NumBytes=0x1000 Buffer=9C80B018
      QemuFlashRead: enter, Lba=0x5 Offset=0x0 NumBytes=0x1000 Buffer=9C80B018
      QemuFlashPtr: enter, Lba=0x5 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE05000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x6 Offset=0x0 NumBytes=0x1000 Buffer=9C80C018
      QemuFlashRead: enter, Lba=0x6 Offset=0x0 NumBytes=0x1000 Buffer=9C80C018
      QemuFlashPtr: enter, Lba=0x6 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE06000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x7 Offset=0x0 NumBytes=0x1000 Buffer=9C80D018
      QemuFlashRead: enter, Lba=0x7 Offset=0x0 NumBytes=0x1000 Buffer=9C80D018
      QemuFlashPtr: enter, Lba=0x7 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE07000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x8 Offset=0x0 NumBytes=0x1000 Buffer=9C80E018
      QemuFlashRead: enter, Lba=0x8 Offset=0x0 NumBytes=0x1000 Buffer=9C80E018
      QemuFlashPtr: enter, Lba=0x8 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE08000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x9 Offset=0x0 NumBytes=0x1000 Buffer=9C80F018
      QemuFlashRead: enter, Lba=0x9 Offset=0x0 NumBytes=0x1000 Buffer=9C80F018
      QemuFlashPtr: enter, Lba=0x9 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE09000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0xA Offset=0x0 NumBytes=0x1000 Buffer=9C810018
      QemuFlashRead: enter, Lba=0xA Offset=0x0 NumBytes=0x1000 Buffer=9C810018
      QemuFlashPtr: enter, Lba=0xA Offset=0x0
      QemuFlashPtr: exit, Ret=FFE0A000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0xB Offset=0x0 NumBytes=0x1000 Buffer=9C811018
      QemuFlashRead: enter, Lba=0xB Offset=0x0 NumBytes=0x1000 Buffer=9C811018
      QemuFlashPtr: enter, Lba=0xB Offset=0x0
      QemuFlashPtr: exit, Ret=FFE0B000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0xC Offset=0x0 NumBytes=0x1000 Buffer=9C812018
      QemuFlashRead: enter, Lba=0xC Offset=0x0 NumBytes=0x1000 Buffer=9C812018
      QemuFlashPtr: enter, Lba=0xC Offset=0x0
      QemuFlashPtr: exit, Ret=FFE0C000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0xD Offset=0x0 NumBytes=0x1000 Buffer=9C813018
      QemuFlashRead: enter, Lba=0xD Offset=0x0 NumBytes=0x1000 Buffer=9C813018
      QemuFlashPtr: enter, Lba=0xD Offset=0x0
      QemuFlashPtr: exit, Ret=FFE0D000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0xE Offset=0x0 NumBytes=0x1000 Buffer=9C814018
      QemuFlashRead: enter, Lba=0xE Offset=0x0 NumBytes=0x1000 Buffer=9C814018
      QemuFlashPtr: enter, Lba=0xE Offset=0x0
      QemuFlashPtr: exit, Ret=FFE0E000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0xF Offset=0x0 NumBytes=0x1000 Buffer=9C815018
      QemuFlashRead: enter, Lba=0xF Offset=0x0 NumBytes=0x1000 Buffer=9C815018
      QemuFlashPtr: enter, Lba=0xF Offset=0x0
      QemuFlashPtr: exit, Ret=FFE0F000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000


      //
      // Try to keep the content of spare block
      // Save spare block into a spare backup memory buffer (Sparebuffer)
      //
      FvbProtocolRead: enter, Lba=0x10 Offset=0x0 NumBytes=0x1000 Buffer=9C7F5018
      QemuFlashRead: enter, Lba=0x10 Offset=0x0 NumBytes=0x1000 Buffer=9C7F5018
      QemuFlashPtr: enter, Lba=0x10 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE10000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x11 Offset=0x0 NumBytes=0x1000 Buffer=9C7F6018
      QemuFlashRead: enter, Lba=0x11 Offset=0x0 NumBytes=0x1000 Buffer=9C7F6018
      QemuFlashPtr: enter, Lba=0x11 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE11000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x12 Offset=0x0 NumBytes=0x1000 Buffer=9C7F7018
      QemuFlashRead: enter, Lba=0x12 Offset=0x0 NumBytes=0x1000 Buffer=9C7F7018
      QemuFlashPtr: enter, Lba=0x12 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE12000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x13 Offset=0x0 NumBytes=0x1000 Buffer=9C7F8018
      QemuFlashRead: enter, Lba=0x13 Offset=0x0 NumBytes=0x1000 Buffer=9C7F8018
      QemuFlashPtr: enter, Lba=0x13 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE13000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x14 Offset=0x0 NumBytes=0x1000 Buffer=9C7F9018
      QemuFlashRead: enter, Lba=0x14 Offset=0x0 NumBytes=0x1000 Buffer=9C7F9018
      QemuFlashPtr: enter, Lba=0x14 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE14000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x15 Offset=0x0 NumBytes=0x1000 Buffer=9C7FA018
      QemuFlashRead: enter, Lba=0x15 Offset=0x0 NumBytes=0x1000 Buffer=9C7FA018
      QemuFlashPtr: enter, Lba=0x15 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE15000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x16 Offset=0x0 NumBytes=0x1000 Buffer=9C7FB018
      QemuFlashRead: enter, Lba=0x16 Offset=0x0 NumBytes=0x1000 Buffer=9C7FB018
      QemuFlashPtr: enter, Lba=0x16 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE16000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x17 Offset=0x0 NumBytes=0x1000 Buffer=9C7FC018
      QemuFlashRead: enter, Lba=0x17 Offset=0x0 NumBytes=0x1000 Buffer=9C7FC018
      QemuFlashPtr: enter, Lba=0x17 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE17000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x18 Offset=0x0 NumBytes=0x1000 Buffer=9C7FD018
      QemuFlashRead: enter, Lba=0x18 Offset=0x0 NumBytes=0x1000 Buffer=9C7FD018
      QemuFlashPtr: enter, Lba=0x18 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE18000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x19 Offset=0x0 NumBytes=0x1000 Buffer=9C7FE018
      QemuFlashRead: enter, Lba=0x19 Offset=0x0 NumBytes=0x1000 Buffer=9C7FE018
      QemuFlashPtr: enter, Lba=0x19 Offset=0x0
      QemuFlashPtr: exit, Ret=FFE19000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x1A Offset=0x0 NumBytes=0x1000 Buffer=9C7FF018
      QemuFlashRead: enter, Lba=0x1A Offset=0x0 NumBytes=0x1000 Buffer=9C7FF018
      QemuFlashPtr: enter, Lba=0x1A Offset=0x0
      QemuFlashPtr: exit, Ret=FFE1A000
      QemuFlashRead: exit 2
      FvbProtocolRead: exit @ 944, Status=Success, NumBytes=0x1000

      FvbProtocolRead: enter, Lba=0x1B Offset=0x0 NumBytes=0x1000 Buffer=9C800018
      QemuFlashRead: enter, Lba=0x1B Offset=0x0 NumBytes=0x1000 Buffer=9C800018
      QemuFlashPtr: enter, Lba=0x1B Offset=0x