* [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset
@ 2024-03-22 18:10 Paolo Bonzini
  2024-03-22 18:10 ` [PATCH 01/26] pci-host/q35: Move PAM initialization above SMRAM initialization Paolo Bonzini
                   ` (25 more replies)
  0 siblings, 26 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

These are the common bits for TDX and SEV-SNP support for QEMU 9.1.

The main changes compared to what Michael posted are:

1) I am including the KVM_SEV_INIT2 support patches without the linux-headers
update hack (however, the linux-headers update in these patches is still
not final).  This also includes the bits to track whether guest state
is encrypted, which are needed by TDX as well as SEV-ES/SNP.

2) VFIO is currently blocked, pending a decision on which is worse:
possibly consuming twice the memory, or disallowing device assignment;
the alternative is someone biting the bullet and implementing the
RamDiscardManager interface.

3) I included another easy patch from the TDX series, "i386/kvm: Move
architectural CPUID leaf generation to separate helper".

Please test. :)

Paolo

Chao Peng (2):
  kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
  kvm: handle KVM_EXIT_MEMORY_FAULT

Isaku Yamahata (2):
  pci-host/q35: Move PAM initialization above SMRAM initialization
  q35: Introduce smm_ranges property for q35-pci-host

Michael Roth (5):
  scripts/update-linux-headers: Add setup_data.h to import list
  scripts/update-linux-headers: Add bits.h to file imports
  [HACK] linux-headers: Update headers for 6.8 + kvm-coco-queue + SNP
  [TO SQUASH] hw/i386: Remove redeclaration of struct setup_data
  RAMBlock: Add support of KVM private guest memfd

Paolo Bonzini (7):
  runstate: skip initial CPU reset if reset is not actually possible
  KVM: track whether guest state is encrypted
  KVM: remove kvm_arch_cpu_check_are_resettable
  target/i386: introduce x86-confidential-guest
  target/i386: Implement mc->kvm_type() to get VM type
  target/i386: SEV: use KVM_SEV_INIT2 if possible
  RAMBlock: make guest_memfd require uncoordinated discard

Sean Christopherson (1):
  i386/kvm: Move architectural CPUID leaf generation to separate helper

Xiaoyao Li (9):
  confidential guest support: Add kvm_init() and kvm_reset() in class
  i386/sev: Switch to use confidential_guest_kvm_init()
  ppc/pef: switch to use confidential_guest_kvm_init/reset()
  s390: Switch to use confidential_guest_kvm_init()
  trace/kvm: Split address space and slot id in
    trace_kvm_set_user_memory()
  kvm: Introduce support for memory_attributes
  kvm/memory: Make memory type private by default if it has guest memfd
    backend
  HostMem: Add mechanism to opt in kvm guest memfd via MachineState
  physmem: Introduce ram_block_discard_guest_memfd_range()

 include/exec/confidential-guest-support.h     |  34 +-
 include/exec/cpu-common.h                     |   2 +
 include/exec/memory.h                         |  20 +-
 include/exec/ram_addr.h                       |   2 +-
 include/exec/ramblock.h                       |   1 +
 include/hw/boards.h                           |   2 +
 include/hw/i386/pc.h                          |   1 +
 include/hw/pci-host/q35.h                     |   1 +
 include/hw/ppc/pef.h                          |  17 -
 include/standard-headers/asm-x86/bootparam.h  |  17 +-
 include/standard-headers/asm-x86/kvm_para.h   |   3 +-
 include/standard-headers/asm-x86/setup_data.h |  83 ++
 include/standard-headers/linux/ethtool.h      |  48 ++
 include/standard-headers/linux/fuse.h         |  39 +-
 .../linux/input-event-codes.h                 |   1 +
 include/standard-headers/linux/virtio_gpu.h   |   2 +
 include/standard-headers/linux/virtio_snd.h   | 154 ++++
 include/sysemu/hostmem.h                      |   1 +
 include/sysemu/kvm.h                          |  19 +-
 include/sysemu/kvm_int.h                      |   3 +
 linux-headers/asm-arm64/kvm.h                 |  15 +-
 linux-headers/asm-arm64/sve_context.h         |  11 +
 linux-headers/asm-generic/bitsperlong.h       |   4 +
 linux-headers/asm-loongarch/kvm.h             |   2 -
 linux-headers/asm-mips/kvm.h                  |   2 -
 linux-headers/asm-powerpc/kvm.h               |  45 +-
 linux-headers/asm-riscv/kvm.h                 |   3 +-
 linux-headers/asm-s390/kvm.h                  | 315 +++++++-
 linux-headers/asm-x86/kvm.h                   | 364 ++++++++-
 linux-headers/linux/bits.h                    |  15 +
 linux-headers/linux/kvm.h                     | 717 +-----------------
 linux-headers/linux/psp-sev.h                 |  71 ++
 target/i386/confidential-guest.h              |  59 ++
 target/i386/kvm/kvm_i386.h                    |   2 +
 target/i386/sev.h                             |   2 -
 target/s390x/kvm/pv.h                         |  14 -
 accel/kvm/kvm-accel-ops.c                     |   2 +-
 accel/kvm/kvm-all.c                           | 236 +++++-
 accel/stubs/kvm-stub.c                        |   5 +
 backends/hostmem-file.c                       |   1 +
 backends/hostmem-memfd.c                      |   1 +
 backends/hostmem-ram.c                        |   1 +
 backends/hostmem.c                            |   1 +
 hw/core/machine.c                             |   5 +
 hw/i386/pc_q35.c                              |   2 +
 hw/i386/x86.c                                 |  19 +-
 hw/pci-host/q35.c                             |  61 +-
 hw/ppc/pef.c                                  |   9 +-
 hw/ppc/spapr.c                                |  10 +-
 hw/s390x/s390-virtio-ccw.c                    |   5 +-
 system/memory.c                               |   5 +
 system/physmem.c                              |  65 +-
 system/runstate.c                             |  15 +-
 target/arm/kvm.c                              |   5 -
 target/i386/confidential-guest.c              |  33 +
 target/i386/kvm/kvm.c                         | 507 +++++++------
 target/i386/kvm/sev-stub.c                    |  21 -
 target/i386/sev.c                             | 173 +++--
 target/loongarch/kvm/kvm.c                    |   5 -
 target/mips/kvm.c                             |   5 -
 target/ppc/kvm.c                              |   5 -
 target/riscv/kvm/kvm-cpu.c                    |   5 -
 target/s390x/kvm/kvm.c                        |   5 -
 target/s390x/kvm/pv.c                         |   8 +
 accel/kvm/trace-events                        |   4 +-
 roms/edk2                                     |   2 +-
 scripts/update-linux-headers.sh               |   8 +-
 target/i386/kvm/meson.build                   |   2 -
 target/i386/meson.build                       |   2 +-
 69 files changed, 2124 insertions(+), 1200 deletions(-)
 delete mode 100644 include/hw/ppc/pef.h
 create mode 100644 include/standard-headers/asm-x86/setup_data.h
 create mode 100644 linux-headers/linux/bits.h
 create mode 100644 target/i386/confidential-guest.h
 create mode 100644 target/i386/confidential-guest.c
 delete mode 100644 target/i386/kvm/sev-stub.c

-- 
2.44.0




* [PATCH 01/26] pci-host/q35: Move PAM initialization above SMRAM initialization
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
@ 2024-03-22 18:10 ` Paolo Bonzini
  2024-03-22 18:10 ` [PATCH 02/26] q35: Introduce smm_ranges property for q35-pci-host Paolo Bonzini
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Isaku Yamahata <isaku.yamahata@intel.com>

In mch_realize(), process PAM initialization before SMRAM initialization so
that a later patch can skip all the SMRAM-related code with a single check.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240320083945.991426-18-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/pci-host/q35.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 0d7d4e3f086..98d4a7c253a 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -568,6 +568,16 @@ static void mch_realize(PCIDevice *d, Error **errp)
     /* setup pci memory mapping */
     pc_pci_as_mapping_init(mch->system_memory, mch->pci_address_space);
 
+    /* PAM */
+    init_pam(&mch->pam_regions[0], OBJECT(mch), mch->ram_memory,
+             mch->system_memory, mch->pci_address_space,
+             PAM_BIOS_BASE, PAM_BIOS_SIZE);
+    for (i = 0; i < ARRAY_SIZE(mch->pam_regions) - 1; ++i) {
+        init_pam(&mch->pam_regions[i + 1], OBJECT(mch), mch->ram_memory,
+                 mch->system_memory, mch->pci_address_space,
+                 PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
+    }
+
     /* if *disabled* show SMRAM to all CPUs */
     memory_region_init_alias(&mch->smram_region, OBJECT(mch), "smram-region",
                              mch->pci_address_space, MCH_HOST_BRIDGE_SMRAM_C_BASE,
@@ -634,15 +644,6 @@ static void mch_realize(PCIDevice *d, Error **errp)
 
     object_property_add_const_link(qdev_get_machine(), "smram",
                                    OBJECT(&mch->smram));
-
-    init_pam(&mch->pam_regions[0], OBJECT(mch), mch->ram_memory,
-             mch->system_memory, mch->pci_address_space,
-             PAM_BIOS_BASE, PAM_BIOS_SIZE);
-    for (i = 0; i < ARRAY_SIZE(mch->pam_regions) - 1; ++i) {
-        init_pam(&mch->pam_regions[i + 1], OBJECT(mch), mch->ram_memory,
-                 mch->system_memory, mch->pci_address_space,
-                 PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
-    }
 }
 
 uint64_t mch_mcfg_base(void)
-- 
2.44.0




* [PATCH 02/26] q35: Introduce smm_ranges property for q35-pci-host
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
  2024-03-22 18:10 ` [PATCH 01/26] pci-host/q35: Move PAM initialization above SMRAM initialization Paolo Bonzini
@ 2024-03-22 18:10 ` Paolo Bonzini
  2024-03-22 18:10 ` [PATCH 03/26] confidential guest support: Add kvm_init() and kvm_reset() in class Paolo Bonzini
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Isaku Yamahata <isaku.yamahata@linux.intel.com>

Add a q35 property to check whether or not SMM ranges, e.g. SMRAM, TSEG,
etc., exist for the target platform.  TDX doesn't support SMM and doesn't
play nice with QEMU modifying the related guest memory ranges.

Signed-off-by: Isaku Yamahata <isaku.yamahata@linux.intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240320083945.991426-19-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/hw/i386/pc.h      |  1 +
 include/hw/pci-host/q35.h |  1 +
 hw/i386/pc_q35.c          |  2 ++
 hw/pci-host/q35.c         | 42 +++++++++++++++++++++++++++------------
 4 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 27a68071d77..fb1d4106e50 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -161,6 +161,7 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level);
 #define PCI_HOST_PROP_PCI_HOLE64_SIZE  "pci-hole64-size"
 #define PCI_HOST_BELOW_4G_MEM_SIZE     "below-4g-mem-size"
 #define PCI_HOST_ABOVE_4G_MEM_SIZE     "above-4g-mem-size"
+#define PCI_HOST_PROP_SMM_RANGES       "smm-ranges"
 
 
 void pc_pci_as_mapping_init(MemoryRegion *system_memory,
diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
index bafcbe67521..22fadfa3ed7 100644
--- a/include/hw/pci-host/q35.h
+++ b/include/hw/pci-host/q35.h
@@ -50,6 +50,7 @@ struct MCHPCIState {
     MemoryRegion tseg_blackhole, tseg_window;
     MemoryRegion smbase_blackhole, smbase_window;
     bool has_smram_at_smbase;
+    bool has_smm_ranges;
     Range pci_hole;
     uint64_t below_4g_mem_size;
     uint64_t above_4g_mem_size;
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index b5922b44afa..7f2d85df75f 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -219,6 +219,8 @@ static void pc_q35_init(MachineState *machine)
                             x86ms->above_4g_mem_size, NULL);
     object_property_set_bool(phb, PCI_HOST_BYPASS_IOMMU,
                              pcms->default_bus_bypass_iommu, NULL);
+    object_property_set_bool(phb, PCI_HOST_PROP_SMM_RANGES,
+                             x86_machine_is_smm_enabled(x86ms), NULL);
     sysbus_realize_and_unref(SYS_BUS_DEVICE(phb), &error_fatal);
 
     /* pci */
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 98d4a7c253a..0b6cbaed7ed 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -179,6 +179,8 @@ static Property q35_host_props[] = {
                      mch.below_4g_mem_size, 0),
     DEFINE_PROP_SIZE(PCI_HOST_ABOVE_4G_MEM_SIZE, Q35PCIHost,
                      mch.above_4g_mem_size, 0),
+    DEFINE_PROP_BOOL(PCI_HOST_PROP_SMM_RANGES, Q35PCIHost,
+                     mch.has_smm_ranges, true),
     DEFINE_PROP_BOOL("x-pci-hole64-fix", Q35PCIHost, pci_hole64_fix, true),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -214,6 +216,7 @@ static void q35_host_initfn(Object *obj)
     /* mch's object_initialize resets the default value, set it again */
     qdev_prop_set_uint64(DEVICE(s), PCI_HOST_PROP_PCI_HOLE64_SIZE,
                          Q35_PCI_HOST_HOLE64_SIZE_DEFAULT);
+
     object_property_add(obj, PCI_HOST_PROP_PCI_HOLE_START, "uint32",
                         q35_host_get_pci_hole_start,
                         NULL, NULL, NULL);
@@ -476,6 +479,10 @@ static void mch_write_config(PCIDevice *d,
         mch_update_pciexbar(mch);
     }
 
+    if (!mch->has_smm_ranges) {
+        return;
+    }
+
     if (ranges_overlap(address, len, MCH_HOST_BRIDGE_SMRAM,
                        MCH_HOST_BRIDGE_SMRAM_SIZE)) {
         mch_update_smram(mch);
@@ -494,10 +501,13 @@ static void mch_write_config(PCIDevice *d,
 static void mch_update(MCHPCIState *mch)
 {
     mch_update_pciexbar(mch);
+
     mch_update_pam(mch);
-    mch_update_smram(mch);
-    mch_update_ext_tseg_mbytes(mch);
-    mch_update_smbase_smram(mch);
+    if (mch->has_smm_ranges) {
+        mch_update_smram(mch);
+        mch_update_ext_tseg_mbytes(mch);
+        mch_update_smbase_smram(mch);
+    }
 
     /*
      * pci hole goes from end-of-low-ram to io-apic.
@@ -538,19 +548,21 @@ static void mch_reset(DeviceState *qdev)
     pci_set_quad(d->config + MCH_HOST_BRIDGE_PCIEXBAR,
                  MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT);
 
-    d->config[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_DEFAULT;
-    d->config[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_DEFAULT;
-    d->wmask[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_WMASK;
-    d->wmask[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_WMASK;
+    if (mch->has_smm_ranges) {
+        d->config[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_DEFAULT;
+        d->config[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_DEFAULT;
+        d->wmask[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_WMASK;
+        d->wmask[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_WMASK;
 
-    if (mch->ext_tseg_mbytes > 0) {
-        pci_set_word(d->config + MCH_HOST_BRIDGE_EXT_TSEG_MBYTES,
-                     MCH_HOST_BRIDGE_EXT_TSEG_MBYTES_QUERY);
+        if (mch->ext_tseg_mbytes > 0) {
+            pci_set_word(d->config + MCH_HOST_BRIDGE_EXT_TSEG_MBYTES,
+                        MCH_HOST_BRIDGE_EXT_TSEG_MBYTES_QUERY);
+        }
+
+        d->config[MCH_HOST_BRIDGE_F_SMBASE] = 0;
+        d->wmask[MCH_HOST_BRIDGE_F_SMBASE] = 0xff;
     }
 
-    d->config[MCH_HOST_BRIDGE_F_SMBASE] = 0;
-    d->wmask[MCH_HOST_BRIDGE_F_SMBASE] = 0xff;
-
     mch_update(mch);
 }
 
@@ -578,6 +590,10 @@ static void mch_realize(PCIDevice *d, Error **errp)
                  PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
     }
 
+    if (!mch->has_smm_ranges) {
+        return;
+    }
+
     /* if *disabled* show SMRAM to all CPUs */
     memory_region_init_alias(&mch->smram_region, OBJECT(mch), "smram-region",
                              mch->pci_address_space, MCH_HOST_BRIDGE_SMRAM_C_BASE,
-- 
2.44.0




* [PATCH 03/26] confidential guest support: Add kvm_init() and kvm_reset() in class
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
  2024-03-22 18:10 ` [PATCH 01/26] pci-host/q35: Move PAM initialization above SMRAM initialization Paolo Bonzini
  2024-03-22 18:10 ` [PATCH 02/26] q35: Introduce smm_ranges property for q35-pci-host Paolo Bonzini
@ 2024-03-22 18:10 ` Paolo Bonzini
  2024-03-25  8:32   ` Philippe Mathieu-Daudé
  2024-03-22 18:10 ` [PATCH 04/26] i386/sev: Switch to use confidential_guest_kvm_init() Paolo Bonzini
                   ` (22 subsequent siblings)
  25 siblings, 1 reply; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Xiaoyao Li <xiaoyao.li@intel.com>

Confidential VMs on different architectures all need to do their own
specific initialization (and maybe reset) work with KVM. Currently each
of them exposes an individual *_kvm_init() function and lets machine
code or KVM code call it.

To facilitate the introduction of confidential guest technologies from
different x86 vendors, add two virtual functions, kvm_init() and kvm_reset(),
to ConfidentialGuestSupportClass, and expose two helper functions for
invoking them.
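
As an illustration (not part of this patch), a vendor-specific subclass
wires up the hook in its class_init and generic code goes through the
common helper; the foo_* names below are hypothetical:

    /* Hypothetical subclass: register the kvm_init hook. */
    static int foo_guest_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
    {
        /* vendor-specific KVM setup for the guest goes here */
        return 0;
    }

    static void foo_guest_class_init(ObjectClass *oc, void *data)
    {
        ConfidentialGuestSupportClass *klass =
            CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);

        klass->kvm_init = foo_guest_kvm_init;
    }

    /* Generic code (e.g. kvm_arch_init()) then invokes the hook: */
    if (ms->cgs) {
        ret = confidential_guest_kvm_init(ms->cgs, &local_err);
    }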

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20240229060038.606591-1-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/exec/confidential-guest-support.h | 34 ++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/include/exec/confidential-guest-support.h b/include/exec/confidential-guest-support.h
index ba2dd4b5dfc..e5b188cffbf 100644
--- a/include/exec/confidential-guest-support.h
+++ b/include/exec/confidential-guest-support.h
@@ -23,7 +23,10 @@
 #include "qom/object.h"
 
 #define TYPE_CONFIDENTIAL_GUEST_SUPPORT "confidential-guest-support"
-OBJECT_DECLARE_SIMPLE_TYPE(ConfidentialGuestSupport, CONFIDENTIAL_GUEST_SUPPORT)
+OBJECT_DECLARE_TYPE(ConfidentialGuestSupport,
+                    ConfidentialGuestSupportClass,
+                    CONFIDENTIAL_GUEST_SUPPORT)
+
 
 struct ConfidentialGuestSupport {
     Object parent;
@@ -55,8 +58,37 @@ struct ConfidentialGuestSupport {
 
 typedef struct ConfidentialGuestSupportClass {
     ObjectClass parent;
+
+    int (*kvm_init)(ConfidentialGuestSupport *cgs, Error **errp);
+    int (*kvm_reset)(ConfidentialGuestSupport *cgs, Error **errp);
 } ConfidentialGuestSupportClass;
 
+static inline int confidential_guest_kvm_init(ConfidentialGuestSupport *cgs,
+                                              Error **errp)
+{
+    ConfidentialGuestSupportClass *klass;
+
+    klass = CONFIDENTIAL_GUEST_SUPPORT_GET_CLASS(cgs);
+    if (klass->kvm_init) {
+        return klass->kvm_init(cgs, errp);
+    }
+
+    return 0;
+}
+
+static inline int confidential_guest_kvm_reset(ConfidentialGuestSupport *cgs,
+                                               Error **errp)
+{
+    ConfidentialGuestSupportClass *klass;
+
+    klass = CONFIDENTIAL_GUEST_SUPPORT_GET_CLASS(cgs);
+    if (klass->kvm_reset) {
+        return klass->kvm_reset(cgs, errp);
+    }
+
+    return 0;
+}
+
 #endif /* !CONFIG_USER_ONLY */
 
 #endif /* QEMU_CONFIDENTIAL_GUEST_SUPPORT_H */
-- 
2.44.0




* [PATCH 04/26] i386/sev: Switch to use confidential_guest_kvm_init()
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (2 preceding siblings ...)
  2024-03-22 18:10 ` [PATCH 03/26] confidential guest support: Add kvm_init() and kvm_reset() in class Paolo Bonzini
@ 2024-03-22 18:10 ` Paolo Bonzini
  2024-03-22 18:10 ` [PATCH 05/26] ppc/pef: switch to use confidential_guest_kvm_init/reset() Paolo Bonzini
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Xiaoyao Li <xiaoyao.li@intel.com>

Use confidential_guest_kvm_init() instead of calling the SEV-specific
sev_kvm_init(). This allows the introduction of multiple
confidential-guest-support subclasses for different x86 vendors.

As a bonus, stubs are not needed anymore since there is no
direct call from target/i386/kvm/kvm.c to SEV code.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20240229060038.606591-1-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/sev.h           |   2 -
 target/i386/kvm/kvm.c       |  10 +--
 target/i386/kvm/sev-stub.c  |  21 ------
 target/i386/sev.c           | 127 ++++++++++++++++++------------------
 target/i386/kvm/meson.build |   2 -
 5 files changed, 69 insertions(+), 93 deletions(-)
 delete mode 100644 target/i386/kvm/sev-stub.c

diff --git a/target/i386/sev.h b/target/i386/sev.h
index e7499c95b1e..9e10d09539a 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -57,6 +57,4 @@ int sev_inject_launch_secret(const char *hdr, const char *secret,
 int sev_es_save_reset_vector(void *flash_ptr, uint64_t flash_size);
 void sev_es_set_reset_vector(CPUState *cpu);
 
-int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
-
 #endif
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index e68cbe92930..de10155b37a 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2538,10 +2538,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
      * mechanisms are supported in future (e.g. TDX), they'll need
      * their own initialization either here or elsewhere.
      */
-    ret = sev_kvm_init(ms->cgs, &local_err);
-    if (ret < 0) {
-        error_report_err(local_err);
-        return ret;
+    if (ms->cgs) {
+        ret = confidential_guest_kvm_init(ms->cgs, &local_err);
+        if (ret < 0) {
+            error_report_err(local_err);
+            return ret;
+        }
     }
 
     has_xcrs = kvm_check_extension(s, KVM_CAP_XCRS);
diff --git a/target/i386/kvm/sev-stub.c b/target/i386/kvm/sev-stub.c
deleted file mode 100644
index 1be5341e8a6..00000000000
--- a/target/i386/kvm/sev-stub.c
+++ /dev/null
@@ -1,21 +0,0 @@
-/*
- * QEMU SEV stub
- *
- * Copyright Advanced Micro Devices 2018
- *
- * Authors:
- *      Brijesh Singh <brijesh.singh@amd.com>
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#include "qemu/osdep.h"
-#include "sev.h"
-
-int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
-{
-    /* If we get here, cgs must be some non-SEV thing */
-    return 0;
-}
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 72930ff0dcc..b8f79d34d19 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -353,63 +353,6 @@ static void sev_guest_set_kernel_hashes(Object *obj, bool value, Error **errp)
     sev->kernel_hashes = value;
 }
 
-static void
-sev_guest_class_init(ObjectClass *oc, void *data)
-{
-    object_class_property_add_str(oc, "sev-device",
-                                  sev_guest_get_sev_device,
-                                  sev_guest_set_sev_device);
-    object_class_property_set_description(oc, "sev-device",
-            "SEV device to use");
-    object_class_property_add_str(oc, "dh-cert-file",
-                                  sev_guest_get_dh_cert_file,
-                                  sev_guest_set_dh_cert_file);
-    object_class_property_set_description(oc, "dh-cert-file",
-            "guest owners DH certificate (encoded with base64)");
-    object_class_property_add_str(oc, "session-file",
-                                  sev_guest_get_session_file,
-                                  sev_guest_set_session_file);
-    object_class_property_set_description(oc, "session-file",
-            "guest owners session parameters (encoded with base64)");
-    object_class_property_add_bool(oc, "kernel-hashes",
-                                   sev_guest_get_kernel_hashes,
-                                   sev_guest_set_kernel_hashes);
-    object_class_property_set_description(oc, "kernel-hashes",
-            "add kernel hashes to guest firmware for measured Linux boot");
-}
-
-static void
-sev_guest_instance_init(Object *obj)
-{
-    SevGuestState *sev = SEV_GUEST(obj);
-
-    sev->sev_device = g_strdup(DEFAULT_SEV_DEVICE);
-    sev->policy = DEFAULT_GUEST_POLICY;
-    object_property_add_uint32_ptr(obj, "policy", &sev->policy,
-                                   OBJ_PROP_FLAG_READWRITE);
-    object_property_add_uint32_ptr(obj, "handle", &sev->handle,
-                                   OBJ_PROP_FLAG_READWRITE);
-    object_property_add_uint32_ptr(obj, "cbitpos", &sev->cbitpos,
-                                   OBJ_PROP_FLAG_READWRITE);
-    object_property_add_uint32_ptr(obj, "reduced-phys-bits",
-                                   &sev->reduced_phys_bits,
-                                   OBJ_PROP_FLAG_READWRITE);
-}
-
-/* sev guest info */
-static const TypeInfo sev_guest_info = {
-    .parent = TYPE_CONFIDENTIAL_GUEST_SUPPORT,
-    .name = TYPE_SEV_GUEST,
-    .instance_size = sizeof(SevGuestState),
-    .instance_finalize = sev_guest_finalize,
-    .class_init = sev_guest_class_init,
-    .instance_init = sev_guest_instance_init,
-    .interfaces = (InterfaceInfo[]) {
-        { TYPE_USER_CREATABLE },
-        { }
-    }
-};
-
 bool
 sev_enabled(void)
 {
@@ -906,20 +849,15 @@ sev_vm_state_change(void *opaque, bool running, RunState state)
     }
 }
 
-int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
+static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
-    SevGuestState *sev
-        = (SevGuestState *)object_dynamic_cast(OBJECT(cgs), TYPE_SEV_GUEST);
+    SevGuestState *sev = SEV_GUEST(cgs);
     char *devname;
     int ret, fw_error, cmd;
     uint32_t ebx;
     uint32_t host_cbitpos;
     struct sev_user_data_status status = {};
 
-    if (!sev) {
-        return 0;
-    }
-
     ret = ram_block_discard_disable(true);
     if (ret) {
         error_report("%s: cannot disable RAM discard", __func__);
@@ -1384,6 +1322,67 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp)
     return ret;
 }
 
+static void
+sev_guest_class_init(ObjectClass *oc, void *data)
+{
+    ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
+
+    klass->kvm_init = sev_kvm_init;
+
+    object_class_property_add_str(oc, "sev-device",
+                                  sev_guest_get_sev_device,
+                                  sev_guest_set_sev_device);
+    object_class_property_set_description(oc, "sev-device",
+            "SEV device to use");
+    object_class_property_add_str(oc, "dh-cert-file",
+                                  sev_guest_get_dh_cert_file,
+                                  sev_guest_set_dh_cert_file);
+    object_class_property_set_description(oc, "dh-cert-file",
+            "guest owners DH certificate (encoded with base64)");
+    object_class_property_add_str(oc, "session-file",
+                                  sev_guest_get_session_file,
+                                  sev_guest_set_session_file);
+    object_class_property_set_description(oc, "session-file",
+            "guest owners session parameters (encoded with base64)");
+    object_class_property_add_bool(oc, "kernel-hashes",
+                                   sev_guest_get_kernel_hashes,
+                                   sev_guest_set_kernel_hashes);
+    object_class_property_set_description(oc, "kernel-hashes",
+            "add kernel hashes to guest firmware for measured Linux boot");
+}
+
+static void
+sev_guest_instance_init(Object *obj)
+{
+    SevGuestState *sev = SEV_GUEST(obj);
+
+    sev->sev_device = g_strdup(DEFAULT_SEV_DEVICE);
+    sev->policy = DEFAULT_GUEST_POLICY;
+    object_property_add_uint32_ptr(obj, "policy", &sev->policy,
+                                   OBJ_PROP_FLAG_READWRITE);
+    object_property_add_uint32_ptr(obj, "handle", &sev->handle,
+                                   OBJ_PROP_FLAG_READWRITE);
+    object_property_add_uint32_ptr(obj, "cbitpos", &sev->cbitpos,
+                                   OBJ_PROP_FLAG_READWRITE);
+    object_property_add_uint32_ptr(obj, "reduced-phys-bits",
+                                   &sev->reduced_phys_bits,
+                                   OBJ_PROP_FLAG_READWRITE);
+}
+
+/* sev guest info */
+static const TypeInfo sev_guest_info = {
+    .parent = TYPE_CONFIDENTIAL_GUEST_SUPPORT,
+    .name = TYPE_SEV_GUEST,
+    .instance_size = sizeof(SevGuestState),
+    .instance_finalize = sev_guest_finalize,
+    .class_init = sev_guest_class_init,
+    .instance_init = sev_guest_instance_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
 static void
 sev_register_types(void)
 {
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index 84d9143e602..e7850981e62 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -7,8 +7,6 @@ i386_kvm_ss.add(files(
 
 i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen-emu.c'))
 
-i386_kvm_ss.add(when: 'CONFIG_SEV', if_false: files('sev-stub.c'))
-
 i386_system_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), if_false: files('hyperv-stub.c'))
 
 i386_system_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
-- 
2.44.0




* [PATCH 05/26] ppc/pef: switch to use confidential_guest_kvm_init/reset()
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (3 preceding siblings ...)
  2024-03-22 18:10 ` [PATCH 04/26] i386/sev: Switch to use confidential_guest_kvm_init() Paolo Bonzini
@ 2024-03-22 18:10 ` Paolo Bonzini
  2024-03-22 18:10 ` [PATCH 06/26] s390: Switch to use confidential_guest_kvm_init() Paolo Bonzini
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Xiaoyao Li <xiaoyao.li@intel.com>

Use the unified interface to call the confidential-guest-related kvm_init()
and kvm_reset(), to avoid exposing PEF-specific functions.

As a bonus, pef.h goes away since there is no direct call from sPAPR
board code to PEF code anymore.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/hw/ppc/pef.h | 17 -----------------
 hw/ppc/pef.c         |  9 ++++++---
 hw/ppc/spapr.c       | 10 +++++++---
 3 files changed, 13 insertions(+), 23 deletions(-)
 delete mode 100644 include/hw/ppc/pef.h

diff --git a/include/hw/ppc/pef.h b/include/hw/ppc/pef.h
deleted file mode 100644
index 707dbe524c4..00000000000
--- a/include/hw/ppc/pef.h
+++ /dev/null
@@ -1,17 +0,0 @@
-/*
- * PEF (Protected Execution Facility) for POWER support
- *
- * Copyright Red Hat.
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#ifndef HW_PPC_PEF_H
-#define HW_PPC_PEF_H
-
-int pef_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
-int pef_kvm_reset(ConfidentialGuestSupport *cgs, Error **errp);
-
-#endif /* HW_PPC_PEF_H */
diff --git a/hw/ppc/pef.c b/hw/ppc/pef.c
index d28ed3ba733..47553348b1e 100644
--- a/hw/ppc/pef.c
+++ b/hw/ppc/pef.c
@@ -15,7 +15,6 @@
 #include "sysemu/kvm.h"
 #include "migration/blocker.h"
 #include "exec/confidential-guest-support.h"
-#include "hw/ppc/pef.h"
 
 #define TYPE_PEF_GUEST "pef-guest"
 OBJECT_DECLARE_SIMPLE_TYPE(PefGuest, PEF_GUEST)
@@ -93,7 +92,7 @@ static int kvmppc_svm_off(Error **errp)
 #endif
 }
 
-int pef_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
+static int pef_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
     if (!object_dynamic_cast(OBJECT(cgs), TYPE_PEF_GUEST)) {
         return 0;
@@ -107,7 +106,7 @@ int pef_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
     return kvmppc_svm_init(cgs, errp);
 }
 
-int pef_kvm_reset(ConfidentialGuestSupport *cgs, Error **errp)
+static int pef_kvm_reset(ConfidentialGuestSupport *cgs, Error **errp)
 {
     if (!object_dynamic_cast(OBJECT(cgs), TYPE_PEF_GUEST)) {
         return 0;
@@ -131,6 +130,10 @@ OBJECT_DEFINE_TYPE_WITH_INTERFACES(PefGuest,
 
 static void pef_guest_class_init(ObjectClass *oc, void *data)
 {
+    ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
+
+    klass->kvm_init = pef_kvm_init;
+    klass->kvm_reset = pef_kvm_reset;
 }
 
 static void pef_guest_init(Object *obj)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c417f9dd523..7178da53901 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -74,6 +74,7 @@
 #include "hw/virtio/vhost-scsi-common.h"
 
 #include "exec/ram_addr.h"
+#include "exec/confidential-guest-support.h"
 #include "hw/usb.h"
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
@@ -86,7 +87,6 @@
 #include "hw/ppc/spapr_tpm_proxy.h"
 #include "hw/ppc/spapr_nvdimm.h"
 #include "hw/ppc/spapr_numa.h"
-#include "hw/ppc/pef.h"
 
 #include "monitor/monitor.h"
 
@@ -1714,7 +1714,9 @@ static void spapr_machine_reset(MachineState *machine, ShutdownCause reason)
         qemu_guest_getrandom_nofail(spapr->fdt_rng_seed, 32);
     }
 
-    pef_kvm_reset(machine->cgs, &error_fatal);
+    if (machine->cgs) {
+        confidential_guest_kvm_reset(machine->cgs, &error_fatal);
+    }
     spapr_caps_apply(spapr);
     spapr_nested_reset(spapr);
 
@@ -2840,7 +2842,9 @@ static void spapr_machine_init(MachineState *machine)
     /*
      * if Secure VM (PEF) support is configured, then initialize it
      */
-    pef_kvm_init(machine->cgs, &error_fatal);
+    if (machine->cgs) {
+        confidential_guest_kvm_init(machine->cgs, &error_fatal);
+    }
 
     msi_nonbroken = true;
 
-- 
2.44.0




* [PATCH 06/26] s390: Switch to use confidential_guest_kvm_init()
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (4 preceding siblings ...)
  2024-03-22 18:10 ` [PATCH 05/26] ppc/pef: switch to use confidential_guest_kvm_init/reset() Paolo Bonzini
@ 2024-03-22 18:10 ` Paolo Bonzini
  2024-03-22 18:10 ` [PATCH 07/26] scripts/update-linux-headers: Add setup_data.h to import list Paolo Bonzini
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Xiaoyao Li <xiaoyao.li@intel.com>

Use the unified confidential_guest_kvm_init() for consistency with
other architectures.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20240229060038.606591-1-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/s390x/kvm/pv.h      | 14 --------------
 hw/s390x/s390-virtio-ccw.c |  5 ++++-
 target/s390x/kvm/pv.c      |  8 ++++++++
 3 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/target/s390x/kvm/pv.h b/target/s390x/kvm/pv.h
index 5877d28ff10..4b408174391 100644
--- a/target/s390x/kvm/pv.h
+++ b/target/s390x/kvm/pv.h
@@ -80,18 +80,4 @@ static inline int kvm_s390_dump_mem_state(uint64_t addr, size_t len,
 static inline int kvm_s390_dump_completion_data(void *buff) { return 0; }
 #endif /* CONFIG_KVM */
 
-int s390_pv_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);
-static inline int s390_pv_init(ConfidentialGuestSupport *cgs, Error **errp)
-{
-    if (!cgs) {
-        return 0;
-    }
-    if (kvm_enabled()) {
-        return s390_pv_kvm_init(cgs, errp);
-    }
-
-    error_setg(errp, "Protected Virtualization requires KVM");
-    return -1;
-}
-
 #endif /* HW_S390_PV_H */
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index b1dcb3857f0..e35b90ed83c 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -14,6 +14,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "exec/ram_addr.h"
+#include "exec/confidential-guest-support.h"
 #include "hw/s390x/s390-virtio-hcall.h"
 #include "hw/s390x/sclp.h"
 #include "hw/s390x/s390_flic.h"
@@ -260,7 +261,9 @@ static void ccw_init(MachineState *machine)
     s390_init_cpus(machine);
 
     /* Need CPU model to be determined before we can set up PV */
-    s390_pv_init(machine->cgs, &error_fatal);
+    if (machine->cgs) {
+        confidential_guest_kvm_init(machine->cgs, &error_fatal);
+    }
 
     s390_flic_init();
 
diff --git a/target/s390x/kvm/pv.c b/target/s390x/kvm/pv.c
index 7ca7faec73e..c04d53753bf 100644
--- a/target/s390x/kvm/pv.c
+++ b/target/s390x/kvm/pv.c
@@ -340,6 +340,11 @@ int s390_pv_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
         return 0;
     }
 
+    if (!kvm_enabled()) {
+        error_setg(errp, "Protected Virtualization requires KVM");
+        return -1;
+    }
+
     if (!s390_has_feat(S390_FEAT_UNPACK)) {
         error_setg(errp,
                    "CPU model does not support Protected Virtualization");
@@ -364,6 +369,9 @@ OBJECT_DEFINE_TYPE_WITH_INTERFACES(S390PVGuest,
 
 static void s390_pv_guest_class_init(ObjectClass *oc, void *data)
 {
+    ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
+
+    klass->kvm_init = s390_pv_kvm_init;
 }
 
 static void s390_pv_guest_init(Object *obj)
-- 
2.44.0




* [PATCH 07/26] scripts/update-linux-headers: Add setup_data.h to import list
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (5 preceding siblings ...)
  2024-03-22 18:10 ` [PATCH 06/26] s390: Switch to use confidential_guest_kvm_init() Paolo Bonzini
@ 2024-03-22 18:10 ` Paolo Bonzini
  2024-03-22 18:10 ` [PATCH 08/26] scripts/update-linux-headers: Add bits.h to file imports Paolo Bonzini
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Michael Roth <michael.roth@amd.com>

Data structures like struct setup_data have been moved to a separate
setup_data.h header which bootparam.h relies on. Add setup_data.h to
the cp_portable() list and sync it along with the other header files.

Note that currently struct setup_data is stripped away as part of
generating bootparam.h, but that handling is not currently needed for
setup_data.h since it doesn't pull in many external
headers/dependencies. However, QEMU currently redefines struct
setup_data in hw/i386/x86.c, so that will need to be removed as part of
any header update that pulls in the new setup_data.h to avoid build
bisect breakage.

Because <asm/setup_data.h> is the first architecture-specific #include
in include/standard-headers/, add a new sed substitution to rewrite
asm/ includes to the standard-headers/asm-* subdirectory for the current
architecture.
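
For example, with the new substitution an include such as

    #include <asm/setup_data.h>

in the kernel header is rewritten in the imported copy for x86 as

    #include "standard-headers/asm-x86/setup_data.h"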

And while at it, remove asm-generic/kvm_para.h from the list of
allowed includes: it does not have a matching substitution, and therefore
it would not be possible to use it on non-Linux systems where there is
no /usr/include/asm-generic/ directory.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 scripts/update-linux-headers.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index a0006eec6fd..d48856f9e24 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -61,7 +61,7 @@ cp_portable() {
                                      -e 'linux/const' \
                                      -e 'linux/kernel' \
                                      -e 'linux/sysinfo' \
-                                     -e 'asm-generic/kvm_para' \
+                                     -e 'asm/setup_data.h' \
                                      > /dev/null
     then
         echo "Unexpected #include in input file $f".
@@ -77,6 +77,7 @@ cp_portable() {
         -e 's/__be\([0-9][0-9]*\)/uint\1_t/g' \
         -e 's/"\(input-event-codes\.h\)"/"standard-headers\/linux\/\1"/' \
         -e 's/<linux\/\([^>]*\)>/"standard-headers\/linux\/\1"/' \
+        -e 's/<asm\/\([^>]*\)>/"standard-headers\/asm-'$arch'\/\1"/' \
         -e 's/__bitwise//' \
         -e 's/__attribute__((packed))/QEMU_PACKED/' \
         -e 's/__inline__/inline/' \
@@ -155,11 +156,14 @@ for arch in $ARCHLIST; do
                "$tmpdir/include/asm/bootparam.h" > "$tmpdir/bootparam.h"
         cp_portable "$tmpdir/bootparam.h" \
                     "$output/include/standard-headers/asm-$arch"
+        cp_portable "$tmpdir/include/asm/setup_data.h" \
+                    "$output/standard-headers/asm-x86"
     fi
     if [ $arch = riscv ]; then
         cp "$tmpdir/include/asm/ptrace.h" "$output/linux-headers/asm-riscv/"
     fi
 done
+arch=
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-- 
2.44.0




* [PATCH 08/26] scripts/update-linux-headers: Add bits.h to file imports
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (6 preceding siblings ...)
  2024-03-22 18:10 ` [PATCH 07/26] scripts/update-linux-headers: Add setup_data.h to import list Paolo Bonzini
@ 2024-03-22 18:10 ` Paolo Bonzini
  2024-03-22 18:10 ` [PATCH 09/26] [HACK] linux-headers: Update headers for 6.8 + kvm-coco-queue + SNP Paolo Bonzini
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Michael Roth <michael.roth@amd.com>

Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 scripts/update-linux-headers.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index d48856f9e24..5f20434d5c5 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -169,7 +169,7 @@ rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
 for header in const.h stddef.h kvm.h vfio.h vfio_ccw.h vfio_zdev.h vhost.h \
               psci.h psp-sev.h userfaultfd.h memfd.h mman.h nvme_ioctl.h \
-              vduse.h iommufd.h; do
+              vduse.h iommufd.h bits.h; do
     cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
 
-- 
2.44.0




* [PATCH 09/26] [HACK] linux-headers: Update headers for 6.8 + kvm-coco-queue + SNP
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (7 preceding siblings ...)
  2024-03-22 18:10 ` [PATCH 08/26] scripts/update-linux-headers: Add bits.h to file imports Paolo Bonzini
@ 2024-03-22 18:10 ` Paolo Bonzini
  2024-03-22 18:11 ` [PATCH 10/26] [TO SQUASH] hw/i386: Remove redeclaration of struct setup_data Paolo Bonzini
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Michael Roth <michael.roth@amd.com>

Pull in 6.8 kvm-next + kvm-coco-queue + SNP headers.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/standard-headers/asm-x86/bootparam.h  |  17 +-
 include/standard-headers/asm-x86/kvm_para.h   |   3 +-
 include/standard-headers/asm-x86/setup_data.h |  83 ++
 include/standard-headers/linux/ethtool.h      |  48 ++
 include/standard-headers/linux/fuse.h         |  39 +-
 .../linux/input-event-codes.h                 |   1 +
 include/standard-headers/linux/virtio_gpu.h   |   2 +
 include/standard-headers/linux/virtio_snd.h   | 154 ++++
 linux-headers/asm-arm64/kvm.h                 |  15 +-
 linux-headers/asm-arm64/sve_context.h         |  11 +
 linux-headers/asm-generic/bitsperlong.h       |   4 +
 linux-headers/asm-loongarch/kvm.h             |   2 -
 linux-headers/asm-mips/kvm.h                  |   2 -
 linux-headers/asm-powerpc/kvm.h               |  45 +-
 linux-headers/asm-riscv/kvm.h                 |   3 +-
 linux-headers/asm-s390/kvm.h                  | 315 +++++++-
 linux-headers/asm-x86/kvm.h                   | 364 ++++++++-
 linux-headers/linux/bits.h                    |  15 +
 linux-headers/linux/kvm.h                     | 717 +-----------------
 linux-headers/linux/psp-sev.h                 |  71 ++
 20 files changed, 1186 insertions(+), 725 deletions(-)
 create mode 100644 include/standard-headers/asm-x86/setup_data.h
 create mode 100644 linux-headers/linux/bits.h

diff --git a/include/standard-headers/asm-x86/bootparam.h b/include/standard-headers/asm-x86/bootparam.h
index 0b06d2bff1b..b582a105c08 100644
--- a/include/standard-headers/asm-x86/bootparam.h
+++ b/include/standard-headers/asm-x86/bootparam.h
@@ -2,21 +2,7 @@
 #ifndef _ASM_X86_BOOTPARAM_H
 #define _ASM_X86_BOOTPARAM_H
 
-/* setup_data/setup_indirect types */
-#define SETUP_NONE			0
-#define SETUP_E820_EXT			1
-#define SETUP_DTB			2
-#define SETUP_PCI			3
-#define SETUP_EFI			4
-#define SETUP_APPLE_PROPERTIES		5
-#define SETUP_JAILHOUSE			6
-#define SETUP_CC_BLOB			7
-#define SETUP_IMA			8
-#define SETUP_RNG_SEED			9
-#define SETUP_ENUM_MAX			SETUP_RNG_SEED
-
-#define SETUP_INDIRECT			(1<<31)
-#define SETUP_TYPE_MAX			(SETUP_ENUM_MAX | SETUP_INDIRECT)
+#include "standard-headers/asm-x86/setup_data.h"
 
 /* ram_size flags */
 #define RAMDISK_IMAGE_START_MASK	0x07FF
@@ -38,6 +24,7 @@
 #define XLF_EFI_KEXEC			(1<<4)
 #define XLF_5LEVEL			(1<<5)
 #define XLF_5LEVEL_ENABLED		(1<<6)
+#define XLF_MEM_ENCRYPTION		(1<<7)
 
 
 #endif /* _ASM_X86_BOOTPARAM_H */
diff --git a/include/standard-headers/asm-x86/kvm_para.h b/include/standard-headers/asm-x86/kvm_para.h
index f0235e58a1d..9a011d20f01 100644
--- a/include/standard-headers/asm-x86/kvm_para.h
+++ b/include/standard-headers/asm-x86/kvm_para.h
@@ -92,7 +92,7 @@ struct kvm_clock_pairing {
 #define KVM_ASYNC_PF_DELIVERY_AS_INT		(1 << 3)
 
 /* MSR_KVM_ASYNC_PF_INT */
-#define KVM_ASYNC_PF_VEC_MASK			GENMASK(7, 0)
+#define KVM_ASYNC_PF_VEC_MASK			__GENMASK(7, 0)
 
 /* MSR_KVM_MIGRATION_CONTROL */
 #define KVM_MIGRATION_READY		(1 << 0)
@@ -142,7 +142,6 @@ struct kvm_vcpu_pv_apf_data {
 	uint32_t token;
 
 	uint8_t pad[56];
-	uint32_t enabled;
 };
 
 #define KVM_PV_EOI_BIT 0
diff --git a/include/standard-headers/asm-x86/setup_data.h b/include/standard-headers/asm-x86/setup_data.h
new file mode 100644
index 00000000000..09355f54c55
--- /dev/null
+++ b/include/standard-headers/asm-x86/setup_data.h
@@ -0,0 +1,83 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _ASM_X86_SETUP_DATA_H
+#define _ASM_X86_SETUP_DATA_H
+
+/* setup_data/setup_indirect types */
+#define SETUP_NONE			0
+#define SETUP_E820_EXT			1
+#define SETUP_DTB			2
+#define SETUP_PCI			3
+#define SETUP_EFI			4
+#define SETUP_APPLE_PROPERTIES		5
+#define SETUP_JAILHOUSE			6
+#define SETUP_CC_BLOB			7
+#define SETUP_IMA			8
+#define SETUP_RNG_SEED			9
+#define SETUP_ENUM_MAX			SETUP_RNG_SEED
+
+#define SETUP_INDIRECT			(1<<31)
+#define SETUP_TYPE_MAX			(SETUP_ENUM_MAX | SETUP_INDIRECT)
+
+#ifndef __ASSEMBLY__
+
+#include "standard-headers/linux/types.h"
+
+/* extensible setup data list node */
+struct setup_data {
+	uint64_t next;
+	uint32_t type;
+	uint32_t len;
+	uint8_t data[];
+};
+
+/* extensible setup indirect data node */
+struct setup_indirect {
+	uint32_t type;
+	uint32_t reserved;  /* Reserved, must be set to zero. */
+	uint64_t len;
+	uint64_t addr;
+};
+
+/*
+ * The E820 memory region entry of the boot protocol ABI:
+ */
+struct boot_e820_entry {
+	uint64_t addr;
+	uint64_t size;
+	uint32_t type;
+} QEMU_PACKED;
+
+/*
+ * The boot loader is passing platform information via this Jailhouse-specific
+ * setup data structure.
+ */
+struct jailhouse_setup_data {
+	struct {
+		uint16_t	version;
+		uint16_t	compatible_version;
+	} QEMU_PACKED hdr;
+	struct {
+		uint16_t	pm_timer_address;
+		uint16_t	num_cpus;
+		uint64_t	pci_mmconfig_base;
+		uint32_t	tsc_khz;
+		uint32_t	apic_khz;
+		uint8_t	standard_ioapic;
+		uint8_t	cpu_ids[255];
+	} QEMU_PACKED v1;
+	struct {
+		uint32_t	flags;
+	} QEMU_PACKED v2;
+} QEMU_PACKED;
+
+/*
+ * IMA buffer setup data information from the previous kernel during kexec
+ */
+struct ima_setup_data {
+	uint64_t addr;
+	uint64_t size;
+} QEMU_PACKED;
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_X86_SETUP_DATA_H */
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
index dfb54eff6f7..01503784d26 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -2023,6 +2023,53 @@ static inline int ethtool_validate_duplex(uint8_t duplex)
 #define	IPV4_FLOW	0x10	/* hash only */
 #define	IPV6_FLOW	0x11	/* hash only */
 #define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
+
+/* Used for GTP-U IPv4 and IPv6.
+ * The format of GTP packets only includes
+ * elements such as TEID and GTP version.
+ * It is primarily intended for data communication of the UE.
+ */
+#define GTPU_V4_FLOW 0x13	/* hash only */
+#define GTPU_V6_FLOW 0x14	/* hash only */
+
+/* Use for GTP-C IPv4 and v6.
+ * The format of these GTP packets does not include TEID.
+ * Primarily expected to be used for communication
+ * to create sessions for UE data communication,
+ * commonly referred to as CSR (Create Session Request).
+ */
+#define GTPC_V4_FLOW 0x15	/* hash only */
+#define GTPC_V6_FLOW 0x16	/* hash only */
+
+/* Use for GTP-C IPv4 and v6.
+ * Unlike GTPC_V4_FLOW, the format of these GTP packets includes TEID.
+ * After session creation, it becomes this packet.
+ * This is mainly used for requests to realize UE handover.
+ */
+#define GTPC_TEID_V4_FLOW 0x17	/* hash only */
+#define GTPC_TEID_V6_FLOW 0x18	/* hash only */
+
+/* Use for GTP-U and extended headers for the PSC (PDU Session Container).
+ * The format of these GTP packets includes TEID and QFI.
+ * In 5G communication using UPF (User Plane Function),
+ * data communication with this extended header is performed.
+ */
+#define GTPU_EH_V4_FLOW 0x19	/* hash only */
+#define GTPU_EH_V6_FLOW 0x1a	/* hash only */
+
+/* Use for GTP-U IPv4 and v6 PSC (PDU Session Container) extended headers.
+ * This differs from GTPU_EH_V(4|6)_FLOW in that it is distinguished by
+ * UL/DL included in the PSC.
+ * There are differences in the data included based on Downlink/Uplink,
+ * and can be used to distinguish packets.
+ * The functions described so far are useful when you want to
+ * handle communication from the mobile network in UPF, PGW, etc.
+ */
+#define GTPU_UL_V4_FLOW 0x1b	/* hash only */
+#define GTPU_UL_V6_FLOW 0x1c	/* hash only */
+#define GTPU_DL_V4_FLOW 0x1d	/* hash only */
+#define GTPU_DL_V6_FLOW 0x1e	/* hash only */
+
 /* Flag to enable additional fields in struct ethtool_rx_flow_spec */
 #define	FLOW_EXT	0x80000000
 #define	FLOW_MAC_EXT	0x40000000
@@ -2037,6 +2084,7 @@ static inline int ethtool_validate_duplex(uint8_t duplex)
 #define	RXH_IP_DST	(1 << 5)
 #define	RXH_L4_B_0_1	(1 << 6) /* src port in case of TCP/UDP/SCTP */
 #define	RXH_L4_B_2_3	(1 << 7) /* dst port in case of TCP/UDP/SCTP */
+#define	RXH_GTP_TEID	(1 << 8) /* teid in case of GTP */
 #define	RXH_DISCARD	(1 << 31)
 
 #define	RX_CLS_FLOW_DISC	0xffffffffffffffffULL
diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
index fc0dcd10aed..bac9dbc49f8 100644
--- a/include/standard-headers/linux/fuse.h
+++ b/include/standard-headers/linux/fuse.h
@@ -211,6 +211,12 @@
  *  7.39
  *  - add FUSE_DIRECT_IO_ALLOW_MMAP
  *  - add FUSE_STATX and related structures
+ *
+ *  7.40
+ *  - add max_stack_depth to fuse_init_out, add FUSE_PASSTHROUGH init flag
+ *  - add backing_id to fuse_open_out, add FOPEN_PASSTHROUGH open flag
+ *  - add FUSE_NO_EXPORT_SUPPORT init flag
+ *  - add FUSE_NOTIFY_RESEND, add FUSE_HAS_RESEND init flag
  */
 
 #ifndef _LINUX_FUSE_H
@@ -242,7 +248,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 39
+#define FUSE_KERNEL_MINOR_VERSION 40
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -349,6 +355,7 @@ struct fuse_file_lock {
  * FOPEN_STREAM: the file is stream-like (no file position at all)
  * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
  * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
+ * FOPEN_PASSTHROUGH: passthrough read/write io for this open file
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
@@ -357,6 +364,7 @@ struct fuse_file_lock {
 #define FOPEN_STREAM		(1 << 4)
 #define FOPEN_NOFLUSH		(1 << 5)
 #define FOPEN_PARALLEL_DIRECT_WRITES	(1 << 6)
+#define FOPEN_PASSTHROUGH	(1 << 7)
 
 /**
  * INIT request/reply flags
@@ -406,6 +414,9 @@ struct fuse_file_lock {
  *			symlink and mknod (single group that matches parent)
  * FUSE_HAS_EXPIRE_ONLY: kernel supports expiry-only entry invalidation
  * FUSE_DIRECT_IO_ALLOW_MMAP: allow shared mmap in FOPEN_DIRECT_IO mode.
+ * FUSE_NO_EXPORT_SUPPORT: explicitly disable export support
+ * FUSE_HAS_RESEND: kernel supports resending pending requests, and the high bit
+ *		    of the request ID indicates resend requests
  */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
@@ -445,6 +456,9 @@ struct fuse_file_lock {
 #define FUSE_CREATE_SUPP_GROUP	(1ULL << 34)
 #define FUSE_HAS_EXPIRE_ONLY	(1ULL << 35)
 #define FUSE_DIRECT_IO_ALLOW_MMAP (1ULL << 36)
+#define FUSE_PASSTHROUGH	(1ULL << 37)
+#define FUSE_NO_EXPORT_SUPPORT	(1ULL << 38)
+#define FUSE_HAS_RESEND		(1ULL << 39)
 
 /* Obsolete alias for FUSE_DIRECT_IO_ALLOW_MMAP */
 #define FUSE_DIRECT_IO_RELAX	FUSE_DIRECT_IO_ALLOW_MMAP
@@ -631,6 +645,7 @@ enum fuse_notify_code {
 	FUSE_NOTIFY_STORE = 4,
 	FUSE_NOTIFY_RETRIEVE = 5,
 	FUSE_NOTIFY_DELETE = 6,
+	FUSE_NOTIFY_RESEND = 7,
 	FUSE_NOTIFY_CODE_MAX,
 };
 
@@ -757,7 +772,7 @@ struct fuse_create_in {
 struct fuse_open_out {
 	uint64_t	fh;
 	uint32_t	open_flags;
-	uint32_t	padding;
+	int32_t		backing_id;
 };
 
 struct fuse_release_in {
@@ -873,7 +888,8 @@ struct fuse_init_out {
 	uint16_t	max_pages;
 	uint16_t	map_alignment;
 	uint32_t	flags2;
-	uint32_t	unused[7];
+	uint32_t	max_stack_depth;
+	uint32_t	unused[6];
 };
 
 #define CUSE_INIT_INFO_MAX 4096
@@ -956,6 +972,14 @@ struct fuse_fallocate_in {
 	uint32_t	padding;
 };
 
+/**
+ * FUSE request unique ID flag
+ *
+ * Indicates whether this is a resend request. The receiver should handle this
+ * request accordingly.
+ */
+#define FUSE_UNIQUE_RESEND (1ULL << 63)
+
 struct fuse_in_header {
 	uint32_t	len;
 	uint32_t	opcode;
@@ -1045,9 +1069,18 @@ struct fuse_notify_retrieve_in {
 	uint64_t	dummy4;
 };
 
+struct fuse_backing_map {
+	int32_t		fd;
+	uint32_t	flags;
+	uint64_t	padding;
+};
+
 /* Device ioctls: */
 #define FUSE_DEV_IOC_MAGIC		229
 #define FUSE_DEV_IOC_CLONE		_IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t)
+#define FUSE_DEV_IOC_BACKING_OPEN	_IOW(FUSE_DEV_IOC_MAGIC, 1, \
+					     struct fuse_backing_map)
+#define FUSE_DEV_IOC_BACKING_CLOSE	_IOW(FUSE_DEV_IOC_MAGIC, 2, uint32_t)
 
 struct fuse_lseek_in {
 	uint64_t	fh;
diff --git a/include/standard-headers/linux/input-event-codes.h b/include/standard-headers/linux/input-event-codes.h
index f6bab08540d..2221b0c3834 100644
--- a/include/standard-headers/linux/input-event-codes.h
+++ b/include/standard-headers/linux/input-event-codes.h
@@ -602,6 +602,7 @@
 
 #define KEY_ALS_TOGGLE		0x230	/* Ambient light sensor */
 #define KEY_ROTATE_LOCK_TOGGLE	0x231	/* Display rotation lock */
+#define KEY_REFRESH_RATE_TOGGLE	0x232	/* Display refresh rate toggle */
 
 #define KEY_BUTTONCONFIG		0x240	/* AL Button Configuration */
 #define KEY_TASKMANAGER		0x241	/* AL Task/Project Manager */
diff --git a/include/standard-headers/linux/virtio_gpu.h b/include/standard-headers/linux/virtio_gpu.h
index 2da48d3d4c2..2db643ed8fb 100644
--- a/include/standard-headers/linux/virtio_gpu.h
+++ b/include/standard-headers/linux/virtio_gpu.h
@@ -309,6 +309,8 @@ struct virtio_gpu_cmd_submit {
 
 #define VIRTIO_GPU_CAPSET_VIRGL 1
 #define VIRTIO_GPU_CAPSET_VIRGL2 2
+/* 3 is reserved for gfxstream */
+#define VIRTIO_GPU_CAPSET_VENUS 4
 
 /* VIRTIO_GPU_CMD_GET_CAPSET_INFO */
 struct virtio_gpu_get_capset_info {
diff --git a/include/standard-headers/linux/virtio_snd.h b/include/standard-headers/linux/virtio_snd.h
index 1af96b9fc61..860f12e0a4e 100644
--- a/include/standard-headers/linux/virtio_snd.h
+++ b/include/standard-headers/linux/virtio_snd.h
@@ -7,6 +7,14 @@
 
 #include "standard-headers/linux/virtio_types.h"
 
+/*******************************************************************************
+ * FEATURE BITS
+ */
+enum {
+	/* device supports control elements */
+	VIRTIO_SND_F_CTLS = 0
+};
+
 /*******************************************************************************
  * CONFIGURATION SPACE
  */
@@ -17,6 +25,8 @@ struct virtio_snd_config {
 	uint32_t streams;
 	/* # of available channel maps */
 	uint32_t chmaps;
+	/* # of available control elements */
+	uint32_t controls;
 };
 
 enum {
@@ -55,6 +65,15 @@ enum {
 	/* channel map control request types */
 	VIRTIO_SND_R_CHMAP_INFO = 0x0200,
 
+	/* control element request types */
+	VIRTIO_SND_R_CTL_INFO = 0x0300,
+	VIRTIO_SND_R_CTL_ENUM_ITEMS,
+	VIRTIO_SND_R_CTL_READ,
+	VIRTIO_SND_R_CTL_WRITE,
+	VIRTIO_SND_R_CTL_TLV_READ,
+	VIRTIO_SND_R_CTL_TLV_WRITE,
+	VIRTIO_SND_R_CTL_TLV_COMMAND,
+
 	/* jack event types */
 	VIRTIO_SND_EVT_JACK_CONNECTED = 0x1000,
 	VIRTIO_SND_EVT_JACK_DISCONNECTED,
@@ -63,6 +82,9 @@ enum {
 	VIRTIO_SND_EVT_PCM_PERIOD_ELAPSED = 0x1100,
 	VIRTIO_SND_EVT_PCM_XRUN,
 
+	/* control element event types */
+	VIRTIO_SND_EVT_CTL_NOTIFY = 0x1200,
+
 	/* common status codes */
 	VIRTIO_SND_S_OK = 0x8000,
 	VIRTIO_SND_S_BAD_MSG,
@@ -331,4 +353,136 @@ struct virtio_snd_chmap_info {
 	uint8_t positions[VIRTIO_SND_CHMAP_MAX_SIZE];
 };
 
+/*******************************************************************************
+ * CONTROL ELEMENTS MESSAGES
+ */
+struct virtio_snd_ctl_hdr {
+	/* VIRTIO_SND_R_CTL_XXX */
+	struct virtio_snd_hdr hdr;
+	/* 0 ... virtio_snd_config::controls - 1 */
+	uint32_t control_id;
+};
+
+/* supported roles for control elements */
+enum {
+	VIRTIO_SND_CTL_ROLE_UNDEFINED = 0,
+	VIRTIO_SND_CTL_ROLE_VOLUME,
+	VIRTIO_SND_CTL_ROLE_MUTE,
+	VIRTIO_SND_CTL_ROLE_GAIN
+};
+
+/* supported value types for control elements */
+enum {
+	VIRTIO_SND_CTL_TYPE_BOOLEAN = 0,
+	VIRTIO_SND_CTL_TYPE_INTEGER,
+	VIRTIO_SND_CTL_TYPE_INTEGER64,
+	VIRTIO_SND_CTL_TYPE_ENUMERATED,
+	VIRTIO_SND_CTL_TYPE_BYTES,
+	VIRTIO_SND_CTL_TYPE_IEC958
+};
+
+/* supported access rights for control elements */
+enum {
+	VIRTIO_SND_CTL_ACCESS_READ = 0,
+	VIRTIO_SND_CTL_ACCESS_WRITE,
+	VIRTIO_SND_CTL_ACCESS_VOLATILE,
+	VIRTIO_SND_CTL_ACCESS_INACTIVE,
+	VIRTIO_SND_CTL_ACCESS_TLV_READ,
+	VIRTIO_SND_CTL_ACCESS_TLV_WRITE,
+	VIRTIO_SND_CTL_ACCESS_TLV_COMMAND
+};
+
+struct virtio_snd_ctl_info {
+	/* common header */
+	struct virtio_snd_info hdr;
+	/* element role (VIRTIO_SND_CTL_ROLE_XXX) */
+	uint32_t role;
+	/* element value type (VIRTIO_SND_CTL_TYPE_XXX) */
+	uint32_t type;
+	/* element access right bit map (1 << VIRTIO_SND_CTL_ACCESS_XXX) */
+	uint32_t access;
+	/* # of members in the element value */
+	uint32_t count;
+	/* index for an element with a non-unique name */
+	uint32_t index;
+	/* name identifier string for the element */
+	uint8_t name[44];
+	/* additional information about the element's value */
+	union {
+		/* VIRTIO_SND_CTL_TYPE_INTEGER */
+		struct {
+			/* minimum supported value */
+			uint32_t min;
+			/* maximum supported value */
+			uint32_t max;
+			/* fixed step size for value (0 = variable size) */
+			uint32_t step;
+		} integer;
+		/* VIRTIO_SND_CTL_TYPE_INTEGER64 */
+		struct {
+			/* minimum supported value */
+			uint64_t min;
+			/* maximum supported value */
+			uint64_t max;
+			/* fixed step size for value (0 = variable size) */
+			uint64_t step;
+		} integer64;
+		/* VIRTIO_SND_CTL_TYPE_ENUMERATED */
+		struct {
+			/* # of options supported for value */
+			uint32_t items;
+		} enumerated;
+	} value;
+};
+
+struct virtio_snd_ctl_enum_item {
+	/* option name */
+	uint8_t item[64];
+};
+
+struct virtio_snd_ctl_iec958 {
+	/* AES/IEC958 channel status bits */
+	uint8_t status[24];
+	/* AES/IEC958 subcode bits */
+	uint8_t subcode[147];
+	/* nothing */
+	uint8_t pad;
+	/* AES/IEC958 subframe bits */
+	uint8_t dig_subframe[4];
+};
+
+struct virtio_snd_ctl_value {
+	union {
+		/* VIRTIO_SND_CTL_TYPE_BOOLEAN|INTEGER value */
+		uint32_t integer[128];
+		/* VIRTIO_SND_CTL_TYPE_INTEGER64 value */
+		uint64_t integer64[64];
+		/* VIRTIO_SND_CTL_TYPE_ENUMERATED value (option indexes) */
+		uint32_t enumerated[128];
+		/* VIRTIO_SND_CTL_TYPE_BYTES value */
+		uint8_t bytes[512];
+		/* VIRTIO_SND_CTL_TYPE_IEC958 value */
+		struct virtio_snd_ctl_iec958 iec958;
+	} value;
+};
+
+/* supported event reason types */
+enum {
+	/* element's value has changed */
+	VIRTIO_SND_CTL_EVT_MASK_VALUE = 0,
+	/* element's information has changed */
+	VIRTIO_SND_CTL_EVT_MASK_INFO,
+	/* element's metadata has changed */
+	VIRTIO_SND_CTL_EVT_MASK_TLV
+};
+
+struct virtio_snd_ctl_event {
+	/* VIRTIO_SND_EVT_CTL_NOTIFY */
+	struct virtio_snd_hdr hdr;
+	/* 0 ... virtio_snd_config::controls - 1 */
+	uint16_t control_id;
+	/* event reason bit map (1 << VIRTIO_SND_CTL_EVT_MASK_XXX) */
+	uint16_t mask;
+};
+
 #endif /* VIRTIO_SND_IF_H */
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index c59ea55cd8e..2af9931ae98 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -37,9 +37,7 @@
 #include <asm/ptrace.h>
 #include <asm/sve_context.h>
 
-#define __KVM_HAVE_GUEST_DEBUG
 #define __KVM_HAVE_IRQ_LINE
-#define __KVM_HAVE_READONLY_MEM
 #define __KVM_HAVE_VCPU_EVENTS
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
@@ -76,11 +74,11 @@ struct kvm_regs {
 
 /* KVM_ARM_SET_DEVICE_ADDR ioctl id encoding */
 #define KVM_ARM_DEVICE_TYPE_SHIFT	0
-#define KVM_ARM_DEVICE_TYPE_MASK	GENMASK(KVM_ARM_DEVICE_TYPE_SHIFT + 15, \
-						KVM_ARM_DEVICE_TYPE_SHIFT)
+#define KVM_ARM_DEVICE_TYPE_MASK	__GENMASK(KVM_ARM_DEVICE_TYPE_SHIFT + 15, \
+						  KVM_ARM_DEVICE_TYPE_SHIFT)
 #define KVM_ARM_DEVICE_ID_SHIFT		16
-#define KVM_ARM_DEVICE_ID_MASK		GENMASK(KVM_ARM_DEVICE_ID_SHIFT + 15, \
-						KVM_ARM_DEVICE_ID_SHIFT)
+#define KVM_ARM_DEVICE_ID_MASK		__GENMASK(KVM_ARM_DEVICE_ID_SHIFT + 15, \
+						  KVM_ARM_DEVICE_ID_SHIFT)
 
 /* Supported device IDs */
 #define KVM_ARM_DEVICE_VGIC_V2		0
@@ -162,6 +160,11 @@ struct kvm_sync_regs {
 	__u64 device_irq_level;
 };
 
+/* Bits for run->s.regs.device_irq_level */
+#define KVM_ARM_DEV_EL1_VTIMER		(1 << 0)
+#define KVM_ARM_DEV_EL1_PTIMER		(1 << 1)
+#define KVM_ARM_DEV_PMU			(1 << 2)
+
 /*
  * PMU filter structure. Describe a range of events with a particular
  * action. To be used with KVM_ARM_VCPU_PMU_V3_FILTER.
diff --git a/linux-headers/asm-arm64/sve_context.h b/linux-headers/asm-arm64/sve_context.h
index 1d0e3e1d095..d1b1ec8cb1f 100644
--- a/linux-headers/asm-arm64/sve_context.h
+++ b/linux-headers/asm-arm64/sve_context.h
@@ -13,6 +13,17 @@
 
 #define __SVE_VQ_BYTES		16	/* number of bytes per quadword */
 
+/*
+ * Yes, __SVE_VQ_MAX is 512 QUADWORDS.
+ *
+ * To help ensure forward portability, this is much larger than the
+ * current maximum value defined by the SVE architecture.  While arrays
+ * or static allocations can be sized based on this value, watch out!
+ * It will waste a surprisingly large amount of memory.
+ *
+ * Dynamic sizing based on the actual runtime vector length is likely to
+ * be preferable for most purposes.
+ */
 #define __SVE_VQ_MIN		1
 #define __SVE_VQ_MAX		512
 
diff --git a/linux-headers/asm-generic/bitsperlong.h b/linux-headers/asm-generic/bitsperlong.h
index 75f320fa91e..1fb4f0c9f27 100644
--- a/linux-headers/asm-generic/bitsperlong.h
+++ b/linux-headers/asm-generic/bitsperlong.h
@@ -24,4 +24,8 @@
 #endif
 #endif
 
+#ifndef __BITS_PER_LONG_LONG
+#define __BITS_PER_LONG_LONG 64
+#endif
+
 #endif /* __ASM_GENERIC_BITS_PER_LONG */
diff --git a/linux-headers/asm-loongarch/kvm.h b/linux-headers/asm-loongarch/kvm.h
index 923d0bd3829..109785922cf 100644
--- a/linux-headers/asm-loongarch/kvm.h
+++ b/linux-headers/asm-loongarch/kvm.h
@@ -14,8 +14,6 @@
  * Some parts derived from the x86 version of this file.
  */
 
-#define __KVM_HAVE_READONLY_MEM
-
 #define KVM_COALESCED_MMIO_PAGE_OFFSET	1
 #define KVM_DIRTY_LOG_PAGE_OFFSET	64
 
diff --git a/linux-headers/asm-mips/kvm.h b/linux-headers/asm-mips/kvm.h
index edcf717c432..9673dc9cb31 100644
--- a/linux-headers/asm-mips/kvm.h
+++ b/linux-headers/asm-mips/kvm.h
@@ -20,8 +20,6 @@
  * Some parts derived from the x86 version of this file.
  */
 
-#define __KVM_HAVE_READONLY_MEM
-
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
 /*
diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h
index 9f18fa090f1..1691297a766 100644
--- a/linux-headers/asm-powerpc/kvm.h
+++ b/linux-headers/asm-powerpc/kvm.h
@@ -28,7 +28,6 @@
 #define __KVM_HAVE_PPC_SMT
 #define __KVM_HAVE_IRQCHIP
 #define __KVM_HAVE_IRQ_LINE
-#define __KVM_HAVE_GUEST_DEBUG
 
 /* Not always available, but if it is, this is the correct offset.  */
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
@@ -733,4 +732,48 @@ struct kvm_ppc_xive_eq {
 #define KVM_XIVE_TIMA_PAGE_OFFSET	0
 #define KVM_XIVE_ESB_PAGE_OFFSET	4
 
+/* for KVM_PPC_GET_PVINFO */
+
+#define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)
+
+struct kvm_ppc_pvinfo {
+	/* out */
+	__u32 flags;
+	__u32 hcall[4];
+	__u8  pad[108];
+};
+
+/* for KVM_PPC_GET_SMMU_INFO */
+#define KVM_PPC_PAGE_SIZES_MAX_SZ	8
+
+struct kvm_ppc_one_page_size {
+	__u32 page_shift;	/* Page shift (or 0) */
+	__u32 pte_enc;		/* Encoding in the HPTE (>>12) */
+};
+
+struct kvm_ppc_one_seg_page_size {
+	__u32 page_shift;	/* Base page shift of segment (or 0) */
+	__u32 slb_enc;		/* SLB encoding for BookS */
+	struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
+};
+
+#define KVM_PPC_PAGE_SIZES_REAL		0x00000001
+#define KVM_PPC_1T_SEGMENTS		0x00000002
+#define KVM_PPC_NO_HASH			0x00000004
+
+struct kvm_ppc_smmu_info {
+	__u64 flags;
+	__u32 slb_size;
+	__u16 data_keys;	/* # storage keys supported for data */
+	__u16 instr_keys;	/* # storage keys supported for instructions */
+	struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
+};
+
+/* for KVM_PPC_RESIZE_HPT_{PREPARE,COMMIT} */
+struct kvm_ppc_resize_hpt {
+	__u64 flags;
+	__u32 shift;
+	__u32 pad;
+};
+
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/linux-headers/asm-riscv/kvm.h b/linux-headers/asm-riscv/kvm.h
index 7499e88a947..b1c503c2959 100644
--- a/linux-headers/asm-riscv/kvm.h
+++ b/linux-headers/asm-riscv/kvm.h
@@ -16,7 +16,6 @@
 #include <asm/ptrace.h>
 
 #define __KVM_HAVE_IRQ_LINE
-#define __KVM_HAVE_READONLY_MEM
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
@@ -166,6 +165,8 @@ enum KVM_RISCV_ISA_EXT_ID {
 	KVM_RISCV_ISA_EXT_ZVFH,
 	KVM_RISCV_ISA_EXT_ZVFHMIN,
 	KVM_RISCV_ISA_EXT_ZFA,
+	KVM_RISCV_ISA_EXT_ZTSO,
+	KVM_RISCV_ISA_EXT_ZACAS,
 	KVM_RISCV_ISA_EXT_MAX,
 };
 
diff --git a/linux-headers/asm-s390/kvm.h b/linux-headers/asm-s390/kvm.h
index 023a2763a97..684c4e1205d 100644
--- a/linux-headers/asm-s390/kvm.h
+++ b/linux-headers/asm-s390/kvm.h
@@ -12,7 +12,320 @@
 #include <linux/types.h>
 
 #define __KVM_S390
-#define __KVM_HAVE_GUEST_DEBUG
+
+struct kvm_s390_skeys {
+	__u64 start_gfn;
+	__u64 count;
+	__u64 skeydata_addr;
+	__u32 flags;
+	__u32 reserved[9];
+};
+
+#define KVM_S390_CMMA_PEEK (1 << 0)
+
+/**
+ * kvm_s390_cmma_log - Used for CMMA migration.
+ *
+ * Used both for input and output.
+ *
+ * @start_gfn: Guest page number to start from.
+ * @count: Size of the result buffer.
+ * @flags: Control operation mode via KVM_S390_CMMA_* flags
+ * @remaining: Used with KVM_S390_GET_CMMA_BITS. Indicates how many dirty
+ *             pages are still remaining.
+ * @mask: Used with KVM_S390_SET_CMMA_BITS. Bitmap of bits to actually set
+ *        in the PGSTE.
+ * @values: Pointer to the values buffer.
+ *
+ * Used in KVM_S390_{G,S}ET_CMMA_BITS ioctls.
+ */
+struct kvm_s390_cmma_log {
+	__u64 start_gfn;
+	__u32 count;
+	__u32 flags;
+	union {
+		__u64 remaining;
+		__u64 mask;
+	};
+	__u64 values;
+};
+
+#define KVM_S390_RESET_POR       1
+#define KVM_S390_RESET_CLEAR     2
+#define KVM_S390_RESET_SUBSYSTEM 4
+#define KVM_S390_RESET_CPU_INIT  8
+#define KVM_S390_RESET_IPL       16
+
+/* for KVM_S390_MEM_OP */
+struct kvm_s390_mem_op {
+	/* in */
+	__u64 gaddr;		/* the guest address */
+	__u64 flags;		/* flags */
+	__u32 size;		/* amount of bytes */
+	__u32 op;		/* type of operation */
+	__u64 buf;		/* buffer in userspace */
+	union {
+		struct {
+			__u8 ar;	/* the access register number */
+			__u8 key;	/* access key, ignored if flag unset */
+			__u8 pad1[6];	/* ignored */
+			__u64 old_addr;	/* ignored if cmpxchg flag unset */
+		};
+		__u32 sida_offset; /* offset into the sida */
+		__u8 reserved[32]; /* ignored */
+	};
+};
+/* types for kvm_s390_mem_op->op */
+#define KVM_S390_MEMOP_LOGICAL_READ	0
+#define KVM_S390_MEMOP_LOGICAL_WRITE	1
+#define KVM_S390_MEMOP_SIDA_READ	2
+#define KVM_S390_MEMOP_SIDA_WRITE	3
+#define KVM_S390_MEMOP_ABSOLUTE_READ	4
+#define KVM_S390_MEMOP_ABSOLUTE_WRITE	5
+#define KVM_S390_MEMOP_ABSOLUTE_CMPXCHG	6
+
+/* flags for kvm_s390_mem_op->flags */
+#define KVM_S390_MEMOP_F_CHECK_ONLY		(1ULL << 0)
+#define KVM_S390_MEMOP_F_INJECT_EXCEPTION	(1ULL << 1)
+#define KVM_S390_MEMOP_F_SKEY_PROTECTION	(1ULL << 2)
+
+/* flags specifying extension support via KVM_CAP_S390_MEM_OP_EXTENSION */
+#define KVM_S390_MEMOP_EXTENSION_CAP_BASE	(1 << 0)
+#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG	(1 << 1)
+
+struct kvm_s390_psw {
+	__u64 mask;
+	__u64 addr;
+};
+
+/* valid values for type in kvm_s390_interrupt */
+#define KVM_S390_SIGP_STOP		0xfffe0000u
+#define KVM_S390_PROGRAM_INT		0xfffe0001u
+#define KVM_S390_SIGP_SET_PREFIX	0xfffe0002u
+#define KVM_S390_RESTART		0xfffe0003u
+#define KVM_S390_INT_PFAULT_INIT	0xfffe0004u
+#define KVM_S390_INT_PFAULT_DONE	0xfffe0005u
+#define KVM_S390_MCHK			0xfffe1000u
+#define KVM_S390_INT_CLOCK_COMP		0xffff1004u
+#define KVM_S390_INT_CPU_TIMER		0xffff1005u
+#define KVM_S390_INT_VIRTIO		0xffff2603u
+#define KVM_S390_INT_SERVICE		0xffff2401u
+#define KVM_S390_INT_EMERGENCY		0xffff1201u
+#define KVM_S390_INT_EXTERNAL_CALL	0xffff1202u
+/* Anything below 0xfffe0000u is taken by INT_IO */
+#define KVM_S390_INT_IO(ai,cssid,ssid,schid)   \
+	(((schid)) |			       \
+	 ((ssid) << 16) |		       \
+	 ((cssid) << 18) |		       \
+	 ((ai) << 26))
+#define KVM_S390_INT_IO_MIN		0x00000000u
+#define KVM_S390_INT_IO_MAX		0xfffdffffu
+#define KVM_S390_INT_IO_AI_MASK		0x04000000u
+
+
+struct kvm_s390_interrupt {
+	__u32 type;
+	__u32 parm;
+	__u64 parm64;
+};
+
+struct kvm_s390_io_info {
+	__u16 subchannel_id;
+	__u16 subchannel_nr;
+	__u32 io_int_parm;
+	__u32 io_int_word;
+};
+
+struct kvm_s390_ext_info {
+	__u32 ext_params;
+	__u32 pad;
+	__u64 ext_params2;
+};
+
+struct kvm_s390_pgm_info {
+	__u64 trans_exc_code;
+	__u64 mon_code;
+	__u64 per_address;
+	__u32 data_exc_code;
+	__u16 code;
+	__u16 mon_class_nr;
+	__u8 per_code;
+	__u8 per_atmid;
+	__u8 exc_access_id;
+	__u8 per_access_id;
+	__u8 op_access_id;
+#define KVM_S390_PGM_FLAGS_ILC_VALID	0x01
+#define KVM_S390_PGM_FLAGS_ILC_0	0x02
+#define KVM_S390_PGM_FLAGS_ILC_1	0x04
+#define KVM_S390_PGM_FLAGS_ILC_MASK	0x06
+#define KVM_S390_PGM_FLAGS_NO_REWIND	0x08
+	__u8 flags;
+	__u8 pad[2];
+};
+
+struct kvm_s390_prefix_info {
+	__u32 address;
+};
+
+struct kvm_s390_extcall_info {
+	__u16 code;
+};
+
+struct kvm_s390_emerg_info {
+	__u16 code;
+};
+
+#define KVM_S390_STOP_FLAG_STORE_STATUS	0x01
+struct kvm_s390_stop_info {
+	__u32 flags;
+};
+
+struct kvm_s390_mchk_info {
+	__u64 cr14;
+	__u64 mcic;
+	__u64 failing_storage_address;
+	__u32 ext_damage_code;
+	__u32 pad;
+	__u8 fixed_logout[16];
+};
+
+struct kvm_s390_irq {
+	__u64 type;
+	union {
+		struct kvm_s390_io_info io;
+		struct kvm_s390_ext_info ext;
+		struct kvm_s390_pgm_info pgm;
+		struct kvm_s390_emerg_info emerg;
+		struct kvm_s390_extcall_info extcall;
+		struct kvm_s390_prefix_info prefix;
+		struct kvm_s390_stop_info stop;
+		struct kvm_s390_mchk_info mchk;
+		char reserved[64];
+	} u;
+};
+
+struct kvm_s390_irq_state {
+	__u64 buf;
+	__u32 flags;        /* will stay unused for compatibility reasons */
+	__u32 len;
+	__u32 reserved[4];  /* will stay unused for compatibility reasons */
+};
+
+struct kvm_s390_ucas_mapping {
+	__u64 user_addr;
+	__u64 vcpu_addr;
+	__u64 length;
+};
+
+struct kvm_s390_pv_sec_parm {
+	__u64 origin;
+	__u64 length;
+};
+
+struct kvm_s390_pv_unp {
+	__u64 addr;
+	__u64 size;
+	__u64 tweak;
+};
+
+enum pv_cmd_dmp_id {
+	KVM_PV_DUMP_INIT,
+	KVM_PV_DUMP_CONFIG_STOR_STATE,
+	KVM_PV_DUMP_COMPLETE,
+	KVM_PV_DUMP_CPU,
+};
+
+struct kvm_s390_pv_dmp {
+	__u64 subcmd;
+	__u64 buff_addr;
+	__u64 buff_len;
+	__u64 gaddr;		/* For dump storage state */
+	__u64 reserved[4];
+};
+
+enum pv_cmd_info_id {
+	KVM_PV_INFO_VM,
+	KVM_PV_INFO_DUMP,
+};
+
+struct kvm_s390_pv_info_dump {
+	__u64 dump_cpu_buffer_len;
+	__u64 dump_config_mem_buffer_per_1m;
+	__u64 dump_config_finalize_len;
+};
+
+struct kvm_s390_pv_info_vm {
+	__u64 inst_calls_list[4];
+	__u64 max_cpus;
+	__u64 max_guests;
+	__u64 max_guest_addr;
+	__u64 feature_indication;
+};
+
+struct kvm_s390_pv_info_header {
+	__u32 id;
+	__u32 len_max;
+	__u32 len_written;
+	__u32 reserved;
+};
+
+struct kvm_s390_pv_info {
+	struct kvm_s390_pv_info_header header;
+	union {
+		struct kvm_s390_pv_info_dump dump;
+		struct kvm_s390_pv_info_vm vm;
+	};
+};
+
+enum pv_cmd_id {
+	KVM_PV_ENABLE,
+	KVM_PV_DISABLE,
+	KVM_PV_SET_SEC_PARMS,
+	KVM_PV_UNPACK,
+	KVM_PV_VERIFY,
+	KVM_PV_PREP_RESET,
+	KVM_PV_UNSHARE_ALL,
+	KVM_PV_INFO,
+	KVM_PV_DUMP,
+	KVM_PV_ASYNC_CLEANUP_PREPARE,
+	KVM_PV_ASYNC_CLEANUP_PERFORM,
+};
+
+struct kvm_pv_cmd {
+	__u32 cmd;	/* Command to be executed */
+	__u16 rc;	/* Ultravisor return code */
+	__u16 rrc;	/* Ultravisor return reason code */
+	__u64 data;	/* Data or address */
+	__u32 flags;    /* flags for future extensions. Must be 0 for now */
+	__u32 reserved[3];
+};
+
+struct kvm_s390_zpci_op {
+	/* in */
+	__u32 fh;               /* target device */
+	__u8  op;               /* operation to perform */
+	__u8  pad[3];
+	union {
+		/* for KVM_S390_ZPCIOP_REG_AEN */
+		struct {
+			__u64 ibv;      /* Guest addr of interrupt bit vector */
+			__u64 sb;       /* Guest addr of summary bit */
+			__u32 flags;
+			__u32 noi;      /* Number of interrupts */
+			__u8 isc;       /* Guest interrupt subclass */
+			__u8 sbo;       /* Offset of guest summary bit vector */
+			__u16 pad;
+		} reg_aen;
+		__u64 reserved[8];
+	} u;
+};
+
+/* types for kvm_s390_zpci_op->op */
+#define KVM_S390_ZPCIOP_REG_AEN                0
+#define KVM_S390_ZPCIOP_DEREG_AEN      1
+
+/* flags for kvm_s390_zpci_op->u.reg_aen.flags */
+#define KVM_S390_ZPCIOP_REGAEN_HOST    (1 << 0)
 
 /* Device control API: s390-specific devices */
 #define KVM_DEV_FLIC_GET_ALL_IRQS	1
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 003fb745347..a551e44b1c8 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -7,6 +7,8 @@
  *
  */
 
+#include <linux/const.h>
+#include <linux/bits.h>
 #include <linux/types.h>
 #include <linux/ioctl.h>
 #include <linux/stddef.h>
@@ -40,7 +42,6 @@
 #define __KVM_HAVE_IRQ_LINE
 #define __KVM_HAVE_MSI
 #define __KVM_HAVE_USER_NMI
-#define __KVM_HAVE_GUEST_DEBUG
 #define __KVM_HAVE_MSIX
 #define __KVM_HAVE_MCE
 #define __KVM_HAVE_PIT_STATE2
@@ -49,7 +50,6 @@
 #define __KVM_HAVE_DEBUGREGS
 #define __KVM_HAVE_XSAVE
 #define __KVM_HAVE_XCRS
-#define __KVM_HAVE_READONLY_MEM
 
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
@@ -457,6 +457,7 @@ struct kvm_sync_regs {
 
 /* attributes for system fd (group 0) */
 #define KVM_X86_XCOMP_GUEST_SUPP	0
+#define KVM_X86_SEV_VMSA_FEATURES	1
 
 struct kvm_vmx_nested_state_data {
 	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
@@ -524,9 +525,353 @@ struct kvm_pmu_event_filter {
 #define KVM_PMU_EVENT_ALLOW 0
 #define KVM_PMU_EVENT_DENY 1
 
-#define KVM_PMU_EVENT_FLAG_MASKED_EVENTS BIT(0)
+#define KVM_PMU_EVENT_FLAG_MASKED_EVENTS _BITUL(0)
 #define KVM_PMU_EVENT_FLAGS_VALID_MASK (KVM_PMU_EVENT_FLAG_MASKED_EVENTS)
 
+/* for KVM_CAP_MCE */
+struct kvm_x86_mce {
+	__u64 status;
+	__u64 addr;
+	__u64 misc;
+	__u64 mcg_status;
+	__u8 bank;
+	__u8 pad1[7];
+	__u64 pad2[3];
+};
+
+/* for KVM_CAP_XEN_HVM */
+#define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR	(1 << 0)
+#define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL	(1 << 1)
+#define KVM_XEN_HVM_CONFIG_SHARED_INFO		(1 << 2)
+#define KVM_XEN_HVM_CONFIG_RUNSTATE		(1 << 3)
+#define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL	(1 << 4)
+#define KVM_XEN_HVM_CONFIG_EVTCHN_SEND		(1 << 5)
+#define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG	(1 << 6)
+#define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE	(1 << 7)
+#define KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA	(1 << 8)
+
+struct kvm_xen_hvm_config {
+	__u32 flags;
+	__u32 msr;
+	__u64 blob_addr_32;
+	__u64 blob_addr_64;
+	__u8 blob_size_32;
+	__u8 blob_size_64;
+	__u8 pad2[30];
+};
+
+struct kvm_xen_hvm_attr {
+	__u16 type;
+	__u16 pad[3];
+	union {
+		__u8 long_mode;
+		__u8 vector;
+		__u8 runstate_update_flag;
+		union {
+			__u64 gfn;
+#define KVM_XEN_INVALID_GFN ((__u64)-1)
+			__u64 hva;
+		} shared_info;
+		struct {
+			__u32 send_port;
+			__u32 type; /* EVTCHNSTAT_ipi / EVTCHNSTAT_interdomain */
+			__u32 flags;
+#define KVM_XEN_EVTCHN_DEASSIGN		(1 << 0)
+#define KVM_XEN_EVTCHN_UPDATE		(1 << 1)
+#define KVM_XEN_EVTCHN_RESET		(1 << 2)
+			/*
+			 * Events sent by the guest are either looped back to
+			 * the guest itself (potentially on a different port#)
+			 * or signalled via an eventfd.
+			 */
+			union {
+				struct {
+					__u32 port;
+					__u32 vcpu;
+					__u32 priority;
+				} port;
+				struct {
+					__u32 port; /* Zero for eventfd */
+					__s32 fd;
+				} eventfd;
+				__u32 padding[4];
+			} deliver;
+		} evtchn;
+		__u32 xen_version;
+		__u64 pad[8];
+	} u;
+};
+
+
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO */
+#define KVM_XEN_ATTR_TYPE_LONG_MODE		0x0
+#define KVM_XEN_ATTR_TYPE_SHARED_INFO		0x1
+#define KVM_XEN_ATTR_TYPE_UPCALL_VECTOR		0x2
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */
+#define KVM_XEN_ATTR_TYPE_EVTCHN		0x3
+#define KVM_XEN_ATTR_TYPE_XEN_VERSION		0x4
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG */
+#define KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG	0x5
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA */
+#define KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA	0x6
+
+struct kvm_xen_vcpu_attr {
+	__u16 type;
+	__u16 pad[3];
+	union {
+		__u64 gpa;
+#define KVM_XEN_INVALID_GPA ((__u64)-1)
+		__u64 hva;
+		__u64 pad[8];
+		struct {
+			__u64 state;
+			__u64 state_entry_time;
+			__u64 time_running;
+			__u64 time_runnable;
+			__u64 time_blocked;
+			__u64 time_offline;
+		} runstate;
+		__u32 vcpu_id;
+		struct {
+			__u32 port;
+			__u32 priority;
+			__u64 expires_ns;
+		} timer;
+		__u8 vector;
+	} u;
+};
+
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO */
+#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO	0x0
+#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO	0x1
+#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR	0x2
+#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT	0x3
+#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA	0x4
+#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST	0x5
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */
+#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID		0x6
+#define KVM_XEN_VCPU_ATTR_TYPE_TIMER		0x7
+#define KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR	0x8
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA */
+#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA	0x9
+
+/* Secure Encrypted Virtualization command */
+enum sev_cmd_id {
+	/* Guest initialization commands */
+	KVM_SEV_INIT = 0,
+	KVM_SEV_ES_INIT,
+	/* Guest launch commands */
+	KVM_SEV_LAUNCH_START,
+	KVM_SEV_LAUNCH_UPDATE_DATA,
+	KVM_SEV_LAUNCH_UPDATE_VMSA,
+	KVM_SEV_LAUNCH_SECRET,
+	KVM_SEV_LAUNCH_MEASURE,
+	KVM_SEV_LAUNCH_FINISH,
+	/* Guest migration commands (outgoing) */
+	KVM_SEV_SEND_START,
+	KVM_SEV_SEND_UPDATE_DATA,
+	KVM_SEV_SEND_UPDATE_VMSA,
+	KVM_SEV_SEND_FINISH,
+	/* Guest migration commands (incoming) */
+	KVM_SEV_RECEIVE_START,
+	KVM_SEV_RECEIVE_UPDATE_DATA,
+	KVM_SEV_RECEIVE_UPDATE_VMSA,
+	KVM_SEV_RECEIVE_FINISH,
+	/* Guest status and debug commands */
+	KVM_SEV_GUEST_STATUS,
+	KVM_SEV_DBG_DECRYPT,
+	KVM_SEV_DBG_ENCRYPT,
+	/* Guest certificates commands */
+	KVM_SEV_CERT_EXPORT,
+	/* Attestation report */
+	KVM_SEV_GET_ATTESTATION_REPORT,
+	/* Guest Migration Extension */
+	KVM_SEV_SEND_CANCEL,
+
+	/* Second time is the charm; improved versions of the above ioctls.  */
+	KVM_SEV_INIT2,
+
+	/* SNP-specific commands */
+	KVM_SEV_SNP_INIT,
+	KVM_SEV_SNP_LAUNCH_START,
+	KVM_SEV_SNP_LAUNCH_UPDATE,
+	KVM_SEV_SNP_LAUNCH_FINISH,
+
+	KVM_SEV_NR_MAX,
+};
+
+struct kvm_sev_cmd {
+	__u32 id;
+	__u32 pad0;
+	__u64 data;
+	__u32 error;
+	__u32 sev_fd;
+};
+
+struct kvm_sev_init {
+	__u64 vmsa_features;
+	__u32 flags;
+	__u32 pad[9];
+};
+
+struct kvm_sev_launch_start {
+	__u32 handle;
+	__u32 policy;
+	__u64 dh_uaddr;
+	__u32 dh_len;
+	__u32 pad0;
+	__u64 session_uaddr;
+	__u32 session_len;
+	__u32 pad1;
+};
+
+struct kvm_sev_launch_update_data {
+	__u64 uaddr;
+	__u32 len;
+	__u32 pad0;
+};
+
+
+struct kvm_sev_launch_secret {
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u32 pad0;
+	__u64 guest_uaddr;
+	__u32 guest_len;
+	__u32 pad1;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+	__u32 pad2;
+};
+
+struct kvm_sev_launch_measure {
+	__u64 uaddr;
+	__u32 len;
+	__u32 pad0;
+};
+
+struct kvm_sev_guest_status {
+	__u32 handle;
+	__u32 policy;
+	__u32 state;
+};
+
+struct kvm_sev_dbg {
+	__u64 src_uaddr;
+	__u64 dst_uaddr;
+	__u32 len;
+	__u32 pad0;
+};
+
+struct kvm_sev_attestation_report {
+	__u8 mnonce[16];
+	__u64 uaddr;
+	__u32 len;
+	__u32 pad0;
+};
+
+struct kvm_sev_send_start {
+	__u32 policy;
+	__u32 pad0;
+	__u64 pdh_cert_uaddr;
+	__u32 pdh_cert_len;
+	__u32 pad1;
+	__u64 plat_certs_uaddr;
+	__u32 plat_certs_len;
+	__u32 pad2;
+	__u64 amd_certs_uaddr;
+	__u32 amd_certs_len;
+	__u32 pad3;
+	__u64 session_uaddr;
+	__u32 session_len;
+	__u32 pad4;
+};
+
+struct kvm_sev_send_update_data {
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u32 pad0;
+	__u64 guest_uaddr;
+	__u32 guest_len;
+	__u32 pad1;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+	__u32 pad2;
+};
+
+struct kvm_sev_receive_start {
+	__u32 handle;
+	__u32 policy;
+	__u64 pdh_uaddr;
+	__u32 pdh_len;
+	__u32 pad0;
+	__u64 session_uaddr;
+	__u32 session_len;
+	__u32 pad1;
+};
+
+struct kvm_sev_receive_update_data {
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u32 pad0;
+	__u64 guest_uaddr;
+	__u32 guest_len;
+	__u32 pad1;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+	__u32 pad2;
+};
+
+/* TODO: use a common struct via KVM_SEV_INIT2 */
+struct kvm_snp_init {
+	__u64 flags;
+};
+
+struct kvm_sev_snp_launch_start {
+	__u64 policy;
+	__u8 gosvw[16];
+};
+
+/* Kept in sync with firmware values for simplicity. */
+#define KVM_SEV_SNP_PAGE_TYPE_NORMAL		0x1
+#define KVM_SEV_SNP_PAGE_TYPE_ZERO		0x3
+#define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED	0x4
+#define KVM_SEV_SNP_PAGE_TYPE_SECRETS		0x5
+#define KVM_SEV_SNP_PAGE_TYPE_CPUID		0x6
+
+struct kvm_sev_snp_launch_update {
+	__u64 gfn_start;
+	__u64 uaddr;
+	__u32 len;
+	__u8 type;
+};
+
+#define KVM_SEV_SNP_ID_BLOCK_SIZE	96
+#define KVM_SEV_SNP_ID_AUTH_SIZE	4096
+#define KVM_SEV_SNP_FINISH_DATA_SIZE	32
+
+struct kvm_sev_snp_launch_finish {
+	__u64 id_block_uaddr;
+	__u64 id_auth_uaddr;
+	__u8 id_block_en;
+	__u8 auth_key_en;
+	__u8 host_data[KVM_SEV_SNP_FINISH_DATA_SIZE];
+	__u8 pad[6];
+};
+
+#define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
+#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
+
+struct kvm_hyperv_eventfd {
+	__u32 conn_id;
+	__s32 fd;
+	__u32 flags;
+	__u32 padding[3];
+};
+
+#define KVM_HYPERV_CONN_ID_MASK		0x00ffffff
+#define KVM_HYPERV_EVENTFD_DEASSIGN	(1 << 0)
+
 /*
  * Masked event layout.
  * Bits   Description
@@ -547,10 +892,10 @@ struct kvm_pmu_event_filter {
 	((__u64)(!!(exclude)) << 55))
 
 #define KVM_PMU_MASKED_ENTRY_EVENT_SELECT \
-	(GENMASK_ULL(7, 0) | GENMASK_ULL(35, 32))
-#define KVM_PMU_MASKED_ENTRY_UMASK_MASK		(GENMASK_ULL(63, 56))
-#define KVM_PMU_MASKED_ENTRY_UMASK_MATCH	(GENMASK_ULL(15, 8))
-#define KVM_PMU_MASKED_ENTRY_EXCLUDE		(BIT_ULL(55))
+	(__GENMASK_ULL(7, 0) | __GENMASK_ULL(35, 32))
+#define KVM_PMU_MASKED_ENTRY_UMASK_MASK		(__GENMASK_ULL(63, 56))
+#define KVM_PMU_MASKED_ENTRY_UMASK_MATCH	(__GENMASK_ULL(15, 8))
+#define KVM_PMU_MASKED_ENTRY_EXCLUDE		(_BITULL(55))
 #define KVM_PMU_MASKED_ENTRY_UMASK_MASK_SHIFT	(56)
 
 /* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
@@ -558,9 +903,12 @@ struct kvm_pmu_event_filter {
 #define   KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
 
 /* x86-specific KVM_EXIT_HYPERCALL flags. */
-#define KVM_EXIT_HYPERCALL_LONG_MODE	BIT(0)
+#define KVM_EXIT_HYPERCALL_LONG_MODE	_BITULL(0)
 
 #define KVM_X86_DEFAULT_VM	0
 #define KVM_X86_SW_PROTECTED_VM	1
+#define KVM_X86_SEV_VM		2
+#define KVM_X86_SEV_ES_VM	3
+#define KVM_X86_SNP_VM		4
 
 #endif /* _ASM_X86_KVM_H */
diff --git a/linux-headers/linux/bits.h b/linux-headers/linux/bits.h
new file mode 100644
index 00000000000..d9897771be8
--- /dev/null
+++ b/linux-headers/linux/bits.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* bits.h: Macros for dealing with bitmasks.  */
+
+#ifndef _LINUX_BITS_H
+#define _LINUX_BITS_H
+
+#define __GENMASK(h, l) \
+        (((~_UL(0)) - (_UL(1) << (l)) + 1) & \
+         (~_UL(0) >> (__BITS_PER_LONG - 1 - (h))))
+
+#define __GENMASK_ULL(h, l) \
+        (((~_ULL(0)) - (_ULL(1) << (l)) + 1) & \
+         (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
+
+#endif /* _LINUX_BITS_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 17839229b2a..629a015e4ed 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -16,6 +16,11 @@
 
 #define KVM_API_VERSION 12
 
+/*
+ * Backwards-compatible definitions.
+ */
+#define __KVM_HAVE_GUEST_DEBUG
+
 /* for KVM_SET_USER_MEMORY_REGION */
 struct kvm_userspace_memory_region {
 	__u32 slot;
@@ -85,43 +90,6 @@ struct kvm_pit_config {
 
 #define KVM_PIT_SPEAKER_DUMMY     1
 
-struct kvm_s390_skeys {
-	__u64 start_gfn;
-	__u64 count;
-	__u64 skeydata_addr;
-	__u32 flags;
-	__u32 reserved[9];
-};
-
-#define KVM_S390_CMMA_PEEK (1 << 0)
-
-/**
- * kvm_s390_cmma_log - Used for CMMA migration.
- *
- * Used both for input and output.
- *
- * @start_gfn: Guest page number to start from.
- * @count: Size of the result buffer.
- * @flags: Control operation mode via KVM_S390_CMMA_* flags
- * @remaining: Used with KVM_S390_GET_CMMA_BITS. Indicates how many dirty
- *             pages are still remaining.
- * @mask: Used with KVM_S390_SET_CMMA_BITS. Bitmap of bits to actually set
- *        in the PGSTE.
- * @values: Pointer to the values buffer.
- *
- * Used in KVM_S390_{G,S}ET_CMMA_BITS ioctls.
- */
-struct kvm_s390_cmma_log {
-	__u64 start_gfn;
-	__u32 count;
-	__u32 flags;
-	union {
-		__u64 remaining;
-		__u64 mask;
-	};
-	__u64 values;
-};
-
 struct kvm_hyperv_exit {
 #define KVM_EXIT_HYPERV_SYNIC          1
 #define KVM_EXIT_HYPERV_HCALL          2
@@ -167,6 +135,31 @@ struct kvm_xen_exit {
 	} u;
 };
 
+struct kvm_user_vmgexit {
+#define KVM_USER_VMGEXIT_PSC_MSR	1
+#define KVM_USER_VMGEXIT_PSC		2
+#define KVM_USER_VMGEXIT_EXT_GUEST_REQ	3
+	__u32 type; /* KVM_USER_VMGEXIT_* type */
+	union {
+		struct {
+			__u64 gpa;
+#define KVM_USER_VMGEXIT_PSC_MSR_OP_PRIVATE	1
+#define KVM_USER_VMGEXIT_PSC_MSR_OP_SHARED	2
+			__u8 op;
+			__u32 ret;
+		} psc_msr;
+		struct {
+			__u64 shared_gpa;
+			__u64 ret;
+		} psc;
+		struct {
+			__u64 data_gpa;
+			__u64 data_npages;
+			__u32 ret;
+		} ext_guest_req;
+	};
+};
+
 #define KVM_S390_GET_SKEYS_NONE   1
 #define KVM_S390_SKEYS_MAX        1048576
 
@@ -210,6 +203,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_LOONGARCH_IOCSR  38
 #define KVM_EXIT_MEMORY_FAULT     39
+#define KVM_EXIT_VMGEXIT          40
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -313,11 +307,6 @@ struct kvm_run {
 			__u32 ipb;
 		} s390_sieic;
 		/* KVM_EXIT_S390_RESET */
-#define KVM_S390_RESET_POR       1
-#define KVM_S390_RESET_CLEAR     2
-#define KVM_S390_RESET_SUBSYSTEM 4
-#define KVM_S390_RESET_CPU_INIT  8
-#define KVM_S390_RESET_IPL       16
 		__u64 s390_reset_flags;
 		/* KVM_EXIT_S390_UCONTROL */
 		struct {
@@ -466,6 +455,8 @@ struct kvm_run {
 			__u64 gpa;
 			__u64 size;
 		} memory_fault;
+		/* KVM_EXIT_VMGEXIT */
+		struct kvm_user_vmgexit vmgexit;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -532,43 +523,6 @@ struct kvm_translation {
 	__u8  pad[5];
 };
 
-/* for KVM_S390_MEM_OP */
-struct kvm_s390_mem_op {
-	/* in */
-	__u64 gaddr;		/* the guest address */
-	__u64 flags;		/* flags */
-	__u32 size;		/* amount of bytes */
-	__u32 op;		/* type of operation */
-	__u64 buf;		/* buffer in userspace */
-	union {
-		struct {
-			__u8 ar;	/* the access register number */
-			__u8 key;	/* access key, ignored if flag unset */
-			__u8 pad1[6];	/* ignored */
-			__u64 old_addr;	/* ignored if cmpxchg flag unset */
-		};
-		__u32 sida_offset; /* offset into the sida */
-		__u8 reserved[32]; /* ignored */
-	};
-};
-/* types for kvm_s390_mem_op->op */
-#define KVM_S390_MEMOP_LOGICAL_READ	0
-#define KVM_S390_MEMOP_LOGICAL_WRITE	1
-#define KVM_S390_MEMOP_SIDA_READ	2
-#define KVM_S390_MEMOP_SIDA_WRITE	3
-#define KVM_S390_MEMOP_ABSOLUTE_READ	4
-#define KVM_S390_MEMOP_ABSOLUTE_WRITE	5
-#define KVM_S390_MEMOP_ABSOLUTE_CMPXCHG	6
-
-/* flags for kvm_s390_mem_op->flags */
-#define KVM_S390_MEMOP_F_CHECK_ONLY		(1ULL << 0)
-#define KVM_S390_MEMOP_F_INJECT_EXCEPTION	(1ULL << 1)
-#define KVM_S390_MEMOP_F_SKEY_PROTECTION	(1ULL << 2)
-
-/* flags specifying extension support via KVM_CAP_S390_MEM_OP_EXTENSION */
-#define KVM_S390_MEMOP_EXTENSION_CAP_BASE	(1 << 0)
-#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG	(1 << 1)
-
 /* for KVM_INTERRUPT */
 struct kvm_interrupt {
 	/* in */
@@ -633,124 +587,6 @@ struct kvm_mp_state {
 	__u32 mp_state;
 };
 
-struct kvm_s390_psw {
-	__u64 mask;
-	__u64 addr;
-};
-
-/* valid values for type in kvm_s390_interrupt */
-#define KVM_S390_SIGP_STOP		0xfffe0000u
-#define KVM_S390_PROGRAM_INT		0xfffe0001u
-#define KVM_S390_SIGP_SET_PREFIX	0xfffe0002u
-#define KVM_S390_RESTART		0xfffe0003u
-#define KVM_S390_INT_PFAULT_INIT	0xfffe0004u
-#define KVM_S390_INT_PFAULT_DONE	0xfffe0005u
-#define KVM_S390_MCHK			0xfffe1000u
-#define KVM_S390_INT_CLOCK_COMP		0xffff1004u
-#define KVM_S390_INT_CPU_TIMER		0xffff1005u
-#define KVM_S390_INT_VIRTIO		0xffff2603u
-#define KVM_S390_INT_SERVICE		0xffff2401u
-#define KVM_S390_INT_EMERGENCY		0xffff1201u
-#define KVM_S390_INT_EXTERNAL_CALL	0xffff1202u
-/* Anything below 0xfffe0000u is taken by INT_IO */
-#define KVM_S390_INT_IO(ai,cssid,ssid,schid)   \
-	(((schid)) |			       \
-	 ((ssid) << 16) |		       \
-	 ((cssid) << 18) |		       \
-	 ((ai) << 26))
-#define KVM_S390_INT_IO_MIN		0x00000000u
-#define KVM_S390_INT_IO_MAX		0xfffdffffu
-#define KVM_S390_INT_IO_AI_MASK		0x04000000u
-
-
-struct kvm_s390_interrupt {
-	__u32 type;
-	__u32 parm;
-	__u64 parm64;
-};
-
-struct kvm_s390_io_info {
-	__u16 subchannel_id;
-	__u16 subchannel_nr;
-	__u32 io_int_parm;
-	__u32 io_int_word;
-};
-
-struct kvm_s390_ext_info {
-	__u32 ext_params;
-	__u32 pad;
-	__u64 ext_params2;
-};
-
-struct kvm_s390_pgm_info {
-	__u64 trans_exc_code;
-	__u64 mon_code;
-	__u64 per_address;
-	__u32 data_exc_code;
-	__u16 code;
-	__u16 mon_class_nr;
-	__u8 per_code;
-	__u8 per_atmid;
-	__u8 exc_access_id;
-	__u8 per_access_id;
-	__u8 op_access_id;
-#define KVM_S390_PGM_FLAGS_ILC_VALID	0x01
-#define KVM_S390_PGM_FLAGS_ILC_0	0x02
-#define KVM_S390_PGM_FLAGS_ILC_1	0x04
-#define KVM_S390_PGM_FLAGS_ILC_MASK	0x06
-#define KVM_S390_PGM_FLAGS_NO_REWIND	0x08
-	__u8 flags;
-	__u8 pad[2];
-};
-
-struct kvm_s390_prefix_info {
-	__u32 address;
-};
-
-struct kvm_s390_extcall_info {
-	__u16 code;
-};
-
-struct kvm_s390_emerg_info {
-	__u16 code;
-};
-
-#define KVM_S390_STOP_FLAG_STORE_STATUS	0x01
-struct kvm_s390_stop_info {
-	__u32 flags;
-};
-
-struct kvm_s390_mchk_info {
-	__u64 cr14;
-	__u64 mcic;
-	__u64 failing_storage_address;
-	__u32 ext_damage_code;
-	__u32 pad;
-	__u8 fixed_logout[16];
-};
-
-struct kvm_s390_irq {
-	__u64 type;
-	union {
-		struct kvm_s390_io_info io;
-		struct kvm_s390_ext_info ext;
-		struct kvm_s390_pgm_info pgm;
-		struct kvm_s390_emerg_info emerg;
-		struct kvm_s390_extcall_info extcall;
-		struct kvm_s390_prefix_info prefix;
-		struct kvm_s390_stop_info stop;
-		struct kvm_s390_mchk_info mchk;
-		char reserved[64];
-	} u;
-};
-
-struct kvm_s390_irq_state {
-	__u64 buf;
-	__u32 flags;        /* will stay unused for compatibility reasons */
-	__u32 len;
-	__u32 reserved[4];  /* will stay unused for compatibility reasons */
-};
-
 /* for KVM_SET_GUEST_DEBUG */
 
 #define KVM_GUESTDBG_ENABLE		0x00000001
@@ -806,50 +642,6 @@ struct kvm_enable_cap {
 	__u8  pad[64];
 };
 
-/* for KVM_PPC_GET_PVINFO */
-
-#define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)
-
-struct kvm_ppc_pvinfo {
-	/* out */
-	__u32 flags;
-	__u32 hcall[4];
-	__u8  pad[108];
-};
-
-/* for KVM_PPC_GET_SMMU_INFO */
-#define KVM_PPC_PAGE_SIZES_MAX_SZ	8
-
-struct kvm_ppc_one_page_size {
-	__u32 page_shift;	/* Page shift (or 0) */
-	__u32 pte_enc;		/* Encoding in the HPTE (>>12) */
-};
-
-struct kvm_ppc_one_seg_page_size {
-	__u32 page_shift;	/* Base page shift of segment (or 0) */
-	__u32 slb_enc;		/* SLB encoding for BookS */
-	struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
-};
-
-#define KVM_PPC_PAGE_SIZES_REAL		0x00000001
-#define KVM_PPC_1T_SEGMENTS		0x00000002
-#define KVM_PPC_NO_HASH			0x00000004
-
-struct kvm_ppc_smmu_info {
-	__u64 flags;
-	__u32 slb_size;
-	__u16 data_keys;	/* # storage keys supported for data */
-	__u16 instr_keys;	/* # storage keys supported for instructions */
-	struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
-};
-
-/* for KVM_PPC_RESIZE_HPT_{PREPARE,COMMIT} */
-struct kvm_ppc_resize_hpt {
-	__u64 flags;
-	__u32 shift;
-	__u32 pad;
-};
-
 #define KVMIO 0xAE
 
 /* machine type bits, to be used as argument to KVM_CREATE_VM */
@@ -919,9 +711,7 @@ struct kvm_ppc_resize_hpt {
 /* Bug in KVM_SET_USER_MEMORY_REGION fixed: */
 #define KVM_CAP_DESTROY_MEMORY_REGION_WORKS 21
 #define KVM_CAP_USER_NMI 22
-#ifdef __KVM_HAVE_GUEST_DEBUG
 #define KVM_CAP_SET_GUEST_DEBUG 23
-#endif
 #ifdef __KVM_HAVE_PIT
 #define KVM_CAP_REINJECT_CONTROL 24
 #endif
@@ -1152,8 +942,6 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_GUEST_MEMFD 234
 #define KVM_CAP_VM_TYPES 235
 
-#ifdef KVM_CAP_IRQ_ROUTING
-
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
 	__u32 pin;
@@ -1218,42 +1006,6 @@ struct kvm_irq_routing {
 	struct kvm_irq_routing_entry entries[];
 };
 
-#endif
-
-#ifdef KVM_CAP_MCE
-/* x86 MCE */
-struct kvm_x86_mce {
-	__u64 status;
-	__u64 addr;
-	__u64 misc;
-	__u64 mcg_status;
-	__u8 bank;
-	__u8 pad1[7];
-	__u64 pad2[3];
-};
-#endif
-
-#ifdef KVM_CAP_XEN_HVM
-#define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR	(1 << 0)
-#define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL	(1 << 1)
-#define KVM_XEN_HVM_CONFIG_SHARED_INFO		(1 << 2)
-#define KVM_XEN_HVM_CONFIG_RUNSTATE		(1 << 3)
-#define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL	(1 << 4)
-#define KVM_XEN_HVM_CONFIG_EVTCHN_SEND		(1 << 5)
-#define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG	(1 << 6)
-#define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE	(1 << 7)
-
-struct kvm_xen_hvm_config {
-	__u32 flags;
-	__u32 msr;
-	__u64 blob_addr_32;
-	__u64 blob_addr_64;
-	__u8 blob_size_32;
-	__u8 blob_size_64;
-	__u8 pad2[30];
-};
-#endif
-
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
 /*
  * Available with KVM_CAP_IRQFD_RESAMPLE
@@ -1438,11 +1190,6 @@ struct kvm_vfio_spapr_tce {
 					 struct kvm_userspace_memory_region2)
 
 /* enable ucontrol for s390 */
-struct kvm_s390_ucas_mapping {
-	__u64 user_addr;
-	__u64 vcpu_addr;
-	__u64 length;
-};
 #define KVM_S390_UCAS_MAP        _IOW(KVMIO, 0x50, struct kvm_s390_ucas_mapping)
 #define KVM_S390_UCAS_UNMAP      _IOW(KVMIO, 0x51, struct kvm_s390_ucas_mapping)
 #define KVM_S390_VCPU_FAULT	 _IOW(KVMIO, 0x52, unsigned long)
@@ -1637,89 +1384,6 @@ struct kvm_enc_region {
 #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
 #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
 
-struct kvm_s390_pv_sec_parm {
-	__u64 origin;
-	__u64 length;
-};
-
-struct kvm_s390_pv_unp {
-	__u64 addr;
-	__u64 size;
-	__u64 tweak;
-};
-
-enum pv_cmd_dmp_id {
-	KVM_PV_DUMP_INIT,
-	KVM_PV_DUMP_CONFIG_STOR_STATE,
-	KVM_PV_DUMP_COMPLETE,
-	KVM_PV_DUMP_CPU,
-};
-
-struct kvm_s390_pv_dmp {
-	__u64 subcmd;
-	__u64 buff_addr;
-	__u64 buff_len;
-	__u64 gaddr;		/* For dump storage state */
-	__u64 reserved[4];
-};
-
-enum pv_cmd_info_id {
-	KVM_PV_INFO_VM,
-	KVM_PV_INFO_DUMP,
-};
-
-struct kvm_s390_pv_info_dump {
-	__u64 dump_cpu_buffer_len;
-	__u64 dump_config_mem_buffer_per_1m;
-	__u64 dump_config_finalize_len;
-};
-
-struct kvm_s390_pv_info_vm {
-	__u64 inst_calls_list[4];
-	__u64 max_cpus;
-	__u64 max_guests;
-	__u64 max_guest_addr;
-	__u64 feature_indication;
-};
-
-struct kvm_s390_pv_info_header {
-	__u32 id;
-	__u32 len_max;
-	__u32 len_written;
-	__u32 reserved;
-};
-
-struct kvm_s390_pv_info {
-	struct kvm_s390_pv_info_header header;
-	union {
-		struct kvm_s390_pv_info_dump dump;
-		struct kvm_s390_pv_info_vm vm;
-	};
-};
-
-enum pv_cmd_id {
-	KVM_PV_ENABLE,
-	KVM_PV_DISABLE,
-	KVM_PV_SET_SEC_PARMS,
-	KVM_PV_UNPACK,
-	KVM_PV_VERIFY,
-	KVM_PV_PREP_RESET,
-	KVM_PV_UNSHARE_ALL,
-	KVM_PV_INFO,
-	KVM_PV_DUMP,
-	KVM_PV_ASYNC_CLEANUP_PREPARE,
-	KVM_PV_ASYNC_CLEANUP_PERFORM,
-};
-
-struct kvm_pv_cmd {
-	__u32 cmd;	/* Command to be executed */
-	__u16 rc;	/* Ultravisor return code */
-	__u16 rrc;	/* Ultravisor return reason code */
-	__u64 data;	/* Data or address */
-	__u32 flags;    /* flags for future extensions. Must be 0 for now */
-	__u32 reserved[3];
-};
-
 /* Available with KVM_CAP_S390_PROTECTED */
 #define KVM_S390_PV_COMMAND		_IOWR(KVMIO, 0xc5, struct kvm_pv_cmd)
 
@@ -1733,58 +1397,6 @@ struct kvm_pv_cmd {
 #define KVM_XEN_HVM_GET_ATTR	_IOWR(KVMIO, 0xc8, struct kvm_xen_hvm_attr)
 #define KVM_XEN_HVM_SET_ATTR	_IOW(KVMIO,  0xc9, struct kvm_xen_hvm_attr)
 
-struct kvm_xen_hvm_attr {
-	__u16 type;
-	__u16 pad[3];
-	union {
-		__u8 long_mode;
-		__u8 vector;
-		__u8 runstate_update_flag;
-		struct {
-			__u64 gfn;
-#define KVM_XEN_INVALID_GFN ((__u64)-1)
-		} shared_info;
-		struct {
-			__u32 send_port;
-			__u32 type; /* EVTCHNSTAT_ipi / EVTCHNSTAT_interdomain */
-			__u32 flags;
-#define KVM_XEN_EVTCHN_DEASSIGN		(1 << 0)
-#define KVM_XEN_EVTCHN_UPDATE		(1 << 1)
-#define KVM_XEN_EVTCHN_RESET		(1 << 2)
-			/*
-			 * Events sent by the guest are either looped back to
-			 * the guest itself (potentially on a different port#)
-			 * or signalled via an eventfd.
-			 */
-			union {
-				struct {
-					__u32 port;
-					__u32 vcpu;
-					__u32 priority;
-				} port;
-				struct {
-					__u32 port; /* Zero for eventfd */
-					__s32 fd;
-				} eventfd;
-				__u32 padding[4];
-			} deliver;
-		} evtchn;
-		__u32 xen_version;
-		__u64 pad[8];
-	} u;
-};
-
-
-/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO */
-#define KVM_XEN_ATTR_TYPE_LONG_MODE		0x0
-#define KVM_XEN_ATTR_TYPE_SHARED_INFO		0x1
-#define KVM_XEN_ATTR_TYPE_UPCALL_VECTOR		0x2
-/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */
-#define KVM_XEN_ATTR_TYPE_EVTCHN		0x3
-#define KVM_XEN_ATTR_TYPE_XEN_VERSION		0x4
-/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG */
-#define KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG	0x5
-
 /* Per-vCPU Xen attributes */
 #define KVM_XEN_VCPU_GET_ATTR	_IOWR(KVMIO, 0xca, struct kvm_xen_vcpu_attr)
 #define KVM_XEN_VCPU_SET_ATTR	_IOW(KVMIO,  0xcb, struct kvm_xen_vcpu_attr)
@@ -1795,242 +1407,6 @@ struct kvm_xen_hvm_attr {
 #define KVM_GET_SREGS2             _IOR(KVMIO,  0xcc, struct kvm_sregs2)
 #define KVM_SET_SREGS2             _IOW(KVMIO,  0xcd, struct kvm_sregs2)
 
-struct kvm_xen_vcpu_attr {
-	__u16 type;
-	__u16 pad[3];
-	union {
-		__u64 gpa;
-#define KVM_XEN_INVALID_GPA ((__u64)-1)
-		__u64 pad[8];
-		struct {
-			__u64 state;
-			__u64 state_entry_time;
-			__u64 time_running;
-			__u64 time_runnable;
-			__u64 time_blocked;
-			__u64 time_offline;
-		} runstate;
-		__u32 vcpu_id;
-		struct {
-			__u32 port;
-			__u32 priority;
-			__u64 expires_ns;
-		} timer;
-		__u8 vector;
-	} u;
-};
-
-/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO */
-#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO	0x0
-#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO	0x1
-#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR	0x2
-#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT	0x3
-#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA	0x4
-#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST	0x5
-/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND */
-#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID		0x6
-#define KVM_XEN_VCPU_ATTR_TYPE_TIMER		0x7
-#define KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR	0x8
-
-/* Secure Encrypted Virtualization command */
-enum sev_cmd_id {
-	/* Guest initialization commands */
-	KVM_SEV_INIT = 0,
-	KVM_SEV_ES_INIT,
-	/* Guest launch commands */
-	KVM_SEV_LAUNCH_START,
-	KVM_SEV_LAUNCH_UPDATE_DATA,
-	KVM_SEV_LAUNCH_UPDATE_VMSA,
-	KVM_SEV_LAUNCH_SECRET,
-	KVM_SEV_LAUNCH_MEASURE,
-	KVM_SEV_LAUNCH_FINISH,
-	/* Guest migration commands (outgoing) */
-	KVM_SEV_SEND_START,
-	KVM_SEV_SEND_UPDATE_DATA,
-	KVM_SEV_SEND_UPDATE_VMSA,
-	KVM_SEV_SEND_FINISH,
-	/* Guest migration commands (incoming) */
-	KVM_SEV_RECEIVE_START,
-	KVM_SEV_RECEIVE_UPDATE_DATA,
-	KVM_SEV_RECEIVE_UPDATE_VMSA,
-	KVM_SEV_RECEIVE_FINISH,
-	/* Guest status and debug commands */
-	KVM_SEV_GUEST_STATUS,
-	KVM_SEV_DBG_DECRYPT,
-	KVM_SEV_DBG_ENCRYPT,
-	/* Guest certificates commands */
-	KVM_SEV_CERT_EXPORT,
-	/* Attestation report */
-	KVM_SEV_GET_ATTESTATION_REPORT,
-	/* Guest Migration Extension */
-	KVM_SEV_SEND_CANCEL,
-
-	KVM_SEV_NR_MAX,
-};
-
-struct kvm_sev_cmd {
-	__u32 id;
-	__u64 data;
-	__u32 error;
-	__u32 sev_fd;
-};
-
-struct kvm_sev_launch_start {
-	__u32 handle;
-	__u32 policy;
-	__u64 dh_uaddr;
-	__u32 dh_len;
-	__u64 session_uaddr;
-	__u32 session_len;
-};
-
-struct kvm_sev_launch_update_data {
-	__u64 uaddr;
-	__u32 len;
-};
-
-
-struct kvm_sev_launch_secret {
-	__u64 hdr_uaddr;
-	__u32 hdr_len;
-	__u64 guest_uaddr;
-	__u32 guest_len;
-	__u64 trans_uaddr;
-	__u32 trans_len;
-};
-
-struct kvm_sev_launch_measure {
-	__u64 uaddr;
-	__u32 len;
-};
-
-struct kvm_sev_guest_status {
-	__u32 handle;
-	__u32 policy;
-	__u32 state;
-};
-
-struct kvm_sev_dbg {
-	__u64 src_uaddr;
-	__u64 dst_uaddr;
-	__u32 len;
-};
-
-struct kvm_sev_attestation_report {
-	__u8 mnonce[16];
-	__u64 uaddr;
-	__u32 len;
-};
-
-struct kvm_sev_send_start {
-	__u32 policy;
-	__u64 pdh_cert_uaddr;
-	__u32 pdh_cert_len;
-	__u64 plat_certs_uaddr;
-	__u32 plat_certs_len;
-	__u64 amd_certs_uaddr;
-	__u32 amd_certs_len;
-	__u64 session_uaddr;
-	__u32 session_len;
-};
-
-struct kvm_sev_send_update_data {
-	__u64 hdr_uaddr;
-	__u32 hdr_len;
-	__u64 guest_uaddr;
-	__u32 guest_len;
-	__u64 trans_uaddr;
-	__u32 trans_len;
-};
-
-struct kvm_sev_receive_start {
-	__u32 handle;
-	__u32 policy;
-	__u64 pdh_uaddr;
-	__u32 pdh_len;
-	__u64 session_uaddr;
-	__u32 session_len;
-};
-
-struct kvm_sev_receive_update_data {
-	__u64 hdr_uaddr;
-	__u32 hdr_len;
-	__u64 guest_uaddr;
-	__u32 guest_len;
-	__u64 trans_uaddr;
-	__u32 trans_len;
-};
-
-#define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
-#define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
-#define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
-
-struct kvm_assigned_pci_dev {
-	__u32 assigned_dev_id;
-	__u32 busnr;
-	__u32 devfn;
-	__u32 flags;
-	__u32 segnr;
-	union {
-		__u32 reserved[11];
-	};
-};
-
-#define KVM_DEV_IRQ_HOST_INTX    (1 << 0)
-#define KVM_DEV_IRQ_HOST_MSI     (1 << 1)
-#define KVM_DEV_IRQ_HOST_MSIX    (1 << 2)
-
-#define KVM_DEV_IRQ_GUEST_INTX   (1 << 8)
-#define KVM_DEV_IRQ_GUEST_MSI    (1 << 9)
-#define KVM_DEV_IRQ_GUEST_MSIX   (1 << 10)
-
-#define KVM_DEV_IRQ_HOST_MASK	 0x00ff
-#define KVM_DEV_IRQ_GUEST_MASK   0xff00
-
-struct kvm_assigned_irq {
-	__u32 assigned_dev_id;
-	__u32 host_irq; /* ignored (legacy field) */
-	__u32 guest_irq;
-	__u32 flags;
-	union {
-		__u32 reserved[12];
-	};
-};
-
-struct kvm_assigned_msix_nr {
-	__u32 assigned_dev_id;
-	__u16 entry_nr;
-	__u16 padding;
-};
-
-#define KVM_MAX_MSIX_PER_DEV		256
-struct kvm_assigned_msix_entry {
-	__u32 assigned_dev_id;
-	__u32 gsi;
-	__u16 entry; /* The index of entry in the MSI-X table */
-	__u16 padding[3];
-};
-
-#define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
-#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
-
-/* Available with KVM_CAP_ARM_USER_IRQ */
-
-/* Bits for run->s.regs.device_irq_level */
-#define KVM_ARM_DEV_EL1_VTIMER		(1 << 0)
-#define KVM_ARM_DEV_EL1_PTIMER		(1 << 1)
-#define KVM_ARM_DEV_PMU			(1 << 2)
-
-struct kvm_hyperv_eventfd {
-	__u32 conn_id;
-	__s32 fd;
-	__u32 flags;
-	__u32 padding[3];
-};
-
-#define KVM_HYPERV_CONN_ID_MASK		0x00ffffff
-#define KVM_HYPERV_EVENTFD_DEASSIGN	(1 << 0)
-
 #define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE    (1 << 0)
 #define KVM_DIRTY_LOG_INITIALLY_SET            (1 << 1)
 
@@ -2176,33 +1552,6 @@ struct kvm_stats_desc {
 /* Available with KVM_CAP_S390_ZPCI_OP */
 #define KVM_S390_ZPCI_OP         _IOW(KVMIO,  0xd1, struct kvm_s390_zpci_op)
 
-struct kvm_s390_zpci_op {
-	/* in */
-	__u32 fh;               /* target device */
-	__u8  op;               /* operation to perform */
-	__u8  pad[3];
-	union {
-		/* for KVM_S390_ZPCIOP_REG_AEN */
-		struct {
-			__u64 ibv;      /* Guest addr of interrupt bit vector */
-			__u64 sb;       /* Guest addr of summary bit */
-			__u32 flags;
-			__u32 noi;      /* Number of interrupts */
-			__u8 isc;       /* Guest interrupt subclass */
-			__u8 sbo;       /* Offset of guest summary bit vector */
-			__u16 pad;
-		} reg_aen;
-		__u64 reserved[8];
-	} u;
-};
-
-/* types for kvm_s390_zpci_op->op */
-#define KVM_S390_ZPCIOP_REG_AEN                0
-#define KVM_S390_ZPCIOP_DEREG_AEN      1
-
-/* flags for kvm_s390_zpci_op->u.reg_aen.flags */
-#define KVM_S390_ZPCIOP_REGAEN_HOST    (1 << 0)
-
 /* Available with KVM_CAP_MEMORY_ATTRIBUTES */
 #define KVM_SET_MEMORY_ATTRIBUTES              _IOW(KVMIO,  0xd2, struct kvm_memory_attributes)
 
diff --git a/linux-headers/linux/psp-sev.h b/linux-headers/linux/psp-sev.h
index bcb21339ee3..3095af51e5c 100644
--- a/linux-headers/linux/psp-sev.h
+++ b/linux-headers/linux/psp-sev.h
@@ -28,6 +28,11 @@ enum {
 	SEV_PEK_CERT_IMPORT,
 	SEV_GET_ID,	/* This command is deprecated, use SEV_GET_ID2 */
 	SEV_GET_ID2,
+	SNP_PLATFORM_STATUS,
+	SNP_COMMIT,
+	SNP_SET_CONFIG,
+	SNP_SET_CONFIG_START,
+	SNP_SET_CONFIG_END,
 
 	SEV_MAX,
 };
@@ -69,6 +74,12 @@ typedef enum {
 	SEV_RET_RESOURCE_LIMIT,
 	SEV_RET_SECURE_DATA_INVALID,
 	SEV_RET_INVALID_KEY = 0x27,
+	SEV_RET_INVALID_PAGE_SIZE,
+	SEV_RET_INVALID_PAGE_STATE,
+	SEV_RET_INVALID_MDATA_ENTRY,
+	SEV_RET_INVALID_PAGE_OWNER,
+	SEV_RET_INVALID_PAGE_AEAD_OFLOW,
+	SEV_RET_RMP_INIT_REQUIRED,
 	SEV_RET_MAX,
 } sev_ret_code;
 
@@ -155,6 +166,66 @@ struct sev_user_data_get_id2 {
 	__u32 length;				/* In/Out */
 } __attribute__((packed));
 
+/**
+ * struct sev_user_data_snp_status - SNP status
+ *
+ * @api_major: API major version
+ * @api_minor: API minor version
+ * @state: current platform state
+ * @is_rmp_initialized: whether RMP is initialized or not
+ * @rsvd: reserved
+ * @build_id: firmware build id for the API version
+ * @mask_chip_id: whether chip id is present in attestation reports or not
+ * @mask_chip_key: whether attestation reports are signed or not
+ * @vlek_en: VLEK (Version Loaded Endorsement Key) hashstick is loaded
+ * @rsvd1: reserved
+ * @guest_count: the number of guest currently managed by the firmware
+ * @current_tcb_version: current TCB version
+ * @reported_tcb_version: reported TCB version
+ */
+struct sev_user_data_snp_status {
+	__u8 api_major;			/* Out */
+	__u8 api_minor;			/* Out */
+	__u8 state;			/* Out */
+	__u8 is_rmp_initialized:1;	/* Out */
+	__u8 rsvd:7;
+	__u32 build_id;			/* Out */
+	__u32 mask_chip_id:1;		/* Out */
+	__u32 mask_chip_key:1;		/* Out */
+	__u32 vlek_en:1;		/* Out */
+	__u32 rsvd1:29;
+	__u32 guest_count;		/* Out */
+	__u64 current_tcb_version;	/* Out */
+	__u64 reported_tcb_version;	/* Out */
+} __attribute__((packed));
+
+/**
+ * struct sev_user_data_snp_config - system wide configuration value for SNP.
+ *
+ * @reported_tcb: the TCB version to report in the guest attestation report.
+ * @mask_chip_id: whether chip id is present in attestation reports or not
+ * @mask_chip_key: whether attestation reports are signed or not
+ * @rsvd: reserved
+ * @rsvd1: reserved
+ */
+struct sev_user_data_snp_config {
+	__u64 reported_tcb  ;   /* In */
+	__u32 mask_chip_id:1;   /* In */
+	__u32 mask_chip_key:1;  /* In */
+	__u32 rsvd:30;          /* In */
+	__u8 rsvd1[52];
+} __attribute__((packed));
+
+/**
+ * struct sev_user_data_snp_config_transaction - metadata for config transactions
+ *
+ * @id: the ID of the transaction started/ended by a call to SNP_SET_CONFIG_START
+ *	or SNP_SET_CONFIG_END, respectively.
+ */
+struct sev_user_data_snp_config_transaction {
+	__u64 id;		/* Out */
+} __attribute__((packed));
+
 /**
  * struct sev_issue_cmd - SEV ioctl parameters
  *
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 10/26] [TO SQUASH] hw/i386: Remove redeclaration of struct setup_data
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (8 preceding siblings ...)
  2024-03-22 18:10 ` [PATCH 09/26] [HACK] linux-headers: Update headers for 6.8 + kvm-coco-queue + SNP Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-22 18:11 ` [PATCH 11/26] runstate: skip initial CPU reset if reset is not actually possible Paolo Bonzini
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Michael Roth <michael.roth@amd.com>

It is now provided by kernel headers.

This needs to be squashed with the header update to avoid transient
build breakage during bisection.  It is kept separate here for reference.

Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240320083945.991426-6-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/i386/x86.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index ffbda48917f..84a48019770 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -679,14 +679,6 @@ DeviceState *ioapic_init_secondary(GSIState *gsi_state)
     return dev;
 }
 
-struct setup_data {
-    uint64_t next;
-    uint32_t type;
-    uint32_t len;
-    uint8_t data[];
-} __attribute__((packed));
-
-
 /*
  * The entry point into the kernel for PVH boot is different from
  * the native entry point.  The PVH entry is defined by the x86/HVM
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 11/26] runstate: skip initial CPU reset if reset is not actually possible
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (9 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 10/26] [TO SQUASH] hw/i386: Remove redeclaration of struct setup_data Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-25  8:58   ` Daniel P. Berrangé
  2024-03-25  9:32   ` Philippe Mathieu-Daudé
  2024-03-22 18:11 ` [PATCH 12/26] KVM: track whether guest state is encrypted Paolo Bonzini
                   ` (14 subsequent siblings)
  25 siblings, 2 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

Right now, the system reset is concluded by a call to
cpu_synchronize_all_post_reset() in order to sync any changes
that the machine reset callback applied to the CPU state.

However, for VMs with encrypted state such as SEV-ES guests (currently
the only guests with non-resettable CPUs) this cannot be done, because
the guest state has already been finalized by machine-init-done notifiers.
cpu_synchronize_all_post_reset() does nothing on these guests, and we would
actually like to make it fail once the guest state has been encrypted.
So, assume that boards which support non-resettable CPUs do not touch CPU
state in their reset callback, and that all such setup happens earlier,
by the time of cpu_synchronize_all_post_init().

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 system/runstate.c | 15 ++++++++++++++-
 roms/edk2         |  2 +-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/system/runstate.c b/system/runstate.c
index d6ab860ecaa..cb4905a40fc 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -501,7 +501,20 @@ void qemu_system_reset(ShutdownCause reason)
     default:
         qapi_event_send_reset(shutdown_caused_by_guest(reason), reason);
     }
-    cpu_synchronize_all_post_reset();
+
+    /*
+     * Some boards use the machine reset callback to point CPUs to the firmware
+     * entry point.  Assume that this is not the case for boards that support
+     * non-resettable CPUs (currently used only for confidential guests), in
+     * which case cpu_synchronize_all_post_init() is enough because
+     * it does _more_  than cpu_synchronize_all_post_reset().
+     */
+    if (cpus_are_resettable()) {
+        cpu_synchronize_all_post_reset();
+    } else {
+        assert(runstate_check(RUN_STATE_PRELAUNCH));
+    }
+
     vm_set_suspended(false);
 }
 
diff --git a/roms/edk2 b/roms/edk2
index edc6681206c..819cfc6b42a 160000
--- a/roms/edk2
+++ b/roms/edk2
@@ -1 +1 @@
-Subproject commit edc6681206c1a8791981a2f911d2fb8b3d2f5768
+Subproject commit 819cfc6b42a68790a23509e4fcc58ceb70e1965e
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 12/26] KVM: track whether guest state is encrypted
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (10 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 11/26] runstate: skip initial CPU reset if reset is not actually possible Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-25  9:25   ` Philippe Mathieu-Daudé
  2024-03-26 15:48   ` Xiaoyao Li
  2024-03-22 18:11 ` [PATCH 13/26] KVM: remove kvm_arch_cpu_check_are_resettable Paolo Bonzini
                   ` (13 subsequent siblings)
  25 siblings, 2 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

So far, KVM has allowed KVM_GET/SET_* ioctls to execute even if the
guest state is encrypted, in which case they do nothing.  With the new
API that uses VM types, the ioctls instead fail, which is a safer and
more robust approach.

The new API will be the only one available for SEV-SNP and TDX, but it
is also usable for SEV and SEV-ES.  In preparation for that, require
architecture-specific KVM code to communicate the point at which guest
state is protected (which must be after kvm_cpu_synchronize_post_init(),
though that might change in the future in order to support migration).
From that point, skip reading registers so that cpu->vcpu_dirty is
never true: if it ever becomes true, kvm_arch_put_registers() will
fail miserably.
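
As a rough sketch (not part of this patch), an architecture backend is
expected to finalize and mark the state from a machine-init-done notifier,
i.e. after the last cpu_synchronize_all_post_init(); the function names
below are made up for illustration:

    static void arch_launch_finish(Notifier *notifier, void *unused)
    {
        /* Hypothetical firmware call that encrypts the initial guest state. */
        if (arch_launch_update_and_measure() < 0) {
            exit(1);
        }

        /*
         * From here on KVM_GET/SET_* would fail, so tell the common KVM
         * code to stop reading registers and to assert that no further
         * post-init synchronization happens.
         */
        kvm_mark_guest_state_protected();
    }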

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/sysemu/kvm.h     |  2 ++
 include/sysemu/kvm_int.h |  1 +
 accel/kvm/kvm-all.c      | 14 ++++++++++++--
 target/i386/sev.c        |  1 +
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index fad9a7e8ff3..302e8f6f1e5 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -539,6 +539,8 @@ bool kvm_dirty_ring_enabled(void);
 
 uint32_t kvm_dirty_ring_size(void);
 
+void kvm_mark_guest_state_protected(void);
+
 /**
  * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page
  * reported for the VM.
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 882e37e12c5..3496be7997a 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -87,6 +87,7 @@ struct KVMState
     bool kernel_irqchip_required;
     OnOffAuto kernel_irqchip_split;
     bool sync_mmu;
+    bool guest_state_protected;
     uint64_t manual_dirty_log_protect;
     /* The man page (and posix) say ioctl numbers are signed int, but
      * they're not.  Linux, glibc and *BSD all treat ioctl numbers as
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index a8cecd040eb..05fa3533c66 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2698,7 +2698,7 @@ bool kvm_cpu_check_are_resettable(void)
 
 static void do_kvm_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
 {
-    if (!cpu->vcpu_dirty) {
+    if (!cpu->vcpu_dirty && !kvm_state->guest_state_protected) {
         int ret = kvm_arch_get_registers(cpu);
         if (ret) {
             error_report("Failed to get registers: %s", strerror(-ret));
@@ -2712,7 +2712,7 @@ static void do_kvm_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
 
 void kvm_cpu_synchronize_state(CPUState *cpu)
 {
-    if (!cpu->vcpu_dirty) {
+    if (!cpu->vcpu_dirty && !kvm_state->guest_state_protected) {
         run_on_cpu(cpu, do_kvm_cpu_synchronize_state, RUN_ON_CPU_NULL);
     }
 }
@@ -2747,6 +2747,11 @@ static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
 
 void kvm_cpu_synchronize_post_init(CPUState *cpu)
 {
+    /*
+     * This runs before the machine_init_done notifiers, and is the last
+     * opportunity to synchronize the state of confidential guests.
+     */
+    assert(!kvm_state->guest_state_protected);
     run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
 }
 
@@ -4094,3 +4099,8 @@ void query_stats_schemas_cb(StatsSchemaList **result, Error **errp)
         query_stats_schema_vcpu(first_cpu, &stats_args);
     }
 }
+
+void kvm_mark_guest_state_protected(void)
+{
+    kvm_state->guest_state_protected = true;
+}
diff --git a/target/i386/sev.c b/target/i386/sev.c
index b8f79d34d19..c49a8fd55eb 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -755,6 +755,7 @@ sev_launch_get_measure(Notifier *notifier, void *unused)
         if (ret) {
             exit(1);
         }
+        kvm_mark_guest_state_protected();
     }
 
     /* query the measurement blob length */
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 13/26] KVM: remove kvm_arch_cpu_check_are_resettable
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (11 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 12/26] KVM: track whether guest state is encrypted Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-25  9:26   ` Philippe Mathieu-Daudé
  2024-03-22 18:11 ` [PATCH 14/26] target/i386: introduce x86-confidential-guest Paolo Bonzini
                   ` (12 subsequent siblings)
  25 siblings, 1 reply; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

Board reset requires writing a fresh CPU state.  As far as KVM is
concerned, the only thing that blocks reset is that CPU state is
encrypted; therefore, kvm_cpus_are_resettable() can simply check
if that is the case.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/sysemu/kvm.h       | 10 ----------
 accel/kvm/kvm-accel-ops.c  |  2 +-
 accel/kvm/kvm-all.c        |  5 -----
 target/arm/kvm.c           |  5 -----
 target/i386/kvm/kvm.c      |  5 -----
 target/loongarch/kvm/kvm.c |  5 -----
 target/mips/kvm.c          |  5 -----
 target/ppc/kvm.c           |  5 -----
 target/riscv/kvm/kvm-cpu.c |  5 -----
 target/s390x/kvm/kvm.c     |  5 -----
 10 files changed, 1 insertion(+), 51 deletions(-)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 302e8f6f1e5..54f4d83a370 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -525,16 +525,6 @@ int kvm_get_one_reg(CPUState *cs, uint64_t id, void *target);
 /* Notify resamplefd for EOI of specific interrupts. */
 void kvm_resample_fd_notify(int gsi);
 
-/**
- * kvm_cpu_check_are_resettable - return whether CPUs can be reset
- *
- * Returns: true: CPUs are resettable
- *          false: CPUs are not resettable
- */
-bool kvm_cpu_check_are_resettable(void);
-
-bool kvm_arch_cpu_check_are_resettable(void);
-
 bool kvm_dirty_ring_enabled(void);
 
 uint32_t kvm_dirty_ring_size(void);
diff --git a/accel/kvm/kvm-accel-ops.c b/accel/kvm/kvm-accel-ops.c
index b3c946dc4b4..74e3c5785b5 100644
--- a/accel/kvm/kvm-accel-ops.c
+++ b/accel/kvm/kvm-accel-ops.c
@@ -82,7 +82,7 @@ static bool kvm_vcpu_thread_is_idle(CPUState *cpu)
 
 static bool kvm_cpus_are_resettable(void)
 {
-    return !kvm_enabled() || kvm_cpu_check_are_resettable();
+    return !kvm_enabled() || !kvm_state->guest_state_protected;
 }
 
 #ifdef KVM_CAP_SET_GUEST_DEBUG
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 05fa3533c66..a05dea23133 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2691,11 +2691,6 @@ void kvm_flush_coalesced_mmio_buffer(void)
     s->coalesced_flush_in_progress = false;
 }
 
-bool kvm_cpu_check_are_resettable(void)
-{
-    return kvm_arch_cpu_check_are_resettable();
-}
-
 static void do_kvm_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
 {
     if (!cpu->vcpu_dirty && !kvm_state->guest_state_protected) {
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index ab85d628a8b..21ebbf3b8f8 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1598,11 +1598,6 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
     return (data - 32) & 0xffff;
 }
 
-bool kvm_arch_cpu_check_are_resettable(void)
-{
-    return true;
-}
-
 static void kvm_arch_get_eager_split_size(Object *obj, Visitor *v,
                                           const char *name, void *opaque,
                                           Error **errp)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index de10155b37a..0ec69109a2b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5614,11 +5614,6 @@ bool kvm_has_waitpkg(void)
     return has_msr_umwait;
 }
 
-bool kvm_arch_cpu_check_are_resettable(void)
-{
-    return !sev_es_enabled();
-}
-
 #define ARCH_REQ_XCOMP_GUEST_PERM       0x1025
 
 void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask)
diff --git a/target/loongarch/kvm/kvm.c b/target/loongarch/kvm/kvm.c
index d630cc39cb2..8224d943331 100644
--- a/target/loongarch/kvm/kvm.c
+++ b/target/loongarch/kvm/kvm.c
@@ -733,11 +733,6 @@ bool kvm_arch_stop_on_emulation_error(CPUState *cs)
     return true;
 }
 
-bool kvm_arch_cpu_check_are_resettable(void)
-{
-    return true;
-}
-
 int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 {
     int ret = 0;
diff --git a/target/mips/kvm.c b/target/mips/kvm.c
index 6c52e59f55d..a631ab544f5 100644
--- a/target/mips/kvm.c
+++ b/target/mips/kvm.c
@@ -1273,11 +1273,6 @@ int kvm_arch_get_default_type(MachineState *machine)
     return -1;
 }
 
-bool kvm_arch_cpu_check_are_resettable(void)
-{
-    return true;
-}
-
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
 }
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 8231feb2d45..63930d4a77d 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2956,11 +2956,6 @@ void kvmppc_set_reg_tb_offset(PowerPCCPU *cpu, int64_t tb_offset)
     }
 }
 
-bool kvm_arch_cpu_check_are_resettable(void)
-{
-    return true;
-}
-
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
 }
diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
index cda7d78a778..135d87dc3f5 100644
--- a/target/riscv/kvm/kvm-cpu.c
+++ b/target/riscv/kvm/kvm-cpu.c
@@ -1466,11 +1466,6 @@ void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level)
     }
 }
 
-bool kvm_arch_cpu_check_are_resettable(void)
-{
-    return true;
-}
-
 static int aia_mode;
 
 static const char *kvm_aia_mode_str(uint64_t mode)
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 4ce809c5d46..4dcd757cdcc 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2622,11 +2622,6 @@ void kvm_s390_stop_interrupt(S390CPU *cpu)
     kvm_s390_vcpu_interrupt(cpu, &irq);
 }
 
-bool kvm_arch_cpu_check_are_resettable(void)
-{
-    return true;
-}
-
 int kvm_s390_get_zpci_op(void)
 {
     return cap_zpci_op;
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 14/26] target/i386: introduce x86-confidential-guest
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (12 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 13/26] KVM: remove kvm_arch_cpu_check_are_resettable Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-22 18:11 ` [PATCH 15/26] target/i386: Implement mc->kvm_type() to get VM type Paolo Bonzini
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

Introduce a common superclass for x86 confidential guest implementations.
It will extend ConfidentialGuestSupportClass with a method that provides
the VM type to be passed to KVM_CREATE_VM.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/confidential-guest.h | 40 ++++++++++++++++++++++++++++++++
 target/i386/confidential-guest.c | 33 ++++++++++++++++++++++++++
 target/i386/sev.c                |  6 ++---
 target/i386/meson.build          |  2 +-
 4 files changed, 77 insertions(+), 4 deletions(-)
 create mode 100644 target/i386/confidential-guest.h
 create mode 100644 target/i386/confidential-guest.c

diff --git a/target/i386/confidential-guest.h b/target/i386/confidential-guest.h
new file mode 100644
index 00000000000..ca12d5a8fba
--- /dev/null
+++ b/target/i386/confidential-guest.h
@@ -0,0 +1,40 @@
+/*
+ * x86-specific confidential guest methods.
+ *
+ * Copyright (c) 2024 Red Hat Inc.
+ *
+ * Authors:
+ *  Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef TARGET_I386_CG_H
+#define TARGET_I386_CG_H
+
+#include "qom/object.h"
+
+#include "exec/confidential-guest-support.h"
+
+#define TYPE_X86_CONFIDENTIAL_GUEST "x86-confidential-guest"
+
+OBJECT_DECLARE_TYPE(X86ConfidentialGuest,
+                    X86ConfidentialGuestClass,
+                    X86_CONFIDENTIAL_GUEST)
+
+struct X86ConfidentialGuest {
+    /* <private> */
+    ConfidentialGuestSupport parent_obj;
+};
+
+/**
+ * X86ConfidentialGuestClass:
+ *
+ * Class to be implemented by confidential-guest-support concrete objects
+ * for the x86 target.
+ */
+struct X86ConfidentialGuestClass {
+    /* <private> */
+    ConfidentialGuestSupportClass parent;
+};
+#endif
diff --git a/target/i386/confidential-guest.c b/target/i386/confidential-guest.c
new file mode 100644
index 00000000000..b3727845adc
--- /dev/null
+++ b/target/i386/confidential-guest.c
@@ -0,0 +1,33 @@
+/*
+ * QEMU Confidential Guest support
+ *
+ * Copyright (C) 2024 Red Hat, Inc.
+ *
+ * Authors:
+ *  Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "confidential-guest.h"
+
+OBJECT_DEFINE_ABSTRACT_TYPE(X86ConfidentialGuest,
+                            x86_confidential_guest,
+                            X86_CONFIDENTIAL_GUEST,
+                            CONFIDENTIAL_GUEST_SUPPORT)
+
+static void x86_confidential_guest_class_init(ObjectClass *oc, void *data)
+{
+}
+
+static void x86_confidential_guest_init(Object *obj)
+{
+}
+
+static void x86_confidential_guest_finalize(Object *obj)
+{
+}
diff --git a/target/i386/sev.c b/target/i386/sev.c
index c49a8fd55eb..ebe36d4c10c 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -35,7 +35,7 @@
 #include "monitor/monitor.h"
 #include "monitor/hmp-target.h"
 #include "qapi/qapi-commands-misc-target.h"
-#include "exec/confidential-guest-support.h"
+#include "confidential-guest.h"
 #include "hw/i386/pc.h"
 #include "exec/address-spaces.h"
 
@@ -54,7 +54,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
  *         -machine ...,memory-encryption=sev0
  */
 struct SevGuestState {
-    ConfidentialGuestSupport parent_obj;
+    X86ConfidentialGuest parent_obj;
 
     /* configuration parameters */
     char *sev_device;
@@ -1372,7 +1372,7 @@ sev_guest_instance_init(Object *obj)
 
 /* sev guest info */
 static const TypeInfo sev_guest_info = {
-    .parent = TYPE_CONFIDENTIAL_GUEST_SUPPORT,
+    .parent = TYPE_X86_CONFIDENTIAL_GUEST,
     .name = TYPE_SEV_GUEST,
     .instance_size = sizeof(SevGuestState),
     .instance_finalize = sev_guest_finalize,
diff --git a/target/i386/meson.build b/target/i386/meson.build
index 7c74bfa8591..8abce725f86 100644
--- a/target/i386/meson.build
+++ b/target/i386/meson.build
@@ -6,7 +6,7 @@ i386_ss.add(files(
   'xsave_helper.c',
   'cpu-dump.c',
 ))
-i386_ss.add(when: 'CONFIG_SEV', if_true: files('host-cpu.c'))
+i386_ss.add(when: 'CONFIG_SEV', if_true: files('host-cpu.c', 'confidential-guest.c'))
 
 # x86 cpu type
 i386_ss.add(when: 'CONFIG_KVM', if_true: files('host-cpu.c'))
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 15/26] target/i386: Implement mc->kvm_type() to get VM type
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (13 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 14/26] target/i386: introduce x86-confidential-guest Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-25  9:29   ` Philippe Mathieu-Daudé
  2024-03-22 18:11 ` [PATCH 16/26] target/i386: SEV: use KVM_SEV_INIT2 if possible Paolo Bonzini
                   ` (10 subsequent siblings)
  25 siblings, 1 reply; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

KVM is introducing a new API to create confidential guests, which
will be used by TDX and SEV-SNP but is also available for SEV and
SEV-ES.  The API uses the VM type argument to KVM_CREATE_VM to
identify which confidential computing technology to use.

Since there are no other expected uses of VM types, delegate
mc->kvm_type() for x86 boards to the confidential-guest-support
object pointed to by ms->cgs.

For example, if a sev-guest object is specified as the machine's
confidential-guest-support, as in:

  qemu -machine ...,confidential-guest-support=sev0 \
       -object sev-guest,id=sev0,...

QEMU will check whether the KVM_X86_SEV_VM or KVM_X86_SEV_ES_VM VM type
is supported, and if so use it together with the KVM_SEV_INIT2 function
of the KVM_MEMORY_ENCRYPT_OP ioctl.  If not, it will fall back to
KVM_SEV_INIT and KVM_SEV_ES_INIT.

This is preparatory work towards TDX and SEV-SNP support, but it
will also enable support for VMSA features such as DebugSwap, which
are only available via KVM_SEV_INIT2.

Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/confidential-guest.h | 19 ++++++++++++++
 target/i386/kvm/kvm_i386.h       |  2 ++
 hw/i386/x86.c                    | 11 ++++++++
 target/i386/kvm/kvm.c            | 44 ++++++++++++++++++++++++++++++++
 4 files changed, 76 insertions(+)

diff --git a/target/i386/confidential-guest.h b/target/i386/confidential-guest.h
index ca12d5a8fba..532e172a60b 100644
--- a/target/i386/confidential-guest.h
+++ b/target/i386/confidential-guest.h
@@ -36,5 +36,24 @@ struct X86ConfidentialGuest {
 struct X86ConfidentialGuestClass {
     /* <private> */
     ConfidentialGuestSupportClass parent;
+
+    /* <public> */
+    int (*kvm_type)(X86ConfidentialGuest *cg);
 };
+
+/**
+ * x86_confidential_guest_kvm_type:
+ *
+ * Calls the #X86ConfidentialGuestClass.kvm_type() hook of @cg, or returns 0 if unset.
+ */
+static inline int x86_confidential_guest_kvm_type(X86ConfidentialGuest *cg)
+{
+    X86ConfidentialGuestClass *klass = X86_CONFIDENTIAL_GUEST_GET_CLASS(cg);
+
+    if (klass->kvm_type) {
+        return klass->kvm_type(cg);
+    } else {
+        return 0;
+    }
+}
 #endif
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index 30fedcffea3..6b44844d95d 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -37,6 +37,7 @@ bool kvm_hv_vpindex_settable(void);
 bool kvm_enable_sgx_provisioning(KVMState *s);
 bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp);
 
+int kvm_get_vm_type(MachineState *ms);
 void kvm_arch_reset_vcpu(X86CPU *cs);
 void kvm_arch_after_reset_vcpu(X86CPU *cpu);
 void kvm_arch_do_init_vcpu(X86CPU *cs);
@@ -49,6 +50,7 @@ void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask);
 
 #ifdef CONFIG_KVM
 
+bool kvm_is_vm_type_supported(int type);
 bool kvm_has_adjust_clock_stable(void);
 bool kvm_has_exception_payload(void);
 void kvm_synchronize_all_tsc(void);
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 84a48019770..3d5b51e92db 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1381,6 +1381,16 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
     qapi_free_SgxEPCList(list);
 }
 
+static int x86_kvm_type(MachineState *ms, const char *vm_type)
+{
+    /*
+     * No x86 machine has a kvm-type property.  If one is added that has
+     * it, it should call kvm_get_vm_type() directly or not use it at all.
+     */
+    assert(vm_type == NULL);
+    return kvm_enabled() ? kvm_get_vm_type(ms) : 0;
+}
+
 static void x86_machine_initfn(Object *obj)
 {
     X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1405,6 +1415,7 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
     mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
     mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
     mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
+    mc->kvm_type = x86_kvm_type;
     x86mc->save_tsc_khz = true;
     x86mc->fwcfg_dma_enabled = true;
     nc->nmi_monitor_handler = x86_nmi;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 0ec69109a2b..b599a7fae36 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -31,6 +31,7 @@
 #include "sysemu/kvm_int.h"
 #include "sysemu/runstate.h"
 #include "kvm_i386.h"
+#include "../confidential-guest.h"
 #include "sev.h"
 #include "xen-emu.h"
 #include "hyperv.h"
@@ -161,6 +162,49 @@ static KVMMSRHandlers msr_handlers[KVM_MSR_FILTER_MAX_RANGES];
 static RateLimit bus_lock_ratelimit_ctrl;
 static int kvm_get_one_msr(X86CPU *cpu, int index, uint64_t *value);
 
+static const char *vm_type_name[] = {
+    [KVM_X86_DEFAULT_VM] = "default",
+};
+
+bool kvm_is_vm_type_supported(int type)
+{
+    uint32_t machine_types;
+
+    /*
+     * old KVM doesn't support KVM_CAP_VM_TYPES but KVM_X86_DEFAULT_VM
+     * is always supported
+     */
+    if (type == KVM_X86_DEFAULT_VM) {
+        return true;
+    }
+
+    machine_types = kvm_check_extension(KVM_STATE(current_machine->accelerator),
+                                        KVM_CAP_VM_TYPES);
+    return !!(machine_types & BIT(type));
+}
+
+int kvm_get_vm_type(MachineState *ms)
+{
+    int kvm_type = KVM_X86_DEFAULT_VM;
+
+    if (ms->cgs) {
+        if (!object_dynamic_cast(OBJECT(ms->cgs), TYPE_X86_CONFIDENTIAL_GUEST)) {
+            error_report("configuration type %s not supported for x86 guests",
+                         object_get_typename(OBJECT(ms->cgs)));
+            exit(1);
+        }
+        kvm_type = x86_confidential_guest_kvm_type(
+            X86_CONFIDENTIAL_GUEST(ms->cgs));
+    }
+
+    if (!kvm_is_vm_type_supported(kvm_type)) {
+        error_report("vm-type %s not supported by KVM", vm_type_name[kvm_type]);
+        exit(1);
+    }
+
+    return kvm_type;
+}
+
 bool kvm_has_smm(void)
 {
     return kvm_vm_check_extension(kvm_state, KVM_CAP_X86_SMM);
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 16/26] target/i386: SEV: use KVM_SEV_INIT2 if possible
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (14 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 15/26] target/i386: Implement mc->kvm_type() to get VM type Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-22 18:11 ` [PATCH 17/26] trace/kvm: Split address space and slot id in trace_kvm_set_user_memory() Paolo Bonzini
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

Implement support for the KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM virtual
machine types, and the KVM_SEV_INIT2 function of KVM_MEMORY_ENCRYPT_OP.

These replace the KVM_SEV_INIT and KVM_SEV_ES_INIT functions, and have
several advantages:

- sharing the initialization sequence with SEV-SNP and TDX

- allowing arguments including the set of desired VMSA features

- protection against invalid use of KVM_GET/SET_* ioctls for guests
  with encrypted state

If the KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM types are not supported,
fall back to KVM_SEV_INIT and KVM_SEV_ES_INIT (which use the
default x86 VM type).
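
As a minimal sketch (not from this patch), the new path boils down to
passing an argument block to KVM_SEV_INIT2; assuming the kvm_sev_init
structure from the updated headers, a follow-up could use it to request
VMSA features instead of passing all zeroes:

    struct kvm_sev_init args = {
        /* desired SEV-ES VMSA feature bits, e.g. DebugSwap (assumption) */
        .vmsa_features = 0,
    };

    ret = sev_ioctl(sev->sev_fd, KVM_SEV_INIT2, &args, &fw_error);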

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/kvm/kvm.c |  2 ++
 target/i386/sev.c     | 41 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index b599a7fae36..2577e345502 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -164,6 +164,8 @@ static int kvm_get_one_msr(X86CPU *cpu, int index, uint64_t *value);
 
 static const char *vm_type_name[] = {
     [KVM_X86_DEFAULT_VM] = "default",
+    [KVM_X86_SEV_VM] = "SEV",
+    [KVM_X86_SEV_ES_VM] = "SEV-ES",
 };
 
 bool kvm_is_vm_type_supported(int type)
diff --git a/target/i386/sev.c b/target/i386/sev.c
index ebe36d4c10c..9dab4060b84 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -26,6 +26,7 @@
 #include "qemu/error-report.h"
 #include "crypto/hash.h"
 #include "sysemu/kvm.h"
+#include "kvm/kvm_i386.h"
 #include "sev.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/runstate.h"
@@ -56,6 +57,8 @@ OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
 struct SevGuestState {
     X86ConfidentialGuest parent_obj;
 
+    int kvm_type;
+
     /* configuration parameters */
     char *sev_device;
     uint32_t policy;
@@ -850,6 +853,26 @@ sev_vm_state_change(void *opaque, bool running, RunState state)
     }
 }
 
+static int sev_kvm_type(X86ConfidentialGuest *cg)
+{
+    SevGuestState *sev = SEV_GUEST(cg);
+    int kvm_type;
+
+    if (sev->kvm_type != -1) {
+        goto out;
+    }
+
+    kvm_type = (sev->policy & SEV_POLICY_ES) ? KVM_X86_SEV_ES_VM : KVM_X86_SEV_VM;
+    if (kvm_is_vm_type_supported(kvm_type)) {
+        sev->kvm_type = kvm_type;
+    } else {
+        sev->kvm_type = KVM_X86_DEFAULT_VM;
+    }
+
+out:
+    return sev->kvm_type;
+}
+
 static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
 {
     SevGuestState *sev = SEV_GUEST(cgs);
@@ -929,13 +952,19 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
                          __func__);
             goto err;
         }
-        cmd = KVM_SEV_ES_INIT;
-    } else {
-        cmd = KVM_SEV_INIT;
     }
 
     trace_kvm_sev_init();
-    ret = sev_ioctl(sev->sev_fd, cmd, NULL, &fw_error);
+    if (sev_kvm_type(X86_CONFIDENTIAL_GUEST(sev)) == KVM_X86_DEFAULT_VM) {
+        cmd = sev_es_enabled() ? KVM_SEV_ES_INIT : KVM_SEV_INIT;
+
+        ret = sev_ioctl(sev->sev_fd, cmd, NULL, &fw_error);
+    } else {
+        struct kvm_sev_init args = { 0 };
+
+        ret = sev_ioctl(sev->sev_fd, KVM_SEV_INIT2, &args, &fw_error);
+    }
+
     if (ret) {
         error_setg(errp, "%s: failed to initialize ret=%d fw_error=%d '%s'",
                    __func__, ret, fw_error, fw_error_to_str(fw_error));
@@ -1327,8 +1356,10 @@ static void
 sev_guest_class_init(ObjectClass *oc, void *data)
 {
     ConfidentialGuestSupportClass *klass = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
+    X86ConfidentialGuestClass *x86_klass = X86_CONFIDENTIAL_GUEST_CLASS(oc);
 
     klass->kvm_init = sev_kvm_init;
+    x86_klass->kvm_type = sev_kvm_type;
 
     object_class_property_add_str(oc, "sev-device",
                                   sev_guest_get_sev_device,
@@ -1357,6 +1388,8 @@ sev_guest_instance_init(Object *obj)
 {
     SevGuestState *sev = SEV_GUEST(obj);
 
+    sev->kvm_type = -1;
+
     sev->sev_device = g_strdup(DEFAULT_SEV_DEVICE);
     sev->policy = DEFAULT_GUEST_POLICY;
     object_property_add_uint32_ptr(obj, "policy", &sev->policy,
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 17/26] trace/kvm: Split address space and slot id in trace_kvm_set_user_memory()
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (15 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 16/26] target/i386: SEV: use KVM_SEV_INIT2 if possible Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-22 18:11 ` [PATCH 18/26] kvm: Introduce support for memory_attributes Paolo Bonzini
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Xiaoyao Li <xiaoyao.li@intel.com>

The upper 16 bits of kvm_userspace_memory_region::slot hold the
address space id.  Report it separately in trace_kvm_set_user_memory().

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240229063726.610065-5-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 accel/kvm/kvm-all.c    | 5 +++--
 accel/kvm/trace-events | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index a05dea23133..4ac3cf1c9ef 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -303,8 +303,9 @@ static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, boo
     ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
     slot->old_flags = mem.flags;
 err:
-    trace_kvm_set_user_memory(mem.slot, mem.flags, mem.guest_phys_addr,
-                              mem.memory_size, mem.userspace_addr, ret);
+    trace_kvm_set_user_memory(mem.slot >> 16, (uint16_t)mem.slot, mem.flags,
+                              mem.guest_phys_addr, mem.memory_size,
+                              mem.userspace_addr, ret);
     if (ret < 0) {
         error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
                      " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index a25902597b1..9f599abc172 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -15,7 +15,7 @@ kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
 kvm_irqchip_release_virq(int virq) "virq %d"
 kvm_set_ioeventfd_mmio(int fd, uint64_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%" PRIx64 " val=0x%x assign: %d size: %d match: %d"
 kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%x val=0x%x assign: %d size: %d match: %d"
-kvm_set_user_memory(uint32_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
+kvm_set_user_memory(uint16_t as, uint16_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "AddrSpace#%d Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
 kvm_clear_dirty_log(uint32_t slot, uint64_t start, uint32_t size) "slot#%"PRId32" start 0x%"PRIx64" size 0x%"PRIx32
 kvm_resample_fd_notify(int gsi) "gsi %d"
 kvm_dirty_ring_full(int id) "vcpu %d"
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 18/26] kvm: Introduce support for memory_attributes
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (16 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 17/26] trace/kvm: Split address space and slot id in trace_kvm_set_user_memory() Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-25  9:31   ` Philippe Mathieu-Daudé
  2024-03-22 18:11 ` [PATCH 19/26] RAMBlock: Add support of KVM private guest memfd Paolo Bonzini
                   ` (7 subsequent siblings)
  25 siblings, 1 reply; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Xiaoyao Li <xiaoyao.li@intel.com>

Introduce the helper functions to set the attributes of a range of
memory to private or shared.

This is necessary to notify KVM of the private/shared attribute of each
GPA range.  KVM needs this information to decide whether a GPA should be
mapped from hva-based shared memory or from guest_memfd-based private memory.
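
As a minimal sketch (not part of this patch), a later consumer such as the
KVM_EXIT_MEMORY_FAULT handler would flip a GPA range between the two states
roughly like this; kvm_convert_memory() is just a placeholder name here:

    static int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
    {
        /* Tell KVM which view of the range the guest is going to use. */
        if (to_private) {
            return kvm_set_memory_attributes_private(start, size);
        }
        return kvm_set_memory_attributes_shared(start, size);
    }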

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240320083945.991426-11-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/sysemu/kvm.h |  4 ++++
 accel/kvm/kvm-all.c  | 31 +++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 54f4d83a370..bda309d5ffa 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -536,4 +536,8 @@ void kvm_mark_guest_state_protected(void);
  * reported for the VM.
  */
 bool kvm_hwpoisoned_mem(void);
+
+int kvm_set_memory_attributes_private(hwaddr start, hwaddr size);
+int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size);
+
 #endif
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 4ac3cf1c9ef..36e39fd6514 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -91,6 +91,7 @@ bool kvm_msi_use_devid;
 static bool kvm_has_guest_debug;
 static int kvm_sstep_flags;
 static bool kvm_immediate_exit;
+static uint64_t kvm_supported_memory_attributes;
 static hwaddr kvm_max_slot_size = ~0;
 
 static const KVMCapabilityInfo kvm_required_capabilites[] = {
@@ -1266,6 +1267,35 @@ void kvm_set_max_memslot_size(hwaddr max_slot_size)
     kvm_max_slot_size = max_slot_size;
 }
 
+static int kvm_set_memory_attributes(hwaddr start, hwaddr size, uint64_t attr)
+{
+    struct kvm_memory_attributes attrs;
+    int r;
+
+    assert((attr & kvm_supported_memory_attributes) == attr);
+    attrs.attributes = attr;
+    attrs.address = start;
+    attrs.size = size;
+    attrs.flags = 0;
+
+    r = kvm_vm_ioctl(kvm_state, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
+    if (r) {
+        error_report("failed to set memory (0x%lx+%#zx) with attr 0x%lx error '%s'",
+                     start, size, attr, strerror(errno));
+    }
+    return r;
+}
+
+int kvm_set_memory_attributes_private(hwaddr start, hwaddr size)
+{
+    return kvm_set_memory_attributes(start, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
+}
+
+int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size)
+{
+    return kvm_set_memory_attributes(start, size, 0);
+}
+
 /* Called with KVMMemoryListener.slots_lock held */
 static void kvm_set_phys_mem(KVMMemoryListener *kml,
                              MemoryRegionSection *section, bool add)
@@ -2382,6 +2412,7 @@ static int kvm_init(MachineState *ms)
         goto err;
     }
 
+    kvm_supported_memory_attributes = kvm_check_extension(s, KVM_CAP_MEMORY_ATTRIBUTES);
     kvm_immediate_exit = kvm_check_extension(s, KVM_CAP_IMMEDIATE_EXIT);
     s->nr_slots = kvm_check_extension(s, KVM_CAP_NR_MEMSLOTS);
 
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 19/26] RAMBlock: Add support of KVM private guest memfd
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (17 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 18/26] kvm: Introduce support for memory_attributes Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-22 18:53   ` Michael Roth
  2024-03-22 18:11 ` [PATCH 20/26] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot Paolo Bonzini
                   ` (6 subsequent siblings)
  25 siblings, 1 reply; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Michael Roth <michael.roth@amd.com>

Add KVM guest_memfd support to RAMBlock so that both normal hva-based memory
and KVM guest_memfd-based private memory can be associated with one RAMBlock.

Introduce a new flag, RAM_GUEST_MEMFD.  When it is set, a KVM ioctl is
called to create a private guest_memfd during RAMBlock setup.

Allocating a new RAM_GUEST_MEMFD flag to instruct the setup of guest_memfd
is more flexible and extensible than simply relying on the VM type, because
in the future we may have cases where not all of a VM's memory needs
guest_memfd.  As a benefit, it also avoids reaching into MachineState from
the memory subsystem.

Note that RAM_GUEST_MEMFD is supposed to be set for the memory backends of
confidential guests, such as TDX VMs.  How and when to set it for memory
backends will be implemented in the following patches.

Introduce memory_region_has_guest_memfd() to query if the MemoryRegion has
KVM guest_memfd allocated.
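
A minimal sketch (not from this patch) of how a caller might request and
then query a guest_memfd-capable region once the flag exists; owner, mr and
size are assumed to be set up elsewhere and error handling is abbreviated:

    memory_region_init_ram_flags_nomigrate(mr, owner, "private-ram", size,
                                            RAM_GUEST_MEMFD, &error_fatal);

    if (memory_region_has_guest_memfd(mr)) {
        /* mr->ram_block->guest_memfd holds the fd from KVM_CREATE_GUEST_MEMFD */
    }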

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20240320083945.991426-7-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/exec/memory.h   | 20 +++++++++++++++++---
 include/exec/ram_addr.h |  2 +-
 include/exec/ramblock.h |  1 +
 include/sysemu/kvm.h    |  3 ++-
 accel/kvm/kvm-all.c     | 28 ++++++++++++++++++++++++++++
 accel/stubs/kvm-stub.c  |  5 +++++
 system/memory.c         |  5 +++++
 system/physmem.c        | 34 +++++++++++++++++++++++++++++++---
 8 files changed, 90 insertions(+), 8 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 8626a355b31..679a8476852 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -243,6 +243,9 @@ typedef struct IOMMUTLBEvent {
 /* RAM FD is opened read-only */
 #define RAM_READONLY_FD (1 << 11)
 
+/* RAM can be private that has kvm guest memfd backend */
+#define RAM_GUEST_MEMFD   (1 << 12)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
@@ -1307,7 +1310,8 @@ bool memory_region_init_ram_nomigrate(MemoryRegion *mr,
  * @name: Region name, becomes part of RAMBlock name used in migration stream
  *        must be unique within any device
  * @size: size of the region.
- * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_NORESERVE.
+ * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_NORESERVE,
+ *             RAM_GUEST_MEMFD.
  * @errp: pointer to Error*, to store an error if it happens.
  *
  * Note that this function does not do anything to cause the data in the
@@ -1369,7 +1373,7 @@ bool memory_region_init_resizeable_ram(MemoryRegion *mr,
  *         (getpagesize()) will be used.
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *             RAM_NORESERVE, RAM_PROTECTED, RAM_NAMED_FILE, RAM_READONLY,
- *             RAM_READONLY_FD
+ *             RAM_READONLY_FD, RAM_GUEST_MEMFD
  * @path: the path in which to allocate the RAM.
  * @offset: offset within the file referenced by path
  * @errp: pointer to Error*, to store an error if it happens.
@@ -1399,7 +1403,7 @@ bool memory_region_init_ram_from_file(MemoryRegion *mr,
  * @size: size of the region.
  * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *             RAM_NORESERVE, RAM_PROTECTED, RAM_NAMED_FILE, RAM_READONLY,
- *             RAM_READONLY_FD
+ *             RAM_READONLY_FD, RAM_GUEST_MEMFD
  * @fd: the fd to mmap.
  * @offset: offset within the file referenced by fd
  * @errp: pointer to Error*, to store an error if it happens.
@@ -1722,6 +1726,16 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
  */
 bool memory_region_is_protected(MemoryRegion *mr);
 
+/**
+ * memory_region_has_guest_memfd: check whether a memory region has guest_memfd
+ *     associated
+ *
+ * Returns %true if a memory region's ram_block has valid guest_memfd assigned.
+ *
+ * @mr: the memory region being queried
+ */
+bool memory_region_has_guest_memfd(MemoryRegion *mr);
+
 /**
  * memory_region_get_iommu: check whether a memory region is an iommu
  *
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index de45ba7bc96..07c8f863750 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -110,7 +110,7 @@ long qemu_maxrampagesize(void);
  *  @mr: the memory region where the ram block is
  *  @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
  *              RAM_NORESERVE, RAM_PROTECTED, RAM_NAMED_FILE, RAM_READONLY,
- *              RAM_READONLY_FD
+ *              RAM_READONLY_FD, RAM_GUEST_MEMFD
  *  @mem_path or @fd: specify the backing file or device
  *  @offset: Offset into target file
  *  @errp: pointer to Error*, to store an error if it happens
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 848915ea5bf..459c8917de2 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -41,6 +41,7 @@ struct RAMBlock {
     QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
     int fd;
     uint64_t fd_offset;
+    int guest_memfd;
     size_t page_size;
     /* dirty bitmap used during migration */
     unsigned long *bmap;
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index bda309d5ffa..2cb31925091 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -537,7 +537,8 @@ void kvm_mark_guest_state_protected(void);
  */
 bool kvm_hwpoisoned_mem(void);
 
+int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp);
+
 int kvm_set_memory_attributes_private(hwaddr start, hwaddr size);
 int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size);
-
 #endif
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 36e39fd6514..6aa0608805b 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -92,6 +92,7 @@ static bool kvm_has_guest_debug;
 static int kvm_sstep_flags;
 static bool kvm_immediate_exit;
 static uint64_t kvm_supported_memory_attributes;
+static bool kvm_guest_memfd_supported;
 static hwaddr kvm_max_slot_size = ~0;
 
 static const KVMCapabilityInfo kvm_required_capabilites[] = {
@@ -2413,6 +2414,11 @@ static int kvm_init(MachineState *ms)
     }
 
     kvm_supported_memory_attributes = kvm_check_extension(s, KVM_CAP_MEMORY_ATTRIBUTES);
+    kvm_guest_memfd_supported =
+        kvm_check_extension(s, KVM_CAP_GUEST_MEMFD) &&
+        kvm_check_extension(s, KVM_CAP_USER_MEMORY2) &&
+        (kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE);
+
     kvm_immediate_exit = kvm_check_extension(s, KVM_CAP_IMMEDIATE_EXIT);
     s->nr_slots = kvm_check_extension(s, KVM_CAP_NR_MEMSLOTS);
 
@@ -4131,3 +4137,25 @@ void kvm_mark_guest_state_protected(void)
 {
     kvm_state->guest_state_protected = true;
 }
+
+int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
+{
+    int fd;
+    struct kvm_create_guest_memfd guest_memfd = {
+        .size = size,
+        .flags = flags,
+    };
+
+    if (!kvm_guest_memfd_supported) {
+        error_setg(errp, "KVM doesn't support guest memfd\n");
+        return -1;
+    }
+
+    fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_GUEST_MEMFD, &guest_memfd);
+    if (fd < 0) {
+        error_setg_errno(errp, errno, "Error creating kvm guest memfd");
+        return -1;
+    }
+
+    return fd;
+}
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index ca381728840..8e0eb22e61c 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -129,3 +129,8 @@ bool kvm_hwpoisoned_mem(void)
 {
     return false;
 }
+
+int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
+{
+    return -ENOSYS;
+}
diff --git a/system/memory.c b/system/memory.c
index a229a79988f..c756950c0c0 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -1850,6 +1850,11 @@ bool memory_region_is_protected(MemoryRegion *mr)
     return mr->ram && (mr->ram_block->flags & RAM_PROTECTED);
 }
 
+bool memory_region_has_guest_memfd(MemoryRegion *mr)
+{
+    return mr->ram_block && mr->ram_block->guest_memfd >= 0;
+}
+
 uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
 {
     uint8_t mask = mr->dirty_log_mask;
diff --git a/system/physmem.c b/system/physmem.c
index a4fe3d2bf89..f5dfa20e57e 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1808,6 +1808,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
     const bool shared = qemu_ram_is_shared(new_block);
     RAMBlock *block;
     RAMBlock *last_block = NULL;
+    bool free_on_error = false;
     ram_addr_t old_ram_size, new_ram_size;
     Error *err = NULL;
 
@@ -1837,6 +1838,19 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
                 return;
             }
             memory_try_enable_merging(new_block->host, new_block->max_length);
+            free_on_error = true;
+        }
+    }
+
+    if (new_block->flags & RAM_GUEST_MEMFD) {
+        assert(kvm_enabled());
+        assert(new_block->guest_memfd < 0);
+
+        new_block->guest_memfd = kvm_create_guest_memfd(new_block->max_length,
+                                                        0, errp);
+        if (new_block->guest_memfd < 0) {
+            qemu_mutex_unlock_ramlist();
+            goto out_free;
         }
     }
 
@@ -1888,6 +1902,13 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
         ram_block_notify_add(new_block->host, new_block->used_length,
                              new_block->max_length);
     }
+    return;
+
+out_free:
+    if (free_on_error) {
+        qemu_anon_ram_free(new_block->host, new_block->max_length);
+        new_block->host = NULL;
+    }
 }
 
 #ifdef CONFIG_POSIX
@@ -1902,7 +1923,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     /* Just support these ram flags by now. */
     assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE |
                           RAM_PROTECTED | RAM_NAMED_FILE | RAM_READONLY |
-                          RAM_READONLY_FD)) == 0);
+                          RAM_READONLY_FD | RAM_GUEST_MEMFD)) == 0);
 
     if (xen_enabled()) {
         error_setg(errp, "-mem-path not supported with Xen");
@@ -1939,6 +1960,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     new_block->used_length = size;
     new_block->max_length = size;
     new_block->flags = ram_flags;
+    new_block->guest_memfd = -1;
     new_block->host = file_ram_alloc(new_block, size, fd, !file_size, offset,
                                      errp);
     if (!new_block->host) {
@@ -2018,7 +2040,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     int align;
 
     assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
-                          RAM_NORESERVE)) == 0);
+                          RAM_NORESERVE | RAM_GUEST_MEMFD)) == 0);
     assert(!host ^ (ram_flags & RAM_PREALLOC));
 
     align = qemu_real_host_page_size();
@@ -2033,6 +2055,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     new_block->max_length = max_size;
     assert(max_size >= size);
     new_block->fd = -1;
+    new_block->guest_memfd = -1;
     new_block->page_size = qemu_real_host_page_size();
     new_block->host = host;
     new_block->flags = ram_flags;
@@ -2055,7 +2078,7 @@ RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
 RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags,
                          MemoryRegion *mr, Error **errp)
 {
-    assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE)) == 0);
+    assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE | RAM_GUEST_MEMFD)) == 0);
     return qemu_ram_alloc_internal(size, size, NULL, NULL, ram_flags, mr, errp);
 }
 
@@ -2083,6 +2106,11 @@ static void reclaim_ramblock(RAMBlock *block)
     } else {
         qemu_anon_ram_free(block->host, block->max_length);
     }
+
+    if (block->guest_memfd >= 0) {
+        close(block->guest_memfd);
+    }
+
     g_free(block);
 }
 
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 20/26] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (18 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 19/26] RAMBlock: Add support of KVM private guest memfd Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-22 18:11 ` [PATCH 21/26] kvm/memory: Make memory type private by default if it has guest memfd backend Paolo Bonzini
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Chao Peng <chao.p.peng@linux.intel.com>

Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM.

With KVM_SET_USER_MEMORY_REGION2, QEMU can set up a memory region that
is backed both by hva-based shared memory and by guest_memfd-based
private memory.
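
Conceptually (sketch only; the actual code below also keeps the old ioctl
as a fallback), a slot that carries private memory is described to KVM as:

    struct kvm_userspace_memory_region2 mem = {
        .slot               = slot->slot | (kml->as_id << 16),
        .flags              = KVM_MEM_GUEST_MEMFD,
        .guest_phys_addr    = slot->start_addr,
        .memory_size        = slot->memory_size,
        .userspace_addr     = (unsigned long)slot->ram,   /* shared view  */
        .guest_memfd        = slot->guest_memfd,          /* private view */
        .guest_memfd_offset = slot->guest_memfd_offset,
    };

    kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);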

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240320083945.991426-10-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/sysemu/kvm_int.h |  2 ++
 accel/kvm/kvm-all.c      | 46 +++++++++++++++++++++++++++++++++-------
 accel/kvm/trace-events   |  2 +-
 3 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 3496be7997a..a5a3fee4119 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -30,6 +30,8 @@ typedef struct KVMSlot
     int as_id;
     /* Cache of the offset in ram address space */
     ram_addr_t ram_start_offset;
+    int guest_memfd;
+    hwaddr guest_memfd_offset;
 } KVMSlot;
 
 typedef struct KVMMemoryUpdate {
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 6aa0608805b..7fbaf31cbaf 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -284,35 +284,58 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram,
 static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, bool new)
 {
     KVMState *s = kvm_state;
-    struct kvm_userspace_memory_region mem;
+    struct kvm_userspace_memory_region2 mem;
     int ret;
 
     mem.slot = slot->slot | (kml->as_id << 16);
     mem.guest_phys_addr = slot->start_addr;
     mem.userspace_addr = (unsigned long)slot->ram;
     mem.flags = slot->flags;
+    mem.guest_memfd = slot->guest_memfd;
+    mem.guest_memfd_offset = slot->guest_memfd_offset;
 
     if (slot->memory_size && !new && (mem.flags ^ slot->old_flags) & KVM_MEM_READONLY) {
         /* Set the slot size to 0 before setting the slot to the desired
          * value. This is needed based on KVM commit 75d61fbc. */
         mem.memory_size = 0;
-        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+
+        if (kvm_guest_memfd_supported) {
+            ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
+        } else {
+            ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+        }
         if (ret < 0) {
             goto err;
         }
     }
     mem.memory_size = slot->memory_size;
-    ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+    if (kvm_guest_memfd_supported) {
+        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
+    } else {
+        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+    }
     slot->old_flags = mem.flags;
 err:
     trace_kvm_set_user_memory(mem.slot >> 16, (uint16_t)mem.slot, mem.flags,
                               mem.guest_phys_addr, mem.memory_size,
-                              mem.userspace_addr, ret);
+                              mem.userspace_addr, mem.guest_memfd,
+                              mem.guest_memfd_offset, ret);
     if (ret < 0) {
-        error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
-                     " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
-                     __func__, mem.slot, slot->start_addr,
-                     (uint64_t)mem.memory_size, strerror(errno));
+        if (kvm_guest_memfd_supported) {
+                error_report("%s: KVM_SET_USER_MEMORY_REGION2 failed, slot=%d,"
+                        " start=0x%" PRIx64 ", size=0x%" PRIx64 ","
+                        " flags=0x%" PRIx32 ", guest_memfd=%" PRId32 ","
+                        " guest_memfd_offset=0x%" PRIx64 ": %s",
+                        __func__, mem.slot, slot->start_addr,
+                        (uint64_t)mem.memory_size, mem.flags,
+                        mem.guest_memfd, (uint64_t)mem.guest_memfd_offset,
+                        strerror(errno));
+        } else {
+                error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
+                            " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
+                            __func__, mem.slot, slot->start_addr,
+                            (uint64_t)mem.memory_size, strerror(errno));
+        }
     }
     return ret;
 }
@@ -467,6 +490,10 @@ static int kvm_mem_flags(MemoryRegion *mr)
     if (readonly && kvm_readonly_mem_allowed) {
         flags |= KVM_MEM_READONLY;
     }
+    if (memory_region_has_guest_memfd(mr)) {
+        assert(kvm_guest_memfd_supported);
+        flags |= KVM_MEM_GUEST_MEMFD;
+    }
     return flags;
 }
 
@@ -1393,6 +1420,9 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
         mem->ram_start_offset = ram_start_offset;
         mem->ram = ram;
         mem->flags = kvm_mem_flags(mr);
+        mem->guest_memfd = mr->ram_block->guest_memfd;
+        mem->guest_memfd_offset = (uint8_t*)ram - mr->ram_block->host;
+
         kvm_slot_init_dirty_bitmap(mem);
         err = kvm_set_user_memory_region(kml, mem, true);
         if (err) {
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 9f599abc172..e8c52cb9e7a 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -15,7 +15,7 @@ kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
 kvm_irqchip_release_virq(int virq) "virq %d"
 kvm_set_ioeventfd_mmio(int fd, uint64_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%" PRIx64 " val=0x%x assign: %d size: %d match: %d"
 kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%x val=0x%x assign: %d size: %d match: %d"
-kvm_set_user_memory(uint16_t as, uint16_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "AddrSpace#%d Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
+kvm_set_user_memory(uint16_t as, uint16_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, uint32_t fd, uint64_t fd_offset, int ret) "AddrSpace#%d Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " guest_memfd=%d" " guest_memfd_offset=0x%" PRIx64 " ret=%d"
 kvm_clear_dirty_log(uint32_t slot, uint64_t start, uint32_t size) "slot#%"PRId32" start 0x%"PRIx64" size 0x%"PRIx32
 kvm_resample_fd_notify(int gsi) "gsi %d"
 kvm_dirty_ring_full(int id) "vcpu %d"
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 21/26] kvm/memory: Make memory type private by default if it has guest memfd backend
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (19 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 20/26] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-26  8:56   ` Xiaoyao Li
  2024-03-22 18:11 ` [PATCH 22/26] HostMem: Add mechanism to opt in kvm guest memfd via MachineState Paolo Bonzini
                   ` (4 subsequent siblings)
  25 siblings, 1 reply; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Xiaoyao Li <xiaoyao.li@intel.com>

KVM leaves the memory as shared by default, which may incur the overhead
of a page conversion on the first access to each page, because for VMs
that require private memory (i.e., have a guest memfd) the expectation
is that pages are likely to be private.

Explicitly set the memory to private when the memory region has a valid
guest memfd backend.
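
For reference, kvm_set_memory_attributes_private() (added earlier in this
series, not shown here) boils down to the KVM_SET_MEMORY_ATTRIBUTES ioctl;
a minimal sketch, with error reporting elided:

    static int kvm_set_memory_attributes_private(hwaddr start, hwaddr size)
    {
        struct kvm_memory_attributes attrs = {
            .address = start,
            .size = size,
            .attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
        };

        /* Flip the whole [start, start + size) range to the private attribute */
        return kvm_vm_ioctl(kvm_state, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
    }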

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240320083945.991426-16-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 accel/kvm/kvm-all.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 7fbaf31cbaf..56b17cbd8aa 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1430,6 +1430,16 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
                     strerror(-err));
             abort();
         }
+
+        if (memory_region_has_guest_memfd(mr)) {
+            err = kvm_set_memory_attributes_private(start_addr, slot_size);
+            if (err) {
+                error_report("%s: failed to set memory attribute private: %s\n",
+                             __func__, strerror(-err));
+                exit(1);
+            }
+        }
+
         start_addr += slot_size;
         ram_start_offset += slot_size;
         ram += slot_size;
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 22/26] HostMem: Add mechanism to opt in kvm guest memfd via MachineState
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (20 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 21/26] kvm/memory: Make memory type private by default if it has guest memfd backend Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-22 18:11 ` [PATCH 23/26] RAMBlock: make guest_memfd require uncoordinated discard Paolo Bonzini
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Xiaoyao Li <xiaoyao.li@intel.com>

Add a new member "guest_memfd" to memory backends. When it's set
to true, it enables RAM_GUEST_MEMFD in ram_flags, thus private kvm
guest_memfd will be allocated during RAMBlock allocation.

Memory backend's @guest_memfd is wired with @require_guest_memfd
field of MachineState. It avoid looking up the machine in phymem.c.

MachineState::require_guest_memfd is supposed to be set by any VMs
that requires KVM guest memfd as private memory, e.g., TDX VM.
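
As an illustration (not part of this patch; tdx_kvm_init() is a hypothetical
name standing in for the out-of-tree TDX code), a confidential-guest
kvm_init hook could opt the machine in like this:

    static int tdx_kvm_init(ConfidentialGuestSupport *cgs, Error **errp)
    {
        MachineState *ms = MACHINE(qdev_get_machine());

        /* Every host memory backend will now get RAM_GUEST_MEMFD */
        ms->require_guest_memfd = true;

        return 0;
    }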

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20240320083945.991426-8-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/hw/boards.h      | 2 ++
 include/sysemu/hostmem.h | 1 +
 backends/hostmem-file.c  | 1 +
 backends/hostmem-memfd.c | 1 +
 backends/hostmem-ram.c   | 1 +
 backends/hostmem.c       | 1 +
 hw/core/machine.c        | 5 +++++
 7 files changed, 12 insertions(+)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 8b8f6d5c00d..44c2a4e1ec7 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -36,6 +36,7 @@ bool machine_usb(MachineState *machine);
 int machine_phandle_start(MachineState *machine);
 bool machine_dump_guest_core(MachineState *machine);
 bool machine_mem_merge(MachineState *machine);
+bool machine_require_guest_memfd(MachineState *machine);
 HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine);
 void machine_set_cpu_numa_node(MachineState *machine,
                                const CpuInstanceProperties *props,
@@ -370,6 +371,7 @@ struct MachineState {
     char *dt_compatible;
     bool dump_guest_core;
     bool mem_merge;
+    bool require_guest_memfd;
     bool usb;
     bool usb_disabled;
     char *firmware;
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 0e411aaa29e..04b884bf42a 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -74,6 +74,7 @@ struct HostMemoryBackend {
     uint64_t size;
     bool merge, dump, use_canonical_path;
     bool prealloc, is_mapped, share, reserve;
+    bool guest_memfd;
     uint32_t prealloc_threads;
     ThreadContext *prealloc_context;
     DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index ac3e433cbdd..3c69db79460 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -85,6 +85,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     ram_flags |= fb->readonly ? RAM_READONLY_FD : 0;
     ram_flags |= fb->rom == ON_OFF_AUTO_ON ? RAM_READONLY : 0;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
     ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
     ram_flags |= RAM_NAMED_FILE;
     return memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index 3923ea9364d..745ead0034d 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -55,6 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     name = host_memory_backend_get_name(backend);
     ram_flags = backend->share ? RAM_SHARED : 0;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
     return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
                                           backend->size, ram_flags, fd, 0, errp);
 }
diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index d121249f0f4..f7d81af783a 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -30,6 +30,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     name = host_memory_backend_get_name(backend);
     ram_flags = backend->share ? RAM_SHARED : 0;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+    ram_flags |= backend->guest_memfd ? RAM_GUEST_MEMFD : 0;
     return memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend),
                                                   name, backend->size,
                                                   ram_flags, errp);
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 81a72ce40b7..eb9682b4a85 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -277,6 +277,7 @@ static void host_memory_backend_init(Object *obj)
     /* TODO: convert access to globals to compat properties */
     backend->merge = machine_mem_merge(machine);
     backend->dump = machine_dump_guest_core(machine);
+    backend->guest_memfd = machine_require_guest_memfd(machine);
     backend->reserve = true;
     backend->prealloc_threads = machine->smp.cpus;
 }
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 37ede0e7d4f..73ce9da835b 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1198,6 +1198,11 @@ bool machine_mem_merge(MachineState *machine)
     return machine->mem_merge;
 }
 
+bool machine_require_guest_memfd(MachineState *machine)
+{
+    return machine->require_guest_memfd;
+}
+
 static char *cpu_slot_to_string(const CPUArchId *cpu)
 {
     GString *s = g_string_new(NULL);
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 23/26] RAMBlock: make guest_memfd require uncoordinated discard
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (21 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 22/26] HostMem: Add mechanism to opt in kvm guest memfd via MachineState Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-22 18:11 ` [PATCH 24/26] physmem: Introduce ram_block_discard_guest_memfd_range() Paolo Bonzini
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

Some subsystems like VFIO might disable ram block discard, but guest_memfd
uses discard operations to implement conversions between private and
shared memory.  Because of this, sequences like the following can result
in stale IOMMU mappings:

1. allocate shared page
2. convert page shared->private
3. discard shared page
4. convert page private->shared
5. allocate shared page
6. issue DMA operations against that shared page

This is not a use-after-free, because after step 3 VFIO is still pinning
the page.  However, DMA operations in step 6 will hit the old mapping
that was allocated in step 1.

Address this by taking ram_block_discard_is_enabled() into account when
deciding whether or not to discard pages.

Since kvm_convert_memory()/guest_memfd doesn't implement a
RamDiscardManager handler to convey and replay discard operations,
this is a case of uncoordinated discard, which is blocked/released
by ram_block_discard_require().  Interestingly, this function has had
no users so far.

Alternative approaches would be to block discard of shared pages, but
this would cause guests to consume twice the memory if they use VFIO;
or to implement a RamDiscardManager and only block uncoordinated
discard, i.e. use ram_block_coordinated_discard_require().

[Commit message mostly by Michael Roth <michael.roth@amd.com>]
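
To make the arbitration concrete, here is a rough sketch of the two sides
using QEMU's existing uncoordinated-discard helpers (illustrative only):

    /* A pinning user such as VFIO, before mapping guest RAM: */
    if (ram_block_discard_disable(true) < 0) {
        /* a guest_memfd RAMBlock already required discard: refuse assignment */
    }

    /* ram_block_add(), when creating a RAM_GUEST_MEMFD block (this patch): */
    if (ram_block_discard_require(true) < 0) {
        /* discard already disabled (e.g. by VFIO): refuse private memory */
    }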

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 system/physmem.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/system/physmem.c b/system/physmem.c
index f5dfa20e57e..5ebcf5be116 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1846,6 +1846,13 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
         assert(kvm_enabled());
         assert(new_block->guest_memfd < 0);
 
+        if (ram_block_discard_require(true) < 0) {
+            error_setg_errno(errp, errno,
+                             "cannot set up private guest memory: discard currently blocked");
+            error_append_hint(errp, "Are you using assigned devices?\n");
+            goto out_free;
+        }
+
         new_block->guest_memfd = kvm_create_guest_memfd(new_block->max_length,
                                                         0, errp);
         if (new_block->guest_memfd < 0) {
@@ -2109,6 +2116,7 @@ static void reclaim_ramblock(RAMBlock *block)
 
     if (block->guest_memfd >= 0) {
         close(block->guest_memfd);
+        ram_block_discard_require(false);
     }
 
     g_free(block);
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 24/26] physmem: Introduce ram_block_discard_guest_memfd_range()
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (22 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 23/26] RAMBlock: make guest_memfd require uncoordinated discard Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-22 18:11 ` [PATCH 25/26] kvm: handle KVM_EXIT_MEMORY_FAULT Paolo Bonzini
  2024-03-22 18:11 ` [PATCH 26/26] i386/kvm: Move architectural CPUID leaf generation to separate helper Paolo Bonzini
  25 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Xiaoyao Li <xiaoyao.li@intel.com>

When a memory page is converted from private to shared, the original
private memory, which is backed by guest_memfd, needs to be discarded.
Introduce ram_block_discard_guest_memfd_range() for discarding memory
in guest_memfd.

Based on a patch by Isaku Yamahata <isaku.yamahata@intel.com>.
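
A sketch of the intended caller (the conversion path added later in this
series), assuming the RAMBlock and offset have already been resolved:

    if (to_private) {
        /* guest now uses the guest_memfd copy; drop the stale shared pages */
        ret = ram_block_discard_range(rb, offset, size);
    } else {
        /* guest now uses shared pages; drop the stale guest_memfd copy */
        ret = ram_block_discard_guest_memfd_range(rb, offset, size);
    }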

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240320083945.991426-12-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/exec/cpu-common.h |  2 ++
 system/physmem.c          | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 6346df17ce9..6d5318895a3 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -159,6 +159,8 @@ typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
 
 int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
 int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length);
+int ram_block_discard_guest_memfd_range(RAMBlock *rb, uint64_t start,
+                                        size_t length);
 
 #endif
 
diff --git a/system/physmem.c b/system/physmem.c
index 5ebcf5be116..c3d04ca9212 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -3721,6 +3721,29 @@ err:
     return ret;
 }
 
+int ram_block_discard_guest_memfd_range(RAMBlock *rb, uint64_t start,
+                                        size_t length)
+{
+    int ret = -1;
+
+#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+    ret = fallocate(rb->guest_memfd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+                    start, length);
+
+    if (ret) {
+        ret = -errno;
+        error_report("%s: Failed to fallocate %s:%" PRIx64 " +%zx (%d)",
+                     __func__, rb->idstr, start, length, ret);
+    }
+#else
+    ret = -ENOSYS;
+    error_report("%s: fallocate not available %s:%" PRIx64 " +%zx (%d)",
+                 __func__, rb->idstr, start, length, ret);
+#endif
+
+    return ret;
+}
+
 bool ramblock_is_pmem(RAMBlock *rb)
 {
     return rb->flags & RAM_PMEM;
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 25/26] kvm: handle KVM_EXIT_MEMORY_FAULT
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (23 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 24/26] physmem: Introduce ram_block_discard_guest_memfd_range() Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-03-26  8:53   ` Xiaoyao Li
  2024-03-22 18:11 ` [PATCH 26/26] i386/kvm: Move architectural CPUID leaf generation to separate helper Paolo Bonzini
  25 siblings, 1 reply; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Chao Peng <chao.p.peng@linux.intel.com>

A KVM_EXIT_MEMORY_FAULT exit indicates that userspace needs to perform a
memory conversion on the RAMBlock to turn the memory into the desired
attribute, i.e., private or shared.

Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when
KVM_EXIT_MEMORY_FAULT happens.

Note, KVM_EXIT_MEMORY_FAULT only makes sense when the RAMBlock has a
guest_memfd memory backend.

Note, KVM_EXIT_MEMORY_FAULT is reported together with a -EFAULT return
from KVM_RUN, so special handling is added.

When a page is converted from shared to private, the original shared
memory can be discarded via ram_block_discard_range(). Note, shared
memory can be discarded only when it is not backed by hugetlb, because
hugetlb is supposed to be pre-allocated and does not need discarding.
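
For reference, the kvm_run fields consumed here are (paraphrased from the
6.8-era <linux/kvm.h>; layout as assumed by this patch):

    /* KVM_EXIT_MEMORY_FAULT, reported together with a -EFAULT return */
    struct {
        __u64 flags;   /* KVM_MEMORY_EXIT_FLAG_PRIVATE is set when the fault
                          occurred on a private memory access */
        __u64 gpa;     /* guest-physical start of the faulting range */
        __u64 size;    /* length of the faulting range, in bytes */
    } memory_fault;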

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Message-ID: <20240320083945.991426-13-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/sysemu/kvm.h   |  2 +
 accel/kvm/kvm-all.c    | 99 +++++++++++++++++++++++++++++++++++++-----
 accel/kvm/trace-events |  2 +
 3 files changed, 93 insertions(+), 10 deletions(-)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 2cb31925091..698f1640fe2 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -541,4 +541,6 @@ int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp);
 
 int kvm_set_memory_attributes_private(hwaddr start, hwaddr size);
 int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size);
+
+int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
 #endif
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 56b17cbd8aa..afd7f992e39 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2893,6 +2893,70 @@ static void kvm_eat_signals(CPUState *cpu)
     } while (sigismember(&chkset, SIG_IPI));
 }
 
+int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
+{
+    MemoryRegionSection section;
+    ram_addr_t offset;
+    MemoryRegion *mr;
+    RAMBlock *rb;
+    void *addr;
+    int ret = -1;
+
+    trace_kvm_convert_memory(start, size, to_private ? "shared_to_private" : "private_to_shared");
+
+    if (!QEMU_PTR_IS_ALIGNED(start, qemu_real_host_page_size()) ||
+        !QEMU_PTR_IS_ALIGNED(size, qemu_real_host_page_size())) {
+        return -1;
+    }
+
+    if (!size) {
+        return -1;
+    }
+
+    section = memory_region_find(get_system_memory(), start, size);
+    mr = section.mr;
+    if (!mr) {
+        return -1;
+    }
+
+    if (!memory_region_has_guest_memfd(mr)) {
+        error_report("Converting non guest_memfd backed memory region "
+                     "(0x%"HWADDR_PRIx" ,+ 0x%"HWADDR_PRIx") to %s",
+                     start, size, to_private ? "private" : "shared");
+        ret = -1;
+        goto out_unref;
+    }
+
+    if (to_private) {
+        ret = kvm_set_memory_attributes_private(start, size);
+    } else {
+        ret = kvm_set_memory_attributes_shared(start, size);
+    }
+    if (ret) {
+        goto out_unref;
+    }
+
+    addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
+    rb = qemu_ram_block_from_host(addr, false, &offset);
+
+    if (to_private) {
+        if (rb->page_size == qemu_real_host_page_size()) {
+            /*
+            * shared memory is back'ed by  hugetlb, which is supposed to be
+            * pre-allocated and doesn't need to be discarded
+            */
+            goto out_unref;
+        }
+        ret = ram_block_discard_range(rb, offset, size);
+    } else {
+        ret = ram_block_discard_guest_memfd_range(rb, offset, size);
+    }
+
+out_unref:
+    memory_region_unref(section.mr);
+    return ret;
+}
+
 int kvm_cpu_exec(CPUState *cpu)
 {
     struct kvm_run *run = cpu->kvm_run;
@@ -2960,18 +3024,20 @@ int kvm_cpu_exec(CPUState *cpu)
                 ret = EXCP_INTERRUPT;
                 break;
             }
-            fprintf(stderr, "error: kvm run failed %s\n",
-                    strerror(-run_ret));
+            if (!(run_ret == -EFAULT && run->exit_reason == KVM_EXIT_MEMORY_FAULT)) {
+                fprintf(stderr, "error: kvm run failed %s\n",
+                        strerror(-run_ret));
 #ifdef TARGET_PPC
-            if (run_ret == -EBUSY) {
-                fprintf(stderr,
-                        "This is probably because your SMT is enabled.\n"
-                        "VCPU can only run on primary threads with all "
-                        "secondary threads offline.\n");
-            }
+                if (run_ret == -EBUSY) {
+                    fprintf(stderr,
+                            "This is probably because your SMT is enabled.\n"
+                            "VCPU can only run on primary threads with all "
+                            "secondary threads offline.\n");
+                }
 #endif
-            ret = -1;
-            break;
+                ret = -1;
+                break;
+            }
         }
 
         trace_kvm_run_exit(cpu->cpu_index, run->exit_reason);
@@ -3054,6 +3120,19 @@ int kvm_cpu_exec(CPUState *cpu)
                 break;
             }
             break;
+        case KVM_EXIT_MEMORY_FAULT:
+            trace_kvm_memory_fault(run->memory_fault.gpa,
+                                   run->memory_fault.size,
+                                   run->memory_fault.flags);
+            if (run->memory_fault.flags & ~KVM_MEMORY_EXIT_FLAG_PRIVATE) {
+                error_report("KVM_EXIT_MEMORY_FAULT: Unknown flag 0x%" PRIx64,
+                             (uint64_t)run->memory_fault.flags);
+                ret = -1;
+                break;
+            }
+            ret = kvm_convert_memory(run->memory_fault.gpa, run->memory_fault.size,
+                                     run->memory_fault.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE);
+            break;
         default:
             ret = kvm_arch_handle_exit(cpu, run);
             break;
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index e8c52cb9e7a..681ccb667d6 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -31,3 +31,5 @@ kvm_cpu_exec(void) ""
 kvm_interrupt_exit_request(void) ""
 kvm_io_window_exit(void) ""
 kvm_run_exit_system_event(int cpu_index, uint32_t event_type) "cpu_index %d, system_even_type %"PRIu32
+kvm_convert_memory(uint64_t start, uint64_t size, const char *msg) "start 0x%" PRIx64 " size 0x%" PRIx64 " %s"
+kvm_memory_fault(uint64_t start, uint64_t size, uint64_t flags) "start 0x%" PRIx64 " size 0x%" PRIx64 " flags 0x%" PRIx64
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 26/26] i386/kvm: Move architectural CPUID leaf generation to separate helper
  2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
                   ` (24 preceding siblings ...)
  2024-03-22 18:11 ` [PATCH 25/26] kvm: handle KVM_EXIT_MEMORY_FAULT Paolo Bonzini
@ 2024-03-22 18:11 ` Paolo Bonzini
  2024-04-01 15:03   ` Xiaoyao Li
  25 siblings, 1 reply; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-22 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: xiaoyao.li, michael.roth, david

From: Sean Christopherson <sean.j.christopherson@intel.com>

Move the architectural (for lack of a better term) CPUID leaf generation
to a separate helper so that the generation code can be reused by TDX,
which needs to generate a canonical VM-scoped configuration.

For now this is just a cleanup, so keep the function static.
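
A sketch of the intended reuse (names are illustrative; the TDX code is not
part of this series), once the helper is eventually exported:

    static uint32_t tdx_build_vm_cpuid(CPUX86State *env,
                                       struct kvm_cpuid_entry2 *entries)
    {
        /* Same canonical leaves that kvm_arch_init_vcpu() feeds to
         * KVM_SET_CPUID2, but built once for the whole VM so they can be
         * handed to a VM-scoped setup ioctl such as KVM_TDX_INIT_VM. */
        return kvm_x86_build_cpuid(env, entries, 0);
    }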

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240229063726.610065-23-xiaoyao.li@intel.com>
[Unify error reporting, rename function. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/kvm/kvm.c | 446 +++++++++++++++++++++---------------------
 1 file changed, 224 insertions(+), 222 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 2577e345502..eab6261e1f5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1752,6 +1752,228 @@ static void kvm_init_nested_state(CPUX86State *env)
     }
 }
 
+static uint32_t kvm_x86_build_cpuid(CPUX86State *env,
+                                    struct kvm_cpuid_entry2 *entries,
+                                    uint32_t cpuid_i)
+{
+    uint32_t limit, i, j;
+    uint32_t unused;
+    struct kvm_cpuid_entry2 *c;
+
+    cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
+
+    for (i = 0; i <= limit; i++) {
+        j = 0;
+        if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
+            goto full;
+        }
+        c = &entries[cpuid_i++];
+        switch (i) {
+        case 2: {
+            /* Keep reading function 2 till all the input is received */
+            int times;
+
+            c->function = i;
+            c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC |
+                       KVM_CPUID_FLAG_STATE_READ_NEXT;
+            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
+            times = c->eax & 0xff;
+
+            for (j = 1; j < times; ++j) {
+                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
+                    goto full;
+                }
+                c = &entries[cpuid_i++];
+                c->function = i;
+                c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC;
+                cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
+            }
+            break;
+        }
+        case 0x1f:
+            if (env->nr_dies < 2) {
+                cpuid_i--;
+                break;
+            }
+            /* fallthrough */
+        case 4:
+        case 0xb:
+        case 0xd:
+            for (j = 0; ; j++) {
+                if (i == 0xd && j == 64) {
+                    break;
+                }
+
+                c->function = i;
+                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+                c->index = j;
+                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
+
+                if (i == 4 && c->eax == 0) {
+                    break;
+                }
+                if (i == 0xb && !(c->ecx & 0xff00)) {
+                    break;
+                }
+                if (i == 0x1f && !(c->ecx & 0xff00)) {
+                    break;
+                }
+                if (i == 0xd && c->eax == 0) {
+                    continue;
+                }
+                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
+                    goto full;
+                }
+                c = &entries[cpuid_i++];
+            }
+            break;
+        case 0x12:
+            for (j = 0; ; j++) {
+                c->function = i;
+                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+                c->index = j;
+                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
+
+                if (j > 1 && (c->eax & 0xf) != 1) {
+                    break;
+                }
+
+                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
+                    goto full;
+                }
+                c = &entries[cpuid_i++];
+            }
+            break;
+        case 0x7:
+        case 0x14:
+        case 0x1d:
+        case 0x1e: {
+            uint32_t times;
+
+            c->function = i;
+            c->index = 0;
+            c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
+            times = c->eax;
+
+            for (j = 1; j <= times; ++j) {
+                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
+                    goto full;
+                }
+                c = &entries[cpuid_i++];
+                c->function = i;
+                c->index = j;
+                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
+            }
+            break;
+        }
+        default:
+            c->function = i;
+            c->flags = 0;
+            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
+            if (!c->eax && !c->ebx && !c->ecx && !c->edx) {
+                /*
+                 * KVM already returns all zeroes if a CPUID entry is missing,
+                 * so we can omit it and avoid hitting KVM's 80-entry limit.
+                 */
+                cpuid_i--;
+            }
+            break;
+        }
+    }
+
+    if (limit >= 0x0a) {
+        uint32_t eax, edx;
+
+        cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
+
+        has_architectural_pmu_version = eax & 0xff;
+        if (has_architectural_pmu_version > 0) {
+            num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
+
+            /* Shouldn't be more than 32, since that's the number of bits
+             * available in EBX to tell us _which_ counters are available.
+             * Play it safe.
+             */
+            if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
+                num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+            }
+
+            if (has_architectural_pmu_version > 1) {
+                num_architectural_pmu_fixed_counters = edx & 0x1f;
+
+                if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+                    num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+                }
+            }
+        }
+    }
+
+    cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
+
+    for (i = 0x80000000; i <= limit; i++) {
+        j = 0;
+        c = &entries[cpuid_i++];
+
+        switch (i) {
+        case 0x8000001d:
+            /* Query for all AMD cache information leaves */
+            for (j = 0; ; j++) {
+                c->function = i;
+                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+                c->index = j;
+                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
+
+                if (c->eax == 0) {
+                    break;
+                }
+                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
+                    goto full;
+                }
+                c = &entries[cpuid_i++];
+            }
+            break;
+        default:
+            c->function = i;
+            c->flags = 0;
+            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
+            if (!c->eax && !c->ebx && !c->ecx && !c->edx) {
+                /*
+                 * KVM already returns all zeroes if a CPUID entry is missing,
+                 * so we can omit it and avoid hitting KVM's 80-entry limit.
+                 */
+                cpuid_i--;
+            }
+            break;
+        }
+    }
+
+    /* Call Centaur's CPUID instructions they are supported. */
+    if (env->cpuid_xlevel2 > 0) {
+        cpu_x86_cpuid(env, 0xC0000000, 0, &limit, &unused, &unused, &unused);
+
+        for (i = 0xC0000000; i <= limit; i++) {
+            j = 0;
+            if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
+                goto full;
+            }
+            c = &entries[cpuid_i++];
+
+            c->function = i;
+            c->flags = 0;
+            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
+        }
+    }
+
+    return cpuid_i;
+
+full:
+    fprintf(stderr, "cpuid_data is full, no space for "
+            "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
+    abort();
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
@@ -1768,8 +1990,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
     X86CPU *cpu = X86_CPU(cs);
     CPUX86State *env = &cpu->env;
-    uint32_t limit, i, j, cpuid_i;
-    uint32_t unused;
+    uint32_t cpuid_i;
     struct kvm_cpuid_entry2 *c;
     uint32_t signature[3];
     int kvm_base = KVM_CPUID_SIGNATURE;
@@ -1922,8 +2143,6 @@ int kvm_arch_init_vcpu(CPUState *cs)
         c->edx = env->features[FEAT_KVM_HINTS];
     }
 
-    cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
-
     if (cpu->kvm_pv_enforce_cpuid) {
         r = kvm_vcpu_enable_cap(cs, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, 0, 1);
         if (r < 0) {
@@ -1934,224 +2153,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
         }
     }
 
-    for (i = 0; i <= limit; i++) {
-        if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
-            fprintf(stderr, "unsupported level value: 0x%x\n", limit);
-            abort();
-        }
-        c = &cpuid_data.entries[cpuid_i++];
-
-        switch (i) {
-        case 2: {
-            /* Keep reading function 2 till all the input is received */
-            int times;
-
-            c->function = i;
-            c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC |
-                       KVM_CPUID_FLAG_STATE_READ_NEXT;
-            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
-            times = c->eax & 0xff;
-
-            for (j = 1; j < times; ++j) {
-                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
-                    fprintf(stderr, "cpuid_data is full, no space for "
-                            "cpuid(eax:2):eax & 0xf = 0x%x\n", times);
-                    abort();
-                }
-                c = &cpuid_data.entries[cpuid_i++];
-                c->function = i;
-                c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC;
-                cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
-            }
-            break;
-        }
-        case 0x1f:
-            if (env->nr_dies < 2) {
-                cpuid_i--;
-                break;
-            }
-            /* fallthrough */
-        case 4:
-        case 0xb:
-        case 0xd:
-            for (j = 0; ; j++) {
-                if (i == 0xd && j == 64) {
-                    break;
-                }
-
-                c->function = i;
-                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
-                c->index = j;
-                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
-
-                if (i == 4 && c->eax == 0) {
-                    break;
-                }
-                if (i == 0xb && !(c->ecx & 0xff00)) {
-                    break;
-                }
-                if (i == 0x1f && !(c->ecx & 0xff00)) {
-                    break;
-                }
-                if (i == 0xd && c->eax == 0) {
-                    continue;
-                }
-                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
-                    fprintf(stderr, "cpuid_data is full, no space for "
-                            "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
-                    abort();
-                }
-                c = &cpuid_data.entries[cpuid_i++];
-            }
-            break;
-        case 0x12:
-            for (j = 0; ; j++) {
-                c->function = i;
-                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
-                c->index = j;
-                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
-
-                if (j > 1 && (c->eax & 0xf) != 1) {
-                    break;
-                }
-
-                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
-                    fprintf(stderr, "cpuid_data is full, no space for "
-                                "cpuid(eax:0x12,ecx:0x%x)\n", j);
-                    abort();
-                }
-                c = &cpuid_data.entries[cpuid_i++];
-            }
-            break;
-        case 0x7:
-        case 0x14:
-        case 0x1d:
-        case 0x1e: {
-            uint32_t times;
-
-            c->function = i;
-            c->index = 0;
-            c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
-            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
-            times = c->eax;
-
-            for (j = 1; j <= times; ++j) {
-                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
-                    fprintf(stderr, "cpuid_data is full, no space for "
-                                "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
-                    abort();
-                }
-                c = &cpuid_data.entries[cpuid_i++];
-                c->function = i;
-                c->index = j;
-                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
-                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
-            }
-            break;
-        }
-        default:
-            c->function = i;
-            c->flags = 0;
-            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
-            if (!c->eax && !c->ebx && !c->ecx && !c->edx) {
-                /*
-                 * KVM already returns all zeroes if a CPUID entry is missing,
-                 * so we can omit it and avoid hitting KVM's 80-entry limit.
-                 */
-                cpuid_i--;
-            }
-            break;
-        }
-    }
-
-    if (limit >= 0x0a) {
-        uint32_t eax, edx;
-
-        cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
-
-        has_architectural_pmu_version = eax & 0xff;
-        if (has_architectural_pmu_version > 0) {
-            num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
-
-            /* Shouldn't be more than 32, since that's the number of bits
-             * available in EBX to tell us _which_ counters are available.
-             * Play it safe.
-             */
-            if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
-                num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
-            }
-
-            if (has_architectural_pmu_version > 1) {
-                num_architectural_pmu_fixed_counters = edx & 0x1f;
-
-                if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
-                    num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
-                }
-            }
-        }
-    }
-
-    cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
-
-    for (i = 0x80000000; i <= limit; i++) {
-        if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
-            fprintf(stderr, "unsupported xlevel value: 0x%x\n", limit);
-            abort();
-        }
-        c = &cpuid_data.entries[cpuid_i++];
-
-        switch (i) {
-        case 0x8000001d:
-            /* Query for all AMD cache information leaves */
-            for (j = 0; ; j++) {
-                c->function = i;
-                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
-                c->index = j;
-                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
-
-                if (c->eax == 0) {
-                    break;
-                }
-                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
-                    fprintf(stderr, "cpuid_data is full, no space for "
-                            "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
-                    abort();
-                }
-                c = &cpuid_data.entries[cpuid_i++];
-            }
-            break;
-        default:
-            c->function = i;
-            c->flags = 0;
-            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
-            if (!c->eax && !c->ebx && !c->ecx && !c->edx) {
-                /*
-                 * KVM already returns all zeroes if a CPUID entry is missing,
-                 * so we can omit it and avoid hitting KVM's 80-entry limit.
-                 */
-                cpuid_i--;
-            }
-            break;
-        }
-    }
-
-    /* Call Centaur's CPUID instructions they are supported. */
-    if (env->cpuid_xlevel2 > 0) {
-        cpu_x86_cpuid(env, 0xC0000000, 0, &limit, &unused, &unused, &unused);
-
-        for (i = 0xC0000000; i <= limit; i++) {
-            if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
-                fprintf(stderr, "unsupported xlevel2 value: 0x%x\n", limit);
-                abort();
-            }
-            c = &cpuid_data.entries[cpuid_i++];
-
-            c->function = i;
-            c->flags = 0;
-            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
-        }
-    }
-
+    cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
     cpuid_data.cpuid.nent = cpuid_i;
 
     if (((env->cpuid_version >> 8)&0xF) >= 6
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 19/26] RAMBlock: Add support of KVM private guest memfd
  2024-03-22 18:11 ` [PATCH 19/26] RAMBlock: Add support of KVM private guest memfd Paolo Bonzini
@ 2024-03-22 18:53   ` Michael Roth
  0 siblings, 0 replies; 42+ messages in thread
From: Michael Roth @ 2024-03-22 18:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, xiaoyao.li, david

On Fri, Mar 22, 2024 at 07:11:09PM +0100, Paolo Bonzini wrote:
> From: Michael Roth <michael.roth@amd.com>

This should be:

  From: Xiaoyao Li <xiaoyao.li@intel.com>

Looks like the author got reset in my tree for some reason and I failed to
notice it before posting. Sorry for the mix-up.

-Mike

> 
> Add KVM guest_memfd support to RAMBlock so that both normal HVA-based
> memory and KVM guest-memfd-based private memory can be associated with
> one RAMBlock.
> 
> Introduce a new flag, RAM_GUEST_MEMFD. When it is set, a KVM ioctl is
> issued to create the private guest_memfd during RAMBlock setup.
> 
> Allocating a new RAM_GUEST_MEMFD flag to instruct the setup of guest memfd
> is more flexible and extensible than simply relying on the VM type, because
> in the future not all of a VM's memory may need guest memfd. As a benefit,
> it also avoids having to look up MachineState in the memory subsystem.
> 
> Note, RAM_GUEST_MEMFD is supposed to be set for memory backends of
> confidential guests, such as TDX VM. How and when to set it for memory
> backends will be implemented in the following patches.
> 
> Introduce memory_region_has_guest_memfd() to query if the MemoryRegion has
> KVM guest_memfd allocated.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Message-ID: <20240320083945.991426-7-michael.roth@amd.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  include/exec/memory.h   | 20 +++++++++++++++++---
>  include/exec/ram_addr.h |  2 +-
>  include/exec/ramblock.h |  1 +
>  include/sysemu/kvm.h    |  3 ++-
>  accel/kvm/kvm-all.c     | 28 ++++++++++++++++++++++++++++
>  accel/stubs/kvm-stub.c  |  5 +++++
>  system/memory.c         |  5 +++++
>  system/physmem.c        | 34 +++++++++++++++++++++++++++++++---
>  8 files changed, 90 insertions(+), 8 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 8626a355b31..679a8476852 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -243,6 +243,9 @@ typedef struct IOMMUTLBEvent {
>  /* RAM FD is opened read-only */
>  #define RAM_READONLY_FD (1 << 11)
>  
> +/* RAM can be private that has kvm guest memfd backend */
> +#define RAM_GUEST_MEMFD   (1 << 12)
> +
>  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
>                                         IOMMUNotifierFlag flags,
>                                         hwaddr start, hwaddr end,
> @@ -1307,7 +1310,8 @@ bool memory_region_init_ram_nomigrate(MemoryRegion *mr,
>   * @name: Region name, becomes part of RAMBlock name used in migration stream
>   *        must be unique within any device
>   * @size: size of the region.
> - * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_NORESERVE.
> + * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_NORESERVE,
> + *             RAM_GUEST_MEMFD.
>   * @errp: pointer to Error*, to store an error if it happens.
>   *
>   * Note that this function does not do anything to cause the data in the
> @@ -1369,7 +1373,7 @@ bool memory_region_init_resizeable_ram(MemoryRegion *mr,
>   *         (getpagesize()) will be used.
>   * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
>   *             RAM_NORESERVE, RAM_PROTECTED, RAM_NAMED_FILE, RAM_READONLY,
> - *             RAM_READONLY_FD
> + *             RAM_READONLY_FD, RAM_GUEST_MEMFD
>   * @path: the path in which to allocate the RAM.
>   * @offset: offset within the file referenced by path
>   * @errp: pointer to Error*, to store an error if it happens.
> @@ -1399,7 +1403,7 @@ bool memory_region_init_ram_from_file(MemoryRegion *mr,
>   * @size: size of the region.
>   * @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
>   *             RAM_NORESERVE, RAM_PROTECTED, RAM_NAMED_FILE, RAM_READONLY,
> - *             RAM_READONLY_FD
> + *             RAM_READONLY_FD, RAM_GUEST_MEMFD
>   * @fd: the fd to mmap.
>   * @offset: offset within the file referenced by fd
>   * @errp: pointer to Error*, to store an error if it happens.
> @@ -1722,6 +1726,16 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
>   */
>  bool memory_region_is_protected(MemoryRegion *mr);
>  
> +/**
> + * memory_region_has_guest_memfd: check whether a memory region has guest_memfd
> + *     associated
> + *
> + * Returns %true if a memory region's ram_block has valid guest_memfd assigned.
> + *
> + * @mr: the memory region being queried
> + */
> +bool memory_region_has_guest_memfd(MemoryRegion *mr);
> +
>  /**
>   * memory_region_get_iommu: check whether a memory region is an iommu
>   *
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index de45ba7bc96..07c8f863750 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -110,7 +110,7 @@ long qemu_maxrampagesize(void);
>   *  @mr: the memory region where the ram block is
>   *  @ram_flags: RamBlock flags. Supported flags: RAM_SHARED, RAM_PMEM,
>   *              RAM_NORESERVE, RAM_PROTECTED, RAM_NAMED_FILE, RAM_READONLY,
> - *              RAM_READONLY_FD
> + *              RAM_READONLY_FD, RAM_GUEST_MEMFD
>   *  @mem_path or @fd: specify the backing file or device
>   *  @offset: Offset into target file
>   *  @errp: pointer to Error*, to store an error if it happens
> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> index 848915ea5bf..459c8917de2 100644
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -41,6 +41,7 @@ struct RAMBlock {
>      QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
>      int fd;
>      uint64_t fd_offset;
> +    int guest_memfd;
>      size_t page_size;
>      /* dirty bitmap used during migration */
>      unsigned long *bmap;
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index bda309d5ffa..2cb31925091 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -537,7 +537,8 @@ void kvm_mark_guest_state_protected(void);
>   */
>  bool kvm_hwpoisoned_mem(void);
>  
> +int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp);
> +
>  int kvm_set_memory_attributes_private(hwaddr start, hwaddr size);
>  int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size);
> -
>  #endif
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 36e39fd6514..6aa0608805b 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -92,6 +92,7 @@ static bool kvm_has_guest_debug;
>  static int kvm_sstep_flags;
>  static bool kvm_immediate_exit;
>  static uint64_t kvm_supported_memory_attributes;
> +static bool kvm_guest_memfd_supported;
>  static hwaddr kvm_max_slot_size = ~0;
>  
>  static const KVMCapabilityInfo kvm_required_capabilites[] = {
> @@ -2413,6 +2414,11 @@ static int kvm_init(MachineState *ms)
>      }
>  
>      kvm_supported_memory_attributes = kvm_check_extension(s, KVM_CAP_MEMORY_ATTRIBUTES);
> +    kvm_guest_memfd_supported =
> +        kvm_check_extension(s, KVM_CAP_GUEST_MEMFD) &&
> +        kvm_check_extension(s, KVM_CAP_USER_MEMORY2) &&
> +        (kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE);
> +
>      kvm_immediate_exit = kvm_check_extension(s, KVM_CAP_IMMEDIATE_EXIT);
>      s->nr_slots = kvm_check_extension(s, KVM_CAP_NR_MEMSLOTS);
>  
> @@ -4131,3 +4137,25 @@ void kvm_mark_guest_state_protected(void)
>  {
>      kvm_state->guest_state_protected = true;
>  }
> +
> +int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
> +{
> +    int fd;
> +    struct kvm_create_guest_memfd guest_memfd = {
> +        .size = size,
> +        .flags = flags,
> +    };
> +
> +    if (!kvm_guest_memfd_supported) {
> +        error_setg(errp, "KVM doesn't support guest memfd\n");
> +        return -1;
> +    }
> +
> +    fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_GUEST_MEMFD, &guest_memfd);
> +    if (fd < 0) {
> +        error_setg_errno(errp, errno, "Error creating kvm guest memfd");
> +        return -1;
> +    }
> +
> +    return fd;
> +}
> diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
> index ca381728840..8e0eb22e61c 100644
> --- a/accel/stubs/kvm-stub.c
> +++ b/accel/stubs/kvm-stub.c
> @@ -129,3 +129,8 @@ bool kvm_hwpoisoned_mem(void)
>  {
>      return false;
>  }
> +
> +int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
> +{
> +    return -ENOSYS;
> +}
> diff --git a/system/memory.c b/system/memory.c
> index a229a79988f..c756950c0c0 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -1850,6 +1850,11 @@ bool memory_region_is_protected(MemoryRegion *mr)
>      return mr->ram && (mr->ram_block->flags & RAM_PROTECTED);
>  }
>  
> +bool memory_region_has_guest_memfd(MemoryRegion *mr)
> +{
> +    return mr->ram_block && mr->ram_block->guest_memfd >= 0;
> +}
> +
>  uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
>  {
>      uint8_t mask = mr->dirty_log_mask;
> diff --git a/system/physmem.c b/system/physmem.c
> index a4fe3d2bf89..f5dfa20e57e 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -1808,6 +1808,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>      const bool shared = qemu_ram_is_shared(new_block);
>      RAMBlock *block;
>      RAMBlock *last_block = NULL;
> +    bool free_on_error = false;
>      ram_addr_t old_ram_size, new_ram_size;
>      Error *err = NULL;
>  
> @@ -1837,6 +1838,19 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>                  return;
>              }
>              memory_try_enable_merging(new_block->host, new_block->max_length);
> +            free_on_error = true;
> +        }
> +    }
> +
> +    if (new_block->flags & RAM_GUEST_MEMFD) {
> +        assert(kvm_enabled());
> +        assert(new_block->guest_memfd < 0);
> +
> +        new_block->guest_memfd = kvm_create_guest_memfd(new_block->max_length,
> +                                                        0, errp);
> +        if (new_block->guest_memfd < 0) {
> +            qemu_mutex_unlock_ramlist();
> +            goto out_free;
>          }
>      }
>  
> @@ -1888,6 +1902,13 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>          ram_block_notify_add(new_block->host, new_block->used_length,
>                               new_block->max_length);
>      }
> +    return;
> +
> +out_free:
> +    if (free_on_error) {
> +        qemu_anon_ram_free(new_block->host, new_block->max_length);
> +        new_block->host = NULL;
> +    }
>  }
>  
>  #ifdef CONFIG_POSIX
> @@ -1902,7 +1923,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>      /* Just support these ram flags by now. */
>      assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE |
>                            RAM_PROTECTED | RAM_NAMED_FILE | RAM_READONLY |
> -                          RAM_READONLY_FD)) == 0);
> +                          RAM_READONLY_FD | RAM_GUEST_MEMFD)) == 0);
>  
>      if (xen_enabled()) {
>          error_setg(errp, "-mem-path not supported with Xen");
> @@ -1939,6 +1960,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>      new_block->used_length = size;
>      new_block->max_length = size;
>      new_block->flags = ram_flags;
> +    new_block->guest_memfd = -1;
>      new_block->host = file_ram_alloc(new_block, size, fd, !file_size, offset,
>                                       errp);
>      if (!new_block->host) {
> @@ -2018,7 +2040,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
>      int align;
>  
>      assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
> -                          RAM_NORESERVE)) == 0);
> +                          RAM_NORESERVE | RAM_GUEST_MEMFD)) == 0);
>      assert(!host ^ (ram_flags & RAM_PREALLOC));
>  
>      align = qemu_real_host_page_size();
> @@ -2033,6 +2055,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
>      new_block->max_length = max_size;
>      assert(max_size >= size);
>      new_block->fd = -1;
> +    new_block->guest_memfd = -1;
>      new_block->page_size = qemu_real_host_page_size();
>      new_block->host = host;
>      new_block->flags = ram_flags;
> @@ -2055,7 +2078,7 @@ RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>  RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags,
>                           MemoryRegion *mr, Error **errp)
>  {
> -    assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE)) == 0);
> +    assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE | RAM_GUEST_MEMFD)) == 0);
>      return qemu_ram_alloc_internal(size, size, NULL, NULL, ram_flags, mr, errp);
>  }
>  
> @@ -2083,6 +2106,11 @@ static void reclaim_ramblock(RAMBlock *block)
>      } else {
>          qemu_anon_ram_free(block->host, block->max_length);
>      }
> +
> +    if (block->guest_memfd >= 0) {
> +        close(block->guest_memfd);
> +    }
> +
>      g_free(block);
>  }
>  
> -- 
> 2.44.0
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/26] confidential guest support: Add kvm_init() and kvm_reset() in class
  2024-03-22 18:10 ` [PATCH 03/26] confidential guest support: Add kvm_init() and kvm_reset() in class Paolo Bonzini
@ 2024-03-25  8:32   ` Philippe Mathieu-Daudé
  2024-03-25 10:53     ` Paolo Bonzini
  0 siblings, 1 reply; 42+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-03-25  8:32 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: xiaoyao.li, michael.roth, david

Hi Xiaoyao,

On 22/3/24 19:10, Paolo Bonzini wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> Confidential VMs on different architectures all need to do their own
> specific initialization (and maybe reset) work with KVM. Currently each
> of them exposes an individual *_kvm_init() function and lets machine
> code or KVM code call it.
> 
> To facilitate the introduction of confidential guest technologies from
> different x86 vendors, add two virtual functions, kvm_init() and kvm_reset(),
> to ConfidentialGuestSupportClass, and expose two helper functions for
> invoking them.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Message-Id: <20240229060038.606591-1-xiaoyao.li@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   include/exec/confidential-guest-support.h | 34 ++++++++++++++++++++++-
>   1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/include/exec/confidential-guest-support.h b/include/exec/confidential-guest-support.h
> index ba2dd4b5dfc..e5b188cffbf 100644
> --- a/include/exec/confidential-guest-support.h
> +++ b/include/exec/confidential-guest-support.h
> @@ -23,7 +23,10 @@
>   #include "qom/object.h"
>   
>   #define TYPE_CONFIDENTIAL_GUEST_SUPPORT "confidential-guest-support"
> -OBJECT_DECLARE_SIMPLE_TYPE(ConfidentialGuestSupport, CONFIDENTIAL_GUEST_SUPPORT)
> +OBJECT_DECLARE_TYPE(ConfidentialGuestSupport,
> +                    ConfidentialGuestSupportClass,
> +                    CONFIDENTIAL_GUEST_SUPPORT)
> +
>   
>   struct ConfidentialGuestSupport {
>       Object parent;
> @@ -55,8 +58,37 @@ struct ConfidentialGuestSupport {
>   
>   typedef struct ConfidentialGuestSupportClass {
>       ObjectClass parent;
> +
> +    int (*kvm_init)(ConfidentialGuestSupport *cgs, Error **errp);
> +    int (*kvm_reset)(ConfidentialGuestSupport *cgs, Error **errp);

Can we get a docstring indicating what these functions return?
Looking at the next patch, the KVM specific return value doesn't
seem used, so can we return a boolean instead?

>   } ConfidentialGuestSupportClass;
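
For instance, the documented, bool-returning variant being suggested could
look like this (sketch only, not what the patch currently proposes):

    typedef struct ConfidentialGuestSupportClass {
        ObjectClass parent;

        /*
         * Do accelerator (KVM) specific setup for the confidential guest.
         * Returns: true on success, false on failure with @errp set.
         */
        bool (*kvm_init)(ConfidentialGuestSupport *cgs, Error **errp);

        /* Same contract, invoked across machine reset. */
        bool (*kvm_reset)(ConfidentialGuestSupport *cgs, Error **errp);
    } ConfidentialGuestSupportClass;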

I suppose it will be easy enough to refactor for other HW accelerators
in the future.

Regards,

Phil.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 11/26] runstate: skip initial CPU reset if reset is not actually possible
  2024-03-22 18:11 ` [PATCH 11/26] runstate: skip initial CPU reset if reset is not actually possible Paolo Bonzini
@ 2024-03-25  8:58   ` Daniel P. Berrangé
  2024-03-25  9:32   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 42+ messages in thread
From: Daniel P. Berrangé @ 2024-03-25  8:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, xiaoyao.li, michael.roth, david

On Fri, Mar 22, 2024 at 07:11:01PM +0100, Paolo Bonzini wrote:
> Right now, the system reset is concluded by a call to
> cpu_synchronize_all_post_reset() in order to sync any changes
> that the machine reset callback applied to the CPU state.
> 
> However, for VMs with encrypted state such as SEV-ES guests (currently
> the only case of guests with non-resettable CPUs) this cannot be done,
> because guest state has already been finalized by machine-init-done notifiers.
> cpu_synchronize_all_post_reset() does nothing on these guests, and actually
> we would like to make it fail if called once guest has been encrypted.
> So, assume that boards that support non-resettable CPUs do not touch
> CPU state and that all such setup is done before, at the time of
> cpu_synchronize_all_post_init().
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  system/runstate.c | 15 ++++++++++++++-
>  roms/edk2         |  2 +-
>  2 files changed, 15 insertions(+), 2 deletions(-)

Accidental submodule change here :

> diff --git a/roms/edk2 b/roms/edk2
> index edc6681206c..819cfc6b42a 160000
> --- a/roms/edk2
> +++ b/roms/edk2
> @@ -1 +1 @@
> -Subproject commit edc6681206c1a8791981a2f911d2fb8b3d2f5768
> +Subproject commit 819cfc6b42a68790a23509e4fcc58ceb70e1965e
> -- 
> 2.44.0
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 12/26] KVM: track whether guest state is encrypted
  2024-03-22 18:11 ` [PATCH 12/26] KVM: track whether guest state is encrypted Paolo Bonzini
@ 2024-03-25  9:25   ` Philippe Mathieu-Daudé
  2024-03-26 15:48   ` Xiaoyao Li
  1 sibling, 0 replies; 42+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-03-25  9:25 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: xiaoyao.li, michael.roth, david

On 22/3/24 19:11, Paolo Bonzini wrote:
> So far, KVM has allowed KVM_GET/SET_* ioctls to execute even if the
> guest state is encrypted, in which case they do nothing.  For the new
> API using VM types, instead, the ioctls will fail which is a safer and
> more robust approach.
> 
> The new API will be the only one available for SEV-SNP and TDX, but it
> is also usable for SEV and SEV-ES.  In preparation for that, require
> architecture-specific KVM code to communicate the point at which guest
> state is protected (which must be after kvm_cpu_synchronize_post_init(),
> though that might change in the future in order to support migration).
>  From that point, skip reading registers so that cpu->vcpu_dirty is
> never true: if it ever becomes true, kvm_arch_put_registers() will
> fail miserably.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   include/sysemu/kvm.h     |  2 ++
>   include/sysemu/kvm_int.h |  1 +
>   accel/kvm/kvm-all.c      | 14 ++++++++++++--
>   target/i386/sev.c        |  1 +
>   4 files changed, 16 insertions(+), 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 13/26] KVM: remove kvm_arch_cpu_check_are_resettable
  2024-03-22 18:11 ` [PATCH 13/26] KVM: remove kvm_arch_cpu_check_are_resettable Paolo Bonzini
@ 2024-03-25  9:26   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 42+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-03-25  9:26 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: xiaoyao.li, michael.roth, david

On 22/3/24 19:11, Paolo Bonzini wrote:
> Board reset requires writing a fresh CPU state.  As far as KVM is
> concerned, the only thing that blocks reset is that CPU state is
> encrypted; therefore, kvm_cpus_are_resettable() can simply check
> if that is the case.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   include/sysemu/kvm.h       | 10 ----------
>   accel/kvm/kvm-accel-ops.c  |  2 +-
>   accel/kvm/kvm-all.c        |  5 -----
>   target/arm/kvm.c           |  5 -----
>   target/i386/kvm/kvm.c      |  5 -----
>   target/loongarch/kvm/kvm.c |  5 -----
>   target/mips/kvm.c          |  5 -----
>   target/ppc/kvm.c           |  5 -----
>   target/riscv/kvm/kvm-cpu.c |  5 -----
>   target/s390x/kvm/kvm.c     |  5 -----
>   10 files changed, 1 insertion(+), 51 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 15/26] target/i386: Implement mc->kvm_type() to get VM type
  2024-03-22 18:11 ` [PATCH 15/26] target/i386: Implement mc->kvm_type() to get VM type Paolo Bonzini
@ 2024-03-25  9:29   ` Philippe Mathieu-Daudé
  2024-03-25 10:57     ` Paolo Bonzini
  0 siblings, 1 reply; 42+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-03-25  9:29 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: xiaoyao.li, michael.roth, david

On 22/3/24 19:11, Paolo Bonzini wrote:
> KVM is introducing a new API to create confidential guests, which
> will be used by TDX and SEV-SNP but is also available for SEV and
> SEV-ES.  The API uses the VM type argument to KVM_CREATE_VM to
> identify which confidential computing technology to use.
> 
> Since there are no other expected uses of VM types, delegate
> mc->kvm_type() for x86 boards to the confidential-guest-support

s/mc/cgs/ here and in subject?

> object pointed to by ms->cgs.
> 
> For example, if a sev-guest object is specified to confidential-guest-support,
> like,
> 
>    qemu -machine ...,confidential-guest-support=sev0 \
>         -object sev-guest,id=sev0,...
> 
> it will check if a VM type KVM_X86_SEV_VM or KVM_X86_SEV_ES_VM
> is supported, and if so use them together with the KVM_SEV_INIT2
> function of the KVM_MEMORY_ENCRYPT_OP ioctl. If not, it will fall back to
> KVM_SEV_INIT and KVM_SEV_ES_INIT.
> 
> This is a preparatory work towards TDX and SEV-SNP support, but it
> will also enable support for VMSA features such as DebugSwap, which
> are only available via KVM_SEV_INIT2.
> 
> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   target/i386/confidential-guest.h | 19 ++++++++++++++
>   target/i386/kvm/kvm_i386.h       |  2 ++
>   hw/i386/x86.c                    | 11 ++++++++
>   target/i386/kvm/kvm.c            | 44 ++++++++++++++++++++++++++++++++
>   4 files changed, 76 insertions(+)



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 18/26] kvm: Introduce support for memory_attributes
  2024-03-22 18:11 ` [PATCH 18/26] kvm: Introduce support for memory_attributes Paolo Bonzini
@ 2024-03-25  9:31   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 42+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-03-25  9:31 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: xiaoyao.li, michael.roth, david

On 22/3/24 19:11, Paolo Bonzini wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> Introduce the helper functions to set the attributes of a range of
> memory to private or shared.
> 
> This is necessary to notify KVM of the private/shared attribute of each gpa
> range. KVM needs the information to decide whether the GPA needs to be mapped
> as hva-based shared memory or guest_memfd-based private memory.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Message-ID: <20240320083945.991426-11-michael.roth@amd.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   include/sysemu/kvm.h |  4 ++++
>   accel/kvm/kvm-all.c  | 31 +++++++++++++++++++++++++++++++
>   2 files changed, 35 insertions(+)
> 
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 54f4d83a370..bda309d5ffa 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -536,4 +536,8 @@ void kvm_mark_guest_state_protected(void);
>    * reported for the VM.
>    */
>   bool kvm_hwpoisoned_mem(void);
> +
> +int kvm_set_memory_attributes_private(hwaddr start, hwaddr size);
> +int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size);

uint64_t size? (kvm_memory_attributes::size is __u64).
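
For orientation, a minimal sketch of how such a helper typically wraps the
KVM_SET_MEMORY_ATTRIBUTES ioctl; this is an illustration rather than the
quoted hunk, and struct kvm_memory_attributes / KVM_MEMORY_ATTRIBUTE_PRIVATE
come from the kernel UAPI headers:

    /* sketch only: the usual shape of such a wrapper, not the patch itself */
    static int kvm_set_memory_attributes(hwaddr start, uint64_t size,
                                         uint64_t attributes)
    {
        struct kvm_memory_attributes attrs = {
            .address    = start,
            .size       = size,       /* __u64 in the UAPI, hence the remark above */
            .attributes = attributes, /* e.g. KVM_MEMORY_ATTRIBUTE_PRIVATE or 0 */
        };

        /* kvm_vm_ioctl() returns a negative errno on failure */
        return kvm_vm_ioctl(kvm_state, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
    }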


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 11/26] runstate: skip initial CPU reset if reset is not actually possible
  2024-03-22 18:11 ` [PATCH 11/26] runstate: skip initial CPU reset if reset is not actually possible Paolo Bonzini
  2024-03-25  8:58   ` Daniel P. Berrangé
@ 2024-03-25  9:32   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 42+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-03-25  9:32 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: xiaoyao.li, michael.roth, david

On 22/3/24 19:11, Paolo Bonzini wrote:
> Right now, the system reset is concluded by a call to
> cpu_synchronize_all_post_reset() in order to sync any changes
> that the machine reset callback applied to the CPU state.
> 
> However, for VMs with encrypted state such as SEV-ES guests (currently
> the only case of guests with non-resettable CPUs) this cannot be done,
> because guest state has already been finalized by machine-init-done notifiers.
> cpu_synchronize_all_post_reset() does nothing on these guests, and actually
> we would like to make it fail if called once the guest has been encrypted.
> So, assume that boards that support non-resettable CPUs do not touch
> CPU state and that all such setup is done before, at the time of
> cpu_synchronize_all_post_init().
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   system/runstate.c | 15 ++++++++++++++-
>   roms/edk2         |  2 +-
Without the submodule change:
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

>   2 files changed, 15 insertions(+), 2 deletions(-)
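
The system/runstate.c hunk itself is not quoted in this sub-thread; as a
rough sketch of the kind of guard the commit message describes (names and
placement are assumptions, not the actual patch), the post-reset sync would
simply be skipped when the vCPUs cannot be reset:

    if (cpus_are_resettable()) {
        cpu_synchronize_all_post_reset();
    } else {
        /*
         * Guests with encrypted state (e.g. SEV-ES) must have completed all
         * CPU setup by cpu_synchronize_all_post_init() time, so nothing is
         * synchronized here.
         */
        assert(runstate_check(RUN_STATE_PRELAUNCH));
    }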



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/26] confidential guest support: Add kvm_init() and kvm_reset() in class
  2024-03-25  8:32   ` Philippe Mathieu-Daudé
@ 2024-03-25 10:53     ` Paolo Bonzini
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-25 10:53 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé; +Cc: qemu-devel, xiaoyao.li, michael.roth, david

On Mon, Mar 25, 2024 at 9:33 AM Philippe Mathieu-Daudé
<philmd@linaro.org> wrote:
>
> Hi Xiaoyao,
>
> On 22/3/24 19:10, Paolo Bonzini wrote:
> > From: Xiaoyao Li <xiaoyao.li@intel.com>
> >
> > Different confidential VMs in different architectures all have the same
> > need to do their specific initialization (and maybe resetting) work
> > with KVM. Currently each of them exposes individual *_kvm_init()
> > functions and let machine code or kvm code to call it.
> >
> > To facilitate the introduction of confidential guest technology from
> > different x86 vendors, add two virtual functions, kvm_init() and kvm_reset()
> > in ConfidentialGuestSupportClass, and expose two helper functions for
> > invoking them.
> >
> > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > Message-Id: <20240229060038.606591-1-xiaoyao.li@intel.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >   include/exec/confidential-guest-support.h | 34 ++++++++++++++++++++++-
> >   1 file changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/exec/confidential-guest-support.h b/include/exec/confidential-guest-support.h
> > index ba2dd4b5dfc..e5b188cffbf 100644
> > --- a/include/exec/confidential-guest-support.h
> > +++ b/include/exec/confidential-guest-support.h
> > @@ -23,7 +23,10 @@
> >   #include "qom/object.h"
> >
> >   #define TYPE_CONFIDENTIAL_GUEST_SUPPORT "confidential-guest-support"
> > -OBJECT_DECLARE_SIMPLE_TYPE(ConfidentialGuestSupport, CONFIDENTIAL_GUEST_SUPPORT)
> > +OBJECT_DECLARE_TYPE(ConfidentialGuestSupport,
> > +                    ConfidentialGuestSupportClass,
> > +                    CONFIDENTIAL_GUEST_SUPPORT)
> > +
> >
> >   struct ConfidentialGuestSupport {
> >       Object parent;
> > @@ -55,8 +58,37 @@ struct ConfidentialGuestSupport {
> >
> >   typedef struct ConfidentialGuestSupportClass {
> >       ObjectClass parent;
> > +
> > +    int (*kvm_init)(ConfidentialGuestSupport *cgs, Error **errp);
> > +    int (*kvm_reset)(ConfidentialGuestSupport *cgs, Error **errp);
>
> Can we get a docstring indicating what these functions return?
> Looking at the next patch, the KVM specific return value doesn't
> seem used, so can we return a boolean instead?

It is propagated all the way up to accel_init_machine().  _There_ it
isn't used, but I think it's not a good idea to return a "wrong" value
from kvm_arch_init() because we know that ultimately it isn't used. It
should be refactored top-down, though I admit that to be honest it
won't happen.

Paolo
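
Putting the two replies together, one way the requested docstring could read,
assuming the hooks keep returning an int as posted (a sketch, not part of the
patch):

    typedef struct ConfidentialGuestSupportClass {
        ObjectClass parent;

        /*
         * kvm_init/kvm_reset: technology-specific KVM initialization and
         * reset of the confidential guest.  Return 0 on success or a
         * negative errno on failure (the value is propagated up through
         * kvm_arch_init() to accel_init_machine()); set @errp on error.
         */
        int (*kvm_init)(ConfidentialGuestSupport *cgs, Error **errp);
        int (*kvm_reset)(ConfidentialGuestSupport *cgs, Error **errp);
    } ConfidentialGuestSupportClass;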



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 15/26] target/i386: Implement mc->kvm_type() to get VM type
  2024-03-25  9:29   ` Philippe Mathieu-Daudé
@ 2024-03-25 10:57     ` Paolo Bonzini
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-25 10:57 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé; +Cc: qemu-devel, xiaoyao.li, michael.roth, david

On Mon, Mar 25, 2024 at 10:29 AM Philippe Mathieu-Daudé
<philmd@linaro.org> wrote:
>
> On 22/3/24 19:11, Paolo Bonzini wrote:
> > KVM is introducing a new API to create confidential guests, which
> > will be used by TDX and SEV-SNP but is also available for SEV and
> > SEV-ES.  The API uses the VM type argument to KVM_CREATE_VM to
> > identify which confidential computing technology to use.
> >
> > Since there are no other expected uses of VM types, delegate
> > mc->kvm_type() for x86 boards to the confidential-guest-support
>
> s/mc/cgs/ here and in subject?

No, this refers to x86_kvm_type(), which is the implementation of
mc->kvm_type. As the commit message says, mc->kvm_type() is delegated
to cgs->kvm_type(), via kvm_get_vm_type().

Paolo
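
As a rough illustration of the delegation described above (assumed shape; the
actual hunks are in the patch itself), the board hook simply forwards to
kvm_get_vm_type(), which in turn consults ms->cgs:

    /* sketch of the mc->kvm_type implementation for x86 boards */
    static int x86_kvm_type(MachineState *ms, const char *vm_type)
    {
        /*
         * kvm_get_vm_type() asks the confidential-guest-support object
         * (if any) which VM type to pass to KVM_CREATE_VM, e.g.
         * KVM_X86_SEV_VM, and falls back to the default type otherwise.
         */
        return kvm_enabled() ? kvm_get_vm_type(ms) : 0;
    }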

> > object pointed to by ms->cgs.
> >
> > For example, if a sev-guest object is specified to confidential-guest-support,
> > like,
> >
> >    qemu -machine ...,confidential-guest-support=sev0 \
> >         -object sev-guest,id=sev0,...
> >
> > it will check if a VM type KVM_X86_SEV_VM or KVM_X86_SEV_ES_VM
> > is supported, and if so use them together with the KVM_SEV_INIT2
> > function of the KVM_MEMORY_ENCRYPT_OP ioctl. If not, it will fall back to
> > KVM_SEV_INIT and KVM_SEV_ES_INIT.
> >
> > This is a preparatory work towards TDX and SEV-SNP support, but it
> > will also enable support for VMSA features such as DebugSwap, which
> > are only available via KVM_SEV_INIT2.
> >
> > Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >   target/i386/confidential-guest.h | 19 ++++++++++++++
> >   target/i386/kvm/kvm_i386.h       |  2 ++
> >   hw/i386/x86.c                    | 11 ++++++++
> >   target/i386/kvm/kvm.c            | 44 ++++++++++++++++++++++++++++++++
> >   4 files changed, 76 insertions(+)
>



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 25/26] kvm: handle KVM_EXIT_MEMORY_FAULT
  2024-03-22 18:11 ` [PATCH 25/26] kvm: handle KVM_EXIT_MEMORY_FAULT Paolo Bonzini
@ 2024-03-26  8:53   ` Xiaoyao Li
  0 siblings, 0 replies; 42+ messages in thread
From: Xiaoyao Li @ 2024-03-26  8:53 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: michael.roth, david

On 3/23/2024 2:11 AM, Paolo Bonzini wrote:
> From: Chao Peng <chao.p.peng@linux.intel.com>
> 
> When geeting KVM_EXIT_MEMORY_FAULT exit, it indicates userspace needs to

typo: /s/geeting/getting

> do the memory conversion on the RAMBlock to turn the memory into desired
> attribute, i.e., private/shared.
> 
> Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when
> KVM_EXIT_MEMORY_FAULT happens.
> 
> Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has
> guest_memfd memory backend.
> 
> Note, KVM_EXIT_MEMORY_FAULT returns with -EFAULT, so special handling is
> added.
> 
> When page is converted from shared to private, the original shared
> memory can be discarded via ram_block_discard_range(). Note, shared
> memory can be discarded only when it's not back'ed by hugetlb because
> hugetlb is supposed to be pre-allocated and no need for discarding.
> 
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> Message-ID: <20240320083945.991426-13-michael.roth@amd.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   include/sysemu/kvm.h   |  2 +
>   accel/kvm/kvm-all.c    | 99 +++++++++++++++++++++++++++++++++++++-----
>   accel/kvm/trace-events |  2 +
>   3 files changed, 93 insertions(+), 10 deletions(-)
> 
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 2cb31925091..698f1640fe2 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -541,4 +541,6 @@ int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp);
>   
>   int kvm_set_memory_attributes_private(hwaddr start, hwaddr size);
>   int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size);
> +
> +int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
>   #endif
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 56b17cbd8aa..afd7f992e39 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2893,6 +2893,70 @@ static void kvm_eat_signals(CPUState *cpu)
>       } while (sigismember(&chkset, SIG_IPI));
>   }
>   
> +int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
> +{
> +    MemoryRegionSection section;
> +    ram_addr_t offset;
> +    MemoryRegion *mr;
> +    RAMBlock *rb;
> +    void *addr;
> +    int ret = -1;
> +
> +    trace_kvm_convert_memory(start, size, to_private ? "shared_to_private" : "private_to_shared");
> +
> +    if (!QEMU_PTR_IS_ALIGNED(start, qemu_real_host_page_size()) ||
> +        !QEMU_PTR_IS_ALIGNED(size, qemu_real_host_page_size())) {
> +        return -1;
> +    }
> +
> +    if (!size) {
> +        return -1;
> +    }
> +
> +    section = memory_region_find(get_system_memory(), start, size);
> +    mr = section.mr;
> +    if (!mr) {
> +        return -1;
> +    }
> +
> +    if (!memory_region_has_guest_memfd(mr)) {
> +        error_report("Converting non guest_memfd backed memory region "
> +                     "(0x%"HWADDR_PRIx" ,+ 0x%"HWADDR_PRIx") to %s",
> +                     start, size, to_private ? "private" : "shared");
> +        ret = -1;

No need for it. ret is initialized to -1 at the start of the function.

> +        goto out_unref;
> +    }
> +
> +    if (to_private) {
> +        ret = kvm_set_memory_attributes_private(start, size);
> +    } else {
> +        ret = kvm_set_memory_attributes_shared(start, size);
> +    }
> +    if (ret) {
> +        goto out_unref;
> +    }
> +
> +    addr = memory_region_get_ram_ptr(mr) + section.offset_within_region;
> +    rb = qemu_ram_block_from_host(addr, false, &offset);
> +
> +    if (to_private) {
> +        if (rb->page_size == qemu_real_host_page_size()) {
> +            /*
> +            * shared memory is back'ed by  hugetlb, which is supposed to be

Please fix the bad comment indentation for me, as well as the extra 
space before 'hugetlb'

> +            * pre-allocated and doesn't need to be discarded
> +            */
> +            goto out_unref;
> +        }
> +        ret = ram_block_discard_range(rb, offset, size);
> +    } else {
> +        ret = ram_block_discard_guest_memfd_range(rb, offset, size);
> +    }
> +
> +out_unref:
> +    memory_region_unref(section.mr);
> +    return ret;
> +}
> +
>   int kvm_cpu_exec(CPUState *cpu)
>   {
>       struct kvm_run *run = cpu->kvm_run;
> @@ -2960,18 +3024,20 @@ int kvm_cpu_exec(CPUState *cpu)
>                   ret = EXCP_INTERRUPT;
>                   break;
>               }
> -            fprintf(stderr, "error: kvm run failed %s\n",
> -                    strerror(-run_ret));
> +            if (!(run_ret == -EFAULT && run->exit_reason == KVM_EXIT_MEMORY_FAULT)) {
> +                fprintf(stderr, "error: kvm run failed %s\n",
> +                        strerror(-run_ret));
>   #ifdef TARGET_PPC
> -            if (run_ret == -EBUSY) {
> -                fprintf(stderr,
> -                        "This is probably because your SMT is enabled.\n"
> -                        "VCPU can only run on primary threads with all "
> -                        "secondary threads offline.\n");
> -            }
> +                if (run_ret == -EBUSY) {
> +                    fprintf(stderr,
> +                            "This is probably because your SMT is enabled.\n"
> +                            "VCPU can only run on primary threads with all "
> +                            "secondary threads offline.\n");
> +                }
>   #endif
> -            ret = -1;
> -            break;
> +                ret = -1;
> +                break;
> +            }
>           }
>   
>           trace_kvm_run_exit(cpu->cpu_index, run->exit_reason);
> @@ -3054,6 +3120,19 @@ int kvm_cpu_exec(CPUState *cpu)
>                   break;
>               }
>               break;
> +        case KVM_EXIT_MEMORY_FAULT:
> +            trace_kvm_memory_fault(run->memory_fault.gpa,
> +                                   run->memory_fault.size,
> +                                   run->memory_fault.flags);
> +            if (run->memory_fault.flags & ~KVM_MEMORY_EXIT_FLAG_PRIVATE) {
> +                error_report("KVM_EXIT_MEMORY_FAULT: Unknown flag 0x%" PRIx64,
> +                             (uint64_t)run->memory_fault.flags);
> +                ret = -1;
> +                break;
> +            }
> +            ret = kvm_convert_memory(run->memory_fault.gpa, run->memory_fault.size,
> +                                     run->memory_fault.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE);
> +            break;
>           default:
>               ret = kvm_arch_handle_exit(cpu, run);
>               break;
> diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
> index e8c52cb9e7a..681ccb667d6 100644
> --- a/accel/kvm/trace-events
> +++ b/accel/kvm/trace-events
> @@ -31,3 +31,5 @@ kvm_cpu_exec(void) ""
>   kvm_interrupt_exit_request(void) ""
>   kvm_io_window_exit(void) ""
>   kvm_run_exit_system_event(int cpu_index, uint32_t event_type) "cpu_index %d, system_even_type %"PRIu32
> +kvm_convert_memory(uint64_t start, uint64_t size, const char *msg) "start 0x%" PRIx64 " size 0x%" PRIx64 " %s"
> +kvm_memory_fault(uint64_t start, uint64_t size, uint64_t flags) "start 0x%" PRIx64 " size 0x%" PRIx64 " flags 0x%" PRIx64



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 21/26] kvm/memory: Make memory type private by default if it has guest memfd backend
  2024-03-22 18:11 ` [PATCH 21/26] kvm/memory: Make memory type private by default if it has guest memfd backend Paolo Bonzini
@ 2024-03-26  8:56   ` Xiaoyao Li
  0 siblings, 0 replies; 42+ messages in thread
From: Xiaoyao Li @ 2024-03-26  8:56 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: michael.roth, david

On 3/23/2024 2:11 AM, Paolo Bonzini wrote:
> From: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> KVM side leaves the memory to shared by default, while may incur the

/s/while/which/

Fixing my own typo.

> overhead of paging conversion on the first visit of each page. Because
> the expectation is that page is likely to private for the VMs that
> require private memory (has guest memfd).
> 
> Explicitly set the memory to private when memory region has valid
> guest memfd backend.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Message-ID: <20240320083945.991426-16-michael.roth@amd.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   accel/kvm/kvm-all.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 7fbaf31cbaf..56b17cbd8aa 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -1430,6 +1430,16 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>                       strerror(-err));
>               abort();
>           }
> +
> +        if (memory_region_has_guest_memfd(mr)) {
> +            err = kvm_set_memory_attributes_private(start_addr, slot_size);
> +            if (err) {
> +                error_report("%s: failed to set memory attribute private: %s\n",
> +                             __func__, strerror(-err));
> +                exit(1);
> +            }
> +        }
> +
>           start_addr += slot_size;
>           ram_start_offset += slot_size;
>           ram += slot_size;



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 12/26] KVM: track whether guest state is encrypted
  2024-03-22 18:11 ` [PATCH 12/26] KVM: track whether guest state is encrypted Paolo Bonzini
  2024-03-25  9:25   ` Philippe Mathieu-Daudé
@ 2024-03-26 15:48   ` Xiaoyao Li
  2024-03-27  9:05     ` Paolo Bonzini
  1 sibling, 1 reply; 42+ messages in thread
From: Xiaoyao Li @ 2024-03-26 15:48 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: michael.roth, david

On 3/23/2024 2:11 AM, Paolo Bonzini wrote:
> So far, KVM has allowed KVM_GET/SET_* ioctls to execute even if the
> guest state is encrypted, in which case they do nothing.  For the new
> API using VM types, instead, the ioctls will fail which is a safer and
> more robust approach.
> 
> The new API will be the only one available for SEV-SNP and TDX, but it
> is also usable for SEV and SEV-ES.  In preparation for that, require
> architecture-specific KVM code to communicate the point at which guest
> state is protected (which must be after kvm_cpu_synchronize_post_init(),
> though that might change in the future in order to support migration).
>  From that point, skip reading registers so that cpu->vcpu_dirty is
> never true: if it ever becomes true, kvm_arch_put_registers() will
> fail miserably.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   include/sysemu/kvm.h     |  2 ++
>   include/sysemu/kvm_int.h |  1 +
>   accel/kvm/kvm-all.c      | 14 ++++++++++++--
>   target/i386/sev.c        |  1 +
>   4 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index fad9a7e8ff3..302e8f6f1e5 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -539,6 +539,8 @@ bool kvm_dirty_ring_enabled(void);
>   
>   uint32_t kvm_dirty_ring_size(void);
>   
> +void kvm_mark_guest_state_protected(void);
> +
>   /**
>    * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page
>    * reported for the VM.
> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
> index 882e37e12c5..3496be7997a 100644
> --- a/include/sysemu/kvm_int.h
> +++ b/include/sysemu/kvm_int.h
> @@ -87,6 +87,7 @@ struct KVMState
>       bool kernel_irqchip_required;
>       OnOffAuto kernel_irqchip_split;
>       bool sync_mmu;
> +    bool guest_state_protected;
>       uint64_t manual_dirty_log_protect;
>       /* The man page (and posix) say ioctl numbers are signed int, but
>        * they're not.  Linux, glibc and *BSD all treat ioctl numbers as
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index a8cecd040eb..05fa3533c66 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2698,7 +2698,7 @@ bool kvm_cpu_check_are_resettable(void)
>   
>   static void do_kvm_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
>   {
> -    if (!cpu->vcpu_dirty) {
> +    if (!cpu->vcpu_dirty && !kvm_state->guest_state_protected) {
>           int ret = kvm_arch_get_registers(cpu);
>           if (ret) {
>               error_report("Failed to get registers: %s", strerror(-ret));
> @@ -2712,7 +2712,7 @@ static void do_kvm_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
>   
>   void kvm_cpu_synchronize_state(CPUState *cpu)
>   {
> -    if (!cpu->vcpu_dirty) {
> +    if (!cpu->vcpu_dirty && !kvm_state->guest_state_protected) {
>           run_on_cpu(cpu, do_kvm_cpu_synchronize_state, RUN_ON_CPU_NULL);
>       }
>   }
> @@ -2747,6 +2747,11 @@ static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, run_on_cpu_data arg)
>   
>   void kvm_cpu_synchronize_post_init(CPUState *cpu)
>   {
> +    /*
> +     * This runs before the machine_init_done notifiers, and is the last
> +     * opportunity to synchronize the state of confidential guests.
> +     */ > +    assert(!kvm_state->guest_state_protected);

So, this requires confidential guests to call
kvm_mark_guest_state_protected() in their machine_init_done notifier callback?

But for TDX, the guest_state is protected from the beginning, not at some
later time when machine_init_done runs.

>       run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
>   }
>   
> @@ -4094,3 +4099,8 @@ void query_stats_schemas_cb(StatsSchemaList **result, Error **errp)
>           query_stats_schema_vcpu(first_cpu, &stats_args);
>       }
>   }
> +
> +void kvm_mark_guest_state_protected(void)
> +{
> +    kvm_state->guest_state_protected = true;
> +}
> diff --git a/target/i386/sev.c b/target/i386/sev.c
> index b8f79d34d19..c49a8fd55eb 100644
> --- a/target/i386/sev.c
> +++ b/target/i386/sev.c
> @@ -755,6 +755,7 @@ sev_launch_get_measure(Notifier *notifier, void *unused)
>           if (ret) {
>               exit(1);
>           }
> +        kvm_mark_guest_state_protected();
>       }
>   
>       /* query the measurement blob length */



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 12/26] KVM: track whether guest state is encrypted
  2024-03-26 15:48   ` Xiaoyao Li
@ 2024-03-27  9:05     ` Paolo Bonzini
  0 siblings, 0 replies; 42+ messages in thread
From: Paolo Bonzini @ 2024-03-27  9:05 UTC (permalink / raw)
  To: Xiaoyao Li; +Cc: qemu-devel, michael.roth, david

On Tue, Mar 26, 2024 at 4:48 PM Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> So, this requires confidential guests to call
> kvm_mark_guest_state_protected() in their machine_init_done notifier callback?
>
> But for TDX, the guest_state is protected from the beginning, not at some
> later time when machine_init_done runs.

Good point, I will change this to an "if".

Paolo
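
In other words, the final version would presumably look something like this
(a sketch of the change Paolo describes, not the committed hunk):

    void kvm_cpu_synchronize_post_init(CPUState *cpu)
    {
        if (!kvm_state->guest_state_protected) {
            /*
             * This runs before the machine_init_done notifiers, and is the
             * last opportunity to synchronize the state of confidential
             * guests; guests protected from the start (e.g. TDX) simply
             * skip the synchronization.
             */
            run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
        }
    }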



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 26/26] i386/kvm: Move architectural CPUID leaf generation to separate helper
  2024-03-22 18:11 ` [PATCH 26/26] i386/kvm: Move architectural CPUID leaf generation to separate helper Paolo Bonzini
@ 2024-04-01 15:03   ` Xiaoyao Li
  0 siblings, 0 replies; 42+ messages in thread
From: Xiaoyao Li @ 2024-04-01 15:03 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: michael.roth, david

On 3/23/2024 2:11 AM, Paolo Bonzini wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Move the architectural (for lack of a better term) CPUID leaf generation
> to a separate helper so that the generation code can be reused by TDX,
> which needs to generate a canonical VM-scoped configuration.
> 
> For now this is just a cleanup, so keep the function static.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Message-ID: <20240229063726.610065-23-xiaoyao.li@intel.com>
> [Unify error reporting, rename function. - Paolo]
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   target/i386/kvm/kvm.c | 446 +++++++++++++++++++++---------------------
>   1 file changed, 224 insertions(+), 222 deletions(-)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 2577e345502..eab6261e1f5 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -1752,6 +1752,228 @@ static void kvm_init_nested_state(CPUX86State *env)
>       }
>   }
>   
> +static uint32_t kvm_x86_build_cpuid(CPUX86State *env,
> +                                    struct kvm_cpuid_entry2 *entries,
> +                                    uint32_t cpuid_i)
> +{
> +    uint32_t limit, i, j;
> +    uint32_t unused;
> +    struct kvm_cpuid_entry2 *c;
> +
> +    cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
> +
> +    for (i = 0; i <= limit; i++) {
> +        j = 0;
> +        if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> +            goto full;
> +        }
> +        c = &entries[cpuid_i++];
> +        switch (i) {
> +        case 2: {
> +            /* Keep reading function 2 till all the input is received */
> +            int times;
> +
> +            c->function = i;
> +            c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC |
> +                       KVM_CPUID_FLAG_STATE_READ_NEXT;
> +            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
> +            times = c->eax & 0xff;
> +
> +            for (j = 1; j < times; ++j) {
> +                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> +                    goto full;
> +                }
> +                c = &entries[cpuid_i++];
> +                c->function = i;
> +                c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC;
> +                cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
> +            }
> +            break;
> +        }
> +        case 0x1f:
> +            if (env->nr_dies < 2) {
> +                cpuid_i--;
> +                break;
> +            }
> +            /* fallthrough */
> +        case 4:
> +        case 0xb:
> +        case 0xd:
> +            for (j = 0; ; j++) {
> +                if (i == 0xd && j == 64) {
> +                    break;
> +                }
> +
> +                c->function = i;
> +                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> +                c->index = j;
> +                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
> +
> +                if (i == 4 && c->eax == 0) {
> +                    break;
> +                }
> +                if (i == 0xb && !(c->ecx & 0xff00)) {
> +                    break;
> +                }
> +                if (i == 0x1f && !(c->ecx & 0xff00)) {
> +                    break;
> +                }
> +                if (i == 0xd && c->eax == 0) {
> +                    continue;
> +                }
> +                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> +                    goto full;
> +                }
> +                c = &entries[cpuid_i++];
> +            }
> +            break;
> +        case 0x12:
> +            for (j = 0; ; j++) {
> +                c->function = i;
> +                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> +                c->index = j;
> +                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
> +
> +                if (j > 1 && (c->eax & 0xf) != 1) {
> +                    break;
> +                }
> +
> +                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> +                    goto full;
> +                }
> +                c = &entries[cpuid_i++];
> +            }
> +            break;
> +        case 0x7:
> +        case 0x14:
> +        case 0x1d:
> +        case 0x1e: {
> +            uint32_t times;
> +
> +            c->function = i;
> +            c->index = 0;
> +            c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> +            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
> +            times = c->eax;
> +
> +            for (j = 1; j <= times; ++j) {
> +                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> +                    goto full;
> +                }
> +                c = &entries[cpuid_i++];
> +                c->function = i;
> +                c->index = j;
> +                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> +                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
> +            }
> +            break;
> +        }
> +        default:
> +            c->function = i;
> +            c->flags = 0;
> +            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
> +            if (!c->eax && !c->ebx && !c->ecx && !c->edx) {
> +                /*
> +                 * KVM already returns all zeroes if a CPUID entry is missing,
> +                 * so we can omit it and avoid hitting KVM's 80-entry limit.
> +                 */
> +                cpuid_i--;
> +            }
> +            break;
> +        }
> +    }
> +
> +    if (limit >= 0x0a) {
> +        uint32_t eax, edx;
> +
> +        cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
> +
> +        has_architectural_pmu_version = eax & 0xff;
> +        if (has_architectural_pmu_version > 0) {
> +            num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
> +
> +            /* Shouldn't be more than 32, since that's the number of bits
> +             * available in EBX to tell us _which_ counters are available.
> +             * Play it safe.
> +             */
> +            if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
> +                num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
> +            }
> +
> +            if (has_architectural_pmu_version > 1) {
> +                num_architectural_pmu_fixed_counters = edx & 0x1f;
> +
> +                if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
> +                    num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
> +                }
> +            }
> +        }
> +    }
> +
> +    cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
> +
> +    for (i = 0x80000000; i <= limit; i++) {
> +        j = 0;

This misses the check

	if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
	    goto full;
	}

Otherwise, it looks good to me.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> +        c = &entries[cpuid_i++];
> +
> +        switch (i) {
> +        case 0x8000001d:
> +            /* Query for all AMD cache information leaves */
> +            for (j = 0; ; j++) {
> +                c->function = i;
> +                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> +                c->index = j;
> +                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
> +
> +                if (c->eax == 0) {
> +                    break;
> +                }
> +                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> +                    goto full;
> +                }
> +                c = &entries[cpuid_i++];
> +            }
> +            break;
> +        default:
> +            c->function = i;
> +            c->flags = 0;
> +            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
> +            if (!c->eax && !c->ebx && !c->ecx && !c->edx) {
> +                /*
> +                 * KVM already returns all zeroes if a CPUID entry is missing,
> +                 * so we can omit it and avoid hitting KVM's 80-entry limit.
> +                 */
> +                cpuid_i--;
> +            }
> +            break;
> +        }
> +    }
> +
> +    /* Call Centaur's CPUID instructions they are supported. */
> +    if (env->cpuid_xlevel2 > 0) {
> +        cpu_x86_cpuid(env, 0xC0000000, 0, &limit, &unused, &unused, &unused);
> +
> +        for (i = 0xC0000000; i <= limit; i++) {
> +            j = 0;
> +            if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> +                goto full;
> +            }
> +            c = &entries[cpuid_i++];
> +
> +            c->function = i;
> +            c->flags = 0;
> +            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
> +        }
> +    }
> +
> +    return cpuid_i;
> +
> +full:
> +    fprintf(stderr, "cpuid_data is full, no space for "
> +            "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
> +    abort();
> +}
> +
>   int kvm_arch_init_vcpu(CPUState *cs)
>   {
>       struct {
> @@ -1768,8 +1990,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>   
>       X86CPU *cpu = X86_CPU(cs);
>       CPUX86State *env = &cpu->env;
> -    uint32_t limit, i, j, cpuid_i;
> -    uint32_t unused;
> +    uint32_t cpuid_i;
>       struct kvm_cpuid_entry2 *c;
>       uint32_t signature[3];
>       int kvm_base = KVM_CPUID_SIGNATURE;
> @@ -1922,8 +2143,6 @@ int kvm_arch_init_vcpu(CPUState *cs)
>           c->edx = env->features[FEAT_KVM_HINTS];
>       }
>   
> -    cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
> -
>       if (cpu->kvm_pv_enforce_cpuid) {
>           r = kvm_vcpu_enable_cap(cs, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, 0, 1);
>           if (r < 0) {
> @@ -1934,224 +2153,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>           }
>       }
>   
> -    for (i = 0; i <= limit; i++) {
> -        if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> -            fprintf(stderr, "unsupported level value: 0x%x\n", limit);
> -            abort();
> -        }
> -        c = &cpuid_data.entries[cpuid_i++];
> -
> -        switch (i) {
> -        case 2: {
> -            /* Keep reading function 2 till all the input is received */
> -            int times;
> -
> -            c->function = i;
> -            c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC |
> -                       KVM_CPUID_FLAG_STATE_READ_NEXT;
> -            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
> -            times = c->eax & 0xff;
> -
> -            for (j = 1; j < times; ++j) {
> -                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> -                    fprintf(stderr, "cpuid_data is full, no space for "
> -                            "cpuid(eax:2):eax & 0xf = 0x%x\n", times);
> -                    abort();
> -                }
> -                c = &cpuid_data.entries[cpuid_i++];
> -                c->function = i;
> -                c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC;
> -                cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
> -            }
> -            break;
> -        }
> -        case 0x1f:
> -            if (env->nr_dies < 2) {
> -                cpuid_i--;
> -                break;
> -            }
> -            /* fallthrough */
> -        case 4:
> -        case 0xb:
> -        case 0xd:
> -            for (j = 0; ; j++) {
> -                if (i == 0xd && j == 64) {
> -                    break;
> -                }
> -
> -                c->function = i;
> -                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> -                c->index = j;
> -                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
> -
> -                if (i == 4 && c->eax == 0) {
> -                    break;
> -                }
> -                if (i == 0xb && !(c->ecx & 0xff00)) {
> -                    break;
> -                }
> -                if (i == 0x1f && !(c->ecx & 0xff00)) {
> -                    break;
> -                }
> -                if (i == 0xd && c->eax == 0) {
> -                    continue;
> -                }
> -                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> -                    fprintf(stderr, "cpuid_data is full, no space for "
> -                            "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
> -                    abort();
> -                }
> -                c = &cpuid_data.entries[cpuid_i++];
> -            }
> -            break;
> -        case 0x12:
> -            for (j = 0; ; j++) {
> -                c->function = i;
> -                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> -                c->index = j;
> -                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
> -
> -                if (j > 1 && (c->eax & 0xf) != 1) {
> -                    break;
> -                }
> -
> -                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> -                    fprintf(stderr, "cpuid_data is full, no space for "
> -                                "cpuid(eax:0x12,ecx:0x%x)\n", j);
> -                    abort();
> -                }
> -                c = &cpuid_data.entries[cpuid_i++];
> -            }
> -            break;
> -        case 0x7:
> -        case 0x14:
> -        case 0x1d:
> -        case 0x1e: {
> -            uint32_t times;
> -
> -            c->function = i;
> -            c->index = 0;
> -            c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> -            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
> -            times = c->eax;
> -
> -            for (j = 1; j <= times; ++j) {
> -                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> -                    fprintf(stderr, "cpuid_data is full, no space for "
> -                                "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
> -                    abort();
> -                }
> -                c = &cpuid_data.entries[cpuid_i++];
> -                c->function = i;
> -                c->index = j;
> -                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> -                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
> -            }
> -            break;
> -        }
> -        default:
> -            c->function = i;
> -            c->flags = 0;
> -            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
> -            if (!c->eax && !c->ebx && !c->ecx && !c->edx) {
> -                /*
> -                 * KVM already returns all zeroes if a CPUID entry is missing,
> -                 * so we can omit it and avoid hitting KVM's 80-entry limit.
> -                 */
> -                cpuid_i--;
> -            }
> -            break;
> -        }
> -    }
> -
> -    if (limit >= 0x0a) {
> -        uint32_t eax, edx;
> -
> -        cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
> -
> -        has_architectural_pmu_version = eax & 0xff;
> -        if (has_architectural_pmu_version > 0) {
> -            num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
> -
> -            /* Shouldn't be more than 32, since that's the number of bits
> -             * available in EBX to tell us _which_ counters are available.
> -             * Play it safe.
> -             */
> -            if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
> -                num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
> -            }
> -
> -            if (has_architectural_pmu_version > 1) {
> -                num_architectural_pmu_fixed_counters = edx & 0x1f;
> -
> -                if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
> -                    num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
> -                }
> -            }
> -        }
> -    }
> -
> -    cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
> -
> -    for (i = 0x80000000; i <= limit; i++) {
> -        if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> -            fprintf(stderr, "unsupported xlevel value: 0x%x\n", limit);
> -            abort();
> -        }
> -        c = &cpuid_data.entries[cpuid_i++];
> -
> -        switch (i) {
> -        case 0x8000001d:
> -            /* Query for all AMD cache information leaves */
> -            for (j = 0; ; j++) {
> -                c->function = i;
> -                c->flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> -                c->index = j;
> -                cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
> -
> -                if (c->eax == 0) {
> -                    break;
> -                }
> -                if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> -                    fprintf(stderr, "cpuid_data is full, no space for "
> -                            "cpuid(eax:0x%x,ecx:0x%x)\n", i, j);
> -                    abort();
> -                }
> -                c = &cpuid_data.entries[cpuid_i++];
> -            }
> -            break;
> -        default:
> -            c->function = i;
> -            c->flags = 0;
> -            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
> -            if (!c->eax && !c->ebx && !c->ecx && !c->edx) {
> -                /*
> -                 * KVM already returns all zeroes if a CPUID entry is missing,
> -                 * so we can omit it and avoid hitting KVM's 80-entry limit.
> -                 */
> -                cpuid_i--;
> -            }
> -            break;
> -        }
> -    }
> -
> -    /* Call Centaur's CPUID instructions they are supported. */
> -    if (env->cpuid_xlevel2 > 0) {
> -        cpu_x86_cpuid(env, 0xC0000000, 0, &limit, &unused, &unused, &unused);
> -
> -        for (i = 0xC0000000; i <= limit; i++) {
> -            if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
> -                fprintf(stderr, "unsupported xlevel2 value: 0x%x\n", limit);
> -                abort();
> -            }
> -            c = &cpuid_data.entries[cpuid_i++];
> -
> -            c->function = i;
> -            c->flags = 0;
> -            cpu_x86_cpuid(env, i, 0, &c->eax, &c->ebx, &c->ecx, &c->edx);
> -        }
> -    }
> -
> +    cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
>       cpuid_data.cpuid.nent = cpuid_i;
>   
>       if (((env->cpuid_version >> 8)&0xF) >= 6



^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2024-04-01 15:04 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-22 18:10 [PATCH for-9.1 00/26] x86, kvm: common confidential computing subset Paolo Bonzini
2024-03-22 18:10 ` [PATCH 01/26] pci-host/q35: Move PAM initialization above SMRAM initialization Paolo Bonzini
2024-03-22 18:10 ` [PATCH 02/26] q35: Introduce smm_ranges property for q35-pci-host Paolo Bonzini
2024-03-22 18:10 ` [PATCH 03/26] confidential guest support: Add kvm_init() and kvm_reset() in class Paolo Bonzini
2024-03-25  8:32   ` Philippe Mathieu-Daudé
2024-03-25 10:53     ` Paolo Bonzini
2024-03-22 18:10 ` [PATCH 04/26] i386/sev: Switch to use confidential_guest_kvm_init() Paolo Bonzini
2024-03-22 18:10 ` [PATCH 05/26] ppc/pef: switch to use confidential_guest_kvm_init/reset() Paolo Bonzini
2024-03-22 18:10 ` [PATCH 06/26] s390: Switch to use confidential_guest_kvm_init() Paolo Bonzini
2024-03-22 18:10 ` [PATCH 07/26] scripts/update-linux-headers: Add setup_data.h to import list Paolo Bonzini
2024-03-22 18:10 ` [PATCH 08/26] scripts/update-linux-headers: Add bits.h to file imports Paolo Bonzini
2024-03-22 18:10 ` [PATCH 09/26] [HACK] linux-headers: Update headers for 6.8 + kvm-coco-queue + SNP Paolo Bonzini
2024-03-22 18:11 ` [PATCH 10/26] [TO SQUASH] hw/i386: Remove redeclaration of struct setup_data Paolo Bonzini
2024-03-22 18:11 ` [PATCH 11/26] runstate: skip initial CPU reset if reset is not actually possible Paolo Bonzini
2024-03-25  8:58   ` Daniel P. Berrangé
2024-03-25  9:32   ` Philippe Mathieu-Daudé
2024-03-22 18:11 ` [PATCH 12/26] KVM: track whether guest state is encrypted Paolo Bonzini
2024-03-25  9:25   ` Philippe Mathieu-Daudé
2024-03-26 15:48   ` Xiaoyao Li
2024-03-27  9:05     ` Paolo Bonzini
2024-03-22 18:11 ` [PATCH 13/26] KVM: remove kvm_arch_cpu_check_are_resettable Paolo Bonzini
2024-03-25  9:26   ` Philippe Mathieu-Daudé
2024-03-22 18:11 ` [PATCH 14/26] target/i386: introduce x86-confidential-guest Paolo Bonzini
2024-03-22 18:11 ` [PATCH 15/26] target/i386: Implement mc->kvm_type() to get VM type Paolo Bonzini
2024-03-25  9:29   ` Philippe Mathieu-Daudé
2024-03-25 10:57     ` Paolo Bonzini
2024-03-22 18:11 ` [PATCH 16/26] target/i386: SEV: use KVM_SEV_INIT2 if possible Paolo Bonzini
2024-03-22 18:11 ` [PATCH 17/26] trace/kvm: Split address space and slot id in trace_kvm_set_user_memory() Paolo Bonzini
2024-03-22 18:11 ` [PATCH 18/26] kvm: Introduce support for memory_attributes Paolo Bonzini
2024-03-25  9:31   ` Philippe Mathieu-Daudé
2024-03-22 18:11 ` [PATCH 19/26] RAMBlock: Add support of KVM private guest memfd Paolo Bonzini
2024-03-22 18:53   ` Michael Roth
2024-03-22 18:11 ` [PATCH 20/26] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot Paolo Bonzini
2024-03-22 18:11 ` [PATCH 21/26] kvm/memory: Make memory type private by default if it has guest memfd backend Paolo Bonzini
2024-03-26  8:56   ` Xiaoyao Li
2024-03-22 18:11 ` [PATCH 22/26] HostMem: Add mechanism to opt in kvm guest memfd via MachineState Paolo Bonzini
2024-03-22 18:11 ` [PATCH 23/26] RAMBlock: make guest_memfd require uncoordinated discard Paolo Bonzini
2024-03-22 18:11 ` [PATCH 24/26] physmem: Introduce ram_block_discard_guest_memfd_range() Paolo Bonzini
2024-03-22 18:11 ` [PATCH 25/26] kvm: handle KVM_EXIT_MEMORY_FAULT Paolo Bonzini
2024-03-26  8:53   ` Xiaoyao Li
2024-03-22 18:11 ` [PATCH 26/26] i386/kvm: Move architectural CPUID leaf generation to separate helper Paolo Bonzini
2024-04-01 15:03   ` Xiaoyao Li
