* [RFC PATCH v2 00/21] QEMU gmem implementation
@ 2023-09-14  3:50 Xiaoyao Li
  2023-09-14  3:50 ` [RFC PATCH v2 01/21] *** HACK *** linux-headers: Update headers to pull in gmem APIs Xiaoyao Li
                   ` (21 more replies)
  0 siblings, 22 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:50 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

This is the v2 RFC of enabling KVM gmem [1] as the backend for private
memory.

For confidential computing, KVM provides the gmem/guest_mem interface
for userspace, like QEMU, to allocate user-inaccessible private memory.
This series aims to add gmem support to QEMU's RAMBlock so that each
RAMBlock can have both hva-based shared memory and gmem-fd-based
private memory. QEMU performs the shared/private conversion on
KVM_EXIT_MEMORY_FAULT and discards the memory that is no longer used.
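
As a rough illustration, a minimal sketch of how such an exit might be
handled in the vcpu run loop (kvm_handle_memory_fault() is a
hypothetical helper name; the attribute helpers come from patch 11 of
this series):

    /* Sketch: flip the attribute of the faulting range to what the
     * guest expects, based on the exit information. */
    static int kvm_handle_memory_fault(struct kvm_run *run)
    {
        hwaddr gpa = run->memory.gpa;
        hwaddr size = run->memory.size;

        if (run->memory.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE) {
            return kvm_set_memory_attributes_private(gpa, size);
        } else {
            return kvm_set_memory_attributes_shared(gpa, size);
        }
    }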

The chosen design adds a "private" property to the host memory backend.
If the "private" property is set, QEMU allocates/creates KVM gmem when
initializing the RAMBlock of the memory backend.

This series also introduces the first user of KVM gmem,
KVM_X86_SW_PROTECTED_VM. A KVM_X86_SW_PROTECTED_VM with private KVM gmem
can be created with

  $qemu -object sw-protected-vm,id=sp-vm0 \
	-object memory-backend-ram,id=mem0,size=1G,private=on \
	-machine q35,kernel_irqchip=split,confidential-guest-support=sp-vm0,memory-backend=mem0 \
	...

Unfortunately, this patch series fails to boot OVMF at a very early
stage due to a triple fault, because KVM doesn't support emulating
string I/O to private memory.

This version still leaves some opens to be discussed:
1. Does the "private" property need to be user-settable?

   It seems unnecessary because it can be determined by the vm-type. If
   the VM is a confidential guest, then the RAM of the guest must be
   able to be mapped as private, i.e., have a KVM gmem backend. So QEMU
   can determine the value of the "private" property automatically
   based on the vm type.

   This also aligns with board-internal MemoryRegions that need a KVM
   gmem backend, e.g., TDX requires OVMF to be mapped as private memory,
   so the BIOS memory region needs a KVM gmem fd associated. QEMU will
   no doubt do this internally and automatically.

2. hugepage support.

   KVM gmem can be allocated from hugetlbfs. How should QEMU decide when
   to create KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE? The easiest
   solution is to create KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE
   only when the memory backend is a HostMemoryBackendFile backed by
   hugetlbfs (see the sketch after this list).

3. What is KVM_X86_SW_PROTECTED_VM going to look like? And do we need it?

   This series implements KVM_X86_SW_PROTECTED_VM because it's introduced
   together with gmem on the KVM side and it's supposed to be the first
   user that requires KVM gmem. However, the implementation is incomplete
   and the definition of how KVM_X86_SW_PROTECTED_VM works is still
   lacking.
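
For open 2, a minimal sketch of the easiest solution mentioned above,
assuming the check sits where the gmem fd gets created (the
object_dynamic_cast()/page-size test is an assumption, not code from
this series):

    uint64_t gmem_flags = 0;

    /* Assumption: allow gmem hugepages only when the backend is a
     * file backend whose backing file uses huge pages. */
    if (object_dynamic_cast(OBJECT(backend), TYPE_MEMORY_BACKEND_FILE) &&
        qemu_fd_getpagesize(fd) != qemu_real_host_page_size()) {
        gmem_flags |= KVM_GUEST_MEMFD_ALLOW_HUGEPAGE;
    }
    gmem_fd = kvm_create_guest_memfd(size, gmem_flags, errp);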

Any other ideas/opens/questions are welcome.

Besides, the TDX QEMU implementation, which can be found at [2], is
based on this series to provide private gmem for TD private memory. It
works with the corresponding KVM [3] to boot a TDX guest.

[1] https://lore.kernel.org/all/20230718234512.1690985-1-seanjc@google.com/
[2] https://github.com/intel/qemu-tdx/tree/tdx-qemu-upstream
[3] https://github.com/intel/tdx/tree/kvm-upstream-2023.07.27-v6.5-rc2-workaround

===
Changes since rfc v1:
- Implement KVM_X86_SW_PROTECTED_VM with confidential-guest-support
interface;
- rename memory_region_can_be_private() to memory_region_has_gmem_fd();
- allocate kvm gmem fd when creating/initializing the memory backend by
introducing the RAM_KVM_GMEM flag;


Chao Peng (3):
  RAMBlock: Add support of KVM private gmem
  kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
  kvm: handle KVM_EXIT_MEMORY_FAULT

Isaku Yamahata (4):
  HostMem: Add private property and associate it with RAM_KVM_GMEM
  trace/kvm: Add trace for page conversion between shared and private
  pci-host/q35: Move PAM initialization above SMRAM initialization
  q35: Introduce smm_ranges property for q35-pci-host

Xiaoyao Li (14):
  *** HACK *** linux-headers: Update headers to pull in gmem APIs
  memory: Introduce memory_region_has_gmem_fd()
  i386: Add support for sw-protected-vm object
  i386/pc: Drop pc_machine_kvm_type()
  target/i386: Implement mc->kvm_type() to get VM type
  target/i386: Introduce kvm_confidential_guest_init()
  i386/kvm: Implement kvm_sw_protected_vm_init() for sw-protected-vm
    specific functions
  kvm: Introduce support for memory_attributes
  kvm/memory: Introduce the infrastructure to set the default
    shared/private value
  i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM
  physmem: replace function name with __func__ in
    ram_block_discard_range()
  physmem: extract ram_block_discard_range_fd() from
    ram_block_discard_range()
  physmem: Introduce ram_block_convert_range()
  i386: Disable SMM mode for X86_SW_PROTECTED_VM

 accel/kvm/kvm-all.c               | 180 ++++++++++++++++++++-
 accel/kvm/trace-events            |   4 +-
 backends/hostmem-file.c           |   1 +
 backends/hostmem-memfd.c          |   1 +
 backends/hostmem-ram.c            |   1 +
 backends/hostmem.c                |  18 +++
 hw/i386/pc.c                      |   5 -
 hw/i386/pc_q35.c                  |   3 +-
 hw/i386/x86.c                     |  12 ++
 hw/pci-host/q35.c                 |  61 ++++---
 include/exec/cpu-common.h         |   2 +
 include/exec/memory.h             |  20 +++
 include/exec/ramblock.h           |   1 +
 include/hw/i386/pc.h              |   4 +-
 include/hw/i386/x86.h             |   1 +
 include/hw/pci-host/q35.h         |   1 +
 include/sysemu/hostmem.h          |   2 +-
 include/sysemu/kvm.h              |   5 +
 include/sysemu/kvm_int.h          |   2 +
 linux-headers/asm-x86/kvm.h       |   3 +
 linux-headers/linux/kvm.h         |  50 ++++++
 qapi/qom.json                     |   5 +
 softmmu/memory.c                  |  18 +++
 softmmu/physmem.c                 | 256 ++++++++++++++++++------------
 target/i386/kvm/kvm.c             |  43 ++++-
 target/i386/kvm/kvm_i386.h        |   1 +
 target/i386/kvm/meson.build       |   1 +
 target/i386/kvm/sw-protected-vm.c |  71 +++++++++
 target/i386/kvm/sw-protected-vm.h |  19 +++
 target/i386/sev.c                 |   1 -
 target/i386/sev.h                 |   2 +
 31 files changed, 648 insertions(+), 146 deletions(-)
 create mode 100644 target/i386/kvm/sw-protected-vm.c
 create mode 100644 target/i386/kvm/sw-protected-vm.h

-- 
2.34.1




* [RFC PATCH v2 01/21] *** HACK *** linux-headers: Update headers to pull in gmem APIs
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implementation Xiaoyao Li
@ 2023-09-14  3:50 ` Xiaoyao Li
  2023-09-14  3:50 ` [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem Xiaoyao Li
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:50 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

This patch needs to be regenerated with the script

	scripts/update-linux-headers.sh

once gmem fd support is upstreamed in the Linux kernel.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 linux-headers/asm-x86/kvm.h |  3 +++
 linux-headers/linux/kvm.h   | 50 +++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 2b3a8f7bd2c0..003fb745347c 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -560,4 +560,7 @@ struct kvm_pmu_event_filter {
 /* x86-specific KVM_EXIT_HYPERCALL flags. */
 #define KVM_EXIT_HYPERCALL_LONG_MODE	BIT(0)
 
+#define KVM_X86_DEFAULT_VM	0
+#define KVM_X86_SW_PROTECTED_VM	1
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 1f3f3333a4d8..278bed78f98e 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -95,6 +95,19 @@ struct kvm_userspace_memory_region {
 	__u64 userspace_addr; /* start of the userspace allocated memory */
 };
 
+/* for KVM_SET_USER_MEMORY_REGION2 */
+struct kvm_userspace_memory_region2 {
+	__u32 slot;
+	__u32 flags;
+	__u64 guest_phys_addr;
+	__u64 memory_size;
+	__u64 userspace_addr;
+	__u64 gmem_offset;
+	__u32 gmem_fd;
+	__u32 pad1;
+	__u64 pad2[14];
+};
+
 /*
  * The bit 0 ~ bit 15 of kvm_userspace_memory_region::flags are visible for
  * userspace, other bits are reserved for kvm internal use which are defined
@@ -102,6 +115,7 @@ struct kvm_userspace_memory_region {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
 #define KVM_MEM_READONLY	(1UL << 1)
+#define KVM_MEM_PRIVATE		(1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -264,6 +278,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_SBI        35
 #define KVM_EXIT_RISCV_CSR        36
 #define KVM_EXIT_NOTIFY           37
+#define KVM_EXIT_MEMORY_FAULT     38
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -506,6 +521,13 @@ struct kvm_run {
 #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
 			__u32 flags;
 		} notify;
+		/* KVM_EXIT_MEMORY_FAULT */
+		struct {
+#define KVM_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)
+			__u64 flags;
+			__u64 gpa;
+			__u64 size;
+		} memory;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -1188,6 +1210,9 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_COUNTER_OFFSET 227
 #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
 #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
+#define KVM_CAP_USER_MEMORY2 230
+#define KVM_CAP_MEMORY_ATTRIBUTES 231
+#define KVM_CAP_VM_TYPES 232
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1462,6 +1487,8 @@ struct kvm_vfio_spapr_tce {
 					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_SET_USER_MEMORY_REGION2 _IOW(KVMIO, 0x49, \
+					 struct kvm_userspace_memory_region2)
 
 /* enable ucontrol for s390 */
 struct kvm_s390_ucas_mapping {
@@ -2245,4 +2272,27 @@ struct kvm_s390_zpci_op {
 /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
 #define KVM_S390_ZPCIOP_REGAEN_HOST    (1 << 0)
 
+/* Available with KVM_CAP_MEMORY_ATTRIBUTES */
+#define KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES    _IOR(KVMIO,  0xd2, __u64)
+#define KVM_SET_MEMORY_ATTRIBUTES              _IOW(KVMIO,  0xd3, struct kvm_memory_attributes)
+
+struct kvm_memory_attributes {
+	__u64 address;
+	__u64 size;
+	__u64 attributes;
+	__u64 flags;
+};
+
+#define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
+
+#define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
+
+#define KVM_GUEST_MEMFD_ALLOW_HUGEPAGE		(1ULL << 0)
+
+struct kvm_create_guest_memfd {
+	__u64 size;
+	__u64 flags;
+	__u64 reserved[6];
+};
+
 #endif /* __LINUX_KVM_H */
-- 
2.34.1




* [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implementation Xiaoyao Li
  2023-09-14  3:50 ` [RFC PATCH v2 01/21] *** HACK *** linux-headers: Update headers to pull in gmem APIs Xiaoyao Li
@ 2023-09-14  3:50 ` Xiaoyao Li
  2023-09-15  2:04   ` Wang, Lei
                     ` (2 more replies)
  2023-09-14  3:50 ` [RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM Xiaoyao Li
                   ` (19 subsequent siblings)
  21 siblings, 3 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:50 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

From: Chao Peng <chao.p.peng@linux.intel.com>

Add KVM gmem support to RAMBlock so that both normal hva-based memory
and KVM gmem fd based private memory can be associated with one
RAMBlock.

Introduce the new flag RAM_KVM_GMEM. When it is set, a KVM ioctl is
called to create private gmem for the RAMBlock.
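
As a usage illustration (not part of this patch), a caller could
request a gmem-backed RAMBlock by passing the new flag through one of
the existing allocation paths, e.g.:

    /* Sketch: anonymous RAM that also gets a KVM gmem fd created
     * for it in ram_block_add(). */
    memory_region_init_ram_flags_nomigrate(mr, owner, "private-ram",
                                           size, RAM_KVM_GMEM,
                                           &error_fatal);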

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c     | 17 +++++++++++++++++
 include/exec/memory.h   |  3 +++
 include/exec/ramblock.h |  1 +
 include/sysemu/kvm.h    |  2 ++
 softmmu/physmem.c       | 18 +++++++++++++++---
 5 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 60aacd925393..185ae16d9620 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -4225,3 +4225,20 @@ void query_stats_schemas_cb(StatsSchemaList **result, Error **errp)
         query_stats_schema_vcpu(first_cpu, &stats_args);
     }
 }
+
+int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
+{
+    int fd;
+    struct kvm_create_guest_memfd gmem = {
+        .size = size,
+        /* TODO: decide whether to set KVM_GUEST_MEMFD_ALLOW_HUGEPAGE */
+        .flags = flags,
+    };
+
+    fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_GUEST_MEMFD, &gmem);
+    if (fd < 0) {
+        error_setg_errno(errp, errno, "%s: error creating kvm gmem", __func__);
+    }
+
+    return fd;
+}
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 68284428f87c..227cb2578e95 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -235,6 +235,9 @@ typedef struct IOMMUTLBEvent {
 /* RAM is an mmap-ed named file */
 #define RAM_NAMED_FILE (1 << 9)
 
+/* RAM can be private that has kvm gmem backend */
+#define RAM_KVM_GMEM    (1 << 10)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 69c6a5390293..0d158b3909c9 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -41,6 +41,7 @@ struct RAMBlock {
     QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
     int fd;
     uint64_t fd_offset;
+    int gmem_fd;
     size_t page_size;
     /* dirty bitmap used during migration */
     unsigned long *bmap;
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 115f0cca79d1..f5b74c8dd8c5 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -580,4 +580,6 @@ bool kvm_arch_cpu_check_are_resettable(void);
 bool kvm_dirty_ring_enabled(void);
 
 uint32_t kvm_dirty_ring_size(void);
+
+int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp);
 #endif
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 3df73542e1fe..2d98a88f41f0 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1824,6 +1824,16 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
         }
     }
 
+    if (kvm_enabled() && new_block->flags & RAM_KVM_GMEM &&
+        new_block->gmem_fd < 0) {
+        new_block->gmem_fd = kvm_create_guest_memfd(new_block->max_length,
+                                                    0, errp);
+        if (new_block->gmem_fd < 0) {
+            qemu_mutex_unlock_ramlist();
+            return;
+        }
+    }
+
     new_ram_size = MAX(old_ram_size,
               (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS);
     if (new_ram_size > old_ram_size) {
@@ -1885,7 +1895,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
 
     /* Just support these ram flags by now. */
     assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE |
-                          RAM_PROTECTED | RAM_NAMED_FILE)) == 0);
+                          RAM_PROTECTED | RAM_NAMED_FILE | RAM_KVM_GMEM)) == 0);
 
     if (xen_enabled()) {
         error_setg(errp, "-mem-path not supported with Xen");
@@ -1920,6 +1930,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     new_block->used_length = size;
     new_block->max_length = size;
     new_block->flags = ram_flags;
+    new_block->gmem_fd = -1;
     new_block->host = file_ram_alloc(new_block, size, fd, readonly,
                                      !file_size, offset, errp);
     if (!new_block->host) {
@@ -1978,7 +1989,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     Error *local_err = NULL;
 
     assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
-                          RAM_NORESERVE)) == 0);
+                          RAM_NORESERVE| RAM_KVM_GMEM)) == 0);
     assert(!host ^ (ram_flags & RAM_PREALLOC));
 
     size = HOST_PAGE_ALIGN(size);
@@ -1990,6 +2001,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     new_block->max_length = max_size;
     assert(max_size >= size);
     new_block->fd = -1;
+    new_block->gmem_fd = -1;
     new_block->page_size = qemu_real_host_page_size();
     new_block->host = host;
     new_block->flags = ram_flags;
@@ -2012,7 +2024,7 @@ RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
 RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags,
                          MemoryRegion *mr, Error **errp)
 {
-    assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE)) == 0);
+    assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE | RAM_KVM_GMEM)) == 0);
     return qemu_ram_alloc_internal(size, size, NULL, NULL, ram_flags, mr, errp);
 }
 
-- 
2.34.1




* [RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implementation Xiaoyao Li
  2023-09-14  3:50 ` [RFC PATCH v2 01/21] *** HACK *** linux-headers: Update headers to pull in gmem APIs Xiaoyao Li
  2023-09-14  3:50 ` [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem Xiaoyao Li
@ 2023-09-14  3:50 ` Xiaoyao Li
  2023-09-19  9:46   ` Markus Armbruster
  2023-09-14  3:51 ` [RFC PATCH v2 04/21] memory: Introduce memory_region_has_gmem_fd() Xiaoyao Li
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:50 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add a new property "private" to memory backends. When it's set to true,
it indicates that the RAMBlock of the backend also requires KVM gmem.
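
For example (usage illustration only), the property works the same way
on the three backend types touched below:

    -object memory-backend-ram,id=mem0,size=4G,private=on
    -object memory-backend-memfd,id=mem1,size=4G,private=on
    -object memory-backend-file,id=mem2,size=4G,mem-path=/dev/hugepages,private=on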

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 backends/hostmem-file.c  |  1 +
 backends/hostmem-memfd.c |  1 +
 backends/hostmem-ram.c   |  1 +
 backends/hostmem.c       | 18 ++++++++++++++++++
 include/sysemu/hostmem.h |  2 +-
 qapi/qom.json            |  4 ++++
 6 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index b4335a80e6da..861f76f2de8a 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -56,6 +56,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     name = host_memory_backend_get_name(backend);
     ram_flags = backend->share ? RAM_SHARED : 0;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+    ram_flags |= backend->private ? RAM_KVM_GMEM : 0;
     ram_flags |= fb->is_pmem ? RAM_PMEM : 0;
     ram_flags |= RAM_NAMED_FILE;
     memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name,
diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index 3fc85c3db81b..f49990ce3bbd 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -55,6 +55,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     name = host_memory_backend_get_name(backend);
     ram_flags = backend->share ? RAM_SHARED : 0;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+    ram_flags |= backend->private ? RAM_KVM_GMEM : 0;
     memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name,
                                    backend->size, ram_flags, fd, 0, errp);
     g_free(name);
diff --git a/backends/hostmem-ram.c b/backends/hostmem-ram.c
index b8e55cdbd0f8..d6c46250dcfd 100644
--- a/backends/hostmem-ram.c
+++ b/backends/hostmem-ram.c
@@ -30,6 +30,7 @@ ram_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
     name = host_memory_backend_get_name(backend);
     ram_flags = backend->share ? RAM_SHARED : 0;
     ram_flags |= backend->reserve ? 0 : RAM_NORESERVE;
+    ram_flags |= backend->private ? RAM_KVM_GMEM : 0;
     memory_region_init_ram_flags_nomigrate(&backend->mr, OBJECT(backend), name,
                                            backend->size, ram_flags, errp);
     g_free(name);
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 747e7838c031..dbdbb0aafd45 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -461,6 +461,20 @@ static void host_memory_backend_set_reserve(Object *o, bool value, Error **errp)
     }
     backend->reserve = value;
 }
+
+static bool host_memory_backend_get_private(Object *o, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+
+    return backend->private;
+}
+
+static void host_memory_backend_set_private(Object *o, bool value, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+
+    backend->private = value;
+}
 #endif /* CONFIG_LINUX */
 
 static bool
@@ -541,6 +555,10 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
         host_memory_backend_get_reserve, host_memory_backend_set_reserve);
     object_class_property_set_description(oc, "reserve",
         "Reserve swap space (or huge pages) if applicable");
+    object_class_property_add_bool(oc, "private",
+        host_memory_backend_get_private, host_memory_backend_set_private);
+    object_class_property_set_description(oc, "private",
+        "Use KVM gmem private memory");
 #endif /* CONFIG_LINUX */
     /*
      * Do not delete/rename option. This option must be considered stable
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 39326f1d4f9c..d88970395618 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -65,7 +65,7 @@ struct HostMemoryBackend {
     /* protected */
     uint64_t size;
     bool merge, dump, use_canonical_path;
-    bool prealloc, is_mapped, share, reserve;
+    bool prealloc, is_mapped, share, reserve, private;
     uint32_t prealloc_threads;
     ThreadContext *prealloc_context;
     DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
diff --git a/qapi/qom.json b/qapi/qom.json
index fa3e88c8e6ab..d28c5403bc0f 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -605,6 +605,9 @@
 # @reserve: if true, reserve swap space (or huge pages) if applicable
 #     (default: true) (since 6.1)
 #
+# @private: if true, use KVM gmem private memory (default: false)
+#     (since 8.2)
+#
 # @size: size of the memory region in bytes
 #
 # @x-use-canonical-path-for-ramblock-id: if true, the canonical path
@@ -631,6 +634,7 @@
             '*prealloc-context': 'str',
             '*share': 'bool',
             '*reserve': 'bool',
+            '*private': 'bool',
             'size': 'size',
             '*x-use-canonical-path-for-ramblock-id': 'bool' } }
 
-- 
2.34.1




* [RFC PATCH v2 04/21] memory: Introduce memory_region_has_gmem_fd()
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implementation Xiaoyao Li
                   ` (2 preceding siblings ...)
  2023-09-14  3:50 ` [RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-21  8:46   ` David Hildenbrand
  2023-09-14  3:51 ` [RFC PATCH v2 05/21] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot Xiaoyao Li
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

Introduce memory_region_has_gmem_fd() to query whether the MemoryRegion
has a KVM gmem fd allocated.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 include/exec/memory.h | 10 ++++++++++
 softmmu/memory.c      |  5 +++++
 2 files changed, 15 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 227cb2578e95..4b8486ca3632 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1674,6 +1674,16 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
  */
 bool memory_region_is_protected(MemoryRegion *mr);
 
+/**
+ * memory_region_has_gmem_fd: check whether a memory region has KVM gmem fd
+ *     associated
+ *
+ * Returns %true if a memory region's ram_block has valid gmem fd assigned.
+ *
+ * @mr: the memory region being queried
+ */
+bool memory_region_has_gmem_fd(MemoryRegion *mr);
+
 /**
  * memory_region_get_iommu: check whether a memory region is an iommu
  *
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7d9494ce7028..e69a5f96d5d1 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1846,6 +1846,11 @@ bool memory_region_is_protected(MemoryRegion *mr)
     return mr->ram && (mr->ram_block->flags & RAM_PROTECTED);
 }
 
+bool memory_region_has_gmem_fd(MemoryRegion *mr)
+{
+    return mr->ram_block && mr->ram_block->gmem_fd >= 0;
+}
+
 uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
 {
     uint8_t mask = mr->dirty_log_mask;
-- 
2.34.1




* [RFC PATCH v2 05/21] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implementation Xiaoyao Li
                   ` (3 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 04/21] memory: Introduce memory_region_has_gmem_fd() Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-21  8:56   ` David Hildenbrand
  2023-09-14  3:51 ` [RFC PATCH v2 06/21] i386: Add support for sw-protected-vm object Xiaoyao Li
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

From: Chao Peng <chao.p.peng@linux.intel.com>

Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM.

With KVM_SET_USER_MEMORY_REGION2, QEMU can set up a memory region that
is backed by both hva-based shared memory and gmem fd based private
memory.
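
For reference, a condensed sketch of the new ioctl invocation as wired
up in this patch:

    struct kvm_userspace_memory_region2 mem = {
        .slot            = slot->slot | (kml->as_id << 16),
        .flags           = slot->flags,   /* may include KVM_MEM_PRIVATE */
        .guest_phys_addr = slot->start_addr,
        .memory_size     = slot->memory_size,
        .userspace_addr  = (unsigned long)slot->ram,
        .gmem_fd         = slot->gmem_fd,
        .gmem_offset     = slot->ofs,
    };

    ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);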

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c      | 55 ++++++++++++++++++++++++++++++++++------
 accel/kvm/trace-events   |  2 +-
 include/sysemu/kvm_int.h |  2 ++
 3 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 185ae16d9620..91cee0878366 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -288,35 +288,68 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram,
 static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, bool new)
 {
     KVMState *s = kvm_state;
-    struct kvm_userspace_memory_region mem;
+    struct kvm_userspace_memory_region2 mem;
+    static int cap_user_memory2 = -1;
     int ret;
 
+    if (cap_user_memory2 == -1) {
+        cap_user_memory2 = kvm_check_extension(s, KVM_CAP_USER_MEMORY2);
+    }
+
+    if (!cap_user_memory2 && slot->gmem_fd >= 0) {
+        error_report("%s, KVM doesn't support gmem!", __func__);
+        exit(1);
+    }
+
     mem.slot = slot->slot | (kml->as_id << 16);
     mem.guest_phys_addr = slot->start_addr;
     mem.userspace_addr = (unsigned long)slot->ram;
     mem.flags = slot->flags;
+    mem.gmem_fd = slot->gmem_fd;
+    mem.gmem_offset = slot->ofs;
 
     if (slot->memory_size && !new && (mem.flags ^ slot->old_flags) & KVM_MEM_READONLY) {
         /* Set the slot size to 0 before setting the slot to the desired
          * value. This is needed based on KVM commit 75d61fbc. */
         mem.memory_size = 0;
-        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+
+        if (cap_user_memory2) {
+            ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
+        } else {
+            ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+        }
         if (ret < 0) {
             goto err;
         }
     }
     mem.memory_size = slot->memory_size;
-    ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+    if (cap_user_memory2) {
+        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
+    } else {
+        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+    }
     slot->old_flags = mem.flags;
 err:
     trace_kvm_set_user_memory(mem.slot >> 16, (uint16_t)mem.slot, mem.flags,
                               mem.guest_phys_addr, mem.memory_size,
-                              mem.userspace_addr, ret);
+                              mem.userspace_addr, mem.gmem_fd,
+                              mem.gmem_offset, ret);
     if (ret < 0) {
-        error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
-                     " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
-                     __func__, mem.slot, slot->start_addr,
-                     (uint64_t)mem.memory_size, strerror(errno));
+        if (cap_user_memory2) {
+                error_report("%s: KVM_SET_USER_MEMORY_REGION2 failed, slot=%d,"
+                        " start=0x%" PRIx64 ", size=0x%" PRIx64 ","
+                        " flags=0x%" PRIx32 ","
+                        " gmem_fd=%" PRId32 ", gmem_offset=0x%" PRIx64 ": %s",
+                        __func__, mem.slot, slot->start_addr,
+                        (uint64_t)mem.memory_size, mem.flags,
+                        mem.gmem_fd, (uint64_t)mem.gmem_offset,
+                        strerror(errno));
+        } else {
+                error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
+                            " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
+                            __func__, mem.slot, slot->start_addr,
+                            (uint64_t)mem.memory_size, strerror(errno));
+        }
     }
     return ret;
 }
@@ -472,6 +505,9 @@ static int kvm_mem_flags(MemoryRegion *mr)
     if (readonly && kvm_readonly_mem_allowed) {
         flags |= KVM_MEM_READONLY;
     }
+    if (memory_region_has_gmem_fd(mr)) {
+        flags |= KVM_MEM_PRIVATE;
+    }
     return flags;
 }
 
@@ -1402,6 +1438,9 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
         mem->ram_start_offset = ram_start_offset;
         mem->ram = ram;
         mem->flags = kvm_mem_flags(mr);
+        mem->gmem_fd = mr->ram_block->gmem_fd;
+        mem->ofs = (uint8_t*)ram - mr->ram_block->host;
+
         kvm_slot_init_dirty_bitmap(mem);
         err = kvm_set_user_memory_region(kml, mem, true);
         if (err) {
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 14ebfa1b991c..80694683acea 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -15,7 +15,7 @@ kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
 kvm_irqchip_release_virq(int virq) "virq %d"
 kvm_set_ioeventfd_mmio(int fd, uint64_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%" PRIx64 " val=0x%x assign: %d size: %d match: %d"
 kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%x val=0x%x assign: %d size: %d match: %d"
-kvm_set_user_memory(uint16_t as, uint16_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "AddrSpace#%d Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
+kvm_set_user_memory(uint16_t as, uint16_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, uint32_t fd, uint64_t fd_offset, int ret) "AddrSpace#%d Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " gmem_fd=%d" " gmem_fd_offset=0x%" PRIx64 " ret=%d"
 kvm_clear_dirty_log(uint32_t slot, uint64_t start, uint32_t size) "slot#%"PRId32" start 0x%"PRIx64" size 0x%"PRIx32
 kvm_resample_fd_notify(int gsi) "gsi %d"
 kvm_dirty_ring_full(int id) "vcpu %d"
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 511b42bde5c4..9990889d3494 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -30,6 +30,8 @@ typedef struct KVMSlot
     int as_id;
     /* Cache of the offset in ram address space */
     ram_addr_t ram_start_offset;
+    int gmem_fd;
+    hwaddr ofs;
 } KVMSlot;
 
 typedef struct KVMMemoryUpdate {
-- 
2.34.1




* [RFC PATCH v2 06/21] i386: Add support for sw-protected-vm object
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implementation Xiaoyao Li
                   ` (4 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 05/21] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14  3:51 ` [RFC PATCH v2 07/21] i386/pc: Drop pc_machine_kvm_type() Xiaoyao Li
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

Introduce the sw-protected-vm object, which implements the
CONFIDENTIAL_GUEST_SUPPORT interface and will be used to create a
KVM_X86_SW_PROTECTED_VM via

  $qemu -machine ...,confidential-guest-support=sp-vm0	\
        -object sw-protected-vm,id=sp-vm0

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 qapi/qom.json                     |  1 +
 target/i386/kvm/meson.build       |  1 +
 target/i386/kvm/sw-protected-vm.c | 35 +++++++++++++++++++++++++++++++
 target/i386/kvm/sw-protected-vm.h | 17 +++++++++++++++
 4 files changed, 54 insertions(+)
 create mode 100644 target/i386/kvm/sw-protected-vm.c
 create mode 100644 target/i386/kvm/sw-protected-vm.h

diff --git a/qapi/qom.json b/qapi/qom.json
index d28c5403bc0f..be054ee2f348 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -944,6 +944,7 @@
       'if': 'CONFIG_SECRET_KEYRING' },
     'sev-guest',
     'thread-context',
+    'sw-protected-vm',
     's390-pv-guest',
     'throttle-group',
     'tls-creds-anon',
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index 40fbde96cac6..a31e760b3f19 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -5,6 +5,7 @@ i386_softmmu_kvm_ss = ss.source_set()
 i386_softmmu_kvm_ss.add(files(
   'kvm.c',
   'kvm-cpu.c',
+  'sw-protected-vm.c',
 ))
 
 i386_softmmu_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen-emu.c'))
diff --git a/target/i386/kvm/sw-protected-vm.c b/target/i386/kvm/sw-protected-vm.c
new file mode 100644
index 000000000000..62a1d3d5d3fe
--- /dev/null
+++ b/target/i386/kvm/sw-protected-vm.c
@@ -0,0 +1,35 @@
+/*
+ * QEMU X86_SW_PROTECTED_VM SUPPORT
+ *
+ * Author:
+ *      Xiaoyao Li <xiaoyao.li@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qom/object_interfaces.h"
+
+#include "sw-protected-vm.h"
+
+/* x86-sw-protected-vm */
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(SwProtectedVm,
+                                   sw_protected_vm,
+                                   SW_PROTECTED_VM,
+                                   CONFIDENTIAL_GUEST_SUPPORT,
+                                   { TYPE_USER_CREATABLE },
+                                   { NULL })
+
+static void sw_protected_vm_init(Object *obj)
+{
+}
+
+static void sw_protected_vm_finalize(Object *obj)
+{
+}
+
+static void sw_protected_vm_class_init(ObjectClass *oc, void *data)
+{
+}
diff --git a/target/i386/kvm/sw-protected-vm.h b/target/i386/kvm/sw-protected-vm.h
new file mode 100644
index 000000000000..db192a81c75e
--- /dev/null
+++ b/target/i386/kvm/sw-protected-vm.h
@@ -0,0 +1,17 @@
+#ifndef QEMU_I386_SW_PROTECTED_VM_H
+#define QEMU_I386_SW_PROTECTED_VM_H
+
+#include "exec/confidential-guest-support.h"
+
+#define TYPE_SW_PROTECTED_VM    "sw-protected-vm"
+#define SW_PROTECTED_VM(obj)    OBJECT_CHECK(SwProtectedVm, (obj), TYPE_SW_PROTECTED_VM)
+
+typedef struct SwProtectedVmClass {
+    ConfidentialGuestSupportClass parent_class;
+} SwProtectedVmClass;
+
+typedef struct SwProtectedVm {
+    ConfidentialGuestSupport parent_obj;
+} SwProtectedVm;
+
+#endif /* QEMU_I386_SW_PROTECTED_VM_H */
-- 
2.34.1




* [RFC PATCH v2 07/21] i386/pc: Drop pc_machine_kvm_type()
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implementation Xiaoyao Li
                   ` (5 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 06/21] i386: Add support for sw-protected-vm object Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-21  8:51   ` David Hildenbrand
  2023-09-23  7:32   ` David Woodhouse
  2023-09-14  3:51 ` [RFC PATCH v2 08/21] target/i386: Implement mc->kvm_type() to get VM type Xiaoyao Li
                   ` (14 subsequent siblings)
  21 siblings, 2 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

pc_machine_kvm_type() was introduced by commit e21be724eaf5 ("i386/xen:
add pc_machine_kvm_type to initialize XEN_EMULATE mode") to do
Xen-specific initialization by utilizing the kvm_type method.

Commit eeedfe6c6316 ("hw/xen: Simplify emulated Xen platform init")
moved the Xen-specific initialization to pc_basic_device_init().

There is no need to keep the PC-specific kvm_type() implementation
anymore. On the other hand, a later patch will implement the kvm_type()
method for all x86/i386 machines to support KVM_X86_SW_PROTECTED_VM.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 hw/i386/pc.c         | 5 -----
 include/hw/i386/pc.h | 3 ---
 2 files changed, 8 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3109d5e0e035..abeadd903827 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1794,11 +1794,6 @@ static void pc_machine_initfn(Object *obj)
     cxl_machine_init(obj, &pcms->cxl_devices_state);
 }
 
-int pc_machine_kvm_type(MachineState *machine, const char *kvm_type)
-{
-    return 0;
-}
-
 static void pc_machine_reset(MachineState *machine, ShutdownCause reason)
 {
     CPUState *cs;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index d54e8b1101e4..c98d628a76f3 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -296,15 +296,12 @@ extern const size_t pc_compat_1_5_len;
 extern GlobalProperty pc_compat_1_4[];
 extern const size_t pc_compat_1_4_len;
 
-int pc_machine_kvm_type(MachineState *machine, const char *vm_type);
-
 #define DEFINE_PC_MACHINE(suffix, namestr, initfn, optsfn) \
     static void pc_machine_##suffix##_class_init(ObjectClass *oc, void *data) \
     { \
         MachineClass *mc = MACHINE_CLASS(oc); \
         optsfn(mc); \
         mc->init = initfn; \
-        mc->kvm_type = pc_machine_kvm_type; \
     } \
     static const TypeInfo pc_machine_type_##suffix = { \
         .name       = namestr TYPE_MACHINE_SUFFIX, \
-- 
2.34.1




* [RFC PATCH v2 08/21] target/i386: Implement mc->kvm_type() to get VM type
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implementation Xiaoyao Li
                   ` (6 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 07/21] i386/pc: Drop pc_machine_kvm_type() Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14  3:51 ` [RFC PATCH v2 09/21] target/i386: Introduce kvm_confidential_guest_init() Xiaoyao Li
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

Implement mc->kvm_type() for i386 machines. It provides a way for users
to create a KVM_X86_SW_PROTECTED_VM.

Also store the vm_type in the machine state so that other code can
query what the VM type is.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/i386/x86.c              | 12 ++++++++++++
 include/hw/i386/x86.h      |  1 +
 target/i386/kvm/kvm.c      | 30 ++++++++++++++++++++++++++++++
 target/i386/kvm/kvm_i386.h |  1 +
 4 files changed, 44 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a88a126123be..660f83935315 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1382,6 +1382,17 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
     qapi_free_SgxEPCList(list);
 }
 
+static int x86_kvm_type(MachineState *ms, const char *vm_type)
+{
+    X86MachineState *x86ms = X86_MACHINE(ms);
+    int kvm_type;
+
+    kvm_type = kvm_get_vm_type(ms, vm_type);
+    x86ms->vm_type = kvm_type;
+
+    return kvm_type;
+}
+
 static void x86_machine_initfn(Object *obj)
 {
     X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1406,6 +1417,7 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
     mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
     mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
     mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
+    mc->kvm_type = x86_kvm_type;
     x86mc->save_tsc_khz = true;
     x86mc->fwcfg_dma_enabled = true;
     nc->nmi_monitor_handler = x86_nmi;
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index da19ae15463a..ab1d38569019 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -41,6 +41,7 @@ struct X86MachineState {
     MachineState parent;
 
     /*< public >*/
+    unsigned int vm_type;
 
     /* Pointers to devices and objects: */
     ISADevice *rtc;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f8cc8eb1fe70..d1cf6c1f63b3 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -32,6 +32,7 @@
 #include "sysemu/runstate.h"
 #include "kvm_i386.h"
 #include "sev.h"
+#include "sw-protected-vm.h"
 #include "xen-emu.h"
 #include "hyperv.h"
 #include "hyperv-proto.h"
@@ -154,6 +155,35 @@ static KVMMSRHandlers msr_handlers[KVM_MSR_FILTER_MAX_RANGES];
 static RateLimit bus_lock_ratelimit_ctrl;
 static int kvm_get_one_msr(X86CPU *cpu, int index, uint64_t *value);
 
+static const char* vm_type_name[] = {
+    [KVM_X86_DEFAULT_VM] = "default",
+    [KVM_X86_SW_PROTECTED_VM] = "sw-protected-vm",
+};
+
+int kvm_get_vm_type(MachineState *ms, const char *vm_type)
+{
+    int kvm_type = KVM_X86_DEFAULT_VM;
+
+    if (ms->cgs && object_dynamic_cast(OBJECT(ms->cgs), TYPE_SW_PROTECTED_VM)) {
+        kvm_type = KVM_X86_SW_PROTECTED_VM;
+    }
+
+    /*
+     * old KVM doesn't support KVM_CAP_VM_TYPES and KVM_X86_DEFAULT_VM
+     * is always supported
+     */
+    if (kvm_type == KVM_X86_DEFAULT_VM) {
+        return kvm_type;
+    }
+
+    if (!(kvm_check_extension(KVM_STATE(ms->accelerator), KVM_CAP_VM_TYPES) & BIT(kvm_type))) {
+        error_report("vm-type %s not supported by KVM", vm_type_name[kvm_type]);
+        exit(1);
+    }
+
+    return kvm_type;
+}
+
 int kvm_has_pit_state2(void)
 {
     return has_pit_state2;
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index e24753abfe6a..ea3a5b174ac0 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -37,6 +37,7 @@ bool kvm_has_adjust_clock(void);
 bool kvm_has_adjust_clock_stable(void);
 bool kvm_has_exception_payload(void);
 void kvm_synchronize_all_tsc(void);
+int kvm_get_vm_type(MachineState *ms, const char *vm_type);
 void kvm_arch_reset_vcpu(X86CPU *cs);
 void kvm_arch_after_reset_vcpu(X86CPU *cpu);
 void kvm_arch_do_init_vcpu(X86CPU *cs);
-- 
2.34.1




* [RFC PATCH v2 09/21] target/i386: Introduce kvm_confidential_guest_init()
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implementation Xiaoyao Li
                   ` (7 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 08/21] target/i386: Implement mc->kvm_type() to get VM type Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14  3:51 ` [RFC PATCH v2 10/21] i386/kvm: Implement kvm_sw_protected_vm_init() for sw-protected-vm specific functions Xiaoyao Li
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

Introduce a separate function kvm_confidential_guest_init(), which
dispatches to the specific confidential guest initialization function
based on the ms->cgs type.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
 target/i386/kvm/kvm.c | 11 ++++++++++-
 target/i386/sev.c     |  1 -
 target/i386/sev.h     |  2 ++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index d1cf6c1f63b3..fb1be16471b4 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2583,6 +2583,15 @@ static void register_smram_listener(Notifier *n, void *unused)
                                  &smram_address_space, 1, "kvm-smram");
 }
 
+static int kvm_confidential_guest_init(MachineState *ms, Error **errp)
+{
+    if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_SEV_GUEST)) {
+        return sev_kvm_init(ms->cgs, errp);
+    }
+
+    return 0;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
     uint64_t identity_base = 0xfffbc000;
@@ -2603,7 +2612,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
      * mechanisms are supported in future (e.g. TDX), they'll need
      * their own initialization either here or elsewhere.
      */
-    ret = sev_kvm_init(ms->cgs, &local_err);
+    ret = kvm_confidential_guest_init(ms, &local_err);
     if (ret < 0) {
         error_report_err(local_err);
         return ret;
diff --git a/target/i386/sev.c b/target/i386/sev.c
index fe2144c0388b..5aa04863846d 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -39,7 +39,6 @@
 #include "hw/i386/pc.h"
 #include "exec/address-spaces.h"
 
-#define TYPE_SEV_GUEST "sev-guest"
 OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
 
 
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 7b1528248a54..64fbf186dbd2 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -20,6 +20,8 @@
 
 #include "exec/confidential-guest-support.h"
 
+#define TYPE_SEV_GUEST "sev-guest"
+
 #define SEV_POLICY_NODBG        0x1
 #define SEV_POLICY_NOKS         0x2
 #define SEV_POLICY_ES           0x4
-- 
2.34.1




* [RFC PATCH v2 10/21] i386/kvm: Implement kvm_sw_protected_vm_init() for sw-protected-vm specific functions
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implementation Xiaoyao Li
                   ` (8 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 09/21] target/i386: Introduce kvm_confidential_guest_init() Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14  3:51 ` [RFC PATCH v2 11/21] kvm: Introduce support for memory_attributes Xiaoyao Li
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c             |  2 ++
 target/i386/kvm/sw-protected-vm.c | 10 ++++++++++
 target/i386/kvm/sw-protected-vm.h |  2 ++
 3 files changed, 14 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index fb1be16471b4..e126bf4e7ddd 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2587,6 +2587,8 @@ static int kvm_confidential_guest_init(MachineState *ms, Error **errp)
 {
     if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_SEV_GUEST)) {
         return sev_kvm_init(ms->cgs, errp);
+    } else if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_SW_PROTECTED_VM)) {
+        return sw_protected_vm_kvm_init(ms, errp);
     }
 
     return 0;
diff --git a/target/i386/kvm/sw-protected-vm.c b/target/i386/kvm/sw-protected-vm.c
index 62a1d3d5d3fe..3cfcc89202a6 100644
--- a/target/i386/kvm/sw-protected-vm.c
+++ b/target/i386/kvm/sw-protected-vm.c
@@ -10,10 +10,20 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/error.h"
 #include "qom/object_interfaces.h"
 
+#include "hw/i386/x86.h"
 #include "sw-protected-vm.h"
 
+int sw_protected_vm_kvm_init(MachineState *ms, Error **errp)
+{
+    SwProtectedVm *spvm = SW_PROTECTED_VM(OBJECT(ms->cgs));
+
+    spvm->parent_obj.ready = true;
+    return 0;
+}
+
 /* x86-sw-protected-vm */
 OBJECT_DEFINE_TYPE_WITH_INTERFACES(SwProtectedVm,
                                    sw_protected_vm,
diff --git a/target/i386/kvm/sw-protected-vm.h b/target/i386/kvm/sw-protected-vm.h
index db192a81c75e..15f63bfc7c60 100644
--- a/target/i386/kvm/sw-protected-vm.h
+++ b/target/i386/kvm/sw-protected-vm.h
@@ -14,4 +14,6 @@ typedef struct SwProtectedVm {
     ConfidentialGuestSupport parent_obj;
 } SwProtectedVm;
 
+int sw_protected_vm_kvm_init(MachineState *ms, Error **errp);
+
 #endif /* QEMU_I386_SW_PROTECTED_VM_H */
-- 
2.34.1




* [RFC PATCH v2 11/21] kvm: Introduce support for memory_attributes
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implementation Xiaoyao Li
                   ` (9 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 10/21] i386/kvm: Implement kvm_sw_protected_vm_init() for sw-protected-vm specific functions Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14  3:51 ` [RFC PATCH v2 12/21] kvm/memory: Introduce the infrastructure to set the default shared/private value Xiaoyao Li
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

Introduce helper functions to set the attributes of a range of memory
to private or shared.
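
A usage sketch (the actual conversion call sites land in later patches
of this series):

    /* Sketch: convert a guest range on a conversion request. */
    ret = to_private ? kvm_set_memory_attributes_private(start, size)
                     : kvm_set_memory_attributes_shared(start, size);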

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/sysemu/kvm.h |  3 +++
 2 files changed, 46 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 91cee0878366..eeccc6317fa9 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -105,6 +105,7 @@ bool kvm_msi_use_devid;
 bool kvm_has_guest_debug;
 static int kvm_sstep_flags;
 static bool kvm_immediate_exit;
+static uint64_t kvm_supported_memory_attributes;
 static hwaddr kvm_max_slot_size = ~0;
 
 static const KVMCapabilityInfo kvm_required_capabilites[] = {
@@ -1343,6 +1344,44 @@ void kvm_set_max_memslot_size(hwaddr max_slot_size)
     kvm_max_slot_size = max_slot_size;
 }
 
+static int kvm_set_memory_attributes(hwaddr start, hwaddr size, uint64_t attr)
+{
+    struct kvm_memory_attributes attrs;
+    int r;
+
+    attrs.attributes = attr;
+    attrs.address = start;
+    attrs.size = size;
+    attrs.flags = 0;
+
+    r = kvm_vm_ioctl(kvm_state, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
+    if (r) {
+        warn_report("%s: failed to set memory (0x%" HWADDR_PRIx "+0x%" HWADDR_PRIx ") with attr 0x%" PRIx64 " error '%s'",
+                    __func__, start, size, attr, strerror(errno));
+    }
+    return r;
+}
+
+int kvm_set_memory_attributes_private(hwaddr start, hwaddr size)
+{
+    if (!(kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
+        error_report("KVM doesn't support PRIVATE memory attribute\n");
+        return -EINVAL;
+    }
+
+    return kvm_set_memory_attributes(start, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
+}
+
+int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size)
+{
+    if (!(kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
+        error_report("KVM doesn't support PRIVATE memory attribute\n");
+        return -EINVAL;
+    }
+
+    return kvm_set_memory_attributes(start, size, 0);
+}
+
 /* Called with KVMMemoryListener.slots_lock held */
 static void kvm_set_phys_mem(KVMMemoryListener *kml,
                              MemoryRegionSection *section, bool add)
@@ -2556,6 +2595,10 @@ static int kvm_init(MachineState *ms)
     }
     s->as = g_new0(struct KVMAs, s->nr_as);
 
+    if (kvm_check_extension(s, KVM_CAP_MEMORY_ATTRIBUTES)) {
+        kvm_supported_memory_attributes = kvm_ioctl(s, KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES, 0);
+    }
+
     if (object_property_find(OBJECT(current_machine), "kvm-type")) {
         g_autofree char *kvm_type = object_property_get_str(OBJECT(current_machine),
                                                             "kvm-type",
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index f5b74c8dd8c5..0f78f1246c7f 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -582,4 +582,7 @@ bool kvm_dirty_ring_enabled(void);
 uint32_t kvm_dirty_ring_size(void);
 
 int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp);
+
+int kvm_set_memory_attributes_private(hwaddr start, hwaddr size);
+int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size);
 #endif
-- 
2.34.1




* [RFC PATCH v2 12/21] kvm/memory: Introduce the infrastructure to set the default shared/private value
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implementation Xiaoyao Li
                   ` (10 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 11/21] kvm: Introduce support for memory_attributes Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14  3:51 ` [RFC PATCH v2 13/21] i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM Xiaoyao Li
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

Introduce a new RAMBlock flag, RAM_DEFAULT_PRIVATE, to indicate whether
the RAM's default attribute is private.

When registering a memory slot for such a RAMBlock, set the RAM range to
private explicitly.

Originated-from: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c   | 10 ++++++++++
 include/exec/memory.h |  6 ++++++
 softmmu/memory.c      | 13 +++++++++++++
 3 files changed, 29 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index eeccc6317fa9..7e32ee83b258 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1487,6 +1487,16 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
                     strerror(-err));
             abort();
         }
+
+        if (memory_region_is_default_private(mr)) {
+            err = kvm_set_memory_attributes_private(start_addr, slot_size);
+            if (err) {
+                error_report("%s: failed to set memory attribute private: %s\n",
+                             __func__, strerror(-err));
+                exit(1);
+            }
+        }
+
         start_addr += slot_size;
         ram_start_offset += slot_size;
         ram += slot_size;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 4b8486ca3632..2c738b5dc420 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -238,6 +238,9 @@ typedef struct IOMMUTLBEvent {
 /* RAM can be private that has kvm gmem backend */
 #define RAM_KVM_GMEM    (1 << 10)
 
+/* RAM is default private */
+#define RAM_DEFAULT_PRIVATE     (1 << 11)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
@@ -1684,6 +1687,9 @@ bool memory_region_is_protected(MemoryRegion *mr);
  */
 bool memory_region_has_gmem_fd(MemoryRegion *mr);
 
+void memory_region_set_default_private(MemoryRegion *mr);
+bool memory_region_is_default_private(MemoryRegion *mr);
+
 /**
  * memory_region_get_iommu: check whether a memory region is an iommu
  *
diff --git a/softmmu/memory.c b/softmmu/memory.c
index e69a5f96d5d1..dc5d0d7703b5 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1851,6 +1851,19 @@ bool memory_region_has_gmem_fd(MemoryRegion *mr)
     return mr->ram_block && mr->ram_block->gmem_fd >= 0;
 }
 
+bool memory_region_is_default_private(MemoryRegion *mr)
+{
+    return memory_region_has_gmem_fd(mr) &&
+           (mr->ram_block->flags & RAM_DEFAULT_PRIVATE);
+}
+
+void memory_region_set_default_private(MemoryRegion *mr)
+{
+    if (memory_region_has_gmem_fd(mr)) {
+        mr->ram_block->flags |= RAM_DEFAULT_PRIVATE;
+    }
+}
+
 uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
 {
     uint8_t mask = mr->dirty_log_mask;
-- 
2.34.1
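
For illustration only (a sketch; the function name and call site are
hypothetical), board code that wants a region, e.g. the BIOS region,
mapped private from the start would do:

    /* Hypothetical board code: request default-private for a region.
     * This is a no-op unless the region's RAMBlock has a gmem fd. */
    static void bios_set_default_private(MemoryRegion *bios_mr)
    {
        memory_region_set_default_private(bios_mr);
    }

kvm_set_phys_mem() then picks the flag up when registering the slot and
issues kvm_set_memory_attributes_private() for the range.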



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC PATCH v2 13/21] i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implemention Xiaoyao Li
                   ` (11 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 12/21] kvm/memory: Introduce the infrastructure to set the default shared/private value Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14  3:51 ` [RFC PATCH v2 14/21] physmem: replace function name with __func__ in ram_block_discard_range() Xiaoyao Li
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

Register a memory listener for KVM_X86_SW_PROTECTED_VM. It sets RAM to
private by default.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 include/exec/memory.h             |  1 +
 target/i386/kvm/sw-protected-vm.c | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2c738b5dc420..a2602b783a38 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -820,6 +820,7 @@ struct IOMMUMemoryRegion {
 #define MEMORY_LISTENER_PRIORITY_MIN            0
 #define MEMORY_LISTENER_PRIORITY_ACCEL          10
 #define MEMORY_LISTENER_PRIORITY_DEV_BACKEND    10
+#define MEMORY_LISTENER_PRIORITY_ACCEL_HIGH     20
 
 /**
  * struct MemoryListener: callbacks structure for updates to the physical memory map
diff --git a/target/i386/kvm/sw-protected-vm.c b/target/i386/kvm/sw-protected-vm.c
index 3cfcc89202a6..f47ac383e1dd 100644
--- a/target/i386/kvm/sw-protected-vm.c
+++ b/target/i386/kvm/sw-protected-vm.c
@@ -12,14 +12,32 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
+#include "exec/address-spaces.h"
+#include "sysemu/kvm.h"
 
 #include "hw/i386/x86.h"
 #include "sw-protected-vm.h"
 
+static void kvm_x86_sw_protected_vm_region_add(MemoryListener *listener,
+                                               MemoryRegionSection *section)
+{
+    memory_region_set_default_private(section->mr);
+}
+
+static MemoryListener kvm_x86_sw_protected_vm_memory_listener = {
+    .name = "kvm_x86_sw_protected_vm_memory_listener",
+    .region_add = kvm_x86_sw_protected_vm_region_add,
+    /* Higher than KVM memory listener = 10. */
+    .priority = MEMORY_LISTENER_PRIORITY_ACCEL_HIGH,
+};
+
 int sw_protected_vm_kvm_init(MachineState *ms, Error **errp)
 {
     SwProtectedVm *spvm = SW_PROTECTED_VM(OBJECT(ms->cgs));
 
+    memory_listener_register(&kvm_x86_sw_protected_vm_memory_listener,
+                             &address_space_memory);
+
     spvm->parent_obj.ready = true;
     return 0;
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC PATCH v2 14/21] physmem: replace function name with __func__ in ram_block_discard_range()
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implemention Xiaoyao Li
                   ` (12 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 13/21] i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14  3:51 ` [RFC PATCH v2 15/21] physmem: extract ram_block_discard_range_fd() from ram_block_discard_range() Xiaoyao Li
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 softmmu/physmem.c | 34 +++++++++++++++-------------------
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 2d98a88f41f0..34d580ec0d39 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3440,16 +3440,15 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
     uint8_t *host_startaddr = rb->host + start;
 
     if (!QEMU_PTR_IS_ALIGNED(host_startaddr, rb->page_size)) {
-        error_report("ram_block_discard_range: Unaligned start address: %p",
-                     host_startaddr);
+        error_report("%s: Unaligned start address: %p",
+                     __func__, host_startaddr);
         goto err;
     }
 
     if ((start + length) <= rb->max_length) {
         bool need_madvise, need_fallocate;
         if (!QEMU_IS_ALIGNED(length, rb->page_size)) {
-            error_report("ram_block_discard_range: Unaligned length: %zx",
-                         length);
+            error_report("%s: Unaligned length: %zx", __func__, length);
             goto err;
         }
 
@@ -3479,27 +3478,26 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
              * file.
              */
             if (!qemu_ram_is_shared(rb)) {
-                warn_report_once("ram_block_discard_range: Discarding RAM"
+                warn_report_once("%s: Discarding RAM"
                                  " in private file mappings is possibly"
                                  " dangerous, because it will modify the"
                                  " underlying file and will affect other"
-                                 " users of the file");
+                                 " users of the file", __func__);
             }
 
             ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                             start, length);
             if (ret) {
                 ret = -errno;
-                error_report("ram_block_discard_range: Failed to fallocate "
-                             "%s:%" PRIx64 " +%zx (%d)",
-                             rb->idstr, start, length, ret);
+                error_report("%s: Failed to fallocate %s:%" PRIx64 " +%zx (%d)",
+                             __func__, rb->idstr, start, length, ret);
                 goto err;
             }
 #else
             ret = -ENOSYS;
-            error_report("ram_block_discard_range: fallocate not available/file"
+            error_report("%s: fallocate not available/file"
                          "%s:%" PRIx64 " +%zx (%d)",
-                         rb->idstr, start, length, ret);
+                         __func__, rb->idstr, start, length, ret);
             goto err;
 #endif
         }
@@ -3517,25 +3515,23 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
             }
             if (ret) {
                 ret = -errno;
-                error_report("ram_block_discard_range: Failed to discard range "
+                error_report("%s: Failed to discard range "
                              "%s:%" PRIx64 " +%zx (%d)",
-                             rb->idstr, start, length, ret);
+                             __func__, rb->idstr, start, length, ret);
                 goto err;
             }
 #else
             ret = -ENOSYS;
-            error_report("ram_block_discard_range: MADVISE not available"
-                         "%s:%" PRIx64 " +%zx (%d)",
-                         rb->idstr, start, length, ret);
+            error_report("%s: MADVISE not available %s:%" PRIx64 " +%zx (%d)",
+                         __func__, rb->idstr, start, length, ret);
             goto err;
 #endif
         }
         trace_ram_block_discard_range(rb->idstr, host_startaddr, length,
                                       need_madvise, need_fallocate, ret);
     } else {
-        error_report("ram_block_discard_range: Overrun block '%s' (%" PRIu64
-                     "/%zx/" RAM_ADDR_FMT")",
-                     rb->idstr, start, length, rb->max_length);
+        error_report("%s: Overrun block '%s' (%" PRIu64 "/%zx/" RAM_ADDR_FMT")",
+                     __func__, rb->idstr, start, length, rb->max_length);
     }
 
 err:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC PATCH v2 15/21] physmem: extract ram_block_discard_range_fd() from ram_block_discard_range()
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implemention Xiaoyao Li
                   ` (13 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 14/21] physmem: replace function name with __func__ in ram_block_discard_range() Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14  3:51 ` [RFC PATCH v2 16/21] physmem: Introduce ram_block_convert_range() Xiaoyao Li
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

Extract the discard logic of ram_block_discard_range() into a separate
function, ram_block_discard_range_fd(), which takes an explicit fd as an
input parameter; the alignment and sanity checks stay in the caller.

A later patch uses ram_block_discard_range_fd() to discard a private
memory range from the gmem fd. Private <-> shared memory conversion is
done at 4KB granularity, so it requires 4KB alignment instead of
RAMBlock.page_size.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 softmmu/physmem.c | 192 ++++++++++++++++++++++++----------------------
 1 file changed, 100 insertions(+), 92 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 34d580ec0d39..6ee6bc794f44 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3425,117 +3425,125 @@ int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
     return ret;
 }
 
+static int ram_block_discard_range_fd(RAMBlock *rb, uint64_t start,
+                                      size_t length, int fd)
+{
+    uint8_t *host_startaddr = rb->host + start;
+    bool need_madvise, need_fallocate;
+    int ret = -1;
+
+    errno = ENOTSUP; /* If we are missing MADVISE etc */
+
+    /* The logic here is messy;
+     *    madvise DONTNEED fails for hugepages
+     *    fallocate works on hugepages and shmem
+     *    shared anonymous memory requires madvise REMOVE
+     */
+    need_madvise = (rb->page_size == qemu_host_page_size) && (rb->fd == fd);
+    need_fallocate = fd != -1;
+
+    if (need_fallocate) {
+        /* For a file, this causes the area of the file to be zero'd
+         * if read, and for hugetlbfs also causes it to be unmapped
+         * so a userfault will trigger.
+         */
+#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+        /*
+         * We'll discard data from the actual file, even though we only
+         * have a MAP_PRIVATE mapping, possibly messing with other
+         * MAP_PRIVATE/MAP_SHARED mappings. There is no easy way to
+     * change that behavior without violating the promised
+         * semantics of ram_block_discard_range().
+         *
+         * Only warn, because it works as long as nobody else uses that
+         * file.
+         */
+        if (!qemu_ram_is_shared(rb)) {
+            warn_report_once("%s: Discarding RAM"
+                                " in private file mappings is possibly"
+                                " dangerous, because it will modify the"
+                                " underlying file and will affect other"
+                                " users of the file", __func__);
+        }
+
+        ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+                        start, length);
+        if (ret) {
+            ret = -errno;
+            error_report("%s: Failed to fallocate %s:%" PRIx64 " +%zx (%d)",
+                         __func__, rb->idstr, start, length, ret);
+            return ret;
+        }
+#else
+        ret = -ENOSYS;
+        error_report("%s: fallocate not available/file "
+                     "%s:%" PRIx64 " +%zx (%d)",
+                     __func__, rb->idstr, start, length, ret);
+        return ret;
+#endif
+    }
+
+    if (need_madvise) {
+        /* For normal RAM this causes it to be unmapped,
+         * for shared memory it causes the local mapping to disappear
+         * and to fall back on the file contents (which we just
+         * fallocate'd away).
+         */
+#if defined(CONFIG_MADVISE)
+        if (qemu_ram_is_shared(rb) && fd < 0) {
+            ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
+        } else {
+            ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
+        }
+        if (ret) {
+            ret = -errno;
+            error_report("%s: Failed to discard range %s:%" PRIx64 " +%zx (%d)",
+                         __func__, rb->idstr, start, length, ret);
+            return ret;
+        }
+#else
+        ret = -ENOSYS;
+        error_report("%s: MADVISE not available %s:%" PRIx64 " +%zx (%d)",
+                     __func__, rb->idstr, start, length, ret);
+        return ret;
+#endif
+    }
+
+    trace_ram_block_discard_range(rb->idstr, host_startaddr, length,
+                                  need_madvise, need_fallocate, ret);
+    return ret;
+}
+
 /*
  * Unmap pages of memory from start to start+length such that
  * they a) read as 0, b) Trigger whatever fault mechanism
  * the OS provides for postcopy.
+ *
  * The pages must be unmapped by the end of the function.
- * Returns: 0 on success, none-0 on failure
- *
+ * Returns: 0 on success, non-0 on failure.
  */
 int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
 {
-    int ret = -1;
-
     uint8_t *host_startaddr = rb->host + start;
 
     if (!QEMU_PTR_IS_ALIGNED(host_startaddr, rb->page_size)) {
         error_report("%s: Unaligned start address: %p",
                      __func__, host_startaddr);
-        goto err;
+        return -1;
     }
 
-    if ((start + length) <= rb->max_length) {
-        bool need_madvise, need_fallocate;
-        if (!QEMU_IS_ALIGNED(length, rb->page_size)) {
-            error_report("%s: Unaligned length: %zx", __func__, length);
-            goto err;
-        }
-
-        errno = ENOTSUP; /* If we are missing MADVISE etc */
-
-        /* The logic here is messy;
-         *    madvise DONTNEED fails for hugepages
-         *    fallocate works on hugepages and shmem
-         *    shared anonymous memory requires madvise REMOVE
-         */
-        need_madvise = (rb->page_size == qemu_host_page_size);
-        need_fallocate = rb->fd != -1;
-        if (need_fallocate) {
-            /* For a file, this causes the area of the file to be zero'd
-             * if read, and for hugetlbfs also causes it to be unmapped
-             * so a userfault will trigger.
-             */
-#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
-            /*
-             * We'll discard data from the actual file, even though we only
-             * have a MAP_PRIVATE mapping, possibly messing with other
-             * MAP_PRIVATE/MAP_SHARED mappings. There is no easy way to
-             * change that behavior whithout violating the promised
-             * semantics of ram_block_discard_range().
-             *
-             * Only warn, because it works as long as nobody else uses that
-             * file.
-             */
-            if (!qemu_ram_is_shared(rb)) {
-                warn_report_once("%s: Discarding RAM"
-                                 " in private file mappings is possibly"
-                                 " dangerous, because it will modify the"
-                                 " underlying file and will affect other"
-                                 " users of the file", __func__);
-            }
-
-            ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
-                            start, length);
-            if (ret) {
-                ret = -errno;
-                error_report("%s: Failed to fallocate %s:%" PRIx64 " +%zx (%d)",
-                             __func__, rb->idstr, start, length, ret);
-                goto err;
-            }
-#else
-            ret = -ENOSYS;
-            error_report("%s: fallocate not available/file"
-                         "%s:%" PRIx64 " +%zx (%d)",
-                         __func__, rb->idstr, start, length, ret);
-            goto err;
-#endif
-        }
-        if (need_madvise) {
-            /* For normal RAM this causes it to be unmapped,
-             * for shared memory it causes the local mapping to disappear
-             * and to fall back on the file contents (which we just
-             * fallocate'd away).
-             */
-#if defined(CONFIG_MADVISE)
-            if (qemu_ram_is_shared(rb) && rb->fd < 0) {
-                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
-            } else {
-                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
-            }
-            if (ret) {
-                ret = -errno;
-                error_report("%s: Failed to discard range "
-                             "%s:%" PRIx64 " +%zx (%d)",
-                             __func__, rb->idstr, start, length, ret);
-                goto err;
-            }
-#else
-            ret = -ENOSYS;
-            error_report("%s: MADVISE not available %s:%" PRIx64 " +%zx (%d)",
-                         __func__, rb->idstr, start, length, ret);
-            goto err;
-#endif
-        }
-        trace_ram_block_discard_range(rb->idstr, host_startaddr, length,
-                                      need_madvise, need_fallocate, ret);
-    } else {
+    if ((start + length) > rb->max_length) {
         error_report("%s: Overrun block '%s' (%" PRIu64 "/%zx/" RAM_ADDR_FMT")",
                      __func__, rb->idstr, start, length, rb->max_length);
+        return -1;
     }
 
-err:
-    return ret;
+    if (!QEMU_IS_ALIGNED(length, rb->page_size)) {
+        error_report("%s: Unaligned length: %zx", __func__, length);
+        return -1;
+    }
+
+    return ram_block_discard_range_fd(rb, start, length, rb->fd);
 }
 
 bool ramblock_is_pmem(RAMBlock *rb)
-- 
2.34.1
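
A sketch of the resulting split (the 2MB figure is an arbitrary example;
MiB comes from qemu/units.h): the public wrapper keeps validating
alignment against rb->page_size and passes the block's own fd through,
so both fallocate() and madvise() keep operating on the shared backing:

    /* Sketch: discard 2MB at offset 0 of rb via the unchanged wrapper. */
    if (ram_block_discard_range(rb, 0, 2 * MiB) < 0) {
        /* hypothetical handling; the callee already reported the error */
    }

Callers that need a different fd and 4KB alignment call
ram_block_discard_range_fd() directly, as the next patch does.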



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC PATCH v2 16/21] physmem: Introduce ram_block_convert_range()
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implemention Xiaoyao Li
                   ` (14 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 15/21] physmem: extract ram_block_discard_range_fd() from ram_block_discard_range() Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14  3:51 ` [RFC PATCH v2 17/21] kvm: handle KVM_EXIT_MEMORY_FAULT Xiaoyao Li
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

It's used to discard the opposite memory backing after a range is
converted to shared/private.

Note, private-shared page conversion is done at 4KB granularity, so
check alignment against qemu_host_page_size (4KB) instead of
rb->page_size.

Originally-from: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 include/exec/cpu-common.h |  2 ++
 softmmu/physmem.c         | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 87dc9a752c9a..558684b9f246 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -157,6 +157,8 @@ typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
 
 int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
 int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length);
+int ram_block_convert_range(RAMBlock *rb, uint64_t start, size_t length,
+                            bool shared_to_private);
 
 #endif
 
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 6ee6bc794f44..dab3247d461c 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3733,3 +3733,39 @@ bool ram_block_discard_is_required(void)
     return qatomic_read(&ram_block_discard_required_cnt) ||
            qatomic_read(&ram_block_coordinated_discard_required_cnt);
 }
+
+int ram_block_convert_range(RAMBlock *rb, uint64_t start, size_t length,
+                            bool shared_to_private)
+{
+    int fd;
+
+    if (!rb || rb->gmem_fd < 0) {
+        return -1;
+    }
+
+    if (!QEMU_IS_ALIGNED(start, qemu_host_page_size) ||
+        !QEMU_IS_ALIGNED(length, qemu_host_page_size)) {
+        return -1;
+    }
+
+    if (!length) {
+        return -1;
+    }
+
+    if (start + length > rb->max_length) {
+        return -1;
+    }
+
+    if (shared_to_private) {
+        void *host_startaddr = rb->host + start;
+
+        if (!QEMU_PTR_IS_ALIGNED(host_startaddr, qemu_host_page_size)) {
+            return -1;
+        }
+        fd = rb->fd;
+    } else {
+        fd = rb->gmem_fd;
+    }
+
+    return ram_block_discard_range_fd(rb, start, length, fd);
+}
-- 
2.34.1
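
The fd selection above encodes the direction of the discard; a sketch of
the intended usage (rb/offset/size are placeholders for values derived
from the converted GPA range):

    /* Converting to private: the shared (hva) pages become unused, so
     * the hole is punched in rb->fd.  Converting to shared: the gmem
     * pages become unused, so the hole is punched in rb->gmem_fd. */
    ram_block_convert_range(rb, offset, size, true /* shared_to_private */);

The next patch calls this from the KVM_EXIT_MEMORY_FAULT handler after
the attribute update has succeeded.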



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC PATCH v2 17/21] kvm: handle KVM_EXIT_MEMORY_FAULT
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implemention Xiaoyao Li
                   ` (15 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 16/21] physmem: Introduce ram_block_convert_range() Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14  3:51 ` [RFC PATCH v2 18/21] trace/kvm: Add trace for page conversion between shared and private Xiaoyao Li
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

From: Chao Peng <chao.p.peng@linux.intel.com>

Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when
KVM_EXIT_MEMORY_FAULT happens. It indicates that userspace needs to do
the memory conversion on the RAMBlock to turn the memory into the
desired attribute, i.e., private/shared.

Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has a
gmem memory backend.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c | 54 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 7e32ee83b258..c67aa66b0559 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3040,6 +3040,50 @@ static void kvm_eat_signals(CPUState *cpu)
     } while (sigismember(&chkset, SIG_IPI));
 }
 
+static int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
+{
+    MemoryRegionSection section;
+    ram_addr_t offset;
+    RAMBlock *rb;
+    void *addr;
+    int ret = -1;
+
+    section = memory_region_find(get_system_memory(), start, size);
+    if (!section.mr) {
+        return ret;
+    }
+
+    if (memory_region_has_gmem_fd(section.mr)) {
+        if (to_private) {
+            ret = kvm_set_memory_attributes_private(start, size);
+        } else {
+            ret = kvm_set_memory_attributes_shared(start, size);
+        }
+
+        if (ret) {
+            memory_region_unref(section.mr);
+            return ret;
+        }
+
+        addr = memory_region_get_ram_ptr(section.mr) +
+               section.offset_within_region;
+        rb = qemu_ram_block_from_host(addr, false, &offset);
+        /*
+         * With KVM_SET_MEMORY_ATTRIBUTES by kvm_set_memory_attributes(),
+         * operation on underlying file descriptor is only for releasing
+         * unnecessary pages.
+         */
+        ram_block_convert_range(rb, offset, size, to_private);
+    } else {
+        warn_report("Convert non guest-memfd backed memory region "
+                    "(0x%"HWADDR_PRIx" ,+ 0x%"HWADDR_PRIx") to %s",
+                    start, size, to_private ? "private" : "shared");
+    }
+
+    memory_region_unref(section.mr);
+    return ret;
+}
+
 int kvm_cpu_exec(CPUState *cpu)
 {
     struct kvm_run *run = cpu->kvm_run;
@@ -3198,6 +3242,16 @@ int kvm_cpu_exec(CPUState *cpu)
                 break;
             }
             break;
+        case KVM_EXIT_MEMORY_FAULT:
+            if (run->memory.flags & ~KVM_MEMORY_EXIT_FLAG_PRIVATE) {
+                error_report("KVM_EXIT_MEMORY_FAULT: Unknown flag 0x%" PRIx64,
+                             (uint64_t)run->memory.flags);
+                ret = -1;
+                break;
+            }
+            ret = kvm_convert_memory(run->memory.gpa, run->memory.size,
+                                     run->memory.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE);
+            break;
         default:
             DPRINTF("kvm_arch_handle_exit\n");
             ret = kvm_arch_handle_exit(cpu, run);
-- 
2.34.1
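
For reference, the kvm_run fields this patch consumes, per the gmem RFC
uapi headers (field order is a best-effort reconstruction and the ABI
may still change before merge):

    /* kvm_run::memory, valid when exit_reason == KVM_EXIT_MEMORY_FAULT */
    struct {
        __u64 flags;   /* KVM_MEMORY_EXIT_FLAG_PRIVATE: guest accessed
                          the range with the private attribute */
        __u64 gpa;     /* guest physical address of the fault */
        __u64 size;    /* size of the faulting range */
    } memory;

Only the PRIVATE flag is defined so far, hence the strict flags check
before kvm_convert_memory() is called.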



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC PATCH v2 18/21] trace/kvm: Add trace for page conversion between shared and private
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implemention Xiaoyao Li
                   ` (16 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 17/21] kvm: handle KVM_EXIT_MEMORY_FAULT Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14  3:51 ` [RFC PATCH v2 19/21] pci-host/q35: Move PAM initialization above SMRAM initialization Xiaoyao Li
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

From: Isaku Yamahata <isaku.yamahata@intel.com>

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c    | 1 +
 accel/kvm/trace-events | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c67aa66b0559..229b7038a4c2 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3048,6 +3048,7 @@ static int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
     void *addr;
     int ret = -1;
 
+    trace_kvm_convert_memory(start, size, to_private ? "shared_to_private" : "private_to_shared");
     section = memory_region_find(get_system_memory(), start, size);
     if (!section.mr) {
         return ret;
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 80694683acea..4935c9c5cf0b 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -25,4 +25,4 @@ kvm_dirty_ring_reaper(const char *s) "%s"
 kvm_dirty_ring_reap(uint64_t count, int64_t t) "reaped %"PRIu64" pages (took %"PRIi64" us)"
 kvm_dirty_ring_reaper_kick(const char *reason) "%s"
 kvm_dirty_ring_flush(int finished) "%d"
-
+kvm_convert_memory(uint64_t start, uint64_t size, const char *msg) "start 0x%" PRIx64 " size 0x%" PRIx64 " %s"
-- 
2.34.1
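
Usage note: the conversions can then be observed at runtime by enabling
the tracepoint, e.g.

    $qemu -trace kvm_convert_memory ...

which, per the format string above, prints lines like
"kvm_convert_memory start 0x... size 0x... shared_to_private" (the
addresses here are placeholders).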



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC PATCH v2 19/21] pci-host/q35: Move PAM initialization above SMRAM initialization
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implemention Xiaoyao Li
                   ` (17 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 18/21] trace/kvm: Add trace for page conversion between shared and private Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14  3:51 ` [RFC PATCH v2 20/21] q35: Introduce smm_ranges property for q35-pci-host Xiaoyao Li
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

From: Isaku Yamahata <isaku.yamahata@intel.com>

In mch_realize(), process PAM initialization before SMRAM initialization so
that a later patch can skip all the SMRAM-related initialization with a
single check.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/pci-host/q35.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 91c46df9ae31..ac1518a94ee4 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -575,6 +575,16 @@ static void mch_realize(PCIDevice *d, Error **errp)
     /* setup pci memory mapping */
     pc_pci_as_mapping_init(mch->system_memory, mch->pci_address_space);
 
+    /* PAM */
+    init_pam(&mch->pam_regions[0], OBJECT(mch), mch->ram_memory,
+             mch->system_memory, mch->pci_address_space,
+             PAM_BIOS_BASE, PAM_BIOS_SIZE);
+    for (i = 0; i < ARRAY_SIZE(mch->pam_regions) - 1; ++i) {
+        init_pam(&mch->pam_regions[i + 1], OBJECT(mch), mch->ram_memory,
+                 mch->system_memory, mch->pci_address_space,
+                 PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
+    }
+
     /* if *disabled* show SMRAM to all CPUs */
     memory_region_init_alias(&mch->smram_region, OBJECT(mch), "smram-region",
                              mch->pci_address_space, MCH_HOST_BRIDGE_SMRAM_C_BASE,
@@ -641,15 +651,6 @@ static void mch_realize(PCIDevice *d, Error **errp)
 
     object_property_add_const_link(qdev_get_machine(), "smram",
                                    OBJECT(&mch->smram));
-
-    init_pam(&mch->pam_regions[0], OBJECT(mch), mch->ram_memory,
-             mch->system_memory, mch->pci_address_space,
-             PAM_BIOS_BASE, PAM_BIOS_SIZE);
-    for (i = 0; i < ARRAY_SIZE(mch->pam_regions) - 1; ++i) {
-        init_pam(&mch->pam_regions[i + 1], OBJECT(mch), mch->ram_memory,
-                 mch->system_memory, mch->pci_address_space,
-                 PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
-    }
 }
 
 uint64_t mch_mcfg_base(void)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC PATCH v2 20/21] q35: Introduce smm_ranges property for q35-pci-host
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implemention Xiaoyao Li
                   ` (18 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 19/21] pci-host/q35: Move PAM initialization above SMRAM initialization Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14  3:51 ` [RFC PATCH v2 21/21] i386: Disable SMM mode for X86_SW_PROTECTED_VM Xiaoyao Li
  2023-09-14 13:09 ` [RFC PATCH v2 00/21] QEMU gmem implemention David Hildenbrand
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

From: Isaku Yamahata <isaku.yamahata@linux.intel.com>

Add a q35 property to indicate whether or not SMM ranges (SMRAM, TSEG,
etc.) exist for the target platform. TDX doesn't support SMM and doesn't
play nice with QEMU modifying related guest memory ranges.

Signed-off-by: Isaku Yamahata <isaku.yamahata@linux.intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/i386/pc_q35.c          |  3 ++-
 hw/pci-host/q35.c         | 42 +++++++++++++++++++++++++++------------
 include/hw/i386/pc.h      |  1 +
 include/hw/pci-host/q35.h |  1 +
 4 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index dc27a9e223a2..73eb3bc5e826 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -232,8 +232,9 @@ static void pc_q35_init(MachineState *machine)
                             x86ms->above_4g_mem_size, NULL);
     object_property_set_bool(phb, PCI_HOST_BYPASS_IOMMU,
                              pcms->default_bus_bypass_iommu, NULL);
+    object_property_set_bool(phb, PCI_HOST_PROP_SMM_RANGES,
+                             x86_machine_is_smm_enabled(x86ms), NULL);
     sysbus_realize_and_unref(SYS_BUS_DEVICE(phb), &error_fatal);
-
     /* pci */
     host_bus = PCI_BUS(qdev_get_child_bus(DEVICE(phb), "pcie.0"));
     pcms->bus = host_bus;
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index ac1518a94ee4..f5c01f7080f9 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -186,6 +186,8 @@ static Property q35_host_props[] = {
                      mch.below_4g_mem_size, 0),
     DEFINE_PROP_SIZE(PCI_HOST_ABOVE_4G_MEM_SIZE, Q35PCIHost,
                      mch.above_4g_mem_size, 0),
+    DEFINE_PROP_BOOL(PCI_HOST_PROP_SMM_RANGES, Q35PCIHost,
+                     mch.has_smm_ranges, true),
     DEFINE_PROP_BOOL("x-pci-hole64-fix", Q35PCIHost, pci_hole64_fix, true),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -221,6 +223,7 @@ static void q35_host_initfn(Object *obj)
     /* mch's object_initialize resets the default value, set it again */
     qdev_prop_set_uint64(DEVICE(s), PCI_HOST_PROP_PCI_HOLE64_SIZE,
                          Q35_PCI_HOST_HOLE64_SIZE_DEFAULT);
+
     object_property_add(obj, PCI_HOST_PROP_PCI_HOLE_START, "uint32",
                         q35_host_get_pci_hole_start,
                         NULL, NULL, NULL);
@@ -483,6 +486,10 @@ static void mch_write_config(PCIDevice *d,
         mch_update_pciexbar(mch);
     }
 
+    if (!mch->has_smm_ranges) {
+        return;
+    }
+
     if (ranges_overlap(address, len, MCH_HOST_BRIDGE_SMRAM,
                        MCH_HOST_BRIDGE_SMRAM_SIZE)) {
         mch_update_smram(mch);
@@ -501,10 +508,13 @@ static void mch_write_config(PCIDevice *d,
 static void mch_update(MCHPCIState *mch)
 {
     mch_update_pciexbar(mch);
+
     mch_update_pam(mch);
-    mch_update_smram(mch);
-    mch_update_ext_tseg_mbytes(mch);
-    mch_update_smbase_smram(mch);
+    if (mch->has_smm_ranges) {
+        mch_update_smram(mch);
+        mch_update_ext_tseg_mbytes(mch);
+        mch_update_smbase_smram(mch);
+    }
 
     /*
      * pci hole goes from end-of-low-ram to io-apic.
@@ -545,19 +555,21 @@ static void mch_reset(DeviceState *qdev)
     pci_set_quad(d->config + MCH_HOST_BRIDGE_PCIEXBAR,
                  MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT);
 
-    d->config[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_DEFAULT;
-    d->config[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_DEFAULT;
-    d->wmask[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_WMASK;
-    d->wmask[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_WMASK;
+    if (mch->has_smm_ranges) {
+        d->config[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_DEFAULT;
+        d->config[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_DEFAULT;
+        d->wmask[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_WMASK;
+        d->wmask[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_WMASK;
 
-    if (mch->ext_tseg_mbytes > 0) {
-        pci_set_word(d->config + MCH_HOST_BRIDGE_EXT_TSEG_MBYTES,
-                     MCH_HOST_BRIDGE_EXT_TSEG_MBYTES_QUERY);
+        if (mch->ext_tseg_mbytes > 0) {
+            pci_set_word(d->config + MCH_HOST_BRIDGE_EXT_TSEG_MBYTES,
+                        MCH_HOST_BRIDGE_EXT_TSEG_MBYTES_QUERY);
+        }
+
+        d->config[MCH_HOST_BRIDGE_F_SMBASE] = 0;
+        d->wmask[MCH_HOST_BRIDGE_F_SMBASE] = 0xff;
     }
 
-    d->config[MCH_HOST_BRIDGE_F_SMBASE] = 0;
-    d->wmask[MCH_HOST_BRIDGE_F_SMBASE] = 0xff;
-
     mch_update(mch);
 }
 
@@ -585,6 +597,10 @@ static void mch_realize(PCIDevice *d, Error **errp)
                  PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
     }
 
+    if (!mch->has_smm_ranges) {
+        return;
+    }
+
     /* if *disabled* show SMRAM to all CPUs */
     memory_region_init_alias(&mch->smram_region, OBJECT(mch), "smram-region",
                              mch->pci_address_space, MCH_HOST_BRIDGE_SMRAM_C_BASE,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index c98d628a76f3..3c9ced22cea9 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -156,6 +156,7 @@ void pc_guest_info_init(PCMachineState *pcms);
 #define PCI_HOST_PROP_PCI_HOLE64_SIZE  "pci-hole64-size"
 #define PCI_HOST_BELOW_4G_MEM_SIZE     "below-4g-mem-size"
 #define PCI_HOST_ABOVE_4G_MEM_SIZE     "above-4g-mem-size"
+#define PCI_HOST_PROP_SMM_RANGES       "smm-ranges"
 
 
 void pc_pci_as_mapping_init(MemoryRegion *system_memory,
diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
index 1d98bbfe0dd2..930af3870e8a 100644
--- a/include/hw/pci-host/q35.h
+++ b/include/hw/pci-host/q35.h
@@ -50,6 +50,7 @@ struct MCHPCIState {
     MemoryRegion tseg_blackhole, tseg_window;
     MemoryRegion smbase_blackhole, smbase_window;
     bool has_smram_at_smbase;
+    bool has_smm_ranges;
     Range pci_hole;
     uint64_t below_4g_mem_size;
     uint64_t above_4g_mem_size;
-- 
2.34.1
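
Since the property is wired to x86_machine_is_smm_enabled(), no new
user-visible knob is needed; disabling SMM the usual way also disables
the SMM ranges on the host bridge, e.g.

    $qemu -machine q35,smm=off ...

and mch_realize()/mch_reset() then skip the SMRAM/TSEG/SMBASE setup
entirely.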



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC PATCH v2 21/21] i386: Disable SMM mode for X86_SW_PROTECTED_VM
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implemention Xiaoyao Li
                   ` (19 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 20/21] q35: Introduce smm_ranges property for q35-pci-host Xiaoyao Li
@ 2023-09-14  3:51 ` Xiaoyao Li
  2023-09-14 13:09 ` [RFC PATCH v2 00/21] QEMU gmem implemention David Hildenbrand
  21 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-14  3:51 UTC (permalink / raw)
  To: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, xiaoyao.li, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

Default SMM to off for X86_SW_PROTECTED_VM, and reject an explicit
smm=on, since the VM type does not support SMM.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/sw-protected-vm.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target/i386/kvm/sw-protected-vm.c b/target/i386/kvm/sw-protected-vm.c
index f47ac383e1dd..65347067aa03 100644
--- a/target/i386/kvm/sw-protected-vm.c
+++ b/target/i386/kvm/sw-protected-vm.c
@@ -34,10 +34,18 @@ static MemoryListener kvm_x86_sw_protected_vm_memory_listener = {
 int sw_protected_vm_kvm_init(MachineState *ms, Error **errp)
 {
     SwProtectedVm *spvm = SW_PROTECTED_VM(OBJECT(ms->cgs));
+    X86MachineState *x86ms = X86_MACHINE(ms);
 
     memory_listener_register(&kvm_x86_sw_protected_vm_memory_listener,
                              &address_space_memory);
 
+    if (x86ms->smm == ON_OFF_AUTO_AUTO) {
+        x86ms->smm = ON_OFF_AUTO_OFF;
+    } else if (x86ms->smm == ON_OFF_AUTO_ON) {
+        error_setg(errp, "X86_SW_PROTECTED_VM doesn't support SMM");
+        return -EINVAL;
+    }
+
     spvm->parent_obj.ready = true;
     return 0;
 }
-- 
2.34.1
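
The user-visible effect (a sketch; options abbreviated): with smm left
at "auto" it is silently turned off for the protected VM, while an
explicit request fails:

    # rejected with "X86_SW_PROTECTED_VM doesn't support SMM"
    $qemu -object sw-protected-vm,id=sp-vm0 \
          -machine q35,smm=on,confidential-guest-support=sp-vm0 ...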



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 00/21] QEMU gmem implemention
  2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implemention Xiaoyao Li
                   ` (20 preceding siblings ...)
  2023-09-14  3:51 ` [RFC PATCH v2 21/21] i386: Disable SMM mode for X86_SW_PROTECTED_VM Xiaoyao Li
@ 2023-09-14 13:09 ` David Hildenbrand
  2023-09-15  1:10   ` Sean Christopherson
  2023-09-15  3:37   ` Xiaoyao Li
  21 siblings, 2 replies; 52+ messages in thread
From: David Hildenbrand @ 2023-09-14 13:09 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 14.09.23 05:50, Xiaoyao Li wrote:
> It's the v2 RFC of enabling KVM gmem[1] as the backend for private
> memory.
> 
> For confidential-computing, KVM provides gmem/guest_mem interfaces for
> userspace, like QEMU, to allocate user-unaccesible private memory. This
> series aims to add gmem support in QEMU's RAMBlock so that each RAM can
> have both hva-based shared memory and gmem_fd based private memory. QEMU
> does the shared-private conversion on KVM_MEMORY_EXIT and discards the
> memory.
> 
> It chooses the design that adds "private" property to hostmeory backend.
> If "private" property is set, QEMU will allocate/create KVM gmem when
> initialize the RAMbloch of the memory backend.
> 
> This sereis also introduces the first user of kvm gmem,
> KVM_X86_SW_PROTECTED_VM. A KVM_X86_SW_PROTECTED_VM with private KVM gmem
> can be created with
> 
>    $qemu -object sw-protected-vm,id=sp-vm0 \
> 	-object memory-backend-ram,id=mem0,size=1G,private=on \
> 	-machine q35,kernel_irqchip=split,confidential-guest-support=sp-vm0,memory-backend=mem0 \
> 	...
> 
> Unfortunately this patch series fails the boot of OVMF at very early
> stage due to triple fault, because KVM doesn't support emulating string IO
> to private memory.

Is support being added? Or have we figured out what it would take to 
make it work?

How does this interact with other features (memory ballooning, virtiofs, 
vfio/mdev/...)?

> 
> This version still leave some opens to be discussed:
> 1. whether we need "private" propery to be user-settable?
> 
>     It seems unnecessary because vm-type is determined. If the VM is
>     confidential-guest, then the RAM of the guest must be able to be
>     mapped as private, i.e., have kvm gmem backend. So QEMU can
>     determine the value of "private" property automatiacally based on vm
>     type.
> 
>     This also aligns with the board internal MemoryRegion that needs to
>     have kvm gmem backend, e.g., TDX requires OVMF to act as private
>     memory so bios memory region needs to have kvm gmem fd associated.
>     QEMU no doubt will do it internally automatically.

Would it make sense to have some regions without "pivate" semantics? 
Like NVDIMMs?

> 
> 2. hugepage support.
> 
>     KVM gmem can be allocated from hugetlbfs. How does QEMU determine
>     when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. The
>     easiest solution is create KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE
>     only when memory backend is HostMemoryBackendFile of hugetlbfs.

Good question.

Probably "if the memory backend uses huge pages, also use huge pages for 
the private gmem" makes sense.

... but it becomes a mess with preallocation ... which is what people 
should actually be using with hugetlb. And eventual double 
memory consumption ... but maybe that's all been taken care of already?

Probably it's best to leave hugetlb support as future work and start 
with something minimal.

> 
> 3. What is KVM_X86_SW_PROTECTED_VM going to look like? and do we need it?
> 

Why implement it when you have to ask others for a motivation? ;)

Personally, I'm not sure if it is really useful, especially in this state.

>     This series implements KVM_X86_SW_PROTECTED_VM because it's introduced
>     with gmem together on KVM side and it's supposed to be the first user
>     who requires KVM gmem. However the implementation is incomplete and
>     there lacks the definition of how KVM_X86_SW_PROTECTED_VM works.

Then it should not be included in this series such that you can make 
progress with the gmem implementation for TDX guests instead?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 00/21] QEMU gmem implemention
  2023-09-14 13:09 ` [RFC PATCH v2 00/21] QEMU gmem implemention David Hildenbrand
@ 2023-09-15  1:10   ` Sean Christopherson
  2023-09-21  9:11     ` David Hildenbrand
  2023-09-15  3:37   ` Xiaoyao Li
  1 sibling, 1 reply; 52+ messages in thread
From: Sean Christopherson @ 2023-09-15  1:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Xiaoyao Li, Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti, qemu-devel, kvm,
	Michael Roth, isaku.yamahata, Claudio Fontana

On Thu, Sep 14, 2023, David Hildenbrand wrote:
> On 14.09.23 05:50, Xiaoyao Li wrote:
> > It's the v2 RFC of enabling KVM gmem[1] as the backend for private
> > memory.
> > 
> > For confidential-computing, KVM provides gmem/guest_mem interfaces for
> > userspace, like QEMU, to allocate user-unaccesible private memory. This
> > series aims to add gmem support in QEMU's RAMBlock so that each RAM can
> > have both hva-based shared memory and gmem_fd based private memory. QEMU
> > does the shared-private conversion on KVM_MEMORY_EXIT and discards the
> > memory.
> > 
> > It chooses the design that adds "private" property to hostmeory backend.
> > If "private" property is set, QEMU will allocate/create KVM gmem when
> > initialize the RAMbloch of the memory backend.
> > 
> > This sereis also introduces the first user of kvm gmem,
> > KVM_X86_SW_PROTECTED_VM. A KVM_X86_SW_PROTECTED_VM with private KVM gmem
> > can be created with
> > 
> >    $qemu -object sw-protected-vm,id=sp-vm0 \
> > 	-object memory-backend-ram,id=mem0,size=1G,private=on \
> > 	-machine q35,kernel_irqchip=split,confidential-guest-support=sp-vm0,memory-backend=mem0 \
> > 	...
> > 
> > Unfortunately this patch series fails the boot of OVMF at very early
> > stage due to triple fault, because KVM doesn't support emulating string IO
> > to private memory.
> 
> Is support being added? Or have we figured out what it would take to make it
> work?

Hrm, this isn't something I've thought deeply about.  The issue is that anything
that reaches any form of copy_{from,to}_user() will go kablooie because KVM will
always try to read/write the shared mappings.  The best case scenario is that the
shared mapping is invalid and the uaccess faults.  The worst case scenario is
that KVM read/writes the wrong memory and sends the guest into the weeds.  Eww.

And we (well, at least I) definitely want to support this so that gmem can be
used for "regular" VMs, i.e. for VMs where userspace is in the TCB, but for which
userspace doesn't have access to guest memory by default.

It shouldn't be too hard to support.  It's easy enough to wire up the hook
(thankfully there aren't _that_ many sites), and gmem only supports struct page at
the moment so we go straight to kmap.  E.g. something like this

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 54480655bcce..b500b0ce5ce3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3291,12 +3291,15 @@ static int next_segment(unsigned long len, int offset)
                return len;
 }
 
-static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
-                                void *data, int offset, int len)
+static int __kvm_read_guest_page(struct kvm *kvm, struct kvm_memory_slot *slot,
+                                gfn_t gfn, void *data, int offset, int len)
 {
        int r;
        unsigned long addr;
 
+       if (kvm_mem_is_private(kvm, gfn))
+               return kvm_gmem_read(slot, gfn, data, offset, len);
+
        addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
        if (kvm_is_error_hva(addr))
                return -EFAULT;
@@ -3309,9 +3312,8 @@ static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
 int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
                        int len)
 {
-       struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
-
-       return __kvm_read_guest_page(slot, gfn, data, offset, len);
+       return __kvm_read_guest_page(kvm, gfn_to_memslot(kvm, gfn), gfn, data,
+                                    offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_page);
 
@@ -3320,7 +3322,7 @@ int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data,
 {
        struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 
-       return __kvm_read_guest_page(slot, gfn, data, offset, len);
+       return __kvm_read_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page);
 
> > 2. hugepage support.
> > 
> >     KVM gmem can be allocated from hugetlbfs. How does QEMU determine

Not yet it can't.  gmem only supports THP, hugetlbfs is a future thing, if it's
ever supported.  I wouldn't be at all surprised if we end up going down a slightly
different route and don't use hugetlbfs directly.

> >     when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. The
> >     easiest solution is create KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE
> >     only when memory backend is HostMemoryBackendFile of hugetlbfs.
> 
> Good question.
> 
> Probably "if the memory backend uses huge pages, also use huge pages for the
> private gmem" makes sense.
> 
> ... but it becomes a mess with preallocation ... which is what people should
> actually be using with hugetlb. Andeventual double memory-consumption ...
> but maybe that's all been taken care of already?
> 
> Probably it's best to leave hugetlb support as future work and start with
> something minimal.
> 
> > 
> > 3. What is KVM_X86_SW_PROTECTED_VM going to look like? and do we need it?
> > 
> 
> Why implement it when you have to ask others for a motivation? ;)
> 
> Personally, I'm not sure if it is really useful, especially in this state.

Yeah, as of today, KVM_X86_SW_PROTECTED_VM is mainly a development vehicle,
e.g. so that testing gmem doesn't require TDX/SNP hardware, debugging gmem guests
isn't brutally painful, etc.

Longer term, I have aspirations of being able to back most VMs with gmem, but
that's going to require quite a bit more work, e.g. gmem needs to be mappable
(when hardware allows it) so that gmem doesn't all but require double mapping,
KVM obviously needs to be able to read/write gmem, etc.

The value proposition is that having a guest-first memory type will allow KVM to
optimize and harden gmem in ways that wouldn't be feasible for a more generic
memory implementation.  E.g. memory isn't mapped into host userspace by default
(makes it harder to accidentally corrupt the guest), the guest can have *larger*
mappings than host userspace, guest memory can be served from a dedicated pool
(similar-ish to hugetlb), the pool can be omitted from the direct map, etc.

> >     This series implements KVM_X86_SW_PROTECTED_VM because it's introduced
> >     with gmem together on KVM side and it's supposed to be the first user
> >     who requires KVM gmem. However the implementation is incomplete and
> >     there lacks the definition of how KVM_X86_SW_PROTECTED_VM works.
> 
> Then it should not be included in this series such that you can make
> progress with the gmem implementation for TDX guests instead?


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem
  2023-09-14  3:50 ` [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem Xiaoyao Li
@ 2023-09-15  2:04   ` Wang, Lei
  2023-09-15  3:45     ` Xiaoyao Li
  2023-09-21  8:55   ` David Hildenbrand
  2023-10-06 11:07   ` Daniel P. Berrangé
  2 siblings, 1 reply; 52+ messages in thread
From: Wang, Lei @ 2023-09-15  2:04 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 9/14/2023 11:50, Xiaoyao Li wrote:
> From: Chao Peng <chao.p.peng@linux.intel.com>
> 
> Add KVM gmem support to RAMBlock so both normal hva based memory
> and kvm gmem fd based private memory can be associated in one RAMBlock.
> 
> Introduce new flag RAM_KVM_GMEM. It calls KVM ioctl to create private
> gmem for the RAMBlock when it's set.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Kind reminder: the author's Signed-off-by is missing.

> ---
>  accel/kvm/kvm-all.c     | 17 +++++++++++++++++
>  include/exec/memory.h   |  3 +++
>  include/exec/ramblock.h |  1 +
>  include/sysemu/kvm.h    |  2 ++
>  softmmu/physmem.c       | 18 +++++++++++++++---
>  5 files changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 60aacd925393..185ae16d9620 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -4225,3 +4225,20 @@ void query_stats_schemas_cb(StatsSchemaList **result, Error **errp)
>          query_stats_schema_vcpu(first_cpu, &stats_args);
>      }
>  }
> +
> +int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
> +{
> +    int fd;
> +    struct kvm_create_guest_memfd gmem = {
> +        .size = size,
> +        /* TODO: to decide whether KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is supported */
> +        .flags = flags,
> +    };
> +
> +    fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_GUEST_MEMFD, &gmem);
> +    if (fd < 0) {
> +        error_setg_errno(errp, errno, "%s: error creating kvm gmem\n", __func__);
> +    }
> +
> +    return fd;
> +}
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 68284428f87c..227cb2578e95 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -235,6 +235,9 @@ typedef struct IOMMUTLBEvent {
>  /* RAM is an mmap-ed named file */
>  #define RAM_NAMED_FILE (1 << 9)
>  
> +/* RAM can be private that has kvm gmem backend */
> +#define RAM_KVM_GMEM    (1 << 10)
> +
>  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
>                                         IOMMUNotifierFlag flags,
>                                         hwaddr start, hwaddr end,
> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> index 69c6a5390293..0d158b3909c9 100644
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -41,6 +41,7 @@ struct RAMBlock {
>      QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
>      int fd;
>      uint64_t fd_offset;
> +    int gmem_fd;
>      size_t page_size;
>      /* dirty bitmap used during migration */
>      unsigned long *bmap;
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 115f0cca79d1..f5b74c8dd8c5 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -580,4 +580,6 @@ bool kvm_arch_cpu_check_are_resettable(void);
>  bool kvm_dirty_ring_enabled(void);
>  
>  uint32_t kvm_dirty_ring_size(void);
> +
> +int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp);
>  #endif
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 3df73542e1fe..2d98a88f41f0 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -1824,6 +1824,16 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>          }
>      }
>  
> +    if (kvm_enabled() && new_block->flags & RAM_KVM_GMEM &&
> +        new_block->gmem_fd < 0) {
> +        new_block->gmem_fd = kvm_create_guest_memfd(new_block->max_length,
> +                                                    0, errp);
> +        if (new_block->gmem_fd < 0) {
> +            qemu_mutex_unlock_ramlist();
> +            return;
> +        }
> +    }
> +
>      new_ram_size = MAX(old_ram_size,
>                (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS);
>      if (new_ram_size > old_ram_size) {
> @@ -1885,7 +1895,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>  
>      /* Just support these ram flags by now. */
>      assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE |
> -                          RAM_PROTECTED | RAM_NAMED_FILE)) == 0);
> +                          RAM_PROTECTED | RAM_NAMED_FILE | RAM_KVM_GMEM)) == 0);
>  
>      if (xen_enabled()) {
>          error_setg(errp, "-mem-path not supported with Xen");
> @@ -1920,6 +1930,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
>      new_block->used_length = size;
>      new_block->max_length = size;
>      new_block->flags = ram_flags;
> +    new_block->gmem_fd = -1;
>      new_block->host = file_ram_alloc(new_block, size, fd, readonly,
>                                       !file_size, offset, errp);
>      if (!new_block->host) {
> @@ -1978,7 +1989,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
>      Error *local_err = NULL;
>  
>      assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
> -                          RAM_NORESERVE)) == 0);
> +                          RAM_NORESERVE| RAM_KVM_GMEM)) == 0);
>      assert(!host ^ (ram_flags & RAM_PREALLOC));
>  
>      size = HOST_PAGE_ALIGN(size);
> @@ -1990,6 +2001,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
>      new_block->max_length = max_size;
>      assert(max_size >= size);
>      new_block->fd = -1;
> +    new_block->gmem_fd = -1;
>      new_block->page_size = qemu_real_host_page_size();
>      new_block->host = host;
>      new_block->flags = ram_flags;
> @@ -2012,7 +2024,7 @@ RAMBlock *qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>  RAMBlock *qemu_ram_alloc(ram_addr_t size, uint32_t ram_flags,
>                           MemoryRegion *mr, Error **errp)
>  {
> -    assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE)) == 0);
> +    assert((ram_flags & ~(RAM_SHARED | RAM_NORESERVE | RAM_KVM_GMEM)) == 0);
>      return qemu_ram_alloc_internal(size, size, NULL, NULL, ram_flags, mr, errp);
>  }
>  



* Re: [RFC PATCH v2 00/21] QEMU gmem implemention
  2023-09-14 13:09 ` [RFC PATCH v2 00/21] QEMU gmem implemention David Hildenbrand
  2023-09-15  1:10   ` Sean Christopherson
@ 2023-09-15  3:37   ` Xiaoyao Li
  2023-09-21  8:59     ` David Hildenbrand
  1 sibling, 1 reply; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-15  3:37 UTC (permalink / raw)
  To: David Hildenbrand, Paolo Bonzini, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 9/14/2023 9:09 PM, David Hildenbrand wrote:
> On 14.09.23 05:50, Xiaoyao Li wrote:
>> It's the v2 RFC of enabling KVM gmem[1] as the backend for private
>> memory.
>>
>> For confidential-computing, KVM provides gmem/guest_mem interfaces for
>> userspace, like QEMU, to allocate user-inaccessible private memory. This
>> series aims to add gmem support in QEMU's RAMBlock so that each RAM can
>> have both hva-based shared memory and gmem_fd based private memory. QEMU
>> does the shared-private conversion on KVM_MEMORY_EXIT and discards the
>> memory.
>>
>> It chooses the design that adds a "private" property to the host memory backend.
>> If the "private" property is set, QEMU will allocate/create KVM gmem when
>> initializing the RAMBlock of the memory backend.
>>
>> This series also introduces the first user of kvm gmem,
>> KVM_X86_SW_PROTECTED_VM. A KVM_X86_SW_PROTECTED_VM with private KVM gmem
>> can be created with
>>
>>    $qemu -object sw-protected-vm,id=sp-vm0 \
>>     -object memory-backend-ram,id=mem0,size=1G,private=on \
>>     -machine q35,kernel_irqchip=split,confidential-guest-support=sp-vm0,memory-backend=mem0 \
>>     ...
>>
>> Unfortunately this patch series fails the boot of OVMF at a very early
>> stage due to a triple fault, because KVM doesn't support emulating
>> string IO to private memory.
> 
> Is support being added? Or have we figured out what it would take to 
> make it work?

Hi David,

I'll only reply to the questions that weren't covered by Sean's reply.

> How does this interact with other features (memory ballooning, virtiofs, 
> vfio/mdev/...)?

I need time to learn them before I can answer it.

>>
>> This version still leaves some open questions to be discussed:
>> 1. whether we need the "private" property to be user-settable?
>>
>>     It seems unnecessary because vm-type is determined. If the VM is
>>     confidential-guest, then the RAM of the guest must be able to be
>>     mapped as private, i.e., have kvm gmem backend. So QEMU can
>>     determine the value of "private" property automatically based on vm
>>     type.
>>
>>     This also aligns with the board internal MemoryRegion that needs to
>>     have kvm gmem backend, e.g., TDX requires OVMF to act as private
>>     memory so bios memory region needs to have kvm gmem fd associated.
>>     QEMU no doubt will do it internally automatically.
> 
> Would it make sense to have some regions without "private" semantics? 
> Like NVDIMMs?

Of course it can have regions without "private" semantics.

Whether a region needs a "private" backend depends on the definition of the
VM type. E.g., for TDX:
  - all the RAM needs to be able to be mapped as private, so it needs
private gmem.
  - TDVF (OVMF) code must be mapped as private, so it needs private gmem.
  - the MMIO region needs to be shared for TDX 1.0, so it doesn't need
private gmem.

>>
>> 2. hugepage support.
>>
>>     KVM gmem can be allocated from hugetlbfs. How does QEMU determine
>>     when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. The
>>     easiest solution is to create KVM gmem with 
>> KVM_GUEST_MEMFD_ALLOW_HUGEPAGE
>>     only when the memory backend is a HostMemoryBackendFile on hugetlbfs.
> 
> Good question.
> 
> Probably "if the memory backend uses huge pages, also use huge pages for 
> the private gmem" makes sense.
> 
> ... but it becomes a mess with preallocation ... which is what people 
> should actually be using with hugetlb. And eventual double 
> memory consumption ... but maybe that's all been taken care of already?
> 
> Probably it's best to leave hugetlb support as future work and start 
> with something minimal.
> 

As Sean replied, I had some misunderstanding of 
KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. If it's for THP, I think we can allow it 
for every gmem.

As for hugetlb, we can leave it as future work.
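
If KVM_GUEST_MEMFD_ALLOW_HUGEPAGE really only opts a gmem in to THP, a
sketch of that on top of the kvm_create_guest_memfd() helper from patch 02
(the flag comes from the kernel series; illustration only):

  /*
   * Sketch: request THP for every gmem and quietly fall back on kernels
   * that reject the flag, mirroring how QEMU applies MADV_HUGEPAGE to
   * regular guest RAM.
   */
  static int kvm_create_guest_memfd_try_thp(uint64_t size, Error **errp)
  {
      int fd = kvm_create_guest_memfd(size, KVM_GUEST_MEMFD_ALLOW_HUGEPAGE,
                                      NULL);
      if (fd < 0) {
          fd = kvm_create_guest_memfd(size, 0, errp);
      }
      return fd;
  }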




* Re: [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem
  2023-09-15  2:04   ` Wang, Lei
@ 2023-09-15  3:45     ` Xiaoyao Li
  0 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-15  3:45 UTC (permalink / raw)
  To: Wang, Lei, Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 9/15/2023 10:04 AM, Wang, Lei wrote:
> On 9/14/2023 11:50, Xiaoyao Li wrote:
>> From: Chao Peng<chao.p.peng@linux.intel.com>
>>
>> Add KVM gmem support to RAMBlock so both normal hva based memory
>> and kvm gmem fd based private memory can be associated in one RAMBlock.
>>
>> Introduce new flag RAM_KVM_GMEM. It calls KVM ioctl to create private
>> gmem for the RAMBlock when it's set.
>>
>> Signed-off-by: Xiaoyao Li<xiaoyao.li@intel.com>
> A kind reminder that the author's Signed-off-by is missing.

I will fix it.
Thanks!





* Re: [RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM
  2023-09-14  3:50 ` [RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM Xiaoyao Li
@ 2023-09-19  9:46   ` Markus Armbruster
  2023-09-19 23:24     ` Xiaoyao Li
  0 siblings, 1 reply; 52+ messages in thread
From: Markus Armbruster @ 2023-09-19  9:46 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Marcelo Tosatti, qemu-devel, kvm, Michael Roth,
	isaku.yamahata, Sean Christopherson, Claudio Fontana

Xiaoyao Li <xiaoyao.li@intel.com> writes:

> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Add a new property "private" to memory backends. When it's set to true,
> it indicates the RAMblock of the backend also requires kvm gmem.

Can you add a brief explanation why you need the property?

> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

[...]

> diff --git a/qapi/qom.json b/qapi/qom.json
> index fa3e88c8e6ab..d28c5403bc0f 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -605,6 +605,9 @@
>  # @reserve: if true, reserve swap space (or huge pages) if applicable
>  #     (default: true) (since 6.1)
>  #
> +# @private: if true, use KVM gmem private memory (default: false)
> +#     (since 8.2)
> +#
>  # @size: size of the memory region in bytes
>  #
>  # @x-use-canonical-path-for-ramblock-id: if true, the canonical path
> @@ -631,6 +634,7 @@
>              '*prealloc-context': 'str',
>              '*share': 'bool',
>              '*reserve': 'bool',
> +            '*private': 'bool',
>              'size': 'size',
>              '*x-use-canonical-path-for-ramblock-id': 'bool' } }




* Re: [RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM
  2023-09-19  9:46   ` Markus Armbruster
@ 2023-09-19 23:24     ` Xiaoyao Li
  2023-09-20  7:30       ` Markus Armbruster
  0 siblings, 1 reply; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-19 23:24 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Marcelo Tosatti, qemu-devel, kvm, Michael Roth,
	isaku.yamahata, Sean Christopherson, Claudio Fontana

On 9/19/2023 5:46 PM, Markus Armbruster wrote:
> Xiaoyao Li <xiaoyao.li@intel.com> writes:
> 
>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>
>> Add a new property "private" to memory backends. When it's set to true,
>> it indicates the RAMblock of the backend also requires kvm gmem.
> 
> Can you add a brief explanation why you need the property?

It provides a mechanism for the user to specify whether the memory can serve 
as private memory (i.e., needs to request kvm gmem).

>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> [...]
> 
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index fa3e88c8e6ab..d28c5403bc0f 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -605,6 +605,9 @@
>>   # @reserve: if true, reserve swap space (or huge pages) if applicable
>>   #     (default: true) (since 6.1)
>>   #
>> +# @private: if true, use KVM gmem private memory (default: false)
>> +#     (since 8.2)
>> +#
>>   # @size: size of the memory region in bytes
>>   #
>>   # @x-use-canonical-path-for-ramblock-id: if true, the canonical path
>> @@ -631,6 +634,7 @@
>>               '*prealloc-context': 'str',
>>               '*share': 'bool',
>>               '*reserve': 'bool',
>> +            '*private': 'bool',
>>               'size': 'size',
>>               '*x-use-canonical-path-for-ramblock-id': 'bool' } }
> 




* Re: [RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM
  2023-09-19 23:24     ` Xiaoyao Li
@ 2023-09-20  7:30       ` Markus Armbruster
  2023-09-20 14:35         ` Xiaoyao Li
  0 siblings, 1 reply; 52+ messages in thread
From: Markus Armbruster @ 2023-09-20  7:30 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Marcelo Tosatti, qemu-devel, kvm, Michael Roth,
	isaku.yamahata, Sean Christopherson, Claudio Fontana

Xiaoyao Li <xiaoyao.li@intel.com> writes:

> On 9/19/2023 5:46 PM, Markus Armbruster wrote:
>> Xiaoyao Li <xiaoyao.li@intel.com> writes:
>> 
>>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>>
>>> Add a new property "private" to memory backends. When it's set to true,
>>> it indicates the RAMblock of the backend also requires kvm gmem.
>> Can you add a brief explanation why you need the property?
>
> It provides a mechanism for the user to specify whether the memory can serve as private memory (i.e., needs to request kvm gmem).

Yes, but why would a user want such memory?




* Re: [RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM
  2023-09-20  7:30       ` Markus Armbruster
@ 2023-09-20 14:35         ` Xiaoyao Li
  2023-09-20 14:37           ` David Hildenbrand
  0 siblings, 1 reply; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-20 14:35 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Marcelo Tosatti, qemu-devel, kvm, Michael Roth,
	isaku.yamahata, Sean Christopherson, Claudio Fontana

On 9/20/2023 3:30 PM, Markus Armbruster wrote:
> Xiaoyao Li <xiaoyao.li@intel.com> writes:
> 
>> On 9/19/2023 5:46 PM, Markus Armbruster wrote:
>>> Xiaoyao Li <xiaoyao.li@intel.com> writes:
>>>
>>>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>>>
>>>> Add a new property "private" to memory backends. When it's set to true,
>>>> it indicates the RAMblock of the backend also requires kvm gmem.
>>> Can you add a brief explanation why you need the property?
>>
>> It provides a mechanism for the user to specify whether the memory can serve as private memory (i.e., needs to request kvm gmem).
> 
> Yes, but why would a user want such memory?
> 

Because KVM demands it for confidential guests, e.g., TDX guests. KVM 
demands that the memslot have KVM_MEM_PRIVATE set and a valid gmem 
associated if the guest accesses it as private memory.
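
For reference, this is roughly what such a slot looks like with the uAPI
from the kernel series (a sketch: the struct layout follows the series
headers, which are not mainline yet, and size, hva, gmem_fd and vm_fd are
placeholders):

  struct kvm_userspace_memory_region2 region = {
      .slot               = 0,
      .flags              = KVM_MEM_PRIVATE,
      .guest_phys_addr    = 0,
      .memory_size        = size,
      .userspace_addr     = (uint64_t)(uintptr_t)hva, /* shared part */
      .guest_memfd_offset = 0,
      .guest_memfd        = gmem_fd,                  /* private part */
  };
  ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);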



* Re: [RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM
  2023-09-20 14:35         ` Xiaoyao Li
@ 2023-09-20 14:37           ` David Hildenbrand
  2023-09-20 15:42             ` Markus Armbruster
  0 siblings, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2023-09-20 14:37 UTC (permalink / raw)
  To: Xiaoyao Li, Markus Armbruster
  Cc: Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Marcelo Tosatti, qemu-devel, kvm, Michael Roth,
	isaku.yamahata, Sean Christopherson, Claudio Fontana

On 20.09.23 16:35, Xiaoyao Li wrote:
> On 9/20/2023 3:30 PM, Markus Armbruster wrote:
>> Xiaoyao Li <xiaoyao.li@intel.com> writes:
>>
>>> On 9/19/2023 5:46 PM, Markus Armbruster wrote:
>>>> Xiaoyao Li <xiaoyao.li@intel.com> writes:
>>>>
>>>>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>>>>
>>>>> Add a new property "private" to memory backends. When it's set to true,
>>>>> it indicates the RAMblock of the backend also requires kvm gmem.
>>>> Can you add a brief explanation why you need the property?
>>>
>>> It provides a mechanism for the user to specify whether the memory can serve as private memory (i.e., needs to request kvm gmem).
>>
>> Yes, but why would a user want such memory?
>>
> 
> Because KVM demands it for confidential guests, e.g., TDX guests. KVM
> demands that the memslot have KVM_MEM_PRIVATE set and a valid gmem
> associated if the guest accesses it as private memory.

I think as long as there is no demand to have a TDX guest with this 
property set to "off", then just don't add it.

With a TDX VM, it can be implicitly active. If we ever have to 
disable it for selective memory backends, we can add the property and 
have something like on/off/auto. For now it would be "auto".
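
For illustration, the auto resolution could then look like this (a sketch;
OnOffAuto is the existing QAPI tri-state, vm_type_needs_gmem() is a made-up
helper standing in for a vm-type check):

  static bool backend_use_gmem(OnOffAuto private)
  {
      switch (private) {
      case ON_OFF_AUTO_ON:
          return true;
      case ON_OFF_AUTO_OFF:
          return false;
      default: /* ON_OFF_AUTO_AUTO */
          return vm_type_needs_gmem(); /* e.g. true for a TDX VM */
      }
  }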

-- 
Cheers,

David / dhildenb




* Re: [RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM
  2023-09-20 14:37           ` David Hildenbrand
@ 2023-09-20 15:42             ` Markus Armbruster
  2023-09-21  8:38               ` Xiaoyao Li
  0 siblings, 1 reply; 52+ messages in thread
From: Markus Armbruster @ 2023-09-20 15:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Xiaoyao Li, Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Marcelo Tosatti, qemu-devel, kvm, Michael Roth,
	isaku.yamahata, Sean Christopherson, Claudio Fontana

David Hildenbrand <david@redhat.com> writes:

> On 20.09.23 16:35, Xiaoyao Li wrote:
>> On 9/20/2023 3:30 PM, Markus Armbruster wrote:
>>> Xiaoyao Li <xiaoyao.li@intel.com> writes:
>>>
>>>> On 9/19/2023 5:46 PM, Markus Armbruster wrote:
>>>>> Xiaoyao Li <xiaoyao.li@intel.com> writes:
>>>>>
>>>>>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>>>>>
>>>>>> Add a new property "private" to memory backends. When it's set to true,
>>>>>> it indicates the RAMblock of the backend also requires kvm gmem.
>>>>> Can you add a brief explanation why you need the property?
>>>>
>>>> It provides a mechanism for the user to specify whether the memory can serve as private memory (i.e., needs to request kvm gmem).
>>>
>>> Yes, but why would a user want such memory?
>>>
>> Because KVM demands it for confidential guests, e.g., TDX guests. KVM
>> demands that the memslot have KVM_MEM_PRIVATE set and a valid gmem
>> associated if the guest accesses it as private memory.

Commit messages should explain why we want the patch.  Documenting "why"
is at least as important as "what".  If "what" is missing, I can read
the patch to find out.  If "why" is missing, I'm reduced to guesswork.

> I think as long as there is no demand to have a TDX guest with this property set to "off", then just don't add it.
>
> With a TDX VM, it can be implicitly active. If we ever have to disable it for selective memory backends, we can add the property and have something like on/off/auto. For now it would be "auto".

Makes sense to me.




* Re: [RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM
  2023-09-20 15:42             ` Markus Armbruster
@ 2023-09-21  8:38               ` Xiaoyao Li
  2023-09-21  8:45                 ` David Hildenbrand
  0 siblings, 1 reply; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-21  8:38 UTC (permalink / raw)
  To: Markus Armbruster, David Hildenbrand
  Cc: Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Marcelo Tosatti, qemu-devel, kvm, Michael Roth,
	isaku.yamahata, Sean Christopherson, Claudio Fontana

On 9/20/2023 11:42 PM, Markus Armbruster wrote:
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 20.09.23 16:35, Xiaoyao Li wrote:
>>> On 9/20/2023 3:30 PM, Markus Armbruster wrote:
>>>> Xiaoyao Li <xiaoyao.li@intel.com> writes:
>>>>
>>>>> On 9/19/2023 5:46 PM, Markus Armbruster wrote:
>>>>>> Xiaoyao Li <xiaoyao.li@intel.com> writes:
>>>>>>
>>>>>>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>>>>>>
>>>>>>> Add a new property "private" to memory backends. When it's set to true,
>>>>>>> it indicates the RAMblock of the backend also requires kvm gmem.
>>>>>> Can you add a brief explanation why you need the property?
>>>>>
>>>>> It provides a mechanism for the user to specify whether the memory can serve as private memory (i.e., needs to request kvm gmem).
>>>>
>>>> Yes, but why would a user want such memory?
>>>>
>>> Because KVM demands it for confidential guests, e.g., TDX guests. KVM
>>> demands that the memslot have KVM_MEM_PRIVATE set and a valid gmem
>>> associated if the guest accesses it as private memory.
> 
> Commit messages should explain why we want the patch.  Documenting "why"
> is at least as important as "what".  If "what" is missing, I can read
> the patch to find out.  If "why" is missing, I'm reduced to guesswork.

I'll try my best to improve the commit message of this patch, and of all 
the other patches.

>> I think as long as there is no demand to have a TDX guest with this property set to "off", then just don't add it.
>>
>> With a TDX VM, it can be implicitly active. If we ever have to disable it for selective memory backends, we can add the property and have something like on/off/auto. For now it would be "auto".
> 
> Makes sense to me.

OK. I think I've got the answer to open #1 in the cover letter.

If there are no other voices, I'll drop this patch and allocate gmem RAM 
when the vm_type is TDX.





* Re: [RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM
  2023-09-21  8:38               ` Xiaoyao Li
@ 2023-09-21  8:45                 ` David Hildenbrand
  0 siblings, 0 replies; 52+ messages in thread
From: David Hildenbrand @ 2023-09-21  8:45 UTC (permalink / raw)
  To: Xiaoyao Li, Markus Armbruster
  Cc: Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Marcelo Tosatti, qemu-devel, kvm, Michael Roth,
	isaku.yamahata, Sean Christopherson, Claudio Fontana

>>> I think as long as there is no demand to have a TDX guest with this property set to "off", then just don't add it.
>>>
>>> With a TDX VM, it can be implicitly active. If we ever have to disable it for selective memory backends, we can add the property and have something like on/off/auto. For now it would be "auto".
>>
>> Makes sense to me.
> 
> OK. I think I've got the answer to open #1 in the cover letter.
> 

Yes.

> If there are no other voices, I'll drop this patch and allocate gmem RAM
> when the vm_type is TDX.

Good!

-- 
Cheers,

David / dhildenb




* Re: [RFC PATCH v2 04/21] memory: Introduce memory_region_has_gmem_fd()
  2023-09-14  3:51 ` [RFC PATCH v2 04/21] memory: Introduce memory_region_has_gmem_fd() Xiaoyao Li
@ 2023-09-21  8:46   ` David Hildenbrand
  2023-09-22  0:22     ` Xiaoyao Li
  0 siblings, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2023-09-21  8:46 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 14.09.23 05:51, Xiaoyao Li wrote:
> Introduce memory_region_has_gmem_fd() to query if the MemoryRegion has
> KVM gmem fd allocated.

*probably* best to just squash that into patch #2.

-- 
Cheers,

David / dhildenb




* Re: [RFC PATCH v2 07/21] i386/pc: Drop pc_machine_kvm_type()
  2023-09-14  3:51 ` [RFC PATCH v2 07/21] i386/pc: Drop pc_machine_kvm_type() Xiaoyao Li
@ 2023-09-21  8:51   ` David Hildenbrand
  2023-09-22  0:24     ` Xiaoyao Li
  2023-09-23  7:32   ` David Woodhouse
  1 sibling, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2023-09-21  8:51 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 14.09.23 05:51, Xiaoyao Li wrote:
> pc_machine_kvm_type() was introduced by commit e21be724eaf5 ("i386/xen:
> add pc_machine_kvm_type to initialize XEN_EMULATE mode") to do Xen
> specific initialization by utilizing kvm_type method.
> 
> commit eeedfe6c6316 ("hw/xen: Simplify emulated Xen platform init")
> moves the Xen specific initialization to pc_basic_device_init().
> 
> There is no need to keep the PC specific kvm_type() implementation
> anymore.

So we'll fall back to kvm_arch_get_default_type(), which simply returns 0.

> On the other hand, later patch will implement kvm_type()
> method for all x86/i386 machines to support KVM_X86_SW_PROTECTED_VM.
> 

^ I suggest dropping that and merging that patch ahead-of-time as a 
simple cleanup.

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb




* Re: [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem
  2023-09-14  3:50 ` [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem Xiaoyao Li
  2023-09-15  2:04   ` Wang, Lei
@ 2023-09-21  8:55   ` David Hildenbrand
  2023-09-22  0:22     ` Xiaoyao Li
  2023-10-06 11:07   ` Daniel P. Berrangé
  2 siblings, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2023-09-21  8:55 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 14.09.23 05:50, Xiaoyao Li wrote:
> From: Chao Peng <chao.p.peng@linux.intel.com>
> 
> Add KVM gmem support to RAMBlock so both normal hva based memory
> and kvm gmem fd based private memory can be associated in one RAMBlock.
> 
> Introduce new flag RAM_KVM_GMEM. It calls KVM ioctl to create private
> gmem for the RAMBlock when it's set.


But who sets RAM_KVM_GMEM and when? Don't we simply allocate it for all 
RAMBlocks under such special VMs? What's the downside of doing that?


-- 
Cheers,

David / dhildenb




* Re: [RFC PATCH v2 05/21] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
  2023-09-14  3:51 ` [RFC PATCH v2 05/21] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot Xiaoyao Li
@ 2023-09-21  8:56   ` David Hildenbrand
  2023-09-22  0:23     ` Xiaoyao Li
  0 siblings, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2023-09-21  8:56 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 14.09.23 05:51, Xiaoyao Li wrote:
> From: Chao Peng <chao.p.peng@linux.intel.com>
> 
> Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM.
> 
> With KVM_SET_USER_MEMORY_REGION2, QEMU can set up memory region that
> backend'ed both by hva-based shared memory and gmem fd based private
> memory.
> 
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> Codeveloped-by: Xiaoyao Li <xiaoyao.li@intel.com>

"Co-developed-by".

-- 
Cheers,

David / dhildenb




* Re: [RFC PATCH v2 00/21] QEMU gmem implemention
  2023-09-15  3:37   ` Xiaoyao Li
@ 2023-09-21  8:59     ` David Hildenbrand
  0 siblings, 0 replies; 52+ messages in thread
From: David Hildenbrand @ 2023-09-21  8:59 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

>>>
>>> This version still leaves some open questions to be discussed:
>>> 1. whether we need the "private" property to be user-settable?
>>>
>>>      It seems unnecessary because vm-type is determined. If the VM is
>>>      confidential-guest, then the RAM of the guest must be able to be
>>>      mapped as private, i.e., have kvm gmem backend. So QEMU can
>>>      determine the value of "private" property automatically based on vm
>>>      type.
>>>
>>>      This also aligns with the board internal MemoryRegion that needs to
>>>      have kvm gmem backend, e.g., TDX requires OVMF to act as private
>>>      memory so bios memory region needs to have kvm gmem fd associated.
>>>      QEMU no doubt will do it internally automatically.
>>
>> Would it make sense to have some regions without "private" semantics?
>> Like NVDIMMs?
> 
> Of course it can have regions without "private" semantics.

I meant "RAM memory regions on such a special VM". Does it make sense to 
have !private there? I assume "for now, not".

>>>
>>> 2. hugepage support.
>>>
>>>      KVM gmem can be allocated from hugetlbfs. How does QEMU determine
>>>      when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. The
>>>      easiest solution is to create KVM gmem with
>>> KVM_GUEST_MEMFD_ALLOW_HUGEPAGE
>>>      only when the memory backend is a HostMemoryBackendFile on hugetlbfs.
>>
>> Good question.
>>
>> Probably "if the memory backend uses huge pages, also use huge pages for
>> the private gmem" makes sense.
>>
>> ... but it becomes a mess with preallocation ... which is what people
>> should actually be using with hugetlb. And eventual double
>> memory consumption ... but maybe that's all been taken care of already?
>>
>> Probably it's best to leave hugetlb support as future work and start
>> with something minimal.
>>
> 
> As Sean replied, I had some misunderstanding of
> KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. If it's for THP, I think we can allow it
> for every gmem.

Right, just like we do a MADV_HUGEPAGE rather blindly on all memory.
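
For comparison, roughly what ram_block_add() already does for regular guest
RAM today (when the block has a host mapping):

  qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE);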

-- 
Cheers,

David / dhildenb




* Re: [RFC PATCH v2 00/21] QEMU gmem implemention
  2023-09-15  1:10   ` Sean Christopherson
@ 2023-09-21  9:11     ` David Hildenbrand
  2023-09-22  7:03       ` Xiaoyao Li
  0 siblings, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2023-09-21  9:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Xiaoyao Li, Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti, qemu-devel, kvm,
	Michael Roth, isaku.yamahata, Claudio Fontana

>>> 2. hugepage support.
>>>
>>>      KVM gmem can be allocated from hugetlbfs. How does QEMU determine
> 
> Not yet it can't.  gmem only supports THP, hugetlbfs is a future thing, if it's
> ever supported.  I wouldn't be at all surprised if we end up going down a slightly
> different route and don't use hugetlbfs directly.

Agreed. Certainly future work.

> 
>>>      when to allocate KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE. The
>>>      easiest solution is to create KVM gmem with KVM_GUEST_MEMFD_ALLOW_HUGEPAGE
>>>      only when the memory backend is a HostMemoryBackendFile on hugetlbfs.
>>
>> Good question.
>>
>> Probably "if the memory backend uses huge pages, also use huge pages for the
>> private gmem" makes sense.
>>
>> ... but it becomes a mess with preallocation ... which is what people should
>> actually be using with hugetlb. And eventual double memory consumption ...
>> but maybe that's all been taken care of already?
>>
>> Probably it's best to leave hugetlb support as future work and start with
>> something minimal.
>>
>>>
>>> 3. What is KVM_X86_SW_PROTECTED_VM going to look like? and do we need it?
>>>
>>
>> Why implement it when you have to ask others for a motivation? ;)
>>
>> Personally, I'm not sure if it is really useful, especially in this state.
> 
> Yeah, as of today, KVM_X86_SW_PROTECTED_VM is mainly a development vehicle,
> e.g. so that testing gmem doesn't require TDX/SNP hardware, debugging gmem guests
> isn't brutally painful, etc.
> 
> Longer term, I have aspirations of being able to back most VMs with gmem, but
> that's going to require quite a bit more work, e.g. gmem needs to be mappable
> (when hardware allows it) so that gmem doesn't all but require double mapping,
> KVM obviously needs to be able to read/write gmem, etc.
> 
> The value proposition is that having a guest-first memory type will allow KVM to
> optimize and harden gmem in ways that wouldn't be feasible for a more generic
> memory implementation.  E.g. memory isn't mapped into host userspace by default
> (makes it harder to accidentally corrupt the guest), the guest can have *larger*
> mappings than host userspace, guest memory can be served from a dedicated pool
> (similar-ish to hugetlb), the pool can be omitted from the direct map, etc.
>
Thanks for that information. Personally, I don't believe "to back most 
VMs with gmem", but that's a different discussion.

As a development vehicle to get TDX up and running it might be very 
valuable indeed. But it doesn't necessarily have to be merged in QEMU 
for that case -- especially in a semi-finished form.

-- 
Cheers,

David / dhildenb




* Re: [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem
  2023-09-21  8:55   ` David Hildenbrand
@ 2023-09-22  0:22     ` Xiaoyao Li
  2023-09-22  7:08       ` David Hildenbrand
  0 siblings, 1 reply; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-22  0:22 UTC (permalink / raw)
  To: David Hildenbrand, Paolo Bonzini, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 9/21/2023 4:55 PM, David Hildenbrand wrote:
> On 14.09.23 05:50, Xiaoyao Li wrote:
>> From: Chao Peng <chao.p.peng@linux.intel.com>
>>
>> Add KVM gmem support to RAMBlock so both normal hva based memory
>> and kvm gmem fd based private memory can be associated in one RAMBlock.
>>
>> Introduce new flag RAM_KVM_GMEM. It calls KVM ioctl to create private
>> gmem for the RAMBlock when it's set.
> 
> 
> But who sets RAM_KVM_GMEM and when? 

The answer is in the next patch. When the `private` property of the memory 
backend is set to true, it passes the RAM_KVM_GMEM flag to 
memory_region_init_ram_*().

> Don't we simply allocate it for all 
> RAMBlocks under such special VMs? 

Yes, this is the direction after your comments.

I'll try to figure out how to achieve it.

> What's the downside of doing that?

As far as I can see, for TDX, there is no downside.
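
A sketch of that flag-less direction in ram_block_add(), keyed off the VM
type instead of a user-set RAM_KVM_GMEM flag (machine_requires_gmem() is a
made-up predicate standing in for a vm-type check; illustration only):

  if (kvm_enabled() && machine_requires_gmem() && new_block->gmem_fd < 0) {
      new_block->gmem_fd = kvm_create_guest_memfd(new_block->max_length,
                                                  0, errp);
      if (new_block->gmem_fd < 0) {
          qemu_mutex_unlock_ramlist();
          return;
      }
  }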




* Re: [RFC PATCH v2 04/21] memory: Introduce memory_region_has_gmem_fd()
  2023-09-21  8:46   ` David Hildenbrand
@ 2023-09-22  0:22     ` Xiaoyao Li
  0 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-22  0:22 UTC (permalink / raw)
  To: David Hildenbrand, Paolo Bonzini, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 9/21/2023 4:46 PM, David Hildenbrand wrote:
> On 14.09.23 05:51, Xiaoyao Li wrote:
>> Introduce memory_region_has_gmem_fd() to query if the MemoryRegion has
>> KVM gmem fd allocated.
> 
> *probably* best to just squash that into patch #2.

Sure, I will do it.



* Re: [RFC PATCH v2 05/21] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
  2023-09-21  8:56   ` David Hildenbrand
@ 2023-09-22  0:23     ` Xiaoyao Li
  0 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-22  0:23 UTC (permalink / raw)
  To: David Hildenbrand, Paolo Bonzini, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 9/21/2023 4:56 PM, David Hildenbrand wrote:
> On 14.09.23 05:51, Xiaoyao Li wrote:
>> From: Chao Peng <chao.p.peng@linux.intel.com>
>>
>> Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM.
>>
>> With KVM_SET_USER_MEMORY_REGION2, QEMU can set up memory region that
>> backend'ed both by hva-based shared memory and gmem fd based private
>> memory.
>>
>> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
>> Codeveloped-by: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> "Co-developed-by".

I will fix it, thanks!




* Re: [RFC PATCH v2 07/21] i386/pc: Drop pc_machine_kvm_type()
  2023-09-21  8:51   ` David Hildenbrand
@ 2023-09-22  0:24     ` Xiaoyao Li
  2023-09-22  7:11       ` David Hildenbrand
  0 siblings, 1 reply; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-22  0:24 UTC (permalink / raw)
  To: David Hildenbrand, Paolo Bonzini, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 9/21/2023 4:51 PM, David Hildenbrand wrote:
> On 14.09.23 05:51, Xiaoyao Li wrote:
>> pc_machine_kvm_type() was introduced by commit e21be724eaf5 ("i386/xen:
>> add pc_machine_kvm_type to initialize XEN_EMULATE mode") to do Xen
>> specific initialization by utilizing kvm_type method.
>>
>> commit eeedfe6c6316 ("hw/xen: Simplify emulated Xen platform init")
>> moves the Xen specific initialization to pc_basic_device_init().
>>
>> There is no need to keep the PC specific kvm_type() implementation
>> anymore.
> 
> So we'll fall back to kvm_arch_get_default_type(), which simply returns 0.
> 
>> On the other hand, later patch will implement kvm_type()
>> method for all x86/i386 machines to support KVM_X86_SW_PROTECTED_VM.
>>
> 
> ^ I suggest dropping that and merging that patch ahead-of-time as a 
> simple cleanup.

I suppose the "that" here means "this patch", right?

If so, I can submit this patch separately.

> Reviewed-by: David Hildenbrand <david@redhat.com>
> 




* Re: [RFC PATCH v2 00/21] QEMU gmem implemention
  2023-09-21  9:11     ` David Hildenbrand
@ 2023-09-22  7:03       ` Xiaoyao Li
  2023-09-22  7:10         ` David Hildenbrand
  0 siblings, 1 reply; 52+ messages in thread
From: Xiaoyao Li @ 2023-09-22  7:03 UTC (permalink / raw)
  To: David Hildenbrand, Sean Christopherson
  Cc: Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti, qemu-devel, kvm,
	Michael Roth, isaku.yamahata, Claudio Fontana

On 9/21/2023 5:11 PM, David Hildenbrand wrote:
>>>> 3. What is KVM_X86_SW_PROTECTED_VM going to look like? and do we 
>>>> need it?
>>>>
>>>
>>> Why implement it when you have to ask others for a motivation? 😉
>>>
>>> Personally, I'm not sure if it is really useful, especially in this 
>>> state.
>>
>> Yeah, as of today, KVM_X86_SW_PROTECTED_VM is mainly a development 
>> vehicle,
>> e.g. so that testing gmem doesn't require TDX/SNP hardware, debugging 
>> gmem guests
>> isn't brutally painful, etc.
>>
>> Longer term, I have aspirations of being able to back most VMs with 
>> gmem, but
>> that's going to require quite a bit more work, e.g. gmem needs to be 
>> mappable
>> (when hardware allows it) so that gmem doesn't all but require double 
>> mapping,
>> KVM obviously needs to be able to read/write gmem, etc.
>>
>> The value proposition is that having a guest-first memory type will 
>> allow KVM to
>> optimize and harden gmem in ways that wouldn't be feasible for a more 
>> generic
>> memory implementation.  E.g. memory isn't mapped into host userspace 
>> by default
>> (makes it harder to accidentally corrupt the guest), the guest can 
>> have *larger*
>> mappings than host userspace, guest memory can be served from a 
>> dedicated pool
>> (similar-ish to hugetlb), the pool can be omitted from the direct map, 
>> etc.
>>
> Thanks for that information. Personally, I don't believe "to back most 
> VMs with gmem", but that's a different discussion.
> 
> As a development vehicle to get TDX up and running it might be very 
> valuable indeed. But it doesn't necessarily have to be merged in QEMU 
> for that case -- especially in a semi-finished form.

That's true, and I agree with it. I'll drop the KVM_X86_SW_PROTECTED_VM 
part in the next version.

How would you like this series to proceed in the next version? Only the 
patches of gmem support, without a user? Or together with the next QEMU TDX 
series?



* Re: [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem
  2023-09-22  0:22     ` Xiaoyao Li
@ 2023-09-22  7:08       ` David Hildenbrand
  2023-10-08  2:59         ` Xiaoyao Li
  0 siblings, 1 reply; 52+ messages in thread
From: David Hildenbrand @ 2023-09-22  7:08 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 22.09.23 02:22, Xiaoyao Li wrote:
> On 9/21/2023 4:55 PM, David Hildenbrand wrote:
>> On 14.09.23 05:50, Xiaoyao Li wrote:
>>> From: Chao Peng <chao.p.peng@linux.intel.com>
>>>
>>> Add KVM gmem support to RAMBlock so both normal hva based memory
>>> and kvm gmem fd based private memory can be associated in one RAMBlock.
>>>
>>> Introduce new flag RAM_KVM_GMEM. It calls KVM ioctl to create private
>>> gmem for the RAMBlock when it's set.
>>
>>
>> But who sets RAM_KVM_GMEM and when?
> 
> The answer is in the next patch. When the `private` property of the memory
> backend is set to true, it passes the RAM_KVM_GMEM flag to
> memory_region_init_ram_*().

Okay, assuming that patch (and property) will go away, I assume this 
flag can also go away, right?

-- 
Cheers,

David / dhildenb




* Re: [RFC PATCH v2 00/21] QEMU gmem implemention
  2023-09-22  7:03       ` Xiaoyao Li
@ 2023-09-22  7:10         ` David Hildenbrand
  0 siblings, 0 replies; 52+ messages in thread
From: David Hildenbrand @ 2023-09-22  7:10 UTC (permalink / raw)
  To: Xiaoyao Li, Sean Christopherson
  Cc: Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti, qemu-devel, kvm,
	Michael Roth, isaku.yamahata, Claudio Fontana

On 22.09.23 09:03, Xiaoyao Li wrote:
> On 9/21/2023 5:11 PM, David Hildenbrand wrote:
>>>>> 3. What is KVM_X86_SW_PROTECTED_VM going to look like? and do we
>>>>> need it?
>>>>>
>>>>
>>>> Why implement it when you have to ask others for a motivation? 😉
>>>>
>>>> Personally, I'm not sure if it is really useful, especially in this
>>>> state.
>>>
>>> Yeah, as of today, KVM_X86_SW_PROTECTED_VM is mainly a development
>>> vehicle,
>>> e.g. so that testing gmem doesn't require TDX/SNP hardware, debugging
>>> gmem guests
>>> isn't brutally painful, etc.
>>>
>>> Longer term, I have aspirations of being able to back most VMs with
>>> gmem, but
>>> that's going to require quite a bit more work, e.g. gmem needs to be
>>> mappable
>>> (when hardware allows it) so that gmem doesn't all but require double
>>> mapping,
>>> KVM obviously needs to be able to read/write gmem, etc.
>>>
>>> The value proposition is that having a guest-first memory type will
>>> allow KVM to
>>> optimize and harden gmem in ways that wouldn't be feasible for a more
>>> generic
>>> memory implementation.  E.g. memory isn't mapped into host userspace
>>> by default
>>> (makes it harder to accidentally corrupt the guest), the guest can
>>> have *larger*
>>> mappings than host userspace, guest memory can be served from a
>>> dedicated pool
>>> (similar-ish to hugetlb), the pool can be omitted from the direct map,
>>> etc.
>>>
>> Thanks for that information. Personally, I don't believe "to back most
>> VMs with gmem", but that's a different discussion.
>>
>> As a development vehicle to get TDX up and running it might be very
>> valuable indeed. But it doesn't necessarily have to be merged in QEMU
>> for that case -- especially in a semi-finished form.
> 
> That's true, and I agree with it. I'll drop the KVM_X86_SW_PROTECTED_VM
> part in the next version.
> 
> How would you like this series to proceed in the next version? Only the
> patches of gmem support, without a user? Or together with the next QEMU TDX
> series?

Makes sense to me. The GMEM series can be a prereq for QEMU TDX.

-- 
Cheers,

David / dhildenb




* Re: [RFC PATCH v2 07/21] i386/pc: Drop pc_machine_kvm_type()
  2023-09-22  0:24     ` Xiaoyao Li
@ 2023-09-22  7:11       ` David Hildenbrand
  0 siblings, 0 replies; 52+ messages in thread
From: David Hildenbrand @ 2023-09-22  7:11 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Peter Xu,
	Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 22.09.23 02:24, Xiaoyao Li wrote:
> On 9/21/2023 4:51 PM, David Hildenbrand wrote:
>> On 14.09.23 05:51, Xiaoyao Li wrote:
>>> pc_machine_kvm_type() was introduced by commit e21be724eaf5 ("i386/xen:
>>> add pc_machine_kvm_type to initialize XEN_EMULATE mode") to do Xen
>>> specific initialization by utilizing kvm_type method.
>>>
>>> commit eeedfe6c6316 ("hw/xen: Simplify emulated Xen platform init")
>>> moves the Xen specific initialization to pc_basic_device_init().
>>>
>>> There is no need to keep the PC specific kvm_type() implementation
>>> anymore.
>>
>> So we'll fall back to kvm_arch_get_default_type(), which simply returns 0.
>>
>>> On the other hand, later patch will implement kvm_type()
>>> method for all x86/i386 machines to support KVM_X86_SW_PROTECTED_VM.
>>>
>>
>> ^ I suggest dropping that and merging that patch ahead-of-time as a
>> simple cleanup.
> 
> I suppose the "that" here means "this patch", right?


With "that" I meant the paragraph "On the other hand, later patch ...".

-- 
Cheers,

David / dhildenb




* Re: [RFC PATCH v2 07/21] i386/pc: Drop pc_machine_kvm_type()
  2023-09-14  3:51 ` [RFC PATCH v2 07/21] i386/pc: Drop pc_machine_kvm_type() Xiaoyao Li
  2023-09-21  8:51   ` David Hildenbrand
@ 2023-09-23  7:32   ` David Woodhouse
  1 sibling, 0 replies; 52+ messages in thread
From: David Woodhouse @ 2023-09-23  7:32 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana


On Wed, 2023-09-13 at 23:51 -0400, Xiaoyao Li wrote:
> pc_machine_kvm_type() was introduced by commit e21be724eaf5 ("i386/xen:
> add pc_machine_kvm_type to initialize XEN_EMULATE mode") to do Xen
> specific initialization by utilizing kvm_type method.
> 
> commit eeedfe6c6316 ("hw/xen: Simplify emulated Xen platform init")
> moves the Xen specific initialization to pc_basic_device_init().
> 
> There is no need to keep the PC specific kvm_type() implementation
> anymore. On the other hand, later patch will implement kvm_type()
> method for all x86/i386 machines to support KVM_X86_SW_PROTECTED_VM.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>

Indeed, I added it and then later ripped everything out of it and left
it empty, as you nicely describe (thanks) in your commit message. I
have no designs on using it again, so

Acked-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem
  2023-09-14  3:50 ` [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem Xiaoyao Li
  2023-09-15  2:04   ` Wang, Lei
  2023-09-21  8:55   ` David Hildenbrand
@ 2023-10-06 11:07   ` Daniel P. Berrangé
  2 siblings, 0 replies; 52+ messages in thread
From: Daniel P. Berrangé @ 2023-10-06 11:07 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Eric Blake, Markus Armbruster, Marcelo Tosatti,
	qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On Wed, Sep 13, 2023 at 11:50:58PM -0400, Xiaoyao Li wrote:
> From: Chao Peng <chao.p.peng@linux.intel.com>
> 
> Add KVM gmem support to RAMBlock so both normal hva based memory
> and kvm gmem fd based private memory can be associated in one RAMBlock.
> 
> Introduce new flag RAM_KVM_GMEM. It calls KVM ioctl to create private
> gmem for the RAMBlock when it's set.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  accel/kvm/kvm-all.c     | 17 +++++++++++++++++
>  include/exec/memory.h   |  3 +++
>  include/exec/ramblock.h |  1 +
>  include/sysemu/kvm.h    |  2 ++
>  softmmu/physmem.c       | 18 +++++++++++++++---
>  5 files changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 60aacd925393..185ae16d9620 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -4225,3 +4225,20 @@ void query_stats_schemas_cb(StatsSchemaList **result, Error **errp)
>          query_stats_schema_vcpu(first_cpu, &stats_args);
>      }
>  }
> +
> +int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
> +{
> +    int fd;
> +    struct kvm_create_guest_memfd gmem = {
> +        .size = size,
> +        /* TODO: to decide whether KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is supported */
> +        .flags = flags,
> +    };
> +
> +    fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_GUEST_MEMFD, &gmem);
> +    if (fd < 0) {
> +        error_setg_errno(errp, errno, "%s: error creating kvm gmem\n", __func__);
> +    }
> +
> +    return fd;
> +}
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 68284428f87c..227cb2578e95 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -235,6 +235,9 @@ typedef struct IOMMUTLBEvent {
>  /* RAM is an mmap-ed named file */
>  #define RAM_NAMED_FILE (1 << 9)
>  
> +/* RAM can be private that has kvm gmem backend */
> +#define RAM_KVM_GMEM    (1 << 10)
> +
>  static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
>                                         IOMMUNotifierFlag flags,
>                                         hwaddr start, hwaddr end,
> diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> index 69c6a5390293..0d158b3909c9 100644
> --- a/include/exec/ramblock.h
> +++ b/include/exec/ramblock.h
> @@ -41,6 +41,7 @@ struct RAMBlock {
>      QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
>      int fd;
>      uint64_t fd_offset;
> +    int gmem_fd;
>      size_t page_size;
>      /* dirty bitmap used during migration */
>      unsigned long *bmap;

You're adding a file descriptor to RAMBlock, but I don't see
anything in this patch that ever calls close(gmem_fd) when the
RAMBlock is released. Presumably reclaim_ramblock() needs to
deal with this?
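
Something like the following in reclaim_ramblock(), next to the existing
handling of block->fd, would presumably do (a sketch, not part of the
posted series):

  if (block->gmem_fd >= 0) {
      close(block->gmem_fd);
      block->gmem_fd = -1;
  }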


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem
  2023-09-22  7:08       ` David Hildenbrand
@ 2023-10-08  2:59         ` Xiaoyao Li
  0 siblings, 0 replies; 52+ messages in thread
From: Xiaoyao Li @ 2023-10-08  2:59 UTC (permalink / raw)
  To: David Hildenbrand, Paolo Bonzini, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Peter Xu, Philippe Mathieu-Daudé,
	Cornelia Huck, Daniel P. Berrangé,
	Eric Blake, Markus Armbruster, Marcelo Tosatti
  Cc: qemu-devel, kvm, Michael Roth, isaku.yamahata,
	Sean Christopherson, Claudio Fontana

On 9/22/2023 3:08 PM, David Hildenbrand wrote:
> On 22.09.23 02:22, Xiaoyao Li wrote:
>> On 9/21/2023 4:55 PM, David Hildenbrand wrote:
>>> On 14.09.23 05:50, Xiaoyao Li wrote:
>>>> From: Chao Peng <chao.p.peng@linux.intel.com>
>>>>
>>>> Add KVM gmem support to RAMBlock so that both normal hva-based memory
>>>> and kvm gmem fd based private memory can be associated with one RAMBlock.
>>>>
>>>> Introduce a new flag, RAM_KVM_GMEM. When it is set, QEMU calls the KVM
>>>> ioctl to create private gmem for the RAMBlock.
>>>
>>>
>>> But who sets RAM_KVM_GMEM and when?
>>
>> The answer is in the next patch. When the `private` property of the
>> memory backend is set to true, it passes the RAM_KVM_GMEM flag to
>> memory_region_init_ram_*().
> 
> Okay, assuming that patch (and property) will go away, I assume this 
> flag can also go away, right?
> 

If we drop the flag RAM_KVM_GMEM, it seems we need to go back to the 
approach of rfc v1 [1][2], which allocates gmem inside the .region_add() 
callback. Is that acceptable to you?
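
From memory, that approach looked roughly like this (a sketch only;
kvm_vm_type_has_private_mem() is an illustrative placeholder, not the
exact rfc v1 code):

    static void kvm_region_add(MemoryListener *listener,
                               MemoryRegionSection *section)
    {
        KVMMemoryListener *kml = container_of(listener, KVMMemoryListener,
                                              listener);
        RAMBlock *rb = section->mr->ram_block;

        /* create gmem lazily, when the region first becomes a memslot */
        if (kvm_vm_type_has_private_mem() && rb && rb->gmem_fd < 0) {
            rb->gmem_fd = kvm_create_guest_memfd(rb->max_length, 0,
                                                 &error_fatal);
        }

        kvm_set_phys_mem(kml, section, true);
    }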

Another option is to allocate gmem inside ram_block_add() by checking the 
vm_type (that looks hacky to me). What's your opinion on this option?

One more option is to keep RAM_KVM_GMEM as in this patch, but turn the 
"private" property of the memory backend into a "need_kvm_gmem" field 
(not user-settable), which the cgs-specific initialization function, 
e.g. tdx_kvm_init(), sets to true. Something like the sketch below.
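
(Field placement and names here are tentative, just to make the idea
concrete:)

    /* HostMemoryBackend gains an internal bool need_kvm_gmem, with no
     * QOM property exposed; only cgs init code sets it, e.g.: */
    static int tdx_kvm_init(MachineState *ms, Error **errp)
    {
        HostMemoryBackend *memdev = ms->memdev;

        if (memdev) {
            memdev->need_kvm_gmem = true;  /* TD RAM must map as private */
        }
        return 0;
    }

and host_memory_backend_memory_complete() passes RAM_KVM_GMEM down to
memory_region_init_ram_*() whenever need_kvm_gmem is set.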


[1] https://lore.kernel.org/qemu-devel/a154c33d-b24d-b713-0dc0-027d54f2340f@redhat.com/
[2] https://lore.kernel.org/qemu-devel/20230731162201.271114-10-xiaoyao.li@intel.com/

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2023-10-08  3:00 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implemention Xiaoyao Li
2023-09-14  3:50 ` [RFC PATCH v2 01/21] *** HACK *** linux-headers: Update headers to pull in gmem APIs Xiaoyao Li
2023-09-14  3:50 ` [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem Xiaoyao Li
2023-09-15  2:04   ` Wang, Lei
2023-09-15  3:45     ` Xiaoyao Li
2023-09-21  8:55   ` David Hildenbrand
2023-09-22  0:22     ` Xiaoyao Li
2023-09-22  7:08       ` David Hildenbrand
2023-10-08  2:59         ` Xiaoyao Li
2023-10-06 11:07   ` Daniel P. Berrangé
2023-09-14  3:50 ` [RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM Xiaoyao Li
2023-09-19  9:46   ` Markus Armbruster
2023-09-19 23:24     ` Xiaoyao Li
2023-09-20  7:30       ` Markus Armbruster
2023-09-20 14:35         ` Xiaoyao Li
2023-09-20 14:37           ` David Hildenbrand
2023-09-20 15:42             ` Markus Armbruster
2023-09-21  8:38               ` Xiaoyao Li
2023-09-21  8:45                 ` David Hildenbrand
2023-09-14  3:51 ` [RFC PATCH v2 04/21] memory: Introduce memory_region_has_gmem_fd() Xiaoyao Li
2023-09-21  8:46   ` David Hildenbrand
2023-09-22  0:22     ` Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 05/21] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot Xiaoyao Li
2023-09-21  8:56   ` David Hildenbrand
2023-09-22  0:23     ` Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 06/21] i386: Add support for sw-protected-vm object Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 07/21] i386/pc: Drop pc_machine_kvm_type() Xiaoyao Li
2023-09-21  8:51   ` David Hildenbrand
2023-09-22  0:24     ` Xiaoyao Li
2023-09-22  7:11       ` David Hildenbrand
2023-09-23  7:32   ` David Woodhouse
2023-09-14  3:51 ` [RFC PATCH v2 08/21] target/i386: Implement mc->kvm_type() to get VM type Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 09/21] target/i386: Introduce kvm_confidential_guest_init() Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 10/21] i386/kvm: Implement kvm_sw_protected_vm_init() for sw-protcted-vm specific functions Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 11/21] kvm: Introduce support for memory_attributes Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 12/21] kvm/memory: Introduce the infrastructure to set the default shared/private value Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 13/21] i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 14/21] physmem: replace function name with __func__ in ram_block_discard_range() Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 15/21] physmem: extract ram_block_discard_range_fd() from ram_block_discard_range() Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 16/21] physmem: Introduce ram_block_convert_range() Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 17/21] kvm: handle KVM_EXIT_MEMORY_FAULT Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 18/21] trace/kvm: Add trace for page convertion between shared and private Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 19/21] pci-host/q35: Move PAM initialization above SMRAM initialization Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 20/21] q35: Introduce smm_ranges property for q35-pci-host Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 21/21] i386: Disable SMM mode for X86_SW_PROTECTED_VM Xiaoyao Li
2023-09-14 13:09 ` [RFC PATCH v2 00/21] QEMU gmem implemention David Hildenbrand
2023-09-15  1:10   ` Sean Christopherson
2023-09-21  9:11     ` David Hildenbrand
2023-09-22  7:03       ` Xiaoyao Li
2023-09-22  7:10         ` David Hildenbrand
2023-09-15  3:37   ` Xiaoyao Li
2023-09-21  8:59     ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).