* [RFC PATCH 00/19] QEMU gmem implementation
@ 2023-07-31 16:21 Xiaoyao Li
  2023-07-31 16:21 ` [RFC PATCH 01/19] trace/kvm: Split address space and slot id in trace_kvm_set_user_memory() Xiaoyao Li
                   ` (20 more replies)
  0 siblings, 21 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

This is the first RFC version of enabling KVM gmem[1] as the backend for
private memory of KVM_X86_PROTECTED_VM.

It adds support for creating a VM of the specific KVM_X86_PROTECTED_VM
type, and introduces a 'private' property for memory backends. When the
VM type is KVM_X86_PROTECTED_VM and the memory backend has 'private'
enabled, as below, QEMU calls the KVM gmem ioctl to allocate private
memory for the backend.

    $qemu -object memory-backend-ram,id=mem0,size=1G,private=on \
          -machine q35,kvm-type=sw-protected-vm,memory-backend=mem0 \
          ...

Unfortunately, this patch series fails to boot OVMF at a very early
stage with a triple fault, because KVM doesn't support emulating string
I/O to private memory. We leave it as an open issue to be discussed.

The following design opens need to be discussed:

1. How to determine the VM type?

   a. As in this series, specify the VM type via the machine property
      'kvm-type'.
   b. Check the memory backends: if any backend has the 'private'
      property set, the VM type is set to KVM_X86_PROTECTED_VM.

2. Whether the 'private' property is needed if we choose 1.b as the
   design.

   With 1.b, QEMU can decide on its own whether a memory region needs
   to be private (and allocate a gmem fd for it) or not.

3. What is KVM_X86_SW_PROTECTED_VM going to look like? What is its
   purpose and what are the requirements on it? I think these are
   questions for KVM folks rather than QEMU folks.

Any other ideas, opens, or questions are welcome.


Besides, the TDX QEMU implementation is based on this series to provide
private gmem for TD private memory; it can be found at [2].
It works with the corresponding KVM [3] to boot a TDX guest.

[1] https://lore.kernel.org/all/20230718234512.1690985-1-seanjc@google.com/
[2] https://github.com/intel/qemu-tdx/tree/tdx-upstream-wip
[3] https://github.com/intel/tdx/tree/kvm-upstream-2023.07.27-v6.5-rc2-workaround

Chao Peng (4):
  RAMBlock: Support KVM gmemory
  kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
  physmem: Add ram_block_convert_range
  kvm: handle KVM_EXIT_MEMORY_FAULT

Isaku Yamahata (4):
  HostMem: Add private property to indicate to use kvm gmem
  trace/kvm: Add trace for page convertion between shared and private
  pci-host/q35: Move PAM initialization above SMRAM initialization
  q35: Introduce smm_ranges property for q35-pci-host

Xiaoyao Li (11):
  trace/kvm: Split address space and slot id in
    trace_kvm_set_user_memory()
  *** HACK *** linux-headers: Update headers to pull in gmem APIs
  memory: Introduce memory_region_can_be_private()
  i386/pc: Drop pc_machine_kvm_type()
  target/i386: Implement mc->kvm_type() to get VM type
  i386/kvm: Create gmem fd for KVM_X86_SW_PROTECTED_VM
  kvm: Introduce support for memory_attributes
  kvm/memory: Introduce the infrastructure to set the default
    shared/private value
  i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM
  physmem: replace function name with __func__ in
    ram_block_discard_range()
  i386: Disable SMM mode for X86_SW_PROTECTED_VM

 accel/kvm/kvm-all.c         | 166 +++++++++++++++++++++++++++++++++---
 accel/kvm/trace-events      |   4 +-
 backends/hostmem.c          |  18 ++++
 hw/i386/pc.c                |   5 --
 hw/i386/pc_q35.c            |   3 +-
 hw/i386/x86.c               |  27 ++++++
 hw/pci-host/q35.c           |  61 ++++++++-----
 include/exec/cpu-common.h   |   2 +
 include/exec/memory.h       |  24 ++++++
 include/exec/ramblock.h     |   1 +
 include/hw/i386/pc.h        |   4 +-
 include/hw/i386/x86.h       |   4 +
 include/hw/pci-host/q35.h   |   1 +
 include/sysemu/hostmem.h    |   2 +-
 include/sysemu/kvm.h        |   3 +
 include/sysemu/kvm_int.h    |   2 +
 linux-headers/asm-x86/kvm.h |   3 +
 linux-headers/linux/kvm.h   |  50 +++++++++++
 qapi/qom.json               |   4 +
 softmmu/memory.c            |  27 ++++++
 softmmu/physmem.c           |  97 ++++++++++++++-------
 target/i386/kvm/kvm.c       |  84 ++++++++++++++++++
 target/i386/kvm/kvm_i386.h  |   1 +
 23 files changed, 517 insertions(+), 76 deletions(-)

-- 
2.34.1




* [RFC PATCH 01/19] trace/kvm: Split address space and slot id in trace_kvm_set_user_memory()
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-07-31 16:21 ` [RFC PATCH 02/19] *** HACK *** linux-headers: Update headers to pull in gmem APIs Xiaoyao Li
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

The upper 16 bits of kvm_userspace_memory_region::slot are
address space id. Parse it separately in trace_kvm_set_user_memory().
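
As a minimal illustration (split_slot() is a hypothetical helper, not
part of this patch), the decoding performed in the trace call is:

    /* bits 16-31 of the slot field hold the address space id,
     * bits 0-15 hold the slot id */
    static inline void split_slot(uint32_t slot, uint16_t *as_id,
                                  uint16_t *slot_id)
    {
        *as_id = slot >> 16;
        *slot_id = (uint16_t)slot;
    }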

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c    | 5 +++--
 accel/kvm/trace-events | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 373d876c0580..d8eee405de24 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -309,8 +309,9 @@ static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, boo
     ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
     slot->old_flags = mem.flags;
 err:
-    trace_kvm_set_user_memory(mem.slot, mem.flags, mem.guest_phys_addr,
-                              mem.memory_size, mem.userspace_addr, ret);
+    trace_kvm_set_user_memory(mem.slot >> 16, (uint16_t)mem.slot, mem.flags,
+                              mem.guest_phys_addr, mem.memory_size,
+                              mem.userspace_addr, ret);
     if (ret < 0) {
         error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
                      " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 399aaeb0ec75..14ebfa1b991c 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -15,7 +15,7 @@ kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
 kvm_irqchip_release_virq(int virq) "virq %d"
 kvm_set_ioeventfd_mmio(int fd, uint64_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%" PRIx64 " val=0x%x assign: %d size: %d match: %d"
 kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%x val=0x%x assign: %d size: %d match: %d"
-kvm_set_user_memory(uint32_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
+kvm_set_user_memory(uint16_t as, uint16_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "AddrSpace#%d Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
 kvm_clear_dirty_log(uint32_t slot, uint64_t start, uint32_t size) "slot#%"PRId32" start 0x%"PRIx64" size 0x%"PRIx32
 kvm_resample_fd_notify(int gsi) "gsi %d"
 kvm_dirty_ring_full(int id) "vcpu %d"
-- 
2.34.1




* [RFC PATCH 02/19] *** HACK *** linux-headers: Update headers to pull in gmem APIs
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
  2023-07-31 16:21 ` [RFC PATCH 01/19] trace/kvm: Split address space and slot id in trace_kvm_set_user_memory() Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-07-31 16:21 ` [RFC PATCH 03/19] RAMBlock: Support KVM gmemory Xiaoyao Li
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

This patch needs to be regenerated with the script

	scripts/update-linux-headers.sh

once gmem fd support is upstreamed in the Linux kernel.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 linux-headers/asm-x86/kvm.h |  3 +++
 linux-headers/linux/kvm.h   | 50 +++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 2b3a8f7bd2c0..003fb745347c 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -560,4 +560,7 @@ struct kvm_pmu_event_filter {
 /* x86-specific KVM_EXIT_HYPERCALL flags. */
 #define KVM_EXIT_HYPERCALL_LONG_MODE	BIT(0)
 
+#define KVM_X86_DEFAULT_VM	0
+#define KVM_X86_SW_PROTECTED_VM	1
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 1f3f3333a4d8..278bed78f98e 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -95,6 +95,19 @@ struct kvm_userspace_memory_region {
 	__u64 userspace_addr; /* start of the userspace allocated memory */
 };
 
+/* for KVM_SET_USER_MEMORY_REGION2 */
+struct kvm_userspace_memory_region2 {
+	__u32 slot;
+	__u32 flags;
+	__u64 guest_phys_addr;
+	__u64 memory_size;
+	__u64 userspace_addr;
+	__u64 gmem_offset;
+	__u32 gmem_fd;
+	__u32 pad1;
+	__u64 pad2[14];
+};
+
 /*
  * The bit 0 ~ bit 15 of kvm_userspace_memory_region::flags are visible for
  * userspace, other bits are reserved for kvm internal use which are defined
@@ -102,6 +115,7 @@ struct kvm_userspace_memory_region {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
 #define KVM_MEM_READONLY	(1UL << 1)
+#define KVM_MEM_PRIVATE		(1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -264,6 +278,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_SBI        35
 #define KVM_EXIT_RISCV_CSR        36
 #define KVM_EXIT_NOTIFY           37
+#define KVM_EXIT_MEMORY_FAULT     38
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -506,6 +521,13 @@ struct kvm_run {
 #define KVM_NOTIFY_CONTEXT_INVALID	(1 << 0)
 			__u32 flags;
 		} notify;
+		/* KVM_EXIT_MEMORY_FAULT */
+		struct {
+#define KVM_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)
+			__u64 flags;
+			__u64 gpa;
+			__u64 size;
+		} memory;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -1188,6 +1210,9 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_COUNTER_OFFSET 227
 #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
 #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
+#define KVM_CAP_USER_MEMORY2 230
+#define KVM_CAP_MEMORY_ATTRIBUTES 231
+#define KVM_CAP_VM_TYPES 232
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1462,6 +1487,8 @@ struct kvm_vfio_spapr_tce {
 					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_SET_USER_MEMORY_REGION2 _IOW(KVMIO, 0x49, \
+					 struct kvm_userspace_memory_region2)
 
 /* enable ucontrol for s390 */
 struct kvm_s390_ucas_mapping {
@@ -2245,4 +2272,27 @@ struct kvm_s390_zpci_op {
 /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
 #define KVM_S390_ZPCIOP_REGAEN_HOST    (1 << 0)
 
+/* Available with KVM_CAP_MEMORY_ATTRIBUTES */
+#define KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES    _IOR(KVMIO,  0xd2, __u64)
+#define KVM_SET_MEMORY_ATTRIBUTES              _IOW(KVMIO,  0xd3, struct kvm_memory_attributes)
+
+struct kvm_memory_attributes {
+	__u64 address;
+	__u64 size;
+	__u64 attributes;
+	__u64 flags;
+};
+
+#define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
+
+#define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
+
+#define KVM_GUEST_MEMFD_ALLOW_HUGEPAGE		(1ULL << 0)
+
+struct kvm_create_guest_memfd {
+	__u64 size;
+	__u64 flags;
+	__u64 reserved[6];
+};
+
 #endif /* __LINUX_KVM_H */
-- 
2.34.1




* [RFC PATCH 03/19] RAMBlock: Support KVM gmemory
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
  2023-07-31 16:21 ` [RFC PATCH 01/19] trace/kvm: Split address space and slot id in trace_kvm_set_user_memory() Xiaoyao Li
  2023-07-31 16:21 ` [RFC PATCH 02/19] *** HACK *** linux-headers: Update headers to pull in gmem APIs Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-08-01 16:33   ` David Hildenbrand
  2023-07-31 16:21 ` [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private() Xiaoyao Li
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

From: Chao Peng <chao.p.peng@linux.intel.com>

Add KVM gmem support to RAMBlock so that a single RAMBlock can have
both normal hva-based memory and gmem fd-based memory.

The gmem part is represented by the gmem_fd field.
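
As a minimal sketch (ram_block_has_gmem() is a hypothetical helper, not
added by this patch): gmem_fd is initialized to -1, so gmem backing can
be detected with:

    static bool ram_block_has_gmem(RAMBlock *rb)
    {
        return rb->gmem_fd >= 0;
    }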

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 include/exec/memory.h   | 8 ++++++++
 include/exec/ramblock.h | 1 +
 softmmu/memory.c        | 9 +++++++++
 softmmu/physmem.c       | 2 ++
 4 files changed, 20 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 7f5c11a0cc9e..61e31c7b9874 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1376,6 +1376,14 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
                                     int fd,
                                     ram_addr_t offset,
                                     Error **errp);
+/**
+ * memory_region_set_gmem_fd:  Set RAM memory region with a restricted fd.
+ *
+ * @mr: the #MemoryRegion to be set.
+ * @fd: the fd to provide restricted memory.
+ */
+void memory_region_set_gmem_fd(MemoryRegion *mr, int fd);
+
 #endif
 
 /**
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 69c6a5390293..0d158b3909c9 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -41,6 +41,7 @@ struct RAMBlock {
     QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
     int fd;
     uint64_t fd_offset;
+    int gmem_fd;
     size_t page_size;
     /* dirty bitmap used during migration */
     unsigned long *bmap;
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7d9494ce7028..4f8f8c0a02e6 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1661,6 +1661,15 @@ void memory_region_init_ram_from_fd(MemoryRegion *mr,
         error_propagate(errp, err);
     }
 }
+
+void memory_region_set_gmem_fd(MemoryRegion *mr, int fd)
+{
+    if (mr->ram_block) {
+        assert(fd >= 0);
+        mr->ram_block->gmem_fd = fd;
+    }
+}
+
 #endif
 
 void memory_region_init_ram_ptr(MemoryRegion *mr,
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 3df73542e1fe..8f64128de0b5 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1920,6 +1920,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr,
     new_block->used_length = size;
     new_block->max_length = size;
     new_block->flags = ram_flags;
+    new_block->gmem_fd = -1;
     new_block->host = file_ram_alloc(new_block, size, fd, readonly,
                                      !file_size, offset, errp);
     if (!new_block->host) {
@@ -1990,6 +1991,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     new_block->max_length = max_size;
     assert(max_size >= size);
     new_block->fd = -1;
+    new_block->gmem_fd = -1;
     new_block->page_size = qemu_real_host_page_size();
     new_block->host = host;
     new_block->flags = ram_flags;
-- 
2.34.1




* [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private()
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (2 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 03/19] RAMBlock: Support KVM gmemory Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-07-31 21:23   ` Peter Xu
  2023-08-01 16:48   ` Claudio Fontana
  2023-07-31 16:21 ` [RFC PATCH 05/19] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot Xiaoyao Li
                   ` (16 subsequent siblings)
  20 siblings, 2 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 include/exec/memory.h | 9 +++++++++
 softmmu/memory.c      | 5 +++++
 2 files changed, 14 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 61e31c7b9874..e119d3ce1a1d 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1679,6 +1679,15 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
  */
 bool memory_region_is_protected(MemoryRegion *mr);
 
+/**
+ * memory_region_can_be_private: check whether a memory region can be private
+ *
+ * Returns %true if a memory region's ram_block has valid gmem fd assigned.
+ *
+ * @mr: the memory region being queried
+ */
+bool memory_region_can_be_private(MemoryRegion *mr);
+
 /**
  * memory_region_get_iommu: check whether a memory region is an iommu
  *
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 4f8f8c0a02e6..336c76ede660 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1855,6 +1855,11 @@ bool memory_region_is_protected(MemoryRegion *mr)
     return mr->ram && (mr->ram_block->flags & RAM_PROTECTED);
 }
 
+bool memory_region_can_be_private(MemoryRegion *mr)
+{
+    return mr->ram_block && mr->ram_block->gmem_fd >= 0;
+}
+
 uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
 {
     uint8_t mask = mr->dirty_log_mask;
-- 
2.34.1




* [RFC PATCH 05/19] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (3 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private() Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-08-01 17:10   ` Claudio Fontana
  2023-07-31 16:21 ` [RFC PATCH 06/19] i386/pc: Drop pc_machine_kvm_type() Xiaoyao Li
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

From: Chao Peng <chao.p.peng@linux.intel.com>

Switch to KVM_SET_USER_MEMORY_REGION2 when it is supported by KVM.

With KVM_SET_USER_MEMORY_REGION2, QEMU can set up a memory region that
is backed both by hva-based shared memory and by gmem fd-based private
memory.
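
For illustration, a sketch of how a slot with gmem backing is described
to KVM via the new ioctl, with fields as defined by the header update
in patch 02:

    struct kvm_userspace_memory_region2 mem = {
        .slot = slot->slot | (kml->as_id << 16),
        .flags = slot->flags,           /* may include KVM_MEM_PRIVATE */
        .guest_phys_addr = slot->start_addr,
        .memory_size = slot->memory_size,
        .userspace_addr = (unsigned long)slot->ram,
        .gmem_fd = slot->fd,            /* gmem fd backing the private part */
        .gmem_offset = slot->ofs,
    };
    ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);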

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c      | 57 +++++++++++++++++++++++++++++++++-------
 accel/kvm/trace-events   |  2 +-
 include/sysemu/kvm_int.h |  2 ++
 3 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index d8eee405de24..7b1818334ba7 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -288,35 +288,68 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram,
 static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, bool new)
 {
     KVMState *s = kvm_state;
-    struct kvm_userspace_memory_region mem;
+    struct kvm_userspace_memory_region2 mem;
+    static int cap_user_memory2 = -1;
     int ret;
 
+    if (cap_user_memory2 == -1) {
+        cap_user_memory2 = kvm_check_extension(s, KVM_CAP_USER_MEMORY2);
+    }
+
+    if (!cap_user_memory2 && slot->fd >= 0) {
+        error_report("%s: KVM doesn't support gmem!", __func__);
+        exit(1);
+    }
+
     mem.slot = slot->slot | (kml->as_id << 16);
     mem.guest_phys_addr = slot->start_addr;
     mem.userspace_addr = (unsigned long)slot->ram;
     mem.flags = slot->flags;
+    mem.gmem_fd = slot->fd;
+    mem.gmem_offset = slot->ofs;
 
-    if (slot->memory_size && !new && (mem.flags ^ slot->old_flags) & KVM_MEM_READONLY) {
+    if (slot->memory_size && !new && (slot->flags ^ slot->old_flags) & KVM_MEM_READONLY) {
         /* Set the slot size to 0 before setting the slot to the desired
          * value. This is needed based on KVM commit 75d61fbc. */
         mem.memory_size = 0;
-        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+
+        if (cap_user_memory2) {
+            ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
+        } else {
+            ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+        }
         if (ret < 0) {
             goto err;
         }
     }
     mem.memory_size = slot->memory_size;
-    ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+    if (cap_user_memory2) {
+        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
+    } else {
+        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+    }
     slot->old_flags = mem.flags;
 err:
     trace_kvm_set_user_memory(mem.slot >> 16, (uint16_t)mem.slot, mem.flags,
                               mem.guest_phys_addr, mem.memory_size,
-                              mem.userspace_addr, ret);
+                              mem.userspace_addr, mem.gmem_fd,
+                              mem.gmem_offset, ret);
     if (ret < 0) {
-        error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
-                     " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
-                     __func__, mem.slot, slot->start_addr,
-                     (uint64_t)mem.memory_size, strerror(errno));
+        if (cap_user_memory2) {
+            error_report("%s: KVM_SET_USER_MEMORY_REGION2 failed, slot=%d,"
+                         " start=0x%" PRIx64 ", size=0x%" PRIx64 ","
+                         " flags=0x%" PRIx32 ","
+                         " gmem_fd=%" PRId32 ", gmem_offset=0x%" PRIx64 ": %s",
+                         __func__, mem.slot, slot->start_addr,
+                         (uint64_t)mem.memory_size, mem.flags,
+                         mem.gmem_fd, (uint64_t)mem.gmem_offset,
+                         strerror(errno));
+        } else {
+            error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
+                         " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
+                         __func__, mem.slot, slot->start_addr,
+                         (uint64_t)mem.memory_size, strerror(errno));
+        }
     }
     return ret;
 }
@@ -472,6 +505,9 @@ static int kvm_mem_flags(MemoryRegion *mr)
     if (readonly && kvm_readonly_mem_allowed) {
         flags |= KVM_MEM_READONLY;
     }
+    if (memory_region_can_be_private(mr)) {
+        flags |= KVM_MEM_PRIVATE;
+    }
     return flags;
 }
 
@@ -1402,6 +1438,9 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
         mem->ram_start_offset = ram_start_offset;
         mem->ram = ram;
         mem->flags = kvm_mem_flags(mr);
+        mem->fd = mr->ram_block->gmem_fd;
+        mem->ofs = (uint8_t *)ram - mr->ram_block->host;
+
         kvm_slot_init_dirty_bitmap(mem);
         err = kvm_set_user_memory_region(kml, mem, true);
         if (err) {
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 14ebfa1b991c..80694683acea 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -15,7 +15,7 @@ kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
 kvm_irqchip_release_virq(int virq) "virq %d"
 kvm_set_ioeventfd_mmio(int fd, uint64_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%" PRIx64 " val=0x%x assign: %d size: %d match: %d"
 kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%x val=0x%x assign: %d size: %d match: %d"
-kvm_set_user_memory(uint16_t as, uint16_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "AddrSpace#%d Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
+kvm_set_user_memory(uint16_t as, uint16_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, uint32_t fd, uint64_t fd_offset, int ret) "AddrSpace#%d Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " gmem_fd=%d" " gmem_fd_offset=0x%" PRIx64 " ret=%d"
 kvm_clear_dirty_log(uint32_t slot, uint64_t start, uint32_t size) "slot#%"PRId32" start 0x%"PRIx64" size 0x%"PRIx32
 kvm_resample_fd_notify(int gsi) "gsi %d"
 kvm_dirty_ring_full(int id) "vcpu %d"
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 511b42bde5c4..48220c0793ac 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -30,6 +30,8 @@ typedef struct KVMSlot
     int as_id;
     /* Cache of the offset in ram address space */
     ram_addr_t ram_start_offset;
+    int fd;
+    hwaddr ofs;
 } KVMSlot;
 
 typedef struct KVMMemoryUpdate {
-- 
2.34.1




* [RFC PATCH 06/19] i386/pc: Drop pc_machine_kvm_type()
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (4 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 05/19] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-08-02 23:00   ` Isaku Yamahata
  2023-07-31 16:21 ` [RFC PATCH 07/19] target/i386: Implement mc->kvm_type() to get VM type Xiaoyao Li
                   ` (14 subsequent siblings)
  20 siblings, 1 reply; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

pc_machine_kvm_type() was introduced by commit e21be724eaf5 ("i386/xen:
add pc_machine_kvm_type to initialize XEN_EMULATE mode") to do Xen
specific initialization by utilizing kvm_type method.

commit eeedfe6c6316 ("hw/xen: Simplify emulated Xen platform init")
moves the Xen specific initialization to pc_basic_device_init().

There is no need to keep the PC specific kvm_type() implementation
anymore. On the other hand, later patch will implement kvm_type()
method for all x86/i386 machines to support KVM_X86_SW_PROTECTED_VM.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/i386/pc.c         | 5 -----
 include/hw/i386/pc.h | 3 ---
 2 files changed, 8 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3109d5e0e035..abeadd903827 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1794,11 +1794,6 @@ static void pc_machine_initfn(Object *obj)
     cxl_machine_init(obj, &pcms->cxl_devices_state);
 }
 
-int pc_machine_kvm_type(MachineState *machine, const char *kvm_type)
-{
-    return 0;
-}
-
 static void pc_machine_reset(MachineState *machine, ShutdownCause reason)
 {
     CPUState *cs;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index d54e8b1101e4..c98d628a76f3 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -296,15 +296,12 @@ extern const size_t pc_compat_1_5_len;
 extern GlobalProperty pc_compat_1_4[];
 extern const size_t pc_compat_1_4_len;
 
-int pc_machine_kvm_type(MachineState *machine, const char *vm_type);
-
 #define DEFINE_PC_MACHINE(suffix, namestr, initfn, optsfn) \
     static void pc_machine_##suffix##_class_init(ObjectClass *oc, void *data) \
     { \
         MachineClass *mc = MACHINE_CLASS(oc); \
         optsfn(mc); \
         mc->init = initfn; \
-        mc->kvm_type = pc_machine_kvm_type; \
     } \
     static const TypeInfo pc_machine_type_##suffix = { \
         .name       = namestr TYPE_MACHINE_SUFFIX, \
-- 
2.34.1




* [RFC PATCH 07/19] target/i386: Implement mc->kvm_type() to get VM type
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (5 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 06/19] i386/pc: Drop pc_machine_kvm_type() Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-07-31 16:21 ` [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem Xiaoyao Li
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

Implement mc->kvm_type() for i386 machines. It provides a way for the
user to create a SW_PROTECTED_VM.
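
For example, matching the cover letter, the user selects the VM type
via the new machine property:

    $qemu -machine q35,kvm-type=sw-protected-vm ...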

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/i386/x86.c              | 27 +++++++++++++++++++++++++++
 include/hw/i386/x86.h      |  4 ++++
 target/i386/kvm/kvm.c      | 38 ++++++++++++++++++++++++++++++++++++++
 target/i386/kvm/kvm_i386.h |  1 +
 4 files changed, 70 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a88a126123be..3ccd06154249 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1382,6 +1382,26 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
     qapi_free_SgxEPCList(list);
 }
 
+static int x86_kvm_type(MachineState *ms, const char *vm_type)
+{
+    return kvm_get_vm_type(ms, vm_type);
+}
+
+static char *x86_machine_get_kvm_type(Object *obj, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    return g_strdup(x86ms->kvm_type);
+}
+
+static void x86_machine_set_kvm_type(Object *obj, const char *value, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    g_free(x86ms->kvm_type);
+    x86ms->kvm_type = g_strdup(value);
+}
+
 static void x86_machine_initfn(Object *obj)
 {
     X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1406,6 +1426,7 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
     mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
     mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
     mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
+    mc->kvm_type = x86_kvm_type;
     x86mc->save_tsc_khz = true;
     x86mc->fwcfg_dma_enabled = true;
     nc->nmi_monitor_handler = x86_nmi;
@@ -1464,6 +1485,12 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
         NULL, NULL);
     object_class_property_set_description(oc, "sgx-epc",
         "SGX EPC device");
+
+    object_class_property_add_str(oc, X86_MACHINE_KVM_TYPE,
+                                  x86_machine_get_kvm_type,
+                                  x86_machine_set_kvm_type);
+    object_class_property_set_description(oc, X86_MACHINE_KVM_TYPE,
+                                          "KVM guest type (default, sw_protected_vm)");
 }
 
 static const TypeInfo x86_machine_info = {
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index da19ae15463a..a3d03f78cefe 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -42,6 +42,9 @@ struct X86MachineState {
 
     /*< public >*/
 
+    char *kvm_type;
+    unsigned int vm_type;
+
     /* Pointers to devices and objects: */
     ISADevice *rtc;
     FWCfgState *fw_cfg;
@@ -91,6 +94,7 @@ struct X86MachineState {
 #define X86_MACHINE_OEM_ID           "x-oem-id"
 #define X86_MACHINE_OEM_TABLE_ID     "x-oem-table-id"
 #define X86_MACHINE_BUS_LOCK_RATELIMIT  "bus-lock-ratelimit"
+#define X86_MACHINE_KVM_TYPE         "kvm-type"
 
 #define TYPE_X86_MACHINE   MACHINE_TYPE_NAME("x86")
 OBJECT_DECLARE_TYPE(X86MachineState, X86MachineClass, X86_MACHINE)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f8cc8eb1fe70..7971f0fd74b1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -154,6 +154,44 @@ static KVMMSRHandlers msr_handlers[KVM_MSR_FILTER_MAX_RANGES];
 static RateLimit bus_lock_ratelimit_ctrl;
 static int kvm_get_one_msr(X86CPU *cpu, int index, uint64_t *value);
 
+static const char* vm_type_name[] = {
+    [KVM_X86_DEFAULT_VM] = "default",
+    [KVM_X86_SW_PROTECTED_VM] = "sw-protected-vm",
+};
+
+int kvm_get_vm_type(MachineState *ms, const char *vm_type)
+{
+    X86MachineState *x86ms = X86_MACHINE(ms);
+    int kvm_type = KVM_X86_DEFAULT_VM;
+
+    if (vm_type) {
+        if (!g_ascii_strcasecmp(vm_type, "default") || !g_ascii_strcasecmp(vm_type, "")) {
+            kvm_type = KVM_X86_DEFAULT_VM;
+        } else if (!g_ascii_strcasecmp(vm_type, "sw-protected-vm")) {
+            kvm_type = KVM_X86_SW_PROTECTED_VM;
+        } else {
+            error_report("Unknown kvm-type specified '%s'", vm_type);
+            exit(1);
+        }
+    }
+
+    /*
+     * old KVM doesn't support KVM_CAP_VM_TYPES and KVM_X86_DEFAULT_VM
+     * is always supported
+     */
+    if (kvm_type == KVM_X86_DEFAULT_VM) {
+        return kvm_type;
+    }
+
+    if (!(kvm_check_extension(KVM_STATE(ms->accelerator), KVM_CAP_VM_TYPES) & BIT(kvm_type))) {
+        error_report("vm-type %s not supported by KVM", vm_type_name[kvm_type]);
+        exit(1);
+    }
+
+    x86ms->vm_type = kvm_type;
+    return kvm_type;
+}
+
 int kvm_has_pit_state2(void)
 {
     return has_pit_state2;
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index e24753abfe6a..ea3a5b174ac0 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -37,6 +37,7 @@ bool kvm_has_adjust_clock(void);
 bool kvm_has_adjust_clock_stable(void);
 bool kvm_has_exception_payload(void);
 void kvm_synchronize_all_tsc(void);
+int kvm_get_vm_type(MachineState *ms, const char *vm_type);
 void kvm_arch_reset_vcpu(X86CPU *cs);
 void kvm_arch_after_reset_vcpu(X86CPU *cpu);
 void kvm_arch_do_init_vcpu(X86CPU *cs);
-- 
2.34.1




* [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (6 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 07/19] target/i386: Implement mc->kvm_type() to get VM type Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-07-31 17:22   ` Markus Armbruster
  2023-08-01 17:21   ` David Hildenbrand
  2023-07-31 16:21 ` [RFC PATCH 09/19] i386/kvm: Create gmem fd for KVM_X86_SW_PROTECTED_VM Xiaoyao Li
                   ` (12 subsequent siblings)
  20 siblings, 2 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

From: Isaku Yamahata <isaku.yamahata@intel.com>

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 backends/hostmem.c       | 18 ++++++++++++++++++
 include/sysemu/hostmem.h |  2 +-
 qapi/qom.json            |  4 ++++
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 747e7838c031..dbdbb0aafd45 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -461,6 +461,20 @@ static void host_memory_backend_set_reserve(Object *o, bool value, Error **errp)
     }
     backend->reserve = value;
 }
+
+static bool host_memory_backend_get_private(Object *o, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+
+    return backend->private;
+}
+
+static void host_memory_backend_set_private(Object *o, bool value, Error **errp)
+{
+    HostMemoryBackend *backend = MEMORY_BACKEND(o);
+
+    backend->private = value;
+}
 #endif /* CONFIG_LINUX */
 
 static bool
@@ -541,6 +555,10 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
         host_memory_backend_get_reserve, host_memory_backend_set_reserve);
     object_class_property_set_description(oc, "reserve",
         "Reserve swap space (or huge pages) if applicable");
+    object_class_property_add_bool(oc, "private",
+        host_memory_backend_get_private, host_memory_backend_set_private);
+    object_class_property_set_description(oc, "private",
+        "Use KVM gmem private memory");
 #endif /* CONFIG_LINUX */
     /*
      * Do not delete/rename option. This option must be considered stable
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 39326f1d4f9c..d88970395618 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -65,7 +65,7 @@ struct HostMemoryBackend {
     /* protected */
     uint64_t size;
     bool merge, dump, use_canonical_path;
-    bool prealloc, is_mapped, share, reserve;
+    bool prealloc, is_mapped, share, reserve, private;
     uint32_t prealloc_threads;
     ThreadContext *prealloc_context;
     DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
diff --git a/qapi/qom.json b/qapi/qom.json
index 7f92ea43e8e1..e0b2044e3d20 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -605,6 +605,9 @@
 # @reserve: if true, reserve swap space (or huge pages) if applicable
 #     (default: true) (since 6.1)
 #
+# @private: if true, use KVM gmem private memory
+#           (default: false) (since 8.1)
+#
 # @size: size of the memory region in bytes
 #
 # @x-use-canonical-path-for-ramblock-id: if true, the canonical path
@@ -631,6 +634,7 @@
             '*prealloc-context': 'str',
             '*share': 'bool',
             '*reserve': 'bool',
+            '*private': 'bool',
             'size': 'size',
             '*x-use-canonical-path-for-ramblock-id': 'bool' } }
 
-- 
2.34.1




* [RFC PATCH 09/19] i386/kvm: Create gmem fd for KVM_X86_SW_PROTECTED_VM
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (7 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-07-31 16:21 ` [RFC PATCH 10/19] kvm: Introduce support for memory_attributes Xiaoyao Li
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

Register a memory listener for KVM_X86_SW_PROTECTED_VM. It creates gmem
for each backend that sets the private property.
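
A backend opts in via the 'private' property introduced in the previous
patch, e.g. as in the cover letter:

    $qemu -object memory-backend-ram,id=mem0,size=1G,private=on \
          -machine q35,kvm-type=sw-protected-vm,memory-backend=mem0 \
          ...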

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 include/exec/memory.h |  1 +
 target/i386/kvm/kvm.c | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index e119d3ce1a1d..ddf0e14970b0 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -814,6 +814,7 @@ struct IOMMUMemoryRegion {
 #define MEMORY_LISTENER_PRIORITY_MIN            0
 #define MEMORY_LISTENER_PRIORITY_ACCEL          10
 #define MEMORY_LISTENER_PRIORITY_DEV_BACKEND    10
+#define MEMORY_LISTENER_PRIORITY_ACCEL_HIGH     20
 
 /**
  * struct MemoryListener: callbacks structure for updates to the physical memory map
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 7971f0fd74b1..df3a5f89396e 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -37,6 +37,7 @@
 #include "hyperv-proto.h"
 
 #include "exec/gdbstub.h"
+#include "exec/ramblock.h"
 #include "qemu/host-utils.h"
 #include "qemu/main-loop.h"
 #include "qemu/ratelimit.h"
@@ -2591,8 +2592,41 @@ static void register_smram_listener(Notifier *n, void *unused)
                                  &smram_address_space, 1, "kvm-smram");
 }
 
+static void kvm_x86_sw_protected_vm_region_add(MemoryListener *listener,
+                                               MemoryRegionSection *section)
+{
+    MemoryRegion *mr = section->mr;
+    Object *owner = memory_region_owner(mr);
+    int fd;
+
+    if (owner && object_dynamic_cast(owner, TYPE_MEMORY_BACKEND) &&
+        object_property_get_bool(owner, "private", NULL) &&
+        mr->ram_block && mr->ram_block->gmem_fd < 0) {
+        struct kvm_create_guest_memfd gmem = {
+            .size = memory_region_size(mr),
+            /* TODO: to decide whether KVM_GUEST_MEMFD_ALLOW_HUGEPAGE is supported */
+            .flags = KVM_GUEST_MEMFD_ALLOW_HUGEPAGE,
+        };
+
+        fd = kvm_vm_ioctl(kvm_state, KVM_CREATE_GUEST_MEMFD, &gmem);
+        if (fd < 0) {
+            error_report("%s: error creating gmem: %s", __func__, strerror(-fd));
+            exit(1);
+        }
+        memory_region_set_gmem_fd(mr, fd);
+    }
+}
+
+static MemoryListener kvm_x86_sw_protected_vm_memory_listener = {
+    .name = "kvm_x86_sw_protected_vm_memory_listener",
+    .region_add = kvm_x86_sw_protected_vm_region_add,
+    /* Higher than KVM memory listener = 10. */
+    .priority = MEMORY_LISTENER_PRIORITY_ACCEL_HIGH,
+};
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
+    X86MachineState *x86ms = X86_MACHINE(ms);
     uint64_t identity_base = 0xfffbc000;
     uint64_t shadow_mem;
     int ret;
@@ -2617,6 +2651,10 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         return ret;
     }
 
+    if (x86ms->vm_type == KVM_X86_SW_PROTECTED_VM) {
+        memory_listener_register(&kvm_x86_sw_protected_vm_memory_listener, &address_space_memory);
+    }
+
     if (!kvm_check_extension(s, KVM_CAP_IRQ_ROUTING)) {
         error_report("kvm: KVM_CAP_IRQ_ROUTING not supported by KVM");
         return -ENOTSUP;
-- 
2.34.1




* [RFC PATCH 10/19] kvm: Introduce support for memory_attributes
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (8 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 09/19] i386/kvm: Create gmem fd for KVM_X86_SW_PROTECTED_VM Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-07-31 16:21 ` [RFC PATCH 11/19] kvm/memory: Introduce the infrastructure to set the default shared/private value Xiaoyao Li
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

Introduce helper functions to set the attributes of a range of memory
to private or shared.
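
A sketch of a hypothetical call site; gpa and size are assumed to be a
page-aligned guest physical range:

    /* convert the range to private */
    if (kvm_set_memory_attributes_private(gpa, size)) {
        /* the ioctl failed; the helper has already logged a warning */
    }

    /* ... and later back to shared */
    if (kvm_set_memory_attributes_shared(gpa, size)) {
        /* likewise */
    }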

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/sysemu/kvm.h |  3 +++
 2 files changed, 46 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 7b1818334ba7..6dd22fa4fd6f 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -105,6 +105,7 @@ bool kvm_msi_use_devid;
 bool kvm_has_guest_debug;
 static int kvm_sstep_flags;
 static bool kvm_immediate_exit;
+static uint64_t kvm_supported_memory_attributes;
 static hwaddr kvm_max_slot_size = ~0;
 
 static const KVMCapabilityInfo kvm_required_capabilites[] = {
@@ -1343,6 +1344,44 @@ void kvm_set_max_memslot_size(hwaddr max_slot_size)
     kvm_max_slot_size = max_slot_size;
 }
 
+static int kvm_set_memory_attributes(hwaddr start, hwaddr size, uint64_t attr)
+{
+    struct kvm_memory_attributes attrs;
+    int r;
+
+    attrs.attributes = attr;
+    attrs.address = start;
+    attrs.size = size;
+    attrs.flags = 0;
+
+    r = kvm_vm_ioctl(kvm_state, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
+    if (r) {
+        warn_report("%s: failed to set memory (0x%" HWADDR_PRIx "+0x%" HWADDR_PRIx ") with attr 0x%" PRIx64 " error '%s'",
+                    __func__, start, size, attr, strerror(errno));
+    }
+    return r;
+}
+
+int kvm_set_memory_attributes_private(hwaddr start, hwaddr size)
+{
+    if (!(kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
+        error_report("KVM doesn't support PRIVATE memory attribute");
+        return -EINVAL;
+    }
+
+    return kvm_set_memory_attributes(start, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
+}
+
+int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size)
+{
+    if (!(kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
+        error_report("KVM doesn't support PRIVATE memory attribute");
+        return -EINVAL;
+    }
+
+    return kvm_set_memory_attributes(start, size, 0);
+}
+
 /* Called with KVMMemoryListener.slots_lock held */
 static void kvm_set_phys_mem(KVMMemoryListener *kml,
                              MemoryRegionSection *section, bool add)
@@ -2556,6 +2595,10 @@ static int kvm_init(MachineState *ms)
     }
     s->as = g_new0(struct KVMAs, s->nr_as);
 
+    if (kvm_check_extension(s, KVM_CAP_MEMORY_ATTRIBUTES)) {
+        kvm_supported_memory_attributes = kvm_ioctl(s, KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES, 0);
+    }
+
     if (object_property_find(OBJECT(current_machine), "kvm-type")) {
         g_autofree char *kvm_type = object_property_get_str(OBJECT(current_machine),
                                                             "kvm-type",
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 115f0cca79d1..49c896d8a512 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -580,4 +580,7 @@ bool kvm_arch_cpu_check_are_resettable(void);
 bool kvm_dirty_ring_enabled(void);
 
 uint32_t kvm_dirty_ring_size(void);
+
+int kvm_set_memory_attributes_private(hwaddr start, hwaddr size);
+int kvm_set_memory_attributes_shared(hwaddr start, hwaddr size);
 #endif
-- 
2.34.1




* [RFC PATCH 11/19] kvm/memory: Introduce the infrastructure to set the default shared/private value
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (9 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 10/19] kvm: Introduce support for memory_attributes Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-07-31 16:21 ` [RFC PATCH 12/19] i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM Xiaoyao Li
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

Introduce a new flag for MemoryRegion to indicate the default attribute,
private or not, and update the attribute if the default is private.
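
For illustration, a minimal sketch of how a region is expected to be
flagged (patch 12 adds the actual call for KVM_X86_SW_PROTECTED_VM):

    if (memory_region_can_be_private(mr)) {
        /* record that this region starts out private; kvm_set_phys_mem()
         * below then applies the private attribute when the slot is
         * registered */
        memory_region_set_default_private(mr);
    }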

Originated-from: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 accel/kvm/kvm-all.c   | 10 ++++++++++
 include/exec/memory.h |  6 ++++++
 softmmu/memory.c      | 13 +++++++++++++
 3 files changed, 29 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 6dd22fa4fd6f..f9b5050b8885 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1487,6 +1487,16 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
                     strerror(-err));
             abort();
         }
+
+        if (memory_region_is_default_private(mr)) {
+            err = kvm_set_memory_attributes_private(start_addr, slot_size);
+            if (err) {
+                error_report("%s: failed to set memory attribute private: %s",
+                             __func__, strerror(-err));
+                exit(1);
+            }
+        }
+
         start_addr += slot_size;
         ram_start_offset += slot_size;
         ram += slot_size;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index ddf0e14970b0..759f797b6acd 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -235,6 +235,9 @@ typedef struct IOMMUTLBEvent {
 /* RAM is an mmap-ed named file */
 #define RAM_NAMED_FILE (1 << 9)
 
+/* RAM is default private */
+#define RAM_DEFAULT_PRIVATE     (1 << 10)
+
 static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                        IOMMUNotifierFlag flags,
                                        hwaddr start, hwaddr end,
@@ -1689,6 +1692,9 @@ bool memory_region_is_protected(MemoryRegion *mr);
  */
 bool memory_region_can_be_private(MemoryRegion *mr);
 
+void memory_region_set_default_private(MemoryRegion *mr);
+bool memory_region_is_default_private(MemoryRegion *mr);
+
 /**
  * memory_region_get_iommu: check whether a memory region is an iommu
  *
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 336c76ede660..af6aa3c1e3c9 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1860,6 +1860,19 @@ bool memory_region_can_be_private(MemoryRegion *mr)
     return mr->ram_block && mr->ram_block->gmem_fd >= 0;
 }
 
+bool memory_region_is_default_private(MemoryRegion *mr)
+{
+    return mr->ram_block && mr->ram_block->gmem_fd >= 0 &&
+        (mr->ram_block->flags & RAM_DEFAULT_PRIVATE);
+}
+
+void memory_region_set_default_private(MemoryRegion *mr)
+{
+    if (mr->ram_block && mr->ram_block->gmem_fd >= 0) {
+        mr->ram_block->flags |= RAM_DEFAULT_PRIVATE;
+    }
+}
+
 uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
 {
     uint8_t mask = mr->dirty_log_mask;
-- 
2.34.1




* [RFC PATCH 12/19] i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (10 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 11/19] kvm/memory: Introduce the infrastructure to set the default shared/private value Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-07-31 16:21 ` [RFC PATCH 13/19] physmem: replace function name with __func__ in ram_block_discard_range() Xiaoyao Li
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 target/i386/kvm/kvm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index df3a5f89396e..a96640512dbc 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2614,6 +2614,7 @@ static void kvm_x86_sw_protected_vm_region_add(MemoryListener *listener,
             exit(1);
         }
         memory_region_set_gmem_fd(mr, fd);
+        memory_region_set_default_private(mr);
     }
 }
 
-- 
2.34.1




* [RFC PATCH 13/19] physmem: replace function name with __func__ in ram_block_discard_range()
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (11 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 12/19] i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-07-31 16:21 ` [RFC PATCH 14/19] physmem: Add ram_block_convert_range Xiaoyao Li
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 softmmu/physmem.c | 34 +++++++++++++++-------------------
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 8f64128de0b5..05c981e5c18e 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3430,16 +3430,15 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
     uint8_t *host_startaddr = rb->host + start;
 
     if (!QEMU_PTR_IS_ALIGNED(host_startaddr, rb->page_size)) {
-        error_report("ram_block_discard_range: Unaligned start address: %p",
-                     host_startaddr);
+        error_report("%s: Unaligned start address: %p",
+                     __func__, host_startaddr);
         goto err;
     }
 
     if ((start + length) <= rb->max_length) {
         bool need_madvise, need_fallocate;
         if (!QEMU_IS_ALIGNED(length, rb->page_size)) {
-            error_report("ram_block_discard_range: Unaligned length: %zx",
-                         length);
+            error_report("%s: Unaligned length: %zx", __func__, length);
             goto err;
         }
 
@@ -3469,27 +3468,26 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
              * file.
              */
             if (!qemu_ram_is_shared(rb)) {
-                warn_report_once("ram_block_discard_range: Discarding RAM"
+                warn_report_once("%s: Discarding RAM"
                                  " in private file mappings is possibly"
                                  " dangerous, because it will modify the"
                                  " underlying file and will affect other"
-                                 " users of the file");
+                                 " users of the file", __func__);
             }
 
             ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                             start, length);
             if (ret) {
                 ret = -errno;
-                error_report("ram_block_discard_range: Failed to fallocate "
-                             "%s:%" PRIx64 " +%zx (%d)",
-                             rb->idstr, start, length, ret);
+                error_report("%s: Failed to fallocate %s:%" PRIx64 " +%zx (%d)",
+                             __func__, rb->idstr, start, length, ret);
                 goto err;
             }
 #else
             ret = -ENOSYS;
-            error_report("ram_block_discard_range: fallocate not available/file"
+            error_report("%s: fallocate not available/file"
                          "%s:%" PRIx64 " +%zx (%d)",
-                         rb->idstr, start, length, ret);
+                         __func__, rb->idstr, start, length, ret);
             goto err;
 #endif
         }
@@ -3507,25 +3505,23 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
             }
             if (ret) {
                 ret = -errno;
-                error_report("ram_block_discard_range: Failed to discard range "
+                error_report("%s: Failed to discard range "
                              "%s:%" PRIx64 " +%zx (%d)",
-                             rb->idstr, start, length, ret);
+                             __func__, rb->idstr, start, length, ret);
                 goto err;
             }
 #else
             ret = -ENOSYS;
-            error_report("ram_block_discard_range: MADVISE not available"
-                         "%s:%" PRIx64 " +%zx (%d)",
-                         rb->idstr, start, length, ret);
+            error_report("%s: MADVISE not available %s:%" PRIx64 " +%zx (%d)",
+                         __func__, rb->idstr, start, length, ret);
             goto err;
 #endif
         }
         trace_ram_block_discard_range(rb->idstr, host_startaddr, length,
                                       need_madvise, need_fallocate, ret);
     } else {
-        error_report("ram_block_discard_range: Overrun block '%s' (%" PRIu64
-                     "/%zx/" RAM_ADDR_FMT")",
-                     rb->idstr, start, length, rb->max_length);
+        error_report("%s: Overrun block '%s' (%" PRIu64 "/%zx/" RAM_ADDR_FMT")",
+                     __func__, rb->idstr, start, length, rb->max_length);
     }
 
 err:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [RFC PATCH 14/19] physmem: Add ram_block_convert_range
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (12 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 13/19] physmem: replace function name with __func__ in ram_block_discard_range() Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-07-31 16:21 ` [RFC PATCH 15/19] kvm: handle KVM_EXIT_MEMORY_FAULT Xiaoyao Li
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

From: Chao Peng <chao.p.peng@linux.intel.com>

This new routine adds support for converting memory between shared and
private for a gmem-fd-based private RAMBlock.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
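(Illustration only, not part of the patch: a minimal usage sketch,
assuming a RAMBlock *rb backed by gmem, with qemu/units.h included
for MiB.)

    /* Convert the first 2MB of this block to private.  The helper
     * punches a hole in the fd that is no longer in use: rb->fd for
     * shared->private, rb->gmem_fd for private->shared. */
    if (ram_block_convert_range(rb, 0, 2 * MiB, true /* shared_to_private */) < 0) {
        error_report("failed to convert range of RAMBlock '%s'", rb->idstr);
    }
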
 include/exec/cpu-common.h |  2 ++
 softmmu/physmem.c         | 61 ++++++++++++++++++++++++++++++---------
 2 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 87dc9a752c9a..558684b9f246 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -157,6 +157,8 @@ typedef int (RAMBlockIterFunc)(RAMBlock *rb, void *opaque);
 
 int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque);
 int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length);
+int ram_block_convert_range(RAMBlock *rb, uint64_t start, size_t length,
+                            bool shared_to_private);
 
 #endif
 
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 05c981e5c18e..2acc8bee5b33 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3415,15 +3415,9 @@ int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
     return ret;
 }
 
-/*
- * Unmap pages of memory from start to start+length such that
- * they a) read as 0, b) Trigger whatever fault mechanism
- * the OS provides for postcopy.
- * The pages must be unmapped by the end of the function.
- * Returns: 0 on success, none-0 on failure
- *
- */
-int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
+static int ram_block_discard_range_fd(RAMBlock *rb, uint64_t start,
+                                      size_t length, int fd)
+
 {
     int ret = -1;
 
@@ -3449,8 +3443,8 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
          *    fallocate works on hugepages and shmem
          *    shared anonymous memory requires madvise REMOVE
          */
-        need_madvise = (rb->page_size == qemu_host_page_size);
-        need_fallocate = rb->fd != -1;
+        need_madvise = (rb->page_size == qemu_host_page_size) && (rb->fd == fd);
+        need_fallocate = fd != -1;
         if (need_fallocate) {
             /* For a file, this causes the area of the file to be zero'd
              * if read, and for hugetlbfs also causes it to be unmapped
@@ -3475,7 +3469,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
                                  " users of the file", __func__);
             }
 
-            ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+            ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                             start, length);
             if (ret) {
                 ret = -errno;
@@ -3498,7 +3492,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
              * fallocate'd away).
              */
 #if defined(CONFIG_MADVISE)
-            if (qemu_ram_is_shared(rb) && rb->fd < 0) {
+            if (qemu_ram_is_shared(rb) && fd < 0) {
                 ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
             } else {
                 ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
@@ -3528,6 +3522,20 @@ err:
     return ret;
 }
 
+/*
+ * Unmap pages of memory from start to start+length such that
+ * they a) read as 0, b) Trigger whatever fault mechanism
+ * the OS provides for postcopy.
+ *
+ * The pages must be unmapped by the end of the function.
+ * Returns: 0 on success, non-0 on failure.
+ */
+int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
+{
+    return ram_block_discard_range_fd(rb, start, length, rb->fd);
+}
+
+
 bool ramblock_is_pmem(RAMBlock *rb)
 {
     return rb->flags & RAM_PMEM;
@@ -3715,3 +3723,30 @@ bool ram_block_discard_is_required(void)
     return qatomic_read(&ram_block_discard_required_cnt) ||
            qatomic_read(&ram_block_coordinated_discard_required_cnt);
 }
+
+int ram_block_convert_range(RAMBlock *rb, uint64_t start, size_t length,
+                            bool shared_to_private)
+{
+    int fd;
+
+    if (!rb || rb->gmem_fd < 0) {
+        return -1;
+    }
+
+    if (!QEMU_PTR_IS_ALIGNED(start, rb->page_size) ||
+        !QEMU_PTR_IS_ALIGNED(length, rb->page_size)) {
+        return -1;
+    }
+
+    if (length > rb->max_length) {
+        return -1;
+    }
+
+    if (shared_to_private) {
+        fd = rb->fd;
+    } else {
+        fd = rb->gmem_fd;
+    }
+
+    return ram_block_discard_range_fd(rb, start, length, fd);
+}
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [RFC PATCH 15/19] kvm: handle KVM_EXIT_MEMORY_FAULT
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (13 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 14/19] physmem: Add ram_block_convert_range Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-08-02 22:25   ` Isaku Yamahata
  2023-08-09 15:02   ` Xu Yilun
  2023-07-31 16:21 ` [RFC PATCH 16/19] trace/kvm: Add trace for page conversion between shared and private Xiaoyao Li
                   ` (5 subsequent siblings)
  20 siblings, 2 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

From: Chao Peng <chao.p.peng@linux.intel.com>

Currently, only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when
KVM_EXIT_MEMORY_FAULT happens. It indicates that userspace needs to do
a memory conversion on the RAMBlock to turn the memory into the desired
attribute, i.e., private or shared.

Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has a
gmem memory backend.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
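(Illustration only, not part of the patch: the exit information
consumed here, with the layout assumed from the KVM gmem series; the
final uapi may differ.)

    /* sketch of the <linux/kvm.h> bits this patch relies on */
    #define KVM_MEMORY_EXIT_FLAG_PRIVATE    (1ULL << 3)

    struct kvm_run {
        /* ... */
        union {
            /* KVM_EXIT_MEMORY_FAULT */
            struct {
                __u64 flags;    /* KVM_MEMORY_EXIT_FLAG_PRIVATE or 0 */
                __u64 gpa;
                __u64 size;
            } memory;
            /* ... */
        };
    };
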
 accel/kvm/kvm-all.c | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f9b5050b8885..72d50b923bf2 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3040,6 +3040,48 @@ static void kvm_eat_signals(CPUState *cpu)
     } while (sigismember(&chkset, SIG_IPI));
 }
 
+static int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
+{
+    MemoryRegionSection section;
+    void *addr;
+    RAMBlock *rb;
+    ram_addr_t offset;
+    int ret = -1;
+
+    section = memory_region_find(get_system_memory(), start, size);
+    if (!section.mr) {
+        return ret;
+    }
+
+    if (memory_region_can_be_private(section.mr)) {
+        if (to_private) {
+            ret = kvm_set_memory_attributes_private(start, size);
+        } else {
+            ret = kvm_set_memory_attributes_shared(start, size);
+        }
+
+        if (ret) {
+            return ret;
+        }
+
+        addr = memory_region_get_ram_ptr(section.mr) +
+               section.offset_within_region;
+        rb = qemu_ram_block_from_host(addr, false, &offset);
+        /*
+         * With KVM_SET_MEMORY_ATTRIBUTES by kvm_set_memory_attributes(),
+         * the operation on the underlying file descriptor is only for
+         * releasing unnecessary pages.
+         */
+        ram_block_convert_range(rb, offset, size, to_private);
+    } else {
+        warn_report("Convert non guest-memfd backed memory region (0x%"HWADDR_PRIx" ,+ 0x%"HWADDR_PRIx") to %s",
+                    start, size, to_private ? "private" : "shared");
+    }
+
+    memory_region_unref(section.mr);
+    return ret;
+}
+
 int kvm_cpu_exec(CPUState *cpu)
 {
     struct kvm_run *run = cpu->kvm_run;
@@ -3198,6 +3240,16 @@ int kvm_cpu_exec(CPUState *cpu)
                 break;
             }
             break;
+        case KVM_EXIT_MEMORY_FAULT:
+            if (run->memory.flags & ~KVM_MEMORY_EXIT_FLAG_PRIVATE) {
+                error_report("KVM_EXIT_MEMORY_FAULT: Unknown flag 0x%" PRIx64,
+                             (uint64_t)run->memory.flags);
+                ret = -1;
+                break;
+            }
+            ret = kvm_convert_memory(run->memory.gpa, run->memory.size,
+                                     run->memory.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE);
+            break;
         default:
             DPRINTF("kvm_arch_handle_exit\n");
             ret = kvm_arch_handle_exit(cpu, run);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [RFC PATCH 16/19] trace/kvm: Add trace for page conversion between shared and private
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (14 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 15/19] kvm: handle KVM_EXIT_MEMORY_FAULT Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-07-31 16:21 ` [RFC PATCH 17/19] pci-host/q35: Move PAM initialization above SMRAM initialization Xiaoyao Li
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

From: Isaku Yamahata <isaku.yamahata@intel.com>

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
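(Illustration only, not part of the patch: with this tracepoint
enabled, the log would contain lines like the following; the values
are made up.)

    kvm_convert_memory start 0x7c000 size 0x1000 shared_to_private
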
 accel/kvm/kvm-all.c    | 1 +
 accel/kvm/trace-events | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 72d50b923bf2..c9f3aab5e587 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3048,6 +3048,7 @@ static int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
     ram_addr_t offset;
     int ret = -1;
 
+    trace_kvm_convert_memory(start, size, to_private ? "shared_to_private" : "private_to_shared");
     section = memory_region_find(get_system_memory(), start, size);
     if (!section.mr) {
         return ret;
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 80694683acea..4935c9c5cf0b 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -25,4 +25,4 @@ kvm_dirty_ring_reaper(const char *s) "%s"
 kvm_dirty_ring_reap(uint64_t count, int64_t t) "reaped %"PRIu64" pages (took %"PRIi64" us)"
 kvm_dirty_ring_reaper_kick(const char *reason) "%s"
 kvm_dirty_ring_flush(int finished) "%d"
-
+kvm_convert_memory(uint64_t start, uint64_t size, const char *msg) "start 0x%" PRIx64 " size 0x%" PRIx64 " %s"
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [RFC PATCH 17/19] pci-host/q35: Move PAM initialization above SMRAM initialization
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (15 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 16/19] trace/kvm: Add trace for page conversion between shared and private Xiaoyao Li
@ 2023-07-31 16:21 ` Xiaoyao Li
  2023-07-31 16:22 ` [RFC PATCH 18/19] q35: Introduce smm_ranges property for q35-pci-host Xiaoyao Li
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:21 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

From: Isaku Yamahata <isaku.yamahata@intel.com>

In mch_realize(), process PAM initialization before SMRAM initialization so
that a later patch can skip all the SMRAM-related setup with a single check.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 hw/pci-host/q35.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 91c46df9ae31..ac1518a94ee4 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -575,6 +575,16 @@ static void mch_realize(PCIDevice *d, Error **errp)
     /* setup pci memory mapping */
     pc_pci_as_mapping_init(mch->system_memory, mch->pci_address_space);
 
+    /* PAM */
+    init_pam(&mch->pam_regions[0], OBJECT(mch), mch->ram_memory,
+             mch->system_memory, mch->pci_address_space,
+             PAM_BIOS_BASE, PAM_BIOS_SIZE);
+    for (i = 0; i < ARRAY_SIZE(mch->pam_regions) - 1; ++i) {
+        init_pam(&mch->pam_regions[i + 1], OBJECT(mch), mch->ram_memory,
+                 mch->system_memory, mch->pci_address_space,
+                 PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
+    }
+
     /* if *disabled* show SMRAM to all CPUs */
     memory_region_init_alias(&mch->smram_region, OBJECT(mch), "smram-region",
                              mch->pci_address_space, MCH_HOST_BRIDGE_SMRAM_C_BASE,
@@ -641,15 +651,6 @@ static void mch_realize(PCIDevice *d, Error **errp)
 
     object_property_add_const_link(qdev_get_machine(), "smram",
                                    OBJECT(&mch->smram));
-
-    init_pam(&mch->pam_regions[0], OBJECT(mch), mch->ram_memory,
-             mch->system_memory, mch->pci_address_space,
-             PAM_BIOS_BASE, PAM_BIOS_SIZE);
-    for (i = 0; i < ARRAY_SIZE(mch->pam_regions) - 1; ++i) {
-        init_pam(&mch->pam_regions[i + 1], OBJECT(mch), mch->ram_memory,
-                 mch->system_memory, mch->pci_address_space,
-                 PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
-    }
 }
 
 uint64_t mch_mcfg_base(void)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [RFC PATCH 18/19] q35: Introduce smm_ranges property for q35-pci-host
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (16 preceding siblings ...)
  2023-07-31 16:21 ` [RFC PATCH 17/19] pci-host/q35: Move PAM initialization above SMRAM initialization Xiaoyao Li
@ 2023-07-31 16:22 ` Xiaoyao Li
  2023-07-31 16:22 ` [RFC PATCH 19/19] i386: Disable SMM mode for X86_SW_PROTECTED_VM Xiaoyao Li
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:22 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

From: Isaku Yamahata <isaku.yamahata@linux.intel.com>

Add a q35 property to check whether or not SMM ranges, e.g. SMRAM, TSEG,
etc., exist for the target platform.  TDX doesn't support SMM and doesn't
play nice with QEMU modifying the related guest memory ranges.

Signed-off-by: Isaku Yamahata <isaku.yamahata@linux.intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
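(Illustration only, not part of the patch: in this series the machine
code sets the property from x86_machine_is_smm_enabled(); since it is a
regular qdev property on the host bridge, it could in principle also be
set manually, assuming the canonical q35-pcihost device name.)

    -global q35-pcihost.smm-ranges=off
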
 hw/i386/pc_q35.c          |  3 ++-
 hw/pci-host/q35.c         | 42 +++++++++++++++++++++++++++------------
 include/hw/i386/pc.h      |  1 +
 include/hw/pci-host/q35.h |  1 +
 4 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index dc27a9e223a2..73eb3bc5e826 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -232,8 +232,9 @@ static void pc_q35_init(MachineState *machine)
                             x86ms->above_4g_mem_size, NULL);
     object_property_set_bool(phb, PCI_HOST_BYPASS_IOMMU,
                              pcms->default_bus_bypass_iommu, NULL);
+    object_property_set_bool(phb, PCI_HOST_PROP_SMM_RANGES,
+                             x86_machine_is_smm_enabled(x86ms), NULL);
     sysbus_realize_and_unref(SYS_BUS_DEVICE(phb), &error_fatal);
-
     /* pci */
     host_bus = PCI_BUS(qdev_get_child_bus(DEVICE(phb), "pcie.0"));
     pcms->bus = host_bus;
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index ac1518a94ee4..f5c01f7080f9 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -186,6 +186,8 @@ static Property q35_host_props[] = {
                      mch.below_4g_mem_size, 0),
     DEFINE_PROP_SIZE(PCI_HOST_ABOVE_4G_MEM_SIZE, Q35PCIHost,
                      mch.above_4g_mem_size, 0),
+    DEFINE_PROP_BOOL(PCI_HOST_PROP_SMM_RANGES, Q35PCIHost,
+                     mch.has_smm_ranges, true),
     DEFINE_PROP_BOOL("x-pci-hole64-fix", Q35PCIHost, pci_hole64_fix, true),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -221,6 +223,7 @@ static void q35_host_initfn(Object *obj)
     /* mch's object_initialize resets the default value, set it again */
     qdev_prop_set_uint64(DEVICE(s), PCI_HOST_PROP_PCI_HOLE64_SIZE,
                          Q35_PCI_HOST_HOLE64_SIZE_DEFAULT);
+
     object_property_add(obj, PCI_HOST_PROP_PCI_HOLE_START, "uint32",
                         q35_host_get_pci_hole_start,
                         NULL, NULL, NULL);
@@ -483,6 +486,10 @@ static void mch_write_config(PCIDevice *d,
         mch_update_pciexbar(mch);
     }
 
+    if (!mch->has_smm_ranges) {
+        return;
+    }
+
     if (ranges_overlap(address, len, MCH_HOST_BRIDGE_SMRAM,
                        MCH_HOST_BRIDGE_SMRAM_SIZE)) {
         mch_update_smram(mch);
@@ -501,10 +508,13 @@ static void mch_write_config(PCIDevice *d,
 static void mch_update(MCHPCIState *mch)
 {
     mch_update_pciexbar(mch);
+
     mch_update_pam(mch);
-    mch_update_smram(mch);
-    mch_update_ext_tseg_mbytes(mch);
-    mch_update_smbase_smram(mch);
+    if (mch->has_smm_ranges) {
+        mch_update_smram(mch);
+        mch_update_ext_tseg_mbytes(mch);
+        mch_update_smbase_smram(mch);
+    }
 
     /*
      * pci hole goes from end-of-low-ram to io-apic.
@@ -545,19 +555,21 @@ static void mch_reset(DeviceState *qdev)
     pci_set_quad(d->config + MCH_HOST_BRIDGE_PCIEXBAR,
                  MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT);
 
-    d->config[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_DEFAULT;
-    d->config[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_DEFAULT;
-    d->wmask[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_WMASK;
-    d->wmask[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_WMASK;
+    if (mch->has_smm_ranges) {
+        d->config[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_DEFAULT;
+        d->config[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_DEFAULT;
+        d->wmask[MCH_HOST_BRIDGE_SMRAM] = MCH_HOST_BRIDGE_SMRAM_WMASK;
+        d->wmask[MCH_HOST_BRIDGE_ESMRAMC] = MCH_HOST_BRIDGE_ESMRAMC_WMASK;
 
-    if (mch->ext_tseg_mbytes > 0) {
-        pci_set_word(d->config + MCH_HOST_BRIDGE_EXT_TSEG_MBYTES,
-                     MCH_HOST_BRIDGE_EXT_TSEG_MBYTES_QUERY);
+        if (mch->ext_tseg_mbytes > 0) {
+            pci_set_word(d->config + MCH_HOST_BRIDGE_EXT_TSEG_MBYTES,
+                        MCH_HOST_BRIDGE_EXT_TSEG_MBYTES_QUERY);
+        }
+
+        d->config[MCH_HOST_BRIDGE_F_SMBASE] = 0;
+        d->wmask[MCH_HOST_BRIDGE_F_SMBASE] = 0xff;
     }
 
-    d->config[MCH_HOST_BRIDGE_F_SMBASE] = 0;
-    d->wmask[MCH_HOST_BRIDGE_F_SMBASE] = 0xff;
-
     mch_update(mch);
 }
 
@@ -585,6 +597,10 @@ static void mch_realize(PCIDevice *d, Error **errp)
                  PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
     }
 
+    if (!mch->has_smm_ranges) {
+        return;
+    }
+
     /* if *disabled* show SMRAM to all CPUs */
     memory_region_init_alias(&mch->smram_region, OBJECT(mch), "smram-region",
                              mch->pci_address_space, MCH_HOST_BRIDGE_SMRAM_C_BASE,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index c98d628a76f3..3c9ced22cea9 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -156,6 +156,7 @@ void pc_guest_info_init(PCMachineState *pcms);
 #define PCI_HOST_PROP_PCI_HOLE64_SIZE  "pci-hole64-size"
 #define PCI_HOST_BELOW_4G_MEM_SIZE     "below-4g-mem-size"
 #define PCI_HOST_ABOVE_4G_MEM_SIZE     "above-4g-mem-size"
+#define PCI_HOST_PROP_SMM_RANGES       "smm-ranges"
 
 
 void pc_pci_as_mapping_init(MemoryRegion *system_memory,
diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
index 1d98bbfe0dd2..930af3870e8a 100644
--- a/include/hw/pci-host/q35.h
+++ b/include/hw/pci-host/q35.h
@@ -50,6 +50,7 @@ struct MCHPCIState {
     MemoryRegion tseg_blackhole, tseg_window;
     MemoryRegion smbase_blackhole, smbase_window;
     bool has_smram_at_smbase;
+    bool has_smm_ranges;
     Range pci_hole;
     uint64_t below_4g_mem_size;
     uint64_t above_4g_mem_size;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [RFC PATCH 19/19] i386: Disable SMM mode for X86_SW_PROTECTED_VM
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (17 preceding siblings ...)
  2023-07-31 16:22 ` [RFC PATCH 18/19] q35: Introduce smm_ranges property for q35-pci-host Xiaoyao Li
@ 2023-07-31 16:22 ` Xiaoyao Li
  2023-08-02 22:27   ` Isaku Yamahata
  2023-07-31 16:51 ` [RFC PATCH 00/19] QEMU gmem implementation Daniel P. Berrangé
  2023-07-31 17:10 ` Isaku Yamahata
  20 siblings, 1 reply; 53+ messages in thread
From: Xiaoyao Li @ 2023-07-31 16:22 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, xiaoyao.li,
	qemu-devel, kvm

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
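(Illustration only, not part of the patch: the expected behavior when
SMM is explicitly requested for a protected VM, reusing the kvm-type
syntax from the cover letter; the error prefix depends on the binary
name.)

    $qemu -machine q35,smm=on,kvm-type=sw-protected-vm ...
    qemu-system-x86_64: X86_SW_PROTECTED_VM doesn't support SMM
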
 target/i386/kvm/kvm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a96640512dbc..62f237068a3a 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2654,6 +2654,13 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 
     if (x86ms->vm_type == KVM_X86_SW_PROTECTED_VM) {
         memory_listener_register(&kvm_x86_sw_protected_vm_memory_listener, &address_space_memory);
+
+        if (x86ms->smm == ON_OFF_AUTO_AUTO) {
+            x86ms->smm = ON_OFF_AUTO_OFF;
+        } else if (x86ms->smm == ON_OFF_AUTO_ON) {
+            error_report("X86_SW_PROTECTED_VM doesn't support SMM");
+            return -EINVAL;
+        }
     }
 
     if (!kvm_check_extension(s, KVM_CAP_IRQ_ROUTING)) {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 00/19] QEMU gmem implementation
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (18 preceding siblings ...)
  2023-07-31 16:22 ` [RFC PATCH 19/19] i386: Disable SMM mode for X86_SW_PROTECTED_VM Xiaoyao Li
@ 2023-07-31 16:51 ` Daniel P. Berrangé
  2023-08-01  1:45   ` Xiaoyao Li
  2023-07-31 17:10 ` Isaku Yamahata
  20 siblings, 1 reply; 53+ messages in thread
From: Daniel P. Berrangé @ 2023-07-31 16:51 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On Mon, Jul 31, 2023 at 12:21:42PM -0400, Xiaoyao Li wrote:
> This is the first RFC version of enabling KVM gmem[1] as the backend for
> private memory of KVM_X86_PROTECTED_VM.
> 
> It adds the support to create a specific KVM_X86_PROTECTED_VM type VM,
> and introduces 'private' property for memory backend. When the vm type
> is KVM_X86_PROTECTED_VM and memory backend has private enabled as below,
> it will call KVM gmem ioctl to allocate private memory for the backend.
> 
>     $qemu -object memory-backend-ram,id=mem0,size=1G,private=on \
>           -machine q35,kvm-type=sw-protected-vm,memory-backend=mem0 \
> 	  ...
> 
> Unfortunately this patch series fails the boot of OVMF at very early
> stage due to triple fault because KVM doesn't support emulate string IO
> to private memory. We leave it as an open to be discussed.
> 
> There are following design opens that need to be discussed:
> 
> 1. how to determine the vm type?
> 
>    a. like this series, specify the vm type via machine property
>       'kvm-type'
>    b. check the memory backend, if any backend has 'private' property
>       set, the vm-type is set to KVM_X86_PROTECTED_VM.
> 
> 2. whether 'private' property is needed if we choose 1.b as design 
> 
>    with 1.b, QEMU can decide whether the memory region needs to be
>    private (allocates gmem fd for it) or not, on its own.
> 
> 3. What is KVM_X86_SW_PROTECTED_VM going to look like? What's the
>    purose of it and what's the requirement on it. I think it's the
>    questions for KVM folks than QEMU folks.
> 
> Any other idea/open/question is welcomed.
> 
> 
> Beside, TDX QEMU implemetation is based on this series to provide
> private gmem for TD private memory, which can be found at [2].
> And it can work corresponding KVM [3] to boot TDX guest. 

We already have a general purpose configuration mechanism for
confidential guests.  The -machine argument has a property
confidential-guest-support=$OBJECT-ID, for pointing to an
object that implements the TYPE_CONFIDENTIAL_GUEST_SUPPORT
interface in QEMU. This is implemented with SEV, PPC PEF
mode, and s390 protvirt.

I would expect TDX to follow this same design ie

    qemu-system-x86_64 \
      -object tdx-guest,id=tdx0,..... \
      -machine q35,confidential-guest-support=tdx0 \
      ...

and not require inventing the new 'kvm-type' attribute at least.

For the memory backend though, I'm not so sure - possibly that
might be something that still wants an extra property to identify
the type of memory to allocate, since we use memory-backend-ram
for a variety of use cases.  Or it could be an entirely new object
type such as "memory-backend-gmem"
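
For example (sketch only, the object name and its properties are
hypothetical):

    qemu-system-x86_64 \
      -object memory-backend-gmem,id=mem0,size=1G \
      -machine q35,confidential-guest-support=tdx0,memory-backend=mem0 \
      ...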


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 00/19] QEMU gmem implementation
  2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implementation Xiaoyao Li
                   ` (19 preceding siblings ...)
  2023-07-31 16:51 ` [RFC PATCH 00/19] QEMU gmem implementation Daniel P. Berrangé
@ 2023-07-31 17:10 ` Isaku Yamahata
  2023-08-01  1:55   ` Xiaoyao Li
  20 siblings, 1 reply; 53+ messages in thread
From: Isaku Yamahata @ 2023-07-31 17:10 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On Mon, Jul 31, 2023 at 12:21:42PM -0400,
Xiaoyao Li <xiaoyao.li@intel.com> wrote:

> This is the first RFC version of enabling KVM gmem[1] as the backend for
> private memory of KVM_X86_PROTECTED_VM.
> 
> It adds the support to create a specific KVM_X86_PROTECTED_VM type VM,
> and introduces 'private' property for memory backend. When the vm type
> is KVM_X86_PROTECTED_VM and memory backend has private enabled as below,
> it will call KVM gmem ioctl to allocate private memory for the backend.
> 
>     $qemu -object memory-backend-ram,id=mem0,size=1G,private=on \
>           -machine q35,kvm-type=sw-protected-vm,memory-backend=mem0 \
> 	  ...
> 
> Unfortunately this patch series fails the boot of OVMF at very early
> stage due to triple fault because KVM doesn't support emulate string IO
> to private memory. We leave it as an open to be discussed.
> 
> There are following design opens that need to be discussed:
> 
> 1. how to determine the vm type?
> 
>    a. like this series, specify the vm type via machine property
>       'kvm-type'
>    b. check the memory backend, if any backend has 'private' property
>       set, the vm-type is set to KVM_X86_PROTECTED_VM.

Hi Xiaoyao.  Because QEMU already has confidential guest support, we should
utilize it.  Say,
qemu  \
  -object sw-protected, id=swp0, <more options for KVM_X86_SW_PROTECTED_VM> \
  -machine confidential-guest-support=swp0



> 2. whether 'private' property is needed if we choose 1.b as design 
> 
>    with 1.b, QEMU can decide whether the memory region needs to be
>    private (allocates gmem fd for it) or not, on its own.


The memory region property (how to create a KVM memory slot) should be
independent of the underlying VM type.  Some (e.g. TDX) may require a KVM
private memory slot, some may not.  Leave the decision to the vm type
backend; it can use a QEMU memory listener.
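
For illustration (sketch only, all names are hypothetical), such a
backend could do:

    static void swp_region_add(MemoryListener *listener,
                               MemoryRegionSection *section)
    {
        /* e.g. create a gmem fd / private memslot for this section */
    }

    static MemoryListener swp_memory_listener = {
        .name = "sw-protected",
        .region_add = swp_region_add,
    };

    memory_listener_register(&swp_memory_listener, &address_space_memory);
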
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem
  2023-07-31 16:21 ` [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem Xiaoyao Li
@ 2023-07-31 17:22   ` Markus Armbruster
  2023-08-01 14:54     ` Xiaoyao Li
  2023-08-01 14:57     ` Daniel P. Berrangé
  2023-08-01 17:21   ` David Hildenbrand
  1 sibling, 2 replies; 53+ messages in thread
From: Markus Armbruster @ 2023-07-31 17:22 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

Xiaoyao Li <xiaoyao.li@intel.com> writes:

> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

[...]

> diff --git a/qapi/qom.json b/qapi/qom.json
> index 7f92ea43e8e1..e0b2044e3d20 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -605,6 +605,9 @@
>  # @reserve: if true, reserve swap space (or huge pages) if applicable
>  #     (default: true) (since 6.1)
>  #
> +# @private: if true, use KVM gmem private memory
> +#           (default: false) (since 8.1)
> +#

Please format like

   # @private: if true, use KVM gmem private memory (default: false)
   #     (since 8.1)

to blend in with recent commit a937b6aa739 (qapi: Reformat doc comments
to conform to current conventions).

>  # @size: size of the memory region in bytes
>  #
>  # @x-use-canonical-path-for-ramblock-id: if true, the canonical path
> @@ -631,6 +634,7 @@
>              '*prealloc-context': 'str',
>              '*share': 'bool',
>              '*reserve': 'bool',
> +            '*private': 'bool',
>              'size': 'size',
>              '*x-use-canonical-path-for-ramblock-id': 'bool' } }



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private()
  2023-07-31 16:21 ` [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private() Xiaoyao Li
@ 2023-07-31 21:23   ` Peter Xu
  2023-07-31 21:33     ` Michael S. Tsirkin
  2023-07-31 21:34     ` Sean Christopherson
  2023-08-01 16:48   ` Claudio Fontana
  1 sibling, 2 replies; 53+ messages in thread
From: Peter Xu @ 2023-07-31 21:23 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Chao Peng, Michael Roth, isaku.yamahata, qemu-devel, kvm

On Mon, Jul 31, 2023 at 12:21:46PM -0400, Xiaoyao Li wrote:
> +bool memory_region_can_be_private(MemoryRegion *mr)
> +{
> +    return mr->ram_block && mr->ram_block->gmem_fd >= 0;
> +}

This is not really MAP_PRIVATE, am I right?  If so, is there still chance
we rename it (it seems to be also in the kernel proposal all across..)?

I worry it can be very confusing in the future against MAP_PRIVATE /
MAP_SHARED otherwise.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private()
  2023-07-31 21:23   ` Peter Xu
@ 2023-07-31 21:33     ` Michael S. Tsirkin
  2023-07-31 21:34     ` Sean Christopherson
  1 sibling, 0 replies; 53+ messages in thread
From: Michael S. Tsirkin @ 2023-07-31 21:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: Xiaoyao Li, Paolo Bonzini, Sean Christopherson,
	David Hildenbrand, Igor Mammedov, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Chao Peng, Michael Roth, isaku.yamahata, qemu-devel, kvm

On Mon, Jul 31, 2023 at 05:23:55PM -0400, Peter Xu wrote:
> On Mon, Jul 31, 2023 at 12:21:46PM -0400, Xiaoyao Li wrote:
> > +bool memory_region_can_be_private(MemoryRegion *mr)
> > +{
> > +    return mr->ram_block && mr->ram_block->gmem_fd >= 0;
> > +}
> 
> This is not really MAP_PRIVATE, am I right?  If so, is there still chance
> we rename it (it seems to be also in the kernel proposal all across..)?
> 
> I worry it can be very confusing in the future against MAP_PRIVATE /
> MAP_SHARED otherwise.
> 
> Thanks,

It is - the kernel calls this "map shared" but unfortunately QEMU calls
this just "shared". Maybe e.g. "protect_shared" would be a better name
for QEMU?

-- 
MST



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private()
  2023-07-31 21:23   ` Peter Xu
  2023-07-31 21:33     ` Michael S. Tsirkin
@ 2023-07-31 21:34     ` Sean Christopherson
  2023-07-31 21:36       ` Michael S. Tsirkin
  1 sibling, 1 reply; 53+ messages in thread
From: Sean Christopherson @ 2023-07-31 21:34 UTC (permalink / raw)
  To: Peter Xu
  Cc: Xiaoyao Li, Paolo Bonzini, David Hildenbrand, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Marcelo Tosatti, Markus Armbruster, Eric Blake,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Chao Peng, Michael Roth, isaku.yamahata, qemu-devel, kvm

On Mon, Jul 31, 2023, Peter Xu wrote:
> On Mon, Jul 31, 2023 at 12:21:46PM -0400, Xiaoyao Li wrote:
> > +bool memory_region_can_be_private(MemoryRegion *mr)
> > +{
> > +    return mr->ram_block && mr->ram_block->gmem_fd >= 0;
> > +}
> 
> This is not really MAP_PRIVATE, am I right?  If so, is there still chance
> we rename it (it seems to be also in the kernel proposal all across..)?

Yes and yes.

> I worry it can be very confusing in the future against MAP_PRIVATE /
> MAP_SHARED otherwise.

Heh, it's already quite confusing at times.  I'm definitely open to naming that
doesn't collide with MAP_{PRIVATE,SHARED}, especially if someone can come with a
naming scheme that includes a succinct way to describe memory that is shared
between two or more VMs, but is accessible to _only_ those VMs.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private()
  2023-07-31 21:34     ` Sean Christopherson
@ 2023-07-31 21:36       ` Michael S. Tsirkin
  2023-08-01  0:21         ` Peter Xu
  0 siblings, 1 reply; 53+ messages in thread
From: Michael S. Tsirkin @ 2023-07-31 21:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Xu, Xiaoyao Li, Paolo Bonzini, David Hildenbrand,
	Igor Mammedov, Marcel Apfelbaum, Richard Henderson,
	Marcelo Tosatti, Markus Armbruster, Eric Blake,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Chao Peng, Michael Roth, isaku.yamahata, qemu-devel, kvm

On Mon, Jul 31, 2023 at 02:34:22PM -0700, Sean Christopherson wrote:
> On Mon, Jul 31, 2023, Peter Xu wrote:
> > On Mon, Jul 31, 2023 at 12:21:46PM -0400, Xiaoyao Li wrote:
> > > +bool memory_region_can_be_private(MemoryRegion *mr)
> > > +{
> > > +    return mr->ram_block && mr->ram_block->gmem_fd >= 0;
> > > +}
> > 
> > This is not really MAP_PRIVATE, am I right?  If so, is there still chance
> > we rename it (it seems to be also in the kernel proposal all across..)?
> 
> Yes and yes.
> 
> > I worry it can be very confusing in the future against MAP_PRIVATE /
> > MAP_SHARED otherwise.
> 
> Heh, it's already quite confusing at times.  I'm definitely open to naming that
> doesn't collide with MAP_{PRIVATE,SHARED}, especially if someone can come with a
> naming scheme that includes a succinct way to describe memory that is shared
> between two or more VMs, but is accessible to _only_ those VMs.

Standard solution is a technology specific prefix.
protect_shared, encrypt_shared etc.

-- 
MST



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private()
  2023-07-31 21:36       ` Michael S. Tsirkin
@ 2023-08-01  0:21         ` Peter Xu
  2023-08-01 16:23           ` Sean Christopherson
  0 siblings, 1 reply; 53+ messages in thread
From: Peter Xu @ 2023-08-01  0:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sean Christopherson, Xiaoyao Li, Paolo Bonzini,
	David Hildenbrand, Igor Mammedov, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Chao Peng, Michael Roth, isaku.yamahata, qemu-devel, kvm

On Mon, Jul 31, 2023 at 05:36:37PM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 31, 2023 at 02:34:22PM -0700, Sean Christopherson wrote:
> > On Mon, Jul 31, 2023, Peter Xu wrote:
> > > On Mon, Jul 31, 2023 at 12:21:46PM -0400, Xiaoyao Li wrote:
> > > > +bool memory_region_can_be_private(MemoryRegion *mr)
> > > > +{
> > > > +    return mr->ram_block && mr->ram_block->gmem_fd >= 0;
> > > > +}
> > > 
> > > This is not really MAP_PRIVATE, am I right?  If so, is there still chance
> > > we rename it (it seems to be also in the kernel proposal all across..)?
> > 
> > Yes and yes.
> > 
> > > I worry it can be very confusing in the future against MAP_PRIVATE /
> > > MAP_SHARED otherwise.
> > 
> > Heh, it's already quite confusing at times.  I'm definitely open to naming that
> > doesn't collide with MAP_{PRIVATE,SHARED}, especially if someone can come with a
> > naming scheme that includes a succinct way to describe memory that is shared
> > between two or more VMs, but is accessible to _only_ those VMs.
> 
> Standard solution is a technology specific prefix.
> protect_shared, encrypt_shared etc.

Agreed, a prefix could definitely help (if nothing better comes along..).
If e.g. "encrypted" is too long to be applied everywhere in var names and
functions, maybe it can also be "enc_{private|shared}".

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 00/19] QEMU gmem implementation
  2023-07-31 16:51 ` [RFC PATCH 00/19] QEMU gmem implementation Daniel P. Berrangé
@ 2023-08-01  1:45   ` Xiaoyao Li
  2023-08-10 15:58     ` Michael Roth via
  0 siblings, 1 reply; 53+ messages in thread
From: Xiaoyao Li @ 2023-08-01  1:45 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On 8/1/2023 12:51 AM, Daniel P. Berrangé wrote:
> On Mon, Jul 31, 2023 at 12:21:42PM -0400, Xiaoyao Li wrote:
>> This is the first RFC version of enabling KVM gmem[1] as the backend for
>> private memory of KVM_X86_PROTECTED_VM.
>>
>> It adds the support to create a specific KVM_X86_PROTECTED_VM type VM,
>> and introduces 'private' property for memory backend. When the vm type
>> is KVM_X86_PROTECTED_VM and memory backend has private enabled as below,
>> it will call KVM gmem ioctl to allocate private memory for the backend.
>>
>>      $qemu -object memory-backend-ram,id=mem0,size=1G,private=on \
>>            -machine q35,kvm-type=sw-protected-vm,memory-backend=mem0 \
>> 	  ...
>>
>> Unfortunately this patch series fails the boot of OVMF at very early
>> stage due to triple fault because KVM doesn't support emulate string IO
>> to private memory. We leave it as an open to be discussed.
>>
>> There are following design opens that need to be discussed:
>>
>> 1. how to determine the vm type?
>>
>>     a. like this series, specify the vm type via machine property
>>        'kvm-type'
>>     b. check the memory backend, if any backend has 'private' property
>>        set, the vm-type is set to KVM_X86_PROTECTED_VM.
>>
>> 2. whether 'private' property is needed if we choose 1.b as design
>>
>>     with 1.b, QEMU can decide whether the memory region needs to be
>>     private (allocates gmem fd for it) or not, on its own.
>>
>> 3. What is KVM_X86_SW_PROTECTED_VM going to look like? What's the
>>     purose of it and what's the requirement on it. I think it's the
>>     questions for KVM folks than QEMU folks.
>>
>> Any other idea/open/question is welcomed.
>>
>>
>> Beside, TDX QEMU implemetation is based on this series to provide
>> private gmem for TD private memory, which can be found at [2].
>> And it can work corresponding KVM [3] to boot TDX guest.
> 
> We already have a general purpose configuration mechanism for
> confidential guests.  The -machine argument has a property
> confidential-guest-support=$OBJECT-ID, for pointing to an
> object that implements the TYPE_CONFIDENTIAL_GUEST_SUPPORT
> interface in QEMU. This is implemented with SEV, PPC PEF
> mode, and s390 protvirt.
> 
> I would expect TDX to follow this same design ie
> 
>      qemu-system-x86_64 \
>        -object tdx-guest,id=tdx0,..... \
>        -machine q35,confidential-guest-support=tdx0 \
>        ...
> 
> and not require inventing the new 'kvm-type' attribute at least.

yes.

TDX is initialized exactly as above.

This RFC series introduces the 'kvm-type' property for
KVM_X86_SW_PROTECTED_VM. It's my fault that I forgot to list the option
of introducing a sw_protected_vm object with the
CONFIDENTIAL_GUEST_SUPPORT interface. Thanks to Isaku for raising it:
https://lore.kernel.org/qemu-devel/20230731171041.GB1807130@ls.amr.corp.intel.com/

we can specify KVM_X86_SW_PROTECTED_VM this way:

qemu  \
   -object sw-protected,id=swp0,... \
   -machine confidential-guest-support=swp0 \
   ...

> For the memory backend though, I'm not so sure - possibly that
> might be something that still wants an extra property to identify
> the type of memory to allocate, since we use memory-backend-ram
> for a variety of use cases.  Or it could be an entirely new object
> type such as "memory-backend-gmem"

What I want to discuss is whether to provide an interface that lets
users configure which memory is/can be private. Alternatively, QEMU
could do it internally: if a user wants a confidential guest, QEMU
allocates private gmem for normal RAM automatically.
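
For example (sketch only, reusing the object names from earlier in this
thread), the command line could then be as simple as:

    qemu -object memory-backend-ram,id=mem0,size=1G \
         -object sw-protected,id=swp0 \
         -machine q35,confidential-guest-support=swp0,memory-backend=mem0 \
         ...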




^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 00/19] QEMU gmem implementation
  2023-07-31 17:10 ` Isaku Yamahata
@ 2023-08-01  1:55   ` Xiaoyao Li
  0 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-08-01  1:55 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, qemu-devel, kvm

On 8/1/2023 1:10 AM, Isaku Yamahata wrote:
> On Mon, Jul 31, 2023 at 12:21:42PM -0400,
> Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 
>> This is the first RFC version of enabling KVM gmem[1] as the backend for
>> private memory of KVM_X86_PROTECTED_VM.
>>
>> It adds the support to create a specific KVM_X86_PROTECTED_VM type VM,
>> and introduces 'private' property for memory backend. When the vm type
>> is KVM_X86_PROTECTED_VM and memory backend has private enabled as below,
>> it will call KVM gmem ioctl to allocate private memory for the backend.
>>
>>      $qemu -object memory-backend-ram,id=mem0,size=1G,private=on \
>>            -machine q35,kvm-type=sw-protected-vm,memory-backend=mem0 \
>> 	  ...
>>
>> Unfortunately this patch series fails the boot of OVMF at very early
>> stage due to triple fault because KVM doesn't support emulate string IO
>> to private memory. We leave it as an open to be discussed.
>>
>> There are following design opens that need to be discussed:
>>
>> 1. how to determine the vm type?
>>
>>     a. like this series, specify the vm type via machine property
>>        'kvm-type'
>>     b. check the memory backend, if any backend has 'private' property
>>        set, the vm-type is set to KVM_X86_PROTECTED_VM.
> 
> Hi Xiaoyao.  Because QEMU already has confidential guest support, we should
> utilize it.  Say,
> qemu  \
>    -object sw-protected, id=swp0, <more options for KVM_X86_SW_PROTECTED_VM> \
>    -machine confidential-guest-support=swp0

Thanks for pointing out this option. I thought of it but forgot to list
it as an option.

It seems better, and I'll go in this direction if no one has a different
opinion.

> 
>> 2. whether 'private' property is needed if we choose 1.b as design
>>
>>     with 1.b, QEMU can decide whether the memory region needs to be
>>     private (allocates gmem fd for it) or not, on its own.
> 
> 
> The memory region property (how to create a KVM memory slot) should be
> independent of the underlying VM type.  Some (e.g. TDX) may require a KVM
> private memory slot, some may not.  Leave the decision to the vm type
> backend; it can use a QEMU memory listener.

As I replied to Daniel, the topic is whether the 'private' property is
needed. Is it essential to let users decide which memory can be private?
It seems OK for QEMU to make the decision based on the VM type.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem
  2023-07-31 17:22   ` Markus Armbruster
@ 2023-08-01 14:54     ` Xiaoyao Li
  2023-08-01 14:57     ` Daniel P. Berrangé
  1 sibling, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-08-01 14:54 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Eric Blake,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On 8/1/2023 1:22 AM, Markus Armbruster wrote:
> Xiaoyao Li <xiaoyao.li@intel.com> writes:
> 
>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>
>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> [...]
> 
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index 7f92ea43e8e1..e0b2044e3d20 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -605,6 +605,9 @@
>>   # @reserve: if true, reserve swap space (or huge pages) if applicable
>>   #     (default: true) (since 6.1)
>>   #
>> +# @private: if true, use KVM gmem private memory
>> +#           (default: false) (since 8.1)
>> +#
> 
> Please format like
> 
>     # @private: if true, use KVM gmem private memory (default: false)
>     #     (since 8.1)
> 
> to blend in with recent commit a937b6aa739 (qapi: Reformat doc comments
> to conform to current conventions).

will do it in next version.

Thanks!
-Xiaoyao

>>   # @size: size of the memory region in bytes
>>   #
>>   # @x-use-canonical-path-for-ramblock-id: if true, the canonical path
>> @@ -631,6 +634,7 @@
>>               '*prealloc-context': 'str',
>>               '*share': 'bool',
>>               '*reserve': 'bool',
>> +            '*private': 'bool',
>>               'size': 'size',
>>               '*x-use-canonical-path-for-ramblock-id': 'bool' } }
> 



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem
  2023-07-31 17:22   ` Markus Armbruster
  2023-08-01 14:54     ` Xiaoyao Li
@ 2023-08-01 14:57     ` Daniel P. Berrangé
  2023-08-02  8:04       ` Xiaoyao Li
  1 sibling, 1 reply; 53+ messages in thread
From: Daniel P. Berrangé @ 2023-08-01 14:57 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Xiaoyao Li, Paolo Bonzini, Sean Christopherson,
	David Hildenbrand, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Marcelo Tosatti, Eric Blake,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On Mon, Jul 31, 2023 at 07:22:05PM +0200, Markus Armbruster wrote:
> Xiaoyao Li <xiaoyao.li@intel.com> writes:
> 
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> >
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> 
> [...]
> 
> > diff --git a/qapi/qom.json b/qapi/qom.json
> > index 7f92ea43e8e1..e0b2044e3d20 100644
> > --- a/qapi/qom.json
> > +++ b/qapi/qom.json
> > @@ -605,6 +605,9 @@
> >  # @reserve: if true, reserve swap space (or huge pages) if applicable
> >  #     (default: true) (since 6.1)
> >  #
> > +# @private: if true, use KVM gmem private memory
> > +#           (default: false) (since 8.1)
> > +#
> 
> Please format like
> 
>    # @private: if true, use KVM gmem private memory (default: false)
>    #     (since 8.1)

Also QEMU 8.1.0 is in freeze right now, so there's no chance
of these patches making 8.1.0. IOW, use "since 8.2" as the
next release you might achieve merge for.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private()
  2023-08-01  0:21         ` Peter Xu
@ 2023-08-01 16:23           ` Sean Christopherson
  0 siblings, 0 replies; 53+ messages in thread
From: Sean Christopherson @ 2023-08-01 16:23 UTC (permalink / raw)
  To: Peter Xu
  Cc: Michael S. Tsirkin, Xiaoyao Li, Paolo Bonzini, David Hildenbrand,
	Igor Mammedov, Marcel Apfelbaum, Richard Henderson,
	Marcelo Tosatti, Markus Armbruster, Eric Blake,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Chao Peng, Michael Roth, isaku.yamahata, qemu-devel, kvm

On Mon, Jul 31, 2023, Peter Xu wrote:
> On Mon, Jul 31, 2023 at 05:36:37PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 31, 2023 at 02:34:22PM -0700, Sean Christopherson wrote:
> > > On Mon, Jul 31, 2023, Peter Xu wrote:
> > > > On Mon, Jul 31, 2023 at 12:21:46PM -0400, Xiaoyao Li wrote:
> > > > > +bool memory_region_can_be_private(MemoryRegion *mr)
> > > > > +{
> > > > > +    return mr->ram_block && mr->ram_block->gmem_fd >= 0;
> > > > > +}
> > > > 
> > > > This is not really MAP_PRIVATE, am I right?  If so, is there still chance
> > > > we rename it (it seems to be also in the kernel proposal all across..)?
> > > 
> > > Yes and yes.
> > > 
> > > > I worry it can be very confusing in the future against MAP_PRIVATE /
> > > > MAP_SHARED otherwise.
> > > 
> > > Heh, it's already quite confusing at times.  I'm definitely open to naming that
> > > doesn't collide with MAP_{PRIVATE,SHARED}, especially if someone can come with a
> > > naming scheme that includes a succinct way to describe memory that is shared
> > > between two or more VMs, but is accessible to _only_ those VMs.
> > 
> > Standard solution is a technology specific prefix.
> > protect_shared, encrypt_shared etc.
> 
> Agreed, a prefix could definitely help (if nothing better comes along..).
> If e.g. "encrypted" is too long to be applied everywhere in var names and
> functions, maybe it can also be "enc_{private|shared}".

FWIW, I would stay away from "encrypted", there is no requirement that the memory
actually be encrypted.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 03/19] RAMBlock: Support KVM gmemory
  2023-07-31 16:21 ` [RFC PATCH 03/19] RAMBlock: Support KVM gmemory Xiaoyao Li
@ 2023-08-01 16:33   ` David Hildenbrand
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2023-08-01 16:33 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Sean Christopherson, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On 31.07.23 18:21, Xiaoyao Li wrote:
> From: Chao Peng <chao.p.peng@linux.intel.com>
> 
> Add KVM gmem support to RAMBlock so we can have both normal
> hva based memory and gmem fd based memory in one RAMBlock.
> 
> The gmem part is represented by the gmem_fd.
> 
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

So, *someone* creates a RAMBlock in QEMU.

Who'll squeeze a gmem_fd in there? When? How?

Shouldn't we create the RAM memory region / RAMBlock already with the 
gmem_fd, and set that when initializing these things?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private()
  2023-07-31 16:21 ` [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private() Xiaoyao Li
  2023-07-31 21:23   ` Peter Xu
@ 2023-08-01 16:48   ` Claudio Fontana
  2023-08-01 16:52     ` Claudio Fontana
  1 sibling, 1 reply; 53+ messages in thread
From: Claudio Fontana @ 2023-08-01 16:48 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Sean Christopherson,
	David Hildenbrand, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On 7/31/23 18:21, Xiaoyao Li wrote:
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  include/exec/memory.h | 9 +++++++++
>  softmmu/memory.c      | 5 +++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 61e31c7b9874..e119d3ce1a1d 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -1679,6 +1679,15 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
>   */
>  bool memory_region_is_protected(MemoryRegion *mr);
>  
> +/**
> + * memory_region_can_be_private: check whether a memory region can be private

The name of the function is not particularly informative,

> + *
> + * Returns %true if a memory region's ram_block has valid gmem fd assigned.

but in your comment you describe more accurately what it does, why not make it the function name?

bool memory_region_has_valid_gmem_fd()

> + *
> + * @mr: the memory region being queried
> + */
> +bool memory_region_can_be_private(MemoryRegion *mr);


bool memory_region_has_valid_gmem_fd()


Thanks,

C

> +
>  /**
>   * memory_region_get_iommu: check whether a memory region is an iommu
>   *
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 4f8f8c0a02e6..336c76ede660 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -1855,6 +1855,11 @@ bool memory_region_is_protected(MemoryRegion *mr)
>      return mr->ram && (mr->ram_block->flags & RAM_PROTECTED);
>  }
>  
> +bool memory_region_can_be_private(MemoryRegion *mr)
> +{
> +    return mr->ram_block && mr->ram_block->gmem_fd >= 0;
> +}
> +
>  uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
>  {
>      uint8_t mask = mr->dirty_log_mask;



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private()
  2023-08-01 16:48   ` Claudio Fontana
@ 2023-08-01 16:52     ` Claudio Fontana
  2023-08-02  8:05       ` Xiaoyao Li
  0 siblings, 1 reply; 53+ messages in thread
From: Claudio Fontana @ 2023-08-01 16:52 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Sean Christopherson,
	David Hildenbrand, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On 8/1/23 18:48, Claudio Fontana wrote:
> On 7/31/23 18:21, Xiaoyao Li wrote:
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>>  include/exec/memory.h | 9 +++++++++
>>  softmmu/memory.c      | 5 +++++
>>  2 files changed, 14 insertions(+)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 61e31c7b9874..e119d3ce1a1d 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -1679,6 +1679,15 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
>>   */
>>  bool memory_region_is_protected(MemoryRegion *mr);
>>  
>> +/**
>> + * memory_region_can_be_private: check whether a memory region can be private
> 
> The name of the function is not particularly informative,
> 
>> + *
>> + * Returns %true if a memory region's ram_block has valid gmem fd assigned.
> 
> but in your comment you describe more accurately what it does, why not make it the function name?
> 
> bool memory_region_has_valid_gmem_fd()


btw can a memory region have an invalid gmem_fd ?

If an invalid gmem_fd is just used to mark whether gmem_fd is present or not,

we could make it just:

bool memory_region_has_gmem_fd()
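
i.e. the same one-liner from this patch, just under the clearer name:

    bool memory_region_has_gmem_fd(MemoryRegion *mr)
    {
        return mr->ram_block && mr->ram_block->gmem_fd >= 0;
    }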


Thanks,

C

> 
>> + *
>> + * @mr: the memory region being queried
>> + */
>> +bool memory_region_can_be_private(MemoryRegion *mr);
> 
> 
> bool memory_region_has_valid_gmem_fd()
> 
> 
> Thanks,
> 
> C
> 
>> +
>>  /**
>>   * memory_region_get_iommu: check whether a memory region is an iommu
>>   *
>> diff --git a/softmmu/memory.c b/softmmu/memory.c
>> index 4f8f8c0a02e6..336c76ede660 100644
>> --- a/softmmu/memory.c
>> +++ b/softmmu/memory.c
>> @@ -1855,6 +1855,11 @@ bool memory_region_is_protected(MemoryRegion *mr)
>>      return mr->ram && (mr->ram_block->flags & RAM_PROTECTED);
>>  }
>>  
>> +bool memory_region_can_be_private(MemoryRegion *mr)
>> +{
>> +    return mr->ram_block && mr->ram_block->gmem_fd >= 0;
>> +}
>> +
>>  uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
>>  {
>>      uint8_t mask = mr->dirty_log_mask;
> 



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 05/19] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
  2023-07-31 16:21 ` [RFC PATCH 05/19] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot Xiaoyao Li
@ 2023-08-01 17:10   ` Claudio Fontana
  2023-08-03  8:43     ` Xiaoyao Li
  0 siblings, 1 reply; 53+ messages in thread
From: Claudio Fontana @ 2023-08-01 17:10 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Sean Christopherson,
	David Hildenbrand, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On 7/31/23 18:21, Xiaoyao Li wrote:
> From: Chao Peng <chao.p.peng@linux.intel.com>
> 
> Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM.
> 
> With KVM_SET_USER_MEMORY_REGION2, QEMU can set up memory region that
> backed both by hva-based shared memory and gmem fd based private
> memory.
> 
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  accel/kvm/kvm-all.c      | 57 +++++++++++++++++++++++++++++++++-------
>  accel/kvm/trace-events   |  2 +-
>  include/sysemu/kvm_int.h |  2 ++
>  3 files changed, 51 insertions(+), 10 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index d8eee405de24..7b1818334ba7 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -288,35 +288,68 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram,
>  static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, bool new)
>  {
>      KVMState *s = kvm_state;
> -    struct kvm_userspace_memory_region mem;
> +    struct kvm_userspace_memory_region2 mem;
> +    static int cap_user_memory2 = -1;
>      int ret;
>  
> +    if (cap_user_memory2 == -1) {
> +        cap_user_memory2 = kvm_check_extension(s, KVM_CAP_USER_MEMORY2);
> +    }
> +
> +    if (!cap_user_memory2 && slot->fd >= 0) {
> +        error_report("%s, KVM doesn't support gmem!", __func__);
> +        exit(1);
> +    }

We handle this special error case here,
while the existing callers of kvm_set_user_memory_region handle the other error cases in different places.

Not that the rest of kvm-all does an excellent job at error handling, but maybe we can avoid compounding the issue.

> +
>      mem.slot = slot->slot | (kml->as_id << 16);
>      mem.guest_phys_addr = slot->start_addr;
>      mem.userspace_addr = (unsigned long)slot->ram;
>      mem.flags = slot->flags;
> +    mem.gmem_fd = slot->fd;
> +    mem.gmem_offset = slot->ofs;
>  
> -    if (slot->memory_size && !new && (mem.flags ^ slot->old_flags) & KVM_MEM_READONLY) {
> +    if (slot->memory_size && !new && (slot->flags ^ slot->old_flags) & KVM_MEM_READONLY) {

Why the change if mem.flags == slot->flags ?

>          /* Set the slot size to 0 before setting the slot to the desired
>           * value. This is needed based on KVM commit 75d61fbc. */
>          mem.memory_size = 0;
> -        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
> +
> +        if (cap_user_memory2) {
> +            ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
> +        } else {
> +            ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
> +	    }
>          if (ret < 0) {
>              goto err;
>          }
>      }
>      mem.memory_size = slot->memory_size;
> -    ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
> +    if (cap_user_memory2) {
> +        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
> +    } else {
> +        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
> +    }
>      slot->old_flags = mem.flags;
>  err:
>      trace_kvm_set_user_memory(mem.slot >> 16, (uint16_t)mem.slot, mem.flags,
>                                mem.guest_phys_addr, mem.memory_size,
> -                              mem.userspace_addr, ret);
> +                              mem.userspace_addr, mem.gmem_fd,
> +			      mem.gmem_offset, ret);
>      if (ret < 0) {
> -        error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
> -                     " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
> -                     __func__, mem.slot, slot->start_addr,
> -                     (uint64_t)mem.memory_size, strerror(errno));
> +        if (cap_user_memory2) {
> +                error_report("%s: KVM_SET_USER_MEMORY_REGION2 failed, slot=%d,"
> +                        " start=0x%" PRIx64 ", size=0x%" PRIx64 ","
> +                        " flags=0x%" PRIx32 ","
> +                        " gmem_fd=%" PRId32 ", gmem_offset=0x%" PRIx64 ": %s",
> +                        __func__, mem.slot, slot->start_addr,
> +                (uint64_t)mem.memory_size, mem.flags,
> +                        mem.gmem_fd, (uint64_t)mem.gmem_offset,
> +                        strerror(errno));
> +        } else {
> +                error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
> +                            " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
> +                __func__, mem.slot, slot->start_addr,
> +                (uint64_t)mem.memory_size, strerror(errno));
> +        }
>      }
>      return ret;
>  }
> @@ -472,6 +505,9 @@ static int kvm_mem_flags(MemoryRegion *mr)
>      if (readonly && kvm_readonly_mem_allowed) {
>          flags |= KVM_MEM_READONLY;
>      }
> +    if (memory_region_can_be_private(mr)) {
> +        flags |= KVM_MEM_PRIVATE;
> +    }
>      return flags;
>  }
>  
> @@ -1402,6 +1438,9 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>          mem->ram_start_offset = ram_start_offset;
>          mem->ram = ram;
>          mem->flags = kvm_mem_flags(mr);
> +        mem->fd = mr->ram_block->gmem_fd;
> +        mem->ofs = (uint8_t*)ram - mr->ram_block->host;
> +
>          kvm_slot_init_dirty_bitmap(mem);
>          err = kvm_set_user_memory_region(kml, mem, true);
>          if (err) {
> diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
> index 14ebfa1b991c..80694683acea 100644
> --- a/accel/kvm/trace-events
> +++ b/accel/kvm/trace-events
> @@ -15,7 +15,7 @@ kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
>  kvm_irqchip_release_virq(int virq) "virq %d"
>  kvm_set_ioeventfd_mmio(int fd, uint64_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%" PRIx64 " val=0x%x assign: %d size: %d match: %d"
>  kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%x val=0x%x assign: %d size: %d match: %d"
> -kvm_set_user_memory(uint16_t as, uint16_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "AddrSpace#%d Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
> +kvm_set_user_memory(uint16_t as, uint16_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, uint32_t fd, uint64_t fd_offset, int ret) "AddrSpace#%d Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " gmem_fd=%d" " gmem_fd_offset=0x%" PRIx64 " ret=%d"
>  kvm_clear_dirty_log(uint32_t slot, uint64_t start, uint32_t size) "slot#%"PRId32" start 0x%"PRIx64" size 0x%"PRIx32
>  kvm_resample_fd_notify(int gsi) "gsi %d"
>  kvm_dirty_ring_full(int id) "vcpu %d"
> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
> index 511b42bde5c4..48220c0793ac 100644
> --- a/include/sysemu/kvm_int.h
> +++ b/include/sysemu/kvm_int.h
> @@ -30,6 +30,8 @@ typedef struct KVMSlot
>      int as_id;
>      /* Cache of the offset in ram address space */
>      ram_addr_t ram_start_offset;
> +    int fd;
> +    hwaddr ofs;
>  } KVMSlot;
>  
>  typedef struct KVMMemoryUpdate {



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem
  2023-07-31 16:21 ` [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem Xiaoyao Li
  2023-07-31 17:22   ` Markus Armbruster
@ 2023-08-01 17:21   ` David Hildenbrand
  2023-08-02  8:03     ` Xiaoyao Li
  1 sibling, 1 reply; 53+ messages in thread
From: David Hildenbrand @ 2023-08-01 17:21 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Sean Christopherson, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On 31.07.23 18:21, Xiaoyao Li wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>   backends/hostmem.c       | 18 ++++++++++++++++++
>   include/sysemu/hostmem.h |  2 +-
>   qapi/qom.json            |  4 ++++
>   3 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index 747e7838c031..dbdbb0aafd45 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -461,6 +461,20 @@ static void host_memory_backend_set_reserve(Object *o, bool value, Error **errp)
>       }
>       backend->reserve = value;
>   }
> +
> +static bool host_memory_backend_get_private(Object *o, Error **errp)
> +{
> +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
> +
> +    return backend->private;
> +}
> +
> +static void host_memory_backend_set_private(Object *o, bool value, Error **errp)
> +{
> +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
> +
> +    backend->private = value;
> +}
>   #endif /* CONFIG_LINUX */
>   
>   static bool
> @@ -541,6 +555,10 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
>           host_memory_backend_get_reserve, host_memory_backend_set_reserve);
>       object_class_property_set_description(oc, "reserve",
>           "Reserve swap space (or huge pages) if applicable");
> +    object_class_property_add_bool(oc, "private",
> +        host_memory_backend_get_private, host_memory_backend_set_private);
> +    object_class_property_set_description(oc, "private",
> +        "Use KVM gmem private memory");
>   #endif /* CONFIG_LINUX */
>       /*
>        * Do not delete/rename option. This option must be considered stable
> diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
> index 39326f1d4f9c..d88970395618 100644
> --- a/include/sysemu/hostmem.h
> +++ b/include/sysemu/hostmem.h
> @@ -65,7 +65,7 @@ struct HostMemoryBackend {
>       /* protected */
>       uint64_t size;
>       bool merge, dump, use_canonical_path;
> -    bool prealloc, is_mapped, share, reserve;
> +    bool prealloc, is_mapped, share, reserve, private;
>       uint32_t prealloc_threads;
>       ThreadContext *prealloc_context;
>       DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
> diff --git a/qapi/qom.json b/qapi/qom.json
> index 7f92ea43e8e1..e0b2044e3d20 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -605,6 +605,9 @@
>   # @reserve: if true, reserve swap space (or huge pages) if applicable
>   #     (default: true) (since 6.1)
>   #
> +# @private: if true, use KVM gmem private memory
> +#           (default: false) (since 8.1)
> +#

But that's not what any of this does.

This patch only adds a property and doesn't even explain what it intends 
to achieve with that.

How will it be used by a user? What will it affect internally? What 
will it modify in regard to the memory backend?

That all should go into the surprisingly empty patch description.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem
  2023-08-01 17:21   ` David Hildenbrand
@ 2023-08-02  8:03     ` Xiaoyao Li
  2023-08-02 14:14       ` David Hildenbrand
  0 siblings, 1 reply; 53+ messages in thread
From: Xiaoyao Li @ 2023-08-02  8:03 UTC (permalink / raw)
  To: David Hildenbrand, Paolo Bonzini, Sean Christopherson,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On 8/2/2023 1:21 AM, David Hildenbrand wrote:
> On 31.07.23 18:21, Xiaoyao Li wrote:
>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>
>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>>   backends/hostmem.c       | 18 ++++++++++++++++++
>>   include/sysemu/hostmem.h |  2 +-
>>   qapi/qom.json            |  4 ++++
>>   3 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/backends/hostmem.c b/backends/hostmem.c
>> index 747e7838c031..dbdbb0aafd45 100644
>> --- a/backends/hostmem.c
>> +++ b/backends/hostmem.c
>> @@ -461,6 +461,20 @@ static void 
>> host_memory_backend_set_reserve(Object *o, bool value, Error **errp)
>>       }
>>       backend->reserve = value;
>>   }
>> +
>> +static bool host_memory_backend_get_private(Object *o, Error **errp)
>> +{
>> +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
>> +
>> +    return backend->private;
>> +}
>> +
>> +static void host_memory_backend_set_private(Object *o, bool value, 
>> Error **errp)
>> +{
>> +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
>> +
>> +    backend->private = value;
>> +}
>>   #endif /* CONFIG_LINUX */
>>   static bool
>> @@ -541,6 +555,10 @@ host_memory_backend_class_init(ObjectClass *oc, 
>> void *data)
>>           host_memory_backend_get_reserve, 
>> host_memory_backend_set_reserve);
>>       object_class_property_set_description(oc, "reserve",
>>           "Reserve swap space (or huge pages) if applicable");
>> +    object_class_property_add_bool(oc, "private",
>> +        host_memory_backend_get_private, 
>> host_memory_backend_set_private);
>> +    object_class_property_set_description(oc, "private",
>> +        "Use KVM gmem private memory");
>>   #endif /* CONFIG_LINUX */
>>       /*
>>        * Do not delete/rename option. This option must be considered 
>> stable
>> diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
>> index 39326f1d4f9c..d88970395618 100644
>> --- a/include/sysemu/hostmem.h
>> +++ b/include/sysemu/hostmem.h
>> @@ -65,7 +65,7 @@ struct HostMemoryBackend {
>>       /* protected */
>>       uint64_t size;
>>       bool merge, dump, use_canonical_path;
>> -    bool prealloc, is_mapped, share, reserve;
>> +    bool prealloc, is_mapped, share, reserve, private;
>>       uint32_t prealloc_threads;
>>       ThreadContext *prealloc_context;
>>       DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index 7f92ea43e8e1..e0b2044e3d20 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -605,6 +605,9 @@
>>   # @reserve: if true, reserve swap space (or huge pages) if applicable
>>   #     (default: true) (since 6.1)
>>   #
>> +# @private: if true, use KVM gmem private memory
>> +#           (default: false) (since 8.1)
>> +#
> 
> But that's not what any of this does.
> 
> This patch only adds a property and doesn't even explain what it intends 
> to achieve with that.
> 
> How will it be used from a user? What will it affect internally? What 
> will it modify in regards of the memory backend?

How it will be used is shown in the next patch, patch 09.

For a KVM_X86_SW_PROTECTED_VM type VM, it will allocate private gmem with 
a KVM ioctl if the memory backend has the "private" property on.
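
Roughly like this (a simplified sketch, not the exact code of patch 09;
the ioctl name follows the KVM gmem kernel series):

    if (x86ms->vm_type == KVM_X86_SW_PROTECTED_VM && backend->private) {
        struct kvm_create_guest_memfd gmem = {
            .size  = memory_region_size(mr),
            .flags = 0,
        };

        mr->ram_block->gmem_fd = kvm_vm_ioctl(kvm_state,
                                              KVM_CREATE_GUEST_MEMFD, &gmem);
        if (mr->ram_block->gmem_fd < 0) {
            error_report("%s: failed to create gmem fd", __func__);
            exit(1);
        }
    }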

> That all should go into the surprisingly empty patch description.

I'm sorry. I admit the empty commit message is really bad.





^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem
  2023-08-01 14:57     ` Daniel P. Berrangé
@ 2023-08-02  8:04       ` Xiaoyao Li
  0 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-08-02  8:04 UTC (permalink / raw)
  To: Daniel P. Berrangé, Markus Armbruster
  Cc: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Eric Blake,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On 8/1/2023 10:57 PM, Daniel P. Berrangé wrote:
> On Mon, Jul 31, 2023 at 07:22:05PM +0200, Markus Armbruster wrote:
>> Xiaoyao Li <xiaoyao.li@intel.com> writes:
>>
>>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>>
>>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>
>> [...]
>>
>>> diff --git a/qapi/qom.json b/qapi/qom.json
>>> index 7f92ea43e8e1..e0b2044e3d20 100644
>>> --- a/qapi/qom.json
>>> +++ b/qapi/qom.json
>>> @@ -605,6 +605,9 @@
>>>   # @reserve: if true, reserve swap space (or huge pages) if applicable
>>>   #     (default: true) (since 6.1)
>>>   #
>>> +# @private: if true, use KVM gmem private memory
>>> +#           (default: false) (since 8.1)
>>> +#
>>
>> Please format like
>>
>>     # @private: if true, use KVM gmem private memory (default: false)
>>     #     (since 8.1)
> 
> Also QEMU 8.1.0 is in freeze right now, so there's no chance
> of these patches making 8.1.0. IOW, use "since 8.2" as the
> next release you might achieve merge for.

Thanks for pointing it out. Will do it in the next version.

> With regards,
> Daniel



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private()
  2023-08-01 16:52     ` Claudio Fontana
@ 2023-08-02  8:05       ` Xiaoyao Li
  0 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-08-02  8:05 UTC (permalink / raw)
  To: Claudio Fontana, Paolo Bonzini, Sean Christopherson,
	David Hildenbrand, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On 8/2/2023 12:52 AM, Claudio Fontana wrote:
> On 8/1/23 18:48, Claudio Fontana wrote:
>> On 7/31/23 18:21, Xiaoyao Li wrote:
>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>> ---
>>>   include/exec/memory.h | 9 +++++++++
>>>   softmmu/memory.c      | 5 +++++
>>>   2 files changed, 14 insertions(+)
>>>
>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>> index 61e31c7b9874..e119d3ce1a1d 100644
>>> --- a/include/exec/memory.h
>>> +++ b/include/exec/memory.h
>>> @@ -1679,6 +1679,15 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
>>>    */
>>>   bool memory_region_is_protected(MemoryRegion *mr);
>>>   
>>> +/**
>>> + * memory_region_can_be_private: check whether a memory region can be private
>>
>> The name of the function is not particularly informative,
>>
>>> + *
>>> + * Returns %true if a memory region's ram_block has valid gmem fd assigned.
>>
>> but in your comment you describe more accurately what it does, why not make it the function name?
>>
>> bool memory_region_has_valid_gmem_fd()
> 
> 
> btw can a memory region have an invalid gmem_fd ?
> 
> If an invalid gmem_fd is just used to mark whether gmem_fd is present or not,
> 
> we could make it just:
> 
> bool memory_region_has_gmem_fd()

Yes, it's a good suggestion!

I will use it in the next version if there are no objections from others.

Thanks,
-Xiaoyao






^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem
  2023-08-02  8:03     ` Xiaoyao Li
@ 2023-08-02 14:14       ` David Hildenbrand
  2023-08-02 22:53         ` Isaku Yamahata
  0 siblings, 1 reply; 53+ messages in thread
From: David Hildenbrand @ 2023-08-02 14:14 UTC (permalink / raw)
  To: Xiaoyao Li, Paolo Bonzini, Sean Christopherson, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On 02.08.23 10:03, Xiaoyao Li wrote:
> On 8/2/2023 1:21 AM, David Hildenbrand wrote:
>> On 31.07.23 18:21, Xiaoyao Li wrote:
>>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>>
>>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>> ---
>>>    backends/hostmem.c       | 18 ++++++++++++++++++
>>>    include/sysemu/hostmem.h |  2 +-
>>>    qapi/qom.json            |  4 ++++
>>>    3 files changed, 23 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/backends/hostmem.c b/backends/hostmem.c
>>> index 747e7838c031..dbdbb0aafd45 100644
>>> --- a/backends/hostmem.c
>>> +++ b/backends/hostmem.c
>>> @@ -461,6 +461,20 @@ static void
>>> host_memory_backend_set_reserve(Object *o, bool value, Error **errp)
>>>        }
>>>        backend->reserve = value;
>>>    }
>>> +
>>> +static bool host_memory_backend_get_private(Object *o, Error **errp)
>>> +{
>>> +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
>>> +
>>> +    return backend->private;
>>> +}
>>> +
>>> +static void host_memory_backend_set_private(Object *o, bool value,
>>> Error **errp)
>>> +{
>>> +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
>>> +
>>> +    backend->private = value;
>>> +}
>>>    #endif /* CONFIG_LINUX */
>>>    static bool
>>> @@ -541,6 +555,10 @@ host_memory_backend_class_init(ObjectClass *oc,
>>> void *data)
>>>            host_memory_backend_get_reserve,
>>> host_memory_backend_set_reserve);
>>>        object_class_property_set_description(oc, "reserve",
>>>            "Reserve swap space (or huge pages) if applicable");
>>> +    object_class_property_add_bool(oc, "private",
>>> +        host_memory_backend_get_private,
>>> host_memory_backend_set_private);
>>> +    object_class_property_set_description(oc, "private",
>>> +        "Use KVM gmem private memory");
>>>    #endif /* CONFIG_LINUX */
>>>        /*
>>>         * Do not delete/rename option. This option must be considered
>>> stable
>>> diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
>>> index 39326f1d4f9c..d88970395618 100644
>>> --- a/include/sysemu/hostmem.h
>>> +++ b/include/sysemu/hostmem.h
>>> @@ -65,7 +65,7 @@ struct HostMemoryBackend {
>>>        /* protected */
>>>        uint64_t size;
>>>        bool merge, dump, use_canonical_path;
>>> -    bool prealloc, is_mapped, share, reserve;
>>> +    bool prealloc, is_mapped, share, reserve, private;
>>>        uint32_t prealloc_threads;
>>>        ThreadContext *prealloc_context;
>>>        DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
>>> diff --git a/qapi/qom.json b/qapi/qom.json
>>> index 7f92ea43e8e1..e0b2044e3d20 100644
>>> --- a/qapi/qom.json
>>> +++ b/qapi/qom.json
>>> @@ -605,6 +605,9 @@
>>>    # @reserve: if true, reserve swap space (or huge pages) if applicable
>>>    #     (default: true) (since 6.1)
>>>    #
>>> +# @private: if true, use KVM gmem private memory
>>> +#           (default: false) (since 8.1)
>>> +#
>>
>> But that's not what any of this does.
>>
>> This patch only adds a property and doesn't even explain what it intends
>> to achieve with that.
>>
>> How will it be used by a user? What will it affect internally? What
>> will it modify in regard to the memory backend?
> 
> How it will be used is shown in the next patch, patch 09.
> 
> For a KVM_X86_SW_PROTECTED_VM type VM, it will allocate private gmem with
> a KVM ioctl if the memory backend has the "private" property on.

It feels wired up the wrong way.

When creating/initializing the memory backend, we should also take care 
of allocating the gmem_fd, for example, by doing some gmem allocation 
callback, ideally *internally* creating the RAM memory region / RAMBlock.

And we should fail if that is impossible (gmem does not apply to the VM) 
or creating the gmem_fd failed for other reason.

Like passing a RAM_GMEM flag to memory_region_init_ram_flags_nomigrate() 
in ram_backend_memory_alloc(), to then handle it internally, failing if 
there is an error.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 15/19] kvm: handle KVM_EXIT_MEMORY_FAULT
  2023-07-31 16:21 ` [RFC PATCH 15/19] kvm: handle KVM_EXIT_MEMORY_FAULT Xiaoyao Li
@ 2023-08-02 22:25   ` Isaku Yamahata
  2023-09-13  6:59     ` Xiaoyao Li
  2023-08-09 15:02   ` Xu Yilun
  1 sibling, 1 reply; 53+ messages in thread
From: Isaku Yamahata @ 2023-08-02 22:25 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On Mon, Jul 31, 2023 at 12:21:57PM -0400,
Xiaoyao Li <xiaoyao.li@intel.com> wrote:

> From: Chao Peng <chao.p.peng@linux.intel.com>
> 
> Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when
> KVM_EXIT_MEMORY_FAULT happens. It indicates userspace needs to do
> the memory conversion on the RAMBlock to turn the memory into desired
> attribute, i.e., private/shared.
> 
> Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has
> gmem memory backend.
> 
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  accel/kvm/kvm-all.c | 52 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index f9b5050b8885..72d50b923bf2 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -3040,6 +3040,48 @@ static void kvm_eat_signals(CPUState *cpu)
>      } while (sigismember(&chkset, SIG_IPI));
>  }
>  
> +static int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
> +{
> +    MemoryRegionSection section;
> +    void *addr;
> +    RAMBlock *rb;
> +    ram_addr_t offset;
> +    int ret = -1;
> +
> +    section = memory_region_find(get_system_memory(), start, size);
> +    if (!section.mr) {
> +        return ret;
> +    }
> +
> +    if (memory_region_can_be_private(section.mr)) {
> +        if (to_private) {
> +            ret = kvm_set_memory_attributes_private(start, size);
> +        } else {
> +            ret = kvm_set_memory_attributes_shared(start, size);
> +        }
> +
> +        if (ret) {
> +            return ret;
> +        }
> +
> +        addr = memory_region_get_ram_ptr(section.mr) +
> +               section.offset_within_region;
> +        rb = qemu_ram_block_from_host(addr, false, &offset);

Here we already have the section, and section.mr->ram_block gives us the
RAMBlock directly.  We don't have to scan the existing RAMBlocks.
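
I.e. something like (sketch; assuming offset_within_region is the offset
into the RAMBlock, which holds for RAM regions that own their block):

    rb = section.mr->ram_block;
    offset = section.offset_within_region;
    ram_block_convert_range(rb, offset, size, to_private);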

Except that, looks good to me.
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 19/19] i386: Disable SMM mode for X86_SW_PROTECTED_VM
  2023-07-31 16:22 ` [RFC PATCH 19/19] i386: Disable SMM mode for X86_SW_PROTECTED_VM Xiaoyao Li
@ 2023-08-02 22:27   ` Isaku Yamahata
  0 siblings, 0 replies; 53+ messages in thread
From: Isaku Yamahata @ 2023-08-02 22:27 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On Mon, Jul 31, 2023 at 12:22:01PM -0400,
Xiaoyao Li <xiaoyao.li@intel.com> wrote:

> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  target/i386/kvm/kvm.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index a96640512dbc..62f237068a3a 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -2654,6 +2654,13 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>  
>      if (x86ms->vm_type == KVM_X86_SW_PROTECTED_VM) {
>          memory_listener_register(&kvm_x86_sw_protected_vm_memory_listener, &address_space_memory);
> +
> +        if (x86ms->smm == ON_OFF_AUTO_AUTO) {
> +            x86ms->smm = ON_OFF_AUTO_OFF;
> +        } else if (x86ms->smm == ON_OFF_AUTO_ON) {
> +            error_report("X86_SW_PROTECTED_VM doesn't support SMM");
> +            return -EINVAL;
> +        }
>      }
>  

If we use confidential guest support, this check should go there.
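
E.g. something like this (sketch; assumes a ConfidentialGuestSupport-style
object for sw-protected VMs with an init hook, which this series doesn't
have yet):

    static int sw_protected_vm_kvm_init(MachineState *ms, Error **errp)
    {
        X86MachineState *x86ms = X86_MACHINE(ms);

        if (x86ms->smm == ON_OFF_AUTO_AUTO) {
            x86ms->smm = ON_OFF_AUTO_OFF;
        } else if (x86ms->smm == ON_OFF_AUTO_ON) {
            error_setg(errp, "X86_SW_PROTECTED_VM doesn't support SMM");
            return -EINVAL;
        }
        return 0;
    }
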
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem
  2023-08-02 14:14       ` David Hildenbrand
@ 2023-08-02 22:53         ` Isaku Yamahata
  2023-08-03 13:05           ` David Hildenbrand
  0 siblings, 1 reply; 53+ messages in thread
From: Isaku Yamahata @ 2023-08-02 22:53 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Xiaoyao Li, Paolo Bonzini, Sean Christopherson, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Marcelo Tosatti, Markus Armbruster, Eric Blake,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On Wed, Aug 02, 2023 at 04:14:29PM +0200,
David Hildenbrand <david@redhat.com> wrote:

> On 02.08.23 10:03, Xiaoyao Li wrote:
> > On 8/2/2023 1:21 AM, David Hildenbrand wrote:
> > > On 31.07.23 18:21, Xiaoyao Li wrote:
> > > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > > 
> > > > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > > > Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > > > ---
> > > >    backends/hostmem.c       | 18 ++++++++++++++++++
> > > >    include/sysemu/hostmem.h |  2 +-
> > > >    qapi/qom.json            |  4 ++++
> > > >    3 files changed, 23 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/backends/hostmem.c b/backends/hostmem.c
> > > > index 747e7838c031..dbdbb0aafd45 100644
> > > > --- a/backends/hostmem.c
> > > > +++ b/backends/hostmem.c
> > > > @@ -461,6 +461,20 @@ static void
> > > > host_memory_backend_set_reserve(Object *o, bool value, Error **errp)
> > > >        }
> > > >        backend->reserve = value;
> > > >    }
> > > > +
> > > > +static bool host_memory_backend_get_private(Object *o, Error **errp)
> > > > +{
> > > > +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
> > > > +
> > > > +    return backend->private;
> > > > +}
> > > > +
> > > > +static void host_memory_backend_set_private(Object *o, bool value,
> > > > Error **errp)
> > > > +{
> > > > +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
> > > > +
> > > > +    backend->private = value;
> > > > +}
> > > >    #endif /* CONFIG_LINUX */
> > > >    static bool
> > > > @@ -541,6 +555,10 @@ host_memory_backend_class_init(ObjectClass *oc,
> > > > void *data)
> > > >            host_memory_backend_get_reserve,
> > > > host_memory_backend_set_reserve);
> > > >        object_class_property_set_description(oc, "reserve",
> > > >            "Reserve swap space (or huge pages) if applicable");
> > > > +    object_class_property_add_bool(oc, "private",
> > > > +        host_memory_backend_get_private,
> > > > host_memory_backend_set_private);
> > > > +    object_class_property_set_description(oc, "private",
> > > > +        "Use KVM gmem private memory");
> > > >    #endif /* CONFIG_LINUX */
> > > >        /*
> > > >         * Do not delete/rename option. This option must be considered
> > > > stable
> > > > diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
> > > > index 39326f1d4f9c..d88970395618 100644
> > > > --- a/include/sysemu/hostmem.h
> > > > +++ b/include/sysemu/hostmem.h
> > > > @@ -65,7 +65,7 @@ struct HostMemoryBackend {
> > > >        /* protected */
> > > >        uint64_t size;
> > > >        bool merge, dump, use_canonical_path;
> > > > -    bool prealloc, is_mapped, share, reserve;
> > > > +    bool prealloc, is_mapped, share, reserve, private;
> > > >        uint32_t prealloc_threads;
> > > >        ThreadContext *prealloc_context;
> > > >        DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
> > > > diff --git a/qapi/qom.json b/qapi/qom.json
> > > > index 7f92ea43e8e1..e0b2044e3d20 100644
> > > > --- a/qapi/qom.json
> > > > +++ b/qapi/qom.json
> > > > @@ -605,6 +605,9 @@
> > > >    # @reserve: if true, reserve swap space (or huge pages) if applicable
> > > >    #     (default: true) (since 6.1)
> > > >    #
> > > > +# @private: if true, use KVM gmem private memory
> > > > +#           (default: false) (since 8.1)
> > > > +#
> > > 
> > > But that's not what any of this does.
> > > 
> > > This patch only adds a property and doesn't even explain what it intends
> > > to achieve with that.
> > > 
> > > How will it be used from a user? What will it affect internally? What
> > > will it modify in regards of the memory backend?
> > 
> > How it will be used is in the next patch, patch 09.
> > 
> > for kvm_x86_sw_protected_vm type VM, it will allocate private gmem with
> > KVM ioctl if the memory backend has property "private" on.
> 
> It feels wired up the wrong way.
> 
> When creating/initializing the memory backend, we should also take care of
> allocating the gmem_fd, for example, by doing some gmem allocation callback,
> ideally *internally* creating the RAM memory region / RAMBlock.
> 
> And we should fail if that is impossible (gmem does not apply to the VM) or
> creating the gmem_fd failed for other reason.
> 
> Like passing a RAM_GMEM flag to memory_region_init_ram_flags_nomigrate() in
> ram_backend_memory_alloc(), to then handle it internally, failing if there
> is an error.

KVM gmem is tied to a VM, so it cannot be created before the VM itself. We
have to delay the allocation of KVM gmem until VM initialization.

Hmm, one option is to move gmem_fd from RAMBlock to KVMSlot.  Handle the
allocation of KVM gmem (issuing the KVM gmem ioctl) there, i.e. in
kvm_set_phys_mem() or kvm_region_add() (or whatever functions of the KVM memory
listener).  Maybe we can drop ram_block_convert_range() and have KVM-specific
logic instead.
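
Rough sketch of that direction (the query helper is hypothetical):

    /* in kvm_set_phys_mem(), instead of reading mr->ram_block->gmem_fd */
    if (memory_region_needs_gmem(mr)) {         /* hypothetical query */
        struct kvm_create_guest_memfd gmem = {
            .size = slot_size,
        };

        mem->fd  = kvm_vm_ioctl(kvm_state, KVM_CREATE_GUEST_MEMFD, &gmem);
        mem->ofs = 0;
        mem->flags |= KVM_MEM_PRIVATE;
    }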

We still need a way for the user to specify which guest memory region is
subject to KVM gmem, though.
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 06/19] i386/pc: Drop pc_machine_kvm_type()
  2023-07-31 16:21 ` [RFC PATCH 06/19] i386/pc: Drop pc_machine_kvm_type() Xiaoyao Li
@ 2023-08-02 23:00   ` Isaku Yamahata
  0 siblings, 0 replies; 53+ messages in thread
From: Isaku Yamahata @ 2023-08-02 23:00 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On Mon, Jul 31, 2023 at 12:21:48PM -0400,
Xiaoyao Li <xiaoyao.li@intel.com> wrote:

> pc_machine_kvm_type() was introduced by commit e21be724eaf5 ("i386/xen:
> add pc_machine_kvm_type to initialize XEN_EMULATE mode") to do Xen
> specific initialization by utilizing kvm_type method.
> 
> commit eeedfe6c6316 ("hw/xen: Simplify emulated Xen platform init")
> moves the Xen specific initialization to pc_basic_device_init().
> 
> There is no need to keep the PC specific kvm_type() implementation
> anymore. On the other hand, later patch will implement kvm_type()
> method for all x86/i386 machines to support KVM_X86_SW_PROTECTED_VM.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  hw/i386/pc.c         | 5 -----
>  include/hw/i386/pc.h | 3 ---
>  2 files changed, 8 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 3109d5e0e035..abeadd903827 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1794,11 +1794,6 @@ static void pc_machine_initfn(Object *obj)
>      cxl_machine_init(obj, &pcms->cxl_devices_state);
>  }
>  
> -int pc_machine_kvm_type(MachineState *machine, const char *kvm_type)
> -{
> -    return 0;
> -}
> -
>  static void pc_machine_reset(MachineState *machine, ShutdownCause reason)
>  {
>      CPUState *cs;
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index d54e8b1101e4..c98d628a76f3 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -296,15 +296,12 @@ extern const size_t pc_compat_1_5_len;
>  extern GlobalProperty pc_compat_1_4[];
>  extern const size_t pc_compat_1_4_len;
>  
> -int pc_machine_kvm_type(MachineState *machine, const char *vm_type);
> -
>  #define DEFINE_PC_MACHINE(suffix, namestr, initfn, optsfn) \
>      static void pc_machine_##suffix##_class_init(ObjectClass *oc, void *data) \
>      { \
>          MachineClass *mc = MACHINE_CLASS(oc); \
>          optsfn(mc); \
>          mc->init = initfn; \
> -        mc->kvm_type = pc_machine_kvm_type; \
>      } \
>      static const TypeInfo pc_machine_type_##suffix = { \
>          .name       = namestr TYPE_MACHINE_SUFFIX, \
> -- 
> 2.34.1
> 

It seems strange for MachineClass to have kvm_type(); it probably belongs in
AccelClass (struct KVMAccelClass?).
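
Sketch of what that could look like (KVMAccelClass doesn't exist today):

    typedef struct KVMAccelClass {
        AccelClass parent_class;
        /* resolve the VM type before kvm_init() creates the VM */
        int (*kvm_type)(MachineState *ms, const char *vm_type);
    } KVMAccelClass;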

Anyway, this is an independent cleanup.

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 05/19] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
  2023-08-01 17:10   ` Claudio Fontana
@ 2023-08-03  8:43     ` Xiaoyao Li
  0 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-08-03  8:43 UTC (permalink / raw)
  To: Claudio Fontana, Paolo Bonzini, Sean Christopherson,
	David Hildenbrand, Igor Mammedov, Michael S. Tsirkin,
	Marcel Apfelbaum, Richard Henderson, Marcelo Tosatti
  Cc: Markus Armbruster, Eric Blake, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On 8/2/2023 1:10 AM, Claudio Fontana wrote:
> On 7/31/23 18:21, Xiaoyao Li wrote:
>> From: Chao Peng <chao.p.peng@linux.intel.com>
>>
>> Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM.
>>
>> With KVM_SET_USER_MEMORY_REGION2, QEMU can set up memory region that
>> backed both by hva-based shared memory and gmem fd based private
>> memory.
>>
>> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
>> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> ---
>>   accel/kvm/kvm-all.c      | 57 +++++++++++++++++++++++++++++++++-------
>>   accel/kvm/trace-events   |  2 +-
>>   include/sysemu/kvm_int.h |  2 ++
>>   3 files changed, 51 insertions(+), 10 deletions(-)
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index d8eee405de24..7b1818334ba7 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -288,35 +288,68 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram,
>>   static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, bool new)
>>   {
>>       KVMState *s = kvm_state;
>> -    struct kvm_userspace_memory_region mem;
>> +    struct kvm_userspace_memory_region2 mem;
>> +    static int cap_user_memory2 = -1;
>>       int ret;
>>   
>> +    if (cap_user_memory2 == -1) {
>> +        cap_user_memory2 = kvm_check_extension(s, KVM_CAP_USER_MEMORY2);
>> +    }
>> +
>> +    if (!cap_user_memory2 && slot->fd >= 0) {
>> +        error_report("%s, KVM doesn't support gmem!", __func__);
>> +        exit(1);
>> +    }
> 
> We handle this special error case here,
> while the existing callers of kvm_set_user_memory_region handle the other error cases in different places.
> 
> Not that the rest of kvm-all does an excellent job at error handling, but maybe we can avoid compounding the issue.

I'm not sure how to align them. Do you have any suggestions?

>> +
>>       mem.slot = slot->slot | (kml->as_id << 16);
>>       mem.guest_phys_addr = slot->start_addr;
>>       mem.userspace_addr = (unsigned long)slot->ram;
>>       mem.flags = slot->flags;
>> +    mem.gmem_fd = slot->fd;
>> +    mem.gmem_offset = slot->ofs;
>>   
>> -    if (slot->memory_size && !new && (mem.flags ^ slot->old_flags) & KVM_MEM_READONLY) {
>> +    if (slot->memory_size && !new && (slot->flags ^ slot->old_flags) & KVM_MEM_READONLY) {
> 
> Why the change if mem.flags == slot->flags ?

I guess the goal is to make it clearer that it's comparing the (new) 
flags with old_flags of the slot.

Anyway, if this change is annoying, I can drop it. :)

>>           /* Set the slot size to 0 before setting the slot to the desired
>>            * value. This is needed based on KVM commit 75d61fbc. */
>>           mem.memory_size = 0;
>> -        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
>> +
>> +        if (cap_user_memory2) {
>> +            ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
>> +        } else {
>> +            ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
>> +	    }
>>           if (ret < 0) {
>>               goto err;
>>           }
>>       }
>>       mem.memory_size = slot->memory_size;
>> -    ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
>> +    if (cap_user_memory2) {
>> +        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION2, &mem);
>> +    } else {
>> +        ret = kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
>> +    }
>>       slot->old_flags = mem.flags;
>>   err:
>>       trace_kvm_set_user_memory(mem.slot >> 16, (uint16_t)mem.slot, mem.flags,
>>                                 mem.guest_phys_addr, mem.memory_size,
>> -                              mem.userspace_addr, ret);
>> +                              mem.userspace_addr, mem.gmem_fd,
>> +			      mem.gmem_offset, ret);
>>       if (ret < 0) {
>> -        error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
>> -                     " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
>> -                     __func__, mem.slot, slot->start_addr,
>> -                     (uint64_t)mem.memory_size, strerror(errno));
>> +        if (cap_user_memory2) {
>> +                error_report("%s: KVM_SET_USER_MEMORY_REGION2 failed, slot=%d,"
>> +                        " start=0x%" PRIx64 ", size=0x%" PRIx64 ","
>> +                        " flags=0x%" PRIx32 ","
>> +                        " gmem_fd=%" PRId32 ", gmem_offset=0x%" PRIx64 ": %s",
>> +                        __func__, mem.slot, slot->start_addr,
>> +                (uint64_t)mem.memory_size, mem.flags,
>> +                        mem.gmem_fd, (uint64_t)mem.gmem_offset,
>> +                        strerror(errno));
>> +        } else {
>> +                error_report("%s: KVM_SET_USER_MEMORY_REGION failed, slot=%d,"
>> +                            " start=0x%" PRIx64 ", size=0x%" PRIx64 ": %s",
>> +                __func__, mem.slot, slot->start_addr,
>> +                (uint64_t)mem.memory_size, strerror(errno));
>> +        }
>>       }
>>       return ret;
>>   }
>> @@ -472,6 +505,9 @@ static int kvm_mem_flags(MemoryRegion *mr)
>>       if (readonly && kvm_readonly_mem_allowed) {
>>           flags |= KVM_MEM_READONLY;
>>       }
>> +    if (memory_region_can_be_private(mr)) {
>> +        flags |= KVM_MEM_PRIVATE;
>> +    }
>>       return flags;
>>   }
>>   
>> @@ -1402,6 +1438,9 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>>           mem->ram_start_offset = ram_start_offset;
>>           mem->ram = ram;
>>           mem->flags = kvm_mem_flags(mr);
>> +        mem->fd = mr->ram_block->gmem_fd;
>> +        mem->ofs = (uint8_t*)ram - mr->ram_block->host;
>> +
>>           kvm_slot_init_dirty_bitmap(mem);
>>           err = kvm_set_user_memory_region(kml, mem, true);
>>           if (err) {
>> diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
>> index 14ebfa1b991c..80694683acea 100644
>> --- a/accel/kvm/trace-events
>> +++ b/accel/kvm/trace-events
>> @@ -15,7 +15,7 @@ kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
>>   kvm_irqchip_release_virq(int virq) "virq %d"
>>   kvm_set_ioeventfd_mmio(int fd, uint64_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%" PRIx64 " val=0x%x assign: %d size: %d match: %d"
>>   kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%x val=0x%x assign: %d size: %d match: %d"
>> -kvm_set_user_memory(uint16_t as, uint16_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "AddrSpace#%d Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
>> +kvm_set_user_memory(uint16_t as, uint16_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, uint32_t fd, uint64_t fd_offset, int ret) "AddrSpace#%d Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " gmem_fd=%d" " gmem_fd_offset=0x%" PRIx64 " ret=%d"
>>   kvm_clear_dirty_log(uint32_t slot, uint64_t start, uint32_t size) "slot#%"PRId32" start 0x%"PRIx64" size 0x%"PRIx32
>>   kvm_resample_fd_notify(int gsi) "gsi %d"
>>   kvm_dirty_ring_full(int id) "vcpu %d"
>> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
>> index 511b42bde5c4..48220c0793ac 100644
>> --- a/include/sysemu/kvm_int.h
>> +++ b/include/sysemu/kvm_int.h
>> @@ -30,6 +30,8 @@ typedef struct KVMSlot
>>       int as_id;
>>       /* Cache of the offset in ram address space */
>>       ram_addr_t ram_start_offset;
>> +    int fd;
>> +    hwaddr ofs;
>>   } KVMSlot;
>>   
>>   typedef struct KVMMemoryUpdate {
> 



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem
  2023-08-02 22:53         ` Isaku Yamahata
@ 2023-08-03 13:05           ` David Hildenbrand
  0 siblings, 0 replies; 53+ messages in thread
From: David Hildenbrand @ 2023-08-03 13:05 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: Xiaoyao Li, Paolo Bonzini, Sean Christopherson, Igor Mammedov,
	Michael S. Tsirkin, Marcel Apfelbaum, Richard Henderson,
	Marcelo Tosatti, Markus Armbruster, Eric Blake,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, qemu-devel, kvm

On 03.08.23 00:53, Isaku Yamahata wrote:
> On Wed, Aug 02, 2023 at 04:14:29PM +0200,
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 02.08.23 10:03, Xiaoyao Li wrote:
>>> On 8/2/2023 1:21 AM, David Hildenbrand wrote:
>>>> On 31.07.23 18:21, Xiaoyao Li wrote:
>>>>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>>>>
>>>>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>>>>> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>> ---
>>>>>     backends/hostmem.c       | 18 ++++++++++++++++++
>>>>>     include/sysemu/hostmem.h |  2 +-
>>>>>     qapi/qom.json            |  4 ++++
>>>>>     3 files changed, 23 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/backends/hostmem.c b/backends/hostmem.c
>>>>> index 747e7838c031..dbdbb0aafd45 100644
>>>>> --- a/backends/hostmem.c
>>>>> +++ b/backends/hostmem.c
>>>>> @@ -461,6 +461,20 @@ static void
>>>>> host_memory_backend_set_reserve(Object *o, bool value, Error **errp)
>>>>>         }
>>>>>         backend->reserve = value;
>>>>>     }
>>>>> +
>>>>> +static bool host_memory_backend_get_private(Object *o, Error **errp)
>>>>> +{
>>>>> +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
>>>>> +
>>>>> +    return backend->private;
>>>>> +}
>>>>> +
>>>>> +static void host_memory_backend_set_private(Object *o, bool value,
>>>>> Error **errp)
>>>>> +{
>>>>> +    HostMemoryBackend *backend = MEMORY_BACKEND(o);
>>>>> +
>>>>> +    backend->private = value;
>>>>> +}
>>>>>     #endif /* CONFIG_LINUX */
>>>>>     static bool
>>>>> @@ -541,6 +555,10 @@ host_memory_backend_class_init(ObjectClass *oc,
>>>>> void *data)
>>>>>             host_memory_backend_get_reserve,
>>>>> host_memory_backend_set_reserve);
>>>>>         object_class_property_set_description(oc, "reserve",
>>>>>             "Reserve swap space (or huge pages) if applicable");
>>>>> +    object_class_property_add_bool(oc, "private",
>>>>> +        host_memory_backend_get_private,
>>>>> host_memory_backend_set_private);
>>>>> +    object_class_property_set_description(oc, "private",
>>>>> +        "Use KVM gmem private memory");
>>>>>     #endif /* CONFIG_LINUX */
>>>>>         /*
>>>>>          * Do not delete/rename option. This option must be considered
>>>>> stable
>>>>> diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
>>>>> index 39326f1d4f9c..d88970395618 100644
>>>>> --- a/include/sysemu/hostmem.h
>>>>> +++ b/include/sysemu/hostmem.h
>>>>> @@ -65,7 +65,7 @@ struct HostMemoryBackend {
>>>>>         /* protected */
>>>>>         uint64_t size;
>>>>>         bool merge, dump, use_canonical_path;
>>>>> -    bool prealloc, is_mapped, share, reserve;
>>>>> +    bool prealloc, is_mapped, share, reserve, private;
>>>>>         uint32_t prealloc_threads;
>>>>>         ThreadContext *prealloc_context;
>>>>>         DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
>>>>> diff --git a/qapi/qom.json b/qapi/qom.json
>>>>> index 7f92ea43e8e1..e0b2044e3d20 100644
>>>>> --- a/qapi/qom.json
>>>>> +++ b/qapi/qom.json
>>>>> @@ -605,6 +605,9 @@
>>>>>     # @reserve: if true, reserve swap space (or huge pages) if applicable
>>>>>     #     (default: true) (since 6.1)
>>>>>     #
>>>>> +# @private: if true, use KVM gmem private memory
>>>>> +#           (default: false) (since 8.1)
>>>>> +#
>>>>
>>>> But that's not what any of this does.
>>>>
>>>> This patch only adds a property and doesn't even explain what it intends
>>>> to achieve with that.
>>>>
>>>> How will it be used by a user? What will it affect internally? What
>>>> will it modify in regard to the memory backend?
>>>
>>> How it will be used is shown in the next patch, patch 09.
>>>
>>> For a KVM_X86_SW_PROTECTED_VM type VM, it will allocate private gmem with
>>> a KVM ioctl if the memory backend has the "private" property on.
>>
>> It feels wired up the wrong way.
>>
>> When creating/initializing the memory backend, we should also take care of
>> allocating the gmem_fd, for example, by doing some gmem allocation callback,
>> ideally *internally* creating the RAM memory region / RAMBlock.
>>
>> And we should fail if that is impossible (gmem does not apply to the VM) or
>> creating the gmem_fd failed for other reason.
>>
>> Like passing a RAM_GMEM flag to memory_region_init_ram_flags_nomigrate() in
>> ram_backend_memory_alloc(), to then handle it internally, failing if there
>> is an error.
> 
> KVM gmem is tied to a VM, so it cannot be created before the VM itself. We
> have to delay the allocation of KVM gmem until VM initialization.

In vl.c, the flow is

1) Create machine: qemu_create_machine()
2) Configure KVM: configure_accelerators()
3) Create backends: qemu_create_late_backends()

Staring at object_create_early(), "memory-backend-*" objects are always
created late.

So maybe, at the time memory backends are created+initialized,
everything you need is already in place.

> 
> Hmm, one option is to move gmem_fd from RAMBlock to KVMSlot.  Handle the
> allocation of KVM gmem (issuing the KVM gmem ioctl) there, i.e. in
> kvm_set_phys_mem() or kvm_region_add() (or whatever functions of the KVM memory
> listener).  Maybe we can drop ram_block_convert_range() and have KVM-specific
> logic instead.

Might be doable as well.

> 
> We still need a way for the user to specify which guest memory region is
> subject to KVM gmem, though.

Let's minimize the gmem hacks and come up with something clean.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [RFC PATCH 15/19] kvm: handle KVM_EXIT_MEMORY_FAULT
  2023-07-31 16:21 ` [RFC PATCH 15/19] kvm: handle KVM_EXIT_MEMORY_FAULT Xiaoyao Li
  2023-08-02 22:25   ` Isaku Yamahata
@ 2023-08-09 15:02   ` Xu Yilun
  2023-09-13  7:00     ` Xiaoyao Li
  1 sibling, 1 reply; 53+ messages in thread
From: Xu Yilun @ 2023-08-09 15:02 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On 2023-07-31 at 12:21:57 -0400, Xiaoyao Li wrote:
> From: Chao Peng <chao.p.peng@linux.intel.com>
> 
> Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when
> KVM_EXIT_MEMORY_FAULT happens. It indicates userspace needs to do
> the memory conversion on the RAMBlock to turn the memory into the desired
> attribute, i.e., private/shared.
> 
> Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has a
> gmem memory backend.
> 
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
>  accel/kvm/kvm-all.c | 52 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index f9b5050b8885..72d50b923bf2 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -3040,6 +3040,48 @@ static void kvm_eat_signals(CPUState *cpu)
>      } while (sigismember(&chkset, SIG_IPI));
>  }
>  
> +static int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private)
> +{
> +    MemoryRegionSection section;
> +    void *addr;
> +    RAMBlock *rb;
> +    ram_addr_t offset;
> +    int ret = -1;
> +
> +    section = memory_region_find(get_system_memory(), start, size);
> +    if (!section.mr) {
> +        return ret;
> +    }
> +
> +    if (memory_region_can_be_private(section.mr)) {
> +        if (to_private) {
> +            ret = kvm_set_memory_attributes_private(start, size);
> +        } else {
> +            ret = kvm_set_memory_attributes_shared(start, size);
> +        }
> +
> +        if (ret) {
> +            return ret;

Should we unref the memory region before return?

Thanks,
Yilun

> +        }
> +
> +        addr = memory_region_get_ram_ptr(section.mr) +
> +               section.offset_within_region;
> +        rb = qemu_ram_block_from_host(addr, false, &offset);
> +        /*
> +         * With KVM_SET_MEMORY_ATTRIBUTES by kvm_set_memory_attributes(),
> +         * operation on underlying file descriptor is only for releasing
> +         * unnecessary pages.
> +         */
> +        ram_block_convert_range(rb, offset, size, to_private);
> +    } else {
> +        warn_report("Convert non guest-memfd backed memory region (0x%"HWADDR_PRIx" + 0x%"HWADDR_PRIx") to %s",
> +                    start, size, to_private ? "private" : "shared");
> +    }
> +
> +    memory_region_unref(section.mr);
> +    return ret;
> +}
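
One way to plug the leak flagged above, sketched here as an untested
illustration rather than the author's actual fix, is to funnel the
failure path through the unref that already sits at the end of the
function:

    if (ret) {
        goto out;              /* drop the ref from memory_region_find() */
    }
    /* ... ram_block_convert_range() etc. ... */
out:
    memory_region_unref(section.mr);
    return ret;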



* Re: [RFC PATCH 00/19] QEMU gmem implemention
  2023-08-01  1:45   ` Xiaoyao Li
@ 2023-08-10 15:58     ` Michael Roth via
  2023-08-14 21:45       ` Isaku Yamahata
  0 siblings, 1 reply; 53+ messages in thread
From: Michael Roth via @ 2023-08-10 15:58 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Daniel P. Berrangé,
	Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, isaku.yamahata, qemu-devel, kvm

On Tue, Aug 01, 2023 at 09:45:41AM +0800, Xiaoyao Li wrote:
> On 8/1/2023 12:51 AM, Daniel P. Berrangé wrote:
> > On Mon, Jul 31, 2023 at 12:21:42PM -0400, Xiaoyao Li wrote:
> > > [...]
> > 
> > We already have a general purpose configuration mechanism for
> > confidential guests.  The -machine argument has a property
> > confidential-guest-support=$OBJECT-ID, for pointing to an
> > object that implements the TYPE_CONFIDENTIAL_GUEST_SUPPORT
> > interface in QEMU. This is implemented with SEV, PPC PEF
> > mode, and s390 protvirt.
> > 
> > I would expect TDX to follow this same design, i.e.
> > 
> >      qemu-system-x86_64 \
> >        -object tdx-guest,id=tdx0,..... \
> >        -machine q35,confidential-guest-support=tdx0 \
> >        ...
> > 
> > and not require inventing the new 'kvm-type' attribute at least.
> 
> yes.
> 
> TDX is initialized exactly as the above.
> 
> This RFC series introduces the 'kvm-type' for KVM_X86_SW_PROTECTED_VM. It's
> my fault that I forgot to list the option of introducing a sw_protected_vm
> object with the CONFIDENTIAL_GUEST_SUPPORT interface.
> Thanks to Isaku for raising it: https://lore.kernel.org/qemu-devel/20230731171041.GB1807130@ls.amr.corp.intel.com/
> 
> we can specify KVM_X86_SW_PROTECTED_VM this way:
> 
> qemu  \
>   -object sw-protected,id=swp0,... \
>   -machine confidential-guest-support=swp0 \
>   ...
> 
> > For the memory backend though, I'm not so sure - possibly that
> > might be something that still wants an extra property to identify
> > the type of memory to allocate, since we use memory-backend-ram
> > for a variety of use cases.  Or it could be an entirely new object
> > type such as "memory-backend-gmem"
> 
> What I want to discuss is whether to provide an interface that lets users
> configure which memory is/can be private. For example, QEMU could do it
> internally: if the user wants a confidential guest, QEMU allocates private
> gmem for normal RAM automatically.

I think handling it automatically simplifies things a good deal on the
QEMU side. I think it's still worthwhile to allow:

 -object memory-backend-memfd-private,...

because it provides a nice mechanism to set up a pair of shared/private
memfd's to enable hole-punching via fallocate() to avoid doubling memory
allocations for shared/private. It's also a nice place to control
potentially-configurable things like:

 - whether or not to enable discard/hole-punching
 - if discard is enabled, whether or not to register the range via
   RamDiscardManager interface so that VFIO/IOMMU mappings get updated
   when doing PCI passthrough. SNP relies on this for PCI passthrough
   when discard is enabled, otherwise DMA occurs to stale mappings of
   discarded bounce-buffer pages:

     https://github.com/AMDESE/qemu/blob/snp-latest/backends/hostmem-memfd-private.c#L449
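
(For reference, the hole-punching mentioned above is plain fallocate()
on the memfd; a minimal standalone illustration, not code from the SNP
tree:)

    #define _GNU_SOURCE
    #include <fcntl.h>          /* fallocate() */
    #include <linux/falloc.h>   /* FALLOC_FL_* */

    /*
     * Release the backing pages of [offset, offset + len) in whichever
     * memfd just became unused after a shared<->private conversion, so
     * only one copy of the range stays allocated.
     */
    static int discard_range(int memfd, off_t offset, off_t len)
    {
        return fallocate(memfd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                         offset, len);
    }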

But for other memory ranges, it doesn't do a lot of good to rely on
users to control those via -object memory-backend-memfd-private, since
QEMU will set up some regions internally, like the UEFI ROM.

It also isn't ideal for QEMU itself to internally control what
should/shouldn't be set up with a backing guest_memfd, because some
guest kernels do weird stuff, like scanning for ROM regions in areas that
guest kernels might have mapped as encrypted in the guest page table. You
can consider them to be guest bugs, but even current SNP-capable
kernels exhibit this behavior and if the guest wants to do dumb stuff
QEMU should let it.

But for these latter two cases, it doesn't make sense to attempt any sort
of discarding of backing pages, since discarding ROM pages is pointless.

So I think it makes sense to just set up the gmemfd automatically across
the board internally, and keep memory-backend-memfd-private around
purely as a way to control/configure discardable memory.

-Mike

> 
> 



* Re: [RFC PATCH 00/19] QEMU gmem implemention
  2023-08-10 15:58     ` Michael Roth via
@ 2023-08-14 21:45       ` Isaku Yamahata
  0 siblings, 0 replies; 53+ messages in thread
From: Isaku Yamahata @ 2023-08-14 21:45 UTC (permalink / raw)
  To: Michael Roth
  Cc: Xiaoyao Li, Daniel P. Berrangé,
	Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, isaku.yamahata, qemu-devel, kvm

On Thu, Aug 10, 2023 at 10:58:09AM -0500,
Michael Roth via <qemu-devel@nongnu.org> wrote:

> [...]
> 
> I think handling it automatically simplifies things a good deal on the
> QEMU side. I think it's still worthwhile to allow:
> 
>  -object memory-backend-memfd-private,...
> 
> because it provides a nice mechanism to set up a pair of shared/private
> memfd's to enable hole-punching via fallocate() to avoid doubling memory
> allocations for shared/private. It's also a nice place to control
> potentially-configurable things like:
> 
>  - whether or not to enable discard/hole-punching
>  - if discard is enabled, whether or not to register the range via
>    RamDiscardManager interface so that VFIO/IOMMU mappings get updated
>    when doing PCI passthrough. SNP relies on this for PCI passthrough
>    when discard is enabled, otherwise DMA occurs to stale mappings of
>    discarded bounce-buffer pages:
> 
>      https://github.com/AMDESE/qemu/blob/snp-latest/backends/hostmem-memfd-private.c#L449
> 
> But for other memory ranges, it doesn't do a lot of good to rely on
> users to control those via -object memory-backend-memfd-private, since
> QEMU will set up some regions internally, like the UEFI ROM.
> 
> It also isn't ideal for QEMU itself to internally control what
> should/shouldn't be set up with a backing guest_memfd, because some
> guest kernels do weird stuff, like scanning for ROM regions in areas that
> guest kernels might have mapped as encrypted in the guest page table. You
> can consider them to be guest bugs, but even current SNP-capable
> kernels exhibit this behavior and if the guest wants to do dumb stuff
> QEMU should let it.
> 
> But for these latter two cases, it doesn't make sense to attempt any sort
> of discarding of backing pages, since discarding ROM pages is pointless.
> 
> So I think it makes sense to just set up the gmemfd automatically across
> the board internally, and keep memory-backend-memfd-private around
> purely as a way to control/configure discardable memory.


I'm looking at the repo and
31a7c7e36684 ("*hostmem-memfd-private: Initial discard manager support")

Do we have to implement RAM_DISCARD_MANAGER in memory-backend-memfd-private?
Can't we implement it in host_mem? The interface callbacks can have a check
like "if (!private) return".  Then we can support any host-mem backend.
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>



* Re: [RFC PATCH 15/19] kvm: handle KVM_EXIT_MEMORY_FAULT
  2023-08-02 22:25   ` Isaku Yamahata
@ 2023-09-13  6:59     ` Xiaoyao Li
  0 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-09-13  6:59 UTC (permalink / raw)
  To: Isaku Yamahata
  Cc: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, qemu-devel, kvm

On 8/3/2023 6:25 AM, Isaku Yamahata wrote:
> On Mon, Jul 31, 2023 at 12:21:57PM -0400,
> Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 
>> [...]
>> +        addr = memory_region_get_ram_ptr(section.mr) +
>> +               section.offset_within_region;
>> +        rb = qemu_ram_block_from_host(addr, false, &offset);
> 
> Here we already have the section; section.mr->ram_block gives the RAMBlock.
> We don't have to scan the existing RAMBlocks.

But we don't have the @offset, do we?

> Except that, looks good to me.
> Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>




* Re: [RFC PATCH 15/19] kvm: handle KVM_EXIT_MEMORY_FAULT
  2023-08-09 15:02   ` Xu Yilun
@ 2023-09-13  7:00     ` Xiaoyao Li
  0 siblings, 0 replies; 53+ messages in thread
From: Xiaoyao Li @ 2023-09-13  7:00 UTC (permalink / raw)
  To: Xu Yilun
  Cc: Paolo Bonzini, Sean Christopherson, David Hildenbrand,
	Igor Mammedov, Michael S. Tsirkin, Marcel Apfelbaum,
	Richard Henderson, Marcelo Tosatti, Markus Armbruster,
	Eric Blake, Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Peter Xu, Chao Peng, Michael Roth, isaku.yamahata, qemu-devel,
	kvm

On 8/9/2023 11:02 PM, Xu Yilun wrote:
> On 2023-07-31 at 12:21:57 -0400, Xiaoyao Li wrote:
>> [...]
>> +    if (memory_region_can_be_private(section.mr)) {
>> +        if (to_private) {
>> +            ret = kvm_set_memory_attributes_private(start, size);
>> +        } else {
>> +            ret = kvm_set_memory_attributes_shared(start, size);
>> +        }
>> +
>> +        if (ret) {
>> +            return ret;
> 
> Should we unref the memory region before return?

Thanks for catching this!





end of thread, other threads:[~2023-09-13  7:00 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-31 16:21 [RFC PATCH 00/19] QEMU gmem implemention Xiaoyao Li
2023-07-31 16:21 ` [RFC PATCH 01/19] trace/kvm: Split address space and slot id in trace_kvm_set_user_memory() Xiaoyao Li
2023-07-31 16:21 ` [RFC PATCH 02/19] *** HACK *** linux-headers: Update headers to pull in gmem APIs Xiaoyao Li
2023-07-31 16:21 ` [RFC PATCH 03/19] RAMBlock: Support KVM gmemory Xiaoyao Li
2023-08-01 16:33   ` David Hildenbrand
2023-07-31 16:21 ` [RFC PATCH 04/19] memory: Introduce memory_region_can_be_private() Xiaoyao Li
2023-07-31 21:23   ` Peter Xu
2023-07-31 21:33     ` Michael S. Tsirkin
2023-07-31 21:34     ` Sean Christopherson
2023-07-31 21:36       ` Michael S. Tsirkin
2023-08-01  0:21         ` Peter Xu
2023-08-01 16:23           ` Sean Christopherson
2023-08-01 16:48   ` Claudio Fontana
2023-08-01 16:52     ` Claudio Fontana
2023-08-02  8:05       ` Xiaoyao Li
2023-07-31 16:21 ` [RFC PATCH 05/19] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot Xiaoyao Li
2023-08-01 17:10   ` Claudio Fontana
2023-08-03  8:43     ` Xiaoyao Li
2023-07-31 16:21 ` [RFC PATCH 06/19] i386/pc: Drop pc_machine_kvm_type() Xiaoyao Li
2023-08-02 23:00   ` Isaku Yamahata
2023-07-31 16:21 ` [RFC PATCH 07/19] target/i386: Implement mc->kvm_type() to get VM type Xiaoyao Li
2023-07-31 16:21 ` [RFC PATCH 08/19] HostMem: Add private property to indicate to use kvm gmem Xiaoyao Li
2023-07-31 17:22   ` Markus Armbruster
2023-08-01 14:54     ` Xiaoyao Li
2023-08-01 14:57     ` Daniel P. Berrangé
2023-08-02  8:04       ` Xiaoyao Li
2023-08-01 17:21   ` David Hildenbrand
2023-08-02  8:03     ` Xiaoyao Li
2023-08-02 14:14       ` David Hildenbrand
2023-08-02 22:53         ` Isaku Yamahata
2023-08-03 13:05           ` David Hildenbrand
2023-07-31 16:21 ` [RFC PATCH 09/19] i386/kvm: Create gmem fd for KVM_X86_SW_PROTECTED_VM Xiaoyao Li
2023-07-31 16:21 ` [RFC PATCH 10/19] kvm: Introduce support for memory_attributes Xiaoyao Li
2023-07-31 16:21 ` [RFC PATCH 11/19] kvm/memory: Introduce the infrastructure to set the default shared/private value Xiaoyao Li
2023-07-31 16:21 ` [RFC PATCH 12/19] i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM Xiaoyao Li
2023-07-31 16:21 ` [RFC PATCH 13/19] physmem: replace function name with __func__ in ram_block_discard_range() Xiaoyao Li
2023-07-31 16:21 ` [RFC PATCH 14/19] physmem: Add ram_block_convert_range Xiaoyao Li
2023-07-31 16:21 ` [RFC PATCH 15/19] kvm: handle KVM_EXIT_MEMORY_FAULT Xiaoyao Li
2023-08-02 22:25   ` Isaku Yamahata
2023-09-13  6:59     ` Xiaoyao Li
2023-08-09 15:02   ` Xu Yilun
2023-09-13  7:00     ` Xiaoyao Li
2023-07-31 16:21 ` [RFC PATCH 16/19] trace/kvm: Add trace for page convertion between shared and private Xiaoyao Li
2023-07-31 16:21 ` [RFC PATCH 17/19] pci-host/q35: Move PAM initialization above SMRAM initialization Xiaoyao Li
2023-07-31 16:22 ` [RFC PATCH 18/19] q35: Introduce smm_ranges property for q35-pci-host Xiaoyao Li
2023-07-31 16:22 ` [RFC PATCH 19/19] i386: Disable SMM mode for X86_SW_PROTECTED_VM Xiaoyao Li
2023-08-02 22:27   ` Isaku Yamahata
2023-07-31 16:51 ` [RFC PATCH 00/19] QEMU gmem implemention Daniel P. Berrangé
2023-08-01  1:45   ` Xiaoyao Li
2023-08-10 15:58     ` Michael Roth via
2023-08-14 21:45       ` Isaku Yamahata
2023-07-31 17:10 ` Isaku Yamahata
2023-08-01  1:55   ` Xiaoyao Li
