All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/8] ioregionfd introduction
@ 2022-02-08  7:22 Elena Ufimtseva
  2022-02-08  7:22 ` [RFC 1/8] ioregionfd: introduce a syscall and memory API Elena Ufimtseva
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2022-02-08  7:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, armbru, peterx, mst, berrange, stefanha, pbonzini, philmd

This patchset is an RFC version for the ioregionfd implementation
in QEMU. The kernel patches are to be posted with some fixes as a v4.

For this implementation version 3 of the posted kernel patches was user:
https://lore.kernel.org/kvm/cover.1613828726.git.eafanasova@gmail.com/

The future version will include support for vfio/libvfio-user.
Please refer to the design discussion here proposed by Stefan:
https://lore.kernel.org/all/YXpb1f3KicZxj1oj@stefanha-x1.localdomain/T/

The vfio-user version needed some bug-fixing and it was decided to send
this for multiprocess first.

The ioregionfd is configured currently trough the command line and each
ioregionfd represent an object. This allow for easy parsing and does
not require device/remote object command line option modifications.

The following command line can be used to specify ioregionfd:
<snip>
  '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
  '-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
  '-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
</snip>

Proxy side of ioregionfd in this version uses only one file descriptor:
<snip>
  '-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()), \
</snip>

This is done for RFC version and my though was that next version will
be for vfio-user, so I have not dedicated much effort to this command
line options.

The multiprocess messaging protocol was extended to support inquiries
by the proxy if device has any ioregionfds.
This RFC implements inquires by proxy about the type of BAR (ioregionfd
or not) and the type of it (memory/io).

Currently there are few limitations in this version of ioregionfd.
 - one ioregionfd per bar, only full bar size is supported;
 - one file descriptor per device for all of its ioregionfds;
 - each remote device runs fd handler for all its BARs in one IOThread;
 - proxy supports only one fd.

Some of these limitations will be dropped in the future version.
This RFC is to acquire the feedback/suggestions from the community
on the general approach.

The quick performance test was done for the remote lsi device with
ioregionfd and without for both mem BARs (1 and 2) with help
of the fio tool:

Random R/W:

	             read IOPS	read BW     write IOPS   write BW
no ioregionfd	 889	    3559KiB/s   890          3561KiB/s
ioregionfd	     938	    3756KiB/s   939          3757KiB/s


Sequential Read and Sequential Write:

                 Sequential read		Sequential write	
                 read IOPS	read BW	    write IOPS	 write BW

no ioregionfd    367k	    1434MiB/s	76k	         297MiB/s
ioregionfd       374k	    1459MiB/s	77.3k	     302MiB/s


Please review and send your feedback.

Thank you!
Elena

Elena Ufimtseva (8):
  ioregionfd: introduce a syscall and memory API
  multiprocess: place RemoteObject definition in a header file
  ioregionfd: introduce memory API functions
  ioregionfd: Introduce IORegionDFObject type
  multiprocess: prepare ioregionfds for remote device
  multiprocess: add MPQEMU_CMD_BAR_INFO
  multiprocess: add ioregionfd memory region in proxy
  multiprocess: handle ioregionfd commands

 meson.build                     |  15 +-
 qapi/qom.json                   |  32 ++-
 include/exec/memory.h           |  50 +++++
 include/hw/remote/ioregionfd.h  |  45 ++++
 include/hw/remote/machine.h     |   1 +
 include/hw/remote/mpqemu-link.h |   2 +
 include/hw/remote/proxy.h       |   1 +
 include/hw/remote/remote.h      |  31 +++
 include/sysemu/kvm.h            |  15 ++
 linux-headers/ioregionfd.h      |  30 +++
 linux-headers/linux/kvm.h       |  25 +++
 accel/kvm/kvm-all.c             | 132 ++++++++++++
 accel/stubs/kvm-stub.c          |   1 +
 hw/remote/ioregionfd.c          | 361 ++++++++++++++++++++++++++++++++
 hw/remote/message.c             |  38 ++++
 hw/remote/proxy.c               |  66 +++++-
 hw/remote/remote-obj.c          | 154 ++++++++++++--
 softmmu/memory.c                | 207 ++++++++++++++++++
 Kconfig.host                    |   3 +
 MAINTAINERS                     |   3 +
 hw/remote/Kconfig               |   4 +
 hw/remote/meson.build           |   1 +
 meson_options.txt               |   2 +
 scripts/meson-buildoptions.sh   |   3 +
 24 files changed, 1199 insertions(+), 23 deletions(-)
 create mode 100644 include/hw/remote/ioregionfd.h
 create mode 100644 include/hw/remote/remote.h
 create mode 100644 linux-headers/ioregionfd.h
 create mode 100644 hw/remote/ioregionfd.c

-- 
2.25.1



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC 1/8] ioregionfd: introduce a syscall and memory API
  2022-02-08  7:22 [RFC 0/8] ioregionfd introduction Elena Ufimtseva
@ 2022-02-08  7:22 ` Elena Ufimtseva
  2022-02-16 12:19   ` David Hildenbrand
  2022-02-08  7:22 ` [RFC 2/8] multiprocess: place RemoteObject definition in a header file Elena Ufimtseva
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Elena Ufimtseva @ 2022-02-08  7:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, armbru, peterx, mst, berrange, stefanha, pbonzini, philmd

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 include/exec/memory.h     |  50 +++++++++++++++
 include/sysemu/kvm.h      |  15 +++++
 linux-headers/linux/kvm.h |  25 ++++++++
 accel/kvm/kvm-all.c       | 132 ++++++++++++++++++++++++++++++++++++++
 accel/stubs/kvm-stub.c    |   1 +
 5 files changed, 223 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 20f1b27377..2ce7f35cc2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -712,6 +712,7 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
 typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
+typedef struct MemoryRegionIoregionfd MemoryRegionIoregionfd;
 
 /** MemoryRegion:
  *
@@ -756,6 +757,8 @@ struct MemoryRegion {
     const char *name;
     unsigned ioeventfd_nb;
     MemoryRegionIoeventfd *ioeventfds;
+    unsigned ioregionfd_nb;
+    MemoryRegionIoregionfd *ioregionfds;
     RamDiscardManager *rdm; /* Only for RAM */
 };
 
@@ -974,6 +977,38 @@ struct MemoryListener {
      */
     void (*eventfd_del)(MemoryListener *listener, MemoryRegionSection *section,
                         bool match_data, uint64_t data, EventNotifier *e);
+    /**
+     * @ioregionfd_add:
+     *
+     * Called during an address space update transaction,
+     * for a section of the address space that has had a new ioregionfd
+     * registration since the last transaction.
+     *
+     * @listener: The #MemoryListener.
+     * @section: The new #MemoryRegionSection.
+     * @data: The @data parameter for the new ioregionfd.
+     * @fd: The file descriptor parameter for the new ioregionfd.
+     */
+    void (*ioregionfd_add)(MemoryListener *listener,
+                           MemoryRegionSection *section,
+                           uint64_t data, int fd);
+
+    /**
+     * @ioregionfd_del:
+     *
+     * Called during an address space update transaction,
+     * for a section of the address space that has dropped an ioregionfd
+     * registration since the last transaction.
+     *
+     * @listener: The #MemoryListener.
+     * @section: The new #MemoryRegionSection.
+     * @data: The @data parameter for the dropped ioregionfd.
+     * @fd: The file descriptor parameter for the dropped ioregionfd.
+     */
+    void (*ioregionfd_del)(MemoryListener *listener,
+                           MemoryRegionSection *section,
+                           uint64_t data, int fd);
+
 
     /**
      * @coalesced_io_add:
@@ -1041,6 +1076,8 @@ struct AddressSpace {
 
     int ioeventfd_nb;
     struct MemoryRegionIoeventfd *ioeventfds;
+    int ioregionfd_nb;
+    struct MemoryRegionIoregionfd *ioregionfds;
     QTAILQ_HEAD(, MemoryListener) listeners;
     QTAILQ_ENTRY(AddressSpace) address_spaces_link;
 };
@@ -2175,6 +2212,19 @@ void memory_region_del_eventfd(MemoryRegion *mr,
                                uint64_t data,
                                EventNotifier *e);
 
+void memory_region_add_ioregionfd(MemoryRegion *mr,
+                                  hwaddr addr,
+                                  unsigned size,
+                                  uint64_t data,
+                                  int fd,
+                                  bool pio);
+
+void memory_region_del_ioregionfd(MemoryRegion *mr,
+                                  hwaddr addr,
+                                  unsigned size,
+                                  uint64_t data,
+                                  int fd);
+
 /**
  * memory_region_add_subregion: Add a subregion to a container.
  *
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 7b22aeb6ae..fea77b5185 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -46,6 +46,7 @@ extern bool kvm_readonly_mem_allowed;
 extern bool kvm_direct_msi_allowed;
 extern bool kvm_ioeventfd_any_length_allowed;
 extern bool kvm_msi_use_devid;
+extern bool kvm_ioregionfds_allowed;
 
 #define kvm_enabled()           (kvm_allowed)
 /**
@@ -167,6 +168,15 @@ extern bool kvm_msi_use_devid;
  */
 #define kvm_msi_devid_required() (kvm_msi_use_devid)
 
+/**
+ * kvm_ioregionfds_enabled:
+ *
+ * Returns: true if we can use ioregionfd to receive the MMIO/PIO
+ * dispatches from KVM (ie the kernel supports ioregionfd and we are running
+ * with a configuration where it is meaningful to use them).
+ */
+#define kvm_ioregionfds_enabled() (kvm_ioregionfds_allowed)
+
 #else
 
 #define kvm_enabled()           (0)
@@ -184,12 +194,14 @@ extern bool kvm_msi_use_devid;
 #define kvm_direct_msi_enabled() (false)
 #define kvm_ioeventfd_any_length_enabled() (false)
 #define kvm_msi_devid_required() (false)
+#define kvm_ioregionfds_enabled (false)
 
 #endif  /* CONFIG_KVM_IS_POSSIBLE */
 
 struct kvm_run;
 struct kvm_lapic_state;
 struct kvm_irq_routing_entry;
+struct kvm_ioregion;
 
 typedef struct KVMCapabilityInfo {
     const char *name;
@@ -548,4 +560,7 @@ bool kvm_cpu_check_are_resettable(void);
 bool kvm_arch_cpu_check_are_resettable(void);
 
 bool kvm_dirty_ring_enabled(void);
+
+int kvm_set_ioregionfd(struct kvm_ioregion *ioregionfd);
+
 #endif
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index bcaf66cc4d..1ad444a74e 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -776,6 +776,29 @@ struct kvm_ioeventfd {
 	__u8  pad[36];
 };
 
+enum {
+        kvm_ioregion_flag_nr_pio,
+        kvm_ioregion_flag_nr_posted_writes,
+        kvm_ioregion_flag_nr_deassign,
+        kvm_ioregion_flag_nr_max,
+};
+
+#define KVM_IOREGION_PIO (1 << kvm_ioregion_flag_nr_pio)
+#define KVM_IOREGION_POSTED_WRITES (1 << kvm_ioregion_flag_nr_posted_writes)
+#define KVM_IOREGION_DEASSIGN (1 << kvm_ioregion_flag_nr_deassign)
+
+#define KVM_IOREGION_VALID_FLAG_MASK ((1 << kvm_ioregion_flag_nr_max) - 1)
+
+struct kvm_ioregion {
+        __u64 guest_paddr; /* guest physical address */
+        __u64 memory_size; /* bytes */
+        __u64 user_data;
+        __s32 read_fd;
+        __s32 write_fd;
+        __u32 flags;
+        __u8  pad[28];
+};
+
 #define KVM_X86_DISABLE_EXITS_MWAIT          (1 << 0)
 #define KVM_X86_DISABLE_EXITS_HLT            (1 << 1)
 #define KVM_X86_DISABLE_EXITS_PAUSE          (1 << 2)
@@ -933,6 +956,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PIT_STATE2 35
 #endif
 #define KVM_CAP_IOEVENTFD 36
+#define KVM_CAP_IOREGIONFD 206
 #define KVM_CAP_SET_IDENTITY_MAP_ADDR 37
 #ifdef __KVM_HAVE_XEN_HVM
 #define KVM_CAP_XEN_HVM 38
@@ -1372,6 +1396,7 @@ struct kvm_vfio_spapr_tce {
 					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_SET_IOREGION          _IOW(KVMIO,  0x49, struct kvm_ioregion)
 
 /* enable ucontrol for s390 */
 struct kvm_s390_ucas_mapping {
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index eecd8031cf..dda04a0ae1 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -168,6 +168,7 @@ bool kvm_vm_attributes_allowed;
 bool kvm_direct_msi_allowed;
 bool kvm_ioeventfd_any_length_allowed;
 bool kvm_msi_use_devid;
+bool kvm_ioregionfds_allowed;
 static bool kvm_immediate_exit;
 static hwaddr kvm_max_slot_size = ~0;
 
@@ -384,6 +385,18 @@ err:
     return ret;
 }
 
+int kvm_set_ioregionfd(struct kvm_ioregion *ioregionfd)
+{
+    KVMState *s = kvm_state;
+    int ret = -1;
+
+    ret = kvm_vm_ioctl(s, KVM_SET_IOREGION, ioregionfd);
+    if (ret < 0) {
+        error_report("Failed SET_IOREGION syscall ret is %d", ret);
+    }
+    return ret;
+}
+
 static int do_kvm_destroy_vcpu(CPUState *cpu)
 {
     KVMState *s = kvm_state;
@@ -1635,6 +1648,104 @@ static void kvm_io_ioeventfd_del(MemoryListener *listener,
     }
 }
 
+static void kvm_mem_ioregionfd_add(MemoryListener *listener,
+                                   MemoryRegionSection *section,
+                                   uint64_t data,
+                                   int fd)
+{
+
+    struct kvm_ioregion ioregionfd;
+    int r = -1;
+
+    ioregionfd.guest_paddr = section->offset_within_address_space;
+    ioregionfd.memory_size = int128_get64(section->size);
+    ioregionfd.user_data = data;
+    ioregionfd.read_fd = fd;
+    ioregionfd.write_fd = fd;
+    ioregionfd.flags = 0;
+    memset(&ioregionfd.pad, 0, sizeof(ioregionfd.pad));
+
+    r = kvm_set_ioregionfd(&ioregionfd);
+    if (r < 0) {
+        fprintf(stderr, "%s: error adding ioregionfd: %s (%d)\n,",
+                __func__, strerror(-r), -r);
+        abort();
+    }
+}
+
+static void kvm_mem_ioregionfd_del(MemoryListener *listener,
+                                   MemoryRegionSection *section,
+                                   uint64_t data,
+                                   int fd)
+
+{
+    struct kvm_ioregion ioregionfd;
+    int r = -1;
+
+    ioregionfd.guest_paddr = section->offset_within_address_space;
+    ioregionfd.memory_size = int128_get64(section->size);
+    ioregionfd.user_data = data;
+    ioregionfd.read_fd = fd;
+    ioregionfd.write_fd = fd;
+    ioregionfd.flags = KVM_IOREGION_DEASSIGN;
+    memset(&ioregionfd.pad, 0, sizeof(ioregionfd.pad));
+
+    r = kvm_set_ioregionfd(&ioregionfd);
+    if (r < 0) {
+        fprintf(stderr, "%s: error deleting ioregionfd: %s (%d)\n,",
+                __func__, strerror(-r), -r);
+        abort();
+    }
+}
+
+static void kvm_io_ioregionfd_add(MemoryListener *listener,
+                                  MemoryRegionSection *section,
+                                  uint64_t data,
+                                  int fd)
+{
+    struct kvm_ioregion ioregionfd;
+    int r = -1;
+
+    ioregionfd.guest_paddr = section->offset_within_address_space;
+    ioregionfd.memory_size = int128_get64(section->size);
+    ioregionfd.user_data = data;
+    ioregionfd.read_fd = fd;
+    ioregionfd.write_fd = fd;
+    ioregionfd.flags = KVM_IOREGION_PIO;
+    memset(&ioregionfd.pad, 0, sizeof(ioregionfd.pad));
+
+    r = kvm_set_ioregionfd(&ioregionfd);
+    if (r < 0) {
+        fprintf(stderr, "%s: error adding pio ioregionfd: %s (%d)\n,",
+                __func__, strerror(-r), -r);
+        abort();
+    }
+}
+
+static void kvm_io_ioregionfd_del(MemoryListener *listener,
+                                  MemoryRegionSection *section,
+                                  uint64_t data,
+                                  int fd)
+{
+    struct kvm_ioregion ioregionfd;
+    int r = -1;
+
+    ioregionfd.guest_paddr = section->offset_within_address_space;
+    ioregionfd.memory_size = int128_get64(section->size);
+    ioregionfd.user_data = data;
+    ioregionfd.read_fd = fd;
+    ioregionfd.write_fd = fd;
+    ioregionfd.flags = KVM_IOREGION_DEASSIGN | KVM_IOREGION_PIO;
+    memset(&ioregionfd.pad, 0, sizeof(ioregionfd.pad));
+
+    r = kvm_set_ioregionfd(&ioregionfd);
+    if (r < 0) {
+        fprintf(stderr, "%s: error deleting pio ioregionfd: %s (%d)\n,",
+                __func__, strerror(-r), -r);
+        abort();
+    }
+}
+
 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
                                   AddressSpace *as, int as_id, const char *name)
 {
@@ -1679,6 +1790,12 @@ static MemoryListener kvm_io_listener = {
     .priority = 10,
 };
 
+static MemoryListener kvm_ioregion_listener = {
+    .ioregionfd_add = kvm_io_ioregionfd_add,
+    .ioregionfd_del = kvm_io_ioregionfd_del,
+    .priority = 10,
+};
+
 int kvm_set_irq(KVMState *s, int irq, int level)
 {
     struct kvm_irq_level event;
@@ -2564,6 +2681,9 @@ static int kvm_init(MachineState *ms)
     kvm_ioeventfd_any_length_allowed =
         (kvm_check_extension(s, KVM_CAP_IOEVENTFD_ANY_LENGTH) > 0);
 
+    kvm_ioregionfds_allowed =
+        (kvm_check_extension(s, KVM_CAP_IOREGIONFD) > 0);
+
     kvm_state = s;
 
     ret = kvm_arch_init(ms, s);
@@ -2585,6 +2705,12 @@ static int kvm_init(MachineState *ms)
         s->memory_listener.listener.eventfd_add = kvm_mem_ioeventfd_add;
         s->memory_listener.listener.eventfd_del = kvm_mem_ioeventfd_del;
     }
+
+    if (kvm_ioregionfds_allowed) {
+        s->memory_listener.listener.ioregionfd_add = kvm_mem_ioregionfd_add;
+        s->memory_listener.listener.ioregionfd_del = kvm_mem_ioregionfd_del;
+    }
+
     s->memory_listener.listener.coalesced_io_add = kvm_coalesce_mmio_region;
     s->memory_listener.listener.coalesced_io_del = kvm_uncoalesce_mmio_region;
 
@@ -2594,6 +2720,12 @@ static int kvm_init(MachineState *ms)
         memory_listener_register(&kvm_io_listener,
                                  &address_space_io);
     }
+
+    if (kvm_ioregionfds_allowed) {
+        memory_listener_register(&kvm_ioregion_listener,
+                                 &address_space_io);
+    }
+
     memory_listener_register(&kvm_coalesced_pio_listener,
                              &address_space_io);
 
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 5319573e00..d6caea8174 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -29,6 +29,7 @@ bool kvm_gsi_direct_mapping;
 bool kvm_allowed;
 bool kvm_readonly_mem_allowed;
 bool kvm_ioeventfd_any_length_allowed;
+bool kvm_ioregionfds_allowed;
 bool kvm_msi_use_devid;
 
 void kvm_flush_coalesced_mmio_buffer(void)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 2/8] multiprocess: place RemoteObject definition in a header file
  2022-02-08  7:22 [RFC 0/8] ioregionfd introduction Elena Ufimtseva
  2022-02-08  7:22 ` [RFC 1/8] ioregionfd: introduce a syscall and memory API Elena Ufimtseva
@ 2022-02-08  7:22 ` Elena Ufimtseva
  2022-02-08  7:22 ` [RFC 3/8] ioregionfd: introduce memory API functions Elena Ufimtseva
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2022-02-08  7:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, armbru, peterx, mst, berrange, stefanha, pbonzini, philmd

This will be needed later. No functional changes.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 include/hw/remote/remote.h | 28 ++++++++++++++++++++++++++++
 hw/remote/remote-obj.c     | 16 +---------------
 MAINTAINERS                |  1 +
 3 files changed, 30 insertions(+), 15 deletions(-)
 create mode 100644 include/hw/remote/remote.h

diff --git a/include/hw/remote/remote.h b/include/hw/remote/remote.h
new file mode 100644
index 0000000000..a2d23178b9
--- /dev/null
+++ b/include/hw/remote/remote.h
@@ -0,0 +1,28 @@
+/*
+ * RemoteObject header.
+ *
+ * Copyright © 2018, 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef REMOTE_H
+#define REMOTE_H
+
+struct RemoteObject {
+    /* private */
+    Object parent;
+
+    Notifier machine_done;
+
+    int32_t fd;
+    char *devid;
+
+    QIOChannel *ioc;
+
+    DeviceState *dev;
+    DeviceListener listener;
+};
+
+#endif
diff --git a/hw/remote/remote-obj.c b/hw/remote/remote-obj.c
index 4f21254219..f0da696662 100644
--- a/hw/remote/remote-obj.c
+++ b/hw/remote/remote-obj.c
@@ -23,6 +23,7 @@
 #include "hw/pci/pci.h"
 #include "qemu/sockets.h"
 #include "monitor/monitor.h"
+#include "hw/remote/remote.h"
 
 #define TYPE_REMOTE_OBJECT "x-remote-object"
 OBJECT_DECLARE_TYPE(RemoteObject, RemoteObjectClass, REMOTE_OBJECT)
@@ -34,21 +35,6 @@ struct RemoteObjectClass {
     unsigned int max_devs;
 };
 
-struct RemoteObject {
-    /* private */
-    Object parent;
-
-    Notifier machine_done;
-
-    int32_t fd;
-    char *devid;
-
-    QIOChannel *ioc;
-
-    DeviceState *dev;
-    DeviceListener listener;
-};
-
 static void remote_object_set_fd(Object *obj, const char *str, Error **errp)
 {
     RemoteObject *o = REMOTE_OBJECT(obj);
diff --git a/MAINTAINERS b/MAINTAINERS
index 7543eb4d59..3c60a29760 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3453,6 +3453,7 @@ F: hw/remote/proxy-memory-listener.c
 F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
+F: include/hw/remote/remote.h
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 3/8] ioregionfd: introduce memory API functions
  2022-02-08  7:22 [RFC 0/8] ioregionfd introduction Elena Ufimtseva
  2022-02-08  7:22 ` [RFC 1/8] ioregionfd: introduce a syscall and memory API Elena Ufimtseva
  2022-02-08  7:22 ` [RFC 2/8] multiprocess: place RemoteObject definition in a header file Elena Ufimtseva
@ 2022-02-08  7:22 ` Elena Ufimtseva
  2022-02-14 14:32   ` Stefan Hajnoczi
  2022-02-08  7:22 ` [RFC 4/8] ioregionfd: Introduce IORegionDFObject type Elena Ufimtseva
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Elena Ufimtseva @ 2022-02-08  7:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, armbru, peterx, mst, berrange, stefanha, pbonzini, philmd

Similar to ioeventfd, introduce the ioregionfd
functions to add and delete ioregionfds.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 softmmu/memory.c | 207 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 207 insertions(+)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7340e19ff5..3618c5d1cf 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -40,6 +40,7 @@ static unsigned memory_region_transaction_depth;
 static bool memory_region_update_pending;
 static bool ioeventfd_update_pending;
 unsigned int global_dirty_tracking;
+static bool ioregionfd_update_pending;
 
 static QTAILQ_HEAD(, MemoryListener) memory_listeners
     = QTAILQ_HEAD_INITIALIZER(memory_listeners);
@@ -170,6 +171,13 @@ struct MemoryRegionIoeventfd {
     EventNotifier *e;
 };
 
+struct MemoryRegionIoregionfd {
+    AddrRange addr;
+    uint64_t data;
+    int fd;
+    bool pio;
+};
+
 static bool memory_region_ioeventfd_before(MemoryRegionIoeventfd *a,
                                            MemoryRegionIoeventfd *b)
 {
@@ -214,6 +222,33 @@ static bool memory_region_ioeventfd_equal(MemoryRegionIoeventfd *a,
     return false;
 }
 
+static bool memory_region_ioregionfd_before(MemoryRegionIoregionfd *a,
+                                           MemoryRegionIoregionfd *b)
+{
+    if (int128_lt(a->addr.start, b->addr.start)) {
+        return true;
+    } else if (int128_gt(a->addr.start, b->addr.start)) {
+        return false;
+    } else if (int128_lt(a->addr.size, b->addr.size)) {
+        return true;
+    } else if (int128_gt(a->addr.size, b->addr.size)) {
+        return false;
+    }
+    return false;
+}
+
+static bool memory_region_ioregionfd_equal(MemoryRegionIoregionfd *a,
+                                          MemoryRegionIoregionfd *b)
+{
+    if (int128_eq(a->addr.start, b->addr.start) &&
+        (!int128_nz(a->addr.size) || !int128_nz(b->addr.size) ||
+         (int128_eq(a->addr.size, b->addr.size) &&
+          (a->fd == b->fd))))
+        return true;
+
+    return false;
+}
+
 /* Range of memory in the global map.  Addresses are absolute. */
 struct FlatRange {
     MemoryRegion *mr;
@@ -800,6 +835,52 @@ static void address_space_add_del_ioeventfds(AddressSpace *as,
     }
 }
 
+static void address_space_add_del_ioregionfds(AddressSpace *as,
+                                              MemoryRegionIoregionfd *fds_new,
+                                              unsigned fds_new_nb,
+                                              MemoryRegionIoregionfd *fds_old,
+                                              unsigned fds_old_nb)
+{
+    unsigned iold, inew;
+    MemoryRegionIoregionfd *fd;
+    MemoryRegionSection section;
+
+    iold = inew = 0;
+    while (iold < fds_old_nb || inew < fds_new_nb) {
+        if (iold < fds_old_nb
+            && (inew == fds_new_nb
+                || memory_region_ioregionfd_before(&fds_old[iold],
+                                                  &fds_new[inew]))) {
+            fd = &fds_old[iold];
+            section = (MemoryRegionSection) {
+                .fv = address_space_to_flatview(as),
+                .offset_within_address_space = int128_get64(fd->addr.start),
+                .size = fd->addr.size,
+            };
+            MEMORY_LISTENER_CALL(as, ioregionfd_del, Forward, &section,
+                                 fd->data, fd->fd);
+            ++iold;
+
+        } else if (inew < fds_new_nb
+                   && (iold == fds_old_nb
+                       || memory_region_ioregionfd_before(&fds_new[inew],
+                                                         &fds_old[iold]))) {
+            fd = &fds_new[inew];
+            section = (MemoryRegionSection) {
+                .fv = address_space_to_flatview(as),
+                .offset_within_address_space = int128_get64(fd->addr.start),
+                .size = fd->addr.size,
+            };
+            MEMORY_LISTENER_CALL(as, ioregionfd_add, Reverse, &section,
+                                 fd->data, fd->fd);
+            ++inew;
+        } else {
+            ++iold;
+            ++inew;
+        }
+    }
+}
+
 FlatView *address_space_get_flatview(AddressSpace *as)
 {
     FlatView *view;
@@ -814,6 +895,52 @@ FlatView *address_space_get_flatview(AddressSpace *as)
     return view;
 }
 
+static void address_space_update_ioregionfds(AddressSpace *as)
+{
+    FlatView *view;
+    FlatRange *fr;
+    unsigned ioregionfd_nb = 0;
+    unsigned ioregionfd_max;
+    MemoryRegionIoregionfd *ioregionfds;
+    AddrRange tmp;
+    unsigned i;
+
+    /*
+     * It is likely that the number of ioregionfds hasn't changed much, so use
+     * the previous size as the starting value, with some headroom to avoid
+     * gratuitous reallocations.
+     */
+    ioregionfd_max = QEMU_ALIGN_UP(as->ioregionfd_nb, 4);
+    ioregionfds = g_new(MemoryRegionIoregionfd, ioregionfd_max);
+
+    view = address_space_get_flatview(as);
+    FOR_EACH_FLAT_RANGE(fr, view) {
+        for (i = 0; i < fr->mr->ioregionfd_nb; ++i) {
+            tmp = addrrange_shift(fr->mr->ioregionfds[i].addr,
+                                  int128_sub(fr->addr.start,
+                                  int128_make64(fr->offset_in_region)));
+            if (addrrange_intersects(fr->addr, tmp)) {
+                ++ioregionfd_nb;
+                if (ioregionfd_nb > ioregionfd_max) {
+                    ioregionfd_max = MAX(ioregionfd_max * 2, 4);
+                    ioregionfds = g_realloc(ioregionfds,
+                            ioregionfd_max * sizeof(*ioregionfds));
+                }
+                ioregionfds[ioregionfd_nb - 1] = fr->mr->ioregionfds[i];
+                ioregionfds[ioregionfd_nb - 1].addr = tmp;
+            }
+        }
+    }
+
+    address_space_add_del_ioregionfds(as, ioregionfds, ioregionfd_nb,
+                                      as->ioregionfds, as->ioregionfd_nb);
+
+    g_free(as->ioregionfds);
+    as->ioregionfds = ioregionfds;
+    as->ioregionfd_nb = ioregionfd_nb;
+    flatview_unref(view);
+}
+
 static void address_space_update_ioeventfds(AddressSpace *as)
 {
     FlatView *view;
@@ -1102,15 +1229,22 @@ void memory_region_transaction_commit(void)
             QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
                 address_space_set_flatview(as);
                 address_space_update_ioeventfds(as);
+                address_space_update_ioregionfds(as);
             }
             memory_region_update_pending = false;
             ioeventfd_update_pending = false;
+            ioregionfd_update_pending = false;
             MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
         } else if (ioeventfd_update_pending) {
             QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
                 address_space_update_ioeventfds(as);
             }
             ioeventfd_update_pending = false;
+        } else if (ioregionfd_update_pending) {
+            QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
+                address_space_update_ioregionfds(as);
+            }
+            ioregionfd_update_pending = false;
         }
    }
 }
@@ -1757,6 +1891,7 @@ static void memory_region_finalize(Object *obj)
     memory_region_clear_coalescing(mr);
     g_free((char *)mr->name);
     g_free(mr->ioeventfds);
+    g_free(mr->ioregionfds);
 }
 
 Object *memory_region_owner(MemoryRegion *mr)
@@ -2434,6 +2569,42 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr)
 
 static bool userspace_eventfd_warning;
 
+void memory_region_add_ioregionfd(MemoryRegion *mr,
+                                  hwaddr addr,
+                                  unsigned size,
+                                  uint64_t data,
+                                  int fd,
+                                  bool pio)
+{
+    MemoryRegionIoregionfd mriofd = {
+        .addr.start = int128_make64(addr),
+        .addr.size = int128_make64(size),
+        .data = data,
+        .fd = fd,
+    };
+    unsigned i;
+
+    if (kvm_enabled() && !kvm_ioregionfds_enabled()) {
+        error_report("KVM does not support KVM_CAP_IOREGIONFD");
+    }
+
+    memory_region_transaction_begin();
+    for (i = 0; i < mr->ioregionfd_nb; ++i) {
+        if (memory_region_ioregionfd_before(&mriofd, &mr->ioregionfds[i])) {
+            break;
+        }
+    }
+    ++mr->ioregionfd_nb;
+    mr->ioregionfds = g_realloc(mr->ioregionfds,
+                                sizeof(*mr->ioregionfds) * mr->ioregionfd_nb);
+    memmove(&mr->ioregionfds[i + 1], &mr->ioregionfds[i],
+            sizeof(*mr->ioregionfds) * (mr->ioregionfd_nb - 1 - i));
+    mr->ioregionfds[i] = mriofd;
+
+    memory_region_transaction_commit();
+    ioregionfd_update_pending = true;
+}
+
 void memory_region_add_eventfd(MemoryRegion *mr,
                                hwaddr addr,
                                unsigned size,
@@ -2511,6 +2682,38 @@ void memory_region_del_eventfd(MemoryRegion *mr,
     memory_region_transaction_commit();
 }
 
+void memory_region_del_ioregionfd(MemoryRegion *mr,
+                                  hwaddr addr,
+                                  unsigned size,
+                                  uint64_t data,
+                                  int fd)
+{
+    MemoryRegionIoregionfd mriofd = {
+        .addr.start = int128_make64(addr),
+        .addr.size = int128_make64(size),
+        .data = data,
+        .fd = fd,
+    };
+    unsigned i;
+
+    memory_region_transaction_begin();
+    for (i = 0; i < mr->ioregionfd_nb; ++i) {
+        if (memory_region_ioregionfd_equal(&mriofd, &mr->ioregionfds[i])) {
+            break;
+        }
+    }
+    assert(i != mr->ioregionfd_nb);
+    memmove(&mr->ioregionfds[i], &mr->ioregionfds[i + 1],
+            sizeof(*mr->ioregionfds) * (mr->ioregionfd_nb - (i + 1)));
+    --mr->ioregionfd_nb;
+    mr->ioregionfds = g_realloc(mr->ioregionfds,
+                                sizeof(*mr->ioregionfds) *
+                                mr->ioregionfd_nb + 1);
+    memory_region_transaction_commit();
+
+    ioregionfd_update_pending = true;
+}
+
 static void memory_region_update_container_subregions(MemoryRegion *subregion)
 {
     MemoryRegion *mr = subregion->container;
@@ -2956,11 +3159,14 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
     as->current_map = NULL;
     as->ioeventfd_nb = 0;
     as->ioeventfds = NULL;
+    as->ioregionfd_nb = 0;
+    as->ioregionfds = NULL;
     QTAILQ_INIT(&as->listeners);
     QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link);
     as->name = g_strdup(name ? name : "anonymous");
     address_space_update_topology(as);
     address_space_update_ioeventfds(as);
+    address_space_update_ioregionfds(as);
 }
 
 static void do_address_space_destroy(AddressSpace *as)
@@ -2970,6 +3176,7 @@ static void do_address_space_destroy(AddressSpace *as)
     flatview_unref(as->current_map);
     g_free(as->name);
     g_free(as->ioeventfds);
+    g_free(as->ioregionfds);
     memory_region_unref(as->root);
 }
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 4/8] ioregionfd: Introduce IORegionDFObject type
  2022-02-08  7:22 [RFC 0/8] ioregionfd introduction Elena Ufimtseva
                   ` (2 preceding siblings ...)
  2022-02-08  7:22 ` [RFC 3/8] ioregionfd: introduce memory API functions Elena Ufimtseva
@ 2022-02-08  7:22 ` Elena Ufimtseva
  2022-02-11 13:46   ` Markus Armbruster
  2022-02-14 14:37   ` Stefan Hajnoczi
  2022-02-08  7:22 ` [RFC 5/8] multiprocess: prepare ioregionfds for remote device Elena Ufimtseva
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2022-02-08  7:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, armbru, peterx, mst, berrange, stefanha, pbonzini, philmd

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 meson.build                    |  15 ++-
 qapi/qom.json                  |  32 +++++-
 include/hw/remote/ioregionfd.h |  40 +++++++
 hw/remote/ioregionfd.c         | 196 +++++++++++++++++++++++++++++++++
 Kconfig.host                   |   3 +
 MAINTAINERS                    |   2 +
 hw/remote/Kconfig              |   4 +
 hw/remote/meson.build          |   1 +
 meson_options.txt              |   2 +
 scripts/meson-buildoptions.sh  |   3 +
 10 files changed, 294 insertions(+), 4 deletions(-)
 create mode 100644 include/hw/remote/ioregionfd.h
 create mode 100644 hw/remote/ioregionfd.c

diff --git a/meson.build b/meson.build
index 96de1a6ef9..6483e754bd 100644
--- a/meson.build
+++ b/meson.build
@@ -258,6 +258,17 @@ if targetos != 'linux' and get_option('multiprocess').enabled()
 endif
 multiprocess_allowed = targetos == 'linux' and not get_option('multiprocess').disabled()
 
+# TODO: drop this limitation
+if not multiprocess_allowed and not get_option('ioregionfd').disabled()
+  error('To enable ioregiofd support, enable mutliprocess option.')
+endif
+ioregionfd_allowed = multiprocess_allowed and not get_option('ioregionfd').disabled()
+if ioregionfd_allowed
+    config_host += { 'CONFIG_IOREGIONFD': 'y' }
+else
+    config_host += { 'CONFIG_IOREGIONFD': 'n' }
+endif
+
 libm = cc.find_library('m', required: false)
 threads = dependency('threads')
 util = cc.find_library('util', required: false)
@@ -1837,7 +1848,8 @@ host_kconfig = \
   (have_virtfs ? ['CONFIG_VIRTFS=y'] : []) + \
   ('CONFIG_LINUX' in config_host ? ['CONFIG_LINUX=y'] : []) + \
   ('CONFIG_PVRDMA' in config_host ? ['CONFIG_PVRDMA=y'] : []) + \
-  (multiprocess_allowed ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : [])
+  (multiprocess_allowed ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : []) + \
+  (ioregionfd_allowed ? ['CONFIG_IOREGIONFD=y'] : [])
 
 ignored = [ 'TARGET_XML_FILES', 'TARGET_ABI_DIR', 'TARGET_ARCH' ]
 
@@ -3315,6 +3327,7 @@ summary_info += {'target list':       ' '.join(target_dirs)}
 if have_system
   summary_info += {'default devices':   get_option('default_devices')}
   summary_info += {'out of process emulation': multiprocess_allowed}
+  summary_info += {'ioregionfd support': ioregionfd_allowed}
 endif
 summary(summary_info, bool_yn: true, section: 'Targets and accelerators')
 
diff --git a/qapi/qom.json b/qapi/qom.json
index eeb5395ff3..439fb94c93 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -689,6 +689,29 @@
         'data': { 'chardev': 'str',
                   '*log': 'str' } }
 
+##
+# @IORegionFDObjectProperties:
+#
+# Describes ioregionfd for the device
+#
+# @devid: the id of the device to be associated with the ioregionfd
+#
+# @iofd: File descriptor
+#
+# @bar: BAR number to use with ioregionfd
+#
+# @start: offset from the BAR start address of ioregionfd
+#
+# @size: size of the ioregionfd
+##
+# Since: 2.9
+{ 'struct': 'IORegionFDObjectProperties',
+  'data': { 'devid': 'str',
+            'iofd': 'str',
+            'bar': 'int',
+            '*start': 'int',
+            '*size':'int' } }
+
 ##
 # @RemoteObjectProperties:
 #
@@ -842,8 +865,10 @@
     'tls-creds-psk',
     'tls-creds-x509',
     'tls-cipher-suites',
-    { 'name': 'x-remote-object', 'features': [ 'unstable' ] }
-  ] }
+    { 'name': 'x-remote-object', 'features': [ 'unstable' ] },
+    { 'name' :'ioregionfd-object',
+      'if': 'CONFIG_IOREGIONFD' }
+ ] }
 
 ##
 # @ObjectOptions:
@@ -905,7 +930,8 @@
       'tls-creds-psk':              'TlsCredsPskProperties',
       'tls-creds-x509':             'TlsCredsX509Properties',
       'tls-cipher-suites':          'TlsCredsProperties',
-      'x-remote-object':            'RemoteObjectProperties'
+      'x-remote-object':            'RemoteObjectProperties',
+      'ioregionfd-object':          'IORegionFDObjectProperties'
   } }
 
 ##
diff --git a/include/hw/remote/ioregionfd.h b/include/hw/remote/ioregionfd.h
new file mode 100644
index 0000000000..c8a8b32ee0
--- /dev/null
+++ b/include/hw/remote/ioregionfd.h
@@ -0,0 +1,40 @@
+/*
+ * Ioregionfd headers
+ *
+ * Copyright © 2018, 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef IOREGIONFD_H
+#define IOREGIONFD_H
+
+#define PCI_BARS_NR 6
+
+typedef struct {
+    uint64_t val;
+    bool memory;
+} IORegionFDOp;
+
+typedef struct {
+    int fd;
+    char *devid;
+    uint32_t bar;
+    uint32_t start;
+    uint32_t size;
+    bool memory;
+} IORegionFD;
+
+struct IORegionFDObject {
+    /* private */
+    Object parent;
+
+    IORegionFD ioregfd;
+    QTAILQ_ENTRY(IORegionFDObject) next;
+};
+
+typedef struct IORegionFDObject IORegionFDObject;
+
+#endif /* IOREGIONFD_H */
diff --git a/hw/remote/ioregionfd.c b/hw/remote/ioregionfd.c
new file mode 100644
index 0000000000..ae95f702a6
--- /dev/null
+++ b/hw/remote/ioregionfd.c
@@ -0,0 +1,196 @@
+/*
+ * Memory manager for remote device
+ *
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/kvm.h"
+#include "linux/kvm.h"
+
+#include "exec/memory.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qom/object_interfaces.h"
+#include "exec/confidential-guest-support.h"
+#include "io/channel.h"
+#include "qemu/sockets.h"
+#include "qemu/cutils.h"
+#include "io/channel-socket.h"
+#include "monitor/monitor.h"
+#include "hw/remote/ioregionfd.h"
+#include "hw/pci/pci.h"
+#include "qapi/qapi-visit-qom.h"
+#include "hw/remote/remote.h"
+
+#define TYPE_IOREGIONFD_OBJECT "ioregionfd-object"
+OBJECT_DECLARE_TYPE(IORegionFDObject, IORegionFDObjectClass, IOREGIONFD_OBJECT)
+
+struct IORegionFDObjectClass {
+    ObjectClass parent_class;
+
+    unsigned int nr_ioregfds;
+    unsigned int max_ioregfds;
+};
+
+static void ioregionfd_object_init(Object *obj)
+{
+    IORegionFDObjectClass *k = IOREGIONFD_OBJECT_GET_CLASS(obj);
+
+    if (k->nr_ioregfds >= k->max_ioregfds) {
+        error_report("Reached max number of ioregions: %u", k->max_ioregfds);
+        return;
+    }
+}
+
+static void ioregionfd_object_set_fd(Object *obj, const char *str,
+                                     Error **errp)
+{
+    IORegionFDObject *o = IOREGIONFD_OBJECT(obj);
+    int fd = -1;
+
+    fd = monitor_fd_param(monitor_cur(), str, errp);
+    if (fd == -1) {
+        error_prepend(errp, "Could not parse ioregionfd fd %s:", str);
+        return;
+    }
+    o->ioregfd.fd = fd;
+}
+
+static void ioregionfd_object_set_devid(Object *obj, const char *str,
+                                        Error **errp)
+{
+    IORegionFDObject *o = IOREGIONFD_OBJECT(obj);
+
+    g_free(o->ioregfd.devid);
+
+    o->ioregfd.devid = g_strdup(str);
+}
+
+static char *ioregionfd_object_get_devid(Object *obj, Error **errp)
+{
+    IORegionFDObject *o = IOREGIONFD_OBJECT(obj);
+
+    return g_strdup(o->ioregfd.devid);
+}
+
+static void ioregionfd_object_set_bar(Object *obj, Visitor *v,
+                                      const char *name, void *opaque,
+                                      Error **errp)
+{
+    IORegionFDObject *o = IOREGIONFD_OBJECT(obj);
+    uint32_t value;
+
+    if (!visit_type_uint32(v, name, &value, errp)) {
+        return;
+    }
+
+    if (value > PCI_BARS_NR) {
+        error_setg(errp, "BAR number cannot be larger than %d", PCI_BARS_NR);
+        return;
+    }
+
+    o->ioregfd.bar = value;
+}
+
+static void ioregionfd_object_set_start(Object *obj, Visitor *v,
+                                        const char *name, void *opaque,
+                                        Error **errp)
+{
+    IORegionFDObject *o = IOREGIONFD_OBJECT(obj);
+    int64_t value;
+
+    if (!visit_type_int(v, name, &value, errp)) {
+        return;
+    }
+
+    if (value < 0) {
+        error_setg(errp, "BAR start %"PRId64" must be > 0", value);
+        return;
+    }
+
+    if (value > UINT32_MAX) {
+        error_setg(errp, "BAR start %"PRId64" is too big", value);
+        o->ioregfd.start = 0;
+        return;
+    }
+
+    o->ioregfd.start = value;
+}
+
+static void ioregionfd_object_set_size(Object *obj, Visitor *v,
+                                       const char *name, void *opaque,
+                                       Error **errp)
+{
+    IORegionFDObject *o = IOREGIONFD_OBJECT(obj);
+    int64_t value;
+
+    if (!visit_type_int(v, name, &value, errp)) {
+        return;
+    }
+
+    if (value < 0) {
+        error_setg(errp, "Invalid BAR size %"PRId64, value);
+        return;
+    }
+
+    if (value > UINT32_MAX) {
+        error_setg(errp, "BAR size %"PRId64" is too big", value);
+        o->ioregfd.size = 0;
+        return;
+    }
+
+
+    o->ioregfd.size = value;
+}
+
+static void ioregionfd_object_class_init(ObjectClass *klass, void *data)
+{
+    IORegionFDObjectClass *k = IOREGIONFD_OBJECT_CLASS(klass);
+
+    k->nr_ioregfds = 0;
+    k->max_ioregfds = 1;
+
+    object_class_property_add_str(klass, "devid", ioregionfd_object_get_devid,
+                                  ioregionfd_object_set_devid);
+    object_class_property_add_str(klass, "iofd", NULL,
+                                  ioregionfd_object_set_fd);
+    object_class_property_add(klass, "bar", "uint32", NULL,
+                              ioregionfd_object_set_bar, NULL, NULL);
+    object_class_property_add(klass, "start", "uint64", NULL,
+                              ioregionfd_object_set_start, NULL, NULL);
+    object_class_property_add(klass, "size", "uint64", NULL,
+                              ioregionfd_object_set_size, NULL, NULL);
+}
+
+/* Assume that Object user released all allocated structures. */
+static void ioregionfd_object_finalize(Object *obj)
+{
+    IORegionFDObject *o = IOREGIONFD_OBJECT(obj);
+    g_free(o->ioregfd.devid);
+}
+
+static const TypeInfo ioregionfd_object_info = {
+    .name = TYPE_IOREGIONFD_OBJECT,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(IORegionFDObject),
+    .instance_init = ioregionfd_object_init,
+    .instance_finalize = ioregionfd_object_finalize,
+    .class_size = sizeof(IORegionFDObjectClass),
+    .class_init = ioregionfd_object_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void register_types(void)
+{
+    type_register_static(&ioregionfd_object_info);
+}
+
+type_init(register_types);
diff --git a/Kconfig.host b/Kconfig.host
index 60b9c07b5e..af01b75770 100644
--- a/Kconfig.host
+++ b/Kconfig.host
@@ -45,3 +45,6 @@ config MULTIPROCESS_ALLOWED
 config FUZZ
     bool
     select SPARSE_MEM
+
+config IOREGIONFD
+    bool
diff --git a/MAINTAINERS b/MAINTAINERS
index 3c60a29760..d29fa8a7de 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3454,6 +3454,8 @@ F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: include/hw/remote/remote.h
+F: include/hw/remote/ioregionfd.h
+F: hw/remote/ioregionfd.c
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
index 08c16e235f..caff3427e7 100644
--- a/hw/remote/Kconfig
+++ b/hw/remote/Kconfig
@@ -2,3 +2,7 @@ config MULTIPROCESS
     bool
     depends on PCI && PCI_EXPRESS && KVM
     select REMOTE_PCIHOST
+config IOREGIONFD
+    bool
+    default n
+    depends on MULTIPROCESS
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index e6a5574242..b190c520c4 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
+specific_ss.add(when: 'CONFIG_IOREGIONFD', if_true: files('ioregionfd.c'))
 
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy-memory-listener.c'))
diff --git a/meson_options.txt b/meson_options.txt
index e392323732..52b338c1b8 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -66,6 +66,8 @@ option('cfi_debug', type: 'boolean', value: 'false',
        description: 'Verbose errors in case of CFI violation')
 option('multiprocess', type: 'feature', value: 'auto',
        description: 'Out of process device emulation support')
+option('ioregionfd', type: 'feature', value: 'auto',
+       description: 'Fast-path IO/MMIO support')
 
 option('attr', type : 'feature', value : 'auto',
        description: 'attr/xattr support')
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 7a17ff4218..1cbd2984f5 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -98,6 +98,7 @@ meson_options_help() {
   printf "%s\n" '                  Xen PCI passthrough support'
   printf "%s\n" '  xkbcommon       xkbcommon support'
   printf "%s\n" '  zstd            zstd compression support'
+  printf "%s\n" '  ioregionfd      ioregionfd support'
 }
 _meson_option_parse() {
   case $1 in
@@ -270,6 +271,8 @@ _meson_option_parse() {
     --disable-xkbcommon) printf "%s" -Dxkbcommon=disabled ;;
     --enable-zstd) printf "%s" -Dzstd=enabled ;;
     --disable-zstd) printf "%s" -Dzstd=disabled ;;
+    --enable-ioregionfd) printf "%s" -Dioregionfd=enabled ;;
+    --disable-ioregionfd) printf "%s" -Dioregionfd=disabled ;;
     *) return 1 ;;
   esac
 }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 5/8] multiprocess: prepare ioregionfds for remote device
  2022-02-08  7:22 [RFC 0/8] ioregionfd introduction Elena Ufimtseva
                   ` (3 preceding siblings ...)
  2022-02-08  7:22 ` [RFC 4/8] ioregionfd: Introduce IORegionDFObject type Elena Ufimtseva
@ 2022-02-08  7:22 ` Elena Ufimtseva
  2022-02-08  7:22 ` [RFC 6/8] multiprocess: add MPQEMU_CMD_BAR_INFO Elena Ufimtseva
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2022-02-08  7:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, armbru, peterx, mst, berrange, stefanha, pbonzini, philmd

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 include/hw/remote/ioregionfd.h |  1 +
 include/hw/remote/remote.h     |  1 +
 hw/remote/ioregionfd.c         | 26 ++++++++++
 hw/remote/remote-obj.c         | 93 ++++++++++++++++++++++++++++++++++
 4 files changed, 121 insertions(+)

diff --git a/include/hw/remote/ioregionfd.h b/include/hw/remote/ioregionfd.h
index c8a8b32ee0..85a2ef2c4f 100644
--- a/include/hw/remote/ioregionfd.h
+++ b/include/hw/remote/ioregionfd.h
@@ -37,4 +37,5 @@ struct IORegionFDObject {
 
 typedef struct IORegionFDObject IORegionFDObject;
 
+GSList *ioregionfd_get_obj_list(void);
 #endif /* IOREGIONFD_H */
diff --git a/include/hw/remote/remote.h b/include/hw/remote/remote.h
index a2d23178b9..46390c7934 100644
--- a/include/hw/remote/remote.h
+++ b/include/hw/remote/remote.h
@@ -23,6 +23,7 @@ struct RemoteObject {
 
     DeviceState *dev;
     DeviceListener listener;
+    GHashTable *ioregionfd_hash;
 };
 
 #endif
diff --git a/hw/remote/ioregionfd.c b/hw/remote/ioregionfd.c
index ae95f702a6..85ec0f7d38 100644
--- a/hw/remote/ioregionfd.c
+++ b/hw/remote/ioregionfd.c
@@ -37,6 +37,32 @@ struct IORegionFDObjectClass {
     unsigned int max_ioregfds;
 };
 
+static int ioregionfd_obj_list(Object *obj, void *opaque)
+{
+    GSList **list = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_IOREGIONFD_OBJECT)) {
+        *list = g_slist_append(*list, obj);
+    }
+
+    object_child_foreach(obj, ioregionfd_obj_list, opaque);
+    return 0;
+}
+
+/*
+ * inquire ioregionfd objects and link them into the list which is
+ * returned to the caller.
+ *
+ * Caller must free the list.
+ */
+GSList *ioregionfd_get_obj_list(void)
+{
+    GSList *list = NULL;
+
+    object_child_foreach(object_get_root(), ioregionfd_obj_list, &list);
+    return list;
+}
+
 static void ioregionfd_object_init(Object *obj)
 {
     IORegionFDObjectClass *k = IOREGIONFD_OBJECT_GET_CLASS(obj);
diff --git a/hw/remote/remote-obj.c b/hw/remote/remote-obj.c
index f0da696662..9bb61c3a2d 100644
--- a/hw/remote/remote-obj.c
+++ b/hw/remote/remote-obj.c
@@ -24,6 +24,10 @@
 #include "qemu/sockets.h"
 #include "monitor/monitor.h"
 #include "hw/remote/remote.h"
+#include "hw/remote/ioregionfd.h"
+#include "qemu/cutils.h"
+#include "qapi/qapi-visit-qom.h"
+#include "qapi/string-output-visitor.h"
 
 #define TYPE_REMOTE_OBJECT "x-remote-object"
 OBJECT_DECLARE_TYPE(RemoteObject, RemoteObjectClass, REMOTE_OBJECT)
@@ -74,6 +78,80 @@ static void remote_object_unrealize_listener(DeviceListener *listener,
     }
 }
 
+static GSList *ioregions_list;
+
+static unsigned int ioregionfd_bar_hash(const void *key)
+{
+    const IORegionFDObject *o = key;
+
+    return g_int_hash(&o->ioregfd.bar);
+}
+
+/* TODO: allow for multiple ioregionfds per BAR. */
+static gboolean ioregionfd_bar_equal(const void *a, const void *b)
+{
+    const IORegionFDObject *oa = a;
+    const IORegionFDObject *ob = b;
+
+    error_report("BARS comparing %d %d", oa->ioregfd.bar, ob->ioregfd.bar);
+    if (oa->ioregfd.bar == ob->ioregfd.bar) {
+        return TRUE;
+    }
+    return FALSE;
+}
+
+static void ioregionfd_prepare_for_dev(RemoteObject *o, PCIDevice *dev)
+{
+    IORegionFDObject *ioregfd_obj = NULL;
+    GSList *obj_list, *list;
+
+    list = ioregionfd_get_obj_list();
+
+    o->ioregionfd_hash = g_hash_table_new(ioregionfd_bar_hash,
+                                       ioregionfd_bar_equal);
+
+    for (obj_list = list; obj_list; obj_list = obj_list->next) {
+        ioregfd_obj = obj_list->data;
+        if (strcmp(ioregfd_obj->ioregfd.devid, o->devid) != 0) {
+            list = g_slist_remove(list, ioregfd_obj);
+            error_report("No my dev remove");
+            continue;
+        }
+        if (!g_hash_table_add(o->ioregionfd_hash, ioregfd_obj)) {
+            error_report("Cannot use more than one ioregionfd per bar");
+            list = g_slist_remove(list, ioregfd_obj);
+            object_unparent(OBJECT(ioregfd_obj));
+        } else {
+            error_report("Added to hash");
+        }
+    }
+
+    if (!list) {
+        error_report("Remote device %s will not have ioregionfds.",
+                     o->devid);
+        goto fatal;
+    }
+
+    /*
+     * Take first element in the list of ioregions and use its fd
+     * for all regions for this device.
+     * TODO: make this more flexible and allow different fd for the
+     * device.
+     */
+    ioregfd_obj = list->data;
+
+    /* This is default and will be changed when proxy requests region info. */
+    ioregfd_obj->ioregfd.memory = true;
+
+    ioregions_list = list;
+    return;
+
+ fatal:
+    g_slist_free(list);
+    g_hash_table_destroy(o->ioregionfd_hash);
+    return;
+}
+
 static void remote_object_machine_done(Notifier *notifier, void *data)
 {
     RemoteObject *o = container_of(notifier, RemoteObject, machine_done);
@@ -98,6 +176,10 @@ static void remote_object_machine_done(Notifier *notifier, void *data)
 
     o->dev = dev;
 
+#if CONFIG_IOREGIONFD
+    ioregionfd_prepare_for_dev(o, PCI_DEVICE(dev));
+#endif
+
     o->listener.unrealize = remote_object_unrealize_listener;
     device_listener_register(&o->listener);
 
@@ -132,6 +214,13 @@ static void remote_object_init(Object *obj)
     qemu_add_machine_init_done_notifier(&o->machine_done);
 }
 
+static void ioregionfd_release(gpointer data, gpointer user_data)
+{
+    IORegionFDObject *o = data;
+
+    object_unparent(OBJECT(o));
+}
+
 static void remote_object_finalize(Object *obj)
 {
     RemoteObjectClass *k = REMOTE_OBJECT_GET_CLASS(obj);
@@ -148,6 +237,10 @@ static void remote_object_finalize(Object *obj)
 
     k->nr_devs--;
     g_free(o->devid);
+    /* Free the list of the ioregions. */
+    g_slist_foreach(ioregions_list, ioregionfd_release, NULL);
+    g_slist_free(ioregions_list);
+    g_hash_table_destroy(o->ioregionfd_hash);
 }
 
 static void remote_object_class_init(ObjectClass *klass, void *data)
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 6/8] multiprocess: add MPQEMU_CMD_BAR_INFO
  2022-02-08  7:22 [RFC 0/8] ioregionfd introduction Elena Ufimtseva
                   ` (4 preceding siblings ...)
  2022-02-08  7:22 ` [RFC 5/8] multiprocess: prepare ioregionfds for remote device Elena Ufimtseva
@ 2022-02-08  7:22 ` Elena Ufimtseva
  2022-02-08  7:22 ` [RFC 7/8] multiprocess: add ioregionfd memory region in proxy Elena Ufimtseva
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2022-02-08  7:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, armbru, peterx, mst, berrange, stefanha, pbonzini, philmd

This command is used to request the bar type info from
remote device.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 include/hw/remote/ioregionfd.h  |  2 ++
 include/hw/remote/machine.h     |  1 +
 include/hw/remote/mpqemu-link.h |  2 ++
 hw/remote/ioregionfd.c          | 28 ++++++++++++++++++++++++
 hw/remote/message.c             | 38 +++++++++++++++++++++++++++++++++
 hw/remote/remote-obj.c          |  1 +
 6 files changed, 72 insertions(+)

diff --git a/include/hw/remote/ioregionfd.h b/include/hw/remote/ioregionfd.h
index 85a2ef2c4f..66bb459f76 100644
--- a/include/hw/remote/ioregionfd.h
+++ b/include/hw/remote/ioregionfd.h
@@ -38,4 +38,6 @@ struct IORegionFDObject {
 typedef struct IORegionFDObject IORegionFDObject;
 
 GSList *ioregionfd_get_obj_list(void);
+IORegionFD *ioregionfd_get_by_bar(GSList *list, uint32_t bar);
+void ioregionfd_set_bar_type(GSList *list, uint32_t bar, bool memory);
 #endif /* IOREGIONFD_H */
diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
index 2a2a33c4b2..71c53ba0d7 100644
--- a/include/hw/remote/machine.h
+++ b/include/hw/remote/machine.h
@@ -28,6 +28,7 @@ struct RemoteMachineState {
 typedef struct RemoteCommDev {
     PCIDevice *dev;
     QIOChannel *ioc;
+    GSList *ioregions_list;
 } RemoteCommDev;
 
 #define TYPE_REMOTE_MACHINE "x-remote-machine"
diff --git a/include/hw/remote/mpqemu-link.h b/include/hw/remote/mpqemu-link.h
index 4ec0915885..be546e4586 100644
--- a/include/hw/remote/mpqemu-link.h
+++ b/include/hw/remote/mpqemu-link.h
@@ -17,6 +17,7 @@
 #include "exec/hwaddr.h"
 #include "io/channel-socket.h"
 #include "hw/remote/proxy.h"
+#include "hw/remote/ioregionfd.h"
 
 #define REMOTE_MAX_FDS 8
 
@@ -41,6 +42,7 @@ typedef enum {
     MPQEMU_CMD_BAR_READ,
     MPQEMU_CMD_SET_IRQFD,
     MPQEMU_CMD_DEVICE_RESET,
+    MPQEMU_CMD_BAR_INFO,
     MPQEMU_CMD_MAX,
 } MPQemuCmd;
 
diff --git a/hw/remote/ioregionfd.c b/hw/remote/ioregionfd.c
index 85ec0f7d38..1d371357c6 100644
--- a/hw/remote/ioregionfd.c
+++ b/hw/remote/ioregionfd.c
@@ -63,6 +63,34 @@ GSList *ioregionfd_get_obj_list(void)
     return list;
 }
 
+IORegionFD *ioregionfd_get_by_bar(GSList *list, uint32_t bar)
+{
+    IORegionFDObject *ioregionfd;
+    GSList *elem;
+
+    for (elem = list; elem; elem = elem->next) {
+        ioregionfd = elem->data;
+
+        if (ioregionfd->ioregfd.bar == bar) {
+            return &ioregionfd->ioregfd;
+        }
+    }
+    return NULL;
+}
+
+void ioregionfd_set_bar_type(GSList *list, uint32_t bar, bool memory)
+{
+    IORegionFDObject *ioregionfd;
+    GSList *elem;
+
+    for (elem = list; elem; elem = elem->next) {
+        ioregionfd = elem->data;
+        if (ioregionfd->ioregfd.bar == bar) {
+            ioregionfd->ioregfd.memory = memory;
+        }
+    }
+}
+
 static void ioregionfd_object_init(Object *obj)
 {
     IORegionFDObjectClass *k = IOREGIONFD_OBJECT_GET_CLASS(obj);
diff --git a/hw/remote/message.c b/hw/remote/message.c
index 11d729845c..a8fb9764ba 100644
--- a/hw/remote/message.c
+++ b/hw/remote/message.c
@@ -29,6 +29,8 @@ static void process_bar_write(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
 static void process_bar_read(QIOChannel *ioc, MPQemuMsg *msg, Error **errp);
 static void process_device_reset_msg(QIOChannel *ioc, PCIDevice *dev,
                                      Error **errp);
+static void process_device_get_reg_info(QIOChannel *ioc, RemoteCommDev *com,
+                                        MPQemuMsg *msg, Error **errp);
 
 void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
 {
@@ -75,6 +77,9 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
         case MPQEMU_CMD_DEVICE_RESET:
             process_device_reset_msg(com->ioc, pci_dev, &local_err);
             break;
+        case MPQEMU_CMD_BAR_INFO:
+            process_device_get_reg_info(com->ioc, com, &msg, &local_err);
+            break;
         default:
             error_setg(&local_err,
                        "Unknown command (%d) received for device %s"
@@ -91,6 +96,39 @@ void coroutine_fn mpqemu_remote_msg_loop_co(void *data)
     }
 }
 
+static void process_device_get_reg_info(QIOChannel *ioc, RemoteCommDev *com,
+                                        MPQemuMsg *msg, Error **errp)
+{
+    ERRP_GUARD();
+    uint32_t bar = (uint32_t)(msg->data.u64 & MAKE_64BIT_MASK(0, 32));
+    bool memory;
+
+    memory = (msg->data.u64 && MAKE_64BIT_MASK(32, 32)) == 1 ?  true : false;
+
+    IORegionFD *ioregfd;
+    MPQemuMsg ret = { 0 };
+
+    error_report("Bar is %d, mem %s", bar, memory ? "true" : "false");
+
+    memset(&ret, 0, sizeof(MPQemuMsg));
+    ret.cmd = MPQEMU_CMD_RET;
+    ret.size = sizeof(ret.data.u64);
+
+    ioregfd = ioregionfd_get_by_bar(com->ioregions_list, bar);
+    if (ioregfd) {
+        ret.data.u64 = ioregfd->bar;
+        if (ioregfd->memory != memory) {
+            ioregionfd_set_bar_type(com->ioregions_list, bar, memory);
+        }
+    } else {
+        ret.data.u64 = UINT64_MAX;
+    }
+    if (!mpqemu_msg_send(&ret, ioc, NULL)) {
+        error_prepend(errp, "Error returning code to proxy, pid "FMT_pid": ",
+                      getpid());
+    }
+}
+
 static void process_config_write(QIOChannel *ioc, PCIDevice *dev,
                                  MPQemuMsg *msg, Error **errp)
 {
diff --git a/hw/remote/remote-obj.c b/hw/remote/remote-obj.c
index 9bb61c3a2d..46c2e2a5bd 100644
--- a/hw/remote/remote-obj.c
+++ b/hw/remote/remote-obj.c
@@ -188,6 +188,7 @@ static void remote_object_machine_done(Notifier *notifier, void *data)
     *comdev = (RemoteCommDev) {
         .ioc = ioc,
         .dev = PCI_DEVICE(dev),
+        .ioregions_list = ioregions_list,
     };
 
     co = qemu_coroutine_create(mpqemu_remote_msg_loop_co, comdev);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 7/8] multiprocess: add ioregionfd memory region in proxy
  2022-02-08  7:22 [RFC 0/8] ioregionfd introduction Elena Ufimtseva
                   ` (5 preceding siblings ...)
  2022-02-08  7:22 ` [RFC 6/8] multiprocess: add MPQEMU_CMD_BAR_INFO Elena Ufimtseva
@ 2022-02-08  7:22 ` Elena Ufimtseva
  2022-02-08  7:22 ` [RFC 8/8] multiprocess: handle ioregionfd commands Elena Ufimtseva
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2022-02-08  7:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, armbru, peterx, mst, berrange, stefanha, pbonzini, philmd

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 include/hw/remote/proxy.h |  1 +
 hw/remote/proxy.c         | 66 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/include/hw/remote/proxy.h b/include/hw/remote/proxy.h
index 741def71f1..9efef0b935 100644
--- a/include/hw/remote/proxy.h
+++ b/include/hw/remote/proxy.h
@@ -29,6 +29,7 @@ struct PCIProxyDev {
     PCIDevice parent_dev;
     char *fd;
 
+    char *ioregfd;
     /*
      * Mutex used to protect the QIOChannel fd from
      * the concurrent access by the VCPUs since proxy
diff --git a/hw/remote/proxy.c b/hw/remote/proxy.c
index bad164299d..ba1aa20d78 100644
--- a/hw/remote/proxy.c
+++ b/hw/remote/proxy.c
@@ -146,6 +146,33 @@ static void pci_proxy_dev_exit(PCIDevice *pdev)
     event_notifier_cleanup(&dev->resample);
 }
 
+static void config_get_ioregionfd_info(PCIProxyDev *pdev, uint32_t reg_num,
+                                       uint32_t *val, bool memory)
+{
+    MPQemuMsg msg = { 0 };
+    Error *local_err = NULL;
+    uint64_t ret = -EINVAL;
+
+    memset(&msg, 0, sizeof(MPQemuMsg));
+    msg.cmd = MPQEMU_CMD_BAR_INFO;
+    msg.num_fds = 0;
+    msg.data.u64 = (uint64_t)reg_num & MAKE_64BIT_MASK(0, 32);
+
+    msg.data.u64 |= memory ? (1ULL << 32) : 0;
+    msg.size = sizeof(msg.data.u64);
+
+    ret = mpqemu_msg_send_and_await_reply(&msg, pdev, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        error_report("Error while receiving reply from remote about fd");
+    }
+    if (ret == UINT64_MAX) {
+        error_report("Failed to request bar info for %d", reg_num);
+    }
+
+    *val = (uint32_t)ret;
+}
+
 static void config_op_send(PCIProxyDev *pdev, uint32_t addr, uint32_t *val,
                            int len, unsigned int op)
 {
@@ -198,6 +225,7 @@ static void pci_proxy_write_config(PCIDevice *d, uint32_t addr, uint32_t val,
 
 static Property proxy_properties[] = {
     DEFINE_PROP_STRING("fd", PCIProxyDev, fd),
+    DEFINE_PROP_STRING("ioregfd", PCIProxyDev, ioregfd),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -297,7 +325,7 @@ const MemoryRegionOps proxy_mr_ops = {
 static void probe_pci_info(PCIDevice *dev, Error **errp)
 {
     PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
-    uint32_t orig_val, new_val, base_class, val;
+    uint32_t orig_val, new_val, base_class, val, ioregionfd_bar;
     PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
     DeviceClass *dc = DEVICE_CLASS(pc);
     uint8_t type;
@@ -342,6 +370,9 @@ static void probe_pci_info(PCIDevice *dev, Error **errp)
     }
 
     for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        bool init_ioregionfd = false;
+        int fd = -1;
+
         config_op_send(pdev, PCI_BASE_ADDRESS_0 + (4 * i), &orig_val, 4,
                        MPQEMU_CMD_PCI_CFGREAD);
         new_val = 0xffffffff;
@@ -362,9 +393,36 @@ static void probe_pci_info(PCIDevice *dev, Error **errp)
             if (type == PCI_BASE_ADDRESS_SPACE_MEMORY) {
                 pdev->region[i].memory = true;
             }
-            memory_region_init_io(&pdev->region[i].mr, OBJECT(pdev),
-                                  &proxy_mr_ops, &pdev->region[i],
-                                  name, size);
+#ifdef CONFIG_IOREGIONFD
+            /*
+             * Currently, only one fd per device is supported.
+             * TODO: Drop this limit.
+             */
+            if (pdev->ioregfd) {
+                fd = monitor_fd_param(monitor_cur(), pdev->ioregfd, errp);
+                if (fd == -1) {
+                    error_prepend(errp, "Could not parse ioregionfd fd %s:",
+                                  pdev->ioregfd);
+                }
+
+                config_get_ioregionfd_info(pdev, i, &ioregionfd_bar,
+                                           pdev->region[i].memory);
+                if (ioregionfd_bar == i) {
+                    init_ioregionfd = true;
+                }
+            }
+#endif
+            if (init_ioregionfd) {
+                memory_region_init_io(&pdev->region[i].mr, OBJECT(pdev),
+                                      NULL, &pdev->region[i],
+                                      name, size);
+                memory_region_add_ioregionfd(&pdev->region[i].mr, 0, size, i,
+                                             fd, false);
+            } else {
+                memory_region_init_io(&pdev->region[i].mr, OBJECT(pdev),
+                                      &proxy_mr_ops, &pdev->region[i],
+                                      name, size);
+            }
             pci_register_bar(dev, i, type, &pdev->region[i].mr);
         }
     }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 8/8] multiprocess: handle ioregionfd commands
  2022-02-08  7:22 [RFC 0/8] ioregionfd introduction Elena Ufimtseva
                   ` (6 preceding siblings ...)
  2022-02-08  7:22 ` [RFC 7/8] multiprocess: add ioregionfd memory region in proxy Elena Ufimtseva
@ 2022-02-08  7:22 ` Elena Ufimtseva
  2022-02-09 10:33 ` [RFC 0/8] ioregionfd introduction Stefan Hajnoczi
  2022-02-14 14:52 ` Stefan Hajnoczi
  9 siblings, 0 replies; 20+ messages in thread
From: Elena Ufimtseva @ 2022-02-08  7:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, armbru, peterx, mst, berrange, stefanha, pbonzini, philmd

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 include/hw/remote/ioregionfd.h |   2 +
 include/hw/remote/remote.h     |   2 +
 linux-headers/ioregionfd.h     |  30 +++++++++
 hw/remote/ioregionfd.c         | 111 +++++++++++++++++++++++++++++++++
 hw/remote/remote-obj.c         |  44 +++++++++++++
 5 files changed, 189 insertions(+)
 create mode 100644 linux-headers/ioregionfd.h

diff --git a/include/hw/remote/ioregionfd.h b/include/hw/remote/ioregionfd.h
index 66bb459f76..8021eed6f1 100644
--- a/include/hw/remote/ioregionfd.h
+++ b/include/hw/remote/ioregionfd.h
@@ -40,4 +40,6 @@ typedef struct IORegionFDObject IORegionFDObject;
 GSList *ioregionfd_get_obj_list(void);
 IORegionFD *ioregionfd_get_by_bar(GSList *list, uint32_t bar);
 void ioregionfd_set_bar_type(GSList *list, uint32_t bar, bool memory);
+int qio_channel_ioregionfd_read(QIOChannel *ioc, gpointer opaque,
+                                Error **errp);
 #endif /* IOREGIONFD_H */
diff --git a/include/hw/remote/remote.h b/include/hw/remote/remote.h
index 46390c7934..53b570e1ac 100644
--- a/include/hw/remote/remote.h
+++ b/include/hw/remote/remote.h
@@ -23,6 +23,8 @@ struct RemoteObject {
 
     DeviceState *dev;
     DeviceListener listener;
+    QIOChannel *ioregfd_ioc;
+    AioContext *ioregfd_ctx;
     GHashTable *ioregionfd_hash;
 };
 
diff --git a/linux-headers/ioregionfd.h b/linux-headers/ioregionfd.h
new file mode 100644
index 0000000000..58f9b5ba61
--- /dev/null
+++ b/linux-headers/ioregionfd.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: ((GPL-2.0-only WITH Linux-syscall-note) OR BSD-3-Clause) */
+#ifndef _UAPI_LINUX_IOREGION_H
+#define _UAPI_LINUX_IOREGION_H
+
+/* Wire protocol */
+
+struct ioregionfd_cmd {
+	__u8 cmd;
+	__u8 size_exponent : 4;
+	__u8 resp : 1;
+	__u8 padding[6];
+	__u64 user_data;
+	__u64 offset;
+	__u64 data;
+};
+
+struct ioregionfd_resp {
+	__u64 data;
+	__u8 pad[24];
+};
+
+#define IOREGIONFD_CMD_READ    0
+#define IOREGIONFD_CMD_WRITE   1
+
+#define IOREGIONFD_SIZE_8BIT   0
+#define IOREGIONFD_SIZE_16BIT  1
+#define IOREGIONFD_SIZE_32BIT  2
+#define IOREGIONFD_SIZE_64BIT  3
+
+#endif
diff --git a/hw/remote/ioregionfd.c b/hw/remote/ioregionfd.c
index 1d371357c6..dd04c39e25 100644
--- a/hw/remote/ioregionfd.c
+++ b/hw/remote/ioregionfd.c
@@ -26,6 +26,7 @@
 #include "hw/pci/pci.h"
 #include "qapi/qapi-visit-qom.h"
 #include "hw/remote/remote.h"
+#include "ioregionfd.h"
 
 #define TYPE_IOREGIONFD_OBJECT "ioregionfd-object"
 OBJECT_DECLARE_TYPE(IORegionFDObject, IORegionFDObjectClass, IOREGIONFD_OBJECT)
@@ -91,6 +92,116 @@ void ioregionfd_set_bar_type(GSList *list, uint32_t bar, bool memory)
     }
 }
 
+int qio_channel_ioregionfd_read(QIOChannel *ioc, gpointer opaque,
+                                Error **errp)
+{
+    struct RemoteObject *o = (struct RemoteObject *)opaque;
+    struct ioregionfd_cmd cmd = {};
+    struct iovec iov = {
+        .iov_base = &cmd,
+        .iov_len = sizeof(struct ioregionfd_cmd),
+    };
+    IORegionFDObject *ioregfd_obj;
+    PCIDevice *pci_dev;
+    hwaddr addr;
+    struct ioregionfd_resp resp = {};
+    int bar = 0;
+    Error *local_err = NULL;
+    uint64_t val = UINT64_MAX;
+    AddressSpace *as;
+    int ret = -EINVAL;
+
+    ERRP_GUARD();
+
+    if (!ioc) {
+        return -EINVAL;
+    }
+    ret = qio_channel_readv_full(ioc, &iov, 1, NULL, 0, &local_err);
+
+    if (ret == QIO_CHANNEL_ERR_BLOCK) {
+        return -EINVAL;
+    }
+
+    if (ret <= 0) {
+        /* read error or other side closed connection */
+        if (local_err) {
+            error_report_err(local_err);
+        }
+        error_setg(errp, "ioregionfd receive error");
+        return -EINVAL;
+    }
+
+    bar = cmd.user_data;
+    pci_dev = PCI_DEVICE(o->dev);
+    addr = (hwaddr)(pci_get_bar_addr(pci_dev, bar) + cmd.offset);
+    IORegionFDObject key = {.ioregfd = {.bar = bar} };
+    ioregfd_obj = g_hash_table_lookup(o->ioregionfd_hash, &key);
+
+    if (!ioregfd_obj) {
+        error_setg(errp, "Could not find IORegionFDObject");
+        return -EINVAL;
+    }
+    if (ioregfd_obj->ioregfd.memory) {
+        as = &address_space_memory;
+    } else {
+        as = &address_space_io;
+    }
+
+    if (ret > 0 && pci_dev) {
+        switch (cmd.cmd) {
+        case IOREGIONFD_CMD_READ:
+            ret = address_space_rw(as, addr, MEMTXATTRS_UNSPECIFIED,
+                                   (void *)&val, 1 << cmd.size_exponent,
+                                   false);
+            if (ret != MEMTX_OK) {
+                ret = -EINVAL;
+                error_setg(errp, "Bad address %"PRIx64" in mem read", addr);
+                val = UINT64_MAX;
+            }
+
+            memset(&resp, 0, sizeof(resp));
+            resp.data = val;
+            if (qio_channel_write_all(ioc, (char *)&resp, sizeof(resp),
+                                      &local_err)) {
+                error_propagate(errp, local_err);
+                goto fatal;
+            }
+            break;
+        case IOREGIONFD_CMD_WRITE:
+            ret = address_space_rw(as, addr, MEMTXATTRS_UNSPECIFIED,
+                                   (void *)&cmd.data, 1 << cmd.size_exponent,
+                                   true);
+            if (ret != MEMTX_OK) {
+                error_setg(errp, "Bad address %"PRIx64" for mem write", addr);
+                val = UINT64_MAX;
+            }
+
+            if (cmd.resp) {
+                memset(&resp, 0, sizeof(resp));
+                if (ret != MEMTX_OK) {
+                    resp.data = UINT64_MAX;
+                    ret = -EINVAL;
+                } else {
+                    resp.data = cmd.data;
+                }
+                if (qio_channel_write_all(ioc, (char *)&resp, sizeof(resp),
+                                          &local_err)) {
+                    error_propagate(errp, local_err);
+                    goto fatal;
+                }
+            }
+            break;
+        default:
+            error_setg(errp, "Unknown ioregionfd command from kvm");
+            break;
+        }
+    }
+    return ret;
+
+ fatal:
+    return -EINVAL;
+}
+
 static void ioregionfd_object_init(Object *obj)
 {
     IORegionFDObjectClass *k = IOREGIONFD_OBJECT_GET_CLASS(obj);
diff --git a/hw/remote/remote-obj.c b/hw/remote/remote-obj.c
index 46c2e2a5bd..2b005eab40 100644
--- a/hw/remote/remote-obj.c
+++ b/hw/remote/remote-obj.c
@@ -11,6 +11,7 @@
 #include "qemu-common.h"
 
 #include "qemu/error-report.h"
+#include "sysemu/iothread.h"
 #include "qemu/notify.h"
 #include "qom/object_interfaces.h"
 #include "hw/qdev-core.h"
@@ -78,6 +79,16 @@ static void remote_object_unrealize_listener(DeviceListener *listener,
     }
 }
 
+static IOThread *ioregionfd_iot;
+
+static void ioregion_read(void *opaque)
+{
+    struct RemoteObject *o = opaque;
+    Error *local_error = NULL;
+
+    qio_channel_ioregionfd_read(o->ioregfd_ioc, opaque, &local_error);
+}
+
 static GSList *ioregions_list;
 
 static unsigned int ioregionfd_bar_hash(const void *key)
@@ -104,6 +115,8 @@ static void ioregionfd_prepare_for_dev(RemoteObject *o, PCIDevice *dev)
 {
     IORegionFDObject *ioregfd_obj = NULL;
     GSList *obj_list, *list;
+    QIOChannel *ioc = NULL;
+    Error *local_err = NULL;
 
     list = ioregionfd_get_obj_list();
 
@@ -143,6 +156,30 @@ static void ioregionfd_prepare_for_dev(RemoteObject *o, PCIDevice *dev)
     /* This is default and will be changed when proxy requests region info. */
     ioregfd_obj->ioregfd.memory = true;
 
+    ioc = qio_channel_new_fd(ioregfd_obj->ioregfd.fd, &local_err);
+    if (!ioc) {
+        error_prepend(&local_err, "Could not create IOC channel for" \
+                      "ioregionfd fd %d", ioregfd_obj->ioregfd.fd);
+        error_report_err(local_err);
+        goto fatal;
+    }
+    o->ioregfd_ioc = ioc;
+
+    if (ioregionfd_iot == NULL) {
+        ioregionfd_iot = iothread_create("ioregionfd iothread",
+                                       &local_err);
+        if (local_err) {
+            qio_channel_shutdown(o->ioregfd_ioc, QIO_CHANNEL_SHUTDOWN_BOTH,
+                                 NULL);
+            qio_channel_close(o->ioregfd_ioc, NULL);
+            error_report_err(local_err);
+            goto fatal;
+        }
+    }
+    o->ioregfd_ctx = iothread_get_aio_context(ioregionfd_iot);
+    qio_channel_set_aio_fd_handler(o->ioregfd_ioc, o->ioregfd_ctx,
+                                   ioregion_read, NULL, o);
+
     ioregions_list = list;
     return;
 
@@ -238,8 +275,15 @@ static void remote_object_finalize(Object *obj)
 
     k->nr_devs--;
     g_free(o->devid);
+
+    iothread_destroy(ioregionfd_iot);
     /* Free the list of the ioregions. */
     g_slist_foreach(ioregions_list, ioregionfd_release, NULL);
+    if (o->ioregfd_ioc) {
+        qio_channel_shutdown(o->ioregfd_ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+        qio_channel_close(o->ioregfd_ioc, NULL);
+    }
+
     g_slist_free(ioregions_list);
     g_hash_table_destroy(o->ioregionfd_hash);
 }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC 0/8] ioregionfd introduction
  2022-02-08  7:22 [RFC 0/8] ioregionfd introduction Elena Ufimtseva
                   ` (7 preceding siblings ...)
  2022-02-08  7:22 ` [RFC 8/8] multiprocess: handle ioregionfd commands Elena Ufimtseva
@ 2022-02-09 10:33 ` Stefan Hajnoczi
  2022-02-14 14:52 ` Stefan Hajnoczi
  9 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2022-02-09 10:33 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, qemu-devel, peterx, armbru, mst, berrange, pbonzini,
	philmd

[-- Attachment #1: Type: text/plain, Size: 262 bytes --]

On Mon, Feb 07, 2022 at 11:22:14PM -0800, Elena Ufimtseva wrote:
> This patchset is an RFC version for the ioregionfd implementation
> in QEMU. The kernel patches are to be posted with some fixes as a v4.

Hi Elena,
I will review this on Monday. Thanks!

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 4/8] ioregionfd: Introduce IORegionDFObject type
  2022-02-08  7:22 ` [RFC 4/8] ioregionfd: Introduce IORegionDFObject type Elena Ufimtseva
@ 2022-02-11 13:46   ` Markus Armbruster
  2022-02-15 18:19     ` Elena
  2022-02-14 14:37   ` Stefan Hajnoczi
  1 sibling, 1 reply; 20+ messages in thread
From: Markus Armbruster @ 2022-02-11 13:46 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, qemu-devel, peterx, mst, berrange, stefanha, pbonzini,
	philmd

Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:

> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>

[...]

> diff --git a/qapi/qom.json b/qapi/qom.json
> index eeb5395ff3..439fb94c93 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -689,6 +689,29 @@
>          'data': { 'chardev': 'str',
>                    '*log': 'str' } }
>  
> +##
> +# @IORegionFDObjectProperties:
> +#
> +# Describes ioregionfd for the device
> +#
> +# @devid: the id of the device to be associated with the ioregionfd
> +#
> +# @iofd: File descriptor
> +#
> +# @bar: BAR number to use with ioregionfd
> +#
> +# @start: offset from the BAR start address of ioregionfd
> +#
> +# @size: size of the ioregionfd
> +##
> +# Since: 2.9
> +{ 'struct': 'IORegionFDObjectProperties',
> +  'data': { 'devid': 'str',
> +            'iofd': 'str',
> +            'bar': 'int',
> +            '*start': 'int',
> +            '*size':'int' } }

Should these three be 'uint32' to match struct IORegionFD?

> +
>  ##
>  # @RemoteObjectProperties:
>  #
> @@ -842,8 +865,10 @@
>      'tls-creds-psk',
>      'tls-creds-x509',
>      'tls-cipher-suites',
> -    { 'name': 'x-remote-object', 'features': [ 'unstable' ] }
> -  ] }
> +    { 'name': 'x-remote-object', 'features': [ 'unstable' ] },
> +    { 'name' :'ioregionfd-object',
> +      'if': 'CONFIG_IOREGIONFD' }
> + ] }
>  
>  ##
>  # @ObjectOptions:
> @@ -905,7 +930,8 @@
>        'tls-creds-psk':              'TlsCredsPskProperties',
>        'tls-creds-x509':             'TlsCredsX509Properties',
>        'tls-cipher-suites':          'TlsCredsProperties',
> -      'x-remote-object':            'RemoteObjectProperties'
> +      'x-remote-object':            'RemoteObjectProperties',
> +      'ioregionfd-object':          'IORegionFDObjectProperties'
>    } }
>  
>  ##
> diff --git a/include/hw/remote/ioregionfd.h b/include/hw/remote/ioregionfd.h
> new file mode 100644
> index 0000000000..c8a8b32ee0
> --- /dev/null
> +++ b/include/hw/remote/ioregionfd.h
> @@ -0,0 +1,40 @@
> +/*
> + * Ioregionfd headers
> + *
> + * Copyright © 2018, 2022 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef IOREGIONFD_H
> +#define IOREGIONFD_H
> +
> +#define PCI_BARS_NR 6
> +
> +typedef struct {
> +    uint64_t val;
> +    bool memory;
> +} IORegionFDOp;
> +
> +typedef struct {
> +    int fd;
> +    char *devid;
> +    uint32_t bar;
> +    uint32_t start;
> +    uint32_t size;
> +    bool memory;
> +} IORegionFD;
> +
> +struct IORegionFDObject {
> +    /* private */
> +    Object parent;
> +
> +    IORegionFD ioregfd;
> +    QTAILQ_ENTRY(IORegionFDObject) next;
> +};
> +
> +typedef struct IORegionFDObject IORegionFDObject;
> +
> +#endif /* IOREGIONFD_H */

[...]



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 3/8] ioregionfd: introduce memory API functions
  2022-02-08  7:22 ` [RFC 3/8] ioregionfd: introduce memory API functions Elena Ufimtseva
@ 2022-02-14 14:32   ` Stefan Hajnoczi
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2022-02-14 14:32 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, qemu-devel, peterx, armbru, mst, berrange, pbonzini,
	philmd

[-- Attachment #1: Type: text/plain, Size: 1296 bytes --]

On Mon, Feb 07, 2022 at 11:22:17PM -0800, Elena Ufimtseva wrote:
> @@ -2434,6 +2569,42 @@ void memory_region_clear_flush_coalesced(MemoryRegion *mr)
>  
>  static bool userspace_eventfd_warning;
>  
> +void memory_region_add_ioregionfd(MemoryRegion *mr,
> +                                  hwaddr addr,
> +                                  unsigned size,
> +                                  uint64_t data,

uint64_t data is vague and can be confused with ioeventfd's match data
field. QEMU tends to use void *opaque, but following the ioregionfd
kernel API's naming would be fine too: uint64_t user_data.

> +                                  int fd,
> +                                  bool pio)
> +{
> +    MemoryRegionIoregionfd mriofd = {
> +        .addr.start = int128_make64(addr),
> +        .addr.size = int128_make64(size),
> +        .data = data,
> +        .fd = fd,
> +    };
> +    unsigned i;
> +
> +    if (kvm_enabled() && !kvm_ioregionfds_enabled()) {
> +        error_report("KVM does not support KVM_CAP_IOREGIONFD");
> +    }

Is this a fatal error?

QEMU should have a userspace ioregionfd implementation for
compatibility. That allows ioregionfd to be tested without running in
KVM mode. (This is how ioeventfd support works in QEMU.)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 4/8] ioregionfd: Introduce IORegionDFObject type
  2022-02-08  7:22 ` [RFC 4/8] ioregionfd: Introduce IORegionDFObject type Elena Ufimtseva
  2022-02-11 13:46   ` Markus Armbruster
@ 2022-02-14 14:37   ` Stefan Hajnoczi
  2022-02-15 18:18     ` Elena
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2022-02-14 14:37 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, qemu-devel, peterx, armbru, mst, berrange, pbonzini,
	philmd

[-- Attachment #1: Type: text/plain, Size: 1342 bytes --]

On Mon, Feb 07, 2022 at 11:22:18PM -0800, Elena Ufimtseva wrote:
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> ---
>  meson.build                    |  15 ++-
>  qapi/qom.json                  |  32 +++++-
>  include/hw/remote/ioregionfd.h |  40 +++++++
>  hw/remote/ioregionfd.c         | 196 +++++++++++++++++++++++++++++++++
>  Kconfig.host                   |   3 +
>  MAINTAINERS                    |   2 +
>  hw/remote/Kconfig              |   4 +
>  hw/remote/meson.build          |   1 +
>  meson_options.txt              |   2 +
>  scripts/meson-buildoptions.sh  |   3 +
>  10 files changed, 294 insertions(+), 4 deletions(-)
>  create mode 100644 include/hw/remote/ioregionfd.h
>  create mode 100644 hw/remote/ioregionfd.c
> 
> diff --git a/meson.build b/meson.build
> index 96de1a6ef9..6483e754bd 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -258,6 +258,17 @@ if targetos != 'linux' and get_option('multiprocess').enabled()
>  endif
>  multiprocess_allowed = targetos == 'linux' and not get_option('multiprocess').disabled()
>  
> +# TODO: drop this limitation

What is the reason for the limitation?

> +if not multiprocess_allowed and not get_option('ioregionfd').disabled()
> +  error('To enable ioregiofd support, enable mutliprocess option.')

s/ioregiofd/ioregionfd/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 0/8] ioregionfd introduction
  2022-02-08  7:22 [RFC 0/8] ioregionfd introduction Elena Ufimtseva
                   ` (8 preceding siblings ...)
  2022-02-09 10:33 ` [RFC 0/8] ioregionfd introduction Stefan Hajnoczi
@ 2022-02-14 14:52 ` Stefan Hajnoczi
  2022-02-15 18:16   ` Elena
  9 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2022-02-14 14:52 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, qemu-devel, peterx, armbru, mst, berrange, pbonzini,
	philmd

[-- Attachment #1: Type: text/plain, Size: 4476 bytes --]

On Mon, Feb 07, 2022 at 11:22:14PM -0800, Elena Ufimtseva wrote:
> This patchset is an RFC version for the ioregionfd implementation
> in QEMU. The kernel patches are to be posted with some fixes as a v4.
> 
> For this implementation version 3 of the posted kernel patches was user:
> https://lore.kernel.org/kvm/cover.1613828726.git.eafanasova@gmail.com/
> 
> The future version will include support for vfio/libvfio-user.
> Please refer to the design discussion here proposed by Stefan:
> https://lore.kernel.org/all/YXpb1f3KicZxj1oj@stefanha-x1.localdomain/T/
> 
> The vfio-user version needed some bug-fixing and it was decided to send
> this for multiprocess first.
> 
> The ioregionfd is configured currently trough the command line and each
> ioregionfd represent an object. This allow for easy parsing and does
> not require device/remote object command line option modifications.
> 
> The following command line can be used to specify ioregionfd:
> <snip>
>   '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
>   '-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
>   '-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\

Explicit configuration of ioregionfd-object is okay for early
prototyping, but what is the plan for integrating this? I guess
x-remote-object would query the remote device to find out which
ioregionfds need to be registered and the user wouldn't need to specify
ioregionfds on the command-line?

> </snip>
> 
> Proxy side of ioregionfd in this version uses only one file descriptor:
> <snip>
>   '-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()), \
> </snip>

This raises the question of the ioregionfd file descriptor lifecycle. In
the end I think it shouldn't be specified on the command-line. Instead
the remote device should create it and pass it to QEMU over the
mpqemu/remote fd?

> 
> This is done for RFC version and my though was that next version will
> be for vfio-user, so I have not dedicated much effort to this command
> line options.
> 
> The multiprocess messaging protocol was extended to support inquiries
> by the proxy if device has any ioregionfds.
> This RFC implements inquires by proxy about the type of BAR (ioregionfd
> or not) and the type of it (memory/io).
> 
> Currently there are few limitations in this version of ioregionfd.
>  - one ioregionfd per bar, only full bar size is supported;
>  - one file descriptor per device for all of its ioregionfds;
>  - each remote device runs fd handler for all its BARs in one IOThread;
>  - proxy supports only one fd.
> 
> Some of these limitations will be dropped in the future version.
> This RFC is to acquire the feedback/suggestions from the community
> on the general approach.
> 
> The quick performance test was done for the remote lsi device with
> ioregionfd and without for both mem BARs (1 and 2) with help
> of the fio tool:
> 
> Random R/W:
> 
> 	             read IOPS	read BW     write IOPS   write BW
> no ioregionfd	 889	    3559KiB/s   890          3561KiB/s
> ioregionfd	     938	    3756KiB/s   939          3757KiB/s

This is extremely slow, even for random I/O. How does this compare to
QEMU running the LSI device without multi-process mode?

> Sequential Read and Sequential Write:
> 
>                  Sequential read		Sequential write	
>                  read IOPS	read BW	    write IOPS	 write BW
> 
> no ioregionfd    367k	    1434MiB/s	76k	         297MiB/s
> ioregionfd       374k	    1459MiB/s	77.3k	     302MiB/s

It's normal for read and write IOPS to differ, but the read IOPS are
very high. I wonder if caching and read-ahead are hiding the LSI
device's actual performance here.

What are the fio and QEMU command-lines?

In order to benchmark ioregionfd it's best to run a benchmark where the
bottleneck is MMIO/PIO dispatch. Otherwise we're looking at some other
bottleneck (e.g. physical disk I/O performance) and the MMIO/PIO
dispatch cost doesn't affect IOPS significantly.

I suggest trying --blockdev null-co,size=64G,id=null0 as the disk
instead of a file or host block device. The fio block size should be 4k
to minimize the amount of time spent on I/O buffer contents and
iodepth=1 because batching multiple requests with iodepth > 0 hides the
MMIO/PIO dispatch bottleneck.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 0/8] ioregionfd introduction
  2022-02-14 14:52 ` Stefan Hajnoczi
@ 2022-02-15 18:16   ` Elena
  2022-02-16 11:20     ` Stefan Hajnoczi
  0 siblings, 1 reply; 20+ messages in thread
From: Elena @ 2022-02-15 18:16 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, qemu-devel, peterx, armbru, mst, berrange, pbonzini,
	philmd

On Mon, Feb 14, 2022 at 02:52:29PM +0000, Stefan Hajnoczi wrote:
> On Mon, Feb 07, 2022 at 11:22:14PM -0800, Elena Ufimtseva wrote:
> > This patchset is an RFC version for the ioregionfd implementation
> > in QEMU. The kernel patches are to be posted with some fixes as a v4.
> > 
> > For this implementation version 3 of the posted kernel patches was user:
> > https://lore.kernel.org/kvm/cover.1613828726.git.eafanasova@gmail.com/
> > 
> > The future version will include support for vfio/libvfio-user.
> > Please refer to the design discussion here proposed by Stefan:
> > https://lore.kernel.org/all/YXpb1f3KicZxj1oj@stefanha-x1.localdomain/T/
> > 
> > The vfio-user version needed some bug-fixing and it was decided to send
> > this for multiprocess first.
> > 
> > The ioregionfd is configured currently trough the command line and each
> > ioregionfd represent an object. This allow for easy parsing and does
> > not require device/remote object command line option modifications.
> > 
> > The following command line can be used to specify ioregionfd:
> > <snip>
> >   '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
> >   '-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
> >   '-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
> 

Hi Stefan

Thank you for taking a look!

> Explicit configuration of ioregionfd-object is okay for early
> prototyping, but what is the plan for integrating this? I guess
> x-remote-object would query the remote device to find out which
> ioregionfds need to be registered and the user wouldn't need to specify
> ioregionfds on the command-line?

Yes, this can be done. For some reason I thought that user will be able
to configure the number/size of the regions to be configured as
ioregionfds. 

> 
> > </snip>
> > 
> > Proxy side of ioregionfd in this version uses only one file descriptor:
> > <snip>
> >   '-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()), \
> > </snip>
> 
> This raises the question of the ioregionfd file descriptor lifecycle. In
> the end I think it shouldn't be specified on the command-line. Instead
> the remote device should create it and pass it to QEMU over the
> mpqemu/remote fd?

Yes, this will be same as vfio-user does.
> 
> > 
> > This is done for RFC version and my though was that next version will
> > be for vfio-user, so I have not dedicated much effort to this command
> > line options.
> > 
> > The multiprocess messaging protocol was extended to support inquiries
> > by the proxy if device has any ioregionfds.
> > This RFC implements inquires by proxy about the type of BAR (ioregionfd
> > or not) and the type of it (memory/io).
> > 
> > Currently there are few limitations in this version of ioregionfd.
> >  - one ioregionfd per bar, only full bar size is supported;
> >  - one file descriptor per device for all of its ioregionfds;
> >  - each remote device runs fd handler for all its BARs in one IOThread;
> >  - proxy supports only one fd.
> > 
> > Some of these limitations will be dropped in the future version.
> > This RFC is to acquire the feedback/suggestions from the community
> > on the general approach.
> > 
> > The quick performance test was done for the remote lsi device with
> > ioregionfd and without for both mem BARs (1 and 2) with help
> > of the fio tool:
> > 
> > Random R/W:
> > 
> > 	             read IOPS	read BW     write IOPS   write BW
> > no ioregionfd	 889	    3559KiB/s   890          3561KiB/s
> > ioregionfd	     938	    3756KiB/s   939          3757KiB/s
> 
> This is extremely slow, even for random I/O. How does this compare to
> QEMU running the LSI device without multi-process mode?

These tests had the iodepth=256. I have changed this to 1 and tested
without multiprocess, with multiprocess and multiprocess with both mmio
regions as ioregionfds:

	                 read IOPS  read BW(KiB/s)  write IOPS   write BW (KiB/s)
no multiprocess             89	         358	       90	    360
multiprocess                138	         556	       139	    557
multiprocess ioregionfd	    174	         698	       173	    693

The fio config for randomrw:
[global]
bs=4K
iodepth=1
direct=0
ioengine=libaio
group_reporting
time_based
runtime=240
numjobs=1
name=raw-randreadwrite
rw=randrw
size=8G
[job1]
filename=/fio/randomrw

And QEMU command line for non-mutliprocess:

/usr/local/bin/qemu-system-x86_64  -name "OL7.4" -machine q35,accel=kvm -smp sockets=1,cores=2,threads=2 -m 2048 -hda /home/homedir/ol7u9boot.img -boot d -vnc :0 -chardev stdio,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios -device lsi53c895a,id=lsi1 -drive id=drive_image1,if=none,file=/home/homedir/10gb.qcow2 -device scsi-hd,id=drive1,drive=drive_image1,bus=lsi1.0,scsi-id=0

QEMU command line for multiprocess:

remote_cmd = [ PROC_QEMU,                                                      \
               '-machine', 'x-remote',                                         \
               '-device', 'lsi53c895a,id=lsi0',                                \
               '-drive', 'id=drive_image1,file=/home/homedir/10gb.qcow2',   \
               '-device', 'scsi-hd,id=drive2,drive=drive_image1,bus=lsi0.0,'   \
                              'scsi-id=0',                                     \
               '-nographic',                                                   \
               '-monitor', 'unix:/home/homedir/rem-sock,server,nowait',                \
               '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
               '-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1,',\
               '-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
               ]
proxy_cmd = [ PROC_QEMU,                                           \
              '-D', '/tmp/qemu-debug-log', \
              '-name', 'OL7.4',                                                \
              '-machine', 'pc,accel=kvm',                                      \
              '-smp', 'sockets=1,cores=2,threads=2',                           \
              '-m', '2048',                                                    \
              '-object', 'memory-backend-memfd,id=sysmem-file,size=2G',        \
              '-numa', 'node,memdev=sysmem-file',                              \
              '-hda','/home/homedir/ol7u9boot.img',                      \
              '-boot', 'd',                                                    \
              '-vnc', ':0',                                                    \
              '-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()),               \
              '-monitor', 'unix:/home/homedir/qemu-sock,server,nowait',                \
              '-netdev','tap,id=mynet0,ifname=tap0,script=no,downscript=no', '-device','e1000,netdev=mynet0,mac=52:55:00:d1:55:01',\
            ]

Where for the test without ioregionfds, they are commented out.

I am doing more testing as I see some inconsistent results.

> 
> > Sequential Read and Sequential Write:
> > 
> >                  Sequential read		Sequential write	
> >                  read IOPS	read BW	    write IOPS	 write BW
> > 
> > no ioregionfd    367k	    1434MiB/s	76k	         297MiB/s
> > ioregionfd       374k	    1459MiB/s	77.3k	     302MiB/s
> 
> It's normal for read and write IOPS to differ, but the read IOPS are
> very high. I wonder if caching and read-ahead are hiding the LSI
> device's actual performance here.
> 
> What are the fio and QEMU command-lines?
> 
> In order to benchmark ioregionfd it's best to run a benchmark where the
> bottleneck is MMIO/PIO dispatch. Otherwise we're looking at some other
> bottleneck (e.g. physical disk I/O performance) and the MMIO/PIO
> dispatch cost doesn't affect IOPS significantly.
> 
> I suggest trying --blockdev null-co,size=64G,id=null0 as the disk
> instead of a file or host block device. The fio block size should be 4k
> to minimize the amount of time spent on I/O buffer contents and
> iodepth=1 because batching multiple requests with iodepth > 0 hides the
> MMIO/PIO dispatch bottleneck.

The queue depth in the tests above was 256, I will try that you have
suggested. The block size is 4k.

I am also looking at some other system issue that can interfere with
test, will be running test on the fresh install and with settings you
mentioned above.

Thank you!
> 
> Stefan




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 4/8] ioregionfd: Introduce IORegionDFObject type
  2022-02-14 14:37   ` Stefan Hajnoczi
@ 2022-02-15 18:18     ` Elena
  2022-02-16 11:08       ` Stefan Hajnoczi
  0 siblings, 1 reply; 20+ messages in thread
From: Elena @ 2022-02-15 18:18 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, qemu-devel, peterx, armbru, mst, berrange, pbonzini,
	philmd

On Mon, Feb 14, 2022 at 02:37:21PM +0000, Stefan Hajnoczi wrote:
> On Mon, Feb 07, 2022 at 11:22:18PM -0800, Elena Ufimtseva wrote:
> > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > ---
> >  meson.build                    |  15 ++-
> >  qapi/qom.json                  |  32 +++++-
> >  include/hw/remote/ioregionfd.h |  40 +++++++
> >  hw/remote/ioregionfd.c         | 196 +++++++++++++++++++++++++++++++++
> >  Kconfig.host                   |   3 +
> >  MAINTAINERS                    |   2 +
> >  hw/remote/Kconfig              |   4 +
> >  hw/remote/meson.build          |   1 +
> >  meson_options.txt              |   2 +
> >  scripts/meson-buildoptions.sh  |   3 +
> >  10 files changed, 294 insertions(+), 4 deletions(-)
> >  create mode 100644 include/hw/remote/ioregionfd.h
> >  create mode 100644 hw/remote/ioregionfd.c
> > 
> > diff --git a/meson.build b/meson.build
> > index 96de1a6ef9..6483e754bd 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -258,6 +258,17 @@ if targetos != 'linux' and get_option('multiprocess').enabled()
> >  endif
> >  multiprocess_allowed = targetos == 'linux' and not get_option('multiprocess').disabled()
> >  
> > +# TODO: drop this limitation
> 
> What is the reason for the limitation?
>

The idea is to limit use of this acceleration until the API is more
generic and does not need mutliprocess.

> > +if not multiprocess_allowed and not get_option('ioregionfd').disabled()
> > +  error('To enable ioregiofd support, enable mutliprocess option.')
> 
> s/ioregiofd/ioregionfd/
!

Thanks Stefan!




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 4/8] ioregionfd: Introduce IORegionDFObject type
  2022-02-11 13:46   ` Markus Armbruster
@ 2022-02-15 18:19     ` Elena
  0 siblings, 0 replies; 20+ messages in thread
From: Elena @ 2022-02-15 18:19 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, qemu-devel, peterx, mst, berrange, stefanha, pbonzini,
	philmd

On Fri, Feb 11, 2022 at 02:46:47PM +0100, Markus Armbruster wrote:
> Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:
> 
> > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> 
> [...]
> 
> > diff --git a/qapi/qom.json b/qapi/qom.json
> > index eeb5395ff3..439fb94c93 100644
> > --- a/qapi/qom.json
> > +++ b/qapi/qom.json
> > @@ -689,6 +689,29 @@
> >          'data': { 'chardev': 'str',
> >                    '*log': 'str' } }
> >  
> > +##
> > +# @IORegionFDObjectProperties:
> > +#
> > +# Describes ioregionfd for the device
> > +#
> > +# @devid: the id of the device to be associated with the ioregionfd
> > +#
> > +# @iofd: File descriptor
> > +#
> > +# @bar: BAR number to use with ioregionfd
> > +#
> > +# @start: offset from the BAR start address of ioregionfd
> > +#
> > +# @size: size of the ioregionfd
> > +##
> > +# Since: 2.9
> > +{ 'struct': 'IORegionFDObjectProperties',
> > +  'data': { 'devid': 'str',
> > +            'iofd': 'str',
> > +            'bar': 'int',
> > +            '*start': 'int',
> > +            '*size':'int' } }
> 
> Should these three be 'uint32' to match struct IORegionFD?
>

That is right, I will fix this.

Thank you Markus.
> > +
> >  ##
> >  # @RemoteObjectProperties:
> >  #
> > @@ -842,8 +865,10 @@
> >      'tls-creds-psk',
> >      'tls-creds-x509',
> >      'tls-cipher-suites',
> > -    { 'name': 'x-remote-object', 'features': [ 'unstable' ] }
> > -  ] }
> > +    { 'name': 'x-remote-object', 'features': [ 'unstable' ] },
> > +    { 'name' :'ioregionfd-object',
> > +      'if': 'CONFIG_IOREGIONFD' }
> > + ] }
> >  
> >  ##
> >  # @ObjectOptions:
> > @@ -905,7 +930,8 @@
> >        'tls-creds-psk':              'TlsCredsPskProperties',
> >        'tls-creds-x509':             'TlsCredsX509Properties',
> >        'tls-cipher-suites':          'TlsCredsProperties',
> > -      'x-remote-object':            'RemoteObjectProperties'
> > +      'x-remote-object':            'RemoteObjectProperties',
> > +      'ioregionfd-object':          'IORegionFDObjectProperties'
> >    } }
> >  
> >  ##
> > diff --git a/include/hw/remote/ioregionfd.h b/include/hw/remote/ioregionfd.h
> > new file mode 100644
> > index 0000000000..c8a8b32ee0
> > --- /dev/null
> > +++ b/include/hw/remote/ioregionfd.h
> > @@ -0,0 +1,40 @@
> > +/*
> > + * Ioregionfd headers
> > + *
> > + * Copyright © 2018, 2022 Oracle and/or its affiliates.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef IOREGIONFD_H
> > +#define IOREGIONFD_H
> > +
> > +#define PCI_BARS_NR 6
> > +
> > +typedef struct {
> > +    uint64_t val;
> > +    bool memory;
> > +} IORegionFDOp;
> > +
> > +typedef struct {
> > +    int fd;
> > +    char *devid;
> > +    uint32_t bar;
> > +    uint32_t start;
> > +    uint32_t size;
> > +    bool memory;
> > +} IORegionFD;
> > +
> > +struct IORegionFDObject {
> > +    /* private */
> > +    Object parent;
> > +
> > +    IORegionFD ioregfd;
> > +    QTAILQ_ENTRY(IORegionFDObject) next;
> > +};
> > +
> > +typedef struct IORegionFDObject IORegionFDObject;
> > +
> > +#endif /* IOREGIONFD_H */
> 
> [...]
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 4/8] ioregionfd: Introduce IORegionDFObject type
  2022-02-15 18:18     ` Elena
@ 2022-02-16 11:08       ` Stefan Hajnoczi
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2022-02-16 11:08 UTC (permalink / raw)
  To: Elena
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, qemu-devel, peterx, armbru, mst, berrange, pbonzini,
	philmd

[-- Attachment #1: Type: text/plain, Size: 1630 bytes --]

On Tue, Feb 15, 2022 at 10:18:12AM -0800, Elena wrote:
> On Mon, Feb 14, 2022 at 02:37:21PM +0000, Stefan Hajnoczi wrote:
> > On Mon, Feb 07, 2022 at 11:22:18PM -0800, Elena Ufimtseva wrote:
> > > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > > ---
> > >  meson.build                    |  15 ++-
> > >  qapi/qom.json                  |  32 +++++-
> > >  include/hw/remote/ioregionfd.h |  40 +++++++
> > >  hw/remote/ioregionfd.c         | 196 +++++++++++++++++++++++++++++++++
> > >  Kconfig.host                   |   3 +
> > >  MAINTAINERS                    |   2 +
> > >  hw/remote/Kconfig              |   4 +
> > >  hw/remote/meson.build          |   1 +
> > >  meson_options.txt              |   2 +
> > >  scripts/meson-buildoptions.sh  |   3 +
> > >  10 files changed, 294 insertions(+), 4 deletions(-)
> > >  create mode 100644 include/hw/remote/ioregionfd.h
> > >  create mode 100644 hw/remote/ioregionfd.c
> > > 
> > > diff --git a/meson.build b/meson.build
> > > index 96de1a6ef9..6483e754bd 100644
> > > --- a/meson.build
> > > +++ b/meson.build
> > > @@ -258,6 +258,17 @@ if targetos != 'linux' and get_option('multiprocess').enabled()
> > >  endif
> > >  multiprocess_allowed = targetos == 'linux' and not get_option('multiprocess').disabled()
> > >  
> > > +# TODO: drop this limitation
> > 
> > What is the reason for the limitation?
> >
> 
> The idea is to limit use of this acceleration until the API is more
> generic and does not need mutliprocess.

Please document that intention so readers understand why a limitation
is in place.

Thanks,
Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 0/8] ioregionfd introduction
  2022-02-15 18:16   ` Elena
@ 2022-02-16 11:20     ` Stefan Hajnoczi
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2022-02-16 11:20 UTC (permalink / raw)
  To: Elena
  Cc: eduardo, john.g.johnson, cohuck, jag.raman, john.levon, eblake,
	david, qemu-devel, peterx, armbru, mst, berrange, pbonzini,
	philmd

[-- Attachment #1: Type: text/plain, Size: 8357 bytes --]

On Tue, Feb 15, 2022 at 10:16:04AM -0800, Elena wrote:
> On Mon, Feb 14, 2022 at 02:52:29PM +0000, Stefan Hajnoczi wrote:
> > On Mon, Feb 07, 2022 at 11:22:14PM -0800, Elena Ufimtseva wrote:
> > > This patchset is an RFC version for the ioregionfd implementation
> > > in QEMU. The kernel patches are to be posted with some fixes as a v4.
> > > 
> > > For this implementation version 3 of the posted kernel patches was user:
> > > https://lore.kernel.org/kvm/cover.1613828726.git.eafanasova@gmail.com/
> > > 
> > > The future version will include support for vfio/libvfio-user.
> > > Please refer to the design discussion here proposed by Stefan:
> > > https://lore.kernel.org/all/YXpb1f3KicZxj1oj@stefanha-x1.localdomain/T/
> > > 
> > > The vfio-user version needed some bug-fixing and it was decided to send
> > > this for multiprocess first.
> > > 
> > > The ioregionfd is configured currently trough the command line and each
> > > ioregionfd represent an object. This allow for easy parsing and does
> > > not require device/remote object command line option modifications.
> > > 
> > > The following command line can be used to specify ioregionfd:
> > > <snip>
> > >   '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
> > >   '-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
> > >   '-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
> > 
> 
> Hi Stefan
> 
> Thank you for taking a look!
> 
> > Explicit configuration of ioregionfd-object is okay for early
> > prototyping, but what is the plan for integrating this? I guess
> > x-remote-object would query the remote device to find out which
> > ioregionfds need to be registered and the user wouldn't need to specify
> > ioregionfds on the command-line?
> 
> Yes, this can be done. For some reason I thought that user will be able
> to configure the number/size of the regions to be configured as
> ioregionfds. 
> 
> > 
> > > </snip>
> > > 
> > > Proxy side of ioregionfd in this version uses only one file descriptor:
> > > <snip>
> > >   '-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()), \
> > > </snip>
> > 
> > This raises the question of the ioregionfd file descriptor lifecycle. In
> > the end I think it shouldn't be specified on the command-line. Instead
> > the remote device should create it and pass it to QEMU over the
> > mpqemu/remote fd?
> 
> Yes, this will be same as vfio-user does.
> > 
> > > 
> > > This is done for RFC version and my though was that next version will
> > > be for vfio-user, so I have not dedicated much effort to this command
> > > line options.
> > > 
> > > The multiprocess messaging protocol was extended to support inquiries
> > > by the proxy if device has any ioregionfds.
> > > This RFC implements inquires by proxy about the type of BAR (ioregionfd
> > > or not) and the type of it (memory/io).
> > > 
> > > Currently there are few limitations in this version of ioregionfd.
> > >  - one ioregionfd per bar, only full bar size is supported;
> > >  - one file descriptor per device for all of its ioregionfds;
> > >  - each remote device runs fd handler for all its BARs in one IOThread;
> > >  - proxy supports only one fd.
> > > 
> > > Some of these limitations will be dropped in the future version.
> > > This RFC is to acquire the feedback/suggestions from the community
> > > on the general approach.
> > > 
> > > The quick performance test was done for the remote lsi device with
> > > ioregionfd and without for both mem BARs (1 and 2) with help
> > > of the fio tool:
> > > 
> > > Random R/W:
> > > 
> > > 	             read IOPS	read BW     write IOPS   write BW
> > > no ioregionfd	 889	    3559KiB/s   890          3561KiB/s
> > > ioregionfd	     938	    3756KiB/s   939          3757KiB/s
> > 
> > This is extremely slow, even for random I/O. How does this compare to
> > QEMU running the LSI device without multi-process mode?
> 
> These tests had the iodepth=256. I have changed this to 1 and tested
> without multiprocess, with multiprocess and multiprocess with both mmio
> regions as ioregionfds:
> 
> 	                 read IOPS  read BW(KiB/s)  write IOPS   write BW (KiB/s)
> no multiprocess             89	         358	       90	    360
> multiprocess                138	         556	       139	    557
> multiprocess ioregionfd	    174	         698	       173	    693
> 
> The fio config for randomrw:
> [global]
> bs=4K
> iodepth=1
> direct=0

Please set direct=1 so the guest page cache does not affect the I/O
pattern.

The host --drive option also needs cache.direct=on to avoid host page
cache effects.

The reason for benchmarking with direct=1 is to ensure that every I/O
request submitted by fio is forwarded to the underlying disk. Otherwise
the benchmark may be comparing guest page cache or host page cache hits,
which do not involve the disk.

Page cache read-ahead and write-behind may involve large block sizes and
therefore change the I/O pattern specified on the fio command-line. This
interferes with the benchmark and is another reason to use direct=1.

> ioengine=libaio
> group_reporting
> time_based
> runtime=240
> numjobs=1
> name=raw-randreadwrite
> rw=randrw
> size=8G
> [job1]
> filename=/fio/randomrw
> 
> And QEMU command line for non-mutliprocess:
> 
> /usr/local/bin/qemu-system-x86_64  -name "OL7.4" -machine q35,accel=kvm -smp sockets=1,cores=2,threads=2 -m 2048 -hda /home/homedir/ol7u9boot.img -boot d -vnc :0 -chardev stdio,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios -device lsi53c895a,id=lsi1 -drive id=drive_image1,if=none,file=/home/homedir/10gb.qcow2 -device scsi-hd,id=drive1,drive=drive_image1,bus=lsi1.0,scsi-id=0
> 
> QEMU command line for multiprocess:
> 
> remote_cmd = [ PROC_QEMU,                                                      \
>                '-machine', 'x-remote',                                         \
>                '-device', 'lsi53c895a,id=lsi0',                                \
>                '-drive', 'id=drive_image1,file=/home/homedir/10gb.qcow2',   \
>                '-device', 'scsi-hd,id=drive2,drive=drive_image1,bus=lsi0.0,'   \
>                               'scsi-id=0',                                     \
>                '-nographic',                                                   \
>                '-monitor', 'unix:/home/homedir/rem-sock,server,nowait',                \
>                '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
>                '-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1,',\
>                '-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
>                ]
> proxy_cmd = [ PROC_QEMU,                                           \
>               '-D', '/tmp/qemu-debug-log', \
>               '-name', 'OL7.4',                                                \
>               '-machine', 'pc,accel=kvm',                                      \
>               '-smp', 'sockets=1,cores=2,threads=2',                           \
>               '-m', '2048',                                                    \
>               '-object', 'memory-backend-memfd,id=sysmem-file,size=2G',        \
>               '-numa', 'node,memdev=sysmem-file',                              \
>               '-hda','/home/homedir/ol7u9boot.img',                      \
>               '-boot', 'd',                                                    \
>               '-vnc', ':0',                                                    \
>               '-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()),               \
>               '-monitor', 'unix:/home/homedir/qemu-sock,server,nowait',                \
>               '-netdev','tap,id=mynet0,ifname=tap0,script=no,downscript=no', '-device','e1000,netdev=mynet0,mac=52:55:00:d1:55:01',\
>             ]
> 
> Where for the test without ioregionfds, they are commented out.
> 
> I am doing more testing as I see some inconsistent results.

Thanks for the benchmark details!

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 1/8] ioregionfd: introduce a syscall and memory API
  2022-02-08  7:22 ` [RFC 1/8] ioregionfd: introduce a syscall and memory API Elena Ufimtseva
@ 2022-02-16 12:19   ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2022-02-16 12:19 UTC (permalink / raw)
  To: Elena Ufimtseva, qemu-devel
  Cc: eduardo, john.g.johnson, jag.raman, john.levon, philmd, cohuck,
	armbru, peterx, mst, stefanha, pbonzini, berrange, eblake

Looks straight forward to me.

[...]

>  
> +int kvm_set_ioregionfd(struct kvm_ioregion *ioregionfd)
> +{
> +    KVMState *s = kvm_state;
> +    int ret = -1;
> +
> +    ret = kvm_vm_ioctl(s, KVM_SET_IOREGION, ioregionfd);
> +    if (ret < 0) {
> +        error_report("Failed SET_IOREGION syscall ret is %d", ret);

Maybe print the textual representation via strerror(-ret).

> +    }
> +    return ret;
> +}
> +
>  static int do_kvm_destroy_vcpu(CPUState *cpu)
>  {
>      KVMState *s = kvm_state;
> @@ -1635,6 +1648,104 @@ static void kvm_io_ioeventfd_del(MemoryListener *listener,
>      }
>  }
>  
> +static void kvm_mem_ioregionfd_add(MemoryListener *listener,
> +                                   MemoryRegionSection *section,
> +                                   uint64_t data,
> +                                   int fd)
> +{
> +
> +    struct kvm_ioregion ioregionfd;
> +    int r = -1;
> +
> +    ioregionfd.guest_paddr = section->offset_within_address_space;
> +    ioregionfd.memory_size = int128_get64(section->size);
> +    ioregionfd.user_data = data;
> +    ioregionfd.read_fd = fd;
> +    ioregionfd.write_fd = fd;
> +    ioregionfd.flags = 0;
> +    memset(&ioregionfd.pad, 0, sizeof(ioregionfd.pad));
> +
> +    r = kvm_set_ioregionfd(&ioregionfd);
> +    if (r < 0) {
> +        fprintf(stderr, "%s: error adding ioregionfd: %s (%d)\n,",
> +                __func__, strerror(-r), -r);

Oh, you're actually printing the error again? Why error_report() above
and here fprintf?

[...]

>  void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
>                                    AddressSpace *as, int as_id, const char *name)
>  {
> @@ -1679,6 +1790,12 @@ static MemoryListener kvm_io_listener = {
>      .priority = 10,
>  };
>  
> +static MemoryListener kvm_ioregion_listener = {
> +    .ioregionfd_add = kvm_io_ioregionfd_add,
> +    .ioregionfd_del = kvm_io_ioregionfd_del,
> +    .priority = 10,
> +};
> +
>  int kvm_set_irq(KVMState *s, int irq, int level)
>  {
>      struct kvm_irq_level event;
> @@ -2564,6 +2681,9 @@ static int kvm_init(MachineState *ms)
>      kvm_ioeventfd_any_length_allowed =
>          (kvm_check_extension(s, KVM_CAP_IOEVENTFD_ANY_LENGTH) > 0);
>  
> +    kvm_ioregionfds_allowed =
> +        (kvm_check_extension(s, KVM_CAP_IOREGIONFD) > 0);
> +
>      kvm_state = s;
>  
>      ret = kvm_arch_init(ms, s);
> @@ -2585,6 +2705,12 @@ static int kvm_init(MachineState *ms)
>          s->memory_listener.listener.eventfd_add = kvm_mem_ioeventfd_add;
>          s->memory_listener.listener.eventfd_del = kvm_mem_ioeventfd_del;
>      }
> +
> +    if (kvm_ioregionfds_allowed) {
> +        s->memory_listener.listener.ioregionfd_add = kvm_mem_ioregionfd_add;
> +        s->memory_listener.listener.ioregionfd_del = kvm_mem_ioregionfd_del;
> +    }
> +
>      s->memory_listener.listener.coalesced_io_add = kvm_coalesce_mmio_region;
>      s->memory_listener.listener.coalesced_io_del = kvm_uncoalesce_mmio_region;
>  
> @@ -2594,6 +2720,12 @@ static int kvm_init(MachineState *ms)
>          memory_listener_register(&kvm_io_listener,
>                                   &address_space_io);
>      }
> +
> +    if (kvm_ioregionfds_allowed) {
> +        memory_listener_register(&kvm_ioregion_listener,
> +                                 &address_space_io);
> +    }
> +
>      memory_listener_register(&kvm_coalesced_pio_listener,
>                               &address_space_io);
>  

Why are we using a single memory listener for address_space_memory but
individual listeners for address_space_io?

IOW, wey don't we have &s->io_listener ?

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-02-16 12:24 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-08  7:22 [RFC 0/8] ioregionfd introduction Elena Ufimtseva
2022-02-08  7:22 ` [RFC 1/8] ioregionfd: introduce a syscall and memory API Elena Ufimtseva
2022-02-16 12:19   ` David Hildenbrand
2022-02-08  7:22 ` [RFC 2/8] multiprocess: place RemoteObject definition in a header file Elena Ufimtseva
2022-02-08  7:22 ` [RFC 3/8] ioregionfd: introduce memory API functions Elena Ufimtseva
2022-02-14 14:32   ` Stefan Hajnoczi
2022-02-08  7:22 ` [RFC 4/8] ioregionfd: Introduce IORegionDFObject type Elena Ufimtseva
2022-02-11 13:46   ` Markus Armbruster
2022-02-15 18:19     ` Elena
2022-02-14 14:37   ` Stefan Hajnoczi
2022-02-15 18:18     ` Elena
2022-02-16 11:08       ` Stefan Hajnoczi
2022-02-08  7:22 ` [RFC 5/8] multiprocess: prepare ioregionfds for remote device Elena Ufimtseva
2022-02-08  7:22 ` [RFC 6/8] multiprocess: add MPQEMU_CMD_BAR_INFO Elena Ufimtseva
2022-02-08  7:22 ` [RFC 7/8] multiprocess: add ioregionfd memory region in proxy Elena Ufimtseva
2022-02-08  7:22 ` [RFC 8/8] multiprocess: handle ioregionfd commands Elena Ufimtseva
2022-02-09 10:33 ` [RFC 0/8] ioregionfd introduction Stefan Hajnoczi
2022-02-14 14:52 ` Stefan Hajnoczi
2022-02-15 18:16   ` Elena
2022-02-16 11:20     ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.