All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC PATCH qemu 0/3] vfio-pci, spapr: Allow in-kernel acceleration
@ 2017-03-28  9:05 Alexey Kardashevskiy
  2017-03-28  9:05 ` [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR Alexey Kardashevskiy
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2017-03-28  9:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, David Gibson, Alex Williamson

This is my current working tree to support kernel's
"powerpc/kvm/vfio: Enable in-kernel acceleration".

Please comment. Thanks.



Alexey Kardashevskiy (3):
  memory: Add get_fd() hook for IOMMU MR
  vfio-pci: Reorder group-to-container attaching
  vfio: Enable in-kernel acceleration via VFIO KVM device

 include/exec/memory.h         |  2 ++
 include/hw/vfio/vfio-common.h |  1 +
 target/ppc/kvm_ppc.h          |  6 ++++++
 hw/ppc/spapr_iommu.c          | 12 ++++++++++++
 hw/vfio/common.c              | 34 ++++++++++++++++++++++++----------
 hw/vfio/spapr.c               | 26 ++++++++++++++++++++++++++
 target/ppc/kvm.c              |  7 ++++++-
 hw/vfio/trace-events          |  1 +
 8 files changed, 78 insertions(+), 11 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR
  2017-03-28  9:05 [Qemu-devel] [RFC PATCH qemu 0/3] vfio-pci, spapr: Allow in-kernel acceleration Alexey Kardashevskiy
@ 2017-03-28  9:05 ` Alexey Kardashevskiy
  2017-03-28 17:48   ` Alex Williamson
  2017-03-28  9:05 ` [Qemu-devel] [RFC PATCH qemu 2/3] vfio-pci: Reorder group-to-container attaching Alexey Kardashevskiy
  2017-03-28  9:05 ` [Qemu-devel] [RFC PATCH qemu 3/3] vfio: Enable in-kernel acceleration via VFIO KVM device Alexey Kardashevskiy
  2 siblings, 1 reply; 16+ messages in thread
From: Alexey Kardashevskiy @ 2017-03-28  9:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, David Gibson, Alex Williamson

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/exec/memory.h | 2 ++
 hw/ppc/spapr_iommu.c  | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index e39256ad03..925c10b35b 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -174,6 +174,8 @@ struct MemoryRegionIOMMUOps {
     void (*notify_flag_changed)(MemoryRegion *iommu,
                                 IOMMUNotifierFlag old_flags,
                                 IOMMUNotifierFlag new_flags);
+    /* Returns a kernel fd for IOMMU */
+    int (*get_fd)(MemoryRegion *iommu);
 };
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 9e30e148d6..b61c8f053e 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -170,6 +170,13 @@ static void spapr_tce_notify_flag_changed(MemoryRegion *iommu,
     }
 }
 
+static int spapr_tce_get_fd(MemoryRegion *iommu)
+{
+    sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
+
+    return tcet->fd;
+}
+
 static int spapr_tce_table_post_load(void *opaque, int version_id)
 {
     sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
@@ -251,6 +258,7 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
     .translate = spapr_tce_translate_iommu,
     .get_min_page_size = spapr_tce_get_min_page_size,
     .notify_flag_changed = spapr_tce_notify_flag_changed,
+    .get_fd = spapr_tce_get_fd,
 };
 
 static int spapr_tce_table_realize(DeviceState *dev)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [RFC PATCH qemu 2/3] vfio-pci: Reorder group-to-container attaching
  2017-03-28  9:05 [Qemu-devel] [RFC PATCH qemu 0/3] vfio-pci, spapr: Allow in-kernel acceleration Alexey Kardashevskiy
  2017-03-28  9:05 ` [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR Alexey Kardashevskiy
@ 2017-03-28  9:05 ` Alexey Kardashevskiy
  2017-03-28 17:48   ` Alex Williamson
  2017-03-28  9:05 ` [Qemu-devel] [RFC PATCH qemu 3/3] vfio: Enable in-kernel acceleration via VFIO KVM device Alexey Kardashevskiy
  2 siblings, 1 reply; 16+ messages in thread
From: Alexey Kardashevskiy @ 2017-03-28  9:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, David Gibson, Alex Williamson

At the moment VFIO PCI device initialization works as follows:
vfio_realize
	vfio_get_group
		vfio_connect_container
			register memory listeners (1)
			update QEMU groups lists
		vfio_kvm_device_add_group

Then (example for pseries) the machine reset hook triggers region_add()
for all regions where listeners from (1) are listening:

ppc_spapr_reset
	spapr_phb_reset
		spapr_tce_table_enable
			memory_region_add_subregion
				vfio_listener_region_add
					vfio_spapr_create_window

This scheme works fine until we need to handle VFIO PCI device hotplug
_and_ we want to enable in-kernel acceleration on, i.e. after PCI hotplug
we need a place to call
ioctl(vfio_kvm_device_fd, KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE).
Since the ioctl needs a LIOBN fd (from sPAPRTCETable) and a IOMMU group fd
(from VFIOGroup), vfio_listener_region_add() seems to be the only place
for this ioctl().

However this only works during boot time because the machine reset
happens strictly after all devices are finalized. When hotplug happens,
vfio_listener_region_add() is called when a memory listener is registered
but when this happens:
1. new group is not added to the container->group_list yet;
2. VFIO KVM device is unaware of the new IOMMU group.

This moves bits around to have all necessary VFIO infrastructure
in place for both initial startup and hotplug cases.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/vfio/common.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f3ba9b9007..c75c7594d5 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1086,6 +1086,16 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         goto free_container_exit;
     }
 
+    container->initialized = true;
+
+    vfio_kvm_device_add_group(group);
+
+    QLIST_INIT(&container->group_list);
+    QLIST_INSERT_HEAD(&space->containers, container, next);
+
+    group->container = container;
+    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+
     container->listener = vfio_memory_listener;
 
     memory_listener_register(&container->listener, container->space->as);
@@ -1097,16 +1107,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         goto listener_release_exit;
     }
 
-    container->initialized = true;
-
-    QLIST_INIT(&container->group_list);
-    QLIST_INSERT_HEAD(&space->containers, container, next);
-
-    group->container = container;
-    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
-
     return 0;
 listener_release_exit:
+    vfio_kvm_device_del_group(group);
     vfio_listener_release(container);
 
 free_container_exit:
@@ -1210,8 +1213,6 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
 
     QLIST_INSERT_HEAD(&vfio_group_list, group, next);
 
-    vfio_kvm_device_add_group(group);
-
     return group;
 
 close_fd_exit:
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [RFC PATCH qemu 3/3] vfio: Enable in-kernel acceleration via VFIO KVM device
  2017-03-28  9:05 [Qemu-devel] [RFC PATCH qemu 0/3] vfio-pci, spapr: Allow in-kernel acceleration Alexey Kardashevskiy
  2017-03-28  9:05 ` [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR Alexey Kardashevskiy
  2017-03-28  9:05 ` [Qemu-devel] [RFC PATCH qemu 2/3] vfio-pci: Reorder group-to-container attaching Alexey Kardashevskiy
@ 2017-03-28  9:05 ` Alexey Kardashevskiy
  2017-03-28 17:48   ` Alex Williamson
  2017-03-29  3:53   ` David Gibson
  2 siblings, 2 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2017-03-28  9:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, David Gibson, Alex Williamson

This enables in-kernel acceleration of TCE update requests via
VFIO KVM device.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/hw/vfio/vfio-common.h |  1 +
 target/ppc/kvm_ppc.h          |  6 ++++++
 hw/ppc/spapr_iommu.c          |  4 ++++
 hw/vfio/common.c              | 13 +++++++++++++
 hw/vfio/spapr.c               | 26 ++++++++++++++++++++++++++
 target/ppc/kvm.c              |  7 ++++++-
 hw/vfio/trace-events          |  1 +
 7 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index c582de18c9..ee8c96cc4a 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -175,6 +175,7 @@ extern const MemoryListener vfio_prereg_listener;
 int vfio_spapr_create_window(VFIOContainer *container,
                              MemoryRegionSection *section,
                              hwaddr *pgsize);
+int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd, int tablefd);
 int vfio_spapr_remove_window(VFIOContainer *container,
                              hwaddr offset_within_address_space);
 
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index f48243d13f..ce7327a4e0 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -46,6 +46,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t page_shift,
 int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
 int kvmppc_reset_htab(int shift_hint);
 uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
+bool kvmppc_has_cap_spapr_vfio(void);
 #endif /* !CONFIG_USER_ONLY */
 bool kvmppc_has_cap_epr(void);
 int kvmppc_define_rtas_kernel_token(uint32_t token, const char *function);
@@ -216,6 +217,11 @@ static inline bool kvmppc_is_mem_backend_page_size_ok(char *obj_path)
     return true;
 }
 
+static inline bool kvmppc_has_cap_spapr_vfio(void)
+{
+    return false;
+}
+
 #endif /* !CONFIG_USER_ONLY */
 
 static inline bool kvmppc_has_cap_epr(void)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index b61c8f053e..fc23d81645 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -293,6 +293,10 @@ void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio)
 
     tcet->need_vfio = need_vfio;
 
+    if (!need_vfio || (tcet->fd != -1 && kvmppc_has_cap_spapr_vfio())) {
+        return;
+    }
+
     oldtable = tcet->table;
 
     tcet->table = spapr_tce_alloc_table(tcet->liobn,
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index c75c7594d5..9aaf861904 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -440,6 +440,19 @@ static void vfio_listener_region_add(MemoryListener *listener,
             goto fail;
         }
 
+#ifdef CONFIG_KVM
+        if (kvm_enabled() && section->mr->iommu_ops->get_fd) {
+            VFIOGroup *group;
+            int tablefd =  section->mr->iommu_ops->get_fd(section->mr);
+
+            if (tablefd != -1) {
+                QLIST_FOREACH(group, &container->group_list, container_next) {
+                    vfio_spapr_notify_kvm(vfio_kvm_device_fd,
+                                          group->fd, tablefd);
+                }
+            }
+        }
+#endif
         vfio_host_win_add(container, section->offset_within_address_space,
                           section->offset_within_address_space +
                           int128_get64(section->size) - 1, pgsize);
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 4409bcc0d7..dffef3bd5f 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -17,6 +17,9 @@
 #include "hw/hw.h"
 #include "qemu/error-report.h"
 #include "trace.h"
+#ifdef CONFIG_KVM
+#include "linux/kvm.h"
+#endif
 
 static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section)
 {
@@ -187,6 +190,29 @@ int vfio_spapr_create_window(VFIOContainer *container,
     return 0;
 }
 
+int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd, int tablefd)
+{
+#ifdef CONFIG_KVM
+    struct kvm_vfio_spapr_tce param = {
+        .groupfd = groupfd,
+        .tablefd = tablefd
+    };
+    struct kvm_device_attr attr = {
+        .group = KVM_DEV_VFIO_GROUP,
+        .attr = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE,
+        .addr = (uint64_t)(unsigned long)&param,
+    };
+
+    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+        error_report("vfio: failed to setup fd %d for a group with fd %d: %s",
+                     param.tablefd, param.groupfd, strerror(errno));
+        return -errno;
+    }
+    trace_vfio_spapr_notify_kvm(groupfd, tablefd);
+#endif
+    return 0;
+}
+
 int vfio_spapr_remove_window(VFIOContainer *container,
                              hwaddr offset_within_address_space)
 {
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 560ce655c7..bca5fe7329 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -131,7 +131,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
     cap_spapr_tce_64 = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_64);
     cap_spapr_multitce = kvm_check_extension(s, KVM_CAP_SPAPR_MULTITCE);
-    cap_spapr_vfio = false;
+    cap_spapr_vfio = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_VFIO);
     cap_one_reg = kvm_check_extension(s, KVM_CAP_ONE_REG);
     cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR);
     cap_epr = kvm_check_extension(s, KVM_CAP_PPC_EPR);
@@ -2416,6 +2416,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
     return cap_mmu_hash_v3;
 }
 
+bool kvmppc_has_cap_spapr_vfio(void)
+{
+    return cap_spapr_vfio;
+}
+
 static PowerPCCPUClass *ppc_cpu_get_family_class(PowerPCCPUClass *pcc)
 {
     ObjectClass *oc = OBJECT_CLASS(pcc);
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 2561c6d31a..084a92f7c2 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -123,3 +123,4 @@ vfio_prereg_register(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"P
 vfio_prereg_unregister(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"PRIx64" ret=%d"
 vfio_spapr_create_window(int ps, uint64_t ws, uint64_t off) "pageshift=0x%x winsize=0x%"PRIx64" offset=0x%"PRIx64
 vfio_spapr_remove_window(uint64_t off) "offset=%"PRIx64
+vfio_spapr_notify_kvm(int groupfd, int tablefd) "Attached groupfd %d to liobn fd %d"
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 3/3] vfio: Enable in-kernel acceleration via VFIO KVM device
  2017-03-28  9:05 ` [Qemu-devel] [RFC PATCH qemu 3/3] vfio: Enable in-kernel acceleration via VFIO KVM device Alexey Kardashevskiy
@ 2017-03-28 17:48   ` Alex Williamson
  2017-03-29  4:27     ` Alexey Kardashevskiy
  2017-03-29  3:53   ` David Gibson
  1 sibling, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2017-03-28 17:48 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel, qemu-ppc, David Gibson

On Tue, 28 Mar 2017 20:05:30 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> This enables in-kernel acceleration of TCE update requests via
> VFIO KVM device.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  include/hw/vfio/vfio-common.h |  1 +
>  target/ppc/kvm_ppc.h          |  6 ++++++
>  hw/ppc/spapr_iommu.c          |  4 ++++
>  hw/vfio/common.c              | 13 +++++++++++++
>  hw/vfio/spapr.c               | 26 ++++++++++++++++++++++++++
>  target/ppc/kvm.c              |  7 ++++++-
>  hw/vfio/trace-events          |  1 +
>  7 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index c582de18c9..ee8c96cc4a 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h

Two patches intermixed here again it seems.  I'll refer to them as "A"
and "B".  Seems easy to split at the file level.

Patch "B"

> @@ -175,6 +175,7 @@ extern const MemoryListener vfio_prereg_listener;
>  int vfio_spapr_create_window(VFIOContainer *container,
>                               MemoryRegionSection *section,
>                               hwaddr *pgsize);
> +int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd, int tablefd);
>  int vfio_spapr_remove_window(VFIOContainer *container,
>                               hwaddr offset_within_address_space);
>  
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index f48243d13f..ce7327a4e0 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h

Patch "A"

> @@ -46,6 +46,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t page_shift,
>  int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
>  int kvmppc_reset_htab(int shift_hint);
>  uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
> +bool kvmppc_has_cap_spapr_vfio(void);
>  #endif /* !CONFIG_USER_ONLY */
>  bool kvmppc_has_cap_epr(void);
>  int kvmppc_define_rtas_kernel_token(uint32_t token, const char *function);
> @@ -216,6 +217,11 @@ static inline bool kvmppc_is_mem_backend_page_size_ok(char *obj_path)
>      return true;
>  }
>  
> +static inline bool kvmppc_has_cap_spapr_vfio(void)
> +{
> +    return false;
> +}
> +
>  #endif /* !CONFIG_USER_ONLY */
>  
>  static inline bool kvmppc_has_cap_epr(void)
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index b61c8f053e..fc23d81645 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c

Patch "A"

> @@ -293,6 +293,10 @@ void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio)
>  
>      tcet->need_vfio = need_vfio;
>  
> +    if (!need_vfio || (tcet->fd != -1 && kvmppc_has_cap_spapr_vfio())) {
> +        return;
> +    }
> +
>      oldtable = tcet->table;
>  
>      tcet->table = spapr_tce_alloc_table(tcet->liobn,
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index c75c7594d5..9aaf861904 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c

Patch "B"

> @@ -440,6 +440,19 @@ static void vfio_listener_region_add(MemoryListener *listener,
>              goto fail;
>          }
>  
> +#ifdef CONFIG_KVM

I don't think we need this just for kvm_enabled(), do we?

> +        if (kvm_enabled() && section->mr->iommu_ops->get_fd) {
> +            VFIOGroup *group;
> +            int tablefd =  section->mr->iommu_ops->get_fd(section->mr);

This would change to

    tablefd=memory_region_iommu_get_fd(SPAPR_IOMMU_TABLE_FD,section->mr);

> +
> +            if (tablefd != -1) {
> +                QLIST_FOREACH(group, &container->group_list, container_next) {
> +                    vfio_spapr_notify_kvm(vfio_kvm_device_fd,
> +                                          group->fd, tablefd);
> +                }
> +            }
> +        }
> +#endif
>          vfio_host_win_add(container, section->offset_within_address_space,
>                            section->offset_within_address_space +
>                            int128_get64(section->size) - 1, pgsize);
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index 4409bcc0d7..dffef3bd5f 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c

Patch "B"

> @@ -17,6 +17,9 @@
>  #include "hw/hw.h"
>  #include "qemu/error-report.h"
>  #include "trace.h"
> +#ifdef CONFIG_KVM
> +#include "linux/kvm.h"
> +#endif
>  
>  static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section)
>  {
> @@ -187,6 +190,29 @@ int vfio_spapr_create_window(VFIOContainer *container,
>      return 0;
>  }
>  
> +int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd, int tablefd)
> +{
> +#ifdef CONFIG_KVM
> +    struct kvm_vfio_spapr_tce param = {
> +        .groupfd = groupfd,
> +        .tablefd = tablefd
> +    };
> +    struct kvm_device_attr attr = {
> +        .group = KVM_DEV_VFIO_GROUP,
> +        .attr = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE,
> +        .addr = (uint64_t)(unsigned long)&param,
> +    };
> +
> +    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
> +        error_report("vfio: failed to setup fd %d for a group with fd %d: %s",
> +                     param.tablefd, param.groupfd, strerror(errno));
> +        return -errno;
> +    }
> +    trace_vfio_spapr_notify_kvm(groupfd, tablefd);
> +#endif
> +    return 0;
> +}
> +
>  int vfio_spapr_remove_window(VFIOContainer *container,
>                               hwaddr offset_within_address_space)
>  {
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 560ce655c7..bca5fe7329 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c

Patch "A"

> @@ -131,7 +131,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>      cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
>      cap_spapr_tce_64 = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_64);
>      cap_spapr_multitce = kvm_check_extension(s, KVM_CAP_SPAPR_MULTITCE);
> -    cap_spapr_vfio = false;
> +    cap_spapr_vfio = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_VFIO);
>      cap_one_reg = kvm_check_extension(s, KVM_CAP_ONE_REG);
>      cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR);
>      cap_epr = kvm_check_extension(s, KVM_CAP_PPC_EPR);
> @@ -2416,6 +2416,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
>      return cap_mmu_hash_v3;
>  }
>  
> +bool kvmppc_has_cap_spapr_vfio(void)
> +{
> +    return cap_spapr_vfio;
> +}
> +
>  static PowerPCCPUClass *ppc_cpu_get_family_class(PowerPCCPUClass *pcc)
>  {
>      ObjectClass *oc = OBJECT_CLASS(pcc);
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 2561c6d31a..084a92f7c2 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events

Patch "B"

> @@ -123,3 +123,4 @@ vfio_prereg_register(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"P
>  vfio_prereg_unregister(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"PRIx64" ret=%d"
>  vfio_spapr_create_window(int ps, uint64_t ws, uint64_t off) "pageshift=0x%x winsize=0x%"PRIx64" offset=0x%"PRIx64
>  vfio_spapr_remove_window(uint64_t off) "offset=%"PRIx64
> +vfio_spapr_notify_kvm(int groupfd, int tablefd) "Attached groupfd %d to liobn fd %d"

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR
  2017-03-28  9:05 ` [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR Alexey Kardashevskiy
@ 2017-03-28 17:48   ` Alex Williamson
  2017-03-29  1:41     ` Alexey Kardashevskiy
  2017-03-29  3:35     ` David Gibson
  0 siblings, 2 replies; 16+ messages in thread
From: Alex Williamson @ 2017-03-28 17:48 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel, qemu-ppc, David Gibson

On Tue, 28 Mar 2017 20:05:28 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  include/exec/memory.h | 2 ++
>  hw/ppc/spapr_iommu.c  | 8 ++++++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index e39256ad03..925c10b35b 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -174,6 +174,8 @@ struct MemoryRegionIOMMUOps {
>      void (*notify_flag_changed)(MemoryRegion *iommu,
>                                  IOMMUNotifierFlag old_flags,
>                                  IOMMUNotifierFlag new_flags);
> +    /* Returns a kernel fd for IOMMU */
> +    int (*get_fd)(MemoryRegion *iommu);

What if we used this as a prototype:

int (*get_fd)(IOMMUFdType type, MemoryRegion *iommu);

And then we defined:

typedef enum {
    SPAPR_IOMMU_TABLE_FD = 0,
} IOMMUFdType;

Such that you're actually asking the IOMMUOps for a specific type of FD
and it either has it or not, so the caller doesn't need to assume what
it is they get back.

Furthermore, add:

int memory_region_iommu_get_fd(IOMMUFdType type, MemoryRegion *mr)
{
    assert(memory_region_is_iommu(mr));

    if (mr->iommu_ops && mr->iommu_ops->get_fd) {
        return mr->iommu_ops->get_fd(type, mr);
    }

    return -1;
}

>  };
>

This should be two patches, patch 1 above, patch 2 below
  
>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 9e30e148d6..b61c8f053e 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -170,6 +170,13 @@ static void spapr_tce_notify_flag_changed(MemoryRegion *iommu,
>      }
>  }
>  
> +static int spapr_tce_get_fd(MemoryRegion *iommu)
> +{
> +    sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
> +
> +    return tcet->fd;


This would then be:

    return type == SPAPR_IOMMU_TABLE_FD ? tcet->fd : -1;

> +}
> +
>  static int spapr_tce_table_post_load(void *opaque, int version_id)
>  {
>      sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
> @@ -251,6 +258,7 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
>      .translate = spapr_tce_translate_iommu,
>      .get_min_page_size = spapr_tce_get_min_page_size,
>      .notify_flag_changed = spapr_tce_notify_flag_changed,
> +    .get_fd = spapr_tce_get_fd,
>  };
>  
>  static int spapr_tce_table_realize(DeviceState *dev)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 2/3] vfio-pci: Reorder group-to-container attaching
  2017-03-28  9:05 ` [Qemu-devel] [RFC PATCH qemu 2/3] vfio-pci: Reorder group-to-container attaching Alexey Kardashevskiy
@ 2017-03-28 17:48   ` Alex Williamson
  2017-03-29  3:42     ` David Gibson
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2017-03-28 17:48 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel, qemu-ppc, David Gibson

On Tue, 28 Mar 2017 20:05:29 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> At the moment VFIO PCI device initialization works as follows:
> vfio_realize
> 	vfio_get_group
> 		vfio_connect_container
> 			register memory listeners (1)
> 			update QEMU groups lists
> 		vfio_kvm_device_add_group
> 
> Then (example for pseries) the machine reset hook triggers region_add()
> for all regions where listeners from (1) are listening:
> 
> ppc_spapr_reset
> 	spapr_phb_reset
> 		spapr_tce_table_enable
> 			memory_region_add_subregion
> 				vfio_listener_region_add
> 					vfio_spapr_create_window
> 
> This scheme works fine until we need to handle VFIO PCI device hotplug
> _and_ we want to enable in-kernel acceleration on, i.e. after PCI hotplug
> we need a place to call
> ioctl(vfio_kvm_device_fd, KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE).
> Since the ioctl needs a LIOBN fd (from sPAPRTCETable) and a IOMMU group fd
> (from VFIOGroup), vfio_listener_region_add() seems to be the only place
> for this ioctl().
> 
> However this only works during boot time because the machine reset
> happens strictly after all devices are finalized. When hotplug happens,
> vfio_listener_region_add() is called when a memory listener is registered
> but when this happens:
> 1. new group is not added to the container->group_list yet;
> 2. VFIO KVM device is unaware of the new IOMMU group.
> 
> This moves bits around to have all necessary VFIO infrastructure
> in place for both initial startup and hotplug cases.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/vfio/common.c | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f3ba9b9007..c75c7594d5 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1086,6 +1086,16 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>          goto free_container_exit;
>      }
>  
> +    container->initialized = true;

This ignores the purpose of this variable, which is to make runtime
mapping faults fatal, but device realize time faults simply cause the
device to be rejected.  This cannot be moved above
memory_listener_register().

> +
> +    vfio_kvm_device_add_group(group);
> +
> +    QLIST_INIT(&container->group_list);
> +    QLIST_INSERT_HEAD(&space->containers, container, next);
> +
> +    group->container = container;
> +    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> +
>      container->listener = vfio_memory_listener;
>  
>      memory_listener_register(&container->listener, container->space->as);
> @@ -1097,16 +1107,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>          goto listener_release_exit;
>      }
>  
> -    container->initialized = true;
> -
> -    QLIST_INIT(&container->group_list);
> -    QLIST_INSERT_HEAD(&space->containers, container, next);
> -
> -    group->container = container;
> -    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> -
>      return 0;
>  listener_release_exit:
> +    vfio_kvm_device_del_group(group);


Where's the QLIST cleanup?


>      vfio_listener_release(container);
>  
>  free_container_exit:
> @@ -1210,8 +1213,6 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
>  
>      QLIST_INSERT_HEAD(&vfio_group_list, group, next);
>  
> -    vfio_kvm_device_add_group(group);
> -
>      return group;
>  
>  close_fd_exit:

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR
  2017-03-28 17:48   ` Alex Williamson
@ 2017-03-29  1:41     ` Alexey Kardashevskiy
  2017-03-29  3:04       ` Alex Williamson
  2017-03-29  3:35     ` David Gibson
  1 sibling, 1 reply; 16+ messages in thread
From: Alexey Kardashevskiy @ 2017-03-29  1:41 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, qemu-ppc, David Gibson, Paolo Bonzini

On 29/03/17 04:48, Alex Williamson wrote:
> On Tue, 28 Mar 2017 20:05:28 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  include/exec/memory.h | 2 ++
>>  hw/ppc/spapr_iommu.c  | 8 ++++++++
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index e39256ad03..925c10b35b 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -174,6 +174,8 @@ struct MemoryRegionIOMMUOps {
>>      void (*notify_flag_changed)(MemoryRegion *iommu,
>>                                  IOMMUNotifierFlag old_flags,
>>                                  IOMMUNotifierFlag new_flags);
>> +    /* Returns a kernel fd for IOMMU */
>> +    int (*get_fd)(MemoryRegion *iommu);
> 
> What if we used this as a prototype:
> 
> int (*get_fd)(IOMMUFdType type, MemoryRegion *iommu);
> 
> And then we defined:
> 
> typedef enum {
>     SPAPR_IOMMU_TABLE_FD = 0,
> } IOMMUFdType;


Where do I put this enum definition? include/exec/memory.h? It does not
have any mention of any platform yet...

I could pass char* instead of IOMMUFdType (and pass there something like
TYPE_SPAPR_TCE_TABLE), would it be any better?


> 
> Such that you're actually asking the IOMMUOps for a specific type of FD
> and it either has it or not, so the caller doesn't need to assume what
> it is they get back.
> 
> Furthermore, add:
> 
> int memory_region_iommu_get_fd(IOMMUFdType type, MemoryRegion *mr)
> {
>     assert(memory_region_is_iommu(mr));
> 
>     if (mr->iommu_ops && mr->iommu_ops->get_fd) {
>         return mr->iommu_ops->get_fd(type, mr);
>     }
> 
>     return -1;
> }
> 
>>  };
>>
> 
> This should be two patches, patch 1 above, patch 2 below
>   
>>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index 9e30e148d6..b61c8f053e 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
>> @@ -170,6 +170,13 @@ static void spapr_tce_notify_flag_changed(MemoryRegion *iommu,
>>      }
>>  }
>>  
>> +static int spapr_tce_get_fd(MemoryRegion *iommu)
>> +{
>> +    sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
>> +
>> +    return tcet->fd;
> 
> 
> This would then be:
> 
>     return type == SPAPR_IOMMU_TABLE_FD ? tcet->fd : -1;
> 
>> +}
>> +
>>  static int spapr_tce_table_post_load(void *opaque, int version_id)
>>  {
>>      sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
>> @@ -251,6 +258,7 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
>>      .translate = spapr_tce_translate_iommu,
>>      .get_min_page_size = spapr_tce_get_min_page_size,
>>      .notify_flag_changed = spapr_tce_notify_flag_changed,
>> +    .get_fd = spapr_tce_get_fd,
>>  };
>>  
>>  static int spapr_tce_table_realize(DeviceState *dev)
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR
  2017-03-29  1:41     ` Alexey Kardashevskiy
@ 2017-03-29  3:04       ` Alex Williamson
  2017-03-29  8:04         ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2017-03-29  3:04 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel, qemu-ppc, David Gibson, Paolo Bonzini

On Wed, 29 Mar 2017 12:41:01 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 29/03/17 04:48, Alex Williamson wrote:
> > On Tue, 28 Mar 2017 20:05:28 +1100
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >   
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >>  include/exec/memory.h | 2 ++
> >>  hw/ppc/spapr_iommu.c  | 8 ++++++++
> >>  2 files changed, 10 insertions(+)
> >>
> >> diff --git a/include/exec/memory.h b/include/exec/memory.h
> >> index e39256ad03..925c10b35b 100644
> >> --- a/include/exec/memory.h
> >> +++ b/include/exec/memory.h
> >> @@ -174,6 +174,8 @@ struct MemoryRegionIOMMUOps {
> >>      void (*notify_flag_changed)(MemoryRegion *iommu,
> >>                                  IOMMUNotifierFlag old_flags,
> >>                                  IOMMUNotifierFlag new_flags);
> >> +    /* Returns a kernel fd for IOMMU */
> >> +    int (*get_fd)(MemoryRegion *iommu);  
> > 
> > What if we used this as a prototype:
> > 
> > int (*get_fd)(IOMMUFdType type, MemoryRegion *iommu);
> > 
> > And then we defined:
> > 
> > typedef enum {
> >     SPAPR_IOMMU_TABLE_FD = 0,
> > } IOMMUFdType;  
> 
> 
> Where do I put this enum definition? include/exec/memory.h? It does not
> have any mention of any platform yet...

I would assume memory.h, yes.  It seems like the enum is just an
abstraction, what does "get fd" mean generically to an IOMMU
MemoryRegion?  How can anyone else ever implement that callback if the
initial user is assuming that the returned fd is a specific, yet
unspecified type.  If the API is "give me an fd for this type of thing"
then the IOMMU driver can either provide it or indicate that type is not
supported.  There's really no platform knowledge at the memory API
level, it's just a type of thing that means something to the driver
providing the MemoryRegionIOMMUOps and the caller.
 
> I could pass char* instead of IOMMUFdType (and pass there something like
> TYPE_SPAPR_TCE_TABLE), would it be any better?

Gack, an enum seems so much cleaner than requiring a strcmp.  Thanks,

Alex

> > Such that you're actually asking the IOMMUOps for a specific type of FD
> > and it either has it or not, so the caller doesn't need to assume what
> > it is they get back.
> > 
> > Furthermore, add:
> > 
> > int memory_region_iommu_get_fd(IOMMUFdType type, MemoryRegion *mr)
> > {
> >     assert(memory_region_is_iommu(mr));
> > 
> >     if (mr->iommu_ops && mr->iommu_ops->get_fd) {
> >         return mr->iommu_ops->get_fd(type, mr);
> >     }
> > 
> >     return -1;
> > }
> >   
> >>  };
> >>  
> > 
> > This should be two patches, patch 1 above, patch 2 below
> >     
> >>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> >> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> >> index 9e30e148d6..b61c8f053e 100644
> >> --- a/hw/ppc/spapr_iommu.c
> >> +++ b/hw/ppc/spapr_iommu.c
> >> @@ -170,6 +170,13 @@ static void spapr_tce_notify_flag_changed(MemoryRegion *iommu,
> >>      }
> >>  }
> >>  
> >> +static int spapr_tce_get_fd(MemoryRegion *iommu)
> >> +{
> >> +    sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
> >> +
> >> +    return tcet->fd;  
> > 
> > 
> > This would then be:
> > 
> >     return type == SPAPR_IOMMU_TABLE_FD ? tcet->fd : -1;
> >   
> >> +}
> >> +
> >>  static int spapr_tce_table_post_load(void *opaque, int version_id)
> >>  {
> >>      sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
> >> @@ -251,6 +258,7 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
> >>      .translate = spapr_tce_translate_iommu,
> >>      .get_min_page_size = spapr_tce_get_min_page_size,
> >>      .notify_flag_changed = spapr_tce_notify_flag_changed,
> >> +    .get_fd = spapr_tce_get_fd,
> >>  };
> >>  
> >>  static int spapr_tce_table_realize(DeviceState *dev)  
> >   
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR
  2017-03-28 17:48   ` Alex Williamson
  2017-03-29  1:41     ` Alexey Kardashevskiy
@ 2017-03-29  3:35     ` David Gibson
  2017-03-29  5:08       ` Alexey Kardashevskiy
  1 sibling, 1 reply; 16+ messages in thread
From: David Gibson @ 2017-03-29  3:35 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Alexey Kardashevskiy, qemu-devel, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 3143 bytes --]

On Tue, Mar 28, 2017 at 11:48:29AM -0600, Alex Williamson wrote:
> On Tue, 28 Mar 2017 20:05:28 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> >  include/exec/memory.h | 2 ++
> >  hw/ppc/spapr_iommu.c  | 8 ++++++++
> >  2 files changed, 10 insertions(+)
> > 
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index e39256ad03..925c10b35b 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -174,6 +174,8 @@ struct MemoryRegionIOMMUOps {
> >      void (*notify_flag_changed)(MemoryRegion *iommu,
> >                                  IOMMUNotifierFlag old_flags,
> >                                  IOMMUNotifierFlag new_flags);
> > +    /* Returns a kernel fd for IOMMU */
> > +    int (*get_fd)(MemoryRegion *iommu);
> 
> What if we used this as a prototype:
> 
> int (*get_fd)(IOMMUFdType type, MemoryRegion *iommu);
> 
> And then we defined:
> 
> typedef enum {
>     SPAPR_IOMMU_TABLE_FD = 0,
> } IOMMUFdType;

Are we expecting any new types of fd?  Maybe it would be simpler just
to name this spapr_tce_fd() or something more specific, and only
generalize if we really need it for another fd type.

> 
> Such that you're actually asking the IOMMUOps for a specific type of FD
> and it either has it or not, so the caller doesn't need to assume what
> it is they get back.
> 
> Furthermore, add:
> 
> int memory_region_iommu_get_fd(IOMMUFdType type, MemoryRegion *mr)
> {
>     assert(memory_region_is_iommu(mr));
> 
>     if (mr->iommu_ops && mr->iommu_ops->get_fd) {
>         return mr->iommu_ops->get_fd(type, mr);
>     }
> 
>     return -1;
> }
> 
> >  };
> >
> 
> This should be two patches, patch 1 above, patch 2 below
>   
> >  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> > diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> > index 9e30e148d6..b61c8f053e 100644
> > --- a/hw/ppc/spapr_iommu.c
> > +++ b/hw/ppc/spapr_iommu.c
> > @@ -170,6 +170,13 @@ static void spapr_tce_notify_flag_changed(MemoryRegion *iommu,
> >      }
> >  }
> >  
> > +static int spapr_tce_get_fd(MemoryRegion *iommu)
> > +{
> > +    sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
> > +
> > +    return tcet->fd;
> 
> 
> This would then be:
> 
>     return type == SPAPR_IOMMU_TABLE_FD ? tcet->fd : -1;
> 
> > +}
> > +
> >  static int spapr_tce_table_post_load(void *opaque, int version_id)
> >  {
> >      sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
> > @@ -251,6 +258,7 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
> >      .translate = spapr_tce_translate_iommu,
> >      .get_min_page_size = spapr_tce_get_min_page_size,
> >      .notify_flag_changed = spapr_tce_notify_flag_changed,
> > +    .get_fd = spapr_tce_get_fd,
> >  };
> >  
> >  static int spapr_tce_table_realize(DeviceState *dev)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 2/3] vfio-pci: Reorder group-to-container attaching
  2017-03-28 17:48   ` Alex Williamson
@ 2017-03-29  3:42     ` David Gibson
  0 siblings, 0 replies; 16+ messages in thread
From: David Gibson @ 2017-03-29  3:42 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Alexey Kardashevskiy, qemu-devel, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 4357 bytes --]

On Tue, Mar 28, 2017 at 11:48:36AM -0600, Alex Williamson wrote:
> On Tue, 28 Mar 2017 20:05:29 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
> > At the moment VFIO PCI device initialization works as follows:
> > vfio_realize
> > 	vfio_get_group
> > 		vfio_connect_container
> > 			register memory listeners (1)
> > 			update QEMU groups lists
> > 		vfio_kvm_device_add_group
> > 
> > Then (example for pseries) the machine reset hook triggers region_add()
> > for all regions where listeners from (1) are listening:
> > 
> > ppc_spapr_reset
> > 	spapr_phb_reset
> > 		spapr_tce_table_enable
> > 			memory_region_add_subregion
> > 				vfio_listener_region_add
> > 					vfio_spapr_create_window
> > 
> > This scheme works fine until we need to handle VFIO PCI device hotplug
> > _and_ we want to enable in-kernel acceleration on, i.e. after PCI hotplug
> > we need a place to call
> > ioctl(vfio_kvm_device_fd, KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE).
> > Since the ioctl needs a LIOBN fd (from sPAPRTCETable) and a IOMMU group fd
> > (from VFIOGroup), vfio_listener_region_add() seems to be the only place
> > for this ioctl().
> > 
> > However this only works during boot time because the machine reset
> > happens strictly after all devices are finalized. When hotplug happens,
> > vfio_listener_region_add() is called when a memory listener is registered
> > but when this happens:
> > 1. new group is not added to the container->group_list yet;
> > 2. VFIO KVM device is unaware of the new IOMMU group.
> > 
> > This moves bits around to have all necessary VFIO infrastructure
> > in place for both initial startup and hotplug cases.
> > 
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> >  hw/vfio/common.c | 21 +++++++++++----------
> >  1 file changed, 11 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index f3ba9b9007..c75c7594d5 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -1086,6 +1086,16 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> >          goto free_container_exit;
> >      }
> >  
> > +    container->initialized = true;
> 
> This ignores the purpose of this variable, which is to make runtime
> mapping faults fatal, but device realize time faults simply cause the
> device to be rejected.  This cannot be moved above
> memory_listener_register().

Apart from that, the rest of the code motion looks ok, though.  Even
if it weren't for the hotplug case, have the container/group
more-or-less initialized before registering the listener makes more
logical sense to me.

> 
> > +
> > +    vfio_kvm_device_add_group(group);
> > +
> > +    QLIST_INIT(&container->group_list);
> > +    QLIST_INSERT_HEAD(&space->containers, container, next);
> > +
> > +    group->container = container;
> > +    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> > +
> >      container->listener = vfio_memory_listener;
> >  
> >      memory_listener_register(&container->listener, container->space->as);
> > @@ -1097,16 +1107,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> >          goto listener_release_exit;
> >      }
> >  
> > -    container->initialized = true;
> > -
> > -    QLIST_INIT(&container->group_list);
> > -    QLIST_INSERT_HEAD(&space->containers, container, next);
> > -
> > -    group->container = container;
> > -    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
> > -
> >      return 0;
> >  listener_release_exit:
> > +    vfio_kvm_device_del_group(group);
> 
> 
> Where's the QLIST cleanup?

Moving it does introduce more intermediate cleanup cases which need to
be handled, though..

> 
> 
> >      vfio_listener_release(container);
> >  
> >  free_container_exit:
> > @@ -1210,8 +1213,6 @@ VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
> >  
> >      QLIST_INSERT_HEAD(&vfio_group_list, group, next);
> >  
> > -    vfio_kvm_device_add_group(group);
> > -
> >      return group;
> >  
> >  close_fd_exit:
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 3/3] vfio: Enable in-kernel acceleration via VFIO KVM device
  2017-03-28  9:05 ` [Qemu-devel] [RFC PATCH qemu 3/3] vfio: Enable in-kernel acceleration via VFIO KVM device Alexey Kardashevskiy
  2017-03-28 17:48   ` Alex Williamson
@ 2017-03-29  3:53   ` David Gibson
  1 sibling, 0 replies; 16+ messages in thread
From: David Gibson @ 2017-03-29  3:53 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel, qemu-ppc, Alex Williamson

[-- Attachment #1: Type: text/plain, Size: 7301 bytes --]

On Tue, Mar 28, 2017 at 08:05:30PM +1100, Alexey Kardashevskiy wrote:
> This enables in-kernel acceleration of TCE update requests via
> VFIO KVM device.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  include/hw/vfio/vfio-common.h |  1 +
>  target/ppc/kvm_ppc.h          |  6 ++++++
>  hw/ppc/spapr_iommu.c          |  4 ++++
>  hw/vfio/common.c              | 13 +++++++++++++
>  hw/vfio/spapr.c               | 26 ++++++++++++++++++++++++++
>  target/ppc/kvm.c              |  7 ++++++-
>  hw/vfio/trace-events          |  1 +
>  7 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index c582de18c9..ee8c96cc4a 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -175,6 +175,7 @@ extern const MemoryListener vfio_prereg_listener;
>  int vfio_spapr_create_window(VFIOContainer *container,
>                               MemoryRegionSection *section,
>                               hwaddr *pgsize);
> +int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd, int tablefd);
>  int vfio_spapr_remove_window(VFIOContainer *container,
>                               hwaddr offset_within_address_space);
>  
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index f48243d13f..ce7327a4e0 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -46,6 +46,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t page_shift,
>  int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
>  int kvmppc_reset_htab(int shift_hint);
>  uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
> +bool kvmppc_has_cap_spapr_vfio(void);
>  #endif /* !CONFIG_USER_ONLY */
>  bool kvmppc_has_cap_epr(void);
>  int kvmppc_define_rtas_kernel_token(uint32_t token, const char *function);
> @@ -216,6 +217,11 @@ static inline bool kvmppc_is_mem_backend_page_size_ok(char *obj_path)
>      return true;
>  }
>  
> +static inline bool kvmppc_has_cap_spapr_vfio(void)
> +{
> +    return false;
> +}
> +
>  #endif /* !CONFIG_USER_ONLY */
>  
>  static inline bool kvmppc_has_cap_epr(void)
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index b61c8f053e..fc23d81645 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -293,6 +293,10 @@ void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio)
>  
>      tcet->need_vfio = need_vfio;
>  
> +    if (!need_vfio || (tcet->fd != -1 && kvmppc_has_cap_spapr_vfio())) {
> +        return;
> +    }
> +
>      oldtable = tcet->table;
>  
>      tcet->table = spapr_tce_alloc_table(tcet->liobn,
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index c75c7594d5..9aaf861904 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -440,6 +440,19 @@ static void vfio_listener_region_add(MemoryListener *listener,
>              goto fail;
>          }
>  
> +#ifdef CONFIG_KVM
> +        if (kvm_enabled() && section->mr->iommu_ops->get_fd) {
> +            VFIOGroup *group;
> +            int tablefd =  section->mr->iommu_ops->get_fd(section->mr);
> +
> +            if (tablefd != -1) {
> +                QLIST_FOREACH(group, &container->group_list, container_next) {
> +                    vfio_spapr_notify_kvm(vfio_kvm_device_fd,
> +                                          group->fd, tablefd);

This is only going to make sense if we have both PAPR-style TCE tables
on the guest and TCE-based IOMMU backend on the host.  In which case
wouldn't it make more sense to explicitly verify that, and upcast,
rather than adding a new vaguely-specified get_fd hook.

> +                }
> +            }
> +        }
> +#endif
>          vfio_host_win_add(container, section->offset_within_address_space,
>                            section->offset_within_address_space +
>                            int128_get64(section->size) - 1, pgsize);
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index 4409bcc0d7..dffef3bd5f 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c
> @@ -17,6 +17,9 @@
>  #include "hw/hw.h"
>  #include "qemu/error-report.h"
>  #include "trace.h"
> +#ifdef CONFIG_KVM
> +#include "linux/kvm.h"
> +#endif
>  
>  static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section)
>  {
> @@ -187,6 +190,29 @@ int vfio_spapr_create_window(VFIOContainer *container,
>      return 0;
>  }
>  
> +int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd, int tablefd)
> +{
> +#ifdef CONFIG_KVM
> +    struct kvm_vfio_spapr_tce param = {
> +        .groupfd = groupfd,
> +        .tablefd = tablefd
> +    };
> +    struct kvm_device_attr attr = {
> +        .group = KVM_DEV_VFIO_GROUP,
> +        .attr = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE,
> +        .addr = (uint64_t)(unsigned long)&param,
> +    };
> +
> +    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
> +        error_report("vfio: failed to setup fd %d for a group with fd %d: %s",
> +                     param.tablefd, param.groupfd, strerror(errno));
> +        return -errno;
> +    }
> +    trace_vfio_spapr_notify_kvm(groupfd, tablefd);
> +#endif
> +    return 0;
> +}
> +
>  int vfio_spapr_remove_window(VFIOContainer *container,
>                               hwaddr offset_within_address_space)
>  {
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 560ce655c7..bca5fe7329 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -131,7 +131,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>      cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
>      cap_spapr_tce_64 = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_64);
>      cap_spapr_multitce = kvm_check_extension(s, KVM_CAP_SPAPR_MULTITCE);
> -    cap_spapr_vfio = false;
> +    cap_spapr_vfio = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_VFIO);
>      cap_one_reg = kvm_check_extension(s, KVM_CAP_ONE_REG);
>      cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR);
>      cap_epr = kvm_check_extension(s, KVM_CAP_PPC_EPR);
> @@ -2416,6 +2416,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
>      return cap_mmu_hash_v3;
>  }
>  
> +bool kvmppc_has_cap_spapr_vfio(void)
> +{
> +    return cap_spapr_vfio;
> +}
> +
>  static PowerPCCPUClass *ppc_cpu_get_family_class(PowerPCCPUClass *pcc)
>  {
>      ObjectClass *oc = OBJECT_CLASS(pcc);
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 2561c6d31a..084a92f7c2 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -123,3 +123,4 @@ vfio_prereg_register(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"P
>  vfio_prereg_unregister(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"PRIx64" ret=%d"
>  vfio_spapr_create_window(int ps, uint64_t ws, uint64_t off) "pageshift=0x%x winsize=0x%"PRIx64" offset=0x%"PRIx64
>  vfio_spapr_remove_window(uint64_t off) "offset=%"PRIx64
> +vfio_spapr_notify_kvm(int groupfd, int tablefd) "Attached groupfd %d to liobn fd %d"

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 3/3] vfio: Enable in-kernel acceleration via VFIO KVM device
  2017-03-28 17:48   ` Alex Williamson
@ 2017-03-29  4:27     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2017-03-29  4:27 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, qemu-ppc, David Gibson

On 29/03/17 04:48, Alex Williamson wrote:
> On Tue, 28 Mar 2017 20:05:30 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> This enables in-kernel acceleration of TCE update requests via
>> VFIO KVM device.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  include/hw/vfio/vfio-common.h |  1 +
>>  target/ppc/kvm_ppc.h          |  6 ++++++
>>  hw/ppc/spapr_iommu.c          |  4 ++++
>>  hw/vfio/common.c              | 13 +++++++++++++
>>  hw/vfio/spapr.c               | 26 ++++++++++++++++++++++++++
>>  target/ppc/kvm.c              |  7 ++++++-
>>  hw/vfio/trace-events          |  1 +
>>  7 files changed, 57 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index c582de18c9..ee8c96cc4a 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
> 
> Two patches intermixed here again it seems.  I'll refer to them as "A"
> and "B".  Seems easy to split at the file level.
> 
> Patch "B"
> 
>> @@ -175,6 +175,7 @@ extern const MemoryListener vfio_prereg_listener;
>>  int vfio_spapr_create_window(VFIOContainer *container,
>>                               MemoryRegionSection *section,
>>                               hwaddr *pgsize);
>> +int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd, int tablefd);
>>  int vfio_spapr_remove_window(VFIOContainer *container,
>>                               hwaddr offset_within_address_space);
>>  
>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>> index f48243d13f..ce7327a4e0 100644
>> --- a/target/ppc/kvm_ppc.h
>> +++ b/target/ppc/kvm_ppc.h
> 
> Patch "A"
> 
>> @@ -46,6 +46,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t page_shift,
>>  int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
>>  int kvmppc_reset_htab(int shift_hint);
>>  uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
>> +bool kvmppc_has_cap_spapr_vfio(void);
>>  #endif /* !CONFIG_USER_ONLY */
>>  bool kvmppc_has_cap_epr(void);
>>  int kvmppc_define_rtas_kernel_token(uint32_t token, const char *function);
>> @@ -216,6 +217,11 @@ static inline bool kvmppc_is_mem_backend_page_size_ok(char *obj_path)
>>      return true;
>>  }
>>  
>> +static inline bool kvmppc_has_cap_spapr_vfio(void)
>> +{
>> +    return false;
>> +}
>> +
>>  #endif /* !CONFIG_USER_ONLY */
>>  
>>  static inline bool kvmppc_has_cap_epr(void)
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index b61c8f053e..fc23d81645 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
> 
> Patch "A"
> 
>> @@ -293,6 +293,10 @@ void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio)
>>  
>>      tcet->need_vfio = need_vfio;
>>  
>> +    if (!need_vfio || (tcet->fd != -1 && kvmppc_has_cap_spapr_vfio())) {
>> +        return;
>> +    }


Separation to "A" and "B" makes sense most of the time, however this bit
being put into "A" will look at the capability and change the behaviour
effectively disabling TCE requests handling in the kernel as
vfio_spapr_notify_kvm() only appears in "B". Bad for bisectability.

I could swap "A" and "B", this way vfio_spapr_notify_kvm() would fail but
thing would keep working.



>> +
>>      oldtable = tcet->table;
>>  
>>      tcet->table = spapr_tce_alloc_table(tcet->liobn,
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index c75c7594d5..9aaf861904 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
> 
> Patch "B"
> 
>> @@ -440,6 +440,19 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>              goto fail;
>>          }
>>  
>> +#ifdef CONFIG_KVM
> 
> I don't think we need this just for kvm_enabled(), do we?


We do for vfio_kvm_device_fd - this one is defined under #ifdef.


> 
>> +        if (kvm_enabled() && section->mr->iommu_ops->get_fd) {
>> +            VFIOGroup *group;
>> +            int tablefd =  section->mr->iommu_ops->get_fd(section->mr);
> 
> This would change to
> 
>     tablefd=memory_region_iommu_get_fd(SPAPR_IOMMU_TABLE_FD,section->mr);
> 
>> +
>> +            if (tablefd != -1) {
>> +                QLIST_FOREACH(group, &container->group_list, container_next) {
>> +                    vfio_spapr_notify_kvm(vfio_kvm_device_fd,
>> +                                          group->fd, tablefd);
>> +                }
>> +            }
>> +        }
>> +#endif
>>          vfio_host_win_add(container, section->offset_within_address_space,
>>                            section->offset_within_address_space +
>>                            int128_get64(section->size) - 1, pgsize);
>> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
>> index 4409bcc0d7..dffef3bd5f 100644
>> --- a/hw/vfio/spapr.c
>> +++ b/hw/vfio/spapr.c
> 
> Patch "B"
> 
>> @@ -17,6 +17,9 @@
>>  #include "hw/hw.h"
>>  #include "qemu/error-report.h"
>>  #include "trace.h"
>> +#ifdef CONFIG_KVM
>> +#include "linux/kvm.h"
>> +#endif
>>  
>>  static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section)
>>  {
>> @@ -187,6 +190,29 @@ int vfio_spapr_create_window(VFIOContainer *container,
>>      return 0;
>>  }
>>  
>> +int vfio_spapr_notify_kvm(int vfio_kvm_device_fd, int groupfd, int tablefd)
>> +{
>> +#ifdef CONFIG_KVM
>> +    struct kvm_vfio_spapr_tce param = {
>> +        .groupfd = groupfd,
>> +        .tablefd = tablefd
>> +    };
>> +    struct kvm_device_attr attr = {
>> +        .group = KVM_DEV_VFIO_GROUP,
>> +        .attr = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE,
>> +        .addr = (uint64_t)(unsigned long)&param,
>> +    };
>> +
>> +    if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
>> +        error_report("vfio: failed to setup fd %d for a group with fd %d: %s",
>> +                     param.tablefd, param.groupfd, strerror(errno));
>> +        return -errno;
>> +    }
>> +    trace_vfio_spapr_notify_kvm(groupfd, tablefd);
>> +#endif
>> +    return 0;
>> +}
>> +
>>  int vfio_spapr_remove_window(VFIOContainer *container,
>>                               hwaddr offset_within_address_space)
>>  {
>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>> index 560ce655c7..bca5fe7329 100644
>> --- a/target/ppc/kvm.c
>> +++ b/target/ppc/kvm.c
> 
> Patch "A"
> 
>> @@ -131,7 +131,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>      cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
>>      cap_spapr_tce_64 = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_64);
>>      cap_spapr_multitce = kvm_check_extension(s, KVM_CAP_SPAPR_MULTITCE);
>> -    cap_spapr_vfio = false;
>> +    cap_spapr_vfio = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_VFIO);
>>      cap_one_reg = kvm_check_extension(s, KVM_CAP_ONE_REG);
>>      cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR);
>>      cap_epr = kvm_check_extension(s, KVM_CAP_PPC_EPR);
>> @@ -2416,6 +2416,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
>>      return cap_mmu_hash_v3;
>>  }
>>  
>> +bool kvmppc_has_cap_spapr_vfio(void)
>> +{
>> +    return cap_spapr_vfio;
>> +}
>> +
>>  static PowerPCCPUClass *ppc_cpu_get_family_class(PowerPCCPUClass *pcc)
>>  {
>>      ObjectClass *oc = OBJECT_CLASS(pcc);
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index 2561c6d31a..084a92f7c2 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
> 
> Patch "B"
> 
>> @@ -123,3 +123,4 @@ vfio_prereg_register(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"P
>>  vfio_prereg_unregister(uint64_t va, uint64_t size, int ret) "va=%"PRIx64" size=%"PRIx64" ret=%d"
>>  vfio_spapr_create_window(int ps, uint64_t ws, uint64_t off) "pageshift=0x%x winsize=0x%"PRIx64" offset=0x%"PRIx64
>>  vfio_spapr_remove_window(uint64_t off) "offset=%"PRIx64
>> +vfio_spapr_notify_kvm(int groupfd, int tablefd) "Attached groupfd %d to liobn fd %d"
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR
  2017-03-29  3:35     ` David Gibson
@ 2017-03-29  5:08       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Alexey Kardashevskiy @ 2017-03-29  5:08 UTC (permalink / raw)
  To: David Gibson, Alex Williamson; +Cc: qemu-devel, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 3374 bytes --]

On 29/03/17 14:35, David Gibson wrote:
> On Tue, Mar 28, 2017 at 11:48:29AM -0600, Alex Williamson wrote:
>> On Tue, 28 Mar 2017 20:05:28 +1100
>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>  include/exec/memory.h | 2 ++
>>>  hw/ppc/spapr_iommu.c  | 8 ++++++++
>>>  2 files changed, 10 insertions(+)
>>>
>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>> index e39256ad03..925c10b35b 100644
>>> --- a/include/exec/memory.h
>>> +++ b/include/exec/memory.h
>>> @@ -174,6 +174,8 @@ struct MemoryRegionIOMMUOps {
>>>      void (*notify_flag_changed)(MemoryRegion *iommu,
>>>                                  IOMMUNotifierFlag old_flags,
>>>                                  IOMMUNotifierFlag new_flags);
>>> +    /* Returns a kernel fd for IOMMU */
>>> +    int (*get_fd)(MemoryRegion *iommu);
>>
>> What if we used this as a prototype:
>>
>> int (*get_fd)(IOMMUFdType type, MemoryRegion *iommu);
>>
>> And then we defined:
>>
>> typedef enum {
>>     SPAPR_IOMMU_TABLE_FD = 0,
>> } IOMMUFdType;
> 
> Are we expecting any new types of fd?  Maybe it would be simpler just
> to name this spapr_tce_fd() or something more specific, and only
> generalize if we really need it for another fd type.


So far we managed to keep VFIO and sPAPR-IOMMU relatively separate - they
do not include each others headers and interact via memory regions and
kernel uapi interface. The only direct connection between VFIO and sPAPR at
all is vfio_eeh_as_ok/vfio_eeh_as_op which is rather workaround. I like
this separation tbh.



> 
>>
>> Such that you're actually asking the IOMMUOps for a specific type of FD
>> and it either has it or not, so the caller doesn't need to assume what
>> it is they get back.
>>
>> Furthermore, add:
>>
>> int memory_region_iommu_get_fd(IOMMUFdType type, MemoryRegion *mr)
>> {
>>     assert(memory_region_is_iommu(mr));
>>
>>     if (mr->iommu_ops && mr->iommu_ops->get_fd) {
>>         return mr->iommu_ops->get_fd(type, mr);
>>     }
>>
>>     return -1;
>> }
>>
>>>  };
>>>
>>
>> This should be two patches, patch 1 above, patch 2 below
>>   
>>>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
>>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>>> index 9e30e148d6..b61c8f053e 100644
>>> --- a/hw/ppc/spapr_iommu.c
>>> +++ b/hw/ppc/spapr_iommu.c
>>> @@ -170,6 +170,13 @@ static void spapr_tce_notify_flag_changed(MemoryRegion *iommu,
>>>      }
>>>  }
>>>  
>>> +static int spapr_tce_get_fd(MemoryRegion *iommu)
>>> +{
>>> +    sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
>>> +
>>> +    return tcet->fd;
>>
>>
>> This would then be:
>>
>>     return type == SPAPR_IOMMU_TABLE_FD ? tcet->fd : -1;
>>
>>> +}
>>> +
>>>  static int spapr_tce_table_post_load(void *opaque, int version_id)
>>>  {
>>>      sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
>>> @@ -251,6 +258,7 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
>>>      .translate = spapr_tce_translate_iommu,
>>>      .get_min_page_size = spapr_tce_get_min_page_size,
>>>      .notify_flag_changed = spapr_tce_notify_flag_changed,
>>> +    .get_fd = spapr_tce_get_fd,
>>>  };
>>>  
>>>  static int spapr_tce_table_realize(DeviceState *dev)
>>
> 


-- 
Alexey


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 839 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR
  2017-03-29  3:04       ` Alex Williamson
@ 2017-03-29  8:04         ` Paolo Bonzini
  2017-03-30  1:33           ` David Gibson
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2017-03-29  8:04 UTC (permalink / raw)
  To: Alex Williamson, Alexey Kardashevskiy; +Cc: qemu-devel, qemu-ppc, David Gibson



On 29/03/2017 05:04, Alex Williamson wrote:
>>> What if we used this as a prototype:
>>>
>>> int (*get_fd)(IOMMUFdType type, MemoryRegion *iommu);
>>>
>>> And then we defined:
>>>
>>> typedef enum {
>>>     SPAPR_IOMMU_TABLE_FD = 0,
>>> } IOMMUFdType;  
>>
>> Where do I put this enum definition? include/exec/memory.h? It does not
>> have any mention of any platform yet...
>  
> I would assume memory.h, yes.  It seems like the enum is just an
> abstraction, what does "get fd" mean generically to an IOMMU
> MemoryRegion?  How can anyone else ever implement that callback if the
> initial user is assuming that the returned fd is a specific, yet
> unspecified type.  If the API is "give me an fd for this type of thing"
> then the IOMMU driver can either provide it or indicate that type is not
> supported.  There's really no platform knowledge at the memory API
> level, it's just a type of thing that means something to the driver
> providing the MemoryRegionIOMMUOps and the caller.

I think we should move to a QOM hierarchy like

   AbstractMemoryRegion
      MemoryRegion                  << adds MemoryRegionOps
      IOMMUMemoryRegion             << adds MemoryRegionIOMMUOps
        sPAPR_IOMMUMemoryRegion     << adds get_fd

but for now I'm okay with Alex's proposal.

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR
  2017-03-29  8:04         ` Paolo Bonzini
@ 2017-03-30  1:33           ` David Gibson
  0 siblings, 0 replies; 16+ messages in thread
From: David Gibson @ 2017-03-30  1:33 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Alex Williamson, Alexey Kardashevskiy, qemu-devel, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 1706 bytes --]

On Wed, Mar 29, 2017 at 10:04:47AM +0200, Paolo Bonzini wrote:
> 
> 
> On 29/03/2017 05:04, Alex Williamson wrote:
> >>> What if we used this as a prototype:
> >>>
> >>> int (*get_fd)(IOMMUFdType type, MemoryRegion *iommu);
> >>>
> >>> And then we defined:
> >>>
> >>> typedef enum {
> >>>     SPAPR_IOMMU_TABLE_FD = 0,
> >>> } IOMMUFdType;  
> >>
> >> Where do I put this enum definition? include/exec/memory.h? It does not
> >> have any mention of any platform yet...
> >  
> > I would assume memory.h, yes.  It seems like the enum is just an
> > abstraction, what does "get fd" mean generically to an IOMMU
> > MemoryRegion?  How can anyone else ever implement that callback if the
> > initial user is assuming that the returned fd is a specific, yet
> > unspecified type.  If the API is "give me an fd for this type of thing"
> > then the IOMMU driver can either provide it or indicate that type is not
> > supported.  There's really no platform knowledge at the memory API
> > level, it's just a type of thing that means something to the driver
> > providing the MemoryRegionIOMMUOps and the caller.
> 
> I think we should move to a QOM hierarchy like
> 
>    AbstractMemoryRegion
>       MemoryRegion                  << adds MemoryRegionOps
>       IOMMUMemoryRegion             << adds MemoryRegionIOMMUOps
>         sPAPR_IOMMUMemoryRegion     << adds get_fd

Right, this makes more sense to me than the enum as well.

> but for now I'm okay with Alex's proposal.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-03-30  1:52 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-28  9:05 [Qemu-devel] [RFC PATCH qemu 0/3] vfio-pci, spapr: Allow in-kernel acceleration Alexey Kardashevskiy
2017-03-28  9:05 ` [Qemu-devel] [RFC PATCH qemu 1/3] memory: Add get_fd() hook for IOMMU MR Alexey Kardashevskiy
2017-03-28 17:48   ` Alex Williamson
2017-03-29  1:41     ` Alexey Kardashevskiy
2017-03-29  3:04       ` Alex Williamson
2017-03-29  8:04         ` Paolo Bonzini
2017-03-30  1:33           ` David Gibson
2017-03-29  3:35     ` David Gibson
2017-03-29  5:08       ` Alexey Kardashevskiy
2017-03-28  9:05 ` [Qemu-devel] [RFC PATCH qemu 2/3] vfio-pci: Reorder group-to-container attaching Alexey Kardashevskiy
2017-03-28 17:48   ` Alex Williamson
2017-03-29  3:42     ` David Gibson
2017-03-28  9:05 ` [Qemu-devel] [RFC PATCH qemu 3/3] vfio: Enable in-kernel acceleration via VFIO KVM device Alexey Kardashevskiy
2017-03-28 17:48   ` Alex Williamson
2017-03-29  4:27     ` Alexey Kardashevskiy
2017-03-29  3:53   ` David Gibson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.