linux-rdma.vger.kernel.org archive mirror
* [PATCH 0/4] Allow MMIO regions to be exported through dma-buf
@ 2022-08-17 16:11 Jason Gunthorpe
  2022-08-17 16:11 ` [PATCH 1/4] dma-buf: Add dma_buf_try_get() Jason Gunthorpe
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2022-08-17 16:11 UTC (permalink / raw)
  To: Alex Williamson, Christian König, Cornelia Huck, dri-devel,
	kvm, linaro-mm-sig, linux-media, Sumit Semwal
  Cc: Daniel Vetter, Leon Romanovsky, linux-rdma, Maor Gottlieb, Oded Gabbay

dma-buf has become a way to safely acquire a handle to non-struct page
memory that can still have lifetime controlled by the exporter. Notably
RDMA can now import dma-buf FDs and build them into MRs which allows for
PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
from PCI device BARs.

This series supports a use case for SPDK where a NVMe device will be owned
by SPDK through VFIO but interacting with a RDMA device. The RDMA device
may directly access the NVMe CMB or directly manipulate the NVMe device's
doorbell using PCI P2P.

However, as a general mechanism, it can support many other scenarios with
VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
generic and safe P2P mappings.
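
For illustration, a rough and untested sketch of the expected userspace
flow, using the feature ioctl added by the last patch; device_fd, pd and
bar_len are assumed to already exist, the BAR index is made up, and the
RDMA side uses rdma-core's ibv_reg_dmabuf_mr():

	struct vfio_device_feature_dma_buf get = {
		.region_index = 2,	/* eg the BAR exposing the CMB */
		.open_flags = O_CLOEXEC | O_RDWR,
		/* offset/length of 0 export the whole region */
	};
	char buf[sizeof(struct vfio_device_feature) + sizeof(get)]
		__attribute__((aligned(8)));
	struct vfio_device_feature *feat = (void *)buf;
	struct ibv_mr *mr;
	int dmabuf_fd;

	feat->argsz = sizeof(buf);
	feat->flags = VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_DMA_BUF;
	memcpy(feat->data, &get, sizeof(get));
	dmabuf_fd = ioctl(device_fd, VFIO_DEVICE_FEATURE, feat);

	/* Import the BAR slice into RDMA as an MR usable for P2P */
	mr = ibv_reg_dmabuf_mr(pd, 0, bar_len, 0, dmabuf_fd,
			       IBV_ACCESS_LOCAL_WRITE |
			       IBV_ACCESS_REMOTE_WRITE);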

This series goes after the "Break up ioctl dispatch functions to one
function per ioctl" series.

This is on github: https://github.com/jgunthorpe/linux/commits/vfio_dma_buf

Jason Gunthorpe (4):
  dma-buf: Add dma_buf_try_get()
  vfio: Add vfio_device_get()
  vfio_pci: Do not open code pci_try_reset_function()
  vfio/pci: Allow MMIO regions to be exported through dma-buf

 drivers/vfio/pci/Makefile           |   1 +
 drivers/vfio/pci/vfio_pci_config.c  |  22 ++-
 drivers/vfio/pci/vfio_pci_core.c    |  33 +++-
 drivers/vfio/pci/vfio_pci_dma_buf.c | 265 ++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h    |  24 +++
 drivers/vfio/vfio_main.c            |   3 +-
 include/linux/dma-buf.h             |  13 ++
 include/linux/vfio.h                |   6 +
 include/linux/vfio_pci_core.h       |   1 +
 include/uapi/linux/vfio.h           |  18 ++
 10 files changed, 364 insertions(+), 22 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_dma_buf.c


base-commit: 385f0411fcd2780b5273992832cdc8edcd5b8ea9
-- 
2.37.2



* [PATCH 1/4] dma-buf: Add dma_buf_try_get()
  2022-08-17 16:11 [PATCH 0/4] Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
@ 2022-08-17 16:11 ` Jason Gunthorpe
  2022-08-17 16:11 ` [PATCH 2/4] vfio: Add vfio_device_get() Jason Gunthorpe
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2022-08-17 16:11 UTC (permalink / raw)
  To: Alex Williamson, Christian König, Cornelia Huck, dri-devel,
	kvm, linaro-mm-sig, linux-media, Sumit Semwal
  Cc: Daniel Vetter, Leon Romanovsky, linux-rdma, Maor Gottlieb, Oded Gabbay

Add dma_buf_try_get() to increment the refcount of the dma-buf's struct
file, but only if the refcount is not already zero. This is useful to allow
the struct file's lifetime to control the lifetime of the dmabuf while still
letting the driver keep track of the dmabufs it has created.
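
The intended pattern, condensed from the vfio-pci patch later in this
series, is that the exporter keeps only a weak reference to each dmabuf on
its own list and upgrades it while walking the list:

	down_write(&vdev->memory_lock);
	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
		if (!dma_buf_try_get(priv->dmabuf))
			continue;	/* release is already in progress */
		dma_resv_lock(priv->dmabuf->resv, NULL);
		dma_buf_move_notify(priv->dmabuf);
		dma_resv_unlock(priv->dmabuf->resv);
		dma_buf_put(priv->dmabuf);
	}
	up_write(&vdev->memory_lock);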

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 include/linux/dma-buf.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 71731796c8c3a8..a35f1554f2fb36 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -618,6 +618,19 @@ int dma_buf_fd(struct dma_buf *dmabuf, int flags);
 struct dma_buf *dma_buf_get(int fd);
 void dma_buf_put(struct dma_buf *dmabuf);
 
+/**
+ * dma_buf_try_get - try to get a reference on a dmabuf
+ * @dmabuf: the dmabuf to get
+ *
+ * Returns true if a reference was successfully obtained. The caller must
+ * interlock with the dmabuf's release function in some way, such as RCU, to
+ * ensure that this is not called on freed memory.
+ */
+static inline bool dma_buf_try_get(struct dma_buf *dmabuf)
+{
+	return get_file_rcu(dmabuf->file);
+}
+
 struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
 					enum dma_data_direction);
 void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,
-- 
2.37.2



* [PATCH 2/4] vfio: Add vfio_device_get()
  2022-08-17 16:11 [PATCH 0/4] Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
  2022-08-17 16:11 ` [PATCH 1/4] dma-buf: Add dma_buf_try_get() Jason Gunthorpe
@ 2022-08-17 16:11 ` Jason Gunthorpe
  2022-08-17 16:11 ` [PATCH 3/4] vfio_pci: Do not open code pci_try_reset_function() Jason Gunthorpe
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2022-08-17 16:11 UTC (permalink / raw)
  To: Alex Williamson, Christian König, Cornelia Huck, dri-devel,
	kvm, linaro-mm-sig, linux-media, Sumit Semwal
  Cc: Daniel Vetter, Leon Romanovsky, linux-rdma, Maor Gottlieb, Oded Gabbay

Add vfio_device_get() to increment a reference the caller already holds,
and export vfio_device_put() to pair with it.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio_main.c | 3 ++-
 include/linux/vfio.h     | 6 ++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 6f96e6d07a5e98..5ad50aec7dc94c 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -451,11 +451,12 @@ static void vfio_group_get(struct vfio_group *group)
  * Device objects - create, release, get, put, search
  */
 /* Device reference always implies a group reference */
-static void vfio_device_put(struct vfio_device *device)
+void vfio_device_put(struct vfio_device *device)
 {
 	if (refcount_dec_and_test(&device->refcount))
 		complete(&device->comp);
 }
+EXPORT_SYMBOL_GPL(vfio_device_put);
 
 static bool vfio_device_try_get(struct vfio_device *device)
 {
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e05ddc6fe6a556..25850b1e08cba9 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -143,6 +143,12 @@ void vfio_uninit_group_dev(struct vfio_device *device);
 int vfio_register_group_dev(struct vfio_device *device);
 int vfio_register_emulated_iommu_dev(struct vfio_device *device);
 void vfio_unregister_group_dev(struct vfio_device *device);
+void vfio_device_put(struct vfio_device *device);
+
+static inline void vfio_device_get(struct vfio_device *device)
+{
+	refcount_inc(&device->refcount);
+}
 
 int vfio_assign_device_set(struct vfio_device *device, void *set_id);
 
-- 
2.37.2



* [PATCH 3/4] vfio_pci: Do not open code pci_try_reset_function()
  2022-08-17 16:11 [PATCH 0/4] Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
  2022-08-17 16:11 ` [PATCH 1/4] dma-buf: Add dma_buf_try_get() Jason Gunthorpe
  2022-08-17 16:11 ` [PATCH 2/4] vfio: Add vfio_device_get() Jason Gunthorpe
@ 2022-08-17 16:11 ` Jason Gunthorpe
  2022-08-17 16:11 ` [PATCH 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2022-08-17 16:11 UTC (permalink / raw)
  To: Alex Williamson, Christian König, Cornelia Huck, dri-devel,
	kvm, linaro-mm-sig, linux-media, Sumit Semwal
  Cc: Daniel Vetter, Leon Romanovsky, linux-rdma, Maor Gottlieb, Oded Gabbay

An FLR triggered by an emulated config space write should not behave
differently from an FLR triggered by VFIO_DEVICE_RESET; currently the
config space path misses the power management handling.

Consolidate all the call sites to invoke a single function.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/pci/vfio_pci_config.c | 14 ++++----------
 drivers/vfio/pci/vfio_pci_core.c   |  5 ++---
 drivers/vfio/pci/vfio_pci_priv.h   |  1 +
 3 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 4a350421c5f62a..d22921efa25987 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -893,11 +893,8 @@ static int vfio_exp_config_write(struct vfio_pci_core_device *vdev, int pos,
 						 pos - offset + PCI_EXP_DEVCAP,
 						 &cap);
 
-		if (!ret && (cap & PCI_EXP_DEVCAP_FLR)) {
-			vfio_pci_zap_and_down_write_memory_lock(vdev);
-			pci_try_reset_function(vdev->pdev);
-			up_write(&vdev->memory_lock);
-		}
+		if (!ret && (cap & PCI_EXP_DEVCAP_FLR))
+			vfio_pci_try_reset_function(vdev);
 	}
 
 	/*
@@ -975,11 +972,8 @@ static int vfio_af_config_write(struct vfio_pci_core_device *vdev, int pos,
 						pos - offset + PCI_AF_CAP,
 						&cap);
 
-		if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP)) {
-			vfio_pci_zap_and_down_write_memory_lock(vdev);
-			pci_try_reset_function(vdev->pdev);
-			up_write(&vdev->memory_lock);
-		}
+		if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP))
+			vfio_pci_try_reset_function(vdev);
 	}
 
 	return count;
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 050b9d4b8c290c..d13e22021860cc 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -960,8 +960,7 @@ static int vfio_pci_ioctl_set_irqs(struct vfio_pci_core_device *vdev,
 	return ret;
 }
 
-static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
-				void __user *arg)
+int vfio_pci_try_reset_function(struct vfio_pci_core_device *vdev)
 {
 	int ret;
 
@@ -1202,7 +1201,7 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 	case VFIO_DEVICE_PCI_HOT_RESET:
 		return vfio_pci_ioctl_pci_hot_reset(vdev, uarg);
 	case VFIO_DEVICE_RESET:
-		return vfio_pci_ioctl_reset(vdev, uarg);
+		return vfio_pci_try_reset_function(vdev);
 	case VFIO_DEVICE_SET_IRQS:
 		return vfio_pci_ioctl_set_irqs(vdev, uarg);
 	default:
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 58b8d34c162cd6..5b1cb9a9838442 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -60,6 +60,7 @@ void vfio_config_free(struct vfio_pci_core_device *vdev);
 int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev,
 			     pci_power_t state);
 
+int vfio_pci_try_reset_function(struct vfio_pci_core_device *vdev);
 bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev);
 void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *vdev);
 u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_core_device *vdev);
-- 
2.37.2



* [PATCH 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf
  2022-08-17 16:11 [PATCH 0/4] Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2022-08-17 16:11 ` [PATCH 3/4] vfio_pci: Do not open code pci_try_reset_function() Jason Gunthorpe
@ 2022-08-17 16:11 ` Jason Gunthorpe
  2022-08-21 13:51   ` Fwd: " Oded Gabbay
  2022-08-29  5:04   ` Yan Zhao
  2022-08-18 11:07 ` [PATCH 0/4] " Christian König
  2022-08-18 12:05 ` Jason Gunthorpe
  5 siblings, 2 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2022-08-17 16:11 UTC (permalink / raw)
  To: Alex Williamson, Christian König, Cornelia Huck, dri-devel,
	kvm, linaro-mm-sig, linux-media, Sumit Semwal
  Cc: Daniel Vetter, Leon Romanovsky, linux-rdma, Maor Gottlieb, Oded Gabbay

dma-buf has become a way to safely acquire a handle to non-struct page
memory that can still have lifetime controlled by the exporter. Notably
RDMA can now import dma-buf FDs and build them into MRs which allows for
PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
from PCI device BARs.

The patch design loosely follows the pattern in commit
db1a8dd916aa ("habanalabs: add support for dma-buf exporter") except this
does not support pinning.

Instead, this implements what, in the past, we've called a revocable
attachment using move. In normal situations the attachment is pinned, as a
BAR does not change physical address. However when the VFIO device is
closed, or a PCI reset is issued, access to the MMIO memory is revoked.

Revoked means that move occurs, but an attempt to immediately re-map the
memory will fail. In the reset case a future move will be triggered when
MMIO access returns. As both close and reset are under userspace control
it is expected that userspace will suspend use of the dma-buf before doing
these operations; the revoke is purely for kernel self-defense against a
hostile userspace.
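
For context, a dynamic importer that wants to follow the revoke would look
roughly like the untested sketch below; it is not part of this series and
the my_* names, dev and priv are placeholders:

	static void my_move_notify(struct dma_buf_attachment *attach)
	{
		/* Stop HW access through the old mapping; re-map once the
		 * exporter makes the memory accessible again.
		 */
		my_stop_dma(attach->importer_priv);
	}

	static const struct dma_buf_attach_ops my_importer_ops = {
		.allow_peer2peer = true,
		.move_notify = my_move_notify,
	};

	attach = dma_buf_dynamic_attach(dmabuf, dev, &my_importer_ops, priv);

	/* Mapping happens under the reservation lock and returns an error
	 * while access is revoked.
	 */
	dma_resv_lock(dmabuf->resv, NULL);
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	dma_resv_unlock(dmabuf->resv);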

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/pci/Makefile           |   1 +
 drivers/vfio/pci/vfio_pci_config.c  |   8 +-
 drivers/vfio/pci/vfio_pci_core.c    |  28 ++-
 drivers/vfio/pci/vfio_pci_dma_buf.c | 265 ++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h    |  23 +++
 include/linux/vfio_pci_core.h       |   1 +
 include/uapi/linux/vfio.h           |  18 ++
 7 files changed, 336 insertions(+), 8 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_dma_buf.c

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 24c524224da5a3..81006a157cde14 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -2,6 +2,7 @@
 
 vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
 vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o
+vfio-pci-core-$(CONFIG_DMA_SHARED_BUFFER) += vfio_pci_dma_buf.o
 obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
 
 vfio-pci-y := vfio_pci.o
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index d22921efa25987..f8a9c24d04aeb7 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -584,10 +584,12 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
 		virt_mem = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY);
 		new_mem = !!(new_cmd & PCI_COMMAND_MEMORY);
 
-		if (!new_mem)
+		if (!new_mem) {
 			vfio_pci_zap_and_down_write_memory_lock(vdev);
-		else
+			vfio_pci_dma_buf_move(vdev, true);
+		} else {
 			down_write(&vdev->memory_lock);
+		}
 
 		/*
 		 * If the user is writing mem/io enable (new_mem/io) and we
@@ -622,6 +624,8 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
 		*virt_cmd &= cpu_to_le16(~mask);
 		*virt_cmd |= cpu_to_le16(new_cmd & mask);
 
+		if (__vfio_pci_memory_enabled(vdev))
+			vfio_pci_dma_buf_move(vdev, false);
 		up_write(&vdev->memory_lock);
 	}
 
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index d13e22021860cc..206f159c480e42 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -504,6 +504,8 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
 	vfio_spapr_pci_eeh_release(vdev->pdev);
 	vfio_pci_core_disable(vdev);
 
+	vfio_pci_dma_buf_cleanup(vdev);
+
 	mutex_lock(&vdev->igate);
 	if (vdev->err_trigger) {
 		eventfd_ctx_put(vdev->err_trigger);
@@ -980,7 +982,10 @@ int vfio_pci_try_reset_function(struct vfio_pci_core_device *vdev)
 	 */
 	vfio_pci_set_power_state(vdev, PCI_D0);
 
+	vfio_pci_dma_buf_move(vdev, true);
 	ret = pci_try_reset_function(vdev->pdev);
+	if (__vfio_pci_memory_enabled(vdev))
+		vfio_pci_dma_buf_move(vdev, false);
 	up_write(&vdev->memory_lock);
 
 	return ret;
@@ -1210,11 +1215,10 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl);
 
-static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
-				       uuid_t __user *arg, size_t argsz)
+static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
+				       u32 flags, uuid_t __user *arg,
+				       size_t argsz)
 {
-	struct vfio_pci_core_device *vdev =
-		container_of(device, struct vfio_pci_core_device, vdev);
 	uuid_t uuid;
 	int ret;
 
@@ -1241,9 +1245,14 @@ static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 				void __user *arg, size_t argsz)
 {
+	struct vfio_pci_core_device *vdev =
+		container_of(device, struct vfio_pci_core_device, vdev);
+
 	switch (flags & VFIO_DEVICE_FEATURE_MASK) {
 	case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
-		return vfio_pci_core_feature_token(device, flags, arg, argsz);
+		return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
+	case VFIO_DEVICE_FEATURE_DMA_BUF:
+		return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
 	default:
 		return -ENOTTY;
 	}
@@ -1881,6 +1890,7 @@ void vfio_pci_core_init_device(struct vfio_pci_core_device *vdev,
 	INIT_LIST_HEAD(&vdev->vma_list);
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
 	init_rwsem(&vdev->memory_lock);
+	INIT_LIST_HEAD(&vdev->dmabufs);
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_init_device);
 
@@ -2227,11 +2237,17 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 	 * cause the PCI config space reset without restoring the original
 	 * state (saved locally in 'vdev->pm_save').
 	 */
-	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
+	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
+		vfio_pci_dma_buf_move(cur, true);
 		vfio_pci_set_power_state(cur, PCI_D0);
+	}
 
 	ret = pci_reset_bus(pdev);
 
+	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
+		if (__vfio_pci_memory_enabled(cur))
+			vfio_pci_dma_buf_move(cur, false);
+
 err_undo:
 	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
 		if (cur == cur_mem)
diff --git a/drivers/vfio/pci/vfio_pci_dma_buf.c b/drivers/vfio/pci/vfio_pci_dma_buf.c
new file mode 100644
index 00000000000000..ac32405de5e48d
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci_dma_buf.c
@@ -0,0 +1,265 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ */
+#include <linux/dma-buf.h>
+#include <linux/pci-p2pdma.h>
+#include <linux/dma-resv.h>
+
+#include "vfio_pci_priv.h"
+
+MODULE_IMPORT_NS(DMA_BUF);
+
+struct vfio_pci_dma_buf {
+	struct dma_buf *dmabuf;
+	struct vfio_pci_core_device *vdev;
+	struct list_head dmabufs_elm;
+	unsigned int index;
+	size_t offset;
+	bool revoked;
+};
+
+static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
+				   struct dma_buf_attachment *attachment)
+{
+	struct vfio_pci_dma_buf *priv = dmabuf->priv;
+	int rc;
+
+	rc = pci_p2pdma_distance_many(priv->vdev->pdev, &attachment->dev, 1,
+				      true);
+	if (rc < 0)
+		attachment->peer2peer = false;
+	return 0;
+}
+
+static void vfio_pci_dma_buf_unpin(struct dma_buf_attachment *attachment)
+{
+}
+
+static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment)
+{
+	/*
+	 * Uses the dynamic interface but must always allow
+	 * dma_buf_move_notify() to revoke the mapping
+	 */
+	return -EINVAL;
+}
+
+static struct sg_table *
+vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
+		     enum dma_data_direction dir)
+{
+	size_t sgl_size = dma_get_max_seg_size(attachment->dev);
+	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+	struct scatterlist *sgl;
+	struct sg_table *sgt;
+	dma_addr_t dma_addr;
+	unsigned int nents;
+	size_t offset;
+	int ret;
+
+	dma_resv_assert_held(priv->dmabuf->resv);
+
+	if (!attachment->peer2peer)
+		return ERR_PTR(-EPERM);
+
+	if (priv->revoked)
+		return ERR_PTR(-ENODEV);
+
+	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt)
+		return ERR_PTR(-ENOMEM);
+
+	nents = DIV_ROUND_UP(priv->dmabuf->size, sgl_size);
+	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
+	if (ret)
+		goto err_kfree_sgt;
+
+	/*
+	 * Since the memory being mapped is device memory it can never be in
+	 * CPU caches.
+	 */
+	dma_addr = dma_map_resource(
+		attachment->dev,
+		pci_resource_start(priv->vdev->pdev, priv->index) +
+			priv->offset,
+		priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
+	ret = dma_mapping_error(attachment->dev, dma_addr);
+	if (ret)
+		goto err_free_sgt;
+
+	/*
+	 * Break the BAR's physical range up into max sized SGL's according to
+	 * the device's requirement.
+	 */
+	sgl = sgt->sgl;
+	for (offset = 0; offset != priv->dmabuf->size;) {
+		size_t chunk_size = min(priv->dmabuf->size - offset, sgl_size);
+
+		sg_set_page(sgl, NULL, chunk_size, 0);
+		sg_dma_address(sgl) = dma_addr + offset;
+		sg_dma_len(sgl) = chunk_size;
+		sgl = sg_next(sgl);
+		offset += chunk_size;
+	}
+
+	/*
+	 * Because we are not going to include a CPU list we want to give other
+	 * users some chance to detect this: set orig_nents to 0 so that only
+	 * nents (the length of the DMA list) is used when walking the sgl.
+	 */
+	sgt->orig_nents = 0;
+	return sgt;
+
+err_free_sgt:
+	sg_free_table(sgt);
+err_kfree_sgt:
+	kfree(sgt);
+	return ERR_PTR(ret);
+}
+
+static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
+				   struct sg_table *sgt,
+				   enum dma_data_direction dir)
+{
+	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+
+	dma_unmap_resource(attachment->dev, sg_dma_address(sgt->sgl),
+			   priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
+	sg_free_table(sgt);
+	kfree(sgt);
+}
+
+static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
+{
+	struct vfio_pci_dma_buf *priv = dmabuf->priv;
+
+	/*
+	 * Either this or vfio_pci_dma_buf_cleanup() will remove from the list.
+	 * The refcount prevents both.
+	 */
+	if (priv->vdev) {
+		down_write(&priv->vdev->memory_lock);
+		list_del_init(&priv->dmabufs_elm);
+		up_write(&priv->vdev->memory_lock);
+		vfio_device_put(&priv->vdev->vdev);
+	}
+	kfree(priv);
+}
+
+static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
+	.attach = vfio_pci_dma_buf_attach,
+	.map_dma_buf = vfio_pci_dma_buf_map,
+	.pin = vfio_pci_dma_buf_pin,
+	.unpin = vfio_pci_dma_buf_unpin,
+	.release = vfio_pci_dma_buf_release,
+	.unmap_dma_buf = vfio_pci_dma_buf_unmap,
+};
+
+int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
+				  struct vfio_device_feature_dma_buf __user *arg,
+				  size_t argsz)
+{
+	struct vfio_device_feature_dma_buf get_dma_buf;
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	struct vfio_pci_dma_buf *priv;
+	int ret;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
+				 sizeof(get_dma_buf));
+	if (ret != 1)
+		return ret;
+
+	if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
+		return -EFAULT;
+
+	/* For PCI the region_index is the BAR number like everything else */
+	if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX)
+		return -EINVAL;
+
+	exp_info.ops = &vfio_pci_dmabuf_ops;
+	exp_info.size = pci_resource_len(vdev->pdev, get_dma_buf.region_index);
+	if (!exp_info.size)
+		return -EINVAL;
+	if (get_dma_buf.offset || get_dma_buf.length) {
+		if (get_dma_buf.length > exp_info.size ||
+		    get_dma_buf.offset >= exp_info.size ||
+		    get_dma_buf.length > exp_info.size - get_dma_buf.offset ||
+		    get_dma_buf.offset % PAGE_SIZE ||
+		    get_dma_buf.length % PAGE_SIZE)
+			return -EINVAL;
+		exp_info.size = get_dma_buf.length;
+	}
+	exp_info.flags = get_dma_buf.open_flags;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&priv->dmabufs_elm);
+	priv->offset = get_dma_buf.offset;
+
+	exp_info.priv = priv;
+	priv->dmabuf = dma_buf_export(&exp_info);
+	if (IS_ERR(priv->dmabuf)) {
+		ret = PTR_ERR(priv->dmabuf);
+		kfree(priv);
+		return ret;
+	}
+
+	/* dma_buf_put() now frees priv */
+
+	down_write(&vdev->memory_lock);
+	dma_resv_lock(priv->dmabuf->resv, NULL);
+	priv->revoked = !__vfio_pci_memory_enabled(vdev);
+	priv->vdev = vdev;
+	vfio_device_get(&vdev->vdev);
+	list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
+	dma_resv_unlock(priv->dmabuf->resv);
+	up_write(&vdev->memory_lock);
+
+	/*
+	 * dma_buf_fd() consumes the reference; when the file closes, the dmabuf
+	 * will be released.
+	 */
+	return dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags);
+}
+
+void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
+{
+	struct vfio_pci_dma_buf *priv;
+	struct vfio_pci_dma_buf *tmp;
+
+	lockdep_assert_held_write(&vdev->memory_lock);
+
+	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		if (!dma_buf_try_get(priv->dmabuf))
+			continue;
+		if (priv->revoked != revoked) {
+			dma_resv_lock(priv->dmabuf->resv, NULL);
+			priv->revoked = revoked;
+			dma_buf_move_notify(priv->dmabuf);
+			dma_resv_unlock(priv->dmabuf->resv);
+		}
+		dma_buf_put(priv->dmabuf);
+	}
+}
+
+void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
+{
+	struct vfio_pci_dma_buf *priv;
+	struct vfio_pci_dma_buf *tmp;
+
+	down_write(&vdev->memory_lock);
+	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		if (!dma_buf_try_get(priv->dmabuf))
+			continue;
+		dma_resv_lock(priv->dmabuf->resv, NULL);
+		list_del_init(&priv->dmabufs_elm);
+		priv->vdev = NULL;
+		priv->revoked = true;
+		dma_buf_move_notify(priv->dmabuf);
+		dma_resv_unlock(priv->dmabuf->resv);
+		vfio_device_put(&vdev->vdev);
+		dma_buf_put(priv->dmabuf);
+	}
+	up_write(&vdev->memory_lock);
+}
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 5b1cb9a9838442..c295a1166e7a9c 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -102,4 +102,27 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
 	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
 }
 
+#ifdef CONFIG_DMA_SHARED_BUFFER
+int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
+				  struct vfio_device_feature_dma_buf __user *arg,
+				  size_t argsz);
+void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
+void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
+#else
+static inline int
+vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
+			      struct vfio_device_feature_dma_buf __user *arg,
+			      size_t argsz)
+{
+	return -ENOTTY;
+}
+static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
+{
+}
+static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
+					 bool revoked)
+{
+}
+#endif
+
 #endif
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 9d18b832e61a0d..b57f4ecc2665e1 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -93,6 +93,7 @@ struct vfio_pci_core_device {
 	struct mutex		vma_lock;
 	struct list_head	vma_list;
 	struct rw_semaphore	memory_lock;
+	struct list_head	dmabufs;
 };
 
 /* Will be exported for vfio pci drivers usage */
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 733a1cddde30a5..1dcfad6dcb6863 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -986,6 +986,24 @@ enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
 };
 
+/**
+ * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the
+ * region selected.
+ *
+ * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,
+ * etc. offset/length specify a slice of the region to create the dmabuf from.
+ * If both are 0 then the whole region is used.
+ *
+ * Return: The fd number on success, -1 and errno is set on failure.
+ */
+struct vfio_device_feature_dma_buf {
+	__u32 region_index;
+	__u32 open_flags;
+	__u32 offset;
+	__u64 length;
+};
+#define VFIO_DEVICE_FEATURE_DMA_BUF 3
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
-- 
2.37.2



* Re: [PATCH 0/4] Allow MMIO regions to be exported through dma-buf
  2022-08-17 16:11 [PATCH 0/4] Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
                   ` (3 preceding siblings ...)
  2022-08-17 16:11 ` [PATCH 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
@ 2022-08-18 11:07 ` Christian König
  2022-08-18 12:03   ` Jason Gunthorpe
  2022-08-18 12:05 ` Jason Gunthorpe
  5 siblings, 1 reply; 20+ messages in thread
From: Christian König @ 2022-08-18 11:07 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, dri-devel, kvm,
	linaro-mm-sig, linux-media, Sumit Semwal
  Cc: Daniel Vetter, Leon Romanovsky, linux-rdma, Maor Gottlieb, Oded Gabbay

Am 17.08.22 um 18:11 schrieb Jason Gunthorpe:
> dma-buf has become a way to safely acquire a handle to non-struct page
> memory that can still have lifetime controlled by the exporter. Notably
> RDMA can now import dma-buf FDs and build them into MRs which allows for
> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> from PCI device BARs.
>
> This series supports a use case for SPDK where a NVMe device will be owned
> by SPDK through VFIO but interacting with a RDMA device. The RDMA device
> may directly access the NVMe CMB or directly manipulate the NVMe device's
> doorbell using PCI P2P.
>
> However, as a general mechanism, it can support many other scenarios with
> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
> generic and safe P2P mappings.

In general looks good to me, but we really need to get away from using 
sg_tables for this here.

The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen 
this incorrectly used so many times that I can't count them any more.

Would that be somehow avoidable? Or could you at least explain the use 
case a bit better.

Thanks,
Christian.

>
> This series goes after the "Break up ioctl dispatch functions to one
> function per ioctl" series.
>
> This is on github: https://github.com/jgunthorpe/linux/commits/vfio_dma_buf
>
> Jason Gunthorpe (4):
>    dma-buf: Add dma_buf_try_get()
>    vfio: Add vfio_device_get()
>    vfio_pci: Do not open code pci_try_reset_function()
>    vfio/pci: Allow MMIO regions to be exported through dma-buf
>
>   drivers/vfio/pci/Makefile           |   1 +
>   drivers/vfio/pci/vfio_pci_config.c  |  22 ++-
>   drivers/vfio/pci/vfio_pci_core.c    |  33 +++-
>   drivers/vfio/pci/vfio_pci_dma_buf.c | 265 ++++++++++++++++++++++++++++
>   drivers/vfio/pci/vfio_pci_priv.h    |  24 +++
>   drivers/vfio/vfio_main.c            |   3 +-
>   include/linux/dma-buf.h             |  13 ++
>   include/linux/vfio.h                |   6 +
>   include/linux/vfio_pci_core.h       |   1 +
>   include/uapi/linux/vfio.h           |  18 ++
>   10 files changed, 364 insertions(+), 22 deletions(-)
>   create mode 100644 drivers/vfio/pci/vfio_pci_dma_buf.c
>
>
> base-commit: 385f0411fcd2780b5273992832cdc8edcd5b8ea9



* Re: [PATCH 0/4] Allow MMIO regions to be exported through dma-buf
  2022-08-18 11:07 ` [PATCH 0/4] " Christian König
@ 2022-08-18 12:03   ` Jason Gunthorpe
  2022-08-18 12:58     ` Christian König
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2022-08-18 12:03 UTC (permalink / raw)
  To: Christian König
  Cc: Alex Williamson, Cornelia Huck, dri-devel, kvm, linaro-mm-sig,
	linux-media, Sumit Semwal, Daniel Vetter, Leon Romanovsky,
	linux-rdma, Maor Gottlieb, Oded Gabbay

On Thu, Aug 18, 2022 at 01:07:16PM +0200, Christian König wrote:
> Am 17.08.22 um 18:11 schrieb Jason Gunthorpe:
> > dma-buf has become a way to safely acquire a handle to non-struct page
> > memory that can still have lifetime controlled by the exporter. Notably
> > RDMA can now import dma-buf FDs and build them into MRs which allows for
> > PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> > from PCI device BARs.
> > 
> > This series supports a use case for SPDK where a NVMe device will be owned
> > by SPDK through VFIO but interacting with a RDMA device. The RDMA device
> > may directly access the NVMe CMB or directly manipulate the NVMe device's
> > doorbell using PCI P2P.
> > 
> > However, as a general mechanism, it can support many other scenarios with
> > VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
> > generic and safe P2P mappings.
> 
> In general looks good to me, but we really need to get away from using
> sg_tables for this here.
> 
> The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen
> this incorrectly used so many times that I can't count them any more.
> 
> Would that be somehow avoidable? Or could you at least explain the use case
> a bit better.

I didn't see a way, maybe you know of one

VFIO needs to maintain a list of dmabuf FDs that have been created by
the user attached to each vfio_device:

int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
				  struct vfio_device_feature_dma_buf __user *arg,
				  size_t argsz)
{
	down_write(&vdev->memory_lock);
	list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
	up_write(&vdev->memory_lock);

And dmabuf FD's are removed from the list when the user closes the FD:

static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
{
		down_write(&priv->vdev->memory_lock);
		list_del_init(&priv->dmabufs_elm);
		up_write(&priv->vdev->memory_lock);

Which then poses the problem: How do you iterate over only dma_buf's
that are still alive to execute move?

This seems necessary as parts of the dma_buf have already been
destroyed by the time the user's release function is called.

Which I solved like this:

	down_write(&vdev->memory_lock);
	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
		if (!dma_buf_try_get(priv->dmabuf))
			continue;

So the scenarios resolve as:
 - Concurrent release is not in progress: dma_buf_try_get() succeeds
   and prevents concurrent release from starting
 - Release has started but not reached its memory_lock:
   dma_buf_try_get() fails
 - Release has started but passed its memory_lock: dmabuf is not on
   the list so dma_buf_try_get() is not called.

Jason


* Re: [PATCH 0/4] Allow MMIO regions to be exported through dma-buf
  2022-08-17 16:11 [PATCH 0/4] Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
                   ` (4 preceding siblings ...)
  2022-08-18 11:07 ` [PATCH 0/4] " Christian König
@ 2022-08-18 12:05 ` Jason Gunthorpe
  2022-08-22 21:58   ` Alex Williamson
  5 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2022-08-18 12:05 UTC (permalink / raw)
  To: Alex Williamson, Christian König, Cornelia Huck, dri-devel,
	kvm, linaro-mm-sig, linux-media, Sumit Semwal
  Cc: Daniel Vetter, Leon Romanovsky, linux-rdma, Maor Gottlieb, Oded Gabbay

On Wed, Aug 17, 2022 at 01:11:38PM -0300, Jason Gunthorpe wrote:
> dma-buf has become a way to safely acquire a handle to non-struct page
> memory that can still have lifetime controlled by the exporter. Notably
> RDMA can now import dma-buf FDs and build them into MRs which allows for
> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> from PCI device BARs.
> 
> This series supports a use case for SPDK where a NVMe device will be owned
> by SPDK through VFIO but interacting with a RDMA device. The RDMA device
> may directly access the NVMe CMB or directly manipulate the NVMe device's
> doorbell using PCI P2P.
> 
> However, as a general mechanism, it can support many other scenarios with
> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
> generic and safe P2P mappings.
> 
> This series goes after the "Break up ioctl dispatch functions to one
> function per ioctl" series.
> 
> This is on github: https://github.com/jgunthorpe/linux/commits/vfio_dma_buf
> 
> Jason Gunthorpe (4):
>   dma-buf: Add dma_buf_try_get()
>   vfio: Add vfio_device_get()
>   vfio_pci: Do not open code pci_try_reset_function()
>   vfio/pci: Allow MMIO regions to be exported through dma-buf
> 
>  drivers/vfio/pci/Makefile           |   1 +
>  drivers/vfio/pci/vfio_pci_config.c  |  22 ++-
>  drivers/vfio/pci/vfio_pci_core.c    |  33 +++-
>  drivers/vfio/pci/vfio_pci_dma_buf.c | 265 ++++++++++++++++++++++++++++

I forgot about this..

Alex, do you want to start doing as Linus discussed and I will rename
this new file to "dma_buf.c"?

Or keep this directory as having the vfio_pci_* prefix for
consistency?

Jason


* Re: [PATCH 0/4] Allow MMIO regions to be exported through dma-buf
  2022-08-18 12:03   ` Jason Gunthorpe
@ 2022-08-18 12:58     ` Christian König
  2022-08-18 13:16       ` Jason Gunthorpe
  0 siblings, 1 reply; 20+ messages in thread
From: Christian König @ 2022-08-18 12:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Cornelia Huck, dri-devel, kvm, linaro-mm-sig,
	linux-media, Sumit Semwal, Daniel Vetter, Leon Romanovsky,
	linux-rdma, Maor Gottlieb, Oded Gabbay



Am 18.08.22 um 14:03 schrieb Jason Gunthorpe:
> On Thu, Aug 18, 2022 at 01:07:16PM +0200, Christian König wrote:
>> Am 17.08.22 um 18:11 schrieb Jason Gunthorpe:
>>> dma-buf has become a way to safely acquire a handle to non-struct page
>>> memory that can still have lifetime controlled by the exporter. Notably
>>> RDMA can now import dma-buf FDs and build them into MRs which allows for
>>> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
>>> from PCI device BARs.
>>>
>>> This series supports a use case for SPDK where a NVMe device will be owned
>>> by SPDK through VFIO but interacting with a RDMA device. The RDMA device
>>> may directly access the NVMe CMB or directly manipulate the NVMe device's
>>> doorbell using PCI P2P.
>>>
>>> However, as a general mechanism, it can support many other scenarios with
>>> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
>>> generic and safe P2P mappings.
>> In general looks good to me, but we really need to get away from using
>> sg_tables for this here.
>>
>> The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen
>> this incorrectly used so many times that I can't count them any more.
>>
>> Would that be somehow avoidable? Or could you at least explain the use case
>> a bit better.
> I didn't see a way, maybe you know of one

For GEM objects we usually don't use the reference count of the DMA-buf, 
but rather that of the GEM object for this. But that's not an ideal 
solution either.

>
> VFIO needs to maintain a list of dmabuf FDs that have been created by
> the user attached to each vfio_device:
>
> int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> 				  struct vfio_device_feature_dma_buf __user *arg,
> 				  size_t argsz)
> {
> 	down_write(&vdev->memory_lock);
> 	list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
> 	up_write(&vdev->memory_lock);
>
> And dmabuf FD's are removed from the list when the user closes the FD:
>
> static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
> {
> 		down_write(&priv->vdev->memory_lock);
> 		list_del_init(&priv->dmabufs_elm);
> 		up_write(&priv->vdev->memory_lock);
>
> Which then poses the problem: How do you iterate over only dma_buf's
> that are still alive to execute move?
>
> This seems necessary as parts of the dma_buf have already been
> destroyed by the time the user's release function is called.
>
> Which I solved like this:
>
> 	down_write(&vdev->memory_lock);
> 	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> 		if (!dma_buf_try_get(priv->dmabuf))
> 			continue;

What would happen if you don't skip destroyed dma-bufs here? In other 
words why do you maintain that list in the first place?

Regards,
Christian.

>
> So the scenarios resolve as:
>   - Concurrent release is not in progress: dma_buf_try_get() succeeds
>     and prevents concurrent release from starting
>   - Release has started but not reached its memory_lock:
>     dma_buf_try_get() fails
>   - Release has started but passed its memory_lock: dmabuf is not on
>     the list so dma_buf_try_get() is not called.
>
> Jason



* Re: [PATCH 0/4] Allow MMIO regions to be exported through dma-buf
  2022-08-18 12:58     ` Christian König
@ 2022-08-18 13:16       ` Jason Gunthorpe
  2022-08-18 13:37         ` Christian König
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2022-08-18 13:16 UTC (permalink / raw)
  To: Christian König
  Cc: Alex Williamson, Cornelia Huck, dri-devel, kvm, linaro-mm-sig,
	linux-media, Sumit Semwal, Daniel Vetter, Leon Romanovsky,
	linux-rdma, Maor Gottlieb, Oded Gabbay

On Thu, Aug 18, 2022 at 02:58:10PM +0200, Christian König wrote:

> > > The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen
> > > this incorrectly used so many times that I can't count them any more.
> > > 
> > > Would that be somehow avoidable? Or could you at least explain the use case
> > > a bit better.
> > I didn't see a way, maybe you know of one
> 
> For GEM objects we usually don't use the reference count of the DMA-buf, but
> rather that of the GEM object for this. But that's not an ideal solution
> either.

You can't really ignore the dmabuf refcount. At some point you have to
deal with the dmabuf being asynchronously released by userspace.

> > 	down_write(&vdev->memory_lock);
> > 	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> > 		if (!dma_buf_try_get(priv->dmabuf))
> > 			continue;
> 
> What would happen if you don't skip destroyed dma-bufs here? In other words
> why do you maintain that list in the first place?

The list is to keep track of the dmabufs that were created, it is not
optional.

The only question is what do you do about invoking
dma_buf_move_notify() on a dmabuf that is already undergoing
destruction.

For instance undergoing destruction means the dmabuf core has already
done this:

	mutex_lock(&db_list.lock);
	list_del(&dmabuf->list_node);
	mutex_unlock(&db_list.lock);
	dma_buf_stats_teardown(dmabuf);

So it seems non-ideal to continue to use it.

However, dma_buf_move_notify() specifically has no issue with that level of
destruction since it only does

	list_for_each_entry(attach, &dmabuf->attachments, node)

And attachments must be empty if the file refcount is zero.

So we could delete the try_buf and just rely on move being safe on
partially destroyed dma_buf's as part of the API design.

Jason


* Re: [PATCH 0/4] Allow MMIO regions to be exported through dma-buf
  2022-08-18 13:16       ` Jason Gunthorpe
@ 2022-08-18 13:37         ` Christian König
  2022-08-19 13:11           ` Jason Gunthorpe
  0 siblings, 1 reply; 20+ messages in thread
From: Christian König @ 2022-08-18 13:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Cornelia Huck, dri-devel, kvm, linaro-mm-sig,
	linux-media, Sumit Semwal, Daniel Vetter, Leon Romanovsky,
	linux-rdma, Maor Gottlieb, Oded Gabbay

Am 18.08.22 um 15:16 schrieb Jason Gunthorpe:
> On Thu, Aug 18, 2022 at 02:58:10PM +0200, Christian König wrote:
>
>>>> The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen
>>>> this incorrectly used so many times that I can't count them any more.
>>>>
>>>> Would that be somehow avoidable? Or could you at least explain the use case
>>>> a bit better.
>>> I didn't see a way, maybe you know of one
>> For GEM objects we usually don't use the reference count of the DMA-buf, but
>> rather that of the GEM object for this. But that's not an ideal solution
>> either.
> You can't really ignore the dmabuf refcount. At some point you have to
> deal with the dmabuf being asynchronously released by userspace.

Yeah, but in this case the dma-buf is just a reference to the 
real/private object which holds the backing store.

When the dma-buf is released you drop the real object reference and from 
your driver internals you try_get only the real object.

The advantage is that only your driver can use the try_get function and 
not some importing driver which doesn't know about the internals of the 
exporter.

We just had too many cases where developers weren't sure if a pointer is 
still valid and by using try_get it just "magically" got working (well I 
have to admit it made the crashing less likely....).
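
Roughly, with made up names, the indirection looks like this:

	struct my_obj {			/* owns the backing store */
		struct kref kref;	/* the only refcount the driver uses */
	};

	/* dmabuf->priv holds a strong reference on my_obj which is dropped
	 * in the dma-buf release() callback. Driver internals never touch
	 * dmabuf->file, they only do:
	 */
	if (!kref_get_unless_zero(&obj->kref))
		return;			/* object already on its way out */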

>>> 	down_write(&vdev->memory_lock);
>>> 	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
>>> 		if (!dma_buf_try_get(priv->dmabuf))
>>> 			continue;
>> What would happen if you don't skip destroyed dma-bufs here? In other words
>> why do you maintain that list in the first place?
> The list is to keep track of the dmabufs that were created, it is not
> optional.
>
> The only question is what do you do about invoking
> dma_buf_move_notify() on a dmabuf that is already undergoing
> destruction.

Ah, yes. Really good point.

>
> For instance undergoing destruction means the dmabuf core has already
> done this:
>
> 	mutex_lock(&db_list.lock);
> 	list_del(&dmabuf->list_node);
> 	mutex_unlock(&db_list.lock);
> 	dma_buf_stats_teardown(dmabuf);
>
> So it seems non-ideal to continue to use it.
>
> However, dma_buf_move_notify() specifically has no issue with that level of
> destruction since it only does
>
> 	list_for_each_entry(attach, &dmabuf->attachments, node)
>
> And attachments must be empty if the file refcount is zero.
>
> So we could delete the try_buf and just rely on move being safe on
> partially destroyed dma_buf's as part of the API design.

I think that might be the more defensive approach. A comment on the 
dma_buf_move_notify() function should probably be a good idea.

Thanks,
Christian.

>
> Jason



* Re: [PATCH 0/4] Allow MMIO regions to be exported through dma-buf
  2022-08-18 13:37         ` Christian König
@ 2022-08-19 13:11           ` Jason Gunthorpe
  2022-08-19 13:33             ` Christian König
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2022-08-19 13:11 UTC (permalink / raw)
  To: Christian König
  Cc: Alex Williamson, Cornelia Huck, dri-devel, kvm, linaro-mm-sig,
	linux-media, Sumit Semwal, Daniel Vetter, Leon Romanovsky,
	linux-rdma, Maor Gottlieb, Oded Gabbay

On Thu, Aug 18, 2022 at 03:37:01PM +0200, Christian König wrote:
> Am 18.08.22 um 15:16 schrieb Jason Gunthorpe:
> > On Thu, Aug 18, 2022 at 02:58:10PM +0200, Christian König wrote:
> > 
> > > > > The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen
> > > > > this incorrectly used so many times that I can't count them any more.
> > > > > 
> > > > > Would that be somehow avoidable? Or could you at least explain the use case
> > > > > a bit better.
> > > > I didn't see a way, maybe you know of one
> > > For GEM objects we usually don't use the reference count of the DMA-buf, but
> > > rather that of the GEM object for this. But that's not an ideal solution
> > > either.
> > You can't really ignore the dmabuf refcount. At some point you have to
> > deal with the dmabuf being asynchronously released by userspace.
> 
> Yeah, but in this case the dma-buf is just a reference to the real/private
> object which holds the backing store.

The gem approach is backwards to what I did here.

GEM holds a singleton pointer to the dmabuf and holds a reference on
it as long as it has the pointer. This means the dmabuf can not be
freed until the GEM object is freed.

For this I held a "weak reference" on the dmabuf in a list, and we
convert the weak reference to a strong reference in the usual way
using a try_get.

The reason it is different is because the VFIO interface allows
creating a DMABUF with unique parameters on every user request. Eg the
user can select a BAR index and a slice of the MMIO space unique to
each request, and this results in a unique DMABUF.

Due to this we have to store a list of DMABUFs and we need the
DMABUF's to clean up their memory when the user closes the file.

> > So we could delete the try_buf and just rely on move being safe on
> > partially destroyed dma_buf's as part of the API design.
> 
> I think that might be the more defensive approach. A comment on the
> dma_buf_move_notify() function should probably be a good idea.

IMHO, it is an anti-pattern. The caller should hold a strong reference
on an object before invoking any API surface. Upgrading a weak
reference to a strong reference requires the standard "try get" API.

But if you feel strongly I don't mind dropping the try_get around move.

Jason


* Re: [PATCH 0/4] Allow MMIO regions to be exported through dma-buf
  2022-08-19 13:11           ` Jason Gunthorpe
@ 2022-08-19 13:33             ` Christian König
  2022-08-19 13:39               ` Jason Gunthorpe
  0 siblings, 1 reply; 20+ messages in thread
From: Christian König @ 2022-08-19 13:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Cornelia Huck, dri-devel, kvm, linaro-mm-sig,
	linux-media, Sumit Semwal, Daniel Vetter, Leon Romanovsky,
	linux-rdma, Maor Gottlieb, Oded Gabbay

Am 19.08.22 um 15:11 schrieb Jason Gunthorpe:
> On Thu, Aug 18, 2022 at 03:37:01PM +0200, Christian König wrote:
>> Am 18.08.22 um 15:16 schrieb Jason Gunthorpe:
>>> On Thu, Aug 18, 2022 at 02:58:10PM +0200, Christian König wrote:
>>>
>>>>>> The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen
>>>>>> this incorrectly used so many times that I can't count them any more.
>>>>>>
>>>>>> Would that be somehow avoidable? Or could you at least explain the use case
>>>>>> a bit better.
>>>>> I didn't see a way, maybe you know of one
>>>> For GEM objects we usually don't use the reference count of the DMA-buf, but
>>>> rather that of the GEM object for this. But that's not an ideal solution
>>>> either.
>>> You can't really ignore the dmabuf refcount. At some point you have to
>>> deal with the dmabuf being asynchronously released by userspace.
>> Yeah, but in this case the dma-buf is just a reference to the real/private
>> object which holds the backing store.
> The gem approach is backwards to what I did here.

As I said, what GEM does is not necessarily the best approach either.

> GEM holds a singleton pointer to the dmabuf and holds a reference on
> it as long as it has the pointer. This means the dmabuf can not be
> freed until the GEM object is freed.
>
> For this I held a "weak reference" on the dmabuf in a list, and we
> convert the weak reference to a strong reference in the usual way
> using a try_get.
>
> The reason it is different is because the VFIO interface allows
> creating a DMABUF with unique parameters on every user request. Eg the
> user can select a BAR index and a slice of the MMIO space unique to
> each each request and this results in a unique DMABUF.
>
> Due to this we have to store a list of DMABUFs and we need the
> DMABUF's to clean up their memory when the user closes the file.

Yeah, that makes sense.

>>> So we could delete the try_buf and just rely on move being safe on
>>> partially destroyed dma_buf's as part of the API design.
>> I think that might be the more defensive approach. A comment on the
>> dma_buf_move_notify() function should probably be a good idea.
> IMHO, it is an anti-pattern. The caller should hold a strong reference
> on an object before invoking any API surface. Upgrading a weak
> reference to a strong reference requires the standard "try get" API.
>
> But if you feel strongly I don't mind dropping the try_get around move.

Well I see it as well that both approaches are not ideal, but my gut 
feeling tells me that just documenting that dma_buf_move_notify() can 
still be called as long as the release callback wasn't called yet is 
probably the better approach.

On the other hand this is really just a gut feeling without strong 
arguments backing it. So if somebody has an argument which makes try_get 
necessary I'm happy to hear it.

Regards,
Christian.

>
> Jason



* Re: [PATCH 0/4] Allow MMIO regions to be exported through dma-buf
  2022-08-19 13:33             ` Christian König
@ 2022-08-19 13:39               ` Jason Gunthorpe
  2022-08-19 13:47                 ` Christian König
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2022-08-19 13:39 UTC (permalink / raw)
  To: Christian König
  Cc: Alex Williamson, Cornelia Huck, dri-devel, kvm, linaro-mm-sig,
	linux-media, Sumit Semwal, Daniel Vetter, Leon Romanovsky,
	linux-rdma, Maor Gottlieb, Oded Gabbay

On Fri, Aug 19, 2022 at 03:33:04PM +0200, Christian König wrote:

> > > > So we could delete the try_buf and just rely on move being safe on
> > > > partially destroyed dma_buf's as part of the API design.
> > > I think that might be the more defensive approach. A comment on the
> > > dma_buf_move_notify() function should probably be a good idea.
> > IMHO, it is an anti-pattern. The caller should hold a strong reference
> > on an object before invoking any API surface. Upgrading a weak
> > reference to a strong reference requires the standard "try get" API.
> > 
> > But if you feel strongly I don't mind dropping the try_get around move.
> 
> Well I see it as well that both approaches are not ideal, but my gut feeling
> tells me that just documenting that dma_buf_move_notify() can still be
> called as long as the release callback wasn't called yet is probably the
> better approach.

The comment would say something like:

 "dma_resv_lock(), dma_buf_move_notify(), dma_resv_unlock() may be
  called with a 0 refcount so long as ops->release() hasn't returned"

Which is a really abnormal API design, IMHO.

Jason


* Re: [PATCH 0/4] Allow MMIO regions to be exported through dma-buf
  2022-08-19 13:39               ` Jason Gunthorpe
@ 2022-08-19 13:47                 ` Christian König
  0 siblings, 0 replies; 20+ messages in thread
From: Christian König @ 2022-08-19 13:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Cornelia Huck, dri-devel, kvm, linaro-mm-sig,
	linux-media, Sumit Semwal, Daniel Vetter, Leon Romanovsky,
	linux-rdma, Maor Gottlieb, Oded Gabbay

Am 19.08.22 um 15:39 schrieb Jason Gunthorpe:
> On Fri, Aug 19, 2022 at 03:33:04PM +0200, Christian König wrote:
>
>>>>> So we could delete the try_buf and just rely on move being safe on
>>>>> partially destroyed dma_buf's as part of the API design.
>>>> I think that might be the more defensive approach. A comment on the
>>>> dma_buf_move_notify() function should probably be a good idea.
>>> IMHO, it is an anti-pattern. The caller should hold a strong reference
>>> on an object before invoking any API surface. Upgrading a weak
>>> reference to a strong reference requires the standard "try get" API.
>>>
>>> But if you feel strongly I don't mind dropping the try_get around move.
>> Well I see it as well that both approaches are not ideal, but my gut feeling
>> tells me that just documenting that dma_buf_move_notify() can still be
>> called as long as the release callback wasn't called yet is probably the
>> better approach.
> The comment would say something like:
>
>   "dma_resv_lock(), dma_buf_move_notify(), dma_resv_unlock() may be
>    called with a 0 refcount so long as ops->release() hasn't returned"
>
> Which is a really abnormal API design, IMHO.

Mhm, Daniel or others, do you have any opinion on that as well?

Thanks,
Christian.

>
> Jason



* Fwd: [PATCH 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf
  2022-08-17 16:11 ` [PATCH 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
@ 2022-08-21 13:51   ` Oded Gabbay
  2022-08-26 18:10     ` Jason Gunthorpe
  2022-08-29  5:04   ` Yan Zhao
  1 sibling, 1 reply; 20+ messages in thread
From: Oded Gabbay @ 2022-08-21 13:51 UTC (permalink / raw)
  To: Alex Williamson, Christian König, Cornelia Huck,
	Maling list - DRI developers, KVM list,
	moderated list:DMA BUFFER SHARING FRAMEWORK,
	Linux Media Mailing List, Sumit Semwal, Jason Gunthorpe
  Cc: Daniel Vetter, Leon Romanovsky, linux-rdma, Maor Gottlieb

On Wed, Aug 17, 2022 at 7:11 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> dma-buf has become a way to safely acquire a handle to non-struct page
> memory that can still have lifetime controlled by the exporter. Notably
> RDMA can now import dma-buf FDs and build them into MRs which allows for
> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> from PCI device BARs.
>
> The patch design loosely follows the pattern in commit
> db1a8dd916aa ("habanalabs: add support for dma-buf exporter") except this
> does not support pinning.
>
> Instead, this implements what, in the past, we've called a revocable
> attachment using move. In normal situations the attachment is pinned, as a
> BAR does not change physical address. However when the VFIO device is
> closed, or a PCI reset is issued, access to the MMIO memory is revoked.
>
> Revoked means that move occurs, but an attempt to immediately re-map the
> memory will fail. In the reset case a future move will be triggered when
> MMIO access returns. As both close and reset are under userspace control
> it is expected that userspace will suspend use of the dma-buf before doing
> these operations, the revoke is purely for kernel self-defense against a
> hostile userspace.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/pci/Makefile           |   1 +
>  drivers/vfio/pci/vfio_pci_config.c  |   8 +-
>  drivers/vfio/pci/vfio_pci_core.c    |  28 ++-
>  drivers/vfio/pci/vfio_pci_dma_buf.c | 265 ++++++++++++++++++++++++++++
>  drivers/vfio/pci/vfio_pci_priv.h    |  23 +++
>  include/linux/vfio_pci_core.h       |   1 +
>  include/uapi/linux/vfio.h           |  18 ++
>  7 files changed, 336 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/vfio/pci/vfio_pci_dma_buf.c
>
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 24c524224da5a3..81006a157cde14 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -2,6 +2,7 @@
>
>  vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
>  vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o
> +vfio-pci-core-$(CONFIG_DMA_SHARED_BUFFER) += vfio_pci_dma_buf.o
>  obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
>
>  vfio-pci-y := vfio_pci.o
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index d22921efa25987..f8a9c24d04aeb7 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -584,10 +584,12 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
>                 virt_mem = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY);
>                 new_mem = !!(new_cmd & PCI_COMMAND_MEMORY);
>
> -               if (!new_mem)
> +               if (!new_mem) {
>                         vfio_pci_zap_and_down_write_memory_lock(vdev);
> -               else
> +                       vfio_pci_dma_buf_move(vdev, true);
> +               } else {
>                         down_write(&vdev->memory_lock);
> +               }
>
>                 /*
>                  * If the user is writing mem/io enable (new_mem/io) and we
> @@ -622,6 +624,8 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
>                 *virt_cmd &= cpu_to_le16(~mask);
>                 *virt_cmd |= cpu_to_le16(new_cmd & mask);
>
> +               if (__vfio_pci_memory_enabled(vdev))
> +                       vfio_pci_dma_buf_move(vdev, false);
>                 up_write(&vdev->memory_lock);
>         }
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index d13e22021860cc..206f159c480e42 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -504,6 +504,8 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
>         vfio_spapr_pci_eeh_release(vdev->pdev);
>         vfio_pci_core_disable(vdev);
>
> +       vfio_pci_dma_buf_cleanup(vdev);
> +
>         mutex_lock(&vdev->igate);
>         if (vdev->err_trigger) {
>                 eventfd_ctx_put(vdev->err_trigger);
> @@ -980,7 +982,10 @@ int vfio_pci_try_reset_function(struct vfio_pci_core_device *vdev)
>          */
>         vfio_pci_set_power_state(vdev, PCI_D0);
>
> +       vfio_pci_dma_buf_move(vdev, true);
>         ret = pci_try_reset_function(vdev->pdev);
> +       if (__vfio_pci_memory_enabled(vdev))
> +               vfio_pci_dma_buf_move(vdev, false);
>         up_write(&vdev->memory_lock);
>
>         return ret;
> @@ -1210,11 +1215,10 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
>  }
>  EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl);
>
> -static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
> -                                      uuid_t __user *arg, size_t argsz)
> +static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
> +                                      u32 flags, uuid_t __user *arg,
> +                                      size_t argsz)
>  {
> -       struct vfio_pci_core_device *vdev =
> -               container_of(device, struct vfio_pci_core_device, vdev);
>         uuid_t uuid;
>         int ret;
>
> @@ -1241,9 +1245,14 @@ static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
>  int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
>                                 void __user *arg, size_t argsz)
>  {
> +       struct vfio_pci_core_device *vdev =
> +               container_of(device, struct vfio_pci_core_device, vdev);
> +
>         switch (flags & VFIO_DEVICE_FEATURE_MASK) {
>         case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
> -               return vfio_pci_core_feature_token(device, flags, arg, argsz);
> +               return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
> +       case VFIO_DEVICE_FEATURE_DMA_BUF:
> +               return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
>         default:
>                 return -ENOTTY;
>         }
> @@ -1881,6 +1890,7 @@ void vfio_pci_core_init_device(struct vfio_pci_core_device *vdev,
>         INIT_LIST_HEAD(&vdev->vma_list);
>         INIT_LIST_HEAD(&vdev->sriov_pfs_item);
>         init_rwsem(&vdev->memory_lock);
> +       INIT_LIST_HEAD(&vdev->dmabufs);
>  }
>  EXPORT_SYMBOL_GPL(vfio_pci_core_init_device);
>
> @@ -2227,11 +2237,17 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
>          * cause the PCI config space reset without restoring the original
>          * state (saved locally in 'vdev->pm_save').
>          */
> -       list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
> +       list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
> +               vfio_pci_dma_buf_move(cur, true);
>                 vfio_pci_set_power_state(cur, PCI_D0);
> +       }
>
>         ret = pci_reset_bus(pdev);
>
> +       list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
> +               if (__vfio_pci_memory_enabled(cur))
> +                       vfio_pci_dma_buf_move(cur, false);
> +
>  err_undo:
>         list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
>                 if (cur == cur_mem)
> diff --git a/drivers/vfio/pci/vfio_pci_dma_buf.c b/drivers/vfio/pci/vfio_pci_dma_buf.c
> new file mode 100644
> index 00000000000000..ac32405de5e48d
> --- /dev/null
> +++ b/drivers/vfio/pci/vfio_pci_dma_buf.c
> @@ -0,0 +1,265 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> + */
> +#include <linux/dma-buf.h>
> +#include <linux/pci-p2pdma.h>
> +#include <linux/dma-resv.h>
> +
> +#include "vfio_pci_priv.h"
> +
> +MODULE_IMPORT_NS(DMA_BUF);
> +
> +struct vfio_pci_dma_buf {
> +       struct dma_buf *dmabuf;
> +       struct vfio_pci_core_device *vdev;
> +       struct list_head dmabufs_elm;
> +       unsigned int index;
> +       size_t offset;
> +       bool revoked;
> +};
> +
> +static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
> +                                  struct dma_buf_attachment *attachment)
> +{
> +       struct vfio_pci_dma_buf *priv = dmabuf->priv;
> +       int rc;
> +
> +       rc = pci_p2pdma_distance_many(priv->vdev->pdev, &attachment->dev, 1,
> +                                     true);
> +       if (rc < 0)
> +               attachment->peer2peer = false;
> +       return 0;
> +}
> +
> +static void vfio_pci_dma_buf_unpin(struct dma_buf_attachment *attachment)
> +{
> +}
> +
> +static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment)
> +{
> +       /*
> +        * Uses the dynamic interface but must always allow for
> +        * dma_buf_move_notify() to do revoke
> +        */
> +       return -EINVAL;
> +}
> +
> +static struct sg_table *
> +vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
> +                    enum dma_data_direction dir)
> +{
> +       size_t sgl_size = dma_get_max_seg_size(attachment->dev);
> +       struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> +       struct scatterlist *sgl;
> +       struct sg_table *sgt;
> +       dma_addr_t dma_addr;
> +       unsigned int nents;
> +       size_t offset;
> +       int ret;
> +
> +       dma_resv_assert_held(priv->dmabuf->resv);
> +
> +       if (!attachment->peer2peer)
> +               return ERR_PTR(-EPERM);
> +
> +       if (!priv->revoked)
> +               return ERR_PTR(-ENODEV);
> +
> +       sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> +       if (!sgt)
> +               return ERR_PTR(-ENOMEM);
> +
> +       nents = DIV_ROUND_UP(priv->dmabuf->size, sgl_size);
> +       ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
> +       if (ret)
> +               goto err_kfree_sgt;
> +
> +       /*
> +        * Since the memory being mapped is a device memory it could never be in
> +        * CPU caches.
> +        */
> +       dma_addr = dma_map_resource(
> +               attachment->dev,
> +               pci_resource_start(priv->vdev->pdev, priv->index) +
> +                       priv->offset,
> +               priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
> +       ret = dma_mapping_error(attachment->dev, dma_addr);
> +       if (ret)
> +               goto err_free_sgt;
> +
> +       /*
> +        * Break the BAR's physical range up into max sized SGL's according to
> +        * the device's requirement.
> +        */
> +       sgl = sgt->sgl;
> +       for (offset = 0; offset != priv->dmabuf->size;) {
> +               size_t chunk_size = min(priv->dmabuf->size - offset, sgl_size);
> +
> +               sg_set_page(sgl, NULL, chunk_size, 0);
> +               sg_dma_address(sgl) = dma_addr + offset;
> +               sg_dma_len(sgl) = chunk_size;
> +               sgl = sg_next(sgl);
> +               offset += chunk_size;
> +       }
> +
> +       /*
> +        * Because we are not going to include a CPU list we want to have some
> +        * chance that other users will detect this by setting the orig_nents to
> +        * 0 and using only nents (length of DMA list) when going over the sgl
> +        */
> +       sgt->orig_nents = 0;
> +       return sgt;
> +
> +err_free_sgt:
> +       sg_free_table(sgt);
> +err_kfree_sgt:
> +       kfree(sgt);
> +       return ERR_PTR(ret);
> +}
> +
> +static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
> +                                  struct sg_table *sgt,
> +                                  enum dma_data_direction dir)
> +{
> +       struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> +
> +       dma_unmap_resource(attachment->dev, sg_dma_address(sgt->sgl),
> +                          priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
> +       sg_free_table(sgt);
Before calling sg_free_table(), you need to restore the orig_nents as
it is used in that function to free the allocated memory of the sgt.

> +       kfree(sgt);
> +}
> +
> +static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
> +{
> +       struct vfio_pci_dma_buf *priv = dmabuf->priv;
> +
> +       /*
> +        * Either this or vfio_pci_dma_buf_cleanup() will remove from the list.
> +        * The refcount prevents both.
> +        */
> +       if (priv->vdev) {
> +               down_write(&priv->vdev->memory_lock);
> +               list_del_init(&priv->dmabufs_elm);
> +               up_write(&priv->vdev->memory_lock);
> +               vfio_device_put(&priv->vdev->vdev);
> +       }
> +       kfree(priv);
> +}
> +
> +static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
> +       .attach = vfio_pci_dma_buf_attach,
> +       .map_dma_buf = vfio_pci_dma_buf_map,
> +       .pin = vfio_pci_dma_buf_pin,
> +       .unpin = vfio_pci_dma_buf_unpin,
> +       .release = vfio_pci_dma_buf_release,
> +       .unmap_dma_buf = vfio_pci_dma_buf_unmap,
> +};
> +
> +int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> +                                 struct vfio_device_feature_dma_buf __user *arg,
> +                                 size_t argsz)
> +{
> +       struct vfio_device_feature_dma_buf get_dma_buf;
> +       DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> +       struct vfio_pci_dma_buf *priv;
> +       int ret;
> +
> +       ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
> +                                sizeof(get_dma_buf));
> +       if (ret != 1)
> +               return ret;
> +
> +       if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
> +               return -EFAULT;
> +
> +       /* For PCI the region_index is the BAR number like everything else */
> +       if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX)
> +               return -EINVAL;
> +
> +       exp_info.ops = &vfio_pci_dmabuf_ops;
> +       exp_info.size = pci_resource_len(vdev->pdev, get_dma_buf.region_index);
> +       if (!exp_info.size)
> +               return -EINVAL;
> +       if (get_dma_buf.offset || get_dma_buf.length) {
> +               if (get_dma_buf.length > exp_info.size ||
> +                   get_dma_buf.offset >= exp_info.size ||
> +                   get_dma_buf.length > exp_info.size - get_dma_buf.offset ||
> +                   get_dma_buf.offset % PAGE_SIZE ||
> +                   get_dma_buf.length % PAGE_SIZE)
> +                       return -EINVAL;
> +               exp_info.size = get_dma_buf.length;
> +       }
> +       exp_info.flags = get_dma_buf.open_flags;
> +
> +       priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> +       if (!priv)
> +               return -ENOMEM;
> +       INIT_LIST_HEAD(&priv->dmabufs_elm);
> +       priv->offset = get_dma_buf.offset;
> +
> +       exp_info.priv = priv;
> +       priv->dmabuf = dma_buf_export(&exp_info);
> +       if (IS_ERR(priv->dmabuf)) {
> +               ret = PTR_ERR(priv->dmabuf);
> +               kfree(priv);
> +               return ret;
> +       }
> +
> +       /* dma_buf_put() now frees priv */
> +
> +       down_write(&vdev->memory_lock);
> +       dma_resv_lock(priv->dmabuf->resv, NULL);
> +       priv->revoked = __vfio_pci_memory_enabled(vdev);
> +       priv->vdev = vdev;
> +       vfio_device_get(&vdev->vdev);
> +       list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
> +       dma_resv_unlock(priv->dmabuf->resv);
> +       up_write(&vdev->memory_lock);
> +
> +       /*
> +        * dma_buf_fd() consumes the reference, when the file closes the dmabuf
> +        * will be released.
> +        */
> +       return dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags);
> +}
> +
> +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> +{
> +       struct vfio_pci_dma_buf *priv;
> +       struct vfio_pci_dma_buf *tmp;
> +
> +       lockdep_assert_held_write(&vdev->memory_lock);
> +
> +       list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> +               if (!dma_buf_try_get(priv->dmabuf))
> +                       continue;
> +               if (priv->revoked != revoked) {
> +                       dma_resv_lock(priv->dmabuf->resv, NULL);
> +                       priv->revoked = revoked;
> +                       dma_buf_move_notify(priv->dmabuf);
> +                       dma_resv_unlock(priv->dmabuf->resv);
> +               }
> +               dma_buf_put(priv->dmabuf);
> +       }
> +}
> +
> +void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
> +{
> +       struct vfio_pci_dma_buf *priv;
> +       struct vfio_pci_dma_buf *tmp;
> +
> +       down_write(&vdev->memory_lock);
> +       list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> +               if (!dma_buf_try_get(priv->dmabuf))
> +                       continue;
> +               dma_resv_lock(priv->dmabuf->resv, NULL);
> +               list_del_init(&priv->dmabufs_elm);
> +               priv->vdev = NULL;
> +               priv->revoked = true;
> +               dma_buf_move_notify(priv->dmabuf);
> +               dma_resv_unlock(priv->dmabuf->resv);
> +               vfio_device_put(&vdev->vdev);
> +               dma_buf_put(priv->dmabuf);
> +       }
> +       up_write(&vdev->memory_lock);
> +}
> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
> index 5b1cb9a9838442..c295a1166e7a9c 100644
> --- a/drivers/vfio/pci/vfio_pci_priv.h
> +++ b/drivers/vfio/pci/vfio_pci_priv.h
> @@ -102,4 +102,27 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
>         return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
>  }
>
> +#ifdef CONFIG_DMA_SHARED_BUFFER
> +int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> +                                 struct vfio_device_feature_dma_buf __user *arg,
> +                                 size_t argsz);
> +void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
> +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
> +#else
> +static int
> +vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> +                             struct vfio_device_feature_dma_buf __user *arg,
> +                             size_t argsz)
> +{
> +       return -ENOTTY;
> +}
> +static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
> +{
> +}
> +static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
> +                                        bool revoked)
> +{
> +}
> +#endif
> +
>  #endif
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index 9d18b832e61a0d..b57f4ecc2665e1 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -93,6 +93,7 @@ struct vfio_pci_core_device {
>         struct mutex            vma_lock;
>         struct list_head        vma_list;
>         struct rw_semaphore     memory_lock;
> +       struct list_head        dmabufs;
>  };
>
>  /* Will be exported for vfio pci drivers usage */
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 733a1cddde30a5..1dcfad6dcb6863 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -986,6 +986,24 @@ enum vfio_device_mig_state {
>         VFIO_DEVICE_STATE_RUNNING_P2P = 5,
>  };
>
> +/**
> + * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the
> + * region selected.
> + *
> + * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,
> + * etc. offset/length specify a slice of the region to create the dmabuf from.
> + * If both are 0 then the whole region is used.
> + *
> + * Return: The fd number on success, -1 and errno is set on failure.
> + */
> +struct vfio_device_feature_dma_buf {
> +       __u32 region_index;
> +       __u32 open_flags;
> +       __u32 offset;
> +       __u64 length;
> +};
> +#define VFIO_DEVICE_FEATURE_DMA_BUF 3
> +
>  /* -------- API for Type1 VFIO IOMMU -------- */
>
>  /**
> --
> 2.37.2
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/4] Allow MMIO regions to be exported through dma-buf
  2022-08-18 12:05 ` Jason Gunthorpe
@ 2022-08-22 21:58   ` Alex Williamson
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2022-08-22 21:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christian König, Cornelia Huck, dri-devel, kvm,
	linaro-mm-sig, linux-media, Sumit Semwal, Daniel Vetter,
	Leon Romanovsky, linux-rdma, Maor Gottlieb, Oded Gabbay

On Thu, 18 Aug 2022 09:05:24 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, Aug 17, 2022 at 01:11:38PM -0300, Jason Gunthorpe wrote:
> > dma-buf has become a way to safely acquire a handle to non-struct page
> > memory that can still have lifetime controlled by the exporter. Notably
> > RDMA can now import dma-buf FDs and build them into MRs which allows for
> > PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> > from PCI device BARs.
> > 
> > This series supports a use case for SPDK where a NVMe device will be owned
> > by SPDK through VFIO but interacting with a RDMA device. The RDMA device
> > may directly access the NVMe CMB or directly manipulate the NVMe device's
> > doorbell using PCI P2P.
> > 
> > However, as a general mechanism, it can support many other scenarios with
> > VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
> > generic and safe P2P mappings.
> > 
> > This series goes after the "Break up ioctl dispatch functions to one
> > function per ioctl" series.
> > 
> > This is on github: https://github.com/jgunthorpe/linux/commits/vfio_dma_buf
> > 
> > Jason Gunthorpe (4):
> >   dma-buf: Add dma_buf_try_get()
> >   vfio: Add vfio_device_get()
> >   vfio_pci: Do not open code pci_try_reset_function()
> >   vfio/pci: Allow MMIO regions to be exported through dma-buf
> > 
> >  drivers/vfio/pci/Makefile           |   1 +
> >  drivers/vfio/pci/vfio_pci_config.c  |  22 ++-
> >  drivers/vfio/pci/vfio_pci_core.c    |  33 +++-
> >  drivers/vfio/pci/vfio_pci_dma_buf.c | 265 ++++++++++++++++++++++++++++  
> 
> I forgot about this...
> 
> Alex, do you want to start doing as Linus discussed and I will rename
> this new file to "dma_buf.c"?
> 
> Or keep this directory as having the vfio_pci_* prefix for
> consistency?

I have a hard time generating a strong opinion over file name
redundancy relative to directory structure.  By my count, over 17% of
files in drivers/ have some file name redundancy to their parent
directory structure (up to two levels).  I see we already have two
$FOO_dma_buf.c files in the tree, virtio and amdgpu among these.  In
the virtio case this is somewhat justified, to me at least, as the
virtio_dma_buf.h file exists in a shared include namespace.  However,
this justification only accounts for about 1% of such files; for many
others the redundancy exists in the include path as well.

I guess if we don't have a reason other than naming consistency and
accept an end goal to incrementally remove file name vs directory
structure redundancy where it makes sense, sure, name it dma_buf.c.
Ugh. Thanks,

Alex


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Fwd: [PATCH 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf
  2022-08-21 13:51   ` Fwd: " Oded Gabbay
@ 2022-08-26 18:10     ` Jason Gunthorpe
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2022-08-26 18:10 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Alex Williamson, Christian König, Cornelia Huck,
	Maling list - DRI developers, KVM list,
	moderated list:DMA BUFFER SHARING FRAMEWORK,
	Linux Media Mailing List, Sumit Semwal, Daniel Vetter,
	Leon Romanovsky, linux-rdma, Maor Gottlieb

On Sun, Aug 21, 2022 at 04:51:34PM +0300, Oded Gabbay wrote:

> > +static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
> > +                                  struct sg_table *sgt,
> > +                                  enum dma_data_direction dir)
> > +{
> > +       struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> > +
> > +       dma_unmap_resource(attachment->dev, sg_dma_address(sgt->sgl),
> > +                          priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
> > +       sg_free_table(sgt);
> Before calling sg_free_table(), you need to restore the orig_nents as
> it is used in that function to free the allocated memory of the sgt.

Oops, right, thanks, good catch
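
For reference, the smallest change along the lines Oded points out would be
to put orig_nents back before freeing. A sketch only (the actual respin may
restructure this differently):

static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
				   struct sg_table *sgt,
				   enum dma_data_direction dir)
{
	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;

	dma_unmap_resource(attachment->dev, sg_dma_address(sgt->sgl),
			   priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
	/*
	 * map() zeroed orig_nents to flag the missing CPU list, but
	 * sg_free_table() frees the chained entries based on orig_nents,
	 * so restore it from nents before freeing.
	 */
	sgt->orig_nents = sgt->nents;
	sg_free_table(sgt);
	kfree(sgt);
}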

Jason

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf
  2022-08-17 16:11 ` [PATCH 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
  2022-08-21 13:51   ` Fwd: " Oded Gabbay
@ 2022-08-29  5:04   ` Yan Zhao
  2022-08-29 12:26     ` Jason Gunthorpe
  1 sibling, 1 reply; 20+ messages in thread
From: Yan Zhao @ 2022-08-29  5:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Christian König, Cornelia Huck, dri-devel,
	kvm, linaro-mm-sig, linux-media, Sumit Semwal, Daniel Vetter,
	Leon Romanovsky, linux-rdma, Maor Gottlieb, Oded Gabbay

On Wed, Aug 17, 2022 at 01:11:42PM -0300, Jason Gunthorpe wrote:
> dma-buf has become a way to safely acquire a handle to non-struct page
> memory that can still have lifetime controlled by the exporter. Notably
> RDMA can now import dma-buf FDs and build them into MRs which allows for
> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> from PCI device BARs.
> 
> The patch design loosely follows the pattern in commit
> db1a8dd916aa ("habanalabs: add support for dma-buf exporter") except this
> does not support pinning.
> 
> Instead, this implements what, in the past, we've called a revocable
> attachment using move. In normal situations the attachment is pinned, as a
> BAR does not change physical address. However when the VFIO device is
> closed, or a PCI reset is issued, access to the MMIO memory is revoked.
> 
> Revoked means that move occurs, but an attempt to immediately re-map the
> memory will fail. In the reset case a future move will be triggered when
> MMIO access returns. As both close and reset are under userspace control
> it is expected that userspace will suspend use of the dma-buf before doing
> these operations, the revoke is purely for kernel self-defense against a
> hostile userspace.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/pci/Makefile           |   1 +
>  drivers/vfio/pci/vfio_pci_config.c  |   8 +-
>  drivers/vfio/pci/vfio_pci_core.c    |  28 ++-
>  drivers/vfio/pci/vfio_pci_dma_buf.c | 265 ++++++++++++++++++++++++++++
>  drivers/vfio/pci/vfio_pci_priv.h    |  23 +++
>  include/linux/vfio_pci_core.h       |   1 +
>  include/uapi/linux/vfio.h           |  18 ++
>  7 files changed, 336 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/vfio/pci/vfio_pci_dma_buf.c
> 
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 24c524224da5a3..81006a157cde14 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -2,6 +2,7 @@
>  
>  vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
>  vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o
> +vfio-pci-core-$(CONFIG_DMA_SHARED_BUFFER) += vfio_pci_dma_buf.o
>  obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
>  
>  vfio-pci-y := vfio_pci.o
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index d22921efa25987..f8a9c24d04aeb7 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -584,10 +584,12 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
>  		virt_mem = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY);
>  		new_mem = !!(new_cmd & PCI_COMMAND_MEMORY);
>  
> -		if (!new_mem)
> +		if (!new_mem) {
>  			vfio_pci_zap_and_down_write_memory_lock(vdev);
> -		else
> +			vfio_pci_dma_buf_move(vdev, true);
> +		} else {
>  			down_write(&vdev->memory_lock);
> +		}
>  
>  		/*
>  		 * If the user is writing mem/io enable (new_mem/io) and we
> @@ -622,6 +624,8 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
>  		*virt_cmd &= cpu_to_le16(~mask);
>  		*virt_cmd |= cpu_to_le16(new_cmd & mask);
>  
> +		if (__vfio_pci_memory_enabled(vdev))
> +			vfio_pci_dma_buf_move(vdev, false);
>  		up_write(&vdev->memory_lock);
>  	}
>  
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index d13e22021860cc..206f159c480e42 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -504,6 +504,8 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
>  	vfio_spapr_pci_eeh_release(vdev->pdev);
>  	vfio_pci_core_disable(vdev);
>  
> +	vfio_pci_dma_buf_cleanup(vdev);
> +
>  	mutex_lock(&vdev->igate);
>  	if (vdev->err_trigger) {
>  		eventfd_ctx_put(vdev->err_trigger);
> @@ -980,7 +982,10 @@ int vfio_pci_try_reset_function(struct vfio_pci_core_device *vdev)
>  	 */
>  	vfio_pci_set_power_state(vdev, PCI_D0);
>  
> +	vfio_pci_dma_buf_move(vdev, true);
>  	ret = pci_try_reset_function(vdev->pdev);
> +	if (__vfio_pci_memory_enabled(vdev))
> +		vfio_pci_dma_buf_move(vdev, false);
>  	up_write(&vdev->memory_lock);
>  
>  	return ret;
> @@ -1210,11 +1215,10 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
>  }
>  EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl);
>  
> -static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
> -				       uuid_t __user *arg, size_t argsz)
> +static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
> +				       u32 flags, uuid_t __user *arg,
> +				       size_t argsz)
>  {
> -	struct vfio_pci_core_device *vdev =
> -		container_of(device, struct vfio_pci_core_device, vdev);
>  	uuid_t uuid;
>  	int ret;
>  
> @@ -1241,9 +1245,14 @@ static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
>  int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
>  				void __user *arg, size_t argsz)
>  {
> +	struct vfio_pci_core_device *vdev =
> +		container_of(device, struct vfio_pci_core_device, vdev);
> +
>  	switch (flags & VFIO_DEVICE_FEATURE_MASK) {
>  	case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
> -		return vfio_pci_core_feature_token(device, flags, arg, argsz);
> +		return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
> +	case VFIO_DEVICE_FEATURE_DMA_BUF:
> +		return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
>  	default:
>  		return -ENOTTY;
>  	}
> @@ -1881,6 +1890,7 @@ void vfio_pci_core_init_device(struct vfio_pci_core_device *vdev,
>  	INIT_LIST_HEAD(&vdev->vma_list);
>  	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
>  	init_rwsem(&vdev->memory_lock);
> +	INIT_LIST_HEAD(&vdev->dmabufs);
>  }
>  EXPORT_SYMBOL_GPL(vfio_pci_core_init_device);
>  
> @@ -2227,11 +2237,17 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
>  	 * cause the PCI config space reset without restoring the original
>  	 * state (saved locally in 'vdev->pm_save').
>  	 */
> -	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
> +	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
> +		vfio_pci_dma_buf_move(cur, true);
>  		vfio_pci_set_power_state(cur, PCI_D0);
> +	}
>  
>  	ret = pci_reset_bus(pdev);
>  
> +	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
> +		if (__vfio_pci_memory_enabled(cur))
> +			vfio_pci_dma_buf_move(cur, false);
> +
>  err_undo:
>  	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
>  		if (cur == cur_mem)
> diff --git a/drivers/vfio/pci/vfio_pci_dma_buf.c b/drivers/vfio/pci/vfio_pci_dma_buf.c
> new file mode 100644
> index 00000000000000..ac32405de5e48d
> --- /dev/null
> +++ b/drivers/vfio/pci/vfio_pci_dma_buf.c
> @@ -0,0 +1,265 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> + */
> +#include <linux/dma-buf.h>
> +#include <linux/pci-p2pdma.h>
> +#include <linux/dma-resv.h>
> +
> +#include "vfio_pci_priv.h"
> +
> +MODULE_IMPORT_NS(DMA_BUF);
> +
> +struct vfio_pci_dma_buf {
> +	struct dma_buf *dmabuf;
> +	struct vfio_pci_core_device *vdev;
> +	struct list_head dmabufs_elm;
> +	unsigned int index;
> +	size_t offset;
> +	bool revoked;
> +};
> +
> +static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
> +				   struct dma_buf_attachment *attachment)
> +{
> +	struct vfio_pci_dma_buf *priv = dmabuf->priv;
> +	int rc;
> +
> +	rc = pci_p2pdma_distance_many(priv->vdev->pdev, &attachment->dev, 1,
> +				      true);
> +	if (rc < 0)
> +		attachment->peer2peer = false;
> +	return 0;
> +}
> +
> +static void vfio_pci_dma_buf_unpin(struct dma_buf_attachment *attachment)
> +{
> +}
> +
> +static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment)
> +{
> +	/*
> +	 * Uses the dynamic interface but must always allow for
> +	 * dma_buf_move_notify() to do revoke
> +	 */
> +	return -EINVAL;
> +}
> +
> +static struct sg_table *
> +vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
> +		     enum dma_data_direction dir)
> +{
> +	size_t sgl_size = dma_get_max_seg_size(attachment->dev);
> +	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> +	struct scatterlist *sgl;
> +	struct sg_table *sgt;
> +	dma_addr_t dma_addr;
> +	unsigned int nents;
> +	size_t offset;
> +	int ret;
> +
> +	dma_resv_assert_held(priv->dmabuf->resv);
> +
> +	if (!attachment->peer2peer)
> +		return ERR_PTR(-EPERM);
> +
> +	if (!priv->revoked)
> +		return ERR_PTR(-ENODEV);
> +
> +	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> +	if (!sgt)
> +		return ERR_PTR(-ENOMEM);
> +
> +	nents = DIV_ROUND_UP(priv->dmabuf->size, sgl_size);
> +	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
> +	if (ret)
> +		goto err_kfree_sgt;
> +
> +	/*
> +	 * Since the memory being mapped is a device memory it could never be in
> +	 * CPU caches.
> +	 */
> +	dma_addr = dma_map_resource(
> +		attachment->dev,
> +		pci_resource_start(priv->vdev->pdev, priv->index) +
> +			priv->offset,
> +		priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
dma_map_resource() maps the physical address to an IOVA in the device's
default_domain, which, however, may not be the domain the device is
currently attached to.
So will the importer of this sgt convert this dma_addr back to a
physical address?

BTW, I don't see the assignment of priv->index in
vfio_pci_core_feature_dma_buf() below; is it meant to equal
get_dma_buf.region_index?

Thanks
Yan

> +	ret = dma_mapping_error(attachment->dev, dma_addr);
> +	if (ret)
> +		goto err_free_sgt;
> +
> +	/*
> +	 * Break the BAR's physical range up into max sized SGL's according to
> +	 * the device's requirement.
> +	 */
> +	sgl = sgt->sgl;
> +	for (offset = 0; offset != priv->dmabuf->size;) {
> +		size_t chunk_size = min(priv->dmabuf->size - offset, sgl_size);
> +
> +		sg_set_page(sgl, NULL, chunk_size, 0);
> +		sg_dma_address(sgl) = dma_addr + offset;
> +		sg_dma_len(sgl) = chunk_size;
> +		sgl = sg_next(sgl);
> +		offset += chunk_size;
> +	}
> +
> +	/*
> +	 * Because we are not going to include a CPU list we want to have some
> +	 * chance that other users will detect this by setting the orig_nents to
> +	 * 0 and using only nents (length of DMA list) when going over the sgl
> +	 */
> +	sgt->orig_nents = 0;
> +	return sgt;
> +
> +err_free_sgt:
> +	sg_free_table(sgt);
> +err_kfree_sgt:
> +	kfree(sgt);
> +	return ERR_PTR(ret);
> +}
> +
> +static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
> +				   struct sg_table *sgt,
> +				   enum dma_data_direction dir)
> +{
> +	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> +
> +	dma_unmap_resource(attachment->dev, sg_dma_address(sgt->sgl),
> +			   priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
> +	sg_free_table(sgt);
> +	kfree(sgt);
> +}
> +
> +static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
> +{
> +	struct vfio_pci_dma_buf *priv = dmabuf->priv;
> +
> +	/*
> +	 * Either this or vfio_pci_dma_buf_cleanup() will remove from the list.
> +	 * The refcount prevents both.
> +	 */
> +	if (priv->vdev) {
> +		down_write(&priv->vdev->memory_lock);
> +		list_del_init(&priv->dmabufs_elm);
> +		up_write(&priv->vdev->memory_lock);
> +		vfio_device_put(&priv->vdev->vdev);
> +	}
> +	kfree(priv);
> +}
> +
> +static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
> +	.attach = vfio_pci_dma_buf_attach,
> +	.map_dma_buf = vfio_pci_dma_buf_map,
> +	.pin = vfio_pci_dma_buf_pin,
> +	.unpin = vfio_pci_dma_buf_unpin,
> +	.release = vfio_pci_dma_buf_release,
> +	.unmap_dma_buf = vfio_pci_dma_buf_unmap,
> +};
> +
> +int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> +				  struct vfio_device_feature_dma_buf __user *arg,
> +				  size_t argsz)
> +{
> +	struct vfio_device_feature_dma_buf get_dma_buf;
> +	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> +	struct vfio_pci_dma_buf *priv;
> +	int ret;
> +
> +	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
> +				 sizeof(get_dma_buf));
> +	if (ret != 1)
> +		return ret;
> +
> +	if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
> +		return -EFAULT;
> +
> +	/* For PCI the region_index is the BAR number like everything else */
> +	if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX)
> +		return -EINVAL;
> +
> +	exp_info.ops = &vfio_pci_dmabuf_ops;
> +	exp_info.size = pci_resource_len(vdev->pdev, get_dma_buf.region_index);
> +	if (!exp_info.size)
> +		return -EINVAL;
> +	if (get_dma_buf.offset || get_dma_buf.length) {
> +		if (get_dma_buf.length > exp_info.size ||
> +		    get_dma_buf.offset >= exp_info.size ||
> +		    get_dma_buf.length > exp_info.size - get_dma_buf.offset ||
> +		    get_dma_buf.offset % PAGE_SIZE ||
> +		    get_dma_buf.length % PAGE_SIZE)
> +			return -EINVAL;
> +		exp_info.size = get_dma_buf.length;
> +	}
> +	exp_info.flags = get_dma_buf.open_flags;
> +
> +	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> +	if (!priv)
> +		return -ENOMEM;
> +	INIT_LIST_HEAD(&priv->dmabufs_elm);
> +	priv->offset = get_dma_buf.offset;
> +
> +	exp_info.priv = priv;
> +	priv->dmabuf = dma_buf_export(&exp_info);
> +	if (IS_ERR(priv->dmabuf)) {
> +		ret = PTR_ERR(priv->dmabuf);
> +		kfree(priv);
> +		return ret;
> +	}
> +
> +	/* dma_buf_put() now frees priv */
> +
> +	down_write(&vdev->memory_lock);
> +	dma_resv_lock(priv->dmabuf->resv, NULL);
> +	priv->revoked = __vfio_pci_memory_enabled(vdev);
> +	priv->vdev = vdev;
> +	vfio_device_get(&vdev->vdev);
> +	list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
> +	dma_resv_unlock(priv->dmabuf->resv);
> +	up_write(&vdev->memory_lock);
> +
> +	/*
> +	 * dma_buf_fd() consumes the reference, when the file closes the dmabuf
> +	 * will be released.
> +	 */
> +	return dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags);
> +}
> +
> +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> +{
> +	struct vfio_pci_dma_buf *priv;
> +	struct vfio_pci_dma_buf *tmp;
> +
> +	lockdep_assert_held_write(&vdev->memory_lock);
> +
> +	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> +		if (!dma_buf_try_get(priv->dmabuf))
> +			continue;
> +		if (priv->revoked != revoked) {
> +			dma_resv_lock(priv->dmabuf->resv, NULL);
> +			priv->revoked = revoked;
> +			dma_buf_move_notify(priv->dmabuf);
> +			dma_resv_unlock(priv->dmabuf->resv);
> +		}
> +		dma_buf_put(priv->dmabuf);
> +	}
> +}
> +
> +void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
> +{
> +	struct vfio_pci_dma_buf *priv;
> +	struct vfio_pci_dma_buf *tmp;
> +
> +	down_write(&vdev->memory_lock);
> +	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> +		if (!dma_buf_try_get(priv->dmabuf))
> +			continue;
> +		dma_resv_lock(priv->dmabuf->resv, NULL);
> +		list_del_init(&priv->dmabufs_elm);
> +		priv->vdev = NULL;
> +		priv->revoked = true;
> +		dma_buf_move_notify(priv->dmabuf);
> +		dma_resv_unlock(priv->dmabuf->resv);
> +		vfio_device_put(&vdev->vdev);
> +		dma_buf_put(priv->dmabuf);
> +	}
> +	up_write(&vdev->memory_lock);
> +}
> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
> index 5b1cb9a9838442..c295a1166e7a9c 100644
> --- a/drivers/vfio/pci/vfio_pci_priv.h
> +++ b/drivers/vfio/pci/vfio_pci_priv.h
> @@ -102,4 +102,27 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
>  	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
>  }
>  
> +#ifdef CONFIG_DMA_SHARED_BUFFER
> +int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> +				  struct vfio_device_feature_dma_buf __user *arg,
> +				  size_t argsz);
> +void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
> +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
> +#else
> +static int
> +vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> +			      struct vfio_device_feature_dma_buf __user *arg,
> +			      size_t argsz)
> +{
> +	return -ENOTTY;
> +}
> +static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
> +{
> +}
> +static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
> +					 bool revoked)
> +{
> +}
> +#endif
> +
>  #endif
> diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
> index 9d18b832e61a0d..b57f4ecc2665e1 100644
> --- a/include/linux/vfio_pci_core.h
> +++ b/include/linux/vfio_pci_core.h
> @@ -93,6 +93,7 @@ struct vfio_pci_core_device {
>  	struct mutex		vma_lock;
>  	struct list_head	vma_list;
>  	struct rw_semaphore	memory_lock;
> +	struct list_head	dmabufs;
>  };
>  
>  /* Will be exported for vfio pci drivers usage */
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 733a1cddde30a5..1dcfad6dcb6863 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -986,6 +986,24 @@ enum vfio_device_mig_state {
>  	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
>  };
>  
> +/**
> + * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the
> + * region selected.
> + *
> + * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,
> + * etc. offset/length specify a slice of the region to create the dmabuf from.
> + * If both are 0 then the whole region is used.
> + *
> + * Return: The fd number on success, -1 and errno is set on failure.
> + */
> +struct vfio_device_feature_dma_buf {
> +	__u32 region_index;
> +	__u32 open_flags;
> +	__u32 offset;
> +	__u64 length;
> +};
> +#define VFIO_DEVICE_FEATURE_DMA_BUF 3
> +
>  /* -------- API for Type1 VFIO IOMMU -------- */
>  
>  /**
> -- 
> 2.37.2
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf
  2022-08-29  5:04   ` Yan Zhao
@ 2022-08-29 12:26     ` Jason Gunthorpe
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2022-08-29 12:26 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Alex Williamson, Christian König, Cornelia Huck, dri-devel,
	kvm, linaro-mm-sig, linux-media, Sumit Semwal, Daniel Vetter,
	Leon Romanovsky, linux-rdma, Maor Gottlieb, Oded Gabbay

On Mon, Aug 29, 2022 at 01:04:05PM +0800, Yan Zhao wrote:

> > +static struct sg_table *
> > +vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
> > +		     enum dma_data_direction dir)
> > +{
> > +	size_t sgl_size = dma_get_max_seg_size(attachment->dev);
> > +	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
> > +	struct scatterlist *sgl;
> > +	struct sg_table *sgt;
> > +	dma_addr_t dma_addr;
> > +	unsigned int nents;
> > +	size_t offset;
> > +	int ret;
> > +
> > +	dma_resv_assert_held(priv->dmabuf->resv);
> > +
> > +	if (!attachment->peer2peer)
> > +		return ERR_PTR(-EPERM);
> > +
> > +	if (!priv->revoked)
> > +		return ERR_PTR(-ENODEV);
> > +
> > +	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> > +	if (!sgt)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	nents = DIV_ROUND_UP(priv->dmabuf->size, sgl_size);
> > +	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
> > +	if (ret)
> > +		goto err_kfree_sgt;
> > +
> > +	/*
> > +	 * Since the memory being mapped is a device memory it could never be in
> > +	 * CPU caches.
> > +	 */
> > +	dma_addr = dma_map_resource(
> > +		attachment->dev,
> > +		pci_resource_start(priv->vdev->pdev, priv->index) +
> > +			priv->offset,
> > +		priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
> dma_map_resource() maps the physical address to an IOVA in the device's
> default_domain, which, however, may not be the domain the device is
> currently attached to.

dmabuf is defined only in terms of the DMA API, so it creates an SGL
with DMA-API-mapped dma_addrs.

> So will the importer of this sgt convert this dma_addr back to a
> physical address?

No, it is illegal to call this API if the importer is not using the
DMA API.

I'm expecting to propose some new dmabuf API for this use-case, when
we get to it. For now all importers use the DMA API.
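
To illustrate what "using the DMA API" means for a consumer of this sgt
(a sketch, not code from any existing importer): because orig_nents is 0,
only the DMA side of the table is valid and the importer walks nents
entries:

	struct scatterlist *sg;
	int i;

	/* No CPU list here: orig_nents == 0, so iterate the DMA entries only */
	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
		dma_addr_t addr = sg_dma_address(sg);
		unsigned int len = sg_dma_len(sg);

		/* hand addr/len to the importing device's DMA engine */
	}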

> BTW, I don't see the assignment of priv->index in
> vfio_pci_core_feature_dma_buf() below; is it meant to equal
> get_dma_buf.region_index?

Yes, I noticed that too and have fixed it - the testing was all done
on index 0.

Jason

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread

Thread overview: 20+ messages
2022-08-17 16:11 [PATCH 0/4] Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
2022-08-17 16:11 ` [PATCH 1/4] dma-buf: Add dma_buf_try_get() Jason Gunthorpe
2022-08-17 16:11 ` [PATCH 2/4] vfio: Add vfio_device_get() Jason Gunthorpe
2022-08-17 16:11 ` [PATCH 3/4] vfio_pci: Do not open code pci_try_reset_function() Jason Gunthorpe
2022-08-17 16:11 ` [PATCH 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
2022-08-21 13:51   ` Fwd: " Oded Gabbay
2022-08-26 18:10     ` Jason Gunthorpe
2022-08-29  5:04   ` Yan Zhao
2022-08-29 12:26     ` Jason Gunthorpe
2022-08-18 11:07 ` [PATCH 0/4] " Christian König
2022-08-18 12:03   ` Jason Gunthorpe
2022-08-18 12:58     ` Christian König
2022-08-18 13:16       ` Jason Gunthorpe
2022-08-18 13:37         ` Christian König
2022-08-19 13:11           ` Jason Gunthorpe
2022-08-19 13:33             ` Christian König
2022-08-19 13:39               ` Jason Gunthorpe
2022-08-19 13:47                 ` Christian König
2022-08-18 12:05 ` Jason Gunthorpe
2022-08-22 21:58   ` Alex Williamson
