* [RFC PATCH v3 0/4] RDMA: Add dma-buf support
@ 2020-10-04 19:12 Jianxin Xiong
  2020-10-04 19:12 ` [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region Jianxin Xiong
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Jianxin Xiong @ 2020-10-04 19:12 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Leon Romanovsky, Jason Gunthorpe, Doug Ledford, Daniel Vetter,
	Christian Koenig, Jianxin Xiong

When enabled, an RDMA-capable NIC can perform peer-to-peer transactions
over PCIe to access local memory located on another device. This often
leads to better performance than using a system memory buffer for RDMA
and copying data between the buffer and device memory.

The current kernel RDMA stack uses get_user_pages() to pin the physical
pages backing the user buffer and dma_map_sg_attrs() to obtain the DMA
addresses for memory access. This usually doesn't work for peer device
memory because such memory lacks associated page structures.
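
For comparison, the conventional pin-and-map path looks roughly like the
sketch below. This is only a simplified illustration (it is not the
in-tree umem.c code, it uses the _fast variant of get_user_pages(), and
error handling and teardown are omitted):

#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>

/* sketch: pin system-memory pages, then map them for DMA */
static int pin_and_map(struct device *dma_dev, unsigned long addr,
		       size_t size)
{
	unsigned int off = addr & ~PAGE_MASK;
	unsigned long npages = PAGE_ALIGN(size + off) >> PAGE_SHIFT;
	struct page **pages = kvcalloc(npages, sizeof(*pages), GFP_KERNEL);
	struct sg_table sgt;
	int pinned, nents;

	pinned = get_user_pages_fast(addr & PAGE_MASK, npages,
				     FOLL_WRITE | FOLL_LONGTERM, pages);

	sg_alloc_table_from_pages(&sgt, pages, pinned, off, size, GFP_KERNEL);

	/* only works because the pinned pages have struct page entries */
	nents = dma_map_sg_attrs(dma_dev, sgt.sgl, sgt.orig_nents,
				 DMA_BIDIRECTIONAL, 0);
	return nents > 0 ? 0 : -ENOMEM;
}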

Several mechanisms exist today to facilitate device memory access.

ZONE_DEVICE is a memory zone for device memory in the memory management
subsystem. It allows pages of device memory to be described with
specialized page structures, although what can be done with these page
structures may differ from system memory. ZONE_DEVICE is further
specialized into multiple memory types, such as one type for PCI
p2pmem/p2pdma and one type for HMM.

PCI p2pmem/p2pdma uses ZONE_DEVICE to represent device memory residing
in a PCI BAR and provides a set of calls to publish, discover, allocate,
and map such memory for peer-to-peer transactions. One property of the
API is that the buffer is allocated by the side that performs the DMA
transfer. This works well for the storage use case, but is awkward for
GPU-NIC communication, where the buffer is typically allocated by the
GPU driver rather than the NIC driver.

Heterogeneous Memory Management (HMM) utilizes mmu_interval_notifier
and ZONE_DEVICE to support a shared virtual address space and page
migration between system memory and device memory. HMM doesn't support
pinning device memory, because pages located on the device must be able
to migrate back to system memory when accessed by the CPU. Peer-to-peer
access is currently not supported by HMM.

Dma-buf is a standard mechanism for sharing buffers among different
device drivers. The buffer to be shared is exported by the owning
driver and imported by the driver that wants to use it. The exporter
provides a set of ops that the importer can call to pin and map the
buffer. In addition, a file descriptor can be associated with a dma-buf
object as a handle that can be passed to user space.
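
For reviewers less familiar with the importer side, the flow this series
builds on looks roughly as follows. This is only a sketch assuming a
dynamic exporter; the my_* names are placeholders, not symbols
introduced by this series:

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

static void my_move_notify(struct dma_buf_attachment *attach)
{
	/* invalidate whatever mapping was derived from @attach */
}

static const struct dma_buf_attach_ops my_attach_ops = {
	.allow_peer2peer = true,
	.move_notify = my_move_notify,
};

static int my_import(struct device *dma_device, int fd)
{
	struct dma_buf *dmabuf = dma_buf_get(fd);
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	if (IS_ERR(dmabuf))
		return PTR_ERR(dmabuf);

	/* dynamic attach is what conveys allow_peer2peer to the exporter */
	attach = dma_buf_dynamic_attach(dmabuf, dma_device,
					&my_attach_ops, NULL);
	if (IS_ERR(attach)) {
		dma_buf_put(dmabuf);
		return PTR_ERR(attach);
	}

	/* dynamic attachments are mapped under the reservation lock */
	dma_resv_lock(dmabuf->resv, NULL);
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	dma_resv_unlock(dmabuf->resv);
	if (IS_ERR(sgt)) {
		dma_buf_detach(dmabuf, attach);
		dma_buf_put(dmabuf);
		return PTR_ERR(sgt);
	}

	/* ... hand the DMA addresses in sgt to the hardware ... */

	dma_resv_lock(dmabuf->resv, NULL);
	dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
	dma_resv_unlock(dmabuf->resv);
	dma_buf_detach(dmabuf, attach);
	dma_buf_put(dmabuf);
	return 0;
}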

This patch series adds a dma-buf importer role to the RDMA stack in an
attempt to support RDMA using device memory such as GPU VRAM. Dma-buf is
chosen for a few reasons: first, the API is relatively simple and allows
a lot of flexibility in implementing the buffer manipulation ops.
Second, it doesn't require page structures. Third, dma-buf is already
supported in many GPU drivers. However, we are aware that existing GPU
drivers don't allow pinning device memory via the dma-buf interface;
pinning would simply cause the backing storage to migrate to system RAM.
True peer-to-peer access is only possible using dynamic attach, which
requires on-demand paging support from the NIC to work. For this reason,
this series only works with ODP-capable NICs.

This is the third version of the patch series. Here are the changes
from the previous version:
* Use dma_buf_dynamic_attach() instead of dma_buf_attach().
* Use the on-demand paging mechanism to avoid pinning the GPU memory.
* Instead of adding a new parameter to the device method for memory
  registration, pass all the attributes, including the file descriptor,
  as a structure.
* Define a new access flag for dma-buf based memory regions.
* Check for on-demand paging support in the new uverbs command.
This series consists of four patches. The first patch adds the common
code for importing a dma-buf from a file descriptor and mapping the
dma-buf pages. Patch 2 changes the signature of the driver methods for
user-space memory registration to accept a structure of attributes and
adds a dma-buf file descriptor to that structure; vendor drivers are
updated accordingly. Patch 3 adds dma-buf support to the mlx5 driver.
Patch 4 adds a new uverbs command for registering dma-buf based memory
regions.

Related user space RDMA library changes will be provided as a separate
patch series.

Jianxin Xiong (4):
  RDMA/umem: Support importing dma-buf as user memory region
  RDMA: Expand driver memory registration methods to support dma-buf
  RDMA/mlx5: Support dma-buf based userspace memory region
  RDMA/uverbs: Add uverbs command for dma-buf based MR registration

 drivers/infiniband/core/Makefile                |   2 +-
 drivers/infiniband/core/umem.c                  |   4 +
 drivers/infiniband/core/umem_dmabuf.c           | 291 ++++++++++++++++++++++++
 drivers/infiniband/core/umem_dmabuf.h           |  14 ++
 drivers/infiniband/core/umem_odp.c              |  12 +
 drivers/infiniband/core/uverbs_cmd.c            |  25 +-
 drivers/infiniband/core/uverbs_std_types_mr.c   | 115 ++++++++++
 drivers/infiniband/core/verbs.c                 |  15 +-
 drivers/infiniband/hw/bnxt_re/ib_verbs.c        |  23 +-
 drivers/infiniband/hw/bnxt_re/ib_verbs.h        |   4 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h          |   6 +-
 drivers/infiniband/hw/cxgb4/mem.c               |  19 +-
 drivers/infiniband/hw/efa/efa.h                 |   3 +-
 drivers/infiniband/hw/efa/efa_verbs.c           |  24 +-
 drivers/infiniband/hw/hns/hns_roce_device.h     |   8 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c         |  28 +--
 drivers/infiniband/hw/i40iw/i40iw_verbs.c       |  24 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h            |   7 +-
 drivers/infiniband/hw/mlx4/mr.c                 |  37 +--
 drivers/infiniband/hw/mlx5/mlx5_ib.h            |   8 +-
 drivers/infiniband/hw/mlx5/mr.c                 |  97 ++++++--
 drivers/infiniband/hw/mlx5/odp.c                |  22 +-
 drivers/infiniband/hw/mthca/mthca_provider.c    |  13 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c     |  23 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h     |   5 +-
 drivers/infiniband/hw/qedr/verbs.c              |  25 +-
 drivers/infiniband/hw/qedr/verbs.h              |   4 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c    |  12 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h    |   4 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c    |  24 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h |   4 +-
 drivers/infiniband/sw/rdmavt/mr.c               |  21 +-
 drivers/infiniband/sw/rdmavt/mr.h               |   4 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c           |  10 +-
 drivers/infiniband/sw/siw/siw_verbs.c           |  26 ++-
 drivers/infiniband/sw/siw/siw_verbs.h           |   5 +-
 include/rdma/ib_umem.h                          |  19 +-
 include/rdma/ib_verbs.h                         |  21 +-
 include/uapi/rdma/ib_user_ioctl_cmds.h          |  14 ++
 include/uapi/rdma/ib_user_ioctl_verbs.h         |   2 +
 40 files changed, 799 insertions(+), 225 deletions(-)
 create mode 100644 drivers/infiniband/core/umem_dmabuf.c
 create mode 100644 drivers/infiniband/core/umem_dmabuf.h

-- 
1.8.3.1


* [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-04 19:12 [RFC PATCH v3 0/4] RDMA: Add dma-buf support Jianxin Xiong
@ 2020-10-04 19:12 ` Jianxin Xiong
  2020-10-05 10:54   ` Christian König
  2020-10-05 13:13   ` Jason Gunthorpe
  2020-10-04 19:12 ` [RFC PATCH v3 2/4] RDMA: Expand driver memory registration methods to support dma-buf Jianxin Xiong
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 22+ messages in thread
From: Jianxin Xiong @ 2020-10-04 19:12 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Leon Romanovsky, Jason Gunthorpe, Doug Ledford, Daniel Vetter,
	Christian Koenig, Jianxin Xiong

Dma-buf is a standard cross-driver buffer sharing mechanism that can be
used to support peer-to-peer access from RDMA devices.

Device memory exported via dma-buf is associated with a file descriptor.
This is passed to user space as a property associated with the buffer
allocation. When the buffer is registered as a memory region, the file
descriptor is passed to the RDMA driver along with other parameters.

Implement the common code for importing a dma-buf object and mapping
its pages.
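
As an illustration of the intended use by a driver (the 'my_mn_ops'
notifier ops below are a placeholder; the real caller is added in the
mlx5 patch of this series):

umem = ib_umem_dmabuf_get(ib_dev, start, length, dmabuf_fd,
			  access_flags, &my_mn_ops);
if (IS_ERR(umem))
	return PTR_ERR(umem);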

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
---
 drivers/infiniband/core/Makefile      |   2 +-
 drivers/infiniband/core/umem.c        |   4 +
 drivers/infiniband/core/umem_dmabuf.c | 291 ++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/umem_dmabuf.h |  14 ++
 drivers/infiniband/core/umem_odp.c    |  12 ++
 include/rdma/ib_umem.h                |  19 ++-
 6 files changed, 340 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/umem_dmabuf.c
 create mode 100644 drivers/infiniband/core/umem_dmabuf.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 24cb71a..b8d51a7 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -40,5 +40,5 @@ ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
 				uverbs_std_types_srq.o \
 				uverbs_std_types_wq.o \
 				uverbs_std_types_qp.o
-ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
+ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
 ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 831bff8..59ec36c 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -2,6 +2,7 @@
  * Copyright (c) 2005 Topspin Communications.  All rights reserved.
  * Copyright (c) 2005 Cisco Systems.  All rights reserved.
  * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -42,6 +43,7 @@
 #include <rdma/ib_umem_odp.h>
 
 #include "uverbs.h"
+#include "umem_dmabuf.h"
 
 static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
 {
@@ -318,6 +320,8 @@ void ib_umem_release(struct ib_umem *umem)
 {
 	if (!umem)
 		return;
+	if (umem->is_dmabuf)
+		return ib_umem_dmabuf_release(umem);
 	if (umem->is_odp)
 		return ib_umem_odp_release(to_ib_umem_odp(umem));
 
diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
new file mode 100644
index 0000000..10ed646
--- /dev/null
+++ b/drivers/infiniband/core/umem_dmabuf.c
@@ -0,0 +1,291 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+/*
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>
+#include <linux/dma-mapping.h>
+#include <rdma/ib_umem_odp.h>
+
+#include "uverbs.h"
+
+struct ib_umem_dmabuf {
+	struct ib_umem_odp umem_odp;
+	struct dma_buf_attachment *attach;
+	struct sg_table *sgt;
+	atomic_t notifier_seq;
+};
+
+static inline struct ib_umem_dmabuf *to_ib_umem_dmabuf(struct ib_umem *umem)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(umem);
+	return container_of(umem_odp, struct ib_umem_dmabuf, umem_odp);
+}
+
+static void ib_umem_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
+	struct ib_umem_odp *umem_odp = &umem_dmabuf->umem_odp;
+	struct mmu_notifier_range range = {};
+	unsigned long current_seq;
+
+	/* no concurrent invalidation due to the dma_resv lock */
+
+	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
+
+	if (!umem_dmabuf->sgt)
+		return;
+
+	range.start = ib_umem_start(umem_odp);
+	range.end = ib_umem_end(umem_odp);
+	range.flags = MMU_NOTIFIER_RANGE_BLOCKABLE;
+	current_seq = atomic_read(&umem_dmabuf->notifier_seq);
+	umem_odp->notifier.ops->invalidate(&umem_odp->notifier, &range,
+					   current_seq);
+
+	atomic_inc(&umem_dmabuf->notifier_seq);
+}
+
+static struct dma_buf_attach_ops ib_umem_dmabuf_attach_ops = {
+	.allow_peer2peer = 1,
+	.move_notify = ib_umem_dmabuf_invalidate_cb,
+};
+
+static inline int ib_umem_dmabuf_init_odp(struct ib_umem_odp *umem_odp)
+{
+	size_t page_size = 1UL << umem_odp->page_shift;
+	unsigned long start;
+	unsigned long end;
+	size_t pages;
+
+	umem_odp->umem.is_odp = 1;
+	mutex_init(&umem_odp->umem_mutex);
+
+	start = ALIGN_DOWN(umem_odp->umem.address, page_size);
+	if (check_add_overflow(umem_odp->umem.address,
+			       (unsigned long)umem_odp->umem.length,
+			       &end))
+		return -EOVERFLOW;
+	end = ALIGN(end, page_size);
+	if (unlikely(end < page_size))
+		return -EOVERFLOW;
+
+	pages = (end - start) >> umem_odp->page_shift;
+	if (!pages)
+		return -EINVAL;
+
+	/* used for ib_umem_start() & ib_umem_end() */
+	umem_odp->notifier.interval_tree.start = start;
+	umem_odp->notifier.interval_tree.last = end - 1;
+
+	/* umem_odp->page_list is never used for dma-buf */
+
+	umem_odp->dma_list = kvcalloc(
+		pages, sizeof(*umem_odp->dma_list), GFP_KERNEL);
+	if (!umem_odp->dma_list)
+		return -ENOMEM;
+
+	return 0;
+}
+
+struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
+				   unsigned long addr, size_t size,
+				   int dmabuf_fd, int access,
+				   const struct mmu_interval_notifier_ops *ops)
+{
+	struct dma_buf *dmabuf;
+	struct ib_umem_dmabuf *umem_dmabuf;
+	struct ib_umem *umem;
+	struct ib_umem_odp *umem_odp;
+	unsigned long end;
+	long ret;
+
+	if (check_add_overflow(addr, size, &end))
+		return ERR_PTR(-EINVAL);
+
+	if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
+		return ERR_PTR(-EINVAL);
+
+	umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
+	if (!umem_dmabuf)
+		return ERR_PTR(-ENOMEM);
+
+	umem = &umem_dmabuf->umem_odp.umem;
+	umem->ibdev = device;
+	umem->length = size;
+	umem->address = addr;
+	umem->writable = ib_access_writable(access);
+	umem->is_dmabuf = 1;
+
+	dmabuf = dma_buf_get(dmabuf_fd);
+	if (IS_ERR(dmabuf)) {
+		ret = PTR_ERR(dmabuf);
+		goto out_free_umem;
+	}
+
+	/* always attach dynamically to pass the allow_peer2peer flag */
+	umem_dmabuf->attach = dma_buf_dynamic_attach(
+					dmabuf,
+					device->dma_device,
+					&ib_umem_dmabuf_attach_ops,
+					umem_dmabuf);
+	if (IS_ERR(umem_dmabuf->attach)) {
+		ret = PTR_ERR(umem_dmabuf->attach);
+		goto out_release_dmabuf;
+	}
+
+	umem_odp = &umem_dmabuf->umem_odp;
+	umem_odp->page_shift = PAGE_SHIFT;
+	if (access & IB_ACCESS_HUGETLB) {
+		/* don't support huge_tlb at this point */
+		ret = -EINVAL;
+		goto out_detach_dmabuf;
+	}
+
+	ret = ib_umem_dmabuf_init_odp(umem_odp);
+	if (ret)
+		goto out_detach_dmabuf;
+
+	umem_odp->notifier.ops = ops;
+	return umem;
+
+out_detach_dmabuf:
+	dma_buf_detach(dmabuf, umem_dmabuf->attach);
+
+out_release_dmabuf:
+	dma_buf_put(dmabuf);
+
+out_free_umem:
+	kfree(umem_dmabuf);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(ib_umem_dmabuf_get);
+
+unsigned long ib_umem_dmabuf_notifier_read_begin(struct ib_umem_odp *umem_odp)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(&umem_odp->umem);
+
+	return atomic_read(&umem_dmabuf->notifier_seq);
+}
+EXPORT_SYMBOL(ib_umem_dmabuf_notifier_read_begin);
+
+int ib_umem_dmabuf_notifier_read_retry(struct ib_umem_odp *umem_odp,
+				       unsigned long current_seq)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(&umem_odp->umem);
+
+	return (atomic_read(&umem_dmabuf->notifier_seq) != current_seq);
+}
+EXPORT_SYMBOL(ib_umem_dmabuf_notifier_read_retry);
+
+int ib_umem_dmabuf_map_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
+			     u64 access_mask, unsigned long current_seq)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(umem);
+	struct ib_umem_odp *umem_odp = &umem_dmabuf->umem_odp;
+	u64 start, end, addr;
+	int j, k, ret = 0, user_pages, pages, total_pages;
+	unsigned int page_shift;
+	size_t page_size;
+	struct scatterlist *sg;
+	struct sg_table *sgt;
+
+	if (access_mask == 0)
+		return -EINVAL;
+
+	if (user_virt < ib_umem_start(umem_odp) ||
+	    user_virt + bcnt > ib_umem_end(umem_odp))
+		return -EFAULT;
+
+	page_shift = umem_odp->page_shift;
+	page_size = 1UL << page_shift;
+	start = ALIGN_DOWN(user_virt, page_size);
+	end = ALIGN(user_virt + bcnt, page_size);
+	user_pages = (end - start) >> page_shift;
+
+	mutex_lock(&umem_odp->umem_mutex);
+
+	/* check for ongoing invalidations */
+	if (ib_umem_dmabuf_notifier_read_retry(umem_odp, current_seq)) {
+		ret = -EAGAIN;
+		goto out;
+	}
+
+	ret = user_pages;
+	if (umem_dmabuf->sgt)
+		goto out;
+
+	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
+	sgt = dma_buf_map_attachment(umem_dmabuf->attach,
+				     DMA_BIDIRECTIONAL);
+	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
+
+	if (IS_ERR(sgt)) {
+		ret = PTR_ERR(sgt);
+		goto out;
+	}
+
+	umem->sg_head = *sgt;
+	umem->nmap = sgt->nents;
+	umem_dmabuf->sgt = sgt;
+
+	k = 0;
+	total_pages = ib_umem_odp_num_pages(umem_odp);
+	for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
+		addr = sg_dma_address(sg);
+		pages = sg_dma_len(sg) >> page_shift;
+		while (pages > 0 && k < total_pages) {
+			umem_odp->dma_list[k++] = addr | access_mask;
+			umem_odp->npages++;
+			addr += page_size;
+			pages--;
+		}
+	}
+
+	WARN_ON(k != total_pages);
+
+out:
+	mutex_unlock(&umem_odp->umem_mutex);
+	return ret;
+}
+
+void ib_umem_dmabuf_unmap_pages(struct ib_umem *umem)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(umem);
+	struct ib_umem_odp *umem_odp = &umem_dmabuf->umem_odp;
+	int npages = ib_umem_odp_num_pages(umem_odp);
+	int i;
+
+	lockdep_assert_held(&umem_odp->umem_mutex);
+	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
+
+	if (!umem_dmabuf->sgt)
+		return;
+
+	dma_buf_unmap_attachment(umem_dmabuf->attach, umem_dmabuf->sgt,
+				 DMA_BIDIRECTIONAL);
+
+	umem_dmabuf->sgt = NULL;
+
+	for (i = 0; i < npages; i++)
+		umem_odp->dma_list[i] = 0;
+	umem_odp->npages = 0;
+}
+
+void ib_umem_dmabuf_release(struct ib_umem *umem)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(umem);
+	struct ib_umem_odp *umem_odp = &umem_dmabuf->umem_odp;
+	struct dma_buf *dmabuf = umem_dmabuf->attach->dmabuf;
+
+	mutex_lock(&umem_odp->umem_mutex);
+	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
+	ib_umem_dmabuf_unmap_pages(umem);
+	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
+	mutex_unlock(&umem_odp->umem_mutex);
+	kvfree(umem_odp->dma_list);
+	dma_buf_detach(dmabuf, umem_dmabuf->attach);
+	dma_buf_put(dmabuf);
+	kfree(umem_dmabuf);
+}
diff --git a/drivers/infiniband/core/umem_dmabuf.h b/drivers/infiniband/core/umem_dmabuf.h
new file mode 100644
index 0000000..b9378bd
--- /dev/null
+++ b/drivers/infiniband/core/umem_dmabuf.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+/*
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#ifndef UMEM_DMABUF_H
+#define UMEM_DMABUF_H
+
+int ib_umem_dmabuf_map_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
+			     u64 access_mask, unsigned long current_seq);
+void ib_umem_dmabuf_unmap_pages(struct ib_umem *umem);
+void ib_umem_dmabuf_release(struct ib_umem *umem);
+
+#endif /* UMEM_DMABUF_H */
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index cc6b4be..7e11619 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2014 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -47,6 +48,7 @@
 #include <rdma/ib_umem_odp.h>
 
 #include "uverbs.h"
+#include "umem_dmabuf.h"
 
 static inline int ib_init_umem_odp(struct ib_umem_odp *umem_odp,
 				   const struct mmu_interval_notifier_ops *ops)
@@ -263,6 +265,9 @@ struct ib_umem_odp *ib_umem_odp_get(struct ib_device *device,
 
 void ib_umem_odp_release(struct ib_umem_odp *umem_odp)
 {
+	if (umem_odp->umem.is_dmabuf)
+		return ib_umem_dmabuf_release(&umem_odp->umem);
+
 	/*
 	 * Ensure that no more pages are mapped in the umem.
 	 *
@@ -392,6 +397,10 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 	unsigned int flags = 0, page_shift;
 	phys_addr_t p = 0;
 
+	if (umem_odp->umem.is_dmabuf)
+		return ib_umem_dmabuf_map_pages(&umem_odp->umem, user_virt,
+						bcnt, access_mask, current_seq);
+
 	if (access_mask == 0)
 		return -EINVAL;
 
@@ -517,6 +526,9 @@ void ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *umem_odp, u64 virt,
 	u64 addr;
 	struct ib_device *dev = umem_odp->umem.ibdev;
 
+	if (umem_odp->umem.is_dmabuf)
+		return ib_umem_dmabuf_unmap_pages(&umem_odp->umem);
+
 	lockdep_assert_held(&umem_odp->umem_mutex);
 
 	virt = max_t(u64, virt, ib_umem_start(umem_odp));
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 71f573a..b8ea693 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
 /*
  * Copyright (c) 2007 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2020 Intel Corporation.  All rights reserved.
  */
 
 #ifndef IB_UMEM_H
@@ -13,6 +14,7 @@
 
 struct ib_ucontext;
 struct ib_umem_odp;
+struct ib_umem_dmabuf;
 
 struct ib_umem {
 	struct ib_device       *ibdev;
@@ -21,6 +23,7 @@ struct ib_umem {
 	unsigned long		address;
 	u32 writable : 1;
 	u32 is_odp : 1;
+	u32 is_dmabuf : 1;
 	struct work_struct	work;
 	struct sg_table sg_head;
 	int             nmap;
@@ -51,6 +54,13 @@ int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
 unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 				     unsigned long pgsz_bitmap,
 				     unsigned long virt);
+struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
+				   unsigned long addr, size_t size,
+				   int dmabuf_fd, int access,
+				   const struct mmu_interval_notifier_ops *ops);
+unsigned long ib_umem_dmabuf_notifier_read_begin(struct ib_umem_odp *umem_odp);
+int ib_umem_dmabuf_notifier_read_retry(struct ib_umem_odp *umem_odp,
+				       unsigned long current_seq);
 
 #else /* CONFIG_INFINIBAND_USER_MEM */
 
@@ -73,7 +83,14 @@ static inline int ib_umem_find_best_pgsz(struct ib_umem *umem,
 					 unsigned long virt) {
 	return -EINVAL;
 }
+static inline struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
+						 unsigned long addr,
+						 size_t size, int dmabuf_fd,
+						 int access,
+						 struct mmu_interval_notifier_ops *ops)
+{
+	return ERR_PTR(-EINVAL);
+}
 
 #endif /* CONFIG_INFINIBAND_USER_MEM */
-
 #endif /* IB_UMEM_H */
-- 
1.8.3.1


* [RFC PATCH v3 2/4] RDMA: Expand driver memory registration methods to support dma-buf
  2020-10-04 19:12 [RFC PATCH v3 0/4] RDMA: Add dma-buf support Jianxin Xiong
  2020-10-04 19:12 ` [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region Jianxin Xiong
@ 2020-10-04 19:12 ` Jianxin Xiong
  2020-10-04 19:12 ` [RFC PATCH v3 3/4] RDMA/mlx5: Support dma-buf based userspace memory region Jianxin Xiong
  2020-10-04 19:12 ` [RFC PATCH v3 4/4] RDMA/uverbs: Add uverbs command for dma-buf based MR registration Jianxin Xiong
  3 siblings, 0 replies; 22+ messages in thread
From: Jianxin Xiong @ 2020-10-04 19:12 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Leon Romanovsky, Jason Gunthorpe, Doug Ledford, Daniel Vetter,
	Christian Koenig, Jianxin Xiong

For better extensibility, the driver methods for user memory
registration are changed to accept a structure instead of individual
attributes of the memory region.

To support dma-buf based memory regions, an 'fd' field is added to the
structure and a new access flag is defined.
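
Judging from how the converted drivers use it in this patch, the new
structure carries at least the fields below. This is only an
illustrative summary; the authoritative definition is in the
include/rdma/ib_verbs.h hunk:

struct ib_user_mr_attr {
	u64	start;
	u64	length;
	u64	virt_addr;
	int	fd;		/* dma-buf file descriptor */
	int	access_flags;	/* may include IB_ACCESS_DMABUF */
};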

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
---
 drivers/infiniband/core/uverbs_cmd.c            | 25 +++++++++---
 drivers/infiniband/core/verbs.c                 | 15 ++++++--
 drivers/infiniband/hw/bnxt_re/ib_verbs.c        | 23 +++++------
 drivers/infiniband/hw/bnxt_re/ib_verbs.h        |  4 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h          |  6 +--
 drivers/infiniband/hw/cxgb4/mem.c               | 19 ++++-----
 drivers/infiniband/hw/efa/efa.h                 |  3 +-
 drivers/infiniband/hw/efa/efa_verbs.c           | 24 ++++++------
 drivers/infiniband/hw/hns/hns_roce_device.h     |  8 ++--
 drivers/infiniband/hw/hns/hns_roce_mr.c         | 28 +++++++-------
 drivers/infiniband/hw/i40iw/i40iw_verbs.c       | 24 +++++-------
 drivers/infiniband/hw/mlx4/mlx4_ib.h            |  7 ++--
 drivers/infiniband/hw/mlx4/mr.c                 | 37 +++++++++---------
 drivers/infiniband/hw/mlx5/mlx5_ib.h            |  8 ++--
 drivers/infiniband/hw/mlx5/mr.c                 | 51 +++++++++++++------------
 drivers/infiniband/hw/mthca/mthca_provider.c    | 13 ++++---
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c     | 23 ++++++-----
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h     |  5 ++-
 drivers/infiniband/hw/qedr/verbs.c              | 25 ++++++------
 drivers/infiniband/hw/qedr/verbs.h              |  4 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c    | 12 +++---
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h    |  4 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c    | 24 ++++++------
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h |  4 +-
 drivers/infiniband/sw/rdmavt/mr.c               | 21 +++++-----
 drivers/infiniband/sw/rdmavt/mr.h               |  4 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c           | 10 ++---
 drivers/infiniband/sw/siw/siw_verbs.c           | 26 +++++++------
 drivers/infiniband/sw/siw/siw_verbs.h           |  5 ++-
 include/rdma/ib_verbs.h                         | 21 +++++++---
 include/uapi/rdma/ib_user_ioctl_verbs.h         |  2 +
 31 files changed, 265 insertions(+), 220 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 2fbc583..b522204 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2005, 2006, 2007 Cisco Systems.  All rights reserved.
  * Copyright (c) 2005 PathScale, Inc.  All rights reserved.
  * Copyright (c) 2006 Mellanox Technologies.  All rights reserved.
+ * Copyright (c) 2020 Intel Corporation.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -694,6 +695,7 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 	struct ib_uobject           *uobj;
 	struct ib_pd                *pd;
 	struct ib_mr                *mr;
+	struct ib_user_mr_attr	     user_mr_attr;
 	int                          ret;
 	struct ib_device *ib_dev;
 
@@ -727,8 +729,17 @@ static int ib_uverbs_reg_mr(struct uverbs_attr_bundle *attrs)
 		}
 	}
 
-	mr = pd->device->ops.reg_user_mr(pd, cmd.start, cmd.length, cmd.hca_va,
-					 cmd.access_flags,
+	if (cmd.access_flags & IB_ACCESS_DMABUF) {
+		pr_debug("Dma-buf support not available via regular reg_mr call\n");
+		ret = -EINVAL;
+		goto err_put;
+	}
+
+	user_mr_attr.start = cmd.start;
+	user_mr_attr.length = cmd.length;
+	user_mr_attr.virt_addr = cmd.hca_va;
+	user_mr_attr.access_flags = cmd.access_flags;
+	mr = pd->device->ops.reg_user_mr(pd, &user_mr_attr,
 					 &attrs->driver_udata);
 	if (IS_ERR(mr)) {
 		ret = PTR_ERR(mr);
@@ -769,6 +780,7 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
 	struct ib_pd                *pd = NULL;
 	struct ib_mr                *mr;
 	struct ib_pd		    *old_pd;
+	struct ib_user_mr_attr	     user_mr_attr;
 	int                          ret;
 	struct ib_uobject	    *uobj;
 
@@ -811,9 +823,12 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
 	}
 
 	old_pd = mr->pd;
-	ret = mr->device->ops.rereg_user_mr(mr, cmd.flags, cmd.start,
-					    cmd.length, cmd.hca_va,
-					    cmd.access_flags, pd,
+	user_mr_attr.start = cmd.start;
+	user_mr_attr.length = cmd.length;
+	user_mr_attr.virt_addr = cmd.hca_va;
+	user_mr_attr.access_flags = cmd.access_flags;
+	ret = mr->device->ops.rereg_user_mr(mr, cmd.flags,
+					    &user_mr_attr, pd,
 					    &attrs->driver_udata);
 	if (ret)
 		goto put_uobj_pd;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 3096e73..574dc26 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1,7 +1,7 @@
 /*
  * Copyright (c) 2004 Mellanox Technologies Ltd.  All rights reserved.
  * Copyright (c) 2004 Infinicon Corporation.  All rights reserved.
- * Copyright (c) 2004 Intel Corporation.  All rights reserved.
+ * Copyright (c) 2004, 2020 Intel Corporation.  All rights reserved.
  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
  * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
@@ -2039,6 +2039,7 @@ int ib_resize_cq(struct ib_cq *cq, int cqe)
 struct ib_mr *ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 			     u64 virt_addr, int access_flags)
 {
+	struct ib_user_mr_attr attr;
 	struct ib_mr *mr;
 
 	if (access_flags & IB_ACCESS_ON_DEMAND) {
@@ -2049,8 +2050,16 @@ struct ib_mr *ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		}
 	}
 
-	mr = pd->device->ops.reg_user_mr(pd, start, length, virt_addr,
-					 access_flags, NULL);
+	if (access_flags & IB_ACCESS_DMABUF) {
+		pr_debug("Dma-buf support not available via kernel Verbs\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	attr.start = start;
+	attr.length = length;
+	attr.virt_addr = virt_addr;
+	attr.access_flags = access_flags;
+	mr = pd->device->ops.reg_user_mr(pd, &attr, NULL);
 
 	if (IS_ERR(mr))
 		return mr;
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 5ee272d..aae4861 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -3790,8 +3790,8 @@ static int fill_umem_pbl_tbl(struct ib_umem *umem, u64 *pbl_tbl_orig,
 }
 
 /* uverbs */
-struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
-				  u64 virt_addr, int mr_access_flags,
+struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd,
+				  struct ib_user_mr_attr *attr,
 				  struct ib_udata *udata)
 {
 	struct bnxt_re_pd *pd = container_of(ib_pd, struct bnxt_re_pd, ib_pd);
@@ -3801,9 +3801,9 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 	u64 *pbl_tbl = NULL;
 	int umem_pgs, page_shift, rc;
 
-	if (length > BNXT_RE_MAX_MR_SIZE) {
+	if (attr->length > BNXT_RE_MAX_MR_SIZE) {
 		ibdev_err(&rdev->ibdev, "MR Size: %lld > Max supported:%lld\n",
-			  length, BNXT_RE_MAX_MR_SIZE);
+			  attr->length, BNXT_RE_MAX_MR_SIZE);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -3813,7 +3813,7 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 
 	mr->rdev = rdev;
 	mr->qplib_mr.pd = &pd->qplib_pd;
-	mr->qplib_mr.flags = __from_ib_access_flags(mr_access_flags);
+	mr->qplib_mr.flags = __from_ib_access_flags(attr->access_flags);
 	mr->qplib_mr.type = CMDQ_ALLOCATE_MRW_MRW_FLAGS_MR;
 
 	rc = bnxt_qplib_alloc_mrw(&rdev->qplib_res, &mr->qplib_mr);
@@ -3824,7 +3824,8 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 	/* The fixed portion of the rkey is the same as the lkey */
 	mr->ib_mr.rkey = mr->qplib_mr.rkey;
 
-	umem = ib_umem_get(&rdev->ibdev, start, length, mr_access_flags);
+	umem = ib_umem_get(&rdev->ibdev, attr->start, attr->length,
+			   attr->access_flags);
 	if (IS_ERR(umem)) {
 		ibdev_err(&rdev->ibdev, "Failed to get umem");
 		rc = -EFAULT;
@@ -3832,14 +3833,14 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 	}
 	mr->ib_umem = umem;
 
-	mr->qplib_mr.va = virt_addr;
+	mr->qplib_mr.va = attr->virt_addr;
 	umem_pgs = ib_umem_page_count(umem);
 	if (!umem_pgs) {
 		ibdev_err(&rdev->ibdev, "umem is invalid!");
 		rc = -EINVAL;
 		goto free_umem;
 	}
-	mr->qplib_mr.total_size = length;
+	mr->qplib_mr.total_size = attr->length;
 
 	pbl_tbl = kcalloc(umem_pgs, sizeof(u64 *), GFP_KERNEL);
 	if (!pbl_tbl) {
@@ -3849,7 +3850,7 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 
 	page_shift = __ffs(ib_umem_find_best_pgsz(umem,
 				BNXT_RE_PAGE_SIZE_4K | BNXT_RE_PAGE_SIZE_2M,
-				virt_addr));
+				attr->virt_addr));
 
 	if (!bnxt_re_page_size_ok(page_shift)) {
 		ibdev_err(&rdev->ibdev, "umem page size unsupported!");
@@ -3858,9 +3859,9 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 	}
 
 	if (page_shift == BNXT_RE_PAGE_SHIFT_4K &&
-	    length > BNXT_RE_MAX_MR_SIZE_LOW) {
+	    attr->length > BNXT_RE_MAX_MR_SIZE_LOW) {
 		ibdev_err(&rdev->ibdev, "Requested MR Sz:%llu Max sup:%llu",
-			  length, (u64)BNXT_RE_MAX_MR_SIZE_LOW);
+			  attr->length, (u64)BNXT_RE_MAX_MR_SIZE_LOW);
 		rc = -EINVAL;
 		goto fail;
 	}
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.h b/drivers/infiniband/hw/bnxt_re/ib_verbs.h
index 1daeb30..a7538809 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.h
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.h
@@ -206,8 +206,8 @@ struct ib_mr *bnxt_re_alloc_mr(struct ib_pd *ib_pd, enum ib_mr_type mr_type,
 struct ib_mw *bnxt_re_alloc_mw(struct ib_pd *ib_pd, enum ib_mw_type type,
 			       struct ib_udata *udata);
 int bnxt_re_dealloc_mw(struct ib_mw *mw);
-struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-				  u64 virt_addr, int mr_access_flags,
+struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *pd,
+				  struct ib_user_mr_attr *attr,
 				  struct ib_udata *udata);
 int bnxt_re_alloc_ucontext(struct ib_ucontext *ctx, struct ib_udata *udata);
 void bnxt_re_dealloc_ucontext(struct ib_ucontext *context);
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 2b2b009..9b56538 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -987,9 +987,9 @@ int c4iw_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
 void c4iw_dealloc(struct uld_ctx *ctx);
 struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type,
 			    struct ib_udata *udata);
-struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start,
-					   u64 length, u64 virt, int acc,
-					   struct ib_udata *udata);
+struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd,
+			       struct ib_user_mr_attr *attr,
+			       struct ib_udata *udata);
 struct ib_mr *c4iw_get_dma_mr(struct ib_pd *pd, int acc);
 int c4iw_dereg_mr(struct ib_mr *ib_mr, struct ib_udata *udata);
 void c4iw_destroy_cq(struct ib_cq *ib_cq, struct ib_udata *udata);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 73936c3..9203037 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -504,8 +504,8 @@ struct ib_mr *c4iw_get_dma_mr(struct ib_pd *pd, int acc)
 	return ERR_PTR(ret);
 }
 
-struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-			       u64 virt, int acc, struct ib_udata *udata)
+struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, struct ib_user_mr_attr *attr,
+			       struct ib_udata *udata)
 {
 	__be64 *pages;
 	int shift, n, i;
@@ -517,16 +517,16 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 	pr_debug("ib_pd %p\n", pd);
 
-	if (length == ~0ULL)
+	if (attr->length == ~0ULL)
 		return ERR_PTR(-EINVAL);
 
-	if ((length + start) < start)
+	if ((attr->length + attr->start) < attr->start)
 		return ERR_PTR(-EINVAL);
 
 	php = to_c4iw_pd(pd);
 	rhp = php->rhp;
 
-	if (mr_exceeds_hw_limits(rhp, length))
+	if (mr_exceeds_hw_limits(rhp, attr->length))
 		return ERR_PTR(-EINVAL);
 
 	mhp = kzalloc(sizeof(*mhp), GFP_KERNEL);
@@ -542,7 +542,8 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 	mhp->rhp = rhp;
 
-	mhp->umem = ib_umem_get(pd->device, start, length, acc);
+	mhp->umem = ib_umem_get(pd->device, attr->start, attr->length,
+				attr->access_flags);
 	if (IS_ERR(mhp->umem))
 		goto err_free_skb;
 
@@ -586,10 +587,10 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 	mhp->attr.pdid = php->pdid;
 	mhp->attr.zbva = 0;
-	mhp->attr.perms = c4iw_ib_to_tpt_access(acc);
-	mhp->attr.va_fbo = virt;
+	mhp->attr.perms = c4iw_ib_to_tpt_access(attr->access_flags);
+	mhp->attr.va_fbo = attr->virt_addr;
 	mhp->attr.page_size = shift - 12;
-	mhp->attr.len = length;
+	mhp->attr.len = attr->length;
 
 	err = register_mem(rhp, php, mhp, shift);
 	if (err)
diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
index 1889dd1..a32a55f 100644
--- a/drivers/infiniband/hw/efa/efa.h
+++ b/drivers/infiniband/hw/efa/efa.h
@@ -142,8 +142,7 @@ struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
 void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata);
 int efa_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		  struct ib_udata *udata);
-struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, u64 start, u64 length,
-			 u64 virt_addr, int access_flags,
+struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, struct ib_user_mr_attr *attr,
 			 struct ib_udata *udata);
 int efa_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata);
 int efa_get_port_immutable(struct ib_device *ibdev, u8 port_num,
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index 9e201f169..d1452fb 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -1346,8 +1346,7 @@ static int efa_create_pbl(struct efa_dev *dev,
 	return 0;
 }
 
-struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, u64 start, u64 length,
-			 u64 virt_addr, int access_flags,
+struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, struct ib_user_mr_attr *attr,
 			 struct ib_udata *udata)
 {
 	struct efa_dev *dev = to_edev(ibpd->device);
@@ -1372,11 +1371,11 @@ struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, u64 start, u64 length,
 		IB_ACCESS_LOCAL_WRITE |
 		(is_rdma_read_cap(dev) ? IB_ACCESS_REMOTE_READ : 0);
 
-	access_flags &= ~IB_ACCESS_OPTIONAL;
-	if (access_flags & ~supp_access_flags) {
+	attr->access_flags &= ~IB_ACCESS_OPTIONAL;
+	if (attr->access_flags & ~supp_access_flags) {
 		ibdev_dbg(&dev->ibdev,
 			  "Unsupported access flags[%#x], supported[%#x]\n",
-			  access_flags, supp_access_flags);
+			  attr->access_flags, supp_access_flags);
 		err = -EOPNOTSUPP;
 		goto err_out;
 	}
@@ -1387,7 +1386,8 @@ struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, u64 start, u64 length,
 		goto err_out;
 	}
 
-	mr->umem = ib_umem_get(ibpd->device, start, length, access_flags);
+	mr->umem = ib_umem_get(ibpd->device, attr->start, attr->length,
+			       attr->access_flags);
 	if (IS_ERR(mr->umem)) {
 		err = PTR_ERR(mr->umem);
 		ibdev_dbg(&dev->ibdev,
@@ -1396,13 +1396,13 @@ struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, u64 start, u64 length,
 	}
 
 	params.pd = to_epd(ibpd)->pdn;
-	params.iova = virt_addr;
-	params.mr_length_in_bytes = length;
-	params.permissions = access_flags;
+	params.iova = attr->virt_addr;
+	params.mr_length_in_bytes = attr->length;
+	params.permissions = attr->access_flags;
 
 	pg_sz = ib_umem_find_best_pgsz(mr->umem,
 				       dev->dev_attr.page_size_cap,
-				       virt_addr);
+				       attr->virt_addr);
 	if (!pg_sz) {
 		err = -EOPNOTSUPP;
 		ibdev_dbg(&dev->ibdev, "Failed to find a suitable page size in page_size_cap %#llx\n",
@@ -1416,7 +1416,7 @@ struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, u64 start, u64 length,
 
 	ibdev_dbg(&dev->ibdev,
 		  "start %#llx length %#llx params.page_shift %u params.page_num %u\n",
-		  start, length, params.page_shift, params.page_num);
+		  attr->start, attr->length, params.page_shift, params.page_num);
 
 	inline_size = ARRAY_SIZE(params.pbl.inline_pbl_array);
 	if (params.page_num <= inline_size) {
@@ -1441,7 +1441,7 @@ struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, u64 start, u64 length,
 
 	mr->ibmr.lkey = result.l_key;
 	mr->ibmr.rkey = result.r_key;
-	mr->ibmr.length = length;
+	mr->ibmr.length = attr->length;
 	ibdev_dbg(&dev->ibdev, "Registered mr[%d]\n", mr->ibmr.lkey);
 
 	return &mr->ibmr;
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 6edcbdc..c94589d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -1184,11 +1184,11 @@ int hns_roce_create_ah(struct ib_ah *ah, struct rdma_ah_init_attr *init_attr,
 void hns_roce_dealloc_pd(struct ib_pd *pd, struct ib_udata *udata);
 
 struct ib_mr *hns_roce_get_dma_mr(struct ib_pd *pd, int acc);
-struct ib_mr *hns_roce_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-				   u64 virt_addr, int access_flags,
+struct ib_mr *hns_roce_reg_user_mr(struct ib_pd *pd,
+				   struct ib_user_mr_attr *attr,
 				   struct ib_udata *udata);
-int hns_roce_rereg_user_mr(struct ib_mr *mr, int flags, u64 start, u64 length,
-			   u64 virt_addr, int mr_access_flags, struct ib_pd *pd,
+int hns_roce_rereg_user_mr(struct ib_mr *mr, int flags,
+			   struct ib_user_mr_attr *attr, struct ib_pd *pd,
 			   struct ib_udata *udata);
 struct ib_mr *hns_roce_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type,
 				u32 max_num_sg);
diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
index e5df388..1e8ebba 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -259,8 +259,8 @@ struct ib_mr *hns_roce_get_dma_mr(struct ib_pd *pd, int acc)
 	return ERR_PTR(ret);
 }
 
-struct ib_mr *hns_roce_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-				   u64 virt_addr, int access_flags,
+struct ib_mr *hns_roce_reg_user_mr(struct ib_pd *pd,
+				   struct ib_user_mr_attr *attr,
 				   struct ib_udata *udata)
 {
 	struct hns_roce_dev *hr_dev = to_hr_dev(pd->device);
@@ -272,12 +272,13 @@ struct ib_mr *hns_roce_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		return ERR_PTR(-ENOMEM);
 
 	mr->type = MR_TYPE_MR;
-	ret = alloc_mr_key(hr_dev, mr, to_hr_pd(pd)->pdn, virt_addr, length,
-			   access_flags);
+	ret = alloc_mr_key(hr_dev, mr, to_hr_pd(pd)->pdn, attr->virt_addr,
+			   attr->length, attr->access_flags);
 	if (ret)
 		goto err_alloc_mr;
 
-	ret = alloc_mr_pbl(hr_dev, mr, length, udata, start, access_flags);
+	ret = alloc_mr_pbl(hr_dev, mr, attr->length, udata, attr->start,
+			   attr->access_flags);
 	if (ret)
 		goto err_alloc_key;
 
@@ -286,7 +287,7 @@ struct ib_mr *hns_roce_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		goto err_alloc_pbl;
 
 	mr->ibmr.rkey = mr->ibmr.lkey = mr->key;
-	mr->ibmr.length = length;
+	mr->ibmr.length = attr->length;
 
 	return &mr->ibmr;
 
@@ -328,8 +329,8 @@ static int rereg_mr_trans(struct ib_mr *ibmr, int flags,
 	return ret;
 }
 
-int hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start, u64 length,
-			   u64 virt_addr, int mr_access_flags, struct ib_pd *pd,
+int hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags,
+			   struct ib_user_mr_attr *attr,
 			   struct ib_udata *udata)
 {
 	struct hns_roce_dev *hr_dev = to_hr_dev(ibmr->device);
@@ -365,15 +366,16 @@ int hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start, u64 length,
 
 	if (flags & IB_MR_REREG_TRANS) {
 		ret = rereg_mr_trans(ibmr, flags,
-				     start, length,
-				     virt_addr, mr_access_flags,
+				     attr->start, attr->length,
+				     attr->virt_addr, attr->access_flags,
 				     mailbox, pdn, udata);
 		if (ret)
 			goto free_cmd_mbox;
 	} else {
 		ret = hr_dev->hw->rereg_write_mtpt(hr_dev, mr, flags, pdn,
-						   mr_access_flags, virt_addr,
-						   length, mailbox->buf);
+						   attr->access_flags,
+						   attr->virt_addr,
+						   attr->length, mailbox->buf);
 		if (ret)
 			goto free_cmd_mbox;
 	}
@@ -386,7 +388,7 @@ int hns_roce_rereg_user_mr(struct ib_mr *ibmr, int flags, u64 start, u64 length,
 
 	mr->enabled = 1;
 	if (flags & IB_MR_REREG_ACCESS)
-		mr->access = mr_access_flags;
+		mr->access = attr->access_flags;
 
 	hns_roce_free_cmd_mailbox(hr_dev, mailbox);
 
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index b513393..4c3ca7e 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -1722,17 +1722,11 @@ static int i40iw_hwreg_mr(struct i40iw_device *iwdev,
 /**
  * i40iw_reg_user_mr - Register a user memory region
  * @pd: ptr of pd
- * @start: virtual start address
- * @length: length of mr
- * @virt: virtual address
- * @acc: access of mr
+ * @attr: attributes for user mr
  * @udata: user data
  */
 static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
-				       u64 start,
-				       u64 length,
-				       u64 virt,
-				       int acc,
+				       struct ib_user_mr_attr *attr,
 				       struct ib_udata *udata)
 {
 	struct i40iw_pd *iwpd = to_iwpd(pd);
@@ -1760,9 +1754,11 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
 	if (iwdev->closing)
 		return ERR_PTR(-ENODEV);
 
-	if (length > I40IW_MAX_MR_SIZE)
+	if (attr->length > I40IW_MAX_MR_SIZE)
 		return ERR_PTR(-EINVAL);
-	region = ib_umem_get(pd->device, start, length, acc);
+
+	region = ib_umem_get(pd->device, attr->start, attr->length,
+			     attr->access_flags);
 	if (IS_ERR(region))
 		return (struct ib_mr *)region;
 
@@ -1786,15 +1782,15 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
 	iwmr->page_size = PAGE_SIZE;
 	if (req.reg_type == IW_MEMREG_TYPE_MEM)
 		iwmr->page_size = ib_umem_find_best_pgsz(region, SZ_4K | SZ_2M,
-							 virt);
+							 attr->virt_addr);
 
-	region_length = region->length + (start & (iwmr->page_size - 1));
+	region_length = region->length + (attr->start & (iwmr->page_size - 1));
 	pg_shift = ffs(iwmr->page_size) - 1;
 	pbl_depth = region_length >> pg_shift;
 	pbl_depth += (region_length & (iwmr->page_size - 1)) ? 1 : 0;
 	iwmr->length = region->length;
 
-	iwpbl->user_base = virt;
+	iwpbl->user_base = attr->virt_addr;
 	palloc = &iwpbl->pble_alloc;
 
 	iwmr->type = req.reg_type;
@@ -1838,7 +1834,7 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
 			}
 		}
 
-		access |= i40iw_get_user_access(acc);
+		access |= i40iw_get_user_access(attr->access_flags);
 		stag = i40iw_create_stag(iwdev);
 		if (!stag) {
 			err = -ENOMEM;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 38e87a7..52c41ef 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -721,8 +721,8 @@ int mlx4_ib_db_map_user(struct ib_udata *udata, unsigned long virt,
 struct ib_mr *mlx4_ib_get_dma_mr(struct ib_pd *pd, int acc);
 int mlx4_ib_umem_write_mtt(struct mlx4_ib_dev *dev, struct mlx4_mtt *mtt,
 			   struct ib_umem *umem);
-struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-				  u64 virt_addr, int access_flags,
+struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd,
+				  struct ib_user_mr_attr *attr,
 				  struct ib_udata *udata);
 int mlx4_ib_dereg_mr(struct ib_mr *mr, struct ib_udata *udata);
 struct ib_mw *mlx4_ib_alloc_mw(struct ib_pd *pd, enum ib_mw_type type,
@@ -876,8 +876,7 @@ void mlx4_ib_slave_alias_guid_event(struct mlx4_ib_dev *dev, int slave,
 int mlx4_ib_steer_qp_reg(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 			 int is_attach);
 int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags,
-			  u64 start, u64 length, u64 virt_addr,
-			  int mr_access_flags, struct ib_pd *pd,
+			  struct ib_user_mr_attr *attr, struct ib_pd *pd,
 			  struct ib_udata *udata);
 int mlx4_ib_gid_index_to_real_index(struct mlx4_ib_dev *ibdev,
 				    const struct ib_gid_attr *attr);
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 1d5ef0d..79752f2c 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -401,8 +401,8 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_device *device, u64 start,
 	return ib_umem_get(device, start, length, access_flags);
 }
 
-struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-				  u64 virt_addr, int access_flags,
+struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd,
+				  struct ib_user_mr_attr *attr,
 				  struct ib_udata *udata)
 {
 	struct mlx4_ib_dev *dev = to_mdev(pd->device);
@@ -415,17 +415,19 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	if (!mr)
 		return ERR_PTR(-ENOMEM);
 
-	mr->umem = mlx4_get_umem_mr(pd->device, start, length, access_flags);
+	mr->umem = mlx4_get_umem_mr(pd->device, attr->start, attr->length,
+				    attr->access_flags);
 	if (IS_ERR(mr->umem)) {
 		err = PTR_ERR(mr->umem);
 		goto err_free;
 	}
 
 	n = ib_umem_page_count(mr->umem);
-	shift = mlx4_ib_umem_calc_optimal_mtt_size(mr->umem, start, &n);
+	shift = mlx4_ib_umem_calc_optimal_mtt_size(mr->umem, attr->start, &n);
 
-	err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
-			    convert_access(access_flags), n, shift, &mr->mmr);
+	err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, attr->virt_addr,
+			    attr->length, convert_access(attr->access_flags),
+			    n, shift, &mr->mmr);
 	if (err)
 		goto err_umem;
 
@@ -438,7 +440,7 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		goto err_mr;
 
 	mr->ibmr.rkey = mr->ibmr.lkey = mr->mmr.key;
-	mr->ibmr.length = length;
+	mr->ibmr.length = attr->length;
 	mr->ibmr.page_size = 1U << shift;
 
 	return &mr->ibmr;
@@ -456,8 +458,7 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 }
 
 int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags,
-			  u64 start, u64 length, u64 virt_addr,
-			  int mr_access_flags, struct ib_pd *pd,
+			  struct ib_user_mr_attr *attr, struct ib_pd *pd,
 			  struct ib_udata *udata)
 {
 	struct mlx4_ib_dev *dev = to_mdev(mr->device);
@@ -484,14 +485,14 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags,
 	}
 
 	if (flags & IB_MR_REREG_ACCESS) {
-		if (ib_access_writable(mr_access_flags) &&
+		if (ib_access_writable(attr->access_flags) &&
 		    !mmr->umem->writable) {
 			err = -EPERM;
 			goto release_mpt_entry;
 		}
 
 		err = mlx4_mr_hw_change_access(dev->dev, *pmpt_entry,
-					       convert_access(mr_access_flags));
+					       convert_access(attr->access_flags));
 
 		if (err)
 			goto release_mpt_entry;
@@ -503,8 +504,8 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags,
 
 		mlx4_mr_rereg_mem_cleanup(dev->dev, &mmr->mmr);
 		ib_umem_release(mmr->umem);
-		mmr->umem = mlx4_get_umem_mr(mr->device, start, length,
-					     mr_access_flags);
+		mmr->umem = mlx4_get_umem_mr(mr->device, attr->start, attr->length,
+					     attr->access_flags);
 		if (IS_ERR(mmr->umem)) {
 			err = PTR_ERR(mmr->umem);
 			/* Prevent mlx4_ib_dereg_mr from free'ing invalid pointer */
@@ -515,14 +516,14 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags,
 		shift = PAGE_SHIFT;
 
 		err = mlx4_mr_rereg_mem_write(dev->dev, &mmr->mmr,
-					      virt_addr, length, n, shift,
-					      *pmpt_entry);
+					      attr->virt_addr, attr->length,
+					      n, shift, *pmpt_entry);
 		if (err) {
 			ib_umem_release(mmr->umem);
 			goto release_mpt_entry;
 		}
-		mmr->mmr.iova       = virt_addr;
-		mmr->mmr.size       = length;
+		mmr->mmr.iova       = attr->virt_addr;
+		mmr->mmr.size       = attr->length;
 
 		err = mlx4_ib_umem_write_mtt(dev, &mmr->mmr.mtt, mmr->umem);
 		if (err) {
@@ -537,7 +538,7 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags,
 	 */
 	err = mlx4_mr_hw_write_mpt(dev->dev, &mmr->mmr, pmpt_entry);
 	if (!err && flags & IB_MR_REREG_ACCESS)
-		mmr->mmr.access = mr_access_flags;
+		mmr->mmr.access = attr->access_flags;
 
 release_mpt_entry:
 	mlx4_mr_hw_put_mpt(dev->dev, pmpt_entry);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 5287fc8..76b376b 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1154,8 +1154,8 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 int mlx5_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
 int mlx5_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
 struct ib_mr *mlx5_ib_get_dma_mr(struct ib_pd *pd, int acc);
-struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-				  u64 virt_addr, int access_flags,
+struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd,
+				  struct ib_user_mr_attr *attr,
 				  struct ib_udata *udata);
 int mlx5_ib_advise_mr(struct ib_pd *pd,
 		      enum ib_uverbs_advise_mr_advice advice,
@@ -1173,8 +1173,8 @@ struct mlx5_ib_mr *mlx5_ib_alloc_implicit_mr(struct mlx5_ib_pd *pd,
 					     int access_flags);
 void mlx5_ib_free_implicit_mr(struct mlx5_ib_mr *mr);
 void mlx5_ib_fence_odp_mr(struct mlx5_ib_mr *mr);
-int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
-			  u64 length, u64 virt_addr, int access_flags,
+int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags,
+			  struct ib_user_mr_attr *attr,
 			  struct ib_pd *pd, struct ib_udata *udata);
 int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata);
 struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 3e6f2f9..3c91e32 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1347,8 +1347,8 @@ struct ib_mr *mlx5_ib_reg_dm_mr(struct ib_pd *pd, struct ib_dm *dm,
 				 attr->access_flags, mode);
 }
 
-struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-				  u64 virt_addr, int access_flags,
+struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd,
+				  struct ib_user_mr_attr *attr,
 				  struct ib_udata *udata)
 {
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
@@ -1365,39 +1365,41 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		return ERR_PTR(-EOPNOTSUPP);
 
 	mlx5_ib_dbg(dev, "start 0x%llx, virt_addr 0x%llx, length 0x%llx, access_flags 0x%x\n",
-		    start, virt_addr, length, access_flags);
+		    attr->start, attr->virt_addr, attr->length, attr->access_flags);
 
-	if (IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING) && !start &&
-	    length == U64_MAX) {
-		if (virt_addr != start)
+	if (IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING) && !attr->start &&
+	    attr->length == U64_MAX) {
+		if (attr->virt_addr != attr->start)
 			return ERR_PTR(-EINVAL);
-		if (!(access_flags & IB_ACCESS_ON_DEMAND) ||
+		if (!(attr->access_flags & IB_ACCESS_ON_DEMAND) ||
 		    !(dev->odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
 			return ERR_PTR(-EINVAL);
 
-		mr = mlx5_ib_alloc_implicit_mr(to_mpd(pd), udata, access_flags);
+		mr = mlx5_ib_alloc_implicit_mr(to_mpd(pd), udata,
+					       attr->access_flags);
 		if (IS_ERR(mr))
 			return ERR_CAST(mr);
 		return &mr->ibmr;
 	}
 
-	err = mr_umem_get(dev, start, length, access_flags, &umem,
+	err = mr_umem_get(dev, attr->start, attr->length,
+			  attr->access_flags, &umem,
 			  &npages, &page_shift, &ncont, &order);
-
 	if (err < 0)
 		return ERR_PTR(err);
 
-	use_umr = mlx5_ib_can_use_umr(dev, true, access_flags);
+	use_umr = mlx5_ib_can_use_umr(dev, true, attr->access_flags);
 
 	if (order <= mr_cache_max_order(dev) && use_umr) {
-		mr = alloc_mr_from_cache(pd, umem, virt_addr, length, ncont,
-					 page_shift, order, access_flags);
+		mr = alloc_mr_from_cache(pd, umem, attr->virt_addr,
+					 attr->length, ncont, page_shift,
+					 order, attr->access_flags);
 		if (PTR_ERR(mr) == -EAGAIN) {
 			mlx5_ib_dbg(dev, "cache empty for order %d\n", order);
 			mr = NULL;
 		}
 	} else if (!MLX5_CAP_GEN(dev->mdev, umr_extended_translation_offset)) {
-		if (access_flags & IB_ACCESS_ON_DEMAND) {
+		if (attr->access_flags & IB_ACCESS_ON_DEMAND) {
 			err = -EINVAL;
 			pr_err("Got MR registration for ODP MR > 512MB, not supported for Connect-IB\n");
 			goto error;
@@ -1407,8 +1409,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 	if (!mr) {
 		mutex_lock(&dev->slow_path_mutex);
-		mr = reg_create(NULL, pd, virt_addr, length, umem, ncont,
-				page_shift, access_flags, !use_umr);
+		mr = reg_create(NULL, pd, attr->virt_addr, attr->length, umem,
+				ncont, page_shift, attr->access_flags,
+				!use_umr);
 		mutex_unlock(&dev->slow_path_mutex);
 	}
 
@@ -1420,12 +1423,12 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	mlx5_ib_dbg(dev, "mkey 0x%x\n", mr->mmkey.key);
 
 	mr->umem = umem;
-	set_mr_fields(dev, mr, npages, length, access_flags);
+	set_mr_fields(dev, mr, npages, attr->length, attr->access_flags);
 
 	if (use_umr) {
 		int update_xlt_flags = MLX5_IB_UPD_XLT_ENABLE;
 
-		if (access_flags & IB_ACCESS_ON_DEMAND)
+		if (attr->access_flags & IB_ACCESS_ON_DEMAND)
 			update_xlt_flags |= MLX5_IB_UPD_XLT_ZAP;
 
 		err = mlx5_ib_update_xlt(mr, 0, ncont, page_shift,
@@ -1504,15 +1507,15 @@ static int rereg_umr(struct ib_pd *pd, struct mlx5_ib_mr *mr,
 	return err;
 }
 
-int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
-			  u64 length, u64 virt_addr, int new_access_flags,
+int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags,
+			  struct ib_user_mr_attr *attr,
 			  struct ib_pd *new_pd, struct ib_udata *udata)
 {
 	struct mlx5_ib_dev *dev = to_mdev(ib_mr->device);
 	struct mlx5_ib_mr *mr = to_mmr(ib_mr);
 	struct ib_pd *pd = (flags & IB_MR_REREG_PD) ? new_pd : ib_mr->pd;
 	int access_flags = flags & IB_MR_REREG_ACCESS ?
-			    new_access_flags :
+			    attr->access_flags :
 			    mr->access_flags;
 	int page_shift = 0;
 	int upd_flags = 0;
@@ -1523,7 +1526,7 @@ int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 	int err;
 
 	mlx5_ib_dbg(dev, "start 0x%llx, virt_addr 0x%llx, length 0x%llx, access_flags 0x%x\n",
-		    start, virt_addr, length, access_flags);
+		    attr->start, attr->virt_addr, attr->length, access_flags);
 
 	atomic_sub(mr->npages, &dev->mdev->priv.reg_pages);
 
@@ -1534,8 +1537,8 @@ int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 		return -EOPNOTSUPP;
 
 	if (flags & IB_MR_REREG_TRANS) {
-		addr = virt_addr;
-		len = length;
+		addr = attr->virt_addr;
+		len = attr->length;
 	} else {
 		addr = mr->umem->address;
 		len = mr->umem->length;
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 9fa2f91..2f15a36 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -842,8 +842,9 @@ static struct ib_mr *mthca_get_dma_mr(struct ib_pd *pd, int acc)
 	return &mr->ibmr;
 }
 
-static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-				       u64 virt, int acc, struct ib_udata *udata)
+static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd,
+				       struct ib_user_mr_attr *attr,
+				       struct ib_udata *udata)
 {
 	struct mthca_dev *dev = to_mdev(pd->device);
 	struct sg_dma_page_iter sg_iter;
@@ -871,7 +872,8 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	if (!mr)
 		return ERR_PTR(-ENOMEM);
 
-	mr->umem = ib_umem_get(pd->device, start, length, acc);
+	mr->umem = ib_umem_get(pd->device, attr->start, attr->length,
+			       attr->access_flags);
 	if (IS_ERR(mr->umem)) {
 		err = PTR_ERR(mr->umem);
 		goto err;
@@ -918,8 +920,9 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	if (err)
 		goto err_mtt;
 
-	err = mthca_mr_alloc(dev, to_mpd(pd)->pd_num, PAGE_SHIFT, virt, length,
-			     convert_access(acc), mr);
+	err = mthca_mr_alloc(dev, to_mpd(pd)->pd_num, PAGE_SHIFT, attr->virt,
+			     attr->length, convert_access(attr->access_flags),
+			     mr);
 
 	if (err)
 		goto err_mtt;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index c1751c9..7b1081a 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -850,8 +850,9 @@ static void build_user_pbes(struct ocrdma_dev *dev, struct ocrdma_mr *mr,
 	}
 }
 
-struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
-				 u64 usr_addr, int acc, struct ib_udata *udata)
+struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *ibpd,
+				 struct ib_user_mr_attr *attr,
+				 struct ib_udata *udata)
 {
 	int status = -ENOMEM;
 	struct ocrdma_dev *dev = get_ocrdma_dev(ibpd->device);
@@ -867,7 +868,9 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
 	if (!mr)
 		return ERR_PTR(status);
-	mr->umem = ib_umem_get(ibpd->device, start, len, acc);
+
+	mr->umem = ib_umem_get(ibpd->device, attr->start, attr->length,
+			       attr->access_flags);
 	if (IS_ERR(mr->umem)) {
 		status = -EFAULT;
 		goto umem_err;
@@ -879,18 +882,18 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 
 	mr->hwmr.pbe_size = PAGE_SIZE;
 	mr->hwmr.fbo = ib_umem_offset(mr->umem);
-	mr->hwmr.va = usr_addr;
-	mr->hwmr.len = len;
-	mr->hwmr.remote_wr = (acc & IB_ACCESS_REMOTE_WRITE) ? 1 : 0;
-	mr->hwmr.remote_rd = (acc & IB_ACCESS_REMOTE_READ) ? 1 : 0;
-	mr->hwmr.local_wr = (acc & IB_ACCESS_LOCAL_WRITE) ? 1 : 0;
+	mr->hwmr.va = attr->virt_addr;
+	mr->hwmr.len = attr->length;
+	mr->hwmr.remote_wr = (attr->access_flags & IB_ACCESS_REMOTE_WRITE) ? 1 : 0;
+	mr->hwmr.remote_rd = (attr->access_flags & IB_ACCESS_REMOTE_READ) ? 1 : 0;
+	mr->hwmr.local_wr = (attr->access_flags & IB_ACCESS_LOCAL_WRITE) ? 1 : 0;
 	mr->hwmr.local_rd = 1;
-	mr->hwmr.remote_atomic = (acc & IB_ACCESS_REMOTE_ATOMIC) ? 1 : 0;
+	mr->hwmr.remote_atomic = (attr->access_flags & IB_ACCESS_REMOTE_ATOMIC) ? 1 : 0;
 	status = ocrdma_build_pbl_tbl(dev, &mr->hwmr);
 	if (status)
 		goto umem_err;
 	build_user_pbes(dev, mr, num_pbes);
-	status = ocrdma_reg_mr(dev, &mr->hwmr, pd->id, acc);
+	status = ocrdma_reg_mr(dev, &mr->hwmr, pd->id, attr->access_flags);
 	if (status)
 		goto mbx_err;
 	mr->ibmr.lkey = mr->hwmr.lkey;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index df8e3b9..da9cf809 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -98,8 +98,9 @@ int ocrdma_post_srq_recv(struct ib_srq *, const struct ib_recv_wr *,
 
 int ocrdma_dereg_mr(struct ib_mr *ib_mr, struct ib_udata *udata);
 struct ib_mr *ocrdma_get_dma_mr(struct ib_pd *, int acc);
-struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *, u64 start, u64 length,
-				 u64 virt, int acc, struct ib_udata *);
+struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *,
+				 struct ib_user_mr_attr *attr,
+				 struct ib_udata *);
 struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type,
 			      u32 max_num_sg);
 int ocrdma_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index b49bef9..2ffaf92 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -2833,8 +2833,8 @@ static int init_mr_info(struct qedr_dev *dev, struct mr_info *info,
 	return rc;
 }
 
-struct ib_mr *qedr_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
-			       u64 usr_addr, int acc, struct ib_udata *udata)
+struct ib_mr *qedr_reg_user_mr(struct ib_pd *ibpd, struct ib_user_mr_attr *attr,
+			       struct ib_udata *udata)
 {
 	struct qedr_dev *dev = get_qedr_dev(ibpd->device);
 	struct qedr_mr *mr;
@@ -2844,9 +2844,11 @@ struct ib_mr *qedr_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 	pd = get_qedr_pd(ibpd);
 	DP_DEBUG(dev, QEDR_MSG_MR,
 		 "qedr_register user mr pd = %d start = %lld, len = %lld, usr_addr = %lld, acc = %d\n",
-		 pd->pd_id, start, len, usr_addr, acc);
+		 pd->pd_id, attr->start, attr->length, attr->virt_addr,
+		 attr->access_flags);
 
-	if (acc & IB_ACCESS_REMOTE_WRITE && !(acc & IB_ACCESS_LOCAL_WRITE))
+	if (attr->access_flags & IB_ACCESS_REMOTE_WRITE &&
+	    !(attr->access_flags & IB_ACCESS_LOCAL_WRITE))
 		return ERR_PTR(-EINVAL);
 
 	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
@@ -2855,7 +2857,8 @@ struct ib_mr *qedr_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 
 	mr->type = QEDR_MR_USER;
 
-	mr->umem = ib_umem_get(ibpd->device, start, len, acc);
+	mr->umem = ib_umem_get(ibpd->device, attr->start, attr->length,
+			       attr->access_flags);
 	if (IS_ERR(mr->umem)) {
 		rc = -EFAULT;
 		goto err0;
@@ -2879,18 +2882,18 @@ struct ib_mr *qedr_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len,
 	mr->hw_mr.key = 0;
 	mr->hw_mr.pd = pd->pd_id;
 	mr->hw_mr.local_read = 1;
-	mr->hw_mr.local_write = (acc & IB_ACCESS_LOCAL_WRITE) ? 1 : 0;
-	mr->hw_mr.remote_read = (acc & IB_ACCESS_REMOTE_READ) ? 1 : 0;
-	mr->hw_mr.remote_write = (acc & IB_ACCESS_REMOTE_WRITE) ? 1 : 0;
-	mr->hw_mr.remote_atomic = (acc & IB_ACCESS_REMOTE_ATOMIC) ? 1 : 0;
+	mr->hw_mr.local_write = (attr->access_flags & IB_ACCESS_LOCAL_WRITE) ? 1 : 0;
+	mr->hw_mr.remote_read = (attr->access_flags & IB_ACCESS_REMOTE_READ) ? 1 : 0;
+	mr->hw_mr.remote_write = (attr->access_flags & IB_ACCESS_REMOTE_WRITE) ? 1 : 0;
+	mr->hw_mr.remote_atomic = (attr->access_flags & IB_ACCESS_REMOTE_ATOMIC) ? 1 : 0;
 	mr->hw_mr.mw_bind = false;
 	mr->hw_mr.pbl_ptr = mr->info.pbl_table[0].pa;
 	mr->hw_mr.pbl_two_level = mr->info.pbl_info.two_layered;
 	mr->hw_mr.pbl_page_size_log = ilog2(mr->info.pbl_info.pbl_size);
 	mr->hw_mr.page_size_log = PAGE_SHIFT;
 	mr->hw_mr.fbo = ib_umem_offset(mr->umem);
-	mr->hw_mr.length = len;
-	mr->hw_mr.vaddr = usr_addr;
+	mr->hw_mr.length = attr->length;
+	mr->hw_mr.vaddr = attr->virt_addr;
 	mr->hw_mr.zbva = false;
 	mr->hw_mr.phy_mr = false;
 	mr->hw_mr.dma_mr = false;
diff --git a/drivers/infiniband/hw/qedr/verbs.h b/drivers/infiniband/hw/qedr/verbs.h
index 39dd628..1283cc9 100644
--- a/drivers/infiniband/hw/qedr/verbs.h
+++ b/drivers/infiniband/hw/qedr/verbs.h
@@ -77,8 +77,8 @@ int qedr_create_ah(struct ib_ah *ibah, struct rdma_ah_init_attr *init_attr,
 int qedr_dereg_mr(struct ib_mr *ib_mr, struct ib_udata *udata);
 struct ib_mr *qedr_get_dma_mr(struct ib_pd *, int acc);
 
-struct ib_mr *qedr_reg_user_mr(struct ib_pd *, u64 start, u64 length,
-			       u64 virt, int acc, struct ib_udata *);
+struct ib_mr *qedr_reg_user_mr(struct ib_pd *, struct ib_user_mr_attr *attr,
+			       struct ib_udata *);
 
 int qedr_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
 		   int sg_nents, unsigned int *sg_offset);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index b8a77ce..3944115 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -601,22 +601,22 @@ void usnic_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata)
 	return;
 }
 
-struct ib_mr *usnic_ib_reg_mr(struct ib_pd *pd, u64 start, u64 length,
-					u64 virt_addr, int access_flags,
+struct ib_mr *usnic_ib_reg_mr(struct ib_pd *pd,
+					struct ib_user_mr_attr *attr,
 					struct ib_udata *udata)
 {
 	struct usnic_ib_mr *mr;
 	int err;
 
-	usnic_dbg("start 0x%llx va 0x%llx length 0x%llx\n", start,
-			virt_addr, length);
+	usnic_dbg("start 0x%llx va 0x%llx length 0x%llx\n", attr->start,
+			attr->virt_addr, attr->length);
 
 	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
 	if (!mr)
 		return ERR_PTR(-ENOMEM);
 
-	mr->umem = usnic_uiom_reg_get(to_upd(pd)->umem_pd, start, length,
-					access_flags, 0);
+	mr->umem = usnic_uiom_reg_get(to_upd(pd)->umem_pd, attr->start,
+					attr->length, attr->access_flags, 0);
 	if (IS_ERR_OR_NULL(mr->umem)) {
 		err = mr->umem ? PTR_ERR(mr->umem) : -EFAULT;
 		goto err_free;
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
index 2aedf78..d2b6837 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
@@ -61,8 +61,8 @@ int usnic_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 int usnic_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 		       struct ib_udata *udata);
 void usnic_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata);
-struct ib_mr *usnic_ib_reg_mr(struct ib_pd *pd, u64 start, u64 length,
-				u64 virt_addr, int access_flags,
+struct ib_mr *usnic_ib_reg_mr(struct ib_pd *pd,
+				struct ib_user_mr_attr *attr,
 				struct ib_udata *udata);
 int usnic_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata);
 int usnic_ib_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata);
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c
index 77a010e68..d0d0813 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c
@@ -100,16 +100,13 @@ struct ib_mr *pvrdma_get_dma_mr(struct ib_pd *pd, int acc)
 /**
  * pvrdma_reg_user_mr - register a userspace memory region
  * @pd: protection domain
- * @start: starting address
- * @length: length of region
- * @virt_addr: I/O virtual address
- * @access_flags: access flags for memory region
+ * @attr: user mr attributes
  * @udata: user data
  *
  * @return: ib_mr pointer on success, otherwise returns an errno.
  */
-struct ib_mr *pvrdma_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-				 u64 virt_addr, int access_flags,
+struct ib_mr *pvrdma_reg_user_mr(struct ib_pd *pd,
+				 struct ib_user_mr_attr *attr,
 				 struct ib_udata *udata)
 {
 	struct pvrdma_dev *dev = to_vdev(pd->device);
@@ -121,12 +118,13 @@ struct ib_mr *pvrdma_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	struct pvrdma_cmd_create_mr_resp *resp = &rsp.create_mr_resp;
 	int ret, npages;
 
-	if (length == 0 || length > dev->dsr->caps.max_mr_size) {
+	if (attr->length == 0 || attr->length > dev->dsr->caps.max_mr_size) {
 		dev_warn(&dev->pdev->dev, "invalid mem region length\n");
 		return ERR_PTR(-EINVAL);
 	}
 
-	umem = ib_umem_get(pd->device, start, length, access_flags);
+	umem = ib_umem_get(pd->device, attr->start, attr->length,
+			   attr->access_flags);
 	if (IS_ERR(umem)) {
 		dev_warn(&dev->pdev->dev,
 			 "could not get umem for mem region\n");
@@ -147,8 +145,8 @@ struct ib_mr *pvrdma_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		goto err_umem;
 	}
 
-	mr->mmr.iova = virt_addr;
-	mr->mmr.size = length;
+	mr->mmr.iova = attr->virt_addr;
+	mr->mmr.size = attr->length;
 	mr->umem = umem;
 
 	ret = pvrdma_page_dir_init(dev, &mr->pdir, npages, false);
@@ -164,10 +162,10 @@ struct ib_mr *pvrdma_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 	memset(cmd, 0, sizeof(*cmd));
 	cmd->hdr.cmd = PVRDMA_CMD_CREATE_MR;
-	cmd->start = start;
-	cmd->length = length;
+	cmd->start = attr->start;
+	cmd->length = attr->length;
 	cmd->pd_handle = to_vpd(pd)->pd_handle;
-	cmd->access_flags = access_flags;
+	cmd->access_flags = attr->access_flags;
 	cmd->nchunks = npages;
 	cmd->pdir_dma = mr->pdir.dir_dma;
 
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h
index 699b208..efdc69e 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h
@@ -401,8 +401,8 @@ int pvrdma_modify_port(struct ib_device *ibdev, u8 port,
 int pvrdma_alloc_pd(struct ib_pd *pd, struct ib_udata *udata);
 void pvrdma_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata);
 struct ib_mr *pvrdma_get_dma_mr(struct ib_pd *pd, int acc);
-struct ib_mr *pvrdma_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-				 u64 virt_addr, int access_flags,
+struct ib_mr *pvrdma_reg_user_mr(struct ib_pd *pd,
+				 struct ib_user_mr_attr *attr,
 				 struct ib_udata *udata);
 int pvrdma_dereg_mr(struct ib_mr *mr, struct ib_udata *udata);
 struct ib_mr *pvrdma_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type,
diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c
index 2f7c25f..062d269 100644
--- a/drivers/infiniband/sw/rdmavt/mr.c
+++ b/drivers/infiniband/sw/rdmavt/mr.c
@@ -369,15 +369,13 @@ struct ib_mr *rvt_get_dma_mr(struct ib_pd *pd, int acc)
 /**
  * rvt_reg_user_mr - register a userspace memory region
  * @pd: protection domain for this memory region
- * @start: starting userspace address
- * @length: length of region to register
- * @mr_access_flags: access flags for this memory region
+ * @attr: userspace memory region attributes
  * @udata: unused by the driver
  *
  * Return: the memory region on success, otherwise returns an errno.
  */
-struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-			      u64 virt_addr, int mr_access_flags,
+struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd,
+			      struct ib_user_mr_attr *attr,
 			      struct ib_udata *udata)
 {
 	struct rvt_mr *mr;
@@ -386,10 +384,11 @@ struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	int n, m;
 	struct ib_mr *ret;
 
-	if (length == 0)
+	if (attr->length == 0)
 		return ERR_PTR(-EINVAL);
 
-	umem = ib_umem_get(pd->device, start, length, mr_access_flags);
+	umem = ib_umem_get(pd->device, attr->start, attr->length,
+			   attr->access_flags);
 	if (IS_ERR(umem))
 		return (void *)umem;
 
@@ -401,11 +400,11 @@ struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		goto bail_umem;
 	}
 
-	mr->mr.user_base = start;
-	mr->mr.iova = virt_addr;
-	mr->mr.length = length;
+	mr->mr.user_base = attr->start;
+	mr->mr.iova = attr->virt_addr;
+	mr->mr.length = attr->length;
 	mr->mr.offset = ib_umem_offset(umem);
-	mr->mr.access_flags = mr_access_flags;
+	mr->mr.access_flags = attr->access_flags;
 	mr->umem = umem;
 
 	mr->mr.page_shift = PAGE_SHIFT;
diff --git a/drivers/infiniband/sw/rdmavt/mr.h b/drivers/infiniband/sw/rdmavt/mr.h
index b3aba35..b58ab5a 100644
--- a/drivers/infiniband/sw/rdmavt/mr.h
+++ b/drivers/infiniband/sw/rdmavt/mr.h
@@ -66,8 +66,8 @@ static inline struct rvt_mr *to_imr(struct ib_mr *ibmr)
 
 /* Mem Regions */
 struct ib_mr *rvt_get_dma_mr(struct ib_pd *pd, int acc);
-struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
-			      u64 virt_addr, int mr_access_flags,
+struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd,
+			      struct ib_user_mr_attr *attr,
 			      struct ib_udata *udata);
 int rvt_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata);
 struct ib_mr *rvt_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type,
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 658939e..2f28ecc 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -904,10 +904,8 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access)
 }
 
 static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
-				     u64 start,
-				     u64 length,
-				     u64 iova,
-				     int access, struct ib_udata *udata)
+				     struct ib_user_mr_attr *attr,
+				     struct ib_udata *udata)
 {
 	int err;
 	struct rxe_dev *rxe = to_rdev(ibpd->device);
@@ -924,8 +922,8 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
 
 	rxe_add_ref(pd);
 
-	err = rxe_mem_init_user(pd, start, length, iova,
-				access, udata, mr);
+	err = rxe_mem_init_user(pd, attr->start, attr->length, attr->virt_addr,
+				attr->access_flags, udata, mr);
 	if (err)
 		goto err3;
 
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
index adafa1b..399d9b3 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -1262,14 +1262,12 @@ int siw_dereg_mr(struct ib_mr *base_mr, struct ib_udata *udata)
  * Register Memory Region.
  *
  * @pd:		Protection Domain
- * @start:	starting address of MR (virtual address)
- * @len:	len of MR
- * @rnic_va:	not used by siw
- * @rights:	MR access rights
+ * @attr:	user space MR attributes
  * @udata:	user buffer to communicate STag and Key.
  */
-struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
-			      u64 rnic_va, int rights, struct ib_udata *udata)
+struct ib_mr *siw_reg_user_mr(struct ib_pd *pd,
+			      struct ib_user_mr_attr *attr,
+			      struct ib_udata *udata)
 {
 	struct siw_mr *mr = NULL;
 	struct siw_umem *umem = NULL;
@@ -1280,21 +1278,23 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
 	int rv;
 
 	siw_dbg_pd(pd, "start: 0x%pK, va: 0x%pK, len: %llu\n",
-		   (void *)(uintptr_t)start, (void *)(uintptr_t)rnic_va,
-		   (unsigned long long)len);
+		   (void *)(uintptr_t)attr->start,
+		   (void *)(uintptr_t)attr->virt_addr,
+		   (unsigned long long)attr->length);
 
 	if (atomic_inc_return(&sdev->num_mr) > SIW_MAX_MR) {
 		siw_dbg_pd(pd, "too many mr's\n");
 		rv = -ENOMEM;
 		goto err_out;
 	}
-	if (!len) {
+	if (!attr->length) {
 		rv = -EINVAL;
 		goto err_out;
 	}
 	if (mem_limit != RLIM_INFINITY) {
 		unsigned long num_pages =
-			(PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT;
+			(PAGE_ALIGN(attr->length + (attr->start & ~PAGE_MASK)))
+				>> PAGE_SHIFT;
 		mem_limit >>= PAGE_SHIFT;
 
 		if (num_pages > mem_limit - current->mm->locked_vm) {
@@ -1305,7 +1305,8 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
 			goto err_out;
 		}
 	}
-	umem = siw_umem_get(start, len, ib_access_writable(rights));
+	umem = siw_umem_get(attr->start, attr->length,
+			    ib_access_writable(attr->access_flags));
 	if (IS_ERR(umem)) {
 		rv = PTR_ERR(umem);
 		siw_dbg_pd(pd, "getting user memory failed: %d\n", rv);
@@ -1317,7 +1318,8 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
 		rv = -ENOMEM;
 		goto err_out;
 	}
-	rv = siw_mr_add_mem(mr, pd, umem, start, len, rights);
+	rv = siw_mr_add_mem(mr, pd, umem, attr->start, attr->length,
+			    attr->access_flags);
 	if (rv)
 		goto err_out;
 
diff --git a/drivers/infiniband/sw/siw/siw_verbs.h b/drivers/infiniband/sw/siw/siw_verbs.h
index d957227..35d65e6 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.h
+++ b/drivers/infiniband/sw/siw/siw_verbs.h
@@ -65,8 +65,9 @@ int siw_post_receive(struct ib_qp *base_qp, const struct ib_recv_wr *wr,
 void siw_destroy_cq(struct ib_cq *base_cq, struct ib_udata *udata);
 int siw_poll_cq(struct ib_cq *base_cq, int num_entries, struct ib_wc *wc);
 int siw_req_notify_cq(struct ib_cq *base_cq, enum ib_cq_notify_flags flags);
-struct ib_mr *siw_reg_user_mr(struct ib_pd *base_pd, u64 start, u64 len,
-			      u64 rnic_va, int rights, struct ib_udata *udata);
+struct ib_mr *siw_reg_user_mr(struct ib_pd *base_pd,
+			      struct ib_user_mr_attr *attr,
+			      struct ib_udata *udata);
 struct ib_mr *siw_alloc_mr(struct ib_pd *base_pd, enum ib_mr_type mr_type,
 			   u32 max_sge);
 struct ib_mr *siw_get_dma_mr(struct ib_pd *base_pd, int rights);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index c0b2fa7..a22014c 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2,7 +2,7 @@
 /*
  * Copyright (c) 2004 Mellanox Technologies Ltd.  All rights reserved.
  * Copyright (c) 2004 Infinicon Corporation.  All rights reserved.
- * Copyright (c) 2004 Intel Corporation.  All rights reserved.
+ * Copyright (c) 2004, 2020 Intel Corporation.  All rights reserved.
  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
  * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
@@ -372,6 +372,14 @@ struct ib_dm_alloc_attr {
 	u32	flags;
 };
 
+struct ib_user_mr_attr {
+	u64	start;
+	u64	length;
+	u64	virt_addr;
+	u32	fd;
+	u32	access_flags;
+};
+
 struct ib_device_attr {
 	u64			fw_ver;
 	__be64			sys_image_guid;
@@ -1431,11 +1439,12 @@ enum ib_access_flags {
 	IB_ZERO_BASED = IB_UVERBS_ACCESS_ZERO_BASED,
 	IB_ACCESS_ON_DEMAND = IB_UVERBS_ACCESS_ON_DEMAND,
 	IB_ACCESS_HUGETLB = IB_UVERBS_ACCESS_HUGETLB,
+	IB_ACCESS_DMABUF = IB_UVERBS_ACCESS_DMABUF,
 	IB_ACCESS_RELAXED_ORDERING = IB_UVERBS_ACCESS_RELAXED_ORDERING,
 
 	IB_ACCESS_OPTIONAL = IB_UVERBS_ACCESS_OPTIONAL_RANGE,
 	IB_ACCESS_SUPPORTED =
-		((IB_ACCESS_HUGETLB << 1) - 1) | IB_ACCESS_OPTIONAL,
+		((IB_ACCESS_DMABUF << 1) - 1) | IB_ACCESS_OPTIONAL,
 };
 
 /*
@@ -2442,11 +2451,11 @@ struct ib_device_ops {
 	void (*destroy_cq)(struct ib_cq *cq, struct ib_udata *udata);
 	int (*resize_cq)(struct ib_cq *cq, int cqe, struct ib_udata *udata);
 	struct ib_mr *(*get_dma_mr)(struct ib_pd *pd, int mr_access_flags);
-	struct ib_mr *(*reg_user_mr)(struct ib_pd *pd, u64 start, u64 length,
-				     u64 virt_addr, int mr_access_flags,
+	struct ib_mr *(*reg_user_mr)(struct ib_pd *pd,
+				     struct ib_user_mr_attr *attr,
 				     struct ib_udata *udata);
-	int (*rereg_user_mr)(struct ib_mr *mr, int flags, u64 start, u64 length,
-			     u64 virt_addr, int mr_access_flags,
+	int (*rereg_user_mr)(struct ib_mr *mr, int flags,
+			     struct ib_user_mr_attr *attr,
 			     struct ib_pd *pd, struct ib_udata *udata);
 	int (*dereg_mr)(struct ib_mr *mr, struct ib_udata *udata);
 	struct ib_mr *(*alloc_mr)(struct ib_pd *pd, enum ib_mr_type mr_type,
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index 5debab4..faf2008 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2017-2018, Mellanox Technologies inc.  All rights reserved.
+ * Copyright (c) 2020, Intel Corporation.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -57,6 +58,7 @@ enum ib_uverbs_access_flags {
 	IB_UVERBS_ACCESS_ZERO_BASED = 1 << 5,
 	IB_UVERBS_ACCESS_ON_DEMAND = 1 << 6,
 	IB_UVERBS_ACCESS_HUGETLB = 1 << 7,
+	IB_UVERBS_ACCESS_DMABUF = 1 << 8,
 
 	IB_UVERBS_ACCESS_RELAXED_ORDERING = IB_UVERBS_ACCESS_OPTIONAL_FIRST,
 	IB_UVERBS_ACCESS_OPTIONAL_RANGE =
-- 
1.8.3.1


* [RFC PATCH v3 3/4] RDMA/mlx5: Support dma-buf based userspace memory region
  2020-10-04 19:12 [RFC PATCH v3 0/4] RDMA: Add dma-buf support Jianxin Xiong
  2020-10-04 19:12 ` [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region Jianxin Xiong
  2020-10-04 19:12 ` [RFC PATCH v3 2/4] RDMA: Expand driver memory registration methods to support dma-buf Jianxin Xiong
@ 2020-10-04 19:12 ` Jianxin Xiong
  2020-10-04 19:12 ` [RFC PATCH v3 4/4] RDMA/uverbs: Add uverbs command for dma-buf based MR registration Jianxin Xiong
  3 siblings, 0 replies; 22+ messages in thread
From: Jianxin Xiong @ 2020-10-04 19:12 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Leon Romanovsky, Jason Gunthorpe, Doug Ledford, Daniel Vetter,
	Christian Koenig, Jianxin Xiong

Recognize the new access flag and call the core function to import a
dma-buf based memory region.

Since the invalidation callback is called from the dma-buf driver
instead of from the mmu_interval_notifier, the part of the ODP pagefault
handler that checks for an ongoing invalidation is modified to handle
the difference.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
---
 drivers/infiniband/hw/mlx5/mr.c  | 50 +++++++++++++++++++++++++++++++++++++---
 drivers/infiniband/hw/mlx5/odp.c | 22 ++++++++++++++++--
 2 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 3c91e32..d58be20 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -912,6 +912,44 @@ static int mr_umem_get(struct mlx5_ib_dev *dev, u64 start, u64 length,
 	return 0;
 }
 
+static int mr_umem_dmabuf_get(struct mlx5_ib_dev *dev, u64 start, u64 length,
+			      int dmabuf_fd, int access_flags,
+			      struct ib_umem **umem, int *npages,
+			      int *page_shift, int *ncont, int *order)
+{
+	struct ib_umem *u;
+	struct ib_umem_odp *odp;
+
+	*umem = NULL;
+
+	u = ib_umem_dmabuf_get(&dev->ib_dev, start, length, dmabuf_fd,
+			       access_flags, &mlx5_mn_ops);
+	if (IS_ERR(u)) {
+		mlx5_ib_dbg(dev, "umem get failed (%ld)\n", PTR_ERR(u));
+		return PTR_ERR(u);
+	}
+
+	odp = to_ib_umem_odp(u);
+	*page_shift = odp->page_shift;
+	*ncont = ib_umem_odp_num_pages(odp);
+	*npages = *ncont << (*page_shift - PAGE_SHIFT);
+	if (order)
+		*order = ilog2(roundup_pow_of_two(*ncont));
+
+	if (!*npages) {
+		mlx5_ib_warn(dev, "avoid zero region\n");
+		ib_umem_release(u);
+		return -EINVAL;
+	}
+
+	*umem = u;
+
+	mlx5_ib_dbg(dev, "npages %d, ncont %d, order %d, page_shift %d\n",
+		    *npages, *ncont, *order, *page_shift);
+
+	return 0;
+}
+
 static void mlx5_ib_umr_done(struct ib_cq *cq, struct ib_wc *wc)
 {
 	struct mlx5_ib_umr_context *context =
@@ -1382,9 +1420,15 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd,
 		return &mr->ibmr;
 	}
 
-	err = mr_umem_get(dev, attr->start, attr->length,
-			  attr->access_flags, &umem,
-			  &npages, &page_shift, &ncont, &order);
+	if (attr->access_flags & IB_ACCESS_DMABUF)
+		err = mr_umem_dmabuf_get(dev, attr->start, attr->length,
+					 attr->fd, attr->access_flags,
+					 &umem, &npages, &page_shift, &ncont,
+					 &order);
+	else
+		err = mr_umem_get(dev, attr->start, attr->length,
+				  attr->access_flags, &umem,
+				  &npages, &page_shift, &ncont, &order);
 	if (err < 0)
 		return ERR_PTR(err);
 
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index cfd7efa..f2ca3f8 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -665,6 +665,24 @@ void mlx5_ib_fence_odp_mr(struct mlx5_ib_mr *mr)
 	dma_fence_odp_mr(mr);
 }
 
+static inline unsigned long notifier_read_begin(struct ib_umem_odp *odp)
+{
+	if (odp->umem.is_dmabuf)
+		return ib_umem_dmabuf_notifier_read_begin(odp);
+	else
+		return mmu_interval_read_begin(&odp->notifier);
+}
+
+static inline int notifier_read_retry(struct ib_umem_odp *odp,
+				      unsigned long current_seq)
+{
+	if (odp->umem.is_dmabuf) {
+		return ib_umem_dmabuf_notifier_read_retry(odp, current_seq);
+	} else {
+		return mmu_interval_read_retry(&odp->notifier, current_seq);
+	}
+}
+
 #define MLX5_PF_FLAGS_DOWNGRADE BIT(1)
 static int pagefault_real_mr(struct mlx5_ib_mr *mr, struct ib_umem_odp *odp,
 			     u64 user_va, size_t bcnt, u32 *bytes_mapped,
@@ -683,7 +701,7 @@ static int pagefault_real_mr(struct mlx5_ib_mr *mr, struct ib_umem_odp *odp,
 	if (odp->umem.writable && !downgrade)
 		access_mask |= ODP_WRITE_ALLOWED_BIT;
 
-	current_seq = mmu_interval_read_begin(&odp->notifier);
+	current_seq = notifier_read_begin(odp);
 
 	np = ib_umem_odp_map_dma_pages(odp, user_va, bcnt, access_mask,
 				       current_seq);
@@ -691,7 +709,7 @@ static int pagefault_real_mr(struct mlx5_ib_mr *mr, struct ib_umem_odp *odp,
 		return np;
 
 	mutex_lock(&odp->umem_mutex);
-	if (!mmu_interval_read_retry(&odp->notifier, current_seq)) {
+	if (!notifier_read_retry(odp, current_seq)) {
 		/*
 		 * No need to check whether the MTTs really belong to
 		 * this MR, since ib_umem_odp_map_dma_pages already
-- 
1.8.3.1


* [RFC PATCH v3 4/4] RDMA/uverbs: Add uverbs command for dma-buf based MR registration
  2020-10-04 19:12 [RFC PATCH v3 0/4] RDMA: Add dma-buf support Jianxin Xiong
                   ` (2 preceding siblings ...)
  2020-10-04 19:12 ` [RFC PATCH v3 3/4] RDMA/mlx5: Support dma-buf based userspace memory region Jianxin Xiong
@ 2020-10-04 19:12 ` Jianxin Xiong
  3 siblings, 0 replies; 22+ messages in thread
From: Jianxin Xiong @ 2020-10-04 19:12 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Leon Romanovsky, Jason Gunthorpe, Doug Ledford, Daniel Vetter,
	Christian Koenig, Jianxin Xiong

Add a uverbs command for registering a user memory region associated
with a dma-buf file descriptor.
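
For illustration, the userspace flow could look roughly like the
following sketch. The ibv_reg_dmabuf_mr() wrapper is an assumed
rdma-core counterpart of this command (not part of this kernel series),
and the dma-buf fd is assumed to come from the GPU driver's allocation
API:

#include <infiniband/verbs.h>

/* Register 'length' bytes of an exported GPU buffer for RDMA access. */
static struct ibv_mr *reg_dmabuf_mr(struct ibv_pd *pd, int dmabuf_fd,
				    size_t length, uint64_t iova)
{
	int access = IBV_ACCESS_LOCAL_WRITE |
		     IBV_ACCESS_REMOTE_READ |
		     IBV_ACCESS_REMOTE_WRITE;

	/* offset 0 into the dma-buf, mapped at 'iova' on the HCA */
	return ibv_reg_dmabuf_mr(pd, 0, length, iova, dmabuf_fd, access);
}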

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
---
 drivers/infiniband/core/uverbs_std_types_mr.c | 115 ++++++++++++++++++++++++++
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  14 ++++
 2 files changed, 129 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
index 9b22bb5..388364a 100644
--- a/drivers/infiniband/core/uverbs_std_types_mr.c
+++ b/drivers/infiniband/core/uverbs_std_types_mr.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
+ * Copyright (c) 2020, Intel Corporation.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -178,6 +179,88 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
 	return IS_UVERBS_COPY_ERR(ret) ? ret : 0;
 }
 
+static int UVERBS_HANDLER(UVERBS_METHOD_REG_DMABUF_MR)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct ib_uobject *uobj =
+		uverbs_attr_get_uobject(attrs, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
+	struct ib_pd *pd =
+		uverbs_attr_get_obj(attrs, UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE);
+	struct ib_device *ib_dev = pd->device;
+
+	struct ib_user_mr_attr user_mr_attr;
+	struct ib_mr *mr;
+	int ret;
+
+	if (!ib_dev->ops.reg_user_mr)
+		return -EOPNOTSUPP;
+
+	if (!(ib_dev->attrs.device_cap_flags & IB_DEVICE_ON_DEMAND_PAGING))
+		return -EOPNOTSUPP;
+
+	ret = uverbs_copy_from(&user_mr_attr.start, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_ADDR);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&user_mr_attr.length, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_LENGTH);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&user_mr_attr.virt_addr, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_HCA_VA);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&user_mr_attr.fd, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_FD);
+	if (ret)
+		return ret;
+
+	ret = uverbs_get_flags32(&user_mr_attr.access_flags, attrs,
+				 UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+				 IB_ACCESS_SUPPORTED);
+	if (ret)
+		return ret;
+
+	user_mr_attr.access_flags |= IB_ACCESS_DMABUF;
+
+	ret = ib_check_mr_access(user_mr_attr.access_flags);
+	if (ret)
+		return ret;
+
+	mr = pd->device->ops.reg_user_mr(pd, &user_mr_attr,
+					 &attrs->driver_udata);
+	if (IS_ERR(mr))
+		return PTR_ERR(mr);
+
+	mr->device  = pd->device;
+	mr->pd      = pd;
+	mr->type    = IB_MR_TYPE_USER;
+	mr->uobject = uobj;
+	atomic_inc(&pd->usecnt);
+
+	uobj->object = mr;
+
+	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+			     &mr->lkey, sizeof(mr->lkey));
+	if (ret)
+		goto err_dereg;
+
+	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+			     &mr->rkey, sizeof(mr->rkey));
+	if (ret)
+		goto err_dereg;
+
+	return 0;
+
+err_dereg:
+	ib_dereg_mr_user(mr, uverbs_get_cleared_udata(attrs));
+
+	return ret;
+}
+
 DECLARE_UVERBS_NAMED_METHOD(
 	UVERBS_METHOD_ADVISE_MR,
 	UVERBS_ATTR_IDR(UVERBS_ATTR_ADVISE_MR_PD_HANDLE,
@@ -243,6 +326,37 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
 			    UVERBS_ATTR_TYPE(u32),
 			    UA_MANDATORY));
 
+DECLARE_UVERBS_NAMED_METHOD(
+	UVERBS_METHOD_REG_DMABUF_MR,
+	UVERBS_ATTR_IDR(UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
+			UVERBS_OBJECT_MR,
+			UVERBS_ACCESS_NEW,
+			UA_MANDATORY),
+	UVERBS_ATTR_IDR(UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE,
+			UVERBS_OBJECT_PD,
+			UVERBS_ACCESS_READ,
+			UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_ADDR,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_LENGTH,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_HCA_VA,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_FD,
+			   UVERBS_ATTR_TYPE(u32),
+			   UA_MANDATORY),
+	UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+			     enum ib_access_flags),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+			    UVERBS_ATTR_TYPE(u32),
+			    UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+			    UVERBS_ATTR_TYPE(u32),
+			    UA_MANDATORY));
+
 DECLARE_UVERBS_NAMED_METHOD_DESTROY(
 	UVERBS_METHOD_MR_DESTROY,
 	UVERBS_ATTR_IDR(UVERBS_ATTR_DESTROY_MR_HANDLE,
@@ -253,6 +367,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
 DECLARE_UVERBS_NAMED_OBJECT(
 	UVERBS_OBJECT_MR,
 	UVERBS_TYPE_ALLOC_IDR(uverbs_free_mr),
+	&UVERBS_METHOD(UVERBS_METHOD_REG_DMABUF_MR),
 	&UVERBS_METHOD(UVERBS_METHOD_DM_MR_REG),
 	&UVERBS_METHOD(UVERBS_METHOD_MR_DESTROY),
 	&UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 99dcabf..6fd3324 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -249,6 +250,7 @@ enum uverbs_methods_mr {
 	UVERBS_METHOD_MR_DESTROY,
 	UVERBS_METHOD_ADVISE_MR,
 	UVERBS_METHOD_QUERY_MR,
+	UVERBS_METHOD_REG_DMABUF_MR,
 };
 
 enum uverbs_attrs_mr_destroy_ids {
@@ -270,6 +272,18 @@ enum uverbs_attrs_query_mr_cmd_attr_ids {
 	UVERBS_ATTR_QUERY_MR_RESP_IOVA,
 };
 
+enum uverbs_attrs_reg_dmabuf_mr_cmd_attr_ids {
+	UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
+	UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE,
+	UVERBS_ATTR_REG_DMABUF_MR_ADDR,
+	UVERBS_ATTR_REG_DMABUF_MR_LENGTH,
+	UVERBS_ATTR_REG_DMABUF_MR_HCA_VA,
+	UVERBS_ATTR_REG_DMABUF_MR_FD,
+	UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+	UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+	UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+};
+
 enum uverbs_attrs_create_counters_cmd_attr_ids {
 	UVERBS_ATTR_CREATE_COUNTERS_HANDLE,
 };
-- 
1.8.3.1


* Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-04 19:12 ` [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region Jianxin Xiong
@ 2020-10-05 10:54   ` Christian König
  2020-10-05 16:19     ` Xiong, Jianxin
  2020-10-05 13:13   ` Jason Gunthorpe
  1 sibling, 1 reply; 22+ messages in thread
From: Christian König @ 2020-10-05 10:54 UTC (permalink / raw)
  To: Jianxin Xiong, linux-rdma, dri-devel
  Cc: Jason Gunthorpe, Doug Ledford, Leon Romanovsky, Daniel Vetter

Hi Jianxin,

On 04.10.20 at 21:12, Jianxin Xiong wrote:
> Dma-buf is a standard cross-driver buffer sharing mechanism that can be
> used to support peer-to-peer access from RDMA devices.
>
> Device memory exported via dma-buf is associated with a file descriptor.
> This is passed to the user space as a property associated with the
> buffer allocation. When the buffer is registered as a memory region,
> the file descriptor is passed to the RDMA driver along with other
> parameters.
>
> Implement the common code for importing dma-buf object and mapping
> dma-buf pages.
>
> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
> Reviewed-by: Sean Hefty <sean.hefty@intel.com>
> Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>

Well, first of all, really nice work you have done here.

Since I'm not an expert on RDMA or its drivers, I can't really review
any of that part.

But at least from the DMA-buf side it looks like you are using the 
interface correctly as intended.

So feel free to add an Acked-by: Christian König 
<christian.koenig@amd.com> if it helps.

Thanks,
Christian.

> ---
>   drivers/infiniband/core/Makefile      |   2 +-
>   drivers/infiniband/core/umem.c        |   4 +
>   drivers/infiniband/core/umem_dmabuf.c | 291 ++++++++++++++++++++++++++++++++++
>   drivers/infiniband/core/umem_dmabuf.h |  14 ++
>   drivers/infiniband/core/umem_odp.c    |  12 ++
>   include/rdma/ib_umem.h                |  19 ++-
>   6 files changed, 340 insertions(+), 2 deletions(-)
>   create mode 100644 drivers/infiniband/core/umem_dmabuf.c
>   create mode 100644 drivers/infiniband/core/umem_dmabuf.h
>
> diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
> index 24cb71a..b8d51a7 100644
> --- a/drivers/infiniband/core/Makefile
> +++ b/drivers/infiniband/core/Makefile
> @@ -40,5 +40,5 @@ ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
>   				uverbs_std_types_srq.o \
>   				uverbs_std_types_wq.o \
>   				uverbs_std_types_qp.o
> -ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
> +ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
>   ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 831bff8..59ec36c 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -2,6 +2,7 @@
>    * Copyright (c) 2005 Topspin Communications.  All rights reserved.
>    * Copyright (c) 2005 Cisco Systems.  All rights reserved.
>    * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
> + * Copyright (c) 2020 Intel Corporation. All rights reserved.
>    *
>    * This software is available to you under a choice of one of two
>    * licenses.  You may choose to be licensed under the terms of the GNU
> @@ -42,6 +43,7 @@
>   #include <rdma/ib_umem_odp.h>
>   
>   #include "uverbs.h"
> +#include "umem_dmabuf.h"
>   
>   static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
>   {
> @@ -318,6 +320,8 @@ void ib_umem_release(struct ib_umem *umem)
>   {
>   	if (!umem)
>   		return;
> +	if (umem->is_dmabuf)
> +		return ib_umem_dmabuf_release(umem);
>   	if (umem->is_odp)
>   		return ib_umem_odp_release(to_ib_umem_odp(umem));
>   
> diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
> new file mode 100644
> index 0000000..10ed646
> --- /dev/null
> +++ b/drivers/infiniband/core/umem_dmabuf.c
> @@ -0,0 +1,291 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> +/*
> + * Copyright (c) 2020 Intel Corporation. All rights reserved.
> + */
> +
> +#include <linux/dma-buf.h>
> +#include <linux/dma-resv.h>
> +#include <linux/dma-mapping.h>
> +#include <rdma/ib_umem_odp.h>
> +
> +#include "uverbs.h"
> +
> +struct ib_umem_dmabuf {
> +	struct ib_umem_odp umem_odp;
> +	struct dma_buf_attachment *attach;
> +	struct sg_table *sgt;
> +	atomic_t notifier_seq;
> +};
> +
> +static inline struct ib_umem_dmabuf *to_ib_umem_dmabuf(struct ib_umem *umem)
> +{
> +	struct ib_umem_odp *umem_odp = to_ib_umem_odp(umem);
> +	return container_of(umem_odp, struct ib_umem_dmabuf, umem_odp);
> +}
> +
> +static void ib_umem_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
> +{
> +	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
> +	struct ib_umem_odp *umem_odp = &umem_dmabuf->umem_odp;
> +	struct mmu_notifier_range range = {};
> +	unsigned long current_seq;
> +
> +	/* no concurrent invalidation due to the dma_resv lock */
> +
> +	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> +
> +	if (!umem_dmabuf->sgt)
> +		return;
> +
> +	range.start = ib_umem_start(umem_odp);
> +	range.end = ib_umem_end(umem_odp);
> +	range.flags = MMU_NOTIFIER_RANGE_BLOCKABLE;
> +	current_seq = atomic_read(&umem_dmabuf->notifier_seq);
> +	umem_odp->notifier.ops->invalidate(&umem_odp->notifier, &range,
> +					   current_seq);
> +
> +	atomic_inc(&umem_dmabuf->notifier_seq);
> +}
> +
> +static struct dma_buf_attach_ops ib_umem_dmabuf_attach_ops = {
> +	.allow_peer2peer = 1,
> +	.move_notify = ib_umem_dmabuf_invalidate_cb,
> +};
> +
> +static inline int ib_umem_dmabuf_init_odp(struct ib_umem_odp *umem_odp)
> +{
> +	size_t page_size = 1UL << umem_odp->page_shift;
> +	unsigned long start;
> +	unsigned long end;
> +	size_t pages;
> +
> +	umem_odp->umem.is_odp = 1;
> +	mutex_init(&umem_odp->umem_mutex);
> +
> +	start = ALIGN_DOWN(umem_odp->umem.address, page_size);
> +	if (check_add_overflow(umem_odp->umem.address,
> +			       (unsigned long)umem_odp->umem.length,
> +			       &end))
> +		return -EOVERFLOW;
> +	end = ALIGN(end, page_size);
> +	if (unlikely(end < page_size))
> +		return -EOVERFLOW;
> +
> +	pages = (end - start) >> umem_odp->page_shift;
> +	if (!pages)
> +		return -EINVAL;
> +
> +	/* used for ib_umem_start() & ib_umem_end() */
> +	umem_odp->notifier.interval_tree.start = start;
> +	umem_odp->notifier.interval_tree.last = end - 1;
> +
> +	/* umem_odp->page_list is never used for dma-buf */
> +
> +	umem_odp->dma_list = kvcalloc(
> +		pages, sizeof(*umem_odp->dma_list), GFP_KERNEL);
> +	if (!umem_odp->dma_list)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> +				   unsigned long addr, size_t size,
> +				   int dmabuf_fd, int access,
> +				   const struct mmu_interval_notifier_ops *ops)
> +{
> +	struct dma_buf *dmabuf;
> +	struct ib_umem_dmabuf *umem_dmabuf;
> +	struct ib_umem *umem;
> +	struct ib_umem_odp *umem_odp;
> +	unsigned long end;
> +	long ret;
> +
> +	if (check_add_overflow(addr, size, &end))
> +		return ERR_PTR(-EINVAL);
> +
> +	if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
> +		return ERR_PTR(-EINVAL);
> +
> +	umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
> +	if (!umem_dmabuf)
> +		return ERR_PTR(-ENOMEM);
> +
> +	umem = &umem_dmabuf->umem_odp.umem;
> +	umem->ibdev = device;
> +	umem->length = size;
> +	umem->address = addr;
> +	umem->writable = ib_access_writable(access);
> +	umem->is_dmabuf = 1;
> +
> +	dmabuf = dma_buf_get(dmabuf_fd);
> +	if (IS_ERR(dmabuf)) {
> +		ret = PTR_ERR(dmabuf);
> +		goto out_free_umem;
> +	}
> +
> +	/* always attach dynamically to pass the allow_peer2peer flag */
> +	umem_dmabuf->attach = dma_buf_dynamic_attach(
> +					dmabuf,
> +					device->dma_device,
> +					&ib_umem_dmabuf_attach_ops,
> +					umem_dmabuf);
> +	if (IS_ERR(umem_dmabuf->attach)) {
> +		ret = PTR_ERR(umem_dmabuf->attach);
> +		goto out_release_dmabuf;
> +	}
> +
> +	umem_odp = &umem_dmabuf->umem_odp;
> +	umem_odp->page_shift = PAGE_SHIFT;
> +	if (access & IB_ACCESS_HUGETLB) {
> +		/* don't support huge_tlb at this point */
> +		ret = -EINVAL;
> +		goto out_detach_dmabuf;
> +	}
> +
> +	ret = ib_umem_dmabuf_init_odp(umem_odp);
> +	if (ret)
> +		goto out_detach_dmabuf;
> +
> +	umem_odp->notifier.ops = ops;
> +	return umem;
> +
> +out_detach_dmabuf:
> +	dma_buf_detach(dmabuf, umem_dmabuf->attach);
> +
> +out_release_dmabuf:
> +	dma_buf_put(dmabuf);
> +
> +out_free_umem:
> +	kfree(umem_dmabuf);
> +	return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL(ib_umem_dmabuf_get);
> +
> +unsigned long ib_umem_dmabuf_notifier_read_begin(struct ib_umem_odp *umem_odp)
> +{
> +	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(&umem_odp->umem);
> +
> +	return atomic_read(&umem_dmabuf->notifier_seq);
> +}
> +EXPORT_SYMBOL(ib_umem_dmabuf_notifier_read_begin);
> +
> +int ib_umem_dmabuf_notifier_read_retry(struct ib_umem_odp *umem_odp,
> +				       unsigned long current_seq)
> +{
> +	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(&umem_odp->umem);
> +
> +	return (atomic_read(&umem_dmabuf->notifier_seq) != current_seq);
> +}
> +EXPORT_SYMBOL(ib_umem_dmabuf_notifier_read_retry);
> +
> +int ib_umem_dmabuf_map_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
> +			     u64 access_mask, unsigned long current_seq)
> +{
> +	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(umem);
> +	struct ib_umem_odp *umem_odp = &umem_dmabuf->umem_odp;
> +	u64 start, end, addr;
> +	int j, k, ret = 0, user_pages, pages, total_pages;
> +	unsigned int page_shift;
> +	size_t page_size;
> +	struct scatterlist *sg;
> +	struct sg_table *sgt;
> +
> +	if (access_mask == 0)
> +		return -EINVAL;
> +
> +	if (user_virt < ib_umem_start(umem_odp) ||
> +	    user_virt + bcnt > ib_umem_end(umem_odp))
> +		return -EFAULT;
> +
> +	page_shift = umem_odp->page_shift;
> +	page_size = 1UL << page_shift;
> +	start = ALIGN_DOWN(user_virt, page_size);
> +	end = ALIGN(user_virt + bcnt, page_size);
> +	user_pages = (end - start) >> page_shift;
> +
> +	mutex_lock(&umem_odp->umem_mutex);
> +
> +	/* check for on-ongoing invalidations */
> +	if (ib_umem_dmabuf_notifier_read_retry(umem_odp, current_seq)) {
> +		ret = -EAGAIN;
> +		goto out;
> +	}
> +
> +	ret = user_pages;
> +	if (umem_dmabuf->sgt)
> +		goto out;
> +
> +	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> +	sgt = dma_buf_map_attachment(umem_dmabuf->attach,
> +				     DMA_BIDIRECTIONAL);
> +	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> +
> +	if (IS_ERR(sgt)) {
> +		ret = PTR_ERR(sgt);
> +		goto out;
> +	}
> +
> +	umem->sg_head = *sgt;
> +	umem->nmap = sgt->nents;
> +	umem_dmabuf->sgt = sgt;
> +
> +	k = 0;
> +	total_pages = ib_umem_odp_num_pages(umem_odp);
> +	for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
> +		addr = sg_dma_address(sg);
> +		pages = sg_dma_len(sg) >> page_shift;
> +		while (pages > 0 && k < total_pages) {
> +			umem_odp->dma_list[k++] = addr | access_mask;
> +			umem_odp->npages++;
> +			addr += page_size;
> +			pages--;
> +		}
> +	}
> +
> +	WARN_ON(k != total_pages);
> +
> +out:
> +	mutex_unlock(&umem_odp->umem_mutex);
> +	return ret;
> +}
> +
> +void ib_umem_dmabuf_unmap_pages(struct ib_umem *umem)
> +{
> +	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(umem);
> +	struct ib_umem_odp *umem_odp = &umem_dmabuf->umem_odp;
> +	int npages = ib_umem_odp_num_pages(umem_odp);
> +	int i;
> +
> +	lockdep_assert_held(&umem_odp->umem_mutex);
> +	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> +
> +	if (!umem_dmabuf->sgt)
> +		return;
> +
> +	dma_buf_unmap_attachment(umem_dmabuf->attach, umem_dmabuf->sgt,
> +				 DMA_BIDIRECTIONAL);
> +
> +	umem_dmabuf->sgt = NULL;
> +
> +	for (i = 0; i < npages; i++)
> +		umem_odp->dma_list[i] = 0;
> +	umem_odp->npages = 0;
> +}
> +
> +void ib_umem_dmabuf_release(struct ib_umem *umem)
> +{
> +	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(umem);
> +	struct ib_umem_odp *umem_odp = &umem_dmabuf->umem_odp;
> +	struct dma_buf *dmabuf = umem_dmabuf->attach->dmabuf;
> +
> +	mutex_lock(&umem_odp->umem_mutex);
> +	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> +	ib_umem_dmabuf_unmap_pages(umem);
> +	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> +	mutex_unlock(&umem_odp->umem_mutex);
> +	kvfree(umem_odp->dma_list);
> +	dma_buf_detach(dmabuf, umem_dmabuf->attach);
> +	dma_buf_put(dmabuf);
> +	kfree(umem_dmabuf);
> +}
> diff --git a/drivers/infiniband/core/umem_dmabuf.h b/drivers/infiniband/core/umem_dmabuf.h
> new file mode 100644
> index 0000000..b9378bd
> --- /dev/null
> +++ b/drivers/infiniband/core/umem_dmabuf.h
> @@ -0,0 +1,14 @@
> +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
> +/*
> + * Copyright (c) 2020 Intel Corporation. All rights reserved.
> + */
> +
> +#ifndef UMEM_DMABUF_H
> +#define UMEM_DMABUF_H
> +
> +int ib_umem_dmabuf_map_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
> +			     u64 access_mask, unsigned long current_seq);
> +void ib_umem_dmabuf_unmap_pages(struct ib_umem *umem);
> +void ib_umem_dmabuf_release(struct ib_umem *umem);
> +
> +#endif /* UMEM_DMABUF_H */
> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> index cc6b4be..7e11619 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -1,5 +1,6 @@
>   /*
>    * Copyright (c) 2014 Mellanox Technologies. All rights reserved.
> + * Copyright (c) 2020 Intel Corporation. All rights reserved.
>    *
>    * This software is available to you under a choice of one of two
>    * licenses.  You may choose to be licensed under the terms of the GNU
> @@ -47,6 +48,7 @@
>   #include <rdma/ib_umem_odp.h>
>   
>   #include "uverbs.h"
> +#include "umem_dmabuf.h"
>   
>   static inline int ib_init_umem_odp(struct ib_umem_odp *umem_odp,
>   				   const struct mmu_interval_notifier_ops *ops)
> @@ -263,6 +265,9 @@ struct ib_umem_odp *ib_umem_odp_get(struct ib_device *device,
>   
>   void ib_umem_odp_release(struct ib_umem_odp *umem_odp)
>   {
> +	if (umem_odp->umem.is_dmabuf)
> +		return ib_umem_dmabuf_release(&umem_odp->umem);
> +
>   	/*
>   	 * Ensure that no more pages are mapped in the umem.
>   	 *
> @@ -392,6 +397,10 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>   	unsigned int flags = 0, page_shift;
>   	phys_addr_t p = 0;
>   
> +	if (umem_odp->umem.is_dmabuf)
> +		return ib_umem_dmabuf_map_pages(&umem_odp->umem, user_virt,
> +						bcnt, access_mask, current_seq);
> +
>   	if (access_mask == 0)
>   		return -EINVAL;
>   
> @@ -517,6 +526,9 @@ void ib_umem_odp_unmap_dma_pages(struct ib_umem_odp *umem_odp, u64 virt,
>   	u64 addr;
>   	struct ib_device *dev = umem_odp->umem.ibdev;
>   
> +	if (umem_odp->umem.is_dmabuf)
> +		return ib_umem_dmabuf_unmap_pages(&umem_odp->umem);
> +
>   	lockdep_assert_held(&umem_odp->umem_mutex);
>   
>   	virt = max_t(u64, virt, ib_umem_start(umem_odp));
> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
> index 71f573a..b8ea693 100644
> --- a/include/rdma/ib_umem.h
> +++ b/include/rdma/ib_umem.h
> @@ -1,6 +1,7 @@
>   /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
>   /*
>    * Copyright (c) 2007 Cisco Systems.  All rights reserved.
> + * Copyright (c) 2020 Intel Corporation.  All rights reserved.
>    */
>   
>   #ifndef IB_UMEM_H
> @@ -13,6 +14,7 @@
>   
>   struct ib_ucontext;
>   struct ib_umem_odp;
> +struct ib_umem_dmabuf;
>   
>   struct ib_umem {
>   	struct ib_device       *ibdev;
> @@ -21,6 +23,7 @@ struct ib_umem {
>   	unsigned long		address;
>   	u32 writable : 1;
>   	u32 is_odp : 1;
> +	u32 is_dmabuf : 1;
>   	struct work_struct	work;
>   	struct sg_table sg_head;
>   	int             nmap;
> @@ -51,6 +54,13 @@ int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
>   unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
>   				     unsigned long pgsz_bitmap,
>   				     unsigned long virt);
> +struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> +				   unsigned long addr, size_t size,
> +				   int dmabuf_fd, int access,
> +				   const struct mmu_interval_notifier_ops *ops);
> +unsigned long ib_umem_dmabuf_notifier_read_begin(struct ib_umem_odp *umem_odp);
> +int ib_umem_dmabuf_notifier_read_retry(struct ib_umem_odp *umem_odp,
> +				       unsigned long current_seq);
>   
>   #else /* CONFIG_INFINIBAND_USER_MEM */
>   
> @@ -73,7 +83,14 @@ static inline int ib_umem_find_best_pgsz(struct ib_umem *umem,
>   					 unsigned long virt) {
>   	return -EINVAL;
>   }
> +static inline struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> +						 unsigned long addr,
> +						 size_t size, int dmabuf_fd,
> +						 int access,
> +						 struct mmu_interval_notifier_ops *ops)
> +{
> +	return ERR_PTR(-EINVAL);
> +}
>   
>   #endif /* CONFIG_INFINIBAND_USER_MEM */
> -
>   #endif /* IB_UMEM_H */


* Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-04 19:12 ` [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region Jianxin Xiong
  2020-10-05 10:54   ` Christian König
@ 2020-10-05 13:13   ` Jason Gunthorpe
  2020-10-05 16:18     ` Xiong, Jianxin
  1 sibling, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2020-10-05 13:13 UTC (permalink / raw)
  To: Jianxin Xiong
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford,
	Daniel Vetter, Christian Koenig

On Sun, Oct 04, 2020 at 12:12:28PM -0700, Jianxin Xiong wrote:
> Dma-buf is a standard cross-driver buffer sharing mechanism that can be
> used to support peer-to-peer access from RDMA devices.
> 
> Device memory exported via dma-buf is associated with a file descriptor.
> This is passed to the user space as a property associated with the
> buffer allocation. When the buffer is registered as a memory region,
> the file descriptor is passed to the RDMA driver along with other
> parameters.
> 
> Implement the common code for importing dma-buf object and mapping
> dma-buf pages.
> 
> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
> Reviewed-by: Sean Hefty <sean.hefty@intel.com>
> Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
> ---
>  drivers/infiniband/core/Makefile      |   2 +-
>  drivers/infiniband/core/umem.c        |   4 +
>  drivers/infiniband/core/umem_dmabuf.c | 291 ++++++++++++++++++++++++++++++++++
>  drivers/infiniband/core/umem_dmabuf.h |  14 ++
>  drivers/infiniband/core/umem_odp.c    |  12 ++
>  include/rdma/ib_umem.h                |  19 ++-
>  6 files changed, 340 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/infiniband/core/umem_dmabuf.c
>  create mode 100644 drivers/infiniband/core/umem_dmabuf.h

I think this is using ODP too literally, dmabuf isn't going to need
fine grained page faults, and I'm not sure this locking scheme is OK -
ODP is horrifically complicated.

If this is the approach then I think we should make dmabuf its own
stand alone API, reg_user_mr_dmabuf()

The implementation in mlx5 will be much more understandable, it would
just do dma_buf_dynamic_attach() and program the XLT exactly the same
as a normal umem.

The move_notify() simply zaps the XLT and triggers a work to reload
it after the move. Locking is provided by the dma_resv_lock. Only a
small disruption to the page fault handler is needed.
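
As a rough sketch of that shape (ib_umem_dmabuf_zap_hw() and the
reload_work item are made-up placeholders, not anything from the posted
patch), the importer side could look something like:

/* Called by the exporter with the reservation lock already held:
 * invalidate the HW translation and defer the reload to a worker. */
static void ib_umem_dmabuf_move_notify(struct dma_buf_attachment *attach)
{
	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;

	dma_resv_assert_held(attach->dmabuf->resv);

	ib_umem_dmabuf_zap_hw(umem_dmabuf);
	queue_work(system_unbound_wq, &umem_dmabuf->reload_work);
}

static const struct dma_buf_attach_ops ib_umem_dmabuf_attach_ops = {
	.allow_peer2peer = true,
	.move_notify = ib_umem_dmabuf_move_notify,
};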

> +	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> +	sgt = dma_buf_map_attachment(umem_dmabuf->attach,
> +				     DMA_BIDIRECTIONAL);
> +	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);

This doesn't look right, this lock has to be held up until the HW is
programmed

The use of atomic is probably wrong as well.

> +	k = 0;
> +	total_pages = ib_umem_odp_num_pages(umem_odp);
> +	for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
> +		addr = sg_dma_address(sg);
> +		pages = sg_dma_len(sg) >> page_shift;
> +		while (pages > 0 && k < total_pages) {
> +			umem_odp->dma_list[k++] = addr | access_mask;
> +			umem_odp->npages++;
> +			addr += page_size;
> +			pages--;

This isn't fragmenting the sg into a page list properly, won't work
for unaligned things

And really we don't need the dma_list for this case, with a fixed
whole mapping DMA SGL a normal umem sgl is OK and the normal umem XLT
programming in mlx5 is fine.

Jason

* RE: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-05 13:13   ` Jason Gunthorpe
@ 2020-10-05 16:18     ` Xiong, Jianxin
  2020-10-05 16:33       ` Jason Gunthorpe
                         ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Xiong, Jianxin @ 2020-10-05 16:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig

> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Monday, October 05, 2020 6:13 AM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> <daniel.vetter@intel.com>
> Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
> 
> On Sun, Oct 04, 2020 at 12:12:28PM -0700, Jianxin Xiong wrote:
> > Dma-buf is a standard cross-driver buffer sharing mechanism that can
> > be used to support peer-to-peer access from RDMA devices.
> >
> > Device memory exported via dma-buf is associated with a file descriptor.
> > This is passed to the user space as a property associated with the
> > buffer allocation. When the buffer is registered as a memory region,
> > the file descriptor is passed to the RDMA driver along with other
> > parameters.
> >
> > Implement the common code for importing dma-buf object and mapping
> > dma-buf pages.
> >
> > Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
> > Reviewed-by: Sean Hefty <sean.hefty@intel.com>
> > Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
> > ---
> >  drivers/infiniband/core/Makefile      |   2 +-
> >  drivers/infiniband/core/umem.c        |   4 +
> >  drivers/infiniband/core/umem_dmabuf.c | 291
> > ++++++++++++++++++++++++++++++++++
> >  drivers/infiniband/core/umem_dmabuf.h |  14 ++
> >  drivers/infiniband/core/umem_odp.c    |  12 ++
> >  include/rdma/ib_umem.h                |  19 ++-
> >  6 files changed, 340 insertions(+), 2 deletions(-)  create mode
> > 100644 drivers/infiniband/core/umem_dmabuf.c
> >  create mode 100644 drivers/infiniband/core/umem_dmabuf.h
> 
> I think this is using ODP too literally, dmabuf isn't going to need fine grained page faults, and I'm not sure this locking scheme is OK - ODP is
> horrifically complicated.
> 

> If this is the approach then I think we should make dmabuf its own stand alone API, reg_user_mr_dmabuf()

That's the original approach in the first version. We can go back there.

> 
> The implementation in mlx5 will be much more understandable, it would just do dma_buf_dynamic_attach() and program the XLT exactly
> the same as a normal umem.
> 
> The move_notify() simply zap's the XLT and triggers a work to reload it after the move. Locking is provided by the dma_resv_lock. Only a
> small disruption to the page fault handler is needed.
> 

We considered such a scheme but didn't go that way due to the lack of notification when the move is done, and thus the work wouldn't know when it can reload.

Now that I think about it again, we could probably signal the reload in the page fault handler.

> > +	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > +	sgt = dma_buf_map_attachment(umem_dmabuf->attach,
> > +				     DMA_BIDIRECTIONAL);
> > +	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> 
> This doesn't look right, this lock has to be held up until the HW is programmed

The mapping remains valid until it is invalidated again. There is a sequence number check before programming the HW.

> 
> The use of atomic looks probably wrong as well.

Do you mean umem_dmabuf->notifier_seq? Could you elaborate on the concern?

> 
> > +	k = 0;
> > +	total_pages = ib_umem_odp_num_pages(umem_odp);
> > +	for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
> > +		addr = sg_dma_address(sg);
> > +		pages = sg_dma_len(sg) >> page_shift;
> > +		while (pages > 0 && k < total_pages) {
> > +			umem_odp->dma_list[k++] = addr | access_mask;
> > +			umem_odp->npages++;
> > +			addr += page_size;
> > +			pages--;
> 
> This isn't fragmenting the sg into a page list properly, won't work for unaligned things

I thought the addresses were aligned, but I will add explicit alignment handling here.
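
For example (illustrative only, not the code I will post), splitting
each sg entry on page_size boundaries without assuming aligned start
addresses or lengths would look roughly like:

	for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
		u64 start = ALIGN_DOWN(sg_dma_address(sg), page_size);
		u64 end = sg_dma_address(sg) + sg_dma_len(sg);
		u64 addr;

		/* walk the entry in page_size steps instead of trusting
		 * the sg entry boundaries */
		for (addr = start; addr < end && k < total_pages;
		     addr += page_size) {
			umem_odp->dma_list[k++] = addr | access_mask;
			umem_odp->npages++;
		}
	}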

> 
> And really we don't need the dma_list for this case, with a fixed whole mapping DMA SGL a normal umem sgl is OK and the normal umem
> XLT programming in mlx5 is fine.

The dma_list is used by both "polulate_mtt()" and "mlx5_ib_invalidate_range", which are used for XLT programming and invalidating (zapping), respectively.

> 
> Jason

* RE: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-05 10:54   ` Christian König
@ 2020-10-05 16:19     ` Xiong, Jianxin
  0 siblings, 0 replies; 22+ messages in thread
From: Xiong, Jianxin @ 2020-10-05 16:19 UTC (permalink / raw)
  To: Christian König, linux-rdma, dri-devel
  Cc: Jason Gunthorpe, Doug Ledford, Leon Romanovsky, Vetter, Daniel

> -----Original Message-----
> From: Christian König <christian.koenig@amd.com>
> Sent: Monday, October 05, 2020 3:55 AM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>; linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org
> Cc: Doug Ledford <dledford@redhat.com>; Jason Gunthorpe <jgg@ziepe.ca>; Leon Romanovsky <leon@kernel.org>; Sumit Semwal
> <sumit.semwal@linaro.org>; Vetter, Daniel <daniel.vetter@intel.com>
> Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
> 
> Hi Jianxin,
> 
> Am 04.10.20 um 21:12 schrieb Jianxin Xiong:
> > Dma-buf is a standard cross-driver buffer sharing mechanism that can
> > be used to support peer-to-peer access from RDMA devices.
> >
> > Device memory exported via dma-buf is associated with a file descriptor.
> > This is passed to the user space as a property associated with the
> > buffer allocation. When the buffer is registered as a memory region,
> > the file descriptor is passed to the RDMA driver along with other
> > parameters.
> >
> > Implement the common code for importing dma-buf object and mapping
> > dma-buf pages.
> >
> > Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
> > Reviewed-by: Sean Hefty <sean.hefty@intel.com>
> > Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
> 
> well first of all really nice work you have done here.
> 
> Since I'm not an expert on RDMA or its drivers I can't really review any of that part.
> 
> But at least from the DMA-buf side it looks like you are using the interface correctly as intended.
> 
> So feel free to add an Acked-by: Christian König <christian.koenig@amd.com> if it helps.
> 
> Thanks,
> Christian.

Thanks, will do.



* Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-05 16:18     ` Xiong, Jianxin
@ 2020-10-05 16:33       ` Jason Gunthorpe
  2020-10-05 19:41         ` Xiong, Jianxin
  2020-10-06  9:22       ` Daniel Vetter
  2020-10-06 16:40       ` Daniel Vetter
  2 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2020-10-05 16:33 UTC (permalink / raw)
  To: Xiong, Jianxin
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig

On Mon, Oct 05, 2020 at 04:18:11PM +0000, Xiong, Jianxin wrote:

> > The implementation in mlx5 will be much more understandable, it would just do dma_buf_dynamic_attach() and program the XLT exactly
> > the same as a normal umem.
> > 
> > The move_notify() simply zap's the XLT and triggers a work to reload it after the move. Locking is provided by the dma_resv_lock. Only a
> > small disruption to the page fault handler is needed.
> 
> We considered such scheme but didn't go that way due to the lack of
> notification when the move is done and thus the work wouldn't know
> when it can reload.

Well, the work would block on the reservation lock and that indicates
the move is done

It would be nicer if the dma_buf could provide an op to indicate that
things are ready to go, though

> Now I think it again, we could probably signal the reload in the page fault handler. 

This also works, with a performance cost

> > > +	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > > +	sgt = dma_buf_map_attachment(umem_dmabuf->attach,
> > > +				     DMA_BIDIRECTIONAL);
> > > +	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> > 
> > This doesn't look right, this lock has to be held up until the HW is programmed
> 
> The mapping remains valid until being invalidated again. There is a
> sequence number checking before programming the HW.

It races, we could immediately trigger invalidation and then
re-program the HW with this stale data.
 
> > The use of atomic looks probably wrong as well.
> 
> Do you mean umem_dmabuf->notifier_seq? Could you elaborate the concern?

It only increments once per invalidation; that is usually racy.
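
The usual way to handle that is a collision-retry loop around the
notifier sequence, along the lines of the sketch below (the function
name is made up, only the mmu_interval_* helpers are real):

static int ib_umem_dmabuf_fault(struct ib_umem_odp *umem_odp)
{
	unsigned long seq;

again:
	seq = mmu_interval_read_begin(&umem_odp->notifier);

	/* ... rebuild the mapping for the current buffer placement ... */

	mutex_lock(&umem_odp->umem_mutex);
	if (mmu_interval_read_retry(&umem_odp->notifier, seq)) {
		/* an invalidation raced with us, start over */
		mutex_unlock(&umem_odp->umem_mutex);
		goto again;
	}
	/* program the HW while the lock still blocks invalidation */
	mutex_unlock(&umem_odp->umem_mutex);
	return 0;
}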

> > > +	total_pages = ib_umem_odp_num_pages(umem_odp);
> > > +	for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
> > > +		addr = sg_dma_address(sg);
> > > +		pages = sg_dma_len(sg) >> page_shift;
> > > +		while (pages > 0 && k < total_pages) {
> > > +			umem_odp->dma_list[k++] = addr | access_mask;
> > > +			umem_odp->npages++;
> > > +			addr += page_size;
> > > +			pages--;
> > 
> > This isn't fragmenting the sg into a page list properly, won't work for unaligned things
> 
> I thought the addresses are aligned, but will add explicit alignment here.

I have no idea what comes out of dma_buf, I wouldn't make too many
assumptions since it does have to pass through the IOMMU layer too

> > And really we don't need the dma_list for this case, with a fixed
> > whole mapping DMA SGL a normal umem sgl is OK and the normal umem
> > XLT programming in mlx5 is fine.
> 
> The dma_list is used by both "polulate_mtt()" and
> "mlx5_ib_invalidate_range", which are used for XLT programming and
> invalidating (zapping), respectively.

Don't use those functions for the dma_buf flow.

Jason

* RE: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-05 16:33       ` Jason Gunthorpe
@ 2020-10-05 19:41         ` Xiong, Jianxin
  0 siblings, 0 replies; 22+ messages in thread
From: Xiong, Jianxin @ 2020-10-05 19:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig


> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Monday, October 05, 2020 9:33 AM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> <daniel.vetter@intel.com>
> Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
> 
> On Mon, Oct 05, 2020 at 04:18:11PM +0000, Xiong, Jianxin wrote:
> 
> > > The implementation in mlx5 will be much more understandable, it
> > > would just do dma_buf_dynamic_attach() and program the XLT exactly the same as a normal umem.
> > >
> > > The move_notify() simply zap's the XLT and triggers a work to reload
> > > it after the move. Locking is provided by the dma_resv_lock. Only a small disruption to the page fault handler is needed.
> >
> > We considered such scheme but didn't go that way due to the lack of
> > notification when the move is done and thus the work wouldn't know
> > when it can reload.
> 
> Well, the work would block on the reservation lock and that indicates the move is done

Got it.  Will work on that.

> 
> It would be nicer if the dma_buf could provide an op that things are ready to go though
> 
> > Now I think it again, we could probably signal the reload in the page fault handler.
> 
> This also works, with a performance cost
> 
> > > > +	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > > > +	sgt = dma_buf_map_attachment(umem_dmabuf->attach,
> > > > +				     DMA_BIDIRECTIONAL);
> > > > +	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> > >
> > > This doesn't look right, this lock has to be held up until the HW is
> > > programmed
> >
> > The mapping remains valid until being invalidated again. There is a
> > sequence number checking before programming the HW.
> 
> It races, we could immediately trigger invalidation and then re-program the HW with this stale data.
> 
> > > The use of atomic looks probably wrong as well.
> >
> > Do you mean umem_dmabuf->notifier_seq? Could you elaborate the concern?
> 
> It only increments once per invalidation, that usually is racy.

I will rework these parts.

> 
> > > > +	total_pages = ib_umem_odp_num_pages(umem_odp);
> > > > +	for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
> > > > +		addr = sg_dma_address(sg);
> > > > +		pages = sg_dma_len(sg) >> page_shift;
> > > > +		while (pages > 0 && k < total_pages) {
> > > > +			umem_odp->dma_list[k++] = addr | access_mask;
> > > > +			umem_odp->npages++;
> > > > +			addr += page_size;
> > > > +			pages--;
> > >
> > > This isn't fragmenting the sg into a page list properly, won't work
> > > for unaligned things
> >
> > I thought the addresses are aligned, but will add explicit alignment here.
> 
> I have no idea what comes out of dma_buf, I wouldn't make too many assumptions since it does have to pass through the IOMMU layer too
> 
> > > And really we don't need the dma_list for this case, with a fixed
> > > whole mapping DMA SGL a normal umem sgl is OK and the normal umem
> > > XLT programming in mlx5 is fine.
> >
> > The dma_list is used by both "polulate_mtt()" and
> > "mlx5_ib_invalidate_range", which are used for XLT programming and
> > invalidating (zapping), respectively.
> 
> Don't use those functions for the dma_buf flow.

Ok.  I think we can use mlx5_ib_update_xlt() directly for the dma-buf case.
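
E.g. something along these lines (hypothetical usage, the exact index,
size and flags depend on how the MR is set up):

	/* zap the translation entries from move_notify ... */
	err = mlx5_ib_update_xlt(mr, 0, npages, page_shift,
				 MLX5_IB_UPD_XLT_ZAP |
				 MLX5_IB_UPD_XLT_ATOMIC);

	/* ... and repopulate them once the new mapping is in place */
	err = mlx5_ib_update_xlt(mr, 0, npages, page_shift,
				 MLX5_IB_UPD_XLT_ENABLE);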

> 
> Jason


* Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-05 16:18     ` Xiong, Jianxin
  2020-10-05 16:33       ` Jason Gunthorpe
@ 2020-10-06  9:22       ` Daniel Vetter
  2020-10-06 15:26         ` Xiong, Jianxin
  2020-10-06 15:49         ` Jason Gunthorpe
  2020-10-06 16:40       ` Daniel Vetter
  2 siblings, 2 replies; 22+ messages in thread
From: Daniel Vetter @ 2020-10-06  9:22 UTC (permalink / raw)
  To: Xiong, Jianxin
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Jason Gunthorpe,
	Doug Ledford, Vetter, Daniel, Christian Koenig

On Mon, Oct 05, 2020 at 04:18:11PM +0000, Xiong, Jianxin wrote:
> > -----Original Message-----
> > From: Jason Gunthorpe <jgg@ziepe.ca>
> > Sent: Monday, October 05, 2020 6:13 AM
> > To: Xiong, Jianxin <jianxin.xiong@intel.com>
> > Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> > <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> > <daniel.vetter@intel.com>
> > Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
> > 
> > On Sun, Oct 04, 2020 at 12:12:28PM -0700, Jianxin Xiong wrote:
> > > Dma-buf is a standard cross-driver buffer sharing mechanism that can
> > > be used to support peer-to-peer access from RDMA devices.
> > >
> > > Device memory exported via dma-buf is associated with a file descriptor.
> > > This is passed to the user space as a property associated with the
> > > buffer allocation. When the buffer is registered as a memory region,
> > > the file descriptor is passed to the RDMA driver along with other
> > > parameters.
> > >
> > > Implement the common code for importing dma-buf object and mapping
> > > dma-buf pages.
> > >
> > > Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
> > > Reviewed-by: Sean Hefty <sean.hefty@intel.com>
> > > Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
> > > ---
> > >  drivers/infiniband/core/Makefile      |   2 +-
> > >  drivers/infiniband/core/umem.c        |   4 +
> > >  drivers/infiniband/core/umem_dmabuf.c | 291
> > > ++++++++++++++++++++++++++++++++++
> > >  drivers/infiniband/core/umem_dmabuf.h |  14 ++
> > >  drivers/infiniband/core/umem_odp.c    |  12 ++
> > >  include/rdma/ib_umem.h                |  19 ++-
> > >  6 files changed, 340 insertions(+), 2 deletions(-)  create mode
> > > 100644 drivers/infiniband/core/umem_dmabuf.c
> > >  create mode 100644 drivers/infiniband/core/umem_dmabuf.h
> > 
> > I think this is using ODP too literally, dmabuf isn't going to need fine grained page faults, and I'm not sure this locking scheme is OK - ODP is
> > horrifically complicated.
> > 
> 
> > If this is the approach then I think we should make dmabuf its own stand alone API, reg_user_mr_dmabuf()
> 
> That's the original approach in the first version. We can go back there.
> 
> > 
> > The implementation in mlx5 will be much more understandable, it would just do dma_buf_dynamic_attach() and program the XLT exactly
> > the same as a normal umem.
> > 
> > The move_notify() simply zap's the XLT and triggers a work to reload it after the move. Locking is provided by the dma_resv_lock. Only a
> > small disruption to the page fault handler is needed.
> > 
> 
> We considered such scheme but didn't go that way due to the lack of
> notification when the move is done and thus the work wouldn't know when
> it can reload.
> 
> Now I think it again, we could probably signal the reload in the page fault handler. 

For reinstating the pages you need:

- dma_resv_lock, this prevents anyone else from issuing new moves or
  anything like that
- dma_resv_get_excl + dma_fence_wait to wait for any pending moves to
  finish. gpus generally don't wait on the cpu, but block the dependent
  dma operations from being scheduled until that fence fired. But for rdma
  odp I think you need the cpu wait in your worker here.
- get the new sg list, write it into your ptes
- dma_resv_unlock to make sure you're not racing with a concurrent
  move_notify

You can also grab multiple dma_resv_locks atomically, but I think the
odp rdma model doesn't require that (gpus need that).

Note that you're allowed to allocate memory with GFP_KERNEL while holding
dma_resv_lock, so this shouldn't impose any issues. You are otoh not
allowed to cause userspace faults (so no gup/pup or copy*user with
faulting enabled). So all in all this shouldn't be any worse than calling
pup for a normal umem.

Unlike mmu notifier the caller holds dma_resv_lock already for you around
the move_notify callback, so you shouldn't need any additional locking in
there (aside from what you need to zap the ptes and flush hw tlbs).

Cheers, Daniel

> 
> > > +	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > > +	sgt = dma_buf_map_attachment(umem_dmabuf->attach,
> > > +				     DMA_BIDIRECTIONAL);
> > > +	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> > 
> > This doesn't look right, this lock has to be held up until the HW is programmed
> 
> The mapping remains valid until being invalidated again. There is a sequence number checking before programming the HW. 
> 
> > 
> > The use of atomic looks probably wrong as well.
> 
> Do you mean umem_dmabuf->notifier_seq? Could you elaborate the concern?
> 
> > 
> > > +	k = 0;
> > > +	total_pages = ib_umem_odp_num_pages(umem_odp);
> > > +	for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
> > > +		addr = sg_dma_address(sg);
> > > +		pages = sg_dma_len(sg) >> page_shift;
> > > +		while (pages > 0 && k < total_pages) {
> > > +			umem_odp->dma_list[k++] = addr | access_mask;
> > > +			umem_odp->npages++;
> > > +			addr += page_size;
> > > +			pages--;
> > 
> > This isn't fragmenting the sg into a page list properly, won't work for unaligned things
> 
> I thought the addresses are aligned, but will add explicit alignment here.
> 
> > 
> > And really we don't need the dma_list for this case, with a fixed whole mapping DMA SGL a normal umem sgl is OK and the normal umem
> > XLT programming in mlx5 is fine.
> 
> The dma_list is used by both "polulate_mtt()" and "mlx5_ib_invalidate_range", which are used for XLT programming and invalidating (zapping), respectively.
> 
> > 
> > Jason
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* RE: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-06  9:22       ` Daniel Vetter
@ 2020-10-06 15:26         ` Xiong, Jianxin
  2020-10-06 15:49         ` Jason Gunthorpe
  1 sibling, 0 replies; 22+ messages in thread
From: Xiong, Jianxin @ 2020-10-06 15:26 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Jason Gunthorpe,
	Doug Ledford, Vetter, Daniel, Christian Koenig


> -----Original Message-----
> From: Daniel Vetter <daniel@ffwll.ch>
> Sent: Tuesday, October 06, 2020 2:22 AM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>; Leon Romanovsky <leon@kernel.org>; linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org;
> Doug Ledford <dledford@redhat.com>; Vetter, Daniel <daniel.vetter@intel.com>; Christian Koenig <christian.koenig@amd.com>
> Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
> 
> On Mon, Oct 05, 2020 at 04:18:11PM +0000, Xiong, Jianxin wrote:
> > > -----Original Message-----
> > > From: Jason Gunthorpe <jgg@ziepe.ca>
> > > Sent: Monday, October 05, 2020 6:13 AM
> > > To: Xiong, Jianxin <jianxin.xiong@intel.com>
> > > Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org;
> > > Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> > > <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian
> > > Koenig <christian.koenig@amd.com>; Vetter, Daniel
> > > <daniel.vetter@intel.com>
> > > Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf
> > > as user memory region
> > >
> > > On Sun, Oct 04, 2020 at 12:12:28PM -0700, Jianxin Xiong wrote:
> > > > Dma-buf is a standard cross-driver buffer sharing mechanism that
> > > > can be used to support peer-to-peer access from RDMA devices.
> > > >
> > > > Device memory exported via dma-buf is associated with a file descriptor.
> > > > This is passed to the user space as a property associated with the
> > > > buffer allocation. When the buffer is registered as a memory
> > > > region, the file descriptor is passed to the RDMA driver along
> > > > with other parameters.
> > > >
> > > > Implement the common code for importing dma-buf object and mapping
> > > > dma-buf pages.
> > > >
> > > > Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
> > > > Reviewed-by: Sean Hefty <sean.hefty@intel.com>
> > > > Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
> > > > ---
> > > >  drivers/infiniband/core/Makefile      |   2 +-
> > > >  drivers/infiniband/core/umem.c        |   4 +
> > > >  drivers/infiniband/core/umem_dmabuf.c | 291
> > > > ++++++++++++++++++++++++++++++++++
> > > >  drivers/infiniband/core/umem_dmabuf.h |  14 ++
> > > >  drivers/infiniband/core/umem_odp.c    |  12 ++
> > > >  include/rdma/ib_umem.h                |  19 ++-
> > > >  6 files changed, 340 insertions(+), 2 deletions(-)  create mode
> > > > 100644 drivers/infiniband/core/umem_dmabuf.c
> > > >  create mode 100644 drivers/infiniband/core/umem_dmabuf.h
> > >
> > > I think this is using ODP too literally, dmabuf isn't going to need
> > > fine grained page faults, and I'm not sure this locking scheme is OK - ODP is horrifically complicated.
> > >
> >
> > > If this is the approach then I think we should make dmabuf its own
> > > stand alone API, reg_user_mr_dmabuf()
> >
> > That's the original approach in the first version. We can go back there.
> >
> > >
> > > The implementation in mlx5 will be much more understandable, it
> > > would just do dma_buf_dynamic_attach() and program the XLT exactly the same as a normal umem.
> > >
> > > The move_notify() simply zap's the XLT and triggers a work to reload
> > > it after the move. Locking is provided by the dma_resv_lock. Only a small disruption to the page fault handler is needed.
> > >
> >
> > We considered such scheme but didn't go that way due to the lack of
> > notification when the move is done and thus the work wouldn't know
> > when it can reload.
> >
> > Now I think it again, we could probably signal the reload in the page fault handler.
> 
> For reinstanting the pages you need:
> 
> - dma_resv_lock, this prevents anyone else from issuing new moves or
>   anything like that
> - dma_resv_get_excl + dma_fence_wait to wait for any pending moves to
>   finish. gpus generally don't wait on the cpu, but block the dependent
>   dma operations from being scheduled until that fence fired. But for rdma
>   odp I think you need the cpu wait in your worker here.
> - get the new sg list, write it into your ptes
> - dma_resv_unlock to make sure you're not racing with a concurrent
>   move_notify
> 
> You can also grab multiple dma_resv_lock in atomically, but I think the odp rdma model doesn't require that (gpus need that).
> 
> Note that you're allowed to allocate memory with GFP_KERNEL while holding dma_resv_lock, so this shouldn't impose any issues. You are
> otoh not allowed to cause userspace faults (so no gup/pup or copy*user with faulting enabled). So all in all this shouldn't be any worse that
> calling pup for normal umem.
> 
> Unlike mmu notifier the caller holds dma_resv_lock already for you around the move_notify callback, so you shouldn't need any additional
> locking in there (aside from what you need to zap the ptes and flush hw tlbs).
> 
> Cheers, Daniel
> 

Hi Daniel, thanks for providing the details. I would have missed the dma_resv_get_excl + dma_fence_wait part otherwise. 

> >
> > > > +	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > > > +	sgt = dma_buf_map_attachment(umem_dmabuf->attach,
> > > > +				     DMA_BIDIRECTIONAL);
> > > > +	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> > >
> > > This doesn't look right, this lock has to be held up until the HW is
> > > programmed
> >
> > The mapping remains valid until being invalidated again. There is a sequence number checking before programming the HW.
> >
> > >
> > > The use of atomic looks probably wrong as well.
> >
> > Do you mean umem_dmabuf->notifier_seq? Could you elaborate the concern?
> >
> > >
> > > > +	k = 0;
> > > > +	total_pages = ib_umem_odp_num_pages(umem_odp);
> > > > +	for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
> > > > +		addr = sg_dma_address(sg);
> > > > +		pages = sg_dma_len(sg) >> page_shift;
> > > > +		while (pages > 0 && k < total_pages) {
> > > > +			umem_odp->dma_list[k++] = addr | access_mask;
> > > > +			umem_odp->npages++;
> > > > +			addr += page_size;
> > > > +			pages--;
> > >
> > > This isn't fragmenting the sg into a page list properly, won't work
> > > for unaligned things
> >
> > I thought the addresses are aligned, but will add explicit alignment here.
> >
> > >
> > > And really we don't need the dma_list for this case, with a fixed
> > > whole mapping DMA SGL a normal umem sgl is OK and the normal umem XLT programming in mlx5 is fine.
> >
> > The dma_list is used by both "polulate_mtt()" and "mlx5_ib_invalidate_range", which are used for XLT programming and invalidating
> (zapping), respectively.
> >
> > >
> > > Jason
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

* Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-06  9:22       ` Daniel Vetter
  2020-10-06 15:26         ` Xiong, Jianxin
@ 2020-10-06 15:49         ` Jason Gunthorpe
  2020-10-06 16:34           ` Daniel Vetter
  1 sibling, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2020-10-06 15:49 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig, Xiong, Jianxin

On Tue, Oct 06, 2020 at 11:22:14AM +0200, Daniel Vetter wrote:
> 
> For reinstanting the pages you need:
> 
> - dma_resv_lock, this prevents anyone else from issuing new moves or
>   anything like that
> - dma_resv_get_excl + dma_fence_wait to wait for any pending moves to
>   finish. gpus generally don't wait on the cpu, but block the dependent
>   dma operations from being scheduled until that fence fired. But for rdma
>   odp I think you need the cpu wait in your worker here.

Reinstating is not really any different from the first insertion, so
then all this should be needed in every case?

Jason

* Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-06 15:49         ` Jason Gunthorpe
@ 2020-10-06 16:34           ` Daniel Vetter
  2020-10-06 17:24             ` Daniel Vetter
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Vetter @ 2020-10-06 16:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig, Xiong, Jianxin

On Tue, Oct 06, 2020 at 12:49:56PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 06, 2020 at 11:22:14AM +0200, Daniel Vetter wrote:
> > 
> > For reinstanting the pages you need:
> > 
> > - dma_resv_lock, this prevents anyone else from issuing new moves or
> >   anything like that
> > - dma_resv_get_excl + dma_fence_wait to wait for any pending moves to
> >   finish. gpus generally don't wait on the cpu, but block the dependent
> >   dma operations from being scheduled until that fence fired. But for rdma
> >   odp I think you need the cpu wait in your worker here.
> 
> Reinstating is not really any different that the first insertion, so
> then all this should be needed in every case?

Yes. Without move_notify we pin the dma-buf into system memory, so it
can't move, and hence you also don't have to chase it. But with
move_notify this all becomes possible.
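
I.e. the difference is just which attach flavour the importer uses
(illustration only; the ops struct is whatever you wire .move_notify
into, as in the sketch earlier in the thread):

	/* without move_notify the buffer gets pinned, so it can't move */
	attach = dma_buf_attach(dmabuf, dev);

	/* dynamic: opt in to moves by providing importer ops */
	attach = dma_buf_dynamic_attach(dmabuf, dev,
					&ib_umem_dmabuf_attach_ops,
					umem_dmabuf);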
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-05 16:18     ` Xiong, Jianxin
  2020-10-05 16:33       ` Jason Gunthorpe
  2020-10-06  9:22       ` Daniel Vetter
@ 2020-10-06 16:40       ` Daniel Vetter
  2 siblings, 0 replies; 22+ messages in thread
From: Daniel Vetter @ 2020-10-06 16:40 UTC (permalink / raw)
  To: Xiong, Jianxin
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Jason Gunthorpe,
	Doug Ledford, Vetter, Daniel, Christian Koenig

On Mon, Oct 05, 2020 at 04:18:11PM +0000, Xiong, Jianxin wrote:
> > -----Original Message-----
> > From: Jason Gunthorpe <jgg@ziepe.ca>
> > Sent: Monday, October 05, 2020 6:13 AM
> > To: Xiong, Jianxin <jianxin.xiong@intel.com>
> > Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> > <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> > <daniel.vetter@intel.com>
> > Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
> > 
> > On Sun, Oct 04, 2020 at 12:12:28PM -0700, Jianxin Xiong wrote:
> > > Dma-buf is a standard cross-driver buffer sharing mechanism that can
> > > be used to support peer-to-peer access from RDMA devices.
> > >
> > > Device memory exported via dma-buf is associated with a file descriptor.
> > > This is passed to the user space as a property associated with the
> > > buffer allocation. When the buffer is registered as a memory region,
> > > the file descriptor is passed to the RDMA driver along with other
> > > parameters.
> > >
> > > Implement the common code for importing dma-buf object and mapping
> > > dma-buf pages.
> > >
> > > Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
> > > Reviewed-by: Sean Hefty <sean.hefty@intel.com>
> > > Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
> > > ---
> > >  drivers/infiniband/core/Makefile      |   2 +-
> > >  drivers/infiniband/core/umem.c        |   4 +
> > >  drivers/infiniband/core/umem_dmabuf.c | 291
> > > ++++++++++++++++++++++++++++++++++
> > >  drivers/infiniband/core/umem_dmabuf.h |  14 ++
> > >  drivers/infiniband/core/umem_odp.c    |  12 ++
> > >  include/rdma/ib_umem.h                |  19 ++-
> > >  6 files changed, 340 insertions(+), 2 deletions(-)  create mode
> > > 100644 drivers/infiniband/core/umem_dmabuf.c
> > >  create mode 100644 drivers/infiniband/core/umem_dmabuf.h
> > 
> > I think this is using ODP too literally, dmabuf isn't going to need fine grained page faults, and I'm not sure this locking scheme is OK - ODP is
> > horrifically complicated.
> > 
> 
> > If this is the approach then I think we should make dmabuf its own stand alone API, reg_user_mr_dmabuf()
> 
> That's the original approach in the first version. We can go back there.
> 
> > 
> > The implementation in mlx5 will be much more understandable, it would just do dma_buf_dynamic_attach() and program the XLT exactly
> > the same as a normal umem.
> > 
> > The move_notify() simply zap's the XLT and triggers a work to reload it after the move. Locking is provided by the dma_resv_lock. Only a
> > small disruption to the page fault handler is needed.
> > 
> 
> We considered such scheme but didn't go that way due to the lack of notification when the move is done and thus the work wouldn't know when it can reload.
> 
> Now I think it again, we could probably signal the reload in the page fault handler. 
> 
> > > +	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > > +	sgt = dma_buf_map_attachment(umem_dmabuf->attach,
> > > +				     DMA_BIDIRECTIONAL);
> > > +	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> > 
> > This doesn't look right, this lock has to be held up until the HW is programmed
> 
> The mapping remains valid until being invalidated again. There is a sequence number checking before programming the HW. 
> 
> > 
> > The use of atomic looks probably wrong as well.
> 
> Do you mean umem_dmabuf->notifier_seq? Could you elaborate the concern?
> 
> > 
> > > +	k = 0;
> > > +	total_pages = ib_umem_odp_num_pages(umem_odp);
> > > +	for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
> > > +		addr = sg_dma_address(sg);
> > > +		pages = sg_dma_len(sg) >> page_shift;
> > > +		while (pages > 0 && k < total_pages) {
> > > +			umem_odp->dma_list[k++] = addr | access_mask;
> > > +			umem_odp->npages++;
> > > +			addr += page_size;
> > > +			pages--;
> > 
> > This isn't fragmenting the sg into a page list properly, won't work for unaligned things
> 
> I thought the addresses are aligned, but will add explicit alignment here.

Everyone's working under the assumption that dma_buf sg lists are fully
PAGE aligned. Lots of stuff can break otherwise all over the place. You
might get more than that, especially for p2p since many gpus work on 64kb
pages for the vram, but system pages are 4k aligned.

I think we need a patch to clarify this in the kerneldoc, both for
dma_buf_map_attachment for importers and the dma_buf_ops callback for
exporters.

Also I think it'd be good to verify this when dma api debugging is
enabled: when dma_buf_map_attachment succeeds, walk the entire sg
list and validate that all dma_addr_t segments are page aligned, and if
not, WARN loudly and fail.
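
Roughly (a sketch of the idea, assuming it sits at the end of
dma_buf_map_attachment() where attach, sgt and direction are in scope;
this is not existing code):

#if IS_ENABLED(CONFIG_DMA_API_DEBUG)
	{
		struct scatterlist *sg;
		int i;

		for_each_sgtable_dma_sg(sgt, sg, i) {
			if (WARN_ON_ONCE(!PAGE_ALIGNED(sg_dma_address(sg)) ||
					 !PAGE_ALIGNED(sg_dma_len(sg)))) {
				dma_buf_unmap_attachment(attach, sgt,
							 direction);
				return ERR_PTR(-EINVAL);
			}
		}
	}
#endif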
-Daniel

 
> > 
> > And really we don't need the dma_list for this case, with a fixed whole mapping DMA SGL a normal umem sgl is OK and the normal umem
> > XLT programming in mlx5 is fine.
> 
> The dma_list is used by both "polulate_mtt()" and "mlx5_ib_invalidate_range", which are used for XLT programming and invalidating (zapping), respectively.
> 
> > 
> > Jason
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-06 16:34           ` Daniel Vetter
@ 2020-10-06 17:24             ` Daniel Vetter
  2020-10-06 18:02               ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Vetter @ 2020-10-06 17:24 UTC (permalink / raw)
  To: Jason Gunthorpe, Christian König
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Xiong, Jianxin

On Tue, Oct 6, 2020 at 6:34 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Oct 06, 2020 at 12:49:56PM -0300, Jason Gunthorpe wrote:
> > On Tue, Oct 06, 2020 at 11:22:14AM +0200, Daniel Vetter wrote:
> > >
> > > For reinstanting the pages you need:
> > >
> > > - dma_resv_lock, this prevents anyone else from issuing new moves or
> > >   anything like that
> > > - dma_resv_get_excl + dma_fence_wait to wait for any pending moves to
> > >   finish. gpus generally don't wait on the cpu, but block the dependent
> > >   dma operations from being scheduled until that fence fired. But for rdma
> > >   odp I think you need the cpu wait in your worker here.
> >
> > Reinstating is not really any different that the first insertion, so
> > then all this should be needed in every case?
>
> Yes. Without move_notify we pin the dma-buf into system memory, so it
> can't move, and hence you also don't have to chase it. But with
> move_notify this all becomes possible.

I just realized I got it wrong compared to gpus. It needs to be (rough
sketch below):
1. dma_resv_lock
2. dma_buf_map_attachment, which might have to move the buffer around
again if you're unlucky
3. wait for the exclusive fence
4. put sgt into your rdma ptes
5. dma_resv_unlock
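
A hand-wavy sketch of that worker flow (ib_umem_dmabuf_program_ptes()
is a placeholder for however the driver fills its XLT/ptes, not real
code anywhere yet):

static void ib_umem_dmabuf_reload(struct ib_umem_dmabuf *umem_dmabuf)
{
	struct dma_buf *dmabuf = umem_dmabuf->attach->dmabuf;
	struct dma_fence *fence;
	struct sg_table *sgt;

	dma_resv_lock(dmabuf->resv, NULL);

	/* may move the buffer around again if we're unlucky */
	sgt = dma_buf_map_attachment(umem_dmabuf->attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt))
		goto out_unlock;

	/* wait for any pending move to actually finish */
	fence = dma_resv_get_excl(dmabuf->resv);
	if (fence)
		dma_fence_wait(fence, false);

	ib_umem_dmabuf_program_ptes(umem_dmabuf, sgt);

out_unlock:
	dma_resv_unlock(dmabuf->resv);
}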

Maybe also something we should document somewhere for dynamic buffers.
Assuming I got it right this time around ... Christian?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-06 17:24             ` Daniel Vetter
@ 2020-10-06 18:02               ` Jason Gunthorpe
  2020-10-06 18:17                 ` Daniel Vetter
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2020-10-06 18:02 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian König, Xiong, Jianxin

On Tue, Oct 06, 2020 at 07:24:30PM +0200, Daniel Vetter wrote:
> On Tue, Oct 6, 2020 at 6:34 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Tue, Oct 06, 2020 at 12:49:56PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Oct 06, 2020 at 11:22:14AM +0200, Daniel Vetter wrote:
> > > >
> > > > For reinstanting the pages you need:
> > > >
> > > > - dma_resv_lock, this prevents anyone else from issuing new moves or
> > > >   anything like that
> > > > - dma_resv_get_excl + dma_fence_wait to wait for any pending moves to
> > > >   finish. gpus generally don't wait on the cpu, but block the dependent
> > > >   dma operations from being scheduled until that fence fired. But for rdma
> > > >   odp I think you need the cpu wait in your worker here.
> > >
> > > Reinstating is not really any different that the first insertion, so
> > > then all this should be needed in every case?
> >
> > Yes. Without move_notify we pin the dma-buf into system memory, so it
> > can't move, and hence you also don't have to chase it. But with
> > move_notify this all becomes possible.
> 
> I just realized I got it wrong compared to gpus. I needs to be:
> 1. dma_resv_lock
> 2. dma_buf_map_attachment, which might have to move the buffer around
> again if you're unlucky
> 3. wait for the exclusive fence
> 4. put sgt into your rdma ptes
> 5 dma_resv_unlock
> 
> Maybe also something we should document somewhere for dynamic buffers.
> Assuming I got it right this time around ... Christian?

#3 between 2 and 4 seems strange - I would expect that once
dma_buf_map_attachment() returns, the buffer can be placed in the
ptes. It certainly can't be changed after the SGL is returned.

Feels like #2 should serialize all this internally? An API that
returns invalid data sometimes is dangerous :)

Jason

* Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-06 18:02               ` Jason Gunthorpe
@ 2020-10-06 18:17                 ` Daniel Vetter
  2020-10-06 18:38                   ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Vetter @ 2020-10-06 18:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian König, Xiong, Jianxin

On Tue, Oct 6, 2020 at 8:02 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Oct 06, 2020 at 07:24:30PM +0200, Daniel Vetter wrote:
> > On Tue, Oct 6, 2020 at 6:34 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Tue, Oct 06, 2020 at 12:49:56PM -0300, Jason Gunthorpe wrote:
> > > > On Tue, Oct 06, 2020 at 11:22:14AM +0200, Daniel Vetter wrote:
> > > > >
> > > > > For reinstanting the pages you need:
> > > > >
> > > > > - dma_resv_lock, this prevents anyone else from issuing new moves or
> > > > >   anything like that
> > > > > - dma_resv_get_excl + dma_fence_wait to wait for any pending moves to
> > > > >   finish. gpus generally don't wait on the cpu, but block the dependent
> > > > >   dma operations from being scheduled until that fence fired. But for rdma
> > > > >   odp I think you need the cpu wait in your worker here.
> > > >
> > > > Reinstating is not really any different that the first insertion, so
> > > > then all this should be needed in every case?
> > >
> > > Yes. Without move_notify we pin the dma-buf into system memory, so it
> > > can't move, and hence you also don't have to chase it. But with
> > > move_notify this all becomes possible.
> >
> > I just realized I got it wrong compared to gpus. I needs to be:
> > 1. dma_resv_lock
> > 2. dma_buf_map_attachment, which might have to move the buffer around
> > again if you're unlucky
> > 3. wait for the exclusive fence
> > 4. put sgt into your rdma ptes
> > 5 dma_resv_unlock
> >
> > Maybe also something we should document somewhere for dynamic buffers.
> > Assuming I got it right this time around ... Christian?
>
> #3 between 2 and 4 seems strange - I would expect once
> dma_buf_map_attachment() returns that the buffer can be placed in the
> ptes. It certianly can't be changed after the SGL is returned..


So on the gpu we pipeline this all. So step 4 doesn't happen on the
cpu, but instead we queue up a bunch of command buffers so that the
gpu writes these pagetables (and the flushes tlbs and then does the
actual stuff userspace wants it to do).

And all that is being blocked in the gpu scheduler on the fences we
acquire in step 3. Again we don't wait on the cpu for that fence, we
just queue it all up and let the gpu scheduler sort out the mess. End
result is that you get an sgt that points at stuff which very well
might have nothing even remotely resembling your buffer in there at
the moment. But all the copy operations are queued up, so the data
will also be there soon.

This is also why the precise semantics of move_notify for gpu<->gpu
sharing took forever to discuss and are still a bit wip, because you
have the inverse problem: The dma api mapping might still be there in
the iommu, but the data behind it is long gone and replaced. So we
need to be really careful about making sure that dma operations are
blocked in the gpu properly, with all the flushing and everything. I
think we've reached the conclusion that ->move_notify is allowed to
change the set of fences in the dma_resv so that these flushes and pte
writes can be queued up correctly (on many gpus you can't synchronously
flush tlbs, yay). The exporter then needs to make sure that the actual
buffer move is queued up behind all these operations too.

But rdma doesn't work like that, so it looks all a bit funny.

Anticipating your next question: Can this mean there's a bunch of
different dma/buffer mappings in flight for the same buffer?

Yes. We call them ghost objects, at least in the ttm helpers.

> Feels like #2 should serialize all this internally? An API that
> returns invalidate data sometimes is dangerous :)

If you use the non-dynamic mode, where we pin the buffer into system
memory at dma_buf_attach time, it kinda works like that. Also it's not
flat out invalid data, it's the most up-to-date data reflecting all
committed changes. Plus dma_resv tells you when that will actually be
reality in the hardware, not just the software tracking of what's
going on.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-06 18:17                 ` Daniel Vetter
@ 2020-10-06 18:38                   ` Jason Gunthorpe
  2020-10-06 19:12                     ` Daniel Vetter
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2020-10-06 18:38 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian König, Xiong, Jianxin

On Tue, Oct 06, 2020 at 08:17:05PM +0200, Daniel Vetter wrote:

> So on the gpu we pipeline this all. So step 4 doesn't happen on the
> cpu, but instead we queue up a bunch of command buffers so that the
> gpu writes these pagetables (and the flushes tlbs and then does the
> actual stuff userspace wants it to do).

mlx5 HW does basically this as well.

We just apply scheduling for this work on the device, not in the CPU.
 
> just queue it all up and let the gpu scheduler sort out the mess. End
> result is that you get a sgt that points at stuff which very well
> might have nothing even remotely resembling your buffer in there at
> the moment. But all the copy operations are queued up, so rsn the data
> will also be there.

The explanation make sense, thanks

> But rdma doesn't work like that, so it looks all a bit funny.

Well, I guess it could, but how would it make anything better? I can
overlap building the SGL and the device PTEs with something else doing
'move', but is that a workload that needs such aggressive optimization?

> This is also why the precise semantics of move_notify for gpu<->gpu
> sharing took forever to discuss and are still a bit wip, because you
> have the inverse problem: The dma api mapping might still be there

Seems like this all makes a graph of operations, can't start the next
one until all deps are finished. Actually sounds a lot like futures.

Would be clearer if this attach API provided some indication that the
SGL is actually a future valid SGL.

Jason

* Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-06 18:38                   ` Jason Gunthorpe
@ 2020-10-06 19:12                     ` Daniel Vetter
  2020-10-07  7:13                       ` Christian König
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Vetter @ 2020-10-06 19:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian König, Xiong, Jianxin

On Tue, Oct 6, 2020 at 8:38 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Oct 06, 2020 at 08:17:05PM +0200, Daniel Vetter wrote:
>
> > So on the gpu we pipeline this all. So step 4 doesn't happen on the
> > cpu, but instead we queue up a bunch of command buffers so that the
> > gpu writes these pagetables (and the flushes tlbs and then does the
> > actual stuff userspace wants it to do).
>
> mlx5 HW does basically this as well.
>
> We just apply scheduling for this work on the device, not in the CPU.
>
> > just queue it all up and let the gpu scheduler sort out the mess. End
> > result is that you get a sgt that points at stuff which very well
> > might have nothing even remotely resembling your buffer in there at
> > the moment. But all the copy operations are queued up, so rsn the data
> > will also be there.
>
> The explanation make sense, thanks
>
> > But rdma doesn't work like that, so it looks all a bit funny.
>
> Well, I guess it could, but how would it make anything better? I can
> overlap building the SGL and the device PTEs with something else doing
> 'move', but is that a workload that needs such agressive optimization?

The compounding issue with gpus is that we need entire lists of
buffers, atomically, for our dma operations. Which means that the
cliff you jump over with a working set that's slightly too big is very
steep, so that you have to pipeline your buffer moves interleaved with
dma operations to keep the hw busy. Having per page fault handling and
hw that can continue in other places while that fault is repaired
should smooth that cliff out enough that you don't need to bother.

I think at worst we might worry about unfairness. With the "entire
list of buffers" workload model gpus might starve out rdma badly by
constantly moving all the buffers around. Installing a dma_fence in
the rdma page fault handler, to keep the dma-buf busy for a small
amount of time to make sure at least the next rdma transfer goes
through without more faults should be able to fix that though. Such a
keepalive fence should be in the shared slots of the dma_resv, to not
block other access. This wouldn't even need any other changes in
rdma (although delaying the pte zapping when we get a move_notify
would be better), since an active fence alone makes that buffer a much
less likely target for eviction.
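
The dma_resv side of that would be tiny; only the keepalive fence
itself is the invented part here (rdma_keepalive_fence_create() does
not exist anywhere):

	struct dma_fence *keep = rdma_keepalive_fence_create(umem_dmabuf);

	/* dmabuf->resv is already locked at this point in the fault path */
	if (!IS_ERR_OR_NULL(keep)) {
		if (!dma_resv_reserve_shared(dmabuf->resv, 1))
			dma_resv_add_shared_fence(dmabuf->resv, keep);
		dma_fence_put(keep);
	}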

> > This is also why the precise semantics of move_notify for gpu<->gpu
> > sharing took forever to discuss and are still a bit wip, because you
> > have the inverse problem: The dma api mapping might still be there
>
> Seems like this all makes a graph of operations, can't start the next
> one until all deps are finished. Actually sounds a lot like futures.
>
> Would be clearer if this attach API provided some indication that the
> SGL is actually a future valid SGL..

Yeah I think one of the things we've discussed is whether dma_buf
should pass around the fences more explicitly, or whether we should
continue to smash on the more implicit dma_resv tracking. Inertia won
out, at least for now because gpu drivers do all the book-keeping
directly in the shared dma_resv structure anyway, so this wouldn't
have helped to get cleaner code.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-10-06 19:12                     ` Daniel Vetter
@ 2020-10-07  7:13                       ` Christian König
  0 siblings, 0 replies; 22+ messages in thread
From: Christian König @ 2020-10-07  7:13 UTC (permalink / raw)
  To: Daniel Vetter, Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Xiong, Jianxin

Hi guys,

maybe it becomes clearer when you see this as two
different things:
1. The current location where the buffer is.
2. If the data inside the buffer can be accessed.

The current location is returned by dma_buf_map_attachment() and 
the result can be used to fill up your page tables.

But before you can access the data at this location, you need to wait
for the exclusive fence to signal.

As Daniel explained, the reason for this is that GPUs are heavily
pipelined pieces of hardware. To keep all blocks busy all the time you
need to prepare things ahead of time.

This is not only needed so that buffers can move around, but also, for
example, for cache coherency. In other words, the buffer could have been
at the given location the whole time, but you still need to wait for the
exclusive fence to guarantee that write-back caches are done with their
job.
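
A simplified, untested importer-side sketch of those two steps (assuming
a dynamic, move_notify-capable attachment; error handling trimmed):

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>
#include <linux/err.h>
#include <linux/sched.h>

static struct sg_table *map_and_wait(struct dma_buf_attachment *attach)
{
        struct dma_resv *resv = attach->dmabuf->resv;
        struct sg_table *sgt;
        long ret;

        /* 1. Where is the buffer right now?  The returned sg_table is
         *    what fills the importer's page tables.  A dynamic importer
         *    must hold the reservation lock across the call. */
        dma_resv_lock(resv, NULL);
        sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
        dma_resv_unlock(resv);
        if (IS_ERR(sgt))
                return sgt;

        /* 2. Is the data there yet?  Wait for the exclusive fence so
         *    that queued copies and cache write-back have finished
         *    before the hardware actually touches the data. */
        ret = dma_resv_wait_timeout_rcu(resv, false /* exclusive only */,
                                        true /* interruptible */,
                                        MAX_SCHEDULE_TIMEOUT);
        if (ret < 0) {
                dma_resv_lock(resv, NULL);
                dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
                dma_resv_unlock(resv);
                return ERR_PTR(ret);
        }

        return sgt;
}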

Regards,
Christian.

On 06.10.20 21:12, Daniel Vetter wrote:
> On Tue, Oct 6, 2020 at 8:38 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> On Tue, Oct 06, 2020 at 08:17:05PM +0200, Daniel Vetter wrote:
>>
>>> So on the gpu we pipeline this all. So step 4 doesn't happen on the
>>> cpu, but instead we queue up a bunch of command buffers so that the
>>> gpu writes these pagetables (and the flushes tlbs and then does the
>>> actual stuff userspace wants it to do).
>> mlx5 HW does basically this as well.
>>
>> We just apply scheduling for this work on the device, not in the CPU.
>>
>>> just queue it all up and let the gpu scheduler sort out the mess. End
>>> result is that you get a sgt that points at stuff which very well
>>> might have nothing even remotely resembling your buffer in there at
>>> the moment. But all the copy operations are queued up, so the data
>>> will also be there soon.
>> The explanation makes sense, thanks
>>
>>> But rdma doesn't work like that, so it looks all a bit funny.
>> Well, I guess it could, but how would it make anything better? I can
>> overlap building the SGL and the device PTEs with something else doing
>> 'move', but is that a workload that needs such aggressive optimization?
> The compounding issue with gpus is that we need entire lists of
> buffers, atomically, for our dma operations. Which means that the
> cliff you jump over with a working set that's slightly too big is very
> steep, so that you have to pipeline your buffer moves interleaved with
> dma operations to keep the hw busy. Having per-page fault handling and
> hw that can continue in other places while that fault is repaired
> should smooth that cliff out enough that you don't need to bother.
>
> I think at worst we might worry about unfairness. With the "entire
> list of buffers" workload model gpus might starve out rdma badly by
> constantly moving all the buffers around. Installing a dma_fence in
> the rdma page fault handler, to keep the dma-buf busy for a small
> amount of time to make sure at least the next rdma transfer goes
> through without more faults, should be able to fix that though. Such a
> keepalive fence should sit in the shared slots of the dma_resv, so as
> not to block other access. This wouldn't even need any other changes in
> rdma (although delaying the pte zapping when we get a move_notify
> would be better), since an active fence alone makes that buffer a much
> less likely target for eviction.
>
>>> This is also why the precise semantics of move_notify for gpu<->gpu
>>> sharing took forever to discuss and are still a bit wip, because you
>>> have the inverse problem: The dma api mapping might still be there
>> Seems like this all makes a graph of operations, can't start the next
>> one until all deps are finished. Actually sounds a lot like futures.
>>
>> Would be clearer if this attach API provided some indication that the
>> SGL is actually a future valid SGL..
> Yeah I think one of the things we've discussed is whether dma_buf
> should pass around the fences more explicitly, or whether we should
> continue to smash on the more implicit dma_resv tracking. Inertia won
> out, at least for now, because gpu drivers do all the book-keeping
> directly in the shared dma_resv structure anyway, so this wouldn't
> have helped produce cleaner code.
> -Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-10-07  7:23 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-04 19:12 [RFC PATCH v3 0/4] RDMA: Add dma-buf support Jianxin Xiong
2020-10-04 19:12 ` [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region Jianxin Xiong
2020-10-05 10:54   ` Christian König
2020-10-05 16:19     ` Xiong, Jianxin
2020-10-05 13:13   ` Jason Gunthorpe
2020-10-05 16:18     ` Xiong, Jianxin
2020-10-05 16:33       ` Jason Gunthorpe
2020-10-05 19:41         ` Xiong, Jianxin
2020-10-06  9:22       ` Daniel Vetter
2020-10-06 15:26         ` Xiong, Jianxin
2020-10-06 15:49         ` Jason Gunthorpe
2020-10-06 16:34           ` Daniel Vetter
2020-10-06 17:24             ` Daniel Vetter
2020-10-06 18:02               ` Jason Gunthorpe
2020-10-06 18:17                 ` Daniel Vetter
2020-10-06 18:38                   ` Jason Gunthorpe
2020-10-06 19:12                     ` Daniel Vetter
2020-10-07  7:13                       ` Christian König
2020-10-06 16:40       ` Daniel Vetter
2020-10-04 19:12 ` [RFC PATCH v3 2/4] RDMA: Expand driver memory registration methods to support dma-buf Jianxin Xiong
2020-10-04 19:12 ` [RFC PATCH v3 3/4] RDMA/mlx5: Support dma-buf based userspace memory region Jianxin Xiong
2020-10-04 19:12 ` [RFC PATCH v3 4/4] RDMA/uverbs: Add uverbs command for dma-buf based MR registration Jianxin Xiong
