* [PATCH v8 0/5] RDMA: Add dma-buf support
@ 2020-11-05 22:48 ` Jianxin Xiong
  0 siblings, 0 replies; 42+ messages in thread
From: Jianxin Xiong @ 2020-11-05 22:48 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Jianxin Xiong, Doug Ledford, Jason Gunthorpe, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

This is the eighth version of the patch set. Changelog:

v8:
* Modify the dma-buf sg list in place to get a proper umem sg list and
  restore it before calling dma_buf_unmap_attachment()
* Validate the umem sg list with ib_umem_find_best_pgsz()
* Remove the logic for slicing the sg list at runtime

v7: https://www.spinics.net/lists/linux-rdma/msg97297.html
* Rebase on top of latest mlx5 MR patch series
* Slice dma-buf sg list at runtime instead of creating a new list
* Preload the buffer page mapping when the MR is created
* Move the 'dma_virt_ops' check into dma_buf_dynamic_attach()

v6: https://www.spinics.net/lists/linux-rdma/msg96923.html
* Move the dma-buf invalidation callback from the core to the device
  driver
* Move mapping update from work queue to pagefault handler
* Add dma-buf based MRs to the xarray of mmkeys so that the pagefault
  handler can be reached
* Update the new driver method and uverbs command signature by changing
  the parameter 'addr' to 'offset'
* Modify the sg list returned from dma_buf_map_attachment() based on
  the parameters 'offset' and 'length'
* Don't import dma-buf if 'dma_virt_ops' is used by the dma device
* The patch that clarifies dma-buf sg list alignment has landed at
  https://cgit.freedesktop.org/drm/drm-misc/commit/?id=ac80cd17a615
  and thus is no longer included with this set

v5: https://www.spinics.net/lists/linux-rdma/msg96786.html
* Fix a few warnings reported by kernel test robot:
    - no previous prototype for function 'ib_umem_dmabuf_release' 
    - no previous prototype for function 'ib_umem_dmabuf_map_pages'
    - comparison of distinct pointer types in 'check_add_overflow'
* Add a comment for the wait between getting the dma-buf sg table and
  updating the NIC page table

v4: https://www.spinics.net/lists/linux-rdma/msg96767.html
* Add a new ib_device method reg_user_mr_dmabuf() instead of expanding
  the existing method reg_user_mr()
* Use a separate code flow for dma-buf instead of adding special cases
  to the ODP memory region code path
* In the invalidation callback, the new mapping is updated as a whole
  using a work queue instead of at page granularity in the page fault
  handler
* Use dma_resv_get_excl() and dma_fence_wait() to ensure the contents of
  the pages have been moved to the new location before the new mapping
  is programmed into the NIC
* Add code to the ODP page fault handler to check the mapping status
* The new access flag added in v3 is removed
* The check for on-demand paging support in the new uverbs command is
  removed because it is implied by implementing the new ib_device
  method
* Clarify that dma-buf sg lists are page aligned

v3: https://www.spinics.net/lists/linux-rdma/msg96330.html
* Use dma_buf_dynamic_attach() instead of dma_buf_attach()
* Use on-demand paging mechanism to avoid pinning the GPU memory
* Instead of adding a new parameter to the device method for memory
  registration, pass all the attributes including the file descriptor
  as a structure
* Define a new access flag for dma-buf based memory region
* Check for on-demand paging support in the new uverbs command

v2: https://www.spinics.net/lists/linux-rdma/msg93643.html
* The Kconfig option is removed. There is no dependency issue since
  the dma-buf driver is always enabled.
* The declaration of new data structure and functions is reorganized to
  minimize the visibility of the changes.
* The new uverbs command now goes through ioctl() instead of write().
* The rereg functionality is removed.
* Instead of adding a new device method for dma-buf specific registration,
  the existing method is extended to accept an extra parameter.
* The correct function is now used for address range checking.

v1: https://www.spinics.net/lists/linux-rdma/msg90720.html
* The initial patch set
* Implement core functions for importing and mapping dma-buf
* Use dma-buf static attach interface
* Add two ib_device methods reg_user_mr_fd() and rereg_user_mr_fd()
* Add two uverbs commands via the write() interface
* Add Kconfig option
* Add dma-buf support to mlx5 device

When this support is enabled, an RDMA-capable NIC can perform peer-to-peer
transactions over PCIe to access local memory located on another device.
This often leads to better performance than using a system memory buffer
for RDMA and copying data between that buffer and the device memory.

The current kernel RDMA stack uses get_user_pages() to pin the physical
pages backing the user buffer and dma_map_sg_attrs() to obtain the DMA
addresses for memory access. This usually doesn't work for peer device
memory because such memory lacks associated page structures.
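
For reference, the existing pin-based flow looks roughly like the sketch
below (simplified, error handling omitted; the variable names are
illustrative only):

	/* Pin the user pages backing [addr, addr + len). */
	npinned = get_user_pages(addr, npages, FOLL_WRITE, pages, NULL);

	/* Build an sg list from the pinned pages and map it for the NIC. */
	sg_alloc_table_from_pages(&sgt, pages, npinned, 0, len, GFP_KERNEL);
	nmap = dma_map_sg_attrs(ibdev->dma_device, sgt.sgl, sgt.nents,
				DMA_BIDIRECTIONAL, 0);

	/* Peer device memory has no struct page, so the pinning step fails. */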

Several mechanisms exist today to facilitate device memory access.

ZONE_DEVICE is a new zone for device memory in the memory management
subsystem. It allows pages from device memory to be described with
specialized page structures, but what can be done with these page
structures may differ from what is possible with system memory.
ZONE_DEVICE is further specialized into multiple memory types, such as
one type for PCI p2pmem/p2pdma and one type for HMM.

PCI p2pmem/p2pdma uses ZONE_DEVICE to represent device memory residing
in a PCI BAR and provides a set of calls to publish, discover, allocate,
and map such memory for peer-to-peer transactions. One feature of the
API is that the buffer is allocated by the side that does the DMA
transfer. This works well for the storage use case, but is awkward for
GPU-NIC communication, where the buffer is typically allocated by the
GPU driver rather than the NIC driver.

Heterogeneous Memory Management (HMM) utilizes mmu_interval_notifier
and ZONE_DEVICE to support a shared virtual address space and page
migration between system memory and device memory. HMM doesn't support
pinning device memory because pages located on the device must be able
to migrate to system memory when accessed by the CPU. Peer-to-peer
access is currently not supported by HMM.

Dma-buf is a standard mechanism for sharing buffers among different
device drivers. The buffer to be shared is exported by the owning
driver and imported by the driver that wants to use it. The exporter
provides a set of ops that the importer can call to pin and map the
buffer. In addition, a file descriptor can be associated with a dma-
buf object as the handle that can be passed to user space.
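
As a rough sketch of the importer-side flow (locking simplified, error
handling omitted; 'importer_ops' and 'importer_priv' stand in for whatever
the importing driver provides):

	/* Import a dma-buf by file descriptor and map it for DMA. */
	struct dma_buf *dmabuf = dma_buf_get(fd);
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	/* With dynamic attach, the exporter may move the buffer and will
	 * call importer_ops->move_notify() when it does. */
	attach = dma_buf_dynamic_attach(dmabuf, dev, &importer_ops,
					importer_priv);

	dma_resv_lock(dmabuf->resv, NULL);
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	/* ... program the sg list into the device ... */
	dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
	dma_resv_unlock(dmabuf->resv);

	dma_buf_detach(dmabuf, attach);
	dma_buf_put(dmabuf);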

This patch series adds the dma-buf importer role to the RDMA subsystem
in an attempt to support RDMA using device memory such as GPU VRAM.
Dma-buf is chosen for a few reasons: first, the API is relatively simple
and allows a lot of flexibility in implementing the buffer manipulation
ops. Second, it doesn't require page structures. Third, dma-buf is
already supported by many GPU drivers. However, we are aware that
existing GPU drivers don't allow pinning device memory via the dma-buf
interface. Pinning would simply cause the backing storage to migrate to
system RAM. True peer-to-peer access is only possible using dynamic
attach, which requires on-demand paging (ODP) support from the NIC to
work. For this reason, this series only works with ODP-capable NICs.

This series consists of five patches. The first patch adds the common
code for importing a dma-buf from a file descriptor and mapping the
dma-buf pages. Patch 2 adds the new driver method reg_user_mr_dmabuf().
Patch 3 adds a new uverbs command for registering a dma-buf based memory
region. Patch 4 adds dma-buf support to the mlx5 driver. Patch 5 adds
code to dma_buf_dynamic_attach() to check the importer device for use
of dma_virt_ops and reject the request if so.

Related user space RDMA library changes will be provided as a separate
patch series.
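
As a purely illustrative sketch of how an application might eventually
use this support (the names gpu_export_dmabuf() and ibv_reg_dmabuf_mr()
are hypothetical here; the actual verbs API is defined by the separate
user space series):

	/* Obtain a dma-buf fd for the device buffer from the GPU stack. */
	int fd = gpu_export_dmabuf(gpu_buffer);

	/* Register the dma-buf as an RDMA memory region. */
	struct ibv_mr *mr = ibv_reg_dmabuf_mr(pd, 0 /* offset */, size,
					      0 /* iova */, fd,
					      IBV_ACCESS_LOCAL_WRITE |
					      IBV_ACCESS_REMOTE_READ |
					      IBV_ACCESS_REMOTE_WRITE);

	/* mr->lkey and mr->rkey can then be used for RDMA operations that
	 * target the GPU memory backing the dma-buf. */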

Jianxin Xiong (5):
  RDMA/umem: Support importing dma-buf as user memory region
  RDMA/core: Add device method for registering dma-buf based memory
    region
  RDMA/uverbs: Add uverbs command for dma-buf based MR registration
  RDMA/mlx5: Support dma-buf based userspace memory region
  dma-buf: Reject attach request from importers that use dma_virt_ops

 drivers/dma-buf/dma-buf.c                     |   5 +
 drivers/infiniband/core/Makefile              |   2 +-
 drivers/infiniband/core/device.c              |   1 +
 drivers/infiniband/core/umem.c                |   4 +
 drivers/infiniband/core/umem_dmabuf.c         | 193 ++++++++++++++++++++++++++
 drivers/infiniband/core/umem_dmabuf.h         |  11 ++
 drivers/infiniband/core/uverbs_std_types_mr.c | 112 +++++++++++++++
 drivers/infiniband/hw/mlx5/main.c             |   2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |  18 +++
 drivers/infiniband/hw/mlx5/mr.c               | 118 +++++++++++++++-
 drivers/infiniband/hw/mlx5/odp.c              |  97 +++++++++++--
 include/rdma/ib_umem.h                        |  39 +++++-
 include/rdma/ib_verbs.h                       |   6 +-
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  14 ++
 14 files changed, 604 insertions(+), 18 deletions(-)
 create mode 100644 drivers/infiniband/core/umem_dmabuf.c
 create mode 100644 drivers/infiniband/core/umem_dmabuf.h

-- 
1.8.3.1


* [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
  2020-11-05 22:48 ` Jianxin Xiong
@ 2020-11-05 22:48   ` Jianxin Xiong
  -1 siblings, 0 replies; 42+ messages in thread
From: Jianxin Xiong @ 2020-11-05 22:48 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Jianxin Xiong, Doug Ledford, Jason Gunthorpe, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

Dma-buf is a standard cross-driver buffer sharing mechanism that can be
used to support peer-to-peer access from RDMA devices.

Device memory exported via dma-buf is associated with a file descriptor.
This is passed to user space as a property associated with the buffer
allocation. When the buffer is registered as a memory region, the file
descriptor is passed to the RDMA driver along with other parameters.

Implement the common code for importing a dma-buf object and mapping
dma-buf pages.
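
The intended driver-side usage of the new helpers is roughly as follows
(a sketch only, with error handling trimmed; drv_dmabuf_attach_ops and
drv_dmabuf_move_notify are placeholder names, and the mlx5 patch later
in this series is the real consumer):

	static const struct dma_buf_attach_ops drv_dmabuf_attach_ops = {
		.allow_peer2peer = true,
		.move_notify = drv_dmabuf_move_notify,
	};

	umem = ib_umem_dmabuf_get(ibdev, offset, length, fd, access,
				  &drv_dmabuf_attach_ops);
	if (IS_ERR(umem))
		return ERR_CAST(umem);
	umem_dmabuf = to_ib_umem_dmabuf(umem);

	/* The mapping helpers must be called under the dma-buf
	 * reservation lock. */
	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
	err = ib_umem_dmabuf_map_pages(umem_dmabuf);
	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);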

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/infiniband/core/Makefile      |   2 +-
 drivers/infiniband/core/umem.c        |   4 +
 drivers/infiniband/core/umem_dmabuf.c | 193 ++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/umem_dmabuf.h |  11 ++
 include/rdma/ib_umem.h                |  39 ++++++-
 5 files changed, 247 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/umem_dmabuf.c
 create mode 100644 drivers/infiniband/core/umem_dmabuf.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index ccf2670..8ab4eea 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -40,5 +40,5 @@ ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
 				uverbs_std_types_srq.o \
 				uverbs_std_types_wq.o \
 				uverbs_std_types_qp.o
-ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
+ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
 ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index f1fc7e3..4bc455f 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -2,6 +2,7 @@
  * Copyright (c) 2005 Topspin Communications.  All rights reserved.
  * Copyright (c) 2005 Cisco Systems.  All rights reserved.
  * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -43,6 +44,7 @@
 #include <rdma/ib_umem_odp.h>
 
 #include "uverbs.h"
+#include "umem_dmabuf.h"
 
 static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
 {
@@ -278,6 +280,8 @@ void ib_umem_release(struct ib_umem *umem)
 {
 	if (!umem)
 		return;
+	if (umem->is_dmabuf)
+		return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
 	if (umem->is_odp)
 		return ib_umem_odp_release(to_ib_umem_odp(umem));
 
diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
new file mode 100644
index 0000000..ee82b7b
--- /dev/null
+++ b/drivers/infiniband/core/umem_dmabuf.c
@@ -0,0 +1,193 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+/*
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>
+#include <linux/dma-mapping.h>
+
+#include "uverbs.h"
+#include "umem_dmabuf.h"
+
+int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf)
+{
+	struct sg_table *sgt;
+	struct scatterlist *sg;
+	struct dma_fence *fence;
+	unsigned long start, end, cur;
+	unsigned int nmap;
+	unsigned int page_size;
+	int i;
+
+	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
+
+	sgt = dma_buf_map_attachment(umem_dmabuf->attach,
+				     DMA_BIDIRECTIONAL);
+
+	if (IS_ERR(sgt))
+		return PTR_ERR(sgt);
+
+	/* modify the sgl in-place to match umem address and length */
+
+	start = ALIGN_DOWN(umem_dmabuf->umem.address, PAGE_SIZE);
+	end = ALIGN(umem_dmabuf->umem.address + umem_dmabuf->umem.length,
+		    PAGE_SIZE);
+	cur = 0;
+	nmap = 0;
+	for_each_sgtable_dma_sg(sgt, sg, i) {
+		if (cur >= end)
+			break;
+		if (cur + sg_dma_len(sg) <= start) {
+			cur += sg_dma_len(sg);
+			continue;
+		}
+		if (cur <= start) {
+			unsigned long offset = start - cur;
+
+			umem_dmabuf->first_sg = sg;
+			umem_dmabuf->first_sg_offset = offset;
+			sg_dma_address(sg) += offset;
+			sg_dma_len(sg) -= offset;
+			if (&sg_dma_len(sg) != &sg->length)
+				sg->length -= offset;
+			cur += offset;
+		}
+		if (cur + sg_dma_len(sg) >= end) {
+			unsigned long trim = cur + sg_dma_len(sg) - end;
+
+			umem_dmabuf->last_sg = sg;
+			umem_dmabuf->last_sg_trim = trim;
+			sg_dma_len(sg) -= trim;
+			if (&sg_dma_len(sg) != &sg->length)
+				sg->length -= trim;
+		}
+		cur += sg_dma_len(sg);
+		nmap++;
+	}
+	
+	umem_dmabuf->umem.sg_head.sgl = umem_dmabuf->first_sg;
+	umem_dmabuf->umem.sg_head.nents = nmap;
+	umem_dmabuf->umem.nmap = nmap;
+	umem_dmabuf->sgt = sgt;
+
+	page_size = ib_umem_find_best_pgsz(&umem_dmabuf->umem, PAGE_SIZE,
+					   umem_dmabuf->umem.iova);
+
+	if (WARN_ON(cur != end || page_size != PAGE_SIZE)) {
+		ib_umem_dmabuf_unmap_pages(umem_dmabuf);
+		return -EINVAL;
+	}
+
+	/*
+	 * Although the sg list is valid now, the content of the pages
+	 * may be not up-to-date. Wait for the exporter to finish
+	 * the migration.
+	 */
+	fence = dma_resv_get_excl(umem_dmabuf->attach->dmabuf->resv);
+	if (fence)
+		dma_fence_wait(fence, false);
+
+	return 0;
+}
+EXPORT_SYMBOL(ib_umem_dmabuf_map_pages);
+
+void ib_umem_dmabuf_unmap_pages(struct ib_umem_dmabuf *umem_dmabuf)
+{
+	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
+
+	if (!umem_dmabuf->sgt)
+		return;
+
+	/* restore the original sgl */
+	sg_dma_address(umem_dmabuf->first_sg) -= umem_dmabuf->first_sg_offset;
+	sg_dma_len(umem_dmabuf->first_sg) += umem_dmabuf->first_sg_offset;
+	sg_dma_len(umem_dmabuf->last_sg) += umem_dmabuf->last_sg_trim;
+	if (&sg_dma_len(umem_dmabuf->first_sg) !=
+	    &umem_dmabuf->first_sg->length) {
+		umem_dmabuf->first_sg->length += umem_dmabuf->first_sg_offset;
+		umem_dmabuf->last_sg->length += umem_dmabuf->last_sg_trim;
+	}
+
+	dma_buf_unmap_attachment(umem_dmabuf->attach, umem_dmabuf->sgt,
+				 DMA_BIDIRECTIONAL);
+	umem_dmabuf->sgt = NULL;
+}
+EXPORT_SYMBOL(ib_umem_dmabuf_unmap_pages);
+
+struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
+				   unsigned long offset, size_t size,
+				   int fd, int access,
+				   const struct dma_buf_attach_ops *ops)
+{
+	struct dma_buf *dmabuf;
+	struct ib_umem_dmabuf *umem_dmabuf;
+	struct ib_umem *umem;
+	unsigned long end;
+	long ret;
+
+	if (check_add_overflow(offset, (unsigned long)size, &end))
+		return ERR_PTR(-EINVAL);
+
+	if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
+		return ERR_PTR(-EINVAL);
+
+	if (unlikely(!ops || !ops->move_notify))
+		return ERR_PTR(-EINVAL);
+
+	umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
+	if (!umem_dmabuf)
+		return ERR_PTR(-ENOMEM);
+
+	umem = &umem_dmabuf->umem;
+	umem->ibdev = device;
+	umem->length = size;
+	umem->address = offset;
+	umem->iova = offset;
+	umem->writable = ib_access_writable(access);
+	umem->is_dmabuf = 1;
+
+	if (unlikely(!ib_umem_num_pages(umem))) {
+		ret = -EINVAL;
+		goto out_free_umem;
+	}
+
+	dmabuf = dma_buf_get(fd);
+	if (IS_ERR(dmabuf)) {
+		ret = PTR_ERR(dmabuf);
+		goto out_free_umem;
+	}
+
+	umem_dmabuf->attach = dma_buf_dynamic_attach(
+					dmabuf,
+					device->dma_device,
+					ops,
+					umem_dmabuf);
+	if (IS_ERR(umem_dmabuf->attach)) {
+		ret = PTR_ERR(umem_dmabuf->attach);
+		goto out_release_dmabuf;
+	}
+
+	return umem;
+
+out_release_dmabuf:
+	dma_buf_put(dmabuf);
+
+out_free_umem:
+	kfree(umem_dmabuf);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(ib_umem_dmabuf_get);
+
+void ib_umem_dmabuf_release(struct ib_umem_dmabuf *umem_dmabuf)
+{
+	struct dma_buf *dmabuf = umem_dmabuf->attach->dmabuf;
+
+	dma_resv_lock(dmabuf->resv, NULL);
+	ib_umem_dmabuf_unmap_pages(umem_dmabuf);
+	dma_resv_unlock(dmabuf->resv);
+
+	dma_buf_detach(dmabuf, umem_dmabuf->attach);
+	dma_buf_put(dmabuf);
+	kfree(umem_dmabuf);
+}
diff --git a/drivers/infiniband/core/umem_dmabuf.h b/drivers/infiniband/core/umem_dmabuf.h
new file mode 100644
index 0000000..13acf55
--- /dev/null
+++ b/drivers/infiniband/core/umem_dmabuf.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+/*
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#ifndef UMEM_DMABUF_H
+#define UMEM_DMABUF_H
+
+void ib_umem_dmabuf_release(struct ib_umem_dmabuf *umem_dmabuf);
+
+#endif /* UMEM_DMABUF_H */
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 7059750..e6e79bd 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
 /*
  * Copyright (c) 2007 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2020 Intel Corporation.  All rights reserved.
  */
 
 #ifndef IB_UMEM_H
@@ -13,6 +14,7 @@
 
 struct ib_ucontext;
 struct ib_umem_odp;
+struct dma_buf_attach_ops;
 
 struct ib_umem {
 	struct ib_device       *ibdev;
@@ -22,12 +24,29 @@ struct ib_umem {
 	unsigned long		address;
 	u32 writable : 1;
 	u32 is_odp : 1;
+	u32 is_dmabuf : 1;
 	struct work_struct	work;
 	struct sg_table sg_head;
 	int             nmap;
 	unsigned int    sg_nents;
 };
 
+struct ib_umem_dmabuf {
+	struct ib_umem umem;
+	struct dma_buf_attachment *attach;
+	struct sg_table *sgt;
+	struct scatterlist *first_sg;
+	struct scatterlist *last_sg;
+	unsigned long first_sg_offset;
+	unsigned long last_sg_trim;
+	void *private;
+};
+
+static inline struct ib_umem_dmabuf *to_ib_umem_dmabuf(struct ib_umem *umem)
+{
+	return container_of(umem, struct ib_umem_dmabuf, umem);
+}
+
 /* Returns the offset of the umem start relative to the first page. */
 static inline int ib_umem_offset(struct ib_umem *umem)
 {
@@ -79,6 +98,12 @@ int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
 unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 				     unsigned long pgsz_bitmap,
 				     unsigned long virt);
+struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
+				   unsigned long offset, size_t size,
+				   int fd, int access,
+				   const struct dma_buf_attach_ops *ops);
+int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf);
+void ib_umem_dmabuf_unmap_pages(struct ib_umem_dmabuf *umem_dmabuf);
 
 #else /* CONFIG_INFINIBAND_USER_MEM */
 
@@ -101,7 +126,19 @@ static inline unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 {
 	return 0;
 }
+static inline struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
+						 unsigned long offset,
+						 size_t size, int fd,
+						 int access,
+						 struct dma_buf_attach_ops *ops)
+{
+	return ERR_PTR(-EINVAL);
+}
+static inline int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf)
+{
+	return -EINVAL;
+}
+static inline void ib_umem_dmabuf_unmap_pages(struct ib_umem_dmabuf *umem_dmabuf) { }
 
 #endif /* CONFIG_INFINIBAND_USER_MEM */
-
 #endif /* IB_UMEM_H */
-- 
1.8.3.1


* [PATCH v8 2/5] RDMA/core: Add device method for registering dma-buf based memory region
  2020-11-05 22:48 ` Jianxin Xiong
@ 2020-11-05 22:48   ` Jianxin Xiong
  -1 siblings, 0 replies; 42+ messages in thread
From: Jianxin Xiong @ 2020-11-05 22:48 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Jianxin Xiong, Doug Ledford, Jason Gunthorpe, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

A dma-buf based memory region requires one extra parameter and is
processed quite differently. Adding a separate method allows a clean
separation from regular memory regions.
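
How a provider driver would wire this up is sketched below (illustrative
names; a placeholder body stands in for the real import logic, which
patch 4 implements for mlx5):

	static struct ib_mr *drv_reg_user_mr_dmabuf(struct ib_pd *pd, u64 offset,
						    u64 length, u64 virt_addr,
						    int fd, int mr_access_flags,
						    struct ib_udata *udata)
	{
		/* Import the dma-buf with ib_umem_dmabuf_get() and build
		 * the MR from the resulting umem. */
		return ERR_PTR(-EOPNOTSUPP);
	}

	static const struct ib_device_ops drv_dev_ops = {
		/* ... other ops ... */
		.reg_user_mr_dmabuf = drv_reg_user_mr_dmabuf,
	};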

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/infiniband/core/device.c | 1 +
 include/rdma/ib_verbs.h          | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index ce26564..6768a19 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2689,6 +2689,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, read_counters);
 	SET_DEVICE_OP(dev_ops, reg_dm_mr);
 	SET_DEVICE_OP(dev_ops, reg_user_mr);
+	SET_DEVICE_OP(dev_ops, reg_user_mr_dmabuf);
 	SET_DEVICE_OP(dev_ops, req_ncomp_notif);
 	SET_DEVICE_OP(dev_ops, req_notify_cq);
 	SET_DEVICE_OP(dev_ops, rereg_user_mr);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9420827..3d1d098 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2,7 +2,7 @@
 /*
  * Copyright (c) 2004 Mellanox Technologies Ltd.  All rights reserved.
  * Copyright (c) 2004 Infinicon Corporation.  All rights reserved.
- * Copyright (c) 2004 Intel Corporation.  All rights reserved.
+ * Copyright (c) 2004, 2020 Intel Corporation.  All rights reserved.
  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
  * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
@@ -2433,6 +2433,10 @@ struct ib_device_ops {
 	struct ib_mr *(*reg_user_mr)(struct ib_pd *pd, u64 start, u64 length,
 				     u64 virt_addr, int mr_access_flags,
 				     struct ib_udata *udata);
+	struct ib_mr *(*reg_user_mr_dmabuf)(struct ib_pd *pd, u64 offset,
+				     u64 length, u64 virt_addr, int fd,
+				     int mr_access_flags,
+				     struct ib_udata *udata);
 	int (*rereg_user_mr)(struct ib_mr *mr, int flags, u64 start, u64 length,
 			     u64 virt_addr, int mr_access_flags,
 			     struct ib_pd *pd, struct ib_udata *udata);
-- 
1.8.3.1


* [PATCH v8 3/5] RDMA/uverbs: Add uverbs command for dma-buf based MR registration
  2020-11-05 22:48 ` Jianxin Xiong
@ 2020-11-05 22:48   ` Jianxin Xiong
  -1 siblings, 0 replies; 42+ messages in thread
From: Jianxin Xiong @ 2020-11-05 22:48 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Jianxin Xiong, Doug Ledford, Jason Gunthorpe, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

Implement a new uverbs ioctl method for memory registration that takes a
dma-buf file descriptor as an extra parameter.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/infiniband/core/uverbs_std_types_mr.c | 112 ++++++++++++++++++++++++++
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  14 ++++
 2 files changed, 126 insertions(+)

diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
index 9b22bb5..f22e58b 100644
--- a/drivers/infiniband/core/uverbs_std_types_mr.c
+++ b/drivers/infiniband/core/uverbs_std_types_mr.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
+ * Copyright (c) 2020, Intel Corporation.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -178,6 +179,85 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
 	return IS_UVERBS_COPY_ERR(ret) ? ret : 0;
 }
 
+static int UVERBS_HANDLER(UVERBS_METHOD_REG_DMABUF_MR)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct ib_uobject *uobj =
+		uverbs_attr_get_uobject(attrs, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
+	struct ib_pd *pd =
+		uverbs_attr_get_obj(attrs, UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE);
+	struct ib_device *ib_dev = pd->device;
+
+	u64 offset, length, virt_addr;
+	u32 fd, access_flags;
+	struct ib_mr *mr;
+	int ret;
+
+	if (!ib_dev->ops.reg_user_mr_dmabuf)
+		return -EOPNOTSUPP;
+
+	ret = uverbs_copy_from(&offset, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_OFFSET);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&length, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_LENGTH);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&virt_addr, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_IOVA);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&fd, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_FD);
+	if (ret)
+		return ret;
+
+	ret = uverbs_get_flags32(&access_flags, attrs,
+				 UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+				 IB_ACCESS_SUPPORTED);
+	if (ret)
+		return ret;
+
+	ret = ib_check_mr_access(access_flags);
+	if (ret)
+		return ret;
+
+	mr = pd->device->ops.reg_user_mr_dmabuf(pd, offset, length, virt_addr,
+						fd, access_flags,
+						&attrs->driver_udata);
+	if (IS_ERR(mr))
+		return PTR_ERR(mr);
+
+	mr->device  = pd->device;
+	mr->pd      = pd;
+	mr->type    = IB_MR_TYPE_USER;
+	mr->uobject = uobj;
+	atomic_inc(&pd->usecnt);
+
+	uobj->object = mr;
+
+	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+			     &mr->lkey, sizeof(mr->lkey));
+	if (ret)
+		goto err_dereg;
+
+	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+			     &mr->rkey, sizeof(mr->rkey));
+	if (ret)
+		goto err_dereg;
+
+	return 0;
+
+err_dereg:
+	ib_dereg_mr_user(mr, uverbs_get_cleared_udata(attrs));
+
+	return ret;
+}
+
 DECLARE_UVERBS_NAMED_METHOD(
 	UVERBS_METHOD_ADVISE_MR,
 	UVERBS_ATTR_IDR(UVERBS_ATTR_ADVISE_MR_PD_HANDLE,
@@ -243,6 +323,37 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
 			    UVERBS_ATTR_TYPE(u32),
 			    UA_MANDATORY));
 
+DECLARE_UVERBS_NAMED_METHOD(
+	UVERBS_METHOD_REG_DMABUF_MR,
+	UVERBS_ATTR_IDR(UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
+			UVERBS_OBJECT_MR,
+			UVERBS_ACCESS_NEW,
+			UA_MANDATORY),
+	UVERBS_ATTR_IDR(UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE,
+			UVERBS_OBJECT_PD,
+			UVERBS_ACCESS_READ,
+			UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_OFFSET,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_LENGTH,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_IOVA,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_FD,
+			   UVERBS_ATTR_TYPE(u32),
+			   UA_MANDATORY),
+	UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+			     enum ib_access_flags),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+			    UVERBS_ATTR_TYPE(u32),
+			    UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+			    UVERBS_ATTR_TYPE(u32),
+			    UA_MANDATORY));
+
 DECLARE_UVERBS_NAMED_METHOD_DESTROY(
 	UVERBS_METHOD_MR_DESTROY,
 	UVERBS_ATTR_IDR(UVERBS_ATTR_DESTROY_MR_HANDLE,
@@ -253,6 +364,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
 DECLARE_UVERBS_NAMED_OBJECT(
 	UVERBS_OBJECT_MR,
 	UVERBS_TYPE_ALLOC_IDR(uverbs_free_mr),
+	&UVERBS_METHOD(UVERBS_METHOD_REG_DMABUF_MR),
 	&UVERBS_METHOD(UVERBS_METHOD_DM_MR_REG),
 	&UVERBS_METHOD(UVERBS_METHOD_MR_DESTROY),
 	&UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 7968a18..dafc7eb 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -251,6 +252,7 @@ enum uverbs_methods_mr {
 	UVERBS_METHOD_MR_DESTROY,
 	UVERBS_METHOD_ADVISE_MR,
 	UVERBS_METHOD_QUERY_MR,
+	UVERBS_METHOD_REG_DMABUF_MR,
 };
 
 enum uverbs_attrs_mr_destroy_ids {
@@ -272,6 +274,18 @@ enum uverbs_attrs_query_mr_cmd_attr_ids {
 	UVERBS_ATTR_QUERY_MR_RESP_IOVA,
 };
 
+enum uverbs_attrs_reg_dmabuf_mr_cmd_attr_ids {
+	UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
+	UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE,
+	UVERBS_ATTR_REG_DMABUF_MR_OFFSET,
+	UVERBS_ATTR_REG_DMABUF_MR_LENGTH,
+	UVERBS_ATTR_REG_DMABUF_MR_IOVA,
+	UVERBS_ATTR_REG_DMABUF_MR_FD,
+	UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+	UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+	UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+};
+
 enum uverbs_attrs_create_counters_cmd_attr_ids {
 	UVERBS_ATTR_CREATE_COUNTERS_HANDLE,
 };
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
  2020-11-05 22:48 ` Jianxin Xiong
@ 2020-11-05 22:48   ` Jianxin Xiong
  -1 siblings, 0 replies; 42+ messages in thread
From: Jianxin Xiong @ 2020-11-05 22:48 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Jianxin Xiong, Doug Ledford, Jason Gunthorpe, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

Implement the new driver method 'reg_user_mr_dmabuf'.  Utilize the core
functions to import dma-buf based memory regions and update the mappings.

Add code to handle dma-buf related page faults.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/infiniband/hw/mlx5/main.c    |   2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  18 ++++++
 drivers/infiniband/hw/mlx5/mr.c      | 118 +++++++++++++++++++++++++++++++++--
 drivers/infiniband/hw/mlx5/odp.c     |  97 +++++++++++++++++++++++++---
 4 files changed, 220 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 36b15a0..e647ea4 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 /*
  * Copyright (c) 2013-2020, Mellanox Technologies inc. All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  */
 
 #include <linux/debugfs.h>
@@ -4055,6 +4056,7 @@ static int mlx5_ib_enable_driver(struct ib_device *dev)
 	.query_srq = mlx5_ib_query_srq,
 	.query_ucontext = mlx5_ib_query_ucontext,
 	.reg_user_mr = mlx5_ib_reg_user_mr,
+	.reg_user_mr_dmabuf = mlx5_ib_reg_user_mr_dmabuf,
 	.req_notify_cq = mlx5_ib_arm_cq,
 	.rereg_user_mr = mlx5_ib_rereg_user_mr,
 	.resize_cq = mlx5_ib_resize_cq,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index bb44080..3ef6872 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
 /*
  * Copyright (c) 2013-2020, Mellanox Technologies inc. All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  */
 
 #ifndef MLX5_IB_H
@@ -665,6 +666,12 @@ static inline bool is_odp_mr(struct mlx5_ib_mr *mr)
 	       mr->umem->is_odp;
 }
 
+static inline bool is_dmabuf_mr(struct mlx5_ib_mr *mr)
+{
+	return IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING) && mr->umem &&
+	       mr->umem->is_dmabuf;
+}
+
 struct mlx5_ib_mw {
 	struct ib_mw		ibmw;
 	struct mlx5_core_mkey	mmkey;
@@ -1200,6 +1207,10 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  u64 virt_addr, int access_flags,
 				  struct ib_udata *udata);
+struct ib_mr *mlx5_ib_reg_user_mr_dmabuf(struct ib_pd *pd, u64 start,
+					 u64 length, u64 virt_addr,
+					 int fd, int access_flags,
+					 struct ib_udata *udata);
 int mlx5_ib_advise_mr(struct ib_pd *pd,
 		      enum ib_uverbs_advise_mr_advice advice,
 		      u32 flags,
@@ -1210,11 +1221,13 @@ int mlx5_ib_advise_mr(struct ib_pd *pd,
 int mlx5_ib_dealloc_mw(struct ib_mw *mw);
 int mlx5_ib_update_xlt(struct mlx5_ib_mr *mr, u64 idx, int npages,
 		       int page_shift, int flags);
+int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags);
 struct mlx5_ib_mr *mlx5_ib_alloc_implicit_mr(struct mlx5_ib_pd *pd,
 					     struct ib_udata *udata,
 					     int access_flags);
 void mlx5_ib_free_implicit_mr(struct mlx5_ib_mr *mr);
 void mlx5_ib_fence_odp_mr(struct mlx5_ib_mr *mr);
+void mlx5_ib_fence_dmabuf_mr(struct mlx5_ib_mr *mr);
 int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 			  u64 length, u64 virt_addr, int access_flags,
 			  struct ib_pd *pd, struct ib_udata *udata);
@@ -1306,6 +1319,7 @@ int mlx5_ib_advise_mr_prefetch(struct ib_pd *pd,
 			       enum ib_uverbs_advise_mr_advice advice,
 			       u32 flags, struct ib_sge *sg_list, u32 num_sge);
 int mlx5_ib_init_odp_mr(struct mlx5_ib_mr *mr, bool enable);
+int mlx5_ib_init_dmabuf_mr(struct mlx5_ib_mr *mr);
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static inline void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev *dev)
 {
@@ -1331,6 +1345,10 @@ static inline int mlx5_ib_init_odp_mr(struct mlx5_ib_mr *mr, bool enable)
 {
 	return -EOPNOTSUPP;
 }
+static inline int mlx5_ib_init_dmabuf_mr(struct mlx5_ib_mr *mr)
+{
+	return -EOPNOTSUPP;
+}
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
 extern const struct mmu_interval_notifier_ops mlx5_mn_ops;
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 9f653b4..c7f2a01 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2013-2015, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -36,6 +37,8 @@
 #include <linux/debugfs.h>
 #include <linux/export.h>
 #include <linux/delay.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>
 #include <rdma/ib_umem.h>
 #include <rdma/ib_umem_odp.h>
 #include <rdma/ib_verbs.h>
@@ -966,7 +969,10 @@ static struct mlx5_ib_mr *alloc_mr_from_cache(struct ib_pd *pd,
 	struct mlx5_ib_mr *mr;
 	unsigned int page_size;
 
-	page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
+	if (umem->is_dmabuf)
+		page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, iova);
+	else
+		page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
 	if (WARN_ON(!page_size))
 		return ERR_PTR(-EINVAL);
 	ent = mr_cache_ent_from_order(
@@ -1212,8 +1218,10 @@ int mlx5_ib_update_xlt(struct mlx5_ib_mr *mr, u64 idx, int npages,
 
 /*
  * Send the DMA list to the HW for a normal MR using UMR.
+ * Dmabuf MR is handled in a similar way, except that the MLX5_IB_UPD_XLT_ZAP
+ * flag may be used.
  */
-static int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags)
+int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags)
 {
 	struct mlx5_ib_dev *dev = mr->dev;
 	struct device *ddev = dev->ib_dev.dev.parent;
@@ -1255,6 +1263,10 @@ static int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags)
 		cur_mtt->ptag =
 			cpu_to_be64(rdma_block_iter_dma_address(&biter) |
 				    MLX5_IB_MTT_PRESENT);
+
+		if (mr->umem->is_dmabuf && (flags & MLX5_IB_UPD_XLT_ZAP))
+			cur_mtt->ptag = 0;
+
 		cur_mtt++;
 	}
 
@@ -1291,8 +1303,11 @@ static struct mlx5_ib_mr *reg_create(struct ib_mr *ibmr, struct ib_pd *pd,
 	int err;
 	bool pg_cap = !!(MLX5_CAP_GEN(dev->mdev, pg));
 
-	page_size =
-		mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
+	if (umem->is_dmabuf)
+		page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, iova);
+	else
+		page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size,
+						     0, iova);
 	if (WARN_ON(!page_size))
 		return ERR_PTR(-EINVAL);
 
@@ -1581,6 +1596,97 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	return ERR_PTR(err);
 }
 
+static void mlx5_ib_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
+	struct mlx5_ib_mr *mr = umem_dmabuf->private;
+
+	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
+
+	/* mr could have been destroyed. see mlx5_ib_fence_dmabuf_mr(). */
+	if (!mr)
+		return;
+
+	mlx5_ib_update_mr_pas(mr, MLX5_IB_UPD_XLT_ZAP);
+	ib_umem_dmabuf_unmap_pages(umem_dmabuf);
+}
+
+static struct dma_buf_attach_ops mlx5_ib_dmabuf_attach_ops = {
+	.allow_peer2peer = 1,
+	.move_notify = mlx5_ib_dmabuf_invalidate_cb,
+};
+
+struct ib_mr *mlx5_ib_reg_user_mr_dmabuf(struct ib_pd *pd, u64 offset,
+					 u64 length, u64 virt_addr,
+					 int fd, int access_flags,
+					 struct ib_udata *udata)
+{
+	struct mlx5_ib_dev *dev = to_mdev(pd->device);
+	struct mlx5_ib_mr *mr = NULL;
+	struct ib_umem *umem;
+	int err;
+
+	if (!IS_ENABLED(CONFIG_INFINIBAND_USER_MEM))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	mlx5_ib_dbg(dev,
+		    "offset 0x%llx, virt_addr 0x%llx, length 0x%llx, fd %d, access_flags 0x%x\n",
+		    offset, virt_addr, length, fd, access_flags);
+
+	if (!mlx5_ib_can_load_pas_with_umr(dev, length))
+		return ERR_PTR(-EINVAL);
+
+	umem = ib_umem_dmabuf_get(&dev->ib_dev, offset, length, fd, access_flags,
+				  &mlx5_ib_dmabuf_attach_ops);
+	if (IS_ERR(umem)) {
+		mlx5_ib_dbg(dev, "umem get failed (%ld)\n", PTR_ERR(umem));
+		return ERR_PTR(PTR_ERR(umem));
+	}
+
+	mr = alloc_mr_from_cache(pd, umem, virt_addr, access_flags);
+	if (IS_ERR(mr))
+		mr = NULL;
+
+	if (!mr) {
+		mutex_lock(&dev->slow_path_mutex);
+		mr = reg_create(NULL, pd, umem, virt_addr, access_flags,
+				false);
+		mutex_unlock(&dev->slow_path_mutex);
+	}
+
+	if (IS_ERR(mr)) {
+		err = PTR_ERR(mr);
+		goto error;
+	}
+
+	mlx5_ib_dbg(dev, "mkey 0x%x\n", mr->mmkey.key);
+
+	mr->umem = umem;
+	atomic_add(ib_umem_num_pages(mr->umem), &dev->mdev->priv.reg_pages);
+	set_mr_fields(dev, mr, length, access_flags);
+
+	to_ib_umem_dmabuf(umem)->private = mr;
+	init_waitqueue_head(&mr->q_deferred_work);
+	atomic_set(&mr->num_deferred_work, 0);
+	err = xa_err(xa_store(&dev->odp_mkeys,
+			      mlx5_base_mkey(mr->mmkey.key), &mr->mmkey,
+			      GFP_KERNEL));
+	if (err) {
+		dereg_mr(dev, mr);
+		return ERR_PTR(err);
+	}
+
+	err = mlx5_ib_init_dmabuf_mr(mr);
+	if (err) {
+		dereg_mr(dev, mr);
+		return ERR_PTR(err);
+	}
+	return &mr->ibmr;
+error:
+	ib_umem_release(umem);
+	return ERR_PTR(err);
+}
+
 /**
  * mlx5_mr_cache_invalidate - Fence all DMA on the MR
  * @mr: The MR to fence
@@ -1649,7 +1755,7 @@ int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 	if (!mr->umem)
 		return -EINVAL;
 
-	if (is_odp_mr(mr))
+	if (is_odp_mr(mr) || is_dmabuf_mr(mr))
 		return -EOPNOTSUPP;
 
 	if (flags & IB_MR_REREG_TRANS) {
@@ -1812,6 +1918,8 @@ static void dereg_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr)
 	/* Stop all DMA */
 	if (is_odp)
 		mlx5_ib_fence_odp_mr(mr);
+	else if (is_dmabuf_mr(mr))
+		mlx5_ib_fence_dmabuf_mr(mr);
 	else
 		clean_mr(dev, mr);
 
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 5c853ec..14bc950 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -33,6 +33,8 @@
 #include <rdma/ib_umem.h>
 #include <rdma/ib_umem_odp.h>
 #include <linux/kernel.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>
 
 #include "mlx5_ib.h"
 #include "cmd.h"
@@ -664,6 +666,36 @@ void mlx5_ib_fence_odp_mr(struct mlx5_ib_mr *mr)
 	dma_fence_odp_mr(mr);
 }
 
+/**
+ * mlx5_ib_fence_dmabuf_mr - Stop all access to the dmabuf MR
+ * @mr: to fence
+ *
+ * On return no parallel threads will be touching this MR and no DMA will be
+ * active.
+ */
+void mlx5_ib_fence_dmabuf_mr(struct mlx5_ib_mr *mr)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
+
+	/* Prevent new page faults and prefetch requests from succeeding */
+	xa_erase(&mr->dev->odp_mkeys, mlx5_base_mkey(mr->mmkey.key));
+
+	/* Wait for all running page-fault handlers to finish. */
+	synchronize_srcu(&mr->dev->odp_srcu);
+
+	wait_event(mr->q_deferred_work, !atomic_read(&mr->num_deferred_work));
+
+	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
+	mlx5_mr_cache_invalidate(mr);
+	umem_dmabuf->private = NULL;
+	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
+
+	if (!mr->cache_ent) {
+		mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);
+		WARN_ON(mr->descs);
+	}
+}
+
 #define MLX5_PF_FLAGS_DOWNGRADE BIT(1)
 #define MLX5_PF_FLAGS_SNAPSHOT BIT(2)
 #define MLX5_PF_FLAGS_ENABLE BIT(3)
@@ -797,6 +829,35 @@ static int pagefault_implicit_mr(struct mlx5_ib_mr *imr,
 	return ret;
 }
 
+static int pagefault_dmabuf_mr(struct mlx5_ib_mr *mr,
+			       struct ib_umem_dmabuf *umem_dmabuf,
+			       u64 user_va, size_t bcnt, u32 *bytes_mapped,
+			       u32 flags)
+{
+	int npages;
+	int err;
+	u32 xlt_flags = 0;
+
+	if (flags & MLX5_PF_FLAGS_ENABLE)
+		xlt_flags |= MLX5_IB_UPD_XLT_ENABLE;
+
+	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
+	err = ib_umem_dmabuf_map_pages(umem_dmabuf);
+	if (!err)
+		err = mlx5_ib_update_mr_pas(mr, xlt_flags);
+	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
+
+	if (err)
+		return err;
+
+	if (bytes_mapped)
+		*bytes_mapped += bcnt;
+
+	npages = (ALIGN(user_va + bcnt, PAGE_SIZE) -
+		 ALIGN_DOWN(user_va, PAGE_SIZE)) >> PAGE_SHIFT;
+	return npages;
+}
+
 /*
  * Returns:
  *  -EFAULT: The io_virt->bcnt is not within the MR, it covers pages that are
@@ -810,22 +871,31 @@ static int pagefault_mr(struct mlx5_ib_mr *mr, u64 io_virt, size_t bcnt,
 			u32 *bytes_mapped, u32 flags)
 {
 	struct ib_umem_odp *odp = to_ib_umem_odp(mr->umem);
+	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
 
 	lockdep_assert_held(&mr->dev->odp_srcu);
 	if (unlikely(io_virt < mr->mmkey.iova))
 		return -EFAULT;
 
-	if (!odp->is_implicit_odp) {
+	if (is_dmabuf_mr(mr) || !odp->is_implicit_odp) {
 		u64 user_va;
+		u64 end;
 
 		if (check_add_overflow(io_virt - mr->mmkey.iova,
-				       (u64)odp->umem.address, &user_va))
+				       (u64)mr->umem->address, &user_va))
 			return -EFAULT;
-		if (unlikely(user_va >= ib_umem_end(odp) ||
-			     ib_umem_end(odp) - user_va < bcnt))
+		if (is_dmabuf_mr(mr))
+			end = mr->umem->address + mr->umem->length;
+		else
+			end = ib_umem_end(odp);
+		if (unlikely(user_va >= end || end - user_va < bcnt))
 			return -EFAULT;
-		return pagefault_real_mr(mr, odp, user_va, bcnt, bytes_mapped,
-					 flags);
+		if (is_dmabuf_mr(mr))
+			return pagefault_dmabuf_mr(mr, umem_dmabuf, user_va,
+						   bcnt, bytes_mapped, flags);
+		else
+			return pagefault_real_mr(mr, odp, user_va, bcnt,
+						 bytes_mapped, flags);
 	}
 	return pagefault_implicit_mr(mr, odp, io_virt, bcnt, bytes_mapped,
 				     flags);
@@ -845,6 +915,16 @@ int mlx5_ib_init_odp_mr(struct mlx5_ib_mr *mr, bool enable)
 	return ret >= 0 ? 0 : ret;
 }
 
+int mlx5_ib_init_dmabuf_mr(struct mlx5_ib_mr *mr)
+{
+	int ret;
+
+	ret = pagefault_dmabuf_mr(mr, to_ib_umem_dmabuf(mr->umem),
+				  mr->umem->address, mr->umem->length, NULL,
+				  MLX5_PF_FLAGS_ENABLE);
+	return ret >= 0 ? 0 : ret;
+}
+
 struct pf_frame {
 	struct pf_frame *next;
 	u32 key;
@@ -1747,7 +1827,6 @@ static void destroy_prefetch_work(struct prefetch_mr_work *work)
 {
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
 	struct mlx5_core_mkey *mmkey;
-	struct ib_umem_odp *odp;
 	struct mlx5_ib_mr *mr;
 
 	lockdep_assert_held(&dev->odp_srcu);
@@ -1761,11 +1840,9 @@ static void destroy_prefetch_work(struct prefetch_mr_work *work)
 	if (mr->ibmr.pd != pd)
 		return NULL;
 
-	odp = to_ib_umem_odp(mr->umem);
-
 	/* prefetch with write-access must be supported by the MR */
 	if (advice == IB_UVERBS_ADVISE_MR_ADVICE_PREFETCH_WRITE &&
-	    !odp->umem.writable)
+	    !mr->umem->writable)
 		return NULL;
 
 	return mr;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v8 5/5] dma-buf: Reject attach request from importers that use dma_virt_ops
  2020-11-05 22:48 ` Jianxin Xiong
@ 2020-11-05 22:48   ` Jianxin Xiong
  -1 siblings, 0 replies; 42+ messages in thread
From: Jianxin Xiong @ 2020-11-05 22:48 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Jianxin Xiong, Doug Ledford, Jason Gunthorpe, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

dma_virt_ops is used by virtual devices that map pages / scatterlists to
virtual addresses for CPU access instead of performing DMA. This is not
the intended use of dma_buf_attach() and dma_buf_map_attachment(). CPU
access of dma-buf should use dma_buf_vmap() and related functions.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
---
 drivers/dma-buf/dma-buf.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 9a054fb5..ba2b877 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -686,6 +686,11 @@ struct dma_buf_attachment *
 	if (WARN_ON(importer_ops && !importer_ops->move_notify))
 		return ERR_PTR(-EINVAL);
 
+#ifdef CONFIG_DMA_VIRT_OPS
+	if (dev->dma_ops == &dma_virt_ops)
+		return ERR_PTR(-EINVAL);
+#endif
+
 	attach = kzalloc(sizeof(*attach), GFP_KERNEL);
 	if (!attach)
 		return ERR_PTR(-ENOMEM);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
  2020-11-05 22:48   ` Jianxin Xiong
@ 2020-11-06  0:08     ` Jason Gunthorpe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2020-11-06  0:08 UTC (permalink / raw)
  To: Jianxin Xiong
  Cc: linux-rdma, dri-devel, Doug Ledford, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

On Thu, Nov 05, 2020 at 02:48:05PM -0800, Jianxin Xiong wrote:
> +	/* modify the sgl in-place to match umem address and length */
> +
> +	start = ALIGN_DOWN(umem_dmabuf->umem.address, PAGE_SIZE);
> +	end = ALIGN(umem_dmabuf->umem.address + umem_dmabuf->umem.length,
> +		    PAGE_SIZE);
> +	cur = 0;
> +	nmap = 0;
> +	for_each_sgtable_dma_sg(sgt, sg, i) {
> +		if (cur >= end)
> +			break;
> +		if (cur + sg_dma_len(sg) <= start) {
> +			cur += sg_dma_len(sg);
> +			continue;
> +		}

This seems like a strange way to compute intersections

  if (cur <= start && start < cur + sg_dma_len(sg))

> +		if (cur <= start) {
> +			unsigned long offset = start - cur;
> +
> +			umem_dmabuf->first_sg = sg;
> +			umem_dmabuf->first_sg_offset = offset;
> +			sg_dma_address(sg) += offset;
> +			sg_dma_len(sg) -= offset;
> +			if (&sg_dma_len(sg) != &sg->length)
> +				sg->length -= offset;

We don't need to adjust sg->length, only dma_len, so no reason for
this surprising if.

> +			cur += offset;
> +		}
> +		if (cur + sg_dma_len(sg) >= end) {

Same logic here

> +			unsigned long trim = cur + sg_dma_len(sg) - end;
> +
> +			umem_dmabuf->last_sg = sg;
> +			umem_dmabuf->last_sg_trim = trim;
> +			sg_dma_len(sg) -= trim;
> +			if (&sg_dma_len(sg) != &sg->length)
> +				sg->length -= trim;

break, things are done here

> +		}
> +		cur += sg_dma_len(sg);
> +		nmap++;
> +	}

> +	
> +	umem_dmabuf->umem.sg_head.sgl = umem_dmabuf->first_sg;
> +	umem_dmabuf->umem.sg_head.nents = nmap;
> +	umem_dmabuf->umem.nmap = nmap;
> +	umem_dmabuf->sgt = sgt;
> +
> +	page_size = ib_umem_find_best_pgsz(&umem_dmabuf->umem, PAGE_SIZE,
> +					   umem_dmabuf->umem.iova);
> +
> +	if (WARN_ON(cur != end || page_size != PAGE_SIZE)) {

Looks like nothing prevents this WARN_ON from triggering

The user could specify a length that is beyond the dma-buf. Can the dma-buf
length be checked during get?

Also page_size can be 0 because iova is not OK. iova should be checked
for alignment during get as well:

  iova & (PAGE_SIZE-1) == umem->addr & (PAGE_SIZE-1)
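
Put together, the two checks being suggested for ib_umem_dmabuf_get() could
look roughly like this (an illustrative sketch only; the 'end' variable, the
use of dmabuf->size, and having the iova available at get time are
assumptions, not code from this series):

	u64 end;

	/* sketch: the requested range must lie within the dma-buf */
	if (check_add_overflow(offset, length, &end) || end > dmabuf->size)
		return ERR_PTR(-EINVAL);

	/* sketch: iova must have the same in-page offset as the address */
	if ((iova & (PAGE_SIZE - 1)) != (umem->address & (PAGE_SIZE - 1)))
		return ERR_PTR(-EINVAL);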

But yes, this is the right idea

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/5] RDMA/uverbs: Add uverbs command for dma-buf based MR registration
  2020-11-05 22:48   ` Jianxin Xiong
@ 2020-11-06  0:13     ` Jason Gunthorpe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2020-11-06  0:13 UTC (permalink / raw)
  To: Jianxin Xiong
  Cc: linux-rdma, dri-devel, Doug Ledford, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

On Thu, Nov 05, 2020 at 02:48:07PM -0800, Jianxin Xiong wrote:

> +	ret = ib_check_mr_access(access_flags);
> +	if (ret)
> +		return ret;

This should also reject unsupportable flags like ACCESS_ON_DEMAND and
HUGETLB
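
A minimal sketch of the kind of check being suggested, placed right after
ib_check_mr_access() (illustrative only; the exact set of rejected flags is
an assumption based on the comment above):

	/* sketch: flags that make no sense for a dma-buf backed MR */
	if (access_flags & (IB_ACCESS_ON_DEMAND | IB_ACCESS_HUGETLB))
		return -EOPNOTSUPP;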

> +
> +	mr = pd->device->ops.reg_user_mr_dmabuf(pd, offset, length, virt_addr,
> +						fd, access_flags,
> +						&attrs->driver_udata);
> +	if (IS_ERR(mr))
> +		return PTR_ERR(mr);
> +
> +	mr->device  = pd->device;
> +	mr->pd      = pd;
> +	mr->type    = IB_MR_TYPE_USER;
> +	mr->uobject = uobj;
> +	atomic_inc(&pd->usecnt);

Please fix the indenting when copying code

> +
> +	uobj->object = mr;

Also, I am a bit surprised this works at all; it needs to call

  uverbs_finalize_uobj_create()

Right here.

Seems like the core code is missing some check that the API is being
used properly.
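
For reference, a sketch of what that would look like in this handler (the
attribute id comes from this patch; treating uverbs_finalize_uobj_create()
as the missing call follows the comment above):

	uobj->object = mr;
	uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);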

> +
> +	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
> +			     &mr->lkey, sizeof(mr->lkey));
> +	if (ret)
> +		goto err_dereg;
> +
> +	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
> +			     &mr->rkey, sizeof(mr->rkey));
> +	if (ret)
> +		goto err_dereg;
> +
> +	return 0;
> +
> +err_dereg:
> +	ib_dereg_mr_user(mr, uverbs_get_cleared_udata(attrs));
> +
> +	return ret;
> +}
> +
>  DECLARE_UVERBS_NAMED_METHOD(
>  	UVERBS_METHOD_ADVISE_MR,
>  	UVERBS_ATTR_IDR(UVERBS_ATTR_ADVISE_MR_PD_HANDLE,

> @@ -253,6 +364,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
>  DECLARE_UVERBS_NAMED_OBJECT(
>  	UVERBS_OBJECT_MR,
>  	UVERBS_TYPE_ALLOC_IDR(uverbs_free_mr),
> +	&UVERBS_METHOD(UVERBS_METHOD_REG_DMABUF_MR),
>  	&UVERBS_METHOD(UVERBS_METHOD_DM_MR_REG),
>  	&UVERBS_METHOD(UVERBS_METHOD_MR_DESTROY),
>  	&UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),

I tried to keep these sorted, but I see the list has become unsorted. You
can re-sort it in this patch.

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
  2020-11-05 22:48   ` Jianxin Xiong
@ 2020-11-06  0:25     ` Jason Gunthorpe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2020-11-06  0:25 UTC (permalink / raw)
  To: Jianxin Xiong
  Cc: linux-rdma, dri-devel, Doug Ledford, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

On Thu, Nov 05, 2020 at 02:48:08PM -0800, Jianxin Xiong wrote:
> @@ -966,7 +969,10 @@ static struct mlx5_ib_mr *alloc_mr_from_cache(struct ib_pd *pd,
>  	struct mlx5_ib_mr *mr;
>  	unsigned int page_size;
>  
> -	page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
> +	if (umem->is_dmabuf)
> +		page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, iova);

You said the sgl is not set here, so why doesn't this crash? It is
certainly wrong to call this function without an SGL.

> +/**
> + * mlx5_ib_fence_dmabuf_mr - Stop all access to the dmabuf MR
> + * @mr: to fence
> + *
> + * On return no parallel threads will be touching this MR and no DMA will be
> + * active.
> + */
> +void mlx5_ib_fence_dmabuf_mr(struct mlx5_ib_mr *mr)
> +{
> +	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
> +
> +	/* Prevent new page faults and prefetch requests from succeeding */
> +	xa_erase(&mr->dev->odp_mkeys, mlx5_base_mkey(mr->mmkey.key));
> +
> +	/* Wait for all running page-fault handlers to finish. */
> +	synchronize_srcu(&mr->dev->odp_srcu);
> +
> +	wait_event(mr->q_deferred_work, !atomic_read(&mr->num_deferred_work));
> +
> +	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> +	mlx5_mr_cache_invalidate(mr);
> +	umem_dmabuf->private = NULL;
> +	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> +
> +	if (!mr->cache_ent) {
> +		mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);
> +		WARN_ON(mr->descs);
> +	}
> +}

I would expect this to call ib_umem_dmabuf_unmap_pages() ?

Who calls it on the dereg path?

This looks quite strange to me; it calls ib_umem_dmabuf_unmap_pages()
only from the invalidate callback?

I feel uneasy about how this seems to assume everything works sanely. We
can have parallel page faults, so pagefault_dmabuf_mr() can be called
multiple times after an invalidation, and it doesn't protect itself
against calling ib_umem_dmabuf_map_pages() twice.

Perhaps the umem code should keep track of the current map state and
exit if there is already an sgl. A NULL vs. non-NULL sgl would do and
seems quite reasonable.
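
A sketch of such a guard at the top of ib_umem_dmabuf_map_pages()
(illustrative; it assumes the sgt pointer set by patch 1/5 is the state
being tracked):

	/* sketch: nothing to do if the pages are already mapped */
	if (umem_dmabuf->sgt)
		return 0;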

> @@ -810,22 +871,31 @@ static int pagefault_mr(struct mlx5_ib_mr *mr, u64 io_virt, size_t bcnt,
>  			u32 *bytes_mapped, u32 flags)
>  {
>  	struct ib_umem_odp *odp = to_ib_umem_odp(mr->umem);
> +	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
>  
>  	lockdep_assert_held(&mr->dev->odp_srcu);
>  	if (unlikely(io_virt < mr->mmkey.iova))
>  		return -EFAULT;
>  
> -	if (!odp->is_implicit_odp) {
> +	if (is_dmabuf_mr(mr) || !odp->is_implicit_odp) {
>  		u64 user_va;
> +		u64 end;
>  
>  		if (check_add_overflow(io_virt - mr->mmkey.iova,
> -				       (u64)odp->umem.address, &user_va))
> +				       (u64)mr->umem->address, &user_va))
>  			return -EFAULT;
> -		if (unlikely(user_va >= ib_umem_end(odp) ||
> -			     ib_umem_end(odp) - user_va < bcnt))
> +		if (is_dmabuf_mr(mr))
> +			end = mr->umem->address + mr->umem->length;
> +		else
> +			end = ib_umem_end(odp);
> +		if (unlikely(user_va >= end || end - user_va < bcnt))
>  			return -EFAULT;
> -		return pagefault_real_mr(mr, odp, user_va, bcnt, bytes_mapped,
> -					 flags);
> +		if (is_dmabuf_mr(mr))
> +			return pagefault_dmabuf_mr(mr, umem_dmabuf, user_va,
> +						   bcnt, bytes_mapped, flags);

But this doesn't care about user_va or bcnt; it just triggers the whole
thing to be remapped, so why calculate them?

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
  2020-11-06  0:25     ` Jason Gunthorpe
@ 2020-11-06  1:11       ` Xiong, Jianxin
  -1 siblings, 0 replies; 42+ messages in thread
From: Xiong, Jianxin @ 2020-11-06  1:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, dri-devel, Doug Ledford, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Vetter, Daniel

> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, November 05, 2020 4:25 PM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> <daniel.vetter@intel.com>
> Subject: Re: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
> 
> On Thu, Nov 05, 2020 at 02:48:08PM -0800, Jianxin Xiong wrote:
> > @@ -966,7 +969,10 @@ static struct mlx5_ib_mr *alloc_mr_from_cache(struct ib_pd *pd,
> >  	struct mlx5_ib_mr *mr;
> >  	unsigned int page_size;
> >
> > -	page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
> > +	if (umem->is_dmabuf)
> > +		page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, iova);
> 
> You said the sgl is not set here, why doesn't this crash? It is certainly wrong to call this function without a SGL.

The sgl is NULL, and nmap is 0. The 'for_each_sg' loop is just skipped and won't crash.

> 
> > +/**
> > + * mlx5_ib_fence_dmabuf_mr - Stop all access to the dmabuf MR
> > + * @mr: to fence
> > + *
> > + * On return no parallel threads will be touching this MR and no DMA
> > +will be
> > + * active.
> > + */
> > +void mlx5_ib_fence_dmabuf_mr(struct mlx5_ib_mr *mr) {
> > +	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
> > +
> > +	/* Prevent new page faults and prefetch requests from succeeding */
> > +	xa_erase(&mr->dev->odp_mkeys, mlx5_base_mkey(mr->mmkey.key));
> > +
> > +	/* Wait for all running page-fault handlers to finish. */
> > +	synchronize_srcu(&mr->dev->odp_srcu);
> > +
> > +	wait_event(mr->q_deferred_work,
> > +!atomic_read(&mr->num_deferred_work));
> > +
> > +	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > +	mlx5_mr_cache_invalidate(mr);
> > +	umem_dmabuf->private = NULL;
> > +	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> > +
> > +	if (!mr->cache_ent) {
> > +		mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);
> > +		WARN_ON(mr->descs);
> > +	}
> > +}
> 
> I would expect this to call ib_umem_dmabuf_unmap_pages() ?
> 
> Who calls it on the dereg path?
> 
> This looks quite strange to me, it calls ib_umem_dmabuf_unmap_pages() only from the invalidate callback?
>

It is also called from ib_umem_dmabuf_release(). 
 
> I feel uneasy how this seems to assume everything works sanely, we can have parallel page faults so pagefault_dmabuf_mr() can be called
> multiple times after an invalidation, and it doesn't protect itself against calling ib_umem_dmabuf_map_pages() twice.
> 
> Perhaps the umem code should keep track of the current map state and exit if there is already a sgl. NULL or not NULL sgl would do and
> seems quite reasonable.
> 

ib_umem_dmabuf_map() already checks the sgl and will do nothing if it is already set.

> > @@ -810,22 +871,31 @@ static int pagefault_mr(struct mlx5_ib_mr *mr, u64 io_virt, size_t bcnt,
> >  			u32 *bytes_mapped, u32 flags)
> >  {
> >  	struct ib_umem_odp *odp = to_ib_umem_odp(mr->umem);
> > +	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
> >
> >  	lockdep_assert_held(&mr->dev->odp_srcu);
> >  	if (unlikely(io_virt < mr->mmkey.iova))
> >  		return -EFAULT;
> >
> > -	if (!odp->is_implicit_odp) {
> > +	if (is_dmabuf_mr(mr) || !odp->is_implicit_odp) {
> >  		u64 user_va;
> > +		u64 end;
> >
> >  		if (check_add_overflow(io_virt - mr->mmkey.iova,
> > -				       (u64)odp->umem.address, &user_va))
> > +				       (u64)mr->umem->address, &user_va))
> >  			return -EFAULT;
> > -		if (unlikely(user_va >= ib_umem_end(odp) ||
> > -			     ib_umem_end(odp) - user_va < bcnt))
> > +		if (is_dmabuf_mr(mr))
> > +			end = mr->umem->address + mr->umem->length;
> > +		else
> > +			end = ib_umem_end(odp);
> > +		if (unlikely(user_va >= end || end - user_va < bcnt))
> >  			return -EFAULT;
> > -		return pagefault_real_mr(mr, odp, user_va, bcnt, bytes_mapped,
> > -					 flags);
> > +		if (is_dmabuf_mr(mr))
> > +			return pagefault_dmabuf_mr(mr, umem_dmabuf, user_va,
> > +						   bcnt, bytes_mapped, flags);
> 
> But this doesn't care about user_va or bcnt it just triggers the whole thing to be remapped, so why calculate it?

The range check is still needed in order to catch application errors of using an incorrect
address or count in the verbs command. Passing the values further in allows pagefault_dmabuf_mr
to generate a return value and set bytes_mapped in a way consistent with the rest of the page
fault handler chain.
  
> 
> Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
@ 2020-11-06  1:11       ` Xiong, Jianxin
  0 siblings, 0 replies; 42+ messages in thread
From: Xiong, Jianxin @ 2020-11-06  1:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig

> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, November 05, 2020 4:25 PM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> <daniel.vetter@intel.com>
> Subject: Re: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
> 
> On Thu, Nov 05, 2020 at 02:48:08PM -0800, Jianxin Xiong wrote:
> > @@ -966,7 +969,10 @@ static struct mlx5_ib_mr *alloc_mr_from_cache(struct ib_pd *pd,
> >  	struct mlx5_ib_mr *mr;
> >  	unsigned int page_size;
> >
> > -	page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
> > +	if (umem->is_dmabuf)
> > +		page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, iova);
> 
> You said the sgl is not set here, why doesn't this crash? It is certainly wrong to call this function without a SGL.

The sgl is NULL, and nmap is 0. The 'for_each_sg' loop is just skipped and won't crash.

> 
> > +/**
> > + * mlx5_ib_fence_dmabuf_mr - Stop all access to the dmabuf MR
> > + * @mr: to fence
> > + *
> > + * On return no parallel threads will be touching this MR and no DMA
> > +will be
> > + * active.
> > + */
> > +void mlx5_ib_fence_dmabuf_mr(struct mlx5_ib_mr *mr) {
> > +	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
> > +
> > +	/* Prevent new page faults and prefetch requests from succeeding */
> > +	xa_erase(&mr->dev->odp_mkeys, mlx5_base_mkey(mr->mmkey.key));
> > +
> > +	/* Wait for all running page-fault handlers to finish. */
> > +	synchronize_srcu(&mr->dev->odp_srcu);
> > +
> > +	wait_event(mr->q_deferred_work,
> > +!atomic_read(&mr->num_deferred_work));
> > +
> > +	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> > +	mlx5_mr_cache_invalidate(mr);
> > +	umem_dmabuf->private = NULL;
> > +	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
> > +
> > +	if (!mr->cache_ent) {
> > +		mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);
> > +		WARN_ON(mr->descs);
> > +	}
> > +}
> 
> I would expect this to call ib_umem_dmabuf_unmap_pages() ?
> 
> Who calls it on the dereg path?
> 
> This looks quite strange to me, it calls ib_umem_dmabuf_unmap_pages() only from the invalidate callback?
>

It is also called from ib_umem_dmabuf_release(). 
 
> I feel uneasy how this seems to assume everything works sanely, we can have parallel page faults so pagefault_dmabuf_mr() can be called
> multiple times after an invalidation, and it doesn't protect itself against calling ib_umem_dmabuf_map_pages() twice.
> 
> Perhaps the umem code should keep track of the current map state and exit if there is already a sgl. NULL or not NULL sgl would do and
> seems quite reasonable.
> 

ib_umem_dmabuf_map() already checks the sgl and will do nothing if it is already set.

> > @@ -810,22 +871,31 @@ static int pagefault_mr(struct mlx5_ib_mr *mr, u64 io_virt, size_t bcnt,
> >  			u32 *bytes_mapped, u32 flags)
> >  {
> >  	struct ib_umem_odp *odp = to_ib_umem_odp(mr->umem);
> > +	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
> >
> >  	lockdep_assert_held(&mr->dev->odp_srcu);
> >  	if (unlikely(io_virt < mr->mmkey.iova))
> >  		return -EFAULT;
> >
> > -	if (!odp->is_implicit_odp) {
> > +	if (is_dmabuf_mr(mr) || !odp->is_implicit_odp) {
> >  		u64 user_va;
> > +		u64 end;
> >
> >  		if (check_add_overflow(io_virt - mr->mmkey.iova,
> > -				       (u64)odp->umem.address, &user_va))
> > +				       (u64)mr->umem->address, &user_va))
> >  			return -EFAULT;
> > -		if (unlikely(user_va >= ib_umem_end(odp) ||
> > -			     ib_umem_end(odp) - user_va < bcnt))
> > +		if (is_dmabuf_mr(mr))
> > +			end = mr->umem->address + mr->umem->length;
> > +		else
> > +			end = ib_umem_end(odp);
> > +		if (unlikely(user_va >= end || end - user_va < bcnt))
> >  			return -EFAULT;
> > -		return pagefault_real_mr(mr, odp, user_va, bcnt, bytes_mapped,
> > -					 flags);
> > +		if (is_dmabuf_mr(mr))
> > +			return pagefault_dmabuf_mr(mr, umem_dmabuf, user_va,
> > +						   bcnt, bytes_mapped, flags);
> 
> But this doesn't care about user_va or bcnt it just triggers the whole thing to be remapped, so why calculate it?

The range check is still needed in order to catch application errors of using an incorrect
address or count in the verbs command. Passing the values further in allows pagefault_dmabuf_mr
to generate a return value and set bytes_mapped in a way consistent with the rest of the page
fault handler chain.
  
> 
> Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
  2020-11-06  1:11       ` Xiong, Jianxin
@ 2020-11-06 12:48         ` Jason Gunthorpe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2020-11-06 12:48 UTC (permalink / raw)
  To: Xiong, Jianxin
  Cc: linux-rdma, dri-devel, Doug Ledford, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Vetter, Daniel

On Fri, Nov 06, 2020 at 01:11:38AM +0000, Xiong, Jianxin wrote:
> > On Thu, Nov 05, 2020 at 02:48:08PM -0800, Jianxin Xiong wrote:
> > > @@ -966,7 +969,10 @@ static struct mlx5_ib_mr *alloc_mr_from_cache(struct ib_pd *pd,
> > >  	struct mlx5_ib_mr *mr;
> > >  	unsigned int page_size;
> > >
> > > -	page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
> > > +	if (umem->is_dmabuf)
> > > +		page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, iova);
> > 
> > You said the sgl is not set here, why doesn't this crash? It is certainly wrong to call this function without a SGL.
> 
> The sgl is NULL, and nmap is 0. The 'for_each_sg' loop is just skipped and won't crash.

Just wire this to 4k; it is clearer than calling some no-op pgsz


> > > +	if (!mr->cache_ent) {
> > > +		mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);
> > > +		WARN_ON(mr->descs);
> > > +	}
> > > +}
> > 
> > I would expect this to call ib_umem_dmabuf_unmap_pages() ?
> > 
> > Who calls it on the dereg path?
> > 
> > This looks quite strange to me, it calls ib_umem_dmabuf_unmap_pages() only from the invalidate callback?
> 
> It is also called from ib_umem_dmabuf_release(). 

Hmm, that is not how the other APIs work; the unmap should be paired
with the map in the caller, and the sequence for destroy should be

 invalidate
 unmap
 destroy_mkey
 release_umem

I have another series coming that makes the other three destroy flows
much closer to that ideal.
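
Roughly, as a sketch of the ordering only (reusing the names from this
series; the dereg helper itself is illustrative, not the posted code):

static void mlx5_ib_dereg_dmabuf_mr(struct mlx5_ib_mr *mr)
{
	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);

	/* invalidate: stop page faults, prefetch and DMA */
	mlx5_ib_fence_dmabuf_mr(mr);

	/* unmap: paired with the map done in the page fault path */
	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
	ib_umem_dmabuf_unmap_pages(umem_dmabuf);
	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);

	/* destroy_mkey */
	mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);

	/* release_umem */
	ib_umem_release(mr->umem);
}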

> > I feel uneasy how this seems to assume everything works sanely, we can have parallel page faults so pagefault_dmabuf_mr() can be called
> > multiple times after an invalidation, and it doesn't protect itself against calling ib_umem_dmabuf_map_pages() twice.
> > 
> > Perhaps the umem code should keep track of the current map state and exit if there is already a sgl. NULL or not NULL sgl would do and
> > seems quite reasonable.
> 
> Ib_umem_dmabuf_map() already checks the sgl and will do nothing if it is already set.

How? What I see in patch 1 is an unconditional call to
dma_buf_map_attachment()?

> > > +		if (is_dmabuf_mr(mr))
> > > +			return pagefault_dmabuf_mr(mr, umem_dmabuf, user_va,
> > > +						   bcnt, bytes_mapped, flags);
> > 
> > But this doesn't care about user_va or bcnt it just triggers the whole thing to be remapped, so why calculate it?
> 
> The range check is still needed, in order to catch application
> errors of using incorrect address or count in verbs command. Passing
> the values further in is to allow pagefault_dmabuf_mr to generate
> return value and set bytes_mapped in a way consistent with the page
> fault handler chain.

The HW validates the range. The range check in the ODP case is to
protect against a HW bug that would cause the kernel to
malfunction. For dmabuf you don't need to do it

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
@ 2020-11-06 12:48         ` Jason Gunthorpe
  0 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2020-11-06 12:48 UTC (permalink / raw)
  To: Xiong, Jianxin
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig

On Fri, Nov 06, 2020 at 01:11:38AM +0000, Xiong, Jianxin wrote:
> > On Thu, Nov 05, 2020 at 02:48:08PM -0800, Jianxin Xiong wrote:
> > > @@ -966,7 +969,10 @@ static struct mlx5_ib_mr *alloc_mr_from_cache(struct ib_pd *pd,
> > >  	struct mlx5_ib_mr *mr;
> > >  	unsigned int page_size;
> > >
> > > -	page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
> > > +	if (umem->is_dmabuf)
> > > +		page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, iova);
> > 
> > You said the sgl is not set here, why doesn't this crash? It is certainly wrong to call this function without a SGL.
> 
> The sgl is NULL, and nmap is 0. The 'for_each_sg' loop is just skipped and won't crash.

Just wire this to 4k; it is clearer than calling some no-op pgsz


> > > +	if (!mr->cache_ent) {
> > > +		mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);
> > > +		WARN_ON(mr->descs);
> > > +	}
> > > +}
> > 
> > I would expect this to call ib_umem_dmabuf_unmap_pages() ?
> > 
> > Who calls it on the dereg path?
> > 
> > This looks quite strange to me, it calls ib_umem_dmabuf_unmap_pages() only from the invalidate callback?
> 
> It is also called from ib_umem_dmabuf_release(). 

Hmm, that is not how the other APIs work; the unmap should be paired
with the map in the caller, and the sequence for destroy should be

 invalidate
 unmap
 destroy_mkey
 release_umem

I have another series coming that makes the other three destroy flows
much closer to that ideal.

> > I feel uneasy how this seems to assume everything works sanely, we can have parallel page faults so pagefault_dmabuf_mr() can be called
> > multiple times after an invalidation, and it doesn't protect itself against calling ib_umem_dmabuf_map_pages() twice.
> > 
> > Perhaps the umem code should keep track of the current map state and exit if there is already a sgl. NULL or not NULL sgl would do and
> > seems quite reasonable.
> 
> Ib_umem_dmabuf_map() already checks the sgl and will do nothing if it is already set.

How? What I see in patch 1 is an unconditional call to
dma_buf_map_attachment()?

> > > +		if (is_dmabuf_mr(mr))
> > > +			return pagefault_dmabuf_mr(mr, umem_dmabuf, user_va,
> > > +						   bcnt, bytes_mapped, flags);
> > 
> > But this doesn't care about user_va or bcnt it just triggers the whole thing to be remapped, so why calculate it?
> 
> The range check is still needed, in order to catch application
> errors of using incorrect address or count in verbs command. Passing
> the values further in is to allow pagefault_dmabuf_mr to generate
> return value and set bytes_mapped in a way consistent with the page
> fault handler chain.

The HW validates the range. The range check in the ODP case is to
protect against a HW bug that would cause the kernel to
malfunction. For dmabuf you don't need to do it

Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
  2020-11-06 12:48         ` Jason Gunthorpe
@ 2020-11-06 16:10           ` Xiong, Jianxin
  -1 siblings, 0 replies; 42+ messages in thread
From: Xiong, Jianxin @ 2020-11-06 16:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, dri-devel, Doug Ledford, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Vetter, Daniel

> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Friday, November 06, 2020 4:49 AM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> <daniel.vetter@intel.com>
> Subject: Re: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
> 
> On Fri, Nov 06, 2020 at 01:11:38AM +0000, Xiong, Jianxin wrote:
> > > On Thu, Nov 05, 2020 at 02:48:08PM -0800, Jianxin Xiong wrote:
> > > > @@ -966,7 +969,10 @@ static struct mlx5_ib_mr *alloc_mr_from_cache(struct ib_pd *pd,
> > > >  	struct mlx5_ib_mr *mr;
> > > >  	unsigned int page_size;
> > > >
> > > > -	page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
> > > > +	if (umem->is_dmabuf)
> > > > +		page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, iova);
> > >
> > > You said the sgl is not set here, why doesn't this crash? It is certainly wrong to call this function without a SGL.
> >
> > The sgl is NULL, and nmap is 0. The 'for_each_sg' loop is just skipped and won't crash.
> 
> Just wire this to 4k it is clearer than calling some no-op pgsz

Ok
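
I.e. something along these lines in alloc_mr_from_cache(), as a sketch
of the change based on the hunk quoted above:

	if (umem->is_dmabuf)
		/* sg list is empty at this point, just use 4k pages */
		page_size = PAGE_SIZE;
	else
		page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size,
						     0, iova);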

> 
> 
> > > > +	if (!mr->cache_ent) {
> > > > +		mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);
> > > > +		WARN_ON(mr->descs);
> > > > +	}
> > > > +}
> > >
> > > I would expect this to call ib_umem_dmabuf_unmap_pages() ?
> > >
> > > Who calls it on the dereg path?
> > >
> > > This looks quite strange to me, it calls ib_umem_dmabuf_unmap_pages() only from the invalidate callback?
> >
> > It is also called from ib_umem_dmabuf_release().
> 
> Hmm, that is not how the other APIs work, the unmap should be paired with the map in the caller, and the sequence for destroy should be
> 
>  invalidate
>  unmap
>  destroy_mkey
>  release_umem
> 
> I have another series coming that makes the other three destroy flows much closer to that ideal.
> 

Can fix that.

> > > I feel uneasy how this seems to assume everything works sanely, we
> > > can have parallel page faults so pagefault_dmabuf_mr() can be called multiple times after an invalidation, and it doesn't protect itself
> against calling ib_umem_dmabuf_map_pages() twice.
> > >
> > > Perhaps the umem code should keep track of the current map state and
> > > exit if there is already a sgl. NULL or not NULL sgl would do and seems quite reasonable.
> >
> > Ib_umem_dmabuf_map() already checks the sgl and will do nothing if it is already set.
> 
> How? What I see in patch 1 is an unconditional call to
> dma_buf_map_attachment()?

My bad. I misread the lines. It used to be there (in v3) but somehow got lost. 

> 
> > > > +		if (is_dmabuf_mr(mr))
> > > > +			return pagefault_dmabuf_mr(mr, umem_dmabuf, user_va,
> > > > +						   bcnt, bytes_mapped, flags);
> > >
> > > But this doesn't care about user_va or bcnt it just triggers the whole thing to be remapped, so why calculate it?
> >
> > The range check is still needed, in order to catch application errors
> > of using incorrect address or count in verbs command. Passing the
> > values further in is to allow pagefault_dmabuf_mr to generate return
> > value and set bytes_mapped in a way consistent with the page fault
> > handler chain.
> 
> The HW validates the range. The range check in the ODP case is to protect against a HW bug that would cause the kernel to malfunction.
> For dmabuf you don't need to do it

Ok.  So the handler can simply return 0 (as the number of pages mapped) and leave bytes_mapped untouched?

> 
> Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
@ 2020-11-06 16:10           ` Xiong, Jianxin
  0 siblings, 0 replies; 42+ messages in thread
From: Xiong, Jianxin @ 2020-11-06 16:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig

> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Friday, November 06, 2020 4:49 AM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> <daniel.vetter@intel.com>
> Subject: Re: [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region
> 
> On Fri, Nov 06, 2020 at 01:11:38AM +0000, Xiong, Jianxin wrote:
> > > On Thu, Nov 05, 2020 at 02:48:08PM -0800, Jianxin Xiong wrote:
> > > > @@ -966,7 +969,10 @@ static struct mlx5_ib_mr *alloc_mr_from_cache(struct ib_pd *pd,
> > > >  	struct mlx5_ib_mr *mr;
> > > >  	unsigned int page_size;
> > > >
> > > > -	page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
> > > > +	if (umem->is_dmabuf)
> > > > +		page_size = ib_umem_find_best_pgsz(umem, PAGE_SIZE, iova);
> > >
> > > You said the sgl is not set here, why doesn't this crash? It is certainly wrong to call this function without a SGL.
> >
> > The sgl is NULL, and nmap is 0. The 'for_each_sg' loop is just skipped and won't crash.
> 
> Just wire this to 4k it is clearer than calling some no-op pgsz

Ok

> 
> 
> > > > +	if (!mr->cache_ent) {
> > > > +		mlx5_core_destroy_mkey(mr->dev->mdev, &mr->mmkey);
> > > > +		WARN_ON(mr->descs);
> > > > +	}
> > > > +}
> > >
> > > I would expect this to call ib_umem_dmabuf_unmap_pages() ?
> > >
> > > Who calls it on the dereg path?
> > >
> > > This looks quite strange to me, it calls ib_umem_dmabuf_unmap_pages() only from the invalidate callback?
> >
> > It is also called from ib_umem_dmabuf_release().
> 
> Hmm, that is not how the other APIs work, the unmap should be paired with the map in the caller, and the sequence for destroy should be
> 
>  invalidate
>  unmap
>  destroy_mkey
>  release_umem
> 
> I have another series coming that makes the other three destroy flows much closer to that ideal.
> 

Can fix that.

> > > I feel uneasy how this seems to assume everything works sanely, we
> > > can have parallel page faults so pagefault_dmabuf_mr() can be called multiple times after an invalidation, and it doesn't protect itself
> against calling ib_umem_dmabuf_map_pages() twice.
> > >
> > > Perhaps the umem code should keep track of the current map state and
> > > exit if there is already a sgl. NULL or not NULL sgl would do and seems quite reasonable.
> >
> > Ib_umem_dmabuf_map() already checks the sgl and will do nothing if it is already set.
> 
> How? What I see in patch 1 is an unconditional call to
> dma_buf_map_attachment()?

My bad. I misread the lines. It used to be there (in v3) but somehow got lost. 

> 
> > > > +		if (is_dmabuf_mr(mr))
> > > > +			return pagefault_dmabuf_mr(mr, umem_dmabuf, user_va,
> > > > +						   bcnt, bytes_mapped, flags);
> > >
> > > But this doesn't care about user_va or bcnt it just triggers the whole thing to be remapped, so why calculate it?
> >
> > The range check is still needed, in order to catch application errors
> > of using incorrect address or count in verbs command. Passing the
> > values further in is to allow pagefault_dmabuf_mr to generate return
> > value and set bytes_mapped in a way consistent with the page fault
> > handler chain.
> 
> The HW validates the range. The range check in the ODP case is to protect against a HW bug that would cause the kernel to malfunction.
> For dmabuf you don't need to do it

Ok.  So the handler can simply return 0 (as the number of pages mapped) and leave bytes_mapped untouched?

> 
> Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v8 3/5] RDMA/uverbs: Add uverbs command for dma-buf based MR registration
  2020-11-06  0:13     ` Jason Gunthorpe
@ 2020-11-06 16:20       ` Xiong, Jianxin
  -1 siblings, 0 replies; 42+ messages in thread
From: Xiong, Jianxin @ 2020-11-06 16:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, dri-devel, Doug Ledford, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Vetter, Daniel

> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, November 05, 2020 4:13 PM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> <daniel.vetter@intel.com>
> Subject: Re: [PATCH v8 3/5] RDMA/uverbs: Add uverbs command for dma-buf based MR registration
> 
> On Thu, Nov 05, 2020 at 02:48:07PM -0800, Jianxin Xiong wrote:
> 
> > +	ret = ib_check_mr_access(access_flags);
> > +	if (ret)
> > +		return ret;
> 
> This should also reject unsupportable flags like ACCESS_ON_DEMAND and HUGETLB

Will do.

> 
> > +
> > +	mr = pd->device->ops.reg_user_mr_dmabuf(pd, offset, length, virt_addr,
> > +						fd, access_flags,
> > +						&attrs->driver_udata);
> > +	if (IS_ERR(mr))
> > +		return PTR_ERR(mr);
> > +
> > +	mr->device  = pd->device;
> > +	mr->pd      = pd;
> > +	mr->type    = IB_MR_TYPE_USER;
> > +	mr->uobject = uobj;
> > +	atomic_inc(&pd->usecnt);
> 
> Fix indenting when copying code please

It could be due to a mix of tabs and spaces. They look aligned in the source file. Will fix.

> 
> > +
> > +	uobj->object = mr;
> 
> Also bit surprised this works at all, it needs to call
> 
>   uverbs_finalize_uobj_create()
> 
> Right here.
> 

Will fix.

> Seems like the core code is missing some check that the API is being used properly.
> 
> > +
> > +	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
> > +			     &mr->lkey, sizeof(mr->lkey));
> > +	if (ret)
> > +		goto err_dereg;
> > +
> > +	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
> > +			     &mr->rkey, sizeof(mr->rkey));
> > +	if (ret)
> > +		goto err_dereg;
> > +
> > +	return 0;
> > +
> > +err_dereg:
> > +	ib_dereg_mr_user(mr, uverbs_get_cleared_udata(attrs));
> > +
> > +	return ret;
> > +}
> > +
> >  DECLARE_UVERBS_NAMED_METHOD(
> >  	UVERBS_METHOD_ADVISE_MR,
> >  	UVERBS_ATTR_IDR(UVERBS_ATTR_ADVISE_MR_PD_HANDLE,
> 
> > @@ -253,6 +364,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
> >  DECLARE_UVERBS_NAMED_OBJECT(
> >  	UVERBS_OBJECT_MR,
> >  	UVERBS_TYPE_ALLOC_IDR(uverbs_free_mr),
> > +	&UVERBS_METHOD(UVERBS_METHOD_REG_DMABUF_MR),
> >  	&UVERBS_METHOD(UVERBS_METHOD_DM_MR_REG),
> >  	&UVERBS_METHOD(UVERBS_METHOD_MR_DESTROY),
> >  	&UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),
> 
> I try to keep these sorted, but I see it has become unsorted. You can re-sort it in this patch

Will do.

> 
> Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v8 3/5] RDMA/uverbs: Add uverbs command for dma-buf based MR registration
@ 2020-11-06 16:20       ` Xiong, Jianxin
  0 siblings, 0 replies; 42+ messages in thread
From: Xiong, Jianxin @ 2020-11-06 16:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig

> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, November 05, 2020 4:13 PM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> <daniel.vetter@intel.com>
> Subject: Re: [PATCH v8 3/5] RDMA/uverbs: Add uverbs command for dma-buf based MR registration
> 
> On Thu, Nov 05, 2020 at 02:48:07PM -0800, Jianxin Xiong wrote:
> 
> > +	ret = ib_check_mr_access(access_flags);
> > +	if (ret)
> > +		return ret;
> 
> This should also reject unsupportable flags like ACCESS_ON_DEMAND and HUGETLB

Will do.

> 
> > +
> > +	mr = pd->device->ops.reg_user_mr_dmabuf(pd, offset, length, virt_addr,
> > +						fd, access_flags,
> > +						&attrs->driver_udata);
> > +	if (IS_ERR(mr))
> > +		return PTR_ERR(mr);
> > +
> > +	mr->device  = pd->device;
> > +	mr->pd      = pd;
> > +	mr->type    = IB_MR_TYPE_USER;
> > +	mr->uobject = uobj;
> > +	atomic_inc(&pd->usecnt);
> 
> Fix indenting when copying code please

It could be due to a mix of tabs and spaces. They look aligned in the source file. Will fix.

> 
> > +
> > +	uobj->object = mr;
> 
> Also bit surprised this works at all, it needs to call
> 
>   uverbs_finalize_uobj_create()
> 
> Right here.
> 

Will fix.

> Seems like the core code is missing some check that the API is being used properly.
> 
> > +
> > +	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
> > +			     &mr->lkey, sizeof(mr->lkey));
> > +	if (ret)
> > +		goto err_dereg;
> > +
> > +	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
> > +			     &mr->rkey, sizeof(mr->rkey));
> > +	if (ret)
> > +		goto err_dereg;
> > +
> > +	return 0;
> > +
> > +err_dereg:
> > +	ib_dereg_mr_user(mr, uverbs_get_cleared_udata(attrs));
> > +
> > +	return ret;
> > +}
> > +
> >  DECLARE_UVERBS_NAMED_METHOD(
> >  	UVERBS_METHOD_ADVISE_MR,
> >  	UVERBS_ATTR_IDR(UVERBS_ATTR_ADVISE_MR_PD_HANDLE,
> 
> > @@ -253,6 +364,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
> >  DECLARE_UVERBS_NAMED_OBJECT(
> >  	UVERBS_OBJECT_MR,
> >  	UVERBS_TYPE_ALLOC_IDR(uverbs_free_mr),
> > +	&UVERBS_METHOD(UVERBS_METHOD_REG_DMABUF_MR),
> >  	&UVERBS_METHOD(UVERBS_METHOD_DM_MR_REG),
> >  	&UVERBS_METHOD(UVERBS_METHOD_MR_DESTROY),
> >  	&UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),
> 
> I try to keep these sorted, but I see it has become unsorted. You can re-sort it in this patch

Will do.

> 
> Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
  2020-11-06  0:08     ` Jason Gunthorpe
@ 2020-11-06 16:34       ` Xiong, Jianxin
  -1 siblings, 0 replies; 42+ messages in thread
From: Xiong, Jianxin @ 2020-11-06 16:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, dri-devel, Doug Ledford, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Vetter, Daniel

> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, November 05, 2020 4:09 PM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> <daniel.vetter@intel.com>
> Subject: Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
> 
> On Thu, Nov 05, 2020 at 02:48:05PM -0800, Jianxin Xiong wrote:
> > +	/* modify the sgl in-place to match umem address and length */
> > +
> > +	start = ALIGN_DOWN(umem_dmabuf->umem.address, PAGE_SIZE);
> > +	end = ALIGN(umem_dmabuf->umem.address + umem_dmabuf->umem.length,
> > +		    PAGE_SIZE);
> > +	cur = 0;
> > +	nmap = 0;
> > +	for_each_sgtable_dma_sg(sgt, sg, i) {
> > +		if (cur >= end)
> > +			break;
> > +		if (cur + sg_dma_len(sg) <= start) {
> > +			cur += sg_dma_len(sg);
> > +			continue;
> > +		}
> 
> This seems like a strange way to compute intersections

I can rework that.
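
Something like this, perhaps (a rough sketch of the rework only, reusing
the locals and the first_sg/last_sg bookkeeping from the patch so the
list can still be restored before dma_buf_unmap_attachment()):

	cur = 0;
	nmap = 0;
	for_each_sgtable_dma_sg(sgt, sg, i) {
		unsigned long len = sg_dma_len(sg);

		/* count entries that overlap [start, end) */
		if (cur < end && cur + len > start)
			nmap++;

		/* entry containing 'start': trim the front */
		if (cur <= start && start < cur + len) {
			unsigned long offset = start - cur;

			umem_dmabuf->first_sg = sg;
			umem_dmabuf->first_sg_offset = offset;
			sg_dma_address(sg) += offset;
			sg_dma_len(sg) -= offset;
		}

		/* entry containing 'end': trim the tail and stop */
		if (cur < end && end <= cur + len) {
			unsigned long trim = cur + len - end;

			umem_dmabuf->last_sg = sg;
			umem_dmabuf->last_sg_trim = trim;
			sg_dma_len(sg) -= trim;
			break;
		}

		cur += len;
	}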

> 
>   if (cur <= start && start < cur + sg_dma_len(sg))
> 
> > +		if (cur <= start) {
> > +			unsigned long offset = start - cur;
> > +
> > +			umem_dmabuf->first_sg = sg;
> > +			umem_dmabuf->first_sg_offset = offset;
> > +			sg_dma_address(sg) += offset;
> > +			sg_dma_len(sg) -= offset;
> > +			if (&sg_dma_len(sg) != &sg->length)
> > +				sg->length -= offset;
> 
> We don't need to adjust sg->length, only dma_len, so no reason for this surprising if.
> 
> > +			cur += offset;
> > +		}
> > +		if (cur + sg_dma_len(sg) >= end) {
> 
> Same logic here
> 
> > +			unsigned long trim = cur + sg_dma_len(sg) - end;
> > +
> > +			umem_dmabuf->last_sg = sg;
> > +			umem_dmabuf->last_sg_trim = trim;
> > +			sg_dma_len(sg) -= trim;
> > +			if (&sg_dma_len(sg) != &sg->length)
> > +				sg->length -= trim;
> 
> break, things are done here
> 
> > +		}
> > +		cur += sg_dma_len(sg);
> > +		nmap++;
> > +	}
> 
> > +
> > +	umem_dmabuf->umem.sg_head.sgl = umem_dmabuf->first_sg;
> > +	umem_dmabuf->umem.sg_head.nents = nmap;
> > +	umem_dmabuf->umem.nmap = nmap;
> > +	umem_dmabuf->sgt = sgt;
> > +
> > +	page_size = ib_umem_find_best_pgsz(&umem_dmabuf->umem, PAGE_SIZE,
> > +					   umem_dmabuf->umem.iova);
> > +
> > +	if (WARN_ON(cur != end || page_size != PAGE_SIZE)) {
> 
> Looks like nothing prevents this warn on from triggering
> 
> The user could specify a length that is beyond the dma buf, can the dma buf length be checked during get?

In order to check the length, the buffer needs to be mapped. That can be done.

> 
> Also page_size can be 0 because iova is not OK. iova should be checked for alignment during get as well:
> 
>   iova & (PAGE_SIZE-1) == umem->addr & (PAGE_SIZE-1)
> 

If ib_umem_dmabuf_map_pages is called during get this error is automatically caught.

> But yes, this is the right idea
> 
> Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
@ 2020-11-06 16:34       ` Xiong, Jianxin
  0 siblings, 0 replies; 42+ messages in thread
From: Xiong, Jianxin @ 2020-11-06 16:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig

> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, November 05, 2020 4:09 PM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> <daniel.vetter@intel.com>
> Subject: Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
> 
> On Thu, Nov 05, 2020 at 02:48:05PM -0800, Jianxin Xiong wrote:
> > +	/* modify the sgl in-place to match umem address and length */
> > +
> > +	start = ALIGN_DOWN(umem_dmabuf->umem.address, PAGE_SIZE);
> > +	end = ALIGN(umem_dmabuf->umem.address + umem_dmabuf->umem.length,
> > +		    PAGE_SIZE);
> > +	cur = 0;
> > +	nmap = 0;
> > +	for_each_sgtable_dma_sg(sgt, sg, i) {
> > +		if (cur >= end)
> > +			break;
> > +		if (cur + sg_dma_len(sg) <= start) {
> > +			cur += sg_dma_len(sg);
> > +			continue;
> > +		}
> 
> This seems like a strange way to compute intersections

I can rework that.

> 
>   if (cur <= start && start < cur + sg_dma_len(sg))
> 
> > +		if (cur <= start) {
> > +			unsigned long offset = start - cur;
> > +
> > +			umem_dmabuf->first_sg = sg;
> > +			umem_dmabuf->first_sg_offset = offset;
> > +			sg_dma_address(sg) += offset;
> > +			sg_dma_len(sg) -= offset;
> > +			if (&sg_dma_len(sg) != &sg->length)
> > +				sg->length -= offset;
> 
> We don't need to adjust sg->length, only dma_len, so no reason for this surprising if.
> 
> > +			cur += offset;
> > +		}
> > +		if (cur + sg_dma_len(sg) >= end) {
> 
> Same logic here
> 
> > +			unsigned long trim = cur + sg_dma_len(sg) - end;
> > +
> > +			umem_dmabuf->last_sg = sg;
> > +			umem_dmabuf->last_sg_trim = trim;
> > +			sg_dma_len(sg) -= trim;
> > +			if (&sg_dma_len(sg) != &sg->length)
> > +				sg->length -= trim;
> 
> break, things are done here
> 
> > +		}
> > +		cur += sg_dma_len(sg);
> > +		nmap++;
> > +	}
> 
> > +
> > +	umem_dmabuf->umem.sg_head.sgl = umem_dmabuf->first_sg;
> > +	umem_dmabuf->umem.sg_head.nents = nmap;
> > +	umem_dmabuf->umem.nmap = nmap;
> > +	umem_dmabuf->sgt = sgt;
> > +
> > +	page_size = ib_umem_find_best_pgsz(&umem_dmabuf->umem, PAGE_SIZE,
> > +					   umem_dmabuf->umem.iova);
> > +
> > +	if (WARN_ON(cur != end || page_size != PAGE_SIZE)) {
> 
> Looks like nothing prevents this warn on from triggering
> 
> The user could specify a length that is beyond the dma buf, can the dma buf length be checked during get?

In order to check the length, the buffer needs to be mapped. That can be done.

> 
> Also page_size can be 0 because iova is not OK. iova should be checked for alignment during get as well:
> 
>   iova & (PAGE_SIZE-1) == umem->addr & (PAGE_SIZE-1)
> 

If ib_umem_dmabuf_map_pages is called during get this error is automatically caught.

> But yes, this is the right idea
> 
> Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/5] RDMA/uverbs: Add uverbs command for dma-buf based MR registration
  2020-11-06 16:20       ` Xiong, Jianxin
@ 2020-11-06 16:37         ` Jason Gunthorpe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2020-11-06 16:37 UTC (permalink / raw)
  To: Xiong, Jianxin
  Cc: linux-rdma, dri-devel, Doug Ledford, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Vetter, Daniel

On Fri, Nov 06, 2020 at 04:20:34PM +0000, Xiong, Jianxin wrote:
> > From: Jason Gunthorpe <jgg@ziepe.ca>
> > Sent: Thursday, November 05, 2020 4:13 PM
> > To: Xiong, Jianxin <jianxin.xiong@intel.com>
> > Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> > <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> > <daniel.vetter@intel.com>
> > Subject: Re: [PATCH v8 3/5] RDMA/uverbs: Add uverbs command for dma-buf based MR registration
> > 
> > On Thu, Nov 05, 2020 at 02:48:07PM -0800, Jianxin Xiong wrote:
> > 
> > > +	ret = ib_check_mr_access(access_flags);
> > > +	if (ret)
> > > +		return ret;
> > 
> > This should also reject unsupportable flags like ACCESS_ON_DEMAND and HUGETLB
> 
> Will do.

Just change IB_ACCESS_SUPPORTED to the list of allowed flags in this
context
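
Something like this, say (the flag list here is only illustrative, and
the mask name is made up; pick whatever is actually supportable for
dma-buf MRs):

#define IB_ACCESS_DMABUF_SUPPORTED				\
	(IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE |	\
	 IB_ACCESS_REMOTE_READ | IB_ACCESS_REMOTE_ATOMIC |	\
	 IB_ACCESS_RELAXED_ORDERING)

	if (access_flags & ~IB_ACCESS_DMABUF_SUPPORTED)
		return -EOPNOTSUPP;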


> > > +	mr->device  = pd->device;
> > > +	mr->pd      = pd;
> > > +	mr->type    = IB_MR_TYPE_USER;
> > > +	mr->uobject = uobj;
> > > +	atomic_inc(&pd->usecnt);
> > 
> > Fix indenting when copying code please
> 
> It could be due to a mix of tab and space. They look aligned in the source file. Will fix.

The interior spaces before the '=' should not be there; we don't align '='s

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 3/5] RDMA/uverbs: Add uverbs command for dma-buf based MR registration
@ 2020-11-06 16:37         ` Jason Gunthorpe
  0 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2020-11-06 16:37 UTC (permalink / raw)
  To: Xiong, Jianxin
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig

On Fri, Nov 06, 2020 at 04:20:34PM +0000, Xiong, Jianxin wrote:
> > From: Jason Gunthorpe <jgg@ziepe.ca>
> > Sent: Thursday, November 05, 2020 4:13 PM
> > To: Xiong, Jianxin <jianxin.xiong@intel.com>
> > Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> > <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> > <daniel.vetter@intel.com>
> > Subject: Re: [PATCH v8 3/5] RDMA/uverbs: Add uverbs command for dma-buf based MR registration
> > 
> > On Thu, Nov 05, 2020 at 02:48:07PM -0800, Jianxin Xiong wrote:
> > 
> > > +	ret = ib_check_mr_access(access_flags);
> > > +	if (ret)
> > > +		return ret;
> > 
> > This should also reject unsupportable flags like ACCESS_ON_DEMAND and HUGETLB
> 
> Will do.

Just change IB_ACCESS_SUPPORTED to the list of allowed flags in this
context


> > > +	mr->device  = pd->device;
> > > +	mr->pd      = pd;
> > > +	mr->type    = IB_MR_TYPE_USER;
> > > +	mr->uobject = uobj;
> > > +	atomic_inc(&pd->usecnt);
> > 
> > Fix indenting when copying code please
> 
> It could be due to a mix of tab and space. They look aligned in the source file. Will fix.

The interior spaces before the '=' should not be there; we don't align '='s

Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
  2020-11-06 16:34       ` Xiong, Jianxin
@ 2020-11-06 16:39         ` Jason Gunthorpe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2020-11-06 16:39 UTC (permalink / raw)
  To: Xiong, Jianxin
  Cc: linux-rdma, dri-devel, Doug Ledford, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Vetter, Daniel

On Fri, Nov 06, 2020 at 04:34:07PM +0000, Xiong, Jianxin wrote:

> > The user could specify a length that is beyond the dma buf, can
> > the dma buf length be checked during get?
> 
> In order to check the length, the buffer needs to be mapped. That can be done.

Do DMA bufs even have definite immutable lengths? Going to be a
problem if they can shrink

> > Also page_size can be 0 because iova is not OK. iova should be
> > checked for alignment during get as well:
> > 
> >   iova & (PAGE_SIZE-1) == umem->addr & (PAGE_SIZE-1)
> 
> If ib_umem_dmabuf_map_pages is called during get this error is automatically caught.

The ib_umem_find_best_pgsz() checks this equation, yes.

So you'd map the sgl before allocating the mkey? This then checks the
length and iova?

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
@ 2020-11-06 16:39         ` Jason Gunthorpe
  0 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2020-11-06 16:39 UTC (permalink / raw)
  To: Xiong, Jianxin
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig

On Fri, Nov 06, 2020 at 04:34:07PM +0000, Xiong, Jianxin wrote:

> > The user could specify a length that is beyond the dma buf, can
> > the dma buf length be checked during get?
> 
> In order to check the length, the buffer needs to be mapped. That can be done.

Do DMA bufs even have definite immutable lengths? Going to be a
problem if they can shrink

> > Also page_size can be 0 because iova is not OK. iova should be
> > checked for alignment during get as well:
> > 
> >   iova & (PAGE_SIZE-1) == umem->addr & (PAGE_SIZE-1)
> 
> If ib_umem_dmabuf_map_pages is called during get this error is automatically caught.

The ib_umem_find_best_pgsz() checks this equation, yes.

So you'd map the sgl before allocating the mkey? This then checks the
length and iova?

Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
  2020-11-06 16:39         ` Jason Gunthorpe
@ 2020-11-06 17:01           ` Xiong, Jianxin
  -1 siblings, 0 replies; 42+ messages in thread
From: Xiong, Jianxin @ 2020-11-06 17:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, dri-devel, Doug Ledford, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Vetter, Daniel

> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Friday, November 06, 2020 8:40 AM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> <daniel.vetter@intel.com>
> Subject: Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
> 
> On Fri, Nov 06, 2020 at 04:34:07PM +0000, Xiong, Jianxin wrote:
> 
> > > The user could specify a length that is beyond the dma buf, can the
> > > dma buf length be checked during get?
> >
> > In order to check the length, the buffer needs to be mapped. That can be done.
> 
> Do DMA bufs even have definite immutable lengths? Going to be a problem if they can shrink

Good question. The buffer itself has a fixed size. If for whatever reason the mapping
is not complete, that state must be temporary. If that does happen, ib_umem_dmabuf_map_pages
will undo the mapping and return an error. It will be retried later via the pagefault handler.

> 
> > > Also page_size can be 0 because iova is not OK. iova should be
> > > checked for alignment during get as well:
> > >
> > >   iova & (PAGE_SIZE-1) == umem->addr & (PAGE_SIZE-1)
> >
> > If ib_umem_dmabuf_map_pages is called during get this error is automatically caught.
> 
> The ib_umem_find_best_pgsz() checks this equation, yes.
> 
> So you'd map the sgl before allocating the mkey? This then checks the length and iova?

Yes

> 
> Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
@ 2020-11-06 17:01           ` Xiong, Jianxin
  0 siblings, 0 replies; 42+ messages in thread
From: Xiong, Jianxin @ 2020-11-06 17:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig

> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Friday, November 06, 2020 8:40 AM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> <daniel.vetter@intel.com>
> Subject: Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
> 
> On Fri, Nov 06, 2020 at 04:34:07PM +0000, Xiong, Jianxin wrote:
> 
> > > The user could specify a length that is beyond the dma buf, can the
> > > dma buf length be checked during get?
> >
> > In order to check the length, the buffer needs to be mapped. That can be done.
> 
> Do DMA bufs even have definite immutable lengths? Going to be a problem if they can shrink

Good question. The buffer itself has a fixed size. If for whatever reason the mapping
is not complete, that state must be temporary. If that does happen, ib_umem_dmabuf_map_pages
will undo the mapping and return an error. It will be retried later via the pagefault handler.

> 
> > > Also page_size can be 0 because iova is not OK. iova should be
> > > checked for alignment during get as well:
> > >
> > >   iova & (PAGE_SIZE-1) == umem->addr & (PAGE_SIZE-1)
> >
> > If ib_umem_dmabuf_map_pages is called during get this error is automatically caught.
> 
> The ib_umem_find_best_pgsz() checks this equation, yes.
> 
> So you'd map the sgl before allocating the mkey? This then checks the length and iova?

Yes

> 
> Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
  2020-11-06 16:39         ` Jason Gunthorpe
@ 2020-11-10 14:14           ` Daniel Vetter
  -1 siblings, 0 replies; 42+ messages in thread
From: Daniel Vetter @ 2020-11-10 14:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Xiong, Jianxin, Leon Romanovsky, linux-rdma, dri-devel,
	Doug Ledford, Vetter, Daniel, Christian Koenig

On Fri, Nov 06, 2020 at 12:39:53PM -0400, Jason Gunthorpe wrote:
> On Fri, Nov 06, 2020 at 04:34:07PM +0000, Xiong, Jianxin wrote:
> 
> > > The user could specify a length that is beyond the dma buf, can
> > > the dma buf length be checked during get?
> > 
> > In order to check the length, the buffer needs to be mapped. That can be done.
> 
> Do DMA bufs even have definite immutable lengths? Going to be a
> problem if they can shrink

Yup. Unfortunately that's not documented in the structures themselves;
there are just some random comments sprinkled all over stating that dma-buf
size is invariant, e.g. see the @mmap kerneldoc in dma_buf_ops:

https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf_ops#c.dma_buf_ops

"Because dma-buf buffers have invariant size over their lifetime, ..."

Jianxin, can you pls do a kerneldoc patch which updates the comment for
dma_buf.size and dma_buf_export_info.size?
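
Something short along these lines would do (sketch only, exact wording
up to you), e.g. for the member in struct dma_buf:

	/**
	 * @size:
	 *
	 * Size of the buffer in bytes; invariant over the lifetime of the
	 * buffer. Exporters must not change it after export.
	 */
	size_t size;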

Thanks, Daniel

> 
> > > Also page_size can be 0 because iova is not OK. iova should be
> > > checked for alignment during get as well:
> > > 
> > >   iova & (PAGE_SIZE-1) == umem->addr & (PAGE_SIZE-1)
> > 
> > If ib_umem_dmabuf_map_pages is called during get this error is automatically caught.
> 
> The ib_umem_find_best_pgsz() checks this equation, yes.
> 
> So you'd map the sgl before allocating the mkey? This then checks the
> length and iova?
> 
> Jason
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
@ 2020-11-10 14:14           ` Daniel Vetter
  0 siblings, 0 replies; 42+ messages in thread
From: Daniel Vetter @ 2020-11-10 14:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig, Xiong, Jianxin

On Fri, Nov 06, 2020 at 12:39:53PM -0400, Jason Gunthorpe wrote:
> On Fri, Nov 06, 2020 at 04:34:07PM +0000, Xiong, Jianxin wrote:
> 
> > > The user could specify a length that is beyond the dma buf, can
> > > the dma buf length be checked during get?
> > 
> > In order to check the length, the buffer needs to be mapped. That can be done.
> 
> Do DMA bufs even have definite immutable lengths? Going to be a
> problem if they can shrink

Yup. Unfortunately that's not documented in the structures themselves;
there are just some random comments sprinkled all over stating that dma-buf
size is invariant, e.g. see the @mmap kerneldoc in dma_buf_ops:

https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf_ops#c.dma_buf_ops

"Because dma-buf buffers have invariant size over their lifetime, ..."

Jianxin, can you pls do a kerneldoc patch which updates the comment for
dma_buf.size and dma_buf_export_info.size?

Thanks, Daniel

> 
> > > Also page_size can be 0 because iova is not OK. iova should be
> > > checked for alignment during get as well:
> > > 
> > >   iova & (PAGE_SIZE-1) == umem->addr & (PAGE_SIZE-1)
> > 
> > If ib_umem_dmabuf_map_pages is called during get this error is automatically caught.
> 
> The ib_umem_find_best_pgsz() checks this equation, yes.
> 
> So you'd map the sgl before allocating the mkey? This then checks the
> length and iova?
> 
> Jason
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
  2020-11-10 14:14           ` Daniel Vetter
@ 2020-11-10 14:27             ` Jason Gunthorpe
  -1 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2020-11-10 14:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Xiong, Jianxin, Leon Romanovsky, linux-rdma, dri-devel,
	Doug Ledford, Vetter, Daniel, Christian Koenig

On Tue, Nov 10, 2020 at 03:14:45PM +0100, Daniel Vetter wrote:
> On Fri, Nov 06, 2020 at 12:39:53PM -0400, Jason Gunthorpe wrote:
> > On Fri, Nov 06, 2020 at 04:34:07PM +0000, Xiong, Jianxin wrote:
> > 
> > > > The user could specify a length that is beyond the dma buf, can
> > > > the dma buf length be checked during get?
> > > 
> > > In order to check the length, the buffer needs to be mapped. That can be done.
> > 
> > Do DMA bufs even have definite immutable lengths? Going to be a
> > problem if they can shrink
> 
> Yup. Unfortunately that's not documented in the structures themselves,
> there's just some random comments sprinkled all over that dma-buf size is
> invariant, e.g. see the @mmap kerneldoc in dma_buf_ops:
> 
> https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf_ops#c.dma_buf_ops
> 
> "Because dma-buf buffers have invariant size over their lifetime, ..."
> 
> Jianxin, can you pls do a kerneldoc patch which updates the comment for
> dma_buf.size and dma_buf_export_info.size?

So we can check the size without doing an attachment?

That means the flow should be put back to how it was a series or two
ago where the SGL is only attached during page fault with the entire
programming sequence under the resv lock
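
I.e. roughly (a sketch only; the actual mkey programming call is driver
specific and elided here, and the unused parameters are only kept to
match the prototype quoted earlier):

static int pagefault_dmabuf_mr(struct mlx5_ib_mr *mr,
			       struct ib_umem_dmabuf *umem_dmabuf,
			       u64 io_virt, size_t bcnt, u32 *bytes_mapped,
			       u32 flags)
{
	int err;

	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);

	/* attach the SGL only now, at page fault time */
	err = ib_umem_dmabuf_map_pages(umem_dmabuf);
	if (!err) {
		/*
		 * Program the new mapping into the mkey here, still under
		 * the reservation lock, so an invalidation cannot race
		 * with it.
		 */
	}

	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
	return err;
}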

Jason

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
@ 2020-11-10 14:27             ` Jason Gunthorpe
  0 siblings, 0 replies; 42+ messages in thread
From: Jason Gunthorpe @ 2020-11-10 14:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig, Xiong, Jianxin

On Tue, Nov 10, 2020 at 03:14:45PM +0100, Daniel Vetter wrote:
> On Fri, Nov 06, 2020 at 12:39:53PM -0400, Jason Gunthorpe wrote:
> > On Fri, Nov 06, 2020 at 04:34:07PM +0000, Xiong, Jianxin wrote:
> > 
> > > > The user could specify a length that is beyond the dma buf, can
> > > > the dma buf length be checked during get?
> > > 
> > > In order to check the length, the buffer needs to be mapped. That can be done.
> > 
> > Do DMA bufs even have definite immutable lengths? Going to be a
> > problem if they can shrink
> 
> Yup. Unfortunately that's not documented in the structures themselves,
> there's just some random comments sprinkled all over that dma-buf size is
> invariant, e.g. see the @mmap kerneldoc in dma_buf_ops:
> 
> https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf_ops#c.dma_buf_ops
> 
> "Because dma-buf buffers have invariant size over their lifetime, ..."
> 
> Jianxin, can you pls do a kerneldoc patch which updates the comment for
> dma_buf.size and dma_buf_export_info.size?

So we can check the size without doing an attachment?

That means the flow should be put back to how it was a series or two
ago where the SGL is only attached during page fault with the entire
programming sequence under the resv lock

Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
  2020-11-10 14:27             ` Jason Gunthorpe
@ 2020-11-10 14:44               ` Daniel Vetter
  -1 siblings, 0 replies; 42+ messages in thread
From: Daniel Vetter @ 2020-11-10 14:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Vetter, Xiong, Jianxin, Leon Romanovsky, linux-rdma,
	dri-devel, Doug Ledford, Vetter, Daniel, Christian Koenig

On Tue, Nov 10, 2020 at 10:27:57AM -0400, Jason Gunthorpe wrote:
> On Tue, Nov 10, 2020 at 03:14:45PM +0100, Daniel Vetter wrote:
> > On Fri, Nov 06, 2020 at 12:39:53PM -0400, Jason Gunthorpe wrote:
> > > On Fri, Nov 06, 2020 at 04:34:07PM +0000, Xiong, Jianxin wrote:
> > > 
> > > > > The user could specify a length that is beyond the dma buf, can
> > > > > the dma buf length be checked during get?
> > > > 
> > > > In order to check the length, the buffer needs to be mapped. That can be done.
> > > 
> > > Do DMA bufs even have definite immutable lengths? Going to be a
> > > problem if they can shrink
> > 
> > Yup. Unfortunately that's not documented in the structures themselves,
> > there's just some random comments sprinkled all over that dma-buf size is
> > invariant, e.g. see the @mmap kerneldoc in dma_buf_ops:
> > 
> > https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_buf_ops#c.dma_buf_ops
> > 
> > "Because dma-buf buffers have invariant size over their lifetime, ..."
> > 
> > Jianxin, can you pls do a kerneldoc patch which updates the comment for
> > dma_buf.size and dma_buf_export_info.size?
> 
> So we can check the size without doing an attachment?

Yeah size should be invariant over the lifetime of the dma_buf (it's also
needed for dma_buf_vmap kernel cpu access and dma_buf_mmap userspace mmap
forwarding). No lock or attachment needed. But like I said, would be good
to have the kerneldoc patch to make this clear.
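
Which means the importer can sanity check the requested window right in
the get path, before attaching at all. A minimal sketch, with a made-up
helper name and the usual offset/length parameters:

/* dmabuf->size is fixed for the buffer's lifetime, so no attachment
 * and no locking is needed for this check.
 */
static int check_dmabuf_range(struct dma_buf *dmabuf,
			      unsigned long offset, size_t length)
{
	unsigned long end;

	if (check_add_overflow(offset, (unsigned long)length, &end))
		return -EINVAL;
	if (end > dmabuf->size)
		return -EINVAL;
	return 0;
}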

The history here is that the X shm extension suffered badly from
resizeable storage object exploits of the X server; this is why both
dma-buf (and also drm_gem_buffer_object before that generalization) and
memfd are size sealed.
-Daniel

> That means the flow should be put back to how it was a series or two
> ago, where the SGL is only attached during the page fault, with the
> entire programming sequence under the resv lock.
> 
> Jason

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
  2020-11-10 14:44               ` Daniel Vetter
@ 2020-11-10 17:23                 ` Xiong, Jianxin
  -1 siblings, 0 replies; 42+ messages in thread
From: Xiong, Jianxin @ 2020-11-10 17:23 UTC (permalink / raw)
  To: Daniel Vetter, Jason Gunthorpe
  Cc: Leon Romanovsky, linux-rdma, dri-devel, Doug Ledford, Vetter,
	Daniel, Christian Koenig

> -----Original Message-----
> From: Daniel Vetter <daniel@ffwll.ch>
> Sent: Tuesday, November 10, 2020 6:44 AM
> To: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Daniel Vetter <daniel@ffwll.ch>; Xiong, Jianxin <jianxin.xiong@intel.com>; Leon Romanovsky <leon@kernel.org>; linux-
> rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Vetter, Daniel <daniel.vetter@intel.com>;
> Christian Koenig <christian.koenig@amd.com>
> Subject: Re: [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region
> 
> On Tue, Nov 10, 2020 at 10:27:57AM -0400, Jason Gunthorpe wrote:
> > On Tue, Nov 10, 2020 at 03:14:45PM +0100, Daniel Vetter wrote:
> > > On Fri, Nov 06, 2020 at 12:39:53PM -0400, Jason Gunthorpe wrote:
> > > > On Fri, Nov 06, 2020 at 04:34:07PM +0000, Xiong, Jianxin wrote:
> > > >
> > > > > > The user could specify a length that is beyond the dma buf,
> > > > > > can the dma buf length be checked during get?
> > > > >
> > > > > In order to check the length, the buffer needs to be mapped. That can be done.
> > > >
> > > > Do DMA bufs even have definite, immutable lengths? Going to be a
> > > > problem if they can shrink
> > >
> > > Yup. Unfortunately that's not documented in the structures themselves;
> > > there are just some random comments sprinkled all over that dma-buf
> > > size is invariant, e.g. see the @mmap kerneldoc in dma_buf_ops:
> > >
> > > https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlig
> > > ht=dma_buf_ops#c.dma_buf_ops
> > >
> > > "Because dma-buf buffers have invariant size over their lifetime, ..."
> > >
> > > Jianxin, can you pls do a kerneldoc patch which updates the comment
> > > for dma_buf.size and dma_buf_export_info.size?

Sure. 
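
Maybe along these lines for both @size members in include/linux/dma-buf.h;
the exact wording below is just a first draft to be adjusted in review:

	/**
	 * @size:
	 *
	 * Size of the buffer; invariant over the lifetime of the buffer.
	 * Importers may rely on this without attaching to the buffer and
	 * without holding any lock.
	 */
	size_t size;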

> >
> > So we can check the size without doing an attachment?
> 
> Yeah size should be invariant over the lifetime of the dma_buf (it's also needed for dma_buf_vmap kernel cpu access and dma_buf_mmap
> userspace mmap forwarding). No lock or attachment needed. But like I said, would be good to have the kerneldoc patch to make this clear.
> 
> The history here is that the X shm extension suffered badly from resizeable storage object exploits of the X server; this is why both dma-buf
> (and also drm_gem_buffer_object before that generalization) and memfd are size sealed.
> -Daniel
> 
> > That means the flow should be put back to how it was a series or two
> > ago, where the SGL is only attached during the page fault, with the
> > entire programming sequence under the resv lock.

Will do.
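
Roughly, the create path would then only do the dynamic attach and leave
the mapping to the fault handler. A sketch of that shape (the umem struct,
attach ops and label names follow this series and may still change):

	/* Registration time: attach dynamically, but do not map yet. */
	umem_dmabuf->attach = dma_buf_dynamic_attach(dmabuf,
						     device->dma_device,
						     &ib_umem_dmabuf_attach_ops,
						     umem_dmabuf);
	if (IS_ERR(umem_dmabuf->attach)) {
		ret = PTR_ERR(umem_dmabuf->attach);
		goto out_free_umem;
	}

	/* No dma_buf_map_attachment() here; the SGL is produced in the
	 * ODP page fault handler under dma_resv_lock() and torn down
	 * again from the move_notify callback.
	 */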

> >
> > Jason
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2020-11-11  7:56 UTC | newest]

Thread overview: 42+ messages
2020-11-05 22:48 [PATCH v8 0/5] RDMA: Add dma-buf support Jianxin Xiong
2020-11-05 22:48 ` Jianxin Xiong
2020-11-05 22:48 ` [PATCH v8 1/5] RDMA/umem: Support importing dma-buf as user memory region Jianxin Xiong
2020-11-05 22:48   ` Jianxin Xiong
2020-11-06  0:08   ` Jason Gunthorpe
2020-11-06  0:08     ` Jason Gunthorpe
2020-11-06 16:34     ` Xiong, Jianxin
2020-11-06 16:34       ` Xiong, Jianxin
2020-11-06 16:39       ` Jason Gunthorpe
2020-11-06 16:39         ` Jason Gunthorpe
2020-11-06 17:01         ` Xiong, Jianxin
2020-11-06 17:01           ` Xiong, Jianxin
2020-11-10 14:14         ` Daniel Vetter
2020-11-10 14:14           ` Daniel Vetter
2020-11-10 14:27           ` Jason Gunthorpe
2020-11-10 14:27             ` Jason Gunthorpe
2020-11-10 14:44             ` Daniel Vetter
2020-11-10 14:44               ` Daniel Vetter
2020-11-10 17:23               ` Xiong, Jianxin
2020-11-10 17:23                 ` Xiong, Jianxin
2020-11-05 22:48 ` [PATCH v8 2/5] RDMA/core: Add device method for registering dma-buf based " Jianxin Xiong
2020-11-05 22:48   ` Jianxin Xiong
2020-11-05 22:48 ` [PATCH v8 3/5] RDMA/uverbs: Add uverbs command for dma-buf based MR registration Jianxin Xiong
2020-11-05 22:48   ` Jianxin Xiong
2020-11-06  0:13   ` Jason Gunthorpe
2020-11-06  0:13     ` Jason Gunthorpe
2020-11-06 16:20     ` Xiong, Jianxin
2020-11-06 16:20       ` Xiong, Jianxin
2020-11-06 16:37       ` Jason Gunthorpe
2020-11-06 16:37         ` Jason Gunthorpe
2020-11-05 22:48 ` [PATCH v8 4/5] RDMA/mlx5: Support dma-buf based userspace memory region Jianxin Xiong
2020-11-05 22:48   ` Jianxin Xiong
2020-11-06  0:25   ` Jason Gunthorpe
2020-11-06  0:25     ` Jason Gunthorpe
2020-11-06  1:11     ` Xiong, Jianxin
2020-11-06  1:11       ` Xiong, Jianxin
2020-11-06 12:48       ` Jason Gunthorpe
2020-11-06 12:48         ` Jason Gunthorpe
2020-11-06 16:10         ` Xiong, Jianxin
2020-11-06 16:10           ` Xiong, Jianxin
2020-11-05 22:48 ` [PATCH v8 5/5] dma-buf: Reject attach request from importers that use dma_virt_ops Jianxin Xiong
2020-11-05 22:48   ` Jianxin Xiong
