* [PATCH v16 0/4] RDMA: Add dma-buf support
@ 2020-12-15 21:27 ` Jianxin Xiong
  0 siblings, 0 replies; 48+ messages in thread
From: Jianxin Xiong @ 2020-12-15 21:27 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Jianxin Xiong, Doug Ledford, Jason Gunthorpe, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

This is the sixteenth version of the patch set. Changelog:

v16:
* Add "select DMA_SHARED_BUFFER" to Kconfig when IB UMEM is enabled.
  This fixes the auto build test error with a random config.

v15: https://www.spinics.net/lists/linux-rdma/msg98369.html
* Rebase to the latest linux-rdma 'for-next' branch (commit 0583531bb9ef)
  to pick up RDMA core and mlx5 updates
* Let ib_umem_dmabuf_get() return 'struct ib_umem_dmabuf *' instead of
  'struct ib_umem *'
* Move the check of on-demand paging support to mlx5_ib_reg_user_mr_dmabuf()
* Check iova alignment at the entry point of the uverbs command so that
  mlx5_umem_dmabuf_default_pgsz() can always succeed

v14: https://www.spinics.net/lists/linux-rdma/msg98265.html
* Check return value of dma_fence_wait()
* Fix a dma-buf leak in ib_umem_dmabuf_get()
* Fix return value type cast for ib_umem_dmabuf_get()
* Return -EOPNOTSUPP instead of -EINVAL for unimplemented functions
* Remove an unnecessary use of unlikely()
* Remove a left-over commit message resulting from the rebase

v13: https://www.spinics.net/lists/linux-rdma/msg98227.html
* Rebase to the latest linux-rdma 'for-next' branch (5.10.0-rc6+)
* Check for device on-demand paging capability at the entry point of
  the new verbs command to avoid calling device's reg_user_mr_dmabuf()
  method when CONFIG_INFINIBAND_ON_DEMAND_PAGING is disabled.

v12: https://www.spinics.net/lists/linux-rdma/msg97943.html
* Move the prototype of function ib_umem_dmabuf_release() to ib_umem.h
  and remove umem_dmabuf.h
* Break a line that is too long

v11: https://www.spinics.net/lists/linux-rdma/msg97860.html
* Rework the parameter checking code inside ib_umem_dmabuf_get() 
* Fix incorrect error handling in the new verbs command handler
* Put a duplicated code sequence for checking iova and setting page size
  into a function
* In the invalidation callback, check whether the buffer has been mapped,
  which ensures that a valid driver MR is present
* The patch that checks for dma_virt_ops is dropped because it is no
  longer needed
* The patch that documents that dma-buf size is fixed has landed at:
  https://cgit.freedesktop.org/drm/drm-misc/commit/?id=476b485be03c
  and thus is no longer included here
* The matching user space patch set is sent separately

v10: https://www.spinics.net/lists/linux-rdma/msg97483.html
* Don't map the pages in ib_umem_dmabuf_get(); use the size information
  of the dma-buf object to validate the umem size instead
* Use PAGE_SIZE directly instead of using ib_umem_find_best_pgsz() when
  the MR is created since the pages have not been mapped yet and dma-buf
  requires PAGE_SIZE anyway
* Always call mlx5_umem_find_best_pgsz() after mapping the pages to
  verify that the page size requirement is satisfied
* Add a patch to document that dma-buf size is fixed

v9: https://www.spinics.net/lists/linux-rdma/msg97432.html
* Clean up the code for sg list in-place modification
* Prevent dma-buf pages from being mapped multiple times
* Map the pages in ib_umem_dmabuf_get() so that improper values of
  address/length/iova can be caught early
* Check for unsupported flags in the new uverbs command
* Add missing uverbs_finalize_uobj_create()
* Sort uverbs objects by name
* Fix formatting issue -- unnecessary alignment of '='
* Unmap pages in mlx5_ib_fence_dmabuf_mr()
* Remove address range checking from pagefault_dmabuf_mr()

v8: https://www.spinics.net/lists/linux-rdma/msg97370.html
* Modify the dma-buf sg list in place to get a proper umem sg list and
  restore it before calling dma_buf_unmap_attachment()
* Validate the umem sg list with ib_umem_find_best_pgsz()
* Remove the logic for slicing the sg list at runtime

v7: https://www.spinics.net/lists/linux-rdma/msg97297.html
* Rebase on top of latest mlx5 MR patch series
* Slice dma-buf sg list at runtime instead of creating a new list
* Preload the buffer page mapping when the MR is created
* Move the 'dma_virt_ops' check into dma_buf_dynamic_attach()

v6: https://www.spinics.net/lists/linux-rdma/msg96923.html
* Move the dma-buf invalidation callback from the core to the device
  driver
* Move mapping update from work queue to pagefault handler
* Add dma-buf based MRs to the xarray of mmkeys so that the pagefault
  handler can be reached
* Update the new driver method and uverbs command signature by changing
  the parameter 'addr' to 'offset'
* Modify the sg list returned from dma_buf_map_attachment() based on
  the parameters 'offset' and 'length'
* Don't import dma-buf if 'dma_virt_ops' is used by the dma device
* The patch that clarifies dma-buf sg lists alignment has landed at
  https://cgit.freedesktop.org/drm/drm-misc/commit/?id=ac80cd17a615
  and thus is no longer included with this set

v5: https://www.spinics.net/lists/linux-rdma/msg96786.html
* Fix a few warnings reported by kernel test robot:
    - no previous prototype for function 'ib_umem_dmabuf_release' 
    - no previous prototype for function 'ib_umem_dmabuf_map_pages'
    - comparison of distinct pointer types in 'check_add_overflow'
* Add comment for the wait between getting the dma-buf sg table and
  updating the NIC page table

v4: https://www.spinics.net/lists/linux-rdma/msg96767.html
* Add a new ib_device method reg_user_mr_dmabuf() instead of expanding
  the existing method reg_user_mr()
* Use a separate code flow for dma-buf instead of adding special cases
  to the ODP memory region code path
* In the invalidation callback, the new mapping is updated as a whole
  using a work queue instead of at page granularity in the page fault
  handler
* Use dma_resv_get_excl() and dma_fence_wait() to ensure the content of
  the pages has been moved to the new location before the new mapping
  is programmed into the NIC
* Add code to the ODP page fault handler to check the mapping status
* The new access flag added in v3 is removed.
* The checking for on-demand paging support in the new uverbs command
  is removed because it is implied by implementing the new ib_device
  method
* Clarify that dma-buf sg lists are page aligned

v3: https://www.spinics.net/lists/linux-rdma/msg96330.html
* Use dma_buf_dynamic_attach() instead of dma_buf_attach()
* Use on-demand paging mechanism to avoid pinning the GPU memory
* Instead of adding a new parameter to the device method for memory
  registration, pass all the attributes including the file descriptor
  as a structure
* Define a new access flag for dma-buf based memory region
* Check for on-demand paging support in the new uverbs command

v2: https://www.spinics.net/lists/linux-rdma/msg93643.html
* The Kconfig option is removed. There is no dependency issue since
  dma-buf driver is always enabled.
* The declaration of new data structure and functions is reorganized to
  minimize the visibility of the changes.
* The new uverbs command now goes through ioctl() instead of write().
* The rereg functionality is removed.
* Instead of adding new device method for dma-buf specific registration,
  existing method is extended to accept an extra parameter. 
* The correct function is now used for address range checking. 

v1: https://www.spinics.net/lists/linux-rdma/msg90720.html
* The initial patch set
* Implement core functions for importing and mapping dma-buf
* Use dma-buf static attach interface
* Add two ib_device methods reg_user_mr_fd() and rereg_user_mr_fd()
* Add two uverbs commands via the write() interface
* Add Kconfig option
* Add dma-buf support to mlx5 device

When enabled, an RDMA capable NIC can perform peer-to-peer transactions
over PCIe to access the local memory located on another device. This can
often lead to better performance than using a system memory buffer for
RDMA and copying data between the buffer and device memory.

The current kernel RDMA stack uses get_user_pages() to pin the physical
pages backing the user buffer and uses dma_map_sg_attrs() to get the
dma addresses for memory access. This usually doesn't work for peer
device memory due to the lack of associated page structures.

Several mechanisms exist today to facilitate device memory access.

ZONE_DEVICE is a new zone for device memory in the memory management
subsystem. It allows pages from device memory to be described with
specialized page structures, but what can be done with these page
structures may be different from system memory. ZONE_DEVICE is further
specialized into multiple memory types, such as one type for PCI
p2pmem/p2pdma and one type for HMM.

PCI p2pmem/p2pdma uses ZONE_DEVICE to represent device memory residing
in a PCI BAR and provides a set of calls to publish, discover, allocate,
and map such memory for peer-to-peer transactions. One feature of the
API is that the buffer is allocated by the side that does the DMA
transfer. This works well for the storage use case, but is awkward
with GPU-NIC communication, where typically the buffer is allocated by
the GPU driver rather than the NIC driver.

Heterogeneous Memory Management (HMM) utilizes mmu_interval_notifier
and ZONE_DEVICE to support shared virtual address space and page
migration between system memory and device memory. HMM doesn't support
pinning device memory because pages located on the device must be able
to migrate to system memory when accessed by the CPU. Peer-to-peer access
is currently not supported by HMM.

Dma-buf is a standard mechanism for sharing buffers among different
device drivers. The buffer to be shared is exported by the owning
driver and imported by the driver that wants to use it. The exporter
provides a set of ops that the importer can call to pin and map the
buffer. In addition, a file descriptor can be associated with a dma-buf
object as the handle that can be passed to user space.

This patch series adds a dma-buf importer role to the RDMA driver in an
attempt to support RDMA using device memory such as GPU VRAM. Dma-buf is
chosen for a few reasons: first, the API is relatively simple and allows
a lot of flexibility in implementing the buffer manipulation ops.
Second, it doesn't require page structures. Third, dma-buf is already
supported in many GPU drivers. However, we are aware that existing GPU
drivers don't allow pinning device memory via the dma-buf interface.
Pinning would simply cause the backing storage to migrate to system RAM.
True peer-to-peer access is only possible using dynamic attach, which
requires on-demand paging support from the NIC to work. For this reason,
this series only works with ODP capable NICs.
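
The interplay between dynamic attach and on-demand paging described
above can be illustrated with a toy userspace model. This is only a
sketch under stated assumptions, not kernel code: every name below
(toy_exporter, toy_importer, toy_move_notify, toy_migrate, toy_access)
is hypothetical, and the "mapping" is reduced to a single address. It
shows why the importer must tolerate invalidation: the exporter may move
the buffer at any time, and the importer only re-establishes the mapping
when the buffer is next accessed.

```c
#include <stdbool.h>

/* Hypothetical exporter state: where the buffer currently lives. */
struct toy_exporter {
	unsigned long location;
};

/* Hypothetical importer state: what the NIC page table would hold. */
struct toy_importer {
	bool mapped;             /* is the cached mapping still valid? */
	unsigned long dma_addr;  /* address cached by the importer */
	unsigned int faults;     /* number of remaps performed on access */
};

/* Models the invalidation callback: drop the stale mapping. */
static void toy_move_notify(struct toy_importer *nic)
{
	nic->mapped = false;
}

/* The exporter migrates the buffer, then notifies the importer. */
static void toy_migrate(struct toy_exporter *gpu, struct toy_importer *nic,
			unsigned long new_location)
{
	gpu->location = new_location;
	toy_move_notify(nic);
}

/* Models the page fault path: remap on demand, then use the mapping. */
static unsigned long toy_access(struct toy_exporter *gpu,
				struct toy_importer *nic)
{
	if (!nic->mapped) {
		nic->dma_addr = gpu->location;
		nic->mapped = true;
		nic->faults++;
	}
	return nic->dma_addr;
}
```

In this model a NIC without a fault path would keep using the stale
dma_addr after toy_migrate(), which is the situation the series avoids
by requiring ODP capable devices.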

This series consists of four patches. The first patch adds the common
code for importing dma-buf from a file descriptor and mapping the
dma-buf pages. Patch 2 adds the new driver method reg_user_mr_dmabuf().
Patch 3 adds a new uverbs command for registering a dma-buf based memory
region. Patch 4 adds dma-buf support to the mlx5 driver.

Related user space RDMA library changes are provided as a separate
patch series.

Jianxin Xiong (4):
  RDMA/umem: Support importing dma-buf as user memory region
  RDMA/core: Add device method for registering dma-buf based memory
    region
  RDMA/uverbs: Add uverbs command for dma-buf based MR registration
  RDMA/mlx5: Support dma-buf based userspace memory region

 drivers/infiniband/Kconfig                    |   1 +
 drivers/infiniband/core/Makefile              |   2 +-
 drivers/infiniband/core/device.c              |   1 +
 drivers/infiniband/core/umem.c                |   3 +
 drivers/infiniband/core/umem_dmabuf.c         | 174 ++++++++++++++++++++++++++
 drivers/infiniband/core/uverbs_std_types_mr.c | 117 ++++++++++++++++-
 drivers/infiniband/hw/mlx5/main.c             |   2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |  18 +++
 drivers/infiniband/hw/mlx5/mr.c               | 112 ++++++++++++++++-
 drivers/infiniband/hw/mlx5/odp.c              |  89 ++++++++++++-
 include/rdma/ib_umem.h                        |  48 ++++++-
 include/rdma/ib_verbs.h                       |   6 +-
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  14 +++
 13 files changed, 573 insertions(+), 14 deletions(-)
 create mode 100644 drivers/infiniband/core/umem_dmabuf.c

-- 
1.8.3.1


* [PATCH v16 1/4] RDMA/umem: Support importing dma-buf as user memory region
  2020-12-15 21:27 ` Jianxin Xiong
@ 2020-12-15 21:27   ` Jianxin Xiong
  -1 siblings, 0 replies; 48+ messages in thread
From: Jianxin Xiong @ 2020-12-15 21:27 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Jianxin Xiong, Doug Ledford, Jason Gunthorpe, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

Dma-buf is a standard cross-driver buffer sharing mechanism that can be
used to support peer-to-peer access from RDMA devices.

Device memory exported via dma-buf is associated with a file descriptor.
This is passed to the user space as a property associated with the
buffer allocation. When the buffer is registered as a memory region,
the file descriptor is passed to the RDMA driver along with other
parameters.

Implement the common code for importing a dma-buf object and mapping
dma-buf pages.
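
The parameters passed along with the file descriptor include an offset
and a size within the dma-buf. As a minimal userspace sketch of the kind
of range check the import path needs (the helper name range_fits() is
hypothetical, and __builtin_add_overflow is the GCC/Clang builtin used
here as a stand-in for the kernel's check_add_overflow()), the requested
range is rejected if offset + size wraps around or extends past the end
of the buffer:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when [offset, offset + size) lies entirely inside a
 * buffer of buf_size bytes. The overflow check comes first so that a
 * wrapped-around sum can never pass the bounds comparison.
 */
static bool range_fits(unsigned long offset, size_t size, size_t buf_size)
{
	unsigned long end;

	if (__builtin_add_overflow(offset, (unsigned long)size, &end))
		return false;

	return end <= buf_size;
}
```

Checking against the buffer size alone is sufficient because the size
of a dma-buf is fixed for its lifetime.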

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/infiniband/Kconfig            |   1 +
 drivers/infiniband/core/Makefile      |   2 +-
 drivers/infiniband/core/umem.c        |   3 +
 drivers/infiniband/core/umem_dmabuf.c | 174 ++++++++++++++++++++++++++++++++++
 include/rdma/ib_umem.h                |  48 +++++++++-
 5 files changed, 224 insertions(+), 4 deletions(-)
 create mode 100644 drivers/infiniband/core/umem_dmabuf.c
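
For readers following the in-place sg list modification in
ib_umem_dmabuf_map_pages() below, here is a simplified userspace model
of the same idea. It is a sketch only: struct seg, struct trim_state,
trim_segs() and restore_segs() are hypothetical names, and a plain array
stands in for the scatterlist. The first overlapping segment is advanced
by an offset, the last one is shortened by a trim, and both adjustments
are recorded so that the list can be restored before it is handed back.

```c
#include <stddef.h>

struct seg {
	unsigned long addr;
	unsigned long len;
};

/* Records what was changed so the list can be restored later. */
struct trim_state {
	struct seg *first;
	struct seg *last;
	unsigned long first_offset;
	unsigned long last_trim;
	unsigned int nmap;	/* number of segments covering the range */
};

/* Narrow segs[] in place so that it covers only [start, end). */
static void trim_segs(struct seg *segs, size_t n, unsigned long start,
		      unsigned long end, struct trim_state *st)
{
	unsigned long cur = 0;	/* buffer offset of the current segment */
	size_t i;

	st->first = NULL;
	st->last = NULL;
	st->first_offset = 0;
	st->last_trim = 0;
	st->nmap = 0;

	for (i = 0; i < n; i++) {
		struct seg *sg = &segs[i];

		if (start < cur + sg->len && cur < end)
			st->nmap++;
		if (cur <= start && start < cur + sg->len) {
			unsigned long offset = start - cur;

			st->first = sg;
			st->first_offset = offset;
			sg->addr += offset;
			sg->len -= offset;
			cur += offset;
		}
		if (cur < end && end <= cur + sg->len) {
			unsigned long trim = cur + sg->len - end;

			st->last = sg;
			st->last_trim = trim;
			sg->len -= trim;
			break;
		}
		cur += sg->len;
	}
}

/* Undo trim_segs() so the caller gets the original list back. */
static void restore_segs(struct trim_state *st)
{
	if (st->first) {
		st->first->addr -= st->first_offset;
		st->first->len += st->first_offset;
		st->first = NULL;
		st->first_offset = 0;
	}
	if (st->last) {
		st->last->len += st->last_trim;
		st->last = NULL;
		st->last_trim = 0;
	}
}
```

With three segments of 0x3000, 0x2000 and 0x4000 bytes and a requested
range of [0x1000, 0x6000), the first segment loses its leading 0x1000
bytes, the last keeps only its first 0x1000 bytes, and the trimmed
lengths add up to 0x5000.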

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 9325e18..04a78d9 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -41,6 +41,7 @@ config INFINIBAND_USER_MEM
 	bool
 	depends on INFINIBAND_USER_ACCESS != n
 	depends on MMU
+	select DMA_SHARED_BUFFER
 	default y
 
 config INFINIBAND_ON_DEMAND_PAGING
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index ccf2670..8ab4eea 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -40,5 +40,5 @@ ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
 				uverbs_std_types_srq.o \
 				uverbs_std_types_wq.o \
 				uverbs_std_types_qp.o
-ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
+ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
 ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 7ca4112..cc131f8 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -2,6 +2,7 @@
  * Copyright (c) 2005 Topspin Communications.  All rights reserved.
  * Copyright (c) 2005 Cisco Systems.  All rights reserved.
  * Copyright (c) 2005 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -278,6 +279,8 @@ void ib_umem_release(struct ib_umem *umem)
 {
 	if (!umem)
 		return;
+	if (umem->is_dmabuf)
+		return ib_umem_dmabuf_release(to_ib_umem_dmabuf(umem));
 	if (umem->is_odp)
 		return ib_umem_odp_release(to_ib_umem_odp(umem));
 
diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
new file mode 100644
index 0000000..f9b5162
--- /dev/null
+++ b/drivers/infiniband/core/umem_dmabuf.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+/*
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>
+#include <linux/dma-mapping.h>
+
+#include "uverbs.h"
+
+int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf)
+{
+	struct sg_table *sgt;
+	struct scatterlist *sg;
+	struct dma_fence *fence;
+	unsigned long start, end, cur = 0;
+	unsigned int nmap = 0;
+	int i;
+
+	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
+
+	if (umem_dmabuf->sgt)
+		goto wait_fence;
+
+	sgt = dma_buf_map_attachment(umem_dmabuf->attach, DMA_BIDIRECTIONAL);
+	if (IS_ERR(sgt))
+		return PTR_ERR(sgt);
+
+	/* modify the sg list in-place to match umem address and length */
+
+	start = ALIGN_DOWN(umem_dmabuf->umem.address, PAGE_SIZE);
+	end = ALIGN(umem_dmabuf->umem.address + umem_dmabuf->umem.length,
+		    PAGE_SIZE);
+	for_each_sgtable_dma_sg(sgt, sg, i) {
+		if (start < cur + sg_dma_len(sg) && cur < end)
+			nmap++;
+		if (cur <= start && start < cur + sg_dma_len(sg)) {
+			unsigned long offset = start - cur;
+
+			umem_dmabuf->first_sg = sg;
+			umem_dmabuf->first_sg_offset = offset;
+			sg_dma_address(sg) += offset;
+			sg_dma_len(sg) -= offset;
+			cur += offset;
+		}
+		if (cur < end && end <= cur + sg_dma_len(sg)) {
+			unsigned long trim = cur + sg_dma_len(sg) - end;
+
+			umem_dmabuf->last_sg = sg;
+			umem_dmabuf->last_sg_trim = trim;
+			sg_dma_len(sg) -= trim;
+			break;
+		}
+		cur += sg_dma_len(sg);
+	}
+
+	umem_dmabuf->umem.sg_head.sgl = umem_dmabuf->first_sg;
+	umem_dmabuf->umem.sg_head.nents = nmap;
+	umem_dmabuf->umem.nmap = nmap;
+	umem_dmabuf->sgt = sgt;
+
+wait_fence:
+	/*
+	 * Although the sg list is valid now, the content of the pages
+	 * may be not up-to-date. Wait for the exporter to finish
+	 * the migration.
+	 */
+	fence = dma_resv_get_excl(umem_dmabuf->attach->dmabuf->resv);
+	if (fence)
+		return dma_fence_wait(fence, false);
+
+	return 0;
+}
+EXPORT_SYMBOL(ib_umem_dmabuf_map_pages);
+
+void ib_umem_dmabuf_unmap_pages(struct ib_umem_dmabuf *umem_dmabuf)
+{
+	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
+
+	if (!umem_dmabuf->sgt)
+		return;
+
+	/* restore the original sg list */
+	if (umem_dmabuf->first_sg) {
+		sg_dma_address(umem_dmabuf->first_sg) -=
+			umem_dmabuf->first_sg_offset;
+		sg_dma_len(umem_dmabuf->first_sg) +=
+			umem_dmabuf->first_sg_offset;
+		umem_dmabuf->first_sg = NULL;
+		umem_dmabuf->first_sg_offset = 0;
+	}
+	if (umem_dmabuf->last_sg) {
+		sg_dma_len(umem_dmabuf->last_sg) +=
+			umem_dmabuf->last_sg_trim;
+		umem_dmabuf->last_sg = NULL;
+		umem_dmabuf->last_sg_trim = 0;
+	}
+
+	dma_buf_unmap_attachment(umem_dmabuf->attach, umem_dmabuf->sgt,
+				 DMA_BIDIRECTIONAL);
+
+	umem_dmabuf->sgt = NULL;
+}
+EXPORT_SYMBOL(ib_umem_dmabuf_unmap_pages);
+
+struct ib_umem_dmabuf *ib_umem_dmabuf_get(struct ib_device *device,
+					  unsigned long offset, size_t size,
+					  int fd, int access,
+					  const struct dma_buf_attach_ops *ops)
+{
+	struct dma_buf *dmabuf;
+	struct ib_umem_dmabuf *umem_dmabuf;
+	struct ib_umem *umem;
+	unsigned long end;
+	struct ib_umem_dmabuf *ret = ERR_PTR(-EINVAL);
+
+	if (check_add_overflow(offset, (unsigned long)size, &end))
+		return ret;
+
+	if (unlikely(!ops || !ops->move_notify))
+		return ret;
+
+	dmabuf = dma_buf_get(fd);
+	if (IS_ERR(dmabuf))
+		return ERR_CAST(dmabuf);
+
+	if (dmabuf->size < end)
+		goto out_release_dmabuf;
+
+	umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
+	if (!umem_dmabuf) {
+		ret = ERR_PTR(-ENOMEM);
+		goto out_release_dmabuf;
+	}
+
+	umem = &umem_dmabuf->umem;
+	umem->ibdev = device;
+	umem->length = size;
+	umem->address = offset;
+	umem->writable = ib_access_writable(access);
+	umem->is_dmabuf = 1;
+
+	if (!ib_umem_num_pages(umem))
+		goto out_free_umem;
+
+	umem_dmabuf->attach = dma_buf_dynamic_attach(
+					dmabuf,
+					device->dma_device,
+					ops,
+					umem_dmabuf);
+	if (IS_ERR(umem_dmabuf->attach)) {
+		ret = ERR_CAST(umem_dmabuf->attach);
+		goto out_free_umem;
+	}
+	return umem_dmabuf;
+
+out_free_umem:
+	kfree(umem_dmabuf);
+
+out_release_dmabuf:
+	dma_buf_put(dmabuf);
+	return ret;
+}
+EXPORT_SYMBOL(ib_umem_dmabuf_get);
+
+void ib_umem_dmabuf_release(struct ib_umem_dmabuf *umem_dmabuf)
+{
+	struct dma_buf *dmabuf = umem_dmabuf->attach->dmabuf;
+
+	dma_buf_detach(dmabuf, umem_dmabuf->attach);
+	dma_buf_put(dmabuf);
+	kfree(umem_dmabuf);
+}
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 7752211..676c57f 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
 /*
  * Copyright (c) 2007 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2020 Intel Corporation.  All rights reserved.
  */
 
 #ifndef IB_UMEM_H
@@ -13,6 +14,7 @@
 
 struct ib_ucontext;
 struct ib_umem_odp;
+struct dma_buf_attach_ops;
 
 struct ib_umem {
 	struct ib_device       *ibdev;
@@ -22,12 +24,29 @@ struct ib_umem {
 	unsigned long		address;
 	u32 writable : 1;
 	u32 is_odp : 1;
+	u32 is_dmabuf : 1;
 	struct work_struct	work;
 	struct sg_table sg_head;
 	int             nmap;
 	unsigned int    sg_nents;
 };
 
+struct ib_umem_dmabuf {
+	struct ib_umem umem;
+	struct dma_buf_attachment *attach;
+	struct sg_table *sgt;
+	struct scatterlist *first_sg;
+	struct scatterlist *last_sg;
+	unsigned long first_sg_offset;
+	unsigned long last_sg_trim;
+	void *private;
+};
+
+static inline struct ib_umem_dmabuf *to_ib_umem_dmabuf(struct ib_umem *umem)
+{
+	return container_of(umem, struct ib_umem_dmabuf, umem);
+}
+
 /* Returns the offset of the umem start relative to the first page. */
 static inline int ib_umem_offset(struct ib_umem *umem)
 {
@@ -86,6 +105,7 @@ int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
 unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 				     unsigned long pgsz_bitmap,
 				     unsigned long virt);
+
 /**
  * ib_umem_find_best_pgoff - Find best HW page size
  *
@@ -116,6 +136,14 @@ static inline unsigned long ib_umem_find_best_pgoff(struct ib_umem *umem,
 				      dma_addr & pgoff_bitmask);
 }
 
+struct ib_umem_dmabuf *ib_umem_dmabuf_get(struct ib_device *device,
+					  unsigned long offset, size_t size,
+					  int fd, int access,
+					  const struct dma_buf_attach_ops *ops);
+int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf);
+void ib_umem_dmabuf_unmap_pages(struct ib_umem_dmabuf *umem_dmabuf);
+void ib_umem_dmabuf_release(struct ib_umem_dmabuf *umem_dmabuf);
+
 #else /* CONFIG_INFINIBAND_USER_MEM */
 
 #include <linux/err.h>
@@ -124,12 +152,12 @@ static inline struct ib_umem *ib_umem_get(struct ib_device *device,
 					  unsigned long addr, size_t size,
 					  int access)
 {
-	return ERR_PTR(-EINVAL);
+	return ERR_PTR(-EOPNOTSUPP);
 }
 static inline void ib_umem_release(struct ib_umem *umem) { }
 static inline int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
 		      		    size_t length) {
-	return -EINVAL;
+	return -EOPNOTSUPP;
 }
 static inline unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 						   unsigned long pgsz_bitmap,
@@ -143,7 +171,21 @@ static inline unsigned long ib_umem_find_best_pgoff(struct ib_umem *umem,
 {
 	return 0;
 }
+static inline
+struct ib_umem_dmabuf *ib_umem_dmabuf_get(struct ib_device *device,
+					  unsigned long offset,
+					  size_t size, int fd,
+					  int access,
+					  const struct dma_buf_attach_ops *ops)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+static inline int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf)
+{
+	return -EOPNOTSUPP;
+}
+static inline void ib_umem_dmabuf_unmap_pages(struct ib_umem_dmabuf *umem_dmabuf) { }
+static inline void ib_umem_dmabuf_release(struct ib_umem_dmabuf *umem_dmabuf) { }
 
 #endif /* CONFIG_INFINIBAND_USER_MEM */
-
 #endif /* IB_UMEM_H */
-- 
1.8.3.1
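
The in-place trimming done by ib_umem_dmabuf_map_pages() and undone by ib_umem_dmabuf_unmap_pages() can be sketched in user space as below. This is only an illustration under stated assumptions: struct demo_seg, struct demo_trim, demo_trim_segs() and demo_untrim_segs() are hypothetical names standing in for the scatterlist entries and the first_sg/last_sg bookkeeping, and the page-aligned start/end are passed in directly instead of being derived from the umem.

```c
#include <stddef.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct demo_seg {
	unsigned long addr;	/* plays the role of sg_dma_address() */
	unsigned long len;	/* plays the role of sg_dma_len() */
};

/* Hypothetical stand-in for the bookkeeping kept in ib_umem_dmabuf. */
struct demo_trim {
	int first, last;		/* segment indices, -1 when unset */
	unsigned long first_off;	/* bytes cut from the head of 'first' */
	unsigned long last_trim;	/* bytes cut from the tail of 'last' */
	unsigned int nmap;		/* segments overlapping [start, end) */
};

/* Trim the list in place so that it covers exactly [start, end). */
static void demo_trim_segs(struct demo_seg *s, int n, unsigned long start,
			   unsigned long end, struct demo_trim *st)
{
	unsigned long cur = 0;
	int i;

	st->first = st->last = -1;
	st->first_off = st->last_trim = 0;
	st->nmap = 0;

	for (i = 0; i < n; i++) {
		if (start < cur + s[i].len && cur < end)
			st->nmap++;
		if (cur <= start && start < cur + s[i].len) {
			unsigned long off = start - cur;

			st->first = i;
			st->first_off = off;
			s[i].addr += off;
			s[i].len -= off;
			cur += off;
		}
		if (cur < end && end <= cur + s[i].len) {
			unsigned long trim = cur + s[i].len - end;

			st->last = i;
			st->last_trim = trim;
			s[i].len -= trim;
			break;
		}
		cur += s[i].len;
	}
}

/* Undo demo_trim_segs() so the exporter gets its original list back. */
static void demo_untrim_segs(struct demo_seg *s, struct demo_trim *st)
{
	if (st->first >= 0) {
		s[st->first].addr -= st->first_off;
		s[st->first].len += st->first_off;
		st->first = -1;
		st->first_off = 0;
	}
	if (st->last >= 0) {
		s[st->last].len += st->last_trim;
		st->last = -1;
		st->last_trim = 0;
	}
}
```

Recording the head offset and tail trim separately is what lets the restore step work even when the first and last segment are the same entry: the head adjustment is reverted first, then the tail length is added back.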



* [PATCH v16 2/4] RDMA/core: Add device method for registering dma-buf based memory region
  2020-12-15 21:27 ` Jianxin Xiong
@ 2020-12-15 21:27   ` Jianxin Xiong
  -1 siblings, 0 replies; 48+ messages in thread
From: Jianxin Xiong @ 2020-12-15 21:27 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Jianxin Xiong, Doug Ledford, Jason Gunthorpe, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

A dma-buf based memory region requires one extra parameter, the dma-buf
file descriptor, and is processed quite differently from a regular memory
region. Adding a separate device method keeps the two registration paths
cleanly separated.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/device.c | 1 +
 include/rdma/ib_verbs.h          | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 3ab1ede..23f7440 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2677,6 +2677,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, read_counters);
 	SET_DEVICE_OP(dev_ops, reg_dm_mr);
 	SET_DEVICE_OP(dev_ops, reg_user_mr);
+	SET_DEVICE_OP(dev_ops, reg_user_mr_dmabuf);
 	SET_DEVICE_OP(dev_ops, req_ncomp_notif);
 	SET_DEVICE_OP(dev_ops, req_notify_cq);
 	SET_DEVICE_OP(dev_ops, rereg_user_mr);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 06a5652..b2f02a7 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2,7 +2,7 @@
 /*
  * Copyright (c) 2004 Mellanox Technologies Ltd.  All rights reserved.
  * Copyright (c) 2004 Infinicon Corporation.  All rights reserved.
- * Copyright (c) 2004 Intel Corporation.  All rights reserved.
+ * Copyright (c) 2004, 2020 Intel Corporation.  All rights reserved.
  * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
  * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
  * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
@@ -2433,6 +2433,10 @@ struct ib_device_ops {
 	struct ib_mr *(*reg_user_mr)(struct ib_pd *pd, u64 start, u64 length,
 				     u64 virt_addr, int mr_access_flags,
 				     struct ib_udata *udata);
+	struct ib_mr *(*reg_user_mr_dmabuf)(struct ib_pd *pd, u64 offset,
+					    u64 length, u64 virt_addr, int fd,
+					    int mr_access_flags,
+					    struct ib_udata *udata);
 	struct ib_mr *(*rereg_user_mr)(struct ib_mr *mr, int flags, u64 start,
 				       u64 length, u64 virt_addr,
 				       int mr_access_flags, struct ib_pd *pd,
-- 
1.8.3.1
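
How an optional device method such as reg_user_mr_dmabuf is typically consumed can be sketched in user space as below. This is a simplified model, not the RDMA core code: struct demo_dev_ops, demo_set_ops(), demo_reg_dmabuf() and demo_driver_reg_dmabuf() are hypothetical names, the signatures are reduced to a few integer arguments, and the copy rule in demo_set_ops() is an assumption about how a SET_DEVICE_OP-style helper behaves (copy a method only if the driver supplies it and it is not already set).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct ib_device_ops. */
struct demo_dev_ops {
	int (*reg_user_mr)(unsigned long start, unsigned long length);
	int (*reg_user_mr_dmabuf)(unsigned long offset, unsigned long length,
				  int fd);
};

/* Copy only the methods the driver provides and the device lacks. */
static void demo_set_ops(struct demo_dev_ops *dev,
			 const struct demo_dev_ops *drv)
{
	if (drv->reg_user_mr && !dev->reg_user_mr)
		dev->reg_user_mr = drv->reg_user_mr;
	if (drv->reg_user_mr_dmabuf && !dev->reg_user_mr_dmabuf)
		dev->reg_user_mr_dmabuf = drv->reg_user_mr_dmabuf;
}

/* Caller side: a missing method means the feature is unsupported. */
static int demo_reg_dmabuf(const struct demo_dev_ops *dev,
			   unsigned long offset, unsigned long length, int fd)
{
	if (!dev->reg_user_mr_dmabuf)
		return -EOPNOTSUPP;
	return dev->reg_user_mr_dmabuf(offset, length, fd);
}

/* Toy driver method: accepts any non-negative file descriptor. */
static int demo_driver_reg_dmabuf(unsigned long offset, unsigned long length,
				  int fd)
{
	(void)offset;
	(void)length;
	return fd >= 0 ? 0 : -EBADF;
}
```

Keeping the dma-buf path behind its own function pointer means drivers that never set it need no changes, and callers can detect the lack of support with a single NULL check.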



* [PATCH v16 3/4] RDMA/uverbs: Add uverbs command for dma-buf based MR registration
  2020-12-15 21:27 ` Jianxin Xiong
@ 2020-12-15 21:27   ` Jianxin Xiong
  -1 siblings, 0 replies; 48+ messages in thread
From: Jianxin Xiong @ 2020-12-15 21:27 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Jianxin Xiong, Doug Ledford, Jason Gunthorpe, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

Implement a new uverbs ioctl method for memory registration that takes a
file descriptor as an extra parameter.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/uverbs_std_types_mr.c | 117 +++++++++++++++++++++++++-
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  14 +++
 2 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
index dd4e76b..f782d5e 100644
--- a/drivers/infiniband/core/uverbs_std_types_mr.c
+++ b/drivers/infiniband/core/uverbs_std_types_mr.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
+ * Copyright (c) 2020, Intel Corporation.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -182,6 +183,86 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
 	return IS_UVERBS_COPY_ERR(ret) ? ret : 0;
 }
 
+static int UVERBS_HANDLER(UVERBS_METHOD_REG_DMABUF_MR)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct ib_uobject *uobj =
+		uverbs_attr_get_uobject(attrs, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
+	struct ib_pd *pd =
+		uverbs_attr_get_obj(attrs, UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE);
+	struct ib_device *ib_dev = pd->device;
+
+	u64 offset, length, iova;
+	u32 fd, access_flags;
+	struct ib_mr *mr;
+	int ret;
+
+	if (!ib_dev->ops.reg_user_mr_dmabuf)
+		return -EOPNOTSUPP;
+
+	ret = uverbs_copy_from(&offset, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_OFFSET);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&length, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_LENGTH);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&iova, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_IOVA);
+	if (ret)
+		return ret;
+
+	if ((offset & ~PAGE_MASK) != (iova & ~PAGE_MASK))
+		return -EINVAL;
+
+	ret = uverbs_copy_from(&fd, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_FD);
+	if (ret)
+		return ret;
+
+	ret = uverbs_get_flags32(&access_flags, attrs,
+				 UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+				 IB_ACCESS_LOCAL_WRITE |
+				 IB_ACCESS_REMOTE_READ |
+				 IB_ACCESS_REMOTE_WRITE |
+				 IB_ACCESS_REMOTE_ATOMIC |
+				 IB_ACCESS_RELAXED_ORDERING);
+	if (ret)
+		return ret;
+
+	ret = ib_check_mr_access(ib_dev, access_flags);
+	if (ret)
+		return ret;
+
+	mr = pd->device->ops.reg_user_mr_dmabuf(pd, offset, length, iova, fd,
+						access_flags,
+						&attrs->driver_udata);
+	if (IS_ERR(mr))
+		return PTR_ERR(mr);
+
+	mr->device = pd->device;
+	mr->pd = pd;
+	mr->type = IB_MR_TYPE_USER;
+	mr->uobject = uobj;
+	atomic_inc(&pd->usecnt);
+
+	uobj->object = mr;
+
+	uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
+
+	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+			     &mr->lkey, sizeof(mr->lkey));
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+			     &mr->rkey, sizeof(mr->rkey));
+	return ret;
+}
+
 DECLARE_UVERBS_NAMED_METHOD(
 	UVERBS_METHOD_ADVISE_MR,
 	UVERBS_ATTR_IDR(UVERBS_ATTR_ADVISE_MR_PD_HANDLE,
@@ -247,6 +328,37 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
 			    UVERBS_ATTR_TYPE(u32),
 			    UA_MANDATORY));
 
+DECLARE_UVERBS_NAMED_METHOD(
+	UVERBS_METHOD_REG_DMABUF_MR,
+	UVERBS_ATTR_IDR(UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
+			UVERBS_OBJECT_MR,
+			UVERBS_ACCESS_NEW,
+			UA_MANDATORY),
+	UVERBS_ATTR_IDR(UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE,
+			UVERBS_OBJECT_PD,
+			UVERBS_ACCESS_READ,
+			UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_OFFSET,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_LENGTH,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_IOVA,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_FD,
+			   UVERBS_ATTR_TYPE(u32),
+			   UA_MANDATORY),
+	UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+			     enum ib_access_flags),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+			    UVERBS_ATTR_TYPE(u32),
+			    UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+			    UVERBS_ATTR_TYPE(u32),
+			    UA_MANDATORY));
+
 DECLARE_UVERBS_NAMED_METHOD_DESTROY(
 	UVERBS_METHOD_MR_DESTROY,
 	UVERBS_ATTR_IDR(UVERBS_ATTR_DESTROY_MR_HANDLE,
@@ -257,10 +369,11 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
 DECLARE_UVERBS_NAMED_OBJECT(
 	UVERBS_OBJECT_MR,
 	UVERBS_TYPE_ALLOC_IDR(uverbs_free_mr),
+	&UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),
 	&UVERBS_METHOD(UVERBS_METHOD_DM_MR_REG),
 	&UVERBS_METHOD(UVERBS_METHOD_MR_DESTROY),
-	&UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),
-	&UVERBS_METHOD(UVERBS_METHOD_QUERY_MR));
+	&UVERBS_METHOD(UVERBS_METHOD_QUERY_MR),
+	&UVERBS_METHOD(UVERBS_METHOD_REG_DMABUF_MR));
 
 const struct uapi_definition uverbs_def_obj_mr[] = {
 	UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_MR,
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 7968a18..dafc7eb 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -251,6 +252,7 @@ enum uverbs_methods_mr {
 	UVERBS_METHOD_MR_DESTROY,
 	UVERBS_METHOD_ADVISE_MR,
 	UVERBS_METHOD_QUERY_MR,
+	UVERBS_METHOD_REG_DMABUF_MR,
 };
 
 enum uverbs_attrs_mr_destroy_ids {
@@ -272,6 +274,18 @@ enum uverbs_attrs_query_mr_cmd_attr_ids {
 	UVERBS_ATTR_QUERY_MR_RESP_IOVA,
 };
 
+enum uverbs_attrs_reg_dmabuf_mr_cmd_attr_ids {
+	UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
+	UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE,
+	UVERBS_ATTR_REG_DMABUF_MR_OFFSET,
+	UVERBS_ATTR_REG_DMABUF_MR_LENGTH,
+	UVERBS_ATTR_REG_DMABUF_MR_IOVA,
+	UVERBS_ATTR_REG_DMABUF_MR_FD,
+	UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+	UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+	UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+};
+
 enum uverbs_attrs_create_counters_cmd_attr_ids {
 	UVERBS_ATTR_CREATE_COUNTERS_HANDLE,
 };
-- 
1.8.3.1
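
The alignment test that the handler applies before calling into the driver, (offset & ~PAGE_MASK) != (iova & ~PAGE_MASK), can be tried out in user space with the sketch below. It is an illustration only: DEMO_PAGE_SIZE, DEMO_PAGE_MASK and demo_check_iova() are hypothetical names, and a 4 KiB page size is assumed where the kernel would use its own PAGE_SIZE.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed 4 KiB page; the kernel uses the architecture's PAGE_SIZE. */
#define DEMO_PAGE_SIZE 4096ULL
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/*
 * Accept the pair only if the buffer offset and the I/O virtual address
 * sit at the same position within their respective pages.
 */
static int demo_check_iova(uint64_t offset, uint64_t iova)
{
	if ((offset & ~DEMO_PAGE_MASK) != (iova & ~DEMO_PAGE_MASK))
		return -EINVAL;
	return 0;
}
```

For example, offset 0x1234 with iova 0x7f0000001234 passes because both have the in-page offset 0x234, while offset 0x1234 with iova 0x2000 is rejected.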



* [PATCH v16 3/4] RDMA/uverbs: Add uverbs command for dma-buf based MR registration
@ 2020-12-15 21:27   ` Jianxin Xiong
  0 siblings, 0 replies; 48+ messages in thread
From: Jianxin Xiong @ 2020-12-15 21:27 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Leon Romanovsky, Jason Gunthorpe, Doug Ledford, Daniel Vetter,
	Christian Koenig, Jianxin Xiong

Implement a new uverbs ioctl method for memory registration that takes a
file descriptor as an extra parameter.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/uverbs_std_types_mr.c | 117 +++++++++++++++++++++++++-
 include/uapi/rdma/ib_user_ioctl_cmds.h        |  14 +++
 2 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_std_types_mr.c b/drivers/infiniband/core/uverbs_std_types_mr.c
index dd4e76b..f782d5e 100644
--- a/drivers/infiniband/core/uverbs_std_types_mr.c
+++ b/drivers/infiniband/core/uverbs_std_types_mr.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
+ * Copyright (c) 2020, Intel Corporation.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -182,6 +183,86 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
 	return IS_UVERBS_COPY_ERR(ret) ? ret : 0;
 }
 
+static int UVERBS_HANDLER(UVERBS_METHOD_REG_DMABUF_MR)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct ib_uobject *uobj =
+		uverbs_attr_get_uobject(attrs, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
+	struct ib_pd *pd =
+		uverbs_attr_get_obj(attrs, UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE);
+	struct ib_device *ib_dev = pd->device;
+
+	u64 offset, length, iova;
+	u32 fd, access_flags;
+	struct ib_mr *mr;
+	int ret;
+
+	if (!ib_dev->ops.reg_user_mr_dmabuf)
+		return -EOPNOTSUPP;
+
+	ret = uverbs_copy_from(&offset, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_OFFSET);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&length, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_LENGTH);
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_from(&iova, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_IOVA);
+	if (ret)
+		return ret;
+
+	if ((offset & ~PAGE_MASK) != (iova & ~PAGE_MASK))
+		return -EINVAL;
+
+	ret = uverbs_copy_from(&fd, attrs,
+			       UVERBS_ATTR_REG_DMABUF_MR_FD);
+	if (ret)
+		return ret;
+
+	ret = uverbs_get_flags32(&access_flags, attrs,
+				 UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+				 IB_ACCESS_LOCAL_WRITE |
+				 IB_ACCESS_REMOTE_READ |
+				 IB_ACCESS_REMOTE_WRITE |
+				 IB_ACCESS_REMOTE_ATOMIC |
+				 IB_ACCESS_RELAXED_ORDERING);
+	if (ret)
+		return ret;
+
+	ret = ib_check_mr_access(ib_dev, access_flags);
+	if (ret)
+		return ret;
+
+	mr = pd->device->ops.reg_user_mr_dmabuf(pd, offset, length, iova, fd,
+						access_flags,
+						&attrs->driver_udata);
+	if (IS_ERR(mr))
+		return PTR_ERR(mr);
+
+	mr->device = pd->device;
+	mr->pd = pd;
+	mr->type = IB_MR_TYPE_USER;
+	mr->uobject = uobj;
+	atomic_inc(&pd->usecnt);
+
+	uobj->object = mr;
+
+	uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
+
+	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+			     &mr->lkey, sizeof(mr->lkey));
+	if (ret)
+		return ret;
+
+	ret = uverbs_copy_to(attrs, UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+			     &mr->rkey, sizeof(mr->rkey));
+	return ret;
+}
+
 DECLARE_UVERBS_NAMED_METHOD(
 	UVERBS_METHOD_ADVISE_MR,
 	UVERBS_ATTR_IDR(UVERBS_ATTR_ADVISE_MR_PD_HANDLE,
@@ -247,6 +328,37 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
 			    UVERBS_ATTR_TYPE(u32),
 			    UA_MANDATORY));
 
+DECLARE_UVERBS_NAMED_METHOD(
+	UVERBS_METHOD_REG_DMABUF_MR,
+	UVERBS_ATTR_IDR(UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
+			UVERBS_OBJECT_MR,
+			UVERBS_ACCESS_NEW,
+			UA_MANDATORY),
+	UVERBS_ATTR_IDR(UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE,
+			UVERBS_OBJECT_PD,
+			UVERBS_ACCESS_READ,
+			UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_OFFSET,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_LENGTH,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_IOVA,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_REG_DMABUF_MR_FD,
+			   UVERBS_ATTR_TYPE(u32),
+			   UA_MANDATORY),
+	UVERBS_ATTR_FLAGS_IN(UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+			     enum ib_access_flags),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+			    UVERBS_ATTR_TYPE(u32),
+			    UA_MANDATORY),
+	UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+			    UVERBS_ATTR_TYPE(u32),
+			    UA_MANDATORY));
+
 DECLARE_UVERBS_NAMED_METHOD_DESTROY(
 	UVERBS_METHOD_MR_DESTROY,
 	UVERBS_ATTR_IDR(UVERBS_ATTR_DESTROY_MR_HANDLE,
@@ -257,10 +369,11 @@ static int UVERBS_HANDLER(UVERBS_METHOD_QUERY_MR)(
 DECLARE_UVERBS_NAMED_OBJECT(
 	UVERBS_OBJECT_MR,
 	UVERBS_TYPE_ALLOC_IDR(uverbs_free_mr),
+	&UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),
 	&UVERBS_METHOD(UVERBS_METHOD_DM_MR_REG),
 	&UVERBS_METHOD(UVERBS_METHOD_MR_DESTROY),
-	&UVERBS_METHOD(UVERBS_METHOD_ADVISE_MR),
-	&UVERBS_METHOD(UVERBS_METHOD_QUERY_MR));
+	&UVERBS_METHOD(UVERBS_METHOD_QUERY_MR),
+	&UVERBS_METHOD(UVERBS_METHOD_REG_DMABUF_MR));
 
 const struct uapi_definition uverbs_def_obj_mr[] = {
 	UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_MR,
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 7968a18..dafc7eb 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -251,6 +252,7 @@ enum uverbs_methods_mr {
 	UVERBS_METHOD_MR_DESTROY,
 	UVERBS_METHOD_ADVISE_MR,
 	UVERBS_METHOD_QUERY_MR,
+	UVERBS_METHOD_REG_DMABUF_MR,
 };
 
 enum uverbs_attrs_mr_destroy_ids {
@@ -272,6 +274,18 @@ enum uverbs_attrs_query_mr_cmd_attr_ids {
 	UVERBS_ATTR_QUERY_MR_RESP_IOVA,
 };
 
+enum uverbs_attrs_reg_dmabuf_mr_cmd_attr_ids {
+	UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
+	UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE,
+	UVERBS_ATTR_REG_DMABUF_MR_OFFSET,
+	UVERBS_ATTR_REG_DMABUF_MR_LENGTH,
+	UVERBS_ATTR_REG_DMABUF_MR_IOVA,
+	UVERBS_ATTR_REG_DMABUF_MR_FD,
+	UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+	UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+	UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+};
+
 enum uverbs_attrs_create_counters_cmd_attr_ids {
 	UVERBS_ATTR_CREATE_COUNTERS_HANDLE,
 };
-- 
1.8.3.1
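
A note on the alignment test in UVERBS_METHOD_REG_DMABUF_MR above: the handler rejects a request unless the dma-buf offset and the iova have the same offset within a page. The following is a minimal userspace sketch of that arithmetic, not kernel code. The names sketch_iova_offset_compatible, SKETCH_PAGE_SIZE and SKETCH_PAGE_MASK are hypothetical, and a 4 KiB page size is assumed (the kernel uses the architecture's PAGE_SIZE).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel uses the architecture's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Hypothetical helper mirroring the check in the handler: the dma-buf
 * offset and the iova must share the same offset within a page.
 */
static bool sketch_iova_offset_compatible(uint64_t offset, uint64_t iova)
{
	return (offset & ~SKETCH_PAGE_MASK) == (iova & ~SKETCH_PAGE_MASK);
}
```

With this check done at the uverbs entry point, a driver can fall back to PAGE_SIZE as the page size for the region without re-validating the iova, which is what the v15 changelog entry describes.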

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v16 4/4] RDMA/mlx5: Support dma-buf based userspace memory region
  2020-12-15 21:27 ` Jianxin Xiong
@ 2020-12-15 21:27   ` Jianxin Xiong
  -1 siblings, 0 replies; 48+ messages in thread
From: Jianxin Xiong @ 2020-12-15 21:27 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Jianxin Xiong, Doug Ledford, Jason Gunthorpe, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

Implement the new driver method 'reg_user_mr_dmabuf'.  Use the core
functions to import dma-buf based memory regions and update the mappings.

Add code to handle dma-buf related page faults.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
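
The mapping lifecycle described above (map and publish translations on a page fault, zap and unmap on move_notify) can be illustrated with a toy userspace model. This is only a sketch under stated assumptions: struct sketch_mr, sketch_fault() and sketch_invalidate() are hypothetical names, SKETCH_MTT_PRESENT stands in for MLX5_IB_MTT_PRESENT, pages are assumed to be 4 KiB, and the dma_resv locking that the driver takes around these steps is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NPAGES 4
#define SKETCH_MTT_PRESENT 1ULL /* stand-in for MLX5_IB_MTT_PRESENT */

/* Toy stand-in for a dma-buf MR; not a kernel structure. */
struct sketch_mr {
	bool mapped;                 /* plays the role of umem_dmabuf->sgt != NULL */
	uint64_t mtt[SKETCH_NPAGES]; /* translation entries seen by the device */
};

/* Page-fault path: map the pages, then publish present entries. */
static int sketch_fault(struct sketch_mr *mr, uint64_t base)
{
	size_t i;

	mr->mapped = true;
	for (i = 0; i < SKETCH_NPAGES; i++)
		mr->mtt[i] = (base + i * 4096ULL) | SKETCH_MTT_PRESENT;
	return SKETCH_NPAGES;
}

/* move_notify path: zap the entries first, then drop the mapping. */
static void sketch_invalidate(struct sketch_mr *mr)
{
	size_t i;

	if (!mr->mapped)
		return; /* nothing mapped, nothing to zap */
	for (i = 0; i < SKETCH_NPAGES; i++)
		mr->mtt[i] = 0;
	mr->mapped = false;
}
```

In this model a second sketch_fault() after sketch_invalidate() repopulates the entries, which corresponds to the device faulting the pages back in after the exporter has moved the buffer.
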
 drivers/infiniband/hw/mlx5/main.c    |   2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  18 ++++++
 drivers/infiniband/hw/mlx5/mr.c      | 112 ++++++++++++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx5/odp.c     |  89 ++++++++++++++++++++++++++--
 4 files changed, 214 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 4a054eb..c025746 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 /*
  * Copyright (c) 2013-2020, Mellanox Technologies inc. All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  */
 
 #include <linux/debugfs.h>
@@ -4069,6 +4070,7 @@ static int mlx5_ib_enable_driver(struct ib_device *dev)
 	.query_srq = mlx5_ib_query_srq,
 	.query_ucontext = mlx5_ib_query_ucontext,
 	.reg_user_mr = mlx5_ib_reg_user_mr,
+	.reg_user_mr_dmabuf = mlx5_ib_reg_user_mr_dmabuf,
 	.req_notify_cq = mlx5_ib_arm_cq,
 	.rereg_user_mr = mlx5_ib_rereg_user_mr,
 	.resize_cq = mlx5_ib_resize_cq,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index c33d6fd..bddf252 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
 /*
  * Copyright (c) 2013-2020, Mellanox Technologies inc. All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  */
 
 #ifndef MLX5_IB_H
@@ -703,6 +704,12 @@ static inline bool is_odp_mr(struct mlx5_ib_mr *mr)
 	       mr->umem->is_odp;
 }
 
+static inline bool is_dmabuf_mr(struct mlx5_ib_mr *mr)
+{
+	return IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING) && mr->umem &&
+	       mr->umem->is_dmabuf;
+}
+
 struct mlx5_ib_mw {
 	struct ib_mw		ibmw;
 	struct mlx5_core_mkey	mmkey;
@@ -1243,6 +1250,10 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  u64 virt_addr, int access_flags,
 				  struct ib_udata *udata);
+struct ib_mr *mlx5_ib_reg_user_mr_dmabuf(struct ib_pd *pd, u64 start,
+					 u64 length, u64 virt_addr,
+					 int fd, int access_flags,
+					 struct ib_udata *udata);
 int mlx5_ib_advise_mr(struct ib_pd *pd,
 		      enum ib_uverbs_advise_mr_advice advice,
 		      u32 flags,
@@ -1253,11 +1264,13 @@ int mlx5_ib_advise_mr(struct ib_pd *pd,
 int mlx5_ib_dealloc_mw(struct ib_mw *mw);
 int mlx5_ib_update_xlt(struct mlx5_ib_mr *mr, u64 idx, int npages,
 		       int page_shift, int flags);
+int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags);
 struct mlx5_ib_mr *mlx5_ib_alloc_implicit_mr(struct mlx5_ib_pd *pd,
 					     struct ib_udata *udata,
 					     int access_flags);
 void mlx5_ib_free_implicit_mr(struct mlx5_ib_mr *mr);
 void mlx5_ib_fence_odp_mr(struct mlx5_ib_mr *mr);
+void mlx5_ib_fence_dmabuf_mr(struct mlx5_ib_mr *mr);
 struct ib_mr *mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 				    u64 length, u64 virt_addr, int access_flags,
 				    struct ib_pd *pd, struct ib_udata *udata);
@@ -1345,6 +1358,7 @@ int mlx5_ib_advise_mr_prefetch(struct ib_pd *pd,
 			       enum ib_uverbs_advise_mr_advice advice,
 			       u32 flags, struct ib_sge *sg_list, u32 num_sge);
 int mlx5_ib_init_odp_mr(struct mlx5_ib_mr *mr);
+int mlx5_ib_init_dmabuf_mr(struct mlx5_ib_mr *mr);
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static inline void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev *dev)
 {
@@ -1370,6 +1384,10 @@ static inline int mlx5_ib_init_odp_mr(struct mlx5_ib_mr *mr)
 {
 	return -EOPNOTSUPP;
 }
+static inline int mlx5_ib_init_dmabuf_mr(struct mlx5_ib_mr *mr)
+{
+	return -EOPNOTSUPP;
+}
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
 extern const struct mmu_interval_notifier_ops mlx5_mn_ops;
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 6fa869c..6b9c4dc 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2013-2015, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -36,6 +37,8 @@
 #include <linux/debugfs.h>
 #include <linux/export.h>
 #include <linux/delay.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>
 #include <rdma/ib_umem.h>
 #include <rdma/ib_umem_odp.h>
 #include <rdma/ib_verbs.h>
@@ -934,6 +937,17 @@ static void set_mr_fields(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr,
 	mr->access_flags = access_flags;
 }
 
+static unsigned int mlx5_umem_dmabuf_default_pgsz(struct ib_umem *umem,
+						  u64 iova)
+{
+	/*
+	 * The alignment of iova has already been checked upon entering
+	 * UVERBS_METHOD_REG_DMABUF_MR
+	 */
+	umem->iova = iova;
+	return PAGE_SIZE;
+}
+
 static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd,
 					     struct ib_umem *umem, u64 iova,
 					     int access_flags)
@@ -943,7 +957,11 @@ static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd,
 	struct mlx5_ib_mr *mr;
 	unsigned int page_size;
 
-	page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
+	if (umem->is_dmabuf)
+		page_size = mlx5_umem_dmabuf_default_pgsz(umem, iova);
+	else
+		page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size,
+						     0, iova);
 	if (WARN_ON(!page_size))
 		return ERR_PTR(-EINVAL);
 	ent = mr_cache_ent_from_order(
@@ -979,7 +997,6 @@ static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd,
 	mr->mmkey.size = umem->length;
 	mr->mmkey.pd = to_mpd(pd)->pdn;
 	mr->page_shift = order_base_2(page_size);
-	mr->umem = umem;
 	set_mr_fields(dev, mr, umem->length, access_flags);
 
 	return mr;
@@ -1200,8 +1217,10 @@ int mlx5_ib_update_xlt(struct mlx5_ib_mr *mr, u64 idx, int npages,
 
 /*
  * Send the DMA list to the HW for a normal MR using UMR.
+ * Dmabuf MR is handled in a similar way, except that the MLX5_IB_UPD_XLT_ZAP
+ * flag may be used.
  */
-static int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags)
+int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags)
 {
 	struct mlx5_ib_dev *dev = mr_to_mdev(mr);
 	struct device *ddev = &dev->mdev->pdev->dev;
@@ -1243,6 +1262,10 @@ static int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags)
 		cur_mtt->ptag =
 			cpu_to_be64(rdma_block_iter_dma_address(&biter) |
 				    MLX5_IB_MTT_PRESENT);
+
+		if (mr->umem->is_dmabuf && (flags & MLX5_IB_UPD_XLT_ZAP))
+			cur_mtt->ptag = 0;
+
 		cur_mtt++;
 	}
 
@@ -1566,6 +1589,84 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	return create_real_mr(pd, umem, iova, access_flags);
 }
 
+static void mlx5_ib_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
+	struct mlx5_ib_mr *mr = umem_dmabuf->private;
+
+	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
+
+	if (!umem_dmabuf->sgt)
+		return;
+
+	mlx5_ib_update_mr_pas(mr, MLX5_IB_UPD_XLT_ZAP);
+	ib_umem_dmabuf_unmap_pages(umem_dmabuf);
+}
+
+static struct dma_buf_attach_ops mlx5_ib_dmabuf_attach_ops = {
+	.allow_peer2peer = 1,
+	.move_notify = mlx5_ib_dmabuf_invalidate_cb,
+};
+
+struct ib_mr *mlx5_ib_reg_user_mr_dmabuf(struct ib_pd *pd, u64 offset,
+					 u64 length, u64 virt_addr,
+					 int fd, int access_flags,
+					 struct ib_udata *udata)
+{
+	struct mlx5_ib_dev *dev = to_mdev(pd->device);
+	struct mlx5_ib_mr *mr = NULL;
+	struct ib_umem_dmabuf *umem_dmabuf;
+	int err;
+
+	if (!IS_ENABLED(CONFIG_INFINIBAND_USER_MEM) ||
+	    !IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	mlx5_ib_dbg(dev,
+		    "offset 0x%llx, virt_addr 0x%llx, length 0x%llx, fd %d, access_flags 0x%x\n",
+		    offset, virt_addr, length, fd, access_flags);
+
+	/* dmabuf requires xlt update via umr to work. */
+	if (!mlx5_ib_can_load_pas_with_umr(dev, length))
+		return ERR_PTR(-EINVAL);
+
+	umem_dmabuf = ib_umem_dmabuf_get(&dev->ib_dev, offset, length, fd,
+					 access_flags,
+					 &mlx5_ib_dmabuf_attach_ops);
+	if (IS_ERR(umem_dmabuf)) {
+		mlx5_ib_dbg(dev, "umem_dmabuf get failed (%ld)\n",
+			    PTR_ERR(umem_dmabuf));
+		return ERR_CAST(umem_dmabuf);
+	}
+
+	mr = alloc_cacheable_mr(pd, &umem_dmabuf->umem, virt_addr,
+				access_flags);
+	if (IS_ERR(mr)) {
+		ib_umem_release(&umem_dmabuf->umem);
+		return ERR_CAST(mr);
+	}
+
+	mlx5_ib_dbg(dev, "mkey 0x%x\n", mr->mmkey.key);
+
+	atomic_add(ib_umem_num_pages(mr->umem), &dev->mdev->priv.reg_pages);
+	umem_dmabuf->private = mr;
+	init_waitqueue_head(&mr->q_deferred_work);
+	atomic_set(&mr->num_deferred_work, 0);
+	err = xa_err(xa_store(&dev->odp_mkeys, mlx5_base_mkey(mr->mmkey.key),
+			      &mr->mmkey, GFP_KERNEL));
+	if (err)
+		goto err_dereg_mr;
+
+	err = mlx5_ib_init_dmabuf_mr(mr);
+	if (err)
+		goto err_dereg_mr;
+	return &mr->ibmr;
+
+err_dereg_mr:
+	dereg_mr(dev, mr);
+	return ERR_PTR(err);
+}
+
 /**
  * mlx5_mr_cache_invalidate - Fence all DMA on the MR
  * @mr: The MR to fence
@@ -1723,6 +1824,9 @@ struct ib_mr *mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 	if (flags & ~(IB_MR_REREG_TRANS | IB_MR_REREG_PD | IB_MR_REREG_ACCESS))
 		return ERR_PTR(-EOPNOTSUPP);
 
+	if (is_dmabuf_mr(mr))
+		return ERR_PTR(-EOPNOTSUPP);
+
 	if (!(flags & IB_MR_REREG_ACCESS))
 		new_access_flags = mr->access_flags;
 	if (!(flags & IB_MR_REREG_PD))
@@ -1875,6 +1979,8 @@ static void dereg_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr)
 	/* Stop all DMA */
 	if (is_odp_mr(mr))
 		mlx5_ib_fence_odp_mr(mr);
+	else if (is_dmabuf_mr(mr))
+		mlx5_ib_fence_dmabuf_mr(mr);
 	else
 		clean_mr(dev, mr);
 
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index aa2413b..440fbf7 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -33,6 +33,8 @@
 #include <rdma/ib_umem.h>
 #include <rdma/ib_umem_odp.h>
 #include <linux/kernel.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>
 
 #include "mlx5_ib.h"
 #include "cmd.h"
@@ -670,6 +672,37 @@ void mlx5_ib_fence_odp_mr(struct mlx5_ib_mr *mr)
 	dma_fence_odp_mr(mr);
 }
 
+/**
+ * mlx5_ib_fence_dmabuf_mr - Stop all access to the dmabuf MR
+ * @mr: The MR to fence
+ *
+ * On return no parallel threads will be touching this MR and no DMA will be
+ * active.
+ */
+void mlx5_ib_fence_dmabuf_mr(struct mlx5_ib_mr *mr)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
+
+	/* Prevent new page faults and prefetch requests from succeeding */
+	xa_erase(&mr_to_mdev(mr)->odp_mkeys, mlx5_base_mkey(mr->mmkey.key));
+
+	/* Wait for all running page-fault handlers to finish. */
+	synchronize_srcu(&mr_to_mdev(mr)->odp_srcu);
+
+	wait_event(mr->q_deferred_work, !atomic_read(&mr->num_deferred_work));
+
+	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
+	mlx5_mr_cache_invalidate(mr);
+	umem_dmabuf->private = NULL;
+	ib_umem_dmabuf_unmap_pages(umem_dmabuf);
+	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
+
+	if (!mr->cache_ent) {
+		mlx5_core_destroy_mkey(mr_to_mdev(mr)->mdev, &mr->mmkey);
+		WARN_ON(mr->descs);
+	}
+}
+
 #define MLX5_PF_FLAGS_DOWNGRADE BIT(1)
 #define MLX5_PF_FLAGS_SNAPSHOT BIT(2)
 #define MLX5_PF_FLAGS_ENABLE BIT(3)
@@ -803,6 +836,44 @@ static int pagefault_implicit_mr(struct mlx5_ib_mr *imr,
 	return ret;
 }
 
+static int pagefault_dmabuf_mr(struct mlx5_ib_mr *mr, size_t bcnt,
+			       u32 *bytes_mapped, u32 flags)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
+	u32 xlt_flags = 0;
+	int err;
+	unsigned int page_size;
+
+	if (flags & MLX5_PF_FLAGS_ENABLE)
+		xlt_flags |= MLX5_IB_UPD_XLT_ENABLE;
+
+	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
+	err = ib_umem_dmabuf_map_pages(umem_dmabuf);
+	if (err) {
+		dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
+		return err;
+	}
+
+	page_size = mlx5_umem_find_best_pgsz(&umem_dmabuf->umem, mkc,
+					     log_page_size, 0,
+					     umem_dmabuf->umem.iova);
+	if (unlikely(page_size < PAGE_SIZE)) {
+		ib_umem_dmabuf_unmap_pages(umem_dmabuf);
+		err = -EINVAL;
+	} else {
+		err = mlx5_ib_update_mr_pas(mr, xlt_flags);
+	}
+	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
+
+	if (err)
+		return err;
+
+	if (bytes_mapped)
+		*bytes_mapped += bcnt;
+
+	return ib_umem_num_pages(mr->umem);
+}
+
 /*
  * Returns:
  *  -EFAULT: The io_virt->bcnt is not within the MR, it covers pages that are
@@ -821,6 +892,9 @@ static int pagefault_mr(struct mlx5_ib_mr *mr, u64 io_virt, size_t bcnt,
 	if (unlikely(io_virt < mr->mmkey.iova))
 		return -EFAULT;
 
+	if (mr->umem->is_dmabuf)
+		return pagefault_dmabuf_mr(mr, bcnt, bytes_mapped, flags);
+
 	if (!odp->is_implicit_odp) {
 		u64 user_va;
 
@@ -847,6 +921,16 @@ int mlx5_ib_init_odp_mr(struct mlx5_ib_mr *mr)
 	return ret >= 0 ? 0 : ret;
 }
 
+int mlx5_ib_init_dmabuf_mr(struct mlx5_ib_mr *mr)
+{
+	int ret;
+
+	ret = pagefault_dmabuf_mr(mr, mr->umem->length, NULL,
+				  MLX5_PF_FLAGS_ENABLE);
+
+	return ret >= 0 ? 0 : ret;
+}
+
 struct pf_frame {
 	struct pf_frame *next;
 	u32 key;
@@ -1749,7 +1833,6 @@ static void destroy_prefetch_work(struct prefetch_mr_work *work)
 {
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
 	struct mlx5_core_mkey *mmkey;
-	struct ib_umem_odp *odp;
 	struct mlx5_ib_mr *mr;
 
 	lockdep_assert_held(&dev->odp_srcu);
@@ -1763,11 +1846,9 @@ static void destroy_prefetch_work(struct prefetch_mr_work *work)
 	if (mr->ibmr.pd != pd)
 		return NULL;
 
-	odp = to_ib_umem_odp(mr->umem);
-
 	/* prefetch with write-access must be supported by the MR */
 	if (advice == IB_UVERBS_ADVISE_MR_ADVICE_PREFETCH_WRITE &&
-	    !odp->umem.writable)
+	    !mr->umem->writable)
 		return NULL;
 
 	return mr;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v16 4/4] RDMA/mlx5: Support dma-buf based userspace memory region
@ 2020-12-15 21:27   ` Jianxin Xiong
  0 siblings, 0 replies; 48+ messages in thread
From: Jianxin Xiong @ 2020-12-15 21:27 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Leon Romanovsky, Jason Gunthorpe, Doug Ledford, Daniel Vetter,
	Christian Koenig, Jianxin Xiong

Implement the new driver method 'reg_user_mr_dmabuf'.  Utilize the core
functions to import dma-buf based memory region and update the mappings.

Add code to handle dma-buf related page fault.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Acked-by: Christian Koenig <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/infiniband/hw/mlx5/main.c    |   2 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  18 ++++++
 drivers/infiniband/hw/mlx5/mr.c      | 112 ++++++++++++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx5/odp.c     |  89 ++++++++++++++++++++++++++--
 4 files changed, 214 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 4a054eb..c025746 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 /*
  * Copyright (c) 2013-2020, Mellanox Technologies inc. All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  */
 
 #include <linux/debugfs.h>
@@ -4069,6 +4070,7 @@ static int mlx5_ib_enable_driver(struct ib_device *dev)
 	.query_srq = mlx5_ib_query_srq,
 	.query_ucontext = mlx5_ib_query_ucontext,
 	.reg_user_mr = mlx5_ib_reg_user_mr,
+	.reg_user_mr_dmabuf = mlx5_ib_reg_user_mr_dmabuf,
 	.req_notify_cq = mlx5_ib_arm_cq,
 	.rereg_user_mr = mlx5_ib_rereg_user_mr,
 	.resize_cq = mlx5_ib_resize_cq,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index c33d6fd..bddf252 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
 /*
  * Copyright (c) 2013-2020, Mellanox Technologies inc. All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  */
 
 #ifndef MLX5_IB_H
@@ -703,6 +704,12 @@ static inline bool is_odp_mr(struct mlx5_ib_mr *mr)
 	       mr->umem->is_odp;
 }
 
+static inline bool is_dmabuf_mr(struct mlx5_ib_mr *mr)
+{
+	return IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING) && mr->umem &&
+	       mr->umem->is_dmabuf;
+}
+
 struct mlx5_ib_mw {
 	struct ib_mw		ibmw;
 	struct mlx5_core_mkey	mmkey;
@@ -1243,6 +1250,10 @@ int mlx5_ib_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 				  u64 virt_addr, int access_flags,
 				  struct ib_udata *udata);
+struct ib_mr *mlx5_ib_reg_user_mr_dmabuf(struct ib_pd *pd, u64 start,
+					 u64 length, u64 virt_addr,
+					 int fd, int access_flags,
+					 struct ib_udata *udata);
 int mlx5_ib_advise_mr(struct ib_pd *pd,
 		      enum ib_uverbs_advise_mr_advice advice,
 		      u32 flags,
@@ -1253,11 +1264,13 @@ int mlx5_ib_advise_mr(struct ib_pd *pd,
 int mlx5_ib_dealloc_mw(struct ib_mw *mw);
 int mlx5_ib_update_xlt(struct mlx5_ib_mr *mr, u64 idx, int npages,
 		       int page_shift, int flags);
+int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags);
 struct mlx5_ib_mr *mlx5_ib_alloc_implicit_mr(struct mlx5_ib_pd *pd,
 					     struct ib_udata *udata,
 					     int access_flags);
 void mlx5_ib_free_implicit_mr(struct mlx5_ib_mr *mr);
 void mlx5_ib_fence_odp_mr(struct mlx5_ib_mr *mr);
+void mlx5_ib_fence_dmabuf_mr(struct mlx5_ib_mr *mr);
 struct ib_mr *mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 				    u64 length, u64 virt_addr, int access_flags,
 				    struct ib_pd *pd, struct ib_udata *udata);
@@ -1345,6 +1358,7 @@ int mlx5_ib_advise_mr_prefetch(struct ib_pd *pd,
 			       enum ib_uverbs_advise_mr_advice advice,
 			       u32 flags, struct ib_sge *sg_list, u32 num_sge);
 int mlx5_ib_init_odp_mr(struct mlx5_ib_mr *mr);
+int mlx5_ib_init_dmabuf_mr(struct mlx5_ib_mr *mr);
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static inline void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev *dev)
 {
@@ -1370,6 +1384,10 @@ static inline int mlx5_ib_init_odp_mr(struct mlx5_ib_mr *mr)
 {
 	return -EOPNOTSUPP;
 }
+static inline int mlx5_ib_init_dmabuf_mr(struct mlx5_ib_mr *mr)
+{
+	return -EOPNOTSUPP;
+}
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
 extern const struct mmu_interval_notifier_ops mlx5_mn_ops;
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 6fa869c..6b9c4dc 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2013-2015, Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -36,6 +37,8 @@
 #include <linux/debugfs.h>
 #include <linux/export.h>
 #include <linux/delay.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>
 #include <rdma/ib_umem.h>
 #include <rdma/ib_umem_odp.h>
 #include <rdma/ib_verbs.h>
@@ -934,6 +937,17 @@ static void set_mr_fields(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr,
 	mr->access_flags = access_flags;
 }
 
+static unsigned int mlx5_umem_dmabuf_default_pgsz(struct ib_umem *umem,
+						  u64 iova)
+{
+	/*
+	 * The alignment of iova has already been checked upon entering
+	 * UVERBS_METHOD_REG_DMABUF_MR
+	 */
+	umem->iova = iova;
+	return PAGE_SIZE;
+}
+
 static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd,
 					     struct ib_umem *umem, u64 iova,
 					     int access_flags)
@@ -943,7 +957,11 @@ static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd,
 	struct mlx5_ib_mr *mr;
 	unsigned int page_size;
 
-	page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size, 0, iova);
+	if (umem->is_dmabuf)
+		page_size = mlx5_umem_dmabuf_default_pgsz(umem, iova);
+	else
+		page_size = mlx5_umem_find_best_pgsz(umem, mkc, log_page_size,
+						     0, iova);
 	if (WARN_ON(!page_size))
 		return ERR_PTR(-EINVAL);
 	ent = mr_cache_ent_from_order(
@@ -979,7 +997,6 @@ static struct mlx5_ib_mr *alloc_cacheable_mr(struct ib_pd *pd,
 	mr->mmkey.size = umem->length;
 	mr->mmkey.pd = to_mpd(pd)->pdn;
 	mr->page_shift = order_base_2(page_size);
-	mr->umem = umem;
 	set_mr_fields(dev, mr, umem->length, access_flags);
 
 	return mr;
@@ -1200,8 +1217,10 @@ int mlx5_ib_update_xlt(struct mlx5_ib_mr *mr, u64 idx, int npages,
 
 /*
  * Send the DMA list to the HW for a normal MR using UMR.
+ * Dmabuf MR is handled in a similar way, except that the MLX5_IB_UPD_XLT_ZAP
+ * flag may be used.
  */
-static int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags)
+int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags)
 {
 	struct mlx5_ib_dev *dev = mr_to_mdev(mr);
 	struct device *ddev = &dev->mdev->pdev->dev;
@@ -1243,6 +1262,10 @@ static int mlx5_ib_update_mr_pas(struct mlx5_ib_mr *mr, unsigned int flags)
 		cur_mtt->ptag =
 			cpu_to_be64(rdma_block_iter_dma_address(&biter) |
 				    MLX5_IB_MTT_PRESENT);
+
+		if (mr->umem->is_dmabuf && (flags & MLX5_IB_UPD_XLT_ZAP))
+			cur_mtt->ptag = 0;
+
 		cur_mtt++;
 	}
 
@@ -1566,6 +1589,84 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	return create_real_mr(pd, umem, iova, access_flags);
 }
 
+static void mlx5_ib_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
+	struct mlx5_ib_mr *mr = umem_dmabuf->private;
+
+	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
+
+	if (!umem_dmabuf->sgt)
+		return;
+
+	mlx5_ib_update_mr_pas(mr, MLX5_IB_UPD_XLT_ZAP);
+	ib_umem_dmabuf_unmap_pages(umem_dmabuf);
+}
+
+static struct dma_buf_attach_ops mlx5_ib_dmabuf_attach_ops = {
+	.allow_peer2peer = 1,
+	.move_notify = mlx5_ib_dmabuf_invalidate_cb,
+};
+
+struct ib_mr *mlx5_ib_reg_user_mr_dmabuf(struct ib_pd *pd, u64 offset,
+					 u64 length, u64 virt_addr,
+					 int fd, int access_flags,
+					 struct ib_udata *udata)
+{
+	struct mlx5_ib_dev *dev = to_mdev(pd->device);
+	struct mlx5_ib_mr *mr = NULL;
+	struct ib_umem_dmabuf *umem_dmabuf;
+	int err;
+
+	if (!IS_ENABLED(CONFIG_INFINIBAND_USER_MEM) ||
+	    !IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	mlx5_ib_dbg(dev,
+		    "offset 0x%llx, virt_addr 0x%llx, length 0x%llx, fd %d, access_flags 0x%x\n",
+		    offset, virt_addr, length, fd, access_flags);
+
+	/* dmabuf requires xlt update via umr to work. */
+	if (!mlx5_ib_can_load_pas_with_umr(dev, length))
+		return ERR_PTR(-EINVAL);
+
+	umem_dmabuf = ib_umem_dmabuf_get(&dev->ib_dev, offset, length, fd,
+					 access_flags,
+					 &mlx5_ib_dmabuf_attach_ops);
+	if (IS_ERR(umem_dmabuf)) {
+		mlx5_ib_dbg(dev, "umem_dmabuf get failed (%ld)\n",
+			    PTR_ERR(umem_dmabuf));
+		return ERR_CAST(umem_dmabuf);
+	}
+
+	mr = alloc_cacheable_mr(pd, &umem_dmabuf->umem, virt_addr,
+				access_flags);
+	if (IS_ERR(mr)) {
+		ib_umem_release(&umem_dmabuf->umem);
+		return ERR_CAST(mr);
+	}
+
+	mlx5_ib_dbg(dev, "mkey 0x%x\n", mr->mmkey.key);
+
+	atomic_add(ib_umem_num_pages(mr->umem), &dev->mdev->priv.reg_pages);
+	umem_dmabuf->private = mr;
+	init_waitqueue_head(&mr->q_deferred_work);
+	atomic_set(&mr->num_deferred_work, 0);
+	err = xa_err(xa_store(&dev->odp_mkeys, mlx5_base_mkey(mr->mmkey.key),
+			      &mr->mmkey, GFP_KERNEL));
+	if (err)
+		goto err_dereg_mr;
+
+	err = mlx5_ib_init_dmabuf_mr(mr);
+	if (err)
+		goto err_dereg_mr;
+	return &mr->ibmr;
+
+err_dereg_mr:
+	dereg_mr(dev, mr);
+	return ERR_PTR(err);
+}
+
 /**
  * mlx5_mr_cache_invalidate - Fence all DMA on the MR
  * @mr: The MR to fence
@@ -1723,6 +1824,9 @@ struct ib_mr *mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 	if (flags & ~(IB_MR_REREG_TRANS | IB_MR_REREG_PD | IB_MR_REREG_ACCESS))
 		return ERR_PTR(-EOPNOTSUPP);
 
+	if (is_dmabuf_mr(mr))
+		return ERR_PTR(-EOPNOTSUPP);
+
 	if (!(flags & IB_MR_REREG_ACCESS))
 		new_access_flags = mr->access_flags;
 	if (!(flags & IB_MR_REREG_PD))
@@ -1875,6 +1979,8 @@ static void dereg_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr)
 	/* Stop all DMA */
 	if (is_odp_mr(mr))
 		mlx5_ib_fence_odp_mr(mr);
+	else if (is_dmabuf_mr(mr))
+		mlx5_ib_fence_dmabuf_mr(mr);
 	else
 		clean_mr(dev, mr);
 
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index aa2413b..440fbf7 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -33,6 +33,8 @@
 #include <rdma/ib_umem.h>
 #include <rdma/ib_umem_odp.h>
 #include <linux/kernel.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>
 
 #include "mlx5_ib.h"
 #include "cmd.h"
@@ -670,6 +672,37 @@ void mlx5_ib_fence_odp_mr(struct mlx5_ib_mr *mr)
 	dma_fence_odp_mr(mr);
 }
 
+/**
+ * mlx5_ib_fence_dmabuf_mr - Stop all access to the dmabuf MR
+ * @mr: to fence
+ *
+ * On return no parallel threads will be touching this MR and no DMA will be
+ * active.
+ */
+void mlx5_ib_fence_dmabuf_mr(struct mlx5_ib_mr *mr)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
+
+	/* Prevent new page faults and prefetch requests from succeeding */
+	xa_erase(&mr_to_mdev(mr)->odp_mkeys, mlx5_base_mkey(mr->mmkey.key));
+
+	/* Wait for all running page-fault handlers to finish. */
+	synchronize_srcu(&mr_to_mdev(mr)->odp_srcu);
+
+	wait_event(mr->q_deferred_work, !atomic_read(&mr->num_deferred_work));
+
+	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
+	mlx5_mr_cache_invalidate(mr);
+	umem_dmabuf->private = NULL;
+	ib_umem_dmabuf_unmap_pages(umem_dmabuf);
+	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
+
+	if (!mr->cache_ent) {
+		mlx5_core_destroy_mkey(mr_to_mdev(mr)->mdev, &mr->mmkey);
+		WARN_ON(mr->descs);
+	}
+}
+
 #define MLX5_PF_FLAGS_DOWNGRADE BIT(1)
 #define MLX5_PF_FLAGS_SNAPSHOT BIT(2)
 #define MLX5_PF_FLAGS_ENABLE BIT(3)
@@ -803,6 +836,44 @@ static int pagefault_implicit_mr(struct mlx5_ib_mr *imr,
 	return ret;
 }
 
+static int pagefault_dmabuf_mr(struct mlx5_ib_mr *mr, size_t bcnt,
+			       u32 *bytes_mapped, u32 flags)
+{
+	struct ib_umem_dmabuf *umem_dmabuf = to_ib_umem_dmabuf(mr->umem);
+	u32 xlt_flags = 0;
+	int err;
+	unsigned int page_size;
+
+	if (flags & MLX5_PF_FLAGS_ENABLE)
+		xlt_flags |= MLX5_IB_UPD_XLT_ENABLE;
+
+	dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
+	err = ib_umem_dmabuf_map_pages(umem_dmabuf);
+	if (err) {
+		dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
+		return err;
+	}
+
+	page_size = mlx5_umem_find_best_pgsz(&umem_dmabuf->umem, mkc,
+					     log_page_size, 0,
+					     umem_dmabuf->umem.iova);
+	if (unlikely(page_size < PAGE_SIZE)) {
+		ib_umem_dmabuf_unmap_pages(umem_dmabuf);
+		err = -EINVAL;
+	} else {
+		err = mlx5_ib_update_mr_pas(mr, xlt_flags);
+	}
+	dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
+
+	if (err)
+		return err;
+
+	if (bytes_mapped)
+		*bytes_mapped += bcnt;
+
+	return ib_umem_num_pages(mr->umem);
+}
+
 /*
  * Returns:
  *  -EFAULT: The io_virt->bcnt is not within the MR, it covers pages that are
@@ -821,6 +892,9 @@ static int pagefault_mr(struct mlx5_ib_mr *mr, u64 io_virt, size_t bcnt,
 	if (unlikely(io_virt < mr->mmkey.iova))
 		return -EFAULT;
 
+	if (mr->umem->is_dmabuf)
+		return pagefault_dmabuf_mr(mr, bcnt, bytes_mapped, flags);
+
 	if (!odp->is_implicit_odp) {
 		u64 user_va;
 
@@ -847,6 +921,16 @@ int mlx5_ib_init_odp_mr(struct mlx5_ib_mr *mr)
 	return ret >= 0 ? 0 : ret;
 }
 
+int mlx5_ib_init_dmabuf_mr(struct mlx5_ib_mr *mr)
+{
+	int ret;
+
+	ret = pagefault_dmabuf_mr(mr, mr->umem->length, NULL,
+				  MLX5_PF_FLAGS_ENABLE);
+
+	return ret >= 0 ? 0 : ret;
+}
+
 struct pf_frame {
 	struct pf_frame *next;
 	u32 key;
@@ -1749,7 +1833,6 @@ static void destroy_prefetch_work(struct prefetch_mr_work *work)
 {
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
 	struct mlx5_core_mkey *mmkey;
-	struct ib_umem_odp *odp;
 	struct mlx5_ib_mr *mr;
 
 	lockdep_assert_held(&dev->odp_srcu);
@@ -1763,11 +1846,9 @@ static void destroy_prefetch_work(struct prefetch_mr_work *work)
 	if (mr->ibmr.pd != pd)
 		return NULL;
 
-	odp = to_ib_umem_odp(mr->umem);
-
 	/* prefetch with write-access must be supported by the MR */
 	if (advice == IB_UVERBS_ADVISE_MR_ADVICE_PREFETCH_WRITE &&
-	    !odp->umem.writable)
+	    !mr->umem->writable)
 		return NULL;
 
 	return mr;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* RE: [PATCH v16 0/4] RDMA: Add dma-buf support
  2020-12-15 21:27 ` Jianxin Xiong
@ 2021-01-11 15:24   ` Xiong, Jianxin
  -1 siblings, 0 replies; 48+ messages in thread
From: Xiong, Jianxin @ 2021-01-11 15:24 UTC (permalink / raw)
  To: linux-rdma, dri-devel
  Cc: Doug Ledford, Jason Gunthorpe, Leon Romanovsky, Sumit Semwal,
	Christian Koenig, Vetter, Daniel

Jason, will this series be able to get into 5.12?

> -----Original Message-----
> From: Xiong, Jianxin <jianxin.xiong@intel.com>
> Sent: Tuesday, December 15, 2020 1:27 PM
> To: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org
> Cc: Xiong, Jianxin <jianxin.xiong@intel.com>; Doug Ledford <dledford@redhat.com>; Jason Gunthorpe <jgg@ziepe.ca>; Leon Romanovsky
> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> <daniel.vetter@intel.com>
> Subject: [PATCH v16 0/4] RDMA: Add dma-buf support
> 
> This is the sixteenth version of the patch set. Changelog:
> 
> v16:
> * Add "select DMA_SHARED_BUFFER" to Kconfig when IB UMEM is enabled.
>   This fixes the auto build test error with a random config.
> 
> v15: https://www.spinics.net/lists/linux-rdma/msg98369.html
> * Rebase to the latest linux-rdma 'for-next' branch (commit 0583531bb9ef)
>   to pick up RDMA core and mlx5 updates
> * Let ib_umem_dmabuf_get() return 'struct ib_umem_dmabuf *' instead of
>   'struct ib_umem *'
> * Move the check of on demand paging support to mlx5_ib_reg_user_mr_dmabuf()
> * Check iova alignment at the entry point of the uverb command so that
>   mlx5_umem_dmabuf_default_pgsz() can always succeed
> 
> v14: https://www.spinics.net/lists/linux-rdma/msg98265.html
> * Check return value of dma_fence_wait()
> * Fix a dma-buf leak in ib_umem_dmabuf_get()
> * Fix return value type cast for ib_umem_dmabuf_get()
> * Return -EOPNOTSUPP instead of -EINVAL for unimplemented functions
> * Remove an unnecessary use of unlikely()
> * Remove left-over commit message resulted from rebase
> 
> v13: https://www.spinics.net/lists/linux-rdma/msg98227.html
> * Rebase to the latest linux-rdma 'for-next' branch (5.10.0-rc6+)
> * Check for device on-demand paging capability at the entry point of
>   the new verbs command to avoid calling device's reg_user_mr_dmabuf()
>   method when CONFIG_INFINIBAND_ON_DEMAND_PAGING is disabled.
> 
> v12: https://www.spinics.net/lists/linux-rdma/msg97943.html
> * Move the prototype of function ib_umem_dmabuf_release() to ib_umem.h
>   and remove umem_dmabuf.h
> * Break a line that is too long
> 
> v11: https://www.spinics.net/lists/linux-rdma/msg97860.html
> * Rework the parameter checking code inside ib_umem_dmabuf_get()
> * Fix incorrect error handling in the new verbs command handler
> * Put a duplicated code sequence for checking iova and setting page size
>   into a function
> * In the invalidation callback, check for if the buffer has been mapped
>   and thus the presence of a valid driver mr is ensured
> * The patch that checks for dma_virt_ops is dropped because it is no
>   longer needed
> * The patch that documents that dma-buf size is fixed has landed at:
>   https://cgit.freedesktop.org/drm/drm-misc/commit/?id=476b485be03c
>   and thus is no longer included here
> * The matching user space patch set is sent separately
> 
> v10: https://www.spinics.net/lists/linux-rdma/msg97483.html
> * Don't map the pages in ib_umem_dmabuf_get(); use the size information
>   of the dma-buf object to validate the umem size instead
> * Use PAGE_SIZE directly instead of use ib_umem_find_best_pgsz() when
>   the MR is created since the pages have not been mapped yet and dma-buf
>   requires PAGE_SIZE anyway
> * Always call mlx5_umem_find_best_pgsz() after mapping the pages to
>   verify that the page size requirement is satisfied
> * Add a patch to document that dma-buf size is fixed
> 
> v9: https://www.spinics.net/lists/linux-rdma/msg97432.html
> * Clean up the code for sg list in-place modification
> * Prevent dma-buf pages from being mapped multiple times
> * Map the pages in ib_umem_dmabuf_get() so that improper values of
>   address/length/iova can be caught early
> * Check for unsupported flags in the new uverbs command
> * Add missing uverbs_finalize_uobj_create()
> * Sort uverbs objects by name
> * Fix formatting issue -- unnecessary alignment of '='
> * Unmap pages in mlx5_ib_fence_dmabuf_mr()
> * Remove address range checking from pagefault_dmabuf_mr()
> 
> v8: https://www.spinics.net/lists/linux-rdma/msg97370.html
> * Modify the dma-buf sg list in place to get a proper umem sg list and
>   restore it before calling dma_buf_unmap_attachment()
> * Validate the umem sg list with ib_umem_find_best_pgsz()
> * Remove the logic for slicing the sg list at runtime
> 
> v7: https://www.spinics.net/lists/linux-rdma/msg97297.html
> * Rebase on top of latest mlx5 MR patch series
> * Slice dma-buf sg list at runtime instead of creating a new list
> * Preload the buffer page mapping when the MR is created
> * Move the 'dma_virt_ops' check into dma_buf_dynamic_attach()
> 
> v6: https://www.spinics.net/lists/linux-rdma/msg96923.html
> * Move the dma-buf invalidation callback from the core to the device
>   driver
> * Move mapping update from work queue to pagefault handler
> * Add dma-buf based MRs to the xarray of mmkeys so that the pagefault
>   handler can be reached
> * Update the new driver method and uverbs command signature by changing
>   the parameter 'addr' to 'offset'
> * Modify the sg list returned from dma_buf_map_attachment() based on
>   the parameters 'offset' and 'length'
> * Don't import dma-buf if 'dma_virt_ops' is used by the dma device
> * The patch that clarifies dma-buf sg lists alignment has landed at
>   https://cgit.freedesktop.org/drm/drm-misc/commit/?id=ac80cd17a615
>   and thus is no longer included with this set
> 
> v5: https://www.spinics.net/lists/linux-rdma/msg96786.html
> * Fix a few warnings reported by kernel test robot:
>     - no previous prototype for function 'ib_umem_dmabuf_release'
>     - no previous prototype for function 'ib_umem_dmabuf_map_pages'
>     - comparison of distinct pointer types in 'check_add_overflow'
> * Add comment for the wait between getting the dma-buf sg table and
>   updating the NIC page table
> 
> v4: https://www.spinics.net/lists/linux-rdma/msg96767.html
> * Add a new ib_device method reg_user_mr_dmabuf() instead of expanding
>   the existing method reg_user_mr()
> * Use a separate code flow for dma-buf instead of adding special cases
>   to the ODP memory region code path
> * In invalidation callback, new mapping is updated as whole using work
>   queue instead of being updated in page granularity in the page fault
>   handler
> * Use dma_resv_get_excl() and dma_fence_wait() to ensure the content of
>   the pages have been moved to the new location before the new mapping
>   is programmed into the NIC
> * Add code to the ODP page fault handler to check the mapping status
> * The new access flag added in v3 is removed.
> * The checking for on-demand paging support in the new uverbs command
>   is removed because it is implied by implementing the new ib_device
>   method
> * Clarify that dma-buf sg lists are page aligned
> 
> v3: https://www.spinics.net/lists/linux-rdma/msg96330.html
> * Use dma_buf_dynamic_attach() instead of dma_buf_attach()
> * Use on-demand paging mechanism to avoid pinning the GPU memory
> * Instead of adding a new parameter to the device method for memory
>   registration, pass all the attributes including the file descriptor
>   as a structure
> * Define a new access flag for dma-buf based memory region
> * Check for on-demand paging support in the new uverbs command
> 
> v2: https://www.spinics.net/lists/linux-rdma/msg93643.html
> * The Kconfig option is removed. There is no dependence issue since
>   dma-buf driver is always enabled.
> * The declaration of new data structure and functions is reorganized to
>   minimize the visibility of the changes.
> * The new uverbs command now goes through ioctl() instead of write().
> * The rereg functionality is removed.
> * Instead of adding new device method for dma-buf specific registration,
>   existing method is extended to accept an extra parameter.
> * The correct function is now used for address range checking.
> 
> v1: https://www.spinics.net/lists/linux-rdma/msg90720.html
> * The initial patch set
> * Implement core functions for importing and mapping dma-buf
> * Use dma-buf static attach interface
> * Add two ib_device methods reg_user_mr_fd() and rereg_user_mr_fd()
> * Add two uverbs commands via the write() interface
> * Add Kconfig option
> * Add dma-buf support to mlx5 device
> 
> When enabled, an RDMA capable NIC can perform peer-to-peer transactions over PCIe to access the local memory located on another
> device. This can often lead to better performance than using a system memory buffer for RDMA and copying data between the buffer and
> device memory.
> 
> The current kernel RDMA stack uses get_user_pages() to pin the physical pages backing the user buffer and uses dma_map_sg_attrs() to get the
> dma addresses for memory access. This usually doesn't work for peer device memory due to the lack of associated page structures.
> 
> Several mechanisms exist today to facilitate device memory access.
> 
> ZONE_DEVICE is a new zone for device memory in the memory management subsystem. It allows pages from device memory to be
> described with specialized page structures, but what can be done with these page structures may be different from system memory.
> ZONE_DEVICE is further specialized into multiple memory types, such as one type for PCI p2pmem/p2pdma and one type for HMM.
> 
> PCI p2pmem/p2pdma uses ZONE_DEVICE to represent device memory residing in a PCI BAR and provides a set of calls to publish, discover,
> allocate, and map such memory for peer-to-peer transactions. One feature of the API is that the buffer is allocated by the side that does the
> DMA transfer. This works well with the storage use case, but is awkward with GPU-NIC communication, where typically the buffer is
> allocated by the GPU driver rather than the NIC driver.
> 
> Heterogeneous Memory Management (HMM) utilizes mmu_interval_notifier and ZONE_DEVICE to support shared virtual address space and
> page migration between system memory and device memory. HMM doesn't support pinning device memory because pages located on
> device must be able to migrate to system memory when accessed by CPU. Peer-to-peer access is currently not supported by HMM.
> 
> Dma-buf is a standard mechanism for sharing buffers among different device drivers. The buffer to be shared is exported by the owning
> driver and imported by the driver that wants to use it. The exporter provides a set of ops that the importer can call to pin and map the
> buffer. In addition, a file descriptor can be associated with a dma-buf object as the handle that can be passed to user space.
> 
> This patch series adds a dma-buf importer role to the RDMA driver in an attempt to support RDMA using device memory such as GPU VRAM.
> Dma-buf is chosen for a few reasons: first, the API is relatively simple and allows a lot of flexibility in implementing the buffer manipulation
> ops. Second, it doesn't require page structure. Third, dma-buf is already supported in many GPU drivers. However, we are aware that existing
> GPU drivers don't allow pinning device memory via the dma-buf interface. Pinning would simply cause the backing storage to migrate to system
> RAM. True peer-to-peer access is only possible using dynamic attach, which requires on-demand paging support from the NIC to work. For this
> reason, this series only works with ODP capable NICs.
> 
> This series consists of four patches. The first patch adds the common code for importing dma-buf from a file descriptor and mapping the
> dma-buf pages. Patch 2 adds the new driver method reg_user_mr_dmabuf(). Patch 3 adds a new uverbs command for registering a dma-buf based
> memory region. Patch 4 adds dma-buf support to the mlx5 driver.
> 
> Related user space RDMA library changes are provided as a separate patch series.
> 
> Jianxin Xiong (4):
>   RDMA/umem: Support importing dma-buf as user memory region
>   RDMA/core: Add device method for registering dma-buf based memory
>     region
>   RDMA/uverbs: Add uverbs command for dma-buf based MR registration
>   RDMA/mlx5: Support dma-buf based userspace memory region
> 
>  drivers/infiniband/Kconfig                    |   1 +
>  drivers/infiniband/core/Makefile              |   2 +-
>  drivers/infiniband/core/device.c              |   1 +
>  drivers/infiniband/core/umem.c                |   3 +
>  drivers/infiniband/core/umem_dmabuf.c         | 174 ++++++++++++++++++++++++++
>  drivers/infiniband/core/uverbs_std_types_mr.c | 117 ++++++++++++++++-
>  drivers/infiniband/hw/mlx5/main.c             |   2 +
>  drivers/infiniband/hw/mlx5/mlx5_ib.h          |  18 +++
>  drivers/infiniband/hw/mlx5/mr.c               | 112 ++++++++++++++++-
>  drivers/infiniband/hw/mlx5/odp.c              |  89 ++++++++++++-
>  include/rdma/ib_umem.h                        |  48 ++++++-
>  include/rdma/ib_verbs.h                       |   6 +-
>  include/uapi/rdma/ib_user_ioctl_cmds.h        |  14 +++
>  13 files changed, 573 insertions(+), 14 deletions(-)
>  create mode 100644 drivers/infiniband/core/umem_dmabuf.c
> 
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-01-11 15:24   ` Xiong, Jianxin
@ 2021-01-11 15:42     ` Jason Gunthorpe
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2021-01-11 15:42 UTC (permalink / raw)
  To: Xiong, Jianxin
  Cc: linux-rdma, dri-devel, Doug Ledford, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Vetter, Daniel

On Mon, Jan 11, 2021 at 03:24:18PM +0000, Xiong, Jianxin wrote:
> Jason, will this series be able to get into 5.12?

I was going to ask you where things are after the break? 

Did everyone agree the userspace stuff is OK now? Is Edward OK with
the pyverbs changes, etc.?

Jason

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-01-11 15:42     ` Jason Gunthorpe
@ 2021-01-11 17:44       ` Xiong, Jianxin
  -1 siblings, 0 replies; 48+ messages in thread
From: Xiong, Jianxin @ 2021-01-11 17:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, dri-devel, Doug Ledford, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Vetter, Daniel

> -----Original Message-----
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Monday, January 11, 2021 7:43 AM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> <daniel.vetter@intel.com>
> Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
> 
> On Mon, Jan 11, 2021 at 03:24:18PM +0000, Xiong, Jianxin wrote:
> > Jason, will this series be able to get into 5.12?
> 
> I was going to ask you where things are after the break?
> 
> Did everyone agree the userspace stuff is OK now? Is Edward OK with the pyverbs changes, etc
> 

There are no new comments on either the kernel or the userspace series. I assume silence
means no objection. I will ask for opinions on the userspace thread.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-01-11 17:44       ` Xiong, Jianxin
@ 2021-01-11 17:47         ` Alex Deucher
  -1 siblings, 0 replies; 48+ messages in thread
From: Alex Deucher @ 2021-01-11 17:47 UTC (permalink / raw)
  To: Xiong, Jianxin
  Cc: Jason Gunthorpe, Leon Romanovsky, linux-rdma, dri-devel,
	Doug Ledford, Vetter, Daniel, Christian Koenig

On Mon, Jan 11, 2021 at 12:44 PM Xiong, Jianxin <jianxin.xiong@intel.com> wrote:
>
> > -----Original Message-----
> > From: Jason Gunthorpe <jgg@ziepe.ca>
> > Sent: Monday, January 11, 2021 7:43 AM
> > To: Xiong, Jianxin <jianxin.xiong@intel.com>
> > Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> > <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> > <daniel.vetter@intel.com>
> > Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
> >
> > On Mon, Jan 11, 2021 at 03:24:18PM +0000, Xiong, Jianxin wrote:
> > > Jason, will this series be able to get into 5.12?
> >
> > I was going to ask you where things are after the break?
> >
> > Did everyone agree the userspace stuff is OK now? Is Edward OK with the pyverbs changes, etc
> >
>
> There is no new comment on the both the kernel and userspace series. I assume silence
> means no objection. I will ask for opinions on the userspace thread.

Do you have a link to the userspace thread?

Thanks,

Alex

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-01-11 17:47         ` Alex Deucher
@ 2021-01-11 17:55           ` Xiong, Jianxin
  -1 siblings, 0 replies; 48+ messages in thread
From: Xiong, Jianxin @ 2021-01-11 17:55 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Jason Gunthorpe, Leon Romanovsky, linux-rdma, dri-devel,
	Doug Ledford, Vetter, Daniel, Christian Koenig

> -----Original Message-----
> From: Alex Deucher <alexdeucher@gmail.com>
> Sent: Monday, January 11, 2021 9:47 AM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>; Leon Romanovsky <leon@kernel.org>; linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org;
> Doug Ledford <dledford@redhat.com>; Vetter, Daniel <daniel.vetter@intel.com>; Christian Koenig <christian.koenig@amd.com>
> Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
> 
> On Mon, Jan 11, 2021 at 12:44 PM Xiong, Jianxin <jianxin.xiong@intel.com> wrote:
> >
> > > -----Original Message-----
> > > From: Jason Gunthorpe <jgg@ziepe.ca>
> > > Sent: Monday, January 11, 2021 7:43 AM
> > > To: Xiong, Jianxin <jianxin.xiong@intel.com>
> > > Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org;
> > > Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> > > <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian
> > > Koenig <christian.koenig@amd.com>; Vetter, Daniel
> > > <daniel.vetter@intel.com>
> > > Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
> > >
> > > On Mon, Jan 11, 2021 at 03:24:18PM +0000, Xiong, Jianxin wrote:
> > > > Jason, will this series be able to get into 5.12?
> > >
> > > I was going to ask you where things are after the break?
> > >
> > > Did everyone agree the userspace stuff is OK now? Is Edward OK with
> > > the pyverbs changes, etc
> > >
> >
> > There is no new comment on the both the kernel and userspace series. I
> > assume silence means no objection. I will ask for opinions on the userspace thread.
> 
> Do you have a link to the userspace thread?
> 
https://www.spinics.net/lists/linux-rdma/msg98135.html


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-01-11 17:55           ` Xiong, Jianxin
@ 2021-01-12 12:49             ` Yishai Hadas
  -1 siblings, 0 replies; 48+ messages in thread
From: Yishai Hadas @ 2021-01-12 12:49 UTC (permalink / raw)
  To: Xiong, Jianxin, Alex Deucher
  Cc: Jason Gunthorpe, Leon Romanovsky, linux-rdma, dri-devel,
	Doug Ledford, Vetter, Daniel, Christian Koenig, Yishai Hadas

On 1/11/2021 7:55 PM, Xiong, Jianxin wrote:
>> -----Original Message-----
>> From: Alex Deucher <alexdeucher@gmail.com>
>> Sent: Monday, January 11, 2021 9:47 AM
>> To: Xiong, Jianxin <jianxin.xiong@intel.com>
>> Cc: Jason Gunthorpe <jgg@ziepe.ca>; Leon Romanovsky <leon@kernel.org>; linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org;
>> Doug Ledford <dledford@redhat.com>; Vetter, Daniel <daniel.vetter@intel.com>; Christian Koenig <christian.koenig@amd.com>
>> Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
>>
>> On Mon, Jan 11, 2021 at 12:44 PM Xiong, Jianxin <jianxin.xiong@intel.com> wrote:
>>>> -----Original Message-----
>>>> From: Jason Gunthorpe <jgg@ziepe.ca>
>>>> Sent: Monday, January 11, 2021 7:43 AM
>>>> To: Xiong, Jianxin <jianxin.xiong@intel.com>
>>>> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org;
>>>> Doug Ledford <dledford@redhat.com>; Leon Romanovsky
>>>> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>; Christian
>>>> Koenig <christian.koenig@amd.com>; Vetter, Daniel
>>>> <daniel.vetter@intel.com>
>>>> Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
>>>>
>>>> On Mon, Jan 11, 2021 at 03:24:18PM +0000, Xiong, Jianxin wrote:
>>>>> Jason, will this series be able to get into 5.12?
>>>> I was going to ask you where things are after the break?
>>>>
>>>> Did everyone agree the userspace stuff is OK now? Is Edward OK with
>>>> the pyverbs changes, etc
>>>>
>>> There is no new comment on the both the kernel and userspace series. I
>>> assume silence means no objection. I will ask for opinions on the userspace thread.
>> Do you have a link to the userspace thread?
>>
> https://www.spinics.net/lists/linux-rdma/msg98135.html
>
Any reason why the 'fork' comment that was given a few times wasn't
handled / answered?

Specifically,

ibv_reg_dmabuf_mr() doesn't call ibv_dontfork_range(), but ibv_dereg_mr()
does call its opposite API (i.e. ibv_dofork_range()).

Yishai


^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-01-12 12:49             ` Yishai Hadas
@ 2021-01-12 18:11               ` Xiong, Jianxin
  -1 siblings, 0 replies; 48+ messages in thread
From: Xiong, Jianxin @ 2021-01-12 18:11 UTC (permalink / raw)
  To: Yishai Hadas, Alex Deucher
  Cc: Jason Gunthorpe, Leon Romanovsky, linux-rdma, dri-devel,
	Doug Ledford, Vetter, Daniel, Christian Koenig

> -----Original Message-----
> From: Yishai Hadas <yishaih@nvidia.com>
> Sent: Tuesday, January 12, 2021 4:49 AM
> To: Xiong, Jianxin <jianxin.xiong@intel.com>; Alex Deucher <alexdeucher@gmail.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>; Leon Romanovsky <leon@kernel.org>; linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org;
> Doug Ledford <dledford@redhat.com>; Vetter, Daniel <daniel.vetter@intel.com>; Christian Koenig <christian.koenig@amd.com>; Yishai
> Hadas <yishaih@nvidia.com>
> Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
> 
> On 1/11/2021 7:55 PM, Xiong, Jianxin wrote:
> >> -----Original Message-----
> >> From: Alex Deucher <alexdeucher@gmail.com>
> >> Sent: Monday, January 11, 2021 9:47 AM
> >> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> >> Cc: Jason Gunthorpe <jgg@ziepe.ca>; Leon Romanovsky
> >> <leon@kernel.org>; linux-rdma@vger.kernel.org;
> >> dri-devel@lists.freedesktop.org; Doug Ledford <dledford@redhat.com>;
> >> Vetter, Daniel <daniel.vetter@intel.com>; Christian Koenig
> >> <christian.koenig@amd.com>
> >> Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
> >>
> >> On Mon, Jan 11, 2021 at 12:44 PM Xiong, Jianxin <jianxin.xiong@intel.com> wrote:
> >>>> -----Original Message-----
> >>>> From: Jason Gunthorpe <jgg@ziepe.ca>
> >>>> Sent: Monday, January 11, 2021 7:43 AM
> >>>> To: Xiong, Jianxin <jianxin.xiong@intel.com>
> >>>> Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org;
> >>>> Doug Ledford <dledford@redhat.com>; Leon Romanovsky
> >>>> <leon@kernel.org>; Sumit Semwal <sumit.semwal@linaro.org>;
> >>>> Christian Koenig <christian.koenig@amd.com>; Vetter, Daniel
> >>>> <daniel.vetter@intel.com>
> >>>> Subject: Re: [PATCH v16 0/4] RDMA: Add dma-buf support
> >>>>
> >>>> On Mon, Jan 11, 2021 at 03:24:18PM +0000, Xiong, Jianxin wrote:
> >>>>> Jason, will this series be able to get into 5.12?
> >>>> I was going to ask you where things are after the break?
> >>>>
> >>>> Did everyone agree the userspace stuff is OK now? Is Edward OK with
> >>>> the pyverbs changes, etc
> >>>>
> >>> There is no new comment on the both the kernel and userspace series.
> >>> I assume silence means no objection. I will ask for opinions on the userspace thread.
> >> Do you have a link to the userspace thread?
> >>
> > https://www.spinics.net/lists/linux-rdma/msg98135.html
> >
> Any reason why the 'fork' comment that was given few times wasn't not handled / answered ?
> 
> Specifically,
> 
> ibv_reg_dmabuf_mr() doesn't call ibv_dontfork_range() but ibv_dereg_mr does call its opposite API (i.e. ibv_dofork_range())
> 

Sorry, that part was missed. Strangely enough, a few of your replies didn't reach my inbox and I just found them in the web archives: https://www.spinics.net/lists/linux-rdma/msg97973.html and https://www.spinics.net/lists/linux-rdma/msg98133.html

I will add a check to ibv_dereg_mr() to avoid calling ibv_dofork_range() for the dmabuf case.

Thanks a lot for bringing this up again.

Jianxin
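
The asymmetry discussed above, and the proposed check, can be illustrated with a small stand-alone sketch. This is not code from the rdma-core series: the type names, the stub functions and the call counters are all hypothetical, and the stubs merely stand in for ibv_dontfork_range() and ibv_dofork_range(). The sketch assumes a dma-buf MR is registered by fd rather than by a process address range, so registration marks nothing and deregistration should undo nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical MR kinds; the names are illustrative, not rdma-core's. */
enum mr_kind_sketch { MR_KIND_REGULAR, MR_KIND_DMABUF };

struct mr_sketch {
	enum mr_kind_sketch kind;
	void *addr;
	size_t length;
};

/* Call counters so the pairing of the two stubs can be observed. */
static int dontfork_calls;
static int dofork_calls;

/* Stand-in for ibv_dontfork_range(); only counts calls. */
static void dontfork_range_stub(void *addr, size_t length)
{
	(void)addr;
	(void)length;
	dontfork_calls++;
}

/* Stand-in for ibv_dofork_range(); only counts calls. */
static void dofork_range_stub(void *addr, size_t length)
{
	(void)addr;
	(void)length;
	dofork_calls++;
}

/* Regular registration: the address range is marked at registration time. */
static void reg_mr_sketch(struct mr_sketch *mr, void *addr, size_t length)
{
	assert(mr != NULL);
	mr->kind = MR_KIND_REGULAR;
	mr->addr = addr;
	mr->length = length;
	dontfork_range_stub(addr, length);
}

/* dma-buf registration: no range is marked (assumption of this sketch). */
static void reg_dmabuf_mr_sketch(struct mr_sketch *mr, size_t length)
{
	assert(mr != NULL);
	mr->kind = MR_KIND_DMABUF;
	mr->addr = NULL;
	mr->length = length;
}

/* Deregistration undoes the marking only for MRs that were marked. */
static void dereg_mr_sketch(struct mr_sketch *mr)
{
	assert(mr != NULL);
	if (mr->kind == MR_KIND_REGULAR)
		dofork_range_stub(mr->addr, mr->length);
}
```

With this check, registering and deregistering one MR of each kind leaves both counters at one, so every dofork call is matched by an earlier dontfork call.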
  



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2020-12-15 21:27 ` Jianxin Xiong
@ 2021-01-21 16:59   ` Jason Gunthorpe
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2021-01-21 16:59 UTC (permalink / raw)
  To: Jianxin Xiong
  Cc: linux-rdma, dri-devel, Doug Ledford, Leon Romanovsky,
	Sumit Semwal, Christian Koenig, Daniel Vetter

On Tue, Dec 15, 2020 at 01:27:12PM -0800, Jianxin Xiong wrote:
> Jianxin Xiong (4):
>   RDMA/umem: Support importing dma-buf as user memory region
>   RDMA/core: Add device method for registering dma-buf based memory
>     region
>   RDMA/uverbs: Add uverbs command for dma-buf based MR registration
>   RDMA/mlx5: Support dma-buf based userspace memory region

I applied the below fix for rereg, but otherwise took this to rdma's
for-next

Thanks,
Jason

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index f9ca19fa531b45..a63ef7c66e383d 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1825,9 +1825,6 @@ struct ib_mr *mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 	if (flags & ~(IB_MR_REREG_TRANS | IB_MR_REREG_PD | IB_MR_REREG_ACCESS))
 		return ERR_PTR(-EOPNOTSUPP);
 
-	if (is_dmabuf_mr(mr))
-		return ERR_PTR(-EOPNOTSUPP);
-
 	if (!(flags & IB_MR_REREG_ACCESS))
 		new_access_flags = mr->access_flags;
 	if (!(flags & IB_MR_REREG_PD))
@@ -1844,8 +1841,8 @@ struct ib_mr *mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 				return ERR_PTR(err);
 			return NULL;
 		}
-		/* DM or ODP MR's don't have a umem so we can't re-use it */
-		if (!mr->umem || is_odp_mr(mr))
+		/* DM or ODP MR's don't have a normal umem so we can't re-use it */
+		if (!mr->umem || is_odp_mr(mr) || is_dmabuf_mr(mr))
 			goto recreate;
 
 		/*
@@ -1864,10 +1861,10 @@ struct ib_mr *mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 	}
 
 	/*
-	 * DM doesn't have a PAS list so we can't re-use it, odp does but the
-	 * logic around releasing the umem is different
+	 * DM doesn't have a PAS list so we can't re-use it, odp/dmabuf does
+	 * but the logic around releasing the umem is different
 	 */
-	if (!mr->umem || is_odp_mr(mr))
+	if (!mr->umem || is_odp_mr(mr) || is_dmabuf_mr(mr))
 		goto recreate;
 
 	if (!(new_access_flags & IB_ACCESS_ON_DEMAND) &&

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2020-12-15 21:27 ` Jianxin Xiong
@ 2021-02-04  7:48   ` John Hubbard
  -1 siblings, 0 replies; 48+ messages in thread
From: John Hubbard @ 2021-02-04  7:48 UTC (permalink / raw)
  To: Jianxin Xiong, linux-rdma, dri-devel
  Cc: Doug Ledford, Jason Gunthorpe, Leon Romanovsky, Sumit Semwal,
	Christian Koenig, Daniel Vetter

On 12/15/20 1:27 PM, Jianxin Xiong wrote:
> This patch series adds dma-buf importer role to the RDMA driver in
> attempt to support RDMA using device memory such as GPU VRAM. Dma-buf is
> chosen for a few reasons: first, the API is relatively simple and allows
> a lot of flexibility in implementing the buffer manipulation ops.
> Second, it doesn't require page structure. Third, dma-buf is already
> supported in many GPU drivers. However, we are aware that existing GPU
> drivers don't allow pinning device memory via the dma-buf interface.
> Pinning would simply cause the backing storage to migrate to system RAM.
> True peer-to-peer access is only possible using dynamic attach, which
> requires on-demand paging support from the NIC to work. For this reason,
> this series only works with ODP capable NICs.

Hi,

Looking ahead to after this patchset is merged...

Are there design thoughts out there, about the future of pinning to vidmem,
for this? It would allow a huge group of older GPUs and NICs and such to
do p2p with this approach, and it seems like a natural next step, right?


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-02-04  7:48   ` John Hubbard
@ 2021-02-04 13:50     ` Alex Deucher
  -1 siblings, 0 replies; 48+ messages in thread
From: Alex Deucher @ 2021-02-04 13:50 UTC (permalink / raw)
  To: John Hubbard
  Cc: Jianxin Xiong, linux-rdma, Maling list - DRI developers,
	Leon Romanovsky, Christian Koenig, Jason Gunthorpe, Doug Ledford,
	Daniel Vetter

On Thu, Feb 4, 2021 at 2:48 AM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 12/15/20 1:27 PM, Jianxin Xiong wrote:
> > This patch series adds dma-buf importer role to the RDMA driver in
> > attempt to support RDMA using device memory such as GPU VRAM. Dma-buf is
> > chosen for a few reasons: first, the API is relatively simple and allows
> > a lot of flexibility in implementing the buffer manipulation ops.
> > Second, it doesn't require page structure. Third, dma-buf is already
> > supported in many GPU drivers. However, we are aware that existing GPU
> > drivers don't allow pinning device memory via the dma-buf interface.
> > Pinning would simply cause the backing storage to migrate to system RAM.
> > True peer-to-peer access is only possible using dynamic attach, which
> > requires on-demand paging support from the NIC to work. For this reason,
> > this series only works with ODP capable NICs.
>
> Hi,
>
> Looking ahead to after this patchset is merged...
>
> Are there design thoughts out there, about the future of pinning to vidmem,
> for this? It would allow a huge group of older GPUs and NICs and such to
> do p2p with this approach, and it seems like a natural next step, right?

The argument is that vram is a scarce resource, but I don't know if
that is really the case these days.  At this point, we often have as
much vram as system ram if not more.

Alex

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-02-04 13:50     ` Alex Deucher
@ 2021-02-04 18:29       ` Jason Gunthorpe
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2021-02-04 18:29 UTC (permalink / raw)
  To: Alex Deucher
  Cc: John Hubbard, Jianxin Xiong, linux-rdma,
	Maling list - DRI developers, Leon Romanovsky, Christian Koenig,
	Doug Ledford, Daniel Vetter

On Thu, Feb 04, 2021 at 08:50:38AM -0500, Alex Deucher wrote:
> On Thu, Feb 4, 2021 at 2:48 AM John Hubbard <jhubbard@nvidia.com> wrote:
> >
> > On 12/15/20 1:27 PM, Jianxin Xiong wrote:
> > > This patch series adds dma-buf importer role to the RDMA driver in
> > > attempt to support RDMA using device memory such as GPU VRAM. Dma-buf is
> > > chosen for a few reasons: first, the API is relatively simple and allows
> > > a lot of flexibility in implementing the buffer manipulation ops.
> > > Second, it doesn't require page structure. Third, dma-buf is already
> > > supported in many GPU drivers. However, we are aware that existing GPU
> > > drivers don't allow pinning device memory via the dma-buf interface.
> > > Pinning would simply cause the backing storage to migrate to system RAM.
> > > True peer-to-peer access is only possible using dynamic attach, which
> > > requires on-demand paging support from the NIC to work. For this reason,
> > > this series only works with ODP capable NICs.
> >
> > Hi,
> >
> > Looking ahead to after this patchset is merged...
> >
> > Are there design thoughts out there, about the future of pinning to vidmem,
> > for this? It would allow a huge group of older GPUs and NICs and such to
> > do p2p with this approach, and it seems like a natural next step, right?
> 
> The argument is that vram is a scarce resource, but I don't know if
> that is really the case these days.  At this point, we often have as
> much vram as system ram if not more.

I thought the main argument was that GPU memory could move at any time
between the GPU and CPU and the DMA buf would always track its current
location?

IMHO there is no reason not to have a special API to create small
amounts of GPU dedicated locked memory that cannot be moved off the
GPU.

For instance this paper:

http://www.ziti.uni-heidelberg.de/ziti/uploads/ce_group/2014-ASHESIPDPS.pdf

Considers using the GPU to directly drive the RDMA work
queues. Putting the queues themselves in GPU VRAM would make a lot of
sense.

But that is impossible without fixed non-invalidating dma bufs.

Jason

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-02-04 18:29       ` Jason Gunthorpe
@ 2021-02-04 18:44         ` Alex Deucher
  -1 siblings, 0 replies; 48+ messages in thread
From: Alex Deucher @ 2021-02-04 18:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, Jianxin Xiong, linux-rdma,
	Maling list - DRI developers, Leon Romanovsky, Christian Koenig,
	Doug Ledford, Daniel Vetter

On Thu, Feb 4, 2021 at 1:29 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Feb 04, 2021 at 08:50:38AM -0500, Alex Deucher wrote:
> > On Thu, Feb 4, 2021 at 2:48 AM John Hubbard <jhubbard@nvidia.com> wrote:
> > >
> > > On 12/15/20 1:27 PM, Jianxin Xiong wrote:
> > > > This patch series adds dma-buf importer role to the RDMA driver in
> > > > attempt to support RDMA using device memory such as GPU VRAM. Dma-buf is
> > > > chosen for a few reasons: first, the API is relatively simple and allows
> > > > a lot of flexibility in implementing the buffer manipulation ops.
> > > > Second, it doesn't require page structure. Third, dma-buf is already
> > > > supported in many GPU drivers. However, we are aware that existing GPU
> > > > drivers don't allow pinning device memory via the dma-buf interface.
> > > > Pinning would simply cause the backing storage to migrate to system RAM.
> > > > True peer-to-peer access is only possible using dynamic attach, which
> > > > requires on-demand paging support from the NIC to work. For this reason,
> > > > this series only works with ODP capable NICs.
> > >
> > > Hi,
> > >
> > > Looking ahead to after this patchset is merged...
> > >
> > > Are there design thoughts out there, about the future of pinning to vidmem,
> > > for this? It would allow a huge group of older GPUs and NICs and such to
> > > do p2p with this approach, and it seems like a natural next step, right?
> >
> > The argument is that vram is a scarce resource, but I don't know if
> > that is really the case these days.  At this point, we often have as
> > much vram as system ram if not more.
>
> I thought the main argument was that GPU memory could move at any time
> between the GPU and CPU and the DMA buf would always track its current
> location?

I think the reason for that is that VRAM is scarce so we have to be
able to move it around.  We don't enforce the same limitations for
buffers in system memory.  We could just support pinning dma-bufs in
vram like we do with system ram.  Maybe with some conditions, e.g.,
p2p is possible, and the device has a large BAR so you aren't tying up
the BAR window.
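
The conditions above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not kernel code and not part of this series: `Device`, `may_pin_in_vram()` and the `max_bar_fraction` budget are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


# Hypothetical model: none of these names exist in the kernel or in this series.
@dataclass
class Device:
    bar_bytes: int      # size of the PCI BAR window that exposes VRAM
    p2p_capable: bool   # whether peer-to-peer DMA to the importer is possible


def may_pin_in_vram(dev: Device, request_bytes: int, bar_pinned_bytes: int,
                    max_bar_fraction: float = 0.5) -> bool:
    """Return True if a buffer could stay pinned in VRAM under the assumed policy."""
    if not dev.p2p_capable:
        # Without p2p there is no point keeping the buffer in VRAM;
        # a pin would fall back to system RAM instead.
        return False
    # Assumed rule: pinned buffers may occupy at most a fraction of the BAR
    # window, so that pinning does not tie up the whole aperture.
    budget = int(dev.bar_bytes * max_bar_fraction)
    return bar_pinned_bytes + request_bytes <= budget
```

With these assumptions, a 1 GiB pin would be accepted on a device with a 16 GiB BAR and rejected on one with a 256 MiB BAR.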

Alex


>
> IMHO there is no reason not to have a special API to create small
> amounts of GPU dedicated locked memory that cannot be moved off the
> GPU.
>
> For instance this paper:
>
> http://www.ziti.uni-heidelberg.de/ziti/uploads/ce_group/2014-ASHESIPDPS.pdf
>
> Considers using the GPU to directly drive the RDMA work
> queues. Putting the queues themselves in GPU VRAM would make a lot of
> sense.
>
> But that is impossible without fixed non-invalidating dma bufs.
>
> Jason

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-02-04 18:44         ` Alex Deucher
@ 2021-02-04 19:00           ` John Hubbard
  -1 siblings, 0 replies; 48+ messages in thread
From: John Hubbard @ 2021-02-04 19:00 UTC (permalink / raw)
  To: Alex Deucher, Jason Gunthorpe
  Cc: Jianxin Xiong, linux-rdma, Maling list - DRI developers,
	Leon Romanovsky, Christian Koenig, Doug Ledford, Daniel Vetter

On 2/4/21 10:44 AM, Alex Deucher wrote:
...
>>> The argument is that vram is a scarce resource, but I don't know if
>>> that is really the case these days.  At this point, we often have as
>>> much vram as system ram if not more.
>>
>> I thought the main argument was that GPU memory could move at any time
>> between the GPU and CPU and the DMA buf would always track its current
>> location?
> 
> I think the reason for that is that VRAM is scarce so we have to be
> able to move it around.  We don't enforce the same limitations for
> buffers in system memory.  We could just support pinning dma-bufs in
> vram like we do with system ram.  Maybe with some conditions, e.g.,
> p2p is possible, and the device has a large BAR so you aren't tying up
> the BAR window.
> 

Excellent. And yes, we are already building systems in which VRAM is
definitely not scarce, but on the other hand, those newer systems can
also handle GPU (and NIC) page faults, so not really an issue. For that,
we just need to enhance HMM so that it does peer to peer.

We also have some older hardware with large BAR1 apertures, specifically
for this sort of thing.

And again, for slightly older hardware, without pinning to VRAM there is
no way to use this solution here for peer-to-peer. So I'm glad to see that
so far you're not ruling out the pinning option.



thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-02-04 19:00           ` John Hubbard
@ 2021-02-05 15:39             ` Daniel Vetter
  -1 siblings, 0 replies; 48+ messages in thread
From: Daniel Vetter @ 2021-02-05 15:39 UTC (permalink / raw)
  To: John Hubbard
  Cc: Alex Deucher, Jason Gunthorpe, Leon Romanovsky, linux-rdma,
	Maling list - DRI developers, Doug Ledford, Daniel Vetter,
	Christian Koenig, Jianxin Xiong

On Thu, Feb 04, 2021 at 11:00:32AM -0800, John Hubbard wrote:
> On 2/4/21 10:44 AM, Alex Deucher wrote:
> ...
> > > > The argument is that vram is a scarce resource, but I don't know if
> > > > that is really the case these days.  At this point, we often have as
> > > > much vram as system ram if not more.
> > > 
> > > I thought the main argument was that GPU memory could move at any time
> > > between the GPU and CPU and the DMA buf would always track its current
> > > location?
> > 
> > I think the reason for that is that VRAM is scarce so we have to be
> > able to move it around.  We don't enforce the same limitations for
> > buffers in system memory.  We could just support pinning dma-bufs in
> > vram like we do with system ram.  Maybe with some conditions, e.g.,
> > p2p is possible, and the device has a large BAR so you aren't tying up
> > the BAR window.

Minimally we need cgroups for that vram, so it can be managed. Which is a
bit stuck unfortunately. But if we have cgroups with some pin limit, I
think we can easily lift this.
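
To make the idea of a pin limit concrete, here is a minimal userspace sketch of charge/uncharge accounting. It is an assumption-laden toy, not the cgroup interface (which, as noted, does not exist yet for vram): `VramPinGroup`, `charge()`, `uncharge()` and `PinLimitExceeded` are hypothetical names.

```python
class PinLimitExceeded(Exception):
    """Raised when a pin request would exceed the group's limit."""


class VramPinGroup:
    """Toy stand-in for a cgroup with a VRAM pin limit (hypothetical)."""

    def __init__(self, pin_limit_bytes: int):
        self.pin_limit_bytes = pin_limit_bytes
        self.pinned_bytes = 0

    def charge(self, nbytes: int) -> None:
        # Refuse the pin up front rather than letting VRAM become unevictable
        # beyond what the administrator allowed for this group.
        if self.pinned_bytes + nbytes > self.pin_limit_bytes:
            raise PinLimitExceeded(
                f"pin of {nbytes} bytes exceeds limit of {self.pin_limit_bytes}")
        self.pinned_bytes += nbytes

    def uncharge(self, nbytes: int) -> None:
        # Clamp at zero so a mismatched unpin cannot drive the counter negative.
        self.pinned_bytes -= min(nbytes, self.pinned_bytes)
```

An exporter following this model would call `charge()` before honouring a pin request and `uncharge()` on unpin, falling back to system RAM (or failing the pin) when the limit is hit.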

> Excellent. And yes, we are already building systems in which VRAM is
> definitely not scarce, but on the other hand, those newer systems can
> also handle GPU (and NIC) page faults, so not really an issue. For that,
> we just need to enhance HMM so that it does peer to peer.
> 
> We also have some older hardware with large BAR1 apertures, specifically
> for this sort of thing.
> 
> And again, for slightly older hardware, without pinning to VRAM there is
> no way to use this solution here for peer-to-peer. So I'm glad to see that
> so far you're not ruling out the pinning option.

Since HMM and ZONE_DEVICE came up, I'm kinda tempted to make ZONE_DEVICE
ZONE_MOVABLE (at least if you don't have a pinned vram contingent in your
cgroups) or something like that, so we could benefit from the work to make
sure pin_user_pages and all these never end up in there?

https://lwn.net/Articles/843326/

Kind of inspired by the recent LWN article.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-02-05 15:39             ` Daniel Vetter
@ 2021-02-05 15:43               ` Jason Gunthorpe
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2021-02-05 15:43 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: John Hubbard, Alex Deucher, Leon Romanovsky, linux-rdma,
	Maling list - DRI developers, Doug Ledford, Daniel Vetter,
	Christian Koenig, Jianxin Xiong

On Fri, Feb 05, 2021 at 04:39:47PM +0100, Daniel Vetter wrote:

> > And again, for slightly older hardware, without pinning to VRAM there is
> > no way to use this solution here for peer-to-peer. So I'm glad to see that
> > so far you're not ruling out the pinning option.
> 
> Since HMM and ZONE_DEVICE came up, I'm kinda tempted to make ZONE_DEVICE
> ZONE_MOVABLE (at least if you don't have a pinned vram contingent in your
> cgroups) or something like that, so we could benefit from the work to make
> sure pin_user_pages and all these never end up in there?

ZONE_DEVICE should already not be returned from GUP.

My understanding is that in the HMM case the idea was that a CPU touch
of some ZONE_DEVICE pages would trigger a migration to CPU memory. GUP
would want to follow the same logic; presumably it comes for free with
the fault handler somehow.
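
A toy model of that behaviour, purely illustrative and not kernel code: `Location`, `Page`, `cpu_fault()` and `gup_pin()` are hypothetical names, and the assumption that the pin path simply reuses the fault path is the guess stated above, not a confirmed implementation detail.

```python
from enum import Enum


class Location(Enum):
    DEVICE = "device"   # stands in for a device-resident (ZONE_DEVICE) page
    SYSTEM = "system"   # ordinary system RAM


class Page:
    def __init__(self, location: Location):
        self.location = location
        self.pin_count = 0


def cpu_fault(page: Page) -> Page:
    """Model of a CPU touch: a device-resident page migrates to system RAM."""
    if page.location is Location.DEVICE:
        page.location = Location.SYSTEM
    return page


def gup_pin(page: Page) -> Page:
    """Model of a GUP-style pin that goes through the fault path first,
    so the page it returns is never device-resident."""
    page = cpu_fault(page)
    page.pin_count += 1
    return page
```

In this model a pin can never land on a device page, which is the property the ZONE_MOVABLE work aims for with movable system memory.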

Jason

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-02-05 15:43               ` Jason Gunthorpe
@ 2021-02-05 15:53                 ` Daniel Vetter
  -1 siblings, 0 replies; 48+ messages in thread
From: Daniel Vetter @ 2021-02-05 15:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Vetter, John Hubbard, Alex Deucher, Leon Romanovsky,
	linux-rdma, Maling list - DRI developers, Doug Ledford,
	Daniel Vetter, Christian Koenig, Jianxin Xiong

On Fri, Feb 05, 2021 at 11:43:19AM -0400, Jason Gunthorpe wrote:
> On Fri, Feb 05, 2021 at 04:39:47PM +0100, Daniel Vetter wrote:
> 
> > > And again, for slightly older hardware, without pinning to VRAM there is
> > > no way to use this solution here for peer-to-peer. So I'm glad to see that
> > > so far you're not ruling out the pinning option.
> > 
> > Since HMM and ZONE_DEVICE came up, I'm kinda tempted to make ZONE_DEVICE
> > ZONE_MOVABLE (at least if you don't have a pinned vram contingent in your
> > cgroups) or something like that, so we could benefit from the work to make
> > sure pin_user_pages and all these never end up in there?
> 
> ZONE_DEVICE should already not be returned from GUP.
> 
> I've understood in the hmm case the idea was a CPU touch of some
> ZONE_DEVICE pages would trigger a migration to CPU memory, GUP would
> want to follow the same logic, presumably it comes for free with the
> fault handler somehow

Oh I didn't know this, I thought the proposed p2p direct i/o patches would
just use the fact that underneath ZONE_DEVICE there's "normal" struct
pages. And so I got worried that maybe also pin_user_pages can creep in.
But I didn't read the patches in full detail:

https://lore.kernel.org/linux-block/20201106170036.18713-12-logang@deltatee.com/

But if you're saying that this all needs specific code and all the gup/pup
code we have is excluded, I think we can make sure that we're not ever
building features that require time-unlimited pinning of ZONE_DEVICE.
Which I think we want.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-02-05 15:53                 ` Daniel Vetter
@ 2021-02-05 16:00                   ` Jason Gunthorpe
  -1 siblings, 0 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2021-02-05 16:00 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: John Hubbard, Alex Deucher, Leon Romanovsky, linux-rdma,
	Maling list - DRI developers, Doug Ledford, Daniel Vetter,
	Christian Koenig, Jianxin Xiong

On Fri, Feb 05, 2021 at 04:53:04PM +0100, Daniel Vetter wrote:
> On Fri, Feb 05, 2021 at 11:43:19AM -0400, Jason Gunthorpe wrote:
> > On Fri, Feb 05, 2021 at 04:39:47PM +0100, Daniel Vetter wrote:
> > 
> > > > And again, for slightly older hardware, without pinning to VRAM there is
> > > > no way to use this solution here for peer-to-peer. So I'm glad to see that
> > > > so far you're not ruling out the pinning option.
> > > 
> > > Since HMM and ZONE_DEVICE came up, I'm kinda tempted to make ZONE_DEVICE
> > > ZONE_MOVABLE (at least if you don't have a pinned vram contingent in your
> > > cgroups) or something like that, so we could benefit from the work to make
> > > sure pin_user_pages and all these never end up in there?
> > 
> > ZONE_DEVICE should already not be returned from GUP.
> > 
> > I've understood in the hmm case the idea was a CPU touch of some
> > ZONE_DEVICE pages would trigger a migration to CPU memory, GUP would
> > want to follow the same logic, presumably it comes for free with the
> > fault handler somehow
> 
> Oh I didn't know this, I thought the proposed p2p direct i/o patches would
> just use the fact that underneath ZONE_DEVICE there's "normal" struct
> pages. 

So, if that ever happens, it would be some special FOLL_ALLOW_P2P
flag to get the behavior.

> And so I got worried that maybe also pin_user_pages can creep in.
> But I didn't read the patches in full detail:

And yes, you might want to say that you can't longterm pin certain
kinds of ZONE_DEVICE pages, but if that is the common operating mode
then we'd probably never create a FOLL_ALLOW_P2P.

> But if you're saying that this all needs specific code and all the gup/pup
> code we have is excluded, I think we can make sure that we're not ever
> building features that require time-unlimited pinning of
> ZONE_DEVICE.

Well, it is certainly a useful idea for some uses of ZONE_DEVICE; GPU
vram is not the whole world.

Jason

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-02-05 16:00                   ` Jason Gunthorpe
@ 2021-02-05 16:06                     ` Daniel Vetter
  -1 siblings, 0 replies; 48+ messages in thread
From: Daniel Vetter @ 2021-02-05 16:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Vetter, John Hubbard, Alex Deucher, Leon Romanovsky,
	linux-rdma, Maling list - DRI developers, Doug Ledford,
	Daniel Vetter, Christian Koenig, Jianxin Xiong

On Fri, Feb 05, 2021 at 12:00:03PM -0400, Jason Gunthorpe wrote:
> On Fri, Feb 05, 2021 at 04:53:04PM +0100, Daniel Vetter wrote:
> > On Fri, Feb 05, 2021 at 11:43:19AM -0400, Jason Gunthorpe wrote:
> > > On Fri, Feb 05, 2021 at 04:39:47PM +0100, Daniel Vetter wrote:
> > > 
> > > > > And again, for slightly older hardware, without pinning to VRAM there is
> > > > > no way to use this solution here for peer-to-peer. So I'm glad to see that
> > > > > so far you're not ruling out the pinning option.
> > > > 
> > > > Since HMM and ZONE_DEVICE came up, I'm kinda tempted to make ZONE_DEVICE
> > > > ZONE_MOVABLE (at least if you don't have a pinned vram contingent in your
> > > > cgroups) or something like that, so we could benefit from the work to make
> > > > sure pin_user_pages and all these never end up in there?
> > > 
> > > ZONE_DEVICE should already not be returned from GUP.
> > > 
> > > I've understood in the hmm case the idea was a CPU touch of some
> > > ZONE_DEVICE pages would trigger a migration to CPU memory, GUP would
> > > want to follow the same logic, presumably it comes for free with the
> > > fault handler somehow
> > 
> > Oh I didn't know this, I thought the proposed p2p direct i/o patches would
> > just use the fact that underneath ZONE_DEVICE there's "normal" struct
> > pages. 
> 
> So, if that ever happens, it would be some special FOLL_ALLOW_P2P
> flag to get the behavior.
> 
> > And so I got worried that maybe also pin_user_pages can creep in.
> > But I didn't read the patches in full detail:
> 
> And yes, you might want to say that you can't longterm pin certain
> kinds of zone_device pages, but if that is the common operating mode
> then we'd probably never create a FOLL_ALLOW_P2P
> 
> > But if you're saying that this all needs specific code and all the gup/pup
> > code we have is excluded, I think we can make sure that we're not ever
> > building features that require time-unlimited pinning of
> > ZONE_DEVICE.
> 
> Well, it is certainly a useful idea for some uses of ZONE_DEVICE, GPU
> vram is not the whole world.

Yeah non-volatile RAM can probably pin whatever it wants :-)

From the other thread, I think if we can get some cgroups going for
accounting pinned memory, then pinning gpu memory should also not be any
real issue. Might be somewhat tricky to glue that into a FOLL_ALLOW_P2P
flag, maybe through zone-awareness or something like that. With the right
accounting in place I'm happy to let userspace pin whatever they want
really.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v16 0/4] RDMA: Add dma-buf support
  2021-02-05 15:53                 ` Daniel Vetter
@ 2021-02-05 20:24                   ` John Hubbard
  -1 siblings, 0 replies; 48+ messages in thread
From: John Hubbard @ 2021-02-05 20:24 UTC (permalink / raw)
  To: Daniel Vetter, Jason Gunthorpe
  Cc: Alex Deucher, Leon Romanovsky, linux-rdma,
	Maling list - DRI developers, Doug Ledford, Daniel Vetter,
	Christian Koenig, Jianxin Xiong

On 2/5/21 7:53 AM, Daniel Vetter wrote:
> On Fri, Feb 05, 2021 at 11:43:19AM -0400, Jason Gunthorpe wrote:
>> On Fri, Feb 05, 2021 at 04:39:47PM +0100, Daniel Vetter wrote:
>>
>>>> And again, for slightly older hardware, without pinning to VRAM there is
>>>> no way to use this solution here for peer-to-peer. So I'm glad to see that
>>>> so far you're not ruling out the pinning option.
>>>
>>> Since HMM and ZONE_DEVICE came up, I'm kinda tempted to make ZONE_DEVICE
>>> ZONE_MOVABLE (at least if you don't have a pinned vram contingent in your
>>> cgroups) or something like that, so we could benefit from the work to make
>>> sure pin_user_pages and all these never end up in there?
>>
>> ZONE_DEVICE should already not be returned from GUP.
>>
>> I've understood in the hmm case the idea was a CPU touch of some
>> ZONE_DEVICE pages would trigger a migration to CPU memory, GUP would
>> want to follow the same logic, presumably it comes for free with the
>> fault handler somehow
> 
> Oh I didn't know this, I thought the proposed p2p direct i/o patches would
> just use the fact that underneath ZONE_DEVICE there's "normal" struct
> pages. And so I got worried that maybe also pin_user_pages can creep in.
> But I didn't read the patches in full detail:
> 
> https://lore.kernel.org/linux-block/20201106170036.18713-12-logang@deltatee.com/
> 
> But if you're saying that this all needs specific code and all the gup/pup
> code we have is excluded, I think we can make sure that we're not ever
> building features that require time-unlimited pinning of ZONE_DEVICE.
> Which I think we want.
> 

From an HMM perspective, the above sounds about right. HMM relies on the
GPU/device memory being ZONE_DEVICE, *and* on that memory *not* being pinned.
(HMM's mmu notifier callbacks act as a sort of virtual pin, but not a refcount
pin.)

It's a nice clean design point that we need to preserve, and fortunately it
doesn't conflict with anything I'm seeing here. But I want to say this out
loud because I see some doubt about it creeping into the discussion.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2021-02-05 22:28 UTC | newest]

Thread overview: 48+ messages
2020-12-15 21:27 [PATCH v16 0/4] RDMA: Add dma-buf support Jianxin Xiong
2020-12-15 21:27 ` Jianxin Xiong
2020-12-15 21:27 ` [PATCH v16 1/4] RDMA/umem: Support importing dma-buf as user memory region Jianxin Xiong
2020-12-15 21:27   ` Jianxin Xiong
2020-12-15 21:27 ` [PATCH v16 2/4] RDMA/core: Add device method for registering dma-buf based " Jianxin Xiong
2020-12-15 21:27   ` Jianxin Xiong
2020-12-15 21:27 ` [PATCH v16 3/4] RDMA/uverbs: Add uverbs command for dma-buf based MR registration Jianxin Xiong
2020-12-15 21:27   ` Jianxin Xiong
2020-12-15 21:27 ` [PATCH v16 4/4] RDMA/mlx5: Support dma-buf based userspace memory region Jianxin Xiong
2020-12-15 21:27   ` Jianxin Xiong
2021-01-11 15:24 ` [PATCH v16 0/4] RDMA: Add dma-buf support Xiong, Jianxin
2021-01-11 15:24   ` Xiong, Jianxin
2021-01-11 15:42   ` Jason Gunthorpe
2021-01-11 15:42     ` Jason Gunthorpe
2021-01-11 17:44     ` Xiong, Jianxin
2021-01-11 17:44       ` Xiong, Jianxin
2021-01-11 17:47       ` Alex Deucher
2021-01-11 17:47         ` Alex Deucher
2021-01-11 17:55         ` Xiong, Jianxin
2021-01-11 17:55           ` Xiong, Jianxin
2021-01-12 12:49           ` Yishai Hadas
2021-01-12 12:49             ` Yishai Hadas
2021-01-12 18:11             ` Xiong, Jianxin
2021-01-12 18:11               ` Xiong, Jianxin
2021-01-21 16:59 ` Jason Gunthorpe
2021-01-21 16:59   ` Jason Gunthorpe
2021-02-04  7:48 ` John Hubbard
2021-02-04  7:48   ` John Hubbard
2021-02-04 13:50   ` Alex Deucher
2021-02-04 13:50     ` Alex Deucher
2021-02-04 18:29     ` Jason Gunthorpe
2021-02-04 18:29       ` Jason Gunthorpe
2021-02-04 18:44       ` Alex Deucher
2021-02-04 18:44         ` Alex Deucher
2021-02-04 19:00         ` John Hubbard
2021-02-04 19:00           ` John Hubbard
2021-02-05 15:39           ` Daniel Vetter
2021-02-05 15:39             ` Daniel Vetter
2021-02-05 15:43             ` Jason Gunthorpe
2021-02-05 15:43               ` Jason Gunthorpe
2021-02-05 15:53               ` Daniel Vetter
2021-02-05 15:53                 ` Daniel Vetter
2021-02-05 16:00                 ` Jason Gunthorpe
2021-02-05 16:00                   ` Jason Gunthorpe
2021-02-05 16:06                   ` Daniel Vetter
2021-02-05 16:06                     ` Daniel Vetter
2021-02-05 20:24                 ` John Hubbard
2021-02-05 20:24                   ` John Hubbard
