dri-devel Archive on lore.kernel.org
  RE: [RFC PATCH v2 0/3] RDMA: add dma-buf support
    From: Xiong, Jianxin @ 2020-06-30 18:56 UTC (permalink / raw)
      To: linux-rdma
      Cc: Leon Romanovsky, dri-devel, Jason Gunthorpe, Doug Ledford,
    	Vetter, Daniel, Christian Koenig
    
    Added to cc-list:
    Christian Koenig <christian.koenig@amd.com>
    dri-devel@lists.freedesktop.org
    
    > -----Original Message-----
    > From: Xiong, Jianxin <jianxin.xiong@intel.com>
    > Sent: Monday, June 29, 2020 10:32 AM
    > To: linux-rdma@vger.kernel.org
    > Cc: Xiong, Jianxin <jianxin.xiong@intel.com>; Doug Ledford <dledford@redhat.com>; Jason Gunthorpe <jgg@ziepe.ca>; Sumit Semwal
    > <sumit.semwal@linaro.org>; Leon Romanovsky <leon@kernel.org>; Vetter, Daniel <daniel.vetter@intel.com>
    > Subject: [RFC PATCH v2 0/3] RDMA: add dma-buf support
    > 
     > When enabled, an RDMA-capable NIC can perform peer-to-peer transactions
     > over PCIe to access local memory located on another device. This can
     > often lead to better performance than using a system memory buffer for
     > RDMA and copying data between the buffer and device memory.
     > 
     > The current kernel RDMA stack uses get_user_pages() to pin the physical
     > pages backing the user buffer and uses dma_map_sg_attrs() to get the
     > DMA addresses for memory access. This usually doesn't work for peer
     > device memory due to the lack of associated page structures.
    > 
    > Several mechanisms exist today to facilitate device memory access.
    > 
     > ZONE_DEVICE is a new zone for device memory in the memory management
     > subsystem. It allows pages from device memory to be described with
     > specialized page structures. As a result, calls like get_user_pages()
     > can succeed, but what can be done with these page structures may
     > differ from system memory. ZONE_DEVICE is further specialized into
     > multiple memory types, such as one type for PCI p2pmem/p2pdma and
     > one type for HMM.
    > 
     > PCI p2pmem/p2pdma uses ZONE_DEVICE to represent device memory residing
     > in a PCI BAR and provides a set of calls to publish, discover, allocate,
     > and map such memory for peer-to-peer transactions. One feature of the
     > API is that the buffer is allocated by the side that does the DMA
     > transfer. This works well for the storage use case, but is awkward
     > for GPU-NIC communication, where the buffer is typically allocated by
     > the GPU driver rather than the NIC driver.
    > 
     > Heterogeneous Memory Management (HMM) utilizes mmu_interval_notifier
     > and ZONE_DEVICE to support shared virtual address spaces and page
     > migration between system memory and device memory. HMM doesn't support
     > pinning device memory because pages located on the device must be able
     > to migrate to system memory when accessed by the CPU. Peer-to-peer
     > access is possible only if the peer can handle page faults. For RDMA,
     > that means the NIC must support on-demand paging.
    > 
    > Dma-buf is a standard mechanism for sharing buffers among different
    > device drivers. The buffer to be shared is exported by the owning
    > driver and imported by the driver that wants to use it. The exporter
    > provides a set of ops that the importer can call to pin and map the
    > buffer. In addition, a file descriptor can be associated with a dma-
    > buf object as the handle that can be passed to user space.
    > 
     > This patch series adds a dma-buf importer role to the RDMA subsystem
     > in an attempt to support RDMA using device memory such as GPU VRAM.
     > Dma-buf is chosen for a few reasons: first, the API is relatively
     > simple and allows a lot of flexibility in implementing the buffer
     > manipulation ops. Second, it doesn't require page structures. Third,
     > dma-buf is already supported in many GPU drivers. However, we are
     > aware that existing GPU drivers don't allow pinning device memory via
     > the dma-buf interface: pinning and mapping a dma-buf causes the
     > backing storage to migrate to system RAM. This is due to the lack of
     > knowledge about whether the importer can perform peer-to-peer access
     > and the lack of resource limit controls for the GPU. For the first
     > part, the latest dma-buf driver has a peer-to-peer flag for the
     > importer, but the flag is currently tied to dynamic mapping support,
     > which requires on-demand paging support from the NIC to work. There
     > are a few possible ways to address these issues, such as decoupling
     > the peer-to-peer flag from dynamic mapping, allowing individual
     > drivers more leeway in making the pinning decision, and adding GPU
     > resource limit control via cgroup. We would like to get comments on
     > this patch series under the assumption that device memory pinning
     > via dma-buf is supported by some GPU drivers, and at the same time
     > we welcome open discussion on how to address the aforementioned
     > issues as well as GPU-NIC peer-to-peer access solutions in general.
    > 
     > This is the second version of the patch series. Here are the changes
     > from the previous version:
     > * The Kconfig option is removed. There is no dependency issue since
     >   the dma-buf core is always enabled.
     > * The declarations of the new data structures and functions are
     >   reorganized to minimize the visibility of the changes.
     > * The new uverbs command now goes through ioctl() instead of write().
     > * The rereg functionality is removed.
     > * Instead of adding a new device method for dma-buf specific
     >   registration, the existing method is extended to accept an extra
     >   parameter.
     > * The correct function is now used for address range checking.
    > 
     > This series is organized as follows. The first patch adds the common
     > code for importing a dma-buf from a file descriptor and pinning and
     > mapping the dma-buf pages. Patch 2 extends the reg_user_mr() method
     > of the ib_device structure to accept a dma-buf file descriptor as an
     > extra parameter; vendor drivers are updated with the change. Patch 3
     > adds a new uverbs command for registering a dma-buf based memory
     > region.
    > 
    > Related user space RDMA library changes will be provided as a separate
    > patch series.
    > 
    > Jianxin Xiong (3):
    >   RDMA/umem: Support importing dma-buf as user memory region
    >   RDMA/core: Expand the driver method 'reg_user_mr' to support dma-buf
    >   RDMA/uverbs: Add uverbs command for dma-buf based MR registration
    > 
    >  drivers/infiniband/core/Makefile                |   2 +-
    >  drivers/infiniband/core/umem.c                  |   4 +
    >  drivers/infiniband/core/umem_dmabuf.c           | 105 ++++++++++++++++++++++
    >  drivers/infiniband/core/umem_dmabuf.h           |  11 +++
    >  drivers/infiniband/core/uverbs_cmd.c            |   2 +-
    >  drivers/infiniband/core/uverbs_std_types_mr.c   | 112 ++++++++++++++++++++++++
    >  drivers/infiniband/core/verbs.c                 |   2 +-
    >  drivers/infiniband/hw/bnxt_re/ib_verbs.c        |   7 +-
    >  drivers/infiniband/hw/bnxt_re/ib_verbs.h        |   2 +-
    >  drivers/infiniband/hw/cxgb4/iw_cxgb4.h          |   3 +-
    >  drivers/infiniband/hw/cxgb4/mem.c               |   8 +-
    >  drivers/infiniband/hw/efa/efa.h                 |   2 +-
    >  drivers/infiniband/hw/efa/efa_verbs.c           |   7 +-
    >  drivers/infiniband/hw/hns/hns_roce_device.h     |   2 +-
    >  drivers/infiniband/hw/hns/hns_roce_mr.c         |   7 +-
    >  drivers/infiniband/hw/i40iw/i40iw_verbs.c       |   6 ++
    >  drivers/infiniband/hw/mlx4/mlx4_ib.h            |   2 +-
    >  drivers/infiniband/hw/mlx4/mr.c                 |   7 +-
    >  drivers/infiniband/hw/mlx5/mlx5_ib.h            |   2 +-
    >  drivers/infiniband/hw/mlx5/mr.c                 |  45 +++++++++-
    >  drivers/infiniband/hw/mthca/mthca_provider.c    |   8 +-
    >  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c     |   9 +-
    >  drivers/infiniband/hw/ocrdma/ocrdma_verbs.h     |   3 +-
    >  drivers/infiniband/hw/qedr/verbs.c              |   8 +-
    >  drivers/infiniband/hw/qedr/verbs.h              |   3 +-
    >  drivers/infiniband/hw/usnic/usnic_ib_verbs.c    |   8 +-
    >  drivers/infiniband/hw/usnic/usnic_ib_verbs.h    |   2 +-
    >  drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c    |   6 +-
    >  drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.h |   2 +-
    >  drivers/infiniband/sw/rdmavt/mr.c               |   6 +-
    >  drivers/infiniband/sw/rdmavt/mr.h               |   2 +-
    >  drivers/infiniband/sw/rxe/rxe_verbs.c           |   6 ++
    >  drivers/infiniband/sw/siw/siw_verbs.c           |   8 +-
    >  drivers/infiniband/sw/siw/siw_verbs.h           |   3 +-
    >  include/rdma/ib_umem.h                          |  14 ++-
    >  include/rdma/ib_verbs.h                         |   4 +-
    >  include/uapi/rdma/ib_user_ioctl_cmds.h          |  14 +++
    >  37 files changed, 410 insertions(+), 34 deletions(-)
    >  create mode 100644 drivers/infiniband/core/umem_dmabuf.c
    >  create mode 100644 drivers/infiniband/core/umem_dmabuf.h
    > 
    > --
    > 1.8.3.1
    
    _______________________________________________
    dri-devel mailing list
    dri-devel@lists.freedesktop.org
    https://lists.freedesktop.org/mailman/listinfo/dri-devel
    

    Thread overview: 28+ messages
    2020-06-30 18:46       ` [RFC PATCH v2 0/3] RDMA: add dma-buf support Xiong, Jianxin
    2020-06-30 19:17         ` Jason Gunthorpe
    2020-06-30 20:08           ` Xiong, Jianxin
    2020-07-02 12:27             ` Jason Gunthorpe
    2020-07-01  9:03         ` Christian König
    2020-07-01 12:07           ` Daniel Vetter
    2020-07-01 12:14             ` Daniel Vetter
    2020-07-01 12:39           ` Jason Gunthorpe
    2020-07-01 12:55             ` Christian König
    2020-07-01 15:42               ` Daniel Vetter
    2020-07-01 17:15                 ` Jason Gunthorpe
    2020-07-02 13:10                   ` Daniel Vetter
    2020-07-02 13:29                     ` Jason Gunthorpe
    2020-07-02 14:50                       ` Christian König
    2020-07-02 18:15                         ` Daniel Vetter
    2020-07-03 12:03                           ` Jason Gunthorpe
    2020-07-03 12:52                             ` Daniel Vetter
    2020-07-03 13:14                               ` Jason Gunthorpe
    2020-07-03 13:21                                 ` Christian König
    2020-07-07 21:58                                   ` Xiong, Jianxin
    2020-07-08  9:38                                     ` Christian König
    2020-07-08  9:49                                       ` Daniel Vetter
    2020-07-08 14:20                                         ` Christian König
    2020-07-08 14:33                                           ` Alex Deucher
    2020-06-30 18:56 ` Xiong, Jianxin
    2020-06-30 19:04   ` [RFC PATCH v2 1/3] RDMA/umem: Support importing dma-buf as user memory region Xiong, Jianxin
    2020-06-30 19:04   ` [RFC PATCH v2 2/3] RDMA/core: Expand the driver method 'reg_user_mr' to support dma-buf Xiong, Jianxin
    2020-06-30 19:05   ` [RFC PATCH v2 3/3] RDMA/uverbs: Add uverbs command for dma-buf based MR registration Xiong, Jianxin
    
