From: Logan Gunthorpe <logang@deltatee.com>
To: linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-mm@kvack.org, iommu@lists.linux-foundation.org
Cc: "Stephen Bates" <sbates@raithlin.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Christian König" <christian.koenig@amd.com>,
	"Ira Weiny" <iweiny@intel.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Don Dutile" <ddutile@redhat.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Logan Gunthorpe" <logang@deltatee.com>
Subject: [RFC PATCH 00/15] Userspace P2PDMA with O_DIRECT NVMe devices
Date: Fri,  6 Nov 2020 10:00:21 -0700
Message-ID: <20201106170036.18713-1-logang@deltatee.com>

This RFC enables P2PDMA transfers in userspace between NVMe drives using
existing O_DIRECT operations or the NVMe passthrough IOCTL.

This is accomplished by allowing userspace to allocate chunks of any CMB
by mmapping the NVMe ctrl device (Patches 14 and 15). The resulting memory
will be backed by P2P pages and can be passed only to O_DIRECT
operations. A flag is added to GUP() in Patch 10, and Patches 11 through 13
wire this flag up based on whether the block queue indicates P2PDMA
support.
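
For illustration, the userspace flow described above might look roughly
like the sketch below. It is not taken from the patches: the helper name
p2p_copy(), the device paths in the usage note, and the use of mmap
offset 0 are all assumptions, and it can only succeed on a kernel
carrying this series with a CMB-capable controller.

```c
#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes from the start of src_blk to the
 * start of dst_blk, staging the data in CMB memory obtained by mmapping
 * the NVMe controller device ctrl_dev. Returns 0 on success, -1 on error.
 * len is assumed to be a multiple of the logical block size, as O_DIRECT
 * requires; the mmapped buffer itself is page aligned.
 */
int p2p_copy(const char *ctrl_dev, const char *src_blk,
	     const char *dst_blk, size_t len)
{
	void *buf = MAP_FAILED;
	int cfd, sfd = -1, dfd = -1;
	int ret = -1;

	cfd = open(ctrl_dev, O_RDWR);
	if (cfd < 0)
		return -1;

	/* Assumption: offset 0 requests a fresh chunk of the CMB. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, cfd, 0);
	if (buf == MAP_FAILED)
		goto out;

	/* The P2P-backed buffer may only be used with O_DIRECT I/O. */
	sfd = open(src_blk, O_RDONLY | O_DIRECT);
	dfd = open(dst_blk, O_WRONLY | O_DIRECT);
	if (sfd < 0 || dfd < 0)
		goto out;

	if (pread(sfd, buf, len, 0) != (ssize_t)len)
		goto out;
	if (pwrite(dfd, buf, len, 0) != (ssize_t)len)
		goto out;

	ret = 0;
out:
	if (buf != MAP_FAILED)
		munmap(buf, len);
	if (sfd >= 0)
		close(sfd);
	if (dfd >= 0)
		close(dfd);
	close(cfd);
	return ret;
}
```

A caller would invoke it as, say, p2p_copy("/dev/nvme0", "/dev/nvme1n1",
"/dev/nvme2n1", 4096), where all three paths are placeholders.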

The above is pretty straightforward and (I hope) largely uncontroversial.
However, the one significant problem in all this is that, presently,
pci_p2pdma_map_sg() requires a homogeneous SGL with all P2PDMA pages or
none. Enhancing GUP to support enforcing this rule would require a huge
hack that I don't expect would be all that palatable. So this RFC takes
the approach of removing the requirement of having a homogeneous SGL.
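
To make the homogeneity rule concrete, here is a small standalone model
of the check that the current requirement implies. The struct and
function names are invented for illustration and do not appear in the
kernel; a user buffer that spans both ordinary memory and mmapped CMB
memory is the kind of input that would fail it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one SGL segment: only the page type matters. */
struct model_seg {
	bool is_p2pdma;
};

/*
 * Returns true if every segment is P2PDMA or none is, i.e. the list
 * satisfies the "all or none" rule. An empty list counts as homogeneous.
 */
bool sgl_is_homogeneous(const struct model_seg *sgl, size_t nents)
{
	for (size_t i = 1; i < nents; i++) {
		if (sgl[i].is_p2pdma != sgl[0].is_p2pdma)
			return false;
	}
	return true;
}
```

Dropping the requirement means the mapping code has to decide per
segment how to treat each entry instead of rejecting a mixed list.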

With the new common dma-iommu infrastructure, this patchset adds
support for P2PDMA pages into dma_map_sg() which will support AMD,
Intel (soon) and dma-direct implementations. (Other IOMMU
implementations would then be unsupported, notably ARM and PowerPC).

The other major blocker is that in order to implement support for
P2PDMA pages in dma_map_sg(), a flag is necessary to determine if a
given dma_addr_t points to P2PDMA memory or to an IOVA so that it can
be unmapped appropriately in dma_unmap_sg(). The (ugly) approach this
RFC takes is to use the top bit in the dma_length field and ensure
callers are prepared for it using a new DMA_ATTR_P2PDMA flag.
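
As a rough standalone model of the top-bit trick (not the code from the
patches), the sketch below tags a segment by setting the top bit of its
length and shows how an unmap pass could tell the two cases apart. The
names SEG_P2PDMA_FLAG, struct seg and the helpers are assumptions made
for this example, and lengths are assumed to fit in 31 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: the top bit of a 32-bit length field. */
#define SEG_P2PDMA_FLAG	(1U << 31)

/* Minimal stand-in for the DMA fields of a scatterlist entry. */
struct seg {
	uint64_t dma_address;
	unsigned int dma_length;
};

/* Record a P2PDMA bus address; len must be below 2^31. */
static void seg_set_p2pdma(struct seg *s, uint64_t bus_addr, unsigned int len)
{
	s->dma_address = bus_addr;
	s->dma_length = len | SEG_P2PDMA_FLAG;
}

/* Record an ordinary IOVA mapping; len must be below 2^31. */
static void seg_set_iova(struct seg *s, uint64_t iova, unsigned int len)
{
	s->dma_address = iova;
	s->dma_length = len;
}

static bool seg_is_p2pdma(const struct seg *s)
{
	return (s->dma_length & SEG_P2PDMA_FLAG) != 0;
}

/* Callers must mask the flag off before using the length. */
static unsigned int seg_len(const struct seg *s)
{
	return s->dma_length & ~SEG_P2PDMA_FLAG;
}

/* Model of an unmap pass: only non-P2PDMA segments need IOVA teardown. */
static unsigned int count_iova_unmaps(const struct seg *sgl, int nents)
{
	unsigned int n = 0;

	for (int i = 0; i < nents; i++) {
		if (!seg_is_p2pdma(&sgl[i]))
			n++;
	}
	return n;
}
```

The cost of this encoding is visible in seg_len(): every consumer of the
length has to know about the flag, which is why callers opt in.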

I suspect the ultimate solution to this blocker will be to implement
some kind of new dma_op that doesn't use the SGL. Ideas have been
thrown around in the past for one that maps some kind of novel dma_vec
directly from a bio_vec. This will become a lot easier to implement if
more dma_ops providers get converted to the new dma-iommu
implementation, but this will take time.

Alternative ideas or other feedback welcome.

This series is based on v5.10-rc2 with Lu Baolu's (and Tom Murphy's)
v4 patchset for converting the Intel IOMMU to dma-iommu[1]. A git
branch is available here:

  https://github.com/sbates130272/linux-p2pmem/  p2pdma_user_cmb_rfc

Thanks,

Logan

[1] https://lkml.kernel.org/lkml/20200927063437.13988-1-baolu.lu@linux.intel.com/T/#u


Logan Gunthorpe (15):
  PCI/P2PDMA: Don't sleep in upstream_bridge_distance_warn()
  PCI/P2PDMA: Attempt to set map_type if it has not been set
  PCI/P2PDMA: Introduce pci_p2pdma_should_map_bus() and
    pci_p2pdma_bus_offset()
  lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL
  dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support
  iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg
  nvme-pci: Check DMA ops when indicating support for PCI P2PDMA
  nvme-pci: Convert to using dma_map_sg for p2pdma pages
  mm: Introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages
  iov_iter: Introduce iov_iter_get_pages_[alloc_]flags()
  block: Set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages()
  block: Set FOLL_PCI_P2PDMA in bio_map_user_iov()
  PCI/P2PDMA: Introduce pci_mmap_p2pmem()
  nvme-pci: Allow mmaping the CMB in userspace

 block/bio.c                 |   7 +-
 block/blk-map.c             |   7 +-
 drivers/dax/super.c         |   7 +-
 drivers/iommu/dma-iommu.c   |  63 +++++++++++--
 drivers/nvme/host/core.c    |  14 ++-
 drivers/nvme/host/nvme.h    |   3 +-
 drivers/nvme/host/pci.c     |  50 ++++++----
 drivers/pci/p2pdma.c        | 178 +++++++++++++++++++++++++++++++++---
 include/linux/dma-map-ops.h |   3 +
 include/linux/dma-mapping.h |  16 ++++
 include/linux/memremap.h    |   4 +-
 include/linux/mm.h          |   1 +
 include/linux/pci-p2pdma.h  |  17 ++++
 include/linux/scatterlist.h |   4 +
 include/linux/uio.h         |  21 ++++-
 kernel/dma/direct.c         |  33 ++++++-
 kernel/dma/mapping.c        |   8 ++
 lib/iov_iter.c              |  25 ++---
 mm/gup.c                    |  28 +++---
 mm/huge_memory.c            |   8 +-
 mm/memory-failure.c         |   4 +-
 mm/memremap.c               |  14 ++-
 22 files changed, 427 insertions(+), 88 deletions(-)


base-commit: 5ba8a2512e8c5f5cf9b7309dc895612f0a77a399
--
2.20.1


Thread overview: 42+ messages
2020-11-06 17:00 Logan Gunthorpe [this message]
2020-11-06 17:00 ` [RFC PATCH 01/15] PCI/P2PDMA: Don't sleep in upstream_bridge_distance_warn() Logan Gunthorpe
2020-11-09  9:12   ` Christoph Hellwig
2020-11-06 17:00 ` [RFC PATCH 02/15] PCI/P2PDMA: Attempt to set map_type if it has not been set Logan Gunthorpe
2020-11-06 17:00 ` [RFC PATCH 03/15] PCI/P2PDMA: Introduce pci_p2pdma_should_map_bus() and pci_p2pdma_bus_offset() Logan Gunthorpe
2020-11-10 23:25   ` Bjorn Helgaas
2020-11-10 23:42     ` Logan Gunthorpe
2020-11-06 17:00 ` [RFC PATCH 04/15] lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL Logan Gunthorpe
2020-11-09  9:12   ` Christoph Hellwig
2020-11-09 14:02     ` Robin Murphy
2020-11-09 16:47     ` Logan Gunthorpe
2020-12-10  1:22       ` Dan Williams
2020-12-10  2:06         ` Logan Gunthorpe
2020-12-10  4:04           ` Dan Williams
2020-12-10 16:44             ` Logan Gunthorpe
2020-11-06 17:00 ` [RFC PATCH 05/15] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg Logan Gunthorpe
2020-11-06 17:00 ` [RFC PATCH 06/15] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support Logan Gunthorpe
2020-11-06 17:00 ` [RFC PATCH 07/15] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg Logan Gunthorpe
2020-11-06 17:00 ` [RFC PATCH 08/15] nvme-pci: Check DMA ops when indicating support for PCI P2PDMA Logan Gunthorpe
2020-11-06 17:00 ` [RFC PATCH 09/15] nvme-pci: Convert to using dma_map_sg for p2pdma pages Logan Gunthorpe
2020-11-06 17:00 ` [RFC PATCH 10/15] mm: Introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages Logan Gunthorpe
2020-11-06 17:00 ` [RFC PATCH 11/15] iov_iter: Introduce iov_iter_get_pages_[alloc_]flags() Logan Gunthorpe
2020-11-06 17:00 ` [RFC PATCH 12/15] block: Set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages() Logan Gunthorpe
2020-11-06 17:00 ` [RFC PATCH 13/15] block: Set FOLL_PCI_P2PDMA in bio_map_user_iov() Logan Gunthorpe
2020-11-06 17:00 ` [RFC PATCH 14/15] PCI/P2PDMA: Introduce pci_mmap_p2pmem() Logan Gunthorpe
2020-11-06 17:22   ` Jason Gunthorpe
2020-11-06 17:28     ` Logan Gunthorpe
2020-11-06 17:42       ` Jason Gunthorpe
2020-11-06 17:53         ` Logan Gunthorpe
2020-11-06 18:09           ` Jason Gunthorpe
2020-11-06 18:20             ` Logan Gunthorpe
2020-11-06 19:30               ` Jason Gunthorpe
2020-11-06 19:44                 ` Logan Gunthorpe
2020-11-06 19:53                   ` Jason Gunthorpe
2020-11-06 20:03                     ` Logan Gunthorpe
2020-11-07  0:14                       ` Jason Gunthorpe
2020-11-07  2:50                         ` Logan Gunthorpe
2020-11-06 17:00 ` [RFC PATCH 15/15] nvme-pci: Allow mmaping the CMB in userspace Logan Gunthorpe
2020-11-06 17:39   ` Jason Gunthorpe
2020-11-06 17:43     ` Logan Gunthorpe
2020-11-09 15:03   ` Keith Busch
2020-11-09 16:50     ` Logan Gunthorpe
