From: Benjamin Herrenschmidt <benh@au1.ibm.com>
To: Logan Gunthorpe <logang@deltatee.com>,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-block@vger.kernel.org
Cc: "Stephen Bates" <sbates@raithlin.com>,
	"Christoph Hellwig" <hch@lst.de>, "Jens Axboe" <axboe@kernel.dk>,
	"Keith Busch" <keith.busch@intel.com>,
	"Sagi Grimberg" <sagi@grimberg.me>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Jason Gunthorpe" <jgg@mellanox.com>,
	"Max Gurtovoy" <maxg@mellanox.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Oliver OHalloran" <oliveroh@au1.ibm.com>
Subject: Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory
Date: Thu, 01 Mar 2018 14:56:09 +1100
Message-ID: <1519876569.4592.4.camel@au1.ibm.com>
In-Reply-To: <1519876489.4592.3.camel@kernel.crashing.org>

On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2018-02-28 at 16:39 -0700, Logan Gunthorpe wrote:
> > Hi Everyone,
> 
> 
> So Oliver (CC) was having issues getting any of that to work for us.
> 
> The problem is that according to him (I didn't double check the latest
> patches) you effectively hotplug the PCIe memory into the system when
> creating struct pages.
> 
> This cannot possibly work for us. First we cannot map PCIe memory as
> cacheable. (Note that doing so is a bad idea if you are behind a PLX
> switch anyway since you'd have to manage cache coherency in SW).

Note: I think the above means it won't work behind a switch on x86
either, will it?

> Then our MMIO space is so far away from our memory space that there is
> not enough vmemmap virtual space to be able to do that.
> 
> So this can only work across architectures by using something like HMM
> to create special device struct pages.
> 
> Ben.
> 
> 
> > Here's v2 of our series to introduce P2P based copy offload to NVMe
> > fabrics. This version has been rebased onto v4.16-rc3 which already
> > includes Christoph's devpagemap work the previous version was based
> > off as well as a couple of the cleanup patches that were in v1.
> > 
> > Additionally, we've made the following changes based on feedback:
> > 
> > * Renamed everything to 'p2pdma' per the suggestion from Bjorn as well
> >   as a bunch of cleanup and spelling fixes he pointed out in the last
> >   series.
> > 
> > * To address Alex's ACS concerns, we change to a simpler method of
> >   just disabling ACS behind switches for any kernel that has
> >   CONFIG_PCI_P2PDMA.
> > 
> > * We also reject using devices that employ 'dma_virt_ops' which should
> >   fairly simply handle Jason's concerns that this work might break with
> >   the HFI, QIB and rxe drivers that use the virtual ops to implement
> >   their own special DMA operations.
> > 
> > Thanks,
> > 
> > Logan
> > 
> > --
> > 
> > This is a continuation of our work to enable using Peer-to-Peer PCI
> > memory in NVMe fabrics targets. Many thanks go to Christoph Hellwig who
> > provided valuable feedback to get these patches to where they are today.
> > 
> > The concept here is to use memory that's exposed on a PCI BAR as
> > data buffers in the NVMe target code such that data can be transferred
> > from an RDMA NIC to the special memory and then directly to an NVMe
> > device avoiding system memory entirely. The upside of this is better
> > QoS for applications running on the CPU utilizing memory and lower
> > PCI bandwidth required to the CPU (such that systems could be designed
> > with fewer lanes connected to the CPU). However, the trade-off is
> > currently a reduction in overall throughput (largely due to hardware
> > issues that would certainly improve in the future).
> > 
> > Due to these trade-offs we've designed the system to only enable using
> > the PCI memory in cases where the NIC, NVMe devices and memory are all
> > behind the same PCI switch. This will mean many setups that could likely
> > work well will not be supported so that we can be more confident it
> > will work and not place any responsibility on the user to understand
> > their topology. (We chose to go this route based on feedback we
> > received at the last LSF). Future work may enable these transfers behind
> > a fabric of PCI switches or perhaps using a white list of known good
> > root complexes.
> > 
> > In order to enable this functionality, we introduce a few new PCI
> > functions such that a driver can register P2P memory with the system.
> > Struct pages are created for this memory using devm_memremap_pages()
> > and the PCI bus offset is stored in the corresponding pagemap structure.
> > 
> > Another set of functions allows a client driver to create a list of
> > client devices that will be used in a given P2P transaction and then
> > use that list to find any P2P memory that is supported by all the
> > client devices. This list is then also used to selectively disable the
> > ACS bits for the downstream ports behind these devices.
> > 
> > In the block layer, we also introduce a P2P request flag to indicate a
> > given request targets P2P memory as well as a flag for a request queue
> > to indicate a given queue supports targeting P2P memory. P2P requests
> > will only be accepted by queues that support it. Also, P2P requests
> > are marked to not be merged since a non-homogeneous request would
> > complicate the DMA mapping requirements.
> > 
> > In the PCI NVMe driver, we modify the existing CMB support to utilize
> > the new PCI P2P memory infrastructure and also add support for P2P
> > memory in its request queue. When a P2P request is received it uses the
> > pci_p2pmem_map_sg() function which applies the necessary transformation
> > to get the correct pci_bus_addr_t for the DMA transactions.
> > 
> > In the RDMA core, we also adjust rdma_rw_ctx_init() and
> > rdma_rw_ctx_destroy() to take a flags argument which indicates whether
> > to use the PCI P2P mapping functions or not.
> > 
> > Finally, in the NVMe fabrics target port we introduce a new
> > configuration boolean: 'allow_p2pmem'. When set, the port will attempt
> > to find P2P memory supported by the RDMA NIC and all namespaces. If
> > supported memory is found, it will be used in all IO transfers. And if
> > a port is using P2P memory, adding new namespaces that are not supported
> > by that memory will fail.
> > 
> > Logan Gunthorpe (10):
> >   PCI/P2PDMA: Support peer to peer memory
> >   PCI/P2PDMA: Add sysfs group to display p2pmem stats
> >   PCI/P2PDMA: Add PCI p2pmem dma mappings to adjust the bus offset
> >   PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches
> >   block: Introduce PCI P2P flags for request and request queue
> >   IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()
> >   nvme-pci: Use PCI p2pmem subsystem to manage the CMB
> >   nvme-pci: Add support for P2P memory in requests
> >   nvme-pci: Add a quirk for a pseudo CMB
> >   nvmet: Optionally use PCI P2P memory
> > 
> >  Documentation/ABI/testing/sysfs-bus-pci |  25 ++
> >  block/blk-core.c                        |   3 +
> >  drivers/infiniband/core/rw.c            |  21 +-
> >  drivers/infiniband/ulp/isert/ib_isert.c |   5 +-
> >  drivers/infiniband/ulp/srpt/ib_srpt.c   |   7 +-
> >  drivers/nvme/host/core.c                |   4 +
> >  drivers/nvme/host/nvme.h                |   8 +
> >  drivers/nvme/host/pci.c                 | 118 ++++--
> >  drivers/nvme/target/configfs.c          |  29 ++
> >  drivers/nvme/target/core.c              |  95 ++++-
> >  drivers/nvme/target/io-cmd.c            |   3 +
> >  drivers/nvme/target/nvmet.h             |  10 +
> >  drivers/nvme/target/rdma.c              |  43 +-
> >  drivers/pci/Kconfig                     |  20 +
> >  drivers/pci/Makefile                    |   1 +
> >  drivers/pci/p2pdma.c                    | 713 ++++++++++++++++++++++++++++++++
> >  drivers/pci/pci.c                       |   4 +
> >  include/linux/blk_types.h               |  18 +-
> >  include/linux/blkdev.h                  |   3 +
> >  include/linux/memremap.h                |  19 +
> >  include/linux/pci-p2pdma.h              | 105 +++++
> >  include/linux/pci.h                     |   4 +
> >  include/rdma/rw.h                       |   7 +-
> >  net/sunrpc/xprtrdma/svc_rdma_rw.c       |   6 +-
> >  24 files changed, 1204 insertions(+), 67 deletions(-)
> >  create mode 100644 drivers/pci/p2pdma.c
> >  create mode 100644 include/linux/pci-p2pdma.h
> > 
> > --
> > 2.11.0
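
The cover letter quoted above says that a PCI bus offset is stored in the
pagemap and that pci_p2pmem_map_sg() applies a transformation to obtain a
pci_bus_addr_t. The following is only a rough userspace sketch of that idea,
not code from the series. The struct and the function names (p2p_region,
p2p_bus_offset, p2p_phys_to_bus, p2p_map_segments) are hypothetical, and the
sign convention (offset = CPU physical address minus bus address) is an
assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-BAR data a pagemap might carry. */
struct p2p_region {
    uint64_t cpu_start; /* CPU physical address where the BAR is mapped */
    uint64_t bus_start; /* address of the same BAR as seen on the PCI bus */
    uint64_t len;       /* size of the BAR in bytes */
};

/*
 * Assumed convention: offset = CPU physical address - PCI bus address.
 * Unsigned arithmetic wraps modulo 2^64, so this also works when the
 * bus address is numerically larger than the CPU address.
 */
static uint64_t p2p_bus_offset(const struct p2p_region *r)
{
    return r->cpu_start - r->bus_start;
}

/*
 * Translate one CPU physical address inside the region to a bus address.
 * Returns 0 on success, -1 if the address lies outside the region.
 */
static int p2p_phys_to_bus(const struct p2p_region *r, uint64_t phys,
                           uint64_t *bus)
{
    if (phys < r->cpu_start || phys - r->cpu_start >= r->len)
        return -1;
    *bus = phys - p2p_bus_offset(r);
    return 0;
}

/*
 * Translate an array of segment start addresses, loosely mirroring a
 * walk over a scatterlist. Returns the number of segments translated
 * and stops at the first address that is outside the region.
 */
static size_t p2p_map_segments(const struct p2p_region *r,
                               const uint64_t *phys, uint64_t *bus, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (p2p_phys_to_bus(r, phys[i], &bus[i]) != 0)
            break;
    }
    return i;
}
```

A caller would fill in a p2p_region from the BAR's CPU and bus addresses
once, then pass each segment's physical address through
p2p_map_segments() before handing the resulting bus addresses to the
device. The range check is there because an address outside the BAR has
no meaningful bus address under this scheme.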
