From: Benjamin Herrenschmidt <benh@au1.ibm.com>
To: Logan Gunthorpe <logang@deltatee.com>,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
linux-nvdimm@lists.01.org, linux-block@vger.kernel.org
Cc: "Stephen Bates" <sbates@raithlin.com>,
"Christoph Hellwig" <hch@lst.de>, "Jens Axboe" <axboe@kernel.dk>,
"Keith Busch" <keith.busch@intel.com>,
"Sagi Grimberg" <sagi@grimberg.me>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Jason Gunthorpe" <jgg@mellanox.com>,
"Max Gurtovoy" <maxg@mellanox.com>,
"Dan Williams" <dan.j.williams@intel.com>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Oliver OHalloran" <oliveroh@au1.ibm.com>
Subject: Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory
Date: Thu, 01 Mar 2018 14:56:09 +1100
Message-ID: <1519876569.4592.4.camel@au1.ibm.com>
In-Reply-To: <1519876489.4592.3.camel@kernel.crashing.org>
On Thu, 2018-03-01 at 14:54 +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2018-02-28 at 16:39 -0700, Logan Gunthorpe wrote:
> > Hi Everyone,
>
>
> So Oliver (CC) was having issues getting any of that to work for us.
>
> The problem is that according to him (I didn't double check the latest
> patches) you effectively hotplug the PCIe memory into the system when
> creating struct pages.
>
> This cannot possibly work for us. First we cannot map PCIe memory as
> cacheable. (Note that doing so is a bad idea if you are behind a PLX
> switch anyway since you'd have to manage cache coherency in SW).
Note: I think the above means it won't work behind a switch on x86
either, will it?
> Then our MMIO space is so far away from our memory space that there is
> not enough vmemmap virtual space to be able to do that.
>
> So this can only work across architectures by using something like HMM
> to create special device struct pages.
>
> Ben.
>
>
> > Here's v2 of our series to introduce P2P based copy offload to NVMe
> > fabrics. This version has been rebased onto v4.16-rc3 which already
> > includes Christoph's devpagemap work the previous version was based
> > off as well as a couple of the cleanup patches that were in v1.
> >
> > Additionally, we've made the following changes based on feedback:
> >
> > * Renamed everything to 'p2pdma' per the suggestion from Bjorn as well
> > as a bunch of cleanup and spelling fixes he pointed out in the last
> > series.
> >
> > * To address Alex's ACS concerns, we change to a simpler method of
> > just disabling ACS behind switches for any kernel that has
> > CONFIG_PCI_P2PDMA.
> >
> > * We also reject using devices that employ 'dma_virt_ops' which should
> > fairly simply handle Jason's concerns that this work might break with
> > the HFI, QIB and rxe drivers that use the virtual ops to implement
> > their own special DMA operations.
> >
> > Thanks,
> >
> > Logan
> >
> > --
> >
> > This is a continuation of our work to enable using Peer-to-Peer PCI
> > memory in NVMe fabrics targets. Many thanks go to Christoph Hellwig who
> > provided valuable feedback to get these patches to where they are today.
> >
> > The concept here is to use memory that's exposed on a PCI BAR as
> > data buffers in the NVME target code such that data can be transferred
> > from an RDMA NIC to the special memory and then directly to an NVMe
> > device avoiding system memory entirely. The upside of this is better
> > QoS for applications running on the CPU utilizing memory and lower
> > PCI bandwidth required to the CPU (such that systems could be designed
> > with fewer lanes connected to the CPU). However, the trade-off is
> > currently a reduction in overall throughput (largely due to hardware
> > issues that would certainly improve in the future).
> >
> > Due to these trade-offs we've designed the system to only enable using
> > the PCI memory in cases where the NIC, NVMe devices and memory are all
> > behind the same PCI switch. This will mean many setups that could likely
> > work well will not be supported so that we can be more confident it
> > will work and not place any responsibility on the user to understand
> > their topology. (We chose to go this route based on feedback we
> > received at the last LSF). Future work may enable these transfers behind
> > a fabric of PCI switches or perhaps using a white list of known good
> > root complexes.
> >
> > In order to enable this functionality, we introduce a few new PCI
> > functions such that a driver can register P2P memory with the system.
> > Struct pages are created for this memory using devm_memremap_pages()
> > and the PCI bus offset is stored in the corresponding pagemap structure.
> >
> > Another set of functions allows a client driver to create a list of
> > client devices that will be used in a given P2P transaction and then
> > use that list to find any P2P memory that is supported by all the
> > client devices. This list is then also used to selectively disable the
> > ACS bits for the downstream ports behind these devices.
> >
> > In the block layer, we also introduce a P2P request flag to indicate a
> > given request targets P2P memory as well as a flag for a request queue
> > to indicate a given queue supports targeting P2P memory. P2P requests
> > will only be accepted by queues that support it. Also, P2P requests
> > are marked to not be merged since a non-homogeneous request would
> > complicate the DMA mapping requirements.
> >
> > In the PCI NVMe driver, we modify the existing CMB support to utilize
> > the new PCI P2P memory infrastructure and also add support for P2P
> > memory in its request queue. When a P2P request is received it uses the
> > pci_p2pmem_map_sg() function which applies the necessary transformation
> > to get the correct pci_bus_addr_t for the DMA transactions.
> >
> > In the RDMA core, we also adjust rdma_rw_ctx_init() and
> > rdma_rw_ctx_destroy() to take a flags argument which indicates whether
> > to use the PCI P2P mapping functions or not.
> >
> > Finally, in the NVMe fabrics target port we introduce a new
> > configuration boolean: 'allow_p2pmem'. When set, the port will attempt
> > to find P2P memory supported by the RDMA NIC and all namespaces. If
> > supported memory is found, it will be used in all IO transfers. And if
> > a port is using P2P memory, adding new namespaces that are not supported
> > by that memory will fail.
> >
> > Logan Gunthorpe (10):
> > PCI/P2PDMA: Support peer to peer memory
> > PCI/P2PDMA: Add sysfs group to display p2pmem stats
> > PCI/P2PDMA: Add PCI p2pmem dma mappings to adjust the bus offset
> > PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches
> > block: Introduce PCI P2P flags for request and request queue
> > IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()
> > nvme-pci: Use PCI p2pmem subsystem to manage the CMB
> > nvme-pci: Add support for P2P memory in requests
> > nvme-pci: Add a quirk for a pseudo CMB
> > nvmet: Optionally use PCI P2P memory
> >
> > Documentation/ABI/testing/sysfs-bus-pci | 25 ++
> > block/blk-core.c | 3 +
> > drivers/infiniband/core/rw.c | 21 +-
> > drivers/infiniband/ulp/isert/ib_isert.c | 5 +-
> > drivers/infiniband/ulp/srpt/ib_srpt.c | 7 +-
> > drivers/nvme/host/core.c | 4 +
> > drivers/nvme/host/nvme.h | 8 +
> > drivers/nvme/host/pci.c | 118 ++++--
> > drivers/nvme/target/configfs.c | 29 ++
> > drivers/nvme/target/core.c | 95 ++++-
> > drivers/nvme/target/io-cmd.c | 3 +
> > drivers/nvme/target/nvmet.h | 10 +
> > drivers/nvme/target/rdma.c | 43 +-
> > drivers/pci/Kconfig | 20 +
> > drivers/pci/Makefile | 1 +
> > drivers/pci/p2pdma.c | 713 ++++++++++++++++++++++++++++++++
> > drivers/pci/pci.c | 4 +
> > include/linux/blk_types.h | 18 +-
> > include/linux/blkdev.h | 3 +
> > include/linux/memremap.h | 19 +
> > include/linux/pci-p2pdma.h | 105 +++++
> > include/linux/pci.h | 4 +
> > include/rdma/rw.h | 7 +-
> > net/sunrpc/xprtrdma/svc_rdma_rw.c | 6 +-
> > 24 files changed, 1204 insertions(+), 67 deletions(-)
> > create mode 100644 drivers/pci/p2pdma.c
> > create mode 100644 include/linux/pci-p2pdma.h
> >
> > --
> > 2.11.0
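
For readers following the bus-offset part of the cover letter: the
translation that pci_p2pmem_map_sg() is described as performing can be
modelled in a few lines of userspace C. This is only a sketch under
stated assumptions, not the code from the series. struct p2p_region,
struct p2p_seg and p2p_map_segs() are hypothetical names, and bus_offset
is assumed to be the CPU physical address of the BAR minus its PCI bus
address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-region data kept with the pagemap. */
struct p2p_region {
	uint64_t phys_start;	/* CPU physical address of the BAR */
	uint64_t size;		/* length of the BAR in bytes */
	uint64_t bus_offset;	/* assumed: phys_start - PCI bus address */
};

/* Hypothetical scatterlist-like entry: physical address in, bus address out. */
struct p2p_seg {
	uint64_t phys;
	uint64_t len;
	uint64_t bus_addr;
};

/*
 * Fill in bus_addr for each segment by subtracting the region's bus
 * offset from the CPU physical address. Returns the number of segments
 * mapped, or -1 as soon as a segment falls outside the region (segments
 * before the failing one keep the bus_addr already written).
 */
int p2p_map_segs(const struct p2p_region *r, struct p2p_seg *segs, size_t n)
{
	assert(r != NULL && (segs != NULL || n == 0));

	for (size_t i = 0; i < n; i++) {
		uint64_t off;

		if (segs[i].phys < r->phys_start)
			return -1;

		off = segs[i].phys - r->phys_start;
		if (off > r->size || segs[i].len > r->size - off)
			return -1;

		segs[i].bus_addr = segs[i].phys - r->bus_offset;
	}

	return (int)n;
}
```

With a BAR at CPU physical address 0x200000000000 that the peer device
sees at bus address 0xe0000000, a segment 0x1000 bytes into the BAR maps
to bus address 0xe0001000. A peer doing DMA has to be handed that bus
address rather than the CPU physical one, which matters most on
platforms where the two address spaces are far apart.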