From: Bjorn Helgaas <helgaas@kernel.org>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
linux-nvdimm@lists.01.org, linux-block@vger.kernel.org,
"Stephen Bates" <sbates@raithlin.com>,
"Christoph Hellwig" <hch@lst.de>, "Jens Axboe" <axboe@kernel.dk>,
"Keith Busch" <keith.busch@intel.com>,
"Sagi Grimberg" <sagi@grimberg.me>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Jason Gunthorpe" <jgg@mellanox.com>,
"Max Gurtovoy" <maxg@mellanox.com>,
"Dan Williams" <dan.j.williams@intel.com>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
"Alex Williamson" <alex.williamson@redhat.com>
Subject: Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory
Date: Thu, 1 Mar 2018 17:00:32 -0600
Message-ID: <20180301230032.GA74737@bhelgaas-glaptop.roam.corp.google.com>
In-Reply-To: <b81503c2-9f07-b5f6-df06-64d1fdc53386@deltatee.com>
On Thu, Mar 01, 2018 at 11:55:51AM -0700, Logan Gunthorpe wrote:
> Hi Bjorn,
>
> Thanks for the review. I'll correct all the nits for the next version.
>
> On 01/03/18 10:37 AM, Bjorn Helgaas wrote:
> > On Wed, Feb 28, 2018 at 04:39:57PM -0700, Logan Gunthorpe wrote:
> > > Some PCI devices may have memory mapped in a BAR space that's
> > > intended for use in Peer-to-Peer transactions. In order to enable
> > > such transactions the memory must be registered with ZONE_DEVICE pages
> > > so it can be used by DMA interfaces in existing drivers.
>
> > Is there anything about this memory that makes it specifically
> > intended for peer-to-peer transactions? I assume the device can't
> > really tell whether a transaction is from a CPU or a peer.
>
> No, there's nothing special about the memory, and it can still be accessed
> by the CPU; this is just the intended purpose. You could use this PCI memory
> as regular DMA buffers or regular memory, but I'm not sure why you would. It
> would probably be pretty bad performance-wise.
>
>
> > BTW, maybe there could be some kind of guide for device driver writers
> > in Documentation/PCI/?
> Makes sense. We can look at writing something for the next iteration.
>
> > I think it would be clearer and sufficient to simply say that we have
> > no way to know whether peer-to-peer routing between PCIe Root Ports is
> > supported (PCIe r4.0, sec 1.3.1).
>
> Fair enough.
>
> > The fact that you use the PCIe term "switch" suggests that a PCIe
> > Switch is required, but isn't it sufficient for the peers to be below
> > the same "PCI bridge", which would include PCIe Root Ports, PCIe
> > Switch Downstream Ports, and conventional PCI bridges?
> > The comments at get_upstream_bridge_port() suggest that this isn't
> > enough, and the peers actually do have to be below the same PCIe
> > Switch, but I don't know why.
>
> I do mean Switch, as we need to keep the traffic off the root complex.
> As stated above, we don't know whether the root complex actually supports
> P2P routing (while we can be certain any PCIe switch does), so we
> specifically want to exclude PCIe Root Ports. I'm not sure about the
> support of conventional PCI bridges, but I can't imagine anyone wanting
> to do P2P around them, so I'd rather be safe than sorry and exclude them.
I don't think this is correct. A Root Port defines a hierarchy domain
(I'm looking at PCIe r4.0, sec 1.3.1). The capability to route
peer-to-peer transactions *between* hierarchy domains is optional. I
think this means a Root Complex is not required to route transactions
from one Root Port to another Root Port.
This doesn't say anything about routing between two different devices
below a Root Port. Those would be in the same hierarchy domain and
should follow the conventional PCI routing rules. Of course, since a
Root Port has one link that leads to one device, they would probably
be different functions of a single multi-function device, so I don't
know how practical it would be to test this.
> > This whole thing is confusing to me. Why do you want to reject peers
> > directly connected to the same root port? Why do you require the same
> > Switch Upstream Port? You don't exclude conventional PCI, but it
> > looks like you would require peers to share *two* upstream PCI-to-PCI
> > bridges? I would think a single shared upstream bridge (conventional,
> > PCIe Switch Downstream Port, or PCIe Root Port) would be sufficient?
>
> Hmm, yes, this may just be laziness on my part. Finding the shared upstream
> bridge is a bit more tricky than just showing that the peers are on the same
> switch, so as coded, a fabric of switches with peers on different legs of
> the fabric is not supported. But yes, maybe they just need to be two devices
> with a single shared upstream bridge that is not a Root Port. Again, we need
> to reject the Root Port because we can't know whether the root complex can
> support P2P traffic.
This sounds like the same issue as above, so we just need to resolve
that.
> > Since "pci_p2pdma_add_client()" includes "pci_" in its name, it seems
> > sort of weird that callers supply a non-PCI device and then we look up
> > a PCI device here. I assume you have some reason for this; if you
> > added a writeup in Documentation/PCI, that would be a good place to
> > elaborate on that, maybe with a one-line clue here.
>
> Well yes, but this is much more convenient for callers, which don't need to
> care whether the device they are attempting to add (which in the NVMe target
> case could be a random block device) is a PCI device or not, especially
> seeing as find_parent_pci_dev() is non-trivial.
OK. I accept that it might be convenient, but I still think it leads
to a weird API. Maybe that's OK; I don't know enough about the
scenario in the caller to do anything more than say "hmm, strange".
> > > +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
> > > +{
> > > + void *ret;
> > > +
> > > + if (unlikely(!pdev->p2pdma))
> >
> > Is this a hot path? I'm not sure it's worth cluttering
> > non-performance paths with likely/unlikely.
>
> I'd say it can be pretty hot given that we need to allocate and free buffers
> at multiple GB/s for the NVMe target case. I don't exactly have benchmarks
> or anything to show this though...
OK, I'll take your word for that. I'm fine with this as long as it is
performance-sensitive.
Bjorn