linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Logan Gunthorpe <logang@deltatee.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-block@vger.kernel.org,
	"Stephen Bates" <sbates@raithlin.com>,
	"Christoph Hellwig" <hch@lst.de>, "Jens Axboe" <axboe@kernel.dk>,
	"Keith Busch" <keith.busch@intel.com>,
	"Sagi Grimberg" <sagi@grimberg.me>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Jason Gunthorpe" <jgg@mellanox.com>,
	"Max Gurtovoy" <maxg@mellanox.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
	"Alex Williamson" <alex.williamson@redhat.com>
Subject: Re: [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory
Date: Thu, 1 Mar 2018 11:55:51 -0700	[thread overview]
Message-ID: <b81503c2-9f07-b5f6-df06-64d1fdc53386@deltatee.com> (raw)
In-Reply-To: <20180301173752.GE13722@bhelgaas-glaptop.roam.corp.google.com>

Hi Bjorn,

Thanks for the review. I'll correct all the nits for the next version.

On 01/03/18 10:37 AM, Bjorn Helgaas wrote:
> On Wed, Feb 28, 2018 at 04:39:57PM -0700, Logan Gunthorpe wrote:
>> Some PCI devices may have memory mapped in a BAR space that's
>> intended for use in Peer-to-Peer transactions. In order to enable
>> such transactions the memory must be registered with ZONE_DEVICE pages
>> so it can be used by DMA interfaces in existing drivers.

> Is there anything about this memory that makes it specifically
> intended for peer-to-peer transactions?  I assume the device can't
> really tell whether a transaction is from a CPU or a peer.

No there's nothing special about the memory and it can still be accessed 
by the CPU. This is just the intended purpose. You could use this PCI 
memory as regular DMA buffers or regular memory but I'm not sure why you 
would. It would probably be pretty bad performance-wise.


> BTW, maybe there could be some kind of guide for device driver writers
> in Documentation/PCI/?
Makes sense we can look at writing something for the next iteration.

> I think it would be clearer and sufficient to simply say that we have
> no way to know whether peer-to-peer routing between PCIe Root Ports is
> supported (PCIe r4.0, sec 1.3.1).

Fair enough.

> The fact that you use the PCIe term "switch" suggests that a PCIe
> Switch is required, but isn't it sufficient for the peers to be below
> the same "PCI bridge", which would include PCIe Root Ports, PCIe
> Switch Downstream Ports, and conventional PCI bridges?
> The comments at get_upstream_bridge_port() suggest that this isn't
> enough, and the peers actually do have to be below the same PCIe
> Switch, but I don't know why.

I do mean Switch as we do need to keep the traffic off the root complex. 
Seeing, as stated above, we don't know if it actually support it. (While 
we can be certain any PCI switch does). So we specifically want to 
exclude PCIe Root ports and I'm not sure about the support of PCI 
bridges but I can't imagine anyone wanting to do P2P around them so I'd 
rather be safe than sorry and exclude them.


>> +	addr = devm_memremap_pages(&pdev->dev, pgmap);
>> +	if (IS_ERR(addr))
> 
> Free pgmap here?  And in the other error case below?  Or maybe this
> happens via the devm_* magic?  If so, when would that actually happen?
> Would pgmap be effectively leaked until the pdev is destroyed?

Yes, it happens via the devm magic as that's the way the 
devm_memremap_pages() interface was designed. If I remember correctly, 
in my testing, it would be de-allocated when the driver gets unbound.

>> +		return PTR_ERR(addr);
>> +
>> +	error = gen_pool_add_virt(pdev->p2pdma->pool, (uintptr_t)addr,
>> +			pci_bus_address(pdev, bar) + offset,
>> +			resource_size(&pgmap->res), dev_to_node(&pdev->dev));
>> +	if (error)
>> +		return error;
>> +
>> +	error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_percpu_kill,
>> +					  &pdev->p2pdma->devmap_ref);
>> +	if (error)
>> +		return error;
>> +
>> +	dev_info(&pdev->dev, "added peer-to-peer DMA memory %pR\n",
>> +		 &pgmap->res);
> 
> s/dev_info/pci_info/ (also similar usages below, except for the one or
> two cases where you don't have a pci_dev).

Oh, nice, I didn't notice that was added.

> This whole thing is confusing to me.  Why do you want to reject peers
> directly connected to the same root port?  Why do you require the same
> Switch Upstream Port?  You don't exclude conventional PCI, but it
> looks like you would require peers to share *two* upstream PCI-to-PCI
> bridges?  I would think a single shared upstream bridge (conventional,
> PCIe Switch Downstream Port, or PCIe Root Port) would be sufficient?

Hmm, yes, this may just be laziness on my part. Finding the shared 
upstream bridge is a bit more tricky than just showing that they are on 
the same switch. So as coded, a fabric of switches with peers on 
different legs of the fabric are not supported. But yes, maybe they just 
need to be two devices with a single shared upstream bridge that is not 
the root port. Again, we need to reject the root port because we can't 
know if the root complex can support P2P traffic.

> Since "pci_p2pdma_add_client()" includes "pci_" in its name, it seems
> sort of weird that callers supply a non-PCI device and then we look up
> a PCI device here.  I assume you have some reason for this; if you
> added a writeup in Documentation/PCI, that would be a good place to
> elaborate on that, maybe with a one-line clue here.

Well yes, but this is much more convenient for callers which don't need 
to care if the device they are attempting to add (which in the NVMe 
target case, could be a random block device) is a pci device or not. 
Especially seeing find_parent_pci_dev() is non-trivial.

>> +void *pci_alloc_p2pmem(struct pci_dev *pdev, size_t size)
>> +{
>> +	void *ret;
>> +
>> +	if (unlikely(!pdev->p2pdma))
> 
> Is this a hot path?  I'm not sure it's worth cluttering
> non-performance paths with likely/unlikely.

I'd say it can be pretty hot given that we need to allocate and free 
buffers at multiple GB/s for the NVMe target case. I don't exactly have 
benchmarks or anything to show this though...

>> +		return NULL;
>> +
>> +	if (unlikely(!percpu_ref_tryget_live(&pdev->p2pdma->devmap_ref)))
>> +		return NULL;
>> +
>> +	ret = (void *)(uintptr_t)gen_pool_alloc(pdev->p2pdma->pool, size);
> 
> Why the double cast?  Wouldn't "(void *)" be sufficient?

Oh, hmm, I can't remember now. I suspect I added it to fix a kbuild test 
robot error. I'll check and see if it can be removed.


> In v4.6-rc1, gen_pool_free() takes "unsigned long addr".  I know this
> is based on -rc3; is this something that changed between -rc1 and
> -rc3?

Yes, I think it's taken an unsigned long for a while now. Per the above, 
I can't remember exactly why I casted it this way and I'll take a look 
again.

Logan

  reply	other threads:[~2018-03-01 18:56 UTC|newest]

Thread overview: 124+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-28 23:39 [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory Logan Gunthorpe
2018-02-28 23:39 ` [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory Logan Gunthorpe
2018-03-01 17:37   ` Bjorn Helgaas
2018-03-01 18:55     ` Logan Gunthorpe [this message]
2018-03-01 23:00       ` Bjorn Helgaas
2018-03-01 23:06         ` Logan Gunthorpe
2018-03-01 23:14           ` Stephen  Bates
2018-03-01 23:45             ` Bjorn Helgaas
2018-02-28 23:39 ` [PATCH v2 02/10] PCI/P2PDMA: Add sysfs group to display p2pmem stats Logan Gunthorpe
2018-03-01 17:44   ` Bjorn Helgaas
2018-03-02  0:15     ` Logan Gunthorpe
2018-03-02  0:36       ` Dan Williams
2018-03-02  0:37         ` Logan Gunthorpe
2018-02-28 23:39 ` [PATCH v2 03/10] PCI/P2PDMA: Add PCI p2pmem dma mappings to adjust the bus offset Logan Gunthorpe
2018-03-01 17:49   ` Bjorn Helgaas
2018-03-01 19:36     ` Logan Gunthorpe
2018-02-28 23:40 ` [PATCH v2 04/10] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches Logan Gunthorpe
2018-03-01 18:02   ` Bjorn Helgaas
2018-03-01 18:54     ` Stephen  Bates
2018-03-01 21:21       ` Alex Williamson
2018-03-01 21:26         ` Logan Gunthorpe
2018-03-01 21:32         ` Stephen  Bates
2018-03-01 21:35           ` Jerome Glisse
2018-03-01 21:37             ` Logan Gunthorpe
2018-03-01 23:15       ` Bjorn Helgaas
2018-03-01 23:59         ` Logan Gunthorpe
2018-03-01 19:13     ` Logan Gunthorpe
2018-03-05 22:28       ` Bjorn Helgaas
2018-03-05 23:01         ` Logan Gunthorpe
2018-02-28 23:40 ` [PATCH v2 05/10] block: Introduce PCI P2P flags for request and request queue Logan Gunthorpe
2018-03-01 11:08   ` Sagi Grimberg
2018-02-28 23:40 ` [PATCH v2 06/10] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]() Logan Gunthorpe
2018-03-01 10:32   ` Sagi Grimberg
2018-03-01 17:16     ` Logan Gunthorpe
2018-02-28 23:40 ` [PATCH v2 07/10] nvme-pci: Use PCI p2pmem subsystem to manage the CMB Logan Gunthorpe
2018-03-05  1:33   ` Oliver
2018-03-05 16:00     ` Keith Busch
2018-03-05 17:10       ` Logan Gunthorpe
2018-03-05 18:02         ` Sinan Kaya
2018-03-05 18:09           ` Logan Gunthorpe
2018-03-06  0:49         ` Oliver
2018-03-06  1:14           ` Logan Gunthorpe
2018-03-06 10:40             ` Oliver
2018-03-05 19:57       ` Sagi Grimberg
2018-03-05 20:10         ` Jason Gunthorpe
2018-03-05 20:16           ` Logan Gunthorpe
2018-03-05 20:42           ` Keith Busch
2018-03-05 20:50             ` Jason Gunthorpe
2018-03-05 20:13         ` Logan Gunthorpe
2018-02-28 23:40 ` [PATCH v2 08/10] nvme-pci: Add support for P2P memory in requests Logan Gunthorpe
2018-03-01 11:07   ` Sagi Grimberg
2018-03-01 15:58     ` Stephen  Bates
2018-03-09  5:08       ` Bart Van Assche
2018-02-28 23:40 ` [PATCH v2 09/10] nvme-pci: Add a quirk for a pseudo CMB Logan Gunthorpe
2018-03-01 11:03   ` Sagi Grimberg
2018-02-28 23:40 ` [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory Logan Gunthorpe
2018-03-01 11:03   ` Sagi Grimberg
2018-03-01 16:15     ` Stephen  Bates
2018-03-01 17:40     ` Logan Gunthorpe
2018-03-01 18:35       ` Sagi Grimberg
2018-03-01 18:42         ` Jason Gunthorpe
2018-03-01 19:01           ` Stephen  Bates
2018-03-01 19:27           ` Logan Gunthorpe
2018-03-01 22:45             ` Jason Gunthorpe
2018-03-01 22:56               ` Logan Gunthorpe
2018-03-01 23:00               ` Stephen  Bates
2018-03-01 23:20                 ` Jason Gunthorpe
2018-03-01 23:29                   ` Logan Gunthorpe
2018-03-01 23:32                   ` Stephen  Bates
2018-03-01 23:49                 ` Keith Busch
2018-03-01 23:52                   ` Logan Gunthorpe
2018-03-01 23:53                   ` Stephen  Bates
2018-03-02 15:53                     ` Christoph Hellwig
2018-03-02 20:51                       ` Stephen  Bates
2018-03-01 23:57                   ` Stephen  Bates
2018-03-02  0:03                     ` Logan Gunthorpe
2018-03-02 16:18                     ` Jason Gunthorpe
2018-03-02 17:10                       ` Logan Gunthorpe
2018-03-01 19:10         ` Logan Gunthorpe
2018-03-01  3:54 ` [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory Benjamin Herrenschmidt
2018-03-01  3:56   ` Benjamin Herrenschmidt
2018-03-01 18:04     ` Logan Gunthorpe
2018-03-01 20:29       ` Benjamin Herrenschmidt
2018-03-01 20:55         ` Jerome Glisse
2018-03-01 21:03           ` Logan Gunthorpe
2018-03-01 21:10             ` Jerome Glisse
2018-03-01 21:15               ` Logan Gunthorpe
2018-03-01 21:25                 ` Jerome Glisse
2018-03-01 21:37               ` Stephen  Bates
2018-03-02 21:38               ` Stephen  Bates
2018-03-02 22:09                 ` Jerome Glisse
2018-03-05 20:36                   ` Stephen  Bates
2018-03-01 20:55         ` Logan Gunthorpe
2018-03-01 18:09     ` Stephen  Bates
2018-03-01 20:32       ` Benjamin Herrenschmidt
2018-03-01 19:21     ` Dan Williams
2018-03-01 19:30       ` Logan Gunthorpe
2018-03-01 20:34       ` Benjamin Herrenschmidt
2018-03-01 20:40         ` Benjamin Herrenschmidt
2018-03-01 20:53           ` Jason Gunthorpe
2018-03-01 20:57             ` Logan Gunthorpe
2018-03-01 22:06             ` Benjamin Herrenschmidt
2018-03-01 22:31               ` Linus Torvalds
2018-03-01 22:34                 ` Benjamin Herrenschmidt
2018-03-02 16:22                   ` Kani, Toshi
2018-03-02 16:57                     ` Linus Torvalds
2018-03-02 17:34                       ` Linus Torvalds
2018-03-02 17:38                       ` Kani, Toshi
2018-03-01 21:37         ` Dan Williams
2018-03-01 21:45           ` Logan Gunthorpe
2018-03-01 21:57             ` Logan Gunthorpe
2018-03-01 23:00               ` Benjamin Herrenschmidt
2018-03-01 23:19                 ` Logan Gunthorpe
2018-03-01 23:25                   ` Benjamin Herrenschmidt
2018-03-02 21:44                     ` Benjamin Herrenschmidt
2018-03-02 22:24                       ` Logan Gunthorpe
2018-03-01 23:26                   ` Benjamin Herrenschmidt
2018-03-01 23:54                     ` Logan Gunthorpe
2018-03-01 21:03       ` Benjamin Herrenschmidt
2018-03-01 21:11         ` Logan Gunthorpe
2018-03-01 21:18           ` Jerome Glisse
2018-03-01 21:22             ` Logan Gunthorpe
2018-03-01 10:31 ` Sagi Grimberg
2018-03-01 19:33   ` Logan Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b81503c2-9f07-b5f6-df06-64d1fdc53386@deltatee.com \
    --to=logang@deltatee.com \
    --cc=alex.williamson@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=hch@lst.de \
    --cc=helgaas@kernel.org \
    --cc=jgg@mellanox.com \
    --cc=jglisse@redhat.com \
    --cc=keith.busch@intel.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=maxg@mellanox.com \
    --cc=sagi@grimberg.me \
    --cc=sbates@raithlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).