linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Logan Gunthorpe <logang@deltatee.com>
To: John Hubbard <jhubbard@nvidia.com>,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-mm@kvack.org, iommu@lists.linux-foundation.org
Cc: "Stephen Bates" <sbates@raithlin.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Christian König" <christian.koenig@amd.com>,
	"Don Dutile" <ddutile@redhat.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Jakowski Andrzej" <andrzej.jakowski@intel.com>,
	"Minturn Dave B" <dave.b.minturn@intel.com>,
	"Jason Ekstrand" <jason@jlekstrand.net>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Xiong Jianxin" <jianxin.xiong@intel.com>,
	"Bjorn Helgaas" <helgaas@kernel.org>,
	"Ira Weiny" <ira.weiny@intel.com>,
	"Robin Murphy" <robin.murphy@arm.com>
Subject: Re: [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps
Date: Mon, 3 May 2021 10:08:34 -0600	[thread overview]
Message-ID: <d4091d87-7d9e-8cde-4e1c-01b877b6785f@deltatee.com> (raw)
In-Reply-To: <d6220bff-83fc-6c03-76f7-32e9e00e40fd@nvidia.com>



On 2021-05-01 11:35 p.m., John Hubbard wrote:
> On 4/8/21 10:01 AM, Logan Gunthorpe wrote:
>> In order to use upstream_bridge_distance_warn() from a dma_map function,
>> it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it
>> might sleep.
>>
>> In order to avoid this, try to get the host bridge's device from
>> bus->self, and if that is not set, just get the first element in the
>> device list. It should be impossible for the host bridge's device to
>> go away while references are held on child devices, so the first element
>> should not be able to change and, thus, this should be safe.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> ---
>>   drivers/pci/p2pdma.c | 14 ++++++++++++--
>>   1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
>> index bd89437faf06..473a08940fbc 100644
>> --- a/drivers/pci/p2pdma.c
>> +++ b/drivers/pci/p2pdma.c
>> @@ -311,16 +311,26 @@ static const struct pci_p2pdma_whitelist_entry {
>>   static bool __host_bridge_whitelist(struct pci_host_bridge *host,
>>   				    bool same_host_bridge)
>>   {
>> -	struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
>>   	const struct pci_p2pdma_whitelist_entry *entry;
>> +	struct pci_dev *root = host->bus->self;
>>   	unsigned short vendor, device;
>>   
>> +	/*
>> +	 * This makes the assumption that the first device on the bus is the
>> +	 * bridge itself and it has the devfn of 00.0. This assumption should
>> +	 * hold for the devices in the white list above, and if there are cases
>> +	 * where this isn't true they will have to be dealt with when such a
>> +	 * case is added to the whitelist.
> 
> Actually, it makes the assumption that the first device *in the list*
> (the host->bus-devices list) is 00.0.  The previous code made the
> assumption that you wrote.

The comment notes two assumptions (although the grammar is poor, which I
will fix). Yes, the previous code made the second assumption, the new
code makes both assumptions.

> By the way, pre-existing code comment: pci_p2pdma_whitelist[] seems
> really short. From a naive point of view, I'd expect that there must be
> a lot more CPUs/chipsets that can do pci p2p, what do you think? I
> wonder if we have to be so super strict, anyway. It just seems extremely
> limited, and I suspect there will be some additions to the list as soon
> as we start to use this.

Yes, well unfortunately we have no other way to determine what host
bridges can communicate with P2P. We settled on a whitelist when the
series was first patch. Nobody likes that situation, but nobody has
found anything better. We've been hoping standards bodies would give us
a flag but I haven't heard anything about that. At least AMD has been
able to guarantee us that all CPUs newer than Zen will support so that
covers a large swath. It would be nice if we could say something similar
for Intel.

> OK, yes this avoids taking the pci_bus_sem, but it's kind of cheating.
> Why is it OK to avoid taking any locks in order to retrieve the
> first entry from the list, but in order to retrieve any other entry, you
> have to aquire the pci_bus_sem, and get a reference as well? Something
> is inconsistent there.
> 
> The new version here also no longer takes a reference on the device,
> which is also cheating. But I'm guessing that the unstated assumption
> here is that there is always at least one entry in the list. But if
> that's true, then it's better to show clearly that assumption, instead
> of hiding it in an implicit call that skips both locking and reference
> counting.

Because we hold a reference to a child device of the bus. So the host
bus device can't go away until the child device has been released. An
earlier version of the P2PDMA patchset had a lot more extraneous get
device calls until someone else pointed this out.

> You could add a new function, which is a cut-down version of pci_get_slot(),
> like this, and call this from __host_bridge_whitelist():
> 
> /*
>   * A special purpose variant of pci_get_slot() that doesn't take the pci_bus_sem
>   * lock, and only looks for the 00.0 bus-device-function. Once the PCI bus is
>   * up, it is safe to call this, because there will always be a top-level PCI
>   * root device.
>   *
>   * Other assumptions: the root device is the first device in the list, and the
>   * root device is numbered 00.0.
>   */
> struct pci_dev *pci_get_root_slot(struct pci_bus *bus)
> {
> 	struct pci_dev *root;
> 	unsigned devfn = PCI_DEVFN(0, 0);
> 
> 	root = list_first_entry_or_null(&bus->devices, struct pci_dev,
> 					bus_list);
> 	if (root->devfn == devfn)
> 		goto out;
> 
> 	root = NULL;
>   out:
> 	pci_dev_get(root);
> 	return root;
> }
> EXPORT_SYMBOL(pci_get_root_slot);
> 
> ...I think that's a lot clearer to the reader, about what's going on here.

Per above, I think the reference count is unnecessary. But I could wrap
it in a static function for clarity. (There's no reason to export this
function).

> Note that I'm not really sure if it *is* safe, I would need to ask other
> PCIe subsystem developers with more experience. But I don't think anyone
> is trying to make p2pdma calls so early that PCIe buses are uninitialized.

Yeah, it's impossible to make a p2pdma call before the PCIe bus is
initialized. They have to have access to at least one PCI device before
they can even attempt it.

Logan

  reply	other threads:[~2021-05-03 16:08 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-08 17:01 [PATCH 00/16] Add new DMA mapping operation for P2PDMA Logan Gunthorpe
2021-04-08 17:01 ` [PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn() Logan Gunthorpe
2021-05-02  3:58   ` John Hubbard
2021-05-03 15:57     ` Logan Gunthorpe
2021-05-03 18:17       ` John Hubbard
2021-05-03 18:20         ` Logan Gunthorpe
2021-05-03 18:23           ` John Hubbard
2021-05-03 18:24         ` Christoph Hellwig
2021-05-11 16:05     ` Don Dutile
2021-05-11 16:12       ` Logan Gunthorpe
2021-05-11 16:23         ` Don Dutile
2021-04-08 17:01 ` [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps Logan Gunthorpe
2021-05-02  5:35   ` John Hubbard
2021-05-03 16:08     ` Logan Gunthorpe [this message]
2021-05-03 18:20       ` John Hubbard
2021-05-03 18:25       ` Christoph Hellwig
2021-05-11 16:05     ` Don Dutile
2021-05-11 16:16       ` Logan Gunthorpe
2021-05-11 16:05   ` Don Dutile
2021-05-11 16:14     ` Logan Gunthorpe
2021-04-08 17:01 ` [PATCH 03/16] PCI/P2PDMA: Attempt to set map_type if it has not been set Logan Gunthorpe
2021-05-02 19:58   ` John Hubbard
2021-05-03 16:17     ` Logan Gunthorpe
2021-05-03 18:22       ` John Hubbard
2021-05-03 18:35       ` Christoph Hellwig
2021-05-03 18:46         ` Logan Gunthorpe
2021-05-11 16:05     ` Don Dutile
2021-04-08 17:01 ` [PATCH 04/16] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pagmap and device Logan Gunthorpe
2021-05-02 20:41   ` John Hubbard
2021-05-03 16:30     ` Logan Gunthorpe
2021-05-03 18:31       ` John Hubbard
2021-05-03 18:56         ` Logan Gunthorpe
2021-05-03 21:54           ` John Hubbard
2021-05-03 22:57             ` Jason Gunthorpe
2021-05-03 23:40               ` John Hubbard
2021-04-08 17:01 ` [PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma() Logan Gunthorpe
2021-04-27 19:22   ` Jason Gunthorpe
2021-04-27 22:49     ` Logan Gunthorpe
2021-04-27 19:31   ` Jason Gunthorpe
2021-04-27 22:55     ` Logan Gunthorpe
2021-04-27 23:01       ` Jason Gunthorpe
2021-05-03 18:28         ` Christoph Hellwig
2021-05-03 18:31           ` Logan Gunthorpe
2021-05-02 21:23   ` John Hubbard
2021-05-03 16:38     ` Logan Gunthorpe
2021-05-11 16:05   ` Don Dutile
2021-04-08 17:01 ` [PATCH 06/16] lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL Logan Gunthorpe
2021-05-02 22:34   ` John Hubbard
2021-04-08 17:01 ` [PATCH 07/16] PCI/P2PDMA: Make pci_p2pdma_map_type() non-static Logan Gunthorpe
2021-05-02 22:44   ` John Hubbard
2021-05-03 16:39     ` Logan Gunthorpe
2021-05-11 16:06   ` Don Dutile
2021-05-11 16:17     ` Logan Gunthorpe
2021-04-08 17:01 ` [PATCH 08/16] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations Logan Gunthorpe
2021-05-02 22:52   ` John Hubbard
2021-05-03  0:50   ` John Hubbard
2021-05-03 17:15     ` Logan Gunthorpe
2021-04-08 17:01 ` [PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg Logan Gunthorpe
2021-04-27 19:33   ` Jason Gunthorpe
2021-04-27 19:40     ` Jason Gunthorpe
2021-04-27 22:56       ` Logan Gunthorpe
2021-05-02 23:28   ` John Hubbard
2021-05-02 23:32     ` John Hubbard
2021-05-03 17:06       ` Logan Gunthorpe
2021-05-03 16:55     ` Logan Gunthorpe
2021-05-04  0:12       ` John Hubbard
2021-05-03 17:04     ` Logan Gunthorpe
2021-05-04  0:01       ` John Hubbard
2021-04-08 17:01 ` [PATCH 10/16] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support Logan Gunthorpe
2021-05-03  0:32   ` John Hubbard
2021-05-03 17:09     ` Logan Gunthorpe
2021-05-11 16:06   ` Don Dutile
2021-05-11 16:19     ` Logan Gunthorpe
2021-04-08 17:01 ` [PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg Logan Gunthorpe
2021-04-27 19:43   ` Jason Gunthorpe
2021-04-27 22:59     ` Logan Gunthorpe
2021-05-03  1:14   ` John Hubbard
2021-05-06 23:59     ` Logan Gunthorpe
2021-05-11 16:06   ` Don Dutile
2021-05-11 16:35     ` Logan Gunthorpe
2021-04-08 17:01 ` [PATCH 12/16] nvme-pci: Check DMA ops when indicating support for PCI P2PDMA Logan Gunthorpe
2021-05-03  1:29   ` John Hubbard
2021-05-03 17:17     ` Logan Gunthorpe
2021-05-04  0:17       ` John Hubbard
2021-04-08 17:01 ` [PATCH 13/16] nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages Logan Gunthorpe
2021-05-03  1:34   ` John Hubbard
2021-05-03 17:19     ` Logan Gunthorpe
2021-05-04  0:26       ` John Hubbard
2021-04-08 17:01 ` [PATCH 14/16] nvme-rdma: Ensure dma support when using p2pdma Logan Gunthorpe
2021-04-27 19:47   ` Jason Gunthorpe
2021-04-27 22:59     ` Logan Gunthorpe
2021-05-03  1:37   ` John Hubbard
2021-04-08 17:01 ` [PATCH 15/16] RDMA/rw: use dma_map_sg_p2pdma() Logan Gunthorpe
2021-04-08 17:01 ` [PATCH 16/16] PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg() Logan Gunthorpe
2021-04-27 19:28 ` [PATCH 00/16] Add new DMA mapping operation for P2PDMA Jason Gunthorpe
2021-04-27 20:21   ` John Hubbard
2021-04-27 20:48     ` Dan Williams
2021-05-02  1:22 ` John Hubbard
2021-05-11 16:05 ` Don Dutile

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d4091d87-7d9e-8cde-4e1c-01b877b6785f@deltatee.com \
    --to=logang@deltatee.com \
    --cc=andrzej.jakowski@intel.com \
    --cc=christian.koenig@amd.com \
    --cc=dan.j.williams@intel.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=dave.b.minturn@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=ddutile@redhat.com \
    --cc=hch@lst.de \
    --cc=helgaas@kernel.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=ira.weiny@intel.com \
    --cc=jason@jlekstrand.net \
    --cc=jgg@ziepe.ca \
    --cc=jhubbard@nvidia.com \
    --cc=jianxin.xiong@intel.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=sbates@raithlin.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).