LinuxPPC-Dev Archive on lore.kernel.org
 help / color / Atom feed
From: Alistair Popple <alistair@popple.id.au>
To: linuxppc-dev@lists.ozlabs.org
Cc: Andrew Donnellan <ajd@linux.ibm.com>,
	Alexey Kardashevskiy <aik@ozlabs.ru>,
	groug@kaod.org, clombard@linux.ibm.com,
	andrew.donnellan@au1.ibm.com,
	Frederic Barrat <fbarrat@linux.ibm.com>,
	Reza Arbab <arbab@linux.ibm.com>,
	alastair@au1.ibm.com
Subject: Re: [PATCH 01/11] powerpc/powernv/ioda: Fix ref count for devices with their own PE
Date: Mon, 18 Nov 2019 12:04:53 +1100
Message-ID: <7553103.1BsQ81z3e0@townsend> (raw)
In-Reply-To: <e8cb8ac1-b54e-ce15-e1ea-6131b7386342@linux.ibm.com>

On Wednesday, 13 November 2019 4:38:21 AM AEDT Frederic Barrat wrote:
> 
> Le 27/09/2019 à 08:54, Alexey Kardashevskiy a écrit :
> > 
> > 
> > On 27/09/2019 03:15, Frederic Barrat wrote:
> >>
> >>
> >> Le 26/09/2019 à 18:44, Andrew Donnellan a écrit :
> >>> On 9/9/19 5:45 pm, Frederic Barrat wrote:
> >>>> Taking a reference on the pci_dev structure was required with initial
> >>>> commit 184cd4a3b962 ("powerpc/powernv: PCI support for p7IOC under
> >>>> OPAL v2"), where we were storing the pci_dev in the pci_dn structure.
> >>>>
> >>>> However, the pci_dev was later removed from the pci_dn structure, but
> >>>> the reference was kept. See commit 902bdc57451c ("powerpc/powernv/idoa:
> >>>> Remove unnecessary pcidev from pci_dn").
> >>>>
> >>>> The pnv_ioda_pe structure life cycle is the same as the pci_dev
> >>>> structure, the PE is freed when the device is released. So we don't
> >>>> need a reference for the pci_dev stored in the PE, otherwise the
> >>>> pci_dev will never be released. Which is not really a surprise as the
> >>>> comment (removed here as no longer needed) was stating as much.
> >>>>
> >>>> Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev 
from pci_dn")
> >>>> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> >>>> ---
> >>>>    arch/powerpc/platforms/powernv/pci-ioda.c | 11 +----------
> >>>>    1 file changed, 1 insertion(+), 10 deletions(-)
> >>>>
> >>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/
platforms/powernv/pci-ioda.c
> >>>> index d8080558d020..92767f006f20 100644
> >>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>> @@ -1062,14 +1062,6 @@ static struct pnv_ioda_pe 
*pnv_ioda_setup_dev_PE(struct pci_dev *dev)
> >>>>            return NULL;
> >>>>        }
> >>>> -    /* NOTE: We get only one ref to the pci_dev for the pdn, not for 
the
> >>>> -     * pointer in the PE data structure, both should be destroyed at 
the
> >>>> -     * same time. However, this needs to be looked at more closely 
again
> >>>> -     * once we actually start removing things (Hotplug, SR-IOV, ...)
> >>>> -     *
> >>>> -     * At some point we want to remove the PDN completely anyways
> >>>> -     */
> >>>> -    pci_dev_get(dev);
> >>>>        pdn->pe_number = pe->pe_number;
> >>>>        pe->flags = PNV_IODA_PE_DEV;
> >>>>        pe->pdev = dev;
> >>>> @@ -1084,7 +1076,6 @@ static struct pnv_ioda_pe 
*pnv_ioda_setup_dev_PE(struct pci_dev *dev)
> >>>>            pnv_ioda_free_pe(pe);
> >>>>            pdn->pe_number = IODA_INVALID_PE;
> >>>>            pe->pdev = NULL;
> >>>> -        pci_dev_put(dev);
> >>>>            return NULL;
> >>>>        }
> >>>> @@ -1228,7 +1219,7 @@ static struct pnv_ioda_pe 
*pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
> >>>>                 */
> >>>>                dev_info(&npu_pdev->dev,
> >>>>                    "Associating to existing PE %x\n", pe_num);
> >>>> -            pci_dev_get(npu_pdev);
> >>>> +            pci_dev_get(npu_pdev); // still needed after 902bdc57451c 
?
> >>>
> >>> Did you mean to leave that comment in?
> >>
> >> Yes, I assumed the series wouldn't get in on the first try and a nvlink-
minded developer would weigh in :)
> > 
> > I am looking at it and I am still not so sure what exactly guarantees that 
lifetime(@dev) == lifetime(@pe). For example,
> > sriov_disable() removes VFs first, and only then releases PEs so there is a 
window when we may have a stale pdev. Not sure.
> 
> 
> Indeed, for SRIOV, PE life-cycle is handled differently. And hopefully 
> correctly, but I don’t think this patch alters it.
> 
> I was discussing with Greg about it, as he had to look in that area in 
> the past. My understanding is that this patch is ok as it follows the 
> initial comment in pnv_ioda_setup_dev_PE() that we don’t need to get a 
> reference for the PE, it makes sense and I could trace the origin of the 
> pci_dev_get() and it seemed like it should have been removed in a 
> previous patch (902bdc57451c).
> 
> However, one question is whether this patch breaks nvlink and if nvlink 
> assumes the devices won’t go away because we explicitly take a reference 
> forever. In npu_dma.c, there are 2 functions which allow to find the GPU 
> associated to a npu device, and vice-versa. Both functions return a 
> pointer to a struct pci_dev, but they don’t take a reference on the 
> device being returned. So that seems dangerous. I’m probably missing 
> something.
> 
> Alexey, Alistair: what, if anything, guarantees, that the npu or gpu 
> devices stay valid. Is it because we simply don’t provide any means to 
> get rid of them ? Otherwise, don’t we need the callers of 
> pnv_pci_get_gpu_dev() and pnv_pci_get_npu_dev() to worry about reference 
> counting ? I’ve started looking into it and the changes are scary, which 
> explains Greg’s related commit 02c5f5394918b.

To be honest the reference counting looks like it has evolved into something 
quite suspect and I don't think you're missing anything. In practice though we 
likely haven't hit any issues because the original callers didn't store 
references to the pdev which would make the window quite small (although the 
pass through work may have changed that). And as you say there simply wasn't 
any means to get rid of them anyway (EEH, hotplug, etc. has never been 
implemented or supported for GPUs and all sorts of terrible things happen if 
you try).

The best solution would likely be to review the reference counting and to 
teach callers of get_*_dev() to call pci_put_dev(), etc.

- Alistair

>    Fred
> 
> 
> 
> > 
> >>
> >>    Fred
> >>
> >>
> >>>>                npu_pdn = pci_get_pdn(npu_pdev);
> >>>>                rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
> >>>>                npu_pdn->pe_number = pe_num;
> >>>>
> >>>
> >>
> > 
> 
> 





  reply index

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-09 15:45 [PATCH 00/11] opencapi: enable card reset and link retraining Frederic Barrat
2019-09-09 15:45 ` [PATCH 01/11] powerpc/powernv/ioda: Fix ref count for devices with their own PE Frederic Barrat
2019-09-10  0:20   ` Alastair D'Silva
2019-09-26 16:44   ` Andrew Donnellan
2019-09-26 17:15     ` Frederic Barrat
2019-09-27  6:54       ` Alexey Kardashevskiy
2019-11-12 17:38         ` Frederic Barrat
2019-11-18  1:04           ` Alistair Popple [this message]
2019-11-18  1:24             ` Oliver O'Halloran
2019-11-18  2:36               ` Alistair Popple
2019-11-18 18:21                 ` Frederic Barrat
2019-09-09 15:45 ` [PATCH 02/11] powerpc/powernv/ioda: Protect PE list Frederic Barrat
2019-09-10  0:34   ` Alastair D'Silva
2019-11-19 12:55     ` Frederic Barrat
2019-11-19 13:22       ` Oliver O'Halloran
2019-11-19 14:36         ` Frederic Barrat
2019-11-19  5:55   ` Andrew Donnellan
2019-09-09 15:45 ` [PATCH 03/11] powerpc/powernv/ioda: set up PE on opencapi device when enabling Frederic Barrat
2019-09-10  0:38   ` Alastair D'Silva
2019-09-27 16:43   ` Andrew Donnellan
2019-09-09 15:45 ` [PATCH 04/11] powerpc/powernv/ioda: Release opencapi device Frederic Barrat
2019-09-10  0:56   ` Alastair D'Silva
2019-11-19 17:32     ` Frederic Barrat
2019-11-19  7:08   ` Andrew Donnellan
2019-09-09 15:45 ` [PATCH 05/11] powerpc/powernv/ioda: Find opencapi slot for a device node Frederic Barrat
2019-09-10  0:57   ` Alastair D'Silva
2019-11-19  1:26   ` Andrew Donnellan
2019-11-19 14:33     ` Frederic Barrat
2019-09-09 15:45 ` [PATCH 06/11] pci/hotplug/pnv-php: Remove erroneous warning Frederic Barrat
2019-09-10  0:58   ` Alastair D'Silva
2019-11-19  5:00   ` Andrew Donnellan
2019-09-09 15:45 ` [PATCH 07/11] pci/hotplug/pnv-php: Improve error msg on power state change failure Frederic Barrat
2019-09-10  0:59   ` Alastair D'Silva
2019-11-19  2:23   ` Andrew Donnellan
2019-09-09 15:45 ` [PATCH 08/11] pci/hotplug/pnv-php: Register opencapi slots Frederic Barrat
2019-09-10  1:00   ` Alastair D'Silva
2019-11-19  5:18   ` Andrew Donnellan
2019-11-19 15:15     ` Frederic Barrat
2019-09-09 15:45 ` [PATCH 09/11] pci/hotplug/pnv-php: Relax check when disabling slot Frederic Barrat
2019-09-10  1:00   ` Alastair D'Silva
2019-09-27 15:56   ` Andrew Donnellan
2019-09-09 15:45 ` [PATCH 10/11] pci/hotplug/pnv-php: Wrap warnings in macro Frederic Barrat
2019-09-10  1:03   ` Alastair D'Silva
2019-09-26 17:18   ` Andrew Donnellan
2019-09-09 15:46 ` [PATCH 11/11] ocxl: Add PCI hotplug dependency to Kconfig Frederic Barrat
2019-09-10  1:03   ` Alastair D'Silva
2019-09-26 17:11   ` Andrew Donnellan
2019-09-24  4:24 ` [PATCH 00/11] opencapi: enable card reset and link retraining Alexey Kardashevskiy
2019-09-24  6:39   ` Frederic Barrat
2019-09-25  0:20     ` Alexey Kardashevskiy

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7553103.1BsQ81z3e0@townsend \
    --to=alistair@popple.id.au \
    --cc=aik@ozlabs.ru \
    --cc=ajd@linux.ibm.com \
    --cc=alastair@au1.ibm.com \
    --cc=andrew.donnellan@au1.ibm.com \
    --cc=arbab@linux.ibm.com \
    --cc=clombard@linux.ibm.com \
    --cc=fbarrat@linux.ibm.com \
    --cc=groug@kaod.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LinuxPPC-Dev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linuxppc-dev/0 linuxppc-dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linuxppc-dev linuxppc-dev/ https://lore.kernel.org/linuxppc-dev \
		linuxppc-dev@lists.ozlabs.org linuxppc-dev@ozlabs.org
	public-inbox-index linuxppc-dev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.ozlabs.lists.linuxppc-dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git