linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* refcount leak in pci_get_device()?
@ 2008-08-21 20:19 Alex Chiang
  2008-08-21 20:25 ` Matthew Wilcox
  2008-08-21 20:40 ` Greg KH
  0 siblings, 2 replies; 19+ messages in thread
From: Alex Chiang @ 2008-08-21 20:19 UTC (permalink / raw)
  To: gregkh; +Cc: linux-pci, linux-kernel

Hi Greg,

While playing around with my slot symlink stuff, I noticed that
the following sequence is problematic:

	1. clean boot
	2. modprobe acpiphp
	3. echo 0 > /sys/bus/pci/slots/N/power
	4. ???

After step 3, we *should* be seeing pci_release_dev() getting
called, but we never do because the refcount on the device is
still quite high (5 or 6, on my ia64 system).

I'm still trying to track this down, but I did notice, via code
inspection, at least one suspicious area:

#define for_each_pci_dev(d) while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)

That eventually calls pci_get_dev_by_id(), which increases the
refcount on the device, but never decrements it.

Looks like that change in behavior happened here:

	PCI: clean up search.c a lot
	95247b57ed844511a212265b45cf9a919753aea1

pci_get_device() used to decrement the refcount, but no longer
does.

Thanks to Matthew Wilcox for helping me get this far...

Like I said, I'm still trying to track down my particular issue,
but I'd like to get your opinion on this.

Thanks!

/ac


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: refcount leak in pci_get_device()?
  2008-08-21 20:19 refcount leak in pci_get_device()? Alex Chiang
@ 2008-08-21 20:25 ` Matthew Wilcox
  2008-08-21 20:47   ` Greg KH
  2008-08-21 20:40 ` Greg KH
  1 sibling, 1 reply; 19+ messages in thread
From: Matthew Wilcox @ 2008-08-21 20:25 UTC (permalink / raw)
  To: Alex Chiang, gregkh, linux-pci, linux-kernel

On Thu, Aug 21, 2008 at 02:19:18PM -0600, Alex Chiang wrote:
> #define for_each_pci_dev(d) while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)
> 
> That eventually calls pci_get_dev_by_id(), which increases the
> refcount on the device, but never decrements it.
> 
> Looks like that change in behavior happened here:
> 
> 	PCI: clean up search.c a lot
> 	95247b57ed844511a212265b45cf9a919753aea1
> 
> pci_get_device() used to decrement the refcount, but no longer
> does.
> 
> Thanks to Matthew Wilcox for helping me get this far...
> 
> Like I said, I'm still trying to track down my particular issue,
> but I'd like to get your opinion on this.

In particular, I'd like to know whether this should be fixed by
pci_get_dev_by_id() decrementing the refcount of from/dev_start,
pci_get_subsys() decrementing 'from', or by bus_find_device()
decrementing 'start'.  It looks like bus_find_device() is the place
where this should logically happen, but the kerneldoc doesn't document
the intended behaviour.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: refcount leak in pci_get_device()?
  2008-08-21 20:19 refcount leak in pci_get_device()? Alex Chiang
  2008-08-21 20:25 ` Matthew Wilcox
@ 2008-08-21 20:40 ` Greg KH
  1 sibling, 0 replies; 19+ messages in thread
From: Greg KH @ 2008-08-21 20:40 UTC (permalink / raw)
  To: Alex Chiang, linux-pci, linux-kernel

On Thu, Aug 21, 2008 at 02:19:18PM -0600, Alex Chiang wrote:
> Hi Greg,
> 
> While playing around with my slot symlink stuff, I noticed that
> the following sequence is problematic:
> 
> 	1. clean boot
> 	2. modprobe acpiphp
> 	3. echo 0 > /sys/bus/pci/slots/N/power
> 	4. ???
> 
> After step 3, we *should* be seeing pci_release_dev() getting
> called, but we never do because the refcount on the device is
> still quite high (5 or 6, on my ia64 system).
> 
> I'm still trying to track this down, but I did notice, via code
> inspection, at least one suspicious area:
> 
> #define for_each_pci_dev(d) while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)
> 
> That eventually calls pci_get_dev_by_id(), which increases the
> refcount on the device, but never decrements it.
> 
> Looks like that change in behavior happened here:
> 
> 	PCI: clean up search.c a lot
> 	95247b57ed844511a212265b45cf9a919753aea1
> 
> pci_get_device() used to decrement the refcount, but no longer
> does.

No, pci_get_device() never decremented the refcount, and that didn't
change in the above git commit.

The description of pci_get_device() says that a reference is grabbed:
	Iterates through the list of known PCI devices.  If a PCI device
	is found with a matching @vendor and @device, the reference
	count to the device is incremented and a pointer to its device
	structure is returned.  Otherwise, %NULL is returned.  A new
	search is initiated by passing %NULL as the @from argument.
	Otherwise if @from is not %NULL, searches continue from next
	device on the global list.  The reference count for @from is
	always decremented if it is not %NULL.


All of the pci_find* functions should not have grabbed a reference to
the device, as that was the "old" behavior.  All of the pci_get*
functions do grab a reference.

Did I somehow mess up and one of the pci_find* functions now improperly
increment a reference?  Hopefully we shouldn't be using those functions
anymore as they aren't hotplug safe...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: refcount leak in pci_get_device()?
  2008-08-21 20:25 ` Matthew Wilcox
@ 2008-08-21 20:47   ` Greg KH
  2008-08-21 22:14     ` Alex Chiang
  2008-08-21 22:23     ` refcount leak in pci_get_device()? Jesse Barnes
  0 siblings, 2 replies; 19+ messages in thread
From: Greg KH @ 2008-08-21 20:47 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Alex Chiang, linux-pci, linux-kernel

On Thu, Aug 21, 2008 at 02:25:04PM -0600, Matthew Wilcox wrote:
> On Thu, Aug 21, 2008 at 02:19:18PM -0600, Alex Chiang wrote:
> > #define for_each_pci_dev(d) while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)
> > 
> > That eventually calls pci_get_dev_by_id(), which increases the
> > refcount on the device, but never decrements it.
> > 
> > Looks like that change in behavior happened here:
> > 
> > 	PCI: clean up search.c a lot
> > 	95247b57ed844511a212265b45cf9a919753aea1
> > 
> > pci_get_device() used to decrement the refcount, but no longer
> > does.
> > 
> > Thanks to Matthew Wilcox for helping me get this far...
> > 
> > Like I said, I'm still trying to track down my particular issue,
> > but I'd like to get your opinion on this.
> 
> In particular, I'd like to know whether this should be fixed by
> pci_get_dev_by_id() decrementing the refcount of from/dev_start,
> pci_get_subsys() decrementing 'from', or by bus_find_device()
> decrementing 'start'.  It looks like bus_find_device() is the place
> where this should logically happen, but the kerneldoc doesn't document
> the intended behaviour.

Ah, no the driver core isn't supposed to do this, it's something the pci
functions do out of "niceness" as that's how we can use them in an
iterator properly.

Does the following (untested) patch fix the issue for you all?

thanks,

greg k-h

--------------
Subject: PCI: fix reference leak in pci_get_dev_by_id()

From: Greg Kroah-Hartman <gregkh@suse.de>

Alex Chiang and Matthew Wilcox pointed out that pci_get_dev_by_id() does
not properly decrement the reference on the from pointer if it is
present, like the documentation for the function states it will.

Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Alex Chiang <achiang@hp.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index 217814f..3b3b5f1 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -280,6 +280,8 @@ static struct pci_dev *pci_get_dev_by_id(const struct pci_device_id *id,
 			      match_pci_dev_by_id);
 	if (dev)
 		pdev = to_pci_dev(dev);
+	if (from)
+		pci_dev_put(from);
 	return pdev;
 }
 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: refcount leak in pci_get_device()?
  2008-08-21 20:47   ` Greg KH
@ 2008-08-21 22:14     ` Alex Chiang
  2008-08-30  4:23       ` Zhao, Yu
  2008-08-21 22:23     ` refcount leak in pci_get_device()? Jesse Barnes
  1 sibling, 1 reply; 19+ messages in thread
From: Alex Chiang @ 2008-08-21 22:14 UTC (permalink / raw)
  To: Greg KH; +Cc: Matthew Wilcox, linux-pci, linux-kernel

* Greg KH <gregkh@suse.de>:
> On Thu, Aug 21, 2008 at 02:25:04PM -0600, Matthew Wilcox wrote:
> > On Thu, Aug 21, 2008 at 02:19:18PM -0600, Alex Chiang wrote:
> > > #define for_each_pci_dev(d) while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)
> > > 
> > > That eventually calls pci_get_dev_by_id(), which increases the
> > > refcount on the device, but never decrements it.
> > > 
> > > Looks like that change in behavior happened here:
> > > 
> > > 	PCI: clean up search.c a lot
> > > 	95247b57ed844511a212265b45cf9a919753aea1
> > > 
> > > pci_get_device() used to decrement the refcount, but no longer
> > > does.
> > > 
> > > Thanks to Matthew Wilcox for helping me get this far...
> > > 
> > > Like I said, I'm still trying to track down my particular issue,
> > > but I'd like to get your opinion on this.
> > 
> > In particular, I'd like to know whether this should be fixed by
> > pci_get_dev_by_id() decrementing the refcount of from/dev_start,
> > pci_get_subsys() decrementing 'from', or by bus_find_device()
> > decrementing 'start'.  It looks like bus_find_device() is the place
> > where this should logically happen, but the kerneldoc doesn't document
> > the intended behaviour.
> 
> Ah, no the driver core isn't supposed to do this, it's something the pci
> functions do out of "niceness" as that's how we can use them in an
> iterator properly.
> 
> Does the following (untested) patch fix the issue for you all?

Perfect, yes.

I applied it, and observed the refcount on the device I was using
to debug this problem.

I was able to modprobe acpiphp successfully, and echo 0 into the
device's 'power' file.

I then watched the rest of the hotplug core do its thing,
decrementing the refcount properly along the way, and at the end,
we did call pci_release_dev(), as I originally expected to. :)

Just to verify, I toggled the device's power a few times (echoing
a 1 and then a 0, etc.) and watched the refcounts. After each
offline, we properly called pci_release_dev().

Thanks for fixing this. It fixes a pretty bad leak in the hotplug
core (we were leaking an entire struct pci_dev * # of functions
for each offlined card, the first time around; subsequent
onlines/offlines were ok).

Tested-by: Alex Chiang <achiang@hp.com>

/ac

> 
> thanks,
> 
> greg k-h
> 
> --------------
> Subject: PCI: fix reference leak in pci_get_dev_by_id()
> 
> From: Greg Kroah-Hartman <gregkh@suse.de>
> 
> Alex Chiang and Matthew Wilcox pointed out that pci_get_dev_by_id() does
> not properly decrement the reference on the from pointer if it is
> present, like the documentation for the function states it will.
> 
> Cc: Matthew Wilcox <matthew@wil.cx>
> Cc: Alex Chiang <achiang@hp.com>
> Cc: stable <stable@kernel.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
> 
> 
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index 217814f..3b3b5f1 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -280,6 +280,8 @@ static struct pci_dev *pci_get_dev_by_id(const struct pci_device_id *id,
>  			      match_pci_dev_by_id);
>  	if (dev)
>  		pdev = to_pci_dev(dev);
> +	if (from)
> +		pci_dev_put(from);
>  	return pdev;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: refcount leak in pci_get_device()?
  2008-08-21 20:47   ` Greg KH
  2008-08-21 22:14     ` Alex Chiang
@ 2008-08-21 22:23     ` Jesse Barnes
  2008-08-22  1:04       ` Henrique de Moraes Holschuh
  1 sibling, 1 reply; 19+ messages in thread
From: Jesse Barnes @ 2008-08-21 22:23 UTC (permalink / raw)
  To: Greg KH; +Cc: Matthew Wilcox, Alex Chiang, linux-pci, linux-kernel

On Thursday, August 21, 2008 1:47 pm Greg KH wrote:
> Subject: PCI: fix reference leak in pci_get_dev_by_id()
>
> From: Greg Kroah-Hartman <gregkh@suse.de>
>
> Alex Chiang and Matthew Wilcox pointed out that pci_get_dev_by_id() does
> not properly decrement the reference on the from pointer if it is
> present, like the documentation for the function states it will.
>
> Cc: Matthew Wilcox <matthew@wil.cx>
> Cc: Alex Chiang <achiang@hp.com>
> Cc: stable <stable@kernel.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
>
>
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index 217814f..3b3b5f1 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -280,6 +280,8 @@ static struct pci_dev *pci_get_dev_by_id(const struct
> pci_device_id *id, match_pci_dev_by_id);
>  	if (dev)
>  		pdev = to_pci_dev(dev);
> +	if (from)
> +		pci_dev_put(from);
>  	return pdev;
>  }

Wow, thanks Greg, that was fast.  I applied this to my for-linus branch; I'll 
ask Linus to pull it for 2.6.27.

Thanks,
--
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: refcount leak in pci_get_device()?
  2008-08-21 22:23     ` refcount leak in pci_get_device()? Jesse Barnes
@ 2008-08-22  1:04       ` Henrique de Moraes Holschuh
  2008-08-22  1:09         ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 19+ messages in thread
From: Henrique de Moraes Holschuh @ 2008-08-22  1:04 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Greg KH, Matthew Wilcox, Alex Chiang, linux-pci, linux-kernel

On Thu, 21 Aug 2008, Jesse Barnes wrote:
> Wow, thanks Greg, that was fast.  I applied this to my for-linus branch; I'll 
> ask Linus to pull it for 2.6.27.

Does 2.6.26 need it as well?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: refcount leak in pci_get_device()?
  2008-08-22  1:04       ` Henrique de Moraes Holschuh
@ 2008-08-22  1:09         ` Henrique de Moraes Holschuh
  0 siblings, 0 replies; 19+ messages in thread
From: Henrique de Moraes Holschuh @ 2008-08-22  1:09 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Greg KH, Matthew Wilcox, Alex Chiang, linux-pci, linux-kernel

On Thu, 21 Aug 2008, Henrique de Moraes Holschuh wrote:
> On Thu, 21 Aug 2008, Jesse Barnes wrote:
> > Wow, thanks Greg, that was fast.  I applied this to my for-linus branch; I'll 
> > ask Linus to pull it for 2.6.27.
> 
> Does 2.6.26 need it as well?

Never mind, I just read Greg's reply.  It arrived a few seconds after I sent
this.  Sorry about it.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: refcount leak in pci_get_device()?
  2008-08-21 22:14     ` Alex Chiang
@ 2008-08-30  4:23       ` Zhao, Yu
  2008-08-30  5:37         ` Greg KH
  0 siblings, 1 reply; 19+ messages in thread
From: Zhao, Yu @ 2008-08-30  4:23 UTC (permalink / raw)
  To: Alex Chiang, Greg KH; +Cc: Matthew Wilcox, linux-pci, linux-kernel

And the pci_get_dev_by_id() is not safe again the PCI device removal. It might fire a warning in bus_find_device() when reference count of the knode_bus is decreased to 0 by pci_remove_bus().

Need to document this or a fix?

Thanks,
Yu

On Friday, August 22, 2008 6:15 AM, Alex Chiang wrote:
>* Greg KH <gregkh@suse.de>:
>> On Thu, Aug 21, 2008 at 02:25:04PM -0600, Matthew Wilcox wrote:
>> > On Thu, Aug 21, 2008 at 02:19:18PM -0600, Alex Chiang wrote:
>> > > #define for_each_pci_dev(d) while ((d = pci_get_device(PCI_ANY_ID,
>PCI_ANY_ID, d)) != NULL)
>> > >
>> > > That eventually calls pci_get_dev_by_id(), which increases the
>> > > refcount on the device, but never decrements it.
>> > >
>> > > Looks like that change in behavior happened here:
>> > >
>> > > 	PCI: clean up search.c a lot
>> > > 	95247b57ed844511a212265b45cf9a919753aea1
>> > >
>> > > pci_get_device() used to decrement the refcount, but no longer
>> > > does.
>> > >
>> > > Thanks to Matthew Wilcox for helping me get this far...
>> > >
>> > > Like I said, I'm still trying to track down my particular issue,
>> > > but I'd like to get your opinion on this.
>> >
>> > In particular, I'd like to know whether this should be fixed by
>> > pci_get_dev_by_id() decrementing the refcount of from/dev_start,
>> > pci_get_subsys() decrementing 'from', or by bus_find_device()
>> > decrementing 'start'.  It looks like bus_find_device() is the place
>> > where this should logically happen, but the kerneldoc doesn't document
>> > the intended behaviour.
>>
>> Ah, no the driver core isn't supposed to do this, it's something the pci
>> functions do out of "niceness" as that's how we can use them in an
>> iterator properly.
>>
>> Does the following (untested) patch fix the issue for you all?
>
>Perfect, yes.
>
>I applied it, and observed the refcount on the device I was using
>to debug this problem.
>
>I was able to modprobe acpiphp successfully, and echo 0 into the
>device's 'power' file.
>
>I then watched the rest of the hotplug core do its thing,
>decrementing the refcount properly along the way, and at the end,
>we did call pci_release_dev(), as I originally expected to. :)
>
>Just to verify, I toggled the device's power a few times (echoing
>a 1 and then a 0, etc.) and watched the refcounts. After each
>offline, we properly called pci_release_dev().
>
>Thanks for fixing this. It fixes a pretty bad leak in the hotplug
>core (we were leaking an entire struct pci_dev * # of functions
>for each offlined card, the first time around; subsequent
>onlines/offlines were ok).
>
>Tested-by: Alex Chiang <achiang@hp.com>
>
>/ac
>
>>
>> thanks,
>>
>> greg k-h
>>
>> --------------
>> Subject: PCI: fix reference leak in pci_get_dev_by_id()
>>
>> From: Greg Kroah-Hartman <gregkh@suse.de>
>>
>> Alex Chiang and Matthew Wilcox pointed out that pci_get_dev_by_id() does
>> not properly decrement the reference on the from pointer if it is
>> present, like the documentation for the function states it will.
>>
>> Cc: Matthew Wilcox <matthew@wil.cx>
>> Cc: Alex Chiang <achiang@hp.com>
>> Cc: stable <stable@kernel.org>
>> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
>>
>>
>> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
>> index 217814f..3b3b5f1 100644
>> --- a/drivers/pci/search.c
>> +++ b/drivers/pci/search.c
>> @@ -280,6 +280,8 @@ static struct pci_dev *pci_get_dev_by_id(const struct
>pci_device_id *id,
>>  			      match_pci_dev_by_id);
>>  	if (dev)
>>  		pdev = to_pci_dev(dev);
>> +	if (from)
>> +		pci_dev_put(from);
>>  	return pdev;
>>  }
>>
>>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: refcount leak in pci_get_device()?
  2008-08-30  4:23       ` Zhao, Yu
@ 2008-08-30  5:37         ` Greg KH
  2008-08-30  6:20           ` Zhao, Yu
  0 siblings, 1 reply; 19+ messages in thread
From: Greg KH @ 2008-08-30  5:37 UTC (permalink / raw)
  To: Zhao, Yu; +Cc: Alex Chiang, Matthew Wilcox, linux-pci, linux-kernel

On Sat, Aug 30, 2008 at 12:23:20PM +0800, Zhao, Yu wrote:
> And the pci_get_dev_by_id() is not safe again the PCI device removal.
> It might fire a warning in bus_find_device() when reference count of
> the knode_bus is decreased to 0 by pci_remove_bus().

Is this something new?  Hasn't this always been that way?  Why would you
be wanting to call this function anyway?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: refcount leak in pci_get_device()?
  2008-08-30  5:37         ` Greg KH
@ 2008-08-30  6:20           ` Zhao, Yu
  2008-08-31  3:14             ` problems in fakephp (was RE: refcount leak in pci_get_device()?) Zhao, Yu
  0 siblings, 1 reply; 19+ messages in thread
From: Zhao, Yu @ 2008-08-30  6:20 UTC (permalink / raw)
  To: Greg KH; +Cc: Alex Chiang, Matthew Wilcox, linux-pci, linux-kernel

It's been so for a while, guess since 2.5. I meant to say the pci_remove_bus_device(), not pci_remove_bus(), is used by pci hotplug code. If a device is being plugged off, and some drivers are calling pci_get_dev_by_id() to search something, then the warning might be fired through bus_find_device -> klist_iter_init_node ->kref_get.

Thanks,
Yu

>-----Original Message-----
>From: linux-pci-owner@vger.kernel.org
>[mailto:linux-pci-owner@vger.kernel.org] On Behalf Of Greg KH
>Sent: Saturday, August 30, 2008 1:38 PM
>To: Zhao, Yu
>Cc: Alex Chiang; Matthew Wilcox; linux-pci@vger.kernel.org;
>linux-kernel@vger.kernel.org
>Subject: Re: refcount leak in pci_get_device()?
>
>On Sat, Aug 30, 2008 at 12:23:20PM +0800, Zhao, Yu wrote:
>> And the pci_get_dev_by_id() is not safe again the PCI device removal.
>> It might fire a warning in bus_find_device() when reference count of
>> the knode_bus is decreased to 0 by pci_remove_bus().
>
>Is this something new?  Hasn't this always been that way?  Why would you
>be wanting to call this function anyway?
>
>thanks,
>
>greg k-h
>--
>To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* problems in fakephp (was RE: refcount leak in pci_get_device()?)
  2008-08-30  6:20           ` Zhao, Yu
@ 2008-08-31  3:14             ` Zhao, Yu
  2008-09-01 18:40               ` [PATCH] PCI Hotplug: fakephp: fix deadlock... again Alex Chiang
  0 siblings, 1 reply; 19+ messages in thread
From: Zhao, Yu @ 2008-08-31  3:14 UTC (permalink / raw)
  To: Alex Chiang, Greg KH; +Cc: Matthew Wilcox, linux-pci, linux-kernel

Just happened to see a change in fakephp when searching usage of the pci_remove_bus_device(). I couldn't find the original patches, but this http://git.kernel.org/?p=linux/kernel/git/jbarnes/pci-2.6.git;a=commitdiff;h=fe99740cac117f208707488c03f3789cf4904957 shows, the pci_remove_bus_device() was removed while the remove_slot() was added in disable_slot(), which means: 1) fakephp can't remove pci_dev anymore; 2) deadlock happens because remove_slot() tries to remove the sysfs entry calling itself.

On Saturday, August 30, 2008 2:21 PM, Zhao, Yu wrote:
>It's been so for a while, guess since 2.5. I meant to say the
>pci_remove_bus_device(), not pci_remove_bus(), is used by pci hotplug code. If
>a device is being plugged off, and some drivers are calling pci_get_dev_by_id()
>to search something, then the warning might be fired through bus_find_device
>-> klist_iter_init_node ->kref_get.
>
>Thanks,
>Yu
>
>>-----Original Message-----
>>From: linux-pci-owner@vger.kernel.org
>>[mailto:linux-pci-owner@vger.kernel.org] On Behalf Of Greg KH
>>Sent: Saturday, August 30, 2008 1:38 PM
>>To: Zhao, Yu
>>Cc: Alex Chiang; Matthew Wilcox; linux-pci@vger.kernel.org;
>>linux-kernel@vger.kernel.org
>>Subject: Re: refcount leak in pci_get_device()?
>>
>>On Sat, Aug 30, 2008 at 12:23:20PM +0800, Zhao, Yu wrote:
>>> And the pci_get_dev_by_id() is not safe again the PCI device removal.
>>> It might fire a warning in bus_find_device() when reference count of
>>> the knode_bus is decreased to 0 by pci_remove_bus().
>>
>>Is this something new?  Hasn't this always been that way?  Why would you
>>be wanting to call this function anyway?
>>
>>thanks,
>>
>>greg k-h
>>--
>>To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>--
>To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH] PCI Hotplug: fakephp: fix deadlock... again
  2008-08-31  3:14             ` problems in fakephp (was RE: refcount leak in pci_get_device()?) Zhao, Yu
@ 2008-09-01 18:40               ` Alex Chiang
  2008-09-02  0:10                 ` Matthew Wilcox
  2008-09-04 15:03                 ` Zhao, Yu
  0 siblings, 2 replies; 19+ messages in thread
From: Alex Chiang @ 2008-09-01 18:40 UTC (permalink / raw)
  To: Zhao, Yu, jbarnes; +Cc: Greg KH, Matthew Wilcox, linux-pci, linux-kernel

* Zhao, Yu <yu.zhao@intel.com>:
> Just happened to see a change in fakephp when searching usage
> of the pci_remove_bus_device(). I couldn't find the original
> patches, but this
> http://git.kernel.org/?p=linux/kernel/git/jbarnes/pci-2.6.git;a=commitdiff;h=fe99740cac117f208707488c03f3789cf4904957
> shows, the pci_remove_bus_device() was removed while the
> remove_slot() was added in disable_slot(), which means: 1)
> fakephp can't remove pci_dev anymore; 2) deadlock happens
> because remove_slot() tries to remove the sysfs entry calling
> itself.

Sorry about introducing this regression. Does the following patch
fix it for you?

I've tested it on my system, and I can again use fakephp to
remove a device:

sapphire:/sys/bus/pci/slots # echo 0 > fake4/power 
fakephp: disable_slot - physical_slot = fake4
e1000 0000:01:02.1: PCI INT B disabled
GSI 32 (level, low) -> CPU 2 (0x0200) vector 54 unregistered
fakephp: Slot already scheduled for removal
fakephp: removing slot fake4

We get the "slot already scheduled for removal" because that
particular device has 2 functions, and we're creating slots on a
per-slot basis now, not a per-function basis.

Although, I wonder, Willy -- is that really the right thing to
do? Seems like fakephp would be more useful if we did operate on
a per-function basis, and not per-slot. Especially given Yu's
work with SR-IOV, where we can apparently have lots of functions
per a physical device.

Hm?

Thanks.

/ac

From: Alex Chiang <achiang@hp.com>

PCI Hotplug: fakephp: fix deadlock... again

Commit fe99740cac117f208707488c03f3789cf4904957 (construct one
fakephp slot per PCI slot) introduced a regression, causing a
deadlock when removing a PCI device.

We also never actually removed the device from the PCI core.

So we:

	- remove the device from the PCI core
	- do not directly call remove_slot() to prevent deadlock

Yu Zhao reported and diagnosed this defect.

Signed-off-by: Alex Chiang <achiang@hp.com>
---
diff --git a/drivers/pci/hotplug/fakephp.c b/drivers/pci/hotplug/fakephp.c
index 40337a0..146ca9c 100644
--- a/drivers/pci/hotplug/fakephp.c
+++ b/drivers/pci/hotplug/fakephp.c
@@ -320,15 +320,15 @@ static int disable_slot(struct hotplug_slot *slot)
 			return -ENODEV;
 		}
 
+		/* remove the device from the pci core */
+		pci_remove_bus_device(dev);
+
 		/* queue work item to blow away this sysfs entry and other
 		 * parts.
 		 */
 		INIT_WORK(&dslot->remove_work, remove_slot_worker);
 		queue_work(dummyphp_wq, &dslot->remove_work);
 
-		/* blow away this sysfs entry and other parts. */
-		remove_slot(dslot);
-
 		pci_dev_put(dev);
 	}
 	return 0;

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI Hotplug: fakephp: fix deadlock... again
  2008-09-01 18:40               ` [PATCH] PCI Hotplug: fakephp: fix deadlock... again Alex Chiang
@ 2008-09-02  0:10                 ` Matthew Wilcox
  2008-09-02  0:19                   ` Alex Chiang
  2008-09-04 15:03                 ` Zhao, Yu
  1 sibling, 1 reply; 19+ messages in thread
From: Matthew Wilcox @ 2008-09-02  0:10 UTC (permalink / raw)
  To: Alex Chiang, Zhao, Yu, jbarnes, Greg KH, linux-pci, linux-kernel

On Mon, Sep 01, 2008 at 12:40:18PM -0600, Alex Chiang wrote:
> We get the "slot already scheduled for removal" because that
> particular device has 2 functions, and we're creating slots on a
> per-slot basis now, not a per-function basis.
> 
> Although, I wonder, Willy -- is that really the right thing to
> do? Seems like fakephp would be more useful if we did operate on
> a per-function basis, and not per-slot. Especially given Yu's
> work with SR-IOV, where we can apparently have lots of functions
> per a physical device.

I suspect it depends on what you believe the point of fakephp is.
My assumption was that it was a way to fake what would happen if you had
a hotplug controller for a particular slot.  In that context, the change
I made was clearly correct.  If you want to use it for hot-removing
individual functions from a Linux guest running under a hypervisor
(for example), that's much less useful.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI Hotplug: fakephp: fix deadlock... again
  2008-09-02  0:10                 ` Matthew Wilcox
@ 2008-09-02  0:19                   ` Alex Chiang
  2008-09-09  4:12                     ` Jesse Barnes
  0 siblings, 1 reply; 19+ messages in thread
From: Alex Chiang @ 2008-09-02  0:19 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Zhao, Yu, jbarnes, Greg KH, linux-pci, linux-kernel

* Matthew Wilcox <matthew@wil.cx>:
> On Mon, Sep 01, 2008 at 12:40:18PM -0600, Alex Chiang wrote:
> > We get the "slot already scheduled for removal" because that
> > particular device has 2 functions, and we're creating slots
> > on a per-slot basis now, not a per-function basis.
> > 
> > Although, I wonder, Willy -- is that really the right thing
> > to do? Seems like fakephp would be more useful if we did
> > operate on a per-function basis, and not per-slot. Especially
> > given Yu's work with SR-IOV, where we can apparently have
> > lots of functions per a physical device.
> 
> I suspect it depends on what you believe the point of fakephp
> is.  

Ok, this was all developed before I started working in this area,
or Linux, even. ;)

> My assumption was that it was a way to fake what would happen
> if you had a hotplug controller for a particular slot.  In that
> context, the change I made was clearly correct.  If you want to
> use it for hot-removing individual functions from a Linux guest
> running under a hypervisor (for example), that's much less
> useful.

Sure, that sounds reasonable.

In that case, my patch should fix the stupid regression I
introduced, but if others think that fakephp would be more useful
for hot-removing functions, I wouldn't object to reverting the
original patch.

Thanks.

/ac


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH] PCI Hotplug: fakephp: fix deadlock... again
  2008-09-01 18:40               ` [PATCH] PCI Hotplug: fakephp: fix deadlock... again Alex Chiang
  2008-09-02  0:10                 ` Matthew Wilcox
@ 2008-09-04 15:03                 ` Zhao, Yu
  1 sibling, 0 replies; 19+ messages in thread
From: Zhao, Yu @ 2008-09-04 15:03 UTC (permalink / raw)
  To: Alex Chiang, jbarnes; +Cc: Greg KH, Matthew Wilcox, linux-pci, linux-kernel

Thanks for the fix.

Acked-by: Yu Zhao <yu.zhao@intel.com>

On Tuesday, September 02, 2008 2:40 AM, Alex Chiang wrote:
>* Zhao, Yu <yu.zhao@intel.com>:
>> Just happened to see a change in fakephp when searching usage
>> of the pci_remove_bus_device(). I couldn't find the original
>> patches, but this
>>
>http://git.kernel.org/?p=linux/kernel/git/jbarnes/pci-2.6.git;a=commitdiff;
>h=fe99740cac117f208707488c03f3789cf4904957
>> shows, the pci_remove_bus_device() was removed while the
>> remove_slot() was added in disable_slot(), which means: 1)
>> fakephp can't remove pci_dev anymore; 2) deadlock happens
>> because remove_slot() tries to remove the sysfs entry calling
>> itself.
>
>Sorry about introducing this regression. Does the following patch
>fix it for you?
>
>I've tested it on my system, and I can again use fakephp to
>remove a device:
>
>sapphire:/sys/bus/pci/slots # echo 0 > fake4/power
>fakephp: disable_slot - physical_slot = fake4
>e1000 0000:01:02.1: PCI INT B disabled
>GSI 32 (level, low) -> CPU 2 (0x0200) vector 54 unregistered
>fakephp: Slot already scheduled for removal
>fakephp: removing slot fake4
>
>We get the "slot already scheduled for removal" because that
>particular device has 2 functions, and we're creating slots on a
>per-slot basis now, not a per-function basis.
>
>Although, I wonder, Willy -- is that really the right thing to
>do? Seems like fakephp would be more useful if we did operate on
>a per-function basis, and not per-slot. Especially given Yu's
>work with SR-IOV, where we can apparently have lots of functions
>per a physical device.
>
>Hm?
>
>Thanks.
>
>/ac
>
>From: Alex Chiang <achiang@hp.com>
>
>PCI Hotplug: fakephp: fix deadlock... again
>
>Commit fe99740cac117f208707488c03f3789cf4904957 (construct one
>fakephp slot per PCI slot) introduced a regression, causing a
>deadlock when removing a PCI device.
>
>We also never actually removed the device from the PCI core.
>
>So we:
>
>	- remove the device from the PCI core
>	- do not directly call remove_slot() to prevent deadlock
>
>Yu Zhao reported and diagnosed this defect.
>
>Signed-off-by: Alex Chiang <achiang@hp.com>
>---
>diff --git a/drivers/pci/hotplug/fakephp.c b/drivers/pci/hotplug/fakephp.c
>index 40337a0..146ca9c 100644
>--- a/drivers/pci/hotplug/fakephp.c
>+++ b/drivers/pci/hotplug/fakephp.c
>@@ -320,15 +320,15 @@ static int disable_slot(struct hotplug_slot *slot)
> 			return -ENODEV;
> 		}
>
>+		/* remove the device from the pci core */
>+		pci_remove_bus_device(dev);
>+
> 		/* queue work item to blow away this sysfs entry and other
> 		 * parts.
> 		 */
> 		INIT_WORK(&dslot->remove_work, remove_slot_worker);
> 		queue_work(dummyphp_wq, &dslot->remove_work);
>
>-		/* blow away this sysfs entry and other parts. */
>-		remove_slot(dslot);
>-
> 		pci_dev_put(dev);
> 	}
> 	return 0;

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI Hotplug: fakephp: fix deadlock... again
  2008-09-02  0:19                   ` Alex Chiang
@ 2008-09-09  4:12                     ` Jesse Barnes
  2008-09-09  4:27                       ` Matthew Wilcox
  0 siblings, 1 reply; 19+ messages in thread
From: Jesse Barnes @ 2008-09-09  4:12 UTC (permalink / raw)
  To: Alex Chiang; +Cc: Matthew Wilcox, Zhao, Yu, Greg KH, linux-pci, linux-kernel

On Monday, September 01, 2008 5:19 pm Alex Chiang wrote:
> * Matthew Wilcox <matthew@wil.cx>:
> > On Mon, Sep 01, 2008 at 12:40:18PM -0600, Alex Chiang wrote:
> > > We get the "slot already scheduled for removal" because that
> > > particular device has 2 functions, and we're creating slots
> > > on a per-slot basis now, not a per-function basis.
> > >
> > > Although, I wonder, Willy -- is that really the right thing
> > > to do? Seems like fakephp would be more useful if we did
> > > operate on a per-function basis, and not per-slot. Especially
> > > given Yu's work with SR-IOV, where we can apparently have
> > > lots of functions per a physical device.
> >
> > I suspect it depends on what you believe the point of fakephp
> > is.
>
> Ok, this was all developed before I started working in this area,
> or Linux, even. ;)
>
> > My assumption was that it was a way to fake what would happen
> > if you had a hotplug controller for a particular slot.  In that
> > context, the change I made was clearly correct.  If you want to
> > use it for hot-removing individual functions from a Linux guest
> > running under a hypervisor (for example), that's much less
> > useful.
>
> Sure, that sounds reasonable.
>
> In that case, my patch should fix the stupid regression I
> introduced, but if others think that fakephp would be more useful
> for hot-removing functions, I wouldn't object to reverting the
> original patch.

I don't have a preference here; whatever is most useful for people is probably 
what we should go for.  Zhao?

Thanks,
Jesse


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI Hotplug: fakephp: fix deadlock... again
  2008-09-09  4:12                     ` Jesse Barnes
@ 2008-09-09  4:27                       ` Matthew Wilcox
  2008-09-09  5:32                         ` Andrew Patterson
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox @ 2008-09-09  4:27 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Alex Chiang, Zhao, Yu, Greg KH, linux-pci, linux-kernel

On Mon, Sep 08, 2008 at 09:12:29PM -0700, Jesse Barnes wrote:
> I don't have a preference here; whatever is most useful for people is probably 
> what we should go for.  Zhao?

I think we have two distinct groups of users; those who find emulating
slot hotplug useful, and those who want to hot-remove pci functions.
Maybe hot-removing a pci function should not be part of the pci hotplug
core per se -- obviously it would share much code -- but being able
to hot-remove a function is a fundamentally different thing from being
able to hot-remove a slot.  For example, hot-removing a function from a
device that is in a real hotplug slot should be possible, but it wouldn't
involve the driver for that hotplug slot.

So maybe what we want is a /sys/bus/pci/devices/dddd:bb:dd.f/remove file
that does just that.

Oh, and we also want a way to hot-add functions, not necessarily even
ones that have been removed from the machine after it was booted, but
those that show up after the machine has booted.  For example, one of
my former colleagues had a laptop which would remove the wireless pci
device from the bus if the rfkill switch was enabled.  I remember there
was a hack to load a module that called some pci bus rescan functionality.
I didn't look into it in much detail.

I don't have a firm idea about an interface for this.  SCSI handles it by
writing scsi-add-single-device H C T L to /proc/scsi/scsi.  Maybe we want
a /sys/bus/pci/scan or /sys/bus/pci/devices/scan file that we can echo
"0000:01:02.3" to scan just that function, or "0000:01:02" to scan the
device.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] PCI Hotplug: fakephp: fix deadlock... again
  2008-09-09  4:27                       ` Matthew Wilcox
@ 2008-09-09  5:32                         ` Andrew Patterson
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Patterson @ 2008-09-09  5:32 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jesse Barnes, Alex Chiang, Zhao, Yu, Greg KH, linux-pci, linux-kernel

On Mon, 2008-09-08 at 22:27 -0600, Matthew Wilcox wrote:
> On Mon, Sep 08, 2008 at 09:12:29PM -0700, Jesse Barnes wrote:
> > I don't have a preference here; whatever is most useful for people is probably 
> > what we should go for.  Zhao?
> 
> I think we have two distinct groups of users; those who find emulating
> slot hotplug useful, and those who want to hot-remove pci functions.
> Maybe hot-removing a pci function should not be part of the pci hotplug
> core per se -- obviously it would share much code -- but being able
> to hot-remove a function is a fundamentally different thing from being
> able to hot-remove a slot.  For example, hot-removing a function from a
> device that is in a real hotplug slot should be possible, but it wouldn't
> involve the driver for that hotplug slot.
> 
> So maybe what we want is a /sys/bus/pci/devices/dddd:bb:dd.f/remove file
> that does just that.
> 

of something like "disable", so we can enable it again?

> Oh, and we also want a way to hot-add functions, not necessarily even
> ones that have been removed from the machine after it was booted, but
> those that show up after the machine has booted.  For example, one of
> my former colleagues had a laptop which would remove the wireless pci
> device from the bus if the rfkill switch was enabled.  I remember there
> was a hack to load a module that called some pci bus rescan functionality.
> I didn't look into it in much detail.
> 
> I don't have a firm idea about an interface for this.  SCSI handles it by
> writing scsi-add-single-device H C T L to /proc/scsi/scsi.  Maybe we want
> a /sys/bus/pci/scan or /sys/bus/pci/devices/scan file that we can echo
> "0000:01:02.3" to scan just that function, or "0000:01:02" to scan the
> device.
> 

I like this idea. Even perhaps add a recursive scan to rescan behind a
bridge?

-- 
Andrew Patterson
Hewlett-Packard


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2008-09-09  5:32 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-08-21 20:19 refcount leak in pci_get_device()? Alex Chiang
2008-08-21 20:25 ` Matthew Wilcox
2008-08-21 20:47   ` Greg KH
2008-08-21 22:14     ` Alex Chiang
2008-08-30  4:23       ` Zhao, Yu
2008-08-30  5:37         ` Greg KH
2008-08-30  6:20           ` Zhao, Yu
2008-08-31  3:14             ` problems in fakephp (was RE: refcount leak in pci_get_device()?) Zhao, Yu
2008-09-01 18:40               ` [PATCH] PCI Hotplug: fakephp: fix deadlock... again Alex Chiang
2008-09-02  0:10                 ` Matthew Wilcox
2008-09-02  0:19                   ` Alex Chiang
2008-09-09  4:12                     ` Jesse Barnes
2008-09-09  4:27                       ` Matthew Wilcox
2008-09-09  5:32                         ` Andrew Patterson
2008-09-04 15:03                 ` Zhao, Yu
2008-08-21 22:23     ` refcount leak in pci_get_device()? Jesse Barnes
2008-08-22  1:04       ` Henrique de Moraes Holschuh
2008-08-22  1:09         ` Henrique de Moraes Holschuh
2008-08-21 20:40 ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).