linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Linux v5.5 serious PCI bug
       [not found] <PSXP216MB0438BFEAA0617283A834E11580580@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM>
@ 2019-12-09 12:37 ` Pavel Machek
  2019-12-09 13:07   ` Nicholas Johnson
  2019-12-09 13:12 ` mika.westerberg
  2019-12-10 13:39 ` [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev Lukas Wunner
  2 siblings, 1 reply; 21+ messages in thread
From: Pavel Machek @ 2019-12-09 12:37 UTC (permalink / raw)
  To: Nicholas Johnson; +Cc: Bjorn Helgaas, linux-kernel, linux-pci, mika.westerberg

> Hi,
> 
> I have compiled Linux v5.5-rc1 and thought all was good until I 
> hot-removed a Gigabyte Aorus eGPU from Thunderbolt. The driver for the 
> GPU was not loaded (blacklisted) so the crash is nothing to do with the 
> GPU driver.

Obvious question is: does it work on in v5.4?
									Pavel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Linux v5.5 serious PCI bug
  2019-12-09 12:37 ` Linux v5.5 serious PCI bug Pavel Machek
@ 2019-12-09 13:07   ` Nicholas Johnson
  0 siblings, 0 replies; 21+ messages in thread
From: Nicholas Johnson @ 2019-12-09 13:07 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Bjorn Helgaas, linux-kernel, linux-pci, mika.westerberg

Hi,
On Mon, Dec 09, 2019 at 01:37:38PM +0100, Pavel Machek wrote:
> > Hi,
> > 
> > I have compiled Linux v5.5-rc1 and thought all was good until I 
> > hot-removed a Gigabyte Aorus eGPU from Thunderbolt. The driver for the 
> > GPU was not loaded (blacklisted) so the crash is nothing to do with the 
> > GPU driver.
> 
> Obvious question is: does it work on in v5.4?
> 									Pavel
Always a good question to ask. The answer is "yes".

I would have noticed had it happened in v5.4 - I am constantly playing 
with Thunderbolt and the kernel. But I tried on v5.4 just now, just to 
be sure, and it works fine. I probably should have double checked, just 
in case, though.

Bisect to come.

Kind regards,
Nicholas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Linux v5.5 serious PCI bug
       [not found] <PSXP216MB0438BFEAA0617283A834E11580580@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM>
  2019-12-09 12:37 ` Linux v5.5 serious PCI bug Pavel Machek
@ 2019-12-09 13:12 ` mika.westerberg
  2019-12-09 13:29   ` Nicholas Johnson
       [not found]   ` <PSXP216MB043809A423446A6EF2C7909A80580@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM>
  2019-12-10 13:39 ` [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev Lukas Wunner
  2 siblings, 2 replies; 21+ messages in thread
From: mika.westerberg @ 2019-12-09 13:12 UTC (permalink / raw)
  To: Nicholas Johnson; +Cc: Bjorn Helgaas, linux-kernel, linux-pci

On Mon, Dec 09, 2019 at 12:34:04PM +0000, Nicholas Johnson wrote:
> Hi,
> 
> I have compiled Linux v5.5-rc1 and thought all was good until I 
> hot-removed a Gigabyte Aorus eGPU from Thunderbolt. The driver for the 
> GPU was not loaded (blacklisted) so the crash is nothing to do with the 
> GPU driver.
> 
> We had:
> - kernel NULL pointer dereference
> - refcount_t: underflow; use-after-free.
> 
> Attaching dmesg for now; will bisect and come back with results.

Looks like something related to iommu. Does it work if you disable it?
(intel_iommu=off in the command line).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Linux v5.5 serious PCI bug
  2019-12-09 13:12 ` mika.westerberg
@ 2019-12-09 13:29   ` Nicholas Johnson
       [not found]   ` <PSXP216MB043809A423446A6EF2C7909A80580@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM>
  1 sibling, 0 replies; 21+ messages in thread
From: Nicholas Johnson @ 2019-12-09 13:29 UTC (permalink / raw)
  To: mika.westerberg; +Cc: Bjorn Helgaas, linux-kernel, linux-pci

On Mon, Dec 09, 2019 at 03:12:39PM +0200, mika.westerberg@linux.intel.com wrote:
> On Mon, Dec 09, 2019 at 12:34:04PM +0000, Nicholas Johnson wrote:
> > Hi,
> > 
> > I have compiled Linux v5.5-rc1 and thought all was good until I 
> > hot-removed a Gigabyte Aorus eGPU from Thunderbolt. The driver for the 
> > GPU was not loaded (blacklisted) so the crash is nothing to do with the 
> > GPU driver.
> > 
> > We had:
> > - kernel NULL pointer dereference
> > - refcount_t: underflow; use-after-free.
> > 
> > Attaching dmesg for now; will bisect and come back with results.
> 
> Looks like something related to iommu. Does it work if you disable it?
> (intel_iommu=off in the command line).
I thought it could be that, too.

The attachment "dmesg-4" from the original email is with iommu parameters.
The attachment "dmesg-5" from the original email is with no iommu parameters.
Attaching here "dmesg-6" with the iommu explicitly set off like you said.

No difference, still broken. Although, with iommu off, there are less stack traces.

Could it be sysfs-related?

Kind regards,
Nicholas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Linux v5.5 serious PCI bug
       [not found]   ` <PSXP216MB043809A423446A6EF2C7909A80580@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM>
@ 2019-12-10  7:28     ` mika.westerberg
  2019-12-10 12:00       ` Nicholas Johnson
  0 siblings, 1 reply; 21+ messages in thread
From: mika.westerberg @ 2019-12-10  7:28 UTC (permalink / raw)
  To: Nicholas Johnson; +Cc: Bjorn Helgaas, linux-kernel, linux-pci

On Mon, Dec 09, 2019 at 01:33:49PM +0000, Nicholas Johnson wrote:
> On Mon, Dec 09, 2019 at 03:12:39PM +0200, mika.westerberg@linux.intel.com wrote:
> > On Mon, Dec 09, 2019 at 12:34:04PM +0000, Nicholas Johnson wrote:
> > > Hi,
> > > 
> > > I have compiled Linux v5.5-rc1 and thought all was good until I 
> > > hot-removed a Gigabyte Aorus eGPU from Thunderbolt. The driver for the 
> > > GPU was not loaded (blacklisted) so the crash is nothing to do with the 
> > > GPU driver.
> > > 
> > > We had:
> > > - kernel NULL pointer dereference
> > > - refcount_t: underflow; use-after-free.
> > > 
> > > Attaching dmesg for now; will bisect and come back with results.
> > 
> > Looks like something related to iommu. Does it work if you disable it?
> > (intel_iommu=off in the command line).
> On Mon, Dec 09, 2019 at 03:12:39PM +0200, mika.westerberg@linux.intel.com wrote:
> > On Mon, Dec 09, 2019 at 12:34:04PM +0000, Nicholas Johnson wrote:
> > > Hi,
> > > 
> > > I have compiled Linux v5.5-rc1 and thought all was good until I 
> > > hot-removed a Gigabyte Aorus eGPU from Thunderbolt. The driver for the 
> > > GPU was not loaded (blacklisted) so the crash is nothing to do with the 
> > > GPU driver.
> > > 
> > > We had:
> > > - kernel NULL pointer dereference
> > > - refcount_t: underflow; use-after-free.
> > > 
> > > Attaching dmesg for now; will bisect and come back with results.
> > 
> > Looks like something related to iommu. Does it work if you disable it?
> > (intel_iommu=off in the command line).
> I thought it could be that, too.
> 
> The attachment "dmesg-4" from the original email is with iommu parameters.
> The attachment "dmesg-5" from the original email is with no iommu parameters.
> Attaching here "dmesg-6" with the iommu explicitly set off like you said.
> 
> No difference, still broken. Although, with iommu off, there are less stack traces.
> 
> Could it be sysfs-related?

Bisect would probably be the best option to find the culprit commit.
There are couple of commits done for pciehp so reverting them one by one
may help as well:

  87d0f2a5536f PCI: pciehp: Prevent deadlock on disconnect
  75fcc0ce72e5 PCI: pciehp: Do not disable interrupt twice on suspend
  b94ec12dfaee PCI: pciehp: Refactor infinite loop in pcie_poll_cmd()
  157c1062fcd8 PCI: pciehp: Avoid returning prematurely from sysfs requests

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Linux v5.5 serious PCI bug
  2019-12-10  7:28     ` mika.westerberg
@ 2019-12-10 12:00       ` Nicholas Johnson
  2019-12-10 12:29         ` Lukas Wunner
  2019-12-10 12:34         ` mika.westerberg
  0 siblings, 2 replies; 21+ messages in thread
From: Nicholas Johnson @ 2019-12-10 12:00 UTC (permalink / raw)
  To: mika.westerberg; +Cc: Bjorn Helgaas, linux-kernel, linux-pci

On Tue, Dec 10, 2019 at 09:28:00AM +0200, mika.westerberg@linux.intel.com wrote:
> On Mon, Dec 09, 2019 at 01:33:49PM +0000, Nicholas Johnson wrote:
> > On Mon, Dec 09, 2019 at 03:12:39PM +0200, mika.westerberg@linux.intel.com wrote:
> > > On Mon, Dec 09, 2019 at 12:34:04PM +0000, Nicholas Johnson wrote:
> > > > Hi,
> > > > 
> > > > I have compiled Linux v5.5-rc1 and thought all was good until I 
> > > > hot-removed a Gigabyte Aorus eGPU from Thunderbolt. The driver for the 
> > > > GPU was not loaded (blacklisted) so the crash is nothing to do with the 
> > > > GPU driver.
> > > > 
> > > > We had:
> > > > - kernel NULL pointer dereference
> > > > - refcount_t: underflow; use-after-free.
> > > > 
> > > > Attaching dmesg for now; will bisect and come back with results.
> > > 
> > > Looks like something related to iommu. Does it work if you disable it?
> > > (intel_iommu=off in the command line).
> > On Mon, Dec 09, 2019 at 03:12:39PM +0200, mika.westerberg@linux.intel.com wrote:
> > > On Mon, Dec 09, 2019 at 12:34:04PM +0000, Nicholas Johnson wrote:
> > > > Hi,
> > > > 
> > > > I have compiled Linux v5.5-rc1 and thought all was good until I 
> > > > hot-removed a Gigabyte Aorus eGPU from Thunderbolt. The driver for the 
> > > > GPU was not loaded (blacklisted) so the crash is nothing to do with the 
> > > > GPU driver.
> > > > 
> > > > We had:
> > > > - kernel NULL pointer dereference
> > > > - refcount_t: underflow; use-after-free.
> > > > 
> > > > Attaching dmesg for now; will bisect and come back with results.
> > > 
> > > Looks like something related to iommu. Does it work if you disable it?
> > > (intel_iommu=off in the command line).
> > I thought it could be that, too.
> > 
> > The attachment "dmesg-4" from the original email is with iommu parameters.
> > The attachment "dmesg-5" from the original email is with no iommu parameters.
> > Attaching here "dmesg-6" with the iommu explicitly set off like you said.
> > 
> > No difference, still broken. Although, with iommu off, there are less stack traces.
> > 
> > Could it be sysfs-related?
> 
> Bisect would probably be the best option to find the culprit commit.
> There are couple of commits done for pciehp so reverting them one by one
> may help as well:
> 
>   87d0f2a5536f PCI: pciehp: Prevent deadlock on disconnect
>   75fcc0ce72e5 PCI: pciehp: Do not disable interrupt twice on suspend
>   b94ec12dfaee PCI: pciehp: Refactor infinite loop in pcie_poll_cmd()
>   157c1062fcd8 PCI: pciehp: Avoid returning prematurely from sysfs requests
You are not going to believe this. The offending commit is in the SOUND 
subsystem. I thought I had messed up the bisect when only sound commits 
were showing near the end.

And yes, I double checked.

Reverted, compiled, tested that it started working.
Reapplied, compiled, tested that it stopped working.
Twice.

The following is the culprit responsible for the issues:

commit 586bc4aab878efcf672536f0cdec3d04b6990c94
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Fri Nov 22 16:43:50 2019 -0500

    ALSA: hda/hdmi - fix vgaswitcheroo detection for AMD

It is playing with PCI devices. Clearly they did not consider 
hot-removal. I am guessing it is seeing the audio PCI func of the AMD 
card in that Thunderbolt eGPU enclosure.

I will collect information, make a bugzilla report and contact the AMD 
team. If anybody wants to be cc'd in then let me know. Sorry for 
bothering you and Bjorn with something which actually has nothing 
directly to do with the PCI subsystem or Thunderbolt.

I strongly hope that the upcoming Intel Xe GPU driver allows for 
surprise-removal in Linux without any crashing of kernel or userspace. 
The amdgpu and nouveau drivers do not take to surprise removal kindly, 
even without the above sound bug applying to AMD.

Kind regards,
Nicholas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Linux v5.5 serious PCI bug
  2019-12-10 12:00       ` Nicholas Johnson
@ 2019-12-10 12:29         ` Lukas Wunner
  2019-12-10 12:46           ` Takashi Iwai
  2019-12-10 12:52           ` Nicholas Johnson
  2019-12-10 12:34         ` mika.westerberg
  1 sibling, 2 replies; 21+ messages in thread
From: Lukas Wunner @ 2019-12-10 12:29 UTC (permalink / raw)
  To: Nicholas Johnson
  Cc: mika.westerberg, Bjorn Helgaas, linux-kernel, linux-pci,
	alexander.deucher, tiwai

[cc += Alex, Takashi]

On Tue, Dec 10, 2019 at 12:00:23PM +0000, Nicholas Johnson wrote:
> On Tue, Dec 10, 2019 at 09:28:00AM +0200, mika.westerberg@linux.intel.com wrote:
> > On Mon, Dec 09, 2019 at 01:33:49PM +0000, Nicholas Johnson wrote:
> > > On Mon, Dec 09, 2019 at 03:12:39PM +0200, mika.westerberg@linux.intel.com wrote:
> > > > On Mon, Dec 09, 2019 at 12:34:04PM +0000, Nicholas Johnson wrote:
> > > > > I have compiled Linux v5.5-rc1 and thought all was good until I 
> > > > > hot-removed a Gigabyte Aorus eGPU from Thunderbolt. The driver for the 
> > > > > GPU was not loaded (blacklisted) so the crash is nothing to do with the 
> > > > > GPU driver.
> > > > > 
> > > > > We had:
> > > > > - kernel NULL pointer dereference
> > > > > - refcount_t: underflow; use-after-free.
> 
> The following is the culprit responsible for the issues:
> 
> commit 586bc4aab878efcf672536f0cdec3d04b6990c94
> Author: Alex Deucher <alexander.deucher@amd.com>
> Date:   Fri Nov 22 16:43:50 2019 -0500
> 
>     ALSA: hda/hdmi - fix vgaswitcheroo detection for AMD

Does the below fix the issue?

-- >8 --
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 35b4526f0d28..b856b89378ac 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1419,7 +1419,6 @@ static bool atpx_present(void)
 				return true;
 			}
 		}
-		pci_dev_put(pdev);
 	}
 	return false;
 }

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: Linux v5.5 serious PCI bug
  2019-12-10 12:00       ` Nicholas Johnson
  2019-12-10 12:29         ` Lukas Wunner
@ 2019-12-10 12:34         ` mika.westerberg
  1 sibling, 0 replies; 21+ messages in thread
From: mika.westerberg @ 2019-12-10 12:34 UTC (permalink / raw)
  To: Nicholas Johnson; +Cc: Bjorn Helgaas, linux-kernel, linux-pci

On Tue, Dec 10, 2019 at 12:00:23PM +0000, Nicholas Johnson wrote:
> The following is the culprit responsible for the issues:
> 
> commit 586bc4aab878efcf672536f0cdec3d04b6990c94
> Author: Alex Deucher <alexander.deucher@amd.com>
> Date:   Fri Nov 22 16:43:50 2019 -0500
> 
>     ALSA: hda/hdmi - fix vgaswitcheroo detection for AMD
>
> It is playing with PCI devices. Clearly they did not consider 
> hot-removal. I am guessing it is seeing the audio PCI func of the AMD 
> card in that Thunderbolt eGPU enclosure.
>
> I will collect information, make a bugzilla report and contact the AMD 
> team. If anybody wants to be cc'd in then let me know. Sorry for 
> bothering you and Bjorn with something which actually has nothing 
> directly to do with the PCI subsystem or Thunderbolt.

No problem. Thanks for taking time to bisect the issue.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Linux v5.5 serious PCI bug
  2019-12-10 12:29         ` Lukas Wunner
@ 2019-12-10 12:46           ` Takashi Iwai
  2019-12-11  7:33             ` Jiasen Lin
  2019-12-10 12:52           ` Nicholas Johnson
  1 sibling, 1 reply; 21+ messages in thread
From: Takashi Iwai @ 2019-12-10 12:46 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Nicholas Johnson, mika.westerberg, Bjorn Helgaas, linux-kernel,
	linux-pci, alexander.deucher, tiwai

On Tue, 10 Dec 2019 13:29:41 +0100,
Lukas Wunner wrote:
> 
> [cc += Alex, Takashi]
> 
> On Tue, Dec 10, 2019 at 12:00:23PM +0000, Nicholas Johnson wrote:
> > On Tue, Dec 10, 2019 at 09:28:00AM +0200, mika.westerberg@linux.intel.com wrote:
> > > On Mon, Dec 09, 2019 at 01:33:49PM +0000, Nicholas Johnson wrote:
> > > > On Mon, Dec 09, 2019 at 03:12:39PM +0200, mika.westerberg@linux.intel.com wrote:
> > > > > On Mon, Dec 09, 2019 at 12:34:04PM +0000, Nicholas Johnson wrote:
> > > > > > I have compiled Linux v5.5-rc1 and thought all was good until I 
> > > > > > hot-removed a Gigabyte Aorus eGPU from Thunderbolt. The driver for the 
> > > > > > GPU was not loaded (blacklisted) so the crash is nothing to do with the 
> > > > > > GPU driver.
> > > > > > 
> > > > > > We had:
> > > > > > - kernel NULL pointer dereference
> > > > > > - refcount_t: underflow; use-after-free.
> > 
> > The following is the culprit responsible for the issues:
> > 
> > commit 586bc4aab878efcf672536f0cdec3d04b6990c94
> > Author: Alex Deucher <alexander.deucher@amd.com>
> > Date:   Fri Nov 22 16:43:50 2019 -0500
> > 
> >     ALSA: hda/hdmi - fix vgaswitcheroo detection for AMD
> 
> Does the below fix the issue?
> 
> -- >8 --
> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> index 35b4526f0d28..b856b89378ac 100644
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -1419,7 +1419,6 @@ static bool atpx_present(void)
>  				return true;
>  			}
>  		}
> -		pci_dev_put(pdev);
>  	}
>  	return false;
>  }

Oh this looks really like a bug, even if this isn't the root cause.

Care to submit a proper patch?


thanks,

Takashi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Linux v5.5 serious PCI bug
  2019-12-10 12:29         ` Lukas Wunner
  2019-12-10 12:46           ` Takashi Iwai
@ 2019-12-10 12:52           ` Nicholas Johnson
  1 sibling, 0 replies; 21+ messages in thread
From: Nicholas Johnson @ 2019-12-10 12:52 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: mika.westerberg, Bjorn Helgaas, linux-kernel, linux-pci,
	alexander.deucher, tiwai

On Tue, Dec 10, 2019 at 01:29:41PM +0100, Lukas Wunner wrote:
> [cc += Alex, Takashi]
> 
> On Tue, Dec 10, 2019 at 12:00:23PM +0000, Nicholas Johnson wrote:
> > On Tue, Dec 10, 2019 at 09:28:00AM +0200, mika.westerberg@linux.intel.com wrote:
> > > On Mon, Dec 09, 2019 at 01:33:49PM +0000, Nicholas Johnson wrote:
> > > > On Mon, Dec 09, 2019 at 03:12:39PM +0200, mika.westerberg@linux.intel.com wrote:
> > > > > On Mon, Dec 09, 2019 at 12:34:04PM +0000, Nicholas Johnson wrote:
> > > > > > I have compiled Linux v5.5-rc1 and thought all was good until I 
> > > > > > hot-removed a Gigabyte Aorus eGPU from Thunderbolt. The driver for the 
> > > > > > GPU was not loaded (blacklisted) so the crash is nothing to do with the 
> > > > > > GPU driver.
> > > > > > 
> > > > > > We had:
> > > > > > - kernel NULL pointer dereference
> > > > > > - refcount_t: underflow; use-after-free.
> > 
> > The following is the culprit responsible for the issues:
> > 
> > commit 586bc4aab878efcf672536f0cdec3d04b6990c94
> > Author: Alex Deucher <alexander.deucher@amd.com>
> > Date:   Fri Nov 22 16:43:50 2019 -0500
> > 
> >     ALSA: hda/hdmi - fix vgaswitcheroo detection for AMD
> 
> Does the below fix the issue?
> 
> -- >8 --
> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> index 35b4526f0d28..b856b89378ac 100644
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -1419,7 +1419,6 @@ static bool atpx_present(void)
>  				return true;
>  			}
>  		}
> -		pci_dev_put(pdev);
>  	}
>  	return false;
>  }
Yes, removing the pci_dev_put() as above solves the issue.

Thanks.

Kind regards,
Nicholas.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
       [not found] <PSXP216MB0438BFEAA0617283A834E11580580@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM>
  2019-12-09 12:37 ` Linux v5.5 serious PCI bug Pavel Machek
  2019-12-09 13:12 ` mika.westerberg
@ 2019-12-10 13:39 ` Lukas Wunner
  2019-12-10 13:41   ` Takashi Iwai
                     ` (2 more replies)
  2 siblings, 3 replies; 21+ messages in thread
From: Lukas Wunner @ 2019-12-10 13:39 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Jaroslav Kysela, Alex Deucher, Mika Westerberg, Bjorn Helgaas,
	Nicholas Johnson, alsa-devel, linux-kernel, linux-pci

Nicholas Johnson reports a null pointer deref as well as a refcount
underflow upon hot-removal of a Thunderbolt-attached AMD eGPU.
He's bisected the issue down to commit 586bc4aab878 ("ALSA: hda/hdmi -
fix vgaswitcheroo detection for AMD").

The commit iterates over PCI devices using pci_get_class() and
unreferences each device found, even though pci_get_class()
subsequently unreferences the device as well.  Fix it.

Fixes: 586bc4aab878 ("ALSA: hda/hdmi - fix vgaswitcheroo detection for AMD")
Link: https://lore.kernel.org/r/PSXP216MB0438BFEAA0617283A834E11580580@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM/
Reported-and-tested-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Alexander Deucher <alexander.deucher@amd.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>
---
 sound/pci/hda/hda_intel.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 35b4526f0d28..b856b89378ac 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1419,7 +1419,6 @@ static bool atpx_present(void)
 				return true;
 			}
 		}
-		pci_dev_put(pdev);
 	}
 	return false;
 }
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
  2019-12-10 13:39 ` [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev Lukas Wunner
@ 2019-12-10 13:41   ` Takashi Iwai
  2019-12-10 13:47   ` Nicholas Johnson
  2019-12-10 15:34   ` Deucher, Alexander
  2 siblings, 0 replies; 21+ messages in thread
From: Takashi Iwai @ 2019-12-10 13:41 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Jaroslav Kysela, Alex Deucher, Mika Westerberg, Bjorn Helgaas,
	Nicholas Johnson, alsa-devel, linux-kernel, linux-pci

On Tue, 10 Dec 2019 14:39:50 +0100,
Lukas Wunner wrote:
> 
> Nicholas Johnson reports a null pointer deref as well as a refcount
> underflow upon hot-removal of a Thunderbolt-attached AMD eGPU.
> He's bisected the issue down to commit 586bc4aab878 ("ALSA: hda/hdmi -
> fix vgaswitcheroo detection for AMD").
> 
> The commit iterates over PCI devices using pci_get_class() and
> unreferences each device found, even though pci_get_class()
> subsequently unreferences the device as well.  Fix it.
> 
> Fixes: 586bc4aab878 ("ALSA: hda/hdmi - fix vgaswitcheroo detection for AMD")
> Link: https://lore.kernel.org/r/PSXP216MB0438BFEAA0617283A834E11580580@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM/
> Reported-and-tested-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
> Cc: Alexander Deucher <alexander.deucher@amd.com>
> Cc: Bjorn Helgaas <helgaas@kernel.org>

Applied now.  Thanks.


Takashi

> ---
>  sound/pci/hda/hda_intel.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> index 35b4526f0d28..b856b89378ac 100644
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -1419,7 +1419,6 @@ static bool atpx_present(void)
>  				return true;
>  			}
>  		}
> -		pci_dev_put(pdev);
>  	}
>  	return false;
>  }
> -- 
> 2.24.0
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
  2019-12-10 13:39 ` [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev Lukas Wunner
  2019-12-10 13:41   ` Takashi Iwai
@ 2019-12-10 13:47   ` Nicholas Johnson
  2019-12-10 13:50     ` Takashi Iwai
  2019-12-10 15:34   ` Deucher, Alexander
  2 siblings, 1 reply; 21+ messages in thread
From: Nicholas Johnson @ 2019-12-10 13:47 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Takashi Iwai, Jaroslav Kysela, Alex Deucher, Mika Westerberg,
	Bjorn Helgaas, alsa-devel, linux-kernel, linux-pci

On Tue, Dec 10, 2019 at 02:39:50PM +0100, Lukas Wunner wrote:
> Nicholas Johnson reports a null pointer deref as well as a refcount
> underflow upon hot-removal of a Thunderbolt-attached AMD eGPU.
> He's bisected the issue down to commit 586bc4aab878 ("ALSA: hda/hdmi -
> fix vgaswitcheroo detection for AMD").
> 
> The commit iterates over PCI devices using pci_get_class() and
> unreferences each device found, even though pci_get_class()
> subsequently unreferences the device as well.  Fix it.
> 
> Fixes: 586bc4aab878 ("ALSA: hda/hdmi - fix vgaswitcheroo detection for AMD")
> Link: https://lore.kernel.org/r/PSXP216MB0438BFEAA0617283A834E11580580@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM/
> Reported-and-tested-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
> Cc: Alexander Deucher <alexander.deucher@amd.com>
> Cc: Bjorn Helgaas <helgaas@kernel.org>
> ---
>  sound/pci/hda/hda_intel.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> index 35b4526f0d28..b856b89378ac 100644
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -1419,7 +1419,6 @@ static bool atpx_present(void)
>  				return true;
>  			}
>  		}
> -		pci_dev_put(pdev);
>  	}
>  	return false;
>  }
> -- 
> 2.24.0
> 
Extremely fast turnaround from bisect to patch.

If you want the bugzilla.kernel.org report then I can do that ASAP, but 
I feel that it is redundant now.

Thank you for handling my bug report.

Regards,
Nicholas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
  2019-12-10 13:47   ` Nicholas Johnson
@ 2019-12-10 13:50     ` Takashi Iwai
  0 siblings, 0 replies; 21+ messages in thread
From: Takashi Iwai @ 2019-12-10 13:50 UTC (permalink / raw)
  To: Nicholas Johnson
  Cc: Lukas Wunner, Jaroslav Kysela, Alex Deucher, Mika Westerberg,
	Bjorn Helgaas, alsa-devel, linux-kernel, linux-pci

On Tue, 10 Dec 2019 14:47:28 +0100,
Nicholas Johnson wrote:
> 
> On Tue, Dec 10, 2019 at 02:39:50PM +0100, Lukas Wunner wrote:
> > Nicholas Johnson reports a null pointer deref as well as a refcount
> > underflow upon hot-removal of a Thunderbolt-attached AMD eGPU.
> > He's bisected the issue down to commit 586bc4aab878 ("ALSA: hda/hdmi -
> > fix vgaswitcheroo detection for AMD").
> > 
> > The commit iterates over PCI devices using pci_get_class() and
> > unreferences each device found, even though pci_get_class()
> > subsequently unreferences the device as well.  Fix it.
> > 
> > Fixes: 586bc4aab878 ("ALSA: hda/hdmi - fix vgaswitcheroo detection for AMD")
> > Link: https://lore.kernel.org/r/PSXP216MB0438BFEAA0617283A834E11580580@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM/
> > Reported-and-tested-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
> > Cc: Alexander Deucher <alexander.deucher@amd.com>
> > Cc: Bjorn Helgaas <helgaas@kernel.org>
> > ---
> >  sound/pci/hda/hda_intel.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> > index 35b4526f0d28..b856b89378ac 100644
> > --- a/sound/pci/hda/hda_intel.c
> > +++ b/sound/pci/hda/hda_intel.c
> > @@ -1419,7 +1419,6 @@ static bool atpx_present(void)
> >  				return true;
> >  			}
> >  		}
> > -		pci_dev_put(pdev);
> >  	}
> >  	return false;
> >  }
> > -- 
> > 2.24.0
> > 
> Extremely fast turnaround from bisect to patch.
> 
> If you want the bugzilla.kernel.org report then I can do that ASAP, but 
> I feel that it is redundant now.

We have now Link tag to track the ML archive, so it should suffice, I
suppose.

> Thank you for handling my bug report.

And thank you for your quick testing.


Takashi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
  2019-12-10 13:39 ` [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev Lukas Wunner
  2019-12-10 13:41   ` Takashi Iwai
  2019-12-10 13:47   ` Nicholas Johnson
@ 2019-12-10 15:34   ` Deucher, Alexander
  2019-12-10 15:46     ` Lukas Wunner
  2 siblings, 1 reply; 21+ messages in thread
From: Deucher, Alexander @ 2019-12-10 15:34 UTC (permalink / raw)
  To: Lukas Wunner, Takashi Iwai
  Cc: Jaroslav Kysela, Mika Westerberg, Bjorn Helgaas,
	Nicholas Johnson, alsa-devel, linux-kernel, linux-pci

> -----Original Message-----
> From: Lukas Wunner <lukas@wunner.de>
> Sent: Tuesday, December 10, 2019 8:40 AM
> To: Takashi Iwai <tiwai@suse.de>
> Cc: Jaroslav Kysela <perex@perex.cz>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Mika Westerberg
> <mika.westerberg@linux.intel.com>; Bjorn Helgaas <helgaas@kernel.org>;
> Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>; alsa-
> devel@alsa-project.org; linux-kernel@vger.kernel.org; linux-
> pci@vger.kernel.org
> Subject: [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
> 
> Nicholas Johnson reports a null pointer deref as well as a refcount underflow
> upon hot-removal of a Thunderbolt-attached AMD eGPU.
> He's bisected the issue down to commit 586bc4aab878 ("ALSA: hda/hdmi - fix
> vgaswitcheroo detection for AMD").
> 
> The commit iterates over PCI devices using pci_get_class() and unreferences
> each device found, even though pci_get_class() subsequently unreferences
> the device as well.  Fix it.

The pci_dev_put() a few lines above should probably be dropped as well.

Alex

> 
> Fixes: 586bc4aab878 ("ALSA: hda/hdmi - fix vgaswitcheroo detection for
> AMD")
> Link:
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.
> kernel.org%2Fr%2FPSXP216MB0438BFEAA0617283A834E11580580%40PSXP2
> 16MB0438.KORP216.PROD.OUTLOOK.COM%2F&amp;data=02%7C01%7Calex
> ander.deucher%40amd.com%7C254649b79a6240f3a8aa08d77d76715f%7C3d
> d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637115819998031934&amp
> ;sdata=qI%2B%2FJv3Z6pvwN7gQFSnZiP31YUAvd%2BKjo0ZqMERFbYU%3D&a
> mp;reserved=0
> Reported-and-tested-by: Nicholas Johnson <nicholas.johnson-
> opensource@outlook.com.au>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
> Cc: Alexander Deucher <alexander.deucher@amd.com>
> Cc: Bjorn Helgaas <helgaas@kernel.org>
> ---
>  sound/pci/hda/hda_intel.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index
> 35b4526f0d28..b856b89378ac 100644
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -1419,7 +1419,6 @@ static bool atpx_present(void)
>  				return true;
>  			}
>  		}
> -		pci_dev_put(pdev);
>  	}
>  	return false;
>  }
> --
> 2.24.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
  2019-12-10 15:34   ` Deucher, Alexander
@ 2019-12-10 15:46     ` Lukas Wunner
  2019-12-10 15:53       ` Deucher, Alexander
  0 siblings, 1 reply; 21+ messages in thread
From: Lukas Wunner @ 2019-12-10 15:46 UTC (permalink / raw)
  To: Deucher, Alexander
  Cc: Takashi Iwai, Jaroslav Kysela, Mika Westerberg, Bjorn Helgaas,
	Nicholas Johnson, alsa-devel, linux-kernel, linux-pci

On Tue, Dec 10, 2019 at 03:34:27PM +0000, Deucher, Alexander wrote:
> > Nicholas Johnson reports a null pointer deref as well as a refcount underflow
> > upon hot-removal of a Thunderbolt-attached AMD eGPU.
> > He's bisected the issue down to commit 586bc4aab878 ("ALSA: hda/hdmi - fix
> > vgaswitcheroo detection for AMD").
> > 
> > The commit iterates over PCI devices using pci_get_class() and unreferences
> > each device found, even though pci_get_class() subsequently unreferences
> > the device as well.  Fix it.
> 
> The pci_dev_put() a few lines above should probably be dropped as well.

That one looks fine to me.  The refcount is already increased in the
caller get_bound_vga() via pci_get_domain_bus_and_slot() and it's
increased again in atpx_present() via pci_get_class().  It needs to
be decremented in atpx_present() to avoid leaking a ref.

Thanks,

Lukas

> > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index
> > 35b4526f0d28..b856b89378ac 100644
> > --- a/sound/pci/hda/hda_intel.c
> > +++ b/sound/pci/hda/hda_intel.c
> > @@ -1419,7 +1419,6 @@ static bool atpx_present(void)
> >  				return true;
> >  			}
> >  		}
> > -		pci_dev_put(pdev);
> >  	}
> >  	return false;
> >  }
> > --
> > 2.24.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
  2019-12-10 15:46     ` Lukas Wunner
@ 2019-12-10 15:53       ` Deucher, Alexander
  2019-12-10 16:10         ` Takashi Iwai
  2019-12-10 16:13         ` Lukas Wunner
  0 siblings, 2 replies; 21+ messages in thread
From: Deucher, Alexander @ 2019-12-10 15:53 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Takashi Iwai, Jaroslav Kysela, Mika Westerberg, Bjorn Helgaas,
	Nicholas Johnson, alsa-devel, linux-kernel, linux-pci

> -----Original Message-----
> From: Lukas Wunner <lukas@wunner.de>
> Sent: Tuesday, December 10, 2019 10:47 AM
> To: Deucher, Alexander <Alexander.Deucher@amd.com>
> Cc: Takashi Iwai <tiwai@suse.de>; Jaroslav Kysela <perex@perex.cz>; Mika
> Westerberg <mika.westerberg@linux.intel.com>; Bjorn Helgaas
> <helgaas@kernel.org>; Nicholas Johnson <nicholas.johnson-
> opensource@outlook.com.au>; alsa-devel@alsa-project.org; linux-
> kernel@vger.kernel.org; linux-pci@vger.kernel.org
> Subject: Re: [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
> 
> On Tue, Dec 10, 2019 at 03:34:27PM +0000, Deucher, Alexander wrote:
> > > Nicholas Johnson reports a null pointer deref as well as a refcount
> > > underflow upon hot-removal of a Thunderbolt-attached AMD eGPU.
> > > He's bisected the issue down to commit 586bc4aab878 ("ALSA: hda/hdmi
> > > - fix vgaswitcheroo detection for AMD").
> > >
> > > The commit iterates over PCI devices using pci_get_class() and
> > > unreferences each device found, even though pci_get_class()
> > > subsequently unreferences the device as well.  Fix it.
> >
> > The pci_dev_put() a few lines above should probably be dropped as well.
> 
> That one looks fine to me.  The refcount is already increased in the caller
> get_bound_vga() via pci_get_domain_bus_and_slot() and it's increased
> again in atpx_present() via pci_get_class().  It needs to be decremented in
> atpx_present() to avoid leaking a ref.

I'm not following.  This is part of the same loop as the one you removed.  All we are doing is checking whether the ATPX method exists or not om the platform.  The pdev may not be the same one as the one in pci_get_domain_bus_and_slot().  The APTX method in the APU's ACPI namespace, not the dGPUs.

Alex

> 
> Thanks,
> 
> Lukas
> 
> > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> > > index 35b4526f0d28..b856b89378ac 100644
> > > --- a/sound/pci/hda/hda_intel.c
> > > +++ b/sound/pci/hda/hda_intel.c
> > > @@ -1419,7 +1419,6 @@ static bool atpx_present(void)
> > >  				return true;
> > >  			}
> > >  		}
> > > -		pci_dev_put(pdev);
> > >  	}
> > >  	return false;
> > >  }
> > > --
> > > 2.24.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
  2019-12-10 15:53       ` Deucher, Alexander
@ 2019-12-10 16:10         ` Takashi Iwai
  2019-12-10 16:51           ` Deucher, Alexander
  2019-12-10 16:13         ` Lukas Wunner
  1 sibling, 1 reply; 21+ messages in thread
From: Takashi Iwai @ 2019-12-10 16:10 UTC (permalink / raw)
  To: Deucher, Alexander
  Cc: Lukas Wunner, Jaroslav Kysela, Mika Westerberg, Bjorn Helgaas,
	Nicholas Johnson, alsa-devel, linux-kernel, linux-pci

On Tue, 10 Dec 2019 16:53:20 +0100,
Deucher, Alexander wrote:
> 
> > -----Original Message-----
> > From: Lukas Wunner <lukas@wunner.de>
> > Sent: Tuesday, December 10, 2019 10:47 AM
> > To: Deucher, Alexander <Alexander.Deucher@amd.com>
> > Cc: Takashi Iwai <tiwai@suse.de>; Jaroslav Kysela <perex@perex.cz>; Mika
> > Westerberg <mika.westerberg@linux.intel.com>; Bjorn Helgaas
> > <helgaas@kernel.org>; Nicholas Johnson <nicholas.johnson-
> > opensource@outlook.com.au>; alsa-devel@alsa-project.org; linux-
> > kernel@vger.kernel.org; linux-pci@vger.kernel.org
> > Subject: Re: [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
> > 
> > On Tue, Dec 10, 2019 at 03:34:27PM +0000, Deucher, Alexander wrote:
> > > > Nicholas Johnson reports a null pointer deref as well as a refcount
> > > > underflow upon hot-removal of a Thunderbolt-attached AMD eGPU.
> > > > He's bisected the issue down to commit 586bc4aab878 ("ALSA: hda/hdmi
> > > > - fix vgaswitcheroo detection for AMD").
> > > >
> > > > The commit iterates over PCI devices using pci_get_class() and
> > > > unreferences each device found, even though pci_get_class()
> > > > subsequently unreferences the device as well.  Fix it.
> > >
> > > The pci_dev_put() a few lines above should probably be dropped as well.
> > 
> > That one looks fine to me.  The refcount is already increased in the caller
> > get_bound_vga() via pci_get_domain_bus_and_slot() and it's increased
> > again in atpx_present() via pci_get_class().  It needs to be decremented in
> > atpx_present() to avoid leaking a ref.
> 
> I'm not following.  This is part of the same loop as the one you removed.  All we are doing is checking whether the ATPX method exists or not om the platform.  The pdev may not be the same one as the one in pci_get_domain_bus_and_slot().  The APTX method in the APU's ACPI namespace, not the dGPUs.

Well, the tricky part is that pci_get_class() itself does
unrefeference the old object and reference the new object (if found).
At the end of the loop, nothing is referenced, so it's fine.
OTOH, if you go out of the loop in the middle, you're still keeping
the pdev object reference, so you need to manually unreference it.


Takashi

> 
> Alex
> 
> > 
> > Thanks,
> > 
> > Lukas
> > 
> > > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> > > > index 35b4526f0d28..b856b89378ac 100644
> > > > --- a/sound/pci/hda/hda_intel.c
> > > > +++ b/sound/pci/hda/hda_intel.c
> > > > @@ -1419,7 +1419,6 @@ static bool atpx_present(void)
> > > >  				return true;
> > > >  			}
> > > >  		}
> > > > -		pci_dev_put(pdev);
> > > >  	}
> > > >  	return false;
> > > >  }
> > > > --
> > > > 2.24.0
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
  2019-12-10 15:53       ` Deucher, Alexander
  2019-12-10 16:10         ` Takashi Iwai
@ 2019-12-10 16:13         ` Lukas Wunner
  1 sibling, 0 replies; 21+ messages in thread
From: Lukas Wunner @ 2019-12-10 16:13 UTC (permalink / raw)
  To: Deucher, Alexander
  Cc: Takashi Iwai, Jaroslav Kysela, Mika Westerberg, Bjorn Helgaas,
	Nicholas Johnson, alsa-devel, linux-kernel, linux-pci

On Tue, Dec 10, 2019 at 03:53:20PM +0000, Deucher, Alexander wrote:
> > On Tue, Dec 10, 2019 at 03:34:27PM +0000, Deucher, Alexander wrote:
> > > > Nicholas Johnson reports a null pointer deref as well as a refcount
> > > > underflow upon hot-removal of a Thunderbolt-attached AMD eGPU.
> > > > He's bisected the issue down to commit 586bc4aab878 ("ALSA: hda/hdmi
> > > > - fix vgaswitcheroo detection for AMD").
> > > >
> > > > The commit iterates over PCI devices using pci_get_class() and
> > > > unreferences each device found, even though pci_get_class()
> > > > subsequently unreferences the device as well.  Fix it.
> > >
> > > The pci_dev_put() a few lines above should probably be dropped as well.
> > 
> > That one looks fine to me.  The refcount is already increased in the caller
> > get_bound_vga() via pci_get_domain_bus_and_slot() and it's increased
> > again in atpx_present() via pci_get_class().  It needs to be decremented in
> > atpx_present() to avoid leaking a ref.
> 
> I'm not following.  This is part of the same loop as the one you removed.
> All we are doing is checking whether the ATPX method exists or not om the
> platform.  The pdev may not be the same one as the one in
> pci_get_domain_bus_and_slot().  The APTX method in the APU's ACPI namespace,
> not the dGPUs.

Okay.  Still, atpx_present() doesn't pass the found pci_dev back to the
caller, so it would be leaked if the ref isn't returned.

The situation is different for the pci_dev_put() I removed:  The ref is
returned by pci_get_class() on the next loop iteration.

Thanks,

Lukas

> > > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> > > > index 35b4526f0d28..b856b89378ac 100644
> > > > --- a/sound/pci/hda/hda_intel.c
> > > > +++ b/sound/pci/hda/hda_intel.c
> > > > @@ -1419,7 +1419,6 @@ static bool atpx_present(void)
> > > >  				return true;
> > > >  			}
> > > >  		}
> > > > -		pci_dev_put(pdev);
> > > >  	}
> > > >  	return false;
> > > >  }
> > > > --
> > > > 2.24.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
  2019-12-10 16:10         ` Takashi Iwai
@ 2019-12-10 16:51           ` Deucher, Alexander
  0 siblings, 0 replies; 21+ messages in thread
From: Deucher, Alexander @ 2019-12-10 16:51 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Lukas Wunner, Jaroslav Kysela, Mika Westerberg, Bjorn Helgaas,
	Nicholas Johnson, alsa-devel, linux-kernel, linux-pci

> -----Original Message-----
> From: Takashi Iwai <tiwai@suse.de>
> Sent: Tuesday, December 10, 2019 11:11 AM
> To: Deucher, Alexander <Alexander.Deucher@amd.com>
> Cc: Lukas Wunner <lukas@wunner.de>; Jaroslav Kysela <perex@perex.cz>;
> Mika Westerberg <mika.westerberg@linux.intel.com>; Bjorn Helgaas
> <helgaas@kernel.org>; Nicholas Johnson <nicholas.johnson-
> opensource@outlook.com.au>; alsa-devel@alsa-project.org; linux-
> kernel@vger.kernel.org; linux-pci@vger.kernel.org
> Subject: Re: [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
> 
> On Tue, 10 Dec 2019 16:53:20 +0100,
> Deucher, Alexander wrote:
> >
> > > -----Original Message-----
> > > From: Lukas Wunner <lukas@wunner.de>
> > > Sent: Tuesday, December 10, 2019 10:47 AM
> > > To: Deucher, Alexander <Alexander.Deucher@amd.com>
> > > Cc: Takashi Iwai <tiwai@suse.de>; Jaroslav Kysela <perex@perex.cz>;
> > > Mika Westerberg <mika.westerberg@linux.intel.com>; Bjorn Helgaas
> > > <helgaas@kernel.org>; Nicholas Johnson <nicholas.johnson-
> > > opensource@outlook.com.au>; alsa-devel@alsa-project.org; linux-
> > > kernel@vger.kernel.org; linux-pci@vger.kernel.org
> > > Subject: Re: [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev
> > >
> > > On Tue, Dec 10, 2019 at 03:34:27PM +0000, Deucher, Alexander wrote:
> > > > > Nicholas Johnson reports a null pointer deref as well as a
> > > > > refcount underflow upon hot-removal of a Thunderbolt-attached
> AMD eGPU.
> > > > > He's bisected the issue down to commit 586bc4aab878 ("ALSA:
> > > > > hda/hdmi
> > > > > - fix vgaswitcheroo detection for AMD").
> > > > >
> > > > > The commit iterates over PCI devices using pci_get_class() and
> > > > > unreferences each device found, even though pci_get_class()
> > > > > subsequently unreferences the device as well.  Fix it.
> > > >
> > > > The pci_dev_put() a few lines above should probably be dropped as
> well.
> > >
> > > That one looks fine to me.  The refcount is already increased in the
> > > caller
> > > get_bound_vga() via pci_get_domain_bus_and_slot() and it's increased
> > > again in atpx_present() via pci_get_class().  It needs to be
> > > decremented in
> > > atpx_present() to avoid leaking a ref.
> >
> > I'm not following.  This is part of the same loop as the one you removed.  All
> we are doing is checking whether the ATPX method exists or not om the
> platform.  The pdev may not be the same one as the one in
> pci_get_domain_bus_and_slot().  The APTX method in the APU's ACPI
> namespace, not the dGPUs.
> 
> Well, the tricky part is that pci_get_class() itself does unrefeference the old
> object and reference the new object (if found).
> At the end of the loop, nothing is referenced, so it's fine.
> OTOH, if you go out of the loop in the middle, you're still keeping the pdev
> object reference, so you need to manually unreference it.
> 

Ah, I see what you are saying.  Thanks.  Patch is:
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> 
> Takashi
> 
> >
> > Alex
> >
> > >
> > > Thanks,
> > >
> > > Lukas
> > >
> > > > > diff --git a/sound/pci/hda/hda_intel.c
> > > > > b/sound/pci/hda/hda_intel.c index 35b4526f0d28..b856b89378ac
> > > > > 100644
> > > > > --- a/sound/pci/hda/hda_intel.c
> > > > > +++ b/sound/pci/hda/hda_intel.c
> > > > > @@ -1419,7 +1419,6 @@ static bool atpx_present(void)
> > > > >  				return true;
> > > > >  			}
> > > > >  		}
> > > > > -		pci_dev_put(pdev);
> > > > >  	}
> > > > >  	return false;
> > > > >  }
> > > > > --
> > > > > 2.24.0
> >

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Linux v5.5 serious PCI bug
  2019-12-10 12:46           ` Takashi Iwai
@ 2019-12-11  7:33             ` Jiasen Lin
  0 siblings, 0 replies; 21+ messages in thread
From: Jiasen Lin @ 2019-12-11  7:33 UTC (permalink / raw)
  To: Takashi Iwai, Lukas Wunner
  Cc: Nicholas Johnson, mika.westerberg, Bjorn Helgaas, linux-kernel,
	linux-pci, alexander.deucher



On 2019/12/10 20:46, Takashi Iwai wrote:
> On Tue, 10 Dec 2019 13:29:41 +0100,
> Lukas Wunner wrote:
>>
>> [cc += Alex, Takashi]
>>
>> On Tue, Dec 10, 2019 at 12:00:23PM +0000, Nicholas Johnson wrote:
>>> On Tue, Dec 10, 2019 at 09:28:00AM +0200, mika.westerberg@linux.intel.com wrote:
>>>> On Mon, Dec 09, 2019 at 01:33:49PM +0000, Nicholas Johnson wrote:
>>>>> On Mon, Dec 09, 2019 at 03:12:39PM +0200, mika.westerberg@linux.intel.com wrote:
>>>>>> On Mon, Dec 09, 2019 at 12:34:04PM +0000, Nicholas Johnson wrote:
>>>>>>> I have compiled Linux v5.5-rc1 and thought all was good until I
>>>>>>> hot-removed a Gigabyte Aorus eGPU from Thunderbolt. The driver for the
>>>>>>> GPU was not loaded (blacklisted) so the crash is nothing to do with the
>>>>>>> GPU driver.
>>>>>>>
>>>>>>> We had:
>>>>>>> - kernel NULL pointer dereference
>>>>>>> - refcount_t: underflow; use-after-free.
>>>
>>> The following is the culprit responsible for the issues:
>>>
>>> commit 586bc4aab878efcf672536f0cdec3d04b6990c94
>>> Author: Alex Deucher <alexander.deucher@amd.com>
>>> Date:   Fri Nov 22 16:43:50 2019 -0500
>>>
>>>      ALSA: hda/hdmi - fix vgaswitcheroo detection for AMD
>>
>> Does the below fix the issue?
>>
>> -- >8 --
>> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
>> index 35b4526f0d28..b856b89378ac 100644
>> --- a/sound/pci/hda/hda_intel.c
>> +++ b/sound/pci/hda/hda_intel.c
>> @@ -1419,7 +1419,6 @@ static bool atpx_present(void)
>>   				return true;
>>   			}
>>   		}
>> -		pci_dev_put(pdev);
>>   	}
>>   	return false;
>>   }
> 
> Oh this looks really like a bug, even if this isn't the root cause.
> 
> Care to submit a proper patch?
> 
Hi Nicholas
I have disassembled the vmlinux by objdump -Sdl, but is it too big to
send as an attachment. Just take a look at the key information:
The file dmesg-5 show trace:
[   71.693210] general protection fault: 0000 [#1] SMP PTI
[   71.693215] CPU: 1 PID: 184 Comm: irq/128-pciehp Not tainted 5.5.0-rc1 #2
[   71.693217] Hardware name: Dell Inc. XPS 13 9370/09DKKT, BIOS 1.11.1 
07/11/2019
[   71.693223] RIP: 0010:pci_remove_bus_device+0x9a/0x100
[   71.693250] Call Trace:
[   71.693256]  pci_remove_bus_device+0x31/0x100
[   71.693259]  pci_remove_bus_device+0x31/0x100
[   71.693261]  pci_stop_and_remove_bus_device+0x1b/0x20

.....

[   71.709515] Call Trace:
[   71.709527]  kobject_put+0xfc/0x1c0
[   71.709536]  __device_link_free_srcu+0x51/0x60
[   71.709538]  srcu_invoke_callbacks+0xd2/0x170
[   71.709547]  process_one_work+0x1ec/0x3a0
[   71.709550]  worker_thread+0x4d/0x400
[   71.709554]  kthread+0x104/0x140
[   71.709557]  ? process_one_work+0x3a0/0x3a0
[   71.709559]  ? kthread_park+0x90/0x90
[   71.709563]  ret_from_fork+0x35/0x40

the disassembling file
ffffffff814eea30 <pci_remove_bus_device>:
pci_remove_bus_device():
/home/higon/linux/drivers/pci/remove.c:86

static void pci_remove_bus_device(struct pci_dev *dev)
{
ffffffff814eea30:	e8 bb 2e 51 00       	callq  ffffffff81a018f0 <__fentry__>
ffffffff814eea35:	41 55                	push   %r13
ffffffff814eea37:	41 54                	push   %r12
ffffffff814eea39:	55                   	push   %rbp
ffffffff814eea3a:	53                   	push   %rbx
ffffffff814eea3b:	48 89 fd             	mov    %rdi,%rbp
/home/higon/linux/drivers/pci/remove.c:87
	struct pci_bus *bus = dev->subordinate;
ffffffff814eea3e:	4c 8b 6f 18          	mov    0x18(%rdi),%r13
/home/higon/linux/drivers/pci/remove.c:90
	struct pci_dev *child, *tmp;

........

static inline void list_del(struct list_head *entry)
{
	__list_del_entry(entry);
	entry->next = LIST_POISON1;
ffffffff814eeac0:	48 b8 00 01 00 00 00 	movabs $0xdead000000000100,%rax
ffffffff814eeac7:	00 ad de
ffffffff814eeaca:	48 89 45 00          	mov    %rax,0x0(%rbp)
/home/higon/linux/./include/linux/list.h:141
	entry->prev = LIST_POISON2;
ffffffff814eeace:	48 b8 22 01 00 00 00 	movabs $0xdead000000000122,%rax
ffffffff814eead5:	00 ad de
ffffffff814eead8:	48 89 45 08          	mov    %rax,0x8(%rbp)
pci_destroy_dev():
/home/higon/linux/drivers/pci/remove.c:39
ffffffff814eeadc:	e8 2f 23 bf ff       	callq  ffffffff810e0e10 <up_write>
/home/higon/linux/drivers/pci/remove.c:41

In the file source/sound/pci/hda/hda_intel.c, atpx_present can not call
pci_dev_put(pdev) because the reference count for pdev is always
decremented if it is not NULL after call pci_get_class.
Resource of pdev will be released when the reference count for it
decremented to zero. Hotplug service dereferences NULL pointer when 
remove the device from the device lists.

static bool atpx_present(void)
{
	struct pci_dev *pdev = NULL;
	acpi_handle dhandle, atpx_handle;
	acpi_status status;

	while ((pdev = pci_get_class(PCI_BASE_CLASS_DISPLAY << 16, pdev)) != 
NULL) {
		dhandle = ACPI_HANDLE(&pdev->dev);
		if (dhandle) {
			status = acpi_get_handle(dhandle, "ATPX", &atpx_handle);
			if (!ACPI_FAILURE(status)) {
				pci_dev_put(pdev);
				return true;
			}
		}
		pci_dev_put(pdev);
	}
	return false;
}
struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from)
{
	struct pci_device_id id = {
		.vendor = PCI_ANY_ID,
		.device = PCI_ANY_ID,
		.subvendor = PCI_ANY_ID,
		.subdevice = PCI_ANY_ID,
		.class_mask = PCI_ANY_ID,
		.class = class,
	};

	return pci_get_dev_by_id(&id, from);
}
static struct pci_dev *pci_get_dev_by_id(const struct pci_device_id *id,
					 struct pci_dev *from)
{
	struct device *dev;
	struct device *dev_start = NULL;
	struct pci_dev *pdev = NULL;

	WARN_ON(in_interrupt());
	if (from)
		dev_start = &from->dev;
	dev = bus_find_device(&pci_bus_type, dev_start, (void *)id,
			      match_pci_dev_by_id);
	if (dev)
		pdev = to_pci_dev(dev);
	pci_dev_put(from);
	return pdev;
}

Thanks,
Jiasen Lin

> 
> thanks,
> 
> Takashi
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-12-11  7:37 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <PSXP216MB0438BFEAA0617283A834E11580580@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM>
2019-12-09 12:37 ` Linux v5.5 serious PCI bug Pavel Machek
2019-12-09 13:07   ` Nicholas Johnson
2019-12-09 13:12 ` mika.westerberg
2019-12-09 13:29   ` Nicholas Johnson
     [not found]   ` <PSXP216MB043809A423446A6EF2C7909A80580@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM>
2019-12-10  7:28     ` mika.westerberg
2019-12-10 12:00       ` Nicholas Johnson
2019-12-10 12:29         ` Lukas Wunner
2019-12-10 12:46           ` Takashi Iwai
2019-12-11  7:33             ` Jiasen Lin
2019-12-10 12:52           ` Nicholas Johnson
2019-12-10 12:34         ` mika.westerberg
2019-12-10 13:39 ` [PATCH] ALSA: hda/hdmi - Fix duplicate unref of pci_dev Lukas Wunner
2019-12-10 13:41   ` Takashi Iwai
2019-12-10 13:47   ` Nicholas Johnson
2019-12-10 13:50     ` Takashi Iwai
2019-12-10 15:34   ` Deucher, Alexander
2019-12-10 15:46     ` Lukas Wunner
2019-12-10 15:53       ` Deucher, Alexander
2019-12-10 16:10         ` Takashi Iwai
2019-12-10 16:51           ` Deucher, Alexander
2019-12-10 16:13         ` Lukas Wunner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).