linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* PCIe ASPM crash on device removal
@ 2013-01-18 18:22 Joe Lawrence
  2013-01-18 18:23 ` [PATCH 1/2] PCI: ASPM exit link state code could skip devices Joe Lawrence
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Joe Lawrence @ 2013-01-18 18:22 UTC (permalink / raw)
  To: linux-pci, Matthew Garrett, Myron Stowe, David Bulkow, Joe Lawrence

Stratus has encountered crashes in 3.8.0-rc2 ASPM tracking code when 
hotplugging PCI buses.  The call trace of a crash usually looks like:

pci_stop_and_remove_bus_device
  pci_stop_bus_device
    pcie_aspm_exit_link_state
      pcie_update_aspm_capable
        pcie_aspm_check_latency

where ASPM is trying to access a freed data structure.  When instrumenting
the ASPM init/exit code, it seems that pcie_aspm_exit_link_state is 
skipping over one of our PCIe Downstream Ports that 
pcie_aspm_init_link_state had prior allocated link_state.  Subsequent 
trips through ASPM exit code traverse through the hierarchy and crash 
trying to access freed memory (slub_debug was set to FZPU and  
pcie_aspm_check_latency is trying to access a link* that is set to 
0x6b6b6b6b6b6b6b6b.) 

A brief summary of the init/exit link_state routines is as follows.

  pcie_aspm_init_link_state will alloc_pcie_link_state if:
  
  1 - Device is PCIe
  2 - Hasn't already allocated a link_state
  3 - Device is PCI_EXP_TYPE_ROOT_PORT or PCI_EXP_TYPE_DOWNSTREAM
  4 - Device is not VIA chipset (root port under bridge)
  5 - Device's subordinate device list is not empty


  pcie_aspm_exit_link_state will free_link_state of *parent* device if:
  
  1 - Device is PCIe
  2 - Device has a parent device
  3 - Parent device has a link_state
  4 - Parent device is PCI_EXP_TYPE_ROOT_PORT or PCI_EXP_TYPE_DOWNSTREAM
  5 - Device is the last entry in parent's subordinate device list


The Stratus PCI topology includes a branch that looks like: 

... -00.0-[03-3c]----00.0-[04-2d]--+- ...
                                   |
                                   \-01.0-[2c-2d]--+-00.0
                                                   +-00.1
                                                   \-1f.0

% lspci -vs 03:00.0
03:00.0 PCI bridge: Device 1bcf:0004 (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Memory at 93000000 (64-bit, non-prefetchable) [size=4K]
        Bus: primary=03, secondary=04, subordinate=2d, sec-latency=0
        I/O behind bridge: 00001000-00005fff
        Memory behind bridge: 84000000-92ffffff
        Prefetchable memory behind bridge: 00000000c4000000-00000000c7ffffff
        Capabilities: [b0] Express Upstream Port, MSI 00
        Capabilities: [ec] Power Management version 2
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: pcieport

% lspci -vs 04:01.0
04:01.0 PCI bridge: Device 1bcf:0009 (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=04, secondary=2c, subordinate=2d, sec-latency=0
        I/O behind bridge: 00005000-00005fff
        Memory behind bridge: 90000000-92ffffff
        Capabilities: [b0] Express Downstream Port (Slot-), MSI 00
        Capabilities: [ec] Power Management version 2
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: pcieport

% lspci -vs 2c:00.0
2c:00.0 USB Controller: Intel Corporation Patsburg USB2 Enhanced Host 
                        Controller #1 (rev 06) (prog-if 20 [EHCI])
        Subsystem: Device 1bcf:8002
        Flags: bus master, medium devsel, latency 0, IRQ 10
        Memory at 90000000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Debug port: BAR=1 offset=00a0
        Capabilities: [98] PCI Advanced Features
        Kernel driver in use: ehci-pci

% lspci -vs 2c:00.1
2c:00.1 USB Controller: Intel Corporation Patsburg USB2 Enhanced Host
                        Controller #2 (rev 06) (prog-if 20 [EHCI])
        Subsystem: Device 1bcf:8002
        Flags: bus master, medium devsel, latency 0, IRQ 11
        Memory at 90001000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Debug port: BAR=1 offset=00a0
        Capabilities: [98] PCI Advanced Features
        Kernel driver in use: ehci-pci

% lspci -vs 2c:1f.0
2c:1f.0 ISA bridge: Intel Corporation Patsburg LPC Controller (rev 06)
        Subsystem: Device 1bcf:8002
        Flags: bus master, medium devsel, latency 0
        Capabilities: [e0] Vendor Specific Information: Len=0c <?>

When debugging is added to pcie_init_link_state, 04:01.0 does set 
blacklist to 1, as the pcie_aspm_sanity_check figures out that at least 
one of its children is not PCIe.  However, all of the criteria 1-5 above 
is met and a link_state is allocated for this device.

On device removal, pcie_aspm_exit_link_state will never clean up 04:01.0 as
2c:00.0, 2c:00.1, and most importantly 2c:1f.0 are not PCIe (criteria 1).


In reply to this mail, I will post two patches I have been testing this 
week:

[PATCH 1/2] PCI: ASPM exit link state code could skip devices

In looking at pcie_aspm_exit_link_state, might it be better to verify that
the *parent* device is PCIe rather than the device itself?  This would
prevent early exit when the last function is not PCIe and also verify that
the parent is PCIe before calling pci_pcie_type on it.

Furthermore, I didn't investigate this, but does the strategy of the 
last device function out cleaning up the parent link_state leave another
whole -- what if there are no child devices in the first place?

I'd be happy to test other approaches here.

[PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled

Matthew Garrett had provided us this patch to use against a RHEL 6.3 
kernel.  It prevents ASPM from allocating any link_state when 
'pcie_aspm=off' is specified on the kernel command line.  Stratus has been 
using this patch to avoid this crash in RHEL 6.3.

When testing with Patch 1, this second patch was unnecessary (I removed 
any pcie_aspm directives from the boot line).  However given its utility 
in avoiding the crash in the first place, it would be nice to offer the 
user a means to turn off ASPM tracking to avoid other potential issues.

Regards,

-- Joe

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-01-18 18:22 PCIe ASPM crash on device removal Joe Lawrence
@ 2013-01-18 18:23 ` Joe Lawrence
  2013-01-31 23:29   ` Myron Stowe
  2013-02-06 10:09   ` Gu Zheng
  2013-01-18 18:24 ` [PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled Joe Lawrence
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 28+ messages in thread
From: Joe Lawrence @ 2013-01-18 18:23 UTC (permalink / raw)
  To: Joe Lawrence; +Cc: linux-pci, Matthew Garrett, Myron Stowe, David Bulkow

>From 3a51bbad5555ba6c6e144aa5176c8112eb449325 Mon Sep 17 00:00:00 2001
From: Joe Lawrence <joe.lawrence@stratus.com>
Date: Tue, 15 Jan 2013 14:51:57 -0500
Subject: [PATCH 1/2] PCI: ASPM exit link state code could skip devices

On PCI bus hotplug removal, pcie_aspm_exit_link_state could potentially skip
parent devices that have link_state allocated.  Instead of exiting early if
a given device is not PCIe, check that the device's parent is PCIe.  This
enables pcie_aspm_exit_link_state to properly clean up parent link_state when
the last function in a slot might not be PCIe.

Reviewed-by: David Bulkow <david.bulkow@stratus.com>
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
---
 drivers/pci/pcie/aspm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index b52630b..6122447 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -634,7 +634,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
 	struct pci_dev *parent = pdev->bus->self;
 	struct pcie_link_state *link, *root, *parent_link;
 
-	if (!pci_is_pcie(pdev) || !parent || !parent->link_state)
+	if (!parent || !pci_is_pcie(parent) || !parent->link_state)
 		return;
 	if ((pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT) &&
 	    (pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM))
-- 
1.8.0.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled
  2013-01-18 18:22 PCIe ASPM crash on device removal Joe Lawrence
  2013-01-18 18:23 ` [PATCH 1/2] PCI: ASPM exit link state code could skip devices Joe Lawrence
@ 2013-01-18 18:24 ` Joe Lawrence
  2013-01-18 22:54   ` Myron Stowe
       [not found] ` <CAL-B5D0+6uO7WDYR7inmZKdU0h8-bpkOs_CzbF0bD2b9i6=1ZA@mail.gmail.com>
  2013-01-18 19:57 ` Myron Stowe
  3 siblings, 1 reply; 28+ messages in thread
From: Joe Lawrence @ 2013-01-18 18:24 UTC (permalink / raw)
  To: Joe Lawrence; +Cc: linux-pci, Matthew Garrett, Myron Stowe, David Bulkow

>From 108cca7bee5b04f45d9712507d14b5f3fbc22c29 Mon Sep 17 00:00:00 2001
From: Joe Lawrence <joe.lawrence@stratus.com>
Date: Tue, 15 Jan 2013 15:31:28 -0500
Subject: [PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled

Don't allocate and track PCIe ASPM state when "pcie_aspm=off" is specified on
the kernel parameter list.

Based-on-patch-from: Matthew Garrett <mjg59@srcf.ucam.org>
Reviewed-by: David Bulkow <david.bulkow@stratus.com>
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
---
 drivers/pci/pcie/aspm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 6122447..f912a93 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -556,6 +556,9 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev)
 	struct pcie_link_state *link;
 	int blacklist = !!pcie_aspm_sanity_check(pdev);
 
+	if (!aspm_support_enabled)
+		return;
+
 	if (!pci_is_pcie(pdev) || pdev->link_state)
 		return;
 	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT &&
-- 
1.8.0.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: PCIe ASPM crash on device removal
       [not found] ` <CAL-B5D0+6uO7WDYR7inmZKdU0h8-bpkOs_CzbF0bD2b9i6=1ZA@mail.gmail.com>
@ 2013-01-18 19:53   ` Joe Lawrence
  2013-01-18 23:15     ` Myron Stowe
  0 siblings, 1 reply; 28+ messages in thread
From: Joe Lawrence @ 2013-01-18 19:53 UTC (permalink / raw)
  To: Myron Stowe
  Cc: Joe Lawrence, linux-pci, Matthew Garrett, Myron Stowe, David Bulkow

On Fri, 18 Jan 2013, Myron Stowe wrote:

> On Fri, Jan 18, 2013 at 11:22 AM, Joe Lawrence <Joe.Lawrence@stratus.com>wrote:
> 
> > The Stratus PCI topology includes a branch that looks like:
> >
> > ... -00.0-[03-3c]----00.0-[04-2d]--+- ...
> >                                    |
> >                                    \-01.0-[2c-2d]--+-00.0
> >                                                    +-00.1
> >                                                    \-1f.0
> >
> > This is an interesting topology.  The switch contains an Express capable
> downstream port - 04:01.0 - leading to PCI (non-Express) devices (2c:00.0,
> 2c:00.1, and 2c:1f.0).  Would Express links even be used in this topology?
> I'm guessing not, which brings up the question: why would ASPM be inserting
> link state structures into its link_list for such a topology?  Seems like
> the proper thing to do is change the code in the beginning of
> pcie_aspm_init_link_state, or pcie_aspm_sanity_check() with some
> re-factoring, to short-circuit out and do nothing (even when ASPM is
> enabled in the kernel).
> 
> Could you supply an "lspci -xxx -vvs 04:01.0?  It would be interesting to
> see what the "express capabilities" LnkCap indicates with respect to its
> ASPM bits (11:10 Active State Power Management (ASPM) Support bits).
> 
> Myron

Hi Myron,

See lspci output below (hopefully won't be word wrap mangled)

-- Joe


04:01.0 PCI bridge: Device 1bcf:0009 (rev 01) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=04, secondary=2c, subordinate=2d, sec-latency=0
	I/O behind bridge: 00005000-00005fff
	Memory behind bridge: 90000000-92ffffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [b0] Express (v2) Downstream Port (Slot-), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- RBE- FLReset-
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 5GT/s, Width x4, ASPM L1, Latency L0 unlimited, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; Disabled- Retrain- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: Unknown, EnterCompliance+ SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS+
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [ec] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Kernel driver in use: pcieport
00: cf 1b 09 00 07 00 10 00 01 00 04 06 10 00 01 00
10: 00 00 00 00 00 00 00 00 04 2c 2d 00 51 51 00 00
20: 00 90 f0 92 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 b0 00 00 00 00 00 00 00 0b 01 00 00
40: 00 01 00 00 10 00 00 00 0a 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 10 ec 62 00 01 00 00 00 3f 28 00 00 42 78 00 01
c0: 00 00 41 10 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 1f 08 01 00 00 00 00 00 00 00 00 00 01 00 02 c8
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: PCIe ASPM crash on device removal
  2013-01-18 18:22 PCIe ASPM crash on device removal Joe Lawrence
                   ` (2 preceding siblings ...)
       [not found] ` <CAL-B5D0+6uO7WDYR7inmZKdU0h8-bpkOs_CzbF0bD2b9i6=1ZA@mail.gmail.com>
@ 2013-01-18 19:57 ` Myron Stowe
  3 siblings, 0 replies; 28+ messages in thread
From: Myron Stowe @ 2013-01-18 19:57 UTC (permalink / raw)
  To: Joe Lawrence; +Cc: linux-pci, Matthew Garrett, Myron Stowe, David Bulkow

Re-posting after fixing gmail to not use rich text (html) formatting ...

On Fri, Jan 18, 2013 at 11:22 AM, Joe Lawrence <Joe.Lawrence@stratus.com> wrote:
> Stratus has encountered crashes in 3.8.0-rc2 ASPM tracking code when
> hotplugging PCI buses.  The call trace of a crash usually looks like:
>
> pci_stop_and_remove_bus_device
>   pci_stop_bus_device
>     pcie_aspm_exit_link_state
>       pcie_update_aspm_capable
>         pcie_aspm_check_latency
>
> where ASPM is trying to access a freed data structure.  When instrumenting
> the ASPM init/exit code, it seems that pcie_aspm_exit_link_state is
> skipping over one of our PCIe Downstream Ports that
> pcie_aspm_init_link_state had prior allocated link_state.  Subsequent
> trips through ASPM exit code traverse through the hierarchy and crash
> trying to access freed memory (slub_debug was set to FZPU and
> pcie_aspm_check_latency is trying to access a link* that is set to
> 0x6b6b6b6b6b6b6b6b.)
>
> A brief summary of the init/exit link_state routines is as follows.
>
>   pcie_aspm_init_link_state will alloc_pcie_link_state if:
>
>   1 - Device is PCIe
>   2 - Hasn't already allocated a link_state
>   3 - Device is PCI_EXP_TYPE_ROOT_PORT or PCI_EXP_TYPE_DOWNSTREAM
>   4 - Device is not VIA chipset (root port under bridge)
>   5 - Device's subordinate device list is not empty
>
>
>   pcie_aspm_exit_link_state will free_link_state of *parent* device if:
>
>   1 - Device is PCIe
>   2 - Device has a parent device
>   3 - Parent device has a link_state
>   4 - Parent device is PCI_EXP_TYPE_ROOT_PORT or PCI_EXP_TYPE_DOWNSTREAM
>   5 - Device is the last entry in parent's subordinate device list
>
>
> The Stratus PCI topology includes a branch that looks like:
>
> ... -00.0-[03-3c]----00.0-[04-2d]--+- ...
>                                    |
>                                    \-01.0-[2c-2d]--+-00.0
>                                                    +-00.1
>                                                    \-1f.0
>
This is an interesting topology.  The switch contains an Express
capable downstream port - 04:01.0 - leading to PCI (non-Express)
devices (2c:00.0, 2c:00.1, and 2c:1f.0).  Would Express links even be
used in this topology?  I'm guessing not, which brings up the
question: why would ASPM be inserting link state structures into its
link_list for such a topology?  Seems like the proper thing to do is
change the code in the beginning of pcie_aspm_init_link_state, or
pcie_aspm_sanity_check() with some re-factoring, to short-circuit out
and do nothing (even when ASPM is enabled in the kernel).

Could you supply an "lspci -xxx -vvs 04:01.0?  It would be interesting
to see what the "express capabilities" LnkCap indicates with respect
to its ASPM bits (11:10 Active State Power Management (ASPM) Support
bits).

Myron

> % lspci -vs 03:00.0
> 03:00.0 PCI bridge: Device 1bcf:0004 (rev 01) (prog-if 00 [Normal decode])
>         Flags: bus master, fast devsel, latency 0
>         Memory at 93000000 (64-bit, non-prefetchable) [size=4K]
>         Bus: primary=03, secondary=04, subordinate=2d, sec-latency=0
>         I/O behind bridge: 00001000-00005fff
>         Memory behind bridge: 84000000-92ffffff
>         Prefetchable memory behind bridge: 00000000c4000000-00000000c7ffffff
>         Capabilities: [b0] Express Upstream Port, MSI 00
>         Capabilities: [ec] Power Management version 2
>         Capabilities: [100] Advanced Error Reporting
>         Kernel driver in use: pcieport
>
> % lspci -vs 04:01.0
> 04:01.0 PCI bridge: Device 1bcf:0009 (rev 01) (prog-if 00 [Normal decode])
>         Flags: bus master, fast devsel, latency 0
>         Bus: primary=04, secondary=2c, subordinate=2d, sec-latency=0
>         I/O behind bridge: 00005000-00005fff
>         Memory behind bridge: 90000000-92ffffff
>         Capabilities: [b0] Express Downstream Port (Slot-), MSI 00
>         Capabilities: [ec] Power Management version 2
>         Capabilities: [100] Advanced Error Reporting
>         Kernel driver in use: pcieport
>
> % lspci -vs 2c:00.0
> 2c:00.0 USB Controller: Intel Corporation Patsburg USB2 Enhanced Host
>                         Controller #1 (rev 06) (prog-if 20 [EHCI])
>         Subsystem: Device 1bcf:8002
>         Flags: bus master, medium devsel, latency 0, IRQ 10
>         Memory at 90000000 (32-bit, non-prefetchable) [size=1K]
>         Capabilities: [50] Power Management version 2
>         Capabilities: [58] Debug port: BAR=1 offset=00a0
>         Capabilities: [98] PCI Advanced Features
>         Kernel driver in use: ehci-pci
>
> % lspci -vs 2c:00.1
> 2c:00.1 USB Controller: Intel Corporation Patsburg USB2 Enhanced Host
>                         Controller #2 (rev 06) (prog-if 20 [EHCI])
>         Subsystem: Device 1bcf:8002
>         Flags: bus master, medium devsel, latency 0, IRQ 11
>         Memory at 90001000 (32-bit, non-prefetchable) [size=1K]
>         Capabilities: [50] Power Management version 2
>         Capabilities: [58] Debug port: BAR=1 offset=00a0
>         Capabilities: [98] PCI Advanced Features
>         Kernel driver in use: ehci-pci
>
> % lspci -vs 2c:1f.0
> 2c:1f.0 ISA bridge: Intel Corporation Patsburg LPC Controller (rev 06)
>         Subsystem: Device 1bcf:8002
>         Flags: bus master, medium devsel, latency 0
>         Capabilities: [e0] Vendor Specific Information: Len=0c <?>
>
> When debugging is added to pcie_init_link_state, 04:01.0 does set
> blacklist to 1, as the pcie_aspm_sanity_check figures out that at least
> one of its children is not PCIe.  However, all of the criteria 1-5 above
> is met and a link_state is allocated for this device.
>
> On device removal, pcie_aspm_exit_link_state will never clean up 04:01.0 as
> 2c:00.0, 2c:00.1, and most importantly 2c:1f.0 are not PCIe (criteria 1).
>
>
> In reply to this mail, I will post two patches I have been testing this
> week:
>
> [PATCH 1/2] PCI: ASPM exit link state code could skip devices
>
> In looking at pcie_aspm_exit_link_state, might it be better to verify that
> the *parent* device is PCIe rather than the device itself?  This would
> prevent early exit when the last function is not PCIe and also verify that
> the parent is PCIe before calling pci_pcie_type on it.
>
> Furthermore, I didn't investigate this, but does the strategy of the
> last device function out cleaning up the parent link_state leave another
> whole -- what if there are no child devices in the first place?
>
> I'd be happy to test other approaches here.
>
> [PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled
>
> Matthew Garrett had provided us this patch to use against a RHEL 6.3
> kernel.  It prevents ASPM from allocating any link_state when
> 'pcie_aspm=off' is specified on the kernel command line.  Stratus has been
> using this patch to avoid this crash in RHEL 6.3.
>
> When testing with Patch 1, this second patch was unnecessary (I removed
> any pcie_aspm directives from the boot line).  However given its utility
> in avoiding the crash in the first place, it would be nice to offer the
> user a means to turn off ASPM tracking to avoid other potential issues.
>
> Regards,
>
> -- Joe
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled
  2013-01-18 18:24 ` [PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled Joe Lawrence
@ 2013-01-18 22:54   ` Myron Stowe
  2013-02-01 22:32     ` Bjorn Helgaas
  0 siblings, 1 reply; 28+ messages in thread
From: Myron Stowe @ 2013-01-18 22:54 UTC (permalink / raw)
  To: Joe Lawrence; +Cc: linux-pci, Matthew Garrett, David Bulkow

On Fri, 2013-01-18 at 13:24 -0500, Joe Lawrence wrote:
> From 108cca7bee5b04f45d9712507d14b5f3fbc22c29 Mon Sep 17 00:00:00 2001
> From: Joe Lawrence <joe.lawrence@stratus.com>
> Date: Tue, 15 Jan 2013 15:31:28 -0500
> Subject: [PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled
> 
> Don't allocate and track PCIe ASPM state when "pcie_aspm=off" is specified on
> the kernel parameter list.
> 
> Based-on-patch-from: Matthew Garrett <mjg59@srcf.ucam.org>
> Reviewed-by: David Bulkow <david.bulkow@stratus.com>
> Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
> ---
>  drivers/pci/pcie/aspm.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 6122447..f912a93 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -556,6 +556,9 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev)
>  	struct pcie_link_state *link;
>  	int blacklist = !!pcie_aspm_sanity_check(pdev);
>  
> +	if (!aspm_support_enabled)
> +		return;
> +
>  	if (!pci_is_pcie(pdev) || pdev->link_state)
>  		return;
>  	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT &&

I was considering whether putting in a similar check in the
pcie_aspm_exit_link_state() would be best in order to "short circuit"
all the unnecessary processing that still occurs with ASPM turned off
and also for symmetry.  Bjorn suggested just sticking with the fix in
pcie_aspm_init_link_state() since adding something similar in
pcie_aspm_exit_link_state() may create other issues and does not help
with simplifying ASPM's implementation.

Acked-by: Myron Stowe <myron.stowe@redhat.com>



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: PCIe ASPM crash on device removal
  2013-01-18 19:53   ` PCIe ASPM crash on device removal Joe Lawrence
@ 2013-01-18 23:15     ` Myron Stowe
  2013-01-18 23:41       ` Myron Stowe
  0 siblings, 1 reply; 28+ messages in thread
From: Myron Stowe @ 2013-01-18 23:15 UTC (permalink / raw)
  To: Joe Lawrence; +Cc: linux-pci, Matthew Garrett, Myron Stowe, David Bulkow

On Fri, Jan 18, 2013 at 12:53 PM, Joe Lawrence <Joe.Lawrence@stratus.com> wrote:
> On Fri, 18 Jan 2013, Myron Stowe wrote:
>
>> On Fri, Jan 18, 2013 at 11:22 AM, Joe Lawrence <Joe.Lawrence@stratus.com>wrote:
>>
>> > The Stratus PCI topology includes a branch that looks like:
>> >
>> > ... -00.0-[03-3c]----00.0-[04-2d]--+- ...
>> >                                    |
>> >                                    \-01.0-[2c-2d]--+-00.0
>> >                                                    +-00.1
>> >                                                    \-1f.0
>> >
>> > This is an interesting topology.  The switch contains an Express capable
>> downstream port - 04:01.0 - leading to PCI (non-Express) devices (2c:00.0,
>> 2c:00.1, and 2c:1f.0).  Would Express links even be used in this topology?
>> I'm guessing not, which brings up the question: why would ASPM be inserting
>> link state structures into its link_list for such a topology?  Seems like
>> the proper thing to do is change the code in the beginning of
>> pcie_aspm_init_link_state, or pcie_aspm_sanity_check() with some
>> re-factoring, to short-circuit out and do nothing (even when ASPM is
>> enabled in the kernel).
>>
>> Could you supply an "lspci -xxx -vvs 04:01.0?  It would be interesting to
>> see what the "express capabilities" LnkCap indicates with respect to its
>> ASPM bits (11:10 Active State Power Management (ASPM) Support bits).
>>
>> Myron
>
> Hi Myron,
>
> See lspci output below (hopefully won't be word wrap mangled)
>
> -- Joe
>
>
> 04:01.0 PCI bridge: Device 1bcf:0009 (rev 01) (prog-if 00 [Normal decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Bus: primary=04, secondary=2c, subordinate=2d, sec-latency=0
>         I/O behind bridge: 00005000-00005fff
>         Memory behind bridge: 90000000-92ffffff
>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
>         BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
>                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>         Capabilities: [b0] Express (v2) Downstream Port (Slot-), MSI 00
>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>                         ExtTag- RBE- FLReset-
>                 DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>                         MaxPayload 256 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>                 LnkCap: Port #1, Speed 5GT/s, Width x4, ASPM L1, Latency L0 unlimited, L1 <1us
>                         ClockPM- Surprise- LLActRep- BwNot-
>                 LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>                 DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd-
>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
>                 LnkCtl2: Target Link Speed: Unknown, EnterCompliance+ SpeedDis-, Selectable De-emphasis: -6dB
>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS+
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
>                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>         Capabilities: [ec] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [100 v1] Advanced Error Reporting
>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>                 AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
>         Kernel driver in use: pcieport
> 00: cf 1b 09 00 07 00 10 00 01 00 04 06 10 00 01 00
> 10: 00 00 00 00 00 00 00 00 04 2c 2d 00 51 51 00 00
> 20: 00 90 f0 92 f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 b0 00 00 00 00 00 00 00 0b 01 00 00
> 40: 00 01 00 00 10 00 00 00 0a 00 00 00 00 00 00 00
> 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> b0: 10 ec 62 00 01 00 00 00 3f 28 00 00 42 78 00 01
> c0: 00 00 41 10 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> e0: 1f 08 01 00 00 00 00 00 00 00 00 00 01 00 02 c8
> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>

Joe:

Thanks for the data.  So the downstream port of interest has ASPM link
capability but it's currently not enabled - see LnkCap and LnkCtl
above.

I'm still do not understand if PCI Express links would even be
involved in a topology where all the devices connected below the
downstream port are PCI and not PCI Express.  Seems as if the ASPM
code is going to a lot of work to put link state structures in place
for all these devices that would not be capable of supporting ASPM.

I'm still trying to come up to speed understanding ASPM so hopefully
someone knowledgeable can help clue me in.

Myron

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: PCIe ASPM crash on device removal
  2013-01-18 23:15     ` Myron Stowe
@ 2013-01-18 23:41       ` Myron Stowe
  2013-01-19  1:03         ` Joe Lawrence
  0 siblings, 1 reply; 28+ messages in thread
From: Myron Stowe @ 2013-01-18 23:41 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: linux-pci, Matthew Garrett, Myron Stowe, David Bulkow, shli,
	shli, kaneshige.kenji

[+cc Shaohua and Kenji]

On Fri, Jan 18, 2013 at 4:15 PM, Myron Stowe <myron.stowe@gmail.com> wrote:
> On Fri, Jan 18, 2013 at 12:53 PM, Joe Lawrence <Joe.Lawrence@stratus.com> wrote:
>> On Fri, 18 Jan 2013, Myron Stowe wrote:
>>
>>> On Fri, Jan 18, 2013 at 11:22 AM, Joe Lawrence <Joe.Lawrence@stratus.com>wrote:
>>>
>>> > The Stratus PCI topology includes a branch that looks like:
>>> >
>>> > ... -00.0-[03-3c]----00.0-[04-2d]--+- ...
>>> >                                    |
>>> >                                    \-01.0-[2c-2d]--+-00.0
>>> >                                                    +-00.1
>>> >                                                    \-1f.0
>>> >
>>> > This is an interesting topology.  The switch contains an Express capable
>>> downstream port - 04:01.0 - leading to PCI (non-Express) devices (2c:00.0,
>>> 2c:00.1, and 2c:1f.0).  Would Express links even be used in this topology?
>>> I'm guessing not, which brings up the question: why would ASPM be inserting
>>> link state structures into its link_list for such a topology?  Seems like
>>> the proper thing to do is change the code in the beginning of
>>> pcie_aspm_init_link_state, or pcie_aspm_sanity_check() with some
>>> re-factoring, to short-circuit out and do nothing (even when ASPM is
>>> enabled in the kernel).
>>>
>>> Could you supply an "lspci -xxx -vvs 04:01.0?  It would be interesting to
>>> see what the "express capabilities" LnkCap indicates with respect to its
>>> ASPM bits (11:10 Active State Power Management (ASPM) Support bits).
>>>
>>> Myron
>>
>> Hi Myron,
>>
>> See lspci output below (hopefully won't be word wrap mangled)
>>
>> -- Joe
>>
>>
>> 04:01.0 PCI bridge: Device 1bcf:0009 (rev 01) (prog-if 00 [Normal decode])
>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>         Latency: 0, Cache Line Size: 64 bytes
>>         Bus: primary=04, secondary=2c, subordinate=2d, sec-latency=0
>>         I/O behind bridge: 00005000-00005fff
>>         Memory behind bridge: 90000000-92ffffff
>>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
>>         BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
>>                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>>         Capabilities: [b0] Express (v2) Downstream Port (Slot-), MSI 00
>>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>>                         ExtTag- RBE- FLReset-
>>                 DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
>>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>>                         MaxPayload 256 bytes, MaxReadReq 512 bytes
>>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>>                 LnkCap: Port #1, Speed 5GT/s, Width x4, ASPM L1, Latency L0 unlimited, L1 <1us
>>                         ClockPM- Surprise- LLActRep- BwNot-
>>                 LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
>>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>                 LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>                 DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd-
>>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
>>                 LnkCtl2: Target Link Speed: Unknown, EnterCompliance+ SpeedDis-, Selectable De-emphasis: -6dB
>>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS+
>>                          Compliance De-emphasis: -6dB
>>                 LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
>>                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>         Capabilities: [ec] Power Management version 2
>>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>>         Capabilities: [100 v1] Advanced Error Reporting
>>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>                 UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>>                 AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
>>         Kernel driver in use: pcieport
>> 00: cf 1b 09 00 07 00 10 00 01 00 04 06 10 00 01 00
>> 10: 00 00 00 00 00 00 00 00 04 2c 2d 00 51 51 00 00
>> 20: 00 90 f0 92 f1 ff 01 00 00 00 00 00 00 00 00 00
>> 30: 00 00 00 00 b0 00 00 00 00 00 00 00 0b 01 00 00
>> 40: 00 01 00 00 10 00 00 00 0a 00 00 00 00 00 00 00
>> 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> b0: 10 ec 62 00 01 00 00 00 3f 28 00 00 42 78 00 01
>> c0: 00 00 41 10 00 00 00 00 00 00 00 00 00 00 00 00
>> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> e0: 1f 08 01 00 00 00 00 00 00 00 00 00 01 00 02 c8
>> f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>
> Joe:
>
> Thanks for the data.  So the downstream port of interest has ASPM link
> capability but it's currently not enabled - see LnkCap and LnkCtl
> above.
>
> I'm still do not understand if PCI Express links would even be
> involved in a topology where all the devices connected below the
> downstream port are PCI and not PCI Express.  Seems as if the ASPM
> code is going to a lot of work to put link state structures in place
> for all these devices that would not be capable of supporting ASPM.
>
> I'm still trying to come up to speed understanding ASPM so hopefully
> someone knowledgeable can help clue me in.
>
> Myron

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: PCIe ASPM crash on device removal
  2013-01-18 23:41       ` Myron Stowe
@ 2013-01-19  1:03         ` Joe Lawrence
  2013-02-01 22:45           ` Bjorn Helgaas
  0 siblings, 1 reply; 28+ messages in thread
From: Joe Lawrence @ 2013-01-19  1:03 UTC (permalink / raw)
  To: Myron Stowe
  Cc: Joe Lawrence, linux-pci, Matthew Garrett, Myron Stowe,
	David Bulkow, shli, shli, kaneshige.kenji

On Fri, 18 Jan 2013, Myron Stowe wrote:

> [+cc Shaohua and Kenji]
> 
> On Fri, Jan 18, 2013 at 4:15 PM, Myron Stowe <myron.stowe@gmail.com> wrote:
> > Joe:
> >
> > Thanks for the data.  So the downstream port of interest has ASPM link
> > capability but it's currently not enabled - see LnkCap and LnkCtl
> > above.
> >
> > I'm still do not understand if PCI Express links would even be
> > involved in a topology where all the devices connected below the
> > downstream port are PCI and not PCI Express.  Seems as if the ASPM
> > code is going to a lot of work to put link state structures in place
> > for all these devices that would not be capable of supporting ASPM.
> >
> > I'm still trying to come up to speed understanding ASPM so hopefully
> > someone knowledgeable can help clue me in.
> >
> > Myron

Scratch my earlier worry about link_state for a device without any 
subordinate devices for pcie_aspm_exit_link_state to clean up.  I forgot 
the check that pcie_aspm_init_link_state makes on the subordinate device 
list before allocating link_state.

What is confusing to me is the asymmetry between these two routines and 
how the pcie_aspm_sanity_check return value (blacklist, introduced in 
commit 46bbdfa4) is handled.  I wonder if pcie_aspm_init_link_state could 
simply skip any device who's subordinate device list contains no pcie 
devices.  (Actually one of the things that pcie_aspm_sanity_check looks 
for.)

Regards,

-- Joe

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-01-18 18:23 ` [PATCH 1/2] PCI: ASPM exit link state code could skip devices Joe Lawrence
@ 2013-01-31 23:29   ` Myron Stowe
  2013-02-01 19:55     ` Joe Lawrence
  2013-02-06 10:09   ` Gu Zheng
  1 sibling, 1 reply; 28+ messages in thread
From: Myron Stowe @ 2013-01-31 23:29 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: linux-pci, Matthew Garrett, David Bulkow, shli, shli,
	kaneshige.kenji, linux-kernel

On Fri, 2013-01-18 at 13:23 -0500, Joe Lawrence wrote:
> From 3a51bbad5555ba6c6e144aa5176c8112eb449325 Mon Sep 17 00:00:00 2001
> From: Joe Lawrence <joe.lawrence@stratus.com>
> Date: Tue, 15 Jan 2013 14:51:57 -0500
> Subject: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
> 
> On PCI bus hotplug removal, pcie_aspm_exit_link_state could potentially skip
> parent devices that have link_state allocated.  Instead of exiting early if
> a given device is not PCIe, check that the device's parent is PCIe.  This
> enables pcie_aspm_exit_link_state to properly clean up parent link_state when
> the last function in a slot might not be PCIe.
> 
> Reviewed-by: David Bulkow <david.bulkow@stratus.com>
> Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
> ---
>  drivers/pci/pcie/aspm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index b52630b..6122447 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -634,7 +634,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
>  	struct pci_dev *parent = pdev->bus->self;
>  	struct pcie_link_state *link, *root, *parent_link;
>  
> -	if (!pci_is_pcie(pdev) || !parent || !parent->link_state)
> +	if (!parent || !pci_is_pcie(parent) || !parent->link_state)
>  		return;
>  	if ((pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT) &&
>  	    (pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM))

Joe:

Bjorn and I looked at this again today and think the checks that occur
early in 'pcie_aspm_exit_link_state()' could be simplified (as shown
below) to avoid issues such as you are encountering.

I think part of the confusion concerning the asymmetry between
'pcie_aspm_init_link_state()' and 'pcie_aspm_exit_link_state()' is that
the former is passed in a pointer to a bridge device whereas the latter
is passed in a pointer to a end-point device.  Frankly, I find the
entire driver to be confusing and very hard to understand.

Could you please test with the following patch and let us know the
results?


PCI: ASPM exit link state code is skipping devices

From: Myron Stowe <myron.stowe@redhat.com>

On PCI bus hotplug removal, 'pcie_aspm_exit_link_state' can potentially
skip parent devices that have link_state allocated.  Instead of exiting
early if a given device in not PCIe, check whether or not the device's
parent has link_state allocated.  This enables 'pcie_aspm_exit_link_state'
to properly clean up parent link_state when the last function in a slot
might not be PCIe.

Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
---

 drivers/pci/pcie/aspm.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)


diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 8474b6a..aa52727 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -634,10 +634,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
 	struct pci_dev *parent = pdev->bus->self;
 	struct pcie_link_state *link, *root, *parent_link;
 
-	if (!pci_is_pcie(pdev) || !parent || !parent->link_state)
-		return;
-	if ((pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT) &&
-	    (pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM))
+	if (!parent || !parent->link_state)
 		return;
 
 	down_read(&pci_bus_sem);



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-01-31 23:29   ` Myron Stowe
@ 2013-02-01 19:55     ` Joe Lawrence
  2013-02-01 22:31       ` Bjorn Helgaas
  0 siblings, 1 reply; 28+ messages in thread
From: Joe Lawrence @ 2013-02-01 19:55 UTC (permalink / raw)
  To: Myron Stowe
  Cc: Joe Lawrence, linux-pci, Matthew Garrett, David Bulkow, shli,
	shli, kaneshige.kenji, linux-kernel

On Thu, 31 Jan 2013, Myron Stowe wrote:

> On Fri, 2013-01-18 at 13:23 -0500, Joe Lawrence wrote:
> > From 3a51bbad5555ba6c6e144aa5176c8112eb449325 Mon Sep 17 00:00:00 2001
> > From: Joe Lawrence <joe.lawrence@stratus.com>
> > Date: Tue, 15 Jan 2013 14:51:57 -0500
> > Subject: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
> > 
> > On PCI bus hotplug removal, pcie_aspm_exit_link_state could potentially skip
> > parent devices that have link_state allocated.  Instead of exiting early if
> > a given device is not PCIe, check that the device's parent is PCIe.  This
> > enables pcie_aspm_exit_link_state to properly clean up parent link_state when
> > the last function in a slot might not be PCIe.
> > 
> > Reviewed-by: David Bulkow <david.bulkow@stratus.com>
> > Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
> > ---
> >  drivers/pci/pcie/aspm.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index b52630b..6122447 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -634,7 +634,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
> >  	struct pci_dev *parent = pdev->bus->self;
> >  	struct pcie_link_state *link, *root, *parent_link;
> >  
> > -	if (!pci_is_pcie(pdev) || !parent || !parent->link_state)
> > +	if (!parent || !pci_is_pcie(parent) || !parent->link_state)
> >  		return;
> >  	if ((pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT) &&
> >  	    (pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM))
> 
> Joe:
> 
> Bjorn and I looked at this again today and think the checks that occur
> early in 'pcie_aspm_exit_link_state()' could be simplified (as shown
> below) to avoid issues such as you are encountering.
> 
> I think part of the confusion concerning the asymmetry between
> 'pcie_aspm_init_link_state()' and 'pcie_aspm_exit_link_state()' is that
> the former is passed in a pointer to a bridge device whereas the latter
> is passed in a pointer to a end-point device.  Frankly, I find the
> entire driver to be confusing and very hard to understand.
> 
> Could you please test with the following patch and let us know the
> results?
> 
> 
> PCI: ASPM exit link state code is skipping devices
> 
> From: Myron Stowe <myron.stowe@redhat.com>
> 
> On PCI bus hotplug removal, 'pcie_aspm_exit_link_state' can potentially
> skip parent devices that have link_state allocated.  Instead of exiting
> early if a given device in not PCIe, check whether or not the device's
> parent has link_state allocated.  This enables 'pcie_aspm_exit_link_state'
> to properly clean up parent link_state when the last function in a slot
> might not be PCIe.
> 
> Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
> Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
> ---
> 
>  drivers/pci/pcie/aspm.c |    5 +----
>  1 files changed, 1 insertions(+), 4 deletions(-)
> 
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 8474b6a..aa52727 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -634,10 +634,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
>  	struct pci_dev *parent = pdev->bus->self;
>  	struct pcie_link_state *link, *root, *parent_link;
>  
> -	if (!pci_is_pcie(pdev) || !parent || !parent->link_state)
> -		return;
> -	if ((pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT) &&
> -	    (pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM))
> +	if (!parent || !parent->link_state)
>  		return;
>  
>  	down_read(&pci_bus_sem);
> 
> 
> 

Hi Myron,

The patch tests eliminates the reported crash when hotplugging our 
problem PCI bus.

The additional checks removed from pcie_aspm_exit_link_state (ie, the 
simplification) do appear to be unnecessary as pcie_aspm_init_link_state 
is already preventing such devices from allocating link state.  

Thanks,

-- Joe

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-02-01 19:55     ` Joe Lawrence
@ 2013-02-01 22:31       ` Bjorn Helgaas
  0 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-02-01 22:31 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: Myron Stowe, linux-pci, Matthew Garrett, David Bulkow, shli,
	shli, kaneshige.kenji, linux-kernel

On Fri, Feb 1, 2013 at 12:55 PM, Joe Lawrence <Joe.Lawrence@stratus.com> wrote:
> On Thu, 31 Jan 2013, Myron Stowe wrote:

>> PCI: ASPM exit link state code is skipping devices
>>
>> From: Myron Stowe <myron.stowe@redhat.com>
>>
>> On PCI bus hotplug removal, 'pcie_aspm_exit_link_state' can potentially
>> skip parent devices that have link_state allocated.  Instead of exiting
>> early if a given device in not PCIe, check whether or not the device's
>> parent has link_state allocated.  This enables 'pcie_aspm_exit_link_state'
>> to properly clean up parent link_state when the last function in a slot
>> might not be PCIe.
>>
>> Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
>> Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
>> ---
>>
>>  drivers/pci/pcie/aspm.c |    5 +----
>>  1 files changed, 1 insertions(+), 4 deletions(-)
>>
>>
>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> index 8474b6a..aa52727 100644
>> --- a/drivers/pci/pcie/aspm.c
>> +++ b/drivers/pci/pcie/aspm.c
>> @@ -634,10 +634,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
>>       struct pci_dev *parent = pdev->bus->self;
>>       struct pcie_link_state *link, *root, *parent_link;
>>
>> -     if (!pci_is_pcie(pdev) || !parent || !parent->link_state)
>> -             return;
>> -     if ((pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT) &&
>> -         (pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM))
>> +     if (!parent || !parent->link_state)
>>               return;
>>
>>       down_read(&pci_bus_sem);
>>
>>
>>
>
> Hi Myron,
>
> The patch tests eliminates the reported crash when hotplugging our
> problem PCI bus.
>
> The additional checks removed from pcie_aspm_exit_link_state (ie, the
> simplification) do appear to be unnecessary as pcie_aspm_init_link_state
> is already preventing such devices from allocating link state.

I added a Tested-by for you, Joe, and applied this to my pci/joe-aspm
branch for v3.9.  Thanks!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled
  2013-01-18 22:54   ` Myron Stowe
@ 2013-02-01 22:32     ` Bjorn Helgaas
  0 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-02-01 22:32 UTC (permalink / raw)
  To: Myron Stowe; +Cc: Joe Lawrence, linux-pci, Matthew Garrett, David Bulkow

On Fri, Jan 18, 2013 at 3:54 PM, Myron Stowe <mstowe@redhat.com> wrote:
> On Fri, 2013-01-18 at 13:24 -0500, Joe Lawrence wrote:
>> From 108cca7bee5b04f45d9712507d14b5f3fbc22c29 Mon Sep 17 00:00:00 2001
>> From: Joe Lawrence <joe.lawrence@stratus.com>
>> Date: Tue, 15 Jan 2013 15:31:28 -0500
>> Subject: [PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled
>>
>> Don't allocate and track PCIe ASPM state when "pcie_aspm=off" is specified on
>> the kernel parameter list.
>>
>> Based-on-patch-from: Matthew Garrett <mjg59@srcf.ucam.org>
>> Reviewed-by: David Bulkow <david.bulkow@stratus.com>
>> Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
>> ---
>>  drivers/pci/pcie/aspm.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> index 6122447..f912a93 100644
>> --- a/drivers/pci/pcie/aspm.c
>> +++ b/drivers/pci/pcie/aspm.c
>> @@ -556,6 +556,9 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev)
>>       struct pcie_link_state *link;
>>       int blacklist = !!pcie_aspm_sanity_check(pdev);
>>
>> +     if (!aspm_support_enabled)
>> +             return;
>> +
>>       if (!pci_is_pcie(pdev) || pdev->link_state)
>>               return;
>>       if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT &&
>
> I was considering whether putting in a similar check in the
> pcie_aspm_exit_link_state() would be best in order to "short circuit"
> all the unnecessary processing that still occurs with ASPM turned off
> and also for symmetry.  Bjorn suggested just sticking with the fix in
> pcie_aspm_init_link_state() since adding something similar in
> pcie_aspm_exit_link_state() may create other issues and does not help
> with simplifying ASPM's implementation.
>
> Acked-by: Myron Stowe <myron.stowe@redhat.com>

I applied this to pci/joe-aspm for v3.9.  Thanks!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: PCIe ASPM crash on device removal
  2013-01-19  1:03         ` Joe Lawrence
@ 2013-02-01 22:45           ` Bjorn Helgaas
  0 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-02-01 22:45 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: Myron Stowe, linux-pci, Matthew Garrett, Myron Stowe,
	David Bulkow, shli, shli, kaneshige.kenji

On Fri, Jan 18, 2013 at 6:03 PM, Joe Lawrence <Joe.Lawrence@stratus.com> wrote:
> On Fri, 18 Jan 2013, Myron Stowe wrote:
>
>> [+cc Shaohua and Kenji]
>>
>> On Fri, Jan 18, 2013 at 4:15 PM, Myron Stowe <myron.stowe@gmail.com> wrote:
>> > Joe:
>> >
>> > Thanks for the data.  So the downstream port of interest has ASPM link
>> > capability but it's currently not enabled - see LnkCap and LnkCtl
>> > above.
>> >
>> > I'm still do not understand if PCI Express links would even be
>> > involved in a topology where all the devices connected below the
>> > downstream port are PCI and not PCI Express.  Seems as if the ASPM
>> > code is going to a lot of work to put link state structures in place
>> > for all these devices that would not be capable of supporting ASPM.
>> >
>> > I'm still trying to come up to speed understanding ASPM so hopefully
>> > someone knowledgeable can help clue me in.
>> >
>> > Myron
>
> Scratch my earlier worry about link_state for a device without any
> subordinate devices for pcie_aspm_exit_link_state to clean up.  I forgot
> the check that pcie_aspm_init_link_state makes on the subordinate device
> list before allocating link_state.
>
> What is confusing to me is the asymmetry between these two routines and
> how the pcie_aspm_sanity_check return value (blacklist, introduced in
> commit 46bbdfa4) is handled.  I wonder if pcie_aspm_init_link_state could
> simply skip any device who's subordinate device list contains no pcie
> devices.  (Actually one of the things that pcie_aspm_sanity_check looks
> for.)

I've looked as aspm.c three times in the last couple months, and I
agree, it is a bit difficult to analyze, and the asymmetry you mention
is one reason.  I'd love it if somebody wanted to overhaul it.

I wonder if we could have a very simple rule, e.g., allocate
link_state for a device IFF the device has a link.  Then the init/exit
paths could become symmetric, and the question of whether ASPM is
enabled could be handled separately.  It would become possible to turn
ASPM on/off at run-time.

I'm a little dubious about the link_list, too.  That list is
maintained internal to aspm.c, and we have to worry about updating it
on hotplugs.  I'd rather use some standard way of traversing the PCI
hierarchy so it looks like other PCI code and the locking is easier to
verify.

I guess this is just blue-sky thinking, but thanks for fixing an issue
and also pointing out one of the sources of confusion.

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-01-18 18:23 ` [PATCH 1/2] PCI: ASPM exit link state code could skip devices Joe Lawrence
  2013-01-31 23:29   ` Myron Stowe
@ 2013-02-06 10:09   ` Gu Zheng
  2013-02-06 15:23     ` Joe Lawrence
  2013-02-09  0:35     ` Bjorn Helgaas
  1 sibling, 2 replies; 28+ messages in thread
From: Gu Zheng @ 2013-02-06 10:09 UTC (permalink / raw)
  To: Joe Lawrence; +Cc: linux-pci, Matthew Garrett, Myron Stowe, David Bulkow

On 01/19/2013 02:23 AM, Joe Lawrence wrote:

>>From 3a51bbad5555ba6c6e144aa5176c8112eb449325 Mon Sep 17 00:00:00 2001
> From: Joe Lawrence <joe.lawrence@stratus.com>
> Date: Tue, 15 Jan 2013 14:51:57 -0500
> Subject: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
> 
> On PCI bus hotplug removal, pcie_aspm_exit_link_state could potentially skip
> parent devices that have link_state allocated.  Instead of exiting early if
> a given device is not PCIe, check that the device's parent is PCIe.  This
> enables pcie_aspm_exit_link_state to properly clean up parent link_state when
> the last function in a slot might not be PCIe.
> 
> Reviewed-by: David Bulkow <david.bulkow@stratus.com>
> Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
> ---
>  drivers/pci/pcie/aspm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index b52630b..6122447 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -634,7 +634,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
>  	struct pci_dev *parent = pdev->bus->self;
>  	struct pcie_link_state *link, *root, *parent_link;
>  
> -	if (!pci_is_pcie(pdev) || !parent || !parent->link_state)
> +	if (!parent || !pci_is_pcie(parent) || !parent->link_state)
>  		return;
>  	if ((pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT) &&
>  	    (pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM))

Hi Joe,
      We encounter the issue you mentioned above, so I used your patch to try to
fix it, but it seems not helpful. Could you please help to look into it ? 

Thanks,
Gu 

*test script*
echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove

*pci topology tree*
+-09.0-[10-1e]----00.0-[11-1e]--+-00.0-[12-18]----00.0-[13-18]--+-00.0-[14]--+-00.0
             |                               |                               |            \-00.1
             |                               |                               +-01.0-[15]--+-00.0
             |                               |                               |            \-00.1
             |                               |                               +-02.0-[16]----00.0
             |                               |                               +-03.0-[17]----00.0
             |                               |                               \-04.0-[18]--
             |                               \-01.0-[19-1e]----00.0-[1a-1e]--+-00.0-[1b]--
             |                                                               +-01.0-[1c]--+-00.0
             |                                                               |            \-00.1
             |                                                               +-02.0-[1d]--
             |                                                               \-03.0-[1e]--

$ lspci -vs 10:00.0
10:00.0 PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=10, secondary=11, subordinate=1e, sec-latency=0
        I/O behind bridge: 00001000-00005fff
        Memory behind bridge: 92a00000-937fffff
        Prefetchable memory behind bridge: 0000000092200000-00000000929fffff
        Capabilities: <access denied>
        Kernel driver in use: pcieport
        Kernel modules: shpchp

$ lspci -vs 1a:01.0
1a:01.0 PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=1a, secondary=1c, subordinate=1c, sec-latency=0
        I/O behind bridge: 00001000-00001fff
        Memory behind bridge: 92e00000-930fffff
        Prefetchable memory behind bridge: 0000000092600000-00000000927fffff
        Capabilities: <access denied>
        Kernel driver in use: pcieport
        Kernel modules: shpchp

*dmesg info*
[  328.037479] general protection fault: 0000 [#1] SMP 
[  328.096991] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr lpc_ich mfd_core sg i2c_i801 i2c_core ioatdma i7core_edac edac_core e1000e igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
[  328.697122] CPU 6 
[  328.719040] Pid: 6, comm: kworker/u:0 Tainted: G        W    3.8.0-rc6-aspm-pcie-fix+ #58 FUJITSU-SV PRIMEQUEST 1800E/SB
[  328.851117] RIP: 0010:[<ffffffff813928f8>]  [<ffffffff813928f8>] pcie_aspm_exit_link_state+0x38/0x190
[  328.961428] RSP: 0018:ffff8807bde17c48  EFLAGS: 00010202
[  329.024874] RAX: ffff8807bb4a1290 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000006
[  329.110125] RDX: 0000000000000006 RSI: ffff8807bde1afc8 RDI: 0000000000000246
[  329.195371] RBP: ffff8807bde17c68 R08: 0000000000000001 R09: 0000000000000001
[  329.280619] R10: 0000000000000003 R11: 0000000000000001 R12: ffff8807bb49b3d8
[  329.365869] R13: ffff8807bb49b3d8 R14: ffffffff82126d80 R15: ffff8807bde17d58
[  329.451127] FS:  0000000000000000(0000) GS:ffff8807c2600000(0000) knlGS:0000000000000000
[  329.547796] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  329.616431] CR2: ffffffffff600400 CR3: 0000000001c0c000 CR4: 00000000000007e0
[  329.701687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  329.786935] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  329.872185] Process kworker/u:0 (pid: 6, threadinfo ffff8807bde16000, task ffff8807bde1a680)
[  329.973006] Stack:
[  329.997000]  0000000000000006 ffff8807bb49b3d8 0000000000000000 ffff8807bb49b3d8
[  330.085822]  ffff8807bde17c88 ffffffff81380f42 2222222222222222 ffff8807bb49b3d8
[  330.174627]  ffff8807bde17cb8 ffffffff81380fb4 0000000000000000 ffff8807bb49b3d8
[  330.263427] Call Trace:
[  330.292616]  [<ffffffff81380f42>] pci_stop_dev+0xb2/0xd0
[  330.356064]  [<ffffffff81380fb4>] pci_stop_bus_device+0x54/0x60
[  330.426778]  [<ffffffff81381156>] pci_stop_and_remove_bus_device+0x16/0x30
[  330.508919]  [<ffffffff8138894b>] remove_callback+0x2b/0x40
[  330.575487]  [<ffffffff8125a82a>] sysfs_schedule_callback_work+0x1a/0x80
[  330.655551]  [<ffffffff81091b81>] process_one_work+0x241/0x5f0
[  330.725228]  [<ffffffff81091b0f>] ? process_one_work+0x1cf/0x5f0
[  330.796981]  [<ffffffff8125a810>] ? sysfs_schedule_callback+0x210/0x210
[  330.876002]  [<ffffffff81093d3b>] worker_thread+0x12b/0x3f0
[  330.942567]  [<ffffffff81093c10>] ? manage_workers+0x180/0x180
[  331.012243]  [<ffffffff81099f9e>] kthread+0xee/0x100
[  331.071546]  [<ffffffff810e1839>] ? __lock_release+0x129/0x190
[  331.141223]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
[  331.216099]  [<ffffffff816b8aec>] ret_from_fork+0x7c/0xb0
[  331.280585]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
[  331.355453] Code: 89 65 f0 4c 89 6d f8 66 66 66 66 90 31 c0 49 89 fc 48 c7 c7 35 ee a3 81 e8 70 83 31 00 49 8b 44 24 10 48 8b 58 38 48 85 db 74 48 <80> 7b 4a 00 74 42 48 83 bb 88 00 00 00 00 74 38 31 c0 48 c7 c7 
[  331.587982] RIP  [<ffffffff813928f8>] pcie_aspm_exit_link_state+0x38/0x190
[  331.670227]  RSP <ffff8807bde17c48>
[  331.711935] ---[ end trace 359d14e0593f23af ]---
[  331.767128] Kernel panic - not syncing: Fatal exception
[  331.829701] ------------[ cut here ]------------
[  331.884839] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x5c/0x60()
[  331.981506] Hardware name: PRIMEQUEST 1800E
[  332.031449] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr lpc_ich mfd_core sg i2c_i801 i2c_core ioatdma i7core_edac edac_core e1000e igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
[  332.631448] Pid: 6, comm: kworker/u:0 Tainted: G      D W    3.8.0-rc6-aspm-pcie-fix+ #58
[  332.729156] Call Trace:
[  332.758334]  <IRQ>  [<ffffffff8106dd5f>] warn_slowpath_common+0x7f/0xc0
[  332.837472]  [<ffffffff8106ddba>] warn_slowpath_null+0x1a/0x20
[  332.907144]  [<ffffffff8103db0c>] native_smp_send_reschedule+0x5c/0x60
[  332.985129]  [<ffffffff810bc027>] trigger_load_balance+0x357/0x4f0
[  333.058957]  [<ffffffff810aab76>] scheduler_tick+0x116/0x150
[  333.126557]  [<ffffffff8108093e>] update_process_times+0x6e/0x90
[  333.198305]  [<ffffffff810d8359>] tick_sched_handle+0x39/0x80
[  333.266939]  [<ffffffff810d8584>] tick_sched_timer+0x54/0x90
[  333.334541]  [<ffffffff8109f613>] __run_hrtimer+0x83/0x320
[  333.400060]  [<ffffffff810d8530>] ? tick_nohz_handler+0xc0/0xc0
[  333.470773]  [<ffffffff8109fb56>] hrtimer_interrupt+0x106/0x280
[  333.541489]  [<ffffffff810b3fe7>] ? irqtime_account_irq+0xe7/0x100
[  333.615316]  [<ffffffff816ba949>] smp_apic_timer_interrupt+0x69/0x99
[  333.691221]  [<ffffffff816b9872>] apic_timer_interrupt+0x72/0x80
[  333.762968]  <EOI>  [<ffffffff816aab60>] ? panic+0x1a6/0x1ee
[  333.830680]  [<ffffffff816aab5c>] ? panic+0x1a2/0x1ee
[  333.891012]  [<ffffffff81071ca8>] ? kmsg_dump+0x1d8/0x2a0
[  333.955492]  [<ffffffff81071af6>] ? kmsg_dump+0x26/0x2a0
[  334.018937]  [<ffffffff81071c90>] ? kmsg_dump+0x1c0/0x2a0
[  334.083424]  [<ffffffff816b022c>] oops_end+0xdc/0xf0
[  334.142717]  [<ffffffff8101aa8b>] die+0x5b/0x90
[  334.196816]  [<ffffffff816afe0c>] do_general_protection+0xdc/0x160
[  334.270643]  [<ffffffff816af2a3>] ? restore_args+0x30/0x30
[  334.336165]  [<ffffffff816af518>] general_protection+0x28/0x30
[  334.405839]  [<ffffffff813928f8>] ? pcie_aspm_exit_link_state+0x38/0x190
[  334.485897]  [<ffffffff813928ea>] ? pcie_aspm_exit_link_state+0x2a/0x190
[  334.565955]  [<ffffffff81380f42>] pci_stop_dev+0xb2/0xd0
[  334.629398]  [<ffffffff81380fb4>] pci_stop_bus_device+0x54/0x60
[  334.700114]  [<ffffffff81381156>] pci_stop_and_remove_bus_device+0x16/0x30
[  334.782248]  [<ffffffff8138894b>] remove_callback+0x2b/0x40
[  334.848807]  [<ffffffff8125a82a>] sysfs_schedule_callback_work+0x1a/0x80
[  334.928863]  [<ffffffff81091b81>] process_one_work+0x241/0x5f0
[  334.998539]  [<ffffffff81091b0f>] ? process_one_work+0x1cf/0x5f0
[  335.070290]  [<ffffffff8125a810>] ? sysfs_schedule_callback+0x210/0x210
[  335.149311]  [<ffffffff81093d3b>] worker_thread+0x12b/0x3f0
[  335.215870]  [<ffffffff81093c10>] ? manage_workers+0x180/0x180
[  335.285544]  [<ffffffff81099f9e>] kthread+0xee/0x100
[  335.344837]  [<ffffffff810e1839>] ? __lock_release+0x129/0x190
[  335.414511]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
[  335.489379]  [<ffffffff816b8aec>] ret_from_fork+0x7c/0xb0
[  335.553860]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
[  335.628727] ---[ end trace 359d14e0593f23b0 ]---




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-02-06 10:09   ` Gu Zheng
@ 2013-02-06 15:23     ` Joe Lawrence
  2013-02-09  0:35     ` Bjorn Helgaas
  1 sibling, 0 replies; 28+ messages in thread
From: Joe Lawrence @ 2013-02-06 15:23 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Joe Lawrence, linux-pci, Matthew Garrett, Myron Stowe, David Bulkow

On Wed, 6 Feb 2013, Gu Zheng wrote:

> On 01/19/2013 02:23 AM, Joe Lawrence wrote:
> 
> >>From 3a51bbad5555ba6c6e144aa5176c8112eb449325 Mon Sep 17 00:00:00 2001
> > From: Joe Lawrence <joe.lawrence@stratus.com>
> > Date: Tue, 15 Jan 2013 14:51:57 -0500
> > Subject: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
> > 
> > On PCI bus hotplug removal, pcie_aspm_exit_link_state could potentially skip
> > parent devices that have link_state allocated.  Instead of exiting early if
> > a given device is not PCIe, check that the device's parent is PCIe.  This
> > enables pcie_aspm_exit_link_state to properly clean up parent link_state when
> > the last function in a slot might not be PCIe.
> > 
> > Reviewed-by: David Bulkow <david.bulkow@stratus.com>
> > Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
> > ---
> >  drivers/pci/pcie/aspm.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index b52630b..6122447 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -634,7 +634,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
> >  	struct pci_dev *parent = pdev->bus->self;
> >  	struct pcie_link_state *link, *root, *parent_link;
> >  
> > -	if (!pci_is_pcie(pdev) || !parent || !parent->link_state)
> > +	if (!parent || !pci_is_pcie(parent) || !parent->link_state)
> >  		return;
> >  	if ((pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT) &&
> >  	    (pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM))
> 
> Hi Joe,
>       We encounter the issue you mentioned above, so I used your patch to try to
> fix it, but it seems not helpful. Could you please help to look into it ? 
> 
> Thanks,
> Gu 

You might want to give Myron's patch [1] a try, for that's the version 
that Bjorn applied in his tree.

While you're there, you might want to add a few printk's that sanity check 
some of the values.  Without a disassembly of your version of 
pcie_aspm_exit_link_state(), it's hard to know precisely what caused your 
crash.  pcie_aspm_exit_link_state + 0x2a and 0x38 (the addresses noted 
in your backtrace) seem pretty early in the routine, so perhaps something 
is strange about the parent pointer or references through it.

If all else fails, patch [2] should provide a mechanism to avoid most of  
pcie_aspm_exit_link_state() from executing (provided you boot with  
pcie_aspm=off).

[1] PCI/ASPM: Deallocate upstream link state even if device is not PCIe
http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=commit;h=84fb913c43475e0d1e061220ef4622e3e82e91d6

[2] PCI/ASPM: Don't touch ASPM if forcibly disabled
http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=commit;h=a26d5ecb3201c11e03663a8f4a7dedc0c5f85c07

Regards,

-- Joe

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-02-06 10:09   ` Gu Zheng
  2013-02-06 15:23     ` Joe Lawrence
@ 2013-02-09  0:35     ` Bjorn Helgaas
       [not found]       ` <5122F276.80807@cn.fujitsu.com>
  1 sibling, 1 reply; 28+ messages in thread
From: Bjorn Helgaas @ 2013-02-09  0:35 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Joe Lawrence, linux-pci, Matthew Garrett, Myron Stowe, David Bulkow

On Wed, Feb 6, 2013 at 3:09 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> On 01/19/2013 02:23 AM, Joe Lawrence wrote:
>
>>>From 3a51bbad5555ba6c6e144aa5176c8112eb449325 Mon Sep 17 00:00:00 2001
>> From: Joe Lawrence <joe.lawrence@stratus.com>
>> Date: Tue, 15 Jan 2013 14:51:57 -0500
>> Subject: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
>>
>> On PCI bus hotplug removal, pcie_aspm_exit_link_state could potentially skip
>> parent devices that have link_state allocated.  Instead of exiting early if
>> a given device is not PCIe, check that the device's parent is PCIe.  This
>> enables pcie_aspm_exit_link_state to properly clean up parent link_state when
>> the last function in a slot might not be PCIe.
>>
>> Reviewed-by: David Bulkow <david.bulkow@stratus.com>
>> Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
>> ---
>>  drivers/pci/pcie/aspm.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> index b52630b..6122447 100644
>> --- a/drivers/pci/pcie/aspm.c
>> +++ b/drivers/pci/pcie/aspm.c
>> @@ -634,7 +634,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
>>       struct pci_dev *parent = pdev->bus->self;
>>       struct pcie_link_state *link, *root, *parent_link;
>>
>> -     if (!pci_is_pcie(pdev) || !parent || !parent->link_state)
>> +     if (!parent || !pci_is_pcie(parent) || !parent->link_state)
>>               return;
>>       if ((pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT) &&
>>           (pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM))
>
> Hi Joe,
>       We encounter the issue you mentioned above, so I used your patch to try to
> fix it, but it seems not helpful. Could you please help to look into it ?

Please try this with a current linux-next tree or at least a tree with
"PCI/ASPM: Deallocate upstream link state even if device is not PCIe"
in it, e.g., http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=shortlog;h=refs/heads/next

Can you collect  the complete "lspci -vv" output as root?

Do you need both "remove" actions in your test script to cause the
crash?  Removing 10:00.0 should remove the whole tree below it,
including 1a:01.0, so I would think the 1a:01.0 remove would just
fail.

Also, if you can do "disassemble pcie_aspm_exit_link_state" in gdb, we
can figure out exactly where the fault is.

It might be easiest to open a bugzilla and attach the complete dmesg
log, lspci output, disassembly, etc., there.

> *test script*
> echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove
>
> *pci topology tree*
> +-09.0-[10-1e]----00.0-[11-1e]--+-00.0-[12-18]----00.0-[13-18]--+-00.0-[14]--+-00.0
>              |                               |                               |            \-00.1
>              |                               |                               +-01.0-[15]--+-00.0
>              |                               |                               |            \-00.1
>              |                               |                               +-02.0-[16]----00.0
>              |                               |                               +-03.0-[17]----00.0
>              |                               |                               \-04.0-[18]--
>              |                               \-01.0-[19-1e]----00.0-[1a-1e]--+-00.0-[1b]--
>              |                                                               +-01.0-[1c]--+-00.0
>              |                                                               |            \-00.1
>              |                                                               +-02.0-[1d]--
>              |                                                               \-03.0-[1e]--
>
> $ lspci -vs 10:00.0
> 10:00.0 PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02) (prog-if 00 [Normal decode])
>         Flags: bus master, fast devsel, latency 0
>         Bus: primary=10, secondary=11, subordinate=1e, sec-latency=0
>         I/O behind bridge: 00001000-00005fff
>         Memory behind bridge: 92a00000-937fffff
>         Prefetchable memory behind bridge: 0000000092200000-00000000929fffff
>         Capabilities: <access denied>
>         Kernel driver in use: pcieport
>         Kernel modules: shpchp
>
> $ lspci -vs 1a:01.0
> 1a:01.0 PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02) (prog-if 00 [Normal decode])
>         Flags: bus master, fast devsel, latency 0
>         Bus: primary=1a, secondary=1c, subordinate=1c, sec-latency=0
>         I/O behind bridge: 00001000-00001fff
>         Memory behind bridge: 92e00000-930fffff
>         Prefetchable memory behind bridge: 0000000092600000-00000000927fffff
>         Capabilities: <access denied>
>         Kernel driver in use: pcieport
>         Kernel modules: shpchp
>
> *dmesg info*
> [  328.037479] general protection fault: 0000 [#1] SMP
> [  328.096991] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr lpc_ich mfd_core sg i2c_i801 i2c_core ioatdma i7core_edac edac_core e1000e igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
> [  328.697122] CPU 6
> [  328.719040] Pid: 6, comm: kworker/u:0 Tainted: G        W    3.8.0-rc6-aspm-pcie-fix+ #58 FUJITSU-SV PRIMEQUEST 1800E/SB
> [  328.851117] RIP: 0010:[<ffffffff813928f8>]  [<ffffffff813928f8>] pcie_aspm_exit_link_state+0x38/0x190
> [  328.961428] RSP: 0018:ffff8807bde17c48  EFLAGS: 00010202
> [  329.024874] RAX: ffff8807bb4a1290 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000006
> [  329.110125] RDX: 0000000000000006 RSI: ffff8807bde1afc8 RDI: 0000000000000246
> [  329.195371] RBP: ffff8807bde17c68 R08: 0000000000000001 R09: 0000000000000001
> [  329.280619] R10: 0000000000000003 R11: 0000000000000001 R12: ffff8807bb49b3d8
> [  329.365869] R13: ffff8807bb49b3d8 R14: ffffffff82126d80 R15: ffff8807bde17d58
> [  329.451127] FS:  0000000000000000(0000) GS:ffff8807c2600000(0000) knlGS:0000000000000000
> [  329.547796] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  329.616431] CR2: ffffffffff600400 CR3: 0000000001c0c000 CR4: 00000000000007e0
> [  329.701687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  329.786935] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  329.872185] Process kworker/u:0 (pid: 6, threadinfo ffff8807bde16000, task ffff8807bde1a680)
> [  329.973006] Stack:
> [  329.997000]  0000000000000006 ffff8807bb49b3d8 0000000000000000 ffff8807bb49b3d8
> [  330.085822]  ffff8807bde17c88 ffffffff81380f42 2222222222222222 ffff8807bb49b3d8
> [  330.174627]  ffff8807bde17cb8 ffffffff81380fb4 0000000000000000 ffff8807bb49b3d8
> [  330.263427] Call Trace:
> [  330.292616]  [<ffffffff81380f42>] pci_stop_dev+0xb2/0xd0
> [  330.356064]  [<ffffffff81380fb4>] pci_stop_bus_device+0x54/0x60
> [  330.426778]  [<ffffffff81381156>] pci_stop_and_remove_bus_device+0x16/0x30
> [  330.508919]  [<ffffffff8138894b>] remove_callback+0x2b/0x40
> [  330.575487]  [<ffffffff8125a82a>] sysfs_schedule_callback_work+0x1a/0x80
> [  330.655551]  [<ffffffff81091b81>] process_one_work+0x241/0x5f0
> [  330.725228]  [<ffffffff81091b0f>] ? process_one_work+0x1cf/0x5f0
> [  330.796981]  [<ffffffff8125a810>] ? sysfs_schedule_callback+0x210/0x210
> [  330.876002]  [<ffffffff81093d3b>] worker_thread+0x12b/0x3f0
> [  330.942567]  [<ffffffff81093c10>] ? manage_workers+0x180/0x180
> [  331.012243]  [<ffffffff81099f9e>] kthread+0xee/0x100
> [  331.071546]  [<ffffffff810e1839>] ? __lock_release+0x129/0x190
> [  331.141223]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
> [  331.216099]  [<ffffffff816b8aec>] ret_from_fork+0x7c/0xb0
> [  331.280585]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
> [  331.355453] Code: 89 65 f0 4c 89 6d f8 66 66 66 66 90 31 c0 49 89 fc 48 c7 c7 35 ee a3 81 e8 70 83 31 00 49 8b 44 24 10 48 8b 58 38 48 85 db 74 48 <80> 7b 4a 00 74 42 48 83 bb 88 00 00 00 00 74 38 31 c0 48 c7 c7
> [  331.587982] RIP  [<ffffffff813928f8>] pcie_aspm_exit_link_state+0x38/0x190
> [  331.670227]  RSP <ffff8807bde17c48>
> [  331.711935] ---[ end trace 359d14e0593f23af ]---
> [  331.767128] Kernel panic - not syncing: Fatal exception
> [  331.829701] ------------[ cut here ]------------
> [  331.884839] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x5c/0x60()
> [  331.981506] Hardware name: PRIMEQUEST 1800E
> [  332.031449] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr lpc_ich mfd_core sg i2c_i801 i2c_core ioatdma i7core_edac edac_core e1000e igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
> [  332.631448] Pid: 6, comm: kworker/u:0 Tainted: G      D W    3.8.0-rc6-aspm-pcie-fix+ #58
> [  332.729156] Call Trace:
> [  332.758334]  <IRQ>  [<ffffffff8106dd5f>] warn_slowpath_common+0x7f/0xc0
> [  332.837472]  [<ffffffff8106ddba>] warn_slowpath_null+0x1a/0x20
> [  332.907144]  [<ffffffff8103db0c>] native_smp_send_reschedule+0x5c/0x60
> [  332.985129]  [<ffffffff810bc027>] trigger_load_balance+0x357/0x4f0
> [  333.058957]  [<ffffffff810aab76>] scheduler_tick+0x116/0x150
> [  333.126557]  [<ffffffff8108093e>] update_process_times+0x6e/0x90
> [  333.198305]  [<ffffffff810d8359>] tick_sched_handle+0x39/0x80
> [  333.266939]  [<ffffffff810d8584>] tick_sched_timer+0x54/0x90
> [  333.334541]  [<ffffffff8109f613>] __run_hrtimer+0x83/0x320
> [  333.400060]  [<ffffffff810d8530>] ? tick_nohz_handler+0xc0/0xc0
> [  333.470773]  [<ffffffff8109fb56>] hrtimer_interrupt+0x106/0x280
> [  333.541489]  [<ffffffff810b3fe7>] ? irqtime_account_irq+0xe7/0x100
> [  333.615316]  [<ffffffff816ba949>] smp_apic_timer_interrupt+0x69/0x99
> [  333.691221]  [<ffffffff816b9872>] apic_timer_interrupt+0x72/0x80
> [  333.762968]  <EOI>  [<ffffffff816aab60>] ? panic+0x1a6/0x1ee
> [  333.830680]  [<ffffffff816aab5c>] ? panic+0x1a2/0x1ee
> [  333.891012]  [<ffffffff81071ca8>] ? kmsg_dump+0x1d8/0x2a0
> [  333.955492]  [<ffffffff81071af6>] ? kmsg_dump+0x26/0x2a0
> [  334.018937]  [<ffffffff81071c90>] ? kmsg_dump+0x1c0/0x2a0
> [  334.083424]  [<ffffffff816b022c>] oops_end+0xdc/0xf0
> [  334.142717]  [<ffffffff8101aa8b>] die+0x5b/0x90
> [  334.196816]  [<ffffffff816afe0c>] do_general_protection+0xdc/0x160
> [  334.270643]  [<ffffffff816af2a3>] ? restore_args+0x30/0x30
> [  334.336165]  [<ffffffff816af518>] general_protection+0x28/0x30
> [  334.405839]  [<ffffffff813928f8>] ? pcie_aspm_exit_link_state+0x38/0x190
> [  334.485897]  [<ffffffff813928ea>] ? pcie_aspm_exit_link_state+0x2a/0x190
> [  334.565955]  [<ffffffff81380f42>] pci_stop_dev+0xb2/0xd0
> [  334.629398]  [<ffffffff81380fb4>] pci_stop_bus_device+0x54/0x60
> [  334.700114]  [<ffffffff81381156>] pci_stop_and_remove_bus_device+0x16/0x30
> [  334.782248]  [<ffffffff8138894b>] remove_callback+0x2b/0x40
> [  334.848807]  [<ffffffff8125a82a>] sysfs_schedule_callback_work+0x1a/0x80
> [  334.928863]  [<ffffffff81091b81>] process_one_work+0x241/0x5f0
> [  334.998539]  [<ffffffff81091b0f>] ? process_one_work+0x1cf/0x5f0
> [  335.070290]  [<ffffffff8125a810>] ? sysfs_schedule_callback+0x210/0x210
> [  335.149311]  [<ffffffff81093d3b>] worker_thread+0x12b/0x3f0
> [  335.215870]  [<ffffffff81093c10>] ? manage_workers+0x180/0x180
> [  335.285544]  [<ffffffff81099f9e>] kthread+0xee/0x100
> [  335.344837]  [<ffffffff810e1839>] ? __lock_release+0x129/0x190
> [  335.414511]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
> [  335.489379]  [<ffffffff816b8aec>] ret_from_fork+0x7c/0xb0
> [  335.553860]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
> [  335.628727] ---[ end trace 359d14e0593f23b0 ]---
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
       [not found]       ` <5122F276.80807@cn.fujitsu.com>
@ 2013-02-24  0:20         ` Bjorn Helgaas
  2013-02-24  3:13           ` Yinghai Lu
  2013-02-25  5:59           ` Gu Zheng
  0 siblings, 2 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-02-24  0:20 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Joe Lawrence, linux-pci, Matthew Garrett, Myron Stowe,
	David Bulkow, Yinghai Lu

[+cc Yinghai]

On Mon, Feb 18, 2013 at 8:33 PM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> Resend mail for the network problem, if you have received, please ignore it.
>
> On 02/09/2013 08:35 AM, Bjorn Helgaas wrote:
>
>> On Wed, Feb 6, 2013 at 3:09 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>>> On 01/19/2013 02:23 AM, Joe Lawrence wrote:
>>>
>>>> >From 3a51bbad5555ba6c6e144aa5176c8112eb449325 Mon Sep 17 00:00:00 2001
>>>> From: Joe Lawrence <joe.lawrence@stratus.com>
>>>> Date: Tue, 15 Jan 2013 14:51:57 -0500
>>>> Subject: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
>>>>
>>>> On PCI bus hotplug removal, pcie_aspm_exit_link_state could potentially skip
>>>> parent devices that have link_state allocated.  Instead of exiting early if
>>>> a given device is not PCIe, check that the device's parent is PCIe.  This
>>>> enables pcie_aspm_exit_link_state to properly clean up parent link_state when
>>>> the last function in a slot might not be PCIe.
>>>>
>>>> Reviewed-by: David Bulkow <david.bulkow@stratus.com>
>>>> Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
>>>> ---
>>>>  drivers/pci/pcie/aspm.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>> index b52630b..6122447 100644
>>>> --- a/drivers/pci/pcie/aspm.c
>>>> +++ b/drivers/pci/pcie/aspm.c
>>>> @@ -634,7 +634,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
>>>>       struct pci_dev *parent = pdev->bus->self;
>>>>       struct pcie_link_state *link, *root, *parent_link;
>>>>
>>>> -     if (!pci_is_pcie(pdev) || !parent || !parent->link_state)
>>>> +     if (!parent || !pci_is_pcie(parent) || !parent->link_state)
>>>>               return;
>>>>       if ((pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT) &&
>>>>           (pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM))
>>>
>>> Hi Joe,
>>>       We encounter the issue you mentioned above, so I used your patch to try to
>>> fix it, but it seems not helpful. Could you please help to look into it ?
>
>
> Hi Bjorn,
>         Sorry for the later reply, we were on vacation in the last week.
>
>>
>> Please try this with a current linux-next tree or at least a tree with
>> "PCI/ASPM: Deallocate upstream link state even if device is not PCIe"
>> in it, e.g., http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=shortlog;h=refs/heads/next
>>
>
>
> OK, I'll take it when my test machine is OK.
>
>> Can you collect  the complete "lspci -vv" output as root?
>
>
> Sure, my test machine has a lot of hardware, the info is send as an attachment.
>
>>
>> Do you need both "remove" actions in your test script to cause the
>> crash?  Removing 10:00.0 should remove the whole tree below it,
>> including 1a:01.0, so I would think the 1a:01.0 remove would just
>> fail.
>
>
> The test script is used to test the parallel remove routines trigged by sysfs/pci interface.
> So the issue we mentioned is a extra discovery.
> As you said, when the front part script "echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove"
> runs over, the whole tree below 10:00.0 should be removed already, so the latter part script "
> echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove" wants to remove 1a:01.0 would fail.
> But it does not work as we said indeed, it also enters the remove routine, and rises a fault.
>
> I print some debug info when run the test script. When the latter part remove routine works into
> pcie_aspm_exit_link_state(), the *parent* is set to 0x6b6b6b6b6b6b6b6b, POISON_FREE, because of
> the front remove routine has free it. So the check "if (!parent || !parent->link_state)" will rise
> a fault. Maybe we need to add a check whether *parent* is freed.
>
> Thanks,
> Gu
>
>>
>> Also, if you can do "disassemble pcie_aspm_exit_link_state" in gdb, we
>> can figure out exactly where the fault is.
>>
>> It might be easiest to open a bugzilla and attach the complete dmesg
>> log, lspci output, disassembly, etc., there.
>
>
> OK, I'll create a bugzilla later.
>
>>
>>> *test script*
>>> echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove
>>>
>>> *pci topology tree*
>>> +-09.0-[10-1e]----00.0-[11-1e]--+-00.0-[12-18]----00.0-[13-18]--+-00.0-[14]--+-00.0
>>>              |                               |                               |            \-00.1
>>>              |                               |                               +-01.0-[15]--+-00.0
>>>              |                               |                               |            \-00.1
>>>              |                               |                               +-02.0-[16]----00.0
>>>              |                               |                               +-03.0-[17]----00.0
>>>              |                               |                               \-04.0-[18]--
>>>              |                               \-01.0-[19-1e]----00.0-[1a-1e]--+-00.0-[1b]--
>>>              |                                                               +-01.0-[1c]--+-00.0
>>>              |                                                               |            \-00.1
>>>              |                                                               +-02.0-[1d]--
>>>              |                                                               \-03.0-[1e]--
>>>
>>> $ lspci -vs 10:00.0
>>> 10:00.0 PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02) (prog-if 00 [Normal decode])
>>>         Flags: bus master, fast devsel, latency 0
>>>         Bus: primary=10, secondary=11, subordinate=1e, sec-latency=0
>>>         I/O behind bridge: 00001000-00005fff
>>>         Memory behind bridge: 92a00000-937fffff
>>>         Prefetchable memory behind bridge: 0000000092200000-00000000929fffff
>>>         Capabilities: <access denied>
>>>         Kernel driver in use: pcieport
>>>         Kernel modules: shpchp
>>>
>>> $ lspci -vs 1a:01.0
>>> 1a:01.0 PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02) (prog-if 00 [Normal decode])
>>>         Flags: bus master, fast devsel, latency 0
>>>         Bus: primary=1a, secondary=1c, subordinate=1c, sec-latency=0
>>>         I/O behind bridge: 00001000-00001fff
>>>         Memory behind bridge: 92e00000-930fffff
>>>         Prefetchable memory behind bridge: 0000000092600000-00000000927fffff
>>>         Capabilities: <access denied>
>>>         Kernel driver in use: pcieport
>>>         Kernel modules: shpchp
>>>
>>> *dmesg info*
>>> [  328.037479] general protection fault: 0000 [#1] SMP
>>> [  328.096991] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr lpc_ich mfd_core sg i2c_i801 i2c_core ioatdma i7core_edac edac_core e1000e igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
>>> [  328.697122] CPU 6
>>> [  328.719040] Pid: 6, comm: kworker/u:0 Tainted: G        W    3.8.0-rc6-aspm-pcie-fix+ #58 FUJITSU-SV PRIMEQUEST 1800E/SB
>>> [  328.851117] RIP: 0010:[<ffffffff813928f8>]  [<ffffffff813928f8>] pcie_aspm_exit_link_state+0x38/0x190
>>> [  328.961428] RSP: 0018:ffff8807bde17c48  EFLAGS: 00010202
>>> [  329.024874] RAX: ffff8807bb4a1290 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000006
>>> [  329.110125] RDX: 0000000000000006 RSI: ffff8807bde1afc8 RDI: 0000000000000246
>>> [  329.195371] RBP: ffff8807bde17c68 R08: 0000000000000001 R09: 0000000000000001
>>> [  329.280619] R10: 0000000000000003 R11: 0000000000000001 R12: ffff8807bb49b3d8
>>> [  329.365869] R13: ffff8807bb49b3d8 R14: ffffffff82126d80 R15: ffff8807bde17d58
>>> [  329.451127] FS:  0000000000000000(0000) GS:ffff8807c2600000(0000) knlGS:0000000000000000
>>> [  329.547796] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> [  329.616431] CR2: ffffffffff600400 CR3: 0000000001c0c000 CR4: 00000000000007e0
>>> [  329.701687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [  329.786935] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> [  329.872185] Process kworker/u:0 (pid: 6, threadinfo ffff8807bde16000, task ffff8807bde1a680)
>>> [  329.973006] Stack:
>>> [  329.997000]  0000000000000006 ffff8807bb49b3d8 0000000000000000 ffff8807bb49b3d8
>>> [  330.085822]  ffff8807bde17c88 ffffffff81380f42 2222222222222222 ffff8807bb49b3d8
>>> [  330.174627]  ffff8807bde17cb8 ffffffff81380fb4 0000000000000000 ffff8807bb49b3d8
>>> [  330.263427] Call Trace:
>>> [  330.292616]  [<ffffffff81380f42>] pci_stop_dev+0xb2/0xd0
>>> [  330.356064]  [<ffffffff81380fb4>] pci_stop_bus_device+0x54/0x60
>>> [  330.426778]  [<ffffffff81381156>] pci_stop_and_remove_bus_device+0x16/0x30
>>> [  330.508919]  [<ffffffff8138894b>] remove_callback+0x2b/0x40
>>> [  330.575487]  [<ffffffff8125a82a>] sysfs_schedule_callback_work+0x1a/0x80
>>> [  330.655551]  [<ffffffff81091b81>] process_one_work+0x241/0x5f0
>>> [  330.725228]  [<ffffffff81091b0f>] ? process_one_work+0x1cf/0x5f0
>>> [  330.796981]  [<ffffffff8125a810>] ? sysfs_schedule_callback+0x210/0x210
>>> [  330.876002]  [<ffffffff81093d3b>] worker_thread+0x12b/0x3f0
>>> [  330.942567]  [<ffffffff81093c10>] ? manage_workers+0x180/0x180
>>> [  331.012243]  [<ffffffff81099f9e>] kthread+0xee/0x100
>>> [  331.071546]  [<ffffffff810e1839>] ? __lock_release+0x129/0x190
>>> [  331.141223]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
>>> [  331.216099]  [<ffffffff816b8aec>] ret_from_fork+0x7c/0xb0
>>> [  331.280585]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
>>> [  331.355453] Code: 89 65 f0 4c 89 6d f8 66 66 66 66 90 31 c0 49 89 fc 48 c7 c7 35 ee a3 81 e8 70 83 31 00 49 8b 44 24 10 48 8b 58 38 48 85 db 74 48 <80> 7b 4a 00 74 42 48 83 bb 88 00 00 00 00 74 38 31 c0 48 c7 c7
>>> [  331.587982] RIP  [<ffffffff813928f8>] pcie_aspm_exit_link_state+0x38/0x190
>>> [  331.670227]  RSP <ffff8807bde17c48>
>>> [  331.711935] ---[ end trace 359d14e0593f23af ]---
>>> [  331.767128] Kernel panic - not syncing: Fatal exception
>>> [  331.829701] ------------[ cut here ]------------
>>> [  331.884839] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x5c/0x60()
>>> [  331.981506] Hardware name: PRIMEQUEST 1800E
>>> [  332.031449] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr lpc_ich mfd_core sg i2c_i801 i2c_core ioatdma i7core_edac edac_core e1000e igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
>>> [  332.631448] Pid: 6, comm: kworker/u:0 Tainted: G      D W    3.8.0-rc6-aspm-pcie-fix+ #58
>>> [  332.729156] Call Trace:
>>> [  332.758334]  <IRQ>  [<ffffffff8106dd5f>] warn_slowpath_common+0x7f/0xc0
>>> [  332.837472]  [<ffffffff8106ddba>] warn_slowpath_null+0x1a/0x20
>>> [  332.907144]  [<ffffffff8103db0c>] native_smp_send_reschedule+0x5c/0x60
>>> [  332.985129]  [<ffffffff810bc027>] trigger_load_balance+0x357/0x4f0
>>> [  333.058957]  [<ffffffff810aab76>] scheduler_tick+0x116/0x150
>>> [  333.126557]  [<ffffffff8108093e>] update_process_times+0x6e/0x90
>>> [  333.198305]  [<ffffffff810d8359>] tick_sched_handle+0x39/0x80
>>> [  333.266939]  [<ffffffff810d8584>] tick_sched_timer+0x54/0x90
>>> [  333.334541]  [<ffffffff8109f613>] __run_hrtimer+0x83/0x320
>>> [  333.400060]  [<ffffffff810d8530>] ? tick_nohz_handler+0xc0/0xc0
>>> [  333.470773]  [<ffffffff8109fb56>] hrtimer_interrupt+0x106/0x280
>>> [  333.541489]  [<ffffffff810b3fe7>] ? irqtime_account_irq+0xe7/0x100
>>> [  333.615316]  [<ffffffff816ba949>] smp_apic_timer_interrupt+0x69/0x99
>>> [  333.691221]  [<ffffffff816b9872>] apic_timer_interrupt+0x72/0x80
>>> [  333.762968]  <EOI>  [<ffffffff816aab60>] ? panic+0x1a6/0x1ee
>>> [  333.830680]  [<ffffffff816aab5c>] ? panic+0x1a2/0x1ee
>>> [  333.891012]  [<ffffffff81071ca8>] ? kmsg_dump+0x1d8/0x2a0
>>> [  333.955492]  [<ffffffff81071af6>] ? kmsg_dump+0x26/0x2a0
>>> [  334.018937]  [<ffffffff81071c90>] ? kmsg_dump+0x1c0/0x2a0
>>> [  334.083424]  [<ffffffff816b022c>] oops_end+0xdc/0xf0
>>> [  334.142717]  [<ffffffff8101aa8b>] die+0x5b/0x90
>>> [  334.196816]  [<ffffffff816afe0c>] do_general_protection+0xdc/0x160
>>> [  334.270643]  [<ffffffff816af2a3>] ? restore_args+0x30/0x30
>>> [  334.336165]  [<ffffffff816af518>] general_protection+0x28/0x30
>>> [  334.405839]  [<ffffffff813928f8>] ? pcie_aspm_exit_link_state+0x38/0x190
>>> [  334.485897]  [<ffffffff813928ea>] ? pcie_aspm_exit_link_state+0x2a/0x190
>>> [  334.565955]  [<ffffffff81380f42>] pci_stop_dev+0xb2/0xd0
>>> [  334.629398]  [<ffffffff81380fb4>] pci_stop_bus_device+0x54/0x60
>>> [  334.700114]  [<ffffffff81381156>] pci_stop_and_remove_bus_device+0x16/0x30
>>> [  334.782248]  [<ffffffff8138894b>] remove_callback+0x2b/0x40
>>> [  334.848807]  [<ffffffff8125a82a>] sysfs_schedule_callback_work+0x1a/0x80
>>> [  334.928863]  [<ffffffff81091b81>] process_one_work+0x241/0x5f0
>>> [  334.998539]  [<ffffffff81091b0f>] ? process_one_work+0x1cf/0x5f0
>>> [  335.070290]  [<ffffffff8125a810>] ? sysfs_schedule_callback+0x210/0x210
>>> [  335.149311]  [<ffffffff81093d3b>] worker_thread+0x12b/0x3f0
>>> [  335.215870]  [<ffffffff81093c10>] ? manage_workers+0x180/0x180
>>> [  335.285544]  [<ffffffff81099f9e>] kthread+0xee/0x100
>>> [  335.344837]  [<ffffffff810e1839>] ? __lock_release+0x129/0x190
>>> [  335.414511]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
>>> [  335.489379]  [<ffffffff816b8aec>] ret_from_fork+0x7c/0xb0
>>> [  335.553860]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
>>> [  335.628727] ---[ end trace 359d14e0593f23b0 ]---

Please create a bugzilla for this issue.

I think this is a general object lifetime issue that really has
nothing to do with ASPM except that ASPM happens to be the victim.

You're doing this:

    echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1
>  /sys/bus/pci/devices/0000\:1a\:01.0/remove

The 1a:01.0 device is downstream from the 10:00.0 bridge.  The sysfs
interface remove_store() uses device_schedule_callback() to schedule
the remove for later.  I think what's happening is that we schedule
remove_callback() for both devices before 10:00.0 has been removed,
like this:

    # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
    remove_store  # for 10:00.0
      device_schedule_callback(10:00.0, remove_callback)
        sysfs_schedule_callback
          kobject_get
          queue_work
    # echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove
    remove_store  # for 1a:01.0
      device_schedule_callback(1a:01.0, remove_callback)
        sysfs_schedule_callback
          kobject_get
          queue_work

Note that we acquire a reference on each pci_dev before queuing the work item.

Later, we run the callbacks, starting with 10:00.0.  This calls
remove_callback() to perform the remove:

    remove_callback(10:00.0)
      mutex_lock(&pci_remove_rescan_mutex)
      pci_stop_and_remove_bus_device(pdev)
      mutex_unlock(&pci_remove_rescan_mutex)

This will stop and remove the subtree below 10:00.0, but it does not
actually free the pci_dev for 1a:01.0 because we increased its ref
count in sysfs_schedule_callback.  So after completing
remove_callback(10:00.0), we run the second callback for 1a:01.0.

The remove for 1a:01.0 calls pcie_aspm_exit_link_state() from
pci_stop_dev().  This is where we blow up because, according to your
debugging, pdev->bus->self is no longer valid.

The PCI core did this removal wrong.  If we have a valid pci_dev
pointer, as we do in pcie_aspm_exit_link_state(), the whole object
ought to be valid.  But the PCI core deallocated the struct pci_bus
for bus 0000:1a too soon.

My guess is that when we build a pci_dev, we need to increase the ref
count on the pci_bus where that pci_dev lives.  That way we can keep
around all the buses and bridges leading from the root to the device
in question.

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-02-24  0:20         ` Bjorn Helgaas
@ 2013-02-24  3:13           ` Yinghai Lu
  2013-02-27 20:14             ` Bjorn Helgaas
  2013-02-25  5:59           ` Gu Zheng
  1 sibling, 1 reply; 28+ messages in thread
From: Yinghai Lu @ 2013-02-24  3:13 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Gu Zheng, Joe Lawrence, linux-pci, Matthew Garrett, Myron Stowe,
	David Bulkow

[-- Attachment #1: Type: text/plain, Size: 2483 bytes --]

On Sat, Feb 23, 2013 at 4:20 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> I think this is a general object lifetime issue that really has
> nothing to do with ASPM except that ASPM happens to be the victim.
>
> You're doing this:
>
>     echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1
>>  /sys/bus/pci/devices/0000\:1a\:01.0/remove
>
> The 1a:01.0 device is downstream from the 10:00.0 bridge.  The sysfs
> interface remove_store() uses device_schedule_callback() to schedule
> the remove for later.  I think what's happening is that we schedule
> remove_callback() for both devices before 10:00.0 has been removed,
> like this:
>
>     # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
>     remove_store  # for 10:00.0
>       device_schedule_callback(10:00.0, remove_callback)
>         sysfs_schedule_callback
>           kobject_get
>           queue_work
>     # echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove
>     remove_store  # for 1a:01.0
>       device_schedule_callback(1a:01.0, remove_callback)
>         sysfs_schedule_callback
>           kobject_get
>           queue_work
>
> Note that we acquire a reference on each pci_dev before queuing the work item.
>
> Later, we run the callbacks, starting with 10:00.0.  This calls
> remove_callback() to perform the remove:
>
>     remove_callback(10:00.0)
>       mutex_lock(&pci_remove_rescan_mutex)
>       pci_stop_and_remove_bus_device(pdev)
>       mutex_unlock(&pci_remove_rescan_mutex)
>
> This will stop and remove the subtree below 10:00.0, but it does not
> actually free the pci_dev for 1a:01.0 because we increased its ref
> count in sysfs_schedule_callback.  So after completing
> remove_callback(10:00.0), we run the second callback for 1a:01.0.
>
> The remove for 1a:01.0 calls pcie_aspm_exit_link_state() from
> pci_stop_dev().  This is where we blow up because, according to your
> debugging, pdev->bus->self is no longer valid.
>
> The PCI core did this removal wrong.  If we have a valid pci_dev
> pointer, as we do in pcie_aspm_exit_link_state(), the whole object
> ought to be valid.  But the PCI core deallocated the struct pci_bus
> for bus 0000:1a too soon.
>
> My guess is that when we build a pci_dev, we need to increase the ref
> count on the pci_bus where that pci_dev lives.  That way we can keep
> around all the buses and bridges leading from the root to the device
> in question.

Agreed, Please check if attached is what you want.

Thanks

Yinghai

[-- Attachment #2: pci_dev_bus_ref.patch --]
[-- Type: application/octet-stream, Size: 750 bytes --]

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index b494066..6df5428 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1337,6 +1337,7 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 	 */
 	down_write(&pci_bus_sem);
 	list_add_tail(&dev->bus_list, &bus->devices);
+	get_device(&bus->dev);
 	up_write(&pci_bus_sem);
 
 	pci_fixup_device(pci_fixup_final, dev);
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index cc875e6..a72b700 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -36,6 +36,7 @@ static void pci_destroy_dev(struct pci_dev *dev)
 {
 	down_write(&pci_bus_sem);
 	list_del(&dev->bus_list);
+	put_device(&dev->bus->dev);
 	up_write(&pci_bus_sem);
 
 	pci_free_resources(dev);

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-02-24  0:20         ` Bjorn Helgaas
  2013-02-24  3:13           ` Yinghai Lu
@ 2013-02-25  5:59           ` Gu Zheng
  2013-02-26 16:03             ` Myron Stowe
  1 sibling, 1 reply; 28+ messages in thread
From: Gu Zheng @ 2013-02-25  5:59 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Joe Lawrence, linux-pci, Matthew Garrett, Myron Stowe,
	David Bulkow, Yinghai Lu

On 02/24/2013 08:20 AM, Bjorn Helgaas wrote:

> [+cc Yinghai]
> 
> On Mon, Feb 18, 2013 at 8:33 PM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>> Resend mail for the network problem, if you have received, please ignore it.
>>
>> On 02/09/2013 08:35 AM, Bjorn Helgaas wrote:
>>
>>> On Wed, Feb 6, 2013 at 3:09 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>>>> On 01/19/2013 02:23 AM, Joe Lawrence wrote:
>>>>
>>>>> >From 3a51bbad5555ba6c6e144aa5176c8112eb449325 Mon Sep 17 00:00:00 2001
>>>>> From: Joe Lawrence <joe.lawrence@stratus.com>
>>>>> Date: Tue, 15 Jan 2013 14:51:57 -0500
>>>>> Subject: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
>>>>>
>>>>> On PCI bus hotplug removal, pcie_aspm_exit_link_state could potentially skip
>>>>> parent devices that have link_state allocated.  Instead of exiting early if
>>>>> a given device is not PCIe, check that the device's parent is PCIe.  This
>>>>> enables pcie_aspm_exit_link_state to properly clean up parent link_state when
>>>>> the last function in a slot might not be PCIe.
>>>>>
>>>>> Reviewed-by: David Bulkow <david.bulkow@stratus.com>
>>>>> Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
>>>>> ---
>>>>>  drivers/pci/pcie/aspm.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>>> index b52630b..6122447 100644
>>>>> --- a/drivers/pci/pcie/aspm.c
>>>>> +++ b/drivers/pci/pcie/aspm.c
>>>>> @@ -634,7 +634,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
>>>>>       struct pci_dev *parent = pdev->bus->self;
>>>>>       struct pcie_link_state *link, *root, *parent_link;
>>>>>
>>>>> -     if (!pci_is_pcie(pdev) || !parent || !parent->link_state)
>>>>> +     if (!parent || !pci_is_pcie(parent) || !parent->link_state)
>>>>>               return;
>>>>>       if ((pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT) &&
>>>>>           (pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM))
>>>>
>>>> Hi Joe,
>>>>       We encounter the issue you mentioned above, so I used your patch to try to
>>>> fix it, but it seems not helpful. Could you please help to look into it ?
>>
>>
>> Hi Bjorn,
>>         Sorry for the later reply, we were on vacation in the last week.
>>
>>>
>>> Please try this with a current linux-next tree or at least a tree with
>>> "PCI/ASPM: Deallocate upstream link state even if device is not PCIe"
>>> in it, e.g., http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=shortlog;h=refs/heads/next
>>>
>>
>>
>> OK, I'll take it when my test machine is OK.
>>
>>> Can you collect  the complete "lspci -vv" output as root?
>>
>>
>> Sure, my test machine has a lot of hardware, the info is send as an attachment.
>>
>>>
>>> Do you need both "remove" actions in your test script to cause the
>>> crash?  Removing 10:00.0 should remove the whole tree below it,
>>> including 1a:01.0, so I would think the 1a:01.0 remove would just
>>> fail.
>>
>>
>> The test script is used to test the parallel remove routines trigged by sysfs/pci interface.
>> So the issue we mentioned is a extra discovery.
>> As you said, when the front part script "echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove"
>> runs over, the whole tree below 10:00.0 should be removed already, so the latter part script "
>> echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove" wants to remove 1a:01.0 would fail.
>> But it does not work as we said indeed, it also enters the remove routine, and rises a fault.
>>
>> I print some debug info when run the test script. When the latter part remove routine works into
>> pcie_aspm_exit_link_state(), the *parent* is set to 0x6b6b6b6b6b6b6b6b, POISON_FREE, because of
>> the front remove routine has free it. So the check "if (!parent || !parent->link_state)" will rise
>> a fault. Maybe we need to add a check whether *parent* is freed.
>>
>> Thanks,
>> Gu
>>
>>>
>>> Also, if you can do "disassemble pcie_aspm_exit_link_state" in gdb, we
>>> can figure out exactly where the fault is.
>>>
>>> It might be easiest to open a bugzilla and attach the complete dmesg
>>> log, lspci output, disassembly, etc., there.
>>
>>
>> OK, I'll create a bugzilla later.
>>
>>>
>>>> *test script*
>>>> echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove
>>>>
>>>> *pci topology tree*
>>>> +-09.0-[10-1e]----00.0-[11-1e]--+-00.0-[12-18]----00.0-[13-18]--+-00.0-[14]--+-00.0
>>>>              |                               |                               |            \-00.1
>>>>              |                               |                               +-01.0-[15]--+-00.0
>>>>              |                               |                               |            \-00.1
>>>>              |                               |                               +-02.0-[16]----00.0
>>>>              |                               |                               +-03.0-[17]----00.0
>>>>              |                               |                               \-04.0-[18]--
>>>>              |                               \-01.0-[19-1e]----00.0-[1a-1e]--+-00.0-[1b]--
>>>>              |                                                               +-01.0-[1c]--+-00.0
>>>>              |                                                               |            \-00.1
>>>>              |                                                               +-02.0-[1d]--
>>>>              |                                                               \-03.0-[1e]--
>>>>
>>>> $ lspci -vs 10:00.0
>>>> 10:00.0 PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02) (prog-if 00 [Normal decode])
>>>>         Flags: bus master, fast devsel, latency 0
>>>>         Bus: primary=10, secondary=11, subordinate=1e, sec-latency=0
>>>>         I/O behind bridge: 00001000-00005fff
>>>>         Memory behind bridge: 92a00000-937fffff
>>>>         Prefetchable memory behind bridge: 0000000092200000-00000000929fffff
>>>>         Capabilities: <access denied>
>>>>         Kernel driver in use: pcieport
>>>>         Kernel modules: shpchp
>>>>
>>>> $ lspci -vs 1a:01.0
>>>> 1a:01.0 PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02) (prog-if 00 [Normal decode])
>>>>         Flags: bus master, fast devsel, latency 0
>>>>         Bus: primary=1a, secondary=1c, subordinate=1c, sec-latency=0
>>>>         I/O behind bridge: 00001000-00001fff
>>>>         Memory behind bridge: 92e00000-930fffff
>>>>         Prefetchable memory behind bridge: 0000000092600000-00000000927fffff
>>>>         Capabilities: <access denied>
>>>>         Kernel driver in use: pcieport
>>>>         Kernel modules: shpchp
>>>>
>>>> *dmesg info*
>>>> [  328.037479] general protection fault: 0000 [#1] SMP
>>>> [  328.096991] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr lpc_ich mfd_core sg i2c_i801 i2c_core ioatdma i7core_edac edac_core e1000e igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
>>>> [  328.697122] CPU 6
>>>> [  328.719040] Pid: 6, comm: kworker/u:0 Tainted: G        W    3.8.0-rc6-aspm-pcie-fix+ #58 FUJITSU-SV PRIMEQUEST 1800E/SB
>>>> [  328.851117] RIP: 0010:[<ffffffff813928f8>]  [<ffffffff813928f8>] pcie_aspm_exit_link_state+0x38/0x190
>>>> [  328.961428] RSP: 0018:ffff8807bde17c48  EFLAGS: 00010202
>>>> [  329.024874] RAX: ffff8807bb4a1290 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000006
>>>> [  329.110125] RDX: 0000000000000006 RSI: ffff8807bde1afc8 RDI: 0000000000000246
>>>> [  329.195371] RBP: ffff8807bde17c68 R08: 0000000000000001 R09: 0000000000000001
>>>> [  329.280619] R10: 0000000000000003 R11: 0000000000000001 R12: ffff8807bb49b3d8
>>>> [  329.365869] R13: ffff8807bb49b3d8 R14: ffffffff82126d80 R15: ffff8807bde17d58
>>>> [  329.451127] FS:  0000000000000000(0000) GS:ffff8807c2600000(0000) knlGS:0000000000000000
>>>> [  329.547796] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [  329.616431] CR2: ffffffffff600400 CR3: 0000000001c0c000 CR4: 00000000000007e0
>>>> [  329.701687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [  329.786935] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>> [  329.872185] Process kworker/u:0 (pid: 6, threadinfo ffff8807bde16000, task ffff8807bde1a680)
>>>> [  329.973006] Stack:
>>>> [  329.997000]  0000000000000006 ffff8807bb49b3d8 0000000000000000 ffff8807bb49b3d8
>>>> [  330.085822]  ffff8807bde17c88 ffffffff81380f42 2222222222222222 ffff8807bb49b3d8
>>>> [  330.174627]  ffff8807bde17cb8 ffffffff81380fb4 0000000000000000 ffff8807bb49b3d8
>>>> [  330.263427] Call Trace:
>>>> [  330.292616]  [<ffffffff81380f42>] pci_stop_dev+0xb2/0xd0
>>>> [  330.356064]  [<ffffffff81380fb4>] pci_stop_bus_device+0x54/0x60
>>>> [  330.426778]  [<ffffffff81381156>] pci_stop_and_remove_bus_device+0x16/0x30
>>>> [  330.508919]  [<ffffffff8138894b>] remove_callback+0x2b/0x40
>>>> [  330.575487]  [<ffffffff8125a82a>] sysfs_schedule_callback_work+0x1a/0x80
>>>> [  330.655551]  [<ffffffff81091b81>] process_one_work+0x241/0x5f0
>>>> [  330.725228]  [<ffffffff81091b0f>] ? process_one_work+0x1cf/0x5f0
>>>> [  330.796981]  [<ffffffff8125a810>] ? sysfs_schedule_callback+0x210/0x210
>>>> [  330.876002]  [<ffffffff81093d3b>] worker_thread+0x12b/0x3f0
>>>> [  330.942567]  [<ffffffff81093c10>] ? manage_workers+0x180/0x180
>>>> [  331.012243]  [<ffffffff81099f9e>] kthread+0xee/0x100
>>>> [  331.071546]  [<ffffffff810e1839>] ? __lock_release+0x129/0x190
>>>> [  331.141223]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
>>>> [  331.216099]  [<ffffffff816b8aec>] ret_from_fork+0x7c/0xb0
>>>> [  331.280585]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
>>>> [  331.355453] Code: 89 65 f0 4c 89 6d f8 66 66 66 66 90 31 c0 49 89 fc 48 c7 c7 35 ee a3 81 e8 70 83 31 00 49 8b 44 24 10 48 8b 58 38 48 85 db 74 48 <80> 7b 4a 00 74 42 48 83 bb 88 00 00 00 00 74 38 31 c0 48 c7 c7
>>>> [  331.587982] RIP  [<ffffffff813928f8>] pcie_aspm_exit_link_state+0x38/0x190
>>>> [  331.670227]  RSP <ffff8807bde17c48>
>>>> [  331.711935] ---[ end trace 359d14e0593f23af ]---
>>>> [  331.767128] Kernel panic - not syncing: Fatal exception
>>>> [  331.829701] ------------[ cut here ]------------
>>>> [  331.884839] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x5c/0x60()
>>>> [  331.981506] Hardware name: PRIMEQUEST 1800E
>>>> [  332.031449] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr lpc_ich mfd_core sg i2c_i801 i2c_core ioatdma i7core_edac edac_core e1000e igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
>>>> [  332.631448] Pid: 6, comm: kworker/u:0 Tainted: G      D W    3.8.0-rc6-aspm-pcie-fix+ #58
>>>> [  332.729156] Call Trace:
>>>> [  332.758334]  <IRQ>  [<ffffffff8106dd5f>] warn_slowpath_common+0x7f/0xc0
>>>> [  332.837472]  [<ffffffff8106ddba>] warn_slowpath_null+0x1a/0x20
>>>> [  332.907144]  [<ffffffff8103db0c>] native_smp_send_reschedule+0x5c/0x60
>>>> [  332.985129]  [<ffffffff810bc027>] trigger_load_balance+0x357/0x4f0
>>>> [  333.058957]  [<ffffffff810aab76>] scheduler_tick+0x116/0x150
>>>> [  333.126557]  [<ffffffff8108093e>] update_process_times+0x6e/0x90
>>>> [  333.198305]  [<ffffffff810d8359>] tick_sched_handle+0x39/0x80
>>>> [  333.266939]  [<ffffffff810d8584>] tick_sched_timer+0x54/0x90
>>>> [  333.334541]  [<ffffffff8109f613>] __run_hrtimer+0x83/0x320
>>>> [  333.400060]  [<ffffffff810d8530>] ? tick_nohz_handler+0xc0/0xc0
>>>> [  333.470773]  [<ffffffff8109fb56>] hrtimer_interrupt+0x106/0x280
>>>> [  333.541489]  [<ffffffff810b3fe7>] ? irqtime_account_irq+0xe7/0x100
>>>> [  333.615316]  [<ffffffff816ba949>] smp_apic_timer_interrupt+0x69/0x99
>>>> [  333.691221]  [<ffffffff816b9872>] apic_timer_interrupt+0x72/0x80
>>>> [  333.762968]  <EOI>  [<ffffffff816aab60>] ? panic+0x1a6/0x1ee
>>>> [  333.830680]  [<ffffffff816aab5c>] ? panic+0x1a2/0x1ee
>>>> [  333.891012]  [<ffffffff81071ca8>] ? kmsg_dump+0x1d8/0x2a0
>>>> [  333.955492]  [<ffffffff81071af6>] ? kmsg_dump+0x26/0x2a0
>>>> [  334.018937]  [<ffffffff81071c90>] ? kmsg_dump+0x1c0/0x2a0
>>>> [  334.083424]  [<ffffffff816b022c>] oops_end+0xdc/0xf0
>>>> [  334.142717]  [<ffffffff8101aa8b>] die+0x5b/0x90
>>>> [  334.196816]  [<ffffffff816afe0c>] do_general_protection+0xdc/0x160
>>>> [  334.270643]  [<ffffffff816af2a3>] ? restore_args+0x30/0x30
>>>> [  334.336165]  [<ffffffff816af518>] general_protection+0x28/0x30
>>>> [  334.405839]  [<ffffffff813928f8>] ? pcie_aspm_exit_link_state+0x38/0x190
>>>> [  334.485897]  [<ffffffff813928ea>] ? pcie_aspm_exit_link_state+0x2a/0x190
>>>> [  334.565955]  [<ffffffff81380f42>] pci_stop_dev+0xb2/0xd0
>>>> [  334.629398]  [<ffffffff81380fb4>] pci_stop_bus_device+0x54/0x60
>>>> [  334.700114]  [<ffffffff81381156>] pci_stop_and_remove_bus_device+0x16/0x30
>>>> [  334.782248]  [<ffffffff8138894b>] remove_callback+0x2b/0x40
>>>> [  334.848807]  [<ffffffff8125a82a>] sysfs_schedule_callback_work+0x1a/0x80
>>>> [  334.928863]  [<ffffffff81091b81>] process_one_work+0x241/0x5f0
>>>> [  334.998539]  [<ffffffff81091b0f>] ? process_one_work+0x1cf/0x5f0
>>>> [  335.070290]  [<ffffffff8125a810>] ? sysfs_schedule_callback+0x210/0x210
>>>> [  335.149311]  [<ffffffff81093d3b>] worker_thread+0x12b/0x3f0
>>>> [  335.215870]  [<ffffffff81093c10>] ? manage_workers+0x180/0x180
>>>> [  335.285544]  [<ffffffff81099f9e>] kthread+0xee/0x100
>>>> [  335.344837]  [<ffffffff810e1839>] ? __lock_release+0x129/0x190
>>>> [  335.414511]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
>>>> [  335.489379]  [<ffffffff816b8aec>] ret_from_fork+0x7c/0xb0
>>>> [  335.553860]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
>>>> [  335.628727] ---[ end trace 359d14e0593f23b0 ]---
> 
> Please create a bugzilla for this issue.
> 

Here:https://bugzilla.kernel.org/show_bug.cgi?id=54411

> I think this is a general object lifetime issue that really has
> nothing to do with ASPM except that ASPM happens to be the victim.
> 
> You're doing this:
> 
>     echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1
>>  /sys/bus/pci/devices/0000\:1a\:01.0/remove
> 
> The 1a:01.0 device is downstream from the 10:00.0 bridge.  The sysfs
> interface remove_store() uses device_schedule_callback() to schedule
> the remove for later.  I think what's happening is that we schedule
> remove_callback() for both devices before 10:00.0 has been removed,
> like this:
> 
>     # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
>     remove_store  # for 10:00.0
>       device_schedule_callback(10:00.0, remove_callback)
>         sysfs_schedule_callback
>           kobject_get
>           queue_work
>     # echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove
>     remove_store  # for 1a:01.0
>       device_schedule_callback(1a:01.0, remove_callback)
>         sysfs_schedule_callback
>           kobject_get
>           queue_work
> 
> Note that we acquire a reference on each pci_dev before queuing the work item.
> 
> Later, we run the callbacks, starting with 10:00.0.  This calls
> remove_callback() to perform the remove:
> 
>     remove_callback(10:00.0)
>       mutex_lock(&pci_remove_rescan_mutex)
>       pci_stop_and_remove_bus_device(pdev)
>       mutex_unlock(&pci_remove_rescan_mutex)
> 
> This will stop and remove the subtree below 10:00.0, but it does not
> actually free the pci_dev for 1a:01.0 because we increased its ref
> count in sysfs_schedule_callback.  So after completing
> remove_callback(10:00.0), we run the second callback for 1a:01.0.
> 
> The remove for 1a:01.0 calls pcie_aspm_exit_link_state() from
> pci_stop_dev().  This is where we blow up because, according to your
> debugging, pdev->bus->self is no longer valid.
> 
> The PCI core did this removal wrong.  If we have a valid pci_dev
> pointer, as we do in pcie_aspm_exit_link_state(), the whole object
> ought to be valid.  But the PCI core deallocated the struct pci_bus
> for bus 0000:1a too soon.

Your analysis is perfect, and it solves my doubt. Thanks very much!:) 

> 
> My guess is that when we build a pci_dev, we need to increase the ref
> count on the pci_bus where that pci_dev lives.  That way we can keep
> around all the buses and bridges leading from the root to the device
> in question.
> 
> Bjorn
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-02-25  5:59           ` Gu Zheng
@ 2013-02-26 16:03             ` Myron Stowe
  2013-02-27  6:42               ` Gu Zheng
  0 siblings, 1 reply; 28+ messages in thread
From: Myron Stowe @ 2013-02-26 16:03 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Bjorn Helgaas, Joe Lawrence, linux-pci, Matthew Garrett,
	Myron Stowe, David Bulkow, Yinghai Lu

On Sun, Feb 24, 2013 at 10:59 PM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> On 02/24/2013 08:20 AM, Bjorn Helgaas wrote:
>
>> [+cc Yinghai]
>>
>> On Mon, Feb 18, 2013 at 8:33 PM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>>> Resend mail for the network problem, if you have received, please ignore it.
>>>
>>> On 02/09/2013 08:35 AM, Bjorn Helgaas wrote:
>>>
>>>> On Wed, Feb 6, 2013 at 3:09 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>>>>> On 01/19/2013 02:23 AM, Joe Lawrence wrote:
>>>>>
>>>>>> >From 3a51bbad5555ba6c6e144aa5176c8112eb449325 Mon Sep 17 00:00:00 2001
>>>>>> From: Joe Lawrence <joe.lawrence@stratus.com>
>>>>>> Date: Tue, 15 Jan 2013 14:51:57 -0500
>>>>>> Subject: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
>>>>>>
>>>>>> On PCI bus hotplug removal, pcie_aspm_exit_link_state could potentially skip
>>>>>> parent devices that have link_state allocated.  Instead of exiting early if
>>>>>> a given device is not PCIe, check that the device's parent is PCIe.  This
>>>>>> enables pcie_aspm_exit_link_state to properly clean up parent link_state when
>>>>>> the last function in a slot might not be PCIe.
>>>>>>
>>>>>> Reviewed-by: David Bulkow <david.bulkow@stratus.com>
>>>>>> Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
>>>>>> ---
>>>>>>  drivers/pci/pcie/aspm.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>>>> index b52630b..6122447 100644
>>>>>> --- a/drivers/pci/pcie/aspm.c
>>>>>> +++ b/drivers/pci/pcie/aspm.c
>>>>>> @@ -634,7 +634,7 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
>>>>>>       struct pci_dev *parent = pdev->bus->self;
>>>>>>       struct pcie_link_state *link, *root, *parent_link;
>>>>>>
>>>>>> -     if (!pci_is_pcie(pdev) || !parent || !parent->link_state)
>>>>>> +     if (!parent || !pci_is_pcie(parent) || !parent->link_state)
>>>>>>               return;
>>>>>>       if ((pci_pcie_type(parent) != PCI_EXP_TYPE_ROOT_PORT) &&
>>>>>>           (pci_pcie_type(parent) != PCI_EXP_TYPE_DOWNSTREAM))
>>>>>
>>>>> Hi Joe,
>>>>>       We encounter the issue you mentioned above, so I used your patch to try to
>>>>> fix it, but it seems not helpful. Could you please help to look into it ?
>>>
>>>
>>> Hi Bjorn,
>>>         Sorry for the later reply, we were on vacation in the last week.
>>>
>>>>
>>>> Please try this with a current linux-next tree or at least a tree with
>>>> "PCI/ASPM: Deallocate upstream link state even if device is not PCIe"
>>>> in it, e.g., http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=shortlog;h=refs/heads/next
>>>>
>>>
>>>
>>> OK, I'll take it when my test machine is OK.
>>>
>>>> Can you collect  the complete "lspci -vv" output as root?
>>>
>>>
>>> Sure, my test machine has a lot of hardware, the info is send as an attachment.
>>>
>>>>
>>>> Do you need both "remove" actions in your test script to cause the
>>>> crash?  Removing 10:00.0 should remove the whole tree below it,
>>>> including 1a:01.0, so I would think the 1a:01.0 remove would just
>>>> fail.
>>>
>>>
>>> The test script is used to test the parallel remove routines trigged by sysfs/pci interface.
>>> So the issue we mentioned is a extra discovery.
>>> As you said, when the front part script "echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove"
>>> runs over, the whole tree below 10:00.0 should be removed already, so the latter part script "
>>> echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove" wants to remove 1a:01.0 would fail.
>>> But it does not work as we said indeed, it also enters the remove routine, and rises a fault.
>>>
>>> I print some debug info when run the test script. When the latter part remove routine works into
>>> pcie_aspm_exit_link_state(), the *parent* is set to 0x6b6b6b6b6b6b6b6b, POISON_FREE, because of
>>> the front remove routine has free it. So the check "if (!parent || !parent->link_state)" will rise
>>> a fault. Maybe we need to add a check whether *parent* is freed.
>>>
>>> Thanks,
>>> Gu
>>>
>>>>
>>>> Also, if you can do "disassemble pcie_aspm_exit_link_state" in gdb, we
>>>> can figure out exactly where the fault is.
>>>>
>>>> It might be easiest to open a bugzilla and attach the complete dmesg
>>>> log, lspci output, disassembly, etc., there.
>>>
>>>
>>> OK, I'll create a bugzilla later.
>>>
>>>>
>>>>> *test script*
>>>>> echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove
>>>>>
>>>>> *pci topology tree*
>>>>> +-09.0-[10-1e]----00.0-[11-1e]--+-00.0-[12-18]----00.0-[13-18]--+-00.0-[14]--+-00.0
>>>>>              |                               |                               |            \-00.1
>>>>>              |                               |                               +-01.0-[15]--+-00.0
>>>>>              |                               |                               |            \-00.1
>>>>>              |                               |                               +-02.0-[16]----00.0
>>>>>              |                               |                               +-03.0-[17]----00.0
>>>>>              |                               |                               \-04.0-[18]--
>>>>>              |                               \-01.0-[19-1e]----00.0-[1a-1e]--+-00.0-[1b]--
>>>>>              |                                                               +-01.0-[1c]--+-00.0
>>>>>              |                                                               |            \-00.1
>>>>>              |                                                               +-02.0-[1d]--
>>>>>              |                                                               \-03.0-[1e]--
>>>>>
>>>>> $ lspci -vs 10:00.0
>>>>> 10:00.0 PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02) (prog-if 00 [Normal decode])
>>>>>         Flags: bus master, fast devsel, latency 0
>>>>>         Bus: primary=10, secondary=11, subordinate=1e, sec-latency=0
>>>>>         I/O behind bridge: 00001000-00005fff
>>>>>         Memory behind bridge: 92a00000-937fffff
>>>>>         Prefetchable memory behind bridge: 0000000092200000-00000000929fffff
>>>>>         Capabilities: <access denied>
>>>>>         Kernel driver in use: pcieport
>>>>>         Kernel modules: shpchp
>>>>>
>>>>> $ lspci -vs 1a:01.0
>>>>> 1a:01.0 PCI bridge: Integrated Device Technology, Inc. Device 807f (rev 02) (prog-if 00 [Normal decode])
>>>>>         Flags: bus master, fast devsel, latency 0
>>>>>         Bus: primary=1a, secondary=1c, subordinate=1c, sec-latency=0
>>>>>         I/O behind bridge: 00001000-00001fff
>>>>>         Memory behind bridge: 92e00000-930fffff
>>>>>         Prefetchable memory behind bridge: 0000000092600000-00000000927fffff
>>>>>         Capabilities: <access denied>
>>>>>         Kernel driver in use: pcieport
>>>>>         Kernel modules: shpchp
>>>>>
>>>>> *dmesg info*
>>>>> [  328.037479] general protection fault: 0000 [#1] SMP
>>>>> [  328.096991] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr lpc_ich mfd_core sg i2c_i801 i2c_core ioatdma i7core_edac edac_core e1000e igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
>>>>> [  328.697122] CPU 6
>>>>> [  328.719040] Pid: 6, comm: kworker/u:0 Tainted: G        W    3.8.0-rc6-aspm-pcie-fix+ #58 FUJITSU-SV PRIMEQUEST 1800E/SB
>>>>> [  328.851117] RIP: 0010:[<ffffffff813928f8>]  [<ffffffff813928f8>] pcie_aspm_exit_link_state+0x38/0x190
>>>>> [  328.961428] RSP: 0018:ffff8807bde17c48  EFLAGS: 00010202
>>>>> [  329.024874] RAX: ffff8807bb4a1290 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000006
>>>>> [  329.110125] RDX: 0000000000000006 RSI: ffff8807bde1afc8 RDI: 0000000000000246
>>>>> [  329.195371] RBP: ffff8807bde17c68 R08: 0000000000000001 R09: 0000000000000001
>>>>> [  329.280619] R10: 0000000000000003 R11: 0000000000000001 R12: ffff8807bb49b3d8
>>>>> [  329.365869] R13: ffff8807bb49b3d8 R14: ffffffff82126d80 R15: ffff8807bde17d58
>>>>> [  329.451127] FS:  0000000000000000(0000) GS:ffff8807c2600000(0000) knlGS:0000000000000000
>>>>> [  329.547796] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>> [  329.616431] CR2: ffffffffff600400 CR3: 0000000001c0c000 CR4: 00000000000007e0
>>>>> [  329.701687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> [  329.786935] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>> [  329.872185] Process kworker/u:0 (pid: 6, threadinfo ffff8807bde16000, task ffff8807bde1a680)
>>>>> [  329.973006] Stack:
>>>>> [  329.997000]  0000000000000006 ffff8807bb49b3d8 0000000000000000 ffff8807bb49b3d8
>>>>> [  330.085822]  ffff8807bde17c88 ffffffff81380f42 2222222222222222 ffff8807bb49b3d8
>>>>> [  330.174627]  ffff8807bde17cb8 ffffffff81380fb4 0000000000000000 ffff8807bb49b3d8
>>>>> [  330.263427] Call Trace:
>>>>> [  330.292616]  [<ffffffff81380f42>] pci_stop_dev+0xb2/0xd0
>>>>> [  330.356064]  [<ffffffff81380fb4>] pci_stop_bus_device+0x54/0x60
>>>>> [  330.426778]  [<ffffffff81381156>] pci_stop_and_remove_bus_device+0x16/0x30
>>>>> [  330.508919]  [<ffffffff8138894b>] remove_callback+0x2b/0x40
>>>>> [  330.575487]  [<ffffffff8125a82a>] sysfs_schedule_callback_work+0x1a/0x80
>>>>> [  330.655551]  [<ffffffff81091b81>] process_one_work+0x241/0x5f0
>>>>> [  330.725228]  [<ffffffff81091b0f>] ? process_one_work+0x1cf/0x5f0
>>>>> [  330.796981]  [<ffffffff8125a810>] ? sysfs_schedule_callback+0x210/0x210
>>>>> [  330.876002]  [<ffffffff81093d3b>] worker_thread+0x12b/0x3f0
>>>>> [  330.942567]  [<ffffffff81093c10>] ? manage_workers+0x180/0x180
>>>>> [  331.012243]  [<ffffffff81099f9e>] kthread+0xee/0x100
>>>>> [  331.071546]  [<ffffffff810e1839>] ? __lock_release+0x129/0x190
>>>>> [  331.141223]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
>>>>> [  331.216099]  [<ffffffff816b8aec>] ret_from_fork+0x7c/0xb0
>>>>> [  331.280585]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
>>>>> [  331.355453] Code: 89 65 f0 4c 89 6d f8 66 66 66 66 90 31 c0 49 89 fc 48 c7 c7 35 ee a3 81 e8 70 83 31 00 49 8b 44 24 10 48 8b 58 38 48 85 db 74 48 <80> 7b 4a 00 74 42 48 83 bb 88 00 00 00 00 74 38 31 c0 48 c7 c7
>>>>> [  331.587982] RIP  [<ffffffff813928f8>] pcie_aspm_exit_link_state+0x38/0x190
>>>>> [  331.670227]  RSP <ffff8807bde17c48>
>>>>> [  331.711935] ---[ end trace 359d14e0593f23af ]---
>>>>> [  331.767128] Kernel panic - not syncing: Fatal exception
>>>>> [  331.829701] ------------[ cut here ]------------
>>>>> [  331.884839] WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x5c/0x60()
>>>>> [  331.981506] Hardware name: PRIMEQUEST 1800E
>>>>> [  332.031449] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr lpc_ich mfd_core sg i2c_i801 i2c_core ioatdma i7core_edac edac_core e1000e igb dca ptp pps_core sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
>>>>> [  332.631448] Pid: 6, comm: kworker/u:0 Tainted: G      D W    3.8.0-rc6-aspm-pcie-fix+ #58
>>>>> [  332.729156] Call Trace:
>>>>> [  332.758334]  <IRQ>  [<ffffffff8106dd5f>] warn_slowpath_common+0x7f/0xc0
>>>>> [  332.837472]  [<ffffffff8106ddba>] warn_slowpath_null+0x1a/0x20
>>>>> [  332.907144]  [<ffffffff8103db0c>] native_smp_send_reschedule+0x5c/0x60
>>>>> [  332.985129]  [<ffffffff810bc027>] trigger_load_balance+0x357/0x4f0
>>>>> [  333.058957]  [<ffffffff810aab76>] scheduler_tick+0x116/0x150
>>>>> [  333.126557]  [<ffffffff8108093e>] update_process_times+0x6e/0x90
>>>>> [  333.198305]  [<ffffffff810d8359>] tick_sched_handle+0x39/0x80
>>>>> [  333.266939]  [<ffffffff810d8584>] tick_sched_timer+0x54/0x90
>>>>> [  333.334541]  [<ffffffff8109f613>] __run_hrtimer+0x83/0x320
>>>>> [  333.400060]  [<ffffffff810d8530>] ? tick_nohz_handler+0xc0/0xc0
>>>>> [  333.470773]  [<ffffffff8109fb56>] hrtimer_interrupt+0x106/0x280
>>>>> [  333.541489]  [<ffffffff810b3fe7>] ? irqtime_account_irq+0xe7/0x100
>>>>> [  333.615316]  [<ffffffff816ba949>] smp_apic_timer_interrupt+0x69/0x99
>>>>> [  333.691221]  [<ffffffff816b9872>] apic_timer_interrupt+0x72/0x80
>>>>> [  333.762968]  <EOI>  [<ffffffff816aab60>] ? panic+0x1a6/0x1ee
>>>>> [  333.830680]  [<ffffffff816aab5c>] ? panic+0x1a2/0x1ee
>>>>> [  333.891012]  [<ffffffff81071ca8>] ? kmsg_dump+0x1d8/0x2a0
>>>>> [  333.955492]  [<ffffffff81071af6>] ? kmsg_dump+0x26/0x2a0
>>>>> [  334.018937]  [<ffffffff81071c90>] ? kmsg_dump+0x1c0/0x2a0
>>>>> [  334.083424]  [<ffffffff816b022c>] oops_end+0xdc/0xf0
>>>>> [  334.142717]  [<ffffffff8101aa8b>] die+0x5b/0x90
>>>>> [  334.196816]  [<ffffffff816afe0c>] do_general_protection+0xdc/0x160
>>>>> [  334.270643]  [<ffffffff816af2a3>] ? restore_args+0x30/0x30
>>>>> [  334.336165]  [<ffffffff816af518>] general_protection+0x28/0x30
>>>>> [  334.405839]  [<ffffffff813928f8>] ? pcie_aspm_exit_link_state+0x38/0x190
>>>>> [  334.485897]  [<ffffffff813928ea>] ? pcie_aspm_exit_link_state+0x2a/0x190
>>>>> [  334.565955]  [<ffffffff81380f42>] pci_stop_dev+0xb2/0xd0
>>>>> [  334.629398]  [<ffffffff81380fb4>] pci_stop_bus_device+0x54/0x60
>>>>> [  334.700114]  [<ffffffff81381156>] pci_stop_and_remove_bus_device+0x16/0x30
>>>>> [  334.782248]  [<ffffffff8138894b>] remove_callback+0x2b/0x40
>>>>> [  334.848807]  [<ffffffff8125a82a>] sysfs_schedule_callback_work+0x1a/0x80
>>>>> [  334.928863]  [<ffffffff81091b81>] process_one_work+0x241/0x5f0
>>>>> [  334.998539]  [<ffffffff81091b0f>] ? process_one_work+0x1cf/0x5f0
>>>>> [  335.070290]  [<ffffffff8125a810>] ? sysfs_schedule_callback+0x210/0x210
>>>>> [  335.149311]  [<ffffffff81093d3b>] worker_thread+0x12b/0x3f0
>>>>> [  335.215870]  [<ffffffff81093c10>] ? manage_workers+0x180/0x180
>>>>> [  335.285544]  [<ffffffff81099f9e>] kthread+0xee/0x100
>>>>> [  335.344837]  [<ffffffff810e1839>] ? __lock_release+0x129/0x190
>>>>> [  335.414511]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
>>>>> [  335.489379]  [<ffffffff816b8aec>] ret_from_fork+0x7c/0xb0
>>>>> [  335.553860]  [<ffffffff81099eb0>] ? __init_kthread_worker+0x70/0x70
>>>>> [  335.628727] ---[ end trace 359d14e0593f23b0 ]---
>>
>> Please create a bugzilla for this issue.
>>
>
> Here:https://bugzilla.kernel.org/show_bug.cgi?id=54411
>
>> I think this is a general object lifetime issue that really has
>> nothing to do with ASPM except that ASPM happens to be the victim.
>>
>> You're doing this:
>>
>>     echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1
>>>  /sys/bus/pci/devices/0000\:1a\:01.0/remove
>>
>> The 1a:01.0 device is downstream from the 10:00.0 bridge.  The sysfs
>> interface remove_store() uses device_schedule_callback() to schedule
>> the remove for later.  I think what's happening is that we schedule
>> remove_callback() for both devices before 10:00.0 has been removed,
>> like this:
>>
>>     # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
>>     remove_store  # for 10:00.0
>>       device_schedule_callback(10:00.0, remove_callback)
>>         sysfs_schedule_callback
>>           kobject_get
>>           queue_work
>>     # echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove
>>     remove_store  # for 1a:01.0
>>       device_schedule_callback(1a:01.0, remove_callback)
>>         sysfs_schedule_callback
>>           kobject_get
>>           queue_work
>>
>> Note that we acquire a reference on each pci_dev before queuing the work item.
>>
>> Later, we run the callbacks, starting with 10:00.0.  This calls
>> remove_callback() to perform the remove:
>>
>>     remove_callback(10:00.0)
>>       mutex_lock(&pci_remove_rescan_mutex)
>>       pci_stop_and_remove_bus_device(pdev)
>>       mutex_unlock(&pci_remove_rescan_mutex)
>>
>> This will stop and remove the subtree below 10:00.0, but it does not
>> actually free the pci_dev for 1a:01.0 because we increased its ref
>> count in sysfs_schedule_callback.  So after completing
>> remove_callback(10:00.0), we run the second callback for 1a:01.0.
>>
>> The remove for 1a:01.0 calls pcie_aspm_exit_link_state() from
>> pci_stop_dev().  This is where we blow up because, according to your
>> debugging, pdev->bus->self is no longer valid.
>>
>> The PCI core did this removal wrong.  If we have a valid pci_dev
>> pointer, as we do in pcie_aspm_exit_link_state(), the whole object
>> ought to be valid.  But the PCI core deallocated the struct pci_bus
>> for bus 0000:1a too soon.
>
> Your analysis is perfect, and it solves my doubt. Thanks very much!:)

Gu:

I can't tell from your response whether you are stating that you agree
with Bjorn's analysis -or- that you applied Yinghai's patch above
(dated Feb 23) and found that the testing scenario was now successful.
 Could you please specifically state which and if it was just the
former would you be able to test Yinghai's implementation of Bjorn's
analysis?

Thanks,
 Myron
>
>>
>> My guess is that when we build a pci_dev, we need to increase the ref
>> count on the pci_bus where that pci_dev lives.  That way we can keep
>> around all the buses and bridges leading from the root to the device
>> in question.
>>
>> Bjorn
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-02-26 16:03             ` Myron Stowe
@ 2013-02-27  6:42               ` Gu Zheng
  2013-02-27  6:47                 ` Yinghai Lu
  0 siblings, 1 reply; 28+ messages in thread
From: Gu Zheng @ 2013-02-27  6:42 UTC (permalink / raw)
  To: Myron Stowe
  Cc: Bjorn Helgaas, Joe Lawrence, linux-pci, Matthew Garrett,
	Myron Stowe, David Bulkow, Yinghai Lu

On 02/27/2013 12:03 AM, Myron Stowe wrote:

> On Sun, Feb 24, 2013 at 10:59 PM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>> On 02/24/2013 08:20 AM, Bjorn Helgaas wrote:
>>
>>> [+cc Yinghai]
>>>
...snip...
>>> Please create a bugzilla for this issue.
>>>
>>
>> Here:https://bugzilla.kernel.org/show_bug.cgi?id=54411
>>
>>> I think this is a general object lifetime issue that really has
>>> nothing to do with ASPM except that ASPM happens to be the victim.
>>>
>>> You're doing this:
>>>
>>>     echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1
>>>>  /sys/bus/pci/devices/0000\:1a\:01.0/remove
>>>
>>> The 1a:01.0 device is downstream from the 10:00.0 bridge.  The sysfs
>>> interface remove_store() uses device_schedule_callback() to schedule
>>> the remove for later.  I think what's happening is that we schedule
>>> remove_callback() for both devices before 10:00.0 has been removed,
>>> like this:
>>>
>>>     # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
>>>     remove_store  # for 10:00.0
>>>       device_schedule_callback(10:00.0, remove_callback)
>>>         sysfs_schedule_callback
>>>           kobject_get
>>>           queue_work
>>>     # echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove
>>>     remove_store  # for 1a:01.0
>>>       device_schedule_callback(1a:01.0, remove_callback)
>>>         sysfs_schedule_callback
>>>           kobject_get
>>>           queue_work
>>>
>>> Note that we acquire a reference on each pci_dev before queuing the work item.
>>>
>>> Later, we run the callbacks, starting with 10:00.0.  This calls
>>> remove_callback() to perform the remove:
>>>
>>>     remove_callback(10:00.0)
>>>       mutex_lock(&pci_remove_rescan_mutex)
>>>       pci_stop_and_remove_bus_device(pdev)
>>>       mutex_unlock(&pci_remove_rescan_mutex)
>>>
>>> This will stop and remove the subtree below 10:00.0, but it does not
>>> actually free the pci_dev for 1a:01.0 because we increased its ref
>>> count in sysfs_schedule_callback.  So after completing
>>> remove_callback(10:00.0), we run the second callback for 1a:01.0.
>>>
>>> The remove for 1a:01.0 calls pcie_aspm_exit_link_state() from
>>> pci_stop_dev().  This is where we blow up because, according to your
>>> debugging, pdev->bus->self is no longer valid.
>>>
>>> The PCI core did this removal wrong.  If we have a valid pci_dev
>>> pointer, as we do in pcie_aspm_exit_link_state(), the whole object
>>> ought to be valid.  But the PCI core deallocated the struct pci_bus
>>> for bus 0000:1a too soon.
>>
>> Your analysis is perfect, and it solves my doubt. Thanks very much!:)
> 
> Gu:
> 
> I can't tell from your response whether you are stating that you agree
> with Bjorn's analysis -or- that you applied Yinghai's patch above
> (dated Feb 23) and found that the testing scenario was now successful.
>  Could you please specifically state which and if it was just the
> former would you be able to test Yinghai's implementation of Bjorn's
> analysis?

Hi Myron,
    Sorry to make you confused.
    I just agree with Bjorn's analysis. And I have test Yinghai's patch on kernel 3.8
, but it seems does not work. More infos, please refer to bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=54411

Thanks,
Gu

> 
> Thanks,
>  Myron
>>
>>>
>>> My guess is that when we build a pci_dev, we need to increase the ref
>>> count on the pci_bus where that pci_dev lives.  That way we can keep
>>> around all the buses and bridges leading from the root to the device
>>> in question.
>>>
>>> Bjorn
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-02-27  6:42               ` Gu Zheng
@ 2013-02-27  6:47                 ` Yinghai Lu
  2013-02-28 10:47                   ` Gu Zheng
  0 siblings, 1 reply; 28+ messages in thread
From: Yinghai Lu @ 2013-02-27  6:47 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Myron Stowe, Bjorn Helgaas, Joe Lawrence, linux-pci,
	Matthew Garrett, Myron Stowe, David Bulkow

On Tue, Feb 26, 2013 at 10:42 PM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>     I just agree with Bjorn's analysis. And I have test Yinghai's patch on kernel 3.8
> , but it seems does not work. More infos, please refer to bugzilla:
> https://bugzilla.kernel.org/show_bug.cgi?id=54411

you need to test that on linus's tree of 2013-02-26.
or v3.9-rc1

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-02-24  3:13           ` Yinghai Lu
@ 2013-02-27 20:14             ` Bjorn Helgaas
  0 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-02-27 20:14 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Gu Zheng, Joe Lawrence, linux-pci, Matthew Garrett, Myron Stowe,
	David Bulkow

On Sat, Feb 23, 2013 at 07:13:11PM -0800, Yinghai Lu wrote:
> On Sat, Feb 23, 2013 at 4:20 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > I think this is a general object lifetime issue that really has
> > nothing to do with ASPM except that ASPM happens to be the victim.
> >
> > You're doing this:
> >
> >     echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1
> >>  /sys/bus/pci/devices/0000\:1a\:01.0/remove
> >
> > The 1a:01.0 device is downstream from the 10:00.0 bridge.  The sysfs
> > interface remove_store() uses device_schedule_callback() to schedule
> > the remove for later.  I think what's happening is that we schedule
> > remove_callback() for both devices before 10:00.0 has been removed,
> > like this:
> >
> >     # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
> >     remove_store  # for 10:00.0
> >       device_schedule_callback(10:00.0, remove_callback)
> >         sysfs_schedule_callback
> >           kobject_get
> >           queue_work
> >     # echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/remove
> >     remove_store  # for 1a:01.0
> >       device_schedule_callback(1a:01.0, remove_callback)
> >         sysfs_schedule_callback
> >           kobject_get
> >           queue_work
> >
> > Note that we acquire a reference on each pci_dev before queuing the work item.
> >
> > Later, we run the callbacks, starting with 10:00.0.  This calls
> > remove_callback() to perform the remove:
> >
> >     remove_callback(10:00.0)
> >       mutex_lock(&pci_remove_rescan_mutex)
> >       pci_stop_and_remove_bus_device(pdev)
> >       mutex_unlock(&pci_remove_rescan_mutex)
> >
> > This will stop and remove the subtree below 10:00.0, but it does not
> > actually free the pci_dev for 1a:01.0 because we increased its ref
> > count in sysfs_schedule_callback.  So after completing
> > remove_callback(10:00.0), we run the second callback for 1a:01.0.
> >
> > The remove for 1a:01.0 calls pcie_aspm_exit_link_state() from
> > pci_stop_dev().  This is where we blow up because, according to your
> > debugging, pdev->bus->self is no longer valid.
> >
> > The PCI core did this removal wrong.  If we have a valid pci_dev
> > pointer, as we do in pcie_aspm_exit_link_state(), the whole object
> > ought to be valid.  But the PCI core deallocated the struct pci_bus
> > for bus 0000:1a too soon.
> >
> > My guess is that when we build a pci_dev, we need to increase the ref
> > count on the pci_bus where that pci_dev lives.  That way we can keep
> > around all the buses and bridges leading from the root to the device
> > in question.
> 
> Agreed, Please check if attached is what you want.

I included the patch directly below for convenience.

> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index b494066..6df5428 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1337,6 +1337,7 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
>  	 */
>  	down_write(&pci_bus_sem);
>  	list_add_tail(&dev->bus_list, &bus->devices);
> +	get_device(&bus->dev);
>  	up_write(&pci_bus_sem);

This suggests that taking the reference must be protected by
pci_bus_sem, but I don't think that's the case, is it?

>  	pci_fixup_device(pci_fixup_final, dev);
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index cc875e6..a72b700 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -36,6 +36,7 @@ static void pci_destroy_dev(struct pci_dev *dev)
>  {
>  	down_write(&pci_bus_sem);
>  	list_del(&dev->bus_list);
> +	put_device(&dev->bus->dev);
>  	up_write(&pci_bus_sem);
>  
>  	pci_free_resources(dev);

The problem is that we capture the struct pci_bus pointer without
taking a reference on the bus object.  The capture occurs in
pci_scan_device() (and other callers of alloc_pci_dev()) where we do
this:

    dev = alloc_pci_dev();
    dev->bus = bus;

To make code analysis easier, I think we should take the reference at
the exact point where we capture the pointer, using something like:

    dev->bus = pci_bus_get(bus);

It'd probably be even better to deprecate alloc_pci_dev() and make a
pci_alloc_dev(struct pci_bus *bus) that takes care of acquiring the
reference itself.

But I don't think this takes care of the whole issue.  What protects
the pci_scan_slot() path against concurrent removal of the pci_bus?
I don't see anything in the hotplug drivers or in the
pci_scan_child_bus() path that acquires a reference or does locking
for this.

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-02-27  6:47                 ` Yinghai Lu
@ 2013-02-28 10:47                   ` Gu Zheng
  2013-02-28 11:50                     ` Yijing Wang
  2013-02-28 15:11                     ` Yinghai Lu
  0 siblings, 2 replies; 28+ messages in thread
From: Gu Zheng @ 2013-02-28 10:47 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Myron Stowe, Bjorn Helgaas, Joe Lawrence, linux-pci,
	Matthew Garrett, Myron Stowe, David Bulkow

On 02/27/2013 02:47 PM, Yinghai Lu wrote:

> On Tue, Feb 26, 2013 at 10:42 PM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>>     I just agree with Bjorn's analysis. And I have test Yinghai's patch on kernel 3.8
>> , but it seems does not work. More infos, please refer to bugzilla:
>> https://bugzilla.kernel.org/show_bug.cgi?id=54411
> 
> you need to test that on linus's tree of 2013-02-26.
> or v3.9-rc1

Hi Yinghai,
	I test your patch on linus' tree of 2-26
commit d895cb1af15c04c522a25c79cc429076987c089b
But it still does not work~

Thanks
Gu

> 
> Thanks
> 
> Yinghai
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-02-28 10:47                   ` Gu Zheng
@ 2013-02-28 11:50                     ` Yijing Wang
  2013-02-28 15:11                     ` Yinghai Lu
  1 sibling, 0 replies; 28+ messages in thread
From: Yijing Wang @ 2013-02-28 11:50 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Yinghai Lu, Myron Stowe, Bjorn Helgaas, Joe Lawrence, linux-pci,
	Matthew Garrett, Myron Stowe, David Bulkow

[-- Attachment #1: Type: text/plain, Size: 10082 bytes --]

On 2013/2/28 18:47, Gu Zheng wrote:
> On 02/27/2013 02:47 PM, Yinghai Lu wrote:
> 
>> On Tue, Feb 26, 2013 at 10:42 PM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>>>     I just agree with Bjorn's analysis. And I have test Yinghai's patch on kernel 3.8
>>> , but it seems does not work. More infos, please refer to bugzilla:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=54411
>>
>> you need to test that on linus's tree of 2013-02-26.
>> or v3.9-rc1
> 
> Hi Yinghai,
> 	I test your patch on linus' tree of 2-26
> commit d895cb1af15c04c522a25c79cc429076987c089b
> But it still does not work~

I found another problem when doing device remove by /sys/..../$device/remove and acpi hotplug.
Because remove_callback() function was called in workqueue. The device which was hold by
remove_callback() may be removed by other interfaces like acpiphp/pciehp, upstream device remove....
So once remove_callback() try to remove this device again(which was removed), system may panic.

panic info found in my machine:
kworker/u:3[273]: Oops 11003706212352 [1]
Modules linked in: raw snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device nfsv3 nf
s_acl iptable_filter ip_tables x_tables nfs fscache dns_resolver lockd sunrpc cp
ufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq binfmt_misc
fuse nls_iso8859_1 loop ipmi_si ipmi_devintf ipmi_msghandler dm_mod snd_hda_code
c_hdmi snd_hda_intel igb snd_hda_codec snd_hwdep snd_pcm snd_timer iTCO_wdt iTCO
_vendor_support snd ppdev soundcore serio_raw lpc_ich mfd_core snd_page_alloc sg
 ehci_pci mptctl ptp pps_core i2c_i801 parport_pc i2c_core hid_generic parport c
ontainer button usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10di
f ext3 mbcache jbd fan processor ide_pci_generic ide_core mptsas mptscsih mptbas
e scsi_transport_sas ata_piix libata scsi_mod thermal thermal_sys hwmon

Pid: 273, CPU 29, comm:          kworker/u:3
psr : 0000121008526038 ifs : 8000000000000307 ip  : [<a0000001004d3e21>]    Tain
ted: G    B        (3.8.0-rc2-pci-bind)
ip is at pci_destroy_dev+0x61/0x160
unat: 0000000000000000 pfs : 0000000000000307 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 0000018000019585
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c9e70433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001004d3df0 b6  : a0000001004c92a0 b7  : a00000010000b4e0
f6  : 000000000000000000000 f7  : 1003e00000018ac0017c7
f8  : 1003e0044b82fa09b5a53 f9  : 1003e00002779e56ddcba
f10 : 1003e17b2cb67d049962e f11 : 1003e0000000000000c56
r1  : a0000001015ae780 r2  : 0000000000100100 r3  : 0000000000100108
r8  : a0000001013af748 r9  : 0000000000000000 r10 : 0000000000200201
r11 : 000000000000d5a4 r12 : e0000007059afdd0 r13 : e0000007059a0000
r14 : 0000000000200200 r15 : 0000000000200200 r16 : 0000000000100100
r17 : e00000170353da88 r18 : e000001f03503e80 r19 : e00000170353da90
r20 : 0000000000000000 r21 : 0000000000000000 r22 : a0000001013cc608
r23 : 0000000000000063 r24 : 000000000000006b r25 : 000000000000006c
r26 : 000000000000006f r27 : a000000101a82cc0 r28 : 0000000000000000
r29 : 0000000000000000 r30 : 000000000000d5a2 r31 : 000000000000d5a2

Call Trace:
 [<a000000100015f00>] show_stack+0x80/0xa0
                                sp=e0000007059af990 bsp=e0000007059a1400
 [<a000000100016560>] show_regs+0x640/0x920
                                sp=e0000007059afb60 bsp=e0000007059a13a0
 [<a0000001000418f0>] die+0x190/0x2c0
                                sp=e0000007059afb70 bsp=e0000007059a1360
 [<a00000010094b370>] ia64_do_page_fault+0xbd0/0xc00
                                sp=e0000007059afb70 bsp=e0000007059a12d0
 [<a00000010000bd40>] ia64_native_leave_kernel+0x0/0x270
                                sp=e0000007059afc00 bsp=e0000007059a12d0
 [<a0000001004d3e20>] pci_destroy_dev+0x60/0x160
                                sp=e0000007059afdd0 bsp=e0000007059a1298
 [<a0000001004d44a0>] pci_remove_bus_device+0xc0/0xe0
                                sp=e0000007059afdd0 bsp=e0000007059a1258
 [<a0000001004d44f0>] pci_stop_and_remove_bus_device+0x30/0x60
                                sp=e0000007059afdd0 bsp=e0000007059a1238
 [<a0000001004e33d0>] remove_callback+0xf0/0x1c0
                                sp=e0000007059afdd0 bsp=e0000007059a1208
 [<a00000010034d730>] sysfs_schedule_callback_work+0x50/0x120
                                sp=e0000007059afdd0 bsp=e0000007059a11d0
 [<a0000001000b85a0>] process_one_work+0x520/0xa80
                                sp=e0000007059afdd0 bsp=e0000007059a1140
 [<a0000001000b98b0>] worker_thread+0x330/0xde0
                                sp=e0000007059afdd0 bsp=e0000007059a1070
 [<a0000001000cd070>] kthread+0x150/0x180
                                sp=e0000007059afdd0 bsp=e0000007059a1038
 [<a00000010000bb30>] call_payload+0x50/0x80
                                sp=e0000007059afe30 bsp=e0000007059a1020
Unable to handle kernel NULL pointer dereference (address 0000000000000038)
kworker/u:3[273]: Oops 8813272891392 [2]
Modules linked in: raw snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device nfsv3 nf
s_acl iptable_filter ip_tables x_tables nfs fscache dns_resolver lockd sunrpc cp
ufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq binfmt_misc
fuse nls_iso8859_1 loop ipmi_si ipmi_devintf ipmi_msghandler dm_mod snd_hda_code
c_hdmi snd_hda_intel igb snd_hda_codec snd_hwdep snd_pcm snd_timer iTCO_wdt iTCO
_vendor_support snd ppdev soundcore serio_raw lpc_ich mfd_core snd_page_alloc sg
 ehci_pci mptctl ptp pps_core i2c_i801 parport_pc i2c_core hid_generic parport c
ontainer button usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10di
f ext3 mbcache jbd fan processor ide_pci_generic ide_core mptsas mptscsih mptbas
e scsi_transport_sas ata_piix libata scsi_mod thermal thermal_sys hwmon

Pid: 273, CPU 29, comm:          kworker/u:3
psr : 0000101008022038 ifs : 8000000000000309 ip  : [<a0000001000c21b0>]    Tain
ted: G    B D      (3.8.0-rc2-pci-bind)
ip is at wq_worker_sleeping+0x30/0x180
unat: 0000000000000000 pfs : 0000000000000309 rsc : 0000000000000003
rnat: 000000000000040e bsps: 0000000000000003 pr  : 000565501552a5d5
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001000c21a0 b6  : a0000001000fdc80 b7  : a0000001000ffbe0
f6  : 0ffefaec33e1f63409a90 f7  : 0fff1ed2d4e22a0000000
f8  : 10017a916000000000000 f9  : 1000ebb80000000000000
f10 : 10007e6dbd1941e705b2d f11 : 1003e00000000000001cd
r1  : a0000001015ae780 r2  : 0000000000000000 r3  : 0000000000000038
r8  : 0000000000000000 r9  : 0000000000000000 r10 : e000001800206280
r11 : e0000018002063a0 r12 : e0000007059afb60 r13 : e0000007059a0000
r14 : ffffffffffffffd8 r15 : e0000018002062f4 r16 : 0000315801ec75e5
r17 : e000001800206bd0 r18 : e0000018002063a0 r19 : 000000000315801e
r20 : e000001800206360 r21 : a0000001014fb630 r22 : e0000018002062e0
r23 : a000000101b2cb88 r24 : e0000007059a0070 r25 : e000001800206b40
r26 : 00000000000001cc r27 : 000000000000bb80 r28 : 000000000000bb7f
r29 : 000000000420806c r30 : e0000007059a0014 r31 : 000000000000b9dd

Call Trace:
 [<a000000100015f00>] show_stack+0x80/0xa0
                                sp=e0000007059af720 bsp=e0000007059a1740
 [<a000000100016560>] show_regs+0x640/0x920
                                sp=e0000007059af8f0 bsp=e0000007059a16e8
 [<a0000001000418f0>] die+0x190/0x2c0
                                sp=e0000007059af900 bsp=e0000007059a16a8
 [<a00000010094b150>] ia64_do_page_fault+0x9b0/0xc00
                                sp=e0000007059af900 bsp=e0000007059a1618
 [<a00000010000bd40>] ia64_native_leave_kernel+0x0/0x270
                                sp=e0000007059af990 bsp=e0000007059a1618
 [<a0000001000c21b0>] wq_worker_sleeping+0x30/0x180
                                sp=e0000007059afb60 bsp=e0000007059a15c8
 [<a0000001009430f0>] __schedule+0x14f0/0x16c0
                                sp=e0000007059afb60 bsp=e0000007059a1458
 [<a000000100943580>] schedule+0x60/0x140
                                sp=e0000007059afb70 bsp=e0000007059a1400
 [<a00000010008e050>] do_exit+0x6d0/0xc20
                                sp=e0000007059afb70 bsp=e0000007059a13a0
 [<a0000001000419c0>] die+0x260/0x2c0
                                sp=e0000007059afb70 bsp=e0000007059a1360
 [<a00000010094b370>] ia64_do_page_fault+0xbd0/0xc00
                                sp=e0000007059afb70 bsp=e0000007059a12d0
 [<a00000010000bd40>] ia64_native_leave_kernel+0x0/0x270
                                sp=e0000007059afc00 bsp=e0000007059a12d0
 [<a0000001004d3e20>] pci_destroy_dev+0x60/0x160
                                sp=e0000007059afdd0 bsp=e0000007059a1298
 [<a0000001004d44a0>] pci_remove_bus_device+0xc0/0xe0
                                sp=e0000007059afdd0 bsp=e0000007059a1258
 [<a0000001004d44f0>] pci_stop_and_remove_bus_device+0x30/0x60
                                sp=e0000007059afdd0 bsp=e0000007059a1238
 [<a0000001004e33d0>] remove_callback+0xf0/0x1c0
                                sp=e0000007059afdd0 bsp=e0000007059a1208
 [<a00000010034d730>] sysfs_schedule_callback_work+0x50/0x120
                                sp=e0000007059afdd0 bsp=e0000007059a11d0
 [<a0000001000b85a0>] process_one_work+0x520/0xa80
                                sp=e0000007059afdd0 bsp=e0000007059a1140
 [<a0000001000b98b0>] worker_thread+0x330/0xde0
                                sp=e0000007059afdd0 bsp=e0000007059a1070
 [<a0000001000cd070>] kthread+0x150/0x180
                                sp=e0000007059afdd0 bsp=e0000007059a1038
 [<a00000010000bb30>] call_payload+0x50/0x80
                                sp=e0000007059afe30 bsp=e0000007059a1020
Fixing recursive fault but reboot is needed!

I hope this patch can fix your problem too.


> 
> Thanks
> Gu
> 
>>
>> Thanks
>>
>> Yinghai
>>
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


-- 
Thanks!
Yijing

[-- Attachment #2: 0001-PCI-check-device-is_added-flag-in-remove_callback.patch --]
[-- Type: text/x-patch, Size: 1517 bytes --]

>From ba405b9ea86d8ebd4fd9754aef67d986b0835f9a Mon Sep 17 00:00:00 2001
From: Yijing Wang <wangyijing@huawei.com>
Date: Thu, 28 Feb 2013 19:51:40 +0800
Subject: [PATCH] PCI: check device is_added flag in remove_callback()

Currently, remove_store() function use device_schedule_callback()
mechanism to do device remove action. It will queue remove_callback()
into sysfs_workqueue. If this device was removed by other interfaces
like acpiphp/pciehp between device_schedule_callback() function and
remove_callback() function. This patch add is_added flag check
in remove_callback() to avoid remove a removed device again.


+-07.0-[0000:05]--+-00.0  nVidia Corporation GT218 [GeForce G210]
|                 \-00.1  nVidia Corporation High Definition Audio Controller

#echo 1 > /sys/bus/pci/devices/0000:05:00.0/remove
#echo 0 > /sys/bus/pci/slots/0/power (address: 0000:05:00, slot attached to 0000:00:07.0)

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
---
 drivers/pci/pci-sysfs.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 9c6e9bb..6b77133 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -331,7 +331,8 @@ static void remove_callback(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 
 	mutex_lock(&pci_remove_rescan_mutex);
-	pci_stop_and_remove_bus_device(pdev);
+	if (pdev->is_added)
+		pci_stop_and_remove_bus_device(pdev);
 	mutex_unlock(&pci_remove_rescan_mutex);
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-02-28 10:47                   ` Gu Zheng
  2013-02-28 11:50                     ` Yijing Wang
@ 2013-02-28 15:11                     ` Yinghai Lu
  2013-03-01  1:14                       ` Gu Zheng
  1 sibling, 1 reply; 28+ messages in thread
From: Yinghai Lu @ 2013-02-28 15:11 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Myron Stowe, Bjorn Helgaas, Joe Lawrence, linux-pci,
	Matthew Garrett, Myron Stowe, David Bulkow

On Thu, Feb 28, 2013 at 2:47 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> On 02/27/2013 02:47 PM, Yinghai Lu wrote:
>
>> On Tue, Feb 26, 2013 at 10:42 PM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>>>     I just agree with Bjorn's analysis. And I have test Yinghai's patch on kernel 3.8
>>> , but it seems does not work. More infos, please refer to bugzilla:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=54411
>>
>> you need to test that on linus's tree of 2013-02-26.
>> or v3.9-rc1
>
> Hi Yinghai,
>         I test your patch on linus' tree of 2-26
> commit d895cb1af15c04c522a25c79cc429076987c089b
> But it still does not work~

Still have full of warning in dmesg?

or double remove does not work.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] PCI: ASPM exit link state code could skip devices
  2013-02-28 15:11                     ` Yinghai Lu
@ 2013-03-01  1:14                       ` Gu Zheng
  0 siblings, 0 replies; 28+ messages in thread
From: Gu Zheng @ 2013-03-01  1:14 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Myron Stowe, Bjorn Helgaas, Joe Lawrence, linux-pci,
	Matthew Garrett, Myron Stowe, David Bulkow

On 02/28/2013 11:11 PM, Yinghai Lu wrote:

> On Thu, Feb 28, 2013 at 2:47 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>> On 02/27/2013 02:47 PM, Yinghai Lu wrote:
>>
>>> On Tue, Feb 26, 2013 at 10:42 PM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>>>>     I just agree with Bjorn's analysis. And I have test Yinghai's patch on kernel 3.8
>>>> , but it seems does not work. More infos, please refer to bugzilla:
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=54411
>>>
>>> you need to test that on linus's tree of 2013-02-26.
>>> or v3.9-rc1
>>
>> Hi Yinghai,
>>         I test your patch on linus' tree of 2-26
>> commit d895cb1af15c04c522a25c79cc429076987c089b
>> But it still does not work~
> 
> Still have full of warning in dmesg?

> or double remove does not work.

No, there is no warning in dmesg on booting, just concurrent removal does not work!

Thanks,
Gu

> 
> Thanks
> 
> Yinghai
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2013-03-01  1:15 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-18 18:22 PCIe ASPM crash on device removal Joe Lawrence
2013-01-18 18:23 ` [PATCH 1/2] PCI: ASPM exit link state code could skip devices Joe Lawrence
2013-01-31 23:29   ` Myron Stowe
2013-02-01 19:55     ` Joe Lawrence
2013-02-01 22:31       ` Bjorn Helgaas
2013-02-06 10:09   ` Gu Zheng
2013-02-06 15:23     ` Joe Lawrence
2013-02-09  0:35     ` Bjorn Helgaas
     [not found]       ` <5122F276.80807@cn.fujitsu.com>
2013-02-24  0:20         ` Bjorn Helgaas
2013-02-24  3:13           ` Yinghai Lu
2013-02-27 20:14             ` Bjorn Helgaas
2013-02-25  5:59           ` Gu Zheng
2013-02-26 16:03             ` Myron Stowe
2013-02-27  6:42               ` Gu Zheng
2013-02-27  6:47                 ` Yinghai Lu
2013-02-28 10:47                   ` Gu Zheng
2013-02-28 11:50                     ` Yijing Wang
2013-02-28 15:11                     ` Yinghai Lu
2013-03-01  1:14                       ` Gu Zheng
2013-01-18 18:24 ` [PATCH 2/2] PCI: Don't touch ASPM if forcibly disabled Joe Lawrence
2013-01-18 22:54   ` Myron Stowe
2013-02-01 22:32     ` Bjorn Helgaas
     [not found] ` <CAL-B5D0+6uO7WDYR7inmZKdU0h8-bpkOs_CzbF0bD2b9i6=1ZA@mail.gmail.com>
2013-01-18 19:53   ` PCIe ASPM crash on device removal Joe Lawrence
2013-01-18 23:15     ` Myron Stowe
2013-01-18 23:41       ` Myron Stowe
2013-01-19  1:03         ` Joe Lawrence
2013-02-01 22:45           ` Bjorn Helgaas
2013-01-18 19:57 ` Myron Stowe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).