linux-arm-kernel.lists.infradead.org archive mirror
* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
@ 2017-01-11 19:49 Uwe Kleine-König
  2017-01-11 22:02 ` Bjorn Helgaas
  2017-01-17 15:14 ` Bjorn Helgaas
  0 siblings, 2 replies; 18+ messages in thread
From: Uwe Kleine-König @ 2017-01-11 19:49 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
is enabled:

# dmesg | grep ath
[    7.207770] ath10k_pci 0000:02:00.0: Refused to change power state, currently in D3
[    7.237955] ath10k_pci 0000:02:00.0: failed to wake up device : -110
[    7.238146] ath10k_pci: probe of 0000:02:00.0 failed with error -110

If, however, PCIEASPM is off, the driver probes correctly and the ath10k
adapter works fine.

I wonder if someone has an idea what needs to be done to fix this
problem. (OK, I could disable PCIEASPM, but I'd like to have a solution
for a distribution kernel where I think PCIEASPM=y is sensible in
general.)
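
For readers decoding the log above: the -110 in the ath10k messages is a negative Linux errno. A minimal sketch of the lookup, assuming the asm-generic errno numbering (which 32-bit ARM uses); the table is deliberately partial and the helper name errno_name is made up for illustration:

```python
# Partial table of Linux errno values (asm-generic numbering).
# Only a handful of entries are listed; this is not the full table.
LINUX_ERRNO = {
    5: "EIO",
    16: "EBUSY",
    19: "ENODEV",
    22: "EINVAL",
    110: "ETIMEDOUT",
}


def errno_name(code: int) -> str:
    """Return the symbolic name for a (possibly negative) kernel error code."""
    return LINUX_ERRNO.get(abs(code), f"unknown({code})")


# The probe failure reported by ath10k_pci is a timeout.
print(errno_name(-110))
```

So "failed to wake up device : -110" reads as a timeout waiting for the device, which fits a device that no longer responds on the link.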

Best regards
Uwe


* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-11 19:49 CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine Uwe Kleine-König
@ 2017-01-11 22:02 ` Bjorn Helgaas
  2017-01-12 13:18   ` Uwe Kleine-König
  2017-01-17 15:14 ` Bjorn Helgaas
  1 sibling, 1 reply; 18+ messages in thread
From: Bjorn Helgaas @ 2017-01-11 22:02 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Uwe,

On Wed, Jan 11, 2017 at 08:49:46PM +0100, Uwe Kleine-König wrote:
> Hello,
> 
> on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
> ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
> is enabled:
> 
> # dmesg | grep ath
> [    7.207770] ath10k_pci 0000:02:00.0: Refused to change power state, currently in D3
> [    7.237955] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> [    7.238146] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> 
> if however PCIEASPM is off, the driver probes correctly and the ath10k
> adapter works fine.
> 
> I wonder if someone has an idea what needs to be done to fix this
> problem. (OK, I could disable PCIEASPM, but I'd like to have a solution
> for a distribution kernel where I think PCIEASPM=y is sensible in
> general.)

PCIEASPM=y is definitely sensible and disabling ASPM is OK for a
workaround but is not a fix.

We have several open issues related to ASPM:

  https://bugzilla.kernel.org/show_bug.cgi?id=102311 ASPM: ASMEDA asm1062 not working
  https://bugzilla.kernel.org/show_bug.cgi?id=187731 Null pointer dereference in ASPM
  https://bugzilla.kernel.org/show_bug.cgi?id=189951 Enabling ASPM causes NIC performance issue
  https://bugzilla.kernel.org/show_bug.cgi?id=60111 NULL pointer deref in ASPM alloc_pcie_link_state()

I don't recognize yours as being one of these.  Can you open a new
issue and attach the complete dmesg log and "lspci -vv" output?

Is this a regression?

Bjorn


* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-11 22:02 ` Bjorn Helgaas
@ 2017-01-12 13:18   ` Uwe Kleine-König
  2017-01-12 15:03     ` Bjorn Helgaas
  0 siblings, 1 reply; 18+ messages in thread
From: Uwe Kleine-König @ 2017-01-12 13:18 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/11/2017 11:02 PM, Bjorn Helgaas wrote:
> Hi Uwe,
> 
> On Wed, Jan 11, 2017 at 08:49:46PM +0100, Uwe Kleine-König wrote:
>> Hello,
>>
>> on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
>> ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
>> is enabled:
>>
>> [...]
> We have several open issues related to ASPM:
> 
>   https://bugzilla.kernel.org/show_bug.cgi?id=102311 ASPM: ASMEDA asm1062 not working
>   https://bugzilla.kernel.org/show_bug.cgi?id=187731 Null pointer dereference in ASPM
>   https://bugzilla.kernel.org/show_bug.cgi?id=189951 Enabling ASPM causes NIC performance issue
>   https://bugzilla.kernel.org/show_bug.cgi?id=60111 NULL pointer deref in ASPM alloc_pcie_link_state()
> 
> I don't recognize yours as being one of these.  Can you open a new
> issue and attach the complete dmesg log and "lspci -vv" output?

Done: https://bugzilla.kernel.org/show_bug.cgi?id=192441

> Is this a regression?

As written in the bug report this also happens on 4.7.

Best regards
Uwe




* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-12 13:18   ` Uwe Kleine-König
@ 2017-01-12 15:03     ` Bjorn Helgaas
  2017-01-12 15:24       ` Andrew Lunn
  0 siblings, 1 reply; 18+ messages in thread
From: Bjorn Helgaas @ 2017-01-12 15:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 12, 2017 at 02:18:46PM +0100, Uwe Kleine-König wrote:
> On 01/11/2017 11:02 PM, Bjorn Helgaas wrote:
> > Hi Uwe,
> > 
> > On Wed, Jan 11, 2017 at 08:49:46PM +0100, Uwe Kleine-König wrote:
> >> Hello,
> >>
> >> on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
> >> ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
> >> is enabled:
> >>
> >> [...]
> > We have several open issues related to ASPM:
> > 
> >   https://bugzilla.kernel.org/show_bug.cgi?id=102311 ASPM: ASMEDA asm1062 not working
> >   https://bugzilla.kernel.org/show_bug.cgi?id=187731 Null pointer dereference in ASPM
> >   https://bugzilla.kernel.org/show_bug.cgi?id=189951 Enabling ASPM causes NIC performance issue
> >   https://bugzilla.kernel.org/show_bug.cgi?id=60111 NULL pointer deref in ASPM alloc_pcie_link_state()
> > 
> > I don't recognize yours as being one of these.  Can you open a new
> > issue and attach the complete dmesg log and "lspci -vv" output?
> 
> Done: https://bugzilla.kernel.org/show_bug.cgi?id=192441

Thanks!  Can you attach a dmesg with CONFIG_PCIEASPM turned off, too?

There are several interesting things going on with that ath10k device,
and not all of them seem ASPM-related:

  pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
  pci 0000:02:00.0: reg 0x10: [mem 0xe8000000-0xe81fffff 64bit]
  pci 0000:02:00.0: reg 0x30: [mem 0xe8200000-0xe820ffff pref]
  pci 0000:02:00.0: of_irq_parse_pci() failed with rc=134
  pci 0000:02:00.0: BAR 0: assigned [mem 0xe0000000-0xe01fffff 64bit]
  pci 0000:02:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
  pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)

1) We found BAR 0 (reg 0x10) with 0xe8000000, so firmware probably
   programmed it, and it probably works there.

2) The host bridge window doesn't include that BAR 0 space.
   Unfortunately I don't think we print the initial 00:02.0 bridge
   window leading to bus 02; we only print the new window we assign to
   it.

3) No idea what the of_irq_parse_pci() issue is.

4) No idea why the BAR 0 update failed.  Maybe a Marvell config
   accessor problem?

I don't see any connection between these and ASPM, so I'm curious why
things work with CONFIG_PCIEASPM turned off.
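
Points 1) and 2) can be checked with simple range arithmetic. A rough sketch, using only the addresses from the log lines quoted above; the helper names parse_range and contains are made up for illustration:

```python
def parse_range(text):
    """Parse a 'start-end' hex range as printed in the PCI resource log lines."""
    start, end = text.split("-")
    return int(start, 16), int(end, 16)


def contains(window, region):
    """True if region lies entirely inside window (both bounds inclusive)."""
    return window[0] <= region[0] and region[1] <= window[1]


root_window = parse_range("0xe0000000-0xe7ffffff")    # root bus resource
bar0_firmware = parse_range("0xe8000000-0xe81fffff")  # BAR 0 as found (reg 0x10)
bar0_assigned = parse_range("0xe0000000-0xe01fffff")  # BAR 0 as reassigned

# The value found in BAR 0 lies outside the root bus window,
# while the newly assigned range lies inside it.
print(contains(root_window, bar0_firmware))
print(contains(root_window, bar0_assigned))

# Both ranges describe the same BAR, so they have the same size (2 MiB).
print(hex(bar0_firmware[1] - bar0_firmware[0] + 1))
```

This only restates what the log shows; it says nothing about why the later BAR update reads back 0xffffffff.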


* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-12 15:03     ` Bjorn Helgaas
@ 2017-01-12 15:24       ` Andrew Lunn
  0 siblings, 0 replies; 18+ messages in thread
From: Andrew Lunn @ 2017-01-12 15:24 UTC (permalink / raw)
  To: linux-arm-kernel

>   pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
>   pci 0000:02:00.0: reg 0x10: [mem 0xe8000000-0xe81fffff 64bit]
>   pci 0000:02:00.0: reg 0x30: [mem 0xe8200000-0xe820ffff pref]
>   pci 0000:02:00.0: of_irq_parse_pci() failed with rc=134
>   pci 0000:02:00.0: BAR 0: assigned [mem 0xe0000000-0xe01fffff 64bit]
>   pci 0000:02:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
>   pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
> 
> 3) No idea what the of_irq_parse_pci() issue is.

134 is 0x86.

Could it be:

#define PCIBIOS_DEVICE_NOT_FOUND        0x86

pci-mvebu.c will return this in a few places, mvebu_pcie_wr_conf(),
mvebu_pcie_rd_conf().

Could this be

rc = pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin);

It looks like pci_read_config_byte() is expected to return a real
errno value, and maybe it is returning PCIBIOS_DEVICE_NOT_FOUND?
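
As a quick cross-check of the 134 == 0x86 reading, here is a small lookup of the PCIBIOS_* status codes as defined in the kernel's include/linux/pci.h. The helper name describe_rc is hypothetical, and treating any positive rc as a PCIBIOS code is an assumption that matches the guess above, not something the log proves:

```python
# PCIBIOS status codes, values as in include/linux/pci.h.
PCIBIOS_CODES = {
    0x00: "PCIBIOS_SUCCESSFUL",
    0x81: "PCIBIOS_FUNC_NOT_SUPPORTED",
    0x83: "PCIBIOS_BAD_VENDOR_ID",
    0x86: "PCIBIOS_DEVICE_NOT_FOUND",
    0x87: "PCIBIOS_BAD_REGISTER_NUMBER",
    0x88: "PCIBIOS_SET_FAILED",
    0x89: "PCIBIOS_BUFFER_TOO_SMALL",
}


def describe_rc(rc):
    """Classify a return code: negative values look like errnos,
    positive values are looked up as PCIBIOS codes (an assumption)."""
    if rc < 0:
        return f"negative errno ({rc})"
    return PCIBIOS_CODES.get(rc, f"unknown (0x{rc:02x})")


print(hex(134), describe_rc(134))
```

A positive return value is itself a hint that something other than a negative errno leaked through to the of_irq_parse_pci() message.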

      Andrew


* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-11 19:49 CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine Uwe Kleine-König
  2017-01-11 22:02 ` Bjorn Helgaas
@ 2017-01-17 15:14 ` Bjorn Helgaas
  2017-01-17 15:25   ` Russell King - ARM Linux
  1 sibling, 1 reply; 18+ messages in thread
From: Bjorn Helgaas @ 2017-01-17 15:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 11, 2017 at 08:49:46PM +0100, Uwe Kleine-König wrote:
> Hello,
> 
> on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
> ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
> is enabled:
> 
> # dmesg | grep ath
> [    7.207770] ath10k_pci 0000:02:00.0: Refused to change power state, currently in D3
> [    7.237955] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> [    7.238146] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> 
> if however PCIEASPM is off, the driver probes correctly and the ath10k
> adapter works fine.
> 
> I wonder if someone has an idea what needs to be done to fix this
> problem. (OK, I could disable PCIEASPM, but I'd like to have a solution
> for a distribution kernel where I think PCIEASPM=y is sensible in
> general.)

Can somebody confirm that this system (Marvell Armada 385-based Turris
Omnia) does actually support ASPM in hardware?


* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-17 15:14 ` Bjorn Helgaas
@ 2017-01-17 15:25   ` Russell King - ARM Linux
  2017-01-17 17:46     ` Bjorn Helgaas
  0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2017-01-17 15:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 17, 2017 at 09:14:44AM -0600, Bjorn Helgaas wrote:
> On Wed, Jan 11, 2017 at 08:49:46PM +0100, Uwe Kleine-König wrote:
> > Hello,
> > 
> > on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
> > ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
> > is enabled:
> > 
> > # dmesg | grep ath
> > [    7.207770] ath10k_pci 0000:02:00.0: Refused to change power state, currently in D3
> > [    7.237955] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> > [    7.238146] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> > 
> > if however PCIEASPM is off, the driver probes correctly and the ath10k
> > adapter works fine.
> > 
> > I wonder if someone has an idea what needs to be done to fix this
> > problem. (OK, I could disable PCIEASPM, but I'd like to have a solution
> > for a distribution kernel where I think PCIEASPM=y is sensible in
> > general.)
> 
> Can somebody confirm that this system (Marvell Armada 385-based Turris
> Omnia) does actually support ASPM in hardware?

What sort of "hardware" are you referring to?

From my reading of the specs, ASPM doesn't require any external hardware.
It's all done inside the PCIe root hub and PCIe device.

The PCIe spec specifically prohibits cutting power supplies and clocks to
PCIe devices during L0s and L1, with the exception that the PCIe clock may
be stopped in L1 if CLKREQ# is deasserted.  CLKREQ# handling generally
requires GPIO usage, and as there's no support for that, there's no support
for stopping the PCIe clock in L1.  We do the correct thing there,
preventing the PCI_EXP_LNKCTL_CLKREQ_EN bit being set.

That all said, it would probably be a good idea to throw some printk()
debugging into mvebu_sw_pci_bridge_write() and mvebu_sw_pci_bridge_read()
so we can see what's going on at that level, and maybe also some debug
in mvebu_pcie_hw_wr_conf() and mvebu_pcie_hw_rd_conf() so we can see
what's happening at the PCIe device too.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-17 15:25   ` Russell King - ARM Linux
@ 2017-01-17 17:46     ` Bjorn Helgaas
  2017-01-17 17:51       ` Russell King - ARM Linux
  0 siblings, 1 reply; 18+ messages in thread
From: Bjorn Helgaas @ 2017-01-17 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 17, 2017 at 03:25:49PM +0000, Russell King - ARM Linux wrote:
> On Tue, Jan 17, 2017 at 09:14:44AM -0600, Bjorn Helgaas wrote:
> > On Wed, Jan 11, 2017 at 08:49:46PM +0100, Uwe Kleine-König wrote:
> > > Hello,
> > > 
> > > on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
> > > ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
> > > is enabled:
> > > 
> > > # dmesg | grep ath
> > > [    7.207770] ath10k_pci 0000:02:00.0: Refused to change power state, currently in D3
> > > [    7.237955] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> > > [    7.238146] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> > > 
> > > if however PCIEASPM is off, the driver probes correctly and the ath10k
> > > adapter works fine.
> > > 
> > > I wonder if someone has an idea what needs to be done to fix this
> > > problem. (OK, I could disable PCIEASPM, but I'd like to have a solution
> > > for a distribution kernel where I think PCIEASPM=y is sensible in
> > > general.)
> > 
> > Can somebody confirm that this system (Marvell Armada 385-based Turris
> > Omnia) does actually support ASPM in hardware?
> 
> What sort of "hardware" are you referring to?
> 
> From my reading of the specs, ASPM doesn't require any external hardware.
> It's all done inside the PCIe root hub and PCIe device.

Right.  My question is just whether we know that the Marvell PCIe root
hub hardware works correctly with respect to ASPM.  The PCI core isn't
doing anything special for Marvell, so problems here are likely to be
either in the Marvell hardware or in the pci-mvebu.c driver.

> The PCIe spec specifically prohibits cutting power supplies and clocks to
> PCIe devices during L0s and L1, with the exception that the PCIe clock may
> be stopped in L1 if CLKREQ# is deasserted.  CLKREQ# handling generally
> requires GPIO usage, and as there's no support for that, there's no support
> for stopping the PCIe clock in L1.  We do the correct thing there,
> preventing the PCI_EXP_LNKCTL_CLKREQ_EN bit being set.
> 
> That all said, it would probably be a good idea to throw some printk()
> debugging into mvebu_sw_pci_bridge_write() and mvebu_sw_pci_bridge_read()
> so we can see what's going on at that level, and maybe also some debug
> in mvebu_pcie_hw_wr_conf() and mvebu_pcie_hw_rd_conf() so we can see
> what's happening at the PCIe device too.

Uwe has already done that; the dmesg logs including this
instrumentation are at
https://bugzilla.kernel.org/show_bug.cgi?id=192441


* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-17 17:46     ` Bjorn Helgaas
@ 2017-01-17 17:51       ` Russell King - ARM Linux
  2017-01-17 17:57         ` Russell King - ARM Linux
  0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2017-01-17 17:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 17, 2017 at 11:46:49AM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 17, 2017 at 03:25:49PM +0000, Russell King - ARM Linux wrote:
> > On Tue, Jan 17, 2017 at 09:14:44AM -0600, Bjorn Helgaas wrote:
> > > On Wed, Jan 11, 2017 at 08:49:46PM +0100, Uwe Kleine-König wrote:
> > > > Hello,
> > > > 
> > > > on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
> > > > ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
> > > > is enabled:
> > > > 
> > > > # dmesg | grep ath
> > > > [    7.207770] ath10k_pci 0000:02:00.0: Refused to change power state, currently in D3
> > > > [    7.237955] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> > > > [    7.238146] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> > > > 
> > > > if however PCIEASPM is off, the driver probes correctly and the ath10k
> > > > adapter works fine.
> > > > 
> > > > I wonder if someone has an idea what needs to be done to fix this
> > > > problem. (OK, I could disable PCIEASPM, but I'd like to have a solution
> > > > for a distribution kernel where I think PCIEASPM=y is sensible in
> > > > general.)
> > > 
> > > Can somebody confirm that this system (Marvell Armada 385-based Turris
> > > Omnia) does actually support ASPM in hardware?
> > 
> > What sort of "hardware" are you referring to?
> > 
> > From my reading of the specs, ASPM doesn't require any external hardware.
> > It's all done inside the PCIe root hub and PCIe device.
> 
> Right.  My question is just whether we know that the Marvell PCIe root
> hub hardware works correctly with respect to ASPM.  The PCI core isn't
> doing anything special for Marvell, so problems here are likely to be
> either in the Marvell hardware or in the pci-mvebu.c driver.
> 
> > The PCIe spec specifically prohibits cutting power supplies and clocks to
> > PCIe devices during L0s and L1, with the exception that the PCIe clock may
> > be stopped in L1 if CLKREQ# is deasserted.  CLKREQ# handling generally
> > requires GPIO usage, and as there's no support for that, there's no support
> > for stopping the PCIe clock in L1.  We do the correct thing there,
> > preventing the PCI_EXP_LNKCTL_CLKREQ_EN bit being set.
> > 
> > That all said, it would probably be a good idea to throw some printk()
> > debugging into mvebu_sw_pci_bridge_write() and mvebu_sw_pci_bridge_read()
> > so we can see what's going on at that level, and maybe also some debug
> > in mvebu_pcie_hw_wr_conf() and mvebu_pcie_hw_rd_conf() so we can see
> > what's happening at the PCIe device too.
> 
> Uwe has already done that; the dmesg logs including this
> instrumentation are at
> https://bugzilla.kernel.org/show_bug.cgi?id=192441

Grr, <swears about SSL incompatibilities>... wget's the URL and then
uses elinks on it...

Umm, not quite.  He's done mvebu_pcie_hw_wr_conf() and mvebu_pcie_hw_rd_conf()
but not the bridge from the descriptions given on the attachments.
Obviously, it's going to be a lot of work to manufacture the links to
look at each attachment to thoroughly check, so I'm not going to do
that given quite how broken SSL crap is today.

(Try installing elinks and pointing it at the above URL.)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-17 17:51       ` Russell King - ARM Linux
@ 2017-01-17 17:57         ` Russell King - ARM Linux
  2017-01-17 18:14           ` Bjorn Helgaas
  0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2017-01-17 17:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 17, 2017 at 05:51:16PM +0000, Russell King - ARM Linux wrote:
> On Tue, Jan 17, 2017 at 11:46:49AM -0600, Bjorn Helgaas wrote:
> > Uwe has already done that; the dmesg logs including this
> > instrumentation are at
> > https://bugzilla.kernel.org/show_bug.cgi?id=192441
> 
> Grr, <swears about SSL incompatibilities>... wget's the URL and then
> uses elinks on it...
> 
> Umm, not quite.  He's done mvebu_pcie_hw_wr_conf() and mvebu_pcie_hw_rd_conf()
> but not the bridge from the descriptions given on the attachments.
> Obviously, it's going to be a lot of work to manufacture the links to
> look at each attachment to thoroughly check, so I'm not going to do
> that given quite how broken SSL crap is today.
> 
> (Try installing elinks and pointing it at the above URL.)

Oh, and looking at some of the debug that's been added:

[    3.646322] mvebu_pcie_rd_conf(where=16, size=4, val=3892314116) => 0
[    3.646325] mvebu_pcie_wr_conf(where=16, size=4, val=4294967295)
[    3.646329] mvebu_pcie_rd_conf(where=16, size=4, val=4292870148) => 0
[    3.646332] mvebu_pcie_wr_conf(where=16, size=4, val=3892314116)

Please print register values in HEX, not decimal.  Same for register
addresses.  Hex is the normal base to print this information, which
the human brain can easily comprehend and translate to bits in a
register.  Decimal values are useless and might as well be encrypted.
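
To illustrate the point, here are the four decimal values from the debug lines above converted to hex. They turn out to be the usual BAR sizing sequence on config offset 0x10 (where=16). This is only a sketch: the size calculation looks at the low dword alone, and the helper name bar_size_from_probe is made up for illustration:

```python
# (operation, value) pairs taken from the four debug lines, in order.
trace = [
    ("rd", 3892314116),
    ("wr", 4294967295),
    ("rd", 4292870148),
    ("wr", 3892314116),
]

for op, value in trace:
    print(f"{op} where=0x{16:02x} val=0x{value:08x}")


def bar_size_from_probe(readback):
    """Size of a 32-bit memory BAR from the value read back after writing
    all ones. The low four bits are type/prefetch flags, not address bits."""
    mask = readback & 0xFFFFFFF0
    return (~mask + 1) & 0xFFFFFFFF


# The read after the all-ones write gives the size mask.
print(hex(bar_size_from_probe(4292870148)))
```

In hex the sequence reads 0xe8000004, 0xffffffff, 0xffe00004, 0xe8000004: save the BAR, write all ones, read back the size mask (2 MiB), restore the original value. The low nibble 0x4 marks a 64-bit memory BAR, matching the "64bit" in the resource log earlier in the thread.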

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-17 17:57         ` Russell King - ARM Linux
@ 2017-01-17 18:14           ` Bjorn Helgaas
  2017-01-17 19:34             ` Russell King - ARM Linux
  0 siblings, 1 reply; 18+ messages in thread
From: Bjorn Helgaas @ 2017-01-17 18:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 17, 2017 at 05:57:28PM +0000, Russell King - ARM Linux wrote:
> On Tue, Jan 17, 2017 at 05:51:16PM +0000, Russell King - ARM Linux wrote:
> > On Tue, Jan 17, 2017 at 11:46:49AM -0600, Bjorn Helgaas wrote:
> > > Uwe has already done that; the dmesg logs including this
> > > instrumentation are at
> > > https://bugzilla.kernel.org/show_bug.cgi?id=192441
> > 
> > Grr, <swears about SSL incompatibilities>... wget's the URL and then
> > uses elinks on it...
> > 
> > Umm, not quite.  He's done mvebu_pcie_hw_wr_conf() and mvebu_pcie_hw_rd_conf()
> > but not the bridge from the descriptions given on the attachments.
> > Obviously, it's going to be a lot of work to manufacture the links to
> > look at each attachment to thoroughly check, so I'm not going to do
> > that given quite how broken SSL crap is today.
> > 
> > (Try installing elinks and pointing it at the above URL.)
> 
> Oh, and looking at some of the debug that's been added:
> 
> [    3.646322] mvebu_pcie_rd_conf(where=16, size=4, val=3892314116) => 0
> [    3.646325] mvebu_pcie_wr_conf(where=16, size=4, val=4294967295)
> [    3.646329] mvebu_pcie_rd_conf(where=16, size=4, val=4292870148) => 0
> [    3.646332] mvebu_pcie_wr_conf(where=16, size=4, val=3892314116)
> 
> Please print register values in HEX, not decimal.  Same for register
> addresses.  Hex is the normal base to print this information, which
> the human brain can easily comprehend and translate to bits in a
> register.  Decimal values are useless and might as well be encrypted.

The instrumentation has evolved a bit since then.  Latest is below (could
still use improvement, but it does address your suggestions above):

https://bugzilla.kernel.org/attachment.cgi?id=251691 (CONFIG_PCIEASPM=y)
https://bugzilla.kernel.org/attachment.cgi?id=251701 (CONFIG_PCIEASPM not set)


* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-17 18:14           ` Bjorn Helgaas
@ 2017-01-17 19:34             ` Russell King - ARM Linux
  2017-01-17 21:02               ` Russell King - ARM Linux
  0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2017-01-17 19:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 17, 2017 at 12:14:58PM -0600, Bjorn Helgaas wrote:
> The instrumentation has evolved a bit since then.  Latest is below (could
> still use improvement, but it does address your suggestions above):
> 
> https://bugzilla.kernel.org/attachment.cgi?id=251691 (CONFIG_PCIEASPM=y)
> https://bugzilla.kernel.org/attachment.cgi?id=251701 (CONFIG_PCIEASPM not set)

Thanks.

The point at which things die is when we request a link retrain - I've
augmented the trace with the register names:

pci 0000:02:00.0: rd where=0x074 size=4 val=0x8dc1 (hw)         EXP_DEVCAP

pcie_aspm_configure_common_clock():

pci 0000:02:00.0: rd where=0x082 size=2 val=0x1011 (hw)         EXP_LNKSTA
pci 0000:??:??.?: rd where=0x052 size=2 val=0x1011 (sw)         EXP_LNKSTA
pci 0000:02:00.0: rd where=0x080 size=2 val=0x0 (hw)            EXP_LNKCTL
pci 0000:02:00.0: wr where=0x080 size=2 val=0x40 (hw)           EXP_LNKCTL

Enables common clock configuration on the device.

pci 0000:??:??.?: rd where=0x050 size=2 val=0x40 (sw)           EXP_LNKCTL
pci 0000:??:??.?: wr where=0x050 size=2 val=0x40 (sw)           EXP_LNKCTL

Common clock configuration is already enabled on the root.

pci 0000:??:??.?: rd where=0x050 size=4 val=0x10110040 (sw)     EXP_LNKCTL
pci 0000:??:??.?: wr where=0x050 size=2 val=0x60 (sw)           EXP_LNKCTL

Here we request the train, setting bit 5 in the link control
register.

pci 0000:??:??.?: rd where=0x050 size=4 val=0x110040 (sw)       EXP_LNKCTL
pci 0000:??:??.?: rd where=0x052 size=2 val=0x811 (sw)          EXP_LNKSTA
pci 0000:??:??.?: rd where=0x052 size=2 val=0x811 (sw)          EXP_LNKSTA

Waiting for the link training bit to clear...

pci 0000:??:??.?: rd where=0x052 size=2 val=0x11 (sw)           EXP_LNKSTA

and it's cleared here - but note that the link is still down.

pci 0000:??:??.?: rd where=0x04c size=4 val=0x3ac12 (sw)        EXP_LNKCAP
pci 0000:??:??.?: rd where=0x050 size=2 val=0x40 (sw)           EXP_LNKCTL

pcie_get_aspm_reg() for the root.

pci 0000:02:00.0: rd where=0x07c size=4 val=0xffffffff (no link)

pcie_get_aspm_reg() for the device (fails).

So, I think the question is... why does asking for a retrain cause
the link to fail and never recover?

Uwe, can you try:

setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
	0x50.w=0x60

and see whether the link remains alive (you can check by reading the root
register 0x52.w - bit 12 should be set once bit 11 clears again).

If that's successful, maybe setting the common clock bit on the PCIe
device is what's causing the problem, in which case:

setpci -s 02:00.0 0x80.w=0x40
setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
	0x50.w=0x60

would, I imagine, cause the link to go down.  So, the question
this gives us is why the common clock setup is not working on your
platform.  Maybe we need to source the SLC bit in the link status
from DT, though I'd like to understand what's going on here more
first.
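
For anyone following the trace, a small decoder for the Link Control and Link Status values that appear in it may help. The bit masks are the standard PCI_EXP_LNKCTL_* and PCI_EXP_LNKSTA_* definitions from the kernel's pci_regs.h; the function names are made up for illustration, and the sketch assumes the emulated root port registers follow the standard layout:

```python
# Link Control bits (PCI_EXP_LNKCTL_* in pci_regs.h).
PCI_EXP_LNKCTL_ASPMC = 0x0003      # ASPM control
PCI_EXP_LNKCTL_RL = 0x0020         # Retrain Link (bit 5)
PCI_EXP_LNKCTL_CCC = 0x0040        # Common Clock Configuration (bit 6)
PCI_EXP_LNKCTL_CLKREQ_EN = 0x0100  # Enable CLKREQ#

# Link Status bits (PCI_EXP_LNKSTA_* in pci_regs.h).
PCI_EXP_LNKSTA_CLS = 0x000F        # Current Link Speed
PCI_EXP_LNKSTA_NLW = 0x03F0        # Negotiated Link Width
PCI_EXP_LNKSTA_LT = 0x0800         # Link Training (bit 11)
PCI_EXP_LNKSTA_SLC = 0x1000        # Slot Clock Configuration (bit 12)


def decode_lnkctl(value):
    """Split a 16-bit Link Control value into named fields."""
    return {
        "aspm": value & PCI_EXP_LNKCTL_ASPMC,
        "retrain": bool(value & PCI_EXP_LNKCTL_RL),
        "common_clock": bool(value & PCI_EXP_LNKCTL_CCC),
        "clkreq_en": bool(value & PCI_EXP_LNKCTL_CLKREQ_EN),
    }


def decode_lnksta(value):
    """Split a 16-bit Link Status value into named fields."""
    return {
        "speed": value & PCI_EXP_LNKSTA_CLS,
        "width": (value & PCI_EXP_LNKSTA_NLW) >> 4,
        "training": bool(value & PCI_EXP_LNKSTA_LT),
        "slot_clock": bool(value & PCI_EXP_LNKSTA_SLC),
    }


# The retrain request written to the root port.
print(decode_lnkctl(0x60))

# Root port link status before, during and after the retrain.
for value in (0x1011, 0x811, 0x11):
    print(hex(value), decode_lnksta(value))
```

Under that standard layout, 0x60 is Retrain Link plus Common Clock Configuration, 0x811 has the Link Training bit set, and bit 12 (set in 0x1011, clear in 0x11) is Slot Clock Configuration rather than a link-up indicator.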

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-17 19:34             ` Russell King - ARM Linux
@ 2017-01-17 21:02               ` Russell King - ARM Linux
  2017-01-17 22:22                 ` Bjorn Helgaas
  0 siblings, 1 reply; 18+ messages in thread
From: Russell King - ARM Linux @ 2017-01-17 21:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 17, 2017 at 07:34:14PM +0000, Russell King - ARM Linux wrote:
> Uwe, can you try:
> 
> setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
> 	0x50.w=0x60
> 
> and see whether the link remains alive (you can check by reading the root
> register 0x52.w - bit 12 should be set once bit 11 clears again).

For reference, this I got wrong...

0xf1041a04 bit 0 indicates link status (0 = link up, 1 = link down).

> If that's successful, maybe setting the common clock bit on the PCIe
> device is what's causing the problem, in which case:
> 
> setpci -s 02:00.0 0x80.w=0x40
> setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
> 	0x50.w=0x60

Having worked with Uwe over IRC, it seems that any request to retrain
causes the link to go down, either with or without the common clock bit
set:

# setpci -s 2.0 0x50.w=0x60
# setpci -s 2.0 0x52.w
0011
# memtool md 0xf1041a04+4
f1041a04: 00010201
... reboot ...
# setpci -s 2.0 0x50.w=0x20
# memtool md 0xf1041a04+4
f1041a04: 00010201

which doesn't point towards ASPM itself, but the problem is caused by
a side effect of ASPM's setup code which always triggers a retrain.

Bit 5 in that register is documented (at least in the Armada 370 docs
and Armada XP docs I have) as:

5  RetrnLnk  RW    Retrain Link
             0x0   This bit forces the device to initiate link retraining.
                   Always returns 0 when read.
                   NOTE: If configured as an Endpoint, this field is
                   reserved and has no effect.

Bjorn, are you aware of similar situations where a request for the PCIe
link to be retrained causes it to fail?

Here, on my Armada 388, I can request a link retrain with or without the
common clock bit set and everything's happy (this is with an ASM1062 SATA
mini-PCIe card):

root@clearfog21:~# setpci -s 2.0 0x50.w=0x60
root@clearfog21:~# setpci -s 2.0 0x52.w
0012
root@clearfog21:~# /shared/bin/devmem2 0xf1041a04
Value at address 0xf1041a04: 0x00010100
root@clearfog21:~# setpci -s 2.0 0x50.w=0x20
root@clearfog21:~# setpci -s 2.0 0x52.w
0012
root@clearfog21:~# /shared/bin/devmem2 0xf1041a04
Value at address 0xf1041a04: 0x00010100
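
To make the comparison between the two boards explicit, here is a tiny sketch that applies the bit 0 rule stated above (0 = link up, 1 = link down) to the values read from 0xf1041a04. Only bit 0 is interpreted; the meaning of the other bits is not covered here, and the helper name link_is_up is made up for illustration:

```python
def link_is_up(status):
    """Apply the rule given in this mail: bit 0 clear means the link is up."""
    return (status & 0x1) == 0


readings = {
    "Turris Omnia after retrain": 0x00010201,
    "Armada 388 after retrain": 0x00010100,
}

for board, value in readings.items():
    state = "up" if link_is_up(value) else "down"
    print(f"{board}: 0x{value:08x} -> link {state}")
```

With that rule, the Omnia readings show the link down after either retrain request, while the Armada 388 readings show it up in both cases.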

One curious observation I have noticed on Armada 388 is this behaviour:

root@clearfog21:~# setpci -s 2.0 0x50.l=0xffff0040 0x50.l 0x50.l=0x0fff0040 0x50.l
10120040
00120040

bit 28 is writable, which goes against the 370/XP docs:

28 SltClkCfg  RO  Slot Clock Configuration
              0x1 0 = Independent: The device uses an independent clock,
                      irrespective of the presence of a reference clock
                      on the connector.
                  1 = Reference: The device uses the reference clock that
                      the platform provides.

It seems that this bit is _not_ read-only.
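
The two 32-bit readbacks can be compared mechanically. A 32-bit read at 0x50 returns Link Control in the low half and Link Status in the high half, so bit 28 of the dword is bit 12 of Link Status. A short sketch, with made-up helper names:

```python
def split_lnkctl_lnksta(dword):
    """Split a 32-bit read at offset 0x50 into (Link Control, Link Status)."""
    return dword & 0xFFFF, (dword >> 16) & 0xFFFF


def changed_bits(a, b):
    """Return the bit positions that differ between two 32-bit values."""
    diff = a ^ b
    return [bit for bit in range(32) if diff & (1 << bit)]


first = 0x10120040   # read back after writing 0xffff0040
second = 0x00120040  # read back after writing 0x0fff0040

print(changed_bits(first, second))
for value in (first, second):
    lnkctl, lnksta = split_lnkctl_lnksta(value)
    print(f"LNKCTL=0x{lnkctl:04x} LNKSTA=0x{lnksta:04x}")
```

The only bit that follows the written value is bit 28, i.e. Link Status bit 12 (0x1012 versus 0x0012), which is the Slot Clock Configuration bit described in the table above.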

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-17 21:02               ` Russell King - ARM Linux
@ 2017-01-17 22:22                 ` Bjorn Helgaas
  2017-01-17 23:37                   ` David Daney
  0 siblings, 1 reply; 18+ messages in thread
From: Bjorn Helgaas @ 2017-01-17 22:22 UTC (permalink / raw)
  To: linux-arm-kernel

[+cc David]

On Tue, Jan 17, 2017 at 09:02:58PM +0000, Russell King - ARM Linux wrote:
> On Tue, Jan 17, 2017 at 07:34:14PM +0000, Russell King - ARM Linux wrote:
> > Uwe, can you try:
> > 
> > setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
> > 	0x50.w=0x60
> > 
> > and see whether the link remains alive (you can check by reading the root
> > register 0x52.w - bit 12 should be set once bit 11 clears again).
> 
> For reference, this I got wrong...
> 
> 0xf1041a04 bit 0 indicates link status (0 = link up, 1 = link down).
> 
> > If that's successful, maybe setting the common clock bit on the PCIe
> > device is what's causing the problem, in which case:
> > 
> > setpci -s 02:00.0 0x80.w=0x40
> > setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
> > 	0x50.w=0x60
> 
> Having worked with Uwe over IRC, it seems that any request to retrain
> causes the link to go down, either with or without the common clock bit
> set:
> 
> # setpci -s 2.0 0x50.w=0x60
> # setpci -s 2.0 0x52.w
> 0011
> # memtool md 0xf1041a04+4
> f1041a04: 00010201
> ... reboot ...
> # setpci -s 2.0 0x50.w=0x20
> # memtool md 0xf1041a04+4
> f1041a04: 00010201
> 
> which doesn't point towards ASPM itself, but the problem is caused by
> a side effect of ASPM's setup code which always triggers a retrain.
> 
> Bit 5 in that register is documented (at least in the Armada 370 docs
> and Armada XP docs I have) as:
> 
> 5  RetrnLnk  RW    Retrain Link
>              0x0   This bit forces the device to initiate link retraining.
>                    Always returns 0 when read.
>                    NOTE: If configured as an Endpoint, this field is
>                    reserved and has no effect.
> 
> Bjorn, are you aware of similar situations where a request for the PCIe
> link to be retrained causes it to fail?

The only one that comes to mind is this patch from David (CC'd) that
avoids ASPM-related retrains when we know the link doesn't support ASPM:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e53f9a28bee3

Side note: it looks like we don't use the recommended retrain
algorithm in the implementation note about avoiding race conditions in 
PCIe r3.0, sec 7.8.7.
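
For reference, a rough sketch of the sequence that implementation note is understood to recommend: set the new link parameters without touching Retrain Link, poll Link Training until it reads 0, then set Retrain Link without changing anything else. This is an illustration in Python, not kernel code. `FakePort` is a made-up stand-in for config-space access, and only the bit positions are taken from the Link Control and Link Status register layout.

```python
import time

PCI_EXP_LNKCTL_RL = 0x0020   # Link Control bit 5: Retrain Link
PCI_EXP_LNKCTL_CCC = 0x0040  # Link Control bit 6: Common Clock Configuration
PCI_EXP_LNKSTA_LT = 0x0800   # Link Status bit 11: Link Training


class FakePort:
    """Hypothetical stand-in for config-space access; not a real API."""

    def __init__(self):
        self.lnkctl = 0
        self.training_polls = 0  # LnkSta reads that still report training

    def read_lnkctl(self):
        return self.lnkctl  # Retrain Link always reads back as 0

    def write_lnkctl(self, value):
        if value & PCI_EXP_LNKCTL_RL:
            self.training_polls = 2  # pretend training lasts two polls
        self.lnkctl = value & ~PCI_EXP_LNKCTL_RL

    def read_lnksta(self):
        if self.training_polls > 0:
            self.training_polls -= 1
            return PCI_EXP_LNKSTA_LT
        return 0


def wait_training_done(port, timeout_s=1.0):
    """Poll Link Training until it clears; False on timeout."""
    deadline = time.monotonic() + timeout_s
    while port.read_lnksta() & PCI_EXP_LNKSTA_LT:
        if time.monotonic() > deadline:
            return False
        time.sleep(0.001)
    return True


def retrain(port, set_bits=0):
    # 1. Update link parameters without writing Retrain Link.
    port.write_lnkctl((port.read_lnkctl() | set_bits) & ~PCI_EXP_LNKCTL_RL)
    # 2. Wait for any training already in progress to finish.
    if not wait_training_done(port):
        return False
    # 3. Request the retrain without changing any other field.
    port.write_lnkctl(port.read_lnkctl() | PCI_EXP_LNKCTL_RL)
    # Finally wait for the requested training to complete.
    return wait_training_done(port)
```

With the fake port, `retrain(FakePort(), PCI_EXP_LNKCTL_CCC)` returns True. On the hardware discussed here the final wait is the interesting part, since the link is reported as going down instead.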

^ permalink raw reply	[flat|nested] 18+ messages in thread

* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-17 22:22                 ` Bjorn Helgaas
@ 2017-01-17 23:37                   ` David Daney
  2017-01-18 14:22                     ` Bjorn Helgaas
  0 siblings, 1 reply; 18+ messages in thread
From: David Daney @ 2017-01-17 23:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/17/2017 02:22 PM, Bjorn Helgaas wrote:
> [+cc David]
>
> On Tue, Jan 17, 2017 at 09:02:58PM +0000, Russell King - ARM Linux wrote:
>> On Tue, Jan 17, 2017 at 07:34:14PM +0000, Russell King - ARM Linux wrote:
>>> Uwe, can you try:
>>>
>>> setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
>>> 	0x50.w=0x60
>>>
>>> and see whether it remains alive (you can check by reading the root
>>> register 0x52.w - bit 12 should be set once bit 11 clears again).
>>
>> For reference, this I got wrong...
>>
>> 0xf1041a04 bit 0 indicates link status (0 = link up, 1 = link down).
>>
>>> If that's successful, maybe setting the common clock bit on the PCIe
>>> device is what's causing the problem, in which case:
>>>
>>> setpci -s 02:00.0 0x80.w=0x40
>>> setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
>>> 	0x50.w=0x60
>>
>> Having worked with Uwe over IRC, it seems that any request to retrain
>> causes the link to go down, either with or without the common clock bit
>> set:
>>
>> # setpci -s 2.0 0x50.w=0x60
>> # setpci -s 2.0 0x52.w
>> 0011
>> # memtool md 0xf1041a04+4
>> f1041a04: 00010201
>> ... reboot ...
>> # setpci -s 2.0 0x50.w=0x20
>> # memtool md 0xf1041a04+4
>> f1041a04: 00010201
>>
>> which doesn't point towards ASPM itself, but the problem is caused by
>> a side effect of ASPM's setup code which always triggers a retrain.
>>
>> Bit 5 in that register is documented (at least in the Armada 370 docs
>> and Armada XP docs I have) as:
>>
>> 5  RetrnLnk  RW    Retrain Link
>>              0x0   This bit forces the device to initiate link retraining.
>>                    Always returns 0 when read.
>>                    NOTE: If configured as an Endpoint, this field is
>>                    reserved and has no effect.
>>
>> Bjorn, are you aware of similar situations where a request for the PCIe
>> link to be retrained causes it to fail?


Link (re)training can fail for several reasons including, but not 
limited to:

- Poor signal propagation through the chips/packages/boards/connectors, 
also known as Signal Integrity (SI) problems.

- Incorrect implementation, in hardware, of link training protocols at 
either end of the link

Usually, system and PCIe device vendors do a lot of testing and signal 
analysis across a variety of configurations with the end goal being that 
PCIe looks like a bullet-proof interconnect to the end consumer.

Unfortunately, sometimes it doesn't work.  In these cases, the vendors of 
the devices on each end of the link tend to point fingers at the link 
partner for being defective in some way.

This patch:

>
> The only one that comes to mind is this patch from David (CC'd) that
> avoids ASPM-related retrains when we know the link doesn't support ASPM:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e53f9a28bee3
>

Is an attempt to work around the problem from the system (host) end.  If 
the system vendor knows a priori that a defective PCIe device is present 
in the system, the PCIe root port can be configured to indicate no ASPM 
is supported, resulting (with the patch) in no link retraining being 
attempted.
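
A minimal sketch of that idea, in Python rather than kernel C and not the actual patch: look at the ASPM Support field (Link Capabilities bits 11:10) on both ends of the link, and leave clocks and retraining alone when the two ends have no ASPM state in common. The function names and the sample register values are made up for illustration.

```python
PCI_EXP_LNKCAP_ASPMS = 0x0C00  # Link Capabilities bits 11:10: ASPM Support


def aspm_support(lnkcap):
    """Extract the 2-bit ASPM Support field (bit 0 = L0s, bit 1 = L1)."""
    return (lnkcap & PCI_EXP_LNKCAP_ASPMS) >> 10


def should_configure_link(root_lnkcap, endpoint_lnkcap):
    """True only if both ends advertise at least one common ASPM state."""
    return bool(aspm_support(root_lnkcap) & aspm_support(endpoint_lnkcap))


# Hypothetical Link Capabilities values, for illustration only.
print(should_configure_link(0x0000, 0x0C00))  # root advertises no ASPM
print(should_configure_link(0x0400, 0x0C00))  # both ends support L0s
```

The first call prints False, so no common-clock setup and no retrain would be attempted. The second prints True.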

To me it feels that we need a black list of devices that fail at a high 
rate in the link retraining, that when encountered would disable ASPM on 
the link where they reside.

Just my $0.02
David Daney


> Side note: it looks like we don't use the recommended retrain
> algorithm in the implementation note about avoiding race conditions in
> PCIe r3.0, sec 7.8.7.
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-17 23:37                   ` David Daney
@ 2017-01-18 14:22                     ` Bjorn Helgaas
  2017-01-18 17:36                       ` David Daney
  0 siblings, 1 reply; 18+ messages in thread
From: Bjorn Helgaas @ 2017-01-18 14:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 17, 2017 at 03:37:10PM -0800, David Daney wrote:
> On 01/17/2017 02:22 PM, Bjorn Helgaas wrote:
> >[+cc David]
> >
> >On Tue, Jan 17, 2017 at 09:02:58PM +0000, Russell King - ARM Linux wrote:
> >>On Tue, Jan 17, 2017 at 07:34:14PM +0000, Russell King - ARM Linux wrote:
> >>>Uwe, can you try:
> >>>
> >>>setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
> >>>	0x50.w=0x60
> >>>
> >>>and see whether it remains alive (you can check by reading the root
> >>>register 0x52.w - bit 12 should be set once bit 11 clears again).
> >>
> >>For reference, this I got wrong...
> >>
> >>0xf1041a04 bit 0 indicates link status (0 = link up, 1 = link down).
> >>
> >>>If that's successful, maybe setting the common clock bit on the PCIe
> >>>device is what's causing the problem, in which case:
> >>>
> >>>setpci -s 02:00.0 0x80.w=0x40
> >>>setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
> >>>	0x50.w=0x60
> >>
> >>Having worked with Uwe over IRC, it seems that any request to retrain
> >>causes the link to go down, either with or without the common clock bit
> >>set:
> >>
> >># setpci -s 2.0 0x50.w=0x60
> >># setpci -s 2.0 0x52.w
> >>0011
> >># memtool md 0xf1041a04+4
> >>f1041a04: 00010201
> >>... reboot ...
> >># setpci -s 2.0 0x50.w=0x20
> >># memtool md 0xf1041a04+4
> >>f1041a04: 00010201
> >>
> >>which doesn't point towards ASPM itself, but the problem is caused by
> >>a side effect of ASPM's setup code which always triggers a retrain.
> >>
> >>Bit 5 in that register is documented (at least in the Armada 370 docs
> >>and Armada XP docs I have) as:
> >>
> >>5  RetrnLnk  RW    Retrain Link
> >>             0x0   This bit forces the device to initiate link retraining.
> >>                   Always returns 0 when read.
> >>                   NOTE: If configured as an Endpoint, this field is
> >>                   reserved and has no effect.
> >>
> >>Bjorn, are you aware of similar situations where a request for the PCIe
> >>link to be retrained causes it to fail?
> 
> 
> Link (re)training can fail for several reasons including, but not
> limited to:
> 
> - Poor signal propagation through the
> chips/packages/boards/connectors, also known as Signal Integrity
> (SI) problems.
> 
> - Incorrect implementation, in hardware, of link training protocols
> at either end of the link
> 
> Usually, system and PCIe device vendors do a lot of testing and
> signal analysis across a variety of configurations with the end goal
> being that PCIe looks like a bullet-proof interconnect to the end
> consumer.
> 
> Unfortunately, sometimes it doesn't work.  In these cases, the
> vendors of the devices on each end of the link tend to point fingers
> at the link partner for being defective in some way.
> 
> This patch:
> 
> >
> >The only one that comes to mind is this patch from David (CC'd) that
> >avoids ASPM-related retrains when we know the link doesn't support ASPM:
> >http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e53f9a28bee3
> >
> 
> Is an attempt to work around the problem from the system (host) end.
> If the system vendor knows a priori that a defective PCIe device is
> present in the system, the PCIe root port can be configured to
> indicate no ASPM is supported, resulting (with the patch) in no link
> retraining being attempted.
> 
> To me it feels that we need a black list of devices that fail at a
> high rate in the link retraining, that when encountered would
> disable ASPM on the link where they reside.

I should have asked you for details about the defective devices
related to e53f9a28bee3 :)  If we had included that in the changelog,
we would have something to seed a blacklist with.

There are several situations other than ASPM where link retraining is
required per spec (rate change, error handling, etc), and I guess we'd
have to avoid all of them.   So I suppose e53f9a28bee3 avoids the most
obvious failures, but maybe we could still see issues in those other
cases.

Bjorn

^ permalink raw reply	[flat|nested] 18+ messages in thread

* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-18 14:22                     ` Bjorn Helgaas
@ 2017-01-18 17:36                       ` David Daney
  2017-01-18 17:55                         ` Russell King - ARM Linux
  0 siblings, 1 reply; 18+ messages in thread
From: David Daney @ 2017-01-18 17:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/18/2017 06:22 AM, Bjorn Helgaas wrote:
> On Tue, Jan 17, 2017 at 03:37:10PM -0800, David Daney wrote:
[...]
>>
>>
>> Link (re)training can fail for several reasons including, but not
>> limited to:
>>
>> - Poor signal propagation through the
>> chips/packages/boards/connectors, also known as Signal Integrity
>> (SI) problems.
>>
>> - Incorrect implementation, in hardware, of link training protocols
>> at either end of the link
>>
>> Usually, system and PCIe device vendors do a lot of testing and
>> signal analysis across a variety of configurations with the end goal
>> being that PCIe looks like a bullet-proof interconnect to the end
>> consumer.
>>
>> Unfortunately, sometimes it doesn't work.  In these cases, the
>> vendors of the devices on each end of the link tend to point fingers
>> at the link partner for being defective in some way.
>>
>> This patch:
>>
>>>
>>> The only one that comes to mind is this patch from David (CC'd) that
>>> avoids ASPM-related retrains when we know the link doesn't support ASPM:
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e53f9a28bee3
>>>
>>
>> Is an attempt to work around the problem from the system (host) end.
>> If the system vendor knows a priori that a defective PCIe device is
>> present in the system, the PCIe root port can be configured to
>> indicate no ASPM is supported, resulting (with the patch) in no link
>> retraining being attempted.
>>
>> To me it feels that we need a black list of devices that fail at a
>> high rate in the link retraining, that when encountered would
>> disable ASPM on the link where they reside.
>
> I should have asked you for details about the defective devices
> related to e53f9a28bee3 :)  If we had included that in the changelog,
> we would have something to seed a blacklist with.

The device I saw failing I don't have access to any more, so I don't 
know the PCI IDs.  It was a solid-state storage device with a Xilinx 
FPGA acting as the PCIe endpoint.  In any event, it would only fail in 
about 0.5% of system boots, it wasn't the case that it could be made to 
reliably fail.

The tricky thing here is assigning the blame for failure in link 
training.  In the case in question we spent many months analysing the 
analog properties of the bus and examining/decoding  analog scope 
captures of the failures before credibly assigning blame to the other 
guy.  Usually what happens is the device vendor accurately claims that 
their device works flawlessly in conjunction with certain Intel root 
ports, so the problem must be fixed in the root port of the failing 
system.  If you have a black list, you may be disabling ASPM in systems 
where it can work without failures.



>
> There are several situations other than ASPM where link retraining is
> required per spec (rate change, error handling, etc), and I guess we'd
> have to avoid all of them.   So I suppose e53f9a28bee3 avoids the most
> obvious failures, but maybe we could still see issues in those other
> cases.
>
> Bjorn

^ permalink raw reply	[flat|nested] 18+ messages in thread

* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-18 17:36                       ` David Daney
@ 2017-01-18 17:55                         ` Russell King - ARM Linux
  0 siblings, 0 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2017-01-18 17:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 18, 2017 at 09:36:55AM -0800, David Daney wrote:
> On 01/18/2017 06:22 AM, Bjorn Helgaas wrote:
> The tricky thing here is assigning the blame for failure in link training.
> In the case in question we spent many months analysing the analog properties
> of the bus and examining/decoding  analog scope captures of the failures
> before credibly assigning blame to the other guy.  Usually what happens is
> the device vendor accurately claims that their device works flawlessly in
> conjunction with certain Intel root ports, so the problem must be fixed in
> the root port of the failing system.  If you have a black list, you may be
> disabling ASPM in systems where it can work without failures.

So what we need is not a table of just devices, but a combination of
devices... iow, "when root A and endpoint B are combined, retrains
need to be avoided."
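
Something along these lines, as a sketch only. The table is keyed on the (root port, endpoint) pair rather than on a single device. The vendor/device IDs below are placeholders, not real entries, and the names are invented for illustration.

```python
# Each entry is ((root_vendor, root_device), (ep_vendor, ep_device)).
# The IDs are placeholders; no real hardware is being described.
NO_RETRAIN_PAIRS = {
    ((0xAAAA, 0x0001), (0xBBBB, 0x0002)),
}


def retrain_allowed(root_id, endpoint_id):
    """Return False only for a known-bad root/endpoint combination."""
    return (root_id, endpoint_id) not in NO_RETRAIN_PAIRS


# The same endpoint behind a different root port is left alone.
print(retrain_allowed((0xAAAA, 0x0001), (0xBBBB, 0x0002)))
print(retrain_allowed((0xCCCC, 0x0003), (0xBBBB, 0x0002)))
```

Only the first combination is blocked, so ASPM stays usable for the same endpoint behind a root port where retraining works.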


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-01-18 17:55 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
2017-01-11 19:49 CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine Uwe Kleine-König
2017-01-11 22:02 ` Bjorn Helgaas
2017-01-12 13:18   ` Uwe Kleine-König
2017-01-12 15:03     ` Bjorn Helgaas
2017-01-12 15:24       ` Andrew Lunn
2017-01-17 15:14 ` Bjorn Helgaas
2017-01-17 15:25   ` Russell King - ARM Linux
2017-01-17 17:46     ` Bjorn Helgaas
2017-01-17 17:51       ` Russell King - ARM Linux
2017-01-17 17:57         ` Russell King - ARM Linux
2017-01-17 18:14           ` Bjorn Helgaas
2017-01-17 19:34             ` Russell King - ARM Linux
2017-01-17 21:02               ` Russell King - ARM Linux
2017-01-17 22:22                 ` Bjorn Helgaas
2017-01-17 23:37                   ` David Daney
2017-01-18 14:22                     ` Bjorn Helgaas
2017-01-18 17:36                       ` David Daney
2017-01-18 17:55                         ` Russell King - ARM Linux
