All of lore.kernel.org
 help / color / mirror / Atom feed
* Fixing PCIe issues on Armada XP
@ 2014-04-10 16:19 ` Thomas Petazzoni
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-10 16:19 UTC (permalink / raw)
  To: Jason Gunthorpe, Neil Greatorex, Willy Tarreau, Matthew Minter,
	Gerlando Falauto
  Cc: linux-arm-kernel, Jason Cooper, Gregory Clément,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

Hello all,

This is an e-mail that attempts to summarize the situation in terms of
Armada XP PCIe issues.

At
https://github.com/MISL-EBU-System-SW/mainline-public/commits/3.14/pci-debug,
I've pushed a branch based on top of v3.14 that contains:

 * 2 backports for the igb driver, needed to get the igb driver to work
   in situations where MSI-X support is not available. These patches
   are already in mainline (post v3.14), and I've sent a mail to the
   maintainers and the netdev@ mailing list to ask for these patches to
   be pushed to 3.14 stable.

 * 3 backports of patches that use the 0xf1000000 internal register
   address for Armada XP development boards. These are needed for me
   because I have a recent Marvell bootloader. You may or may not want
   to apply these patches depending on which Marvell board you're
   using, and which version of the bootloader you have.

 * 3 patches on the irq-armada-370-xp driver, to fix the MSI support.
   One from Neil Greatorex, two from me.

 * 2 patches on the mvebu-mbus driver. One from Jason Gunthorpe which
   adds loud warnings when a non power-of-two window size is requested,
   and one from me to allow the creation of several windows having the
   same target and attribute values, which is needed if we want to
   create multiple windows to describe a single PCI BAR.

 * 2 patches on the pci-mvebu driver. One from Willy Tarreau to fix the
   off by one on the sizes. And another one from me which splits the
   PCI BAR into power-of-two sized chunks, in order to create valid
   MBus windows. I've tested this with my IGB card which needs a 9 MB
   BAR (so 8 MB + 1 MB needed), and I've also faked the code to code to
   simulate a 11.5 MB BAR (so 8 + 2 + 1 + 0.5 MB), and it worked. I
   also checked that if we have an error when creating one of the
   windows, then all the previous windows needed for the current BAR
   are properly removed.

Can you test this stack of patches on your system and configuration, and
let me know if that works for you? Of course, please do not include any
other PCI related fix: the goal is to be aware of *all* the issues, and
fix them in mainline.

Gerlando: I remember you also had a power-of-two related issues, that
you reported a while ago. This patch series should fix it. Would it be
possible for you to test this patch series?

Remaining issues:

 * The link up problem. Unfortunately, I tried to reproduce it today,
   and didn't manage to. It's weird, because I'm sure I was able to
   produce it in the past, but I'm no longer able to, I don't know.
   Therefore, it's not easy for me to work on this topic. Neil, Jason,
   do you think this is a topic you could potentially handle?

 * On my Armada XP DB board, if I plug 5 PCIe cards, the IGB card for
   some reason isn't able to read data from its non-volatile memory. So
   the window points to something, but it doesn't seem to patch what
   the igb driver expects. I've double checked the MBus windows, and
   they look correct. I haven't tested this PCIe configuration with the
   Marvell LSP though, so maybe I'm hitting an unrelated hardware
   problem or something.

Thanks a lot for your feedback and participation around these PCIe
issues!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-10 16:19 ` Thomas Petazzoni
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-10 16:19 UTC (permalink / raw)
  To: linux-arm-kernel

Hello all,

This is an e-mail that attempts to summarize the situation in terms of
Armada XP PCIe issues.

At
https://github.com/MISL-EBU-System-SW/mainline-public/commits/3.14/pci-debug,
I've pushed a branch based on top of v3.14 that contains:

 * 2 backports for the igb driver, needed to get the igb driver to work
   in situations where MSI-X support is not available. These patches
   are already in mainline (post v3.14), and I've sent a mail to the
   maintainers and the netdev@ mailing list to ask for these patches to
   be pushed to 3.14 stable.

 * 3 backports of patches that use the 0xf1000000 internal register
   address for Armada XP development boards. These are needed for me
   because I have a recent Marvell bootloader. You may or may not want
   to apply these patches depending on which Marvell board you're
   using, and which version of the bootloader you have.

 * 3 patches on the irq-armada-370-xp driver, to fix the MSI support.
   One from Neil Greatorex, two from me.

 * 2 patches on the mvebu-mbus driver. One from Jason Gunthorpe which
   adds loud warnings when a non power-of-two window size is requested,
   and one from me to allow the creation of several windows having the
   same target and attribute values, which is needed if we want to
   create multiple windows to describe a single PCI BAR.

 * 2 patches on the pci-mvebu driver. One from Willy Tarreau to fix the
   off by one on the sizes. And another one from me which splits the
   PCI BAR into power-of-two sized chunks, in order to create valid
   MBus windows. I've tested this with my IGB card which needs a 9 MB
   BAR (so 8 MB + 1 MB needed), and I've also faked the code to code to
   simulate a 11.5 MB BAR (so 8 + 2 + 1 + 0.5 MB), and it worked. I
   also checked that if we have an error when creating one of the
   windows, then all the previous windows needed for the current BAR
   are properly removed.

Can you test this stack of patches on your system and configuration, and
let me know if that works for you? Of course, please do not include any
other PCI related fix: the goal is to be aware of *all* the issues, and
fix them in mainline.

Gerlando: I remember you also had a power-of-two related issues, that
you reported a while ago. This patch series should fix it. Would it be
possible for you to test this patch series?

Remaining issues:

 * The link up problem. Unfortunately, I tried to reproduce it today,
   and didn't manage to. It's weird, because I'm sure I was able to
   produce it in the past, but I'm no longer able to, I don't know.
   Therefore, it's not easy for me to work on this topic. Neil, Jason,
   do you think this is a topic you could potentially handle?

 * On my Armada XP DB board, if I plug 5 PCIe cards, the IGB card for
   some reason isn't able to read data from its non-volatile memory. So
   the window points to something, but it doesn't seem to patch what
   the igb driver expects. I've double checked the MBus windows, and
   they look correct. I haven't tested this PCIe configuration with the
   Marvell LSP though, so maybe I'm hitting an unrelated hardware
   problem or something.

Thanks a lot for your feedback and participation around these PCIe
issues!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 16:19 ` Thomas Petazzoni
@ 2014-04-10 16:57   ` Jason Gunthorpe
  -1 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-10 16:57 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Neil Greatorex, Willy Tarreau, Matthew Minter, Gerlando Falauto,
	linux-arm-kernel, Jason Cooper, Gregory Clément,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

On Thu, Apr 10, 2014 at 06:19:53PM +0200, Thomas Petazzoni wrote:

> This is an e-mail that attempts to summarize the situation in terms of
> Armada XP PCIe issues.
> 
> At
> https://github.com/MISL-EBU-System-SW/mainline-public/commits/3.14/pci-debug,
> I've pushed a branch based on top of v3.14 that contains:

mvebu_pcie_del_windows / mvebu_pcie_add_windows I wonder if these
functions should be dropped into the mbus driver.. They are pretty
generic.

'bus: mvebu-mbus: Avoid setting an undefined window size' and
'pci: mvebu: fix off-by-one in the computed size of the mbus windows'
need to be swapped in order to maintain bisect-ability.

> Can you test this stack of patches on your system and configuration, and
> let me know if that works for you? 

Continues to work as expected here, and I see the new error message:

mvebu_mbus: cannot add window '4:e8', conflicts with another window
mvebu-pcie pex.1: Could not create MBus window at 0xe0000000, size 0x100000: -22

(this is due to the PEX window being in the DT mbus ranges already)

Tested-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

> 
>  * The link up problem. Unfortunately, I tried to reproduce it today,
>    and didn't manage to. It's weird, because I'm sure I was able to
>    produce it in the past, but I'm no longer able to, I don't know.
>    Therefore, it's not easy for me to work on this topic. Neil, Jason,
>    do you think this is a topic you could potentially handle?

You said you had a system that sometimes required the udelay? Can you
run Neil's patch and see if your system behaves the same? Specifically
that the link eventually goes down after mvebu_pcie_set_local_dev_nr ?

I couldn't find any case where the BDF leaks below the TLP layer, and
the spec is very clear that the assigned BDF can change at run time,
so I don't have an explanation for why the link reset is happening.
Do you have a contact at Marvell that might shed some light on that
behavior?

>  * On my Armada XP DB board, if I plug 5 PCIe cards, the IGB card for
>    some reason isn't able to read data from its non-volatile memory. So
>    the window points to something, but it doesn't seem to patch what
>    the igb driver expects. I've double checked the MBus windows, and
>    they look correct. I haven't tested this PCIe configuration with the
>    Marvell LSP though, so maybe I'm hitting an unrelated hardware
>    problem or something.

Certainly troubling.. And the IGB works if it is the only card in the
system?

That does sound like more mbus troubles.

Regrads,
Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-10 16:57   ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-10 16:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 10, 2014 at 06:19:53PM +0200, Thomas Petazzoni wrote:

> This is an e-mail that attempts to summarize the situation in terms of
> Armada XP PCIe issues.
> 
> At
> https://github.com/MISL-EBU-System-SW/mainline-public/commits/3.14/pci-debug,
> I've pushed a branch based on top of v3.14 that contains:

mvebu_pcie_del_windows / mvebu_pcie_add_windows I wonder if these
functions should be dropped into the mbus driver.. They are pretty
generic.

'bus: mvebu-mbus: Avoid setting an undefined window size' and
'pci: mvebu: fix off-by-one in the computed size of the mbus windows'
need to be swapped in order to maintain bisect-ability.

> Can you test this stack of patches on your system and configuration, and
> let me know if that works for you? 

Continues to work as expected here, and I see the new error message:

mvebu_mbus: cannot add window '4:e8', conflicts with another window
mvebu-pcie pex.1: Could not create MBus window at 0xe0000000, size 0x100000: -22

(this is due to the PEX window being in the DT mbus ranges already)

Tested-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

> 
>  * The link up problem. Unfortunately, I tried to reproduce it today,
>    and didn't manage to. It's weird, because I'm sure I was able to
>    produce it in the past, but I'm no longer able to, I don't know.
>    Therefore, it's not easy for me to work on this topic. Neil, Jason,
>    do you think this is a topic you could potentially handle?

You said you had a system that sometimes required the udelay? Can you
run Neil's patch and see if your system behaves the same? Specifically
that the link eventually goes down after mvebu_pcie_set_local_dev_nr ?

I couldn't find any case where the BDF leaks below the TLP layer, and
the spec is very clear that the assigned BDF can change at run time,
so I don't have an explanation for why the link reset is happening.
Do you have a contact at Marvell that might shed some light on that
behavior?

>  * On my Armada XP DB board, if I plug 5 PCIe cards, the IGB card for
>    some reason isn't able to read data from its non-volatile memory. So
>    the window points to something, but it doesn't seem to patch what
>    the igb driver expects. I've double checked the MBus windows, and
>    they look correct. I haven't tested this PCIe configuration with the
>    Marvell LSP though, so maybe I'm hitting an unrelated hardware
>    problem or something.

Certainly troubling.. And the IGB works if it is the only card in the
system?

That does sound like more mbus troubles.

Regrads,
Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 16:19 ` Thomas Petazzoni
@ 2014-04-10 17:10   ` Willy Tarreau
  -1 siblings, 0 replies; 50+ messages in thread
From: Willy Tarreau @ 2014-04-10 17:10 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Jason Gunthorpe, Neil Greatorex, Matthew Minter,
	Gerlando Falauto, linux-arm-kernel, Jason Cooper,
	Gregory Clément, Ezequiel Garcia, Andrew Lunn, linux-pci,
	Tawfik Bayouk, Lior Amsalem

Hi Thomas,

On Thu, Apr 10, 2014 at 06:19:53PM +0200, Thomas Petazzoni wrote:
> Hello all,
> 
> This is an e-mail that attempts to summarize the situation in terms of
> Armada XP PCIe issues.
> 
> At
> https://github.com/MISL-EBU-System-SW/mainline-public/commits/3.14/pci-debug,
> I've pushed a branch based on top of v3.14 that contains:

Thanks for putting all this online.

I have a minor comment below :

>  * 2 patches on the pci-mvebu driver. One from Willy Tarreau to fix the
>    off by one on the sizes. And another one from me which splits the
>    PCI BAR into power-of-two sized chunks, in order to create valid
>    MBus windows.

As suggested by Jason, this one should be merged before his that's just
before, to ensure that it will not cause a regression.

> I've tested this with my IGB card which needs a 9 MB
>    BAR (so 8 MB + 1 MB needed), and I've also faked the code to code to
>    simulate a 11.5 MB BAR (so 8 + 2 + 1 + 0.5 MB), and it worked. I
>    also checked that if we have an error when creating one of the
>    windows, then all the previous windows needed for the current BAR
>    are properly removed.

Really cool, I'm going to test that on a few PCIe cards and will report
the results here. How can we check the number of mbus windows in use ?

Thanks,
Willy


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-10 17:10   ` Willy Tarreau
  0 siblings, 0 replies; 50+ messages in thread
From: Willy Tarreau @ 2014-04-10 17:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

On Thu, Apr 10, 2014 at 06:19:53PM +0200, Thomas Petazzoni wrote:
> Hello all,
> 
> This is an e-mail that attempts to summarize the situation in terms of
> Armada XP PCIe issues.
> 
> At
> https://github.com/MISL-EBU-System-SW/mainline-public/commits/3.14/pci-debug,
> I've pushed a branch based on top of v3.14 that contains:

Thanks for putting all this online.

I have a minor comment below :

>  * 2 patches on the pci-mvebu driver. One from Willy Tarreau to fix the
>    off by one on the sizes. And another one from me which splits the
>    PCI BAR into power-of-two sized chunks, in order to create valid
>    MBus windows.

As suggested by Jason, this one should be merged before his that's just
before, to ensure that it will not cause a regression.

> I've tested this with my IGB card which needs a 9 MB
>    BAR (so 8 MB + 1 MB needed), and I've also faked the code to code to
>    simulate a 11.5 MB BAR (so 8 + 2 + 1 + 0.5 MB), and it worked. I
>    also checked that if we have an error when creating one of the
>    windows, then all the previous windows needed for the current BAR
>    are properly removed.

Really cool, I'm going to test that on a few PCIe cards and will report
the results here. How can we check the number of mbus windows in use ?

Thanks,
Willy

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 16:57   ` Jason Gunthorpe
@ 2014-04-10 18:01     ` Thomas Petazzoni
  -1 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-10 18:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Neil Greatorex, Willy Tarreau, Matthew Minter, Gerlando Falauto,
	linux-arm-kernel, Jason Cooper, Gregory Clément,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

Dear Jason Gunthorpe,

On Thu, 10 Apr 2014 10:57:33 -0600, Jason Gunthorpe wrote:

> > At
> > https://github.com/MISL-EBU-System-SW/mainline-public/commits/3.14/pci-debug,
> > I've pushed a branch based on top of v3.14 that contains:
> 
> mvebu_pcie_del_windows / mvebu_pcie_add_windows I wonder if these
> functions should be dropped into the mbus driver.. They are pretty
> generic.

Yes. I must say I hesitated quite a bit of time between having them in
pci-mvebu or in mvebu-mbus. I settled on pci-mvebu because PCI is so
far the only user of this functionality. But this is not a strong
opinion.

> 'bus: mvebu-mbus: Avoid setting an undefined window size' and
> 'pci: mvebu: fix off-by-one in the computed size of the mbus windows'
> need to be swapped in order to maintain bisect-ability.

They could, indeed, but on the other hand, PCI and bus are handled by
different maintainers, so they will flow through distinct branches, so
I'm not sure it's going to be easy to guarantee the bisectability here.

Unless we get the Acks from both maintainers, so that Jason Cooper can
carry the patches and maintain the right order.

I'll reorder them when posting the patch series anyway.

> > Can you test this stack of patches on your system and configuration, and
> > let me know if that works for you? 
> 
> Continues to work as expected here, and I see the new error message:
> 
> mvebu_mbus: cannot add window '4:e8', conflicts with another window
> mvebu-pcie pex.1: Could not create MBus window at 0xe0000000, size 0x100000: -22
> 
> (this is due to the PEX window being in the DT mbus ranges already)

I'm not sure to understand here: you have hardcoded in your DT the
necessary PEX window? If so, then yes, it's expected.

> Tested-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Thanks!

> >  * The link up problem. Unfortunately, I tried to reproduce it today,
> >    and didn't manage to. It's weird, because I'm sure I was able to
> >    produce it in the past, but I'm no longer able to, I don't know.
> >    Therefore, it's not easy for me to work on this topic. Neil, Jason,
> >    do you think this is a topic you could potentially handle?
> 
> You said you had a system that sometimes required the udelay?

Yes, and I tried to reproduce it today, and I didn't manage to
reproduce the problem, even on this platform, in the same conditions.
I'll try again tomorrow.

> Can you
> run Neil's patch and see if your system behaves the same? Specifically
> that the link eventually goes down after mvebu_pcie_set_local_dev_nr ?
> 
> I couldn't find any case where the BDF leaks below the TLP layer, and
> the spec is very clear that the assigned BDF can change at run time,
> so I don't have an explanation for why the link reset is happening.
> Do you have a contact at Marvell that might shed some light on that
> behavior?

There was a potential explanation about the mvebu-soc-id driver that
enables the clock and then disables it, before the pci-mvebu driver.
This is different that what was happening before, where the pci-mvebu
driver was the only one to enable the clock, which was already enabled
by the bootloader. So if the clock takes some time to stabilize, the
introduction of mvebu-soc-id may lead to problems.

But I'm not entirely convinced by this, because in my testing, I saw:

 * Enable the clock
 * Values in the PCI configuration space are correct (like
   vendor/product ID)
 * mvebu_pcie_set_local_dev_nr()
 * Values in the PCI configuration space are no longer correct, unless
   you wait a little bit.

> >  * On my Armada XP DB board, if I plug 5 PCIe cards, the IGB card for
> >    some reason isn't able to read data from its non-volatile memory. So
> >    the window points to something, but it doesn't seem to patch what
> >    the igb driver expects. I've double checked the MBus windows, and
> >    they look correct. I haven't tested this PCIe configuration with the
> >    Marvell LSP though, so maybe I'm hitting an unrelated hardware
> >    problem or something.
> 
> Certainly troubling.. And the IGB works if it is the only card in the
> system?

Yes, the IGB card works fine when plugged as the only card. It also
works when plugged with 3 other PCIe cards. But when I plug the five
PCIe cards I have, the IGB stops working.

However, note that some interfaces are x1, some other x4, and maybe
there was a mismatch/incompatibility with the PCIe cards I was plugging
in the board.

> That does sound like more mbus troubles.

Interestingly, the problem occurred when I was plugging a SATA PCIe
card. And regardless of whether the SATA PCIe card is present or not,
the MBus mappings for the IGB are exactly the same.

Here is a dump of the MBus windows with 4 PCIe cards (including the IGB
one) :

[00] 00000000ffe30000 - 00000000ffe40000 : 0004:00f0 (remap 0000000000030000)
[01] disabled
[02] disabled
[03] disabled
[04] disabled
[05] disabled
[06] disabled
[07] disabled
[08] 00000000fff00000 - 0000000100000000 : 0001:001d -> bootrom
[09] 00000000f0000000 - 00000000f1000000 : 0001:002f -> NOR flash
[10] 00000000f8000000 - 00000000f8100000 : 0004:00e8 -> some other PCIe card
[11] 00000000f8800000 - 00000000f9000000 : 0004:00f8 -> this is the IGB
[12] 00000000f9000000 - 00000000f9100000 : 0004:00f8 -> this is the IGB
[13] disabled
[14] disabled
[15] disabled
[16] disabled
[17] disabled
[18] disabled
[19] disabled

And now, with the SATA card plugged in:

[00] 00000000ffe40000 - 00000000ffe50000 : 0008:00f0 (remap 0000000000040000)
[01] 00000000ffe30000 - 00000000ffe40000 : 0004:00f0 (remap 0000000000030000)
[02] disabled
[03] disabled
[04] disabled
[05] disabled
[06] disabled
[07] disabled
[08] 00000000fff00000 - 0000000100000000 : 0001:001d	=> bootrom
[09] 00000000f0000000 - 00000000f1000000 : 0001:002f	=> NOR flash
[10] 00000000f8000000 - 00000000f8100000 : 0004:00e8	=> port 0.0 mem
[11] 00000000f8400000 - 00000000f8600000 : 0008:00f8	=> port 3.0 mem (SATA)
[12] 00000000f8800000 - 00000000f9000000 : 0004:00f8	=> port 2.0 mem (igb)
[13] 00000000f9000000 - 00000000f9100000 : 0004:00f8	=> port 2.0 mem (igb)
[14] disabled
[15] disabled
[16] disabled
[17] disabled
[18] disabled
[19] disabled

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-10 18:01     ` Thomas Petazzoni
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-10 18:01 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Jason Gunthorpe,

On Thu, 10 Apr 2014 10:57:33 -0600, Jason Gunthorpe wrote:

> > At
> > https://github.com/MISL-EBU-System-SW/mainline-public/commits/3.14/pci-debug,
> > I've pushed a branch based on top of v3.14 that contains:
> 
> mvebu_pcie_del_windows / mvebu_pcie_add_windows I wonder if these
> functions should be dropped into the mbus driver.. They are pretty
> generic.

Yes. I must say I hesitated quite a bit of time between having them in
pci-mvebu or in mvebu-mbus. I settled on pci-mvebu because PCI is so
far the only user of this functionality. But this is not a strong
opinion.

> 'bus: mvebu-mbus: Avoid setting an undefined window size' and
> 'pci: mvebu: fix off-by-one in the computed size of the mbus windows'
> need to be swapped in order to maintain bisect-ability.

They could, indeed, but on the other hand, PCI and bus are handled by
different maintainers, so they will flow through distinct branches, so
I'm not sure it's going to be easy to guarantee the bisectability here.

Unless we get the Acks from both maintainers, so that Jason Cooper can
carry the patches and maintain the right order.

I'll reorder them when posting the patch series anyway.

> > Can you test this stack of patches on your system and configuration, and
> > let me know if that works for you? 
> 
> Continues to work as expected here, and I see the new error message:
> 
> mvebu_mbus: cannot add window '4:e8', conflicts with another window
> mvebu-pcie pex.1: Could not create MBus window at 0xe0000000, size 0x100000: -22
> 
> (this is due to the PEX window being in the DT mbus ranges already)

I'm not sure to understand here: you have hardcoded in your DT the
necessary PEX window? If so, then yes, it's expected.

> Tested-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Thanks!

> >  * The link up problem. Unfortunately, I tried to reproduce it today,
> >    and didn't manage to. It's weird, because I'm sure I was able to
> >    produce it in the past, but I'm no longer able to, I don't know.
> >    Therefore, it's not easy for me to work on this topic. Neil, Jason,
> >    do you think this is a topic you could potentially handle?
> 
> You said you had a system that sometimes required the udelay?

Yes, and I tried to reproduce it today, and I didn't manage to
reproduce the problem, even on this platform, in the same conditions.
I'll try again tomorrow.

> Can you
> run Neil's patch and see if your system behaves the same? Specifically
> that the link eventually goes down after mvebu_pcie_set_local_dev_nr ?
> 
> I couldn't find any case where the BDF leaks below the TLP layer, and
> the spec is very clear that the assigned BDF can change at run time,
> so I don't have an explanation for why the link reset is happening.
> Do you have a contact at Marvell that might shed some light on that
> behavior?

There was a potential explanation about the mvebu-soc-id driver that
enables the clock and then disables it, before the pci-mvebu driver.
This is different that what was happening before, where the pci-mvebu
driver was the only one to enable the clock, which was already enabled
by the bootloader. So if the clock takes some time to stabilize, the
introduction of mvebu-soc-id may lead to problems.

But I'm not entirely convinced by this, because in my testing, I saw:

 * Enable the clock
 * Values in the PCI configuration space are correct (like
   vendor/product ID)
 * mvebu_pcie_set_local_dev_nr()
 * Values in the PCI configuration space are no longer correct, unless
   you wait a little bit.

> >  * On my Armada XP DB board, if I plug 5 PCIe cards, the IGB card for
> >    some reason isn't able to read data from its non-volatile memory. So
> >    the window points to something, but it doesn't seem to patch what
> >    the igb driver expects. I've double checked the MBus windows, and
> >    they look correct. I haven't tested this PCIe configuration with the
> >    Marvell LSP though, so maybe I'm hitting an unrelated hardware
> >    problem or something.
> 
> Certainly troubling.. And the IGB works if it is the only card in the
> system?

Yes, the IGB card works fine when plugged as the only card. It also
works when plugged with 3 other PCIe cards. But when I plug the five
PCIe cards I have, the IGB stops working.

However, note that some interfaces are x1, some other x4, and maybe
there was a mismatch/incompatibility with the PCIe cards I was plugging
in the board.

> That does sound like more mbus troubles.

Interestingly, the problem occurred when I was plugging a SATA PCIe
card. And regardless of whether the SATA PCIe card is present or not,
the MBus mappings for the IGB are exactly the same.

Here is a dump of the MBus windows with 4 PCIe cards (including the IGB
one) :

[00] 00000000ffe30000 - 00000000ffe40000 : 0004:00f0 (remap 0000000000030000)
[01] disabled
[02] disabled
[03] disabled
[04] disabled
[05] disabled
[06] disabled
[07] disabled
[08] 00000000fff00000 - 0000000100000000 : 0001:001d -> bootrom
[09] 00000000f0000000 - 00000000f1000000 : 0001:002f -> NOR flash
[10] 00000000f8000000 - 00000000f8100000 : 0004:00e8 -> some other PCIe card
[11] 00000000f8800000 - 00000000f9000000 : 0004:00f8 -> this is the IGB
[12] 00000000f9000000 - 00000000f9100000 : 0004:00f8 -> this is the IGB
[13] disabled
[14] disabled
[15] disabled
[16] disabled
[17] disabled
[18] disabled
[19] disabled

And now, with the SATA card plugged in:

[00] 00000000ffe40000 - 00000000ffe50000 : 0008:00f0 (remap 0000000000040000)
[01] 00000000ffe30000 - 00000000ffe40000 : 0004:00f0 (remap 0000000000030000)
[02] disabled
[03] disabled
[04] disabled
[05] disabled
[06] disabled
[07] disabled
[08] 00000000fff00000 - 0000000100000000 : 0001:001d	=> bootrom
[09] 00000000f0000000 - 00000000f1000000 : 0001:002f	=> NOR flash
[10] 00000000f8000000 - 00000000f8100000 : 0004:00e8	=> port 0.0 mem
[11] 00000000f8400000 - 00000000f8600000 : 0008:00f8	=> port 3.0 mem (SATA)
[12] 00000000f8800000 - 00000000f9000000 : 0004:00f8	=> port 2.0 mem (igb)
[13] 00000000f9000000 - 00000000f9100000 : 0004:00f8	=> port 2.0 mem (igb)
[14] disabled
[15] disabled
[16] disabled
[17] disabled
[18] disabled
[19] disabled

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 17:10   ` Willy Tarreau
@ 2014-04-10 18:02     ` Thomas Petazzoni
  -1 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-10 18:02 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Jason Gunthorpe, Neil Greatorex, Matthew Minter,
	Gerlando Falauto, linux-arm-kernel, Jason Cooper,
	Gregory Clément, Ezequiel Garcia, Andrew Lunn, linux-pci,
	Tawfik Bayouk, Lior Amsalem

Dear Willy Tarreau,

On Thu, 10 Apr 2014 19:10:00 +0200, Willy Tarreau wrote:

> Thanks for putting all this online.
> 
> I have a minor comment below :
> 
> >  * 2 patches on the pci-mvebu driver. One from Willy Tarreau to fix the
> >    off by one on the sizes. And another one from me which splits the
> >    PCI BAR into power-of-two sized chunks, in order to create valid
> >    MBus windows.
> 
> As suggested by Jason, this one should be merged before his that's just
> before, to ensure that it will not cause a regression.

Ok, will change this, thanks.

> > I've tested this with my IGB card which needs a 9 MB
> >    BAR (so 8 MB + 1 MB needed), and I've also faked the code to code to
> >    simulate a 11.5 MB BAR (so 8 + 2 + 1 + 0.5 MB), and it worked. I
> >    also checked that if we have an error when creating one of the
> >    windows, then all the previous windows needed for the current BAR
> >    are properly removed.
> 
> Really cool, I'm going to test that on a few PCIe cards and will report
> the results here. How can we check the number of mbus windows in use ?

# cat /sys/kernel/debug/mvebu-mbus/devices

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-10 18:02     ` Thomas Petazzoni
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-10 18:02 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Willy Tarreau,

On Thu, 10 Apr 2014 19:10:00 +0200, Willy Tarreau wrote:

> Thanks for putting all this online.
> 
> I have a minor comment below :
> 
> >  * 2 patches on the pci-mvebu driver. One from Willy Tarreau to fix the
> >    off by one on the sizes. And another one from me which splits the
> >    PCI BAR into power-of-two sized chunks, in order to create valid
> >    MBus windows.
> 
> As suggested by Jason, this one should be merged before his that's just
> before, to ensure that it will not cause a regression.

Ok, will change this, thanks.

> > I've tested this with my IGB card which needs a 9 MB
> >    BAR (so 8 MB + 1 MB needed), and I've also faked the code to code to
> >    simulate a 11.5 MB BAR (so 8 + 2 + 1 + 0.5 MB), and it worked. I
> >    also checked that if we have an error when creating one of the
> >    windows, then all the previous windows needed for the current BAR
> >    are properly removed.
> 
> Really cool, I'm going to test that on a few PCIe cards and will report
> the results here. How can we check the number of mbus windows in use ?

# cat /sys/kernel/debug/mvebu-mbus/devices

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 16:19 ` Thomas Petazzoni
@ 2014-04-10 18:20   ` Neil Greatorex
  -1 siblings, 0 replies; 50+ messages in thread
From: Neil Greatorex @ 2014-04-10 18:20 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Jason Gunthorpe, Willy Tarreau, Matthew Minter, Gerlando Falauto,
	linux-arm-kernel, Jason Cooper, Gregory Clément,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

Thomas,

On Thu, 10 Apr 2014, Thomas Petazzoni wrote:

> Can you test this stack of patches on your system and configuration, and
> let me know if that works for you? Of course, please do not include any
> other PCI related fix: the goal is to be aware of *all* the issues, and
> fix them in mainline.

I have tested this branch on my Mirabox. I still get the link up problem 
but if I work around that (see below) the igb driver works flawlessly.

Tested-by: Neil Greatorex <neil@fatboyfat.co.uk>

> Remaining issues:
>
> * The link up problem. Unfortunately, I tried to reproduce it today,
>   and didn't manage to. It's weird, because I'm sure I was able to
>   produce it in the past, but I'm no longer able to, I don't know.
>   Therefore, it's not easy for me to work on this topic. Neil, Jason,
>   do you think this is a topic you could potentially handle?

Do you have earlyprintk enabled? I've found that if I have earlyprintk in 
my bootargs / command line then I don't get the issue. I assume that is 
because of timing, but it could be something else?

Cheers,
Neil

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-10 18:20   ` Neil Greatorex
  0 siblings, 0 replies; 50+ messages in thread
From: Neil Greatorex @ 2014-04-10 18:20 UTC (permalink / raw)
  To: linux-arm-kernel

Thomas,

On Thu, 10 Apr 2014, Thomas Petazzoni wrote:

> Can you test this stack of patches on your system and configuration, and
> let me know if that works for you? Of course, please do not include any
> other PCI related fix: the goal is to be aware of *all* the issues, and
> fix them in mainline.

I have tested this branch on my Mirabox. I still get the link up problem 
but if I work around that (see below) the igb driver works flawlessly.

Tested-by: Neil Greatorex <neil@fatboyfat.co.uk>

> Remaining issues:
>
> * The link up problem. Unfortunately, I tried to reproduce it today,
>   and didn't manage to. It's weird, because I'm sure I was able to
>   produce it in the past, but I'm no longer able to, I don't know.
>   Therefore, it's not easy for me to work on this topic. Neil, Jason,
>   do you think this is a topic you could potentially handle?

Do you have earlyprintk enabled? I've found that if I have earlyprintk in 
my bootargs / command line then I don't get the issue. I assume that is 
because of timing, but it could be something else?

Cheers,
Neil

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 18:01     ` Thomas Petazzoni
@ 2014-04-10 20:12       ` Jason Gunthorpe
  -1 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-10 20:12 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Neil Greatorex, Willy Tarreau, Matthew Minter, Gerlando Falauto,
	linux-arm-kernel, Jason Cooper, Gregory Clément,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

[-- Attachment #1: Type: text/plain, Size: 3180 bytes --]

On Thu, Apr 10, 2014 at 08:01:53PM +0200, Thomas Petazzoni wrote:
> > Can you run Neil's patch and see if your system behaves the same?
> > Specifically that the link eventually goes down after
> > mvebu_pcie_set_local_dev_nr ?
> > 
> > I couldn't find any case where the BDF leaks below the TLP layer, and
> > the spec is very clear that the assigned BDF can change at run time,
> > so I don't have an explanation for why the link reset is happening.
> > Do you have a contact at Marvell that might shed some light on that
> > behavior?
> 
> There was a potential explanation about the mvebu-soc-id driver that
> enables the clock and then disables it, before the pci-mvebu driver.
> This is different that what was happening before, where the pci-mvebu
> driver was the only one to enable the clock, which was already enabled
> by the bootloader. So if the clock takes some time to stabilize, the
> introduction of mvebu-soc-id may lead to problems.

Oh, that certainly sounds like a potential problem. Disabling the
clock will certainly cause 'interesting' effects on the PEX, depending
on exactly what it does it could confuse the link partner (trigger a
timeout based retrain?).

Gating the clock without disabling the Phy first does sound like a
bad idea..

Neil, does this do anything for you?

diff --git a/arch/arm/mach-mvebu/mvebu-soc-id.c b/arch/arm/mach-mvebu/mvebu-soc-id.c
index f3b325f..e0a032f 100644
--- a/arch/arm/mach-mvebu/mvebu-soc-id.c
+++ b/arch/arm/mach-mvebu/mvebu-soc-id.c
@@ -107,7 +107,7 @@ static int __init mvebu_soc_id_init(void)
        iounmap(pci_base);
 
 res_ioremap:
-       clk_disable_unprepare(clk);
+//     clk_disable_unprepare(clk);
 
 clk_err:
        of_node_put(child);

> But I'm not entirely convinced by this, because in my testing, I saw:
> 
>  * Enable the clock
>  * Values in the PCI configuration space are correct (like
>    vendor/product ID)
>  * mvebu_pcie_set_local_dev_nr()
>  * Values in the PCI configuration space are no longer correct, unless
>    you wait a little bit.

Were you reading the configuation space through the MMIO mapping or
through the configuration indirection?

In any event, turning on the clock should almost certainly be
accompanied by a phy reset sequence to get both link ends on the same
page.

Attached is a rough, untested patch along those lines.

> > That does sound like more mbus troubles.
> 
> Interestingly, the problem occurred when I was plugging a SATA PCIe
> card. And regardless of whether the SATA PCIe card is present or not,
> the MBus mappings for the IGB are exactly the same.

Maybe something wrong with mbus window index 13?

Any change if you use other windows?

--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -299,7 +299,7 @@ static int mvebu_mbus_alloc_window(struct mvebu_mbus_state *mbus,
        int win;
 
        if (remap == MVEBU_MBUS_NO_REMAP) {
-               for (win = mbus->soc->num_remappable_wins;
+               for (win = 0;
                     win < mbus->soc->num_wins; win++)
                        if (mvebu_mbus_window_is_free(mbus, win))
                                return mvebu_mbus_setup_window(mbus, win, base,

Jason

[-- Attachment #2: pex-reset.diff --]
[-- Type: text/x-diff, Size: 1897 bytes --]

diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index 0d638b7..7b7d19a 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -21,6 +21,7 @@
 #include <linux/of_gpio.h>
 #include <linux/of_pci.h>
 #include <linux/of_platform.h>
+#include <linux/clk-provider.h>
 
 /*
  * PCIe unit register offsets.
@@ -973,6 +974,7 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
 	for_each_child_of_node(pdev->dev.of_node, child) {
 		struct mvebu_pcie_port *port = &pcie->ports[i];
 		enum of_gpio_flags flags;
+		bool enabled;
 
 		if (!of_device_is_available(child))
 			continue;
@@ -1044,6 +1046,9 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
 			continue;
 		}
 
+		// Does this work on MVEBU?
+		enabled = __clk_is_enabled(port->clk);
+
 		ret = clk_prepare_enable(port->clk);
 		if (ret)
 			continue;
@@ -1057,7 +1062,35 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
 			continue;
 		}
 
-		mvebu_pcie_set_local_dev_nr(port, 1);
+		if (!enabled) {
+			u32 reg;
+			unsigned int tries;
+
+			/* The clock is being turned on for the first time, do
+			 * a PHY reset
+			 */
+			dev_info(&pdev->dev,
+				 "PCIe%d.%d: performing link reset\n",
+				 port->port, port->lane);
+			reg = mvebu_readl(port, PCIE_CTRL_OFF);
+			mvebu_writel(port,
+				     reg & ~BIT(30), // Conf_TrainingDisable
+				     PCIE_CTRL_OFF);
+			do {
+				udelay(100); // Guess?
+			} while (mvebu_pcie_link_up(port));
+			mvebu_pcie_set_local_dev_nr(port, 1);
+			mvebu_writel(port, reg | ~BIT(30), PCIE_CTRL_OFF);
+
+			for (tries = 0;
+			     !mvebu_pcie_link_up(port) && tries != 100; tries++)
+				udelay(100);
+		} else {
+			/* We expect the bootloader has setup the port and
+			 * waited for the link to go up
+			 */
+			mvebu_pcie_set_local_dev_nr(port, 1);
+		}
 
 		port->dn = child;
 		spin_lock_init(&port->conf_lock);

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-10 20:12       ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-10 20:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 10, 2014 at 08:01:53PM +0200, Thomas Petazzoni wrote:
> > Can you run Neil's patch and see if your system behaves the same?
> > Specifically that the link eventually goes down after
> > mvebu_pcie_set_local_dev_nr ?
> > 
> > I couldn't find any case where the BDF leaks below the TLP layer, and
> > the spec is very clear that the assigned BDF can change at run time,
> > so I don't have an explanation for why the link reset is happening.
> > Do you have a contact at Marvell that might shed some light on that
> > behavior?
> 
> There was a potential explanation about the mvebu-soc-id driver that
> enables the clock and then disables it, before the pci-mvebu driver.
> This is different that what was happening before, where the pci-mvebu
> driver was the only one to enable the clock, which was already enabled
> by the bootloader. So if the clock takes some time to stabilize, the
> introduction of mvebu-soc-id may lead to problems.

Oh, that certainly sounds like a potential problem. Disabling the
clock will certainly cause 'interesting' effects on the PEX, depending
on exactly what it does it could confuse the link partner (trigger a
timeout based retrain?).

Gating the clock without disabling the Phy first does sound like a
bad idea..

Neil, does this do anything for you?

diff --git a/arch/arm/mach-mvebu/mvebu-soc-id.c b/arch/arm/mach-mvebu/mvebu-soc-id.c
index f3b325f..e0a032f 100644
--- a/arch/arm/mach-mvebu/mvebu-soc-id.c
+++ b/arch/arm/mach-mvebu/mvebu-soc-id.c
@@ -107,7 +107,7 @@ static int __init mvebu_soc_id_init(void)
        iounmap(pci_base);
 
 res_ioremap:
-       clk_disable_unprepare(clk);
+//     clk_disable_unprepare(clk);
 
 clk_err:
        of_node_put(child);

> But I'm not entirely convinced by this, because in my testing, I saw:
> 
>  * Enable the clock
>  * Values in the PCI configuration space are correct (like
>    vendor/product ID)
>  * mvebu_pcie_set_local_dev_nr()
>  * Values in the PCI configuration space are no longer correct, unless
>    you wait a little bit.

Were you reading the configuation space through the MMIO mapping or
through the configuration indirection?

In any event, turning on the clock should almost certainly be
accompanied by a phy reset sequence to get both link ends on the same
page.

Attached is a rough, untested patch along those lines.

> > That does sound like more mbus troubles.
> 
> Interestingly, the problem occurred when I was plugging a SATA PCIe
> card. And regardless of whether the SATA PCIe card is present or not,
> the MBus mappings for the IGB are exactly the same.

Maybe something wrong with mbus window index 13?

Any change if you use other windows?

--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -299,7 +299,7 @@ static int mvebu_mbus_alloc_window(struct mvebu_mbus_state *mbus,
        int win;
 
        if (remap == MVEBU_MBUS_NO_REMAP) {
-               for (win = mbus->soc->num_remappable_wins;
+               for (win = 0;
                     win < mbus->soc->num_wins; win++)
                        if (mvebu_mbus_window_is_free(mbus, win))
                                return mvebu_mbus_setup_window(mbus, win, base,

Jason
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pex-reset.diff
Type: text/x-diff
Size: 1897 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140410/e9227692/attachment.bin>

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 20:12       ` Jason Gunthorpe
@ 2014-04-10 21:04         ` Thomas Petazzoni
  -1 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-10 21:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Neil Greatorex, Willy Tarreau, Matthew Minter, Gerlando Falauto,
	linux-arm-kernel, Jason Cooper, Gregory Clément,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

Dear Jason Gunthorpe,

On Thu, 10 Apr 2014 14:12:01 -0600, Jason Gunthorpe wrote:

> > But I'm not entirely convinced by this, because in my testing, I saw:
> > 
> >  * Enable the clock
> >  * Values in the PCI configuration space are correct (like
> >    vendor/product ID)
> >  * mvebu_pcie_set_local_dev_nr()
> >  * Values in the PCI configuration space are no longer correct, unless
> >    you wait a little bit.
> 
> Were you reading the configuation space through the MMIO mapping or
> through the configuration indirection?

I was simply calling the mvebu_pcie_hw_rd_conf() function, so I guess
it goes through what you call the "configuration indirection".

> In any event, turning on the clock should almost certainly be
> accompanied by a phy reset sequence to get both link ends on the same
> page.
> 
> Attached is a rough, untested patch along those lines.

I'll try tomorrow, if I manage to reproduce the initial bug to start
with.

> > > That does sound like more mbus troubles.
> > 
> > Interestingly, the problem occurred when I was plugging a SATA PCIe
> > card. And regardless of whether the SATA PCIe card is present or not,
> > the MBus mappings for the IGB are exactly the same.
> 
> Maybe something wrong with mbus window index 13?
> 
> Any change if you use other windows?

Don't know, will try tomorrow and report back :-)

Thanks for the suggestions!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-10 21:04         ` Thomas Petazzoni
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-10 21:04 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Jason Gunthorpe,

On Thu, 10 Apr 2014 14:12:01 -0600, Jason Gunthorpe wrote:

> > But I'm not entirely convinced by this, because in my testing, I saw:
> > 
> >  * Enable the clock
> >  * Values in the PCI configuration space are correct (like
> >    vendor/product ID)
> >  * mvebu_pcie_set_local_dev_nr()
> >  * Values in the PCI configuration space are no longer correct, unless
> >    you wait a little bit.
> 
> Were you reading the configuation space through the MMIO mapping or
> through the configuration indirection?

I was simply calling the mvebu_pcie_hw_rd_conf() function, so I guess
it goes through what you call the "configuration indirection".

> In any event, turning on the clock should almost certainly be
> accompanied by a phy reset sequence to get both link ends on the same
> page.
> 
> Attached is a rough, untested patch along those lines.

I'll try tomorrow, if I manage to reproduce the initial bug to start
with.

> > > That does sound like more mbus troubles.
> > 
> > Interestingly, the problem occurred when I was plugging a SATA PCIe
> > card. And regardless of whether the SATA PCIe card is present or not,
> > the MBus mappings for the IGB are exactly the same.
> 
> Maybe something wrong with mbus window index 13?
> 
> Any change if you use other windows?

Don't know, will try tomorrow and report back :-)

Thanks for the suggestions!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 18:20   ` Neil Greatorex
@ 2014-04-10 21:07     ` Thomas Petazzoni
  -1 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-10 21:07 UTC (permalink / raw)
  To: Neil Greatorex
  Cc: Jason Gunthorpe, Willy Tarreau, Matthew Minter, Gerlando Falauto,
	linux-arm-kernel, Jason Cooper, Gregory Clément,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

Dear Neil Greatorex,

On Thu, 10 Apr 2014 19:20:37 +0100 (BST), Neil Greatorex wrote:

> I have tested this branch on my Mirabox. I still get the link up problem 
> but if I work around that (see below) the igb driver works flawlessly.
> 
> Tested-by: Neil Greatorex <neil@fatboyfat.co.uk>

Good, thanks, we're making progress!

> > Remaining issues:
> >
> > * The link up problem. Unfortunately, I tried to reproduce it today,
> >   and didn't manage to. It's weird, because I'm sure I was able to
> >   produce it in the past, but I'm no longer able to, I don't know.
> >   Therefore, it's not easy for me to work on this topic. Neil, Jason,
> >   do you think this is a topic you could potentially handle?
> 
> Do you have earlyprintk enabled? I've found that if I have earlyprintk in 
> my bootargs / command line then I don't get the issue. I assume that is 
> because of timing, but it could be something else?

I have indeed disabled earlyprintk. Initially by removing it from the
command line, and then even by disabling CONFIG_DEBUG_LL.

In fact the original bug report I had came from my colleague Gregory
Clement, who precisely reported to me that a PCIe card was properly
detected on his Armada 385 board when earlyprintk was enabled, but the
PCIe card was not detected when earlyprintk was disabled. At the time,
I was able to reproduce the problem as well on my Armada 385, and
debugged it to find that the mvebu_pcie_set_local_dev_nr().

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-10 21:07     ` Thomas Petazzoni
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-10 21:07 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Neil Greatorex,

On Thu, 10 Apr 2014 19:20:37 +0100 (BST), Neil Greatorex wrote:

> I have tested this branch on my Mirabox. I still get the link up problem 
> but if I work around that (see below) the igb driver works flawlessly.
> 
> Tested-by: Neil Greatorex <neil@fatboyfat.co.uk>

Good, thanks, we're making progress!

> > Remaining issues:
> >
> > * The link up problem. Unfortunately, I tried to reproduce it today,
> >   and didn't manage to. It's weird, because I'm sure I was able to
> >   produce it in the past, but I'm no longer able to, I don't know.
> >   Therefore, it's not easy for me to work on this topic. Neil, Jason,
> >   do you think this is a topic you could potentially handle?
> 
> Do you have earlyprintk enabled? I've found that if I have earlyprintk in 
> my bootargs / command line then I don't get the issue. I assume that is 
> because of timing, but it could be something else?

I have indeed disabled earlyprintk. Initially by removing it from the
command line, and then even by disabling CONFIG_DEBUG_LL.

In fact the original bug report I had came from my colleague Gregory
Clement, who precisely reported to me that a PCIe card was properly
detected on his Armada 385 board when earlyprintk was enabled, but the
PCIe card was not detected when earlyprintk was disabled. At the time,
I was able to reproduce the problem as well on my Armada 385, and
debugged it to find that the mvebu_pcie_set_local_dev_nr().

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 20:12       ` Jason Gunthorpe
@ 2014-04-10 21:56         ` Neil Greatorex
  -1 siblings, 0 replies; 50+ messages in thread
From: Neil Greatorex @ 2014-04-10 21:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Thomas Petazzoni, Willy Tarreau, Matthew Minter,
	Gerlando Falauto, linux-arm-kernel, Jason Cooper,
	Gregory Clément, Ezequiel Garcia, Andrew Lunn, linux-pci,
	Tawfik Bayouk, Lior Amsalem

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2101 bytes --]

Jason,

On Thu, 10 Apr 2014, Jason Gunthorpe wrote:

> Gating the clock without disabling the Phy first does sound like a
> bad idea..
>
> Neil, does this do anything for you?
>
> diff --git a/arch/arm/mach-mvebu/mvebu-soc-id.c b/arch/arm/mach-mvebu/mvebu-soc-id.c
> index f3b325f..e0a032f 100644
> --- a/arch/arm/mach-mvebu/mvebu-soc-id.c
> +++ b/arch/arm/mach-mvebu/mvebu-soc-id.c
> @@ -107,7 +107,7 @@ static int __init mvebu_soc_id_init(void)
>        iounmap(pci_base);
>
> res_ioremap:
> -       clk_disable_unprepare(clk);
> +//     clk_disable_unprepare(clk);
>
> clk_err:
>        of_node_put(child);
>

That patch has fixed it for me. The PCIe card seems to be now be always 
properly detected.

> In any event, turning on the clock should almost certainly be
> accompanied by a phy reset sequence to get both link ends on the same
> page.
>
> Attached is a rough, untested patch along those lines.
>

I took your attached patch and extended it a bit to print out how long it 
took. The delays also need to be much longer for me. I also fixed a small 
typo you made where the bit wasn't being set again to bring the link back 
up. I've attached the diff to your patch, as well as the combined patch 
(hope that makes sense).

With the attached patch I get the following output:

mirabox ~ # dmesg | grep PCIe0.0
[    0.135947] mvebu-pcie pcie-controller.3: PCIe0.0: performing link 
reset
[    0.161989] mvebu-pcie pcie-controller.3: PCIe0.0: link went down after 
26 tries
[    0.173984] mvebu-pcie pcie-controller.3: PCIe0.0: link came back up 
after 12 tries
mirabox ~ # lspci
00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01)
00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01)
01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network 
Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network 
Connection (rev 01)
03:00.0 USB controller: Fresco Logic FL1009 USB 3.0 Host Controller (rev 
02)

So that seems to also work. I will leave it to you and Thomas to decide 
which approach is better!

Cheers,
Neil

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Combined patch --]
[-- Type: TEXT/x-diff; name=pex-combined.diff, Size: 2236 bytes --]

diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index 487c926..d09a7e5 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -21,6 +21,7 @@
 #include <linux/of_gpio.h>
 #include <linux/of_pci.h>
 #include <linux/of_platform.h>
+#include <linux/clk-provider.h>
 
 /*
  * PCIe unit register offsets.
@@ -951,6 +952,7 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
 	for_each_child_of_node(pdev->dev.of_node, child) {
 		struct mvebu_pcie_port *port = &pcie->ports[i];
 		enum of_gpio_flags flags;
+		bool enabled;
 
 		if (!of_device_is_available(child))
 			continue;
@@ -1022,6 +1024,9 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
 			continue;
 		}
 
+		// Does this work on MVEBU?
+		enabled = __clk_is_enabled(port->clk);
+
 		ret = clk_prepare_enable(port->clk);
 		if (ret)
 			continue;
@@ -1035,7 +1040,45 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
 			continue;
 		}
 
-		mvebu_pcie_set_local_dev_nr(port, 1);
+		if (!enabled) {
+			u32 reg;
+			unsigned int tries;
+
+			/* The clock is being turned on for the first time, do
+			 * a PHY reset
+			 */
+			dev_info(&pdev->dev,
+				 "PCIe%d.%d: performing link reset\n",
+				 port->port, port->lane);
+			reg = mvebu_readl(port, PCIE_CTRL_OFF);
+			mvebu_writel(port,
+				     reg & ~BIT(30), // Conf_TrainingDisable
+				     PCIE_CTRL_OFF);
+			
+			for (tries = 0;
+			     mvebu_pcie_link_up(port) && tries < 100; tries++)	
+				mdelay(1);
+			
+			dev_info(&pdev->dev,
+				"PCIe%d.%d: link went down after %d tries\n",
+				port->port, port->lane, tries);
+
+			mvebu_pcie_set_local_dev_nr(port, 1);
+			mvebu_writel(port, reg | BIT(30), PCIE_CTRL_OFF);
+
+			for (tries = 0;
+			     !mvebu_pcie_link_up(port) && tries != 100; tries++)
+				mdelay(1);
+			
+			dev_info(&pdev->dev,
+				"PCIe%d.%d: link came back up after %d tries\n",
+				port->port, port->lane, tries);
+		} else {
+			/* We expect the bootloader has setup the port and
+			 * waited for the link to go up
+			 */
+			mvebu_pcie_set_local_dev_nr(port, 1);
+		}
 
 		port->dn = child;
 		spin_lock_init(&port->conf_lock);

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: Diff to Jason's patch --]
[-- Type: TEXT/x-diff; name=pex-diff-to-jasons-patch.diff, Size: 1179 bytes --]

diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index c902ca0..d09a7e5 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -1054,15 +1054,25 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
 			mvebu_writel(port,
 				     reg & ~BIT(30), // Conf_TrainingDisable
 				     PCIE_CTRL_OFF);
-			do {
-				udelay(100); // Guess?
-			} while (mvebu_pcie_link_up(port));
+			
+			for (tries = 0;
+			     mvebu_pcie_link_up(port) && tries < 100; tries++)	
+				mdelay(1);
+			
+			dev_info(&pdev->dev,
+				"PCIe%d.%d: link went down after %d tries\n",
+				port->port, port->lane, tries);
+
 			mvebu_pcie_set_local_dev_nr(port, 1);
-			mvebu_writel(port, reg | ~BIT(30), PCIE_CTRL_OFF);
+			mvebu_writel(port, reg | BIT(30), PCIE_CTRL_OFF);
 
 			for (tries = 0;
 			     !mvebu_pcie_link_up(port) && tries != 100; tries++)
-				udelay(100);
+				mdelay(1);
+			
+			dev_info(&pdev->dev,
+				"PCIe%d.%d: link came back up after %d tries\n",
+				port->port, port->lane, tries);
 		} else {
 			/* We expect the bootloader has setup the port and
 			 * waited for the link to go up

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-10 21:56         ` Neil Greatorex
  0 siblings, 0 replies; 50+ messages in thread
From: Neil Greatorex @ 2014-04-10 21:56 UTC (permalink / raw)
  To: linux-arm-kernel

Jason,

On Thu, 10 Apr 2014, Jason Gunthorpe wrote:

> Gating the clock without disabling the Phy first does sound like a
> bad idea..
>
> Neil, does this do anything for you?
>
> diff --git a/arch/arm/mach-mvebu/mvebu-soc-id.c b/arch/arm/mach-mvebu/mvebu-soc-id.c
> index f3b325f..e0a032f 100644
> --- a/arch/arm/mach-mvebu/mvebu-soc-id.c
> +++ b/arch/arm/mach-mvebu/mvebu-soc-id.c
> @@ -107,7 +107,7 @@ static int __init mvebu_soc_id_init(void)
>        iounmap(pci_base);
>
> res_ioremap:
> -       clk_disable_unprepare(clk);
> +//     clk_disable_unprepare(clk);
>
> clk_err:
>        of_node_put(child);
>

That patch has fixed it for me. The PCIe card seems to be now be always 
properly detected.

> In any event, turning on the clock should almost certainly be
> accompanied by a phy reset sequence to get both link ends on the same
> page.
>
> Attached is a rough, untested patch along those lines.
>

I took your attached patch and extended it a bit to print out how long it 
took. The delays also need to be much longer for me. I also fixed a small 
typo you made where the bit wasn't being set again to bring the link back 
up. I've attached the diff to your patch, as well as the combined patch 
(hope that makes sense).

With the attached patch I get the following output:

mirabox ~ # dmesg | grep PCIe0.0
[    0.135947] mvebu-pcie pcie-controller.3: PCIe0.0: performing link 
reset
[    0.161989] mvebu-pcie pcie-controller.3: PCIe0.0: link went down after 
26 tries
[    0.173984] mvebu-pcie pcie-controller.3: PCIe0.0: link came back up 
after 12 tries
mirabox ~ # lspci
00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01)
00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01)
01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network 
Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network 
Connection (rev 01)
03:00.0 USB controller: Fresco Logic FL1009 USB 3.0 Host Controller (rev 
02)

So that seems to also work. I will leave it to you and Thomas to decide 
which approach is better!

Cheers,
Neil
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pex-combined.diff
Type: text/x-diff
Size: 2236 bytes
Desc: Combined patch
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140410/c4f61db5/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pex-diff-to-jasons-patch.diff
Type: text/x-diff
Size: 1179 bytes
Desc: Diff to Jason's patch
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140410/c4f61db5/attachment-0001.bin>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 21:56         ` Neil Greatorex
@ 2014-04-10 22:06           ` Jason Gunthorpe
  -1 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-10 22:06 UTC (permalink / raw)
  To: Neil Greatorex, Gregory Clément
  Cc: Thomas Petazzoni, Willy Tarreau, Matthew Minter,
	Gerlando Falauto, linux-arm-kernel, Jason Cooper,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

On Thu, Apr 10, 2014 at 10:56:00PM +0100, Neil Greatorex wrote:

> >diff --git a/arch/arm/mach-mvebu/mvebu-soc-id.c b/arch/arm/mach-mvebu/mvebu-soc-id.c
> >index f3b325f..e0a032f 100644
> >+++ b/arch/arm/mach-mvebu/mvebu-soc-id.c
> >@@ -107,7 +107,7 @@ static int __init mvebu_soc_id_init(void)
> >       iounmap(pci_base);
> >
> >res_ioremap:
> >-       clk_disable_unprepare(clk);
> >+//     clk_disable_unprepare(clk);
> >
> >clk_err:
> >       of_node_put(child);
> >
> 
> That patch has fixed it for me. The PCIe card seems to be now be
> always properly detected.

Okay, that makes lots of sense to me at least.

Gregory: I have this vauge recollection it was discussed when you
wrote the mvebu_soc_id_init patch to use a __clk_is_enabled but got
shot down? Clearly this needs to be fixed.

> >In any event, turning on the clock should almost certainly be
> >accompanied by a phy reset sequence to get both link ends on the same
> >page.
> >
> >Attached is a rough, untested patch along those lines.
> >
> 
> I took your attached patch and extended it a bit to print out how
> long it took. The delays also need to be much longer for me. I also
> fixed a small typo you made where the bit wasn't being set again to
> bring the link back up. I've attached the diff to your patch, as
> well as the combined patch (hope that makes sense).

Just to be clear you tried this alone without the above?

Thanks for fixing the patch, I think it also confirms the theory.

IMHO, both approaches are required.

The first prevents messing up the PEX as was left by the bootloader

The second allows the driver to startup a PEX that wasn't enabled by
the bootloader, and recover from clock gating in the kernel (eg the
modular case)

Both seem valuable..

Ideally I'd like to see the clk driver turn off the PEX PHY when it
gates the clock, but I have no great idea how to accomplish that sort
of cross register space adventure...

Thomas, if we can figure out how to properly do __clk_is_enabled I
can probably send a proper patch?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-10 22:06           ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-10 22:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Apr 10, 2014 at 10:56:00PM +0100, Neil Greatorex wrote:

> >diff --git a/arch/arm/mach-mvebu/mvebu-soc-id.c b/arch/arm/mach-mvebu/mvebu-soc-id.c
> >index f3b325f..e0a032f 100644
> >+++ b/arch/arm/mach-mvebu/mvebu-soc-id.c
> >@@ -107,7 +107,7 @@ static int __init mvebu_soc_id_init(void)
> >       iounmap(pci_base);
> >
> >res_ioremap:
> >-       clk_disable_unprepare(clk);
> >+//     clk_disable_unprepare(clk);
> >
> >clk_err:
> >       of_node_put(child);
> >
> 
> That patch has fixed it for me. The PCIe card seems to be now be
> always properly detected.

Okay, that makes lots of sense to me at least.

Gregory: I have this vauge recollection it was discussed when you
wrote the mvebu_soc_id_init patch to use a __clk_is_enabled but got
shot down? Clearly this needs to be fixed.

> >In any event, turning on the clock should almost certainly be
> >accompanied by a phy reset sequence to get both link ends on the same
> >page.
> >
> >Attached is a rough, untested patch along those lines.
> >
> 
> I took your attached patch and extended it a bit to print out how
> long it took. The delays also need to be much longer for me. I also
> fixed a small typo you made where the bit wasn't being set again to
> bring the link back up. I've attached the diff to your patch, as
> well as the combined patch (hope that makes sense).

Just to be clear you tried this alone without the above?

Thanks for fixing the patch, I think it also confirms the theory.

IMHO, both approaches are required.

The first prevents messing up the PEX as was left by the bootloader

The second allows the driver to startup a PEX that wasn't enabled by
the bootloader, and recover from clock gating in the kernel (eg the
modular case)

Both seem valuable..

Ideally I'd like to see the clk driver turn off the PEX PHY when it
gates the clock, but I have no great idea how to accomplish that sort
of cross register space adventure...

Thomas, if we can figure out how to properly do __clk_is_enabled I
can probably send a proper patch?

Thanks,
Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 22:06           ` Jason Gunthorpe
@ 2014-04-10 22:15             ` Neil Greatorex
  -1 siblings, 0 replies; 50+ messages in thread
From: Neil Greatorex @ 2014-04-10 22:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Gregory Clément, Thomas Petazzoni, Willy Tarreau,
	Matthew Minter, Gerlando Falauto, linux-arm-kernel, Jason Cooper,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

Jason,

On Thu, 10 Apr 2014, Jason Gunthorpe wrote:

>> I took your attached patch and extended it a bit to print out how
>> long it took. The delays also need to be much longer for me. I also
>> fixed a small typo you made where the bit wasn't being set again to
>> bring the link back up. I've attached the diff to your patch, as
>> well as the combined patch (hope that makes sense).
>
> Just to be clear you tried this alone without the above?

Yes I did.

>
> Thanks for fixing the patch, I think it also confirms the theory.
>

No problem!

> IMHO, both approaches are required.
>
> The first prevents messing up the PEX as was left by the bootloader
>
> The second allows the driver to startup a PEX that wasn't enabled by
> the bootloader, and recover from clock gating in the kernel (eg the
> modular case)
>
> Both seem valuable..
>
> Ideally I'd like to see the clk driver turn off the PEX PHY when it
> gates the clock, but I have no great idea how to accomplish that sort
> of cross register space adventure...
>
> Thomas, if we can figure out how to properly do __clk_is_enabled I
> can probably send a proper patch?

Let me know when you do and I will test it and add my Tested-by.

>
> Thanks,
> Jason
>

Cheers,
Neil


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-10 22:15             ` Neil Greatorex
  0 siblings, 0 replies; 50+ messages in thread
From: Neil Greatorex @ 2014-04-10 22:15 UTC (permalink / raw)
  To: linux-arm-kernel

Jason,

On Thu, 10 Apr 2014, Jason Gunthorpe wrote:

>> I took your attached patch and extended it a bit to print out how
>> long it took. The delays also need to be much longer for me. I also
>> fixed a small typo you made where the bit wasn't being set again to
>> bring the link back up. I've attached the diff to your patch, as
>> well as the combined patch (hope that makes sense).
>
> Just to be clear you tried this alone without the above?

Yes I did.

>
> Thanks for fixing the patch, I think it also confirms the theory.
>

No problem!

> IMHO, both approaches are required.
>
> The first prevents messing up the PEX as was left by the bootloader
>
> The second allows the driver to startup a PEX that wasn't enabled by
> the bootloader, and recover from clock gating in the kernel (eg the
> modular case)
>
> Both seem valuable..
>
> Ideally I'd like to see the clk driver turn off the PEX PHY when it
> gates the clock, but I have no great idea how to accomplish that sort
> of cross register space adventure...
>
> Thomas, if we can figure out how to properly do __clk_is_enabled I
> can probably send a proper patch?

Let me know when you do and I will test it and add my Tested-by.

>
> Thanks,
> Jason
>

Cheers,
Neil

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 18:02     ` Thomas Petazzoni
@ 2014-04-10 23:13       ` Willy Tarreau
  -1 siblings, 0 replies; 50+ messages in thread
From: Willy Tarreau @ 2014-04-10 23:13 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Lior Amsalem, Neil Greatorex, Matthew Minter, Jason Cooper,
	Tawfik Bayouk, Andrew Lunn, linux-pci, Jason Gunthorpe,
	Gerlando Falauto, Ezequiel Garcia, Gregory Clément,
	linux-arm-kernel

Hi Thomas,

On Thu, Apr 10, 2014 at 08:02:22PM +0200, Thomas Petazzoni wrote:
> > Really cool, I'm going to test that on a few PCIe cards and will report
> > the results here. How can we check the number of mbus windows in use ?
> 
> # cat /sys/kernel/debug/mvebu-mbus/devices

Thanks, so here we go :

XP-GP with igb, works perfectly :

root@xpgp:~# dmesg|grep igb
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
igb 0000:02:00.0: added PHC on eth4
igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.0: eth4: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6a
igb 0000:02:00.0: eth4: PBA No: FFFFFF-0FF
igb 0000:02:00.0: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
igb 0000:02:00.1: added PHC on eth5
igb 0000:02:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.1: eth5: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6b
igb 0000:02:00.1: eth5: PBA No: FFFFFF-0FF
igb 0000:02:00.1: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)

root@xpgp:~# lspci -v -s 02:00 | egrep -i '^0|Memory'
02:00.0 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e0000000 (32-bit, non-prefetchable) [size=512K]
        Memory at e0200000 (32-bit, non-prefetchable) [size=16K]
02:00.1 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e0100000 (32-bit, non-prefetchable) [size=512K]
        Memory at e0204000 (32-bit, non-prefetchable) [size=16K]

root@xpgp:~# grep -v disabled /sys/kernel/debug/mvebu-mbus/devices 
[00] 00000000e8010000 - 00000000e8020000 : 0004:00f0 (remap 0000000000010000)
[08] 00000000fff00000 - 0000000100000000 : 0001:001d
[09] 00000000f0000000 - 00000000f1000000 : 0001:002f
[10] 00000000e0000000 - 00000000e0200000 : 0004:00f8
[11] 00000000e0200000 - 00000000e0300000 : 0004:00f8

So the areas are well covered, though #11 seems larger than needed
but I seem to remember that they're all rounded up by 1 MB anyway,
then if so, that's OK.

Now with igb + myricom :

root@xpgp:~# lspci -v -s 02:00 | egrep -i '^0|Memory'
02:00.0 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e1800000 (32-bit, non-prefetchable) [disabled] [size=512K]
        Memory at e1a00000 (32-bit, non-prefetchable) [disabled] [size=16K]
02:00.1 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e1900000 (32-bit, non-prefetchable) [disabled] [size=512K]
        Memory at e1a04000 (32-bit, non-prefetchable) [disabled] [size=16K]

root@xpgp:~# lspci -v -s 03:00 | egrep -i '^0|Memory'
03:00.0 Ethernet controller: MYRICOM Inc. Myri-10G Dual-Protocol NIC
        Memory at e0000000 (64-bit, prefetchable) [size=16M]
        Memory at e1000000 (64-bit, non-prefetchable) [size=1M]

root@xpgp:~# modprobe igb
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
PCI: enabling device 0000:00:09.0 (0140 -> 0143)
PCI: enabling device 0000:02:00.0 (0140 -> 0142)
igb 0000:02:00.0: added PHC on eth4
igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.0: eth4: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6a
igb 0000:02:00.0: eth4: PBA No: FFFFFF-0FF
igb 0000:02:00.0: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
PCI: enabling device 0000:02:00.1 (0140 -> 0142)
igb 0000:02:00.1: added PHC on eth5
igb 0000:02:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.1: eth5: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6b
igb 0000:02:00.1: eth5: PBA No: FFFFFF-0FF
igb 0000:02:00.1: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)

root@xpgp:~# modprobe myri10ge
myri10ge: Version 1.5.3-1.534
PCI: enabling device 0000:00:0a.0 (0140 -> 0143)
myri10ge 0000:03:00.0: PCIE x4 Link
myri10ge 0000:03:00.0: Direct firmware load failed with error -2
myri10ge 0000:03:00.0: Falling back to user helper
myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug
myri10ge 0000:03:00.0: hotplug firmware loading failed
myri10ge 0000:03:00.0: Successfully adopted running firmware
myri10ge 0000:03:00.0: Using firmware currently running on NIC.  For optimal
myri10ge 0000:03:00.0: performance consider loading optimized firmware
myri10ge 0000:03:00.0: via hotplug
myri10ge 0000:03:00.0: MSI IRQ 113, tx bndry 2048, fw adopted, WC Disabled

root@xpgp:~# grep -v disabled /sys/kernel/debug/mvebu-mbus/devices
[00] 00000000e8010000 - 00000000e8020000 : 0004:00f0 (remap 0000000000010000)
[08] 00000000fff00000 - 0000000100000000 : 0001:001d
[09] 00000000f0000000 - 00000000f1000000 : 0001:002f
[10] 00000000e1800000 - 00000000e1a00000 : 0004:00f8
[11] 00000000e1a00000 - 00000000e1b00000 : 0004:00f8
[12] 00000000e0000000 - 00000000e1000000 : 0008:00f8
[13] 00000000e1000000 - 00000000e1800000 : 0008:00f8

I noticed above that both igb ports share the same window #11. So I
tried to rmmod igb, remove both PCI devices, check mbus again (which
did not change), rescan PCI and modprobe igb again, and everything is
still operational with the same windows. I don't know if it is normal
that they're not unregistered when the device goes away (maybe there's
no refcount) ?

If we have to keep them forever, then maybe a further improvement
will consist in merging adjacent windows which sum up as a power of
two (eg: #10 and #11 may be merged).

I tried to add a 3rd NIC in the mix (broadcom tg3), which caused the
myri10ge to fail to load for an obscure reason after loading igb
properly :

root@xpgp:~# dmesg
tg3.c:v3.134 (Sep 16, 2013)
PCI: enabling device 0000:00:01.0 (0140 -> 0143)
tg3 0000:01:00.0 eth4: Tigon3 [partno(BCM95721A211) rev 4001] (PCI Express) MAC address 00:07:11:04:3e:e6
tg3 0000:01:00.0 eth4: attached PHY is 5750 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
tg3 0000:01:00.0 eth4: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
tg3 0000:01:00.0 eth4: dma_rwctrl[76180000] dma_mask[64-bit]
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
PCI: enabling device 0000:00:09.0 (0140 -> 0143)
PCI: enabling device 0000:02:00.0 (0140 -> 0142)
igb 0000:02:00.0: added PHC on eth5
igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.0: eth5: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6a
igb 0000:02:00.0: eth5: PBA No: FFFFFF-0FF
igb 0000:02:00.0: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
PCI: enabling device 0000:02:00.1 (0140 -> 0142)
igb 0000:02:00.1: added PHC on eth6
igb 0000:02:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.1: eth6: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6b
igb 0000:02:00.1: eth6: PBA No: FFFFFF-0FF
igb 0000:02:00.1: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
myri10ge: Version 1.5.3-1.534
PCI: enabling device 0000:00:0a.0 (0140 -> 0143)
myri10ge 0000:03:00.0: invalid sram_size -1B or board span 16777216B
root@xpgp:~# 

root@xpgp:~# lspci -v -s 01:00 | egrep -i '^0| at '
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 01)
        Memory at e1800000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at e1810000 [disabled] [size=64K]

root@xpgp:~# lspci -v -s 02:00 | egrep -i '^0| at '
02:00.0 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e1a00000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at 10000 [disabled] [size=32]
        Memory at e1c00000 (32-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at e1a80000 [disabled] [size=512K]
02:00.1 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e1b00000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at 10020 [disabled] [size=32]
        Memory at e1c04000 (32-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at e1b80000 [disabled] [size=512K]

root@xpgp:~# lspci -v -s 03:00 | egrep -i '^0| at '
03:00.0 Ethernet controller: MYRICOM Inc. Myri-10G Dual-Protocol NIC
        Memory at e0000000 (64-bit, prefetchable) [size=16M]
        Memory at e1000000 (64-bit, non-prefetchable) [size=1M]
        [virtual] Expansion ROM at e1100000 [disabled] [size=512K]
root@xpgp:~# 

root@xpgp:~# grep -v disabled /sys/kernel/debug/mvebu-mbus/devices
[00] 00000000e8010000 - 00000000e8020000 : 0004:00f0 (remap 0000000000010000)
[08] 00000000fff00000 - 0000000100000000 : 0001:001d
[09] 00000000f0000000 - 00000000f1000000 : 0001:002f
[10] 00000000e1800000 - 00000000e1900000 : 0004:00e8
[11] 00000000e1a00000 - 00000000e1c00000 : 0004:00f8
[12] 00000000e1c00000 - 00000000e1d00000 : 0004:00f8
[13] 00000000e0000000 - 00000000e1000000 : 0008:00f8
[14] 00000000e1000000 - 00000000e1800000 : 0008:00f8

At least nothing seems wrong anywhere, so for now we should probably
ignore it, unless someone has a good idea about something to look at.

Now I'm using a Realtek instead of TG3, so I have this :

root@xpgp:~# lspci -v -s 01:00 | egrep -i '^0| at '
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
        I/O ports at 10000 [disabled] [size=256]
        Memory at e1820000 (64-bit, non-prefetchable) [size=4K]
        Memory at e1800000 (64-bit, prefetchable) [size=64K]
        Expansion ROM at e1810000 [size=64K]

root@xpgp:~# lspci -v -s 02:00 | egrep -i '^0| at '
02:00.0 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e1a00000 (32-bit, non-prefetchable) [disabled] [size=512K]
        I/O ports at 20000 [disabled] [size=32]
        Memory at e1c00000 (32-bit, non-prefetchable) [disabled] [size=16K]
        [virtual] Expansion ROM at e1a80000 [disabled] [size=512K]
02:00.1 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e1b00000 (32-bit, non-prefetchable) [disabled] [size=512K]
        I/O ports at 20020 [disabled] [size=32]
        Memory at e1c04000 (32-bit, non-prefetchable) [disabled] [size=16K]
        [virtual] Expansion ROM at e1b80000 [disabled] [size=512K]

root@xpgp:~# lspci -v -s 03:00 | egrep -i '^0| at '
03:00.0 Ethernet controller: MYRICOM Inc. Myri-10G Dual-Protocol NIC
        Memory at e0000000 (64-bit, prefetchable) [size=16M]
        Memory at e1000000 (64-bit, non-prefetchable) [size=1M]
        [virtual] Expansion ROM at e1100000 [disabled] [size=512K]

I get similar results :

root@xpgp:~# dmesg
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
PCI: enabling device 0000:00:01.0 (0140 -> 0143)
PCI: enabling device 0000:01:00.0 (0146 -> 0147)
r8169 0000:01:00.0 eth4: RTL8168c/8111c at 0xf0346000, 00:e0:4c:81:20:79, XID 1c4000c0 IRQ 119
r8169 0000:01:00.0 eth4: jumbo features [frames: 6128 bytes, tx checksumming: ko]
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
PCI: enabling device 0000:00:09.0 (0140 -> 0143)
PCI: enabling device 0000:02:00.0 (0140 -> 0142)
igb 0000:02:00.0: added PHC on eth5
igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.0: eth5: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6a
igb 0000:02:00.0: eth5: PBA No: FFFFFF-0FF
igb 0000:02:00.0: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
PCI: enabling device 0000:02:00.1 (0140 -> 0142)
igb 0000:02:00.1: added PHC on eth6
igb 0000:02:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.1: eth6: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6b
igb 0000:02:00.1: eth6: PBA No: FFFFFF-0FF
igb 0000:02:00.1: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
myri10ge: Version 1.5.3-1.534
PCI: enabling device 0000:00:0a.0 (0140 -> 0143)
myri10ge 0000:03:00.0: invalid sram_size -1B or board span 16777216B

root@xpgp:~# grep -v disabled /sys/kernel/debug/mvebu-mbus/devices
[00] 00000000e8010000 - 00000000e8020000 : 0004:00e0 (remap 0000000000010000)
[01] 00000000e8020000 - 00000000e8030000 : 0004:00f0 (remap 0000000000020000)
[08] 00000000fff00000 - 0000000100000000 : 0001:001d
[09] 00000000f0000000 - 00000000f1000000 : 0001:002f
[10] 00000000e1800000 - 00000000e1900000 : 0004:00e8
[11] 00000000e1a00000 - 00000000e1c00000 : 0004:00f8
[12] 00000000e1c00000 - 00000000e1d00000 : 0004:00f8
[13] 00000000e0000000 - 00000000e1000000 : 0008:00f8
[14] 00000000e1000000 - 00000000e1800000 : 0008:00f8
root@xpgp:~# 

Ah, interestingly if I load the NICs in the opposite order, they all load
properly (myri10ge, igb, r8169) :

root@xpgp:~# dmesg
myri10ge: Version 1.5.3-1.534
PCI: enabling device 0000:00:0a.0 (0140 -> 0143)
myri10ge 0000:03:00.0: PCIE x4 Link
myri10ge 0000:03:00.0: Direct firmware load failed with error -2
myri10ge 0000:03:00.0: Falling back to user helper
myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug
myri10ge 0000:03:00.0: hotplug firmware loading failed
myri10ge 0000:03:00.0: Successfully adopted running firmware
myri10ge 0000:03:00.0: Using firmware currently running on NIC.  For optimal
myri10ge 0000:03:00.0: performance consider loading optimized firmware
myri10ge 0000:03:00.0: via hotplug
myri10ge 0000:03:00.0: MSI IRQ 114, tx bndry 2048, fw adopted, WC Disabled
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
PCI: enabling device 0000:00:09.0 (0140 -> 0143)
PCI: enabling device 0000:02:00.0 (0140 -> 0142)
igb 0000:02:00.0: added PHC on eth5
igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.0: eth5: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6a
igb 0000:02:00.0: eth5: PBA No: FFFFFF-0FF
igb 0000:02:00.0: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
PCI: enabling device 0000:02:00.1 (0140 -> 0142)
igb 0000:02:00.1: added PHC on eth6
igb 0000:02:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.1: eth6: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6b
igb 0000:02:00.1: eth6: PBA No: FFFFFF-0FF
igb 0000:02:00.1: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
PCI: enabling device 0000:00:01.0 (0140 -> 0143)
PCI: enabling device 0000:01:00.0 (0146 -> 0147)
r8169 0000:01:00.0 eth7: RTL8168c/8111c at 0xf037e000, 00:e0:4c:81:20:79, XID 1c4000c0 IRQ 121
r8169 0000:01:00.0 eth7: jumbo features [frames: 6128 bytes, tx checksumming: ko]

root@xpgp:~# grep -v disabled /sys/kernel/debug/mvebu-mbus/devices
[00] 00000000e8020000 - 00000000e8030000 : 0004:00f0 (remap 0000000000020000)
[01] 00000000e8010000 - 00000000e8020000 : 0004:00e0 (remap 0000000000010000)
[08] 00000000fff00000 - 0000000100000000 : 0001:001d
[09] 00000000f0000000 - 00000000f1000000 : 0001:002f
[10] 00000000e0000000 - 00000000e1000000 : 0008:00f8
[11] 00000000e1000000 - 00000000e1800000 : 0008:00f8
[12] 00000000e1a00000 - 00000000e1c00000 : 0004:00f8
[13] 00000000e1c00000 - 00000000e1d00000 : 0004:00f8
[14] 00000000e1800000 - 00000000e1900000 : 0004:00e8

On the Mirabox, I don't see the igb NIC on lspci, but it's late and
I start to think slowly so I'll have to dig this out tomorrow. Hmmm
I'm seeing it after a rescan, it looks like the same issue that Neil
initially reported about the link up delay.

All the devices are detected now (including the USB3 controller) :

root@mirabox:~# lspci -v  | egrep -i '^0| at '
00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) (prog-if 00 [Normal decode])
00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) (prog-if 00 [Normal decode])
01:00.0 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e0200000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at 10000 [disabled] [size=32]
        Memory at e0400000 (32-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at e0280000 [disabled] [size=512K]
01:00.1 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e0300000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at 10020 [disabled] [size=32]
        Memory at e0404000 (32-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at e0380000 [disabled] [size=512K]
02:00.0 USB Controller: Device 1b73:1009 (rev 02) (prog-if 30)
        Memory at e0000000 (64-bit, non-prefetchable) [size=64K]
        Memory at e0010000 (64-bit, non-prefetchable) [size=4K]
        Memory at e0011000 (64-bit, non-prefetchable) [size=4K]

The nic properly loads :

root@mirabox:~# dmesg
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
PCI: enabling device 0000:00:01.0 (0140 -> 0143)
PCI: enabling device 0000:01:00.0 (0000 -> 0002)
igb 0000:01:00.0: added PHC on eth2
igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:01:00.0: eth2: (PCIe:2.5Gb/s:Width x1) 00:30:18:a6:6c:6a
igb 0000:01:00.0: eth2: PBA No: FFFFFF-0FF
igb 0000:01:00.0: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
PCI: enabling device 0000:01:00.1 (0000 -> 0002)
igb 0000:01:00.1: added PHC on eth3
igb 0000:01:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:01:00.1: eth3: (PCIe:2.5Gb/s:Width x1) 00:30:18:a6:6c:6b
igb 0000:01:00.1: eth3: PBA No: FFFFFF-0FF
igb 0000:01:00.1: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
root@mirabox:~# 

And the mbus windows match expectations :

root@mirabox:~# grep -v disabled /sys/kernel/debug/mvebu-mbus/devices
[00] 00000000e8010000 - 00000000e8020000 : 0004:00e0 (remap 0000000000010000)
[08] 00000000fff00000 - 0000000100000000 : 0001:00e0
[09] 00000000e0000000 - 00000000e0100000 : 0008:00e8
[10] 00000000e0200000 - 00000000e0400000 : 0004:00e8
[11] 00000000e0400000 - 00000000e0500000 : 0004:00e8

So overall, it's a big Ack from my side considering the huge improvements,
let's retry tomorrow with the link up workaround/fix to see if the detection
issue is related. Great work!

Best regards,
Willy


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-10 23:13       ` Willy Tarreau
  0 siblings, 0 replies; 50+ messages in thread
From: Willy Tarreau @ 2014-04-10 23:13 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

On Thu, Apr 10, 2014 at 08:02:22PM +0200, Thomas Petazzoni wrote:
> > Really cool, I'm going to test that on a few PCIe cards and will report
> > the results here. How can we check the number of mbus windows in use ?
> 
> # cat /sys/kernel/debug/mvebu-mbus/devices

Thanks, so here we go :

XP-GP with igb, works perfectly :

root at xpgp:~# dmesg|grep igb
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
igb 0000:02:00.0: added PHC on eth4
igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.0: eth4: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6a
igb 0000:02:00.0: eth4: PBA No: FFFFFF-0FF
igb 0000:02:00.0: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
igb 0000:02:00.1: added PHC on eth5
igb 0000:02:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.1: eth5: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6b
igb 0000:02:00.1: eth5: PBA No: FFFFFF-0FF
igb 0000:02:00.1: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)

root at xpgp:~# lspci -v -s 02:00 | egrep -i '^0|Memory'
02:00.0 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e0000000 (32-bit, non-prefetchable) [size=512K]
        Memory at e0200000 (32-bit, non-prefetchable) [size=16K]
02:00.1 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e0100000 (32-bit, non-prefetchable) [size=512K]
        Memory at e0204000 (32-bit, non-prefetchable) [size=16K]

root at xpgp:~# grep -v disabled /sys/kernel/debug/mvebu-mbus/devices 
[00] 00000000e8010000 - 00000000e8020000 : 0004:00f0 (remap 0000000000010000)
[08] 00000000fff00000 - 0000000100000000 : 0001:001d
[09] 00000000f0000000 - 00000000f1000000 : 0001:002f
[10] 00000000e0000000 - 00000000e0200000 : 0004:00f8
[11] 00000000e0200000 - 00000000e0300000 : 0004:00f8

So the areas are well covered, though #11 seems larger than needed
but I seem to remember that they're all rounded up by 1 MB anyway,
then if so, that's OK.

Now with igb + myricom :

root at xpgp:~# lspci -v -s 02:00 | egrep -i '^0|Memory'
02:00.0 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e1800000 (32-bit, non-prefetchable) [disabled] [size=512K]
        Memory at e1a00000 (32-bit, non-prefetchable) [disabled] [size=16K]
02:00.1 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e1900000 (32-bit, non-prefetchable) [disabled] [size=512K]
        Memory at e1a04000 (32-bit, non-prefetchable) [disabled] [size=16K]

root at xpgp:~# lspci -v -s 03:00 | egrep -i '^0|Memory'
03:00.0 Ethernet controller: MYRICOM Inc. Myri-10G Dual-Protocol NIC
        Memory at e0000000 (64-bit, prefetchable) [size=16M]
        Memory at e1000000 (64-bit, non-prefetchable) [size=1M]

root at xpgp:~# modprobe igb
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
PCI: enabling device 0000:00:09.0 (0140 -> 0143)
PCI: enabling device 0000:02:00.0 (0140 -> 0142)
igb 0000:02:00.0: added PHC on eth4
igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.0: eth4: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6a
igb 0000:02:00.0: eth4: PBA No: FFFFFF-0FF
igb 0000:02:00.0: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
PCI: enabling device 0000:02:00.1 (0140 -> 0142)
igb 0000:02:00.1: added PHC on eth5
igb 0000:02:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.1: eth5: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6b
igb 0000:02:00.1: eth5: PBA No: FFFFFF-0FF
igb 0000:02:00.1: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)

root at xpgp:~# modprobe myri10ge
myri10ge: Version 1.5.3-1.534
PCI: enabling device 0000:00:0a.0 (0140 -> 0143)
myri10ge 0000:03:00.0: PCIE x4 Link
myri10ge 0000:03:00.0: Direct firmware load failed with error -2
myri10ge 0000:03:00.0: Falling back to user helper
myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug
myri10ge 0000:03:00.0: hotplug firmware loading failed
myri10ge 0000:03:00.0: Successfully adopted running firmware
myri10ge 0000:03:00.0: Using firmware currently running on NIC.  For optimal
myri10ge 0000:03:00.0: performance consider loading optimized firmware
myri10ge 0000:03:00.0: via hotplug
myri10ge 0000:03:00.0: MSI IRQ 113, tx bndry 2048, fw adopted, WC Disabled

root at xpgp:~# grep -v disabled /sys/kernel/debug/mvebu-mbus/devices
[00] 00000000e8010000 - 00000000e8020000 : 0004:00f0 (remap 0000000000010000)
[08] 00000000fff00000 - 0000000100000000 : 0001:001d
[09] 00000000f0000000 - 00000000f1000000 : 0001:002f
[10] 00000000e1800000 - 00000000e1a00000 : 0004:00f8
[11] 00000000e1a00000 - 00000000e1b00000 : 0004:00f8
[12] 00000000e0000000 - 00000000e1000000 : 0008:00f8
[13] 00000000e1000000 - 00000000e1800000 : 0008:00f8

I noticed above that both igb ports share the same window #11. So I
tried to rmmod igb, remove both PCI devices, check mbus again (which
did not change), rescan PCI and modprobe igb again, and everything is
still operational with the same windows. I don't know if it is normal
that they're not unregistered when the device goes away (maybe there's
no refcount) ?

If we have to keep them forever, then maybe a further improvement
will consist in merging adjacent windows which sum up as a power of
two (eg: #10 and #11 may be merged).

I tried to add a 3rd NIC in the mix (broadcom tg3), which caused the
myri10ge to fail to load for an obscure reason after loading igb
properly :

root at xpgp:~# dmesg
tg3.c:v3.134 (Sep 16, 2013)
PCI: enabling device 0000:00:01.0 (0140 -> 0143)
tg3 0000:01:00.0 eth4: Tigon3 [partno(BCM95721A211) rev 4001] (PCI Express) MAC address 00:07:11:04:3e:e6
tg3 0000:01:00.0 eth4: attached PHY is 5750 (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[0])
tg3 0000:01:00.0 eth4: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
tg3 0000:01:00.0 eth4: dma_rwctrl[76180000] dma_mask[64-bit]
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
PCI: enabling device 0000:00:09.0 (0140 -> 0143)
PCI: enabling device 0000:02:00.0 (0140 -> 0142)
igb 0000:02:00.0: added PHC on eth5
igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.0: eth5: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6a
igb 0000:02:00.0: eth5: PBA No: FFFFFF-0FF
igb 0000:02:00.0: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
PCI: enabling device 0000:02:00.1 (0140 -> 0142)
igb 0000:02:00.1: added PHC on eth6
igb 0000:02:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.1: eth6: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6b
igb 0000:02:00.1: eth6: PBA No: FFFFFF-0FF
igb 0000:02:00.1: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
myri10ge: Version 1.5.3-1.534
PCI: enabling device 0000:00:0a.0 (0140 -> 0143)
myri10ge 0000:03:00.0: invalid sram_size -1B or board span 16777216B
root at xpgp:~# 

root at xpgp:~# lspci -v -s 01:00 | egrep -i '^0| at '
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 01)
        Memory at e1800000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at e1810000 [disabled] [size=64K]

root at xpgp:~# lspci -v -s 02:00 | egrep -i '^0| at '
02:00.0 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e1a00000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at 10000 [disabled] [size=32]
        Memory at e1c00000 (32-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at e1a80000 [disabled] [size=512K]
02:00.1 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e1b00000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at 10020 [disabled] [size=32]
        Memory at e1c04000 (32-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at e1b80000 [disabled] [size=512K]

root at xpgp:~# lspci -v -s 03:00 | egrep -i '^0| at '
03:00.0 Ethernet controller: MYRICOM Inc. Myri-10G Dual-Protocol NIC
        Memory at e0000000 (64-bit, prefetchable) [size=16M]
        Memory at e1000000 (64-bit, non-prefetchable) [size=1M]
        [virtual] Expansion ROM at e1100000 [disabled] [size=512K]
root at xpgp:~# 

root at xpgp:~# grep -v disabled /sys/kernel/debug/mvebu-mbus/devices
[00] 00000000e8010000 - 00000000e8020000 : 0004:00f0 (remap 0000000000010000)
[08] 00000000fff00000 - 0000000100000000 : 0001:001d
[09] 00000000f0000000 - 00000000f1000000 : 0001:002f
[10] 00000000e1800000 - 00000000e1900000 : 0004:00e8
[11] 00000000e1a00000 - 00000000e1c00000 : 0004:00f8
[12] 00000000e1c00000 - 00000000e1d00000 : 0004:00f8
[13] 00000000e0000000 - 00000000e1000000 : 0008:00f8
[14] 00000000e1000000 - 00000000e1800000 : 0008:00f8

At least nothing seems wrong anywhere, so for now we should probably
ignore it, unless someone has a good idea about something to look at.

Now I'm using a Realtek instead of TG3, so I have this :

root at xpgp:~# lspci -v -s 01:00 | egrep -i '^0| at '
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
        I/O ports at 10000 [disabled] [size=256]
        Memory at e1820000 (64-bit, non-prefetchable) [size=4K]
        Memory at e1800000 (64-bit, prefetchable) [size=64K]
        Expansion ROM at e1810000 [size=64K]

root at xpgp:~# lspci -v -s 02:00 | egrep -i '^0| at '
02:00.0 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e1a00000 (32-bit, non-prefetchable) [disabled] [size=512K]
        I/O ports at 20000 [disabled] [size=32]
        Memory at e1c00000 (32-bit, non-prefetchable) [disabled] [size=16K]
        [virtual] Expansion ROM at e1a80000 [disabled] [size=512K]
02:00.1 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e1b00000 (32-bit, non-prefetchable) [disabled] [size=512K]
        I/O ports at 20020 [disabled] [size=32]
        Memory at e1c04000 (32-bit, non-prefetchable) [disabled] [size=16K]
        [virtual] Expansion ROM at e1b80000 [disabled] [size=512K]

root at xpgp:~# lspci -v -s 03:00 | egrep -i '^0| at '
03:00.0 Ethernet controller: MYRICOM Inc. Myri-10G Dual-Protocol NIC
        Memory at e0000000 (64-bit, prefetchable) [size=16M]
        Memory at e1000000 (64-bit, non-prefetchable) [size=1M]
        [virtual] Expansion ROM at e1100000 [disabled] [size=512K]

I get similar results :

root at xpgp:~# dmesg
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
PCI: enabling device 0000:00:01.0 (0140 -> 0143)
PCI: enabling device 0000:01:00.0 (0146 -> 0147)
r8169 0000:01:00.0 eth4: RTL8168c/8111c at 0xf0346000, 00:e0:4c:81:20:79, XID 1c4000c0 IRQ 119
r8169 0000:01:00.0 eth4: jumbo features [frames: 6128 bytes, tx checksumming: ko]
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
PCI: enabling device 0000:00:09.0 (0140 -> 0143)
PCI: enabling device 0000:02:00.0 (0140 -> 0142)
igb 0000:02:00.0: added PHC on eth5
igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.0: eth5: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6a
igb 0000:02:00.0: eth5: PBA No: FFFFFF-0FF
igb 0000:02:00.0: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
PCI: enabling device 0000:02:00.1 (0140 -> 0142)
igb 0000:02:00.1: added PHC on eth6
igb 0000:02:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.1: eth6: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6b
igb 0000:02:00.1: eth6: PBA No: FFFFFF-0FF
igb 0000:02:00.1: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
myri10ge: Version 1.5.3-1.534
PCI: enabling device 0000:00:0a.0 (0140 -> 0143)
myri10ge 0000:03:00.0: invalid sram_size -1B or board span 16777216B

root at xpgp:~# grep -v disabled /sys/kernel/debug/mvebu-mbus/devices
[00] 00000000e8010000 - 00000000e8020000 : 0004:00e0 (remap 0000000000010000)
[01] 00000000e8020000 - 00000000e8030000 : 0004:00f0 (remap 0000000000020000)
[08] 00000000fff00000 - 0000000100000000 : 0001:001d
[09] 00000000f0000000 - 00000000f1000000 : 0001:002f
[10] 00000000e1800000 - 00000000e1900000 : 0004:00e8
[11] 00000000e1a00000 - 00000000e1c00000 : 0004:00f8
[12] 00000000e1c00000 - 00000000e1d00000 : 0004:00f8
[13] 00000000e0000000 - 00000000e1000000 : 0008:00f8
[14] 00000000e1000000 - 00000000e1800000 : 0008:00f8
root at xpgp:~# 

Ah, interestingly if I load the NICs in the opposite order, they all load
properly (myri10ge, igb, r8169) :

root at xpgp:~# dmesg
myri10ge: Version 1.5.3-1.534
PCI: enabling device 0000:00:0a.0 (0140 -> 0143)
myri10ge 0000:03:00.0: PCIE x4 Link
myri10ge 0000:03:00.0: Direct firmware load failed with error -2
myri10ge 0000:03:00.0: Falling back to user helper
myri10ge 0000:03:00.0: Unable to load myri10ge_eth_z8e.dat firmware image via hotplug
myri10ge 0000:03:00.0: hotplug firmware loading failed
myri10ge 0000:03:00.0: Successfully adopted running firmware
myri10ge 0000:03:00.0: Using firmware currently running on NIC.  For optimal
myri10ge 0000:03:00.0: performance consider loading optimized firmware
myri10ge 0000:03:00.0: via hotplug
myri10ge 0000:03:00.0: MSI IRQ 114, tx bndry 2048, fw adopted, WC Disabled
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
PCI: enabling device 0000:00:09.0 (0140 -> 0143)
PCI: enabling device 0000:02:00.0 (0140 -> 0142)
igb 0000:02:00.0: added PHC on eth5
igb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.0: eth5: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6a
igb 0000:02:00.0: eth5: PBA No: FFFFFF-0FF
igb 0000:02:00.0: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
PCI: enabling device 0000:02:00.1 (0140 -> 0142)
igb 0000:02:00.1: added PHC on eth6
igb 0000:02:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:02:00.1: eth6: (PCIe:5.0Gb/s:Width x1) 00:30:18:a6:6c:6b
igb 0000:02:00.1: eth6: PBA No: FFFFFF-0FF
igb 0000:02:00.1: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
PCI: enabling device 0000:00:01.0 (0140 -> 0143)
PCI: enabling device 0000:01:00.0 (0146 -> 0147)
r8169 0000:01:00.0 eth7: RTL8168c/8111c at 0xf037e000, 00:e0:4c:81:20:79, XID 1c4000c0 IRQ 121
r8169 0000:01:00.0 eth7: jumbo features [frames: 6128 bytes, tx checksumming: ko]

root at xpgp:~# grep -v disabled /sys/kernel/debug/mvebu-mbus/devices
[00] 00000000e8020000 - 00000000e8030000 : 0004:00f0 (remap 0000000000020000)
[01] 00000000e8010000 - 00000000e8020000 : 0004:00e0 (remap 0000000000010000)
[08] 00000000fff00000 - 0000000100000000 : 0001:001d
[09] 00000000f0000000 - 00000000f1000000 : 0001:002f
[10] 00000000e0000000 - 00000000e1000000 : 0008:00f8
[11] 00000000e1000000 - 00000000e1800000 : 0008:00f8
[12] 00000000e1a00000 - 00000000e1c00000 : 0004:00f8
[13] 00000000e1c00000 - 00000000e1d00000 : 0004:00f8
[14] 00000000e1800000 - 00000000e1900000 : 0004:00e8

On the Mirabox, I don't see the igb NIC on lspci, but it's late and
I start to think slowly so I'll have to dig this out tomorrow. Hmmm
I'm seeing it after a rescan, it looks like the same issue that Neil
initially reported about the link up delay.

All the devices are detected now (including the USB3 controller) :

root at mirabox:~# lspci -v  | egrep -i '^0| at '
00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) (prog-if 00 [Normal decode])
00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) (prog-if 00 [Normal decode])
01:00.0 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e0200000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at 10000 [disabled] [size=32]
        Memory at e0400000 (32-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at e0280000 [disabled] [size=512K]
01:00.1 Ethernet controller: Intel Corporation Device 1521 (rev 01)
        Memory at e0300000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at 10020 [disabled] [size=32]
        Memory at e0404000 (32-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at e0380000 [disabled] [size=512K]
02:00.0 USB Controller: Device 1b73:1009 (rev 02) (prog-if 30)
        Memory at e0000000 (64-bit, non-prefetchable) [size=64K]
        Memory at e0010000 (64-bit, non-prefetchable) [size=4K]
        Memory at e0011000 (64-bit, non-prefetchable) [size=4K]

The nic properly loads :

root at mirabox:~# dmesg
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.0.5-k
igb: Copyright (c) 2007-2013 Intel Corporation.
PCI: enabling device 0000:00:01.0 (0140 -> 0143)
PCI: enabling device 0000:01:00.0 (0000 -> 0002)
igb 0000:01:00.0: added PHC on eth2
igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
igb 0000:01:00.0: eth2: (PCIe:2.5Gb/s:Width x1) 00:30:18:a6:6c:6a
igb 0000:01:00.0: eth2: PBA No: FFFFFF-0FF
igb 0000:01:00.0: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
PCI: enabling device 0000:01:00.1 (0000 -> 0002)
igb 0000:01:00.1: added PHC on eth3
igb 0000:01:00.1: Intel(R) Gigabit Ethernet Network Connection
igb 0000:01:00.1: eth3: (PCIe:2.5Gb/s:Width x1) 00:30:18:a6:6c:6b
igb 0000:01:00.1: eth3: PBA No: FFFFFF-0FF
igb 0000:01:00.1: Using MSI interrupts. 1 rx queue(s), 1 tx queue(s)
root at mirabox:~# 

And the mbus windows match expectations :

root at mirabox:~# grep -v disabled /sys/kernel/debug/mvebu-mbus/devices
[00] 00000000e8010000 - 00000000e8020000 : 0004:00e0 (remap 0000000000010000)
[08] 00000000fff00000 - 0000000100000000 : 0001:00e0
[09] 00000000e0000000 - 00000000e0100000 : 0008:00e8
[10] 00000000e0200000 - 00000000e0400000 : 0004:00e8
[11] 00000000e0400000 - 00000000e0500000 : 0004:00e8

So overall, it's a big Ack from my side considering the huge improvements,
let's retry tomorrow with the link up workaround/fix to see if the detection
issue is related. Great work!

Best regards,
Willy

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 23:13       ` Willy Tarreau
@ 2014-04-10 23:40         ` Jason Gunthorpe
  -1 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-10 23:40 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Thomas Petazzoni, Lior Amsalem, Neil Greatorex, Matthew Minter,
	Jason Cooper, Tawfik Bayouk, Andrew Lunn, linux-pci,
	Gerlando Falauto, Ezequiel Garcia, Gregory Clément,
	linux-arm-kernel

On Fri, Apr 11, 2014 at 01:13:36AM +0200, Willy Tarreau wrote:

> So the areas are well covered, though #11 seems larger than needed
> but I seem to remember that they're all rounded up by 1 MB anyway,
> then if so, that's OK.

Right, PCI bridge windows are 1MB aligned

> root@xpgp:~# grep -v disabled /sys/kernel/debug/mvebu-mbus/devices
> [00] 00000000e8010000 - 00000000e8020000 : 0004:00f0 (remap 0000000000010000)
> [08] 00000000fff00000 - 0000000100000000 : 0001:001d
> [09] 00000000f0000000 - 00000000f1000000 : 0001:002f
> [10] 00000000e1800000 - 00000000e1a00000 : 0004:00f8
> [11] 00000000e1a00000 - 00000000e1b00000 : 0004:00f8
> [12] 00000000e0000000 - 00000000e1000000 : 0008:00f8
> [13] 00000000e1000000 - 00000000e1800000 : 0008:00f8
> 
> I noticed above that both igb ports share the same window #11. So I
> tried to rmmod igb, remove both PCI devices, check mbus again (which
> did not change), rescan PCI and modprobe igb again, and everything is
> still operational with the same windows. I don't know if it is normal
> that they're not unregistered when the device goes away (maybe there's
> no refcount) ?

The windows are tied to the PCI core, not to the using driver
module. So they will only changed based on rescan an dynamic resource
assignment in the PCI core. PCI rescan has a 'memory' of the last
bridge windows and won't make dramtic changes, so expect the windows
to fairly sticky.

> If we have to keep them forever, then maybe a further improvement
> will consist in merging adjacent windows which sum up as a power of
> two (eg: #10 and #11 may be merged).

0x1b00000 - 0x1800000 = 0x300000 which is not a power of two..

> I tried to add a 3rd NIC in the mix (broadcom tg3), which caused the
> myri10ge to fail to load for an obscure reason after loading igb
> properly :

Oh, this looks a lot like what Thomas reported with his 5 NICs.

I really wonder what could be going on here.....

> Ah, interestingly if I load the NICs in the opposite order, they all load
> properly (myri10ge, igb, r8169) :

Load the NICs means insmod the driver ?

That is repeatable?

Certainly spooky, and suggests a kernel bug.....

It would be interesting to see what register values the driver is
getting back, is it all 0xF? 

I wonder if something is going wrong with the config write to enable
the memory decoder. That is triggered by the driver...

> So overall, it's a big Ack from my side considering the huge
> improvements, let's retry tomorrow with the link up workaround/fix
> to see if the detection issue is related. Great work!

Seems very likely to me, if the modified patch from Neil fixes it for
you too then we need to get that into mergable shape too!

Regards,
Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-10 23:40         ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-10 23:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 11, 2014 at 01:13:36AM +0200, Willy Tarreau wrote:

> So the areas are well covered, though #11 seems larger than needed
> but I seem to remember that they're all rounded up by 1 MB anyway,
> then if so, that's OK.

Right, PCI bridge windows are 1MB aligned

> root at xpgp:~# grep -v disabled /sys/kernel/debug/mvebu-mbus/devices
> [00] 00000000e8010000 - 00000000e8020000 : 0004:00f0 (remap 0000000000010000)
> [08] 00000000fff00000 - 0000000100000000 : 0001:001d
> [09] 00000000f0000000 - 00000000f1000000 : 0001:002f
> [10] 00000000e1800000 - 00000000e1a00000 : 0004:00f8
> [11] 00000000e1a00000 - 00000000e1b00000 : 0004:00f8
> [12] 00000000e0000000 - 00000000e1000000 : 0008:00f8
> [13] 00000000e1000000 - 00000000e1800000 : 0008:00f8
> 
> I noticed above that both igb ports share the same window #11. So I
> tried to rmmod igb, remove both PCI devices, check mbus again (which
> did not change), rescan PCI and modprobe igb again, and everything is
> still operational with the same windows. I don't know if it is normal
> that they're not unregistered when the device goes away (maybe there's
> no refcount) ?

The windows are tied to the PCI core, not to the using driver
module. So they will only changed based on rescan an dynamic resource
assignment in the PCI core. PCI rescan has a 'memory' of the last
bridge windows and won't make dramtic changes, so expect the windows
to fairly sticky.

> If we have to keep them forever, then maybe a further improvement
> will consist in merging adjacent windows which sum up as a power of
> two (eg: #10 and #11 may be merged).

0x1b00000 - 0x1800000 = 0x300000 which is not a power of two..

> I tried to add a 3rd NIC in the mix (broadcom tg3), which caused the
> myri10ge to fail to load for an obscure reason after loading igb
> properly :

Oh, this looks a lot like what Thomas reported with his 5 NICs.

I really wonder what could be going on here.....

> Ah, interestingly if I load the NICs in the opposite order, they all load
> properly (myri10ge, igb, r8169) :

Load the NICs means insmod the driver ?

That is repeatable?

Certainly spooky, and suggests a kernel bug.....

It would be interesting to see what register values the driver is
getting back, is it all 0xF? 

I wonder if something is going wrong with the config write to enable
the memory decoder. That is triggered by the driver...

> So overall, it's a big Ack from my side considering the huge
> improvements, let's retry tomorrow with the link up workaround/fix
> to see if the detection issue is related. Great work!

Seems very likely to me, if the modified patch from Neil fixes it for
you too then we need to get that into mergable shape too!

Regards,
Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 23:40         ` Jason Gunthorpe
@ 2014-04-11  6:23           ` Willy Tarreau
  -1 siblings, 0 replies; 50+ messages in thread
From: Willy Tarreau @ 2014-04-11  6:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Thomas Petazzoni, Lior Amsalem, Neil Greatorex, Matthew Minter,
	Jason Cooper, Tawfik Bayouk, Andrew Lunn, linux-pci,
	Gerlando Falauto, Ezequiel Garcia, Gregory Clément,
	linux-arm-kernel

Hi Jason,

On Thu, Apr 10, 2014 at 05:40:00PM -0600, Jason Gunthorpe wrote:
> The windows are tied to the PCI core, not to the using driver
> module. So they will only changed based on rescan an dynamic resource
> assignment in the PCI core. PCI rescan has a 'memory' of the last
> bridge windows and won't make dramtic changes, so expect the windows
> to fairly sticky.

OK.

> > If we have to keep them forever, then maybe a further improvement
> > will consist in merging adjacent windows which sum up as a power of
> > two (eg: #10 and #11 may be merged).
> 
> 0x1b00000 - 0x1800000 = 0x300000 which is not a power of two..

Of course you're right. It was late last night, and I was having
a hard time thinking the addresses were not inclusive so in my
mind it was 0x18..0x1b inclusive, thus 4MB... Never mind.

> > I tried to add a 3rd NIC in the mix (broadcom tg3), which caused the
> > myri10ge to fail to load for an obscure reason after loading igb
> > properly :
> 
> Oh, this looks a lot like what Thomas reported with his 5 NICs.
> 
> I really wonder what could be going on here.....

I don't know but I have the hardware to easily reproduce it, if we want
to add printks again.

> > Ah, interestingly if I load the NICs in the opposite order, they all load
> > properly (myri10ge, igb, r8169) :
> 
> Load the NICs means insmod the driver ?

Yes.

> That is repeatable?

Yes, 100% it seems.

> Certainly spooky, and suggests a kernel bug.....
> 
> It would be interesting to see what register values the driver is
> getting back, is it all 0xF? 

That's what I suspected from the -1, but since the driver says "or 16MB"
and one of the windows is 16MB, I'm still confused, I need to add some
printk there.

> I wonder if something is going wrong with the config write to enable
> the memory decoder. That is triggered by the driver...

Thomas told me that the mbus driver is able to suggest a different
start address for the PCI windows. Maybe we fall in this case and the
driver doesn't expect this and uses a different register for the start
address.

> > So overall, it's a big Ack from my side considering the huge
> > improvements, let's retry tomorrow with the link up workaround/fix
> > to see if the detection issue is related. Great work!
> 
> Seems very likely to me, if the modified patch from Neil fixes it for
> you too then we need to get that into mergable shape too!

I can confirm that simply commenting out clk_disable_unprepare(clk)
fixes this problem, so yes it's the same issue. Just tried Neil's
modified patch and it works fine as well. So yes, we're making a lot
of progress.

Just in case anyone is interested, this is the NIC I'm using, both
on the mirabox and on the XP-GP ; it was worth an acquisition
considering how many corner cases it triggers in the kernel code :

  http://www.jetway.com.tw/jw/ipcboard_view.asp?productid=873&proname=ADMPEIDLA

Cheers,
Willy


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-11  6:23           ` Willy Tarreau
  0 siblings, 0 replies; 50+ messages in thread
From: Willy Tarreau @ 2014-04-11  6:23 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jason,

On Thu, Apr 10, 2014 at 05:40:00PM -0600, Jason Gunthorpe wrote:
> The windows are tied to the PCI core, not to the using driver
> module. So they will only changed based on rescan an dynamic resource
> assignment in the PCI core. PCI rescan has a 'memory' of the last
> bridge windows and won't make dramtic changes, so expect the windows
> to fairly sticky.

OK.

> > If we have to keep them forever, then maybe a further improvement
> > will consist in merging adjacent windows which sum up as a power of
> > two (eg: #10 and #11 may be merged).
> 
> 0x1b00000 - 0x1800000 = 0x300000 which is not a power of two..

Of course you're right. It was late last night, and I was having
a hard time thinking the addresses were not inclusive so in my
mind it was 0x18..0x1b inclusive, thus 4MB... Never mind.

> > I tried to add a 3rd NIC in the mix (broadcom tg3), which caused the
> > myri10ge to fail to load for an obscure reason after loading igb
> > properly :
> 
> Oh, this looks a lot like what Thomas reported with his 5 NICs.
> 
> I really wonder what could be going on here.....

I don't know but I have the hardware to easily reproduce it, if we want
to add printks again.

> > Ah, interestingly if I load the NICs in the opposite order, they all load
> > properly (myri10ge, igb, r8169) :
> 
> Load the NICs means insmod the driver ?

Yes.

> That is repeatable?

Yes, 100% it seems.

> Certainly spooky, and suggests a kernel bug.....
> 
> It would be interesting to see what register values the driver is
> getting back, is it all 0xF? 

That's what I suspected from the -1, but since the driver says "or 16MB"
and one of the windows is 16MB, I'm still confused, I need to add some
printk there.

> I wonder if something is going wrong with the config write to enable
> the memory decoder. That is triggered by the driver...

Thomas told me that the mbus driver is able to suggest a different
start address for the PCI windows. Maybe we fall in this case and the
driver doesn't expect this and uses a different register for the start
address.

> > So overall, it's a big Ack from my side considering the huge
> > improvements, let's retry tomorrow with the link up workaround/fix
> > to see if the detection issue is related. Great work!
> 
> Seems very likely to me, if the modified patch from Neil fixes it for
> you too then we need to get that into mergable shape too!

I can confirm that simply commenting out clk_disable_unprepare(clk)
fixes this problem, so yes it's the same issue. Just tried Neil's
modified patch and it works fine as well. So yes, we're making a lot
of progress.

Just in case anyone is interested, this is the NIC I'm using, both
on the mirabox and on the XP-GP ; it was worth an acquisition
considering how many corner cases it triggers in the kernel code :

  http://www.jetway.com.tw/jw/ipcboard_view.asp?productid=873&proname=ADMPEIDLA

Cheers,
Willy

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 21:56         ` Neil Greatorex
@ 2014-04-11 10:23           ` Thomas Petazzoni
  -1 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-11 10:23 UTC (permalink / raw)
  To: Neil Greatorex
  Cc: Jason Gunthorpe, Willy Tarreau, Matthew Minter, Gerlando Falauto,
	linux-arm-kernel, Jason Cooper, Gregory Clément,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

Dear Neil Greatorex,

On Thu, 10 Apr 2014 22:56:00 +0100 (BST), Neil Greatorex wrote:
> > In any event, turning on the clock should almost certainly be
> > accompanied by a phy reset sequence to get both link ends on the same
> > page.
> >
> > Attached is a rough, untested patch along those lines.
> >
> 
> I took your attached patch and extended it a bit to print out how long it 
> took. The delays also need to be much longer for me. I also fixed a small 
> typo you made where the bit wasn't being set again to bring the link back 
> up. I've attached the diff to your patch, as well as the combined patch 
> (hope that makes sense).

I managed to reproduce the problem of the PCIe device not being
detected on Mirabox when earlyprintk is disabled.

However, the proposed patch doesn't seem to work:

mvebu-pcie pcie-controller.3: PCIe0.0: performing link reset
mvebu-pcie pcie-controller.3: PCIe0.0: link went down after 100 tries
mvebu-pcie pcie-controller.3: PCIe0.0: link came back up after 0 tries

It waits hundred times for the link to go down, but it never goes down,
and then it doesn't wait at all to go up... because it never went down.

Moreover, I am not entirely convinced that a PHY reset is needed here.
In my tests, doing just a wait after setting the local dev number was
sufficient. The fact is works with earlyprintk also indicates that we
don't need to do any specific action with the PHY, just to wait a bit
of time. I am worried that resetting the PHY might actually take more
time than what is needed, and may have other consequences that we don't
necessarily understand at this point.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-11 10:23           ` Thomas Petazzoni
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-11 10:23 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Neil Greatorex,

On Thu, 10 Apr 2014 22:56:00 +0100 (BST), Neil Greatorex wrote:
> > In any event, turning on the clock should almost certainly be
> > accompanied by a phy reset sequence to get both link ends on the same
> > page.
> >
> > Attached is a rough, untested patch along those lines.
> >
> 
> I took your attached patch and extended it a bit to print out how long it 
> took. The delays also need to be much longer for me. I also fixed a small 
> typo you made where the bit wasn't being set again to bring the link back 
> up. I've attached the diff to your patch, as well as the combined patch 
> (hope that makes sense).

I managed to reproduce the problem of the PCIe device not being
detected on Mirabox when earlyprintk is disabled.

However, the proposed patch doesn't seem to work:

mvebu-pcie pcie-controller.3: PCIe0.0: performing link reset
mvebu-pcie pcie-controller.3: PCIe0.0: link went down after 100 tries
mvebu-pcie pcie-controller.3: PCIe0.0: link came back up after 0 tries

It waits hundred times for the link to go down, but it never goes down,
and then it doesn't wait at all to go up... because it never went down.

Moreover, I am not entirely convinced that a PHY reset is needed here.
In my tests, doing just a wait after setting the local dev number was
sufficient. The fact is works with earlyprintk also indicates that we
don't need to do any specific action with the PHY, just to wait a bit
of time. I am worried that resetting the PHY might actually take more
time than what is needed, and may have other consequences that we don't
necessarily understand at this point.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 16:19 ` Thomas Petazzoni
@ 2014-04-11 14:32   ` Thomas Petazzoni
  -1 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-11 14:32 UTC (permalink / raw)
  To: Jason Gunthorpe, Neil Greatorex, Willy Tarreau, Matthew Minter,
	Gerlando Falauto
  Cc: Lior Amsalem, Andrew Lunn, Jason Cooper, Tawfik Bayouk,
	linux-pci, Ezequiel Garcia, Gregory Clément,
	linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 1371 bytes --]

Hello all,

On Thu, 10 Apr 2014 18:19:53 +0200, Thomas Petazzoni wrote:

> This is an e-mail that attempts to summarize the situation in terms of
> Armada XP PCIe issues.

Attached is a v2 of the patches to fix the various pci-mvebu issues.
Changes since the version posted yesterday:

 * Include a fix for the timing issue of the PCIe interface that gets
   its clock disabled. I've chosen a different approach than the one
   suggested by Jason Gunthorpe, which does not involve resetting the
   PHY. I've tested my fix on the Mirabox, and the Armada 385 DB board
   on which Gregory originally reported the problem (I finally managed
   to reproduce the problem, it was due to the fact that only one of
   the PCIe interfaces is actually affected by the problem, because
   only the clock of the first PCIe interface is used by the
   mvebu-soc-id stuff).

 * Invert the order of Willy's and Jason's patches around MBus
   addresses.

I've also:

 * Pushed the patches at
   https://github.com/MISL-EBU-System-SW/mainline-public/tree/3.14/pci-debug

 * Included a single combined patch, because I know one of you needs
   that to test easily.

Can everybody test these patches, and confirm that they solve all the
outstanding problems?

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[-- Attachment #2: 0001-igb-Fix-Null-pointer-dereference-in-igb_reset_q_vect.patch --]
[-- Type: text/x-patch, Size: 1808 bytes --]

>From 8996f4001da3d364b3d1b0633e059b5ff3da3b5b Mon Sep 17 00:00:00 2001
From: Christoph Paasch <christoph.paasch@uclouvain.be>
Date: Fri, 21 Mar 2014 03:48:19 -0700
Subject: [PATCH 01/13] igb: Fix Null-pointer dereference in igb_reset_q_vector

When igb_set_interrupt_capability() calls
igb_reset_interrupt_capability() (e.g., because CONFIG_PCI_MSI is unset),
num_q_vectors has been set but no vector has yet been allocated.

igb_reset_interrupt_capability() will then call igb_reset_q_vector,
which assumes that the vector is allocated. As this is not the case, we
are accessing a NULL-pointer.

This patch fixes it by checking that q_vector is indeed different from
NULL.

Fixes: 02ef6e1d0b0023 (igb: Fix queue allocation method to accommodate changing during runtime)
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 46d31a4..bfcf192 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1014,6 +1014,12 @@ static void igb_reset_q_vector(struct igb_adapter *adapter, int v_idx)
 {
 	struct igb_q_vector *q_vector = adapter->q_vector[v_idx];
 
+	/* Coming from igb_set_interrupt_capability, the vectors are not yet
+	 * allocated. So, q_vector is NULL so we should stop here.
+	 */
+	if (!q_vector)
+		return;
+
 	if (q_vector->tx.ring)
 		adapter->tx_ring[q_vector->tx.ring->queue_index] = NULL;
 
-- 
1.8.3.2


[-- Attachment #3: 0002-igb-Unset-IGB_FLAG_HAS_MSIX-flag-when-falling-back-t.patch --]
[-- Type: text/x-patch, Size: 5116 bytes --]

>From 9ecd88559e5d53d8923e1942e2a42aa266404554 Mon Sep 17 00:00:00 2001
From: Christoph Paasch <christoph.paasch@uclouvain.be>
Date: Fri, 21 Mar 2014 04:02:09 -0700
Subject: [PATCH 02/13] igb: Unset IGB_FLAG_HAS_MSIX-flag when falling back to
 msi-only

Prior to cd14ef54d25 (igb: Change to use statically allocated array for
MSIx entries), having msix_entries different from NULL was an indicator
that MSIX is enabled.
In igb_set_interrupt_capabiliy we may fall back to MSI-only. Prior to
the above patch msix_entries was set to NULL by
igb_reset_interrupt_capability.

However, now we are checking the flag for IGB_FLAG_HAS_MSIX and so the
stack gets completly confused:

[   42.659791] ------------[ cut here ]------------
[   42.715032] WARNING: CPU: 7 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x15c/0x1fb()
[   42.848263] NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out
[   42.923253] Modules linked in:
[   42.959875] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.14.0-rc2-mptcp #437
[   43.043184] Hardware name: HP ProLiant DL165 G7, BIOS O37 01/26/2011
[   43.119215]  0000000000000108 ffff88023fdc3da8 ffffffff81487847 0000000000000108
[   43.208165]  ffff88023fdc3df8 ffff88023fdc3de8 ffffffff81034e7d ffff88023fdc3dd8
[   43.297120]  ffffffff813fff10 ffff880236018000 ffff880236b178c0 0000000000000008
[   43.386071] Call Trace:
[   43.415303]  <IRQ>  [<ffffffff81487847>] dump_stack+0x49/0x62
[   43.484174]  [<ffffffff81034e7d>] warn_slowpath_common+0x77/0x91
[   43.556049]  [<ffffffff813fff10>] ? dev_watchdog+0x15c/0x1fb
[   43.623759]  [<ffffffff81034f2b>] warn_slowpath_fmt+0x41/0x43
[   43.692511]  [<ffffffff813fff10>] dev_watchdog+0x15c/0x1fb
[   43.758141]  [<ffffffff813ffdb4>] ? __netdev_watchdog_up+0x64/0x64
[   43.832091]  [<ffffffff8103cd04>] call_timer_fn+0x17/0x6f
[   43.896682]  [<ffffffff8103cebe>] run_timer_softirq+0x162/0x1a2
[   43.967511]  [<ffffffff81038520>] __do_softirq+0xcd/0x1cc
[   44.032104]  [<ffffffff81038689>] irq_exit+0x3a/0x48
[   44.091492]  [<ffffffff81026d43>] smp_apic_timer_interrupt+0x43/0x50
[   44.167525]  [<ffffffff8148c24a>] apic_timer_interrupt+0x6a/0x70
[   44.239392]  <EOI>  [<ffffffff8100992c>] ? default_idle+0x6/0x8
[   44.310343]  [<ffffffff81009b31>] arch_cpu_idle+0x13/0x18
[   44.374934]  [<ffffffff81066126>] cpu_startup_entry+0xa7/0x101
[   44.444724]  [<ffffffff81025660>] start_secondary+0x1b2/0x1b7
[   44.513472] ---[ end trace a5a075fd4e7f854f ]---
[   44.568753] igb 0000:04:00.0 eth0: Reset adapter
[   46.206945] random: nonblocking pool is initialized
[   46.465670] irq 44: nobody cared (try booting with the "irqpoll" option)
[   46.545862] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G        W    3.14.0-rc2-mptcp #437
[   46.640610] Hardware name: HP ProLiant DL165 G7, BIOS O37 01/26/2011
[   46.716641]  ffff8802363f8c84 ffff88023fdc3e38 ffffffff81487847 00000000a03cdb6d
[   46.805598]  ffff8802363f8c00 ffff88023fdc3e68 ffffffff81068489 0000007f81825400
[   46.894539]  ffff8802363f8c00 0000000000000000 0000000000000000 ffff88023fdc3ea8
[   46.983484] Call Trace:
[   47.012714]  <IRQ>  [<ffffffff81487847>] dump_stack+0x49/0x62
[   47.081585]  [<ffffffff81068489>] __report_bad_irq+0x35/0xc1
[   47.149295]  [<ffffffff81068683>] note_interrupt+0x16e/0x1ea
[   47.217006]  [<ffffffff8106679e>] handle_irq_event_percpu+0x116/0x12e
[   47.294075]  [<ffffffff810667e9>] handle_irq_event+0x33/0x4f
[   47.361787]  [<ffffffff81068c95>] handle_fasteoi_irq+0x83/0xd1
[   47.431577]  [<ffffffff81003d5b>] handle_irq+0x1f/0x28
[   47.493047]  [<ffffffff81003567>] do_IRQ+0x4e/0xd4
[   47.550358]  [<ffffffff8148b06a>] common_interrupt+0x6a/0x6a
[   47.618066]  <EOI>  [<ffffffff8100992c>] ? default_idle+0x6/0x8
[   47.689016]  [<ffffffff81009b31>] arch_cpu_idle+0x13/0x18
[   47.753605]  [<ffffffff81066126>] cpu_startup_entry+0xa7/0x101
[   47.823397]  [<ffffffff81025660>] start_secondary+0x1b2/0x1b7
[   47.892146] handlers:
[   47.919301] [<ffffffff812fbd7d>] igb_intr

So, this patch unsets the flag to indicate that we are not using MSIX.
This patch does exactly this: Unsetting the flag when falling back to MSI.

Fixes: cd14ef54d25b (igb: Change to use statically allocated array for MSIx entries)
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index bfcf192..d9c7eb2 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1127,6 +1127,7 @@ static void igb_set_interrupt_capability(struct igb_adapter *adapter, bool msix)
 
 	/* If we can't do MSI-X, try MSI */
 msi_only:
+	adapter->flags &= ~IGB_FLAG_HAS_MSIX;
 #ifdef CONFIG_PCI_IOV
 	/* disable SR-IOV for non MSI-X configurations */
 	if (adapter->vf_data) {
-- 
1.8.3.2


[-- Attachment #4: 0003-ARM-mvebu-change-the-default-PCIe-apertures-for-Arma.patch --]
[-- Type: text/x-patch, Size: 2556 bytes --]

>From dc1e59a9dddf9f39b9a68fdd6d75e4517811de3b Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Date: Tue, 4 Mar 2014 17:36:59 +0100
Subject: [PATCH 03/13] ARM: mvebu: change the default PCIe apertures for
 Armada 370/XP

The latest Marvell bootloaders for various boards change the MBus
Window base address from 0xC0000000 to 0xF0000000, in order to make
more RAM in the first 4 GB actually usable by the kernel (RAM that is
covered by the MBus window is "shadowed" and therefore not usable).

However, our default PCIe memory and I/O apertures where sitting at
0xe0000000 (for memory) and 0xe8000000 (for I/O), which will now be
outside of the MBus Window range on those platforms. To make things
work, we have to ensure those apertures use addresses in the
0xF0000000 -> 0xFFFFFFFF range.

Of course this change of the MBus Window base address from 0xC0000000
to 0xF0000000 also comes with a change of the internal register base
address from 0xD0000000 to 0xF1000000.

We have therefore designed the following memory map:

 * 0xF0000000 -> 0xF1000000: 16 MB, used for NOR flashes on Armada XP
   GP and Armada XP DB.

 * 0xF1000000 -> 0xF1100000: 1 MB, used for internal registers.

 * 0xF8000000 -> 0xFFE00000: 126 MB, used for PCIe memory.

 * 0xFFE00000 -> 0xFFF00000: 1 MB, used for PCIe I/O.

 * 0xFFF00000 -> 0xFFFFFFFF: 1 MB, used for the BootROM mapping

There is one exception to this layout: the Armada XP OpenBlocks, which
has a 128 MB NOR flash, mapped from 0xF0000000 to 0xF8000000. This
does not conflict with the current change for the PCIe I/O and memory
apertures, and continues to work because on Armada XP OpenBlocks, the
bootloader is an old one, and continues to have internal registers
mapped at 0xD0000000.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 arch/arm/boot/dts/armada-370-xp.dtsi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/armada-370-xp.dtsi b/arch/arm/boot/dts/armada-370-xp.dtsi
index 74b5964..2188ce6 100644
--- a/arch/arm/boot/dts/armada-370-xp.dtsi
+++ b/arch/arm/boot/dts/armada-370-xp.dtsi
@@ -44,8 +44,8 @@
 		#size-cells = <1>;
 		controller = <&mbusc>;
 		interrupt-parent = <&mpic>;
-		pcie-mem-aperture = <0xe0000000 0x8000000>;
-		pcie-io-aperture  = <0xe8000000 0x100000>;
+		pcie-mem-aperture = <0xf8000000 0x7e00000>;
+		pcie-io-aperture  = <0xffe00000 0x100000>;
 
 		devbus-bootcs {
 			compatible = "marvell,mvebu-devbus";
-- 
1.8.3.2


[-- Attachment #5: 0004-ARM-mvebu-switch-the-Armada-XP-DB-to-use-internal-re.patch --]
[-- Type: text/x-patch, Size: 2498 bytes --]

>From b6ee050ba68e679ffc4549cb13913ba643c52d83 Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Date: Tue, 4 Mar 2014 17:37:00 +0100
Subject: [PATCH 04/13] ARM: mvebu: switch the Armada XP DB to use internal
 registers at 0xf1000000

Marvell has now provided bootloaders that are Device Tree capable for
the Armada XP DB board, and that also remap the internal register base
address to 0xf1000000. In addition, the bootloader now sets the MBus
Window base address to 0xf0000000, but on this board, this change
doesn't make much difference since the board is by default equipped
with 2 GB of RAM.

Therefore this commit updates the soc->ranges Device Tree property
with the fact that the internal registers are now mapped at
0xf1000000.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 arch/arm/boot/dts/armada-xp-db.dts | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/armada-xp-db.dts b/arch/arm/boot/dts/armada-xp-db.dts
index bcf6d79..448373c 100644
--- a/arch/arm/boot/dts/armada-xp-db.dts
+++ b/arch/arm/boot/dts/armada-xp-db.dts
@@ -2,7 +2,7 @@
  * Device Tree file for Marvell Armada XP evaluation board
  * (DB-78460-BP)
  *
- * Copyright (C) 2012 Marvell
+ * Copyright (C) 2012-2014 Marvell
  *
  * Lior Amsalem <alior@marvell.com>
  * Gregory CLEMENT <gregory.clement@free-electrons.com>
@@ -11,6 +11,15 @@
  * This file is licensed under the terms of the GNU General Public
  * License version 2.  This program is licensed "as is" without any
  * warranty of any kind, whether express or implied.
+  *
+ * Note: this Device Tree assumes that the bootloader has remapped the
+ * internal registers to 0xf1000000 (instead of the default
+ * 0xd0000000). The 0xf1000000 is the default used by the recent,
+ * DT-capable, U-Boot bootloaders provided by Marvell. Some earlier
+ * boards were delivered with an older version of the bootloader that
+ * left internal registers mapped at 0xd0000000. If you are in this
+ * situation, you should either update your bootloader (preferred
+ * solution) or the below Device Tree should be adjusted.
  */
 
 /dts-v1/;
@@ -30,7 +39,7 @@
 	};
 
 	soc {
-		ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xd0000000 0x100000
+		ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xf1000000 0x100000
 			  MBUS_ID(0x01, 0x1d) 0 0 0xfff00000 0x100000
 			  MBUS_ID(0x01, 0x2f) 0 0 0xf0000000 0x1000000>;
 
-- 
1.8.3.2


[-- Attachment #6: 0005-ARM-mvebu-switch-the-Armada-XP-GP-to-use-internal-re.patch --]
[-- Type: text/x-patch, Size: 3574 bytes --]

>From 05124e6a87b68c1aa1d138d1b1c0662386b9eab6 Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Date: Tue, 4 Mar 2014 17:37:01 +0100
Subject: [PATCH 05/13] ARM: mvebu: switch the Armada XP GP to use internal
 registers at 0xf1000000

Marvell has now provided bootloaders that are Device Tree capable for
the Armada XP GP board, and that also remap the internal register base
address to 0xf1000000. In addition, the bootloader now sets the MBus
Window base address to 0xf0000000, which allows to use much more RAM
in the last GB of RAM before the 4 GB limit (the entire space from
0xC0000000 to 0xFFFFFFFF was not usable due to being used for I/O, not
only the space from 0xF0000000 to 0xFFFFFFFF is used for I/O).

Therefore this commit:

 * Updates the memory->reg Device Tree property with the fact that in
   the first bank of RAM, memory up to 0xf0000000 can be used.

 * Updates the soc->ranges Device Tree property with the fact that the
   internal registers are now mapped at 0xf1000000.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 arch/arm/boot/dts/armada-xp-gp.dts | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/arm/boot/dts/armada-xp-gp.dts b/arch/arm/boot/dts/armada-xp-gp.dts
index 274e2ad..61bda68 100644
--- a/arch/arm/boot/dts/armada-xp-gp.dts
+++ b/arch/arm/boot/dts/armada-xp-gp.dts
@@ -2,7 +2,7 @@
  * Device Tree file for Marvell Armada XP development board
  * (DB-MV784MP-GP)
  *
- * Copyright (C) 2013 Marvell
+ * Copyright (C) 2013-2014 Marvell
  *
  * Lior Amsalem <alior@marvell.com>
  * Gregory CLEMENT <gregory.clement@free-electrons.com>
@@ -11,6 +11,15 @@
  * This file is licensed under the terms of the GNU General Public
  * License version 2.  This program is licensed "as is" without any
  * warranty of any kind, whether express or implied.
+ *
+ * Note: this Device Tree assumes that the bootloader has remapped the
+ * internal registers to 0xf1000000 (instead of the default
+ * 0xd0000000). The 0xf1000000 is the default used by the recent,
+ * DT-capable, U-Boot bootloaders provided by Marvell. Some earlier
+ * boards were delivered with an older version of the bootloader that
+ * left internal registers mapped at 0xd0000000. If you are in this
+ * situation, you should either update your bootloader (preferred
+ * solution) or the below Device Tree should be adjusted.
  */
 
 /dts-v1/;
@@ -30,16 +39,17 @@
                  * 8 GB of plug-in RAM modules by default.The amount
                  * of memory available can be changed by the
                  * bootloader according the size of the module
-                 * actually plugged. Only 7GB are usable because
-                 * addresses from 0xC0000000 to 0xffffffff are used by
-                 * the internal registers of the SoC.
+                 * actually plugged. However, memory between
+                 * 0xF0000000 to 0xFFFFFFFF cannot be used, as it is
+                 * the address range used for I/O (internal registers,
+                 * MBus windows).
 		 */
-		reg = <0x00000000 0x00000000 0x00000000 0xC0000000>,
+		reg = <0x00000000 0x00000000 0x00000000 0xf0000000>,
 		      <0x00000001 0x00000000 0x00000001 0x00000000>;
 	};
 
 	soc {
-		ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xd0000000 0x100000
+		ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xf1000000 0x100000
 			  MBUS_ID(0x01, 0x1d) 0 0 0xfff00000 0x100000
 			  MBUS_ID(0x01, 0x2f) 0 0 0xf0000000 0x1000000>;
 
-- 
1.8.3.2


[-- Attachment #7: 0006-irqchip-armada-370-xp-fix-invalid-cast-of-signed-val.patch --]
[-- Type: text/x-patch, Size: 1377 bytes --]

>From 5881c7df1167750b103a3612876c0d1c37b55345 Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Date: Tue, 8 Apr 2014 17:33:31 +0200
Subject: [PATCH 06/13] irqchip: armada-370-xp: fix invalid cast of signed
 value into unsigned variable

The armada_370_xp_alloc_msi() function returns a signed int, which is
negative on error. However, we store the return value into an
irq_hw_number_t, which is unsigned. Therefore, we actually never test
if armada_370_xp_alloc_msi() returns an error or not, which may lead
us to use hwirq numbers of as 0xffffffe4 (when
armada_370_xp_alloc_msi() returns -ENOSPC).

This commit fixes that by storing the return value of
armada_370_xp_alloc_msi() in a signed variable.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 drivers/irqchip/irq-armada-370-xp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
index 5409564..7858729 100644
--- a/drivers/irqchip/irq-armada-370-xp.c
+++ b/drivers/irqchip/irq-armada-370-xp.c
@@ -130,8 +130,7 @@ static int armada_370_xp_setup_msi_irq(struct msi_chip *chip,
 				       struct msi_desc *desc)
 {
 	struct msi_msg msg;
-	irq_hw_number_t hwirq;
-	int virq;
+	int virq, hwirq;
 
 	hwirq = armada_370_xp_alloc_msi();
 	if (hwirq < 0)
-- 
1.8.3.2


[-- Attachment #8: 0007-irqchip-armada-370-xp-implement-the-check_device-msi.patch --]
[-- Type: text/x-patch, Size: 1700 bytes --]

>From 8ee3f1764fdc1c49b4199b36d21e5f40f53eca4f Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Date: Tue, 8 Apr 2014 17:35:23 +0200
Subject: [PATCH 07/13] irqchip: armada-370-xp: implement the ->check_device()
 msi_chip operation

Until now, we were leaving the ->check_device() msi_chip operation
empty, which leads the PCI core to believe that we support both MSI
and MSI-X. In fact, we do not support MSI-X, so we have to tell this
to the PCI core by providing an implementation of this operation.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 drivers/irqchip/irq-armada-370-xp.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
index 7858729..5d925af 100644
--- a/drivers/irqchip/irq-armada-370-xp.c
+++ b/drivers/irqchip/irq-armada-370-xp.c
@@ -160,6 +160,15 @@ static void armada_370_xp_teardown_msi_irq(struct msi_chip *chip,
 	armada_370_xp_free_msi(d->hwirq);
 }
 
+static int armada_370_xp_check_msi_device(struct msi_chip *chip, struct pci_dev *dev,
+					  int nvec, int type)
+{
+	/* We support MSI, but not MSI-X */
+	if (type == PCI_CAP_ID_MSI)
+		return 0;
+	return -EINVAL;
+}
+
 static struct irq_chip armada_370_xp_msi_irq_chip = {
 	.name = "armada_370_xp_msi_irq",
 	.irq_enable = unmask_msi_irq,
@@ -198,6 +207,7 @@ static int armada_370_xp_msi_init(struct device_node *node,
 
 	msi_chip->setup_irq = armada_370_xp_setup_msi_irq;
 	msi_chip->teardown_irq = armada_370_xp_teardown_msi_irq;
+	msi_chip->check_device = armada_370_xp_check_msi_device;
 	msi_chip->of_node = node;
 
 	armada_370_xp_msi_domain =
-- 
1.8.3.2


[-- Attachment #9: 0008-irqchip-armada-370-xp-Fix-releasing-of-MSIs.patch --]
[-- Type: text/x-patch, Size: 1273 bytes --]

>From 3521e42c3b6880fc466ef431e766d71bf77b206e Mon Sep 17 00:00:00 2001
From: Neil Greatorex <neil@fatboyfat.co.uk>
Date: Sun, 6 Apr 2014 16:10:43 +0100
Subject: [PATCH 08/13] irqchip: armada-370-xp: Fix releasing of MSIs

Store the value of d->hwirq in a local variable as the real value is wiped out
by calling irq_dispose_mapping. Without this patch, the armada_370_xp_free_msi
function would always free MSI#0, no matter what was passed to it.

Signed-off-by: Neil Greatorex <neil@fatboyfat.co.uk>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 drivers/irqchip/irq-armada-370-xp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
index 5d925af..939eb0d 100644
--- a/drivers/irqchip/irq-armada-370-xp.c
+++ b/drivers/irqchip/irq-armada-370-xp.c
@@ -156,8 +156,10 @@ static void armada_370_xp_teardown_msi_irq(struct msi_chip *chip,
 					   unsigned int irq)
 {
 	struct irq_data *d = irq_get_irq_data(irq);
+	unsigned long hwirq = d->hwirq;
+
 	irq_dispose_mapping(irq);
-	armada_370_xp_free_msi(d->hwirq);
+	armada_370_xp_free_msi(hwirq);
 }
 
 static int armada_370_xp_check_msi_device(struct msi_chip *chip, struct pci_dev *dev,
-- 
1.8.3.2


[-- Attachment #10: 0009-pci-mvebu-fix-off-by-one-in-the-computed-size-of-the.patch --]
[-- Type: text/x-patch, Size: 1860 bytes --]

>From 7c1f808988da77a1d315459f735609426abb2c41 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w@1wt.eu>
Date: Wed, 9 Apr 2014 08:05:09 +0200
Subject: [PATCH 09/13] pci: mvebu: fix off-by-one in the computed size of the
 mbus windows

mvebu_pcie_handle_membase_change() and
mvebu_pcie_handle_iobase_change() do not correctly compute the window
size. PCI uses an inclusive start/end address pair, which requires a
+1 when converting to size.

This only worked because a bug in the mbus driver allowed it to
silently accept and round up bogus sizes.

Fix this by adding one to the computed size.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 drivers/pci/host/pci-mvebu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index 0e79665..eff0ab5 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -329,7 +329,7 @@ static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
 	port->iowin_base = port->pcie->io.start + iobase;
 	port->iowin_size = ((0xFFF | ((port->bridge.iolimit & 0xF0) << 8) |
 			    (port->bridge.iolimitupper << 16)) -
-			    iobase);
+			    iobase) + 1;
 
 	mvebu_mbus_add_window_remap_by_id(port->io_target, port->io_attr,
 					  port->iowin_base, port->iowin_size,
@@ -362,7 +362,7 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
 	port->memwin_base  = ((port->bridge.membase & 0xFFF0) << 16);
 	port->memwin_size  =
 		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
-		port->memwin_base;
+		port->memwin_base + 1;
 
 	mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
 				    port->memwin_base, port->memwin_size);
-- 
1.8.3.2


[-- Attachment #11: 0010-bus-mvebu-mbus-Avoid-setting-an-undefined-window-siz.patch --]
[-- Type: text/x-patch, Size: 2238 bytes --]

>From e3eaec9f807213a6321c44780a09187c93f0145a Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Date: Tue, 8 Apr 2014 17:44:14 -0600
Subject: [PATCH 10/13] bus: mvebu-mbus: Avoid setting an undefined window size

The mbus hardware requires a power of two size, and size aligned base.
Currently, if a non-power of two is passed in to the low level routines
they configure the register in a way that results in undefined behaviour.

Call WARN and return EINVAL instead.

Also, update the debugfs routines to show a message if there is an
invalid register setting.

All together this makes the recent problems with silent failure
of PCI very obvious, noisy and debuggable.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 drivers/bus/mvebu-mbus.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
index 725c461..15a398d 100644
--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -56,6 +56,7 @@
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/debugfs.h>
+#include <linux/log2.h>
 
 /*
  * DDR target is the same on all platforms.
@@ -266,6 +267,17 @@ static int mvebu_mbus_setup_window(struct mvebu_mbus_state *mbus,
 		mbus->soc->win_cfg_offset(win);
 	u32 ctrl, remap_addr;
 
+	if (!is_power_of_2(size)) {
+		WARN(true, "Invalid MBus window size: 0x%zx\n", size);
+		return -EINVAL;
+	}
+
+	if ((base & (phys_addr_t)(size - 1)) != 0) {
+		WARN(true, "Invalid MBus base/size: %pa len 0x%zx\n", &base,
+		     size);
+		return -EINVAL;
+	}
+
 	ctrl = ((size - 1) & WIN_CTRL_SIZE_MASK) |
 		(attr << WIN_CTRL_ATTR_SHIFT)    |
 		(target << WIN_CTRL_TGT_SHIFT)   |
@@ -413,6 +425,10 @@ static int mvebu_devs_debug_show(struct seq_file *seq, void *v)
 			   win, (unsigned long long)wbase,
 			   (unsigned long long)(wbase + wsize), wtarget, wattr);
 
+		if (!is_power_of_2(wsize) ||
+		    ((wbase & (u64)(wsize - 1)) != 0))
+			seq_puts(seq, " (Invalid base/size!!)");
+
 		if (win < mbus->soc->num_remappable_wins) {
 			seq_printf(seq, " (remap %016llx)\n",
 				   (unsigned long long)wremap);
-- 
1.8.3.2


[-- Attachment #12: 0011-bus-mvebu-mbus-allow-several-windows-with-the-same-t.patch --]
[-- Type: text/x-patch, Size: 1136 bytes --]

>From d6184385d76474408ad9922be57d0d47525094e0 Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Date: Thu, 10 Apr 2014 16:15:15 +0200
Subject: [PATCH 11/13] bus: mvebu-mbus: allow several windows with the same
 target/attribute

Having multiple windows with the same target and attribute is actually
legal, and can be useful for PCIe windows, when PCIe BARs have a size
that isn't a power of two, and we therefore need to create several
MBus windows to cover the PCIe BAR for a given PCIe interface.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 drivers/bus/mvebu-mbus.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
index 15a398d..54d339e 100644
--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -223,12 +223,6 @@ static int mvebu_mbus_window_conflicts(struct mvebu_mbus_state *mbus,
 		 */
 		if ((u64)base < wend && end > wbase)
 			return 0;
-
-		/*
-		 * Check if target/attribute conflicts
-		 */
-		if (target == wtarget && attr == wattr)
-			return 0;
 	}
 
 	return 1;
-- 
1.8.3.2


[-- Attachment #13: 0012-pci-pci-mvebu-split-PCIe-BARs-into-multiple-MBus-win.patch --]
[-- Type: text/x-patch, Size: 6126 bytes --]

>From d5c41e956a19f4f7c69bc002291a866f4ad3e145 Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Date: Thu, 10 Apr 2014 16:16:16 +0200
Subject: [PATCH 12/13] pci: pci-mvebu: split PCIe BARs into multiple MBus
 windows when needed

MBus windows are used on Marvell platforms to map certain peripherals
in the physical address space. In the PCIe context, MBus windows are
needed to map PCIe I/O and memory regions in the physical address.

However, those MBus windows can only have power of two sizes, while
PCIe BAR do not necessarily guarantee this. For this reason, the
current pci-mvebu breaks on platforms where PCIe devices have BARs
that don't sum up to a power of two size at the emulated bridge level.

This commit fixes this by allowing the pci-mvebu driver to create
multiple contiguous MBus windows (each having a power of two size) to
cover a given PCIe BAR.

To achieve this, two functions are added: mvebu_pcie_add_windows() and
mvebu_pcie_del_windows() to respectively add and remove all the MBus
windows that are needed to map the provided PCIe region base and
size. The emulated PCI bridge code now calls those functions, instead
of directly calling the mvebu-mbus driver functions.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 drivers/pci/host/pci-mvebu.c | 88 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 74 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index eff0ab5..487c926 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -291,6 +291,58 @@ static int mvebu_pcie_hw_wr_conf(struct mvebu_pcie_port *port,
 	return PCIBIOS_SUCCESSFUL;
 }
 
+/*
+ * Remove windows, starting from the largest ones to the smallest
+ * ones.
+ */
+static void mvebu_pcie_del_windows(struct mvebu_pcie_port *port,
+				   phys_addr_t base, size_t size)
+{
+	while (size) {
+		size_t sz = 1 << (fls(size) - 1);
+
+		mvebu_mbus_del_window(base, sz);
+		base += sz;
+		size -= sz;
+	}
+}
+
+/*
+ * MBus windows can only have a power of two size, but PCI BARs do not
+ * have this constraint. Therefore, we have to split the PCI BAR into
+ * areas each having a power of two size. We start from the largest
+ * one (i.e highest order bit set in the size).
+ */
+static void mvebu_pcie_add_windows(struct mvebu_pcie_port *port,
+				   unsigned int target, unsigned int attribute,
+				   phys_addr_t base, size_t size,
+				   phys_addr_t remap)
+{
+	size_t size_mapped = 0;
+
+	while (size) {
+		size_t sz = 1 << (fls(size) - 1);
+		int ret;
+
+		ret = mvebu_mbus_add_window_remap_by_id(target, attribute, base,
+							sz, remap);
+		if (ret) {
+			dev_err(&port->pcie->pdev->dev,
+				"Could not create MBus window at 0x%x, size 0x%x: %d\n",
+				base, sz, ret);
+			mvebu_pcie_del_windows(port, base - size_mapped,
+					       size_mapped);
+			return;
+		}
+
+		size -= sz;
+		size_mapped += sz;
+		base += sz;
+		if (remap != MVEBU_MBUS_NO_REMAP)
+			remap += sz;
+	}
+}
+
 static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
 {
 	phys_addr_t iobase;
@@ -302,8 +354,8 @@ static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
 
 		/* If a window was configured, remove it */
 		if (port->iowin_base) {
-			mvebu_mbus_del_window(port->iowin_base,
-					      port->iowin_size);
+			mvebu_pcie_del_windows(port, port->iowin_base,
+					       port->iowin_size);
 			port->iowin_base = 0;
 			port->iowin_size = 0;
 		}
@@ -331,9 +383,9 @@ static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
 			    (port->bridge.iolimitupper << 16)) -
 			    iobase) + 1;
 
-	mvebu_mbus_add_window_remap_by_id(port->io_target, port->io_attr,
-					  port->iowin_base, port->iowin_size,
-					  iobase);
+	mvebu_pcie_add_windows(port, port->io_target, port->io_attr,
+			       port->iowin_base, port->iowin_size,
+			       iobase);
 }
 
 static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
@@ -344,8 +396,8 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
 
 		/* If a window was configured, remove it */
 		if (port->memwin_base) {
-			mvebu_mbus_del_window(port->memwin_base,
-					      port->memwin_size);
+			mvebu_pcie_del_windows(port, port->memwin_base,
+					       port->memwin_size);
 			port->memwin_base = 0;
 			port->memwin_size = 0;
 		}
@@ -364,8 +416,9 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
 		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
 		port->memwin_base + 1;
 
-	mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
-				    port->memwin_base, port->memwin_size);
+	mvebu_pcie_add_windows(port, port->mem_target, port->mem_attr,
+			       port->memwin_base, port->memwin_size,
+			       MVEBU_MBUS_NO_REMAP);
 }
 
 /*
@@ -721,14 +774,21 @@ static resource_size_t mvebu_pcie_align_resource(struct pci_dev *dev,
 
 	/*
 	 * On the PCI-to-PCI bridge side, the I/O windows must have at
-	 * least a 64 KB size and be aligned on their size, and the
-	 * memory windows must have at least a 1 MB size and be
-	 * aligned on their size
+	 * least a 64 KB size and the memory windows must have at
+	 * least a 1 MB size. Moreover, MBus windows need to have a
+	 * base address aligned on their size, and their size must be
+	 * a power of two. This means that if the BAR doesn't have a
+	 * power of two size, several MBus windows will actually be
+	 * created. We need to ensure that the biggest MBus window
+	 * (which will be the first one) is aligned on its size, which
+	 * explains the rounddown_pow_of_two() being done here.
 	 */
 	if (res->flags & IORESOURCE_IO)
-		return round_up(start, max_t(resource_size_t, SZ_64K, size));
+		return round_up(start, max_t(resource_size_t, SZ_64K,
+					     rounddown_pow_of_two(size)));
 	else if (res->flags & IORESOURCE_MEM)
-		return round_up(start, max_t(resource_size_t, SZ_1M, size));
+		return round_up(start, max_t(resource_size_t, SZ_1M,
+					     rounddown_pow_of_two(size)));
 	else
 		return start;
 }
-- 
1.8.3.2


[-- Attachment #14: 0013-pci-pci-mvebu-wait-for-a-device-to-appear-to-fix-clo.patch --]
[-- Type: text/x-patch, Size: 3592 bytes --]

>From 72706b22617da7ef8dcf1924f4c204383111401c Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Date: Fri, 11 Apr 2014 15:22:36 +0200
Subject: [PATCH 13/13] pci: pci-mvebu: wait for a device to appear to fix
 clock issues

With the introduction of the mvebu-soc-id mechanism in
arch/arm/mach-mvebu/, the PCIe clocks can be gated during early boot,
and then re-enabled later when the pci-mvebu driver gets
called. However, after the clock has been enabled, it takes some time
for the PCIe device to become visible: this is causing problems on
some platforms where PCIe devices may not be detected at boot time due
to this.

To fix this, this commit introduces a simple loop that waits for a
valid device to actually show up on each PCIe interface for which the
link is up.

It fixes a problem reported both by Gregory Clement and Neil
Greatorex, which were seeing all PCIe devices detected when
earlyprintk was enabled, but one of the device was missing when
earlyprintk was disabled. This was due to the fact that earlyprintk
was slowing down the boot sufficiently to make the problem invisible.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reported-by: Neil Greatorex <neil@fatboyfat.co.uk>
Reported-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
 drivers/pci/host/pci-mvebu.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index 487c926..008d718 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -21,6 +21,7 @@
 #include <linux/of_gpio.h>
 #include <linux/of_pci.h>
 #include <linux/of_platform.h>
+#include <linux/clk-provider.h>
 
 /*
  * PCIe unit register offsets.
@@ -162,6 +163,14 @@ static void mvebu_pcie_set_local_bus_nr(struct mvebu_pcie_port *port, int nr)
 	mvebu_writel(port, stat, PCIE_STAT_OFF);
 }
 
+static u32 mvebu_pcie_get_local_bus_nr(struct mvebu_pcie_port *port)
+{
+	u32 stat;
+
+	stat = mvebu_readl(port, PCIE_STAT_OFF);
+	return (stat & PCIE_STAT_BUS) >> 8;
+}
+
 static void mvebu_pcie_set_local_dev_nr(struct mvebu_pcie_port *port, int nr)
 {
 	u32 stat;
@@ -172,6 +181,30 @@ static void mvebu_pcie_set_local_dev_nr(struct mvebu_pcie_port *port, int nr)
 	mvebu_writel(port, stat, PCIE_STAT_OFF);
 }
 
+static void mvebu_pcie_wait_dev(struct mvebu_pcie_port *port)
+{
+	int tries;
+
+	for (tries = 0; tries < 1000; tries++) {
+		u32 vpid;
+
+		mvebu_writel(port,
+			     PCIE_CONF_ADDR(mvebu_pcie_get_local_bus_nr(port),
+					    PCI_DEVFN(0, 0), PCI_VENDOR_ID),
+			     PCIE_CONF_ADDR_OFF);
+		vpid = mvebu_readl(port, PCIE_CONF_DATA_OFF);
+
+		if (vpid != 0xffffffff)
+			break;
+
+		udelay(100);
+	}
+
+	if (tries >= 1000)
+		dev_warn(&port->pcie->pdev->dev,
+			 "timeout when looking for the PCIe device\n");
+}
+
 /*
  * Setup PCIE BARs and Address Decode Wins:
  * BAR[0,2] -> disabled, BAR[1] -> covers all DRAM banks
@@ -951,6 +984,7 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
 	for_each_child_of_node(pdev->dev.of_node, child) {
 		struct mvebu_pcie_port *port = &pcie->ports[i];
 		enum of_gpio_flags flags;
+		int linkup;
 
 		if (!of_device_is_available(child))
 			continue;
@@ -1035,8 +1069,13 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
 			continue;
 		}
 
+		linkup = mvebu_pcie_link_up(port);
+
 		mvebu_pcie_set_local_dev_nr(port, 1);
 
+		if (linkup)
+			mvebu_pcie_wait_dev(port);
+
 		port->dn = child;
 		spin_lock_init(&port->conf_lock);
 		mvebu_sw_pci_bridge_init(port);
-- 
1.8.3.2


[-- Attachment #15: combined.patch --]
[-- Type: text/x-patch, Size: 15073 bytes --]

diff --git a/arch/arm/boot/dts/armada-370-xp.dtsi b/arch/arm/boot/dts/armada-370-xp.dtsi
index 74b5964..2188ce6 100644
--- a/arch/arm/boot/dts/armada-370-xp.dtsi
+++ b/arch/arm/boot/dts/armada-370-xp.dtsi
@@ -44,8 +44,8 @@
 		#size-cells = <1>;
 		controller = <&mbusc>;
 		interrupt-parent = <&mpic>;
-		pcie-mem-aperture = <0xe0000000 0x8000000>;
-		pcie-io-aperture  = <0xe8000000 0x100000>;
+		pcie-mem-aperture = <0xf8000000 0x7e00000>;
+		pcie-io-aperture  = <0xffe00000 0x100000>;
 
 		devbus-bootcs {
 			compatible = "marvell,mvebu-devbus";
diff --git a/arch/arm/boot/dts/armada-xp-db.dts b/arch/arm/boot/dts/armada-xp-db.dts
index bcf6d79..448373c 100644
--- a/arch/arm/boot/dts/armada-xp-db.dts
+++ b/arch/arm/boot/dts/armada-xp-db.dts
@@ -2,7 +2,7 @@
  * Device Tree file for Marvell Armada XP evaluation board
  * (DB-78460-BP)
  *
- * Copyright (C) 2012 Marvell
+ * Copyright (C) 2012-2014 Marvell
  *
  * Lior Amsalem <alior@marvell.com>
  * Gregory CLEMENT <gregory.clement@free-electrons.com>
@@ -11,6 +11,15 @@
  * This file is licensed under the terms of the GNU General Public
  * License version 2.  This program is licensed "as is" without any
  * warranty of any kind, whether express or implied.
+  *
+ * Note: this Device Tree assumes that the bootloader has remapped the
+ * internal registers to 0xf1000000 (instead of the default
+ * 0xd0000000). The 0xf1000000 is the default used by the recent,
+ * DT-capable, U-Boot bootloaders provided by Marvell. Some earlier
+ * boards were delivered with an older version of the bootloader that
+ * left internal registers mapped at 0xd0000000. If you are in this
+ * situation, you should either update your bootloader (preferred
+ * solution) or the below Device Tree should be adjusted.
  */
 
 /dts-v1/;
@@ -30,7 +39,7 @@
 	};
 
 	soc {
-		ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xd0000000 0x100000
+		ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xf1000000 0x100000
 			  MBUS_ID(0x01, 0x1d) 0 0 0xfff00000 0x100000
 			  MBUS_ID(0x01, 0x2f) 0 0 0xf0000000 0x1000000>;
 
diff --git a/arch/arm/boot/dts/armada-xp-gp.dts b/arch/arm/boot/dts/armada-xp-gp.dts
index 274e2ad..61bda68 100644
--- a/arch/arm/boot/dts/armada-xp-gp.dts
+++ b/arch/arm/boot/dts/armada-xp-gp.dts
@@ -2,7 +2,7 @@
  * Device Tree file for Marvell Armada XP development board
  * (DB-MV784MP-GP)
  *
- * Copyright (C) 2013 Marvell
+ * Copyright (C) 2013-2014 Marvell
  *
  * Lior Amsalem <alior@marvell.com>
  * Gregory CLEMENT <gregory.clement@free-electrons.com>
@@ -11,6 +11,15 @@
  * This file is licensed under the terms of the GNU General Public
  * License version 2.  This program is licensed "as is" without any
  * warranty of any kind, whether express or implied.
+ *
+ * Note: this Device Tree assumes that the bootloader has remapped the
+ * internal registers to 0xf1000000 (instead of the default
+ * 0xd0000000). The 0xf1000000 is the default used by the recent,
+ * DT-capable, U-Boot bootloaders provided by Marvell. Some earlier
+ * boards were delivered with an older version of the bootloader that
+ * left internal registers mapped at 0xd0000000. If you are in this
+ * situation, you should either update your bootloader (preferred
+ * solution) or the below Device Tree should be adjusted.
  */
 
 /dts-v1/;
@@ -30,16 +39,17 @@
                  * 8 GB of plug-in RAM modules by default.The amount
                  * of memory available can be changed by the
                  * bootloader according the size of the module
-                 * actually plugged. Only 7GB are usable because
-                 * addresses from 0xC0000000 to 0xffffffff are used by
-                 * the internal registers of the SoC.
+                 * actually plugged. However, memory between
+                 * 0xF0000000 to 0xFFFFFFFF cannot be used, as it is
+                 * the address range used for I/O (internal registers,
+                 * MBus windows).
 		 */
-		reg = <0x00000000 0x00000000 0x00000000 0xC0000000>,
+		reg = <0x00000000 0x00000000 0x00000000 0xf0000000>,
 		      <0x00000001 0x00000000 0x00000001 0x00000000>;
 	};
 
 	soc {
-		ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xd0000000 0x100000
+		ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xf1000000 0x100000
 			  MBUS_ID(0x01, 0x1d) 0 0 0xfff00000 0x100000
 			  MBUS_ID(0x01, 0x2f) 0 0 0xf0000000 0x1000000>;
 
diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
index 725c461..54d339e 100644
--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -56,6 +56,7 @@
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/debugfs.h>
+#include <linux/log2.h>
 
 /*
  * DDR target is the same on all platforms.
@@ -222,12 +223,6 @@ static int mvebu_mbus_window_conflicts(struct mvebu_mbus_state *mbus,
 		 */
 		if ((u64)base < wend && end > wbase)
 			return 0;
-
-		/*
-		 * Check if target/attribute conflicts
-		 */
-		if (target == wtarget && attr == wattr)
-			return 0;
 	}
 
 	return 1;
@@ -266,6 +261,17 @@ static int mvebu_mbus_setup_window(struct mvebu_mbus_state *mbus,
 		mbus->soc->win_cfg_offset(win);
 	u32 ctrl, remap_addr;
 
+	if (!is_power_of_2(size)) {
+		WARN(true, "Invalid MBus window size: 0x%zx\n", size);
+		return -EINVAL;
+	}
+
+	if ((base & (phys_addr_t)(size - 1)) != 0) {
+		WARN(true, "Invalid MBus base/size: %pa len 0x%zx\n", &base,
+		     size);
+		return -EINVAL;
+	}
+
 	ctrl = ((size - 1) & WIN_CTRL_SIZE_MASK) |
 		(attr << WIN_CTRL_ATTR_SHIFT)    |
 		(target << WIN_CTRL_TGT_SHIFT)   |
@@ -413,6 +419,10 @@ static int mvebu_devs_debug_show(struct seq_file *seq, void *v)
 			   win, (unsigned long long)wbase,
 			   (unsigned long long)(wbase + wsize), wtarget, wattr);
 
+		if (!is_power_of_2(wsize) ||
+		    ((wbase & (u64)(wsize - 1)) != 0))
+			seq_puts(seq, " (Invalid base/size!!)");
+
 		if (win < mbus->soc->num_remappable_wins) {
 			seq_printf(seq, " (remap %016llx)\n",
 				   (unsigned long long)wremap);
diff --git a/drivers/irqchip/irq-armada-370-xp.c b/drivers/irqchip/irq-armada-370-xp.c
index 5409564..939eb0d 100644
--- a/drivers/irqchip/irq-armada-370-xp.c
+++ b/drivers/irqchip/irq-armada-370-xp.c
@@ -130,8 +130,7 @@ static int armada_370_xp_setup_msi_irq(struct msi_chip *chip,
 				       struct msi_desc *desc)
 {
 	struct msi_msg msg;
-	irq_hw_number_t hwirq;
-	int virq;
+	int virq, hwirq;
 
 	hwirq = armada_370_xp_alloc_msi();
 	if (hwirq < 0)
@@ -157,8 +156,19 @@ static void armada_370_xp_teardown_msi_irq(struct msi_chip *chip,
 					   unsigned int irq)
 {
 	struct irq_data *d = irq_get_irq_data(irq);
+	unsigned long hwirq = d->hwirq;
+
 	irq_dispose_mapping(irq);
-	armada_370_xp_free_msi(d->hwirq);
+	armada_370_xp_free_msi(hwirq);
+}
+
+static int armada_370_xp_check_msi_device(struct msi_chip *chip, struct pci_dev *dev,
+					  int nvec, int type)
+{
+	/* We support MSI, but not MSI-X */
+	if (type == PCI_CAP_ID_MSI)
+		return 0;
+	return -EINVAL;
 }
 
 static struct irq_chip armada_370_xp_msi_irq_chip = {
@@ -199,6 +209,7 @@ static int armada_370_xp_msi_init(struct device_node *node,
 
 	msi_chip->setup_irq = armada_370_xp_setup_msi_irq;
 	msi_chip->teardown_irq = armada_370_xp_teardown_msi_irq;
+	msi_chip->check_device = armada_370_xp_check_msi_device;
 	msi_chip->of_node = node;
 
 	armada_370_xp_msi_domain =
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 46d31a4..d9c7eb2 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1014,6 +1014,12 @@ static void igb_reset_q_vector(struct igb_adapter *adapter, int v_idx)
 {
 	struct igb_q_vector *q_vector = adapter->q_vector[v_idx];
 
+	/* Coming from igb_set_interrupt_capability, the vectors are not yet
+	 * allocated. So, q_vector is NULL so we should stop here.
+	 */
+	if (!q_vector)
+		return;
+
 	if (q_vector->tx.ring)
 		adapter->tx_ring[q_vector->tx.ring->queue_index] = NULL;
 
@@ -1121,6 +1127,7 @@ static void igb_set_interrupt_capability(struct igb_adapter *adapter, bool msix)
 
 	/* If we can't do MSI-X, try MSI */
 msi_only:
+	adapter->flags &= ~IGB_FLAG_HAS_MSIX;
 #ifdef CONFIG_PCI_IOV
 	/* disable SR-IOV for non MSI-X configurations */
 	if (adapter->vf_data) {
diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index 0e79665..008d718 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -21,6 +21,7 @@
 #include <linux/of_gpio.h>
 #include <linux/of_pci.h>
 #include <linux/of_platform.h>
+#include <linux/clk-provider.h>
 
 /*
  * PCIe unit register offsets.
@@ -162,6 +163,14 @@ static void mvebu_pcie_set_local_bus_nr(struct mvebu_pcie_port *port, int nr)
 	mvebu_writel(port, stat, PCIE_STAT_OFF);
 }
 
+static u32 mvebu_pcie_get_local_bus_nr(struct mvebu_pcie_port *port)
+{
+	u32 stat;
+
+	stat = mvebu_readl(port, PCIE_STAT_OFF);
+	return (stat & PCIE_STAT_BUS) >> 8;
+}
+
 static void mvebu_pcie_set_local_dev_nr(struct mvebu_pcie_port *port, int nr)
 {
 	u32 stat;
@@ -172,6 +181,30 @@ static void mvebu_pcie_set_local_dev_nr(struct mvebu_pcie_port *port, int nr)
 	mvebu_writel(port, stat, PCIE_STAT_OFF);
 }
 
+static void mvebu_pcie_wait_dev(struct mvebu_pcie_port *port)
+{
+	int tries;
+
+	for (tries = 0; tries < 1000; tries++) {
+		u32 vpid;
+
+		mvebu_writel(port,
+			     PCIE_CONF_ADDR(mvebu_pcie_get_local_bus_nr(port),
+					    PCI_DEVFN(0, 0), PCI_VENDOR_ID),
+			     PCIE_CONF_ADDR_OFF);
+		vpid = mvebu_readl(port, PCIE_CONF_DATA_OFF);
+
+		if (vpid != 0xffffffff)
+			break;
+
+		udelay(100);
+	}
+
+	if (tries >= 1000)
+		dev_warn(&port->pcie->pdev->dev,
+			 "timeout when looking for the PCIe device\n");
+}
+
 /*
  * Setup PCIE BARs and Address Decode Wins:
  * BAR[0,2] -> disabled, BAR[1] -> covers all DRAM banks
@@ -291,6 +324,58 @@ static int mvebu_pcie_hw_wr_conf(struct mvebu_pcie_port *port,
 	return PCIBIOS_SUCCESSFUL;
 }
 
+/*
+ * Remove windows, starting from the largest ones to the smallest
+ * ones.
+ */
+static void mvebu_pcie_del_windows(struct mvebu_pcie_port *port,
+				   phys_addr_t base, size_t size)
+{
+	while (size) {
+		size_t sz = 1 << (fls(size) - 1);
+
+		mvebu_mbus_del_window(base, sz);
+		base += sz;
+		size -= sz;
+	}
+}
+
+/*
+ * MBus windows can only have a power of two size, but PCI BARs do not
+ * have this constraint. Therefore, we have to split the PCI BAR into
+ * areas each having a power of two size. We start from the largest
+ * one (i.e highest order bit set in the size).
+ */
+static void mvebu_pcie_add_windows(struct mvebu_pcie_port *port,
+				   unsigned int target, unsigned int attribute,
+				   phys_addr_t base, size_t size,
+				   phys_addr_t remap)
+{
+	size_t size_mapped = 0;
+
+	while (size) {
+		size_t sz = 1 << (fls(size) - 1);
+		int ret;
+
+		ret = mvebu_mbus_add_window_remap_by_id(target, attribute, base,
+							sz, remap);
+		if (ret) {
+			dev_err(&port->pcie->pdev->dev,
+				"Could not create MBus window at 0x%x, size 0x%x: %d\n",
+				base, sz, ret);
+			mvebu_pcie_del_windows(port, base - size_mapped,
+					       size_mapped);
+			return;
+		}
+
+		size -= sz;
+		size_mapped += sz;
+		base += sz;
+		if (remap != MVEBU_MBUS_NO_REMAP)
+			remap += sz;
+	}
+}
+
 static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
 {
 	phys_addr_t iobase;
@@ -302,8 +387,8 @@ static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
 
 		/* If a window was configured, remove it */
 		if (port->iowin_base) {
-			mvebu_mbus_del_window(port->iowin_base,
-					      port->iowin_size);
+			mvebu_pcie_del_windows(port, port->iowin_base,
+					       port->iowin_size);
 			port->iowin_base = 0;
 			port->iowin_size = 0;
 		}
@@ -329,11 +414,11 @@ static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
 	port->iowin_base = port->pcie->io.start + iobase;
 	port->iowin_size = ((0xFFF | ((port->bridge.iolimit & 0xF0) << 8) |
 			    (port->bridge.iolimitupper << 16)) -
-			    iobase);
+			    iobase) + 1;
 
-	mvebu_mbus_add_window_remap_by_id(port->io_target, port->io_attr,
-					  port->iowin_base, port->iowin_size,
-					  iobase);
+	mvebu_pcie_add_windows(port, port->io_target, port->io_attr,
+			       port->iowin_base, port->iowin_size,
+			       iobase);
 }
 
 static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
@@ -344,8 +429,8 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
 
 		/* If a window was configured, remove it */
 		if (port->memwin_base) {
-			mvebu_mbus_del_window(port->memwin_base,
-					      port->memwin_size);
+			mvebu_pcie_del_windows(port, port->memwin_base,
+					       port->memwin_size);
 			port->memwin_base = 0;
 			port->memwin_size = 0;
 		}
@@ -362,10 +447,11 @@ static void mvebu_pcie_handle_membase_change(struct mvebu_pcie_port *port)
 	port->memwin_base  = ((port->bridge.membase & 0xFFF0) << 16);
 	port->memwin_size  =
 		(((port->bridge.memlimit & 0xFFF0) << 16) | 0xFFFFF) -
-		port->memwin_base;
+		port->memwin_base + 1;
 
-	mvebu_mbus_add_window_by_id(port->mem_target, port->mem_attr,
-				    port->memwin_base, port->memwin_size);
+	mvebu_pcie_add_windows(port, port->mem_target, port->mem_attr,
+			       port->memwin_base, port->memwin_size,
+			       MVEBU_MBUS_NO_REMAP);
 }
 
 /*
@@ -721,14 +807,21 @@ static resource_size_t mvebu_pcie_align_resource(struct pci_dev *dev,
 
 	/*
 	 * On the PCI-to-PCI bridge side, the I/O windows must have at
-	 * least a 64 KB size and be aligned on their size, and the
-	 * memory windows must have at least a 1 MB size and be
-	 * aligned on their size
+	 * least a 64 KB size and the memory windows must have at
+	 * least a 1 MB size. Moreover, MBus windows need to have a
+	 * base address aligned on their size, and their size must be
+	 * a power of two. This means that if the BAR doesn't have a
+	 * power of two size, several MBus windows will actually be
+	 * created. We need to ensure that the biggest MBus window
+	 * (which will be the first one) is aligned on its size, which
+	 * explains the rounddown_pow_of_two() being done here.
 	 */
 	if (res->flags & IORESOURCE_IO)
-		return round_up(start, max_t(resource_size_t, SZ_64K, size));
+		return round_up(start, max_t(resource_size_t, SZ_64K,
+					     rounddown_pow_of_two(size)));
 	else if (res->flags & IORESOURCE_MEM)
-		return round_up(start, max_t(resource_size_t, SZ_1M, size));
+		return round_up(start, max_t(resource_size_t, SZ_1M,
+					     rounddown_pow_of_two(size)));
 	else
 		return start;
 }
@@ -891,6 +984,7 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
 	for_each_child_of_node(pdev->dev.of_node, child) {
 		struct mvebu_pcie_port *port = &pcie->ports[i];
 		enum of_gpio_flags flags;
+		int linkup;
 
 		if (!of_device_is_available(child))
 			continue;
@@ -975,8 +1069,13 @@ static int mvebu_pcie_probe(struct platform_device *pdev)
 			continue;
 		}
 
+		linkup = mvebu_pcie_link_up(port);
+
 		mvebu_pcie_set_local_dev_nr(port, 1);
 
+		if (linkup)
+			mvebu_pcie_wait_dev(port);
+
 		port->dn = child;
 		spin_lock_init(&port->conf_lock);
 		mvebu_sw_pci_bridge_init(port);

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-11 14:32   ` Thomas Petazzoni
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-11 14:32 UTC (permalink / raw)
  To: linux-arm-kernel

Hello all,

On Thu, 10 Apr 2014 18:19:53 +0200, Thomas Petazzoni wrote:

> This is an e-mail that attempts to summarize the situation in terms of
> Armada XP PCIe issues.

Attached is a v2 of the patches to fix the various pci-mvebu issues.
Changes since the version posted yesterday:

 * Include a fix for the timing issue of the PCIe interface that gets
   its clock disabled. I've chosen a different approach than the one
   suggested by Jason Gunthorpe, which does not involve resetting the
   PHY. I've tested my fix on the Mirabox, and the Armada 385 DB board
   on which Gregory originally reported the problem (I finally managed
   to reproduce the problem, it was due to the fact that only one of
   the PCIe interfaces is actually affected by the problem, because
   only the clock of the first PCIe interface is used by the
   mvebu-soc-id stuff).

 * Invert the order of Willy's and Jason's patches around MBus
   addresses.

I've also:

 * Pushed the patches at
   https://github.com/MISL-EBU-System-SW/mainline-public/tree/3.14/pci-debug

 * Included a single combined patch, because I know one of you needs
   that to test easily.

Can everybody test these patches, and confirm that they solve all the
outstanding problems?

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-igb-Fix-Null-pointer-dereference-in-igb_reset_q_vect.patch
Type: text/x-patch
Size: 1808 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140411/1b58ef39/attachment-0014.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-igb-Unset-IGB_FLAG_HAS_MSIX-flag-when-falling-back-t.patch
Type: text/x-patch
Size: 5116 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140411/1b58ef39/attachment-0015.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-ARM-mvebu-change-the-default-PCIe-apertures-for-Arma.patch
Type: text/x-patch
Size: 2556 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140411/1b58ef39/attachment-0016.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-ARM-mvebu-switch-the-Armada-XP-DB-to-use-internal-re.patch
Type: text/x-patch
Size: 2498 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140411/1b58ef39/attachment-0017.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0005-ARM-mvebu-switch-the-Armada-XP-GP-to-use-internal-re.patch
Type: text/x-patch
Size: 3574 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140411/1b58ef39/attachment-0018.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0006-irqchip-armada-370-xp-fix-invalid-cast-of-signed-val.patch
Type: text/x-patch
Size: 1377 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140411/1b58ef39/attachment-0019.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0007-irqchip-armada-370-xp-implement-the-check_device-msi.patch
Type: text/x-patch
Size: 1700 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140411/1b58ef39/attachment-0020.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0008-irqchip-armada-370-xp-Fix-releasing-of-MSIs.patch
Type: text/x-patch
Size: 1273 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140411/1b58ef39/attachment-0021.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0009-pci-mvebu-fix-off-by-one-in-the-computed-size-of-the.patch
Type: text/x-patch
Size: 1860 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140411/1b58ef39/attachment-0022.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0010-bus-mvebu-mbus-Avoid-setting-an-undefined-window-siz.patch
Type: text/x-patch
Size: 2238 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140411/1b58ef39/attachment-0023.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0011-bus-mvebu-mbus-allow-several-windows-with-the-same-t.patch
Type: text/x-patch
Size: 1136 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140411/1b58ef39/attachment-0024.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0012-pci-pci-mvebu-split-PCIe-BARs-into-multiple-MBus-win.patch
Type: text/x-patch
Size: 6126 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140411/1b58ef39/attachment-0025.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0013-pci-pci-mvebu-wait-for-a-device-to-appear-to-fix-clo.patch
Type: text/x-patch
Size: 3592 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140411/1b58ef39/attachment-0026.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: combined.patch
Type: text/x-patch
Size: 15073 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140411/1b58ef39/attachment-0027.bin>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-11 14:32   ` Thomas Petazzoni
@ 2014-04-11 15:57     ` Neil Greatorex
  -1 siblings, 0 replies; 50+ messages in thread
From: Neil Greatorex @ 2014-04-11 15:57 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Jason Gunthorpe, Willy Tarreau, Matthew Minter, Gerlando Falauto,
	Lior Amsalem, Andrew Lunn, Jason Cooper, Tawfik Bayouk,
	linux-pci, Ezequiel Garcia, Gregory Clément,
	linux-arm-kernel

Thomas,

On Fri, 11 Apr 2014, Thomas Petazzoni wrote:

> Can everybody test these patches, and confirm that they solve all the
> outstanding problems?

PCIe now works well for me on my Mirabox / i350 combination with this 
patch set. Thanks!

Tested-by: Neil Greatorex <neil@fatboyfat.co.uk>

Cheers,
Neil

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-11 15:57     ` Neil Greatorex
  0 siblings, 0 replies; 50+ messages in thread
From: Neil Greatorex @ 2014-04-11 15:57 UTC (permalink / raw)
  To: linux-arm-kernel

Thomas,

On Fri, 11 Apr 2014, Thomas Petazzoni wrote:

> Can everybody test these patches, and confirm that they solve all the
> outstanding problems?

PCIe now works well for me on my Mirabox / i350 combination with this 
patch set. Thanks!

Tested-by: Neil Greatorex <neil@fatboyfat.co.uk>

Cheers,
Neil

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-11 10:23           ` Thomas Petazzoni
@ 2014-04-11 16:31             ` Jason Gunthorpe
  -1 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-11 16:31 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Neil Greatorex, Willy Tarreau, Matthew Minter, Gerlando Falauto,
	linux-arm-kernel, Jason Cooper, Gregory Clément,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

On Fri, Apr 11, 2014 at 12:23:14PM +0200, Thomas Petazzoni wrote:

> On Thu, 10 Apr 2014 22:56:00 +0100 (BST), Neil Greatorex wrote:
> > > In any event, turning on the clock should almost certainly be
> > > accompanied by a phy reset sequence to get both link ends on the same
> > > page.
> > >
> > > Attached is a rough, untested patch along those lines.
> > >
> > 
> > I took your attached patch and extended it a bit to print out how long it 
> > took. The delays also need to be much longer for me. I also fixed a small 
> > typo you made where the bit wasn't being set again to bring the link back 
> > up. I've attached the diff to your patch, as well as the combined patch 
> > (hope that makes sense).
> 
> I managed to reproduce the problem of the PCIe device not being
> detected on Mirabox when earlyprintk is disabled.
> 
> However, the proposed patch doesn't seem to work:
> 
> mvebu-pcie pcie-controller.3: PCIe0.0: performing link reset
> mvebu-pcie pcie-controller.3: PCIe0.0: link went down after 100 tries
> mvebu-pcie pcie-controller.3: PCIe0.0: link came back up after 0 tries
> 
> It waits hundred times for the link to go down, but it never goes down,
> and then it doesn't wait at all to go up... because it never went down.

I wonder if bit 30 only disables link training, but doesn't force the
link down. There may be another bit that is required here.

Alternatively, are you seeing a different problem? If you apply the
hack to the socid does your symptom go away as well?

> Moreover, I am not entirely convinced that a PHY reset is needed here.
> In my tests, doing just a wait after setting the local dev number was
> sufficient. The fact is works with earlyprintk also indicates that we
> don't need to do any specific action with the PHY, just to wait a bit
> of time. I am worried that resetting the PHY might actually take more
> time than what is needed, and may have other consequences that we don't
> necessarily understand at this point.

I really disagree - clearly the fundamental problem is suspending one
side of the PEX link while the other remains operating. That isn't
specified to work and really shouldn't work.

If you suspend one side of the PEX you *MUST* reset the
link. Absolutely. No Doubt In My Mind.

Remember, PCI-E is a serial shared state protocol. The two sides must
remain in sync or a link reset is required to recover the shared
state. Halting one side obviously destroys this invariant.

I think in many cases the reset happens autonomously. The remote side
will force it to happen. This is what the debugging from Neil shows -
the link just resets at some inconvenient point. Now we know why.

The timing sensitivity also makes sense, if you suspend for a very
short time window the other side might not notice. If you suspend for
longer the other side will reset the link autonomously and your local
side will quickly notice the reset once it comes back.

If you suspend for a little bit the link might not retrain immediately
but the shared state can become corrupted (eg sequence number
mismatch) this will eventually trigger a reset, after the local PEX
has been operating again.

The LinkUp bit after resume is also clearly a lie - most likely the
PEX takes some time to detect the change in remote state to trigger a
link down. After all, it was suspended while the remote was busy
trying to recover.

The fundamental problem here is the clock driver. It should not be
gating PEX clocks so naively. A PEX suspend needs to be sequenced to
ensure the link is cleanly brought down before the PEX is put to
sleep. That way it can cleanly and unambiguously be started up when it
resumes. No risk of glitching/corrupting the far side with some kind
of crap on the serial bus.

In any event, the most important required patch here is one that fixes
socid. It must not turn off the PEX clock. Then we can talk about how
to fix pci-mvebu to work as a module...

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-11 16:31             ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-11 16:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 11, 2014 at 12:23:14PM +0200, Thomas Petazzoni wrote:

> On Thu, 10 Apr 2014 22:56:00 +0100 (BST), Neil Greatorex wrote:
> > > In any event, turning on the clock should almost certainly be
> > > accompanied by a phy reset sequence to get both link ends on the same
> > > page.
> > >
> > > Attached is a rough, untested patch along those lines.
> > >
> > 
> > I took your attached patch and extended it a bit to print out how long it 
> > took. The delays also need to be much longer for me. I also fixed a small 
> > typo you made where the bit wasn't being set again to bring the link back 
> > up. I've attached the diff to your patch, as well as the combined patch 
> > (hope that makes sense).
> 
> I managed to reproduce the problem of the PCIe device not being
> detected on Mirabox when earlyprintk is disabled.
> 
> However, the proposed patch doesn't seem to work:
> 
> mvebu-pcie pcie-controller.3: PCIe0.0: performing link reset
> mvebu-pcie pcie-controller.3: PCIe0.0: link went down after 100 tries
> mvebu-pcie pcie-controller.3: PCIe0.0: link came back up after 0 tries
> 
> It waits hundred times for the link to go down, but it never goes down,
> and then it doesn't wait at all to go up... because it never went down.

I wonder if bit 30 only disables link training, but doesn't force the
link down. There may be another bit that is required here.

Alternatively, are you seeing a different problem? If you apply the
hack to the socid does your symptom go away as well?

> Moreover, I am not entirely convinced that a PHY reset is needed here.
> In my tests, doing just a wait after setting the local dev number was
> sufficient. The fact is works with earlyprintk also indicates that we
> don't need to do any specific action with the PHY, just to wait a bit
> of time. I am worried that resetting the PHY might actually take more
> time than what is needed, and may have other consequences that we don't
> necessarily understand at this point.

I really disagree - clearly the fundamental problem is suspending one
side of the PEX link while the other remains operating. That isn't
specified to work and really shouldn't work.

If you suspend one side of the PEX you *MUST* reset the
link. Absolutely. No Doubt In My Mind.

Remember, PCI-E is a serial shared state protocol. The two sides must
remain in sync or a link reset is required to recover the shared
state. Halting one side obviously destroys this invariant.

I think in many cases the reset happens autonomously. The remote side
will force it to happen. This is what the debugging from Neil shows -
the link just resets at some inconvenient point. Now we know why.

The timing sensitivity also makes sense, if you suspend for a very
short time window the other side might not notice. If you suspend for
longer the other side will reset the link autonomously and your local
side will quickly notice the reset once it comes back.

If you suspend for a little bit the link might not retrain immediately
but the shared state can become corrupted (eg sequence number
mismatch) this will eventually trigger a reset, after the local PEX
has been operating again.

The LinkUp bit after resume is also clearly a lie - most likely the
PEX takes some time to detect the change in remote state to trigger a
link down. After all, it was suspended while the remote was busy
trying to recover.

The fundamental problem here is the clock driver. It should not be
gating PEX clocks so naively. A PEX suspend needs to be sequenced to
ensure the link is cleanly brought down before the PEX is put to
sleep. That way it can cleanly and unambiguously be started up when it
resumes. No risk of glitching/corrupting the far side with some kind
of crap on the serial bus.

In any event, the most important required patch here is one that fixes
socid. It must not turn off the PEX clock. Then we can talk about how
to fix pci-mvebu to work as a module...

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-11 16:31             ` Jason Gunthorpe
@ 2014-04-11 17:21               ` Matthew Minter
  -1 siblings, 0 replies; 50+ messages in thread
From: Matthew Minter @ 2014-04-11 17:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Thomas Petazzoni, Neil Greatorex, Willy Tarreau,
	Gerlando Falauto, linux-arm-kernel, Jason Cooper,
	Gregory Clément, Ezequiel Garcia, Andrew Lunn, linux-pci,
	Tawfik Bayouk, Lior Amsalem

Hi Guys,

I just wanted to confirm that both V1 and V2 of the patch set fixes my
problems, I still have an odd bug where a certain peripheral causes
the link to drop down from x4 to x1 but I think that is far more
likely a hardware issue with the device as it does not happen with any
other peripheral. So to confirm you can put my "tested by" on the
patches if you like.

I would also like to note, my board uses an external clock for the PCI
ports, thus it is unlikely to be effected by any issues relating to
clock drivers.

Please keep me up to date if there are any new revisions of the set
and I will try and test them.

Many thanks,
Matt

On 11 April 2014 17:31, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
> On Fri, Apr 11, 2014 at 12:23:14PM +0200, Thomas Petazzoni wrote:
>
>> On Thu, 10 Apr 2014 22:56:00 +0100 (BST), Neil Greatorex wrote:
>> > > In any event, turning on the clock should almost certainly be
>> > > accompanied by a phy reset sequence to get both link ends on the same
>> > > page.
>> > >
>> > > Attached is a rough, untested patch along those lines.
>> > >
>> >
>> > I took your attached patch and extended it a bit to print out how long it
>> > took. The delays also need to be much longer for me. I also fixed a small
>> > typo you made where the bit wasn't being set again to bring the link back
>> > up. I've attached the diff to your patch, as well as the combined patch
>> > (hope that makes sense).
>>
>> I managed to reproduce the problem of the PCIe device not being
>> detected on Mirabox when earlyprintk is disabled.
>>
>> However, the proposed patch doesn't seem to work:
>>
>> mvebu-pcie pcie-controller.3: PCIe0.0: performing link reset
>> mvebu-pcie pcie-controller.3: PCIe0.0: link went down after 100 tries
>> mvebu-pcie pcie-controller.3: PCIe0.0: link came back up after 0 tries
>>
>> It waits hundred times for the link to go down, but it never goes down,
>> and then it doesn't wait at all to go up... because it never went down.
>
> I wonder if bit 30 only disables link training, but doesn't force the
> link down. There may be another bit that is required here.
>
> Alternatively, are you seeing a different problem? If you apply the
> hack to the socid does your symptom go away as well?
>
>> Moreover, I am not entirely convinced that a PHY reset is needed here.
>> In my tests, doing just a wait after setting the local dev number was
>> sufficient. The fact is works with earlyprintk also indicates that we
>> don't need to do any specific action with the PHY, just to wait a bit
>> of time. I am worried that resetting the PHY might actually take more
>> time than what is needed, and may have other consequences that we don't
>> necessarily understand at this point.
>
> I really disagree - clearly the fundamental problem is suspending one
> side of the PEX link while the other remains operating. That isn't
> specified to work and really shouldn't work.
>
> If you suspend one side of the PEX you *MUST* reset the
> link. Absolutely. No Doubt In My Mind.
>
> Remember, PCI-E is a serial shared state protocol. The two sides must
> remain in sync or a link reset is required to recover the shared
> state. Halting one side obviously destroys this invariant.
>
> I think in many cases the reset happens autonomously. The remote side
> will force it to happen. This is what the debugging from Neil shows -
> the link just resets at some inconvenient point. Now we know why.
>
> The timing sensitivity also makes sense, if you suspend for a very
> short time window the other side might not notice. If you suspend for
> longer the other side will reset the link autonomously and your local
> side will quickly notice the reset once it comes back.
>
> If you suspend for a little bit the link might not retrain immediately
> but the shared state can become corrupted (eg sequence number
> mismatch) this will eventually trigger a reset, after the local PEX
> has been operating again.
>
> The LinkUp bit after resume is also clearly a lie - most likely the
> PEX takes some time to detect the change in remote state to trigger a
> link down. After all, it was suspended while the remote was busy
> trying to recover.
>
> The fundamental problem here is the clock driver. It should not be
> gating PEX clocks so naively. A PEX suspend needs to be sequenced to
> ensure the link is cleanly brought down before the PEX is put to
> sleep. That way it can cleanly and unambiguously be started up when it
> resumes. No risk of glitching/corrupting the far side with some kind
> of crap on the serial bus.
>
> In any event, the most important required patch here is one that fixes
> socid. It must not turn off the PEX clock. Then we can talk about how
> to fix pci-mvebu to work as a module...
>
> Jason

-- 


------------------------------
For additional information including the registered office and the treatment of Xyratex confidential information please visit www.xyratex.com

------------------------------

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-11 17:21               ` Matthew Minter
  0 siblings, 0 replies; 50+ messages in thread
From: Matthew Minter @ 2014-04-11 17:21 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Guys,

I just wanted to confirm that both V1 and V2 of the patch set fixes my
problems, I still have an odd bug where a certain peripheral causes
the link to drop down from x4 to x1 but I think that is far more
likely a hardware issue with the device as it does not happen with any
other peripheral. So to confirm you can put my "tested by" on the
patches if you like.

I would also like to note, my board uses an external clock for the PCI
ports, thus it is unlikely to be effected by any issues relating to
clock drivers.

Please keep me up to date if there are any new revisions of the set
and I will try and test them.

Many thanks,
Matt

On 11 April 2014 17:31, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
> On Fri, Apr 11, 2014 at 12:23:14PM +0200, Thomas Petazzoni wrote:
>
>> On Thu, 10 Apr 2014 22:56:00 +0100 (BST), Neil Greatorex wrote:
>> > > In any event, turning on the clock should almost certainly be
>> > > accompanied by a phy reset sequence to get both link ends on the same
>> > > page.
>> > >
>> > > Attached is a rough, untested patch along those lines.
>> > >
>> >
>> > I took your attached patch and extended it a bit to print out how long it
>> > took. The delays also need to be much longer for me. I also fixed a small
>> > typo you made where the bit wasn't being set again to bring the link back
>> > up. I've attached the diff to your patch, as well as the combined patch
>> > (hope that makes sense).
>>
>> I managed to reproduce the problem of the PCIe device not being
>> detected on Mirabox when earlyprintk is disabled.
>>
>> However, the proposed patch doesn't seem to work:
>>
>> mvebu-pcie pcie-controller.3: PCIe0.0: performing link reset
>> mvebu-pcie pcie-controller.3: PCIe0.0: link went down after 100 tries
>> mvebu-pcie pcie-controller.3: PCIe0.0: link came back up after 0 tries
>>
>> It waits hundred times for the link to go down, but it never goes down,
>> and then it doesn't wait at all to go up... because it never went down.
>
> I wonder if bit 30 only disables link training, but doesn't force the
> link down. There may be another bit that is required here.
>
> Alternatively, are you seeing a different problem? If you apply the
> hack to the socid does your symptom go away as well?
>
>> Moreover, I am not entirely convinced that a PHY reset is needed here.
>> In my tests, doing just a wait after setting the local dev number was
>> sufficient. The fact is works with earlyprintk also indicates that we
>> don't need to do any specific action with the PHY, just to wait a bit
>> of time. I am worried that resetting the PHY might actually take more
>> time than what is needed, and may have other consequences that we don't
>> necessarily understand at this point.
>
> I really disagree - clearly the fundamental problem is suspending one
> side of the PEX link while the other remains operating. That isn't
> specified to work and really shouldn't work.
>
> If you suspend one side of the PEX you *MUST* reset the
> link. Absolutely. No Doubt In My Mind.
>
> Remember, PCI-E is a serial shared state protocol. The two sides must
> remain in sync or a link reset is required to recover the shared
> state. Halting one side obviously destroys this invariant.
>
> I think in many cases the reset happens autonomously. The remote side
> will force it to happen. This is what the debugging from Neil shows -
> the link just resets at some inconvenient point. Now we know why.
>
> The timing sensitivity also makes sense, if you suspend for a very
> short time window the other side might not notice. If you suspend for
> longer the other side will reset the link autonomously and your local
> side will quickly notice the reset once it comes back.
>
> If you suspend for a little bit the link might not retrain immediately
> but the shared state can become corrupted (eg sequence number
> mismatch) this will eventually trigger a reset, after the local PEX
> has been operating again.
>
> The LinkUp bit after resume is also clearly a lie - most likely the
> PEX takes some time to detect the change in remote state to trigger a
> link down. After all, it was suspended while the remote was busy
> trying to recover.
>
> The fundamental problem here is the clock driver. It should not be
> gating PEX clocks so naively. A PEX suspend needs to be sequenced to
> ensure the link is cleanly brought down before the PEX is put to
> sleep. That way it can cleanly and unambiguously be started up when it
> resumes. No risk of glitching/corrupting the far side with some kind
> of crap on the serial bus.
>
> In any event, the most important required patch here is one that fixes
> socid. It must not turn off the PEX clock. Then we can talk about how
> to fix pci-mvebu to work as a module...
>
> Jason

-- 


------------------------------
For additional information including the registered office and the treatment of Xyratex confidential information please visit www.xyratex.com

------------------------------

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-11 17:21               ` Matthew Minter
@ 2014-04-11 17:29                 ` Jason Gunthorpe
  -1 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-11 17:29 UTC (permalink / raw)
  To: Matthew Minter
  Cc: Thomas Petazzoni, Neil Greatorex, Willy Tarreau,
	Gerlando Falauto, linux-arm-kernel, Jason Cooper,
	Gregory Clément, Ezequiel Garcia, Andrew Lunn, linux-pci,
	Tawfik Bayouk, Lior Amsalem

On Fri, Apr 11, 2014 at 06:21:08PM +0100, Matthew Minter wrote:

> I would also like to note, my board uses an external clock for the
> PCI ports, thus it is unlikely to be effected by any issues relating
> to clock drivers.

It doesn't matter, the clock in question is the internal divided CPU
synchronous clock that drives the PEX TLP logic. Externally sourcing
the PHY clock won't change the synchronous side.

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-11 17:29                 ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-11 17:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 11, 2014 at 06:21:08PM +0100, Matthew Minter wrote:

> I would also like to note, my board uses an external clock for the
> PCI ports, thus it is unlikely to be effected by any issues relating
> to clock drivers.

It doesn't matter, the clock in question is the internal divided CPU
synchronous clock that drives the PEX TLP logic. Externally sourcing
the PHY clock won't change the synchronous side.

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-10 21:56         ` Neil Greatorex
@ 2014-04-18 12:58           ` Thomas Petazzoni
  -1 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-18 12:58 UTC (permalink / raw)
  To: Neil Greatorex
  Cc: Jason Gunthorpe, Willy Tarreau, Matthew Minter, Gerlando Falauto,
	linux-arm-kernel, Jason Cooper, Gregory Clément,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

Neil, Jason,

On Thu, 10 Apr 2014 22:56:00 +0100 (BST), Neil Greatorex wrote:

> I took your attached patch and extended it a bit to print out how long it 
> took. The delays also need to be much longer for me. I also fixed a small 
> typo you made where the bit wasn't being set again to bring the link back 
> up. I've attached the diff to your patch, as well as the combined patch 
> (hope that makes sense).

Unfortunately here your patch doesn't work (and neither does the patch
from Jason Gunthorpe). On Armada 370 DB, without the patch, the e1000e
NIC is detected when earlyprintk is enabled, and not detected when
earlyprintk is disabled. With the patch applied, the e1000e is no
longer detected *at all*, even if earlyprintk is enabled.

Extract from a boot log:

Linux version 3.15.0-rc1-00007-gedf643a-dirty (thomas@skate) (gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu7) ) #317 SMP Fri Apr 18 14:54:13 CEST 2014
[...]
Kernel command line: console=ttyS0,115200 earlyprintk loglevel=8 root=/dev/nfs nfsroot=192.168.1.22:/home/thomas/nfsroot ip=192.168.1.142:192.168.1.22:192.168.1.1:255.255.255.0:devboard:eth0:on
[...]
mvebu-pcie pcie-controller.2: PCIe0.0: performing link reset
mvebu-pcie pcie-controller.2: PCIe0.0: link went down after 20 tries
mvebu-pcie pcie-controller.2: PCIe0.0: link came back up after 100 tries
mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
pci_bus 0000:00: root bus resource [mem 0xf8000000-0xffdfffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:01.0: [11ab:6710] type 01 class 0x060400
pci 0000:00:02.0: [11ab:6710] type 01 class 0x060400
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
PCI: bus1: Fast back to back transfers enabled
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
PCI: bus2: Fast back to back transfers enabled
pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:02.0: PCI bridge to [bus 02]
[...]
# /usr/sbin/lspci 
00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710
00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710
#

Any idea?

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-18 12:58           ` Thomas Petazzoni
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-18 12:58 UTC (permalink / raw)
  To: linux-arm-kernel

Neil, Jason,

On Thu, 10 Apr 2014 22:56:00 +0100 (BST), Neil Greatorex wrote:

> I took your attached patch and extended it a bit to print out how long it 
> took. The delays also need to be much longer for me. I also fixed a small 
> typo you made where the bit wasn't being set again to bring the link back 
> up. I've attached the diff to your patch, as well as the combined patch 
> (hope that makes sense).

Unfortunately here your patch doesn't work (and neither does the patch
from Jason Gunthorpe). On Armada 370 DB, without the patch, the e1000e
NIC is detected when earlyprintk is enabled, and not detected when
earlyprintk is disabled. With the patch applied, the e1000e is no
longer detected *at all*, even if earlyprintk is enabled.

Extract from a boot log:

Linux version 3.15.0-rc1-00007-gedf643a-dirty (thomas at skate) (gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu7) ) #317 SMP Fri Apr 18 14:54:13 CEST 2014
[...]
Kernel command line: console=ttyS0,115200 earlyprintk loglevel=8 root=/dev/nfs nfsroot=192.168.1.22:/home/thomas/nfsroot ip=192.168.1.142:192.168.1.22:192.168.1.1:255.255.255.0:devboard:eth0:on
[...]
mvebu-pcie pcie-controller.2: PCIe0.0: performing link reset
mvebu-pcie pcie-controller.2: PCIe0.0: link went down after 20 tries
mvebu-pcie pcie-controller.2: PCIe0.0: link came back up after 100 tries
mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
pci_bus 0000:00: root bus resource [mem 0xf8000000-0xffdfffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:01.0: [11ab:6710] type 01 class 0x060400
pci 0000:00:02.0: [11ab:6710] type 01 class 0x060400
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
PCI: bus1: Fast back to back transfers enabled
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
PCI: bus2: Fast back to back transfers enabled
pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:02.0: PCI bridge to [bus 02]
[...]
# /usr/sbin/lspci 
00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710
00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710
#

Any idea?

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-11 16:31             ` Jason Gunthorpe
@ 2014-04-18 13:02               ` Thomas Petazzoni
  -1 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-18 13:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Neil Greatorex, Willy Tarreau, Matthew Minter, Gerlando Falauto,
	linux-arm-kernel, Jason Cooper, Gregory Clément,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

Dear Jason Gunthorpe,

On Fri, 11 Apr 2014 10:31:29 -0600, Jason Gunthorpe wrote:

> > Moreover, I am not entirely convinced that a PHY reset is needed here.
> > In my tests, doing just a wait after setting the local dev number was
> > sufficient. The fact is works with earlyprintk also indicates that we
> > don't need to do any specific action with the PHY, just to wait a bit
> > of time. I am worried that resetting the PHY might actually take more
> > time than what is needed, and may have other consequences that we don't
> > necessarily understand at this point.
> 
> I really disagree - clearly the fundamental problem is suspending one
> side of the PEX link while the other remains operating. That isn't
> specified to work and really shouldn't work.
> 
> If you suspend one side of the PEX you *MUST* reset the
> link. Absolutely. No Doubt In My Mind.
> 
> Remember, PCI-E is a serial shared state protocol. The two sides must
> remain in sync or a link reset is required to recover the shared
> state. Halting one side obviously destroys this invariant.
> 
> I think in many cases the reset happens autonomously. The remote side
> will force it to happen. This is what the debugging from Neil shows -
> the link just resets at some inconvenient point. Now we know why.
> 
> The timing sensitivity also makes sense, if you suspend for a very
> short time window the other side might not notice. If you suspend for
> longer the other side will reset the link autonomously and your local
> side will quickly notice the reset once it comes back.
> 
> If you suspend for a little bit the link might not retrain immediately
> but the shared state can become corrupted (eg sequence number
> mismatch) this will eventually trigger a reset, after the local PEX
> has been operating again.
> 
> The LinkUp bit after resume is also clearly a lie - most likely the
> PEX takes some time to detect the change in remote state to trigger a
> link down. After all, it was suspended while the remote was busy
> trying to recover.

Ok, fair enough. However, Neil's patch (which is basically your patch
with longer delays) isn't working here, as I just reported in a
separate e-mail.

So we don't have a solution right now. Do you have another proposal to
try ?

> The fundamental problem here is the clock driver. It should not be
> gating PEX clocks so naively. A PEX suspend needs to be sequenced to
> ensure the link is cleanly brought down before the PEX is put to
> sleep. That way it can cleanly and unambiguously be started up when it
> resumes. No risk of glitching/corrupting the far side with some kind
> of crap on the serial bus.
> 
> In any event, the most important required patch here is one that fixes
> socid. It must not turn off the PEX clock. Then we can talk about how
> to fix pci-mvebu to work as a module...

I don't really understand this: all clocks are gated at boot time, so
regardless of mvebu-soc-id behavior, the PCIe driver should take care
of doing all the necessary initialization without making the
assumptions that the clocks were left turned on from the bootloader
time. This is needed if we want to support pci-mvebu as a module, so I
don't see why hacking mvebu-soc-id is going to solve this.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-18 13:02               ` Thomas Petazzoni
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Petazzoni @ 2014-04-18 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Jason Gunthorpe,

On Fri, 11 Apr 2014 10:31:29 -0600, Jason Gunthorpe wrote:

> > Moreover, I am not entirely convinced that a PHY reset is needed here.
> > In my tests, doing just a wait after setting the local dev number was
> > sufficient. The fact is works with earlyprintk also indicates that we
> > don't need to do any specific action with the PHY, just to wait a bit
> > of time. I am worried that resetting the PHY might actually take more
> > time than what is needed, and may have other consequences that we don't
> > necessarily understand at this point.
> 
> I really disagree - clearly the fundamental problem is suspending one
> side of the PEX link while the other remains operating. That isn't
> specified to work and really shouldn't work.
> 
> If you suspend one side of the PEX you *MUST* reset the
> link. Absolutely. No Doubt In My Mind.
> 
> Remember, PCI-E is a serial shared state protocol. The two sides must
> remain in sync or a link reset is required to recover the shared
> state. Halting one side obviously destroys this invariant.
> 
> I think in many cases the reset happens autonomously. The remote side
> will force it to happen. This is what the debugging from Neil shows -
> the link just resets at some inconvenient point. Now we know why.
> 
> The timing sensitivity also makes sense, if you suspend for a very
> short time window the other side might not notice. If you suspend for
> longer the other side will reset the link autonomously and your local
> side will quickly notice the reset once it comes back.
> 
> If you suspend for a little bit the link might not retrain immediately
> but the shared state can become corrupted (eg sequence number
> mismatch) this will eventually trigger a reset, after the local PEX
> has been operating again.
> 
> The LinkUp bit after resume is also clearly a lie - most likely the
> PEX takes some time to detect the change in remote state to trigger a
> link down. After all, it was suspended while the remote was busy
> trying to recover.

Ok, fair enough. However, Neil's patch (which is basically your patch
with longer delays) isn't working here, as I just reported in a
separate e-mail.

So we don't have a solution right now. Do you have another proposal to
try ?

> The fundamental problem here is the clock driver. It should not be
> gating PEX clocks so naively. A PEX suspend needs to be sequenced to
> ensure the link is cleanly brought down before the PEX is put to
> sleep. That way it can cleanly and unambiguously be started up when it
> resumes. No risk of glitching/corrupting the far side with some kind
> of crap on the serial bus.
> 
> In any event, the most important required patch here is one that fixes
> socid. It must not turn off the PEX clock. Then we can talk about how
> to fix pci-mvebu to work as a module...

I don't really understand this: all clocks are gated at boot time, so
regardless of mvebu-soc-id behavior, the PCIe driver should take care
of doing all the necessary initialization without making the
assumptions that the clocks were left turned on from the bootloader
time. This is needed if we want to support pci-mvebu as a module, so I
don't see why hacking mvebu-soc-id is going to solve this.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-18 13:02               ` Thomas Petazzoni
@ 2014-04-22 17:34                 ` Jason Gunthorpe
  -1 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-22 17:34 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Neil Greatorex, Willy Tarreau, Matthew Minter, Gerlando Falauto,
	linux-arm-kernel, Jason Cooper, Gregory Clément,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

On Fri, Apr 18, 2014 at 03:02:44PM +0200, Thomas Petazzoni wrote:

> > The LinkUp bit after resume is also clearly a lie - most likely the
> > PEX takes some time to detect the change in remote state to trigger a
> > link down. After all, it was suspended while the remote was busy
> > trying to recover.
> 
> Ok, fair enough. However, Neil's patch (which is basically your patch
> with longer delays) isn't working here, as I just reported in a
> separate e-mail.
> 
> So we don't have a solution right now. Do you have another proposal to
> try ?

Do you have time to see if we can isolate the difference on your
system?

> > The fundamental problem here is the clock driver. It should not be
> > gating PEX clocks so naively. A PEX suspend needs to be sequenced to
> > ensure the link is cleanly brought down before the PEX is put to
> > sleep. That way it can cleanly and unambiguously be started up when it
> > resumes. No risk of glitching/corrupting the far side with some kind
> > of crap on the serial bus.
> > 
> > In any event, the most important required patch here is one that fixes
> > socid. It must not turn off the PEX clock. Then we can talk about how
> > to fix pci-mvebu to work as a module...
 
> I don't really understand this: all clocks are gated at boot time, so
> regardless of mvebu-soc-id behavior, the PCIe driver should take care
> of doing all the necessary initialization without making the
> assumptions that the clocks were left turned on from the bootloader

The current mvebu-soc-id makes it impossible to do a clean hand off of
the boot loader PEX setup to the PEX driver. I think that is a
problem. Certainly if we fix it non-modular PEX will start working
reliably.

Keep in mind the current driver cannot startup a PEX without
bootloader support. It does not clear the set-at-reset
Conf_TrainingDisable bit, and it doesn't wait for a link to come up
after doing so.

> time. This is needed if we want to support pci-mvebu as a module, so I
> don't see why hacking mvebu-soc-id is going to solve this.

I agree we should try to figure the modular case out, but it looks to
me that suspending the PEX is a bigger job that just gating the
clock..

The automatic gating of the PEX clocks should be more clever. If there
are no DT elements that reference the clock it can be disabled,
otherwise we should probably leave it enabled and rely on the PEX
driver to do power management once it gets loaded.

So broadly, my thinking is the following largely orthogonal items:
 1) PEX's that are setup by the bootloader must be cleanly handed off
    to the driver. The clocks must never gate.
 2) The driver should learn to turn on a PEX from either the
    Conf_TrainingDisable=1 state or the clock gated state
 3) PEX clocks that are never used in DT can be safely shutoff,
    otherwise they must remain on
 4) The PEX driver can learn to properly suspend the PEX for power
    savings, via the normal Linux power management stuff.

What do you think?

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-22 17:34                 ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-22 17:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 18, 2014 at 03:02:44PM +0200, Thomas Petazzoni wrote:

> > The LinkUp bit after resume is also clearly a lie - most likely the
> > PEX takes some time to detect the change in remote state to trigger a
> > link down. After all, it was suspended while the remote was busy
> > trying to recover.
> 
> Ok, fair enough. However, Neil's patch (which is basically your patch
> with longer delays) isn't working here, as I just reported in a
> separate e-mail.
> 
> So we don't have a solution right now. Do you have another proposal to
> try ?

Do you have time to see if we can isolate the difference on your
system?

> > The fundamental problem here is the clock driver. It should not be
> > gating PEX clocks so naively. A PEX suspend needs to be sequenced to
> > ensure the link is cleanly brought down before the PEX is put to
> > sleep. That way it can cleanly and unambiguously be started up when it
> > resumes. No risk of glitching/corrupting the far side with some kind
> > of crap on the serial bus.
> > 
> > In any event, the most important required patch here is one that fixes
> > socid. It must not turn off the PEX clock. Then we can talk about how
> > to fix pci-mvebu to work as a module...
 
> I don't really understand this: all clocks are gated at boot time, so
> regardless of mvebu-soc-id behavior, the PCIe driver should take care
> of doing all the necessary initialization without making the
> assumptions that the clocks were left turned on from the bootloader

The current mvebu-soc-id makes it impossible to do a clean hand off of
the boot loader PEX setup to the PEX driver. I think that is a
problem. Certainly if we fix it non-modular PEX will start working
reliably.

Keep in mind the current driver cannot startup a PEX without
bootloader support. It does not clear the set-at-reset
Conf_TrainingDisable bit, and it doesn't wait for a link to come up
after doing so.

> time. This is needed if we want to support pci-mvebu as a module, so I
> don't see why hacking mvebu-soc-id is going to solve this.

I agree we should try to figure the modular case out, but it looks to
me that suspending the PEX is a bigger job that just gating the
clock..

The automatic gating of the PEX clocks should be more clever. If there
are no DT elements that reference the clock it can be disabled,
otherwise we should probably leave it enabled and rely on the PEX
driver to do power management once it gets loaded.

So broadly, my thinking is the following largely orthogonal items:
 1) PEX's that are setup by the bootloader must be cleanly handed off
    to the driver. The clocks must never gate.
 2) The driver should learn to turn on a PEX from either the
    Conf_TrainingDisable=1 state or the clock gated state
 3) PEX clocks that are never used in DT can be safely shutoff,
    otherwise they must remain on
 4) The PEX driver can learn to properly suspend the PEX for power
    savings, via the normal Linux power management stuff.

What do you think?

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fixing PCIe issues on Armada XP
  2014-04-18 12:58           ` Thomas Petazzoni
@ 2014-04-22 17:56             ` Jason Gunthorpe
  -1 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-22 17:56 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Neil Greatorex, Willy Tarreau, Matthew Minter, Gerlando Falauto,
	linux-arm-kernel, Jason Cooper, Gregory Clément,
	Ezequiel Garcia, Andrew Lunn, linux-pci, Tawfik Bayouk,
	Lior Amsalem

> mvebu-pcie pcie-controller.2: PCIe0.0: performing link reset
> mvebu-pcie pcie-controller.2: PCIe0.0: link went down after 20 tries
> mvebu-pcie pcie-controller.2: PCIe0.0: link came back up after 100 tries

So the '100' here says the timeout expired. My first stab would be to
set the timeout longer.. Maybe your chip takes longer to train the
link.

You also mentioned that setting Conf_TrainingDisable doesn't cause the
link to go down? That would be useful to verify (in a clean
environment without clock gating weirdness..)

> # /usr/sbin/lspci 
> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710
> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710

I'm guessing:

echo 1 > /sys/bus/pci/rescan

Will make it appear?

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Fixing PCIe issues on Armada XP
@ 2014-04-22 17:56             ` Jason Gunthorpe
  0 siblings, 0 replies; 50+ messages in thread
From: Jason Gunthorpe @ 2014-04-22 17:56 UTC (permalink / raw)
  To: linux-arm-kernel

> mvebu-pcie pcie-controller.2: PCIe0.0: performing link reset
> mvebu-pcie pcie-controller.2: PCIe0.0: link went down after 20 tries
> mvebu-pcie pcie-controller.2: PCIe0.0: link came back up after 100 tries

So the '100' here says the timeout expired. My first stab would be to
set the timeout longer.. Maybe your chip takes longer to train the
link.

You also mentioned that setting Conf_TrainingDisable doesn't cause the
link to go down? That would be useful to verify (in a clean
environment without clock gating weirdness..)

> # /usr/sbin/lspci 
> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710
> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710

I'm guessing:

echo 1 > /sys/bus/pci/rescan

Will make it appear?

Jason

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2014-04-22 17:56 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-10 16:19 Fixing PCIe issues on Armada XP Thomas Petazzoni
2014-04-10 16:19 ` Thomas Petazzoni
2014-04-10 16:57 ` Jason Gunthorpe
2014-04-10 16:57   ` Jason Gunthorpe
2014-04-10 18:01   ` Thomas Petazzoni
2014-04-10 18:01     ` Thomas Petazzoni
2014-04-10 20:12     ` Jason Gunthorpe
2014-04-10 20:12       ` Jason Gunthorpe
2014-04-10 21:04       ` Thomas Petazzoni
2014-04-10 21:04         ` Thomas Petazzoni
2014-04-10 21:56       ` Neil Greatorex
2014-04-10 21:56         ` Neil Greatorex
2014-04-10 22:06         ` Jason Gunthorpe
2014-04-10 22:06           ` Jason Gunthorpe
2014-04-10 22:15           ` Neil Greatorex
2014-04-10 22:15             ` Neil Greatorex
2014-04-11 10:23         ` Thomas Petazzoni
2014-04-11 10:23           ` Thomas Petazzoni
2014-04-11 16:31           ` Jason Gunthorpe
2014-04-11 16:31             ` Jason Gunthorpe
2014-04-11 17:21             ` Matthew Minter
2014-04-11 17:21               ` Matthew Minter
2014-04-11 17:29               ` Jason Gunthorpe
2014-04-11 17:29                 ` Jason Gunthorpe
2014-04-18 13:02             ` Thomas Petazzoni
2014-04-18 13:02               ` Thomas Petazzoni
2014-04-22 17:34               ` Jason Gunthorpe
2014-04-22 17:34                 ` Jason Gunthorpe
2014-04-18 12:58         ` Thomas Petazzoni
2014-04-18 12:58           ` Thomas Petazzoni
2014-04-22 17:56           ` Jason Gunthorpe
2014-04-22 17:56             ` Jason Gunthorpe
2014-04-10 17:10 ` Willy Tarreau
2014-04-10 17:10   ` Willy Tarreau
2014-04-10 18:02   ` Thomas Petazzoni
2014-04-10 18:02     ` Thomas Petazzoni
2014-04-10 23:13     ` Willy Tarreau
2014-04-10 23:13       ` Willy Tarreau
2014-04-10 23:40       ` Jason Gunthorpe
2014-04-10 23:40         ` Jason Gunthorpe
2014-04-11  6:23         ` Willy Tarreau
2014-04-11  6:23           ` Willy Tarreau
2014-04-10 18:20 ` Neil Greatorex
2014-04-10 18:20   ` Neil Greatorex
2014-04-10 21:07   ` Thomas Petazzoni
2014-04-10 21:07     ` Thomas Petazzoni
2014-04-11 14:32 ` Thomas Petazzoni
2014-04-11 14:32   ` Thomas Petazzoni
2014-04-11 15:57   ` Neil Greatorex
2014-04-11 15:57     ` Neil Greatorex

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.