* pci mvebu issue (memory controller)
@ 2021-02-09 13:17 Marek Behún
  2021-02-10  8:54 ` Thomas Petazzoni
  2021-10-03 12:09 ` Pali Rohár
  0 siblings, 2 replies; 12+ messages in thread
From: Marek Behún @ 2021-02-09 13:17 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Stefan Roese, Phil Sutter, Mario Six, Pali Rohár,
	Lorenzo Pieralisi, Bjorn Helgaas, linux-pci

Hello Thomas,

(sending this e-mail again because previously I sent it to Thomas' old
e-mail address at free-electrons)

we have encountered an issue with the pci-mvebu driver and would like your
opinion, since you are the author of commit
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f4ac99011e542d06ea2bda10063502583c6d7991

After upgrading to a new version of U-Boot on an Armada XP / 38x device,
some WiFi cards stopped working in the kernel. The ath10k driver, for
example, could not load firmware into the card.

We discovered that the issue is caused by U-Boot:
- when U-Boot's pci_mvebu driver was converted to the driver model API,
  U-Boot started to configure PCIe registers not only for the network
  adapter, but also for the Marvell Memory Controller (which you mention
  in your commit).
- Since the pci-mvebu driver in Linux ignores the Marvell Memory
  Controller device while U-Boot configures its registers (BARs and so
  on), after the kernel boots the registers of this device are left in
  a state the kernel does not expect, and this causes problems for the
  real PCIe device.
- Stefan Roese has temporarily solved this issue with U-Boot commit
  https://gitlab.denx.de/u-boot/custodians/u-boot-marvell/-/commit/6a2fa284aee2981be2c7661b3757ce112de8d528
  which basically just masks the Memory Controller's existence.

- in Linux commit f4ac99011e54 ("pci: mvebu: no longer fake the slot
  location of downstream devices") you mention that:

   * On slot 0, a "Marvell Memory controller", identical on all PCIe
     interfaces, and which isn't useful when the Marvell SoC is the PCIe
     root complex (i.e, the normal case when we run Linux on the Marvell
     SoC).

What we are wondering is:
- what does the Marvell Memory controller really do? Can it be used to
  configure something? It clearly does something, because if it is
  configured in U-Boot somehow but not in the kernel, problems can occur.
- is the best solution really just to ignore this device?
- should U-Boot also start doing what commit f4ac99011e54 does? I.e.
  make sure that the real device is in slot 0 and the Marvell Memory
  Controller in slot 1.
- why is Linux ignoring this device? It isn't even listed in lspci
  output.

Thanks,

Marek

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: pci mvebu issue (memory controller)
  2021-02-09 13:17 pci mvebu issue (memory controller) Marek Behún
@ 2021-02-10  8:54 ` Thomas Petazzoni
  2021-02-10 13:59   ` [EXT] " Stefan Chulski
  2021-10-03 12:09 ` Pali Rohár
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Petazzoni @ 2021-02-10  8:54 UTC (permalink / raw)
  To: Marek Behún
  Cc: Stefan Roese, Phil Sutter, Mario Six, Pali Rohár,
	Lorenzo Pieralisi, Bjorn Helgaas, linux-pci, Stefan Chulski

Hello Marek,

On Tue, 9 Feb 2021 14:17:59 +0100
Marek Behún <kabel@kernel.org> wrote:

> (sending this e-mail again because previously I sent it to Thomas' old
> e-mail address at free-electrons)

Thanks. Turns out I still receive e-mail sent to @free-electrons.com,
so I had seen your previous e-mail but didn't have the chance to reply.

> we have encountered an issue with the pci-mvebu driver and would like your
> opinion, since you are the author of commit
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f4ac99011e542d06ea2bda10063502583c6d7991
> 
> After upgrading to a new version of U-Boot on an Armada XP / 38x device,
> some WiFi cards stopped working in kernel. Ath10k driver, for example,
> could not load firmware into the card.
> 
> We discovered that the issue is caused by U-Boot:
> - when U-Boot's pci_mvebu driver was converted to driver model API,
>   U-Boot started to configure PCIe registers not only for the network
>   adapter, but also for the Marvell Memory Controller (that you are
>   mentioning in your commit).
> - Since pci-mvebu driver in Linux is ignoring the Marvell Memory
>   Controller device, and U-Boot configures its registers (BARs and what
>   not), after kernel boots, the registers of this device are
>   incompatible with kernel, or something, and this causes problems for
>   the real PCIe device.
> - Stefan Roese has temporarily solved this issue with U-Boot commit
>   https://gitlab.denx.de/u-boot/custodians/u-boot-marvell/-/commit/6a2fa284aee2981be2c7661b3757ce112de8d528
>   which basically just masks the Memory Controller's existence.
> 
> - in Linux commit f4ac99011e54 ("pci: mvebu: no longer fake the slot
>   location of downstream devices") you mention that:
> 
>    * On slot 0, a "Marvell Memory controller", identical on all PCIe
>      interfaces, and which isn't useful when the Marvell SoC is the PCIe
>      root complex (i.e, the normal case when we run Linux on the Marvell
>      SoC).
> 
> What we are wondering is:
> - what does the Marvell Memory controller really do? Can it be used to
>   configure something? It clearly does something, because if it is
>   configured in U-Boot somehow but not in kernel, problems can occur.
> - is the best solution really just to ignore this device?
> - should U-Boot also start doing what commit f4ac99011e54 does? I.e.
>   to make sure that the real device is in slot 0, and Marvell Memory
>   Controller in slot 1.
> - why is Linux ignoring this device? It isn't even listed in lspci
>   output.

To be honest, I don't have many details about what this device does,
and my memory is unclear on whether I really ever had any details. I
vaguely remember that this is a device that made sense when the Marvell
PCIe controller is used as an endpoint, and in such a situation this
device allows the root complex to "see" the physical memory of the
Marvell SoC. Therefore, in a situation where the Marvell PCIe
controller is the root complex, seeing this device didn't make sense.

In addition, I /think/ it was causing problems with the MBus window
allocation. Indeed, if this device is visible, then we will try to
allocate MBus windows for its different BARs, and those windows are
limited in number.
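The scarcity problem Thomas describes can be sketched as follows. This is an illustrative model only, not kernel code; the window count and BAR counts are made-up numbers for the example, not real per-SoC values.

```python
# Illustrative model: why a phantom device's BARs hurt when the number
# of MBus address-decode windows is fixed. One window is reserved per
# mapped BAR; once the pool is empty, further BARs are unreachable.
class MbusWindowPool:
    def __init__(self, windows_available: int):
        self.free = windows_available

    def map_bars(self, device_name: str, bar_count: int) -> bool:
        """Reserve one decode window per BAR; fail when the pool runs out."""
        if bar_count > self.free:
            return False  # no window left for this device's BARs
        self.free -= bar_count
        return True

pool = MbusWindowPool(windows_available=4)  # assumed pool size
# If the "Memory controller" device were visible, its BARs would consume
# windows before the real endpoint gets any:
phantom_ok = pool.map_bars("marvell-memory-controller", bar_count=3)
wifi_ok = pool.map_bars("ath10k-wifi", bar_count=2)  # only 1 window left
```

With the phantom device hidden, its three windows would stay free and the real endpoint's BARs could all be mapped.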

I know this isn't a very helpful answer, but the documentation on this
is pretty much nonexistent, and I don't remember ever having very
solid and convincing answers.

I've added in Cc Stefan Chulski, from Marvell, who has recently posted
patches on the PPv2 driver. I don't know if he will have details about
PCIe, but perhaps he will be able to ask internally at Marvell.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* RE: [EXT] Re: pci mvebu issue (memory controller)
  2021-02-10  8:54 ` Thomas Petazzoni
@ 2021-02-10 13:59   ` Stefan Chulski
  2021-02-19 17:44     ` Pali Rohár
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Chulski @ 2021-02-10 13:59 UTC (permalink / raw)
  To: Thomas Petazzoni, Marek Behún
  Cc: Stefan Roese, Phil Sutter, Mario Six, Pali Rohár,
	Lorenzo Pieralisi, Bjorn Helgaas, linux-pci

> > (sending this e-mail again because previously I sent it to Thomas' old
> > e-mail address at free-electrons)
> 
> Thanks. Turns out I still receive e-mail sent to @free-electrons.com, so I had
> seen your previous e-mail but didn't have the chance to reply.
> 
> > we have encountered an issue with the pci-mvebu driver and would like your
> > opinion, since you are the author of commit
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f4ac99011e542d06ea2bda10063502583c6d7991
> >
> > After upgrading to a new version of U-Boot on an Armada XP / 38x device,
> > some WiFi cards stopped working in kernel. Ath10k driver, for example,
> > could not load firmware into the card.
> >
> > We discovered that the issue is caused by U-Boot:
> > - when U-Boot's pci_mvebu driver was converted to driver model API,
> >   U-Boot started to configure PCIe registers not only for the network
> >   adapter, but also for the Marvell Memory Controller (that you are
> >   mentioning in your commit).
> > - Since pci-mvebu driver in Linux is ignoring the Marvell Memory
> >   Controller device, and U-Boot configures its registers (BARs and what
> >   not), after kernel boots, the registers of this device are
> >   incompatible with kernel, or something, and this causes problems for
> >   the real PCIe device.
> > - Stefan Roese has temporarily solved this issue with U-Boot commit
> >   https://gitlab.denx.de/u-boot/custodians/u-boot-marvell/-/commit/6a2fa284aee2981be2c7661b3757ce112de8d528
> >   which basically just masks the Memory Controller's existence.
> >
> > - in Linux commit f4ac99011e54 ("pci: mvebu: no longer fake the slot
> >   location of downstream devices") you mention that:
> >
> >    * On slot 0, a "Marvell Memory controller", identical on all PCIe
> >      interfaces, and which isn't useful when the Marvell SoC is the PCIe
> >      root complex (i.e, the normal case when we run Linux on the Marvell
> >      SoC).
> >
> > What we are wondering is:
> > - what does the Marvell Memory controller really do? Can it be used to
> >   configure something? It clearly does something, because if it is
> >   configured in U-Boot somehow but not in kernel, problems can occur.
> > - is the best solution really just to ignore this device?
> > - should U-Boot also start doing what commit f4ac99011e54 does? I.e.
> >   to make sure that the real device is in slot 0, and Marvell Memory
> >   Controller in slot 1.
> > - why is Linux ignoring this device? It isn't even listed in lspci
> >   output.
> 
> To be honest, I don't have much details about what this device does, and my
> memory is unclear on whether I really ever had any details. I vaguely
> remember that this is a device that made sense when the Marvell PCIe
> controller is used as an endpoint, and in such a situation this device allows the
> root complex to "see" the physical memory of the Marvell SoC. And
> therefore in a situation where the Marvell PCIe controller is the root
> complex, seeing this device didn't make sense.
> 
> In addition, I /think/ it was causing problems with the MBus windows
> allocation. Indeed, if this device is visible, then we will try to allocate MBus
> windows for its different BARs, and those windows are in limited number.
> 
> I know this isn't a very helpful answer, but the documentation on this is
> pretty much nonexistent, and I don't remember ever having very solid and
> convincing answers.
> 
> I've added in Cc Stefan Chulski, from Marvell, who has recently posted
> patches on the PPv2 driver. I don't know if he will have details about PCIe,
> but perhaps he will be able to ask internally at Marvell.
> 
> Best regards,

I'm not familiar with Armada XP PCIe, but I can check internally at Marvell.

Best Regards,
Stefan.



* Re: [EXT] Re: pci mvebu issue (memory controller)
  2021-02-10 13:59   ` [EXT] " Stefan Chulski
@ 2021-02-19 17:44     ` Pali Rohár
  2021-03-04 18:29       ` Bjorn Helgaas
  0 siblings, 1 reply; 12+ messages in thread
From: Pali Rohár @ 2021-02-19 17:44 UTC (permalink / raw)
  To: Stefan Chulski, Bjorn Helgaas
  Cc: Thomas Petazzoni, Marek Behún, Stefan Roese, Phil Sutter,
	Mario Six, Lorenzo Pieralisi, linux-pci

On Wednesday 10 February 2021 13:59:41 Stefan Chulski wrote:
> > > (sending this e-mail again because previously I sent it to Thomas' old
> > > e-mail address at free-electrons)
> > 
> > Thanks. Turns out I still receive e-mail sent to @free-electrons.com, so I had
> > seen your previous e-mail but didn't have the chance to reply.
> > 
> > > we have encountered an issue with the pci-mvebu driver and would like your
> > > opinion, since you are the author of commit
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f4ac99011e542d06ea2bda10063502583c6d7991
> > >
> > > After upgrading to a new version of U-Boot on an Armada XP / 38x device,
> > > some WiFi cards stopped working in kernel. Ath10k driver, for example,
> > > could not load firmware into the card.
> > >
> > > We discovered that the issue is caused by U-Boot:
> > > - when U-Boot's pci_mvebu driver was converted to driver model API,
> > >   U-Boot started to configure PCIe registers not only for the network
> > >   adapter, but also for the Marvell Memory Controller (that you are
> > >   mentioning in your commit).
> > > - Since pci-mvebu driver in Linux is ignoring the Marvell Memory
> > >   Controller device, and U-Boot configures its registers (BARs and what
> > >   not), after kernel boots, the registers of this device are
> > >   incompatible with kernel, or something, and this causes problems for
> > >   the real PCIe device.
> > > - Stefan Roese has temporarily solved this issue with U-Boot commit
> > >   https://gitlab.denx.de/u-boot/custodians/u-boot-marvell/-/commit/6a2fa284aee2981be2c7661b3757ce112de8d528
> > >   which basically just masks the Memory Controller's existence.
> > >
> > > - in Linux commit f4ac99011e54 ("pci: mvebu: no longer fake the slot
> > >   location of downstream devices") you mention that:
> > >
> > >    * On slot 0, a "Marvell Memory controller", identical on all PCIe
> > >      interfaces, and which isn't useful when the Marvell SoC is the PCIe
> > >      root complex (i.e, the normal case when we run Linux on the Marvell
> > >      SoC).
> > >
> > > What we are wondering is:
> > > - what does the Marvell Memory controller really do? Can it be used to
> > >   configure something? It clearly does something, because if it is
> > >   configured in U-Boot somehow but not in kernel, problems can occur.
> > > - is the best solution really just to ignore this device?
> > > - should U-Boot also start doing what commit f4ac99011e54 does? I.e.
> > >   to make sure that the real device is in slot 0, and Marvell Memory
> > >   Controller in slot 1.
> > > - why is Linux ignoring this device? It isn't even listed in lspci
> > >   output.
> > 
> > To be honest, I don't have much details about what this device does, and my
> > memory is unclear on whether I really ever had any details. I vaguely
> > remember that this is a device that made sense when the Marvell PCIe
> > controller is used as an endpoint, and in such a situation this device allows the
> > root complex to "see" the physical memory of the Marvell SoC. And
> > therefore in a situation where the Marvell PCIe controller is the root
> > complex, seeing this device didn't make sense.
> > 
> > In addition, I /think/ it was causing problems with the MBus windows
> > allocation. Indeed, if this device is visible, then we will try to allocate MBus
> > windows for its different BARs, and those windows are in limited number.
> > 
> > I know this isn't a very helpful answer, but the documentation on this is
> > pretty much nonexistent, and I don't remember ever having very solid and
> > convincing answers.
> > 
> > I've added in Cc Stefan Chulski, from Marvell, who has recently posted
> > patches on the PPv2 driver. I don't know if he will have details about PCIe,
> > but perhaps he will be able to ask internally at Marvell.
> > 
> > Best regards,
> 
> I'm not familiar with Armada XP PCIe, but I can check internally at Marvell.
> 
> Best Regards,
> Stefan.
> 

Stefan: If you get any information internally at Marvell, please let us know!

Bjorn: What do you think, should the Linux kernel completely hide some
PCIe devices from the /sys hierarchy and also from 'lspci' output? Or
should the kernel keep even non-functional / unknown PCIe devices
visible in 'lspci' output?


* Re: [EXT] Re: pci mvebu issue (memory controller)
  2021-02-19 17:44     ` Pali Rohár
@ 2021-03-04 18:29       ` Bjorn Helgaas
  2021-11-01 18:07         ` Jason Gunthorpe
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2021-03-04 18:29 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Stefan Chulski, Bjorn Helgaas, Thomas Petazzoni,
	Marek Behún, Stefan Roese, Phil Sutter, Mario Six,
	Lorenzo Pieralisi, linux-pci

On Fri, Feb 19, 2021 at 06:44:06PM +0100, Pali Rohár wrote:
> On Wednesday 10 February 2021 13:59:41 Stefan Chulski wrote:
> > > > (sending this e-mail again because previously I sent it to Thomas' old
> > > > e-mail address at free-electrons)
> > > 
> > > Thanks. Turns out I still receive e-mail sent to @free-electrons.com, so I had
> > > seen your previous e-mail but didn't have the chance to reply.
> > > 
> > > > we have encountered an issue with the pci-mvebu driver and would like your
> > > > opinion, since you are the author of commit
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f4ac99011e542d06ea2bda10063502583c6d7991
> > > >
> > > > After upgrading to a new version of U-Boot on an Armada XP / 38x device,
> > > > some WiFi cards stopped working in kernel. Ath10k driver, for example,
> > > > could not load firmware into the card.
> > > >
> > > > We discovered that the issue is caused by U-Boot:
> > > > - when U-Boot's pci_mvebu driver was converted to driver model API,
> > > >   U-Boot started to configure PCIe registers not only for the network
> > > >   adapter, but also for the Marvell Memory Controller (that you are
> > > >   mentioning in your commit).
> > > > - Since pci-mvebu driver in Linux is ignoring the Marvell Memory
> > > >   Controller device, and U-Boot configures its registers (BARs and what
> > > >   not), after kernel boots, the registers of this device are
> > > >   incompatible with kernel, or something, and this causes problems for
> > > >   the real PCIe device.
> > > > - Stefan Roese has temporarily solved this issue with U-Boot commit
> > > >   https://gitlab.denx.de/u-boot/custodians/u-boot-marvell/-/commit/6a2fa284aee2981be2c7661b3757ce112de8d528
> > > >   which basically just masks the Memory Controller's existence.
> > > >
> > > > - in Linux commit f4ac99011e54 ("pci: mvebu: no longer fake the slot
> > > >   location of downstream devices") you mention that:
> > > >
> > > >    * On slot 0, a "Marvell Memory controller", identical on all PCIe
> > > >      interfaces, and which isn't useful when the Marvell SoC is the PCIe
> > > >      root complex (i.e, the normal case when we run Linux on the Marvell
> > > >      SoC).
> > > >
> > > > What we are wondering is:
> > > > - what does the Marvell Memory controller really do? Can it be used to
> > > >   configure something? It clearly does something, because if it is
> > > >   configured in U-Boot somehow but not in kernel, problems can occur.
> > > > - is the best solution really just to ignore this device?
> > > > - should U-Boot also start doing what commit f4ac99011e54 does? I.e.
> > > >   to make sure that the real device is in slot 0, and Marvell Memory
> > > >   Controller in slot 1.
> > > > - why is Linux ignoring this device? It isn't even listed in lspci
> > > >   output.
> > > 
> > > To be honest, I don't have much details about what this device does, and my
> > > memory is unclear on whether I really ever had any details. I vaguely
> > > remember that this is a device that made sense when the Marvell PCIe
> > > controller is used as an endpoint, and in such a situation this device allows the
> > > root complex to "see" the physical memory of the Marvell SoC. And
> > > therefore in a situation where the Marvell PCIe controller is the root
> > > complex, seeing this device didn't make sense.
> > > 
> > > In addition, I /think/ it was causing problems with the MBus windows
> > > allocation. Indeed, if this device is visible, then we will try to allocate MBus
> > > windows for its different BARs, and those windows are in limited number.
> > > 
> > > I know this isn't a very helpful answer, but the documentation on this is
> > > pretty much nonexistent, and I don't remember ever having very solid and
> > > convincing answers.
> > > 
> > > I've added in Cc Stefan Chulski, from Marvell, who has recently posted
> > > patches on the PPv2 driver. I don't know if he will have details about PCIe,
> > > but perhaps he will be able to ask internally at Marvell.
> > > 
> > > Best regards,
> > 
> > I'm not familiar with Armada XP PCIe, but I can check internally at Marvell.
> > 
> > Best Regards,
> > Stefan.
> > 
> 
> Stefan: If you get any information internally in Marvell, please let us know!
> 
> Bjorn: What do you think, should Linux kernel completely hide some PCIe
> devices from /sys hierarchy and also from 'lspci' output? Or should
> kernel preserve even non-functional / unknown PCIe devices visible in
> 'lspci' output?

In general I don't think the kernel should hide PCI devices.  The PCI
core has no way of knowing whether devices are non-functional, and
"unknown" doesn't really mean anything because a driver could be
loaded later.

But if a device is in use by firmware, or if exposing it causes some
problem, it might make sense to hide it.

In your case, the problem description is "... the registers of this
device are incompatible with kernel, or something, and this causes
problems for the real PCIe device ..."

That's not much to go on.  Someone with more knowledge of the actual
problem would have to weigh in on whether hiding a device is the best
approach.

With more details we might see what the conflict between the devices
is.  E.g., maybe we assign the same resources to both, or maybe we
don't assign a bridge window to reach the WiFi card.

Bjorn


* Re: pci mvebu issue (memory controller)
  2021-02-09 13:17 pci mvebu issue (memory controller) Marek Behún
  2021-02-10  8:54 ` Thomas Petazzoni
@ 2021-10-03 12:09 ` Pali Rohár
  1 sibling, 0 replies; 12+ messages in thread
From: Pali Rohár @ 2021-10-03 12:09 UTC (permalink / raw)
  To: Bjorn Helgaas, Lorenzo Pieralisi
  Cc: Marek Behún, Thomas Petazzoni, Stefan Roese, Phil Sutter,
	Mario Six, Stefan Chulski, linux-pci

Hello! See explanation below.

On Tuesday 09 February 2021 14:17:59 Marek Behún wrote:
> Hello Thomas,
> 
> (sending this e-mail again because previously I sent it to Thomas' old
> e-mail address at free-electrons)
> 
> we have encountered an issue with the pci-mvebu driver and would like your
> opinion, since you are the author of commit
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f4ac99011e542d06ea2bda10063502583c6d7991
> 
> After upgrading to a new version of U-Boot on an Armada XP / 38x device,
> some WiFi cards stopped working in kernel. Ath10k driver, for example,
> could not load firmware into the card.
> 
> We discovered that the issue is caused by U-Boot:
> - when U-Boot's pci_mvebu driver was converted to driver model API,
>   U-Boot started to configure PCIe registers not only for the network
>   adapter, but also for the Marvell Memory Controller (that you are
>   mentioning in your commit).
> - Since pci-mvebu driver in Linux is ignoring the Marvell Memory
>   Controller device, and U-Boot configures its registers (BARs and what
>   not), after kernel boots, the registers of this device are
>   incompatible with kernel, or something, and this causes problems for
>   the real PCIe device.
> - Stefan Roese has temporarily solved this issue with U-Boot commit
>   https://gitlab.denx.de/u-boot/custodians/u-boot-marvell/-/commit/6a2fa284aee2981be2c7661b3757ce112de8d528
>   which basically just masks the Memory Controller's existence.
> 
> - in Linux commit f4ac99011e54 ("pci: mvebu: no longer fake the slot
>   location of downstream devices") you mention that:
> 
>    * On slot 0, a "Marvell Memory controller", identical on all PCIe
>      interfaces, and which isn't useful when the Marvell SoC is the PCIe
>      root complex (i.e, the normal case when we run Linux on the Marvell
>      SoC).
> 
> What we are wondering is:
> - what does the Marvell Memory controller really do? Can it be used to
>   configure something? It clearly does something, because if it is
>   configured in U-Boot somehow but not in kernel, problems can occur.
> - is the best solution really just to ignore this device?
> - should U-Boot also start doing what commit f4ac99011e54 does? I.e.
>   to make sure that the real device is in slot 0, and Marvell Memory
>   Controller in slot 1.
> - why is Linux ignoring this device? It isn't even listed in lspci
>   output.
> 
> Thanks,
> 
> Marek

tl;dr

- The mysterious Marvell Memory Controller is a PCIe Root Port (this can
  be verified e.g. by doing a config space dump from U-Boot and then
  parsing it via lspci)
- The config space of this PCIe device is mapped directly into the
  address space of the PCIe controller (at offset zero)
- It has a config space with Header Type 0 and Class Code 0x5080
- The BARs configure the PCIe controller itself: BAR0 must point to the
  beginning of the SoC registers, the other BARs to the DDR memory
  address space
- Both the U-Boot and kernel pci mvebu drivers set the Secondary Bus
  number to zero
- The patch which fixed this issue disappeared from the kernel

I think this explains all the issues mentioned in the previous email.
The controller driver configures registers for the SoC and DDR, and the
PCI core/pnp then reconfigures them via config space to different
values, so no PCIe device works, because the PCIe controller is no
longer able to access the SoC registers and DDR memory correctly.
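The verification step mentioned in the tl;dr (dump the config space, then look at Header Type and Class Code) can be sketched as a small host-side helper. This is an illustrative sketch; the sample bytes are synthetic (the device ID is made up), only the vendor ID 0x11AB (Marvell) and the standard Type 0 header offsets are real.

```python
# Decode the first bytes of a raw PCI config-space dump (e.g. captured
# with U-Boot's "pci display" command) to check the fields discussed above.
import struct

def decode_type0_header(cfg: bytes) -> dict:
    vendor_id, device_id = struct.unpack_from("<HH", cfg, 0x00)
    # Class code occupies 3 bytes at 0x09: prog-if, subclass, base class
    subclass, base_class = cfg[0x0A], cfg[0x0B]
    header_type = cfg[0x0E] & 0x7F  # bit 7 is only the multi-function flag
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        # lspci prints this 16-bit value as e.g. "0580" (Memory
        # controller) or "0604" (PCI bridge)
        "class": (base_class << 8) | subclass,
        "header_type": header_type,
    }

# Synthetic Type 0 header: Marvell vendor ID, memory-controller class
cfg = bytearray(64)
struct.pack_into("<HH", cfg, 0x00, 0x11AB, 0x6820)  # device ID is invented
cfg[0x0B], cfg[0x0A] = 0x05, 0x80  # base class 0x05, subclass 0x80
cfg[0x0E] = 0x00                   # Header Type 0
info = decode_type0_header(bytes(cfg))
```

A real Root Port would be expected to decode with header_type 1 and a PCI-bridge class instead.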

Bjorn, is it normal that a PCIe Root Port device has a Type 0 config
space and Class Code 0x5080 (Memory controller)? I thought that a PCIe
Root Port device must have Class Code 0x6004 (PCI Bridge) with a Type 1
config space.

And what should happen, according to the PCIe standards, when both the
primary and secondary bus numbers are configured to zero? Or to some
other identical value?

On the primary bus is the Memory Controller (== Root Port) and on the
secondary bus is the endpoint card. Marvell has an additional register
for specifying the device number at which the Root Port appears. And it
looks like, if the primary and secondary bus numbers are the same, then
on this bus the Root Port appears at the Root Port device address and
the endpoint card appears at every other device address (which looks
crazy, as the endpoint card is then at all possible BDF addresses where
B=primary=secondary and D!=root_port). But I have no idea what happens
on other buses.

It seems that due to these issues the pci-mvebu.c kernel driver filters
access to this PCIe Root Port device and uses pci-bridge-emul.c to
provide an emulated PCIe Root Port device. It sets the Root Port device
address to 1 and allows access only to device address 0 (at which the
endpoint card sits).
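The filtering described above can be modeled roughly as follows. This is an illustrative model, not the kernel source: the bus and device numbers are assumptions chosen for the example, and the real driver's rules are more involved.

```python
# Simplified model of a config-access filter: the driver answers config
# accesses on the root bus only for the (emulated) Root Port, and
# forwards accesses on the secondary bus only to device 0, so the
# duplicated/phantom device addresses never become visible to the PCI core.
def config_access_allowed(bus: int, device: int,
                          root_bus: int = 0, secondary_bus: int = 1) -> bool:
    if bus == root_bus:
        return device == 0  # only the emulated Root Port responds here
    if bus == secondary_bus:
        return device == 0  # only the real endpoint card is forwarded
    return False            # nothing else exists in this simplified model
```

Every access the filter rejects is completed as "no device" (all-ones read), which is how the phantom copies of the endpoint disappear from enumeration.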

This issue appears on all Marvell SoCs. Here are just a few lspci
outputs sent by different people in the past. All of them have one thing
in common: a device with "Root Port" and "Memory controller: Marvell"
markings:

https://lore.kernel.org/linux-arm-kernel/alpine.DEB.2.02.1210261857100.20029@mirri/
https://lore.kernel.org/linux-arm-kernel/ad9478410910120746g2ce82af1t71a84ea02e9eecb7@mail.gmail.com/
https://lore.kernel.org/ath9k-devel/4FF0EBCE.3020308@allnet.de/

And that is not all. It looks like this issue with the Root Port /
Memory Controller was also known to kernel developers. In the past,
about 10 years ago, the following commit was merged into the kernel; it
explained the issue and fixed the class code from Memory Controller to
PCI Bridge:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1dc831bf53fddcc6443f74a39e72db5bcea4f15d

Apparently this patch has completely disappeared, as I'm not able to
find any code with this comment or fixup in the mainline kernel anymore.
Bjorn, Lorenzo, do you have any idea what happened?


* Re: [EXT] Re: pci mvebu issue (memory controller)
  2021-03-04 18:29       ` Bjorn Helgaas
@ 2021-11-01 18:07         ` Jason Gunthorpe
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2021-11-01 18:07 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Pali Rohár, Stefan Chulski, Bjorn Helgaas, Thomas Petazzoni,
	Marek Behún, Stefan Roese, Phil Sutter, Mario Six,
	Lorenzo Pieralisi, linux-pci

On Thu, Mar 04, 2021 at 12:29:08PM -0600, Bjorn Helgaas wrote:

> That's not much to go on.  Someone with more knowledge of the actual
> problem would have to weigh in on whether hiding a device is the best
> approach.

Since Pali asked..

The issue with this HW is that the IP designers took an endpoint PCIe
core and glued it up to act as a root port without changing anything.
This is why it doesn't present a bridge config space. It is a
*completely* non-compliant design.

The pci-mvebu host bridge driver is designed to fix this. It provides a
compliant PCI register view for a root port device, using SW to inspect
config space operations and remap the config space accesses to their
non-compliant positions within the SoC.

Hoping that the PCI core can directly drive this PCI device as a root
port without the above driver is just an endless sea of hacks.
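The "compliant SW view over non-compliant HW" idea can be sketched very simply. This is an illustrative model of the emulation concept, not the pci-mvebu code; the register names in the `hw` dict are invented for the example.

```python
# The emulation reports what a compliant root port must look like
# (Type 1 header, PCI-bridge class), regardless of what the raw,
# non-compliant hardware registers actually contain.
PCI_CLASS_BRIDGE_PCI = 0x0604   # what lspci shows as "PCI bridge"
PCI_CLASS_MEMORY_OTHER = 0x0580 # what the raw HW claims to be

class EmulatedRootPort:
    def __init__(self, hw_regs: dict):
        self.hw = hw_regs  # raw, non-compliant HW state (stand-in)

    def read_class(self) -> int:
        # HW says "memory controller"; the emulation reports "PCI bridge"
        return PCI_CLASS_BRIDGE_PCI

    def read_header_type(self) -> int:
        return 1  # Type 1 header, as a root port must present

rp = EmulatedRootPort(hw_regs={"class": PCI_CLASS_MEMORY_OTHER,
                               "header_type": 0})
```

The PCI core then only ever sees the emulated, compliant view, while the driver translates writes (bus numbers, windows) into whatever the SoC actually needs.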

Jason


* Re: [EXT] Re: pci mvebu issue (memory controller)
  2021-06-02 21:13       ` Pali Rohár
@ 2021-06-02 21:59         ` Marek Behún
  0 siblings, 0 replies; 12+ messages in thread
From: Marek Behún @ 2021-06-02 21:59 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Pali Rohár, Stefan Chulski, Bjorn Helgaas, Thomas Petazzoni,
	Stefan Roese, Phil Sutter, Mario Six, Lorenzo Pieralisi,
	linux-pci

On Wed, 2 Jun 2021 23:13:35 +0200
Pali Rohár <pali@kernel.org> wrote:

> > If the NICs are ordinary PCIe endpoints there must be *something* to
> > terminate the other end of the link.  Maybe it has some sort of
> > non-standard programming interface, but from a PCIe topology point of
> > view, it's a root port.
> > 
> > I don't think I can contribute anything to the stuff below.  It sounds
> > like there's some confusion about how to handle these root ports that
> > aren't exactly root ports.  That's really up to uboot and the mvebu
> > driver to figure out.  
> 
> Yes, I understand, it is non-standard, and since the beginning I have
> also been confused about how this stuff works at all... And this is
> also the reason why the kernel emulates those root ports (via virtual
> PCIe bridge devices) to present a "standard" topology.
> 
> The remaining question is: should the kernel really filter out that
> "memory controller" device and not show it in the Linux PCIe device
> hierarchy?
> 

Bjorn,

this discussion has become a little too complex.

The basic issue Pali is trying to solve can be recapitulated:
- there is a "memory controller" device on the "virtual" bus
- Linux's pci-mvebu driver hides this device
- we don't know the purpose of this device; it is visible even when no
  PCIe device is connected
- Pali wants to know the purpose of this "memory controller" and
  whether it should stay hidden by Linux and U-Boot, as it currently
  is, or whether the controller driver should expose it

Marek


* Re: [EXT] Re: pci mvebu issue (memory controller)
  2021-06-02 21:01     ` Bjorn Helgaas
@ 2021-06-02 21:13       ` Pali Rohár
  2021-06-02 21:59         ` Marek Behún
  0 siblings, 1 reply; 12+ messages in thread
From: Pali Rohár @ 2021-06-02 21:13 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Stefan Chulski, Bjorn Helgaas, Thomas Petazzoni,
	Marek Behún, Stefan Roese, Phil Sutter, Mario Six,
	Lorenzo Pieralisi, linux-pci

On Wednesday 02 June 2021 16:01:01 Bjorn Helgaas wrote:
> On Wed, Jun 02, 2021 at 10:39:17PM +0200, Pali Rohár wrote:
> > On Wednesday 02 June 2021 14:14:55 Bjorn Helgaas wrote:
> > > On Wed, Jun 02, 2021 at 01:07:03PM +0200, Pali Rohár wrote:
> > > 
> > > > In the configuration with the *bad* suffix, a U-Boot is used which does
> > > > not ignore the Memory controller PCIe device and configures it during
> > > > initialization. In this configuration the loaded kernel is unable to
> > > > initialize the wifi cards.
> > > > 
> > > > In the configuration with the *ok* suffix, U-Boot is explicitly patched
> > > > to ignore the Memory controller PCIe device, and the loaded kernel can
> > > > use the wifi cards without any issue.
> > > > 
> > > > In both configurations the same kernel version is used. As I wrote in
> > > > previous emails, the kernel already ignores and hides the Memory
> > > > controller PCIe device, so lspci does not see it.
> > > > 
> > > > In the attachment I'm sending dmesg and lspci outputs from Linux and pci
> > > > output from U-Boot.
> > > > 
> > > > What is suspicious to me is that this Memory controller device is on the
> > > > same bus as the wifi card. PCIe is "point to point", so there should be
> > > > only one device at the other end of the link... Therefore I'm not sure
> > > > the kernel can handle something like "two PCIe devices" at the other end
> > > > of a PCIe link.
> > > > 
> > > > Could you look at the attached logs and see whether anything looks
> > > > suspicious? Or if you need other logs (either from U-Boot or the
> > > > kernel), please let me know.
> > > > 
> > > > Note that U-Boot does not see the PCIe Bridge, as it is emulated only by
> > > > the kernel. So U-Boot enumerates buses from zero and the kernel from one
> > > > (as the kernel's bus zero is for the emulated PCIe Bridges).
> > > 
> > > I've lost track of what the problem is or what patch we're evaluating.
> > 
> > With the bad U-Boot (which enumerates and initializes all PCIe
> > devices, including that memory controller), the Linux kernel is unable
> > to use PCIe devices; e.g. the ath10k driver fails to start. If the bad
> > U-Boot has PCIe device initialization disabled, the kernel has no problems.
> > 
> > > Here's what I see from dmesg/lspci/uboot:
> > > 
> > >   # dmesg (both bad/ok) and lspci:
> > >   00:01.0 [11ab:6820] Root Port to [bus 01]
> > >   00:02.0 [11ab:6820] Root Port to [bus 02]
> > >   00:03.0 [11ab:6820] Root Port to [bus 03]
> > >   01:00.0 [168c:002e] Atheros AR9287 NIC
> > >   02:00.0 [168c:0046] Atheros QCA9984 NIC
> > >   03:00.0 [168c:003c] Atheros QCA986x/988x NIC
> > > 
> > > The above looks perfectly reasonable.
> > > 
> > >   # uboot (bad):
> > >   00.00.00 [11ab:6820] memory controller
> > >   00.01.00 [168c:002e] NIC
> > >   01.00.00 [11ab:6820] memory controller
> > >   01.01.00 [168c:0046] NIC
> > >   02.00.00 [11ab:6820] memory controller
> > >   02.01.00 [168c:003c] NIC
> > > 
> > > The above looks dubious at best.  Bus 00 clearly must be a root bus
> > > because bus 00 can never be a bridge's secondary bus.
> > > 
> > > Either buses 01 and 02 need to also be root buses (e.g., if we had
> > > three host bridges, one leading to bus 00, another to bus 01, and
> > > another to bus 02), OR there must be Root Ports that act as bridges
> > > leading from bus 00 to bus 01 and bus 02.
> > 
> > There are 3 independent links from CPU, so 3 independent buses. Buses 01
> > and 02 are accessed directly, not via bus 00.
> 
> That sounds like the "three host bridges leading to three root buses"
> scenario.  If the NICs are ordinary endpoints (not Root Complex
> integrated endpoints), there must be a Root Port on each root bus, and
> the Root Port and the endpoint must be on different buses (just like
> any other bridge).
> 
> For example, you could have this (which seems to be what you describe
> above):
> 
>   host bridge A to domain 0000 [bus 00-01]:
>     00:00.0 Root Port to [bus 01]
>     01:00.0 Atheros AR9287 NIC
> 
>   host bridge B to domain 0000 [bus 02-03]:
>     02:00.0 Root Port to [bus 03]
>     03:00.0 Atheros QCA9984 NIC
> 
>   host bridge C to domain 0000 [bus 04-05]:
>     04:00.0 Root Port to [bus 05]
>     05:00.0 Atheros QCA986x/988x NIC

Yes, this basically matches the kernel dmesg/lspci output. These root
ports are implemented and emulated by the pci-bridge-emul.c driver. It
is just that all the root ports are on "virtual" bus 0.

> Or, since each host bridge spawns a separate PCI hierarchy, you could
> have each one in its own domain.  If the host bridges are truly
> independent, this is probably a software choice and could look like
> this:
> 
>   host bridge A to domain 0000 [bus 00-ff]:
>     0000:00:00.0 Root Port to [bus 01]
>     0000:01:00.0 Atheros AR9287 NIC
> 
>   host bridge B to domain 0001 [bus 00-ff]:
>     0001:00:00.0 Root Port to [bus 01]
>     0001:01:00.0 Atheros QCA9984 NIC
> 
>   host bridge C to domain 0002 [bus 00-ff]:
>     0002:00:00.0 Root Port to [bus 01]
>     0002:01:00.0 Atheros QCA986x/988x NIC

Yes, it is just a software choice, and it seems that pci-bridge-emul.c
puts all these root ports into the same Linux domain. But I see no
reason why pci-bridge-emul.c could not be programmed to put these
devices into separate segments / domains.

> > Linux devices 00:01.0, 00:02.0 and 00:03.0 are just virtual devices
> > created by the kernel's pci-bridge-emul.c driver. They are not real
> > devices; they are not present on the PCIe bus and therefore they
> > cannot be visible or available in U-Boot.
> 
> If the NICs are ordinary PCIe endpoints there must be *something* to
> terminate the other end of the link.  Maybe it has some sort of
> non-standard programming interface, but from a PCIe topology point of
> view, it's a root port.
> 
> I don't think I can contribute anything to the stuff below.  It sounds
> like there's some confusion about how to handle these root ports that
> aren't exactly root ports.  That's really up to uboot and the mvebu
> driver to figure out.

Yes, I understand, it is non-standard, and since the beginning I have
also been confused about how this stuff works at all... This is also
why the kernel emulates those root ports (via virtual PCIe bridge
devices) to present a "standard" topology.

The remaining question is: should the kernel really filter out that
"memory controller" device and not show it in the Linux PCIe device
hierarchy?

> > Moreover, the kernel pci-mvebu.c controller driver filters out exactly
> > one device on every bus, which results in the "memory controller" not
> > being visible in lspci.
> > 
> > Moreover, there is an mvebu-specific register which seems to set the
> > device number at which this "memory controller" is present. U-Boot sets
> > this register to zero, so the "memory controller" is at XX:00.00 and
> > the wifi card at XX:01.00. The kernel sets this register to one, so the
> > "memory controller" is at XX:01.00 and the wifi card at XX:00.00. The
> > kernel then filters config read/write accesses to BDF address XX:01.YY.
> > 
> > > The "memory controllers"
> > > are vendor/device ID [11ab:6820], which Linux thinks are Root Ports,
> > > so I assume they are really Root Ports (or some emulation of them).
> > 
> > It is just a coincidence that the memory controller visible in PCIe
> > config space has the same PCI device ID as the virtual root bridge
> > emulated by the kernel's pci-bridge-emul.c driver. These are totally
> > different devices.
> > 
> > > It's *possible* to have both a Root Port and a NIC on bus 0, as shown
> > > here.  However, the NIC would have to be a Root Complex integrated
> > > Endpoint, and this NIC ([168c:002e]) is not one of those.
> > 
> > This is an ordinary PCIe wifi card; it does not have an integrated
> > Root Complex. Moreover, that "memory controller" device is visible (in
> > U-Boot) even when I disconnect the wifi card.
> > 
> > > It's a
> > > garden-variety PCIe legacy endpoint connected by a link.  So this NIC
> > > cannot actually be on bus 00.
> > > 
> > > All these NICs are PCIe legacy endpoints with links, so they all must
> > > have a Root Port leading to them.  So this topology is not really
> > > possible.
> > > 
> > >   # uboot (ok):
> > >   00.00.00 [168c:002e] NIC
> > >   01.00.00 [168c:0046] NIC
> > >   02.00.00 [168c:003c] NIC
> > > 
> > > This topology is impossible from a PCI perspective because there's no
> > > way to get from bus 00 to bus 01 or 02.
> > 
> > This matches the Linux lspci output, just with the first bus indexed
> > from zero instead of one. In Linux it is indexed from one because bus
> > zero holds the fake/virtual bridge devices emulated by the Linux
> > kernel.
> > 
> > Does it make a little more sense now?


* Re: [EXT] Re: pci mvebu issue (memory controller)
  2021-06-02 20:39   ` Pali Rohár
@ 2021-06-02 21:01     ` Bjorn Helgaas
  2021-06-02 21:13       ` Pali Rohár
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2021-06-02 21:01 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Stefan Chulski, Bjorn Helgaas, Thomas Petazzoni,
	Marek Behún, Stefan Roese, Phil Sutter, Mario Six,
	Lorenzo Pieralisi, linux-pci

On Wed, Jun 02, 2021 at 10:39:17PM +0200, Pali Rohár wrote:
> On Wednesday 02 June 2021 14:14:55 Bjorn Helgaas wrote:
> > On Wed, Jun 02, 2021 at 01:07:03PM +0200, Pali Rohár wrote:
> > 
> > > In the configuration with the *bad* suffix, a U-Boot is used which does
> > > not ignore the Memory controller PCIe device and configures it during
> > > initialization. In this configuration the loaded kernel is unable to
> > > initialize the wifi cards.
> > > 
> > > In the configuration with the *ok* suffix, U-Boot is explicitly patched
> > > to ignore the Memory controller PCIe device, and the loaded kernel can
> > > use the wifi cards without any issue.
> > > 
> > > In both configurations the same kernel version is used. As I wrote in
> > > previous emails, the kernel already ignores and hides the Memory
> > > controller PCIe device, so lspci does not see it.
> > > 
> > > In the attachment I'm sending dmesg and lspci outputs from Linux and pci
> > > output from U-Boot.
> > > 
> > > What is suspicious to me is that this Memory controller device is on the
> > > same bus as the wifi card. PCIe is "point to point", so there should be
> > > only one device at the other end of the link... Therefore I'm not sure
> > > the kernel can handle something like "two PCIe devices" at the other end
> > > of a PCIe link.
> > > 
> > > Could you look at the attached logs and see whether anything looks
> > > suspicious? Or if you need other logs (either from U-Boot or the
> > > kernel), please let me know.
> > > 
> > > Note that U-Boot does not see the PCIe Bridge, as it is emulated only by
> > > the kernel. So U-Boot enumerates buses from zero and the kernel from one
> > > (as the kernel's bus zero is for the emulated PCIe Bridges).
> > 
> > I've lost track of what the problem is or what patch we're evaluating.
> 
> With the bad U-Boot (which enumerates and initializes all PCIe
> devices, including that memory controller), the Linux kernel is unable
> to use PCIe devices; e.g. the ath10k driver fails to start. If the bad
> U-Boot has PCIe device initialization disabled, the kernel has no problems.
> 
> > Here's what I see from dmesg/lspci/uboot:
> > 
> >   # dmesg (both bad/ok) and lspci:
> >   00:01.0 [11ab:6820] Root Port to [bus 01]
> >   00:02.0 [11ab:6820] Root Port to [bus 02]
> >   00:03.0 [11ab:6820] Root Port to [bus 03]
> >   01:00.0 [168c:002e] Atheros AR9287 NIC
> >   02:00.0 [168c:0046] Atheros QCA9984 NIC
> >   03:00.0 [168c:003c] Atheros QCA986x/988x NIC
> > 
> > The above looks perfectly reasonable.
> > 
> >   # uboot (bad):
> >   00.00.00 [11ab:6820] memory controller
> >   00.01.00 [168c:002e] NIC
> >   01.00.00 [11ab:6820] memory controller
> >   01.01.00 [168c:0046] NIC
> >   02.00.00 [11ab:6820] memory controller
> >   02.01.00 [168c:003c] NIC
> > 
> > The above looks dubious at best.  Bus 00 clearly must be a root bus
> > because bus 00 can never be a bridge's secondary bus.
> > 
> > Either buses 01 and 02 need to also be root buses (e.g., if we had
> > three host bridges, one leading to bus 00, another to bus 01, and
> > another to bus 02), OR there must be Root Ports that act as bridges
> > leading from bus 00 to bus 01 and bus 02.
> 
> There are 3 independent links from CPU, so 3 independent buses. Buses 01
> and 02 are accessed directly, not via bus 00.

That sounds like the "three host bridges leading to three root buses"
scenario.  If the NICs are ordinary endpoints (not Root Complex
integrated endpoints), there must be a Root Port on each root bus, and
the Root Port and the endpoint must be on different buses (just like
any other bridge).

For example, you could have this (which seems to be what you describe
above):

  host bridge A to domain 0000 [bus 00-01]:
    00:00.0 Root Port to [bus 01]
    01:00.0 Atheros AR9287 NIC

  host bridge B to domain 0000 [bus 02-03]:
    02:00.0 Root Port to [bus 03]
    03:00.0 Atheros QCA9984 NIC

  host bridge C to domain 0000 [bus 04-05]:
    04:00.0 Root Port to [bus 05]
    05:00.0 Atheros QCA986x/988x NIC

Or, since each host bridge spawns a separate PCI hierarchy, you could
have each one in its own domain.  If the host bridges are truly
independent, this is probably a software choice and could look like
this:

  host bridge A to domain 0000 [bus 00-ff]:
    0000:00:00.0 Root Port to [bus 01]
    0000:01:00.0 Atheros AR9287 NIC

  host bridge B to domain 0001 [bus 00-ff]:
    0001:00:00.0 Root Port to [bus 01]
    0001:01:00.0 Atheros QCA9984 NIC

  host bridge C to domain 0002 [bus 00-ff]:
    0002:00:00.0 Root Port to [bus 01]
    0002:01:00.0 Atheros QCA986x/988x NIC
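As a toy illustration of why the per-domain layout works (plain C,
not any kernel API; the packing below is my own assumption): the
domain (segment) number is simply higher-order address bits, so each
hierarchy can reuse bus numbers 00-ff without collision:

```c
#include <stdint.h>

/* Pack domain:bus:dev.fn into one comparable value.  Device numbers
 * are 5 bits (0-31) and function numbers 3 bits (0-7), per PCI. */
static uint64_t pci_dbdf(uint16_t domain, uint8_t bus, uint8_t dev, uint8_t fn)
{
    return ((uint64_t)domain << 16) |
           ((uint64_t)bus << 8) |
           ((uint64_t)(dev & 0x1f) << 3) |
           (uint64_t)(fn & 0x7);
}
```

With this, 0000:01:00.0 and 0001:01:00.0 are distinct addresses even
though both use bus 01.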

> Linux devices 00:01.0, 00:02.0 and 00:03.0 are just virtual devices
> created by the kernel's pci-bridge-emul.c driver. They are not real
> devices; they are not present on the PCIe bus and therefore they
> cannot be visible or available in U-Boot.

If the NICs are ordinary PCIe endpoints there must be *something* to
terminate the other end of the link.  Maybe it has some sort of
non-standard programming interface, but from a PCIe topology point of
view, it's a root port.

I don't think I can contribute anything to the stuff below.  It sounds
like there's some confusion about how to handle these root ports that
aren't exactly root ports.  That's really up to uboot and the mvebu
driver to figure out.

> Moreover, the kernel pci-mvebu.c controller driver filters out exactly
> one device on every bus, which results in the "memory controller" not
> being visible in lspci.
> 
> Moreover, there is an mvebu-specific register which seems to set the
> device number at which this "memory controller" is present. U-Boot sets
> this register to zero, so the "memory controller" is at XX:00.00 and
> the wifi card at XX:01.00. The kernel sets this register to one, so the
> "memory controller" is at XX:01.00 and the wifi card at XX:00.00. The
> kernel then filters config read/write accesses to BDF address XX:01.YY.
> 
> > The "memory controllers"
> > are vendor/device ID [11ab:6820], which Linux thinks are Root Ports,
> > so I assume they are really Root Ports (or some emulation of them).
> 
> It is just a coincidence that the memory controller visible in PCIe
> config space has the same PCI device ID as the virtual root bridge
> emulated by the kernel's pci-bridge-emul.c driver. These are totally
> different devices.
> 
> > It's *possible* to have both a Root Port and a NIC on bus 0, as shown
> > here.  However, the NIC would have to be a Root Complex integrated
> > Endpoint, and this NIC ([168c:002e]) is not one of those.
> 
> This is an ordinary PCIe wifi card; it does not have an integrated
> Root Complex. Moreover, that "memory controller" device is visible (in
> U-Boot) even when I disconnect the wifi card.
> 
> > It's a
> > garden-variety PCIe legacy endpoint connected by a link.  So this NIC
> > cannot actually be on bus 00.
> > 
> > All these NICs are PCIe legacy endpoints with links, so they all must
> > have a Root Port leading to them.  So this topology is not really
> > possible.
> > 
> >   # uboot (ok):
> >   00.00.00 [168c:002e] NIC
> >   01.00.00 [168c:0046] NIC
> >   02.00.00 [168c:003c] NIC
> > 
> > This topology is impossible from a PCI perspective because there's no
> > way to get from bus 00 to bus 01 or 02.
> 
> This matches the Linux lspci output, just with the first bus indexed
> from zero instead of one. In Linux it is indexed from one because bus
> zero holds the fake/virtual bridge devices emulated by the Linux
> kernel.
> 
> Does it make a little more sense now?


* Re: [EXT] Re: pci mvebu issue (memory controller)
  2021-06-02 19:14 ` [EXT] " Bjorn Helgaas
@ 2021-06-02 20:39   ` Pali Rohár
  2021-06-02 21:01     ` Bjorn Helgaas
  0 siblings, 1 reply; 12+ messages in thread
From: Pali Rohár @ 2021-06-02 20:39 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Stefan Chulski, Bjorn Helgaas, Thomas Petazzoni,
	Marek Behún, Stefan Roese, Phil Sutter, Mario Six,
	Lorenzo Pieralisi, linux-pci

On Wednesday 02 June 2021 14:14:55 Bjorn Helgaas wrote:
> On Wed, Jun 02, 2021 at 01:07:03PM +0200, Pali Rohár wrote:
> 
> > In the configuration with the *bad* suffix, a U-Boot is used which does
> > not ignore the Memory controller PCIe device and configures it during
> > initialization. In this configuration the loaded kernel is unable to
> > initialize the wifi cards.
> > 
> > In the configuration with the *ok* suffix, U-Boot is explicitly patched
> > to ignore the Memory controller PCIe device, and the loaded kernel can
> > use the wifi cards without any issue.
> > 
> > In both configurations the same kernel version is used. As I wrote in
> > previous emails, the kernel already ignores and hides the Memory
> > controller PCIe device, so lspci does not see it.
> > 
> > In the attachment I'm sending dmesg and lspci outputs from Linux and pci
> > output from U-Boot.
> > 
> > What is suspicious to me is that this Memory controller device is on the
> > same bus as the wifi card. PCIe is "point to point", so there should be
> > only one device at the other end of the link... Therefore I'm not sure
> > the kernel can handle something like "two PCIe devices" at the other end
> > of a PCIe link.
> > 
> > Could you look at the attached logs and see whether anything looks
> > suspicious? Or if you need other logs (either from U-Boot or the
> > kernel), please let me know.
> > 
> > Note that U-Boot does not see the PCIe Bridge, as it is emulated only by
> > the kernel. So U-Boot enumerates buses from zero and the kernel from one
> > (as the kernel's bus zero is for the emulated PCIe Bridges).
> 
> I've lost track of what the problem is or what patch we're evaluating.

With the bad U-Boot (which enumerates and initializes all PCIe devices,
including that memory controller), the Linux kernel is unable to use
PCIe devices; e.g. the ath10k driver fails to start. If the bad U-Boot
has PCIe device initialization disabled, the kernel has no problems.

> Here's what I see from dmesg/lspci/uboot:
> 
>   # dmesg (both bad/ok) and lspci:
>   00:01.0 [11ab:6820] Root Port to [bus 01]
>   00:02.0 [11ab:6820] Root Port to [bus 02]
>   00:03.0 [11ab:6820] Root Port to [bus 03]
>   01:00.0 [168c:002e] Atheros AR9287 NIC
>   02:00.0 [168c:0046] Atheros QCA9984 NIC
>   03:00.0 [168c:003c] Atheros QCA986x/988x NIC
> 
> The above looks perfectly reasonable.
> 
>   # uboot (bad):
>   00.00.00 [11ab:6820] memory controller
>   00.01.00 [168c:002e] NIC
>   01.00.00 [11ab:6820] memory controller
>   01.01.00 [168c:0046] NIC
>   02.00.00 [11ab:6820] memory controller
>   02.01.00 [168c:003c] NIC
> 
> The above looks dubious at best.  Bus 00 clearly must be a root bus
> because bus 00 can never be a bridge's secondary bus.
> 
> Either buses 01 and 02 need to also be root buses (e.g., if we had
> three host bridges, one leading to bus 00, another to bus 01, and
> another to bus 02), OR there must be Root Ports that act as bridges
> leading from bus 00 to bus 01 and bus 02.

There are 3 independent links from CPU, so 3 independent buses. Buses 01
and 02 are accessed directly, not via bus 00.

Linux devices 00:01.0, 00:02.0 and 00:03.0 are just virtual devices
created by the kernel's pci-bridge-emul.c driver. They are not real
devices; they are not present on the PCIe bus and therefore they cannot
be visible or available in U-Boot.

Moreover, the kernel pci-mvebu.c controller driver filters out exactly
one device on every bus, which results in the "memory controller" not
being visible in lspci.

Moreover, there is an mvebu-specific register which seems to set the
device number at which this "memory controller" is present. U-Boot sets
this register to zero, so the "memory controller" is at XX:00.00 and
the wifi card at XX:01.00. The kernel sets this register to one, so the
"memory controller" is at XX:01.00 and the wifi card at XX:00.00. The
kernel then filters config read/write accesses to BDF address XX:01.YY.
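A small sketch of that behaviour (plain C; the struct and field names
are my assumptions, not the real mvebu register definitions): one SoC
register picks the device number at which the "memory controller"
appears on the link-local bus, and software can then hide it by
filtering config accesses to that slot:

```c
#include <stdbool.h>
#include <stdint.h>

struct mvebu_port {
    uint8_t memctrl_devno; /* U-Boot programs 0, Linux pci-mvebu programs 1 */
};

/* Does this device number belong to the "memory controller"? */
static bool is_memctrl(const struct mvebu_port *p, uint8_t dev)
{
    return dev == p->memctrl_devno;
}

/* Linux-style hiding: reject config accesses aimed at the memory
 * controller's device number (a rejected read returns all-ones). */
static bool cfg_access_allowed(const struct mvebu_port *p, uint8_t dev)
{
    return !is_memctrl(p, dev);
}
```

This also makes the observed symptom plausible: if U-Boot places the
memory controller at devno 0 and configures it there, while the kernel
expects it at devno 1 and filters that slot, the two views of the same
HW disagree.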

> The "memory controllers"
> are vendor/device ID [11ab:6820], which Linux thinks are Root Ports,
> so I assume they are really Root Ports (or some emulation of them).

It is just a coincidence that the memory controller visible in PCIe
config space has the same PCI device ID as the virtual root bridge
emulated by the kernel's pci-bridge-emul.c driver. These are totally
different devices.

> It's *possible* to have both a Root Port and a NIC on bus 0, as shown
> here.  However, the NIC would have to be a Root Complex integrated
> Endpoint, and this NIC ([168c:002e]) is not one of those.

This is an ordinary PCIe wifi card; it does not have an integrated Root
Complex. Moreover, that "memory controller" device is visible (in
U-Boot) even when I disconnect the wifi card.

> It's a
> garden-variety PCIe legacy endpoint connected by a link.  So this NIC
> cannot actually be on bus 00.
> 
> All these NICs are PCIe legacy endpoints with links, so they all must
> have a Root Port leading to them.  So this topology is not really
> possible.
> 
>   # uboot (ok):
>   00.00.00 [168c:002e] NIC
>   01.00.00 [168c:0046] NIC
>   02.00.00 [168c:003c] NIC
> 
> This topology is impossible from a PCI perspective because there's no
> way to get from bus 00 to bus 01 or 02.

This matches the Linux lspci output, just with the first bus indexed
from zero instead of one. In Linux it is indexed from one because bus
zero holds the fake/virtual bridge devices emulated by the Linux
kernel.

Does it make a little more sense now?


* Re: [EXT] Re: pci mvebu issue (memory controller)
       [not found] <20210602110703.ymdt6nxsjl7e6glk@pali>
@ 2021-06-02 19:14 ` Bjorn Helgaas
  2021-06-02 20:39   ` Pali Rohár
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2021-06-02 19:14 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Stefan Chulski, Bjorn Helgaas, Thomas Petazzoni,
	Marek Behún, Stefan Roese, Phil Sutter, Mario Six,
	Lorenzo Pieralisi, linux-pci

On Wed, Jun 02, 2021 at 01:07:03PM +0200, Pali Rohár wrote:

> In the configuration with the *bad* suffix, a U-Boot is used which does
> not ignore the Memory controller PCIe device and configures it during
> initialization. In this configuration the loaded kernel is unable to
> initialize the wifi cards.
> 
> In the configuration with the *ok* suffix, U-Boot is explicitly patched
> to ignore the Memory controller PCIe device, and the loaded kernel can
> use the wifi cards without any issue.
> 
> In both configurations the same kernel version is used. As I wrote in
> previous emails, the kernel already ignores and hides the Memory
> controller PCIe device, so lspci does not see it.
> 
> In the attachment I'm sending dmesg and lspci outputs from Linux and pci
> output from U-Boot.
> 
> What is suspicious to me is that this Memory controller device is on the
> same bus as the wifi card. PCIe is "point to point", so there should be
> only one device at the other end of the link... Therefore I'm not sure
> the kernel can handle something like "two PCIe devices" at the other end
> of a PCIe link.
> 
> Could you look at the attached logs and see whether anything looks
> suspicious? Or if you need other logs (either from U-Boot or the
> kernel), please let me know.
> 
> Note that U-Boot does not see the PCIe Bridge, as it is emulated only by
> the kernel. So U-Boot enumerates buses from zero and the kernel from one
> (as the kernel's bus zero is for the emulated PCIe Bridges).

I've lost track of what the problem is or what patch we're evaluating.

Here's what I see from dmesg/lspci/uboot:

  # dmesg (both bad/ok) and lspci:
  00:01.0 [11ab:6820] Root Port to [bus 01]
  00:02.0 [11ab:6820] Root Port to [bus 02]
  00:03.0 [11ab:6820] Root Port to [bus 03]
  01:00.0 [168c:002e] Atheros AR9287 NIC
  02:00.0 [168c:0046] Atheros QCA9984 NIC
  03:00.0 [168c:003c] Atheros QCA986x/988x NIC

The above looks perfectly reasonable.

  # uboot (bad):
  00.00.00 [11ab:6820] memory controller
  00.01.00 [168c:002e] NIC
  01.00.00 [11ab:6820] memory controller
  01.01.00 [168c:0046] NIC
  02.00.00 [11ab:6820] memory controller
  02.01.00 [168c:003c] NIC

The above looks dubious at best.  Bus 00 clearly must be a root bus
because bus 00 can never be a bridge's secondary bus.

Either buses 01 and 02 need to also be root buses (e.g., if we had
three host bridges, one leading to bus 00, another to bus 01, and
another to bus 02), OR there must be Root Ports that act as bridges
leading from bus 00 to bus 01 and bus 02.  The "memory controllers"
are vendor/device ID [11ab:6820], which Linux thinks are Root Ports,
so I assume they are really Root Ports (or some emulation of them).

It's *possible* to have both a Root Port and a NIC on bus 0, as shown
here.  However, the NIC would have to be a Root Complex integrated
Endpoint, and this NIC ([168c:002e]) is not one of those.  It's a
garden-variety PCIe legacy endpoint connected by a link.  So this NIC
cannot actually be on bus 00.

All these NICs are PCIe legacy endpoints with links, so they all must
have a Root Port leading to them.  So this topology is not really
possible.

  # uboot (ok):
  00.00.00 [168c:002e] NIC
  01.00.00 [168c:0046] NIC
  02.00.00 [168c:003c] NIC

This topology is impossible from a PCI perspective because there's no
way to get from bus 00 to bus 01 or 02.
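The two structural rules applied above can be sketched as checks
(plain C, illustrative only; the types and names are my assumptions,
not kernel code): bus 00 can never be a bridge's secondary bus, and an
ordinary (non-RC-integrated) endpoint cannot sit directly on a root
bus because something must bridge the link to it:

```c
#include <stdbool.h>
#include <stdint.h>

enum pci_dev_type { DEV_ROOT_PORT, DEV_LEGACY_EP, DEV_RC_INTEGRATED_EP };

/* May a device of this type legally appear on a root bus?  Ordinary
 * endpoints need a Root Port on the root bus bridging to them. */
static bool ok_on_root_bus(enum pci_dev_type t)
{
    return t == DEV_ROOT_PORT || t == DEV_RC_INTEGRATED_EP;
}

/* A bridge's secondary bus number must be greater than its own bus
 * number, so bus 00 can never be anyone's secondary bus: it must be
 * a root bus. */
static bool valid_secondary_bus(uint8_t bridge_bus, uint8_t secondary)
{
    return secondary > bridge_bus;
}
```

By these checks, the "bad" U-Boot listing (legacy-endpoint NICs on the
same buses as the "memory controllers", with no bridges between buses)
fails both rules.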


end of thread, other threads:[~2021-11-01 18:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-09 13:17 pci mvebu issue (memory controller) Marek Behún
2021-02-10  8:54 ` Thomas Petazzoni
2021-02-10 13:59   ` [EXT] " Stefan Chulski
2021-02-19 17:44     ` Pali Rohár
2021-03-04 18:29       ` Bjorn Helgaas
2021-11-01 18:07         ` Jason Gunthorpe
2021-10-03 12:09 ` Pali Rohár
     [not found] <20210602110703.ymdt6nxsjl7e6glk@pali>
2021-06-02 19:14 ` [EXT] " Bjorn Helgaas
2021-06-02 20:39   ` Pali Rohár
2021-06-02 21:01     ` Bjorn Helgaas
2021-06-02 21:13       ` Pali Rohár
2021-06-02 21:59         ` Marek Behún
