nouveau.lists.freedesktop.org archive mirror
* GT710 and Nouveau on ARM/ARM64
@ 2020-10-28 13:46 Dave Stevenson
       [not found] ` <CAPY8ntDMWeJao5Ld435s0cLSH-a7yYe4=daUso-nZNLdarMupg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Stevenson @ 2020-10-28 13:46 UTC (permalink / raw)
  To: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi

Seeing as we (Raspberry Pi) have just launched the Compute Module 4
with an exposed PCIe x1 lane, people are asking about adding graphics
cards.

Since you are the people with the knowledge of NVIDIA hardware and
nouveau, what are your immediate thoughts on nouveau working on
ARM/ARM64? Is there a chance of this working? I'm no PCIe
expert, although I can call on some expertise :-/

I've tried it so far with a GT710 board [1] and ARM64. It's blowing up
in the memset of nvkm_instobj_new whilst initialising the BAR
subdevice [2], having gone through the "No such luck" path in
nvkm_mmu_ptc_get [3].
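
For reference, the path in question looks roughly like this (paraphrased
from the instmem/base.c linked at [2], so please treat it as a sketch of
the relevant fragment rather than the exact code):

    /* nvkm_instobj_new(): after allocating instance memory, zero it.
     * If the backend can't zero it for us, take a CPU mapping (which
     * goes through BAR2 on these cards) and clear it; the memset on
     * that mapping is where it blows up for me. */
    if (!imem->func->zero && zero) {
        void __iomem *map = nvkm_kmap(memory);
        if (unlikely(!map)) {
            /* no CPU mapping: fall back to 32-bit accessor writes */
            for (offset = 0; offset < size; offset += 4)
                nvkm_wo32(memory, offset, 0x00000000);
        } else {
            memset(map, 0x00, size);
            nvkm_done(memory);
        }
    }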

Taking the naive approach of simply removing the memset, I get through
initialising all the subdevices, but again die in a location I
currently haven't pinpointed. The last logging messages are:
[ 1023.407302] nouveau 0000:01:00.0: fifo: one-time init completed in 760us
[ 1023.407312] nouveau 0000:01:00.0: fifo: init completed in 775us
[ 1023.407325] nouveau: DRM-master:00000000:80009009: init running...
[ 1023.407329] nouveau: DRM-master:00000000:80009009: init children...
[ 1023.407333] nouveau: DRM-master:00000000:80009009: init completed in 4us
[ 1023.407352] nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 00000000000d1000 engine 05 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel -1 [007fd38000 unknown]
[ 1023.407354] nouveau: DRM-master:00000000:00000000: ioctl: return 0
[ 1023.407385] nouveau: DRM-master:00000000:00000000: ioctl: size 32
[ 1023.407392] nouveau: DRM-master:00000000:00000000: ioctl: vers 0 type 01 object ffffff80ee8c2170 owner ff
[ 1023.407415] nouveau: DRM-master:00000000:80009009: ioctl: sclass size 8
[ 1023.407419] nouveau: DRM-master:00000000:80009009: ioctl: sclass vers 0 count 0
[ 1023.407432] nouveau: DRM-master:00000000:00000000: ioctl: return 0
[ 1023.407452] nouveau: DRM-master:00000000:00000000: ioctl: size 48
[ 1023.407459] nouveau: DRM-master:00000000:00000000: ioctl: vers 0 type 01 object ffffff80ee8c2170 owner ff
[ 1023.407482] nouveau: DRM-master:00000000:80009009: ioctl: sclass size 24
[ 1023.407485] nouveau: DRM-master:00000000:80009009: ioctl: sclass vers 0 count 2
[ 1023.407498] nouveau: DRM-master:00000000:00000000: ioctl: return 0
[ 1023.407519] nouveau: DRM-master:00000000:00000000: ioctl: size 48

Any input is very welcome; otherwise I'll continue blundering about
slightly in the dark.

Thanks
  Dave

[1] https://www.amazon.co.uk/ASUS-GT710-4H-SL-2GD5-GeForce-Multi-Monitor-Productivity/dp/B0897T6PYM/
[2] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/base.c#L114
[3] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/base.c#L201


* Re: GT710 and Nouveau on ARM/ARM64
       [not found] ` <CAPY8ntDMWeJao5Ld435s0cLSH-a7yYe4=daUso-nZNLdarMupg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-10-28 14:10   ` Ilia Mirkin
       [not found]     ` <CAKb7Uvi-+0nt8Jfp+kaRC=Eq2s5bSB_VFSHqSyV_9tgdDRvg9A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Ilia Mirkin @ 2020-10-28 14:10 UTC (permalink / raw)
  To: Dave Stevenson; +Cc: nouveau


The most common issue on arm is that the pci memory window is too narrow to
allocate all the BARs. Can you see if there are messages in the kernel to
that effect?
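
For what it's worth, when a BAR doesn't get assigned, the driver just
sees an empty resource for it. A generic way that shows up from driver
code (the stock PCI pattern, not necessarily how nouveau checks it)
would be something like:

    /* generic sketch: an unassigned BAR appears as a zero/empty resource */
    if (!pci_resource_start(pdev, bar) || !pci_resource_len(pdev, bar))
        dev_err(&pdev->dev, "BAR %d was not assigned an address\n", bar);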



* Re: GT710 and Nouveau on ARM/ARM64
       [not found]     ` <CAKb7Uvi-+0nt8Jfp+kaRC=Eq2s5bSB_VFSHqSyV_9tgdDRvg9A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-10-28 14:19       ` Dave Stevenson
       [not found]         ` <CAPY8ntBOnWo78VhhgRew9o67_0VrtLtAAcDY-U07ksCPQA-e0w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Stevenson @ 2020-10-28 14:19 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau

Hi Ilia

Thanks for taking the time to reply.

On Wed, 28 Oct 2020 at 14:10, Ilia Mirkin <imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org> wrote:
>
> The most common issue on arm is that the pci memory window is too narrow to allocate all the BARs. Can you see if there are messages in the kernel to that effect?

All the BAR allocations seem to succeed except for the IO one.
AIUI I/O is deprecated, but is it still used on these cards?

[    1.060851] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
[    1.060892] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
[    1.060975] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x063fffffff -> 0x00c0000000
[    1.061061] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x00ffffffff -> 0x0100000000
[    1.109943] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
[    1.110129] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
[    1.110159] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.110187] pci_bus 0000:00: root bus resource [mem 0x600000000-0x63fffffff] (bus address [0xc0000000-0xffffffff])
[    1.110286] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
[    1.110505] pci 0000:00:00.0: PME# supported from D0 D3hot
[    1.114095] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.114343] pci 0000:01:00.0: [10de:128b] type 00 class 0x030000
[    1.114404] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
[    1.114456] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff 64bit pref]
[    1.114510] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff 64bit pref]
[    1.114551] pci 0000:01:00.0: reg 0x24: [io  0x0000-0x007f]
[    1.114590] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[    1.114853] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
[    1.115022] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    1.115125] pci 0000:01:00.1: [10de:0e0f] type 00 class 0x040300
[    1.115184] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[    1.119065] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    1.119120] pci 0000:00:00.0: BAR 9: assigned [mem 0x600000000-0x60bffffff 64bit pref]
[    1.119151] pci 0000:00:00.0: BAR 8: assigned [mem 0x60c000000-0x60d7fffff]
[    1.119183] pci 0000:01:00.0: BAR 1: assigned [mem 0x600000000-0x607ffffff 64bit pref]
[    1.119235] pci 0000:01:00.0: BAR 3: assigned [mem 0x608000000-0x609ffffff 64bit pref]
[    1.119285] pci 0000:01:00.0: BAR 0: assigned [mem 0x60c000000-0x60cffffff]
[    1.119317] pci 0000:01:00.0: BAR 6: assigned [mem 0x60d000000-0x60d07ffff pref]
[    1.119345] pci 0000:01:00.1: BAR 0: assigned [mem 0x60d080000-0x60d083fff]
[    1.119376] pci 0000:01:00.0: BAR 5: no space for [io  size 0x0080]
[    1.119400] pci 0000:01:00.0: BAR 5: failed to assign [io  size 0x0080]
[    1.119426] pci 0000:00:00.0: PCI bridge to [bus 01]
[    1.119456] pci 0000:00:00.0:   bridge window [mem 0x60c000000-0x60d7fffff]
[    1.119484] pci 0000:00:00.0:   bridge window [mem 0x600000000-0x60bffffff 64bit pref]
[    1.119662] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0

  Dave



* Re: GT710 and Nouveau on ARM/ARM64
       [not found]         ` <CAPY8ntBOnWo78VhhgRew9o67_0VrtLtAAcDY-U07ksCPQA-e0w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-10-28 14:59           ` Ilia Mirkin
       [not found]             ` <CAKb7UvguZ0VfDLTUJwBpTjR_M1dHaeajrkjDCHmpKQty4Ja9yw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Ilia Mirkin @ 2020-10-28 14:59 UTC (permalink / raw)
  To: Dave Stevenson; +Cc: nouveau

On Wed, Oct 28, 2020 at 10:20 AM Dave Stevenson
<dave.stevenson-FnsA7b+Nu9Xg+KzdeAJ3hg@public.gmane.org> wrote:
>
> Hi Ilia
>
> Thanks for taking the time to reply.
>
> On Wed, 28 Oct 2020 at 14:10, Ilia Mirkin <imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org> wrote:
> >
> > The most common issue on arm is that the pci memory window is too narrow to allocate all the BARs. Can you see if there are messages in the kernel to that effect?
>
> All the BAR allocations seem to succeed except for the IO one.
> AIUI I/O is deprecated, but is it still used on these cards?

I must admit I was ignorant of the fact that the IO ports were treated
as a BAR, but it makes a lot of sense.

One thing does stand out as odd:

>
> [    1.060851] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
> [    1.060892] brcm-pcie fd500000.pcie:   No bus range found for
> /scb/pcie@7d500000, using [bus 00-ff]
> [    1.060975] brcm-pcie fd500000.pcie:      MEM
> 0x0600000000..0x063fffffff -> 0x00c0000000
> [    1.061061] brcm-pcie fd500000.pcie:   IB MEM
> 0x0000000000..0x00ffffffff -> 0x0100000000
> [    1.109943] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
> [    1.110129] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
> [    1.110159] pci_bus 0000:00: root bus resource [bus 00-ff]
> [    1.110187] pci_bus 0000:00: root bus resource [mem
> 0x600000000-0x63fffffff] (bus address [0xc0000000-0xffffffff])
> [    1.110286] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
> [    1.110505] pci 0000:00:00.0: PME# supported from D0 D3hot
> [    1.114095] pci 0000:00:00.0: bridge configuration invalid ([bus
> 00-00]), reconfiguring
> [    1.114343] pci 0000:01:00.0: [10de:128b] type 00 class 0x030000
> [    1.114404] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
> [    1.114456] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
> 64bit pref]
> [    1.114510] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
> 64bit pref]
> [    1.114551] pci 0000:01:00.0: reg 0x24: [io  0x0000-0x007f]
> [    1.114590] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> [    1.114853] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth,
> limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 63.008
> Gb/s with 8.0 GT/s PCIe x8 link)
> [    1.115022] pci 0000:01:00.0: vgaarb: VGA device added:
> decodes=io+mem,owns=none,locks=none
> [    1.115125] pci 0000:01:00.1: [10de:0e0f] type 00 class 0x040300
> [    1.115184] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> [    1.119065] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> [    1.119120] pci 0000:00:00.0: BAR 9: assigned [mem
> 0x600000000-0x60bffffff 64bit pref]
> [    1.119151] pci 0000:00:00.0: BAR 8: assigned [mem 0x60c000000-0x60d7fffff]

This is your brcm-pcie device.

> [    1.119183] pci 0000:01:00.0: BAR 1: assigned [mem
> 0x600000000-0x607ffffff 64bit pref]
> [    1.119235] pci 0000:01:00.0: BAR 3: assigned [mem
> 0x608000000-0x609ffffff 64bit pref]
> [    1.119285] pci 0000:01:00.0: BAR 0: assigned [mem 0x60c000000-0x60cffffff]

And this is the NVIDIA device. Note that these memory windows are
identical (or at least overlapping). I must admit almost complete
ignorance of PCIe and whether this is OK, but it seems sketchy at
first glance. A quick eyeballing of my x86 system suggests that all
PCIe devices get non-overlapping windows. OTOH there are messages
further up about some sort of remapping going on, so perhaps it's OK?
But two things on the same bus still shouldn't have the same addresses
allocated, based on my (limited) understanding.

In case it's an option, could you "unplug" the NIC (not just not load
its driver, but make it not appear at all on the PCI bus)?

> [    1.119317] pci 0000:01:00.0: BAR 6: assigned [mem
> 0x60d000000-0x60d07ffff pref]

Never heard of a BAR6 on NVIDIA. Probably just my ignorance though.

> [    1.119345] pci 0000:01:00.1: BAR 0: assigned [mem 0x60d080000-0x60d083fff]
> [    1.119376] pci 0000:01:00.0: BAR 5: no space for [io  size 0x0080]
> [    1.119400] pci 0000:01:00.0: BAR 5: failed to assign [io  size 0x0080]
> [    1.119426] pci 0000:00:00.0: PCI bridge to [bus 01]
> [    1.119456] pci 0000:00:00.0:   bridge window [mem 0x60c000000-0x60d7fffff]
> [    1.119484] pci 0000:00:00.0:   bridge window [mem
> 0x600000000-0x60bffffff 64bit pref]
> [    1.119662] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0

Back to your original issue:

> >> I've tried it so far with a GT710 board [1] and ARM64. It's blowing up
> >> in the memset of nvkm_instobj_new whilst initialising the BAR
> >> subdevice [2], having gone through the "No such luck" path in
> >> nvkm_mmu_ptc_get [3].
> >>
> >> Taking the naive approach of simply removing the memset, I get through
> >> initialising all the subdevices, but again die in a location I
> >> currently haven't pinpointed. The last logging messages are:

That's not a winning strategy, I'm afraid. You need to figure out why
the memset is blowing up. The simplest explanation is "it's trying to
write to an I/O resource but that resource wasn't allocated", hence my
question about BARs. But something's not mapped, or mapped in the
wrong way, or whatever. If you can't write to it at that point in
time, you won't be able to write to it later either. I would focus on
that.
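
If it helps narrow it down, one (completely untested) experiment would
be to poke the same allocation with single word accesses through the
existing accessors before the bulk clear, assuming nvkm_wo32/nvkm_ro32
work on that object at that point -- I haven't checked. Something like:

    /* untested sketch: probe the instance memory with single 32-bit
     * accesses. If these fault as well, the CPU mapping of BAR2 itself
     * is suspect; if only the bulk memset faults, look at how that
     * mapping is set up (attributes, size, offset). */
    nvkm_wo32(memory, 0x0, 0x00000000);
    if (nvkm_ro32(memory, 0x0) != 0x00000000)
        nvkm_error(&imem->subdev, "BAR2 readback mismatch\n");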

FWIW this all does "work" on ARM in general -- the Jetson
TK1/TX1/T(latest)1 are NVIDIA ARM devices with onboard *platform*
devices (not PCI) which work OK. So the core of nouveau works (or at
least worked) fine. Just some of the PCIe glue may be off.

Cheers,

  -ilia


* Re: GT710 and Nouveau on ARM/ARM64
       [not found]             ` <CAKb7UvguZ0VfDLTUJwBpTjR_M1dHaeajrkjDCHmpKQty4Ja9yw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-11-03 18:07               ` Dave Stevenson
       [not found]                 ` <CAPY8ntC6NTJyMyXv_1wrw4D52-MRRFn8AVZaVpKJ5NEaba1thg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Stevenson @ 2020-11-03 18:07 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau

Hi Ilia
Thanks again for the reply.

On Wed, 28 Oct 2020 at 14:59, Ilia Mirkin <imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org> wrote:
>
> On Wed, Oct 28, 2020 at 10:20 AM Dave Stevenson
> <dave.stevenson-FnsA7b+Nu9Xg+KzdeAJ3hg@public.gmane.org> wrote:
> >
> > Hi Ilia
> >
> > Thanks for taking the time to reply.
> >
> > On Wed, 28 Oct 2020 at 14:10, Ilia Mirkin <imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org> wrote:
> > >
> > > The most common issue on arm is that the pci memory window is too narrow to allocate all the BARs. Can you see if there are messages in the kernel to that effect?
> >
> > All the BAR allocations seem to succeed except for the IO one.
> > AIUI I/O is deprecated, but is it still used on these cards?
>
> I must admit I was ignorant of the fact that the IO ports were treated
> as a BAR, but it makes a lot of sense.
>
> One thing does stand out as odd:
>
> >
> > [    1.060851] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
> > [    1.060892] brcm-pcie fd500000.pcie:   No bus range found for
> > /scb/pcie@7d500000, using [bus 00-ff]
> > [    1.060975] brcm-pcie fd500000.pcie:      MEM
> > 0x0600000000..0x063fffffff -> 0x00c0000000
> > [    1.061061] brcm-pcie fd500000.pcie:   IB MEM
> > 0x0000000000..0x00ffffffff -> 0x0100000000
> > [    1.109943] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
> > [    1.110129] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
> > [    1.110159] pci_bus 0000:00: root bus resource [bus 00-ff]
> > [    1.110187] pci_bus 0000:00: root bus resource [mem
> > 0x600000000-0x63fffffff] (bus address [0xc0000000-0xffffffff])
> > [    1.110286] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
> > [    1.110505] pci 0000:00:00.0: PME# supported from D0 D3hot
> > [    1.114095] pci 0000:00:00.0: bridge configuration invalid ([bus
> > 00-00]), reconfiguring
> > [    1.114343] pci 0000:01:00.0: [10de:128b] type 00 class 0x030000
> > [    1.114404] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
> > [    1.114456] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
> > 64bit pref]
> > [    1.114510] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
> > 64bit pref]
> > [    1.114551] pci 0000:01:00.0: reg 0x24: [io  0x0000-0x007f]
> > [    1.114590] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> > [    1.114853] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth,
> > limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 63.008
> > Gb/s with 8.0 GT/s PCIe x8 link)
> > [    1.115022] pci 0000:01:00.0: vgaarb: VGA device added:
> > decodes=io+mem,owns=none,locks=none
> > [    1.115125] pci 0000:01:00.1: [10de:0e0f] type 00 class 0x040300
> > [    1.115184] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> > [    1.119065] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> > [    1.119120] pci 0000:00:00.0: BAR 9: assigned [mem
> > 0x600000000-0x60bffffff 64bit pref]
> > [    1.119151] pci 0000:00:00.0: BAR 8: assigned [mem 0x60c000000-0x60d7fffff]
>
> This is your brcm-pcie device.
>
> > [    1.119183] pci 0000:01:00.0: BAR 1: assigned [mem
> > 0x600000000-0x607ffffff 64bit pref]
> > [    1.119235] pci 0000:01:00.0: BAR 3: assigned [mem
> > 0x608000000-0x609ffffff 64bit pref]
> > [    1.119285] pci 0000:01:00.0: BAR 0: assigned [mem 0x60c000000-0x60cffffff]
>
> And this is the NVIDIA device. Note that these memory windows are
> identical (or at least overlapping). I must admit almost complete
> ignorance of PCIe and whether this is OK, but it seems sketchy at
> first glance. A quick eyeballing of my x86 system suggests that all
> PCIe devices get non-overlapping windows. OTOH there are messages
> further up about some sort of remapping going on, so perhaps it's OK?
> But two things on the same bus still shouldn't have the same addresses
> allocated, based on my (limited) understanding.

I've raised this with colleagues and it seems that this is normal.
The PCI bridge reports the window through which devices can be mapped,
and all devices have to fit within that. Pass as to whether that is a
quirk of ARM or this particular bridge.

I do note that on my x86 systems device 0000:00:00.0 is reported by
lspci as a "Host bridge" instead of a "PCI bridge".
On an Ubuntu VM I've got running, I do get
[    0.487249] PCI host bridge to bus 0000:00
[    0.487252] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    0.487254] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    0.487256] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    0.487258] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xfdffffff window]
[    0.487260] pci_bus 0000:00: root bus resource [bus 00-ff]
and all device allocations are from within those ranges, so I'm not
convinced it's that different.
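
If it's any use, my understanding is that the kernel tracks all of this
as nested resources, so a device BAR ends up as a child of its bridge
window rather than a competing sibling, and the apparent "overlap" is
just that nesting. Roughly (lightly paraphrased from
include/linux/ioport.h):

    /* each line of /proc/iomem is one of these; a device BAR sits as a
     * child of the bridge window, which in turn is a child of the host
     * bridge aperture, so the ranges nest rather than conflict */
    struct resource {
        resource_size_t start;
        resource_size_t end;
        const char *name;
        unsigned long flags;
        struct resource *parent, *sibling, *child;
    };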

> In case it's an option, could you "unplug" the NIC (not just not load
> its driver, but make it not appear at all on the PCI bus)?

NIC? The network interface is totally separate. Or is this another
reuse of a TLA?

Unplugging the GPU means the PCI bus reports as being down and I get
no output at all from lspci.

> > [    1.119317] pci 0000:01:00.0: BAR 6: assigned [mem
> > 0x60d000000-0x60d07ffff pref]
>
> Never heard of a BAR6 on NVIDIA. Probably just my ignorance though.

You're ahead of me in knowledge, so I really don't know.
When I get a chance I'll have a look again on an x86 system to see
what it reports.

> > [    1.119345] pci 0000:01:00.1: BAR 0: assigned [mem 0x60d080000-0x60d083fff]
> > [    1.119376] pci 0000:01:00.0: BAR 5: no space for [io  size 0x0080]
> > [    1.119400] pci 0000:01:00.0: BAR 5: failed to assign [io  size 0x0080]
> > [    1.119426] pci 0000:00:00.0: PCI bridge to [bus 01]
> > [    1.119456] pci 0000:00:00.0:   bridge window [mem 0x60c000000-0x60d7fffff]
> > [    1.119484] pci 0000:00:00.0:   bridge window [mem
> > 0x600000000-0x60bffffff 64bit pref]
> > [    1.119662] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
>
> Back to your original issue:
> > >> I've tried it so far with a GT710 board [1] and ARM64. It's blowing up
> > >> in the memset of nvkm_instobj_new whilst initialising the BAR
> > >> subdevice [2], having gone through the "No such luck" path in
> > >> nvkm_mmu_ptc_get [3].
> > >>
> > >> Taking the naive approach of simply removing the memset, I get through
> > >> initialising all the subdevices, but again die in a location I
> > >> currently haven't pinpointed. The last logging messages are:
>
> That's not a winning strategy, I'm afraid. You need to figure out why
> the memset is blowing up. The simplest explanation is "it's trying to
> write to an I/O resource but that resource wasn't allocated", hence my
> question about BARs. But something's not mapped, or mapped in the
> wrong way, or whatever. If you can't write to it at that point in
> time, you won't be able to write to it later either. I would focus on
> that.

I did say it was the naive approach :-)
I was trying to gauge how much effort was going to be needed to get
this going. Was it going to blow up in 1, 10, or 100 places? It feels
like it is only a couple of things that are wrong, so there is hope.

Slightly annoyingly something more urgent has come up and I need to
shelve my experimentation for now, but thanks for the pointers. At
least I have some idea of where to start looking when time allows.

> FWIW this all does "work" on ARM in general -- the Jetson
> TK1/TX1/T(latest)1 are NVIDIA ARM devices with onboard *platform*
> devices (not PCI) which work OK. So the core of nouveau works (or at
> least worked) fine. Just some of the PCIe glue may be off.

It certainly feels like PCIe glue being off, but I was so far in the
dark it was worth the quick email to ask the question, and I'm very
grateful for the response.

Thanks again,
  Dave


* Re: GT710 and Nouveau on ARM/ARM64
       [not found]                 ` <CAPY8ntC6NTJyMyXv_1wrw4D52-MRRFn8AVZaVpKJ5NEaba1thg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-11-03 18:25                   ` Ilia Mirkin
       [not found]                     ` <CAKb7UvhqU-2tSWD4xjpoxKJsp1F=FhgSHaHr6sr2uagpxHMYzQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Ilia Mirkin @ 2020-11-03 18:25 UTC (permalink / raw)
  To: Dave Stevenson; +Cc: nouveau

On Tue, Nov 3, 2020 at 1:08 PM Dave Stevenson
<dave.stevenson-FnsA7b+Nu9Xg+KzdeAJ3hg@public.gmane.org> wrote:
>
> Hi Ilia
> Thanks again for the reply.
>
> On Wed, 28 Oct 2020 at 14:59, Ilia Mirkin <imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org> wrote:
> >
> > On Wed, Oct 28, 2020 at 10:20 AM Dave Stevenson
> > <dave.stevenson-FnsA7b+Nu9Xg+KzdeAJ3hg@public.gmane.org> wrote:
> > >
> > > Hi Ilia
> > >
> > > Thanks for taking the time to reply.
> > >
> > > On Wed, 28 Oct 2020 at 14:10, Ilia Mirkin <imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org> wrote:
> > > >
> > > > The most common issue on arm is that the pci memory window is too narrow to allocate all the BARs. Can you see if there are messages in the kernel to that effect?
> > >
> > > All the BAR allocations seem to succeed except for the IO one.
> > > AIUI I/O is deprecated, but is it still used on these cards?
> >
> > I must admit I was ignorant of the fact that the IO ports were treated
> > as a BAR, but it makes a lot of sense.
> >
> > One thing does stand out as odd:
> >
> > >
> > > [    1.060851] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
> > > [    1.060892] brcm-pcie fd500000.pcie:   No bus range found for
> > > /scb/pcie@7d500000, using [bus 00-ff]
> > > [    1.060975] brcm-pcie fd500000.pcie:      MEM
> > > 0x0600000000..0x063fffffff -> 0x00c0000000
> > > [    1.061061] brcm-pcie fd500000.pcie:   IB MEM
> > > 0x0000000000..0x00ffffffff -> 0x0100000000
> > > [    1.109943] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
> > > [    1.110129] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
> > > [    1.110159] pci_bus 0000:00: root bus resource [bus 00-ff]
> > > [    1.110187] pci_bus 0000:00: root bus resource [mem
> > > 0x600000000-0x63fffffff] (bus address [0xc0000000-0xffffffff])
> > > [    1.110286] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
> > > [    1.110505] pci 0000:00:00.0: PME# supported from D0 D3hot
> > > [    1.114095] pci 0000:00:00.0: bridge configuration invalid ([bus
> > > 00-00]), reconfiguring
> > > [    1.114343] pci 0000:01:00.0: [10de:128b] type 00 class 0x030000
> > > [    1.114404] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
> > > [    1.114456] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff
> > > 64bit pref]
> > > [    1.114510] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff
> > > 64bit pref]
> > > [    1.114551] pci 0000:01:00.0: reg 0x24: [io  0x0000-0x007f]
> > > [    1.114590] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> > > [    1.114853] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth,
> > > limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 63.008
> > > Gb/s with 8.0 GT/s PCIe x8 link)
> > > [    1.115022] pci 0000:01:00.0: vgaarb: VGA device added:
> > > decodes=io+mem,owns=none,locks=none
> > > [    1.115125] pci 0000:01:00.1: [10de:0e0f] type 00 class 0x040300
> > > [    1.115184] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> > > [    1.119065] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> > > [    1.119120] pci 0000:00:00.0: BAR 9: assigned [mem
> > > 0x600000000-0x60bffffff 64bit pref]
> > > [    1.119151] pci 0000:00:00.0: BAR 8: assigned [mem 0x60c000000-0x60d7fffff]
> >
> > This is your brcm-pcie device.
> >
> > > [    1.119183] pci 0000:01:00.0: BAR 1: assigned [mem
> > > 0x600000000-0x607ffffff 64bit pref]
> > > [    1.119235] pci 0000:01:00.0: BAR 3: assigned [mem
> > > 0x608000000-0x609ffffff 64bit pref]
> > > [    1.119285] pci 0000:01:00.0: BAR 0: assigned [mem 0x60c000000-0x60cffffff]
> >
> > And this is the NVIDIA device. Note that these memory windows are
> > identical (or at least overlapping). I must admit almost complete
> > ignorance of PCIe and whether this is OK, but it seems sketchy at
> > first glance. A quick eyeballing of my x86 system suggests that all
> > PCIe devices get non-overlapping windows. OTOH there are messages
> > further up about some sort of remapping going on, so perhaps it's OK?
> > But two things on the same bus still shouldn't have the same addresses
> > allocated, based on my (limited) understanding.
>
> I've raised this with colleagues and it seems that this is normal.
> The PCI bridge reports the window through which devices can be mapped,
> and all devices have to fit within that. Pass as to whether that is a
> quirk of ARM or this particular bridge.
>
> I do note that on my x86 systems device 0000:00:00.0 is reported by
> lspci as a "Host bridge" instead of a "PCI bridge".
> On an Ubuntu VM I've got running, I do get
> [    0.487249] PCI host bridge to bus 0000:00
> [    0.487252] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
> [    0.487254] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
> [    0.487256] pci_bus 0000:00: root bus resource [mem
> 0x000a0000-0x000bffff window]
> [    0.487258] pci_bus 0000:00: root bus resource [mem
> 0xe0000000-0xfdffffff window]
> [    0.487260] pci_bus 0000:00: root bus resource [bus 00-ff]
> and all device allocations are from within those ranges, so I'm not
> convinced it's that different.
>
> > In case it's an option, could you "unplug" the NIC (not just not load
> > its driver, but make it not appear at all on the PCI bus)?
>
> NIC? The network interface is totally separate. Or is this another
> reuse of a TLA?
>
> Unplugging the GPU means the PCI bus reports as being down and I get
> no output at all from lspci.

Oh duh. I thought brcm-pcie was a broadcom NIC. Apparently it's the
whole bus - can't unplug that! Also explains the "conflict" which
makes a lot more sense if you (correctly) understand that other
"device" is the bus itself. Apologies for the misinterpretation :(
[And in hindsight, RPi runs on a Broadcom SoC, so ... I should have
remembered that. In my mind they just make network stuff, will try to
get that updated.]

> > > >> I've tried it so far with a GT710 board [1] and ARM64. It's blowing up
> > > >> in the memset of nvkm_instobj_new whilst initialising the BAR
> > > >> subdevice [2], having gone through the "No such luck" path in
> > > >> nvkm_mmu_ptc_get [3].
> > > >>
> > > >> Taking the naive approach of simply removing the memset, I get through
> > > >> initialising all the subdevices, but again die in a location I
> > > >> currently haven't pinpointed. The last logging messages are:
> >
> > That's not a winning strategy, I'm afraid. You need to figure out why
> > the memset is blowing up. The simplest explanation is "it's trying to
> > write to an I/O resource but that resource wasn't allocated", hence my
> > question about BARs. But something's not mapped, or mapped in the
> > wrong way, or whatever. If you can't write to it at that point in
> > time, you won't be able to write to it later either. I would focus on
> > that.
>
> I did say it was the naive approach :-)
> I was trying to gauge how much effort was going to be needed to get
> this going. Was it going to blow up in 1, 10, or 100 places? It feels
> like it is only a couple of things that are wrong, so there is hope.
>
> Slightly annoyingly something more urgent has come up and I need to
> shelve my experimentation for now, but thanks for the pointers. At
> least I have some idea of where to start looking when time allows.

When/if you do get back to it, you might consider posting a more
complete log without getting rid of the memset, perhaps the nature of
the blow-up will make the underlying problem more apparent, or make
further investigation paths apparent.

Cheers,

  -ilia


* Re: GT710 and Nouveau on ARM/ARM64
       [not found]                     ` <CAKb7UvhqU-2tSWD4xjpoxKJsp1F=FhgSHaHr6sr2uagpxHMYzQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-11-03 18:34                       ` Dave Stevenson
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Stevenson @ 2020-11-03 18:34 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau

On Tue, 3 Nov 2020 at 18:25, Ilia Mirkin <imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org> wrote:
> Oh duh. I thought brcm-pcie was a broadcom NIC. Apparently it's the
> whole bus - can't unplug that! Also explains the "conflict" which
> makes a lot more sense if you (correctly) understand that other
> "device" is the bus itself. Apologies for the misinterpretation :(
> [And in hindsight, RPi runs on a Broadcom SoC, so ... I should have
> remembered that. In my mind they just make network stuff, will try to
> get that updated.]

Phew, I thought I was going crazy :-)

> When/if you do get back to it, you might consider posting a more
> complete log without getting rid of the memset, perhaps the nature of
> the blow-up will make the underlying problem more apparent, or make
> further investigation paths apparent.

Will do, thanks.

  Dave

