From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org
Subject: [Bug 94725] New: Nouveau driver fails to load on GM204
Date: Sun, 27 Mar 2016 16:28:08 +0000
Message-ID:
Field | Value |
---|---|
Bug ID | 94725 |
Summary | Nouveau driver fails to load on GM204 |
Product | xorg |
Version | unspecified |
Hardware | Other |
OS | Linux (All) |
Status | NEW |
Severity | normal |
Priority | medium |
Component | Driver/nouveau |
Assignee | nouveau@lists.freedesktop.org |
Reporter | rashed@linux.com |
QA Contact | xorg-team@lists.x.org |
Created attachment 122587 [details]
dmesg from boot
Booting Linux 4.6-rc1 and Mesa 11.2/11.3 fails to load the nouveau driver on a
GTX 970M (6GB) on an MSI GS60 Ghost Pro 4K (i7-6700HQ). It spews out wonderful
messages like
[ 2.146398] nouveau 0000:01:00.0: priv: HUB0: 10ecc0 ffffffff (1940822c)
[ 2.154362] vga_switcheroo: enabled
[ 2.154567] [TTM] Zone kernel: Available graphics memory: 8170764 kiB
[ 2.154568] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
[ 2.154569] [TTM] Initializing pool allocator
[ 2.154572] [TTM] Initializing DMA pool allocator
[ 2.154577] nouveau 0000:01:00.0: DRM: VRAM: 6144 MiB
[ 2.154578] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
[ 2.154580] nouveau 0000:01:00.0: DRM: Pointer to TMDS table invalid
[ 2.154582] nouveau 0000:01:00.0: DRM: DCB version 4.1
[ 2.154583] nouveau 0000:01:00.0: DRM: Pointer to flat panel table invalid
Attached is a dmesg from boot. The system falls back to the i915 driver, so
the machine is usable, but whenever I run lspci or lshw, or try to log out of
the X session, it hangs when it switches back to the nVidia GPU (the laptop has
an LED indicator showing which GPU is in use).
Nouveau is successfully loaded on your laptop, but it seems to fail when it tries to wake up the NVIDIA GPU (if you look at the dmesg you linked, around 11sec, the NVIDIA GPU goes to sleep). You could try booting with `nouveau.runpm=0` on the kernel command line, and see if you still get the issue. Do you have any dmesg from when it hangs? IIRC, Alexandre Courbot sent a patch some time ago to fix an issue where the driver would try to reload the signed firmware upon resume and fail, but I would have guessed it is included in 4.6-rc1.
In addition to the runpm=0 thing, please ensure that you have the appropriate firmware installed for this GPU - it should be in linux-firmware.git by now (nvidia/*). I don't see a message about nouveaufb, which could be due to how you configured your kernel, but it could also be because you don't have the firmware, and the user helper is kicking in and waiting 60 seconds for it to fail out, so nouveau's not fully done loading by the time the runpm stuff kicks in. Just a theory.
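A quick way to check the firmware theory is to list what is actually installed for the chipset. This is only a sketch: it assumes the firmware lives under `/lib/firmware/nvidia/<chipset>/` (the usual location for a linux-firmware install, but distributions may differ or compress the files), and `list_firmware` is a hypothetical helper name, not part of any tool.

```python
from pathlib import Path


def list_firmware(chipset, root="/lib/firmware"):
    """List firmware files found under <root>/nvidia/<chipset>/.

    Assumption: linux-firmware installs NVIDIA blobs below
    <root>/nvidia/<chipset>/. Returns paths relative to that
    directory, or an empty list if the directory does not exist.
    """
    base = Path(root) / "nvidia" / chipset
    if not base.is_dir():
        return []
    return sorted(p.relative_to(base).as_posix()
                  for p in base.rglob("*") if p.is_file())


if __name__ == "__main__":
    # "gm204" is the chipset from the bug summary; adjust as needed.
    files = list_firmware("gm204")
    print(files if files else "no firmware found for gm204")
```

An empty result would support the "firmware is missing" explanation; a populated directory would point back at the runtime PM path.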
(In reply to Ilia Mirkin from comment #2)
> In addition to the runpm=0 thing, please ensure that you have the
> appropriate firmware installed for this GPU - it should be in
> linux-firmware.git by now (nvidia/*). I don't see a message about nouveaufb,
> which could be due to how you configured your kernel, but it could also be
> because you don't have the firmware, and the user helper is kicking in and
> waiting 60 seconds for it to fail out, so nouveau's not fully done loading
> by the time the runpm stuff kicks in. Just a theory.

I have the gm20x firmware from the linux-firmware repo installed.

(In reply to Pierre Moreau from comment #1)
> Nouveau is successfully loaded on your laptop, but it seems to fail when it
> tries to wake up the NVIDIA GPU (if you look at the dmesg you linked, around
> 11sec, the NVIDIA GPU goes to sleep). You could try booting with
> `nouveau.runpm=0` on the kernel command line, and see if you still get the
> issue.
> Do you have any dmesg from when it hangs?

I'll try that in a bit, as well as try to get a dmesg when it hangs (not at my computer ATM). I see "NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!" when it hangs during a logout/shutdown, but that's not particularly helpful.
Created attachment 122591 [details]
dmesg while crashing
Here is the dmesg from when it crashes. I ran lshw and it seems that triggered
the nVidia card to start back up which caused the crash. With runpm=0 the
nVidia card is never powered off so it doesn't crash.
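When comparing runs with and without the workaround, it helps to confirm which `nouveau.runpm` value the kernel was actually booted with. The sketch below parses a kernel command line string; `nouveau_runpm` is a hypothetical helper, and reading `/proc/cmdline` is the only system access it makes.

```python
def nouveau_runpm(cmdline):
    """Return the value of the last nouveau.runpm=... token, or None.

    The last occurrence wins, since later parameters on the kernel
    command line override earlier ones.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "nouveau.runpm" and sep:
            value = val
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            setting = nouveau_runpm(f.read())
    except OSError:
        setting = None
    # None means the option was not given and the driver default applies.
    print("nouveau.runpm =", setting)
```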
What | Removed | Added |
---|---|---|
Summary | Nouveau driver fails to load on GM204 | Nouveau driver fails to poweron GPU on GM204 after dynamic poweroff |
What | Removed | Added |
---|---|---|
CC | | efremmc2@gmail.com |
I have the same problem with a GM206, GTX 960. I have to power-cycle the computer twice.
Maybe these messages are pointing to the root of the problem:
[ 51.608479] nouveau 0000:01:00.0: Refused to change power state, currently in D3
[ 51.683924] nouveau 0000:01:00.0: Refused to change power state, currently in D3
[ 51.700020] nouveau 0000:01:00.0: Refused to change power state, currently in D3
If the device is still in D3 when we resume it, then accessing registers would understandably result in a freeze. Devinit comes early enough in the resume chain to make this plausible.
FWIW I can successfully suspend/resume (echo mem >/sys/power/state) a GTX 960, but runtime PM works slightly differently. I would like to enable runtime PM on my desktop GTX960 to repro this, but for some reason I am failing - despite loading nouveau with "modeset=2 runpm=1", I cannot see runtime PM kicking in and /sys/class/drm/card0/power/runtime_status says "unsupported". What am I doing wrong?
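For anyone else collecting data on this, a small script can pull the "Refused to change power state" lines out of a saved dmesg and print the runtime PM status alongside. This is a sketch under assumptions: the regular expression is written against the three log lines quoted above, the sysfs path is the one quoted in this comment, and both helper names are made up for illustration.

```python
import re
from pathlib import Path

# Written against the dmesg lines quoted in this thread.
REFUSED = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\S+ (\S+): "
    r"Refused to change power state, currently in (D\w+)"
)


def refused_transitions(log_text):
    """Return (timestamp, pci_address, state) for each refused transition."""
    return [(float(m.group(1)), m.group(2), m.group(3))
            for m in REFUSED.finditer(log_text)]


def runtime_status(card="card0"):
    """Read the runtime PM status file mentioned above, or None if unreadable."""
    path = Path("/sys/class/drm") / card / "power" / "runtime_status"
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    sample = ("[ 51.608479] nouveau 0000:01:00.0: "
              "Refused to change power state, currently in D3")
    print(refused_transitions(sample))
    print("runtime_status:", runtime_status())
```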
Can you try booting with acpi_osi="!Windows 2013" on the kernel command line?
What | Removed | Added |
---|---|---|
CC | | gnurou@gmail.com |
Created attachment 122653 [details]
Tentative fix
The attached patch *might* help with this issue, but I have no way to test it.
Rashed, Efrem, can one of you give it a try and tell us if it helps?
Dave, I tried adding the option you suggested, but it did not allow me to enable runtime PM, sadly. /sys/class/drm/card0/power/runtime_status still "unsupported" despite nouveau.ko being loaded with "modeset=2 runpm=1".
Created attachment 122654 [details]
dmesg using tentative fix
Alexandre, using the tentative fix you uploaded it switches GPUs properly now.
Here is the dmesg from that since there are still some errors related to power
state in it. Also, for some reason lshw is returning this for the nVidia GPU
now:
*-generic
description: Unassigned class
product: Illegal Vendor ID
vendor: Illegal Vendor ID
physical id: 0
bus info: pci@0000:01:00.0
version: ff
width: 32 bits
clock: 66MHz
capabilities: bus_master vga_palette cap_list rom
configuration: driver=nouveau latency=255 maxlatency=255
mingnt=255
resources: irq:129 memory:dc000000-dcffffff
memory:b0000000-bfffffff memory:c0000000-c1ffffff ioport:e000(size=128)
memory:dd000000-dd07ffff
I don't know if that's related to this at all but before, if I set runpm=0 and
run lshw, it would return the proper description (running it without runpm
would cause the system to hang)
Thanks Rashed. This looks better but something seems to be going wrong with PCI. I'm pretty clueless about PCI/ACPI, so let's see if someone else has something to suggest...
I will not be able to test until later this afternoon. I have a GTX 960 as PCI 01:00.0 and GTX 730 as PCI 02:00.0. I will send over dmesg/journalctl -k.
Regards
Efrem
(In reply to Rashed Abdel-Tawab from comment #10)
> Created attachment 122654 [details]
> dmesg using tentative fix
>
> Alexandre, using the tentative fix you uploaded it switches GPUs properly
> now. Here is the dmesg from that since there are still some errors related
> to power state in it. Also, for some reason lshw is returning this for the
> nVidia GPU now:
>
> *-generic
> description: Unassigned class
> product: Illegal Vendor ID
> vendor: Illegal Vendor ID
> physical id: 0
> bus info: pci@0000:01:00.0
> version: ff
> width: 32 bits
> clock: 66MHz
> capabilities: bus_master vga_palette cap_list rom
> configuration: driver=nouveau latency=255 maxlatency=255
> mingnt=255
> resources: irq:129 memory:dc000000-dcffffff
> memory:b0000000-bfffffff memory:c0000000-c1ffffff ioport:e000(size=128)
> memory:dd000000-dd07ffff
>
> I don't know if that's related to this at all but before, if I set runpm=0
> and run lshw, it would return the proper description (running it without
> runpm would cause the system to hang)

I have the same issue with bbswitch (and maybe vga_switcheroo too). Basically this means that in D3cold we can't talk to the GPU, and the information isn't cached or something like that.
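The "Illegal Vendor ID", "version: ff" and "latency=255" values are what one would expect if every PCI configuration read returns all ones, which is how reads from a device that is not responding normally show up. A rough check for that pattern is sketched below; `looks_powered_off` is a hypothetical helper, and the sample bytes are made up for illustration. Note that actually reading `/sys/bus/pci/devices/0000:01:00.0/config` on an affected machine may itself wake the GPU and trigger the hang described in this bug, so the sketch only operates on bytes that were already captured.

```python
def looks_powered_off(config_bytes):
    """True if the vendor and device ID fields (first 4 bytes) are all ones.

    An all-ones vendor ID (0xffff) is not a valid ID, so tools such as
    lshw report it as "Illegal Vendor ID".
    """
    return len(config_bytes) >= 4 and config_bytes[:4] == b"\xff\xff\xff\xff"


if __name__ == "__main__":
    # Illustrative data only: 64 bytes of 0xff, as a powered-down device
    # would appear, versus a header starting with NVIDIA's vendor ID
    # 0x10de (little-endian) and a placeholder device ID.
    dead = b"\xff" * 64
    alive = bytes([0xDE, 0x10, 0x00, 0x00]) + b"\x00" * 60
    print(looks_powered_off(dead), looks_powered_off(alive))
```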
This is what I captured from dmesg.
PCI 02:00.0 is a 730 GTX w/1Gb DDR5
PCI 01:00.0 is a 960 GTX w/4Gb DDR5
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: bios: version 84.06.26.00.2c
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: gr: using external firmware
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: Direct firmware load for nvidia/gm206/fecs_inst.bin failed with error -2
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: gr: failed to load fecs_inst
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: disp: dcb 15 type 8 unknown
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: fb: 4096 MiB GDDR5
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: VRAM: 4096 MiB
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: TMDS table version 2.0
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: DCB version 4.1
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 00: 01000f02 00020030
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 01: 02000f00 00000000
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 02: 04011f82 00020030
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 03: 02022f62 00020010
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 05: 02833f76 04400020
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 06: 02033f72 00020020
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: DCB outp 15: 01df5ff8 00000000
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: DCB conn 00: 00001030
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: DCB conn 01: 01000131
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: DCB conn 02: 00010261
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: DCB conn 03: 00020346
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: DCB conn 05: 00000570
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: Pointer to flat panel table invalid
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: unknown connector type 70
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: failed to create encoder 1/8/0: -19
Apr 02 17:08:56 localhost kernel: nouveau 0000:01:00.0: DRM: Unknown-1 has no encoders, removing
Apr 02 17:08:57 localhost kernel: nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
Apr 02 17:08:57 localhost kernel: nouveau 0000:01:00.0: DRM: allocated 1920x1080 fb: 0x60000, bo ffff88089ac02800
Apr 02 17:08:57 localhost kernel: nouveau 0000:01:00.0: fb0: nouveaufb frame buffer device
Apr 02 17:08:57 localhost kernel: nouveau 0000:02:00.0: enabling device (0000 -> 0003)
Apr 02 17:08:57 localhost kernel: nouveau 0000:02:00.0: NVIDIA GK208B (b06070b1)
Apr 02 17:08:57 localhost kernel: nouveau 0000:02:00.0: bios: version 80.28.78.00.01
Apr 02 17:08:58 localhost kernel: nouveau 0000:02:00.0: priv: HUB0: 086014 ffffffff (1f70820c)
Apr 02 17:08:58 localhost kernel: nouveau 0000:02:00.0: fb: 1024 MiB GDDR5
Apr 02 17:08:58 localhost kernel: nouveau 0000:02:00.0: DRM: VRAM: 1024 MiB
Apr 02 17:08:58 localhost kernel: nouveau 0000:02:00.0: DRM: GART: 1048576 MiB
Apr 02 17:08:58 localhost kernel: nouveau 0000:02:00.0: DRM: TMDS table version 2.0
Apr 02 17:08:58 localhost kernel: nouveau 0000:02:00.0: DRM: DCB version 4.0
Apr 02 17:08:58 localhost kernel: nouveau 0000:02:00.0: DRM: DCB outp 00: 01000f02 00020030
Apr 02 17:08:58 localhost kernel: nouveau 0000:02:00.0: DRM: DCB outp 01: 02011f62 00020010
Apr 02 17:08:58 localhost kernel: nouveau 0000:02:00.0: DRM: DCB outp 02: 02022f10 00000000
Apr 02 17:08:58 localhost kernel: nouveau 0000:02:00.0: DRM: DCB conn 00: 00001031
Apr 02 17:08:58 localhost kernel: nouveau 0000:02:00.0: DRM: DCB conn 01: 00002161
Apr 02 17:08:58 localhost kernel: nouveau 0000:02:00.0: DRM: DCB conn 02: 00000200
Apr 02 17:08:58 localhost kernel: nouveau 0000:02:00.0: DRM: MM: using COPY for buffer copies
Apr 02 17:08:58 localhost kernel: nouveau 0000:02:00.0: No connectors reported connected with modes
Apr 02 17:08:59 localhost kernel: nouveau 0000:02:00.0: DRM: allocated 1024x768 fb: 0x60000, bo ffff88089a47c400
Apr 02 17:08:59 localhost kernel: nouveau 0000:02:00.0: fb1: nouveaufb frame buffer device
Apr 02 17:09:01 localhost.localdomain kernel: mei_me 0000:00:16.0: enabling device (0000 -> 0002)
Apr 02 17:09:01 localhost.localdomain kernel: snd_hda_intel 0000:01:00.1: Disabling MSI
Apr 02 17:09:01 localhost.localdomain kernel: snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
Apr 02 17:09:01 localhost.localdomain kernel: snd_hda_intel 0000:02:00.1: Disabling MSI
Apr 02 17:09:01 localhost.localdomain kernel: snd_hda_intel 0000:02:00.1: Handle vga_switcheroo audio client
Apr 02 17:09:02 localhost.localdomain kernel: snd_hda_intel 0000:00:1f.3: failed to add i915 component master (-19)
Apr 02 17:09:02 localhost.localdomain kernel: e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Apr 02 17:09:02 localhost.localdomain kernel: e1000e 0000:00:1f.6 eth1: registered PHC clock
Apr 02 17:09:02 localhost.localdomain kernel: e1000e 0000:00:1f.6 eth1: (PCI Express:2.5GT/s:Width x1) 00:1f:bc:0f:37:76
Apr 02 17:09:02 localhost.localdomain kernel: e1000e 0000:00:1f.6 eth1: Intel(R) PRO/1000 Network Connection
Apr 02 17:09:02 localhost.localdomain kernel: e1000e 0000:00:1f.6 eth1: MAC: 12, PHY: 12, PBA No: FFFFFF-0FF
Apr 02 17:09:03 localhost.localdomain kernel: e1000e 0000:00:1f.6 enp0s31f6: renamed from eth1
Apr 02 17:09:11 localhost.localdomain kernel: ahci 0000:00:17.0: port does not support device sleep
Apr 02 17:09:29 localhost.localdomain kernel: e1000e 0000:00:1f.6 enp0s31f6: 10/100 speed: disabling TSO
So according to Karol's comment it seems like the issue might be fixed. Rashed, can you confirm that the GPU is operational after runtime resume with the patch I posted?
(In reply to Alexandre Courbot from comment #15)
> So according to Karol's comment it seems like the issue might be fixed.
> Rashed, can you confirm that the GPU is operational after runtime resume
> with the patch I posted?

I can confirm the driver no longer hangs on runtime resume, yes. I don't know how to offload to the GPU, so I can't say whether it's operational.
Created attachment 122747 [details]
new dmesg during hang
I decided to keep testing this in case we missed something, and running lshw
twice in a row causes it to hang. I'm not sure what's up with it so I've
attached the dmesg. It looks pretty similar to before, but that doesn't make
sense since Alexandre patched the original problem.
uhh the card seems pretty much messed up after resume, because several things just fail.
I think I'm seeing the same issue here on a Schenker XMG A506 notebook. The NV GPU is the dedicated one. AFAIK all display hardware is connected to the Intel iGPU.

Kernel is vanilla 4.6.1. Actually this is the first kernel (and the first time) that I try to use the dedicated GPU.

lspci:
01:00.0 VGA compatible controller: NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2)

Without runpm=0 even just calling DRI_PRIME=1 glxinfo leaves the system unresponsive shortly afterwards. SysRq still works though.

Going to attach the part of the log in a second.
Created attachment 124326 [details]
nouveau run on GM170 with default runpm