From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org
Subject: [Bug 72180] New: Nouveau Random GPU Lockups
Date: Sat, 30 Nov 2013 18:05:36 +0000
Message-ID:
Field | Value |
---|---|
Priority | medium |
Bug ID | 72180 |
Assignee | nouveau@lists.freedesktop.org |
Summary | Nouveau Random GPU Lockups |
QA Contact | xorg-team@lists.x.org |
Severity | normal |
Classification | Unclassified |
OS | Linux (All) |
Reporter | bass.jordan+bugzilla@gmail.com |
Hardware | x86-64 (AMD64) |
Status | NEW |
Version | unspecified |
Component | Driver/nouveau |
Product | xorg |
Created attachment 90035 [details]
nouveau log
Frequent GPU lockups, occurring anywhere from 10 minutes to 2 hours after
booting. I haven't noticed any common trigger. Occasionally I can switch to
another tty, but often the system is completely unresponsive. Some system
information:
Linux jordans-pc 3.11.9-200.fc19.x86_64 #1 SMP Wed Nov 20 21:22:24 UTC 2013
x86_64 x86_64 x86_64 GNU/Linux
xorg-x11-server-Xorg.x86_64 1.14.4-3.fc19
xorg-x11-server-common.x86_64 1.14.4-3.fc19
xorg-x11-server-utils.x86_64 7.7-1.fc19
xorg-x11-utils.x86_64 7.5-9.fc19
xorg-x11-xauth.x86_64 1:1.0.7-3.fc19
xorg-x11-xinit.x86_64 1.3.2-8.fc19
xorg-x11-xkb-utils.x86_64 7.7-7.fc19
xorg-x11-drv-nouveau.x86_64 1:1.0.9-1.fc19
Attached log of messages from Nouveau from time of boot until lockup.
Sometimes, shortly before the lockup, graphics will become highly corrupted.
What | Removed | Added |
---|---|---|
Summary | Nouveau Random GPU Lockups | [NVE6] Random GPU Lockups |
I have this problem on a GeForce GTX 660 (NVE6, GK106), too. It only occurs when desktop effects are turned on in KDE SC, so I think it is 3D-related, or maybe DRM-related. Steam (32-bit) sometimes triggers it, too. Sometimes, before the graphics become unresponsive, parts of the graphical UI show only flickering white and black rectangles. And sometimes the computer (or perhaps only the graphics) becomes very slow. After a lockup, I can sometimes still use my keyboard to issue commands; other times the computer is completely unresponsive. I use Arch Linux (x86_64) with the newest Nouveau releases, and this bug has existed for as long as I have used Nouveau (3-4 months).
My personal guess, based on roughly 0 real information, is that the graph firmware is "wrong". Could one of you try to extract the graph firmware from the blob and use it with nouveau to see if it improves things? http://nouveau.freedesktop.org/wiki/NVC0_Firmware/ (You don't need the video firmware bits.) Don't forget to add nouveau.config=NvGrUseFw=1 in order for nouveau to actually load the external firmware.
Created attachment 90048 [details]
nvidia installer log
I had to build a custom kernel to enable mmiotrace, and I'm having trouble
building the nvidia module on that kernel so I can run the trace. I'm using the
installer from the nvidia website, version 331.17.
Okay, that driver worked and I got a trace. Now I'm trying to boot my regular kernel with nouveau, but X fails to start. After the splash screen it claims there is an error it can't recover from. After killing X I see it's complaining about GLX being missing.
You need to switch back to Mesa's OpenGL implementation (including, but not limited to, GLX).
(In reply to comment #10)
> I'm not sure how I would do such a thing?

Never mind, sorted it. The trace seems useless though; none of the values in the "NVC0 Firmware" link appear.
https://spideroak.com/storage/NJXXEYTBOM/shared/709942-4-6416/trace.txt?db8d9c5616249125da0eb60f7b18f4a6
Created attachment 90061 [details]
dmesg nouveau log
Okay I took another one, this time doing some more stuff I thought could have a
chance of triggering it. This one came out as expected. The downside is loading
it causes nouveau to fail. Attached the log from dmesg with the relevant
nouveau bits.
The files need to be in /lib/firmware/nouveau and need to be called nve6_fuc409c (and so on for the other ones) -- is that what you did? If so, is /lib/firmware/nouveau available when the nouveau driver is being loaded? If nouveau is being loaded off an initrd, make sure that the firmware files are in the initrd as well.
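The first of those checks can be sketched as a small shell function. This is only an illustration: `list_nouveau_fw` is a made-up helper name, and it simply lists whatever `nve6_fuc*` files it finds rather than assuming the full set of names.

```shell
#!/bin/sh
# list_nouveau_fw is a hypothetical helper, not part of nouveau or any distro.
# Usage: list_nouveau_fw [chipset-prefix] [firmware-dir]
# Prints each matching firmware file name; returns 1 if none are found.
list_nouveau_fw() {
    prefix="${1:-nve6}"
    dir="${2:-/lib/firmware/nouveau}"
    found=0
    for f in "$dir"/"${prefix}"_fuc*; do
        # An unmatched glob is left unexpanded, so skip non-files.
        [ -f "$f" ] || continue
        basename "$f"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no ${prefix}_fuc* files in $dir" >&2
        return 1
    fi
}
```

Running `list_nouveau_fw` with no arguments checks the default directory on the running system. It says nothing about the initrd, which has to be inspected with the distro's own tooling.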
I had to make a new initramfs with the files, booted correctly this time. Will report back if it cures the lockups or not.
5 hours now without a lockup, not even a peep in dmesg from nouveau. I would be locking up every 30 minutes to an hour on average before. What now?
What | Removed | Added |
---|---|---|
Summary | [NVE6] Random GPU Lockups | [NVE6] Random GPU Lockups, works with blob PGRAPH fw |
Sit back and enjoy the lockup-free graphics? You might also upload a copy of your vbios (/sys/kernel/debug/dri/0/vbios.rom) as well as the files you extracted. Perhaps a clue will lie there.
I suspected I had the same problem and tried to use the original firmware, but failed to do so. I did not try to extract my "own" firmware but tried to use the firmware from the zip file that was attached by Jordan, as we have the same card (both GTX 660). But if I boot with the nouveau.config="NvGrUseFw=1" kernel option, the kernel stops right at the attempt to load the nouveau fb driver. I must mention that I use UEFI boot and my kernel resides on the EFI partition (/dev/sda1) at /boot/EFI/Boot/gentoo/bzImage-3.12.5.efi (the mount point is /boot). My firmware, on the other hand, is at /lib/firmware/nouveau/ on partition /dev/sda5, which is mounted at /. Let me guess: I need an initrd file in order to make it work? :-( Then I must figure out how to make this work together with UEFI boot. Until now, I have tried to avoid initrd and UEFI.
Whether the firmware needs to be in the initrd or not depends on how you have it set up. It needs to be there when the nouveau code is initialized (if you have a module, that means when the module is loaded; if it's built-in, then it needs to be added to the kernel image itself with the ADDITIONAL_FIRMWARE thing or whatever it's called). E.g. the way I set up my initrd is that it does next to nothing: it just asks for a password, decrypts the partition, mounts it, and swaps it in as the 'new' root. I never need to touch it (unless I want to make changes to the decryption logic). Modules are loaded with my regular '/' in place. Most distros prefer the complex route and have their initrds load everything, which in turn means that the firmware needs to be in the initrd, and the initrd needs to be updated for every kernel. [Aside: This makes sense when you're creating something that must work on every hardware combination ever imagined (and not), whereby you don't want to build every driver in, but you do want to support various esoteric devices that may be required for booting, like disk or network. And once you do that, you might as well do everything there. But it's very rare that I solder some crazy RAID controller into my laptop, so it doesn't really make sense for more tailored setups.]
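To see which of the two cases applies, one option is to look at the kernel configuration. The sketch below assumes the config is available as a plain-text file; the default path /boot/config-$(uname -r) is a common distro convention rather than something stated in this thread, and `nouveau_build_mode` is a made-up helper name.

```shell
#!/bin/sh
# nouveau_build_mode is a hypothetical helper name.
# Usage: nouveau_build_mode [kernel-config-file]
# Prints "module", "built-in", or "not enabled".
nouveau_build_mode() {
    # Assumed default location; self-built kernels may keep .config elsewhere.
    cfg="${1:-/boot/config-$(uname -r)}"
    if grep -q '^CONFIG_DRM_NOUVEAU=m' "$cfg"; then
        echo "module"
    elif grep -q '^CONFIG_DRM_NOUVEAU=y' "$cfg"; then
        echo "built-in"
    else
        echo "not enabled"
    fi
}
```

"module" means the firmware must be on whatever filesystem (initrd or real root) is mounted when the module loads. "built-in" means it has to be embedded in the kernel image; the option referred to above is most likely CONFIG_EXTRA_FIRMWARE.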
I went the easy way now. I built nouveau as a module that is loaded later and I see [ 5.323010] nouveau [ PGRAPH][0000:01:00.0] using external firmware in my dmesg output :-) I will stay with the "nouveau as a module" solution until I am sure that the external firmware solves the problem. Then I can still figure out how to make UEFI and initrd work together. I will come back and report my results, but this may take some time, because the crash only occurs once in a while.
As announced in my previous post, I wanted to report my findings. The GPU lockups do not occur with the NVIDIA binary PGRAPH fw.
What | Removed | Added |
---|---|---|
CC | pastas4@gmail.com |
*** Bug 69882 has been marked as a duplicate of this bug. ***
I suffered from the same bug. I was able to fix it with some help from users on IRC as well as the files presented here. Figured I'd provide explicit instructions for others. (I'm using Ubuntu 13.10.)
1. First make sure you have an NVE6 chipset:
    $ dmesg | grep nouveau | grep Chipset
   You should see something that looks like:
    [ 2.318701] nouveau [ DEVICE][0000:01:00.0] Chipset: GK106 (NVE6)
2. Download the files from Jordan Bass (thanks!) in comment 19 and move them into place:
    sudo mkdir /lib/firmware/nouveau
    sudo mv /path/to/extractedfiles/* /lib/firmware/nouveau
3. Update the initramfs:
    sudo update-initramfs -c -k <YOUR_KERNEL>
4. Edit the GRUB defaults:
    sudo nano /etc/default/grub
   Add nouveau.config=NvGrUseFW=1 to GRUB_CMDLINE_LINUX_DEFAULT so that it looks like:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nouveau.config=NvGrUseFW=1"
5. Update GRUB:
    sudo update-grub
6. Restart and verify:
    sudo shutdown -r now
   On restart:
    $ dmesg | grep external
    [ 2.484773] nouveau [ PGRAPH][0000:01:00.0] using external firmware
   If you see PGRAPH using external firmware, you're done.
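The verification in step 6 can also be scripted. The following is a minimal sketch, not an official tool: both function names are invented for illustration, and the option is matched case-insensitively only because this thread spells it both NvGrUseFw and NvGrUseFW.

```shell
#!/bin/sh
# Both helpers are hypothetical names used only for this sketch.

# Reads kernel log text on stdin; succeeds if nouveau reported
# that PGRAPH is using external firmware.
pgraph_uses_external_fw() {
    grep -q 'nouveau.*PGRAPH.*using external firmware'
}

# Succeeds if the given kernel command line file (default: /proc/cmdline)
# carries the nouveau.config option that requests external firmware.
cmdline_has_fw_option() {
    grep -qi 'nouveau\.config=.*NvGrUseFw=1' "${1:-/proc/cmdline}"
}

# Example use on a running system:
#   cmdline_has_fw_option && dmesg | pgraph_uses_external_fw \
#       && echo "blob PGRAPH firmware in use"
```

If the command-line check passes but the dmesg check fails, the firmware files were probably not visible when nouveau initialized, which points back at the initramfs step.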
It's been about 5 hours since I rebooted using the blob firmware. No problems to report, it seems to be a work-around for this bug. Thank you previous commenters!