From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org
Subject: [Bug 58378] Distorted graphics on NVIDIA GeForce 8400M G
after upgrade the kernel to 3.7.0 version
Date: Tue, 18 Dec 2012 18:46:11 +0000
Message-ID:
What | Removed | Added |
---|---|---|
Assignee | dri-devel@lists.freedesktop.org | nouveau@lists.freedesktop.org |
QA Contact | | xorg-team@lists.x.org |
Product | DRI | xorg |
Component | DRM/other | Driver/nouveau |
From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org
Subject: [Bug 58378] Distorted graphics on NVIDIA GeForce 8400M G
after upgrade the kernel to 3.7.0 version
Date: Wed, 19 Dec 2012 10:23:48 +0000
Message-ID:
I am having the same problems post kernel version 3.7.0 with a GeForce 8800 GTS. Even glxgears will lock up. I get a ton of these messages:

[ 83.399004] nouveau [ PFIFO][0000:01:00.0] CACHE_ERROR - Ch 2/3 Mthd 0x108c Data 0x2036652f

with the occasional:

[ 83.418650] nouveau E[ PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 4 MP 1: INVALID_OPCODE at 07f4d8 warp 2, opcode 0423c788 10000811
[ 83.418659] nouveau [ PGRAPH][0000:01:00.0] TRAP
[ 83.418663] nouveau E[ PGRAPH][0000:01:00.0] ch 4 [0x0027948000] subc 3 class 0x5097 mthd 0x0f04 data 0x00000000
[ 83.418672] nouveau E[ PFB][0000:01:00.0] trapped read at 0x0000000000 on channel 0x00027948 PFIFO/PFIFO_READ/SEMAPHORE reason: DMAOBJ_LIMIT
[ 83.431368] nouveau E[ PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 4 MP 1: INVALID_OPCODE at 07f4d8 warp 2, opcode 0423c788 10000811
[ 83.431376] nouveau [ PGRAPH][0000:01:00.0] TRAP
[ 83.431379] nouveau E[ PGRAPH][0000:01:00.0] ch 4 [0x0027948000] subc 3 class 0x5097 mthd 0x0f04 data 0x00000000
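When comparing reports like the one above, it can help to see which GPU engine produces most of the messages. The following is only a minimal sketch: the sample lines are copied from this comment, the helper name `count_engines` is made up for illustration, and on a real system the input would come from `dmesg` rather than a here-document.

```shell
#!/bin/sh
# Sample lines copied from the comment above; a real run would pipe in `dmesg`.
log=$(cat <<'EOF'
[ 83.399004] nouveau [ PFIFO][0000:01:00.0] CACHE_ERROR - Ch 2/3 Mthd 0x108c Data 0x2036652f
[ 83.418650] nouveau E[ PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 4 MP 1: INVALID_OPCODE at 07f4d8 warp 2, opcode 0423c788 10000811
[ 83.418659] nouveau [ PGRAPH][0000:01:00.0] TRAP
[ 83.418663] nouveau E[ PGRAPH][0000:01:00.0] ch 4 [0x0027948000] subc 3 class 0x5097 mthd 0x0f04 data 0x00000000
[ 83.418672] nouveau E[ PFB][0000:01:00.0] trapped read at 0x0000000000 on channel 0x00027948 PFIFO/PFIFO_READ/SEMAPHORE reason: DMAOBJ_LIMIT
EOF
)

# count_engines (hypothetical helper): extract the engine name from
# "nouveau E[ PGRAPH][...]" or "nouveau [ PFIFO][...]" and count lines per engine.
count_engines() {
    sed -n 's/.*nouveau E\{0,1\}\[ *\([A-Z]*\)\]\[.*/\1/p' |
        sort | uniq -c | awk '{print $2 "=" $1}'
}

summary=$(printf '%s\n' "$log" | count_engines)
printf '%s\n' "$summary"
```

On a live machine, `dmesg | count_engines` would give the same kind of per-engine tally, which makes it easier to tell whether two reporters are seeing the same mix of errors.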
Same here with an nVidia GeForce 8400M G video card in an Acer Aspire 7520 G laptop running Ubuntu 12.10 64bit AMD64. My first impression was a heat problem due to dust, so I cleaned the laptop fan and refitted the heatsink and heatpipes with new thermal (silver) contact paste, but the video error reoccurs. When only two webpages are opened: no problem. Starting a Youtube video: the screen is a mess, like Henrique Dias reported. Is there a relation to the reported failure of the nVidia GeForce 8 series? http://news.cnet.com/8301-13924_3-10037632-64.html Carolien.
Hi. I have exactly the same issue. I seem to be able to trigger it faster by opening firefox on a page with many images.

Current Kernel: 3.8.3-103.fc17.x86_64
Other kernels affected: kernel-3.7.9-104.fc17.x86_64 kernel-3.7.9-101.fc17.x86_64

01:00.0 VGA compatible controller: nVidia Corporation G86 [GeForce 8300 GS] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: nVidia Corporation Device 0494
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
    Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Region 3: Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
    Region 5: I/O ports at df00 [size=128]
    [virtual] Expansion ROM at fb000000 [disabled] [size=128K]
    Capabilities: [60] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [78] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM unknown, Latency L0 <512ns, L1 <4us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [100 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status: NegoPending- InProgress-
    Capabilities: [128 v1] Power Budgeting <?>
    Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Kernel driver in use: nouveau
After further investigation, this issue only seems to happen to applications using the GTK libs. In my case at least ... After triggering the bug, any app which is using the GTK libs will be affected. It does not seem to affect the rendering of other apps (those not using GTK). Also, the same issue doesn't happen whilst using the NVIDIA drivers, which are simply impossible to use, as in my case they make the system unusably slow.
Hello! New to Ubuntu. I have an old Acer 5520G with the exact same problem you are describing in the comments above. I also thought it was a heat problem and found a lot of dust in the graphics card's fan. My computer completely locks up, and after the first glitch I am unable to even log in or open a terminal at the login screen. Dave
Hi, same here after updating to Ubuntu 12.04.3, kernel 3.8.0-33-generic.

lspci -nnvv says:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G86M [GeForce 8400M G] [10de:0428] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: Fujitsu Limited. Device [10cf:1422]
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at de000000 (32-bit, non-prefetchable) [size=16M]
    Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Region 3: Memory at dc000000 (64-bit, non-prefetchable) [size=32M]
    Region 5: I/O ports at 2000 [size=128]
    Expansion ROM at <unassigned> [disabled]
    Capabilities: <access denied>
    Kernel driver in use: nouveau
    Kernel modules: nouveau, nvidiafb

The graphics are distorted, and once it happens the system is frozen (with some luck I may reach a terminal). Before it happens, the font color in window frames changes to "white on white", i.e. the same as the background color.

I run an E8410 Lifebook.

BTW: using the Nvidia proprietary drivers is not an option; they made the system completely unusable and forced me to reinstall several times.
Hi again, in addition some error messages:

Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [ 66.215782] nouveau E[ PGRAPH][0000:01:00.0] ch 2 [0x0007b23000] subc 7 class 0x8297 mthd 0x15e0 data 0x00000000
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [ 66.304180] nouveau E[ PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 000004 warp 10, opcode ffb9c1d8 ffbac2d9
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [ 66.304188] nouveau E[ PGRAPH][0000:01:00.0] TRAP
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [ 66.304193] nouveau E[ PGRAPH][0000:01:00.0] ch 2 [0x0007b23000] subc 7 class 0x8297 mthd 0x15e0 data 0x00000000
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [ 66.304477] nouveau E[ PGRAPH][0000:01:00.0] TRAP_MP_EXEC - TP 0 MP 0: INVALID_OPCODE at 000004 warp 10, opcode ffb9c1d8 ffbac2d9
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [ 66.304483] nouveau E[ PGRAPH][0000:01:00.0] TRAP
Nov 29 11:21:47 torsten-LIFEBOOK-E8410 kernel: [ 66.304487] nouveau E[ PGRAPH][0000:01:00.0] ch 2 [0x0007b23000] subc 7 class 0x8297 mthd 0x15e0 data 0x00000000

and

Nov 29 11:26:04 torsten-LIFEBOOK-E8410 kernel: [ 323.106306] nouveau E[ DRM] GPU lockup - switching to software fbcon
Nov 29 11:27:07 torsten-LIFEBOOK-E8410 kernel: [ 386.736037] nouveau E[ 3431] failed to idle channel 0xcccc0001
Nov 29 11:27:09 torsten-LIFEBOOK-E8410 kernel: [ 388.735098] nouveau E[ PFIFO][0000:01:00.0] channel 3 unload timeout
Nov 29 11:27:12 torsten-LIFEBOOK-E8410 kernel: [ 391.732025] nouveau E[ 3431] failed to idle channel 0xcccc0000
Nov 29 11:27:14 torsten-LIFEBOOK-E8410 kernel: [ 393.731221] nouveau E[ PFIFO][0000:01:00.0] channel 2 unload timeout
Nov 29 11:28:09 torsten-LIFEBOOK-E8410 kernel: [ 448.580025] nouveau E[ 4056] failed to idle channel 0xcccc0001
Nov 29 11:28:11 torsten-LIFEBOOK-E8410 kernel: [ 450.579162] nouveau E[ PFIFO][0000:01:00.0] channel 3 unload timeout
Nov 29 11:28:14 torsten-LIFEBOOK-E8410 kernel: [ 453.576022] nouveau E[ 4056] failed to idle channel 0xcccc0000
Nov 29 11:28:16 torsten-LIFEBOOK-E8410 kernel: [ 455.575198] nouveau E[ PFIFO][0000:01:00.0] channel 2 unload timeout
Nov 29 11:29:17 torsten-LIFEBOOK-E8410 kernel: [ 516.552036] nouveau E[ 4211] failed to idle channel 0xcccc0001
Nov 29 11:29:19 torsten-LIFEBOOK-E8410 kernel: [ 518.553893] nouveau E[ PFIFO][0000:01:00.0] channel 3 unload timeout
Nov 29 11:29:22 torsten-LIFEBOOK-E8410 kernel: [ 521.556024] nouveau E[ 4211] failed to idle channel 0xcccc0000
Nov 29 11:29:24 torsten-LIFEBOOK-E8410 kernel: [ 523.555077] nouveau E[ PFIFO][0000:01:00.0] channel 2 unload timeout

In both cases the session is GNOME. Now, when running GNOME (no effects), it is slightly more stable. As mentioned, I also tried the NVIDIA drivers, with the effect that the system was completely unusable.

Since the issue seems to be quite old, there should be an appropriate solution by now!

Cheers, TS
Hi, I wonder if this is still alive? Any news on this? Cheers, T
What | Removed | Added |
---|---|---|
Priority | medium | high |
What | Removed | Added |
---|---|---|
Severity | critical | normal |
Priority | high | medium |
Messing with priority just annoys the developers. In the meantime, try new kernels. I only see up to 3.8 tested. Do a bisect. There was a major driver rewrite in 3.7, but it might have been something else that causes the issue. Make sure you're running an updated DDX. As you might imagine, none of the devs are seeing this, so you'll have to do the debugging if you want it fixed.
What | Removed | Added |
---|---|---|
Summary | Distorted graphics on NVIDIA GeForce 8400M G after upgrade the kernel to 3.7.0 version | [NV86] Distorted graphics on NVIDIA GeForce 8400M G after upgrade the kernel to 3.7.0 version |
Created attachment 90715 [details]
Distorted graphics with RHEL6/OL6 showing uname -a kernel 3.12.4
Created attachment 90717 [details]
Distorted graphics: Icons (on kernel 3.12.4)
Hello,

I would like to join the discussion in this bug, as I have found myself affected after the recent update from Red Hat Enterprise Linux/Oracle Linux 6.4 (stock RHEL kernel 2.6.32-358.23.2) to RHEL/OL 6.5 (RHEL kernel 2.6.32-431).

My graphics card is an NVidia Quadro NVS 130M:

BOOT0 : 0x086a00a2
Chipset: G86 (NV86)
Family : NV50

It seems that the RHEL 6.5 kernel 2.6.32-431 has updated its nouveau DRM kernel modules to a codebase level that matches the official Linux 3.7 kernels, and has therefore introduced this severe graphics distortion issue into mainline RHEL 6.

In order to verify that it is indeed the nouveau DRM kernel module that is responsible for the distortion, I upgraded my OL6 packages to the following versions:

* mesa-9.2.0.5 (including support for nouveau, which is commented out by default in RHEL6)
* libdrm-2.4.50
* xorg-x11-drv-nouveau-1.0.9

but this does NOT affect the issue at all. Reverting to the RHEL stock kernel 2.6.32-358.23.2, however, makes the issue vanish, even with the updated library versions above.

I then tried Oracle's UEK kernels, and while the current UEK2 kernel (2.6.39-400.211.2) does NOT have the issue, the current UEK3 kernel (3.8.13-16.2.2) also shows it.

I then tried to find out the exact "versions" (git commit levels?) of the nouveau libdrm modules, and found the following:

(1) Oracle UEK2 kernel 2.6.39-400.211.2 - NO ISSUE:
    [drm] Initialized nouveau 0.0.16 20090420 for 0000:01:00.0 on minor 0

(2) RHEL stock kernel 2.6.32-358.23.2 - NO ISSUE:
    [drm] Initialized nouveau 1.0.0 20120316 for 0000:01:00.0 on minor 0

(3) RHEL stock kernel 2.6.32-431 - DOES SHOW THE ISSUE:
    [drm] Initialized nouveau 1.1.0 20120801 for 0000:01:00.0 on minor 0

(4) more recent kernels, such as Oracle UEK3 (3.8.13-16.2.2) and the most recent Oracle "playground" kernel from public-yum.oracle.com (3.12.4-3.12.y.20131210), all DO SHOW THE ISSUE:
    [drm] Initialized nouveau 1.1.1 20120801 for 0000:01:00.0 on minor 0

So to me it now seems as if the issue was introduced with the massive changes to nouveau/DRM that went into 3.7:

http://www.phoronix.com/scan.php?page=news_item&px=MTE1NDg

and affects ALL subsequent versions since then... :-(

I would be very interested and willing to help in debugging/tracking this down, but I don't have any git background, so you would have to guide me through how to do the "bisect"...

Hope this helps & looking forward to your feedback! :-)

Best regards,
Andreas
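The comparison above can be tabulated mechanically. This is a rough sketch only: the kernel/status/message triples are copied from the comment, while the `|`-separated layout and the helper name `drm_versions` are assumptions made for illustration. Note that the driver date in the boot message is a coarse marker and not a git commit level.

```shell
#!/bin/sh
# kernel | status | boot message, as reported in the comment above.
results=$(cat <<'EOF'
2.6.39-400.211.2|ok|[drm] Initialized nouveau 0.0.16 20090420 for 0000:01:00.0 on minor 0
2.6.32-358.23.2|ok|[drm] Initialized nouveau 1.0.0 20120316 for 0000:01:00.0 on minor 0
2.6.32-431|broken|[drm] Initialized nouveau 1.1.0 20120801 for 0000:01:00.0 on minor 0
3.8.13-16.2.2|broken|[drm] Initialized nouveau 1.1.1 20120801 for 0000:01:00.0 on minor 0
EOF
)

# drm_versions (hypothetical helper): print "driver-version driver-date status"
# for every tested kernel by picking the two words that follow "nouveau".
drm_versions() {
    awk -F'|' '{
        n = split($3, w, " ")
        for (i = 1; i < n; i++)
            if (w[i] == "nouveau") print w[i+1], w[i+2], $2
    }'
}

table=$(printf '%s\n' "$results" | drm_versions)
printf '%s\n' "$table"

# Newest driver date that was still reported as working (YYYYMMDD sorts numerically).
last_ok=$(printf '%s\n' "$table" | awk '$3 == "ok" {print $2}' | sort -n | tail -n 1)
echo "last known-good driver date: $last_ok"
```

On a running system, the message itself can be found with `dmesg | grep 'Initialized nouveau'`.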
Had been missing my "lspci -nnvv" information:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation G86M [Quadro NVS 130M] [10de:042a] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: Toshiba America Info Systems Device [1179:0002]
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
    Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Region 3: Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
    Region 5: I/O ports at cf00 [size=128]
    [virtual] Expansion ROM at fc000000 [disabled] [size=128K]
    Capabilities: [60] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [78] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <4us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM L0s L1 Enabled; RCB 128 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [100 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status: NegoPending- InProgress-
    Capabilities: [128 v1] Power Budgeting <?>
    Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Kernel driver in use: nouveau
    Kernel modules: nouveau, nvidiafb
(a) Can we see a full boot log (e.g. output of dmesg) with a recent kernel? Ideally it would include the time that the visual issues happen.

(b) This looks like it could be a fencing issue, i.e. we try to draw to a texture, but then instead of waiting, we don't wait. There were some fixes that went into 3.13-rc1, so perhaps trying the latest and greatest (e.g. 3.13-rc3, or the latest Linus HEAD) would be good to test out.

(c) There are many bisection guides on the internet. You will also need to figure out how to make the compiled kernel play nice with your distribution. The basics are simple though:

1. git bisect start v3.7 v3.6 -- drivers/gpu/drm/nouveau
2. build/install/boot/test
3. if it's good, "git bisect good", if it's bad, "git bisect bad"
4. goto 2

At some point running the step 3 command will tell you "first bad commit is xyz". That's when you're done. I suspect it might be the giant mega "rewrite nouveau" commit, in which case we're screwed and this will have been a huge time-waster (apologies in advance if it turns out this way). But it might be one of the many other commits that went into 3.7, which would be nice and indicate an area to focus on.
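The bisect loop in (c) can be tried out safely on a throwaway repository before touching a kernel tree. This is only a sketch under stated assumptions: the repository, the tags `demo-good` and `demo-bad`, and the `state.txt` file are all invented for the demo, and a `grep` stands in for the real build/install/boot/test step, which cannot be automated this simply for a kernel.

```shell
#!/bin/sh
set -eu

# Build a throwaway repository with eight commits; commit 6 introduces the "bug".
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p drivers/gpu/drm/nouveau
i=1
while [ "$i" -le 8 ]; do
    if [ "$i" -ge 6 ]; then state=bad; else state=good; fi
    echo "$state $i" > drivers/gpu/drm/nouveau/state.txt
    git add drivers
    git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "change $i"
    if [ "$i" -eq 1 ]; then git tag demo-good; fi
    i=$((i + 1))
done
git tag demo-bad

# Same shape as "git bisect start v3.7 v3.6 -- drivers/gpu/drm/nouveau":
# bad revision first, good revision second, then the path limiter.
git bisect start demo-bad demo-good -- drivers/gpu/drm/nouveau >/dev/null

# Stand-in for build/install/boot/test: exit status 0 marks the commit good,
# a non-zero status (here: grep finds no "good" line) marks it bad.
git bisect run grep -q '^good' drivers/gpu/drm/nouveau/state.txt >/dev/null

# refs/bisect/bad now points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```

For the real kernel, steps 2 and 3 stay manual: after each reboot and test, `git bisect good` or `git bisect bad` is typed by hand instead of using `git bisect run`.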
Hello Ilia,

regarding (a) and (b): I am just waiting for an rpmbuild of an OL6 version of 3.13-rc3 to finish, and will report back on my findings and include a dmesg output from that version.

Regarding (c): Wouldn't it make more sense to first rule out the "mega commit", rather than starting with the 3.6 and 3.7 release tags? Can you give me the git commands (or point me to a doc that tells me how to produce them) for getting "ordinary kernel tarballs" out of the DRM nouveau git, just like the ones published on https://www.kernel.org/pub/linux/kernel/v3.0/testing/, for two points in time between 3.6 and 3.7:

(1) the version up to the commit immediately BEFORE the "mega commit"
(2) the version exactly matching the "mega commit"?

Using these two kernel tarballs, I could then either confirm or rule out the "mega commit" as the root cause of the issue. In the (unlikely) case that the mega commit can indeed be ruled out, I could then concentrate on further narrowing down the commits

* either between 3.6 and the mega commit, if build (1) is already broken,
* or between the mega commit and 3.7, if build (2) still works but 3.7 fails.

Sorry, but rather than pulling the whole git tree onto my poor old laptop and starting a huge number of bisection attempts "into the blue", I think that this makes more sense and does not require me to become a git expert in order to try and help track this down... ;-)

What do you think? I will report back shortly with my 3.13-rc3 results...

BR,
Andreas
The mega-commit is ebb945a94bba2ce8dff7b0942ff2b3f2a52a0a69. So you could check out ebb945a94bba^ and see if it works, and then test ebb945a94bba to see if it doesn't. In either case, you could use those as your new "good" or "bad" starting points. You can do a clone with like --depth 1 or something. Not sure how to do that at a commit. Also I'd recommend against it, it'll just be more downloading later on if things don't pan out. A full git clone of the linux kernel is ~800MB (+ space to actually store the files, but that's all part of the 800MB). In fact, I don't even know if that 818MB is compressed or not -- I'd guess not, so the download is probably much smaller.
Created attachment 90764 [details]
dmesg output on 3.13-rc3 while the issue was seen
Created attachment 90765 [details]
dmesg output in debug mode (nouveau.debug=debug) on 3.13-rc3 while the issue
was seen
Hi again,

sorry, it took longer than needed for me to find my way through compiling recent kernels with rpmbuild and an appropriate spec file.

The result of my testing is negative: The bug is still included in the most recent 3.13-rc3 kernel... :-(

From the attached dmesg output (which in both cases includes the time when the issue was seen and my screen was completely garbled), it looks to me that there are no signs - not even in debug mode - of anything going wrong, so if I am right with this assumption, I think this supports your theory that the root cause of the severe screen corruption indeed is a "fencing" issue...

In the meantime, I have created a git repository on my machine and produced two 3.6-based tarballs for before and after the "mega patch". I will now move forward to adapt a 3.6 kernel rpmbuild spec file and then build two kernels for these two snapshots. I should be able to update you on my progress some time tomorrow...

Thanks & best regards,
Andreas
Hmm - bad news once again:
I have now compiled and tested a 3.6 kernel to match the commit immediately
before the "mega commit", i.e. the kernel tarball has been produced by the
following command:
$ git archive --format=tar "ebb945a94bba2ce8dff7b0942ff2b3f2a52a0a69^" | bzip2 > ~/Projekte/nouveau_drm/linux-before-mega.tar.bz2
Unfortunately, I am unable to test whether the screen distortion issue occurs
with this kernel, because I get a complete hang (system freezes, CPU and GPU
fans running full speed) somewhere between some seconds and some minutes after
starting GNOME...
Note that I have seen both: either no screen corruption at all or first slight
signs of screen corruption (white rectangles around window frames) at the times
of the hangs.
The error messages that I find in /var/log/messages probably associated with
the hangs (sorry, I can't get any messages out of dmesg due to the hang...) seem
to be the following:
[drm:drm_mm_takedown] *ERROR* Memory manager not clean. Delaying takedown
[drm:drm_mm_takedown] *ERROR* Memory manager not clean. Delaying takedown
[drm:drm_mm_takedown] *ERROR* Memory manager not clean. Delaying takedown
repeating any number between 3 to 5 times directly before the hangs
(immediately followed by /var/log/messages starting over with my power-off
machine restart).
Will now move forward to test with the most recent stock kernel from the 3.6
series: 3.6.11-3.6.y.20121225.ol6 from the Oracle public yum playground to see
whether this already is affected... :-(
BR,
Andreas
This gets really interesting now: Oracle public yum "playground" 3.6.11-3.6.y.20121225.ol6 (should be stock kernel 3.6.11) does NOT show any hangs, but DOES INDEED ALREADY show the graphics corruption issue FOR ME (although it was thought by the original posters here that it started with 3.7.0)...!? So I will now try and move backwards in kernel versions until I might find one that does not exhibit the corruption bug. As Oracle's "playground" kernels are only available starting from 3.6, I will probably move to ELRepo "ml" kernels for this job. I'll report back once I have some idea of where exactly the issue indeed started... BR, Andreas
OK, finally I have some more encouraging news: It now looks like the issue indeed started much earlier than initially thought, namely already between the 3.4 and 3.5 kernel series!!!

Results from my testing with stock kernels obtained from kernel.org (I've never ever before compiled so many kernels in such a short period of time...):

* 3.4.5 -> NO ISSUE
* 3.4.74 -> NO ISSUE
* 3.5.1 -> ISSUE SEEN
* 3.5.5 -> ISSUE SEEN
* all later versions (3.6 onwards) -> ISSUE SEEN.

So please advise now what next steps I should undertake to track it down more closely: What new commits have happened between the 3.4 and 3.5 series, and did one of them possibly affect so-called "fencing" on NV86/NV50 chips? (And - in order to learn some more git - how can I find out the associated commits using git command-line, such that I can produce the respective kernel tarballs for testing out of git?)

Many thanks in advance for your feedback! :-)

Andreas
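The regression window follows directly from the results listed above. As a small sketch (the `good`/`bad` labels and the variable names are chosen here for illustration), sorting the tested versions numerically and picking the last good and first bad entry gives the range to look at:

```shell
#!/bin/sh
# Tested kernels and outcomes from the list above, deliberately unordered.
results='3.5.5 bad
3.4.74 good
3.5.1 bad
3.4.5 good'

# Sort numerically on each dot-separated version component.
sorted=$(printf '%s\n' "$results" | sort -t. -k1,1n -k2,2n -k3,3n)

# Last version reported good and first version reported bad, in sorted order.
last_good=$(printf '%s\n' "$sorted" | awk '$2 == "good" {v = $1} END {print v}')
first_bad=$(printf '%s\n' "$sorted" | awk '$2 == "bad" {print $1; exit}')

echo "regression window: after $last_good, up to and including $first_bad"
```

Stable point releases such as 3.4.74 live on their own maintenance branch, so a window like this only narrows things down to "somewhere in the 3.5 development cycle"; it does not name a commit.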
In addition, one more request to all the other people who raised this issue here and/or saw it before I did: Can you confirm that, for you as well, the issue already started after the 3.4 series as it does for me, i.e. that you simply never tried a 3.5.x or 3.6.x kernel? Thanks & BR, Andreas
You really need to figure out how to do things inside the git tree and not do some sort of crazy export. That will speed things up by an order of magnitude.

To get the list of nouveau changes between 3.4 and 3.5:

git log v3.4..v3.5 -- drivers/gpu/drm/nouveau

To do a bisect between 3.4 and 3.5, same instructions as before, but use v3.5 as the bad tag and v3.4 as the good tag.

Looking through the list of changes, c420b2dc8dc3cdd507214f4df5c5f96f08812cbe stands out as a big one, as does 5e120f6e4b3f35b741c5445dfc755f50128c3c44 which actually introduces the nv84+ fence mechanism.

This had actually previously occurred to me, but a quick thing to try out is to switch to the nv17 fence and see what happens. You can do this by editing the logic in drivers/gpu/drm/nouveau/nouveau_drm.c:nouveau_accel_init, and just replace nv84_fence_create with nv50_fence_create (which will make a nv50+ appropriate nv17 fence impl).
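The suggested one-identifier swap can be rehearsed on a scratch file first. This is a sketch only: the single line written to the scratch file is a made-up placeholder and NOT the actual contents of nouveau_drm.c; only the two function names come from the comment above.

```shell
#!/bin/sh
set -eu

# Stand-in file: NOT the real kernel source, just a placeholder line that
# contains the identifier to be swapped.
src=$(mktemp)
printf '%s\n' 'ret = nv84_fence_create(drm); /* placeholder line */' > "$src"

# Swap the fence constructor name, writing to a new file instead of editing in place.
patched=$(mktemp)
sed 's/nv84_fence_create/nv50_fence_create/g' "$src" > "$patched"

# Review the change as a unified diff (diff exits non-zero when files differ).
diff -u "$src" "$patched" || true
```

Inside a kernel git tree, the same review step would be `git diff drivers/gpu/drm/nouveau/nouveau_drm.c` after making the edit by hand, which also makes it easy to revert with `git checkout` if the experiment does not help.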
Many thanks for your quick reply - even on a Sunday! :-) Regarding: "You really need to figure out how to do things inside the git tree and not do some sort of crazy export. That will speed things up by an order of magnitude." the main issue is that I need to build a RHEL6/OL6 compliant kernel on my machine, and I simply don't have a spec file which properly builds such a kernel from git, so I need to export the git snapshot to a tarball. In case you have such an RHEL6/OL6 spec file (or know where to get one from), please let me know... I'm just in the process of trying whether moving from nv84_fence_create to nv50_fence_create will make a difference with 3.6.11 and will report back later. BR, Andreas