* nouveau graphical corruption in 3.13.2
@ 2014-02-08  7:58 ` Daniel J Blueman
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel J Blueman @ 2014-02-08  7:58 UTC (permalink / raw)
  To: nouveau, dri-devel; +Cc: Linux Kernel, Dave Airlie, Ben Skeggs

Hi guys,

With a GeForce 320M GPU running linux 3.13.2 and Xorg 1.15.0, I'm
seeing significant graphical corruption and later unrecoverable GPU
lockup, accompanied by thousands of ILLEGAL_MTHD or related kernel
messages [1]. I see similar issues on 3.12 also.

Is there any debugging or testing I can do to help diagnose this?

Many thanks,
  Daniel

--- [1]

http://quora.org/nouveau-dmesg.txt
http://quora.org/nouveau-Xorg.0.log
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: nouveau graphical corruption in 3.13.2
  2014-02-08  7:58 ` Daniel J Blueman
  (?)
@ 2014-02-08  8:33 ` Ilia Mirkin
  2014-02-08 15:38     ` Daniel J Blueman
  -1 siblings, 1 reply; 14+ messages in thread
From: Ilia Mirkin @ 2014-02-08  8:33 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: nouveau, dri-devel, Linux Kernel, Dave Airlie, Ben Skeggs

On Sat, Feb 8, 2014 at 2:58 AM, Daniel J Blueman <daniel@quora.org> wrote:
> Hi guys,
>
> With a GeForce 320M GPU running linux 3.13.2 and Xorg 1.15.0, I'm
> seeing significant graphical corruption and later unrecoverable GPU
> lockup, accompanied by thousands of ILLEGAL_MTHD or related kernel
> messages [1]. I see similar issues on 3.12 also.
>
> Is there any debugging or testing I can do to help diagnose this?

Is this new? i.e. was there a kernel where it all worked well?

You get caught by the new disable logic in 3.13 that looks at a
register to figure out what engines have been disabled:

[    6.306005] nouveau W[    PCE0][0000:04:00.0] disabled, PCE0=1 to enable

Perhaps it's actually there and we have incorrect information about
the feature disable register -- you can force-enable it with
nouveau.config=PCE0=1 if you like. Although this logic is new in 3.13,
so if you also saw the issue in 3.12, that's probably not the cause.
(Also, the in-kernel logic falls back to M2MF, and so does the DDX,
and I don't see any usage in mesa, so even _if_ it's incorrectly
disabled, I doubt this would be your issue.)

Another thing that's new in 3.13 is MSI -- you can disable it with
nouveau.config=NvMSI=0.

There's only one currently-open bug about NVAF:
https://bugs.freedesktop.org/show_bug.cgi?id=60150 -- unfortunately
the bug filer wasn't very specific about the issues. But it might be
worth trying an ancient kernel (e.g. pre-3.7 -- there was a big
rewrite in 3.7, or even one of those 3.2-based franken-kernels that
distros maintained.) I suppose if you were to boot with
nouveau.noaccel=1 your problems would go away, but so would any 2d/3d
accel.
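
When trying several of these options in turn, it can help to confirm what the running kernel was actually booted with. A minimal sketch, assuming nouveau.config takes comma-separated NAME=VALUE pairs (as the nouveau.debug string later in this thread suggests) and that no option contains spaces or quotes; the function name is made up for illustration:

```python
def nouveau_options(cmdline):
    """Collect nouveau.* parameters from a kernel command line string."""
    opts = {}
    for token in cmdline.split():
        if not token.startswith("nouveau."):
            continue
        # "nouveau.config=PCE0=1" -> key "config", value "PCE0=1"
        key, _, value = token[len("nouveau."):].partition("=")
        if key == "config":
            # Assumption: comma-separated NAME=VALUE pairs
            for pair in value.split(","):
                name, _, val = pair.partition("=")
                opts.setdefault("config", {})[name] = val
        else:
            opts[key] = value
    return opts
```

On a running system the input would be the contents of /proc/cmdline, e.g. nouveau_options(open("/proc/cmdline").read()). A misspelled option name shows up in the result exactly as it was typed.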

[   85.751375] nouveau E[   PFIFO][0000:04:00.0] DMA_PUSHER - ch 3 [Xorg[919]] get 0x0020022a4c put 0x0020023140 ib_get 0x00000391 ib_put 0x000003c2 state 0x8000e6a8 (err: INVALID_CMD) push 0x00400040

I've seen this kind of error before, on many different card types, and
have _no clue_ how it happens -- at no point is that command actually
written to the ring (I think). After that happens, it looks like
things get a little upset, and basically nothing works again. When
I've seen it before things tend to recover, but I guess they don't
have to.
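
With thousands of messages in the log, a quick tally by severity letter and unit can show which part of the driver complains most. A minimal sketch, assuming every message carries the `nouveau <letter>[ UNIT]` prefix seen in the lines quoted in this thread (the letters presumably stand for warning, error, debug and trace); the function name is hypothetical:

```python
import re
from collections import Counter

# Matches e.g. "nouveau E[   PFIFO]" and captures ("E", "PFIFO")
TAG = re.compile(r"nouveau\s+(?P<level>[A-Z])\[\s*(?P<unit>\w+)\]")


def tally_nouveau(dmesg_text):
    """Count nouveau kernel messages by (severity letter, unit)."""
    return Counter((m["level"], m["unit"]) for m in TAG.finditer(dmesg_text))
```

Feeding it the saved dmesg text and calling most_common() on the result lists the noisiest units first.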

I wouldn't be surprised if this was some sort of issue in the fifo
context switch code (which I'm most unfamiliar with, but others know
more). It has all sorts of chipset-specific stuff, and chances are
nvaf wasn't well-represented when all those were made. Assuming there
isn't an earlier working version of nouveau, one avenue is to do a
mmiotrace (https://wiki.ubuntu.com/X/MMIOTracing) of the blob starting
X and running e.g. glxgears. Then one would have to look at what
ctxprog it uploads and reconcile that with nouveau's somehow. (But
perhaps this is entirely wrong and nouveau's ctxprogs are fine.)

 -ilia

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: nouveau graphical corruption in 3.13.2
@ 2014-02-08 15:38     ` Daniel J Blueman
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel J Blueman @ 2014-02-08 15:38 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau, dri-devel, Linux Kernel, Dave Airlie, Ben Skeggs

On 8 February 2014 16:33, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> On Sat, Feb 8, 2014 at 2:58 AM, Daniel J Blueman <daniel@quora.org> wrote:
>> Hi guys,
>>
>> With a GeForce 320M GPU running linux 3.13.2 and Xorg 1.15.0, I'm
>> seeing significant graphical corruption and later unrecoverable GPU
>> lockup, accompanied by thousands of ILLEGAL_MTHD or related kernel
>> messages [1]. I see similar issues on 3.12 also.
>>
>> Is there any debugging or testing I can do to help diagnose this?
>
> Is this new? i.e. was there a kernel where it all worked well?
>
> You get caught by the new disable logic in 3.13 that looks at a
> register to figure out what engines have been disabled:
>
> [    6.306005] nouveau W[    PCE0][0000:04:00.0] disabled, PCE0=1 to enable
>
> Perhaps it's actually there and we have incorrect information about
> the feature disable register -- you can force-enable it with
> nouveau.config=PCE0=1 if you like. Although this logic is new in 3.13,
> so if you also saw the issue in 3.12, that's probably not the cause.
> (Also, the in-kernel logic falls back to M2MF, and so does the DDX,
> and I don't see any usage in mesa, so even _if_ it's incorrectly
> disabled, I doubt this would be your issue.)
>
> Another thing that's new in 3.13 is MSI -- you can disable it with
> nouveau.config=NvMSI=0.
>
> There's only one currently-open bug about NVAF:
> https://bugs.freedesktop.org/show_bug.cgi?id=60150 -- unfortunately
> the bug filer wasn't very specific about the issues. But it might be
> worth trying an ancient kernel (e.g. pre-3.7 -- there was a big
> rewrite in 3.7, or even one of those 3.2-based franken-kernels that
> distros maintained.) I suppose if you were to boot with
> nouveau.noaccel=1 your problems would go away, but so would any 2d/3d
> accel.
>
> [   85.751375] nouveau E[   PFIFO][0000:04:00.0] DMA_PUSHER - ch 3
> [Xorg[919]] get 0x0020022a4c put 0x0020023140 ib_get 0x00000391 ib_put
> 0x000003c2 state 0x8000e6a8 (err: INVALID_CMD) push 0x00400040
>
> I've seen this kind of error before, on many different card types, and
> have _no clue_ how it happens -- at no point is that command actually
> written to the ring (I think). After that happens, it looks like
> things get a little upset, and basically nothing works again. When
> I've seen it before things tend to recover, but I guess they don't
> have to.
>
> I wouldn't be surprised if this was some sort of issue in the fifo
> context switch code (which I'm most unfamiliar with, but others know
> more). It has all sorts of chipset-specific stuff, and chances are
> nvaf wasn't well-represented when all those were made. Assuming there
> isn't an earlier working version of nouveau, one avenue is to do a
> mmiotrace (https://wiki.ubuntu.com/X/MMIOTracing) of the blob starting
> X and running e.g. glxgears. Then one would have to look at what
> ctxprog it uploads and reconcile that with nouveau's somehow. (But
> perhaps this is entirely wrong and nouveau's ctxprogs are fine.)

Superb writeup! Indeed, booting with nouveau.config=PCE0=1 didn't help
as you deduced, nor did nouveau.config=NvMSI=0.

Interestingly, there was graphical failure booting 3.6.11, and even
nvidia-current fails to initialise, but these two issues could be due
to running the Xorg stack in Ubuntu 14.04 pre-release. Using
nouveau.noaccel=1 works great for the first X session, but after
logging out, lightdm and the next session experience this consistent
screen corruption:

http://quora.org/nouveau-corruption.jpg

At any resolution other than 1280x800 (the native panel resolution)
there is no corruption, but it consistently returns when changing
back to 1280x800. What would be good to look out for to help
diagnose this?

Obviously noaccel is a very useful and important fallback. For the
accelerated case, I'll wait until the Nvidia blob works in Ubuntu
14.04, then employ MMIO tracing and follow up with the differences I
find.

Many thanks,
  Daniel
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: nouveau graphical corruption in 3.13.2
@ 2014-02-08 18:57       ` Ilia Mirkin
  0 siblings, 0 replies; 14+ messages in thread
From: Ilia Mirkin @ 2014-02-08 18:57 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: nouveau, dri-devel, Linux Kernel, Dave Airlie, Ben Skeggs

On Sat, Feb 8, 2014 at 10:38 AM, Daniel J Blueman <daniel@quora.org> wrote:
> Interestingly, there was graphical failure booting 3.6.11, even
> nvidia-current fails to initialise, but these two issues could be due
> to running the Xorg stack in Ubuntu 14.04 pre-release. Using
> nouveau.noaccel=1 works great for the first X session, but after
> logging out, lightdm and the next session experiences this consistent
> screen corruption:
>
> http://quora.org/nouveau-corruption.jpg

Does that just happen in 3.6.11 or even in 3.13? If the latter, that
points to some key lack of understanding of... something. With
noaccel, we're not using pgraph or anything fancy -- it's just a
framebuffer, basically. So if we can't even render _that_ right...

Hopefully someone else will pipe up re your other issues -- my
knowledge base on this is exhausted :(

  -ilia

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: nouveau graphical corruption in 3.13.2
  2014-02-08 18:57       ` Ilia Mirkin
  (?)
@ 2014-02-23  3:45       ` Daniel J Blueman
  2014-02-23  3:48           ` Ilia Mirkin
  2014-02-23  4:33           ` Ilia Mirkin
  -1 siblings, 2 replies; 14+ messages in thread
From: Daniel J Blueman @ 2014-02-23  3:45 UTC (permalink / raw)
  To: Ilia Mirkin, Ben Skeggs, Dave Airlie
  Cc: nouveau, dri-devel, Linux Kernel, Maarten Lankhorst, Marcin Slusarz

On 9 February 2014 02:57, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> On Sat, Feb 8, 2014 at 10:38 AM, Daniel J Blueman <daniel@quora.org> wrote:
>> Interestingly, there was graphical failure booting 3.6.11, even
>> nvidia-current fails to initialise, but these two issues could be due
>> to running the Xorg stack in Ubuntu 14.04 pre-release. Using
>> nouveau.noaccel=1 works great for the first X session, but after
>> logging out, lightdm and the next session experiences this consistent
>> screen corruption:
>>
>> http://quora.org/nouveau-corruption.jpg
>
> Does that just happen in 3.6.11 or even in 3.13? If the latter, that
> points to some key lack of understanding of... something. With
> noaccel, we're not using pgraph or anything fancy -- it's just a
> framebuffer, basically. So if we can't even render _that_ right...
>
> Hopefully someone else will pipe up re your other issues -- my
> knowledge base on this is exhausted :(

Interestingly, it turns out that the screen corruption occurs on every
boot (booting with nouveau.noaccel=1 for now), and I can consistently
work around it by one suspend-resume cycle.

To that effect, I've captured kernel message output booting 3.14-rc3
with 'nouveau.noaccel=1 nouveau.debug=trace,DEVINIT=spam drm.debug=0x6
log_buf_len=16M', and performed a suspend-resume cycle:
http://quora.org/nouveau-log.txt

Ben et al, would some specific register tracing or otherwise help to
locate the issue?
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: nouveau graphical corruption in 3.13.2
  2014-02-23  3:45       ` Daniel J Blueman
@ 2014-02-23  3:48           ` Ilia Mirkin
  2014-02-23  4:33           ` Ilia Mirkin
  1 sibling, 0 replies; 14+ messages in thread
From: Ilia Mirkin @ 2014-02-23  3:48 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Ben Skeggs, Dave Airlie, nouveau, dri-devel, Linux Kernel,
	Maarten Lankhorst, Marcin Slusarz

On Sat, Feb 22, 2014 at 10:45 PM, Daniel J Blueman <daniel@quora.org> wrote:
> On 9 February 2014 02:57, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
>> On Sat, Feb 8, 2014 at 10:38 AM, Daniel J Blueman <daniel@quora.org> wrote:
>>> Interestingly, there was graphical failure booting 3.6.11, even
>>> nvidia-current fails to initialise, but these two issues could be due
>>> to running the Xorg stack in Ubuntu 14.04 pre-release. Using
>>> nouveau.noaccel=1 works great for the first X session, but after
>>> logging out, lightdm and the next session experiences this consistent
>>> screen corruption:
>>>
>>> http://quora.org/nouveau-corruption.jpg
>>
>> Does that just happen in 3.6.11 or even in 3.13? If the latter, that
>> points to some key lack of understanding of... something. With
>> noaccel, we're not using pgraph or anything fancy -- it's just a
>> framebuffer, basically. So if we can't even render _that_ right...
>>
>> Hopefully someone else will pipe up re your other issues -- my
>> knowledge base on this is exhausted :(
>
> Interestingly, it turns out that the screen corruption occurs on every
> boot (booting with nouveau.noaccel=1 for now), and I can consistently
> work around it by one suspend-resume cycle.

Does booting with nouveau.config=NvForcePost=1 help?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: nouveau graphical corruption in 3.13.2
@ 2014-02-23  4:01             ` Daniel J Blueman
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel J Blueman @ 2014-02-23  4:01 UTC (permalink / raw)
  To: Ilia Mirkin
  Cc: Ben Skeggs, Dave Airlie, nouveau, dri-devel, Linux Kernel,
	Maarten Lankhorst, Marcin Slusarz

On 23 February 2014 11:48, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> On Sat, Feb 22, 2014 at 10:45 PM, Daniel J Blueman <daniel@quora.org> wrote:
>> On 9 February 2014 02:57, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
>>> On Sat, Feb 8, 2014 at 10:38 AM, Daniel J Blueman <daniel@quora.org> wrote:
>>>> Interestingly, there was graphical failure booting 3.6.11, even
>>>> nvidia-current fails to initialise, but these two issues could be due
>>>> to running the Xorg stack in Ubuntu 14.04 pre-release. Using
>>>> nouveau.noaccel=1 works great for the first X session, but after
>>>> logging out, lightdm and the next session experiences this consistent
>>>> screen corruption:
>>>>
>>>> http://quora.org/nouveau-corruption.jpg
>>>
>>> Does that just happen in 3.6.11 or even in 3.13? If the latter, that
>>> points to some key lack of understanding of... something. With
>>> noaccel, we're not using pgraph or anything fancy -- it's just a
>>> framebuffer, basically. So if we can't even render _that_ right...
>>>
>>> Hopefully someone else will pipe up re your other issues -- my
>>> knowledge base on this is exhausted :(
>>
>> Interestingly, it turns out that the screen corruption occurs on every
>> boot (booting with nouveau.noaccel=1 for now), and I can consistently
>> work around it by one suspend-resume cycle.
>
> Does booting with nouveau.config=NvForcePost=1 help?

Still no cigar with nouveau.config=NvForcePost=1.

I meant to add that restarting X or changing resolutions doesn't
resolve the issue. The corruption is consistently 16-pixel wide by 1
high stippling on a consistent address range of the framebuffer.
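
To report the affected region as byte offsets rather than screen positions, the arithmetic is simple if the scanout buffer is linear. A minimal sketch, assuming 32 bits per pixel and an untiled buffer whose pitch is known; neither is confirmed in this thread, and the helper name is hypothetical:

```python
def pixel_span_bytes(x, y, width_px, pitch_bytes, bytes_per_pixel=4):
    """Return (start, end) byte offsets of a 1-pixel-high run of pixels.

    Assumes a linear (untiled) framebuffer where each row starts
    pitch_bytes after the previous one.
    """
    start = y * pitch_bytes + x * bytes_per_pixel
    return start, start + width_px * bytes_per_pixel
```

Under those assumptions a 16-pixel run covers 64 bytes: pixel_span_bytes(0, 0, 16, 1280 * 4) gives (0, 64).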

Thanks,
  Daniel
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: nouveau graphical corruption in 3.13.2
  2014-02-23  3:45       ` Daniel J Blueman
@ 2014-02-23  4:33           ` Ilia Mirkin
  2014-02-23  4:33           ` Ilia Mirkin
  1 sibling, 0 replies; 14+ messages in thread
From: Ilia Mirkin @ 2014-02-23  4:33 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Ben Skeggs, Dave Airlie, nouveau, dri-devel, Linux Kernel,
	Maarten Lankhorst, Marcin Slusarz

On Sat, Feb 22, 2014 at 10:45 PM, Daniel J Blueman <daniel@quora.org> wrote:
> On 9 February 2014 02:57, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
>> On Sat, Feb 8, 2014 at 10:38 AM, Daniel J Blueman <daniel@quora.org> wrote:
>>> Interestingly, there was graphical failure booting 3.6.11, even
>>> nvidia-current fails to initialise, but these two issues could be due
>>> to running the Xorg stack in Ubuntu 14.04 pre-release. Using
>>> nouveau.noaccel=1 works great for the first X session, but after
>>> logging out, lightdm and the next session experiences this consistent
>>> screen corruption:
>>>
>>> http://quora.org/nouveau-corruption.jpg
>>
>> Does that just happen in 3.6.11 or even in 3.13? If the latter, that
>> points to some key lack of understanding of... something. With
>> noaccel, we're not using pgraph or anything fancy -- it's just a
>> framebuffer, basically. So if we can't even render _that_ right...
>>
>> Hopefully someone else will pipe up re your other issues -- my
>> knowledge base on this is exhausted :(
>
> Interestingly, it turns out that the screen corruption occurs on every
> boot (booting with nouveau.noaccel=1 for now), and I can consistently
> work around it by one suspend-resume cycle.
>
> To that effect, I've captured kernel message output booting 3.14-rc3
> with 'nouveau.noaccel=1 nouveau.debug=trace,DEVINIT=spam drm.debug=0x6
> log_buf_len=16M', and performed a suspend-resume cycle:
> http://quora.org/nouveau-log.txt

An observation:

on boot:

[    7.086599] [drm:drm_crtc_helper_set_mode], [ENCODER:16:LVDS-16] set [MODE:29:1280x800]
[    7.150571] nouveau D[   PDISP][0000:04:00.0] supervisor 0x00000010 0x00000020
[    7.164662] nouveau D[   PDISP][0000:04:00.0] supervisor 0x00000020 0x00000030
[    7.164903] nouveau D[   PDISP][0000:04:00.0] supervisor 0x00000040 0x00000030

on resume:

[   59.538135] [drm:drm_crtc_helper_set_mode], [ENCODER:16:LVDS-16] set [MODE:29:1280x800]
[   59.539586] nouveau D[   PDISP][0000:04:00.0] supervisor 0x00000010 0x000002a0
[   59.540738] nouveau D[   PDISP][0000:04:00.0] supervisor 0x00000020 0x000002b0
[   59.540812] nouveau T[   VBIOS][0000:04:00.0] 0x547f[0]: SUB_DIRECT 0x556f
[   59.540814] nouveau T[   VBIOS][0000:04:00.0] 0x556f[1]: NV_REG R[0x4061c00c] &= 0xfffffffe |= 0x00000000
[   59.540818] nouveau T[   VBIOS][0000:04:00.0] 0x557c[1]: NV_REG R[0x4061c00c] &= 0xfffffffe |= 0x00000001
[   59.540836] nouveau T[   VBIOS][0000:04:00.0] 0x5589[1]: TIME 0x3e80
[   59.556702] nouveau T[   VBIOS][0000:04:00.0] 0x558c[1]: NV_REG R[0x4061c00c] &= 0xfffffffe |= 0x00000000
[   59.556714] nouveau T[   VBIOS][0000:04:00.0] 0x5599[1]: DONE
[   59.556716] nouveau T[   VBIOS][0000:04:00.0] 0x5482[0]: ZM_REG_SEQUENCE 0x05
[   59.556718] nouveau T[   VBIOS][0000:04:00.0] 0x5488[0]: R[0x4061c00c] = 0x01060200
[   59.556720] nouveau T[   VBIOS][0000:04:00.0] 0x548c[0]: R[0x4061c010] = 0x0310000a
[   59.556721] nouveau T[   VBIOS][0000:04:00.0] 0x5490[0]: R[0x4061c014] = 0x00000000
[   59.556723] nouveau T[   VBIOS][0000:04:00.0] 0x5494[0]: R[0x4061c018] = 0x000f4af8
[   59.556725] nouveau T[   VBIOS][0000:04:00.0] 0x5498[0]: R[0x4061c01c] = 0x0001caf0
[   59.556726] nouveau T[   VBIOS][0000:04:00.0] 0x549c[0]: SUB_DIRECT 0x55c5
[   59.556728] nouveau T[   VBIOS][0000:04:00.0] 0x55c5[1]: NV_REG R[0x00e1e4] &= 0xfffffffc |= 0x00000000
[   59.556741] nouveau T[   VBIOS][0000:04:00.0] 0x55d2[1]: NV_REG R[0x00e100] &= 0xfff7ffff |= 0x00080000
[   59.556751] nouveau T[   VBIOS][0000:04:00.0] 0x55df[1]: ZM_REG_SEQUENCE 0x02
[   59.556753] nouveau T[   VBIOS][0000:04:00.0] 0x55e5[1]: R[0x4061c118] = 0x15151515
[   59.556754] nouveau T[   VBIOS][0000:04:00.0] 0x55e9[1]: R[0x4061c11c] = 0x00000015
[   59.556756] nouveau T[   VBIOS][0000:04:00.0] 0x55ed[1]: ZM_REG_SEQUENCE 0x02
[   59.556757] nouveau T[   VBIOS][0000:04:00.0] 0x55f3[1]: R[0x4061c198] = 0x15151515
[   59.556759] nouveau T[   VBIOS][0000:04:00.0] 0x55f7[1]: R[0x4061c19c] = 0x00000015
[   59.556760] nouveau T[   VBIOS][0000:04:00.0] 0x55fb[1]: SUB_DIRECT 0x5e02
[   59.556762] nouveau T[   VBIOS][0000:04:00.0] 0x5e02[2]: DONE
[   59.556763] nouveau T[   VBIOS][0000:04:00.0] 0x55fe[1]: DONE
[   59.556765] nouveau T[   VBIOS][0000:04:00.0] 0x549f[0]: SUB_DIRECT 0x55ff
[   59.556766] nouveau T[   VBIOS][0000:04:00.0] 0x55ff[1]: DONE
[   59.556767] nouveau T[   VBIOS][0000:04:00.0] 0x54a2[0]: DONE
[   59.811966] nouveau D[   PDISP][0000:04:00.0] supervisor 0x00000040 0x000002b0
[   59.812033] nouveau T[   VBIOS][0000:04:00.0] 0x5600[0]: DONE

I'm pretty weak on that supervisor logic, unfortunately. But somehow
on boot it's not saying to execute the vbios snippet? Perhaps that's
normal though? Ben?
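
The boot/resume comparison can be made mechanical by pulling the two supervisor values out of each PDISP line and XOR-ing the second value between the two runs. A minimal sketch, assuming the log format quoted in this message; what the individual bits mean is not established here, and both function names are made up for illustration:

```python
import re

# \s+ also spans a line break, so log lines wrapped by a mail client still match
SUPERVISOR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nouveau\s+D\[\s*PDISP\]\[[^\]]+\]\s+"
    r"supervisor\s+(?P<intr>0x[0-9a-fA-F]+)\s+(?P<val>0x[0-9a-fA-F]+)"
)


def supervisor_events(log_text):
    """Return (timestamp, first value, second value) for each supervisor line."""
    return [
        (float(m["ts"]), int(m["intr"], 16), int(m["val"], 16))
        for m in SUPERVISOR.finditer(log_text)
    ]


def differing_bits(boot, resume):
    """Pair events in order and XOR the second value where the first value agrees."""
    return [
        (b_intr, b_val ^ r_val)
        for (_, b_intr, b_val), (_, r_intr, r_val) in zip(boot, resume)
        if b_intr == r_intr
    ]
```

Run on the two excerpts in this message, every pair differs in the same bits of the second value (0x280), which may narrow down where the handler's decision diverges between boot and resume.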

Also, Daniel -- first off, DEVINIT doesn't really do very much (you
might be thinking it has something to do with device init... while you
wouldn't be completely wrong, you also wouldn't be right enough).
Debug levels below "trace" need a special kernel recompile (there's a
config option for how low to compile in, otherwise there'd be too much
overhead). spam causes nv_wr*/nv_rd* to each print lines, so... a
_lot_ of output. A mmiotrace might be more concise :)

Daniel -- you could try to hack things in the supervisor handler
(core/engine/disp/nv50.c iirc, but just grep for that supervisor debug
print) so that it treats the supervisor values as a request to execute
the vbios stuff.

  -ilia

^ permalink raw reply	[flat|nested] 14+ messages in thread

Thread overview: 14+ messages
2014-02-08  7:58 nouveau graphical corruption in 3.13.2 Daniel J Blueman
2014-02-08  7:58 ` Daniel J Blueman
2014-02-08  8:33 ` Ilia Mirkin
2014-02-08 15:38   ` Daniel J Blueman
2014-02-08 15:38     ` Daniel J Blueman
2014-02-08 18:57     ` Ilia Mirkin
2014-02-08 18:57       ` Ilia Mirkin
2014-02-23  3:45       ` Daniel J Blueman
2014-02-23  3:48         ` Ilia Mirkin
2014-02-23  3:48           ` Ilia Mirkin
2014-02-23  4:01           ` Daniel J Blueman
2014-02-23  4:01             ` Daniel J Blueman
2014-02-23  4:33         ` Ilia Mirkin
2014-02-23  4:33           ` Ilia Mirkin
