Tracking down severe regression in 5.3-rc4/5.4 for TU116

* Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed
@ 2019-12-16 16:35 Marcin Zajączkowski
From: Marcin Zajączkowski @ 2019-12-16 16:35 UTC
  To: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi,

I've encountered a severe regression on TU116 (probably also TU117),
introduced in 5.3-rc4 and still present in the recent 5.4.2 [1]. The
system usually hangs on the next graphics-mode-related operation (calling
xrandr after login is enough) with the following errors:

> kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
...
> kernel: nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
> kernel: nouveau 0000:01:00.0: i2c: aux 0007: begin idle timeout ffffffff
> kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> kernel: ------------[ cut here ]------------
> kernel: nouveau 0000:01:00.0: timeout
> kernel: WARNING: CPU: 10 PID: 384 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:35 g84_bar_flush+0xcf/0xe0 [nouveau]

(detailed log in the corresponding issue [1])

With earlier kernels there was no hardware acceleration for the NVIDIA
GTX 1660 Ti, but at least I could use nouveau to disable it (to save
battery and trees, and to lower the temperature) or even have an external
output (with Wayland). Now the system is unusable with nouveau :(.

I spent some time trying to narrow the scope using the existing kernel
builds for Fedora. I was able to determine that the problem was
introduced between 5.3.0-0.rc3.git1.1 (commit 33920f1ec5bf - works fine)
and 5.3.0-0.rc4.git0.1 (tag v5.3-rc4 - fails with errors).

That is just a few days (7-11 Aug) and "only" around 250 commits. I went
through them but, judging by the commit subjects, I haven't seen any
nouveau-related changes, and in general no particularly suspicious
DRM-related changes.

> git log 33920f1ec5bf..v5.3-rc4 --stat
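Since the commit subjects give no obvious suspect, one way to narrow the range mechanically is git bisect. The following is only a minimal sketch, assuming a local clone of the mainline kernel tree and the ability to build and boot each candidate kernel on the affected machine. The good and bad revisions are the ones named above; the helper names bisect_steps and start_bisect are hypothetical, introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: GOOD and BAD are taken from the message above,
# everything else is an illustrative assumption.
GOOD=33920f1ec5bf   # last build that worked (5.3.0-0.rc3.git1.1)
BAD=v5.3-rc4        # first build that failed (5.3.0-0.rc4.git0.1)

# Upper bound on build-and-boot rounds for n candidate commits:
# the smallest k such that 2^k >= n.
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Start a bisection in the kernel checkout given as $1 (hypothetical helper).
start_bisect() {
    git -C "$1" bisect start "$BAD" "$GOOD"
}

# After each test boot, report the result from inside the checkout:
#   git bisect good    # xrandr works, no SCHED_ERROR in the log
#   git bisect bad     # hang or SCHED_ERROR
#   git bisect reset   # when finished, return to the original HEAD

bisect_steps 250
```

With around 250 commits in the range, this suggests roughly eight kernel builds rather than a review of every commit; git itself prints its own estimate of the remaining steps after each round.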

Maybe some of the more nouveau/DRM-experienced developers could take a
look at that range to determine which commit could have broken it (to make
it easier to find out what should be fixed to resolve the regression)?

[1] -
https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/516

Thanks in advance
Marcin
