From: bugzilla-daemon-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org
Subject: [Bug 100567] New: Nouveau system freeze fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Date: Tue, 04 Apr 2017 20:09:12 +0000
Field | Value |
---|---|
Bug ID | 100567 |
Summary | Nouveau system freeze fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT] |
Product | xorg |
Version | unspecified |
Hardware | Other |
OS | All |
Status | NEW |
Severity | normal |
Priority | medium |
Component | Driver/nouveau |
Assignee | nouveau@lists.freedesktop.org |
Reporter | jeremy.booker@gmail.com |
QA Contact | xorg-team@lists.x.org |
Created attachment 130676 [details]
journalctl output for last boot to crash
I'm experiencing a random but consistent hard freeze (I cannot switch virtual
terminals) running an nVidia card with three monitors under Fedora 25. Recent
kernel and driver updates seem to have made this freeze much more frequent
(5-6 times in two work days).
The output of journalctl -b -1 is attached. The most relevant lines are:
kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
kernel: nouveau 0000:01:00.0: fifo: gr engine fault on channel 12, recovering...
kernel: [drm:drm_atomic_helper_swap_state [drm_kms_helper]] *ERROR* [CRTC:37:head-0] hw_done timed out
kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:37:head-0] hw_done timed out
kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:37:head-0] flip_done timed out
A possibly related bug is #96562, but it is old and hasn't received any
attention, so I'm opening a new one.
I've tried downgrading to kernel 4.9.14, as the 4.10 series seems to have other changes relating to video drivers and nouveau (which are also causing crashes). I made it from ~8am to ~noon without a crash. The same basic log entries appear in the journalctl output:
Apr 05 12:10:09 localhost.localdomain kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Apr 05 12:10:09 localhost.localdomain kernel: nouveau 0000:01:00.0: fifo: gr engine fault on channel 10, recovering...
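Every report in this thread carries the same fifo signature (a SCHED_ERROR 0a [CTXSW_TIMEOUT], followed by either a "gr engine fault on channel N" line on older kernels or a "channel N: killed" line on newer ones). A minimal sketch, assuming Python 3.6+ and that the messages keep exactly the wording quoted in this thread, for pulling the PCI address, error and channel out of a saved log so reports can be compared side by side. The regexes, the `FifoEvent` type and the `parse_line`/`scan` helpers are hypothetical names for illustration, not part of nouveau or of any existing tool.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# PCI address as printed by nouveau, e.g. "0000:01:00.0" (assumed format)
PCI = r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"

# The three fifo messages quoted in this thread:
#   "fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]"
#   "fifo: gr engine fault on channel 12, recovering..."   (older kernels)
#   "fifo: channel 2: killed"                              (newer kernels)
SCHED_RE = re.compile(r"nouveau " + PCI + r": fifo: SCHED_ERROR (?P<code>[0-9a-f]{2}) \[(?P<name>\w+)\]")
FAULT_RE = re.compile(r"nouveau " + PCI + r": fifo: (?P<engine>\w+) engine fault on channel (?P<chan>\d+)")
KILLED_RE = re.compile(r"nouveau " + PCI + r": fifo: channel (?P<chan>\d+): killed")


class FifoEvent(NamedTuple):
    pci: str
    kind: str               # "sched_error", "engine_fault" or "channel_killed"
    detail: str             # e.g. "0a CTXSW_TIMEOUT", or the engine name "gr"
    channel: Optional[int]  # channel number when the message names one


def parse_line(line: str) -> Optional[FifoEvent]:
    """Return a FifoEvent if the line is one of the fifo messages above, else None."""
    m = SCHED_RE.search(line)
    if m:
        return FifoEvent(m["pci"], "sched_error", f"{m['code']} {m['name']}", None)
    m = FAULT_RE.search(line)
    if m:
        return FifoEvent(m["pci"], "engine_fault", m["engine"], int(m["chan"]))
    m = KILLED_RE.search(line)
    if m:
        return FifoEvent(m["pci"], "channel_killed", "", int(m["chan"]))
    return None


def scan(lines: Iterable[str]) -> List[FifoEvent]:
    """Keep only the lines that parse as fifo events, in their original order."""
    return [ev for ev in map(parse_line, lines) if ev is not None]
```

For example, the output of journalctl -k -b -1 can be saved to a file and its lines passed to `scan`. Lines such as "X[5708]: channel 2 killed!" are deliberately not matched, because they lack the "fifo:" prefix.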
https://bugs.freedesktop.org/show_bug.cgi?id=90453 sounds related.
What | Removed | Added |
---|---|---|
CC | kevin@potatofrom.space |
I have the same issue on Linux 4.11-rc6. journalctl -b:
Apr 10 18:22:55 jenny kernel: nouveau 0000:02:00.0: gr: TRAP ch 2 [007f901000 X[6443]]
Apr 10 18:22:55 jenny kernel: nouveau 0000:02:00.0: gr: GPC2/TPC1/MP trap: global 00000000 [] warp 3f0009 [ILLEGAL_INSTR_ENCODING]
...
Apr 10 18:22:59 jenny kernel: nouveau 0000:02:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Apr 10 18:22:59 jenny kernel: nouveau 0000:02:00.0: fifo: runlist 0: scheduled for recovery
Apr 10 18:22:59 jenny kernel: nouveau 0000:02:00.0: fifo: channel 2: killed
Apr 10 18:22:59 jenny kernel: nouveau 0000:02:00.0: fifo: engine 0: scheduled for recovery
Apr 10 18:22:59 jenny kernel: nouveau 0000:02:00.0: X[6443]: channel 2 killed!

lspci -vv (GTX 770; NVE4):
02:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 770] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. [MSI] GK104 [GeForce GTX 770]
	Flags: bus master, fast devsel, latency 0, IRQ 33
	Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
	Memory at f0000000 (64-bit, prefetchable) [size=128M]
	Memory at f8000000 (64-bit, prefetchable) [size=32M]
	I/O ports at e000 [size=128]
	Expansion ROM at fb000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Kernel driver in use: nouveau

02:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] GK104 HDMI Audio Controller
	Flags: bus master, fast devsel, latency 0, IRQ 17
	Memory at fb080000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Kernel driver in use: snd_hda_intel

I'll post a proper dmesg the next time the bug appears.
Created attachment 130834 [details]
dmesg with SCHED_ERROR (starting at 73576 sec)
It happened again! I believe the monitors were turned off when the problem
occurred (I wasn't there at the time), but when I returned the monitors were on
and frozen. I could move the mouse around, but nothing else responded.
The errors start at [73576].
Created attachment 130945 [details]
journalctl -kb of CTXSW_TIMEOUT on 4.10.10
I can reproduce this on Linux 4.10.10 as well. Along with the initial CTXSW_TIMEOUT,
it seems to hang a few kernel tasks related to atomic commits.
Relevant(?) snippet:
Apr 20 10:06:03 jenny kernel: nouveau 0000:02:00.0: gr: TRAP ch 3 [007f7c2000 X[3611]]
Apr 20 10:06:03 jenny kernel: nouveau 0000:02:00.0: gr: GPC1/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3e000d [OOR_REG]
Apr 20 10:06:03 jenny kernel: nouveau 0000:02:00.0: gr: GPC2/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3e000d [OOR_REG]
Apr 20 10:06:03 jenny kernel: nouveau 0000:02:00.0: gr: TRAP ch 3 [007f7c2000 X[3611]]
Apr 20 10:06:03 jenny kernel: nouveau 0000:02:00.0: gr: GPC1/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f000d [OOR_REG]
Apr 20 10:06:03 jenny kernel: nouveau 0000:02:00.0: gr: GPC1/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f000d [OOR_REG]
Apr 20 10:06:03 jenny kernel: nouveau 0000:02:00.0: gr: GPC2/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f000d [OOR_REG]
Apr 20 10:06:03 jenny kernel: nouveau 0000:02:00.0: gr: GPC3/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3f000d [OOR_REG]
Apr 20 10:06:03 jenny kernel: nouveau 0000:02:00.0: gr: GPC3/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3d000d [OOR_REG]
Apr 20 10:06:07 jenny kernel: nouveau 0000:02:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Apr 20 10:06:07 jenny kernel: nouveau 0000:02:00.0: fifo: gr engine fault on channel 6, recovering...
Apr 20 10:06:18 jenny kernel: nouveau 0000:02:00.0: X[3611]: nv50cal_space: -16
Apr 20 10:06:18 jenny kernel: nouveau 0000:02:00.0: X[3611]: nv50cal_space: -16
Apr 20 10:06:19 jenny kernel: nouveau 0000:02:00.0: X[3611]: nv50cal_space: -16
Apr 20 10:06:19 jenny kernel: nouveau 0000:02:00.0: X[3611]: nv50cal_space: -16
Apr 20 10:06:19 jenny kernel: nouveau 0000:02:00.0: X[3611]: nv50cal_space: -16
Apr 20 10:06:20 jenny kernel: nouveau 0000:02:00.0: X[3611]: nv50cal_space: -16
Apr 20 10:06:20 jenny kernel: nouveau 0000:02:00.0: X[3611]: nv50cal_space: -16
Apr 20 10:06:20 jenny kernel: nouveau 0000:02:00.0: X[3611]: nv50cal_space: -16
Apr 20 10:06:21 jenny kernel: nouveau 0000:02:00.0: X[3611]: nv50cal_space: -16
...
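The repeated `nv50cal_space: -16` lines are easier to read once the number is decoded. Under the usual Linux kernel convention that a negative return value is a negated errno code, -16 corresponds to EBUSY; that reading is an interpretation, since the log itself does not name the error. A small sketch, assuming a Linux host so that Python's `errno` table uses the same numbering as the kernel; `decode_kernel_return` is a hypothetical helper, not part of nouveau.

```python
import errno
import os


def decode_kernel_return(value: int) -> str:
    """Name a negative kernel-style return value such as -16.

    Assumes the usual kernel convention that failures are returned as a
    negated errno code; the errno table used is the host's (Linux assumed).
    """
    if value >= 0:
        return "not an error code"
    code = -value
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror gives the C library's human-readable text for the code
    return f"-{name} ({os.strerror(code)})"
```

On a Linux host, `decode_kernel_return(-16)` should start with `-EBUSY`, followed by the C library's message text for that code.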
Same problem on 4.9.16-gentoo. I've attached syslog and dmesg. On average the problem occurs every one or two days on my machine, which has one monitor at a resolution of 3440x1440. The x11-drivers/xf86-video-nouveau driver version is 1.0.15.
What | Removed | Added |
---|---|---|
CC | jadziadax30@gmail.com |
Created attachment 132245 [details]
syslog
Created attachment 132246 [details]
dmesg
I think I am experiencing the same problem on 4.10.0-24-generic on Ubuntu 17.04 with GNOME 3.24. I have one monitor running on an Nvidia card. The desktop freezes anywhere from 20 minutes to 5 hours after boot. This is the system log when it crashes:
Jun 30 16:49:47 r-ubuntu kernel: [ 2282.503403] nouveau 0000:01:00.0: gr: TRAP ch 2 [003fa29000 Xorg[1154]]
Jun 30 16:49:47 r-ubuntu kernel: [ 2282.503411] nouveau 0000:01:00.0: gr: GPC0/TPC0/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 3c000d [OOR_REG]
Let me know if I need to provide more info.
It must be a driver bug, as I see the same error with a different kernel and different graphics hardware.

dmesg output:
nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
nouveau 0000:01:00.0: fifo: channel 12: killed
nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery

lspci -vv -s 01:00.0
VGA compatible controller: NVIDIA Corporation GK106GLM [Quadro K2100M] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Dell GK106GLM [Quadro K2100M]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 25
	Region 0: Memory at f5000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at f0000000 (64-bit, prefetchable) [size=32M]
	Region 5: I/O ports at e000 [size=128]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: nouveau
	Kernel modules: nouveau

uname -a
Linux Geza-DellM4700 4.13.11-1-ARCH #1 SMP PREEMPT Thu Nov 2 10:25:56 CET 2017 x86_64 GNU/Linux

lsmod
Module Size Used by
fuse 94208 3
ccm 20480 9
cmac 16384 1
rfcomm 69632 32
ipt_MASQUERADE 16384 1
nf_nat_masquerade_ipv4 16384 1 ipt_MASQUERADE
nf_conntrack_netlink 36864 0
nfnetlink 16384 2 nf_conntrack_netlink
xfrm_user 32768 1
xfrm_algo 16384 1 xfrm_user
iptable_nat 16384 1
nf_conntrack_ipv4 16384 3
nf_defrag_ipv4 16384 1 nf_conntrack_ipv4
nf_nat_ipv4 16384 1 iptable_nat
xt_addrtype 16384 2
iptable_filter 16384 1
xt_conntrack 16384 1
nf_nat 24576 2 nf_nat_masquerade_ipv4,nf_nat_ipv4
nf_conntrack 110592 7 nf_conntrack_ipv4,ipt_MASQUERADE,nf_conntrack_netlink,nf_nat_masquerade_ipv4,xt_conntrack,nf_nat_ipv4,nf_nat
libcrc32c 16384 2 nf_conntrack,nf_nat
crc32c_generic 16384 0
br_netfilter 24576 0
bridge 139264 1 br_netfilter
stp 16384 1 bridge
llc 16384 2 bridge,stp
joydev 20480 0
mousedev 20480 0
snd_hda_codec_hdmi 49152 1
snd_hda_codec_idt 49152 1
snd_hda_codec_generic 69632 1 snd_hda_codec_idt
bnep 20480 2
arc4 16384 2
hid_logitech_hidpp 32768 0
mei_wdt 16384 0
iTCO_wdt 16384 0
iTCO_vendor_support 16384 1 iTCO_wdt
ppdev 20480 0
intel_rapl 20480 0
x86_pkg_temp_thermal 16384 0
intel_powerclamp 16384 0
dell_laptop 20480 0
coretemp 16384 0
dell_smm_hwmon 16384 0
kvm_intel 192512 0
iwlmvm 299008 0
kvm 516096 1 kvm_intel
mac80211 688128 1 iwlmvm
irqbypass 16384 1 kvm
crct10dif_pclmul 16384 0
crc32_pclmul 16384 0
ghash_clmulni_intel 16384 0
pcbc 16384 0
snd_hda_intel 36864 8
snd_hda_codec 106496 4 snd_hda_intel,snd_hda_codec_idt,snd_hda_codec_hdmi,snd_hda_codec_generic
aesni_intel 184320 8
uvcvideo 86016 0
snd_hda_core 65536 5 snd_hda_intel,snd_hda_codec,snd_hda_codec_idt,snd_hda_codec_hdmi,snd_hda_codec_generic
aes_x86_64 20480 1 aesni_intel
videobuf2_vmalloc 16384 1 uvcvideo
crypto_simd 16384 1 aesni_intel
snd_hwdep 20480 1 snd_hda_codec
btusb 40960 0
glue_helper 16384 1 aesni_intel
videobuf2_memops 16384 1 videobuf2_vmalloc
e1000e 225280 0
nls_iso8859_1 16384 1
btrtl 16384 1 btusb
cryptd 20480 3 crypto_simd,ghash_clmulni_intel,aesni_intel
videobuf2_v4l2 20480 1 uvcvideo
iwlwifi 217088 1 iwlmvm
intel_cstate 16384 0
snd_pcm 86016 4 snd_hda_intel,snd_hda_codec,snd_hda_core,snd_hda_codec_hdmi
videobuf2_core 36864 2 uvcvideo,videobuf2_v4l2
btbcm 16384 1 btusb
nls_cp437 20480 1
btintel 16384 1 btusb
dell_wmi 16384 0
dell_smbios 16384 2 dell_wmi,dell_laptop
bluetooth 479232 68 btrtl,btintel,bnep,btbcm,rfcomm,btusb
vfat 20480 1
snd_timer 28672 1 snd_pcm
fat 65536 1 vfat
videodev 155648 3 uvcvideo,videobuf2_core,videobuf2_v4l2
intel_rapl_perf 16384 0
snd 73728 24 snd_hda_intel,snd_hwdep,snd_hda_codec,snd_hda_codec_idt,snd_timer,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_pcm
dcdbas 16384 1 dell_smbios
ecdh_generic 24576 1 bluetooth
mei_me 36864 1
media 32768 2 uvcvideo,videodev
psmouse 135168 0
input_leds 16384 0
sparse_keymap 16384 1 dell_wmi
wmi_bmof 16384 0
cfg80211 532480 3 iwlmvm,iwlwifi,mac80211
crc16 16384 1 bluetooth
hid_logitech_dj 20480 0
ptp 20480 1 e1000e
lpc_ich 24576 0
i2c_i801 24576 0
soundcore 16384 1 snd
mei 81920 3 mei_me,mei_wdt
pps_core 20480 1 ptp
shpchp 32768 0
parport_pc 28672 0
parport 40960 2 parport_pc,ppdev
dell_smo8800 16384 0
thermal 20480 0
dell_rbtn 16384 0
rfkill 20480 10 bluetooth,dell_laptop,dell_rbtn,cfg80211
battery 20480 0
ac 16384 0
evdev 24576 39
mac_hid 16384 0
squashfs 49152 1
loop 28672 2
sch_fq_codel 20480 6
vboxnetflt 28672 0
vboxnetadp 28672 0
pci_stub 16384 1
vboxpci 24576 0
vboxdrv 393216 3 vboxnetadp,vboxnetflt,vboxpci
nfsd 315392 13
auth_rpcgss 57344 1 nfsd
overlay 65536 0
oid_registry 16384 1 auth_rpcgss
nfs_acl 16384 1 nfsd
lockd 86016 1 nfsd
sg 36864 0
grace 16384 2 nfsd,lockd
crypto_user 16384 0
sunrpc 282624 19 auth_rpcgss,nfsd,nfs_acl,lockd
ip_tables 24576 2 iptable_filter,iptable_nat
x_tables 32768 5 ip_tables,iptable_filter,ipt_MASQUERADE,xt_addrtype,xt_conntrack
btrfs 1036288 1
xor 24576 1 btrfs
raid6_pq 114688 1 btrfs
sr_mod 24576 0
cdrom 53248 1 sr_mod
sd_mod 49152 3
usbhid 45056 0
hid 114688 4 usbhid,hid_logitech_dj,hid_logitech_hidpp
serio_raw 16384 0
atkbd 24576 0
libps2 16384 2 atkbd,psmouse
ahci 36864 2
libahci 28672 1 ahci
firewire_ohci 40960 0
crc32c_intel 24576 2
libata 208896 2 ahci,libahci
xhci_pci 16384 0
sdhci_pci 28672 0
sdhci 40960 1 sdhci_pci
ehci_pci 16384 0
xhci_hcd 188416 1 xhci_pci
scsi_mod 155648 4 sd_mod,libata,sr_mod,sg
ehci_hcd 73728 1 ehci_pci
firewire_core 57344 1 firewire_ohci
mmc_core 122880 2 sdhci,sdhci_pci
crc_itu_t 16384 1 firewire_core
usbcore 208896 7 uvcvideo,usbhid,ehci_hcd,xhci_pci,btusb,xhci_hcd,ehci_pci
usb_common 16384 1 usbcore
i8042 24576 1 dell_laptop
serio 20480 6 serio_raw,atkbd,psmouse,i8042
nouveau 1564672 44
button 16384 1 nouveau
video 36864 3 dell_wmi,dell_laptop,nouveau
led_class 16384 5 iwlmvm,sdhci,input_leds,dell_laptop,nouveau
mxm_wmi 16384 1 nouveau
wmi 20480 4 dell_wmi,wmi_bmof,mxm_wmi,nouveau
i2c_algo_bit 16384 1 nouveau
drm_kms_helper 131072 1 nouveau
syscopyarea 16384 1 drm_kms_helper
sysfillrect 16384 1 drm_kms_helper
sysimgblt 16384 1 drm_kms_helper
fb_sys_fops 16384 1 drm_kms_helper
ttm 81920 1 nouveau
drm 303104 29 nouveau,ttm,drm_kms_helper
agpgart 36864 3 nouveau,ttm,drm
Same problem on Linux Neon (kernel 4.10.0-40-generic, xserver-xorg-video-nouveau-hwe-16.04 1:1.0.14-0ubuntu1~16.04.1).
On my Lenovo P50 this problem first appeared when I started using a 4.14 kernel. Older kernels are stable with regard to this problem. This is what I can get from the logs:
Dec 28 17:23:13 marc kernel: [ 5657.352825] nouveau 0000:01:00.0: fifo: channel 2: killed
Dec 28 17:23:13 marc kernel: [ 5657.352827] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
Dec 28 17:23:13 marc kernel: [ 5657.352830] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
Dec 28 17:23:13 marc kernel: [ 5657.352833] nouveau 0000:01:00.0: X[5708]: channel 2 killed!
Dec 28 17:23:13 local kernel: nouveau 0000:01:00.0: gr: TRAP ch 2 [00ffbd0000 X[5708]]
Dec 28 17:23:13 local kernel: nouveau 0000:01:00.0: gr: GPC0/TPC0/TEX: 80000000
Dec 28 17:23:13 local kernel: nouveau 0000:01:00.0: gr: GPC0/TPC1/TEX: 80000041
Dec 28 17:23:13 local kernel: nouveau 0000:01:00.0: fifo: read fault at 000ac40000 engine 00 [GR] client 07 [GPC0/T1_2] reason 02 [PTE] on channel 2 [00ffbd0000 X[5708]]
Dec 28 17:23:13 local kernel: nouveau 0000:01:00.0: fifo: channel 2: killed
Dec 28 17:23:13 local kernel: nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
Dec 28 17:23:13 local kernel: nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
Dec 28 17:23:13 local kernel: nouveau 0000:01:00.0: X[5708]: channel 2 killed!
Dec 28 17:24:03 marc kernel: [ 5706.891489] sysrq: SysRq : Keyboard mode set to system default
Dec 28 17:24:03 local kernel: sysrq: SysRq : Keyboard mode set to system default
Dec 28 17:24:04 marc exiting on signal 15
I found this in my Xorg.0.log, but I don't know whether it's related to the above:
[ 155.535] (EE) libinput bug: timer event13 debounce short: offset negative (-2229)
What | Removed | Added |
---|---|---|
Depends on | 103721 |
Hello there?!?! Just out of pure curiosity: is any developer interested in this, or are the users left alone watching their machines freeze for almost 10 months now? Is there anything I (or any other user) could do to provide more info? Maybe apply a patch, send a debug log, try this or that? It's really sad to see absolutely no progress, only more complaints, for that long...
I have also had this problem with a GTX 770 since I switched to KDE 5 about 5 months ago. It freezes for no apparent reason; nothing responds except the mouse cursor. I see the same problem on Kubuntu 17.10, Kubuntu 18.04, Debian 9 KDE and KDE Neon. With the nVidia driver it's even worse.
---
01:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 770] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. [MSI] GK104 [GeForce GTX 770]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 27
	Region 0: Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at e8000000 (64-bit, prefetchable) [size=128M]
	Region 3: Memory at f0000000 (64-bit, prefetchable) [size=32M]
	Region 5: I/O ports at e000 [size=128]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: nouveau
	Kernel modules: nvidiafb, nouveau

01:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] GK104 HDMI Audio Controller
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin B routed to IRQ 17
	Region 0: Memory at f7080000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
I had the same problem:
Nov 5 10:56:01 daniel-Inspiron-3543 kernel: [ 8431.056748] nouveau 0000:05:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
Nov 5 10:56:01 daniel-Inspiron-3543 kernel: [ 8431.056765] nouveau 0000:05:00.0: fifo: runlist 0: scheduled for recovery
Nov 5 10:56:01 daniel-Inspiron-3543 kernel: [ 8431.056781] nouveau 0000:05:00.0: fifo: channel 15: killed
Nov 5 10:56:01 daniel-Inspiron-3543 kernel: [ 8431.056791] nouveau 0000:05:00.0: fifo: engine 7: scheduled for recovery
Nov 5 10:56:01 daniel-Inspiron-3543 kernel: [ 8431.056798] nouveau 0000:05:00.0: fifo: engine 0: scheduled for recovery
Nov 5 10:56:01 daniel-Inspiron-3543 kernel: [ 8431.057508] nouveau 0000:05:00.0: gnome-shell[4200]: channel 15 killed!

lsb_release -d
Description: Ubuntu 18.04.1 LTS

gnome-shell --version
GNOME Shell 3.28.3

sudo lshw -c video
  *-display
       description: VGA compatible controller
       product: GK107 [NVS 510]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:05:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nouveau latency=0
       resources: irq:47 memory:de000000-deffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:c000(size=128) memory:c0000-dffff

hwinfo --gfxcard
56: PCI 500.0: 0300 VGA compatible controller (VGA)
  [Created at pci.378]
  Unique ID: Ddhb.WEoD030yuUF
  Parent ID: _Znp.jlvukYYcj4B
  SysFS ID: /devices/pci0000:00/0000:00:02.0/0000:05:00.0
  SysFS BusID: 0000:05:00.0
  Hardware Class: graphics card
  Model: "nVidia GK107 [NVS 510]"
  Vendor: pci 0x10de "nVidia Corporation"
  Device: pci 0x0ffd "GK107 [NVS 510]"
  SubVendor: pci 0x10de "nVidia Corporation"
  SubDevice: pci 0x0967
  Revision: 0xa1
  Driver: "nouveau"
  Driver Modules: "nouveau"
  Memory Range: 0xde000000-0xdeffffff (rw,non-prefetchable)
  Memory Range: 0xc0000000-0xcfffffff (ro,non-prefetchable)
  Memory Range: 0xd0000000-0xd1ffffff (ro,non-prefetchable)
  I/O Ports: 0xc000-0xc07f (rw)
  Memory Range: 0x000c0000-0x000dffff (rw,non-prefetchable,disabled)
  IRQ: 47 (296925 events)
  I/O Port: 0x00 (rw)
  Module Alias: "pci:v000010DEd00000FFDsv000010DEsd00000967bc03sc00i00"
  Driver Info #0:
    Driver Status: nvidiafb is not active
    Driver Activation Cmd: "modprobe nvidiafb"
  Driver Info #1:
    Driver Status: nouveau is active
    Driver Activation Cmd: "modprobe nouveau"
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #73 (PCI bridge)

Primary display adapter: #56
Same issue with a Lenovo P50 on 4.20-rc7.
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GLM [Quadro M1000M] [10de:13b1] (rev a2) (prog-if 00 [VGA controller])
[162840.653595] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[162840.653610] nouveau 0000:01:00.0: fifo: runlist 0: scheduled for recovery
[162840.653621] nouveau 0000:01:00.0: fifo: channel 4: killed
[162840.653631] nouveau 0000:01:00.0: fifo: engine 0: scheduled for recovery
[162840.654013] nouveau 0000:01:00.0: systemd-logind[1383]: channel 4 killed!
The same problem on Ubuntu 18.10, kernel 4.18.0-13. I've got 4x GPU: GTX 1080 Ti (3-Way SLI Connector), NVIDIA GeForce GTX 1080 Ti graphics card with 3584 cores.

$ uname -a
Linux Ubuntu-PC 4.18.0-13-generic #14-Ubuntu SMP Wed Dec 5 09:04:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Errors in the kern.log file:
nouveau 0000:65:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
nouveau 0000:65:00.0: fifo: runlist 0: scheduled for recovery
nouveau 0000:65:00.0: fifo: channel 2: killed
nouveau 0000:65:00.0: fifo: engine 0: scheduled for recovery
nouveau 0000:65:00.0: Xorg[5447]: channel 2 killed!
nouveau 0000:65:00.0: systemd-logind[3394]: nv50cal_space: -16
nouveau 0000:65:00.0: systemd-logind[3394]: nv50cal_space: -16
(the same message repeated 800 times)

The system froze (no mouse or keyboard response); however, the kernel still reacted to a few Magic SysRq keys, so here are some stack traces:

INFO: task kworker/u72:8:492 blocked for more than 120 seconds.
Tainted: G O 4.18.0-13-generic #14-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u72:8 D 0 492 2 0x80000000
Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]
Call Trace at 20:25:50:
 __schedule+0x29e/0x840
 schedule+0x2c/0x80
 schedule_timeout+0x258/0x360
 ? nv50_wndw_atomic_destroy_state+0x1d/0x20 [nouveau]
 dma_fence_default_wait+0x1fc/0x260
 ? dma_fence_release+0xa0/0xa0
 dma_fence_wait_timeout+0x3e/0xf0
 drm_atomic_helper_wait_for_fences+0x3f/0xc0 [drm_kms_helper]
 nv50_disp_atomic_commit_tail+0x78/0x860 [nouveau]
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
 process_one_work+0x20f/0x3c0
 worker_thread+0x34/0x400
 kthread+0x120/0x140
 ? pwq_unbound_release_workfn+0xd0/0xd0
 ? kthread_bind+0x40/0x40
 ret_from_fork+0x35/0x40

The same call trace at 20:29:51 (a few minutes later, while Xorg was frozen):
Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]
Call Trace:
 __schedule+0x29e/0x840
 ? apic_timer_interrupt+0xa/0x20
 ? __drm_crtc_commit_free+0x12/0x20 [drm]
 schedule+0x2c/0x80
 schedule_timeout+0x258/0x360
 ? nv50_wndw_atomic_destroy_state+0x1d/0x20 [nouveau]
 dma_fence_default_wait+0x1fc/0x260
 ? dma_fence_release+0xa0/0xa0
 dma_fence_wait_timeout+0x3e/0xf0
 drm_atomic_helper_wait_for_fences+0x3f/0xc0 [drm_kms_helper]
 nv50_disp_atomic_commit_tail+0x78/0x860 [nouveau]
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
 process_one_work+0x20f/0x3c0
 worker_thread+0x34/0x400
 kthread+0x120/0x140
 ? pwq_unbound_release_workfn+0xd0/0xd0
 ? kthread_bind+0x40/0x40
 ret_from_fork+0x35/0x40

Another one:
INFO: task Xorg:5447 blocked for more than 120 seconds.
Tainted: G O 4.18.0-13-generic #14-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Xorg D 0 5447 5445 0x00000004
Call Trace:
 __schedule+0x29e/0x840
 schedule+0x2c/0x80
 schedule_preempt_disabled+0xe/0x10
 __ww_mutex_lock.isra.6+0x3c1/0x660
 __ww_mutex_lock_slowpath+0x16/0x20
 ww_mutex_lock+0x34/0x50
 drm_modeset_lock+0x6e/0xb0 [drm]
 drm_crtc_get_sequence_ioctl+0xbc/0x190 [drm]
 ? drm_wait_vblank_ioctl+0x610/0x610 [drm]
 drm_ioctl_kernel+0xa4/0xf0 [drm]
 drm_ioctl+0x227/0x400 [drm]
 ? drm_wait_vblank_ioctl+0x610/0x610 [drm]
 ? do_iter_write+0xe1/0x1a0
 ? do_iter_write+0xe1/0x1a0
 nouveau_drm_ioctl+0x73/0xc0 [nouveau]
 do_vfs_ioctl+0xa8/0x620
 ? __sys_recvmsg+0x88/0xa0
 ksys_ioctl+0x67/0x90
 __x64_sys_ioctl+0x1a/0x20
 do_syscall_64+0x5a/0x110
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f3f654b93c7
Code: Bad RIP value.
RSP: 002b:00007ffd57bbf168 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffd57bbf200 RCX: 00007f3f654b93c7
RDX: 00007ffd57bbf1a0 RSI: 00000000c018643b RDI: 000000000000000e
RBP: 00007ffd57bbf1a0 R08: 0000000000000000 R09: 00005646eb8ff7c0
R10: 00005646eb54ad30 R11: 0000000000000246 R12: 00000000c018643b
R13: 000000000000000e R14: 00005646eb54b800 R15: 00005646eb466880

Full log: https://gist.github.com/kenorb/5b95caa1694dbf7f030ccc808a110856
What | Removed | Added |
---|---|---|
CC | stefan.kelemen@gmx.de |
*** Bug 98138 has been marked as a duplicate of this bug. ***
What | Removed | Added |
---|---|---|
Priority | medium | high |
Severity | normal | critical |
Based on the provided call stacks, related commit for nv50_wndw_atomic_destroy_state: https://lore.kernel.org/patchwork/patch/781346/
What | Removed | Added |
---|---|---|
CC | flat@imo.uto.moe |
*** Bug 96562 has been marked as a duplicate of this bug. ***
What | Removed | Added |
---|---|---|
See Also | https://bugs.freedesktop.org/show_bug.cgi?id=93629 |
What | Removed | Added |
---|---|---|
See Also | https://bugs.freedesktop.org/show_bug.cgi?id=99900 |