linux-kernel.vger.kernel.org archive mirror
* Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]
       [not found] <20190830032948.13516-1-hdanton@sina.com>
@ 2019-09-03  6:48 ` Mikhail Gavrilov
       [not found]   ` <5d6e2298.1c69fb81.b5532.8395SMTPIN_ADDED_MISSING@mx.google.com>
  0 siblings, 1 reply; 12+ messages in thread
From: Mikhail Gavrilov @ 2019-09-03  6:48 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Daniel Vetter, dri-devel, amd-gfx list, Linux kernel

On Fri, 30 Aug 2019 at 08:30, Hillf Danton <hdanton@sina.com> wrote:
>
> Add a warning to show if it makes sense in field: neither regression nor
> problem will have been observed with the warning printed.
>

I caught the problem.

[21793.094289] ------------[ cut here ]------------
[21793.094296] gnome shell stuck warning
[21793.094391] WARNING: CPU: 14 PID: 1768 at
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:332
amdgpu_fence_wait_empty+0x1c2/0x230 [amdgpu]
[21793.094394] Modules linked in: rfcomm fuse xt_CHECKSUM
xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge stp llc
nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT
nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack
ebtable_nat ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw
iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c
ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter cmac bnep sunrpc vfat fat edac_mce_amd kvm_amd
snd_hda_codec_realtek rtwpci rtw88 snd_hda_codec_generic snd_usb_audio
kvm ledtrig_audio snd_hda_codec_hdmi snd_hda_intel mac80211
snd_hda_codec snd_usbmidi_lib irqbypass uvcvideo snd_rawmidi
snd_hda_core videobuf2_vmalloc videobuf2_memops crct10dif_pclmul btusb
videobuf2_v4l2 snd_hwdep crc32_pclmul btrtl videobuf2_common snd_seq
eeepc_wmi btbcm xpad asus_wmi btintel snd_seq_device
ghash_clmulni_intel cfg80211 sparse_keymap
[21793.094426]  ff_memless joydev bluetooth videodev video snd_pcm
wmi_bmof mc ecdh_generic snd_timer ecc snd ccp rfkill libarc4
soundcore sp5100_tco k10temp i2c_piix4 gpio_amdpt gpio_generic
acpi_cpufreq binfmt_misc ip_tables hid_logitech_hidpp hid_logitech_dj
amdgpu amd_iommu_v2 gpu_sched ttm drm_kms_helper igb drm nvme dca
crc32c_intel i2c_algo_bit nvme_core wmi pinctrl_amd
[21793.094449] CPU: 14 PID: 1768 Comm: Xorg Tainted: G        W
 5.3.0-0.rc6.git2.1b.fc32.x86_64 #1
[21793.094452] Hardware name: System manufacturer System Product
Name/ROG STRIX X470-I GAMING, BIOS 2406 06/21/2019
[21793.094499] RIP: 0010:amdgpu_fence_wait_empty+0x1c2/0x230 [amdgpu]
[21793.094502] Code: b5 f4 e9 c1 fe ff ff 31 c0 c3 48 89 ef e8 36 69
f8 f4 84 c0 74 08 48 89 ef e8 8a e9 15 f5 48 c7 c7 2c d6 91 c0 e8 86
f8 ad f4 <0f> 0b b8 ea ff ff ff 5d c3 e8 f0 97 b7 f4 84 c0 0f 85 73 ff
ff ff
[21793.094505] RSP: 0018:ffffae13418c3798 EFLAGS: 00010282
[21793.094508] RAX: 0000000000000000 RBX: ffff8aa065f85a80 RCX: 0000000000000006
[21793.094511] RDX: 0000000000000007 RSI: ffff8a9fe32ec070 RDI: ffff8aa07bdd9e00
[21793.094513] RBP: ffff8aa069469d00 R08: 000013d219a4ead6 R09: 0000000000000000
[21793.094516] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8aa065f80000
[21793.094518] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8aa065fb0000
[21793.094521] FS:  00007f586201cf00(0000) GS:ffff8aa07bc00000(0000)
knlGS:0000000000000000
[21793.094524] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21793.094526] CR2: 00007f57fc5b5000 CR3: 0000000763340000 CR4: 00000000003406e0
[21793.094528] Call Trace:
[21793.094580]  amdgpu_pm_compute_clocks+0x70/0x5f0 [amdgpu]
[21793.094655]  dm_pp_apply_display_requirements+0x1a8/0x1c0 [amdgpu]
[21793.094728]  dce12_update_clocks+0xd8/0x110 [amdgpu]
[21793.094799]  dc_commit_state+0x414/0x590 [amdgpu]
[21793.094807]  ? find_held_lock+0x32/0x90
[21793.094880]  amdgpu_dm_atomic_commit_tail+0xd18/0x1cf0 [amdgpu]
[21793.094888]  ? reacquire_held_locks+0xed/0x210
[21793.094898]  ? ttm_eu_backoff_reservation+0xa5/0x160 [ttm]
[21793.094903]  ? find_held_lock+0x32/0x90
[21793.094906]  ? find_held_lock+0x32/0x90
[21793.094912]  ? __lock_acquire+0x247/0x1910
[21793.094920]  ? find_held_lock+0x32/0x90
[21793.094925]  ? mark_held_locks+0x50/0x80
[21793.094929]  ? _raw_spin_unlock_irq+0x29/0x40
[21793.094933]  ? lockdep_hardirqs_on+0xf0/0x180
[21793.094937]  ? _raw_spin_unlock_irq+0x29/0x40
[21793.094941]  ? wait_for_completion_timeout+0x75/0x190
[21793.094954]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
[21793.094962]  commit_tail+0x3c/0x70 [drm_kms_helper]
[21793.094971]  drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
[21793.094986]  drm_mode_atomic_ioctl+0x793/0x9b0 [drm]
[21793.094994]  ? __lock_acquire+0x247/0x1910
[21793.095013]  ? drm_atomic_set_property+0xa50/0xa50 [drm]
[21793.095025]  drm_ioctl_kernel+0xaa/0xf0 [drm]
[21793.095039]  drm_ioctl+0x208/0x390 [drm]
[21793.095053]  ? drm_atomic_set_property+0xa50/0xa50 [drm]
[21793.095060]  ? lockdep_hardirqs_on+0xf0/0x180
[21793.095108]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[21793.095114]  do_vfs_ioctl+0x411/0x750
[21793.095121]  ksys_ioctl+0x5e/0x90
[21793.095126]  __x64_sys_ioctl+0x16/0x20
[21793.095130]  do_syscall_64+0x5c/0xb0
[21793.095135]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[21793.095138] RIP: 0033:0x7f586249300b
[21793.095142] Code: 0f 1e fa 48 8b 05 7d 9e 0c 00 64 c7 00 26 00 00
00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4d 9e 0c 00 f7 d8 64 89
01 48
[21793.095144] RSP: 002b:00007ffcfa7f6c08 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[21793.095147] RAX: ffffffffffffffda RBX: 00007ffcfa7f6c50 RCX: 00007f586249300b
[21793.095149] RDX: 00007ffcfa7f6c50 RSI: 00000000c03864bc RDI: 000000000000000e
[21793.095152] RBP: 00000000c03864bc R08: 00005647f11aab00 R09: 0000000000000001
[21793.095154] R10: 0000000000000001 R11: 0000000000000246 R12: 00005647f05870e0
[21793.095156] R13: 000000000000000e R14: 00005647f08df6a0 R15: 00005647f09d7940
[21793.095166] irq event stamp: 822588868
[21793.095170] hardirqs last  enabled at (822588867):
[<ffffffffb5170beb>] console_unlock+0x46b/0x5d0
[21793.095173] hardirqs last disabled at (822588868):
[<ffffffffb50038da>] trace_hardirqs_off_thunk+0x1a/0x20
[21793.095177] softirqs last  enabled at (822588724):
[<ffffffffb5e0035d>] __do_softirq+0x35d/0x45d
[21793.095181] softirqs last disabled at (822588717):
[<ffffffffb50f1e57>] irq_exit+0xf7/0x100
[21793.095183] ---[ end trace 67af5f8c4c325f95 ]---


Full dmesg log here https://pastebin.com/2Nab7AC3
Any idea how to fix it completely?

--
Best Regards,
Mike Gavrilov.


* Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]
       [not found]   ` <5d6e2298.1c69fb81.b5532.8395SMTPIN_ADDED_MISSING@mx.google.com>
@ 2019-09-03 18:07     ` Mikhail Gavrilov
  2019-09-03 20:18       ` Daniel Vetter
  0 siblings, 1 reply; 12+ messages in thread
From: Mikhail Gavrilov @ 2019-09-03 18:07 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Daniel Vetter, dri-devel, amd-gfx list, Linux kernel

On Tue, 3 Sep 2019 at 13:21, Hillf Danton <hdanton@sina.com> wrote:
>
> Please describe the problems you are experiencing.
> For example, is the screen locked up? Is the machine locked up?
> Is anything abnormal after you see the warning?
>

According to my observations, every "gnome shell stuck warning" happened
when I was away from the computer and the computer was locked.

I did not notice any problems in the morning (I did not even look at
the kernel logs). I only found out that the problem had happened when I
connected to my computer remotely via ssh from work and happened to
look at the dmesg output.

In the evening after work I even played "Division" and still did not
notice any problems.

It is now 11:01 pm and "gnome shell stuck warning" has not appeared
since 19:17. So it looks like the issue happens only when the computer
is locked and the monitor is in power save mode.


$ dmesg -T | grep gnome

---> I went to sleep
[Tue Sep  3 01:00:10 2019] gnome shell stuck warning
[Tue Sep  3 01:00:55 2019] gnome shell stuck warning
[Tue Sep  3 06:54:50 2019] gnome shell stuck warning
<--- I woke up at 8:00 am and sat down at the computer again
---> I left for work at 9:30
[Tue Sep  3 10:00:05 2019] gnome shell stuck warning
[Tue Sep  3 10:10:01 2019] gnome shell stuck warning
[Tue Sep  3 10:13:43 2019] gnome shell stuck warning
[Tue Sep  3 10:23:37 2019] gnome shell stuck warning
[Tue Sep  3 10:42:07 2019] gnome shell stuck warning
[Tue Sep  3 10:42:57 2019] gnome shell stuck warning
[Tue Sep  3 10:59:25 2019] gnome shell stuck warning
[Tue Sep  3 11:08:35 2019] gnome shell stuck warning
[Tue Sep  3 11:13:19 2019] gnome shell stuck warning
[Tue Sep  3 11:15:20 2019] gnome shell stuck warning
[Tue Sep  3 11:26:20 2019] gnome shell stuck warning
[Tue Sep  3 11:26:20 2019] gnome shell stuck warning
[Tue Sep  3 11:36:30 2019] gnome shell stuck warning
[Tue Sep  3 11:46:08 2019] gnome shell stuck warning
[Tue Sep  3 11:53:52 2019] gnome shell stuck warning
[Tue Sep  3 11:56:36 2019] gnome shell stuck warning
[Tue Sep  3 12:17:10 2019] gnome shell stuck warning
[Tue Sep  3 12:20:20 2019] gnome shell stuck warning
[Tue Sep  3 12:20:20 2019] gnome shell stuck warning
[Tue Sep  3 12:30:46 2019] gnome shell stuck warning
[Tue Sep  3 12:40:52 2019] gnome shell stuck warning
[Tue Sep  3 12:55:30 2019] gnome shell stuck warning
[Tue Sep  3 12:57:52 2019] gnome shell stuck warning
[Tue Sep  3 13:04:00 2019] gnome shell stuck warning
[Tue Sep  3 13:12:38 2019] gnome shell stuck warning
[Tue Sep  3 13:14:32 2019] gnome shell stuck warning
[Tue Sep  3 13:53:12 2019] gnome shell stuck warning
[Tue Sep  3 14:12:52 2019] gnome shell stuck warning
[Tue Sep  3 14:15:54 2019] gnome shell stuck warning
[Tue Sep  3 14:17:04 2019] gnome shell stuck warning
[Tue Sep  3 14:21:57 2019] gnome shell stuck warning
[Tue Sep  3 14:22:10 2019] gnome shell stuck warning
[Tue Sep  3 14:37:42 2019] gnome shell stuck warning
[Tue Sep  3 14:41:51 2019] gnome shell stuck warning
[Tue Sep  3 14:42:52 2019] gnome shell stuck warning
[Tue Sep  3 14:46:35 2019] gnome shell stuck warning
[Tue Sep  3 15:03:18 2019] gnome shell stuck warning
[Tue Sep  3 15:16:50 2019] gnome shell stuck warning
[Tue Sep  3 15:27:30 2019] gnome shell stuck warning
[Tue Sep  3 15:27:41 2019] gnome shell stuck warning
[Tue Sep  3 16:08:06 2019] gnome shell stuck warning
[Tue Sep  3 16:24:16 2019] gnome shell stuck warning
[Tue Sep  3 16:33:04 2019] gnome shell stuck warning
[Tue Sep  3 16:52:10 2019] gnome shell stuck warning
[Tue Sep  3 17:18:27 2019] gnome shell stuck warning
[Tue Sep  3 17:25:30 2019] gnome shell stuck warning
[Tue Sep  3 17:41:16 2019] gnome shell stuck warning
[Tue Sep  3 17:43:32 2019] gnome shell stuck warning
[Tue Sep  3 17:51:10 2019] gnome shell stuck warning
[Tue Sep  3 18:41:44 2019] gnome shell stuck warning
[Tue Sep  3 18:44:18 2019] gnome shell stuck warning
[Tue Sep  3 19:03:07 2019] gnome shell stuck warning
[Tue Sep  3 19:17:58 2019] gnome shell stuck warning
<--- I returned home and sat down at the computer again
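
A listing like the one above can be tallied per hour of day to check
whether the warnings really cluster in the periods away from the
computer. The following is only a minimal sketch, assuming input lines
in the "dmesg -T" layout shown above; stuck_warning_hour() and
tally_by_hour() are hypothetical helper names, not part of any existing
tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the hour of day from a line such as
 *   "[Tue Sep  3 01:00:10 2019] gnome shell stuck warning"
 * Returns 0..23, or -1 if the line does not have that layout or does
 * not contain the warning text.
 */
static int stuck_warning_hour(const char *line)
{
	char wday[4], mon[4];
	int mday, hour, min, sec, year;

	assert(line);
	if (sscanf(line, "[%3s %3s %d %d:%d:%d %d]",
		   wday, mon, &mday, &hour, &min, &sec, &year) != 7)
		return -1;
	if (hour < 0 || hour > 23)
		return -1;
	if (!strstr(line, "gnome shell stuck warning"))
		return -1;
	return hour;
}

/*
 * Count warnings per hour of day. hist must have room for 24 counters.
 * Returns the number of lines that were recognised as warnings.
 */
static int tally_by_hour(const char **lines, int n, int *hist)
{
	int i, matched = 0;

	memset(hist, 0, 24 * sizeof(hist[0]));
	for (i = 0; i < n; i++) {
		int h = stuck_warning_hour(lines[i]);

		if (h >= 0) {
			hist[h]++;
			matched++;
		}
	}
	return matched;
}
```

Annotation lines such as the "--->" and "<---" markers do not start
with a bracketed timestamp, so the sketch skips them.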

--
Best Regards,
Mike Gavrilov.


* Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]
  2019-09-03 18:07     ` Mikhail Gavrilov
@ 2019-09-03 20:18       ` Daniel Vetter
       [not found]         ` <5d6f10a6.1c69fb81.6b104.af73SMTPIN_ADDED_MISSING@mx.google.com>
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2019-09-03 20:18 UTC (permalink / raw)
  To: Mikhail Gavrilov; +Cc: Hillf Danton, dri-devel, amd-gfx list, Linux kernel

On Tue, Sep 3, 2019 at 8:07 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> On Tue, 3 Sep 2019 at 13:21, Hillf Danton <hdanton@sina.com> wrote:
> >
> > Please describe the problems you are experiencing.
> > For example, is the screen locked up? Is the machine locked up?
> > Is anything abnormal after you see the warning?
> >
>
> According to my observations, every "gnome shell stuck warning" happened
> when I was away from the computer and the computer was locked.
>
> I did not notice any problems in the morning (I did not even look at
> the kernel logs). I only found out that the problem had happened when I
> connected to my computer remotely via ssh from work and happened to
> look at the dmesg output.
>
> In the evening after work I even played "Division" and still did not
> notice any problems.
>
> It is now 11:01 pm and "gnome shell stuck warning" has not appeared
> since 19:17. So it looks like the issue happens only when the computer
> is locked and the monitor is in power save mode.

I'd bet on runtime pm or some other power saving feature in amdgpu
shutting the interrupt handling down before we've handled all the
interrupts. That would then result in a stuck fence.

Do we already know which fence is stuck? All the debugging on the
dma_fence_wait side is just looking at the messenger; that isn't the
source of the problem.
-Daniel




-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]
       [not found]         ` <5d6f10a6.1c69fb81.6b104.af73SMTPIN_ADDED_MISSING@mx.google.com>
@ 2019-09-04  8:37           ` Daniel Vetter
  2019-09-04 22:26             ` Mikhail Gavrilov
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2019-09-04  8:37 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Daniel Vetter, Mikhail Gavrilov, dri-devel, amd-gfx list,
	Linux kernel, Alex Deucher, Harry Wentland

On Wed, Sep 04, 2019 at 09:17:16AM +0800, Hillf Danton wrote:
> Daniel Vetter <daniel@ffwll.ch>
> >>
> >> It is now 11:01 pm and "gnome shell stuck warning" has not appeared
> >> since 19:17. So it looks like the issue happens only when the
> >> computer is locked and the monitor is in power save mode.
> >
> > I'd bet on runtime pm or some other power saving feature in amdgpu
> > shutting the interrupt handling down before we've handled all the
> > interrupts. That would then result in a stuck fence.
> >
> > Do we already know which fence is stuck?
> 
> You are welcome to shed some light on how to collect/print that info,
> say lines xxx-yyy in path/to/amdgpu/zzz.c

Extend your backtrace warning slightly, like

	WARN(r, "we're stuck on fence %pS\n", fence->ops);
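
For readers unfamiliar with the idiom: WARN(cond, fmt, ...) prints the
message and a backtrace when cond is non-zero, and %pS formats a
pointer as symbol+offset, so printing fence->ops names the ops table
of whoever created the fence. The sketch below is only a userspace mock
of that idea, not amdgpu code. mock_fence, mock_fence_ops, MOCK_WARN,
mock_wait and mock_wait_empty are made-up names, and a plain string
stands in for the symbol lookup that %pS does in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; these are not kernel types. */
struct mock_fence_ops {
	const char *name;	/* plays the role of the symbol %pS would print */
};

struct mock_fence {
	const struct mock_fence_ops *ops;
	bool signaled;
};

/* Rough imitation of WARN(cond, fmt, ...): print when cond is true, yield cond. */
#define MOCK_WARN(cond, ...) \
	((cond) ? (fprintf(stderr, __VA_ARGS__), 1) : 0)

/* Pretend wait: 0 if the fence has signaled, -1 if it is still pending. */
static int mock_wait(const struct mock_fence *fence)
{
	return fence->signaled ? 0 : -1;
}

/* Wait on one fence and, on failure, report which ops table it belongs to. */
static int mock_wait_empty(const struct mock_fence *fence)
{
	int r;

	assert(fence && fence->ops);
	r = mock_wait(fence);
	(void)MOCK_WARN(r, "we're stuck on fence %s\n", fence->ops->name);
	return r;
}
```

Unlike the kernel macro, the mock prints no backtrace; it only shows
how the failing wait's return value gates the message.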

Also adding Harry and Alex, I'm not really working on amdgpu ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]
  2019-09-04  8:37           ` Daniel Vetter
@ 2019-09-04 22:26             ` Mikhail Gavrilov
  2019-09-05  7:58               ` Daniel Vetter
  0 siblings, 1 reply; 12+ messages in thread
From: Mikhail Gavrilov @ 2019-09-04 22:26 UTC (permalink / raw)
  To: Hillf Danton, Mikhail Gavrilov, dri-devel, amd-gfx list,
	Linux kernel, Alex Deucher, Harry Wentland
  Cc: Daniel Vetter

On Wed, 4 Sep 2019 at 13:37, Daniel Vetter <daniel@ffwll.ch> wrote:
>
> Extend your backtrace warning slightly, like
>
>         WARN(r, "we're stuck on fence %pS\n", fence->ops);
>
> Also adding Harry and Alex, I'm not really working on amdgpu ...

[ 3511.998320] ------------[ cut here ]------------
[ 3511.998714] we're stuck on fence
amdgpu_fence_ops+0x0/0xffffffffffffc220 [amdgpu]
[ 3511.998991] WARNING: CPU: 10 PID: 1811 at
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:332
amdgpu_fence_wait_empty+0x1c6/0x240 [amdgpu]
[ 3511.999009] Modules linked in: rfcomm fuse xt_CHECKSUM
xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge stp llc
nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT
nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack
ebtable_nat ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw
iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c
ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter cmac bnep sunrpc vfat fat edac_mce_amd kvm_amd
snd_hda_codec_realtek rtwpci snd_hda_codec_generic kvm ledtrig_audio
snd_hda_codec_hdmi uvcvideo rtw88 videobuf2_vmalloc snd_hda_intel
videobuf2_memops videobuf2_v4l2 irqbypass snd_usb_audio snd_hda_codec
videobuf2_common crct10dif_pclmul snd_usbmidi_lib crc32_pclmul
mac80211 snd_rawmidi videodev snd_hda_core ghash_clmulni_intel btusb
snd_hwdep btrtl snd_seq btbcm btintel snd_seq_device eeepc_wmi
bluetooth xpad joydev mc snd_pcm
[ 3511.999076]  asus_wmi ff_memless cfg80211 sparse_keymap video
wmi_bmof ecdh_generic snd_timer ecc sp5100_tco k10temp snd i2c_piix4
ccp rfkill soundcore libarc4 gpio_amdpt gpio_generic acpi_cpufreq
binfmt_misc ip_tables hid_logitech_hidpp hid_logitech_dj amdgpu
amd_iommu_v2 gpu_sched ttm drm_kms_helper drm crc32c_intel igb dca
nvme i2c_algo_bit nvme_core wmi pinctrl_amd
[ 3511.999126] CPU: 10 PID: 1811 Comm: Xorg Not tainted
5.3.0-0.rc6.git2.1c.fc32.x86_64 #1
[ 3511.999131] Hardware name: System manufacturer System Product
Name/ROG STRIX X470-I GAMING, BIOS 2703 08/20/2019
[ 3511.999253] RIP: 0010:amdgpu_fence_wait_empty+0x1c6/0x240 [amdgpu]
[ 3511.999278] Code: fe ff ff 31 c0 c3 48 89 ef e8 36 29 04 cb 84 c0
74 08 48 89 ef e8 8a a9 21 cb 48 8b 75 08 48 c7 c7 2c 16 86 c0 e8 82
b8 b9 ca <0f> 0b b8 ea ff ff ff 5d c3 e8 ec 57 c3 ca 84 c0 0f 85 6f ff
ff ff
[ 3511.999282] RSP: 0018:ffffb9c04170f798 EFLAGS: 00210282
[ 3511.999288] RAX: 0000000000000000 RBX: ffff8d2ce5205a80 RCX: 0000000000000006
[ 3511.999292] RDX: 0000000000000007 RSI: ffff8d2c5bea4070 RDI: ffff8d2cfb5d9e00
[ 3511.999296] RBP: ffff8d28becae480 R08: 00000331b36fd503 R09: 0000000000000000
[ 3511.999299] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d2ce5200000
[ 3511.999303] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8d2ce1540000
[ 3511.999308] FS:  00007f59a5bc6f00(0000) GS:ffff8d2cfb400000(0000)
knlGS:0000000000000000
[ 3511.999311] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3511.999315] CR2: 00001108bc475960 CR3: 000000075bf32000 CR4: 00000000003406e0
[ 3511.999319] Call Trace:
[ 3511.999394]  amdgpu_pm_compute_clocks+0x70/0x5f0 [amdgpu]
[ 3511.999503]  dm_pp_apply_display_requirements+0x1a8/0x1c0 [amdgpu]
[ 3511.999609]  dce12_update_clocks+0xd8/0x110 [amdgpu]
[ 3511.999712]  dc_commit_state+0x414/0x590 [amdgpu]
[ 3511.999725]  ? find_held_lock+0x32/0x90
[ 3511.999832]  amdgpu_dm_atomic_commit_tail+0xd18/0x1cf0 [amdgpu]
[ 3511.999844]  ? reacquire_held_locks+0xed/0x210
[ 3511.999859]  ? ttm_eu_backoff_reservation+0xa5/0x160 [ttm]
[ 3511.999866]  ? find_held_lock+0x32/0x90
[ 3511.999872]  ? find_held_lock+0x32/0x90
[ 3511.999881]  ? __lock_acquire+0x247/0x1910
[ 3511.999893]  ? find_held_lock+0x32/0x90
[ 3511.999901]  ? mark_held_locks+0x50/0x80
[ 3511.999907]  ? _raw_spin_unlock_irq+0x29/0x40
[ 3511.999913]  ? lockdep_hardirqs_on+0xf0/0x180
[ 3511.999919]  ? _raw_spin_unlock_irq+0x29/0x40
[ 3511.999924]  ? wait_for_completion_timeout+0x75/0x190
[ 3511.999952]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
[ 3511.999966]  commit_tail+0x3c/0x70 [drm_kms_helper]
[ 3511.999979]  drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
[ 3512.000002]  drm_mode_atomic_ioctl+0x793/0x9b0 [drm]
[ 3512.000014]  ? __lock_acquire+0x247/0x1910
[ 3512.000044]  ? drm_atomic_set_property+0xa50/0xa50 [drm]
[ 3512.000066]  drm_ioctl_kernel+0xaa/0xf0 [drm]
[ 3512.000088]  drm_ioctl+0x208/0x390 [drm]
[ 3512.000108]  ? drm_atomic_set_property+0xa50/0xa50 [drm]
[ 3512.000120]  ? lockdep_hardirqs_on+0xf0/0x180
[ 3512.000205]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 3512.000216]  do_vfs_ioctl+0x411/0x750
[ 3512.000229]  ksys_ioctl+0x5e/0x90
[ 3512.000237]  __x64_sys_ioctl+0x16/0x20
[ 3512.000242]  do_syscall_64+0x5c/0xb0
[ 3512.000249]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3512.000254] RIP: 0033:0x7f59a603d00b
[ 3512.000259] Code: 0f 1e fa 48 8b 05 7d 9e 0c 00 64 c7 00 26 00 00
00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4d 9e 0c 00 f7 d8 64 89
01 48
[ 3512.000263] RSP: 002b:00007ffc493bcc08 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 3512.000267] RAX: ffffffffffffffda RBX: 00007ffc493bcc50 RCX: 00007f59a603d00b
[ 3512.000271] RDX: 00007ffc493bcc50 RSI: 00000000c03864bc RDI: 000000000000000e
[ 3512.000275] RBP: 00000000c03864bc R08: 000055aa62e41d00 R09: 0000000000000001
[ 3512.000278] R10: 0000000000000001 R11: 0000000000000246 R12: 000055aa61a99d00
[ 3512.000282] R13: 000000000000000e R14: 000055aa628f7430 R15: 000055aa62e34540
[ 3512.000297] irq event stamp: 258283232
[ 3512.000303] hardirqs last  enabled at (258283231):
[<ffffffff8b170beb>] console_unlock+0x46b/0x5d0
[ 3512.000309] hardirqs last disabled at (258283232):
[<ffffffff8b0038da>] trace_hardirqs_off_thunk+0x1a/0x20
[ 3512.000314] softirqs last  enabled at (258282448):
[<ffffffff8be0035d>] __do_softirq+0x35d/0x45d
[ 3512.000319] softirqs last disabled at (258282413):
[<ffffffff8b0f1e57>] irq_exit+0xf7/0x100
[ 3512.000323] ---[ end trace 55ed0c80b95aef99 ]---

https://pastebin.com/DfqVGDgc


* Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]
  2019-09-04 22:26             ` Mikhail Gavrilov
@ 2019-09-05  7:58               ` Daniel Vetter
  2019-09-08 21:24                 ` Mikhail Gavrilov
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2019-09-05  7:58 UTC (permalink / raw)
  To: Mikhail Gavrilov, Christian König
  Cc: Hillf Danton, dri-devel, amd-gfx list, Linux kernel,
	Alex Deucher, Harry Wentland

On Thu, Sep 5, 2019 at 12:27 AM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> On Wed, 4 Sep 2019 at 13:37, Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > Extend your backtrace warning slightly, like
> >
> >         WARN(r, "we're stuck on fence %pS\n", fence->ops);
> >
> > Also adding Harry and Alex, I'm not really working on amdgpu ...
>
> [ 3511.998320] ------------[ cut here ]------------
> [ 3511.998714] we're stuck on fence
> amdgpu_fence_ops+0x0/0xffffffffffffc220 [amdgpu]

I think those fences are only emitted for CS (command submission),
not for anything display related. Adding Christian König.
-Daniel




-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]
  2019-09-05  7:58               ` Daniel Vetter
@ 2019-09-08 21:24                 ` Mikhail Gavrilov
  2019-09-09  9:15                   ` Koenig, Christian
  0 siblings, 1 reply; 12+ messages in thread
From: Mikhail Gavrilov @ 2019-09-08 21:24 UTC (permalink / raw)
  To: Christian König
  Cc: Hillf Danton, dri-devel, amd-gfx list, Linux kernel,
	Alex Deucher, Harry Wentland, Daniel Vetter

On Thu, 5 Sep 2019 at 12:58, Daniel Vetter <daniel@ffwll.ch> wrote:
>
> I think those fences are only emitted for CS, not display related.
> Adding Christian König.

Here is a fresher kernel log, taken with 5.3 RC7; the issue still happens.
https://pastebin.com/tyxkWJYV


--
Best Regards,
Mike Gavrilov.

On Thu, 5 Sep 2019 at 12:58, Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Thu, Sep 5, 2019 at 12:27 AM Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:
> >
> > On Wed, 4 Sep 2019 at 13:37, Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > Extend your backtrace warning slightly like
> > >
> > >         WARN(r, "we're stuck on fence %pS\n", fence->ops);
> > >
> > > Also adding Harry and Alex, I'm not really working on amdgpu ...
> >
> > [ 3511.998320] ------------[ cut here ]------------
> > [ 3511.998714] we're stuck on fence
> > amdgpu_fence_ops+0x0/0xffffffffffffc220 [amdgpu]$
>
> I think those fences are only emitted for CS, not display related.
> Adding Christian König.
> -Daniel
>
> > [ 3511.998991] WARNING: CPU: 10 PID: 1811 at
> > drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:332
> > amdgpu_fence_wait_empty+0x1c6/0x240 [amdgpu]
> > [ 3511.999009] Modules linked in: rfcomm fuse xt_CHECKSUM
> > xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge stp llc
> > nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT
> > nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack
> > ebtable_nat ip6table_nat ip6table_mangle ip6table_raw
> > ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw
> > iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c
> > ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables
> > iptable_filter cmac bnep sunrpc vfat fat edac_mce_amd kvm_amd
> > snd_hda_codec_realtek rtwpci snd_hda_codec_generic kvm ledtrig_audio
> > snd_hda_codec_hdmi uvcvideo rtw88 videobuf2_vmalloc snd_hda_intel
> > videobuf2_memops videobuf2_v4l2 irqbypass snd_usb_audio snd_hda_codec
> > videobuf2_common crct10dif_pclmul snd_usbmidi_lib crc32_pclmul
> > mac80211 snd_rawmidi videodev snd_hda_core ghash_clmulni_intel btusb
> > snd_hwdep btrtl snd_seq btbcm btintel snd_seq_device eeepc_wmi
> > bluetooth xpad joydev mc snd_pcm
> > [ 3511.999076]  asus_wmi ff_memless cfg80211 sparse_keymap video
> > wmi_bmof ecdh_generic snd_timer ecc sp5100_tco k10temp snd i2c_piix4
> > ccp rfkill soundcore libarc4 gpio_amdpt gpio_generic acpi_cpufreq
> > binfmt_misc ip_tables hid_logitech_hidpp hid_logitech_dj amdgpu
> > amd_iommu_v2 gpu_sched ttm drm_kms_helper drm crc32c_intel igb dca
> > nvme i2c_algo_bit nvme_core wmi pinctrl_amd
> > [ 3511.999126] CPU: 10 PID: 1811 Comm: Xorg Not tainted
> > 5.3.0-0.rc6.git2.1c.fc32.x86_64 #1
> > [ 3511.999131] Hardware name: System manufacturer System Product
> > Name/ROG STRIX X470-I GAMING, BIOS 2703 08/20/2019
> > [ 3511.999253] RIP: 0010:amdgpu_fence_wait_empty+0x1c6/0x240 [amdgpu]
> > [ 3511.999278] Code: fe ff ff 31 c0 c3 48 89 ef e8 36 29 04 cb 84 c0
> > 74 08 48 89 ef e8 8a a9 21 cb 48 8b 75 08 48 c7 c7 2c 16 86 c0 e8 82
> > b8 b9 ca <0f> 0b b8 ea ff ff ff 5d c3 e8 ec 57 c3 ca 84 c0 0f 85 6f ff
> > ff ff
> > [ 3511.999282] RSP: 0018:ffffb9c04170f798 EFLAGS: 00210282
> > [ 3511.999288] RAX: 0000000000000000 RBX: ffff8d2ce5205a80 RCX: 0000000000000006
> > [ 3511.999292] RDX: 0000000000000007 RSI: ffff8d2c5bea4070 RDI: ffff8d2cfb5d9e00
> > [ 3511.999296] RBP: ffff8d28becae480 R08: 00000331b36fd503 R09: 0000000000000000
> > [ 3511.999299] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d2ce5200000
> > [ 3511.999303] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8d2ce1540000
> > [ 3511.999308] FS:  00007f59a5bc6f00(0000) GS:ffff8d2cfb400000(0000)
> > knlGS:0000000000000000
> > [ 3511.999311] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 3511.999315] CR2: 00001108bc475960 CR3: 000000075bf32000 CR4: 00000000003406e0
> > [ 3511.999319] Call Trace:
> > [ 3511.999394]  amdgpu_pm_compute_clocks+0x70/0x5f0 [amdgpu]
> > [ 3511.999503]  dm_pp_apply_display_requirements+0x1a8/0x1c0 [amdgpu]
> > [ 3511.999609]  dce12_update_clocks+0xd8/0x110 [amdgpu]
> > [ 3511.999712]  dc_commit_state+0x414/0x590 [amdgpu]
> > [ 3511.999725]  ? find_held_lock+0x32/0x90
> > [ 3511.999832]  amdgpu_dm_atomic_commit_tail+0xd18/0x1cf0 [amdgpu]
> > [ 3511.999844]  ? reacquire_held_locks+0xed/0x210
> > [ 3511.999859]  ? ttm_eu_backoff_reservation+0xa5/0x160 [ttm]
> > [ 3511.999866]  ? find_held_lock+0x32/0x90
> > [ 3511.999872]  ? find_held_lock+0x32/0x90
> > [ 3511.999881]  ? __lock_acquire+0x247/0x1910
> > [ 3511.999893]  ? find_held_lock+0x32/0x90
> > [ 3511.999901]  ? mark_held_locks+0x50/0x80
> > [ 3511.999907]  ? _raw_spin_unlock_irq+0x29/0x40
> > [ 3511.999913]  ? lockdep_hardirqs_on+0xf0/0x180
> > [ 3511.999919]  ? _raw_spin_unlock_irq+0x29/0x40
> > [ 3511.999924]  ? wait_for_completion_timeout+0x75/0x190
> > [ 3511.999952]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
> > [ 3511.999966]  commit_tail+0x3c/0x70 [drm_kms_helper]
> > [ 3511.999979]  drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
> > [ 3512.000002]  drm_mode_atomic_ioctl+0x793/0x9b0 [drm]
> > [ 3512.000014]  ? __lock_acquire+0x247/0x1910
> > [ 3512.000044]  ? drm_atomic_set_property+0xa50/0xa50 [drm]
> > [ 3512.000066]  drm_ioctl_kernel+0xaa/0xf0 [drm]
> > [ 3512.000088]  drm_ioctl+0x208/0x390 [drm]
> > [ 3512.000108]  ? drm_atomic_set_property+0xa50/0xa50 [drm]
> > [ 3512.000120]  ? lockdep_hardirqs_on+0xf0/0x180
> > [ 3512.000205]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
> > [ 3512.000216]  do_vfs_ioctl+0x411/0x750
> > [ 3512.000229]  ksys_ioctl+0x5e/0x90
> > [ 3512.000237]  __x64_sys_ioctl+0x16/0x20
> > [ 3512.000242]  do_syscall_64+0x5c/0xb0
> > [ 3512.000249]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > [ 3512.000254] RIP: 0033:0x7f59a603d00b
> > [ 3512.000259] Code: 0f 1e fa 48 8b 05 7d 9e 0c 00 64 c7 00 26 00 00
> > 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00
> > 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4d 9e 0c 00 f7 d8 64 89
> > 01 48
> > [ 3512.000263] RSP: 002b:00007ffc493bcc08 EFLAGS: 00000246 ORIG_RAX:
> > 0000000000000010
> > [ 3512.000267] RAX: ffffffffffffffda RBX: 00007ffc493bcc50 RCX: 00007f59a603d00b
> > [ 3512.000271] RDX: 00007ffc493bcc50 RSI: 00000000c03864bc RDI: 000000000000000e
> > [ 3512.000275] RBP: 00000000c03864bc R08: 000055aa62e41d00 R09: 0000000000000001
> > [ 3512.000278] R10: 0000000000000001 R11: 0000000000000246 R12: 000055aa61a99d00
> > [ 3512.000282] R13: 000000000000000e R14: 000055aa628f7430 R15: 000055aa62e34540
> > [ 3512.000297] irq event stamp: 258283232
> > [ 3512.000303] hardirqs last  enabled at (258283231):
> > [<ffffffff8b170beb>] console_unlock+0x46b/0x5d0
> > [ 3512.000309] hardirqs last disabled at (258283232):
> > [<ffffffff8b0038da>] trace_hardirqs_off_thunk+0x1a/0x20
> > [ 3512.000314] softirqs last  enabled at (258282448):
> > [<ffffffff8be0035d>] __do_softirq+0x35d/0x45d
> > [ 3512.000319] softirqs last disabled at (258282413):
> > [<ffffffff8b0f1e57>] irq_exit+0xf7/0x100
> > [ 3512.000323] ---[ end trace 55ed0c80b95aef99 ]---
> >
> > https://pastebin.com/DfqVGDgc
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]
  2019-09-08 21:24                 ` Mikhail Gavrilov
@ 2019-09-09  9:15                   ` Koenig, Christian
  2019-09-15 19:47                     ` Mikhail Gavrilov
  0 siblings, 1 reply; 12+ messages in thread
From: Koenig, Christian @ 2019-09-09  9:15 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: Hillf Danton, dri-devel, amd-gfx list, Linux kernel,
	Alex Deucher, Wentland, Harry, Daniel Vetter

I agree with Daniel's analysis.

It looks like the problem is simply that PM turns off a block before all
work is done on that block.
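
A minimal userspace sketch of that ordering problem, assuming a toy model in which each block tracks the last submitted and the last completed job by sequence number. The `toy_ring` type and its helpers are hypothetical and are not part of amdgpu; the sketch only illustrates why a block should be drained before it is gated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one hardware block's fence bookkeeping. */
struct toy_ring {
	uint32_t emitted;   /* sequence number of the last job submitted */
	uint32_t signaled;  /* sequence number of the last job completed */
	bool powered;       /* false once the block has been gated */
};

/* True when every submitted job has completed. */
bool toy_ring_idle(const struct toy_ring *ring)
{
	return ring->signaled == ring->emitted;
}

/* Model of the hardware finishing one outstanding job. */
void toy_ring_complete_one(struct toy_ring *ring)
{
	assert(ring->powered);  /* a gated block cannot make progress */
	if (!toy_ring_idle(ring))
		ring->signaled++;
}

/* Gate the block only if it is idle; return whether it was gated. */
bool toy_ring_try_power_off(struct toy_ring *ring)
{
	if (!toy_ring_idle(ring))
		return false;   /* pending work would never signal its fence */
	ring->powered = false;
	return true;
}
```

In this model, gating a ring with `emitted != signaled` would leave a waiter on the last fence blocked indefinitely, which resembles the hang reported in this thread.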

Have you opened a bug report yet? If not, that would certainly help,
because it is really hard to extract all the necessary information from this
mail thread.

Regards,
Christian.

Am 08.09.19 um 23:24 schrieb Mikhail Gavrilov:
> On Thu, 5 Sep 2019 at 12:58, Daniel Vetter <daniel@ffwll.ch> wrote:
>> I think those fences are only emitted for CS, not display related.
>> Adding Christian König.
> Here is a fresher kernel log, taken with 5.3 RC7; the issue still happens.
> https://pastebin.com/tyxkWJYV
>
>
> --
> Best Regards,
> Mike Gavrilov.
>



* Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]
  2019-09-09  9:15                   ` Koenig, Christian
@ 2019-09-15 19:47                     ` Mikhail Gavrilov
  0 siblings, 0 replies; 12+ messages in thread
From: Mikhail Gavrilov @ 2019-09-15 19:47 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Hillf Danton, dri-devel, amd-gfx list, Linux kernel,
	Alex Deucher, Wentland, Harry, Daniel Vetter

On Mon, 9 Sep 2019 at 14:15, Koenig, Christian <Christian.Koenig@amd.com> wrote:
>
> I agree with Daniel's analysis.
>
> It looks like the problem is simply that PM turns off a block before all
> work is done on that block.
>
> Have you opened a bug report yet? If not, that would certainly help,
> because it is really hard to extract all the necessary information from this
> mail thread.

https://bugs.freedesktop.org/show_bug.cgi?id=111689
Will that do?

--
Best Regards,
Mike Gavrilov.


* Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]
  2019-08-26  9:24 ` Daniel Vetter
@ 2019-08-29 22:03   ` mikhail.v.gavrilov
  0 siblings, 0 replies; 12+ messages in thread
From: mikhail.v.gavrilov @ 2019-08-29 22:03 UTC (permalink / raw)
  To: Daniel Vetter, Hillf Danton
  Cc: dri-devel, amd-gfx list, Linux List Kernel Mailing

On Sun, Aug 25, 2019 at 10:13:05PM +0800, Hillf Danton wrote:
> Can we try to add the fallback timer manually?
> 
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -322,6 +322,10 @@ int amdgpu_fence_wait_empty(struct amdgp
>         }
>         rcu_read_unlock();
>  
> +       if (!timer_pending(&ring->fence_drv.fallback_timer))
> +               mod_timer(&ring->fence_drv.fallback_timer,
> +                       jiffies + (AMDGPU_FENCE_JIFFIES_TIMEOUT << 1));
> +
>         r = dma_fence_wait(fence, false);
>         dma_fence_put(fence);
>         return r;
> --
> 
> Or simply wait with an ear on signal and timeout if adding timer
> seems to go a bit too far?
> 
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -322,7 +322,12 @@ int amdgpu_fence_wait_empty(struct amdgp
>         }
>         rcu_read_unlock();
>  
> -       r = dma_fence_wait(fence, false);
> +       if (0 < dma_fence_wait_timeout(fence, true,
> +                               AMDGPU_FENCE_JIFFIES_TIMEOUT +
> +                               (AMDGPU_FENCE_JIFFIES_TIMEOUT >> 3)))
> +               r = 0;
> +       else
> +               r = -EINVAL;
>         dma_fence_put(fence);
>         return r;
>  }

I tested both patches on top of 5.3 RC6. I ran each patch for more
than 24 hours and did not see any regressions or problems with them.
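
For reference, a small userspace sketch of the arithmetic and the return-code mapping in the two patches above. It assumes that `dma_fence_wait_timeout()` returns the remaining jiffies (greater than zero) when the fence signals, 0 on timeout, and a negative error when the wait is interrupted. The `TOY_` constants are illustrative stand-ins: the real `HZ` is a kernel build option, and `AMDGPU_FENCE_JIFFIES_TIMEOUT` is assumed here to be `HZ / 2`.

```c
#include <assert.h>
#include <errno.h>

/* Assumed values, for illustration only. */
#define TOY_HZ                    1000L
#define TOY_FENCE_JIFFIES_TIMEOUT (TOY_HZ / 2)

/* Wait budget used by the second patch: T + T/8. */
long toy_wait_budget(void)
{
	return TOY_FENCE_JIFFIES_TIMEOUT + (TOY_FENCE_JIFFIES_TIMEOUT >> 3);
}

/* Timer delay used by the first patch: 2 * T. */
long toy_fallback_delay(void)
{
	return TOY_FENCE_JIFFIES_TIMEOUT << 1;
}

/*
 * Map a wait result to the second patch's return code.
 * wait_ret > 0: fence signaled; 0: timed out; < 0: interrupted or failed.
 */
int toy_map_wait_result(long wait_ret)
{
	/* The remaining time can never exceed the budget that was passed in. */
	assert(wait_ret <= toy_wait_budget());
	return wait_ret > 0 ? 0 : -EINVAL;
}
```

Note that this mapping reports a timeout and an interrupted wait with the same `-EINVAL`, so a caller cannot tell from the return value which fence, if any, failed to signal.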


On Mon, 2019-08-26 at 11:24 +0200, Daniel Vetter wrote:
> 
> This will paper over the issue, but won't fix it. dma_fences have to
> complete, at least for normal operations, otherwise your desktop will
> start feeling like the gpu hangs all the time.
> 
> I think it would be much more interesting to dump which fence isn't
> completing here in time, i.e. not just the timeout, but lots of debug
> printks.
> -Daniel

As I understand it, neither of these patches can be merged because
they do not fix the root cause; they only eliminate the consequences?
Does eliminating the consequences have any negative effects? And we will
never learn the root cause, because there is not enough debugging information.



* Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]
       [not found] <20190825141305.13984-1-hdanton@sina.com>
@ 2019-08-26  9:24 ` Daniel Vetter
  2019-08-29 22:03   ` mikhail.v.gavrilov
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2019-08-26  9:24 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Mikhail Gavrilov, dri-devel, amd-gfx list, Linux List Kernel Mailing

On Sun, Aug 25, 2019 at 10:13:05PM +0800, Hillf Danton wrote:
> 
> On Sun, 25 Aug 2019 04:28:01 -0700 Mikhail Gavrilov wrote:
> > Hi folks,
> > I left gnome-shell unlocked at noon, and when I returned in the
> > evening I discovered that the monitor was not sleeping and was showing
> > the open GNOME activities view. At first, I thought that some application
> > was keeping the system from going to sleep. But when I tried to move the
> > mouse, I realized that the system had hung. So I connected via ssh and
> > tried to investigate the problem. I did not see anything strange in the
> > kernel logs. My last idea before trying to kill the gnome-shell process
> > was to dump the tasks that are in uninterruptible (blocked) state.
> > 
> > After [Alt + PrnScr + W] I saw this:
> > 
> > [32840.701909] sysrq: Show Blocked State
> > [32840.701976]   task                        PC stack   pid father
> > [32840.702407] gnome-shell     D11240  1900   1830 0x00000000
> > [32840.702438] Call Trace:
> > [32840.702446]  ? __schedule+0x352/0x900
> > [32840.702453]  schedule+0x3a/0xb0
> > [32840.702457]  schedule_timeout+0x289/0x3c0
> > [32840.702461]  ? find_held_lock+0x32/0x90
> > [32840.702464]  ? find_held_lock+0x32/0x90
> > [32840.702469]  ? mark_held_locks+0x50/0x80
> > [32840.702473]  ? _raw_spin_unlock_irqrestore+0x4b/0x60
> > [32840.702478]  dma_fence_default_wait+0x1f5/0x340
> > [32840.702482]  ? dma_fence_free+0x20/0x20
> > [32840.702487]  dma_fence_wait_timeout+0x182/0x1e0
> > [32840.702533]  amdgpu_fence_wait_empty+0xe7/0x210 [amdgpu]
> > [32840.702577]  amdgpu_pm_compute_clocks+0x70/0x5f0 [amdgpu]
> > [32840.702641]  dm_pp_apply_display_requirements+0x19e/0x1c0 [amdgpu]
> > [32840.702705]  dce12_update_clocks+0xd8/0x110 [amdgpu]
> > [32840.702766]  dc_commit_state+0x414/0x590 [amdgpu]
> > [32840.702834]  amdgpu_dm_atomic_commit_tail+0xd1e/0x1cf0 [amdgpu]
> > [32840.702840]  ? reacquire_held_locks+0xed/0x210
> > [32840.702848]  ? ttm_eu_backoff_reservation+0xa5/0x160 [ttm]
> > [32840.702853]  ? find_held_lock+0x32/0x90
> > [32840.702855]  ? find_held_lock+0x32/0x90
> > [32840.702860]  ? __lock_acquire+0x247/0x1910
> > [32840.702867]  ? find_held_lock+0x32/0x90
> > [32840.702871]  ? mark_held_locks+0x50/0x80
> > [32840.702874]  ? _raw_spin_unlock_irq+0x29/0x40
> > [32840.702877]  ? lockdep_hardirqs_on+0xf0/0x180
> > [32840.702881]  ? _raw_spin_unlock_irq+0x29/0x40
> > [32840.702884]  ? wait_for_completion_timeout+0x75/0x190
> > [32840.702895]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
> > [32840.702902]  commit_tail+0x3c/0x70 [drm_kms_helper]
> > [32840.702909]  drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
> > [32840.702922]  drm_atomic_connector_commit_dpms+0xd7/0x100 [drm]
> > [32840.702936]  set_property_atomic+0xcc/0x140 [drm]
> > [32840.702955]  drm_mode_obj_set_property_ioctl+0xcb/0x1c0 [drm]
> > [32840.702968]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
> > [32840.702978]  drm_ioctl_kernel+0xaa/0xf0 [drm]
> > [32840.702990]  drm_ioctl+0x208/0x390 [drm]
> > [32840.703003]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
> > [32840.703007]  ? sched_clock_cpu+0xc/0xc0
> > [32840.703012]  ? lockdep_hardirqs_on+0xf0/0x180
> > [32840.703053]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
> > [32840.703058]  do_vfs_ioctl+0x411/0x750
> > [32840.703065]  ksys_ioctl+0x5e/0x90
> > [32840.703069]  __x64_sys_ioctl+0x16/0x20
> > [32840.703072]  do_syscall_64+0x5c/0xb0
> > [32840.703076]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > [32840.703079] RIP: 0033:0x7f8bcab0f00b
> > [32840.703084] Code: Bad RIP value.
> > [32840.703086] RSP: 002b:00007ffe76c62338 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > [32840.703089] RAX: ffffffffffffffda RBX: 00007ffe76c62370 RCX: 00007f8bcab0f00b
> > [32840.703092] RDX: 00007ffe76c62370 RSI: 00000000c01864ba RDI: 0000000000000009
> > [32840.703094] RBP: 00000000c01864ba R08: 0000000000000003 R09: 00000000c0c0c0c0
> > [32840.703096] R10: 000056476c86a018 R11: 0000000000000246 R12: 000056476c8ad940
> > [32840.703098] R13: 0000000000000009 R14: 0000000000000002 R15: 0000000000000003
> > [root@localhost ~]#
> > [root@localhost ~]# ps aux | grep gnome-shell
> > mikhail     1900  0.3  1.1 6447496 378696 tty2   Dl+  Aug24   2:10 /usr/bin/gnome-shell
> > mikhail     2099  0.0  0.0 519984 23392 ?        Ssl  Aug24   0:00 /usr/libexec/gnome-shell-calendar-server
> > mikhail    12214  0.0  0.0 399484 29660 pts/2    Sl+  Aug24   0:00 /usr/bin/python3 /usr/bin/chrome-gnome-shell chrome-extension://gphhapmejobijbbhgpjhcjognlahblep/
> > root       22957  0.0  0.0 216120  2456 pts/10   S+   03:59   0:00 grep --color=auto gnome-shell
> > 
> > After that, I tried to kill the gnome-shell process with signal 9, but
> > the process would not terminate, even after several attempts.
> > 
> > Only [Alt + PrnScr + B] helped to reboot the hung system.
> > I am writing here because I hope some amdgpu hackers can look at the
> > trace and understand what is happening.
> > 
> > Sorry, I don't know how to reproduce this bug. But the problem itself
> > is very annoying.
> > 
> > Thanks.
> > 
> > GPU: AMD Radeon VII
> > Kernel: 5.3 RC5
> > 
> Can we try to add the fallback timer manually?
> 
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -322,6 +322,10 @@ int amdgpu_fence_wait_empty(struct amdgp
>  	}
>  	rcu_read_unlock();
>  
> +	if (!timer_pending(&ring->fence_drv.fallback_timer))
> +		mod_timer(&ring->fence_drv.fallback_timer,
> +			jiffies + (AMDGPU_FENCE_JIFFIES_TIMEOUT << 1));

This will paper over the issue, but won't fix it. dma_fences have to
complete, at least for normal operations, otherwise your desktop will
start feeling like the gpu hangs all the time.

I think it would be much more interesting to dump which fence isn't
completing here in time, i.e. not just the timeout, but lots of debug
printks.
-Daniel
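
As a rough illustration of that kind of debug output, here is a userspace sketch that walks a set of per-ring counters and reports every ring with unsignaled work. The `toy_ring_state` type, the `toy_report_stuck()` helper and the ring names used with it are hypothetical; in the driver the equivalent information would have to come from its own fence bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-ring snapshot; names are illustrative, not amdgpu's. */
struct toy_ring_state {
	const char *name;
	uint32_t emitted;    /* last sequence number submitted */
	uint32_t signaled;   /* last sequence number completed */
};

/*
 * Print one line for every ring that still has unsignaled work and
 * return how many such rings were found.
 */
size_t toy_report_stuck(const struct toy_ring_state *rings, size_t count,
			FILE *out)
{
	size_t stuck = 0;

	assert(rings != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		/* Unsigned subtraction stays correct across wraparound. */
		uint32_t pending = rings[i].emitted - rings[i].signaled;

		if (pending == 0)
			continue;
		fprintf(out,
			"ring %s: %u fence(s) pending, signaled %u, emitted %u\n",
			rings[i].name, (unsigned)pending,
			(unsigned)rings[i].signaled,
			(unsigned)rings[i].emitted);
		stuck++;
	}
	return stuck;
}
```

Called on a snapshot taken at the moment the wait times out, a report like this would name the ring whose fence is not completing instead of only recording that a timeout happened.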

> +
>  	r = dma_fence_wait(fence, false);
>  	dma_fence_put(fence);
>  	return r;
> --
> 
> Or simply wait with an ear on signal and timeout if adding timer seems
> to go a bit too far?
> 
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -322,7 +322,12 @@ int amdgpu_fence_wait_empty(struct amdgp
>  	}
>  	rcu_read_unlock();
>  
> -	r = dma_fence_wait(fence, false);
> +	if (0 < dma_fence_wait_timeout(fence, true,
> +				AMDGPU_FENCE_JIFFIES_TIMEOUT +
> +				(AMDGPU_FENCE_JIFFIES_TIMEOUT >> 3)))
> +		r = 0;
> +	else
> +		r = -EINVAL;
>  	dma_fence_put(fence);
>  	return r;
>  }
> --
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* gnome-shell stuck because of amdgpu driver [5.3 RC5]
@ 2019-08-25 11:27 Mikhail Gavrilov
  0 siblings, 0 replies; 12+ messages in thread
From: Mikhail Gavrilov @ 2019-08-25 11:27 UTC (permalink / raw)
  To: amd-gfx list, dri-devel, Linux List Kernel Mailing

Hi folks,
I left gnome-shell unlocked at noon, and when I returned in the
evening I discovered that the monitor was not sleeping and was showing
the open GNOME activities view. At first, I thought that some application
was keeping the system from going to sleep. But when I tried to move the
mouse, I realized that the system had hung. So I connected via ssh and
tried to investigate the problem. I did not see anything strange in the
kernel logs. My last idea before trying to kill the gnome-shell process
was to dump the tasks that are in uninterruptible (blocked) state.

After [Alt + PrnScr + W] I saw this:

[32840.701909] sysrq: Show Blocked State
[32840.701976]   task                        PC stack   pid father
[32840.702407] gnome-shell     D11240  1900   1830 0x00000000
[32840.702438] Call Trace:
[32840.702446]  ? __schedule+0x352/0x900
[32840.702453]  schedule+0x3a/0xb0
[32840.702457]  schedule_timeout+0x289/0x3c0
[32840.702461]  ? find_held_lock+0x32/0x90
[32840.702464]  ? find_held_lock+0x32/0x90
[32840.702469]  ? mark_held_locks+0x50/0x80
[32840.702473]  ? _raw_spin_unlock_irqrestore+0x4b/0x60
[32840.702478]  dma_fence_default_wait+0x1f5/0x340
[32840.702482]  ? dma_fence_free+0x20/0x20
[32840.702487]  dma_fence_wait_timeout+0x182/0x1e0
[32840.702533]  amdgpu_fence_wait_empty+0xe7/0x210 [amdgpu]
[32840.702577]  amdgpu_pm_compute_clocks+0x70/0x5f0 [amdgpu]
[32840.702641]  dm_pp_apply_display_requirements+0x19e/0x1c0 [amdgpu]
[32840.702705]  dce12_update_clocks+0xd8/0x110 [amdgpu]
[32840.702766]  dc_commit_state+0x414/0x590 [amdgpu]
[32840.702834]  amdgpu_dm_atomic_commit_tail+0xd1e/0x1cf0 [amdgpu]
[32840.702840]  ? reacquire_held_locks+0xed/0x210
[32840.702848]  ? ttm_eu_backoff_reservation+0xa5/0x160 [ttm]
[32840.702853]  ? find_held_lock+0x32/0x90
[32840.702855]  ? find_held_lock+0x32/0x90
[32840.702860]  ? __lock_acquire+0x247/0x1910
[32840.702867]  ? find_held_lock+0x32/0x90
[32840.702871]  ? mark_held_locks+0x50/0x80
[32840.702874]  ? _raw_spin_unlock_irq+0x29/0x40
[32840.702877]  ? lockdep_hardirqs_on+0xf0/0x180
[32840.702881]  ? _raw_spin_unlock_irq+0x29/0x40
[32840.702884]  ? wait_for_completion_timeout+0x75/0x190
[32840.702895]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
[32840.702902]  commit_tail+0x3c/0x70 [drm_kms_helper]
[32840.702909]  drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
[32840.702922]  drm_atomic_connector_commit_dpms+0xd7/0x100 [drm]
[32840.702936]  set_property_atomic+0xcc/0x140 [drm]
[32840.702955]  drm_mode_obj_set_property_ioctl+0xcb/0x1c0 [drm]
[32840.702968]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[32840.702978]  drm_ioctl_kernel+0xaa/0xf0 [drm]
[32840.702990]  drm_ioctl+0x208/0x390 [drm]
[32840.703003]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[32840.703007]  ? sched_clock_cpu+0xc/0xc0
[32840.703012]  ? lockdep_hardirqs_on+0xf0/0x180
[32840.703053]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[32840.703058]  do_vfs_ioctl+0x411/0x750
[32840.703065]  ksys_ioctl+0x5e/0x90
[32840.703069]  __x64_sys_ioctl+0x16/0x20
[32840.703072]  do_syscall_64+0x5c/0xb0
[32840.703076]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[32840.703079] RIP: 0033:0x7f8bcab0f00b
[32840.703084] Code: Bad RIP value.
[32840.703086] RSP: 002b:00007ffe76c62338 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[32840.703089] RAX: ffffffffffffffda RBX: 00007ffe76c62370 RCX: 00007f8bcab0f00b
[32840.703092] RDX: 00007ffe76c62370 RSI: 00000000c01864ba RDI: 0000000000000009
[32840.703094] RBP: 00000000c01864ba R08: 0000000000000003 R09: 00000000c0c0c0c0
[32840.703096] R10: 000056476c86a018 R11: 0000000000000246 R12: 000056476c8ad940
[32840.703098] R13: 0000000000000009 R14: 0000000000000002 R15: 0000000000000003
[root@localhost ~]#
[root@localhost ~]# ps aux | grep gnome-shell
mikhail     1900  0.3  1.1 6447496 378696 tty2   Dl+  Aug24   2:10 /usr/bin/gnome-shell
mikhail     2099  0.0  0.0 519984 23392 ?        Ssl  Aug24   0:00 /usr/libexec/gnome-shell-calendar-server
mikhail    12214  0.0  0.0 399484 29660 pts/2    Sl+  Aug24   0:00 /usr/bin/python3 /usr/bin/chrome-gnome-shell chrome-extension://gphhapmejobijbbhgpjhcjognlahblep/
root       22957  0.0  0.0 216120  2456 pts/10   S+   03:59   0:00 grep --color=auto gnome-shell
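
For readers of the register dump: on x86-64, ORIG_RAX 0x10 is the ioctl syscall, and RSI holds its second argument, the ioctl request number (0xc01864ba here). A small sketch of how that number splits into its fields, assuming the generic Linux ioctl encoding used on x86. The `ioctl_fields` type and `decode_ioctl()` are names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of a Linux ioctl request number in the generic encoding. */
struct ioctl_fields {
	unsigned dir;    /* bits 30-31: 1 = write, 2 = read, 3 = both */
	unsigned size;   /* bits 16-29: size of the argument struct in bytes */
	unsigned type;   /* bits 8-15: subsystem "magic" character */
	unsigned nr;     /* bits 0-7: command number within the subsystem */
};

struct ioctl_fields decode_ioctl(uint32_t request)
{
	struct ioctl_fields f;
	uint32_t rebuilt;

	f.dir = (request >> 30) & 0x3;
	f.size = (request >> 16) & 0x3fff;
	f.type = (request >> 8) & 0xff;
	f.nr = request & 0xff;

	/* Sanity check: the four fields cover all 32 bits of the request. */
	rebuilt = ((uint32_t)f.dir << 30) | ((uint32_t)f.size << 16) |
		  ((uint32_t)f.type << 8) | (uint32_t)f.nr;
	assert(rebuilt == request);
	return f;
}
```

For 0xc01864ba this gives a read/write ioctl of type 'd' (the DRM magic), command 0xba, with a 24-byte argument, which is consistent with drm_mode_obj_set_property_ioctl appearing in the call trace.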

After that, I tried to kill the gnome-shell process with signal 9, but
the process would not terminate, even after several attempts.
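
The `Dl+` in the ps output above is the likely reason: a task in uninterruptible sleep (state `D`) generally does not act on signals until the wait it is blocked in returns. A minimal sketch of reading that state from a `/proc/<pid>/stat` line follows; the helper names are made up for this example, and the sample lines in the usage are illustrative rather than taken from the affected machine.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the one-character state field from a /proc/<pid>/stat line,
 * or '\0' if the line is malformed. The command name sits in
 * parentheses and may itself contain ')' or spaces, so search for the
 * last ')' rather than the first.
 */
char stat_line_state(const char *line)
{
	const char *close;

	assert(line != NULL);
	close = strrchr(line, ')');
	if (close == NULL || close[1] != ' ' || close[2] == '\0')
		return '\0';
	return close[2];
}

/* 'D' marks uninterruptible sleep. */
int state_is_uninterruptible(char state)
{
	return state == 'D';
}
```

For a line beginning `1900 (gnome-shell) D 1830 ...`, `stat_line_state()` returns `'D'`.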

Only [Alt + PrnScr + B] helped to reboot the hung system.
I am writing here because I hope some amdgpu hackers can look at the
trace and understand what is happening.

Sorry, I don’t know how to reproduce this bug. But the problem itself
is very annoying.

Thanks.

GPU: AMD Radeon VII
Kernel: 5.3 RC5


--
Best Regards,
Mike Gavrilov.


Thread overview: 12+ messages
     [not found] <20190830032948.13516-1-hdanton@sina.com>
2019-09-03  6:48 ` gnome-shell stuck because of amdgpu driver [5.3 RC5] Mikhail Gavrilov
     [not found]   ` <5d6e2298.1c69fb81.b5532.8395SMTPIN_ADDED_MISSING@mx.google.com>
2019-09-03 18:07     ` Mikhail Gavrilov
2019-09-03 20:18       ` Daniel Vetter
     [not found]         ` <5d6f10a6.1c69fb81.6b104.af73SMTPIN_ADDED_MISSING@mx.google.com>
2019-09-04  8:37           ` Daniel Vetter
2019-09-04 22:26             ` Mikhail Gavrilov
2019-09-05  7:58               ` Daniel Vetter
2019-09-08 21:24                 ` Mikhail Gavrilov
2019-09-09  9:15                   ` Koenig, Christian
2019-09-15 19:47                     ` Mikhail Gavrilov
     [not found] <20190825141305.13984-1-hdanton@sina.com>
2019-08-26  9:24 ` Daniel Vetter
2019-08-29 22:03   ` mikhail.v.gavrilov
2019-08-25 11:27 Mikhail Gavrilov
