* Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
@ 2018-01-01 16:32 Luís Mendes
From: Luís Mendes @ 2018-01-01 16:32 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	alexander.deucher-5C7GfCeVMHo, christian.koenig-5C7GfCeVMHo

I am currently testing the amdgpu driver with AMD RX460 and RX550
graphics cards on an ARM Cortex-A9 with 1GB RAM and I am consistently
getting deadlocks when playing videos with Kodi or other applications.

I'm using the Linux kernel from
https://cgit.freedesktop.org/~agd5f/linux/, branch drm-next-4.16 at
commit "drm/amdgpu: Correct the IB size of bo update mapping" -
104bd2ca1124dfd9aa904d5f5a96253ef2b580f6, along with libdrm-2.4.89 and
mesa-17.3.1, on Ubuntu 17.10 with the MATE desktop and the LightDM
session manager over X11.


The deadlocks are sometimes almost immediate and sometimes take about
half an hour to occur. Some of the video files I am using for testing
are more likely to cause a deadlock than others.

I collected some kernel crash dumps, Kodi process backtraces for the
offending thread, and a process tree listing from the deadlocked
system, all included below. The kernel seems to deadlock during a page
flip, waiting indefinitely for the DMA fence to complete. The fence
never signals and the wait timeout never expires either, so this may
be a GPU lockup.

I can provide more details if anyone has the interest or time to look
into this.

Regards,
Luís Mendes
Software and Hardware engineer

[  253.904103] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, last signaled seq=43831, last emitted seq=43833
[  253.915041] [drm] IP block:gmc_v8_0 is hung!
[  253.915047] [drm] IP block:gfx_v8_0 is hung!
[  253.915162] [drm] GPU recovery disabled.
[  366.541614] INFO: task kworker/u4:4:90 blocked for more than 120
seconds.
[  366.548436]       Not tainted 4.15.0-rc4-drmnext2g #1
[  366.554300] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  366.562162] kworker/u4:4    D    0    90      2 0x00000000
[  366.562196] Workqueue: events_unbound commit_work [drm_kms_helper]
[  366.562215] [<80b8c6d4>] (__schedule) from [<80b8cdd0>]
(schedule+0x4c/0xac)
[  366.562223] [<80b8cdd0>] (schedule) from [<80b91024>]
(schedule_timeout+0x228/0x444)
[  366.562233] [<80b91024>] (schedule_timeout) from [<80886738>]
(dma_fence_default_wait+0x2b4/0x2d8)
[  366.562241] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
(dma_fence_wait_timeout+0x40/0x150)
[  366.562248] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
(reservation_object_wait_timeout_rcu+0xfc/0x34c)
[  366.562476] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
[<7f2d3988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
[  366.562754] [<7f2d3988>] (amdgpu_dm_do_flip [amdgpu]) from
[<7f2d509c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
[  366.562908] [<7f2d509c>] (amdgpu_dm_atomic_commit_tail [amdgpu])
from [<7f13e58c>] (commit_tail+0x50/0x94 [drm_kms_helper])
[  366.562931] [<7f13e58c>] (commit_tail [drm_kms_helper]) from
[<7f13e5ec>] (commit_work+0x1c/0x20 [drm_kms_helper])
[  366.562948] [<7f13e5ec>] (commit_work [drm_kms_helper]) from
[<8016f4c8>] (process_one_work+0x1a8/0x4ac)
[  366.562955] [<8016f4c8>] (process_one_work) from [<8017050c>]
(worker_thread+0x68/0x598)
[  366.562962] [<8017050c>] (worker_thread) from [<80175e50>]
(kthread+0x16c/0x174)
[  366.562970] [<80175e50>] (kthread) from [<80109de8>]
(ret_from_fork+0x14/0x2c)
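
For triage, the ring timeout line at the top of this dump can be read mechanically: the gap between the last emitted and the last signaled sequence number is the number of jobs the gfx ring accepted but never completed. The following is only a minimal sketch, assuming the message format shown above; `pending_jobs` is a hypothetical helper name, not part of any amdgpu tooling.

```python
import re

# Matches the amdgpu ring timeout report. \s+ is used between tokens so the
# pattern tolerates the line wrap that mail clients add after "ring gfx".
TIMEOUT_RE = re.compile(
    r"ring\s+(?P<ring>\S+)\s+timeout,\s+"
    r"last signaled seq=(?P<signaled>\d+),\s+"
    r"last emitted seq=(?P<emitted>\d+)"
)


def pending_jobs(log_text):
    """Return (ring, signaled, emitted, pending) for each timeout report."""
    results = []
    for m in TIMEOUT_RE.finditer(log_text):
        signaled = int(m.group("signaled"))
        emitted = int(m.group("emitted"))
        results.append((m.group("ring"), signaled, emitted, emitted - signaled))
    return results


sample = (
    "[  253.904103] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx\n"
    "timeout, last signaled seq=43831, last emitted seq=43833\n"
)
print(pending_jobs(sample))  # [('gfx', 43831, 43833, 2)]
```

In both dumps in this report the gap is 2, so the ring appears to stall with two jobs outstanding rather than with a long backlog.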


From the userland side:
(gdb) info thread
  Id   Target Id         Frame
* 1    Thread 0x6eb17c70 (LWP 2071) "kodi.bin" 0x748b2246 in ioctl ()
    at ../sysdeps/unix/syscall-template.S:84
  2    Thread 0x6eb14170 (LWP 2072) "Announce" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  3    Thread 0x6e1ff170 (LWP 2075) "ActiveAE" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  4    Thread 0x6d9ff170 (LWP 2076) "AESink" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  5    Thread 0x6b7c9170 (LWP 2081) "amdgpu_cs:0" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  6    Thread 0x6ae3c170 (LWP 2082) "disk_cache:0" __libc_do_syscall
()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  7    Thread 0x571df170 (LWP 2083) "si_shader:0" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  8    Thread 0x569df170 (LWP 2084) "si_shader_low:0"
__libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  9    Thread 0x561df170 (LWP 2085) "gallium_drv:0" __libc_do_syscall
()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  10   Thread 0x551f6170 (LWP 2086) "kodi.bin" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  11   Thread 0x549f6170 (LWP 2087) "PeripBusUSBUdev"
__libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  12   Thread 0x541f6170 (LWP 2088) "PeripBusCEC" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  13   Thread 0x539f6170 (LWP 2089) "PeripBusAddon" __libc_do_syscall
()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  14   Thread 0x531f6170 (LWP 2090) "kodi.bin" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  15   Thread 0x510ff170 (LWP 2095) "Timer" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  16   Thread 0x508ff170 (LWP 2096) "kodi.bin" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  17   Thread 0x500ff170 (LWP 2097) "kodi.bin" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  18   Thread 0x4f8b5170 (LWP 2098) "kodi.bin" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  19   Thread 0x4f0b5170 (LWP 2099) "kodi.bin" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  20   Thread 0x4e8b5170 (LWP 2100) "kodi.bin" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  21   Thread 0x4e0b5170 (LWP 2101) "EventServer" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  22   Thread 0x4d8b5170 (LWP 2102) "TCPServer" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  23   Thread 0x4c9a8170 (LWP 2103) "JobWorker" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  24   Thread 0x4c1a8170 (LWP 2104) "JobWorker" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  25   Thread 0x4b9a8170 (LWP 2108) "PVRManager" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  26   Thread 0x4b1a8170 (LWP 2114) "PVRGUIInfo" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  27   Thread 0x4a9a8170 (LWP 2115) "EPGUpdater" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  28   Thread 0x4a1a8170 (LWP 2119) "VideoPlayer" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  29   Thread 0x499a8170 (LWP 2120) "VideoPlayer" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  30   Thread 0x491a8170 (LWP 2121) "VideoPlayer" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  31   Thread 0x489a8170 (LWP 2122) "VideoPlayer" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  32   Thread 0x481a8170 (LWP 2123) "VideoPlayer" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  33   Thread 0x479a8170 (LWP 2124) "VideoPlayer" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  34   Thread 0x46d59170 (LWP 2302) "VideoPlayer" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  35   Thread 0x46559170 (LWP 2303) "VideoPlayer" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
  36   Thread 0x45d59170 (LWP 2304) "VideoPlayer" __libc_do_syscall ()
    at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46

0x748b2246 in ioctl () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x748b2246 in ioctl () at ../sysdeps/unix/syscall-template.S:84
#1  0x7605c096 in drmIoctl (fd=22, request=3223348297,
    arg=arg@entry=0x7ec1b218) at ../xf86drm.c:191
#2  0x6b875218 in amdgpu_ioctl_wait_cs (context=<optimized out>,
    context=<optimized out>, busy=<synthetic pointer>,
flags=<optimized out>,
    timeout_ns=18446744073709551615, handle=10555, ring=<optimized
out>,
    ip_instance=<optimized out>, ip=<optimized out>)
    at ../../amdgpu/amdgpu_cs.c:408
#3  amdgpu_cs_query_fence_status (fence=<optimized out>,
    timeout_ns=<optimized out>, flags=1, expired=0x7ec1b280)
    at ../../amdgpu/amdgpu_cs.c:437
#4  0x6be7866e in ?? () from /usr/lib/arm-linux-gnueabihf/dri/radeonsi_dri.so
    ...
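
The two magic numbers in this backtrace can be decoded by hand. The sketch below assumes the generic Linux `_IOC` bit layout from asm-generic/ioctl.h (which 32-bit ARM uses) and the DRM convention that driver-private commands start at 0x40. `decode_ioctl` is a hypothetical helper written for this illustration.

```python
# Generic Linux _IOC layout (asm-generic/ioctl.h), used by 32-bit ARM:
# bits 0-7 command number, 8-15 type, 16-29 argument size, 30-31 direction.
DRM_COMMAND_BASE = 0x40   # first driver-private DRM command number
UINT64_MAX = 2**64 - 1
DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read|write"}


def decode_ioctl(request):
    """Split an ioctl request number into its _IOC fields."""
    nr = request & 0xFF
    return {
        "hex": hex(request),
        "direction": DIRECTIONS[(request >> 30) & 0x3],
        "size": (request >> 16) & 0x3FFF,
        "type": chr((request >> 8) & 0xFF),
        "nr": nr,
        # Offset into the driver-private range; only meaningful for type "d".
        "driver_nr": nr - DRM_COMMAND_BASE,
    }


fields = decode_ioctl(3223348297)          # request value from frame #1
print(fields)
print(18446744073709551615 == UINT64_MAX)  # timeout_ns from frame #2
```

The request decodes to 0xc0206449: a read/write ioctl of type 'd' (DRM) with a 32-byte argument and driver command number 9. By my reading of the amdgpu UAPI header that number is the wait-CS command, which agrees with the `amdgpu_ioctl_wait_cs` frame, though I have not checked it against this exact tree. The timeout is the all-ones 64-bit value, so the userspace thread appears to be waiting on the fence with no timeout at all.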

-> Another reboot, another crash, right at the start of playing a video with Kodi

[   73.432967] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, last signaled seq=4183, last emitted seq=4185
[   73.443847] [drm] IP block:gmc_v8_0 is hung!
[   73.443854] [drm] IP block:gfx_v8_0 is hung!
[   73.444019] [drm] GPU recovery disabled.
[  243.672640] INFO: task kworker/u4:3:89 blocked for more than 120
seconds.
[  243.679466]       Not tainted 4.15.0-rc4-drmnext2g #1
[  243.685337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  243.693200] kworker/u4:3    D    0    89      2 0x00000000
[  243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
[  243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>]
(schedule+0x4c/0xac)
[  243.693259] [<80b8cdd0>] (schedule) from [<80b91024>]
(schedule_timeout+0x228/0x444)
[  243.693270] [<80b91024>] (schedule_timeout) from [<80886738>]
(dma_fence_default_wait+0x2b4/0x2d8)
[  243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
(dma_fence_wait_timeout+0x40/0x150)
[  243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
(reservation_object_wait_timeout_rcu+0xfc/0x34c)
[  243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
[<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
[  243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from
[<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
[  243.693941] [<7f33309c>] (amdgpu_dm_atomic_commit_tail [amdgpu])
from [<7f15758c>] (commit_tail+0x50/0x94 [drm_kms_helper])
[  243.693964] [<7f15758c>] (commit_tail [drm_kms_helper]) from
[<7f1575ec>] (commit_work+0x1c/0x20 [drm_kms_helper])
[  243.693981] [<7f1575ec>] (commit_work [drm_kms_helper]) from
[<8016f4c8>] (process_one_work+0x1a8/0x4ac)
[  243.693987] [<8016f4c8>] (process_one_work) from [<8017050c>]
(worker_thread+0x68/0x598)
[  243.693994] [<8017050c>] (worker_thread) from [<80175e50>]
(kthread+0x16c/0x174)
[  243.694003] [<80175e50>] (kthread) from [<80109de8>]
(ret_from_fork+0x14/0x2c)

bt
0x7480c246 in ioctl () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x7480c246 in ioctl () at ../sysdeps/unix/syscall-template.S:84
#1  0x76068096 in drmIoctl (fd=22, request=3223348297,
    arg=arg@entry=0x7efc81c0) at ../xf86drm.c:191
#2  0x6e457218 in amdgpu_ioctl_wait_cs (context=<optimized out>,
    context=<optimized out>, busy=<synthetic pointer>,
flags=<optimized out>,
    timeout_ns=18446744073709551615, handle=832, ring=<optimized out>,
    ip_instance=<optimized out>, ip=<optimized out>)
    at ../../amdgpu/amdgpu_cs.c:408
#3  amdgpu_cs_query_fence_status (fence=<optimized out>,
    timeout_ns=<optimized out>, flags=1, expired=0x7efc8228)
    at ../../amdgpu/amdgpu_cs.c:437
#4  0x6bf5b6ec in ?? () from /usr/lib/arm-linux-gnueabihf/dri/radeonsi_dri.so


-> Process tree with deadlock
ubuntu@localhost:~$ ps ax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:05 /sbin/init
    2 ?        S      0:00 [kthreadd]
    4 ?        I<     0:00 [kworker/0:0H]
    6 ?        I<     0:00 [mm_percpu_wq]
    7 ?        S      0:00 [ksoftirqd/0]
    8 ?        I      0:00 [rcu_sched]
    9 ?        I      0:00 [rcu_bh]
   10 ?        S      0:00 [migration/0]
   11 ?        S      0:00 [cpuhp/0]
   12 ?        S      0:00 [cpuhp/1]
   13 ?        S      0:00 [migration/1]
   14 ?        S      0:00 [ksoftirqd/1]
   16 ?        I<     0:00 [kworker/1:0H]
   17 ?        S      0:00 [kdevtmpfs]
   18 ?        I<     0:00 [netns]
   19 ?        I      0:00 [kworker/0:1]
   20 ?        I      0:00 [kworker/1:1]
   21 ?        S      0:00 [kauditd]
   22 ?        S      0:00 [khungtaskd]
   23 ?        S      0:00 [oom_reaper]
   24 ?        I<     0:00 [writeback]
   25 ?        S      0:00 [kcompactd0]
   26 ?        SN     0:00 [ksmd]
   27 ?        I<     0:00 [crypto]
   28 ?        I<     0:00 [kintegrityd]
   29 ?        I<     0:00 [kblockd]
   30 ?        I<     0:00 [ata_sff]
   31 ?        I<     0:00 [devfreq_wq]
   32 ?        I<     0:00 [watchdogd]
   44 ?        S      0:00 [kswapd0]
   45 ?        I<     0:00 [xfsalloc]
   46 ?        I<     0:00 [xfs_mru_cache]
   68 ?        I      0:00 [kworker/u4:1]
   78 ?        I<     0:00 [kthrotld]
   79 ?        S      0:00 [kapmd]
   80 ?        S      0:00 [scsi_eh_0]
   81 ?        I<     0:00 [scsi_tmf_0]
   82 ?        S      0:00 [scsi_eh_1]
   83 ?        I<     0:00 [scsi_tmf_1]
   84 ?        I      0:00 [kworker/u4:2]
   85 ?        S      0:00 [scsi_eh_2]
   86 ?        I<     0:00 [scsi_tmf_2]
   87 ?        S      0:00 [scsi_eh_3]
   88 ?        I<     0:00 [scsi_tmf_3]
   89 ?        D      0:00 [kworker/u4:3]
   91 ?        S      0:00 [spi1]
   96 ?        S      0:00 [irq/47-mmc0]
   98 ?        S      0:00 [irq/53-f10d8000]
   99 ?        S      0:00 [irq/42-f1090000]
  100 ?        S      0:00 [irq/43-f1090000]
  113 ?        I<     0:00 [ipv6_addrconf]
  114 ?        I      0:00 [kworker/0:2]
  143 ?        S      0:00 [mmcqd/0]
  144 ?        I<     0:00 [kworker/1:1H]
  145 ?        I<     0:00 [kworker/0:1H]
  146 ?        S      0:00 [jbd2/sda2-8]
  147 ?        I<     0:00 [ext4-rsv-conver]
  181 ?        Ss     0:00 /lib/systemd/systemd-journald
  213 ?        Ss     0:00 /lib/systemd/systemd-udevd
  243 ?        I<     0:00 [dsa_ordered]
  310 ?        I<     0:00 [ttm_swap]
  314 ?        S      0:00 [gfx]
  316 ?        S      0:00 [comp_1.0.0]
  317 ?        S      0:00 [comp_1.0.1]
  318 ?        S      0:00 [comp_1.0.2]
  319 ?        S      0:00 [comp_1.0.3]
  320 ?        S      0:00 [comp_1.0.4]
  321 ?        S      0:00 [jbd2/mmcblk0p1-]
  322 ?        S      0:00 [comp_1.0.5]
  323 ?        I<     0:00 [ext4-rsv-conver]
  324 ?        S      0:00 [comp_1.0.6]
  325 ?        S      0:00 [comp_1.0.7]
  326 ?        S      0:00 [sdma0]
  327 ?        S      0:00 [sdma1]
  328 ?        S      0:00 [uvd]
  329 ?        S      0:00 [uvd_enc0]
  330 ?        S      0:00 [uvd_enc1]
  331 ?        S      0:00 [vce0]
  332 ?        S      0:00 [vce1]
  334 ?        I<     0:00 [dm_timer_queue]
  347 ?        Ssl    0:00 /lib/systemd/systemd-timesyncd
  455 ?        Ss     0:00 /usr/bin/dbus-daemon --system
--address=systemd: --no
  456 ?        Ssl    0:00 /usr/sbin/NetworkManager --no-daemon
  457 ?        Ss     0:00 /usr/sbin/cupsd -l
  458 ?        Ssl    0:00 /usr/lib/accountsservice/accounts-daemon
  462 ?        Ss     0:00 /usr/sbin/atd -f
  464 ?        Ssl    0:00 /usr/sbin/ModemManager
  467 ?        Ss     0:00 /usr/sbin/cron -f
  475 ?        Ssl    0:00 /usr/lib/udisks2/udisksd
  483 ?        Ssl    0:00 /usr/sbin/rsyslogd -n
  487 ?        Ss     0:00 /lib/systemd/systemd-logind
  488 ?        Ss     0:00 avahi-daemon: running [linux.local]
  489 ?        S      0:00 avahi-daemon: chroot helper
  490 ?        Ssl    0:00 /usr/lib/snapd/snapd
  538 ?        Ssl    0:02 /usr/bin/tvheadend -f -u hts -g video
  563 ?        I      0:00 [kworker/1:3]
  565 ?        Ssl    0:00 /usr/sbin/cups-browsed
  581 ?        Ss     0:00 /lib/systemd/systemd-resolved
  582 ?        Ssl    0:00 /usr/lib/policykit-1/polkitd --no-debug
  586 ?        Ss     0:00 /usr/sbin/sshd -D
  619 ?        SLsl   0:00 /usr/sbin/lightdm
  630 ttyS0    Ss     0:00 /bin/login --
  638 tty7     Ssl+   0:03 /usr/lib/xorg/Xorg -core :0 -seat seat0
-auth /var/ru
  642 tty1     Ss+    0:00 /sbin/agetty --noclear tty1 linux
  737 ?        Ssl    0:00 /usr/lib/upower/upowerd
  758 ?        Sl     0:00 lightdm --session-child 13 20
  762 ?        Ssl    0:00 /usr/bin/whoopsie -f
  769 ?        Ss     0:00 /usr/sbin/kerneloops
  778 ?        Ss     0:00 /lib/systemd/systemd --user
  779 ?        S      0:00 (sd-pam)
  788 ?        Sl     0:00 /usr/bin/gnome-keyring-daemon --daemonize
--login
  791 ?        Ssl    0:00 mate-session
  976 ?        Ss     0:00 /usr/bin/dbus-daemon --session
--address=systemd: --n
 1099 ?        Ss     0:00 /usr/bin/ssh-agent /usr/bin/im-launch
mate-session
 1168 ?        Ssl    0:00 /usr/lib/gvfs/gvfsd
 1173 ?        Sl     0:00 /usr/lib/gvfs/gvfsd-fuse
/run/user/1000/gvfs -f -o bi
 1183 ?        Sl     0:00 /usr/lib/dconf/dconf-service
 1189 ?        Sl     0:00 /usr/bin/mate-settings-daemon
 1193 ?        Sl     0:00 marco
 1201 ?        S<l    0:00 /usr/bin/pulseaudio --start
--log-target=syslog
 1202 ?        SNsl   0:00 /usr/lib/rtkit/rtkit-daemon
 1205 ?        Sl     0:01 mate-panel
 1220 ?        Sl     0:02 caja
 1224 ?        Sl     0:00 /usr/lib/mate-panel/wnck-applet
 1225 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-udisks2-volume-monitor
 1232 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-goa-volume-monitor
 1243 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-afc-volume-monitor
 1248 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-gphoto2-volume-monitor
 1252 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-mtp-volume-monitor
 1258 ?        Sl     0:00 /usr/lib/mate-applets/trashapplet
 1262 ?        Sl     0:00 /usr/lib/mate-panel/clock-applet
 1263 ?        Sl     0:00
/usr/lib/mate-panel/notification-area-applet
 1267 ?        Sl     0:00 mate-screensaver
 1269 ?        Sl     0:00 /usr/bin/python3
/usr/share/system-config-printer/app
 1272 ?        Sl     0:00 mate-maximus
 1274 ?        Sl     0:00 nm-applet
 1281 ?        Sl     0:00 mate-power-manager
 1283 ?        Sl     0:00 kerneloops-applet
 1285 ?        Sl     0:00 /usr/lib/at-spi2-core/at-spi-bus-launcher
--launch-im
 1289 ?        Sl     0:00 update-notifier
 1291 ?        Sl     0:01 /usr/bin/python3 /usr/bin/blueman-applet
 1292 ?        Sl     0:00
/usr/lib/arm-linux-gnueabihf/deja-dup/deja-dup-monito
 1293 ?        Sl     0:00 mate-volume-control-applet
 1296 ?        Sl     0:02 /usr/bin/python3 /usr/bin/onboard
 1306 ?        Sl     0:00
/usr/lib/arm-linux-gnueabihf/polkit-mate/polkit-mate-
 1314 ?        S      0:00 /usr/bin/dbus-daemon
--config-file=/usr/share/default
 1327 ?        Sl     0:00 /usr/lib/at-spi2-core/at-spi2-registryd
--use-gnome-s
 1342 ?        Sl     0:00 /usr/lib/gvfs/gvfsd-trash --spawner :1.6
/org/gtk/gvf
 1395 ?        S      0:00 [jbd2/sda3-8]
 1396 ?        I<     0:00 [ext4-rsv-conver]
 1397 ?        S      0:00 [jbd2/mmcblk0p2-]
 1398 ?        I<     0:00 [ext4-rsv-conver]
 1413 ?        Ssl    0:00 /usr/lib/gvfs/gvfsd-metadata
 1438 ?        S      0:00 /bin/sh /usr/bin/kodi
 1442 ?        Sl     0:44 /usr/lib/arm-linux-gnueabihf/kodi/kodi.bin
 1484 ?        Ss     0:00 /usr/lib/bluetooth/obexd
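
In a listing this long the stuck task is easy to miss. The sketch below picks out processes in uninterruptible sleep (a STAT value starting with `D`) from saved `ps ax` output. It assumes the five-column layout shown above, and `uninterruptible` is a hypothetical helper name.

```python
def uninterruptible(ps_output):
    """Return (pid, command) for every process whose STAT starts with 'D'."""
    hits = []
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # PID TTY STAT TIME COMMAND
        # Skip the header and the wrapped continuation lines of long commands.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        command = fields[4] if len(fields) == 5 else ""
        if fields[2].startswith("D"):
            hits.append((int(fields[0]), command))
    return hits


sample = """\
  PID TTY      STAT   TIME COMMAND
   88 ?        I<     0:00 [scsi_tmf_3]
   89 ?        D      0:00 [kworker/u4:3]
 1201 ?        S<l    0:00 /usr/bin/pulseaudio --start
--log-target=syslog
 1442 ?        Sl     0:44 /usr/lib/arm-linux-gnueabihf/kodi/kodi.bin
"""
print(uninterruptible(sample))  # [(89, '[kworker/u4:3]')]
```

Applied to the listing above, the only match is PID 89, `[kworker/u4:3]`, the same worker named in the second hung-task report.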


Kernel amdgpu initialization:
[   12.865468] [drm] amdgpu kernel modesetting enabled.
[   12.865699] amdgpu 0000:01:00.0: enabling device (0140 -> 0143)
[   12.866422] [drm] initializing kernel modesetting (POLARIS11
0x1002:0x67EF 0x174B:0xE348 0xCF).
[   12.866434] [drm] register mmio base: 0xE0200000
[   12.866436] [drm] register mmio size: 262144
[   12.866463] [drm] probing gen 2 caps for device 11ab:6828 = 3ac12/0
[   12.866466] [drm] probing mlw for device 11ab:6828 = 3ac12
[   12.866482] [drm] UVD is enabled in VM mode
[   12.866483] [drm] UVD ENC is enabled in VM mode
[   12.866487] [drm] VCE enabled in VM mode
[   13.083379] ATOM BIOS: 113-34801-U03
[   13.083412] [drm] GPU posting now...
[   13.300951] [drm] vm size is 64 GB, 2 levels, block size is 10-bit,
fragment size is 9-bit
[   13.302750] amdgpu 0000:01:00.0: VRAM: 2048M 0x000000F400000000 -
0x000000F47FFFFFFF (2048M used)
[   13.302756] amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 -
0x000000000FFFFFFF
[   13.302759] [drm] Detected VRAM RAM=2048M, BAR=256M
[   13.302761] [drm] RAM width 128bits GDDR5
[   13.307710] [TTM] Zone  kernel: Available graphics memory: 510748
kiB
[   13.307714] [TTM] Initializing pool allocator
[   13.307776] [drm] amdgpu: 2048M of VRAM memory ready
[   13.307781] [drm] amdgpu: 748M of GTT memory ready.
[   13.307828] [drm] GART: num cpu pages 65536, num gpu pages 65536
[   13.307910] [drm] PCIE GART of 256M enabled (table at
0x000000F400040000).
[   13.309597] [drm] Chained IB support enabled!
[   13.341215] [drm] Found UVD firmware Version: 1.79 Family ID: 16
[   13.359302] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[   13.427998] amdgpu: [powerplay]
                failed to send message 309 ret is 254
[   13.428020] amdgpu: [powerplay]
                failed to send pre message 14e ret is 254
[   13.436878] [drm:dal_bios_parser_init_cmd_tbl [amdgpu]] *ERROR*
Don't have enable_spread_spectrum_on_ppll for v4
[   13.447496] [drm:dal_bios_parser_init_cmd_tbl [amdgpu]] *ERROR*
Don't have program_clock for v7
[   13.456402] [drm] DM_PPLIB: values for Engine clock
[   13.456405] [drm] DM_PPLIB:   21400
[   13.456406] [drm] DM_PPLIB:   48100
[   13.456407] [drm] DM_PPLIB:   76000
[   13.456408] [drm] DM_PPLIB:   102000
[   13.456409] [drm] DM_PPLIB:   110200
[   13.456410] [drm] DM_PPLIB:   113800
[   13.456411] [drm] DM_PPLIB:   117200
[   13.456412] [drm] DM_PPLIB:   121000
[   13.456413] [drm] DM_PPLIB: Warning: using default validation
clocks!
[   13.456414] [drm] DM_PPLIB: Validation clocks:
[   13.456416] [drm] DM_PPLIB:    engine_max_clock: 72000
[   13.456417] [drm] DM_PPLIB:    memory_max_clock: 80000
[   13.456418] [drm] DM_PPLIB:    level           : 0
[   13.456419] [drm] DM_PPLIB: reducing engine clock level from 8 to 2
[   13.456423] [drm] DM_PPLIB: values for Memory clock
[   13.456425] [drm] DM_PPLIB:   30000
[   13.456425] [drm] DM_PPLIB:   175000
[   13.456427] [drm] DM_PPLIB: Warning: using default validation
clocks!
[   13.456427] [drm] DM_PPLIB: Validation clocks:
[   13.456428] [drm] DM_PPLIB:    engine_max_clock: 72000
[   13.456429] [drm] DM_PPLIB:    memory_max_clock: 80000
[   13.456430] [drm] DM_PPLIB:    level           : 0
[   13.456432] [drm] DM_PPLIB: reducing memory clock level from 2 to 1
[   13.457440] [drm] Display Core initialized with v3.1.27!
[   13.664292] [drm] Supports vblank timestamp caching Rev 2
(21.10.2013).
[   13.664295] [drm] Driver supports precise vblank timestamp query.
[   13.703612] [drm] UVD and UVD ENC initialized successfully.
[   13.805731] [drm] VCE initialized successfully.
[   14.316289] [drm] fb mappable at 0xD03F2000
[   14.316291] [drm] vram apper at 0xD0000000
[   14.316293] [drm] size 3145728
[   14.316294] [drm] fb depth is 24
[   14.316296] [drm]    pitch is 4096
[   14.336250] Console: switching to colour frame buffer device 128x48
[   14.396964] amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer
device
[   14.469610] [drm] Initialized amdgpu 3.23.0 20150101 for
0000:01:00.0 on minor 0
[   14.470097] snd_hda_intel 0000:01:00.1: enabling device (0140 ->
0142)
[   14.470109] snd_hda_intel 0000:01:00.1: Force to snoop mode by
module option
[   14.535472] input: HDA ATI HDMI HDMI/DP,pcm=3 as
/devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input3
[   14.535606] input: HDA ATI HDMI HDMI/DP,pcm=7 as
/devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input4
[   14.535730] input: HDA ATI HDMI HDMI/DP,pcm=8 as
/devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input5
[   14.535845] input: HDA ATI HDMI HDMI/DP,pcm=9 as
/devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input6
[   14.535971] input: HDA ATI HDMI HDMI/DP,pcm=10 as
/devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input7
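
As a sanity check on the memory figures in this initialization log, the reported address ranges can be converted back to sizes. This is only arithmetic on the values printed above; the 4 KiB GPU page size is an assumption on my part, and `span_mib` is a hypothetical helper.

```python
MIB = 1024 * 1024


def span_mib(start, end_inclusive):
    """Size in MiB of an inclusive address range."""
    return (end_inclusive - start + 1) // MIB


vram_mib = span_mib(0x000000F400000000, 0x000000F47FFFFFFF)
gtt_mib = span_mib(0x0000000000000000, 0x000000000FFFFFFF)
gart_mib = 65536 * 4096 // MIB  # "num gpu pages 65536", assuming 4 KiB pages
fb_rows = 3145728 // 4096       # framebuffer "size" divided by "pitch"

print(vram_mib, gtt_mib, gart_mib, fb_rows)  # 2048 256 256 768
```

The ranges agree with the sizes the driver prints (2048M VRAM, 256M GTT aperture, 256M PCIE GART), so the address map itself looks self-consistent on this board.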
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
@ 2018-01-02  2:51   ` Chunming Zhou
From: Chunming Zhou @ 2018-01-02  2:51 UTC
  To: Luís Mendes, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	alexander.deucher-5C7GfCeVMHo, christian.koenig-5C7GfCeVMHo

Did you try it on an x86 board? Does the same issue occur there?

We should identify whether it is ARM-specific or a general issue for the amdgpu driver.


Thanks,

David Zhou


On 2018-01-02 00:32, Luís Mendes wrote:
> I am currently testing the amdgpu driver with AMD RX460 and RX550
> graphics cards on an ARM Cortex-A9 with 1GB RAM and I am consistently
> getting deadlocks when playing videos with Kodi or other applications.
>
> I'm using the Linux kernel from
> https://cgit.freedesktop.org/~agd5f/linux/, branch drm-next-4.16 at
> commit "drm/amdgpu: Correct the IB size of bo update mapping" -
> 104bd2ca1124dfd9aa904d5f5a96253ef2b580f6, along with libdrm-2.4.89 and
> mesa-17.3.1, on Ubuntu 17.10 with the MATE desktop and the LightDM
> session manager over X11.
>
>
> The deadlocks are sometimes almost immediate and sometimes take about
> half an hour to occur. Some of the video files I am using for testing
> are more likely to cause a deadlock than others.
>
> I collected some kernel crash dumps, Kodi process backtraces for the
> offending thread, and a process tree listing from the deadlocked
> system, all included below. The kernel seems to deadlock during a page
> flip, waiting indefinitely for the DMA fence to complete. The fence
> never signals and the wait timeout never expires either, so this may
> be a GPU lockup.
>
> I can provide more details if anyone has the interest or time to look
> into this.
>
> Regards,
> Luís Mendes
> Software and Hardware engineer
>
> [  253.904103] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> timeout, last signaled seq=43831, last emitted seq=43833
> [  253.915041] [drm] IP block:gmc_v8_0 is hung!
> [  253.915047] [drm] IP block:gfx_v8_0 is hung!
> [  253.915162] [drm] GPU recovery disabled.
> [  366.541614] INFO: task kworker/u4:4:90 blocked for more than 120
> seconds.
> [  366.548436]       Not tainted 4.15.0-rc4-drmnext2g #1
> [  366.554300] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  366.562162] kworker/u4:4    D    0    90      2 0x00000000
> [  366.562196] Workqueue: events_unbound commit_work [drm_kms_helper]
> [  366.562215] [<80b8c6d4>] (__schedule) from [<80b8cdd0>]
> (schedule+0x4c/0xac)
> [  366.562223] [<80b8cdd0>] (schedule) from [<80b91024>]
> (schedule_timeout+0x228/0x444)
> [  366.562233] [<80b91024>] (schedule_timeout) from [<80886738>]
> (dma_fence_default_wait+0x2b4/0x2d8)
> [  366.562241] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
> (dma_fence_wait_timeout+0x40/0x150)
> [  366.562248] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
> (reservation_object_wait_timeout_rcu+0xfc/0x34c)
> [  366.562476] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
> [<7f2d3988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
> [  366.562754] [<7f2d3988>] (amdgpu_dm_do_flip [amdgpu]) from
> [<7f2d509c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
> [  366.562908] [<7f2d509c>] (amdgpu_dm_atomic_commit_tail [amdgpu])
> from [<7f13e58c>] (commit_tail+0x50/0x94 [drm_kms_helper])
> [  366.562931] [<7f13e58c>] (commit_tail [drm_kms_helper]) from
> [<7f13e5ec>] (commit_work+0x1c/0x20 [drm_kms_helper])
> [  366.562948] [<7f13e5ec>] (commit_work [drm_kms_helper]) from
> [<8016f4c8>] (process_one_work+0x1a8/0x4ac)
> [  366.562955] [<8016f4c8>] (process_one_work) from [<8017050c>]
> (worker_thread+0x68/0x598)
> [  366.562962] [<8017050c>] (worker_thread) from [<80175e50>]
> (kthread+0x16c/0x174)
> [  366.562970] [<80175e50>] (kthread) from [<80109de8>]
> (ret_from_fork+0x14/0x2c)
>
>
> From the userland side:
> (gdb) info thread
>    Id   Target Id         Frame
> * 1    Thread 0x6eb17c70 (LWP 2071) "kodi.bin" 0x748b2246 in ioctl ()
>      at ../sysdeps/unix/syscall-template.S:84
>    2    Thread 0x6eb14170 (LWP 2072) "Announce" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    3    Thread 0x6e1ff170 (LWP 2075) "ActiveAE" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    4    Thread 0x6d9ff170 (LWP 2076) "AESink" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    5    Thread 0x6b7c9170 (LWP 2081) "amdgpu_cs:0" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    6    Thread 0x6ae3c170 (LWP 2082) "disk_cache:0" __libc_do_syscall
> ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    7    Thread 0x571df170 (LWP 2083) "si_shader:0" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    8    Thread 0x569df170 (LWP 2084) "si_shader_low:0"
> __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    9    Thread 0x561df170 (LWP 2085) "gallium_drv:0" __libc_do_syscall
> ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    10   Thread 0x551f6170 (LWP 2086) "kodi.bin" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    11   Thread 0x549f6170 (LWP 2087) "PeripBusUSBUdev"
> __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    12   Thread 0x541f6170 (LWP 2088) "PeripBusCEC" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    13   Thread 0x539f6170 (LWP 2089) "PeripBusAddon" __libc_do_syscall
> ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    14   Thread 0x531f6170 (LWP 2090) "kodi.bin" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    15   Thread 0x510ff170 (LWP 2095) "Timer" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    16   Thread 0x508ff170 (LWP 2096) "kodi.bin" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    17   Thread 0x500ff170 (LWP 2097) "kodi.bin" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    18   Thread 0x4f8b5170 (LWP 2098) "kodi.bin" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    19   Thread 0x4f0b5170 (LWP 2099) "kodi.bin" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    20   Thread 0x4e8b5170 (LWP 2100) "kodi.bin" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    21   Thread 0x4e0b5170 (LWP 2101) "EventServer" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    22   Thread 0x4d8b5170 (LWP 2102) "TCPServer" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    23   Thread 0x4c9a8170 (LWP 2103) "JobWorker" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    24   Thread 0x4c1a8170 (LWP 2104) "JobWorker" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    25   Thread 0x4b9a8170 (LWP 2108) "PVRManager" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    26   Thread 0x4b1a8170 (LWP 2114) "PVRGUIInfo" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    27   Thread 0x4a9a8170 (LWP 2115) "EPGUpdater" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    28   Thread 0x4a1a8170 (LWP 2119) "VideoPlayer" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    29   Thread 0x499a8170 (LWP 2120) "VideoPlayer" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    30   Thread 0x491a8170 (LWP 2121) "VideoPlayer" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    31   Thread 0x489a8170 (LWP 2122) "VideoPlayer" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    32   Thread 0x481a8170 (LWP 2123) "VideoPlayer" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    33   Thread 0x479a8170 (LWP 2124) "VideoPlayer" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    34   Thread 0x46d59170 (LWP 2302) "VideoPlayer" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    35   Thread 0x46559170 (LWP 2303) "VideoPlayer" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>    36   Thread 0x45d59170 (LWP 2304) "VideoPlayer" __libc_do_syscall ()
>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>
> 0x748b2246 in ioctl () at ../sysdeps/unix/syscall-template.S:84
> 84      ../sysdeps/unix/syscall-template.S: No such file or directory.
> (gdb) bt
> #0  0x748b2246 in ioctl () at ../sysdeps/unix/syscall-template.S:84
> #1  0x7605c096 in drmIoctl (fd=22, request=3223348297,
>      arg=arg@entry=0x7ec1b218) at ../xf86drm.c:191
> #2  0x6b875218 in amdgpu_ioctl_wait_cs (context=<optimized out>,
>      context=<optimized out>, busy=<synthetic pointer>,
> flags=<optimized out>,
>      timeout_ns=18446744073709551615, handle=10555, ring=<optimized
> out>,
>      ip_instance=<optimized out>, ip=<optimized out>)
>      at ../../amdgpu/amdgpu_cs.c:408
> #3  amdgpu_cs_query_fence_status (fence=<optimized out>,
>      timeout_ns=<optimized out>, flags=1, expired=0x7ec1b280)
>      at ../../amdgpu/amdgpu_cs.c:437
> #4  0x6be7866e in ?? () from /usr/lib/arm-linux-gnueabihf/dri/radeonsi_dri.so
>      ...
>
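As a side note, the request and timeout_ns values in the backtrace above can be decoded by hand. The following is a minimal sketch, assuming the generic Linux _IOC bit layout (which 32-bit ARM uses) and the constants DRM_COMMAND_BASE = 0x40 and DRM_AMDGPU_WAIT_CS = 0x09 as I recall them from drm.h and amdgpu_drm.h. decode_ioctl is a hypothetical helper name, not part of libdrm.

```python
# Decode a Linux ioctl request number using the generic _IOC bit layout:
# nr in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31.

DRM_COMMAND_BASE = 0x40      # assumed value from drm.h
DRM_AMDGPU_WAIT_CS = 0x09    # assumed value from amdgpu_drm.h


def decode_ioctl(request: int) -> dict:
    """Split an ioctl request number into its four encoded fields."""
    return {
        "nr": request & 0xFF,
        "type": chr((request >> 8) & 0xFF),
        "size": (request >> 16) & 0x3FFF,
        "dir": (request >> 30) & 0x3,   # 1 = write, 2 = read, 3 = both
    }


fields = decode_ioctl(3223348297)       # request value from frame #1
driver_cmd = fields["nr"] - DRM_COMMAND_BASE
infinite_wait = 18446744073709551615 == 2**64 - 1   # timeout_ns from frame #2

print(hex(3223348297), fields)
print("driver command:", hex(driver_cmd))
print("timeout is all-ones 64-bit value:", infinite_wait)
```

If those assumptions hold, the request is the amdgpu WAIT_CS ioctl with a 32-byte argument, and timeout_ns is the all-ones 64-bit value, so userland is waiting on the fence without any timeout.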
> ->Another reboot, another crash - right at the beginning of playing a video with Kodi
>
> [   73.432967] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> timeout, last signaled seq=4183, last emitted seq=4185
> [   73.443847] [drm] IP block:gmc_v8_0 is hung!
> [   73.443854] [drm] IP block:gfx_v8_0 is hung!
> [   73.444019] [drm] GPU recovery disabled.
> [  243.672640] INFO: task kworker/u4:3:89 blocked for more than 120
> seconds.
> [  243.679466]       Not tainted 4.15.0-rc4-drmnext2g #1
> [  243.685337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  243.693200] kworker/u4:3    D    0    89      2 0x00000000
> [  243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
> [  243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>]
> (schedule+0x4c/0xac)
> [  243.693259] [<80b8cdd0>] (schedule) from [<80b91024>]
> (schedule_timeout+0x228/0x444)
> [  243.693270] [<80b91024>] (schedule_timeout) from [<80886738>]
> (dma_fence_default_wait+0x2b4/0x2d8)
> [  243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
> (dma_fence_wait_timeout+0x40/0x150)
> [  243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
> (reservation_object_wait_timeout_rcu+0xfc/0x34c)
> [  243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
> [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
> [  243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from
> [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
> [  243.693941] [<7f33309c>] (amdgpu_dm_atomic_commit_tail [amdgpu])
> from [<7f15758c>] (commit_tail+0x50/0x94 [drm_kms_helper])
> [  243.693964] [<7f15758c>] (commit_tail [drm_kms_helper]) from
> [<7f1575ec>] (commit_work+0x1c/0x20 [drm_kms_helper])
> [  243.693981] [<7f1575ec>] (commit_work [drm_kms_helper]) from
> [<8016f4c8>] (process_one_work+0x1a8/0x4ac)
> [  243.693987] [<8016f4c8>] (process_one_work) from [<8017050c>]
> (worker_thread+0x68/0x598)
> [  243.693994] [<8017050c>] (worker_thread) from [<80175e50>]
> (kthread+0x16c/0x174)
> [  243.694003] [<80175e50>] (kthread) from [<80109de8>]
> (ret_from_fork+0x14/0x2c)
>
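The two sequence numbers in the ring timeout message above show how many submissions were still outstanding when the job timed out. Below is a small sketch for pulling them out of a pasted log, assuming the message format stays as shown here and that the sequence number increases by one per submission. parse_ring_timeout is a hypothetical helper name.

```python
import re

# Matches the amdgpu_job_timedout message after whitespace has been collapsed.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"last signaled seq=(?P<signaled>\d+), "
    r"last emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(text: str):
    """Return ring name, both sequence numbers and their difference, or None."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    match = TIMEOUT_RE.search(flat)
    if match is None:
        return None
    signaled = int(match.group("signaled"))
    emitted = int(match.group("emitted"))
    return {
        "ring": match.group("ring"),
        "signaled": signaled,
        "emitted": emitted,
        "pending": emitted - signaled,
    }


sample = """[   73.432967] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, last signaled seq=4183, last emitted seq=4185"""

info = parse_ring_timeout(sample)
print(info)
```

For both crashes in this report the difference is 2, i.e. the gfx ring stopped signaling with two fences still pending.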
> bt
> 0x7480c246 in ioctl () at ../sysdeps/unix/syscall-template.S:84
> 84      ../sysdeps/unix/syscall-template.S: No such file or directory.
> (gdb) bt
> #0  0x7480c246 in ioctl () at ../sysdeps/unix/syscall-template.S:84
> #1  0x76068096 in drmIoctl (fd=22, request=3223348297,
>      arg=arg@entry=0x7efc81c0) at ../xf86drm.c:191
> #2  0x6e457218 in amdgpu_ioctl_wait_cs (context=<optimized out>,
>      context=<optimized out>, busy=<synthetic pointer>,
> flags=<optimized out>,
>      timeout_ns=18446744073709551615, handle=832, ring=<optimized out>,
>      ip_instance=<optimized out>, ip=<optimized out>)
>      at ../../amdgpu/amdgpu_cs.c:408
> #3  amdgpu_cs_query_fence_status (fence=<optimized out>,
>      timeout_ns=<optimized out>, flags=1, expired=0x7efc8228)
>      at ../../amdgpu/amdgpu_cs.c:437
> #4  0x6bf5b6ec in ?? () from /usr/lib/arm-linux-gnueabihf/dri/radeonsi_dri.so
>
>
> ->Process tree with Deadlock
> ubuntu@localhost:~$ ps ax
>    PID TTY      STAT   TIME COMMAND
>      1 ?        Ss     0:05 /sbin/init
>      2 ?        S      0:00 [kthreadd]
>      4 ?        I<     0:00 [kworker/0:0H]
>      6 ?        I<     0:00 [mm_percpu_wq]
>      7 ?        S      0:00 [ksoftirqd/0]
>      8 ?        I      0:00 [rcu_sched]
>      9 ?        I      0:00 [rcu_bh]
>     10 ?        S      0:00 [migration/0]
>     11 ?        S      0:00 [cpuhp/0]
>     12 ?        S      0:00 [cpuhp/1]
>     13 ?        S      0:00 [migration/1]
>     14 ?        S      0:00 [ksoftirqd/1]
>     16 ?        I<     0:00 [kworker/1:0H]
>     17 ?        S      0:00 [kdevtmpfs]
>     18 ?        I<     0:00 [netns]
>     19 ?        I      0:00 [kworker/0:1]
>     20 ?        I      0:00 [kworker/1:1]
>     21 ?        S      0:00 [kauditd]
>     22 ?        S      0:00 [khungtaskd]
>     23 ?        S      0:00 [oom_reaper]
>     24 ?        I<     0:00 [writeback]
>     25 ?        S      0:00 [kcompactd0]
>     26 ?        SN     0:00 [ksmd]
>     27 ?        I<     0:00 [crypto]
>     28 ?        I<     0:00 [kintegrityd]
>     29 ?        I<     0:00 [kblockd]
>     30 ?        I<     0:00 [ata_sff]
>     31 ?        I<     0:00 [devfreq_wq]
>     32 ?        I<     0:00 [watchdogd]
>     44 ?        S      0:00 [kswapd0]
>     45 ?        I<     0:00 [xfsalloc]
>     46 ?        I<     0:00 [xfs_mru_cache]
>     68 ?        I      0:00 [kworker/u4:1]
>     78 ?        I<     0:00 [kthrotld]
>     79 ?        S      0:00 [kapmd]
>     80 ?        S      0:00 [scsi_eh_0]
>     81 ?        I<     0:00 [scsi_tmf_0]
>     82 ?        S      0:00 [scsi_eh_1]
>     83 ?        I<     0:00 [scsi_tmf_1]
>     84 ?        I      0:00 [kworker/u4:2]
>     85 ?        S      0:00 [scsi_eh_2]
>     86 ?        I<     0:00 [scsi_tmf_2]
>     87 ?        S      0:00 [scsi_eh_3]
>     88 ?        I<     0:00 [scsi_tmf_3]
>     89 ?        D      0:00 [kworker/u4:3]
>     91 ?        S      0:00 [spi1]
>     96 ?        S      0:00 [irq/47-mmc0]
>     98 ?        S      0:00 [irq/53-f10d8000]
>     99 ?        S      0:00 [irq/42-f1090000]
>    100 ?        S      0:00 [irq/43-f1090000]
>    113 ?        I<     0:00 [ipv6_addrconf]
>    114 ?        I      0:00 [kworker/0:2]
>    143 ?        S      0:00 [mmcqd/0]
>    144 ?        I<     0:00 [kworker/1:1H]
>    145 ?        I<     0:00 [kworker/0:1H]
>    146 ?        S      0:00 [jbd2/sda2-8]
>    147 ?        I<     0:00 [ext4-rsv-conver]
>    181 ?        Ss     0:00 /lib/systemd/systemd-journald
>    213 ?        Ss     0:00 /lib/systemd/systemd-udevd
>    243 ?        I<     0:00 [dsa_ordered]
>    310 ?        I<     0:00 [ttm_swap]
>    314 ?        S      0:00 [gfx]
>    316 ?        S      0:00 [comp_1.0.0]
>    317 ?        S      0:00 [comp_1.0.1]
>    318 ?        S      0:00 [comp_1.0.2]
>    319 ?        S      0:00 [comp_1.0.3]
>    320 ?        S      0:00 [comp_1.0.4]
>    321 ?        S      0:00 [jbd2/mmcblk0p1-]
>    322 ?        S      0:00 [comp_1.0.5]
>    323 ?        I<     0:00 [ext4-rsv-conver]
>    324 ?        S      0:00 [comp_1.0.6]
>    325 ?        S      0:00 [comp_1.0.7]
>    326 ?        S      0:00 [sdma0]
>    327 ?        S      0:00 [sdma1]
>    328 ?        S      0:00 [uvd]
>    329 ?        S      0:00 [uvd_enc0]
>    330 ?        S      0:00 [uvd_enc1]
>    331 ?        S      0:00 [vce0]
>    332 ?        S      0:00 [vce1]
>    334 ?        I<     0:00 [dm_timer_queue]
>    347 ?        Ssl    0:00 /lib/systemd/systemd-timesyncd
>    455 ?        Ss     0:00 /usr/bin/dbus-daemon --system
> --address=systemd: --no
>    456 ?        Ssl    0:00 /usr/sbin/NetworkManager --no-daemon
>    457 ?        Ss     0:00 /usr/sbin/cupsd -l
>    458 ?        Ssl    0:00 /usr/lib/accountsservice/accounts-daemon
>    462 ?        Ss     0:00 /usr/sbin/atd -f
>    464 ?        Ssl    0:00 /usr/sbin/ModemManager
>    467 ?        Ss     0:00 /usr/sbin/cron -f
>    475 ?        Ssl    0:00 /usr/lib/udisks2/udisksd
>    483 ?        Ssl    0:00 /usr/sbin/rsyslogd -n
>    487 ?        Ss     0:00 /lib/systemd/systemd-logind
>    488 ?        Ss     0:00 avahi-daemon: running [linux.local]
>    489 ?        S      0:00 avahi-daemon: chroot helper
>    490 ?        Ssl    0:00 /usr/lib/snapd/snapd
>    538 ?        Ssl    0:02 /usr/bin/tvheadend -f -u hts -g video
>    563 ?        I      0:00 [kworker/1:3]
>    565 ?        Ssl    0:00 /usr/sbin/cups-browsed
>    581 ?        Ss     0:00 /lib/systemd/systemd-resolved
>    582 ?        Ssl    0:00 /usr/lib/policykit-1/polkitd --no-debug
>    586 ?        Ss     0:00 /usr/sbin/sshd -D
>    619 ?        SLsl   0:00 /usr/sbin/lightdm
>    630 ttyS0    Ss     0:00 /bin/login --
>    638 tty7     Ssl+   0:03 /usr/lib/xorg/Xorg -core :0 -seat seat0
> -auth /var/ru
>    642 tty1     Ss+    0:00 /sbin/agetty --noclear tty1 linux
>    737 ?        Ssl    0:00 /usr/lib/upower/upowerd
>    758 ?        Sl     0:00 lightdm --session-child 13 20
>    762 ?        Ssl    0:00 /usr/bin/whoopsie -f
>    769 ?        Ss     0:00 /usr/sbin/kerneloops
>    778 ?        Ss     0:00 /lib/systemd/systemd --user
>    779 ?        S      0:00 (sd-pam)
>    788 ?        Sl     0:00 /usr/bin/gnome-keyring-daemon --daemonize
> --login
>    791 ?        Ssl    0:00 mate-session
>    976 ?        Ss     0:00 /usr/bin/dbus-daemon --session
> --address=systemd: --n
>   1099 ?        Ss     0:00 /usr/bin/ssh-agent /usr/bin/im-launch
> mate-session
>   1168 ?        Ssl    0:00 /usr/lib/gvfs/gvfsd
>   1173 ?        Sl     0:00 /usr/lib/gvfs/gvfsd-fuse
> /run/user/1000/gvfs -f -o bi
>   1183 ?        Sl     0:00 /usr/lib/dconf/dconf-service
>   1189 ?        Sl     0:00 /usr/bin/mate-settings-daemon
>   1193 ?        Sl     0:00 marco
>   1201 ?        S<l    0:00 /usr/bin/pulseaudio --start
> --log-target=syslog
>   1202 ?        SNsl   0:00 /usr/lib/rtkit/rtkit-daemon
>   1205 ?        Sl     0:01 mate-panel
>   1220 ?        Sl     0:02 caja
>   1224 ?        Sl     0:00 /usr/lib/mate-panel/wnck-applet
>   1225 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-udisks2-volume-monitor
>   1232 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-goa-volume-monitor
>   1243 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-afc-volume-monitor
>   1248 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-gphoto2-volume-monitor
>   1252 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-mtp-volume-monitor
>   1258 ?        Sl     0:00 /usr/lib/mate-applets/trashapplet
>   1262 ?        Sl     0:00 /usr/lib/mate-panel/clock-applet
>   1263 ?        Sl     0:00
> /usr/lib/mate-panel/notification-area-applet
>   1267 ?        Sl     0:00 mate-screensaver
>   1269 ?        Sl     0:00 /usr/bin/python3
> /usr/share/system-config-printer/app
>   1272 ?        Sl     0:00 mate-maximus
>   1274 ?        Sl     0:00 nm-applet
>   1281 ?        Sl     0:00 mate-power-manager
>   1283 ?        Sl     0:00 kerneloops-applet
>   1285 ?        Sl     0:00 /usr/lib/at-spi2-core/at-spi-bus-launcher
> --launch-im
>   1289 ?        Sl     0:00 update-notifier
>   1291 ?        Sl     0:01 /usr/bin/python3 /usr/bin/blueman-applet
>   1292 ?        Sl     0:00
> /usr/lib/arm-linux-gnueabihf/deja-dup/deja-dup-monito
>   1293 ?        Sl     0:00 mate-volume-control-applet
>   1296 ?        Sl     0:02 /usr/bin/python3 /usr/bin/onboard
>   1306 ?        Sl     0:00
> /usr/lib/arm-linux-gnueabihf/polkit-mate/polkit-mate-
>   1314 ?        S      0:00 /usr/bin/dbus-daemon
> --config-file=/usr/share/default
>   1327 ?        Sl     0:00 /usr/lib/at-spi2-core/at-spi2-registryd
> --use-gnome-s
>   1342 ?        Sl     0:00 /usr/lib/gvfs/gvfsd-trash --spawner :1.6
> /org/gtk/gvf
>   1395 ?        S      0:00 [jbd2/sda3-8]
>   1396 ?        I<     0:00 [ext4-rsv-conver]
>   1397 ?        S      0:00 [jbd2/mmcblk0p2-]
>   1398 ?        I<     0:00 [ext4-rsv-conver]
>   1413 ?        Ssl    0:00 /usr/lib/gvfs/gvfsd-metadata
>   1438 ?        S      0:00 /bin/sh /usr/bin/kodi
>   1442 ?        Sl     0:44 /usr/lib/arm-linux-gnueabihf/kodi/kodi.bin
>   1484 ?        Ss     0:00 /usr/lib/bluetooth/obexd
>
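In the listing above, the only task in uninterruptible sleep (STAT starting with D) is the kworker from the hung-task report. Below is a sketch for picking such tasks out of saved "ps ax" output, assuming the five-column layout shown above. blocked_tasks is a hypothetical helper name.

```python
def blocked_tasks(ps_output: str):
    """Return (pid, command) pairs for tasks whose STAT starts with 'D'."""
    blocked = []
    for line in ps_output.splitlines():
        parts = line.split(None, 4)
        # Skip the header and any wrapped continuation lines.
        if len(parts) < 5 or not parts[0].isdigit():
            continue
        pid, _tty, stat, _time, command = parts
        if stat.startswith("D"):
            blocked.append((int(pid), command))
    return blocked


sample = """   PID TTY      STAT   TIME COMMAND
     1 ?        Ss     0:05 /sbin/init
    88 ?        I<     0:00 [scsi_tmf_3]
    89 ?        D      0:00 [kworker/u4:3]
   455 ?        Ss     0:00 /usr/bin/dbus-daemon --system
--address=systemd: --no
  1442 ?        Sl     0:44 /usr/lib/arm-linux-gnueabihf/kodi/kodi.bin
"""

result = blocked_tasks(sample)
print(result)
```

Note that kodi.bin itself shows up as Sl here, not D, even though its main thread is sitting in the wait ioctl.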
>
> Kernel amdgpu initialization:
> [   12.865468] [drm] amdgpu kernel modesetting enabled.
> [   12.865699] amdgpu 0000:01:00.0: enabling device (0140 -> 0143)
> [   12.866422] [drm] initializing kernel modesetting (POLARIS11
> 0x1002:0x67EF 0x174B:0xE348 0xCF).
> [   12.866434] [drm] register mmio base: 0xE0200000
> [   12.866436] [drm] register mmio size: 262144
> [   12.866463] [drm] probing gen 2 caps for device 11ab:6828 = 3ac12/0
> [   12.866466] [drm] probing mlw for device 11ab:6828 = 3ac12
> [   12.866482] [drm] UVD is enabled in VM mode
> [   12.866483] [drm] UVD ENC is enabled in VM mode
> [   12.866487] [drm] VCE enabled in VM mode
> [   13.083379] ATOM BIOS: 113-34801-U03
> [   13.083412] [drm] GPU posting now...
> [   13.300951] [drm] vm size is 64 GB, 2 levels, block size is 10-bit,
> fragment size is 9-bit
> [   13.302750] amdgpu 0000:01:00.0: VRAM: 2048M 0x000000F400000000 -
> 0x000000F47FFFFFFF (2048M used)
> [   13.302756] amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 -
> 0x000000000FFFFFFF
> [   13.302759] [drm] Detected VRAM RAM=2048M, BAR=256M
> [   13.302761] [drm] RAM width 128bits GDDR5
> [   13.307710] [TTM] Zone  kernel: Available graphics memory: 510748
> kiB
> [   13.307714] [TTM] Initializing pool allocator
> [   13.307776] [drm] amdgpu: 2048M of VRAM memory ready
> [   13.307781] [drm] amdgpu: 748M of GTT memory ready.
> [   13.307828] [drm] GART: num cpu pages 65536, num gpu pages 65536
> [   13.307910] [drm] PCIE GART of 256M enabled (table at
> 0x000000F400040000).
> [   13.309597] [drm] Chained IB support enabled!
> [   13.341215] [drm] Found UVD firmware Version: 1.79 Family ID: 16
> [   13.359302] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
> [   13.427998] amdgpu: [powerplay]
>                  failed to send message 309 ret is 254
> [   13.428020] amdgpu: [powerplay]
>                  failed to send pre message 14e ret is 254
> [   13.436878] [drm:dal_bios_parser_init_cmd_tbl [amdgpu]] *ERROR*
> Don't have enable_spread_spectrum_on_ppll for v4
> [   13.447496] [drm:dal_bios_parser_init_cmd_tbl [amdgpu]] *ERROR*
> Don't have program_clock for v7
> [   13.456402] [drm] DM_PPLIB: values for Engine clock
> [   13.456405] [drm] DM_PPLIB:   21400
> [   13.456406] [drm] DM_PPLIB:   48100
> [   13.456407] [drm] DM_PPLIB:   76000
> [   13.456408] [drm] DM_PPLIB:   102000
> [   13.456409] [drm] DM_PPLIB:   110200
> [   13.456410] [drm] DM_PPLIB:   113800
> [   13.456411] [drm] DM_PPLIB:   117200
> [   13.456412] [drm] DM_PPLIB:   121000
> [   13.456413] [drm] DM_PPLIB: Warning: using default validation
> clocks!
> [   13.456414] [drm] DM_PPLIB: Validation clocks:
> [   13.456416] [drm] DM_PPLIB:    engine_max_clock: 72000
> [   13.456417] [drm] DM_PPLIB:    memory_max_clock: 80000
> [   13.456418] [drm] DM_PPLIB:    level           : 0
> [   13.456419] [drm] DM_PPLIB: reducing engine clock level from 8 to 2
> [   13.456423] [drm] DM_PPLIB: values for Memory clock
> [   13.456425] [drm] DM_PPLIB:   30000
> [   13.456425] [drm] DM_PPLIB:   175000
> [   13.456427] [drm] DM_PPLIB: Warning: using default validation
> clocks!
> [   13.456427] [drm] DM_PPLIB: Validation clocks:
> [   13.456428] [drm] DM_PPLIB:    engine_max_clock: 72000
> [   13.456429] [drm] DM_PPLIB:    memory_max_clock: 80000
> [   13.456430] [drm] DM_PPLIB:    level           : 0
> [   13.456432] [drm] DM_PPLIB: reducing memory clock level from 2 to 1
> [   13.457440] [drm] Display Core initialized with v3.1.27!
> [   13.664292] [drm] Supports vblank timestamp caching Rev 2
> (21.10.2013).
> [   13.664295] [drm] Driver supports precise vblank timestamp query.
> [   13.703612] [drm] UVD and UVD ENC initialized successfully.
> [   13.805731] [drm] VCE initialized successfully.
> [   14.316289] [drm] fb mappable at 0xD03F2000
> [   14.316291] [drm] vram apper at 0xD0000000
> [   14.316293] [drm] size 3145728
> [   14.316294] [drm] fb depth is 24
> [   14.316296] [drm]    pitch is 4096
> [   14.336250] Console: switching to colour frame buffer device 128x48
> [   14.396964] amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer
> device
> [   14.469610] [drm] Initialized amdgpu 3.23.0 20150101 for
> 0000:01:00.0 on minor 0
> [   14.470097] snd_hda_intel 0000:01:00.1: enabling device (0140 ->
> 0142)
> [   14.470109] snd_hda_intel 0000:01:00.1: Force to snoop mode by
> module option
> [   14.535472] input: HDA ATI HDMI HDMI/DP,pcm=3 as
> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input3
> [   14.535606] input: HDA ATI HDMI HDMI/DP,pcm=7 as
> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input4
> [   14.535730] input: HDA ATI HDMI HDMI/DP,pcm=8 as
> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input5
> [   14.535845] input: HDA ATI HDMI HDMI/DP,pcm=9 as
> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input6
> [   14.535971] input: HDA ATI HDMI HDMI/DP,pcm=10 as
> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input7
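
The sizes in the initialization log above can be cross-checked from the address ranges it prints. Below is a short sketch, assuming 4 KiB pages for the GART page count. The helper name mib is made up for this example.

```python
MIB = 1024 * 1024


def mib(first: int, last: int) -> int:
    """Size in MiB of the inclusive address range [first, last]."""
    return (last - first + 1) // MIB


# Ranges copied from the "VRAM:" and "GTT:" lines of the log.
vram_mib = mib(0x000000F400000000, 0x000000F47FFFFFFF)
gtt_mib = mib(0x0000000000000000, 0x000000000FFFFFFF)

# "GART: num cpu pages 65536, num gpu pages 65536", assuming 4 KiB pages.
gart_mib = 65536 * 4096 // MIB

print(vram_mib, gtt_mib, gart_mib)
```

These agree with the 2048M of VRAM and the 256M GTT range and PCIE GART size that the driver reports.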
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]     ` <1ca083a5-f33c-7be0-a3c8-c2996a087f70-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-02  9:38       ` Christian König
  2018-01-02 13:09       ` Luís Mendes
  1 sibling, 0 replies; 25+ messages in thread
From: Christian König @ 2018-01-02  9:38 UTC (permalink / raw)
  To: Chunming Zhou, Luís Mendes,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	alexander.deucher-5C7GfCeVMHo

Hi Luis,

First of all, that isn't a deadlock but a hardware lockup, so the
traces and logs you tried to attach are not really useful.

What we need instead is a full dmesg and/or some Kodi logs, an API trace,
etc., showing exactly what is happening when the problem occurs.

Please also try Mesa master and disable power management as much as
possible.

Regards,
Christian.

On 02.01.2018 at 03:51, Chunming Zhou wrote:
> Did you try it on an x86 board? Is the same issue there?
>
> We should identify whether it is an ARM-specific or a general issue for
> the amdgpu driver.
>
>
> Thanks,
>
> David Zhou
>
>
> On 2018-01-02 00:32, Luís Mendes wrote:
>> I am currently testing the amdgpu driver with AMD RX460 and RX550
>> graphics cards on an ARM Cortex-A9 with 1GB RAM and I am consistently
>> getting deadlocks when playing videos with Kodi or other applications.
>>
>> I'm using Linux kernel from
>> https://cgit.freedesktop.org/~agd5f/linux/, branch drm-next-4.16 at
>> commit "drm/amdgpu: Correct the IB size of bo update mapping" -
>> 104bd2ca1124dfd9aa904d5f5a96253ef2b580f6  along with libdrm-2.4.89 and
>> mesa-17.3.1 on an Ubuntu 17.10 with Mate desktop and Lightdm session
>> manager over X11.
>>
>>
>> I am consistently getting deadlocks, which sometimes are almost
>> immediate, but sometimes they take about half an hour to occur. There
>> are some video files that I am using for testing which have more
>> probability of causing a deadlock than others.
>>
>> I got some kernel crash dumps, kodi process backtraces for the
>> offending thread and the deadlocked process tree listing which I
>> attach here. The kernel seems to deadlock during a page flip,
>> indefinitely waiting for the DMA fence to complete, however, it
>> doesn't and the timeout doesn't expire either... as such this may be a
>> GPU lockup.
>>
>> I can provide more details, if needed, if there is interest or time to
>> look into this.
>>
>> Regards,
>> Luís Mendes
>> Software and Hardware engineer
>>
>> [  253.904103] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
>> timeout, last signaled seq=43831, last emitted seq=43833
>> [  253.915041] [drm] IP block:gmc_v8_0 is hung!
>> [  253.915047] [drm] IP block:gfx_v8_0 is hung!
>> [  253.915162] [drm] GPU recovery disabled.
>> [  366.541614] INFO: task kworker/u4:4:90 blocked for more than 120
>> seconds.
>> [  366.548436]       Not tainted 4.15.0-rc4-drmnext2g #1
>> [  366.554300] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [  366.562162] kworker/u4:4    D    0    90      2 0x00000000
>> [  366.562196] Workqueue: events_unbound commit_work [drm_kms_helper]
>> [  366.562215] [<80b8c6d4>] (__schedule) from [<80b8cdd0>]
>> (schedule+0x4c/0xac)
>> [  366.562223] [<80b8cdd0>] (schedule) from [<80b91024>]
>> (schedule_timeout+0x228/0x444)
>> [  366.562233] [<80b91024>] (schedule_timeout) from [<80886738>]
>> (dma_fence_default_wait+0x2b4/0x2d8)
>> [  366.562241] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
>> (dma_fence_wait_timeout+0x40/0x150)
>> [  366.562248] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
>> (reservation_object_wait_timeout_rcu+0xfc/0x34c)
>> [  366.562476] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
>> [<7f2d3988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
>> [  366.562754] [<7f2d3988>] (amdgpu_dm_do_flip [amdgpu]) from
>> [<7f2d509c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
>> [  366.562908] [<7f2d509c>] (amdgpu_dm_atomic_commit_tail [amdgpu])
>> from [<7f13e58c>] (commit_tail+0x50/0x94 [drm_kms_helper])
>> [  366.562931] [<7f13e58c>] (commit_tail [drm_kms_helper]) from
>> [<7f13e5ec>] (commit_work+0x1c/0x20 [drm_kms_helper])
>> [  366.562948] [<7f13e5ec>] (commit_work [drm_kms_helper]) from
>> [<8016f4c8>] (process_one_work+0x1a8/0x4ac)
>> [  366.562955] [<8016f4c8>] (process_one_work) from [<8017050c>]
>> (worker_thread+0x68/0x598)
>> [  366.562962] [<8017050c>] (worker_thread) from [<80175e50>]
>> (kthread+0x16c/0x174)
>> [  366.562970] [<80175e50>] (kthread) from [<80109de8>]
>> (ret_from_fork+0x14/0x2c)
>>
>>
>> From userland side:
>> (gdb) info thread
>>    Id   Target Id         Frame
>> * 1    Thread 0x6eb17c70 (LWP 2071) "kodi.bin" 0x748b2246 in ioctl ()
>>      at ../sysdeps/unix/syscall-template.S:84
>>    2    Thread 0x6eb14170 (LWP 2072) "Announce" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    3    Thread 0x6e1ff170 (LWP 2075) "ActiveAE" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    4    Thread 0x6d9ff170 (LWP 2076) "AESink" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    5    Thread 0x6b7c9170 (LWP 2081) "amdgpu_cs:0" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    6    Thread 0x6ae3c170 (LWP 2082) "disk_cache:0" __libc_do_syscall
>> ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    7    Thread 0x571df170 (LWP 2083) "si_shader:0" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    8    Thread 0x569df170 (LWP 2084) "si_shader_low:0"
>> __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    9    Thread 0x561df170 (LWP 2085) "gallium_drv:0" __libc_do_syscall
>> ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    10   Thread 0x551f6170 (LWP 2086) "kodi.bin" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    11   Thread 0x549f6170 (LWP 2087) "PeripBusUSBUdev"
>> __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    12   Thread 0x541f6170 (LWP 2088) "PeripBusCEC" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    13   Thread 0x539f6170 (LWP 2089) "PeripBusAddon" __libc_do_syscall
>> ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    14   Thread 0x531f6170 (LWP 2090) "kodi.bin" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    15   Thread 0x510ff170 (LWP 2095) "Timer" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    16   Thread 0x508ff170 (LWP 2096) "kodi.bin" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    17   Thread 0x500ff170 (LWP 2097) "kodi.bin" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    18   Thread 0x4f8b5170 (LWP 2098) "kodi.bin" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    19   Thread 0x4f0b5170 (LWP 2099) "kodi.bin" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    20   Thread 0x4e8b5170 (LWP 2100) "kodi.bin" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    21   Thread 0x4e0b5170 (LWP 2101) "EventServer" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    22   Thread 0x4d8b5170 (LWP 2102) "TCPServer" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    23   Thread 0x4c9a8170 (LWP 2103) "JobWorker" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    24   Thread 0x4c1a8170 (LWP 2104) "JobWorker" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    25   Thread 0x4b9a8170 (LWP 2108) "PVRManager" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    26   Thread 0x4b1a8170 (LWP 2114) "PVRGUIInfo" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    27   Thread 0x4a9a8170 (LWP 2115) "EPGUpdater" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    28   Thread 0x4a1a8170 (LWP 2119) "VideoPlayer" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    29   Thread 0x499a8170 (LWP 2120) "VideoPlayer" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    30   Thread 0x491a8170 (LWP 2121) "VideoPlayer" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    31   Thread 0x489a8170 (LWP 2122) "VideoPlayer" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    32   Thread 0x481a8170 (LWP 2123) "VideoPlayer" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    33   Thread 0x479a8170 (LWP 2124) "VideoPlayer" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    34   Thread 0x46d59170 (LWP 2302) "VideoPlayer" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    35   Thread 0x46559170 (LWP 2303) "VideoPlayer" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>    36   Thread 0x45d59170 (LWP 2304) "VideoPlayer" __libc_do_syscall ()
>>      at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
>>
>> 0x748b2246 in ioctl () at ../sysdeps/unix/syscall-template.S:84
>> 84      ../sysdeps/unix/syscall-template.S: No such file or directory.
>> (gdb) bt
>> #0  0x748b2246 in ioctl () at ../sysdeps/unix/syscall-template.S:84
>> #1  0x7605c096 in drmIoctl (fd=22, request=3223348297,
>>      arg=arg@entry=0x7ec1b218) at ../xf86drm.c:191
>> #2  0x6b875218 in amdgpu_ioctl_wait_cs (context=<optimized out>,
>>      context=<optimized out>, busy=<synthetic pointer>,
>> flags=<optimized out>,
>>      timeout_ns=18446744073709551615, handle=10555, ring=<optimized
>> out>,
>>      ip_instance=<optimized out>, ip=<optimized out>)
>>      at ../../amdgpu/amdgpu_cs.c:408
>> #3  amdgpu_cs_query_fence_status (fence=<optimized out>,
>>      timeout_ns=<optimized out>, flags=1, expired=0x7ec1b280)
>>      at ../../amdgpu/amdgpu_cs.c:437
>> #4  0x6be7866e in ?? () from 
>> /usr/lib/arm-linux-gnueabihf/dri/radeonsi_dri.so
>>      ...
>>
>> ->Another reboot, another crash - right at the beginning of playing a
>> video with Kodi
>>
>> [   73.432967] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
>> timeout, last signaled seq=4183, last emitted seq=4185
>> [   73.443847] [drm] IP block:gmc_v8_0 is hung!
>> [   73.443854] [drm] IP block:gfx_v8_0 is hung!
>> [   73.444019] [drm] GPU recovery disabled.
>> [  243.672640] INFO: task kworker/u4:3:89 blocked for more than 120
>> seconds.
>> [  243.679466]       Not tainted 4.15.0-rc4-drmnext2g #1
>> [  243.685337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [  243.693200] kworker/u4:3    D    0    89      2 0x00000000
>> [  243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
>> [  243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>]
>> (schedule+0x4c/0xac)
>> [  243.693259] [<80b8cdd0>] (schedule) from [<80b91024>]
>> (schedule_timeout+0x228/0x444)
>> [  243.693270] [<80b91024>] (schedule_timeout) from [<80886738>]
>> (dma_fence_default_wait+0x2b4/0x2d8)
>> [  243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
>> (dma_fence_wait_timeout+0x40/0x150)
>> [  243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
>> (reservation_object_wait_timeout_rcu+0xfc/0x34c)
>> [  243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
>> [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
>> [  243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from
>> [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
>> [  243.693941] [<7f33309c>] (amdgpu_dm_atomic_commit_tail [amdgpu])
>> from [<7f15758c>] (commit_tail+0x50/0x94 [drm_kms_helper])
>> [  243.693964] [<7f15758c>] (commit_tail [drm_kms_helper]) from
>> [<7f1575ec>] (commit_work+0x1c/0x20 [drm_kms_helper])
>> [  243.693981] [<7f1575ec>] (commit_work [drm_kms_helper]) from
>> [<8016f4c8>] (process_one_work+0x1a8/0x4ac)
>> [  243.693987] [<8016f4c8>] (process_one_work) from [<8017050c>]
>> (worker_thread+0x68/0x598)
>> [  243.693994] [<8017050c>] (worker_thread) from [<80175e50>]
>> (kthread+0x16c/0x174)
>> [  243.694003] [<80175e50>] (kthread) from [<80109de8>]
>> (ret_from_fork+0x14/0x2c)
>>
>> bt
>> 0x7480c246 in ioctl () at ../sysdeps/unix/syscall-template.S:84
>> 84      ../sysdeps/unix/syscall-template.S: No such file or directory.
>> (gdb) bt
>> #0  0x7480c246 in ioctl () at ../sysdeps/unix/syscall-template.S:84
>> #1  0x76068096 in drmIoctl (fd=22, request=3223348297,
>>      arg=arg@entry=0x7efc81c0) at ../xf86drm.c:191
>> #2  0x6e457218 in amdgpu_ioctl_wait_cs (context=<optimized out>,
>>      context=<optimized out>, busy=<synthetic pointer>,
>> flags=<optimized out>,
>>      timeout_ns=18446744073709551615, handle=832, ring=<optimized out>,
>>      ip_instance=<optimized out>, ip=<optimized out>)
>>      at ../../amdgpu/amdgpu_cs.c:408
>> #3  amdgpu_cs_query_fence_status (fence=<optimized out>,
>>      timeout_ns=<optimized out>, flags=1, expired=0x7efc8228)
>>      at ../../amdgpu/amdgpu_cs.c:437
>> #4  0x6bf5b6ec in ?? () from 
>> /usr/lib/arm-linux-gnueabihf/dri/radeonsi_dri.so
>>
>>
>> ->Process tree with Deadlock
>> ubuntu@localhost:~$ ps ax
>>    PID TTY      STAT   TIME COMMAND
>>      1 ?        Ss     0:05 /sbin/init
>>      2 ?        S      0:00 [kthreadd]
>>      4 ?        I<     0:00 [kworker/0:0H]
>>      6 ?        I<     0:00 [mm_percpu_wq]
>>      7 ?        S      0:00 [ksoftirqd/0]
>>      8 ?        I      0:00 [rcu_sched]
>>      9 ?        I      0:00 [rcu_bh]
>>     10 ?        S      0:00 [migration/0]
>>     11 ?        S      0:00 [cpuhp/0]
>>     12 ?        S      0:00 [cpuhp/1]
>>     13 ?        S      0:00 [migration/1]
>>     14 ?        S      0:00 [ksoftirqd/1]
>>     16 ?        I<     0:00 [kworker/1:0H]
>>     17 ?        S      0:00 [kdevtmpfs]
>>     18 ?        I<     0:00 [netns]
>>     19 ?        I      0:00 [kworker/0:1]
>>     20 ?        I      0:00 [kworker/1:1]
>>     21 ?        S      0:00 [kauditd]
>>     22 ?        S      0:00 [khungtaskd]
>>     23 ?        S      0:00 [oom_reaper]
>>     24 ?        I<     0:00 [writeback]
>>     25 ?        S      0:00 [kcompactd0]
>>     26 ?        SN     0:00 [ksmd]
>>     27 ?        I<     0:00 [crypto]
>>     28 ?        I<     0:00 [kintegrityd]
>>     29 ?        I<     0:00 [kblockd]
>>     30 ?        I<     0:00 [ata_sff]
>>     31 ?        I<     0:00 [devfreq_wq]
>>     32 ?        I<     0:00 [watchdogd]
>>     44 ?        S      0:00 [kswapd0]
>>     45 ?        I<     0:00 [xfsalloc]
>>     46 ?        I<     0:00 [xfs_mru_cache]
>>     68 ?        I      0:00 [kworker/u4:1]
>>     78 ?        I<     0:00 [kthrotld]
>>     79 ?        S      0:00 [kapmd]
>>     80 ?        S      0:00 [scsi_eh_0]
>>     81 ?        I<     0:00 [scsi_tmf_0]
>>     82 ?        S      0:00 [scsi_eh_1]
>>     83 ?        I<     0:00 [scsi_tmf_1]
>>     84 ?        I      0:00 [kworker/u4:2]
>>     85 ?        S      0:00 [scsi_eh_2]
>>     86 ?        I<     0:00 [scsi_tmf_2]
>>     87 ?        S      0:00 [scsi_eh_3]
>>     88 ?        I<     0:00 [scsi_tmf_3]
>>     89 ?        D      0:00 [kworker/u4:3]
>>     91 ?        S      0:00 [spi1]
>>     96 ?        S      0:00 [irq/47-mmc0]
>>     98 ?        S      0:00 [irq/53-f10d8000]
>>     99 ?        S      0:00 [irq/42-f1090000]
>>    100 ?        S      0:00 [irq/43-f1090000]
>>    113 ?        I<     0:00 [ipv6_addrconf]
>>    114 ?        I      0:00 [kworker/0:2]
>>    143 ?        S      0:00 [mmcqd/0]
>>    144 ?        I<     0:00 [kworker/1:1H]
>>    145 ?        I<     0:00 [kworker/0:1H]
>>    146 ?        S      0:00 [jbd2/sda2-8]
>>    147 ?        I<     0:00 [ext4-rsv-conver]
>>    181 ?        Ss     0:00 /lib/systemd/systemd-journald
>>    213 ?        Ss     0:00 /lib/systemd/systemd-udevd
>>    243 ?        I<     0:00 [dsa_ordered]
>>    310 ?        I<     0:00 [ttm_swap]
>>    314 ?        S      0:00 [gfx]
>>    316 ?        S      0:00 [comp_1.0.0]
>>    317 ?        S      0:00 [comp_1.0.1]
>>    318 ?        S      0:00 [comp_1.0.2]
>>    319 ?        S      0:00 [comp_1.0.3]
>>    320 ?        S      0:00 [comp_1.0.4]
>>    321 ?        S      0:00 [jbd2/mmcblk0p1-]
>>    322 ?        S      0:00 [comp_1.0.5]
>>    323 ?        I<     0:00 [ext4-rsv-conver]
>>    324 ?        S      0:00 [comp_1.0.6]
>>    325 ?        S      0:00 [comp_1.0.7]
>>    326 ?        S      0:00 [sdma0]
>>    327 ?        S      0:00 [sdma1]
>>    328 ?        S      0:00 [uvd]
>>    329 ?        S      0:00 [uvd_enc0]
>>    330 ?        S      0:00 [uvd_enc1]
>>    331 ?        S      0:00 [vce0]
>>    332 ?        S      0:00 [vce1]
>>    334 ?        I<     0:00 [dm_timer_queue]
>>    347 ?        Ssl    0:00 /lib/systemd/systemd-timesyncd
>>    455 ?        Ss     0:00 /usr/bin/dbus-daemon --system
>> --address=systemd: --no
>>    456 ?        Ssl    0:00 /usr/sbin/NetworkManager --no-daemon
>>    457 ?        Ss     0:00 /usr/sbin/cupsd -l
>>    458 ?        Ssl    0:00 /usr/lib/accountsservice/accounts-daemon
>>    462 ?        Ss     0:00 /usr/sbin/atd -f
>>    464 ?        Ssl    0:00 /usr/sbin/ModemManager
>>    467 ?        Ss     0:00 /usr/sbin/cron -f
>>    475 ?        Ssl    0:00 /usr/lib/udisks2/udisksd
>>    483 ?        Ssl    0:00 /usr/sbin/rsyslogd -n
>>    487 ?        Ss     0:00 /lib/systemd/systemd-logind
>>    488 ?        Ss     0:00 avahi-daemon: running [linux.local]
>>    489 ?        S      0:00 avahi-daemon: chroot helper
>>    490 ?        Ssl    0:00 /usr/lib/snapd/snapd
>>    538 ?        Ssl    0:02 /usr/bin/tvheadend -f -u hts -g video
>>    563 ?        I      0:00 [kworker/1:3]
>>    565 ?        Ssl    0:00 /usr/sbin/cups-browsed
>>    581 ?        Ss     0:00 /lib/systemd/systemd-resolved
>>    582 ?        Ssl    0:00 /usr/lib/policykit-1/polkitd --no-debug
>>    586 ?        Ss     0:00 /usr/sbin/sshd -D
>>    619 ?        SLsl   0:00 /usr/sbin/lightdm
>>    630 ttyS0    Ss     0:00 /bin/login --
>>    638 tty7     Ssl+   0:03 /usr/lib/xorg/Xorg -core :0 -seat seat0
>> -auth /var/ru
>>    642 tty1     Ss+    0:00 /sbin/agetty --noclear tty1 linux
>>    737 ?        Ssl    0:00 /usr/lib/upower/upowerd
>>    758 ?        Sl     0:00 lightdm --session-child 13 20
>>    762 ?        Ssl    0:00 /usr/bin/whoopsie -f
>>    769 ?        Ss     0:00 /usr/sbin/kerneloops
>>    778 ?        Ss     0:00 /lib/systemd/systemd --user
>>    779 ?        S      0:00 (sd-pam)
>>    788 ?        Sl     0:00 /usr/bin/gnome-keyring-daemon --daemonize
>> --login
>>    791 ?        Ssl    0:00 mate-session
>>    976 ?        Ss     0:00 /usr/bin/dbus-daemon --session
>> --address=systemd: --n
>>   1099 ?        Ss     0:00 /usr/bin/ssh-agent /usr/bin/im-launch
>> mate-session
>>   1168 ?        Ssl    0:00 /usr/lib/gvfs/gvfsd
>>   1173 ?        Sl     0:00 /usr/lib/gvfs/gvfsd-fuse
>> /run/user/1000/gvfs -f -o bi
>>   1183 ?        Sl     0:00 /usr/lib/dconf/dconf-service
>>   1189 ?        Sl     0:00 /usr/bin/mate-settings-daemon
>>   1193 ?        Sl     0:00 marco
>>   1201 ?        S<l    0:00 /usr/bin/pulseaudio --start
>> --log-target=syslog
>>   1202 ?        SNsl   0:00 /usr/lib/rtkit/rtkit-daemon
>>   1205 ?        Sl     0:01 mate-panel
>>   1220 ?        Sl     0:02 caja
>>   1224 ?        Sl     0:00 /usr/lib/mate-panel/wnck-applet
>>   1225 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-udisks2-volume-monitor
>>   1232 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-goa-volume-monitor
>>   1243 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-afc-volume-monitor
>>   1248 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-gphoto2-volume-monitor
>>   1252 ?        Ssl    0:00 /usr/lib/gvfs/gvfs-mtp-volume-monitor
>>   1258 ?        Sl     0:00 /usr/lib/mate-applets/trashapplet
>>   1262 ?        Sl     0:00 /usr/lib/mate-panel/clock-applet
>>   1263 ?        Sl     0:00
>> /usr/lib/mate-panel/notification-area-applet
>>   1267 ?        Sl     0:00 mate-screensaver
>>   1269 ?        Sl     0:00 /usr/bin/python3
>> /usr/share/system-config-printer/app
>>   1272 ?        Sl     0:00 mate-maximus
>>   1274 ?        Sl     0:00 nm-applet
>>   1281 ?        Sl     0:00 mate-power-manager
>>   1283 ?        Sl     0:00 kerneloops-applet
>>   1285 ?        Sl     0:00 /usr/lib/at-spi2-core/at-spi-bus-launcher
>> --launch-im
>>   1289 ?        Sl     0:00 update-notifier
>>   1291 ?        Sl     0:01 /usr/bin/python3 /usr/bin/blueman-applet
>>   1292 ?        Sl     0:00
>> /usr/lib/arm-linux-gnueabihf/deja-dup/deja-dup-monito
>>   1293 ?        Sl     0:00 mate-volume-control-applet
>>   1296 ?        Sl     0:02 /usr/bin/python3 /usr/bin/onboard
>>   1306 ?        Sl     0:00
>> /usr/lib/arm-linux-gnueabihf/polkit-mate/polkit-mate-
>>   1314 ?        S      0:00 /usr/bin/dbus-daemon
>> --config-file=/usr/share/default
>>   1327 ?        Sl     0:00 /usr/lib/at-spi2-core/at-spi2-registryd
>> --use-gnome-s
>>   1342 ?        Sl     0:00 /usr/lib/gvfs/gvfsd-trash --spawner :1.6
>> /org/gtk/gvf
>>   1395 ?        S      0:00 [jbd2/sda3-8]
>>   1396 ?        I<     0:00 [ext4-rsv-conver]
>>   1397 ?        S      0:00 [jbd2/mmcblk0p2-]
>>   1398 ?        I<     0:00 [ext4-rsv-conver]
>>   1413 ?        Ssl    0:00 /usr/lib/gvfs/gvfsd-metadata
>>   1438 ?        S      0:00 /bin/sh /usr/bin/kodi
>>   1442 ?        Sl     0:44 /usr/lib/arm-linux-gnueabihf/kodi/kodi.bin
>>   1484 ?        Ss     0:00 /usr/lib/bluetooth/obexd
>>
>>
>> Kernel amdgpu initialization:
>> [   12.865468] [drm] amdgpu kernel modesetting enabled.
>> [   12.865699] amdgpu 0000:01:00.0: enabling device (0140 -> 0143)
>> [   12.866422] [drm] initializing kernel modesetting (POLARIS11
>> 0x1002:0x67EF 0x174B:0xE348 0xCF).
>> [   12.866434] [drm] register mmio base: 0xE0200000
>> [   12.866436] [drm] register mmio size: 262144
>> [   12.866463] [drm] probing gen 2 caps for device 11ab:6828 = 3ac12/0
>> [   12.866466] [drm] probing mlw for device 11ab:6828 = 3ac12
>> [   12.866482] [drm] UVD is enabled in VM mode
>> [   12.866483] [drm] UVD ENC is enabled in VM mode
>> [   12.866487] [drm] VCE enabled in VM mode
>> [   13.083379] ATOM BIOS: 113-34801-U03
>> [   13.083412] [drm] GPU posting now...
>> [   13.300951] [drm] vm size is 64 GB, 2 levels, block size is 10-bit,
>> fragment size is 9-bit
>> [   13.302750] amdgpu 0000:01:00.0: VRAM: 2048M 0x000000F400000000 -
>> 0x000000F47FFFFFFF (2048M used)
>> [   13.302756] amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 -
>> 0x000000000FFFFFFF
>> [   13.302759] [drm] Detected VRAM RAM=2048M, BAR=256M
>> [   13.302761] [drm] RAM width 128bits GDDR5
>> [   13.307710] [TTM] Zone  kernel: Available graphics memory: 510748
>> kiB
>> [   13.307714] [TTM] Initializing pool allocator
>> [   13.307776] [drm] amdgpu: 2048M of VRAM memory ready
>> [   13.307781] [drm] amdgpu: 748M of GTT memory ready.
>> [   13.307828] [drm] GART: num cpu pages 65536, num gpu pages 65536
>> [   13.307910] [drm] PCIE GART of 256M enabled (table at
>> 0x000000F400040000).
>> [   13.309597] [drm] Chained IB support enabled!
>> [   13.341215] [drm] Found UVD firmware Version: 1.79 Family ID: 16
>> [   13.359302] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
>> [   13.427998] amdgpu: [powerplay]
>>                  failed to send message 309 ret is 254
>> [   13.428020] amdgpu: [powerplay]
>>                  failed to send pre message 14e ret is 254
>> [   13.436878] [drm:dal_bios_parser_init_cmd_tbl [amdgpu]] *ERROR*
>> Don't have enable_spread_spectrum_on_ppll for v4
>> [   13.447496] [drm:dal_bios_parser_init_cmd_tbl [amdgpu]] *ERROR*
>> Don't have program_clock for v7
>> [   13.456402] [drm] DM_PPLIB: values for Engine clock
>> [   13.456405] [drm] DM_PPLIB:   21400
>> [   13.456406] [drm] DM_PPLIB:   48100
>> [   13.456407] [drm] DM_PPLIB:   76000
>> [   13.456408] [drm] DM_PPLIB:   102000
>> [   13.456409] [drm] DM_PPLIB:   110200
>> [   13.456410] [drm] DM_PPLIB:   113800
>> [   13.456411] [drm] DM_PPLIB:   117200
>> [   13.456412] [drm] DM_PPLIB:   121000
>> [   13.456413] [drm] DM_PPLIB: Warning: using default validation
>> clocks!
>> [   13.456414] [drm] DM_PPLIB: Validation clocks:
>> [   13.456416] [drm] DM_PPLIB:    engine_max_clock: 72000
>> [   13.456417] [drm] DM_PPLIB:    memory_max_clock: 80000
>> [   13.456418] [drm] DM_PPLIB:    level           : 0
>> [   13.456419] [drm] DM_PPLIB: reducing engine clock level from 8 to 2
>> [   13.456423] [drm] DM_PPLIB: values for Memory clock
>> [   13.456425] [drm] DM_PPLIB:   30000
>> [   13.456425] [drm] DM_PPLIB:   175000
>> [   13.456427] [drm] DM_PPLIB: Warning: using default validation
>> clocks!
>> [   13.456427] [drm] DM_PPLIB: Validation clocks:
>> [   13.456428] [drm] DM_PPLIB:    engine_max_clock: 72000
>> [   13.456429] [drm] DM_PPLIB:    memory_max_clock: 80000
>> [   13.456430] [drm] DM_PPLIB:    level           : 0
>> [   13.456432] [drm] DM_PPLIB: reducing memory clock level from 2 to 1
>> [   13.457440] [drm] Display Core initialized with v3.1.27!
>> [   13.664292] [drm] Supports vblank timestamp caching Rev 2
>> (21.10.2013).
>> [   13.664295] [drm] Driver supports precise vblank timestamp query.
>> [   13.703612] [drm] UVD and UVD ENC initialized successfully.
>> [   13.805731] [drm] VCE initialized successfully.
>> [   14.316289] [drm] fb mappable at 0xD03F2000
>> [   14.316291] [drm] vram apper at 0xD0000000
>> [   14.316293] [drm] size 3145728
>> [   14.316294] [drm] fb depth is 24
>> [   14.316296] [drm]    pitch is 4096
>> [   14.336250] Console: switching to colour frame buffer device 128x48
>> [   14.396964] amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer
>> device
>> [   14.469610] [drm] Initialized amdgpu 3.23.0 20150101 for
>> 0000:01:00.0 on minor 0
>> [   14.470097] snd_hda_intel 0000:01:00.1: enabling device (0140 ->
>> 0142)
>> [   14.470109] snd_hda_intel 0000:01:00.1: Force to snoop mode by
>> module option
>> [   14.535472] input: HDA ATI HDMI HDMI/DP,pcm=3 as
>> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input3 
>>
>> [   14.535606] input: HDA ATI HDMI HDMI/DP,pcm=7 as
>> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input4 
>>
>> [   14.535730] input: HDA ATI HDMI HDMI/DP,pcm=8 as
>> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input5 
>>
>> [   14.535845] input: HDA ATI HDMI HDMI/DP,pcm=9 as
>> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input6 
>>
>> [   14.535971] input: HDA ATI HDMI HDMI/DP,pcm=10 as
>> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input7 
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]     ` <1ca083a5-f33c-7be0-a3c8-c2996a087f70-5C7GfCeVMHo@public.gmane.org>
  2018-01-02  9:38       ` Christian König
@ 2018-01-02 13:09       ` Luís Mendes
       [not found]         ` <CAEzXK1odkDX-D3MOGHLFJuKNHh0RjSfsFL8PtB=6YQRDf1+Tkw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 25+ messages in thread
From: Luís Mendes @ 2018-01-02 13:09 UTC (permalink / raw)
  To: Chunming Zhou
  Cc: alexander.deucher-5C7GfCeVMHo, christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Dear Mr. David, Mr. Christian,

First of all, thanks for your replies!

David, I will try the same software versions on x86 to see if I am
able to replicate the problem on x86, but I suspect it is ARM
specific... I'll report back when I have more details.

Christian, I'll collect the data you've referred to and will disable
power management. Regarding the mesa master version, I've tried it,
and the problem just gets worse. With the latest mesa, it easily locks up
at the lightdm login screen, when navigating through the Ubuntu Mate
menus, or with Kodi.  I've tested with mesa commit "radv: Implement
binning on GFX9" - 6a36bfc64d2096aa338958c4605f5fc6372c07b8. Just one
question... when you refer to API traces, you're suggesting to strace
kodi, or what do you mean?

Regards,
Luís

On Tue, Jan 2, 2018 at 2:51 AM, Chunming Zhou <zhoucm1@amd.com> wrote:
> Did you try it on an x86 board? Is there the same issue?
>
> We should identify whether it is an ARM-specific or a general issue for
> the amdgpu driver.
>
>
> Thanks,
>
> David Zhou
>
>
>

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]         ` <CAEzXK1odkDX-D3MOGHLFJuKNHh0RjSfsFL8PtB=6YQRDf1+Tkw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-02 13:17           ` Christian König
       [not found]             ` <d755cd12-1a1e-ce98-2b0d-ea4b5eafc483-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2018-01-02 13:17 UTC (permalink / raw)
  To: Luís Mendes, Chunming Zhou
  Cc: alexander.deucher-5C7GfCeVMHo, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

> when you refer to API traces, you're suggesting to strace
> kodi, or what do you mean?
What I meant was apitrace (https://github.com/apitrace/apitrace), but
when even the lightdm login screen crashes then this won't be of much help.

That strongly sounds like an ARM-specific problem; maybe USWC doesn't
work as it should? See the function drm_arch_can_wc_memory() in the kernel
source and check whether it helps if you always return false.

Apart from that, the only other explanation I have is that some system
memory isn't accessible to the GPU while other memory is working fine.

Please provide the output of "sudo cat /proc/iomem" to double check that.

Regards,
Christian.

Am 02.01.2018 um 14:09 schrieb Luís Mendes:
> Dear Mr. David, Mr. Christian,
>
> First of all, thanks for your replies!
>
> David, I will try the same software versions on x86 to see if I am
> able to replicate the problem on x86, but I suspect it is ARM
> specific... I'll report back when I have more details.
>
> Christian, I'll collect the data you've referred to and will disable
> power management. Regarding the mesa master version, I've tried it,
> and the problem just gets worse. With the latest mesa, it easily locks up
> at the lightdm login screen, when navigating through the Ubuntu Mate
> menus, or with Kodi.  I've tested with mesa commit "radv: Implement
> binning on GFX9" - 6a36bfc64d2096aa338958c4605f5fc6372c07b8. Just one
> question... when you refer to API traces, you're suggesting to strace
> kodi, or what do you mean?
>
> Regards,
> Luís
>
> On Tue, Jan 2, 2018 at 2:51 AM, Chunming Zhou <zhoucm1@amd.com> wrote:
>> Did you try it on an x86 board? Is there the same issue?
>>
>> We should identify whether it is an ARM-specific or a general issue for
>> the amdgpu driver.
>>
>>
>> Thanks,
>>
>> David Zhou
>>
>>
>>

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]             ` <d755cd12-1a1e-ce98-2b0d-ea4b5eafc483-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-02 22:29               ` Luís Mendes
       [not found]                 ` <CAEzXK1qCfnHUvnuuViTP5fZQx2StjecV6o_QfJ2uJk603_9nhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Luís Mendes @ 2018-01-02 22:29 UTC (permalink / raw)
  To: Christian König
  Cc: alexander.deucher-5C7GfCeVMHo, Chunming Zhou,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Ok... I've done some of the suggested tests.

I still haven't tested on x86, but I'll get to that.

I've recompiled the kernel to disable power management as much as
possible at all levels, including PCIe. I've also modified
include/drm/drm_cache.h so that static inline bool
drm_arch_can_wc_memory(void) always returns false, but neither
change solved the issue.

When I run kodi under apitrace with mesa 17.3.1 it becomes much more
difficult to reproduce the crash. There are a lot of missed frames due
to the CPU load added by apitrace, but I was able to crash the GPU
once. The apitrace log is 2.3GB; how should I send it?
It happened while playing a VP9 encoded webm video file, which is
decoded in software, as the RX 460 is unable to hardware decode this
codec AFAIK. In fact, software decoded videos are more prone to produce
the GPU hang, while an H265 4K hardware decoded video never causes a GPU
hang. I'm afraid I forgot to have kodi log the execution data when
I did the apitrace.


The full dmesg is presented below, as well as the /proc/iomem
information and lspci output.
I just want to note that I'm having EDID DDC errors with my TV
screen, because at some point from kernel 4.14 onwards, both the RX460
and the RX550 cards started to corrupt the TV screen's I2C EDID
memory, so that I have to reflash the correct EDID data to get the
screen back to its own configuration. This is a rare problem that only
occurs with this TV. All other TVs and monitors that I've tested don't
show this EDID corruption issue. I have currently stopped reflashing
the I2C EDID configuration memory of my TV to avoid exceeding the
memory's write cycle endurance; instead I now modify gpu/drm/drm_edid.c
in function drm_do_get_edid() to allow the corrupted EDID to pass and
enter X. So please ignore the EDID error warnings in my dmesg log. The
GPU hangs occur just the same, even when I have the correct EDID, as
it is an unrelated issue.
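
For context, here is a minimal sketch of the integrity check that a
corrupted EDID would be expected to fail. It assumes the standard 128-byte
EDID base block layout: a fixed 8-byte header, and all 128 bytes summing
to 0 modulo 256. The function names are made up for illustration; this is
not the kernel's drm_edid.c code, and the sample block is synthetic, not
data read from the TV.

```python
# Fixed 8-byte pattern that starts every EDID base block.
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])


def edid_block_ok(block):
    """Check one 128-byte EDID base block: fixed header and zero checksum."""
    if len(block) != 128:
        return False
    if block[:8] != EDID_HEADER:
        return False
    # The last byte is chosen so that the whole block sums to 0 mod 256.
    return sum(block) % 256 == 0


def make_test_block():
    """Build a synthetic (not real) base block with a valid checksum."""
    body = bytearray(127)
    body[:8] = EDID_HEADER
    checksum = (-sum(body)) % 256
    return bytes(body) + bytes([checksum])


good = make_test_block()
corrupted = bytearray(good)
corrupted[20] ^= 0x01  # flip one bit, as a corrupted EEPROM cell might

print("good block valid:     ", edid_block_ok(good))
print("corrupted block valid:", edid_block_ok(bytes(corrupted)))
```

A single flipped bit is enough to break the checksum, which would be
consistent with the driver rejecting the EDID until the data is reflashed
or the check is bypassed.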

Regards,
Luís

iomem shows this:
00000000-3fffffff : System RAM
  00008000-00efffff : Kernel code
  01000000-010e3913 : Kernel data
d0000000-efffffff : PCI MEM
  d0000000-e7ffffff : PCI Bus 0000:01
    d0000000-dfffffff : 0000:01:00.0
    e0000000-e01fffff : 0000:01:00.0
    e0200000-e023ffff : 0000:01:00.0
    e0240000-e025ffff : 0000:01:00.0
    e0260000-e0263fff : 0000:01:00.1
      e0260000-e0263fff : ICH HD audio
f1010680-f10106cf : spi@10680
f1011000-f101101f : i2c@11000
f1011100-f101111f : i2c@11100
f1012000-f101201f : serial
f1012100-f101211f : serial
f1018000-f101801f : pinctrl@18000
f1018100-f101813f : gpio
f1018140-f101817f : gpio
f1018454-f1018457 : conf-sdio3
f10184a0-f10184ab : rtc-soc
f1020704-f1020707 : watchdog@20300
f1020800-f102080f : cpurst@20800
f1020a00-f1020ccf : interrupt-controller@20a00
f1021070-f10210c7 : interrupt-controller@20a00
f1022000-f1022fff : pmsu@22000
f1030000-f1033fff : ethernet@30000
f1034000-f1037fff : ethernet@34000
f1040000-f1041fff : pcie@2,0
f1044000-f1045fff : pcie@3,0
f1058000-f10584ff : usb@58000
f1070000-f1073fff : ethernet@70000
f10a3800-f10a381f : rtc
f10a8000-f10a9fff : sata@a8000
f10d8000-f10d8fff : sdhci
f10e0000-f10e1fff : sata@e0000
f10e4074-f10e4077 : thermal@e8078
f10e4078-f10e407b : thermal@e8078
f10f0000-f10f3fff : usb3@f0000
f10f8000-f10fbfff : usb3@f8000
f1100000-f11007ff : f1100000.sa-sram0
f1110000-f11107ff : f1110000.sa-sram1
f1200000-f12fffff : f1200000.bm-bppi
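
A rough sketch of how the listing above could be checked mechanically,
assuming the usual /proc/iomem text layout with two spaces of indentation
per nesting level. The function names and the address-width parameter are
hypothetical, and the address width the GPU can actually reach is not
stated in this thread, so the value passed in is only an example.

```python
import re

# One /proc/iomem line: optional indentation, "start-end : name" in hex.
IOMEM_LINE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.+)$")


def parse_iomem(text):
    """Return a list of (depth, start, end, name) tuples from iomem text."""
    entries = []
    for line in text.splitlines():
        m = IOMEM_LINE.match(line)
        if not m:
            continue
        indent, start, end, name = m.groups()
        depth = len(indent) // 2  # assumed: two spaces per nesting level
        entries.append((depth, int(start, 16), int(end, 16), name.strip()))
    return entries


def ram_within(entries, addr_bits):
    """True if every top-level 'System RAM' range ends below 2**addr_bits."""
    limit = 1 << addr_bits
    return all(end < limit
               for depth, _start, end, name in entries
               if depth == 0 and name == "System RAM")


# Excerpt copied from the listing above.
SAMPLE = """\
00000000-3fffffff : System RAM
  00008000-00efffff : Kernel code
  01000000-010e3913 : Kernel data
d0000000-efffffff : PCI MEM
  d0000000-e7ffffff : PCI Bus 0000:01
    d0000000-dfffffff : 0000:01:00.0
"""

entries = parse_iomem(SAMPLE)
for depth, start, end, name in entries:
    size_mib = (end - start + 1) / (1024 * 1024)
    print(f"{'  ' * depth}{name}: {size_mib:.1f} MiB")
print("all RAM below 4 GiB:", ram_within(entries, 32))
```

On a live system the same functions could be fed the contents of
/proc/iomem read as root, since unprivileged reads show zeroed addresses.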

lspci is like this:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
[AMD/ATI] Baffin [Radeon RX 460] (rev cf) (prog-if 00)
        Subsystem: PC Partner Limited / Sapphire Technology Baffin
[Radeon RX 460]
        Flags: bus master, fast devsel, latency 0, IRQ 57
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at e0000000 (64-bit, prefetchable) [size=2M]
        I/O ports at 10000 [size=256]
        Memory at e0200000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at e0240000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [200] #15
        Capabilities: [270] #19
        Capabilities: [2b0] Address Translation Service (ATS)
        Capabilities: [2c0] Page Request Interface (PRI)
        Capabilities: [2d0] Process Address Space ID (PASID)
        Capabilities: [320] Latency Tolerance Reporting
        Capabilities: [328] Alternative Routing-ID Interpretation
(ARI)
        Capabilities: [370] L1 PM Substates
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu

01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device
aae0
        Subsystem: PC Partner Limited / Sapphire Technology Device
aae0
        Flags: bus master, fast devsel, latency 0, IRQ 56
        Memory at e0260000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [328] Alternative Routing-ID Interpretation
(ARI)
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel


Full dmesg output follows:

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.15.0-rc4-strong-g104bd2c-dirty
(lpnm@ENIAC10) (gcc version 5.4.0 20160609 (Ubuntu/Lina8
[    0.000000] CPU: ARMv7 Processor [414fc091] revision 1 (ARMv7),
cr=10c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing
instruction cache
[    0.000000] Memory policy: Data cache writealloc
[    0.000000] random: fast init done
[    0.000000] percpu: Embedded 16 pages/cpu @(ptrval) s35148 r8192
d22196 u65536
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages:
260096
[    0.000000] Kernel command line: root=/dev/sda2 rw rootfstype=ext4
rootwait console=ttyS0,115200n8
[    0.000000] Dentry cache hash table entries: 131072 (order: 7,
524288 bytes)
[    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144
bytes)
[    0.000000] Memory: 1023088K/1048576K available (11264K kernel
code, 685K rwdata, 2296K rodata, 1024K init, 224K b)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
[    0.000000]     vmalloc : 0xc0800000 - 0xff800000   (1008 MB)
[    0.000000]     lowmem  : 0x80000000 - 0xc0000000   (1024 MB)
[    0.000000]     pkmap   : 0x7fe00000 - 0x80000000   (   2 MB)
[    0.000000]     modules : 0x7f000000 - 0x7fe00000   (  14 MB)
[    0.000000]       .text : 0x(ptrval) - 0x(ptrval)   (12256 kB)
[    0.000000]       .init : 0x(ptrval) - 0x(ptrval)   (1024 kB)
[    0.000000]       .data : 0x(ptrval) - 0x(ptrval)   ( 686 kB)
[    0.000000]        .bss : 0x(ptrval) - 0x(ptrval)   ( 225 kB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2,
Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16,
nr_cpu_ids=2
[    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[    0.000000] L2C-310 erratum 769419 enabled
[    0.000000] L2C-310 enabling early BRESP for Cortex-A9
[    0.000000] L2C-310 full line of zeros enabled for Cortex-A9
[    0.000000] L2C-310 D prefetch enabled, offset 1 lines
[    0.000000] L2C-310 dynamic clock gating enabled, standby mode
enabled
[    0.000000] L2C-310 Coherent cache controller enabled, 16 ways,
1024 kB
[    0.000000] L2C-310 Coherent: CACHE_ID 0x410054c9, AUX_CTRL
0x56070001
[    0.000006] sched_clock: 64 bits at 800MHz, resolution 1ns, wraps
every 4398046511103ns
[    0.000017] clocksource: arm_global_timer: mask: 0xffffffffffffffff
max_cycles: 0xb881274fa3, max_idle_ns: 4407952s
[    0.000030] Switching to timer-based delay loop, resolution 1ns
[    0.000168] Ignoring duplicate/late registration of
read_current_timer delay
[    0.000175] clocksource: armada_370_xp_clocksource: mask:
0xffffffff max_cycles: 0xffffffff, max_idle_ns: 76450417s
[    0.000351] Console: colour dummy device 80x30
[    0.000364] Calibrating delay loop (skipped), value calculated
using timer frequency.. 1600.00 BogoMIPS (lpj=80000)
[    0.000370] pid_max: default: 32768 minimum: 301
[    0.000416] Mount-cache hash table entries: 2048 (order: 1, 8192
bytes)
[    0.000422] Mountpoint-cache hash table entries: 2048 (order: 1,
8192 bytes)
[    0.000660] CPU: Testing write buffer coherency: ok
[    0.000775] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
[    0.000907] Setting up static identity map for 0x100000 - 0x100060
[    0.000982] mvebu-soc-id: MVEBU SoC ID=0x6828, Rev=0x4
[    0.001058] mvebu-pmsu: Initializing Power Management Service Unit
[    0.001107] Hierarchical SRCU implementation.
[    0.002574] smp: Bringing up secondary CPUs ...
[    0.002727] Booting CPU 1
[    0.002882] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
[    0.002927] smp: Brought up 1 node, 2 CPUs
[    0.002933] SMP: Total of 2 processors activated (3200.00
BogoMIPS).
[    0.002936] CPU: All CPU(s) started in SVC mode.
[    0.003353] devtmpfs: initialized
[    0.005053] VFP support v0.3: implementor 41 architecture 3 part 30
variant 9 rev 4
[    0.005112] clocksource: jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.005120] futex hash table entries: 512 (order: 3, 32768 bytes)
[    0.005188] xor: measuring software checksum speed
[    0.100066]    arm4regs  :  2506.400 MB/sec
[    0.200065]    8regs     :  1967.600 MB/sec
[    0.300066]    32regs    :  1854.800 MB/sec
[    0.400067]    neon      :  1822.000 MB/sec
[    0.400070] xor: using function: arm4regs (2506.400 MB/sec)
[    0.400076] pinctrl core: initialized pinctrl subsystem
[    0.400421] NET: Registered protocol family 16
[    0.400918] DMA: preallocated 256 KiB pool for atomic coherent
allocations
[    0.401426] hw-breakpoint: found 5 (+1 reserved) breakpoint and 1
watchpoint registers.
[    0.401432] hw-breakpoint: maximum watchpoint size is 4 bytes.
[    0.401547] mvebu-pmsu: CPU hotplug support is currently broken on
Armada 38x: disabling
[    0.401554] mvebu-pmsu: CPU idle is currently broken on Armada 38x:
disabling
[    0.580251] raid6: int32x1  gen()   210 MB/s
[    0.750114] raid6: int32x1  xor()   299 MB/s
[    0.920090] raid6: int32x2  gen()   291 MB/s
[    1.090117] raid6: int32x2  xor()   345 MB/s
[    1.260146] raid6: int32x4  gen()   385 MB/s
[    1.430096] raid6: int32x4  xor()   343 MB/s
[    1.600076] raid6: int32x8  gen()   405 MB/s
[    1.770097] raid6: int32x8  xor()   283 MB/s
[    1.940098] raid6: neonx1   gen()  1212 MB/s
[    2.110070] raid6: neonx1   xor()  1147 MB/s
[    2.280110] raid6: neonx2   gen()  1294 MB/s
[    2.450092] raid6: neonx2   xor()  1331 MB/s
[    2.620096] raid6: neonx4   gen()  1271 MB/s
[    2.790069] raid6: neonx4   xor()  1273 MB/s
[    2.960097] raid6: neonx8   gen()  1094 MB/s
[    3.130097] raid6: neonx8   xor()  1033 MB/s
[    3.130100] raid6: using algorithm neonx2 gen() 1294 MB/s
[    3.130103] raid6: .... xor() 1331 MB/s, rmw enabled
[    3.130106] raid6: using neon recovery algorithm
[    3.130501] vgaarb: loaded
[    3.130631] SCSI subsystem initialized
[    3.130846] usbcore: registered new interface driver usbfs
[    3.130870] usbcore: registered new interface driver hub
[    3.130895] usbcore: registered new device driver usb
[    3.130993] pps_core: LinuxPPS API ver. 1 registered
[    3.130997] pps_core: Software ver. 5.3.6 - Copyright 2005-2007
Rodolfo Giometti <giometti@linux.it>
[    3.131007] PTP clock support registered
[    3.131255] Advanced Linux Sound Architecture Driver Initialized.
[    3.131458] Bluetooth: Core ver 2.22
[    3.131480] NET: Registered protocol family 31
[    3.131484] Bluetooth: HCI device and connection manager
initialized
[    3.131490] Bluetooth: HCI socket layer initialized
[    3.131494] Bluetooth: L2CAP socket layer initialized
[    3.131507] Bluetooth: SCO socket layer initialized
[    3.131694] clocksource: Switched to clocksource arm_global_timer
[    3.156110] NET: Registered protocol family 2
[    3.156352] TCP established hash table entries: 8192 (order: 3,
32768 bytes)
[    3.156391] TCP bind hash table entries: 8192 (order: 4, 65536
bytes)
[    3.156452] TCP: Hash tables configured (established 8192 bind
8192)
[    3.156507] UDP hash table entries: 512 (order: 2, 16384 bytes)
[    3.156532] UDP-Lite hash table entries: 512 (order: 2, 16384
bytes)
[    3.156610] NET: Registered protocol family 1
[    3.156807] RPC: Registered named UNIX socket transport module.
[    3.156811] RPC: Registered udp transport module.
[    3.156814] RPC: Registered tcp transport module.
[    3.156817] RPC: Registered tcp NFSv4.1 backchannel transport
module.
[    3.157229] hw perfevents: enabled with armv7_cortex_a9 PMU driver,
7 counters available
[    3.157717] Initialise system trusted keyrings
[    3.157939] workingset: timestamp_bits=30 max_order=18
bucket_order=0
[    3.160489] Installing knfsd (copyright (C) 1996
okir@monad.swb.de).
[    3.162171] async_tx: api initialized (async)
[    3.162179] Key type asymmetric registered
[    3.162183] Asymmetric key parser 'x509' registered
[    3.162218] Block layer SCSI generic (bsg) driver version 0.4
loaded (major 249)
[    3.162224] io scheduler noop registered
[    3.162228] io scheduler deadline registered
[    3.162300] io scheduler cfq registered (default)
[    3.162305] io scheduler mq-deadline registered
[    3.162309] io scheduler kyber registered
[    3.162896] armada-38x-pinctrl f1018000.pinctrl: registered pinctrl
driver
[    3.163856] mvebu-pcie soc:pcie: /soc/pcie/pcie@2,0: reset gpio is
active low
[    3.164199] mv_xor f1060800.xor: Marvell shared XOR driver
[    3.222153] mv_xor f1060800.xor: Marvell XOR (Descriptor Mode): (
xor cpy intr )
[    3.222894] mv_xor f1060900.xor: Marvell shared XOR driver
[    3.282140] mv_xor f1060900.xor: Marvell XOR (Descriptor Mode): (
xor cpy intr )
[    3.301102] Serial: 8250/16550 driver, 4 ports, IRQ sharing
disabled
[    3.301902] console [ttyS0] disabled
[    3.321974] f1012000.serial: ttyS0 at MMIO 0xf1012000 (irq = 23,
base_baud = 15625000) is a 16550A
[    4.145968] console [ttyS0] enabled
[    4.169945] f1012100.serial: ttyS1 at MMIO 0xf1012100 (irq = 24,
base_baud = 15625000) is a 16550A
[    4.179505] ahci-mvebu f10a8000.sata: AHCI 0001.0000 32 slots 2
ports 6 Gbps 0x3 impl platform mode
[    4.188591] ahci-mvebu f10a8000.sata: flags: 64bit ncq sntf led
only pmp fbs pio slum part sxs
[    4.197699] scsi host0: ahci-mvebu
[    4.201249] scsi host1: ahci-mvebu
[    4.204774] ata1: SATA max UDMA/133 mmio [mem
0xf10a8000-0xf10a9fff] port 0x100 irq 45
[    4.212715] ata2: SATA max UDMA/133 mmio [mem
0xf10a8000-0xf10a9fff] port 0x180 irq 45
[    4.220778] ahci-mvebu f10e0000.sata: AHCI 0001.0000 32 slots 2
ports 6 Gbps 0x3 impl platform mode
[    4.229854] ahci-mvebu f10e0000.sata: flags: 64bit ncq sntf led
only pmp fbs pio slum part sxs
[    4.238957] scsi host2: ahci-mvebu
[    4.242505] scsi host3: ahci-mvebu
[    4.245987] ata3: SATA max UDMA/133 mmio [mem
0xf10e0000-0xf10e1fff] port 0x100 irq 46
[    4.253944] ata4: SATA max UDMA/133 mmio [mem
0xf10e0000-0xf10e1fff] port 0x180 irq 46
[    4.262653] Ethernet Channel Bonding Driver: v3.7.1 (April 27,
2011)
[    4.269799] libphy: Fixed MDIO Bus: probed
[    4.274099] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[    4.279945] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    4.285917] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver -
version 5.1.0-k
[    4.293594] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
[    4.299440] ixgb: Intel(R) PRO/10GbE Network Driver - version
1.0.135-k2-NAPI
[    4.306607] ixgb: Copyright (c) 1999-2008 Intel Corporation.
[    4.312474] libphy: orion_mdio_bus: probed
[    4.318289] mv88e6085: probe of f1072004.mdio-mii:04 failed with
error -110
[    4.326210] mvneta f1070000.ethernet eth0: Using device tree mac
address 00:50:43:56:27:29
[    4.335261] mvneta f1030000.ethernet eth1: Using device tree mac
address 00:50:43:56:8b:29
[    4.344297] mvneta f1034000.ethernet eth2: Using device tree mac
address 00:50:43:27:8b:56
[    4.352756] iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network
Connection driver for Linux, in-tree:s
[    4.361914] iwl3945: Copyright(c) 2003-2011 Intel Corporation
[    4.367671] iwl3945: hw_scan is disabled
[    4.371688] usbcore: registered new interface driver asix
[    4.377123] usbcore: registered new interface driver ax88179_178a
[    4.383251] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI)
Driver
[    4.389796] ehci-pci: EHCI PCI platform driver
[    4.394273] ehci-orion: EHCI orion driver
[    4.398380] orion-ehci f1058000.usb: EHCI Host Controller
[    4.403807] orion-ehci f1058000.usb: new USB bus registered,
assigned bus number 1
[    4.411440] orion-ehci f1058000.usb: irq 41, io mem 0xf1058000
[    4.441680] orion-ehci f1058000.usb: USB 2.0 started, EHCI 1.00
[    4.447935] hub 1-0:1.0: USB hub found
[    4.451780] hub 1-0:1.0: 1 port detected
[    4.456093] xhci-hcd f10f0000.usb3: xHCI Host Controller
[    4.461436] xhci-hcd f10f0000.usb3: new USB bus registered,
assigned bus number 2
[    4.469014] xhci-hcd f10f0000.usb3: hcc params 0x0a000990 hci
version 0x100 quirks 0x00010010
[    4.477582] xhci-hcd f10f0000.usb3: irq 48, io mem 0xf10f0000
[    4.483661] hub 2-0:1.0: USB hub found
[    4.487443] hub 2-0:1.0: 1 port detected
[    4.491486] xhci-hcd f10f0000.usb3: xHCI Host Controller
[    4.496838] xhci-hcd f10f0000.usb3: new USB bus registered,
assigned bus number 3
[    4.504383] usb usb3: We don't know the algorithms for LPM for this
host, disabling LPM.
[    4.512749] hub 3-0:1.0: USB hub found
[    4.516526] hub 3-0:1.0: 1 port detected
[    4.520635] xhci-hcd f10f8000.usb3: xHCI Host Controller
[    4.525989] xhci-hcd f10f8000.usb3: new USB bus registered,
assigned bus number 4
[    4.533545] xhci-hcd f10f8000.usb3: hcc params 0x0a000990 hci
version 0x100 quirks 0x00010010
[    4.542111] xhci-hcd f10f8000.usb3: irq 49, io mem 0xf10f8000
[    4.548169] hub 4-0:1.0: USB hub found
[    4.551961] ata2: SATA link down (SStatus 0 SControl 300)
[    4.551965] hub 4-0:1.0: 1 port detected
[    4.552066] xhci-hcd f10f8000.usb3: xHCI Host Controller
[    4.552073] xhci-hcd f10f8000.usb3: new USB bus registered,
assigned bus number 5
[    4.552119] usb usb5: We don't know the algorithms for LPM for this
host, disabling LPM.
[    4.552414] hub 5-0:1.0: USB hub found
[    4.552444] hub 5-0:1.0: 1 port detected
[    4.552652] usbcore: registered new interface driver usb-storage
[    4.552824] mousedev: PS/2 mouse device common for all mice
[    4.553353] armada38x-rtc f10a3800.rtc: rtc core: registered
f10a3800.rtc as rtc0
[    4.553480] i2c /dev entries driver
[    4.553855] pca953x 0-0020: 0-0020 supply vcc not found, using
dummy regulator
[    4.567037] GPIO line 496 (pcie1.0-clkreq) hogged as input
[    4.568076] GPIO line 499 (pcie1.0-w-disable) hogged as output/low
[    4.568805] GPIO line 501 (usb3-current-limit) hogged as input
[    4.569844] GPIO line 502 (usb3-power) hogged as output/high
[    4.570883] GPIO line 507 (m.2 devslp) hogged as output/low
[    4.571613] GPIO line 508 (sfp-los) hogged as input
[    4.572343] GPIO line 509 (sfp-tx-fault) hogged as input
[    4.573382] GPIO line 510 (sfp-tx-disable) hogged as output/low
[    4.574111] GPIO line 511 (sfp-mod-def0) hogged as input
[    4.574840] GPIO line 500 (pcie2.0-clkreq) hogged as input
[    4.575879] GPIO line 503 (pcie2.0-w-disable) hogged as output/low
[    4.575961] pca953x 0-0020: interrupt support not compiled in
[    4.576394] IR NEC protocol handler initialized
[    4.576396] IR RC5(x/sz) protocol handler initialized
[    4.576397] IR RC6 protocol handler initialized
[    4.576398] IR JVC protocol handler initialized
[    4.576399] IR Sony protocol handler initialized
[    4.576400] IR SANYO protocol handler initialized
[    4.576402] IR Sharp protocol handler initialized
[    4.576403] IR MCE Keyboard/mouse protocol handler initialized
[    4.576404] IR XMP protocol handler initialized
[    4.586610] (NULL device *): hwmon_device_register() is deprecated.
Please convert the driver to use hwmon_device_.
[    4.587121] orion_wdt: Initial timeout 171 sec
[    4.587445] sdhci: Secure Digital Host Controller Interface driver
[    4.587446] sdhci: Copyright(c) Pierre Ossman
[    4.587610] sdhci-pxav3 f10d8000.sdhci: Got CD GPIO
[    4.601813] ata4: SATA link down (SStatus 0 SControl 300)
[    4.602073] ata3: SATA link down (SStatus 0 SControl 300)
[    4.651761] mmc0: SDHCI controller on f10d8000.sdhci
[f10d8000.sdhci] using ADMA
[    4.651860] sdhci-pltfm: SDHCI platform and OF driver helper
[    4.652046] usbcore: registered new interface driver usbhid
[    4.652047] usbhid: USB HID core driver
[    4.653005] NET: Registered protocol family 10
[    4.653849] Segment Routing with IPv6
[    4.653881] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    4.654110] NET: Registered protocol family 17
[    4.657571] 8021q: 802.1Q VLAN Support v1.8
[    4.657655] ThumbEE CPU extension supported.
[    4.657660] Registering SWP/SWPB emulation handler
[    4.657907] Loading compiled-in X.509 certificates
[    4.658691] Btrfs loaded, crc32c=crc32c-generic
[    4.659319] mvebu-pcie soc:pcie: /soc/pcie/pcie@2,0: reset gpio is
active low
[    4.659357] mvebu-pcie soc:pcie: /soc/pcie/pcie@3,0: reset gpio is
active low
[    4.704927] mmc0: new high speed SDHC card at address aaaa
[    4.705127] mmcblk0: mmc0:aaaa SL16G 14.8 GiB
[    4.707004]  mmcblk0: p1 p2
[    4.891777] mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
[    4.897716] pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
[    4.904006] pci_bus 0000:00: root bus resource [mem
0xd0000000-0xefffffff]
[    4.910897] pci_bus 0000:00: root bus resource [bus 00-ff]
[    4.916598] PCI: bus0: Fast back to back transfers disabled
[    4.922192] pci 0000:00:02.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[    4.930217] pci 0000:00:03.0: bridge configuration invalid ([bus
00-00]), reconfiguring
[    4.938401] pci 0000:01:00.0: enabling Extended Tags
[    4.943568] pci 0000:01:00.0: vgaarb: VGA device added:
decodes=io+mem,owns=none,locks=none
[    4.952057] pci 0000:01:00.1: enabling Extended Tags
[    4.957200] PCI: bus1: Fast back to back transfers disabled
[    4.962831] PCI: bus2: Fast back to back transfers enabled
[    4.968347] pci 0000:00:02.0: BAR 8: assigned [mem
0xd0000000-0xe7ffffff]
[    4.975159] pci 0000:00:02.0: BAR 7: assigned [io  0x10000-0x10fff]
[    4.981444] pci 0000:01:00.0: BAR 0: assigned [mem
0xd0000000-0xdfffffff 64bit pref]
[    4.989219] pci 0000:01:00.0: BAR 2: assigned [mem
0xe0000000-0xe01fffff 64bit pref]
[    4.996991] pci 0000:01:00.0: BAR 5: assigned [mem
0xe0200000-0xe023ffff]
[    5.003802] pci 0000:01:00.0: BAR 6: assigned [mem
0xe0240000-0xe025ffff pref]
[    5.011041] pci 0000:01:00.1: BAR 0: assigned [mem
0xe0260000-0xe0263fff 64bit]
[    5.018378] pci 0000:01:00.0: BAR 4: assigned [io  0x10000-0x100ff]
[    5.024667] pci 0000:00:02.0: PCI bridge to [bus 01]
[    5.029642] pci 0000:00:02.0:   bridge window [io  0x10000-0x10fff]
[    5.035929] pci 0000:00:02.0:   bridge window [mem
0xd0000000-0xe7ffffff]
[    5.042738] pci 0000:00:03.0: PCI bridge to [bus 02]
[    5.047757] pcieport 0000:00:02.0: enabling device (0140 -> 0143)
[    5.054113] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    5.054406] usb 4-1: new high-speed USB device number 2 using
xhci-hcd
[    5.054803] input: gpio-keys as
/devices/platform/gpio-keys/input/input0
[    5.060314] armada38x-rtc f10a3800.rtc: setting system clock to
2018-01-02 22:16:04 UTC (1514931364)
[    5.060413] cfg80211: Loading compiled-in X.509 certificates for
regulatory database
[    5.062147] cfg80211: Loaded X.509 cert 'sforshee:
00b28ddf47aef9cea7'
[    5.062170] ALSA device list:
[    5.062171]   No soundcards found.
[    5.062225] platform regulatory.0: Direct firmware load for
regulatory.db failed with error -2
[    5.062228] cfg80211: failed to load regulatory.db
[    5.118446] ata1.00: supports DRM functions and may not be fully
accessible
[    5.125486] ata1.00: ATA-9: Samsung SSD 850 EVO mSATA 250GB,
EMT41B6Q, max UDMA/133
[    5.133166] ata1.00: 488397168 sectors, multi 1: LBA48 NCQ (depth
31/32)
[    5.141562] ata1.00: supports DRM functions and may not be fully
accessible
[    5.149950] ata1.00: configured for UDMA/133
[    5.154421] scsi 0:0:0:0: Direct-Access     ATA      Samsung SSD
850  1B6Q PQ: 0 ANSI: 5
[    5.162898] sd 0:0:0:0: [sda] 488397168 512-byte logical blocks:
(250 GB/233 GiB)
[    5.170424] sd 0:0:0:0: [sda] Write Protect is off
[    5.175289] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[    5.185066]  sda: sda1 sda2 sda3
[    5.188854] sd 0:0:0:0: [sda] Attached SCSI removable disk
[    5.194390] md: Waiting for all devices to be available before
autodetect
[    5.201192] md: If you don't use raid, use raid=noautodetect
[    5.207132] md: Autodetecting RAID arrays.
[    5.211237] md: autorun ...
[    5.214043] md: ... autorun DONE.
[    5.223207] EXT4-fs (sda2): mounted filesystem with ordered data
mode. Opts: (null)
[    5.230899] VFS: Mounted root (ext4 filesystem) on device 8:2.
[    5.237830] devtmpfs: mounted
[    5.241302] Freeing unused kernel memory: 1024K
[    5.247070] hub 4-1:1.0: USB hub found
[    5.250926] hub 4-1:1.0: 4 ports detected
[    5.317951] systemd[1]: systemd 234 running in system mode. (+PAM
+AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT )
[    5.338867] systemd[1]: Detected architecture arm.
Jan  2 22:16:10 localhost kernel: [    5.571694] usb 4-1.2: new
low-speed USB device number 3 using xhci-hcd
Jan  2 22:16:10 localhost kernel: [    5.740438] input: Trust Trust
Wireless TouchKB as
/devices/platform/soc/soc:internal-regs/f10f8000.usb3/usb4/4-1/4-1.2/4-1.2:1.0/0003:145F:01D3.0001/input/input1
Jan  2 22:16:10 localhost kernel: [    5.822021] hid-generic
0003:145F:01D3.0001: input: USB HID v1.10 Keyboard [Trust Trust
Wireless TouchKB] on usb-f10f8000.usb3-1.2/input0
Jan  2 22:16:10 localhost kernel: [    5.842379] input: Trust Trust
Wireless TouchKB as
/devices/platform/soc/soc:internal-regs/f10f8000.usb3/usb4/4-1/4-1.2/4-1.2:1.1/0003:145F:01D3.0002/input/input2
Jan  2 22:16:10 localhost kernel: [    5.921815] hid-generic
0003:145F:01D3.0002: input: USB HID v1.10 Mouse [Trust Trust Wireless
TouchKB] on usb-f10f8000.usb3-1.2/input1
Jan  2 22:16:10 localhost kernel: [    6.409779] lp: driver loaded but
no devices found
Jan  2 22:16:10 localhost kernel: [    6.417487] ppdev: user-space
parallel port driver
Jan  2 22:16:10 localhost kernel: [    6.614196] EXT4-fs (sda2):
re-mounted. Opts: errors=remount-ro
Jan  2 22:16:10 localhost kernel: [    7.648916] snd_hda_intel
0000:01:00.1: enabling device (0140 -> 0142)
Jan  2 22:16:10 localhost kernel: [    7.648929] snd_hda_intel
0000:01:00.1: Force to snoop mode by module option
Jan  2 22:16:10 localhost kernel: [    7.650649]
drm_panel_orientation_quirks: module license 'unspecified' taints
kernel.
Jan  2 22:16:10 localhost kernel: [    7.650653] Disabling lock
debugging due to kernel taint
Jan  2 22:16:10 localhost kernel: [    7.706837] input: HDA ATI HDMI
HDMI/DP,pcm=3 as
/devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input3
Jan  2 22:16:10 localhost kernel: [    7.706974] input: HDA ATI HDMI
HDMI/DP,pcm=7 as
/devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input4
Jan  2 22:16:10 localhost kernel: [    7.707110] input: HDA ATI HDMI
HDMI/DP,pcm=8 as
/devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input5
Jan  2 22:16:10 localhost kernel: [    7.707249] input: HDA ATI HDMI
HDMI/DP,pcm=9 as
/devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input6
Jan  2 22:16:10 localhost kernel: [    7.707369] input: HDA ATI HDMI
HDMI/DP,pcm=10 as
/devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input7
Jan  2 22:16:10 localhost kernel: [    7.777661] [drm] amdgpu kernel
modesetting enabled.
Jan  2 22:16:10 localhost kernel: [    7.778616] amdgpu 0000:01:00.0:
enabling device (0140 -> 0143)
Jan  2 22:16:10 localhost kernel: [    7.780795] [drm] initializing
kernel modesetting (POLARIS11 0x1002:0x67EF 0x174B:0xE348 0xCF).
Jan  2 22:16:10 localhost kernel: [    7.780809] [drm] register mmio
base: 0xE0200000
Jan  2 22:16:10 localhost kernel: [    7.780811] [drm] register mmio
size: 262144
Jan  2 22:16:10 localhost kernel: [    7.780846] [drm] probing gen 2
caps for device 11ab:6828 = 3ac12/0
Jan  2 22:16:10 localhost kernel: [    7.780848] [drm] probing mlw for
device 11ab:6828 = 3ac12
Jan  2 22:16:10 localhost kernel: [    7.780876] [drm] UVD is enabled in VM mode
Jan  2 22:16:10 localhost kernel: [    7.780877] [drm] UVD ENC is
enabled in VM mode
Jan  2 22:16:10 localhost kernel: [    7.780882] [drm] VCE enabled in VM mode
Jan  2 22:16:10 localhost kernel: [    7.998516] ATOM BIOS: 113-34801-U03
Jan  2 22:16:10 localhost kernel: [    7.998546] [drm] GPU posting now...
Jan  2 22:16:10 localhost kernel: [    8.128956] [drm] vm size is 64
GB, 2 levels, block size is 10-bit, fragment size is 9-bit
Jan  2 22:16:10 localhost kernel: [    8.133710] amdgpu 0000:01:00.0:
VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
Jan  2 22:16:10 localhost kernel: [    8.133724] amdgpu 0000:01:00.0:
GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
Jan  2 22:16:10 localhost kernel: [    8.133727] [drm] Detected VRAM
RAM=2048M, BAR=256M
Jan  2 22:16:10 localhost kernel: [    8.133729] [drm] RAM width 128bits GDDR5
Jan  2 22:16:10 localhost kernel: [    8.133823] [TTM] Zone  kernel:
Available graphics memory: 512056 kiB
Jan  2 22:16:10 localhost kernel: [    8.133825] [TTM] Initializing
pool allocator
Jan  2 22:16:10 localhost kernel: [    8.133864] [TTM] Initializing
DMA pool allocator
Jan  2 22:16:10 localhost kernel: [    8.133906] [drm] amdgpu: 2048M
of VRAM memory ready
Jan  2 22:16:10 localhost kernel: [    8.133910] [drm] amdgpu: 750M of
GTT memory ready.
Jan  2 22:16:10 localhost kernel: [    8.133933] [drm] GART: num cpu
pages 65536, num gpu pages 65536
Jan  2 22:16:10 localhost kernel: [    8.134011] [drm] PCIE GART of
256M enabled (table at 0x000000F400040000).
Jan  2 22:16:10 localhost kernel: [    8.135969] [drm] Chained IB
support enabled!
Jan  2 22:16:10 localhost kernel: [    8.222388] [drm] Found UVD
firmware Version: 1.79 Family ID: 16
Jan  2 22:16:10 localhost kernel: [    8.370002] [drm] Found VCE
firmware Version: 52.4 Binary ID: 3
Jan  2 22:16:10 localhost kernel: [    8.459983] amdgpu: [powerplay]
Jan  2 22:16:10 localhost kernel: [    8.459983]  failed to send
message 309 ret is 254
Jan  2 22:16:10 localhost kernel: [    8.460011] amdgpu: [powerplay]
Jan  2 22:16:10 localhost kernel: [    8.460011]  failed to send pre
message 14e ret is 254
Jan  2 22:16:10 localhost kernel: [    8.591421]
[drm:dal_bios_parser_init_cmd_tbl [amdgpu]] *ERROR* Don't have
enable_spread_spectrum_on_ppll for v4
Jan  2 22:16:10 localhost kernel: [    8.602061]
[drm:dal_bios_parser_init_cmd_tbl [amdgpu]] *ERROR* Don't have
program_clock for v7
Jan  2 22:16:10 localhost kernel: [    8.610933] [drm] DM_PPLIB:
values for Engine clock
Jan  2 22:16:10 localhost kernel: [    8.610937] [drm] DM_PPLIB:     21400
Jan  2 22:16:10 localhost kernel: [    8.610939] [drm] DM_PPLIB:     48100
Jan  2 22:16:10 localhost kernel: [    8.610940] [drm] DM_PPLIB:     76000
Jan  2 22:16:10 localhost kernel: [    8.610941] [drm] DM_PPLIB:     102000
Jan  2 22:16:10 localhost kernel: [    8.610943] [drm] DM_PPLIB:     110200
Jan  2 22:16:10 localhost kernel: [    8.610944] [drm] DM_PPLIB:     113800
Jan  2 22:16:10 localhost kernel: [    8.610945] [drm] DM_PPLIB:     117200
Jan  2 22:16:10 localhost kernel: [    8.610946] [drm] DM_PPLIB:     121000
Jan  2 22:16:10 localhost kernel: [    8.610948] [drm] DM_PPLIB:
Warning: using default validation clocks!
Jan  2 22:16:10 localhost kernel: [    8.610950] [drm] DM_PPLIB:
Validation clocks:
Jan  2 22:16:10 localhost kernel: [    8.610952] [drm] DM_PPLIB:
engine_max_clock: 72000
Jan  2 22:16:10 localhost kernel: [    8.610953] [drm] DM_PPLIB:
memory_max_clock: 80000
Jan  2 22:16:10 localhost kernel: [    8.610955] [drm] DM_PPLIB:
level           : 0
Jan  2 22:16:10 localhost kernel: [    8.610957] [drm] DM_PPLIB:
reducing engine clock level from 8 to 2
Jan  2 22:16:10 localhost kernel: [    8.610961] [drm] DM_PPLIB:
values for Memory clock
Jan  2 22:16:10 localhost kernel: [    8.610963] [drm] DM_PPLIB:     30000
Jan  2 22:16:10 localhost kernel: [    8.610964] [drm] DM_PPLIB:     175000
Jan  2 22:16:10 localhost kernel: [    8.610966] [drm] DM_PPLIB:
Warning: using default validation clocks!
Jan  2 22:16:10 localhost kernel: [    8.610967] [drm] DM_PPLIB:
Validation clocks:
Jan  2 22:16:10 localhost kernel: [    8.610969] [drm] DM_PPLIB:
engine_max_clock: 72000
Jan  2 22:16:10 localhost kernel: [    8.610970] [drm] DM_PPLIB:
memory_max_clock: 80000
Jan  2 22:16:10 localhost kernel: [    8.610972] [drm] DM_PPLIB:
level           : 0
Jan  2 22:16:10 localhost kernel: [    8.610973] [drm] DM_PPLIB:
reducing memory clock level from 2 to 1
Jan  2 22:16:10 localhost kernel: [    8.611994] [drm] Display Core
initialized with v3.1.27!
Jan  2 22:16:10 localhost kernel: [    8.711955] [drm] EDID checksum
is invalid, remainder is 223
Jan  2 22:16:10 localhost kernel: [    8.711961] Raw EDID:
Jan  2 22:16:10 localhost kernel: [    8.711965]      00 ff ff ff ff
ff ff 00 2e 83 54 21 34 00 00 00
Jan  2 22:16:10 localhost kernel: [    8.711967]      29 15 01 03 80
30 1b 78 0a f0 65 98 57 51 91 27
Jan  2 22:16:10 localhost kernel: [    8.711969]      00 50 54 21 08
00 81 80 a9 c0 01 01 01 01 01 01
Jan  2 22:16:10 localhost kernel: [    8.711970]      01 01 01 01 01
01 02 3a 80 18 71 38 2d 40 58 2c
Jan  2 22:16:10 localhost kernel: [    8.711972]      45 00 dc 0c 11
00 00 1e 66 21 50 b0 51 00 1b 30
Jan  2 22:16:10 localhost kernel: [    8.711974]      40 70 36 00 dc
0c 11 00 00 1e 00 00 00 fc 00 32
Jan  2 22:16:10 localhost kernel: [    8.711975]      32 4c 31 31 41
2d 48 44 2d 41 55 0a 00 00 00 fd
Jan  2 22:16:10 localhost kernel: [    8.711977]      00 31 3d 0f 44
0f 00 0a 20 20 20 20 20 20 01 15
Jan  2 22:16:10 localhost kernel: [    8.773617] [drm] EDID checksum
is invalid, remainder is 223
Jan  2 22:16:10 localhost kernel: [    8.773622] Raw EDID:
Jan  2 22:16:10 localhost kernel: [    8.773625]      00 ff ff ff ff
ff ff 00 2e 83 54 21 34 00 00 00
Jan  2 22:16:10 localhost kernel: [    8.773627]      29 15 01 03 80
30 1b 78 0a f0 65 98 57 51 91 27
Jan  2 22:16:10 localhost kernel: [    8.773629]      00 50 54 21 08
00 81 80 a9 c0 01 01 01 01 01 01
Jan  2 22:16:10 localhost kernel: [    8.773631]      01 01 01 01 01
01 02 3a 80 18 71 38 2d 40 58 2c
Jan  2 22:16:10 localhost kernel: [    8.773632]      45 00 dc 0c 11
00 00 1e 66 21 50 b0 51 00 1b 30
Jan  2 22:16:10 localhost kernel: [    8.773634]      40 70 36 00 dc
0c 11 00 00 1e 00 00 00 fc 00 32
Jan  2 22:16:10 localhost kernel: [    8.773635]      32 4c 31 31 41
2d 48 44 2d 41 55 0a 00 00 00 fd
Jan  2 22:16:10 localhost kernel: [    8.773637]      00 31 3d 0f 44
0f 00 0a 20 20 20 20 20 20 01 15
Jan  2 22:16:10 localhost kernel: [    8.835263] [drm] EDID checksum
is invalid, remainder is 223
Jan  2 22:16:10 localhost kernel: [    8.835267] Raw EDID:
Jan  2 22:16:10 localhost kernel: [    8.835270]      00 ff ff ff ff
ff ff 00 2e 83 54 21 34 00 00 00
Jan  2 22:16:10 localhost kernel: [    8.835272]      29 15 01 03 80
30 1b 78 0a f0 65 98 57 51 91 27
Jan  2 22:16:10 localhost kernel: [    8.835274]      00 50 54 21 08
00 81 80 a9 c0 01 01 01 01 01 01
Jan  2 22:16:10 localhost kernel: [    8.835275]      01 01 01 01 01
01 02 3a 80 18 71 38 2d 40 58 2c
Jan  2 22:16:10 localhost kernel: [    8.835277]      45 00 dc 0c 11
00 00 1e 66 21 50 b0 51 00 1b 30
Jan  2 22:16:10 localhost kernel: [    8.835278]      40 70 36 00 dc
0c 11 00 00 1e 00 00 00 fc 00 32
Jan  2 22:16:10 localhost kernel: [    8.835280]      32 4c 31 31 41
2d 48 44 2d 41 55 0a 00 00 00 fd
Jan  2 22:16:10 localhost kernel: [    8.835282]      00 31 3d 0f 44
0f 00 0a 20 20 20 20 20 20 01 15
Jan  2 22:16:10 localhost kernel: [    8.835794]
[drm:dm_helpers_read_local_edid [amdgpu]] *ERROR* EDID err: 3, on
connector: HDMI-A-1
Jan  2 22:16:10 localhost kernel: [    8.836127]
[drm:log_to_debug_console [amdgpu]] *ERROR* EDID checksum invalid.
Jan  2 22:16:10 localhost kernel: [    8.858895] [drm] Supports vblank
timestamp caching Rev 2 (21.10.2013).
Jan  2 22:16:10 localhost kernel: [    8.858900] [drm] Driver supports
precise vblank timestamp query.
Jan  2 22:16:10 localhost kernel: [    8.891054] [drm] UVD and UVD ENC
initialized successfully.
Jan  2 22:16:10 localhost kernel: [    8.991987] [drm] VCE initialized
successfully.
Jan  2 22:16:10 localhost kernel: [    9.122860] EXT4-fs (mmcblk0p1):
mounted filesystem with ordered data mode. Opts: errors=remount-ro
Jan  2 22:16:10 localhost kernel: [    9.148940] Adding 1952764k swap
on /dev/sda1.  Priority:-2 extents:1 across:1952764k SS
Jan  2 22:16:10 localhost kernel: [    9.514830] [drm] EDID checksum
is invalid, remainder is 223
Jan  2 22:16:10 localhost kernel: [    9.514834] Raw EDID:
Jan  2 22:16:10 localhost kernel: [    9.514838]      00 ff ff ff ff
ff ff 00 2e 83 54 21 34 00 00 00
Jan  2 22:16:10 localhost kernel: [    9.514840]      29 15 01 03 80
30 1b 78 0a f0 65 98 57 51 91 27
Jan  2 22:16:10 localhost kernel: [    9.514842]      00 50 54 21 08
00 81 80 a9 c0 01 01 01 01 01 01
Jan  2 22:16:10 localhost kernel: [    9.514843]      01 01 01 01 01
01 02 3a 80 18 71 38 2d 40 58 2c
Jan  2 22:16:10 localhost kernel: [    9.514845]      45 00 dc 0c 11
00 00 1e 66 21 50 b0 51 00 1b 30
Jan  2 22:16:10 localhost kernel: [    9.514846]      40 70 36 00 dc
0c 11 00 00 1e 00 00 00 fc 00 32
Jan  2 22:16:10 localhost kernel: [    9.514848]      32 4c 31 31 41
2d 48 44 2d 41 55 0a 00 00 00 fd
Jan  2 22:16:10 localhost kernel: [    9.514850]      00 31 3d 0f 44
0f 00 0a 20 20 20 20 20 20 01 15
Jan  2 22:16:10 localhost kernel: [    9.514856] amdgpu 0000:01:00.0:
HDMI-A-1: EDID invalid.
Jan  2 22:16:10 localhost kernel: [    9.515693] [drm] fb mappable at 0xD03F2000
Jan  2 22:16:10 localhost kernel: [    9.515695] [drm] vram apper at 0xD0000000
Jan  2 22:16:10 localhost kernel: [    9.515697] [drm] size 3145728
Jan  2 22:16:10 localhost kernel: [    9.515698] [drm] fb depth is 24
Jan  2 22:16:10 localhost kernel: [    9.515700] [drm]    pitch is 4096
Jan  2 22:16:10 localhost kernel: [    9.535511] Console: switching to
colour frame buffer device 128x48
Jan  2 22:16:10 localhost kernel: [    9.595719] amdgpu 0000:01:00.0:
fb0: amdgpudrmfb frame buffer device
Jan  2 22:16:10 localhost kernel: [    9.633154] [drm] Initialized
amdgpu 3.23.0 20150101 for 0000:01:00.0 on minor 0
Jan  2 22:16:12 localhost kernel: [   12.690445] IPv6:
ADDRCONF(NETDEV_UP): eth0: link is not ready
Jan  2 22:16:12 localhost kernel: [   12.782663] IPv6:
ADDRCONF(NETDEV_UP): eth0: link is not ready
Jan  2 22:16:12 localhost kernel: [   12.804369] IPv6:
ADDRCONF(NETDEV_UP): eth1: link is not ready
Jan  2 22:16:12 localhost kernel: [   12.805095] IPv6:
ADDRCONF(NETDEV_UP): eth1: link is not ready
Jan  2 22:16:12 localhost kernel: [   12.840083] IPv6:
ADDRCONF(NETDEV_UP): eth2: link is not ready
Jan  2 22:16:12 localhost kernel: [   12.841117] IPv6:
ADDRCONF(NETDEV_UP): eth2: link is not ready
Jan  2 22:16:12 localhost kernel: [   13.371194] [drm] EDID checksum
is invalid, remainder is 223
Jan  2 22:16:12 localhost kernel: [   13.371199] Raw EDID:
Jan  2 22:16:12 localhost kernel: [   13.371202]      00 ff ff ff ff
ff ff 00 2e 83 54 21 34 00 00 00
Jan  2 22:16:12 localhost kernel: [   13.371204]      29 15 01 03 80
30 1b 78 0a f0 65 98 57 51 91 27
Jan  2 22:16:12 localhost kernel: [   13.371206]      00 50 54 21 08
00 81 80 a9 c0 01 01 01 01 01 01
Jan  2 22:16:12 localhost kernel: [   13.371208]      01 01 01 01 01
01 02 3a 80 18 71 38 2d 40 58 2c
Jan  2 22:16:12 localhost kernel: [   13.371209]      45 00 dc 0c 11
00 00 1e 66 21 50 b0 51 00 1b 30
Jan  2 22:16:12 localhost kernel: [   13.371211]      40 70 36 00 dc
0c 11 00 00 1e 00 00 00 fc 00 32
Jan  2 22:16:12 localhost kernel: [   13.371213]      32 4c 31 31 41
2d 48 44 2d 41 55 0a 00 00 00 fd
Jan  2 22:16:12 localhost kernel: [   13.371214]      00 31 3d 0f 44
0f 00 0a 20 20 20 20 20 20 01 15
Jan  2 22:16:12 localhost kernel: [   13.371220] amdgpu 0000:01:00.0:
HDMI-A-1: EDID invalid.
Jan  2 22:16:12 localhost kernel: [   13.372798] [drm] EDID checksum
is invalid, remainder is 223
Jan  2 22:16:12 localhost kernel: [   13.372801] Raw EDID:
Jan  2 22:16:12 localhost kernel: [   13.372804]      00 ff ff ff ff
ff ff 00 2e 83 54 21 34 00 00 00
Jan  2 22:16:12 localhost kernel: [   13.372806]      29 15 01 03 80
30 1b 78 0a f0 65 98 57 51 91 27
Jan  2 22:16:12 localhost kernel: [   13.372808]      00 50 54 21 08
00 81 80 a9 c0 01 01 01 01 01 01
Jan  2 22:16:12 localhost kernel: [   13.372809]      01 01 01 01 01
01 02 3a 80 18 71 38 2d 40 58 2c
Jan  2 22:16:12 localhost kernel: [   13.372811]      45 00 dc 0c 11
00 00 1e 66 21 50 b0 51 00 1b 30
Jan  2 22:16:12 localhost kernel: [   13.372813]      40 70 36 00 dc
0c 11 00 00 1e 00 00 00 fc 00 32
Jan  2 22:16:12 localhost kernel: [   13.372814]      32 4c 31 31 41
2d 48 44 2d 41 55 0a 00 00 00 fd
Jan  2 22:16:12 localhost kernel: [   13.372816]      00 31 3d 0f 44
0f 00 0a 20 20 20 20 20 20 01 15
Jan  2 22:16:12 localhost kernel: [   13.372821] amdgpu 0000:01:00.0:
HDMI-A-1: EDID invalid.
Jan  2 22:16:13 localhost kernel: [   13.831733] mvneta
f1030000.ethernet eth1: Link is Up - 1Gbps/Full - flow control off
Jan  2 22:16:13 localhost kernel: [   13.831755] IPv6:
ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
Jan  2 22:16:13 localhost kernel: [   13.911730] mvneta
f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control off
Jan  2 22:16:13 localhost kernel: [   13.911748] IPv6:
ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
Jan  2 22:16:13 localhost kernel: [   14.209369] random: crng init done
Jan  2 22:16:13 localhost kernel: [   14.534838] fuse init (API version 7.26)
Jan  2 22:16:15 localhost kernel: [   15.992865] mvneta
f1070000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
Jan  2 22:16:15 localhost kernel: [   15.992884] IPv6:
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
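
The repeated "EDID checksum is invalid, remainder is 223" messages in the log above can be checked offline. The following is a minimal sketch, assuming the usual EDID rule that the 128 bytes of a base block sum to 0 modulo 256, and assuming the "remainder" printed by the kernel is that sum. The bytes are copied from the "Raw EDID:" lines; the helper name edid_remainder is made up for this illustration.

```python
# Raw EDID block copied verbatim from the "Raw EDID:" lines in the log.
RAW_EDID_HEX = """
00 ff ff ff ff ff ff 00 2e 83 54 21 34 00 00 00
29 15 01 03 80 30 1b 78 0a f0 65 98 57 51 91 27
00 50 54 21 08 00 81 80 a9 c0 01 01 01 01 01 01
01 01 01 01 01 01 02 3a 80 18 71 38 2d 40 58 2c
45 00 dc 0c 11 00 00 1e 66 21 50 b0 51 00 1b 30
40 70 36 00 dc 0c 11 00 00 1e 00 00 00 fc 00 32
32 4c 31 31 41 2d 48 44 2d 41 55 0a 00 00 00 fd
00 31 3d 0f 44 0f 00 0a 20 20 20 20 20 20 01 15
"""


def edid_remainder(hex_dump: str) -> int:
    """Return the sum of all bytes of one 128-byte EDID block, modulo 256.

    Hypothetical helper for illustration; a block with a valid checksum
    is assumed to yield 0.
    """
    # Drop all whitespace so the dump parses regardless of line wrapping.
    block = bytes.fromhex("".join(hex_dump.split()))
    if len(block) != 128:
        raise ValueError(f"expected 128 bytes, got {len(block)}")
    return sum(block) % 256


if __name__ == "__main__":
    remainder = edid_remainder(RAW_EDID_HEX)
    state = "valid" if remainder == 0 else "invalid"
    print(f"remainder = {remainder} ({state} checksum)")
```

For the block in the log this gives a remainder of 223, which matches the kernel message. The check only shows that the block is inconsistent. It does not show which byte is wrong.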

On Tue, Jan 2, 2018 at 1:17 PM, Christian König
<christian.koenig@amd.com> wrote:
>> when you refer to API traces, you're suggesting to strace
>> kodi, or what do you mean?
>
> What I meant was apitrace (https://github.com/apitrace/apitrace), but when
> even the lightdm login screen crashes then this won't be of much help.
>
> That strongly sounds like an ARM-specific problem; maybe USWC doesn't work as
> it should? See function drm_arch_can_wc_memory() in the kernel source and
> check whether it helps if you always return false.
>
> Apart from that the only other explanation I have is that some system memory
> isn't accessible to the GPU while other memory is working fine.
>
> Please provide the output of "sudo cat /proc/iomem" to double check that.
>
> Regards,
> Christian.
>
>
> Am 02.01.2018 um 14:09 schrieb Luís Mendes:
>>
>> Dear Mr. David, Mr. Christian,
>>
>> First of all, thanks for your replies!
>>
>> David, I will try the same software versions on x86 to see if I am
>> able to replicate the problem on x86, but I suspect it is ARM
>> specific... I'll report back when I have more details.
>>
>> Christian, I'll collect the data you've referred to and will disable the
>> power management. Regarding the mesa master version, I've tried it,
>> and the problem just gets worse. With the latest mesa, it easily locks up
>> at the lightdm login screen, or when navigating through the Ubuntu Mate
>> menus, or with Kodi.  I've tested with mesa commit "radv: Implement
>> binning on GFX9" - 6a36bfc64d2096aa338958c4605f5fc6372c07b8. Just one
>> question... when you refer to API traces, you're suggesting to strace
>> kodi, or what do you mean?
>>
>> Regards,
>> Luís
>>
>> On Tue, Jan 2, 2018 at 2:51 AM, Chunming Zhou <zhoucm1@amd.com> wrote:
>>>
>>> Did you try it on an x86 board? Is there the same issue?
>>>
>>> We should identify whether it is an ARM-specific or a general issue for
>>> the amdgpu driver.
>>>
>>>
>>> Thanks,
>>>
>>> David Zhou
>>>
>>>
>>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                 ` <CAEzXK1qCfnHUvnuuViTP5fZQx2StjecV6o_QfJ2uJk603_9nhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-03  0:36                   ` Luís Mendes
       [not found]                     ` <CAEzXK1oUyHsSGHeXS9qzWBSDL-FfRq2h-EiQMCfa=5BroO50gg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Luís Mendes @ 2018-01-03  0:36 UTC (permalink / raw)
  To: Christian König
  Cc: alexander.deucher-5C7GfCeVMHo, Chunming Zhou,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Just a small update regarding what I have posted...

I've made additional tests with mesa-17.4 at commit "radv: Implement
binning on GFX9" - 6a36bfc64d2096aa338958c4605f5fc6372c07b8 and I was
able to gather a smaller apitrace, of about 1GB, of kodi playing a
video. It hangs the GPU almost always when replayed with glretrace
without the option --singlethread. If the option --singlethread is used
when doing glretrace, no GPU hang ever occurs, it seems.
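
The two replay modes compared above can be scripted so that the same trace is run both ways. This is only a sketch: the trace file name is a placeholder, the timeout value is arbitrary, and the helper names are invented for illustration. The only glretrace feature relied on is its --singlethread option.

```python
import subprocess


def glretrace_cmd(trace_path, single_thread):
    """Build the glretrace command line for one replay."""
    cmd = ["glretrace"]
    if single_thread:
        cmd.append("--singlethread")
    cmd.append(trace_path)
    return cmd


def replay(trace_path, single_thread, timeout_s=600):
    """Replay a trace; return the exit code, or None if it timed out.

    Caveat: if the GPU hang leaves glretrace in uninterruptible sleep,
    the kill issued after the timeout may not take effect and this call
    can block anyway.
    """
    try:
        result = subprocess.run(glretrace_cmd(trace_path, single_thread),
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return result.returncode
```

For example, replay("kodi.trace", single_thread=False) followed by replay("kodi.trace", single_thread=True), where "kodi.trace" stands for whatever file apitrace produced.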

For some reason I am now getting past the lightdm login screen without
issues. Maybe some of the suggested changes improved the behaviour
with mesa-17.4; with mesa-17.3.1, however, I didn't have those issues
anyway.

Now both mesa-17.3.1 and mesa-17.4 behave similarly, blocking while
playing video with kodi, but it is also possible to cause the GPU hang
with other applications.
On the other hand, pure OpenGL applications seem to work fine... I am
able to run the glmark2 tests without issues.

How can I send these apitraces?

On Tue, Jan 2, 2018 at 10:29 PM, Luís Mendes <luis.p.mendes@gmail.com> wrote:
> Ok... I've done some of the suggested tests.
>
> I still haven't tested on x86, but I'll get to that.
>
> I've recompiled the kernel to disable Power Management as much as
> possible at all levels, including the PCIe. I've also modified
> /include/drm/drm_cache.h - static inline bool
> drm_arch_can_wc_memory(void) to always return false, but neither
> solved the issue.
>
> When I run kodi under apitrace with mesa 17.3.1 it becomes much more
> difficult to reproduce the crash. There are a lot of missed frames due
> to the CPU overload of apitrace, but I was able to crash the GPU
> once. The apitrace log is 2.3GB, how should I send it?
> It happened while playing a VP9 encoded webm video file, which is
> decoded by software, as the RX 460 is unable to hardware decode this codec
> AFAIK. In fact software decoded videos are more prone to produce the
> GPU hang, while an H265 4K hardware decoded video never causes a GPU
> hang. I'm afraid I forgot to have kodi log the execution data when
> I did the apitrace.
>
>
> The full dmesg is presented below as well as the /proc/iomem
> information and lspci output.
>  I just want to note that I'm having EDID DDC errors with my TV
> screen, because at some point from kernel 4.14 onwards, both the RX460
> and the RX550 cards started to corrupt the I2C TV screen EDID
> memory, so that I have to reflash the correct EDID data to get the
> screen back to its own configuration. This is a rare problem that only
> occurs with this TV. All other TVs and monitors that I've tested don't
> show this EDID corruption issue. I have currently stopped reflashing
> the I2C EDID configuration memory of my TV to avoid exceeding the
> memory's write cycle endurance; instead I now modify gpu/drm/drm_edid.c
> in function drm_do_get_edid() to allow the corrupted EDID to pass and
> enter X. So please ignore the EDID error warnings in my dmesg log. The
> GPU hangs occur just the same, even when I have the correct EDID, as
> it is an unrelated issue.
>
> Regards,
> Luís
>
> iomem shows this:
> 00000000-3fffffff : System RAM
>   00008000-00efffff : Kernel code
>   01000000-010e3913 : Kernel data
> d0000000-efffffff : PCI MEM
>   d0000000-e7ffffff : PCI Bus 0000:01
>     d0000000-dfffffff : 0000:01:00.0
>     e0000000-e01fffff : 0000:01:00.0
>     e0200000-e023ffff : 0000:01:00.0
>     e0240000-e025ffff : 0000:01:00.0
>     e0260000-e0263fff : 0000:01:00.1
>       e0260000-e0263fff : ICH HD audio
> f1010680-f10106cf : spi@10680
> f1011000-f101101f : i2c@11000
> f1011100-f101111f : i2c@11100
> f1012000-f101201f : serial
> f1012100-f101211f : serial
> f1018000-f101801f : pinctrl@18000
> f1018100-f101813f : gpio
> f1018140-f101817f : gpio
> f1018454-f1018457 : conf-sdio3
> f10184a0-f10184ab : rtc-soc
> f1020704-f1020707 : watchdog@20300
> f1020800-f102080f : cpurst@20800
> f1020a00-f1020ccf : interrupt-controller@20a00
> f1021070-f10210c7 : interrupt-controller@20a00
> f1022000-f1022fff : pmsu@22000
> f1030000-f1033fff : ethernet@30000
> f1034000-f1037fff : ethernet@34000
> f1040000-f1041fff : pcie@2,0
> f1044000-f1045fff : pcie@3,0
> f1058000-f10584ff : usb@58000
> f1070000-f1073fff : ethernet@70000
> f10a3800-f10a381f : rtc
> f10a8000-f10a9fff : sata@a8000
> f10d8000-f10d8fff : sdhci
> f10e0000-f10e1fff : sata@e0000
> f10e4074-f10e4077 : thermal@e8078
> f10e4078-f10e407b : thermal@e8078
> f10f0000-f10f3fff : usb3@f0000
> f10f8000-f10fbfff : usb3@f8000
> f1100000-f11007ff : f1100000.sa-sram0
> f1110000-f11107ff : f1110000.sa-sram1
> f1200000-f12fffff : f1200000.bm-bppi
>
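To cross-check the iomem listing against the BAR sizes that lspci reports, the hexadecimal ranges can be converted to sizes. The sketch below parses a few lines copied from the listing. The function names are made up for illustration. It only does arithmetic on the ranges and does not test whether the GPU can actually reach a given region.

```python
import re

# A few lines copied from the /proc/iomem listing in this message.
IOMEM_SAMPLE = """\
00000000-3fffffff : System RAM
  00008000-00efffff : Kernel code
  01000000-010e3913 : Kernel data
d0000000-efffffff : PCI MEM
  d0000000-e7ffffff : PCI Bus 0000:01
    d0000000-dfffffff : 0000:01:00.0
    e0000000-e01fffff : 0000:01:00.0
"""

# Each line is "<indent><start>-<end> : <name>", two spaces per nesting level.
LINE_RE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.+)$")


def parse_iomem(text):
    """Return a list of (depth, start, end, name) tuples."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        indent, start, end, name = match.groups()
        entries.append((len(indent) // 2, int(start, 16), int(end, 16), name))
    return entries


def size_bytes(start, end):
    """Size of an inclusive address range."""
    return end - start + 1


for depth, start, end, name in parse_iomem(IOMEM_SAMPLE):
    mib = size_bytes(start, end) / 2**20
    print("%s%-20s %10.2f MiB" % ("  " * depth, name, mib))
```

With these ranges, "System RAM" comes out at 1024 MiB and the two 0000:01:00.0 ranges at 256 MiB and 2 MiB, which agrees with the [size=256M] and [size=2M] entries in the lspci output.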
> lspci is like this:
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> [AMD/ATI] Baffin [Radeon RX 460] (rev cf) (prog-if 00)
>         Subsystem: PC Partner Limited / Sapphire Technology Baffin
> [Radeon RX 460]
>         Flags: bus master, fast devsel, latency 0, IRQ 57
>         Memory at d0000000 (64-bit, prefetchable) [size=256M]
>         Memory at e0000000 (64-bit, prefetchable) [size=2M]
>         I/O ports at 10000 [size=256]
>         Memory at e0200000 (32-bit, non-prefetchable) [size=256K]
>         Expansion ROM at e0240000 [disabled] [size=128K]
>         Capabilities: [48] Vendor Specific Information: Len=08 <?>
>         Capabilities: [50] Power Management version 3
>         Capabilities: [58] Express Legacy Endpoint, MSI 00
>         Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>         Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
> Len=010 <?>
>         Capabilities: [150] Advanced Error Reporting
>         Capabilities: [200] #15
>         Capabilities: [270] #19
>         Capabilities: [2b0] Address Translation Service (ATS)
>         Capabilities: [2c0] Page Request Interface (PRI)
>         Capabilities: [2d0] Process Address Space ID (PASID)
>         Capabilities: [320] Latency Tolerance Reporting
>         Capabilities: [328] Alternative Routing-ID Interpretation
> (ARI)
>         Capabilities: [370] L1 PM Substates
>         Kernel driver in use: amdgpu
>         Kernel modules: amdgpu
>
> 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device
> aae0
>         Subsystem: PC Partner Limited / Sapphire Technology Device
> aae0
>         Flags: bus master, fast devsel, latency 0, IRQ 56
>         Memory at e0260000 (64-bit, non-prefetchable) [size=16K]
>         Capabilities: [48] Vendor Specific Information: Len=08 <?>
>         Capabilities: [50] Power Management version 3
>         Capabilities: [58] Express Legacy Endpoint, MSI 00
>         Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>         Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
> Len=010 <?>
>         Capabilities: [150] Advanced Error Reporting
>         Capabilities: [328] Alternative Routing-ID Interpretation
> (ARI)
>         Kernel driver in use: snd_hda_intel
>         Kernel modules: snd_hda_intel
>
>
> Full dmesg output follows:
>
> Starting kernel ...
>
> [    0.000000] Booting Linux on physical CPU 0x0
> [    0.000000] Linux version 4.15.0-rc4-strong-g104bd2c-dirty
> (lpnm@ENIAC10) (gcc version 5.4.0 20160609 (Ubuntu/Lina8
> [    0.000000] CPU: ARMv7 Processor [414fc091] revision 1 (ARMv7),
> cr=10c5387d
> [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing
> instruction cache
> [    0.000000] Memory policy: Data cache writealloc
> [    0.000000] random: fast init done
> [    0.000000] percpu: Embedded 16 pages/cpu @(ptrval) s35148 r8192
> d22196 u65536
> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages:
> 260096
> [    0.000000] Kernel command line: root=/dev/sda2 rw rootfstype=ext4
> rootwait console=ttyS0,115200n8
> [    0.000000] Dentry cache hash table entries: 131072 (order: 7,
> 524288 bytes)
> [    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144
> bytes)
> [    0.000000] Memory: 1023088K/1048576K available (11264K kernel
> code, 685K rwdata, 2296K rodata, 1024K init, 224K b)
> [    0.000000] Virtual kernel memory layout:
> [    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
> [    0.000000]     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
> [    0.000000]     vmalloc : 0xc0800000 - 0xff800000   (1008 MB)
> [    0.000000]     lowmem  : 0x80000000 - 0xc0000000   (1024 MB)
> [    0.000000]     pkmap   : 0x7fe00000 - 0x80000000   (   2 MB)
> [    0.000000]     modules : 0x7f000000 - 0x7fe00000   (  14 MB)
> [    0.000000]       .text : 0x(ptrval) - 0x(ptrval)   (12256 kB)
> [    0.000000]       .init : 0x(ptrval) - 0x(ptrval)   (1024 kB)
> [    0.000000]       .data : 0x(ptrval) - 0x(ptrval)   ( 686 kB)
> [    0.000000]        .bss : 0x(ptrval) - 0x(ptrval)   ( 225 kB)
> [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2,
> Nodes=1
> [    0.000000] Hierarchical RCU implementation.
> [    0.000000]  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
> [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16,
> nr_cpu_ids=2
> [    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
> [    0.000000] L2C-310 erratum 769419 enabled
> [    0.000000] L2C-310 enabling early BRESP for Cortex-A9
> [    0.000000] L2C-310 full line of zeros enabled for Cortex-A9
> [    0.000000] L2C-310 D prefetch enabled, offset 1 lines
> [    0.000000] L2C-310 dynamic clock gating enabled, standby mode
> enabled
> [    0.000000] L2C-310 Coherent cache controller enabled, 16 ways,
> 1024 kB
> [    0.000000] L2C-310 Coherent: CACHE_ID 0x410054c9, AUX_CTRL
> 0x56070001
> [    0.000006] sched_clock: 64 bits at 800MHz, resolution 1ns, wraps
> every 4398046511103ns
> [    0.000017] clocksource: arm_global_timer: mask: 0xffffffffffffffff
> max_cycles: 0xb881274fa3, max_idle_ns: 4407952s
> [    0.000030] Switching to timer-based delay loop, resolution 1ns
> [    0.000168] Ignoring duplicate/late registration of
> read_current_timer delay
> [    0.000175] clocksource: armada_370_xp_clocksource: mask:
> 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 76450417s
> [    0.000351] Console: colour dummy device 80x30
> [    0.000364] Calibrating delay loop (skipped), value calculated
> using timer frequency.. 1600.00 BogoMIPS (lpj=80000)
> [    0.000370] pid_max: default: 32768 minimum: 301
> [    0.000416] Mount-cache hash table entries: 2048 (order: 1, 8192
> bytes)
> [    0.000422] Mountpoint-cache hash table entries: 2048 (order: 1,
> 8192 bytes)
> [    0.000660] CPU: Testing write buffer coherency: ok
> [    0.000775] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
> [    0.000907] Setting up static identity map for 0x100000 - 0x100060
> [    0.000982] mvebu-soc-id: MVEBU SoC ID=0x6828, Rev=0x4
> [    0.001058] mvebu-pmsu: Initializing Power Management Service Unit
> [    0.001107] Hierarchical SRCU implementation.
> [    0.002574] smp: Bringing up secondary CPUs ...
> [    0.002727] Booting CPU 1
> [    0.002882] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> [    0.002927] smp: Brought up 1 node, 2 CPUs
> [    0.002933] SMP: Total of 2 processors activated (3200.00
> BogoMIPS).
> [    0.002936] CPU: All CPU(s) started in SVC mode.
> [    0.003353] devtmpfs: initialized
> [    0.005053] VFP support v0.3: implementor 41 architecture 3 part 30
> variant 9 rev 4
> [    0.005112] clocksource: jiffies: mask: 0xffffffff max_cycles:
> 0xffffffff, max_idle_ns: 19112604462750000 ns
> [    0.005120] futex hash table entries: 512 (order: 3, 32768 bytes)
> [    0.005188] xor: measuring software checksum speed
> [    0.100066]    arm4regs  :  2506.400 MB/sec
> [    0.200065]    8regs     :  1967.600 MB/sec
> [    0.300066]    32regs    :  1854.800 MB/sec
> [    0.400067]    neon      :  1822.000 MB/sec
> [    0.400070] xor: using function: arm4regs (2506.400 MB/sec)
> [    0.400076] pinctrl core: initialized pinctrl subsystem
> [    0.400421] NET: Registered protocol family 16
> [    0.400918] DMA: preallocated 256 KiB pool for atomic coherent
> allocations
> [    0.401426] hw-breakpoint: found 5 (+1 reserved) breakpoint and 1
> watchpoint registers.
> [    0.401432] hw-breakpoint: maximum watchpoint size is 4 bytes.
> [    0.401547] mvebu-pmsu: CPU hotplug support is currently broken on
> Armada 38x: disabling
> [    0.401554] mvebu-pmsu: CPU idle is currently broken on Armada 38x:
> disabling
> [    0.580251] raid6: int32x1  gen()   210 MB/s
> [    0.750114] raid6: int32x1  xor()   299 MB/s
> [    0.920090] raid6: int32x2  gen()   291 MB/s
> [    1.090117] raid6: int32x2  xor()   345 MB/s
> [    1.260146] raid6: int32x4  gen()   385 MB/s
> [    1.430096] raid6: int32x4  xor()   343 MB/s
> [    1.600076] raid6: int32x8  gen()   405 MB/s
> [    1.770097] raid6: int32x8  xor()   283 MB/s
> [    1.940098] raid6: neonx1   gen()  1212 MB/s
> [    2.110070] raid6: neonx1   xor()  1147 MB/s
> [    2.280110] raid6: neonx2   gen()  1294 MB/s
> [    2.450092] raid6: neonx2   xor()  1331 MB/s
> [    2.620096] raid6: neonx4   gen()  1271 MB/s
> [    2.790069] raid6: neonx4   xor()  1273 MB/s
> [    2.960097] raid6: neonx8   gen()  1094 MB/s
> [    3.130097] raid6: neonx8   xor()  1033 MB/s
> [    3.130100] raid6: using algorithm neonx2 gen() 1294 MB/s
> [    3.130103] raid6: .... xor() 1331 MB/s, rmw enabled
> [    3.130106] raid6: using neon recovery algorithm
> [    3.130501] vgaarb: loaded
> [    3.130631] SCSI subsystem initialized
> [    3.130846] usbcore: registered new interface driver usbfs
> [    3.130870] usbcore: registered new interface driver hub
> [    3.130895] usbcore: registered new device driver usb
> [    3.130993] pps_core: LinuxPPS API ver. 1 registered
> [    3.130997] pps_core: Software ver. 5.3.6 - Copyright 2005-2007
> Rodolfo Giometti <giometti@linux.it>
> [    3.131007] PTP clock support registered
> [    3.131255] Advanced Linux Sound Architecture Driver Initialized.
> [    3.131458] Bluetooth: Core ver 2.22
> [    3.131480] NET: Registered protocol family 31
> [    3.131484] Bluetooth: HCI device and connection manager
> initialized
> [    3.131490] Bluetooth: HCI socket layer initialized
> [    3.131494] Bluetooth: L2CAP socket layer initialized
> [    3.131507] Bluetooth: SCO socket layer initialized
> [    3.131694] clocksource: Switched to clocksource arm_global_timer
> [    3.156110] NET: Registered protocol family 2
> [    3.156352] TCP established hash table entries: 8192 (order: 3,
> 32768 bytes)
> [    3.156391] TCP bind hash table entries: 8192 (order: 4, 65536
> bytes)
> [    3.156452] TCP: Hash tables configured (established 8192 bind
> 8192)
> [    3.156507] UDP hash table entries: 512 (order: 2, 16384 bytes)
> [    3.156532] UDP-Lite hash table entries: 512 (order: 2, 16384
> bytes)
> [    3.156610] NET: Registered protocol family 1
> [    3.156807] RPC: Registered named UNIX socket transport module.
> [    3.156811] RPC: Registered udp transport module.
> [    3.156814] RPC: Registered tcp transport module.
> [    3.156817] RPC: Registered tcp NFSv4.1 backchannel transport
> module.
> [    3.157229] hw perfevents: enabled with armv7_cortex_a9 PMU driver,
> 7 counters available
> [    3.157717] Initialise system trusted keyrings
> [    3.157939] workingset: timestamp_bits=30 max_order=18
> bucket_order=0
> [    3.160489] Installing knfsd (copyright (C) 1996
> okir@monad.swb.de).
> [    3.162171] async_tx: api initialized (async)
> [    3.162179] Key type asymmetric registered
> [    3.162183] Asymmetric key parser 'x509' registered
> [    3.162218] Block layer SCSI generic (bsg) driver version 0.4
> loaded (major 249)
> [    3.162224] io scheduler noop registered
> [    3.162228] io scheduler deadline registered
> [    3.162300] io scheduler cfq registered (default)
> [    3.162305] io scheduler mq-deadline registered
> [    3.162309] io scheduler kyber registered
> [    3.162896] armada-38x-pinctrl f1018000.pinctrl: registered pinctrl
> driver
> [    3.163856] mvebu-pcie soc:pcie: /soc/pcie/pcie@2,0: reset gpio is
> active low
> [    3.164199] mv_xor f1060800.xor: Marvell shared XOR driver
> [    3.222153] mv_xor f1060800.xor: Marvell XOR (Descriptor Mode): (
> xor cpy intr )
> [    3.222894] mv_xor f1060900.xor: Marvell shared XOR driver
> [    3.282140] mv_xor f1060900.xor: Marvell XOR (Descriptor Mode): (
> xor cpy intr )
> [    3.301102] Serial: 8250/16550 driver, 4 ports, IRQ sharing
> disabled
> [    3.301902] console [ttyS0] disabled
> [    3.321974] f1012000.serial: ttyS0 at MMIO 0xf1012000 (irq = 23,
> base_baud = 15625000) is a 16550A
> [    4.145968] console [ttyS0] enabled
> [    4.169945] f1012100.serial: ttyS1 at MMIO 0xf1012100 (irq = 24,
> base_baud = 15625000) is a 16550A
> [    4.179505] ahci-mvebu f10a8000.sata: AHCI 0001.0000 32 slots 2
> ports 6 Gbps 0x3 impl platform mode
> [    4.188591] ahci-mvebu f10a8000.sata: flags: 64bit ncq sntf led
> only pmp fbs pio slum part sxs
> [    4.197699] scsi host0: ahci-mvebu
> [    4.201249] scsi host1: ahci-mvebu
> [    4.204774] ata1: SATA max UDMA/133 mmio [mem
> 0xf10a8000-0xf10a9fff] port 0x100 irq 45
> [    4.212715] ata2: SATA max UDMA/133 mmio [mem
> 0xf10a8000-0xf10a9fff] port 0x180 irq 45
> [    4.220778] ahci-mvebu f10e0000.sata: AHCI 0001.0000 32 slots 2
> ports 6 Gbps 0x3 impl platform mode
> [    4.229854] ahci-mvebu f10e0000.sata: flags: 64bit ncq sntf led
> only pmp fbs pio slum part sxs
> [    4.238957] scsi host2: ahci-mvebu
> [    4.242505] scsi host3: ahci-mvebu
> [    4.245987] ata3: SATA max UDMA/133 mmio [mem
> 0xf10e0000-0xf10e1fff] port 0x100 irq 46
> [    4.253944] ata4: SATA max UDMA/133 mmio [mem
> 0xf10e0000-0xf10e1fff] port 0x180 irq 46
> [    4.262653] Ethernet Channel Bonding Driver: v3.7.1 (April 27,
> 2011)
> [    4.269799] libphy: Fixed MDIO Bus: probed
> [    4.274099] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
> [    4.279945] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> [    4.285917] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver -
> version 5.1.0-k
> [    4.293594] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
> [    4.299440] ixgb: Intel(R) PRO/10GbE Network Driver - version
> 1.0.135-k2-NAPI
> [    4.306607] ixgb: Copyright (c) 1999-2008 Intel Corporation.
> [    4.312474] libphy: orion_mdio_bus: probed
> [    4.318289] mv88e6085: probe of f1072004.mdio-mii:04 failed with
> error -110
> [    4.326210] mvneta f1070000.ethernet eth0: Using device tree mac
> address 00:50:43:56:27:29
> [    4.335261] mvneta f1030000.ethernet eth1: Using device tree mac
> address 00:50:43:56:8b:29
> [    4.344297] mvneta f1034000.ethernet eth2: Using device tree mac
> address 00:50:43:27:8b:56
> [    4.352756] iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network
> Connection driver for Linux, in-tree:s
> [    4.361914] iwl3945: Copyright(c) 2003-2011 Intel Corporation
> [    4.367671] iwl3945: hw_scan is disabled
> [    4.371688] usbcore: registered new interface driver asix
> [    4.377123] usbcore: registered new interface driver ax88179_178a
> [    4.383251] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI)
> Driver
> [    4.389796] ehci-pci: EHCI PCI platform driver
> [    4.394273] ehci-orion: EHCI orion driver
> [    4.398380] orion-ehci f1058000.usb: EHCI Host Controller
> [    4.403807] orion-ehci f1058000.usb: new USB bus registered,
> assigned bus number 1
> [    4.411440] orion-ehci f1058000.usb: irq 41, io mem 0xf1058000
> [    4.441680] orion-ehci f1058000.usb: USB 2.0 started, EHCI 1.00
> [    4.447935] hub 1-0:1.0: USB hub found
> [    4.451780] hub 1-0:1.0: 1 port detected
> [    4.456093] xhci-hcd f10f0000.usb3: xHCI Host Controller
> [    4.461436] xhci-hcd f10f0000.usb3: new USB bus registered,
> assigned bus number 2
> [    4.469014] xhci-hcd f10f0000.usb3: hcc params 0x0a000990 hci
> version 0x100 quirks 0x00010010
> [    4.477582] xhci-hcd f10f0000.usb3: irq 48, io mem 0xf10f0000
> [    4.483661] hub 2-0:1.0: USB hub found
> [    4.487443] hub 2-0:1.0: 1 port detected
> [    4.491486] xhci-hcd f10f0000.usb3: xHCI Host Controller
> [    4.496838] xhci-hcd f10f0000.usb3: new USB bus registered,
> assigned bus number 3
> [    4.504383] usb usb3: We don't know the algorithms for LPM for this
> host, disabling LPM.
> [    4.512749] hub 3-0:1.0: USB hub found
> [    4.516526] hub 3-0:1.0: 1 port detected
> [    4.520635] xhci-hcd f10f8000.usb3: xHCI Host Controller
> [    4.525989] xhci-hcd f10f8000.usb3: new USB bus registered,
> assigned bus number 4
> [    4.533545] xhci-hcd f10f8000.usb3: hcc params 0x0a000990 hci
> version 0x100 quirks 0x00010010
> [    4.542111] xhci-hcd f10f8000.usb3: irq 49, io mem 0xf10f8000
> [    4.548169] hub 4-0:1.0: USB hub found
> [    4.551961] ata2: SATA link down (SStatus 0 SControl 300)
> [    4.551965] hub 4-0:1.0: 1 port detected
> [    4.552066] xhci-hcd f10f8000.usb3: xHCI Host Controller
> [    4.552073] xhci-hcd f10f8000.usb3: new USB bus registered,
> assigned bus number 5
> [    4.552119] usb usb5: We don't know the algorithms for LPM for this
> host, disabling LPM.
> [    4.552414] hub 5-0:1.0: USB hub found
> [    4.552444] hub 5-0:1.0: 1 port detected
> [    4.552652] usbcore: registered new interface driver usb-storage
> [    4.552824] mousedev: PS/2 mouse device common for all mice
> [    4.553353] armada38x-rtc f10a3800.rtc: rtc core: registered
> f10a3800.rtc as rtc0
> [    4.553480] i2c /dev entries driver
> [    4.553855] pca953x 0-0020: 0-0020 supply vcc not found, using
> dummy regulator
> [    4.567037] GPIO line 496 (pcie1.0-clkreq) hogged as input
> [    4.568076] GPIO line 499 (pcie1.0-w-disable) hogged as output/low
> [    4.568805] GPIO line 501 (usb3-current-limit) hogged as input
> [    4.569844] GPIO line 502 (usb3-power) hogged as output/high
> [    4.570883] GPIO line 507 (m.2 devslp) hogged as output/low
> [    4.571613] GPIO line 508 (sfp-los) hogged as input
> [    4.572343] GPIO line 509 (sfp-tx-fault) hogged as input
> [    4.573382] GPIO line 510 (sfp-tx-disable) hogged as output/low
> [    4.574111] GPIO line 511 (sfp-mod-def0) hogged as input
> [    4.574840] GPIO line 500 (pcie2.0-clkreq) hogged as input
> [    4.575879] GPIO line 503 (pcie2.0-w-disable) hogged as output/low
> [    4.575961] pca953x 0-0020: interrupt support not compiled in
> [    4.576394] IR NEC protocol handler initialized
> [    4.576396] IR RC5(x/sz) protocol handler initialized
> [    4.576397] IR RC6 protocol handler initialized
> [    4.576398] IR JVC protocol handler initialized
> [    4.576399] IR Sony protocol handler initialized
> [    4.576400] IR SANYO protocol handler initialized
> [    4.576402] IR Sharp protocol handler initialized
> [    4.576403] IR MCE Keyboard/mouse protocol handler initialized
> [    4.576404] IR XMP protocol handler initialized
> [    4.586610] (NULL device *): hwmon_device_register() is deprecated.
> Please convert the driver to use hwmon_device_.
> [    4.587121] orion_wdt: Initial timeout 171 sec
> [    4.587445] sdhci: Secure Digital Host Controller Interface driver
> [    4.587446] sdhci: Copyright(c) Pierre Ossman
> [    4.587610] sdhci-pxav3 f10d8000.sdhci: Got CD GPIO
> [    4.601813] ata4: SATA link down (SStatus 0 SControl 300)
> [    4.602073] ata3: SATA link down (SStatus 0 SControl 300)
> [    4.651761] mmc0: SDHCI controller on f10d8000.sdhci
> [f10d8000.sdhci] using ADMA
> [    4.651860] sdhci-pltfm: SDHCI platform and OF driver helper
> [    4.652046] usbcore: registered new interface driver usbhid
> [    4.652047] usbhid: USB HID core driver
> [    4.653005] NET: Registered protocol family 10
> [    4.653849] Segment Routing with IPv6
> [    4.653881] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
> [    4.654110] NET: Registered protocol family 17
> [    4.657571] 8021q: 802.1Q VLAN Support v1.8
> [    4.657655] ThumbEE CPU extension supported.
> [    4.657660] Registering SWP/SWPB emulation handler
> [    4.657907] Loading compiled-in X.509 certificates
> [    4.658691] Btrfs loaded, crc32c=crc32c-generic
> [    4.659319] mvebu-pcie soc:pcie: /soc/pcie/pcie@2,0: reset gpio is
> active low
> [    4.659357] mvebu-pcie soc:pcie: /soc/pcie/pcie@3,0: reset gpio is
> active low
> [    4.704927] mmc0: new high speed SDHC card at address aaaa
> [    4.705127] mmcblk0: mmc0:aaaa SL16G 14.8 GiB
> [    4.707004]  mmcblk0: p1 p2
> [    4.891777] mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
> [    4.897716] pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
> [    4.904006] pci_bus 0000:00: root bus resource [mem
> 0xd0000000-0xefffffff]
> [    4.910897] pci_bus 0000:00: root bus resource [bus 00-ff]
> [    4.916598] PCI: bus0: Fast back to back transfers disabled
> [    4.922192] pci 0000:00:02.0: bridge configuration invalid ([bus
> 00-00]), reconfiguring
> [    4.930217] pci 0000:00:03.0: bridge configuration invalid ([bus
> 00-00]), reconfiguring
> [    4.938401] pci 0000:01:00.0: enabling Extended Tags
> [    4.943568] pci 0000:01:00.0: vgaarb: VGA device added:
> decodes=io+mem,owns=none,locks=none
> [    4.952057] pci 0000:01:00.1: enabling Extended Tags
> [    4.957200] PCI: bus1: Fast back to back transfers disabled
> [    4.962831] PCI: bus2: Fast back to back transfers enabled
> [    4.968347] pci 0000:00:02.0: BAR 8: assigned [mem
> 0xd0000000-0xe7ffffff]
> [    4.975159] pci 0000:00:02.0: BAR 7: assigned [io  0x10000-0x10fff]
> [    4.981444] pci 0000:01:00.0: BAR 0: assigned [mem
> 0xd0000000-0xdfffffff 64bit pref]
> [    4.989219] pci 0000:01:00.0: BAR 2: assigned [mem
> 0xe0000000-0xe01fffff 64bit pref]
> [    4.996991] pci 0000:01:00.0: BAR 5: assigned [mem
> 0xe0200000-0xe023ffff]
> [    5.003802] pci 0000:01:00.0: BAR 6: assigned [mem
> 0xe0240000-0xe025ffff pref]
> [    5.011041] pci 0000:01:00.1: BAR 0: assigned [mem
> 0xe0260000-0xe0263fff 64bit]
> [    5.018378] pci 0000:01:00.0: BAR 4: assigned [io  0x10000-0x100ff]
> [    5.024667] pci 0000:00:02.0: PCI bridge to [bus 01]
> [    5.029642] pci 0000:00:02.0:   bridge window [io  0x10000-0x10fff]
> [    5.035929] pci 0000:00:02.0:   bridge window [mem
> 0xd0000000-0xe7ffffff]
> [    5.042738] pci 0000:00:03.0: PCI bridge to [bus 02]
> [    5.047757] pcieport 0000:00:02.0: enabling device (0140 -> 0143)
> [    5.054113] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [    5.054406] usb 4-1: new high-speed USB device number 2 using
> xhci-hcd
> [    5.054803] input: gpio-keys as
> /devices/platform/gpio-keys/input/input0
> [    5.060314] armada38x-rtc f10a3800.rtc: setting system clock to
> 2018-01-02 22:16:04 UTC (1514931364)
> [    5.060413] cfg80211: Loading compiled-in X.509 certificates for
> regulatory database
> [    5.062147] cfg80211: Loaded X.509 cert 'sforshee:
> 00b28ddf47aef9cea7'
> [    5.062170] ALSA device list:
> [    5.062171]   No soundcards found.
> [    5.062225] platform regulatory.0: Direct firmware load for
> regulatory.db failed with error -2
> [    5.062228] cfg80211: failed to load regulatory.db
> [    5.118446] ata1.00: supports DRM functions and may not be fully
> accessible
> [    5.125486] ata1.00: ATA-9: Samsung SSD 850 EVO mSATA 250GB,
> EMT41B6Q, max UDMA/133
> [    5.133166] ata1.00: 488397168 sectors, multi 1: LBA48 NCQ (depth
> 31/32)
> [    5.141562] ata1.00: supports DRM functions and may not be fully
> accessible
> [    5.149950] ata1.00: configured for UDMA/133
> [    5.154421] scsi 0:0:0:0: Direct-Access     ATA      Samsung SSD
> 850  1B6Q PQ: 0 ANSI: 5
> [    5.162898] sd 0:0:0:0: [sda] 488397168 512-byte logical blocks:
> (250 GB/233 GiB)
> [    5.170424] sd 0:0:0:0: [sda] Write Protect is off
> [    5.175289] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
> enabled, doesn't support DPO or FUA
> [    5.185066]  sda: sda1 sda2 sda3
> [    5.188854] sd 0:0:0:0: [sda] Attached SCSI removable disk
> [    5.194390] md: Waiting for all devices to be available before
> autodetect
> [    5.201192] md: If you don't use raid, use raid=noautodetect
> [    5.207132] md: Autodetecting RAID arrays.
> [    5.211237] md: autorun ...
> [    5.214043] md: ... autorun DONE.
> [    5.223207] EXT4-fs (sda2): mounted filesystem with ordered data
> mode. Opts: (null)
> [    5.230899] VFS: Mounted root (ext4 filesystem) on device 8:2.
> [    5.237830] devtmpfs: mounted
> [    5.241302] Freeing unused kernel memory: 1024K
> [    5.247070] hub 4-1:1.0: USB hub found
> [    5.250926] hub 4-1:1.0: 4 ports detected
> [    5.317951] systemd[1]: systemd 234 running in system mode. (+PAM
> +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT )
> [    5.338867] systemd[1]: Detected architecture arm.
> Jan  2 22:16:10 localhost kernel: [    5.571694] usb 4-1.2: new
> low-speed USB device number 3 using xhci-hcd
> Jan  2 22:16:10 localhost kernel: [    5.740438] input: Trust Trust
> Wireless TouchKB as
> /devices/platform/soc/soc:internal-regs/f10f8000.usb3/usb4/4-1/4-1.2/4-1.2:1.0/0003:145F:01D3.0001/input/input1
> Jan  2 22:16:10 localhost kernel: [    5.822021] hid-generic
> 0003:145F:01D3.0001: input: USB HID v1.10 Keyboard [Trust Trust
> Wireless TouchKB] on usb-f10f8000.usb3-1.2/input0
> Jan  2 22:16:10 localhost kernel: [    5.842379] input: Trust Trust
> Wireless TouchKB as
> /devices/platform/soc/soc:internal-regs/f10f8000.usb3/usb4/4-1/4-1.2/4-1.2:1.1/0003:145F:01D3.0002/input/input2
> Jan  2 22:16:10 localhost kernel: [    5.921815] hid-generic
> 0003:145F:01D3.0002: input: USB HID v1.10 Mouse [Trust Trust Wireless
> TouchKB] on usb-f10f8000.usb3-1.2/input1
> Jan  2 22:16:10 localhost kernel: [    6.409779] lp: driver loaded but
> no devices found
> Jan  2 22:16:10 localhost kernel: [    6.417487] ppdev: user-space
> parallel port driver
> Jan  2 22:16:10 localhost kernel: [    6.614196] EXT4-fs (sda2):
> re-mounted. Opts: errors=remount-ro
> Jan  2 22:16:10 localhost kernel: [    7.648916] snd_hda_intel
> 0000:01:00.1: enabling device (0140 -> 0142)
> Jan  2 22:16:10 localhost kernel: [    7.648929] snd_hda_intel
> 0000:01:00.1: Force to snoop mode by module option
> Jan  2 22:16:10 localhost kernel: [    7.650649]
> drm_panel_orientation_quirks: module license 'unspecified' taints
> kernel.
> Jan  2 22:16:10 localhost kernel: [    7.650653] Disabling lock
> debugging due to kernel taint
> Jan  2 22:16:10 localhost kernel: [    7.706837] input: HDA ATI HDMI
> HDMI/DP,pcm=3 as
> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input3
> Jan  2 22:16:10 localhost kernel: [    7.706974] input: HDA ATI HDMI
> HDMI/DP,pcm=7 as
> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input4
> Jan  2 22:16:10 localhost kernel: [    7.707110] input: HDA ATI HDMI
> HDMI/DP,pcm=8 as
> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input5
> Jan  2 22:16:10 localhost kernel: [    7.707249] input: HDA ATI HDMI
> HDMI/DP,pcm=9 as
> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input6
> Jan  2 22:16:10 localhost kernel: [    7.707369] input: HDA ATI HDMI
> HDMI/DP,pcm=10 as
> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input7
> Jan  2 22:16:10 localhost kernel: [    7.777661] [drm] amdgpu kernel
> modesetting enabled.
> Jan  2 22:16:10 localhost kernel: [    7.778616] amdgpu 0000:01:00.0:
> enabling device (0140 -> 0143)
> Jan  2 22:16:10 localhost kernel: [    7.780795] [drm] initializing
> kernel modesetting (POLARIS11 0x1002:0x67EF 0x174B:0xE348 0xCF).
> Jan  2 22:16:10 localhost kernel: [    7.780809] [drm] register mmio
> base: 0xE0200000
> Jan  2 22:16:10 localhost kernel: [    7.780811] [drm] register mmio
> size: 262144
> Jan  2 22:16:10 localhost kernel: [    7.780846] [drm] probing gen 2
> caps for device 11ab:6828 = 3ac12/0
> Jan  2 22:16:10 localhost kernel: [    7.780848] [drm] probing mlw for
> device 11ab:6828 = 3ac12
> Jan  2 22:16:10 localhost kernel: [    7.780876] [drm] UVD is enabled in VM mode
> Jan  2 22:16:10 localhost kernel: [    7.780877] [drm] UVD ENC is
> enabled in VM mode
> Jan  2 22:16:10 localhost kernel: [    7.780882] [drm] VCE enabled in VM mode
> Jan  2 22:16:10 localhost kernel: [    7.998516] ATOM BIOS: 113-34801-U03
> Jan  2 22:16:10 localhost kernel: [    7.998546] [drm] GPU posting now...
> Jan  2 22:16:10 localhost kernel: [    8.128956] [drm] vm size is 64
> GB, 2 levels, block size is 10-bit, fragment size is 9-bit
> Jan  2 22:16:10 localhost kernel: [    8.133710] amdgpu 0000:01:00.0:
> VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
> Jan  2 22:16:10 localhost kernel: [    8.133724] amdgpu 0000:01:00.0:
> GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
> Jan  2 22:16:10 localhost kernel: [    8.133727] [drm] Detected VRAM
> RAM=2048M, BAR=256M
> Jan  2 22:16:10 localhost kernel: [    8.133729] [drm] RAM width 128bits GDDR5
> Jan  2 22:16:10 localhost kernel: [    8.133823] [TTM] Zone  kernel:
> Available graphics memory: 512056 kiB
> Jan  2 22:16:10 localhost kernel: [    8.133825] [TTM] Initializing
> pool allocator
> Jan  2 22:16:10 localhost kernel: [    8.133864] [TTM] Initializing
> DMA pool allocator
> Jan  2 22:16:10 localhost kernel: [    8.133906] [drm] amdgpu: 2048M
> of VRAM memory ready
> Jan  2 22:16:10 localhost kernel: [    8.133910] [drm] amdgpu: 750M of
> GTT memory ready.
> Jan  2 22:16:10 localhost kernel: [    8.133933] [drm] GART: num cpu
> pages 65536, num gpu pages 65536
> Jan  2 22:16:10 localhost kernel: [    8.134011] [drm] PCIE GART of
> 256M enabled (table at 0x000000F400040000).
> Jan  2 22:16:10 localhost kernel: [    8.135969] [drm] Chained IB
> support enabled!
> Jan  2 22:16:10 localhost kernel: [    8.222388] [drm] Found UVD
> firmware Version: 1.79 Family ID: 16
> Jan  2 22:16:10 localhost kernel: [    8.370002] [drm] Found VCE
> firmware Version: 52.4 Binary ID: 3
> Jan  2 22:16:10 localhost kernel: [    8.459983] amdgpu: [powerplay]
> Jan  2 22:16:10 localhost kernel: [    8.459983]  failed to send
> message 309 ret is 254
> Jan  2 22:16:10 localhost kernel: [    8.460011] amdgpu: [powerplay]
> Jan  2 22:16:10 localhost kernel: [    8.460011]  failed to send pre
> message 14e ret is 254
> Jan  2 22:16:10 localhost kernel: [    8.591421]
> [drm:dal_bios_parser_init_cmd_tbl [amdgpu]] *ERROR* Don't have
> enable_spread_spectrum_on_ppll for v4
> Jan  2 22:16:10 localhost kernel: [    8.602061]
> [drm:dal_bios_parser_init_cmd_tbl [amdgpu]] *ERROR* Don't have
> program_clock for v7
> Jan  2 22:16:10 localhost kernel: [    8.610933] [drm] DM_PPLIB:
> values for Engine clock
> Jan  2 22:16:10 localhost kernel: [    8.610937] [drm] DM_PPLIB:     21400
> Jan  2 22:16:10 localhost kernel: [    8.610939] [drm] DM_PPLIB:     48100
> Jan  2 22:16:10 localhost kernel: [    8.610940] [drm] DM_PPLIB:     76000
> Jan  2 22:16:10 localhost kernel: [    8.610941] [drm] DM_PPLIB:     102000
> Jan  2 22:16:10 localhost kernel: [    8.610943] [drm] DM_PPLIB:     110200
> Jan  2 22:16:10 localhost kernel: [    8.610944] [drm] DM_PPLIB:     113800
> Jan  2 22:16:10 localhost kernel: [    8.610945] [drm] DM_PPLIB:     117200
> Jan  2 22:16:10 localhost kernel: [    8.610946] [drm] DM_PPLIB:     121000
> Jan  2 22:16:10 localhost kernel: [    8.610948] [drm] DM_PPLIB:
> Warning: using default validation clocks!
> Jan  2 22:16:10 localhost kernel: [    8.610950] [drm] DM_PPLIB:
> Validation clocks:
> Jan  2 22:16:10 localhost kernel: [    8.610952] [drm] DM_PPLIB:
> engine_max_clock: 72000
> Jan  2 22:16:10 localhost kernel: [    8.610953] [drm] DM_PPLIB:
> memory_max_clock: 80000
> Jan  2 22:16:10 localhost kernel: [    8.610955] [drm] DM_PPLIB:
> level           : 0
> Jan  2 22:16:10 localhost kernel: [    8.610957] [drm] DM_PPLIB:
> reducing engine clock level from 8 to 2
> Jan  2 22:16:10 localhost kernel: [    8.610961] [drm] DM_PPLIB:
> values for Memory clock
> Jan  2 22:16:10 localhost kernel: [    8.610963] [drm] DM_PPLIB:     30000
> Jan  2 22:16:10 localhost kernel: [    8.610964] [drm] DM_PPLIB:     175000
> Jan  2 22:16:10 localhost kernel: [    8.610966] [drm] DM_PPLIB:
> Warning: using default validation clocks!
> Jan  2 22:16:10 localhost kernel: [    8.610967] [drm] DM_PPLIB:
> Validation clocks:
> Jan  2 22:16:10 localhost kernel: [    8.610969] [drm] DM_PPLIB:
> engine_max_clock: 72000
> Jan  2 22:16:10 localhost kernel: [    8.610970] [drm] DM_PPLIB:
> memory_max_clock: 80000
> Jan  2 22:16:10 localhost kernel: [    8.610972] [drm] DM_PPLIB:
> level           : 0
> Jan  2 22:16:10 localhost kernel: [    8.610973] [drm] DM_PPLIB:
> reducing memory clock level from 2 to 1
> Jan  2 22:16:10 localhost kernel: [    8.611994] [drm] Display Core
> initialized with v3.1.27!
> Jan  2 22:16:10 localhost kernel: [    8.711955] [drm] EDID checksum
> is invalid, remainder is 223
> Jan  2 22:16:10 localhost kernel: [    8.711961] Raw EDID:
> Jan  2 22:16:10 localhost kernel: [    8.711965]      00 ff ff ff ff
> ff ff 00 2e 83 54 21 34 00 00 00
> Jan  2 22:16:10 localhost kernel: [    8.711967]      29 15 01 03 80
> 30 1b 78 0a f0 65 98 57 51 91 27
> Jan  2 22:16:10 localhost kernel: [    8.711969]      00 50 54 21 08
> 00 81 80 a9 c0 01 01 01 01 01 01
> Jan  2 22:16:10 localhost kernel: [    8.711970]      01 01 01 01 01
> 01 02 3a 80 18 71 38 2d 40 58 2c
> Jan  2 22:16:10 localhost kernel: [    8.711972]      45 00 dc 0c 11
> 00 00 1e 66 21 50 b0 51 00 1b 30
> Jan  2 22:16:10 localhost kernel: [    8.711974]      40 70 36 00 dc
> 0c 11 00 00 1e 00 00 00 fc 00 32
> Jan  2 22:16:10 localhost kernel: [    8.711975]      32 4c 31 31 41
> 2d 48 44 2d 41 55 0a 00 00 00 fd
> Jan  2 22:16:10 localhost kernel: [    8.711977]      00 31 3d 0f 44
> 0f 00 0a 20 20 20 20 20 20 01 15
> Jan  2 22:16:10 localhost kernel: [    8.773617] [drm] EDID checksum
> is invalid, remainder is 223
> Jan  2 22:16:10 localhost kernel: [    8.773622] Raw EDID:
> Jan  2 22:16:10 localhost kernel: [    8.773625]      00 ff ff ff ff
> ff ff 00 2e 83 54 21 34 00 00 00
> Jan  2 22:16:10 localhost kernel: [    8.773627]      29 15 01 03 80
> 30 1b 78 0a f0 65 98 57 51 91 27
> Jan  2 22:16:10 localhost kernel: [    8.773629]      00 50 54 21 08
> 00 81 80 a9 c0 01 01 01 01 01 01
> Jan  2 22:16:10 localhost kernel: [    8.773631]      01 01 01 01 01
> 01 02 3a 80 18 71 38 2d 40 58 2c
> Jan  2 22:16:10 localhost kernel: [    8.773632]      45 00 dc 0c 11
> 00 00 1e 66 21 50 b0 51 00 1b 30
> Jan  2 22:16:10 localhost kernel: [    8.773634]      40 70 36 00 dc
> 0c 11 00 00 1e 00 00 00 fc 00 32
> Jan  2 22:16:10 localhost kernel: [    8.773635]      32 4c 31 31 41
> 2d 48 44 2d 41 55 0a 00 00 00 fd
> Jan  2 22:16:10 localhost kernel: [    8.773637]      00 31 3d 0f 44
> 0f 00 0a 20 20 20 20 20 20 01 15
> Jan  2 22:16:10 localhost kernel: [    8.835263] [drm] EDID checksum
> is invalid, remainder is 223
> Jan  2 22:16:10 localhost kernel: [    8.835267] Raw EDID:
> Jan  2 22:16:10 localhost kernel: [    8.835270]      00 ff ff ff ff
> ff ff 00 2e 83 54 21 34 00 00 00
> Jan  2 22:16:10 localhost kernel: [    8.835272]      29 15 01 03 80
> 30 1b 78 0a f0 65 98 57 51 91 27
> Jan  2 22:16:10 localhost kernel: [    8.835274]      00 50 54 21 08
> 00 81 80 a9 c0 01 01 01 01 01 01
> Jan  2 22:16:10 localhost kernel: [    8.835275]      01 01 01 01 01
> 01 02 3a 80 18 71 38 2d 40 58 2c
> Jan  2 22:16:10 localhost kernel: [    8.835277]      45 00 dc 0c 11
> 00 00 1e 66 21 50 b0 51 00 1b 30
> Jan  2 22:16:10 localhost kernel: [    8.835278]      40 70 36 00 dc
> 0c 11 00 00 1e 00 00 00 fc 00 32
> Jan  2 22:16:10 localhost kernel: [    8.835280]      32 4c 31 31 41
> 2d 48 44 2d 41 55 0a 00 00 00 fd
> Jan  2 22:16:10 localhost kernel: [    8.835282]      00 31 3d 0f 44
> 0f 00 0a 20 20 20 20 20 20 01 15
> Jan  2 22:16:10 localhost kernel: [    8.835794]
> [drm:dm_helpers_read_local_edid [amdgpu]] *ERROR* EDID err: 3, on
> connector: HDMI-A-1
> Jan  2 22:16:10 localhost kernel: [    8.836127]
> [drm:log_to_debug_console [amdgpu]] *ERROR* EDID checksum invalid.
> Jan  2 22:16:10 localhost kernel: [    8.858895] [drm] Supports vblank
> timestamp caching Rev 2 (21.10.2013).
> Jan  2 22:16:10 localhost kernel: [    8.858900] [drm] Driver supports
> precise vblank timestamp query.
> Jan  2 22:16:10 localhost kernel: [    8.891054] [drm] UVD and UVD ENC
> initialized successfully.
> Jan  2 22:16:10 localhost kernel: [    8.991987] [drm] VCE initialized
> successfully.
> Jan  2 22:16:10 localhost kernel: [    9.122860] EXT4-fs (mmcblk0p1):
> mounted filesystem with ordered data mode. Opts: errors=remount-ro
> Jan  2 22:16:10 localhost kernel: [    9.148940] Adding 1952764k swap
> on /dev/sda1.  Priority:-2 extents:1 across:1952764k SS
> Jan  2 22:16:10 localhost kernel: [    9.514830] [drm] EDID checksum
> is invalid, remainder is 223
> Jan  2 22:16:10 localhost kernel: [    9.514834] Raw EDID:
> Jan  2 22:16:10 localhost kernel: [    9.514838]      00 ff ff ff ff
> ff ff 00 2e 83 54 21 34 00 00 00
> Jan  2 22:16:10 localhost kernel: [    9.514840]      29 15 01 03 80
> 30 1b 78 0a f0 65 98 57 51 91 27
> Jan  2 22:16:10 localhost kernel: [    9.514842]      00 50 54 21 08
> 00 81 80 a9 c0 01 01 01 01 01 01
> Jan  2 22:16:10 localhost kernel: [    9.514843]      01 01 01 01 01
> 01 02 3a 80 18 71 38 2d 40 58 2c
> Jan  2 22:16:10 localhost kernel: [    9.514845]      45 00 dc 0c 11
> 00 00 1e 66 21 50 b0 51 00 1b 30
> Jan  2 22:16:10 localhost kernel: [    9.514846]      40 70 36 00 dc
> 0c 11 00 00 1e 00 00 00 fc 00 32
> Jan  2 22:16:10 localhost kernel: [    9.514848]      32 4c 31 31 41
> 2d 48 44 2d 41 55 0a 00 00 00 fd
> Jan  2 22:16:10 localhost kernel: [    9.514850]      00 31 3d 0f 44
> 0f 00 0a 20 20 20 20 20 20 01 15
> Jan  2 22:16:10 localhost kernel: [    9.514856] amdgpu 0000:01:00.0:
> HDMI-A-1: EDID invalid.
> Jan  2 22:16:10 localhost kernel: [    9.515693] [drm] fb mappable at 0xD03F2000
> Jan  2 22:16:10 localhost kernel: [    9.515695] [drm] vram apper at 0xD0000000
> Jan  2 22:16:10 localhost kernel: [    9.515697] [drm] size 3145728
> Jan  2 22:16:10 localhost kernel: [    9.515698] [drm] fb depth is 24
> Jan  2 22:16:10 localhost kernel: [    9.515700] [drm]    pitch is 4096
> Jan  2 22:16:10 localhost kernel: [    9.535511] Console: switching to
> colour frame buffer device 128x48
> Jan  2 22:16:10 localhost kernel: [    9.595719] amdgpu 0000:01:00.0:
> fb0: amdgpudrmfb frame buffer device
> Jan  2 22:16:10 localhost kernel: [    9.633154] [drm] Initialized
> amdgpu 3.23.0 20150101 for 0000:01:00.0 on minor 0
> Jan  2 22:16:12 localhost kernel: [   12.690445] IPv6:
> ADDRCONF(NETDEV_UP): eth0: link is not ready
> Jan  2 22:16:12 localhost kernel: [   12.782663] IPv6:
> ADDRCONF(NETDEV_UP): eth0: link is not ready
> Jan  2 22:16:12 localhost kernel: [   12.804369] IPv6:
> ADDRCONF(NETDEV_UP): eth1: link is not ready
> Jan  2 22:16:12 localhost kernel: [   12.805095] IPv6:
> ADDRCONF(NETDEV_UP): eth1: link is not ready
> Jan  2 22:16:12 localhost kernel: [   12.840083] IPv6:
> ADDRCONF(NETDEV_UP): eth2: link is not ready
> Jan  2 22:16:12 localhost kernel: [   12.841117] IPv6:
> ADDRCONF(NETDEV_UP): eth2: link is not ready
> Jan  2 22:16:12 localhost kernel: [   13.371194] [drm] EDID checksum
> is invalid, remainder is 223
> Jan  2 22:16:12 localhost kernel: [   13.371199] Raw EDID:
> Jan  2 22:16:12 localhost kernel: [   13.371202]      00 ff ff ff ff
> ff ff 00 2e 83 54 21 34 00 00 00
> Jan  2 22:16:12 localhost kernel: [   13.371204]      29 15 01 03 80
> 30 1b 78 0a f0 65 98 57 51 91 27
> Jan  2 22:16:12 localhost kernel: [   13.371206]      00 50 54 21 08
> 00 81 80 a9 c0 01 01 01 01 01 01
> Jan  2 22:16:12 localhost kernel: [   13.371208]      01 01 01 01 01
> 01 02 3a 80 18 71 38 2d 40 58 2c
> Jan  2 22:16:12 localhost kernel: [   13.371209]      45 00 dc 0c 11
> 00 00 1e 66 21 50 b0 51 00 1b 30
> Jan  2 22:16:12 localhost kernel: [   13.371211]      40 70 36 00 dc
> 0c 11 00 00 1e 00 00 00 fc 00 32
> Jan  2 22:16:12 localhost kernel: [   13.371213]      32 4c 31 31 41
> 2d 48 44 2d 41 55 0a 00 00 00 fd
> Jan  2 22:16:12 localhost kernel: [   13.371214]      00 31 3d 0f 44
> 0f 00 0a 20 20 20 20 20 20 01 15
> Jan  2 22:16:12 localhost kernel: [   13.371220] amdgpu 0000:01:00.0:
> HDMI-A-1: EDID invalid.
> Jan  2 22:16:12 localhost kernel: [   13.372798] [drm] EDID checksum
> is invalid, remainder is 223
> Jan  2 22:16:12 localhost kernel: [   13.372801] Raw EDID:
> Jan  2 22:16:12 localhost kernel: [   13.372804]      00 ff ff ff ff
> ff ff 00 2e 83 54 21 34 00 00 00
> Jan  2 22:16:12 localhost kernel: [   13.372806]      29 15 01 03 80
> 30 1b 78 0a f0 65 98 57 51 91 27
> Jan  2 22:16:12 localhost kernel: [   13.372808]      00 50 54 21 08
> 00 81 80 a9 c0 01 01 01 01 01 01
> Jan  2 22:16:12 localhost kernel: [   13.372809]      01 01 01 01 01
> 01 02 3a 80 18 71 38 2d 40 58 2c
> Jan  2 22:16:12 localhost kernel: [   13.372811]      45 00 dc 0c 11
> 00 00 1e 66 21 50 b0 51 00 1b 30
> Jan  2 22:16:12 localhost kernel: [   13.372813]      40 70 36 00 dc
> 0c 11 00 00 1e 00 00 00 fc 00 32
> Jan  2 22:16:12 localhost kernel: [   13.372814]      32 4c 31 31 41
> 2d 48 44 2d 41 55 0a 00 00 00 fd
> Jan  2 22:16:12 localhost kernel: [   13.372816]      00 31 3d 0f 44
> 0f 00 0a 20 20 20 20 20 20 01 15
> Jan  2 22:16:12 localhost kernel: [   13.372821] amdgpu 0000:01:00.0:
> HDMI-A-1: EDID invalid.
> Jan  2 22:16:13 localhost kernel: [   13.831733] mvneta
> f1030000.ethernet eth1: Link is Up - 1Gbps/Full - flow control off
> Jan  2 22:16:13 localhost kernel: [   13.831755] IPv6:
> ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
> Jan  2 22:16:13 localhost kernel: [   13.911730] mvneta
> f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control off
> Jan  2 22:16:13 localhost kernel: [   13.911748] IPv6:
> ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
> Jan  2 22:16:13 localhost kernel: [   14.209369] random: crng init done
> Jan  2 22:16:13 localhost kernel: [   14.534838] fuse init (API version 7.26)
> Jan  2 22:16:15 localhost kernel: [   15.992865] mvneta
> f1070000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> Jan  2 22:16:15 localhost kernel: [   15.992884] IPv6:
> ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>
> On Tue, Jan 2, 2018 at 1:17 PM, Christian König
> <christian.koenig@amd.com> wrote:
>>> when you refer to API traces, you're suggesting to strace
>>> kodi, or what do you mean?
>>
>> What I meant was apitrace (https://github.com/apitrace/apitrace), but when
>> even the lightdm login screen crashes then this won't be of much help.
>>
>> That strongly sounds like an ARM-specific problem; maybe USWC doesn't work as
>> it should? See the function drm_arch_can_wc_memory() in the kernel source and
>> check whether it helps if you always return false.
>>
>> Apart from that the only other explanation I have is that some system memory
>> isn't accessible for the GPU while some other is working fine.
>>
>> Please provide the output of "sudo cat /proc/iomem" to double check that.
>>
>> Regards,
>> Christian.
>>
>>


* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                     ` <CAEzXK1oUyHsSGHeXS9qzWBSDL-FfRq2h-EiQMCfa=5BroO50gg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-03  9:37                       ` Christian König
       [not found]                         ` <8bcdc933-050b-12ca-46e6-54bb66b6824d-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2018-01-03  9:37 UTC (permalink / raw)
  To: Luís Mendes, Christian König
  Cc: alexander.deucher-5C7GfCeVMHo, Chunming Zhou,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Luis,

In general, please add information like /proc/iomem and dmesg as
attachments, not mangled inline in the mail.
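
For anyone who has to work with an inline, line-wrapped log anyway, here is a
minimal Python sketch of how the wrapped records could be re-joined. It is only
an illustration: the function name and the two regular expressions are made up
for this example, and it assumes that every kernel record starts with a
"[ seconds.micros]" timestamp (optionally behind a syslog prefix) and that any
other line is a continuation of the previous record.

```python
import re

# Assumption: a new record starts with "[   12.345678]", optionally preceded
# by a syslog prefix such as "Jan  2 22:16:10 localhost kernel: ".
RECORD_START = re.compile(
    r"^(?:\w{3}\s+\d+ \d\d:\d\d:\d\d \S+ kernel: )?\[\s*\d+\.\d+\]"
)
# Mail quote markers ("> ", ">> ", ...) at the start of a line.
QUOTE_PREFIX = re.compile(r"^(?:>\s?)+")


def unwrap_dmesg(lines):
    """Join mail-wrapped continuation lines back onto their dmesg record."""
    records = []
    for raw in lines:
        line = QUOTE_PREFIX.sub("", raw.rstrip("\n")).rstrip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            # Continuation: the mail client broke the line at a space.
            records[-1] += " " + line.lstrip()
    return records


sample = [
    "> [    5.175289] sd 0:0:0:0: [sda] Write cache: enabled, read cache:",
    "> enabled, doesn't support DPO or FUA",
    "> [    5.185066]  sda: sda1 sda2 sda3",
    "> Jan  2 22:16:10 localhost kernel: [    8.591421]",
    "> [drm:dal_bios_parser_init_cmd_tbl [amdgpu]] *ERROR* Don't have",
    "> enable_spread_spectrum_on_ppll for v4",
]
for record in unwrap_dmesg(sample):
    print(record)
```

This cannot recover text the mail client truncated, so an attachment is still
the better option.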

The good news is that your ARM board at least has a memory layout which
should work in theory. So at least one possible problem is ruled out.

I don't think the apitrace would be of much help in this case as long
as no developer has access to one of those ARM boards. But it is
interesting that the apitrace reliably reproduces the issue. This means
that it isn't something random, but rather depends on a specific timing
of things.

Regards,
Christian.

Am 03.01.2018 um 01:36 schrieb Luís Mendes:
> Just a small update regarding what I have posted...
>
> I've made additional tests with mesa-17.4 at commit "radv: Implement
> binning on GFX9" - 6a36bfc64d2096aa338958c4605f5fc6372c07b8 and I was
> able to gather a smaller apitrace, of about 1GB, of kodi playing a
> video. It hangs the GPU almost always when replayed with glretrace
> without the --singlethread option. If --singlethread is used when
> running glretrace, no GPU hang ever occurs, it seems.
>
> For some reason I am now getting past the lightdm login screen without
> issues. Maybe some of the suggested changes improved the behaviour
> with mesa-17.4; however, with mesa-17.3.1 I didn't have those issues
> anyway.
>
> Now both mesa-17.3.1 and mesa-17.4 behave similarly, blocking while
> playing video with kodi, but it is also possible to cause the GPU hang
> with other applications.
> On the other hand, pure OpenGL applications seem to work fine... I am
> able to run glmark2 tests without issues.
>
> How can I send these apitraces?
>
> On Tue, Jan 2, 2018 at 10:29 PM, Luís Mendes <luis.p.mendes@gmail.com> wrote:
>> Ok... I've done some of the suggested tests.
>>
>> I still haven't tested on x86, but I'll get to that.
>>
>> I've recompiled the kernel to disable Power Management as much as
>> possible at all levels, including PCIe. I've also modified
>> drm_arch_can_wc_memory() (static inline bool, in
>> /include/drm/drm_cache.h) to always return false, but neither
>> change solved the issue.
>>
>> When I run kodi under apitrace with mesa 17.3.1 it becomes much more
>> difficult to reproduce the crash; there are a lot of missed frames due
>> to the CPU overload of apitrace, but I was able to crash the GPU
>> once. The apitrace log is 2.3GB; how should I send it?
>> It happened while playing a VP9 encoded webm video file, which is
>> decoded in software, as the RX 460 is unable to hardware decode this
>> codec AFAIK. In fact software decoded videos are more prone to produce
>> the GPU hang, while an H265 4K hardware decoded video never causes a
>> GPU hang. I'm afraid I forgot to have kodi log the execution data when
>> I did the apitrace.
>>
>>
>> The full dmesg is presented below as well as the /proc/iomem
>> information and lspci output.
>>   I just want to note that I'm having EDID DDC errors with my TV
>> screen, because at some point from kernel 4.14 onwards, both the RX460
>> and the RX550 cards started to corrupt the TV screen's I2C EDID
>> memory, so that I have to reflash the correct EDID data to get the
>> screen back to its own configuration. This is a rare problem that only
>> occurs with this TV. All other TVs and monitors that I've tested don't
>> show this EDID corruption issue. I have currently stopped reflashing
>> the I2C EDID configuration memory of my TV to avoid exceeding the
>> memory's write cycle endurance; instead I now modify gpu/drm/drm_edid.c
>> in function drm_do_get_edid() to allow the corrupted EDID to pass and
>> enter X. So please ignore the EDID error warnings in my dmesg log. The
>> GPU hangs occur just the same, even when I have the correct EDID, as
>> it is an unrelated issue.
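
As a side note on the "EDID checksum is invalid, remainder is 223" messages in
the log: to my understanding the remainder is simply the sum of all 128 bytes
of the base EDID block modulo 256, which is 0 for a valid block. The following
minimal Python sketch recomputes it from the "Raw EDID" dump in the dmesg
output. The function name is made up for this illustration and is not taken
from the kernel source.

```python
# Base EDID block (128 bytes) copied from the "Raw EDID" dump in the dmesg log.
RAW_EDID_HEX = """
00 ff ff ff ff ff ff 00 2e 83 54 21 34 00 00 00
29 15 01 03 80 30 1b 78 0a f0 65 98 57 51 91 27
00 50 54 21 08 00 81 80 a9 c0 01 01 01 01 01 01
01 01 01 01 01 01 02 3a 80 18 71 38 2d 40 58 2c
45 00 dc 0c 11 00 00 1e 66 21 50 b0 51 00 1b 30
40 70 36 00 dc 0c 11 00 00 1e 00 00 00 fc 00 32
32 4c 31 31 41 2d 48 44 2d 41 55 0a 00 00 00 fd
00 31 3d 0f 44 0f 00 0a 20 20 20 20 20 20 01 15
"""

# Fixed 8-byte pattern that starts every base EDID block.
EDID_HEADER = bytes.fromhex("00ffffffffffff00")


def edid_block_remainder(block: bytes) -> int:
    """Return the byte sum of a 128-byte EDID block modulo 256 (0 if valid)."""
    if len(block) != 128:
        raise ValueError(f"expected 128 bytes, got {len(block)}")
    return sum(block) % 256


# Strip all whitespace before decoding so the hex dump can stay readable.
block = bytes.fromhex("".join(RAW_EDID_HEX.split()))

print("header ok:", block[:8] == EDID_HEADER)
print("remainder:", edid_block_remainder(block))
```

For the dump above this gives a remainder of 223, matching the kernel message.
A non-zero remainder only shows that the block is inconsistent; it does not
tell which byte was corrupted.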
>>
>> Regards,
>> Luís
>>
>> iomem shows this:
>> 00000000-3fffffff : System RAM
>>    00008000-00efffff : Kernel code
>>    01000000-010e3913 : Kernel data
>> d0000000-efffffff : PCI MEM
>>    d0000000-e7ffffff : PCI Bus 0000:01
>>      d0000000-dfffffff : 0000:01:00.0
>>      e0000000-e01fffff : 0000:01:00.0
>>      e0200000-e023ffff : 0000:01:00.0
>>      e0240000-e025ffff : 0000:01:00.0
>>      e0260000-e0263fff : 0000:01:00.1
>>        e0260000-e0263fff : ICH HD audio
>> f1010680-f10106cf : spi@10680
>> f1011000-f101101f : i2c@11000
>> f1011100-f101111f : i2c@11100
>> f1012000-f101201f : serial
>> f1012100-f101211f : serial
>> f1018000-f101801f : pinctrl@18000
>> f1018100-f101813f : gpio
>> f1018140-f101817f : gpio
>> f1018454-f1018457 : conf-sdio3
>> f10184a0-f10184ab : rtc-soc
>> f1020704-f1020707 : watchdog@20300
>> f1020800-f102080f : cpurst@20800
>> f1020a00-f1020ccf : interrupt-controller@20a00
>> f1021070-f10210c7 : interrupt-controller@20a00
>> f1022000-f1022fff : pmsu@22000
>> f1030000-f1033fff : ethernet@30000
>> f1034000-f1037fff : ethernet@34000
>> f1040000-f1041fff : pcie@2,0
>> f1044000-f1045fff : pcie@3,0
>> f1058000-f10584ff : usb@58000
>> f1070000-f1073fff : ethernet@70000
>> f10a3800-f10a381f : rtc
>> f10a8000-f10a9fff : sata@a8000
>> f10d8000-f10d8fff : sdhci
>> f10e0000-f10e1fff : sata@e0000
>> f10e4074-f10e4077 : thermal@e8078
>> f10e4078-f10e407b : thermal@e8078
>> f10f0000-f10f3fff : usb3@f0000
>> f10f8000-f10fbfff : usb3@f8000
>> f1100000-f11007ff : f1100000.sa-sram0
>> f1110000-f11107ff : f1110000.sa-sram1
>> f1200000-f12fffff : f1200000.bm-bppi
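
To double check a layout like the one above, the region sizes can be derived
from the start and end addresses and compared with what lspci reports. Below is
a minimal Python sketch over an excerpt of this /proc/iomem output. The helper
names are made up for illustration, and the 4 GiB threshold is only an
illustrative assumption about what "accessible for the GPU" could mean; it is
not a limit stated anywhere in this thread.

```python
# Excerpt of the /proc/iomem output quoted above (indentation is not needed).
IOMEM_EXCERPT = """\
00000000-3fffffff : System RAM
d0000000-efffffff : PCI MEM
d0000000-dfffffff : 0000:01:00.0
e0000000-e01fffff : 0000:01:00.0
e0200000-e023ffff : 0000:01:00.0
e0240000-e025ffff : 0000:01:00.0
e0260000-e0263fff : 0000:01:00.1
"""


def parse_iomem(text):
    """Parse 'start-end : name' lines into (start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        span, _, name = line.partition(" : ")
        start_hex, _, end_hex = span.partition("-")
        regions.append((int(start_hex, 16), int(end_hex, 16), name.strip()))
    return regions


def region_size(start, end):
    """Size in bytes; the end address in /proc/iomem is inclusive."""
    return end - start + 1


regions = parse_iomem(IOMEM_EXCERPT)
for start, end, name in regions:
    print(f"{start:08x}-{end:08x} {region_size(start, end) // 1024:>8} KiB  {name}")

# Illustrative check (assumed threshold): does all system RAM end below 4 GiB?
ram_end = max(end for _, end, name in regions if name == "System RAM")
print("all System RAM below 4 GiB:", ram_end < 2**32)
```

The sizes computed this way (256M, 2M, 256K, 128K for 01:00.0 and 16K for
01:00.1) agree with the sizes in the lspci output, and the single System RAM
range is exactly 1 GiB.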
>>
>> lspci is like this:
>> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>> [AMD/ATI] Baffin [Radeon RX 460] (rev cf) (prog-if 00)
>>          Subsystem: PC Partner Limited / Sapphire Technology Baffin
>> [Radeon RX 460]
>>          Flags: bus master, fast devsel, latency 0, IRQ 57
>>          Memory at d0000000 (64-bit, prefetchable) [size=256M]
>>          Memory at e0000000 (64-bit, prefetchable) [size=2M]
>>          I/O ports at 10000 [size=256]
>>          Memory at e0200000 (32-bit, non-prefetchable) [size=256K]
>>          Expansion ROM at e0240000 [disabled] [size=128K]
>>          Capabilities: [48] Vendor Specific Information: Len=08 <?>
>>          Capabilities: [50] Power Management version 3
>>          Capabilities: [58] Express Legacy Endpoint, MSI 00
>>          Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>          Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
>> Len=010 <?>
>>          Capabilities: [150] Advanced Error Reporting
>>          Capabilities: [200] #15
>>          Capabilities: [270] #19
>>          Capabilities: [2b0] Address Translation Service (ATS)
>>          Capabilities: [2c0] Page Request Interface (PRI)
>>          Capabilities: [2d0] Process Address Space ID (PASID)
>>          Capabilities: [320] Latency Tolerance Reporting
>>          Capabilities: [328] Alternative Routing-ID Interpretation
>> (ARI)
>>          Capabilities: [370] L1 PM Substates
>>          Kernel driver in use: amdgpu
>>          Kernel modules: amdgpu
>>
>> 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device
>> aae0
>>          Subsystem: PC Partner Limited / Sapphire Technology Device
>> aae0
>>          Flags: bus master, fast devsel, latency 0, IRQ 56
>>          Memory at e0260000 (64-bit, non-prefetchable) [size=16K]
>>          Capabilities: [48] Vendor Specific Information: Len=08 <?>
>>          Capabilities: [50] Power Management version 3
>>          Capabilities: [58] Express Legacy Endpoint, MSI 00
>>          Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>          Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1
>> Len=010 <?>
>>          Capabilities: [150] Advanced Error Reporting
>>          Capabilities: [328] Alternative Routing-ID Interpretation
>> (ARI)
>>          Kernel driver in use: snd_hda_intel
>>          Kernel modules: snd_hda_intel
>>
>>
>> Full dmesg output follows:
>>
>> Starting kernel ...
>>
>> [    0.000000] Booting Linux on physical CPU 0x0
>> [    0.000000] Linux version 4.15.0-rc4-strong-g104bd2c-dirty
>> (lpnm@ENIAC10) (gcc version 5.4.0 20160609 (Ubuntu/Lina8
>> [    0.000000] CPU: ARMv7 Processor [414fc091] revision 1 (ARMv7),
>> cr=10c5387d
>> [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing
>> instruction cache
>> [    0.000000] Memory policy: Data cache writealloc
>> [    0.000000] random: fast init done
>> [    0.000000] percpu: Embedded 16 pages/cpu @(ptrval) s35148 r8192
>> d22196 u65536
>> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages:
>> 260096
>> [    0.000000] Kernel command line: root=/dev/sda2 rw rootfstype=ext4
>> rootwait console=ttyS0,115200n8
>> [    0.000000] Dentry cache hash table entries: 131072 (order: 7,
>> 524288 bytes)
>> [    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144
>> bytes)
>> [    0.000000] Memory: 1023088K/1048576K available (11264K kernel
>> code, 685K rwdata, 2296K rodata, 1024K init, 224K b)
>> [    0.000000] Virtual kernel memory layout:
>> [    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
>> [    0.000000]     fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
>> [    0.000000]     vmalloc : 0xc0800000 - 0xff800000   (1008 MB)
>> [    0.000000]     lowmem  : 0x80000000 - 0xc0000000   (1024 MB)
>> [    0.000000]     pkmap   : 0x7fe00000 - 0x80000000   (   2 MB)
>> [    0.000000]     modules : 0x7f000000 - 0x7fe00000   (  14 MB)
>> [    0.000000]       .text : 0x(ptrval) - 0x(ptrval)   (12256 kB)
>> [    0.000000]       .init : 0x(ptrval) - 0x(ptrval)   (1024 kB)
>> [    0.000000]       .data : 0x(ptrval) - 0x(ptrval)   ( 686 kB)
>> [    0.000000]        .bss : 0x(ptrval) - 0x(ptrval)   ( 225 kB)
>> [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2,
>> Nodes=1
>> [    0.000000] Hierarchical RCU implementation.
>> [    0.000000]  RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
>> [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16,
>> nr_cpu_ids=2
>> [    0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
>> [    0.000000] L2C-310 erratum 769419 enabled
>> [    0.000000] L2C-310 enabling early BRESP for Cortex-A9
>> [    0.000000] L2C-310 full line of zeros enabled for Cortex-A9
>> [    0.000000] L2C-310 D prefetch enabled, offset 1 lines
>> [    0.000000] L2C-310 dynamic clock gating enabled, standby mode
>> enabled
>> [    0.000000] L2C-310 Coherent cache controller enabled, 16 ways,
>> 1024 kB
>> [    0.000000] L2C-310 Coherent: CACHE_ID 0x410054c9, AUX_CTRL
>> 0x56070001
>> [    0.000006] sched_clock: 64 bits at 800MHz, resolution 1ns, wraps
>> every 4398046511103ns
>> [    0.000017] clocksource: arm_global_timer: mask: 0xffffffffffffffff
>> max_cycles: 0xb881274fa3, max_idle_ns: 4407952s
>> [    0.000030] Switching to timer-based delay loop, resolution 1ns
>> [    0.000168] Ignoring duplicate/late registration of
>> read_current_timer delay
>> [    0.000175] clocksource: armada_370_xp_clocksource: mask:
>> 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 76450417s
>> [    0.000351] Console: colour dummy device 80x30
>> [    0.000364] Calibrating delay loop (skipped), value calculated
>> using timer frequency.. 1600.00 BogoMIPS (lpj=80000)
>> [    0.000370] pid_max: default: 32768 minimum: 301
>> [    0.000416] Mount-cache hash table entries: 2048 (order: 1, 8192
>> bytes)
>> [    0.000422] Mountpoint-cache hash table entries: 2048 (order: 1,
>> 8192 bytes)
>> [    0.000660] CPU: Testing write buffer coherency: ok
>> [    0.000775] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
>> [    0.000907] Setting up static identity map for 0x100000 - 0x100060
>> [    0.000982] mvebu-soc-id: MVEBU SoC ID=0x6828, Rev=0x4
>> [    0.001058] mvebu-pmsu: Initializing Power Management Service Unit
>> [    0.001107] Hierarchical SRCU implementation.
>> [    0.002574] smp: Bringing up secondary CPUs ...
>> [    0.002727] Booting CPU 1
>> [    0.002882] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
>> [    0.002927] smp: Brought up 1 node, 2 CPUs
>> [    0.002933] SMP: Total of 2 processors activated (3200.00
>> BogoMIPS).
>> [    0.002936] CPU: All CPU(s) started in SVC mode.
>> [    0.003353] devtmpfs: initialized
>> [    0.005053] VFP support v0.3: implementor 41 architecture 3 part 30
>> variant 9 rev 4
>> [    0.005112] clocksource: jiffies: mask: 0xffffffff max_cycles:
>> 0xffffffff, max_idle_ns: 19112604462750000 ns
>> [    0.005120] futex hash table entries: 512 (order: 3, 32768 bytes)
>> [    0.005188] xor: measuring software checksum speed
>> [    0.100066]    arm4regs  :  2506.400 MB/sec
>> [    0.200065]    8regs     :  1967.600 MB/sec
>> [    0.300066]    32regs    :  1854.800 MB/sec
>> [    0.400067]    neon      :  1822.000 MB/sec
>> [    0.400070] xor: using function: arm4regs (2506.400 MB/sec)
>> [    0.400076] pinctrl core: initialized pinctrl subsystem
>> [    0.400421] NET: Registered protocol family 16
>> [    0.400918] DMA: preallocated 256 KiB pool for atomic coherent
>> allocations
>> [    0.401426] hw-breakpoint: found 5 (+1 reserved) breakpoint and 1
>> watchpoint registers.
>> [    0.401432] hw-breakpoint: maximum watchpoint size is 4 bytes.
>> [    0.401547] mvebu-pmsu: CPU hotplug support is currently broken on
>> Armada 38x: disabling
>> [    0.401554] mvebu-pmsu: CPU idle is currently broken on Armada 38x:
>> disabling
>> [    0.580251] raid6: int32x1  gen()   210 MB/s
>> [    0.750114] raid6: int32x1  xor()   299 MB/s
>> [    0.920090] raid6: int32x2  gen()   291 MB/s
>> [    1.090117] raid6: int32x2  xor()   345 MB/s
>> [    1.260146] raid6: int32x4  gen()   385 MB/s
>> [    1.430096] raid6: int32x4  xor()   343 MB/s
>> [    1.600076] raid6: int32x8  gen()   405 MB/s
>> [    1.770097] raid6: int32x8  xor()   283 MB/s
>> [    1.940098] raid6: neonx1   gen()  1212 MB/s
>> [    2.110070] raid6: neonx1   xor()  1147 MB/s
>> [    2.280110] raid6: neonx2   gen()  1294 MB/s
>> [    2.450092] raid6: neonx2   xor()  1331 MB/s
>> [    2.620096] raid6: neonx4   gen()  1271 MB/s
>> [    2.790069] raid6: neonx4   xor()  1273 MB/s
>> [    2.960097] raid6: neonx8   gen()  1094 MB/s
>> [    3.130097] raid6: neonx8   xor()  1033 MB/s
>> [    3.130100] raid6: using algorithm neonx2 gen() 1294 MB/s
>> [    3.130103] raid6: .... xor() 1331 MB/s, rmw enabled
>> [    3.130106] raid6: using neon recovery algorithm
>> [    3.130501] vgaarb: loaded
>> [    3.130631] SCSI subsystem initialized
>> [    3.130846] usbcore: registered new interface driver usbfs
>> [    3.130870] usbcore: registered new interface driver hub
>> [    3.130895] usbcore: registered new device driver usb
>> [    3.130993] pps_core: LinuxPPS API ver. 1 registered
>> [    3.130997] pps_core: Software ver. 5.3.6 - Copyright 2005-2007
>> Rodolfo Giometti <giometti@linux.it>
>> [    3.131007] PTP clock support registered
>> [    3.131255] Advanced Linux Sound Architecture Driver Initialized.
>> [    3.131458] Bluetooth: Core ver 2.22
>> [    3.131480] NET: Registered protocol family 31
>> [    3.131484] Bluetooth: HCI device and connection manager
>> initialized
>> [    3.131490] Bluetooth: HCI socket layer initialized
>> [    3.131494] Bluetooth: L2CAP socket layer initialized
>> [    3.131507] Bluetooth: SCO socket layer initialized
>> [    3.131694] clocksource: Switched to clocksource arm_global_timer
>> [    3.156110] NET: Registered protocol family 2
>> [    3.156352] TCP established hash table entries: 8192 (order: 3,
>> 32768 bytes)
>> [    3.156391] TCP bind hash table entries: 8192 (order: 4, 65536
>> bytes)
>> [    3.156452] TCP: Hash tables configured (established 8192 bind
>> 8192)
>> [    3.156507] UDP hash table entries: 512 (order: 2, 16384 bytes)
>> [    3.156532] UDP-Lite hash table entries: 512 (order: 2, 16384
>> bytes)
>> [    3.156610] NET: Registered protocol family 1
>> [    3.156807] RPC: Registered named UNIX socket transport module.
>> [    3.156811] RPC: Registered udp transport module.
>> [    3.156814] RPC: Registered tcp transport module.
>> [    3.156817] RPC: Registered tcp NFSv4.1 backchannel transport
>> module.
>> [    3.157229] hw perfevents: enabled with armv7_cortex_a9 PMU driver,
>> 7 counters available
>> [    3.157717] Initialise system trusted keyrings
>> [    3.157939] workingset: timestamp_bits=30 max_order=18
>> bucket_order=0
>> [    3.160489] Installing knfsd (copyright (C) 1996
>> okir@monad.swb.de).
>> [    3.162171] async_tx: api initialized (async)
>> [    3.162179] Key type asymmetric registered
>> [    3.162183] Asymmetric key parser 'x509' registered
>> [    3.162218] Block layer SCSI generic (bsg) driver version 0.4
>> loaded (major 249)
>> [    3.162224] io scheduler noop registered
>> [    3.162228] io scheduler deadline registered
>> [    3.162300] io scheduler cfq registered (default)
>> [    3.162305] io scheduler mq-deadline registered
>> [    3.162309] io scheduler kyber registered
>> [    3.162896] armada-38x-pinctrl f1018000.pinctrl: registered pinctrl
>> driver
>> [    3.163856] mvebu-pcie soc:pcie: /soc/pcie/pcie@2,0: reset gpio is
>> active low
>> [    3.164199] mv_xor f1060800.xor: Marvell shared XOR driver
>> [    3.222153] mv_xor f1060800.xor: Marvell XOR (Descriptor Mode): (
>> xor cpy intr )
>> [    3.222894] mv_xor f1060900.xor: Marvell shared XOR driver
>> [    3.282140] mv_xor f1060900.xor: Marvell XOR (Descriptor Mode): (
>> xor cpy intr )
>> [    3.301102] Serial: 8250/16550 driver, 4 ports, IRQ sharing
>> disabled
>> [    3.301902] console [ttyS0] disabled
>> [    3.321974] f1012000.serial: ttyS0 at MMIO 0xf1012000 (irq = 23,
>> base_baud = 15625000) is a 16550A
>> [    4.145968] console [ttyS0] enabled
>> [    4.169945] f1012100.serial: ttyS1 at MMIO 0xf1012100 (irq = 24,
>> base_baud = 15625000) is a 16550A
>> [    4.179505] ahci-mvebu f10a8000.sata: AHCI 0001.0000 32 slots 2
>> ports 6 Gbps 0x3 impl platform mode
>> [    4.188591] ahci-mvebu f10a8000.sata: flags: 64bit ncq sntf led
>> only pmp fbs pio slum part sxs
>> [    4.197699] scsi host0: ahci-mvebu
>> [    4.201249] scsi host1: ahci-mvebu
>> [    4.204774] ata1: SATA max UDMA/133 mmio [mem
>> 0xf10a8000-0xf10a9fff] port 0x100 irq 45
>> [    4.212715] ata2: SATA max UDMA/133 mmio [mem
>> 0xf10a8000-0xf10a9fff] port 0x180 irq 45
>> [    4.220778] ahci-mvebu f10e0000.sata: AHCI 0001.0000 32 slots 2
>> ports 6 Gbps 0x3 impl platform mode
>> [    4.229854] ahci-mvebu f10e0000.sata: flags: 64bit ncq sntf led
>> only pmp fbs pio slum part sxs
>> [    4.238957] scsi host2: ahci-mvebu
>> [    4.242505] scsi host3: ahci-mvebu
>> [    4.245987] ata3: SATA max UDMA/133 mmio [mem
>> 0xf10e0000-0xf10e1fff] port 0x100 irq 46
>> [    4.253944] ata4: SATA max UDMA/133 mmio [mem
>> 0xf10e0000-0xf10e1fff] port 0x180 irq 46
>> [    4.262653] Ethernet Channel Bonding Driver: v3.7.1 (April 27,
>> 2011)
>> [    4.269799] libphy: Fixed MDIO Bus: probed
>> [    4.274099] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
>> [    4.279945] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
>> [    4.285917] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver -
>> version 5.1.0-k
>> [    4.293594] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
>> [    4.299440] ixgb: Intel(R) PRO/10GbE Network Driver - version
>> 1.0.135-k2-NAPI
>> [    4.306607] ixgb: Copyright (c) 1999-2008 Intel Corporation.
>> [    4.312474] libphy: orion_mdio_bus: probed
>> [    4.318289] mv88e6085: probe of f1072004.mdio-mii:04 failed with
>> error -110
>> [    4.326210] mvneta f1070000.ethernet eth0: Using device tree mac
>> address 00:50:43:56:27:29
>> [    4.335261] mvneta f1030000.ethernet eth1: Using device tree mac
>> address 00:50:43:56:8b:29
>> [    4.344297] mvneta f1034000.ethernet eth2: Using device tree mac
>> address 00:50:43:27:8b:56
>> [    4.352756] iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network
>> Connection driver for Linux, in-tree:s
>> [    4.361914] iwl3945: Copyright(c) 2003-2011 Intel Corporation
>> [    4.367671] iwl3945: hw_scan is disabled
>> [    4.371688] usbcore: registered new interface driver asix
>> [    4.377123] usbcore: registered new interface driver ax88179_178a
>> [    4.383251] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI)
>> Driver
>> [    4.389796] ehci-pci: EHCI PCI platform driver
>> [    4.394273] ehci-orion: EHCI orion driver
>> [    4.398380] orion-ehci f1058000.usb: EHCI Host Controller
>> [    4.403807] orion-ehci f1058000.usb: new USB bus registered,
>> assigned bus number 1
>> [    4.411440] orion-ehci f1058000.usb: irq 41, io mem 0xf1058000
>> [    4.441680] orion-ehci f1058000.usb: USB 2.0 started, EHCI 1.00
>> [    4.447935] hub 1-0:1.0: USB hub found
>> [    4.451780] hub 1-0:1.0: 1 port detected
>> [    4.456093] xhci-hcd f10f0000.usb3: xHCI Host Controller
>> [    4.461436] xhci-hcd f10f0000.usb3: new USB bus registered,
>> assigned bus number 2
>> [    4.469014] xhci-hcd f10f0000.usb3: hcc params 0x0a000990 hci
>> version 0x100 quirks 0x00010010
>> [    4.477582] xhci-hcd f10f0000.usb3: irq 48, io mem 0xf10f0000
>> [    4.483661] hub 2-0:1.0: USB hub found
>> [    4.487443] hub 2-0:1.0: 1 port detected
>> [    4.491486] xhci-hcd f10f0000.usb3: xHCI Host Controller
>> [    4.496838] xhci-hcd f10f0000.usb3: new USB bus registered,
>> assigned bus number 3
>> [    4.504383] usb usb3: We don't know the algorithms for LPM for this
>> host, disabling LPM.
>> [    4.512749] hub 3-0:1.0: USB hub found
>> [    4.516526] hub 3-0:1.0: 1 port detected
>> [    4.520635] xhci-hcd f10f8000.usb3: xHCI Host Controller
>> [    4.525989] xhci-hcd f10f8000.usb3: new USB bus registered,
>> assigned bus number 4
>> [    4.533545] xhci-hcd f10f8000.usb3: hcc params 0x0a000990 hci
>> version 0x100 quirks 0x00010010
>> [    4.542111] xhci-hcd f10f8000.usb3: irq 49, io mem 0xf10f8000
>> [    4.548169] hub 4-0:1.0: USB hub found
>> [    4.551961] ata2: SATA link down (SStatus 0 SControl 300)
>> [    4.551965] hub 4-0:1.0: 1 port detected
>> [    4.552066] xhci-hcd f10f8000.usb3: xHCI Host Controller
>> [    4.552073] xhci-hcd f10f8000.usb3: new USB bus registered,
>> assigned bus number 5
>> [    4.552119] usb usb5: We don't know the algorithms for LPM for this
>> host, disabling LPM.
>> [    4.552414] hub 5-0:1.0: USB hub found
>> [    4.552444] hub 5-0:1.0: 1 port detected
>> [    4.552652] usbcore: registered new interface driver usb-storage
>> [    4.552824] mousedev: PS/2 mouse device common for all mice
>> [    4.553353] armada38x-rtc f10a3800.rtc: rtc core: registered
>> f10a3800.rtc as rtc0
>> [    4.553480] i2c /dev entries driver
>> [    4.553855] pca953x 0-0020: 0-0020 supply vcc not found, using
>> dummy regulator
>> [    4.567037] GPIO line 496 (pcie1.0-clkreq) hogged as input
>> [    4.568076] GPIO line 499 (pcie1.0-w-disable) hogged as output/low
>> [    4.568805] GPIO line 501 (usb3-current-limit) hogged as input
>> [    4.569844] GPIO line 502 (usb3-power) hogged as output/high
>> [    4.570883] GPIO line 507 (m.2 devslp) hogged as output/low
>> [    4.571613] GPIO line 508 (sfp-los) hogged as input
>> [    4.572343] GPIO line 509 (sfp-tx-fault) hogged as input
>> [    4.573382] GPIO line 510 (sfp-tx-disable) hogged as output/low
>> [    4.574111] GPIO line 511 (sfp-mod-def0) hogged as input
>> [    4.574840] GPIO line 500 (pcie2.0-clkreq) hogged as input
>> [    4.575879] GPIO line 503 (pcie2.0-w-disable) hogged as output/low
>> [    4.575961] pca953x 0-0020: interrupt support not compiled in
>> [    4.576394] IR NEC protocol handler initialized
>> [    4.576396] IR RC5(x/sz) protocol handler initialized
>> [    4.576397] IR RC6 protocol handler initialized
>> [    4.576398] IR JVC protocol handler initialized
>> [    4.576399] IR Sony protocol handler initialized
>> [    4.576400] IR SANYO protocol handler initialized
>> [    4.576402] IR Sharp protocol handler initialized
>> [    4.576403] IR MCE Keyboard/mouse protocol handler initialized
>> [    4.576404] IR XMP protocol handler initialized
>> [    4.586610] (NULL device *): hwmon_device_register() is deprecated.
>> Please convert the driver to use hwmon_device_.
>> [    4.587121] orion_wdt: Initial timeout 171 sec
>> [    4.587445] sdhci: Secure Digital Host Controller Interface driver
>> [    4.587446] sdhci: Copyright(c) Pierre Ossman
>> [    4.587610] sdhci-pxav3 f10d8000.sdhci: Got CD GPIO
>> [    4.601813] ata4: SATA link down (SStatus 0 SControl 300)
>> [    4.602073] ata3: SATA link down (SStatus 0 SControl 300)
>> [    4.651761] mmc0: SDHCI controller on f10d8000.sdhci
>> [f10d8000.sdhci] using ADMA
>> [    4.651860] sdhci-pltfm: SDHCI platform and OF driver helper
>> [    4.652046] usbcore: registered new interface driver usbhid
>> [    4.652047] usbhid: USB HID core driver
>> [    4.653005] NET: Registered protocol family 10
>> [    4.653849] Segment Routing with IPv6
>> [    4.653881] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
>> [    4.654110] NET: Registered protocol family 17
>> [    4.657571] 8021q: 802.1Q VLAN Support v1.8
>> [    4.657655] ThumbEE CPU extension supported.
>> [    4.657660] Registering SWP/SWPB emulation handler
>> [    4.657907] Loading compiled-in X.509 certificates
>> [    4.658691] Btrfs loaded, crc32c=crc32c-generic
>> [    4.659319] mvebu-pcie soc:pcie: /soc/pcie/pcie@2,0: reset gpio is
>> active low
>> [    4.659357] mvebu-pcie soc:pcie: /soc/pcie/pcie@3,0: reset gpio is
>> active low
>> [    4.704927] mmc0: new high speed SDHC card at address aaaa
>> [    4.705127] mmcblk0: mmc0:aaaa SL16G 14.8 GiB
>> [    4.707004]  mmcblk0: p1 p2
>> [    4.891777] mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
>> [    4.897716] pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
>> [    4.904006] pci_bus 0000:00: root bus resource [mem
>> 0xd0000000-0xefffffff]
>> [    4.910897] pci_bus 0000:00: root bus resource [bus 00-ff]
>> [    4.916598] PCI: bus0: Fast back to back transfers disabled
>> [    4.922192] pci 0000:00:02.0: bridge configuration invalid ([bus
>> 00-00]), reconfiguring
>> [    4.930217] pci 0000:00:03.0: bridge configuration invalid ([bus
>> 00-00]), reconfiguring
>> [    4.938401] pci 0000:01:00.0: enabling Extended Tags
>> [    4.943568] pci 0000:01:00.0: vgaarb: VGA device added:
>> decodes=io+mem,owns=none,locks=none
>> [    4.952057] pci 0000:01:00.1: enabling Extended Tags
>> [    4.957200] PCI: bus1: Fast back to back transfers disabled
>> [    4.962831] PCI: bus2: Fast back to back transfers enabled
>> [    4.968347] pci 0000:00:02.0: BAR 8: assigned [mem
>> 0xd0000000-0xe7ffffff]
>> [    4.975159] pci 0000:00:02.0: BAR 7: assigned [io  0x10000-0x10fff]
>> [    4.981444] pci 0000:01:00.0: BAR 0: assigned [mem
>> 0xd0000000-0xdfffffff 64bit pref]
>> [    4.989219] pci 0000:01:00.0: BAR 2: assigned [mem
>> 0xe0000000-0xe01fffff 64bit pref]
>> [    4.996991] pci 0000:01:00.0: BAR 5: assigned [mem
>> 0xe0200000-0xe023ffff]
>> [    5.003802] pci 0000:01:00.0: BAR 6: assigned [mem
>> 0xe0240000-0xe025ffff pref]
>> [    5.011041] pci 0000:01:00.1: BAR 0: assigned [mem
>> 0xe0260000-0xe0263fff 64bit]
>> [    5.018378] pci 0000:01:00.0: BAR 4: assigned [io  0x10000-0x100ff]
>> [    5.024667] pci 0000:00:02.0: PCI bridge to [bus 01]
>> [    5.029642] pci 0000:00:02.0:   bridge window [io  0x10000-0x10fff]
>> [    5.035929] pci 0000:00:02.0:   bridge window [mem
>> 0xd0000000-0xe7ffffff]
>> [    5.042738] pci 0000:00:03.0: PCI bridge to [bus 02]
>> [    5.047757] pcieport 0000:00:02.0: enabling device (0140 -> 0143)
>> [    5.054113] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>> [    5.054406] usb 4-1: new high-speed USB device number 2 using
>> xhci-hcd
>> [    5.054803] input: gpio-keys as
>> /devices/platform/gpio-keys/input/input0
>> [    5.060314] armada38x-rtc f10a3800.rtc: setting system clock to
>> 2018-01-02 22:16:04 UTC (1514931364)
>> [    5.060413] cfg80211: Loading compiled-in X.509 certificates for
>> regulatory database
>> [    5.062147] cfg80211: Loaded X.509 cert 'sforshee:
>> 00b28ddf47aef9cea7'
>> [    5.062170] ALSA device list:
>> [    5.062171]   No soundcards found.
>> [    5.062225] platform regulatory.0: Direct firmware load for
>> regulatory.db failed with error -2
>> [    5.062228] cfg80211: failed to load regulatory.db
>> [    5.118446] ata1.00: supports DRM functions and may not be fully
>> accessible
>> [    5.125486] ata1.00: ATA-9: Samsung SSD 850 EVO mSATA 250GB,
>> EMT41B6Q, max UDMA/133
>> [    5.133166] ata1.00: 488397168 sectors, multi 1: LBA48 NCQ (depth
>> 31/32)
>> [    5.141562] ata1.00: supports DRM functions and may not be fully
>> accessible
>> [    5.149950] ata1.00: configured for UDMA/133
>> [    5.154421] scsi 0:0:0:0: Direct-Access     ATA      Samsung SSD
>> 850  1B6Q PQ: 0 ANSI: 5
>> [    5.162898] sd 0:0:0:0: [sda] 488397168 512-byte logical blocks:
>> (250 GB/233 GiB)
>> [    5.170424] sd 0:0:0:0: [sda] Write Protect is off
>> [    5.175289] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
>> enabled, doesn't support DPO or FUA
>> [    5.185066]  sda: sda1 sda2 sda3
>> [    5.188854] sd 0:0:0:0: [sda] Attached SCSI removable disk
>> [    5.194390] md: Waiting for all devices to be available before
>> autodetect
>> [    5.201192] md: If you don't use raid, use raid=noautodetect
>> [    5.207132] md: Autodetecting RAID arrays.
>> [    5.211237] md: autorun ...
>> [    5.214043] md: ... autorun DONE.
>> [    5.223207] EXT4-fs (sda2): mounted filesystem with ordered data
>> mode. Opts: (null)
>> [    5.230899] VFS: Mounted root (ext4 filesystem) on device 8:2.
>> [    5.237830] devtmpfs: mounted
>> [    5.241302] Freeing unused kernel memory: 1024K
>> [    5.247070] hub 4-1:1.0: USB hub found
>> [    5.250926] hub 4-1:1.0: 4 ports detected
>> [    5.317951] systemd[1]: systemd 234 running in system mode. (+PAM
>> +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT )
>> [    5.338867] systemd[1]: Detected architecture arm.
>> Jan  2 22:16:10 localhost kernel: [    5.571694] usb 4-1.2: new
>> low-speed USB device number 3 using xhci-hcd
>> Jan  2 22:16:10 localhost kernel: [    5.740438] input: Trust Trust
>> Wireless TouchKB as
>> /devices/platform/soc/soc:internal-regs/f10f8000.usb3/usb4/4-1/4-1.2/4-1.2:1.0/0003:145F:01D3.0001/input/input1
>> Jan  2 22:16:10 localhost kernel: [    5.822021] hid-generic
>> 0003:145F:01D3.0001: input: USB HID v1.10 Keyboard [Trust Trust
>> Wireless TouchKB] on usb-f10f8000.usb3-1.2/input0
>> Jan  2 22:16:10 localhost kernel: [    5.842379] input: Trust Trust
>> Wireless TouchKB as
>> /devices/platform/soc/soc:internal-regs/f10f8000.usb3/usb4/4-1/4-1.2/4-1.2:1.1/0003:145F:01D3.0002/input/input2
>> Jan  2 22:16:10 localhost kernel: [    5.921815] hid-generic
>> 0003:145F:01D3.0002: input: USB HID v1.10 Mouse [Trust Trust Wireless
>> TouchKB] on usb-f10f8000.usb3-1.2/input1
>> Jan  2 22:16:10 localhost kernel: [    6.409779] lp: driver loaded but
>> no devices found
>> Jan  2 22:16:10 localhost kernel: [    6.417487] ppdev: user-space
>> parallel port driver
>> Jan  2 22:16:10 localhost kernel: [    6.614196] EXT4-fs (sda2):
>> re-mounted. Opts: errors=remount-ro
>> Jan  2 22:16:10 localhost kernel: [    7.648916] snd_hda_intel
>> 0000:01:00.1: enabling device (0140 -> 0142)
>> Jan  2 22:16:10 localhost kernel: [    7.648929] snd_hda_intel
>> 0000:01:00.1: Force to snoop mode by module option
>> Jan  2 22:16:10 localhost kernel: [    7.650649]
>> drm_panel_orientation_quirks: module license 'unspecified' taints
>> kernel.
>> Jan  2 22:16:10 localhost kernel: [    7.650653] Disabling lock
>> debugging due to kernel taint
>> Jan  2 22:16:10 localhost kernel: [    7.706837] input: HDA ATI HDMI
>> HDMI/DP,pcm=3 as
>> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input3
>> Jan  2 22:16:10 localhost kernel: [    7.706974] input: HDA ATI HDMI
>> HDMI/DP,pcm=7 as
>> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input4
>> Jan  2 22:16:10 localhost kernel: [    7.707110] input: HDA ATI HDMI
>> HDMI/DP,pcm=8 as
>> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input5
>> Jan  2 22:16:10 localhost kernel: [    7.707249] input: HDA ATI HDMI
>> HDMI/DP,pcm=9 as
>> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input6
>> Jan  2 22:16:10 localhost kernel: [    7.707369] input: HDA ATI HDMI
>> HDMI/DP,pcm=10 as
>> /devices/platform/soc/soc:pcie/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card0/input7
>> Jan  2 22:16:10 localhost kernel: [    7.777661] [drm] amdgpu kernel
>> modesetting enabled.
>> Jan  2 22:16:10 localhost kernel: [    7.778616] amdgpu 0000:01:00.0:
>> enabling device (0140 -> 0143)
>> Jan  2 22:16:10 localhost kernel: [    7.780795] [drm] initializing
>> kernel modesetting (POLARIS11 0x1002:0x67EF 0x174B:0xE348 0xCF).
>> Jan  2 22:16:10 localhost kernel: [    7.780809] [drm] register mmio
>> base: 0xE0200000
>> Jan  2 22:16:10 localhost kernel: [    7.780811] [drm] register mmio
>> size: 262144
>> Jan  2 22:16:10 localhost kernel: [    7.780846] [drm] probing gen 2
>> caps for device 11ab:6828 = 3ac12/0
>> Jan  2 22:16:10 localhost kernel: [    7.780848] [drm] probing mlw for
>> device 11ab:6828 = 3ac12
>> Jan  2 22:16:10 localhost kernel: [    7.780876] [drm] UVD is enabled in VM mode
>> Jan  2 22:16:10 localhost kernel: [    7.780877] [drm] UVD ENC is
>> enabled in VM mode
>> Jan  2 22:16:10 localhost kernel: [    7.780882] [drm] VCE enabled in VM mode
>> Jan  2 22:16:10 localhost kernel: [    7.998516] ATOM BIOS: 113-34801-U03
>> Jan  2 22:16:10 localhost kernel: [    7.998546] [drm] GPU posting now...
>> Jan  2 22:16:10 localhost kernel: [    8.128956] [drm] vm size is 64
>> GB, 2 levels, block size is 10-bit, fragment size is 9-bit
>> Jan  2 22:16:10 localhost kernel: [    8.133710] amdgpu 0000:01:00.0:
>> VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
>> Jan  2 22:16:10 localhost kernel: [    8.133724] amdgpu 0000:01:00.0:
>> GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
>> Jan  2 22:16:10 localhost kernel: [    8.133727] [drm] Detected VRAM
>> RAM=2048M, BAR=256M
>> Jan  2 22:16:10 localhost kernel: [    8.133729] [drm] RAM width 128bits GDDR5
>> Jan  2 22:16:10 localhost kernel: [    8.133823] [TTM] Zone  kernel:
>> Available graphics memory: 512056 kiB
>> Jan  2 22:16:10 localhost kernel: [    8.133825] [TTM] Initializing
>> pool allocator
>> Jan  2 22:16:10 localhost kernel: [    8.133864] [TTM] Initializing
>> DMA pool allocator
>> Jan  2 22:16:10 localhost kernel: [    8.133906] [drm] amdgpu: 2048M
>> of VRAM memory ready
>> Jan  2 22:16:10 localhost kernel: [    8.133910] [drm] amdgpu: 750M of
>> GTT memory ready.
>> Jan  2 22:16:10 localhost kernel: [    8.133933] [drm] GART: num cpu
>> pages 65536, num gpu pages 65536
>> Jan  2 22:16:10 localhost kernel: [    8.134011] [drm] PCIE GART of
>> 256M enabled (table at 0x000000F400040000).
>> Jan  2 22:16:10 localhost kernel: [    8.135969] [drm] Chained IB
>> support enabled!
>> Jan  2 22:16:10 localhost kernel: [    8.222388] [drm] Found UVD
>> firmware Version: 1.79 Family ID: 16
>> Jan  2 22:16:10 localhost kernel: [    8.370002] [drm] Found VCE
>> firmware Version: 52.4 Binary ID: 3
>> Jan  2 22:16:10 localhost kernel: [    8.459983] amdgpu: [powerplay]
>> Jan  2 22:16:10 localhost kernel: [    8.459983]  failed to send
>> message 309 ret is 254
>> Jan  2 22:16:10 localhost kernel: [    8.460011] amdgpu: [powerplay]
>> Jan  2 22:16:10 localhost kernel: [    8.460011]  failed to send pre
>> message 14e ret is 254
>> Jan  2 22:16:10 localhost kernel: [    8.591421]
>> [drm:dal_bios_parser_init_cmd_tbl [amdgpu]] *ERROR* Don't have
>> enable_spread_spectrum_on_ppll for v4
>> Jan  2 22:16:10 localhost kernel: [    8.602061]
>> [drm:dal_bios_parser_init_cmd_tbl [amdgpu]] *ERROR* Don't have
>> program_clock for v7
>> Jan  2 22:16:10 localhost kernel: [    8.610933] [drm] DM_PPLIB:
>> values for Engine clock
>> Jan  2 22:16:10 localhost kernel: [    8.610937] [drm] DM_PPLIB:     21400
>> Jan  2 22:16:10 localhost kernel: [    8.610939] [drm] DM_PPLIB:     48100
>> Jan  2 22:16:10 localhost kernel: [    8.610940] [drm] DM_PPLIB:     76000
>> Jan  2 22:16:10 localhost kernel: [    8.610941] [drm] DM_PPLIB:     102000
>> Jan  2 22:16:10 localhost kernel: [    8.610943] [drm] DM_PPLIB:     110200
>> Jan  2 22:16:10 localhost kernel: [    8.610944] [drm] DM_PPLIB:     113800
>> Jan  2 22:16:10 localhost kernel: [    8.610945] [drm] DM_PPLIB:     117200
>> Jan  2 22:16:10 localhost kernel: [    8.610946] [drm] DM_PPLIB:     121000
>> Jan  2 22:16:10 localhost kernel: [    8.610948] [drm] DM_PPLIB:
>> Warning: using default validation clocks!
>> Jan  2 22:16:10 localhost kernel: [    8.610950] [drm] DM_PPLIB:
>> Validation clocks:
>> Jan  2 22:16:10 localhost kernel: [    8.610952] [drm] DM_PPLIB:
>> engine_max_clock: 72000
>> Jan  2 22:16:10 localhost kernel: [    8.610953] [drm] DM_PPLIB:
>> memory_max_clock: 80000
>> Jan  2 22:16:10 localhost kernel: [    8.610955] [drm] DM_PPLIB:
>> level           : 0
>> Jan  2 22:16:10 localhost kernel: [    8.610957] [drm] DM_PPLIB:
>> reducing engine clock level from 8 to 2
>> Jan  2 22:16:10 localhost kernel: [    8.610961] [drm] DM_PPLIB:
>> values for Memory clock
>> Jan  2 22:16:10 localhost kernel: [    8.610963] [drm] DM_PPLIB:     30000
>> Jan  2 22:16:10 localhost kernel: [    8.610964] [drm] DM_PPLIB:     175000
>> Jan  2 22:16:10 localhost kernel: [    8.610966] [drm] DM_PPLIB:
>> Warning: using default validation clocks!
>> Jan  2 22:16:10 localhost kernel: [    8.610967] [drm] DM_PPLIB:
>> Validation clocks:
>> Jan  2 22:16:10 localhost kernel: [    8.610969] [drm] DM_PPLIB:
>> engine_max_clock: 72000
>> Jan  2 22:16:10 localhost kernel: [    8.610970] [drm] DM_PPLIB:
>> memory_max_clock: 80000
>> Jan  2 22:16:10 localhost kernel: [    8.610972] [drm] DM_PPLIB:
>> level           : 0
>> Jan  2 22:16:10 localhost kernel: [    8.610973] [drm] DM_PPLIB:
>> reducing memory clock level from 2 to 1
>> Jan  2 22:16:10 localhost kernel: [    8.611994] [drm] Display Core
>> initialized with v3.1.27!
>> Jan  2 22:16:10 localhost kernel: [    8.711955] [drm] EDID checksum
>> is invalid, remainder is 223
>> Jan  2 22:16:10 localhost kernel: [    8.711961] Raw EDID:
>> Jan  2 22:16:10 localhost kernel: [    8.711965]      00 ff ff ff ff
>> ff ff 00 2e 83 54 21 34 00 00 00
>> Jan  2 22:16:10 localhost kernel: [    8.711967]      29 15 01 03 80
>> 30 1b 78 0a f0 65 98 57 51 91 27
>> Jan  2 22:16:10 localhost kernel: [    8.711969]      00 50 54 21 08
>> 00 81 80 a9 c0 01 01 01 01 01 01
>> Jan  2 22:16:10 localhost kernel: [    8.711970]      01 01 01 01 01
>> 01 02 3a 80 18 71 38 2d 40 58 2c
>> Jan  2 22:16:10 localhost kernel: [    8.711972]      45 00 dc 0c 11
>> 00 00 1e 66 21 50 b0 51 00 1b 30
>> Jan  2 22:16:10 localhost kernel: [    8.711974]      40 70 36 00 dc
>> 0c 11 00 00 1e 00 00 00 fc 00 32
>> Jan  2 22:16:10 localhost kernel: [    8.711975]      32 4c 31 31 41
>> 2d 48 44 2d 41 55 0a 00 00 00 fd
>> Jan  2 22:16:10 localhost kernel: [    8.711977]      00 31 3d 0f 44
>> 0f 00 0a 20 20 20 20 20 20 01 15
>> Jan  2 22:16:10 localhost kernel: [    8.773617] [drm] EDID checksum
>> is invalid, remainder is 223
>> Jan  2 22:16:10 localhost kernel: [    8.773622] Raw EDID:
>> Jan  2 22:16:10 localhost kernel: [    8.773625]      00 ff ff ff ff
>> ff ff 00 2e 83 54 21 34 00 00 00
>> Jan  2 22:16:10 localhost kernel: [    8.773627]      29 15 01 03 80
>> 30 1b 78 0a f0 65 98 57 51 91 27
>> Jan  2 22:16:10 localhost kernel: [    8.773629]      00 50 54 21 08
>> 00 81 80 a9 c0 01 01 01 01 01 01
>> Jan  2 22:16:10 localhost kernel: [    8.773631]      01 01 01 01 01
>> 01 02 3a 80 18 71 38 2d 40 58 2c
>> Jan  2 22:16:10 localhost kernel: [    8.773632]      45 00 dc 0c 11
>> 00 00 1e 66 21 50 b0 51 00 1b 30
>> Jan  2 22:16:10 localhost kernel: [    8.773634]      40 70 36 00 dc
>> 0c 11 00 00 1e 00 00 00 fc 00 32
>> Jan  2 22:16:10 localhost kernel: [    8.773635]      32 4c 31 31 41
>> 2d 48 44 2d 41 55 0a 00 00 00 fd
>> Jan  2 22:16:10 localhost kernel: [    8.773637]      00 31 3d 0f 44
>> 0f 00 0a 20 20 20 20 20 20 01 15
>> Jan  2 22:16:10 localhost kernel: [    8.835263] [drm] EDID checksum
>> is invalid, remainder is 223
>> Jan  2 22:16:10 localhost kernel: [    8.835267] Raw EDID:
>> Jan  2 22:16:10 localhost kernel: [    8.835270]      00 ff ff ff ff
>> ff ff 00 2e 83 54 21 34 00 00 00
>> Jan  2 22:16:10 localhost kernel: [    8.835272]      29 15 01 03 80
>> 30 1b 78 0a f0 65 98 57 51 91 27
>> Jan  2 22:16:10 localhost kernel: [    8.835274]      00 50 54 21 08
>> 00 81 80 a9 c0 01 01 01 01 01 01
>> Jan  2 22:16:10 localhost kernel: [    8.835275]      01 01 01 01 01
>> 01 02 3a 80 18 71 38 2d 40 58 2c
>> Jan  2 22:16:10 localhost kernel: [    8.835277]      45 00 dc 0c 11
>> 00 00 1e 66 21 50 b0 51 00 1b 30
>> Jan  2 22:16:10 localhost kernel: [    8.835278]      40 70 36 00 dc
>> 0c 11 00 00 1e 00 00 00 fc 00 32
>> Jan  2 22:16:10 localhost kernel: [    8.835280]      32 4c 31 31 41
>> 2d 48 44 2d 41 55 0a 00 00 00 fd
>> Jan  2 22:16:10 localhost kernel: [    8.835282]      00 31 3d 0f 44
>> 0f 00 0a 20 20 20 20 20 20 01 15
>> Jan  2 22:16:10 localhost kernel: [    8.835794]
>> [drm:dm_helpers_read_local_edid [amdgpu]] *ERROR* EDID err: 3, on
>> connector: HDMI-A-1
>> Jan  2 22:16:10 localhost kernel: [    8.836127]
>> [drm:log_to_debug_console [amdgpu]] *ERROR* EDID checksum invalid.
>> Jan  2 22:16:10 localhost kernel: [    8.858895] [drm] Supports vblank
>> timestamp caching Rev 2 (21.10.2013).
>> Jan  2 22:16:10 localhost kernel: [    8.858900] [drm] Driver supports
>> precise vblank timestamp query.
>> Jan  2 22:16:10 localhost kernel: [    8.891054] [drm] UVD and UVD ENC
>> initialized successfully.
>> Jan  2 22:16:10 localhost kernel: [    8.991987] [drm] VCE initialized
>> successfully.
>> Jan  2 22:16:10 localhost kernel: [    9.122860] EXT4-fs (mmcblk0p1):
>> mounted filesystem with ordered data mode. Opts: errors=remount-ro
>> Jan  2 22:16:10 localhost kernel: [    9.148940] Adding 1952764k swap
>> on /dev/sda1.  Priority:-2 extents:1 across:1952764k SS
>> Jan  2 22:16:10 localhost kernel: [    9.514830] [drm] EDID checksum
>> is invalid, remainder is 223
>> Jan  2 22:16:10 localhost kernel: [    9.514834] Raw EDID:
>> Jan  2 22:16:10 localhost kernel: [    9.514838]      00 ff ff ff ff
>> ff ff 00 2e 83 54 21 34 00 00 00
>> Jan  2 22:16:10 localhost kernel: [    9.514840]      29 15 01 03 80
>> 30 1b 78 0a f0 65 98 57 51 91 27
>> Jan  2 22:16:10 localhost kernel: [    9.514842]      00 50 54 21 08
>> 00 81 80 a9 c0 01 01 01 01 01 01
>> Jan  2 22:16:10 localhost kernel: [    9.514843]      01 01 01 01 01
>> 01 02 3a 80 18 71 38 2d 40 58 2c
>> Jan  2 22:16:10 localhost kernel: [    9.514845]      45 00 dc 0c 11
>> 00 00 1e 66 21 50 b0 51 00 1b 30
>> Jan  2 22:16:10 localhost kernel: [    9.514846]      40 70 36 00 dc
>> 0c 11 00 00 1e 00 00 00 fc 00 32
>> Jan  2 22:16:10 localhost kernel: [    9.514848]      32 4c 31 31 41
>> 2d 48 44 2d 41 55 0a 00 00 00 fd
>> Jan  2 22:16:10 localhost kernel: [    9.514850]      00 31 3d 0f 44
>> 0f 00 0a 20 20 20 20 20 20 01 15
>> Jan  2 22:16:10 localhost kernel: [    9.514856] amdgpu 0000:01:00.0:
>> HDMI-A-1: EDID invalid.
>> Jan  2 22:16:10 localhost kernel: [    9.515693] [drm] fb mappable at 0xD03F2000
>> Jan  2 22:16:10 localhost kernel: [    9.515695] [drm] vram apper at 0xD0000000
>> Jan  2 22:16:10 localhost kernel: [    9.515697] [drm] size 3145728
>> Jan  2 22:16:10 localhost kernel: [    9.515698] [drm] fb depth is 24
>> Jan  2 22:16:10 localhost kernel: [    9.515700] [drm]    pitch is 4096
>> Jan  2 22:16:10 localhost kernel: [    9.535511] Console: switching to
>> colour frame buffer device 128x48
>> Jan  2 22:16:10 localhost kernel: [    9.595719] amdgpu 0000:01:00.0:
>> fb0: amdgpudrmfb frame buffer device
>> Jan  2 22:16:10 localhost kernel: [    9.633154] [drm] Initialized
>> amdgpu 3.23.0 20150101 for 0000:01:00.0 on minor 0
>> Jan  2 22:16:12 localhost kernel: [   12.690445] IPv6:
>> ADDRCONF(NETDEV_UP): eth0: link is not ready
>> Jan  2 22:16:12 localhost kernel: [   12.782663] IPv6:
>> ADDRCONF(NETDEV_UP): eth0: link is not ready
>> Jan  2 22:16:12 localhost kernel: [   12.804369] IPv6:
>> ADDRCONF(NETDEV_UP): eth1: link is not ready
>> Jan  2 22:16:12 localhost kernel: [   12.805095] IPv6:
>> ADDRCONF(NETDEV_UP): eth1: link is not ready
>> Jan  2 22:16:12 localhost kernel: [   12.840083] IPv6:
>> ADDRCONF(NETDEV_UP): eth2: link is not ready
>> Jan  2 22:16:12 localhost kernel: [   12.841117] IPv6:
>> ADDRCONF(NETDEV_UP): eth2: link is not ready
>> Jan  2 22:16:12 localhost kernel: [   13.371194] [drm] EDID checksum
>> is invalid, remainder is 223
>> Jan  2 22:16:12 localhost kernel: [   13.371199] Raw EDID:
>> Jan  2 22:16:12 localhost kernel: [   13.371202]      00 ff ff ff ff
>> ff ff 00 2e 83 54 21 34 00 00 00
>> Jan  2 22:16:12 localhost kernel: [   13.371204]      29 15 01 03 80
>> 30 1b 78 0a f0 65 98 57 51 91 27
>> Jan  2 22:16:12 localhost kernel: [   13.371206]      00 50 54 21 08
>> 00 81 80 a9 c0 01 01 01 01 01 01
>> Jan  2 22:16:12 localhost kernel: [   13.371208]      01 01 01 01 01
>> 01 02 3a 80 18 71 38 2d 40 58 2c
>> Jan  2 22:16:12 localhost kernel: [   13.371209]      45 00 dc 0c 11
>> 00 00 1e 66 21 50 b0 51 00 1b 30
>> Jan  2 22:16:12 localhost kernel: [   13.371211]      40 70 36 00 dc
>> 0c 11 00 00 1e 00 00 00 fc 00 32
>> Jan  2 22:16:12 localhost kernel: [   13.371213]      32 4c 31 31 41
>> 2d 48 44 2d 41 55 0a 00 00 00 fd
>> Jan  2 22:16:12 localhost kernel: [   13.371214]      00 31 3d 0f 44
>> 0f 00 0a 20 20 20 20 20 20 01 15
>> Jan  2 22:16:12 localhost kernel: [   13.371220] amdgpu 0000:01:00.0:
>> HDMI-A-1: EDID invalid.
>> Jan  2 22:16:12 localhost kernel: [   13.372798] [drm] EDID checksum
>> is invalid, remainder is 223
>> Jan  2 22:16:12 localhost kernel: [   13.372801] Raw EDID:
>> Jan  2 22:16:12 localhost kernel: [   13.372804]      00 ff ff ff ff
>> ff ff 00 2e 83 54 21 34 00 00 00
>> Jan  2 22:16:12 localhost kernel: [   13.372806]      29 15 01 03 80
>> 30 1b 78 0a f0 65 98 57 51 91 27
>> Jan  2 22:16:12 localhost kernel: [   13.372808]      00 50 54 21 08
>> 00 81 80 a9 c0 01 01 01 01 01 01
>> Jan  2 22:16:12 localhost kernel: [   13.372809]      01 01 01 01 01
>> 01 02 3a 80 18 71 38 2d 40 58 2c
>> Jan  2 22:16:12 localhost kernel: [   13.372811]      45 00 dc 0c 11
>> 00 00 1e 66 21 50 b0 51 00 1b 30
>> Jan  2 22:16:12 localhost kernel: [   13.372813]      40 70 36 00 dc
>> 0c 11 00 00 1e 00 00 00 fc 00 32
>> Jan  2 22:16:12 localhost kernel: [   13.372814]      32 4c 31 31 41
>> 2d 48 44 2d 41 55 0a 00 00 00 fd
>> Jan  2 22:16:12 localhost kernel: [   13.372816]      00 31 3d 0f 44
>> 0f 00 0a 20 20 20 20 20 20 01 15
>> Jan  2 22:16:12 localhost kernel: [   13.372821] amdgpu 0000:01:00.0:
>> HDMI-A-1: EDID invalid.
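
A note on the "remainder is 223" message above: an EDID base block is 128 bytes long and is treated as valid when the sum of all its bytes is 0 modulo 256. The following is a minimal sketch, not kernel code, that recomputes the remainder from the bytes printed in the log. The function names are made up for illustration.

```python
# Raw EDID base block, copied from the log above (8 rows of 16 bytes).
RAW_EDID_HEX = """
00 ff ff ff ff ff ff 00 2e 83 54 21 34 00 00 00
29 15 01 03 80 30 1b 78 0a f0 65 98 57 51 91 27
00 50 54 21 08 00 81 80 a9 c0 01 01 01 01 01 01
01 01 01 01 01 01 02 3a 80 18 71 38 2d 40 58 2c
45 00 dc 0c 11 00 00 1e 66 21 50 b0 51 00 1b 30
40 70 36 00 dc 0c 11 00 00 1e 00 00 00 fc 00 32
32 4c 31 31 41 2d 48 44 2d 41 55 0a 00 00 00 fd
00 31 3d 0f 44 0f 00 0a 20 20 20 20 20 20 01 15
"""


def parse_block(hex_dump: str) -> bytes:
    """Turn a whitespace-separated hex dump into bytes and check its length."""
    block = bytes.fromhex("".join(hex_dump.split()))
    if len(block) != 128:
        raise ValueError(f"expected 128 bytes, got {len(block)}")
    return block


def checksum_remainder(block: bytes) -> int:
    """Sum of all 128 bytes modulo 256; 0 means the checksum is valid."""
    return sum(block) % 256


def expected_checksum_byte(block: bytes) -> int:
    """Value the last byte would need so that the whole block sums to 0 mod 256."""
    return (-sum(block[:127])) % 256


block = parse_block(RAW_EDID_HEX)
print(checksum_remainder(block))            # 223, matching the kernel message
print(hex(expected_checksum_byte(block)))   # 0x36, while the block ends in 0x15
```

The sketch only confirms that the block as logged fails the checksum; it cannot tell which byte is wrong.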
>> Jan  2 22:16:13 localhost kernel: [   13.831733] mvneta
>> f1030000.ethernet eth1: Link is Up - 1Gbps/Full - flow control off
>> Jan  2 22:16:13 localhost kernel: [   13.831755] IPv6:
>> ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
>> Jan  2 22:16:13 localhost kernel: [   13.911730] mvneta
>> f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control off
>> Jan  2 22:16:13 localhost kernel: [   13.911748] IPv6:
>> ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
>> Jan  2 22:16:13 localhost kernel: [   14.209369] random: crng init done
>> Jan  2 22:16:13 localhost kernel: [   14.534838] fuse init (API version 7.26)
>> Jan  2 22:16:15 localhost kernel: [   15.992865] mvneta
>> f1070000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
>> Jan  2 22:16:15 localhost kernel: [   15.992884] IPv6:
>> ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>
>> On Tue, Jan 2, 2018 at 1:17 PM, Christian König
>> <christian.koenig@amd.com> wrote:
>>>> when you refer to API traces, you're suggesting to strace
>>>> kodi, or what do you mean?
>>> What I meant was apitrace (https://github.com/apitrace/apitrace), but when
>>> even the lightdm login screen crashes than this won't be much helpful.
>>>
>>> That strongly sounds like a ARM specific problem, maybe USWC doesn't work as
>>> it should? See function drm_arch_can_wc_memory() in the kernel source and
>>> try if it helps if you always return false.
>>>
>>> Apart from that the only other explanation I have is that some system memory
>>> isn't accessible for the GPU while some other is working fine.
>>>
>>> Please provide the output of "sudo cat /proc/iomem" to double check that.
>>>
>>> Regards,
>>> Christian.
>>>
>>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                         ` <8bcdc933-050b-12ca-46e6-54bb66b6824d-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-01-03 11:02                           ` Luís Mendes
       [not found]                             ` <CAEzXK1pPzRM=EpL8fKQ5SdDvEePR+_KPhrtPMPiArt==BsJspA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Luís Mendes @ 2018-01-03 11:02 UTC (permalink / raw)
  To: Christian König
  Cc: alexander.deucher-5C7GfCeVMHo, Chunming Zhou,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Christian,

Replies follow in between.

Regards,
Luís

On Wed, Jan 3, 2018 at 9:37 AM, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
> Hi Luis,
>
> In general please add information like /proc/iomem and dmesg as attachment
> and not mangled inside the mail.

Ok, I'll take that into account next time. Sorry for the inconvenience.

>
> The good news is that your ARM board at least has a memory layout which
> should work in theory. So at least one problem rules out.

Ok, nice.

>
> I don't think that apitrace would be much helpful in this case as long as no
> developer has access to one of those ARM boards. But it is interesting that
> the apitrace reliable reproduces the issue. This means that it isn't
> something random, but rather a specific timing of things.

I am afraid I don't have any boards that I can send yet. I am
developing one, but it will still take some time before it is
ready.

I've checked the apitrace, and there is a recurring call,
glXSwapBuffers(dpy=0x1389f00, drawable=52428803), that I believe
triggers the page flip. I suspect there is a race condition around
glXSwapBuffers in mesa or amdgpu that corrupts some of the data sent
to the GPU, causing a hang.
The GPU lockup seems to happen only when doing a page flip, since the
kernel blocks with:
[  243.693200] kworker/u4:3    D    0    89      2 0x00000000
[  243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
[  243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] (schedule+0x4c/0xac)
[  243.693259] [<80b8cdd0>] (schedule) from [<80b91024>]
(schedule_timeout+0x228/0x444)
[  243.693270] [<80b91024>] (schedule_timeout) from [<80886738>]
(dma_fence_default_wait+0x2b4/0x2d8)
[  243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
(dma_fence_wait_timeout+0x40/0x150)
[  243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
(reservation_object_wait_timeout_rcu+0xfc/0x34c)
[  243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
[<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
[  243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from
[<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
...
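
Because the mail client wrapped the backtrace above, the call chain is hard to read. Here is a small sketch that rejoins the wrapped lines and extracts the chain of functions. It assumes the ARM "(callee) from [<addr>] (caller+off/len)" format shown above; the helper names are made up for illustration.

```python
import re

TRACE = """\
[  243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] (schedule+0x4c/0xac)
[  243.693259] [<80b8cdd0>] (schedule) from [<80b91024>]
(schedule_timeout+0x228/0x444)
[  243.693270] [<80b91024>] (schedule_timeout) from [<80886738>]
(dma_fence_default_wait+0x2b4/0x2d8)
[  243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
(dma_fence_wait_timeout+0x40/0x150)
[  243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
(reservation_object_wait_timeout_rcu+0xfc/0x34c)
[  243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
[<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
[  243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from
[<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
"""

# A new log entry starts with a "[  seconds.micros]" timestamp.
TIMESTAMP = re.compile(r"\[\s*\d+\.\d+\]")
# "(callee) from [<addr>] (caller+0x..." with an optional " [module]" tag.
FRAME = re.compile(r"\((\w+)(?: \[\w+\])?\) from \[<[0-9a-f]+>\] \((\w+)\+")


def unwrap(text):
    """Rejoin wrapped lines: anything without a timestamp continues the previous entry."""
    entries = []
    for line in text.splitlines():
        if TIMESTAMP.match(line) or not entries:
            entries.append(line.strip())
        else:
            entries[-1] += " " + line.strip()
    return entries


def call_chain(text):
    """Return function names from the innermost frame outwards."""
    chain = []
    for entry in unwrap(text):
        m = FRAME.search(entry)
        if m is None:
            continue
        callee, caller = m.group(1), m.group(2)
        if not chain:
            chain.append(callee)
        chain.append(caller)
    return chain


for depth, name in enumerate(call_chain(TRACE)):
    print(f"{depth}: {name}")
```

Read from the bottom up, the chain shows amdgpu_dm_do_flip waiting on a reservation object, which in turn is waiting for a DMA fence.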

I will try to reproduce this on x86 with a similar software stack
and the apitrace traces I got.
What do you think, does this make sense? Do you have further
suggestions that may help pin down the problem?

Another strange thing: the traces that were consistently causing
hangs yesterday are having a bit more difficulty causing them today,
but if I play the video with kodi it hangs easily again. Both kodi and
glretrace always hang with similar kernel backtraces, like the one
above.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                             ` <CAEzXK1pPzRM=EpL8fKQ5SdDvEePR+_KPhrtPMPiArt==BsJspA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-03 11:56                               ` Luís Mendes
       [not found]                                 ` <CAEzXK1pFk_VvOnWXnaLdUcgHeGLZ=+E5fE-+GP1gkfWbQB0OWA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2018-01-03 17:09                               ` Michel Dänzer
  1 sibling, 1 reply; 25+ messages in thread
From: Luís Mendes @ 2018-01-03 11:56 UTC (permalink / raw)
  To: Christian König
  Cc: alexander.deucher-5C7GfCeVMHo, Chunming Zhou,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Christian, David,

David, replying to your question: the issue is indeed reproducible
on x86. I just reproduced it with kodi and the same VP9 video, so it
is not ARM-specific.

Regards,
Luís

On Wed, Jan 3, 2018 at 11:02 AM, Luís Mendes <luis.p.mendes@gmail.com> wrote:
> Hi Christian,
>
> Replies follow in between.
>
> Regards,
> Luís
>
> On Wed, Jan 3, 2018 at 9:37 AM, Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> Hi Luis,
>>
>> In general please add information like /proc/iomem and dmesg as attachment
>> and not mangled inside the mail.
>
> Ok, I'll take that into account next time. Sorry for the inconvenience.
>
>>
>> The good news is that your ARM board at least has a memory layout which
>> should work in theory. So at least one problem rules out.
>
> Ok, nice.
>
>>
>> I don't think that apitrace would be much helpful in this case as long as no
>> developer has access to one of those ARM boards. But it is interesting that
>> the apitrace reliable reproduces the issue. This means that it isn't
>> something random, but rather a specific timing of things.
>
> I am afraid, I currently don't have boards that I can send yet. I am
> developing one, but it will still take some time, before I have one
> ready.
>
> I've checked the apitrace and there is a common call
> glXSwapBuffers(dpy=0x1389f00, drawable=52428803) that I believe will
> trigger the page flip. I suspect there is a race condition with
> glXSwapBuffers in mesa or amdgpu, that corrupts some of the data sent
> to the GPU causing an hang.
> What I believe it seems to be the case is that the GPU lock up only
> happens when doing a page flip, since the kernel locks with:
> [  243.693200] kworker/u4:3    D    0    89      2 0x00000000
> [  243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
> [  243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] (schedule+0x4c/0xac)
> [  243.693259] [<80b8cdd0>] (schedule) from [<80b91024>]
> (schedule_timeout+0x228/0x444)
> [  243.693270] [<80b91024>] (schedule_timeout) from [<80886738>]
> (dma_fence_default_wait+0x2b4/0x2d8)
> [  243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
> (dma_fence_wait_timeout+0x40/0x150)
> [  243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
> (reservation_object_wait_timeout_rcu+0xfc/0x34c)
> [  243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
> [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
> [  243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from
> [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
> ...
>
> I will try to reproduce this on x86 with a similar software stack...
> and the apitrace traces I got.
> What do you think, does this makes sense? Do you have further
> suggestions that may help pin down the problem?
>
> Another strange thing... the traces that were consistently causing
> hangs yesterday, today are having a bit more difficulty causing them,
> but if I play the video with kodi it hangs easily again. Both kodi and
> glretarce always hangs with similar kernel backtraces, like the one
> above.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                                 ` <CAEzXK1pFk_VvOnWXnaLdUcgHeGLZ=+E5fE-+GP1gkfWbQB0OWA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-03 12:34                                   ` Christian König
  0 siblings, 0 replies; 25+ messages in thread
From: Christian König @ 2018-01-03 12:34 UTC (permalink / raw)
  To: Luís Mendes, Christian König
  Cc: alexander.deucher-5C7GfCeVMHo, Chunming Zhou,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

In this case please open a bug report on fdo and describe exactly how to 
reproduce it.

Marek should be able to take a look then.

Thanks,
Christian.

Am 03.01.2018 um 12:56 schrieb Luís Mendes:
> Hi Christian, David,
>
> David, replying to your question... The issue is indeed reproducible
> on x86, I just did it with kodi and the same VP9 video. So it is not
> arm specific.
>
> Regards,
> Luís
>
> On Wed, Jan 3, 2018 at 11:02 AM, Luís Mendes <luis.p.mendes@gmail.com> wrote:
>> Hi Christian,
>>
>> Replies follow in between.
>>
>> Regards,
>> Luís
>>
>> On Wed, Jan 3, 2018 at 9:37 AM, Christian König
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>> Hi Luis,
>>>
>>> In general please add information like /proc/iomem and dmesg as attachment
>>> and not mangled inside the mail.
>> Ok, I'll take that into account next time. Sorry for the inconvenience.
>>
>>> The good news is that your ARM board at least has a memory layout which
>>> should work in theory. So at least one problem rules out.
>> Ok, nice.
>>
>>> I don't think that apitrace would be much helpful in this case as long as no
>>> developer has access to one of those ARM boards. But it is interesting that
>>> the apitrace reliable reproduces the issue. This means that it isn't
>>> something random, but rather a specific timing of things.
>> I am afraid, I currently don't have boards that I can send yet. I am
>> developing one, but it will still take some time, before I have one
>> ready.
>>
>> I've checked the apitrace and there is a common call
>> glXSwapBuffers(dpy=0x1389f00, drawable=52428803) that I believe will
>> trigger the page flip. I suspect there is a race condition with
>> glXSwapBuffers in mesa or amdgpu, that corrupts some of the data sent
>> to the GPU causing an hang.
>> What I believe it seems to be the case is that the GPU lock up only
>> happens when doing a page flip, since the kernel locks with:
>> [  243.693200] kworker/u4:3    D    0    89      2 0x00000000
>> [  243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
>> [  243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] (schedule+0x4c/0xac)
>> [  243.693259] [<80b8cdd0>] (schedule) from [<80b91024>]
>> (schedule_timeout+0x228/0x444)
>> [  243.693270] [<80b91024>] (schedule_timeout) from [<80886738>]
>> (dma_fence_default_wait+0x2b4/0x2d8)
>> [  243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
>> (dma_fence_wait_timeout+0x40/0x150)
>> [  243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
>> (reservation_object_wait_timeout_rcu+0xfc/0x34c)
>> [  243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
>> [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
>> [  243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from
>> [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
>> ...
>>
>> I will try to reproduce this on x86 with a similar software stack...
>> and the apitrace traces I got.
>> What do you think, does this makes sense? Do you have further
>> suggestions that may help pin down the problem?
>>
>> Another strange thing... the traces that were consistently causing
>> hangs yesterday, today are having a bit more difficulty causing them,
>> but if I play the video with kodi it hangs easily again. Both kodi and
>> glretarce always hangs with similar kernel backtraces, like the one
>> above.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                             ` <CAEzXK1pPzRM=EpL8fKQ5SdDvEePR+_KPhrtPMPiArt==BsJspA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2018-01-03 11:56                               ` Luís Mendes
@ 2018-01-03 17:09                               ` Michel Dänzer
       [not found]                                 ` <99a9e27e-f166-f969-416f-f128b0673388-otUistvHUpPR7s880joybQ@public.gmane.org>
  1 sibling, 1 reply; 25+ messages in thread
From: Michel Dänzer @ 2018-01-03 17:09 UTC (permalink / raw)
  To: Luís Mendes, Christian König
  Cc: alexander.deucher-5C7GfCeVMHo, Chunming Zhou,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-01-03 12:02 PM, Luís Mendes wrote:
> 
> What I believe it seems to be the case is that the GPU lock up only
> happens when doing a page flip, since the kernel locks with:
> [  243.693200] kworker/u4:3    D    0    89      2 0x00000000
> [  243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
> [  243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] (schedule+0x4c/0xac)
> [  243.693259] [<80b8cdd0>] (schedule) from [<80b91024>]
> (schedule_timeout+0x228/0x444)
> [  243.693270] [<80b91024>] (schedule_timeout) from [<80886738>]
> (dma_fence_default_wait+0x2b4/0x2d8)
> [  243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
> (dma_fence_wait_timeout+0x40/0x150)
> [  243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
> (reservation_object_wait_timeout_rcu+0xfc/0x34c)
> [  243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
> [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
> [  243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from
> [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
> ...

Does the problem also occur if you disable DC with amdgpu.dc=0 on the
kernel command line?

Does it also happen with a kernel built from the amd-staging-drm-next
branch instead of drm-next-4.16?
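
To confirm that a test boot really ran with amdgpu.dc=0, the kernel command line can be checked. The following is a rough sketch that parses a command-line string; the example string is hypothetical, and on a running system the text would come from /proc/cmdline. It is a simplification that ignores quoted values containing spaces.

```python
def module_param(cmdline: str, name: str):
    """Return the last value given for `name` on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val  # later occurrences override earlier ones
    return value


# Hypothetical command line, for illustration only.
# On a live system: cmdline = open("/proc/cmdline").read()
example = "root=/dev/sda2 rw console=ttyS0,115200 amdgpu.dc=0"

dc = module_param(example, "amdgpu.dc")
if dc == "0":
    print("DC explicitly disabled on the command line")
else:
    print("DC not explicitly disabled (amdgpu.dc=%r)" % dc)
```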


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                                 ` <99a9e27e-f166-f969-416f-f128b0673388-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2018-01-03 17:47                                   ` Luís Mendes
       [not found]                                     ` <CAEzXK1rb6ngg=3MHo6yT+ed-a_1xr3ASwLPtsK4CpSMBk3xKgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Luís Mendes @ 2018-01-03 17:47 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: alexander.deucher-5C7GfCeVMHo, Chunming Zhou,
	Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Michel, Christian,

Christian, I have followed your suggestion and I have just submitted a
bug to fdo at https://bugs.freedesktop.org/show_bug.cgi?id=104481 -
GPU lockup Polaris 11 - AMD RX 460 and RX 550 on amd64 and on ARMv7
platforms while playing video.

Michel, amdgpu.dc=0 seems to make no difference. I will try
amd-staging-drm-next and report back.

Regards,
Luís

On Wed, Jan 3, 2018 at 5:09 PM, Michel Dänzer <michel@daenzer.net> wrote:
> On 2018-01-03 12:02 PM, Luís Mendes wrote:
>>
>> What I believe it seems to be the case is that the GPU lock up only
>> happens when doing a page flip, since the kernel locks with:
>> [  243.693200] kworker/u4:3    D    0    89      2 0x00000000
>> [  243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
>> [  243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] (schedule+0x4c/0xac)
>> [  243.693259] [<80b8cdd0>] (schedule) from [<80b91024>]
>> (schedule_timeout+0x228/0x444)
>> [  243.693270] [<80b91024>] (schedule_timeout) from [<80886738>]
>> (dma_fence_default_wait+0x2b4/0x2d8)
>> [  243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
>> (dma_fence_wait_timeout+0x40/0x150)
>> [  243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
>> (reservation_object_wait_timeout_rcu+0xfc/0x34c)
>> [  243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
>> [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
>> [  243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from
>> [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
>> ...
>
> Does the problem also occur if you disable DC with amdgpu.dc=0 on the
> kernel command line?
>
> Does it also happen with a kernel built from the amd-staging-drm-next
> branch instead of drm-next-4.16?
>
>
> --
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                                     ` <CAEzXK1rb6ngg=3MHo6yT+ed-a_1xr3ASwLPtsK4CpSMBk3xKgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-03 23:08                                       ` Luís Mendes
       [not found]                                         ` <CAEzXK1qFh-Y6zCoVpRVRcLEe_hLFueqntrjaAwgfKyLGd2u27A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Luís Mendes @ 2018-01-03 23:08 UTC (permalink / raw)
  To: Michel Dänzer, Christian König
  Cc: alexander.deucher-5C7GfCeVMHo, Chunming Zhou,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Michel, Christian,

Michel, I have tested amd-staging-drm-next at commit "drm/amdgpu/gfx9:
only init the apertures used by KGD (v2)" -
0e4946409d11913523d30bc4830d10b388438c7a and the issues remain, both
on ARMv7 and on x86 amd64.

Christian, in fact, if I replay the apitraces obtained on the ARMv7
platform on AMD64, I am also able to reproduce the GPU hang! So it
is not ARM-specific. Should I send/upload the apitraces? I
have two of them; typically when one doesn't hang the GPU, the other
does. One takes about 1GB of disk space while the other takes 2.3GB.
...
[   69.019381] ISO 9660 Extensions: RRIP_1991A
[  213.292094] DMAR: DRHD: handling fault status reg 2
[  213.292102] DMAR: [INTR-REMAP] Request device [00:00.0] fault index
1c [fault reason 38] Blocked an interrupt request due to source-id
verification failure
[  223.406919] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, last signaled seq=25158, last emitted seq=25160
[  223.406926] [drm] IP block:tonga_ih is hung!
[  223.407167] [drm] GPU recovery disabled.
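
For comparing timeout reports across runs, the two sequence numbers in the "ring gfx timeout" line are the interesting part. The sketch below extracts them and reports how many fences were emitted but never signaled. It assumes that sequence numbers increase by one per fence and ignores wrap-around; the function name is made up for illustration.

```python
import re

# The timeout line from the log above, with the mail wrapping undone.
LOG_LINE = (
    "[  223.406919] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx "
    "timeout, last signaled seq=25158, last emitted seq=25160"
)

PATTERN = re.compile(
    r"ring (\w+) timeout, last signaled seq=(\d+), last emitted seq=(\d+)"
)


def pending_fences(line: str):
    """Return (ring name, emitted - signaled), or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    ring = m.group(1)
    signaled, emitted = int(m.group(2)), int(m.group(3))
    return ring, emitted - signaled


print(pending_fences(LOG_LINE))  # ('gfx', 2)
```

The ARM report at the start of this thread (seq=43831 and seq=43833) gives the same difference of 2.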

Regards,
Luís


On Wed, Jan 3, 2018 at 5:47 PM, Luís Mendes <luis.p.mendes@gmail.com> wrote:
> Hi Michel, Christian,
>
> Christian, I have followed your suggestion and I have just submitted a
> bug to fdo at https://bugs.freedesktop.org/show_bug.cgi?id=104481 -
> GPU lockup Polaris 11 - AMD RX 460 and RX 550 on amd64 and on ARMv7
> platforms while playing video.
>
> Michel, amdgpu.dc=0 seems to make no difference. I will try
> amd-staging-drm-next and report back.
>
> Regards,
> Luís
>
> On Wed, Jan 3, 2018 at 5:09 PM, Michel Dänzer <michel@daenzer.net> wrote:
>> On 2018-01-03 12:02 PM, Luís Mendes wrote:
>>>
>>> What I believe it seems to be the case is that the GPU lock up only
>>> happens when doing a page flip, since the kernel locks with:
>>> [  243.693200] kworker/u4:3    D    0    89      2 0x00000000
>>> [  243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
>>> [  243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] (schedule+0x4c/0xac)
>>> [  243.693259] [<80b8cdd0>] (schedule) from [<80b91024>]
>>> (schedule_timeout+0x228/0x444)
>>> [  243.693270] [<80b91024>] (schedule_timeout) from [<80886738>]
>>> (dma_fence_default_wait+0x2b4/0x2d8)
>>> [  243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
>>> (dma_fence_wait_timeout+0x40/0x150)
>>> [  243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
>>> (reservation_object_wait_timeout_rcu+0xfc/0x34c)
>>> [  243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
>>> [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
>>> [  243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from
>>> [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
>>> ...
>>
>> Does the problem also occur if you disable DC with amdgpu.dc=0 on the
>> kernel command line?
>>
>> Does it also happen with a kernel built from the amd-staging-drm-next
>> branch instead of drm-next-4.16?
>>
>>
>> --
>> Earthling Michel Dänzer               |               http://www.amd.com
>> Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                                         ` <CAEzXK1qFh-Y6zCoVpRVRcLEe_hLFueqntrjaAwgfKyLGd2u27A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-30 18:30                                           ` Luís Mendes
       [not found]                                             ` <CAEzXK1rixP9spSTBY4V5GWxyUWJdf23Nbis2gbKgfxz4A6w2rQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Luís Mendes @ 2018-01-30 18:30 UTC (permalink / raw)
  To: Michel Dänzer, Christian König
  Cc: Alex Deucher, Chunming Zhou, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi everyone,

I've tested the kernel from amd-drm-next-4.17-wip at commit
9ab2894122275a6d636bb2654a157e88a0f7b9e2
(drm/amdgpu: set DRIVER_ATOMIC flag early) on ARMv7l, and the reported
issues now seem to be gone. I haven't checked which commit fixed
this, but it is now fixed! I also noticed a performance
improvement in one of the glmark2 tests.

There seem to be some other small issues, possibly unrelated:
occasionally the screen goes black and the sound stops for a second
or less while playing a video, and then normal playback resumes. This
happens rarely, at most once per power cycle, while using X and Kodi,
even though I have played many individual videos and power cycled the
machine a few times.

I've also observed what was already reported, when watching non-VP9 videos:
[  591.729558] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.740255] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.750968] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.761628] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.772248] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.782672] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.793172] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.803681] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.814129] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.824560] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.835054] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.845437] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.855860] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.866415] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.876945] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.887454] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
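
Since the same error repeats many times, it may help to count how often each function reports it. This is a minimal sketch that tallies the "writing more dwords" errors per reporting function; the sample is a shortened excerpt of the log above, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Shortened excerpt of the log above, still wrapped as in the mail.
SAMPLE = """\
[  591.729558] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.740255] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.772248] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
"""

ERROR = re.compile(
    r"\[drm:(\w+) \[amdgpu\]\] \*ERROR\* amdgpu: "
    r"writing more dwords to the ring than expected!"
)


def count_ring_overruns(log_text: str) -> Counter:
    """Count ring overrun errors per reporting function, tolerating line wrapping."""
    # Collapse all whitespace so that wrapped entries become contiguous again.
    joined = " ".join(log_text.split())
    return Counter(ERROR.findall(joined))


for func, count in count_ring_overruns(SAMPLE).most_common():
    print(f"{count:3d}  {func}")
```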

Regards,
Luís Mendes

On Wed, Jan 3, 2018 at 11:08 PM, Luís Mendes <luis.p.mendes@gmail.com> wrote:
> Hi Michel, Christian,
>
> Michel, I have tested amd-staging-drm-next at commit "drm/amdgpu/gfx9:
> only init the apertures used by KGD (v2)" -
> 0e4946409d11913523d30bc4830d10b388438c7a and the issues remain, both
> on ARMv7 and on x86 amd64.
>
> Christian, in fact if I replay the apitraces obtained on the ARMv7
> platform on the AMD64 I am also able to reproduce the GPU hang! So it
> is not ARM platform specific. Should I send/upload the apitraces? I
> have two of them, typically when one doesn't hang the gpu the other
> hangs. One takes about 1GB of disk space while the other takes 2.3GB.
> ...
> [   69.019381] ISO 9660 Extensions: RRIP_1991A
> [  213.292094] DMAR: DRHD: handling fault status reg 2
> [  213.292102] DMAR: [INTR-REMAP] Request device [00:00.0] fault index
> 1c [fault reason 38] Blocked an interrupt request due to source-id
> verification failure
> [  223.406919] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> timeout, last signaled seq=25158, last emitted seq=25160
> [  223.406926] [drm] IP block:tonga_ih is hung!
> [  223.407167] [drm] GPU recovery disabled.
>
> Regards,
> Luís
>
>
> On Wed, Jan 3, 2018 at 5:47 PM, Luís Mendes <luis.p.mendes@gmail.com> wrote:
>> Hi Michel, Christian,
>>
>> Christian, I have followed your suggestion and I have just submitted a
>> bug to fdo at https://bugs.freedesktop.org/show_bug.cgi?id=104481 -
>> GPU lockup Polaris 11 - AMD RX 460 and RX 550 on amd64 and on ARMv7
>> platforms while playing video.
>>
>> Michel, amdgpu.dc=0 seems to make no difference. I will try
>> amd-staging-drm-next and report back.
>>
>> Regards,
>> Luís
>>
>> On Wed, Jan 3, 2018 at 5:09 PM, Michel Dänzer <michel@daenzer.net> wrote:
>>> On 2018-01-03 12:02 PM, Luís Mendes wrote:
>>>>
>>>> What I believe it seems to be the case is that the GPU lock up only
>>>> happens when doing a page flip, since the kernel locks with:
>>>> [  243.693200] kworker/u4:3    D    0    89      2 0x00000000
>>>> [  243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
>>>> [  243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] (schedule+0x4c/0xac)
>>>> [  243.693259] [<80b8cdd0>] (schedule) from [<80b91024>]
>>>> (schedule_timeout+0x228/0x444)
>>>> [  243.693270] [<80b91024>] (schedule_timeout) from [<80886738>]
>>>> (dma_fence_default_wait+0x2b4/0x2d8)
>>>> [  243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
>>>> (dma_fence_wait_timeout+0x40/0x150)
>>>> [  243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
>>>> (reservation_object_wait_timeout_rcu+0xfc/0x34c)
>>>> [  243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
>>>> [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
>>>> [  243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from
>>>> [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
>>>> ...
>>>
>>> Does the problem also occur if you disable DC with amdgpu.dc=0 on the
>>> kernel command line?
>>>
>>> Does it also happen with a kernel built from the amd-staging-drm-next
>>> branch instead of drm-next-4.16?
>>>
>>>
>>> --
>>> Earthling Michel Dänzer               |               http://www.amd.com
>>> Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                                             ` <CAEzXK1rixP9spSTBY4V5GWxyUWJdf23Nbis2gbKgfxz4A6w2rQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-30 18:44                                               ` Deucher, Alexander
       [not found]                                                 ` <BN6PR12MB1652E01048ABD6A307E80B8DF7E40-/b2+HYfkarQqUD6E6FAiowdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Deucher, Alexander @ 2018-01-30 18:44 UTC (permalink / raw)
  To: Luís Mendes, Michel Dänzer, Koenig, Christian
  Cc: Zhou, David(ChunMing), amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


[-- Attachment #1.1: Type: text/plain, Size: 6720 bytes --]

Fixed with this patch:

https://lists.freedesktop.org/archives/amd-gfx/2018-January/018472.html


Alex

________________________________
From: Luís Mendes <luis.p.mendes-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Sent: Tuesday, January 30, 2018 1:30 PM
To: Michel Dänzer; Koenig, Christian
Cc: Deucher, Alexander; Zhou, David(ChunMing); amd-gfx@lists.freedesktop.org
Subject: Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2

Hi everyone,

I've tested the kernel from amd-drm-next-4.17-wip at commit
9ab2894122275a6d636bb2654a157e88a0f7b9e2
(drm/amdgpu: set DRIVER_ATOMIC flag early) on ARMv7l, and the reported
issues now seem to be gone. I haven't checked which commit fixed
this, but it is now fixed! I also noticed a performance
improvement in one of the glmark2 tests.

There seem to be some other small issues, possibly unrelated:
occasionally the screen goes black and the sound stops for a second or
less while playing a video, and then normal playback resumes. This
happens rarely, at most once per power cycle, while using X and Kodi,
even though I have played many individual videos and power cycled the
machine a few times.

I've also observed what was already reported, when watching non-VP9 videos:
[  591.729558] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.740255] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.750968] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.761628] [drm:uvd_v6_0_ring_emit_fence [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.772248] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.782672] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.793172] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.803681] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.814129] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.824560] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.835054] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.845437] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.855860] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.866415] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.876945] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!
[  591.887454] [drm:amdgpu_ring_insert_nop [amdgpu]] *ERROR* amdgpu:
writing more dwords to the ring than expected!

Regards,
Luís Mendes

On Wed, Jan 3, 2018 at 11:08 PM, Luís Mendes <luis.p.mendes-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Michel, Christian,
>
> Michel, I have tested amd-staging-drm-next at commit "drm/amdgpu/gfx9:
> only init the apertures used by KGD (v2)" -
> 0e4946409d11913523d30bc4830d10b388438c7a and the issues remain, both
> on ARMv7 and on x86 amd64.
>
> Christian, in fact if I replay the apitraces obtained on the ARMv7
> platform on the AMD64 I am also able to reproduce the GPU hang! So it
> is not ARM platform specific. Should I send/upload the apitraces? I
> have two of them, typically when one doesn't hang the gpu the other
> hangs. One takes about 1GB of disk space while the other takes 2.3GB.
> ...
> [   69.019381] ISO 9660 Extensions: RRIP_1991A
> [  213.292094] DMAR: DRHD: handling fault status reg 2
> [  213.292102] DMAR: [INTR-REMAP] Request device [00:00.0] fault index
> 1c [fault reason 38] Blocked an interrupt request due to source-id
> verification failure
> [  223.406919] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> timeout, last signaled seq=25158, last emitted seq=25160
> [  223.406926] [drm] IP block:tonga_ih is hung!
> [  223.407167] [drm] GPU recovery disabled.
>
> Regards,
> Luís
>
>
> On Wed, Jan 3, 2018 at 5:47 PM, Luís Mendes <luis.p.mendes-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hi Michel, Christian,
>>
>> Christian, I have followed your suggestion and I have just submitted a
>> bug to fdo at https://bugs.freedesktop.org/show_bug.cgi?id=104481 -
>> GPU lockup Polaris 11 - AMD RX 460 and RX 550 on amd64 and on ARMv7
>> platforms while playing video.
>>
>> Michel, amdgpu.dc=0 seems to make no difference. I will try
>> amd-staging-drm-next and report back.
>>
>> Regards,
>> Luís
>>
>> On Wed, Jan 3, 2018 at 5:09 PM, Michel Dänzer <michel-otUistvHUpPR7s880joybQ@public.gmane.org> wrote:
>>> On 2018-01-03 12:02 PM, Luís Mendes wrote:
>>>>
>>>> What I believe it seems to be the case is that the GPU lock up only
>>>> happens when doing a page flip, since the kernel locks with:
>>>> [  243.693200] kworker/u4:3    D    0    89      2 0x00000000
>>>> [  243.693232] Workqueue: events_unbound commit_work [drm_kms_helper]
>>>> [  243.693251] [<80b8c6d4>] (__schedule) from [<80b8cdd0>] (schedule+0x4c/0xac)
>>>> [  243.693259] [<80b8cdd0>] (schedule) from [<80b91024>]
>>>> (schedule_timeout+0x228/0x444)
>>>> [  243.693270] [<80b91024>] (schedule_timeout) from [<80886738>]
>>>> (dma_fence_default_wait+0x2b4/0x2d8)
>>>> [  243.693276] [<80886738>] (dma_fence_default_wait) from [<80885d60>]
>>>> (dma_fence_wait_timeout+0x40/0x150)
>>>> [  243.693284] [<80885d60>] (dma_fence_wait_timeout) from [<80887b1c>]
>>>> (reservation_object_wait_timeout_rcu+0xfc/0x34c)
>>>> [  243.693509] [<80887b1c>] (reservation_object_wait_timeout_rcu) from
>>>> [<7f331988>] (amdgpu_dm_do_flip+0xec/0x36c [amdgpu])
>>>> [  243.693789] [<7f331988>] (amdgpu_dm_do_flip [amdgpu]) from
>>>> [<7f33309c>] (amdgpu_dm_atomic_commit_tail+0xbfc/0xe58 [amdgpu])
>>>> ...
>>>
>>> Does the problem also occur if you disable DC with amdgpu.dc=0 on the
>>> kernel command line?
>>>
>>> Does it also happen with a kernel built from the amd-staging-drm-next
>>> branch instead of drm-next-4.16?
>>>
>>>
>>> --
>>> Earthling Michel Dänzer               |               http://www.amd.com
>>> Libre software enthusiast             |             Mesa and X developer


* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                                                 ` <BN6PR12MB1652E01048ABD6A307E80B8DF7E40-/b2+HYfkarQqUD6E6FAiowdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2018-01-31 12:47                                                   ` Luís Mendes
       [not found]                                                     ` <CAEzXK1rbmaNifg00RY+GPdUgLtysLsF1mciPekbotWfT0gagJA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Luís Mendes @ 2018-01-31 12:47 UTC (permalink / raw)
  To: Deucher, Alexander
  Cc: Zhou, David(ChunMing),
	Michel Dänzer, Koenig, Christian,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Alexander,

I've cherry-picked the patch you pointed out onto the kernel from
amd-drm-next-4.17-wip at commit
9ab2894122275a6d636bb2654a157e88a0f7b9e2 (drm/amdgpu: set
DRIVER_ATOMIC flag early) and tested it on ARMv7l, and the problem is
indeed gone.

Working great on ARMv7l with AMD RX460.

Thanks,
Luís Mendes

On Tue, Jan 30, 2018 at 6:44 PM, Deucher, Alexander
<Alexander.Deucher@amd.com> wrote:
> Fixed with this patch:
>
> https://lists.freedesktop.org/archives/amd-gfx/2018-January/018472.html
>
>
> Alex
>

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                                                     ` <CAEzXK1rbmaNifg00RY+GPdUgLtysLsF1mciPekbotWfT0gagJA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-31 23:57                                                       ` Luís Mendes
       [not found]                                                         ` <CAEzXK1ok5BsSMY2t4+D0575QSAqEkzKPC7cnWEVmg0OhCFcgcA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Luís Mendes @ 2018-01-31 23:57 UTC (permalink / raw)
  To: Deucher, Alexander
  Cc: Zhou, David(ChunMing),
	Michel Dänzer, Koenig, Christian,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

[-- Attachment #1: Type: text/plain, Size: 1363 bytes --]

Hi everyone,

I am seeing a new issue with amdgpu on the RX460. I can now play any
video with Kodi, play web videos with Firefox, and run OpenGL
applications without running into any issues. However, after some
uptime with Xorg, even when the system is almost idle, I get a kmalloc
allocation failure, normally followed some time later by a GPU hang.
I had a terminal window open under Ubuntu Mate 17.10 and was compiling
code when I got the kernel messages that can be found in the attachment.

I am using the kernel identified in my previous email, which is quoted
below.

Regards,
Luís Mendes

On Wed, Jan 31, 2018 at 12:47 PM, Luís Mendes <luis.p.mendes-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Alexander,
>
> I've cherry picked the patch you pointed out into kernel from
> amd-drm-next-4.17-wip at commit
> 9ab2894122275a6d636bb2654a157e88a0f7b9e2 ( drm/amdgpu: set
> DRIVER_ATOMIC flag early) and tested it on ARMv7l and the problem has
> gone indeed.
>
>
> Working great on ARMv7l with AMD RX460.
>
> Thanks,
> Luís Mendes
>
> On Tue, Jan 30, 2018 at 6:44 PM, Deucher, Alexander
> <Alexander.Deucher-5C7GfCeVMHo@public.gmane.org> wrote:
>> Fixed with this patch:
>>
>> https://lists.freedesktop.org/archives/amd-gfx/2018-January/018472.html
>>
>>
>> Alex

[-- Attachment #2: AMDGPU_new_hang.txt --]
[-- Type: text/plain, Size: 16988 bytes --]

Jan 31 21:56:11 localhost kernel: [ 4091.449841] Xorg: page allocation failure: order:5, mode:0x140c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
Jan 31 21:56:11 localhost kernel: [ 4091.449845] Xorg cpuset=/ mems_allowed=0
Jan 31 21:56:11 localhost kernel: [ 4091.449855] CPU: 0 PID: 3810 Comm: Xorg Not tainted 4.15.0-rc8-next2g-g9ab2894-dirty #1
Jan 31 21:56:11 localhost kernel: [ 4091.449857] Hardware name: Marvell Armada 380/385 (Device Tree)
Jan 31 21:56:11 localhost kernel: [ 4091.449859] Backtrace: 
Jan 31 21:56:11 localhost kernel: [ 4091.449870] [<8010dca8>] (dump_backtrace) from [<8010dfa4>] (show_stack+0x18/0x1c)
Jan 31 21:56:11 localhost kernel: [ 4091.449875]  r7:ffffe000 r6:60070013 r5:00000000 r4:8108d150
Jan 31 21:56:11 localhost kernel: [ 4091.449883] [<8010df8c>] (show_stack) from [<80b3ef04>] (dump_stack+0x94/0xa8)
Jan 31 21:56:11 localhost kernel: [ 4091.449891] [<80b3ee70>] (dump_stack) from [<8021fbe8>] (warn_alloc+0xc4/0x15c)
Jan 31 21:56:11 localhost kernel: [ 4091.449895]  r7:ffffe000 r6:80d53610 r5:00000000 r4:81004c48
Jan 31 21:56:11 localhost kernel: [ 4091.449900] [<8021fb28>] (warn_alloc) from [<80220b0c>] (__alloc_pages_nodemask+0xde4/0xf54)
Jan 31 21:56:11 localhost kernel: [ 4091.449902]  r3:00000005 r2:80d53610
Jan 31 21:56:11 localhost kernel: [ 4091.449905]  r7:00000032 r6:0140c0c0 r5:00000040 r4:00000000
Jan 31 21:56:11 localhost kernel: [ 4091.449911] [<8021fd28>] (__alloc_pages_nodemask) from [<80242164>] (kmalloc_order+0x20/0x38)
Jan 31 21:56:11 localhost kernel: [ 4091.449915]  r10:bcb48000 r9:a766ca00 r8:00000005 r7:7f2f37f4 r6:014080c0 r5:00018018
Jan 31 21:56:11 localhost kernel: [ 4091.449917]  r4:bc83d02c
Jan 31 21:56:11 localhost kernel: [ 4091.449922] [<80242144>] (kmalloc_order) from [<802421a0>] (kmalloc_order_trace+0x24/0xc8)
Jan 31 21:56:11 localhost kernel: [ 4091.450184] [<8024217c>] (kmalloc_order_trace) from [<7f2f37f4>] (dc_create_gamma+0x24/0x34 [amdgpu])
Jan 31 21:56:11 localhost kernel: [ 4091.450189]  r10:bcb48000 r9:a766ca00 r8:00000000 r7:00000001 r6:be4f0c00 r5:b9de1448
Jan 31 21:56:11 localhost kernel: [ 4091.450191]  r4:bc83d02c
Jan 31 21:56:11 localhost kernel: [ 4091.450474] [<7f2f37d0>] (dc_create_gamma [amdgpu]) from [<7f29d8a8>] (amdgpu_dm_atomic_check+0x67c/0xc6c [amdgpu])
Jan 31 21:56:11 localhost kernel: [ 4091.450658] [<7f29d22c>] (amdgpu_dm_atomic_check [amdgpu]) from [<7f0af238>] (drm_atomic_check_only+0x3bc/0x5c4 [drm])
Jan 31 21:56:11 localhost kernel: [ 4091.450663]  r10:00000800 r9:7fffffff r8:81004c48 r7:b4718e80 r6:b9cd1380 r5:00000001
Jan 31 21:56:11 localhost kernel: [ 4091.450664]  r4:00000000
Jan 31 21:56:11 localhost kernel: [ 4091.450711] [<7f0aee7c>] (drm_atomic_check_only [drm]) from [<7f0af458>] (drm_atomic_commit+0x18/0x60 [drm])
Jan 31 21:56:11 localhost kernel: [ 4091.450715]  r10:00000800 r9:bbc4c000 r8:bc83d000 r7:b9cd1380 r6:be1b5000 r5:b9cd1380
Jan 31 21:56:11 localhost kernel: [ 4091.450717]  r4:00000001
Jan 31 21:56:11 localhost kernel: [ 4091.450763] [<7f0af440>] (drm_atomic_commit [drm]) from [<7f1260e8>] (drm_atomic_helper_legacy_gamma_set+0x110/0x160 [drm_kms_helper])
Jan 31 21:56:11 localhost kernel: [ 4091.450767]  r7:b9cd1380 r6:bbc4b9fe r5:a766d200 r4:00000001
Jan 31 21:56:11 localhost kernel: [ 4091.450799] [<7f125fd8>] (drm_atomic_helper_legacy_gamma_set [drm_kms_helper]) from [<7f0b951c>] (drm_mode_gamma_set_ioctl+0x1c4/0x2c0 [drm])
Jan 31 21:56:11 localhost kernel: [ 4091.450804]  r10:ffffe000 r9:bbc4ba00 r8:bbc4b800 r7:bbc4c034 r6:b1911e2c r5:b1911d80
Jan 31 21:56:11 localhost kernel: [ 4091.450806]  r4:7f125fd8 r3:bbc4bc00
Jan 31 21:56:11 localhost kernel: [ 4091.450848] [<7f0b9358>] (drm_mode_gamma_set_ioctl [drm]) from [<7f09d920>] (drm_ioctl_kernel+0x68/0xb4 [drm])
Jan 31 21:56:11 localhost kernel: [ 4091.450852]  r10:00000020 r9:b1911e2c r8:7f0b9358 r7:00000012 r6:00000000 r5:be1b5000
Jan 31 21:56:11 localhost kernel: [ 4091.450854]  r4:b4875240
Jan 31 21:56:11 localhost kernel: [ 4091.450892] [<7f09d8b8>] (drm_ioctl_kernel [drm]) from [<7f09ddec>] (drm_ioctl+0x2cc/0x3b0 [drm])
Jan 31 21:56:11 localhost kernel: [ 4091.450897]  r9:000000a5 r8:c02064a5 r7:b4875240 r6:7f0b9358 r5:7f0c4710 r4:81004c48
Jan 31 21:56:11 localhost kernel: [ 4091.451060] [<7f09db20>] (drm_ioctl [drm]) from [<7f180010>] (amdgpu_drm_ioctl+0x10/0x14 [amdgpu])
Jan 31 21:56:11 localhost kernel: [ 4091.451064]  r10:bcb5c6b0 r9:b1910000 r8:7e9f4aa8 r7:0000000b r6:be4b1300 r5:7e9f4aa8
Jan 31 21:56:11 localhost kernel: [ 4091.451066]  r4:81004c48
Jan 31 21:56:11 localhost kernel: [ 4091.451198] [<7f180000>] (amdgpu_drm_ioctl [amdgpu]) from [<8028c71c>] (do_vfs_ioctl+0xb8/0x8cc)
Jan 31 21:56:11 localhost kernel: [ 4091.451203] [<8028c664>] (do_vfs_ioctl) from [<8028cf6c>] (SyS_ioctl+0x3c/0x60)
Jan 31 21:56:11 localhost kernel: [ 4091.451208]  r10:00000000 r9:b1910000 r8:7e9f4aa8 r7:c02064a5 r6:0000000b r5:be4b1300
Jan 31 21:56:11 localhost kernel: [ 4091.451209]  r4:be4b1301
Jan 31 21:56:11 localhost kernel: [ 4091.451216] [<8028cf30>] (SyS_ioctl) from [<80108f00>] (ret_fast_syscall+0x0/0x54)
Jan 31 21:56:11 localhost kernel: [ 4091.451220]  r9:b1910000 r8:801090e4 r7:00000036 r6:c02064a5 r5:7e9f4aa8 r4:00000000
Jan 31 21:56:11 localhost kernel: [ 4091.451238] Mem-Info:
Jan 31 21:56:11 localhost kernel: [ 4091.451248] active_anon:14596 inactive_anon:40773 isolated_anon:0
Jan 31 21:56:11 localhost kernel: [ 4091.451248]  active_file:94588 inactive_file:79288 isolated_file:0
Jan 31 21:56:11 localhost kernel: [ 4091.451248]  unevictable:8 dirty:1694 writeback:0 unstable:0
Jan 31 21:56:11 localhost kernel: [ 4091.451248]  slab_reclaimable:9942 slab_unreclaimable:3950
Jan 31 21:56:11 localhost kernel: [ 4091.451248]  mapped:31412 shmem:3542 pagetables:1301 bounce:0
Jan 31 21:56:11 localhost kernel: [ 4091.451248]  free:3477 free_pcp:27 free_cma:0
Jan 31 21:56:11 localhost kernel: [ 4091.451254] Node 0 active_anon:58384kB inactive_anon:163092kB active_file:378352kB inactive_file:317152kB unevictable:32kB isolated(anon):0kB isolated(file):0kB mapped:125648kB dirty:6776kB writeback:0kB shmem:14168kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jan 31 21:56:11 localhost kernel: [ 4091.451262] Normal free:13908kB min:4040kB low:5060kB high:6080kB active_anon:58280kB inactive_anon:163404kB active_file:378644kB inactive_file:316952kB unevictable:32kB writepending:6776kB present:1048576kB managed:1023064kB mlocked:32kB kernel_stack:2640kB pagetables:5204kB bounce:0kB free_pcp:108kB local_pcp:0kB free_cma:0kB
Jan 31 21:56:11 localhost kernel: [ 4091.451263] lowmem_reserve[]: 0 0 0
Jan 31 21:56:11 localhost kernel: [ 4091.451269] Normal: 17*4kB (UME) 208*8kB (UME) 197*16kB (UME) 200*32kB (UM) 45*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 14164kB
Jan 31 21:56:11 localhost kernel: [ 4091.451291] 177463 total pagecache pages
Jan 31 21:56:11 localhost kernel: [ 4091.451294] 12 pages in swap cache
Jan 31 21:56:11 localhost kernel: [ 4091.451296] Swap cache stats: add 684, delete 672, find 0/0
Jan 31 21:56:11 localhost kernel: [ 4091.451298] Free swap  = 1949948kB
Jan 31 21:56:11 localhost kernel: [ 4091.451299] Total swap = 1952764kB
Jan 31 21:56:11 localhost kernel: [ 4091.451300] 262144 pages RAM
Jan 31 21:56:11 localhost kernel: [ 4091.451302] 0 pages HighMem/MovableOnly
Jan 31 21:56:11 localhost kernel: [ 4091.451303] 6378 pages reserved
Jan 31 21:56:11 localhost kernel: [ 4091.451305] ------------[ cut here ]------------
Jan 31 21:56:11 localhost kernel: [ 4091.451454] WARNING: CPU: 0 PID: 3810 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:1966 amdgpu_dm_atomic_check+0xc00/0xc6c [amdgpu]
Jan 31 21:56:11 localhost kernel: [ 4091.451456] Modules linked in: fuse amdgpu mfd_core chash gpu_sched ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm snd_hda_codec_hdmi drm_panel_orientation_quirks snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep parport_pc ppdev lp parport
Jan 31 21:56:11 localhost kernel: [ 4091.451490] CPU: 0 PID: 3810 Comm: Xorg Not tainted 4.15.0-rc8-next2g-g9ab2894-dirty #1
Jan 31 21:56:11 localhost kernel: [ 4091.451492] Hardware name: Marvell Armada 380/385 (Device Tree)
Jan 31 21:56:11 localhost kernel: [ 4091.451494] Backtrace: 
Jan 31 21:56:11 localhost kernel: [ 4091.451500] [<8010dca8>] (dump_backtrace) from [<8010dfa4>] (show_stack+0x18/0x1c)
Jan 31 21:56:11 localhost kernel: [ 4091.451504]  r7:00000009 r6:60070013 r5:00000000 r4:8108d150
Jan 31 21:56:11 localhost kernel: [ 4091.451509] [<8010df8c>] (show_stack) from [<80b3ef04>] (dump_stack+0x94/0xa8)
Jan 31 21:56:11 localhost kernel: [ 4091.451517] [<80b3ee70>] (dump_stack) from [<80123678>] (__warn+0xe8/0x100)
Jan 31 21:56:11 localhost kernel: [ 4091.451520]  r7:00000009 r6:7f3623d4 r5:00000000 r4:00000000
Jan 31 21:56:11 localhost kernel: [ 4091.451525] [<80123590>] (__warn) from [<801237b0>] (warn_slowpath_null+0x48/0x50)
Jan 31 21:56:11 localhost kernel: [ 4091.451529]  r9:a766ca00 r8:00000000 r7:00000001 r6:7f29de2c r5:000007ae r4:7f3623d4
Jan 31 21:56:11 localhost kernel: [ 4091.451678] [<80123768>] (warn_slowpath_null) from [<7f29de2c>] (amdgpu_dm_atomic_check+0xc00/0xc6c [amdgpu])
Jan 31 21:56:11 localhost kernel: [ 4091.451681]  r6:be4f0c00 r5:b9de1448 r4:bc83d02c
Jan 31 21:56:11 localhost kernel: [ 4091.451845] [<7f29d22c>] (amdgpu_dm_atomic_check [amdgpu]) from [<7f0af238>] (drm_atomic_check_only+0x3bc/0x5c4 [drm])
Jan 31 21:56:11 localhost kernel: [ 4091.451849]  r10:00000800 r9:7fffffff r8:81004c48 r7:b4718e80 r6:b9cd1380 r5:00000001
Jan 31 21:56:11 localhost kernel: [ 4091.451851]  r4:00000000
Jan 31 21:56:11 localhost kernel: [ 4091.451897] [<7f0aee7c>] (drm_atomic_check_only [drm]) from [<7f0af458>] (drm_atomic_commit+0x18/0x60 [drm])
Jan 31 21:56:11 localhost kernel: [ 4091.451902]  r10:00000800 r9:bbc4c000 r8:bc83d000 r7:b9cd1380 r6:be1b5000 r5:b9cd1380
Jan 31 21:56:11 localhost kernel: [ 4091.451903]  r4:00000001
Jan 31 21:56:11 localhost kernel: [ 4091.451943] [<7f0af440>] (drm_atomic_commit [drm]) from [<7f1260e8>] (drm_atomic_helper_legacy_gamma_set+0x110/0x160 [drm_kms_helper])
Jan 31 21:56:11 localhost kernel: [ 4091.451946]  r7:b9cd1380 r6:bbc4b9fe r5:a766d200 r4:00000001
Jan 31 21:56:11 localhost kernel: [ 4091.451979] [<7f125fd8>] (drm_atomic_helper_legacy_gamma_set [drm_kms_helper]) from [<7f0b951c>] (drm_mode_gamma_set_ioctl+0x1c4/0x2c0 [drm])
Jan 31 21:56:11 localhost kernel: [ 4091.451984]  r10:ffffe000 r9:bbc4ba00 r8:bbc4b800 r7:bbc4c034 r6:b1911e2c r5:b1911d80
Jan 31 21:56:11 localhost kernel: [ 4091.451986]  r4:7f125fd8 r3:bbc4bc00
Jan 31 21:56:11 localhost kernel: [ 4091.452028] [<7f0b9358>] (drm_mode_gamma_set_ioctl [drm]) from [<7f09d920>] (drm_ioctl_kernel+0x68/0xb4 [drm])
Jan 31 21:56:11 localhost kernel: [ 4091.452033]  r10:00000020 r9:b1911e2c r8:7f0b9358 r7:00000012 r6:00000000 r5:be1b5000
Jan 31 21:56:11 localhost kernel: [ 4091.452034]  r4:b4875240
Jan 31 21:56:11 localhost kernel: [ 4091.452074] [<7f09d8b8>] (drm_ioctl_kernel [drm]) from [<7f09ddec>] (drm_ioctl+0x2cc/0x3b0 [drm])
Jan 31 21:56:11 localhost kernel: [ 4091.452078]  r9:000000a5 r8:c02064a5 r7:b4875240 r6:7f0b9358 r5:7f0c4710 r4:81004c48
Jan 31 21:56:11 localhost kernel: [ 4091.452244] [<7f09db20>] (drm_ioctl [drm]) from [<7f180010>] (amdgpu_drm_ioctl+0x10/0x14 [amdgpu])
Jan 31 21:56:11 localhost kernel: [ 4091.452249]  r10:bcb5c6b0 r9:b1910000 r8:7e9f4aa8 r7:0000000b r6:be4b1300 r5:7e9f4aa8
Jan 31 21:56:11 localhost kernel: [ 4091.452250]  r4:81004c48
Jan 31 21:56:11 localhost kernel: [ 4091.452381] [<7f180000>] (amdgpu_drm_ioctl [amdgpu]) from [<8028c71c>] (do_vfs_ioctl+0xb8/0x8cc)
Jan 31 21:56:11 localhost kernel: [ 4091.452386] [<8028c664>] (do_vfs_ioctl) from [<8028cf6c>] (SyS_ioctl+0x3c/0x60)
Jan 31 21:56:11 localhost kernel: [ 4091.452390]  r10:00000000 r9:b1910000 r8:7e9f4aa8 r7:c02064a5 r6:0000000b r5:be4b1300
Jan 31 21:56:11 localhost kernel: [ 4091.452392]  r4:be4b1301
Jan 31 21:56:11 localhost kernel: [ 4091.452398] [<8028cf30>] (SyS_ioctl) from [<80108f00>] (ret_fast_syscall+0x0/0x54)
Jan 31 21:56:11 localhost kernel: [ 4091.452402]  r9:b1910000 r8:801090e4 r7:00000036 r6:c02064a5 r5:7e9f4aa8 r4:00000000
Jan 31 21:56:11 localhost kernel: [ 4091.452404] ---[ end trace a82753e3670d007b ]---
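
A note on reading the allocation failure above: order:5 means 2^5
contiguous pages, and Mem-Info reports 262144 pages for 1048576kB
present, i.e. 4kB pages, so the request needs one physically contiguous
128kB block. The buddy free-list line shows 0*128kB and nothing larger,
which is consistent with the failure even though about 14MB is free in
total. The Python sketch below redoes that arithmetic from the logged
line. The helper names are made up for illustration and are not part of
any kernel tool, and the check only looks at this one snapshot; it does
not model reclaim or compaction.

```python
import re

# Buddy free-list line copied from the log above.
BUDDY_LINE = ("Normal: 17*4kB (UME) 208*8kB (UME) 197*16kB (UME) 200*32kB (UM) "
              "45*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB "
              "= 14164kB")

# Page size derived from Mem-Info: present kB divided by pages of RAM.
PAGE_KB = 1048576 // 262144


def parse_buddy(line):
    """Return {block_size_kB: free_block_count} for one zone line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def can_satisfy(free_blocks, order, page_kb=PAGE_KB):
    """True if the snapshot holds a free block of at least 2**order pages."""
    need_kb = (1 << order) * page_kb
    return any(count > 0 for size, count in free_blocks.items()
               if size >= need_kb)


free_blocks = parse_buddy(BUDDY_LINE)
total_kb = sum(size * count for size, count in free_blocks.items())
print(total_kb)                      # 14164, matches the logged total
print(can_satisfy(free_blocks, 5))   # False: no free block of 128kB or more
print(can_satisfy(free_blocks, 4))   # True: 45 free 64kB blocks
```

In other words, the snapshot looks like fragmentation rather than
outright exhaustion: plenty of small blocks, none large enough for the
order-5 request made from dc_create_gamma.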

Jan 31 23:18:49 localhost kernel: [ 9049.085261] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, last signaled seq=153752, last emitted seq=153755
Jan 31 23:18:49 localhost kernel: [ 9049.096562] [drm] IP block:gmc_v8_0 is hung!
Jan 31 23:18:49 localhost kernel: [ 9049.096569] [drm] IP block:sdma_v3_0 is hung!
Jan 31 23:18:49 localhost kernel: [ 9049.096617] [drm] GPU recovery disabled.
Jan 31 23:21:34 localhost kernel: [ 9213.550781] INFO: task kworker/u4:5:1186 blocked for more than 120 seconds.
Jan 31 23:21:34 localhost kernel: [ 9213.557763]       Tainted: G        W        4.15.0-rc8-next2g-g9ab2894-dirty #1
Jan 31 23:21:34 localhost kernel: [ 9213.565196] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 31 23:21:34 localhost kernel: [ 9213.573058] kworker/u4:5    D    0  1186      2 0x00000000
Jan 31 23:21:34 localhost kernel: [ 9213.573087] Workqueue: events_unbound commit_work [drm_kms_helper]
Jan 31 23:21:34 localhost kernel: [ 9213.573091] Backtrace: 
Jan 31 23:21:34 localhost kernel: [ 9213.573103] [<80b548f0>] (__schedule) from [<80b54ff4>] (schedule+0x44/0xa4)
Jan 31 23:21:34 localhost kernel: [ 9213.573108]  r10:60070013 r9:be74a000 r8:be74bc8c r7:00000000 r6:7fffffff r5:81004c48
Jan 31 23:21:34 localhost kernel: [ 9213.573110]  r4:ffffe000
Jan 31 23:21:34 localhost kernel: [ 9213.573116] [<80b54fb0>] (schedule) from [<80b58c18>] (schedule_timeout+0x1e0/0x2e8)
Jan 31 23:21:34 localhost kernel: [ 9213.573118]  r5:81004c48 r4:7fffffff
Jan 31 23:21:34 localhost kernel: [ 9213.573126] [<80b58a38>] (schedule_timeout) from [<8065c140>] (dma_fence_default_wait+0x218/0x2b0)
Jan 31 23:21:34 localhost kernel: [ 9213.573130]  r10:60070013 r9:be74a000 r8:be74bc8c r7:00000000 r6:7fffffff r5:81004c48
Jan 31 23:21:34 localhost kernel: [ 9213.573132]  r4:93892580
Jan 31 23:21:34 localhost kernel: [ 9213.573137] [<8065bf28>] (dma_fence_default_wait) from [<8065b8b8>] (dma_fence_wait_timeout+0x48/0x15c)
Jan 31 23:21:34 localhost kernel: [ 9213.573141]  r10:00000001 r9:b4bfd590 r8:b9de1570 r7:000332fe r6:00000000 r5:93892580
Jan 31 23:21:34 localhost kernel: [ 9213.573143]  r4:81096bd8
Jan 31 23:21:34 localhost kernel: [ 9213.573148] [<8065b870>] (dma_fence_wait_timeout) from [<8065d7f0>] (reservation_object_wait_timeout_rcu+0x2f4/0x3a8)
Jan 31 23:21:34 localhost kernel: [ 9213.573151]  r7:000332fe r6:00000000 r5:93892580 r4:93892580
Jan 31 23:21:34 localhost kernel: [ 9213.573360] [<8065d4fc>] (reservation_object_wait_timeout_rcu) from [<7f29b968>] (amdgpu_dm_do_flip+0xe0/0x364 [amdgpu])
Jan 31 23:21:34 localhost kernel: [ 9213.573365]  r10:be448780 r9:b9de1448 r8:bcb497e8 r7:00001dfb r6:b9de15f8 r5:bcb48000
Jan 31 23:21:34 localhost kernel: [ 9213.573367]  r4:bbc4c000
Jan 31 23:21:34 localhost kernel: [ 9213.573639] [<7f29b888>] (amdgpu_dm_do_flip [amdgpu]) from [<7f29cfcc>] (amdgpu_dm_atomic_commit_tail+0xbd0/0xe30 [amdgpu])
Jan 31 23:21:34 localhost kernel: [ 9213.573644]  r10:bbc4c000 r9:ba95f200 r8:be448780 r7:00000000 r6:b19f8b80 r5:bbc4c000
Jan 31 23:21:34 localhost kernel: [ 9213.573645]  r4:bea5b200
Jan 31 23:21:34 localhost kernel: [ 9213.573790] [<7f29c3fc>] (amdgpu_dm_atomic_commit_tail [amdgpu]) from [<7f127e74>] (commit_tail+0x48/0x8c [drm_kms_helper])
Jan 31 23:21:34 localhost kernel: [ 9213.573794]  r10:00000000 r9:810944b4 r8:00000000 r7:bf004500 r6:bf008400 r5:7f36b484
Jan 31 23:21:34 localhost kernel: [ 9213.573796]  r4:ba95f200
Jan 31 23:21:34 localhost kernel: [ 9213.573815] [<7f127e2c>] (commit_tail [drm_kms_helper]) from [<7f127ecc>] (commit_work+0x14/0x18 [drm_kms_helper])
Jan 31 23:21:34 localhost kernel: [ 9213.573818]  r5:be5bfe00 r4:ba95f22c
Jan 31 23:21:34 localhost kernel: [ 9213.573833] [<7f127eb8>] (commit_work [drm_kms_helper]) from [<8013eb70>] (process_one_work+0x204/0x510)
Jan 31 23:21:34 localhost kernel: [ 9213.573838] [<8013e96c>] (process_one_work) from [<8013fc38>] (worker_thread+0x5c/0x5f0)
Jan 31 23:21:34 localhost kernel: [ 9213.573842]  r10:81003d00 r9:00000088 r8:ffffe000 r7:bf008418 r6:be5bfe18 r5:bf008400
Jan 31 23:21:34 localhost kernel: [ 9213.573844]  r4:be5bfe00
Jan 31 23:21:34 localhost kernel: [ 9213.573850] [<8013fbdc>] (worker_thread) from [<8014534c>] (kthread+0x164/0x16c)
Jan 31 23:21:34 localhost kernel: [ 9213.573854]  r10:be6f3e84 r9:8013fbdc r8:be5bfe00 r7:be74a000 r6:00000000 r5:be5bed80
Jan 31 23:21:34 localhost kernel: [ 9213.573856]  r4:be5bee00
Jan 31 23:21:34 localhost kernel: [ 9213.573862] [<801451e8>] (kthread) from [<80108fe8>] (ret_from_fork+0x14/0x2c)
Jan 31 23:21:34 localhost kernel: [ 9213.573866]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:801451e8
Jan 31 23:21:34 localhost kernel: [ 9213.573868]  r4:be5bed80


[-- Attachment #3: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                                                         ` <CAEzXK1ok5BsSMY2t4+D0575QSAqEkzKPC7cnWEVmg0OhCFcgcA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-02-01  2:30                                                           ` Alex Deucher
       [not found]                                                             ` <CADnq5_NPWa5s+dJwinBFLTzcGfycuuuin_YFp7CJnt_8A2p9eA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Alex Deucher @ 2018-02-01  2:30 UTC (permalink / raw)
  To: Luís Mendes
  Cc: Deucher, Alexander, Zhou, David(ChunMing),
	Michel Dänzer, Koenig, Christian,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Wed, Jan 31, 2018 at 6:57 PM, Luís Mendes <luis.p.mendes@gmail.com> wrote:
> Hi everyone,
>
> I am getting a new issue with amdgpu with RX460, that is, now I can
> play any videos with Kodi or play web videos with firefox and run
> OpenGL applications without running into any issues, however after
> some uptime with XOrg even when almost inactive I get a kmalloc
> allocation failure, normally followed by a GPU hang a while after the
> allocation failure.
> I had a terminal window under Ubuntu Mate 17.10 and I was compiling
> code when I got the kernel messages that can be found in attachment.
>
> I am using the kernel as identified on my previous email, which can be
> found below.

Does this patch help?
https://patchwork.freedesktop.org/patch/198258/

Alex

>
> Regards,
> Luís Mendes
>
> On Wed, Jan 31, 2018 at 12:47 PM, Luís Mendes <luis.p.mendes@gmail.com> wrote:
>> Hi Alexander,
>>
>> I've cherry picked the patch you pointed out into kernel from
>> amd-drm-next-4.17-wip at commit
>> 9ab2894122275a6d636bb2654a157e88a0f7b9e2 ( drm/amdgpu: set
>> DRIVER_ATOMIC flag early) and tested it on ARMv7l and the problem has
>> gone indeed.
>>
>>
>> Working great on ARMv7l with AMD RX460.
>>
>> Thanks,
>> Luís Mendes
>>
>>
>> On Tue, Jan 30, 2018 at 6:44 PM, Deucher, Alexander
>> <Alexander.Deucher@amd.com> wrote:
>>> Fixed with this patch:
>>>
>>> https://lists.freedesktop.org/archives/amd-gfx/2018-January/018472.html
>>>
>>>
>>> Alex
> <>
>>> __________________
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                                                             ` <CADnq5_NPWa5s+dJwinBFLTzcGfycuuuin_YFp7CJnt_8A2p9eA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-02-01 23:03                                                               ` Luís Mendes
       [not found]                                                                 ` <CAEzXK1rsnm318e6tE+9f=A3pAB4fHw3XUY0dMgvno4yUcoxgjA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Luís Mendes @ 2018-02-01 23:03 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Deucher, Alexander, Zhou, David(ChunMing),
	Michel Dänzer, Koenig, Christian,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Alexander,

I didn't notice improvements on this issue with that particular patch
applied. It still ends up failing to allocate kernel memory after a
few hours of uptime with Xorg.

I will try to upgrade to mesa 18.0.0-rc3 and to amd-staging-drm-next
head, to see if the issue still occurs with those versions.

If you have additional suggestions I'll be happy to try them.

Regards,
Luís Mendes

On Thu, Feb 1, 2018 at 2:30 AM, Alex Deucher <alexdeucher@gmail.com> wrote:
> On Wed, Jan 31, 2018 at 6:57 PM, Luís Mendes <luis.p.mendes@gmail.com> wrote:
>> Hi everyone,
>>
>> I am getting a new issue with amdgpu with RX460, that is, now I can
>> play any videos with Kodi or play web videos with firefox and run
>> OpenGL applications without running into any issues, however after
>> some uptime with XOrg even when almost inactive I get a kmalloc
>> allocation failure, normally followed by a GPU hang a while after the
>> allocation failure.
>> I had a terminal window under Ubuntu Mate 17.10 and I was compiling
>> code when I got the kernel messages that can be found in attachment.
>>
>> I am using the kernel as identified on my previous email, which can be
>> found below.
>
> does this patch help?
> https://patchwork.freedesktop.org/patch/198258/
>
> Alex
>
>>
>> Regards,
>> Luís Mendes
>>
>> On Wed, Jan 31, 2018 at 12:47 PM, Luís Mendes <luis.p.mendes@gmail.com> wrote:
>>> Hi Alexander,
>>>
>>> I've cherry picked the patch you pointed out into kernel from
>>> amd-drm-next-4.17-wip at commit
>>> 9ab2894122275a6d636bb2654a157e88a0f7b9e2 ( drm/amdgpu: set
>>> DRIVER_ATOMIC flag early) and tested it on ARMv7l and the problem has
>>> gone indeed.
>>>
>>>
>>> Working great on ARMv7l with AMD RX460.
>>>
>>> Thanks,
>>> Luís Mendes
>>>
>>>
>>> On Tue, Jan 30, 2018 at 6:44 PM, Deucher, Alexander
>>> <Alexander.Deucher@amd.com> wrote:
>>>> Fixed with this patch:
>>>>
>>>> https://lists.freedesktop.org/archives/amd-gfx/2018-January/018472.html
>>>>
>>>>
>>>> Alex
>> <>
>>>> __________________
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                                                                 ` <CAEzXK1rsnm318e6tE+9f=A3pAB4fHw3XUY0dMgvno4yUcoxgjA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-02-02  7:48                                                                   ` Christian König
       [not found]                                                                     ` <c9d12d8c-a186-4041-ea41-1ccabe41eb29-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2018-02-02  7:48 UTC (permalink / raw)
  To: Luís Mendes, Alex Deucher
  Cc: Deucher, Alexander, Zhou, David(ChunMing),
	Michel Dänzer, Koenig, Christian,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Luis,

Please enable kmemleak in your build and watch for any suspicious
messages in the system log.

Regards,
Christian.

Am 02.02.2018 um 00:03 schrieb Luís Mendes:
> Hi Alexander,
>
> I didn't notice improvements on this issue with that particular patch
> applied. It still ends up failing to allocate kernel memory after a
> few hours of uptime with Xorg.
>
> I will try to upgrade to mesa 18.0.0-rc3 and to amd-staging-drm-next
> head, to see if the issue still occurs with those versions.
>
> If you have additional suggestions I'll be happy to try them.
>
> Regards,
> Luís Mendes
>
> On Thu, Feb 1, 2018 at 2:30 AM, Alex Deucher <alexdeucher@gmail.com> wrote:
>> On Wed, Jan 31, 2018 at 6:57 PM, Luís Mendes <luis.p.mendes@gmail.com> wrote:
>>> Hi everyone,
>>>
>>> I am getting a new issue with amdgpu with RX460, that is, now I can
>>> play any videos with Kodi or play web videos with firefox and run
>>> OpenGL applications without running into any issues, however after
>>> some uptime with XOrg even when almost inactive I get a kmalloc
>>> allocation failure, normally followed by a GPU hang a while after the
>>> allocation failure.
>>> I had a terminal window under Ubuntu Mate 17.10 and I was compiling
>>> code when I got the kernel messages that can be found in attachment.
>>>
>>> I am using the kernel as identified on my previous email, which can be
>>> found below.
>> does this patch help?
>> https://patchwork.freedesktop.org/patch/198258/
>>
>> Alex
>>
>>> Regards,
>>> Luís Mendes
>>>
>>> On Wed, Jan 31, 2018 at 12:47 PM, Luís Mendes <luis.p.mendes@gmail.com> wrote:
>>>> Hi Alexander,
>>>>
>>>> I've cherry picked the patch you pointed out into kernel from
>>>> amd-drm-next-4.17-wip at commit
>>>> 9ab2894122275a6d636bb2654a157e88a0f7b9e2 ( drm/amdgpu: set
>>>> DRIVER_ATOMIC flag early) and tested it on ARMv7l and the problem has
>>>> gone indeed.
>>>>
>>>>
>>>> Working great on ARMv7l with AMD RX460.
>>>>
>>>> Thanks,
>>>> Luís Mendes
>>>>
>>>>
>>>> On Tue, Jan 30, 2018 at 6:44 PM, Deucher, Alexander
>>>> <Alexander.Deucher@amd.com> wrote:
>>>>> Fixed with this patch:
>>>>>
>>>>> https://lists.freedesktop.org/archives/amd-gfx/2018-January/018472.html
>>>>>
>>>>>
>>>>> Alex
>>> <>
>>>>> __________________
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                                                                     ` <c9d12d8c-a186-4041-ea41-1ccabe41eb29-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-02-02 18:46                                                                       ` Luís Mendes
       [not found]                                                                         ` <CAEzXK1rWeUYoSvd-T5JRSZnZSMcSCdpHDy6Eh5wkH=Qr=Gz2Bw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Luís Mendes @ 2018-02-02 18:46 UTC (permalink / raw)
  To: Christian König
  Cc: Alex Deucher, Deucher, Alexander, Michel Dänzer,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Zhou, David(ChunMing)

[-- Attachment #1: Type: text/plain, Size: 3995 bytes --]

Hi Christian, Alexander,

I have enabled kmemleak, but it didn't detect anything unusual. In
fact, this time I didn't get any allocation failure at all, and I
don't know why, but the GPU still hung after around 4h 6m of uptime
with Xorg.
The log is attached. I will try again to see whether the allocation
failure reappears, or whether the kmemleak scans have made it less
apparent.
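
As a rough cross-check of the uptime figure, the first line of the
attached log can be parsed directly. This is only a sketch: the helper
name parse_timeout and the regular expression are assumptions written
for this one message format, not part of any amdgpu tooling, and it
assumes the bracketed kernel timestamp counts seconds since boot.

```python
import re

# Sample line copied from the log attached to this message.
LINE = ("Feb  2 16:29:29 localhost kernel: [14801.740467] "
        "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
        "last signaled seq=831006, last emitted seq=831008")

# Hypothetical pattern for the job-timeout message shown above.
PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"\[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* "
    r"ring (?P<ring>\S+) timeout, "
    r"last signaled seq=(?P<signaled>\d+), last emitted seq=(?P<emitted>\d+)"
)


def parse_timeout(line):
    """Return ring name, uptime as (hours, minutes) and pending fences."""
    m = PATTERN.search(line)
    if m is None:
        return None
    seconds = int(float(m.group("ts")))
    hours, rem = divmod(seconds, 3600)
    return {
        "ring": m.group("ring"),
        "uptime": (hours, rem // 60),
        # Fences emitted to the ring but never signaled back.
        "pending": int(m.group("emitted")) - int(m.group("signaled")),
    }


print(parse_timeout(LINE))
```

The difference between the emitted and signaled sequence numbers
shows how many submissions were still outstanding when the timeout
fired.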

The kernel stack trace is similar to the GPU hangs I was getting on
earlier kernel versions when watching videos with Kodi or Firefox.
Back then, however, if I left Xorg idle it would stay up and
available for more than a day without hanging.
This stack trace also looks quite similar to what Daniel Andersson
reported in "[BUG] Intermittent hang/deadlock when opening browser tab
with Vega gpu". It looks like another manifestation of the same bug on
a different architecture.
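
To make the comparison between traces less of an eyeball exercise,
the call chains can be extracted and compared mechanically. The sketch
below is an assumption-laden illustration: call_chain and
common_prefix are hypothetical helpers, the regular expression only
targets the ARM "(callee) from (caller+off/len)" backtrace format seen
in this thread, and the sample lines are abbreviated from the page-flip
and amdgpu_cs traces posted here.

```python
import re

# Matches one frame line of the ARM backtrace format used in these logs.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] \((?P<callee>\w+)(?: \[\w+\])?\) from "
    r"\[<[0-9a-f]+>\] \((?P<caller>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[\w+\])?\)"
)


def call_chain(log_lines):
    """Return function names, innermost first, skipping register dumps."""
    chain = []
    for line in log_lines:
        m = FRAME.search(line)
        if m is None:
            continue
        if not chain:
            chain.append(m.group("callee"))
        chain.append(m.group("caller"))
    return chain


def common_prefix(a, b):
    """Return the leading frames that two call chains share."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


FLIP_TRACE = [
    "[<80b548f0>] (__schedule) from [<80b54ff4>] (schedule+0x44/0xa4)",
    " r10:60070013 r9:be74a000 r8:be74bc8c r7:00000000",
    "[<80b54fb0>] (schedule) from [<80b58c18>] (schedule_timeout+0x1e0/0x2e8)",
    "[<80b58a38>] (schedule_timeout) from [<8065c140>] (dma_fence_default_wait+0x218/0x2b0)",
    "[<8065bf28>] (dma_fence_default_wait) from [<8065b8b8>] (dma_fence_wait_timeout+0x48/0x15c)",
    "[<8065b870>] (dma_fence_wait_timeout) from [<8065d7f0>] (reservation_object_wait_timeout_rcu+0x2f4/0x3a8)",
    "[<8065d4fc>] (reservation_object_wait_timeout_rcu) from [<7f29b968>] (amdgpu_dm_do_flip+0xe0/0x364 [amdgpu])",
]

CS_TRACE = [
    "[<80b571c8>] (__schedule) from [<80b578cc>] (schedule+0x44/0xa4)",
    "[<80b57888>] (schedule) from [<80b5b4f0>] (schedule_timeout+0x1e0/0x2e8)",
    "[<80b5b310>] (schedule_timeout) from [<8065df3c>] (dma_fence_default_wait+0x218/0x2b0)",
    "[<8065dd24>] (dma_fence_default_wait) from [<8065d6b4>] (dma_fence_wait_timeout+0x48/0x15c)",
    "[<8065d66c>] (dma_fence_wait_timeout) from [<7f1b5bc8>] (amdgpu_ctx_wait_prev_fence+0x48/0x80 [amdgpu])",
]

shared = common_prefix(call_chain(FLIP_TRACE), call_chain(CS_TRACE))
print(" <- ".join(shared))
```

Both tasks end up blocked in the same DMA fence wait path and differ
only in which amdgpu entry point led there.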

Regards,
Luís

On Fri, Feb 2, 2018 at 7:48 AM, Christian König
<ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Luis,
>
> please enable kmemleak in your build and watch out for any suspicious
> messages in the system log.
>
> Regards,
> Christian.
>
>
> Am 02.02.2018 um 00:03 schrieb Luís Mendes:
>>
>> Hi Alexander,
>>
>> I didn't notice improvements on this issue with that particular patch
>> applied. It still ends up failing to allocate kernel memory after a
>> few hours of uptime with Xorg.
>>
>> I will try to upgrade to mesa 18.0.0-rc3 and to amd-staging-drm-next
>> head, to see if the issue still occurs with those versions.
>>
>> If you have additional suggestions I'll be happy to try them.
>>
>> Regards,
>> Luís Mendes
>>
>> On Thu, Feb 1, 2018 at 2:30 AM, Alex Deucher <alexdeucher-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> wrote:
>>>
>>> On Wed, Jan 31, 2018 at 6:57 PM, Luís Mendes <luis.p.mendes@gmail.com>
>>> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I am getting a new issue with amdgpu with RX460, that is, now I can
>>>> play any videos with Kodi or play web videos with firefox and run
>>>> OpenGL applications without running into any issues, however after
>>>> some uptime with XOrg even when almost inactive I get a kmalloc
>>>> allocation failure, normally followed by a GPU hang a while after the
>>>> allocation failure.
>>>> I had a terminal window under Ubuntu Mate 17.10 and I was compiling
>>>> code when I got the kernel messages that can be found in attachment.
>>>>
>>>> I am using the kernel as identified on my previous email, which can be
>>>> found below.
>>>
>>> does this patch help?
>>> https://patchwork.freedesktop.org/patch/198258/
>>>
>>> Alex
>>>
>>>> Regards,
>>>> Luís Mendes
>>>>
>>>> On Wed, Jan 31, 2018 at 12:47 PM, Luís Mendes <luis.p.mendes@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi Alexander,
>>>>>
>>>>> I've cherry picked the patch you pointed out into kernel from
>>>>> amd-drm-next-4.17-wip at commit
>>>>> 9ab2894122275a6d636bb2654a157e88a0f7b9e2 ( drm/amdgpu: set
>>>>> DRIVER_ATOMIC flag early) and tested it on ARMv7l and the problem has
>>>>> gone indeed.
>>>>>
>>>>>
>>>>> Working great on ARMv7l with AMD RX460.
>>>>>
>>>>> Thanks,
>>>>> Luís Mendes
>>>>>
>>>>>
>>>>> On Tue, Jan 30, 2018 at 6:44 PM, Deucher, Alexander
>>>>> <Alexander.Deucher-5C7GfCeVMHo@public.gmane.org> wrote:
>>>>>>
>>>>>> Fixed with this patch:
>>>>>>
>>>>>>
>>>>>> https://lists.freedesktop.org/archives/amd-gfx/2018-January/018472.html
>>>>>>
>>>>>>
>>>>>> Alex
>>>>
>>>> <>
>>>>>>
>>>>>> __________________
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>

[-- Attachment #2: AMDGPU_new_hang3.txt --]
[-- Type: text/plain, Size: 7808 bytes --]

Feb  2 16:29:29 localhost kernel: [14801.740467] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=831006, last emitted seq=831008
Feb  2 16:29:29 localhost kernel: [14801.751557] [drm] IP block:gmc_v8_0 is hung!
Feb  2 16:29:29 localhost kernel: [14801.751563] [drm] IP block:gfx_v8_0 is hung!
Feb  2 16:29:29 localhost kernel: [14801.751611] [drm] GPU recovery disabled.
Feb  2 16:44:53 localhost kernel: [15725.856181] INFO: task amdgpu_cs:0:3803 blocked for more than 120 seconds.
Feb  2 16:44:53 localhost kernel: [15725.863085]       Not tainted 4.15.0-rc8-next2g-g9ab2894-dirty #3
Feb  2 16:44:53 localhost kernel: [15725.869213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 16:44:53 localhost kernel: [15725.877078] amdgpu_cs:0     D    0  3803   3091 0x00000000
Feb  2 16:44:53 localhost kernel: [15725.877084] Backtrace: 
Feb  2 16:44:53 localhost kernel: [15725.877096] [<80b571c8>] (__schedule) from [<80b578cc>] (schedule+0x44/0xa4)
Feb  2 16:44:53 localhost kernel: [15725.877102]  r10:600f0013 r9:b45b6000 r8:b45b7bd4 r7:00000000 r6:7fffffff r5:81004c48
Feb  2 16:44:53 localhost kernel: [15725.877104]  r4:ffffe000
Feb  2 16:44:53 localhost kernel: [15725.877110] [<80b57888>] (schedule) from [<80b5b4f0>] (schedule_timeout+0x1e0/0x2e8)
Feb  2 16:44:53 localhost kernel: [15725.877112]  r5:81004c48 r4:7fffffff
Feb  2 16:44:53 localhost kernel: [15725.877121] [<80b5b310>] (schedule_timeout) from [<8065df3c>] (dma_fence_default_wait+0x218/0x2b0)
Feb  2 16:44:53 localhost kernel: [15725.877125]  r10:600f0013 r9:b45b6000 r8:b45b7bd4 r7:00000000 r6:7fffffff r5:81004c48
Feb  2 16:44:53 localhost kernel: [15725.877127]  r4:9de12280
Feb  2 16:44:53 localhost kernel: [15725.877132] [<8065dd24>] (dma_fence_default_wait) from [<8065d6b4>] (dma_fence_wait_timeout+0x48/0x15c)
Feb  2 16:44:53 localhost kernel: [15725.877137]  r10:b4593800 r9:bbfd8000 r8:00000001 r7:a17b3768 r6:00000000 r5:9de12280
Feb  2 16:44:53 localhost kernel: [15725.877138]  r4:81096c18
Feb  2 16:44:53 localhost kernel: [15725.877342] [<8065d66c>] (dma_fence_wait_timeout) from [<7f1b5bc8>] (amdgpu_ctx_wait_prev_fence+0x48/0x80 [amdgpu])
Feb  2 16:44:53 localhost kernel: [15725.877346]  r7:a17b3768 r6:00000001 r5:b341e6c0 r4:00000001
Feb  2 16:44:53 localhost kernel: [15725.877606] [<7f1b5b80>] (amdgpu_ctx_wait_prev_fence [amdgpu]) from [<7f19e780>] (amdgpu_cs_ioctl+0x428/0x1edc [amdgpu])
Feb  2 16:44:53 localhost kernel: [15725.877609]  r5:b341e6c0 r4:00000001
Feb  2 16:44:53 localhost kernel: [15725.877773] [<7f19e358>] (amdgpu_cs_ioctl [amdgpu]) from [<7f08b920>] (drm_ioctl_kernel+0x68/0xb4 [drm])
Feb  2 16:44:53 localhost kernel: [15725.877778]  r10:00000018 r9:b45b7e2c r8:7f19e358 r7:00000021 r6:00000000 r5:bbff0000
Feb  2 16:44:53 localhost kernel: [15725.877779]  r4:be9d30c0
Feb  2 16:44:53 localhost kernel: [15725.877821] [<7f08b8b8>] (drm_ioctl_kernel [drm]) from [<7f08bdec>] (drm_ioctl+0x2cc/0x3b0 [drm])
Feb  2 16:44:53 localhost kernel: [15725.877825]  r9:00000044 r8:c0186444 r7:be9d30c0 r6:7f19e358 r5:7f2fcba4 r4:81004c48
Feb  2 16:44:53 localhost kernel: [15725.877971] [<7f08bb20>] (drm_ioctl [drm]) from [<7f180010>] (amdgpu_drm_ioctl+0x10/0x14 [amdgpu])
Feb  2 16:44:53 localhost kernel: [15725.877976]  r10:bb5bf210 r9:b45b6000 r8:7322aaa0 r7:0000000c r6:b0694d80 r5:7322aaa0
Feb  2 16:44:53 localhost kernel: [15725.877977]  r4:81004c48
Feb  2 16:44:53 localhost kernel: [15725.878103] [<7f180000>] (amdgpu_drm_ioctl [amdgpu]) from [<8028e4b4>] (do_vfs_ioctl+0xb8/0x8cc)
Feb  2 16:44:53 localhost kernel: [15725.878108] [<8028e3fc>] (do_vfs_ioctl) from [<8028ed04>] (SyS_ioctl+0x3c/0x60)
Feb  2 16:44:53 localhost kernel: [15725.878112]  r10:00000000 r9:b45b6000 r8:7322aaa0 r7:c0186444 r6:0000000c r5:b0694d80
Feb  2 16:44:53 localhost kernel: [15725.878114]  r4:b0694d81
Feb  2 16:44:53 localhost kernel: [15725.878121] [<8028ecc8>] (SyS_ioctl) from [<80108f00>] (ret_fast_syscall+0x0/0x54)
Feb  2 16:44:53 localhost kernel: [15725.878125]  r9:b45b6000 r8:801090e4 r7:00000036 r6:c0186444 r5:7322aaa0 r4:c0006400
Feb  2 16:46:56 localhost kernel: [15848.730505] INFO: task amdgpu_cs:0:3803 blocked for more than 120 seconds.
Feb  2 16:46:56 localhost kernel: [15848.737413]       Not tainted 4.15.0-rc8-next2g-g9ab2894-dirty #3
Feb  2 16:46:56 localhost kernel: [15848.743541] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 16:46:56 localhost kernel: [15848.751404] amdgpu_cs:0     D    0  3803   3091 0x00000000
Feb  2 16:46:56 localhost kernel: [15848.751410] Backtrace: 
Feb  2 16:46:56 localhost kernel: [15848.751421] [<80b571c8>] (__schedule) from [<80b578cc>] (schedule+0x44/0xa4)
Feb  2 16:46:56 localhost kernel: [15848.751426]  r10:600f0013 r9:b45b6000 r8:b45b7bd4 r7:00000000 r6:7fffffff r5:81004c48
Feb  2 16:46:56 localhost kernel: [15848.751428]  r4:ffffe000
Feb  2 16:46:56 localhost kernel: [15848.751434] [<80b57888>] (schedule) from [<80b5b4f0>] (schedule_timeout+0x1e0/0x2e8)
Feb  2 16:46:56 localhost kernel: [15848.751436]  r5:81004c48 r4:7fffffff
Feb  2 16:46:56 localhost kernel: [15848.751444] [<80b5b310>] (schedule_timeout) from [<8065df3c>] (dma_fence_default_wait+0x218/0x2b0)
Feb  2 16:46:56 localhost kernel: [15848.751449]  r10:600f0013 r9:b45b6000 r8:b45b7bd4 r7:00000000 r6:7fffffff r5:81004c48
Feb  2 16:46:56 localhost kernel: [15848.751451]  r4:9de12280
Feb  2 16:46:56 localhost kernel: [15848.751456] [<8065dd24>] (dma_fence_default_wait) from [<8065d6b4>] (dma_fence_wait_timeout+0x48/0x15c)
Feb  2 16:46:56 localhost kernel: [15848.751460]  r10:b4593800 r9:bbfd8000 r8:00000001 r7:a17b3768 r6:00000000 r5:9de12280
Feb  2 16:46:56 localhost kernel: [15848.751462]  r4:81096c18
Feb  2 16:46:56 localhost kernel: [15848.751667] [<8065d66c>] (dma_fence_wait_timeout) from [<7f1b5bc8>] (amdgpu_ctx_wait_prev_fence+0x48/0x80 [amdgpu])
Feb  2 16:46:56 localhost kernel: [15848.751671]  r7:a17b3768 r6:00000001 r5:b341e6c0 r4:00000001
Feb  2 16:46:56 localhost kernel: [15848.751930] [<7f1b5b80>] (amdgpu_ctx_wait_prev_fence [amdgpu]) from [<7f19e780>] (amdgpu_cs_ioctl+0x428/0x1edc [amdgpu])
Feb  2 16:46:56 localhost kernel: [15848.751933]  r5:b341e6c0 r4:00000001
Feb  2 16:46:56 localhost kernel: [15848.752098] [<7f19e358>] (amdgpu_cs_ioctl [amdgpu]) from [<7f08b920>] (drm_ioctl_kernel+0x68/0xb4 [drm])
Feb  2 16:46:56 localhost kernel: [15848.752103]  r10:00000018 r9:b45b7e2c r8:7f19e358 r7:00000021 r6:00000000 r5:bbff0000
Feb  2 16:46:56 localhost kernel: [15848.752105]  r4:be9d30c0
Feb  2 16:46:56 localhost kernel: [15848.752144] [<7f08b8b8>] (drm_ioctl_kernel [drm]) from [<7f08bdec>] (drm_ioctl+0x2cc/0x3b0 [drm])
Feb  2 16:46:56 localhost kernel: [15848.752148]  r9:00000044 r8:c0186444 r7:be9d30c0 r6:7f19e358 r5:7f2fcba4 r4:81004c48
Feb  2 16:46:56 localhost kernel: [15848.752295] [<7f08bb20>] (drm_ioctl [drm]) from [<7f180010>] (amdgpu_drm_ioctl+0x10/0x14 [amdgpu])
Feb  2 16:46:56 localhost kernel: [15848.752299]  r10:bb5bf210 r9:b45b6000 r8:7322aaa0 r7:0000000c r6:b0694d80 r5:7322aaa0
Feb  2 16:46:56 localhost kernel: [15848.752301]  r4:81004c48
Feb  2 16:46:56 localhost kernel: [15848.752426] [<7f180000>] (amdgpu_drm_ioctl [amdgpu]) from [<8028e4b4>] (do_vfs_ioctl+0xb8/0x8cc)
Feb  2 16:46:56 localhost kernel: [15848.752432] [<8028e3fc>] (do_vfs_ioctl) from [<8028ed04>] (SyS_ioctl+0x3c/0x60)
Feb  2 16:46:56 localhost kernel: [15848.752436]  r10:00000000 r9:b45b6000 r8:7322aaa0 r7:c0186444 r6:0000000c r5:b0694d80
Feb  2 16:46:56 localhost kernel: [15848.752438]  r4:b0694d81
Feb  2 16:46:56 localhost kernel: [15848.752445] [<8028ecc8>] (SyS_ioctl) from [<80108f00>] (ret_fast_syscall+0x0/0x54)
Feb  2 16:46:56 localhost kernel: [15848.752449]  r9:b45b6000 r8:801090e4 r7:00000036 r6:c0186444 r5:7322aaa0 r4:c0006400


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                                                                         ` <CAEzXK1rWeUYoSvd-T5JRSZnZSMcSCdpHDy6Eh5wkH=Qr=Gz2Bw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-02-05 12:40                                                                           ` Luís Mendes
       [not found]                                                                             ` <CAEzXK1p3feNO1O-Kf=TCqU8sq1Xz=z3Sav1kfDB6e-1hxe+YMQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Luís Mendes @ 2018-02-05 12:40 UTC (permalink / raw)
  To: Christian König
  Cc: Alex Deucher, Deucher, Alexander, Michel Dänzer,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Zhou, David(ChunMing)

[-- Attachment #1: Type: text/plain, Size: 5087 bytes --]

Hi everyone,

I have some updates. I left the system idle most of the time during
the weekend, and from time to time I played a video on YouTube and
turned off the screen. Last night I did the same, and this morning I
found that the system had hung during the night. This time it took a
lot longer to hang, but I think the hang was related to a Flash
animation ad that was present on the YouTube page only the last time
I switched off the screen. In all the crash attempts I have made,
amdgpu always seems to hang when that Flash animation is present.
kmemleak reports a memory leak; I attach its output along with the
crash dmesg log.
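
For a report of this size it can help to total the leaked bytes per
allocating function. The following is a minimal sketch, not an
existing tool: leaked_bytes_by_caller is a hypothetical helper, and it
assumes that the first backtrace frame of each object is the generic
allocator, so the second frame is taken as the interesting caller. The
sample text is abbreviated from the attached kmemleak output.

```python
import re
from collections import Counter

OBJECT = re.compile(r"^unreferenced object 0x[0-9a-f]+ \(size (\d+)\):")
FRAME = re.compile(r"^\s+\[<[0-9a-f]+>\] (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def leaked_bytes_by_caller(report):
    """Sum object sizes, keyed by the second backtrace frame."""
    totals = Counter()
    size = None
    frames_seen = 0
    for line in report.splitlines():
        obj = OBJECT.match(line)
        if obj:
            size = int(obj.group(1))
            frames_seen = 0
            continue
        frame = FRAME.match(line)
        if frame and size is not None:
            frames_seen += 1
            if frames_seen == 2:  # assumed: frame 1 is the allocator itself
                totals[frame.group(1)] += size
    return totals


REPORT = """\
unreferenced object 0xb0fac380 (size 128):
  comm "Xorg", pid 3750, jiffies 5608934 (age 178088.970s)
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<85099e84>] dm_drm_plane_duplicate_state+0x30/0x60 [amdgpu]
    [<5c55b4e7>] drm_atomic_get_plane_state+0x74/0x118 [drm]
unreferenced object 0xa44c5800 (size 1024):
  comm "Xorg", pid 3750, jiffies 5608934 (age 178088.970s)
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<a591e85d>] dc_create_stream_for_sink+0x30/0x15c [amdgpu]
    [<87d2cda1>] create_stream_for_sink+0x50/0x4b8 [amdgpu]
"""

for caller, total in leaked_bytes_by_caller(REPORT).most_common():
    print(caller, total)
```

Sorting by total bytes puts the largest leak sources first, which is
usually where to start reading the backtraces.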

The kernel and patches are the same as in my previous email. In the
end I changed neither the Mesa version nor the kernel version and
patches.

Regards,
Luís


On Fri, Feb 2, 2018 at 6:46 PM, Luís Mendes <luis.p.mendes-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Christian, Alexander,
>
> I have enabled kmemleak, but memleak didn't detect anything special,
> in fact this time, I don't know why, I didn't get any allocation
> failure at all, but the GPU did hang after around 4h 6m of uptime with
> Xorg.
> The log can be found in attachment. I will try again to see if the
> allocation failure reappears, or if it has become less apparent due to
> kmemleak scans.
>
> The kernel stack trace is similar to the GPU hangs I was getting on
> earlier kernel versions with Kodi, or Firefox when watching videos
> with either one, but if I left Xorg idle, it would remain up and
> available without hanging for more than one day.
> This stack trace also looks quite similar to what Daniel Andersson
> reported in "[BUG] Intermittent hang/deadlock when opening browser tab
> with Vega gpu", looks like another demonstration of the same bug on
> different architectures.
>
> Regards,
> Luís
>
> On Fri, Feb 2, 2018 at 7:48 AM, Christian König
> <ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hi Luis,
>>
>> please enable kmemleak in your build and watch out for any suspicious
>> messages in the system log.
>>
>> Regards,
>> Christian.
>>
>>
>> Am 02.02.2018 um 00:03 schrieb Luís Mendes:
>>>
>>> Hi Alexander,
>>>
>>> I didn't notice improvements on this issue with that particular patch
>>> applied. It still ends up failing to allocate kernel memory after a
>>> few hours of uptime with Xorg.
>>>
>>> I will try to upgrade to mesa 18.0.0-rc3 and to amd-staging-drm-next
>>> head, to see if the issue still occurs with those versions.
>>>
>>> If you have additional suggestions I'll be happy to try them.
>>>
>>> Regards,
>>> Luís Mendes
>>>
>>> On Thu, Feb 1, 2018 at 2:30 AM, Alex Deucher <alexdeucher-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>> wrote:
>>>>
>>>> On Wed, Jan 31, 2018 at 6:57 PM, Luís Mendes <luis.p.mendes@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I am getting a new issue with amdgpu with RX460, that is, now I can
>>>>> play any videos with Kodi or play web videos with firefox and run
>>>>> OpenGL applications without running into any issues, however after
>>>>> some uptime with XOrg even when almost inactive I get a kmalloc
>>>>> allocation failure, normally followed by a GPU hang a while after the
>>>>> allocation failure.
>>>>> I had a terminal window under Ubuntu Mate 17.10 and I was compiling
>>>>> code when I got the kernel messages that can be found in attachment.
>>>>>
>>>>> I am using the kernel as identified on my previous email, which can be
>>>>> found below.
>>>>
>>>> does this patch help?
>>>> https://patchwork.freedesktop.org/patch/198258/
>>>>
>>>> Alex
>>>>
>>>>> Regards,
>>>>> Luís Mendes
>>>>>
>>>>> On Wed, Jan 31, 2018 at 12:47 PM, Luís Mendes <luis.p.mendes@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi Alexander,
>>>>>>
>>>>>> I've cherry picked the patch you pointed out into kernel from
>>>>>> amd-drm-next-4.17-wip at commit
>>>>>> 9ab2894122275a6d636bb2654a157e88a0f7b9e2 ( drm/amdgpu: set
>>>>>> DRIVER_ATOMIC flag early) and tested it on ARMv7l and the problem has
>>>>>> gone indeed.
>>>>>>
>>>>>>
>>>>>> Working great on ARMv7l with AMD RX460.
>>>>>>
>>>>>> Thanks,
>>>>>> Luís Mendes
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 30, 2018 at 6:44 PM, Deucher, Alexander
>>>>>> <Alexander.Deucher-5C7GfCeVMHo@public.gmane.org> wrote:
>>>>>>>
>>>>>>> Fixed with this patch:
>>>>>>>
>>>>>>>
>>>>>>> https://lists.freedesktop.org/archives/amd-gfx/2018-January/018472.html
>>>>>>>
>>>>>>>
>>>>>>> Alex
>>>>>
>>>>> <>
>>>>>>>
>>>>>>> __________________
>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>>

[-- Attachment #2: AMDGPU_kmemleak.txt --]
[-- Type: text/plain, Size: 15685 bytes --]

ubuntu@linux:~$ sudo cat /sys/kernel/debug/kmemleak
[sudo] password for ubuntu:
unreferenced object 0xb0fac380 (size 128):
  comm "Xorg", pid 3750, jiffies 5608934 (age 178088.970s)
  hex dump (first 32 bytes):
    00 4e 9f b9 00 f0 33 bb 80 1a 15 97 00 00 00 00  .N....3.........
    fa 00 00 00 82 01 00 00 80 00 00 00 80 00 00 00  ................
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<85099e84>] dm_drm_plane_duplicate_state+0x30/0x60 [amdgpu]
    [<5c55b4e7>] drm_atomic_get_plane_state+0x74/0x118 [drm]
    [<03e85711>] drm_atomic_add_affected_planes+0x84/0xb0 [drm]
    [<d9340120>] drm_atomic_helper_check_modeset+0x4d4/0xb04 [drm_kms_helper]
    [<3c94e005>] amdgpu_dm_atomic_check+0x44/0xc6c [amdgpu]
    [<4699f226>] drm_atomic_check_only+0x3bc/0x5c4 [drm]
    [<36cb27b1>] drm_atomic_commit+0x18/0x60 [drm]
    [<8fac31c8>] drm_atomic_helper_set_config+0x9c/0xac [drm_kms_helper]
    [<a9956a39>] __drm_mode_set_config_internal+0x60/0xe4 [drm]
    [<617b0b52>] drm_mode_setcrtc+0x3f4/0x598 [drm]
    [<31f247ae>] drm_ioctl_kernel+0x68/0xb4 [drm]
    [<4d074688>] drm_ioctl+0x2cc/0x3b0 [drm]
    [<83459b01>] amdgpu_drm_ioctl+0x10/0x14 [amdgpu]
    [<99bb30d0>] do_vfs_ioctl+0xb8/0x8cc
    [<95adff3a>] SyS_ioctl+0x3c/0x60
unreferenced object 0xa44c5800 (size 1024):
  comm "Xorg", pid 3750, jiffies 5608934 (age 178088.970s)
  hex dump (first 32 bytes):
    00 70 4c a4 40 05 00 00 00 00 00 00 00 04 00 00  .pL.@...........
    00 00 00 00 18 00 00 00 88 00 00 00 26 03 00 00  ............&...
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<a591e85d>] dc_create_stream_for_sink+0x30/0x15c [amdgpu]
    [<87d2cda1>] create_stream_for_sink+0x50/0x4b8 [amdgpu]
    [<4e0dde84>] dm_update_crtcs_state+0x120/0x36c [amdgpu]
    [<0fc8d7c2>] amdgpu_dm_atomic_check+0x290/0xc6c [amdgpu]
    [<4699f226>] drm_atomic_check_only+0x3bc/0x5c4 [drm]
    [<36cb27b1>] drm_atomic_commit+0x18/0x60 [drm]
    [<8fac31c8>] drm_atomic_helper_set_config+0x9c/0xac [drm_kms_helper]
    [<a9956a39>] __drm_mode_set_config_internal+0x60/0xe4 [drm]
    [<617b0b52>] drm_mode_setcrtc+0x3f4/0x598 [drm]
    [<31f247ae>] drm_ioctl_kernel+0x68/0xb4 [drm]
    [<4d074688>] drm_ioctl+0x2cc/0x3b0 [drm]
    [<83459b01>] amdgpu_drm_ioctl+0x10/0x14 [amdgpu]
    [<99bb30d0>] do_vfs_ioctl+0xb8/0x8cc
    [<95adff3a>] SyS_ioctl+0x3c/0x60
    [<4cde0ae2>] ret_fast_syscall+0x0/0x54
unreferenced object 0xb0fac080 (size 128):
  comm "Xorg", pid 3750, jiffies 5608935 (age 178088.960s)
  hex dump (first 32 bytes):
    00 f0 33 bb 01 00 00 00 ff ff ff ff 08 00 08 00  ..3.............
    90 c0 fa b0 90 c0 fa b0 ff ff ff ff 0a 00 0a 00  ................
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<024c2a79>] drm_atomic_helper_setup_commit+0x1d0/0x4d4 [drm_kms_helper]
    [<b2a1265a>] drm_atomic_helper_commit+0x44/0x12c [drm_kms_helper]
    [<a8a36ef6>] amdgpu_dm_atomic_commit+0xc0/0xc8 [amdgpu]
    [<d422bfea>] drm_atomic_commit+0x54/0x60 [drm]
    [<8fac31c8>] drm_atomic_helper_set_config+0x9c/0xac [drm_kms_helper]
    [<a9956a39>] __drm_mode_set_config_internal+0x60/0xe4 [drm]
    [<617b0b52>] drm_mode_setcrtc+0x3f4/0x598 [drm]
    [<31f247ae>] drm_ioctl_kernel+0x68/0xb4 [drm]
    [<4d074688>] drm_ioctl+0x2cc/0x3b0 [drm]
    [<83459b01>] amdgpu_drm_ioctl+0x10/0x14 [amdgpu]
    [<99bb30d0>] do_vfs_ioctl+0xb8/0x8cc
    [<95adff3a>] SyS_ioctl+0x3c/0x60
    [<4cde0ae2>] ret_fast_syscall+0x0/0x54
    [<1738b98c>] 0xffffffff
unreferenced object 0xba604680 (size 128):
  comm "Xorg", pid 3750, jiffies 5608936 (age 178088.960s)
  hex dump (first 32 bytes):
    00 f0 33 bb 01 00 00 00 ff ff ff ff 04 00 04 00  ..3.............
    90 46 60 ba 90 46 60 ba ff ff ff ff 06 00 06 00  .F`..F`.........
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<024c2a79>] drm_atomic_helper_setup_commit+0x1d0/0x4d4 [drm_kms_helper]
    [<b2a1265a>] drm_atomic_helper_commit+0x44/0x12c [drm_kms_helper]
    [<a8a36ef6>] amdgpu_dm_atomic_commit+0xc0/0xc8 [amdgpu]
    [<d422bfea>] drm_atomic_commit+0x54/0x60 [drm]
    [<2b9bc4a3>] drm_atomic_connector_commit_dpms+0xec/0xfc [drm]
    [<9b7fdbd7>] drm_mode_obj_set_property_ioctl+0x1a8/0x318 [drm]
    [<be4c5272>] drm_mode_connector_property_set_ioctl+0x4c/0x68 [drm]
    [<31f247ae>] drm_ioctl_kernel+0x68/0xb4 [drm]
    [<4d074688>] drm_ioctl+0x2cc/0x3b0 [drm]
    [<83459b01>] amdgpu_drm_ioctl+0x10/0x14 [amdgpu]
    [<99bb30d0>] do_vfs_ioctl+0xb8/0x8cc
    [<95adff3a>] SyS_ioctl+0x3c/0x60
    [<4cde0ae2>] ret_fast_syscall+0x0/0x54
    [<1738b98c>] 0xffffffff
unreferenced object 0xb948fe00 (size 512):
  comm "Xorg", pid 3750, jiffies 5608937 (age 178088.950s)
  hex dump (first 32 bytes):
    00 f0 33 bb 01 01 01 00 21 00 00 00 02 00 00 00  ..3.....!.......
    02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<70e004db>] dm_crtc_duplicate_state+0x38/0x84 [amdgpu]
    [<b13e4c2f>] drm_atomic_get_crtc_state+0x78/0x10c [drm]
    [<1c6f2b44>] page_flip_common+0x28/0xcc [drm_kms_helper]
    [<cc45921a>] drm_atomic_helper_page_flip+0x50/0xac [drm_kms_helper]
    [<c18a310c>] drm_mode_page_flip_ioctl+0x490/0x4f4 [drm]
    [<31f247ae>] drm_ioctl_kernel+0x68/0xb4 [drm]
    [<4d074688>] drm_ioctl+0x2cc/0x3b0 [drm]
    [<83459b01>] amdgpu_drm_ioctl+0x10/0x14 [amdgpu]
    [<99bb30d0>] do_vfs_ioctl+0xb8/0x8cc
    [<95adff3a>] SyS_ioctl+0x3c/0x60
    [<4cde0ae2>] ret_fast_syscall+0x0/0x54
    [<1738b98c>] 0xffffffff
unreferenced object 0xb0face00 (size 128):
  comm "Xorg", pid 3750, jiffies 5608937 (age 178088.950s)
  hex dump (first 32 bytes):
    00 7a 81 b9 00 f0 33 bb 40 28 84 be 00 00 00 00  .z....3.@(......
    00 00 00 00 00 00 00 00 00 04 00 00 00 03 00 00  ................
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<85099e84>] dm_drm_plane_duplicate_state+0x30/0x60 [amdgpu]
    [<5c55b4e7>] drm_atomic_get_plane_state+0x74/0x118 [drm]
    [<2802153b>] page_flip_common+0x50/0xcc [drm_kms_helper]
    [<cc45921a>] drm_atomic_helper_page_flip+0x50/0xac [drm_kms_helper]
    [<c18a310c>] drm_mode_page_flip_ioctl+0x490/0x4f4 [drm]
    [<31f247ae>] drm_ioctl_kernel+0x68/0xb4 [drm]
    [<4d074688>] drm_ioctl+0x2cc/0x3b0 [drm]
    [<83459b01>] amdgpu_drm_ioctl+0x10/0x14 [amdgpu]
    [<99bb30d0>] do_vfs_ioctl+0xb8/0x8cc
    [<95adff3a>] SyS_ioctl+0x3c/0x60
    [<4cde0ae2>] ret_fast_syscall+0x0/0x54
    [<1738b98c>] 0xffffffff
unreferenced object 0xb0facf80 (size 128):
  comm "Xorg", pid 3750, jiffies 5608937 (age 178088.950s)
  hex dump (first 32 bytes):
    00 f0 33 bb 01 00 00 00 ff ff ff ff 05 00 05 00  ..3.............
    90 cf fa b0 90 cf fa b0 ff ff ff ff 06 00 06 00  ................
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<024c2a79>] drm_atomic_helper_setup_commit+0x1d0/0x4d4 [drm_kms_helper]
    [<b2a1265a>] drm_atomic_helper_commit+0x44/0x12c [drm_kms_helper]
    [<a8a36ef6>] amdgpu_dm_atomic_commit+0xc0/0xc8 [amdgpu]
    [<270dc6b5>] drm_atomic_nonblocking_commit+0x54/0x58 [drm]
    [<2e425aac>] drm_atomic_helper_page_flip+0x9c/0xac [drm_kms_helper]
    [<c18a310c>] drm_mode_page_flip_ioctl+0x490/0x4f4 [drm]
    [<31f247ae>] drm_ioctl_kernel+0x68/0xb4 [drm]
    [<4d074688>] drm_ioctl+0x2cc/0x3b0 [drm]
    [<83459b01>] amdgpu_drm_ioctl+0x10/0x14 [amdgpu]
    [<99bb30d0>] do_vfs_ioctl+0xb8/0x8cc
    [<95adff3a>] SyS_ioctl+0x3c/0x60
    [<4cde0ae2>] ret_fast_syscall+0x0/0x54
    [<1738b98c>] 0xffffffff
unreferenced object 0x9bdcd800 (size 1024):
  comm "kworker/0:0", pid 5643, jiffies 5609303 (age 178085.640s)
  hex dump (first 32 bytes):
    04 00 00 00 00 01 00 00 00 ff ff ff ff ff ff 00  ................
    2e 83 54 21 34 00 00 00 29 15 01 03 80 30 1b 78  ..T!4...)....0.x
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<f1541e77>] dc_sink_create+0x2c/0x88 [amdgpu]
    [<b58321e8>] dc_link_detect+0x1f0/0x994 [amdgpu]
    [<1e160950>] handle_hpd_irq+0x40/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0x933a6480 (size 64):
  comm "kworker/0:0", pid 5643, jiffies 5609323 (age 178085.440s)
  hex dump (first 32 bytes):
    01 00 00 00 00 a4 81 b9 01 00 00 00 00 c6 56 ba  ..............V.
    80 27 f4 9b 03 00 00 00 80 62 3a 93 00 00 00 00  .'.......b:.....
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<1a912412>] dm_atomic_state_alloc+0x2c/0x60 [amdgpu]
    [<1305fb86>] drm_atomic_state_alloc+0x24/0x78 [drm]
    [<b39c3463>] dm_restore_drm_connector_state+0x6c/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0x9bf42780 (size 128):
  comm "kworker/0:0", pid 5643, jiffies 5609323 (age 178085.440s)
  hex dump (first 32 bytes):
    00 f0 33 bb 00 fe 48 b9 00 fe 48 b9 00 fe 71 b1  ..3...H...H...q.
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<bac96619>] __kmalloc+0x1a0/0x284
    [<6315c347>] drm_atomic_state_init+0x50/0xcc [drm]
    [<b3875b9a>] dm_atomic_state_alloc+0x40/0x60 [amdgpu]
    [<1305fb86>] drm_atomic_state_alloc+0x24/0x78 [drm]
    [<b39c3463>] dm_restore_drm_connector_state+0x6c/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0xba56c600 (size 192):
  comm "kworker/0:0", pid 5643, jiffies 5609323 (age 178085.440s)
  hex dump (first 32 bytes):
    00 7a 81 b9 00 ce fa b0 00 ce fa b0 00 28 f4 9b  .z...........(..
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<bac96619>] __kmalloc+0x1a0/0x284
    [<81bcb9b0>] drm_atomic_state_init+0x78/0xcc [drm]
    [<b3875b9a>] dm_atomic_state_alloc+0x40/0x60 [amdgpu]
    [<1305fb86>] drm_atomic_state_alloc+0x24/0x78 [drm]
    [<b39c3463>] dm_restore_drm_connector_state+0x6c/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0x933a6280 (size 64):
  comm "kworker/0:0", pid 5643, jiffies 5609323 (age 178085.780s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 50 8c b9 80 c2 fa b0 80 c2 fa b0 80 20 f4 9b  .P........... ..
  backtrace:
    [<b51d17be>] __kmalloc_track_caller+0x1a0/0x284
    [<bfb2e13f>] krealloc+0x54/0xc0
    [<b6fa0df1>] drm_atomic_get_connector_state+0x134/0x180 [drm]
    [<1d297503>] dm_restore_drm_connector_state+0x90/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0xb772e000 (size 8192):
  comm "kworker/0:0", pid 5643, jiffies 5609323 (age 178085.780s)
  hex dump (first 32 bytes):
    00 c4 dc 9b 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00  ................
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<ecab1fe2>] dc_create_state+0x28/0x3c [amdgpu]
    [<3f29ed80>] amdgpu_dm_atomic_check+0xe0/0xc6c [amdgpu]
    [<4699f226>] drm_atomic_check_only+0x3bc/0x5c4 [drm]
    [<36cb27b1>] drm_atomic_commit+0x18/0x60 [drm]
    [<ce70d259>] dm_restore_drm_connector_state+0xd8/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0x9bdcc400 (size 1024):
  comm "kworker/0:0", pid 5643, jiffies 5609323 (age 178085.790s)
  hex dump (first 32 bytes):
    00 d8 dc 9b 40 05 00 00 00 00 00 00 00 04 00 00  ....@...........
    00 00 00 00 18 00 00 00 88 00 00 00 26 03 00 00  ............&...
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<a591e85d>] dc_create_stream_for_sink+0x30/0x15c [amdgpu]
    [<87d2cda1>] create_stream_for_sink+0x50/0x4b8 [amdgpu]
    [<4e0dde84>] dm_update_crtcs_state+0x120/0x36c [amdgpu]
    [<0fc8d7c2>] amdgpu_dm_atomic_check+0x290/0xc6c [amdgpu]
    [<4699f226>] drm_atomic_check_only+0x3bc/0x5c4 [drm]
    [<36cb27b1>] drm_atomic_commit+0x18/0x60 [drm]
    [<ce70d259>] dm_restore_drm_connector_state+0xd8/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0x9bf30000 (size 24632):
  comm "kworker/0:0", pid 5643, jiffies 5609323 (age 178085.790s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<8ff04208>] kmalloc_order+0x4c/0x54
    [<24677274>] kmalloc_order_trace+0x24/0xc8
    [<b006d2b2>] dc_create_transfer_func+0x20/0x30 [amdgpu]
    [<a9439ec9>] create_stream_for_sink+0xc0/0x4b8 [amdgpu]
    [<4e0dde84>] dm_update_crtcs_state+0x120/0x36c [amdgpu]
    [<0fc8d7c2>] amdgpu_dm_atomic_check+0x290/0xc6c [amdgpu]
    [<4699f226>] drm_atomic_check_only+0x3bc/0x5c4 [drm]
    [<36cb27b1>] drm_atomic_commit+0x18/0x60 [drm]
    [<ce70d259>] dm_restore_drm_connector_state+0xd8/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0xba604f80 (size 128):
  comm "kworker/0:0", pid 5643, jiffies 5609324 (age 178086.120s)
  hex dump (first 32 bytes):
    00 f0 33 bb 01 00 00 00 ff ff ff ff 08 00 08 00  ..3.............
    90 4f 60 ba 90 4f 60 ba ff ff ff ff 0a 00 0a 00  .O`..O`.........
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<024c2a79>] drm_atomic_helper_setup_commit+0x1d0/0x4d4 [drm_kms_helper]
    [<b2a1265a>] drm_atomic_helper_commit+0x44/0x12c [drm_kms_helper]
    [<a8a36ef6>] amdgpu_dm_atomic_commit+0xc0/0xc8 [amdgpu]
    [<d422bfea>] drm_atomic_commit+0x54/0x60 [drm]
    [<ce70d259>] dm_restore_drm_connector_state+0xd8/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff

[-- Attachment #3: AMDGPU_new_hang4.txt --]
[-- Type: text/plain, Size: 4089 bytes --]

Feb  4 23:36:30 linux kernel: [188528.461609] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=2186227, last emitted seq=2186230
Feb  4 23:36:30 linux kernel: [188528.472965] [drm] IP block:gmc_v8_0 is hung!
Feb  4 23:36:30 linux kernel: [188528.472971] [drm] IP block:gfx_v8_0 is hung!
Feb  4 23:36:30 linux kernel: [188528.473019] [drm] GPU recovery disabled.
Feb  4 23:52:11 linux kernel: [189469.863152] INFO: task amdgpu_cs:0:3799 blocked for more than 120 seconds.
Feb  4 23:52:11 linux kernel: [189469.870134]       Not tainted 4.15.0-rc8-next2g-g9ab2894-dirty #3
Feb  4 23:52:11 linux kernel: [189469.876354] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  4 23:52:11 linux kernel: [189469.884304] amdgpu_cs:0     D    0  3799   3088 0x00000000
Feb  4 23:52:11 linux kernel: [189469.884309] Backtrace: 
Feb  4 23:52:11 linux kernel: [189469.884320] [<80b571c8>] (__schedule) from [<80b578cc>] (schedule+0x44/0xa4)
Feb  4 23:52:11 linux kernel: [189469.884325]  r10:600c0013 r9:b6108000 r8:b6109bd4 r7:00000000 r6:7fffffff r5:81004c48
Feb  4 23:52:11 linux kernel: [189469.884327]  r4:ffffe000
Feb  4 23:52:11 linux kernel: [189469.884333] [<80b57888>] (schedule) from [<80b5b4f0>] (schedule_timeout+0x1e0/0x2e8)
Feb  4 23:52:11 linux kernel: [189469.884336]  r5:81004c48 r4:7fffffff
Feb  4 23:52:11 linux kernel: [189469.884344] [<80b5b310>] (schedule_timeout) from [<8065df3c>] (dma_fence_default_wait+0x218/0x2b0)
Feb  4 23:52:11 linux kernel: [189469.884348]  r10:600c0013 r9:b6108000 r8:b6109bd4 r7:00000000 r6:7fffffff r5:81004c48
Feb  4 23:52:11 linux kernel: [189469.884350]  r4:94953e80
Feb  4 23:52:11 linux kernel: [189469.884355] [<8065dd24>] (dma_fence_default_wait) from [<8065d6b4>] (dma_fence_wait_timeout+0x48/0x15c)
Feb  4 23:52:11 linux kernel: [189469.884360]  r10:ba77b000 r9:b9820000 r8:00000001 r7:91374968 r6:00000000 r5:94953e80
Feb  4 23:52:11 linux kernel: [189469.884361]  r4:81096c18
Feb  4 23:52:11 linux kernel: [189469.884566] [<8065d66c>] (dma_fence_wait_timeout) from [<7f1b5bc8>] (amdgpu_ctx_wait_prev_fence+0x48/0x80 [amdgpu])
Feb  4 23:52:11 linux kernel: [189469.884570]  r7:91374968 r6:00000001 r5:b6bc60c0 r4:00000001
Feb  4 23:52:11 linux kernel: [189469.884829] [<7f1b5b80>] (amdgpu_ctx_wait_prev_fence [amdgpu]) from [<7f19e780>] (amdgpu_cs_ioctl+0x428/0x1edc [amdgpu])
Feb  4 23:52:11 linux kernel: [189469.884832]  r5:b6bc60c0 r4:00000001
Feb  4 23:52:11 linux kernel: [189469.884995] [<7f19e358>] (amdgpu_cs_ioctl [amdgpu]) from [<7f045920>] (drm_ioctl_kernel+0x68/0xb4 [drm])
Feb  4 23:52:11 linux kernel: [189469.884999]  r10:00000018 r9:b6109e2c r8:7f19e358 r7:00000021 r6:00000000 r5:b981a400
Feb  4 23:52:11 linux kernel: [189469.885001]  r4:ba6ca240
Feb  4 23:52:11 linux kernel: [189469.885041] [<7f0458b8>] (drm_ioctl_kernel [drm]) from [<7f045dec>] (drm_ioctl+0x2cc/0x3b0 [drm])
Feb  4 23:52:11 linux kernel: [189469.885045]  r9:00000044 r8:c0186444 r7:ba6ca240 r6:7f19e358 r5:7f2fcba4 r4:81004c48
Feb  4 23:52:11 linux kernel: [189469.885193] [<7f045b20>] (drm_ioctl [drm]) from [<7f180010>] (amdgpu_drm_ioctl+0x10/0x14 [amdgpu])
Feb  4 23:52:11 linux kernel: [189469.885197]  r10:b9b28510 r9:b6108000 r8:732c5ac0 r7:0000000c r6:b6426480 r5:732c5ac0
Feb  4 23:52:11 linux kernel: [189469.885199]  r4:81004c48
Feb  4 23:52:11 linux kernel: [189469.885324] [<7f180000>] (amdgpu_drm_ioctl [amdgpu]) from [<8028e4b4>] (do_vfs_ioctl+0xb8/0x8cc)
Feb  4 23:52:11 linux kernel: [189469.885330] [<8028e3fc>] (do_vfs_ioctl) from [<8028ed04>] (SyS_ioctl+0x3c/0x60)
Feb  4 23:52:11 linux kernel: [189469.885334]  r10:00000000 r9:b6108000 r8:732c5ac0 r7:c0186444 r6:0000000c r5:b6426480
Feb  4 23:52:11 linux kernel: [189469.885336]  r4:b6426481
Feb  4 23:52:11 linux kernel: [189469.885343] [<8028ecc8>] (SyS_ioctl) from [<80108f00>] (ret_fast_syscall+0x0/0x54)
Feb  4 23:52:11 linux kernel: [189469.885347]  r9:b6108000 r8:801090e4 r7:00000036 r6:c0186444 r5:732c5ac0 r4:c0006400


ubuntu@linux:~$ uptime
 12:36:43 up 2 days, 17:22,  3 users,  load average: 1.02, 1.03, 1.00


[-- Attachment #4: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                                                                             ` <CAEzXK1p3feNO1O-Kf=TCqU8sq1Xz=z3Sav1kfDB6e-1hxe+YMQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-02-07 15:50                                                                               ` Luís Mendes
       [not found]                                                                                 ` <CAEzXK1rni05EQGJaZVFc08LqwdbAX-8no-ARK255WCVt7AcH9A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Luís Mendes @ 2018-02-07 15:50 UTC (permalink / raw)
  To: Christian König
  Cc: Alex Deucher, Deucher, Alexander, Michel Dänzer,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Zhou, David(ChunMing)

[-- Attachment #1: Type: text/plain, Size: 5687 bytes --]

Hi Christian, Alexander,

Kmemleak reported leaked data structures, and the GPU hung shortly
afterwards. Could this be caused by DC?
The details are in the attachments.


I'm not sure whether my previous email was overlooked or whether there
are simply no suggestions at this moment. Sorry for more or less
re-sending the email.


Regards,
Luís

On Mon, Feb 5, 2018 at 12:40 PM, Luís Mendes <luis.p.mendes-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi everyone,
>
> I have some updates. I left the system idle most of the time during
> the weekend, and from time to time I played a video on YouTube and
> turned off the screen. Last night I did the same, and this morning I
> found that the system had hung during the night. This time it took a
> lot longer to hang, but I think it was related to a Flash animation ad
> that was only present on the YouTube page the last time I switched off
> the screen. In all the crash attempts I have made, amdgpu always seems
> to hang when that Flash animation is present.
> kmemleak reports a memory leak; I attach its output along with the
> crash dmesg log.
>
> The kernel and patches are the same as in my previous email. I ended
> up changing neither the mesa version nor the kernel version and
> patches.
>
> Regards,
> Luís
>
>
> On Fri, Feb 2, 2018 at 6:46 PM, Luís Mendes <luis.p.mendes-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hi Christian, Alexander,
>>
>> I have enabled kmemleak, but it didn't detect anything special. In
>> fact, this time, I don't know why, I didn't get any allocation
>> failure at all, but the GPU did hang after around 4h 6m of uptime with
>> Xorg.
>> The log is attached. I will try again to see if the allocation
>> failure reappears, or if it has become less apparent due to the
>> kmemleak scans.
>>
>> The kernel stack trace is similar to the GPU hangs I was getting on
>> earlier kernel versions with Kodi, or Firefox when watching videos
>> with either one, but if I left Xorg idle, it would remain up and
>> available without hanging for more than one day.
>> This stack trace also looks quite similar to what Daniel Andersson
>> reported in "[BUG] Intermittent hang/deadlock when opening browser tab
>> with Vega gpu"; it looks like another manifestation of the same bug on
>> a different architecture.
>>
>> Regards,
>> Luís
>>
>> On Fri, Feb 2, 2018 at 7:48 AM, Christian König
>> <ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> Hi Luis,
>>>
>>> please enable kmemleak in your build and watch out for any suspicious
>>> messages in the system log.
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>> Am 02.02.2018 um 00:03 schrieb Luís Mendes:
>>>>
>>>> Hi Alexander,
>>>>
>>>> I didn't notice improvements on this issue with that particular patch
>>>> applied. It still ends up failing to allocate kernel memory after a
>>>> few hours of uptime with Xorg.
>>>>
>>>> I will try to upgrade to mesa 18.0.0-rc3 and to amd-staging-drm-next
>>>> head, to see if the issue still occurs with those versions.
>>>>
>>>> If you have additional suggestions I'll be happy to try them.
>>>>
>>>> Regards,
>>>> Luís Mendes
>>>>
>>>> On Thu, Feb 1, 2018 at 2:30 AM, Alex Deucher <alexdeucher-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>>> wrote:
>>>>>
>>>>> On Wed, Jan 31, 2018 at 6:57 PM, Luís Mendes <luis.p.mendes@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> I am getting a new issue with amdgpu on the RX460. I can now play
>>>>>> any video with Kodi, play web videos with Firefox and run OpenGL
>>>>>> applications without running into any issues. However, after some
>>>>>> uptime with Xorg, even when it is almost inactive, I get a kmalloc
>>>>>> allocation failure, normally followed by a GPU hang a while after
>>>>>> the allocation failure.
>>>>>> I had a terminal window open under Ubuntu Mate 17.10 and was
>>>>>> compiling code when I got the attached kernel messages.
>>>>>>
>>>>>> I am using the kernel as identified on my previous email, which can be
>>>>>> found below.
>>>>>
>>>>> does this patch help?
>>>>> https://patchwork.freedesktop.org/patch/198258/
>>>>>
>>>>> Alex
>>>>>
>>>>>> Regards,
>>>>>> Luís Mendes
>>>>>>
>>>>>> On Wed, Jan 31, 2018 at 12:47 PM, Luís Mendes <luis.p.mendes@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Alexander,
>>>>>>>
>>>>>>> I've cherry-picked the patch you pointed out into the kernel from
>>>>>>> amd-drm-next-4.17-wip at commit
>>>>>>> 9ab2894122275a6d636bb2654a157e88a0f7b9e2 (drm/amdgpu: set
>>>>>>> DRIVER_ATOMIC flag early) and tested it on ARMv7l, and the problem
>>>>>>> is indeed gone.
>>>>>>>
>>>>>>>
>>>>>>> Working great on ARMv7l with AMD RX460.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Luís Mendes
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 30, 2018 at 6:44 PM, Deucher, Alexander
>>>>>>> <Alexander.Deucher-5C7GfCeVMHo@public.gmane.org> wrote:
>>>>>>>>
>>>>>>>> Fixed with this patch:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://lists.freedesktop.org/archives/amd-gfx/2018-January/018472.html
>>>>>>>>
>>>>>>>>
>>>>>>>> Alex
>>>
>>>

[-- Attachment #2: AMDGPU_kmemleak.txt --]
[-- Type: text/plain, Size: 15685 bytes --]

ubuntu@linux:~$ sudo cat /sys/kernel/debug/kmemleak
[sudo] password for ubuntu:
unreferenced object 0xb0fac380 (size 128):
  comm "Xorg", pid 3750, jiffies 5608934 (age 178088.970s)
  hex dump (first 32 bytes):
    00 4e 9f b9 00 f0 33 bb 80 1a 15 97 00 00 00 00  .N....3.........
    fa 00 00 00 82 01 00 00 80 00 00 00 80 00 00 00  ................
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<85099e84>] dm_drm_plane_duplicate_state+0x30/0x60 [amdgpu]
    [<5c55b4e7>] drm_atomic_get_plane_state+0x74/0x118 [drm]
    [<03e85711>] drm_atomic_add_affected_planes+0x84/0xb0 [drm]
    [<d9340120>] drm_atomic_helper_check_modeset+0x4d4/0xb04 [drm_kms_helper]
    [<3c94e005>] amdgpu_dm_atomic_check+0x44/0xc6c [amdgpu]
    [<4699f226>] drm_atomic_check_only+0x3bc/0x5c4 [drm]
    [<36cb27b1>] drm_atomic_commit+0x18/0x60 [drm]
    [<8fac31c8>] drm_atomic_helper_set_config+0x9c/0xac [drm_kms_helper]
    [<a9956a39>] __drm_mode_set_config_internal+0x60/0xe4 [drm]
    [<617b0b52>] drm_mode_setcrtc+0x3f4/0x598 [drm]
    [<31f247ae>] drm_ioctl_kernel+0x68/0xb4 [drm]
    [<4d074688>] drm_ioctl+0x2cc/0x3b0 [drm]
    [<83459b01>] amdgpu_drm_ioctl+0x10/0x14 [amdgpu]
    [<99bb30d0>] do_vfs_ioctl+0xb8/0x8cc
    [<95adff3a>] SyS_ioctl+0x3c/0x60
unreferenced object 0xa44c5800 (size 1024):
  comm "Xorg", pid 3750, jiffies 5608934 (age 178088.970s)
  hex dump (first 32 bytes):
    00 70 4c a4 40 05 00 00 00 00 00 00 00 04 00 00  .pL.@...........
    00 00 00 00 18 00 00 00 88 00 00 00 26 03 00 00  ............&...
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<a591e85d>] dc_create_stream_for_sink+0x30/0x15c [amdgpu]
    [<87d2cda1>] create_stream_for_sink+0x50/0x4b8 [amdgpu]
    [<4e0dde84>] dm_update_crtcs_state+0x120/0x36c [amdgpu]
    [<0fc8d7c2>] amdgpu_dm_atomic_check+0x290/0xc6c [amdgpu]
    [<4699f226>] drm_atomic_check_only+0x3bc/0x5c4 [drm]
    [<36cb27b1>] drm_atomic_commit+0x18/0x60 [drm]
    [<8fac31c8>] drm_atomic_helper_set_config+0x9c/0xac [drm_kms_helper]
    [<a9956a39>] __drm_mode_set_config_internal+0x60/0xe4 [drm]
    [<617b0b52>] drm_mode_setcrtc+0x3f4/0x598 [drm]
    [<31f247ae>] drm_ioctl_kernel+0x68/0xb4 [drm]
    [<4d074688>] drm_ioctl+0x2cc/0x3b0 [drm]
    [<83459b01>] amdgpu_drm_ioctl+0x10/0x14 [amdgpu]
    [<99bb30d0>] do_vfs_ioctl+0xb8/0x8cc
    [<95adff3a>] SyS_ioctl+0x3c/0x60
    [<4cde0ae2>] ret_fast_syscall+0x0/0x54
unreferenced object 0xb0fac080 (size 128):
  comm "Xorg", pid 3750, jiffies 5608935 (age 178088.960s)
  hex dump (first 32 bytes):
    00 f0 33 bb 01 00 00 00 ff ff ff ff 08 00 08 00  ..3.............
    90 c0 fa b0 90 c0 fa b0 ff ff ff ff 0a 00 0a 00  ................
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<024c2a79>] drm_atomic_helper_setup_commit+0x1d0/0x4d4 [drm_kms_helper]
    [<b2a1265a>] drm_atomic_helper_commit+0x44/0x12c [drm_kms_helper]
    [<a8a36ef6>] amdgpu_dm_atomic_commit+0xc0/0xc8 [amdgpu]
    [<d422bfea>] drm_atomic_commit+0x54/0x60 [drm]
    [<8fac31c8>] drm_atomic_helper_set_config+0x9c/0xac [drm_kms_helper]
    [<a9956a39>] __drm_mode_set_config_internal+0x60/0xe4 [drm]
    [<617b0b52>] drm_mode_setcrtc+0x3f4/0x598 [drm]
    [<31f247ae>] drm_ioctl_kernel+0x68/0xb4 [drm]
    [<4d074688>] drm_ioctl+0x2cc/0x3b0 [drm]
    [<83459b01>] amdgpu_drm_ioctl+0x10/0x14 [amdgpu]
    [<99bb30d0>] do_vfs_ioctl+0xb8/0x8cc
    [<95adff3a>] SyS_ioctl+0x3c/0x60
    [<4cde0ae2>] ret_fast_syscall+0x0/0x54
    [<1738b98c>] 0xffffffff
unreferenced object 0xba604680 (size 128):
  comm "Xorg", pid 3750, jiffies 5608936 (age 178088.960s)
  hex dump (first 32 bytes):
    00 f0 33 bb 01 00 00 00 ff ff ff ff 04 00 04 00  ..3.............
    90 46 60 ba 90 46 60 ba ff ff ff ff 06 00 06 00  .F`..F`.........
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<024c2a79>] drm_atomic_helper_setup_commit+0x1d0/0x4d4 [drm_kms_helper]
    [<b2a1265a>] drm_atomic_helper_commit+0x44/0x12c [drm_kms_helper]
    [<a8a36ef6>] amdgpu_dm_atomic_commit+0xc0/0xc8 [amdgpu]
    [<d422bfea>] drm_atomic_commit+0x54/0x60 [drm]
    [<2b9bc4a3>] drm_atomic_connector_commit_dpms+0xec/0xfc [drm]
    [<9b7fdbd7>] drm_mode_obj_set_property_ioctl+0x1a8/0x318 [drm]
    [<be4c5272>] drm_mode_connector_property_set_ioctl+0x4c/0x68 [drm]
    [<31f247ae>] drm_ioctl_kernel+0x68/0xb4 [drm]
    [<4d074688>] drm_ioctl+0x2cc/0x3b0 [drm]
    [<83459b01>] amdgpu_drm_ioctl+0x10/0x14 [amdgpu]
    [<99bb30d0>] do_vfs_ioctl+0xb8/0x8cc
    [<95adff3a>] SyS_ioctl+0x3c/0x60
    [<4cde0ae2>] ret_fast_syscall+0x0/0x54
    [<1738b98c>] 0xffffffff
unreferenced object 0xb948fe00 (size 512):
  comm "Xorg", pid 3750, jiffies 5608937 (age 178088.950s)
  hex dump (first 32 bytes):
    00 f0 33 bb 01 01 01 00 21 00 00 00 02 00 00 00  ..3.....!.......
    02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<70e004db>] dm_crtc_duplicate_state+0x38/0x84 [amdgpu]
    [<b13e4c2f>] drm_atomic_get_crtc_state+0x78/0x10c [drm]
    [<1c6f2b44>] page_flip_common+0x28/0xcc [drm_kms_helper]
    [<cc45921a>] drm_atomic_helper_page_flip+0x50/0xac [drm_kms_helper]
    [<c18a310c>] drm_mode_page_flip_ioctl+0x490/0x4f4 [drm]
    [<31f247ae>] drm_ioctl_kernel+0x68/0xb4 [drm]
    [<4d074688>] drm_ioctl+0x2cc/0x3b0 [drm]
    [<83459b01>] amdgpu_drm_ioctl+0x10/0x14 [amdgpu]
    [<99bb30d0>] do_vfs_ioctl+0xb8/0x8cc
    [<95adff3a>] SyS_ioctl+0x3c/0x60
    [<4cde0ae2>] ret_fast_syscall+0x0/0x54
    [<1738b98c>] 0xffffffff
unreferenced object 0xb0face00 (size 128):
  comm "Xorg", pid 3750, jiffies 5608937 (age 178088.950s)
  hex dump (first 32 bytes):
    00 7a 81 b9 00 f0 33 bb 40 28 84 be 00 00 00 00  .z....3.@(......
    00 00 00 00 00 00 00 00 00 04 00 00 00 03 00 00  ................
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<85099e84>] dm_drm_plane_duplicate_state+0x30/0x60 [amdgpu]
    [<5c55b4e7>] drm_atomic_get_plane_state+0x74/0x118 [drm]
    [<2802153b>] page_flip_common+0x50/0xcc [drm_kms_helper]
    [<cc45921a>] drm_atomic_helper_page_flip+0x50/0xac [drm_kms_helper]
    [<c18a310c>] drm_mode_page_flip_ioctl+0x490/0x4f4 [drm]
    [<31f247ae>] drm_ioctl_kernel+0x68/0xb4 [drm]
    [<4d074688>] drm_ioctl+0x2cc/0x3b0 [drm]
    [<83459b01>] amdgpu_drm_ioctl+0x10/0x14 [amdgpu]
    [<99bb30d0>] do_vfs_ioctl+0xb8/0x8cc
    [<95adff3a>] SyS_ioctl+0x3c/0x60
    [<4cde0ae2>] ret_fast_syscall+0x0/0x54
    [<1738b98c>] 0xffffffff
unreferenced object 0xb0facf80 (size 128):
  comm "Xorg", pid 3750, jiffies 5608937 (age 178088.950s)
  hex dump (first 32 bytes):
    00 f0 33 bb 01 00 00 00 ff ff ff ff 05 00 05 00  ..3.............
    90 cf fa b0 90 cf fa b0 ff ff ff ff 06 00 06 00  ................
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<024c2a79>] drm_atomic_helper_setup_commit+0x1d0/0x4d4 [drm_kms_helper]
    [<b2a1265a>] drm_atomic_helper_commit+0x44/0x12c [drm_kms_helper]
    [<a8a36ef6>] amdgpu_dm_atomic_commit+0xc0/0xc8 [amdgpu]
    [<270dc6b5>] drm_atomic_nonblocking_commit+0x54/0x58 [drm]
    [<2e425aac>] drm_atomic_helper_page_flip+0x9c/0xac [drm_kms_helper]
    [<c18a310c>] drm_mode_page_flip_ioctl+0x490/0x4f4 [drm]
    [<31f247ae>] drm_ioctl_kernel+0x68/0xb4 [drm]
    [<4d074688>] drm_ioctl+0x2cc/0x3b0 [drm]
    [<83459b01>] amdgpu_drm_ioctl+0x10/0x14 [amdgpu]
    [<99bb30d0>] do_vfs_ioctl+0xb8/0x8cc
    [<95adff3a>] SyS_ioctl+0x3c/0x60
    [<4cde0ae2>] ret_fast_syscall+0x0/0x54
    [<1738b98c>] 0xffffffff
unreferenced object 0x9bdcd800 (size 1024):
  comm "kworker/0:0", pid 5643, jiffies 5609303 (age 178085.640s)
  hex dump (first 32 bytes):
    04 00 00 00 00 01 00 00 00 ff ff ff ff ff ff 00  ................
    2e 83 54 21 34 00 00 00 29 15 01 03 80 30 1b 78  ..T!4...)....0.x
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<f1541e77>] dc_sink_create+0x2c/0x88 [amdgpu]
    [<b58321e8>] dc_link_detect+0x1f0/0x994 [amdgpu]
    [<1e160950>] handle_hpd_irq+0x40/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0x933a6480 (size 64):
  comm "kworker/0:0", pid 5643, jiffies 5609323 (age 178085.440s)
  hex dump (first 32 bytes):
    01 00 00 00 00 a4 81 b9 01 00 00 00 00 c6 56 ba  ..............V.
    80 27 f4 9b 03 00 00 00 80 62 3a 93 00 00 00 00  .'.......b:.....
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<1a912412>] dm_atomic_state_alloc+0x2c/0x60 [amdgpu]
    [<1305fb86>] drm_atomic_state_alloc+0x24/0x78 [drm]
    [<b39c3463>] dm_restore_drm_connector_state+0x6c/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0x9bf42780 (size 128):
  comm "kworker/0:0", pid 5643, jiffies 5609323 (age 178085.440s)
  hex dump (first 32 bytes):
    00 f0 33 bb 00 fe 48 b9 00 fe 48 b9 00 fe 71 b1  ..3...H...H...q.
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<bac96619>] __kmalloc+0x1a0/0x284
    [<6315c347>] drm_atomic_state_init+0x50/0xcc [drm]
    [<b3875b9a>] dm_atomic_state_alloc+0x40/0x60 [amdgpu]
    [<1305fb86>] drm_atomic_state_alloc+0x24/0x78 [drm]
    [<b39c3463>] dm_restore_drm_connector_state+0x6c/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0xba56c600 (size 192):
  comm "kworker/0:0", pid 5643, jiffies 5609323 (age 178085.440s)
  hex dump (first 32 bytes):
    00 7a 81 b9 00 ce fa b0 00 ce fa b0 00 28 f4 9b  .z...........(..
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<bac96619>] __kmalloc+0x1a0/0x284
    [<81bcb9b0>] drm_atomic_state_init+0x78/0xcc [drm]
    [<b3875b9a>] dm_atomic_state_alloc+0x40/0x60 [amdgpu]
    [<1305fb86>] drm_atomic_state_alloc+0x24/0x78 [drm]
    [<b39c3463>] dm_restore_drm_connector_state+0x6c/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0x933a6280 (size 64):
  comm "kworker/0:0", pid 5643, jiffies 5609323 (age 178085.780s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 50 8c b9 80 c2 fa b0 80 c2 fa b0 80 20 f4 9b  .P........... ..
  backtrace:
    [<b51d17be>] __kmalloc_track_caller+0x1a0/0x284
    [<bfb2e13f>] krealloc+0x54/0xc0
    [<b6fa0df1>] drm_atomic_get_connector_state+0x134/0x180 [drm]
    [<1d297503>] dm_restore_drm_connector_state+0x90/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0xb772e000 (size 8192):
  comm "kworker/0:0", pid 5643, jiffies 5609323 (age 178085.780s)
  hex dump (first 32 bytes):
    00 c4 dc 9b 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00  ................
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<ecab1fe2>] dc_create_state+0x28/0x3c [amdgpu]
    [<3f29ed80>] amdgpu_dm_atomic_check+0xe0/0xc6c [amdgpu]
    [<4699f226>] drm_atomic_check_only+0x3bc/0x5c4 [drm]
    [<36cb27b1>] drm_atomic_commit+0x18/0x60 [drm]
    [<ce70d259>] dm_restore_drm_connector_state+0xd8/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0x9bdcc400 (size 1024):
  comm "kworker/0:0", pid 5643, jiffies 5609323 (age 178085.790s)
  hex dump (first 32 bytes):
    00 d8 dc 9b 40 05 00 00 00 00 00 00 00 04 00 00  ....@...........
    00 00 00 00 18 00 00 00 88 00 00 00 26 03 00 00  ............&...
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<a591e85d>] dc_create_stream_for_sink+0x30/0x15c [amdgpu]
    [<87d2cda1>] create_stream_for_sink+0x50/0x4b8 [amdgpu]
    [<4e0dde84>] dm_update_crtcs_state+0x120/0x36c [amdgpu]
    [<0fc8d7c2>] amdgpu_dm_atomic_check+0x290/0xc6c [amdgpu]
    [<4699f226>] drm_atomic_check_only+0x3bc/0x5c4 [drm]
    [<36cb27b1>] drm_atomic_commit+0x18/0x60 [drm]
    [<ce70d259>] dm_restore_drm_connector_state+0xd8/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0x9bf30000 (size 24632):
  comm "kworker/0:0", pid 5643, jiffies 5609323 (age 178085.790s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<8ff04208>] kmalloc_order+0x4c/0x54
    [<24677274>] kmalloc_order_trace+0x24/0xc8
    [<b006d2b2>] dc_create_transfer_func+0x20/0x30 [amdgpu]
    [<a9439ec9>] create_stream_for_sink+0xc0/0x4b8 [amdgpu]
    [<4e0dde84>] dm_update_crtcs_state+0x120/0x36c [amdgpu]
    [<0fc8d7c2>] amdgpu_dm_atomic_check+0x290/0xc6c [amdgpu]
    [<4699f226>] drm_atomic_check_only+0x3bc/0x5c4 [drm]
    [<36cb27b1>] drm_atomic_commit+0x18/0x60 [drm]
    [<ce70d259>] dm_restore_drm_connector_state+0xd8/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
unreferenced object 0xba604f80 (size 128):
  comm "kworker/0:0", pid 5643, jiffies 5609324 (age 178086.120s)
  hex dump (first 32 bytes):
    00 f0 33 bb 01 00 00 00 ff ff ff ff 08 00 08 00  ..3.............
    90 4f 60 ba 90 4f 60 ba ff ff ff ff 0a 00 0a 00  .O`..O`.........
  backtrace:
    [<400a53a4>] kmem_cache_alloc_trace+0x180/0x24c
    [<024c2a79>] drm_atomic_helper_setup_commit+0x1d0/0x4d4 [drm_kms_helper]
    [<b2a1265a>] drm_atomic_helper_commit+0x44/0x12c [drm_kms_helper]
    [<a8a36ef6>] amdgpu_dm_atomic_commit+0xc0/0xc8 [amdgpu]
    [<d422bfea>] drm_atomic_commit+0x54/0x60 [drm]
    [<ce70d259>] dm_restore_drm_connector_state+0xd8/0x15c [amdgpu]
    [<02786958>] handle_hpd_irq+0x70/0x90 [amdgpu]
    [<0ebe63e6>] dm_irq_work_func+0x68/0x78 [amdgpu]
    [<4bf849a6>] process_one_work+0x204/0x510
    [<37e481ef>] worker_thread+0x5c/0x5f0
    [<292a33bf>] kthread+0x164/0x16c
    [<c1559dd0>] ret_from_fork+0x14/0x2c
    [<1738b98c>] 0xffffffff
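
The kmemleak records above can be tallied to see which driver entry points account for the leaked bytes. What follows is a minimal sketch, assuming the report has been saved as plain text in the format shown in this attachment. The `summarize` helper and its grouping rule (the first backtrace frame that names a kernel module) are illustrative choices, not part of kmemleak or of the original message.

```python
import re
from collections import defaultdict

# Header line of a kmemleak record, e.g.
#   unreferenced object 0xb0face00 (size 128):
HEADER = re.compile(r"^unreferenced object (0x[0-9a-f]+) \(size (\d+)\):")

# Backtrace frame with an optional module tag, e.g.
#   [<85099e84>] dm_drm_plane_duplicate_state+0x30/0x60 [amdgpu]
FRAME = re.compile(
    r"^\s*\[<[0-9a-f]+>\] (\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def summarize(report):
    """Sum leaked bytes per first backtrace frame that names a module.

    Records whose backtrace contains no module frame are ignored
    (an assumption made to keep the sketch short).
    """
    totals = defaultdict(int)
    size = None  # size of the current record; None once it is attributed
    for line in report.splitlines():
        header = HEADER.match(line)
        if header:
            size = int(header.group(2))
            continue
        frame = FRAME.match(line)
        if frame and size is not None and frame.group(2):
            key = f"{frame.group(1)} [{frame.group(2)}]"
            totals[key] += size
            size = None
    return dict(totals)
```

Passing the text of the report (for example, what was read from /sys/kernel/debug/kmemleak) to `summarize` returns a dictionary mapping each allocation site to the total number of bytes reported as unreferenced there.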

[-- Attachment #3: AMDGPU_new_hang4.txt --]
[-- Type: text/plain, Size: 4089 bytes --]

Feb  4 23:36:30 linux kernel: [188528.461609] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=2186227, last emitted seq=2186230
Feb  4 23:36:30 linux kernel: [188528.472965] [drm] IP block:gmc_v8_0 is hung!
Feb  4 23:36:30 linux kernel: [188528.472971] [drm] IP block:gfx_v8_0 is hung!
Feb  4 23:36:30 linux kernel: [188528.473019] [drm] GPU recovery disabled.
Feb  4 23:52:11 linux kernel: [189469.863152] INFO: task amdgpu_cs:0:3799 blocked for more than 120 seconds.
Feb  4 23:52:11 linux kernel: [189469.870134]       Not tainted 4.15.0-rc8-next2g-g9ab2894-dirty #3
Feb  4 23:52:11 linux kernel: [189469.876354] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  4 23:52:11 linux kernel: [189469.884304] amdgpu_cs:0     D    0  3799   3088 0x00000000
Feb  4 23:52:11 linux kernel: [189469.884309] Backtrace: 
Feb  4 23:52:11 linux kernel: [189469.884320] [<80b571c8>] (__schedule) from [<80b578cc>] (schedule+0x44/0xa4)
Feb  4 23:52:11 linux kernel: [189469.884325]  r10:600c0013 r9:b6108000 r8:b6109bd4 r7:00000000 r6:7fffffff r5:81004c48
Feb  4 23:52:11 linux kernel: [189469.884327]  r4:ffffe000
Feb  4 23:52:11 linux kernel: [189469.884333] [<80b57888>] (schedule) from [<80b5b4f0>] (schedule_timeout+0x1e0/0x2e8)
Feb  4 23:52:11 linux kernel: [189469.884336]  r5:81004c48 r4:7fffffff
Feb  4 23:52:11 linux kernel: [189469.884344] [<80b5b310>] (schedule_timeout) from [<8065df3c>] (dma_fence_default_wait+0x218/0x2b0)
Feb  4 23:52:11 linux kernel: [189469.884348]  r10:600c0013 r9:b6108000 r8:b6109bd4 r7:00000000 r6:7fffffff r5:81004c48
Feb  4 23:52:11 linux kernel: [189469.884350]  r4:94953e80
Feb  4 23:52:11 linux kernel: [189469.884355] [<8065dd24>] (dma_fence_default_wait) from [<8065d6b4>] (dma_fence_wait_timeout+0x48/0x15c)
Feb  4 23:52:11 linux kernel: [189469.884360]  r10:ba77b000 r9:b9820000 r8:00000001 r7:91374968 r6:00000000 r5:94953e80
Feb  4 23:52:11 linux kernel: [189469.884361]  r4:81096c18
Feb  4 23:52:11 linux kernel: [189469.884566] [<8065d66c>] (dma_fence_wait_timeout) from [<7f1b5bc8>] (amdgpu_ctx_wait_prev_fence+0x48/0x80 [amdgpu])
Feb  4 23:52:11 linux kernel: [189469.884570]  r7:91374968 r6:00000001 r5:b6bc60c0 r4:00000001
Feb  4 23:52:11 linux kernel: [189469.884829] [<7f1b5b80>] (amdgpu_ctx_wait_prev_fence [amdgpu]) from [<7f19e780>] (amdgpu_cs_ioctl+0x428/0x1edc [amdgpu])
Feb  4 23:52:11 linux kernel: [189469.884832]  r5:b6bc60c0 r4:00000001
Feb  4 23:52:11 linux kernel: [189469.884995] [<7f19e358>] (amdgpu_cs_ioctl [amdgpu]) from [<7f045920>] (drm_ioctl_kernel+0x68/0xb4 [drm])
Feb  4 23:52:11 linux kernel: [189469.884999]  r10:00000018 r9:b6109e2c r8:7f19e358 r7:00000021 r6:00000000 r5:b981a400
Feb  4 23:52:11 linux kernel: [189469.885001]  r4:ba6ca240
Feb  4 23:52:11 linux kernel: [189469.885041] [<7f0458b8>] (drm_ioctl_kernel [drm]) from [<7f045dec>] (drm_ioctl+0x2cc/0x3b0 [drm])
Feb  4 23:52:11 linux kernel: [189469.885045]  r9:00000044 r8:c0186444 r7:ba6ca240 r6:7f19e358 r5:7f2fcba4 r4:81004c48
Feb  4 23:52:11 linux kernel: [189469.885193] [<7f045b20>] (drm_ioctl [drm]) from [<7f180010>] (amdgpu_drm_ioctl+0x10/0x14 [amdgpu])
Feb  4 23:52:11 linux kernel: [189469.885197]  r10:b9b28510 r9:b6108000 r8:732c5ac0 r7:0000000c r6:b6426480 r5:732c5ac0
Feb  4 23:52:11 linux kernel: [189469.885199]  r4:81004c48
Feb  4 23:52:11 linux kernel: [189469.885324] [<7f180000>] (amdgpu_drm_ioctl [amdgpu]) from [<8028e4b4>] (do_vfs_ioctl+0xb8/0x8cc)
Feb  4 23:52:11 linux kernel: [189469.885330] [<8028e3fc>] (do_vfs_ioctl) from [<8028ed04>] (SyS_ioctl+0x3c/0x60)
Feb  4 23:52:11 linux kernel: [189469.885334]  r10:00000000 r9:b6108000 r8:732c5ac0 r7:c0186444 r6:0000000c r5:b6426480
Feb  4 23:52:11 linux kernel: [189469.885336]  r4:b6426481
Feb  4 23:52:11 linux kernel: [189469.885343] [<8028ecc8>] (SyS_ioctl) from [<80108f00>] (ret_fast_syscall+0x0/0x54)
Feb  4 23:52:11 linux kernel: [189469.885347]  r9:b6108000 r8:801090e4 r7:00000036 r6:c0186444 r5:732c5ac0 r4:c0006400


ubuntu@linux:~$ uptime
 12:36:43 up 2 days, 17:22,  3 users,  load average: 1.02, 1.03, 1.00


[-- Attachment #4: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2
       [not found]                                                                                 ` <CAEzXK1rni05EQGJaZVFc08LqwdbAX-8no-ARK255WCVt7AcH9A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-02-07 16:41                                                                                   ` Deucher, Alexander
  0 siblings, 0 replies; 25+ messages in thread
From: Deucher, Alexander @ 2018-02-07 16:41 UTC (permalink / raw)
  To: Luís Mendes, Koenig, Christian
  Cc: Alex Deucher, Zhou, David(ChunMing),
	Michel Dänzer, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


[-- Attachment #1.1: Type: text/plain, Size: 6128 bytes --]

We haven't had a chance to look yet.


Alex

________________________________
From: Luís Mendes <luis.p.mendes-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Sent: Wednesday, February 7, 2018 10:50:48 AM
To: Koenig, Christian
Cc: Alex Deucher; Deucher, Alexander; Zhou, David(ChunMing); Michel Dänzer; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Subject: Re: Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2

Hi Christian, Alexander,

Kmemleak reported leaked data structures, and the GPU hung shortly afterwards.
Could this be caused by DC?
Details are in the attachments.


I'm not sure whether my previous email was overlooked or whether there
are simply no suggestions at the moment. Sorry for more or less
re-sending the email.


Regards,
Luís

On Mon, Feb 5, 2018 at 12:40 PM, Luís Mendes <luis.p.mendes-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi everyone,
>
> I have some updates. I left the system idle for most of the weekend,
> and from time to time I played a video on YouTube and turned off the
> screen. Last night I did the same, and this morning I found that the
> system had hung during the night. This time it took a lot longer to
> hang, but I think the hang was related to a Flash animation ad that was
> present on the YouTube page only the last time I switched off the
> screen. In every attempt I have made to reproduce the crash, amdgpu
> has hung whenever that Flash animation was present.
> kmemleak reports a memory leak; I attach its report along with the
> crash dmesg log.
>
> The kernel and patches are the same as in my previous email. I ended
> up changing neither the Mesa version nor the kernel version and
> patches.
>
> Regards,
> Luís
>
>
> On Fri, Feb 2, 2018 at 6:46 PM, Luís Mendes <luis.p.mendes-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hi Christian, Alexander,
>>
>> I have enabled kmemleak, but it didn't detect anything unusual. In
>> fact, this time, for reasons I don't understand, I didn't get any
>> allocation failure at all, but the GPU did hang after around 4h 6m of
>> uptime with Xorg.
>> The log is attached. I will try again to see whether the allocation
>> failure reappears, or whether it has become less apparent due to the
>> kmemleak scans.
>>
>> The kernel stack trace is similar to the GPU hangs I was getting on
>> earlier kernel versions when watching videos with Kodi or Firefox.
>> With those kernels, if I left Xorg idle, it would stay up and
>> available for more than a day without hanging.
>> This stack trace also looks quite similar to what Daniel Andersson
>> reported in "[BUG] Intermittent hang/deadlock when opening browser tab
>> with Vega gpu". It looks like another manifestation of the same bug on
>> a different architecture.
>>
>> Regards,
>> Luís
>>
>> On Fri, Feb 2, 2018 at 7:48 AM, Christian König
>> <ckoenig.leichtzumerken-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> Hi Luis,
>>>
>>> please enable kmemleak in your build and watch out for any suspicious
>>> messages in the system log.
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>> On 02.02.2018 at 00:03, Luís Mendes wrote:
>>>>
>>>> Hi Alexander,
>>>>
>>>> I didn't notice any improvement on this issue with that particular
>>>> patch applied. It still ends up failing to allocate kernel memory
>>>> after a few hours of uptime with Xorg.
>>>>
>>>> I will try to upgrade to mesa 18.0.0-rc3 and to amd-staging-drm-next
>>>> head, to see if the issue still occurs with those versions.
>>>>
>>>> If you have additional suggestions I'll be happy to try them.
>>>>
>>>> Regards,
>>>> Luís Mendes
>>>>
>>>> On Thu, Feb 1, 2018 at 2:30 AM, Alex Deucher <alexdeucher-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>>> wrote:
>>>>>
>>>>> On Wed, Jan 31, 2018 at 6:57 PM, Luís Mendes <luis.p.mendes@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> I am getting a new issue with amdgpu on the RX460. I can now play
>>>>>> any video with Kodi, play web videos with Firefox and run OpenGL
>>>>>> applications without running into any issues. However, after some
>>>>>> uptime with Xorg, even when it is almost inactive, I get a kmalloc
>>>>>> allocation failure, normally followed by a GPU hang a while after
>>>>>> the allocation failure.
>>>>>> I had a terminal window open under Ubuntu Mate 17.10 and was
>>>>>> compiling code when I got the attached kernel messages.
>>>>>>
>>>>>> I am using the kernel identified in my previous email, which can be
>>>>>> found below.
>>>>>
>>>>> does this patch help?
>>>>> https://patchwork.freedesktop.org/patch/198258/
>>>>>
>>>>> Alex
>>>>>
>>>>>> Regards,
>>>>>> Luís Mendes
>>>>>>
>>>>>> On Wed, Jan 31, 2018 at 12:47 PM, Luís Mendes <luis.p.mendes@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Alexander,
>>>>>>>
>>>>>>> I've cherry-picked the patch you pointed out into the kernel from
>>>>>>> amd-drm-next-4.17-wip at commit
>>>>>>> 9ab2894122275a6d636bb2654a157e88a0f7b9e2 (drm/amdgpu: set
>>>>>>> DRIVER_ATOMIC flag early) and tested it on ARMv7l, and the problem
>>>>>>> is indeed gone.
>>>>>>>
>>>>>>>
>>>>>>> Working great on ARMv7l with AMD RX460.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Luís Mendes
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 30, 2018 at 6:44 PM, Deucher, Alexander
>>>>>>> <Alexander.Deucher-5C7GfCeVMHo@public.gmane.org> wrote:
>>>>>>>>
>>>>>>>> Fixed with this patch:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://lists.freedesktop.org/archives/amd-gfx/2018-January/018472.html
>>>>>>>>
>>>>>>>>
>>>>>>>> Alex


end of thread, other threads:[~2018-02-07 16:41 UTC | newest]

Thread overview: 25+ messages
2018-01-01 16:32 Deadlocks with multiple applications on AMD RX 460 and RX 550 - Update 2 Luís Mendes
     [not found] ` <CAEzXK1o+FewfiG84pCj8c_1Xz5KVsOOU7EX13LWaFVxK7s66fg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-02  2:51   ` Chunming Zhou
     [not found]     ` <1ca083a5-f33c-7be0-a3c8-c2996a087f70-5C7GfCeVMHo@public.gmane.org>
2018-01-02  9:38       ` Christian König
2018-01-02 13:09       ` Luís Mendes
     [not found]         ` <CAEzXK1odkDX-D3MOGHLFJuKNHh0RjSfsFL8PtB=6YQRDf1+Tkw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-02 13:17           ` Christian König
     [not found]             ` <d755cd12-1a1e-ce98-2b0d-ea4b5eafc483-5C7GfCeVMHo@public.gmane.org>
2018-01-02 22:29               ` Luís Mendes
     [not found]                 ` <CAEzXK1qCfnHUvnuuViTP5fZQx2StjecV6o_QfJ2uJk603_9nhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-03  0:36                   ` Luís Mendes
     [not found]                     ` <CAEzXK1oUyHsSGHeXS9qzWBSDL-FfRq2h-EiQMCfa=5BroO50gg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-03  9:37                       ` Christian König
     [not found]                         ` <8bcdc933-050b-12ca-46e6-54bb66b6824d-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-01-03 11:02                           ` Luís Mendes
     [not found]                             ` <CAEzXK1pPzRM=EpL8fKQ5SdDvEePR+_KPhrtPMPiArt==BsJspA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-03 11:56                               ` Luís Mendes
     [not found]                                 ` <CAEzXK1pFk_VvOnWXnaLdUcgHeGLZ=+E5fE-+GP1gkfWbQB0OWA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-03 12:34                                   ` Christian König
2018-01-03 17:09                               ` Michel Dänzer
     [not found]                                 ` <99a9e27e-f166-f969-416f-f128b0673388-otUistvHUpPR7s880joybQ@public.gmane.org>
2018-01-03 17:47                                   ` Luís Mendes
     [not found]                                     ` <CAEzXK1rb6ngg=3MHo6yT+ed-a_1xr3ASwLPtsK4CpSMBk3xKgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-03 23:08                                       ` Luís Mendes
     [not found]                                         ` <CAEzXK1qFh-Y6zCoVpRVRcLEe_hLFueqntrjaAwgfKyLGd2u27A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-30 18:30                                           ` Luís Mendes
     [not found]                                             ` <CAEzXK1rixP9spSTBY4V5GWxyUWJdf23Nbis2gbKgfxz4A6w2rQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-30 18:44                                               ` Deucher, Alexander
     [not found]                                                 ` <BN6PR12MB1652E01048ABD6A307E80B8DF7E40-/b2+HYfkarQqUD6E6FAiowdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2018-01-31 12:47                                                   ` Luís Mendes
     [not found]                                                     ` <CAEzXK1rbmaNifg00RY+GPdUgLtysLsF1mciPekbotWfT0gagJA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-31 23:57                                                       ` Luís Mendes
     [not found]                                                         ` <CAEzXK1ok5BsSMY2t4+D0575QSAqEkzKPC7cnWEVmg0OhCFcgcA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-02-01  2:30                                                           ` Alex Deucher
     [not found]                                                             ` <CADnq5_NPWa5s+dJwinBFLTzcGfycuuuin_YFp7CJnt_8A2p9eA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-02-01 23:03                                                               ` Luís Mendes
     [not found]                                                                 ` <CAEzXK1rsnm318e6tE+9f=A3pAB4fHw3XUY0dMgvno4yUcoxgjA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-02-02  7:48                                                                   ` Christian König
     [not found]                                                                     ` <c9d12d8c-a186-4041-ea41-1ccabe41eb29-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-02-02 18:46                                                                       ` Luís Mendes
     [not found]                                                                         ` <CAEzXK1rWeUYoSvd-T5JRSZnZSMcSCdpHDy6Eh5wkH=Qr=Gz2Bw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-02-05 12:40                                                                           ` Luís Mendes
     [not found]                                                                             ` <CAEzXK1p3feNO1O-Kf=TCqU8sq1Xz=z3Sav1kfDB6e-1hxe+YMQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-02-07 15:50                                                                               ` Luís Mendes
     [not found]                                                                                 ` <CAEzXK1rni05EQGJaZVFc08LqwdbAX-8no-ARK255WCVt7AcH9A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-02-07 16:41                                                                                   ` Deucher, Alexander
