* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
@ 2018-01-29  6:57 bugzilla-daemon
  2018-01-29 15:47 ` bugzilla-daemon
                   ` (19 more replies)
  0 siblings, 20 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-01-29  6:57 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 5646 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

            Bug ID: 104825
           Summary: [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled
                    failed (scratch(0xC040)=0x00000000) when unbinding
           Product: DRI
           Version: XOrg git
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: mlen@mlen.pl

I use two RX 480 cards with the amdgpu driver. During boot, one of them is
rebound to the vfio-pci driver using the following script:

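# PCI functions of the card to hand over to vfio-pci
pci_ids=("0000:03:00.0" "0000:03:00.1")

for id in "${pci_ids[@]}"; do
  vendor="$(cat "/sys/bus/pci/devices/$id/vendor")"
  device="$(cat "/sys/bus/pci/devices/$id/device")"

  # Detach the function from its current driver, if it has one
  if [ -e "/sys/bus/pci/devices/$id/driver/unbind" ]; then
    echo "$id" >"/sys/bus/pci/devices/$id/driver/unbind"
  fi

  # Register the vendor/device pair with vfio-pci so that it claims the function
  echo "$vendor $device" >/sys/bus/pci/drivers/vfio-pci/new_id
done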

Starting with Linux 4.15 with amdgpu DC enabled (I wanted to use it for HDMI
audio), the unbind operation causes a general protection fault:

[   68.011473] [drm] amdgpu: finishing device.
[   68.377945] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   68.575193] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   68.770107] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   68.971775] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   69.164265] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   69.350089] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   69.538302] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   69.729260] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   69.729733] general protection fault: 0000 [#1] PREEMPT SMP PTI
[   69.730901] Modules linked in:
[   69.731936] CPU: 2 PID: 3934 Comm: openrc-run.sh Not tainted 4.15.0-gentoo
#2
[   69.733009] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS,
BIOS 3407 03/10/2017
[   69.734240] RIP: 0010:dm_read_reg_func.isra.0+0x3/0xc
[   69.735314] RSP: 0018:ffffa80d8bd4fc40 EFLAGS: 00010282
[   69.736353] RAX: ccea607dac10c354 RBX: ffff95af35bfca80 RCX:
0000000180200008
[   69.737408] RDX: 0000000180200009 RSI: 0000000000005c02 RDI:
ffff95af362b91c0
[   69.738494] RBP: ffff95af35b75c90 R08: 0000000000000001 R09:
ffffffffb30fced2
[   69.739561] R10: ffff95af2f770a40 R11: 00000000ffffff80 R12:
ffff95af35f30100
[   69.740638] R13: 0000000000000000 R14: ffff95af358a3c20 R15:
ffff95bf2d5f9c20
[   69.742007] FS:  00007fe7530af740(0000) GS:ffff95af3da00000(0000)
knlGS:0000000000000000
[   69.743083] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   69.744134] CR2: 000000c420870008 CR3: 0000001027426003 CR4:
00000000003606e0
[   69.745190] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[   69.746246] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[   69.747320] Call Trace:
[   69.748359]  destroy+0x21/0x9c
[   69.749395]  dal_i2caux_destruct+0x6f/0xab
[   69.750459]  destroy+0x15/0x27
[   69.751495]  dal_i2caux_destroy+0x26/0x2f
[   69.752537]  destruct+0x86/0xfd
[   69.753571]  dc_destroy+0x11/0x22
[   69.754626]  dm_hw_fini+0x1e/0x22
[   69.755632]  amdgpu_fini+0xf3/0x2d6
[   69.756616]  amdgpu_device_fini+0x5c/0x158
[   69.757597]  amdgpu_driver_unload_kms+0x6b/0x7e
[   69.758601]  drm_dev_unregister+0x4c/0xc6
[   69.759584]  amdgpu_pci_remove+0x19/0x37
[   69.760576]  pci_device_remove+0x3b/0x8b
[   69.761563]  device_release_driver_internal+0x125/0x1f9
[   69.762580]  unbind_store+0x60/0x90
[   69.763568]  kernfs_fop_write+0x111/0x159
[   69.764557]  __vfs_write+0x33/0xd7
[   69.765543]  ? preempt_count_sub+0x8b/0x94
[   69.766552]  ? __sb_start_write+0xc0/0x180
[   69.767525]  vfs_write+0xa5/0xe2
[   69.768490]  SyS_write+0x5f/0xa3
[   69.769439]  do_syscall_64+0x72/0x81
[   69.770419]  entry_SYSCALL64_slow_path+0x25/0x25
[   69.771373] RIP: 0033:0x7fe7529a1408
[   69.772324] RSP: 002b:00007ffe1885c200 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[   69.773300] RAX: ffffffffffffffda RBX: 000000000000000d RCX:
00007fe7529a1408
[   69.774290] RDX: 000000000000000d RSI: 0000558994641890 RDI:
0000000000000001
[   69.775290] RBP: 0000558994641890 R08: 000000000000000a R09:
00005589946475f0
[   69.776260] R10: 000000000000009b R11: 0000000000000246 R12:
000000000000000d
[   69.777248] R13: 0000000000000001 R14: 00007fe752c6e740 R15:
000000000000000d
[   69.778244] Code: f7 74 24 04 89 43 5c 48 8b 4c 24 08 65 48 33 0c 25 28 00
00 00 4c 89 e0 74 05 e8 39 58 9e ff 48 83 c4 10 5b 5d 41 5c c3 48 8b 07 <48> 8b
40 30 e9 03 4a 70 00 0f 1f 44 00 00 48 8b 47 30 8b 70 04 
[   69.779406] RIP: dm_read_reg_func.isra.0+0x3/0xc RSP: ffffa80d8bd4fc40
[   69.780491] ---[ end trace 6aa4681ba3a43ec3 ]---
[   71.815503] [drm:amdgpu_fill_buffer] *ERROR* Trying to clear memory with
ring turned off.
[   71.899258] amdgpu 0000:03:00.0: vgaarb: changed VGA decodes:
olddecodes=io+mem,decodes=none:owns=none
[   71.899263] amdgpu 0000:02:00.0: vgaarb: changed VGA decodes:
olddecodes=io+mem,decodes=none:owns=none
[   72.217918] [drm:amdgpu_fill_buffer] *ERROR* Trying to clear memory with
ring turned off.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 7102 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
From: bugzilla-daemon @ 2018-01-29 15:47 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 568 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #1 from Harry Wentland <harry.wentland@amd.com> ---
This patch https://patchwork.freedesktop.org/patch/198719/ should fix it, but
there could be some other issues as well.

amd-staging-drm-next has fixes for a whole bunch of driver unload issues,
including what you're seeing. It's hosted at
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next

Can you try the patch and/or amd-staging-drm-next?


* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
From: bugzilla-daemon @ 2018-01-30  8:00 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 229 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #2 from mlen <mlen@mlen.pl> ---
I'll try running amd-staging-drm-next later today.


* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
From: bugzilla-daemon @ 2018-01-31  7:21 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 4610 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #3 from mlen <mlen@mlen.pl> ---
I tested amd-staging-drm-next with HEAD at
f1367d12f5fabb04789c7772594887434c8d9e8b. This time the unbind succeeded, but
some errors are still logged and the kernel reports a locking problem in
amdgpu:

[   77.098923] [drm] amdgpu: finishing device.
[   77.458614] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   77.481247] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
[   77.653815] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   77.845085] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   77.855055] IPv6: ADDRCONF(NETDEV_CHANGE): virbr10: link becomes ready
[   78.036695] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   78.233244] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   78.425058] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   78.616635] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   78.808323] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[   78.810659] amdgpu 0000:03:00.0: 00000000a667dd57 unpin not necessary
[   78.810672] amdgpu 0000:03:00.0: 00000000a7594a2b unpin not necessary

[   78.811733] =====================================
[   78.813109] WARNING: bad unlock balance detected!
[   78.813947] 4.15.0-rc4+ #2 Not tainted
[   78.814835] -------------------------------------
[   78.815731] openrc-run.sh/3931 is trying to release lock
(&(&mgr->lock)->rlock) at:
[   78.816646] [<000000006fd39549>] amdgpu_gtt_mgr_fini+0x22/0x37
[   78.817531] but there are no more locks to release!
[   78.818446] 
               other info that might help us debug this:
[   78.820208] 5 locks held by openrc-run.sh/3931:
[   78.821127]  #0:  (sb_writers#6){....}, at: [<00000000322e5044>]
vfs_write+0x87/0xe2
[   78.822051]  #1:  (&of->mutex){....}, at: [<00000000660270c4>]
kernfs_fop_write+0xca/0x156
[   78.823007]  #2:  (kn->count#211){....}, at: [<000000000634dafb>]
kernfs_fop_write+0xd2/0x156
[   78.823936]  #3:  (&dev->mutex){....}, at: [<00000000c386f49f>]
unbind_store+0x58/0x90
[   78.824912]  #4:  (&dev->mutex){....}, at: [<00000000eefcc37f>]
device_release_driver_internal+0x2f/0x1f3
[   78.825861] 
               stack backtrace:
[   78.827764] CPU: 7 PID: 3931 Comm: openrc-run.sh Not tainted 4.15.0-rc4+ #2
[   78.828747] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS,
BIOS 3407 03/10/2017
[   78.829718] Call Trace:
[   78.830717]  dump_stack+0x67/0x8e
[   78.831689]  ? amdgpu_gtt_mgr_fini+0x22/0x37
[   78.832687]  print_unlock_imbalance_bug+0xcc/0xd3
[   78.833657]  lock_release+0x134/0x267
[   78.834646]  ? _raw_spin_unlock+0x2e/0x40
[   78.835605]  _raw_spin_unlock+0x1c/0x40
[   78.836586]  amdgpu_gtt_mgr_fini+0x22/0x37
[   78.837549]  ttm_bo_clean_mm+0x79/0xab
[   78.838544]  amdgpu_ttm_fini+0x75/0x11c
[   78.839507]  amdgpu_bo_fini+0xe/0x2d
[   78.840495]  gmc_v8_0_sw_fini+0x2e/0x49
[   78.841454]  amdgpu_device_ip_fini+0x21f/0x2d3
[   78.842439]  amdgpu_device_fini+0x4c/0x125
[   78.843394]  amdgpu_driver_unload_kms+0x63/0x76
[   78.844373]  drm_dev_unregister+0x49/0xc3
[   78.845318]  amdgpu_pci_remove+0x19/0x37
[   78.846244]  pci_device_remove+0x36/0x86
[   78.847190]  device_release_driver_internal+0x122/0x1f3
[   78.848120]  unbind_store+0x60/0x90
[   78.849069]  kernfs_fop_write+0x10e/0x156
[   78.849997]  __vfs_write+0x31/0xcc
[   78.850937]  ? preempt_count_sub+0x8b/0x94
[   78.851871]  ? __sb_start_write+0xc0/0x180
[   78.852828]  vfs_write+0xa5/0xe2
[   78.853755]  SyS_write+0x5f/0xa3
[   78.854708]  do_syscall_64+0x6c/0x7b
[   78.855630]  entry_SYSCALL64_slow_path+0x25/0x25
[   78.856583] RIP: 0033:0x7fc5804b6408
[   78.857511] RSP: 002b:00007ffd95228060 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[   78.858484] RAX: ffffffffffffffda RBX: 000000000000000d RCX:
00007fc5804b6408
[   78.859438] RDX: 000000000000000d RSI: 000055b229e9e890 RDI:
0000000000000001
[   78.860419] RBP: 000055b229e9e890 R08: 000000000000000a R09:
000055b229ea45f0
[   78.861379] R10: 000000000000009b R11: 0000000000000246 R12:
000000000000000d
[   78.862364] R13: 0000000000000001 R14: 00007fc580783740 R15:
000000000000000d
[   78.863411] [drm] amdgpu: ttm finalized


* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
From: bugzilla-daemon @ 2018-02-27  0:09 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 274 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #4 from Harry Wentland <harry.wentland@amd.com> ---
Good to hear DC issues are gone. Not sure about the unlock balance myself.


* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
From: bugzilla-daemon @ 2018-02-27 10:55 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 384 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #5 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Hi, can you please provide the full dmesg for the unbind sequence?
So the issue happens when you unbind the card from the host driver, before
binding it to VFIO?

Andrey


* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
From: bugzilla-daemon @ 2018-02-27 11:48 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 347 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #6 from mlen <mlen@mlen.pl> ---
Currently I don't have access to that machine; I should be able to test next
week. I'll extend the script to be more verbose and to write some delimiters to
/dev/kmsg.
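
One way the delimiters mentioned above could be written is sketched below. This
is an illustration only, not the reporter's actual script: the function name
kmsg_mark, the "rebind:" prefix and the KMSG override variable are assumptions
made for this example. Each write to /dev/kmsg becomes one kernel log record,
so markers placed before and after the unbind make the sequence easy to find
in dmesg.

```sh
# Sketch only: kmsg_mark and the KMSG variable are hypothetical names.
# Appends one marker line to the kernel log; writing to /dev/kmsg
# normally requires root. Set KMSG to a regular file for a dry run.
kmsg_mark() {
  printf 'rebind: %s\n' "$*" >>"${KMSG:-/dev/kmsg}"
}

# Intended use around the unbind step (left commented out so that
# sourcing this snippet has no side effects):
#   kmsg_mark "begin unbind $id"
#   echo "$id" >"/sys/bus/pci/devices/$id/driver/unbind"
#   kmsg_mark "end unbind $id"
```

The KMSG override exists only so the helper can be tried without root and
without touching the real kernel log.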


* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
From: bugzilla-daemon @ 2018-02-27 14:51 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 821 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

Andrey Grodzovsky <andrey.grodzovsky@amd.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrey.grodzovsky@amd.com

--- Comment #7 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
On Carrizo, unbinding the driver will indeed show [drm:gfx_v8_0_hw_fini
[amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD). For some
reason I don't have the LOCKDEP-related kernel config options enabled, which is
probably why I don't see the locking imbalance warning. I will rebuild and
check.

What card model are you using?

Andrey


* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
From: bugzilla-daemon @ 2018-02-27 15:20 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 452 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

Andrey Grodzovsky <andrey.grodzovsky@amd.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|dri-devel@lists.freedesktop |andrey.grodzovsky@amd.com
                   |.org                        |


* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
From: bugzilla-daemon @ 2018-02-27 15:30 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 458 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

Andrey Grodzovsky <andrey.grodzovsky@amd.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|andrey.grodzovsky@amd.com   |dri-devel@lists.freedesktop
                   |                            |.org


* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
From: bugzilla-daemon @ 2018-02-27 16:17 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 242 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #8 from mlen <mlen@mlen.pl> ---
I'm using two RX 480 cards, but I'm rebinding only one of them.


* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
From: bugzilla-daemon @ 2018-02-27 18:10 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 663 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #9 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Created attachment 137652
  --> https://bugs.freedesktop.org/attachment.cgi?id=137652&action=edit
DAL warnings during driver unbind

Harry, with DAL enabled I observe warnings and the unbind can't complete. To
reproduce, run as root:

cd /sys/bus/pci/drivers/amdgpu/
echo 'pci_id' > unbind

where pci_id is the name that ls -l shows in this directory as a symlink like
this:
0000:2a:00.0 -> ../../../../devices/pci0000:00/0000:00:03.1/0000:2a:00.0/
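
The lookup of pci_id described above could be scripted roughly as follows. This
is a sketch under assumptions, not part of the original reproduction steps: the
function name list_bound_devices and its optional directory argument are made
up for this example. It relies only on the layout shown above, where each bound
device appears in the driver directory as a symlink named after its PCI
address.

```sh
# Sketch only: list_bound_devices and its directory argument are
# hypothetical. Prints the PCI addresses currently bound to a driver by
# looking for symlinks named like 0000:2a:00.0 in the driver directory.
list_bound_devices() {
  dir="${1:-/sys/bus/pci/drivers/amdgpu}"
  for entry in "$dir"/*; do
    [ -L "$entry" ] || continue               # bound devices are symlinks
    name="${entry##*/}"
    case "$name" in
      ????:??:??.?) printf '%s\n' "$name" ;;  # domain:bus:device.function
    esac
  done
}
```

Any address it prints is a candidate value to write to the unbind file in the
same directory; other entries such as bind, unbind and the module link are
skipped by the name pattern.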


* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
From: bugzilla-daemon @ 2018-02-27 18:12 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 410 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #10 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
P.S. I am using CZ or Ellesmere with the tip of amd-staging-drm-next. I am
going to disable DAL for now to debug the KCQ ring failure.

P.P.S. After enabling lockdep I still don't see any locking-related warnings.


* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
From: bugzilla-daemon @ 2018-02-27 19:01 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 426 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #11 from Harry Wentland <harry.wentland@amd.com> ---
Created attachment 137662
  --> https://bugs.freedesktop.org/attachment.cgi?id=137662&action=edit
[PATCH] drm/amd/display: Use atomic crtc_disable for DC on shutdown

Andrey, can you see if this fixes the warning for you?


* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
From: bugzilla-daemon @ 2018-02-27 19:17 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 7752 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #12 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to Harry Wentland from comment #11)
> Created attachment 137662 [details] [review]
> [PATCH] drm/amd/display: Use atomic crtc_disable for DC on shutdown
> 
> Andrey, can you see if this fixes the warning for you?

I get a use-after-free now:

[   82.400097 <    0.000387>] BUG: KASAN: use-after-free in
amdgpu_dm_set_pflip_irq_state+0x3d/0xa0 [amdgpu]
[   82.400185 <    0.000088>] Read of size 4 at addr ffff88008f53ee94 by task
bash/1178

[   82.400302 <    0.000117>] CPU: 0 PID: 1178 Comm: bash Tainted: G        W 
OE    4.16.0-rc1.main+ #14
[   82.400308 <    0.000006>] Hardware name: AMD Gardenia/Gardenia, BIOS
RGA1101C 07/20/2015
[   82.400312 <    0.000004>] Call Trace:
[   82.400329 <    0.000017>]  dump_stack+0x5c/0x78
[   82.400342 <    0.000013>]  print_address_description+0xd1/0x270
[   82.400618 <    0.000276>]  ? amdgpu_dm_set_pflip_irq_state+0x3d/0xa0
[amdgpu]
[   82.400627 <    0.000009>]  kasan_report+0x260/0x360
[   82.400913 <    0.000286>]  amdgpu_dm_set_pflip_irq_state+0x3d/0xa0 [amdgpu]
[   82.401189 <    0.000276>]  amdgpu_irq_disable_all+0x111/0x190 [amdgpu]
[   82.401452 <    0.000263>]  amdgpu_device_ip_fini+0x1b7/0x610 [amdgpu]
[   82.401718 <    0.000266>]  amdgpu_device_fini+0xa1/0x320 [amdgpu]
[   82.401973 <    0.000255>]  amdgpu_driver_unload_kms+0x6a/0xd0 [amdgpu]
[   82.402059 <    0.000086>]  drm_dev_unregister+0x79/0x180 [drm]
[   82.402315 <    0.000256>]  amdgpu_pci_remove+0x2a/0x60 [amdgpu]
[   82.402331 <    0.000016>]  pci_device_remove+0x5b/0x100
[   82.402347 <    0.000016>]  device_release_driver_internal+0x1da/0x300
[   82.402363 <    0.000016>]  unbind_store+0x143/0x190
[   82.402376 <    0.000013>]  ? sysfs_file_ops+0xa0/0xa0
[   82.402386 <    0.000010>]  kernfs_fop_write+0x186/0x220
[   82.402402 <    0.000016>]  __vfs_write+0xb9/0x2e0
[   82.402412 <    0.000010>]  ? locks_remove_posix+0x87/0x220
[   82.402421 <    0.000009>]  ? kernel_read+0xa0/0xa0
[   82.402430 <    0.000009>]  ? find_held_lock+0xfb/0x130
[   82.402441 <    0.000011>]  ? __lock_acquire.isra.30+0x414/0xb00
[   82.402465 <    0.000024>]  ? vfs_write+0x227/0x250
[   82.402485 <    0.000020>]  ? __sb_start_write+0xc3/0x1a0
[   82.402492 <    0.000007>]  ? vfs_write+0x227/0x250
[   82.402506 <    0.000014>]  vfs_write+0xe6/0x250
[   82.402522 <    0.000016>]  SyS_write+0xa1/0x120
[   82.402532 <    0.000010>]  ? SyS_read+0x120/0x120
[   82.402547 <    0.000015>]  ? vtime_user_exit+0xc8/0xe0
[   82.402558 <    0.000011>]  ? SyS_read+0x120/0x120
[   82.402570 <    0.000012>]  do_syscall_64+0xf0/0x270
[   82.402588 <    0.000018>]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[   82.402597 <    0.000009>] RIP: 0033:0x7fd050bc32c0
[   82.402603 <    0.000006>] RSP: 002b:00007ffc06b7f3b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   82.402615 <    0.000012>] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fd050bc32c0
[   82.402620 <    0.000005>] RDX: 000000000000000d RSI: 0000000001a6e408 RDI: 0000000000000001
[   82.402626 <    0.000006>] RBP: 0000000001a6e408 R08: 00007fd050e92780 R09: 00007fd0514d9700
[   82.402632 <    0.000006>] R10: 000000000000000c R11: 0000000000000246 R12: 000000000000000d
[   82.402637 <    0.000005>] R13: 0000000000000001 R14: 00007fd050e91620 R15: 0000000000000000

[   82.402711 <    0.000074>] Allocated by task 1084:
[   82.402771 <    0.000060>]  kasan_kmalloc+0xa6/0xd0
[   82.402780 <    0.000009>]  kmem_cache_alloc_trace+0x13a/0x270
[   82.403079 <    0.000299>]  dm_hw_init+0x898/0x1660 [amdgpu]
[   82.403338 <    0.000259>]  amdgpu_device_init+0x1a97/0x2100 [amdgpu]
[   82.403596 <    0.000258>]  amdgpu_driver_load_kms+0xa8/0x3a0 [amdgpu]
[   82.403673 <    0.000077>]  drm_dev_register+0x1d5/0x2f0 [drm]
[   82.403931 <    0.000258>]  amdgpu_pci_probe+0x1bf/0x290 [amdgpu]
[   82.403941 <    0.000010>]  local_pci_probe+0x74/0xe0
[   82.403951 <    0.000010>]  pci_device_probe+0x1dc/0x2d0
[   82.403970 <    0.000019>]  driver_probe_device+0x40e/0x6b0
[   82.403977 <    0.000007>]  __driver_attach+0x11d/0x130
[   82.403984 <    0.000007>]  bus_for_each_dev+0xd8/0x140
[   82.403990 <    0.000006>]  bus_add_driver+0x31d/0x3a0
[   82.403998 <    0.000008>]  driver_register+0xc6/0x170
[   82.404006 <    0.000008>]  do_one_initcall+0x82/0x1d0
[   82.404012 <    0.000006>]  do_init_module+0xe7/0x333
[   82.404020 <    0.000008>]  load_module+0x41b3/0x4c40
[   82.404028 <    0.000008>]  SYSC_finit_module+0x14d/0x180
[   82.404036 <    0.000008>]  do_syscall_64+0xf0/0x270
[   82.404044 <    0.000008>]  entry_SYSCALL_64_after_hwframe+0x21/0x86

[   82.404095 <    0.000051>] Freed by task 1178:
[   82.404152 <    0.000057>]  __kasan_slab_free+0x124/0x170
[   82.404159 <    0.000007>]  kfree+0xd4/0x200
[   82.404239 <    0.000080>]  drm_mode_config_cleanup+0x241/0x450 [drm]
[   82.404536 <    0.000297>]  amdgpu_dm_fini+0x29/0xb0 [amdgpu]
[   82.404834 <    0.000298>]  dm_hw_fini+0x1e/0x30 [amdgpu]
[   82.405091 <    0.000257>]  amdgpu_device_ip_fini+0x157/0x610 [amdgpu]
[   82.405349 <    0.000258>]  amdgpu_device_fini+0xa1/0x320 [amdgpu]
[   82.405607 <    0.000258>]  amdgpu_driver_unload_kms+0x6a/0xd0 [amdgpu]
[   82.405684 <    0.000077>]  drm_dev_unregister+0x79/0x180 [drm]
[   82.405941 <    0.000257>]  amdgpu_pci_remove+0x2a/0x60 [amdgpu]
[   82.405949 <    0.000008>]  pci_device_remove+0x5b/0x100
[   82.405957 <    0.000008>]  device_release_driver_internal+0x1da/0x300
[   82.405963 <    0.000006>]  unbind_store+0x143/0x190
[   82.405971 <    0.000008>]  kernfs_fop_write+0x186/0x220
[   82.405978 <    0.000007>]  __vfs_write+0xb9/0x2e0
[   82.405985 <    0.000007>]  vfs_write+0xe6/0x250
[   82.405991 <    0.000006>]  SyS_write+0xa1/0x120
[   82.405998 <    0.000007>]  do_syscall_64+0xf0/0x270
[   82.406007 <    0.000009>]  entry_SYSCALL_64_after_hwframe+0x21/0x86

[   82.406057 <    0.000050>] The buggy address belongs to the object at ffff88008f53e600
                               which belongs to the cache kmalloc-4096 of size 4096
[   82.406163 <    0.000106>] The buggy address is located 2196 bytes inside of
                               4096-byte region [ffff88008f53e600, ffff88008f53f600)
[   82.406262 <    0.000099>] The buggy address belongs to the page:
[   82.406326 <    0.000064>] page:ffffea00023d4e00 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
[   82.406424 <    0.000098>] flags: 0x1ffff0000008100(slab|head)
[   82.406488 <    0.000064>] raw: 01ffff0000008100 0000000000000000 0000000000000000 0000000100070007
[   82.406571 <    0.000083>] raw: dead000000000100 dead000000000200 ffff880102802600 0000000000000000
[   82.406649 <    0.000078>] page dumped because: kasan: bad access detected

[   82.406754 <    0.000105>] Memory state around the buggy address:
[   82.406816 <    0.000062>]  ffff88008f53ed80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   82.406893 <    0.000077>]  ffff88008f53ee00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   82.406968 <    0.000075>] >ffff88008f53ee80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   82.407036 <    0.000068>]                          ^
[   82.407087 <    0.000051>]  ffff88008f53ef00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   82.407157 <    0.000070>]  ffff88008f53ef80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   82.407226 <    0.000069>] ==================================================================
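
The "located 2196 bytes inside of" figure in the report can be cross-checked
from the two addresses KASAN prints. What follows is only a minimal sketch:
kasan_offset is a hypothetical helper name, not something from the kernel or
from this report, and it assumes 16-digit hex addresses that share their upper
32 bits.

```shell
# Hypothetical helper, not part of the report: byte offset of a faulting
# address inside its slab object. Arguments are 16-digit hex strings, no "0x".
kasan_offset() (
  addr="$1"; base="$2"
  # Require identical upper 32 bits (first 8 hex digits) so that only the
  # lower 32 bits need subtracting; this keeps the arithmetic in range on
  # any POSIX shell.
  if [ "${addr%????????}" != "${base%????????}" ]; then
    echo "upper halves differ" >&2
    exit 1
  fi
  echo $(( 0x${addr#????????} - 0x${base#????????} ))
)

# Addresses copied from the KASAN report: faulting read, then object start.
kasan_offset ffff88008f53ee94 ffff88008f53e600   # prints 2196
```

The same helper applied to the region bounds (ffff88008f53f600 and
ffff88008f53e600) gives 4096, matching the kmalloc-4096 cache size.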

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 9714 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
  2018-01-29  6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
                   ` (13 preceding siblings ...)
  2018-02-27 19:17 ` bugzilla-daemon
@ 2018-03-27 18:02 ` bugzilla-daemon
  2018-03-27 20:20 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-03-27 18:02 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 406 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #13 from Harry Wentland <harry.wentland@amd.com> ---
This should be fixed in drm-next-4.17-wip and amd-staging-drm-next. Can someone
test if this resolves this ticket satisfactorily?

Both branches can be found on https://cgit.freedesktop.org/~agd5f/linux

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
  2018-01-29  6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
                   ` (14 preceding siblings ...)
  2018-03-27 18:02 ` bugzilla-daemon
@ 2018-03-27 20:20 ` bugzilla-daemon
  2018-03-29 16:07 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-03-27 20:20 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 696 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #14 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to Harry Wentland from comment #13)
> This should be fixed in drm-next-4.17-wip and amd-staging-drm-next. Can
> someone test if this resolves this ticket satisfactorily?
> 
> Both branches can be found on https://cgit.freedesktop.org/~agd5f/linux

Yes, the DAL issues are gone, and so is the KIQ ring error on unbind.
There is still a KIQ ring error on rebind, which I am investigating now, and an
SDMA ring IB test failure, which I will get to after the KIQ issue is resolved.

Andrey

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
  2018-01-29  6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
                   ` (15 preceding siblings ...)
  2018-03-27 20:20 ` bugzilla-daemon
@ 2018-03-29 16:07 ` bugzilla-daemon
  2018-03-29 19:55 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-03-29 16:07 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1605 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #15 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to mlen from comment #3)
> I tested amd-staging-drm-next with HEAD at
> f1367d12f5fabb04789c7772594887434c8d9e8b. This time the unbind succeeded,
> but there are still some errors logged and kernel reports locking problem in
> amdgpu:
> 
> [   77.098923] [drm] amdgpu: finishing device.
> [   77.458614] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
> [   77.481247] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
> [   77.653815] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
> [   77.845085] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
> [   77.855055] IPv6: ADDRCONF(NETDEV_CHANGE): virbr10: link becomes ready
> [   78.036695] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
> [   78.233244] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
> [   78.425058] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
> [   78.616635] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
> [   78.808323] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)

Can you retest with the latest
https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.17-wip to see if
the KCQ-related errors are gone?
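
One way to check for these after an unbind is to count the matching lines in
the kernel log. This is only a rough sketch: count_kcq_errors is a made-up
helper name, and the only thing it relies on is the "KCQ disabled failed"
text seen in the logs quoted above.

```shell
# Hypothetical helper: count "KCQ disabled failed" messages in kernel log
# text read from stdin. Prints 0 when there are none.
count_kcq_errors() {
  # grep -c prints the number of matching lines; it exits non-zero when the
  # count is 0, so "|| true" keeps the function's exit status clean.
  grep -c 'KCQ disabled failed' || true
}

# Example with three lines taken from the quoted log; one of them matches.
count_kcq_errors <<'EOF'
[   77.098923] [drm] amdgpu: finishing device.
[   77.458614] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000)
[   77.481247] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
EOF
```

On a live system the input would typically come from something like
"dmesg | count_kcq_errors" right after writing to the unbind file; a result
of 0 would suggest the KCQ errors no longer appear.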

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
  2018-01-29  6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
                   ` (16 preceding siblings ...)
  2018-03-29 16:07 ` bugzilla-daemon
@ 2018-03-29 19:55 ` bugzilla-daemon
  2018-04-01  9:51 ` bugzilla-daemon
  2018-04-02 20:49 ` bugzilla-daemon
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-03-29 19:55 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 256 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #16 from mlen <mlen@mlen.pl> ---
I'll test it on Saturday; I don't have access to that machine at the moment.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
  2018-01-29  6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
                   ` (17 preceding siblings ...)
  2018-03-29 19:55 ` bugzilla-daemon
@ 2018-04-01  9:51 ` bugzilla-daemon
  2018-04-02 20:49 ` bugzilla-daemon
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-04-01  9:51 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 5994 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

--- Comment #17 from mlen <mlen@mlen.pl> ---
No issues on drm-next-4.17-wip, unbinding works without any lockdep issues.

Do you plan to backport these changes to 4.16 or maybe even 4.15?


For the record, suspend fails on drm-next-4.17-wip. This could possibly be
related, but I don't know how to debug it, and it is most likely out of
scope for this bug.

[   96.222095] PM: suspend entry (deep)
[   96.222099] PM: Syncing filesystems ... done.
[   96.230020] INFO: trying to register non-static key.
[   96.230024] the code is fine but needs lockdep annotation.
[   96.230026] turning off the locking correctness validator.
[   96.230029] CPU: 39 PID: 4506 Comm: pm-suspend Not tainted 4.16.0-rc7+ #2
[   96.230031] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
[   96.230033] Call Trace:
[   96.230040]  dump_stack+0x46/0x59
[   96.230046]  register_lock_class+0x192/0x361
[   96.230050]  ? cycles_2_ns+0x55/0x75
[   96.230054]  __lock_acquire.isra.30+0x97/0x595
[   96.230057]  lock_acquire+0x105/0x12e
[   96.230060]  ? devres_for_each_res+0x41/0xc4
[   96.230064]  ? __fw_entry_found+0x3b/0x3b
[   96.230068]  _raw_spin_lock_irqsave+0x3d/0x74
[   96.230071]  ? devres_for_each_res+0x41/0xc4
[   96.230073]  ? kref_get+0xa/0xa
[   96.230075]  ? alloc_fw_cache_entry+0x4e/0x4e
[   96.230077]  devres_for_each_res+0x41/0xc4
[   96.230096]  dev_cache_fw_image+0x59/0x11d
[   96.230098]  ? fw_pm_notify+0xd1/0xd1
[   96.230102]  dpm_for_each_dev+0x41/0x58
[   96.230104]  fw_pm_notify+0xac/0xd1
[   96.230108]  notifier_call_chain+0x39/0x5a
[   96.230127]  __blocking_notifier_call_chain+0x4e/0x65
[   96.230130]  __pm_notifier_call_chain+0x1b/0x2f
[   96.230133]  pm_suspend+0x15b/0x2c1
[   96.230135]  state_store+0x4b/0x7e
[   96.230140]  kernfs_fop_write+0x114/0x15c
[   96.230145]  __vfs_write+0x33/0xd7
[   96.230148]  ? __sb_start_write+0x94/0x180
[   96.230150]  ? __sb_start_write+0xc0/0x180
[   96.230153]  vfs_write+0xa5/0xe2
[   96.230156]  SyS_write+0x5f/0xa3
[   96.230160]  do_syscall_64+0x79/0x88
[   96.230164]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   96.230166] RIP: 0033:0x7fac7a914468
[   96.230168] RSP: 002b:00007ffea25554e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   96.230171] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fac7a914468
[   96.230172] RDX: 0000000000000004 RSI: 0000558679eb4500 RDI: 0000000000000001
[   96.230174] RBP: 0000558679eb4500 R08: 000000000000000a R09: 0000558679eeb6b0
[   96.230175] R10: 00007fac7a9a5b20 R11: 0000000000000246 R12: 0000000000000004
[   96.230177] R13: 0000000000000001 R14: 00007fac7abe1740 R15: 0000000000000004
[   96.230183] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[   96.230189] IP: devres_for_each_res+0x59/0xc4
[   96.230191] PGD 0 P4D 0 
[   96.230196] Oops: 0000 [#1] PREEMPT SMP PTI
[   96.230199] Modules linked in:
[   96.230203] CPU: 39 PID: 4506 Comm: pm-suspend Not tainted 4.16.0-rc7+ #2
[   96.230205] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3407 03/10/2017
[   96.230208] RIP: 0010:devres_for_each_res+0x59/0xc4
[   96.230210] RSP: 0018:ffffa63689fabc70 EFLAGS: 00010086
[   96.230214] RAX: 0000000000000000 RBX: ffff9fd574d4ad98 RCX: ffff9fd574d4b1f0
[   96.230216] RDX: ffff9fd576ea0918 RSI: ffff9fd574d4b1c0 RDI: ffff9fd576ea0918
[   96.230218] RBP: ffffffffa8951b3a R08: 0000000000000000 R09: 0000000000000000
[   96.230221] R10: ffff9fd576ea0000 R11: ffffffffab22aa07 R12: ffffffffa8951f44
[   96.230223] R13: dead000000000100 R14: ffffffffa8951c08 R15: ffff9fd574d4b1a8
[   96.230226] FS:  00007fac7b021740(0000) GS:ffff9fe57f400000(0000) knlGS:0000000000000000
[   96.230229] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   96.230231] CR2: 0000000000000008 CR3: 000000202a4a8002 CR4: 00000000003606e0
[   96.230233] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   96.230236] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   96.230238] Call Trace:
[   96.230243]  dev_cache_fw_image+0x59/0x11d
[   96.230247]  ? fw_pm_notify+0xd1/0xd1
[   96.230250]  dpm_for_each_dev+0x41/0x58
[   96.230253]  fw_pm_notify+0xac/0xd1
[   96.230256]  notifier_call_chain+0x39/0x5a
[   96.230261]  __blocking_notifier_call_chain+0x4e/0x65
[   96.230264]  __pm_notifier_call_chain+0x1b/0x2f
[   96.230267]  pm_suspend+0x15b/0x2c1
[   96.230271]  state_store+0x4b/0x7e
[   96.230275]  kernfs_fop_write+0x114/0x15c
[   96.230279]  __vfs_write+0x33/0xd7
[   96.230284]  ? __sb_start_write+0x94/0x180
[   96.230286]  ? __sb_start_write+0xc0/0x180
[   96.230290]  vfs_write+0xa5/0xe2
[   96.230295]  SyS_write+0x5f/0xa3
[   96.230299]  do_syscall_64+0x79/0x88
[   96.230303]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   96.230305] RIP: 0033:0x7fac7a914468
[   96.230308] RSP: 002b:00007ffea25554e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   96.230311] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fac7a914468
[   96.230313] RDX: 0000000000000004 RSI: 0000558679eb4500 RDI: 0000000000000001
[   96.230316] RBP: 0000558679eb4500 R08: 000000000000000a R09: 0000558679eeb6b0
[   96.230318] R10: 00007fac7a9a5b20 R11: 0000000000000246 R12: 0000000000000004
[   96.230320] R13: 0000000000000001 R14: 00007fac7abe1740 R15: 0000000000000004
[   96.230326] Code: 48 83 ec 28 48 89 4c 24 18 4c 89 4c 24 20 e8 ca 7d 4b 00 48 8d 8b 58 04 00 00 48 89 44 24 08 48 8b 83 60 04 00 00 48 89 4c 24 10 <4c> 8b 68 08 48 3b 44 24 10 74 44 4c 3b 70 10 75 35 48 83 c0 28
[   96.230390] RIP: devres_for_each_res+0x59/0xc4 RSP: ffffa63689fabc70
[   96.230392] CR2: 0000000000000008
[   96.230395] ---[ end trace 43f33fa700a0efa9 ]---
[   96.233392] note: pm-suspend[4506] exited with preempt_count 1

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
  2018-01-29  6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
                   ` (18 preceding siblings ...)
  2018-04-01  9:51 ` bugzilla-daemon
@ 2018-04-02 20:49 ` bugzilla-daemon
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-04-02 20:49 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1122 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=104825

Harry Wentland <harry.wentland@amd.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #18 from Harry Wentland <harry.wentland@amd.com> ---
Thanks for your bug report, and for the constant testing and feedback. Since the
original issue is fixed in drm-next-4.17-wip, I'll mark this resolved. If I'm
jumping the gun here and you still notice problems, feel free to reopen.

The suspend/resume issue you mention seems to be something else and might be
best tracked with a new bug report, possibly on kernel.org as I don't see an
indication that this is graphics driver related.

As for back-porting the changes, AFAIK this bug report uncovered quite a few
issues which were all fixed with separate commits. I don't think all of them
have been pulled into 4.16.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-04-02 20:49 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-29  6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
2018-01-29 15:47 ` bugzilla-daemon
2018-01-30  8:00 ` bugzilla-daemon
2018-01-31  7:21 ` bugzilla-daemon
2018-02-27  0:09 ` bugzilla-daemon
2018-02-27 10:55 ` bugzilla-daemon
2018-02-27 11:48 ` bugzilla-daemon
2018-02-27 14:51 ` bugzilla-daemon
2018-02-27 15:20 ` bugzilla-daemon
2018-02-27 15:30 ` bugzilla-daemon
2018-02-27 16:17 ` bugzilla-daemon
2018-02-27 18:10 ` bugzilla-daemon
2018-02-27 18:12 ` bugzilla-daemon
2018-02-27 19:01 ` bugzilla-daemon
2018-02-27 19:17 ` bugzilla-daemon
2018-03-27 18:02 ` bugzilla-daemon
2018-03-27 20:20 ` bugzilla-daemon
2018-03-29 16:07 ` bugzilla-daemon
2018-03-29 19:55 ` bugzilla-daemon
2018-04-01  9:51 ` bugzilla-daemon
2018-04-02 20:49 ` bugzilla-daemon
