* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
@ 2018-01-29 6:57 bugzilla-daemon
2018-01-29 15:47 ` bugzilla-daemon
` (19 more replies)
0 siblings, 20 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-01-29 6:57 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 5646 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
Bug ID: 104825
Summary: [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled
failed (scratch(0xC040)=0x00000000) when unbinding
Product: DRI
Version: XOrg git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: mlen@mlen.pl
I use two amdgpu rx480 cards. During boot one of them is rebound to vfio-pci
driver using the following script:
pci_ids=("0000:03:00.0" "0000:03:00.1")
for id in "${pci_ids[@]}"; do
vendor="$(cat "/sys/bus/pci/devices/$id/vendor")"
device="$(cat "/sys/bus/pci/devices/$id/device")"
if [ -e "/sys/bus/pci/devices/$id/driver/unbind" ]; then
echo "$id" >"/sys/bus/pci/devices/$id/driver/unbind"
fi
echo "$vendor $device" >/sys/bus/pci/drivers/vfio-pci/new_id
done
Starting from Linux 4.15 with amdgpu DC enabled (I wanted to use it for HDMI
audio), unbind operation causes general protection failure:
[ 68.011473] [drm] amdgpu: finishing device.
[ 68.377945] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 68.575193] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 68.770107] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 68.971775] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 69.164265] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 69.350089] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 69.538302] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 69.729260] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 69.729733] general protection fault: 0000 [#1] PREEMPT SMP PTI
[ 69.730901] Modules linked in:
[ 69.731936] CPU: 2 PID: 3934 Comm: openrc-run.sh Not tainted 4.15.0-gentoo
#2
[ 69.733009] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS,
BIOS 3407 03/10/2017
[ 69.734240] RIP: 0010:dm_read_reg_func.isra.0+0x3/0xc
[ 69.735314] RSP: 0018:ffffa80d8bd4fc40 EFLAGS: 00010282
[ 69.736353] RAX: ccea607dac10c354 RBX: ffff95af35bfca80 RCX:
0000000180200008
[ 69.737408] RDX: 0000000180200009 RSI: 0000000000005c02 RDI:
ffff95af362b91c0
[ 69.738494] RBP: ffff95af35b75c90 R08: 0000000000000001 R09:
ffffffffb30fced2
[ 69.739561] R10: ffff95af2f770a40 R11: 00000000ffffff80 R12:
ffff95af35f30100
[ 69.740638] R13: 0000000000000000 R14: ffff95af358a3c20 R15:
ffff95bf2d5f9c20
[ 69.742007] FS: 00007fe7530af740(0000) GS:ffff95af3da00000(0000)
knlGS:0000000000000000
[ 69.743083] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 69.744134] CR2: 000000c420870008 CR3: 0000001027426003 CR4:
00000000003606e0
[ 69.745190] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 69.746246] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 69.747320] Call Trace:
[ 69.748359] destroy+0x21/0x9c
[ 69.749395] dal_i2caux_destruct+0x6f/0xab
[ 69.750459] destroy+0x15/0x27
[ 69.751495] dal_i2caux_destroy+0x26/0x2f
[ 69.752537] destruct+0x86/0xfd
[ 69.753571] dc_destroy+0x11/0x22
[ 69.754626] dm_hw_fini+0x1e/0x22
[ 69.755632] amdgpu_fini+0xf3/0x2d6
[ 69.756616] amdgpu_device_fini+0x5c/0x158
[ 69.757597] amdgpu_driver_unload_kms+0x6b/0x7e
[ 69.758601] drm_dev_unregister+0x4c/0xc6
[ 69.759584] amdgpu_pci_remove+0x19/0x37
[ 69.760576] pci_device_remove+0x3b/0x8b
[ 69.761563] device_release_driver_internal+0x125/0x1f9
[ 69.762580] unbind_store+0x60/0x90
[ 69.763568] kernfs_fop_write+0x111/0x159
[ 69.764557] __vfs_write+0x33/0xd7
[ 69.765543] ? preempt_count_sub+0x8b/0x94
[ 69.766552] ? __sb_start_write+0xc0/0x180
[ 69.767525] vfs_write+0xa5/0xe2
[ 69.768490] SyS_write+0x5f/0xa3
[ 69.769439] do_syscall_64+0x72/0x81
[ 69.770419] entry_SYSCALL64_slow_path+0x25/0x25
[ 69.771373] RIP: 0033:0x7fe7529a1408
[ 69.772324] RSP: 002b:00007ffe1885c200 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 69.773300] RAX: ffffffffffffffda RBX: 000000000000000d RCX:
00007fe7529a1408
[ 69.774290] RDX: 000000000000000d RSI: 0000558994641890 RDI:
0000000000000001
[ 69.775290] RBP: 0000558994641890 R08: 000000000000000a R09:
00005589946475f0
[ 69.776260] R10: 000000000000009b R11: 0000000000000246 R12:
000000000000000d
[ 69.777248] R13: 0000000000000001 R14: 00007fe752c6e740 R15:
000000000000000d
[ 69.778244] Code: f7 74 24 04 89 43 5c 48 8b 4c 24 08 65 48 33 0c 25 28 00
00 00 4c 89 e0 74 05 e8 39 58 9e ff 48 83 c4 10 5b 5d 41 5c c3 48 8b 07 <48> 8b
40 30 e9 03 4a 70 00 0f 1f 44 00 00 48 8b 47 30 8b 70 04
[ 69.779406] RIP: dm_read_reg_func.isra.0+0x3/0xc RSP: ffffa80d8bd4fc40
[ 69.780491] ---[ end trace 6aa4681ba3a43ec3 ]---
[ 71.815503] [drm:amdgpu_fill_buffer] *ERROR* Trying to clear memory with
ring turned off.
[ 71.899258] amdgpu 0000:03:00.0: vgaarb: changed VGA decodes:
olddecodes=io+mem,decodes=none:owns=none
[ 71.899263] amdgpu 0000:02:00.0: vgaarb: changed VGA decodes:
olddecodes=io+mem,decodes=none:owns=none
[ 72.217918] [drm:amdgpu_fill_buffer] *ERROR* Trying to clear memory with
ring turned off.
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 7102 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
@ 2018-01-29 15:47 ` bugzilla-daemon
2018-01-30 8:00 ` bugzilla-daemon
` (18 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-01-29 15:47 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 568 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #1 from Harry Wentland <harry.wentland@amd.com> ---
This patch https://patchwork.freedesktop.org/patch/198719/ should fix it, but
there could be some other issues as well.
amd-staging-drm-next has fixes for a whole bunch of driver unload issues,
including what you're seeing. It's hosted at
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
Can you try the patch and/or amd-staging-drm-next?
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1610 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
2018-01-29 15:47 ` bugzilla-daemon
@ 2018-01-30 8:00 ` bugzilla-daemon
2018-01-31 7:21 ` bugzilla-daemon
` (17 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-01-30 8:00 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 229 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #2 from mlen <mlen@mlen.pl> ---
I'll try running amd-staging-drm-next later today
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1105 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
2018-01-29 15:47 ` bugzilla-daemon
2018-01-30 8:00 ` bugzilla-daemon
@ 2018-01-31 7:21 ` bugzilla-daemon
2018-02-27 0:09 ` bugzilla-daemon
` (16 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-01-31 7:21 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 4610 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #3 from mlen <mlen@mlen.pl> ---
I tested amd-staging-drm-next with HEAD at
f1367d12f5fabb04789c7772594887434c8d9e8b. This time the unbind succeeded, but
there are still some errors logged and kernel reports locking problem in
amdgpu:
[ 77.098923] [drm] amdgpu: finishing device.
[ 77.458614] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 77.481247] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
[ 77.653815] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 77.845085] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 77.855055] IPv6: ADDRCONF(NETDEV_CHANGE): virbr10: link becomes ready
[ 78.036695] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 78.233244] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 78.425058] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 78.616635] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 78.808323] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
(scratch(0xC040)=0x00000000)
[ 78.810659] amdgpu 0000:03:00.0: 00000000a667dd57 unpin not necessary
[ 78.810672] amdgpu 0000:03:00.0: 00000000a7594a2b unpin not necessary
[ 78.811733] =====================================
[ 78.813109] WARNING: bad unlock balance detected!
[ 78.813947] 4.15.0-rc4+ #2 Not tainted
[ 78.814835] -------------------------------------
[ 78.815731] openrc-run.sh/3931 is trying to release lock
(&(&mgr->lock)->rlock) at:
[ 78.816646] [<000000006fd39549>] amdgpu_gtt_mgr_fini+0x22/0x37
[ 78.817531] but there are no more locks to release!
[ 78.818446]
other info that might help us debug this:
[ 78.820208] 5 locks held by openrc-run.sh/3931:
[ 78.821127] #0: (sb_writers#6){....}, at: [<00000000322e5044>]
vfs_write+0x87/0xe2
[ 78.822051] #1: (&of->mutex){....}, at: [<00000000660270c4>]
kernfs_fop_write+0xca/0x156
[ 78.823007] #2: (kn->count#211){....}, at: [<000000000634dafb>]
kernfs_fop_write+0xd2/0x156
[ 78.823936] #3: (&dev->mutex){....}, at: [<00000000c386f49f>]
unbind_store+0x58/0x90
[ 78.824912] #4: (&dev->mutex){....}, at: [<00000000eefcc37f>]
device_release_driver_internal+0x2f/0x1f3
[ 78.825861]
stack backtrace:
[ 78.827764] CPU: 7 PID: 3931 Comm: openrc-run.sh Not tainted 4.15.0-rc4+ #2
[ 78.828747] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS,
BIOS 3407 03/10/2017
[ 78.829718] Call Trace:
[ 78.830717] dump_stack+0x67/0x8e
[ 78.831689] ? amdgpu_gtt_mgr_fini+0x22/0x37
[ 78.832687] print_unlock_imbalance_bug+0xcc/0xd3
[ 78.833657] lock_release+0x134/0x267
[ 78.834646] ? _raw_spin_unlock+0x2e/0x40
[ 78.835605] _raw_spin_unlock+0x1c/0x40
[ 78.836586] amdgpu_gtt_mgr_fini+0x22/0x37
[ 78.837549] ttm_bo_clean_mm+0x79/0xab
[ 78.838544] amdgpu_ttm_fini+0x75/0x11c
[ 78.839507] amdgpu_bo_fini+0xe/0x2d
[ 78.840495] gmc_v8_0_sw_fini+0x2e/0x49
[ 78.841454] amdgpu_device_ip_fini+0x21f/0x2d3
[ 78.842439] amdgpu_device_fini+0x4c/0x125
[ 78.843394] amdgpu_driver_unload_kms+0x63/0x76
[ 78.844373] drm_dev_unregister+0x49/0xc3
[ 78.845318] amdgpu_pci_remove+0x19/0x37
[ 78.846244] pci_device_remove+0x36/0x86
[ 78.847190] device_release_driver_internal+0x122/0x1f3
[ 78.848120] unbind_store+0x60/0x90
[ 78.849069] kernfs_fop_write+0x10e/0x156
[ 78.849997] __vfs_write+0x31/0xcc
[ 78.850937] ? preempt_count_sub+0x8b/0x94
[ 78.851871] ? __sb_start_write+0xc0/0x180
[ 78.852828] vfs_write+0xa5/0xe2
[ 78.853755] SyS_write+0x5f/0xa3
[ 78.854708] do_syscall_64+0x6c/0x7b
[ 78.855630] entry_SYSCALL64_slow_path+0x25/0x25
[ 78.856583] RIP: 0033:0x7fc5804b6408
[ 78.857511] RSP: 002b:00007ffd95228060 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 78.858484] RAX: ffffffffffffffda RBX: 000000000000000d RCX:
00007fc5804b6408
[ 78.859438] RDX: 000000000000000d RSI: 000055b229e9e890 RDI:
0000000000000001
[ 78.860419] RBP: 000055b229e9e890 R08: 000000000000000a R09:
000055b229ea45f0
[ 78.861379] R10: 000000000000009b R11: 0000000000000246 R12:
000000000000000d
[ 78.862364] R13: 0000000000000001 R14: 00007fc580783740 R15:
000000000000000d
[ 78.863411] [drm] amdgpu: ttm finalized
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 5560 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (2 preceding siblings ...)
2018-01-31 7:21 ` bugzilla-daemon
@ 2018-02-27 0:09 ` bugzilla-daemon
2018-02-27 10:55 ` bugzilla-daemon
` (15 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-27 0:09 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 274 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #4 from Harry Wentland <harry.wentland@amd.com> ---
Good to hear DC issues are gone. Not sure about the unlock balance myself.
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1170 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (3 preceding siblings ...)
2018-02-27 0:09 ` bugzilla-daemon
@ 2018-02-27 10:55 ` bugzilla-daemon
2018-02-27 11:48 ` bugzilla-daemon
` (14 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-27 10:55 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 384 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #5 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Hi, can you please provide full dmesg for the unbind sequence ?
So the issue happens when you unbind the card to from the host driver and
before binding it to VFIO ?
Andrey
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1286 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (4 preceding siblings ...)
2018-02-27 10:55 ` bugzilla-daemon
@ 2018-02-27 11:48 ` bugzilla-daemon
2018-02-27 14:51 ` bugzilla-daemon
` (13 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-27 11:48 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 347 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #6 from mlen <mlen@mlen.pl> ---
Currently I don't have access to that machine, I should be able to test next
week. I'll extend the script to be more verbose and writing some delimiters to
/dev/kmsg
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1223 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (5 preceding siblings ...)
2018-02-27 11:48 ` bugzilla-daemon
@ 2018-02-27 14:51 ` bugzilla-daemon
2018-02-27 15:20 ` bugzilla-daemon
` (12 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-27 14:51 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 821 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
Andrey Grodzovsky <andrey.grodzovsky@amd.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |andrey.grodzovsky@amd.com
--- Comment #7 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
On Carrizo unbinding the driver will indeed show [drm:gfx_v8_0_hw_fini
[amdgpu]] *ERROR* KCQ disabled failed (scratch(0xC040)=0xCAFEDEAD), I don't
have LOCKDEP related kflags enabled in kernel for some reason so that probably
why I don't see the locking imbalance warning, will rebuild and check.
What card model are you using ?
Andrey
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 2334 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (6 preceding siblings ...)
2018-02-27 14:51 ` bugzilla-daemon
@ 2018-02-27 15:20 ` bugzilla-daemon
2018-02-27 15:30 ` bugzilla-daemon
` (11 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-27 15:20 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 452 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
Andrey Grodzovsky <andrey.grodzovsky@amd.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|dri-devel@lists.freedesktop |andrey.grodzovsky@amd.com
|.org |
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1192 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (7 preceding siblings ...)
2018-02-27 15:20 ` bugzilla-daemon
@ 2018-02-27 15:30 ` bugzilla-daemon
2018-02-27 16:17 ` bugzilla-daemon
` (10 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-27 15:30 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 458 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
Andrey Grodzovsky <andrey.grodzovsky@amd.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|andrey.grodzovsky@amd.com |dri-devel@lists.freedesktop
| |.org
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1192 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (8 preceding siblings ...)
2018-02-27 15:30 ` bugzilla-daemon
@ 2018-02-27 16:17 ` bugzilla-daemon
2018-02-27 18:10 ` bugzilla-daemon
` (9 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-27 16:17 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 242 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #8 from mlen <mlen@mlen.pl> ---
I'm using two RX 480 cards, but I'm rebinding only one of them
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1118 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (9 preceding siblings ...)
2018-02-27 16:17 ` bugzilla-daemon
@ 2018-02-27 18:10 ` bugzilla-daemon
2018-02-27 18:12 ` bugzilla-daemon
` (8 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-27 18:10 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 663 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #9 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Created attachment 137652
--> https://bugs.freedesktop.org/attachment.cgi?id=137652&action=edit
DAL warnings during driver unbind
Harry, with DAL enabled I observe warnings and unbind can't complete -
to reproduce under root do
cd /sys/bus/pci/drivers/amdgpu/
echo 'pci_id' > unbind where pci_id is seen when doing ls -l on this folder and
seeing soft link like this
0000:2a:00.0 -> ../../../../devices/pci0000:00/0000:00:03.1/0000:2a:00.0/
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1729 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (10 preceding siblings ...)
2018-02-27 18:10 ` bugzilla-daemon
@ 2018-02-27 18:12 ` bugzilla-daemon
2018-02-27 19:01 ` bugzilla-daemon
` (7 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-27 18:12 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 410 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #10 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
P.S I am using CZ or Ellsmire with tip of amd-staging-drm-next. Going to
disable DAL for now to debug the KCQ ring failure.
P.S After enabling lockdep i still don't see any locking related warnings.
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1313 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (11 preceding siblings ...)
2018-02-27 18:12 ` bugzilla-daemon
@ 2018-02-27 19:01 ` bugzilla-daemon
2018-02-27 19:17 ` bugzilla-daemon
` (6 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-27 19:01 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 426 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #11 from Harry Wentland <harry.wentland@amd.com> ---
Created attachment 137662
--> https://bugs.freedesktop.org/attachment.cgi?id=137662&action=edit
[PATCH] drm/amd/display: Use atomic crtc_disable for DC on shutdown
Andrey, can you see if this fixes the warning for you?
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1635 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (12 preceding siblings ...)
2018-02-27 19:01 ` bugzilla-daemon
@ 2018-02-27 19:17 ` bugzilla-daemon
2018-03-27 18:02 ` bugzilla-daemon
` (5 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-27 19:17 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 7752 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #12 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to Harry Wentland from comment #11)
> Created attachment 137662 [details] [review]
> [PATCH] drm/amd/display: Use atomic crtc_disable for DC on shutdown
>
> Andrey, can you see if this fixes the warning for you?
Get use after free now
[ 82.400097 < 0.000387>] BUG: KASAN: use-after-free in
amdgpu_dm_set_pflip_irq_state+0x3d/0xa0 [amdgpu]
[ 82.400185 < 0.000088>] Read of size 4 at addr ffff88008f53ee94 by task
bash/1178
[ 82.400302 < 0.000117>] CPU: 0 PID: 1178 Comm: bash Tainted: G W
OE 4.16.0-rc1.main+ #14
[ 82.400308 < 0.000006>] Hardware name: AMD Gardenia/Gardenia, BIOS
RGA1101C 07/20/2015
[ 82.400312 < 0.000004>] Call Trace:
[ 82.400329 < 0.000017>] dump_stack+0x5c/0x78
[ 82.400342 < 0.000013>] print_address_description+0xd1/0x270
[ 82.400618 < 0.000276>] ? amdgpu_dm_set_pflip_irq_state+0x3d/0xa0
[amdgpu]
[ 82.400627 < 0.000009>] kasan_report+0x260/0x360
[ 82.400913 < 0.000286>] amdgpu_dm_set_pflip_irq_state+0x3d/0xa0 [amdgpu]
[ 82.401189 < 0.000276>] amdgpu_irq_disable_all+0x111/0x190 [amdgpu]
[ 82.401452 < 0.000263>] amdgpu_device_ip_fini+0x1b7/0x610 [amdgpu]
[ 82.401718 < 0.000266>] amdgpu_device_fini+0xa1/0x320 [amdgpu]
[ 82.401973 < 0.000255>] amdgpu_driver_unload_kms+0x6a/0xd0 [amdgpu]
[ 82.402059 < 0.000086>] drm_dev_unregister+0x79/0x180 [drm]
[ 82.402315 < 0.000256>] amdgpu_pci_remove+0x2a/0x60 [amdgpu]
[ 82.402331 < 0.000016>] pci_device_remove+0x5b/0x100
[ 82.402347 < 0.000016>] device_release_driver_internal+0x1da/0x300
[ 82.402363 < 0.000016>] unbind_store+0x143/0x190
[ 82.402376 < 0.000013>] ? sysfs_file_ops+0xa0/0xa0
[ 82.402386 < 0.000010>] kernfs_fop_write+0x186/0x220
[ 82.402402 < 0.000016>] __vfs_write+0xb9/0x2e0
[ 82.402412 < 0.000010>] ? locks_remove_posix+0x87/0x220
[ 82.402421 < 0.000009>] ? kernel_read+0xa0/0xa0
[ 82.402430 < 0.000009>] ? find_held_lock+0xfb/0x130
[ 82.402441 < 0.000011>] ? __lock_acquire.isra.30+0x414/0xb00
[ 82.402465 < 0.000024>] ? vfs_write+0x227/0x250
[ 82.402485 < 0.000020>] ? __sb_start_write+0xc3/0x1a0
[ 82.402492 < 0.000007>] ? vfs_write+0x227/0x250
[ 82.402506 < 0.000014>] vfs_write+0xe6/0x250
[ 82.402522 < 0.000016>] SyS_write+0xa1/0x120
[ 82.402532 < 0.000010>] ? SyS_read+0x120/0x120
[ 82.402547 < 0.000015>] ? vtime_user_exit+0xc8/0xe0
[ 82.402558 < 0.000011>] ? SyS_read+0x120/0x120
[ 82.402570 < 0.000012>] do_syscall_64+0xf0/0x270
[ 82.402588 < 0.000018>] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 82.402597 < 0.000009>] RIP: 0033:0x7fd050bc32c0
[ 82.402603 < 0.000006>] RSP: 002b:00007ffc06b7f3b8 EFLAGS: 00000246
ORIG_RAX: 0000000000000001
[ 82.402615 < 0.000012>] RAX: ffffffffffffffda RBX: 000000000000000d RCX:
00007fd050bc32c0
[ 82.402620 < 0.000005>] RDX: 000000000000000d RSI: 0000000001a6e408 RDI:
0000000000000001
[ 82.402626 < 0.000006>] RBP: 0000000001a6e408 R08: 00007fd050e92780 R09:
00007fd0514d9700
[ 82.402632 < 0.000006>] R10: 000000000000000c R11: 0000000000000246 R12:
000000000000000d
[ 82.402637 < 0.000005>] R13: 0000000000000001 R14: 00007fd050e91620 R15:
0000000000000000
[ 82.402711 < 0.000074>] Allocated by task 1084:
[ 82.402771 < 0.000060>] kasan_kmalloc+0xa6/0xd0
[ 82.402780 < 0.000009>] kmem_cache_alloc_trace+0x13a/0x270
[ 82.403079 < 0.000299>] dm_hw_init+0x898/0x1660 [amdgpu]
[ 82.403338 < 0.000259>] amdgpu_device_init+0x1a97/0x2100 [amdgpu]
[ 82.403596 < 0.000258>] amdgpu_driver_load_kms+0xa8/0x3a0 [amdgpu]
[ 82.403673 < 0.000077>] drm_dev_register+0x1d5/0x2f0 [drm]
[ 82.403931 < 0.000258>] amdgpu_pci_probe+0x1bf/0x290 [amdgpu]
[ 82.403941 < 0.000010>] local_pci_probe+0x74/0xe0
[ 82.403951 < 0.000010>] pci_device_probe+0x1dc/0x2d0
[ 82.403970 < 0.000019>] driver_probe_device+0x40e/0x6b0
[ 82.403977 < 0.000007>] __driver_attach+0x11d/0x130
[ 82.403984 < 0.000007>] bus_for_each_dev+0xd8/0x140
[ 82.403990 < 0.000006>] bus_add_driver+0x31d/0x3a0
[ 82.403998 < 0.000008>] driver_register+0xc6/0x170
[ 82.404006 < 0.000008>] do_one_initcall+0x82/0x1d0
[ 82.404012 < 0.000006>] do_init_module+0xe7/0x333
[ 82.404020 < 0.000008>] load_module+0x41b3/0x4c40
[ 82.404028 < 0.000008>] SYSC_finit_module+0x14d/0x180
[ 82.404036 < 0.000008>] do_syscall_64+0xf0/0x270
[ 82.404044 < 0.000008>] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 82.404095 < 0.000051>] Freed by task 1178:
[ 82.404152 < 0.000057>] __kasan_slab_free+0x124/0x170
[ 82.404159 < 0.000007>] kfree+0xd4/0x200
[ 82.404239 < 0.000080>] drm_mode_config_cleanup+0x241/0x450 [drm]
[ 82.404536 < 0.000297>] amdgpu_dm_fini+0x29/0xb0 [amdgpu]
[ 82.404834 < 0.000298>] dm_hw_fini+0x1e/0x30 [amdgpu]
[ 82.405091 < 0.000257>] amdgpu_device_ip_fini+0x157/0x610 [amdgpu]
[ 82.405349 < 0.000258>] amdgpu_device_fini+0xa1/0x320 [amdgpu]
[ 82.405607 < 0.000258>] amdgpu_driver_unload_kms+0x6a/0xd0 [amdgpu]
[ 82.405684 < 0.000077>] drm_dev_unregister+0x79/0x180 [drm]
[ 82.405941 < 0.000257>] amdgpu_pci_remove+0x2a/0x60 [amdgpu]
[ 82.405949 < 0.000008>] pci_device_remove+0x5b/0x100
[ 82.405957 < 0.000008>] device_release_driver_internal+0x1da/0x300
[ 82.405963 < 0.000006>] unbind_store+0x143/0x190
[ 82.405971 < 0.000008>] kernfs_fop_write+0x186/0x220
[ 82.405978 < 0.000007>] __vfs_write+0xb9/0x2e0
[ 82.405985 < 0.000007>] vfs_write+0xe6/0x250
[ 82.405991 < 0.000006>] SyS_write+0xa1/0x120
[ 82.405998 < 0.000007>] do_syscall_64+0xf0/0x270
[ 82.406007 < 0.000009>] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 82.406057 < 0.000050>] The buggy address belongs to the object at
ffff88008f53e600
which belongs to the cache kmalloc-4096 of size
4096
[ 82.406163 < 0.000106>] The buggy address is located 2196 bytes inside of
4096-byte region [ffff88008f53e600,
ffff88008f53f600)
[ 82.406262 < 0.000099>] The buggy address belongs to the page:
[ 82.406326 < 0.000064>] page:ffffea00023d4e00 count:1 mapcount:0 mapping:
(null) index:0x0 compound_mapcount: 0
[ 82.406424 < 0.000098>] flags: 0x1ffff0000008100(slab|head)
[ 82.406488 < 0.000064>] raw: 01ffff0000008100 0000000000000000
0000000000000000 0000000100070007
[ 82.406571 < 0.000083>] raw: dead000000000100 dead000000000200
ffff880102802600 0000000000000000
[ 82.406649 < 0.000078>] page dumped because: kasan: bad access detected
[ 82.406754 < 0.000105>] Memory state around the buggy address:
[ 82.406816 < 0.000062>] ffff88008f53ed80: fb fb fb fb fb fb fb fb fb fb
fb fb fb fb fb fb
[ 82.406893 < 0.000077>] ffff88008f53ee00: fb fb fb fb fb fb fb fb fb fb
fb fb fb fb fb fb
[ 82.406968 < 0.000075>] >ffff88008f53ee80: fb fb fb fb fb fb fb fb fb fb
fb fb fb fb fb fb
[ 82.407036 < 0.000068>] ^
[ 82.407087 < 0.000051>] ffff88008f53ef00: fb fb fb fb fb fb fb fb fb fb
fb fb fb fb fb fb
[ 82.407157 < 0.000070>] ffff88008f53ef80: fb fb fb fb fb fb fb fb fb fb
fb fb fb fb fb fb
[ 82.407226 < 0.000069>]
==================================================================
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 9714 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (13 preceding siblings ...)
2018-02-27 19:17 ` bugzilla-daemon
@ 2018-03-27 18:02 ` bugzilla-daemon
2018-03-27 20:20 ` bugzilla-daemon
` (4 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-03-27 18:02 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 406 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #13 from Harry Wentland <harry.wentland@amd.com> ---
This should be fixed in drm-next-4.17-wip and amd-staging-drm-next. Can someone
test if this resolves this ticket satisfactorily?
Both branches can be found on https://cgit.freedesktop.org/~agd5f/linux
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1359 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (14 preceding siblings ...)
2018-03-27 18:02 ` bugzilla-daemon
@ 2018-03-27 20:20 ` bugzilla-daemon
2018-03-29 16:07 ` bugzilla-daemon
` (3 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-03-27 20:20 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 696 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #14 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to Harry Wentland from comment #13)
> This should be fixed in drm-next-4.17-wip and amd-staging-drm-next. Can
> someone test if this resolves this ticket satisfactorily?
>
> Both branches can be found on https://cgit.freedesktop.org/~agd5f/linux
Yes, DAL issues are gone, also gone KIQ ring error on unbind.
Still there is a KIQ ring error on rebind which I am investigating now + there
is SDMA ring IB test failure which I will get to after KIQ is resloved.
Andrey
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1736 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (15 preceding siblings ...)
2018-03-27 20:20 ` bugzilla-daemon
@ 2018-03-29 16:07 ` bugzilla-daemon
2018-03-29 19:55 ` bugzilla-daemon
` (2 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-03-29 16:07 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 1605 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #15 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to mlen from comment #3)
> I tested amd-staging-drm-next with HEAD at
> f1367d12f5fabb04789c7772594887434c8d9e8b. This time the unbind succeeded,
> but there are still some errors logged and kernel reports locking problem in
> amdgpu:
>
> [ 77.098923] [drm] amdgpu: finishing device.
> [ 77.458614] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
> [ 77.481247] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
> [ 77.653815] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
> [ 77.845085] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
> [ 77.855055] IPv6: ADDRCONF(NETDEV_CHANGE): virbr10: link becomes ready
> [ 78.036695] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
> [ 78.233244] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
> [ 78.425058] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
> [ 78.616635] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
> [ 78.808323] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed
> (scratch(0xC040)=0x00000000)
Can you retest with latest
https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.17-wip to see if
KCQ related errors are gone ?
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 2729 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (16 preceding siblings ...)
2018-03-29 16:07 ` bugzilla-daemon
@ 2018-03-29 19:55 ` bugzilla-daemon
2018-04-01 9:51 ` bugzilla-daemon
2018-04-02 20:49 ` bugzilla-daemon
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-03-29 19:55 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 256 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #16 from mlen <mlen@mlen.pl> ---
I'll test it on saturday, I don't have access to that machine at the moment
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1133 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (17 preceding siblings ...)
2018-03-29 19:55 ` bugzilla-daemon
@ 2018-04-01 9:51 ` bugzilla-daemon
2018-04-02 20:49 ` bugzilla-daemon
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-04-01 9:51 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 5994 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
--- Comment #17 from mlen <mlen@mlen.pl> ---
No issues on drm-next-4.17-wip, unbinding works without any lockdep issues.
Do you plan to backport these changes to 4.16 or maybe even 4.15?
For the record, suspend fails on drm-next-4.17-wip. This could be possibly
related, but I don't know how to debug this and most likely this is out of
scope of this bug.
[ 96.222095] PM: suspend entry (deep)
[ 96.222099] PM: Syncing filesystems ... done.
[ 96.230020] INFO: trying to register non-static key.
[ 96.230024] the code is fine but needs lockdep annotation.
[ 96.230026] turning off the locking correctness validator.
[ 96.230029] CPU: 39 PID: 4506 Comm: pm-suspend Not tainted 4.16.0-rc7+ #2
[ 96.230031] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS,
BIOS 3407 03/10/2017
[ 96.230033] Call Trace:
[ 96.230040] dump_stack+0x46/0x59
[ 96.230046] register_lock_class+0x192/0x361
[ 96.230050] ? cycles_2_ns+0x55/0x75
[ 96.230054] __lock_acquire.isra.30+0x97/0x595
[ 96.230057] lock_acquire+0x105/0x12e
[ 96.230060] ? devres_for_each_res+0x41/0xc4
[ 96.230064] ? __fw_entry_found+0x3b/0x3b
[ 96.230068] _raw_spin_lock_irqsave+0x3d/0x74
[ 96.230071] ? devres_for_each_res+0x41/0xc4
[ 96.230073] ? kref_get+0xa/0xa
[ 96.230075] ? alloc_fw_cache_entry+0x4e/0x4e
[ 96.230077] devres_for_each_res+0x41/0xc4
[ 96.230096] dev_cache_fw_image+0x59/0x11d
[ 96.230098] ? fw_pm_notify+0xd1/0xd1
[ 96.230102] dpm_for_each_dev+0x41/0x58
[ 96.230104] fw_pm_notify+0xac/0xd1
[ 96.230108] notifier_call_chain+0x39/0x5a
[ 96.230127] __blocking_notifier_call_chain+0x4e/0x65
[ 96.230130] __pm_notifier_call_chain+0x1b/0x2f
[ 96.230133] pm_suspend+0x15b/0x2c1
[ 96.230135] state_store+0x4b/0x7e
[ 96.230140] kernfs_fop_write+0x114/0x15c
[ 96.230145] __vfs_write+0x33/0xd7
[ 96.230148] ? __sb_start_write+0x94/0x180
[ 96.230150] ? __sb_start_write+0xc0/0x180
[ 96.230153] vfs_write+0xa5/0xe2
[ 96.230156] SyS_write+0x5f/0xa3
[ 96.230160] do_syscall_64+0x79/0x88
[ 96.230164] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 96.230166] RIP: 0033:0x7fac7a914468
[ 96.230168] RSP: 002b:00007ffea25554e0 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 96.230171] RAX: ffffffffffffffda RBX: 0000000000000004 RCX:
00007fac7a914468
[ 96.230172] RDX: 0000000000000004 RSI: 0000558679eb4500 RDI:
0000000000000001
[ 96.230174] RBP: 0000558679eb4500 R08: 000000000000000a R09:
0000558679eeb6b0
[ 96.230175] R10: 00007fac7a9a5b20 R11: 0000000000000246 R12:
0000000000000004
[ 96.230177] R13: 0000000000000001 R14: 00007fac7abe1740 R15:
0000000000000004
[ 96.230183] BUG: unable to handle kernel NULL pointer dereference at
0000000000000008
[ 96.230189] IP: devres_for_each_res+0x59/0xc4
[ 96.230191] PGD 0 P4D 0
[ 96.230196] Oops: 0000 [#1] PREEMPT SMP PTI
[ 96.230199] Modules linked in:
[ 96.230203] CPU: 39 PID: 4506 Comm: pm-suspend Not tainted 4.16.0-rc7+ #2
[ 96.230205] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS,
BIOS 3407 03/10/2017
[ 96.230208] RIP: 0010:devres_for_each_res+0x59/0xc4
[ 96.230210] RSP: 0018:ffffa63689fabc70 EFLAGS: 00010086
[ 96.230214] RAX: 0000000000000000 RBX: ffff9fd574d4ad98 RCX:
ffff9fd574d4b1f0
[ 96.230216] RDX: ffff9fd576ea0918 RSI: ffff9fd574d4b1c0 RDI:
ffff9fd576ea0918
[ 96.230218] RBP: ffffffffa8951b3a R08: 0000000000000000 R09:
0000000000000000
[ 96.230221] R10: ffff9fd576ea0000 R11: ffffffffab22aa07 R12:
ffffffffa8951f44
[ 96.230223] R13: dead000000000100 R14: ffffffffa8951c08 R15:
ffff9fd574d4b1a8
[ 96.230226] FS: 00007fac7b021740(0000) GS:ffff9fe57f400000(0000)
knlGS:0000000000000000
[ 96.230229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 96.230231] CR2: 0000000000000008 CR3: 000000202a4a8002 CR4:
00000000003606e0
[ 96.230233] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 96.230236] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 96.230238] Call Trace:
[ 96.230243] dev_cache_fw_image+0x59/0x11d
[ 96.230247] ? fw_pm_notify+0xd1/0xd1
[ 96.230250] dpm_for_each_dev+0x41/0x58
[ 96.230253] fw_pm_notify+0xac/0xd1
[ 96.230256] notifier_call_chain+0x39/0x5a
[ 96.230261] __blocking_notifier_call_chain+0x4e/0x65
[ 96.230264] __pm_notifier_call_chain+0x1b/0x2f
[ 96.230267] pm_suspend+0x15b/0x2c1
[ 96.230271] state_store+0x4b/0x7e
[ 96.230275] kernfs_fop_write+0x114/0x15c
[ 96.230279] __vfs_write+0x33/0xd7
[ 96.230284] ? __sb_start_write+0x94/0x180
[ 96.230286] ? __sb_start_write+0xc0/0x180
[ 96.230290] vfs_write+0xa5/0xe2
[ 96.230295] SyS_write+0x5f/0xa3
[ 96.230299] do_syscall_64+0x79/0x88
[ 96.230303] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 96.230305] RIP: 0033:0x7fac7a914468
[ 96.230308] RSP: 002b:00007ffea25554e0 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 96.230311] RAX: ffffffffffffffda RBX: 0000000000000004 RCX:
00007fac7a914468
[ 96.230313] RDX: 0000000000000004 RSI: 0000558679eb4500 RDI:
0000000000000001
[ 96.230316] RBP: 0000558679eb4500 R08: 000000000000000a R09:
0000558679eeb6b0
[ 96.230318] R10: 00007fac7a9a5b20 R11: 0000000000000246 R12:
0000000000000004
[ 96.230320] R13: 0000000000000001 R14: 00007fac7abe1740 R15:
0000000000000004
[ 96.230326] Code: 48 83 ec 28 48 89 4c 24 18 4c 89 4c 24 20 e8 ca 7d 4b 00
48 8d 8b 58 04 00 00 48 89 44 24 08 48 8b 83 60 04 00 00 48 89 4c 24 10 <4c> 8b
68 08 48 3b 44 24 10 74 44 4c 3b 70 10 75 35 48 83 c0 28
[ 96.230390] RIP: devres_for_each_res+0x59/0xc4 RSP: ffffa63689fabc70
[ 96.230392] CR2: 0000000000000008
[ 96.230395] ---[ end trace 43f33fa700a0efa9 ]---
[ 96.233392] note: pm-suspend[4506] exited with preempt_count 1
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 6877 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
` (18 preceding siblings ...)
2018-04-01 9:51 ` bugzilla-daemon
@ 2018-04-02 20:49 ` bugzilla-daemon
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-04-02 20:49 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 1122 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=104825
Harry Wentland <harry.wentland@amd.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #18 from Harry Wentland <harry.wentland@amd.com> ---
Thanks for your bug report, and constant testing and feedback. Since the
original issue is fixed in drm-next-4.17-wip I'll mark this resolved. If I'm
jumping the gun here and you still notice problems feel free to reopen.
The suspend/resume issue you mention seems to be something else and might be
best tracked with a new bug report, possibly on kernel.org as I don't see an
indication that this is graphics driver related.
As for back-porting the changes, AFAIK this bug report uncovered quite a few
issues which were all fixed with separate commits. I don't think all of them
have been pulled into 4.16.
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 2795 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2018-04-02 20:49 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-29 6:57 [Bug 104825] [amdgpu] [drm:gfx_v8_0_hw_fini] *ERROR* KCQ disabled failed (scratch(0xC040)=0x00000000) when unbinding bugzilla-daemon
2018-01-29 15:47 ` bugzilla-daemon
2018-01-30 8:00 ` bugzilla-daemon
2018-01-31 7:21 ` bugzilla-daemon
2018-02-27 0:09 ` bugzilla-daemon
2018-02-27 10:55 ` bugzilla-daemon
2018-02-27 11:48 ` bugzilla-daemon
2018-02-27 14:51 ` bugzilla-daemon
2018-02-27 15:20 ` bugzilla-daemon
2018-02-27 15:30 ` bugzilla-daemon
2018-02-27 16:17 ` bugzilla-daemon
2018-02-27 18:10 ` bugzilla-daemon
2018-02-27 18:12 ` bugzilla-daemon
2018-02-27 19:01 ` bugzilla-daemon
2018-02-27 19:17 ` bugzilla-daemon
2018-03-27 18:02 ` bugzilla-daemon
2018-03-27 20:20 ` bugzilla-daemon
2018-03-29 16:07 ` bugzilla-daemon
2018-03-29 19:55 ` bugzilla-daemon
2018-04-01 9:51 ` bugzilla-daemon
2018-04-02 20:49 ` bugzilla-daemon
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.