* [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup
2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
@ 2018-02-15 15:58 ` bugzilla-daemon
2018-02-15 16:09 ` [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause " bugzilla-daemon
` (18 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-15 15:58 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 528 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
--- Comment #1 from Vedran Miletić <vedran@miletic.net> ---
Same story with
tests/cl/program/execute/calls-struct.cl
tests/cl/program/execute/calls-workitem-id.cl
tests/cl/program/execute/calls.cl
tests/cl/program/execute/tail-calls.cl
while
tests/cl/program/execute/call-clobbers-amdgcn.cl
gets skipped. All those tests were added in
e408ce1f2bff23121670a8206258c80bb3d9befd.
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 1427 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-02-15 16:09 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 889 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
Vedran Miletić <vedran@miletic.net> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arsenm2@gmail.com
            Summary|[hawaii] Running Piglit     |[hawaii, radeonsi, clover]
                   |cl/program/execute/calls-st |Running Piglit
                   |ruct.cl causes GPU VM error |cl/program/execute/{,tail-}
                   |and ring stalled GPU lockup |calls{,-struct,-workitem-id
                   |                            |}.cl cause GPU VM error and
                   |                            |ring stalled GPU lockup
           Priority|medium                      |high
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-04-29 22:23 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 3096 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
Maciej S. Szmigiero <mail@maciej.szmigiero.name> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mail@maciej.szmigiero.name
--- Comment #2 from Maciej S. Szmigiero <mail@maciej.szmigiero.name> ---
I've also hit this issue on "Oland PRO [Radeon R7 240/340] (rev 87)" with
mesa-18.1.0_rc2, llvm-6.0.0 and kernel 4.16.5.
The crash happens at "cl/program/execute/calls-struct.cl" from piglit as well.
It happens both from an X session and from a KMS console.
The exact crash looks like this:
[ 171.969488] radeon 0000:20:00.0: GPU fault detected: 147 0x06106001
[ 171.969489] radeon 0000:20:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00500030
[ 171.969490] radeon 0000:20:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x10060001
[ 171.969491] VM fault (0x01, vmid 8) at page 5242928, read from CB (96)
Then the radeon driver tries to reset the GPU endlessly.
I've tried pcie_gen2=0, msi=0, dpm=0, hard_reset=1, vm_size=16 in various
combinations, nothing seems to help (msi=0 gives a ton of IOMMU errors, BTW).
I have also tried amdgpu, which gives a similar crash (it looks like this
driver didn't attempt to reset the GPU afterwards):
[ 435.596230] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[ 435.596233] amdgpu 0000:20:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00500060
[ 435.596235] amdgpu 0000:20:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08060002
[ 435.596239] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976, read from '' (0x00000000) (96)
[ 435.596245] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[ 435.596247] amdgpu 0000:20:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00500060
[ 435.596248] amdgpu 0000:20:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08050002
[ 435.596252] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976, read from '' (0x00000000) (80)
[ 435.596256] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[ 435.596258] amdgpu 0000:20:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00500060
[ 435.596260] amdgpu 0000:20:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08010002
[ 435.596263] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976, read from '' (0x00000000) (16)
[ 435.596267] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c085002
[ 435.596269] amdgpu 0000:20:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00500060
[ 435.596271] amdgpu 0000:20:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08050002
[ 435.596274] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976, read from '' (0x00000000) (80)
[ 435.596278] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c085002
This might (also?) be a kernel bug, since a userspace program should not
be able to crash a GPU, regardless of how incorrect a command stream it
sends to it.
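For anyone trying to read the fault lines quoted above, here is a minimal sketch of how the two registers appear to map onto the decoded "VM fault" message. The bit positions are an assumption inferred only from the values in these logs (they reproduce the vmid, the protection byte and the client number printed by both drivers), not copied from the kernel headers; the 4 KiB page size and the names `decode_vm_fault` and `describe` are likewise assumptions made for illustration.

```python
from typing import NamedTuple

# Assumption: the fault address register counts 4 KiB GPU pages.
GPU_PAGE_SIZE = 4096


class VmFault(NamedTuple):
    protections: int   # assumed: low 8 bits of the status register
    client_id: int     # assumed: memory client id in bits 12..19
    is_write: bool     # assumed: read/write flag in bit 24
    vmid: int          # assumed: VM id in bits 25..28
    page: int          # fault address register, in pages
    byte_address: int  # page number scaled by the assumed page size


def decode_vm_fault(addr_reg: int, status_reg: int) -> VmFault:
    """Split the two fault registers into the fields the drivers print."""
    return VmFault(
        protections=status_reg & 0xFF,
        client_id=(status_reg >> 12) & 0xFF,
        is_write=bool((status_reg >> 24) & 0x1),
        vmid=(status_reg >> 25) & 0xF,
        page=addr_reg,
        byte_address=addr_reg * GPU_PAGE_SIZE,
    )


def describe(fault: VmFault) -> str:
    """Format a fault roughly the way the kernel log line does."""
    access = "write to" if fault.is_write else "read from"
    return (f"VM fault (0x{fault.protections:02x}, vmid {fault.vmid}) "
            f"at page {fault.page}, {access} client {fault.client_id}")


# Register values taken from the radeon and amdgpu logs in this comment.
radeon = decode_vm_fault(0x00500030, 0x10060001)
amdgpu = decode_vm_fault(0x00500060, 0x08060002)
print(describe(radeon))
print(describe(amdgpu))
```

Under these assumptions both drivers report a read by memory client 96 from nearly the same page, which suggests the same faulting access regardless of which kernel driver is loaded.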
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-06-30 13:06 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 409 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
--- Comment #3 from Maciej S. Szmigiero <mail@maciej.szmigiero.name> ---
Still happens on Mesa 18.1.2, LLVM 6.0.1 and kernel 4.17.2.
Note that piglit tests aren't the only thing that is affected
by this bug - ImageMagick OpenCL support also causes a
similar GPU fault.
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-10-27 17:42 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 1728 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
--- Comment #4 from Maciej S. Szmigiero <mail@maciej.szmigiero.name> ---
I have tested this hardware setup again with Mesa 18.2.3, LLVM 7.0.0,
kernel 4.19.0 and piglit from yesterday's git.
These tests no longer crash the GPU but fail anyway with various errors:
program@execute@calls-struct:
Error: 6 unsupported relocations
Expecting 1021 (0x3fd) with tolerance 0, but got 1 (0x1)
Error at int[0]
Argument 0: FAIL
Expecting 14 (0xe) with tolerance 0, but got 1 (0x1)
... and so on for the other arguments in this test.
program@execute@calls-workitem-id:
Error: 8 unsupported relocations
Could not wait for kernel to finish:
CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
Unexpected CL error: CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST -14
program@execute@calls:
<inline asm>:1:2: error: instruction not supported on this GPU
v_lshlrev_b64 v[0:1], 44, 1
program@execute@tail-calls:
Expecting 4 (0x4) with tolerance 0, but got 0 (0x0)
Error at int[0]
Argument 0: FAIL
Running kernel test: Tail call with more arguments than caller
Using kernel kernel_call_tailcall_extra_arg
Setting kernel arguments...
Running the kernel...
Validating results...
Expecting 2 (0x2) with tolerance 0, but got 1 (0x1)
Error at int[0]
Argument 0: FAIL
Running kernel test: Tail call with fewer arguments than acller
Using kernel kernel_call_tailcall_fewer_args
Setting kernel arguments...
Running the kernel...
Validating results...
Expecting 4 (0x4) with tolerance 0, but got 0 (0x0)
Error at int[0]
Argument 0: FAIL
... and so on for the other arguments and calls in this test.
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-10-27 19:02 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 526 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
Maciej S. Szmigiero <mail@maciej.szmigiero.name> changed:
What |Removed |Added
----------------------------------------------------------------------------
Blocks| |99553
Referenced Bugs:
https://bugs.freedesktop.org/show_bug.cgi?id=99553
[Bug 99553] Tracker bug for runnning OpenCL applications on Clover
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-11-14 9:58 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 470 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
Pander <pander@users.sourceforge.net> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://bugs.freedesktop.or
| |g/show_bug.cgi?id=102909
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-11-14 10:00 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 433 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
Pander <pander@users.sourceforge.net> changed:
What |Removed |Added
----------------------------------------------------------------------------
OS|All |Linux (All)
Hardware|Other |All
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-11-14 10:08 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 470 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
Pander <pander@users.sourceforge.net> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://bugs.freedesktop.or
| |g/show_bug.cgi?id=104307
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-11-14 10:08 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 470 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
Pander <pander@users.sourceforge.net> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://bugs.freedesktop.or
| |g/show_bug.cgi?id=101712
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-11-14 10:09 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 470 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
Pander <pander@users.sourceforge.net> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://bugs.freedesktop.or
| |g/show_bug.cgi?id=107545
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-11-15 22:24 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 1036 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
Jan Vesely <jan.vesely@rutgers.edu> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|https://bugs.freedesktop.or |
                   |g/show_bug.cgi?id=102909,   |
                   |https://bugs.freedesktop.or |
                   |g/show_bug.cgi?id=104307,   |
                   |https://bugs.freedesktop.or |
                   |g/show_bug.cgi?id=101712,   |
                   |https://bugs.freedesktop.or |
                   |g/show_bug.cgi?id=107545    |
--- Comment #5 from Jan Vesely <jan.vesely@rutgers.edu> ---
This behaviour is expected.
Until Mesa properly supports code relocations (or LLVM stops using relocations
for internal symbols), it will refuse to run kernels that need relocating.
Jumping to an invalid address causes both a page fault and a GPU hang.
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-11-16 16:13 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 1046 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
--- Comment #6 from Maciej S. Szmigiero <mail@maciej.szmigiero.name> ---
There are really two issues at play here:
1) If the LLVM-generated code cannot be run properly then it should be simply
rejected by whatever is actually in charge of submitting it to the GPU (I guess
this would be Mesa?).
This way an application will know it cannot use OpenCL for computation, at
least not with this compute kernel.
Instead, it currently looks like many of these tests run but give incorrect
results, which is obviously rather bad.
2) Some (previous) Mesa + LLVM versions generate a command stream that crashes
the GPU and, as far as I can remember, sometimes even locks up the whole machine.
It should not be possible to crash the GPU, regardless of how incorrect the
command stream that userspace sends to it is, because otherwise it is possible
for an unprivileged user with GPU access to DoS the machine.
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-11-18 19:24 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 1598 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
--- Comment #7 from Jan Vesely <jan.vesely@rutgers.edu> ---
(In reply to Maciej S. Szmigiero from comment #6)
> There are really two issues at play here:
> 1) If the LLVM-generated code cannot be run properly then it should be simply
> rejected by whatever is actually in charge of submitting it to the GPU (I
> guess
> this would be Mesa?).
> This way an application will know it cannot use OpenCL for computation, at
> least
> not with this compute kernel.
>
> Instead, it currently looks like many of these test run but give incorrect
> results, which is obviously rather bad.
Do you have an example of this? Clover should return an OUT_OF_RESOURCES error
when the compute state creation fails (as it does in the presence of code
relocations).
It does not change the content of the buffer, so the application will read back
whatever was stored in the buffer on creation.
> 2) Some (previous) Mesa + LLVM versions generate a command stream that
> crashes the GPU and, as far as I can remember, sometimes even lockup the
> whole machine.
>
> It should not be possible to crash the GPU, regardless how incorrect a
> command stream that userspace sends to it is - because otherwise it is
> possible for
> an unprivileged user with GPU access to DoS the machine.
This is a separate issue. GPU hangs are generally addressed via gpu reset which
should be enabled for gfx8/9 GPUs in recent amdgpu.ko [0]
[0] https://patchwork.freedesktop.org/patch/257994/
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-11-19 14:03 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 2264 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
--- Comment #8 from Maciej S. Szmigiero <mail@maciej.szmigiero.name> ---
(In reply to Jan Vesely from comment #7)
> (In reply to Maciej S. Szmigiero from comment #6)
> > There are really two issues at play here:
> > 1) If the LLVM-generated code cannot be run properly then it should be simply
> > rejected by whatever is actually in charge of submitting it to the GPU (I
> > guess
> > this would be Mesa?).
> > This way an application will know it cannot use OpenCL for computation, at
> > least
> > not with this compute kernel.
> >
> > Instead, it currently looks like many of these test run but give incorrect
> > results, which is obviously rather bad.
>
> Do you have an example of this? clover should return OUT_OF_RESOURCES error
> when the compute state creation fails (like in the presence of code
> relocations).
> It does not change the content of the buffer, so it will return whatever was
> stored in the buffer on creation.
Aren't program@execute@calls-struct and program@execute@tail-calls tests
from comment 4 examples of this behavior?
These seem to run but return wrong results, or am I not parsing the piglit
test results correctly?
> > 2) Some (previous) Mesa + LLVM versions generate a command stream that
> > crashes the GPU and, as far as I can remember, sometimes even lockup the
> > whole machine.
> >
> > It should not be possible to crash the GPU, regardless how incorrect a
> > command stream that userspace sends to it is - because otherwise it is
> > possible for
> > an unprivileged user with GPU access to DoS the machine.
>
> This is a separate issue. GPU hangs are generally addressed via gpu reset
> which should be enabled for gfx8/9 GPUs in recent amdgpu.ko [0]
>
> [0] https://patchwork.freedesktop.org/patch/257994/
This would explain why "amdgpu" seemed to not even attempt to reset the GPU
after a crash.
However, I think I've got at least one lockup when testing this issue half a
year ago on "radeon" driver ("amdgpu" is still marked as experimental for SI
parts).
If I am able to reproduce it in the future I will report it then.
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-11-22 4:44 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 1592 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
--- Comment #9 from Jan Vesely <jan.vesely@rutgers.edu> ---
(In reply to Maciej S. Szmigiero from comment #8)
> Aren't program@execute@calls-struct and program@execute@tail-calls tests
> from comment 4 examples of this behavior?
> These seem to run but return wrong results, or am I not parsing the piglit
> test results correctly?
This is more of a piglit problem. piglit uses a combination of enqueue and
clFinish. However, the error happens on kernel launch. Thus:
1.) clEnqueueNDRangeKernel -- success
2.) The driver tries to launch the kernel and fails on relocations
3.) The application (piglit) calls clFinish
Depending on the order of 2. and 3., clFinish can either see an empty queue and
succeed, or try to wait for kernel execution and fail.
The following series should address that:
https://patchwork.freedesktop.org/series/52857/
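To make the ordering issue above concrete, here is a toy model of the three steps. It is only a sketch: `ToyQueue`, `driver_launch` and `run` are hypothetical names, and none of this is Clover's or piglit's actual code. The only values taken from real OpenCL are the two status codes.

```python
from collections import deque

CL_SUCCESS = 0
CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST = -14


class ToyQueue:
    """Hypothetical stand-in for a command queue; not a real API."""

    def __init__(self):
        self.pending = deque()
        self.failed = False

    def enqueue(self, kernel_needs_relocation: bool) -> int:
        # Step 1: enqueueing only records the work, so it reports success.
        self.pending.append(kernel_needs_relocation)
        return CL_SUCCESS

    def driver_launch(self) -> None:
        # Step 2: the launch fails for kernels that need relocations.
        while self.pending:
            if self.pending.popleft():
                self.failed = True

    def finish(self) -> int:
        # Step 3: an empty queue looks like "nothing left to wait for",
        # so an earlier launch failure goes unnoticed in this model.
        if not self.pending:
            return CL_SUCCESS
        self.driver_launch()
        if self.failed:
            return CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
        return CL_SUCCESS


def run(launch_before_finish: bool) -> int:
    queue = ToyQueue()
    queue.enqueue(kernel_needs_relocation=True)
    if launch_before_finish:
        queue.driver_launch()
    return queue.finish()


print(run(launch_before_finish=True))   # launch failed first, finish sees an empty queue
print(run(launch_before_finish=False))  # finish waits for the launch and reports the error
```

In the first ordering the test goes on to validate a buffer the kernel never wrote, which would match the "Expecting ... but got ..." failures in comment 4. In the second it stops with CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST.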
> This would explain why "amdgpu" seemed to not even attempt to reset the GPU
> after a crash.
>
> However, I think I've got at least one lockup when testing this issue half a
> year ago on "radeon" driver ("amdgpu" is still marked as experimental for SI
> parts).
> If I am able to reproduce it in the future I will report it then.
Comment #1 shows an example of a successful restart using radeon.ko, so I guess
it worked for at least some ASICs. At any rate, restarting the GPU is a separate
kernel problem.
Feel free to remove the relocation guard if you want to investigate GPU reset.
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
From: bugzilla-daemon @ 2018-11-23 13:43 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 1453 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
--- Comment #10 from Maciej S. Szmigiero <mail@maciej.szmigiero.name> ---
(In reply to Jan Vesely from comment #9)
> (In reply to Maciej S. Szmigiero from comment #8)
> > Aren't program@execute@calls-struct and program@execute@tail-calls tests
> > from comment 4 examples of this behavior?
> > These seem to run but return wrong results, or am I not parsing the piglit
> > test results correctly?
>
> This is more of a piglit problem. piglit uses a combination of enqueue and
> clFinish. However, the error happens on kernel launch. thus;
> 1.) clEnqueueNDRangeKernel -- success
> 2.) The driver tries to launch the kernel and fails on relocations
> 3.) application(piglit) calls clFinish
>
> depending on the order of 2. and 3. clFinish can either see an empty queue
> and succeed or try to wait for kernel execution and fail.
>
> The following series should address that:
> https://patchwork.freedesktop.org/series/52857/
Thanks for the detailed explanation and the patches.
I can confirm that with them applied program@execute@calls-struct and
program@execute@tail-calls exit with
CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST, so I guess they work
(or rather, fail) as expected.
Feel free to add a
"Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>" tag if you would
like.
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
` (16 preceding siblings ...)
2018-11-23 13:43 ` bugzilla-daemon
@ 2018-12-04 17:40 ` bugzilla-daemon
2019-01-15 9:23 ` bugzilla-daemon
2019-06-13 18:45 ` bugzilla-daemon
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-12-04 17:40 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 1660 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
--- Comment #11 from Jan Vesely <jan.vesely@rutgers.edu> ---
(In reply to Maciej S. Szmigiero from comment #10)
> (In reply to Jan Vesely from comment #9)
> > (In reply to Maciej S. Szmigiero from comment #8)
> > > Aren't program@execute@calls-struct and program@execute@tail-calls tests
> > > from comment 4 examples of this behavior?
> > > These seem to run but return wrong results, or am I not parsing the piglit
> > > test results correctly?
> >
> > This is more of a piglit problem. piglit uses a combination of enqueue and
> > clFinish. However, the error happens on kernel launch. thus;
> > 1.) clEnqueueNDRangeKernel -- success
> > 2.) The driver tries to launch the kernel and fails on relocations
> > 3.) application(piglit) calls clFinish
> >
> > depending on the order of 2. and 3. clFinish can either see an empty queue
> > and succeed or try to wait for kernel execution and fail.
> >
> > The following series should address that:
> > https://patchwork.freedesktop.org/series/52857/
>
> Thanks for the detailed explanation and the patches.
>
> I can confirm that with them applied program@execute@calls-struct and
> program@execute@tail-calls exit with
> CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST, so I guess they work
> (or rather, fail) as expected.
>
> Feel free to add
> "Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>" tag if you
> would
> like.
Thanks. I pushed the piglit patches. I'll keep this bug open until Mesa
properly supports relocations.
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
` (17 preceding siblings ...)
2018-12-04 17:40 ` bugzilla-daemon
@ 2019-01-15 9:23 ` bugzilla-daemon
2019-06-13 18:45 ` bugzilla-daemon
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2019-01-15 9:23 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 269 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
--- Comment #12 from Pander <pander@users.sourceforge.net> ---
Great to see the patch tested. What is the status regarding Mesa on this?
* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
` (18 preceding siblings ...)
2019-01-15 9:23 ` bugzilla-daemon
@ 2019-06-13 18:45 ` bugzilla-daemon
19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2019-06-13 18:45 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 626 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=105113
Jan Vesely <jv356@scarletmail.rutgers.edu> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #13 from Jan Vesely <jv356@scarletmail.rutgers.edu> ---
Relocations are now handled in the new radeonsi linker (merged in
77b05cc42df29472a7852b90575a19e8991815cd and related commits).