dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup
@ 2018-02-15 14:58 bugzilla-daemon
  2018-02-15 15:58 ` bugzilla-daemon
                   ` (19 more replies)
  0 siblings, 20 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-15 14:58 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 15234 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

            Bug ID: 105113
           Summary: [hawaii] Running Piglit
                    cl/program/execute/calls-struct.cl causes GPU VM error
                    and ring stalled GPU lockup
           Product: Mesa
           Version: git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Drivers/Gallium/radeonsi
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: vedran@miletic.net
        QA Contact: dri-devel@lists.freedesktop.org

On kernel 4.16.0-0.rc1.git0.1.fc28.x86_64, with 01:00.0 VGA compatible
controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT GL [FirePro W9100]
upon running Piglit's cl/program/execute/calls-struct.cl test I get:

[ 1574.837119] radeon 0000:01:00.0: GPU fault detected: 147 0x080a8401
[ 1574.837124] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00400840
[ 1574.837126] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x0A084001
[ 1574.837128] VM fault (0x01, vmid 5) at page 4196416, read from 'TC5'
(0x54433500) (132)
[ 1585.420894] radeon 0000:01:00.0: ring 0 stalled for more than 10080msec
[ 1585.420901] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1585.924885] radeon 0000:01:00.0: ring 0 stalled for more than 10584msec
[ 1585.924892] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1586.428890] radeon 0000:01:00.0: ring 0 stalled for more than 11088msec
[ 1586.428897] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1586.932902] radeon 0000:01:00.0: ring 0 stalled for more than 11592msec
[ 1586.932911] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1587.436903] radeon 0000:01:00.0: ring 0 stalled for more than 12096msec
[ 1587.436909] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1587.940855] radeon 0000:01:00.0: ring 0 stalled for more than 12600msec
[ 1587.940859] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1588.444913] radeon 0000:01:00.0: ring 0 stalled for more than 13104msec
[ 1588.444922] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1588.948909] radeon 0000:01:00.0: ring 0 stalled for more than 13608msec
[ 1588.948918] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1589.452909] radeon 0000:01:00.0: ring 0 stalled for more than 14112msec
[ 1589.452916] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1589.956912] radeon 0000:01:00.0: ring 0 stalled for more than 14616msec
[ 1589.956920] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1590.460913] radeon 0000:01:00.0: ring 0 stalled for more than 15120msec
[ 1590.460920] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1590.964927] radeon 0000:01:00.0: ring 0 stalled for more than 15624msec
[ 1590.964934] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1591.468898] radeon 0000:01:00.0: ring 0 stalled for more than 16128msec
[ 1591.468905] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1591.972882] radeon 0000:01:00.0: ring 0 stalled for more than 16632msec
[ 1591.972887] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1592.476903] radeon 0000:01:00.0: ring 0 stalled for more than 17136msec
[ 1592.476908] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1592.980928] radeon 0000:01:00.0: ring 0 stalled for more than 17640msec
[ 1592.980936] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1593.484931] radeon 0000:01:00.0: ring 0 stalled for more than 18144msec
[ 1593.484939] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1593.988933] radeon 0000:01:00.0: ring 0 stalled for more than 18648msec
[ 1593.988941] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1594.492935] radeon 0000:01:00.0: ring 0 stalled for more than 19152msec
[ 1594.492943] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1594.996951] radeon 0000:01:00.0: ring 0 stalled for more than 19656msec
[ 1594.996962] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1595.500953] radeon 0000:01:00.0: ring 0 stalled for more than 20160msec
[ 1595.500963] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1596.004957] radeon 0000:01:00.0: ring 0 stalled for more than 20664msec
[ 1596.004967] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1596.508970] radeon 0000:01:00.0: ring 0 stalled for more than 21168msec
[ 1596.508983] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1597.012966] radeon 0000:01:00.0: ring 0 stalled for more than 21672msec
[ 1597.012982] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1597.516969] radeon 0000:01:00.0: ring 0 stalled for more than 22176msec
[ 1597.516984] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1598.020970] radeon 0000:01:00.0: ring 0 stalled for more than 22680msec
[ 1598.020985] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1598.524974] radeon 0000:01:00.0: ring 0 stalled for more than 23184msec
[ 1598.524989] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1599.028975] radeon 0000:01:00.0: ring 0 stalled for more than 23688msec
[ 1599.028990] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1599.532977] radeon 0000:01:00.0: ring 0 stalled for more than 24192msec
[ 1599.532992] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1600.036981] radeon 0000:01:00.0: ring 0 stalled for more than 24696msec
[ 1600.036997] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1600.540984] radeon 0000:01:00.0: ring 0 stalled for more than 25200msec
[ 1600.540999] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1601.044948] radeon 0000:01:00.0: ring 0 stalled for more than 25704msec
[ 1601.044963] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1601.548986] radeon 0000:01:00.0: ring 0 stalled for more than 26208msec
[ 1601.549002] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1602.052966] radeon 0000:01:00.0: ring 0 stalled for more than 26712msec
[ 1602.052981] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1602.556999] radeon 0000:01:00.0: ring 0 stalled for more than 27216msec
[ 1602.557014] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1603.060934] radeon 0000:01:00.0: ring 0 stalled for more than 27720msec
[ 1603.060938] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1603.564977] radeon 0000:01:00.0: ring 0 stalled for more than 28224msec
[ 1603.564981] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1604.068965] radeon 0000:01:00.0: ring 0 stalled for more than 28728msec
[ 1604.068969] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1604.572967] radeon 0000:01:00.0: ring 0 stalled for more than 29232msec
[ 1604.572971] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1605.076978] radeon 0000:01:00.0: ring 0 stalled for more than 29736msec
[ 1605.076984] radeon 0000:01:00.0: GPU lockup (current fence id
0x000000000000002b last fence id 0x000000000000002c on ring 0)
[ 1605.136370] radeon 0000:01:00.0: Saved 24 dwords of commands on ring 0.
[ 1605.136381] radeon 0000:01:00.0: GPU softreset: 0x00000009
[ 1605.136382] radeon 0000:01:00.0:   GRBM_STATUS=0xA0403028
[ 1605.136383] radeon 0000:01:00.0:   GRBM_STATUS2=0x50000008
[ 1605.136385] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x08000006
[ 1605.136386] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x08000006
[ 1605.136387] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x08000006
[ 1605.136388] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x08000006
[ 1605.136389] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[ 1605.136390] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[ 1605.136391] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[ 1605.136392] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[ 1605.136393] radeon 0000:01:00.0:   CP_STAT = 0x80038600
[ 1605.136394] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
[ 1605.136395] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00018000
[ 1605.136397] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[ 1605.136398] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x00000002
[ 1605.136399] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[ 1605.136400] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80000063
[ 1605.136401] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000000
[ 1605.136402] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[ 1605.136403] radeon 0000:01:00.0:   CP_CPC_STATUS = 0x00000000
[ 1605.136404] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00000000
[ 1605.136406] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x00000000
[ 1605.136539] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
[ 1605.136591] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[ 1605.137745] radeon 0000:01:00.0:   GRBM_STATUS=0x00003028
[ 1605.137746] radeon 0000:01:00.0:   GRBM_STATUS2=0x00000008
[ 1605.137747] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
[ 1605.137748] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
[ 1605.137749] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
[ 1605.137751] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
[ 1605.137752] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
[ 1605.137752] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
[ 1605.137754] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
[ 1605.137755] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEE557
[ 1605.137756] radeon 0000:01:00.0:   CP_STAT = 0x00000000
[ 1605.137757] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
[ 1605.137758] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
[ 1605.137759] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
[ 1605.137760] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x00000000
[ 1605.137761] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
[ 1605.137762] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x00000000
[ 1605.137763] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000000
[ 1605.137764] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
[ 1605.137766] radeon 0000:01:00.0:   CP_CPC_STATUS = 0x00000000
[ 1605.137779] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[ 1605.316214] [drm:ci_dpm_enable [radeon]] *ERROR* ci_start_dpm failed
[ 1605.316228] [drm:radeon_pm_resume [radeon]] *ERROR* radeon: dpm resume
failed
[ 1605.316232] [drm] probing gen 2 caps for device 8086:c01 = 261ad03/e
[ 1605.316234] [drm] PCIE gen 3 link speeds already enabled
[ 1605.322812] [drm] PCIE GART of 2048M enabled (table at 0x000000000030E000).
[ 1605.322948] radeon 0000:01:00.0: WB enabled
[ 1605.322963] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr
0x0000000400000c00 and cpu addr 0x0000000069866a2d
[ 1605.322964] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr
0x0000000400000c04 and cpu addr 0x000000006efe9aa0
[ 1605.322965] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr
0x0000000400000c08 and cpu addr 0x00000000a652c3ad
[ 1605.322966] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr
0x0000000400000c0c and cpu addr 0x00000000fc5d211b
[ 1605.322967] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr
0x0000000400000c10 and cpu addr 0x00000000cd5ca2f4
[ 1605.323322] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr
0x0000000000078d30 and cpu addr 0x00000000ae9e3dfe
[ 1605.323463] radeon 0000:01:00.0: fence driver on ring 6 use gpu addr
0x0000000400000c18 and cpu addr 0x000000007065469b
[ 1605.323464] radeon 0000:01:00.0: fence driver on ring 7 use gpu addr
0x0000000400000c1c and cpu addr 0x00000000b246b6b7
[ 1605.325595] [drm] ring test on 0 succeeded in 4 usecs
[ 1605.325659] [drm] ring test on 1 succeeded in 3 usecs
[ 1605.325667] [drm] ring test on 2 succeeded in 2 usecs
[ 1605.325846] [drm] ring test on 3 succeeded in 5 usecs
[ 1605.325852] [drm] ring test on 4 succeeded in 4 usecs
[ 1605.371875] [drm] ring test on 5 succeeded in 1 usecs
[ 1605.391888] [drm] UVD initialized successfully.
[ 1605.493990] [drm] ring test on 6 succeeded in 1223 usecs
[ 1605.494000] [drm] ring test on 7 succeeded in 4 usecs
[ 1605.494000] [drm] VCE initialized successfully.
[ 1605.494036] [drm:radeon_pm_resume [radeon]] *ERROR* radeon: dpm resume
failed
[ 1605.494372] [drm] ib test on ring 0 succeeded in 0 usecs
[ 1605.494505] [drm] ib test on ring 1 succeeded in 0 usecs
[ 1605.494638] [drm] ib test on ring 2 succeeded in 0 usecs
[ 1605.494771] [drm] ib test on ring 3 succeeded in 0 usecs
[ 1605.494903] [drm] ib test on ring 4 succeeded in 0 usecs
[ 1606.021037] [drm] ib test on ring 5 succeeded
[ 1606.042045] [drm] ib test on ring 6 succeeded
[ 1606.042863] [drm] ib test on ring 7 succeeded

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 16623 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
@ 2018-02-15 15:58 ` bugzilla-daemon
  2018-02-15 16:09 ` [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause " bugzilla-daemon
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-15 15:58 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 528 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

--- Comment #1 from Vedran Miletić <vedran@miletic.net> ---
Same story with

tests/cl/program/execute/calls-struct.cl
tests/cl/program/execute/calls-workitem-id.cl
tests/cl/program/execute/calls.cl
tests/cl/program/execute/tail-calls.cl

while

tests/cl/program/execute/call-clobbers-amdgcn.cl

gets skipped. All those test were added in
e408ce1f2bff23121670a8206258c80bb3d9befd.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1427 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
  2018-02-15 15:58 ` bugzilla-daemon
@ 2018-02-15 16:09 ` bugzilla-daemon
  2018-04-29 22:23 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-02-15 16:09 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 889 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

Vedran Miletić <vedran@miletic.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arsenm2@gmail.com
            Summary|[hawaii] Running Piglit     |[hawaii, radeonsi, clover]
                   |cl/program/execute/calls-st |Running Piglit
                   |ruct.cl causes GPU VM error |cl/program/execute/{,tail-}
                   |and ring stalled GPU lockup |calls{,-struct,-workitem-id
                   |                            |}.cl cause GPU VM error and
                   |                            |ring stalled GPU lockup
           Priority|medium                      |high

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1772 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
  2018-02-15 15:58 ` bugzilla-daemon
  2018-02-15 16:09 ` [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause " bugzilla-daemon
@ 2018-04-29 22:23 ` bugzilla-daemon
  2018-06-30 13:06 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-04-29 22:23 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 3096 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

Maciej S. Szmigiero <mail@maciej.szmigiero.name> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mail@maciej.szmigiero.name

--- Comment #2 from Maciej S. Szmigiero <mail@maciej.szmigiero.name> ---
I've also hit this issue on "Oland PRO [Radeon R7 240/340] (rev 87)" with
mesa-18.1.0_rc2, llvm-6.0.0 and kernel 4.16.5.

The crash happens at "cl/program/execute/calls-struct.cl" from piglit as well.
It happens both from a X session and from a KMS console.

The exact crash looks like this:
[  171.969488] radeon 0000:20:00.0: GPU fault detected: 147 0x06106001
[  171.969489] radeon 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00500030
[  171.969490] radeon 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x10060001
[  171.969491] VM fault (0x01, vmid 8) at page 5242928, read from CB (96)

Then the radeon driver tries to reset the GPU endlessly.
I've tried pcie_gen2=0, msi=0, dpm=0, hard_reset=1, vm_size=16 in various
combinations, nothing seems to help (msi=0 gives a ton of IOMMU errors, BTW).

Also have tried amdgpu which gives a similar crash (it looks like this
driver didn't attempt to reset the GPU afterwards):
[  435.596230] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[  435.596233] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00500060
[  435.596235] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08060002
[  435.596239] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (96)
[  435.596245] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[  435.596247] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00500060
[  435.596248] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08050002
[  435.596252] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (80)
[  435.596256] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[  435.596258] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00500060
[  435.596260] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08010002
[  435.596263] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (16)
[  435.596267] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c085002
[  435.596269] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00500060
[  435.596271] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08050002
[  435.596274] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (80)
[  435.596278] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c085002

This might be (also?) a kernel bug since a userspace program should not
be able to crash a GPU, regardless how incorrect command stream it sends
to one.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 4767 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (2 preceding siblings ...)
  2018-04-29 22:23 ` bugzilla-daemon
@ 2018-06-30 13:06 ` bugzilla-daemon
  2018-10-27 17:42 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-06-30 13:06 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 409 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

--- Comment #3 from Maciej S. Szmigiero <mail@maciej.szmigiero.name> ---
Still happens on Mesa 18.1.2, LLVM 6.0.1 and kernel 4.17.2.

Note that piglit tests aren't the only thing that is affected
by this bug - ImageMagick OpenCL support also causes a
similar GPU fault.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1402 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (3 preceding siblings ...)
  2018-06-30 13:06 ` bugzilla-daemon
@ 2018-10-27 17:42 ` bugzilla-daemon
  2018-10-27 19:02 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-10-27 17:42 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1728 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

--- Comment #4 from Maciej S. Szmigiero <mail@maciej.szmigiero.name> ---
I have tested this hardware setup again with Mesa 18.2.3, LLVM 7.0.0,
kernel 4.19.0 and piglit from yesterday's git.
These tests no longer crash the GPU but fail anyway with various errors:

program@execute@calls-struct:
Error: 6 unsupported relocations
Expecting 1021 (0x3fd) with tolerance 0, but got 1 (0x1)
Error at int[0]
Argument 0: FAIL
Expecting 14 (0xe) with tolerance 0, but got 1 (0x1)
... and so for other arguments in this test.

program@execute@calls-workitem-id:
Error: 8 unsupported relocations
Could not wait for kernel to finish:
CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
Unexpected CL error: CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST -14

program@execute@calls:
<inline asm>:1:2: error: instruction not supported on this GPU
v_lshlrev_b64 v[0:1], 44, 1

program@execute@tail-calls:
Expecting 4 (0x4) with tolerance 0, but got 0 (0x0)
Error at int[0]
Argument 0: FAIL
Running kernel test: Tail call with more arguments than caller
Using kernel kernel_call_tailcall_extra_arg
Setting kernel arguments...
Running the kernel...
Validating results...
Expecting 2 (0x2) with tolerance 0, but got 1 (0x1)
Error at int[0]
Argument 0: FAIL
Running kernel test: Tail call with fewer  arguments than acller
Using kernel kernel_call_tailcall_fewer_args
Setting kernel arguments...
Running the kernel...
Validating results...
Expecting 4 (0x4) with tolerance 0, but got 0 (0x0)
Error at int[0]
Argument 0: FAIL
... and so for other arguments and calls in this test.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 2759 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (4 preceding siblings ...)
  2018-10-27 17:42 ` bugzilla-daemon
@ 2018-10-27 19:02 ` bugzilla-daemon
  2018-11-14  9:58 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-10-27 19:02 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 526 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

Maciej S. Szmigiero <mail@maciej.szmigiero.name> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |99553


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=99553
[Bug 99553] Tracker bug for runnning OpenCL applications on Clover
-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1698 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (5 preceding siblings ...)
  2018-10-27 19:02 ` bugzilla-daemon
@ 2018-11-14  9:58 ` bugzilla-daemon
  2018-11-14 10:00 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-11-14  9:58 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 470 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

Pander <pander@users.sourceforge.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.freedesktop.or
                   |                            |g/show_bug.cgi?id=102909

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1230 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (6 preceding siblings ...)
  2018-11-14  9:58 ` bugzilla-daemon
@ 2018-11-14 10:00 ` bugzilla-daemon
  2018-11-14 10:08 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-11-14 10:00 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 433 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

Pander <pander@users.sourceforge.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|All                         |Linux (All)
           Hardware|Other                       |All

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1331 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (7 preceding siblings ...)
  2018-11-14 10:00 ` bugzilla-daemon
@ 2018-11-14 10:08 ` bugzilla-daemon
  2018-11-14 10:08 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-11-14 10:08 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 470 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

Pander <pander@users.sourceforge.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.freedesktop.or
                   |                            |g/show_bug.cgi?id=104307

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1230 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (8 preceding siblings ...)
  2018-11-14 10:08 ` bugzilla-daemon
@ 2018-11-14 10:08 ` bugzilla-daemon
  2018-11-14 10:09 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-11-14 10:08 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 470 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

Pander <pander@users.sourceforge.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.freedesktop.or
                   |                            |g/show_bug.cgi?id=101712

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1230 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (9 preceding siblings ...)
  2018-11-14 10:08 ` bugzilla-daemon
@ 2018-11-14 10:09 ` bugzilla-daemon
  2018-11-15 22:24 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-11-14 10:09 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 470 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

Pander <pander@users.sourceforge.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.freedesktop.or
                   |                            |g/show_bug.cgi?id=107545

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1230 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (10 preceding siblings ...)
  2018-11-14 10:09 ` bugzilla-daemon
@ 2018-11-15 22:24 ` bugzilla-daemon
  2018-11-16 16:13 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-11-15 22:24 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1036 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

Jan Vesely <jan.vesely@rutgers.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|https://bugs.freedesktop.or |
                   |g/show_bug.cgi?id=102909,   |
                   |https://bugs.freedesktop.or |
                   |g/show_bug.cgi?id=104307,   |
                   |https://bugs.freedesktop.or |
                   |g/show_bug.cgi?id=101712,   |
                   |https://bugs.freedesktop.or |
                   |g/show_bug.cgi?id=107545    |

--- Comment #5 from Jan Vesely <jan.vesely@rutgers.edu> ---
This behaviour is expected.
until mesa properly supports code relocations (or llvm stops using relocations
for internal symbols) it will refuse to run kernels that need relocating.
Jumping to invalid address causes both pagefault and a gpu hang.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 2516 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (11 preceding siblings ...)
  2018-11-15 22:24 ` bugzilla-daemon
@ 2018-11-16 16:13 ` bugzilla-daemon
  2018-11-18 19:24 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-11-16 16:13 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1046 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

--- Comment #6 from Maciej S. Szmigiero <mail@maciej.szmigiero.name> ---
There are really two issues at play here:
1) If the LLVM-generated code cannot be run properly then it should be simply
rejected by whatever is actually in charge of submitting it to the GPU (I guess
this would be Mesa?).
This way an application will know it cannot use OpenCL for computation, at
least
not with this compute kernel.

Instead, it currently looks like many of these test run but give incorrect
results, which is obviously rather bad.

2) Some (previous) Mesa + LLVM versions generate a command stream that crashes
the GPU and, as far as I can remember, sometimes even lockup the whole machine.

It should not be possible to crash the GPU, regardless how incorrect a command
stream that userspace sends to it is - because otherwise it is possible for
an unprivileged user with GPU access to DoS the machine.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 2039 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (12 preceding siblings ...)
  2018-11-16 16:13 ` bugzilla-daemon
@ 2018-11-18 19:24 ` bugzilla-daemon
  2018-11-19 14:03 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-11-18 19:24 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1598 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

--- Comment #7 from Jan Vesely <jan.vesely@rutgers.edu> ---
(In reply to Maciej S. Szmigiero from comment #6)
> There are really two issues at play here:
> 1) If the LLVM-generated code cannot be run properly then it should be simply
> rejected by whatever is actually in charge of submitting it to the GPU (I
> guess
> this would be Mesa?).
> This way an application will know it cannot use OpenCL for computation, at
> least
> not with this compute kernel.
> 
> Instead, it currently looks like many of these test run but give incorrect
> results, which is obviously rather bad.

Do you have an example of this? clover should return OUT_OF_RESOURCES error
when the compute state creation fails (like in the presence of code
relocations).
It does not change the content of the buffer, so it will return whatever was
stored in the buffer on creation.

> 2) Some (previous) Mesa + LLVM versions generate a command stream that
> crashes the GPU and, as far as I can remember, sometimes even lockup the
> whole machine.
> 
> It should not be possible to crash the GPU, regardless how incorrect a
> command stream that userspace sends to it is - because otherwise it is
> possible for
> an unprivileged user with GPU access to DoS the machine.

This is a separate issue. GPU hangs are generally addressed via gpu reset which
should be enabled for gfx8/9 GPUs in recent amdgpu.ko [0]

[0] https://patchwork.freedesktop.org/patch/257994/

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 2793 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (13 preceding siblings ...)
  2018-11-18 19:24 ` bugzilla-daemon
@ 2018-11-19 14:03 ` bugzilla-daemon
  2018-11-22  4:44 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-11-19 14:03 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2264 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

--- Comment #8 from Maciej S. Szmigiero <mail@maciej.szmigiero.name> ---
(In reply to Jan Vesely from comment #7)
> (In reply to Maciej S. Szmigiero from comment #6)
> > There are really two issues at play here:
> > 1) If the LLVM-generated code cannot be run properly then it should be simply
> > rejected by whatever is actually in charge of submitting it to the GPU (I
> > guess
> > this would be Mesa?).
> > This way an application will know it cannot use OpenCL for computation, at
> > least
> > not with this compute kernel.
> > 
> > Instead, it currently looks like many of these test run but give incorrect
> > results, which is obviously rather bad.
> 
> Do you have an example of this? clover should return OUT_OF_RESOURCES error
> when the compute state creation fails (like in the presence of code
> relocations).
> It does not change the content of the buffer, so it will return whatever was
> stored in the buffer on creation.

Aren't program@execute@calls-struct and program@execute@tail-calls tests
from comment 4 examples of this behavior?
These seem to run but return wrong results, or am I not parsing the piglit
test results correctly?

> > 2) Some (previous) Mesa + LLVM versions generate a command stream that
> > crashes the GPU and, as far as I can remember, sometimes even lockup the
> > whole machine.
> > 
> > It should not be possible to crash the GPU, regardless how incorrect a
> > command stream that userspace sends to it is - because otherwise it is
> > possible for
> > an unprivileged user with GPU access to DoS the machine.
> 
> This is a separate issue. GPU hangs are generally addressed via gpu reset
> which should be enabled for gfx8/9 GPUs in recent amdgpu.ko [0]
> 
> [0] https://patchwork.freedesktop.org/patch/257994/

This would explain why "amdgpu" seemed to not even attempt to reset the GPU
after a crash.

However, I think I've got at least one lockup when testing this issue half a
year ago on "radeon" driver ("amdgpu" is still marked as experimental for SI
parts).
If I am able to reproduce it in the future I will report it then.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 3691 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (14 preceding siblings ...)
  2018-11-19 14:03 ` bugzilla-daemon
@ 2018-11-22  4:44 ` bugzilla-daemon
  2018-11-23 13:43 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-11-22  4:44 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1592 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

--- Comment #9 from Jan Vesely <jan.vesely@rutgers.edu> ---
(In reply to Maciej S. Szmigiero from comment #8)
> Aren't program@execute@calls-struct and program@execute@tail-calls tests
> from comment 4 examples of this behavior?
> These seem to run but return wrong results, or am I not parsing the piglit
> test results correctly?

This is more of a piglit problem. piglit uses a combination of enqueue and
clFinish. However, the error happens on kernel launch. thus;
1.) clEnqueueNDRangeKernel -- success
2.) The driver tries to launch the kernel and fails on relocations
3.) application(piglit) calls clFinish

depending on the order of 2. and 3. clFinish can either see an empty queue and
succeed or try to wait for kernel execution and fail.

The following series should address that:
https://patchwork.freedesktop.org/series/52857/

> This would explain why "amdgpu" seemed to not even attempt to reset the GPU
> after a crash.
> 
> However, I think I've got at least one lockup when testing this issue half a
> year ago on "radeon" driver ("amdgpu" is still marked as experimental for SI
> parts).
> If I am able to reproduce it in the future I will report it then.

comment #1 shows an example of a successful restart using radeon.ko, so I guess
it worked for at least some ASICs. at any rate, restarting GPU is a separate,
kernel, problem.
Feel free to remove the relocation guard if you want to investigate GPU reset.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 2889 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (15 preceding siblings ...)
  2018-11-22  4:44 ` bugzilla-daemon
@ 2018-11-23 13:43 ` bugzilla-daemon
  2018-12-04 17:40 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-11-23 13:43 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1453 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

--- Comment #10 from Maciej S. Szmigiero <mail@maciej.szmigiero.name> ---
(In reply to Jan Vesely from comment #9)
> (In reply to Maciej S. Szmigiero from comment #8)
> > Aren't program@execute@calls-struct and program@execute@tail-calls tests
> > from comment 4 examples of this behavior?
> > These seem to run but return wrong results, or am I not parsing the piglit
> > test results correctly?
> 
> This is more of a piglit problem. piglit uses a combination of enqueue and
> clFinish. However, the error happens on kernel launch. thus;
> 1.) clEnqueueNDRangeKernel -- success
> 2.) The driver tries to launch the kernel and fails on relocations
> 3.) application(piglit) calls clFinish
> 
> depending on the order of 2. and 3. clFinish can either see an empty queue
> and succeed or try to wait for kernel execution and fail.
> 
> The following series should address that:
> https://patchwork.freedesktop.org/series/52857/

Thanks for the detailed explanation and the patches.

I can confirm that with them applied program@execute@calls-struct and
program@execute@tail-calls exit with
CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST, so I guess they work
(or rather, fail) as expected.

Feel free to add
"Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>" tag if you would
like.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 2824 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (16 preceding siblings ...)
  2018-11-23 13:43 ` bugzilla-daemon
@ 2018-12-04 17:40 ` bugzilla-daemon
  2019-01-15  9:23 ` bugzilla-daemon
  2019-06-13 18:45 ` bugzilla-daemon
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2018-12-04 17:40 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1660 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

--- Comment #11 from Jan Vesely <jan.vesely@rutgers.edu> ---
(In reply to Maciej S. Szmigiero from comment #10)
> (In reply to Jan Vesely from comment #9)
> > (In reply to Maciej S. Szmigiero from comment #8)
> > > Aren't program@execute@calls-struct and program@execute@tail-calls tests
> > > from comment 4 examples of this behavior?
> > > These seem to run but return wrong results, or am I not parsing the piglit
> > > test results correctly?
> > 
> > This is more of a piglit problem. piglit uses a combination of enqueue and
> > clFinish. However, the error happens on kernel launch. thus;
> > 1.) clEnqueueNDRangeKernel -- success
> > 2.) The driver tries to launch the kernel and fails on relocations
> > 3.) application(piglit) calls clFinish
> > 
> > depending on the order of 2. and 3. clFinish can either see an empty queue
> > and succeed or try to wait for kernel execution and fail.
> > 
> > The following series should address that:
> > https://patchwork.freedesktop.org/series/52857/
> 
> Thanks for the detailed explanation and the patches.
> 
> I can confirm that with them applied program@execute@calls-struct and
> program@execute@tail-calls exit with
> CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST, so I guess they work
> (or rather, fail) as expected.
> 
> Feel free to add
> "Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>" tag if you
> would
> like.

Thanks. I pushed the piglit patches. I'll keep this bug open until mesa
properly supports relocations.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 3149 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (17 preceding siblings ...)
  2018-12-04 17:40 ` bugzilla-daemon
@ 2019-01-15  9:23 ` bugzilla-daemon
  2019-06-13 18:45 ` bugzilla-daemon
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2019-01-15  9:23 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 269 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

--- Comment #12 from Pander <pander@users.sourceforge.net> ---
Super for the tested patch. What is the status regarding Mesa on this?

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1252 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
  2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
                   ` (18 preceding siblings ...)
  2019-01-15  9:23 ` bugzilla-daemon
@ 2019-06-13 18:45 ` bugzilla-daemon
  19 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2019-06-13 18:45 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 626 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=105113

Jan Vesely <jv356@scarletmail.rutgers.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from Jan Vesely <jv356@scarletmail.rutgers.edu> ---
Relocations are now handled in the new radeonsi linker (merged in
77b05cc42df29472a7852b90575a19e8991815cd and co.)

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 2437 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-06-13 18:45 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-15 14:58 [Bug 105113] [hawaii] Running Piglit cl/program/execute/calls-struct.cl causes GPU VM error and ring stalled GPU lockup bugzilla-daemon
2018-02-15 15:58 ` bugzilla-daemon
2018-02-15 16:09 ` [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause " bugzilla-daemon
2018-04-29 22:23 ` bugzilla-daemon
2018-06-30 13:06 ` bugzilla-daemon
2018-10-27 17:42 ` bugzilla-daemon
2018-10-27 19:02 ` bugzilla-daemon
2018-11-14  9:58 ` bugzilla-daemon
2018-11-14 10:00 ` bugzilla-daemon
2018-11-14 10:08 ` bugzilla-daemon
2018-11-14 10:08 ` bugzilla-daemon
2018-11-14 10:09 ` bugzilla-daemon
2018-11-15 22:24 ` bugzilla-daemon
2018-11-16 16:13 ` bugzilla-daemon
2018-11-18 19:24 ` bugzilla-daemon
2018-11-19 14:03 ` bugzilla-daemon
2018-11-22  4:44 ` bugzilla-daemon
2018-11-23 13:43 ` bugzilla-daemon
2018-12-04 17:40 ` bugzilla-daemon
2019-01-15  9:23 ` bugzilla-daemon
2019-06-13 18:45 ` bugzilla-daemon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).