From mboxrd@z Thu Jan  1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 105113] [hawaii, radeonsi, clover] Running Piglit
 cl/program/execute/{, tail-}calls{, -struct,
 -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
Date: Sun, 29 Apr 2018 22:23:24 +0000
Message-ID: <bug-105113-502-19o4sHQoXv@http.bugs.freedesktop.org/>
References: <bug-105113-502@http.bugs.freedesktop.org/>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0218758608=="
Return-path: <dri-devel-bounces@lists.freedesktop.org>
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
 [IPv6:2610:10:20:722:a800:ff:fe98:4b55])
 by gabe.freedesktop.org (Postfix) with ESMTP id 4FFFC6E0BA
 for <dri-devel@lists.freedesktop.org>; Sun, 29 Apr 2018 22:23:24 +0000 (UTC)
In-Reply-To: <bug-105113-502@http.bugs.freedesktop.org/>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org


--===============0218758608==
Content-Type: multipart/alternative; boundary="15250406040.C79DFd.23775"
Content-Transfer-Encoding: 7bit


--15250406040.C79DFd.23775
Date: Sun, 29 Apr 2018 22:23:24 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated

https://bugs.freedesktop.org/show_bug.cgi?id=3D105113

Maciej S. Szmigiero <mail@maciej.szmigiero.name> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mail@maciej.szmigiero.name

--- Comment #2 from Maciej S. Szmigiero <mail@maciej.szmigiero.name> ---
I've also hit this issue on "Oland PRO [Radeon R7 240/340] (rev 87)" with
mesa-18.1.0_rc2, llvm-6.0.0 and kernel 4.16.5.

The crash happens in "cl/program/execute/calls-struct.cl" from
piglit as well.
It happens both from an X session and from a KMS console.

The exact crash looks like this:
[  171.969488] radeon 0000:20:00.0: GPU fault detected: 147 0x06106001
[  171.969489] radeon 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR=20=
=20
0x00500030
[  171.969490] radeon 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x10060001
[  171.969491] VM fault (0x01, vmid 8) at page 5242928, read from CB (96)

Then the radeon driver tries to reset the GPU endlessly.
I've tried pcie_gen2=3D0, msi=3D0, dpm=3D0, hard_reset=3D1 and vm_size=3D16
in various combinations, but nothing seems to help (msi=3D0 gives a ton of
IOMMU errors, BTW).

I have also tried amdgpu, which gives a similar crash (it looks like this
driver didn't attempt to reset the GPU afterwards):
[  435.596230] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[  435.596233] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR=20=
=20
0x00500060
[  435.596235] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08060002
[  435.596239] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (96)
[  435.596245] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[  435.596247] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR=20=
=20
0x00500060
[  435.596248] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08050002
[  435.596252] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (80)
[  435.596256] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[  435.596258] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR=20=
=20
0x00500060
[  435.596260] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08010002
[  435.596263] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (16)
[  435.596267] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c085002
[  435.596269] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR=20=
=20
0x00500060
[  435.596271] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08050002
[  435.596274] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (80)
[  435.596278] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c085002

This might (also?) be a kernel bug, since a userspace program should not
be able to crash a GPU, regardless of how incorrect a command stream it
sends to it.

--=20
You are receiving this mail because:
You are the assignee for the bug.=

--15250406040.C79DFd.23775
Date: Sun, 29 Apr 2018 22:23:24 +0000
MIME-Version: 1.0
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated

<html>
    <head>
      <base href=3D"https://bugs.freedesktop.org/">
    </head>
    <body><span class=3D"vcard"><a class=3D"email" href=3D"mailto:mail&#64;=
maciej.szmigiero.name" title=3D"Maciej S. Szmigiero &lt;mail&#64;maciej.szm=
igiero.name&gt;"> <span class=3D"fn">Maciej S. Szmigiero</span></a>
</span> changed
          <a class=3D"bz_bug_link=20
          bz_status_NEW "
   title=3D"NEW - [hawaii, radeonsi, clover] Running Piglit cl/program/exec=
ute/{,tail-}calls{,-struct,-workitem-id}.cl cause GPU VM error and ring sta=
lled GPU lockup"
   href=3D"https://bugs.freedesktop.org/show_bug.cgi?id=3D105113">bug 10511=
3</a>
          <br>
             <table border=3D"1" cellspacing=3D"0" cellpadding=3D"8">
          <tr>
            <th>What</th>
            <th>Removed</th>
            <th>Added</th>
          </tr>

         <tr>
           <td style=3D"text-align:right;">CC</td>
           <td>
               &nbsp;
           </td>
           <td>mail&#64;maciej.szmigiero.name
           </td>
         </tr></table>
      <p>
        <div>
            <b><a class=3D"bz_bug_link=20
          bz_status_NEW "
   title=3D"NEW - [hawaii, radeonsi, clover] Running Piglit cl/program/exec=
ute/{,tail-}calls{,-struct,-workitem-id}.cl cause GPU VM error and ring sta=
lled GPU lockup"
   href=3D"https://bugs.freedesktop.org/show_bug.cgi?id=3D105113#c2">Commen=
t # 2</a>
              on <a class=3D"bz_bug_link=20
          bz_status_NEW "
   title=3D"NEW - [hawaii, radeonsi, clover] Running Piglit cl/program/exec=
ute/{,tail-}calls{,-struct,-workitem-id}.cl cause GPU VM error and ring sta=
lled GPU lockup"
   href=3D"https://bugs.freedesktop.org/show_bug.cgi?id=3D105113">bug 10511=
3</a>
              from <span class=3D"vcard"><a class=3D"email" href=3D"mailto:=
mail&#64;maciej.szmigiero.name" title=3D"Maciej S. Szmigiero &lt;mail&#64;m=
aciej.szmigiero.name&gt;"> <span class=3D"fn">Maciej S. Szmigiero</span></a>
</span></b>
        <pre>I've also hit this issue on &quot;Oland PRO [Radeon R7 240/340=
] (rev 87)&quot; with
mesa-18.1.0_rc2, llvm-6.0.0 and kernel 4.16.5.

The crash happens in &quot;cl/program/execute/calls-struct.cl&quot; from
piglit as well.
It happens both from an X session and from a KMS console.

The exact crash looks like this:
[  171.969488] radeon 0000:20:00.0: GPU fault detected: 147 0x06106001
[  171.969489] radeon 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR=20=
=20
0x00500030
[  171.969490] radeon 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x10060001
[  171.969491] VM fault (0x01, vmid 8) at page 5242928, read from CB (96)

Then the radeon driver tries to reset the GPU endlessly.
I've tried pcie_gen2=3D0, msi=3D0, dpm=3D0, hard_reset=3D1 and vm_size=3D16
in various combinations, but nothing seems to help (msi=3D0 gives a ton of
IOMMU errors, BTW).

I have also tried amdgpu, which gives a similar crash (it looks like this
driver didn't attempt to reset the GPU afterwards):
[  435.596230] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[  435.596233] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR=20=
=20
0x00500060
[  435.596235] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08060002
[  435.596239] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (96)
[  435.596245] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[  435.596247] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR=20=
=20
0x00500060
[  435.596248] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08050002
[  435.596252] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (80)
[  435.596256] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c086002
[  435.596258] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR=20=
=20
0x00500060
[  435.596260] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08010002
[  435.596263] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (16)
[  435.596267] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c085002
[  435.596269] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR=20=
=20
0x00500060
[  435.596271] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x08050002
[  435.596274] amdgpu 0000:20:00.0: VM fault (0x02, vmid 4) at page 5242976,
read from '' (0x00000000) (80)
[  435.596278] amdgpu 0000:20:00.0: GPU fault detected: 147 0x0c085002

This might (also?) be a kernel bug, since a userspace program should not
be able to crash a GPU, regardless of how incorrect a command stream it
sends to it.</pre>
        </div>
      </p>


      <hr>
      <span>You are receiving this mail because:</span>

      <ul>
          <li>You are the assignee for the bug.</li>
      </ul>
    </body>
</html>=

--15250406040.C79DFd.23775--

--===============0218758608==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: inline

X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs
IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz
dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg==

--===============0218758608==--