From: bugzilla-daemon@freedesktop.org
Subject: [Bug 105113] [hawaii, radeonsi, clover] Running Piglit cl/program/execute/{, tail-}calls{, -struct, -workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
Date: Thu, 22 Nov 2018 04:44:54 +0000
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=105113

Comment #9 on bug 105113 from Jan Vesely
(In reply to Maciej S. Szmigiero from comment #8)
> Aren't program@execute@calls-struct and program@execute@tail-calls tests
> from comment 4 examples of this behavior?
> These seem to run but return wrong results, or am I not parsing the piglit
> test results correctly?

This is more of a piglit problem. piglit uses a combination of enqueue and
clFinish. However, the error happens on kernel launch. Thus:
1.) clEnqueueNDRangeKernel -- success
2.) The driver tries to launch the kernel and fails on relocations
3.) The application (piglit) calls clFinish

Depending on the order of 2. and 3., clFinish can either see an empty queue
and succeed, or try to wait for kernel execution and fail.
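
To make the ordering issue concrete, here is a minimal toy model of the three
steps above. It is only a sketch: ToyQueue, driver_launch and run are
hypothetical names made up for illustration, the kernel name is arbitrary, and
none of this is Clover or piglit code. It assumes that a kernel which fails at
launch is simply dropped from the queue without the failure being recorded.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class ToyQueue:
    """Hypothetical stand-in for a command queue (not the real implementation)."""
    pending: list = field(default_factory=list)

    def enqueue(self, kernel_name):
        # Step 1: enqueueing only records the work, so it reports success.
        self.pending.append(kernel_name)
        return Status.SUCCESS

    def driver_launch(self):
        # Step 2: the launch fails on relocations. Assumption: the failed
        # kernel is dropped and nothing remembers that it failed.
        self.pending.clear()

    def finish(self):
        # Step 3: an empty queue looks like "all work is done".
        if not self.pending:
            return Status.SUCCESS
        # Otherwise finish waits for the kernel and observes the failed launch.
        self.driver_launch()
        return Status.FAILURE


def run(launch_before_finish):
    """Return (enqueue status, finish status) for one ordering of 2. and 3."""
    queue = ToyQueue()
    enqueue_status = queue.enqueue("calls_struct")
    if launch_before_finish:
        queue.driver_launch()
    return enqueue_status, queue.finish()
```

In this model run(True) reports success for both calls even though the kernel
never ran, while run(False) reports the failure from the finish call. That is
the same split between "wrong results" and "fail" that shows up in the test
output.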

The following series should address that:
https://patchwork.freedesktop.org/series/52857/

> This would explain why "amdgpu" seemed to not even attempt to reset the GPU
> after a crash.
>
> However, I think I've got at least one lockup when testing this issue half a
> year ago on "radeon" driver ("amdgpu" is still marked as experimental for SI
> parts).
> If I am able to reproduce it in the future I will report it then.

Comment #1 shows an example of a successful restart using radeon.ko, so I guess
it worked for at least some ASICs. At any rate, restarting the GPU is a
separate, kernel-side problem.
Feel free to remove the relocation guard if you want to investigate GPU reset.

