From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 105113] [hawaii, radeonsi, clover] Running Piglit
cl/program/execute/{, tail-}calls{, -struct,
-workitem-id}.cl cause GPU VM error and ring stalled GPU lockup
Date: Mon, 19 Nov 2018 14:03:36 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0822147913=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[IPv6:2610:10:20:722:a800:ff:fe98:4b55])
by gabe.freedesktop.org (Postfix) with ESMTP id 2A57489FD9
for ; Mon, 19 Nov 2018 14:03:36 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============0822147913==
Content-Type: multipart/alternative; boundary="15426362161.FeCC.28457"
Content-Transfer-Encoding: 7bit
--15426362161.FeCC.28457
Date: Mon, 19 Nov 2018 14:03:36 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=105113
--- Comment #8 from Maciej S. Szmigiero ---
(In reply to Jan Vesely from comment #7)
> (In reply to Maciej S. Szmigiero from comment #6)
> > There are really two issues at play here:
> > 1) If the LLVM-generated code cannot be run properly then it should simply
> > be rejected by whatever is actually in charge of submitting it to the GPU
> > (I guess this would be Mesa?).
> > This way an application will know it cannot use OpenCL for computation, at
> > least not with this compute kernel.
> >
> > Instead, it currently looks like many of these tests run but give incorrect
> > results, which is obviously rather bad.
>
> Do you have an example of this? clover should return OUT_OF_RESOURCES error
> when the compute state creation fails (like in the presence of code
> relocations).
> It does not change the content of the buffer, so it will return whatever was
> stored in the buffer on creation.
Aren't the program@execute@calls-struct and program@execute@tail-calls tests
from comment 4 examples of this behavior?
These seem to run but return wrong results, or am I misreading the piglit
test results?
> > 2) Some (previous) Mesa + LLVM versions generate a command stream that
> > crashes the GPU and, as far as I can remember, sometimes even lock up the
> > whole machine.
> >
> > It should not be possible to crash the GPU, regardless of how incorrect a
> > command stream that userspace sends to it is, because otherwise it is
> > possible for an unprivileged user with GPU access to DoS the machine.
>
> This is a separate issue. GPU hangs are generally addressed via gpu reset
> which should be enabled for gfx8/9 GPUs in recent amdgpu.ko [0]
>
> [0] https://patchwork.freedesktop.org/patch/257994/
This would explain why "amdgpu" did not even seem to attempt a GPU reset
after a crash.
However, I think I got at least one lockup when testing this issue half a
year ago on the "radeon" driver ("amdgpu" is still marked as experimental for
SI parts).
If I am able to reproduce it in the future I will report it then.
-- 
You are receiving this mail because:
You are the assignee for the bug.
--15426362161.FeCC.28457--
--===============0822147913==--