From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 108272] [polaris10] opencl-mesa: Anything using OpenCL
segfaults, XFX Radeon RX 580
Date: Mon, 17 Dec 2018 17:46:12 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1340947340=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[IPv6:2610:10:20:722:a800:ff:fe98:4b55])
by gabe.freedesktop.org (Postfix) with ESMTP id DFA826E60F
for ; Mon, 17 Dec 2018 17:46:11 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============1340947340==
Content-Type: multipart/alternative; boundary="15450687711.2a8a5a8.25436"
Content-Transfer-Encoding: 7bit
--15450687711.2a8a5a8.25436
Date: Mon, 17 Dec 2018 17:46:11 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=108272
--- Comment #12 from Jan Vesely ---
Hi,
Sorry for the delay; somehow I missed the notifications.
(In reply to jamespharvey20 from comment #11)
> When I originally filed this, I assumed it was 1 bug since I tried 2 things
> with OpenCL, and both failed with opencl-mesa but worked with opencl-amd.
>
> Jan Vesely was correct that there were two separate problems.
>
> I'm hoping Jan Vesely can give guidance on whether to leave this bug open
> for any of the reasons below, or if I should close it and potentially open
> up 1-2 new bugs.
>
> The original luxmark bug (segfault) is solved, but that exposes 2 new
> opencl-mesa bugs when running luxmark.
>
> The original IndigoBenchmark bug (segfault) isn't solved, but as explained
> below, I understand if we have to consider that unsolvable for now.
>
> I don't think this affects any of these bugs, but I'll mention a few weeks
> ago, I switched back to my Asus Radeon R9 390. The same behaviors discussed
> in this entire bug report occur. (i.e. 18.2.3 and before crash luxmark.)
> If someone really wants me to do so, I can switch back to the RX 580 to test
> 18.2.4, but I'm betting since it works properly with the R9 390 that the
> problem is fixed.
>
> ORIGINAL LUXMARK BUG #1
> -----------------------------------------
>
> Using mesa 18.2.4, the luxmark segfault is solved.
As this was the first bug, I'd close this one and open new bugs for both the
Indigo crash and the incorrect rendering in luxmark.
>
> NEW - LUXMARK BUG #2
> ------------------------------------
>
> Jan Vesely's comment on 2018-10-09 mentions: "bumping MAX_GLOBAL_BUFFERS to
> 32 allows luxmark to run, albeit still with many incorrect pixels -- libclc
> rounding conversions are incorrect."
>
> That's what I'm seeing out of 18.2.4. Using LuxBall HDR (Simple Benchmark):
>
> MESA 18.2.4: 40626 (Image validation OK (65739 different pixels, 10.27%)
>
> AMDGPU-PRO: 15739 (Image validation OK (5736 different pixels, 0.90%)
>
> There's no typos there. opencl-mesa scores almost unbelievably higher than
> opencl-amd, but the different pixels percentage increases by a factor of
> 11.4.
>
> As Jan's other comment on 2018-10-09 mentions, the image looks garbled and
> the results are incorrect.
>
> Not sure if this bug should be left open for this issue, or if I should
> create a new bug. (Or, if there is a bug already open for it.) Or, if mesa
> will say it's purely libclc's problem, and to go to them about it.
I'd say this is probably purely a libclc problem, but feel free to open the
bug against clover on freedesktop. 10% is rather good; I usually saw ~30%
wrong pixels on my machines.
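
The "factor of 11.4" quoted above can be re-derived from the two reported LuxBall HDR results. A minimal Python sketch, using only the numbers from the report (the dictionary keys and the `ratio` helper are illustrative names, not part of LuxMark):

```python
# Numbers copied from the two LuxBall HDR results quoted in the report.
mesa = {"score": 40626, "bad_pixels": 65739, "bad_pct": 10.27}
amdgpu_pro = {"score": 15739, "bad_pixels": 5736, "bad_pct": 0.90}


def ratio(a: float, b: float) -> float:
    """Return a / b rounded to two decimals."""
    return round(a / b, 2)


# How much higher opencl-mesa scores than opencl-amd.
score_ratio = ratio(mesa["score"], amdgpu_pro["score"])
# How much larger the share of differing pixels is (the "factor of 11.4").
pct_ratio = ratio(mesa["bad_pct"], amdgpu_pro["bad_pct"])
# Same comparison from the raw pixel counts rather than rounded percentages.
pixel_ratio = ratio(mesa["bad_pixels"], amdgpu_pro["bad_pixels"])

print(score_ratio, pct_ratio, pixel_ratio)
```

So the score is roughly 2.6 times higher with opencl-mesa, while the number of differing pixels is roughly 11.4 to 11.5 times higher, depending on whether the rounded percentages or the raw counts are used.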
>
> NEW - LUXMARK BUG #3
> ------------------------------------
>
> Although luxmark can now benchmark, when doing so, all input becomes
> unusably awful. It reminds me of when Windows has too many things open,
> suddenly decided it can't cope, and you're waiting to see if it's going to
> recover or crash. Keystrokes take too long to be printed, and the mouse
> becomes slow and jumpy. Top shows cpu and memory usage are fine, which was
> my first thought. BTW, running xf86-video-amdgpu 18.1.0, and when I
> upgraded mesa, it was both mesa and opencl-mesa.
>
> In comparison, if I use opencl-amd, input is not affected. I wouldn't even
> know the GPU is being slammed.
>
> Using the program radeontop, I can see when using mesa, "Graphics pipe",
> "Texture Addresser", and "Shader Interpolator" are between 95-100%, usually
> 98-100%.
>
> When using opencl-amd, radeontop shows the same. (Granted, Vertex Grouper +
> Tesselator / Shader Export/Scan Converter/Depth Block/Color Block bounce
> between 5-20% vs on opencl-mesa, they bounce between 1-5%.)
This sounds like a GPU priority/scheduling problem. I haven't looked into
whether it can be solved by opening a lower-priority pipe for compute, or
whether we need to enable advanced features like CWSR. Please open a separate
bug. Hogging a large portion of the GPU might explain some of that high score.
>
> INDIGO BUG
> ------------------
>
> I edited 18.2.4's si_get.c to be very short:
>
> snprintf(sscreen->renderer_string, sizeof(sscreen->renderer_string),
>          "%s", chip_name);
>
> And compiled/installed it, but it didn't affect the crash.
>
> IndigoBenchmark said they're statically linking with LLVM 3.4, which is
> quite old. But, it runs fine with opencl-amd, and only crashes on
> opencl-mesa. I just posted a followup "where do we go from here"-ish
> comment there which has to be moderator approved so isn't showing yet.
> https://www.indigorenderer.com/forum/viewtopic.php?f=37&t=14986
>
> Part of me thinks it needs to be given up on, being a closed-source
> precompiled binary statically linked against LLVM 3.4.
>
> Part of me thinks since it only crashes with opencl-mesa, and runs perfectly
> fine with opencl-amd, there's probably (but not definitely) a bug in
> opencl-mesa.
>
> But, I understand since they don't seem to be paying this any attention, we
> may have to give up on the Indigo Bug as being unable to be realistically
> investigated further.
Can you check if indigo exports any LLVM symbols? It might be that we end up
using those instead of the new ones from libLLVM.*
If that's the case, one solution would be to link mesa/clover against a static
LLVM. Enabling symbol versioning for LLVM should work as well.
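
One possible way to run that check is sketched below in Python. It assumes GNU `nm` from binutils is installed and relies on a naming heuristic only: LLVM's C API functions start with `LLVM`, and Itanium-mangled C++ names in namespace `llvm` contain the token `4llvm`. The function names and the sample `nm` output are made up for illustration and are not taken from the real Indigo binary.

```python
import subprocess


def llvm_symbols(nm_output: str) -> list[str]:
    """Pick LLVM-looking names out of `nm -D --defined-only` output."""
    found = []
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        name = fields[-1]
        # Heuristic: C API functions start with "LLVM"; Itanium-mangled C++
        # names in namespace llvm contain the length-prefixed token "4llvm".
        if name.startswith("LLVM") or (name.startswith("_Z") and "4llvm" in name):
            found.append(name)
    return found


def exported_llvm_symbols(binary_path: str) -> list[str]:
    """Run nm on the binary and return its exported LLVM-looking symbols."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", binary_path],
        capture_output=True, text=True, check=True,
    )
    return llvm_symbols(result.stdout)


# Hypothetical nm output, used here only to exercise the filter.
sample = """\
0000000000401000 T main
0000000000402000 T LLVMContextCreate
0000000000403000 W _ZN4llvm11raw_ostream5writeEPKcm
0000000000404000 T _ZN6Indigo4initEv
"""
print(llvm_symbols(sample))
```

Calling `exported_llvm_symbols()` on the benchmark executable would list the candidates. A non-empty result would be consistent with the symbol-clash theory, though it would not prove it on its own.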
-- 
You are receiving this mail because:
You are the assignee for the bug.
--15450687711.2a8a5a8.25436--
--===============1340947340==--