From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 108272] [polaris10] opencl-mesa: Anything using OpenCL segfaults, XFX Radeon RX 580
Date: Mon, 17 Dec 2018 17:46:12 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=108272

Comment #12 on bug 108272 from Jan Vesely
Hi,

Sorry for the delay; somehow I missed the notifications.
(In reply to jamespharvey20 from comment #11)
> When I originally filed this, I assumed it was 1 bug since I tried 2 things
> with OpenCL, and both failed with opencl-mesa but worked with opencl-amd.
>
> Jan Vesely was correct that there were two separate problems.
>
> I'm hoping Jan Vesely can give guidance on whether to leave this bug open
> for any of the reasons below, or if I should close it and potentially open
> up 1-2 new bugs.
>
> The original luxmark bug (segfault) is solved, but that exposes 2 new
> opencl-mesa bugs when running luxmark.
>
> The original IndigoBenchmark bug (segfault) isn't solved, but as explained
> below, I understand if we have to consider that unsolvable for now.
>
> I don't think this affects any of these bugs, but I'll mention a few weeks
> ago, I switched back to my Asus Radeon R9 390.  The same behaviors discussed
> in this entire bug report occur.  (i.e. 18.2.3 and before crash luxmark.)
> If someone really wants me to do so, I can switch back to the RX 580 to test
> 18.2.4, but I'm betting since it works properly with the R9 390 that the
> problem is fixed.
>
> ORIGINAL LUXMARK BUG #1
> -----------------------------------------
>
> Using mesa 18.2.4, the luxmark segfault is solved.

As this was the first bug, I'd close this one and open new bugs for both Indigo
and the incorrect rendering in luxmark.

>
> NEW - LUXMARK BUG #2
> ------------------------------------
>
> Jan Vesely's comment on 2018-10-09 mentions: "bumping MAX_GLOBAL_BUFFERS to
> 32 allows luxmark to run, albeit still with many incorrect pixels -- libclc
> rounding conversions are incorrect."
>
> That's what I'm seeing out of 18.2.4.  Using LuxBall HDR (Simple Benchmark):
>
> MESA 18.2.4: 40626 (Image validation OK (65739 different pixels, 10.27%)
>
> AMDGPU-PRO: 15739 (Image validation OK (5736 different pixels, 0.90%)
>
> There's no typos there.  opencl-mesa scores almost unbelievably higher than
> opencl-amd, but the different pixels percentage increases by a factor of
> 11.4.
>
> As Jan's other comment on 2018-10-09 mentions, the image looks garbled and
> the results are incorrect.
>
> Not sure if this bug should be left open for this issue, or if I should
> create a new bug.  (Or, if there is a bug already open for it.)  Or, if mesa
> will say it's purely libclc's problem, and to go to them about it.

I'd say this is probably purely a libclc problem, but feel free to open the bug
against clover on freedesktop. 10% is rather good; I usually saw ~30% wrong
pixels on my machines.
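If someone does open the libclc bug, a host-side reference to compare device results against may help pin down which conversions are wrong. OpenCL C encodes the rounding mode as a suffix on the conversion builtins (`_rte`, `_rtz`, `_rtp`, `_rtn`), and conversions to integer types default to `_rtz`. Below is a minimal sketch of such a reference for `float` to `int` only. It is not libclc's code, and the names `round_mode` and `ref_convert_int` are hypothetical. It assumes the input is finite and fits in `int`; out-of-range and `_sat` behavior are not modeled.

```c
#include <assert.h>

/* Hypothetical names mirroring the OpenCL suffixes _rte, _rtz, _rtp, _rtn. */
enum round_mode { RTE, RTZ, RTP, RTN };

/*
 * Reference float -> int conversion for checking device results.
 * Assumes x is finite and inside int range (no _sat semantics).
 * Uses only integer casts, so it does not depend on the host's
 * current floating-point rounding mode.
 */
int ref_convert_int(float x, enum round_mode mode)
{
    /* Also rejects NaN, since comparisons with NaN are false. */
    assert(x >= -2147483648.0f && x < 2147483648.0f);

    int t = (int)x;            /* C casts truncate toward zero, i.e. _rtz */
    float diff = x - (float)t; /* exact: the signed fractional part of x */

    switch (mode) {
    case RTZ:
        return t;
    case RTP:
        return diff > 0.0f ? t + 1 : t;
    case RTN:
        return diff < 0.0f ? t - 1 : t;
    case RTE:
        if (diff > 0.5f)
            return t + 1;
        if (diff < -0.5f)
            return t - 1;
        /* Exact halfway cases go to the even neighbour. */
        if (diff == 0.5f)
            return (t % 2 != 0) ? t + 1 : t;
        if (diff == -0.5f)
            return (t % 2 != 0) ? t - 1 : t;
        return t;
    }
    return t; /* not reached for valid modes */
}
```

The matching device-side builtins are `convert_int_rte()`, `convert_int_rtz()`, `convert_int_rtp()` and `convert_int_rtn()`. Comparing them element-wise against a reference like this one, with inputs that include exact .5 ties and negative values, is one way to narrow down the failures.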

>
> NEW - LUXMARK BUG #3
> ------------------------------------
>
> Although luxmark can now benchmark, when doing so, all input becomes
> unusably awful.  It reminds me of when Windows has too many things open,
> suddenly decided it can't cope, and you're waiting to see if it's going to
> recover or crash.  Keystrokes take too long to be printed, and the mouse
> becomes slow and jumpy.  Top shows cpu and memory usage are fine, which was
> my first thought.  BTW, running xf86-video-amdgpu 18.1.0, and when I
> upgraded mesa, it was both mesa and opencl-mesa.
>
> In comparison, if I use opencl-amd, input is not affected.  I wouldn't even
> know the GPU is being slammed.
>
> Using the program radeontop, I can see when using mesa, "Graphics pipe",
> "Texture Addresser", and "Shader Interpolator" are between 95-100%, usually
> 98-100%.
>
> When using opencl-amd, radeontop shows the same.  (Granted, Vertex Grouper +
> Tesselator / Shader Export/Scan Converter/Depth Block/Color Block bounce
> between 5-20% vs on opencl-mesa, they bounce between 1-5%.)

This sounds like a GPU priority/scheduling problem. I haven't looked into
whether it can be solved by opening a lower-priority pipe for compute, or
whether we need to enable advanced features like CWSR (compute wave
save/restore). Please open a separate bug. Hogging a large portion of the GPU
might explain some of that high score.

>
> INDIGO BUG
> ------------------
>
> I edited 18.2.4's si_get.c to be very short:
>
>     snprintf(sscreen->renderer_string, sizeof(sscreen->renderer_string),
>        "%s",
>        chip_name);
>
> And compiled/installed it, but it didn't affect the crash.
>
> IndigoBenchmark said they're statically linking with LLVM 3.4, which is
> quite old.  But, it runs fine with opencl-amd, and only crashes on
> opencl-mesa.  I just posted a followup "where do we go from here"-ish
> comment there which has to be moderator approved so isn't showing yet.
> https://www.indigorenderer.com/forum/viewtopic.php?f=37&t=14986
>
> Part of me thinks it needs to be given up on, being a closed-source
> precompiled binary statically linked against LLVM 3.4.
>
> Part of me thinks since it only crashes with opencl-mesa, and runs perfectly
> fine with opencl-amd, there's probably (but not definitely) a bug in
> opencl-mesa.
>
> But, I understand since they don't seem to be paying this any attention, we
> may have to give up on the Indigo Bug as being unable to be realistically
> investigated further.

Can you check whether Indigo exports any LLVM symbols? It might be that we end
up using those instead of the newer ones from libLLVM.*
If that's the case, one solution would be to link mesa/clover against a static
LLVM. Enabling symbol versioning for LLVM should work as well.
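One way to answer that question is to list the dynamic symbols that the Indigo binary (and any libraries it ships) defines, for example with GNU `nm -D --defined-only`, and look for LLVM names. The sketch below is a hypothetical filter over that output; the helper names are made up. It flags mangled C++ names in the `llvm` namespace (`_ZN4llvm...`, `_ZNK4llvm...`, plus the vtable and typeinfo prefixes) and C-API names starting with `LLVM`. The prefix list is a heuristic and is not exhaustive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Heuristic prefixes for LLVM symbols (an assumption, not a complete list). */
static const char *const llvm_prefixes[] = {
    "_ZN4llvm",   /* functions/objects in namespace llvm */
    "_ZNK4llvm",  /* const member functions */
    "_ZTVN4llvm", /* vtables */
    "_ZTIN4llvm", /* typeinfo */
    "_ZTSN4llvm", /* typeinfo names */
    "LLVM",       /* C API, e.g. LLVMContextCreate */
};

bool is_llvm_symbol(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof llvm_prefixes / sizeof llvm_prefixes[0]; i++) {
        if (strncmp(name, llvm_prefixes[i], strlen(llvm_prefixes[i])) == 0)
            return true;
    }
    return false;
}

/* The symbol name is the last space-separated field of an nm output line.
 * Strips the trailing newline in place. */
const char *nm_symbol_name(char *line)
{
    line[strcspn(line, "\n")] = '\0';
    char *space = strrchr(line, ' ');
    return space ? space + 1 : line;
}

/* Reads nm output from `in`, prints LLVM-looking names to `out`, and returns
 * how many were found. Lines longer than the buffer are read in pieces,
 * which is acceptable for a rough check. */
size_t report_llvm_symbols(FILE *in, FILE *out)
{
    char line[8192];
    size_t hits = 0;
    while (fgets(line, sizeof line, in)) {
        const char *name = nm_symbol_name(line);
        if (is_llvm_symbol(name)) {
            fprintf(out, "%s\n", name);
            hits++;
        }
    }
    return hits;
}
```

With a short `main` that calls `report_llvm_symbols(stdin, stdout)`, the output of `nm -D --defined-only <indigo binary>` could be piped into it. A non-zero count would be consistent with the theory that Indigo's copy of LLVM 3.4 takes precedence over the symbols clover expects from libLLVM.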


You are receiving this mail because:
  • You are the assignee for the bug.