From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 108854] [polaris11] - GPU Hang - ring gfx timeout
Date: Fri, 22 Feb 2019 18:15:41 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=108854

Comment #16 on bug 108854 from Tom Seewald
(In reply to Tom St Denis from comment #15)
> If you can't reproduce on a newer version of mesa then it's "been fixed" :-)

My (probably incorrect) understanding is roughly this:

    +-------+-------+
1.) |  Application  |
    +-------+-------+
       |
       | Possibly sending bad commands/calls to Mesa
       |
       v
    +------+---------+
2.) |     Mesa       |
    +------+---------+
       |
       | Passing on bad calls from the application
       |     or
       | There is a bug in Mesa itself where it is sending bad calls/commands
       | to the kernel
       |
       v
    +--------+--------+
3.) |  Kernel/amdgpu  |
    +--------+--------+
       |
       | amdgpu puts the physical device in a bad state due to bad commands
       | from Mesa
       |
       v
    +--------+--------+
4.) |       GPU       |
    +--------+--------+

Given that mesa 18.3.3+ "fixes" the issue, it sounds like a specific case of
mesa sending garbage to the kernel (step 2 to 3) has been fixed.

But in general shouldn't the kernel driver (ideally) be able to handle mesa
passing malformed/bad commands rather than freezing the device (step 3 to 4)?
I understand not every case can be covered, and I also understand that GPU
resets need to be supported in user space for seamless recovery, but shouldn't
the driver "unstick" itself enough so the computer can be rebooted normally?
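
To make the question concrete, here is a toy Python model of steps 3 and 4.
It is only a sketch of the two behaviours being asked about (rejecting bad
commands before they reach the device, and resetting the device if it hangs
anyway). Every name in it (ToyGpu, ToyDriver, KNOWN_OPCODES, MAX_SIZE) is
hypothetical, and nothing here reflects how amdgpu or Mesa are actually
implemented.

```python
from dataclasses import dataclass

# Hypothetical opcode set and size limit for the toy device.
KNOWN_OPCODES = {"DRAW", "COPY", "FENCE"}
MAX_SIZE = 4096


@dataclass(frozen=True)
class Command:
    opcode: str
    size: int


def is_well_formed(cmd):
    """Return True if the toy device can safely execute this command."""
    return cmd.opcode in KNOWN_OPCODES and 0 < cmd.size <= MAX_SIZE


class ToyGpu:
    """Step 4 stand-in: hangs on any command it does not understand."""

    def __init__(self):
        self.hung = False
        self.executed = []

    def execute(self, cmd):
        if self.hung:
            return  # a hung device ignores further work
        if not is_well_formed(cmd):
            self.hung = True
            return
        self.executed.append(cmd)

    def reset(self):
        self.hung = False


class ToyDriver:
    """Step 3 stand-in: optionally validates, and resets the device on a hang."""

    def __init__(self, gpu, validate):
        self.gpu = gpu
        self.validate = validate
        self.resets = 0

    def submit(self, cmd):
        """Return True if the command ran, False if it was rejected or hung."""
        if self.validate and not is_well_formed(cmd):
            return False  # rejected before it reaches the device
        self.gpu.execute(cmd)
        if self.gpu.hung:
            # Fallback: "unstick" the device so later work can proceed.
            self.gpu.reset()
            self.resets += 1
            return False
        return True
```

With validate=True a malformed command is refused and the device never
hangs. With validate=False the device hangs, the driver resets it once, and
later well-formed commands still run. A real driver cannot check everything
up front, which is why the reset path matters in this model.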

Thanks for your time and patience.

