Date: Wed, 7 Dec 2016 17:09:02 +0100
From: Daniel Vetter
To: Matt Turner
Cc: Jani Nikula, "intel-gfx@lists.freedesktop.org", LKML, Kenneth Graunke, Daniel Vetter, Mika Kuoppala
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Remove instructions to file a bug report.
Message-ID: <20161207160902.7gwosv2m4z3xikib@phenom.ffwll.local>
References: <1480726985-12762-1-git-send-email-mattst88@gmail.com> <87inr1qqz2.fsf@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Dec 05, 2016 at 04:55:47PM -0800, Matt Turner wrote:
> On Sat, Dec 3, 2016 at 1:52 AM, Jani Nikula wrote:
> > On Sat, 03 Dec 2016, Matt Turner wrote:
> >> From these instructions, users assume that /sys/class/drm/card0/error
> >> contains all the information a developer needs to diagnose and fix a GPU
> >> hang.
> >>
> >> In fact it doesn't, and we have no tools for solving them (other than
> >> stabbing in the dark). Most of the time the error state itself isn't
> >> even useful because it just shows a hang on a PIPE_CONTROL or similar.
> >>
> >> Until a time when the error state contains enough information to
> >> actually solve a hang, stop telling users to file unsolvable bugs, and
> >> instead rely on users who know where and how to file a good bug report
> >> to find their own way there.
> >>
> >> Signed-off-by: Matt Turner
> >> ---
> >> Maybe now's a good time to discuss what *would* be useful to put in the
> >> error state for debugging hangs. The currently executing shader program
> >> would be a great place to start.
> >
> > I'm wondering why we're getting this patch now, and my guess is that
> > it's because we have been reassigning the related bugs to Mesa more
> > actively lately. Is that the case?
>
> No, it's simply because I spent a week going through Bugzilla and
> realized how incomplete and unactionable the majority of GPU hang
> reports are.
>
> Asking users to report bugs, but not telling them what actually
> constitutes a bug report, is a recipe for a lot of wasted developer
> time.
>
> I suspect we could improve the usefulness of the reports by directing
> users to a webpage that gave a few suggestions (tell us what you were
> doing when the hang occurred would be an obvious one) about filing a
> bug and then provided a link to Bugzilla. Or even configured Bugzilla
> to have a default template that requested various bits of information.

I think dumping at least some of the aux buffers would make this much
more useful for Mesa, since it would reveal patterns like "we always
die on resolves on SKL GT4". So far error states have mostly been used
by kernel folks to debug kernel issues, which is why none of that
additional state gets dumped. But a bare-bones parser that hunts for
the indirect state base addresses and fishes out the aux buffers
shouldn't be that hard to write, and might make the error state fully
useful. As Chris said, the goal is at least to be able to triage and
classify bugs, and I'm perfectly fine with merging additional code
into the dumper to get there for the Mesa folks.
We zlib-compress the error state, so size isn't really an issue. And
Ben has commit rights, so it shouldn't be a problem to get this all
merged.

> > IIUC the bug reports are useful for us when it's a kernel bug, but less
> > useful for you when it's a Mesa bug. And you'd rather have fewer
> > incoming bugs that you think are unsolvable with the information at
> > hand.
> >
> > Sounds like a bug workflow issue between drm/i915 and Mesa to be ironed
> > out.
>
> Indeed. I'd rather have the information provided in a bug report to
> actually solve it. I hope having access to the shader program will
> make many more reports useful.
>
> I am also happy to see that there's now a sunset to the GPU hang message.

The other option: if the Mesa folks don't want the error-state bugs
that we triage to Mesa, we could definitely update the process in that
area.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch