From: Chris Wilson
To: Matt Turner
Cc: intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org, Kenneth Graunke, Daniel Vetter, Mika Kuoppala
Date: Sat, 3 Dec 2016 08:57:00 +0000
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Remove instructions to file a bug report.
Message-ID: <20161203085700.GA9731@nuc-i3427.alporthouse.com>
In-Reply-To: <1480726985-12762-1-git-send-email-mattst88@gmail.com>

On Fri, Dec 02, 2016 at 05:03:05PM -0800, Matt Turner wrote:
> From these instructions, users assume that /sys/class/drm/card0/error
> contains all the information a developer needs to diagnose and fix a GPU
> hang.
>
> In fact it doesn't, and we have no tools for solving them (other than
> stabbing in the dark). Most of the time the error state itself isn't
> even useful because it just shows a hang on a PIPE_CONTROL or similar.
>
> Until a time when the error state contains enough information to
> actually solve a hang, stop telling users to file unsolvable bugs, and
> instead rely on users who know where and how to file a good bug report
> to find their own way there.
>
> Signed-off-by: Matt Turner

Nak. Though having stale bug reports is a pain (we've recently adopted
the policy of stopping the request after a certain period), those bug
reports are still vital. They don't just represent bugs in mesa.

> ---
> Maybe now's a good time to discuss what *would* be useful to put in the
> error state for debugging hangs. The currently executing shader program
> would be a great place to start.

Now? That is the conversation we've been trying to have for several
years. The contents of the error state are currently about sufficient to
spot kernel bugs, triage the culprit and the general class of bug.
Capturing all state for a request is unfeasible (because we can't copy
the gigabytes of memory required). Copying a selected set of aux bo is
one option. And since those bo are under user control and do not have to
be executed, you can even store aub data in them or whatnot.

Even if you make attaching the debug information conditional, I would
still keep the error message unconditional.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre