From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1750782AbdALGDb (ORCPT ); Thu, 12 Jan 2017 01:03:31 -0500
Received: from mx2.suse.de ([195.135.220.15]:53527 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750698AbdALGD3 (ORCPT ); Thu, 12 Jan 2017 01:03:29 -0500
Subject: Re: [Intel-gfx] GPU hang with kernel 4.10rc3
To: Chris Wilson, Linux Kernel Mailing List,
	dri-devel@lists.freedesktop.org, intel-gfx, airlied@linux.ie,
	daniel.vetter@intel.com
References: <7737c1e1-7523-1eea-07a9-0be04b8078e9@suse.com>
	<20170111170823.GC16278@nuc-i3427.alporthouse.com>
From: Juergen Gross
Message-ID: <37feeb4d-7dfc-5866-6a25-b204701a4938@suse.com>
Date: Thu, 12 Jan 2017 07:03:25 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
	Thunderbird/45.6.0
MIME-Version: 1.0
In-Reply-To: <20170111170823.GC16278@nuc-i3427.alporthouse.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/01/17 18:08, Chris Wilson wrote:
> On Wed, Jan 11, 2017 at 05:33:34PM +0100, Juergen Gross wrote:
>> With kernel 4.10rc3 running as Xen dom0 I get at each boot:
>>
>> [ 49.213697] [drm] GPU HANG: ecode 7:0:0x3d1d3d3d, in gnome-shell
>> [1431], reason: Hang on render ring, action: reset
>> [ 49.213699] [drm] GPU hangs can indicate a bug anywhere in the entire
>> gfx stack, including userspace.
>> [ 49.213700] [drm] Please file a _new_ bug report on
>> bugs.freedesktop.org against DRI -> DRM/Intel
>> [ 49.213700] [drm] drm/i915 developers can then reassign to the right
>> component if it's not a kernel issue.
>> [ 49.213700] [drm] The gpu crash dump is required to analyze gpu
>> hangs, so please always attach it.
>> [ 49.213701] [drm] GPU crash dump saved to /sys/class/drm/card0/error
>> [ 49.213755] drm/i915: Resetting chip after gpu hang
>> [ 60.213769] drm/i915: Resetting chip after gpu hang
>> [ 71.189737] drm/i915: Resetting chip after gpu hang
>> [ 82.165747] drm/i915: Resetting chip after gpu hang
>> [ 93.205727] drm/i915: Resetting chip after gpu hang
>>
>> The dump is attached.
>
> That's a nasty one. The first couple of pages of the batchbuffer appear
> to be overwritten. (Full of 0xc2c2c2c2, i.e. probably pixel data.) That
> may be a concurrent write by either the GPU or CPU, or we may have
> incorrectly mapped a set of pages. That it doesn't recover suggests
> that the corruption occurs frequently, probably on every request/batch.

I hoped someone would have an idea already.

> Is this a new bug? Bisection would be the fastest way to triage it.

Commit 7453c549f was still okay. Starting bisect now (2882 commits,
12 steps) ...


Juergen
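
The "12 steps" figure for 2882 commits matches what a plain binary search
would need: each test halves the remaining range, so about ceil(log2(N))
builds are required. A minimal sketch of that arithmetic, assuming an
idealized halving search (git's own estimate can differ slightly, and
`bisect_steps` is a hypothetical helper, not part of git):

```python
def bisect_steps(commits: int) -> int:
    """Upper bound on tests for an idealized binary search over `commits` candidates."""
    if commits < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using integer math only.
    return (commits - 1).bit_length()


if __name__ == "__main__":
    # 2**11 = 2048 < 2882 <= 4096 = 2**12, so the bound is 12.
    print(bisect_steps(2882))
```

The bound only changes when the range crosses a power of two, so a few
skipped or untestable commits would not move the estimate much.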