From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Vetter
Subject: Re: [PATCH] drm/i915: add interface to simulate gpu hangs
Date: Sat, 5 May 2012 21:13:10 +0200
Message-ID: <20120505191310.GB4985@phenom.ffwll.local>
References: <1335532667-10597-1-git-send-email-daniel.vetter@ffwll.ch> <1336049296-5494-1-git-send-email-daniel.vetter@ffwll.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Eugeni Dodonov
Cc: Daniel Vetter, Intel Graphics Development
List-Id: intel-gfx@lists.freedesktop.org

On Thu, May 03, 2012 at 04:00:00PM -0300, Eugeni Dodonov wrote:
> On Thu, May 3, 2012 at 9:48 AM, Daniel Vetter wrote:
>
> > gpu reset is a very important piece of our infrastructure.
> > Unfortunately we only really test it by actually hanging the gpu,
> > which often has bad side-effects for the entire system. And the gpu
> > hang handling code is one of the rather complicated pieces of code we
> > have, consisting of
> > - hang detection
> > - error capture
> > - actual gpu reset
> > - reset of all the gem bookkeeping
> > - reinitialization of the entire gpu
> >
> > This patch adds a debugfs interface to selectively stop rings by ceasing
> > to update the hw tail pointer, which will result in the gpu no longer
> > updating its head pointer and eventually to the hangcheck firing.
> > This way we can exercise the gpu hang code under controlled conditions
> > without a dying gpu taking down the entire system.
> >
> > Patch motivated by me forgetting to properly reinitialize ppgtt after
> > a gpu reset.
> >
> > Usage:
> >
> > echo $((1 << $ringnum)) > i915_ring_stop # stops one ring
> >
> > echo 0xffffffff > i915_ring_stop # stops all, future-proof version
> >
> > then run whatever testload is desired. i915_ring_stop automatically
> > resets after a gpu hang is detected to avoid hanging the gpu too fast
> > and declaring it wedged.
> >
> > v2: Incorporate feedback from Chris Wilson.
> >
> > v3: Add the missing cleanup.
> >
> > v4: Fix up inconsistent size of ring_stop_read vs _write, noticed by
> > Eugeni Dodonov.
> >
> > Signed-Off-by: Daniel Vetter
> > Reviewed-by: Chris Wilson
> >
>
> Reviewed-by: Eugeni Dodonov

I've slurped the hangman into -next, thanks for the review.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48