From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: [PATCH] drm/i915: kicking rings considered harmful
Date: Mon, 26 Sep 2011 19:59:50 +0200
Message-ID: <1317059990-1922-1-git-send-email-daniel.vetter@ffwll.ch>
References: <CAObL_7Hi+c2aEtEMzrMNnrQXfnUmNY_ZnP==xCd7egMoKBow_g@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org>
Received: from mail-ww0-f43.google.com (mail-ww0-f43.google.com [74.125.82.43])
	by gabe.freedesktop.org (Postfix) with ESMTP id 8D78F9E86E
	for <intel-gfx@lists.freedesktop.org>;
	Mon, 26 Sep 2011 12:00:12 -0700 (PDT)
Received: by wwf27 with SMTP id 27so5930311wwf.12
	for <intel-gfx@lists.freedesktop.org>;
	Mon, 26 Sep 2011 12:00:11 -0700 (PDT)
In-Reply-To: <CAObL_7Hi+c2aEtEMzrMNnrQXfnUmNY_ZnP==xCd7egMoKBow_g@mail.gmail.com>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
To: intel-gfx <intel-gfx@lists.freedesktop.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
List-Id: intel-gfx@lists.freedesktop.org

Only do it in the hope of resurrecting the gpu. Disable when reset is
disabled because it seems to tremendously increase our chances of
actually capturing an error_state before the system goes all belly-up.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
Hi Andrew,

Can you please apply this patch and boot your system with

i915.reset=0 i915.semaphores=1

and rehang your gpu? With reset=0, this patch fully disables any
attempt at resurrecting a dead gpu, which hopefully prevents the full
system hang you're experiencing. At least it helps greatly here on my
systems.
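
For reference, the intended effect of the i915_irq.c hunk can be sketched
as a small standalone model. This is only an illustration under stated
assumptions, not driver code: struct hang_ctx, its fields and
would_kick_ring() are made-up names that stand in for IS_GEN2(dev), the
i915.reset parameter and the WAIT_FOR_EVENT check mentioned in the
hangcheck comment.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the state the hangcheck path consults. */
struct hang_ctx {
	bool is_gen2;      /* models IS_GEN2(dev) */
	bool try_reset;    /* models the i915.reset module parameter */
	bool ring_waiting; /* models a ring stuck on WAIT_FOR_EVENT */
};

/*
 * Returns true if the modelled hangcheck path would poke the ring.
 * With try_reset == false the ring is left alone, so the hung state
 * stays untouched for error capture.
 */
static bool would_kick_ring(const struct hang_ctx *ctx)
{
	if (ctx->is_gen2 || !ctx->try_reset)
		return false;

	return ctx->ring_waiting;
}
```

So booting with i915.reset=0 corresponds to try_reset == false above, and
the model never kicks the ring no matter what the ring is waiting on.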

If the system isn't completely dead with this, can you please ssh
into the machine and grab dmesg, i915_error_state, Xorg.log and
whatever else there might be?

Thanks a lot,

Daniel

 drivers/gpu/drm/i915/i915_drv.c |    2 +-
 drivers/gpu/drm/i915/i915_drv.h |    1 +
 drivers/gpu/drm/i915/i915_irq.c |    2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index b79c6f1..ad85c13 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -91,7 +91,7 @@ MODULE_PARM_DESC(vbt_sdvo_panel_type,
 		"Override selection of SDVO panel mode in the VBT "
 		"(default: auto)");
 
-static bool i915_try_reset __read_mostly = true;
+bool i915_try_reset __read_mostly = true;
 module_param_named(reset, i915_try_reset, bool, 0600);
 MODULE_PARM_DESC(reset, "Attempt GPU resets (default: true)");
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3621336..788a801 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -995,6 +995,7 @@ extern unsigned int i915_semaphores __read_mostly;
 extern unsigned int i915_lvds_downclock __read_mostly;
 extern unsigned int i915_panel_use_ssc __read_mostly;
 extern int i915_vbt_sdvo_panel_type __read_mostly;
+extern bool i915_try_reset __read_mostly;
 extern unsigned int i915_enable_rc6 __read_mostly;
 extern unsigned int i915_enable_fbc __read_mostly;
 extern bool i915_enable_hangcheck __read_mostly;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index da5d607..09c11e4 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1694,7 +1694,7 @@ void i915_hangcheck_elapsed(unsigned long data)
 		if (dev_priv->hangcheck_count++ > 1) {
 			DRM_ERROR("Hangcheck timer elapsed... GPU hung\n");
 
-			if (!IS_GEN2(dev)) {
+			if (!IS_GEN2(dev) && i915_try_reset) {
 				/* Is the chip hanging on a WAIT_FOR_EVENT?
 				 * If so we can simply poke the RB_WAIT bit
 				 * and break the hang. This should work on
-- 
1.7.6.2