All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/5] drm/i915: Add fifo underrun reporting state to debugfs
@ 2014-05-22 15:56 Daniel Vetter
  2014-05-22 15:56 ` [PATCH 2/5] drm/i915: Fix up fifo underrun tracking, take N Daniel Vetter
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Daniel Vetter @ 2014-05-22 15:56 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

On platforms with shared interrupt enable bits (which are shared even
with the pipe CRC logic) there's some tricky corner cases. Add
information to make debugging those easier.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 16bbdc7243df..4a79d3fe35be 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2356,6 +2356,10 @@ static int i915_display_info(struct seq_file *m, void *unused)
 				   x, y, crtc->cursor_addr,
 				   yesno(active));
 		}
+
+		seq_printf(m, "\tunderrun reporting: cpu=%s pch=%s \n",
+			   yesno(!crtc->cpu_fifo_underrun_disabled),
+			   yesno(!crtc->pch_fifo_underrun_disabled));
 	}
 
 	seq_printf(m, "\n");
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/5] drm/i915: Fix up fifo underrun tracking, take N
  2014-05-22 15:56 [PATCH 1/5] drm/i915: Add fifo underrun reporting state to debugfs Daniel Vetter
@ 2014-05-22 15:56 ` Daniel Vetter
  2014-05-22 16:55   ` Ville Syrjälä
  2014-05-22 15:56 ` [PATCH 3/5] drm/i915: Disable gpu reset on i965g/gm Daniel Vetter
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Daniel Vetter @ 2014-05-22 15:56 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

So apparently this is tricky.

We need to consider:
- We start out with all the hw enabling bits disabled, both the
  individual fifo underrun interrupts and the shared display error
  interrupts masked. Otherwise if the bios config is broken we'll blow
  up with a NULL deref in our interrupt handler since the crtc
  structures aren't set up yet at driver load time.
- On gmch we need to mask fifo underruns on the sw side, so always
  need to set that in sanitize_crtc for those platforms.
- On other platforms we try to set the sw tracking so that it reflects
  the real state. But since a few platforms have shared bits we must
  _not_ disable fifo underrun reporting. Otherwise we'll never enable
  the shared error interrupt.

This is the state before out patch, but unfortunately this is not good
enough. But after a suspend resume operation this is broken:
1. We don't enable the hw interrupts since the same code runs on
resume as on driver load.
2. The fifo underrun state adjustments we do in sanitize_crtc doesn't
fire on resume since (except for hilarious firmware) all pipes are off
at that point. But they also don't hurt since the subsequent crtc
enabling due to force_restore will enable fifo underruns.

Which means when we enable fifo underrun reporting we notice that the
per-crtc state is already correct and short-circuit everthing out. And
the interrupt doesn't get enabled.

A similar problem would happen if the bios doesn't light up anything
when the driver loads. Which is exactly what happens when we reload
the driver since our unload functions disables all outputs.

Now we can't just rip out the short-circuit logic and unconditionally
update the fifo underrun reporting interrupt masking: We have some
checks for shared error interrupts to catch issues that happened when
the shared error interrupt was disabled.

The right fix is to push down this logic so that we can always update
the hardware state, but only check for missed fifo underruns on a real
enabled->disabled transition and ignore them when we're already
disabled.

On platforms with shared error interrupt the pipe CRC interrupts are
grouped together with the fifo underrun reporting this fixes pipe CRC
support after suspend and driver reloads.

Testcase: igt/kms_pipe_crc_basic/suspend-*
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_irq.c | 44 ++++++++++++++++++-----------------------
 1 file changed, 19 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 304f86a3162c..4d44f09eb833 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -335,7 +335,8 @@ void i9xx_check_fifo_underruns(struct drm_device *dev)
 }
 
 static void i9xx_set_fifo_underrun_reporting(struct drm_device *dev,
-					     enum pipe pipe, bool enable)
+					     enum pipe pipe,
+					     bool enable, bool old)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u32 reg = PIPESTAT(pipe);
@@ -347,7 +348,7 @@ static void i9xx_set_fifo_underrun_reporting(struct drm_device *dev,
 		I915_WRITE(reg, pipestat | PIPE_FIFO_UNDERRUN_STATUS);
 		POSTING_READ(reg);
 	} else {
-		if (pipestat & PIPE_FIFO_UNDERRUN_STATUS)
+		if (old && pipestat & PIPE_FIFO_UNDERRUN_STATUS)
 			DRM_ERROR("pipe %c underrun\n", pipe_name(pipe));
 	}
 }
@@ -366,7 +367,8 @@ static void ironlake_set_fifo_underrun_reporting(struct drm_device *dev,
 }
 
 static void ivybridge_set_fifo_underrun_reporting(struct drm_device *dev,
-						  enum pipe pipe, bool enable)
+						  enum pipe pipe,
+						  bool enable, bool old)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	if (enable) {
@@ -379,7 +381,8 @@ static void ivybridge_set_fifo_underrun_reporting(struct drm_device *dev,
 	} else {
 		ironlake_disable_display_irq(dev_priv, DE_ERR_INT_IVB);
 
-		if (I915_READ(GEN7_ERR_INT) & ERR_INT_FIFO_UNDERRUN(pipe)) {
+		if (old &&
+		    I915_READ(GEN7_ERR_INT) & ERR_INT_FIFO_UNDERRUN(pipe)) {
 			DRM_ERROR("uncleared fifo underrun on pipe %c\n",
 				  pipe_name(pipe));
 		}
@@ -444,7 +447,7 @@ static void ibx_set_fifo_underrun_reporting(struct drm_device *dev,
 
 static void cpt_set_fifo_underrun_reporting(struct drm_device *dev,
 					    enum transcoder pch_transcoder,
-					    bool enable)
+					    bool enable, bool old)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
@@ -459,7 +462,8 @@ static void cpt_set_fifo_underrun_reporting(struct drm_device *dev,
 	} else {
 		ibx_disable_display_interrupt(dev_priv, SDE_ERROR_CPT);
 
-		if (I915_READ(SERR_INT) & SERR_INT_TRANS_FIFO_UNDERRUN(pch_transcoder)) {
+		if (old && I915_READ(SERR_INT) &
+		    SERR_INT_TRANS_FIFO_UNDERRUN(pch_transcoder)) {
 			DRM_ERROR("uncleared pch fifo underrun on pch transcoder %c\n",
 				  transcoder_name(pch_transcoder));
 		}
@@ -486,28 +490,23 @@ static bool __intel_set_cpu_fifo_underrun_reporting(struct drm_device *dev,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
-	bool ret;
+	bool old;
 
 	assert_spin_locked(&dev_priv->irq_lock);
 
-	ret = !intel_crtc->cpu_fifo_underrun_disabled;
-
-	if (enable == ret)
-		goto done;
-
+	old = !intel_crtc->cpu_fifo_underrun_disabled;
 	intel_crtc->cpu_fifo_underrun_disabled = !enable;
 
 	if (INTEL_INFO(dev)->gen < 5 || IS_VALLEYVIEW(dev))
-		i9xx_set_fifo_underrun_reporting(dev, pipe, enable);
+		i9xx_set_fifo_underrun_reporting(dev, pipe, enable, old);
 	else if (IS_GEN5(dev) || IS_GEN6(dev))
 		ironlake_set_fifo_underrun_reporting(dev, pipe, enable);
 	else if (IS_GEN7(dev))
-		ivybridge_set_fifo_underrun_reporting(dev, pipe, enable);
+		ivybridge_set_fifo_underrun_reporting(dev, pipe, enable, old);
 	else if (IS_GEN8(dev))
 		broadwell_set_fifo_underrun_reporting(dev, pipe, enable);
 
-done:
-	return ret;
+	return old;
 }
 
 bool intel_set_cpu_fifo_underrun_reporting(struct drm_device *dev,
@@ -556,7 +555,7 @@ bool intel_set_pch_fifo_underrun_reporting(struct drm_device *dev,
 	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pch_transcoder];
 	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	unsigned long flags;
-	bool ret;
+	bool old;
 
 	/*
 	 * NOTE: Pre-LPT has a fixed cpu pipe -> pch transcoder mapping, but LPT
@@ -569,21 +568,16 @@ bool intel_set_pch_fifo_underrun_reporting(struct drm_device *dev,
 
 	spin_lock_irqsave(&dev_priv->irq_lock, flags);
 
-	ret = !intel_crtc->pch_fifo_underrun_disabled;
-
-	if (enable == ret)
-		goto done;
-
+	old = !intel_crtc->pch_fifo_underrun_disabled;
 	intel_crtc->pch_fifo_underrun_disabled = !enable;
 
 	if (HAS_PCH_IBX(dev))
 		ibx_set_fifo_underrun_reporting(dev, pch_transcoder, enable);
 	else
-		cpt_set_fifo_underrun_reporting(dev, pch_transcoder, enable);
+		cpt_set_fifo_underrun_reporting(dev, pch_transcoder, enable, old);
 
-done:
 	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-	return ret;
+	return old;
 }
 
 
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/5] drm/i915: Disable gpu reset on i965g/gm
  2014-05-22 15:56 [PATCH 1/5] drm/i915: Add fifo underrun reporting state to debugfs Daniel Vetter
  2014-05-22 15:56 ` [PATCH 2/5] drm/i915: Fix up fifo underrun tracking, take N Daniel Vetter
@ 2014-05-22 15:56 ` Daniel Vetter
  2014-05-22 15:56 ` [PATCH 4/5] drm/i915: Inline ilk/gen8_irq_reset Daniel Vetter
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2014-05-22 15:56 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

Ville figured out that it needs a full display reset since apparently
a lot more goes down than just the GT. Until that's address it's
better to just diable gpu reset.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/intel_uncore.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 67385a9aa98c..8eb294074e89 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -967,6 +967,9 @@ static int i965_do_reset(struct drm_device *dev)
 {
 	int ret;
 
+	/* FIXME: i965g/gm need a display save/restore for gpu reset. */
+	return -ENODEV;
+
 	/*
 	 * Set the domains we want to reset (GRDOM/bits 2 and 3) as
 	 * well as the reset bit (GR/bit 0).  Setting the GR bit
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/5] drm/i915: Inline ilk/gen8_irq_reset
  2014-05-22 15:56 [PATCH 1/5] drm/i915: Add fifo underrun reporting state to debugfs Daniel Vetter
  2014-05-22 15:56 ` [PATCH 2/5] drm/i915: Fix up fifo underrun tracking, take N Daniel Vetter
  2014-05-22 15:56 ` [PATCH 3/5] drm/i915: Disable gpu reset on i965g/gm Daniel Vetter
@ 2014-05-22 15:56 ` Daniel Vetter
  2014-05-22 15:56 ` [PATCH 5/5] drm/i915: Improve irq handling after gpu resets Daniel Vetter
  2014-05-22 17:02 ` [PATCH 1/5] drm/i915: Add fifo underrun reporting state to debugfs Ville Syrjälä
  4 siblings, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2014-05-22 15:56 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

No point in having this indirection.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_irq.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 4d44f09eb833..3cd659b47dd2 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3111,11 +3111,6 @@ static void ironlake_irq_reset(struct drm_device *dev)
 	ibx_irq_reset(dev);
 }
 
-static void ironlake_irq_preinstall(struct drm_device *dev)
-{
-	ironlake_irq_reset(dev);
-}
-
 static void valleyview_irq_preinstall(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -3168,11 +3163,6 @@ static void gen8_irq_reset(struct drm_device *dev)
 	ibx_irq_reset(dev);
 }
 
-static void gen8_irq_preinstall(struct drm_device *dev)
-{
-	gen8_irq_reset(dev);
-}
-
 static void cherryview_irq_preinstall(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -4381,7 +4371,7 @@ void intel_irq_init(struct drm_device *dev)
 		dev_priv->display.hpd_irq_setup = i915_hpd_irq_setup;
 	} else if (IS_GEN8(dev)) {
 		dev->driver->irq_handler = gen8_irq_handler;
-		dev->driver->irq_preinstall = gen8_irq_preinstall;
+		dev->driver->irq_preinstall = gen8_irq_reset;
 		dev->driver->irq_postinstall = gen8_irq_postinstall;
 		dev->driver->irq_uninstall = gen8_irq_uninstall;
 		dev->driver->enable_vblank = gen8_enable_vblank;
@@ -4389,7 +4379,7 @@ void intel_irq_init(struct drm_device *dev)
 		dev_priv->display.hpd_irq_setup = ibx_hpd_irq_setup;
 	} else if (HAS_PCH_SPLIT(dev)) {
 		dev->driver->irq_handler = ironlake_irq_handler;
-		dev->driver->irq_preinstall = ironlake_irq_preinstall;
+		dev->driver->irq_preinstall = ironlake_irq_reset;
 		dev->driver->irq_postinstall = ironlake_irq_postinstall;
 		dev->driver->irq_uninstall = ironlake_irq_uninstall;
 		dev->driver->enable_vblank = ironlake_enable_vblank;
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/5] drm/i915: Improve irq handling after gpu resets
  2014-05-22 15:56 [PATCH 1/5] drm/i915: Add fifo underrun reporting state to debugfs Daniel Vetter
                   ` (2 preceding siblings ...)
  2014-05-22 15:56 ` [PATCH 4/5] drm/i915: Inline ilk/gen8_irq_reset Daniel Vetter
@ 2014-05-22 15:56 ` Daniel Vetter
  2014-05-22 16:51   ` Ville Syrjälä
  2014-05-22 20:18   ` [PATCH 1/2] " Daniel Vetter
  2014-05-22 17:02 ` [PATCH 1/5] drm/i915: Add fifo underrun reporting state to debugfs Ville Syrjälä
  4 siblings, 2 replies; 19+ messages in thread
From: Daniel Vetter @ 2014-05-22 15:56 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

Currently we do a full re-init of all interrupts after a gpu hang.
Which is pretty bad since we don't restore the interrupts we've
enabled at runtime correctly. Even with that addressed it's rather
horribly race.

But on g4x and later we only reset the gt and not the entire gpu.
Which means we only need to reset the GT interrupt bits. Which has the
nice benefit that vblank waits, pipe CRC interrupts and everything
else display related just keeps on working.

The downside is that gt interrupt handling (i.e. ring->get/put_irq) is
still racy. But as long as the gpu hang reliably wakes all waters and
we have a short time where the refcount drops to 0 we'll recover. So
not that bad really.

Testcase: igt/kms_flip/vblank-vs-hang
Testcase: igt/kms_pipe_crc_basic/hang-*
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.c | 17 +++++++++--------
 drivers/gpu/drm/i915/i915_drv.h |  1 +
 drivers/gpu/drm/i915/i915_irq.c | 35 +++++++++++++++++++++++++++--------
 3 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index c83c83b74bf4..69a75bb7ad83 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -811,17 +811,18 @@ int i915_reset(struct drm_device *dev)
 		}
 
 		/*
-		 * FIXME: This is horribly race against concurrent pageflip and
-		 * vblank wait ioctls since they can observe dev->irqs_disabled
-		 * being false when they shouldn't be able to.
+		 * FIXME: This races pretty badly against concurrent holders of
+		 * ring interrupts. This is possible since we've started to drop
+		 * dev->struct_mutex in select places when waiting for the gpu.
 		 */
-		drm_irq_uninstall(dev);
-		drm_irq_install(dev, dev->pdev->irq);
+		intel_gt_irq_reinit(dev);
 
-		/* rps/rc6 re-init is necessary to restore state lost after the
-		 * reset and the re-install of drm irq. Skip for ironlake per
+		/*
+		 * rps/rc6 re-init is necessary to restore state lost after the
+		 * reset and the re-install of gt irqs. Skip for ironlake per
 		 * previous concerns that it doesn't respond well to some forms
-		 * of re-init after reset. */
+		 * of re-init after reset.
+		 */
 		if (INTEL_INFO(dev)->gen > 5)
 			intel_reset_gt_powersave(dev);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 459b02ad1ef3..61792c4050e7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2078,6 +2078,7 @@ void gen6_set_pm_mask(struct drm_i915_private *dev_priv, u32 pm_iir,
 							int new_delay);
 extern void intel_irq_init(struct drm_device *dev);
 extern void intel_hpd_init(struct drm_device *dev);
+extern void intel_gt_irq_reinit(struct drm_device *dev);
 
 extern void intel_uncore_sanitize(struct drm_device *dev);
 extern void intel_uncore_early_sanitize(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 3cd659b47dd2..6dba645a81fa 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3140,6 +3140,14 @@ static void valleyview_irq_preinstall(struct drm_device *dev)
 	POSTING_READ(VLV_IER);
 }
 
+static void gen8_gt_irq_reset(struct drm_i915_private *dev_priv)
+{
+	GEN8_IRQ_RESET_NDX(GT, 0);
+	GEN8_IRQ_RESET_NDX(GT, 1);
+	GEN8_IRQ_RESET_NDX(GT, 2);
+	GEN8_IRQ_RESET_NDX(GT, 3);
+}
+
 static void gen8_irq_reset(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -3148,10 +3156,7 @@ static void gen8_irq_reset(struct drm_device *dev)
 	I915_WRITE(GEN8_MASTER_IRQ, 0);
 	POSTING_READ(GEN8_MASTER_IRQ);
 
-	GEN8_IRQ_RESET_NDX(GT, 0);
-	GEN8_IRQ_RESET_NDX(GT, 1);
-	GEN8_IRQ_RESET_NDX(GT, 2);
-	GEN8_IRQ_RESET_NDX(GT, 3);
+	gen8_gt_irq_reset(dev_priv);
 
 	for_each_pipe(pipe)
 		GEN8_IRQ_RESET_NDX(DE_PIPE, pipe);
@@ -3171,10 +3176,7 @@ static void cherryview_irq_preinstall(struct drm_device *dev)
 	I915_WRITE(GEN8_MASTER_IRQ, 0);
 	POSTING_READ(GEN8_MASTER_IRQ);
 
-	GEN8_IRQ_RESET_NDX(GT, 0);
-	GEN8_IRQ_RESET_NDX(GT, 1);
-	GEN8_IRQ_RESET_NDX(GT, 2);
-	GEN8_IRQ_RESET_NDX(GT, 3);
+	gen8_gt_irq_reset(dev_priv);
 
 	GEN5_IRQ_RESET(GEN8_PCU_);
 
@@ -4436,6 +4438,23 @@ void intel_hpd_init(struct drm_device *dev)
 	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
 }
 
+void intel_gt_irq_reinit(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	spin_lock_irq(&dev_priv->irq_lock);
+	if (INTEL_INFO(dev)->gen >= 8) {
+		gen8_gt_irq_reset(dev_priv);
+		gen8_gt_irq_postinstall(dev_priv);
+	} else if (INTEL_INFO(dev)->gen >= 5) {
+		gen5_gt_irq_reset(dev);
+		gen5_gt_irq_postinstall(dev);
+	} else {
+		WARN_ON(!IS_G4X(dev));
+	}
+	spin_unlock_irq(&dev_priv->irq_lock);
+}
+
 /* Disable interrupts so we can allow runtime PM. */
 void intel_runtime_pm_disable_interrupts(struct drm_device *dev)
 {
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/5] drm/i915: Improve irq handling after gpu resets
  2014-05-22 15:56 ` [PATCH 5/5] drm/i915: Improve irq handling after gpu resets Daniel Vetter
@ 2014-05-22 16:51   ` Ville Syrjälä
  2014-05-22 17:06     ` Ville Syrjälä
  2014-05-22 20:12     ` Daniel Vetter
  2014-05-22 20:18   ` [PATCH 1/2] " Daniel Vetter
  1 sibling, 2 replies; 19+ messages in thread
From: Ville Syrjälä @ 2014-05-22 16:51 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development

On Thu, May 22, 2014 at 05:56:35PM +0200, Daniel Vetter wrote:
> Currently we do a full re-init of all interrupts after a gpu hang.
> Which is pretty bad since we don't restore the interrupts we've
> enabled at runtime correctly. Even with that addressed it's rather
> horribly race.
> 
> But on g4x and later we only reset the gt and not the entire gpu.
> Which means we only need to reset the GT interrupt bits. Which has the
> nice benefit that vblank waits, pipe CRC interrupts and everything
> else display related just keeps on working.
> 
> The downside is that gt interrupt handling (i.e. ring->get/put_irq) is
> still racy. But as long as the gpu hang reliably wakes all waters and
> we have a short time where the refcount drops to 0 we'll recover. So
> not that bad really.

A quick test on IVB tells me that GTIMR and GEN6_PMIMR survive the full
gt reset. But the ring IMRs do get clobbered. So could we just skip the
entire irq reset here?

> 
> Testcase: igt/kms_flip/vblank-vs-hang
> Testcase: igt/kms_pipe_crc_basic/hang-*
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 17 +++++++++--------
>  drivers/gpu/drm/i915/i915_drv.h |  1 +
>  drivers/gpu/drm/i915/i915_irq.c | 35 +++++++++++++++++++++++++++--------
>  3 files changed, 37 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index c83c83b74bf4..69a75bb7ad83 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -811,17 +811,18 @@ int i915_reset(struct drm_device *dev)
>  		}
>  
>  		/*
> -		 * FIXME: This is horribly race against concurrent pageflip and
> -		 * vblank wait ioctls since they can observe dev->irqs_disabled
> -		 * being false when they shouldn't be able to.
> +		 * FIXME: This races pretty badly against concurrent holders of
> +		 * ring interrupts. This is possible since we've started to drop
> +		 * dev->struct_mutex in select places when waiting for the gpu.
>  		 */
> -		drm_irq_uninstall(dev);
> -		drm_irq_install(dev, dev->pdev->irq);
> +		intel_gt_irq_reinit(dev);
>  
> -		/* rps/rc6 re-init is necessary to restore state lost after the
> -		 * reset and the re-install of drm irq. Skip for ironlake per
> +		/*
> +		 * rps/rc6 re-init is necessary to restore state lost after the
> +		 * reset and the re-install of gt irqs. Skip for ironlake per
>  		 * previous concerns that it doesn't respond well to some forms
> -		 * of re-init after reset. */
> +		 * of re-init after reset.
> +		 */
>  		if (INTEL_INFO(dev)->gen > 5)
>  			intel_reset_gt_powersave(dev);
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 459b02ad1ef3..61792c4050e7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2078,6 +2078,7 @@ void gen6_set_pm_mask(struct drm_i915_private *dev_priv, u32 pm_iir,
>  							int new_delay);
>  extern void intel_irq_init(struct drm_device *dev);
>  extern void intel_hpd_init(struct drm_device *dev);
> +extern void intel_gt_irq_reinit(struct drm_device *dev);
>  
>  extern void intel_uncore_sanitize(struct drm_device *dev);
>  extern void intel_uncore_early_sanitize(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 3cd659b47dd2..6dba645a81fa 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3140,6 +3140,14 @@ static void valleyview_irq_preinstall(struct drm_device *dev)
>  	POSTING_READ(VLV_IER);
>  }
>  
> +static void gen8_gt_irq_reset(struct drm_i915_private *dev_priv)
> +{
> +	GEN8_IRQ_RESET_NDX(GT, 0);
> +	GEN8_IRQ_RESET_NDX(GT, 1);
> +	GEN8_IRQ_RESET_NDX(GT, 2);
> +	GEN8_IRQ_RESET_NDX(GT, 3);
> +}
> +
>  static void gen8_irq_reset(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -3148,10 +3156,7 @@ static void gen8_irq_reset(struct drm_device *dev)
>  	I915_WRITE(GEN8_MASTER_IRQ, 0);
>  	POSTING_READ(GEN8_MASTER_IRQ);
>  
> -	GEN8_IRQ_RESET_NDX(GT, 0);
> -	GEN8_IRQ_RESET_NDX(GT, 1);
> -	GEN8_IRQ_RESET_NDX(GT, 2);
> -	GEN8_IRQ_RESET_NDX(GT, 3);
> +	gen8_gt_irq_reset(dev_priv);
>  
>  	for_each_pipe(pipe)
>  		GEN8_IRQ_RESET_NDX(DE_PIPE, pipe);
> @@ -3171,10 +3176,7 @@ static void cherryview_irq_preinstall(struct drm_device *dev)
>  	I915_WRITE(GEN8_MASTER_IRQ, 0);
>  	POSTING_READ(GEN8_MASTER_IRQ);
>  
> -	GEN8_IRQ_RESET_NDX(GT, 0);
> -	GEN8_IRQ_RESET_NDX(GT, 1);
> -	GEN8_IRQ_RESET_NDX(GT, 2);
> -	GEN8_IRQ_RESET_NDX(GT, 3);
> +	gen8_gt_irq_reset(dev_priv);
>  
>  	GEN5_IRQ_RESET(GEN8_PCU_);
>  
> @@ -4436,6 +4438,23 @@ void intel_hpd_init(struct drm_device *dev)
>  	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
>  }
>  
> +void intel_gt_irq_reinit(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	spin_lock_irq(&dev_priv->irq_lock);
> +	if (INTEL_INFO(dev)->gen >= 8) {
> +		gen8_gt_irq_reset(dev_priv);
> +		gen8_gt_irq_postinstall(dev_priv);
> +	} else if (INTEL_INFO(dev)->gen >= 5) {
> +		gen5_gt_irq_reset(dev);
> +		gen5_gt_irq_postinstall(dev);
> +	} else {
> +		WARN_ON(!IS_G4X(dev));
> +	}
> +	spin_unlock_irq(&dev_priv->irq_lock);
> +}
> +
>  /* Disable interrupts so we can allow runtime PM. */
>  void intel_runtime_pm_disable_interrupts(struct drm_device *dev)
>  {
> -- 
> 1.8.4.rc3
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] drm/i915: Fix up fifo underrun tracking, take N
  2014-05-22 15:56 ` [PATCH 2/5] drm/i915: Fix up fifo underrun tracking, take N Daniel Vetter
@ 2014-05-22 16:55   ` Ville Syrjälä
  2014-05-22 20:10     ` Daniel Vetter
  0 siblings, 1 reply; 19+ messages in thread
From: Ville Syrjälä @ 2014-05-22 16:55 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development

On Thu, May 22, 2014 at 05:56:32PM +0200, Daniel Vetter wrote:
> So apparently this is tricky.
> 
> We need to consider:
> - We start out with all the hw enabling bits disabled, both the
>   individual fifo underrun interrupts and the shared display error
>   interrupts masked. Otherwise if the bios config is broken we'll blow
>   up with a NULL deref in our interrupt handler since the crtc
>   structures aren't set up yet at driver load time.
> - On gmch we need to mask fifo underruns on the sw side, so always
>   need to set that in sanitize_crtc for those platforms.
> - On other platforms we try to set the sw tracking so that it reflects
>   the real state. But since a few platforms have shared bits we must
>   _not_ disable fifo underrun reporting. Otherwise we'll never enable
>   the shared error interrupt.
> 
> This is the state before out patch, but unfortunately this is not good
> enough. But after a suspend resume operation this is broken:
> 1. We don't enable the hw interrupts since the same code runs on
> resume as on driver load.
> 2. The fifo underrun state adjustments we do in sanitize_crtc doesn't
> fire on resume since (except for hilarious firmware) all pipes are off
> at that point. But they also don't hurt since the subsequent crtc
> enabling due to force_restore will enable fifo underruns.
> 
> Which means when we enable fifo underrun reporting we notice that the
> per-crtc state is already correct and short-circuit everthing out. And
> the interrupt doesn't get enabled.
> 
> A similar problem would happen if the bios doesn't light up anything
> when the driver loads. Which is exactly what happens when we reload
> the driver since our unload functions disables all outputs.
> 
> Now we can't just rip out the short-circuit logic and unconditionally
> update the fifo underrun reporting interrupt masking: We have some
> checks for shared error interrupts to catch issues that happened when
> the shared error interrupt was disabled.

Hmm. Do we have cases where we do enabled->enabled "transition"?
Because in that case we would now clear the register without
reporting if there was an underrun in the register.

> 
> The right fix is to push down this logic so that we can always update
> the hardware state, but only check for missed fifo underruns on a real
> enabled->disabled transition and ignore them when we're already
> disabled.
> 
> On platforms with shared error interrupt the pipe CRC interrupts are
> grouped together with the fifo underrun reporting this fixes pipe CRC
> support after suspend and driver reloads.
> 
> Testcase: igt/kms_pipe_crc_basic/suspend-*
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 44 ++++++++++++++++++-----------------------
>  1 file changed, 19 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 304f86a3162c..4d44f09eb833 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -335,7 +335,8 @@ void i9xx_check_fifo_underruns(struct drm_device *dev)
>  }
>  
>  static void i9xx_set_fifo_underrun_reporting(struct drm_device *dev,
> -					     enum pipe pipe, bool enable)
> +					     enum pipe pipe,
> +					     bool enable, bool old)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	u32 reg = PIPESTAT(pipe);
> @@ -347,7 +348,7 @@ static void i9xx_set_fifo_underrun_reporting(struct drm_device *dev,
>  		I915_WRITE(reg, pipestat | PIPE_FIFO_UNDERRUN_STATUS);
>  		POSTING_READ(reg);
>  	} else {
> -		if (pipestat & PIPE_FIFO_UNDERRUN_STATUS)
> +		if (old && pipestat & PIPE_FIFO_UNDERRUN_STATUS)
>  			DRM_ERROR("pipe %c underrun\n", pipe_name(pipe));
>  	}
>  }
> @@ -366,7 +367,8 @@ static void ironlake_set_fifo_underrun_reporting(struct drm_device *dev,
>  }
>  
>  static void ivybridge_set_fifo_underrun_reporting(struct drm_device *dev,
> -						  enum pipe pipe, bool enable)
> +						  enum pipe pipe,
> +						  bool enable, bool old)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	if (enable) {
> @@ -379,7 +381,8 @@ static void ivybridge_set_fifo_underrun_reporting(struct drm_device *dev,
>  	} else {
>  		ironlake_disable_display_irq(dev_priv, DE_ERR_INT_IVB);
>  
> -		if (I915_READ(GEN7_ERR_INT) & ERR_INT_FIFO_UNDERRUN(pipe)) {
> +		if (old &&
> +		    I915_READ(GEN7_ERR_INT) & ERR_INT_FIFO_UNDERRUN(pipe)) {
>  			DRM_ERROR("uncleared fifo underrun on pipe %c\n",
>  				  pipe_name(pipe));
>  		}
> @@ -444,7 +447,7 @@ static void ibx_set_fifo_underrun_reporting(struct drm_device *dev,
>  
>  static void cpt_set_fifo_underrun_reporting(struct drm_device *dev,
>  					    enum transcoder pch_transcoder,
> -					    bool enable)
> +					    bool enable, bool old)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  
> @@ -459,7 +462,8 @@ static void cpt_set_fifo_underrun_reporting(struct drm_device *dev,
>  	} else {
>  		ibx_disable_display_interrupt(dev_priv, SDE_ERROR_CPT);
>  
> -		if (I915_READ(SERR_INT) & SERR_INT_TRANS_FIFO_UNDERRUN(pch_transcoder)) {
> +		if (old && I915_READ(SERR_INT) &
> +		    SERR_INT_TRANS_FIFO_UNDERRUN(pch_transcoder)) {
>  			DRM_ERROR("uncleared pch fifo underrun on pch transcoder %c\n",
>  				  transcoder_name(pch_transcoder));
>  		}
> @@ -486,28 +490,23 @@ static bool __intel_set_cpu_fifo_underrun_reporting(struct drm_device *dev,
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
>  	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> -	bool ret;
> +	bool old;
>  
>  	assert_spin_locked(&dev_priv->irq_lock);
>  
> -	ret = !intel_crtc->cpu_fifo_underrun_disabled;
> -
> -	if (enable == ret)
> -		goto done;
> -
> +	old = !intel_crtc->cpu_fifo_underrun_disabled;
>  	intel_crtc->cpu_fifo_underrun_disabled = !enable;
>  
>  	if (INTEL_INFO(dev)->gen < 5 || IS_VALLEYVIEW(dev))
> -		i9xx_set_fifo_underrun_reporting(dev, pipe, enable);
> +		i9xx_set_fifo_underrun_reporting(dev, pipe, enable, old);
>  	else if (IS_GEN5(dev) || IS_GEN6(dev))
>  		ironlake_set_fifo_underrun_reporting(dev, pipe, enable);
>  	else if (IS_GEN7(dev))
> -		ivybridge_set_fifo_underrun_reporting(dev, pipe, enable);
> +		ivybridge_set_fifo_underrun_reporting(dev, pipe, enable, old);
>  	else if (IS_GEN8(dev))
>  		broadwell_set_fifo_underrun_reporting(dev, pipe, enable);
>  
> -done:
> -	return ret;
> +	return old;
>  }
>  
>  bool intel_set_cpu_fifo_underrun_reporting(struct drm_device *dev,
> @@ -556,7 +555,7 @@ bool intel_set_pch_fifo_underrun_reporting(struct drm_device *dev,
>  	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pch_transcoder];
>  	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
>  	unsigned long flags;
> -	bool ret;
> +	bool old;
>  
>  	/*
>  	 * NOTE: Pre-LPT has a fixed cpu pipe -> pch transcoder mapping, but LPT
> @@ -569,21 +568,16 @@ bool intel_set_pch_fifo_underrun_reporting(struct drm_device *dev,
>  
>  	spin_lock_irqsave(&dev_priv->irq_lock, flags);
>  
> -	ret = !intel_crtc->pch_fifo_underrun_disabled;
> -
> -	if (enable == ret)
> -		goto done;
> -
> +	old = !intel_crtc->pch_fifo_underrun_disabled;
>  	intel_crtc->pch_fifo_underrun_disabled = !enable;
>  
>  	if (HAS_PCH_IBX(dev))
>  		ibx_set_fifo_underrun_reporting(dev, pch_transcoder, enable);
>  	else
> -		cpt_set_fifo_underrun_reporting(dev, pch_transcoder, enable);
> +		cpt_set_fifo_underrun_reporting(dev, pch_transcoder, enable, old);
>  
> -done:
>  	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> -	return ret;
> +	return old;
>  }
>  
>  
> -- 
> 1.8.4.rc3
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] drm/i915: Add fifo underrun reporting state to debugfs
  2014-05-22 15:56 [PATCH 1/5] drm/i915: Add fifo underrun reporting state to debugfs Daniel Vetter
                   ` (3 preceding siblings ...)
  2014-05-22 15:56 ` [PATCH 5/5] drm/i915: Improve irq handling after gpu resets Daniel Vetter
@ 2014-05-22 17:02 ` Ville Syrjälä
  4 siblings, 0 replies; 19+ messages in thread
From: Ville Syrjälä @ 2014-05-22 17:02 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development

On Thu, May 22, 2014 at 05:56:31PM +0200, Daniel Vetter wrote:
> On platforms with shared interrupt enable bits (which are shared even
> with the pipe CRC logic) there's some tricky corner cases. Add
> information to make debugging those easier.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

For patches 1,3,4:
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 16bbdc7243df..4a79d3fe35be 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2356,6 +2356,10 @@ static int i915_display_info(struct seq_file *m, void *unused)
>  				   x, y, crtc->cursor_addr,
>  				   yesno(active));
>  		}
> +
> +		seq_printf(m, "\tunderrun reporting: cpu=%s pch=%s \n",
> +			   yesno(!crtc->cpu_fifo_underrun_disabled),
> +			   yesno(!crtc->pch_fifo_underrun_disabled));
>  	}
>  
>  	seq_printf(m, "\n");
> -- 
> 1.8.4.rc3
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/5] drm/i915: Improve irq handling after gpu resets
  2014-05-22 16:51   ` Ville Syrjälä
@ 2014-05-22 17:06     ` Ville Syrjälä
  2014-05-22 20:12     ` Daniel Vetter
  1 sibling, 0 replies; 19+ messages in thread
From: Ville Syrjälä @ 2014-05-22 17:06 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development

On Thu, May 22, 2014 at 07:51:45PM +0300, Ville Syrjälä wrote:
> On Thu, May 22, 2014 at 05:56:35PM +0200, Daniel Vetter wrote:
> > Currently we do a full re-init of all interrupts after a gpu hang.
> > Which is pretty bad since we don't restore the interrupts we've
> > enabled at runtime correctly. Even with that addressed it's rather
> > horribly race.
> > 
> > But on g4x and later we only reset the gt and not the entire gpu.
> > Which means we only need to reset the GT interrupt bits. Which has the
> > nice benefit that vblank waits, pipe CRC interrupts and everything
> > else display related just keeps on working.
> > 
> > The downside is that gt interrupt handling (i.e. ring->get/put_irq) is
> > still racy. But as long as the gpu hang reliably wakes all waters and
> > we have a short time where the refcount drops to 0 we'll recover. So
> > not that bad really.
> 
> A quick test on IVB tells me that GTIMR and GEN6_PMIMR survive the full
> gt reset. But the ring IMRs do get clobbered. So could we just skip the
> entire irq reset here?

Same on ILK. GTIMR survives both render and media resets.

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] drm/i915: Fix up fifo underrun tracking, take N
  2014-05-22 16:55   ` Ville Syrjälä
@ 2014-05-22 20:10     ` Daniel Vetter
  2014-05-23  8:11       ` Ville Syrjälä
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Vetter @ 2014-05-22 20:10 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Daniel Vetter, Intel Graphics Development

On Thu, May 22, 2014 at 07:55:52PM +0300, Ville Syrjälä wrote:
> On Thu, May 22, 2014 at 05:56:32PM +0200, Daniel Vetter wrote:
> > So apparently this is tricky.
> > 
> > We need to consider:
> > - We start out with all the hw enabling bits disabled, both the
> >   individual fifo underrun interrupts and the shared display error
> >   interrupts masked. Otherwise if the bios config is broken we'll blow
> >   up with a NULL deref in our interrupt handler since the crtc
> >   structures aren't set up yet at driver load time.
> > - On gmch we need to mask fifo underruns on the sw side, so always
> >   need to set that in sanitize_crtc for those platforms.
> > - On other platforms we try to set the sw tracking so that it reflects
> >   the real state. But since a few platforms have shared bits we must
> >   _not_ disable fifo underrun reporting. Otherwise we'll never enable
> >   the shared error interrupt.
> > 
> > This is the state before out patch, but unfortunately this is not good
> > enough. But after a suspend resume operation this is broken:
> > 1. We don't enable the hw interrupts since the same code runs on
> > resume as on driver load.
> > 2. The fifo underrun state adjustments we do in sanitize_crtc doesn't
> > fire on resume since (except for hilarious firmware) all pipes are off
> > at that point. But they also don't hurt since the subsequent crtc
> > enabling due to force_restore will enable fifo underruns.
> > 
> > Which means when we enable fifo underrun reporting we notice that the
> > per-crtc state is already correct and short-circuit everthing out. And
> > the interrupt doesn't get enabled.
> > 
> > A similar problem would happen if the bios doesn't light up anything
> > when the driver loads. Which is exactly what happens when we reload
> > the driver since our unload functions disables all outputs.
> > 
> > Now we can't just rip out the short-circuit logic and unconditionally
> > update the fifo underrun reporting interrupt masking: We have some
> > checks for shared error interrupts to catch issues that happened when
> > the shared error interrupt was disabled.
> 
> Hmm. Do we have cases where we do enabled->enabled "transition"?
> Because in that case we would now clear the register without
> reporting if there was an underrun in the register.

We definitely have disabled->disabled transtions, which is what we need to
filter out. Since it means we have either untrusted watermarks or hit and
underrun.

For enabled->enabled I think that can happen in crtc_enable - we
unconditionally enable underrun reporting againg to clear out old fail (or
firmware setups). But we don't always disable it when disabling the crtc
since some platforms/ports don't underrun when disabled.

This also only matters when we've had an underrun with a shared error
interrupt somewhere. Which is the case where the old code is broken
anyway, so I'm not sure how much we should care really.

Also the current code only checks for missed fifo underruns on disabling,
so even if there is a notch of a leak here it's definitely not something
this patch will regress.
-Daniel
> 
> > 
> > The right fix is to push down this logic so that we can always update
> > the hardware state, but only check for missed fifo underruns on a real
> > enabled->disabled transition and ignore them when we're already
> > disabled.
> > 
> > On platforms with shared error interrupt the pipe CRC interrupts are
> > grouped together with the fifo underrun reporting this fixes pipe CRC
> > support after suspend and driver reloads.
> > 
> > Testcase: igt/kms_pipe_crc_basic/suspend-*
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/gpu/drm/i915/i915_irq.c | 44 ++++++++++++++++++-----------------------
> >  1 file changed, 19 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 304f86a3162c..4d44f09eb833 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -335,7 +335,8 @@ void i9xx_check_fifo_underruns(struct drm_device *dev)
> >  }
> >  
> >  static void i9xx_set_fifo_underrun_reporting(struct drm_device *dev,
> > -					     enum pipe pipe, bool enable)
> > +					     enum pipe pipe,
> > +					     bool enable, bool old)
> >  {
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	u32 reg = PIPESTAT(pipe);
> > @@ -347,7 +348,7 @@ static void i9xx_set_fifo_underrun_reporting(struct drm_device *dev,
> >  		I915_WRITE(reg, pipestat | PIPE_FIFO_UNDERRUN_STATUS);
> >  		POSTING_READ(reg);
> >  	} else {
> > -		if (pipestat & PIPE_FIFO_UNDERRUN_STATUS)
> > +		if (old && pipestat & PIPE_FIFO_UNDERRUN_STATUS)
> >  			DRM_ERROR("pipe %c underrun\n", pipe_name(pipe));
> >  	}
> >  }
> > @@ -366,7 +367,8 @@ static void ironlake_set_fifo_underrun_reporting(struct drm_device *dev,
> >  }
> >  
> >  static void ivybridge_set_fifo_underrun_reporting(struct drm_device *dev,
> > -						  enum pipe pipe, bool enable)
> > +						  enum pipe pipe,
> > +						  bool enable, bool old)
> >  {
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	if (enable) {
> > @@ -379,7 +381,8 @@ static void ivybridge_set_fifo_underrun_reporting(struct drm_device *dev,
> >  	} else {
> >  		ironlake_disable_display_irq(dev_priv, DE_ERR_INT_IVB);
> >  
> > -		if (I915_READ(GEN7_ERR_INT) & ERR_INT_FIFO_UNDERRUN(pipe)) {
> > +		if (old &&
> > +		    I915_READ(GEN7_ERR_INT) & ERR_INT_FIFO_UNDERRUN(pipe)) {
> >  			DRM_ERROR("uncleared fifo underrun on pipe %c\n",
> >  				  pipe_name(pipe));
> >  		}
> > @@ -444,7 +447,7 @@ static void ibx_set_fifo_underrun_reporting(struct drm_device *dev,
> >  
> >  static void cpt_set_fifo_underrun_reporting(struct drm_device *dev,
> >  					    enum transcoder pch_transcoder,
> > -					    bool enable)
> > +					    bool enable, bool old)
> >  {
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  
> > @@ -459,7 +462,8 @@ static void cpt_set_fifo_underrun_reporting(struct drm_device *dev,
> >  	} else {
> >  		ibx_disable_display_interrupt(dev_priv, SDE_ERROR_CPT);
> >  
> > -		if (I915_READ(SERR_INT) & SERR_INT_TRANS_FIFO_UNDERRUN(pch_transcoder)) {
> > +		if (old && I915_READ(SERR_INT) &
> > +		    SERR_INT_TRANS_FIFO_UNDERRUN(pch_transcoder)) {
> >  			DRM_ERROR("uncleared pch fifo underrun on pch transcoder %c\n",
> >  				  transcoder_name(pch_transcoder));
> >  		}
> > @@ -486,28 +490,23 @@ static bool __intel_set_cpu_fifo_underrun_reporting(struct drm_device *dev,
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
> >  	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> > -	bool ret;
> > +	bool old;
> >  
> >  	assert_spin_locked(&dev_priv->irq_lock);
> >  
> > -	ret = !intel_crtc->cpu_fifo_underrun_disabled;
> > -
> > -	if (enable == ret)
> > -		goto done;
> > -
> > +	old = !intel_crtc->cpu_fifo_underrun_disabled;
> >  	intel_crtc->cpu_fifo_underrun_disabled = !enable;
> >  
> >  	if (INTEL_INFO(dev)->gen < 5 || IS_VALLEYVIEW(dev))
> > -		i9xx_set_fifo_underrun_reporting(dev, pipe, enable);
> > +		i9xx_set_fifo_underrun_reporting(dev, pipe, enable, old);
> >  	else if (IS_GEN5(dev) || IS_GEN6(dev))
> >  		ironlake_set_fifo_underrun_reporting(dev, pipe, enable);
> >  	else if (IS_GEN7(dev))
> > -		ivybridge_set_fifo_underrun_reporting(dev, pipe, enable);
> > +		ivybridge_set_fifo_underrun_reporting(dev, pipe, enable, old);
> >  	else if (IS_GEN8(dev))
> >  		broadwell_set_fifo_underrun_reporting(dev, pipe, enable);
> >  
> > -done:
> > -	return ret;
> > +	return old;
> >  }
> >  
> >  bool intel_set_cpu_fifo_underrun_reporting(struct drm_device *dev,
> > @@ -556,7 +555,7 @@ bool intel_set_pch_fifo_underrun_reporting(struct drm_device *dev,
> >  	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pch_transcoder];
> >  	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> >  	unsigned long flags;
> > -	bool ret;
> > +	bool old;
> >  
> >  	/*
> >  	 * NOTE: Pre-LPT has a fixed cpu pipe -> pch transcoder mapping, but LPT
> > @@ -569,21 +568,16 @@ bool intel_set_pch_fifo_underrun_reporting(struct drm_device *dev,
> >  
> >  	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> >  
> > -	ret = !intel_crtc->pch_fifo_underrun_disabled;
> > -
> > -	if (enable == ret)
> > -		goto done;
> > -
> > +	old = !intel_crtc->pch_fifo_underrun_disabled;
> >  	intel_crtc->pch_fifo_underrun_disabled = !enable;
> >  
> >  	if (HAS_PCH_IBX(dev))
> >  		ibx_set_fifo_underrun_reporting(dev, pch_transcoder, enable);
> >  	else
> > -		cpt_set_fifo_underrun_reporting(dev, pch_transcoder, enable);
> > +		cpt_set_fifo_underrun_reporting(dev, pch_transcoder, enable, old);
> >  
> > -done:
> >  	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> > -	return ret;
> > +	return old;
> >  }
> >  
> >  
> > -- 
> > 1.8.4.rc3
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Ville Syrjälä
> Intel OTC

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/5] drm/i915: Improve irq handling after gpu resets
  2014-05-22 16:51   ` Ville Syrjälä
  2014-05-22 17:06     ` Ville Syrjälä
@ 2014-05-22 20:12     ` Daniel Vetter
  1 sibling, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2014-05-22 20:12 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Daniel Vetter, Intel Graphics Development

On Thu, May 22, 2014 at 07:51:45PM +0300, Ville Syrjälä wrote:
> On Thu, May 22, 2014 at 05:56:35PM +0200, Daniel Vetter wrote:
> > Currently we do a full re-init of all interrupts after a gpu hang.
> > Which is pretty bad since we don't restore the interrupts we've
> > enabled at runtime correctly. Even with that addressed it's rather
> > horribly race.
> > 
> > But on g4x and later we only reset the gt and not the entire gpu.
> > Which means we only need to reset the GT interrupt bits. Which has the
> > nice benefit that vblank waits, pipe CRC interrupts and everything
> > else display related just keeps on working.
> > 
> > The downside is that gt interrupt handling (i.e. ring->get/put_irq) is
> > still racy. But as long as the gpu hang reliably wakes all waters and
> > we have a short time where the refcount drops to 0 we'll recover. So
> > not that bad really.
> 
> A quick test on IVB tells me that GTIMR and GEN6_PMIMR survive the full
> gt reset. But the ring IMRs do get clobbered. So could we just skip the
> entire irq reset here?

Hm ... And things still neatly work afterwards?

I guess the FIXME we need to keep since clobbering the per-ring registers
through the reset is still racy. But we'd only need to restore the bits
correctly.

I'll respin.
-Daniel

> 
> > 
> > Testcase: igt/kms_flip/vblank-vs-hang
> > Testcase: igt/kms_pipe_crc_basic/hang-*
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.c | 17 +++++++++--------
> >  drivers/gpu/drm/i915/i915_drv.h |  1 +
> >  drivers/gpu/drm/i915/i915_irq.c | 35 +++++++++++++++++++++++++++--------
> >  3 files changed, 37 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index c83c83b74bf4..69a75bb7ad83 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -811,17 +811,18 @@ int i915_reset(struct drm_device *dev)
> >  		}
> >  
> >  		/*
> > -		 * FIXME: This is horribly race against concurrent pageflip and
> > -		 * vblank wait ioctls since they can observe dev->irqs_disabled
> > -		 * being false when they shouldn't be able to.
> > +		 * FIXME: This races pretty badly against concurrent holders of
> > +		 * ring interrupts. This is possible since we've started to drop
> > +		 * dev->struct_mutex in select places when waiting for the gpu.
> >  		 */
> > -		drm_irq_uninstall(dev);
> > -		drm_irq_install(dev, dev->pdev->irq);
> > +		intel_gt_irq_reinit(dev);
> >  
> > -		/* rps/rc6 re-init is necessary to restore state lost after the
> > -		 * reset and the re-install of drm irq. Skip for ironlake per
> > +		/*
> > +		 * rps/rc6 re-init is necessary to restore state lost after the
> > +		 * reset and the re-install of gt irqs. Skip for ironlake per
> >  		 * previous concerns that it doesn't respond well to some forms
> > -		 * of re-init after reset. */
> > +		 * of re-init after reset.
> > +		 */
> >  		if (INTEL_INFO(dev)->gen > 5)
> >  			intel_reset_gt_powersave(dev);
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 459b02ad1ef3..61792c4050e7 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2078,6 +2078,7 @@ void gen6_set_pm_mask(struct drm_i915_private *dev_priv, u32 pm_iir,
> >  							int new_delay);
> >  extern void intel_irq_init(struct drm_device *dev);
> >  extern void intel_hpd_init(struct drm_device *dev);
> > +extern void intel_gt_irq_reinit(struct drm_device *dev);
> >  
> >  extern void intel_uncore_sanitize(struct drm_device *dev);
> >  extern void intel_uncore_early_sanitize(struct drm_device *dev);
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 3cd659b47dd2..6dba645a81fa 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -3140,6 +3140,14 @@ static void valleyview_irq_preinstall(struct drm_device *dev)
> >  	POSTING_READ(VLV_IER);
> >  }
> >  
> > +static void gen8_gt_irq_reset(struct drm_i915_private *dev_priv)
> > +{
> > +	GEN8_IRQ_RESET_NDX(GT, 0);
> > +	GEN8_IRQ_RESET_NDX(GT, 1);
> > +	GEN8_IRQ_RESET_NDX(GT, 2);
> > +	GEN8_IRQ_RESET_NDX(GT, 3);
> > +}
> > +
> >  static void gen8_irq_reset(struct drm_device *dev)
> >  {
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> > @@ -3148,10 +3156,7 @@ static void gen8_irq_reset(struct drm_device *dev)
> >  	I915_WRITE(GEN8_MASTER_IRQ, 0);
> >  	POSTING_READ(GEN8_MASTER_IRQ);
> >  
> > -	GEN8_IRQ_RESET_NDX(GT, 0);
> > -	GEN8_IRQ_RESET_NDX(GT, 1);
> > -	GEN8_IRQ_RESET_NDX(GT, 2);
> > -	GEN8_IRQ_RESET_NDX(GT, 3);
> > +	gen8_gt_irq_reset(dev_priv);
> >  
> >  	for_each_pipe(pipe)
> >  		GEN8_IRQ_RESET_NDX(DE_PIPE, pipe);
> > @@ -3171,10 +3176,7 @@ static void cherryview_irq_preinstall(struct drm_device *dev)
> >  	I915_WRITE(GEN8_MASTER_IRQ, 0);
> >  	POSTING_READ(GEN8_MASTER_IRQ);
> >  
> > -	GEN8_IRQ_RESET_NDX(GT, 0);
> > -	GEN8_IRQ_RESET_NDX(GT, 1);
> > -	GEN8_IRQ_RESET_NDX(GT, 2);
> > -	GEN8_IRQ_RESET_NDX(GT, 3);
> > +	gen8_gt_irq_reset(dev_priv);
> >  
> >  	GEN5_IRQ_RESET(GEN8_PCU_);
> >  
> > @@ -4436,6 +4438,23 @@ void intel_hpd_init(struct drm_device *dev)
> >  	spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
> >  }
> >  
> > +void intel_gt_irq_reinit(struct drm_device *dev)
> > +{
> > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > +
> > +	spin_lock_irq(&dev_priv->irq_lock);
> > +	if (INTEL_INFO(dev)->gen >= 8) {
> > +		gen8_gt_irq_reset(dev_priv);
> > +		gen8_gt_irq_postinstall(dev_priv);
> > +	} else if (INTEL_INFO(dev)->gen >= 5) {
> > +		gen5_gt_irq_reset(dev);
> > +		gen5_gt_irq_postinstall(dev);
> > +	} else {
> > +		WARN_ON(!IS_G4X(dev));
> > +	}
> > +	spin_unlock_irq(&dev_priv->irq_lock);
> > +}
> > +
> >  /* Disable interrupts so we can allow runtime PM. */
> >  void intel_runtime_pm_disable_interrupts(struct drm_device *dev)
> >  {
> > -- 
> > 1.8.4.rc3
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Ville Syrjälä
> Intel OTC

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/2] drm/i915: Improve irq handling after gpu resets
  2014-05-22 15:56 ` [PATCH 5/5] drm/i915: Improve irq handling after gpu resets Daniel Vetter
  2014-05-22 16:51   ` Ville Syrjälä
@ 2014-05-22 20:18   ` Daniel Vetter
  2014-05-22 20:18     ` [PATCH 2/2] drm/i915: Extract gen8_gt_irq_reset Daniel Vetter
  2014-05-26  8:36     ` [PATCH 1/2] drm/i915: Improve irq handling after gpu resets Ville Syrjälä
  1 sibling, 2 replies; 19+ messages in thread
From: Daniel Vetter @ 2014-05-22 20:18 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

Currently we do a full re-init of all interrupts after a gpu hang.
Which is pretty bad since we don't restore the interrupts we've
enabled at runtime correctly. Even with that addressed it's rather
horribly race.

But on g4x and later we only reset the gt and not the entire gpu.
Which means we only need to reset the GT interrupt bits. Which has the
nice benefit that vblank waits, pipe CRC interrupts and everything
else display related just keeps on working.

The downside is that gt interrupt handling (i.e. ring->get/put_irq) is
still racy. But as long as the gpu hang reliably wakes all waters and
we have a short time where the refcount drops to 0 we'll recover. So
not that bad really.

v2: Ville noticed that GTIMR and PMIMR don't get cleared, only the
subordinate per-ring registers. So let's rip out all the interrupt dancing.
The FIXME comment is still required though since the ring irq handling
happens at the per-ring interrupt mask registers, too.

Testcase: igt/kms_flip/vblank-vs-hang
Testcase: igt/kms_pipe_crc_basic/hang-*
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index c83c83b74bf4..7ae906c811bb 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -811,17 +811,17 @@ int i915_reset(struct drm_device *dev)
 		}
 
 		/*
-		 * FIXME: This is horribly race against concurrent pageflip and
-		 * vblank wait ioctls since they can observe dev->irqs_disabled
-		 * being false when they shouldn't be able to.
+		 * FIXME: This races pretty badly against concurrent holders of
+		 * ring interrupts. This is possible since we've started to drop
+		 * dev->struct_mutex in select places when waiting for the gpu.
 		 */
-		drm_irq_uninstall(dev);
-		drm_irq_install(dev, dev->pdev->irq);
 
-		/* rps/rc6 re-init is necessary to restore state lost after the
-		 * reset and the re-install of drm irq. Skip for ironlake per
+		/*
+		 * rps/rc6 re-init is necessary to restore state lost after the
+		 * reset and the re-install of gt irqs. Skip for ironlake per
 		 * previous concerns that it doesn't respond well to some forms
-		 * of re-init after reset. */
+		 * of re-init after reset.
+		 */
 		if (INTEL_INFO(dev)->gen > 5)
 			intel_reset_gt_powersave(dev);
 
-- 
1.8.4.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/2] drm/i915: Extract gen8_gt_irq_reset
  2014-05-22 20:18   ` [PATCH 1/2] " Daniel Vetter
@ 2014-05-22 20:18     ` Daniel Vetter
  2014-05-26  8:36     ` [PATCH 1/2] drm/i915: Improve irq handling after gpu resets Ville Syrjälä
  1 sibling, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2014-05-22 20:18 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

Fallout from an intermediate patch revision that I deemed worth saving.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_irq.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 3cd659b47dd2..e73d9ab90070 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3140,6 +3140,14 @@ static void valleyview_irq_preinstall(struct drm_device *dev)
 	POSTING_READ(VLV_IER);
 }
 
+static void gen8_gt_irq_reset(struct drm_i915_private *dev_priv)
+{
+	GEN8_IRQ_RESET_NDX(GT, 0);
+	GEN8_IRQ_RESET_NDX(GT, 1);
+	GEN8_IRQ_RESET_NDX(GT, 2);
+	GEN8_IRQ_RESET_NDX(GT, 3);
+}
+
 static void gen8_irq_reset(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -3148,10 +3156,7 @@ static void gen8_irq_reset(struct drm_device *dev)
 	I915_WRITE(GEN8_MASTER_IRQ, 0);
 	POSTING_READ(GEN8_MASTER_IRQ);
 
-	GEN8_IRQ_RESET_NDX(GT, 0);
-	GEN8_IRQ_RESET_NDX(GT, 1);
-	GEN8_IRQ_RESET_NDX(GT, 2);
-	GEN8_IRQ_RESET_NDX(GT, 3);
+	gen8_gt_irq_reset(dev_priv);
 
 	for_each_pipe(pipe)
 		GEN8_IRQ_RESET_NDX(DE_PIPE, pipe);
@@ -3171,10 +3176,7 @@ static void cherryview_irq_preinstall(struct drm_device *dev)
 	I915_WRITE(GEN8_MASTER_IRQ, 0);
 	POSTING_READ(GEN8_MASTER_IRQ);
 
-	GEN8_IRQ_RESET_NDX(GT, 0);
-	GEN8_IRQ_RESET_NDX(GT, 1);
-	GEN8_IRQ_RESET_NDX(GT, 2);
-	GEN8_IRQ_RESET_NDX(GT, 3);
+	gen8_gt_irq_reset(dev_priv);
 
 	GEN5_IRQ_RESET(GEN8_PCU_);
 
-- 
1.8.4.rc3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] drm/i915: Fix up fifo underrun tracking, take N
  2014-05-22 20:10     ` Daniel Vetter
@ 2014-05-23  8:11       ` Ville Syrjälä
  2014-05-23  8:21         ` Daniel Vetter
  0 siblings, 1 reply; 19+ messages in thread
From: Ville Syrjälä @ 2014-05-23  8:11 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel Graphics Development

On Thu, May 22, 2014 at 10:10:33PM +0200, Daniel Vetter wrote:
> On Thu, May 22, 2014 at 07:55:52PM +0300, Ville Syrjälä wrote:
> > On Thu, May 22, 2014 at 05:56:32PM +0200, Daniel Vetter wrote:
> > > So apparently this is tricky.
> > > 
> > > We need to consider:
> > > - We start out with all the hw enabling bits disabled, both the
> > >   individual fifo underrun interrupts and the shared display error
> > >   interrupts masked. Otherwise if the bios config is broken we'll blow
> > >   up with a NULL deref in our interrupt handler since the crtc
> > >   structures aren't set up yet at driver load time.
> > > - On gmch we need to mask fifo underruns on the sw side, so always
> > >   need to set that in sanitize_crtc for those platforms.
> > > - On other platforms we try to set the sw tracking so that it reflects
> > >   the real state. But since a few platforms have shared bits we must
> > >   _not_ disable fifo underrun reporting. Otherwise we'll never enable
> > >   the shared error interrupt.
> > > 
> > > This is the state before out patch, but unfortunately this is not good
> > > enough. But after a suspend resume operation this is broken:
> > > 1. We don't enable the hw interrupts since the same code runs on
> > > resume as on driver load.
> > > 2. The fifo underrun state adjustments we do in sanitize_crtc doesn't
> > > fire on resume since (except for hilarious firmware) all pipes are off
> > > at that point. But they also don't hurt since the subsequent crtc
> > > enabling due to force_restore will enable fifo underruns.
> > > 
> > > Which means when we enable fifo underrun reporting we notice that the
> > > per-crtc state is already correct and short-circuit everthing out. And
> > > the interrupt doesn't get enabled.
> > > 
> > > A similar problem would happen if the bios doesn't light up anything
> > > when the driver loads. Which is exactly what happens when we reload
> > > the driver since our unload functions disables all outputs.
> > > 
> > > Now we can't just rip out the short-circuit logic and unconditionally
> > > update the fifo underrun reporting interrupt masking: We have some
> > > checks for shared error interrupts to catch issues that happened when
> > > the shared error interrupt was disabled.
> > 
> > Hmm. Do we have cases where we do enabled->enabled "transition"?
> > Because in that case we would now clear the register without
> > reporting if there was an underrun in the register.
> 
> We definitely have disabled->disabled transtions, which is what we need to
> filter out. Since it means we have either untrusted watermarks or hit and
> underrun.
> 
> For enabled->enabled I think that can happen in crtc_enable - we
> unconditionally enable underrun reporting againg to clear out old fail (or
> firmware setups). But we don't always disable it when disabling the crtc
> since some platforms/ports don't underrun when disabled.

Hmm. Actually since we don't necessarily notice the underruns with the
shared error interrupt, maybe we need to also throw an explicit underrun
check at the end of crtc_disable. That would mean we'd catch the underrun
at the very latest there, and crtc_enable can then clear the bit without
worrying about losing valid underrun reports.

So, I think this patch looks OK. But we will need to keep this issue in
mind if we add the underrun report re-enable timer, or something like it.
Since my quick hack for that just blindly walks the crtcs and
(re)enables the underrun reporting for everything.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> 
> This also only matters when we've had an underrun with a shared error
> interrupt somewhere. Which is the case where the old code is broken
> anyway, so I'm not sure how much we should care really.
> 
> Also the current code only checks for missed fifo underruns on disabling,
> so even if there is a notch of a leak here it's definitely not something
> this patch will regress.
> -Daniel
> > 
> > > 
> > > The right fix is to push down this logic so that we can always update
> > > the hardware state, but only check for missed fifo underruns on a real
> > > enabled->disabled transition and ignore them when we're already
> > > disabled.
> > > 
> > > On platforms with shared error interrupt the pipe CRC interrupts are
> > > grouped together with the fifo underrun reporting this fixes pipe CRC
> > > support after suspend and driver reloads.
> > > 
> > > Testcase: igt/kms_pipe_crc_basic/suspend-*
> > > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > ---
> > >  drivers/gpu/drm/i915/i915_irq.c | 44 ++++++++++++++++++-----------------------
> > >  1 file changed, 19 insertions(+), 25 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > index 304f86a3162c..4d44f09eb833 100644
> > > --- a/drivers/gpu/drm/i915/i915_irq.c
> > > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > @@ -335,7 +335,8 @@ void i9xx_check_fifo_underruns(struct drm_device *dev)
> > >  }
> > >  
> > >  static void i9xx_set_fifo_underrun_reporting(struct drm_device *dev,
> > > -					     enum pipe pipe, bool enable)
> > > +					     enum pipe pipe,
> > > +					     bool enable, bool old)
> > >  {
> > >  	struct drm_i915_private *dev_priv = dev->dev_private;
> > >  	u32 reg = PIPESTAT(pipe);
> > > @@ -347,7 +348,7 @@ static void i9xx_set_fifo_underrun_reporting(struct drm_device *dev,
> > >  		I915_WRITE(reg, pipestat | PIPE_FIFO_UNDERRUN_STATUS);
> > >  		POSTING_READ(reg);
> > >  	} else {
> > > -		if (pipestat & PIPE_FIFO_UNDERRUN_STATUS)
> > > +		if (old && pipestat & PIPE_FIFO_UNDERRUN_STATUS)
> > >  			DRM_ERROR("pipe %c underrun\n", pipe_name(pipe));
> > >  	}
> > >  }
> > > @@ -366,7 +367,8 @@ static void ironlake_set_fifo_underrun_reporting(struct drm_device *dev,
> > >  }
> > >  
> > >  static void ivybridge_set_fifo_underrun_reporting(struct drm_device *dev,
> > > -						  enum pipe pipe, bool enable)
> > > +						  enum pipe pipe,
> > > +						  bool enable, bool old)
> > >  {
> > >  	struct drm_i915_private *dev_priv = dev->dev_private;
> > >  	if (enable) {
> > > @@ -379,7 +381,8 @@ static void ivybridge_set_fifo_underrun_reporting(struct drm_device *dev,
> > >  	} else {
> > >  		ironlake_disable_display_irq(dev_priv, DE_ERR_INT_IVB);
> > >  
> > > -		if (I915_READ(GEN7_ERR_INT) & ERR_INT_FIFO_UNDERRUN(pipe)) {
> > > +		if (old &&
> > > +		    I915_READ(GEN7_ERR_INT) & ERR_INT_FIFO_UNDERRUN(pipe)) {
> > >  			DRM_ERROR("uncleared fifo underrun on pipe %c\n",
> > >  				  pipe_name(pipe));
> > >  		}
> > > @@ -444,7 +447,7 @@ static void ibx_set_fifo_underrun_reporting(struct drm_device *dev,
> > >  
> > >  static void cpt_set_fifo_underrun_reporting(struct drm_device *dev,
> > >  					    enum transcoder pch_transcoder,
> > > -					    bool enable)
> > > +					    bool enable, bool old)
> > >  {
> > >  	struct drm_i915_private *dev_priv = dev->dev_private;
> > >  
> > > @@ -459,7 +462,8 @@ static void cpt_set_fifo_underrun_reporting(struct drm_device *dev,
> > >  	} else {
> > >  		ibx_disable_display_interrupt(dev_priv, SDE_ERROR_CPT);
> > >  
> > > -		if (I915_READ(SERR_INT) & SERR_INT_TRANS_FIFO_UNDERRUN(pch_transcoder)) {
> > > +		if (old && I915_READ(SERR_INT) &
> > > +		    SERR_INT_TRANS_FIFO_UNDERRUN(pch_transcoder)) {
> > >  			DRM_ERROR("uncleared pch fifo underrun on pch transcoder %c\n",
> > >  				  transcoder_name(pch_transcoder));
> > >  		}
> > > @@ -486,28 +490,23 @@ static bool __intel_set_cpu_fifo_underrun_reporting(struct drm_device *dev,
> > >  	struct drm_i915_private *dev_priv = dev->dev_private;
> > >  	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
> > >  	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> > > -	bool ret;
> > > +	bool old;
> > >  
> > >  	assert_spin_locked(&dev_priv->irq_lock);
> > >  
> > > -	ret = !intel_crtc->cpu_fifo_underrun_disabled;
> > > -
> > > -	if (enable == ret)
> > > -		goto done;
> > > -
> > > +	old = !intel_crtc->cpu_fifo_underrun_disabled;
> > >  	intel_crtc->cpu_fifo_underrun_disabled = !enable;
> > >  
> > >  	if (INTEL_INFO(dev)->gen < 5 || IS_VALLEYVIEW(dev))
> > > -		i9xx_set_fifo_underrun_reporting(dev, pipe, enable);
> > > +		i9xx_set_fifo_underrun_reporting(dev, pipe, enable, old);
> > >  	else if (IS_GEN5(dev) || IS_GEN6(dev))
> > >  		ironlake_set_fifo_underrun_reporting(dev, pipe, enable);
> > >  	else if (IS_GEN7(dev))
> > > -		ivybridge_set_fifo_underrun_reporting(dev, pipe, enable);
> > > +		ivybridge_set_fifo_underrun_reporting(dev, pipe, enable, old);
> > >  	else if (IS_GEN8(dev))
> > >  		broadwell_set_fifo_underrun_reporting(dev, pipe, enable);
> > >  
> > > -done:
> > > -	return ret;
> > > +	return old;
> > >  }
> > >  
> > >  bool intel_set_cpu_fifo_underrun_reporting(struct drm_device *dev,
> > > @@ -556,7 +555,7 @@ bool intel_set_pch_fifo_underrun_reporting(struct drm_device *dev,
> > >  	struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pch_transcoder];
> > >  	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> > >  	unsigned long flags;
> > > -	bool ret;
> > > +	bool old;
> > >  
> > >  	/*
> > >  	 * NOTE: Pre-LPT has a fixed cpu pipe -> pch transcoder mapping, but LPT
> > > @@ -569,21 +568,16 @@ bool intel_set_pch_fifo_underrun_reporting(struct drm_device *dev,
> > >  
> > >  	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> > >  
> > > -	ret = !intel_crtc->pch_fifo_underrun_disabled;
> > > -
> > > -	if (enable == ret)
> > > -		goto done;
> > > -
> > > +	old = !intel_crtc->pch_fifo_underrun_disabled;
> > >  	intel_crtc->pch_fifo_underrun_disabled = !enable;
> > >  
> > >  	if (HAS_PCH_IBX(dev))
> > >  		ibx_set_fifo_underrun_reporting(dev, pch_transcoder, enable);
> > >  	else
> > > -		cpt_set_fifo_underrun_reporting(dev, pch_transcoder, enable);
> > > +		cpt_set_fifo_underrun_reporting(dev, pch_transcoder, enable, old);
> > >  
> > > -done:
> > >  	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> > > -	return ret;
> > > +	return old;
> > >  }
> > >  
> > >  
> > > -- 
> > > 1.8.4.rc3
> > > 
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > 
> > -- 
> > Ville Syrjälä
> > Intel OTC
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] drm/i915: Fix up fifo underrun tracking, take N
  2014-05-23  8:11       ` Ville Syrjälä
@ 2014-05-23  8:21         ` Daniel Vetter
  2014-05-26  8:09           ` Ville Syrjälä
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Vetter @ 2014-05-23  8:21 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Intel Graphics Development

On Fri, May 23, 2014 at 10:11 AM, Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>> For enabled->enabled I think that can happen in crtc_enable - we
>> unconditionally enable underrun reporting againg to clear out old fail (or
>> firmware setups). But we don't always disable it when disabling the crtc
>> since some platforms/ports don't underrun when disabled.
>
> Hmm. Actually since we don't necessarily notice the underruns with the
> shared error interrupt, maybe we need to also throw an explicit underrun
> check at the end of crtc_disable. That would mean we'd catch the underrun
> at the very latest there, and crtc_enable can then clear the bit without
> worrying about losing valid underrun reports.
>
> So, I think this patch looks OK. But we will need to keep this issue in
> mind if we add the underrun report re-enable timer, or something like it.
> Since my quick hack for that just blindly walks the crtcs and
> (re)enables the underrun reporting for everything.

Hm yeah, we might want a fifo underrun check like on gmch platforms.
Otoh if we disable the shared error interrupt things are already
rather bad, and the problem on the first pipe should have accurate
irq-based reporting. The 2nd pipe is hidden, but should show up as
soon as the first issue is addressed.

So I don't think we really should worry about this. Maybe more
intersting would be to check the _other_ pipes when we re-enable fifo
underruns. At least some of the reports we've seen suggest that
modesets on the pch influence each another ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] drm/i915: Fix up fifo underrun tracking, take N
  2014-05-23  8:21         ` Daniel Vetter
@ 2014-05-26  8:09           ` Ville Syrjälä
  2014-05-26  8:11             ` Daniel Vetter
  0 siblings, 1 reply; 19+ messages in thread
From: Ville Syrjälä @ 2014-05-26  8:09 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development

On Fri, May 23, 2014 at 10:21:24AM +0200, Daniel Vetter wrote:
> On Fri, May 23, 2014 at 10:11 AM, Ville Syrjälä
> <ville.syrjala@linux.intel.com> wrote:
> >> For enabled->enabled I think that can happen in crtc_enable - we
> >> unconditionally enable underrun reporting againg to clear out old fail (or
> >> firmware setups). But we don't always disable it when disabling the crtc
> >> since some platforms/ports don't underrun when disabled.
> >
> > Hmm. Actually since we don't necessarily notice the underruns with the
> > shared error interrupt, maybe we need to also throw an explicit underrun
> > check at the end of crtc_disable. That would mean we'd catch the underrun
> > at the very latest there, and crtc_enable can then clear the bit without
> > worrying about losing valid underrun reports.
> >
> > So, I think this patch looks OK. But we will need to keep this issue in
> > mind if we add the underrun report re-enable timer, or something like it.
> > Since my quick hack for that just blindly walks the crtcs and
> > (re)enables the underrun reporting for everything.
> 
> Hm yeah, we might want a fifo underrun check like on gmch platforms.
> Otoh if we disable the shared error interrupt things are already
> rather bad, and the problem on the first pipe should have accurate
> irq-based reporting. The 2nd pipe is hidden, but should show up as
> soon as the first issue is addressed.
> 
> So I don't think we really should worry about this. Maybe more
> intersting would be to check the _other_ pipes when we re-enable fifo
> underruns. At least some of the reports we've seen suggest that
> modesets on the pch influence each another ...

But we enable underrun reporting before doing the modeset, so at that
time the modeset can't have caused underruns on any pipe. So adding the
explicit check to the end .crtc_enable() (as I did for gmch) should
catch those nicely.

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] drm/i915: Fix up fifo underrun tracking, take N
  2014-05-26  8:09           ` Ville Syrjälä
@ 2014-05-26  8:11             ` Daniel Vetter
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2014-05-26  8:11 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Intel Graphics Development

On Mon, May 26, 2014 at 10:09 AM, Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
> On Fri, May 23, 2014 at 10:21:24AM +0200, Daniel Vetter wrote:
>> On Fri, May 23, 2014 at 10:11 AM, Ville Syrjälä
>> <ville.syrjala@linux.intel.com> wrote:
>> >> For enabled->enabled I think that can happen in crtc_enable - we
>> >> unconditionally enable underrun reporting againg to clear out old fail (or
>> >> firmware setups). But we don't always disable it when disabling the crtc
>> >> since some platforms/ports don't underrun when disabled.
>> >
>> > Hmm. Actually since we don't necessarily notice the underruns with the
>> > shared error interrupt, maybe we need to also throw an explicit underrun
>> > check at the end of crtc_disable. That would mean we'd catch the underrun
>> > at the very latest there, and crtc_enable can then clear the bit without
>> > worrying about losing valid underrun reports.
>> >
>> > So, I think this patch looks OK. But we will need to keep this issue in
>> > mind if we add the underrun report re-enable timer, or something like it.
>> > Since my quick hack for that just blindly walks the crtcs and
>> > (re)enables the underrun reporting for everything.
>>
>> Hm yeah, we might want a fifo underrun check like on gmch platforms.
>> Otoh if we disable the shared error interrupt things are already
>> rather bad, and the problem on the first pipe should have accurate
>> irq-based reporting. The 2nd pipe is hidden, but should show up as
>> soon as the first issue is addressed.
>>
>> So I don't think we really should worry about this. Maybe more
>> intersting would be to check the _other_ pipes when we re-enable fifo
>> underruns. At least some of the reports we've seen suggest that
>> modesets on the pch influence each another ...
>
> But we enable underrun reporting before doing the modeset, so at that
> time the modeset can't have caused underruns on any pipe. So adding the
> explicit check to the end .crtc_enable() (as I did for gmch) should
> catch those nicely.

Hm yeah, that would indeed be a nice extension of the underrun checks
we have. Not going to volunteer for it though, since my goal here was
to make the pipe crc stuff work more reliably.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] drm/i915: Improve irq handling after gpu resets
  2014-05-22 20:18   ` [PATCH 1/2] " Daniel Vetter
  2014-05-22 20:18     ` [PATCH 2/2] drm/i915: Extract gen8_gt_irq_reset Daniel Vetter
@ 2014-05-26  8:36     ` Ville Syrjälä
  2014-05-26 10:48       ` Daniel Vetter
  1 sibling, 1 reply; 19+ messages in thread
From: Ville Syrjälä @ 2014-05-26  8:36 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel Graphics Development

On Thu, May 22, 2014 at 10:18:21PM +0200, Daniel Vetter wrote:
> Currently we do a full re-init of all interrupts after a gpu hang.
> Which is pretty bad since we don't restore the interrupts we've
> enabled at runtime correctly. Even with that addressed it's rather
> horribly race.
> 
> But on g4x and later we only reset the gt and not the entire gpu.
> Which means we only need to reset the GT interrupt bits. Which has the
> nice benefit that vblank waits, pipe CRC interrupts and everything
> else display related just keeps on working.
> 
> The downside is that gt interrupt handling (i.e. ring->get/put_irq) is
> still racy. But as long as the gpu hang reliably wakes all waters and
> we have a short time where the refcount drops to 0 we'll recover. So
> not that bad really.
> 
> v2: Ville noticed that GTIMR and PMIMR don't get cleared, only the
> subordinate per-ring registers. So let's rip out all the interrupt dancing.
> The FIXME comment is still required though since the ring irq handling
> happens at the per-ring interrupt mask registers, too.
> 
> Testcase: igt/kms_flip/vblank-vs-hang
> Testcase: igt/kms_pipe_crc_basic/hang-*
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Both patches:
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

And to answer you earlier question, yes things seemd to work fine after
a GPU reset if I didn't touch the interrupt registers. In fact I also
tried killing most of the gem_hw_init() stuff (basically just left
ring->init(), l3_remap, and context enable) and things still seemed to
work just fine.

> ---
>  drivers/gpu/drm/i915/i915_drv.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index c83c83b74bf4..7ae906c811bb 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -811,17 +811,17 @@ int i915_reset(struct drm_device *dev)
>  		}
>  
>  		/*
> -		 * FIXME: This is horribly race against concurrent pageflip and
> -		 * vblank wait ioctls since they can observe dev->irqs_disabled
> -		 * being false when they shouldn't be able to.
> +		 * FIXME: This races pretty badly against concurrent holders of
> +		 * ring interrupts. This is possible since we've started to drop
> +		 * dev->struct_mutex in select places when waiting for the gpu.
>  		 */
> -		drm_irq_uninstall(dev);
> -		drm_irq_install(dev, dev->pdev->irq);
>  
> -		/* rps/rc6 re-init is necessary to restore state lost after the
> -		 * reset and the re-install of drm irq. Skip for ironlake per
> +		/*
> +		 * rps/rc6 re-init is necessary to restore state lost after the
> +		 * reset and the re-install of gt irqs. Skip for ironlake per
>  		 * previous concerns that it doesn't respond well to some forms
> -		 * of re-init after reset. */
> +		 * of re-init after reset.
> +		 */
>  		if (INTEL_INFO(dev)->gen > 5)
>  			intel_reset_gt_powersave(dev);

I'm suspecting that GPU reset doesn't affect the RPS/RC6 stuff either.
But I suppose it shouldn't really hurt anything to do it, so it's just
something to look into if we want to reduce the amount of stuff we do
at reset.

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] drm/i915: Improve irq handling after gpu resets
  2014-05-26  8:36     ` [PATCH 1/2] drm/i915: Improve irq handling after gpu resets Ville Syrjälä
@ 2014-05-26 10:48       ` Daniel Vetter
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2014-05-26 10:48 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Daniel Vetter, Intel Graphics Development

On Mon, May 26, 2014 at 11:36:33AM +0300, Ville Syrjälä wrote:
> On Thu, May 22, 2014 at 10:18:21PM +0200, Daniel Vetter wrote:
> > Currently we do a full re-init of all interrupts after a gpu hang.
> > Which is pretty bad since we don't restore the interrupts we've
> > enabled at runtime correctly. Even with that addressed it's rather
> > horribly race.
> > 
> > But on g4x and later we only reset the gt and not the entire gpu.
> > Which means we only need to reset the GT interrupt bits. Which has the
> > nice benefit that vblank waits, pipe CRC interrupts and everything
> > else display related just keeps on working.
> > 
> > The downside is that gt interrupt handling (i.e. ring->get/put_irq) is
> > still racy. But as long as the gpu hang reliably wakes all waters and
> > we have a short time where the refcount drops to 0 we'll recover. So
> > not that bad really.
> > 
> > v2: Ville noticed that GTIMR and PMIMR don't get cleared, only the
> > subordinate per-ring registers. So let's rip out all the interrupt dancing.
> > The FIXME comment is still required though since the ring irq handling
> > happens at the per-ring interrupt mask registers, too.
> > 
> > Testcase: igt/kms_flip/vblank-vs-hang
> > Testcase: igt/kms_pipe_crc_basic/hang-*
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Both patches:
> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Thanks, will pull in the patches later (lunchtime firts).

> And to answer you earlier question, yes things seemd to work fine after
> a GPU reset if I didn't touch the interrupt registers. In fact I also
> tried killing most of the gem_hw_init() stuff (basically just left
> ring->init(), l3_remap, and context enable) and things still seemed to
> work just fine.

Hm, we have pretty spotty test coverage for this stuff against gpu resets.
Also in case of doubt I prefer we don't special-case hw init code without
a good reason. Duplicating such logic for driver init, resume, gpu reset
or runtime pm resume just leads to duplicated bugs ime.


> > ---
> >  drivers/gpu/drm/i915/i915_drv.c | 16 ++++++++--------
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index c83c83b74bf4..7ae906c811bb 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -811,17 +811,17 @@ int i915_reset(struct drm_device *dev)
> >  		}
> >  
> >  		/*
> > -		 * FIXME: This is horribly race against concurrent pageflip and
> > -		 * vblank wait ioctls since they can observe dev->irqs_disabled
> > -		 * being false when they shouldn't be able to.
> > +		 * FIXME: This races pretty badly against concurrent holders of
> > +		 * ring interrupts. This is possible since we've started to drop
> > +		 * dev->struct_mutex in select places when waiting for the gpu.
> >  		 */
> > -		drm_irq_uninstall(dev);
> > -		drm_irq_install(dev, dev->pdev->irq);
> >  
> > -		/* rps/rc6 re-init is necessary to restore state lost after the
> > -		 * reset and the re-install of drm irq. Skip for ironlake per
> > +		/*
> > +		 * rps/rc6 re-init is necessary to restore state lost after the
> > +		 * reset and the re-install of gt irqs. Skip for ironlake per
> >  		 * previous concerns that it doesn't respond well to some forms
> > -		 * of re-init after reset. */
> > +		 * of re-init after reset.
> > +		 */
> >  		if (INTEL_INFO(dev)->gen > 5)
> >  			intel_reset_gt_powersave(dev);
> 
> I'm suspecting that GPU reset doesn't affect the RPS/RC6 stuff either.
> But I suppose it shouldn't really hurt anything to do it, so it's just
> something to look into if we want to reduce the amount of stuff we do
> at reset.

Iirc we've had bugs in this area, and we have a testcase for at least some
of them: igt/pm_rps/reset. Not sure if there's more lurking.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2014-05-26 10:48 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-22 15:56 [PATCH 1/5] drm/i915: Add fifo underrun reporting state to debugfs Daniel Vetter
2014-05-22 15:56 ` [PATCH 2/5] drm/i915: Fix up fifo underrun tracking, take N Daniel Vetter
2014-05-22 16:55   ` Ville Syrjälä
2014-05-22 20:10     ` Daniel Vetter
2014-05-23  8:11       ` Ville Syrjälä
2014-05-23  8:21         ` Daniel Vetter
2014-05-26  8:09           ` Ville Syrjälä
2014-05-26  8:11             ` Daniel Vetter
2014-05-22 15:56 ` [PATCH 3/5] drm/i915: Disable gpu reset on i965g/gm Daniel Vetter
2014-05-22 15:56 ` [PATCH 4/5] drm/i915: Inline ilk/gen8_irq_reset Daniel Vetter
2014-05-22 15:56 ` [PATCH 5/5] drm/i915: Improve irq handling after gpu resets Daniel Vetter
2014-05-22 16:51   ` Ville Syrjälä
2014-05-22 17:06     ` Ville Syrjälä
2014-05-22 20:12     ` Daniel Vetter
2014-05-22 20:18   ` [PATCH 1/2] " Daniel Vetter
2014-05-22 20:18     ` [PATCH 2/2] drm/i915: Extract gen8_gt_irq_reset Daniel Vetter
2014-05-26  8:36     ` [PATCH 1/2] drm/i915: Improve irq handling after gpu resets Ville Syrjälä
2014-05-26 10:48       ` Daniel Vetter
2014-05-22 17:02 ` [PATCH 1/5] drm/i915: Add fifo underrun reporting state to debugfs Ville Syrjälä

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.