All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Check hangcheck is functioning before indefinite waits
@ 2014-07-03  7:09 Chris Wilson
  2014-07-03 15:44 ` Jesse Barnes
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Wilson @ 2014-07-03  7:09 UTC (permalink / raw)
  To: intel-gfx

Since we rely on hangcheck to wait up and kick us out of an indefinite
wait should the GPU ever stop functioning, it appears sensible that we
should check that hangcheck is indeed active before starting that wait.
This just prevents a driver error in the processing of hangcheck from
appearing to hang the machine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         |  1 +
 drivers/gpu/drm/i915/i915_gem.c         |  2 ++
 drivers/gpu/drm/i915/i915_irq.c         | 11 +++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
 4 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8670d05..c326a99 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2149,6 +2149,7 @@ extern void intel_console_resume(struct work_struct *work);
 
 /* i915_irq.c */
 void i915_queue_hangcheck(struct drm_device *dev);
+void i915_check_hangcheck(struct drm_device *dev);
 __printf(3, 4)
 void i915_handle_error(struct drm_device *dev, bool wedged,
 		       const char *fmt, ...);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ecb835b..120ed6d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1455,6 +1455,8 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
 			break;
 		}
 
+		i915_check_hangcheck(ring->dev);
+
 		timer.function = NULL;
 		if (timeout || missed_irq(dev_priv, ring)) {
 			unsigned long expire;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 9e5a295..daa590e 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3063,6 +3063,17 @@ void i915_queue_hangcheck(struct drm_device *dev)
 		  round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES));
 }
 
+void i915_check_hangcheck(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	if (!i915.enable_hangcheck)
+		return;
+
+	if (!timer_pending(&dev_priv->gpu_error.hangcheck_timer))
+		mod_timer(&dev_priv->gpu_error.hangcheck_timer,
+			  round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES));
+}
+
 static void ibx_irq_reset(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 4f3397f..6365d4d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1634,6 +1634,8 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
 				master_priv->sarea_priv->perf_boxes |= I915_BOX_WAIT;
 		}
 
+		i915_check_hangcheck(dev);
+
 		io_schedule_timeout(1);
 
 		if (dev_priv->mm.interruptible && signal_pending(current)) {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/i915: Check hangcheck is functioning before indefinite waits
  2014-07-03  7:09 [PATCH] drm/i915: Check hangcheck is functioning before indefinite waits Chris Wilson
@ 2014-07-03 15:44 ` Jesse Barnes
  2014-07-03 15:51   ` Chris Wilson
  0 siblings, 1 reply; 4+ messages in thread
From: Jesse Barnes @ 2014-07-03 15:44 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Thu,  3 Jul 2014 08:09:01 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> Since we rely on hangcheck to wait up and kick us out of an indefinite
> wait should the GPU ever stop functioning, it appears sensible that we
> should check that hangcheck is indeed active before starting that wait.
> This just prevents a driver error in the processing of hangcheck from
> appearing to hang the machine.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |  1 +
>  drivers/gpu/drm/i915/i915_gem.c         |  2 ++
>  drivers/gpu/drm/i915/i915_irq.c         | 11 +++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
>  4 files changed, 16 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8670d05..c326a99 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2149,6 +2149,7 @@ extern void intel_console_resume(struct work_struct *work);
>  
>  /* i915_irq.c */
>  void i915_queue_hangcheck(struct drm_device *dev);
> +void i915_check_hangcheck(struct drm_device *dev);
>  __printf(3, 4)
>  void i915_handle_error(struct drm_device *dev, bool wedged,
>  		       const char *fmt, ...);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ecb835b..120ed6d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1455,6 +1455,8 @@ static int __wait_seqno(struct intel_engine_cs *ring, u32 seqno,
>  			break;
>  		}
>  
> +		i915_check_hangcheck(ring->dev);
> +
>  		timer.function = NULL;
>  		if (timeout || missed_irq(dev_priv, ring)) {
>  			unsigned long expire;
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 9e5a295..daa590e 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3063,6 +3063,17 @@ void i915_queue_hangcheck(struct drm_device *dev)
>  		  round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES));
>  }
>  
> +void i915_check_hangcheck(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	if (!i915.enable_hangcheck)
> +		return;
> +
> +	if (!timer_pending(&dev_priv->gpu_error.hangcheck_timer))
> +		mod_timer(&dev_priv->gpu_error.hangcheck_timer,
> +			  round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES));
> +}
> +
>  static void ibx_irq_reset(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 4f3397f..6365d4d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1634,6 +1634,8 @@ static int ring_wait_for_space(struct intel_engine_cs *ring, int n)
>  				master_priv->sarea_priv->perf_boxes |= I915_BOX_WAIT;
>  		}
>  
> +		i915_check_hangcheck(dev);
> +
>  		io_schedule_timeout(1);
>  
>  		if (dev_priv->mm.interruptible && signal_pending(current)) {

Are there any bugs associated with this?

i915_rearm_hangcheck() or something might more accurately describe
what's going on here.

I suppose both of these paths are protected by the struct_mutex?  If
not, might we race and mod_timer() this twice from two threads in
succession?  I guess that's harmless...

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/i915: Check hangcheck is functioning before indefinite waits
  2014-07-03 15:44 ` Jesse Barnes
@ 2014-07-03 15:51   ` Chris Wilson
  2014-07-03 16:00     ` Jesse Barnes
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Wilson @ 2014-07-03 15:51 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: intel-gfx

On Thu, Jul 03, 2014 at 08:44:20AM -0700, Jesse Barnes wrote:
> On Thu,  3 Jul 2014 08:09:01 +0100
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > Since we rely on hangcheck to wait up and kick us out of an indefinite
> > wait should the GPU ever stop functioning, it appears sensible that we
> > should check that hangcheck is indeed active before starting that wait.
> > This just prevents a driver error in the processing of hangcheck from
> > appearing to hang the machine.

> Are there any bugs associated with this?

No open bugs. They have cropped up during dev though, and I think I am
not alone. I believe that both Ben and I have tried to convince Daniel
the merits of having this security blanket.
 
> i915_rearm_hangcheck() or something might more accurately describe
> what's going on here.

How about i915_ensure_hangcheck()? (I agree that rearm is better than
check.)
 
> I suppose both of these paths are protected by the struct_mutex?  If
> not, might we race and mod_timer() this twice from two threads in
> succession?  I guess that's harmless...

Concurrently arming a timer within a jiffie or two isn't going to make
too much difference, or even pushing an almost firing timer off by
another hangcheck interval. Conversely, since we already have read the
hangcheck counter, if the hangcheck does fire before we schedule(), that
will immediately wake us up and we will spot the hang.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/i915: Check hangcheck is functioning before indefinite waits
  2014-07-03 15:51   ` Chris Wilson
@ 2014-07-03 16:00     ` Jesse Barnes
  0 siblings, 0 replies; 4+ messages in thread
From: Jesse Barnes @ 2014-07-03 16:00 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Thu, 3 Jul 2014 16:51:11 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> On Thu, Jul 03, 2014 at 08:44:20AM -0700, Jesse Barnes wrote:
> > On Thu,  3 Jul 2014 08:09:01 +0100
> > Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > 
> > > Since we rely on hangcheck to wait up and kick us out of an indefinite
> > > wait should the GPU ever stop functioning, it appears sensible that we
> > > should check that hangcheck is indeed active before starting that wait.
> > > This just prevents a driver error in the processing of hangcheck from
> > > appearing to hang the machine.
> 
> > Are there any bugs associated with this?
> 
> No open bugs. They have cropped up during dev though, and I think I am
> not alone. I believe that both Ben and I have tried to convince Daniel
> the merits of having this security blanket.
>  
> > i915_rearm_hangcheck() or something might more accurately describe
> > what's going on here.
> 
> How about i915_ensure_hangcheck()? (I agree that rearm is better than
> check.)
>  
> > I suppose both of these paths are protected by the struct_mutex?  If
> > not, might we race and mod_timer() this twice from two threads in
> > succession?  I guess that's harmless...
> 
> Concurrently arming a timer within a jiffie or two isn't going to make
> too much difference, or even pushing an almost firing timer off by
> another hangcheck interval. Conversely, since we already have read the
> hangcheck counter, if the hangcheck does fire before we schedule(), that
> will immediately wake us up and we will spot the hang.

Sounds good.  ensure_hangcheck() or update_hangcheck() are fine with me
too.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-07-03 16:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-03  7:09 [PATCH] drm/i915: Check hangcheck is functioning before indefinite waits Chris Wilson
2014-07-03 15:44 ` Jesse Barnes
2014-07-03 15:51   ` Chris Wilson
2014-07-03 16:00     ` Jesse Barnes

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.