* [PATCH] drm/i915: Report to userspace if we have a (presumed) working GPU reset
@ 2015-06-15 11:23 Chris Wilson
  2015-06-15 13:45 ` Daniel Vetter
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2015-06-15 11:23 UTC
  To: intel-gfx; +Cc: Daniel Vetter

In igt, we want to test handling of GPU hangs, both for recovery
purposes and for reporting. However, we don't want to inject a genuine
GPU hang into a machine that cannot recover, leaving it permanently
wedged. Rather than embed heuristics into igt, have the kernel report
exactly when it expects the GPU reset to work.

This can also be usefully extended in future to indicate different
levels of fine-grained resets.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tim Gore <tim.gore@intel.com>
Cc: Tomas Elf <tomas.elf@intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c     |  5 +++++
 drivers/gpu/drm/i915/i915_drv.h     |  1 +
 drivers/gpu/drm/i915/intel_uncore.c | 28 ++++++++++++++++++++++------
 include/uapi/drm/i915_drm.h         |  1 +
 4 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 34248635c36c..88795d2f1819 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -163,6 +163,11 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		if (!value)
 			return -ENODEV;
 		break;
+	case I915_PARAM_HAS_GPU_RESET:
+		value = i915.enable_hangcheck &&
+			i915.reset &&
+			intel_has_gpu_reset(dev);
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1110d492ec01..85da0dc3c0e6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2599,6 +2599,7 @@ extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
 			      unsigned long arg);
 #endif
 extern int intel_gpu_reset(struct drm_device *dev);
+extern bool intel_has_gpu_reset(struct drm_device *dev);
 extern int i915_reset(struct drm_device *dev);
 extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
 extern unsigned long i915_mch_val(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 40382bff5ca0..a61de6e944d2 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1489,20 +1489,36 @@ static int gen6_do_reset(struct drm_device *dev)
 	return ret;
 }
 
-int intel_gpu_reset(struct drm_device *dev)
+static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *)
 {
 	if (INTEL_INFO(dev)->gen >= 6)
-		return gen6_do_reset(dev);
+		return gen6_do_reset;
 	else if (IS_GEN5(dev))
-		return ironlake_do_reset(dev);
+		return ironlake_do_reset;
 	else if (IS_G4X(dev))
-		return g4x_do_reset(dev);
+		return g4x_do_reset;
 	else if (IS_G33(dev))
-		return g33_do_reset(dev);
+		return g33_do_reset;
 	else if (INTEL_INFO(dev)->gen >= 3)
-		return i915_do_reset(dev);
+		return i915_do_reset;
 	else
+		return NULL;
+}
+
+int intel_gpu_reset(struct drm_device *dev)
+{
+	int (*reset)(struct drm_device *);
+
+	reset = intel_get_gpu_reset(dev);
+	if (reset == NULL)
 		return -ENODEV;
+
+	return reset(dev);
+}
+
+bool intel_has_gpu_reset(struct drm_device *dev)
+{
+	return intel_get_gpu_reset(dev) != NULL;
 }
 
 void intel_uncore_check_errors(struct drm_device *dev)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 4c3420f932a5..312adbeb4eec 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -354,6 +354,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_REVISION              32
 #define I915_PARAM_SUBSLICE_TOTAL	 33
 #define I915_PARAM_EU_TOTAL		 34
+#define I915_PARAM_HAS_GPU_RESET	 35
 
 typedef struct drm_i915_getparam {
 	int param;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH] drm/i915: Report to userspace if we have a (presumed) working GPU reset
  2015-06-15 11:23 [PATCH] drm/i915: Report to userspace if we have a (presumed) working GPU reset Chris Wilson
@ 2015-06-15 13:45 ` Daniel Vetter
  2015-06-15 13:53   ` Chris Wilson
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Vetter @ 2015-06-15 13:45 UTC
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On Mon, Jun 15, 2015 at 12:23:48PM +0100, Chris Wilson wrote:
> In igt, we want to test handling of GPU hangs, both for recovery
> purposes and for reporting. However, we don't want to inject a genuine
> GPU hang into a machine that cannot recover, leaving it permanently
> wedged. Rather than embed heuristics into igt, have the kernel report
> exactly when it expects the GPU reset to work.
> 
> This can also be usefully extended in future to indicate different
> levels of fine-grained resets.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tim Gore <tim.gore@intel.com>
> Cc: Tomas Elf <tomas.elf@intel.com>

Yeah makes sense. Will merge as soon as someone smashes a t-b with a few
igt patches using this on top.
-Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH] drm/i915: Report to userspace if we have a (presumed) working GPU reset
  2015-06-15 13:45 ` Daniel Vetter
@ 2015-06-15 13:53   ` Chris Wilson
  2015-06-15 13:58     ` Chris Wilson
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2015-06-15 13:53 UTC
  To: Daniel Vetter; +Cc: Daniel Vetter, intel-gfx

On Mon, Jun 15, 2015 at 03:45:38PM +0200, Daniel Vetter wrote:
> Yeah makes sense. Will merge as soon as someone smashes a t-b with a few
> igt patches using this on top.

diff --git a/lib/igt_gt.c b/lib/igt_gt.c
index deb5560..8a1ffb2 100644
--- a/lib/igt_gt.c
+++ b/lib/igt_gt.c
@@ -26,6 +26,7 @@
 #include <errno.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <sys/ioctl.h>
 #include <fcntl.h>
 
 #include "drmtest.h"
@@ -47,6 +48,21 @@
  * engines.
  */
 
+static bool has_gpu_reset(int fd)
+{
+       struct drm_i915_getparam gp;
+       int val = 0;
+
+       memset(&gp, 0, sizeof(gp));
+       gp.param = 35; /* HAS_GPU_RESET */
+       gp.value = &val;
+
+       if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp, sizeof(gp)))
+               return intel_gen(intel_get_drm_devid(fd)) >= 5;
+
+       return val > 0;
+}
 
 /**
  * igt_require_hang_ring:
@@ -60,7 +76,7 @@
 void igt_require_hang_ring(int fd, int ring)
 {
        gem_context_require_ban_period(fd);
-       igt_require(intel_gen(intel_get_drm_devid(fd)) >= 5);
+       igt_require(has_gpu_reset(fd));
 }
 
 /**


-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Report to userspace if we have a (presumed) working GPU reset
  2015-06-15 13:53   ` Chris Wilson
@ 2015-06-15 13:58     ` Chris Wilson
  2015-06-15 15:01       ` Daniel Vetter
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2015-06-15 13:58 UTC
  To: Daniel Vetter, intel-gfx, Daniel Vetter, Tim Gore, Tomas Elf

On Mon, Jun 15, 2015 at 02:53:41PM +0100, Chris Wilson wrote:
> diff --git a/lib/igt_gt.c b/lib/igt_gt.c
> index deb5560..8a1ffb2 100644
> --- a/lib/igt_gt.c
> +++ b/lib/igt_gt.c
> @@ -26,6 +26,7 @@
>  #include <errno.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
> +#include <sys/ioctl.h>
>  #include <fcntl.h>
>  
>  #include "drmtest.h"
> @@ -47,6 +48,21 @@
>   * engines.
>   */
>  
> +static bool has_gpu_reset(int fd)
> +{
> +       struct drm_i915_getparam gp;
> +       int val = 0;
> +
> +       memset(&gp, 0, sizeof(gp));
> +       gp.param = 35; /* HAS_GPU_RESET */
> +       gp.value = &val;
> +
> +       if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp, sizeof(gp)))
> +               return intel_gen(intel_get_drm_devid(fd)) >= 5;
> +
> +       return val > 0;
> +}
>  
>  /**
>   * igt_require_hang_ring:
> @@ -60,7 +76,7 @@
>  void igt_require_hang_ring(int fd, int ring)
>  {
>         gem_context_require_ban_period(fd);
> -       igt_require(intel_gen(intel_get_drm_devid(fd)) >= 5);
> +       igt_require(has_gpu_reset(fd));
>  }

Speaking of which, do we want
  igt_require(getenv("IGT_DISABLE_HANG") == NULL);
here?
-Chris


* Re: [PATCH] drm/i915: Report to userspace if we have a (presumed) working GPU reset
  2015-06-15 13:58     ` Chris Wilson
@ 2015-06-15 15:01       ` Daniel Vetter
  0 siblings, 0 replies; 5+ messages in thread
From: Daniel Vetter @ 2015-06-15 15:01 UTC
  To: Chris Wilson, Daniel Vetter, intel-gfx, Daniel Vetter, Tim Gore,
	Tomas Elf

On Mon, Jun 15, 2015 at 02:58:17PM +0100, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 02:53:41PM +0100, Chris Wilson wrote:
> > diff --git a/lib/igt_gt.c b/lib/igt_gt.c
> > index deb5560..8a1ffb2 100644
> > --- a/lib/igt_gt.c
> > +++ b/lib/igt_gt.c
> > @@ -26,6 +26,7 @@
> >  #include <errno.h>
> >  #include <sys/types.h>
> >  #include <sys/stat.h>
> > +#include <sys/ioctl.h>
> >  #include <fcntl.h>
> >  
> >  #include "drmtest.h"
> > @@ -47,6 +48,21 @@
> >   * engines.
> >   */
> >  
> > +static bool has_gpu_reset(int fd)
> > +{
> > +       struct drm_i915_getparam gp;
> > +       int val = 0;
> > +
> > +       memset(&gp, 0, sizeof(gp));
> > +       gp.param = 35; /* HAS_GPU_RESET */
> > +       gp.value = &val;
> > +
> > +       if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp, sizeof(gp)))
> > +               return intel_gen(intel_get_drm_devid(fd)) >= 5;
> > +
> > +       return val > 0;
> > +}
> >  
> >  /**
> >   * igt_require_hang_ring:
> > @@ -60,7 +76,7 @@
> >  void igt_require_hang_ring(int fd, int ring)
> >  {
> >         gem_context_require_ban_period(fd);
> > -       igt_require(intel_gen(intel_get_drm_devid(fd)) >= 5);
> > +       igt_require(has_gpu_reset(fd));
> >  }

Count me convinced, patch applied ;-)

> Speaking of which, do we want
>   igt_require(getenv("IGT_DISABLE_HANG") == NULL);
> here?

Well, igt_require(!igt_check_boolean_env_var("IGT_DISABLE_HANG", false));
but tbh I'm not sure about that. Filtering testcases with piglit using
-x hang should amount to the same thing really.
-Daniel