All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/i915: add i915_reset_count
@ 2013-10-30 13:44 Mika Kuoppala
  2013-10-30 13:44 ` [PATCH 2/2] drm/i915: add i915_get_reset_stats_ioctl Mika Kuoppala
  2013-11-08 17:40 ` [PATCH 1/2] drm/i915: add i915_reset_count Damien Lespiau
  0 siblings, 2 replies; 11+ messages in thread
From: Mika Kuoppala @ 2013-10-30 13:44 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

reset_counter will be incremented twice per successful
reset. Odd values mean reset is in progress and even values
mean that reset has completed.

Reset status ioctl introduced in following commit
needs to deliver global reset count to userspace so
use reset_counter to derive the actual reset count
for the gpu

Note that reset in progress is enough to increment
the counter.

v2: wedged equals reset in progress (Daniel Vetter)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |   11 ++++++++---
 drivers/gpu/drm/i915/i915_irq.c |    2 +-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0a886fd..9fd716d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1082,7 +1082,7 @@ struct i915_gpu_error {
 	 * being true.
 	 */
 #define I915_RESET_IN_PROGRESS_FLAG	1
-#define I915_WEDGED			0xffffffff
+#define I915_WEDGED			(1 << 31)
 
 	/**
 	 * Waitqueue to signal when the reset has completed. Used by clients
@@ -2031,12 +2031,17 @@ int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
 static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
 {
 	return unlikely(atomic_read(&error->reset_counter)
-			& I915_RESET_IN_PROGRESS_FLAG);
+			& (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
 }
 
 static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
 {
-	return atomic_read(&error->reset_counter) == I915_WEDGED;
+	return atomic_read(&error->reset_counter) & I915_WEDGED;
+}
+
+static inline u32 i915_reset_count(struct i915_gpu_error *error)
+{
+	return ((atomic_read(&error->reset_counter) & ~I915_WEDGED) + 1) / 2;
 }
 
 void i915_gem_reset(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index a228176..80800d2 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1783,7 +1783,7 @@ static void i915_error_work_func(struct work_struct *work)
 			kobject_uevent_env(&dev->primary->kdev.kobj,
 					   KOBJ_CHANGE, reset_done_event);
 		} else {
-			atomic_set(&error->reset_counter, I915_WEDGED);
+			atomic_set_mask(I915_WEDGED, &error->reset_counter);
 		}
 
 		/*
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] drm/i915: add i915_get_reset_stats_ioctl
  2013-10-30 13:44 [PATCH 1/2] drm/i915: add i915_reset_count Mika Kuoppala
@ 2013-10-30 13:44 ` Mika Kuoppala
  2013-10-30 17:37   ` Ian Romanick
                     ` (2 more replies)
  2013-11-08 17:40 ` [PATCH 1/2] drm/i915: add i915_reset_count Damien Lespiau
  1 sibling, 3 replies; 11+ messages in thread
From: Mika Kuoppala @ 2013-10-30 13:44 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, miku

This ioctl returns reset stats for specified context.

The struct returned contains context loss counters.

reset_count:    all resets across all contexts
batch_active:   active batches lost on resets
batch_pending:  pending batches lost on resets

v2: get rid of state tracking completely and deliver only counts. Idea
    from Chris Wilson.

v3: fix commit message

v4: default context handled inside i915_gem_context_get_hang_stats

v5: reset_count only for priviledged process

v6: ctx=0 needs CAP_SYS_ADMIN for batch_* counters (Chris Wilson)

v7: context hang stats never returns NULL

v8: rebased on top of reworked context hang stats
    DRM_RENDER_ALLOW for ioctl

v9: use DEFAULT_CONTEXT_ID. Improve comments for ioctl struct members

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ian Romanick <idr@freedesktop.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_dma.c     |    1 +
 drivers/gpu/drm/i915/i915_drv.h     |    2 ++
 drivers/gpu/drm/i915/intel_uncore.c |   34 ++++++++++++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h         |   19 +++++++++++++++++++
 4 files changed, 56 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 6eecce7..f2cdeb2 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1921,6 +1921,7 @@ const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, i915_gem_context_create_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_get_reset_stats_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 };
 
 int i915_max_ioctl = DRM_ARRAY_SIZE(i915_ioctls);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9fd716d..8870804 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2369,6 +2369,8 @@ extern int intel_enable_rc6(const struct drm_device *dev);
 extern bool i915_semaphore_is_enabled(struct drm_device *dev);
 int i915_reg_read_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file);
+int i915_get_reset_stats_ioctl(struct drm_device *dev, void *data,
+			       struct drm_file *file);
 
 /* overlay */
 extern struct intel_overlay_error_state *intel_overlay_capture_error_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index f6fae35..21cf951 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -633,6 +633,40 @@ int i915_reg_read_ioctl(struct drm_device *dev,
 	return 0;
 }
 
+int i915_get_reset_stats_ioctl(struct drm_device *dev,
+			       void *data, struct drm_file *file)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_reset_stats *args = data;
+	struct i915_ctx_hang_stats *hs;
+	int ret;
+
+	if (args->ctx_id == DEFAULT_CONTEXT_ID && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
+
+	hs = i915_gem_context_get_hang_stats(dev, file, args->ctx_id);
+	if (IS_ERR(hs)) {
+		mutex_unlock(&dev->struct_mutex);
+		return PTR_ERR(hs);
+	}
+
+	if (capable(CAP_SYS_ADMIN))
+		args->reset_count = i915_reset_count(&dev_priv->gpu_error);
+	else
+		args->reset_count = 0;
+
+	args->batch_active = hs->batch_active;
+	args->batch_pending = hs->batch_pending;
+
+	mutex_unlock(&dev->struct_mutex);
+
+	return 0;
+}
+
 static int i965_reset_complete(struct drm_device *dev)
 {
 	u8 gdrst;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 3a4e97b..52aed89 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -222,6 +222,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_SET_CACHING	0x2f
 #define DRM_I915_GEM_GET_CACHING	0x30
 #define DRM_I915_REG_READ		0x31
+#define DRM_I915_GET_RESET_STATS	0x32
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
 #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
@@ -271,6 +272,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create)
 #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
 #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
+#define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1030,4 +1032,21 @@ struct drm_i915_reg_read {
 	__u64 offset;
 	__u64 val; /* Return value */
 };
+
+struct drm_i915_reset_stats {
+	__u32 ctx_id;
+	__u32 flags;
+
+	/* All resets since boot/module reload, for all contexts */
+	__u32 reset_count;
+
+	/* Number of batches lost when active in GPU, for this context */
+	__u32 batch_active;
+
+	/* Number of batches lost pending for execution, for this context */
+	__u32 batch_pending;
+
+	__u32 pad;
+};
+
 #endif /* _UAPI_I915_DRM_H_ */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] drm/i915: add i915_get_reset_stats_ioctl
  2013-10-30 13:44 ` [PATCH 2/2] drm/i915: add i915_get_reset_stats_ioctl Mika Kuoppala
@ 2013-10-30 17:37   ` Ian Romanick
  2013-11-08 18:11   ` Damien Lespiau
  2013-11-12 14:19   ` Daniel Vetter
  2 siblings, 0 replies; 11+ messages in thread
From: Ian Romanick @ 2013-10-30 17:37 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: Daniel Vetter, intel-gfx, miku

On 10/30/2013 06:44 AM, Mika Kuoppala wrote:
> This ioctl returns reset stats for specified context.
> 
> The struct returned contains context loss counters.
> 
> reset_count:    all resets across all contexts
> batch_active:   active batches lost on resets
> batch_pending:  pending batches lost on resets
> 
> v2: get rid of state tracking completely and deliver only counts. Idea
>     from Chris Wilson.
> 
> v3: fix commit message
> 
> v4: default context handled inside i915_gem_context_get_hang_stats
> 
> v5: reset_count only for priviledged process
> 
> v6: ctx=0 needs CAP_SYS_ADMIN for batch_* counters (Chris Wilson)
> 
> v7: context hang stats never returns NULL
> 
> v8: rebased on top of reworked context hang stats
>     DRM_RENDER_ALLOW for ioctl
> 
> v9: use DEFAULT_CONTEXT_ID. Improve comments for ioctl struct members
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Ian Romanick <idr@freedesktop.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_dma.c     |    1 +
>  drivers/gpu/drm/i915/i915_drv.h     |    2 ++
>  drivers/gpu/drm/i915/intel_uncore.c |   34 ++++++++++++++++++++++++++++++++++
>  include/uapi/drm/i915_drm.h         |   19 +++++++++++++++++++
>  4 files changed, 56 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 6eecce7..f2cdeb2 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1921,6 +1921,7 @@ const struct drm_ioctl_desc i915_ioctls[] = {
>  	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, i915_gem_context_create_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_get_reset_stats_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  };
>  
>  int i915_max_ioctl = DRM_ARRAY_SIZE(i915_ioctls);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 9fd716d..8870804 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2369,6 +2369,8 @@ extern int intel_enable_rc6(const struct drm_device *dev);
>  extern bool i915_semaphore_is_enabled(struct drm_device *dev);
>  int i915_reg_read_ioctl(struct drm_device *dev, void *data,
>  			struct drm_file *file);
> +int i915_get_reset_stats_ioctl(struct drm_device *dev, void *data,
> +			       struct drm_file *file);
>  
>  /* overlay */
>  extern struct intel_overlay_error_state *intel_overlay_capture_error_state(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index f6fae35..21cf951 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -633,6 +633,40 @@ int i915_reg_read_ioctl(struct drm_device *dev,
>  	return 0;
>  }
>  
> +int i915_get_reset_stats_ioctl(struct drm_device *dev,
> +			       void *data, struct drm_file *file)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_reset_stats *args = data;
> +	struct i915_ctx_hang_stats *hs;
> +	int ret;
> +
> +	if (args->ctx_id == DEFAULT_CONTEXT_ID && !capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	ret = mutex_lock_interruptible(&dev->struct_mutex);
> +	if (ret)
> +		return ret;
> +
> +	hs = i915_gem_context_get_hang_stats(dev, file, args->ctx_id);
> +	if (IS_ERR(hs)) {
> +		mutex_unlock(&dev->struct_mutex);
> +		return PTR_ERR(hs);
> +	}
> +
> +	if (capable(CAP_SYS_ADMIN))
> +		args->reset_count = i915_reset_count(&dev_priv->gpu_error);
> +	else
> +		args->reset_count = 0;

We're having some additional debate about issues related to this.  Eric
(added to CC so he'll notice) believes that we may encounter memory
corruption around a reset (most likely causing the reset instead of the
other way around).  This means that we may need to deliver a reset
notification to an otherwise unaffected GL context after all. :(

If we decided that this is possible, we should deliver a single bit to
user mode that says "there was a reset after this context was created."
 I assume that could be returned to user space in the flags field?

I don't think this provides the same potential information leak as
directly exposing the global reset count, but I could be wrong.

I don't think we need to change anything /yet/, but we may need to soon.

> +
> +	args->batch_active = hs->batch_active;
> +	args->batch_pending = hs->batch_pending;
> +
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	return 0;
> +}
> +
>  static int i965_reset_complete(struct drm_device *dev)
>  {
>  	u8 gdrst;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 3a4e97b..52aed89 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -222,6 +222,7 @@ typedef struct _drm_i915_sarea {
>  #define DRM_I915_GEM_SET_CACHING	0x2f
>  #define DRM_I915_GEM_GET_CACHING	0x30
>  #define DRM_I915_REG_READ		0x31
> +#define DRM_I915_GET_RESET_STATS	0x32
>  
>  #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
>  #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
> @@ -271,6 +272,7 @@ typedef struct _drm_i915_sarea {
>  #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create)
>  #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
>  #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
> +#define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
>  
>  /* Allow drivers to submit batchbuffers directly to hardware, relying
>   * on the security mechanisms provided by hardware.
> @@ -1030,4 +1032,21 @@ struct drm_i915_reg_read {
>  	__u64 offset;
>  	__u64 val; /* Return value */
>  };
> +
> +struct drm_i915_reset_stats {
> +	__u32 ctx_id;
> +	__u32 flags;
> +
> +	/* All resets since boot/module reload, for all contexts */
> +	__u32 reset_count;
> +
> +	/* Number of batches lost when active in GPU, for this context */
> +	__u32 batch_active;
> +
> +	/* Number of batches lost pending for execution, for this context */
> +	__u32 batch_pending;
> +
> +	__u32 pad;
> +};
> +
>  #endif /* _UAPI_I915_DRM_H_ */
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] drm/i915: add i915_reset_count
  2013-10-30 13:44 [PATCH 1/2] drm/i915: add i915_reset_count Mika Kuoppala
  2013-10-30 13:44 ` [PATCH 2/2] drm/i915: add i915_get_reset_stats_ioctl Mika Kuoppala
@ 2013-11-08 17:40 ` Damien Lespiau
  2013-11-12 12:44   ` Mika Kuoppala
  1 sibling, 1 reply; 11+ messages in thread
From: Damien Lespiau @ 2013-11-08 17:40 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, miku

On Wed, Oct 30, 2013 at 03:44:15PM +0200, Mika Kuoppala wrote:
> reset_counter will be incremented twice per successful
> reset. Odd values mean reset is in progress and even values
> mean that reset has completed.

Could you also update the lengthy comment above reset_counter. It looks
quite stale. There's also a:

	* Note that the code relies on
	*      I915_WEDGED & I915_RESET_IN_PROGRESS_FLAG
	* being true.

which is not true anymore with this commit. i915_reset_in_progress() has
been modified to take the I915_WEDGED change into account, so we should
be safe, I believe.

With this, you can add a:

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

-- 
Damien

> 
> Reset status ioctl introduced in following commit
> needs to deliver global reset count to userspace so
> use reset_counter to derive the actual reset count
> for the gpu
> 
> Note that reset in progress is enough to increment
> the counter.
> 
> v2: wedged equals reset in progress (Daniel Vetter)
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h |   11 ++++++++---
>  drivers/gpu/drm/i915/i915_irq.c |    2 +-
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 0a886fd..9fd716d 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1082,7 +1082,7 @@ struct i915_gpu_error {
>  	 * being true.
>  	 */
>  #define I915_RESET_IN_PROGRESS_FLAG	1
> -#define I915_WEDGED			0xffffffff
> +#define I915_WEDGED			(1 << 31)
>  
>  	/**
>  	 * Waitqueue to signal when the reset has completed. Used by clients
> @@ -2031,12 +2031,17 @@ int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
>  static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
>  {
>  	return unlikely(atomic_read(&error->reset_counter)
> -			& I915_RESET_IN_PROGRESS_FLAG);
> +			& (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
>  }
>  
>  static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
>  {
> -	return atomic_read(&error->reset_counter) == I915_WEDGED;
> +	return atomic_read(&error->reset_counter) & I915_WEDGED;
> +}
> +
> +static inline u32 i915_reset_count(struct i915_gpu_error *error)
> +{
> +	return ((atomic_read(&error->reset_counter) & ~I915_WEDGED) + 1) / 2;
>  }
>  
>  void i915_gem_reset(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index a228176..80800d2 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1783,7 +1783,7 @@ static void i915_error_work_func(struct work_struct *work)
>  			kobject_uevent_env(&dev->primary->kdev.kobj,
>  					   KOBJ_CHANGE, reset_done_event);
>  		} else {
> -			atomic_set(&error->reset_counter, I915_WEDGED);
> +			atomic_set_mask(I915_WEDGED, &error->reset_counter);
>  		}
>  
>  		/*
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] drm/i915: add i915_get_reset_stats_ioctl
  2013-10-30 13:44 ` [PATCH 2/2] drm/i915: add i915_get_reset_stats_ioctl Mika Kuoppala
  2013-10-30 17:37   ` Ian Romanick
@ 2013-11-08 18:11   ` Damien Lespiau
  2013-11-12  0:01     ` Ian Romanick
  2013-11-12 14:19   ` Daniel Vetter
  2 siblings, 1 reply; 11+ messages in thread
From: Damien Lespiau @ 2013-11-08 18:11 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: Daniel Vetter, intel-gfx, miku

On Wed, Oct 30, 2013 at 03:44:16PM +0200, Mika Kuoppala wrote:
> This ioctl returns reset stats for specified context.
> 
> The struct returned contains context loss counters.
> 
> reset_count:    all resets across all contexts
> batch_active:   active batches lost on resets
> batch_pending:  pending batches lost on resets
> 
> v2: get rid of state tracking completely and deliver only counts. Idea
>     from Chris Wilson.
> 
> v3: fix commit message
> 
> v4: default context handled inside i915_gem_context_get_hang_stats
> 
> v5: reset_count only for priviledged process
> 
> v6: ctx=0 needs CAP_SYS_ADMIN for batch_* counters (Chris Wilson)
> 
> v7: context hang stats never returns NULL
> 
> v8: rebased on top of reworked context hang stats
>     DRM_RENDER_ALLOW for ioctl
> 
> v9: use DEFAULT_CONTEXT_ID. Improve comments for ioctl struct members
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Ian Romanick <idr@freedesktop.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

The logic there looks good to me, I was just wondering if we wanted a
bit more padding in case we want other counters or misc stuff.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

-- 
Damien

> ---
>  drivers/gpu/drm/i915/i915_dma.c     |    1 +
>  drivers/gpu/drm/i915/i915_drv.h     |    2 ++
>  drivers/gpu/drm/i915/intel_uncore.c |   34 ++++++++++++++++++++++++++++++++++
>  include/uapi/drm/i915_drm.h         |   19 +++++++++++++++++++
>  4 files changed, 56 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 6eecce7..f2cdeb2 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1921,6 +1921,7 @@ const struct drm_ioctl_desc i915_ioctls[] = {
>  	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, i915_gem_context_create_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_get_reset_stats_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  };
>  
>  int i915_max_ioctl = DRM_ARRAY_SIZE(i915_ioctls);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 9fd716d..8870804 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2369,6 +2369,8 @@ extern int intel_enable_rc6(const struct drm_device *dev);
>  extern bool i915_semaphore_is_enabled(struct drm_device *dev);
>  int i915_reg_read_ioctl(struct drm_device *dev, void *data,
>  			struct drm_file *file);
> +int i915_get_reset_stats_ioctl(struct drm_device *dev, void *data,
> +			       struct drm_file *file);
>  
>  /* overlay */
>  extern struct intel_overlay_error_state *intel_overlay_capture_error_state(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index f6fae35..21cf951 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -633,6 +633,40 @@ int i915_reg_read_ioctl(struct drm_device *dev,
>  	return 0;
>  }
>  
> +int i915_get_reset_stats_ioctl(struct drm_device *dev,
> +			       void *data, struct drm_file *file)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_reset_stats *args = data;
> +	struct i915_ctx_hang_stats *hs;
> +	int ret;
> +
> +	if (args->ctx_id == DEFAULT_CONTEXT_ID && !capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	ret = mutex_lock_interruptible(&dev->struct_mutex);
> +	if (ret)
> +		return ret;
> +
> +	hs = i915_gem_context_get_hang_stats(dev, file, args->ctx_id);
> +	if (IS_ERR(hs)) {
> +		mutex_unlock(&dev->struct_mutex);
> +		return PTR_ERR(hs);
> +	}
> +
> +	if (capable(CAP_SYS_ADMIN))
> +		args->reset_count = i915_reset_count(&dev_priv->gpu_error);
> +	else
> +		args->reset_count = 0;
> +
> +	args->batch_active = hs->batch_active;
> +	args->batch_pending = hs->batch_pending;
> +
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	return 0;
> +}
> +
>  static int i965_reset_complete(struct drm_device *dev)
>  {
>  	u8 gdrst;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 3a4e97b..52aed89 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -222,6 +222,7 @@ typedef struct _drm_i915_sarea {
>  #define DRM_I915_GEM_SET_CACHING	0x2f
>  #define DRM_I915_GEM_GET_CACHING	0x30
>  #define DRM_I915_REG_READ		0x31
> +#define DRM_I915_GET_RESET_STATS	0x32
>  
>  #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
>  #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
> @@ -271,6 +272,7 @@ typedef struct _drm_i915_sarea {
>  #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create)
>  #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
>  #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
> +#define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
>  
>  /* Allow drivers to submit batchbuffers directly to hardware, relying
>   * on the security mechanisms provided by hardware.
> @@ -1030,4 +1032,21 @@ struct drm_i915_reg_read {
>  	__u64 offset;
>  	__u64 val; /* Return value */
>  };
> +
> +struct drm_i915_reset_stats {
> +	__u32 ctx_id;
> +	__u32 flags;
> +
> +	/* All resets since boot/module reload, for all contexts */
> +	__u32 reset_count;
> +
> +	/* Number of batches lost when active in GPU, for this context */
> +	__u32 batch_active;
> +
> +	/* Number of batches lost pending for execution, for this context */
> +	__u32 batch_pending;
> +
> +	__u32 pad;
> +};
> +
>  #endif /* _UAPI_I915_DRM_H_ */
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] drm/i915: add i915_get_reset_stats_ioctl
  2013-11-08 18:11   ` Damien Lespiau
@ 2013-11-12  0:01     ` Ian Romanick
  2013-11-12 13:17       ` Daniel Vetter
  0 siblings, 1 reply; 11+ messages in thread
From: Ian Romanick @ 2013-11-12  0:01 UTC (permalink / raw)
  To: Damien Lespiau; +Cc: intel-gfx, miku, Daniel Vetter

On 11/08/2013 10:11 AM, Damien Lespiau wrote:
> On Wed, Oct 30, 2013 at 03:44:16PM +0200, Mika Kuoppala wrote:
>> This ioctl returns reset stats for specified context.
>>
>> The struct returned contains context loss counters.
>>
>> reset_count:    all resets across all contexts
>> batch_active:   active batches lost on resets
>> batch_pending:  pending batches lost on resets
>>
>> v2: get rid of state tracking completely and deliver only counts. Idea
>>     from Chris Wilson.
>>
>> v3: fix commit message
>>
>> v4: default context handled inside i915_gem_context_get_hang_stats
>>
>> v5: reset_count only for priviledged process
>>
>> v6: ctx=0 needs CAP_SYS_ADMIN for batch_* counters (Chris Wilson)
>>
>> v7: context hang stats never returns NULL
>>
>> v8: rebased on top of reworked context hang stats
>>     DRM_RENDER_ALLOW for ioctl
>>
>> v9: use DEFAULT_CONTEXT_ID. Improve comments for ioctl struct members
>>
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> Cc: Ian Romanick <idr@freedesktop.org>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> The logic there looks good to me, I was just wondering if we wanted a
> bit more padding in case we want other counters or misc stuff.

I think the current padding should be sufficient.  The other things that
could occur aren't things that we should return counts of.  For those
kinds of things, we'd only want to return flags.  In fact, the current
user space only uses the counts as flags anyway (is the count non-zero?).

> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

Also,

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] drm/i915: add i915_reset_count
  2013-11-08 17:40 ` [PATCH 1/2] drm/i915: add i915_reset_count Damien Lespiau
@ 2013-11-12 12:44   ` Mika Kuoppala
  0 siblings, 0 replies; 11+ messages in thread
From: Mika Kuoppala @ 2013-11-12 12:44 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku

reset_counter will be incremented twice per successful
reset. Odd values mean reset is in progress and even values
mean that reset has completed.

Reset status ioctl introduced in following commit
needs to deliver global reset count to userspace so
use reset_counter to derive the actual reset count
for the gpu

Note that reset in progress is enough to increment
the counter.

v2: wedged equals reset in progress (Daniel Vetter)

v3: Fixed stale comments (Damien Lespiau)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |   43 ++++++++++++++++++++-------------------
 drivers/gpu/drm/i915/i915_irq.c |    2 +-
 2 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c546316..4c0f751 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1069,34 +1069,30 @@ struct i915_gpu_error {
 	unsigned long missed_irq_rings;
 
 	/**
-	 * State variable and reset counter controlling the reset flow
+	 * State variable controlling the reset flow and count
 	 *
-	 * Upper bits are for the reset counter.  This counter is used by the
-	 * wait_seqno code to race-free noticed that a reset event happened and
-	 * that it needs to restart the entire ioctl (since most likely the
-	 * seqno it waited for won't ever signal anytime soon).
+	 * This is a counter which gets incremented when reset is triggered,
+	 * and again when reset has been handled. So odd values (lowest bit set)
+	 * means that reset is in progress and even values that
+	 * (reset_counter >> 1):th reset was successfully completed.
+	 *
+	 * If reset is not completed succesfully, the I915_WEDGE bit is
+	 * set meaning that hardware is terminally sour and there is no
+	 * recovery. All waiters on the reset_queue will be woken when
+	 * that happens.
+	 *
+	 * This counter is used by the wait_seqno code to notice that reset
+	 * event happened and it needs to restart the entire ioctl (since most
+	 * likely the seqno it waited for won't ever signal anytime soon).
 	 *
 	 * This is important for lock-free wait paths, where no contended lock
 	 * naturally enforces the correct ordering between the bail-out of the
 	 * waiter and the gpu reset work code.
-	 *
-	 * Lowest bit controls the reset state machine: Set means a reset is in
-	 * progress. This state will (presuming we don't have any bugs) decay
-	 * into either unset (successful reset) or the special WEDGED value (hw
-	 * terminally sour). All waiters on the reset_queue will be woken when
-	 * that happens.
 	 */
 	atomic_t reset_counter;
 
-	/**
-	 * Special values/flags for reset_counter
-	 *
-	 * Note that the code relies on
-	 * 	I915_WEDGED & I915_RESET_IN_PROGRESS_FLAG
-	 * being true.
-	 */
 #define I915_RESET_IN_PROGRESS_FLAG	1
-#define I915_WEDGED			0xffffffff
+#define I915_WEDGED			(1 << 31)
 
 	/**
 	 * Waitqueue to signal when the reset has completed. Used by clients
@@ -2046,12 +2042,17 @@ int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
 static inline bool i915_reset_in_progress(struct i915_gpu_error *error)
 {
 	return unlikely(atomic_read(&error->reset_counter)
-			& I915_RESET_IN_PROGRESS_FLAG);
+			& (I915_RESET_IN_PROGRESS_FLAG | I915_WEDGED));
 }
 
 static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
 {
-	return atomic_read(&error->reset_counter) == I915_WEDGED;
+	return atomic_read(&error->reset_counter) & I915_WEDGED;
+}
+
+static inline u32 i915_reset_count(struct i915_gpu_error *error)
+{
+	return ((atomic_read(&error->reset_counter) & ~I915_WEDGED) + 1) / 2;
 }
 
 void i915_gem_reset(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index e14285b..19949e8 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1781,7 +1781,7 @@ static void i915_error_work_func(struct work_struct *work)
 			kobject_uevent_env(&dev->primary->kdev.kobj,
 					   KOBJ_CHANGE, reset_done_event);
 		} else {
-			atomic_set(&error->reset_counter, I915_WEDGED);
+			atomic_set_mask(I915_WEDGED, &error->reset_counter);
 		}
 
 		/*
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] drm/i915: add i915_get_reset_stats_ioctl
  2013-11-12  0:01     ` Ian Romanick
@ 2013-11-12 13:17       ` Daniel Vetter
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Vetter @ 2013-11-12 13:17 UTC (permalink / raw)
  To: Ian Romanick; +Cc: miku, intel-gfx, Daniel Vetter

On Mon, Nov 11, 2013 at 04:01:05PM -0800, Ian Romanick wrote:
> On 11/08/2013 10:11 AM, Damien Lespiau wrote:
> > On Wed, Oct 30, 2013 at 03:44:16PM +0200, Mika Kuoppala wrote:
> >> This ioctl returns reset stats for specified context.
> >>
> >> The struct returned contains context loss counters.
> >>
> >> reset_count:    all resets across all contexts
> >> batch_active:   active batches lost on resets
> >> batch_pending:  pending batches lost on resets
> >>
> >> v2: get rid of state tracking completely and deliver only counts. Idea
> >>     from Chris Wilson.
> >>
> >> v3: fix commit message
> >>
> >> v4: default context handled inside i915_gem_context_get_hang_stats
> >>
> >> v5: reset_count only for priviledged process
> >>
> >> v6: ctx=0 needs CAP_SYS_ADMIN for batch_* counters (Chris Wilson)
> >>
> >> v7: context hang stats never returns NULL
> >>
> >> v8: rebased on top of reworked context hang stats
> >>     DRM_RENDER_ALLOW for ioctl
> >>
> >> v9: use DEFAULT_CONTEXT_ID. Improve comments for ioctl struct members
> >>
> >> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> >> Cc: Ian Romanick <idr@freedesktop.org>
> >> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > 
> > The logic there looks good to me, I was just wondering if we wanted a
> > bit more padding in case we want other counters or misc stuff.
> 
> I think the current padding should be sufficient.  The other things that
> could occur aren't things that we should return counts of.  For those
> kinds of things, we'd only want to return flags.  In fact, the current
> user space only uses the counts as flags anyway (is the count non-zero?).
> 
> > Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
> 
> Also,
> 
> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

I've just pulled in this and Mika's updated prep patch into my queue. I'll
freeze it down into drm-intel-next at the end of this week. Thanks for the
patch and review.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] drm/i915: add i915_get_reset_stats_ioctl
  2013-10-30 13:44 ` [PATCH 2/2] drm/i915: add i915_get_reset_stats_ioctl Mika Kuoppala
  2013-10-30 17:37   ` Ian Romanick
  2013-11-08 18:11   ` Damien Lespiau
@ 2013-11-12 14:19   ` Daniel Vetter
  2013-11-12 17:49     ` [PATCH] drm/i915: check i915_get_reset_stats_ioctl args Mika Kuoppala
  2 siblings, 1 reply; 11+ messages in thread
From: Daniel Vetter @ 2013-11-12 14:19 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: Daniel Vetter, intel-gfx, miku

On Wed, Oct 30, 2013 at 03:44:16PM +0200, Mika Kuoppala wrote:
> This ioctl returns reset stats for specified context.
> 
> The struct returned contains context loss counters.
> 
> reset_count:    all resets across all contexts
> batch_active:   active batches lost on resets
> batch_pending:  pending batches lost on resets
> 
> v2: get rid of state tracking completely and deliver only counts. Idea
>     from Chris Wilson.
> 
> v3: fix commit message
> 
> v4: default context handled inside i915_gem_context_get_hang_stats
> 
> v5: reset_count only for priviledged process
> 
> v6: ctx=0 needs CAP_SYS_ADMIN for batch_* counters (Chris Wilson)
> 
> v7: context hang stats never returns NULL
> 
> v8: rebased on top of reworked context hang stats
>     DRM_RENDER_ALLOW for ioctl
> 
> v9: use DEFAULT_CONTEXT_ID. Improve comments for ioctl struct members
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Ian Romanick <idr@freedesktop.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_dma.c     |    1 +
>  drivers/gpu/drm/i915/i915_drv.h     |    2 ++
>  drivers/gpu/drm/i915/intel_uncore.c |   34 ++++++++++++++++++++++++++++++++++
>  include/uapi/drm/i915_drm.h         |   19 +++++++++++++++++++
>  4 files changed, 56 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 6eecce7..f2cdeb2 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1921,6 +1921,7 @@ const struct drm_ioctl_desc i915_ioctls[] = {
>  	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, i915_gem_context_create_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_get_reset_stats_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  };
>  
>  int i915_max_ioctl = DRM_ARRAY_SIZE(i915_ioctls);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 9fd716d..8870804 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2369,6 +2369,8 @@ extern int intel_enable_rc6(const struct drm_device *dev);
>  extern bool i915_semaphore_is_enabled(struct drm_device *dev);
>  int i915_reg_read_ioctl(struct drm_device *dev, void *data,
>  			struct drm_file *file);
> +int i915_get_reset_stats_ioctl(struct drm_device *dev, void *data,
> +			       struct drm_file *file);
>  
>  /* overlay */
>  extern struct intel_overlay_error_state *intel_overlay_capture_error_state(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index f6fae35..21cf951 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -633,6 +633,40 @@ int i915_reg_read_ioctl(struct drm_device *dev,
>  	return 0;
>  }
>  
> +int i915_get_reset_stats_ioctl(struct drm_device *dev,
> +			       void *data, struct drm_file *file)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_reset_stats *args = data;
> +	struct i915_ctx_hang_stats *hs;
> +	int ret;
> +
> +	if (args->ctx_id == DEFAULT_CONTEXT_ID && !capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	ret = mutex_lock_interruptible(&dev->struct_mutex);
> +	if (ret)
> +		return ret;
> +
> +	hs = i915_gem_context_get_hang_stats(dev, file, args->ctx_id);
> +	if (IS_ERR(hs)) {
> +		mutex_unlock(&dev->struct_mutex);
> +		return PTR_ERR(hs);
> +	}
> +
> +	if (capable(CAP_SYS_ADMIN))
> +		args->reset_count = i915_reset_count(&dev_priv->gpu_error);
> +	else
> +		args->reset_count = 0;
> +
> +	args->batch_active = hs->batch_active;
> +	args->batch_pending = hs->batch_pending;
> +
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	return 0;
> +}
> +
>  static int i965_reset_complete(struct drm_device *dev)
>  {
>  	u8 gdrst;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 3a4e97b..52aed89 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -222,6 +222,7 @@ typedef struct _drm_i915_sarea {
>  #define DRM_I915_GEM_SET_CACHING	0x2f
>  #define DRM_I915_GEM_GET_CACHING	0x30
>  #define DRM_I915_REG_READ		0x31
> +#define DRM_I915_GET_RESET_STATS	0x32
>  
>  #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
>  #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
> @@ -271,6 +272,7 @@ typedef struct _drm_i915_sarea {
>  #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create)
>  #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
>  #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
> +#define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
>  
>  /* Allow drivers to submit batchbuffers directly to hardware, relying
>   * on the security mechanisms provided by hardware.
> @@ -1030,4 +1032,21 @@ struct drm_i915_reg_read {
>  	__u64 offset;
>  	__u64 val; /* Return value */
>  };
> +
> +struct drm_i915_reset_stats {
> +	__u32 ctx_id;
> +	__u32 flags;
> +
> +	/* All resets since boot/module reload, for all contexts */
> +	__u32 reset_count;
> +
> +	/* Number of batches lost when active in GPU, for this context */
> +	__u32 batch_active;
> +
> +	/* Number of batches lost pending for execution, for this context */
> +	__u32 batch_pending;
> +
> +	__u32 pad;

We need to check that userspace doesn't submit stack garbage in this
field, otherwise we can't ever use it to extend it later on. Same for the
flags field. Can you please submit the follow-up patch and a extension for
your testcase to check for this (i.e. submit 0xffffffff in flags/pad and
check that we get an EINVAL).

Thanks, Daniel

> +};
> +
>  #endif /* _UAPI_I915_DRM_H_ */
> -- 
> 1.7.9.5
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] drm/i915: check i915_get_reset_stats_ioctl args
  2013-11-12 14:19   ` Daniel Vetter
@ 2013-11-12 17:49     ` Mika Kuoppala
  2013-11-13 10:52       ` Daniel Vetter
  0 siblings, 1 reply; 11+ messages in thread
From: Mika Kuoppala @ 2013-11-12 17:49 UTC (permalink / raw)
  To: intel-gfx

Insist that flags and pad fields are zero, so that
we can safely extend the interface in future.

Testcase: igt/gem_reset_stats/params

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_uncore.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 21cf951..a881906 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -641,6 +641,9 @@ int i915_get_reset_stats_ioctl(struct drm_device *dev,
 	struct i915_ctx_hang_stats *hs;
 	int ret;
 
+	if (args->flags || args->pad)
+		return -EINVAL;
+
 	if (args->ctx_id == DEFAULT_CONTEXT_ID && !capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/i915: check i915_get_reset_stats_ioctl args
  2013-11-12 17:49     ` [PATCH] drm/i915: check i915_get_reset_stats_ioctl args Mika Kuoppala
@ 2013-11-13 10:52       ` Daniel Vetter
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Vetter @ 2013-11-13 10:52 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Tue, Nov 12, 2013 at 07:49:35PM +0200, Mika Kuoppala wrote:
> Insist that flags and pad fields are zero, so that
> we can safely extend the interface in future.
> 
> Testcase: igt/gem_reset_stats/params
> 
> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-11-13 10:51 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-30 13:44 [PATCH 1/2] drm/i915: add i915_reset_count Mika Kuoppala
2013-10-30 13:44 ` [PATCH 2/2] drm/i915: add i915_get_reset_stats_ioctl Mika Kuoppala
2013-10-30 17:37   ` Ian Romanick
2013-11-08 18:11   ` Damien Lespiau
2013-11-12  0:01     ` Ian Romanick
2013-11-12 13:17       ` Daniel Vetter
2013-11-12 14:19   ` Daniel Vetter
2013-11-12 17:49     ` [PATCH] drm/i915: check i915_get_reset_stats_ioctl args Mika Kuoppala
2013-11-13 10:52       ` Daniel Vetter
2013-11-08 17:40 ` [PATCH 1/2] drm/i915: add i915_reset_count Damien Lespiau
2013-11-12 12:44   ` Mika Kuoppala

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.