All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Make i915 events part of uapi
@ 2013-07-16 23:41 Ben Widawsky
  2013-07-17  5:07 ` Daniel Vetter
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Ben Widawsky @ 2013-07-16 23:41 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky

Make the uevent strings part of the user API for people who wish to
write their own listeners.

CC: Chad Versace <chad.versace@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_irq.c | 8 ++++----
 include/uapi/drm/i915_drm.h     | 3 +++
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 9910699..60fa513 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -812,7 +812,7 @@ static void ivybridge_parity_work(struct work_struct *work)
 
 	mutex_unlock(&dev_priv->dev->struct_mutex);
 
-	parity_event[0] = "L3_PARITY_ERROR=1";
+	parity_event[0] = I915_L3_PARITY_EVENT"=1";
 	parity_event[1] = kasprintf(GFP_KERNEL, "ROW=%d", row);
 	parity_event[2] = kasprintf(GFP_KERNEL, "BANK=%d", bank);
 	parity_event[3] = kasprintf(GFP_KERNEL, "SUBBANK=%d", subbank);
@@ -1435,9 +1435,9 @@ static void i915_error_work_func(struct work_struct *work)
 						    gpu_error);
 	struct drm_device *dev = dev_priv->dev;
 	struct intel_ring_buffer *ring;
-	char *error_event[] = { "ERROR=1", NULL };
-	char *reset_event[] = { "RESET=1", NULL };
-	char *reset_done_event[] = { "ERROR=0", NULL };
+	char *error_event[] = { I915_ERROR_EVENT"=1", NULL };
+	char *reset_event[] = { I915_RESET_EVENT"=1", NULL };
+	char *reset_done_event[] = { I915_ERROR_EVENT"=0", NULL };
 	int i, ret;
 
 	kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, error_event);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 923ed7f..25a3e74 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -33,6 +33,9 @@
  * subject to backwards-compatibility constraints.
  */
 
+#define I915_L3_PARITY_EVENT "L3_PARITY_ERROR"
+#define I915_ERROR_EVENT "ERROR"
+#define I915_RESET_EVENT "RESET"
 
 /* Each region is a minimum of 16k, and there are at most 255 of them.
  */
-- 
1.8.3.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/i915: Make i915 events part of uapi
  2013-07-16 23:41 [PATCH] drm/i915: Make i915 events part of uapi Ben Widawsky
@ 2013-07-17  5:07 ` Daniel Vetter
  2013-07-17 21:12   ` Chad Versace
  2013-07-19  5:42 ` [PATCH 1/3] drm/i915: Move error uevent to detection time Ben Widawsky
  2013-07-19 16:16 ` [PATCH] [v3] drm/i915: Make i915 events part of uapi Ben Widawsky
  2 siblings, 1 reply; 11+ messages in thread
From: Daniel Vetter @ 2013-07-17  5:07 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Tue, Jul 16, 2013 at 04:41:13PM -0700, Ben Widawsky wrote:
> Make the uevent strings part of the user API for people who wish to
> write their own listeners.
> 
> CC: Chad Versace <chad.versace@linux.intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

One thing I've toyed around with a bit is that we should add kerneldoc to
our uapi headers and create a DocBook out of it (maybe as a subsection in
the drm userspace api chapter). I guess the DocBook integration needs an
overall approach, but we should start to add comments to each piece of
userspace api to clearly spec them. See below for what I have in mind ...
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 8 ++++----
>  include/uapi/drm/i915_drm.h     | 3 +++
>  2 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 9910699..60fa513 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -812,7 +812,7 @@ static void ivybridge_parity_work(struct work_struct *work)
>  
>  	mutex_unlock(&dev_priv->dev->struct_mutex);
>  
> -	parity_event[0] = "L3_PARITY_ERROR=1";
> +	parity_event[0] = I915_L3_PARITY_EVENT"=1";
>  	parity_event[1] = kasprintf(GFP_KERNEL, "ROW=%d", row);
>  	parity_event[2] = kasprintf(GFP_KERNEL, "BANK=%d", bank);
>  	parity_event[3] = kasprintf(GFP_KERNEL, "SUBBANK=%d", subbank);
> @@ -1435,9 +1435,9 @@ static void i915_error_work_func(struct work_struct *work)
>  						    gpu_error);
>  	struct drm_device *dev = dev_priv->dev;
>  	struct intel_ring_buffer *ring;
> -	char *error_event[] = { "ERROR=1", NULL };
> -	char *reset_event[] = { "RESET=1", NULL };
> -	char *reset_done_event[] = { "ERROR=0", NULL };
> +	char *error_event[] = { I915_ERROR_EVENT"=1", NULL };
> +	char *reset_event[] = { I915_RESET_EVENT"=1", NULL };
> +	char *reset_done_event[] = { I915_ERROR_EVENT"=0", NULL };
>  	int i, ret;
>  
>  	kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, error_event);
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 923ed7f..25a3e74 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -33,6 +33,9 @@
>   * subject to backwards-compatibility constraints.
>   */
>  

/**
 * DOC uevents generated by i915 on it's device node
 *
 * These are events generated in addition to the standardized uevents
 * genereated by the drm core (like signalling connector state changes).
 */

/**
 * I915_L3_PARITY_EVENT - l3 parity event
 *
 * Generated when the driver receives a parity mismatch event from the gpu
 * l3 cache. Additional information supplied is ROW, BANK, SUBBANK of the
 * affected cacheline. Userspace should keep track of these events and if
 * a specific cache-line seems to have a persistent error remap it with
 * the l3 remapping tool supplied in intel-gpu-tools
 *
 * The value supplied with the event is always 1.
 */
> +#define I915_L3_PARITY_EVENT "L3_PARITY_ERROR"
> +#define I915_ERROR_EVENT "ERROR"
> +#define I915_RESET_EVENT "RESET"

I'll let you cook up something nice for the other two ;-)

Cheers, Daniel
>  
>  /* Each region is a minimum of 16k, and there are at most 255 of them.
>   */
> -- 
> 1.8.3.3
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/i915: Make i915 events part of uapi
  2013-07-17  5:07 ` Daniel Vetter
@ 2013-07-17 21:12   ` Chad Versace
  0 siblings, 0 replies; 11+ messages in thread
From: Chad Versace @ 2013-07-17 21:12 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Ben Widawsky, Intel GFX

On 07/16/2013 10:07 PM, Daniel Vetter wrote:
> On Tue, Jul 16, 2013 at 04:41:13PM -0700, Ben Widawsky wrote:
>> Make the uevent strings part of the user API for people who wish to
>> write their own listeners.
>>
>> CC: Chad Versace <chad.versace@linux.intel.com>
>> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>
> One thing I've toyed around with a bit is that we should add kerneldoc to
> our uapi headers and create a DocBook out of it (maybe as a subsection in
> the drm userspace api chapter). I guess the DocBook integration needs an
> overall approach, but we should start to add comments to each piece of
> userspace api to clearly spec them. See below for what I have in mind ...

Yes, docs please. I don't have the kernel-fu of a kernel-dev, so without
docs I don't know what these events mean and what to expect from them.

>> -	parity_event[0] = "L3_PARITY_ERROR=1";
>> +	parity_event[0] = I915_L3_PARITY_EVENT"=1";

Small nitpick. I usually find string concatenation more readable like this,
with a space:
     parity_event[0] = I915_L3_PARITY_EVENT "=1";
But, that's just my preference, so feel free to ignore me.

>> +#define I915_L3_PARITY_EVENT "L3_PARITY_ERROR"
>> +#define I915_ERROR_EVENT "ERROR"
>> +#define I915_RESET_EVENT "RESET"

Maybe this is a dumb question... since these are uevents, do you think the names
would be improved by given them a "UEVENT" suffix? Like I915_ERROR_UEVENT? Or is
that dumb, because these tokens are intended to serve more purposes than uevents?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] drm/i915: Move error uevent to detection time
  2013-07-16 23:41 [PATCH] drm/i915: Make i915 events part of uapi Ben Widawsky
  2013-07-17  5:07 ` Daniel Vetter
@ 2013-07-19  5:42 ` Ben Widawsky
  2013-07-19  5:42   ` [PATCH 2/3] drm/i915: Change uevent for reset completion Ben Widawsky
                     ` (2 more replies)
  2013-07-19 16:16 ` [PATCH] [v3] drm/i915: Make i915 events part of uapi Ben Widawsky
  2 siblings, 3 replies; 11+ messages in thread
From: Ben Widawsky @ 2013-07-19  5:42 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky

We've recently deferred error handling with a workqueue for a number of
reasons. However, when we're trying to pass the information to
userspace, it's likely more interesting to know when the error was
detected by the kernel, and not when we eventually get around to
handling it in the workqueue.

This was inspired as a result of the patch to move these events to the
uapi.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_irq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 9910699..9fe430a 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1435,13 +1435,10 @@ static void i915_error_work_func(struct work_struct *work)
 						    gpu_error);
 	struct drm_device *dev = dev_priv->dev;
 	struct intel_ring_buffer *ring;
-	char *error_event[] = { "ERROR=1", NULL };
 	char *reset_event[] = { "RESET=1", NULL };
 	char *reset_done_event[] = { "ERROR=0", NULL };
 	int i, ret;
 
-	kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, error_event);
-
 	/*
 	 * Note that there's only one work item which does gpu resets, so we
 	 * need not worry about concurrent gpu resets potentially incrementing
@@ -1594,11 +1591,14 @@ void i915_handle_error(struct drm_device *dev, bool wedged)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *ring;
+	char *error_event[] = { "ERROR=1", NULL };
 	int i;
 
 	i915_capture_error_state(dev);
 	i915_report_and_clear_eir(dev);
 
+	kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, error_event);
+
 	if (wedged) {
 		atomic_set_mask(I915_RESET_IN_PROGRESS_FLAG,
 				&dev_priv->gpu_error.reset_counter);
-- 
1.8.3.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] drm/i915: Change uevent for reset completion
  2013-07-19  5:42 ` [PATCH 1/3] drm/i915: Move error uevent to detection time Ben Widawsky
@ 2013-07-19  5:42   ` Ben Widawsky
  2013-07-19  6:57     ` Daniel Vetter
  2013-07-19  5:42   ` [PATCH 3/3] [v2] drm/i915: Make i915 events part of uapi Ben Widawsky
  2013-07-19  8:45   ` [PATCH 1/3] drm/i915: Move error uevent to detection time Chris Wilson
  2 siblings, 1 reply; 11+ messages in thread
From: Ben Widawsky @ 2013-07-19  5:42 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky

Prior to this patch, we used ERROR=0 as a way of signifying the reset
was complete. The functionality dates back quite a ways to:

commit f316a42cc49eca73b33d85feb6177e32431747ff
Author: Ben Gamari <bgamari.foss@gmail.com>
Date:   Mon Sep 14 17:48:46 2009 -0400

    drm/i915: Hookup chip reset in error handler

I'm not really sure what the motivation for this was originally, but to
me it makes more sense to have a distinct event for error detection, and
another event for reset start/finish (since reset is prone to failure).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_irq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 9fe430a..03071d7 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1436,7 +1436,7 @@ static void i915_error_work_func(struct work_struct *work)
 	struct drm_device *dev = dev_priv->dev;
 	struct intel_ring_buffer *ring;
 	char *reset_event[] = { "RESET=1", NULL };
-	char *reset_done_event[] = { "ERROR=0", NULL };
+	char *reset_done_event[] = { "RESET=0", NULL };
 	int i, ret;
 
 	/*
-- 
1.8.3.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] [v2] drm/i915: Make i915 events part of uapi
  2013-07-19  5:42 ` [PATCH 1/3] drm/i915: Move error uevent to detection time Ben Widawsky
  2013-07-19  5:42   ` [PATCH 2/3] drm/i915: Change uevent for reset completion Ben Widawsky
@ 2013-07-19  5:42   ` Ben Widawsky
  2013-07-19  8:45   ` [PATCH 1/3] drm/i915: Move error uevent to detection time Chris Wilson
  2 siblings, 0 replies; 11+ messages in thread
From: Ben Widawsky @ 2013-07-19  5:42 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky

Make the uevent strings part of the user API for people who wish to
write their own listeners.

v2: Make a space in the string concatenation. (Chad)
Use the "UEVENT" suffix intead of "EVENT" (Chad)
Make kernel-doc (Daniel)

Thanks to Daniel Vetter for a majority of the parity error comments.

CC: Chad Versace <chad.versace@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_irq.c |  8 ++++----
 include/uapi/drm/i915_drm.h     | 22 ++++++++++++++++++++++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 03071d7..3e19ce0 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -812,7 +812,7 @@ static void ivybridge_parity_work(struct work_struct *work)
 
 	mutex_unlock(&dev_priv->dev->struct_mutex);
 
-	parity_event[0] = "L3_PARITY_ERROR=1";
+	parity_event[0] = I915_L3_PARITY_UEVENT "=1";
 	parity_event[1] = kasprintf(GFP_KERNEL, "ROW=%d", row);
 	parity_event[2] = kasprintf(GFP_KERNEL, "BANK=%d", bank);
 	parity_event[3] = kasprintf(GFP_KERNEL, "SUBBANK=%d", subbank);
@@ -1435,8 +1435,8 @@ static void i915_error_work_func(struct work_struct *work)
 						    gpu_error);
 	struct drm_device *dev = dev_priv->dev;
 	struct intel_ring_buffer *ring;
-	char *reset_event[] = { "RESET=1", NULL };
-	char *reset_done_event[] = { "RESET=0", NULL };
+	char *reset_event[] = { I915_RESET_UEVENT "=1", NULL };
+	char *reset_done_event[] = { I915_RESET_UEVENT "=0", NULL };
 	int i, ret;
 
 	/*
@@ -1591,7 +1591,7 @@ void i915_handle_error(struct drm_device *dev, bool wedged)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *ring;
-	char *error_event[] = { "ERROR=1", NULL };
+	char *error_event[] = { I915_ERROR_UEVENT "=1", NULL };
 	int i;
 
 	i915_capture_error_state(dev);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 923ed7f..f5fe31b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -33,6 +33,28 @@
  * subject to backwards-compatibility constraints.
  */
 
+/**
+ * DOC: uevents generated by i915 on it's device node
+ *
+ * I915_L3_PARITY_UEVENT - Generated when the driver receives a parity mismatch
+ *	event from the gpu l3 cache. Additional information supplied is ROW,
+ *	BANK, SUBBANK of the affected cacheline. Userspace should keep track of
+ *	these events and if a specific cache-line seems to have a persistent
+ *	error remap it with the l3 remapping tool supplied in intel-gpu-tools.
+ *	The value supplied with the event is always 1.
+ *
+ * I915_ERROR_UEVENT - Generated upon error detection, currently only via
+ *	hangcheck. The error detection event is a good indicator of when things
+ *	began to go badly. The value supplied with the event is always 1.
+ *
+ * I915_RESET_UEVENT - Event is generated just before an attempt to reset the
+ *	the GPU. Reset has historically been a precarious attempt which the GPU
+ *	may or may not recover from. The value supplied with the event is 1
+ *	before the reset attempt, and a 0 after it has completed.
+ */
+#define I915_L3_PARITY_UEVENT "L3_PARITY_ERROR"
+#define I915_ERROR_UEVENT "ERROR"
+#define I915_RESET_UEVENT "RESET"
 
 /* Each region is a minimum of 16k, and there are at most 255 of them.
  */
-- 
1.8.3.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] drm/i915: Change uevent for reset completion
  2013-07-19  5:42   ` [PATCH 2/3] drm/i915: Change uevent for reset completion Ben Widawsky
@ 2013-07-19  6:57     ` Daniel Vetter
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Vetter @ 2013-07-19  6:57 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Thu, Jul 18, 2013 at 10:42:49PM -0700, Ben Widawsky wrote:
> Prior to this patch, we used ERROR=0 as a way of signifying the reset
> was complete. The functionality dates back quite a ways to:
> 
> commit f316a42cc49eca73b33d85feb6177e32431747ff
> Author: Ben Gamari <bgamari.foss@gmail.com>
> Date:   Mon Sep 14 17:48:46 2009 -0400
> 
>     drm/i915: Hookup chip reset in error handler
> 
> I'm not really sure what the motivation for this was originally, but to
> me it makes more sense to have a distinct event for error detection, and
> another event for reset start/finish (since reset is prone to failure).
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 9fe430a..03071d7 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1436,7 +1436,7 @@ static void i915_error_work_func(struct work_struct *work)
>  	struct drm_device *dev = dev_priv->dev;
>  	struct intel_ring_buffer *ring;
>  	char *reset_event[] = { "RESET=1", NULL };
> -	char *reset_done_event[] = { "ERROR=0", NULL };
> +	char *reset_done_event[] = { "RESET=0", NULL };

It's a kernel api change so we need to check all distros and everything
else out there whether they don't use it. Imo not worth the effort since
signalling that the ERROR condition has passed is somewhat sensible, too.
Can you please respin with this part dropped?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] drm/i915: Move error uevent to detection time
  2013-07-19  5:42 ` [PATCH 1/3] drm/i915: Move error uevent to detection time Ben Widawsky
  2013-07-19  5:42   ` [PATCH 2/3] drm/i915: Change uevent for reset completion Ben Widawsky
  2013-07-19  5:42   ` [PATCH 3/3] [v2] drm/i915: Make i915 events part of uapi Ben Widawsky
@ 2013-07-19  8:45   ` Chris Wilson
  2 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2013-07-19  8:45 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Thu, Jul 18, 2013 at 10:42:48PM -0700, Ben Widawsky wrote:
> We've recently deferred error handling with a workqueue for a number of
> reasons. However, when we're trying to pass the information to
> userspace, it's likely more interesting to know when the error was
> detected by the kernel, and not when we eventually get around to
> handling it in the workqueue.
> 
> This was inspired as a result of the patch to move these events to the
> uapi.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 9910699..9fe430a 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1435,13 +1435,10 @@ static void i915_error_work_func(struct work_struct *work)
>  						    gpu_error);
>  	struct drm_device *dev = dev_priv->dev;
>  	struct intel_ring_buffer *ring;
> -	char *error_event[] = { "ERROR=1", NULL };
>  	char *reset_event[] = { "RESET=1", NULL };
>  	char *reset_done_event[] = { "ERROR=0", NULL };
>  	int i, ret;
>  
> -	kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, error_event);
> -
>  	/*
>  	 * Note that there's only one work item which does gpu resets, so we
>  	 * need not worry about concurrent gpu resets potentially incrementing
> @@ -1594,11 +1591,14 @@ void i915_handle_error(struct drm_device *dev, bool wedged)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct intel_ring_buffer *ring;
> +	char *error_event[] = { "ERROR=1", NULL };
>  	int i;
>  
>  	i915_capture_error_state(dev);
>  	i915_report_and_clear_eir(dev);
>  
> +	kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, error_event);
> +
>  	if (wedged) {
>  		atomic_set_mask(I915_RESET_IN_PROGRESS_FLAG,
>  				&dev_priv->gpu_error.reset_counter);

This leaves the reset counter unchanged prior to the uevent, if you move
the uevent signalling down to just before the queue_work() there would
be no userspace visible changes (other than timing, which is a moot
point).

However, kobject_uevent_env() is not suitable for calling from
irq-context.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] [v3] drm/i915: Make i915 events part of uapi
  2013-07-16 23:41 [PATCH] drm/i915: Make i915 events part of uapi Ben Widawsky
  2013-07-17  5:07 ` Daniel Vetter
  2013-07-19  5:42 ` [PATCH 1/3] drm/i915: Move error uevent to detection time Ben Widawsky
@ 2013-07-19 16:16 ` Ben Widawsky
  2013-07-19 16:27   ` Daniel Vetter
  2013-07-19 22:35   ` Chad Versace
  2 siblings, 2 replies; 11+ messages in thread
From: Ben Widawsky @ 2013-07-19 16:16 UTC (permalink / raw)
  To: Intel GFX; +Cc: Ben Widawsky

Make the uevent strings part of the user API for people who wish to
write their own listeners.

v2: Make a space in the string concatenation. (Chad)
Use the "UEVENT" suffix intead of "EVENT" (Chad)
Make kernel-doc parseable Docbook comments (Daniel)

v3: Undid reset change introduced in last submission (Daniel)
Fixed up comments to address removal changes.

Thanks to Daniel Vetter for a majority of the parity error comments.

CC: Chad Versace <chad.versace@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_irq.c |  8 ++++----
 include/uapi/drm/i915_drm.h     | 24 ++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 9910699..4e4fa5f 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -812,7 +812,7 @@ static void ivybridge_parity_work(struct work_struct *work)
 
 	mutex_unlock(&dev_priv->dev->struct_mutex);
 
-	parity_event[0] = "L3_PARITY_ERROR=1";
+	parity_event[0] = I915_L3_PARITY_UEVENT "=1";
 	parity_event[1] = kasprintf(GFP_KERNEL, "ROW=%d", row);
 	parity_event[2] = kasprintf(GFP_KERNEL, "BANK=%d", bank);
 	parity_event[3] = kasprintf(GFP_KERNEL, "SUBBANK=%d", subbank);
@@ -1435,9 +1435,9 @@ static void i915_error_work_func(struct work_struct *work)
 						    gpu_error);
 	struct drm_device *dev = dev_priv->dev;
 	struct intel_ring_buffer *ring;
-	char *error_event[] = { "ERROR=1", NULL };
-	char *reset_event[] = { "RESET=1", NULL };
-	char *reset_done_event[] = { "ERROR=0", NULL };
+	char *error_event[] = { I915_ERROR_UEVENT "=1", NULL };
+	char *reset_event[] = { I915_RESET_UEVENT "=1", NULL };
+	char *reset_done_event[] = { I915_ERROR_UEVENT "=0", NULL };
 	int i, ret;
 
 	kobject_uevent_env(&dev->primary->kdev.kobj, KOBJ_CHANGE, error_event);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 923ed7f..a1a7b6b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -33,6 +33,30 @@
  * subject to backwards-compatibility constraints.
  */
 
+/**
+ * DOC: uevents generated by i915 on it's device node
+ *
+ * I915_L3_PARITY_UEVENT - Generated when the driver receives a parity mismatch
+ *	event from the gpu l3 cache. Additional information supplied is ROW,
+ *	BANK, SUBBANK of the affected cacheline. Userspace should keep track of
+ *	these events and if a specific cache-line seems to have a persistent
+ *	error remap it with the l3 remapping tool supplied in intel-gpu-tools.
+ *	The value supplied with the event is always 1.
+ *
+ * I915_ERROR_UEVENT - Generated upon error detection, currently only via
+ *	hangcheck. The error detection event is a good indicator of when things
+ *	began to go badly. The value supplied with the event is a 1 upon error
+ *	detection, and a 0 upon reset completion, signifying no more error
+ *	exists. NOTE: Disabling hangcheck or reset via module parameter will
+ *	cause the related events to not be seen.
+ *
+ * I915_RESET_UEVENT - Event is generated just before an attempt to reset the
+ *	the GPU. The value supplied with the event is always 1. NOTE: Disable
+ *	reset via module parameter will cause this event to not be seen.
+ */
+#define I915_L3_PARITY_UEVENT		"L3_PARITY_ERROR"
+#define I915_ERROR_UEVENT		"ERROR"
+#define I915_RESET_UEVENT		"RESET"
 
 /* Each region is a minimum of 16k, and there are at most 255 of them.
  */
-- 
1.8.3.3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] [v3] drm/i915: Make i915 events part of uapi
  2013-07-19 16:16 ` [PATCH] [v3] drm/i915: Make i915 events part of uapi Ben Widawsky
@ 2013-07-19 16:27   ` Daniel Vetter
  2013-07-19 22:35   ` Chad Versace
  1 sibling, 0 replies; 11+ messages in thread
From: Daniel Vetter @ 2013-07-19 16:27 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On Fri, Jul 19, 2013 at 09:16:42AM -0700, Ben Widawsky wrote:
> Make the uevent strings part of the user API for people who wish to
> write their own listeners.
> 
> v2: Make a space in the string concatenation. (Chad)
> Use the "UEVENT" suffix intead of "EVENT" (Chad)
> Make kernel-doc parseable Docbook comments (Daniel)
> 
> v3: Undid reset change introduced in last submission (Daniel)
> Fixed up comments to address removal changes.
> 
> Thanks to Daniel Vetter for a majority of the parity error comments.
> 
> CC: Chad Versace <chad.versace@linux.intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] [v3] drm/i915: Make i915 events part of uapi
  2013-07-19 16:16 ` [PATCH] [v3] drm/i915: Make i915 events part of uapi Ben Widawsky
  2013-07-19 16:27   ` Daniel Vetter
@ 2013-07-19 22:35   ` Chad Versace
  1 sibling, 0 replies; 11+ messages in thread
From: Chad Versace @ 2013-07-19 22:35 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Intel GFX

On 07/19/2013 09:16 AM, Ben Widawsky wrote:
> Make the uevent strings part of the user API for people who wish to
> write their own listeners.
>
> v2: Make a space in the string concatenation. (Chad)
> Use the "UEVENT" suffix intead of "EVENT" (Chad)
> Make kernel-doc parseable Docbook comments (Daniel)
>
> v3: Undid reset change introduced in last submission (Daniel)
> Fixed up comments to address removal changes.
>
> Thanks to Daniel Vetter for a majority of the parity error comments.
>
> CC: Chad Versace <chad.versace@linux.intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>   drivers/gpu/drm/i915/i915_irq.c |  8 ++++----
>   include/uapi/drm/i915_drm.h     | 24 ++++++++++++++++++++++++
>   2 files changed, 28 insertions(+), 4 deletions(-)

For what it's worth,
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-07-19 22:35 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-16 23:41 [PATCH] drm/i915: Make i915 events part of uapi Ben Widawsky
2013-07-17  5:07 ` Daniel Vetter
2013-07-17 21:12   ` Chad Versace
2013-07-19  5:42 ` [PATCH 1/3] drm/i915: Move error uevent to detection time Ben Widawsky
2013-07-19  5:42   ` [PATCH 2/3] drm/i915: Change uevent for reset completion Ben Widawsky
2013-07-19  6:57     ` Daniel Vetter
2013-07-19  5:42   ` [PATCH 3/3] [v2] drm/i915: Make i915 events part of uapi Ben Widawsky
2013-07-19  8:45   ` [PATCH 1/3] drm/i915: Move error uevent to detection time Chris Wilson
2013-07-19 16:16 ` [PATCH] [v3] drm/i915: Make i915 events part of uapi Ben Widawsky
2013-07-19 16:27   ` Daniel Vetter
2013-07-19 22:35   ` Chad Versace

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.