intel-gfx.lists.freedesktop.org archive mirror
* [PATCH 1/3] [v2] drm/i915: Created a sized object error dump
@ 2013-02-25  2:10 Ben Widawsky
  2013-02-25  2:10 ` [PATCH 2/3] drm/i915: exclude CCID for platforms without it Ben Widawsky
  2013-02-25  2:10 ` [PATCH 3/3] [v2] drm/i915: Capture current context on error Ben Widawsky
  0 siblings, 2 replies; 8+ messages in thread
From: Ben Widawsky @ 2013-02-25  2:10 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky

v2: Actually use num_pages (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_irq.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 29037e0..420911c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -952,24 +952,23 @@ static void i915_get_extra_instdone(struct drm_device *dev,
 
 #ifdef CONFIG_DEBUG_FS
 static struct drm_i915_error_object *
-i915_error_object_create(struct drm_i915_private *dev_priv,
-			 struct drm_i915_gem_object *src)
+i915_error_object_create_sized(struct drm_i915_private *dev_priv,
+			       struct drm_i915_gem_object *src,
+			       const int num_pages)
 {
 	struct drm_i915_error_object *dst;
-	int i, count;
+	int i;
 	u32 reloc_offset;
 
 	if (src == NULL || src->pages == NULL)
 		return NULL;
 
-	count = src->base.size / PAGE_SIZE;
-
-	dst = kmalloc(sizeof(*dst) + count * sizeof(u32 *), GFP_ATOMIC);
+	dst = kmalloc(sizeof(*dst) + num_pages * sizeof(u32 *), GFP_ATOMIC);
 	if (dst == NULL)
 		return NULL;
 
 	reloc_offset = src->gtt_offset;
-	for (i = 0; i < count; i++) {
+	for (i = 0; i < num_pages; i++) {
 		unsigned long flags;
 		void *d;
 
@@ -1019,7 +1018,7 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 
 		reloc_offset += PAGE_SIZE;
 	}
-	dst->page_count = count;
+	dst->page_count = num_pages;
 	dst->gtt_offset = src->gtt_offset;
 
 	return dst;
@@ -1030,6 +1029,9 @@ unwind:
 	kfree(dst);
 	return NULL;
 }
+#define i915_error_object_create(dev_priv, src) \
+	i915_error_object_create_sized((dev_priv), (src), \
+				       (src)->base.size>>PAGE_SHIFT)
 
 static void
 i915_error_object_free(struct drm_i915_error_object *obj)
-- 
1.8.1.4


* [PATCH 2/3] drm/i915: exclude CCID for platforms without it
  2013-02-25  2:10 [PATCH 1/3] [v2] drm/i915: Created a sized object error dump Ben Widawsky
@ 2013-02-25  2:10 ` Ben Widawsky
  2013-02-25  2:10 ` [PATCH 3/3] [v2] drm/i915: Capture current context on error Ben Widawsky
  1 sibling, 0 replies; 8+ messages in thread
From: Ben Widawsky @ 2013-02-25  2:10 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_irq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 420911c..6a328b8 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1317,7 +1317,8 @@ static void i915_capture_error_state(struct drm_device *dev)
 	kref_init(&error->ref);
 	error->eir = I915_READ(EIR);
 	error->pgtbl_er = I915_READ(PGTBL_ER);
-	error->ccid = I915_READ(CCID);
+	if (HAS_HW_CONTEXTS(dev))
+		error->ccid = I915_READ(CCID);
 
 	if (HAS_PCH_SPLIT(dev))
 		error->ier = I915_READ(DEIER) | I915_READ(GTIER);
-- 
1.8.1.4


* [PATCH 3/3] [v2] drm/i915: Capture current context on error
  2013-02-25  2:10 [PATCH 1/3] [v2] drm/i915: Created a sized object error dump Ben Widawsky
  2013-02-25  2:10 ` [PATCH 2/3] drm/i915: exclude CCID for platforms without it Ben Widawsky
@ 2013-02-25  2:10 ` Ben Widawsky
  2013-02-25 10:26   ` Chris Wilson
                     ` (3 more replies)
  1 sibling, 4 replies; 8+ messages in thread
From: Ben Widawsky @ 2013-02-25  2:10 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky

On error, this represents the state of the currently running context at
the time it was loaded.

Unfortunately, since we're hung and can't switch out the context this
may not tell us too much about the most current state of the context,
but does give clues about what has happened since loading.

Thanks to recent doc updates, we have a little more confidence regarding
what is actually in this memory, and perhaps it will help us gain more
insight into certain bugs. AFAICT, the most interesting info is in the
first page. To save space, we only capture the first page. In the
future, we might want to dump more.

Sample of the relevant part of error state:
render ring --- HW Context = 0x01b20000
[0000] 00000000 1100105f 00002028 ffff0880
[0010] 0000209c feff4040 000020c0 efdf0080
[0020] 00002178 00000001 0000217c 00145855
[0030] 00002310 00000000 00002314 00000000
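
For readers decoding the sample above: each row is a byte offset in brackets followed by four 32-bit words, so the offset advances by 16 per row. The following is a minimal userspace sketch of that row format, not the driver code; the names sketch_format_row and sketch_dump_words are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format one dump row ("[offset] w0 w1 w2 w3\n")
 * into buf. Returns what snprintf returns.
 */
static int sketch_format_row(char *buf, size_t len, unsigned int offset,
			     const uint32_t *w)
{
	return snprintf(buf, len, "[%04x] %08x %08x %08x %08x\n", offset,
			(unsigned int)w[0], (unsigned int)w[1],
			(unsigned int)w[2], (unsigned int)w[3]);
}

/*
 * Hypothetical helper: dump n_words 32-bit words, four per row.
 * Assumes n_words is a multiple of four; trailing words are skipped.
 */
static void sketch_dump_words(FILE *out, const uint32_t *words,
			      size_t n_words)
{
	char row[64];
	size_t elt;

	for (elt = 0; elt + 4 <= n_words; elt += 4) {
		/* Byte offset of this row: four bytes per word. */
		unsigned int offset = (unsigned int)(elt * sizeof(uint32_t));

		sketch_format_row(row, sizeof(row), offset, &words[elt]);
		fputs(row, out);
	}
}
```

Feeding the first four words of the sample (00000000 1100105f 00002028 ffff0880) at offset 0 reproduces the first row shown above.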

v2: Move error collection to the ring error code
Change format of dump to not confuse intel_error_decode (Chris)
Put the context error object with the others (Chris)
Search bound_list instead of active_list (Chris)

References: https://bugs.freedesktop.org/show_bug.cgi?id=55845
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 15 +++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h     |  2 +-
 drivers/gpu/drm/i915/i915_irq.c     | 10 ++++++++++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 7c65ab8..9b4f3d7 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -772,6 +772,21 @@ static int i915_error_state(struct seq_file *m, void *unused)
 				}
 			}
 		}
+		if ((obj = error->ring[i].ctx)) {
+			seq_printf(m, "%s --- HW Context = 0x%08x\n",
+				   dev_priv->ring[i].name,
+				   obj->gtt_offset);
+			offset = 0;
+			for (elt = 0; elt < PAGE_SIZE/16; elt+=4) {
+				seq_printf(m, "[%04x] %08x %08x %08x %08x\n",
+					   offset,
+					   obj->pages[0][elt],
+					   obj->pages[0][elt+1],
+					   obj->pages[0][elt+2],
+					   obj->pages[0][elt+3]);
+					offset += 16;
+			}
+		}
 	}
 
 	if (error->overlay)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e95337c..bedabcd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -243,7 +243,7 @@ struct drm_i915_error_state {
 			int page_count;
 			u32 gtt_offset;
 			u32 *pages[0];
-		} *ringbuffer, *batchbuffer;
+		} *ringbuffer, *batchbuffer, *ctx;
 		struct drm_i915_error_request {
 			long jiffies;
 			u32 seqno;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 6a328b8..125b7d8 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1244,6 +1244,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *ring;
 	struct drm_i915_gem_request *request;
+	struct drm_i915_gem_object *obj;
 	int i, count;
 
 	for_each_ring(ring, dev_priv, i) {
@@ -1255,6 +1256,15 @@ static void i915_gem_record_rings(struct drm_device *dev,
 		error->ring[i].ringbuffer =
 			i915_error_object_create(dev_priv, ring->obj);
 
+		/* Currently render ring is the only HW context user */
+		if ((ring->id == RCS) && error->ccid) {
+			list_for_each_entry(obj, &dev_priv->mm.bound_list, gtt_list)
+			if ((error->ccid & PAGE_MASK) == obj->gtt_offset)
+				error->ring[i].ctx =
+					i915_error_object_create_sized(dev_priv,
+								       obj, 1);
+		}
+
 		count = 0;
 		list_for_each_entry(request, &ring->request_list, list)
 			count++;
-- 
1.8.1.4


* Re: [PATCH 3/3] [v2] drm/i915: Capture current context on error
  2013-02-25  2:10 ` [PATCH 3/3] [v2] drm/i915: Capture current context on error Ben Widawsky
@ 2013-02-25 10:26   ` Chris Wilson
  2013-03-04 19:57   ` Daniel Vetter
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Chris Wilson @ 2013-02-25 10:26 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx

On Sun, Feb 24, 2013 at 06:10:02PM -0800, Ben Widawsky wrote:
> On error, this represents the state of the currently running context at
> the time it was loaded.
> 
> Unfortunately, since we're hung and can't switch out the context this
> may not tell us too much about the most current state of the context,
> but does give clues about what has happened since loading.
> 
> Thanks to recent doc updates, we have a little more confidence regarding
> what is actually in this memory, and perhaps it will help us gain more
> insight into certain bugs. AFAICT, the most interesting info is in the
> first page. To save space, we only capture the first page. In the
> future, we might want to dump more.
> 
> Sample of the relevant part of error state:
> render ring --- HW Context = 0x01b20000
> [0000] 00000000 1100105f 00002028 ffff0880
> [0010] 0000209c feff4040 000020c0 efdf0080
> [0020] 00002178 00000001 0000217c 00145855
> [0030] 00002310 00000000 00002314 00000000
> 
> v2: Move error collection to the ring error code
> Change format of dump to not confuse intel_error_decode (Chris)
> Put the context error object with the others (Chris)
> Search bound_list instead of active_list (Chris)
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=55845
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Can't spot anything else to nag about so all 3 are
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH 3/3] [v2] drm/i915: Capture current context on error
  2013-02-25  2:10 ` [PATCH 3/3] [v2] drm/i915: Capture current context on error Ben Widawsky
  2013-02-25 10:26   ` Chris Wilson
@ 2013-03-04 19:57   ` Daniel Vetter
  2013-03-04 21:12   ` [PATCH] [v3] " Ben Widawsky
  2013-03-05  1:00   ` [PATCH] [v4] " Ben Widawsky
  3 siblings, 0 replies; 8+ messages in thread
From: Daniel Vetter @ 2013-03-04 19:57 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx

On Sun, Feb 24, 2013 at 06:10:02PM -0800, Ben Widawsky wrote:
> On error, this represents the state of the currently running context at
> the time it was loaded.
> 
> Unfortunately, since we're hung and can't switch out the context this
> may not tell us too much about the most current state of the context,
> but does give clues about what has happened since loading.
> 
> Thanks to recent doc updates, we have a little more confidence regarding
> what is actually in this memory, and perhaps it will help us gain more
> insight into certain bugs. AFAICT, the most interesting info is in the
> first page. To save space, we only capture the first page. In the
> future, we might want to dump more.
> 
> Sample of the relevant part of error state:
> render ring --- HW Context = 0x01b20000
> [0000] 00000000 1100105f 00002028 ffff0880
> [0010] 0000209c feff4040 000020c0 efdf0080
> [0020] 00002178 00000001 0000217c 00145855
> [0030] 00002310 00000000 00002314 00000000
> 
> v2: Move error collection to the ring error code
> Change format of dump to not confuse intel_error_decode (Chris)
> Put the context error object with the others (Chris)
> Search bound_list instead of active_list (Chris)
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=55845
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 15 +++++++++++++++
>  drivers/gpu/drm/i915/i915_drv.h     |  2 +-
>  drivers/gpu/drm/i915/i915_irq.c     | 10 ++++++++++
>  3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 7c65ab8..9b4f3d7 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -772,6 +772,21 @@ static int i915_error_state(struct seq_file *m, void *unused)
>  				}
>  			}
>  		}
> +		if ((obj = error->ring[i].ctx)) {
> +			seq_printf(m, "%s --- HW Context = 0x%08x\n",
> +				   dev_priv->ring[i].name,
> +				   obj->gtt_offset);
> +			offset = 0;
> +			for (elt = 0; elt < PAGE_SIZE/16; elt+=4) {
> +				seq_printf(m, "[%04x] %08x %08x %08x %08x\n",
> +					   offset,
> +					   obj->pages[0][elt],
> +					   obj->pages[0][elt+1],
> +					   obj->pages[0][elt+2],
> +					   obj->pages[0][elt+3]);
> +					offset += 16;
> +			}
> +		}
>  	}
>  
>  	if (error->overlay)
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index e95337c..bedabcd 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -243,7 +243,7 @@ struct drm_i915_error_state {
>  			int page_count;
>  			u32 gtt_offset;
>  			u32 *pages[0];
> -		} *ringbuffer, *batchbuffer;
> +		} *ringbuffer, *batchbuffer, *ctx;
>  		struct drm_i915_error_request {
>  			long jiffies;
>  			u32 seqno;
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 6a328b8..125b7d8 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1244,6 +1244,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct intel_ring_buffer *ring;
>  	struct drm_i915_gem_request *request;
> +	struct drm_i915_gem_object *obj;
>  	int i, count;
>  
>  	for_each_ring(ring, dev_priv, i) {
> @@ -1255,6 +1256,15 @@ static void i915_gem_record_rings(struct drm_device *dev,
>  		error->ring[i].ringbuffer =
>  			i915_error_object_create(dev_priv, ring->obj);
>  
> +		/* Currently render ring is the only HW context user */
> +		if ((ring->id == RCS) && error->ccid) {
> +			list_for_each_entry(obj, &dev_priv->mm.bound_list, gtt_list)
> +			if ((error->ccid & PAGE_MASK) == obj->gtt_offset)
> +				error->ring[i].ctx =
> +					i915_error_object_create_sized(dev_priv,
> +								       obj, 1);

checkpatch is a bit unhappy about your hunk in i915_debugfs.c, and it
complains about the overly long line here. I wanted to quickly fix this
up until I noticed that the indentation here is a bit ... artistic.
Fixing it up indents the code way too much; I think this should be
extracted into a tiny helper function, which allows us to flatten the
logic with early returns and continues.
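
A minimal userspace sketch of the flattening idea, under stated assumptions: struct sketch_obj, SKETCH_PAGE_MASK (4 KiB pages assumed) and sketch_find_context are hypothetical stand-ins for the driver's types, and a plain array replaces the bound list. The guard conditions return early and the non-matching entries are skipped with continue, so the matching path never nests more than one level deep.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so the low 12 bits are masked off. */
#define SKETCH_PAGE_MASK (~(uint32_t)0xfff)

/* Hypothetical stand-in for a bound GEM object. */
struct sketch_obj {
	uint32_t gtt_offset;
};

/*
 * Return the object whose GTT offset matches the page-aligned part of
 * ccid, or NULL if this ring has no context or nothing matches.
 */
static const struct sketch_obj *
sketch_find_context(int is_render_ring, uint32_t ccid,
		    const struct sketch_obj *bound, size_t n_bound)
{
	size_t i;

	/* Early return: only the render ring uses HW contexts here. */
	if (!is_render_ring || !ccid)
		return NULL;

	for (i = 0; i < n_bound; i++) {
		/* Skip objects that are not the current context. */
		if ((ccid & SKETCH_PAGE_MASK) != bound[i].gtt_offset)
			continue;

		return &bound[i];
	}

	return NULL;
}
```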

First two patches of this series merged to dinq, thanks.
-Daniel

> +		}
> +
>  		count = 0;
>  		list_for_each_entry(request, &ring->request_list, list)
>  			count++;
> -- 
> 1.8.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* [PATCH] [v3] drm/i915: Capture current context on error
  2013-02-25  2:10 ` [PATCH 3/3] [v2] drm/i915: Capture current context on error Ben Widawsky
  2013-02-25 10:26   ` Chris Wilson
  2013-03-04 19:57   ` Daniel Vetter
@ 2013-03-04 21:12   ` Ben Widawsky
  2013-03-05  1:00   ` [PATCH] [v4] " Ben Widawsky
  3 siblings, 0 replies; 8+ messages in thread
From: Ben Widawsky @ 2013-03-04 21:12 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky

On error, this represents the state of the currently running context at
the time it was loaded.

Unfortunately, since we're hung and can't switch out the context this
may not tell us too much about the most current state of the context,
but does give clues about what has happened since loading.

Thanks to recent doc updates, we have a little more confidence regarding
what is actually in this memory, and perhaps it will help us gain more
insight into certain bugs. AFAICT, the most interesting info is in the
first page. To save space, we only capture the first page. In the
future, we might want to dump more.

Sample of the relevant part of error state:
render ring --- HW Context = 0x01b20000
[0000] 00000000 1100105f 00002028 ffff0880
[0010] 0000209c feff4040 000020c0 efdf0080
[0020] 00002178 00000001 0000217c 00145855
[0030] 00002310 00000000 00002314 00000000

v2: Move error collection to the ring error code
Change format of dump to not confuse intel_error_decode (Chris)
Put the context error object with the others (Chris)
Search bound_list instead of active_list (Chris)

v3: extract and flatten context recording (daniel)
checkpatch related fixes for the copypasta in debugfs

References: https://bugs.freedesktop.org/show_bug.cgi?id=55845
Reviewed-by (v2): Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 17 +++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h     |  2 +-
 drivers/gpu/drm/i915/i915_irq.c     | 23 +++++++++++++++++++++++
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 7c65ab8..c92ae7f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -772,6 +772,23 @@ static int i915_error_state(struct seq_file *m, void *unused)
 				}
 			}
 		}
+
+		obj = error->ring[i].ctx;
+		if (obj) {
+			seq_printf(m, "%s --- HW Context = 0x%08x\n",
+				   dev_priv->ring[i].name,
+				   obj->gtt_offset);
+			offset = 0;
+			for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
+				seq_printf(m, "[%04x] %08x %08x %08x %08x\n",
+					   offset,
+					   obj->pages[0][elt],
+					   obj->pages[0][elt+1],
+					   obj->pages[0][elt+2],
+					   obj->pages[0][elt+3]);
+					offset += 16;
+			}
+		}
 	}
 
 	if (error->overlay)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e95337c..bedabcd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -243,7 +243,7 @@ struct drm_i915_error_state {
 			int page_count;
 			u32 gtt_offset;
 			u32 *pages[0];
-		} *ringbuffer, *batchbuffer;
+		} *ringbuffer, *batchbuffer, *ctx;
 		struct drm_i915_error_request {
 			long jiffies;
 			u32 seqno;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 6a328b8..ac2ad0f 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1238,6 +1238,26 @@ static void i915_record_ring_state(struct drm_device *dev,
 	error->cpu_ring_tail[ring->id] = ring->tail;
 }
 
+
+static void i915_gem_record_active_context(struct intel_ring_buffer *ring,
+					   struct drm_i915_error_state *error,
+					   struct drm_i915_error_ring *ering)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct drm_i915_gem_object *obj;
+
+	/* Currently render ring is the only HW context user */
+	if ((ring->id == RCS) && error->ccid)
+		return;
+
+	list_for_each_entry(obj, &dev_priv->mm.bound_list, gtt_list) {
+		if ((error->ccid & PAGE_MASK) == obj->gtt_offset) {
+			ering->ctx = i915_error_object_create_sized(dev_priv,
+								    obj, 1);
+		}
+	}
+}
+
 static void i915_gem_record_rings(struct drm_device *dev,
 				  struct drm_i915_error_state *error)
 {
@@ -1255,6 +1275,9 @@ static void i915_gem_record_rings(struct drm_device *dev,
 		error->ring[i].ringbuffer =
 			i915_error_object_create(dev_priv, ring->obj);
 
+
+		i915_gem_record_active_context(ring, error, &error->ring[i]);
+
 		count = 0;
 		list_for_each_entry(request, &ring->request_list, list)
 			count++;
-- 
1.8.1.5


* [PATCH] [v4] drm/i915: Capture current context on error
  2013-02-25  2:10 ` [PATCH 3/3] [v2] drm/i915: Capture current context on error Ben Widawsky
                     ` (2 preceding siblings ...)
  2013-03-04 21:12   ` [PATCH] [v3] " Ben Widawsky
@ 2013-03-05  1:00   ` Ben Widawsky
  2013-03-05  8:39     ` Daniel Vetter
  3 siblings, 1 reply; 8+ messages in thread
From: Ben Widawsky @ 2013-03-05  1:00 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky

On error, this represents the state of the currently running context at
the time it was loaded.

Unfortunately, since we're hung and can't switch out the context this
may not tell us too much about the most current state of the context,
but does give clues about what has happened since loading.

Thanks to recent doc updates, we have a little more confidence regarding
what is actually in this memory, and perhaps it will help us gain more
insight into certain bugs. AFAICT, the most interesting info is in the
first page. To save space, we only capture the first page. In the
future, we might want to dump more.

Sample of the relevant part of error state:
render ring --- HW Context = 0x01b20000
[0000] 00000000 1100105f 00002028 ffff0880
[0010] 0000209c feff4040 000020c0 efdf0080
[0020] 00002178 00000001 0000217c 00145855
[0030] 00002310 00000000 00002314 00000000

v2: Move error collection to the ring error code
Change format of dump to not confuse intel_error_decode (Chris)
Put the context error object with the others (Chris)
Search bound_list instead of active_list (Chris)

v3: extract and flatten context recording (daniel)
checkpatch related fixes for the copypasta in debugfs

v4: bug in v3 (Daniel)
-       if ((ring->id == RCS) && error->ccid)
+       if ((ring->id != RCS) || !error->ccid)

References: https://bugs.freedesktop.org/show_bug.cgi?id=55845
Reviewed-by (v2): Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 17 +++++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h     |  2 +-
 drivers/gpu/drm/i915/i915_irq.c     | 23 +++++++++++++++++++++++
 3 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 7c65ab8..c92ae7f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -772,6 +772,23 @@ static int i915_error_state(struct seq_file *m, void *unused)
 				}
 			}
 		}
+
+		obj = error->ring[i].ctx;
+		if (obj) {
+			seq_printf(m, "%s --- HW Context = 0x%08x\n",
+				   dev_priv->ring[i].name,
+				   obj->gtt_offset);
+			offset = 0;
+			for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
+				seq_printf(m, "[%04x] %08x %08x %08x %08x\n",
+					   offset,
+					   obj->pages[0][elt],
+					   obj->pages[0][elt+1],
+					   obj->pages[0][elt+2],
+					   obj->pages[0][elt+3]);
+					offset += 16;
+			}
+		}
 	}
 
 	if (error->overlay)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e95337c..bedabcd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -243,7 +243,7 @@ struct drm_i915_error_state {
 			int page_count;
 			u32 gtt_offset;
 			u32 *pages[0];
-		} *ringbuffer, *batchbuffer;
+		} *ringbuffer, *batchbuffer, *ctx;
 		struct drm_i915_error_request {
 			long jiffies;
 			u32 seqno;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 6a328b8..dd95e82 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1238,6 +1238,26 @@ static void i915_record_ring_state(struct drm_device *dev,
 	error->cpu_ring_tail[ring->id] = ring->tail;
 }
 
+
+static void i915_gem_record_active_context(struct intel_ring_buffer *ring,
+					   struct drm_i915_error_state *error,
+					   struct drm_i915_error_ring *ering)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct drm_i915_gem_object *obj;
+
+	/* Currently render ring is the only HW context user */
+	if ((ring->id != RCS) || !error->ccid)
+		return;
+
+	list_for_each_entry(obj, &dev_priv->mm.bound_list, gtt_list) {
+		if ((error->ccid & PAGE_MASK) == obj->gtt_offset) {
+			ering->ctx = i915_error_object_create_sized(dev_priv,
+								    obj, 1);
+		}
+	}
+}
+
 static void i915_gem_record_rings(struct drm_device *dev,
 				  struct drm_i915_error_state *error)
 {
@@ -1255,6 +1275,9 @@ static void i915_gem_record_rings(struct drm_device *dev,
 		error->ring[i].ringbuffer =
 			i915_error_object_create(dev_priv, ring->obj);
 
+
+		i915_gem_record_active_context(ring, error, &error->ring[i]);
+
 		count = 0;
 		list_for_each_entry(request, &ring->request_list, list)
 			count++;
-- 
1.8.1.5


* Re: [PATCH] [v4] drm/i915: Capture current context on error
  2013-03-05  1:00   ` [PATCH] [v4] " Ben Widawsky
@ 2013-03-05  8:39     ` Daniel Vetter
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Vetter @ 2013-03-05  8:39 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Daniel Vetter, intel-gfx

On Mon, Mar 04, 2013 at 05:00:29PM -0800, Ben Widawsky wrote:
> On error, this represents the state of the currently running context at
> the time it was loaded.
> 
> Unfortunately, since we're hung and can't switch out the context this
> may not tell us too much about the most current state of the context,
> but does give clues about what has happened since loading.
> 
> Thanks to recent doc updates, we have a little more confidence regarding
> what is actually in this memory, and perhaps it will help us gain more
> insight into certain bugs. AFAICT, the most interesting info is in the
> first page. To save space, we only capture the first page. In the
> future, we might want to dump more.
> 
> Sample of the relevant part of error state:
> render ring --- HW Context = 0x01b20000
> [0000] 00000000 1100105f 00002028 ffff0880
> [0010] 0000209c feff4040 000020c0 efdf0080
> [0020] 00002178 00000001 0000217c 00145855
> [0030] 00002310 00000000 00002314 00000000
> 
> v2: Move error collection to the ring error code
> Change format of dump to not confuse intel_error_decode (Chris)
> Put the context error object with the others (Chris)
> Search bound_list instead of active_list (Chris)
> 
> v3: extract and flatten context recording (daniel)
> checkpatch related fixes for the copypasta in debugfs
> 
> v4: bug in v3 (Daniel)
> -       if ((ring->id == RCS) && error->ccid)
> +       if ((ring->id != RCS) || !error->ccid)

Still a redundant () pair here ... I've killed it.
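
To spell out the two points on the quoted lines: the early-return guard has to be the logical negation of the "capture" condition (De Morgan), and the parentheses around the comparison are redundant because != binds tighter than ||. A minimal sketch, assuming hypothetical names (sketch_ring, sketch_want_capture, sketch_skip_v3, sketch_skip_v4) that are not in the driver:

```c
/* Hypothetical ring ids for illustration only. */
enum sketch_ring { SKETCH_RCS = 0, SKETCH_VCS = 1 };

/* The condition under which the context should be captured. */
static int sketch_want_capture(enum sketch_ring id, unsigned int ccid)
{
	return id == SKETCH_RCS && ccid;
}

/* v3 guard: returns early exactly when capture IS wanted (the bug). */
static int sketch_skip_v3(enum sketch_ring id, unsigned int ccid)
{
	return (id == SKETCH_RCS) && ccid;
}

/*
 * v4 guard without the extra parentheses: !(a && b) == (!a || !b),
 * and "id != SKETCH_RCS || !ccid" parses as "(id != SKETCH_RCS) || (!ccid)".
 */
static int sketch_skip_v4(enum sketch_ring id, unsigned int ccid)
{
	return id != SKETCH_RCS || !ccid;
}
```

For every combination of ring id and ccid, sketch_skip_v4 is the opposite of sketch_want_capture, whereas sketch_skip_v3 agrees with it and so bails out in exactly the case that should be recorded.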

> References: https://bugs.freedesktop.org/show_bug.cgi?id=55845
> Reviewed-by (v2): Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Thread overview: 8+ messages
2013-02-25  2:10 [PATCH 1/3] [v2] drm/i915: Created a sized object error dump Ben Widawsky
2013-02-25  2:10 ` [PATCH 2/3] drm/i915: exclude CCID for platforms without it Ben Widawsky
2013-02-25  2:10 ` [PATCH 3/3] [v2] drm/i915: Capture current context on error Ben Widawsky
2013-02-25 10:26   ` Chris Wilson
2013-03-04 19:57   ` Daniel Vetter
2013-03-04 21:12   ` [PATCH] [v3] " Ben Widawsky
2013-03-05  1:00   ` [PATCH] [v4] " Ben Widawsky
2013-03-05  8:39     ` Daniel Vetter
