All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position
@ 2012-02-08 13:34 Chris Wilson
  2012-02-08 13:34 ` [PATCH 2/2] drm/i915: Record the tail at each request and use it to estimate the head Chris Wilson
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Chris Wilson @ 2012-02-08 13:34 UTC (permalink / raw)
  To: intel-gfx

This is a revert of 6aa56062eaba67adfb247cded244fd877329588d.

This was originally introduced to workaround reads of the ringbuffer
registers returning 0 on SandyBridge causing hangs due to ringbuffer
overflow. The root cause here was reads through the GT powerwell require
the forcewake dance, something we only learnt of later. Now it appears
that reading the reported head position from the HWS is returning
garbage, leading once again to hangs.

For example, on q35 the autoreported head reports:
  [  217.975608] head now 00010000, actual 00010000
  [  436.725613] head now 00200000, actual 00200000
  [  462.956033] head now 00210000, actual 00210010
  [  485.501409] head now 00400000, actual 00400020
  [  508.064280] head now 00410000, actual 00410000
  [  530.576078] head now 00600000, actual 00600020
  [  553.273489] head now 00610000, actual 00610018
which appears reasonably sane. In contrast, if we look at snb:
  [  141.970680] head now 00e10000, actual 00008238
  [  141.974062] head now 02734000, actual 000083c8
  [  141.974425] head now 00e10000, actual 00008488
  [  141.980374] head now 032b5000, actual 000088b8
  [  141.980885] head now 03271000, actual 00008950
  [  142.040628] head now 02101000, actual 00008b40
  [  142.180173] head now 02734000, actual 00009050
  [  142.181090] head now 00000000, actual 00000ae0
  [  142.183737] head now 02734000, actual 00009050

In addition, the automatic reporting of the head position is scheduled
to be defeatured in the future. It has no more utility, remove it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45492
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c |   14 +-------------
 1 files changed, 1 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 023739f..620b2ed 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -287,7 +287,7 @@ static int init_ring_common(struct intel_ring_buffer *ring)
 
 	I915_WRITE_CTL(ring,
 			((ring->size - PAGE_SIZE) & RING_NR_PAGES)
-			| RING_REPORT_64K | RING_VALID);
+			| RING_VALID);
 
 	/* If the head is still not zero, the ring is dead */
 	if ((I915_READ_CTL(ring) & RING_VALID) == 0 ||
@@ -1059,18 +1059,6 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	unsigned long end;
-	u32 head;
-
-	/* If the reported head position has wrapped or hasn't advanced,
-	 * fallback to the slow and accurate path.
-	 */
-	head = intel_read_status_page(ring, 4);
-	if (head > ring->head) {
-		ring->head = head;
-		ring->space = ring_space(ring);
-		if (ring->space >= n)
-			return 0;
-	}
 
 	trace_i915_ring_wait_begin(ring);
 	if (drm_core_check_feature(dev, DRIVER_GEM))
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/2] drm/i915: Record the tail at each request and use it to estimate the head
  2012-02-08 13:34 [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position Chris Wilson
@ 2012-02-08 13:34 ` Chris Wilson
  2012-02-08 14:51   ` Daniel Vetter
  2012-02-08 14:54 ` [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position Daniel Vetter
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2012-02-08 13:34 UTC (permalink / raw)
  To: intel-gfx

By recording the location of every request in the ringbuffer, we know
that in order to retire the request the GPU must have finished reading
it and so the GPU head is now beyond the tail of the request. We can
therefore provide a conservative estimate of where the GPU is reading
from in order to avoid having to read back the ring buffer registers
when polling for space upon starting a new write into the ringbuffer.

A secondary effect is that this allows us to convert
intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
upon the single function to handle the complicated task of waiting upon
the GPU. A necessary precaution is that we need to make that wait
uninterruptible to match the existing conditions as all the callers of
intel_ring_begin() have not been audited to handle ERESTARTSYS
correctly.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         |    5 ++
 drivers/gpu/drm/i915/i915_gem.c         |    9 ++++-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   70 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |   15 +++++++
 4 files changed, 98 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4ee0793..e8e1b85 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -945,6 +945,9 @@ struct drm_i915_gem_request {
 	/** GEM sequence number associated with this request. */
 	uint32_t seqno;
 
+	/** Postion in the ringbuffer of the end of the request */
+	u32 tail;
+
 	/** Time at which this request was emitted, in jiffies. */
 	unsigned long emitted_jiffies;
 
@@ -1234,6 +1237,8 @@ i915_gem_object_unpin_fence(struct drm_i915_gem_object *obj)
 }
 
 void i915_gem_retire_requests(struct drm_device *dev);
+void i915_gem_retire_requests_ring(struct intel_ring_buffer *ring);
+
 void i915_gem_reset(struct drm_device *dev);
 void i915_gem_clflush_object(struct drm_i915_gem_object *obj);
 int __must_check i915_gem_object_set_domain(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 51390b0..4de9e4e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1798,6 +1798,7 @@ i915_add_request(struct intel_ring_buffer *ring,
 
 	request->seqno = seqno;
 	request->ring = ring;
+	request->tail = intel_ring_get_tail(ring);
 	request->emitted_jiffies = jiffies;
 	was_empty = list_empty(&ring->request_list);
 	list_add_tail(&request->list, &ring->request_list);
@@ -1934,7 +1935,7 @@ void i915_gem_reset(struct drm_device *dev)
 /**
  * This function clears the request list as sequence numbers are passed.
  */
-static void
+void
 i915_gem_retire_requests_ring(struct intel_ring_buffer *ring)
 {
 	uint32_t seqno;
@@ -1962,6 +1963,12 @@ i915_gem_retire_requests_ring(struct intel_ring_buffer *ring)
 			break;
 
 		trace_i915_gem_request_retire(ring, request->seqno);
+		/* We know the GPU must have read the request to have
+		 * sent us the seqno + interrupt, so use the position
+		 * of tail of the request to update the last known position
+		 * of the GPU head.
+		 */
+		ring->last_retired_head = request->tail;
 
 		list_del(&request->list);
 		i915_gem_request_remove_from_client(request);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 620b2ed..f539235 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -533,6 +533,7 @@ pc_render_add_request(struct intel_ring_buffer *ring,
 	PIPE_CONTROL_FLUSH(ring, scratch_addr);
 	scratch_addr += 128;
 	PIPE_CONTROL_FLUSH(ring, scratch_addr);
+
 	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
 			PIPE_CONTROL_WRITE_FLUSH |
 			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
@@ -1054,11 +1055,80 @@ static int intel_wrap_ring_buffer(struct intel_ring_buffer *ring)
 	return 0;
 }
 
+static int intel_ring_wait_seqno(struct intel_ring_buffer *ring, u32 seqno)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	bool was_interruptible;
+	int ret;
+
+	/* XXX As we have not yet audited all the paths to check that
+	 * they are ready for ERESTARTSYS from intel_ring_begin, do not
+	 * allow us to be interruptible by a signal.
+	 */
+	was_interruptible = dev_priv->mm.interruptible;
+	dev_priv->mm.interruptible = false;
+
+	ret = i915_wait_request(ring, seqno);
+
+	dev_priv->mm.interruptible = was_interruptible;
+
+	return ret;
+}
+
+static int intel_ring_wait_request(struct intel_ring_buffer *ring, int n)
+{
+	struct drm_i915_gem_request *request;
+	u32 seqno = 0;
+	int ret;
+
+	if (ring->last_retired_head != -1) {
+		ring->head = ring->last_retired_head;
+		ring->last_retired_head = -1;
+		ring->space = ring_space(ring);
+		if (ring->space >= n)
+			return 0;
+	}
+
+	list_for_each_entry(request, &ring->request_list, list) {
+		int space = request->tail - (ring->tail + 8);
+		if (space < 0)
+			space += ring->size;
+		if (space >= n) {
+			seqno = request->seqno;
+			break;
+		}
+		request->tail = -1;
+	}
+
+	if (seqno == 0)
+		return -ENOSPC;
+
+	ret = intel_ring_wait_seqno(ring, seqno);
+	if (ret)
+		return ret;
+
+	if (WARN_ON(ring->last_retired_head == -1))
+		return -ENOSPC;
+
+	ring->head = ring->last_retired_head;
+	ring->last_retired_head = -1;
+	ring->space = ring_space(ring);
+	if (WARN_ON(ring->space < n))
+		return -ENOSPC;
+
+	return 0;
+}
+
 int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	unsigned long end;
+	int ret;
+
+	ret = intel_ring_wait_request(ring, n);
+	if (ret != -ENOSPC)
+		return ret;
 
 	trace_i915_ring_wait_begin(ring);
 	if (drm_core_check_feature(dev, DRIVER_GEM))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 4c5338c..855233b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -46,6 +46,16 @@ struct  intel_ring_buffer {
 	int		effective_size;
 	struct intel_hw_status_page status_page;
 
+	/** We track the position of the requests in the ring buffer, and
+	 * when each is retired we increment last_retired_head as the GPU
+	 * must have finished processing the request and so we know we
+	 * can advance the ringbuffer up to that position.
+	 *
+	 * last_retired_head is set to -1 after the value is consumed so
+	 * we can detect new retirements.
+	 */
+	u32		last_retired_head;
+
 	spinlock_t	irq_lock;
 	u32		irq_refcount;
 	u32		irq_mask;
@@ -191,6 +201,11 @@ int intel_init_blt_ring_buffer(struct drm_device *dev);
 u32 intel_ring_get_active_head(struct intel_ring_buffer *ring);
 void intel_ring_setup_status_page(struct intel_ring_buffer *ring);
 
+static inline u32 intel_ring_get_tail(struct intel_ring_buffer *ring)
+{
+	return ring->tail;
+}
+
 static inline void i915_trace_irq_get(struct intel_ring_buffer *ring, u32 seqno)
 {
 	if (ring->trace_irq_seqno == 0 && ring->irq_get(ring))
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] drm/i915: Record the tail at each request and use it to estimate the head
  2012-02-08 13:34 ` [PATCH 2/2] drm/i915: Record the tail at each request and use it to estimate the head Chris Wilson
@ 2012-02-08 14:51   ` Daniel Vetter
  2012-02-08 18:06     ` [PATCH] " Chris Wilson
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Vetter @ 2012-02-08 14:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Feb 08, 2012 at 01:34:14PM +0000, Chris Wilson wrote:
> By recording the location of every request in the ringbuffer, we know
> that in order to retire the request the GPU must have finished reading
> it and so the GPU head is now beyond the tail of the request. We can
> therefore provide a conservative estimate of where the GPU is reading
> from in order to avoid having to read back the ring buffer registers
> when polling for space upon starting a new write into the ringbuffer.
> 
> A secondary effect is that this allows us to convert
> intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
> upon the single function to handle the complicated task of waiting upon
> the GPU. A necessary precaution is that we need to make that wait
> uninterruptible to match the existing conditions as all the callers of
> intel_ring_begin() have not been audited to handle ERESTARTSYS
> correctly.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I like how this now looks like, two minor quibbles left though:

I think we should mention the 'head going backwards' race which could
happen when we don't clear up all requests due to the mismatch between
ring->tail head updates and actual head updates from the head registers.
Imo it's good enough to explain what's going on in the commit msg.

> ---
>  drivers/gpu/drm/i915/i915_drv.h         |    5 ++
>  drivers/gpu/drm/i915/i915_gem.c         |    9 ++++-
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   70 +++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |   15 +++++++
>  4 files changed, 98 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4ee0793..e8e1b85 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -945,6 +945,9 @@ struct drm_i915_gem_request {
>  	/** GEM sequence number associated with this request. */
>  	uint32_t seqno;
>  
> +	/** Postion in the ringbuffer of the end of the request */
> +	u32 tail;
> +
>  	/** Time at which this request was emitted, in jiffies. */
>  	unsigned long emitted_jiffies;
>  
> @@ -1234,6 +1237,8 @@ i915_gem_object_unpin_fence(struct drm_i915_gem_object *obj)
>  }
>  
>  void i915_gem_retire_requests(struct drm_device *dev);
> +void i915_gem_retire_requests_ring(struct intel_ring_buffer *ring);
> +
>  void i915_gem_reset(struct drm_device *dev);
>  void i915_gem_clflush_object(struct drm_i915_gem_object *obj);
>  int __must_check i915_gem_object_set_domain(struct drm_i915_gem_object *obj,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 51390b0..4de9e4e 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1798,6 +1798,7 @@ i915_add_request(struct intel_ring_buffer *ring,
>  
>  	request->seqno = seqno;
>  	request->ring = ring;
> +	request->tail = intel_ring_get_tail(ring);

I'm rather uneasy with using the tail _after_ we emitted the request as
the cookie. With funky hw workarounds like the pipe control flushing dance
for ilk we could receive the seqno write before the gpu has actually
finished processing all the w/a flushes. Similarly (but even smaller race
window) we could receive the seqno write before the gpu has executed the
user irq. So I think the paranoid but right thing to do here is to grab
the tail pointer before emitting the request.

Otherwise I'm ok with grabbing the tail pointer from i915_add_request -
we already have some nice layering violations with the seqno handling
anyway.

Otherwise I'm happy with the patch and the explanatory comments.

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position
  2012-02-08 13:34 [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position Chris Wilson
  2012-02-08 13:34 ` [PATCH 2/2] drm/i915: Record the tail at each request and use it to estimate the head Chris Wilson
@ 2012-02-08 14:54 ` Daniel Vetter
  2012-02-08 17:36   ` Keith Packard
  2012-02-08 20:09   ` [PATCH 1/2] drm/i915: Record the in-flight requests at the time of a hang Chris Wilson
  2012-02-21 19:23 ` [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position Eric Anholt
  2012-02-27 16:53 ` Jesse Barnes
  3 siblings, 2 replies; 20+ messages in thread
From: Daniel Vetter @ 2012-02-08 14:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Feb 08, 2012 at 01:34:13PM +0000, Chris Wilson wrote:
> This is a revert of 6aa56062eaba67adfb247cded244fd877329588d.
> 
> This was originally introduced to workaround reads of the ringbuffer
> registers returning 0 on SandyBridge causing hangs due to ringbuffer
> overflow. The root cause here was reads through the GT powerwell require
> the forcewake dance, something we only learnt of later. Now it appears
> that reading the reported head position from the HWS is returning
> garbage, leading once again to hangs.
> 
> For example, on q35 the autoreported head reports:
>   [  217.975608] head now 00010000, actual 00010000
>   [  436.725613] head now 00200000, actual 00200000
>   [  462.956033] head now 00210000, actual 00210010
>   [  485.501409] head now 00400000, actual 00400020
>   [  508.064280] head now 00410000, actual 00410000
>   [  530.576078] head now 00600000, actual 00600020
>   [  553.273489] head now 00610000, actual 00610018
> which appears reasonably sane. In contrast, if we look at snb:
>   [  141.970680] head now 00e10000, actual 00008238
>   [  141.974062] head now 02734000, actual 000083c8
>   [  141.974425] head now 00e10000, actual 00008488
>   [  141.980374] head now 032b5000, actual 000088b8
>   [  141.980885] head now 03271000, actual 00008950
>   [  142.040628] head now 02101000, actual 00008b40
>   [  142.180173] head now 02734000, actual 00009050
>   [  142.181090] head now 00000000, actual 00000ae0
>   [  142.183737] head now 02734000, actual 00009050
> 
> In addition, the automatic reporting of the head position is scheduled
> to be defeatured in the future. It has no more utility, remove it.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45492
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

I'm on the fence whether we should include this in -fixes. On one hand it
fixes a severe issue on snb, but introduces a perf regression without the
second patch. Otoh we've never shipped snb without it broken like this.
But if it turns out that this is broken on ilk and earlier, too, I think
we definitely need these two patches in -fixes.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position
  2012-02-08 14:54 ` [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position Daniel Vetter
@ 2012-02-08 17:36   ` Keith Packard
  2012-02-08 18:02     ` Daniel Vetter
  2012-02-08 20:09   ` [PATCH 1/2] drm/i915: Record the in-flight requests at the time of a hang Chris Wilson
  1 sibling, 1 reply; 20+ messages in thread
From: Keith Packard @ 2012-02-08 17:36 UTC (permalink / raw)
  To: Daniel Vetter, Chris Wilson; +Cc: intel-gfx

<#part sign=pgpmime>
On Wed, 8 Feb 2012 15:54:25 +0100, Daniel Vetter <daniel@ffwll.ch> wrote:

> I'm on the fence whether we should include this in -fixes. On one hand it
> fixes a severe issue on snb, but introduces a perf regression without the
> second patch. Otoh we've never shipped snb without it broken like this.
> But if it turns out that this is broken on ilk and earlier, too, I think
> we definitely need these two patches in -fixes.

The first one is a simple bug fix -- not attempting to use the status
page value. That seems reasonable for -fixes.

The second one has a bunch of global changes which seems less suitable
for -fixes, especially as it now requires that the driver reliably track
ring positions, which seems a bit perilous to me.

-- 
keith.packard@intel.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position
  2012-02-08 17:36   ` Keith Packard
@ 2012-02-08 18:02     ` Daniel Vetter
  2012-02-08 18:17       ` Chris Wilson
  2012-02-08 19:13       ` Keith Packard
  0 siblings, 2 replies; 20+ messages in thread
From: Daniel Vetter @ 2012-02-08 18:02 UTC (permalink / raw)
  To: Keith Packard; +Cc: intel-gfx

On Wed, Feb 8, 2012 at 18:36, Keith Packard <keithp@keithp.com> wrote:
> <#part sign=pgpmime>
> On Wed, 8 Feb 2012 15:54:25 +0100, Daniel Vetter <daniel@ffwll.ch> wrote:
>
>> I'm on the fence whether we should include this in -fixes. On one hand it
>> fixes a severe issue on snb, but introduces a perf regression without the
>> second patch. Otoh we've never shipped snb without it broken like this.
>> But if it turns out that this is broken on ilk and earlier, too, I think
>> we definitely need these two patches in -fixes.
>
> The first one is a simple bug fix -- not attempting to use the status
> page value. That seems reasonable for -fixes.

The issue is that the first one introduces a pretty decent perf
regression, iirc Chris mentions something much large than 10x slowdown
on certain cairo traces on sna. So we can only merge the first one
together with the second one.

> The second one has a bunch of global changes which seems less suitable
> for -fixes, especially as it now requires that the driver reliably track
> ring positions, which seems a bit perilous to me.

Yeah, that's kinda the issue which I've failed to phrase clearly in my mail ;-)

Without this issue being confirmed on older machines I'm leaning
towards merging both patches to -next - after all snb never really
regressed because the offending commit changed behaviour from "dies as
soon as the ring wraps" to "sometimes corrupts the ring".
-Daniel
-- 
Daniel Vetter
daniel.vetter@ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] drm/i915: Record the tail at each request and use it to estimate the head
  2012-02-08 14:51   ` Daniel Vetter
@ 2012-02-08 18:06     ` Chris Wilson
  2012-02-08 18:28       ` Daniel Vetter
  2012-02-08 18:56       ` Eugeni Dodonov
  0 siblings, 2 replies; 20+ messages in thread
From: Chris Wilson @ 2012-02-08 18:06 UTC (permalink / raw)
  To: intel-gfx

By recording the location of every request in the ringbuffer, we know
that in order to retire the request the GPU must have finished reading
it and so the GPU head is now beyond the tail of the request. We can
therefore provide a conservative estimate of where the GPU is reading
from in order to avoid having to read back the ring buffer registers
when polling for space upon starting a new write into the ringbuffer.

A secondary effect is that this allows us to convert
intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
upon the single function to handle the complicated task of waiting upon
the GPU. A necessary precaution is that we need to make that wait
uninterruptible to match the existing conditions as all the callers of
intel_ring_begin() have not been audited to handle ERESTARTSYS
correctly.

By using a conservative estimate for the head, and always processing all
outstanding requests first, we prevent a race condition between using
the estimate and direct reads of I915_RING_HEAD which could result in
the value of the head going backwards, and the tail overflowing once
again. We are also careful to mark any request that we skip over in
order to free space in ring as consumed which provides a
self-consistency check.

Given sufficient abuse, such as a set of unthrottled GPU bound
cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on
Sandy Bridge (i5-2520m):
  firefox-paintball  18927ms -> 15646ms: 1.21x speedup
  firefox-fishtank   12563ms -> 11278ms: 1.11x speedup
which is a mild consolation for the performance those traces achieved from
exploiting the buggy autoreported head.

v2: Add a few more comments and make request->tail a conservative
estimate as suggested by Daniel Vetter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         |    5 ++
 drivers/gpu/drm/i915/i915_gem.c         |   17 ++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   83 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |   15 ++++++
 4 files changed, 119 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4ee0793..e8e1b85 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -945,6 +945,9 @@ struct drm_i915_gem_request {
 	/** GEM sequence number associated with this request. */
 	uint32_t seqno;
 
+	/** Postion in the ringbuffer of the end of the request */
+	u32 tail;
+
 	/** Time at which this request was emitted, in jiffies. */
 	unsigned long emitted_jiffies;
 
@@ -1234,6 +1237,8 @@ i915_gem_object_unpin_fence(struct drm_i915_gem_object *obj)
 }
 
 void i915_gem_retire_requests(struct drm_device *dev);
+void i915_gem_retire_requests_ring(struct intel_ring_buffer *ring);
+
 void i915_gem_reset(struct drm_device *dev);
 void i915_gem_clflush_object(struct drm_i915_gem_object *obj);
 int __must_check i915_gem_object_set_domain(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 51390b0..dbf30bc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1784,12 +1784,20 @@ i915_add_request(struct intel_ring_buffer *ring,
 {
 	drm_i915_private_t *dev_priv = ring->dev->dev_private;
 	uint32_t seqno;
+	u32 request_ring_position;
 	int was_empty;
 	int ret;
 
 	BUG_ON(request == NULL);
 	seqno = i915_gem_next_request_seqno(ring);
 
+	/* Record the position of the start of the request so that
+	 * should we detect the updated seqno part-way through the
+	 * GPU processing the request, we never over-estimate the
+	 * position of the head.
+	 */
+	request_ring_position = intel_ring_get_tail(ring);
+
 	ret = ring->add_request(ring, &seqno);
 	if (ret)
 	    return ret;
@@ -1798,6 +1806,7 @@ i915_add_request(struct intel_ring_buffer *ring,
 
 	request->seqno = seqno;
 	request->ring = ring;
+	request->tail = request_ring_position;
 	request->emitted_jiffies = jiffies;
 	was_empty = list_empty(&ring->request_list);
 	list_add_tail(&request->list, &ring->request_list);
@@ -1934,7 +1943,7 @@ void i915_gem_reset(struct drm_device *dev)
 /**
  * This function clears the request list as sequence numbers are passed.
  */
-static void
+void
 i915_gem_retire_requests_ring(struct intel_ring_buffer *ring)
 {
 	uint32_t seqno;
@@ -1962,6 +1971,12 @@ i915_gem_retire_requests_ring(struct intel_ring_buffer *ring)
 			break;
 
 		trace_i915_gem_request_retire(ring, request->seqno);
+		/* We know the GPU must have read the request to have
+		 * sent us the seqno + interrupt, so use the position
+		 * of tail of the request to update the last known position
+		 * of the GPU head.
+		 */
+		ring->last_retired_head = request->tail;
 
 		list_del(&request->list);
 		i915_gem_request_remove_from_client(request);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 620b2ed..ae78da9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -533,6 +533,7 @@ pc_render_add_request(struct intel_ring_buffer *ring,
 	PIPE_CONTROL_FLUSH(ring, scratch_addr);
 	scratch_addr += 128;
 	PIPE_CONTROL_FLUSH(ring, scratch_addr);
+
 	intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
 			PIPE_CONTROL_WRITE_FLUSH |
 			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
@@ -1054,11 +1055,93 @@ static int intel_wrap_ring_buffer(struct intel_ring_buffer *ring)
 	return 0;
 }
 
+static int intel_ring_wait_seqno(struct intel_ring_buffer *ring, u32 seqno)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	bool was_interruptible;
+	int ret;
+
+	/* XXX As we have not yet audited all the paths to check that
+	 * they are ready for ERESTARTSYS from intel_ring_begin, do not
+	 * allow us to be interruptible by a signal.
+	 */
+	was_interruptible = dev_priv->mm.interruptible;
+	dev_priv->mm.interruptible = false;
+
+	ret = i915_wait_request(ring, seqno);
+
+	dev_priv->mm.interruptible = was_interruptible;
+
+	return ret;
+}
+
+static int intel_ring_wait_request(struct intel_ring_buffer *ring, int n)
+{
+	struct drm_i915_gem_request *request;
+	u32 seqno = 0;
+	int ret;
+
+	i915_gem_retire_requests_ring(ring);
+
+	if (ring->last_retired_head != -1) {
+		ring->head = ring->last_retired_head;
+		ring->last_retired_head = -1;
+		ring->space = ring_space(ring);
+		if (ring->space >= n)
+			return 0;
+	}
+
+	list_for_each_entry(request, &ring->request_list, list) {
+		int space;
+
+		if (request->tail == -1)
+			continue;
+
+		space = request->tail - (ring->tail + 8);
+		if (space < 0)
+			space += ring->size;
+		if (space >= n) {
+			seqno = request->seqno;
+			break;
+		}
+
+		/* Consume this request in case we need more space than
+		 * is available and so need to prevent a race between
+		 * updating last_retired_head and direct reads of
+		 * I915_RING_HEAD. It also provides a nice sanity check.
+		 */
+		request->tail = -1;
+	}
+
+	if (seqno == 0)
+		return -ENOSPC;
+
+	ret = intel_ring_wait_seqno(ring, seqno);
+	if (ret)
+		return ret;
+
+	if (WARN_ON(ring->last_retired_head == -1))
+		return -ENOSPC;
+
+	ring->head = ring->last_retired_head;
+	ring->last_retired_head = -1;
+	ring->space = ring_space(ring);
+	if (WARN_ON(ring->space < n))
+		return -ENOSPC;
+
+	return 0;
+}
+
 int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	unsigned long end;
+	int ret;
+
+	ret = intel_ring_wait_request(ring, n);
+	if (ret != -ENOSPC)
+		return ret;
 
 	trace_i915_ring_wait_begin(ring);
 	if (drm_core_check_feature(dev, DRIVER_GEM))
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 4c5338c..855233b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -46,6 +46,16 @@ struct  intel_ring_buffer {
 	int		effective_size;
 	struct intel_hw_status_page status_page;
 
+	/** We track the position of the requests in the ring buffer, and
+	 * when each is retired we increment last_retired_head as the GPU
+	 * must have finished processing the request and so we know we
+	 * can advance the ringbuffer up to that position.
+	 *
+	 * last_retired_head is set to -1 after the value is consumed so
+	 * we can detect new retirements.
+	 */
+	u32		last_retired_head;
+
 	spinlock_t	irq_lock;
 	u32		irq_refcount;
 	u32		irq_mask;
@@ -191,6 +201,11 @@ int intel_init_blt_ring_buffer(struct drm_device *dev);
 u32 intel_ring_get_active_head(struct intel_ring_buffer *ring);
 void intel_ring_setup_status_page(struct intel_ring_buffer *ring);
 
+static inline u32 intel_ring_get_tail(struct intel_ring_buffer *ring)
+{
+	return ring->tail;
+}
+
 static inline void i915_trace_irq_get(struct intel_ring_buffer *ring, u32 seqno)
 {
 	if (ring->trace_irq_seqno == 0 && ring->irq_get(ring))
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position
  2012-02-08 18:02     ` Daniel Vetter
@ 2012-02-08 18:17       ` Chris Wilson
  2012-02-08 18:24         ` Daniel Vetter
  2012-02-08 19:13       ` Keith Packard
  1 sibling, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2012-02-08 18:17 UTC (permalink / raw)
  To: Daniel Vetter, Keith Packard; +Cc: intel-gfx

On Wed, 8 Feb 2012 19:02:47 +0100, Daniel Vetter <daniel@ffwll.ch> wrote:
> The issue is that the first one introduces a pretty decent perf
> regression, iirc Chris mentions something much large than 10x slowdown
> on certain cairo traces on sna. So we can only merge the first one
> together with the second one.

As it turns out, the cairo-traces were successfully exploiting the
ringbuffer overflow to avoid work. The reason why it eventually broke is
that when I experimented with using the BLT and semaphores, I introduced
a dependency on accurate seqno processing, which was obviously broken by
the overwriting of commands in the ringbuffer. And so is consistent with
the hangs reported by Eugeni and Ben, but we have yet to see many more
examples in the wild. Otoh, I don't think this explains the
all-generation use-after-free bug.

Net result, the 0.9s firefox-paintball result was a fantasy and the perf
gains provided by the second patch are only around 20% for the same
trace. Ergo -next material only if it passes scrutiny.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position
  2012-02-08 18:17       ` Chris Wilson
@ 2012-02-08 18:24         ` Daniel Vetter
  0 siblings, 0 replies; 20+ messages in thread
From: Daniel Vetter @ 2012-02-08 18:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Feb 08, 2012 at 06:17:00PM +0000, Chris Wilson wrote:
> On Wed, 8 Feb 2012 19:02:47 +0100, Daniel Vetter <daniel@ffwll.ch> wrote:
> > The issue is that the first one introduces a pretty decent perf
> > regression, iirc Chris mentions something much large than 10x slowdown
> > on certain cairo traces on sna. So we can only merge the first one
> > together with the second one.
> 
> As it turns out, the cairo-traces were successfully exploiting the
> ringbuffer overflow to avoid work. The reason why it eventually broke is
> that when I experimented with using the BLT and semaphores, I introduced
> a dependency on accurate seqno processing, which was obviously broken by
> the overwriting of commands in the ringbuffer. And so is consistent with
> the hangs reported by Eugeni and Ben, but we have yet to see many more
> examples in the wild. Otoh, I don't think this explains the
> all-generation use-after-free bug.
> 
> Net result, the 0.9s firefox-paintball result was a fantasy and the perf
> gains provided by the second patch are only around 20% for the same
> trace. Ergo -next material only if it passes scrutiny.

Well, that explains a few things. In that case I'm fine with just merging
the first patch for -fixes, I think there're quite a few bugs this does
fix.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/i915: Record the tail at each request and use it to estimate the head
  2012-02-08 18:06     ` [PATCH] " Chris Wilson
@ 2012-02-08 18:28       ` Daniel Vetter
  2012-02-08 18:56       ` Eugeni Dodonov
  1 sibling, 0 replies; 20+ messages in thread
From: Daniel Vetter @ 2012-02-08 18:28 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Feb 08, 2012 at 06:06:30PM +0000, Chris Wilson wrote:
> By recording the location of every request in the ringbuffer, we know
> that in order to retire the request the GPU must have finished reading
> it and so the GPU head is now beyond the tail of the request. We can
> therefore provide a conservative estimate of where the GPU is reading
> from in order to avoid having to read back the ring buffer registers
> when polling for space upon starting a new write into the ringbuffer.
> 
> A secondary effect is that this allows us to convert
> intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
> upon the single function to handle the complicated task of waiting upon
> the GPU. A necessary precaution is that we need to make that wait
> uninterruptible to match the existing conditions as all the callers of
> intel_ring_begin() have not been audited to handle ERESTARTSYS
> correctly.
> 
> By using a conservative estimate for the head, and always processing all
> outstanding requests first, we prevent a race condition between using
> the estimate and direct reads of I915_RING_HEAD which could result in
> the value of the head going backwards, and the tail overflowing once
> again. We are also careful to mark any request that we skip over in
> order to free space in ring as consumed which provides a
> self-consistency check.
> 
> Given sufficient abuse, such as a set of unthrottled GPU bound
> cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on
> Sandy Bridge (i5-2520m):
>   firefox-paintball  18927ms -> 15646ms: 1.21x speedup
>   firefox-fishtank   12563ms -> 11278ms: 1.11x speedup
> which is a mild consolation for the performance those traces achieved from
> exploiting the buggy autoreported head.
> 
> v2: Add a few more comments and make request->tail a conservative
> estimate as suggested by Daniel Vetter.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/i915: Record the tail at each request and use it to estimate the head
  2012-02-08 18:06     ` [PATCH] " Chris Wilson
  2012-02-08 18:28       ` Daniel Vetter
@ 2012-02-08 18:56       ` Eugeni Dodonov
  2012-02-08 19:21         ` Chris Wilson
  1 sibling, 1 reply; 20+ messages in thread
From: Eugeni Dodonov @ 2012-02-08 18:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 1192 bytes --]

On Wed, Feb 8, 2012 at 16:06, Chris Wilson <chris@chris-wilson.co.uk> wrote:

> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1784,12 +1784,20 @@ i915_add_request(struct intel_ring_buffer *ring,
>  {
>        drm_i915_private_t *dev_priv = ring->dev->dev_private;
>        uint32_t seqno;
> +       u32 request_ring_position;
>        int was_empty;
>        int ret;
>
>        BUG_ON(request == NULL);
>        seqno = i915_gem_next_request_seqno(ring);
>
> +       /* Record the position of the start of the request so that
> +        * should we detect the updated seqno part-way through the
> +        * GPU processing the request, we never over-estimate the
> +        * position of the head.
> +        */
> +       request_ring_position = intel_ring_get_tail(ring);
>


Perhaps a stupid question within the bikeshedding spirit, but why do we
need intel_ring_get_tail()? Wouldn't just:

request_ring_position = ring->tail;

be self-explainable? Or some additional logic could be involved there at
some point?

But in overall, I like this, so:
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>

-- 
Eugeni Dodonov
<http://eugeni.dodonov.net/>

[-- Attachment #1.2: Type: text/html, Size: 1669 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position
  2012-02-08 18:02     ` Daniel Vetter
  2012-02-08 18:17       ` Chris Wilson
@ 2012-02-08 19:13       ` Keith Packard
  1 sibling, 0 replies; 20+ messages in thread
From: Keith Packard @ 2012-02-08 19:13 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

<#part sign=pgpmime>
On Wed, 8 Feb 2012 19:02:47 +0100, Daniel Vetter <daniel@ffwll.ch> wrote:

> The issue is that the first one introduces a pretty decent perf
> regression, iirc Chris mentions something much large than 10x slowdown
> on certain cairo traces on sna. So we can only merge the first one
> together with the second one.

Is there any effect on mesa or uxa?

> Yeah, that's kinda the issue which I've failed to phrase clearly in my mail ;-)

Frightening patches with that kind of global scope need a reason far
more compelling than a performance regression caused by a separate
correctness fix.

If we need the correctness patch to resolve a known problem (especially
a regression), then we'll take the performance hit and fix that in the
next release.

> Without this issue being confirmed on older machines I'm leaning
> towards merging both patches to -next - after all snb never really
> regressed because the offending commit changed behaviour from "dies as
> soon as the ring wraps" to "sometimes corrupts the ring".

If we don't see any significant issues in 'normal' usage (mesa and uxa),
then I agree, we should push both patches to -next.

-- 
keith.packard@intel.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] drm/i915: Record the tail at each request and use it to estimate the head
  2012-02-08 18:56       ` Eugeni Dodonov
@ 2012-02-08 19:21         ` Chris Wilson
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2012-02-08 19:21 UTC (permalink / raw)
  To: Eugeni Dodonov; +Cc: intel-gfx

On Wed, 8 Feb 2012 16:56:30 -0200, Eugeni Dodonov <eugeni@dodonov.net> wrote:
> On Wed, Feb 8, 2012 at 16:06, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > +       /* Record the position of the start of the request so that
> > +        * should we detect the updated seqno part-way through the
> > +        * GPU processing the request, we never over-estimate the
> > +        * position of the head.
> > +        */
> > +       request_ring_position = intel_ring_get_tail(ring);
> >
> Perhaps a stupid question within the bikeshedding spirit, but why do we
> need intel_ring_get_tail()? Wouldn't just:
> 
> request_ring_position = ring->tail;
> 
> be self-explainable? Or some additional logic could be involved there at
> some point?

It is to try and preserve the illusion of a ringbuffer abstraction; the
position of the request within the ring should be an internal detail of
the ring but we are leaking it for our purposes. The use of a function there
serves as a reminder of that leak (and easier to find than ring->tail) in
the hope that someone will clean up the entire ringbuffer API and make the
issue moot.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/2] drm/i915: Record the in-flight requests at the time of a hang
  2012-02-08 14:54 ` [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position Daniel Vetter
  2012-02-08 17:36   ` Keith Packard
@ 2012-02-08 20:09   ` Chris Wilson
  2012-02-08 20:09     ` [PATCH 2/2] drm/i915: Record the position of the request upon error Chris Wilson
  2012-02-08 22:33     ` [PATCH 1/2] drm/i915: Record the in-flight requests at the time of a hang Daniel Vetter
  1 sibling, 2 replies; 20+ messages in thread
From: Chris Wilson @ 2012-02-08 20:09 UTC (permalink / raw)
  To: intel-gfx

Being able to tally the list of outstanding requests with the sequence
of commands in the ringbuffer is often useful evidence with respect to
driver corruption.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c |   24 +++++++++----
 drivers/gpu/drm/i915/i915_drv.h     |   17 +++++++---
 drivers/gpu/drm/i915/i915_irq.c     |   63 ++++++++++++++++++++++++++---------
 3 files changed, 75 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 88bda69..28089f7 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -745,7 +745,7 @@ static int i915_error_state(struct seq_file *m, void *unused)
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct drm_i915_error_state *error;
 	unsigned long flags;
-	int i, page, offset, elt;
+	int i, j, page, offset, elt;
 
 	spin_lock_irqsave(&dev_priv->error_lock, flags);
 	error = dev_priv->first_error;
@@ -789,10 +789,10 @@ static int i915_error_state(struct seq_file *m, void *unused)
 				    error->pinned_bo,
 				    error->pinned_bo_count);
 
-	for (i = 0; i < ARRAY_SIZE(error->batchbuffer); i++) {
-		if (error->batchbuffer[i]) {
-			struct drm_i915_error_object *obj = error->batchbuffer[i];
+	for (i = 0; i < ARRAY_SIZE(error->ring); i++) {
+		struct drm_i915_error_object *obj;
 
+		if ((obj = error->ring[i].batchbuffer)) {
 			seq_printf(m, "%s --- gtt_offset = 0x%08x\n",
 				   dev_priv->ring[i].name,
 				   obj->gtt_offset);
@@ -804,11 +804,19 @@ static int i915_error_state(struct seq_file *m, void *unused)
 				}
 			}
 		}
-	}
 
-	for (i = 0; i < ARRAY_SIZE(error->ringbuffer); i++) {
-		if (error->ringbuffer[i]) {
-			struct drm_i915_error_object *obj = error->ringbuffer[i];
+		if (error->ring[i].num_requests) {
+			seq_printf(m, "%s --- %d requests\n",
+				   dev_priv->ring[i].name,
+				   error->ring[i].num_requests);
+			for (j = 0; j < error->ring[i].num_requests; j++) {
+				seq_printf(m, "%d: emitted %ld\n",
+					  error->ring[i].requests[j].seqno,
+					  error->ring[i].requests[j].jiffies);
+			}
+		}
+
+		if ((obj = error->ring[i].ringbuffer)) {
 			seq_printf(m, "%s --- ringbuffer = 0x%08x\n",
 				   dev_priv->ring[i].name,
 				   obj->gtt_offset);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e8e1b85..8f6ae11 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -172,11 +172,18 @@ struct drm_i915_error_state {
 	u32 faddr[I915_NUM_RINGS];
 	u64 fence[I915_MAX_NUM_FENCES];
 	struct timeval time;
-	struct drm_i915_error_object {
-		int page_count;
-		u32 gtt_offset;
-		u32 *pages[0];
-	} *ringbuffer[I915_NUM_RINGS], *batchbuffer[I915_NUM_RINGS];
+	struct drm_i915_error_ring {
+		struct drm_i915_error_object {
+			int page_count;
+			u32 gtt_offset;
+			u32 *pages[0];
+		} *ringbuffer, *batchbuffer;
+		struct drm_i915_error_request {
+			long jiffies;
+			u32 seqno;
+		} *requests;
+		int num_requests;
+	} ring[I915_NUM_RINGS];
 	struct drm_i915_error_buffer {
 		u32 size;
 		u32 name;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 6266f57..24f8928 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -790,11 +790,11 @@ i915_error_state_free(struct kref *error_ref)
 							  typeof(*error), ref);
 	int i;
 
-	for (i = 0; i < ARRAY_SIZE(error->batchbuffer); i++)
-		i915_error_object_free(error->batchbuffer[i]);
-
-	for (i = 0; i < ARRAY_SIZE(error->ringbuffer); i++)
-		i915_error_object_free(error->ringbuffer[i]);
+	for (i = 0; i < ARRAY_SIZE(error->ring); i++) {
+		i915_error_object_free(error->ring[i].batchbuffer);
+		i915_error_object_free(error->ring[i].ringbuffer);
+		kfree(error->ring[i].requests);
+	}
 
 	kfree(error->active_bo);
 	kfree(error->overlay);
@@ -929,6 +929,47 @@ static void i915_record_ring_state(struct drm_device *dev,
 	error->tail[ring->id] = I915_READ_TAIL(ring);
 }
 
+static void i915_gem_record_rings(struct drm_device *dev,
+				  struct drm_i915_error_state *error)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_request *request;
+	int i, count;
+
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		struct intel_ring_buffer *ring = &dev_priv->ring[i];
+
+		error->ring[i].batchbuffer =
+			i915_error_first_batchbuffer(dev_priv, ring);
+
+		error->ring[i].ringbuffer =
+			i915_error_object_create(dev_priv, ring->obj);
+
+		count = 0;
+		list_for_each_entry(request, &ring->request_list, list)
+			count++;
+
+		error->ring[i].num_requests = count;
+		error->ring[i].requests =
+			kcalloc(count,
+				sizeof(struct drm_i915_error_request),
+				GFP_ATOMIC);
+		if (error->ring[i].requests == NULL) {
+			error->ring[i].requests = 0;
+			continue;
+		}
+
+		count = 0;
+		list_for_each_entry(request, &ring->request_list, list) {
+			struct drm_i915_error_request *erq;
+
+			erq = &error->ring[i].requests[count++];
+			erq->seqno = request->seqno;
+			erq->jiffies = request->emitted_jiffies;
+		}
+	}
+}
+
 /**
  * i915_capture_error_state - capture an error record for later analysis
  * @dev: drm device
@@ -980,17 +1021,7 @@ static void i915_capture_error_state(struct drm_device *dev)
 		i915_record_ring_state(dev, error, &dev_priv->ring[VCS]);
 
 	i915_gem_record_fences(dev, error);
-
-	/* Record the active batch and ring buffers */
-	for (i = 0; i < I915_NUM_RINGS; i++) {
-		error->batchbuffer[i] =
-			i915_error_first_batchbuffer(dev_priv,
-						     &dev_priv->ring[i]);
-
-		error->ringbuffer[i] =
-			i915_error_object_create(dev_priv,
-						 dev_priv->ring[i].obj);
-	}
+	i915_gem_record_rings(dev, error);
 
 	/* Record buffers on the active and pinned lists. */
 	error->active_bo = NULL;
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/2] drm/i915: Record the position of the request upon error
  2012-02-08 20:09   ` [PATCH 1/2] drm/i915: Record the in-flight requests at the time of a hang Chris Wilson
@ 2012-02-08 20:09     ` Chris Wilson
  2012-02-08 22:34       ` Daniel Vetter
  2012-02-10 14:51       ` Eugeni Dodonov
  2012-02-08 22:33     ` [PATCH 1/2] drm/i915: Record the in-flight requests at the time of a hang Daniel Vetter
  1 sibling, 2 replies; 20+ messages in thread
From: Chris Wilson @ 2012-02-08 20:09 UTC (permalink / raw)
  To: intel-gfx

So that we can tally the request against the command sequence in the
ringbuffer, or merely jump to the interesting locations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c |    5 +++--
 drivers/gpu/drm/i915/i915_drv.h     |    1 +
 drivers/gpu/drm/i915/i915_irq.c     |    1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 28089f7..4c1c47d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -810,9 +810,10 @@ static int i915_error_state(struct seq_file *m, void *unused)
 				   dev_priv->ring[i].name,
 				   error->ring[i].num_requests);
 			for (j = 0; j < error->ring[i].num_requests; j++) {
-				seq_printf(m, "%d: emitted %ld\n",
+				seq_printf(m, "%d: emitted %ld, tail %d\n",
 					  error->ring[i].requests[j].seqno,
-					  error->ring[i].requests[j].jiffies);
+					  error->ring[i].requests[j].jiffies,
+					  error->ring[i].requests[j].tail);
 			}
 		}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8f6ae11..3ab593b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -181,6 +181,7 @@ struct drm_i915_error_state {
 		struct drm_i915_error_request {
 			long jiffies;
 			u32 seqno;
+			u32 tail;
 		} *requests;
 		int num_requests;
 	} ring[I915_NUM_RINGS];
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 24f8928..0d28e87 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -966,6 +966,7 @@ static void i915_gem_record_rings(struct drm_device *dev,
 			erq = &error->ring[i].requests[count++];
 			erq->seqno = request->seqno;
 			erq->jiffies = request->emitted_jiffies;
+			erq->tail = request->tail;
 		}
 	}
 }
-- 
1.7.9

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Record the in-flight requests at the time of a hang
  2012-02-08 20:09   ` [PATCH 1/2] drm/i915: Record the in-flight requests at the time of a hang Chris Wilson
  2012-02-08 20:09     ` [PATCH 2/2] drm/i915: Record the position of the request upon error Chris Wilson
@ 2012-02-08 22:33     ` Daniel Vetter
  1 sibling, 0 replies; 20+ messages in thread
From: Daniel Vetter @ 2012-02-08 22:33 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Feb 08, 2012 at 08:09:27PM +0000, Chris Wilson wrote:
> Being able to tally the list of outstanding requests with the sequence
> of commands in the ringbuffer is often useful evidence with respect to
> driver corruption.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I like this and I think we can nicely put some more robust
request->batchbuffer_bo tracking on top of this instead of the current
broken heuristics. Two bikeshed comments
- consisten seqno pretty-printing in the debugfs file
- some mention that you've gone ahead and refactored the per-ring stuff
  quite a bit might be good. It confused me quite a bit until I've
  noticed what you've done and found the actual change ...

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] drm/i915: Record the position of the request upon error
  2012-02-08 20:09     ` [PATCH 2/2] drm/i915: Record the position of the request upon error Chris Wilson
@ 2012-02-08 22:34       ` Daniel Vetter
  2012-02-10 14:51       ` Eugeni Dodonov
  1 sibling, 0 replies; 20+ messages in thread
From: Daniel Vetter @ 2012-02-08 22:34 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Feb 08, 2012 at 08:09:28PM +0000, Chris Wilson wrote:
> So that we can tally the request against the command sequence in the
> ringbuffer, or merely jump to the interesting locations.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I like the level of paranoia this is approaching ;-)
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] drm/i915: Record the position of the request upon error
  2012-02-08 20:09     ` [PATCH 2/2] drm/i915: Record the position of the request upon error Chris Wilson
  2012-02-08 22:34       ` Daniel Vetter
@ 2012-02-10 14:51       ` Eugeni Dodonov
  1 sibling, 0 replies; 20+ messages in thread
From: Eugeni Dodonov @ 2012-02-10 14:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 392 bytes --]

On Wed, Feb 8, 2012 at 18:09, Chris Wilson <chris@chris-wilson.co.uk> wrote:

> So that we can tally the request against the command sequence in the
> ringbuffer, or merely jump to the interesting locations.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>

For both patches:
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>

-- 
Eugeni Dodonov
<http://eugeni.dodonov.net/>

[-- Attachment #1.2: Type: text/html, Size: 849 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position
  2012-02-08 13:34 [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position Chris Wilson
  2012-02-08 13:34 ` [PATCH 2/2] drm/i915: Record the tail at each request and use it to estimate the head Chris Wilson
  2012-02-08 14:54 ` [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position Daniel Vetter
@ 2012-02-21 19:23 ` Eric Anholt
  2012-02-27 16:53 ` Jesse Barnes
  3 siblings, 0 replies; 20+ messages in thread
From: Eric Anholt @ 2012-02-21 19:23 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 712 bytes --]

On Wed,  8 Feb 2012 13:34:13 +0000, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> This is a revert of 6aa56062eaba67adfb247cded244fd877329588d.
> 
> This was originally introduced to workaround reads of the ringbuffer
> registers returning 0 on SandyBridge causing hangs due to ringbuffer
> overflow. The root cause here was reads through the GT powerwell require
> the forcewake dance, something we only learnt of later. Now it appears
> that reading the reported head position from the HWS is returning
> garbage, leading once again to hangs.

Tested-by: Eric Anholt <eric@anholt.net>

(on top of de67cba65944f26c0f147035bd62e30c5f456b96, this fixes
rendering failures and GPU hangs on gen7).

[-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position
  2012-02-08 13:34 [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position Chris Wilson
                   ` (2 preceding siblings ...)
  2012-02-21 19:23 ` [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position Eric Anholt
@ 2012-02-27 16:53 ` Jesse Barnes
  3 siblings, 0 replies; 20+ messages in thread
From: Jesse Barnes @ 2012-02-27 16:53 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 698 bytes --]

On Wed,  8 Feb 2012 13:34:13 +0000
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> This is a revert of 6aa56062eaba67adfb247cded244fd877329588d.
> 
> This was originally introduced to workaround reads of the ringbuffer
> registers returning 0 on SandyBridge causing hangs due to ringbuffer
> overflow. The root cause here was reads through the GT powerwell require
> the forcewake dance, something we only learnt of later. Now it appears
> that reading the reported head position from the HWS is returning
> garbage, leading once again to hangs.

Sorry meant to apply this early last week... on its way to Dave now.

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-02-27 16:53 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-02-08 13:34 [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position Chris Wilson
2012-02-08 13:34 ` [PATCH 2/2] drm/i915: Record the tail at each request and use it to estimate the head Chris Wilson
2012-02-08 14:51   ` Daniel Vetter
2012-02-08 18:06     ` [PATCH] " Chris Wilson
2012-02-08 18:28       ` Daniel Vetter
2012-02-08 18:56       ` Eugeni Dodonov
2012-02-08 19:21         ` Chris Wilson
2012-02-08 14:54 ` [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position Daniel Vetter
2012-02-08 17:36   ` Keith Packard
2012-02-08 18:02     ` Daniel Vetter
2012-02-08 18:17       ` Chris Wilson
2012-02-08 18:24         ` Daniel Vetter
2012-02-08 19:13       ` Keith Packard
2012-02-08 20:09   ` [PATCH 1/2] drm/i915: Record the in-flight requests at the time of a hang Chris Wilson
2012-02-08 20:09     ` [PATCH 2/2] drm/i915: Record the position of the request upon error Chris Wilson
2012-02-08 22:34       ` Daniel Vetter
2012-02-10 14:51       ` Eugeni Dodonov
2012-02-08 22:33     ` [PATCH 1/2] drm/i915: Record the in-flight requests at the time of a hang Daniel Vetter
2012-02-21 19:23 ` [PATCH 1/2] drm/i915: Remove use of the autoreported ringbuffer HEAD position Eric Anholt
2012-02-27 16:53 ` Jesse Barnes

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.