All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled
@ 2017-02-12 17:19 Chris Wilson
  2017-02-12 17:19 ` [PATCH 2/5] drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare() Chris Wilson
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Chris Wilson @ 2017-02-12 17:19 UTC (permalink / raw)
  To: intel-gfx

As the request is not complete, it should not be signaled. Assert that
this is true before we process the request for a reset.

References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b8d869d7937d..48922ff454e6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2611,6 +2611,8 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 			continue;
 
 		GEM_BUG_ON(request->engine != engine);
+		GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+				    &request->fence.flags));
 		return request;
 	}
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/5] drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare()
  2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
@ 2017-02-12 17:19 ` Chris Wilson
  2017-02-13 10:22   ` Mika Kuoppala
  2017-02-12 17:20 ` [PATCH 3/5] drm/i915: Kill the tasklet then disable Chris Wilson
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2017-02-12 17:19 UTC (permalink / raw)
  To: intel-gfx

As i915_gem_reset_finish() undoes the steps from
i915_gem_reset_prepare() to leave the system in a fully-working state,
e.g. to be able to free the breadcrumb signal threads, make sure that we
always call it even on the error path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 555b61bb8a63..1bc2cd3adc84 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1873,10 +1873,10 @@ void i915_reset(struct drm_i915_private *dev_priv)
 		goto error;
 	}
 
-	i915_gem_reset_finish(dev_priv);
 	i915_queue_hangcheck(dev_priv);
 
 wakeup:
+	i915_gem_reset_finish(dev_priv);
 	enable_irq(dev_priv->drm.irq);
 	wake_up_bit(&error->flags, I915_RESET_IN_PROGRESS);
 	return;
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/5] drm/i915: Kill the tasklet then disable
  2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
  2017-02-12 17:19 ` [PATCH 2/5] drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare() Chris Wilson
@ 2017-02-12 17:20 ` Chris Wilson
  2017-02-13 10:30   ` Mika Kuoppala
  2017-02-12 17:20 ` [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset Chris Wilson
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2017-02-12 17:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

Disabling the tasklet leaves it if scheduled on the ready to run list
until it is re-enabled. This will leave the ksoftird thread spinning
until satisfied. To prevent this situation on starting the GPU reset, we
want to kill the tasklet first and then disable. The same problem will
arise when a tasklet is scheduled from another device, so a better
solution is required for the general case.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1f7b847d72c3 ("drm/i915: Disable engine->irq_tasklet around resets")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 48922ff454e6..3c2c2296c1dc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2651,8 +2651,8 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
 		 * Turning off the engine->irq_tasklet until the reset is over
 		 * prevents the race.
 		 */
-		tasklet_disable(&engine->irq_tasklet);
 		tasklet_kill(&engine->irq_tasklet);
+		tasklet_disable(&engine->irq_tasklet);
 
 		if (engine->irq_seqno_barrier)
 			engine->irq_seqno_barrier(engine);
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset
  2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
  2017-02-12 17:19 ` [PATCH 2/5] drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare() Chris Wilson
  2017-02-12 17:20 ` [PATCH 3/5] drm/i915: Kill the tasklet then disable Chris Wilson
@ 2017-02-12 17:20 ` Chris Wilson
  2017-02-13 10:03   ` Mika Kuoppala
  2017-02-13 10:16   ` Chris Wilson
  2017-02-12 17:20 ` [PATCH 5/5] drm/i915: Clear the last_retired_context following a hang/reset Chris Wilson
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 14+ messages in thread
From: Chris Wilson @ 2017-02-12 17:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

The signal threads may be running concurrently with the GPU reset. The
completion from the GPU run asynchronous with the reset and two threads
may see different snapshots of the state, and the signaler may mark a
request as complete as we try to reset it. We don't tolerate 2 different
views of the same state and complain if we try to mark a request as
failed if it is already complete. Disable the signal threads during
reset to prevent this conflict (even though the conflict implies that
the state we resetting to is invalid, we have already made our
decision!).

References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c          | 16 +++++++++++++++-
 drivers/gpu/drm/i915/intel_breadcrumbs.c |  3 +++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3c2c2296c1dc..730804ce9610 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -35,6 +35,7 @@
 #include "intel_frontbuffer.h"
 #include "intel_mocs.h"
 #include <linux/dma-fence-array.h>
+#include <linux/kthread.h>
 #include <linux/reservation.h>
 #include <linux/shmem_fs.h>
 #include <linux/slab.h>
@@ -2643,6 +2644,17 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
 	for_each_engine(engine, dev_priv, id) {
 		struct drm_i915_gem_request *request;
 
+		/* Prevent the signaler thread from updating the request
+		 * state (by calling dma_fence_signal) as we are processing
+		 * the reset. The write from the GPU of the seqno is
+		 * asynchronous and the signaler thread may see a different
+		 * value to us and declare the request complete, even though
+		 * the reset routine have picked that request as the active
+		 * (incomplete) request. This conflict is not handled
+		 * gracefully!
+		 */
+		kthread_park(engine->breadcrumbs.signaler);
+
 		/* Prevent request submission to the hardware until we have
 		 * completed the reset in i915_gem_reset_finish(). If a request
 		 * is completed by one engine, it may then queue a request
@@ -2796,8 +2808,10 @@ void i915_gem_reset_finish(struct drm_i915_private *dev_priv)
 
 	lockdep_assert_held(&dev_priv->drm.struct_mutex);
 
-	for_each_engine(engine, dev_priv, id)
+	for_each_engine(engine, dev_priv, id) {
 		tasklet_enable(&engine->irq_tasklet);
+		kthread_unpark(engine->breadcrumbs.signaler);
+	}
 }
 
 static void nop_submit_request(struct drm_i915_gem_request *request)
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 9fd002bcebb6..f5e05343110a 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -490,6 +490,9 @@ static int intel_breadcrumbs_signaler(void *arg)
 				break;
 
 			schedule();
+
+			if (kthread_should_park())
+				kthread_parkme();
 		}
 	} while (1);
 	__set_current_state(TASK_RUNNING);
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/5] drm/i915: Clear the last_retired_context following a hang/reset
  2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
                   ` (2 preceding siblings ...)
  2017-02-12 17:20 ` [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset Chris Wilson
@ 2017-02-12 17:20 ` Chris Wilson
  2017-02-13 10:55   ` Mika Kuoppala
  2017-02-12 17:52 ` ✗ Fi.CI.BAT: failure for series starting with [1/5] drm/i915: Assert that the active request hasn't been signaled Patchwork
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2017-02-12 17:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

Following a hang and reset, we know that the engine is idle and all
context state has been saved or lost. Consequently, we know that the
engine is no longer referencing the last context and we can relinquish
our tracking.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
---
 drivers/gpu/drm/i915/i915_gem.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 730804ce9610..638b335e72b6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2788,8 +2788,14 @@ void i915_gem_reset(struct drm_i915_private *dev_priv)
 
 	i915_gem_retire_requests(dev_priv);
 
-	for_each_engine(engine, dev_priv, id)
+	for_each_engine(engine, dev_priv, id) {
+		struct i915_gem_context *ctx;
+
 		i915_gem_reset_engine(engine);
+		ctx = fetch_and_zero(&engine->last_retired_context);
+		if (ctx)
+			engine->context_unpin(engine, ctx);
+	}
 
 	i915_gem_restore_fences(dev_priv);
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [1/5] drm/i915: Assert that the active request hasn't been signaled
  2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
                   ` (3 preceding siblings ...)
  2017-02-12 17:20 ` [PATCH 5/5] drm/i915: Clear the last_retired_context following a hang/reset Chris Wilson
@ 2017-02-12 17:52 ` Patchwork
  2017-02-13  9:38 ` [PATCH 1/5] " Chris Wilson
  2017-02-13 10:52 ` ✓ Fi.CI.BAT: success for series starting with [1/5] " Patchwork
  6 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2017-02-12 17:52 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/5] drm/i915: Assert that the active request hasn't been signaled
URL   : https://patchwork.freedesktop.org/series/19518/
State : failure

== Summary ==

Series 19518v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/19518/revisions/1/mbox/

Test kms_force_connector_basic:
        Subgroup prune-stale-modes:
                pass       -> SKIP       (fi-ivb-3770)
Test kms_pipe_crc_basic:
        Subgroup nonblocking-crc-pipe-a-frame-sequence:
                pass       -> FAIL       (fi-skl-6700hq)

fi-bdw-5557u     total:252  pass:241  dwarn:0   dfail:0   fail:0   skip:11 
fi-bsw-n3050     total:252  pass:213  dwarn:0   dfail:0   fail:0   skip:39 
fi-bxt-j4205     total:252  pass:233  dwarn:0   dfail:0   fail:0   skip:19 
fi-bxt-t5700     total:83   pass:70   dwarn:0   dfail:0   fail:0   skip:12 
fi-byt-j1900     total:252  pass:225  dwarn:0   dfail:0   fail:0   skip:27 
fi-byt-n2820     total:252  pass:221  dwarn:0   dfail:0   fail:0   skip:31 
fi-hsw-4770      total:252  pass:236  dwarn:0   dfail:0   fail:0   skip:16 
fi-hsw-4770r     total:252  pass:236  dwarn:0   dfail:0   fail:0   skip:16 
fi-ilk-650       total:252  pass:202  dwarn:0   dfail:0   fail:0   skip:50 
fi-ivb-3520m     total:252  pass:234  dwarn:0   dfail:0   fail:0   skip:18 
fi-ivb-3770      total:252  pass:233  dwarn:0   dfail:0   fail:0   skip:19 
fi-kbl-7500u     total:252  pass:234  dwarn:0   dfail:0   fail:0   skip:18 
fi-skl-6260u     total:252  pass:242  dwarn:0   dfail:0   fail:0   skip:10 
fi-skl-6700hq    total:252  pass:234  dwarn:0   dfail:0   fail:1   skip:17 
fi-skl-6700k     total:252  pass:230  dwarn:4   dfail:0   fail:0   skip:18 
fi-skl-6770hq    total:252  pass:242  dwarn:0   dfail:0   fail:0   skip:10 
fi-snb-2520m     total:252  pass:224  dwarn:0   dfail:0   fail:0   skip:28 
fi-snb-2600      total:252  pass:223  dwarn:0   dfail:0   fail:0   skip:29 

f48164e93e75dbbe0bb9dac254779eeea08c0657 drm-tip: 2017y-02m-12d-11h-10m-28s UTC integration manifest
68d06e7 drm/i915: Clear the last_retired_context following a hang/reset
218d5de drm/i915: Park the breadcrumbs signaler across a GPU reset
7bf5713 drm/i915: Kill the tasklet then disable
fa51a2d drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare()
76305ef drm/i915: Assert that the active request hasn't been signaled

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3784/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled
  2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
                   ` (4 preceding siblings ...)
  2017-02-12 17:52 ` ✗ Fi.CI.BAT: failure for series starting with [1/5] drm/i915: Assert that the active request hasn't been signaled Patchwork
@ 2017-02-13  9:38 ` Chris Wilson
  2017-02-13 10:52 ` ✓ Fi.CI.BAT: success for series starting with [1/5] " Patchwork
  6 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2017-02-13  9:38 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

On Sun, Feb 12, 2017 at 05:19:58PM +0000, Chris Wilson wrote:
> As the request is not complete, it should not be signaled. Assert that
> this is true before we process the request for a reset.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b8d869d7937d..48922ff454e6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2611,6 +2611,8 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
>  			continue;
>  
>  		GEM_BUG_ON(request->engine != engine);
> +		GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> +				    &request->fence.flags));
>  		return request;
>  	}
>  
> -- 
> 2.11.0
> 

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset
  2017-02-12 17:20 ` [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset Chris Wilson
@ 2017-02-13 10:03   ` Mika Kuoppala
  2017-02-13 10:16   ` Chris Wilson
  1 sibling, 0 replies; 14+ messages in thread
From: Mika Kuoppala @ 2017-02-13 10:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> The signal threads may be running concurrently with the GPU reset. The
> completion from the GPU run asynchronous with the reset and two threads
> may see different snapshots of the state, and the signaler may mark a
> request as complete as we try to reset it. We don't tolerate 2 different
> views of the same state and complain if we try to mark a request as
> failed if it is already complete. Disable the signal threads during
> reset to prevent this conflict (even though the conflict implies that
> the state we resetting to is invalid, we have already made our
> decision!).
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c          | 16 +++++++++++++++-
>  drivers/gpu/drm/i915/intel_breadcrumbs.c |  3 +++
>  2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3c2c2296c1dc..730804ce9610 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -35,6 +35,7 @@
>  #include "intel_frontbuffer.h"
>  #include "intel_mocs.h"
>  #include <linux/dma-fence-array.h>
> +#include <linux/kthread.h>
>  #include <linux/reservation.h>
>  #include <linux/shmem_fs.h>
>  #include <linux/slab.h>
> @@ -2643,6 +2644,17 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
>  	for_each_engine(engine, dev_priv, id) {
>  		struct drm_i915_gem_request *request;
>  
> +		/* Prevent the signaler thread from updating the request
> +		 * state (by calling dma_fence_signal) as we are processing
> +		 * the reset. The write from the GPU of the seqno is
> +		 * asynchronous and the signaler thread may see a different
> +		 * value to us and declare the request complete, even though
> +		 * the reset routine have picked that request as the active
> +		 * (incomplete) request. This conflict is not handled
> +		 * gracefully!
> +		 */
> +		kthread_park(engine->breadcrumbs.signaler);
> +
>  		/* Prevent request submission to the hardware until we have
>  		 * completed the reset in i915_gem_reset_finish(). If a request
>  		 * is completed by one engine, it may then queue a request
> @@ -2796,8 +2808,10 @@ void i915_gem_reset_finish(struct drm_i915_private *dev_priv)
>  
>  	lockdep_assert_held(&dev_priv->drm.struct_mutex);
>  
> -	for_each_engine(engine, dev_priv, id)
> +	for_each_engine(engine, dev_priv, id) {
>  		tasklet_enable(&engine->irq_tasklet);
> +		kthread_unpark(engine->breadcrumbs.signaler);
> +	}
>  }
>  
>  static void nop_submit_request(struct drm_i915_gem_request *request)
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 9fd002bcebb6..f5e05343110a 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -490,6 +490,9 @@ static int intel_breadcrumbs_signaler(void *arg)
>  				break;
>  
>  			schedule();
> +
> +			if (kthread_should_park())
> +				kthread_parkme();
>  		}
>  	} while (1);
>  	__set_current_state(TASK_RUNNING);
> -- 
> 2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset
  2017-02-12 17:20 ` [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset Chris Wilson
  2017-02-13 10:03   ` Mika Kuoppala
@ 2017-02-13 10:16   ` Chris Wilson
  1 sibling, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2017-02-13 10:16 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

On Sun, Feb 12, 2017 at 05:20:01PM +0000, Chris Wilson wrote:
> The signal threads may be running concurrently with the GPU reset. The
> completion from the GPU run asynchronous with the reset and two threads
> may see different snapshots of the state, and the signaler may mark a
> request as complete as we try to reset it. We don't tolerate 2 different
> views of the same state and complain if we try to mark a request as
> failed if it is already complete. Disable the signal threads during
> reset to prevent this conflict (even though the conflict implies that
> the state we resetting to is invalid, we have already made our
> decision!).
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/5] drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare()
  2017-02-12 17:19 ` [PATCH 2/5] drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare() Chris Wilson
@ 2017-02-13 10:22   ` Mika Kuoppala
  0 siblings, 0 replies; 14+ messages in thread
From: Mika Kuoppala @ 2017-02-13 10:22 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> As i915_gem_reset_finish() undoes the steps from
> i915_gem_reset_prepare() to leave the system in a fully-working state,
> e.g. to be able to free the breadcrumb signal threads, make sure that we
> always call it even on the error path.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

As disable_irq is our guard against rescheduling the tasklet,
how about moving the disable/enable_irq() pair onto
prepare and finish() functions?

-Mika

> ---
>  drivers/gpu/drm/i915/i915_drv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 555b61bb8a63..1bc2cd3adc84 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1873,10 +1873,10 @@ void i915_reset(struct drm_i915_private *dev_priv)
>  		goto error;
>  	}
>  
> -	i915_gem_reset_finish(dev_priv);
>  	i915_queue_hangcheck(dev_priv);
>  
>  wakeup:
> +	i915_gem_reset_finish(dev_priv);
>  	enable_irq(dev_priv->drm.irq);
>  	wake_up_bit(&error->flags, I915_RESET_IN_PROGRESS);
>  	return;
> -- 
> 2.11.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] drm/i915: Kill the tasklet then disable
  2017-02-12 17:20 ` [PATCH 3/5] drm/i915: Kill the tasklet then disable Chris Wilson
@ 2017-02-13 10:30   ` Mika Kuoppala
  0 siblings, 0 replies; 14+ messages in thread
From: Mika Kuoppala @ 2017-02-13 10:30 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Disabling the tasklet leaves it if scheduled on the ready to run list
> until it is re-enabled. This will leave the ksoftird thread spinning
> until satisfied. To prevent this situation on starting the GPU reset, we
> want to kill the tasklet first and then disable. The same problem will
> arise when a tasklet is scheduled from another device, so a better
> solution is required for the general case.
>
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Fixes: 1f7b847d72c3 ("drm/i915: Disable engine->irq_tasklet around resets")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 48922ff454e6..3c2c2296c1dc 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2651,8 +2651,8 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
>  		 * Turning off the engine->irq_tasklet until the reset is over
>  		 * prevents the race.
>  		 */
> -		tasklet_disable(&engine->irq_tasklet);
>  		tasklet_kill(&engine->irq_tasklet);
> +		tasklet_disable(&engine->irq_tasklet);
>  
>  		if (engine->irq_seqno_barrier)
>  			engine->irq_seqno_barrier(engine);
> -- 
> 2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [1/5] drm/i915: Assert that the active request hasn't been signaled
  2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
                   ` (5 preceding siblings ...)
  2017-02-13  9:38 ` [PATCH 1/5] " Chris Wilson
@ 2017-02-13 10:52 ` Patchwork
  6 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2017-02-13 10:52 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/5] drm/i915: Assert that the active request hasn't been signaled
URL   : https://patchwork.freedesktop.org/series/19518/
State : success

== Summary ==

Series 19518v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/19518/revisions/1/mbox/

Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-b:
                incomplete -> PASS       (fi-skl-6260u) FDO#99750
Test pm_rpm:
        Subgroup basic-pci-d3-state:
                incomplete -> PASS       (fi-byt-n2820) FDO#99740

FDO#99750 https://bugs.freedesktop.org/show_bug.cgi?id=99750
FDO#99740 https://bugs.freedesktop.org/show_bug.cgi?id=99740

fi-bdw-5557u     total:252  pass:241  dwarn:0   dfail:0   fail:0   skip:11 
fi-bsw-n3050     total:252  pass:213  dwarn:0   dfail:0   fail:0   skip:39 
fi-bxt-j4205     total:252  pass:233  dwarn:0   dfail:0   fail:0   skip:19 
fi-bxt-t5700     total:83   pass:70   dwarn:0   dfail:0   fail:0   skip:12 
fi-byt-j1900     total:252  pass:225  dwarn:0   dfail:0   fail:0   skip:27 
fi-byt-n2820     total:252  pass:221  dwarn:0   dfail:0   fail:0   skip:31 
fi-hsw-4770      total:252  pass:236  dwarn:0   dfail:0   fail:0   skip:16 
fi-hsw-4770r     total:252  pass:236  dwarn:0   dfail:0   fail:0   skip:16 
fi-ilk-650       total:252  pass:202  dwarn:0   dfail:0   fail:0   skip:50 
fi-ivb-3520m     total:252  pass:234  dwarn:0   dfail:0   fail:0   skip:18 
fi-ivb-3770      total:252  pass:234  dwarn:0   dfail:0   fail:0   skip:18 
fi-kbl-7500u     total:252  pass:234  dwarn:0   dfail:0   fail:0   skip:18 
fi-skl-6260u     total:252  pass:242  dwarn:0   dfail:0   fail:0   skip:10 
fi-skl-6700hq    total:252  pass:235  dwarn:0   dfail:0   fail:0   skip:17 
fi-skl-6700k     total:252  pass:230  dwarn:4   dfail:0   fail:0   skip:18 
fi-skl-6770hq    total:252  pass:242  dwarn:0   dfail:0   fail:0   skip:10 
fi-snb-2520m     total:252  pass:224  dwarn:0   dfail:0   fail:0   skip:28 
fi-snb-2600      total:252  pass:223  dwarn:0   dfail:0   fail:0   skip:29 

91e2366ecab8b191f15f4457bde3b4e15f5e152d drm-tip: 2017y-02m-13d-09h-00m-05s UTC integration manifest
14c404f drm/i915: Clear the last_retired_context following a hang/reset
065c135 drm/i915: Park the breadcrumbs signaler across a GPU reset
73c9bba drm/i915: Kill the tasklet then disable
6f6498c drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare()
104c7b7 drm/i915: Assert that the active request hasn't been signaled

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3787/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/5] drm/i915: Clear the last_retired_context following a hang/reset
  2017-02-12 17:20 ` [PATCH 5/5] drm/i915: Clear the last_retired_context following a hang/reset Chris Wilson
@ 2017-02-13 10:55   ` Mika Kuoppala
  2017-02-13 11:25     ` Chris Wilson
  0 siblings, 1 reply; 14+ messages in thread
From: Mika Kuoppala @ 2017-02-13 10:55 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Following a hang and reset, we know that the engine is idle and all
> context state has been saved or lost. Consequently, we know that the
> engine is no longer referencing the last context and we can relinquish
> our tracking.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Tvrtko Ursulin <tursulin@ursulin.net>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 730804ce9610..638b335e72b6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2788,8 +2788,14 @@ void i915_gem_reset(struct drm_i915_private *dev_priv)
>  
>  	i915_gem_retire_requests(dev_priv);
>  
> -	for_each_engine(engine, dev_priv, id)
> +	for_each_engine(engine, dev_priv, id) {
> +		struct i915_gem_context *ctx;
> +
>  		i915_gem_reset_engine(engine);
> +		ctx = fetch_and_zero(&engine->last_retired_context);
> +		if (ctx)
> +			engine->context_unpin(engine, ctx);
> +	}
>  
>  	i915_gem_restore_fences(dev_priv);
>  
> -- 
> 2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/5] drm/i915: Clear the last_retired_context following a hang/reset
  2017-02-13 10:55   ` Mika Kuoppala
@ 2017-02-13 11:25     ` Chris Wilson
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2017-02-13 11:25 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Mon, Feb 13, 2017 at 12:55:02PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Following a hang and reset, we know that the engine is idle and all
> > context state has been saved or lost. Consequently, we know that the
> > engine is no longer referencing the last context and we can relinquish
> > our tracking.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

Thanks for working through these. Hopefully we are getting resets stable
in the new async world :|
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-02-13 11:25 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
2017-02-12 17:19 ` [PATCH 2/5] drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare() Chris Wilson
2017-02-13 10:22   ` Mika Kuoppala
2017-02-12 17:20 ` [PATCH 3/5] drm/i915: Kill the tasklet then disable Chris Wilson
2017-02-13 10:30   ` Mika Kuoppala
2017-02-12 17:20 ` [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset Chris Wilson
2017-02-13 10:03   ` Mika Kuoppala
2017-02-13 10:16   ` Chris Wilson
2017-02-12 17:20 ` [PATCH 5/5] drm/i915: Clear the last_retired_context following a hang/reset Chris Wilson
2017-02-13 10:55   ` Mika Kuoppala
2017-02-13 11:25     ` Chris Wilson
2017-02-12 17:52 ` ✗ Fi.CI.BAT: failure for series starting with [1/5] drm/i915: Assert that the active request hasn't been signaled Patchwork
2017-02-13  9:38 ` [PATCH 1/5] " Chris Wilson
2017-02-13 10:52 ` ✓ Fi.CI.BAT: success for series starting with [1/5] " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.