* [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled
@ 2017-02-12 17:19 Chris Wilson
2017-02-12 17:19 ` [PATCH 2/5] drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare() Chris Wilson
` (6 more replies)
0 siblings, 7 replies; 14+ messages in thread
From: Chris Wilson @ 2017-02-12 17:19 UTC (permalink / raw)
To: intel-gfx
As the request is not complete, it should not be signaled. Assert that
this is true before we process the request for a reset.
References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_gem.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b8d869d7937d..48922ff454e6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2611,6 +2611,8 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
continue;
GEM_BUG_ON(request->engine != engine);
+ GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+ &request->fence.flags));
return request;
}
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/5] drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare()
2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
@ 2017-02-12 17:19 ` Chris Wilson
2017-02-13 10:22 ` Mika Kuoppala
2017-02-12 17:20 ` [PATCH 3/5] drm/i915: Kill the tasklet then disable Chris Wilson
` (5 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2017-02-12 17:19 UTC (permalink / raw)
To: intel-gfx
As i915_gem_reset_finish() undoes the steps from
i915_gem_reset_prepare() to leave the system in a fully-working state,
e.g. to be able to free the breadcrumb signal threads, make sure that we
always call it even on the error path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_drv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 555b61bb8a63..1bc2cd3adc84 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1873,10 +1873,10 @@ void i915_reset(struct drm_i915_private *dev_priv)
goto error;
}
- i915_gem_reset_finish(dev_priv);
i915_queue_hangcheck(dev_priv);
wakeup:
+ i915_gem_reset_finish(dev_priv);
enable_irq(dev_priv->drm.irq);
wake_up_bit(&error->flags, I915_RESET_IN_PROGRESS);
return;
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/5] drm/i915: Kill the tasklet then disable
2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
2017-02-12 17:19 ` [PATCH 2/5] drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare() Chris Wilson
@ 2017-02-12 17:20 ` Chris Wilson
2017-02-13 10:30 ` Mika Kuoppala
2017-02-12 17:20 ` [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset Chris Wilson
` (4 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2017-02-12 17:20 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika Kuoppala
Disabling the tasklet leaves it if scheduled on the ready to run list
until it is re-enabled. This will leave the ksoftird thread spinning
until satisfied. To prevent this situation on starting the GPU reset, we
want to kill the tasklet first and then disable. The same problem will
arise when a tasklet is scheduled from another device, so a better
solution is required for the general case.
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1f7b847d72c3 ("drm/i915: Disable engine->irq_tasklet around resets")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 48922ff454e6..3c2c2296c1dc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2651,8 +2651,8 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
* Turning off the engine->irq_tasklet until the reset is over
* prevents the race.
*/
- tasklet_disable(&engine->irq_tasklet);
tasklet_kill(&engine->irq_tasklet);
+ tasklet_disable(&engine->irq_tasklet);
if (engine->irq_seqno_barrier)
engine->irq_seqno_barrier(engine);
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset
2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
2017-02-12 17:19 ` [PATCH 2/5] drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare() Chris Wilson
2017-02-12 17:20 ` [PATCH 3/5] drm/i915: Kill the tasklet then disable Chris Wilson
@ 2017-02-12 17:20 ` Chris Wilson
2017-02-13 10:03 ` Mika Kuoppala
2017-02-13 10:16 ` Chris Wilson
2017-02-12 17:20 ` [PATCH 5/5] drm/i915: Clear the last_retired_context following a hang/reset Chris Wilson
` (3 subsequent siblings)
6 siblings, 2 replies; 14+ messages in thread
From: Chris Wilson @ 2017-02-12 17:20 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika Kuoppala
The signal threads may be running concurrently with the GPU reset. The
completion from the GPU run asynchronous with the reset and two threads
may see different snapshots of the state, and the signaler may mark a
request as complete as we try to reset it. We don't tolerate 2 different
views of the same state and complain if we try to mark a request as
failed if it is already complete. Disable the signal threads during
reset to prevent this conflict (even though the conflict implies that
the state we resetting to is invalid, we have already made our
decision!).
References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 16 +++++++++++++++-
drivers/gpu/drm/i915/intel_breadcrumbs.c | 3 +++
2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3c2c2296c1dc..730804ce9610 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -35,6 +35,7 @@
#include "intel_frontbuffer.h"
#include "intel_mocs.h"
#include <linux/dma-fence-array.h>
+#include <linux/kthread.h>
#include <linux/reservation.h>
#include <linux/shmem_fs.h>
#include <linux/slab.h>
@@ -2643,6 +2644,17 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
for_each_engine(engine, dev_priv, id) {
struct drm_i915_gem_request *request;
+ /* Prevent the signaler thread from updating the request
+ * state (by calling dma_fence_signal) as we are processing
+ * the reset. The write from the GPU of the seqno is
+ * asynchronous and the signaler thread may see a different
+ * value to us and declare the request complete, even though
+ * the reset routine have picked that request as the active
+ * (incomplete) request. This conflict is not handled
+ * gracefully!
+ */
+ kthread_park(engine->breadcrumbs.signaler);
+
/* Prevent request submission to the hardware until we have
* completed the reset in i915_gem_reset_finish(). If a request
* is completed by one engine, it may then queue a request
@@ -2796,8 +2808,10 @@ void i915_gem_reset_finish(struct drm_i915_private *dev_priv)
lockdep_assert_held(&dev_priv->drm.struct_mutex);
- for_each_engine(engine, dev_priv, id)
+ for_each_engine(engine, dev_priv, id) {
tasklet_enable(&engine->irq_tasklet);
+ kthread_unpark(engine->breadcrumbs.signaler);
+ }
}
static void nop_submit_request(struct drm_i915_gem_request *request)
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 9fd002bcebb6..f5e05343110a 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -490,6 +490,9 @@ static int intel_breadcrumbs_signaler(void *arg)
break;
schedule();
+
+ if (kthread_should_park())
+ kthread_parkme();
}
} while (1);
__set_current_state(TASK_RUNNING);
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 5/5] drm/i915: Clear the last_retired_context following a hang/reset
2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
` (2 preceding siblings ...)
2017-02-12 17:20 ` [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset Chris Wilson
@ 2017-02-12 17:20 ` Chris Wilson
2017-02-13 10:55 ` Mika Kuoppala
2017-02-12 17:52 ` ✗ Fi.CI.BAT: failure for series starting with [1/5] drm/i915: Assert that the active request hasn't been signaled Patchwork
` (2 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2017-02-12 17:20 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika Kuoppala
Following a hang and reset, we know that the engine is idle and all
context state has been saved or lost. Consequently, we know that the
engine is no longer referencing the last context and we can relinquish
our tracking.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
---
drivers/gpu/drm/i915/i915_gem.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 730804ce9610..638b335e72b6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2788,8 +2788,14 @@ void i915_gem_reset(struct drm_i915_private *dev_priv)
i915_gem_retire_requests(dev_priv);
- for_each_engine(engine, dev_priv, id)
+ for_each_engine(engine, dev_priv, id) {
+ struct i915_gem_context *ctx;
+
i915_gem_reset_engine(engine);
+ ctx = fetch_and_zero(&engine->last_retired_context);
+ if (ctx)
+ engine->context_unpin(engine, ctx);
+ }
i915_gem_restore_fences(dev_priv);
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 14+ messages in thread
* ✗ Fi.CI.BAT: failure for series starting with [1/5] drm/i915: Assert that the active request hasn't been signaled
2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
` (3 preceding siblings ...)
2017-02-12 17:20 ` [PATCH 5/5] drm/i915: Clear the last_retired_context following a hang/reset Chris Wilson
@ 2017-02-12 17:52 ` Patchwork
2017-02-13 9:38 ` [PATCH 1/5] " Chris Wilson
2017-02-13 10:52 ` ✓ Fi.CI.BAT: success for series starting with [1/5] " Patchwork
6 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2017-02-12 17:52 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/5] drm/i915: Assert that the active request hasn't been signaled
URL : https://patchwork.freedesktop.org/series/19518/
State : failure
== Summary ==
Series 19518v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/19518/revisions/1/mbox/
Test kms_force_connector_basic:
Subgroup prune-stale-modes:
pass -> SKIP (fi-ivb-3770)
Test kms_pipe_crc_basic:
Subgroup nonblocking-crc-pipe-a-frame-sequence:
pass -> FAIL (fi-skl-6700hq)
fi-bdw-5557u total:252 pass:241 dwarn:0 dfail:0 fail:0 skip:11
fi-bsw-n3050 total:252 pass:213 dwarn:0 dfail:0 fail:0 skip:39
fi-bxt-j4205 total:252 pass:233 dwarn:0 dfail:0 fail:0 skip:19
fi-bxt-t5700 total:83 pass:70 dwarn:0 dfail:0 fail:0 skip:12
fi-byt-j1900 total:252 pass:225 dwarn:0 dfail:0 fail:0 skip:27
fi-byt-n2820 total:252 pass:221 dwarn:0 dfail:0 fail:0 skip:31
fi-hsw-4770 total:252 pass:236 dwarn:0 dfail:0 fail:0 skip:16
fi-hsw-4770r total:252 pass:236 dwarn:0 dfail:0 fail:0 skip:16
fi-ilk-650 total:252 pass:202 dwarn:0 dfail:0 fail:0 skip:50
fi-ivb-3520m total:252 pass:234 dwarn:0 dfail:0 fail:0 skip:18
fi-ivb-3770 total:252 pass:233 dwarn:0 dfail:0 fail:0 skip:19
fi-kbl-7500u total:252 pass:234 dwarn:0 dfail:0 fail:0 skip:18
fi-skl-6260u total:252 pass:242 dwarn:0 dfail:0 fail:0 skip:10
fi-skl-6700hq total:252 pass:234 dwarn:0 dfail:0 fail:1 skip:17
fi-skl-6700k total:252 pass:230 dwarn:4 dfail:0 fail:0 skip:18
fi-skl-6770hq total:252 pass:242 dwarn:0 dfail:0 fail:0 skip:10
fi-snb-2520m total:252 pass:224 dwarn:0 dfail:0 fail:0 skip:28
fi-snb-2600 total:252 pass:223 dwarn:0 dfail:0 fail:0 skip:29
f48164e93e75dbbe0bb9dac254779eeea08c0657 drm-tip: 2017y-02m-12d-11h-10m-28s UTC integration manifest
68d06e7 drm/i915: Clear the last_retired_context following a hang/reset
218d5de drm/i915: Park the breadcrumbs signaler across a GPU reset
7bf5713 drm/i915: Kill the tasklet then disable
fa51a2d drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare()
76305ef drm/i915: Assert that the active request hasn't been signaled
== Logs ==
For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3784/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled
2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
` (4 preceding siblings ...)
2017-02-12 17:52 ` ✗ Fi.CI.BAT: failure for series starting with [1/5] drm/i915: Assert that the active request hasn't been signaled Patchwork
@ 2017-02-13 9:38 ` Chris Wilson
2017-02-13 10:52 ` ✓ Fi.CI.BAT: success for series starting with [1/5] " Patchwork
6 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2017-02-13 9:38 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika Kuoppala
On Sun, Feb 12, 2017 at 05:19:58PM +0000, Chris Wilson wrote:
> As the request is not complete, it should not be signaled. Assert that
> this is true before we process the request for a reset.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b8d869d7937d..48922ff454e6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2611,6 +2611,8 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
> continue;
>
> GEM_BUG_ON(request->engine != engine);
> + GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> + &request->fence.flags));
> return request;
> }
>
> --
> 2.11.0
>
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset
2017-02-12 17:20 ` [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset Chris Wilson
@ 2017-02-13 10:03 ` Mika Kuoppala
2017-02-13 10:16 ` Chris Wilson
1 sibling, 0 replies; 14+ messages in thread
From: Mika Kuoppala @ 2017-02-13 10:03 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> The signal threads may be running concurrently with the GPU reset. The
> completion from the GPU run asynchronous with the reset and two threads
> may see different snapshots of the state, and the signaler may mark a
> request as complete as we try to reset it. We don't tolerate 2 different
> views of the same state and complain if we try to mark a request as
> failed if it is already complete. Disable the signal threads during
> reset to prevent this conflict (even though the conflict implies that
> the state we resetting to is invalid, we have already made our
> decision!).
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 16 +++++++++++++++-
> drivers/gpu/drm/i915/intel_breadcrumbs.c | 3 +++
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3c2c2296c1dc..730804ce9610 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -35,6 +35,7 @@
> #include "intel_frontbuffer.h"
> #include "intel_mocs.h"
> #include <linux/dma-fence-array.h>
> +#include <linux/kthread.h>
> #include <linux/reservation.h>
> #include <linux/shmem_fs.h>
> #include <linux/slab.h>
> @@ -2643,6 +2644,17 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
> for_each_engine(engine, dev_priv, id) {
> struct drm_i915_gem_request *request;
>
> + /* Prevent the signaler thread from updating the request
> + * state (by calling dma_fence_signal) as we are processing
> + * the reset. The write from the GPU of the seqno is
> + * asynchronous and the signaler thread may see a different
> + * value to us and declare the request complete, even though
> + * the reset routine have picked that request as the active
> + * (incomplete) request. This conflict is not handled
> + * gracefully!
> + */
> + kthread_park(engine->breadcrumbs.signaler);
> +
> /* Prevent request submission to the hardware until we have
> * completed the reset in i915_gem_reset_finish(). If a request
> * is completed by one engine, it may then queue a request
> @@ -2796,8 +2808,10 @@ void i915_gem_reset_finish(struct drm_i915_private *dev_priv)
>
> lockdep_assert_held(&dev_priv->drm.struct_mutex);
>
> - for_each_engine(engine, dev_priv, id)
> + for_each_engine(engine, dev_priv, id) {
> tasklet_enable(&engine->irq_tasklet);
> + kthread_unpark(engine->breadcrumbs.signaler);
> + }
> }
>
> static void nop_submit_request(struct drm_i915_gem_request *request)
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 9fd002bcebb6..f5e05343110a 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -490,6 +490,9 @@ static int intel_breadcrumbs_signaler(void *arg)
> break;
>
> schedule();
> +
> + if (kthread_should_park())
> + kthread_parkme();
> }
> } while (1);
> __set_current_state(TASK_RUNNING);
> --
> 2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset
2017-02-12 17:20 ` [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset Chris Wilson
2017-02-13 10:03 ` Mika Kuoppala
@ 2017-02-13 10:16 ` Chris Wilson
1 sibling, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2017-02-13 10:16 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika Kuoppala
On Sun, Feb 12, 2017 at 05:20:01PM +0000, Chris Wilson wrote:
> The signal threads may be running concurrently with the GPU reset. The
> completion from the GPU run asynchronous with the reset and two threads
> may see different snapshots of the state, and the signaler may mark a
> request as complete as we try to reset it. We don't tolerate 2 different
> views of the same state and complain if we try to mark a request as
> failed if it is already complete. Disable the signal threads during
> reset to prevent this conflict (even though the conflict implies that
> the state we resetting to is invalid, we have already made our
> decision!).
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/5] drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare()
2017-02-12 17:19 ` [PATCH 2/5] drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare() Chris Wilson
@ 2017-02-13 10:22 ` Mika Kuoppala
0 siblings, 0 replies; 14+ messages in thread
From: Mika Kuoppala @ 2017-02-13 10:22 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> As i915_gem_reset_finish() undoes the steps from
> i915_gem_reset_prepare() to leave the system in a fully-working state,
> e.g. to be able to free the breadcrumb signal threads, make sure that we
> always call it even on the error path.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
As disable_irq is our guard against rescheduling the tasklet,
how about moving the disable/enable_irq() pair onto
prepare and finish() functions?
-Mika
> ---
> drivers/gpu/drm/i915/i915_drv.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 555b61bb8a63..1bc2cd3adc84 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1873,10 +1873,10 @@ void i915_reset(struct drm_i915_private *dev_priv)
> goto error;
> }
>
> - i915_gem_reset_finish(dev_priv);
> i915_queue_hangcheck(dev_priv);
>
> wakeup:
> + i915_gem_reset_finish(dev_priv);
> enable_irq(dev_priv->drm.irq);
> wake_up_bit(&error->flags, I915_RESET_IN_PROGRESS);
> return;
> --
> 2.11.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/5] drm/i915: Kill the tasklet then disable
2017-02-12 17:20 ` [PATCH 3/5] drm/i915: Kill the tasklet then disable Chris Wilson
@ 2017-02-13 10:30 ` Mika Kuoppala
0 siblings, 0 replies; 14+ messages in thread
From: Mika Kuoppala @ 2017-02-13 10:30 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Disabling the tasklet leaves it if scheduled on the ready to run list
> until it is re-enabled. This will leave the ksoftird thread spinning
> until satisfied. To prevent this situation on starting the GPU reset, we
> want to kill the tasklet first and then disable. The same problem will
> arise when a tasklet is scheduled from another device, so a better
> solution is required for the general case.
>
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Fixes: 1f7b847d72c3 ("drm/i915: Disable engine->irq_tasklet around resets")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 48922ff454e6..3c2c2296c1dc 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2651,8 +2651,8 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
> * Turning off the engine->irq_tasklet until the reset is over
> * prevents the race.
> */
> - tasklet_disable(&engine->irq_tasklet);
> tasklet_kill(&engine->irq_tasklet);
> + tasklet_disable(&engine->irq_tasklet);
>
> if (engine->irq_seqno_barrier)
> engine->irq_seqno_barrier(engine);
> --
> 2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* ✓ Fi.CI.BAT: success for series starting with [1/5] drm/i915: Assert that the active request hasn't been signaled
2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
` (5 preceding siblings ...)
2017-02-13 9:38 ` [PATCH 1/5] " Chris Wilson
@ 2017-02-13 10:52 ` Patchwork
6 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2017-02-13 10:52 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: series starting with [1/5] drm/i915: Assert that the active request hasn't been signaled
URL : https://patchwork.freedesktop.org/series/19518/
State : success
== Summary ==
Series 19518v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/19518/revisions/1/mbox/
Test kms_pipe_crc_basic:
Subgroup suspend-read-crc-pipe-b:
incomplete -> PASS (fi-skl-6260u) FDO#99750
Test pm_rpm:
Subgroup basic-pci-d3-state:
incomplete -> PASS (fi-byt-n2820) FDO#99740
FDO#99750 https://bugs.freedesktop.org/show_bug.cgi?id=99750
FDO#99740 https://bugs.freedesktop.org/show_bug.cgi?id=99740
fi-bdw-5557u total:252 pass:241 dwarn:0 dfail:0 fail:0 skip:11
fi-bsw-n3050 total:252 pass:213 dwarn:0 dfail:0 fail:0 skip:39
fi-bxt-j4205 total:252 pass:233 dwarn:0 dfail:0 fail:0 skip:19
fi-bxt-t5700 total:83 pass:70 dwarn:0 dfail:0 fail:0 skip:12
fi-byt-j1900 total:252 pass:225 dwarn:0 dfail:0 fail:0 skip:27
fi-byt-n2820 total:252 pass:221 dwarn:0 dfail:0 fail:0 skip:31
fi-hsw-4770 total:252 pass:236 dwarn:0 dfail:0 fail:0 skip:16
fi-hsw-4770r total:252 pass:236 dwarn:0 dfail:0 fail:0 skip:16
fi-ilk-650 total:252 pass:202 dwarn:0 dfail:0 fail:0 skip:50
fi-ivb-3520m total:252 pass:234 dwarn:0 dfail:0 fail:0 skip:18
fi-ivb-3770 total:252 pass:234 dwarn:0 dfail:0 fail:0 skip:18
fi-kbl-7500u total:252 pass:234 dwarn:0 dfail:0 fail:0 skip:18
fi-skl-6260u total:252 pass:242 dwarn:0 dfail:0 fail:0 skip:10
fi-skl-6700hq total:252 pass:235 dwarn:0 dfail:0 fail:0 skip:17
fi-skl-6700k total:252 pass:230 dwarn:4 dfail:0 fail:0 skip:18
fi-skl-6770hq total:252 pass:242 dwarn:0 dfail:0 fail:0 skip:10
fi-snb-2520m total:252 pass:224 dwarn:0 dfail:0 fail:0 skip:28
fi-snb-2600 total:252 pass:223 dwarn:0 dfail:0 fail:0 skip:29
91e2366ecab8b191f15f4457bde3b4e15f5e152d drm-tip: 2017y-02m-13d-09h-00m-05s UTC integration manifest
14c404f drm/i915: Clear the last_retired_context following a hang/reset
065c135 drm/i915: Park the breadcrumbs signaler across a GPU reset
73c9bba drm/i915: Kill the tasklet then disable
6f6498c drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare()
104c7b7 drm/i915: Assert that the active request hasn't been signaled
== Logs ==
For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3787/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 5/5] drm/i915: Clear the last_retired_context following a hang/reset
2017-02-12 17:20 ` [PATCH 5/5] drm/i915: Clear the last_retired_context following a hang/reset Chris Wilson
@ 2017-02-13 10:55 ` Mika Kuoppala
2017-02-13 11:25 ` Chris Wilson
0 siblings, 1 reply; 14+ messages in thread
From: Mika Kuoppala @ 2017-02-13 10:55 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Following a hang and reset, we know that the engine is idle and all
> context state has been saved or lost. Consequently, we know that the
> engine is no longer referencing the last context and we can relinquish
> our tracking.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Tvrtko Ursulin <tursulin@ursulin.net>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 730804ce9610..638b335e72b6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2788,8 +2788,14 @@ void i915_gem_reset(struct drm_i915_private *dev_priv)
>
> i915_gem_retire_requests(dev_priv);
>
> - for_each_engine(engine, dev_priv, id)
> + for_each_engine(engine, dev_priv, id) {
> + struct i915_gem_context *ctx;
> +
> i915_gem_reset_engine(engine);
> + ctx = fetch_and_zero(&engine->last_retired_context);
> + if (ctx)
> + engine->context_unpin(engine, ctx);
> + }
>
> i915_gem_restore_fences(dev_priv);
>
> --
> 2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 5/5] drm/i915: Clear the last_retired_context following a hang/reset
2017-02-13 10:55 ` Mika Kuoppala
@ 2017-02-13 11:25 ` Chris Wilson
0 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2017-02-13 11:25 UTC (permalink / raw)
To: Mika Kuoppala; +Cc: intel-gfx
On Mon, Feb 13, 2017 at 12:55:02PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > Following a hang and reset, we know that the engine is idle and all
> > context state has been saved or lost. Consequently, we know that the
> > engine is no longer referencing the last context and we can relinquish
> > our tracking.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Thanks for working through these. Hopefully we are getting resets stable
in the new async world :|
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2017-02-13 11:25 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-12 17:19 [PATCH 1/5] drm/i915: Assert that the active request hasn't been signaled Chris Wilson
2017-02-12 17:19 ` [PATCH 2/5] drm/i915: Always call i915_gem_reset_finish() following i915_gem_reset_prepare() Chris Wilson
2017-02-13 10:22 ` Mika Kuoppala
2017-02-12 17:20 ` [PATCH 3/5] drm/i915: Kill the tasklet then disable Chris Wilson
2017-02-13 10:30 ` Mika Kuoppala
2017-02-12 17:20 ` [PATCH 4/5] drm/i915: Park the breadcrumbs signaler across a GPU reset Chris Wilson
2017-02-13 10:03 ` Mika Kuoppala
2017-02-13 10:16 ` Chris Wilson
2017-02-12 17:20 ` [PATCH 5/5] drm/i915: Clear the last_retired_context following a hang/reset Chris Wilson
2017-02-13 10:55 ` Mika Kuoppala
2017-02-13 11:25 ` Chris Wilson
2017-02-12 17:52 ` ✗ Fi.CI.BAT: failure for series starting with [1/5] drm/i915: Assert that the active request hasn't been signaled Patchwork
2017-02-13 9:38 ` [PATCH 1/5] " Chris Wilson
2017-02-13 10:52 ` ✓ Fi.CI.BAT: success for series starting with [1/5] " Patchwork
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.